Information Processing and Management 56 (2019) 102093
Cascade embedding model for knowledge graph inference and retrieval
Daifeng Li⁎, Andrew Madden
School of Information Management, Sun Yat-Sen University, Guangzhou, Guangdong, China
Keywords: Graph embedding; Knowledge embedding; Knowledge graph inference; Cascade ranking

ABSTRACT
Knowledge graphs are widely used in retrieval systems, question answering (QA) systems, hypothesis generation systems, etc. Representation learning provides a way to mine knowledge graphs and detect missing relations, and translation-based embedding models are a popular form of representation model. However, the shortcomings of translation-based models limit their practicability as knowledge completion algorithms. The proposed model helps to address some of these shortcomings. We found that the similarity between the graph structural features of two entities is correlated with the relations of those entities, and that this correlation can help to solve the problems caused by unbalanced relations and reciprocal relations. We used Node2vec, a graph embedding algorithm, to represent the information related to an entity's graph structure, and we introduce a cascade model to incorporate graph embedding and knowledge embedding into a unified framework. The cascade model first refines the feature representation in the first two stages (Local Optimization Stage), and then uses backward propagation to optimize the parameters of all stages (Global Optimization Stage). This enhances the knowledge representation of existing translation-based algorithms by taking into account both semantic features and graph features and fusing them to extract more useful information. In addition, different cascade structures are designed to find the optimal solution to the problem of knowledge inference and retrieval. The proposed model was verified using three mainstream knowledge graphs: WN18, FB15k and BioChem. Experimental results were validated using the hits@10 rate on the entity prediction task. The proposed model performed better than TransE, giving an average improvement of 2.7% on WN18, 2.3% on FB15k and 28% on BioChem. Improvements were particularly marked where there were problems with unbalanced relations and reciprocal relations. Furthermore, the stepwise cascade structure proved more effective, significantly outperforming the other baselines.
1. Introduction

Knowledge graphs usually contain huge amounts of data structured in the form of triplets (subject entity, relation, object entity, denoted (s, r, o)). Each entity (s, o) relates to a concept in the real world, while the relation r specifies the relationship between the two entities. Knowledge graphs (KGs) play an important role in information retrieval tasks such as query expansion (Zhang et al., 2017) and results ranking (Power, Power & Callan, 2017; Xiong, Callan & Liu, 2017). They have also become increasingly important in many AI-related applications, such as recommendation (Catherine & Cohen, 2016; Palumbo, Rizzo & Troncy, 2017), question answering (QA) (Hu, Zou, Yu, Wang & Zhao, 2018; Seyler, Yahya & Berberich, 2017), relation extraction (RE), etc.
⁎ Corresponding author. E-mail address: [email protected] (D. Li).

https://doi.org/10.1016/j.ipm.2019.102093
Received 24 October 2018; received in revised form 18 July 2019; accepted 30 July 2019; available online 14 August 2019.
0306-4573/© 2019 Elsevier Ltd. All rights reserved.
Although many large-scale knowledge graphs with millions of entities have been built (e.g. WordNet (Miller, 1995), Freebase (Bollacker, Evans, Paritosh, Sturge & Taylor, 2008) and Yago (Suchanek, 2007)), they are still far from complete. For example, more than 70% of the person entries in Freebase lack nationalities or birthplaces (Dong et al., 2014; Krompaß, Baier & Tresp, 2015). Knowledge graph inference aims to predict relations between entities under the supervision of the existing knowledge graph. It is an important way to supplement knowledge graphs, in addition to extracting relations from plain text, and it is the focus of this work.

Knowledge graph inference uses an existing knowledge graph to predict missing relations between entities. For example, someone may wish to retrieve the nationality of Leslie Cheung (a famous singer and actor), but the target triplet (Leslie Cheung, nationality, ?) is not in the existing knowledge graph. There must be one entity which fits the query but remains unobserved in the knowledge graph. Our task is to build a framework which can infer potential entities for the query and return the most probable ones, based on observations from the existing knowledge graph.

Representation learning seeks to map data into a low-dimensional continuous vector space while preserving certain information from the original data, making it easier to extract useful information when building classifiers or other predictors (Bengio, Courville & Vincent, 2013). It has attracted the attention of many researchers in recent years and is considered an advanced solution for knowledge inference and knowledge completion. For instance, translation-based embedding models are popular for knowledge representation. In these models, each entity (s or o) is represented as a point in the embedding vector space, and each relation (r) is represented as an operation (translation, projection, etc.) in that space. Translation-based models share a similar principle, $s_r + r \approx o_r$ (Bordes, Usunier, Garcia-Duran, Weston & Yakhnenko, 2013; Ji, Liu, He & Zhao, 2016; Lin, Liu, Sun, Liu & Zhu, 2015b; Wang, Zhang, Feng & Chen, 2014), which results in a similar score function, $f_r(s,o) = \|s_r + r - o_r\|$, where $s_r$ and $o_r$ are the embedding vectors of the subject and object entities projected into the relation-specific space.

Despite the success of translation-based models in knowledge completion, most of them treat knowledge graphs as sets of triplets, and seldom consider the inner correlations among different triplets (Feng, Huang & Yang, 2016). The two main limitations of translation-based models are summarized below:

(1) Reciprocal relations. In knowledge graphs, some relations are reciprocal: if the triplet (s, r, o) exists, then the triplet (o, r′, s) must also exist, where r and r′ are referred to as reciprocal relations. After TransE embedding, s + r ≈ o and o + r′ ≈ s, which means that r + r′ ≈ 0. It is therefore easier to predict the triplet (s, r, o) if (o, r′, s) is already in the training set. For instance, "people_with_this_profession" and "profession" are reciprocal relations: if (Leslie Cheung, profession, actor) is in the training set, then it is easier to predict the relation (actor, people_with_this_profession, Leslie Cheung).
But if the triplet (s, r, o) does not have a reciprocal triplet (o, r′, s) in the training set, TransE gives poor predictions (see Table 4; a sketch for detecting reciprocal relation pairs is given after this list).

(2) Unbalanced relations (1-to-n, n-to-1, n-to-n relationships). For example, one subject entity may relate to multiple object entities: s + r ≈ o_i for i = 1, …, n, leading to far more "training" for the subject entity than for each object entity. In addition, if a subject entity is translated to multiple object entities with only one vector r, compromises must be made to achieve the overall effect. This can make it difficult to assign a distinct position to the n-side entities in the embedding space, and harder to predict the n-side entity.
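To make the reciprocal-relation issue concrete, the following is a minimal sketch (ours, not from the paper; the function name and the 0.8 threshold are illustrative choices) of how reciprocal relation pairs such as profession / people_with_this_profession can be detected by counting how often the inverse triplet co-occurs in the training set:

```python
from collections import defaultdict

def find_reciprocal_pairs(triplets, threshold=0.8):
    """Flag relation pairs (r, r') where (o, r', s) usually exists
    whenever (s, r, o) does, i.e. the reciprocal pattern above."""
    existing = set(triplets)              # fast membership test
    counts = defaultdict(int)             # (r, r') -> co-occurrence count
    totals = defaultdict(int)             # r -> number of triplets with r
    relations = {r for _, r, _ in triplets}
    for s, r, o in triplets:
        totals[r] += 1
        for r2 in relations:
            if (o, r2, s) in existing:
                counts[(r, r2)] += 1
    return {(r, r2): c / totals[r]
            for (r, r2), c in counts.items()
            if c / totals[r] >= threshold}

# Toy example: 'profession' and 'people_with_this_profession' are reciprocal.
kg = [("Leslie Cheung", "profession", "actor"),
      ("actor", "people_with_this_profession", "Leslie Cheung"),
      ("Tony Leung", "profession", "actor"),
      ("actor", "people_with_this_profession", "Tony Leung")]
print(find_reciprocal_pairs(kg))
# -> both orderings of the pair are reported with ratio 1.0
```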
Fig. 1. An illustration of graph features. The red dashed line indicates the missing relation. (a) A simple instance of graph structural features; (b) A general illustration of graph features. The dashed line between “Context of A” and “Context of B” indicates that they may intersect.
Fig. 2. Correlation between graph embedding and relations. The x-axis is the graph embedding distance between the entities of triplets with a specific relation; the y-axis is the proportion of triplets at each distance.
The limitations introduced above can be attributed to a lack of contextual information. A knowledge graph is not only a set of triplets but also a graph, and both semantic features and graph features are important elements of it. For example, "Tony Leung" and "Leslie Cheung" are both actors, and both appeared in the movie "Days of Being Wild" (see Fig. 1(a)). "Tony Leung" and "Leslie Cheung" are structurally equivalent to some extent, so they will share some properties: if we already know that "Tony Leung" is Chinese, then we can infer that "Leslie Cheung" is probably Chinese. We also know that "Days of Being Wild" is in Mandarin Chinese, so "Leslie Cheung", "Days of Being Wild", "Chinese", "China" and "Tony Leung" form a tight subgraph. Local features are also helpful for drawing inferences (Grover & Leskovec, 2016). More generally, if the graphical contexts of entities A and C are similar, and the contexts of B and D are similar, then if A is known to have a particular relationship with B, there is, by analogy, a high probability that C has a similar relationship with D (see Fig. 1(b)). In practice there may be many triplets with a specific relation r, so it would be costly to compare each "C" with each "A". We therefore assume that a specific relation is correlated with the distance between its entities, so that the probability of (C, r, D) can be calculated from the distance between (A, B) and the distance between (C, D).

Graph embedding methods such as Node2vec (Grover & Leskovec, 2016) can capture the graphical features of each entity and map them into a d-dimensional vector. This is an efficient way to measure graphical similarity among entities, and it helps to infer missing relations like those shown in Fig. 1(a) and (b). To further verify our analysis, we assume that the graph embedding distance between subject and object entities is correlated with whether or not they are connected, as well as with their semantic relations (i.e., the specific relations between subject and object entities). To assess this, we used Node2vec to determine the embedding-based distributions (Fig. 2). For most relation types in each dataset (most of the relations in WN18 and BioChem, and 80% of the relations in FB15k), there were significant correlations between the graph embedding distance of two entities and their specific relation types. We randomly selected 5 heterogeneous triplet relations from each dataset used in the experiment (see the Experiment section) as examples, and clustered them according to the node embedding distance of their subject and object entities (using the dot product as the distance between two node embedding vectors). The distribution curves of all these relations indicate that the graph embedding distance of two entities is positively correlated with their relation. If this assumption holds, most subject and object entities with a certain relation (for example, entities with the relation bind in BioChem) will have an embedding distance, such as the dot product, within a certain range; for example, the embedding distances of 80% of bind relations lie between 10 and 17. Moreover, the correlation shows a different pattern for each relation, so the graph embedding distance between subject and object entities is also informative about which relation holds between them. All this implies that the graph embedding distance carries relevant contextual information which may help to fix problems arising from unbalanced relations and from the lack of information about reciprocal relations.
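As an illustration of how distributions of the kind plotted in Fig. 2 can be computed, the sketch below (ours; it assumes node embeddings have already been trained and are stored in a dict `node_emb` mapping entity to NumPy vector) collects the per-relation dot-product distances and bins them:

```python
import numpy as np

def relation_distance_histograms(triplets, node_emb, bins=20):
    """For each relation, collect the dot-product distance between the
    Node2vec embeddings of subject and object, then bin the values so the
    proportion of triplets per distance range can be plotted (cf. Fig. 2)."""
    dists = {}
    for s, r, o in triplets:
        d = float(np.dot(node_emb[s], node_emb[o]))  # dot-product "distance"
        dists.setdefault(r, []).append(d)
    # density=True normalizes each histogram to a proportion-style curve.
    return {r: np.histogram(v, bins=bins, density=True)
            for r, v in dists.items()}
```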
Graph embedding algorithms such as Node2vec (Grover & Leskovec, 2016) provide graph context information, which shows a positive correlation with entity relations. However, it is still necessary to extract this information as new embedding features, to reduce the influence of other embedding information, and then to concatenate the graph features with the knowledge embedding features. To achieve this, a cascade structure is used. Cascade models are often used in human face recognition (Dollár, Welinder & Perona, 2010): the model uses an approximate location feature set at the first cascade stage, then makes refinements stage by stage. In our research, knowledge embedding and graph embedding provide a rough expression of entity relations, and a cascade model refines the feature expression in stages, finally producing an optimized result.

Missing relations in knowledge graphs may reduce their practicability in applications such as traditional information retrieval tasks, and existing knowledge completion algorithms often make false inferences. In our research, we find that graph embedding vectors encode both local and global topological information for each entity, and that they are correlated with the entities' relations. Our main research objective is therefore to incorporate both translation-based embedding and graph embedding into a unified framework, to better infer missing relations in current knowledge graphs. Our work contributes in three ways to research in this field:
(1) A knowledge graph is not only a set of triplets: it is also a graph. We found that the distance between two entities' graph embeddings is correlated with their relation, and showed that this correlation can help to refine the performance of translation-based models.

(2) We designed a cascade model to incorporate graph embedding and knowledge embedding into a unified framework, the target of which is to optimize the learning process and fuse the knowledge and graph embedding features at both local and global stages, leading to a more accurate embedding-based representation of relations.

(3) We designed different cascade structures to find the optimal solution to the problem of knowledge inference and retrieval. Experimental results show that the stepwise cascade model significantly outperforms the other baselines, especially on the BioChem dataset (see Table 3).

2. Related work

2.1. Knowledge embedding models

The objective of TransE (Bordes et al., 2013) is to minimize the margin-based ranking scores over the training set:
$$\min \; \mathcal{L} = \sum_{(s,r,o) \in T} \; \sum_{(s',r,o') \in T'_{(s,r,o)}} \big[\gamma + d(s,r,o) - d(s',r,o')\big]_+$$

where $[x]_+$ denotes the positive part of $x$, $\gamma > 0$ is a margin hyperparameter, and the L1- or L2-norm is used as the dissimilarity measure $d(s,r,o)$. The set of corrupted triplets is

$$T'_{(s,r,o)} = \{(s',r,o) \mid s' \in E\} \;\cup\; \{(s,r,o') \mid o' \in E\}$$

The set of negative samples $T'$ is generated by replacing the subject or object entity with a random candidate entity.
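As a concrete illustration, the following PyTorch sketch (ours, not the authors' released code) implements this margin-based ranking loss for a batch of positive triplets and their corrupted counterparts:

```python
import torch

def transe_margin_loss(s, r, o, s_neg, o_neg, gamma=1.0, p=1):
    """Margin-based ranking loss of TransE for a batch of positive triplets
    (s, r, o) and corrupted triplets (s_neg, r, o_neg).
    All arguments are embedding tensors of shape (batch, k)."""
    d_pos = torch.norm(s + r - o, p=p, dim=1)          # d(s, r, o)
    d_neg = torch.norm(s_neg + r - o_neg, p=p, dim=1)  # d(s', r, o')
    # [x]_+ : hinge on the margin gamma between positive and negative scores.
    return torch.clamp(gamma + d_pos - d_neg, min=0.0).sum()
```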
Many variant models sharing a similar translation principle with TransE have been proposed to improve performance on unbalanced relations (see Table 1). TransH projects entity embeddings onto a relation-specific hyperplane, helping to ensure that entities differ when involved in different relations (Wang et al., 2014). TransR (Lin et al., 2015b) models entities and relations in distinct spaces: entities are projected from entity space to relation space via a projection matrix. Similarly, STransE projects the entity space to the relation space via two matrices related to the subject and object entity respectively (Nguyen, Sirts, Qu & Johnson, 2016). In TransD (Ji, He, Xu, Liu & Zhao, 2015), a more advanced model, the mapping matrices are determined by both entities and relations. In TranSparse (Ji et al., 2016), a sparse, low-rank matrix is used as the projection matrix, which is more flexible.

Other works have found the translation principle applied in the above models too strict to cope well with complex entities and relations. To compensate, they introduce parameter vectors into the translation principle, mapping entity and relation vectors to different positions when involved in different triplets. For example, Feng, Zhou, Hao, Huang and Zhu (2016) put forward a flexible translation principle (FT): $s_r + r \approx \alpha o_r$, where $\alpha > 0$, meaning that $(s_r + r)$ points in the same direction as $o_r$ in the embedding space but the distance between them is flexible. Chang et al. (2017) propose a dynamic translation principle (DT): $(s_r + \alpha_s) + (r + \alpha_r) \approx (o_r + \alpha_o)$. Both principles can be incorporated into the translation-based models above, improving their performance.

In addition to being a set of triplets, a knowledge graph also has features such as paths and graph structures, which can support the embedding of entities and relations. Some later developments (such as PTransE (Lin et al., 2015a) and PaSKoGE (Jia, Wang, Jin & Cheng, 2018)) take such features into consideration. Others consider neighbors (Nie & Sun, 2019; Wang & Cheng, 2018), entity types (Moon, Harenberg, Slankas & Samatova, 2017; Rahman & Takasu, 2018) and relation types (Shi et al., 2017). GAKE (Feng et al., 2016) also uses contextual information, taking neighbor context, path context and edge context into consideration, and includes an attention mechanism which learns the representative power of each subject (vertices or edges). TCE (Gao et al., 2018) further builds neighbor context and path context for each triple, helping it to distinguish complex relations.

Table 1. Translation principles, score functions and relation parameters of TransE and its variant models.

| Model | Translation principle | Score function | Relation parameters |
|---|---|---|---|
| TransE (Bordes et al., 2013) | s + r ≈ o | ‖s + r − o‖_p | r ∈ ℝ^k |
| TransH (Wang et al., 2014) | s_r + r ≈ o_r | ‖(s − w_r^⊤ s w_r) + d_r − (o − w_r^⊤ o w_r)‖_p | w_r, d_r ∈ ℝ^k |
| TransR (Lin et al., 2015b) | s_r + r ≈ o_r | ‖M_r s + r − M_r o‖_p | M_r ∈ ℝ^{k×d}, s, o ∈ ℝ^d, r ∈ ℝ^k |
| STransE (Nguyen et al., 2016) | s_r + r ≈ o_r | ‖M_{r,1} s + r − M_{r,2} o‖_p | M_{r,1}, M_{r,2} ∈ ℝ^{k×k}, r ∈ ℝ^k |
| TransD (Ji et al., 2015) | s_r + r ≈ o_r | ‖M_{rs} s + r − M_{ro} o‖_p, M_{rs} = r_p s_p^⊤ + I^{k×d}, M_{ro} = r_p o_p^⊤ + I^{k×d} | M_{rs}, M_{ro} ∈ ℝ^{k×d}, s, s_p, o, o_p ∈ ℝ^d, r, r_p ∈ ℝ^k |
| TranSparse (Ji et al., 2016) | s_r + r ≈ o_r | ‖M_s^r(θ_r^s) s + r − M_o^r(θ_r^o) o‖_p | M_s^r(θ_r^s), M_o^r(θ_r^o) ∈ ℝ^{k×d}, r ∈ ℝ^k |
| TransX-FT (Feng et al., 2016) | s_r + r ≈ αo_r | (s + r)^⊤ o + s^⊤ (o − r) | s_r, o_r, r, α ∈ ℝ^k |
| TransX-DT (Chang et al., 2017) | (s_r + α_s) + (r + α_r) ≈ (o_r + α_o) | ‖s_r + r − o_r‖_p | s_r, o_r, r, α_s, α_r, α_o ∈ ℝ^k |
| PTransE (Lin et al., 2015a) | s + r ≈ o; s + p ≈ o | ‖s + r − o‖ + ‖p − r‖ | s, r, o ∈ ℝ^k |
| Traversing-TransE Comp (Guu, Miller & Liang, 2015) | q ≈ o, q = s/r₁/r₂/…/r_k | ‖s + r₁ + r₂ + … + r_k − o‖_p | s, r_k, o ∈ ℝ^k |
| ContE (Moon et al., 2017) | s + o + r_c ≈ r | ‖s + o + r_c − r‖, c ∈ C_{s,o} | s, o, r_c, r ∈ ℝ^k |
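To make a few rows of Table 1 concrete, here is a short sketch (ours; shapes follow the table, and the TransH normals w_r are assumed to be unit vectors) of three representative score functions:

```python
import torch

def score_transe(s, r, o, p=1):
    # TransE: ||s + r - o||_p  (s, r, o: tensors of shape (..., k))
    return torch.norm(s + r - o, p=p, dim=-1)

def score_transh(s, o, w_r, d_r, p=1):
    # TransH: project s and o onto the hyperplane with unit normal w_r,
    # then translate by the relation-specific vector d_r.
    s_proj = s - (s * w_r).sum(-1, keepdim=True) * w_r
    o_proj = o - (o * w_r).sum(-1, keepdim=True) * w_r
    return torch.norm(s_proj + d_r - o_proj, p=p, dim=-1)

def score_transr(s, r, o, M_r, p=1):
    # TransR: map entities from entity space R^d to relation space R^k
    # through the projection matrix M_r of shape (k, d).
    return torch.norm(s @ M_r.T + r - o @ M_r.T, p=p, dim=-1)
```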
The work presented here concerns graph structural information, derived from a graph embedding model which encodes local and global topological information.

2.2. Graph embedding models

Graph embedding models are algorithmic frameworks for learning feature representations for nodes in networks; in other words, they map the nodes of a network into a low-dimensional space. The mainstream graph embedding models are DeepWalk (Perozzi, Al-Rfou & Skiena, 2014), LINE (Tang et al., 2015) and Node2vec (Grover & Leskovec, 2016). Just as a sentence can be regarded as an ordered sequence of words, a network can be regarded as a collection of ordered sequences of nodes (if appropriate sampling strategies are applied). Node2vec is the most flexible of the three models in its strategy for sampling nodes from a network. Node2vec extends Word2vec (Mikolov, Chen, Corrado & Dean, 2013; Mikolov, Sutskever, Chen, Corrado & Dean, 2013), the core of which is a skip-gram model. In Word2vec, one word is input to the skip-gram model, and the output is the surrounding words. Similarly, in Node2vec, the input is one node and the output is the surrounding nodes. The problem, therefore, is how to define the surrounding nodes. Node2vec combines two classic search strategies, BFS (breadth-first sampling) and DFS (depth-first sampling), controlled by two search parameters p and q (Grover & Leskovec, 2016); both homophily and structural equivalence are encoded in this way. Nodes and their surrounding node sequences are fed into a Word2vec model as though they were sentences. Edge features in Node2vec are generated by a binary operator such as average, Hadamard, L1-norm or L2-norm applied to the vectors of the corresponding nodes. The Hadamard product has been shown to work best for link prediction (Grover & Leskovec, 2016).

In this article, a knowledge graph is treated as an undirected, weighted graph. Because both in-links and out-links are contextual and depend on the entity, it is not necessary to distinguish between them. The weight of an edge equals the number of links between the two nodes it connects: a higher weight indicates more relations between them. Node2vec's search strategy relies on surrounding nodes. Because it combines DFS and BFS, the embedding of a single node encodes both local and global information from the whole graph. Node2vec embeds nodes according to their surrounding nodes and clusters similar nodes together. Consequently, nodes with similar contexts (community or structural equivalence) are closer in the embedding space, and nodes embedded with similar contexts are more likely to be linked (see Fig. 2). The remaining problem is identifying the patterns associated with each relation.
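As an illustration, the sketch below builds the undirected weighted graph described above and trains Node2vec with the community `node2vec` package (an assumption on our part; the paper used the reference implementation cited in the Experiment section), then forms Hadamard edge features. Here `triplets` is assumed to be the list of (s, r, o) tuples:

```python
import networkx as nx
from node2vec import Node2Vec  # pip install node2vec (community wrapper)

# Treat the KG as an undirected weighted graph: the weight of an edge is the
# number of relations linking the two entities, as described above.
G = nx.Graph()
for s, r, o in triplets:
    w = G[s][o]["weight"] + 1 if G.has_edge(s, o) else 1
    G.add_edge(s, o, weight=w)

# Parameters follow the paper's setup (p = q = 1, k = 128, walks of length 80).
n2v = Node2Vec(G, dimensions=128, walk_length=80, num_walks=6, p=1, q=1)
model = n2v.fit(window=10, min_count=1)  # skip-gram over the node walks

def edge_feature(u, v):
    """Hadamard edge feature g(u) .* g(v) for a candidate entity pair."""
    return model.wv[str(u)] * model.wv[str(v)]
```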
2.3. Cascade learning model

Cascade learning is a strategy that uses a sequence of functions to approach the true value, achieving a trade-off between efficiency and effectiveness. Previous studies show that cascade models perform well in applications such as object detection and face recognition (Bourdev & Brandt, 2005; Dollár et al., 2010). In the first stage the algorithm selects a model which provides a rough approximation (for example, determining the position of human eyes in visual object detection); it then adjusts the models and output conditions in subsequent stages.

In recent years, cascade models have also been applied to relevance calculation and ranking, achieving good performance in both accuracy and efficiency. Different types of cascade ranking models have been proposed. For example, in the AdaBoost-style framework (Wang, Lin & Metzler, 2010, 2011), every ranked document from the previous stage is subjected to pruning functions, after which the rank order is refined; the process concludes when every document remaining in the set exceeds its feature pruning threshold. Features are pruned to balance effectiveness and efficiency. Other cascade models divide features into k subsets, using cheap features in earlier stages and saving the processing of more expensive features for later stages, thereby achieving a trade-off between effectiveness and efficiency (Chen, Gallagher, Blanco & Culpepper, 2017; Liu, Xiao, Ou & Si, 2017). Methods based on neural networks transform an existing feature space into a new one. The boosting method (Freund & Schapire, 1995; Friedman, 2001) divides the estimation of the true value into several sub-tasks, each of which learns a new position distribution for all entities based on the error passed from the previous sub-task. A cascade model, in contrast, is a phased, continuous optimization process: at each stage, the model's performance can be observed and controlled, and its parameters adjusted. This offers an advantage over neural and boosting methods, which adopt a global optimization strategy in a single space; cascade models have proved superior to them in face recognition tasks (Dollár et al., 2010). In this research, we also compare our model to neural and boosting methods, and further evaluate the performance of cascade models in knowledge embedding.

3. Proposed method

3.1. Overview

The aim of the work presented here is to predict one component of a triplet, given the other two. For example, given a query (s, r), what is the most likely object entity o? We propose a cascade embedding framework for knowledge retrieval (Fig. 3). We extract semantic features and graph features from knowledge embedding and graph embedding, then feed them into a cascade learning model. The knowledge embedding model and graph embedding model are introduced in Section 2. The key problem addressed in this section is defining the cascade learning model.
Fig. 3. Overview of the proposed method. Given a subject entity and a relation r, the target is to return the prospective object entities. The same applies when given an object entity and a relation r (prospective subject entities are returned).
3.2. Problem definition

We define a knowledge graph G = ⟨V, E⟩, where V is the set of entities (s, o) and E is the set of relations between entities. The graph representations of s and o are g(s) and g(o), derived from Node2vec. The knowledge representations of s and o are t(s) and t(o), and the relation representation is t(r); these are derived from a translation-based embedding model. The graph embedding distance between s and o is h(g(s), g(o)) = g(s) · g(o). The knowledge embedding distance between s and o is f(t(s), t(o)) = t(s) + t(r) − t(o). We associate s and o with a label y_{s,o,r} ∈ {0, 1}, where y_{s,o,r} = 1 indicates that the two entities have relation r in E. Given this, the problem addressed in this paper is as follows:

Problem 1. Given a knowledge graph G, the goal is to learn a predictive function P(y_{s,o,r} | G) = ℱ(f(t(s), t(o)), h(g(s), g(o))), where ℱ is the cascade model with T stages: {ℱ₁, ℱ₂, …, ℱ_T}.

3.3. Cascade model description

Assuming the feature set of x_{s,o,r} is x_{s,o,r} = {f(t(s), t(o)), h(g(s), g(o))}, we define F(s, o) = f(t(s), t(o)) and H(s, o) = h(g(s), g(o)). The framework of the proposed cascade model is shown in Fig. 4. The input feature set x_{s,o,r} is a combination of the knowledge embedding features f(t(s), t(o)) on relation r and the graph embedding features h(g(s), g(o)). As with TransE, negative samples are generated by randomly replacing the subject or object entity. For cascade stage 1, a logistic sigmoid function is used for single-stage parameter estimation:
$$P(y_{s,o,r} \mid F(s,o), H(s,o)) = \sigma\big(\theta_{F,1} \cdot F(s,o) + \theta_{H,1} \cdot H(s,o)\big) \tag{1}$$
where $P(y_{s,o,r} \mid F(s,o), H(s,o))$ is the probability that entities s and o have relation r at stage 1, $\theta_{F,1}$ and $\theta_{H,1}$ are the estimated parameters of the knowledge embedding feature F and the graph embedding feature H at stage 1, and $\sigma$ is the standard sigmoid function. The log-likelihood function is:

$$L(\theta_{F,1}, \theta_{H,1}) = \sum_{x_{s,o,r}} \Big[ y_{s,o,r} \log P(y_{s,o,r}=1 \mid F(s,o), H(s,o)) + (1 - y_{s,o,r}) \log\big(1 - P(y_{s,o,r}=1 \mid F(s,o), H(s,o))\big) \Big] + \alpha \lVert \theta_{F,1}, \theta_{H,1} \rVert_2 \tag{2}$$
where $\alpha \lVert \theta_{F,1}, \theta_{H,1} \rVert_2$ is an L2-norm regularization term to address multicollinearity and overfitting. We use stochastic gradient descent to estimate $\theta_{F,1}$ and $\theta_{H,1}$. This allows us to obtain an updated embedding feature set $\Phi_1(x_{s,o,r})$ at stage 2:

$$\Phi_1(x_{s,o,r}) = \mathrm{stack}\big\{\theta_{F,1} \mathbin{.\!*} F(s,o), \; \theta_{H,1} \mathbin{.\!*} H(s,o)\big\} \tag{3}$$
where $.\!*$ denotes element-wise multiplication of two vectors, and the function stack concatenates the vectors $\theta_{F,1} .\!* F(s,o)$ and $\theta_{H,1} .\!* H(s,o)$. We define $\Phi_{F,1}(x_{s,o,r}) = \theta_{F,1} .\!* F(s,o)$ and $\Phi_{H,1}(x_{s,o,r}) = \theta_{H,1} .\!* H(s,o)$ for the following stages, such as the j-th cascade stage, whose logistic sigmoid function is defined as:

$$P(y_{s,o,r} \mid \theta_{F,j}, \theta_{H,j}) = \sigma\Big(\theta_{F,j} \cdot \Phi_{F,j-1}\big(\Phi_{F,j-2}(\dots \Phi_{F,1}(x_{s,o,r}))\big) + \theta_{H,j} \cdot \Phi_{H,j-1}\big(\Phi_{H,j-2}(\dots \Phi_{H,1}(x_{s,o,r}))\big)\Big) \tag{4}$$
For the j-th stage, the modified objective function is as follows:

$$\{\hat{\theta}_{F,j}, \hat{\theta}_{H,j}\} = \arg\max \; L(\theta_{F,j}, \theta_{H,j} \mid \theta_{F,j-1}, \theta_{H,j-1}) \tag{5}$$
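Before moving to the global stage, here is a minimal NumPy sketch (ours, not the authors' code) of one Local Optimization Stage, combining formulas (1)-(3): logistic estimation of θ_F and θ_H over the features F and H, followed by the element-wise reweighting that is passed on to the next stage:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def los_stage(F, H, y, lr=0.01, alpha=0.1, epochs=100):
    """One Local Optimization Stage. F: (n, m) knowledge features,
    H: (n, d) graph features, y: (n,) 0/1 labels. Returns the learned
    parameters and the stacked, reweighted features of formula (3)."""
    theta_F = np.random.normal(size=F.shape[1])
    theta_H = np.random.normal(size=H.shape[1])
    for _ in range(epochs):
        p = sigmoid(F @ theta_F + H @ theta_H)   # formula (1)
        err = y - p                               # log-likelihood gradient
        theta_F += lr * (F.T @ err - alpha * theta_F)  # L2-regularized step
        theta_H += lr * (H.T @ err - alpha * theta_H)
    # Formula (3): element-wise reweighting, concatenated for the next stage.
    return theta_F, theta_H, np.hstack([F * theta_F, H * theta_H])
```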
Fig. 4. Framework of the cascade embedding model. The general framework has three stages: stages 1 and 2 are Local Optimization Stages (LOS), which update only the parameters of the current stage; stage 3 is the Global Optimization Stage (GOS), which optimizes the parameters of all stages.
All the above stages focus mainly on updating the parameters of their own stage, so they are called Local Optimization Stages (LOS). At the final stage, a Global Optimization Stage (GOS) is introduced to further optimize the parameters and feature representations of every stage, using a stochastic gradient descent algorithm based on backward propagation, applied stage by stage. The objective function is given in formula (6):
$$L(\theta_{F,1}, \theta_{H,1}, \theta_{F,2}, \theta_{H,2}, \dots, \theta_{F,j}, \theta_{H,j}) = \sum_{x_{s,o,r}} \Big[ y_{s,o,r} \log P\big(y_{s,o,r}=1 \mid \Phi_j(\Phi_{j-1}(\dots \Phi_1(x_{s,o,r})))\big) + (1 - y_{s,o,r}) \log\Big(1 - P\big(y_{s,o,r}=1 \mid \Phi_j(\Phi_{j-1}(\dots \Phi_1(x_{s,o,r})))\big)\Big) \Big] \tag{6}$$
where $\Phi_j(\Phi_{j-1}(\dots \Phi_1(x_{s,o,r}))) = \mathrm{stack}\{\Phi_{F,j-1}(\dots \Phi_{F,1}(x_{s,o,r})), \Phi_{H,j-1}(\dots \Phi_{H,1}(x_{s,o,r}))\}$. After the whole training process, the learned parameters make it possible to predict the probability of a relation between entities s and o; the output indicates the level of confidence in predictions for knowledge retrieval and knowledge ranking. A detailed description of the training process is given in Algorithm 1.

As seen in Algorithm 1, we generate negative samples for each positive sample. For a positive triplet (s, r, o) of relation r, negative samples (s′, r, o′) are generated by randomly replacing the subject and object entities with candidate entities. This provides controlled negative samples for each positive sample, helping the model to learn the differences between positive and negative instances. For each positive sample (s, r, o), a small training set ε(s, r, o) = {(s, r, o) ∈ P_r} ∪ {(s′, r, o′) ∉ P_r} is generated, where P_r is the set of positive instances of relation r. It should be pointed out that more than one negative sample can be generated from a positive training triplet (s, r, o), so ε(s, r, o) includes one positive sample and n negative samples. In addition, the generated negative samples are refreshed after a certain number of iterations, since they become easy to identify; each updated batch of negative samples keeps the model learning. The assignment of negative samples is presented in the Experiment section.

4. Experiment

In this section, we evaluate the performance of the proposed models on two tasks: (1) predicting the missing subject or object entity in a given triplet (entity prediction); (2) predicting the relation between a given subject entity and object entity (relation prediction).
4.1. Data description

Three data sets were used: WN18, FB15k and BioChem. WN18 is a subset of WordNet (Miller, 1995), a lexical database of English. FB15k is a subset of Freebase (Bollacker et al., 2008), a knowledge graph containing around 1.9 billion triplets based on general facts. Both WN18 and FB15k were released with TransE (Bordes et al., 2013). We also tested the proposed method on the biochemical dataset BioChem, a knowledge base derived from Chem2Bio2Rdf (Chen et al., 2010). FB15k is a dataset with dense relations, while WN18 and BioChem contain more entities and fewer links. Basic information on the three data sets is presented in Table 2. Each data set comprises a training set, a test set and a validation set; the validation set is for model parameter selection and the test set is for prediction performance tests.

4.2. Experimental setup

State-of-the-art models were used as baselines, including mainstream knowledge embedding models, shallow neural networks, and Gradient Boosted Decision Trees (GBDT). To strengthen the evaluation of the proposed cascade model, we also designed single/two-stage models and a stepwise model combined with different knowledge embedding models.

Translation-based embedding models: We selected the most representative translation-based algorithms as baselines: TransE, TransH, TransR, TransD, TranSparse and PTransE (see Table 1).

Other models: These included a shallow neural network and GBDT. A two-layer fully connected neural network was constructed as one of the baselines (see Fig. 5(a)). Both knowledge embedding features and graph features were taken as input and passed through the hidden layer. In the experiment, the number of nodes in the hidden layer was set at 128, the sigmoid was used as the activation function, and settings were optimized on a partial test.

Single/two/three-stage models: Cascade models with different structures were built. All features were used only once in the single-stage model (see Fig. 5(b)). In the two/three-stage models, all features were used from the start and passed through the objective functions repeatedly (see Fig. 5(c)). LOS and GOS are applied in the multi-stage cascade models.

Stepwise cascade model (the method proposed in this paper): Cascade learning can eliminate irrelevant features in early stages and process the most relevant features in later stages. As there is variation in the distribution of the graphs, the graph embedding features constructed in this paper are rough. Consequently, a stepwise cascade model was constructed, with graph features in the first stage and knowledge embedding features added in the second stage (see Fig. 5(d)). In our experiment, two different knowledge embedding models (TransE and TranSparse) were used to generate knowledge embedding features, to show the necessity of graph features (denoted stepwise-TransX). LOS and GOS are also incorporated into the stepwise model to obtain better performance.

Implementation: Negative samples were generated by randomly replacing the head or tail entity of triplets in the training or test set with other entities, creating new triplets which were not in the training, test, or validation sets. Since the number of training samples affects the accuracy and reliability of the classifiers, we set the minimum number of training samples at 100; relations with fewer training samples were ranked only by the L1 distance of the knowledge embedding. Parameter setting was carried out in two steps.
First, a search was made for the optimal parameters of the knowledge embedding model and the graph embedding model within a certain range; next, the optimal parameters of the cascade framework were searched according to mean rank. For the first parameter-setting stage, we fixed the parameters of the cascade model at their default values and ran TranSparse and Node2vec on the training set, using the code provided on GitHub¹,² to search for the optimal parameters of both. For all three experimental datasets (FB15k, WN18 and BioChem), the parameter assignment in TranSparse was {γ = 1.5, α = 0.001, k = 100, epochs = 1000}, where γ is the margin between positive and negative triplets, α is the learning rate, and k is the dimension of the embedding vectors; L1 was taken as the dissimilarity measure for all data sets. In Node2vec, {p = 1, q = 1, k = 128, walk length = 80, context size = 10, walks per node = 6}, where p and q are the parameters controlling the random walk and k is the dimension of the embedding vectors. Training time was limited to a maximum of 1000 epochs for the translation-based embedding models and 10 epochs for Node2vec over the training set. For the second parameter-setting stage, adaptive stochastic gradient descent was used for the global and local optimization stages of the proposed cascade model. In addition, we identified three important parameters that significantly influence the performance of the proposed model: the number of negative samples, the weight decay, and the sample frequency. The optimized parameters were selected according to mean rank, and the parameter sensitivity analysis of the cascade models on WN18 is shown in Fig. 6.

In Fig. 6(a), when the number of negative samples is set at 1, none of the cascade models can learn the differences between positive and negative instances (mean rank is above 1000). Cascade-stepwise-los does not include the global optimization stage, so its mean rank is significantly worse than the other three cascade models. The performance of the cascade two-stage and three-stage models is similar, suggesting that adding more stages does not significantly improve the proposed cascade model. Cascade-stepwise-gos-los significantly outperforms the other three cascade models; it achieves the best mean rank (149) when the number of negative samples is set to 20, and a mean rank of 159 when it is set to 10. In Fig. 6(b), weight decay is used to prevent over-fitting during the training process. For all weight decay assignments, the proposed cascade-stepwise-gos-los model significantly outperforms the other cascade models; the best mean rank (149) is achieved when the value is set at 0.1. As weight decay increases, mean rank worsens, indicating that a larger value of weight decay prevents the model from learning the training dataset properly.
¹ http://www.nlpr.ia.ac.cn/cip/∼liukang/liukangPageFile/code/TransSparse.rar
² https://github.com/aditya-grover/node2vec
Table 2. Statistics of the data sets. Three representative knowledge graphs, WN18, FB15k and BioChem, are selected as experimental datasets.

| | WN18 | FB15k | BioChem |
|---|---|---|---|
| Number of entities | 40,943 | 14,951 | 295,911 |
| Number of relations | 18 | 1345 | 12 |
| Train | 141,442 | 483,142 | 709,865 |
| Test | 5000 | 59,071 | 10,000 |
| Valid | 5000 | 50,000 | – |
Fig. 5. (a) to (c) illustrate the procedures used to generate baselines; (d) summarizes the proposed model. F stands for knowledge embedding features and H for graph embedding features. All links in the figure indicate projection with parameters θj learned by the objective function of the j-th stage, which is a binary classifier. The output used for ranking is the probability that s and o are linked: P = σ(θ·x), where σ is the standard sigmoid function, x is the feature set of the final stage, and θ is the set of parameters learned in the final stage.
In Fig. 6(c), sample_freq denotes the frequency of updating the training data: for example, 1 means changing the training samples at every iteration. For all cascade models, the best performance was obtained when sample_freq was smaller than 3; performance decreases as sample_freq increases. For the proposed cascade-stepwise-gos-los model, the best performance was achieved with sample_freq set to 1.

4.3. Entity prediction

Given a specific relation r and a subject entity s or object entity o, the challenge is to predict the missing entity "?" in the triplet (?, r, o) or (s, r, ?). Two metrics are reported, following the evaluation protocol of TransE (Bordes et al., 2013): the average rank of the correct entity (mean rank) and the proportion of ranks not larger than 10 (hits@10). For each triplet in the test set, the subject or
Fig. 6. Sensitivity analysis of parameters in the proposed cascade model (WN18). TranSparse provides the knowledge embedding and Node2vec the graph embedding; four representative cascade models are compared. Fig. 6(a) shows the correlation between log(mean rank) and the number of negative samples; Fig. 6(b) shows the correlation between mean rank and weight decay; Fig. 6(c) shows the correlation between mean rank and the frequency of changing training samples.
object entity is replaced by each entity in the set to form candidate triplets. The dissimilarity scores of all candidate triplets, together with the score of the correct triplet (i.e., the triplet in the test set), are computed by the models and sorted in ascending order; the rank of the correct triplet is then recorded. Some candidate triplets may already exist in the training, test or validation set and may be ranked above the test triplet. These should not be counted as errors, because such triplets are genuine; they are therefore removed to avoid skewing the results. This procedure (known as the filtered setting in TransE) provides a clearer evaluation of link prediction performance. All evaluations in this paper adopt the filtered setting, so all candidate triplets already in the knowledge graph (training, validation or test set) are filtered out. A sketch of this protocol follows Table 3.

Table 3 summarizes the performance of all models on the three data sets. The proposed method performs better than TransE on all experimental data sets, on both mean rank and hits@10. Compared to TransE, mean rank fell to half or less, and hits@10 increased by 2.3-2.7%, rising as high as 28% on BioChem. These findings suggest that the graph characteristics learned from Node2vec contribute to the entity prediction task. The proposed method also has advantages over the other baseline methods. PTransE gives the best performance on the FB15k data set, achieving a mean rank of 54 and hits@10 of 83.4%. On WN18 and BioChem, the hits@10 rate of PTransE is quite close to our method, but the proposed method performs better on mean rank. The two-layer neural network and GBDT showed some improvement over TransE, but the proposed method performed better, especially on mean rank. In this experiment, we also constructed several different cascade models. From the results in Table 3 we can see that, although the two-stage model offered little further improvement, the single-stage model performed considerably better than TransE, suggesting that the knowledge embedding features and graph features contribute to the ranking of predictions. The single-stage model offers an improvement, but the stepwise cascade model performs better still. We also tested the stepwise cascade model derived from different knowledge embedding models, such as TranSparse.

Table 3. Entity prediction results (filtered). * indicates results reported in the original paper.
| Group | Model | WN18 Mean rank | WN18 hits@10 | FB15k Mean rank | FB15k hits@10 | BioChem Mean rank | BioChem hits@10 |
|---|---|---|---|---|---|---|---|
| Translation-based embedding models | TransE (Bordes et al., 2013) | 251* | 0.892* | 125* | 0.471* | 10,120 | 0.185 |
| | TransH (Wang et al., 2014) | 303* | 0.867* | 87* | 0.644* | – | – |
| | TransR (Lin et al., 2015b) | 225* | 0.920* | 77* | 0.687* | – | – |
| | TransD (Ji et al., 2015) | 212* | 0.922* | 91* | 0.773* | – | – |
| | TranSparse (Ji et al., 2016) | 211* | 0.932* | 82* | 0.795* | – | – |
| | PTransE (ADD, 2-step) (Lin et al., 2015a) | 516.2 | 0.945 | 54* | 0.834* | 2998 | 0.442 |
| Other models | Two-layer neural network | 304 | 0.943 | 84 | 0.731 | – | – |
| | GBDT | 302 | 0.935 | 140 | 0.555 | – | – |
| Multi-stage cascade models | Single stage model | 258 | 0.945 | 73 | 0.733 | 1790 | 0.442 |
| | Two stage model-los-gos | 166 | 0.945 | 72 | 0.735 | 1354 | 0.46 |
| | Three stage model-los-gos | 168 | 0.949 | 74 | 0.743 | 1364 | 0.461 |
| Stepwise cascade models | TransE | 449.6 | 0.921 | 138.6 | 0.711 | 10,120 | 0.185 |
| | Stepwise-TransE-los | 218 | 0.948 | 74 | 0.734 | 1733 | 0.442 |
| | Stepwise-TransE-los-gos | 157 | 0.948 | 74 | 0.75 | 1378 | 0.459 |
| | TranSparse | 264 | 0.951 | 126 | 0.748 | 3332 | 0.441 |
| | Stepwise-TranSparse-los | 214 | 0.952 | 73 | 0.758 | 1333 | 0.457 |
| | Stepwise-TranSparse-los-gos | 149 | 0.95 | 68 | 0.751 | 1313 | 0.465 |
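As promised above, here is a minimal sketch (ours) of the filtered evaluation protocol for the object-side query (s, r, ?); the subject side is symmetric, and `score_fn` stands for any trained model's dissimilarity function:

```python
import numpy as np

def filtered_rank(score_fn, s, r, o, entities, known_triplets):
    """Rank the correct object o among all candidate entities, removing
    ('filtering') every other candidate that already forms a genuine
    triplet with (s, r) in the training/validation/test sets."""
    scores = {e: score_fn(s, r, e) for e in entities
              if e == o or (s, r, e) not in known_triplets}
    ranked = sorted(scores, key=scores.get)   # ascending dissimilarity
    return ranked.index(o) + 1                # 1-based rank of the answer

def evaluate(score_fn, test_triplets, entities, known_triplets):
    ranks = [filtered_rank(score_fn, s, r, o, entities, known_triplets)
             for s, r, o in test_triplets]
    # Mean rank and hits@10, as reported in Table 3.
    return np.mean(ranks), np.mean([rk <= 10 for rk in ranks])
```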
Table 4. Entity prediction results on triplets without reciprocal triplets in the training set. The purpose of this experiment is to analyze the models' performance on reciprocal relations.
| Model | WN18 Mean rank | WN18 hits@10 | FB15k Mean rank | FB15k hits@10 |
|---|---|---|---|---|
| TransE | 7316 | 0.1441 | 332 | 0.508 |
| Single stage model | 4233 | 0.207 | 206 | 0.522 |
| Stepwise-los | 3583 | 0.23 | 206 | 0.52 |
| Stepwise-gos-los | 3357 | 0.235 | 204 | 0.522 |
The proposed model appears to offer an improvement over these modified translation embedding models, supporting the idea that graph embedding contains useful features.

To analyze the model's performance on reciprocal relations (as described in Section 1), we extracted triplets without reciprocals in the training set and tested them alone (see Table 4). TransE cannot handle this situation well, and the proposed model performed noticeably better, almost certainly due to the introduction of graph features.

The proposed model is also helpful for tackling the challenge of unbalanced relations. The determination of unbalanced relations follows TransE (Bordes et al., 2013). Given the pair (r, o), for each relation r, the average number of subject entities (s) appearing in the data set is calculated; similarly, given the pair (s, r), for each relation r, the average number of object entities (o) is calculated. In each case, the argument is labeled 1 if the average number is below 1.5 and n otherwise (a sketch of this labeling follows Table 5). In BioChem, all the relations are n-to-n or 1-to-n (see Table 5). For example, the relation bind is an extreme example of an unbalanced relation: on average, nearly 110 subject entities point to each object entity, while only about 2 object entities point to each subject entity. As a result, there are substantially fewer links pointing to subject entities. The hits@10 for both subject and object predictions is considerably higher for the proposed method than for TransE (see Table 5), which demonstrates the ability of the proposed model to handle unbalanced relations.

The number of unbalanced relations is one significant problem; sparsity is another, and it is one of the reasons why TransE does not perform well on the BioChem dataset. Sparsity means there are fewer links to each entity on average, producing sparse graphs which provide limited information for TransE to learn from. In BioChem, there are 708,965 triplets (edges) and 295,911 entities (nodes), so there are about 5 links (including in-links and out-links) per node on average; more importantly, nodes with a degree of one are in the majority. Including graph features provides topological information about local and global context, which makes prediction more effective. Thus, the proposed model has a positive impact on knowledge inference: the graph features help the proposed stepwise model to refine the embedding from the translation-based embedding model, even where relations are unbalanced or where reciprocal relations are absent from the training set.

4.4. Relation prediction

WN18 and FB15k were used to evaluate the effectiveness of the proposed model in predicting the relation ? in the triplet (s, ?, o). As with entity prediction, we first removed the relation from the triplets in the test set, then replaced it with all candidate relations in the knowledge graph, and recorded the rankings of the test triplets among the corresponding candidate triplets. Because the number of relation types is much smaller than the number of entities, hits@1 and hits@3 were used in the evaluation protocol instead of hits@10. We compared our model's performance in predicting relations to that of TransE (Table 6). Mean rank was better than TransE on both WN18 and FB15k, and the new model has better hits@ values, especially on WN18.
Table 5. Entity prediction results (hits@10) for each relation in BioChem. The purpose of this experiment is to analyze the models' performance on unbalanced relations.
| Relation | n-to-n (avg. subjects per object – avg. objects per subject) | Predict subject: TransE | Predict subject: Stepwise-los-gos | Predict object: TransE | Predict object: Stepwise-los-gos |
|---|---|---|---|---|---|
| hasChemicalOntology | 5.2–16 | 0.0487 | 0.3695 | 0.1770 | 0.4883 |
| Bind | 109.4–2 | 0.0499 | 0.1878 | 0.3918 | 0.9115 |
| Express | 3.7–10.4 | 0.0883 | 0.2026 | 0.0813 | 0.2011 |
| hasGeneFamily | 1–21.6 | 0.4348 | 0.7862 | 0 | 0.1493 |
| proteinProteinInteraction | 4.4–4.1 | 0.1213 | 0.1842 | 0.1343 | 0.1935 |
| expressIn | 2.5–19.2 | 0.1486 | 0.295 | 0.0171 | 0.0802 |
| hasPathway | 2.8–55.1 | 0.3192 | 0.6658 | 0.0282 | 0.1863 |
| hasPathway | 1.6–4.8 | 0.4444 | 0.4673 | 0.1667 | 0.2778 |
| causeDisease | 1.5–2.1 | 0.1333 | 0.3877 | 0 | 0.12 |
| causeSideEffect | 11.2–8.4 | 0.0461 | 0.0752 | 0.0724 | 0.0805 |
| hasSubstructure | 20.8–4.6 | 0 | 0.1334 | 0.1340 | 0.4183 |
| hasGO | 9.1–6.1 | 0.0657 | 0.1666 | 0.1525 | 0.3145 |
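The 1/n labeling of relations described above can be computed as in the following sketch (ours), using the 1.5 threshold from TransE:

```python
from collections import defaultdict

def relation_categories(triplets, threshold=1.5):
    """Label each relation 1-to-1 / 1-to-n / n-to-1 / n-to-n following the
    TransE convention: average number of subjects per (r, o) pair, and of
    objects per (s, r) pair, cut at 1.5."""
    subjects = defaultdict(set)   # (r, o) -> subject entities
    objects = defaultdict(set)    # (s, r) -> object entities
    for s, r, o in triplets:
        subjects[(r, o)].add(s)
        objects[(s, r)].add(o)

    def avg(groups, keep):
        sizes = [len(v) for k, v in groups.items() if keep(k)]
        return sum(sizes) / len(sizes) if sizes else 0.0

    labels = {}
    for r in {r for _, r, _ in triplets}:
        avg_s = avg(subjects, lambda k: k[0] == r)  # subjects per object
        avg_o = avg(objects, lambda k: k[1] == r)   # objects per subject
        labels[r] = (("n" if avg_s > threshold else "1") + "-to-" +
                     ("n" if avg_o > threshold else "1"))
    return labels
```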
Table 6. Relation prediction results (filtered). Mean rank on WN18 reaches 2.1; the stepwise model obtains a small improvement, and the single-stage cascade model also performs well.
| Model | WN18 Mean rank | WN18 hits@1 | WN18 hits@3 | WN18 hits@10 | FB15k Mean rank | FB15k hits@1 | FB15k hits@3 | FB15k hits@10 |
|---|---|---|---|---|---|---|---|---|
| TransE | 4.0 | 0.241 | 0.733 | 0.987 | 90.4 | 0.59 | 0.707 | 0.768 |
| Single stage cascade | 2.1 | 0.486 | 0.927 | 1 | 67.8 | 0.591 | 0.71 | 0.774 |
| Stepwise-TransE-los | 2.1 | 0.478 | 0.922 | 1 | 68.1 | 0.592 | 0.711 | 0.774 |
| Stepwise-TransE-gos-los | 2.1 | 0.482 | 0.923 | 1 | 67.5 | 0.608 | 0.701 | 0.785 |
| Stepwise-TranSparse-gos-los | 2.1 | 0.481 | 0.925 | 1 | 67.2 | 0.606 | 0.71 | 0.79 |
Fig. 7. Performance of link prediction based on the AUC indicator. For YouTube, social relations are selected as edges; for WN18, the relation member_meronym is used for evaluation; for FB15k, the relation award_winner is used. The two representative baselines are Node2vec (Grover & Leskovec, 2016) and LINE (Tang et al., 2015); in addition, two cascade models are included.
The single-stage cascade model achieves similar improvements. However, it should be noted that relation prediction is simpler than entity prediction: the single-stage model can already distinguish relations and improve predictions. The stepwise model does not improve much on the single-stage model, and performs slightly worse on WN18.

We also tested the proposed model using AUC (Area Under Curve). Two knowledge graphs, WN18 and FB15k, were selected; in addition, to verify the extended applicability of the proposed model, the public YouTube dataset (Wang et al., 2016) was also used. The task was to hide the relations r of the test set and use the trained model to predict the missing relations. Experimental results are shown in Fig. 7. In Fig. 7(b) and (c), Stepwise-gos-los significantly outperforms the other three baselines on both WN18 and FB15k, with Stepwise-los ranked second. This experiment further verifies the effectiveness of the global optimization stage (GOS). Compared with Node2vec, the average improvement is 3%, contributed mainly by the stepwise framework and the knowledge embedding. In Fig. 7(a), because the YouTube dataset has only one relation and lacks semantic information, the influence of knowledge embedding is very small: the stepwise framework contributes a 0.16% improvement, while the GOS contributes 0.35%.

4.5. Case study

Table 7 provides examples of entity predictions made by TransE and the proposed model. Given an entity and a relation, the top 5 predictions are presented, with correct predictions shown in bold. All the predictions make sense at a conceptual level: for example, for the relation education institution, the subject entity is a person, and the predicted object entities are all educational institutions. However, the rankings suggested by the proposed model improve on those from TransE. For example, given Leslie Cheung and nationality, the correct answer (People's Republic of China) is ranked second by the proposed model but is not in TransE's top 5. The proposed model also deals better with unbalanced relations than TransE: the relations films in this genre and people with this profession tend to be 1-to-n relationships, making it harder to predict the n-side entities, but the proposed model managed to include correct answers in its top 5.

5. Conclusion

In this paper, a cascade learning framework is proposed for relation-based knowledge inference and retrieval. The model combines knowledge embedding features and graph features, derived from the embedding vectors of a translation-based embedding model and Node2vec respectively.
Table 7. Examples of predictions on FB15k (given a subject entity and a relation, the top 5 predicted object entities). Correct answers from the test set are shown in bold.
| Input (subject entity and relation) | Stepwise cascade-los | Stepwise cascade-gos-los | TransE |
|---|---|---|---|
| Ted Kennedy – education institution | University of Virginia School of Law; Boston Latin School; University of Chicago; Harvard Graduate School of Design; Tufts University | University of Virginia School of Law; Boston Latin School; 90th United States Congress; University of Chicago; Tufts University | University of Chicago; Tufts University; Massachusetts Institute of Technology; Vanderbilt University; Brandeis University |
| Legal drama – films in this genre | A Time to Kill; A Few Good Men; Fatal Attraction; One Flew Over the Cuckoo's Nest; The Taking of Pelham 123 | A Time to Kill (film); Gia (film); A Few Good Men; Sleepers (movie); Fatal Attraction; Ed Wood (film) | A Time to Kill; One Flew Over the Cuckoo's Nest; Secretariat; Fatal Attraction; The Taking of Pelham 123 |
| Winnie the Pooh (film) – film_release_region | Singapore; Hong Kong; Iceland; Switzerland; South Korea | Hong Kong; Republic of France; Iceland; Austria; Switzerland | Hong Kong; Island of Ireland; United Arab Emirates; Republic of France; Switzerland |
| Leslie Cheung – nationality | United Kingdom; **People's Republic of China**; Hong Kong; England; Northern Ireland | United Kingdom; **People's Republic of China**; Hong Kong; United States; England | United Kingdom; United States of America; England; Northern Ireland; New Orleans |
| Oscar for Best Actor – award_honor_ceremony | Academy Award; 28th Golden Globe Awards; 37th Golden Globe Awards; 25th Academy Awards nominees and winners; 56th Golden Globe Awards nominees | Academy Award; 25th Academy Awards nominees and winners; 55th Golden Globe Awards; 28th Golden Globe Awards; Coming Home (film) | 56th Golden Globe Awards nominees; 62nd Golden Globe Awards nominees; 28th Golden Globe Awards; 14th Screen Actors Guild Awards; Academy Award |
The model was tested on two general knowledge bases (WN18 and FB15k) and a biochemical data set (BioChem). The proposed model proved better at predicting both entities and relations than the well-established baseline models used as comparators. This was true even for reciprocal relations and unbalanced relations. All of the above indicates that the representative power of the knowledge embedding model was enhanced by integrating graph features, which contain more global information. The proposed method is applicable to knowledge inference, knowledge retrieval systems, question answering systems, etc. It has the potential to make such systems more effective, and it could also contribute to hypothesis generation systems, since it considerably improved predictions for the biochemical dataset. In addition, the application of the proposed method is not limited to knowledge graphs: it can also work on heterogeneous networks (since knowledge graphs can be seen as a type of heterogeneous network). There are therefore many potential applications for the proposed method, including, for example, coauthor recommendation within an academic heterogeneous network (Dong, Chawla & Swami, 2017) and social network link prediction (Chen et al., 2017). The proposed model also has potential value in other heterogeneous networks, such as IoT or vehicle networks. For example, given a vehicle and cloud service network (Ridhawi, Aloqaily, Kantarci, Jararweh & Mouftah, 2018), the proposed model could learn the semantic and context features of a vehicle based on its status (such as its location and running state) and provide an optimal ranking list of candidate cloud services to improve users' experiences. Using cloud and edge computing to accelerate service speed in densely crowded environments (Aloqaily, Ridhawi, Salameh & Jararweh, 2019) is a challenging research topic in related domains. In a mobile network, communication between a mobile device and nearby devices can help cloud services distribute sub-tasks to different terminals, improving the speed of information processing. Our research may contribute to the optimization of such distribution algorithms.
Algorithm 1. Description of the proposed cascade model. LOS: Local Optimization Stage; GOS: Global Optimization Stage.

INPUT:
1. Knowledge graph: triplets {s, o, r}.
2. Knowledge embedding features: F(s, o), m-dimensional vectors.
3. Graph embedding features: H(s, o), n-dimensional vectors.
4. Iterations: N.

INITIALIZE:
1. Initialize all the parameters of each stage:
   a. Draw all the parameters of each stage from a normal distribution;
   b. Assign the values of Negative_sample (n), Sample_freq (f), Weight_decay (w) and Batch_size (b).
2. For each positive instance {s, o, r} of relation r, generate n negative samples to obtain the training data.

FOR EACH ITERATION:
   If (Iteration mod f == 0):  // use Sample_freq (f) to decide whether to generate new training data
      For each positive instance {s, o, r} of relation r, generate n negative samples to obtain new training data.
   End if
   Stage 1 (LOS): use formulas (1) and (2) to learn the parameters θF,1 and θH,1.
   Stage 2 (LOS): use formulas (3), (4) and (5) to learn the parameters θF,2 and θH,2.
   Stage 3 (GOS): use formula (6) to optimize θF,1, θH,1, θF,2 and θH,2 across all stages.

OUTPUT: θF,1, θH,1, θF,2 and θH,2.
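The following PyTorch sketch (ours, simplified to two LOS stages and dense feature tensors; the negative sampling and batching of Algorithm 1 are omitted) shows how the LOS and GOS passes can be realized with standard SGD and weight decay:

```python
import torch

def train_cascade(F, H, y, iters=1000, lr=0.01, weight_decay=0.1):
    """Minimal sketch of Algorithm 1: two Local Optimization Stages (LOS)
    followed by a Global Optimization Stage (GOS). F, H, y are float
    tensors of knowledge features, graph features and 0/1 labels."""
    tF1 = torch.randn(F.shape[1], requires_grad=True)   # stage-1 parameters
    tH1 = torch.randn(H.shape[1], requires_grad=True)
    tF2 = torch.randn(F.shape[1], requires_grad=True)   # stage-2 parameters
    tH2 = torch.randn(H.shape[1], requires_grad=True)
    bce = torch.nn.BCELoss()

    def run(params, prob_fn):
        opt = torch.optim.SGD(params, lr=lr, weight_decay=weight_decay)
        for _ in range(iters):
            opt.zero_grad()
            bce(prob_fn(), y).backward()
            opt.step()

    p1 = lambda: torch.sigmoid(F @ tF1 + H @ tH1)        # formula (1)
    # Stage 2 works on the stage-1 reweighted features of formula (3).
    p2 = lambda: torch.sigmoid((F * tF1.detach()) @ tF2 +
                               (H * tH1.detach()) @ tH2)
    p_gos = lambda: torch.sigmoid((F * tF1) @ tF2 + (H * tH1) @ tH2)

    run([tF1, tH1], p1)               # LOS, stage 1
    run([tF2, tH2], p2)               # LOS, stage 2 (stage-1 params frozen)
    run([tF1, tH1, tF2, tH2], p_gos)  # GOS: optimize all stages jointly
    return tF1, tH1, tF2, tH2
```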
The cascade ranking framework is transparent: the input and output of each stage are observable and controllable. In addition, the proposed model is flexible, so that other features, such as degree and common neighbors, can easily be integrated. In future work, a feature selection strategy may be developed that automatically constructs models for the method. Finally, we hope that a unified model combining different embedding features may be created, in order to take advantage of the complementary strengths of knowledge embedding models and graph embedding models. This may significantly improve the quality of knowledge completion and the practical value of knowledge graphs.

Acknowledgments

This research is supported by the Chinese National Natural Science Youth Foundation (grant no. 61702564) and the Talent Scientific Research Foundation of Sun Yat-sen University (grant no. 20000-18831102).

References

Aloqaily, M., Ridhawi, I. A., Salameh, H. B., & Jararweh, Y. (2019). Data and service management in densely crowded environments: Challenges, opportunities, and recent developments. IEEE Communications Magazine, 57(4).
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 1798–1828.
Bollacker, K., Evans, C., Paritosh, P., Sturge, T., & Taylor, J. (2008). Freebase: A collaboratively created graph database for structuring human knowledge. Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (pp. 1247–1250). ACM.
Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., & Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. Advances in Neural Information Processing Systems (pp. 2787–2795).
Bourdev, L., & Brandt, J. (2005). Robust object detection via soft cascade. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005) (pp. 236–243). IEEE.
Catherine, R., & Cohen, W. (2016). Personalized recommendations using knowledge graphs: A probabilistic logic programming approach. ACM Conference on Recommender Systems (pp. 325–332).
Chang, L., Zhu, M., Gu, T., Bin, C., Qian, J., & Zhang, J. (2017). Knowledge graph embedding by dynamic translation. IEEE Access, 5, 20898–20907. https://doi.org/10.1109/ACCESS.2017.2759139.
Chen, B., Dong, X., Jiao, D., Wang, H., Zhu, Q., Ding, Y., et al. (2010). Chem2Bio2RDF: A semantic framework for linking and data mining chemogenomic and systems chemical biology data. BMC Bioinformatics, 11, 255. https://doi.org/10.1186/1471-2105-11-255.
Chen, R. C., Gallagher, L., Blanco, R., & Culpepper, J. S. (2017). Efficient cost-aware cascade ranking in multi-stage retrieval. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 445–454).
Dollár, P., Welinder, P., & Perona, P. (2010). Cascaded pose regression. Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1078–1085). IEEE.
Dong, X., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., Murphy, K., Strohmann, T., Sun, S., & Zhang, W. (2014). Knowledge Vault: A web-scale approach to probabilistic knowledge fusion. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 601–610).
Dong, Y., Chawla, N. V., & Swami, A. (2017). metapath2vec: Scalable representation learning for heterogeneous networks. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 135–144). ACM.
Feng, J., Zhou, M., Hao, Y., Huang, M., & Zhu, X. (2016). Knowledge graph embedding by flexible translation. Proceedings of the 15th International Conference on Principles of Knowledge Representation and Reasoning (KR'16) (pp. 557–560).
Feng, J., Zhou, M., Hao, Y., Huang, M., & Zhu, X. (2016). Knowledge graph embedding by flexible translation. arXiv:1505.05253 [cs].
Freund, Y., & Schapire, R. E. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. European Conference on Computational Learning Theory (pp. 23–37).
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29, 1189–1232.
Gao, H., Shi, J., Qi, G., & Wang, M. (2018). Triple context-based knowledge graph embedding. IEEE Access, 6, 58978–58989.
Grover, A., & Leskovec, J. (2016). node2vec: Scalable feature learning for networks. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 855–864). ACM.
Guu, K., Miller, J., & Liang, P. (2015). Traversing knowledge graphs in vector space. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing.
Hu, S., Zou, L., Yu, J. X., Wang, H., & Zhao, D. (2018). Answering natural language questions by subgraph matching over knowledge graphs. IEEE Transactions on Knowledge and Data Engineering, 30(5).
Ji, G., He, S., Xu, L., Liu, K., & Zhao, J. (2015). Knowledge graph embedding via dynamic mapping matrix. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 687–696).
Ji, G., Liu, K., He, S., & Zhao, J. (2016). Knowledge graph completion with adaptive sparse transfer matrix. Proceedings of the 13th AAAI Conference on Artificial Intelligence (pp. 985–991). Phoenix, Arizona.
Jia, Y., Wang, Y., Jin, X., & Cheng, X. (2018). Path-specific knowledge graph embedding. Knowledge-Based Systems, 151, 37–44.
Krompaß, D., Baier, S., & Tresp, V. (2015). Type-constrained representation learning in knowledge graphs. Proceedings of the 14th International Conference on The Semantic Web, LNCS 9366 (pp. 640–655).
Lin, Y., Liu, Z., Luan, H., Sun, M., Rao, S., & Liu, S. (2015a). Modeling relation paths for representation learning of knowledge bases. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (pp. 705–714). Lisbon, Portugal.
Lin, Y., Liu, Z., Sun, M., Liu, Y., & Zhu, X. (2015b). Learning entity and relation embeddings for knowledge graph completion. Proceedings of the 29th AAAI Conference on Artificial Intelligence (pp. 2181–2187). Austin, Texas.
Liu, S., Xiao, F., Ou, W., & Si, L. (2017). Cascade ranking for operational e-commerce search. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1557–1565). ACM.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. https://arxiv.org/abs/1301.3781.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. International Conference on Neural Information Processing Systems (pp. 3111–3119).
Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38, 39–41.
Moon, C., Harenberg, S., Slankas, J., & Samatova, N. F. (2017). Learning contextual embeddings for knowledge graph completion. PACIS 2017 Proceedings, 248. http://aisel.aisnet.org/pacis2017/248.
Nguyen, D. Q., Sirts, K., Qu, L., & Johnson, M. (2016). STransE: A novel embedding model of entities and relationships in knowledge bases. Proceedings of NAACL-HLT 2016 (pp. 460–466). San Diego, California.
Nie, B., & Sun, S. (2019). Knowledge graph embedding via reasoning over entities, relations and text. Future Generation Computer Systems, 91, 426–433.
Palumbo, E., Rizzo, G., & Troncy, R. (2017). entity2rec: Learning user-item relatedness from knowledge graphs for top-N item recommendation. Eleventh ACM Conference on Recommender Systems (pp. 32–36).
Perozzi, B., Al-Rfou, R., & Skiena, S. (2014). DeepWalk: Online learning of social representations. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 701–710). ACM.
Power, R., Power, R., & Callan, J. (2017). Explicit semantic ranking for academic search via knowledge graph embedding. International Conference on World Wide Web (pp. 1271–1279).
Rahman, M. M., & Takasu, A. (2018). Knowledge graph embedding via entities' type mapping matrix. International Conference on Neural Information Processing (pp. 114–125). Springer.
Ridhawi, I. A., Aloqaily, M., Kantarci, B., Jararweh, Y., & Mouftah, H. (2018). A continuous diversified vehicular cloud service availability framework for smart cities. Computer Networks, 145, 207–218.
Seyler, D., Yahya, M., & Berberich, K. (2017). Knowledge questions from knowledge graphs. Proceedings of the 2017 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR 2017) (pp. 11–18).
Shi, J., Gao, H., Qi, G., & Zhou, Z. (2017). Knowledge graph embedding with triple context. Proceedings of the 2017 ACM Conference on Information and Knowledge Management (pp. 2299–2302).
Suchanek, F. M. (2007). Yago: A core of semantic knowledge unifying WordNet and Wikipedia. International Conference on World Wide Web.
Wang, C., & Cheng, P. (2018). Translating representations of knowledge graphs with neighbors. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR'18).
Wang, L., Lin, J., & Metzler, D. (2010). Learning to efficiently rank. Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 138–145). ACM.
Wang, L., Lin, J., & Metzler, D. (2011). A cascade ranking model for efficient ranked retrieval. Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 105–114). ACM.
Wang, Z., Zhang, J., Feng, J., & Chen, Z. (2014). Knowledge graph embedding by translating on hyperplanes. Proceedings of the 28th AAAI Conference on Artificial Intelligence (pp. 1112–1119). Québec City, Québec, Canada.
Wang, D., Cui, P., & Zhu, W. (2016). Structural deep network embedding. Proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'16). San Francisco, CA, USA.
Xiong, C., Callan, J., & Liu, T.-Y. (2017). Word-entity duet representations for document ranking. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 763–772). ACM.
Zhang, X., Chen, Y., Chen, J., Du, X., Wang, K., & Wen, J. R. (2017). Entity set expansion via knowledge graphs. International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1101–1104).