Proximity-aware heterogeneous information network embedding

Chen Zhang a, Guodong Wang a, Bin Yu a,∗, Yu Xie b, Ke Pan b

a School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi Province 710071, China
b School of Electronic Engineering, Xidian University, Xi'an, Shaanxi Province 710071, China
∗ Corresponding author. E-mail address: [email protected] (B. Yu).
✩ No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2019.105468.

Article history: Received 2 September 2019; Received in revised form 22 November 2019; Accepted 29 December 2019; Available online xxxx.

Keywords: Network embedding; Heterogeneous information network; Random walk

Abstract

Network embedding, which aims to learn a high-quality low-dimensional representation for each node in a network, has attracted increasing attention recently. Heterogeneous information networks, with their distinct types of nodes and relations, are among the most significant kinds of networks, and heterogeneous information network embedding has been intensively studied in the past years. Most popular methods generate a set of node sequences and feed them into an unsupervised feature learning model to obtain a low-dimensional vector for each node. However, these approaches are limited in two ways: the generated node sequences neglect the differing importance of diverse relations, and they ignore the great value of proximity information, which reveals whether two nodes are close in the network. To tackle these limitations, this paper presents a novel framework named Proximity-Aware Heterogeneous Information Network Embedding (PAHINE). The native information of a network is extracted from node sequences, which are generated by walking on a probability-sensitive metagraph. Afterwards, the extracted information is fed into deep neural networks to derive the desired embedding vectors. The experimental results on four different heterogeneous networks indicate that the proposed method is efficient and outperforms state-of-the-art heterogeneous network embedding algorithms.

1. Introduction

Networks are ubiquitous in the real world for exploring and modelling complex systems, such as academic citation networks and biology networks. In particular, heterogeneous information networks have attracted increasing attention during the past decades. Unlike traditional homogeneous networks, where all vertices are of the same type, the nodes and relations of heterogeneous networks fall into multiple types, which carry rich latent semantics. These informative relations can benefit numerous machine learning tasks [1,2] on networks, such as node classification [3,4], node clustering [5], similarity search [6] and link prediction [7]. For example, DBLP, a typical citation network shown in Fig. 1, consists of three kinds of nodes (i.e., authors, papers and venues) and three relations (i.e., an author writes a paper, a paper is published at a venue, and a paper cites another paper). Two papers fall into the same class if they are published at the same venue; similarly, two authors are tagged with the same label if most of their papers belong to the

same venue. As network size increases, it becomes challenging to represent a network densely, and the cost of computation grows extremely large. Thus, there is an urgent need for efficient and effective representation methods for heterogeneous networks.

Network embedding, which aims to learn low-dimensional node representations that preserve the intrinsic properties of a network, has shown superior ability in network analysis and mining. In particular, it addresses the intensive computation associated with large-scale networks, and it also benefits subsequent off-the-shelf machine learning algorithms for various applications. A large number of efforts have been devoted to developing network embedding algorithms. Early works mainly focus on dimensionality reduction on graphs by feature extraction or matrix factorization; however, the high cost of large-matrix computation on large-scale networks remains a major challenge. More recently, inspired by the success of word2vec [8], many studies have proposed efficient network embedding frameworks, such as DeepWalk [9], Node2Vec [10] and LINE [11]. These traditional methods have shown promising performance in numerous machine learning applications, but most of them focus on ordinary homogeneous networks. However, a substantial number of networks in the real world are heterogeneous information networks, whose unique characteristics bring further





challenges that cannot be handled by models designed for conventional homogeneous networks. Fortunately, the concept of the metapath [6] was proposed to exploit the rich semantics in heterogeneous information networks, and many metapath-based algorithms have since been presented for data analysis over heterogeneous information networks. Benefiting from Skip-Gram [12], a prevalent word embedding model, these metapath-based algorithms normally generate a large number of node sequences by a dedicated strategy, such as random walks over the topological graph of the raw network, and then learn a low-dimensional representation for each node by feeding the obtained sequences into an unsupervised feature learning model such as Skip-Gram. MetaPath2Vec [13] and MetaGraph2Vec [14] are two representative frameworks that have shown excellent performance.

Fig. 1. An example of heterogeneous information networks: a bibliographic network with three types of nodes (authors, papers and venues) and two types of direct relations (an author writes a paper; a paper is published in a venue).

Although the existing methods have achieved promising results, they ignore two significant aspects: the first is the different impacts of different types of nodes and relations, and the second is the awareness of mutual proximity among nodes that are close in the original network. Regarding the former, although the distinguishing features of heterogeneous information networks are their multiple types of nodes and relations, existing methods assume all node types have equal impact when generating node sequences. In fact, both the number of nodes of each type and the influence of the semantics of each relation play vital roles in heterogeneous information network embedding. Regarding the latter, Skip-Gram treats node sequences as word sequences: the latent vector of a node is created from a window consisting of its context nodes. This kind of word embedding model captures rich semantics of heterogeneous information networks, but ignores the awareness of proximity among nodes in the latent space.

Therefore, we propose PAHINE, an unsupervised heterogeneous information network embedding framework that addresses the two aspects above: it not only assigns a separate transition probability to each node type while generating node sequences, but also employs a mutual proximity-aware mechanism in the encoding layer to enhance the relations among relevant nodes. To summarize, the major highlights of our proposed approach are as follows:

• Based on random walks and a metagraph, a robust probability-sensitive strategy is proposed to capture the rich information of the original network. In particular, the metagraph is extracted from the network, and random walks are employed to generate node sequences.
• We present a heterogeneous information network embedding framework named PAHINE. Specifically, we combine a mutual proximity-aware mechanism with a strategic deep autoencoder in the encoding layer, so as to better retain proximity among nodes in the latent representation space.
• The superior performance of the proposed framework is demonstrated on three machine learning tasks (i.e., node classification, node clustering and similarity search) over four heterogeneous information networks.

The remainder of this paper is organized as follows. Section 2 introduces the related works. The detailed description of the investigated model is given in Section 3. In Section 4, the performance of the proposed algorithm is validated on four heterogeneous networks of three different types by comparing our model with four baseline approaches. Finally, we conclude this paper in Section 5.

2. Related works

As mentioned above, previous network embedding methods can be divided into two categories according to the kind of networks they target: homogeneous network embedding and heterogeneous network embedding. In particular, Goyal et al. [15] provided a comprehensive survey of existing network embedding algorithms.

2.1. Homogeneous network embedding

Related works in homogeneous network embedding can be traced back to graph-based dimensionality reduction algorithms, such as Locally Linear Embedding (LLE) [16] and Laplacian Eigenmaps (LE) [17]. Both of them represent a network as an adjacency matrix and compute the leading eigenvectors to preserve the local geometric structure. However, these methods are not applicable to large-scale network embedding, since they involve a time-consuming eigendecomposition whose complexity is O(n^3). With the development of data analysis and neural networks, plenty of advances in network embedding have been made. For example, inspired by the observation that node sequences generated by random walks resemble word sequences in natural language, DeepWalk [9] performs truncated random walks to produce a series of sequences and feeds them to Skip-Gram [12] to learn node representations. LINE [11] optimizes an objective function that captures the first-order and second-order proximities based on the neighbours of each node. GraRep [18] extends the proximities to k-th order and captures global structural information in graph representations, while M-NMF [19] incorporates community information into network embedding. Node2Vec [10], which extends DeepWalk, designs a biased random walk to explore flexible neighbourhoods of nodes. In addition, node2vec has also been used to study distributed representation learning for implicit feedback recommendation [20]. Based on deep extreme learning machines, Uzair et al. [21] proposed an efficient representation learning method for image set classification. From another point of view, Li et al. [22] proposed a semi-supervised learning schema which exploits labelled data. Furthermore, Wang et al. [23] developed a deep architecture using the same semi-supervised scheme to optimize the first-order and second-order proximities simultaneously. Huang et al. [24] provided a model named AMVAE that fuses links and multi-modal content for network embedding.



2.2. Heterogeneous network embedding

Due to their ubiquity in the real world and their wealth of semantic information, heterogeneous information networks have been studied broadly, and a group of network embedding algorithms specialized for them has emerged in recent years [25–30,13,31,14]. For instance, HINE [30] is a framework consisting of two embedding mechanisms: it captures both local and global semantic information to generate two temporary embedding vectors separately and encodes them into the final embedding vectors, while also preserving user-guided semantics. To handle the problem of comprehensively transcribing heterogeneous information networks, Yu et al. [32] investigated an unsupervised model, HEER, which utilizes edge embeddings built from node embeddings together with heterogeneous metrics to preserve various semantics even in the presence of incompatibility. Furthermore, the proposal of the metapath [6] brought a new perspective, using restricted random walks to explore the rich semantics of heterogeneous information networks for network embedding. A number of subsequent studies explore the similarity or relevance of nodes based on metapaths, while others focus on deriving more effective embedding vectors for machine learning tasks. Huang et al. [27] designed two models to preserve metapath-based proximities in heterogeneous information networks by minimizing the distance between close nodes: one model encodes the metapath-based proximities among nodes, while the other encodes the proximities into a low-dimensional space. Yu et al. [33] identified a crucial characteristic of path-based relevance in heterogeneous information networks, named cross-meta-path synergy, and proposed a data-driven relevance measure that integrates three characteristics (i.e., node visibility, path selectivity, and cross-meta-path synergy) in a unified framework. Metapaths can also be applied to specific tasks. Ting et al. [34] proposed a heterogeneous network embedding framework augmented by metapaths to fulfil the task of author identification. Zheng et al. [35] not only employed metapaths to capture the rich semantics among entities for entity set expansion, but also designed an approach to automatically discover the metapaths between entities in a knowledge graph. Moreover, by specifying one set of multinomial distributions for each type of neighbourhood in the output layer of the Skip-Gram model, Dong et al. [13] presented the metapath2vec and metapath2vec++ frameworks to encode the structures and semantics of a heterogeneous network. HIN2Vec, which contains two modules to learn vectors of nodes and metapaths respectively, is studied in [31]: given a heterogeneous information network and a set of relations in the form of metapaths, latent vectors of both nodes and relations are obtained by predicting the relations between nodes. Compared with previous works, this method captures more contextual information. However, owing to difficulties in information collection, real-life heterogeneous information networks are often incomplete: they may miss pivotal nodes and links, which can prevent metapath-based algorithms from capturing deep semantics between distant nodes. To cope with this challenge, a robust approach named MetaGraph2Vec was proposed in [14]. Its critical step is guiding the generation of random walks with a metagraph, which is also the key to handling complex semantics successfully. For more complex networks including weights and directions, Keikha et al. [36] proposed an algorithm named CARE. Despite exploring diverse semantics, these existing methods neither consider the different significance of different semantics for embedding nor exploit the mutual


awareness of proximity among nodes. Therefore, our work not only assigns different transition probabilities to different semantics when walking on the metagraph, but also pays attention to proximity-aware deep network embedding.

3. Methodology

In this section, we introduce the related concepts and notations of heterogeneous information networks and the problem formulation in Section 3.1. We then give a brief overview of our proposed framework in Section 3.2. The detailed descriptions are presented in the following three sections, and we discuss the computational complexity in Section 3.6.

3.1. Notations and problem formulation

Definition 3.1 (Heterogeneous Information Network). A heterogeneous information network (HIN) [13] is defined as a graph G = (V, E, T), where V is the set of nodes, E ⊆ V × V is the set of edges among nodes, and T consists of the node-type set T_V and the relation-type set T_E, with |T_V| + |T_E| > 2. Each node and each edge is associated with a type mapping function, φ(v): V → T_V and ψ(e): E → T_E, respectively.

For example, Fig. 1 shows a heterogeneous information network on bibliographic data. A bibliographic information network, such as the network of computer scientists derived from DBLP, is a typical heterogeneous information network consisting of three object types, i.e., author (A), paper (P) and venue (V), and three relation types, i.e., author-write-paper, venue-publish-paper and paper-cite-paper.

Definition 3.2 (Heterogeneous Information Network Embedding). Given G = (V, E, T), heterogeneous information network embedding aims to represent each node v ∈ V as a d-dimensional vector x ∈ R^d via an embedding framework. Here d ≪ |V| is the essential objective, and the obtained embedding vectors should preserve the feature information of G.

In order to better capture the node and edge types of a heterogeneous information network, it is useful to extract a relational descriptor, the network schema, which provides an abstract description of the given G and is denoted T_G = (T_V, T_E). T_G comprises all allowable node types and relation types and expresses the meta-information of G. Fig. 2(a) illustrates a bibliographic HIN schema and a movie-related HIN schema. Unlike in homogeneous networks, two nodes n1 and n2 in a heterogeneous information network can be associated via various paths that reveal different semantics. These paths can be abstracted into metapaths [6] as follows.

Definition 3.3 (Metapath). A metapath P is an abstract sequence of node types a_1, a_2, ..., a_n connected by link types r_1, r_2, ..., r_{n-1}, which describes how nodes are related to one another, in the following form:

$$\mathcal{P} = a_1 \xrightarrow{r_1} a_2 \xrightarrow{r_2} \cdots \xrightarrow{r_{i-1}} a_i \xrightarrow{r_i} \cdots \xrightarrow{r_{n-1}} a_n$$

Fig. 2(b) shows three instances of metapaths for the citation network DBLP. The metapath ρ1 : A → P → A → P → A indicates that two authors collaborate with the same author, the metapath ρ2 : A → P → V → P → A expresses that two authors publish papers at the same venue, and the metapath ρ3 : A → P → P → P → A describes that papers written by two different authors cite the same paper. In order to capture relations between distant nodes even when the network lacks some key nodes, a robust schematic structure, the metagraph [37], has recently been proposed.
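Before turning to metagraphs, the following minimal Python sketch makes Definitions 3.1 and 3.3 concrete: it shows one possible in-memory representation of a typed graph and a check of whether a concrete node path instantiates a metapath. The class, function and relation names are illustrative assumptions rather than part of the original model.

```python
# A minimal typed-graph representation of a HIN (Definition 3.1):
# phi maps each node to its type, and each edge carries a relation type.
from collections import defaultdict

class HIN:
    def __init__(self):
        self.phi = {}                      # node -> node type (phi: V -> T_V)
        self.adj = defaultdict(list)       # node -> [(neighbour, relation type)]

    def add_node(self, v, node_type):
        self.phi[v] = node_type

    def add_edge(self, u, v, rel):         # edge typing (psi: E -> T_E)
        self.adj[u].append((v, rel))

def matches_metapath(hin, path, metapath):
    """Check whether a concrete node path instantiates a metapath
    a_1 -r_1-> a_2 -r_2-> ... -r_{n-1}-> a_n (Definition 3.3)."""
    types, rels = metapath
    if len(path) != len(types):
        return False
    if any(hin.phi[v] != t for v, t in zip(path, types)):
        return False
    return all((v2, r) in hin.adj[v1]
               for v1, v2, r in zip(path, path[1:], rels))

# Example: does a1 -> p1 -> v1 -> p2 -> a2 instantiate A-P-V-P-A (rho_2)?
g = HIN()
for v, t in [('a1', 'A'), ('a2', 'A'), ('p1', 'P'), ('p2', 'P'), ('v1', 'V')]:
    g.add_node(v, t)
for u, v, r in [('a1', 'p1', 'write'), ('p1', 'v1', 'publish'),
                ('v1', 'p2', 'publish'), ('p2', 'a2', 'write')]:
    g.add_edge(u, v, r)
print(matches_metapath(g, ['a1', 'p1', 'v1', 'p2', 'a2'],
                       (['A', 'P', 'V', 'P', 'A'],
                        ['write', 'publish', 'publish', 'write'])))  # True
```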




Fig. 2. Network schema, metapath and metagraph of heterogeneous information networks. S1 and S2 in (a) are the schemas of a bibliographic network and a movie network, respectively. In S1, A, P and V denote author, paper and venue, while in S2, A, M, D and T denote actor, movie, director and tag. In addition, ρ1, ρ2 and ρ3 in (b) are three different metapaths based on S1, and M is the corresponding metagraph.

Definition 3.4 (Metagraph). Given a heterogeneous information network schema T_G = (T_V, T_E), a metagraph [37] M = (N, R, n_s, n_t) is a directed acyclic graph that starts at a single source node type n_s and ends at a single target node type n_t, where N is a subset of node types with each n ∈ T_V, and R is a subset of relation types with each r ∈ T_E.

As shown in Fig. 2(b), the metagraph M of DBLP can intuitively be regarded as an integration of all kinds of metapaths in a heterogeneous information network. When generating random node sequences, however, a metagraph can provide longer and more complex walk paths than a single metapath, especially for distant node pairs. The metagraph of DBLP is a directed acyclic graph with A (author) as both the source node and the target node; the source and target belong to the same type, while the other types act as intermediate stations along its paths. This design is explained by the network mining tasks of interest, i.e., node classification, node clustering and similarity search. In brief, these tasks all study the relations among nodes of the same type, which determines that the head and tail of the metagraph belong to the same type.

Definition 3.5 (Metagraph-driven First-order Proximity (MFP)). The metagraph-driven first-order proximity measures the N-tuple-wise similarity among nodes. For any N nodes v_1, v_2, ..., v_N, the MFP of these nodes is defined as 1 if there exists an instance of the metagraph containing all N nodes, and 0 otherwise.

Definition 3.6 (Metagraph-driven Second-order Proximity (MSP)). The metagraph-driven second-order proximity measures the proximity of two nodes according to their neighbourhood structures. For a node v_i contained in a metagraph instance M_i, the set M_i \ v_i is defined as a neighbourhood of v_i. For any two nodes v_i and v_j, the MSP is defined as 1 if they can be connected through their neighbourhoods, and 0 otherwise.

Taking the relations in Fig. 1 as an example, the MFP of the authors a2 and a3 is 1, whereas the MFP of the authors a1 and a2 is 0, because a2 and a3 can be connected by the third branch of the metagraph in Fig. 2(b) while a1 and a2 cannot be connected by any branch of that metagraph. However, the MSP of the authors a1 and a2 is 1 because they share the neighbouring node a3.
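The following toy sketch illustrates Definitions 3.5 and 3.6 on the example above, under the simplifying assumption that each metagraph instance is recorded as the set of nodes it covers; the two instance sets are hypothetical readings of Fig. 1, not data from the paper.

```python
# Toy illustration of MFP (Definition 3.5) and MSP (Definition 3.6).
instances = [
    {'a2', 'p2', 'p3', 'a3'},   # hypothetical instance connecting a2 and a3
    {'a1', 'p1', 'v1', 'a3'},   # hypothetical instance connecting a1 and a3
]

def mfp(*nodes):
    """MFP: 1 iff some metagraph instance covers all the given nodes."""
    return int(any(set(nodes) <= inst for inst in instances))

def msp(u, v, candidates):
    """MSP: 1 iff u and v are both MFP-close to some shared neighbour."""
    return int(any(mfp(u, w) and mfp(v, w)
                   for w in candidates if w not in (u, v)))

print(mfp('a2', 'a3'))           # 1: covered by the first instance
print(mfp('a1', 'a2'))           # 0: no single instance covers both
print(msp('a1', 'a2', ['a3']))   # 1: both are MFP-close to a3
```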

3.2. Overview

The framework overview is illustrated in Fig. 3. It consists of three major modules, i.e., semantic extraction, feature learning and representation optimization. In the semantic extraction module, we obtain a matrix representation of the heterogeneous information network, constructed from a series of node sequences; it is worth noting that these node sequences are generated by walking on a probability-sensitive metagraph. Afterwards, each row of the obtained matrix is enhanced by the close neighbours of its corresponding node, and the new matrix is fed into an autoencoder to derive the embedding vectors in the hidden layer. Finally, in the representation optimization module, we utilize a mutual proximity-aware mechanism to optimize the learned embedding vectors so as to preserve more proximity information among nodes.

3.3. Extraction of semantics

Given a heterogeneous information network G = (V, E, T), its schema T_G = (T_V, T_E) can be abstracted and its metagraph M = (N, R, n_s, n_t) can be constructed automatically by the approach of [38]; the metagraph can also be provided by domain experts. In order to explore the relations of nodes belonging to the same type, we suppose n_s = n_t in M. After specifying a node type, we select nodes of type n_s as starting points and generate random walks on the probability-sensitive metagraph. In the traditional scheme, the transition probability [14] at the i-th step of the walk is

$$\Pr(v_i \mid v_{i-1}; \mathcal{M}) = \frac{1}{T_G(v_{i-1})} \cdot \frac{1}{\left|\{u \mid (v_{i-1}, u) \in E,\ \phi(v_i) = \phi(u)\}\right|}, \tag{1}$$

where $T_G(v_{i-1})$ denotes the number of edge types whose edges start from $v_{i-1}$, and $\left|\{u \mid (v_{i-1}, u) \in E,\ \phi(v_i) = \phi(u)\}\right|$ is the number of direct neighbours of $v_{i-1}$ that share the type of $v_i$.




Fig. 3. Framework of the proposed PAHINE for heterogeneous information network embedding. Given a heterogeneous information network, the available information is extracted into a matrix X generated by restricted random walks; PAHINE then transforms it into its neighbour-enhanced matrix H(X). H(X) is fed into an autoencoder to obtain the representations of the nodes. Furthermore, proximity information is employed to reinforce the embeddings in the latent space.

Considering that the number of edges of different types starting from a node generally differs, the factor $T_G(v_{i-1})$ in Eq. (1) does not assign an equal probability to every 1-hop neighbour of $v_{i-1}$, which may bias the node frequencies in the resulting sequences and lead to inaccurate results. We therefore modify the formula so that each compatible neighbour receives an equal transition probability:

$$\Pr(v_i \mid v_{i-1}; \mathcal{M}) = \frac{N(T_{v_i})}{\sum_{j=0}^{T_G(v_{i-1})} N(T_{v_j})} \cdot \frac{1}{\left|\{u \mid (v_{i-1}, u) \in E,\ \phi(v_i) = \phi(u)\}\right|}, \tag{2}$$

where the factor $1/T_G(v_{i-1})$ of Eq. (1) is replaced by $N(T_{v_i}) / \sum_{j=0}^{T_G(v_{i-1})} N(T_{v_j})$, the ratio of the number of neighbours of $v_{i-1}$ sharing the type of $v_i$ to the total number of nodes linked directly to $v_{i-1}$.

At the i-th step, the probability-sensitive metagraph-guided random walk works as follows. It first counts, for each edge type that satisfies the constraints of the metagraph, the number of edges of that type leaving $v_{i-1}$. It then computes the transition probability of each such edge type, and samples a qualified edge type according to these probabilities (each lying between 0 and 1), so that types with more edges are more likely to be chosen. Finally, the walk moves to the next node via a uniformly chosen edge of the selected type.

Compared with the classical random walk strategy adopted in DeepWalk, our metagraph-guided random walk is more suitable for heterogeneous information network embedding. The random walk in DeepWalk is dedicated to sampling node sequences in homogeneous networks whose nodes are all of one type: it selects the next node uniformly from the neighbours of the current node, so as to capture all-around information, and performs well in homogeneous network embedding. However, unlike homogeneous networks, which contain only one kind of relation among nodes, heterogeneous information networks consist of nodes of different types connected by various relations. This raises a challenge: how to capture the complex semantics among nodes. A purely random walk cannot meet this challenge because of its randomness in selecting nodes; for instance, a node sequence such as a1 − p3 − v1 − v2 − p1 − p2 in DBLP clearly carries no coherent semantics, and such node sequences are useless for network embedding. In contrast, every node sequence generated by our metagraph-guided random walk reveals one semantic pattern dictated by the metagraph of the network, so our node sequences benefit network embedding by capturing more semantics from the original network.
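A minimal sketch of one step of the walk procedure described above follows. The metagraph constraint is abstracted into an `allowed_types` argument, and the adjacency-list layout and function names are our assumptions. Note that the two factors of Eq. (2) combine so that every metagraph-compatible neighbour is chosen with equal probability, which is the stated goal of the modification.

```python
import random
from collections import defaultdict

def walk_step(adj, phi, v_prev, allowed_types):
    """One step of the probability-sensitive metagraph-guided walk (Eq. (2)).

    adj[v]         -- list of direct neighbours of v
    phi[v]         -- node type of v
    allowed_types  -- node types the metagraph permits at this step
    """
    # Group the metagraph-compatible neighbours of v_prev by node type.
    by_type = defaultdict(list)
    for u in adj[v_prev]:
        if phi[u] in allowed_types:
            by_type[phi[u]].append(u)
    if not by_type:
        return None  # the walk cannot continue under the metagraph
    # First factor of Eq. (2): choose an edge type with probability
    # proportional to its neighbour count, N(T_vi) / sum_j N(T_vj).
    types = list(by_type)
    chosen = random.choices(types, weights=[len(by_type[t]) for t in types])[0]
    # Second factor: a uniform draw among neighbours of the chosen type.
    # Together the two factors give each compatible neighbour equal probability.
    return random.choice(by_type[chosen])
```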

After obtaining the node sequences, we design a co-occurrence matrix to record whether two nodes can be connected. The co-occurrence matrix X ∈ R^{m×n} is an asymmetric matrix constructed from the obtained sequences, where m is the number of nodes of the specified type and n is the number of nodes occurring in the node sequences. In detail, x_i ∈ R^n records the occurrence of all nodes in the random walks starting from node i: if node j appears in a node sequence starting from node i, then x_ij = 1; otherwise x_ij = 0. For instance, in DBLP, m is the number of authors and n is the total number of authors, papers and venues. If there is a node sequence a1 − p1 − v1 − p2 − a2 starting from the author a1, the positions corresponding to the node pairs (a1, p1), (a1, v1), (a1, p2) and (a1, a2) in X are set to 1. According to our definition of the MFP, the matrix X captures the MFP among nodes. Thus we can obtain a matrix P ∈ R^{m×m} that preserves both the MFP and the MSP for each pair of nodes (v_i, v_j) in X by calculating XX^T, i.e., p_ij = x_i · x_j^T for 0 ≤ i, j < m. Essentially, the product x_i · x_j^T captures the MSP of (v_i, v_j) through their MFP neighbourhoods: if nodes v_i and v_j are not directly related (x_ij = 0) but are both linked to a node v_k (x_ik = 1 and x_jk = 1) in X, they become linked in P (p_ij ≥ 1). Hence the matrix P preserves the MFP and the MSP among nodes concurrently. In addition, because the node sequences carry semantics, the matrix X implicitly preserves the semantics of the network. The proximities of nodes in the heterogeneous information network are thus determined by P, where p_ij denotes the proximity of node i and node j; p_ij ≥ 1 indicates a proximity between the i-th node and the j-th one. More precisely, the proximity of two nodes can be decomposed into first-order, second-order and higher-order proximities; in general, the integration of the first-order and second-order proximities (i.e., our matrix P) suffices for most applications.
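The construction of X and P can be sketched as follows, assuming the walks are available as lists of node identifiers; function and variable names are illustrative.

```python
import numpy as np

def build_cooccurrence(walks, source_nodes, all_nodes):
    """Co-occurrence matrix X in R^{m x n}: x_ij = 1 iff node j appears in
    some walk that starts from source node i (Section 3.3)."""
    row = {v: i for i, v in enumerate(source_nodes)}  # m nodes of one type
    col = {v: j for j, v in enumerate(all_nodes)}     # n nodes of all types
    X = np.zeros((len(source_nodes), len(all_nodes)), dtype=np.int32)
    for walk in walks:
        i = row[walk[0]]
        for v in walk[1:]:
            X[i, col[v]] = 1
    return X

walks = [['a1', 'p1', 'v1', 'p2', 'a2'],
         ['a2', 'p2', 'v1', 'p1', 'a1']]
X = build_cooccurrence(walks, ['a1', 'a2'], ['a1', 'a2', 'p1', 'p2', 'v1'])
# P = X X^T: p_ij counts the context nodes shared by v_i and v_j, covering
# both the MFP (direct co-occurrence) and the MSP (shared neighbourhoods).
P = X @ X.T
print(P[0, 1] > 0)   # True: a1 and a2 share p1, p2 and v1
```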




3.4. Neighbour-enhanced representation learning

To make a node's representation similar to those of its neighbouring nodes in the low-dimensional space, we design a neighbour-enhanced learning model. Its structure, consisting of an encoder and a decoder, is similar to a conventional autoencoder; unlike a conventional autoencoder, however, we use the representation of a node's neighbours as the input of the encoder to reconstruct the representation of the node itself. Specifically, x_i denotes the feature vector of node v_i, and H(v_i), derived from a transformation H(·), is the mean of the feature representations of the close neighbours of v_i. We aim to minimize the loss function

$$L_a = \sum_{i=1}^{n} \left\| \widehat{H(v_i)} - x_i \right\|, \tag{3}$$

where $\widehat{H(v_i)}$ is the output of the decoder. The transformation H(·) incorporates proximity information into the model via the following strategy, named Weighted Average Neighbour. For a given node v_i, H(v_i) is computed as its proximity-weighted average neighbourhood. In detail, from the node sequences obtained by the semantic extraction module of PAHINE, the composite co-occurrence matrix X ∈ R^{m×n} is constructed by extracting the related nodes from each sequence, and the matrix P = XX^T ∈ R^{m×m} is generated to preserve the proximity of each pair of nodes. We then obtain

$$H(v_i) = \frac{1}{|N(i)|} \sum_{j \in N(i)} p_{ij}\, x_j,$$

where N(i) denotes the related neighbours of v_i in the network, and p_{ij}, treated as the weight between v_j and v_i, indicates the grade of proximity between the two nodes.

Apart from the input H(v_i) and the output $\widehat{H(v_i)}$, the model utilizes K hidden layers to encode and decode the data. Each pair of arrows in the centre part of Fig. 3 represents one of the mapping functions

$$h_i^{(1)} = \sigma\!\left(W^{(1)} x_i + b^{(1)}\right), \tag{4}$$

$$h_i^{(k)} = \sigma\!\left(W^{(k)} h_i^{(k-1)} + b^{(k)}\right), \quad k = 2, \ldots, K, \tag{5}$$

where W^{(k)} is a weight matrix, b^{(k)} is a bias vector, σ is a non-linear activation function such as tanh, sigmoid or softmax, K is the number of layers of the encoder and the decoder, and k indicates that the mapping function belongs to the k-th layer.

Compared with a traditional autoencoder, our approach retains the proximity among nodes better by minimizing the difference between a node's representation and its neighbours' representation. Intuitively, the obtained embedding vectors are robust, since the model forces nodes that are related in proximity to be associated together; they thus capture both node proximity information and distant semantics among nodes.
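A compact PyTorch sketch of the neighbour-enhanced autoencoder is shown below. The layer widths, the sigmoid activation and the helper names are illustrative assumptions: the paper reports the actual layer widths in Table 2 and leaves the activation open among tanh, sigmoid and softmax.

```python
import torch
import torch.nn as nn

class NeighbourEnhancedAE(nn.Module):
    """Sketch of the neighbour-enhanced autoencoder of Section 3.4: the
    encoder reads the weighted average of a node's neighbours, H(v_i),
    and the decoder reconstructs the node's own feature vector x_i."""

    def __init__(self, dims):                        # e.g. dims = [n, 1024, 128]
        super().__init__()
        enc, dec = [], []
        for a, b in zip(dims, dims[1:]):             # encoder, Eqs. (4)-(5)
            enc += [nn.Linear(a, b), nn.Sigmoid()]
        rev = dims[::-1]
        for a, b in zip(rev, rev[1:]):               # mirrored decoder
            dec += [nn.Linear(a, b), nn.Sigmoid()]
        self.encoder = nn.Sequential(*enc)
        self.decoder = nn.Sequential(*dec)

    def forward(self, h):
        z = self.encoder(h)                          # latent embedding
        return z, self.decoder(z)                    # z and the reconstruction

def neighbour_input(X, P):
    """Weighted Average Neighbour: H(v_i) = (1/|N(i)|) sum_j p_ij x_j."""
    deg = (P > 0).sum(dim=1, keepdim=True).clamp(min=1)
    return (P @ X) / deg

X = torch.rand(100, 300)                             # toy features, m = 100
P = (X @ X.T > 77).float()                           # toy proximity matrix
model = NeighbourEnhancedAE([300, 64, 16])
z, x_hat = model(neighbour_input(X, P))
la = torch.norm(x_hat - X, dim=1).sum()              # La of Eq. (3)
```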

3.5. Optimization

Heterogeneous information network embedding aims to learn a d-dimensional vector for each node, which can essentially be considered a feature extraction task. Although we can obtain the latent vectors of nodes as in Section 3.4, due to the symmetry of the two mapping functions (i.e., the encoding and decoding functions) of the autoencoder and the inescapable loss of information in dimensionality reduction, the obtained vectors may lose some essential information even if L_a converges efficiently.

To ensure that the latent relations of the source data are preserved in the latent vectors, we employ a mutual proximity-aware mechanism to optimize the low-dimensional representations; the right part of Fig. 3 illustrates this optimization. From the encoding layers of the foregoing model we obtain the matrix H ∈ R^{m×d}, where d is the dimension of the final vectors and m is the number of embedded nodes. Introducing the proximity matrix P ∈ R^{m×m}, we compute a correlation vector F_{i·} ∈ R^{1×d} for each node v_i as $F_{i\cdot} = \frac{1}{|P_{i\cdot}|} P_{i\cdot} H$, where P_{i·} is the i-th row of P and |P_{i·}| denotes the sum of its positive entries p_{ij}. The correlation matrix F is produced by concatenating all F_{i·}. Hence, we define another loss function to optimize the latent representations:

$$L_b = \sum_{i=1}^{m} \frac{H_{i\cdot} \cdot F_{i\cdot}}{\|H_{i\cdot}\| \times \|F_{i\cdot}\|}. \tag{6}$$

We then incorporate Eq. (6) into the loss function of Eq. (3) to obtain the overall objective

$$L = L_a + \beta L_b = \sum_{i=1}^{m} \left\| \widehat{H(v_i)} - x_i \right\| + \beta \sum_{i=1}^{m} \frac{H_{i\cdot} \cdot F_{i\cdot}}{\|H_{i\cdot}\| \times \|F_{i\cdot}\|}, \tag{7}$$

where β is a trade-off factor that balances the two loss functions L_a and L_b.
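A sketch of the overall objective, following Eq. (7) as printed, is given below; the variable names are ours. Note that, read literally, Eq. (7) adds the cosine term with a positive β, whereas a training loop that wants to pull each embedding toward its neighbourhood correlation vector would minimize the negative of L_b instead; the sketch follows the printed formula.

```python
import torch
import torch.nn.functional as F

def pahine_objective(x, x_hat, H, P, beta):
    """Overall objective L = La + beta * Lb of Eq. (7), as printed.

    x, x_hat -- target features and decoder reconstructions (m x n)
    H        -- latent embeddings from the encoder (m x d)
    P        -- proximity matrix (m x m, non-negative)
    """
    la = torch.norm(x_hat - x, dim=1).sum()              # Eq. (3)
    # Correlation matrix: F_i = (1/|P_i|) P_i H, the proximity-weighted
    # average of the embeddings of the nodes related to v_i.
    row_sum = P.sum(dim=1, keepdim=True).clamp(min=1e-8)
    corr = (P @ H) / row_sum
    lb = F.cosine_similarity(H, corr, dim=1).sum()       # Eq. (6)
    return la + beta * lb
```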

3.6. Complexity analysis

The time complexity of the proposed PAHINE mainly depends on the semantic extraction module and the feature learning module. The computational complexity of the former is O((s + nm)n), where n is the number of nodes, s denotes the number of node sequences per node, and m is the total number of nodes in the network; O(ns) corresponds to generating the node sequences, and O(n^2 m) to generating the matrix P that preserves the pairwise proximities. For the latter module, the time complexity of computing gradients and updating parameters is O((nI + d)bd), where n is the number of nodes, d is the embedding dimension, b is the batch size, and I is the number of iterations; O(nIbd) is the complexity of the proposed neighbour-enhanced autoencoder, and O(bd^2) that of the optimization in the encoding layer. In summary, the total computational complexity of the proposed PAHINE is O((s + nm)n) + O((nI + d)bd).

4. Experiments

To validate the effectiveness of our approach, we introduce the experimental datasets, comparison algorithms, evaluation metrics and experimental settings, and then test our algorithm against four comparison algorithms on four real-world HIN datasets. The experimental results and their analyses are also given in this section.

4.1. Experimental networks

Two citation networks, DBLP [14,39] and DBIS [6], one social network, MovieLens [40], and one semantic network, Wordnet [41], are considered in our experiments. We summarize the statistics of the four heterogeneous information networks in Table 1; more details follow.

DBLP, constructed by Tang et al. [39], is a bibliographic dataset in computer science. We use the third version, which contains 47,950 authors (A), 70,910 papers (P) and 97 venues (V); its schema is shown in Fig. 2(a). We can identify three base relations from the schema: an author writes a paper, a paper cites another




Table 1
Statistics of the four datasets.

Datasets     Node types and counts #(V)
DBLP         author: 67,950    paper: 70,910    venue: 97
DBIS         author: 60,694    paper: 72,207    venue: 464
MovieLens    user: 2,113       movie: 5,908     tag: 9,079
Wordnet      head: 40,504      relation: 18     tail: 40,511

paper, and a paper is published in a venue. According to their venues, papers can be classified into 4 categories: Database, Data Mining, Artificial Intelligence and Computer Vision.

DBIS was established by Sun et al. [6]. It consists of 464 venues, 60,694 authors and 72,207 corresponding papers. Similar to DBLP, the heterogeneous information network constructed from DBIS covers the three types of relations mentioned above.

MovieLens is a dataset containing a large number of records of users tagging the movies they have watched; each movie in this dataset belongs to at least one genre.

Wordnet is composed of a large number of triples of the form (synset, relation type, synset), each describing the relation between a head word and a tail word.

4.2. Baseline methods

We select four algorithms from the literature for comparison: two classical approaches, DeepWalk [9] and LINE [11], and two state-of-the-art algorithms, MetaPath2Vec [13] and MetaGraph2Vec [14], which are designed specifically for heterogeneous information networks. Besides, MetaPath2Vec++ [13] and MetaGraph2Vec++ [14], variants of MetaPath2Vec and MetaGraph2Vec respectively, are also considered in our experiments. In order to demonstrate the effectiveness of the proposed representation optimization, we additionally build an embedding model named PAHINE(l1), which uses only Eq. (3) as the loss function. All baseline methods and the proposed PAHINE are evaluated with the same evaluation metrics. Brief descriptions of the baselines follow:

• DeepWalk [9], proposed by Perozzi et al., is one of the most influential algorithms for learning d-dimensional node vectors by capturing the topological information of the network. The Skip-Gram model is employed to train the node sequences generated by uniform random walks. In this paper, we ignore the heterogeneity of the experimental networks when running DeepWalk.
• LINE [11] takes the first-order and second-order proximities into account and learns vertex embeddings over large-scale networks. In our experiments, we learn the first-order and second-order embeddings in d/2 dimensions each and concatenate them into the final embeddings.
• MetaPath2Vec [13] is an efficient heterogeneous network embedding model thanks to its heterogeneous Skip-Gram, which is designed specifically to embed node sequences. Since a heterogeneous information network may have more than one metapath, we compare our PAHINE with versions of this algorithm based on different metapaths.
• MetaGraph2Vec [14] adopts a more robust walking strategy for generating node sequences. It enables a node to generate longer and more composite context sequences even when some links are missing; thus MetaGraph2Vec can capture richer semantics among distant nodes.


Table 2
The number of neurons in each layer.

Datasets     Neurons in each layer
DBLP         138957-16384-1024-128-1024-16384-138957
DBIS         133365-16384-1024-128-1024-16384-133365
MovieLens    17100-1024-128-1024-17100
Wordnet      81033-1024-128-1024-81033

4.3. Experiment settings and evaluation metrics

We provide the parameter settings for all algorithms here. For all baseline methods and the proposed PAHINE, the total embedding dimension d is fixed to 128 so that the methods can be compared objectively. In LINE, the first-order and second-order embeddings are assigned 64 dimensions each and then concatenated into the final embedding vectors. The walk length l and window size w in DeepWalk, MetaPath2Vec and MetaGraph2Vec are set to 100 and 5, respectively, for generating random walks and running the Skip-Gram model. For each node, 80 node sequences are generated starting from it in DeepWalk, MetaPath2Vec, MetaGraph2Vec and our proposed PAHINE. To accelerate convergence, 5 negative samples are used in all baselines, and the learning rate in LINE is set to 0.025. To better exploit the capability of MetaPath2Vec, we consider two metapaths, A → P → A → P → A and A → P → V → P → A. For our PAHINE, whose core is a multi-layer deep neural network, the number of layers varies across networks, and the dimensions of the layers are listed in Table 2. During training, the batch size b is set to 450 and the number of epochs to 10.

We evaluate the effectiveness of the embedding representations on six typical heterogeneous information network mining tasks: node classification, node clustering, similarity search, visualization, performance w.r.t. network sparsity, and parameter sensitivity. For node classification, we adopt SVM as the supervised classifier and use accuracy, Micro-F1 and Macro-F1 as the metrics. Treating the classification labels as ground truth, we use the common normalized mutual information (NMI) [42] to evaluate node clustering for each algorithm. For similarity search, precision is the criterion used to compare PAHINE with the other approaches. We use t-SNE [43] to map embedding vectors into a 2-D space for visualization. We further evaluate the performance of our framework when the network is sparse, and investigate how the parameters d and β affect the results of our model. All experiments are conducted on a server with Ubuntu 16.04, 12 cores at 1.17 GHz and 256 GB memory; the GPU used for deep learning is a TITAN XP Pascal with 12 GB memory.
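The classification protocol used below can be sketched as follows; the SVM kernel, the stratified split and the helper name are our assumptions, as the paper specifies only the classifier family and the metrics.

```python
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def evaluate_classification(embeddings, labels, train_ratio, runs=10):
    """Train an SVM on a labelled fraction of the node embeddings and
    report the average accuracy / Micro-F1 / Macro-F1 (Section 4.4)."""
    scores = []
    for _ in range(runs):
        x_tr, x_te, y_tr, y_te = train_test_split(
            embeddings, labels, train_size=train_ratio, stratify=labels)
        y_pred = SVC().fit(x_tr, y_tr).predict(x_te)
        scores.append((accuracy_score(y_te, y_pred),
                       f1_score(y_te, y_pred, average='micro'),
                       f1_score(y_te, y_pred, average='macro')))
    return [sum(s) / runs for s in zip(*scores)]
```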



Table 3
The accuracies (%) of multi-class author classification on DBLP.

Methods              1%      2%      3%      4%      5%      6%      7%      8%      9%
DeepWalk             82.32   85.95   87.07   88.03   89.22   89.36   89.99   90.25   90.43
LINE                 76.06   81.03   83.26   85.57   86.98   88.17   88.73   89.42   87.02
MetaPath2Vec (ρ1)    51.01   52.43   53.67   54.43   55.62   55.67   56.29   56.62   57.33
MetaPath2Vec (ρ2)    83.14   87.69   88.33   89.12   89.15   89.30   89.56   89.77   89.73
MetaGraph2Vec        85.65   88.96   89.00   90.68   91.33   91.60   92.07   92.17   92.46
PAHINE(l1)           85.85   88.56   90.23   90.97   92.00   92.85   93.08   93.89   94.05
PAHINE               87.34   89.85   91.03   91.87   93.07   93.85   94.11   94.33   94.64

Fig. 4. The experimental results of multi-class node classification on the DBLP dataset. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 5. The experimental results of multi-class node classification on the Wordnet dataset.

4.4. Node classification

In this section, we conduct multi-class classification on the DBLP and Wordnet datasets and multi-label classification on the MovieLens dataset, since only these three networks have category or label information. We take the node representations as features to train the SVM classifier, and vary the training ratio from 1% to 9% to evaluate classification performance on DBLP and Wordnet. For multi-label classification on MovieLens, we randomly sample from 10% to 90% of the nodes as training data and use the rest for validation. Accuracy, Micro-F1 and Macro-F1 serve as the three metrics in the classification experiments. We repeat every experiment 10 times and report the average performance on the selected criteria. The experimental results are shown in Table 3 and Figs. 4–6. The analyses of the results for the three datasets are as follows:

• Table 3 and Fig. 4 display the four-class classification on the author nodes of the DBLP dataset. Three metrics (i.e., accuracy, Micro-F1 and Macro-F1) are used on this dataset, while the two other datasets are evaluated only in terms of Micro-F1 and Macro-F1. Furthermore, since different metapaths adopted by MetaPath2Vec may lead to different results, the experiment considers the two metapaths ρ1 : A → P → A → P → A and ρ2 : A → P → V → P → A shown in Fig. 2(b). The bold scores in Table 3 and the red line in Fig. 4 demonstrate that PAHINE consistently and significantly outperforms all baselines, including PAHINE(l1), in terms of all three metrics. Specifically, PAHINE




Fig. 6. The experimental results of multi-label node classification on the MovieLens dataset.

Table 4
The performance (mean NMI in percent) of node clustering on the authors in DBLP and heads in Wordnet.

Methods          DBLP    Wordnet
DeepWalk         41.97   18.03
LINE             20.02   21.94
MetaPath2Vec     46.63   30.34
MetaGraph2Vec    51.34   37.28
PAHINE(l1)       51.88   36.98
PAHINE           53.78   40.82

Table 5
Author similarity search on DBLP.

Methods          Precision@100 (%)    Precision@500 (%)
DeepWalk         91.33                90.75
LINE             91.94                91.02
MetaPath2Vec     89.76                88.64
MetaGraph2Vec    92.51                91.92
PAHINE(l1)       92.77                92.00
PAHINE           93.66                92.24

Table 6
Author similarity search on DBIS.

Methods          Precision@100 (%)    Precision@500 (%)
DeepWalk         83.23                81.81
LINE             83.85                82.03
MetaPath2Vec     84.12                82.96
MetaGraph2Vec    86.02                84.00
PAHINE(l1)       86.34                84.52
PAHINE           87.79                85.43

achieves improvements in accuracy of 3.86%–5.02% over DeepWalk, 4.12%–11.28% over LINE, 2.16%–4.91% over MetaPath2Vec, 0.89%–2.25% over MetaGraph2Vec and 0.59%–1.49% over PAHINE(l1). We observe that in both Micro-F1 and Macro-F1, our proposed PAHINE performs consistently better than the baseline methods, and the improvement becomes increasingly obvious as the portion of the training set grows from 1% to 9%. It is worth noting that the two metapaths show clearly different performance, which reveals that different metapaths employed by MetaPath2Vec generate vectors of different quality; in our experiments, MetaPath2Vec with the better-performing metapath ρ2 : A → P → V → P → A is selected for comparison with PAHINE. Notice also that the yellow lines in Fig. 4 are too erratic to maintain a stable value, whereas PAHINE performs steadily on accuracy as well as on Micro-F1 and Macro-F1.

• On the Wordnet dataset, PAHINE improves on the five baselines by 0.04–0.17 in Micro-F1 and 0.01–0.13 in Macro-F1. Similarly, on the MovieLens dataset, PAHINE improves on all baselines by 0.04–0.16 in Micro-F1 and 0.03–0.17 in Macro-F1. In brief, compared with the baselines, PAHINE captures the quantitative information of nodes of diverse types and maximizes the effect of proximity information. The strong performance of PAHINE demonstrates that integrating a proximity-aware model into a network embedding framework yields high-quality embedding vectors.

4.5. Node clustering

We also carry out a clustering task to evaluate the performance of the different embedding algorithms. We use the same four-category author nodes on DBLP and eight-category head nodes


on Wordnet as in the node classification task, i.e., in DBLP we cluster the authors while in Wordnet we cluster the heads. We feed the learned vectors into a clustering model and leverage the K-means algorithm to cluster them. Treating the categories of Section 4.4 as ground truth, we evaluate the performance in terms of the normalized mutual information (NMI). Table 4 shows the results on DBLP with k = 4 and on Wordnet with k = 8. All clustering experiments are run 10 times, and the mean NMI scores are reported. Overall, the highest scores, highlighted in bold in Table 4, demonstrate that the proposed PAHINE outperforms all the baseline methods. Concretely, the mean NMI of PAHINE increases by 11.81%, 33.76%, 7.15%, 2.44% and 1.9% over the respective baselines for authors in DBLP, and by 22.79%, 18.88%, 10.48%, 3.54% and 3.84% for heads in Wordnet. The investigated model clearly gains larger improvements over DeepWalk and LINE than over MetaPath2Vec and MetaGraph2Vec; this is because the latter two methods capture additional semantic information, whereas the former two preserve only topological information. In addition, PAHINE(l1) attains higher NMI scores than the other baselines on DBLP but lower scores than MetaGraph2Vec on Wordnet, which shows that the optimization enhancing proximities among nodes in the latent layer is effective. Beyond these observations, the clustering results also indicate the importance of paying more attention to the utility of proximity information.
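A sketch of this clustering protocol, with scikit-learn standing in for the unspecified implementation:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def evaluate_clustering(embeddings, labels, k, runs=10):
    """Cluster the embeddings with K-means and score the result against
    the ground-truth categories with NMI, averaged over several runs."""
    scores = [normalized_mutual_info_score(
                  labels, KMeans(n_clusters=k).fit_predict(embeddings))
              for _ in range(runs)]
    return sum(scores) / runs
```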




Fig. 7. Visualization results on DBLP.

4.6. Similarity search

Experiments are also performed on similarity search over the DBLP and DBIS datasets to verify the ability of our proposed model PAHINE to embed node proximities into the low-dimensional representations. For each of the two datasets, we randomly select 1000 authors and rank their similar authors according to

the cosine similarity score. The mean precision@100 and precision@500 are taken as two criteria to assess the quality of the embedding vectors in the similarity search application. Tables 5 and 6 show the results: both PAHINE and PAHINE(l1) attain better performance than the other baselines, and our proposed PAHINE achieves the best search precisions on both the DBLP and DBIS datasets.
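A sketch of this evaluation is given below, under the assumption that a retrieved author counts as relevant when it shares the query author's class label (the paper does not state its relevance criterion); `embeddings` and `labels` are assumed to be NumPy arrays.

```python
import numpy as np

def mean_precision_at_k(embeddings, labels, query_idx, k=100):
    """Rank all nodes by cosine similarity to each query author and
    measure the fraction of the top-k that share the query's label."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    precisions = []
    for q in query_idx:
        ranked = np.argsort(-sims[q])
        top_k = ranked[ranked != q][:k]          # drop the query itself
        precisions.append(np.mean(labels[top_k] == labels[q]))
    return float(np.mean(precisions))
```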


4.7. Visualization

Network visualization is another popular application of network embedding. To further demonstrate the performance of our proposed PAHINE on network visualization, we compare PAHINE with the baselines on DBLP, a paper citation network. Specifically, we map the node representations learned by each embedding algorithm to a 2-D space with the t-SNE toolkit. The results are shown in Fig. 7. As we can see, no algorithm separates the authors of the different groups perfectly; this may be because the DBLP dataset is dense and the four groups (i.e., Database, Data Mining, Artificial Intelligence and Computer Vision) are all related to computer science. LINE can basically separate the authors, although its black points are mixed with points of other groups. DeepWalk performs less well, with points of different groups disorganized. MetaPath2Vec shows better performance than DeepWalk, but is still not as good as MetaGraph2Vec. From Fig. 7 we can see that the result of our PAHINE is clearly different from the others. DeepWalk, MetaPath2Vec and MetaGraph2Vec utilize the Skip-Gram model to generate embedding vectors from node contexts within a window, treating a node like a word in a sentence to capture the relations between the node and its contexts. In contrast, our PAHINE captures the semantics of the heterogeneous information network to extract proximity features among nodes, which leads to PAHINE performing better than the baselines in network visualization.
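The visualization step can be sketched as follows, assuming scikit-learn's t-SNE and Matplotlib; library defaults stand in for any unreported t-SNE settings.

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def visualize(embeddings, labels):
    """Project the 128-d node embeddings to 2-D with t-SNE and colour
    the points by author group, as done for Fig. 7."""
    xy = TSNE(n_components=2).fit_transform(embeddings)
    plt.scatter(xy[:, 0], xy[:, 1], c=labels, s=3, cmap='tab10')
    plt.show()
```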

4.8. Performance w.r.t. network sparsity

In this section, we analyse the performance of the proposed PAHINE with regard to network sparsity on DBLP, a large and dense paper citation network. To simulate networks with different levels of sparsity and observe how sparsity affects the performance of PAHINE, we randomly remove different percentages of edges from the original heterogeneous information network (i.e., the DBLP network) and use the remaining edges to construct a new network as the input of our model. The Micro-F1 score of node classification is used as the metric. Fig. 8 shows the experimental results. Even when the percentage of retained edges is as low as 20%, the Micro-F1 score of PAHINE still reaches about 0.7. It is worth noting that, as the number of edges in the network increases, PAHINE consistently outperforms PAHINE(l1), which shows the effectiveness of our optimization.

Fig. 8. The result of performance w.r.t. network sparsity on DBLP in node classification with the metric Micro-F1.

4.9. Parameter sensitivity

The sensitivity of the parameters of our proposed PAHINE is investigated in this section. Specifically, we study how different embedding dimensions d and different trade-off factors β in Eq. (7) affect the experimental results; we perform node classification and node clustering on the DBLP dataset only. Following the previous experimental settings, we fix the training ratio to 9% and vary d from 64 to 176 to observe the accuracy of node classification and the normalized mutual information (NMI) of node clustering. The hyper-parameter β controls the trade-off between the two loss functions L_a and L_b in Eq. (7); we vary β from 0 to 1 to show how it affects the results. As Fig. 9 shows, as the dimension d of the embedding vectors increases, both the classification accuracy and the NMI score improve and then converge stably once d reaches 128. As illustrated in Fig. 10, both the classification accuracy and the NMI score first increase and then decrease once β becomes large enough: the curve in Fig. 10(a) peaks at β = 0.3, and the curve in Fig. 10(b) peaks at β = 0.5. Note that when β is 0 or 1, the performance is worse than for intermediate values, which confirms the efficiency of our proposed optimization strategy of Section 3.5.

Fig. 9. The effect of dimension d on node classification and node clustering.




Fig. 10. The effect of β on node classification and node clustering.

5. Conclusion

In this paper, we explore the issue of heterogeneous information network embedding and investigate two significant aspects that existing embedding methods ignore: the different impacts of diverse types of nodes and links, and the mutual awareness of proximity among relevant nodes. Accordingly, we propose a joint framework named PAHINE based on probability-sensitive random walks and a deep neural network optimized with proximity information. The proposed model performs better than four state-of-the-art algorithms on practical machine learning applications over four real-world heterogeneous information networks. In the future, we will extend the model to other kinds of networks, such as attributed heterogeneous information networks, contextual heterogeneous information networks, or integrations of two heterogeneous information networks.

CRediT authorship contribution statement

Chen Zhang: Conceptualization, Validation, Writing - original draft. Guodong Wang: Methodology, Conceptualization, Writing - original draft, Writing - review & editing. Bin Yu: Supervision, Methodology. Yu Xie: Validation, Supervision, Writing - original draft, Writing - review & editing. Ke Pan: Software, Writing - original draft.

Acknowledgements

The authors wish to thank the editors and anonymous reviewers for their valuable comments and helpful suggestions, which greatly improved the quality of the paper. This work was supported by the Key Research and Development Program of Shaanxi Province, China (Grant nos. 2019ZDLGY17-01 and 2019GY-042).

References

[1] J. Liao, S. Wang, D. Li, X. Li, FREERL: Fusion relation embedded representation learning framework for aspect extraction, Knowl. Based Syst. 135 (2017) 9–17.
[2] L. Boratto, S. Carta, G. Fenu, R. Saia, Using neural word embeddings to model user behavior and detect user segments, Knowl. Based Syst. 108 (2016) 5–14.
[3] M. Ji, J. Han, M. Danilevsky, Ranking-based classification of heterogeneous information networks, in: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011, pp. 1298–1306.

[4] R.A. Sinoara, J. Camacho-Collados, R. Rossi, R. Navigli, S.O. Rezende, Knowledge-enhanced document embeddings for text classification, Knowl. Based Syst. 163 (2019) 955–971.
[5] T. Opsahl, P. Panzarasa, Clustering in weighted networks, Social Networks (2) (2009) 155–163.
[6] Y. Sun, J. Han, X. Yan, P.S. Yu, T. Wu, PathSim: Meta path-based top-k similarity search in heterogeneous information networks, Proc. VLDB Endow. (11) (2011) 992–1003.
[7] D. Liben-Nowell, J. Kleinberg, The link-prediction problem for social networks, J. Am. Soc. Inf. Sci. Technol. (7) (2007) 1019–1031.
[8] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, Adv. Neural Inf. Process. Syst. (2013) 3111–3119.
[9] B. Perozzi, R. Al-Rfou, S. Skiena, DeepWalk: Online learning of social representations, in: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, pp. 701–710.
[10] A. Grover, J. Leskovec, node2vec: Scalable feature learning for networks, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 855–864.
[11] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, Q. Mei, LINE: Large-scale information network embedding, in: Proceedings of the 24th International Conference on World Wide Web, 2015, pp. 1067–1077.
[12] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, 2013, arXiv preprint arXiv:1301.3781.
[13] Y. Dong, N.V. Chawla, A. Swami, metapath2vec: Scalable representation learning for heterogeneous networks, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 135–144.
[14] D. Zhang, J. Yin, X. Zhu, C. Zhang, MetaGraph2Vec: Complex semantic path augmented heterogeneous network embedding, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2018, pp. 196–208.
[15] P. Goyal, E. Ferrara, Graph embedding techniques, applications, and performance: A survey, Knowl. Based Syst. 151 (2018) 78–94.
[16] S.T. Roweis, L.K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science (5500) (2000) 2323–2326.
[17] M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. (6) (2003) 1373–1396.
[18] S. Cao, W. Lu, Q. Xu, GraRep: Learning graph representations with global structural information, in: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, 2015, pp. 891–900.
[19] X. Wang, P. Cui, J. Wang, J. Pei, W. Zhu, S. Yang, Community preserving network embedding, in: Proceedings of the 31st AAAI Conference on Artificial Intelligence, 2017.
[20] Y. Liu, Z. Tian, J. Sun, Y. Jiang, X. Zhang, Distributed representation learning via node2vec for implicit feedback recommendation, Neural Comput. Appl. (2) (2019) 1–11.
[21] M. Uzair, F. Shafait, B. Ghanem, A. Mian, Representation learning with deep extreme learning machines for efficient image set classification, Neural Comput. Appl. (2015) 1–13.
[22] C. Li, Z. Li, S. Wang, Y. Yang, X. Zhang, J. Zhou, Semi-supervised network embedding, in: International Conference on Database Systems for Advanced Applications, Springer, 2017, pp. 131–147.
[23] D. Wang, P. Cui, W. Zhu, Structural deep network embedding, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1225–1234.


[24] F. Huang, X. Zhang, J. Xu, C. Li, Z. Li, Network embedding by fusing multimodal contents and links, Knowl. Based Syst. 171 (2019) 44–55.
[25] S. Chang, W. Han, J. Tang, G.-J. Qi, C.C. Aggarwal, T.S. Huang, Heterogeneous network embedding via deep architectures, in: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 119–128.
[26] H. Gui, J. Liu, F. Tao, M. Jiang, B. Norick, J. Han, Large-scale embedding learning in heterogeneous event data, in: 2016 IEEE 16th International Conference on Data Mining, 2016, pp. 907–912.
[27] Z. Huang, N. Mamoulis, Heterogeneous information network embedding for meta path based proximity, 2017, arXiv preprint arXiv:1701.05291.
[28] Z. Liu, X. Wang, J. Pu, L. Wang, L. Zhang, Nonnegative low-rank representation based manifold embedding for semi-supervised learning, Knowl. Based Syst. (2017) 121–129.
[29] J. Shang, M. Qu, J. Liu, L.M. Kaplan, J. Han, J. Peng, Meta-path guided embedding for similarity search in large-scale heterogeneous information networks, 2016, arXiv preprint arXiv:1610.09769.
[30] Y. Chen, C. Wang, HINE: Heterogeneous information network embedding, in: International Conference on Database Systems for Advanced Applications, 2017, pp. 180–195.
[31] T.-y. Fu, W.-C. Lee, Z. Lei, HIN2Vec: Explore meta-paths in heterogeneous information networks for representation learning, in: Proceedings of the ACM Conference on Information and Knowledge Management, 2017, pp. 1797–1806.
[32] Y. Shi, Q. Zhu, F. Guo, C. Zhang, J. Han, Easing embedding learning by comprehensive transcription of heterogeneous information networks, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2018, pp. 2190–2199.
[33] Y. Shi, P.-W. Chan, H. Zhuang, H. Gui, J. Han, PReP: Path-based relevance from a probabilistic perspective in heterogeneous information networks, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 425–434.


[34] T. Chen, Y. Sun, Task-guided and path-augmented heterogeneous network embedding for author identification, in: Proceedings of the 10th ACM International Conference on Web Search and Data Mining, 2017, pp. 295–304.
[35] Y. Zheng, C. Shi, X. Cao, X. Li, B. Wu, A meta path based method for entity set expansion in knowledge graph, IEEE Trans. Big Data (2018).
[36] M.M. Keikha, M. Rahgozar, M. Asadpour, Community aware random walk for network embedding, Knowl. Based Syst. 148 (2018) 47–54.
[37] Z. Huang, Y. Zheng, R. Cheng, Y. Sun, N. Mamoulis, X. Li, Meta structure: Computing relevance in large heterogeneous information networks, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1595–1604.
[38] Y. Zhou, J. Huang, H. Sun, Y. Sun, Recurrent meta-structure for robust similarity measure in heterogeneous information networks, 2017, arXiv preprint arXiv:1712.09008.
[39] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, Z. Su, ArnetMiner: Extraction and mining of academic social networks, in: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008, pp. 990–998.
[40] F.M. Harper, J.A. Konstan, The MovieLens datasets: History and context, ACM Trans. Interact. Intell. Syst. (4) (2016) 19.
[41] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, O. Yakhnenko, Translating embeddings for modeling multi-relational data, in: Advances in Neural Information Processing Systems, 2013, pp. 2787–2795.
[42] L. Danon, A. Diaz-Guilera, J. Duch, A. Arenas, Comparing community structure identification, J. Stat. Mech. Theory Exp. (9) (2005) P09008.
[43] P.E. Rauber, A.X. Falcão, A.C. Telea, Visualizing time-dependent data using dynamic t-SNE, in: Eurographics Conference on Visualization (EuroVis 2016), Short Papers, Groningen, The Netherlands, 2016, pp. 73–77.
