Accepted Manuscript
Relationship strength estimation based on Wechat Friends Circle
Chunhua Ju, Wanqiong Tao
PII: S0925-2312(17)30438-1
DOI: 10.1016/j.neucom.2016.11.075
Reference: NEUCOM 18188
To appear in: Neurocomputing
Received date: 30 April 2016
Revised date: 7 September 2016
Accepted date: 21 November 2016
Please cite this article as: Chunhua Ju , Wanqiong Tao , Relationship strength estimation based on Wechat Friends Circle, Neurocomputing (2017), doi: 10.1016/j.neucom.2016.11.075
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Relationship strength estimation based on Wechat Friends Circle
Chunhua Ju1,2,3, Wanqiong Tao3*
1 Contemporary Business and Trade Research Center of Zhejiang Gongshang University, Hangzhou 310018, China
2 Contemporary Business and Collaborative Innovation Research Center of Zhejiang Gongshang University, Hangzhou 310018, China
3 College of Computer Science & Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China
Abstract: Wechat has become a popular way for users in China to connect and share information with their friends. In particular, Wechat Friends Circle is a popular platform for users to express themselves. Calculating the relationship strength between Friends Circle users is useful in many real applications, such as personalized recommendation. However, most existing studies do not apply to Wechat Friends Circle because of one distinctive property: a user can only see the comments and likes of common friends, which is quite different from general social networks. To meet this need, in this paper we propose a general framework to measure the relationship strength between users in Wechat Friends Circle, considering not only the similarity of users' profile information but also the interaction between users. A set of experiments on a Friends Circle dataset shows that the proposed framework is efficient and promising for improving the performance of relationship strength calculation.
Keywords: Wechat Friends Circle; Relationship circle; Semantic similarity; Interaction frequency; Relationship strength
1. Introduction
Granovetter [1] first proposed the concept of relationship strength in his landmark paper The Strength of Weak Ties. Relationship strength indicates the closeness between individuals. Individuals are more likely to contact their relatives and friends than their acquaintances, and such ties are considered strong relationships. The relationship between acquaintances may be weak, but these weak ties often play a significant role in the transfer of new messages. Online social networks, such as Facebook, Myspace and Twitter, are rapidly becoming popular with the development of Internet technology and wireless communication technology. This makes information dissemination and sharing more convenient, and it gradually extends to the information services of social networking sites, which provide users with richer services than traditional channels [22]. However, the explosive growth in the types of service content and information has led to the problem of information overload. People need to spend a lot of time and effort finding the information they need in a vast amount of information, which greatly reduces efficiency. Furthermore, in real life, recommendations from acquaintances and friends are of vital importance in promoting the consumption behavior of social network users [2]. If the relationship strength between social network users is related to personalized services, these services can accurately recommend and show users the social content and information they are really interested in, which greatly improves the efficiency of information retrieval [3]. Therefore, in recent years, the relationship strength between users in online social networks and related topics have become a hot research area, and the accurate calculation of
relationship strength is one of the important prerequisites for accurate personalized social services, such as friend recommendation and product recommendation.
The development of online social networks has been followed by the popularity of smartphones and open mobile platforms. The combination of mobile communication and the Internet will inevitably lead to the further popularization and development of mobile social networks. Information resources can be acquired actively or passively in a variety of ways at any time and place, which makes dissemination and sharing among users of online social networks more convenient. In China, Wechat is currently the hottest emerging social online platform. By the end of the first quarter of 2015, Wechat covered more than 90% of China's smartphones, with 549 million monthly active users, over 8 million public accounts and over 85,000 connected mobile applications; its users are spread across more than 200 countries and use more than 20 languages. Such a huge user base has made Wechat the new master of social networks [4]. Wechat Friends Circle is its most interesting part. Most Wechat friends come from QQ friends and address book contacts, so it is a mode of interpersonal communication dominated by strong relationships and supplemented by weak ones [5]. With its unique way of communication, it can achieve fast and effective information dissemination and expand people's real communication space. As a result, Wechat electronic commerce has developed rapidly in recent years, which has made the relationship strength between users in Wechat Friends Circle another research hotspot.
To measure the relationship strength between users, typical approaches utilize the users' profile information and the interaction activities between users [6, 7, 8]. A user's profile provides an overview of his/her basic information, including sex, age, job, hobbies, friends, religious views, etc. Generally, users with similar profiles are likely to exhibit greater similarity and may therefore have higher relationship strength, and vice versa [6]. On the other hand, users engage in numerous interaction activities, such as comments and likes. The more frequent the interaction activities are, the closer the relationship between the users is. Many approaches [9, 10, 11] have been proposed to estimate the overall relationship strength between online social network users. However, these approaches all have some defects and cannot calculate the relationship strength between users very accurately. More importantly, they are not suitable for Wechat Friends Circle, which has some special characteristics compared with general online social networks.
To address the deficiencies of current research, in this paper we propose a calculation model to measure the relationship strength between users in Wechat Friends Circle. Our approach consists of three sequential steps: dividing users into relationship circles, determining the related activity area for each relationship circle, and measuring the relationship strength within each relationship circle. In the first step, we take a target user as the center and use a traversal method to divide his/her friends into relationship circles. In the second step, each generated relationship circle is assigned to an activity area with a similarity score based on word frequency, which facilitates further application of the method. In the third step, we calculate the relationship strength between the target user and his/her friends in each relationship circle based on the similarity and the interaction between them. Similarity involves the profile information similarity and the similarity of the Wechat Subscriptions they follow. Interaction is measured by the frequency of interactive behaviors, including like, comment, reply and collection. We conduct extensive experiments on a Wechat dataset, and the results show that our approach achieves promising performance compared with the weighted social network graph method and the latent variable model method.
Compared with the existing literature, the main contributions of this paper are as follows:
1. According to the literature we collected, this is the first work that investigates the relationship strength between users in Wechat Friends Circle.
2. Dividing all friends of the target user into relationship circles helps to select strong relationships and eliminate weak ones.
3. We calculate the relationship strength between the target user and his/her Wechat Friends Circle friends in each relationship circle based on similarity and interaction. To reflect the characteristics of Wechat Friends Circle, we comprehensively consider all kinds of influencing factors, including users' profile information, Wechat Subscriptions, like, comment, reply and collection (see Formula (9)). This improves the accuracy of the relationship strength calculation, as demonstrated in Fig. 5 and Fig. 6.
The rest of this paper is organized as follows. We review related work in Section 2. In Section 3, we briefly introduce the framework for measuring the relationship strength between Wechat users in each relationship circle of Wechat Friends Circle. The details of the proposed approach are elaborated in Section 4. In Section 5, we show the experimental results of our approach on the Wechat dataset. Finally, we conclude the paper and discuss directions for future work in Section 6.
2. Related works
With the rapid development of online social networks, research on friend relationships and relationship strength between online social network users has attracted great attention. Regarding friend relationships, Xu K et al. [23] presented a method to find community structure built on combination entropy and to evaluate the modularity of a virtual campus mobile network (V-Net); it describes the probability distributions of friendships per number of friends, number of check-ins and number of visited places. Wang M. [24] investigated how to exploit users' profile and topological structure information in social circle discovery, and put forward an in-link Salton metric and an out-link Salton metric to measure a user's topological structure, together with an improved density peaks-based clustering method. Regarding relationship strength, Xiang et al. [12] developed an unsupervised model to estimate the relationship strength between users from their interaction activity (e.g., communication and tagging) and user similarity. Viswanath et al. [9] analyzed the relationship between users through the "wall posts" of Facebook. Zhao et al. [3] built an algorithm for relationship strength in social networks that integrates personality traits and interaction computing. Gilbert et al. [7] presented a predictive model that maps social media data to tie strength. Zhao et al. [6] proposed a general framework to measure the relationship strength between users, taking into consideration not only the users' profile information but also the interaction activities and the activity fields. Singla et al. [11] investigated the correlation between the individual search topics of users who interact through instant messaging, and showed that not only does a correlation exist but it also increases with the amount of time the users communicate. Backstrom et al. [13] investigated the evolution of the social network structure and group membership in MySpace and showed that homophily can be used to improve predictive models of group membership. Xiang Lin et al. [14] proposed a method for relationship intensity in weighted social network graphs, based on a trust propagation strategy and the direct relationship intensity.
However, some of these studies only show that there is a link between users and do not calculate the relationship strength. Most importantly, none of them considers the popular platform Wechat Friends Circle, an online social network dominated by strong relationships and supplemented by weak ones, with a unique way of communication. Its mode differs from that of general online social networks, so the methods designed for users of other social networks cannot calculate the relationship strength between Wechat users very accurately.
The content and method of this paper make up for this deficiency of existing research; it is the first study of the relationship strength between users in Wechat Friends Circle. The relationship strength of Wechat users indicates how close they are. We select a target user as the center point to divide relationship circles, and each generated relationship circle is assigned to an activity area with a similarity score based on word frequency. We calculate the relationship strength between the target user and his/her friends in each relationship circle based on similarity and interaction. This not only improves the accuracy of the calculation results but also facilitates further application.
3. Wechat Friends Circle users' relationship strength model
3.1 Basic definition
Definition 1 (User)
Each user who logs in to Wechat for the first time is required to register and fill in personal information to obtain a unique account ID that distinguishes him/her from other users. After logging in, users can add friends, chat with friends, and participate in the Friends Circle. The set of all users can be expressed as U={u1,u2,…,un}, where n is the total number of users.
Definition 2 (Relationship circle)
Different from general social networking platforms, in Wechat Friends Circle only mutual friends can see the interactive activities between them. We group all the mutual friends who are able to see the interactive activities between one another into a circle, which is called a relationship circle. In the same relationship circle, all users are friends, and they are all directly related to each other.
Definition 3 (Target user)
We select a Wechat user and divide all his/her friends into several relationship circles depending on whether they are mutual friends; this Wechat user is called the target user. A target user can have multiple relationship circles, and a user may belong to multiple relationship circles.
Definition 4 (Interactive activity document)
Users in Wechat Friends Circle engage in interactive activities, including like, comment, reply and collection. Each like, comment, reply or collection is an interactive activity document, whose associated users are the information sender and the information receiver. The set of all interactive activity documents can be expressed as D={d1,d2,…,dN}, where N is the total number of interactive activity documents.
Definition 5 (Interaction between users)
For each interactive activity document, the related users are its information sender and information receiver. We use the matrix UD={udij} to record this association. If udij is 1, user i and document j are associated; if udij is 0, they are not.
Definition 6 (Friend relationship)
If users are Wechat friends but have never had interactive activities in Wechat Friends Circle, we do not regard them as friends in this paper. We only regard users who have interactive activities in Wechat Friends Circle as friends, so we can judge whether two users are friends from the interactive activity information. The matrix F={fij} is used to record the friend relationship. If fij is 1, user i and user j are friends; if fij is 0, they are not friends.
3.2 Establishing model
In view of the above analysis, we propose a relationship strength model for Wechat Friends Circle users. As shown in Fig. 1, it consists of the following three main steps, carried out in sequence:
(1) Dividing relationship circles around the target user according to the friend relationship matrix.
(2) Assigning an activity area to each relationship circle with a similarity score based on word frequency.
(3) Calculating the relationship strength between the target user and his/her friends in each relationship circle.
[Fig. 1 flowchart: Wechat data acquisition; data processing into interactive activity documents d1, d2, d3, …, dn and the friend relationship matrix F={fij}; Step 1, dividing relationship circles; Step 2, assigning activity areas; Step 3, relationship strength calculation; output, the relationship strength between users in each relationship circle]
Fig. 1. The model of relationship strength estimation between Wechat Friends Circle users
In the first step, we select a Wechat user as the starting point. Then, exploiting the characteristic that only mutual friends can see each other's interactions in Wechat Friends Circle, we use a traversal method to group all the mutual friends who are able to see the interactive activities between one another into a circle around the target user. In this paper, we only regard Wechat users who have interactive activities in Wechat Friends Circle as friends.
In the second step, we select ten activity area topics: food, entertainment, sport, travel, shopping, life-service, education, medical, work and other. For each generated relationship circle, we extract the keywords of the interactive activity content and measure their semantic distance to each area name using the ontology-based concept similarity calculation method [15]; the most relevant activity area is assigned to the relationship circle.
In the third step, we calculate the relationship strength between the target user and his/her friends in each relationship circle based on the similarity and interaction between users. Similarity includes the profile information similarity and the similarity of followed Wechat Subscriptions: we use cosine similarity to calculate the former and the squared Euclidean distance to calculate the latter. For interaction, we mainly consider the interaction frequency.
4. Methodology of the relationship strength measurement in Wechat Friends Circle
4.1 Data preprocessing
The input is the set of users' profile information (user ID, sex, region) and interaction activity information (like, comment, reply, collection) downloaded from Wechat. The idea is to develop an information acquisition plug-in, authorize the plug-in after obtaining the consent of the Wechat users, and then access the data sets. Since the data sets are raw data, they must be processed before use: all data are transformed into the users' profile information set, the friend relationship set and the interactive activity information set.
First, we select a target user, who is always the service object in the further application; for example, he/she may be the recipient of a personalized recommendation service. We take the target user as the starting point and use a stop word dictionary to remove the stop words (pronouns, modal particles, etc.) from the interactive activity information between Wechat users. We thereby obtain the interactive activity information set. TABLE 1 shows this set for an example of posting status updates in Wechat Friends Circle (Fig. 2). Each record in the interactive activity information set contains the information sender, information receiver, information type and information content. The information type is L, C or R, representing like, comment and reply, respectively. A like is recorded as "[#like]" in the information content.
Fig. 2 An example of posting status updates in Wechat Friends Circle
TABLE 1 The table of interactive activity information based on the updates content as Fig. 2
Information sender | Information receiver | Information type | Information content
5473232 | 5473231 | C | Did you go to Taiwan?
5473231 | 5473232 | R | Yes
5473233 | 5473231 | L | [#like]
5473234 | 5473231 | L | [#like]
5473235 | 5473231 | C | You can put together a table to play mahjong.
5473231 | 5473235 | R | [Grin][Grin]
5473235 | 5473231 | R | I'm going to travel with you next time.
5473236 | 5473231 | C | 666
5473237 | 5473231 | L | [#like]
…… | …… | …… | ……
In this example, the target user is 5473231.
Furthermore, we derive the friend relationship set from the information senders and receivers in the data set. In this paper, we only regard users with interactive activities as friends; Wechat friends without interactive activities are regarded as non-friends. Then, we collect the profile information of the relevant Wechat users according to the friend relationship set to form the users' profile information set.
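As a minimal sketch of this preprocessing step, the friend relationship set can be derived directly from the sender and receiver columns of the interactive activity records. The record layout and the function name `build_friend_sets` are assumptions made for illustration; the three records are the first rows of TABLE 1.

```python
from collections import defaultdict

# Assumed record layout: (sender, receiver, type, content), where type is
# "L" (like), "C" (comment) or "R" (reply), as in TABLE 1.
records = [
    (5473232, 5473231, "C", "Did you go to Taiwan?"),
    (5473231, 5473232, "R", "Yes"),
    (5473233, 5473231, "L", "[#like]"),
]

def build_friend_sets(records):
    """Map each user ID to the set of users it has interacted with.

    Only users with at least one interactive activity are treated as
    friends, following the convention stated in the text.
    """
    friends = defaultdict(set)
    for sender, receiver, _type, _content in records:
        if sender != receiver:
            friends[sender].add(receiver)
            friends[receiver].add(sender)
    return dict(friends)

friend_sets = build_friend_sets(records)
```

The resulting dictionary plays the role of the friend relationship matrix F={fij}: fij is 1 exactly when user j appears in the set stored for user i.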
4.2 The relationship circle division for target user
We use a traversal method to divide relationship circles for the target user based on the friend relationship set. Let the friend set of user ui be Fui={uj}, where uj is a friend of ui identified by its index j, j ∈ [1, n], and n is the total number of Wechat users. For the target user u1, the main steps of the traversal method are as follows:
Step1: Intersect the target user's friend set with the friend set of the first friend in the target user's friend set who has not yet been traversed; friends are assumed to be ordered by ascending subscript. If the intersection is the empty set, the target user and this friend form a relationship circle. Otherwise, proceed to the next step.
Step2: Intersect the result of the previous step with the friend set of the first friend in the target user's friend set who has not yet been traversed. Repeat the intersection until the result is the empty set.
Step3: When the intersection is the empty set, all the users whose friend sets took part in the intersection operations of the previous step are assigned to one relationship circle.
Step4: Repeat the above operations until all the friend nodes of the target user have been traversed and the relationship circle each of them belongs to has been found.
Take the relations in Fig. 3 as an example.
Fig. 3 An example of contact between users
According to Fig. 3, the friend sets of the users are $F_{u_1}=\{u_2,u_3,u_4,u_5\}$, $F_{u_2}=\{u_1,u_6\}$, $F_{u_3}=\{u_1,u_4,u_5\}$, $F_{u_4}=\{u_1,u_3,u_5,u_6\}$, $F_{u_5}=\{u_1,u_3,u_4,u_6\}$ and $F_{u_6}=\{u_2,u_4,u_5\}$.
Take user u1 as the target user; u2 is the first friend in the friend set of u1. Following the steps described above, we intersect the friend sets of u1 and u2. The result is $F_{u_1}\cap F_{u_2}=\emptyset$, so u1 and u2 belong to one relationship circle. Next, we intersect the friend set of u1 with that of u3, the second friend of u1: $F_{u_1}\cap F_{u_3}=\{u_4,u_5\}$, which is non-empty. So we further intersect this result with the friend set of user u4: $F_{u_1}\cap F_{u_3}\cap F_{u_4}=\{u_5\}$, which is still non-empty. We then intersect this result with the friend set of user u5 and obtain $F_{u_1}\cap F_{u_3}\cap F_{u_4}\cap F_{u_5}=\emptyset$, the empty set, so we stop the operation. From this calculation we know that users u1, u3, u4 and u5 belong to one relationship circle. Since u4 and u5 have already been traversed, no further operations are needed.
4.3 Assign the most similar activity area to each relationship circle
The relationship strength between online social network users can be applied to cyber manhunt, link prediction, personalized recommendation and so on. To make these applications more targeted and accurate, we assign the most similar activity area to each relationship circle before calculating the relationship strength between users in Wechat Friends Circle. Activity areas can cover many aspects; we select ten representative areas that are the most frequent in online social networks: food, entertainment, sport, travel, shopping, life-service, education, medical, work and other.
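Area assignment takes the relationship circles of Section 4.2 as its input, so it may help to see that traversal in executable form first. The following is a minimal sketch, not the authors' implementation: the function name is hypothetical, users are labelled by integers, and the next friend is taken from the running intersection, which is one reading of Step 2 that agrees with the worked example of Fig. 3. For simplicity each friend is placed in only one circle.

```python
def divide_relationship_circles(friends, target):
    """Divide the friends of `target` into relationship circles.

    `friends` maps each user to the set of its friends. Friends are
    visited in ascending order, as the text assumes.
    """
    traversed = set()
    circles = []
    for seed in sorted(friends[target]):
        if seed in traversed:
            continue
        circle = [target, seed]
        traversed.add(seed)
        # Step 1: intersect the target's friend set with the seed's.
        common = friends[target] & friends[seed]
        # Steps 2-3: keep intersecting until the result is empty.
        while common:
            candidates = sorted(common - traversed)
            if not candidates:
                break
            nxt = candidates[0]
            circle.append(nxt)
            traversed.add(nxt)
            common = common & friends[nxt]
        circles.append(circle)
    return circles

# Friend sets of Fig. 3, with user ui written as the integer i.
friends = {
    1: {2, 3, 4, 5},
    2: {1, 6},
    3: {1, 4, 5},
    4: {1, 3, 5, 6},
    5: {1, 3, 4, 6},
    6: {2, 4, 5},
}
circles = divide_relationship_circles(friends, 1)
```

On the Fig. 3 data this yields the two circles of the worked example, {u1, u2} and {u1, u3, u4, u5}.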
On the basis of the relationship circles, we calculate the correlation degree between each relationship circle and the name of each activity area. We set a threshold value in advance and take the most relevant activity area name as the name of the relationship circle, provided that its correlation degree exceeds the threshold. Otherwise, if all the correlation degrees are below the threshold, the relationship circle belongs to the area "other". We record the normalized word frequency of relationship circle cm as an R-dimensional vector TFm, whose element tf_rm is the normalized frequency of the word wr in W. Let Sem(cm,Al) be the relatedness between the relationship circle cm and the activity area Al.
$$\mathrm{Sem}(c_m, A_l) = \sum_{r=1}^{R} tf_{rm}\,\mathrm{Sim}(w_r, A_l) \tag{1}$$
Sim(wr, Al) = 1 … (2)
[The remainder of Eq. (2), an expression in Dep(wr), Dep(Al), Dist(wr, Al) and OL(wr, Al), is not recoverable from this copy; see [15].]
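Eq. (1) is a frequency-weighted sum of word-to-area similarities, followed by a thresholded choice of the best area. A minimal sketch of that part is given below. The concept similarity Sim is treated as a function supplied by the caller; the lookup table, the fallback value 0.1, the example words and the thresholds are all made-up values for illustration, not the ontology-based measure of [15].

```python
def sem(tf, area, sim):
    """Eq. (1): sum of normalized word frequencies times Sim(word, area)."""
    return sum(freq * sim(word, area) for word, freq in tf.items())

def assign_activity_area(tf, areas, sim, threshold):
    """Return the most related area, or "other" if none exceeds the threshold."""
    scores = {area: sem(tf, area, sim) for area in areas}
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else "other"

# Illustrative stand-in for Sim(w_r, A_l); the numbers are assumptions.
_toy_table = {("noodles", "food"): 0.9, ("hiking", "travel"): 0.8}

def toy_sim(word, area):
    return _toy_table.get((word, area), 0.1)

tf_circle = {"noodles": 0.6, "hiking": 0.4}   # normalized word frequencies
areas = ["food", "travel", "sport"]
area_low_threshold = assign_activity_area(tf_circle, areas, toy_sim, 0.5)
area_high_threshold = assign_activity_area(tf_circle, areas, toy_sim, 0.6)
```

With these toy numbers the "food" score is 0.6·0.9 + 0.4·0.1 = 0.58, so the circle is labelled "food" at threshold 0.5 and falls back to "other" at threshold 0.6.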
Sim(wr,Al) comprehensively considers the factors that affect the similarity between concepts, namely the semantic overlap degree, the concept depth and the strength between concepts [15]. Here OL(wr, Al) is the semantic overlap degree between word wr and activity area name Al, OL(wr, Al) = O(wr) ∩ O(Al). Dep(wr) and Dep(Al) are the concept depths of word wr and activity area name Al, respectively, with Dep(C)=Dep(Parent(C))+1.
4.4 The users' relationship strength calculation in Wechat Friends Circle
The relationship strength calculation involves the target user and his/her friends, so it indicates the intimacy between them in Wechat Friends Circle. Combined with the activity area, the relationship strength represents the ability of the target user's friends to influence the target user in that area. For example, when there is a food recommendation for the target user, we look for the friends who have higher relationship strength with him/her in the "food" area and then spread the recommendation to the target user through these friends. This improves the probability that the recommendation is accepted, because users with strong relationship strength also tend to have higher credibility. In this paper, we calculate the relationship strength between the target user and his/her friends in Wechat Friends Circle based on similarity and interaction.
Similarity is measured mainly on the basis of users' profile information (PI), which is determined by the target user and his/her friends. We use the vectors P(ut) and P(uf) to represent the profile information of target user ut and his/her friend uf, which includes sex and region. We define the value of each element as follows. If ut and uf have the same sex, then sex(ut)=1 and sex(uf)=1; otherwise sex(ut)=1 and sex(uf)=0; if the attribute is missing, the default is sex(ut)=1, sex(uf)=0. Likewise, if ut and uf are in the same region, then region(ut)=1 and region(uf)=1; otherwise region(ut)=1 and region(uf)=0; if the attribute is missing, the default is region(ut)=1, region(uf)=0.
After obtaining the PI vectors of users ut and uf, we use the cosine similarity Similarity(ut,uf) to calculate the profile information similarity between them, as formula (4) shows.
$$\mathrm{Similarity}(u_t, u_f) = \cos\theta = \frac{P(u_t)\cdot P(u_f)}{\lVert P(u_t)\rVert\,\lVert P(u_f)\rVert} \tag{4}$$
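A minimal sketch of the profile similarity is shown below, assuming profiles are plain dictionaries with optional "sex" and "region" keys (a hypothetical layout). The encoding follows the rule in the text: the target's element is always 1, and the friend's element is 1 only when the attribute is present and equal. Returning 0.0 when the friend's vector is all zeros is an added convention, since formula (4) is undefined in that case.

```python
import math

def profile_vectors(target, friend, attributes=("sex", "region")):
    """Build the binary PI vectors P(u_t) and P(u_f)."""
    p_t, p_f = [], []
    for attr in attributes:
        t_val, f_val = target.get(attr), friend.get(attr)
        same = t_val is not None and t_val == f_val
        p_t.append(1)                 # target element is always 1
        p_f.append(1 if same else 0)  # missing or different -> 0
    return p_t, p_f

def cosine_similarity(a, b):
    """Formula (4); returns 0.0 for a zero vector (assumed convention)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

p_t, p_f = profile_vectors(
    {"sex": "F", "region": "Hangzhou"},
    {"sex": "F", "region": "Beijing"},
)
similarity = cosine_similarity(p_t, p_f)
```

Because only two binary attributes are used, the similarity can take just three values: 1 (both match), 1/√2 (one matches) and 0 (none match).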
In addition, many developers and businesses have registered Wechat Subscriptions on the Wechat platform and use them to communicate and interact with particular groups in various forms, such as text, pictures, voice and video, forming a mainstream online and offline Wechat interactive marketing method. Each Wechat user follows some Wechat Subscriptions according to his/her own needs and interests. Like relationship circles, different Wechat Subscriptions may belong to different activity areas. When two users follow different Wechat Subscriptions whose activity areas are nevertheless the same, the needs and interests of the two users are more similar, and their relationship strength is stronger. For example, if two users follow many Wechat Subscriptions related to "food", they have a greater demand for or interest in food. Therefore, they may interact more about food with each other and have stronger relationship strength in the "food" activity area. Conversely, if the Wechat Subscriptions followed by two users are almost entirely different, one focusing mainly on Subscriptions related to "food" rather than "sport" and the other on Subscriptions related to "sport" rather than "food", their needs and interests are very different. They are therefore likely to have almost no interactive activities in daily life, and their relationship strength tends to be weak. Based on this analysis, we can learn users' needs and interests from the Wechat Subscriptions they follow. By comparing the Wechat Subscriptions followed by the target user and by his/her friends, we can learn how similar their needs and interests are.
Having first obtained the Wechat Subscriptions followed by each related Wechat user, we
utilize the Chinese word segmentation system ICTCLAS2013 to preprocess the name of Wechat Subscription pm and record the segmentation result as an R-dimensional vector TFm. We then take Sem(pm,Al) to represent the similarity between Wechat Subscription pm and activity area name Al, as shown in Eq. (5), where wr is the r-th word in the segmentation result and google_distance(wr,Al) is the normalized Google distance (NGD) between word wr and activity area name Al. The normalized Google distance between two search words x and y is defined in Eq. (6).
$$\mathrm{Sem}(p_m, A_l) = \sum_{r=1}^{R} \mathrm{google\_distance}(w_r, A_l) \tag{5}$$
$$\mathrm{NGD}(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log M - \min\{\log f(x), \log f(y)\}} \tag{6}$$
M is the total number of web pages indexed by Google, f(x) and f(y) are the numbers of hits for search words x and y, and f(x,y) is the number of web pages on which x and y both occur. We set a threshold value in advance and then calculate the relevance between Wechat Subscription pm and each activity area name Al. The activity area whose relevance is the maximum and exceeds the threshold is taken as the area of this Wechat Subscription. Otherwise, if all the relevance values are below the threshold, the Wechat Subscription belongs to the "other" area and is not involved in the calculation.
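For reference, Eq. (6) can be evaluated as in the sketch below. The hit counts are made-up numbers chosen so the arithmetic is easy to follow, and returning infinity when the two words never co-occur is an added convention for the case where log f(x,y) is undefined.

```python
import math

def ngd(f_x, f_y, f_xy, m):
    """Normalized Google distance of Eq. (6).

    f_x, f_y: hit counts of the two search words
    f_xy:     number of pages containing both words
    m:        total number of indexed pages
    """
    if f_xy == 0:
        return float("inf")  # assumed convention: no co-occurrence
    log_x, log_y = math.log(f_x), math.log(f_y)
    numerator = max(log_x, log_y) - math.log(f_xy)
    denominator = math.log(m) - min(log_x, log_y)
    return numerator / denominator

# Illustrative counts: (ln 1000 - ln 10) / (ln 10^6 - ln 100) = 2/4.
example_distance = ngd(100, 1000, 10, 10**6)
```

Two words that always occur together get a distance of 0, and the value grows as co-occurrence becomes rarer.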
We record the number of Wechat Subscriptions that are followed by user ui and belong to activity area Al in a matrix $N_{iA_l}$. We then calculate the degree of difference between the Wechat Subscriptions followed by users with a distance function, which measures the distance between two points in space. In our model, Wechat users are treated as points in space, and the numbers of followed Wechat Subscriptions belonging to each area are the attribute values of a point. The distance thus describes how similar the Wechat Subscriptions followed by two users are.
Several distance formulas are widely used in similarity measurement, such as the Euclidean distance, Manhattan distance [7, 17, 18], Chebyshev distance and Minkowski distance [19, 20]. The squared Euclidean distance is simple to calculate and has practical significance, so we adopt it in the model of this paper: the closer two points are, the higher their similarity. We take DistX(ut,uj) to represent the distance between the Wechat Subscriptions followed by user ut and user uj, as shown in Eq. (7).
$$\mathrm{DistX}(u_t, u_j) = \sum_{l=1}^{9} (x_{tl} - x_{jl})^2 \tag{7}$$
where xtl indicates the number of Wechat Subscriptions followed by user ut in activity area Al, and xjl indicates the number of Wechat Subscriptions followed by user uj in activity area Al.
Interaction is measured mainly by the frequency of interactive behaviors, including likes, comments, replies and collections. Let Inf(ut,uj) represent the interaction frequency between target user ut and his/her friend uj.
$$\mathrm{Inf}(u_t, u_j) = \frac{\mathrm{In}(u_t, u_j)}{\mathrm{In}(u_t)} \tag{8}$$
In(ut,uj) denotes the number of interactive behaviors between user ut and user uj, covering the comments exchanged between ut and uj, the likes that target user ut gives to friend uj, and the status contents of uj that ut collects. In(ut) denotes the total number of interactive behaviors between target user ut and all his/her friends, counted over the same three kinds of behavior.
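The two quantities in Eqs. (7) and (8) can be sketched in Python as follows. This is only an illustration: the function names `dist_x` and `inf`, the list and dictionary inputs, and the choice to return 0 for a user with no interactions are assumptions made here, not details given in the paper.

```python
def dist_x(x_t, x_j):
    """Eq. (7): squared Euclidean distance between two users.

    x_t, x_j: per-area counts of followed Wechat Subscriptions,
    listed in the same activity-area order for both users.
    """
    if len(x_t) != len(x_j):
        raise ValueError("count vectors must cover the same activity areas")
    return sum((a - b) ** 2 for a, b in zip(x_t, x_j))


def inf(interactions, friend):
    """Eq. (8): share of the target user's interactions that involve `friend`.

    interactions: maps each friend to the number of interactive behaviors
    (comments, likes, collections) with the target user.
    """
    total = sum(interactions.values())
    if total == 0:
        # Assumption: a target user with no interactions gets frequency 0.
        return 0.0
    return interactions.get(friend, 0) / total
```

For example, with counts `[3, 0, 1]` and `[1, 0, 4]` the distance is 4 + 0 + 9 = 13, and a friend involved in 6 of 8 interactions has an interaction frequency of 0.75.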
Unlike on general social network platforms, the source of a collected status in Wechat Friends Circle is difficult to obtain directly. To address this problem, we first get all the collected contents of the target user, and then compare them with the album contents of the target user’s friends; identical content means that the target user has made one collection from that friend. Although the same content may also appear in the albums of other friends, in this research a collection is counted once for a friend whenever the collected content of the target user matches a status in that friend’s album.
Finally, the relationship strength between the target user and his/her friends is affected by the similarity of their profile information, the distance of their followed Wechat Subscriptions, and the frequency of their interactions. Thus,
$$\mathrm{RE}(u_t, u_j) = \alpha\,\mathrm{Sim}(u_t, u_j) + \beta\,\mathrm{DistX}(u_t, u_j) + \gamma\,\mathrm{Inf}(u_t, u_j) \tag{9}$$
where α, β and γ denote the weight coefficients of the similarity of profile information, the distance of followed Wechat Subscriptions and the frequency of interactions, respectively. In addition, α, β and γ satisfy α+β+γ=1 with α, β, γ ∈ [0,1]. AHP (Analytic Hierarchy Process) is used to determine the values of α, β and γ.
5. Experiments
5.1 Experimental settings
The dataset is downloaded from Wechat Friends Circle, which consists of the friends in a user's Wechat contacts and thus mirrors the circle of friends in real life. “Friends Circle” is a function of Wechat through which users can post text and photos to express their feelings, and share links to articles or music from other software. Users can also comment on and like the text, photos and other information posted by their friends; most importantly, a user can only see the comments and likes of common friends [21]. According to “the research report about 2014 China's social networking application user behavior”, in the first half of 2014 more than 89% of users used instant messaging tools, and Wechat ranked second in use, behind only QQ.
We download data from Wechat Friends Circle with the consent of the Wechat users involved. We first select ten active Wechat users as target users and collect all the friends of these ten target users, which results in a total of 985 persons. For each of these 985 Wechat users, we download their profile information (sex, region and individual signature) and interaction activities (likes, comments, replies and collections) between January 2016 and February 2016. Furthermore, for each of the ten target users, we download the collected contents, which results in a total of 56 collected contents. To evaluate the performance, we adopt a manual labeling procedure to generate the ground truth.
We contact the target users and their friends, and ask all of them to label the relationship strengths. For each friend of a target user, we provide a list of activity areas; the target user then labels the relationship strength with each of his/her friends in each specific activity area as “strong” or “weak”. When two users give different labels to the relationship between them, we ask them to re-label it. In this way, we obtain 1237 relationship strengths from Wechat Friends Circle.
5.2 Experimental results
5.2.1. Evaluation on activity area assignment
Partitioning relationship circles and assigning an activity area to each relationship circle matter a great deal for calculating the relationship strength between Wechat friends, since different Wechat friends have different relationship strengths in different activity areas. Hence, we first test the performance of the relationship circle and activity area assignment approach, which consists of two steps: dividing Wechat friends into relationship circles, and assigning the most closely related activity area to each relationship circle. The true activity area of each pair of Wechat friends is manually labeled as described in Section 5.1. The true activity area of each relationship circle is then determined by the following criterion: a relationship circle is labeled with activity area Al if over 60% of the relationships between the target user and his/her friends in that circle belong to area Al. Otherwise, we regard the relationship circle as noisy.
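The labeling criterion for a relationship circle can be sketched as below. This is a hedged illustration only: the function name `label_circle`, the list-of-labels input and the treatment of an empty circle as noisy are assumptions introduced here, and "over 60%" is read as a strict inequality.

```python
from collections import Counter


def label_circle(pair_areas, min_share=0.6):
    """Return the ground-truth area of one relationship circle, or 'noisy'.

    pair_areas: the manually labeled activity area of each relationship
    between the target user and a friend in this circle.
    """
    if not pair_areas:
        # Assumption: a circle without labeled relationships is treated as noisy.
        return "noisy"
    area, count = Counter(pair_areas).most_common(1)[0]
    # The circle takes the area only if strictly more than min_share agree.
    return area if count / len(pair_areas) > min_share else "noisy"
```

Under this reading, a circle with 7 of 10 relationships labeled “work” is labeled “work”, while a circle with exactly 6 of 10 is regarded as noisy.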
For performance evaluation, we adopt “accuracy”, defined as the proportion of relationships between the target user and his/her friends in a relationship circle whose most closely related activity area is judged correctly. We also conducted an 11-way classification of the relationship circles to investigate how well the activity area judgment discriminates between areas. Fig. 4 shows the detailed classification accuracies in the different activity areas. The accuracy differs across activity areas. For example, the “medical” and “travel” areas are easily distinguished because a few popular words represent them, such as “sick” and “medicine” for the “medical” area and “scenic” and “transport” for the “travel” area. Conversely, some activity areas, such as “work” and “shopping”, have diverse representative words and are therefore difficult to identify. Fig. 4 also shows that the proposed method for assigning an activity area to each relationship circle is reasonably precise: in most activity areas the correct rate is above 60%.
Fig. 4 A cross matrix illustrating the classification accuracies in different activity areas, where the score in the ith row and the jth column is the proportion of the relationship circles belonging to activity area i that are assigned to activity area j by our approach.
5.2.2. Evaluation on relationship strength measurement
In this part, we use Precision, Recall and the normalized Discounted Cumulative Gain (nDCG) to evaluate the relationship strength estimated in Section 4.4.
Precision and Recall are two measures from information retrieval and statistical classification that are used to evaluate the quality of results. For the evaluation of relationship strength calculation, Precision is the ratio of the number of relationships that the calculation correctly classifies as strong or weak to the total number of relationships. Recall is the ratio of the number of relationships correctly classified as strong (or weak) to the number of relationships that actually belong to that strength class, averaged over the two classes. For example, suppose that A relationships are actually strong and B relationships are actually weak, and that the calculation yields C strong relationships and D weak relationships, of which E strong relationships and F weak relationships are correct. Precision is then given by Eq. (10).
$$P = \frac{E + F}{C + D} \tag{10}$$

Recall is given by Eq. (11).

$$R = \frac{1}{2}\left(\frac{E}{A} + \frac{F}{B}\right) \tag{11}$$

In addition, nDCG is a ranking-quality indicator that is widely used for search engines. It considers both the importance of the search results and their relative positions: the higher the strongly relevant results are ranked, the more effective the method is; otherwise, the method is penalized. nDCG is formulated as follows.

$$\mathrm{nDCG}_p = \frac{\mathrm{DCG}_p}{\mathrm{IDCG}_p} \tag{12}$$

In this equation, IDCG is the ideal DCG. We sort the results manually and calculate the DCG of the query under this best ordering, which is called IDCG.

$$\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(1 + i)} \tag{13}$$
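The measures in Eqs. (10) to (13) can be sketched in Python as follows. This is a minimal illustration under assumptions: the function names are hypothetical, relevance labels are taken to be numeric (for example 1 for “strong” and 0 for “weak”), and the ideal ordering is obtained by sorting the same labels in descending order rather than by the manual sorting described in the text.

```python
import math


def precision(e, f, c, d):
    """Eq. (10): correctly classified relationships over all classified ones."""
    return (e + f) / (c + d)


def recall(e, f, a, b):
    """Eq. (11): per-class recall averaged over the strong and weak classes."""
    return 0.5 * (e / a + f / b)


def dcg(rels, p):
    """Eq. (13): discounted cumulative gain over the top p ranked items."""
    return sum(
        (2 ** rel - 1) / math.log2(1 + i)
        for i, rel in enumerate(rels[:p], start=1)
    )


def ndcg(rels, p):
    """Eq. (12): DCG normalized by the DCG of the ideal ordering."""
    ideal = dcg(sorted(rels, reverse=True), p)
    # Assumption: a list with no relevant items scores 0 instead of dividing by 0.
    return dcg(rels, p) / ideal if ideal else 0.0
```

For instance, a ranking with relevance labels `[1, 0, 1]` has a DCG of 1 + 0 + 1/2 = 1.5 at p = 3, while a ranking that is already in the ideal order, such as `[1, 1, 0]`, has an nDCG of 1.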
The average nDCG is the average of the nDCG values of all the users in one relationship circle.
In order to verify our method, we compare it with the following two methods.
1. An estimation method for relationship strength in weighted social network graphs, developed by Lin et al. [14] (denoted here as “WSNG”). This method is based on a trust propagation strategy.
2. The latent variable model method developed by Xiang et al. [12] (denoted here as “LVM”). It is an unsupervised, link-based latent variable model that estimates the overall relationship strength from interaction activity and user profile similarity.
In our experiment, we only consider the top 20 friends of each target user in each relationship circle. Based on the Wechat Friends Circle dataset, we obtain the comparison of WSNG, LVM and our method in terms of Precision and Recall shown in Table 2.
Table 2 The results of the compared methods and of our method
Method        Precision   Recall
WSNG          0.446       0.305
LVM           0.452       0.350
Our Method    0.679       0.531
As Table 2 shows, our method outperforms WSNG and LVM in both Precision and Recall.
For the weighted social network graphs method, we take the target user as the center and calculate the relationship strength in each relationship circle. Based on the Wechat Friends Circle dataset, we obtain the comparison between this method and ours shown in Fig. 5.
Fig. 5 The average nDCG (NDCG@20) of the WSNG method and our method in each activity area (food, entertainment, sport, travel, shopping, life-service, education, medical, work, other)
In the comparison with the latent variable model, we compare the average nDCG of our method and of the latent variable model method at different depths. The result is shown in Fig. 6.
Fig. 6 The average nDCG of LVM method and our method
Fig. 5 shows that our method is superior to the weighted social network graphs method in each relationship circle, and Fig. 6 shows that our method is better than the latent variable model method, which indicates that our method is feasible and effective. This is because our method not only takes account of the different relationship strengths in different relationship circles, but also comprehensively considers the various factors that affect the relationship strength between Wechat users.
6. Conclusion and future works
In this paper, we proposed a model to calculate the relationship strength between users in Wechat Friends Circle, fusing the similarity of users’ profile information and the interaction between them. The model leverages users’ profile information and interactive activity information, and tries to represent the similarity of users’ needs and interests. We verified our method on a dataset from Wechat Friends Circle, and the results demonstrated the feasibility and effectiveness of our approach. In future work, we will conduct experiments on more users and consider more factors that influence relationship strength, in order to improve the accuracy of the calculation. We will also further explore and apply our research results to more areas, such as personalized recommendation and precision marketing.
Acknowledgments
This research is supported by the National Key Technology R&D Program of China (Grant 2014BAH24F06); the Natural Science Foundation of China (No. 71571162); the Natural Science Foundation of Zhejiang Province (Grant LY14F020002); the Zhejiang Province philosophy and social sciences planning project (No. 16NDJC188YB); and the College Students' science and technology innovation activities of Zhejiang Province (2016R408080). This research is also supported by the Contemporary Business and Trade Research Center of Zhejiang Gongshang University, which is a Key Research Institute of Social Sciences and Humanities of the Ministry of Education (Grant 14SMXY04YB, 14JJD630011, 13YJC630041). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.
References
[1] Granovetter M. The strength of weak ties. American journal of sociology, 1973: 1360-1380.
[2] Lin. F. Lee. H. J, Use of social network information to enhance collaborative filtering performance. Expert Systems with Applications, 2010, 37(7): 4772-4778.
[3] Zhao Yunlong, Li Yanbing. Research on forecasting personality traits and relationship strength of social network users. In Proceeding of the seventh MAM conference on Business intelligence. 2012, 10
[4] Chenxu Li. Socialogical Analysis of Wechat “Circle of friends” Communication Mode. Dongbei University of Finance and Economics, 2014.
[5] Qiyun Chang. WeChat friends circle: an established acquaintance relationship. ,2014,15:45-47.
[6] Zhao X, Yuan J, Li G, et al. Relationship strength estimation for online social networks with the study on Facebook. Neurocomputing, 2012, 95: 89-97.
[7] Gilbert E, Karahalios K. Predicting tie strength with social media//Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2009: 211-220.
[8] Wilson C, Boe B, Sala A, et al. User interactions in social networks and their implications//Proceedings of the 4th ACM European conference on Computer systems. ACM, 2009: 205-218.
[9] B. Viswanath, A. Mislove, M. Cha, K. Gummadi, On the evolution of user interaction in Facebook, in: Proceedings of the 2nd ACM Workshop on Online Social Networks, 2009, pp. 37–42.
[10] M. Granovetter, The strength of weak ties: a network theory revisited, Sociol. Theory 1 (1983) 201–233.
[11] P. Singla, M. Richardson, Yes, there is a correlation: from social networks to personal behavior on the web, in: Proceeding of the 17th International Conference on World Wide Web, 2008, pp. 655–664.
[12] Xiang R, Neville J, Rogati M. Modeling relationship strength in online social networks//Proceedings of the 19th international conference on World wide web. ACM, 2010: 981-990.
[13] Backstrom L, Huttenlocher D, Kleinberg J, et al. Group formation in large social networks: membership, growth, and evolution. Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM New York, NY, USA, 2006: 44-54.
[14] Lin X, Shang T, Liu J. An Estimation Method for Relationship Strength in Weighted Social Network Graphs. Journal of Computer and Communications, 2014, 2(04): 82.
[15] Zhang Zhong-ping, Zhao Hai-liang, Zhang Zhi-hui. Concept Similarity Computation Based on Ontology. Computer Engineering, 2009, 07: 17-19.
[16] Kim H, Kang S, Oh S. Ontology-based quantitative similarity metric for event matching in publish/subscribe system. Neurocomputing, 2015, 152: 77-84.
[17] Mcauley J, Leskovec J. Discovering social circles in ego networks. ACM Transactions on Knowledge Discovery from Data (TKDD), 2014, 8(1): 4.
[18] Leskovec J, Huttenlocher D, Kleinberg J. Predicting positive and negative links in online social networks//Proceedings of the 19th international conference on World wide web. ACM, Stockholm, 2010: 641-650.
[19] Xu Zhi-Ming, Li Dong, Liu Ting, et al. Measuring Similarity between Microblog Users and Its Application. Journal of Computers, 2014, (1): 207-218.
[20] Kaneko T, Yanai K. Event photo mining from twitter using keyword bursts and image clustering. Neurocomputing, 2016, 172: 143-158.
[21] Xin Guo. The Research of Wechat Moments on Interpersonal Communication. Shandong University, 2015.
[22] Nuñez-Gonzalez J D, Graña M, Apolloni B. Reputation features for trust prediction in social networks. Neurocomputing, 2015, 166: 1-7.
[23] Xu K, Zou K, Huang Y, et al. Mining community and inferring friendship in mobile social networks. Neurocomputing, 2016, 174: 605-616.
[24] Wang M, Zuo W, Wang Y. An improved density peaks-based clustering method for social circle discovery in social networks. Neurocomputing, 2016, 179: 219-227.
Chunhua Ju is a professor, doctoral supervisor and division chief of the science and technology department at Zhejiang Gongshang University. His research focuses on intelligent information processing, data mining and E-commerce. He won the “New Century Excellent Talents in University” award of China. In the past several years, he has led more than 6 national projects and has published more than 30 papers indexed by SCI and EI.
Wanqiong Tao is a postgraduate in Zhejiang Gongshang University. Her research focuses on intelligent information processing, data mining and E-commerce.