Flocking based evolutionary computation strategy for measuring centrality of online social networks

Accepted Manuscript

Title: Flocking based Evolutionary Computation Strategy for measuring Centrality of Online Social Networks
Authors: Dhinesh Babu L.D, Ebin Deni Raj
PII: S1568-4946(17)30229-6
DOI: http://dx.doi.org/doi:10.1016/j.asoc.2017.04.047
Reference: ASOC 4180
To appear in: Applied Soft Computing
Received date: 29-7-2015
Revised date: 28-3-2017
Accepted date: 24-4-2017

Please cite this article as: Dhinesh Babu L.D, Ebin Deni Raj, Flocking based Evolutionary Computation Strategy for measuring Centrality of Online Social Networks, Applied Soft Computing Journal, http://dx.doi.org/10.1016/j.asoc.2017.04.047

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Flocking based Evolutionary Computation Strategy for measuring Centrality of Online Social Networks

Dhinesh Babu L.D1 and Ebin Deni Raj2

1 School of Information Technology and Engineering, VIT University, Vellore, India.
2 Indian Institute of Information Technology Kottayam (IIIT-K).

Graphical abstract

[Figure: flow diagram of FBCS. Boid’s rules (separation, interaction, alignment) each feed a procedure that takes the connections and interactions of a node. Interaction procedure: Interaction_vector = Interaction_vector + Connections, then compute the interaction value. Alignment procedure: Alignment_vector = Alignment_vector - Connections/6, then compute the alignment value. Separation procedure: Sepr_vector = sepr_vector(Interaction - Total interaction), then compute the separation value. FBC combines the three values into the centrality values of nodes in OSNs.]

Highlights

We propose a novel and realistic centrality measure, FBCS, for online social networks.
This algorithm is inspired by the flocking behaviour of birds and Boid’s rule.
Results have been tested in computer generated networks.
The new centrality measure has been compared with some of the existing centrality measures.
Algorithmic complexity analysis shows that the proposed algorithm is efficient.

Abstract— Centrality in social network is one of the major research topics in social network analysis. Even though there are more than half a dozen methods to find centrality of a node, each of these methods has some drawbacks in one aspect or the other. This paper analyses different centrality calculation methods and proposes a new swarm based method named Flocking Based Centrality for Social network (FBCS). This new computation technique makes use of parameters that are more realistic and practical in online social networks. The interactions between nodes play a significant role in determining the centrality of node. The new method has been calculated both empirically as well as experimentally. The new method is tested, verified and validated for different sets of random networks and benchmark datasets. The method has been correlated with other state of the art centrality measures. The new centrality measure is found to be realistic and suits well with online social networks. The proposed method can be used in applications such as finding the most prestigious node and for discovering the node which can influence maximum number of users in an online social network. FBCS centrality has higher Kendall’s tau correlation when compared with other state of the art centrality methods. The robustness of the FBCS centrality is found to be better than other centrality measures.

Key Terms— Centrality in social network, Degree of nodes, Online social network analysis, Boid’s Algorithm, Flocking of birds.

1. INTRODUCTION

Online Social Network Sites (OSNs) have become an inseparable part of the daily lives of billions of people. Online Social Network sites can be defined as web-based services that help users to build a public or semi-public profile within a system, to be in touch with other users, and to view and traverse their own connections and those made by others in their list [1] [2]. The names and terminology vary from one social network site to another. OSNs help users to maintain connections and to find new connections. Social networking sites can serve either professional connections (such as LinkedIn) or relationship initiation among
people with similar perspective [3]. One of the key characteristics of social network analysis is that the growth of online social network is random and very dynamic [4]. Complex networks are used to model a variety of real world networks and have distinctive features in terms of global as well as local scenario, which make them different from regular networks [5]. OSN can be considered as one such complex network. All OSNs have a variety of technical features which they claim to be their unique feature, the lynchpin of all sites display a connected list of friends. Most of the OSNs have default techniques that help the users to find other users whom they know offline and strangers with shared likeness and perspective. Social network analysis traces back its roots to sociology, network and graph theory [6] [7]. The increasing popularity of OSNs and sociological studies has set off another research area in computer science named Online Social network analysis [8]. Online Social Network Analysis has different research aspects such as community detection, centrality, trust management, collaborative rating and page rank that help in deriving useful insights from the network [9] [10]. Centrality is one of the most researched areas in social network analysis. A social network can be represented using bipartite graphs, unipartite graphs, concept lattices, rough sets and fuzzy sets. The most common method of representation is by using a set of nodes and edges [11]. The network structure and network formation has a crucial role in social network analysis. Unlike other complex networks, OSNs have scale-free properties and are weakly connected in nature [5]. The data representation in social network analysis is usually by adjacency matrix, edge list and adjacency list for small networks [12]. There are numerous other representation techniques like compressed graph representations and dense sub graphs. [13] [14]. 
One of the simplest network models widely used in social network analysis is the Erdős–Rényi model [15] [16]. There are also various other network models such as the Barabási–Albert model, random encounter and growth model [17] [18] [19]. Social computing and big data go hand in hand in research as billions of bytes of data are generated each second in online social networks. This huge quantity of data can be analysed only by using social network analysis [20] [21]. Various techniques to measure centrality are discussed in the following sections of this paper. In 1994, Wasserman and Faust defined centrality as the actor which has high participation in many relations irrespective of the volume of activity [22]. Today, this definition of centrality cannot be taken fully into consideration, as the centrality of social networks has changed considerably with the advent of online social networks. Online social networks are structured without formal ties or hierarchy, thereby giving everyone on the internet an equal opportunity [23]. In 2005, Borgatti stated that centrality index will
be different for various types of networks and that each index will be appropriate for a particular instance [24]. The main objective of this paper is to propose and validate a realistic centrality measure for discovering the influential nodes in an online social network. From the literature survey, it has been concluded that primitive centrality measures are no match for online social networks. We introduce a new centrality metric, FBCS, which is realistic and well suited to online social networks. The paper is organized into four main sections. The first section discusses the relevance of social networks and the importance of centrality. The second section discusses the related works in centrality and the need for a new centrality technique for OSNs. The third section describes the new centrality metric FBCS. The final section deals with the validation and comparison of the proposed centrality technique FBCS with other centrality measures. We have used Kendall’s tau correlation and robustness to compare the centrality technique with other measures. The new centrality has been compared and validated using benchmark datasets and random networks. The statistical distribution of the new centrality measure is illustrated and compared with standard centrality measures using networks of varying sample sizes.

2. CENTRALITY IN A NETWORK-RELATED WORKS

Centrality in a network is the comparative importance of a particular node among a group of nodes or in the whole graph. The concept of centrality was discovered in the 19th century by Camille Jordan [25]. It points to the most critical node or most influential person in a social network. Centrality plays a crucial role in communication between different nodes in a social network. There are different types of centrality such as degree of a node, betweenness, and closeness [26]. The concept of centrality was applied to human communication by Bavelas [27]. The network representation consists of vertices representing each entity on the network, and edges representing the connections between vertices [28]. Mathematically, it is convenient and computationally easy to represent a network using an adjacency matrix [29].

$$a_{ij} = \begin{cases} 1 & \text{if } i \text{ is connected to } j \\ 0 & \text{if } i \text{ is not connected to } j \end{cases} \qquad (1)$$

where i, j are the nodes of the network. Connected implies that there is an edge from node i to node j. There are no self-connections. The conceptual classification of centrality is discussed in the following section.
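As a minimal illustration of equation (1), the sketch below builds the 0/1 adjacency matrix of a small directed network from an edge list. The function name `adjacency_matrix` and the example edges are hypothetical and are not taken from the paper. The row and column sums are shown only as one common use of such a matrix (out-degree and in-degree).

```python
def adjacency_matrix(n, edges):
    """Return an n x n 0/1 matrix with a[i][j] = 1 if i is connected to j."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            continue  # the text rules out self-connections
        a[i][j] = 1
    return a


# Hypothetical 3-node directed network, used only for illustration.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
A = adjacency_matrix(3, edges)

# Row sums count outgoing connections, column sums count incoming ones.
out_degree = [sum(row) for row in A]
in_degree = [sum(col) for col in zip(*A)]
```

For an undirected network the same function can be called with each edge listed in both directions, which makes the matrix symmetric.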

2.1. Calculation of Centrality-Different approaches

Linton C. Freeman was the first to coin and discuss centrality in social circles, and he proved that the concept of centrality can be used in the context of problem solving [30] [31].

$$C_F = \frac{\sum_{i=1}^{g}\left[C_F(n^{*}) - C_F(n_i)\right]}{(g-1)(g-2)} \qquad (2)$$

where $C_F$ is Freeman’s centrality measure, n is the total number of nodes in the network, and g is the total number of nodes to which a particular node is connected. Eigenvector centrality calculates the local importance of a node within a specific group in the social network [32]. It describes the relevance of a particular node in the social network with respect to a group or community [32] [33]. Let G = (V, E) be a graph with V vertices, let the adjacency matrix be $A = (a_{v,t})$, and let vertex v be linked to vertex t. It can be calculated as

$$C_E = \frac{1}{\vartheta}\sum_{t \in M(v)} x_t = \frac{1}{\vartheta}\sum_{t \in G} a_{v,t}\, x_t \qquad (3)$$

where $C_E$ is the eigenvector centrality of a vertex v. The eigenvectors can be calculated only in a symmetric matrix.

Interestingly more than one node will have the same centrality in some social networks.
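One common way to evaluate equation (3) is power iteration: the adjacency matrix is applied repeatedly to a score vector, which is renormalised after every step. The sketch below is a generic illustration, not the authors' implementation. The function name, the iteration limit, the tolerance and the four-node example network are all assumptions.

```python
import math


def eigenvector_centrality(adj, iterations=100, tol=1e-9):
    """Approximate the leading eigenvector of a symmetric 0/1 matrix."""
    n = len(adj)
    x = [1.0] * n  # start from equal scores
    for _ in range(iterations):
        # x_v <- sum over t of a[v][t] * x[t], as in equation (3)
        nxt = [sum(adj[v][t] * x[t] for t in range(n)) for v in range(n)]
        norm = math.sqrt(sum(val * val for val in nxt))
        if norm == 0:
            return x  # network without edges: nothing to iterate on
        nxt = [val / norm for val in nxt]
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical network: a triangle 0-1-2 with an extra node 3 attached to 0.
adj = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
scores = eigenvector_centrality(adj)
```

In this example nodes 1 and 2 occupy symmetric positions and therefore receive the same score. Plain power iteration can oscillate on bipartite networks, so the example deliberately contains a triangle.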

The eigenvector centrality is a significant indicator of information flow over the social network. It depicts the big players in the social network pertaining to a particular region or group. The eigenvector centrality in a social network is calculated by power laws and power iteration methods [34]. The social network is represented as an adjacency matrix, and the eigenvalue and eigenvector are calculated iteratively and renormalized to find the centrality. The computational cost of eigenvector centrality in very large networks is on the higher side as it uses the power method [35]. Bonacich proposed another type of centrality where the centrality of a node depends not only on the degree of that node but also on the centrality measure of its neighbours. This centrality takes into account all possible paths between the nodes in the complex network. It estimates a factor named popularity score of each node by solving a set of M linear equations, where M is the number of nodes [36] [37]. The popularity score could be just the degree of the node, which is the number of persons a node is associated with in the social network.

$$C(\alpha, \beta) = \alpha (I - \beta R)^{-1} R \mathbf{1} \qquad (4)$$

where $C(\alpha, \beta)$ is the Bonacich centrality, α is the scaling factor used to normalize the centrality, and β is the weight of each node in the network. R is the adjacency matrix, I is the identity matrix, and $\mathbf{1}$ is a matrix with all ones. The Bonacich power centrality is closely related to eigenvector centrality [37] [32]. The Bonacich centrality is generally used while dealing with small connected networks such as a community or group social network. The S-Z index centrality was inspired by the concept of electrical conductance in an electrical network [38]. This measure depends on the amount of communicable information between two pairs of individuals over a network. The computation is based on the inverse of an M x M matrix, where M is the size of the network.

2.2. Centrality Types

This section analyses different centrality measures and the various factors each centrality considers for calculation. Centrality can be measured using different approaches such as degree centrality, closeness centrality, betweenness centrality, and eigenvector centrality [39]. These centrality measures are used in various complex networks. Table 2 shows the metrics for deciding the selection of a centrality metric in a network. This table considers both social networks and other complex networks. The acronyms used in this table are described in Table 1. These centrality measures are known as global measures of centrality. In all of these measures, the number of nodes a particular node is in contact with plays a crucial role in calculating its centrality value. A close examination of each of the mentioned centralities gives the impression that the number of ties a particular node holds is directly proportional to the centrality of the node. However, centrality in online social networks depends on the number of ties as well as the frequency of communication between the nodes. Closeness centrality deals with the shortest distance between any two nodes and calculates the shortest distance between one particular node and all the other nodes [40]. Closeness centrality becomes insignificant on a static network or in a disconnected network. It tries to reach the destination node in the least number of hops. A node should know the information of all its near neighbour nodes for calculating the minimum distance.
Addition of a node to a network will either increase the value of centrality or remain the same but will never decrease.
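The shortest-distance computation behind closeness centrality can be sketched with a breadth-first search. The text does not spell out a normalisation, so this example assumes the common form (n - 1) divided by the sum of hop distances from the node. Returning zero for a node that cannot reach the whole network is also an assumption, chosen to mirror the remark that closeness is insignificant in a disconnected network.

```python
from collections import deque


def closeness(adj_list, source):
    """Closeness of `source` in an unweighted network given as adjacency lists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj_list[u]:
            if w not in dist:
                dist[w] = dist[u] + 1  # one more hop than the parent
                queue.append(w)
    total = sum(dist.values())
    if total == 0 or len(dist) < len(adj_list):
        return 0.0  # isolated node, or some node is unreachable
    return (len(adj_list) - 1) / total


# Hypothetical path network 0 - 1 - 2: the middle node is closest to the others.
graph = {0: [1], 1: [0, 2], 2: [1]}
middle = closeness(graph, 1)
end = closeness(graph, 0)
```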

Closeness centrality is usually measured using a symmetric matrix or a directed matrix. Closeness centrality is insignificant in an egocentric designed network [41]. Both degree centrality and closeness centrality can be called node centrality [42]. Closeness centrality has immense applications in chemistry and molecular biology complex networks. Betweenness centrality depends on the shortest paths available between nodes in a network. The goal is to create a set of shortest paths such that each of the nodes in the network can be reached in minimum time [43]. A large number of shortest paths crossing a node leads to a higher betweenness centrality for that node. The effectiveness of betweenness centrality in a dynamic network such as an online social network is very little as
the task is computationally demanding and at times not viable. Betweenness centrality is an individual local measure in a complex network [44]. Various betweenness centrality are prevalent such as edge betweenness and group betweenness [45]. Node betweenness is computationally very expensive on weighted complex networks [46]. The criteria, as shown in table 1 and table 2 consist of number of nodes, position, structure of network, shortest path, assortative mixing, path discovery and fellowship development [47] [48]. Assortative mixing is defined as the tendency of nodes to get connected to other nodes with many connections. Fellowship development can be defined as the strong liking to grow friendly association with other nodes in the network [47]. Each of the existing centrality measures will satisfy some of the criteria listed in table 1. Degree centrality can also be defined as the vulnerability of a node. As the number of nodes increases more is the probability of being the centre of importance in the case of degree centrality [49]. In social computing, degree centrality solely cannot determine the influential person of the network [50]. A person with thousands of connections or nodes will be insignificant if there is no interaction with other nodes at any point of time. As shown in table 2 the degree centrality can answer only two of the criteria namely number of nodes and position of the node. Degree centrality will be useful only in static networks. It can be either out-degree (connections leaving a node) centrality or in-degree (connections coming to a node) centrality in a directed network. The degree centrality of a network can be computed as the row sum of the adjacency matrix [51]. It is known as the local topological property of a node in a network [52]. Betweenness centrality is widely used for community detection in complex networks. 
The nodes that occur on the shortest paths between other nodes have higher centrality than nodes that fall less frequently on the shortest paths. As shown in Table 1, betweenness centrality depends on number of nodes, path discovery, shortest path and position of the node in the network. Table 1 gives the detailed description of the acronyms used in Table 2. Like closeness centrality, betweenness centrality also requires local information of nodes [53]. In an online social network, information/communication may not travel over the shortest path and depends on many other factors such as frequency of communication, fellowship building, trust and interactions with other nodes [54]. Shortest path centralities are irrelevant in the online social network context [46]. Communicability centrality is again derived from the concept of shortest path, which both betweenness centrality and closeness centrality make use of [55]. Communicability centrality takes into account not only the shortest path between nodes but also all the paths between nodes. It is mostly used for community detection in social networks [56]. Table 2 shows that communicability centrality considers the number of nodes in the social network, path discovery in the social network, shortest path among nodes, fellowship development and the position of the node in the social network.

Eigenvector centrality is calculated as a relative score based on the number of connections a node has with other nodes and is compared to a global score that is dynamically calculated [57]. This centrality takes the whole network topology into account and calculates the global score for comparison with the score obtained by a single node in the network. It is also known as the Gould index. It can be calculated as the weighted row sum of the adjacency matrix, where the weights correspond to the centralities of the column nodes. Table 2 shows that eigenvector centrality considers the number of nodes, fellowship development, structure of network, position and assortative mixing. Load centrality has some similarities with betweenness centrality as it calculates centrality from the number of paths via a particular node, i.e. the more paths pass through a node, the greater the centrality of that node [58]. It helps to calculate the power vested in each node in the network and its capacity. Stress centrality is another variant of betweenness centrality. The weighting factor in betweenness centrality is dropped, and stress centrality is calculated. It can be applied even to disconnected networks. Table 2 shows that stress centrality depends on the number of nodes, shortest paths, path discovery, structure of network and the position of the node. Edge centrality is the technique in which the most travelled edges are considered to be central and important [52]. It is widely used in the area of knowledge based systems and is usually calculated using random walks on graphs. Centrality measures on a network are divided into various families. The division is based on the node selection criteria for measuring centrality.
The first family of centrality measures depends on the nodes near to them, and the second family of centrality measures depends only on a central node while other nodes play the role of intermediaries [59]. Degree centrality and closeness fall into the first family while betweenness belongs to the latter. Most of the centrality measures depend on the position and the number of nodes in the network. As already discussed in the introduction, not all centrality measures are suited for all types of network [60]. The most widely used centrality measures in complex networks are degree, betweenness and eigenvector [61]. The position metric is not important in OSNs as there is no hierarchy in social networks. In an online social network any node is able to communicate with another node to which it is connected. As noted, since there are no hierarchy or formal ties in OSNs, if a node is well connected and communicates well with other nodes, that particular node will be more popular in the network. This paper proposes a new centrality measure which accounts for assortative mixing, fellowship development, and number of nodes. Unfortunately, none of the existing centrality measures satisfies all three criteria. Communicability centrality satisfies the criteria of fellowship development and number of nodes but fails to satisfy assortative mixing. This paper proposes and validates a realistic centrality measure inspired by the flocking behaviour of birds and Boid’s rule.

3. FLOCKING BASED CENTRALITY FOR SOCIAL NETWORKS (FBCS): THE REALISTIC CENTRALITY

The proposed algorithm for determining the node possessing the highest centrality value in an online social network, Flocking based Centrality for Social network (FBCS), is inspired by the flocking behaviour of birds. A flock is defined as a group of entities that demonstrate an aligned, non-colliding and aggregate motion [62]. The flocking behaviour is autonomous and dynamic just like a social network, and there is no defined leader in a flock [63] [64]. Collective motion is described as the coordinated behaviour displayed by animals. They behave as one body and change shape and direction along with the group [65]. Some examples include birds, fishes and bees. Each user/node is independent and interacts with others according to its local perception of the dynamic social network [66]. All the entities in a flock stick to the local rules, and there is no defined global rule as such. Birds in a flock have the propensity to stick to the centre of the flock, and this fundamental principle is used to formulate the centrality of a node. The separation maintained in the flock helps in avoiding collisions with other birds in the flock. The birds adapt and tune their velocity to that of their neighbours to make the flock movement uniform. These three rationales, namely cohesion, alignment and separation, make it possible for a flock to move without commotion. A boid is a simulated bird-like object which is prominently used in many flocking based algorithms. The Boids rule, as stated by Reynolds, is derived using these three characteristics as follows:

a) Collision avoidance: Avert collision with other birds.
b) Velocity matching: Match the velocity of neighbouring birds.
c) Cohesion: The tendency to remain close to neighbouring birds.

The three rules stated above make it possible for a flock to bind around a centre of mass and move forward in a smooth flow [67].
The behaviour of boids can be compared to that of cellular neural networks as each of the boids behaves autonomously and gets local neighbourhood information [68]. The detailed proof of
mathematical derivation can be found in the Appendix section. Online social networks can be compared to the flocking of birds in many aspects. In a social network different people have assorted ideas and opinions, and there is a tendency to stick to the views of people with similar preferences and opinions [69] [70]. People tend to give credence to an idea/ideology which is shared by a majority of people. This is analogous to the alignment rule in Boids rule. Users in a social network tend to gather around other users whom they like or respect, which is similar to cohesion in Boids rule. In a social network, users stay aloof from other users for whom they have contempt and never move towards them. This is similar to the collision avoidance rule in Boids rule [62] [68]. The observations derived from the above rules can be utilized to frame the centrality measuring algorithm. The rules can be modelled on to a social network as the interaction rule, separation rule and cohesion rule. None of the centrality measures discussed in the previous sections takes into consideration the number of interactions between the nodes/users. The interactions between the nodes play a crucial role in finding the centrality of each node. The number of hops between users/nodes can be directly related to the number of connections (degree) of a node [71] [72]. For two nodes A and B, $K_A$ and $K_B$ are the numbers of connections. The total number of interactions in the social network can be calculated and is defined as n. The interactions in a social network can be expressed as in equation (40) below.

Interaction rule:

$$d(K_A, K_B) \le d_1 \,\cap\, d(K_A, K_B) \ge d_2 \;\Rightarrow\; V_{INTR} = \frac{1}{n}\sum_{K_A}^{n} V_A \quad \text{if } K_A > K_B \qquad (40)$$

where $d_1$ and $d_2$ are the set of predefined hop values from A to B and B to A, and $d(K_A, K_B)$ is a degree function of the number of connections of A and B. In social networks users try to nestle together with other users whom they like or respect and try to move away from the people they dislike. The separation rule can be derived from this behaviour in online social networks. For two nodes A and B, the number of connections can be defined as $K_A$ and $K_B$, and $V_A$ and $V_B$ can be the interactions of A and B respectively.

Separation rule:

$$d(K_A, K_B) \le d_2 \;\Rightarrow\; V_{SEPR} = \sum_{1}^{n} \frac{V_A + V_B}{d(K_A + K_B)} \qquad (41)$$

where $d_2$ is the predefined hop value that is greater than zero and n is the total number of interactions in the whole social network. Online social network users tend to follow and interact with the users they like and respect [65]. This behaviour of the users in an online social network can be used to derive the cohesion rule of the social network. For two nodes A and B, the number of connections is $K_A$ and $K_B$, and n is the total interactions in the network.

Cohesion rule:

$$d(K_A, K_B) \le d_1 \,\cap\, d(K_A, K_B) \ge d_2 \;\Rightarrow\; V_{REL} = \sum_{A}^{n} (K_A - K_B), \quad \text{where } K_A > K_B \qquad (42)$$

where $d_2$ is the set of predefined hop values. The three rules, namely interaction, separation and cohesion, form the basis of FBCS.

Algorithm: Flocking Based Centrality for social network

1. Initialize N nodes in network
2. Calculate the total number of interactions (total_intr) in the whole network N, at this instant.
3. for i ← 1 to N do
   a. Gather all nodes to set C that are within the reach of Node_i (in one hop).
   b. Compute the relative vector (VR) of Node_i using the PROCEDURE Relative vector (Node N_i, C).
   c. Compute the average interaction vector (VI) of Node_i using the PROCEDURE Interaction (Node N_i, C).
   d. Compute the separation vector (VS) using the PROCEDURE Separation (Node N_i, C, total_intr).
   e. centrality_node_i = total_intr / (VR - VI + VS + total_intr).
4. End For
5. centrality_network = centrality_node_0
6. for i ← 0 to N do
   a. if (centrality_network < centrality_node_i)
      i. centrality_network = centrality_node_i
   b. End If
7. End For
8. centrality_network = Calculated value in step 6

PROCEDURE Separation(Node NJ, C, NTOTAL.interaction)
    Vector sepr = 0
    min = Minimum value of interaction
    FOR EACH NODE 1 to N
        IF N != NJ THEN
            IF |NTOTAL.interaction - NJ.interaction| < min THEN
                sepr = sepr - (N.interaction - NJ.interaction)
            ELSE
                Continue to the next node.
            END IF
        END IF
    END
    RETURN sepr
END PROCEDURE

The average interaction computation

PROCEDURE Interaction(Node NJ, C)
    Vector PCJ = 0
    FOR EACH NODE 1 to N
        IF N != NJ THEN
            PCJ = PCJ + N.Connections
        ELSE
            Continue to the next node.
        END IF
    END
    PCJ = PCJ / (N - 1)
    RETURN (PCJ - N.Connections) / min
END PROCEDURE

The relative computation

PROCEDURE Relative vector(Node NJ, C)
    Vector PVJ = 0
    FOR EACH NODE 1 to N
        IF N != NJ THEN
            PVJ = PVJ + N.connections
        ELSE
            Continue to the next node.
        END IF
    END
    PVJ = PVJ / (N - 1)
    RETURN (PVJ - NJ.connections) / 6
END PROCEDURE
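The algorithm and its three procedures can be sketched in Python as follows. This is a reading of the printed pseudocode under several assumptions, not the authors' implementation. The `Node` fields are hypothetical. The total interaction count is taken as the sum of per-node interaction counts. The loops run over every other node, as printed, so the one-hop set C is not used. The interaction procedure subtracts the connections of node NJ, mirroring the relative procedure. The separation condition is transcribed literally. The constants 100 and 6 are the minimum interaction and the degree of separation stated in the text.

```python
from dataclasses import dataclass

MIN_INTERACTION = 100      # minimum interaction threshold stated in the text
DEGREES_OF_SEPARATION = 6  # divisor used for the relative value


@dataclass
class Node:
    connections: int  # degree of the node (hypothetical field name)
    interaction: int  # interaction count of the node (hypothetical field name)


def mean_other_connections(nodes, j):
    # Average connection count over every node except node j.
    others = [n.connections for k, n in enumerate(nodes) if k != j]
    return sum(others) / len(others)


def relative(nodes, j):
    return (mean_other_connections(nodes, j) - nodes[j].connections) / DEGREES_OF_SEPARATION


def interaction(nodes, j):
    # Assumption: the node's own connections are subtracted, as in `relative`.
    return (mean_other_connections(nodes, j) - nodes[j].connections) / MIN_INTERACTION


def separation(nodes, j, total_intr):
    sepr = 0
    for k, n in enumerate(nodes):
        if k == j:
            continue
        # Condition transcribed literally from the printed procedure.
        if abs(total_intr - nodes[j].interaction) < MIN_INTERACTION:
            sepr -= n.interaction - nodes[j].interaction
    return sepr


def fbcs(nodes):
    """Return one centrality score per node (needs at least two nodes)."""
    total_intr = sum(n.interaction for n in nodes)  # assumed definition of total_intr
    scores = []
    for j in range(len(nodes)):
        vr = relative(nodes, j)
        vi = interaction(nodes, j)
        vs = separation(nodes, j, total_intr)
        scores.append(total_intr / (vr - vi + vs + total_intr))
    return scores


# Hypothetical three-node network, used only to exercise the functions.
nodes = [Node(3, 40), Node(1, 10), Node(2, 30)]
scores = fbcs(nodes)
# Steps 5 to 8 of the algorithm: the network centrality is the largest score.
most_central = max(range(len(scores)), key=scores.__getitem__)
```

The sketch does not guard against a zero denominator in the score, which the pseudocode also leaves open.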

The nodes in an online social network have a number of connections to other nodes, and accordingly the interaction count varies from one node to another. For experimental purposes, the social network nodes were created using the Prescribed Node Degree, Connected Graph (PNDCG) algorithm. For each of the nodes, values are calculated using the three procedures defined along with the algorithm. These values are used to compute the centrality of the nodes. The computed centrality values of the nodes are compared with one another, and the largest value is selected as the centrality of the network. In the separation procedure, a minimum interaction (both in and out) is set for calculating the Separation value. Nodes that do not satisfy this criterion have a Separation value of zero. The Separation value can be considered to be a function of both the degree of the node and the number of interactions a node possesses. The average interaction value of each node is calculated on the basis of the number of connections a user/node has. In calculating the separation value, the minimum interaction was set as one hundred, and the same value is used in calculating the interaction value of a node. The relative value of a node is a function of the connections of a node. The degree of separation in a social network is defined as the number of hops or connections through which any two unknown or stranger nodes/users can be made to meet. There are different surveys and research findings on the degree of separation. The degree of separation can be called the distance between nodes. It can also be called the friendship distance. The six degrees of separation was proposed by Frigyes Karinthy in the early 20th century [73]. The friendship distance or degree of separation of various social networks varies considerably, and each of them follows a different value. For calculating the relative value, we have considered the value six as the degree of separation. Figure 1 shows the graphical representation of the FBCS algorithm.

4. RESULTS AND DISCUSSION

We used the PNDCG algorithm to generate the social network data required for the analysis of the proposed algorithm FBCS. The PNDCG algorithm is specifically designed for generating sample social networks for online social network analysis [74]. The proposed algorithm was tested on eight sample networks of different sizes and densities, and was also tested using benchmark datasets. Assortative mixing is very significant in social networks and it varies with different network datasets [48]. The PNDCG algorithm generates a graph which is weakly connected and has lower assortative mixing values than other graph generation algorithms [74]. The parameters in the algorithm, namely the interaction vector, separation vector and relative vector, were calculated using the generated networks, and centrality was calculated as shown in the FBCS algorithm.

The parameters described previously were calculated, and the centrality was computed using the proposed algorithm, Flocking Based Centrality for Social networks (FBCS). The centrality values of each set of sample nodes are spread across a region, as depicted by the density plots in Figure 3 and Figure 4. The peak in a density plot denotes the region containing the largest number of nodes with the same centrality. The peak varies for each set of sample nodes. The FBCS algorithm is found to have a computational complexity of O(VI), where V is the number of nodes to which a particular node is attached and I is the total interaction of the node. For a fixed number of connections, the computational cost therefore grows only linearly with the interaction count, so it remains low even when the interaction is very high.

Figure 3 depicts the centrality values of a sample of 900 nodes and highlights node 470 as the node with the largest centrality value in the network. The highest computed centrality is 0.5030. The sample data set consists of 900 nodes with parameters such as interaction vector, separation vector and relative vector. Analyses have been performed on sample sizes varying from 200 to 900 nodes. Figure 2 shows the different centrality values for each of the sample data.

A. Computational Comparison

For every sample network taken for analysis, we computed eight centrality measures including the realistic FBCS centrality. The correlation between pairs of centrality measures can be computed with the help of the correlation coefficient.

Centrality values were initially calculated for three model networks of sizes 50, 100 and 200. The mean correlation of each of the centrality measures is calculated from the three sample networks. The centrality measures taken into consideration were In Degree, Out Degree, Degree, Betweenness, Closeness, Edge, Eigen vector and FBCS. The values of correlation vary between -1 and 1, where 1 implies that the two variables are totally correlated and -1 that they are totally anti-correlated [42]. Table 3 shows the values of the Spearman correlation coefficient for the different centrality measures discussed in the paper. The various centrality measures discussed in this paper have been deduced by applying mathematical computation on the underlying adjacency matrix of the network dataset. A high correlation between two centrality measures would imply that one of them is redundant. On the contrary, if the measures are not highly correlated, each one of them is a distinct measure with different results. The correlation coefficient shows that there is a high degree of correlation between FBCS and Eigen vector, and a very low degree of correlation between Betweenness and FBCS. The Eigen vector centrality is highly correlated with the Degree centrality measure. This paper has considered only traditional state of the art centrality measures for social networks, such as Degree, Betweenness, Closeness, Edge and Eigen vector, for correlation with the new centrality measure [23]. We also calculated the average correlation values from the three sample networks of varying sizes. The overall correlation among the 8 measures was 0.31 (SD=0.28). Figure 5 shows the correlation values of different centrality measures in the form of scatterplots. The curve obtained for each centrality measure is rendered along with the scatterplots. It is clear from the figure that no two curves are similar and that the FBCS curve differs from all the others. Table 3 shows the comparison of centrality measures and Table 4 shows the correlation comparison of different centralities.
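The Spearman coefficient used in this comparison can be illustrated with a short sketch. It assumes the standard definition (the Pearson correlation of the rank vectors, with tied values receiving average ranks); the two score lists are hypothetical and are not data from the paper.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical centrality scores for five nodes (not data from the paper).
degree = [0.8, 0.6, 0.4, 0.2, 0.1]
fbcs = [0.50, 0.45, 0.30, 0.10, 0.20]
rho = spearman(degree, fbcs)
```

In this toy example the two measures disagree only on the order of the last two nodes, which gives a coefficient of 0.9. Applying `spearman` to every pair of measures would produce a correlation table of the kind reported in Table 3.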

The mathematically formulated theory (see the Appendix for details) and the results from the implementation of the algorithm can be compared using Q-Q plots, as shown in Figure 6. The Q-Q plot shows the variation between the FBCS centrality values that were calculated mathematically and those that were computed experimentally. The black dotted lines depict the experimentally calculated centrality values.

The FBCS algorithm was tested and verified on five benchmark datasets of varying sizes, as shown in Table 5. The datasets are the Dolphins Social Network, Zachary’s Karate Club Network, American Football, Neural Network and Political blog network. Table 5 shows the node label and the value of the corresponding centrality measure. Each of the data sets was analysed in terms of different centrality measures and compared with existing knowledge about important nodes from associated published literature, to review the role of each centrality measure in the determination of influential nodes [75]. Figure 7 shows the different centrality measures for the Dolphins Social Network. This is an undirected network created by D. Lusseau et al. [76] showing the associations between 62 dolphins. The network consists of 62 vertices and 159 edges. Here, nodes with higher centralities are drawn larger and coloured differently. The node with the highest centrality is the same for degree, eigen vector and FBCS. For betweenness centrality and closeness centrality, the nodes with the highest centrality are unique.

Figure 8 shows the network representation of the American football benchmark dataset. This is a network of the American football league, compiled by Girvan and Newman from the schedule of games between Division IA colleges [77]. Each team is assigned to a conference/community of 8-12 teams. Teams in the same community play an average of 17 games. The network comprises 115 vertices and 615 edges. Here the nodes with greater centrality values are highlighted in red. Degree centrality has the same value for multiple nodes, whereas for the other centrality measures, namely betweenness centrality, closeness centrality and eigen vector centrality, the highest value is unique. For the FBCS measure, Texas Tech is the node with the highest centrality.

Figure 9 shows the network representation of the benchmark dataset Zachary’s Karate Club Network. It is a real world network of friendships between 34 members of a karate club and is a common benchmark for most community detection algorithms [78]. The network is said to have been compiled by Zachary over a period of two years. Over those two years, the club members split into two groups due to disagreements between the instructor and the club owner. This benchmark dataset has 34 nodes and 78 edges. Here, the node with the highest centrality is the same for eigen vector centrality, degree centrality and FBCS, while closeness centrality and betweenness centrality select different nodes.

Figure 10 shows a random layout of the benchmark dataset of the neural network of the nematode Caenorhabditis elegans, compiled by Duncan Watts and Steven Strogatz from the original experimental data of White et al. [79]. This network contains 297 nodes and 2359 edges. As shown in Figure 10, node ‘305’ has the highest centrality in all cases except for closeness centrality.

Figure 11 shows the network representation of the political blogs benchmark dataset. The political blog network represents political blogs in the US around the 2004 election [80]. The data were recorded by Adamic and Glance in 2005. The network consists of two groups, namely ‘liberal’ and ‘conservative’. It is a directed network in which each node is labelled either conservative or liberal. The political blog network consists of 1490 nodes and 19090 links. Here the node with the highest centrality value for Degree centrality, betweenness centrality and FBCS is ‘Blogsforbush’. The nodes with the highest closeness and eigen vector centralities are different. The experiments conducted on benchmark datasets clearly demonstrate that not all centralities are equal. The experiments also show that nodes possessing high degree centrality tend to have high eigen vector centrality and FBCS centrality. The statistical correlation in Figure 5 indicates that FBCS centrality is not similar to other existing centrality measures. The experiments on benchmark datasets confirm this and support accepting FBCS as a new centrality measure which takes into account fellowship development, number of nodes and assortative mixing.
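The observation that different measures can select different top nodes can be reproduced on a very small example. The sketch below uses the textbook definitions of degree and closeness centrality on a connected, undirected graph; the toy graph (two hubs "S" and "T" joined through a bridge node "m") and all function names are hypothetical, and FBCS itself is not computed here.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_centrality(adj):
    """Degree divided by the maximum possible degree n - 1."""
    n = len(adj)
    return {u: len(adj[u]) / (n - 1) for u in adj}


def closeness_centrality(adj):
    """(n - 1) divided by the sum of distances; assumes a connected graph."""
    n = len(adj)
    result = {}
    for u in adj:
        total = sum(bfs_distances(adj, u).values())
        result[u] = (n - 1) / total if total > 0 else 0.0
    return result


# Hypothetical toy graph: hubs S and T with four leaves each, joined via m.
edges = [("S", f"s{i}") for i in range(4)] + [("T", f"t{i}") for i in range(4)]
edges += [("S", "m"), ("m", "T")]
adj = build_graph(edges)

deg = degree_centrality(adj)
clo = closeness_centrality(adj)
top_degree = sorted(u for u in deg if deg[u] == max(deg.values()))
top_closeness = max(clo, key=clo.get)
```

Here the two hubs tie for the highest degree centrality, while the bridge node has the highest closeness centrality, mirroring the kind of disagreement between measures reported for the benchmark datasets.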

To evaluate the performance of the different centrality measures, the influence spreading capacity of the nodes ranked by each centrality is analysed. Kendall’s tau correlation coefficient (τ) is used to evaluate the ranking scores of the different centrality measures [81]. This ranking method captures the nodes’ influence in spreading a process/information throughout the network. A higher value of τ indicates better performance. We consider the susceptible-infected-recovered (SIR) spreading model for analysing the influence of nodes in the network [82]. For calculating Kendall’s tau correlation coefficient, the infection probability is set to β = 1.5 β_c, where β_c is the approximate epidemic threshold. Table 6 shows the Kendall’s tau correlation coefficient (τ) between the ranking scores given by various centrality measures on different datasets. FBCS has a higher correlation value than the other centrality measures when tested with the various datasets. The highest value in each dataset is emphasized in bold. The average correlations of the different centrality measures are plotted against FBCS in Figure 12. The figure shows that FBCS outperforms the other centrality measures across the datasets.
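The evaluation procedure above can be sketched as follows. This is an illustration under stated assumptions rather than the experimental setup of the paper: the SIR process is a simple discrete-time version in which an infected node recovers after one step, Kendall's tau is the tau-a variant (no tie correction), the infection probability `beta` is passed in directly instead of being derived from an epidemic threshold, and the 6-node network is hypothetical.

```python
import random
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / pairs


def sir_outbreak_size(adj, seed, beta, rng):
    """Run one SIR outbreak from `seed`; return the number of recovered nodes.

    ASSUMPTION: each infected node tries to infect every susceptible
    neighbour with probability `beta`, then recovers after one step.
    """
    infected, recovered = {seed}, set()
    while infected:
        newly = set()
        for u in infected:
            for v in adj[u]:
                susceptible = v not in infected and v not in recovered
                if susceptible and v not in newly and rng.random() < beta:
                    newly.add(v)
        recovered |= infected
        infected = newly
    return len(recovered)


def spreading_influence(adj, beta, runs=200, seed=0):
    """Average outbreak size when each node in turn is the initial spreader."""
    rng = random.Random(seed)
    return {
        u: sum(sir_outbreak_size(adj, u, beta, rng) for _ in range(runs)) / runs
        for u in adj
    }


# Hypothetical 6-node undirected network (not one of the paper's datasets).
edges = [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5)]
adj = {u: set() for u in range(6)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

nodes = sorted(adj)
degree = [len(adj[u]) for u in nodes]          # stand-in centrality score
influence = spreading_influence(adj, beta=0.5)
tau = kendall_tau(degree, [influence[u] for u in nodes])
```

Replacing `degree` with the scores of any other centrality measure gives the τ value for that measure; a measure whose ranking agrees more closely with the simulated spreading influence obtains a larger τ.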

Robustness is another measure that is commonly used to quantify the performance of centrality measures in networks. It is defined as the area under the curve of the difference between the giant component in a network and the critical portion of the network. Mathematically, it is the area under (σ – p), where ‘σ’ is the giant component and ‘p’ is the critical portion of the network. It is calculated as

$$ R = \frac{1}{n} \sum_{i=1}^{n} \sigma(i/n) $$

where σ(i/n) is the size of the giant component after removing a fraction i/n of the nodes from the complex network. A smaller value of R means that the algorithm performs better than the other algorithms. The robustness R has been calculated for five different benchmark datasets and compared with four state of the art centrality approaches, as shown in Table 7. The lowest value for each benchmark dataset is emphasized in bold. The average value of robustness for each benchmark dataset has been computed and compared with the FBCS algorithm. Figure 13 shows that the FBCS algorithm has a lower value than the average value of all the other centrality measures. Overall, the FBCS algorithm performs better than all the other algorithms.

5. CONCLUSION

The applications of centrality in social networks include political view analysis and leader determination from a group of nodes. The FBCS method can be used effectively in social networking sites such as Facebook, Twitter and LinkedIn, where it is useful for identifying emerging and popular leaders. As discussed and substantiated, the proposed centrality, named Flocking Based Centrality for Social networks (FBCS), takes the most realistic and practical parameters into consideration. The algorithm is derived from the flocking behaviour of birds, which mimics a social network in many aspects such as interaction, separation and relative factor. The FBCS method can be used to identify the most powerful node in a given network.

The high computational complexity of most centrality measures makes them impractical for dynamic networks such as online social networks. Online social networks require a centrality measure that is realistic, practical and fast. The computational complexity of FBCS is O(VI), where V is the total number of connections a node possesses and I is the number of interactions the node has with other nodes. In this paper we have introduced an algorithmic framework, FBCS, for computing the centrality of a complex network by taking all practical and realistic factors into consideration. This algorithm, as demonstrated throughout the paper, promises to scale efficiently to larger numbers of nodes. The algorithm has been tested and validated on five benchmark datasets, namely the Dolphins Social Network, Zachary’s Karate Club Network, American Football, Neural Network and Political blog network. The experiments show that the new centrality measure is different from the existing state of the art centrality measures.

The new centrality measure has been validated using Kendall’s tau correlation and robustness. The FBCS algorithm outperformed all the other state of the art algorithms, with an average correlation value of 0.746. Its average robustness value is the lowest among the centrality measures compared. Thus, these two evaluation methods indicate that this new algorithm can be used for influence maximization over online social networks.

APPENDIX

Mathematical Evaluation of Centrality in Social Networks

A realistic centrality measure is bound to account for fellowship development, assortative mixing and the number of nodes. The goal of the mathematical model is to define and validate the factors used in this realistic centrality. We define a characteristic function on the basis of whether or not a node has the highest centrality. The indicator function that denotes whether or not a node A has the highest centrality value in the network is defined as

$$ I(A) = \begin{cases} 1 & \text{if the centrality is the highest} \\ 0 & \text{if the centrality is not the highest} \end{cases} \tag{5} $$

The centrality indicator function of two separate nodes A and B is that of a joint event:

$$ I(A \cap B) = I(A)\, I(B) \tag{6} $$

The complementary centrality indicator function corresponds to the event that a node is not the one with the highest centrality:

$$ I(A^{C}) = 1 - I(A) \tag{7} $$

Consider two nodes A and B in the same social network, with connections to n nodes $A_1, \dots, A_n$ and m nodes $B_1, \dots, B_m$ respectively. Let T be a subset of the connection set of A and B:

$$ T \subset \{(i, j) : i = 1, \dots, n,\; j = 1, \dots, m\} \tag{8} $$

Let $v_1$ and $v_2$ be the number of interactions among $\{A_i, i = 1, \dots, n\}$ and $\{B_j, j = 1, \dots, m\}$ of A and B respectively.

The two functions that can be correlated with $v_1$ and $v_2$ are defined below. The interaction vector function is defined as

$$ G_1(v_1, v_2) = \sum_{(i,j) \in T} c_{ij} \binom{v_1}{i} \binom{v_2}{j} \tag{9} $$

The separator function is stated as

$$ G_2(v_1, v_2) = \sum_{(i,j) \in T} d_{ij} \binom{v_1}{i} \binom{v_2}{j} \tag{10} $$

From (9) and (10) we can state that

$$ G_1(v_1, v_2) \le I(v_1 \ge 1, v_2 \ge 1) \le G_2(v_1, v_2) \tag{11} $$

where $c_{ij}$ and $d_{ij}$ are coefficients that may depend on n and m but not on the interaction set itself. Since

$$ E\big(I(v_1 \ge 1, v_2 \ge 1)\big) = P(v_1 \ge 1, v_2 \ge 1) \tag{12} $$

and

$$ E\left[\binom{v_1}{i} \binom{v_2}{j}\right] = S_{ij} \tag{13} $$

taking the expectation of (11), in conjunction with (12) and (13), we get

$$ \sum_{(i,j) \in T} c_{ij}\, E\left[\binom{v_1}{i} \binom{v_2}{j}\right] \le E\big(I(v_1 \ge 1, v_2 \ge 1)\big) \le \sum_{(i,j) \in T} d_{ij}\, E\left[\binom{v_1}{i} \binom{v_2}{j}\right] \tag{14} $$

Inequality (14) is equivalent to the following inequality:

$$ \sum_{(i,j) \in T} c_{ij} S_{ij} \le P(v_1 \ge 1, v_2 \ge 1) \le \sum_{(i,j) \in T} d_{ij} S_{ij} \tag{15} $$

In (15), the left-hand side is a lower Bonferroni-type bound while the right-hand side is an upper Bonferroni-type bound [83] [84]. Thus, considering the four Bonferroni quantities [85] [86]:

$$ S_{11} = E(v_1 v_2) \tag{16} $$

$$ S_{12} = E\left[\frac{v_1 v_2 (v_2 - 1)}{2}\right] \tag{17} $$

$$ S_{21} = E\left[\frac{v_2 v_1 (v_1 - 1)}{2}\right] \tag{18} $$

$$ S_{22} = E\left[\frac{v_1 (v_1 - 1)\, v_2 (v_2 - 1)}{4}\right] \tag{19} $$
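As a hedged illustration of the quantities in (16) to (19), the sketch below estimates each S_ij as the sample average of a product of binomial coefficients, following the definition in (13). The helper name `binomial_moment` and the sample interaction counts are hypothetical and are not taken from the paper.

```python
from math import comb


def binomial_moment(samples, i, j):
    """Sample estimate of S_ij = E[C(v1, i) * C(v2, j)].

    `samples` is a list of (v1, v2) interaction counts (non-negative ints).
    """
    return sum(comb(v1, i) * comb(v2, j) for v1, v2 in samples) / len(samples)


# Hypothetical interaction counts for three observations of nodes A and B.
samples = [(2, 3), (1, 0), (4, 2)]
s11 = binomial_moment(samples, 1, 1)   # E[v1 * v2]
s12 = binomial_moment(samples, 1, 2)   # E[v1 * v2 * (v2 - 1) / 2]
s21 = binomial_moment(samples, 2, 1)   # E[v1 * (v1 - 1) / 2 * v2]
s22 = binomial_moment(samples, 2, 2)   # E[v1 * (v1 - 1) * v2 * (v2 - 1) / 4]
```

Because C(v, 1) = v and C(v, 2) = v(v - 1)/2, the four calls correspond term by term to the expressions inside the expectations in (16) to (19).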

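The relative function of $v_1$ and $v_2$ can be derived as follows:

$$ H(v_1, v_2) = a\, v_1 v_2 + b\, \frac{v_1 (v_1 - 1)\, v_2}{2} + c\, \frac{v_1 v_2 (v_2 - 1)}{2} + d\, \frac{v_1 (v_1 - 1)\, v_2 (v_2 - 1)}{4} \tag{20} $$

for $0 \le v_1 \le n$, $0 \le v_2 \le m$ and $H(v_1, v_2) \le 1$. The constants a, b, c, d can be evaluated using the method of indicator functions.

From (20) it is clear that $H(0, v_2) = H(v_1, 0) = 0$, and

$$ H(v_1, v_2) \le I(v_1 \ge 1, v_2 \ge 1) \tag{21} $$

$$ E\big(H(v_1, v_2)\big) \le P(v_1 \ge 1, v_2 \ge 1) \tag{22} $$

Applying (16) to (19),

$$ E\big(H(v_1, v_2)\big) = a\, E\left[\binom{v_1}{1}\binom{v_2}{1}\right] + b\, E\left[\binom{v_1}{2}\binom{v_2}{1}\right] + c\, E\left[\binom{v_1}{1}\binom{v_2}{2}\right] + d\, E\left[\binom{v_1}{2}\binom{v_2}{2}\right] \tag{23} $$

$$ = a S_{11} + b S_{21} + c S_{12} + d S_{22} \tag{24} $$

If we take the connections of nodes A and B to be $k_1$ and $k_2$, from equations (14) to (21) we can state that

$$ P(v_1 \ge 1, v_2 \ge 1) \ge \frac{2}{k_1 k_2} S_{11} - \frac{2}{k_1 k_2^2} S_{12} - \frac{2}{k_1^2 k_2} S_{21} \tag{25} $$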

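Converting equation (25) by using the method of indicator functions gives the condition

$$ f(v_1, v_2) = v_1 v_2 \,(2 k_1 k_2 + k_1 + k_2 - k_1 v_2 - k_2 v_1) \le (k_1 k_2)^2 \tag{26} $$

for $1 \le v_1 \le n$ and $1 \le v_2 \le m$. Treating $v_1$ and $v_2$ in (25) as real values,

$$ f(v_1, v_2) = v_1 v_2 \,(2 k_1 k_2 + k_1 + k_2 - k_1 v_2 - k_2 v_1) \tag{27} $$

Taking the partial derivatives of equation (27),

$$ f'_{v_1} = v_2 \,(2 k_1 k_2 + k_1 + k_2 - k_1 v_2 - k_2 v_1) - k_2 v_1 v_2 \tag{28} $$

and

$$ f'_{v_2} = v_1 \,(2 k_1 k_2 + k_1 + k_2 - k_1 v_2 - k_2 v_1) - k_1 v_1 v_2 \tag{29} $$

Letting $f'_{v_1} = 0$ and $f'_{v_2} = 0$, we have

$$ v_1^0 = \frac{2 k_1 k_2 + k_1 + k_2}{3 k_2} \quad \text{and} \quad v_2^0 = \frac{2 k_1 k_2 + k_1 + k_2}{3 k_1} \tag{30} $$

For $v_1 > 0$ and $v_2 > 0$,

$$ f''_{v_1 v_1}(v_1^0, v_2^0)\, f''_{v_2 v_2}(v_1^0, v_2^0) - \big(f''_{v_1 v_2}(v_1^0, v_2^0)\big)^2 = \frac{1}{3} (2 k_1 k_2 + k_1 + k_2)^2 > 0 \tag{31} $$

$$ f''_{v_1 v_1}(v_1^0, v_2^0) = \frac{-2 k_2^2 \,(2 k_1 k_2 + k_1 + k_2)}{3 k_1 k_2} \le 0 \tag{32} $$

$$ f''_{v_2 v_2}(v_1^0, v_2^0) = \frac{-2 k_1^2 \,(2 k_1 k_2 + k_1 + k_2)}{3 k_1 k_2} \le 0 \tag{33} $$

From equations (29) and (30), together with the second-order conditions (31) to (33), it is possible to infer that the maximum value of f, attained at $(v_1^0, v_2^0)$, will be

$$ \left(\frac{2 k_1 k_2 + k_1 + k_2}{3}\right)^{3} \frac{1}{k_1 k_2} = (k_1 k_2)^2 \left(\frac{2}{3} + \frac{1}{3 k_1} + \frac{1}{3 k_2}\right)^{3} \le (k_1 k_2)^2 \tag{34} $$

for $k_1 \ge 2$ and $k_2 \ge 2$.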
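The stationary point in (30) and the bound in (34) can be checked with exact rational arithmetic. The sketch below is only a numerical sanity check of the derivation for one hypothetical pair of connection counts (k1 = 3, k2 = 5); the function names are illustrative.

```python
from fractions import Fraction


def f(v1, v2, k1, k2):
    """f(v1, v2) from equation (27)."""
    return v1 * v2 * (2 * k1 * k2 + k1 + k2 - k1 * v2 - k2 * v1)


def grad_f(v1, v2, k1, k2):
    """Partial derivatives of f with respect to v1 and v2, as in (28) and (29)."""
    m = 2 * k1 * k2 + k1 + k2 - k1 * v2 - k2 * v1
    return v2 * m - k2 * v1 * v2, v1 * m - k1 * v1 * v2


def critical_point(k1, k2):
    """Stationary point (30), kept exact with Fraction."""
    m = 2 * k1 * k2 + k1 + k2
    return Fraction(m, 3 * k2), Fraction(m, 3 * k1)


k1, k2 = 3, 5   # hypothetical connection counts, both >= 2
v1_0, v2_0 = critical_point(k1, k2)

# Value of f at the stationary point and the closed form on the right of (34).
f_max = f(v1_0, v2_0, k1, k2)
closed_form = (k1 * k2) ** 2 * (
    Fraction(2, 3) + Fraction(1, 3 * k1) + Fraction(1, 3 * k2)
) ** 3

# Check the bound (26) on a grid of integer interaction counts.
bound_holds = all(
    f(a, b, k1, k2) <= (k1 * k2) ** 2
    for a in range(1, 50)
    for b in range(1, 50)
)
```

For these values both partial derivatives vanish at the stationary point, `f_max` equals `closed_form`, and neither exceeds (k1 k2)^2, which agrees with (30) and (34).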

Thus

$$ \max_{v_1 \ge 1,\; v_2 \ge 1} f(v_1, v_2) \le (k_1 k_2)^2 \tag{35} $$

It can be rewritten as

$$ \frac{2}{k_1 k_2}\, v_1 v_2 - \frac{2}{k_1^2 k_2}\, \frac{v_1 (v_1 - 1)\, v_2}{2} - \frac{2}{k_1 k_2^2}\, \frac{v_1 v_2 (v_2 - 1)}{2} \le I(v_1 \ge 1, v_2 \ge 1) \tag{36} $$

The expectation of the inequality in (36) gives

$$ P(v_1 \ge 1, v_2 \ge 1) \ge \frac{2}{k_1 k_2} S_{11} - \frac{2}{k_1 k_2^2} S_{12} - \frac{2}{k_1^2 k_2} S_{21} \tag{37} $$

which is the same as equation (25).

The left-hand side of (37) can be called the interaction function, and the right-hand side the predicted separation function. Equations (24) to (37) can be used to find the separation vector and the interaction vector for a sample network of one hundred nodes, which can then be plotted. The perspective plot obtained by plotting the interaction function and the separator function is shown in Figure 2. From Equation (20) we know that $H(v_1, v_2) \le 1$; the relative function plotted along with the interaction function in (37) and the number of nodes is also depicted in Figure 2. The relative function $H(v_1, v_2)$, the interaction function $G_1(v_1, v_2)$ and the separator function $G_2(v_1, v_2)$ can be used to calculate the centrality of the social network:

$$ \text{Centrality}(C) = \frac{v_1 + v_2}{H(v_1, v_2) - G_1(v_1, v_2) + G_2(v_1, v_2) + v_1 + v_2} \tag{38} $$
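The centrality expression in (38) is a simple ratio and can be evaluated as in the sketch below. The function name `fbcs_centrality` and the numeric inputs are hypothetical; in practice the values of H, G1 and G2 would come from (20), (9) and (10) for the node pair under consideration.

```python
def fbcs_centrality(v1, v2, h, g1, g2):
    """Evaluate (38): (v1 + v2) / (H - G1 + G2 + v1 + v2).

    v1, v2 : interaction counts of the two nodes
    h      : value of the relative function H(v1, v2)
    g1, g2 : values of the interaction and separator functions
    """
    denominator = h - g1 + g2 + v1 + v2
    if denominator == 0:
        raise ValueError("centrality is undefined when the denominator is zero")
    return (v1 + v2) / denominator


# Hypothetical values chosen only to exercise the formula.
c = fbcs_centrality(v1=3, v2=2, h=0.5, g1=1.0, g2=2.5)
```

With these inputs the denominator is 0.5 - 1.0 + 2.5 + 5 = 7, so the centrality is 5/7. When H - G1 + G2 is small relative to the total interaction count, the ratio approaches 1.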

where $v_1 + v_2$ denotes the total number of interactions of the two nodes A and B. The derived formula for centrality can be applied to all nodes in the social network after calculating the connections and interactions of each node with respect to all the other nodes in the social network. In a dynamic network where the nodes and connections vary quickly, the centrality changes according to the alteration in the network. In a given period of time, the most influential node can turn into a less influential node or vice versa. Let C(x) be the centrality of a social network which contains n nodes/users:

$$ C(x) = \begin{pmatrix} c_1(x) \\ c_2(x) \\ c_3(x) \\ \vdots \\ c_n(x) \end{pmatrix} \tag{39} $$

The centrality might change over a period of time as more and more connections and interactions happen in the social network. A dynamic social environment can be denoted as

$$ \int_{W(t)} C(x)\, dx \tag{40} $$

where W(t) is a function of time. The solution to (40) can be obtained as

$$ \frac{d}{dt} \int_{W(t)} C(x)\, dx \tag{41} $$

Let the difference in centrality be taken over a small period h. The change from W(t) to W(t+h) can be calculated as

$$ \lim_{h \to 0} \frac{1}{h} \left\{ \int_{W(t+h)} C(x, t+h)\, dx - \int_{W(t)} C(x)\, dx \right\} \tag{42} $$

The changed centrality of different social networks will be $y = x + h\, C(x, t) + o(h)$, where h is the time period and o(h) is the error term:

$$ \begin{vmatrix} 1 + h \Delta x \cdot u_1 & h \Delta x \cdot u_2 & h \Delta x \cdot u_3 \\ h \Delta y \cdot u_1 & 1 + h \Delta y \cdot u_2 & h \Delta y \cdot u_3 \\ h \Delta z \cdot u_1 & h \Delta z \cdot u_2 & 1 + h \Delta z \cdot u_3 \end{vmatrix} \tag{43} $$

The matrix obtained in (43) represents the changed centrality of a social network over a period of time. The interaction vector and separation vector for a sample dataset of 100 nodes are plotted in Figure 15. The interaction vector and the relative vector for a sample set of one hundred nodes of a social network are plotted in Figure 14. The 3D scatter plot of a sample of 200 nodes is displayed in Figure 16.

References

[1] N. B. Ellison, “Social network sites: Definition, history, and scholarship,” Journal of Computer‐Mediated Communication, vol. 13, no. 1, pp. 210-230, 2007.
[2] N. B. Ellison, C. Steinfield and C. Lampe, “The benefits of Facebook “friends:” Social capital and college students’ use of online social network sites,” Journal of Computer‐Mediated Communication, vol. 12, no. 4, pp. 1143-1168, 2007.
[3] S. A. Vannoy and P. Palvia, “The social influence model of technology adoption,” Communications of the ACM, vol. 53, no. 6, pp. 149-153, 2010.
[4] A. Mislove, M. Marcon, G. Krishna P, P. Druschel and B. Bhattacharjee, “Measurement and analysis of online social networks,” in Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, San Diego, CA, 2007.
[5] C.-Y. Lee and S. Jung, “Statistical self-similar properties of complex networks,” Physical Review, vol. 73, no. 6, p. 066102, 2006.
[6] J. Scott, “Social network analysis,” Sociology, vol. 22, no. 1, pp. 109-127, 1988.
[7] M. E. J. Newman, D. J. Watts and S. H. Strogatz, “Random graph models of social networks,” in Proceedings of the National Academy of Sciences 99, United States of America, 2002.
[8] A. Mislove, M. Massimiliano, G. Krishna P, P. Druschel and B. Bhattacharjee, “Measurement and analysis of online social networks,” in 7th ACM SIGCOMM conference on Internet measurement, San Diego, CA, 2007.
[9] E. D. Raj and L. D. Babu, “Effective Detection of Modular and Granular Overlaps in Online Social Networks Using Fuzzy ART,” International Journal of Fuzzy Systems, pp. 1-16.
[10] J. Bonneau, J. Anderson, R. Anderson and F. Stajano, “Eight friends are enough: social graph approximation via public listings,” in Proceedings of the Second ACM EuroSys Workshop on Social Network Systems, Nuremberg, 2009.
[11] E. D. Raj and L. D. Babu, “A fuzzy adaptive resonance theory inspired overlapping community detection method for online social networks,” Knowledge-Based Systems, vol. 113, pp. 75-87, 2016.

[12] D. A. Schult and P. J. Swart, “Exploring network structure, dynamics, and function using NetworkX,” in Proceedings of the 7th Python in Science Conferences (SciPy 2008), Pasadena, 2008.
[13] C. Hernández and G. Navarro, “Compressed representation of web and social networks via dense subgraphs,” String Processing and Information Retrieval, pp. 264-276, 2012.
[14] F. Claude and S. Ladra, “Practical representations for web and social graphs,” in Proceedings of the 20th ACM international conference on Information and knowledge management, Glasgow, 2011.
[15] E. D. Raj and L. D. Babu, “A firefly swarm approach for establishing new connections in social networks based on big data analytics,” International Journal of Communication Networks and Distributed Systems, vol. 15, no. 23, pp. 130-148, 2015.
[16] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai and A.-L. Barabási, “The large-scale organization of metabolic networks,” Nature, vol. 407, no. 6804, pp. 651-654, 2000.
[17] A.-L. Barabási and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, no. 5439, pp. 509-512, 1999.
[18] B. Skyrms and R. Pemantle, “A dynamic model of social network formation,” Adaptive Networks, pp. 231-251, 2009.
[19] O. Mandelshtam, “https://people.maths.ox.ac.uk/porterm/research/olga_final1.pdf,” 26 September 2007. [Online]. Available: https://people.maths.ox.ac.uk/porterm/research/olga_final1.pdf. [Accessed 27 April 2015].
[20] E. D. Raj, L. D. Babu and E. Ariwa, “A Fuzzy Approach to Centrality and Prestige in Online Social Networks,” in Proceedings of the International Conference on Informatics and Analytics, Pondichery, 2016.
[21] S. Lohr, “The age of big data,” New York Times, 11 February 2012.
[22] S. Wasserman and K. Faust, Social network analysis: Methods and applications, USA: Cambridge University Press, 1994.
[23] G. Barbian, “Trust Centrality in Online Social Networks,” in Intelligence and Security Informatics Conference (EISIC), Athens, 2011.
[24] S. P. Borgatti, “Centrality and network flow,” Social Networks, vol. 27, pp. 55–71, 2005.
[25] P. Hage and F. Harary, “Eccentricity and centrality in networks,” Social Networks, vol. 17, no. 1, pp. 57-63, 1995.
[26] E. D. Raj and L. D. Babu, “A model fuzzy inference system for online social network analysis,” in 2015 International Conference on Computing and Network Communications (CoCoNet), Trivandrum, 2015.
[27] A. Bavelas, “A mathematical model for group structures,” Human Organization, vol. 7, no. 3, pp. 16-30, 1948.
[28] C.-Y. Lee, “Correlations among centrality measures in complex networks,” 25 May 2006. [Online]. Available: http://www.citebase.org/abstract?id=oai:arxiv.org:physics/0605220. [Accessed 15 April 2015].
[29] R. A. Hanneman and M. Riddle, “Introduction to social network methods,” 21 August 2005. [Online]. Available: http://faculty.ucr.edu/hanneman/nettext/. [Accessed 05 April 2015].
[30] L. C. Freeman, “Centrality in social networks conceptual clarification,” Social Networks, vol. 1, no. 3, pp. 215-239, 1979.
[31] L. C. Freeman, “A set of measures of centrality based on betweenness,” Sociometry, pp. 35-41, 1977.
[32] B. Ruhnau, “Eigenvector-centrality – a node-centrality?,” Social Networks, vol. 22, no. 4, pp. 357-365, 2000.
[33] S. P. Borgatti, “Centrality and AIDS,” Connections, vol. 18, no. 1, pp. 112-114, 1995.
[34] S. P. Borgatti, C. Jones and M. G. Everett, “Network measures of social capital,” Connections, vol. 21, no. 2, pp. 27-36, 1988.
[35] M. Romance, “Local estimates for eigenvector-like centralities of complex networks,” Journal of Computational and Applied Mathematics, vol. 235, no. 7, pp. 1868-1874, 2011.
[36] S. Wasserman and K. Faust, Structural analysis in the social sciences, Cambridge: Cambridge University Press, 1994.
[37] P. Bonacich, “Power and centrality: A family of measures,” American Journal of Sociology, pp. 1170-1182, 1987.
[38] R. Poulin, M.-C. Boily and B. R. Mâsse, “Dynamical systems to define centrality in social networks,” Social Networks, vol. 22, no. 3, pp. 187-220, 2000.
[39] J. Galaskiewicz, “Estimating point centrality using different network sampling techniques,” Social Networks, vol. 13, no. 4, pp. 347-386, 1991.
[40] M. G. Everett and S. P. Borgatti, “Induced, endogenous and exogenous centrality,” Social Networks, vol. 32, no. 4, pp. 339-344, 2010.
[41] D. C. Bell, J. S. Atkinson and J. W. Carlson, “Centrality measures for disease transmission networks,” Social Networks, vol. 21, no. 1, pp. 1-21, 1999.
[42] J. R. F. Ronqui and G. Travieso, “Analyzing complex networks through correlations in centrality measurements,” 2014. [Online]. Available: arXiv preprint arXiv:1405.7724. [Accessed 5 March 2015].

[43] O. Green and D. A. Bader, “Faster betweenness centrality based on data structure experimentation,” in Procedia Computer Science 18, Barcelona, 2013.
[44] T. Wey, D. T. Blumstein, W. Shen and F. Jordán, “Social network analysis of animal behaviour: a promising tool for the study of sociality,” Animal Behaviour, vol. 75, no. 2, pp. 333-344, 2008.
[45] P. S. Nair and S. T. Sarasamma, “Data mining through fuzzy social network analysis,” in Fuzzy Information Processing Society, 2007. NAFIPS'07. Annual Meeting of the North American, San Diego, 2007.
[46] T. Alahakoon, R. Tripathi, N. Kourtellis, R. Simha and A. Iamnitchi, “K-path centrality: A new centrality measure in social networks,” in Proceedings of the 4th Workshop on Social Network Systems, Salzburg, 2011.
[47] U. M. Dholakia, R. P. Bagozzi and L. K. Pearo, “A social influence model of consumer participation in network- and small-group-based virtual communities,” International Journal of Research in Marketing, vol. 21, no. 3, pp. 241-263, 2004.
[48] M. E. Newman, “Assortative mixing in networks,” Physical Review Letters, vol. 89, no. 20, p. 208701, 2002.
[49] M. Chen, K. Kuzmin and B. K. Szymanski, “Community detection via maximization of modularity and its variants,” IEEE Transactions on Computational Social Systems, vol. 1, no. 1, 2014.
[50] K. Faust, “Centrality in affiliation networks,” Social Networks, vol. 19, no. 2, pp. 157-191, 1997.
[51] C. Correa, T. Crnovrsanin and K.-L. Ma, “Visual reasoning about social networks using centrality sensitivity,” IEEE Transactions on Visualization and Computer Graphics, vol. 18, no. 1, pp. 106-120, 2012.
[52] P. De Meo, E. Ferrara, G. Fiumara and A. Ricciardello, “A novel measure of edge centrality in social networks,” Knowledge-Based Systems, vol. 30, pp. 136-150, 2012.
[53] M. R. Lee and T. T. Chen, “Understanding Social Computing Research,” IT Professional, vol. 15, no. 2, pp. 56-62, 2013.
[54] E. D. Raj and L. D. Babu, “An enhanced trust prediction strategy for online social networks using probabilistic reputation features,” Neurocomputing, vol. 219, pp. 412-421, 2017.
[55] E. Estrada and N. Hatano, “Communicability in complex networks,” Physical Review, vol. E77, no. 3, p. 036111, 2008.
[56] E. D. Raj and L. D. Babu, “A fuzzy adaptive resonance theory inspired overlapping community detection method for online social networks,” Knowledge-Based Systems, vol. 113, pp. 75-87, 2016.
[57] A. Cuzzocrea, A. Papadimitriou, D. Katsaros and Y. Manolopoulos, “Edge betweenness centrality: A novel algorithm for QoS-based topology control over wireless sensor networks,” Journal of Network and Computer Applications, vol. 35, no. 4, pp. 1210-1217, 2012.
[58] K.-I. Goh, B. Kahng and D. Kim, “Universal behavior of load distribution in scale-free networks,” Physical Review Letters, vol. 87, no. 27, p. 278701, 2001.
[59] J. Scott, Social Network Analysis: A Handbook, London: Sage, 2000.
[60] C. Amrit and J. T. Maat, “Understanding Information Centrality Metric: A Simulation Approach,” 2013. [Online]. Available: http://www.researchgate.net/profile/Chintan_Amrit/publication/246548351_Understanding_Information_Centrality_Metric_A_Simulation_Approach/links/0deec537cb5242eeae000000.pdf. [Accessed 20 April 2015].
[61] T. W. Valente, “Social network thresholds in the diffusion of innovations,” Social Networks, vol. 18, no. 1, pp. 69-89, 1996.
[62] J. T. Emlen, “Flocking behavior in birds,” The Auk, pp. 160-170, 1952.
[63] X. Cui, J. Gao and T. E. Potok, “A flocking based algorithm for document clustering analysis,” Journal of Systems Architecture, vol. 52, no. 8, pp. 505-515, 2006.
[64] “Boids Pseudocode,” [Online]. Available: http://www.kfish.org/boids/pseudocode.html. [Accessed 24 February 2015].
[65] N. W. Bode, A. J. Wood and D. W. Franks, “Social networks and models for collective motion in animals,” Behavioral Ecology and Sociobiology, vol. 65, no. 2, pp. 117-130, 2011.
[66] “Modeling Opinion Flow in Humans Using Boids Algorithm & Social Network Analysis,” [Online]. Available: http://www.gamasutra.com/view/feature/1815/modeling_opinion_flow. [Accessed 21 January 2015].
[67] M. Sajwan, D. Gosain and S. Surani, “Flocking Behaviour Simulation: Explanation and Enhancements in Boid Algorithm,” International Journal of Computer Science & Information Technologies, vol. 5, no. 4, 2014.
[68] S. Alaliyat, H. Yndestad and F. Sanfilippo, “Optimisation of Boids Swarm Model Based on Genetic Algorithm and Particle Swarm Optimisation Algorithm (Comparative Study),” in Proceedings 28th European Conference on Modelling and Simulation, Brescia, 2014.
[69] P. Domingos and M. Richardson, “Mining the network value,” in 7th ACM SIGKDD Int. Conf. on Knowledge discovery and data, San Francisco, 2001.

[70] D. Watts, “Challenging the influentials hypothesis,” WOMMA Measuring the mouth, vol. 3, pp. 201-211, 2007.
[71] R. Bakhshandeh, M. Samadi, Z. Azimifar and J. Schaeffer, “Degrees of separation in social networks,” in Fourth Annual Symposium on Combinatorial Search, Barcelona, 2011.
[72] C. W. Reynolds, “Flocks, herds and schools: A distributed behavioral model,” ACM Siggraph Computer Graphics, vol. 21, no. 4, pp. 25-34, 1987.
[73] B. Doerr, M. Fouz and T. Friedrich, “Why rumors spread so quickly in social networks,” Communications of the ACM, vol. 55, no. 6, pp. 70-75, 2012.
[74] J. F. Morris, J. W. O'Neal and R. F. Deckro, “A random graph generation algorithm for the analysis of social networks,” The Journal of Defense Modeling and Simulation: Applications, Methodology, Technology, pp. 1–12, 2013.
[75] K. Batool and M. A. Niazi, “Towards a methodology for validation of centrality measures in complex networks,” PloS one, vol. 9, no. 4, p. e90283, 2014.
[76] K. D. Lusseau and B. Schneider, “The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations,” Behav. Ecol. Sociobiol, vol. 54, no. 4, pp. 396–405, 2003.
[77] M. Girvan and M. Newman, “Community structure in social and biological networks,” Proc. Natl. Acad. Sci, vol. 99, no. 12, pp. 7821–7826, 2002.
[78] W. Zachary, “An information flow model for conflict and fission in small groups,” J. Anthropol. Res., vol. 33, pp. 452–473, 1977.
[79] D. J. Watts and S. H. Strogatz, “Collective dynamics of small-world networks,” Nature, vol. 393, no. 1, pp. 440–442, 1998.
[80] L. A. Adamic and N. Glance, “The political blogosphere and the 2004 US election: divided they blog,” in Proceedings of the 3rd international workshop on Link discovery, 2005.
[81] L. Lü, D. Chen, X.-L. Ren, Q.-M. Zhang, Y.-C. Zhang and T. Zhou, “Vital nodes identification in complex networks,” Physics Reports, vol. 650, no. 1, pp. 1-63, 2016.
[82] R. M. Anderson, R. M. May and B. Anderson, Infectious diseases of humans: dynamics and control, Oxford: Oxford University Press, 1992.
[83] R. J. Simes, “An improved Bonferroni procedure for multiple tests of significance,” Biometrika, vol. 73, no. 3, pp. 751-754, 1986.
[84] Y. Hochberg, “A sharper Bonferroni procedure for multiple tests of significance,” Biometrika, vol. 75, no. 4, pp. 800-802, 1988.
[85] E. Seneta, “Degree, iteration and permutation in improving Bonferroni‐type bounds,” Australian Journal of Statistics, vol. 30, no. 1, pp. 27-38, 1988.
[86] J. Chen, Multivariate Bonferroni-type Inequalities: Theory and Applications, Boca Raton: CRC Press, 2014.

Figure 1: Graphical Representation of FBCS calculation

[Flow diagram. BOID's rule branches into three procedures (Separation, Interaction and Alignment), fed by inputs labelled Interactions and Connections. Procedure Interaction: Interaction_vector = Interaction_vector + Connections, then compute Interaction value. Procedure Alignment: Alignment_vector = Alignment_vector - Connections/6, then compute Alignment value. Procedure Separation: Sepr_vector = Sepr_vector - (Interaction - Total interaction), then compute Separation value. The computed values feed FBCS, which outputs the centrality values of nodes in OSNs.]

Figure 2: Density plot of centrality values

Figure 3: Centrality values of different nodes in a sample size of 900 nodes

Figure 4: Density plot of centrality values

Figure 5: Correlation of different centrality measures compared with FBCS

Figure 6: Q-Q plot of centrality values

Figure 7: Dolphins Social Network. (a) Degree Centrality (b) Betweenness Centrality (c) Closeness centrality (d) Eigen vector centrality (e) FBCS centrality

Figure 8: American Football Network (a) Degree Centrality (b) Betweenness Centrality (c) Closeness centrality (d) Eigen vector centrality (e) FBCS centrality

Figure 9: Zachary’s Karate Club Network (a) Degree Centrality (b) Betweenness Centrality (c) Closeness centrality (d) Eigen vector centrality (e) FBCS centrality

Figure 10: Neural Network (a) Degree Centrality (b) Betweenness Centrality (c) Closeness centrality (d) Eigen vector centrality (e) FBCS centrality

Figure 11: Political Blog (a) Degree Centrality (b) Betweenness Centrality (c) Closeness centrality (d) Eigen vector centrality (e) FBCS centrality

Figure 12: Comparison of average Kendall’s tau correlation of other algorithms with FBCS

Figure 13: Comparison of average robustness R of other algorithms with FBCS

Figure 14: Perspective plot of Interaction vector and Relative vector of 100 nodes

Figure 15: Perspective plot of interaction vector and Separation vector of 100 nodes

Figure 16: 3D scatter plot of Relative vector, Interaction vector and separation vector of 200 nodes


TABLE 1: CENTRALITY MEASURES DESCRIPTION

Acronym used | Description
N            | Number of nodes
F            | Fellowship Development
SP           | Shortest Path
PD           | Path Discovery
SN           | Structure of Network
P            | Position
AM           | Assortative Mixing

TABLE 2: COMPARISON OF CENTRALITY MEASURES

Centrality Measures | N | F | SP | PD | SN | P | AM
Communicability     |   |   |    |    | X  |   | X
Betweenness         |   | X |    | X  | X  |   |
Closeness           |   | X |    |    | X  |   | X
Eigen Vector        |   | X | X  | X  |    |   |
Degree              |   | X | X  | X  | X  |   | X
Stress              |   | X |    |    |    |   | X
Load                |   | X | X  |    |    | X |
Edge                |   | X | X  |    |    |   | X

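TABLE 3: COMPARISON OF CENTRALITY MEASURES

Columns in the original: Centrality Index; Information (Local/Global: L, G); Dynamic; Computational Complexity; Definition. The marks in the Information and Dynamic columns are not recoverable from the extracted text.

Centrality Index | Computational Complexity | Definition
In Degree        | O(V^2)                   | $DEG_{in}(v) = \sum_{j=1}^{\lvert V \rvert} V_{ji}$
Out Degree       | O(V^2)                   | $DEG_{out}(v) = \sum_{j=1}^{\lvert V \rvert} V_{ij}$
Degree           | O(V^2)                   | $D = \frac{deg(i)}{N-1}$
Betweenness      | O(VE)                    | $BC = \frac{2}{(N-1)(N-2)} \sum_{j \neq k \neq i} \frac{d_{j,k}(i)}{d_{j,k}}$
Closeness        | O(V (log V) E)           | $CC = \frac{N-1}{\sum_{j \in G,\, j \neq i} d_{i,j}}$
Edge             | O(Ek)                    | $EdC = \sum_{s \in V} \frac{\sigma_{sk}(e)}{\sigma_{sk}}$
Eigen Vector     | O(V^3)                   | $EC = \frac{1}{\lambda} \sum_{j \in G} \alpha_{i,j} \, c_j^{EC}$
FBCS             | O(VI)                    | $FC = \frac{I}{VR - VI + VS + I}$

N = total number of nodes; V = vertices; E = edges; VR = value of relative vector; VI = value of interaction vector; VS = value of separation vector; I = value of total interaction; $d_{i,j}$ = shortest path length from i to j.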
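As an illustration of the definitions in Table 3, the following minimal Python sketch evaluates the Degree and Closeness formulas on an unweighted, connected graph, and the FBCS expression read as FC = I / (VR - VI + VS + I). The function names, the adjacency-list representation and the scalar inputs to `fbcs_score` are assumptions made for this example; how VR, VI and VS are obtained from the flocking procedures is not shown here.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_centrality(adj, i):
    """D = deg(i) / (N - 1)."""
    return len(adj[i]) / (len(adj) - 1)


def closeness_centrality(adj, i):
    """CC = (N - 1) / sum of d(i, j) over j != i; assumes a connected graph."""
    dist = bfs_distances(adj, i)
    total = sum(d for node, d in dist.items() if node != i)
    return (len(adj) - 1) / total


def fbcs_score(total_interaction, vr, vi, vs):
    """FC = I / (VR - VI + VS + I), with all four inputs given as scalars (assumed)."""
    return total_interaction / (vr - vi + vs + total_interaction)


# Hypothetical three-node path graph 0 - 1 - 2.
path_graph = {0: [1], 1: [0, 2], 2: [1]}
print(degree_centrality(path_graph, 1))
print(closeness_centrality(path_graph, 0))
print(fbcs_score(10.0, 4.0, 2.0, 3.0))
```

The sketch uses breadth-first search because the example graph is unweighted; a weighted network would need a different shortest-path routine.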

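TABLE 4: SPEARMAN CORRELATION COEFFICIENT OF CENTRALITY MEASURES

  | Measure      | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8
1 | In Degree    |      |      |      |      |      |      |      |
2 | Out Degree   | 0.3  |      |      |      |      |      |      |
3 | Degree       | 0.78 | 0.71 |      |      |      |      |      |
4 | Betweenness  | 0.62 | 0.54 | 0.7  |      |      |      |      |
5 | Closeness    | 0.55 | 0.16 | 0.45 | 0.37 |      |      |      |
6 | Edge         | 0.21 | 0.86 | 0.61 | 0.44 | 0.41 |      |      |
7 | Eigen vector | 0.71 | 0.69 | 0.92 | 0.64 | 0.44 | 0.25 |      |
8 | FBCS         | 0.26 | 0.33 | 0.45 | -0.1 | 0.22 | 0.46 | 0.61 |
  | Average      | 0.49 | 0.54 | 0.62 | 0.33 | 0.35 | 0.35 | 0.60 | 0.31
  | SD           | 0.23 | 0.26 | 0.19 | 0.31 | 0.11 | 0.14 | 0.23 | 0.28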
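For readers who want to reproduce a coefficient of the kind reported in Table 4, the sketch below computes a Spearman rank correlation between two lists of centrality scores in plain Python. It is a generic illustration (average ranks for ties, then Pearson correlation of the ranks), not the authors' code, and the helper names are assumptions.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores of four nodes under two centrality measures.
print(spearman([0.2, 0.5, 0.1, 0.9], [3, 7, 1, 8]))
```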

TABLE 5: COMPARISON OF DIFFERENT CENTRALITY MEASURES ON VARIOUS DATASETS

Entries are given as (node label, value).

Dataset                       | Degree                              | Betweenness            | Closeness                   | Eigen vector    | FBCS
Dolphins Social Network       | Grin, 12                            | SN100, 454.27          | Zig, 5.60                   | Grin, 1.0       | Grin, 0.0321
Zachary’s Karate Club Network | 34, 17                              | 1, 231.07              | 17, 3.51                    | 34, 1.0         | 34, 0.100
American Football             | Multiple nodes with same centrality | Notredame, 215.9       | Oregon State, 2.8           | Nevada, 1.0     | Texas Tech, 0.0096
Neural Network                | 305, 134                            | 305, 13246.3           | 49, 3.425                   | 305, 1.0        | 305, 0.033
Political blog network        | Blogsforbush, 467                   | Blogsforbush, 72997.9  | Quimundus.modblog.com, 5.12 | Dailykos.com, 1 | Blogsforbush, 0.011

TABLE 6: KENDALL’S TAU CORRELATION COEFFICIENT (τ) BETWEEN THE RANKING SCORES GIVEN BY DIFFERENT CENTRALITY MEASURES ON VARIOUS DATASETS

Networks     | Dolphins social network | American Football | Zachary karate club | Neural Network | Political Blog
             | β = 0.1625              | β = 0.0605        | β = 0.0115          | β = 0.0667     | β = 0.0551
Degree       | 0.4354                  | 0.5781            | 0.4427              | 0.5487         | 0.4896
Betweenness  | 0.5522                  | 0.6841            | 0.5984              | 0.6824         | 0.5144
Closeness    | 0.4978                  | 0.4823            | 0.5487              | 0.5899         | 0.6987
Eigen vector | 0.5978                  | 0.6943            | 0.6211              | 0.7530         | 0.8011
FBCS         | 0.6577                  | 0.8825            | 0.6345              | 0.7541         | 0.8041
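Kendall's tau, the statistic reported in Table 6, compares two rankings by counting concordant and discordant pairs. The sketch below is a minimal, generic tau-a implementation in Python; it is not the authors' code, the function name is an assumption, and tied pairs are simply counted as neither concordant nor discordant.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for four nodes: a centrality measure vs. a reference ranking.
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```

With many ties in either ranking, a tie-corrected variant (tau-b) may be more appropriate than this tau-a form.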

Table 7: Robustness R of the five ranking methods on five real networks

Networks     | Dolphins social network | American Football | Zachary karate club | Neural Network | Political Blog
Degree       | 0.2587                  | 0.1307            | 0.1284              | 0.5573         | 0.4458
Betweenness  | 0.3214                  | 0.1310            | 0.1121              | 0.4187         | 0.3845
Closeness    | 0.3281                  | 0.1261            | 0.1841              | 0.3214         | 0.1211
Eigen vector | 0.1544                  | 0.1269            | 0.0511              | 0.2994         | 0.3214
FBCS         | 0.1228                  | 0.1287            | 0.0404              | 0.2977         | 0.1455
