A new betweenness centrality measure based on an algorithm for ranking the nodes of a network


Applied Mathematics and Computation 244 (2014) 467–478

Contents lists available at ScienceDirect

Applied Mathematics and Computation journal homepage: www.elsevier.com/locate/amc

Taras Agryzkov (a), Jose L. Oliver (b), Leandro Tortosa (c, ⇑), Jose Vicent (c)

(a) Universidad de Alicante, Campus de San Vicente, Ap. Correos 99, E-03080 Alicante, Spain
(b) Departamento de Expresion Grafica y Cartografia, Universidad de Alicante, Campus de San Vicente, Ap. Correos 99, E-03080 Alicante, Spain
(c) Departamento de Ciencia de la Computación e Inteligencia Artificial, Universidad de Alicante, Campus de San Vicente, Ap. Correos 99, E-03080 Alicante, Spain

Keywords: Street network algorithms; PageRank algorithms; Centrality measures; Betweenness; Random-walk betweenness; Eigenvector centrality

Abstract

We propose and discuss a new centrality index for urban street patterns represented as networks in geographical space. This centrality measure, which we call ranking-betweenness centrality, combines the idea behind the random-walk betweenness centrality measure with the idea of ranking the nodes of a network by means of an adapted PageRank algorithm. We first use a PageRank algorithm that allows us to transform some information about the network under analysis into numerical values. These numerical values are associated with each of the nodes by means of a data matrix. After running the adapted PageRank algorithm, a ranking of the nodes is obtained, according to their importance in the network. This classification is the starting point for applying an algorithm based on the random-walk betweenness centrality. A detailed example of a real urban street network is discussed in order to illustrate how the proposed ranking-betweenness centrality is evaluated, and some comparisons with other classical centrality measures are performed.

© 2014 Elsevier Inc. All rights reserved.

This work was partially supported by Generalitat Valenciana Grant GV2012-111.
⇑ Corresponding author.
E-mail addresses: [email protected] (T. Agryzkov), [email protected] (J.L. Oliver), [email protected] (L. Tortosa), [email protected] (J. Vicent).
http://dx.doi.org/10.1016/j.amc.2014.07.026
0096-3003/© 2014 Elsevier Inc. All rights reserved.

1. Introduction

Real-world networks have been a field of study and research for a long time. They are represented by a graph $G = (V, E)$ of vertices $V$ and edges $E$. Often they have additional properties, e.g. weighted edges, undirected edges or parallel edges. A common example is the internet router topology, where routers are vertices and links between routers are edges. One would like to know which routers or which links are important, e.g. how severe the breakdown of a specified router or link is. Centrality measures are therefore required to label each vertex or edge with a number indicating its importance. However, there is no mathematical definition of "important" or of "severe". As a result, since the 1950s many centrality indices have evolved, each with specific applications. Some examples of applications include the facility location problem, highway-node routing, web page ranking and the prediction of polls.

A centrality index is a structural index for vertices or edges. Such indices are often based on shortest paths. Some examples are closeness centrality, stress centrality, graph centrality, reach centrality and betweenness centrality [4].

One of the fundamental problems in network analysis is to determine the importance of a particular vertex (or an edge) in a network. We associate the idea of importance of a vertex with the mathematical concept of centrality. Starting from these general considerations, the goal of this paper is the application of network centrality analysis to complex networks from a perspective oriented to ranking nodes, with particular attention to urban networks. More specifically, we propose a centrality measure for urban networks that combines the idea of the random-walk betweenness centrality and an adapted PageRank algorithm which establishes a classification of the nodes, following an idea similar to that used by the Google engine for ranking Web pages.

2. Related work

Over the years, network researchers have introduced a large number of centrality indices, measures of the varying importance of the vertices in a network according to a criterion [9]. These indices have proved to be of great value in the analysis and understanding of the roles played by vertices in networks such as social networks [13], human flows in networks [18], computer networks [8,17], urban networks [10,11], and others. Depending on the type of network studied, they are proxies for the structural importance of an element for the overall functioning of the network.

Centrality is one of the most studied concepts in network analysis. Numerous measures have been developed, including degree centrality, closeness, betweenness [7], eigenvector centrality, information centrality, flow betweenness, the rush index, and the influence measures of Katz [19], Hubbell [16] and Taylor, among others. What is not often recognized is that the formulas for these different measures make implicit assumptions about the manner in which things flow in a network. For example, some measures, such as Freeman closeness and betweenness [12], count only geodesic paths, apparently assuming that whatever flows through the network moves only along the shortest possible paths. Some measures, such as flow betweenness [14], do not assume shortest paths, but do assume proper paths in which no node is visited more than once.
Other measures, such as Bonacich [4,5] eigenvector centrality and Katz influence, count walks, which assume that trajectories can not only be circuitous, but also revisit nodes and lines multiple times along the way. Regardless of trajectory, some measures (e.g. betweenness) assume that what flows from node to node is indivisible (like a package) and must take one path or another, whereas other measures (e.g. eigenvector) assume multiple paths simultaneously (like information or infections).

As defined by Friedkin [15], the closeness centrality of a node is the sum of graph-theoretic distances from all other nodes, where the distance from one node to another is defined as the length (in links) of the shortest path between them. In a flow context, we ordinarily interpret closeness as an index of the expected time until arrival of something flowing through the network [6]. Nodes with low raw closeness scores have short distances from others, and so will tend to receive flows sooner, assuming that what flows originates from all other nodes with equal probability and that whatever is flowing manages to travel along shortest paths. In the case of information flows, we normally think of nodes with low closeness scores as being well-positioned to obtain novel information early, when it has the most value.

In recent years, network researchers have shown growing interest in the effective use of urban and road networks. As an example, in [26,27] the authors present a general framework that takes as input a collection of pairs (trip, cost) and assigns trip-cost-based weights to a graph representing a road network. In these papers they use a weighted PageRank-based function to compute the weights on road segments, with the aim of evaluating the information they want to study in the network.
Starting from these general considerations, the goal of this paper is the application of network centrality analysis to complex networks from a perspective oriented to ranking nodes, with particular attention to urban networks. More specifically, we propose a centrality measure for urban networks that combines the idea of the random-walk betweenness centrality and an adapted PageRank algorithm which establishes a classification of the nodes, following an idea similar to that used by the Google engine for ranking Web pages.

3. The random-walk betweenness centrality

In the following, let us assume that we have a graph $G = (V, E)$, where $V$ is a set of $n$ nodes or vertices and $E$ is a set of links between vertex pairs. According to the intuitive idea of betweenness centrality $C_B$, a point is considered to be central to the degree that it falls between other points on their shortest or geodesic communication paths. So, $C_B$ may be defined as the total fraction of shortest paths between each pair of nodes that pass through a given node [12]. For a general network, let $n^i_{st}$ be the number of shortest paths from node $s$ to node $t$ that pass through node $i$, and let $g_{st}$ be the total number of shortest paths from $s$ to $t$. Then the betweenness centrality of node $i$, $C_B(i)$, is

$$ C_B(i) = \sum_{st} \frac{n^i_{st}}{g_{st}}, \tag{1} $$

adopting the convention that $n^i_{st}/g_{st} = 0$ if both $n^i_{st}$ and $g_{st}$ are zero.

The formal definition of betweenness centrality implicitly assumes that information spreads only along those shortest paths. However, not all networks are related to the geometric concept of Euclidean distance. In some networks, information does not flow only along geodesic paths [24]. For example, when studying urban networks in which there is a continuous transit of people, we cannot consider only shortest paths within the network itself. In many cases the flow of people and vehicles has a random component that cannot be neglected.
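As an illustration of definition (1), the following Python sketch counts shortest paths by breadth-first search on an unweighted, undirected graph. It is a minimal sketch and not the authors' implementation: the function names `bfs_counts` and `shortest_path_betweenness` are hypothetical, and it assumes that the sum in (1) runs over unordered pairs $s < t$ with $i$ different from both endpoints.

```python
from collections import deque


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances from `source` and the number of
    shortest paths from `source` to every reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first visit fixes the distance
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def shortest_path_betweenness(adj):
    """Evaluate C_B(i) = sum over pairs s < t of n^i_st / g_st (assumed
    reading of equation (1)) for an undirected graph given as a dict
    mapping each node to a list of neighbours."""
    nodes = sorted(adj)
    info = {s: bfs_counts(adj, s) for s in nodes}
    cb = {i: 0.0 for i in nodes}
    for idx, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[idx + 1:]:
            if t not in dist_s:          # g_st = 0: the term is taken as 0
                continue
            for i in nodes:
                if i in (s, t) or i not in dist_s:
                    continue
                dist_i, sigma_i = info[i]
                # i lies on a shortest s-t path iff the distances add up
                if t in dist_i and dist_s[i] + dist_i[t] == dist_s[t]:
                    cb[i] += sigma_s[i] * sigma_i[t] / sigma_s[t]
    return cb


# Path 0 - 1 - 2: only node 1 lies between another pair of nodes.
print(shortest_path_betweenness({0: [1], 1: [0, 2], 2: [1]}))
```

On a 4-cycle, each pair of opposite nodes has two shortest paths, so every node receives a share of 0.5, which shows how the ratio $n^i_{st}/g_{st}$ splits credit among equally short routes.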


Newman [21] proposed a new betweenness centrality measure, known as random-walk betweenness, that relaxes this assumption by including contributions from essentially all paths between nodes, not just the shortest, although it still gives more weight to short paths. The measure is based on random walks, counting how often a node is traversed by a random walk between two other nodes. In broad terms, the random-walk betweenness of a vertex $i$ is equal to the number of times that a random walk starting at $s$ and ending at $t$ passes through $i$ along the way, averaged over all $s$ and $t$. Newman summarizes the process to compute the random-walk betweenness of any vertex $i$, denoted by $b_i$, by means of the following algorithm.

Algorithm 1. Let us assume that we have a graph $G$ representing a network with $n$ nodes.

Step 1 Construct the matrix $D - A$, where $D$ is the diagonal matrix of vertex degrees and $A$ is the adjacency matrix.
Step 2 Remove any single row and the corresponding column (for example, the last row and column).
Step 3 Invert the resulting matrix and then add back in a new row and column consisting of all zeros in the position from which the row and column were previously removed. Call the resulting matrix $T$, with elements $T_{ij}$.
Step 4 Calculate the random-walk betweenness of node $i$ from the expression

$$ b_i = \frac{\sum_{s<t} I_i^{st}}{\frac{1}{2}\, n(n-1)}, \tag{2} $$

using the values of $I_i^{st}$ from the equations

$$ I_i^{st} = \frac{1}{2} \sum_j A_{ij} \left| T_{is} - T_{it} - T_{js} + T_{jt} \right|, \qquad i \neq s, t. \tag{3} $$
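The steps of Algorithm 1 can be sketched in a few lines of NumPy. This is a hedged illustration, not Newman's or the authors' code: the function name `random_walk_betweenness` is hypothetical, the graph is assumed to be connected and undirected, and the sum in (2) is taken over all pairs $s < t$ with the terms for $i = s$ and $i = t$ left out, exactly as (3) is written above.

```python
import numpy as np


def random_walk_betweenness(adjacency):
    """Random-walk betweenness b_i following Steps 1-4 of Algorithm 1.

    `adjacency` is a symmetric 0/1 matrix of a connected graph.
    """
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]

    # Step 1: D - A (the graph Laplacian).
    D = np.diag(A.sum(axis=1))
    L = D - A

    # Steps 2-3: drop the last row/column, invert, pad back with zeros.
    T = np.zeros((n, n))
    T[:-1, :-1] = np.linalg.inv(L[:-1, :-1])

    # Step 4: accumulate I_i^{st} over all pairs s < t.
    b = np.zeros(n)
    for s in range(n):
        for t in range(s + 1, n):
            V = T[:, s] - T[:, t]        # V_i = T_is - T_it
            # |T_is - T_it - T_js + T_jt| = |V_i - V_j|
            I = 0.5 * (A * np.abs(V[:, None] - V[None, :])).sum(axis=1)
            I[s] = I[t] = 0.0            # equation (3) requires i != s, t
            b += I
    return b / (0.5 * n * (n - 1))


# Path 0 - 1 - 2: every walk from 0 to 2 must cross node 1.
print(random_walk_betweenness([[0, 1, 0], [1, 0, 1], [0, 1, 0]]))
```

For the three-node path, the only nonzero contribution is $I_1^{02} = 1$, so the middle node obtains $b_1 = 1/3$ and the end nodes obtain 0. A dense inverse is used for brevity; for large sparse networks a sparse linear solver would be the more natural choice.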

We make some remarks on the algorithm to better understand the process of removing a row and a column from the matrix $D - A$, computing its inverse, and then adding a new row and column again.

• The matrix constructed in Step 1 of the algorithm is the Laplacian matrix $Q$, which relates the adjacency and incidence matrices of the graph. This matrix can be written as

$$ Q = B B^{T} = D - A, $$

where $A$ and $B$ are the adjacency and incidence matrices, respectively, and $D = \operatorname{diag}(d_1, d_2, \ldots, d_n)$ is the degree matrix (a diagonal matrix whose entries are the degrees of the vertices).

• Note that, by the definition of $A$, the sum of row $i$ of $A$ equals the degree $d_i$ of node $i$, that is,

$$ d_i = \sum_{k=1}^{n} a_{ik}. $$

As a consequence, each row of $Q$ sums to zero and, therefore, $Q$ is singular and $\det Q = 0$. This justifies Step 2 of the algorithm, in which we delete a row and a column of the matrix to ensure that the matrix has an inverse.

Note that this measure counts all paths between vertices and makes no assumptions of optimality; it is based on random walks between node pairs and gives us an idea of how often a given vertex will fall on a random walk between another pair of nodes. It is also notable that this method is based on a matrix inversion. Although matrix inversion is generally avoided in computational mathematics, the sparse structure of the matrix makes this problem computationally tractable for most networks.

4. The adapted PageRank algorithm

The PageRank method [22] was proposed to compute a ranking for every Web page based on the graph of the Web. PageRank therefore constitutes a global ranking of all Web pages, regardless of their content, based solely on their location in the Web's graph structure. The purpose of the method is to obtain a vector, called the PageRank vector, which gives the relative importance of the pages. Since this vector is calculated from the structure of the Web connections, it is said to be independent of the request of the person performing the search. For a more detailed description of the PageRank algorithm see [3,20,23].

In [1], Agryzkov et al. propose an adaptation of the PageRank model to establish a ranking of nodes in an urban network, taking into account the influence of external activities or information. In the following, we refer to this algorithm as the adapted PageRank algorithm (APA algorithm). Although the APA algorithm is applied to urban networks, it is applicable to any network whenever we want to analyze or represent additional information from the network itself by means of a numerical assignment to the different nodes of the network.


The central idea behind the APA algorithm for ranking the nodes is the construction of a data matrix $D_0$, which allows us to represent numerically the information of the network that we are going to analyze and measure. The algorithm proposed in [1] is:

Algorithm 2. APA Algorithm. Let us assume that we have a graph representing an urban network with $n$ nodes, representing squares or intersections. We proceed with the following steps.

Step 1 Obtain the transition matrix $A$ from the graph of the network.
Step 2 Consider the different characteristics $k_i$ associated with each of the nodes for the problem studied; evaluate them in each node. With these numerical values, construct the matrix $D_0$.
Step 3 Construct a vector $\vec{v}_0$, according to the importance of each of the characteristics evaluated. This vector represents a multiplicative factor.
Step 4 Obtain a vector $\vec{v}$ by multiplying $D_0 \cdot \vec{v}_0 = \vec{v}$.
Step 5 Normalize $\vec{v} \to \vec{v}^{*}$, using the standard method.
Step 6 Construct the matrix $V$ from $\vec{v}^{*}$.
Step 7 Construct the matrix $M_0 = (1 - \alpha) A + \alpha V$ from $A$ and $V$.
Step 8 Compute the eigenvector associated with the eigenvalue 1 of the matrix $M_0$. That is our ranking vector.

The result of applying this algorithm to a network is a ranking vector $\vec{r} = \{r_1, r_2, \ldots, r_n\}$ with $n$ components, where the $i$th component represents the ranking of the $i$th node within the overall network.

For a better understanding of the algorithm and the parameters that characterize it see [2], where this algorithm is used to evaluate and visualize the commercial activities of a city. There, the general information to be evaluated is the commercial activity, quantifying and visualizing the trade endowments present in the urban network. These endowments are classified into four sectors: type I (food), type II (shops), type III (offices), type IV (malls). It is precisely these characteristics that are measured by the data matrix $D_0$.
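The steps of Algorithm 2 can be sketched as follows. Several details are not specified in the text and are assumptions of this sketch only: the transition matrix is taken to be column-stochastic (entry $1/d_j$ when nodes $i$ and $j$ are adjacent), the "standard" normalization is taken to be division by the sum of the components, $V$ is taken to be the matrix whose columns all equal $\vec{v}^{*}$, and the eigenvector for eigenvalue 1 is approximated by power iteration and scaled to sum to 1. The function name `apa_ranking` and its parameters are hypothetical.

```python
import numpy as np


def apa_ranking(adjacency, data, v0, alpha=0.15, tol=1e-12, max_iter=1000):
    """Sketch of the APA ranking under the assumptions stated above.

    adjacency : (n, n) symmetric 0/1 matrix of the network
    data      : (n, k) data matrix, one column per characteristic
    v0        : (k,) weights giving the importance of each characteristic
    """
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]

    # Step 1 (assumed form): column-stochastic transition matrix.
    A = adjacency / adjacency.sum(axis=0, keepdims=True)

    # Steps 2-4: combine the characteristics into one value per node.
    v = np.asarray(data, dtype=float) @ np.asarray(v0, dtype=float)

    # Step 5 (assumption): normalize so that the components sum to 1.
    v_star = v / v.sum()

    # Step 6 (assumption): every column of V equals v_star.
    V = np.tile(v_star[:, None], (1, n))

    # Step 7: convex combination of topology and data.
    M = (1.0 - alpha) * A + alpha * V

    # Step 8: eigenvector for eigenvalue 1, approximated by power iteration.
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = M @ r
        converged = np.abs(r_next - r).sum() < tol
        r = r_next
        if converged:
            break
    return r


# Triangle graph in which node 2 carries four times the data of the others.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(apa_ranking(triangle, data=[[1], [1], [4]], v0=[1]))
```

Because the topology of the triangle is symmetric, any difference in the resulting ranking comes from the data column alone, which is the effect the data matrix is meant to capture. The scaling of the output is a choice of this sketch; the ranking values reported later in the paper are on a different scale.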
Therefore, the matrix $D_0$ now has four columns $k_i$, for $i = 1, 2, 3, 4$, each of them associated with a type of commercial activity. The first column stores the values of the activities of type I for all the nodes in the network, the second stores the values related to type II, and so on for the rest of the columns. So, the element $d_{52} \in D_0$ represents the commercial activity of type II for node 5, that is, the number of shops in its vicinity (considering the vicinity to be the edges connecting node 5 with its neighboring nodes).

Once the matrix $D_0$ is constructed, the vector $\vec{v}_0$ must be defined. This vector specifies the features we want to measure in our analysis. If we want to analyze the four types of commercial activity, we define $\vec{v}_0 = [1, 1, 1, 1]$. With $D_0$ and $\vec{v}_0$ we continue with the remaining steps of the algorithm, which are purely algebraic. In all examples computed with this algorithm, the value of the parameter $\alpha$ was 0.15, simply because it is the usual value taken in the original PageRank algorithm.

Summarizing the key points of this process, the main feature of this algorithm is the construction of the matrix $D_0$ and the vector $\vec{v}_0$. First, the matrix $D_0$ allows us to represent numerically the information we want to study; second, the vector $\vec{v}_0$ allows us to establish the importance of each of the factors or characteristics that have been measured by means of $D_0$. In other words, Algorithm 2 constitutes a model to establish a ranking of nodes in a network, with the primary feature that it assigns a value to each node according to its significance within the physical network.

5. Joining the APA algorithm and a random-walk betweenness centrality

The central idea behind this paper is to establish a new centrality measure by combining the random-walk centrality proposed by Newman (see Section 3) with the APA algorithm proposed by Agryzkov et al. to establish a ranking of nodes in a network (see Section 4).
In the following, we call this centrality measure the ranking-betweenness centrality and denote it by $C_{rb}$.

5.1. The algorithm to compute the ranking-betweenness centrality measure

To understand how both algorithms are brought together, we must first turn our attention to a fundamental aspect of the random-walk betweenness model. In this model, we start by constructing a matrix $D$, which is a diagonal matrix where each element is the degree of the corresponding node. Note that this matrix is constructed by considering only the connectivity of the vertices and does not take into account other factors or characteristics of the nodes.

In some types of networks, for example street networks, the importance of a node is not measured only by its connectivity to other nodes; there are other factors that can and should be measured to obtain a true ranking of importance within the network. As an example related to urban networks, certain nodes can have special features, such as those representing historic sites of special interest, those that have in their proximity commercial endowments that act as major attractors of people, or those that are meeting points for many people at certain times. This information associated with the nodes of the network is independent of their degree of connectivity and is among the factors that may be considered. When we refer to the importance of a node within a network we mean a high value of its centrality within that network.

Moreover, Algorithm 2 provides a ranking of all the nodes in the network, based on a set of characteristics of the nodes themselves, by defining the data matrix $D_0$. Consequently, we are able to assign a numerical value to each node, depending on its importance within the network and, more importantly, depending on the characteristics we want to analyze. The key to this process is to use this numerical value of the ranking as input for the elements of the matrix $D$ of the algorithm that measures the random-walk betweenness.

As a first option, we might think that the elements of $D$ can be substituted by the values $r_i$ of the PageRank vector (which are easily obtained by running Algorithm 2). Once the matrix $D$ is constructed from the values of the PageRank vector, we would run the algorithm and get the new value of the centrality for the nodes. However, we must be careful because this process can lead to mistakes. When the values of the PageRank vector are introduced in the diagonal matrix $D$ (replacing the degree of each node), these new values have nothing to do with the degrees of the nodes. Therefore, we would be modifying the degree matrix without changing the adjacency matrix of the graph, which is inconsistent.

We can avoid this problem by modifying the adjacency matrix as follows. Let $d_i$, for $i = 1, 2, \ldots, n$, be the number of edges from node $i$ to other nodes. Instead of working with the adjacency matrix $A$ in the usual way, we define a new adjacency matrix $A^{*} = (a^{*}_{ij})_{i,j=1}^{n} \in \mathbb{R}^{n \times n}$, where

$$ a^{*}_{ij} = \begin{cases} \dfrac{1}{d_i} & \text{if nodes } j \text{ and } i \text{ are adjacent}, \\ 0 & \text{otherwise}, \end{cases} \qquad 1 \le i, j \le n. \tag{4} $$

We also define the diagonal matrix $R = (r_{ii})$, for $i = 1, 2, \ldots, n$, whose elements $r_{ii}$ are the values $r_i$ of the PageRank vector. From $R$ and $A^{*}$ we construct the matrix

$$ K = R \cdot A^{*}, \tag{5} $$

which is the new matrix that substitutes the matrix $A$ in Algorithm 1. Entry-wise, $K_{ij} = r_i / d_i$ if nodes $i$ and $j$ are adjacent and $K_{ij} = 0$ otherwise. Therefore, we can establish a new algorithm to compute a modified random-walk betweenness centrality, given by the following steps:

Algorithm 3. Let us assume that we have a graph representing a network with $n$ nodes. We proceed with the following steps.

Step 1 Run Algorithm 2. Obtain a ranking vector $\vec{r} = \{r_1, r_2, \ldots, r_n\}$.
Step 2 Construct the matrix $R$, where $R = (r_{ii})_{i=1,2,\ldots,n}$ is a diagonal matrix with $n$ elements, such that $r_{ii} = r_i$.
Step 3 Construct the matrix $A^{*}$, according to expression (4).
Step 4 Construct the matrix $K$, where $K$ is given by expression (5).
Step 5 Compute the matrix $R - K$.
Step 6 Remove the last row and column, obtaining an $(n-1) \times (n-1)$ matrix $(R - K)^{*}$.
Step 7 Compute the inverse $[(R - K)^{*}]^{-1}$.
Step 8 Add back in a new row and column consisting of all zeros in the position from which the row and column were previously removed. Call the resulting matrix $T$, with elements $T_{ij}$.
Step 9 Calculate the new random-walk betweenness from the expression

$$ C_{rb}(i) = \frac{\sum_{s<t} I_i^{st}}{\frac{1}{2}\, n(n-1)}, \tag{6} $$

using the values of $I_i^{st}$ from the equations

$$ I_i^{st} = \frac{1}{2} \sum_j K_{ij} \left| T_{is} - T_{it} - T_{js} + T_{jt} \right|, \qquad i \neq s, t. \tag{7} $$
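Steps 2 to 9 of Algorithm 3 can be sketched in NumPy as below, taking the ranking vector of Step 1 as an input. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `ranking_betweenness` is hypothetical, the graph is assumed to be connected and undirected, all $r_i$ are assumed to be positive (so that the reduced matrix is invertible), and the sum in (6) is taken over all pairs $s < t$ with $i \neq s, t$.

```python
import numpy as np


def ranking_betweenness(adjacency, r):
    """Ranking-betweenness C_rb(i) following Steps 2-9 of Algorithm 3.

    adjacency : (n, n) symmetric 0/1 matrix of a connected graph
    r         : (n,) positive ranking vector (e.g. the output of Algorithm 2)
    """
    A = np.asarray(adjacency, dtype=float)
    r = np.asarray(r, dtype=float)
    n = A.shape[0]

    # Steps 2-4: R = diag(r), a*_ij = 1/d_i on edges, K = R A*.
    d = A.sum(axis=1)
    A_star = A / d[:, None]
    R = np.diag(r)
    K = R @ A_star

    # Steps 5-8: reduce R - K, invert, and pad back with a zero row/column.
    M = R - K
    T = np.zeros((n, n))
    T[:-1, :-1] = np.linalg.inv(M[:-1, :-1])

    # Step 9: equations (6) and (7).
    c = np.zeros(n)
    for s in range(n):
        for t in range(s + 1, n):
            V = T[:, s] - T[:, t]        # V_i = T_is - T_it
            I = 0.5 * (K * np.abs(V[:, None] - V[None, :])).sum(axis=1)
            I[s] = I[t] = 0.0            # equation (7) requires i != s, t
            c += I
    return c / (0.5 * n * (n - 1))


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(ranking_betweenness(path, r=[1, 2, 1]))   # r equal to the degrees
print(ranking_betweenness(path, r=[1, 1, 1]))   # uniform ranking
```

When $r_i = d_i$ for every node, $K$ reduces to $A$ and $R$ to $D$, so the sketch returns the random-walk betweenness of Algorithm 1; this gives a simple consistency check. For other ranking vectors $K$ is no longer symmetric, and in this sketch the result can depend on which row and column are removed.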

When comparing Algorithms 1 and 3, it might be thought that the differences are minimal. However, the differences between the two algorithms are very important. The main idea of the proposed algorithm is that the importance of the nodes is now determined not only by the network connectivity, but also by a set of factors or characteristics that we can model ourselves according to the problem we are studying. It is this use of information from the network to evaluate the importance of the nodes that makes this measure different from the classical centrality measures of network theory. This approach is more in keeping with some types of networks, such as those representing urban street networks and their characteristics.

5.2. A simple example

We describe in detail a simple example of a network represented by the graph with 10 nodes $\{1, 2, \ldots, 10\}$ shown in Fig. 1.


Fig. 1. A simple network used for this example with n ¼ 10 nodes. Colors are generated randomly and the diameters of the circles are proportional to the degree of the node.

First, the random-walk betweenness centrality is calculated for each node, according to the procedure described in Algorithm 1. Subsequently, the new ranking-betweenness centrality measure $C_{rb}$ is evaluated following the steps in Algorithm 3, in order to compare the results. From the graph in Fig. 1, we can construct the adjacency matrix $A$ and the degree matrix $D$ given by

$$ A = \begin{bmatrix}
0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0
\end{bmatrix}, \qquad
D = \operatorname{diag}(2, 3, 4, 5, 4, 5, 2, 4, 3, 2). $$

Running Algorithm 1, we calculate the random-walk betweenness for the nodes $\{2, 3, 4, 5, 6, 7, 8, 9\}$, as we take $s = 1$ and $t = 10$, following expressions (2) and (3). Therefore, we have

$$ b_2 = \frac{\sum_{s<t} I_2^{st}}{\frac{1}{2}\, n(n-1)} = \frac{1}{45} \left[ I_2^{1,10} + I_2^{3,10} + I_2^{4,10} + I_2^{5,10} + I_2^{6,10} + I_2^{7,10} + I_2^{8,10} + I_2^{9,10} \right] $$
$$ \phantom{b_2} = \frac{1}{45} \left( 0.440 + 0.185 + 0.082 + 0.032 + 0.015 + 0.057 + 0.004 + 0.004 \right) = 0.018, $$

$$ b_3 = \frac{\sum_{s<t} I_3^{st}}{\frac{1}{2}\, n(n-1)} = \frac{1}{45} \left[ I_3^{1,10} + I_3^{2,10} + I_3^{4,10} + I_3^{5,10} + I_3^{6,10} + I_3^{7,10} + I_3^{8,10} + I_3^{9,10} \right] $$
$$ \phantom{b_3} = \frac{1}{45} \left( 0.667 + 0.541 + 0.219 + 0.085 + 0.039 + 0.152 + 0.009 + 0.009 \right) = 0.038. $$

For the remaining vertices, the calculations are similar to those performed to obtain $b_2$ and $b_3$. The results obtained are

$$ b_2 = 0.018, \quad b_3 = 0.038, \quad b_4 = 0.055, \quad b_5 = 0.052, \quad b_6 = 0.080, \quad b_7 = 0.013, \quad b_8 = 0.110, \quad b_9 = 0.080. $$

Let us assume that the network in Fig. 1 represents a small part of a street network, where each node of the graph represents a strategic point in a city, and suppose that we want to consider the importance of the nodes according to their touristic relevance. We consider only one characteristic, $k_1$, to introduce in the data matrix $D_0$, in order to keep this first example as simple as possible. So, $k_1$ is associated with the number of tourist spots surrounding each node. It is important to remark that, for this example, the vector $k_1$ has been generated randomly with values ranging from 1 to 10 (number of tourist spots around each node). After constructing $D_0$ from this vector, Algorithm 2 is run. As a result of applying the algorithm, we obtain the PageRank vector, which provides a ranking of the nodes according to their touristic importance. The vector obtained for this case is

$$ \vec{r} = \{r_1, r_2, \ldots, r_{10}\} = \{0.2,\ 0.3,\ 0.2,\ 0.6,\ 0.2,\ 0.3,\ 0.3,\ 0.7,\ 0.4,\ 0.2\}, $$

rounded to the first decimal to simplify calculations. Now, Algorithm 3 is run. The matrix $K = R \cdot A^{*}$ is given by

$$ K = \begin{bmatrix}
0 & 0.1 & 0.1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.1 & 0 & 0.1 & 0.1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.05 & 0.05 & 0 & 0.05 & 0 & 0.05 & 0 & 0 & 0 & 0 \\
0 & 0.12 & 0.12 & 0 & 0.12 & 0.12 & 0.12 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.05 & 0 & 0.05 & 0.05 & 0.05 & 0 & 0 \\
0 & 0 & 0.06 & 0.06 & 0.06 & 0 & 0 & 0.06 & 0.06 & 0 \\
0 & 0 & 0 & 0.15 & 0.15 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.175 & 0.175 & 0 & 0 & 0.175 & 0.175 \\
0 & 0 & 0 & 0 & 0 & 0.13 & 0 & 0.13 & 0 & 0.13 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.1 & 0.1 & 0
\end{bmatrix}. $$

Consequently, the matrix $R - K$ is

$$ R - K = \begin{bmatrix}
0.2 & -0.1 & -0.1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
-0.1 & 0.3 & -0.1 & -0.1 & 0 & 0 & 0 & 0 & 0 & 0 \\
-0.05 & -0.05 & 0.2 & -0.05 & 0 & -0.05 & 0 & 0 & 0 & 0 \\
0 & -0.12 & -0.12 & 0.6 & -0.12 & -0.12 & -0.12 & 0 & 0 & 0 \\
0 & 0 & 0 & -0.05 & 0.2 & -0.05 & -0.05 & -0.05 & 0 & 0 \\
0 & 0 & -0.06 & -0.06 & -0.06 & 0.3 & 0 & -0.06 & -0.06 & 0 \\
0 & 0 & 0 & -0.15 & -0.15 & 0 & 0.3 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -0.175 & -0.175 & 0 & 0.7 & -0.175 & -0.175 \\
0 & 0 & 0 & 0 & 0 & -0.13 & 0 & -0.13 & 0.4 & -0.13 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -0.1 & -0.1 & 0.2
\end{bmatrix}. $$

Once the matrix $R - K$ has been determined, we proceed to remove the last row and column and compute $[(R - K)^{*}]^{-1}$. When the inverse is computed, we again add the last row and column with zeros and obtain the square matrix $T = (T_{ij})$ of order $n$. Finally, applying expressions (6) and (7), the ranking-betweenness centrality measure for all the nodes of this graph can be evaluated. After performing the suitable arithmetic calculations and simplifications, we have

$$ C_{rb}(2) = 0.0239, \quad C_{rb}(3) = 0.0176, \quad C_{rb}(4) = 0.0978, \quad C_{rb}(5) = 0.0485, $$
$$ C_{rb}(6) = 0.0318, \quad C_{rb}(7) = 0.0415, \quad C_{rb}(8) = 0.0945, \quad C_{rb}(9) = 0.0272. $$
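The matrices of this example can be rebuilt numerically as a sanity check. The sketch below assumes the edge list read off the adjacency matrix $A$ of Fig. 1 given above (nodes numbered from 1) and the rounded ranking vector $\vec{r}$; the variable names are hypothetical. It checks only structural properties of $K$ and $R - K$ and makes no attempt to reproduce the rounded centrality values reported in the text.

```python
import numpy as np

# Edges of the 10-node example, read off the adjacency matrix A (1-based).
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (3, 6), (4, 5), (4, 6),
         (4, 7), (5, 6), (5, 7), (5, 8), (6, 8), (6, 9), (8, 9), (8, 10),
         (9, 10)]

# Rounded ranking vector from the text.
r = np.array([0.2, 0.3, 0.2, 0.6, 0.2, 0.3, 0.3, 0.7, 0.4, 0.2])

# Symmetric adjacency matrix.
A = np.zeros((10, 10))
for i, j in EDGES:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1.0

d = A.sum(axis=1)                      # node degrees
K = np.diag(r) @ (A / d[:, None])      # K = R A*, with a*_ij = 1/d_i
R_minus_K = np.diag(r) - K

np.set_printoptions(precision=3, suppress=True, linewidth=120)
print("degrees:", d.astype(int))
print("K =\n", K)
print("R - K =\n", R_minus_K)
print("row sums of R - K:", R_minus_K.sum(axis=1))
```

Every row of $R - K$ sums to zero, since row $i$ contains $r_i$ on the diagonal and $d_i$ off-diagonal entries equal to $-r_i/d_i$. The matrix is therefore singular, just like the Laplacian $D - A$, which is why a row and a column are removed before inverting.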

We summarize the results obtained for this example in Table 1. For each node $i$, $i = 2, 3, \ldots, 9$, Table 1 lists the degree $d_i$, the value $r_i$ of the PageRank vector obtained by Algorithm 2, and the random-walk betweenness centrality $b_i$. The last row shows the values of the new betweenness centrality measure $C_{rb}(i)$, as described in Section 5.1.

The results shown in Table 1 are entirely consistent with the changes that have been introduced in the original network. From the point of view of random-walk betweenness centrality, node 8 is the most important, followed by nodes 6 and 9. These values are reasonable given the topology of the network, that is, the position of these nodes in the network and their connectivity with other nodes (degree). With the introduction of new data on the nodes and the PageRank algorithm, the situation changes, since the PageRank vector obtained provides a classification of the nodes in order of importance. According to the components $r_i$ of the ranking vector, the most important nodes (highest values) are now 4 and 8. Taking these values as a starting point, the new ranking-betweenness centrality measure is calculated, providing the results shown in row $C_{rb}(i)$. The node with the highest centrality value is 4, followed by 8, which is consistent with the modifications introduced. The values of $C_{rb}(4)$ and $C_{rb}(8)$ are very similar, although node 4 is slightly above node 8 in importance within the network. The reason may be the combination of two factors, namely the PageRank value and the network connectivity.

Table 1. Numerical results for the example studied with 10 nodes.

Node        2       3       4       5       6       7       8       9
d_i         3       4       5       4       5       2       4       3
r_i         0.3     0.2     0.6     0.2     0.3     0.3     0.7     0.4
b_i         0.018   0.038   0.055   0.052   0.080   0.013   0.110   0.080
C_rb(i)     0.024   0.018   0.098   0.049   0.032   0.042   0.095   0.027
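The change in ordering discussed above can be read directly from the rows of Table 1. The following short sketch sorts the nodes by each measure; the values are copied from the table, while the helper name `rank` is hypothetical.

```python
# Values copied from Table 1 (nodes 2 to 9).
nodes = [2, 3, 4, 5, 6, 7, 8, 9]
b = [0.018, 0.038, 0.055, 0.052, 0.080, 0.013, 0.110, 0.080]
c_rb = [0.024, 0.018, 0.098, 0.049, 0.032, 0.042, 0.095, 0.027]


def rank(values):
    """Node labels sorted by decreasing score (ties keep the table order)."""
    pairs = sorted(zip(nodes, values), key=lambda pair: -pair[1])
    return [node for node, _ in pairs]


rank_b = rank(b)        # ordering by random-walk betweenness
rank_crb = rank(c_rb)   # ordering by ranking-betweenness

print("by b_i:    ", rank_b)
print("by C_rb(i):", rank_crb)
```

Node 8 leads under $b_i$ with nodes 6 and 9 tied behind it, whereas nodes 4 and 8 lead under $C_{rb}$, in agreement with the discussion of Table 1.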


6. Discussing an example In this section we calculate and analyze the new centrality measure proposed by Algorithm 3 on a larger network, in order to compare and verify the differences between the ranking-betweenness centrality and the other centrality measures which form the basis. In addition, an evaluation of other standard measures of centrality on complex networks is performed to check the differences. For this example we consider the network shown in Fig. 2(a), whose graph is shown in Fig. 2(b). This network represents a real urban street network; more exactly, it is a fragment of the urban network of the city of Murcia (Spain). A street network is represented as a spatial graph (set of lines) where every street is represented by its central line enriched by some metadata that can be geometric or semantic. Even if this representation model is simple and intuitive, it offers the possibility to model the street network in a formal way. The modeling approach for this network that we follow is based on the concept of primal graph, where the main component of the analysis is the street segment located between two successive intersections. The intersections at the junctions of the segments represent the nodes of the graph and the street segments represent its links or edges. The advantage of this representation is that it preserves the geometry of the urban space. In order to explain how to compute the ranking-betweenness centrality measure C rb for a network, we have considered the urban street network in Fig. 2(a). The graph of this network can be seen in Fig. 2(b). In this network, we consider a series of trade endowments that have been assigned to the nodes that are closest to them. The location of endowments and the number of them are reflected in Fig. 3. These endowments provide us with information that enables us to construct the data matrix D0 appearing in Algorithm 2. 
With the data matrix it is possible to run the APA algorithm to obtain a ranking of the nodes. This ranking is given by a classification vector whose ith component is the numerical value associated with the ith node, measuring its importance within the network. The ranking vector constitutes the starting point for the execution of Algorithm 3, which evaluates the centrality index $C_{rb}$.

We run the APA algorithm for this network, constructing the data matrix from the information on endowments given in Fig. 3. With the results obtained by the APA algorithm, we run Algorithm 3 to evaluate the ranking-betweenness centrality measure for this network. In order to compare these scores with the random-walk betweenness centrality, we also evaluate that index. All the results are shown in Table 2, which lists only the 20 nodes with the highest centrality values for each of the measures studied. Looking at this table, we highlight several points:

- In the three measures assessed, the node with the highest centrality value is different; that is, no node in the network is the most important for all three measures, or even for two of them. The most central node according to the random-walk betweenness is 125, while the highest-ranked node according to the APA measure is 26. Finally, for the new ranking-betweenness measure $C_{rb}$, the most important node is 54. This clearly shows that there are significant differences between these indices.
- If we compare the random-walk betweenness (first column) with the APA measure, the differences are evident. The node with the highest value for random-walk centrality does not appear in the list of the 20 most important nodes according to the APA ranking. Similarly, the node with the highest ranking after running the APA algorithm does

Fig. 2. The network used for discussing the betweenness centrality measure.


Fig. 3. Location and number of endowments associated with each node of the network. The numbers of endowments are shown in orange labels. (For interpretation of the references to colour in this figure caption, the reader is referred to the web version of this article.)

Table 2. Numerical results for the random-walk betweenness, APA ranking and ranking-betweenness of the network studied.

| Node | Random-walk | Node | APA algorithm | Node | $C_{rb}$ |
|------|-------------|------|---------------|------|----------|
| 125 | 0.015623 | 26 | 0.033712 | 54 | 0.125625 |
| 126 | 0.007874 | 28 | 0.030861 | 47 | 0.083101 |
| 85 | 0.006172 | 100 | 0.030382 | 77 | 0.078589 |
| 87 | 0.006110 | 47 | 0.029879 | 101 | 0.060021 |
| 76 | 0.004590 | 102 | 0.029318 | 26 | 0.059505 |
| 78 | 0.004275 | 103 | 0.029121 | 28 | 0.057925 |
| 77 | 0.003596 | 46 | 0.027105 | 98 | 0.045083 |
| 111 | 0.003512 | 80 | 0.026940 | 81 | 0.035701 |
| 84 | 0.003194 | 99 | 0.025689 | 125 | 0.033599 |
| 75 | 0.003094 | 72 | 0.024415 | 71 | 0.028544 |
| 110 | 0.003090 | 101 | 0.024412 | 51 | 0.026533 |
| 83 | 0.003075 | 50 | 0.024254 | 55 | 0.025341 |
| 124 | 0.003070 | 30 | 0.024221 | 30 | 0.024907 |
| 123 | 0.003023 | 54 | 0.024203 | 83 | 0.024601 |
| 86 | 0.002885 | 48 | 0.023786 | 36 | 0.020995 |
| 36 | 0.002482 | 77 | 0.023048 | 46 | 0.020637 |
| 88 | 0.002319 | 55 | 0.022888 | 0 | 0.019867 |
| 107 | 0.002308 | 98 | 0.022746 | 72 | 0.018344 |
| 71 | 0.002279 | 71 | 0.021691 | 79 | 0.017609 |
| 74 | 0.002240 | 79 | 0.021687 | 80 | 0.016622 |

not appear in the list of the top 20 nodes in the random-walk column. The first node that appears in both lists is node 77, which is in seventh position in the random-walk ranking and in sixteenth position in the APA ranking. Only two nodes appear in both lists, nodes 77 and 71.
- If we compare the APA measure (second column) with the ranking-betweenness $C_{rb}$, the situation is different from the previous one, where the relationship between the random-walk betweenness and the APA measure was very weak. Now the relationship between the two measures is much more evident, which is to be expected, since the classification given by the APA algorithm is the basis of our measure. Indeed, 14 nodes appear in both lists, which is a large number of matches. The APA classification has a significant influence on the proposed centrality measure.

Fig. 4 shows a graphical representation of the ranking-betweenness centrality evaluated over the nodes of the network. For the visualization we use a gradient scale ranging from red (most central nodes) to blue (least important nodes in the network).
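The overlap counts quoted in the comparison of Table 2 can be checked with a few lines of Python. This is only a verification sketch: the three lists are the node identifiers copied from Table 2 in rank order, and the helper name `shared_nodes` is our own.

```python
# Node identifiers of the 20 top-ranked nodes, copied from Table 2 in rank order.
random_walk = [125, 126, 85, 87, 76, 78, 77, 111, 84, 75,
               110, 83, 124, 123, 86, 36, 88, 107, 71, 74]
apa = [26, 28, 100, 47, 102, 103, 46, 80, 99, 72,
       101, 50, 30, 54, 48, 77, 55, 98, 71, 79]
c_rb = [54, 47, 77, 101, 26, 28, 98, 81, 125, 71,
        51, 55, 30, 83, 36, 46, 0, 72, 79, 80]


def shared_nodes(first, second):
    """Nodes present in both rankings, kept in the order of the first one."""
    second_set = set(second)
    return [node for node in first if node in second_set]


rw_vs_apa = shared_nodes(random_walk, apa)
apa_vs_crb = shared_nodes(apa, c_rb)

print(rw_vs_apa, len(rw_vs_apa))  # [77, 71] 2
print(len(apa_vs_crb))            # 14
```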


Just as Fig. 4 represents the ranking-betweenness measure on the network under study, Fig. 5 shows the random-walk betweenness measure (left) and the classification of the nodes provided by the APA algorithm (right), in order to visualize the differences between the proposed centrality measure and the measures on which it is based. Notice how, in the image representing the APA measure, the influence of the information introduced through the commercial endowments can clearly be observed. This influence becomes apparent when compared with Fig. 3, where the locations of the endowments are marked: the red nodes lie in the areas where endowments have been allocated.

For this network we have also evaluated some other node centrality indices, in order to compare the results obtained.

Closeness centrality, $C_C$. It measures the extent to which a node i is near to all the other nodes along the shortest paths, and is defined as [25]

$$C_C(i) = \frac{n-1}{\sum_{j \in V,\, j \neq i} d_{ij}},$$

where $d_{ij}$ is the shortest path length between i and j, that is, the minimum length over all the paths in the graph connecting i and j.

Betweenness centrality, $C_B$. It is based on the idea that a node is central if it lies between many other nodes. It may be evaluated by means of expression (1). Betweenness centrality is usually thought to measure the volume of traffic moving from each node to every other node that would pass through a given node.

Straightness centrality, $C_S$. This index starts from the hypothesis that the communication between two points is better when the path is straight. It may be evaluated by the expression

$$C_S(i) = \frac{1}{n-1} \sum_{j \in V,\, j \neq i} \frac{d_{ij}^{Eu}}{d_{ij}},$$

where $d_{ij}^{Eu}$ is the Euclidean distance between nodes i and j along a straight line.

Eigenvector centrality, $C_{eig}$. This index is a natural extension of the simple degree centrality. For any node in a network, not all neighbors are equivalent: a node's importance is increased by having connections with other nodes that are themselves important. This is the idea behind eigenvector centrality, which was first proposed by Bonacich [4]. In broad terms, the limiting centrality vector is proportional to the leading eigenvector of the adjacency matrix. Equivalently, the centrality vector $\vec{x}$ verifies that

$$A\vec{x} = \lambda_1 \vec{x},$$

where $\lambda_1$ is the largest eigenvalue of the adjacency matrix A.

Fig. 6 illustrates the different measures evaluated for this network. We follow a similar gradient color scale, where red areas show the most important nodes according to the measure evaluated and blue areas reflect nodes with very low values in the ranking.
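The three indices given above by explicit formulas (closeness, straightness and eigenvector centrality) can be evaluated on a toy graph with the following minimal Python sketch. The four-node network, its coordinates and the function names are hypothetical; taking the edge length as the Euclidean length of the segment and using a power iteration on A + I (which has the same leading eigenvector as A) are our own choices, not necessarily those used for Fig. 6. Shortest-path betweenness is left out because expression (1) is given elsewhere in the paper.

```python
import heapq
import math

# Hypothetical toy network: node id -> planar coordinates.
coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]

# Weighted adjacency; assumption: edge length = Euclidean segment length.
adj = {n: {} for n in coords}
for u, v in edges:
    adj[u][v] = adj[v][u] = math.dist(coords[u], coords[v])


def shortest_paths(source):
    """Dijkstra: shortest path length d_ij from source to every node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u].items():
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def closeness(i):
    """C_C(i) = (n - 1) / sum of shortest path lengths from i."""
    d = shortest_paths(i)
    return (len(coords) - 1) / sum(d[j] for j in coords if j != i)


def straightness(i):
    """C_S(i) = mean ratio of Euclidean distance to shortest path length."""
    d = shortest_paths(i)
    total = sum(math.dist(coords[i], coords[j]) / d[j]
                for j in coords if j != i)
    return total / (len(coords) - 1)


def eigenvector_centrality(iterations=200):
    """Power iteration on A + I, with A the 0/1 adjacency matrix."""
    x = {n: 1.0 for n in coords}
    for _ in range(iterations):
        y = {n: x[n] + sum(x[m] for m in adj[n]) for n in coords}
        norm = math.sqrt(sum(value * value for value in y.values()))
        x = {n: value / norm for n, value in y.items()}
    return x


for i in coords:
    print(i, round(closeness(i), 4), round(straightness(i), 4))
print(eigenvector_centrality())
```

The sketch assumes a connected graph; otherwise some $d_{ij}$ would be undefined and the sums in `closeness` and `straightness` would fail.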

Fig. 4. Visualization of the network according to the ranking-betweenness centrality of each node.


Fig. 5. Graphical visualization of the network after evaluating the random-walk betweenness centrality and the ranking given by the APA algorithm.

Fig. 6. Visualization of the network after evaluating four centrality measures: closeness (top left), betweenness (top right), straightness (bottom left) and eigenvector centrality (bottom right).


The graphical representations shown in Fig. 6 allow us to easily check the differences that occur in the network with respect to the different measures examined. There are few coincidences in the hot areas of the various plots. Especially relevant are the differences when measuring straightness centrality, which offers a visual overview quite different from the rest of the images, with two distinct hot spots. These hot spots are not repeated in any of the other measures considered in the study.

Notice the differences between these four measures and the ranking-betweenness centrality measure, in which we consider the information we have obtained from the network itself. Comparing Figs. 3 and 6, we observe the differences between these classical measures of centrality in networks and the $C_{rb}$ measure, which we have evaluated from the ranking obtained from the information considered, in this case related to the commercial activity of a city.

It is worth noting that, although the urban network shown in the example is small, the algorithm is also applicable to much larger urban networks. For example, the APA algorithm has been applied to an urban network (the city of Murcia, Spain) of 300,000 inhabitants, where the number of nodes was nearly 1500 (see [2]). The only consideration is that the analysis of very large networks necessarily involves a very large volume of data, so efficient tools are needed to manage and store it.

7. Conclusion

In this paper, we proposed a new betweenness centrality measure based on the random-walk betweenness index and an adapted algorithm for ranking the nodes in a network. The classification algorithm for the nodes is based on the concept of PageRank vector used by the Google search engine to classify Web pages. We modified the PageRank algorithm with the primary objective of introducing a data matrix representing some type of information about the network itself.
This information allows us to obtain a ranking of the nodes of the network, which is used in the random-walk betweenness algorithm. The new centrality measure, called ranking-betweenness centrality, reduces the effect of the randomness present in the random-walk betweenness measure and takes into account, in a decisive way, the information about the network that we want to evaluate. The detailed example presented shows how to evaluate this new centrality index and the differences produced when compared to other classical centrality measures.

References

[1] T. Agryzkov, J.L. Oliver, L. Tortosa, J. Vicent, An algorithm for ranking the nodes of an urban network based on the concept of PageRank vector, Appl. Math. Comput. 219 (2012) 2186–2193.
[2] T. Agryzkov, J.L. Oliver, L. Tortosa, J. Vicent, Analyzing the commercial activities of a street network by ranking their nodes – a case study Murcia, Int. J. Geog. Inf. Sci. 28 (3) (2014) 479–495.
[3] P. Berkhin, A survey on PageRank computing, Internet Math. 2 (1) (2005) 73–120.
[4] P. Bonacich, Power and centrality: a family of measures, Am. J. Sociol. 92 (1987) 1170–1182.
[5] P. Bonacich, Simultaneous group and individual centrality, Social Networks 13 (1991) 155–168.
[6] S.P. Borgatti, Centrality and AIDS, Connections 18 (1) (1995) 112–114.
[7] U. Brandes, A faster algorithm for betweenness centrality, J. Math. Sociol. 25 (2) (2001) 163–177.
[8] F. Calabrese, C. Ratti, M. Colonna, P. Lovisolo, D. Parata, Real-time urban monitoring using cell phones: a case study in Rome, IEEE Trans. Intell. Transp. Syst. 25 (2) (2011) 141–151.
[9] P. Crucitti, V. Latora, S. Porta, Centrality measures in spatial networks of urban streets, Phys. Rev. E 73 (2006) 036125.
[10] P. Crucitti, V. Latora, S. Porta, The network analysis of urban streets: a primal approach, Environ. Plann. B: Plann. Des. 33 (5) (2006) 705–725.
[11] P. Crucitti, V. Latora, S. Porta, The network analysis of urban streets: a dual approach, Physica A: Stat. Mech. Appl. 369 (2) (2006) 853–866.
[12] L.C. Freeman, A set of measures of centrality based on betweenness, Sociometry 40 (1) (1977) 35–41.
[13] L.C. Freeman, Centrality in social networks: conceptual clarification, Social Networks 1 (3) (1979) 215–239.
[14] L.C. Freeman, S.P. Borgatti, D.R. White, Centrality in valued graphs: a measure of betweenness based on network flow, Social Networks 13 (1991) 141–154.
[15] N.E. Friedkin, Theoretical foundations for centrality measures, Am. J. Sociol. 96 (1991) 1478–1504.
[16] C.H. Hubbell, An input output approach to clique identification, Sociometry 28 (1965) 377–399.
[17] S.K. Jeong, Y.U. Ban, Computational algorithms to evaluate design solutions using space syntax, Comput.-Aided Des. 43 (6) (2011) 664–676.
[18] B. Jiang, Ranking spaces for predicting human movement in an urban environment, Int. J. Geog. Inf. Sci. 23 (7) (2009) 823–837.
[19] L. Katz, A new index derived from sociometric data analysis, Psychometrika 18 (1953) 39–43.
[20] A.N. Langville, C.D. Meyer, Deeper inside PageRank, Internet Math. 1 (3) (2005) 335–380.
[21] M.E.J. Newman, A measure of betweenness centrality based on random walks, 2003. Available via arXiv:cond-mat/0309045.
[22] L. Page, S. Brin, R. Motwani, T. Winograd, The PageRank citation ranking: bringing order to the web, Technical Report 1999–66, Stanford InfoLab, 1999.
[23] F. Pedroche, Métodos de cálculo del vector PageRank, Bol. Soc. Esp. Mat. Apl. 39 (2007) 7–30 (in Spanish).
[24] K.A. Stephenson, M. Zelen, Rethinking centrality: methods and examples, Social Networks 11 (1989) 1–37.
[25] S. Wasserman, K. Faust, Social Network Analysis, Cambridge University Press, Cambridge, 1994.
[26] B. Yang, M. Kaul, C. Jensen, Using incomplete information for complete weight annotation of road network, IEEE Trans. Knowl. Data Eng. 26 (5) (2011) 1267–1279.
[27] B. Yang, M. Kaul, C. Jensen, Using incomplete information for complete weight annotation of road network (Extended Version). arXiv:1308.0484v2 [cs.LG].