Accepted Manuscript
Generation of Power-law Networks by Employing Various Attachment Schemes: Structural Properties Emulating Real World Networks
Swarup Chattopadhyay, C.A. Murthy
PII: S0020-0255(17)30569-8
DOI: 10.1016/j.ins.2017.02.057
Reference: INS 12780
To appear in:
Information Sciences
Received date: 22 June 2016
Revised date: 22 February 2017
Accepted date: 28 February 2017
Please cite this article as: Swarup Chattopadhyay, C.A. Murthy, Generation of Power-law Networks by Employing Various Attachment Schemes: Structural Properties Emulating Real World Networks, Information Sciences (2017), doi: 10.1016/j.ins.2017.02.057
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Generation of Power-law Networks by Employing Various Attachment Schemes: Structural Properties Emulating Real World Networks
Swarup Chattopadhyay†,1 and C. A. Murthy†
† Machine Intelligence Unit, Indian Statistical Institute, Kolkata-700108, India
Abstract
In this article we propose a general methodology for constructing complex networks. Popular selection schemes from Genetic algorithms are used for this construction. It is shown mathematically that, under some weak constraints, the degree distribution of the resulting networks follows a power law, as seen in real world networks. A power-law degree distribution is one of the most significant structural characteristics observed in many real-world complex networks. The main reason behind the emergence of this phenomenon is the mechanism of preferential attachment, which states that in a growing network a node with higher degree is more likely to receive new links. However, degree is not the only key factor influencing network growth towards a power-law degree distribution. Instead, several other factors must be at work, whose cumulative effect, called the fitness of a node, plays a significant role in attracting other nodes and thereby producing power-law networks. The concept of fitness can be thought of as a generalization of node degree. Heterogeneity in preferential linking also plays an important role in producing power-law networks in this context. The proposed construction methodology, which also leads to power-law networks, combines the inherent fitness value of a node, drawn from a particular distribution, with various attachment schemes based on different selection methods commonly used in Genetic algorithms. Six different selection schemes are used in total. Well-known structural measures such as the average degree of the nearest neighbors, average path length and clustering coefficient are calculated for each newly generated network to understand its behavior. It has been found that the six schemes can be divided into two distinct groups of three on the basis of their structural properties, where one of the two groups produces proper power-law networks whose topological properties are similar to those observed in the real world.
Finally, extensive simulations and experiments over scientific collaboration networks validate the effectiveness of the proposed models.
Keywords: Complex networks, Power-law distributions, Genetic algorithm, Preferential attachment, Scale-free networks, Social networks, Pattern recognition.
1. Introduction and related works
During the last decade, there has been growing interest in the study of large scale real world complex networks [3, 39, 1, 36, 33, 6, 37], including their various modelling aspects. Complex networks are normally modeled using graph theory, in which evolving or growing sets of vertices are connected by edges. The vertices are the individuals of the system and the edges represent the relations between them. Complex networks display significant statistical and topological resemblances even though their nodes and links may have different interpretations. In the early stages of complex network research, several models based on random graph theory were proposed [7] in order to capture the topology of a complex network. One of the earliest theoretical models of a complex network was proposed by Erdős and Rényi [14, 15], in which the vertices are connected at random. The degree distribution of this model is binomial, which can be approximated by a Poisson distribution for a large number of nodes. As a result, the structure of these random graphs is almost uniform, with most vertices having approximately the same degree. It has also been observed that the degree distributions of these networks can be grouped and modeled using truncated geometric distributions [12]. Recent research [46, 30, 21, 44] has focused on the analysis of structural characteristics such as degree distribution, correlation coefficient, average nearest neighbor degree, average path length, clustering coefficient and community discovery. The node degree distribution has been viewed as an important structural characteristic of social and information networks [35]. Most of the interest has been concentrated on scale-free networks, which are characterized by a power-law degree distribution [35, 4]; i.e., if p(d) is the fraction of nodes in the network having degree d (i.e., having d connections to other nodes), then (for suitably large d)

$$p(d) = c\,d^{-\lambda},$$

where $c = (\lambda - 1)\,n_0^{\lambda - 1}$ is a normalization constant (in the continuous approximation), n0 is the minimum degree in the network, and λ > 1 is the power-law exponent. This distribution is observed in many man-made and naturally occurring phenomena, including city sizes, incomes, word frequencies, earthquake magnitudes and the number of phone calls per customer [42, 34, 17]. Barabási and Albert (1999), in their seminal work, introduced scale-free networks for the first time [4, 2]. They explored several large databases describing the topologies of real world complex networks, including the WWW [2], the Internet [16], metabolic networks [22], protein networks [23], co-authorship networks [36], sexual contact networks [32], language [43] and the web of actors in Hollywood [1]. The empirical results showed that the degree distribution P(d) in these networks decays as a power law for large d, with the exponent lying between 2 and 3. These networks have just a few nodes with high degree while most other nodes have small degree, a property not found in standard Erdős–Rényi (ER) random graphs [14, 15]. Consequently, the ER model fails to capture the topology of large scale real world complex networks, which possess power-law (heavy-tailed) degree distributions. Therefore, some mechanism must underlie the heavy-tailed degree distributions that arise naturally in complex networks. The two dominant concepts behind the emergence of heavy-tailed degree distributions in complex networks are: (1) growth, i.e. the network grows in time by the addition of new vertices at a constant rate, and (2) the mechanism of preferential attachment proposed by Barabási and Albert [4, 2]. Preferential attachment means that new vertices added to the network attach preferentially to high-degree vertices. Thus growth and preferential attachment emerged as the principal mechanisms behind scale-free degree distributions.

1 Corresponding author. Tel. +91-33-2575 3104/3100, Fax +91-33-2578-3357
Preprint submitted to Elsevier, February 22, 2017
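The normalization constant c = (λ − 1) n0^(λ−1) is the one obtained when degree is treated as a continuous variable on [n0, ∞). As an illustration only (the function names and the values λ = 2.5, n0 = 2 are hypothetical choices, not taken from the paper), the following Python sketch checks numerically that this c makes p(d) integrate to one, and draws samples by inverting the CDF F(d) = 1 − (d/n0)^(1−λ).

```python
import math
import random


def power_law_pdf(d: float, lam: float, n0: float) -> float:
    """Continuous power-law density p(d) = c * d**(-lam) on [n0, inf)."""
    if d < n0:
        return 0.0
    c = (lam - 1.0) * n0 ** (lam - 1.0)  # normalization constant
    return c * d ** (-lam)


def integrate_pdf(lam: float, n0: float, upper: float, steps: int = 100_000) -> float:
    """Trapezoidal rule for the integral of p over [n0, upper] on a log-spaced grid."""
    h = math.log(upper / n0) / steps
    total = 0.0
    for k in range(steps + 1):
        d = n0 * math.exp(k * h)  # d = n0 * e^u, so dd = d du
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * power_law_pdf(d, lam, n0) * d
    return total * h


def sample_power_law(lam: float, n0: float, rng: random.Random) -> float:
    """Inverse-transform sample: solve F(d) = u for a uniform u in [0, 1)."""
    u = rng.random()
    return n0 * (1.0 - u) ** (-1.0 / (lam - 1.0))


if __name__ == "__main__":
    print(integrate_pdf(2.5, 2.0, 1e6))  # close to 1
    rng = random.Random(0)
    print([round(sample_power_law(2.5, 2.0, rng), 2) for _ in range(5)])
```

Because the tail mass beyond the upper limit is (upper/n0)^(1−λ), the truncated integral approaches one quickly for λ well above 1.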
However, it has been observed [11, 31] that preferential attachment can be difficult for a new node to perform, because global degree information is not available in general. In order to overcome this problem, it has been suggested [11, 31, 5, 40] that each node of the complex network may be associated with a “fitness value” that represents its ability to attract links. In real world networks the fitness will be related to intrinsic qualities of the vertex, such as rank, betweenness or closeness. It can be represented as a non-negative real variable x, drawn from a probability distribution ρ(x) and assigned to each vertex of the network [40, 19]. Following this approach, we use various attachment schemes (different from the preferential attachment scheme) to create successive edges during network evolution, by selecting likely vertices from the existing set of vertices based on their fitness values. As seen in this article, heterogeneity in preferential linking plays an important role in producing power-law networks with real exponents. Here we propose new methods, also leading to power-law networks, that combine the inherent fitness value of a node, drawn from a particular distribution, with various attachment schemes based on different selection methods commonly used in Genetic Algorithms (GA). A Genetic Algorithm (GA) is a method that mimics the process of natural evolution. Selection is an important operation in the GA process; the selection mechanism mimics “survival of the fittest” in biological evolution. The main principle of a selection strategy is “the better an individual, the higher its chance of becoming a parent”. In the formation of real world complex networks, at every time step, each newcomer node is likely to get attached to the fittest node (e.g. the node having the highest degree), thereby producing a power-law (or heavy-tailed) structure. Since the concept of preferential attachment is similar to the concept of selection in genetic algorithms, selection mechanisms are expected to perform similarly to preferential attachment and can therefore be used to model complex networks. Several selection methods exist in the literature: Roulette Wheel Selection [8], Stochastic Universal Sampling, Tournament Selection, Boltzmann Selection and others [20, 24]. Out of these, six selection methods, described below, are used in this article to obtain the vertices for creating new edges at each time step of network evolution. These six schemes have been chosen because of their popularity in the GA literature and their general robustness in selecting better individuals [9, 8]. Ghadge et al. [19] used the log-normal fitness attachment (LNFA) algorithm, an approach similar to the first of our GA based selection schemes (RWS), and derived the degree distribution by estimating the power-law exponent λ, but disregarded other important structural properties observed in real world networks such as the clustering coefficient and triad participation. The proposed work differs from LNFA [40, 19] by including several more selection schemes for generating power-law networks, along with a theoretical justification of their power-law behavior. In addition to the degree distribution, we also experimentally evaluate various other important structural properties of the generated networks. This work also validates the proposed models against real world networks by computing their important topological properties. This article deals with the analysis of these selection methodologies for generating power-law networks.
This paper mainly focuses on the generation of complex networks with power-law degree distributions using various attachment schemes based on selection methods frequently used in GA. After initializing the fitness distribution, networks are generated using six different selection methods commonly used in GA. For each newly generated network, we calculate the following well known structural and/or topological measures: average degree of the nearest neighbors, average path length, central point dominance, clustering coefficient, global efficiency, etc. These indices are observed to reflect the connectivity and dynamism of a network. We have observed that the networks generated through the different attachment schemes may broadly be divided into two groups, each group having similar structural and/or topological properties, even though both groups result in power-law networks. Each of these groups produces networks that exhibit behavior similar to that observed in real world networks. Experiments with real world networks demonstrate that one of these groups is able to capture several of the structural and/or topological properties of networks and outperforms the well known Kronecker graph model [30], the LNFA model [19] and the MAG model [25]; we use the first as a baseline model and the latter two as state-of-the-art models for comparison purposes. We have also observed that the other group, for certain structural measures, performs better than the baseline model and sometimes better than the state-of-the-art models. The remainder of the paper is organized as follows. Section 2 provides a brief overview of early network models. In Section 3, we describe different selection methods, commonly used in GA, for choosing the most likely node in a network. We also briefly describe some inherently emergent structural and/or topological properties of real world complex networks in Section 4.
In Section 5, we propose log-normal fitness attachment algorithms based on different selection methods frequently used in GA. Section 6 presents the structural and/or topological measures calculated for the resulting power-law networks. Section 7 discusses the behaviour of the proposed schemes under fitness density functions other than the log-normal. Section 8 is devoted to the comparison of the proposed models with the baseline and state-of-the-art models over real world networks. Finally, Section 9 concludes the paper.

Table 1: Notations used in this paper
Graph-specific:
G(V, E): a graph with set of nodes V and set of edges E
A: adjacency matrix of graph G
N = |V|: number of vertices in G
E: number of edges in G
P(i → j): probability that a new vertex i is connected to an existing vertex j
du: degree of a node u
di: average degree of the ith node
dnn(d): average nearest neighbors degree
lij: a geodesic path (or shortest path) between vertices i and j
ℵ(d): set of vertices with degree greater than d
Model-specific:
ρ: density function of the fitness distribution
σ: parameter of the log-normal distribution used to generate the fitness
fk: observed fitness value associated with node k
Γi: random variable corresponding to the fitness of node i
LP(i → j): linking probability between nodes i and j
n0: number of nodes the network is initialized with
n: number of nodes in the final network
m: number of nodes an incoming node connects to when it first joins the network

2. Basic Network Models

The Random Network Model: The Erdős and Rényi (ER) random graph model [14, 15] consists of a fixed number N of nodes (or vertices), and each pair of nodes is connected with probability p, 0 < p < 1. The network contains pN(N − 1)/2 edges on average. The degree distribution is binomial,

$$P(d) = \binom{N-1}{d} p^{d} (1-p)^{N-1-d},$$
so the average degree is z = p(N − 1). For large N and fixed z, the degree distribution is well approximated by a Poisson distribution.

The Small-World Model: Many real world networks exhibit the so-called small-world property, i.e. most vertices can be reached from the others through a small number of edges. The most popular model of random networks with small-world characteristics and high clustering coefficients was developed by Watts and Strogatz [45] and is called the Watts–Strogatz (WS) small-world model. The model is based on a procedure that rewires each edge with probability p. Alternative procedures for constructing small-world networks, based on adding edges instead of rewiring, have also been proposed [38].

The Scale-Free Network Model: As discussed above, several important large growing networks in nature are scale-free, i.e., their degree distributions follow a power law (P(d) has no characteristic scale). The reason for such a heavy-tailed distribution is that, as a network grows, a new vertex tends to connect to pre-existing high-degree vertices. The Barabási–Albert (BA) model is a model of network growth inspired by the formation of the World Wide Web and is based on two basic ingredients, growth and preferential attachment [4, 2, 10]; the model is defined in two steps: (1) Growth: starting with a small number (m0) of vertices, at every time step we add a new vertex which is connected to m (≤ m0) existing nodes. (2) Preferential attachment: let P(i → j) denote the probability that a new vertex i is connected to an existing vertex j. The principle of preferential attachment assumes that P(i → j) is proportional to the connectivity dj of vertex j, i.e., $P(i \to j) = d_j / \sum_u d_u$. Variations of the preferential attachment model, namely the linear and the non-linear preferential attachment models, have been proposed by Dorogovtsev et al. [13] and Krapivsky et al. [26] respectively.
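To make the BA growth rule concrete, here is a minimal Python sketch. It is illustrative only: the complete initial graph on the m0 seed vertices and the rule of redrawing a vertex that was already picked (so that no multiple edges arise) are assumptions, since the description above does not fix them, and the function name is hypothetical. It uses the standard device of keeping a list in which each vertex appears once per incident edge, so that a uniform draw from that list picks vertex j with probability dj / Σu du.

```python
import random


def barabasi_albert(n: int, m0: int, m: int, seed: int = 0) -> list[tuple[int, int]]:
    """Grow a BA network: each new vertex links to m existing vertices,
    picking vertex j with probability d_j / sum_u d_u."""
    if m0 < 2 or not 1 <= m <= m0 or n < m0:
        raise ValueError("need m0 >= 2, 1 <= m <= m0 and n >= m0")
    rng = random.Random(seed)
    # Assumed initial state: a complete graph on the m0 seed vertices.
    edges = [(u, v) for u in range(m0) for v in range(u + 1, m0)]
    # Every vertex appears here once per incident edge, so a uniform draw
    # from this list selects vertex j with probability d_j / sum_u d_u.
    endpoints = [x for edge in edges for x in edge]
    for new in range(m0, n):
        chosen: set[int] = set()
        while len(chosen) < m:  # repeated draws are discarded (no multi-edges)
            chosen.add(rng.choice(endpoints))
        for j in sorted(chosen):
            edges.append((new, j))
            endpoints.extend((new, j))
    return edges


if __name__ == "__main__":
    g = barabasi_albert(n=1000, m0=3, m=2, seed=1)
    degree: dict[int, int] = {}
    for u, v in g:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    print(len(g), max(degree.values()))
```

Discarding repeated draws means that only the first of the m picks follows dj / Σu du exactly; the later picks are conditioned on excluding vertices already chosen.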
In all the models described above, preferential attachment is an essential component for generating power-law networks. In a growing network, the preferential attachment scheme selects higher-degree nodes from the existing ones to create edges with the new node. Hence the degree of a vertex is the only attribute responsible for creating new edges in a preferential attachment network. In this article we describe how, in many complex networks, each node has, instead of its degree, an associated intrinsic fitness value representing its propensity to attract links. Selection of nodes based on their fitness values plays an important role in producing power-law (scale-free) networks. The following section discusses the GA based selection methods used in this article to select the fittest vertices for creating new edges during network evolution.

3. Different Selection Methods
Selection is an important operation in the GA process, and several selection methods exist. As mentioned before, six different selection methods are considered in this work, namely the roulette wheel selection (RWS), the stochastic universal sampling (SUS), the stochastic remainder selection (SRS), the linear rank selection (LRS), the exponential rank selection (ERS) and the uniform rank selection (URS). In this section, we briefly describe each of these selection methods, which are used in this article to generate power-law networks.

Roulette Wheel Selection (RWS): Roulette wheel selection is one of the traditional and simplest GA selection techniques [20]. In this technique, all the individuals in the population are placed on a roulette wheel according to their fitness values: each individual is assigned a segment of the wheel whose size is proportional to its fitness. A random number is generated, and the individual whose segment spans the random number is selected. Thus individuals are selected with a probability that is directly proportional to their fitness values. Let f1, f2, ..., fn be the fitness values of individuals 1, 2, ..., n respectively. Then the selection probability P(i) of individual i is

$$P(i) = \frac{f_i}{\sum_{j=1}^{n} f_j}.$$
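A single roulette wheel spin can be sketched in Python as follows (the function name is hypothetical; this is one common way to implement the scheme, not necessarily the authors' code). The cumulative sums of the fitness values are the right boundaries of the wheel segments, and a binary search finds the segment containing a uniform point on [0, Σj fj).

```python
import bisect
import itertools
import random


def roulette_wheel_select(fitness: list[float], rng: random.Random) -> int:
    """Return index i with probability P(i) = f_i / sum_j f_j."""
    # Right boundaries of the consecutive wheel segments f_1, f_2, ..., f_n.
    boundaries = list(itertools.accumulate(fitness))
    spin = rng.random() * boundaries[-1]  # uniform point on [0, total)
    # The first segment whose right boundary exceeds the spin contains it;
    # min() guards against floating-point round-off at the upper end.
    return min(bisect.bisect_right(boundaries, spin), len(fitness) - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    counts = [0, 0, 0]
    for _ in range(10_000):
        counts[roulette_wheel_select([1.0, 2.0, 7.0], rng)] += 1
    print(counts)  # roughly proportional to 1 : 2 : 7
```

Using bisect_right means an individual with zero fitness owns an empty segment and is never selected.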
Stochastic Universal Sampling (SUS): Stochastic universal sampling is a single-phase sampling algorithm with minimum spread and zero bias [20]. Instead of the single selection pointer employed in roulette wheel methods, SUS uses N equally spaced pointers, where N is the number of individuals to be selected. With the wheel normalized to unit length, the distance between consecutive pointers is 1/N, and a single random number ptr is generated in the range [0, 1/N]. The N pointers are then placed at ptr, ptr + 1/N, ..., ptr + (N − 1)/N, and the individuals whose fitness segments span the positions of the pointers are selected.

Stochastic Remainder Selection (SRS): The basic idea of this selection scheme is to remove or copy strings depending on their reproduction counts, which are computed for each string. Initially, the selection probability P(i) is calculated as

$$P(i) = \frac{f_i}{\sum_{j=1}^{n} f_j}.$$
The number of guaranteed selections of individual i is the integral part of e_i = P(i) · N, where N is the number of individuals to be selected. The fractional part of e_i is then used as a probability for selecting individual i, and a roulette wheel selection is used to select the remaining individuals.

Ranking-Based Selection: In rank-based selection schemes, the probability of an individual being selected depends on its fitness rank relative to the entire population [8]. A rank-based selection scheme first sorts the individuals in the population according to their fitness values. A function is then used to map the rank of each individual in the sorted list to its selection probability. The mapping function may be linear (linear ranking) or non-linear (non-linear ranking). For more details about Linear Rank Selection (LRS), Exponential Rank Selection (ERS) and Uniform Rank Selection (URS), please see the supplementary material.

In this article the selection methods described above are used to select the vertices for creating new edges during network evolution. Even in real world scenarios, the selection of individuals plays a significant role in forming social groups during the generation of a social or information network. In the following section we describe some structural measures, which are emergent topological properties of real world complex networks, and which we compute for each of the networks generated by the attachment schemes based on the selection methods described above.

4. Structural Properties

Real world complex networks exhibit several interesting structural properties such as average degree of the nearest neighbors, central point dominance, central edge dominance, clustering coefficient, global efficiency, diameter, transitivity, average shortest path, singular values, singular vectors, degree centrality, motifs, etc. These indices differ significantly between regular networks and random networks. These topological measurements provide a meaningful interpretation of a network’s dynamical characteristics and its connectivity. Therefore the behavior of these inherently emergent structural and/or topological properties over the proposed networks is worth studying. Our objective in this article is not to perform an extensive review of all possible topological metrics, but to highlight some global features of different networks. To evaluate the performance of our proposed models, we consider a few well-known and important structural characteristics, viz. Average Nearest Neighbors Degree (dnn(d)) [27, 41], Degree Distribution (DD) [27, 41], Clustering Coefficient (CC) [45], Triad Participation (TP) [45], Average Shortest Path Length (APL) [27], Diameter (DM) [27], Central Point Dominance (CPD) [18], Central Edge Dominance (CED) [18], Rich-Club Coefficient (φ(d)) [47], Singular Values (SValue) [25], Singular Vectors (SVector) [25] and Global Efficiency (GE) [27]. For more mathematical details about these indices and their emerging characteristics, please see the supplementary material.

5. Proposed GA based log-normal fitness attachment algorithms
5.1. Motivation
We propose several network generation models using the aforementioned GA based attachment schemes, which produce the heavy-tailed degree distributions observed in many real world complex networks. This generation process does not require any knowledge of the degrees of existing nodes. It has been observed that in most real-world networks each node is associated with several attributes whose cumulative effect, in some way, represents its propensity to create links with the others in the network [19]. For example, in the World Wide Web, one web page may become more popular than another because of its intrinsic content value, even if both are created at the same time. The same holds when judging the quality of authors who have the same number of papers in a collaboration network in a particular subject domain. Similarly, in a citation network, a paper may get cited because of factors such as its content, the eminence of its author(s) and the reputation of the journal [19]. Therefore the quantity that determines the overall attractiveness of a paper for citation is essentially a function of various such factors.
5.2. Model Description
Following these examples, we can infer that in many complex networks each node has an associated quantity which represents its propensity to attract links. Here, this quantity is taken to be a product of several contributing factors [19]. That is, to any node i in the network there is an associated non-negative real number Γi, called the fitness of node i, of the form

$$\Gamma_i = \prod_{l=1}^{L} \gamma_l \quad\Rightarrow\quad \ln \Gamma_i = \sum_{l=1}^{L} \ln \gamma_l,$$

where each factor γl is a positive real number (so that ln γl is defined). The number of factors L associated with a node is not known a priori. We therefore assume that the number of factors associated with each node is reasonably large and that the factors are statistically independent. Then, by the central limit theorem (under mild conditions, e.g. finite variance of each ln γl), ln Γi is approximately normally distributed, and hence the fitness Γi is approximately log-normally distributed, irrespective of the distribution of each factor γl. With Y = ln X, the density functions of the normal and the log-normal distributions are respectively

$$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(y-\mu)^2/2\sigma^2}, \quad y \in (-\infty, +\infty)$$

and

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-(\ln x-\mu)^2/2\sigma^2}, \quad x \in (0, \infty),$$
where µ is the mean and σ is the standard deviation of ln X (i.e. σ² is its variance). In the growth process of real world complex networks, at every time step, each newcomer node gets attached to the fittest node (e.g. the node having the highest degree), thereby producing a power-law (or heavy-tailed) structure. The selection of individuals in a growing network therefore plays a role analogous to the selection mechanism in Genetic Algorithms (GA), which can thus be used to construct power-law networks. We now propose our main algorithm to construct a power-law network, based on different selection (attachment) schemes frequently used in GA. The parameters of the main algorithm are σ, n0, m ≤ n0 and n, as defined in Table 1. The main algorithm for network generation is stated below.
Step 1: Initially, the network is a graph with n0 nodes. Each of these initial nodes has a fitness generated according to the log-normal distribution with parameter σ.
Step 2: At every time step a new node is added to the network. This new node has an associated log-normally distributed random fitness value, and m new edges are created by connecting it to m distinct nodes already present in the network. Suppose that the network currently consists of n′ nodes 1, 2, ..., n′ (n′ < n). The m nodes to be connected to the new node are selected by a given GA based selection scheme described in Section 3.
Step 3: The algorithm stops when the number of nodes in the network reaches n.
Note that different selection schemes commonly used in GA may be implemented in Step 2 of the algorithm, and each such scheme results in a new network generation method. There are more than 50 selection schemes in Genetic Algorithms; among them, we chose six popular schemes commonly used in the literature. After completion of the main protocol with a particular selection scheme, a network with n vertices is generated. Six different algorithms are thus obtained through Step 2 of the main protocol by using the six GA based selection schemes, viz. RWS, SRS, SUS, RankLinear, RankExp and RankUniform, as described in Section 3. In the subsequent sections we refer to the six generation algorithms, as well as their resultant networks, by the names of their respective selection schemes.
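The following Python sketch strings Steps 1 to 3 together with RWS as the Step 2 scheme. It is an illustration under stated assumptions, not the authors' implementation: the n0 initial nodes are isolated (e0 = 0, one of the initial configurations allowed in Section 5.3), the log-normal location parameter is µ = 0, the m targets are obtained by repeated roulette-wheel spins with repeats discarded so that they are distinct, and the function names are hypothetical. Replacing the spin loop with another selection routine would give the other generators.

```python
import bisect
import itertools
import random


def generate_rws_network(n: int, n0: int, m: int, sigma: float, seed: int = 0):
    """Steps 1-3 of the main algorithm with RWS as the attachment scheme.

    Assumptions for this sketch: the n0 initial nodes are isolated (e0 = 0),
    fitness is log-normal with mu = 0, and repeated RWS spins that hit an
    already chosen node are discarded so that the m targets are distinct.
    """
    if not 1 <= m <= n0 <= n:
        raise ValueError("need 1 <= m <= n0 <= n")
    rng = random.Random(seed)
    # Step 1: n0 initial nodes, each with a log-normal fitness.
    fitness = [rng.lognormvariate(0.0, sigma) for _ in range(n0)]
    edges: list[tuple[int, int]] = []
    # Step 2: add one node per time step; Step 3: stop at n nodes.
    for new in range(n0, n):
        boundaries = list(itertools.accumulate(fitness))
        targets: set[int] = set()
        while len(targets) < m:
            spin = rng.random() * boundaries[-1]
            hit = bisect.bisect_right(boundaries, spin)
            targets.add(min(hit, new - 1))  # guard against round-off at the end
        edges.extend((new, j) for j in sorted(targets))
        # The newcomer's fitness only affects where later nodes attach.
        fitness.append(rng.lognormvariate(0.0, sigma))
    return fitness, edges


def degrees(n: int, edges: list[tuple[int, int]]) -> list[int]:
    """Degree of every node 0..n-1."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


if __name__ == "__main__":
    fit, edges = generate_rws_network(n=2000, n0=5, m=2, sigma=1.0, seed=0)
    print(len(edges), max(degrees(2000, edges)))
```

With distinct targets and e0 = 0, the edge count is exactly m(n − n0), the upper bound m(n − n0) + e0 stated in Section 5.3.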
5.3. Model Analysis and Derivation
The network, in its initial state, has n0 nodes, which can form a complete graph, a tree or just a set of isolated nodes. At each time step of the network generation, a new node connects to m of the pre-existing nodes, with linking probabilities based purely on their fitness values. Multiple edges are not allowed in the constructed network. After completion of the algorithm, the generated network has n nodes and at most m(n − n0) + e0 links, where e0 is the number of links in the initial network. The linking probability of the newcomer to the other nodes depends only on their fitness values and on the selection scheme used. The linking probabilities for the respective selection schemes are detailed below.
5.3.1. Definitions
Let LP(i → j) ∈ [0, 1] be the linking probability that a new vertex i is connected to an existing vertex j. Assume that at time step t the network has k vertices {1, 2, 3, ..., k} with fitness values {f1, f2, f3, ..., fk}, drawn from a log-normal distribution with density ρ. At time t + 1, a new node k + 1 with fitness fk+1 comes in and creates an edge with node r (amongst others) with probability LP(k + 1 → r). In each of the cases RWS, SUS and SRS, the linking probability depends directly on the fitness values of the existing nodes and is defined as follows:
1. For RWS, the new node k + 1 creates an edge with linking probability
$$LP(k+1 \to r) = \frac{f_r}{\sum_{i=1}^{k} f_i}.$$
2. For SUS, the new node k + 1 creates m edges with linking probability
$$LP(k+1 \to r) = \begin{cases} 1, & \text{if } \dfrac{f_r}{\sum_{i=1}^{k} f_i} \ge \dfrac{1}{m},\\[2ex] m\,\dfrac{f_r}{\sum_{i=1}^{k} f_i}, & \text{if } \dfrac{f_r}{\sum_{i=1}^{k} f_i} < \dfrac{1}{m}. \end{cases}$$
3. For SRS, the new node k + 1 creates m edges with the linking probability given below. Let $m f_r / \sum_{i=1}^{k} f_i = \alpha_r + \beta_r$, where $\alpha_r$ is the integral part and $\beta_r$ the fractional part, $0 < \beta_r < 1$ for all r. Let $\sum_{r=1}^{k} \alpha_r = \alpha$, so that $\sum_{r=1}^{k} \beta_r = m - \alpha$, and let $\beta'_r = \beta_r / (m - \alpha)$, so that $\sum_{r'=1}^{k} \beta'_{r'} = 1$. There are two cases:
3a: Suppose α < 1; then α = 0. Therefore $f_i / \sum_{p=1}^{k} f_p < 1/m$ for all i = 1, 2, ..., k. We then consider all these fi to be insignificant, and hence we choose only one node, with linking probability
$$LP(k+1 \to r) = \frac{\beta_r}{\sum_{i=1}^{k} \beta_i}.$$
3b: Suppose α ≥ 1. We choose all the nodes i for which αi ≥ 1; hence we choose α many nodes. Aside from these α nodes, we choose one more node with probability proportional to its fractional part βr. Hence in this case the nodes are chosen with linking probability
$$LP(k+1 \to r) = \begin{cases} 1, & \text{if } \alpha_r \ge 1,\\[1ex] \dfrac{\beta_r}{\sum_{i=1}^{k} \beta_i}, & \text{for the remaining nodes } (0 < \beta_r < 1). \end{cases}$$
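The three fitness-based linking probabilities can be evaluated directly from a vector of existing fitness values. The Python sketch below is a minimal illustration of items 1 to 3 (function names are hypothetical); the zero-denominator guard in case 3b is an added assumption for degenerate inputs, since log-normal fitness values are always positive.

```python
import math


def lp_rws(f: list[float]) -> list[float]:
    """Item 1 (RWS): LP(k+1 -> r) = f_r / sum_i f_i."""
    total = sum(f)
    return [fr / total for fr in f]


def lp_sus(f: list[float], m: int) -> list[float]:
    """Item 2 (SUS): 1 if f_r / sum_i f_i >= 1/m, else m * f_r / sum_i f_i."""
    total = sum(f)
    return [min(1.0, m * fr / total) for fr in f]


def lp_srs(f: list[float], m: int) -> list[float]:
    """Item 3 (SRS): split m * f_r / sum_i f_i into alpha_r + beta_r."""
    total = sum(f)
    expected = [m * fr / total for fr in f]
    alpha = [math.floor(e) for e in expected]        # integral parts
    beta = [e - a for e, a in zip(expected, alpha)]  # fractional parts
    beta_sum = sum(beta)
    if sum(alpha) == 0:
        # Case 3a: a single node is drawn in proportion to beta_r.
        return [b / beta_sum for b in beta]
    # Case 3b: nodes with alpha_r >= 1 are always linked; every other node
    # gets beta_r / sum_i beta_i for the one extra link.
    return [1.0 if a >= 1 else (b / beta_sum if beta_sum > 0 else 0.0)
            for a, b in zip(alpha, beta)]


if __name__ == "__main__":
    f = [1.0, 1.0, 2.0]
    print(lp_rws(f), lp_sus(f, 2), lp_srs(f, 2))
```

For example, with fitness values (1, 1, 2) and m = 2, the fittest node holds exactly half of the wheel, so both SUS and SRS link to it with probability 1.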
In the ranking-based algorithms (RankLinear, RankExp, RankUniform), the linking probability does not depend directly on the fitness values of the existing nodes; instead, it depends on their ranks, which are determined by the fitness values. Suppose that at time step t the network has k nodes {i_1, i_2, i_3, ..., i_k} with fitness values {f_{i_1}, f_{i_2}, f_{i_3}, ..., f_{i_k}} such that f_{i_1} < f_{i_2} < f_{i_3} < ... < f_{i_k}. The nodes are ranked so that the best node receives rank k and the worst node receives rank 1; hence the nodes i_1, i_2, i_3, ..., i_k receive ranks 1, 2, 3, ..., k respectively. The new node k + 1 then creates edges with the existing nodes according to their ranks, and the linking probabilities for RankLinear, RankExp and RankUniform are defined as follows.
For RankLinear,
$$LP(k+1 \to i_r) = 2 - SP + 2(SP - 1)\,\frac{\mathrm{Rank}(i_r) - 1}{k - 1}, \qquad 1 \le SP \le 2,\ 1 \le r \le k$$
For RankExp,
$$LP(k+1 \to i_r) = \frac{1}{c}\left(1 - e^{\,1 - \mathrm{Rank}(i_r)}\right), \qquad c \text{ is a constant},\ 1 \le r \le k$$
For RankUniform,
$$LP(k+1 \to i_r) = \frac{1}{k}, \qquad 1 \le r \le k$$
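A small sketch of the three ranking-based rules follows. Two assumptions are ours and are labelled in the code: the RankLinear weights as written above sum to k, so we divide them by k to obtain probabilities, and for RankExp we take the constant c to be the sum of the weights 1 − e^{1−Rank}. Ties in fitness are broken by node index, which the text does not specify. All function names are hypothetical.

```python
import math


def ranks_by_fitness(fitness):
    """Rank 1 for the lowest fitness, rank k for the highest (ties broken by index: assumption)."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    ranks = [0] * len(fitness)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_linear_probs(fitness, sp=1.5):
    """RankLinear weights 2 - SP + 2(SP-1)(Rank-1)/(k-1), divided by k (assumed normalization)."""
    k = len(fitness)
    if k < 2:
        return [1.0] * k
    weights = [2 - sp + 2 * (sp - 1) * (r - 1) / (k - 1) for r in ranks_by_fitness(fitness)]
    return [w / k for w in weights]


def rank_exp_probs(fitness):
    """RankExp weights 1 - exp(1 - Rank), with c taken as the sum of the weights (assumption)."""
    weights = [1 - math.exp(1 - r) for r in ranks_by_fitness(fitness)]
    c = sum(weights)
    if c == 0:                       # only possible when k = 1
        return [1.0 / len(fitness)] * len(fitness)
    return [w / c for w in weights]


def rank_uniform_probs(fitness):
    """RankUniform: every existing node receives probability 1/k."""
    k = len(fitness)
    return [1 / k] * k


fitness = [0.5, 2.0, 1.0]
print(ranks_by_fitness(fitness))
print(rank_linear_probs(fitness, sp=2.0))
print(rank_exp_probs(fitness))
print(rank_uniform_probs(fitness))
```

Only the order of the fitness values enters these functions, which illustrates why the degree distribution of the ranking-based networks is not expected to track the spread of the fitness distribution.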
5.3.2. Derivation in Case: RWS
For the RWS scheme, we now derive a general expression for the degree distribution of the generated networks. Suppose the network grows until it has e edges, with one edge added at each time step (i.e. m = 1). Suppose that at time step t − 1 the network has i − 1 nodes, and that at time t the ith node arrives and creates an edge with the jth pre-existing node with probability LP(i → j), where
j < i. Over time, new nodes arrive and create edges with node i: the kth incoming node, k > i, creates an edge with the ith node with probability LP(k → i). The average degree of the ith node in the finished network of n nodes can therefore be calculated as
$$
\begin{aligned}
d_i &= \sum_{j=1}^{i-1} LP(i \to j) + \sum_{k=i+1}^{n} LP(k \to i) \\
&= \sum_{j=1}^{i-1} \frac{f_j}{\sum_{p=1}^{i-1} f_p} + \sum_{k=i+1}^{n} \frac{f_i}{\sum_{p=1}^{k-1} f_p} \\
&= 1 + \sum_{k=i+1}^{n} \frac{f_i}{\sum_{p=1}^{k-1} f_p}
\end{aligned} \tag{1}
$$
For the case of two edges created at each time step (i.e. m = 2), the expression for the average degree of the ith node is different. At time step t, the ith node arrives and creates two edges with the pre-existing i − 1 nodes, linking to a pre-existing node j < i with probability LP(i → j). There are two possibilities. The first is that the ith node chooses two different nodes i′ ≠ j′, where 1 ≤ i′, j′ ≤ i − 1, with probability LP(i → i′)LP(i → j′). The second is that it chooses the same node j, 1 ≤ j ≤ i − 1, twice with probability (LP(i → j))², in which case the two edges are counted as a single edge. After time t, new nodes arrive and create edges with node i; as in the case m = 1, the kth incoming node, k > i, creates an edge with the ith node with probability LP(k → i). Hence the average degree of the ith node in this case becomes
$$
\begin{aligned}
d_i &= \sum_{\substack{i',j'=1 \\ i' \neq j'}}^{i-1} 2\,LP(i \to i')\,LP(i \to j') + \sum_{j=1}^{i-1} \big(LP(i \to j)\big)^2 + \sum_{k=i+1}^{n} LP(k \to i) \\
&= 2 \sum_{i'=1}^{i-1} LP(i \to i') \sum_{\substack{j'=1 \\ j' \neq i'}}^{i-1} LP(i \to j') + \sum_{j=1}^{i-1} \big(LP(i \to j)\big)^2 + \sum_{k=i+1}^{n} LP(k \to i) \\
&= 2 \sum_{i'=1}^{i-1} LP(i \to i')\big(1 - LP(i \to i')\big) + \sum_{j=1}^{i-1} \big(LP(i \to j)\big)^2 + \sum_{k=i+1}^{n} LP(k \to i) \\
&= 2 \sum_{i'=1}^{i-1} LP(i \to i') - 2 \sum_{i'=1}^{i-1} \big(LP(i \to i')\big)^2 + \sum_{j=1}^{i-1} \big(LP(i \to j)\big)^2 + \sum_{k=i+1}^{n} LP(k \to i) \\
&= 2 - \sum_{j=1}^{i-1} \big(LP(i \to j)\big)^2 + \sum_{k=i+1}^{n} LP(k \to i) \\
&= 2 - \sum_{j=1}^{i-1} \left(\frac{f_j}{\sum_{p=1}^{i-1} f_p}\right)^2 + \sum_{k=i+1}^{n} \frac{f_i}{\sum_{p=1}^{k-1} f_p}
\end{aligned} \tag{2}
$$
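Equations 1 and 2 can be evaluated directly for a fixed list of fitness values, and Equation 1 can be checked against a Monte Carlo run of the m = 1 growth process. This is a sketch under our own assumptions: nodes are numbered 1..n as in the text, the fitness list is 0-based (f[0] holds f_1), node 1 starts alone so that node 2 must attach to it, and the helper names are hypothetical.

```python
import random


def incoming_term(f, i):
    """sum_{k=i+1}^{n} f_i / sum_{p=1}^{k-1} f_p; f holds f_1..f_n (0-based list), i is 1-based."""
    n = len(f)
    prefix, total = 0.0, 0.0
    for k in range(2, n + 1):
        prefix += f[k - 2]              # prefix = f_1 + ... + f_{k-1}
        if k > i:
            total += f[i - 1] / prefix
    return total


def expected_degree_eq1(f, i):
    """Equation 1 (m = 1), valid for i >= 2."""
    return 1.0 + incoming_term(f, i)


def expected_degree_eq2(f, i):
    """Equation 2 (m = 2): 2 - sum_j (f_j / S_{i-1})^2 + incoming term, valid for i >= 2."""
    s = sum(f[: i - 1])
    return 2.0 - sum((fj / s) ** 2 for fj in f[: i - 1]) + incoming_term(f, i)


def simulate_degree_m1(f, i, runs=5000, seed=0):
    """Monte Carlo mean degree of node i when each node k >= 2 adds one RWS edge."""
    rng = random.Random(seed)
    n = len(f)
    total = 0
    for _ in range(runs):
        for k in range(2, n + 1):
            # node k picks one target among nodes 1..k-1 with probability f_j / sum f
            target = rng.choices(range(1, k), weights=f[: k - 1])[0]
            if k == i or target == i:   # node i's own edge, or a later node linking to i
                total += 1
    return total / runs


f = [1.0, 2.0, 0.5, 3.0, 1.5, 1.0, 2.5, 0.8, 1.2, 2.0]
print(expected_degree_eq1(f, 3), simulate_degree_m1(f, 3))
print(expected_degree_eq2(f, 3))
```

For m = 1 Equation 1 is exact rather than approximate, because node i contributes exactly one outgoing edge and each later node k links to it with probability f_i / Σ_{p<k} f_p; the simulated mean should therefore agree with it up to sampling noise.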
For large enough i and sufficiently large n, the second term of Equation 2 can be made arbitrarily small. The expression for d_i in a network generated by the RWS scheme when m edges are added at each time step can be generalized in the same way as in the cases m = 1 and m = 2 above. The denominator of the last term in Equations 1 and 2 is $\sum_{p=1}^{k-1} f_p$, which can be approximated by $(k-1)\bar{f}$ for large i and sufficiently large n, where $\bar{f} = \frac{1}{n}\sum_{i=1}^{n} f_i$. The average degree of the ith node can then be written in general as
$$d_i \simeq m' + f_i \sum_{k=i+1}^{n} \frac{1}{(k-1)\bar{f}} \simeq m' + H f_i,$$
where m′ = m ± ε is a constant, ε is a small positive number, and $H = \sum_{k=i+1}^{n} \frac{1}{(k-1)\bar{f}}$ is a constant for given i and n.
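The replacement of $\sum_{p=1}^{k-1} f_p$ by $(k-1)\bar{f}$ can be checked numerically. The sketch below is ours: it draws log-normal fitness values with the standard-library `random.lognormvariate`, uses σ = 0.5, i = 1000 and n = 20000 as illustrative choices (not values from the paper), and compares the exact sum (with f_i factored out) to H.

```python
import random


def exact_incoming(f, i):
    """sum_{k=i+1}^{n} 1 / sum_{p=1}^{k-1} f_p (f_i factored out); i is 1-based."""
    prefix, total = sum(f[: i - 1]), 0.0
    for k in range(i + 1, len(f) + 1):
        prefix += f[k - 2]              # prefix = f_1 + ... + f_{k-1}
        total += 1.0 / prefix
    return total


def approx_H(f, i):
    """H = sum_{k=i+1}^{n} 1 / ((k - 1) * f_bar), with f_bar the mean fitness."""
    n = len(f)
    f_bar = sum(f) / n
    return sum(1.0 / ((k - 1) * f_bar) for k in range(i + 1, n + 1))


rng = random.Random(7)
f = [rng.lognormvariate(0.0, 0.5) for _ in range(20000)]
ratio = exact_incoming(f, 1000) / approx_H(f, 1000)
print(ratio)   # expected to be close to 1 when i and n are large
```

By the law of large numbers each prefix sum is close to (k − 1) times the mean fitness once k is large, which is why the ratio approaches 1; for small i the early prefix sums fluctuate and the approximation is poorer.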
We choose large values of i and n to obtain consistent statistics of the generated network. In the continuous case, the above relation becomes d = m′ + Hf, where f is drawn from the log-normal distribution with density ρ. For sufficiently large n, the distribution of the degree d for a given fitness value f is concentrated around m′ + Hf, so the variance in d is mostly due to the variance in f. The degree distribution P(d) can therefore be derived from the fitness distribution ρ(f) by a change of variables, and takes the form
$$P(d) = \frac{1}{H}\,\rho\!\left(\frac{d - m'}{H}\right) \tag{3}$$
5.4. Model Discussion and Implications
Equation 3 shows that the degree distribution of a network generated through the RWS scheme depends directly on the fitness distribution with density ρ. Similar derivations for SUS and SRS can be carried out by replacing the linking probability in Equations 1 and 2, as shown in Appendices A and B; the final approximate expression for the degree distribution is the same except for the constants H and m′. Hence, varying σ changes the fitness distribution ρ, which in turn changes the degree distribution P(d). An important consequence of Equation 3 is that if the fitness distribution ρ is scale-free, the resulting degree distribution P(d) is also scale-free, as pointed out in [11]. In our case ρ is the log-normal distribution, and it has been shown in [34] that for σ > 1 the log-normal distribution exhibits power-law behavior over a large portion of its range, as can readily be seen in a log-log plot of its complementary cumulative distribution function or its density function. We may therefore conclude that the networks generated by the RWS, SRS and SUS algorithms follow a power-law degree distribution for certain values of σ; this is demonstrated experimentally in the following sections. Unlike these non-ranking algorithms, the degree distribution of networks generated by the ranking-based algorithms (RankLinear, RankExp and RankUniform) does not depend linearly on the fitness distribution ρ, since the linking probability does not depend directly on the fitness values of the existing nodes. The mathematical expressions for the degree distributions in those cases are very complex, so no theoretical statement is made here for them. Note that the proposed network construction algorithm makes no use of information about the degrees of the pre-existing nodes.
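The claim that a log-normal density with σ > 1 looks like a power law over a wide range can be made concrete through its local slope on log-log axes, which equals −1 − (ln x − µ)/σ². The sketch below (our own helper names) evaluates Equation 3 with a log-normal ρ and estimates that slope by a central difference in log x. For σ = 3 the slope moves only from −1 to about −1.77 between x = 1 and x = 1000, whereas for σ = 0.5 it falls from −1 to about −28.6 over the same range.

```python
import math


def lognormal_pdf(x, mu=0.0, sigma=1.0):
    """Log-normal density rho(x) with parameters mu and sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


def degree_density(d, m_prime, H, mu=0.0, sigma=1.0):
    """Equation 3: P(d) = (1/H) * rho((d - m') / H)."""
    return lognormal_pdf((d - m_prime) / H, mu, sigma) / H


def loglog_slope(pdf, x, h=1e-3):
    """Slope of log pdf against log x, by a central difference in log x."""
    up, down = pdf(x * math.exp(h)), pdf(x * math.exp(-h))
    return (math.log(up) - math.log(down)) / (2 * h)


for sigma in (0.5, 3.0):
    slopes = [loglog_slope(lambda x: lognormal_pdf(x, 0.0, sigma), x) for x in (1, 10, 100, 1000)]
    print(sigma, [round(s, 2) for s in slopes])
```

For d much larger than m′, the shift in Equation 3 is negligible on log-log axes, so P(d) inherits the same slowly varying slope; this is only an illustration of the argument in [34], not a reproduction of it.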
Note also that the only assumption we make about the fitness values is that they are log-normally distributed. By varying the parameter σ of the initial log-normal fitness distribution, we observe that the six selection schemes fall into two groups, each producing similar types of networks. The first group consists of the RWS, SRS and SUS algorithms and their resulting networks. The degree distributions of these networks depend on the parameter σ of the fitness distribution. Experimentally, we have also observed that for certain values of σ this group produces power-law networks with exponent λ in the interval (2, 3), as seen in many real-world complex networks. For other values of σ, this group generates networks with an exponential degree distribution, which are also sometimes observed in real-world networks.
The second group consists of the RankLinear, RankExp and RankUniform algorithms and their resulting networks. The degree distribution of these networks does not depend directly on the parameter σ, and we have observed that this group produces networks with a power-law exponent λ in the interval (1, 2) for all σ. Several other structural and topological properties, described in Section 4, have also been computed for the newly generated networks and are discussed in detail in the following sections. We have found that algorithms belonging to the same group generate networks with similar structural and topological properties. More specifically, the dependence of the structural properties of the resulting networks on σ shows one type of behavior for the first group of algorithms (RWS, SRS, SUS) and another type for the second group (RankLinear, RankExp, RankUniform).
6. Simulation Results
In the simulations, the fitness Γ_i of each node i is drawn at random from a log-normal distribution with parameters µ = 0 and σ. The parameter σ is the most important one, since it controls the skewness of the distribution, as can be seen in the supplementary material (Figure 1). In our evaluation we fix µ = 0 and vary only σ. In each case, the network is grown from a small initial network of n0 = 4 nodes, each with two edges, and each new node arrives with two edges (m = 2). Although we have considered networks of various sizes n, the results shown correspond to n = 2000. The networks are visualized with the Kamada-Kawai layout algorithm, using the plot function of the R network package with the parameters vertex.size=0, vertex.label.dist=0.5, vertex.color="red", edge.arrow.size=0, vertex.label=NA. R code for generating networks with the proposed algorithms is available online²,³. As σ increases, the initial log-normal fitness distribution admits a wider range of values, which substantially changes the degree distribution of the networks produced by the first three algorithms (RWS, SRS, SUS), as shown in Figure 1. Table 2 gives the estimated power-law exponent of the degree distribution of the resulting networks. With these algorithms we are able to generate power-law networks for σ = 1.5, with exponents between 2.65 and 2.70 (Table 2), close to the values observed in real-world networks. For some other values of σ, these three algorithms also produce power-law networks with exponents between 1.5 and 3. For large values of σ, say 5 or more, these algorithms generate networks in which most nodes have few connections while very few nodes acquire a very large number of connections, so that a few nodes dominate the entire network in terms of connectivity.
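The authors' R implementation is linked in the footnotes; the following is only an independent Python sketch of the RWS growth process under stated assumptions. We assume the initial network of n0 = 4 nodes "each with two edges" is a 4-cycle, that each new node makes m independent roulette-wheel draws over the existing nodes, and that a node drawn twice receives a single edge (as in the m = 2 derivation of Section 5.3.2). The function `grow_rws_network` and the adjacency-set representation are hypothetical.

```python
import random


def grow_rws_network(n, sigma, m=2, mu=0.0, seed=None):
    """Grow an n-node network with RWS attachment and log-normal fitness (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumed seed graph: n0 = 4 nodes, each with two edges, realised as a 4-cycle.
    adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
    fitness = [rng.lognormvariate(mu, sigma) for _ in range(4)]
    for new in range(4, n):
        # m independent roulette-wheel draws over the existing nodes, proportional to fitness
        picks = rng.choices(range(new), weights=fitness, k=m)
        adj[new] = set()
        for t in set(picks):            # a node drawn twice still gets a single edge
            adj[new].add(t)
            adj[t].add(new)
        fitness.append(rng.lognormvariate(mu, sigma))
    return adj, fitness


adj, fitness = grow_rws_network(2000, sigma=1.5, seed=1)
degrees = sorted((len(nbrs) for nbrs in adj.values()), reverse=True)
print(degrees[:5])   # the five largest degrees
```

Swapping the `rng.choices` line for a SUS, SRS or rank-based selection rule would give the other five schemes; the adjacency dictionary returned here is the input format assumed by the measurement sketches in the following subsections.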
The degree distributions in Figure 1 and the exponents in Table 2 indicate that the first three algorithms (RWS, SRS, SUS) are able to capture the winner-takes-all outcome and the other characteristics of real-world complex networks already reported in the literature [1, 27]. For the other three algorithms (RankLinear, RankExp, RankUniform), no dramatic change in the degree distribution of the resulting network is observed when σ is varied, as can be seen in the supplementary material (Figure 2). In these cases we are also able to generate networks with a power-law exponent between 1 and 2, as given in Table 2. For different values of σ, similar types of networks are generated, in which no nodes dominate in terms of connectivity. The ranking-based algorithms thus produce structurally similar networks and keep all the structural measures above nearly constant as σ varies. Several inherently emergent structural and topological properties of the proposed networks, discussed in Section 4, are described in detail in the following subsections. All structural measures have been computed over 100 networks with 2000 vertices.
² https://swarupchatt.shinyapps.io/swarup_r/
³ https://www.isical.ac.in/~swarup_r/
6.1. Average Path Length and Diameter
Figure 2 plots the average path length (APL) and diameter of the resulting networks against the standard deviation σ of the initial log-normal fitness distribution. In the case of RWS, SRS and SUS,
[Figure 1 panels: for each of RWS, SRS and SUS, the generated network and its degree distribution (Count vs. Node Degree, log-log axes)]
Figure 1: Resulting networks and their corresponding degree distributions generated by the RWS, SRS and SUS algorithms for varying σ: (a) σ = 0.01 (b) σ = 0.1 (c) σ = 1.5 (d) σ = 5.
σ       RWS     SRS     SUS     RankLinear  RankExp.  RankUniform
0.001   2.2824  2.2827  2.2796  1.6433      1.6483    1.6489
0.01    2.2854  2.2826  2.2866  1.6422      1.6496    1.6486
0.1     2.2888  2.2833  2.2775  1.6423      1.6478    1.6477
1       2.4329  2.4634  2.4583  1.6425      1.6465    1.6478
1.5     2.6667  2.6538  2.6917  1.6419      1.6481    1.6489
2       2.9255  3.0666  2.8275  1.6422      1.6488    1.7471
5       4.7212  5.3292  4.8168  1.6429      1.6487    1.6467
8       5.8445  6.7412  5.0123  1.6419      1.6498    1.6485
Table 2: Estimated power-law exponent (λ̂) with varying σ for networks with 2000 vertices using six different algorithms, taking x_min = 2.
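Table 2 reports exponents estimated with x_min = 2, but this section does not state which estimator was used. As an assumption, the sketch below uses the approximate discrete maximum-likelihood estimator of Clauset, Shalizi and Newman, λ̂ ≈ 1 + N [Σ ln(x_i / (x_min − 1/2))]⁻¹ over the N degrees x_i ≥ x_min; values from this sketch need not match Table 2 exactly. The function name is hypothetical.

```python
import math


def powerlaw_exponent(degrees, xmin=2):
    """Approximate discrete MLE: 1 + N / sum(ln(x / (xmin - 0.5))) over degrees x >= xmin."""
    tail = [x for x in degrees if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    return 1.0 + len(tail) / sum(math.log(x / (xmin - 0.5)) for x in tail)


print(powerlaw_exponent([1, 2, 4]))   # only the degrees 2 and 4 enter the estimate
```

Applied to the degree sequence of a generated network (for example `[len(nbrs) for nbrs in adj.values()]`), this gives a quick exponent estimate; Clauset et al. note that the discrete approximation becomes more accurate as x_min grows, so an exact discrete fit may be preferable at x_min = 2.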
[Figure 2 panels: (a) APL and (b) Diameter plotted against SD (σ = 0.001, 0.01, 0.1, 1, 1.5, 2, 5, 8) for RankExp., RankLinear, RankUniform, RWS, SRS and SUS]
Figure 2: Plots of the following structural measures with varying σ for 100 networks with 2000 vertices: (a) Average Path Length (b) Diameter
as σ increases, both APL and diameter slowly decrease and stabilize at a fixed value. On the other hand, the average path length and diameter of the networks generated by the ranking-based algorithms (RankLinear, RankExp, RankUniform) remain constant.
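APL and diameter of an unweighted network follow from one breadth-first search per vertex. The sketch below assumes the adjacency-set representation (a dict mapping each vertex to the set of its neighbours) and averages over reachable ordered pairs only; the networks grown here are connected, so this is the usual APL. The helper names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def apl_and_diameter(adj):
    """Average shortest-path length over reachable ordered pairs, and the largest such distance."""
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return total / pairs, diameter


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(apl_and_diameter(path))  # (1.6666666666666667, 3)
```

One BFS per vertex costs O(n·e) in total, which is manageable for 100 networks of 2000 vertices with about two edges per node.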
6.2. Average Clustering Coefficient and Number of Triangles
In Figure 3(a), we notice that for the RWS, SRS and SUS networks the average clustering coefficient (ACC) increases with σ once σ exceeds 1; for σ between 0 and 1, ACC is almost zero. This finding is consistent with the observed number of triangles as σ varies, shown in Figure 3(b). The clustering coefficient (CC) measures the number of triangles in a network, normalized by the number of triangles that could possibly exist, so a network with many triangles has a high clustering coefficient. From Figure 3(b) we conclude that very few triangles are present in the RWS and SUS networks when σ lies between 0 and 1, whereas for σ greater than 1 many triangles are observed, yielding a higher clustering coefficient. In the case of SRS, when σ lies between 0 and 2, ACC is 0 and no triangles are present in the resulting network; as σ increases beyond 2, ACC also increases because of the presence of a large number of triangles. From Figures 3(a)-3(b) we can also conclude that ACC is almost zero and very few triangles are present, for all values of σ, in the RankLinear, RankExp and RankUniform networks. This uniform behavior of ACC and of the number of triangles in the ranking-based networks arises because these schemes give more or less equal priority to every node, independently of its fitness value, during network generation.
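Both quantities in Figure 3 come from counting, for each vertex, how many pairs of its neighbours are themselves linked. The sketch below assumes the adjacency-set representation and the common convention (our assumption, since the section does not state it) that vertices of degree below 2 contribute a local clustering of 0 to the average.

```python
from itertools import combinations


def triangles_per_node(adj):
    """Number of triangles through each vertex: linked pairs among its neighbours."""
    return {u: sum(1 for v, w in combinations(adj[u], 2) if w in adj[v]) for u in adj}


def average_clustering(adj):
    """Mean local clustering 2 T_u / (k_u (k_u - 1)); vertices with degree < 2 count as 0."""
    tri = triangles_per_node(adj)
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            total += 2 * tri[u] / (k * (k - 1))
    return total / len(adj)


def triangle_count(adj):
    """Each triangle is seen once from each of its three corners."""
    return sum(triangles_per_node(adj).values()) // 3


# A triangle 0-1-2 with a pendant vertex 3 attached to 0.
g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(triangle_count(g), average_clustering(g))
```

In this toy graph the single triangle gives vertices 1 and 2 a clustering of 1 and vertex 0 a clustering of 1/3, so the average is 7/12.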
[Figure 3 panels: (a) ACC and (b) No. of triangles plotted against SD (σ = 0.001 to 8) for RankExp., RankLinear, RankUniform, RWS, SRS and SUS]
Figure 3: Plots of the following structural measures with varying σ for 100 networks with 2000 vertices: (a) Average Clustering Coefficient (b) No. of Triangles
6.3. Central Point Dominance (CPD) and Central Edge Dominance (CED)
[Figure 4 panels: (a) CPD and (b) CED plotted against SD (σ = 0.001 to 8) for RankExp., RankLinear, RankUniform, RWS, SRS and SUS]
Figure 4: Plots of the following structural measures with varying σ for 100 networks with 2000 vertices: (a) Central Point Dominance (CPD) (b) Central Edge Dominance (CED)
Central point dominance (CPD) is a measure of the relative betweenness of the most central vertex in a network. It measures the dominance of a single vertex in controlling communication within the network, and can be used to quantify how concentrated the network layout is around its central point. Figure 4(a) shows that CPD is an increasing function of σ for the RWS, SRS and SUS networks. Hence, in these cases, as σ increases a few vertices play an increasingly central role in maintaining the stability of the network, and removing those vertices may easily split the network into several components. For the ranking-based algorithms (RankLinear, RankExp, RankUniform), CPD values remain small and constant for all σ. Therefore, in these cases most nodes play similar roles in maintaining the stability of the network, and it is hard to split the network by removing only a few of them. Central edge dominance (CED) is a measure of the relative betweenness centrality of the most central edge in a network; it likewise measures the dominance of a single edge in controlling communication within the network. Figure 4(b) shows that for the SRS and SUS networks CED is an increasing function of σ. Hence, in these cases a few edges play a highly central role in maintaining the stability of the network for large values of σ, and removing those edges may split the network into
[Figure 5 panels: Average_knn plotted against Degree for RWS, SRS and SUS]
Figure 5: Distributions of the average degree of the nearest neighbors (d_nn(d)) for the RWS, SRS and SUS algorithms with varying σ, over 100 networks with 2000 vertices: (a) σ = 0.01 (b) σ = 0.1 (c) σ = 1 (d) σ = 1.5 (e) σ = 5 (f) σ = 8.
several components. In the case of RWS, the CED value first decreases as σ increases but eventually starts increasing for large values of σ. For the ranking-based networks, CED values remain small and constant for all σ; in these cases, most edges play similar roles in maintaining network integrity.
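CPD needs vertex betweenness, which Brandes' algorithm computes with one BFS and one back-propagation pass per source. The sketch below assumes the adjacency-set representation, normalizes betweenness by the (n − 1)(n − 2)/2 vertex pairs not involving the vertex, and uses Freeman's definition CPD = Σ_v (B_max − B_v)/(n − 1), which is 1 for a star and 0 when all vertices are equivalent. CED would follow the same pattern with edge betweenness, but since its normalization is not given here it is not implemented.

```python
from collections import deque


def vertex_betweenness(adj):
    """Normalized vertex betweenness via Brandes' algorithm (unweighted, undirected graph)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:                          # BFS counting shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                          # back-propagate pair dependencies
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each unordered pair was counted from both endpoints: divide by 2 * C(n - 1, 2).
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: b * scale for v, b in bc.items()}


def central_point_dominance(adj):
    """Freeman's CPD: sum over vertices of (B_max - B_v), divided by n - 1."""
    bc = vertex_betweenness(adj)
    b_max = max(bc.values())
    return sum(b_max - b for b in bc.values()) / (len(adj) - 1)


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(central_point_dominance(star))
```

A star is the extreme case described in the text: removing its centre disconnects every other vertex, and its CPD is 1.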
6.4. Average Degree of the Nearest Neighbors (d_nn(d))
Assortative mixing plays an important role in the structural properties of a network, and it can be characterized through the average degree of the nearest neighbours. For example, assortative mixing of a network by a discrete characteristic tends to break the network up into separate communities. To measure the assortativity of the different networks, we analyzed the average degree of the nearest neighbors of vertices with degree d, d_nn(d), as a function of d over 100 networks, each with 2000 vertices. Figures 5(a) and 5(b) show that the RWS, SRS and SUS networks exhibit positive assortativity for small values of σ (e.g. σ = 0.01, 0.1), i.e., nodes with similar degrees tend to connect to each other, whereas for larger values of σ (e.g. σ = 1, 1.5, 5, 8) disassortative mixing is observed, as shown in Figures 5(c)-5(f). On the other hand, Figures 6(a)-6(f) show that for the ranking-based networks d_nn(d) is an increasing function of d for all values of σ, i.e., these networks are assortative.
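The curve d_nn(d) is obtained by averaging, for each degree d, the mean neighbour degree of all vertices of degree d. The sketch below assumes the adjacency-set representation and a hypothetical function name; an increasing curve signals assortative mixing and a decreasing one disassortative mixing.

```python
from collections import defaultdict


def average_neighbor_degree_by_degree(adj):
    """d_nn(d): mean, over vertices of degree d, of the average degree of their neighbours."""
    sums, counts = defaultdict(float), defaultdict(int)
    for u, nbrs in adj.items():
        d = len(nbrs)
        if d == 0:
            continue
        sums[d] += sum(len(adj[v]) for v in nbrs) / d
        counts[d] += 1
    return {d: sums[d] / counts[d] for d in sorted(sums)}


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(average_neighbor_degree_by_degree(star))  # {1: 4.0, 4: 1.0}
```

The star is maximally disassortative: low-degree leaves see a high-degree neighbour and the hub sees only leaves, so d_nn(d) decreases with d, the pattern reported for RWS, SRS and SUS at larger σ.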
6.5. Rich Club Coefficient (RCC)
The rich-club coefficient (RCC) is, for each degree k, the ratio of the number of edges actually present among the nodes with degree greater than k to the number of edges that could exist among them. Networks with a relatively high rich-club coefficient are characterized by nodes of higher degree being more interconnected than nodes of lower degree: the rich nodes, typically a small number of nodes with many links, are very well connected to each other. The relevance of the rich-club phenomenon is that its presence or absence typically reveals important high-level semantic aspects of a complex network [47]. In Figure 7, we notice that the networks generated by RWS, SRS and SUS exhibit the rich-club phenomenon for small σ (e.g. σ = 0.001, 0.01), but as σ increases (e.g. σ = 1, 1.5, 5, 8), a non-uniform rich-club effect can be
[Figure 6 panels: Average_knn plotted against Degree for RankExp., RankLinear and RankUniform]
Figure 6: Distributions of the average degree of the nearest neighbors (d_nn(d)) for the ranking-based algorithms RankLinear, RankExp. and RankUniform with varying σ, over 100 networks with 2000 vertices: (a) σ = 0.01 (b) σ = 0.1 (c) σ = 1 (d) σ = 1.5 (e) σ = 5 (f) σ = 8.
[Figure 7 panels: RCC plotted against Degree for RWS, SRS and SUS]
Figure 7: Distributions of the rich-club coefficient (φ(d)) for the RWS, SRS and SUS algorithms with varying σ, over 100 networks with 2000 vertices: (a) σ = 0.01 (b) σ = 0.1 (c) σ = 1 (d) σ = 1.5 (e) σ = 5 (f) σ = 8.
[Figure 8 panels: RCC plotted against Degree for RankExp., RankLinear and RankUniform]
Figure 8: Distributions of the rich-club coefficient (φ(d)) for the ranking-based algorithms RankLinear, RankExp. and RankUniform with varying σ, over 100 networks with 2000 vertices: (a) σ = 0.01 (b) σ = 0.1 (c) σ = 1 (d) σ = 1.5 (e) σ = 5 (f) σ = 8.
seen in the resulting networks. Likewise, the networks generated through the ranking based algorithms (RankLinear, RankExp. and RankUniform) exhibit a uniform rich-club effect for all σ, as depicted in Figure 8.

6.6. Global Efficiency

Table 3: Measures of global efficiency with varying σ for 100 networks with 2000 vertices using six different algorithms

| σ | RWS | SRS | SUS | RankLinear | RankExp. | RankUniform |
|---|---|---|---|---|---|---|
| 0.001 | 0.192922 | 0.192156 | 0.19787 | 0.257217 | 0.249464 | 0.265985 |
| 0.01 | 0.195927 | 0.193476 | 0.198959 | 0.244993 | 0.236716 | 0.270142 |
| 0.1 | 0.198982 | 0.197435 | 0.199516 | 0.264924 | 0.266291 | 0.247479 |
| 1 | 0.215891 | 0.219574 | 0.219497 | 0.252778 | 0.266515 | 0.244698 |
| 1.5 | 0.259676 | 0.259283 | 0.267896 | 0.254926 | 0.239283 | 0.249569 |
| 2 | 0.280497 | 0.314286 | 0.309345 | 0.268441 | 0.265886 | 0.221782 |
| 5 | 0.358916 | 0.402452 | 0.386597 | 0.242838 | 0.243068 | 0.254465 |
| 8 | 0.451548 | 0.463512 | 0.419578 | 0.269256 | 0.256283 | 0.262439 |
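Global efficiency is commonly defined (following Latora and Marchiori) as the mean of the inverse shortest-path lengths over all ordered pairs of distinct nodes, E_glob = (1/(N(N − 1))) Σ_{i≠j} 1/d_ij, where disconnected pairs contribute 0. The exact definition used for Table 3 is given in Section 4; the minimal sketch below assumes this standard form for an unweighted, undirected network stored as an adjacency list, and the function name `global_efficiency` is illustrative.

```python
from collections import deque


def global_efficiency(adj):
    """Mean of 1/d(i, j) over all ordered pairs of distinct nodes.

    adj maps every node to a list of its neighbours (undirected, unweighted).
    Pairs with no connecting path contribute 0 (i.e. 1/infinity).
    """
    nodes = list(adj)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for source in nodes:
        # Breadth-first search yields hop distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total / (n * (n - 1))


# A path 0-1-2: distances 1, 2 and 1, counted in both directions,
# so the efficiency is (1 + 1/2 + 1) * 2 / 6 = 5/6.
path = {0: [1], 1: [0, 2], 2: [1]}
print(global_efficiency(path))
```

Each breadth-first search costs O(N + E), so the whole computation is O(N(N + E)), which is manageable for networks with 2000 vertices.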
The efficiency of a network measures how efficiently information can be exchanged over it, and it can be evaluated at both local and global scales. Global efficiency quantifies the exchange of information across the whole network, whereas local efficiency quantifies the exchange of information within a node's neighborhood. Using efficiency, one can show that small-world networks are both globally and locally efficient. Table 3 gives the global efficiency, for varying σ, of the 100 networks with 2000 vertices generated through each of our six algorithms. As σ increases, the global efficiency of the networks generated through RWS, SRS and SUS grows steadily, from about 0.19–0.20 at σ = 0.001 to about 0.42–0.46 at σ = 8, whereas for the ranking based algorithms it stays roughly constant, between about 0.22 and 0.27. Consequently, for σ ≥ 1.5 the networks generated through RWS, SRS and SUS are more efficient than those generated through the ranking based algorithms.

7. A note on using other selection schemes and fitness distributions

Although it is possible to generate various networks using various combinations of selection schemes and fitness functions, not all such combinations lead to power-law networks.
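To make the role of the fitness distribution concrete, here is a minimal sketch of fitness-based attachment with roulette-wheel selection: each new node links to m distinct existing nodes, each spin choosing node i with probability proportional to its fitness alone, independently of degree. Several details are assumptions of this sketch rather than statements of the text: the n0 seed nodes form a complete graph, duplicate spins are re-drawn so the m targets are distinct, σ = 2 is merely an illustrative value, and the name `grow_rws` and its parameters are hypothetical. It is not the authors' implementation; it only contrasts a heavy-tailed (log-normal) fitness with a light-tailed (uniform) one through the maximum degree reached.

```python
import random
from itertools import accumulate


def grow_rws(n, m, n0, draw_fitness, seed=None):
    """Grow an n-node network by roulette-wheel (fitness-proportional) attachment.

    draw_fitness(rng) must return a positive fitness value for one node.
    Assumptions: the n0 seed nodes form a complete graph, and each new node
    links to m distinct existing nodes (duplicate spins are re-drawn).
    Returns (edges, degree).
    """
    if not 1 <= m <= n0 <= n:
        raise ValueError("need 1 <= m <= n0 <= n")
    rng = random.Random(seed)
    fitness = [draw_fitness(rng) for _ in range(n0)]
    edges = [(i, j) for i in range(n0) for j in range(i + 1, n0)]
    for new in range(n0, n):
        # The wheel: cumulative fitness of the existing nodes 0..new-1.
        wheel = list(accumulate(fitness))
        targets = set()
        while len(targets) < m:
            # One spin hits node i with probability fitness[i] / sum(fitness);
            # degree plays no role in the choice.
            targets.add(rng.choices(range(new), cum_weights=wheel)[0])
        edges.extend((t, new) for t in sorted(targets))
        fitness.append(draw_fitness(rng))
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return edges, degree


# Heavy-tailed (log-normal, sigma = 2) versus light-tailed (uniform) fitness.
lognormal = lambda rng: rng.lognormvariate(0.0, 2.0)
uniform = lambda rng: rng.uniform(0.0, 1.0)
_, deg_lognormal = grow_rws(2000, 2, 4, lognormal, seed=1)
_, deg_uniform = grow_rws(2000, 2, 4, uniform, seed=1)
print("max degree, log-normal fitness:", max(deg_lognormal))
print("max degree, uniform fitness:   ", max(deg_uniform))
```

With log-normal fitness a few early, very fit nodes collect a large share of the links, while uniform fitness keeps every node's attractiveness within a bounded range, so the maximum degree stays small. Plotting the full degree histograms on log-log axes would be the natural next check.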
Figure 9: Log-log plot of the degree distribution (Count versus Node Degree) of the networks generated through the (a) RWS, (b) SRS and (c) SUS algorithms with a Pareto fitness distribution. All of these degree distributions are heavy-tailed, a consequence of the heavy-tailed nature of the Pareto distribution. All results were computed over generated networks with 2000 vertices, m = 2 and n0 = 4.

Figure 10: Log-log plot of the degree distribution (Count versus Node Degree) of the networks generated through the (a) RWS, (b) SRS and (c) SUS algorithms with a Uniform fitness distribution. None of these degree distributions is heavy-tailed, because the Uniform distribution is not heavy-tailed. All results were computed over generated networks with 2000 vertices, m = 2 and n0 = 4.
Only particular choices of selection schemes and fitness functions produce power-law networks. For example, node degree could be considered as a measure of fitness; however, using node degree as the fitness function is only a special case of the proposed framework, which treats fitness as an abstract, generalized quantity rather than taking degree as its only instance. The log-normal fitness distribution has been chosen for its theoretical justification, and it has also been used by Ghadge et al. [19]. However, one may choose a fitness distribution other than the log-normal. Through various experiments with other distributions, we have found that heavy-tailed distributions are able to generate power-law networks with structural characteristics similar to real world networks, whereas distributions that are not heavy-tailed fail to produce power-law networks. Figures 9 and 10 show the simulation results corresponding to Pareto and Uniform fitness functions; more results can be found in the Supplementary Results.

In our proposed framework, the network generation process depends on how the fitness (importance) values of the nodes are used: the higher the fitness value of a node, the greater the propensity of other nodes to create an edge with that node. This philosophy is inherent in the selection mechanisms of Genetic Algorithms, which is why we use them. However, one may choose selection mechanisms other than the ones proposed. In particular, with a selection algorithm in which the top-m existing nodes with the highest fitness scores are first selected, and a few of these are then selected to connect with the incoming node, the final generated network becomes biased towards the top-m nodes and shows winner-takes-all behaviour. Experimentally, we have found that this top-m selection scheme fails to produce power-law networks and also fails to reproduce real world structural characteristics. Figure 11 shows the results for the top-m selection scheme with Pareto, Poisson and log-normal fitness functions; more results can be found in the Supplementary Results. Thus, once again, a proper choice of selection scheme and fitness function is required to produce power-law networks which possess global topologies similar to real world networks.

Figure 11: Log-log plot of the degree distribution (Count versus Node Degree) of the networks generated using the top-m selection scheme with (a) Pareto, (b) Poisson and (c) Log-normal fitness distributions. The top-m selection scheme fails to produce a power-law degree distribution for any of these fitness functions. All results were computed over generated networks with 2000 vertices, m = 2 and n0 = 4.

8. Comparison with Real World Networks
8.1. Data Description and Parameter Setting

To compare the performance of our proposed network models with existing baseline and state-of-the-art models, we conducted experiments on real world datasets. We used five real world collaboration networks from the SNAP library [29]: Arxiv ASTRO-PH (Astro Physics, ca-AstroPh, N = 18,772, E = 198,110), Arxiv COND-MAT (Condensed Matter Physics, ca-CondMat, N = 23,133, E = 93,497), Arxiv GR-QC (General Relativity and Quantum Cosmology, ca-GrQc, N = 5,242, E = 14,496), Arxiv HEP-TH (High Energy Physics - Theory, ca-HepTh, N = 9,877, E = 25,998) and DBLP (DBLP co-authorship network, N = 317,080, E = 1,049,866), where N is the number of nodes and E is the number of edges in the network. We consider the Kronecker graph model [30] as a baseline model, and the LNFA model [19] along with the MAG model [25] as state-of-the-art models for the purpose of comparison. Because the number of nodes of a Kronecker graph is a power of the size of its initiator matrix, we down-sampled each network through Random Jump Sampling (RJS) [28], keeping N = 4,096 (2¹²) nodes for each network throughout the comparison. When applying Random Jump Sampling, we used the standard jump probability p = 0.15 (the probability of jumping to a random node of the network at each step) for each of the original networks except DBLP. For DBLP, we used p = 0.01 so as to preserve a sufficient number of edges while down-sampling hundreds of thousands of nodes to a few thousand. The resulting numbers of edges are E = 20,838, 12,714, 12,343, 11,394 and 11,166 for ca-AstroPh, ca-CondMat, ca-GrQc, ca-HepTh and DBLP, respectively. On these sub-sampled real world networks, we applied the MagFit and KronFit algorithms to estimate the MAG and Kron model parameters, and then used these parameters to generate synthetic MAG and Kron networks.
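Random Jump Sampling, as described above, can be sketched as a random walk that at each step jumps to a uniformly random node with probability p and otherwise moves to a random neighbour, until the target number of distinct nodes has been visited. Taking the sample to be the subgraph induced on the visited nodes, forcing a jump at dead ends, and the name `random_jump_sample` are assumptions of this minimal sketch, not details stated in the text.

```python
import random


def random_jump_sample(adj, target_nodes, p=0.15, seed=None):
    """Random walk with random jumps; returns the subgraph induced on visited nodes.

    adj: dict mapping each node to a list of its neighbours (undirected graph).
    target_nodes: number of distinct nodes to collect (at most len(adj)).
    p: probability of jumping to a uniformly random node at each step; p > 0
       guarantees the walk eventually reaches every node, so the loop ends.
    """
    if not 0 < target_nodes <= len(adj):
        raise ValueError("target_nodes must be in 1..len(adj)")
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    nodes = list(adj)
    current = rng.choice(nodes)
    visited = {current}
    while len(visited) < target_nodes:
        neighbours = adj[current]
        if not neighbours or rng.random() < p:
            current = rng.choice(nodes)       # random jump (also escapes dead ends)
        else:
            current = rng.choice(neighbours)  # ordinary random-walk step
        visited.add(current)
    # Induced subgraph: keep original edges whose endpoints were both visited.
    return {u: [v for v in adj[u] if v in visited] for u in visited}


# Usage on a ring of 100 nodes; the setting in the text would instead use
# target_nodes=4096 with p=0.15 (p=0.01 for DBLP).
ring = {i: [(i - 1) % 100, (i + 1) % 100] for i in range(100)}
sample = random_jump_sample(ring, 20, p=0.15, seed=3)
print(len(sample), sum(len(vs) for vs in sample.values()) // 2)
```

A smaller p keeps the walk local for longer, so more of the visited nodes are adjacent and more edges survive in the induced subgraph, which is consistent with the reason given above for using p = 0.01 on DBLP.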
On the other hand, the proposed network generation models (RWS, SRS, SUS, RankLinear, RankExp. and RankUniform) are associated with the parameters σ, n0 and m, of which σ is the most important parameter for the network generation process. We then compared the proposed network models, quantitatively and graphically, with both the baseline and the state-of-the-art models. Note that we used σ = 2.7, σ = 1.9, σ = 2.8, σ = 2.1 and σ = 2.3 for ca-AstroPh, ca-CondMat, ca-GrQc, ca-HepTh and DBLP, respectively, as we observed these to be the optimal choices for these networks. The parameter n0 is kept fixed at 5 and m takes the value 2, 3 or 4, which are also the optimal choices for these networks. We also applied the LNFA model with the same optimal σ values to generate the corresponding LNFA networks. Finding an explicit expression for the parameters σ and m corresponding to a given network is less straightforward and is left for future work. In the following subsection, we describe the statistical measures used to compare the proposed models quantitatively with the baseline and state-of-the-art models on the basis of the structural properties of the network.

8.2. Evaluation Criteria

In [25], the authors used several important network structural properties, such as the degree distribution (DD), average clustering coefficient (ACC), singular values (SValue) and triad participation (TP), together with modified versions of the Kolmogorov-Smirnov (KS) statistic and the L2 distance, to measure the level of agreement between synthetic and real networks. To measure the agreement between the proposed networks (RWS, SRS, SUS, RankLinear, RankExp., RankUniform), the Kron network, the LNFA network, the MAG network and the real network, we likewise used a modified version of the KS statistic and of the L2 distance, computed over the distributions of the structural properties DD, ACC, SValue and TP described in Section 4. The modified KS statistic and L2 distance are preferable in this context because the distributions of these structural properties are heavy-tailed in real world networks. For the same reason, we plot the distributions as complementary cumulative distribution functions (CCDF), P(X > x). For the mathematical details of the modified KS and L2 statistics, please refer to the supplementary materials.

8.3. Analysis of Comparison Results

8.3.1. ca-AstroPh
Figure 12: The structural properties (ACC, DD, TP, SValue) recovered by the Kronecker graph, LNFA, MAG, RWS, SRS, SUS, RankLinear, RankExp. and RankUniform models over the ca-AstroPh network. For each proposed model, the sub-figures show ACC (Average Clustering Coefficient (CCDF) versus Degree), DD (No of Nodes (CCDF) versus Node Degree), TP (No of Participating Nodes (CCDF) versus No of Triads) and SValue (Singular Value versus Rank). Within each sub-figure, the dotted yellow, green, violet, blue and red curves display the output patterns generated through the original network, the Kron model, the LNFA model, the MAG model and the proposed model (RWS, SRS, SUS, RankLinear, RankExp. or RankUniform), respectively.
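The comparisons in Figures 12 and 13 and in Table 4 rest on distances between CCDFs. The modified KS and L2 statistics themselves are defined in the supplementary materials and are not reproduced here; the sketch below only computes, for two samples (for example the degree sequences of a real and a synthetic network), the empirical CCDFs P(X > x) on their combined support, and returns the largest absolute gap (KS) and the square root of the summed squared gaps (L2). The optional `log_scale` flag, which compares log CCDFs where both are positive, is an assumption loosely in the spirit of the log-scale comparisons of [25]. Note that the plain KS distance cannot exceed 1 whereas several entries of Table 4 do, so the plain values produced here are not directly comparable with Table 4. The function names are illustrative.

```python
import math
from bisect import bisect_right


def ccdf(sample, xs):
    """Empirical CCDF P(X > x) of `sample`, evaluated at each x in `xs`."""
    s = sorted(sample)
    n = len(s)
    return [(n - bisect_right(s, x)) / n for x in xs]


def ks_l2(sample_a, sample_b, log_scale=False):
    """Return (KS, L2) distances between the empirical CCDFs of two samples.

    KS is the largest absolute gap; L2 is the square root of the summed
    squared gaps over the combined support. With log_scale=True the gaps are
    taken between log CCDFs, only where both CCDFs are positive (an assumption).
    """
    xs = sorted(set(sample_a) | set(sample_b))
    pairs = zip(ccdf(sample_a, xs), ccdf(sample_b, xs))
    if log_scale:
        gaps = [abs(math.log(a) - math.log(b)) for a, b in pairs if a > 0 and b > 0]
    else:
        gaps = [abs(a - b) for a, b in pairs]
    ks = max(gaps, default=0.0)
    l2 = math.sqrt(sum(g * g for g in gaps))
    return ks, l2


# Hypothetical degree sequences of a "real" and a "synthetic" network.
real_degrees = [1, 2, 3, 4]
synthetic_degrees = [1, 1, 2, 4]
print(ks_l2(real_degrees, synthetic_degrees))
print(ks_l2(real_degrees, synthetic_degrees, log_scale=True))
```

On the log scale, gaps in the tail, where CCDF values are small, weigh much more than on the linear scale, which is why log-based comparisons are often preferred for heavy-tailed distributions.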
8.3.2. DBLP
Figure 13: The structural properties (ACC, DD, TP, SValue) recovered by the Kronecker graph, LNFA, MAG, RWS, SRS, SUS, RankLinear, RankExp. and RankUniform models over the DBLP network. Within each sub-figure, the dotted yellow, green, violet, blue and red curves display the output patterns generated through the original network, the Kron model, the LNFA model, the MAG model and the proposed model (RWS, SRS, SUS, RankLinear, RankExp. or RankUniform), respectively.

Table 4 shows the KS and L2 statistics for each of the four important structural properties (ACC, DD, TP, SValue) over each real world network; the corresponding distributions are plotted in Figures 12 and 13. The figures for the ca-CondMat, ca-GrQc and ca-HepTh networks are given in the supplementary materials (Figures 13–15 there). Closer inspection of Table 4 reveals that the results are consistent with the plots. From Table 4, the proposed non-ranking based models (RWS, SRS and SUS) outperform the baseline and the state-of-the-art models in reproducing the important structural properties, showing better average performance than both the baseline (Kron) and the state-of-the-art models (LNFA and MAG) on each of the real world networks considered. The proposed ranking based models (RankLinear, RankExp. and RankUniform) reproduce most of the network structural properties with a performance superior to the baseline for most of the real world networks, and sometimes better than the state-of-the-art models, as Table 4 shows. This finding is also consistent with the plots in Figures 12 and 13 and in Figures 13–15 of the supplementary materials. In the case of the ca-HepTh network, both the ranking and the non-ranking based models show better average performance than the baseline and the state-of-the-art models. In the case of the DBLP and ca-CondMat networks, on the other hand, a few of the ranking based models outperform the baseline and the state-of-the-art models with respect to the KS statistic only.

Table 5 provides the values of a few other structural properties (average degree (AD), average path length (APL), central point dominance (CPD), central edge dominance (CED), estimated power-law exponent λ̂ and global efficiency (GE)), as described in Section 4, for each of the real world networks considered and for the synthetic networks generated by fitting the MAG, LNFA, Kron and proposed models (RWS, SRS, SUS, RankLinear, RankExp. and RankUniform). The values of these structural properties recovered by the proposed models (both ranking and non-ranking based) are mostly close to the values produced by the baseline (Kron) and state-of-the-art (MAG and LNFA) models. Note that each of the proposed models outperforms the baseline and state-of-the-art models on some networks with respect to a few of these structural properties. For example, in the case of the DBLP network, the AD, CED and λ̂ values are best recovered by the proposed models.

Table 4: KS and L2 statistics of the Kron, LNFA, MAG, RWS, SRS, SUS, RankLinear, RankExp. and RankUniform models over the five real-world networks considered.

| Network | Statistic | Property | Kron | LNFA | MAG | RWS | SRS | SUS | RankLinear | RankExp. | RankUniform |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ca-AstroPh | KS | ACC | 5.05 | 3.96 | 5.51 | 2.71 | 2.62 | 2.63 | 6.72 | 6.57 | 5.88 |
| ca-AstroPh | KS | DD | 4.95 | 5.25 | 5.26 | 4.86 | 5.21 | 5.21 | 8.25 | 8.26 | 8.25 |
| ca-AstroPh | KS | TP | 7.15 | 7.1 | 7.11 | 6.98 | 7.09 | 7.08 | 7.42 | 7.45 | 7.35 |
| ca-AstroPh | KS | SValue | 1.07 | 0.87 | 0.98 | 0.36 | 0.43 | 0.41 | 1.29 | 1.3 | 1.26 |
| ca-AstroPh | KS | Avg | 4.555 | 4.295 | 4.715 | 3.727 | 3.837 | 3.832 | 5.92 | 5.895 | 5.685 |
| ca-AstroPh | L2 | ACC | 2.48 | 2.38 | 2.21 | 2.26 | 2.1 | 2.09 | 2.45 | 2.43 | 2.47 |
| ca-AstroPh | L2 | DD | 2.23 | 2.28 | 2.41 | 2.26 | 2.12 | 2.1 | 2.66 | 2.68 | 2.70 |
| ca-AstroPh | L2 | TP | 3.64 | 3.53 | 3.58 | 3.46 | 3.467 | 3.468 | 3.86 | 3.85 | 3.85 |
| ca-AstroPh | L2 | SValue | 0.96 | 1.23 | 1.16 | 0.99 | 1.14 | 1.1 | 1.13 | 1.14 | 1.14 |
| ca-AstroPh | L2 | Avg | 2.327 | 2.355 | 2.34 | 2.242 | 2.206 | 2.189 | 2.525 | 2.525 | 2.54 |
| ca-CondMat | KS | ACC | 4.32 | 3.84 | 5.07 | 2.25 | 2.54 | 2.56 | 4.16 | 3.73 | 5.09 |
| ca-CondMat | KS | DD | 3.94 | 4.39 | 3.41 | 4.35 | 4.38 | 4.34 | 3.31 | 3.52 | 4.02 |
| ca-CondMat | KS | TP | 7.26 | 6.71 | 6.49 | 6.23 | 6.41 | 6.54 | 6.7 | 6.91 | 6.85 |
| ca-CondMat | KS | SValue | 0.55 | 0.68 | 0.49 | 0.32 | 0.38 | 0.33 | 0.62 | 0.69 | 0.65 |
| ca-CondMat | KS | Avg | 4.018 | 3.905 | 3.865 | 3.288 | 3.428 | 3.443 | 3.698 | 3.713 | 4.153 |
| ca-CondMat | L2 | ACC | 2.24 | 2.16 | 2.09 | 1.86 | 1.87 | 1.81 | 2.15 | 1.99 | 2.08 |
| ca-CondMat | L2 | DD | 1.58 | 1.62 | 1.44 | 1.56 | 1.57 | 1.61 | 1.61 | 1.74 | 1.78 |
| ca-CondMat | L2 | TP | 2.76 | 2.66 | 2.71 | 2.43 | 2.49 | 2.46 | 2.712 | 2.72 | 2.723 |
| ca-CondMat | L2 | SValue | 0.76 | 0.72 | 0.88 | 0.66 | 0.73 | 0.62 | 1.08 | 1.09 | 1.09 |
| ca-CondMat | L2 | Avg | 1.835 | 1.79 | 1.78 | 1.628 | 1.665 | 1.625 | 1.888 | 1.885 | 1.918 |
| DBLP | KS | ACC | 5.14 | 3.89 | 5.49 | 2.51 | 2.55 | 2.79 | 4.37 | 4.19 | 4.28 |
| DBLP | KS | DD | 4.28 | 4.99 | 3.91 | 4.83 | 4.93 | 5.09 | 4.29 | 4.28 | 4.04 |
| DBLP | KS | TP | 6.91 | 6.28 | 6.25 | 5.76 | 5.76 | 6.01 | 6.65 | 6.91 | 6.61 |
| DBLP | KS | SValue | 1.38 | 0.87 | 1.01 | 0.37 | 0.43 | 0.48 | 1.21 | 1.24 | 1.26 |
| DBLP | KS | Avg | 4.43 | 4.008 | 4.17 | 3.37 | 3.42 | 3.59 | 4.13 | 4.15 | 4.049 |
| DBLP | L2 | ACC | 2.35 | 2.11 | 2.04 | 1.78 | 1.85 | 1.96 | 2.15 | 2.08 | 2.075 |
| DBLP | L2 | DD | 1.58 | 2.18 | 1.61 | 2.03 | 2.06 | 2.11 | 1.45 | 1.58 | 1.57 |
| DBLP | L2 | TP | 2.82 | 2.62 | 2.69 | 2.16 | 2.12 | 2.118 | 2.75 | 2.76 | 2.755 |
| DBLP | L2 | SValue | 0.94 | 0.96 | 0.941 | 0.87 | 0.93 | 0.928 | 1.18 | 1.177 | 1.178 |
| DBLP | L2 | Avg | 1.92 | 1.968 | 1.82 | 1.71 | 1.74 | 1.78 | 1.883 | 1.899 | 1.894 |
| ca-GrQc | KS | ACC | 5.64 | 3.68 | 3.39 | 2.72 | 2.45 | 2.89 | 5.61 | 5.89 | 5.63 |
| ca-GrQc | KS | DD | 5.05 | 4.95 | 4.38 | 4.77 | 4.58 | 5.12 | 4.35 | 4.38 | 4.92 |
| ca-GrQc | KS | TP | 6.49 | 6.29 | 5.99 | 6.12 | 6.11 | 5.76 | 6.51 | 6.64 | 6.78 |
| ca-GrQc | KS | SValue | 1.65 | 0.75 | 0.95 | 0.56 | 0.53 | 0.49 | 1.42 | 1.48 | 1.49 |
| ca-GrQc | KS | Avg | 4.708 | 3.918 | 3.678 | 3.543 | 3.418 | 3.565 | 4.473 | 4.598 | 4.705 |
| ca-GrQc | L2 | ACC | 2.49 | 2.18 | 1.95 | 1.69 | 1.62 | 1.75 | 2.51 | 2.52 | 2.47 |
| ca-GrQc | L2 | DD | 2.21 | 2.11 | 1.67 | 1.98 | 1.82 | 2.1 | 1.89 | 1.95 | 2.02 |
| ca-GrQc | L2 | TP | 2.95 | 2.71 | 2.63 | 2.36 | 2.46 | 2.18 | 2.92 | 2.89 | 2.92 |
| ca-GrQc | L2 | SValue | 1.29 | 0.73 | 0.66 | 0.68 | 0.61 | 0.62 | 1.32 | 1.35 | 1.35 |
| ca-GrQc | L2 | Avg | 2.235 | 1.933 | 1.728 | 1.678 | 1.628 | 1.663 | 2.16 | 2.178 | 2.19 |
| ca-HepTh | KS | ACC | 5.53 | 3.44 | 3.71 | 3.11 | 2.72 | 2.52 | 3.31 | 3.14 | 4.15 |
| ca-HepTh | KS | DD | 2.59 | 3.19 | 2.89 | 3.17 | 3.01 | 3.13 | 2.48 | 2.78 | 3.49 |
| ca-HepTh | KS | TP | 6.99 | 6.39 | 6.31 | 5.12 | 5.23 | 5.15 | 5.94 | 6.83 | 6.72 |
| ca-HepTh | KS | SValue | 0.93 | 0.61 | 0.81 | 0.48 | 0.52 | 0.36 | 0.897 | 0.894 | 0.91 |
| ca-HepTh | KS | Avg | 4.01 | 3.41 | 3.43 | 2.97 | 2.87 | 2.79 | 3.157 | 3.411 | 3.818 |
| ca-HepTh | L2 | ACC | 2.34 | 2.01 | 2.02 | 1.79 | 1.81 | 1.76 | 1.93 | 1.86 | 2.01 |
| ca-HepTh | L2 | DD | 1.18 | 1.66 | 1.23 | 1.68 | 1.601 | 1.69 | 1.03 | 1.06 | 1.29 |
| ca-HepTh | L2 | TP | 2.49 | 2.27 | 2.42 | 2.01 | 2.12 | 2.05 | 2.38 | 2.403 | 2.402 |
| ca-HepTh | L2 | SValue | 0.83 | 0.82 | 0.78 | 0.81 | 0.801 | 0.807 | 1.107 | 1.103 | 1.108 |
| ca-HepTh | L2 | Avg | 1.71 | 1.69 | 1.613 | 1.598 | 1.608 | 1.602 | 1.612 | 1.607 | 1.703 |

Table 5: Other structural properties (AD: average degree, APL: average path length, CPD: central point dominance, CED: central edge dominance, λ̂: estimated power-law exponent, GE: global efficiency) of the original networks and of the Kron, LNFA, MAG, RWS, SRS, SUS, RankLinear, RankExp. and RankUniform models over the five real-world networks considered.

| Network | Property | Original | Kron | LNFA | MAG | RWS | SRS | SUS | RankLinear | RankExp. | RankUniform |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ca-AstroPh | AD | 10.86 | 8.23 | 7.789 | 5.39 | 9.758 | 9.842 | 9.843 | 7.994 | 7.949 | 7.995 |
| ca-AstroPh | APL | 4.32 | 3.98 | 2.82 | 5.03 | 2.82 | 2.65 | 2.59 | 4.122 | 4.151 | 4.148 |
| ca-AstroPh | CPD | 0.0342 | 0.0309 | 0.363 | 0.0265 | 0.353 | 0.223 | 0.451 | 0.0112 | 0.0117 | 0.016 |
| ca-AstroPh | CED | 60.94 | 12.07 | 189.32 | 18.11 | 190.25 | 156.12 | 124.48 | 7.336 | 6.197 | 5.992 |
| ca-AstroPh | λ̂ | 1.419 | 1.412 | 1.389 | 1.569 | 1.398 | 1.401 | 1.397 | 1.364 | 1.362 | 1.363 |
| ca-AstroPh | GE | 0.212 | 0.263 | 0.338 | 0.195 | 0.308 | 0.319 | 0.301 | 0.225 | 0.221 | 0.215 |
| ca-CondMat | AD | 6.29 | 6.05 | 5.91 | 7.05 | 5.98 | 5.993 | 5.983 | 5.996 | 7.979 | 5.995 |
| ca-CondMat | APL | 5.30 | 4.47 | 3.08 | 4.44 | 3.182 | 3.75 | 3.29 | 4.67 | 4.69 | 4.695 |
| ca-CondMat | CPD | 0.1035 | 0.0473 | 0.367 | 0.0102 | 0.593 | 0.085 | 0.471 | 0.016 | 0.0141 | 0.0161 |
| ca-CondMat | CED | 48.69 | 11.75 | 184.35 | 15.22 | 212.25 | 37.88 | 197.48 | 8.28 | 7.787 | 9.092 |
| ca-CondMat | λ̂ | 1.492 | 1.489 | 1.512 | 1.450 | 1.511 | 1.4995 | 1.501 | 1.434 | 1.431 | 1.432 |
| ca-CondMat | GE | 0.193 | 0.235 | 0.338 | 0.233 | 0.306 | 0.259 | 0.301 | 0.203 | 0.206 | 0.207 |
| DBLP | AD | 5.46 | 6.56 | 5.98 | 5.89 | 5.95 | 6.03 | 5.99 | 5.996 | 5.991 | 5.995 |
| DBLP | APL | 8.74 | 4.79 | 3.08 | 5.14 | 3.19 | 3.158 | 3.155 | 4.67 | 4.689 | 4.694 |
| DBLP | CPD | 0.0627 | 0.0211 | 0.498 | 0.0114 | 0.520 | 0.197 | 0.395 | 0.0223 | 0.0165 | 0.0127 |
| DBLP | CED | 155.35 | 10.71 | 89.54 | 21.12 | 99.78 | 125.54 | 106.88 | 11.52 | 9.02 | 8.722 |
| DBLP | λ̂ | 1.512 | 1.50 | 1.537 | 1.492 | 1.527 | 1.531 | 1.531 | 1.434 | 1.432 | 1.4317 |
| DBLP | GE | 0.152 | 0.219 | 0.326 | 0.214 | 0.316 | 0.314 | 0.312 | 0.209 | 0.211 | 0.212 |
| ca-GrQc | AD | 6.03 | 5.286 | 6.84 | 5.613 | 5.933 | 6.034 | 5.941 | 5.996 | 5.9961 | 5.9965 |
| ca-GrQc | APL | 5.885 | 5.025 | 3.02 | 4.881 | 3.711 | 3.721 | 3.739 | 4.651 | 4.686 | 4.685 |
| ca-GrQc | CPD | 0.0392 | 0.018 | 0.419 | 0.023 | 0.357 | 0.429 | 0.313 | 0.015 | 0.016 | 0.0134 |
| ca-GrQc | CED | 54.16 | 10.61 | 184.73 | 19.55 | 169.25 | 212.72 | 173.75 | 9.92 | 8.71 | 8.54 |
| ca-GrQc | λ̂ | 1.558 | 1.509 | 1.455 | 1.572 | 1.556 | 1.545 | 1.559 | 1.437 | 1.433 | 1.4323 |
| ca-GrQc | GE | 0.123 | 0.207 | 0.323 | 0.206 | 0.313 | 0.311 | 0.321 | 0.216 | 0.221 | 0.218 |
| ca-HepTh | AD | 5.61 | 6.53 | 5.97 | 6.33 | 5.96 | 6.01 | 5.98 | 5.99 | 7.991 | 5.995 |
| ca-HepTh | APL | 5.67 | 4.23 | 3.21 | 4.64 | 3.56 | 3.31 | 3.532 | 4.65 | 4.687 | 4.685 |
| ca-HepTh | CPD | 0.0256 | 0.0264 | 0.215 | 0.0112 | 0.184 | 0.201 | 0.228 | 0.0165 | 0.01655 | 0.0169 |
| ca-HepTh | CED | 40.745 | 11.76 | 145.81 | 13.31 | 56.25 | 131.84 | 95.53 | 9.57 | 8.17 | 8.935 |
| ca-HepTh | λ̂ | 1.535 | 1.470 | 1.509 | 1.482 | 1.515 | 1.516 | 1.511 | 1.435 | 1.432 | 1.4324 |
| ca-HepTh | GE | 0.165 | 0.249 | 0.301 | 0.237 | 0.274 | 0.298 | 0.277 | 0.207 | 0.213 | 0.211 |

9. Summary and Concluding Remarks
We have implemented six different algorithms that generate power-law networks using a set of initial log-normally distributed fitness values. We conclude that the log-normal fitness attachment protocol, employing six different attachment schemes frequently used in genetic algorithms, naturally constructs power-law networks which possess global topologies similar to those of real world networks. Furthermore, all the proposed algorithms are independent of the connectivity of the existing network structure as the network is built. The principal task in the characterization, analysis, classification, modeling and validation of complex networks is to compute structural properties based on their connectivity and topology. We have used several available measures (average path length, diameter, average nearest-neighbor degree, clustering coefficient, rich-club coefficient and global efficiency) that characterize the distinct structural connectivity properties of power-law networks. Regarding the observed results, these structural properties of the networks generated by the non-ranking based algorithms RWS, SRS and SUS differ (by visual inspection) from those of the networks generated by the ranking based algorithms (RankLinear, RankExp. and RankUniform). More specifically, the dependence of the structural properties on σ is similar among the three non-ranking based algorithms and, likewise, among the three ranking based algorithms. In other words, the behavior of the structural properties of the generated networks broadly falls into the two categories summarized in Table 6. In the formation of the networks generated through the ranking algorithms, where incoming nodes attach on the basis of fitness rank, every node has been observed to get a more or less equal chance to form edges with the other nodes, instead of only a few of the highest-fitness nodes attracting most edges. As a result, the spread of the degrees is smaller than for the non-ranking based algorithms; that is, the maximum degree of the networks generated through the ranking based algorithms is much smaller than that of the other algorithms.

For higher values of σ, networks generated through RWS, SRS and SUS exhibit the small-world effect and are more efficient than the networks generated through the ranking based algorithms. Note also that changing σ changes the fitness distribution ρ, which in turn changes the degree distribution P(k) of the resulting networks. From all our simulations, we observed that for σ ∈ (1.5, 3), RWS, SRS and SUS are able to generate power-law networks with characteristics similar to real world networks. For values of σ optimized for the particular real world network considered, the networks generated through the RWS, SRS and SUS models characterize the connectivity pattern observed in that network very well. Simulation results show that the proposed non-ranking based models outperform the considered baseline and state-of-the-art models in reproducing the important structural properties of real world networks. The proposed ranking based algorithms, although not exhibiting real world behaviour, are still able to generate networks with uniform characteristics. They can also reproduce most of the important structural properties of real world networks, but are admittedly less consistent than the non-ranking based models. Therefore, our models can be used to generate synthetic networks that mimic real world properties. Such synthetic networks can be used to study the future behavior of real world networks and to evaluate the performance of various network algorithms. In conclusion, the six proposed algorithms, using six different attachment schemes frequently used in GA, constitute a new approach for generating power-law networks which exhibit structural characteristics similar to real world networks.

Table 6: The observed behaviour of generated networks as σ increases from 0.001 to 8.

| Structural property | Group 1: RWS, SRS and SUS schemes | Group 2: Rank based schemes |
|---|---|---|
| Pictorial representation of the network | Visual appearance of the network changes due to increasing presence of high degree nodes | No noticeable change in visual appearance |
| Real world behaviour | Exhibits real world behaviour for certain values of σ | Does not exhibit real world behaviour |
| Degree distribution | Changes from power law to exponential | No significant change of the degree distribution |
| Diameter of network | Slowly decreasing | More or less constant |
| Clustering coefficient | Slowly increasing | More or less constant |
| Number of triangles | Slowly increasing | More or less constant |
| Average nearest neighbour degree | Slowly increases up to a point and then decreases | Slowly increases |
| Rich club coefficient | Slowly increases to some extent, after which an erratic pattern is observed | Slowly increasing |
| Central point dominance | Increases generally | More or less constant |
| Central edge dominance | Generally increases, except for SRS | More or less constant |
| Global efficiency | Slowly increasing | More or less constant |

Acknowledgment
The authors would like to thank the Editor-in-Chief, the Associate Editor and the reviewers for their valuable comments and detailed suggestions for the improvement of the presentation and the contents of this article. The authors gratefully acknowledge the financial assistance received from Indian Statistical Institute (I. S. I.) and Visvesvaraya PhD Scheme awarded by the Government of India. S. Chattopadhyay gratefully acknowledges Mr. A. Bakshi for his helpful discussion.
References
[1] R. Albert and A.L. Barabasi. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47, 2002.
[2] R. Albert, H. Jeong, and A.L. Barabasi. Internet: Diameter of the world-wide web. Nature, 401(6749):130–131, 1999.
[3] R. Albert, H. Jeong, and A.L. Barabasi. Error and attack tolerance of complex networks. Nature, 406(6794):378–382, 2000.
[4] A.L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
[5] C. Bedogne and G.J. Rodgers. Complex growing networks with intrinsic vertex fitness. Physical Review E, 74(4):046115, 2006.
[6] B. Bollobás and O.M. Riordan. Mathematical results on scale-free random graphs. In Handbook of Graphs and Networks: From the Genome to the Internet, pages 1–34. Wiley, 2003.
[7] B. Bollobás. Random Graphs. Springer, New York, 1998.
[8] T. Blickle and L. Thiele. A comparison of selection schemes used in genetic algorithms. 1995.
[9] Tobias Blickle and Lothar Thiele. A comparison of selection schemes used in evolutionary algorithms. Evolutionary Computation, 4(4):361–394, 1996.
[10] Stefano Boccaletti, Vito Latora, Yamir Moreno, Martin Chavez, and D-U Hwang. Complex networks: Structure and dynamics. Physics Reports, 424(4):175–308, 2006.
[11] G. Caldarelli, A. Capocci, P. De Los Rios, and M.A. Munoz. Scale-free networks from varying vertex intrinsic fitness. Physical Review Letters, 89(25):258702, 2002.
[12] Swarup Chattopadhyay, C.A. Murthy, and Sankar K. Pal. Fitting truncated geometric distributions in large scale real world networks. Theoretical Computer Science, 551:22–38, 2014.
[13] S.N. Dorogovtsev, J.F.F. Mendes, and A.N. Samukhin. Structure of growing networks with preferential linking. Physical Review Letters, 85(21):4633, 2000.
[14] P. Erdos and A. Renyi. On the strength of connectedness of a random graph. Acta Mathematica Hungarica, 12(1):261–267, 1961.
[15] P. Erdos and A. Renyi. On the evolution of random graphs. Bull. Inst. Internat. Statist, 38(4):343–347, 1961.
[16] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. ACM SIGCOMM Computer Communication Review, 29(4):251–262, 1999.
[17] T. Fenner, M. Levene, and G. Loizou. A model for collaboration networks giving rise to a power-law distribution with an exponential cutoff. Social Networks, 29(1):70–80, 2007.
[18] L.C. Freeman. A set of measures of centrality based on betweenness. Sociometry, 40(1):35–41, 1977.
[19] Shilpa Ghadge, Timothy Killingback, Bala Sundaram, and Duc A. Tran. A statistical construction of power-law networks. International Journal of Parallel, Emergent and Distributed Systems, 25(3):223–235, 2010.
[20] D.E. Goldberg and K. Deb. A comparative analysis of selection schemes used in genetic algorithms. Urbana, 51:61801–2996, 1991.
[21] Kun Guo, Wenzhong Guo, Yuzhong Chen, Qirong Qiu, and Qishan Zhang. Community discovery by propagating local and global information based on the MapReduce model. Information Sciences, 323:73–93, 2015.
[22] H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi. The large-scale organization of metabolic networks. Nature, 407(6804):651–654, 2000.
[23] H. Jeong, S.P. Mason, A.L. Barabasi, and Z.N. Oltvai. Lethality and centrality in protein networks. Nature, 411(6833):41–42, 2001.
[24] Ihsan Kaya. A genetic algorithm approach to determine the sample size for attribute control charts. Information Sciences, 179(10):1552–1566, 2009.
[25] Myunghwan Kim and Jure Leskovec. Modeling social networks with node attributes using the multiplicative attribute graph model. In UAI, 2011.
[26] P.L. Krapivsky, S. Redner, and F. Leyvraz. Connectivity of growing random networks. Physical Review Letters, 85(21):4629, 2000.
[27] L.F. Costa, F.A. Rodrigues, G. Travieso, and P.R. Villas Boas. Characterization of complex networks: A survey of measurements. Advances in Physics, 56(1):167–242, 2007.
[28] Jure Leskovec and Christos Faloutsos. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 631–636. ACM, 2006.
[29] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
[30] Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. Kronecker graphs: An approach to modeling networks. The Journal of Machine Learning Research, 11:985–1042, 2010.
[31] L. Li, D. Alderson, J.C. Doyle, and W. Willinger. Towards a theory of scale-free graphs: Definition, properties, and implications. Internet Mathematics, 2(4):431–523, 2005.
[32] F. Liljeros, C.R. Edling, L.A.N. Amaral, H.E. Stanley, and Y. Aberg. The web of human sexual contacts. Nature, 411(6840):907–908, 2001.
[33] Ying Liu, Chunguang Li, Wallace K.S. Tang, and Zhaoyang Zhang. Distributed estimation over complex networks. Information Sciences, 197:91–104, 2012.
[34] M. Mitzenmacher. A brief history of generative models for power law and lognormal distributions. Internet Mathematics, 2(1):226–251, 2004.
[35] Lev Muchnik, Sen Pei, Lucas C. Parra, Saulo D.S. Reis, José S. Andrade Jr, Shlomo Havlin, and Hernán A. Makse. Origins of power-law degree distribution in the heterogeneity of human activity in social networks. Scientific Reports, 3, 2013.
[36] M.E.J. Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98(2):404–409, 2001.
[37] M.E.J. Newman. The structure and function of complex networks. SIAM Review, 45(2):167–256, 2003.
[38] M.E.J. Newman and D.J. Watts. Renormalization group analysis of the small-world network model. Physics Letters A, 263(4):341–346, 1999.
[39] M.E.J. Newman, S.H. Strogatz, and D.J. Watts. Random graphs with arbitrary degree distributions and their applications. Physical Review E, 64(2):026118, 2001.
[40] Khanh Nguyen and Duc A. Tran. Fitness-based generative models for power-law networks. In Handbook of Optimization in Complex Networks, pages 39–53. Springer, 2012.
[41] María Angeles Serrano, Marián Boguñá, Romualdo Pastor-Satorras, and Alessandro Vespignani. Correlations in complex networks. In Large Scale Structure and Dynamics of Complex Networks: From Information Technology to Finance and Natural Sciences, pages 35–66, 2007.
[42] M. Seshadri, S. Machiraju, A. Sridharan, J. Bolot, C. Faloutsos, and J. Leskovec. Mobile call graphs: Beyond power-law and lognormal distributions. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’08), pages 596–604, New York, USA, 2008.
[43] R.V. Sole and M. Montoya. Complexity and fragility in ecological networks. Proceedings of the Royal Society of London. Series B: Biological Sciences, 268(1480):2039–2045, 2001.
[44] Peng Gang Sun, Lin Gao, and Shan Shan Han. Identification of overlapping and non-overlapping community structure by fuzzy clustering in complex networks. Information Sciences, 181(6):1060–1071, 2011.
[45] D.J. Watts and S.H. Strogatz. Collective dynamics of small-world networks. Nature, 393(6684):440–442, 1998.
[46] Jaewon Yang and Jure Leskovec. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1):181–213, 2015.
[47] S. Zhou and R.J. Mondragon. The rich-club phenomenon in the internet topology. IEEE Communications Letters, 8(3):180–182, 2004.

Appendix A
Case: SUS
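For SUS, the new node $k+1$ creates $m$ edges with linking probability

$$LP(k+1 \to r) = \begin{cases} 1 & \text{if } \dfrac{f_r}{\sum_{i=1}^{k} f_i} \ge \dfrac{1}{m}, \\[2mm] m\,\dfrac{f_r}{\sum_{i=1}^{k} f_i} & \text{if } \dfrac{f_r}{\sum_{i=1}^{k} f_i} < \dfrac{1}{m}. \end{cases} \tag{4}$$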
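If SUS is taken to be the standard Baker-style procedure (an assumption: $m$ equally spaced pointers, one shared random offset, laid over the cumulative fitness of the existing nodes), then node $r$ is hit by at least one pointer with probability exactly $\min(1, m f_r / \sum_i f_i)$, which is Equation (4). The Python sketch below, with hypothetical function names, compares Equation (4) against a Monte Carlo estimate; it is an illustration, not the authors' implementation.

```python
import random

def sus_select(fitness, m, rng):
    """Baker-style stochastic universal sampling: m equally spaced pointers
    sharing one random offset are laid over the cumulative fitness.
    Returns the node index hit by each pointer (indices may repeat)."""
    total = sum(fitness)
    step = total / m
    start = rng.random() * step              # single random offset in [0, step)
    pointers = [start + t * step for t in range(m)]
    chosen, cumulative, idx = [], 0.0, 0
    for r, f in enumerate(fitness):
        cumulative += f                      # node r owns [cumulative - f, cumulative)
        while idx < m and pointers[idx] < cumulative:
            chosen.append(r)
            idx += 1
    while idx < m:                           # guard against floating-point round-off
        chosen.append(len(fitness) - 1)
        idx += 1
    return chosen

def sus_link_probability(fitness, r, m):
    """Equation (4): 1 if f_r / F >= 1/m, otherwise m * f_r / F."""
    share = fitness[r] / sum(fitness)
    return 1.0 if share >= 1.0 / m else m * share

def empirical_link_frequency(fitness, m, trials=20000, seed=0):
    """Fraction of trials in which each existing node receives at least one link."""
    rng = random.Random(seed)
    counts = [0] * len(fitness)
    for _ in range(trials):
        for r in set(sus_select(fitness, m, rng)):   # distinct targets only
            counts[r] += 1
    return [c / trials for c in counts]

fitness = [5.0, 1.0, 2.0, 0.5, 1.5]          # illustrative fitness values
print([sus_link_probability(fitness, r, 2) for r in range(len(fitness))])
print(empirical_link_frequency(fitness, 2))
```

Note that the probabilities from Equation (4) sum to $m$ whenever no node reaches the cap of 1, because each of the $m$ pointers lands on exactly one node.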
In case of $m = 1$, i.e., adding one edge at each time step, the average degree of the $i$th node over the completed network can be calculated as in the case of RWS:

$$\begin{aligned} d_i &= \sum_{j=1}^{i-1} LP(i \to j) + \sum_{k=i+1}^{n} LP(k \to i) = \sum_{j=1}^{i-1} \frac{f_j}{\sum_{p=1}^{i-1} f_p} + \sum_{k=i+1}^{n} \frac{f_i}{\sum_{p=1}^{k-1} f_p} \\ &= 1 + \sum_{k=i+1}^{n} \frac{f_i}{\sum_{p=1}^{k-1} f_p}. \end{aligned} \tag{5}$$

In case of $m = 2$, the average degree of the $i$th node becomes

$$\begin{aligned} d_i &= \Big( \sum_{\substack{i',j'=1 \\ i' \neq j'}}^{i-1} 2\,LP(i \to i')\,LP(i \to j') + \sum_{j=1}^{i-1} \big(LP(i \to j)\big)^2 \Big) + \sum_{k=i+1}^{n} LP(k \to i) \\ &= 2 \sum_{i'=1}^{i-1} \Big( LP(i \to i') \sum_{\substack{j'=1 \\ j' \neq i'}}^{i-1} LP(i \to j') \Big) + \sum_{j=1}^{i-1} \big(LP(i \to j)\big)^2 + \sum_{k=i+1}^{n} LP(k \to i) \\ &= 2 \sum_{i'=1}^{i-1} LP(i \to i')\big(1 - LP(i \to i')\big) + \sum_{j=1}^{i-1} \big(LP(i \to j)\big)^2 + \sum_{k=i+1}^{n} LP(k \to i) \\ &= 2 \sum_{i'=1}^{i-1} LP(i \to i') - \sum_{j=1}^{i-1} \big(LP(i \to j)\big)^2 + \sum_{k=i+1}^{n} LP(k \to i). \end{aligned} \tag{6}$$
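Equation (5) can be evaluated directly for a given fitness sequence. The Python sketch below uses a hypothetical helper name and assumes node $i$ is stored at list position $i-1$ in order of arrival. The "$1 +$" in Equation (5) applies for $i \ge 2$; node 1 has no earlier node to link to, so it gets no outgoing edge. A useful sanity check is that the expected degrees sum to $2(n-1)$, twice the number of edges.

```python
def expected_degrees_m1(fitness):
    """Average degree of every node from Equation (5) (m = 1).

    fitness[i - 1] holds f_i for node i, i = 1..n, in order of arrival.
    """
    n = len(fitness)
    prefix = [0.0]                       # prefix[j] = f_1 + ... + f_j
    for f in fitness:
        prefix.append(prefix[-1] + f)
    degrees = []
    for i in range(1, n + 1):
        # Outgoing part: the j-sum in Equation (5) equals 1 for i >= 2;
        # node 1 has no earlier node to link to.
        out_part = 1.0 if i >= 2 else 0.0
        # Incoming part: sum over later nodes k of f_i / (f_1 + ... + f_{k-1}).
        in_part = sum(fitness[i - 1] / prefix[k - 1] for k in range(i + 1, n + 1))
        degrees.append(out_part + in_part)
    return degrees

degrees = expected_degrees_m1([1.0, 2.0, 0.5, 3.0, 1.5])   # illustrative fitness values
print(sum(degrees))  # should equal 2 * (n - 1) up to round-off
```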
For large enough $i$ and sufficiently large $n$, there will be very few nodes for which $f_i / \sum_{p=1}^{i-1} f_p \ge \frac{1}{2}$; these do not contribute significantly to the final degree distribution and may be ignored. Therefore, from Equation 4, the linking probabilities can be approximated by $LP(i \to j) \simeq 2 f_j / \sum_{p=1}^{i-1} f_p$ and $LP(k \to i) \simeq 2 f_i / \sum_{p=1}^{k-1} f_p$, and the average degree becomes

$$\begin{aligned} d_i &\simeq 2 \sum_{i'=1}^{i-1} \frac{2 f_{i'}}{\sum_{p=1}^{i-1} f_p} - \sum_{j=1}^{i-1} \Big( \frac{2 f_j}{\sum_{p=1}^{i-1} f_p} \Big)^2 + \sum_{k=i+1}^{n} \frac{2 f_i}{\sum_{p=1}^{k-1} f_p} \\ &\simeq 2 \cdot 2 - \sum_{j=1}^{i-1} \Big( \frac{2 f_j}{\sum_{p=1}^{i-1} f_p} \Big)^2 + \sum_{k=i+1}^{n} \frac{2 f_i}{\sum_{p=1}^{k-1} f_p}. \end{aligned} \tag{7}$$
The second term of Equation 7 can be made sufficiently small for large $i$. The expression for $d_i$ generalizes to a network generated through the SUS scheme by adding $m$ edges at each time step, by the similar argument that for large enough $i$ and sufficiently large $n$ there will be only very few nodes for which $f_i / \sum_{p=1}^{i-1} f_p \ge \frac{1}{m}$. The denominator of the last term of Equations 5 and 7 is $\sum_{p=1}^{k-1} f_p$, which can be approximated by $(k-1)\bar{f}$ for large $i$ and sufficiently large $n$, where $\bar{f} = \frac{1}{n}\sum_{i=1}^{n} f_i$ is the mean fitness. The average degree of the $i$th node can then be written in general as

$$d_i \simeq m' + m f_i \Big( \sum_{k=i+1}^{n} \frac{1}{(k-1)\bar{f}} \Big) \simeq m' + m H f_i,$$

where $m' = m^2 \pm \epsilon$ is a constant, $\epsilon$ is a small positive number, and $H = \sum_{k=i+1}^{n} \frac{1}{(k-1)\bar{f}}$ is a constant for given $i$ and $n$.
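The constant $H$ and the approximation $d_i \simeq m' + mHf_i$ are easy to tabulate. The Python sketch below uses hypothetical helper names and sets the small correction $\epsilon$ to 0 by default (an assumption). Since $\sum_{k=i+1}^{n} \frac{1}{k-1}$ is a tail of the harmonic series, $H$ is close to $\ln\big((n-1)/(i-1)\big)/\bar{f}$ for large $i$. For uniform fitness and $m = 1$ the approximation coincides with Equation (5), because then $\sum_{p=1}^{k-1} f_p = (k-1)\bar{f}$ exactly.

```python
import math

def sus_H(i, n, mean_fitness):
    """H = sum_{k=i+1}^{n} 1 / ((k - 1) * mean_fitness), for the SUS scheme."""
    return sum(1.0 / ((k - 1) * mean_fitness) for k in range(i + 1, n + 1))

def sus_degree_approx(f_i, i, n, m, mean_fitness, eps=0.0):
    """Large-i approximation d_i ~= m' + m * H * f_i with m' = m**2 + eps.

    eps stands in for the small correction term; it defaults to 0 here."""
    return m ** 2 + eps + m * sus_H(i, n, mean_fitness) * f_i

# H is a harmonic-series tail divided by the mean fitness, so for large i it is
# close to log((n - 1) / (i - 1)) / mean_fitness.
i, n, mean_f = 1000, 100000, 2.0             # illustrative values
print(sus_H(i, n, mean_f), math.log((n - 1) / (i - 1)) / mean_f)
```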
In the continuous case, we can write the above equation as $d = m' + mHf$, where $f$ is drawn from a log-normal distribution with density $\rho$. For sufficiently large $n$, the distribution of degree $d$ for a given fitness value $f$ will be concentrated around $m' + mHf$, so the variance in $d$ is mostly due to the variance in $f$. Hence we may write $d = m' + mHf$, and the degree distribution $P(d)$ can be derived from the distribution $\rho(f)$ of $f$:

$$P(d) = \frac{1}{mH}\, \rho\Big( \frac{d - m'}{mH} \Big). \tag{8}$$

Appendix B
Case: SRS
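For SRS, the new node $k+1$ creates $m$ edges with the linking probability given below. Let $m \frac{f_r}{\sum_{i=1}^{k} f_i} = \alpha_r + \beta_r$, where $\alpha_r$ is the integral part and $\beta_r$ is the fractional part, $0 < \beta_r < 1$ for all $r$. The corresponding linking probability is

$$LP(k+1 \to r) = \begin{cases} 1 & \text{if } \alpha_r \ge 1, \\[2mm] \dfrac{\beta_r}{\sum_{i=1}^{k} \beta_i} & \text{if } \alpha_r = 0 \ (\text{so } 0 < \beta_r < 1). \end{cases} \tag{9}$$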
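Equation (9) can be written as a small function. The Python sketch below is a hypothetical helper that follows the equation as stated, normalizing the fractional parts by their sum over all $k$ existing nodes. It only computes the linking probabilities; it is not a full stochastic-remainder sampler. With $m = 1$ every $\alpha_r$ is 0 (for at least two nodes with positive fitness), and the probabilities reduce to $f_r / \sum_i f_i$.

```python
import math

def srs_link_probabilities(fitness, m):
    """Equation (9): split m * f_r / F into integer part alpha_r and fractional
    part beta_r; nodes with alpha_r >= 1 are linked with probability 1, the
    rest with probability beta_r / sum(beta)."""
    total = sum(fitness)
    alphas, betas = [], []
    for f in fitness:
        expected = m * f / total
        alpha = math.floor(expected)         # integral part alpha_r
        alphas.append(alpha)
        betas.append(expected - alpha)       # fractional part beta_r
    beta_sum = sum(betas)
    return [1.0 if a >= 1 else (b / beta_sum if beta_sum > 0 else 0.0)
            for a, b in zip(alphas, betas)]

print(srs_link_probabilities([6.0, 2.0, 2.0], 2))      # illustrative fitness values
print(srs_link_probabilities([1.0, 2.0, 3.0, 4.0], 1))
```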
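In case of $m = 1$, $\alpha_r = 0$ for all $r$, and hence the linking probability becomes $\beta_r / \sum_{i=1}^{k} \beta_i$, which depends only on $\beta_r$. Since, for $m = 1$, $\beta_r = f_r / \sum_{i=1}^{k} f_i$, the common denominator cancels and the average degree of the $i$th node over the finished network can be calculated as

$$\begin{aligned} d_i &= \sum_{j=1}^{i-1} LP(i \to j) + \sum_{k=i+1}^{n} LP(k \to i) = \sum_{j=1}^{i-1} \frac{\beta_j}{\sum_{p=1}^{i-1} \beta_p} + \sum_{k=i+1}^{n} \frac{\beta_i}{\sum_{p=1}^{k-1} \beta_p} \\ &= \sum_{j=1}^{i-1} \frac{f_j}{\sum_{p=1}^{i-1} f_p} + \sum_{k=i+1}^{n} \frac{f_i}{\sum_{p=1}^{k-1} f_p} = 1 + \sum_{k=i+1}^{n} \frac{f_i}{\sum_{p=1}^{k-1} f_p}. \end{aligned} \tag{10}$$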
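In case of $m = 2$, the average degree of the $i$th node becomes

$$\begin{aligned} d_i &= \Big( \sum_{\substack{i',j'=1 \\ i' \neq j'}}^{i-1} 2\,LP(i \to i')\,LP(i \to j') + \sum_{j=1}^{i-1} \big(LP(i \to j)\big)^2 \Big) + \sum_{k=i+1}^{n} LP(k \to i) \\ &= 2 \sum_{i'=1}^{i-1} \Big( LP(i \to i') \sum_{\substack{j'=1 \\ j' \neq i'}}^{i-1} LP(i \to j') \Big) + \sum_{j=1}^{i-1} \big(LP(i \to j)\big)^2 + \sum_{k=i+1}^{n} LP(k \to i) \\ &= 2 \sum_{i'=1}^{i-1} LP(i \to i')\big(1 - LP(i \to i')\big) + \sum_{j=1}^{i-1} \big(LP(i \to j)\big)^2 + \sum_{k=i+1}^{n} LP(k \to i) \\ &= 2 \sum_{i'=1}^{i-1} LP(i \to i') - \sum_{j=1}^{i-1} \big(LP(i \to j)\big)^2 + \sum_{k=i+1}^{n} LP(k \to i). \end{aligned} \tag{11}$$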
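The step from the second to the third line of Equation (11) uses $\sum_{j' \neq i'} LP(i \to j') = 1 - LP(i \to i')$, i.e., that node $i$'s linking probabilities sum to one, as they do for $\beta_r / \sum_i \beta_i$ when all $\alpha_r = 0$. The Python check below (hypothetical helper name; the fractional parts are random illustrative values) evaluates the outgoing part of lines 1, 3 and 4 under that normalization and confirms they agree.

```python
import random

def eq11_outgoing_lines(p):
    """Outgoing part of Equation (11), line by line, for node i's linking
    probabilities p = [LP(i -> 1), ..., LP(i -> i-1)], assumed to sum to 1."""
    squares = sum(x * x for x in p)
    # Line 1: 2 * LP(i->i') * LP(i->j') over ordered pairs i' != j', plus squares.
    line1 = sum(2 * a * b for s, a in enumerate(p)
                for t, b in enumerate(p) if s != t) + squares
    # Line 3: replaces sum_{j' != i'} LP(i->j') by 1 - LP(i->i').
    line3 = 2 * sum(x * (1 - x) for x in p) + squares
    # Line 4: 2 * sum LP(i->i') - sum LP(i->j)^2.
    line4 = 2 * sum(p) - squares
    return line1, line3, line4

rng = random.Random(1)
betas = [rng.random() for _ in range(6)]      # illustrative fractional parts
p = [b / sum(betas) for b in betas]           # beta_j / sum_p beta_p, as in Eq. (9)
print(eq11_outgoing_lines(p))                 # three (nearly) identical values
```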
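For large enough $i$ and sufficiently large $n$, there will be very few nodes for which $\alpha_i \ge 1$, and hence the linking probability depends mostly on $\beta_i$ (Equation 9). Therefore the average degree can be approximated as

$$\begin{aligned} d_i &\simeq 2 \sum_{i'=1}^{i-1} \frac{\beta_{i'}}{\sum_{p=1}^{i-1} \beta_p} - \sum_{j=1}^{i-1} \Big( \frac{\beta_j}{\sum_{p=1}^{i-1} \beta_p} \Big)^2 + \sum_{k=i+1}^{n} \frac{\beta_i}{\sum_{p=1}^{k-1} \beta_p} \\ &\simeq 2 - \sum_{j=1}^{i-1} \Big( \frac{\beta_j}{\sum_{p=1}^{i-1} \beta_p} \Big)^2 + \sum_{k=i+1}^{n} \frac{\beta_i}{\sum_{p=1}^{k-1} \beta_p} \\ &\simeq 2 - \sum_{j=1}^{i-1} \Big( \frac{m f_j}{\sum_{p=1}^{i-1} f_p} \Big)^2 + \sum_{k=i+1}^{n} \Big( \frac{m f_i}{\sum_{p=1}^{i-1} f_p} \Big) \Big( \frac{1}{\sum_{p=1}^{k-1} \beta_p} \Big) \qquad \Big( \text{since } \beta_i \sim \frac{m f_i}{\sum_{p=1}^{i-1} f_p} \Big). \end{aligned} \tag{12}$$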
The second term of Equation 12 becomes sufficiently small for large values of $i$. The expression for $d_i$ generalizes to a network generated through the SRS scheme by adding $m$ edges at each time step, by a similar argument: for large enough $i$ and sufficiently large $n$, there will be very few nodes for which $\alpha_i \ge 1$. The denominators of the last term of Equation 12 are $\sum_{p=1}^{k-1} \beta_p$ and $\sum_{p=1}^{i-1} f_p$, which can be approximated by $(k-1)\bar{\beta}$ and $(i-1)\bar{f}$ for large $i$ and sufficiently large $n$, where $\bar{\beta} = \frac{1}{n}\sum_{i=1}^{n} \beta_i$ and $\bar{f} = \frac{1}{n}\sum_{i=1}^{n} f_i$. The average degree of the $i$th node can then be written in general as

$$d_i \simeq m' + m f_i \sum_{k=i+1}^{n} \Big( \frac{1}{(k-1)\bar{\beta}} \Big) \Big( \frac{1}{(i-1)\bar{f}} \Big) \simeq m' + m H f_i,$$

where $m' = m \pm \epsilon$ is a constant, $\epsilon$ is a small positive number, and $H = \sum_{k=i+1}^{n} \Big( \frac{1}{(k-1)\bar{\beta}} \Big) \Big( \frac{1}{(i-1)\bar{f}} \Big)$ is a constant for given $i$ and $n$.
In the continuous case, we can write the above equation as $d = m' + mHf$, where $f$ is drawn from a log-normal distribution with density $\rho$. For sufficiently large $n$, the distribution of degree $d$ for a given fitness value $f$ will be concentrated around $m' + mHf$. Hence we may write $d = m' + mHf$, and the degree distribution $P(d)$ can be derived from the distribution $\rho(f)$ of $f$:

$$P(d) = \frac{1}{mH}\, \rho\Big( \frac{d - m'}{mH} \Big). \tag{13}$$
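Equations (8) and (13) are the change-of-variables rule for the linear map $d = m' + mHf$. The Python sketch below evaluates $P(d)$ for a log-normal $\rho$ in the standard parameterization (log-mean mu, log-standard-deviation sigma), which is an assumption since the appendix does not state the parameters, and checks with a trapezoidal rule that $P(d)$ integrates to one. The helper names and the values of $m'$, $m$, $H$, mu and sigma are hypothetical and purely illustrative.

```python
import math

def lognormal_pdf(f, mu, sigma):
    """Log-normal density rho(f) with log-mean mu and log-standard-deviation sigma."""
    if f <= 0:
        return 0.0
    z = (math.log(f) - mu) / sigma
    return math.exp(-0.5 * z * z) / (f * sigma * math.sqrt(2 * math.pi))

def degree_pdf(d, m_prime, m, H, mu, sigma):
    """Equations (8) and (13): P(d) = rho((d - m') / (m H)) / (m H)."""
    scale = m * H
    return lognormal_pdf((d - m_prime) / scale, mu, sigma) / scale

def trapezoid(fn, a, b, steps):
    """Composite trapezoidal rule for the integral of fn over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (fn(a) + fn(b))
    total += sum(fn(a + s * h) for s in range(1, steps))
    return total * h

# Illustrative parameters only; the text does not fix mu, sigma, m' or H.
m_prime, m, H, mu, sigma = 2.0, 2, 1.5, 0.0, 0.5
mass = trapezoid(lambda d: degree_pdf(d, m_prime, m, H, mu, sigma),
                 m_prime, m_prime + 60.0, 60000)
print(mass)  # close to 1: P(d) is a proper density on d > m'
```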