Clustering coefficients of large networks

Clustering coefficients of large networks

Accepted Manuscript Clustering coefficients of large networks Yusheng Li, Yilun Shang, Yiting Yang PII: DOI: Reference: S0020-0255(16)32052-7 10.101...

576KB Sizes 0 Downloads 119 Views

Accepted Manuscript

Clustering coefficients of large networks Yusheng Li, Yilun Shang, Yiting Yang PII: DOI: Reference:

S0020-0255(16)32052-7 10.1016/j.ins.2016.12.027 INS 12662

To appear in:

Information Sciences

Received date: Revised date: Accepted date:

15 December 2015 15 October 2016 14 December 2016

Please cite this article as: Yusheng Li, Yilun Shang, Yiting Yang, Clustering coefficients of large networks, Information Sciences (2016), doi: 10.1016/j.ins.2016.12.027

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

Clustering coefficients of large networks∗

Abstract

CR IP T

Yusheng Li, Yilun Shang†, Yiting Yang School of Mathematical Sciences, Tongji University Shanghai 200092, China

AN US

Let G be a network with n nodes and eigenvalues λ1 ≥ λ2 ≥ · · · ≥ λn . Then G is called an (n, d, λ)-network if it is d-regular √ and λ = max{|λ2 |, |λ3 |, . . . , |λn |}. It is shown that if G is an (n, d, λ)-network and λ = O( d), the average clustering coefficient c(G) of G satisfies c(G) ∼ d/n for large d. We show that this description also holds for strongly regular graphs and Erd˝os-R´enyi graphs. Although most real-world networks are not constructed theoretically, (n−d−1) we find that many of them have c(G) close to d/n and many close to 1 − µ2d(d−1) , where d is the average degree of G and µ2 is the average of the numbers of common neighbors over all non-adjacent pairs of nodes.

Introduction

ED

1

M

Key Words: Clustering coefficient; theoretic graph; real-world network

AC

CE

PT

Complex systems from various fields, such as physics, biology, and sociology, can be systematically analyzed using a network representation. A network (or graph) is composed of vertices (or nodes) and edges, where vertices represent constituents of the system and edges represent relationships between constituents. Mathematically, we define G = (V, E) as a simple graph with vertex set V and edge set E ⊆ V × V . Let N (v) = {u ∈ V : uv ∈ E} be the neighborhood of a vertex v, dv the degree of v, and ev = e(N (v)) the number of edges in the subgraph of G induced by N (v). An important measure of network topology, called the clustering coefficient, assesses the triangular pattern and the connectivity in a vertex’s neighborhood: a vertex has a high clustering coefficient if its neighbors tend to be directly connected to each other. The clustering coefficient cv of a vertex v can be calculated as ( 0, if dv = 0, ev cv = , if dv ≥ 2. (d2v )

∗ Research supported by the National Natural Science Foundation of China under Grants No. 11331003, No. 11101360, No. 11505127, the Shanghai Pujiang Program under Grant No. 15PJ1408300, and Outstanding Young Scholar Foundation of Tongji University under Grants No. 2013KJ031, No. 2014KJ036. † Correspondence: [email protected]

1

ACCEPTED MANUSCRIPT

For dv = 1, it is conventional to define cv ∈ [0, 1] depending on the situation. Thus, 0 ≤ cv ≤ 1. The clustering coefficient cv for dv ≥ 2 is the ratio of the number of triangles to the total number of possible triangles that share vertex v. The average clustering coefficient of a graph G of order n (i.e., G contains n vertices) and minimum degree δ(G) ≥ 2 is defined as 1X 2X ev . cv = n n dv (dv − 1) v∈V

CR IP T

c(G) =

v∈V

ED

M

AN US

The average clustering coefficient measures the clustering (triangulation) within a network by averaging the clustering coefficients of all its nodes. The idea of a clustering coefficient is intended (especially in the analysis of social networks) to measure the degree to which nodes in a graph tend to cluster [31]. It is considered to be an effective measure of the local connectivity or ”cliqueness” of a social network. For example, the clustering coefficient and community structure of a social network might reveal a collaborative relationship between agents [41]. A network with a high average clustering coefficient and a small average distance is often called a ”small-world” network [42]. High clustering is also associated with the robustness of a network, that is, resilience against random damage [2, 24, 28, 38]. The clustering coefficient has also been extended to the settings of weighted networks [35, 43] and directed networks [15], and network models with tunable clustering coefficients [20, 22, 37], degree-related clustering coefficients [29], and granular clustering coefficients [26] have been proposed. Moreover, the work [36] presents a fast sampling-based approximation algorithm for calculating the average clustering coefficient. Some basic properties of the clustering coefficient can be easily observed. For a graph G with minimum degree δ(G) ≥ 2, c(G) = 0 if and only if G is triangle-free. Letδ(N (v)) be the minimum degree of the subgraph of G induced by N (v). The following result gives us a nontrivial lower bound of c(G).

PT

Lemma 1 Let G be a graph with δ(G) ≥ 2. If m = minv∈V δ(N (v)), then c(G) ≥

m , d−1

CE

where d is the average degree of G.

AC

Proof. Let n be the order of G. Note that ev = e(N (v)) ≥ c(G) =

mdv 2

for each vertex v, and thus,

1X 1 X ev 1X m m cv = ≥ , ≥ d v n n n dv − 1 d−1 2 v∈V

v∈V

v∈V

where the last inequality holds as a result of the convexity of the function

1 x−1

for x > 1.

2

The aim of this paper is to investigate the average clustering coefficients of some theoretical and large-scale real networks. The remainder of the paper is organized as follows. In Section 2, we derive the exact expression for the average clustering coefficient of strongly regular graphs of any order. √ In Section 3, we show that the average clustering coefficients of (n, d, λ)-graphs with λ = O( d) follow the asymptotic d/n for large d. The same expression is recovered for an almost regular algebraic graph, Erd˝ os-R´enyi graph, in Section 4. We present some data on networks 2

ACCEPTED MANUSCRIPT

2

CR IP T

taken from real-world applications in Section 5. Interestingly, the average clustering coefficient of a Florentine families graph is shown to be close to the corresponding estimate obtained for strongly regular graphs (Theorem 1). This finding highlights the fact that the asymptotic d/n is not only a characteristic of random graphs and some theoretical graphs but is also observed in various real systems distinct from small-world networks. Section 6 concludes the paper.

Clustering coefficients of strongly regular graphs

AN US

A graph G of order n is said to be a strongly regular graph with parameters n, d, µ1 , µ2 , denoted by srg(n, d, µ1 , µ2 ), if it is d-regular, and any pair of vertices has µ1 common neighbors if they are adjacent, and otherwise, µ2 common neighbors [19, p. 218]. For instance, C5 is an srg(5, 2, 0, 1). Another typical example of strong regular graphs is the Paley graph [19, p. 221], defined as follows. Let q ≡ 1 (mod 4) be a prime power and Fq the finite field of order q. The vertex set of Paley graph Pq is Fq , and distinct vertices x and y are adjacent if and only if x − y is q−5 q−1 non-zero quadratic. Clearly, Paley graph Pq is an srg(q, q−1 2 , 4 , 4 ). For an srg(n, d, µ1 , µ2 ), the number m in Lemma 1 is exactly µ1 . The next theorem says that Lemma 1 is sharp for strongly regular graphs. Theorem 1 For an srg(n, d, µ1 , µ2 ) G with d ≥ 2, µ1 c(G) = . d−1 also

µ1 d−1 ,

dµ1 2 ,

as claimed.

and thus, cv =

M

Proof. For a vertex v of G, ev =

ev

(d2)

=

µ1 d−1 .

Hence, the average of cv is 2

ED

Corollary 1 For a Paley graph Pq ,

c(Pq ) =

PT

q−5 q−1 Proof. Since Pq is an srg(q, q−1 2 , 4 , 4 ),

CE

c(Pq ) =

µ1 = d−1

q−5 . 2(q − 3)

q−5 4 q−1 2 −

1

=

q−5 . 2(q − 3)

2

AC

The eigenvalues of strongly regular graphs can be computed easily, as summarized in the following lemma [19, p. 219]. Lemma 2 For a connected, non-complete, non-empty srg(n, d, µ1 , µ2 ) G with n ≥ 3, λ1 = d is an eigenvalue with multiplicity m1 = 1 and any eigenvalue λ 6= λ1 satisfies λ2 + (µ2 − µ1 )λ + (µ2 − d) = 0.

The equation has two distinct solutions λ2 > λ3 with λ2 > 0 > λ3 , and λ3 is an eigenvalue. If d + (n − 1)λ3 6= 0, then λ2 is also an eigenvalue. Their multiplicities m2 and m3 can be determined by m2 + m3 = n − 1, and d + m2 λ2 + m3 λ3 = 0. 3

ACCEPTED MANUSCRIPT

In the next section, we will consider a more general class of graphs in which the average clustering coefficient can be estimated asymptotically under some conditions regarding their second largest eigenvalues.

Clustering coefficients of (n, d, λ)-graphs

CR IP T

3

Let us label the vertices of a graph G of order n as v1 , v2 , . . . , vn and let A = (aij )n×n be the adjacency matrix of G, for which  1, if vi vj ∈ E, aij = 0, otherwise.

AN US

Lemma 3 Let G be a graph of order n with degree sequence d1 , d2 , . . . , dn and let A be the adjacency matrix of G. If δ(G) ≥ 2 and the diagonal elements of A3 are γ1 , γ2 , . . . , γn in order, then n 1X γi c(G) = . n di (di − 1) i=1

Particularly, if G is d-regular and λ1 , λ2 , . . . , λn are the eigenvalues of A, then n

X 1 c(G) = λ3i . nd(d − 1) i=1

AC

CE

PT

ED

M

Proof. As A is symmetric, it is diagonalizable. Let λ1 , λ2 , . . . , λn be the eigenvalues of A. The eigenvalues of Ak are λk1 , λk2 , . . . , λkn . Note that the (i, j) element of Ak is the number of walks from vertex vi to vertex vj [19, 23], and a closed walk of length 3 is a triangle. Thus, the ith diagonal element of A3 is exactly 2evi , and the claimed equalities follow. 2 We also refer to the eigenvalues of A as the eigenvalues of G. Letλ1 ≥ λ2 ≥ · · · ≥ λn be the eigenvalues of G in a non-increasing order. Random graphs have proved to be one of the most important tools in modern graph theory and network research. Their tremendous success has posed the problems of determining the essential properties of a random graph and whether a given graph behaves similarly to a random graph. A cornerstone contribution of Chung, Graham, and Wilson [12] provided several equivalent properties for measuring the similarity between a given graph and a random graph G(n, p) with fixed p. This similarity is also called the quasirandomness of the given graph. One of those properties is characterized by the magnitude of λ = λ(G), where λ = λ(G) = max{|λi | : 2 ≤ i ≤ n}.

According to Alon [4], a graph G is an (n, d, λ)-graph if G is d-regular with n vertices and λ = λ(G). Note that for a d-regular connected graph, λ1 = d. For sparse graphs with edge density p = o(1), Chung and Graham [11] also presented some equivalent properties for quasirandomness under particular conditions. One of these properties is that λ1 ∼ pn and λ = o(λ1 ). For an (n, d, λ)-graph, the spectral gap between d and λ is a measure of its quasi-random property. The smaller the value of λ is compared to d, the closer the edge distribution is to the ideal uniform distribution (i.e., it becomes a random graph). A natural question now is ”How small can λ be?” 4

ACCEPTED MANUSCRIPT

Lemma 4 Let G be an (n, d, λ)-graph and let  > 0. If d ≤ (1 − )n, then √ λ ≥ d. Proof. Letting A be the adjacency matrix of G, n X

λ2i

i=1

CR IP T

nd = 2e(G) = tr(A2 ) =

≤ d2 + (n − 1)λ2 ≤ (1 − )nd + nλ2 ,

M

AN US

which concludes the proof. 2 √ Based on this estimate, we can say, not precisely, that an (n, d, λ)-graph with λ = O( d) has good quasi-randomness. Generally, this is a weak condition as most random graphs are such graphs [7]. Moreover, Lemma refsrg-eigen suggests that for an srg(n, d, µ1 , µ2 ), when |µ1 − µ2 | √ is small compared to d, λ is close to d and G has good quasi-randomness. √ It is known that for any fixed d ≥ 4, a random d-regular graph G of order n has λ = (2 + o(1)) d − 1, and for any fixed  > 0, the probability  d − 1 d−1 ≤ c¯(G) ≤ (1 + ) →1 Pr (1 − ) n n as n → ∞ [8, 17]. The following theorem provides an analogous estimate of the average clustering √ coefficient for (n, d, λ)-graph with λ = O( d). √ Theorem 2 Let G be an (n, d, λ)-graph. If λ = O( d) as d → ∞, then d . n Proof. Let n be the order of G whose eigenvalues are λ1 ≥ λ2 ≥ · · · ≥ λn . Since G is d-regular, λ1 = d. From Lemma 3, ! n X 1 3 3 c(G) = d + λi . nd(d − 1) i=2 √ The assumption λ = O( d) implies that Pn 3 nλ3 O(d3/2 ) i=2 λi ≤ = → 0. nd(d − 1) nd(d − 1) d2

AC

Thus,

CE

PT

ED

c(G) ∼

c(G) ∼

d2 d ∼ n(d − 1) n

(1)

for large d. 2 √ Remark 1. If G is an srg(n, d, µ1 , µ2 ), we have seen from Lemma 2 that λ = O( d) provides µ1 ≈ µ2 . Note that there is a well-known equality relating the four parameters in a strongly regular graph, namely d(d − µ1 − 1) = µ2 (n − d − 1). Hence, by dividing n on both sides of the 2 equality and using the estimate µ1 ≈ µ2 , we derive that dn ≈ µ2 . Thus, Theorem 1 implies that µ1 c¯(G) = d−1 ≈ µd2 ≈ nd , which agrees with Theorem 2. The condition that d is large in Theorem P 2 is used for estimating ni=2 λ3i in the proof. However, this is not necessary for some graphs, which are described in the next section. 5

ACCEPTED MANUSCRIPT

4

Clustering coefficients of Erd˝ os-R´ enyi graphs

CR IP T

Erd˝ os-R´enyi graph Eq [14] is one of the classical algebraic constructions in extremal combinatorics, defined as follows. Let Fq∗ = Fq \ {0}. Define an equivalence relation ≡ on Fq3 \ {(0, 0, 0)} by letting (a1 , a2 , a3 ) ≡ (b1 , b2 , b3 ) if there is γ ∈ Fq∗ such that (a1 , a2 , a3 ) = γ(b1 , b2 , b3 ). Let ha1 , a2 , a3 i be the equivalence class containing (a1 , a2 , a3 ) and let V be the set of all equivalence classes. Then |V | = q 2 + q + 1. Define a graph Eq on the vertex set V by letting distinct vertices hv1 , v2 , v3 i and hx1 , x2 , x3 i be adjacent if and only if v1 x1 + v2 x2 + v3 x3 = 0.

AN US

For a vertex v = hv1 , v2 , v3 i, since v1 x1 + v2 x2 + v3 x3 = 0 has q 2 − 1 solutions (x1 , x2 , x3 ) forming q + 1 vertices,  q, if v12 + v22 + v32 = 0, dv = q + 1, otherwise. However, Eq is not regular and its spectrum is not known. If we add a loop for a vertex v to Eq when v12 + v22 + v32 = 0, we obtain a (q + 1)-regular graph, which is denoted by Eqo . In the following, we use the spectrum of Eq0 to compute the spectrum of Eq . Let M be the adjacency matrix of Eqo , where a diagonal element is equal to the number of loops incident to the vertex. From the eigenvalues of M 2 [3], we can find the spectrum of M as follows.

M

Lemma 5 The spectrum of Eqo is as follows.

ED

eigenvalue multiplicity

q+1 1

√ q q(q+1) 2

√ − q

q(q+1) 2

PT

Return to the simple graph Eq with maximum degree ∆(Eq ) = q + 1 and minimum degree δ(Eq ) = q. Let A be the adjacency matrix of Eq and λ1 ≥ λ2 ≥ · · · ≥ λn the eigenvalues of A, where n = q 2 + q + 1. We estimate these {λi }ni=1 by using the following result [23, p. 181].

CE

Lemma 6 Let A and B be real symmetric matrices of order n. Let the eigenvalues λi (A), λi (B), and λi (A + B) of A, B, and A + B, respectively, be labeled in non-increasing order. For each 1 ≤ i ≤ n, λi (A) + λ1 (B) ≥ λi (A + B) ≥ λi (A) + λn (B).

AC

Corollary 2 Let G be a simple graph of order n with ∆ = ∆(G) and δ = δ(G). Let G0 be a graph obtained from G by attaching each vertex v with ∆ − dv loops. Suppose that G and G0 have eigenvalues λ1 ≥ λ2 ≥ · · · ≥ λn and λ01 ≥ λ02 ≥ · · · ≥ λ0n , respectively. For each 1 ≤ i ≤ n, λi + ∆ − δ ≥ λ0i ≥ λi .

Proof. It is easy to see that A(G0 ) = A(G) + D, where D is a diagonal matrix whose diagonal elements are ∆ − dv for each vertex v. Since λ1 (D) = ∆ − δ and λn (D) = 0, the assertion follows from Lemma 6. 2

6

ACCEPTED MANUSCRIPT

Lemma 7 Let λ1 ≥ λ2 ≥ · · · ≥ λn be eigenvalues of Eq , where n = q 2 + q + 1. Then these eigenvalues can be bounded as follows. eigenvalue bounds

λ1 q ≤ λ1 ≤ q + 1

λi ; λi > 0, i ≥ 2 √ √ q ≤ λi ≤ q + 1

λi ; λi < 0 √ √ − q ≤ λi ≤ − q + 1

CR IP T

Although Eq is not regular, c(Eq ) is still close to d/n, where n = q 2 + q + 1 and d is the average degree of Eq . Theorem 3 Let q be a prime power and Eq the Erd˝ os-R´enyi graph. Eq has n = q 2 + q + 1 vertices and its average degree d satisfies q < d < q + 1 and c(Eq ) ∼ nd ∼ 1q for large q.

X i

λ3i ≤ (q + 1)3 +

AN US

Proof. It is easy to see that except for the largest λ1 , both the number of other positive P eigenvalues and the number of negative eigenvalues are q(q + 1)/2. Thus, we can estimate i λ3i from the above as q(q + 1) √ q(q + 1) √ ( q + 1) − ( q − 1) = (q + 1)3 + q(q + 1), 2 2

and hence, by Lemma 3, c(Eq ) ≤

(q + 1)3 + q(q + 1) 1 ∼ , 2 (q + q + 1)q(q − 1) q

2

Clustering coefficients of some real networks

ED

5

M

and the desired lower bound can be obtained similarly.

AC

CE

PT

In this section, we present some applications of the above theoretical results to realistic networks. If a realistic network G is shown to be close (roughly speaking) to a strongly regular graph srg(n, d, µ1 , µ2 ), say, by sampling a portion of its nodes, then Theorem 1 can be an estimate of its average clustering coefficient c¯(G). Since many realistic large-scale networks (such as social networks) are highly clustered, the calculation of parameter µ1 can create an enormous burden on the computation facility in practice. Alternatively, we can first calculate the parameter µ2 by using a heuristic algorithm, and then, invoke the convenient relation d(d−µ1 −1) = µ2 (n−d−1), which in turn yields an estimate of µ1 . In other words, c¯(G) =

µ1 µ2 (n − d − 1) =1− . d−1 d(d − 1)

The advantage of evaluating µ2 over µ1 is shown in Appendix A. Let d be the average degree of a network and µ2 the average of the numbers of common neighbors of non-adjacent pairs of nodes. We estimate the average clustering coefficients of some networks as follows: µ2 (n − d − 1) c¯(G) ≈ 1 − . (2) d(d − 1)

Algorithm 1 is a heuristic algorithm for calculating µ2 . 7

ACCEPTED MANUSCRIPT

AN US

CR IP T

Algorithm 1 Input: G Output: θ = µ2 01 let a = b = 0 02 let Q1 and Q2 be two empty queues 03 for v in G\Q1 04 add v into Q1 and Q2 05 add each vertex u that is adjacent to v into Q2 06 b = b + |G\Q2 |, where | · | represents the length 07 for w in G\Q2 08 let c be the number of vertices in Q2 that are adjacent to w 09 a=a+c 10 clear Q2 11 θ = a/b

Algorithm 1. A heuristic algorithm that calculates the average value µ2 for a network G.

by (2), 1 −

µ2 (n−d−1) d(d−1)

= 1−

3.650 4.446

M

We worked out a concrete example by considering the social network G of some Florentine families [9]. This network describes the marriage relations among n = 15 families in fifteenthcentury Florence with data collected by John Padgett from historical documents (see Fig. 1 for an illustration). A direct calculation yields d = 2.667 as the average degree of G and c¯(G) = 0.16 as the average clustering coefficient. Algorithm 1 yields µ2 = 0.322. Interestingly, = 0.179 provides a reasonably good approximation of the

PT

ED

average clustering coefficient c¯(G). In addition, we studied two other real networks, i.e., the US top-500 airports network [13], in which two airports are connected if a flight was scheduled between them in 2002, and the autonomous system graph of the Internet [36]. The results are summarized in Table 1, which illustrates the availability of the proposed method. Note that the sampling-based algorithm in [36] yields an estimation c(G) = 0.311 for the AS graph, which seems slightly worse in comparison to the present method.

AC

CE

Network G Florentine families graph US airport network AS graph

n 15 500 13164

d 2.667 11.920 4.332

µ2 0.322 0.185 0.0005

c¯(G) 0.160 0.351 0.458

c¯(G) via eq.(2) 0.179 0.308 0.544

Table 1. Statistics for three social and information networks.

Next, we collected statistics for some real-world complex networks in Table 2. In this table, n, m, d, c¯(G), and R are the number of vertices, the number of edges, the average degree, the average clustering coefficient, and the relative error, defined as R =

8

|¯ c(G)−d/n| , c¯(G)

respectively.

AN US

CR IP T

ACCEPTED MANUSCRIPT

Fig. 1. The Florentine families graph G with n = 15 vertices, where the family names are shown alongside the vertices m 13453 1532 454 6300 16523 2327 2978 1018

M

n 3234 1168 55 240 1148 97 109 54

PT

ED

Network G p2p-Gnutella German highway cat brain constraint programming passenger air traffic protein interaction I protein interaction II protein interaction III

c¯(G) 0.007 0.0013 0.55 0.225 0.027 0.49 0.5 0.71

d/n 0.003 0.0019 0.3 0.219 0.025 0.495 0.501 0.698

d 9.7 2.6 16.5 52.5 28.8 47.5 54.6 37.7

R 0.57 0.46 0.45 0.027 0.074 0.01 0.002 0.017

Table 2. Statistics for some real networks

AC

CE

In Table 2, p2p-Gnutella is a snapshot of the Gnutella peer-to-peer file sharing network on August 5, 2002 [27, 34], where nodes represent hosts in the Gnutella network and edges represent connections between the hosts. German highway was compiled from data of the Autobahnwegweisungs-Informations-System in July 2002, taking cities as nodes and highways as edges [25]. cat brain is a biological network depicting long-range cortical connectivity in cat brains [21]. The dataset of constraint programming comes from monitoring of constraint-oriented programs on activity graphs [18]. passenger air traffic [5] was obtained by tracking the aerial routes among airports handling over two million passengers in 2000. protein interaction I, II, and III are three quasi-cliques for ribosome biogenesis in protein-protein interaction networks of budding yeast [10], where proteins are nodes and interactions form edges. We observe that these real networks have average clustering coefficients c¯(G) relatively close to d/n. Although these real-world networks were not constructed theoretically, they obey the same asymptotic average clustering coefficients as those obtained in Sections 3 and 4. A clear 9

ACCEPTED MANUSCRIPT

M

AN US

CR IP T

trend can be discerned from Table 2. Networks with large average degrees d tend to have smaller R values, i.e., the approximation of d/n becomes better for larger d values (compared to n). This phenomenon is consistent with our obtained theoretical results, although they are √ obtained under more restrictive assumptions (i.e., the (n, d, λ)-graph condition with λ = O( d)). This phenomenon might be explained by the fact that these considered networks are, in general, similar to edge-independent random graphs with large maximum degrees. Let G be a random graph with n vertices, where uv is an edge with probability P puv and each edge is independent √  of each other edge. It has been shown [30, 39] that if maxu { v puv }  ln4 n, then λ = O d almost surely. The classical Erd˝ os-R´enyi random graph is a special case of such graphs. The passenger air traffic graph, for example, has been shown to have asymptotically independent edges and a handful of hub nodes [5]. Hence, the result c(passenger air traffic) ≈ d/n presumably follows from Theorem 2. Finally, there are many real networks whose average clustering coefficients c¯(G) are far from d/n as compared to those given in Table 2. In particular, networks with small-world properties usually have high clustering coefficients but low values of d/n. In Table 3, we have collected some real network data in which the values of R, namely the relative errors, are typically much higher than those in Table 2. Most of these networks have been proved to be small-world networks, and the edge structures therein are far from random. In movie actors, for example, actors’ choices of collaborating with one or another are highly correlated, as the total potential number of collaborators is usually fixed. In contrast to Erd˝os-R´enyi-like graphs, which by nature have c(G) ≈ d/n, such small-world networks are prevalent in social, informational, and biological systems. Substantial effort on the study of clustering coefficients has been devoted to such small-world networks [31]. n 153127 225226 56627 209293 282 315 282

m 2695801 6869393 4898236 1203435 1036 4475 1974

c¯(G) 0.1078 0.79 0.726 0.76 0.32 0.59 0.28

d/n 0.00023 0.00027 0.003 0.000005 0.026 0.09 0.05

d 35.21 61 173 11.5 7.35 28.3 14

R 0.998 0.999 0.996 0.999 0.919 0.847 0.821

CE

PT

ED

Network G WWW [1] movie actors [42] SPIRES co-authorship [32] Neurosci. co-authorship [6] E. coli substrate [16] E. coli reaction [16] C. Elegans [42]

Table 3. Statistics for some real small-world networks, where the value R is typically very high.

Conclusion

AC

6

In this paper, we showed√analytically that the average clustering coefficient c(G) of an (n, d, λ)network G with λ = O( d) satisfies c(G) ∼ d/n for large d. This asymptotic expression also holds for strongly regular graphs and Erd˝ os-R´enyi graphs. In addition to the above theoretical graphs, we presented numerical results based on real network data. Our key finding-that a range of networks possesses the asymptotic average clustering coefficient d/n, where d is the empirical average degree of the network in question, in contrast to small-world networks-suggests a new category of Erd˝ os-R´enyi-like networks, which we hope might shed some light on the network clustering phenomenon and stimulate further research efforts on related topics. 10

ACCEPTED MANUSCRIPT

Appendix A

AN US

CR IP T

P For a graph G = (V, E), let M1 = uv∈E |{w ∈ V : w ∈ N (u) P ∩ N (v)}| be the total number of common neighbors of adjacent vertices. Similarly, let M2 = uv6∈E |{w ∈ V : w ∈ N (u)∩N (v)}| be the total number of common neighbors of non-adjacent vertices. Apparently, the complexity of computing µi is proportional to Mi (i = 1, 2). In Fig. 2, we show an illustrating example for a moderately clustered network, where M1 = 2M2 , indicating that using (2) can halve the computational burden.

Fig. 2. Left: network with five vertices. Right: associated table consisting of the numbers of common neighbors of each pair of vertices, where blue numbers indicate adjacent pairs of vertices and red numbers indicate non-adjacent pairs. Hence, M1 = 12 and M2 = 6.

ED

M

We show in Fig. 3 the ratio M2 /M1 for general random networks created using the clustered random graph model following [33]. This model is basically the configuration random graph decorated by two sequences, i.e., ti , the number of triangles in which the ith vertex participates, and si , the number of single edges attached to the ith vertex, other than those belonging to the triangles. The figure shows that 0 < M2 /M1 < 0.8 for networks of all orders considered, suggesting that our use of (2) can effectively reduce the computational burden for clustered networks.

PT

1

AC

M2/M1

CE

0.8

0.6

0.4

0.2

0 0

100

200

n

300

400

500

Fig. 3. Ratio M2 /M1 versus n, the number of vertices, for clustered random graphs with ti , si ∈ {1, 2, · · · , 5} uniformly at random. Each data point is based upon averaging over a sample of 30 independent realizations.

11

ACCEPTED MANUSCRIPT

Acknowledgement

CR IP T

The authors are very thankful to the editor and anonymous reviewers for their valuable comments and constructive suggestions that greatly help them improve the presentation of the manuscript.

References

[1] L. A. Adamic and B. A. Huberman, Growth dynamics of the World Wide Web, Nature, 401 (1999), 131.

AN US

[2] S. Agreste, S. Catanese, P. D. Meo, E. Ferrara, G. Fiumara, Network structure and resilience of Mafia syndicates, Inf. Sci., 351 (2016), 30–47. [3] M. Aigner and G. Ziegler, Proofs from THE BOOK (3rd ed.), Springer, 2004. [4] N. Alon and J. Spencer, The Probabilistic Method, 3rd ed., Wiley-Interscience, New York, 2008.

M

[5] M. Amiel, G. M´elan¸con, and C. Rozenblat. R´eseaux multi-niveaux : l’exemple des ´echanges a´eriens mondiaux de passagers. M@ppemonde, 79 (2005), p.05302. [6] A.-L. Barabasi, H. Jeonga, Z. Neda, E. Ravasz, A. Schubert, and T. Vicsek, Evolution of the social network of scientific collaborations, Physica A, 311 (2002), 590–614.

ED

[7] B. Bollob´ as, Random Graphs (2nd ed.), Cambridge University Press, London-New York, 2001.

PT

[8] B. Bollob´ as and O. Riordan, Mathematical results on scale-free random graphs, in Handbook of Graphs and Networks: From the Genome to the Internet (eds. S. Bornholdt and H. G. Schuster), pp. 1–34, Wiley-VCH, Berlin, 2002.

CE

[9] R. L. Breiger and P. E. Pattison, Cumulated social roles: the duality of persons and their algebras, Social Networks, 8 (1986), 215–256.

AC

[10] D. Bu, Y. Zhao, L. Cai, H. Xue, X. Zhu, H. Lu, J. Zhang, S. Sun, L. Ling, N. Zhang, G. Li, and R. Chen, Topological structure analysis of the protein-protein interaction network in budding yeast, Nucleic Acids Research, 31 (2003), 2443–2450.

[11] F. R. Chung and R. Graham, Sparse quasi-random graphs, Combinatoria, 22 (2002), 217– 244. [12] F. R. Chung, R. Graham, and R. Wilson, Quasi-random graphs, Combinatoria, 9 (1989), 345–362. [13] V. Colizza, R. Pastor-Satorras, and A. Vespignani, Reaction-diffusion processes and metapopulation models in heterogeneous networks, Nature Physics, 3 (2007), 276–282. 12

ACCEPTED MANUSCRIPT

[14] P. Erd˝os, and A. R´eyi, On a problem in theory of graphs, Publ. Math. Inst. Hungar. acad. Sci., 7A (1962), 623–641. [15] G. Fagiolo, Clustering in complex directed networks, Phys. Rev. E, 76 (2007), p.026107.

CR IP T

[16] D. A. Fell and A. Wagner, The small world of metabolism, Nat. Biotechnol, 18 (2000), 1121–1122. [17] J. Friedman, A proof of Alon’s second eigenvalue conjecture and related problems, Mem. Amer. Math. Soc., 195 (2008), article #910. [18] M. Ghoniem, Outils de visualisation et d’aide ´a la mise au point de programmes avec contraintes. Phd, Universit´e de Nantes, 2005.

AN US

[19] C. Godsil and G. Royle, Algebraic Graph Theory, Springer, 2001.

[20] L. S. Heath, N. Parikh, Generating random graphs with tunable clustering coefficients, Physica A, 390 (2011), 4577–4587. [21] C. Hilgetag, G. A. P. C. Burns, M. A. O’Neill, J. W. Scannell, and M. P. Young, Anatomical connectivity defines the organization of clusters of cortical areas in the macaque monkey and the cat, Philos. Trans. R. Soc. London, Ser. B, 355 (2000), 91–110.

M

[22] P. Holme and B. J. Kim, Growing scale-free networks with tunable clustering, Phys. Rev. E, 65 (2002), p.026107.

ED

[23] R. Horn and C. Johnson, Matrix Analysis, Cambridge University Press, London, 1986. [24] S. Iyer, T. Killingback, B. Sundaram, and Z. Wang, Attack robustness and centrality of complex networks, PLoS One, 8 (2013), article e59613.

PT

[25] M. Kaiser and C. C. Hilgetag, Spatial growth of real-world networks, Phys. Rev. E, 69 (2004), p.036103.

CE

[26] S. Kundu, S. K. Pal, FGSN: fuzzy granular social networks - model and applications, Inf. Sci., 314 (2015), 100–117.

AC

[27] J. Leskovec, J. Kleinberg, and C. Faloutsos, Graph evolution: densification and shrinking diameters, ACM Transactions on Knowledge Discovery from Data, 1 (2007), article 2. [28] M. Li and C. O’Riordan, The effect of clustering coefficient and node degree on the robustness of cooperation, 2013 IEEE Congress on Evolutionary Computation, Cancun, 2013, 2833–2839.

[29] Y. Liu, C. Zhao, X. Wang, Q. Huang, X. Zhang, D. Yi, The degree-related clustering coefficient and its application to link prediction, Physica A, 454 (2016), 24–33. [30] L. Lu and X. Peng, Spectra of edge-independent random graphs, Electron. J. Combin., 20 (2013), article #P27. 13

ACCEPTED MANUSCRIPT

[31] M. E. J. Newman, Networks: An Introduction, Oxford University Press, New York, 2010. [32] M. E. J. Newman, Scientific collaboration networks. I. Network construction and fundamental results, Phys. Rev. E, 64 (2001), p.016131.

CR IP T

[33] M. E. J. Newman, Random graphs with clustering, Phys. Rev. E, 103 (2009), p.058701. [34] M. Ripeanu, I. Foster, and A. Iamnitchi, Mapping the Gnutella network: properties of largescale peer-to-peer systems and implications for system design, IEEE Internet Computing Journal, 6 (2002), 50–57. [35] J. Saram¨ aki, M. Kivel¨ a, J.-P. Onnela, K. Kaski, and J. Kert´esz, Generalizations of the clustering coefficient to weighted complex networks, Phys. Rev. E, 75 (2007), p.027105.

AN US

[36] T. Schank and D. Wagner, Approximating clustering coefficient and transitivity, J. Graph Algor. Appl., 9 (2005), 265–275. [37] Y. Shang, Distinct clusterings and characteristic path lengths in dynamic small-world networks with identical limit degree distribution, J. Stat. Phys., 149 (2012), 505–518. [38] Y. Shang, Unveiling robustness and heterogeneity through percolation triggered by randomlink breakdown, Phys. Rev. E, 90 (2014) p.032820.

M

[39] Y. Shang, Bounding extremal degrees of edge-independent random graphs using relative entropy, Entropy, 18 (2016), article #53.

ED

[40] T. Szab´ o, On the spectrum of projective norm-graphs, Inform. Process. Lett., 86 (2003), 71–74.

PT

[41] S. Wang, L. Huang, C.-H. Hsu, F. Yang, Collaboration reputation for trustworthy Web service selection in social networks, J. Comput. Syst. Sci., 82 (2016), 130–143. [42] D. J. Watts and S. Strogatz, Collective dynamics of “small-world” networks, Nature, 393 (1998), 440–442.

AC

CE

[43] B. Zhang and S. Horvath, A general framework for weighted gene co-expression network analysis, Stat. Appl. Genet. Mol. Biol., 4 (2005), p.17.

14