Inference for the degree distributions of preferential attachment networks with zero-degree nodes


Journal of Econometrics xxx (xxxx) xxx

Contents lists available at ScienceDirect

Journal of Econometrics journal homepage: www.elsevier.com/locate/jeconom

Inference for the degree distributions of preferential attachment networks with zero-degree nodes✩ ∗

N.H. Chan a,b,∗, Simon K.C. Cheung c, Samuel P.S. Wong b,c

a School of Statistics, Southwestern University of Finance and Economics, Sichuan, China
b Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
c Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam Road, Hong Kong SAR, China

Article info

Article history: Available online xxxx
JEL classification: C50, C51, C55, C59
Keywords: Preferential attachment with zero-degree nodes; Power-tail of degree distribution; Sylvester matrix equation; Martingale convergence theorem

Abstract

A degree distribution whose logarithm decays linearly in the logarithm of the degree in the tail is said to follow the power law, a pattern that is ubiquitous in daily life. A commonly used technique for modeling the power law is preferential attachment (PA), which sequentially joins each new node to the existing nodes according to a conditional probability law proportional to a linear function of their degrees. Although effective, PA is tricky to apply to real networks because the number of nodes and the number of edges have to satisfy a linear constraint. This paper enables real applications of PA by adding each new node as an isolated node that attaches to other nodes according to the PA scheme in some later epochs. This simple and novel strategy provides an additional degree of freedom that relaxes the aforementioned constraint on the observed data, and it uses the PA scheme to compute the implied proportion of the unobserved zero-degree nodes. By using martingale convergence theory, the degree distribution of the proposed model is shown to follow the power law, and its asymptotic variance is proved to be the solution of a Sylvester matrix equation, a class of equations frequently found in control theory (see Hansen and Sargent (2008, 2014)). These results give a strongly consistent estimator for the power-law parameter and its asymptotic normality. Note that this statistical inference procedure is non-iterative and is particularly applicable to big networks such as the World Wide Web presented in Section 6. Moreover, the proposed model offers a theoretically coherent framework that can be used to study other network features, such as clustering and connectedness, as given in Cheung (2016). © 2020 Elsevier B.V. All rights reserved.

✩ We acknowledge the support of HKSAR-RGC-GRF, China Nos: 14300514 and 14325216 as well as HKSAR-RGC-TRF, China No. T32-101/15-R. We are deeply honored and humbled to be invited to contribute a paper to this special issue of Journal of Econometrics in honor of the 85th Birthday of Professor George Tiao. Although this paper is about network modeling, it is motivated by many of his ideas in related areas, in particular his array of papers on modeling ozone data and air pollutants; see, for example, Bojkov et al. (1990) and the references therein. This paper serves as an example of Professor Tiao's aspiration and commitment to anchoring statistics in real applications (see Chan (1999)).
∗ Corresponding author at: Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China. E-mail addresses: [email protected] (N.H. Chan), [email protected] (S.K.C. Cheung), [email protected] (S.P.S. Wong).
https://doi.org/10.1016/j.jeconom.2020.01.015 0304-4076/© 2020 Elsevier B.V. All rights reserved.

Please cite this article as: N.H. Chan, S.K.C. Cheung and S.P.S. Wong, Inference for the degree distributions of preferential attachment networks with zero-degree nodes. Journal of Econometrics (2020), https://doi.org/10.1016/j.jeconom.2020.01.015.

1. Introduction

Information networks and social networks play important roles in daily life. To enhance the applications of these networks, studying their statistical features becomes a critical task. Among many interesting network features, such as node clustering and connectedness of the networks, modeling the popularity of the nodes deserves special

attention. The feature of popularity is particularly remarkable for a straightforward reason: a popular node corresponds to a traffic bottleneck in an information network and to an influential individual or organization in a social network. While controlling the traffic of an information network naturally requires statistical models for node popularity, the methodology introduced in this paper also offers a way to estimate the scarcity of key opinion leaders (abbreviated as KOL in the public media) in a social network.

The absolute popularity of any node can be easily quantified by its degree, which is the number of edges connecting that node with any other node in the network. To measure the popularity of each node relative to other nodes in the same network, we define the degree distribution of a network, denoted by $\{p_k(t)\}$, such that each $p_k(t)$ is the proportion of nodes having degree $k$ in a network of $t$ nodes, where both $t$ and $k$ are non-negative integers. By making use of exploratory methods, several studies show that the degree distribution of many large networks (including the World Wide Web) exhibits power-law behavior in the sense that $p_k(t) \propto k^{-\gamma}$ with $\gamma > 1$ for sufficiently large $k$. In particular, Hofstad (2016, Section 1.7.3) illustrates this empirical stylized behavior using plots of $\log p_k(t)$ versus $\log k$ for various large networks and shows that the right tail is frequently found to be a straight line whose slope gives a preliminary estimate of $\gamma$. Hofstad (2016) also suggests estimating $\gamma$ by Hill's estimator as presented in Resnick (2007). The power-law property is also called scale-free because the same exponent $\gamma$ holds for all large $t$. The aforementioned methods of estimating $\gamma$ are known as exploratory because they focus only on the heavy tail of $p_k(t)$ without using the context of networks.
To model the power-law behavior under the framework of random networks, a commonly used modeling technique is preferential attachment (PA), first introduced by Barabási and Albert (1999); see also Barabási et al. (2000). The idea of PA is to build up networks sequentially by adding nodes and edges such that each new node connects to one of the existing nodes according to an assignment law whose probabilities are proportional to their degrees. Barabási and Albert (1999) show that their PA random network model produces a power law with $\gamma = 3$. Moreover, PA usually starts with a single node, and after running $t$ PA attachment steps, where $t \in \mathbb{N}$, the collection of nodes, $V_t$, and that of edges, $E_t$, have to be related by

$$\#(V_t) = 1 + \#(E_t), \tag{1}$$

where $\#(A)$ denotes the cardinality of the set $A$. These conditions hamper the practical application of PA because numerous empirical studies show that $\gamma$ is different from 3 and that identity (1) can hardly be satisfied in real networks. In reality, the number of edges is usually significantly greater than the number of nodes. For example, in 2011, Facebook had 721 million active users and 69 billion friendship links. Also, in the Google Web Graph data analyzed in Section 6 of this paper, the number of nodes is roughly $9 \times 10^5$ while the number of edges is approximately $5 \times 10^6$.

To overcome the first shortcoming, another PA scheme, discussed in Móri (2005) and Hofstad (2016), connects each new node to the existing nodes with degree $k$ according to an assignment probability proportional to $k + \theta$, which gives $\gamma = 3 + \theta$. Based upon the flexibility offered by this linear PA scheme, Gao and Vaart (2017) further tackle the issue of (1) by adopting the model of random initial degrees first proposed by Deijfen et al. (2009). Such a model modifies the PA scheme to add a random number of edges, referred to as the random initial degree, rather than just one edge each time. In particular, Gao and Vaart (2017) consider the case in which the random initial degrees are i.i.d. with a common mean $\mu$, and these edges are attached to the existing nodes following a linear PA scheme with parameter $\theta$. They then derive the corresponding efficient maximum likelihood estimator (MLE) for $\theta$ from a complete evolution record of every incremental network change over time. In practice, it is infeasible to observe the complete history of a large network, and only a snapshot of the data is available. Gao and Vaart (2017) circumvent this limitation by developing a quasi-MLE (QMLE) under the assumption that the common distribution of the initial degrees $m_t$ is known. Unlike their MLE, such a QMLE is no longer efficient.
Note that both their MLE and QMLE require Newton–Raphson iterations, which are computationally costly for big network analysis.

This paper resolves the difficulty arising from the linear constraint (1) from a different perspective. Instead of adding a random number of edges at each step, we propose adding a zero-degree node at each step together with an edge that joins existing nodes according to a linear PA conditional probability law. Note that the zero-degree nodes created at each step have a non-zero probability of being attached to other nodes in subsequent steps. Also, the resulting PA network has a strictly positive proportion of zero-degree nodes, denoted by $p_0(t)$, which in general is not observed in empirical data. Nevertheless, it can be computed via the corresponding equation that relates $\#(E_t)$ and $\#(V_t)$, similar to (1), as highlighted in Remarks 3 and 9 of this paper. In particular, since $V_t$ is the disjoint union of the collection of unobserved zero-degree nodes and that of observed positive-degree nodes, the issue of (1) is resolved by the proposed scheme, which allows the number of edges to be greater than the number of observed nodes.

As only a network snapshot can be used to conduct inference, the statistical estimation methodology of the proposed model is developed from the asymptotic analysis of $p_k(t)$ presented in Section 3. On top of its power-law behavior, $p_k(t)$ is shown to be asymptotically Gaussian. In particular, the asymptotic covariance matrix of $p_k(t)$ is the solution of a Sylvester matrix equation (see Sylvester (1884), Gantmacher (1960) and Duflo (1997)). The class of Sylvester matrix equations plays a pivotal role in the econometric research of Hansen and Sargent (2008, 2014) and has numerous applications in control theory, signal processing, filtering, model reduction and image restoration (see Aliev and Larin (1998), Calvetti and Reichel (1996) and Trentelman et al. (2001)).
While many algorithms, such as the classical Bartels-Stewart algorithm (see Bartels and Stewart (1972)), could in theory be employed to solve this class of equations, the computational burden is formidably heavy when the network is large. To lessen such a burden, we present a computationally efficient algorithm to solve the Sylvester equation for the asymptotic covariance matrix of $p_k(t)$ in Section 3. The computational cost consideration also motivates us to derive an explicit non-iterative estimator of the power-law parameter in Section 4.

The rest of this paper is organized as follows. Section 2 defines the proposed model and presents its power-law property. Section 3 uses martingale theory to study the asymptotic behavior of $p_k(t)$ and presents a fast algorithm to compute the corresponding asymptotic covariance matrix. Section 4 develops a strongly consistent estimation procedure and its efficient computational implementation based on Section 3. Section 5 illustrates the estimators by simulations. Section 6 applies the proposed methodology to analyze a real network: the Google Web Graph. Section 7 summarizes some of our related work and concludes this paper. The proofs of the corollaries and the theorems are presented in the Appendix.

2. Preferential attachment network model

Let $G = (V, E)$ be a directed network whose set of nodes and set of edges are denoted by $V$ and $E$, respectively. Each edge in $E$ is of the form $(a, b)$, where $a \in V$ is the head node and $b \in V$ is the tail node. For each $b \in V$, $d^-(b)$ denotes the indegree of $b$ and is defined as the total number of edges with $b$ being the tail, i.e., $d^-(b) = \sum_{a \in V} I_{\{(a,b) \in E\}}$, where $I_{\{U\}}$ is the indicator function of the event $U$. Similarly, for each $a \in V$, $d^+(a) = \sum_{b \in V} I_{\{(a,b) \in E\}}$ is the total number of edges with $a$ being the head and is usually referred to as the outdegree of $a$.
Our proposed PA directed network model refers to a sequence of directed networks $\{G_t = (V_t, E_t) : t \in \mathbb{N}\}$ such that $V_1 = \{1\}$, $E_1 = \{(1, 1)\}$, $V_{t+1} = V_t \cup \{t+1\}$ and $E_{t+1} = E_t \cup \{(A_{t+1}, B_{t+1})\}$, where $A_{t+1}$ and $B_{t+1}$ are the random head and tail independently chosen according to the following conditional probability law:

$$\mathrm{pr}\{A_{t+1} = a \mid G_t\} = \frac{d_t^+(a) + \sigma}{(1+\sigma)\,t}, \qquad \mathrm{pr}\{B_{t+1} = b \mid G_t\} = \frac{d_t^-(b) + \delta}{(1+\delta)\,t}, \tag{2}$$

where both $a$ and $b \in V_t$, and $d_t^+(a)$ and $d_t^-(b)$ are their outdegree and indegree, respectively. Since $G_t$ is constructed sequentially by adding one node and one directed edge each time, $V_t = \{1, \ldots, t\}$ and the cardinality of $E_t$ is $t$. Thus, $\sum_{v=1}^{t} d_t^+(v) = \sum_{v=1}^{t} d_t^-(v) = t$ and the above assignment probabilities are well defined for any positive $\delta$ and $\sigma$. To study the degree distribution, let $M_k(t)$ and $N_k(t)$ be the number of nodes in $V_t$ with indegree and outdegree being $k$, respectively, for each $k = 0, 1, \ldots, t$. That is, $M_k(t) = \sum_{v=1}^{t} I_{\{d_t^-(v) = k\}}$ and $N_k(t) = \sum_{v=1}^{t} I_{\{d_t^+(v) = k\}}$. Theorem 1 shows that the proposed PA model exhibits the scale-free power law.

Theorem 1. The outdegree distribution $N_k(t)/t = p_k(t) \to c_k$ almost surely and in $L^2$ as $t \to \infty$, where

$$c_k = (\sigma + 1) \prod_{j=0}^{k-1} (\sigma + j) \left[ \prod_{j=1}^{k+1} (2\sigma + j) \right]^{-1} \quad (k = 0, 1, \ldots) \tag{3}$$

with the convention $\prod_{j=0}^{-1} (\sigma + j) = 1$. A similar convergence result holds for $M_k(t)$ with $\sigma$ being replaced by $\delta$.
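As a numerical illustration of (3), the following minimal sketch evaluates $c_k$ on the log scale and checks two properties: the $c_k$ sum to one, and the local log-log slope in the tail is close to $-(2 + \sigma)$. The value $\sigma = 0.4$, the function name `c_closed_form` and the use of log-gamma functions to evaluate the two products are illustrative assumptions, not part of the paper.

```python
from math import lgamma, log, exp

def c_closed_form(k, sigma):
    """Limit c_k of (3), evaluated on the log scale to avoid overflow.

    Uses prod_{j=0}^{k-1} (sigma + j)   = Gamma(sigma + k) / Gamma(sigma) and
         prod_{j=1}^{k+1} (2 sigma + j) = Gamma(2 sigma + k + 2) / Gamma(2 sigma + 1).
    """
    log_c = (log(1.0 + sigma)
             + lgamma(sigma + k) - lgamma(sigma)
             - lgamma(2.0 * sigma + k + 2.0) + lgamma(2.0 * sigma + 1.0))
    return exp(log_c)

sigma = 0.4  # hypothetical parameter value chosen for illustration

# c_0 reduces to (1 + sigma) / (1 + 2 sigma).
c0 = c_closed_form(0, sigma)

# Partial sum of the limiting distribution; the neglected tail is tiny.
total = sum(c_closed_form(k, sigma) for k in range(20001))

# Local slope of log c_k against log k between k = 10000 and k = 20000.
slope = log(c_closed_form(20000, sigma) / c_closed_form(10000, sigma)) / log(2.0)
```

Working with `lgamma` rather than the raw products keeps the evaluation stable for large $k$, where both products in (3) would overflow in floating point.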

In the next section, we first express (2) as a sequence of martingale differences and explain how to prove Theorem 1 by first proving Lemma 1.

Remark 1. Applying Stirling's formula to $c_k$ gives $c_k \sim k^{-[k+1+2\sigma-(k-1+\sigma)]} = k^{-(2+\sigma)}$ for large $k$, i.e., the power-law parameter of our proposed PA model is $\gamma = 2 + \sigma$.

Remark 2. The conditional probability law (2) is a linear PA rule that is almost identical to that in Hofstad (2016, Chapter 8). The main difference is that Hofstad (2016) does not consider zero-degree nodes. Theorem 1 can also be proved by using arguments similar to those presented therein. Note, however, that Theorem 1 is proved in the Appendix via a spectral decomposition that is crucial in deriving the asymptotic normality of $p_k(t)$ in Section 3 and the subsequent estimation theory. For related results, see also Resnick and Samorodnitsky (2016) and Wan et al. (2017).

3. Asymptotic degree distribution

To analyze the asymptotic behavior of the degree distribution, we first represent the PA scheme (2) via a sequence of martingale differences. Because the results for the indegree and outdegree distributions are analogous, only $N_k(t)/t$ is presented herein. Besides, since

$$\sum_{k=1}^{t} k N_k(t) = t \quad \text{and} \quad \sum_{k=0}^{t} N_k(t) = t, \tag{4}$$

the joint distribution of $(N_0(t), \ldots, N_t(t))$ has to be degenerate in $\mathbb{R}^{t+1}$. Thus, only the truncated random vector $N_t^{T} = (N_0(t), \ldots, N_m(t)) \in \mathbb{R}^{m+1}$ for some fixed $m \le t - 1$ is considered.
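The two identities in (4) can be checked on a simulated network. The sketch below tracks only the outdegrees under the law (2). As an implementation assumption that is not taken from the paper, the head $A_{t+1}$ is drawn from a two-component mixture: with probability $1/(1+\sigma)$ it is the head of a uniformly chosen existing edge, otherwise it is a uniformly chosen existing node. This reproduces the probability $(d_t^+(a) + \sigma)/[(1+\sigma)t]$ because there are exactly $t$ edges and $t$ nodes at step $t$. The function name, seed and parameter values are hypothetical.

```python
import random
from collections import Counter

def simulate_outdegrees(T, sigma, seed=0):
    """Outdegrees of nodes 1..T (stored 0-based) after running the PA scheme (2)."""
    rng = random.Random(seed)
    outdeg = [1]   # G_1: node 1 carries the self-loop (1, 1)
    heads = [0]    # head node of every edge added so far
    for t in range(1, T):
        # At this point there are t nodes and t edges.
        if rng.random() < 1.0 / (1.0 + sigma):
            a = heads[rng.randrange(t)]   # proportional to outdegree
        else:
            a = rng.randrange(t)          # uniform over existing nodes
        outdeg.append(0)                  # new node t + 1 enters with degree zero
        outdeg[a] += 1                    # new edge has head a
        heads.append(a)
    return outdeg

T, sigma = 20000, 0.4                     # illustrative values
N = Counter(simulate_outdegrees(T, sigma))

edges = sum(k * n for k, n in N.items())  # sum_k k N_k(t)
nodes = sum(N.values())                   # sum_k N_k(t)
observed = nodes - N[0]                   # nodes with positive outdegree
```

Here `edges` and `nodes` both equal `T`, while `observed` is much smaller than `edges`; the gap between them is exactly `N[0]`.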

Remark 3. Note that (4) implies $\sum_{k=0}^{t} N_k(t) = \sum_{k=1}^{t} k N_k(t)$, which is the corresponding identity of (1) in the standard PA model. In particular, $N_0(t) = \sum_{k=1}^{t} k N_k(t) - \sum_{k=1}^{t} N_k(t)$ is the key equation that enables the number of edges, $\sum_{k=1}^{t} k N_k(t)$, to be greater than the number of observed vertices, $\sum_{k=1}^{t} N_k(t)$. Remark 9 illustrates how $N_0(t)$ is used in real data analysis.

3.1. Stochastic recursive equation

For $k = 2, \ldots, m$, the PA scheme (2) implies $N_k(t+1) = N_k(t) + 1$ if $d_t^+(A_{t+1}) = k - 1$ and $N_k(t+1) = N_k(t) - 1$ if $d_t^+(A_{t+1}) = k$. Also, if $d_t^+(A_{t+1}) = 0$, then $N_1(t+1) = N_1(t) + 1$ and $N_0(t+1) = N_0(t)$. By considering the mutual exclusiveness and exhaustiveness of the events $\{d_t^+(A_{t+1}) = i\}$ $(i = 0, \ldots, t)$, the PA scheme (2) can be expressed as the system of recursions over $t$:

$$\begin{aligned}
N_0(t+1) &= N_0(t) + 1 - I_{\{d_t^+(A_{t+1}) = 0\}}, \\
N_1(t+1) &= N_1(t) + I_{\{d_t^+(A_{t+1}) = 0\}} - I_{\{d_t^+(A_{t+1}) = 1\}}, \\
N_k(t+1) &= N_k(t) + I_{\{d_t^+(A_{t+1}) = k-1\}} - I_{\{d_t^+(A_{t+1}) = k\}} \quad (k = 2, \ldots, m).
\end{aligned} \tag{5}$$

The matrix form of (5) is

$$N_{t+1} = N_t + e + \Phi J_{t+1}, \tag{6}$$

where $J_{t+1} = (I_{\{d_t^+(A_{t+1}) = 0\}}, I_{\{d_t^+(A_{t+1}) = 1\}}, \ldots, I_{\{d_t^+(A_{t+1}) = m\}})^{T}$, $e \in \mathbb{R}^{m+1}$ is the unit vector with only the first entry being 1, and $\Phi$ is a lower bidiagonal matrix of dimension $m + 1$ with all diagonal elements being $-1$ and all lower subdiagonal elements being 1. The conditional mean of $J_{t+1}$ is

$$E(J_{t+1} \mid G_t) = [(1+\sigma)\,t]^{-1} (\sigma N_t + R N_t), \tag{7}$$

where $R = \mathrm{diag}(0, 1, \ldots, m)$. This is a direct consequence of the following form of the PA scheme (2):

$$E(I_{\{d_t^+(A_{t+1}) = k\}} \mid G_t) = \sum_{a \in V_t :\, d_t^+(a) = k} \frac{d_t^+(a) + \sigma}{(1+\sigma)\,t} = [(1+\sigma)\,t]^{-1} (k + \sigma) N_k(t) \quad (k = 0, \ldots, m). \tag{8}$$

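By making use of the martingale difference sequence $\varepsilon_{t+1} = J_{t+1} - E(J_{t+1} \mid G_t)$, the recursion (6) can then be expressed as

$$N_{t+1} = \left[ I + \frac{\Phi (R + \sigma I)}{(1+\sigma)\,t} \right] N_t + e + \Phi \varepsilon_{t+1}, \tag{9}$$

where $I$ is the $(m+1)$-dimensional identity matrix. The PA scheme (2) starts at $t = 1$ with a non-random vector $N_1 = (0, 1, 0, \ldots, 0)^{T} \in \mathbb{R}^{m+1}$. Iterating (9) leads to

$$\begin{aligned}
N_t &= \prod_{j=m}^{t-1} \left[ I + \frac{\Phi (R + \sigma I)}{(1+\sigma)\,j} \right] \left( N_m + \sum_{j=m}^{t-1} \left\{ \prod_{r=m}^{j} \left[ I + \frac{\Phi (R + \sigma I)}{(1+\sigma)\,r} \right] \right\}^{-1} (e + \Phi \varepsilon_{j+1}) \right) \\
&= \prod_{j=m}^{t-1} \left[ I + \frac{\Phi (R + \sigma I)}{(1+\sigma)\,j} \right] N_m + \sum_{j=m}^{t-1} \prod_{r=j+1}^{t-1} \left[ I + \frac{\Phi (R + \sigma I)}{(1+\sigma)\,r} \right] e + S_t,
\end{aligned} \tag{10}$$

where

$$S_t = \sum_{j=m}^{t-1} \prod_{r=j+1}^{t-1} \left[ I + \frac{\Phi (R + \sigma I)}{(1+\sigma)\,r} \right] \Phi \varepsilon_{j+1}. \tag{11}$$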

Note that since matrix multiplication is not commutative, our cumulative product notation should be interpreted as $\prod_{j=k}^{r} M_j = M_r \cdots M_k$ with each $M_j$ being a square matrix and $r > k$. Another convention adopted in (11) is $\prod_{r=t}^{t-1} M_r = I$. The stochastic behavior of $N_t/t$ is determined by that of $S_t/t$, which is summarized in the following lemma.

Lemma 1. $S_t/t \to 0$ almost surely and in $L^2$ as $t \to \infty$.

Lemma 1 is proved by applying the martingale convergence theorem. Theorem 1 is then a direct consequence of Lemma 1. Both proofs are presented in the Appendix. To show that $N_t/t \to c$ as stated in (3), we take expectations on both sides of (9):

$$E(N_{t+1}) = \left[ I + \frac{\Phi (R + \sigma I)}{(1+\sigma)\,t} \right] E(N_t) + e. \tag{12}$$

Substituting $E(N_t) = tc$ and $E(N_{t+1}) = (t+1)c$ into (12) gives $c = (1+\sigma)^{-1} \Phi (R + \sigma I) c + e$, i.e.,

$$c = \left[ I - \frac{\Phi (R + \sigma I)}{1+\sigma} \right]^{-1} e. \tag{13}$$

Alternatively, the entries of $c$ can be determined by solving $[I - (1+\sigma)^{-1} \Phi (R + \sigma I)]\, c = e$ in a recursive form:

$$\left( 1 + \frac{\sigma}{1+\sigma} \right) c_0 = 1, \tag{14}$$

$$\left( 1 + \frac{\sigma + k}{1+\sigma} \right) c_k - \frac{\sigma + k - 1}{1+\sigma}\, c_{k-1} = 0 \quad (k = 1, \ldots, m). \tag{15}$$

Theorem 1 can then be proved by verifying that $c_k$ of (3) satisfies (14) and (15).

Remark 4. The formula (13) can be expressed as

$$c = Bc + e, \tag{16}$$

where $B = (1+\sigma)^{-1} \Phi (R + \sigma I)$. Thus, $c$ is the fixed point of the linear transformation $F(x) = Bx + e$, i.e., $F(c) = c$. As the difference equation (12) can be expressed as $E(N_{t+1}) - E(N_t) = B[E(N_t)/t] + e = F[E(N_t)/t]$ and $F[E(N_t)/t] \approx E(N_t/t)$ when $t$ is large, $c$ can be interpreted as the stationary increasing rate of $E[N_t]$.
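The recursion (14)-(15) and the fixed-point property of Remark 4 can be checked numerically without forming any matrix. In the sketch below, `stationary_c` and `apply_F` are hypothetical helper names, $\sigma = 0.4$ and $m = 50$ are illustrative values, and $F(x) = Bx + e$ is applied entrywise using the bidiagonal structure of $\Phi$.

```python
def stationary_c(sigma, m):
    """c_0, ..., c_m obtained from the recursion (14)-(15)."""
    c = [(1.0 + sigma) / (1.0 + 2.0 * sigma)]                          # (14)
    for k in range(1, m + 1):
        c.append(c[-1] * (sigma + k - 1.0) / (1.0 + 2.0 * sigma + k))  # (15)
    return c

def apply_F(x, sigma):
    """F(x) = B x + e with B = Phi (R + sigma I) / (1 + sigma).

    Row k of Phi (R + sigma I) has -(sigma + k) on the diagonal and
    (sigma + k - 1) on the subdiagonal, so no dense matrix is needed.
    """
    out = []
    for k in range(len(x)):
        val = -(sigma + k) * x[k]
        if k > 0:
            val += (sigma + k - 1.0) * x[k - 1]
        val /= (1.0 + sigma)
        if k == 0:
            val += 1.0        # e has a single 1 in its first entry
        out.append(val)
    return out

sigma, m = 0.4, 50
c = stationary_c(sigma, m)
fixed_point_error = max(abs(f - ck) for f, ck in zip(apply_F(c, sigma), c))
```

The quantity `fixed_point_error` is at the level of floating-point rounding, which is the numerical counterpart of $F(c) = c$.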

3.2. Asymptotic normality of $\sqrt{t}\,(N_t/t - c)$

Theorem 1 shows that our model captures the stylized power law. Moreover, a strongly consistent point estimator of $\sigma$ can be derived by solving for $\sigma$ from the equation $N_t/t = c$, as presented in Section 4. Nevertheless, to assess the estimation error, we need to make use of the asymptotic distribution of $N_t/t$. To begin the calculation of the asymptotic distribution, let $\eta_{t+1} = E(J_{t+1} \mid G_t) = (1+\sigma)^{-1} (R + \sigma I)(N_t/t)$. By Theorem 1, $\eta_{t+1} \to \pi$ almost surely as $t \to \infty$, where $\pi = (1+\sigma)^{-1} (R + \sigma I)\, c$ and is equal to

$$\pi = \Phi^{-1} B c. \tag{17}$$

Combining such a limit with the martingale central limit theorem, we establish the asymptotic normality of $p_t = N_t/t$ in Theorem 2.

Theorem 2. As $t \to \infty$,

$$\sqrt{t}\,(p_t - c) \to N(0, \Lambda) \tag{18}$$

in distribution, where $\Lambda$ is the solution of the Sylvester matrix equation

$$\Lambda (I - B^{T}) - B \Lambda = \Phi \Omega \Phi^{T} \tag{19}$$

with $\Omega = \mathrm{diag}(\pi) - \pi \pi^{T}$.

Theorem 2 is proved in the Appendix and is critical to the development of the statistical estimation theory of $\sigma$ presented in the next section.

Remark 5. The formula (19) has the form of a Sylvester matrix equation, i.e., $XP - QX = Y$, where $P$, $Q$, $X$ and $Y$ are all matrices. The solution $X$ is unique if $P$ and $Q$ have no common eigenvalues. Such a condition is satisfied by $I - B^{T}$ and $B$ in (19). Also, by making use of the fact that $B$ is lower bidiagonal, we derive an efficient algorithm to solve for $\Lambda$ from (19) and present it in the next subsection.

Remark 6. Note that $\Phi \Omega \Phi^{T} = \Phi\, \mathrm{diag}(\pi)\, \Phi^{T} - (\Phi \pi)(\Phi \pi)^{T}$ in (19) can be computed efficiently by (16) and (17), i.e., $\Phi \pi = Bc = c - e$. Thus, $(\Phi \pi)(\Phi \pi)^{T} = (c - e)(c - e)^{T} = cc^{T} - ce^{T} - ec^{T} + ee^{T}$.

Remark 7. Similar to Remark 4, $\Lambda$ in (19) can also be interpreted as the fixed point of the function $K$ defined as $K(M) = BM + MB^{T} + \Phi \Omega \Phi^{T}$, where $M$ is any symmetric matrix of dimension $m + 1$. Another way to interpret (19) is $\Lambda - \Phi \Omega \Phi^{T} = \Lambda B^{T} + B \Lambda$. Since $\Lambda$ and $\Phi \Omega \Phi^{T}$ are the long-term unconditional and conditional covariance matrices of $N_t$, respectively, their difference is given by $\Lambda B^{T} + B \Lambda$, and it is then easy to see that $\Lambda B^{T} + B \Lambda$ is symmetric and negative semi-definite.
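Two of the claims in Remarks 5 and 6 lend themselves to a quick numerical check. The sketch below, with illustrative values $\sigma = 0.4$ and $m = 30$ and hypothetical variable names, verifies that $\Phi \pi = c - e$ and that the eigenvalues of $B$ and $I - B^{T}$ are separated. It relies on the standard fact that the eigenvalues of a triangular matrix are its diagonal entries, so the eigenvalues of $B$ are $b_k = -(\sigma + k)/(1+\sigma)$ and those of $I - B^{T}$ are $1 - b_k$.

```python
def stationary_c(sigma, m):
    """c_0, ..., c_m obtained from the recursion (14)-(15)."""
    c = [(1.0 + sigma) / (1.0 + 2.0 * sigma)]
    for k in range(1, m + 1):
        c.append(c[-1] * (sigma + k - 1.0) / (1.0 + 2.0 * sigma + k))
    return c

sigma, m = 0.4, 30
c = stationary_c(sigma, m)

# pi = (1 + sigma)^{-1} (R + sigma I) c, entry by entry.
pi = [(sigma + k) * c[k] / (1.0 + sigma) for k in range(m + 1)]

# (Phi pi)_k = -pi_k + pi_{k-1}, because Phi is lower bidiagonal with
# diagonal -1 and subdiagonal 1.
phi_pi = [-pi[k] + (pi[k - 1] if k > 0 else 0.0) for k in range(m + 1)]
c_minus_e = [c[0] - 1.0] + c[1:]
remark6_error = max(abs(x - y) for x, y in zip(phi_pi, c_minus_e))

# Eigenvalues of the triangular matrices B and I - B^T.
eig_B = [-(sigma + k) / (1.0 + sigma) for k in range(m + 1)]
eig_I_minus_Bt = [1.0 - bk for bk in eig_B]
spectral_gap = min(eig_I_minus_Bt) - max(eig_B)
```

A strictly positive `spectral_gap` means that the two spectra cannot share a value, which is the uniqueness condition quoted in Remark 5.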

3.3. Efficient algorithm to compute $\Lambda$

While the Sylvester matrix equation has been studied extensively and many algorithms, such as those reviewed in Gardiner et al. (1992) and Anderson et al. (1996), can be used to solve (19), the computation for big networks with $t$ in the range of $10^6$ could still be a formidable task. Fortunately, $B$ is a lower bidiagonal matrix whose diagonal vector is $b = (b_0, b_1, \ldots, b_m)^{T}$, where $b_i = -(\sigma + i)/(1 + \sigma)$ for $i = 0, 1, \ldots, m$, and whose lower subdiagonal vector is $(-b_0, -b_1, \ldots, -b_{m-1})^{T}$. Such a property provides the following efficient algorithm for $\Lambda$. Let $Y = \Phi \Omega \Phi^{T}$ and let its $(i, j)$th element be $y_{ij}$ for $i, j = 0, 1, \ldots, m$. Similarly, the $(i, j)$th element of $\Lambda$ is denoted by $\lambda_{ij}$. Note that both $Y$ and $\Lambda$ are symmetric. By comparing the elements on both sides of $\Lambda - \Lambda B^{T} - B \Lambda = Y$, we obtain the following algorithm by backward substitution:

$$\lambda_{00} = \frac{y_{00}}{1 - 2b_0}, \qquad \lambda_{i0} = \frac{y_{i0} - b_{i-1} \lambda_{i-1,0}}{1 - b_0 - b_i}$$

for $i = 1, \ldots, m$, and

$$\lambda_{ii} = \frac{y_{ii} - 2 b_{i-1} \lambda_{i-1,i}}{1 - 2b_i}, \qquad \lambda_{ij} = \frac{y_{ij} - b_{j-1} \lambda_{i,j-1} - b_{i-1} \lambda_{i-1,j}}{1 - b_i - b_j}$$

for $j = i + 1, \ldots, m$. Here the terms involving $b_{i-1}$ and $b_{j-1}$ come from the subdiagonal of $B$, and the factor 2 in the diagonal case uses the symmetry $\lambda_{i,i-1} = \lambda_{i-1,i}$. Not only does our algorithm avoid the matrix inversion and spectral decomposition of the usual Sylvester solvers, but the scalar backward substitution is also computationally efficient and numerically stable. In fact, the overflow issue is pronounced only when $\sigma$ is tiny, i.e., when the PA scheme cannot model the corresponding data adequately.

4. Parameter estimation and inference

To derive our proposed estimator for $\sigma$, we express (13) as

$$\sigma (c - e - \Phi c) = \Phi R c + e - c. \tag{20}$$

Multiplying both sides of the equation by $(c - e - \Phi c)^{T}$ and dividing both sides by $(c - e - \Phi c)^{T} (c - e - \Phi c)$ gives

$$\sigma = \frac{(c - e - \Phi c)^{T} (\Phi R c + e - c)}{(c - e - \Phi c)^{T} (c - e - \Phi c)}. \tag{21}$$

Thus, by defining the function $g$ as

$$g(x) = \frac{(x - e - \Phi x)^{T} (\Phi R x + e - x)}{(x - e - \Phi x)^{T} (x - e - \Phi x)} \tag{22}$$

for all $x \in \mathbb{R}^{m+1}$ with $(I - \Phi) x \neq e$, the expression (21) suggests estimating $\sigma$ by $\hat{\sigma}_t = g(p_t)$. Theorem 1 and the continuity of $g$ imply that $\hat{\sigma}_t$ is strongly consistent. Moreover, its asymptotic distribution can be obtained by applying the Delta method (Theorem 11.2.14 of Lehmann and Romano (2005)) to Theorem 2, i.e.,

$$\sqrt{t}\,(\hat{\sigma}_t - \sigma) \to N(0, [\nabla g(c)]^{T} \Lambda \nabla g(c)) \tag{23}$$

in distribution as $t \to \infty$, where $\nabla g(c)$ is the gradient vector of $g(x)$ at $c$. In particular, for $x \in \mathbb{R}^{m+1}$,

$$\nabla g(x) = \frac{1}{[Q(x)]^2} \left[ Q(x) \nabla P(x) - P(x) \nabla Q(x) \right],$$

where

$$\begin{aligned}
P(x) &= (x - e - \Phi x)^{T} (\Phi R x + e - x), \\
Q(x) &= (x - e - \Phi x)^{T} (x - e - \Phi x), \\
\nabla P(x) &= (\Phi R - I)^{T} (x - e - \Phi x) + (I - \Phi)^{T} (\Phi R x + e - x), \\
\nabla Q(x) &= 2 (I - \Phi^{T}) (x - e - \Phi x).
\end{aligned}$$

To compute $\tau^2 = [\nabla g(c)]^{T} \Lambda \nabla g(c)$ for large $m$, the memory requirement in storing the $(m+1) \times (m+1)$ matrices $\Lambda$ and $Y = \Phi \Omega \Phi^{T}$ could be demanding. One can substantially reduce such a burden by using the algorithm stated in Section 3.3 as well as the fact that $\Phi\, \mathrm{diag}(\pi)\, \Phi^{T}$ is a symmetric tridiagonal matrix whose diagonal vector is $d = (d_0, d_1, \ldots, d_m) = (\pi_0, \pi_0 + \pi_1, \ldots, \pi_{m-1} + \pi_m)$ and whose subdiagonal vector is $s = (s_0, s_1, \ldots, s_{m-1}) = (-\pi_0, -\pi_1, \ldots, -\pi_{m-1})$. That is, by setting $z = \Phi \pi = (z_0, \ldots, z_m) = c - e = (c_0 - 1, c_1, \ldots, c_m)$ as in Remark 6 and using the fact that

$$y_{ij} = \begin{cases}
d_i - z_i^2 & \text{if } i = j, \\
s_i - z_i z_{i+1} & \text{if } i + 1 = j, \\
s_{i-1} - z_i z_{i-1} & \text{if } i - 1 = j, \\
-z_i z_j & \text{otherwise},
\end{cases}$$

for $i, j = 0, \ldots, m$, $\tau^2$ can be computed by summing over $u_i u_j \lambda_{ij}$ via the algorithm given in Section 3.3, where $u = (u_0, \ldots, u_m)^{T} = \nabla g(c)$. Such a computational procedure reduces the memory requirement from $O(m^2)$ to $O(m)$. Also, note that $\tau^2 = u^{T} \Lambda u = \sum_{i=0}^{m} u_i^2 \lambda_{ii} + 2 \sum_{j=0}^{m-1} \sum_{i=j+1}^{m} u_i u_j \lambda_{ij}$ because $\Lambda$ is symmetric. Without using the symmetry of $\Lambda$, the additional computational cost is in the order of $m^2$.
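The computation of $\tau^2$ described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, $\sigma = 0.4$ and $m = 200$ are illustrative, and the scalar recursion is obtained by comparing the entries of $\Lambda - \Lambda B^{T} - B\Lambda = Y$ directly, with any term whose index would be $-1$ dropped. Only the upper triangle of $\Lambda$ is generated, one row at a time, so the working memory stays $O(m)$.

```python
def stationary_c(sigma, m):
    """c_0, ..., c_m obtained from the recursion (14)-(15)."""
    c = [(1.0 + sigma) / (1.0 + 2.0 * sigma)]
    for k in range(1, m + 1):
        c.append(c[-1] * (sigma + k - 1.0) / (1.0 + 2.0 * sigma + k))
    return c

def grad_g_at_c(sigma, m):
    """u = grad g(c), using the bidiagonal structure of Phi entrywise."""
    c = stationary_c(sigma, m)
    def at(v, k):                       # v_k with out-of-range entries set to 0
        return v[k] if 0 <= k <= m else 0.0
    e = [1.0] + [0.0] * m
    a = [2.0 * c[k] - at(c, k - 1) - e[k] for k in range(m + 1)]     # c - e - Phi c
    w = [-(k + 1.0) * c[k] + (k - 1.0) * at(c, k - 1) + e[k]
         for k in range(m + 1)]                                      # Phi R c + e - c
    P = sum(x * y for x, y in zip(a, w))
    Q = sum(x * x for x in a)
    grad_P = [k * (at(a, k + 1) - a[k]) - a[k] + 2.0 * w[k] - at(w, k + 1)
              for k in range(m + 1)]
    grad_Q = [2.0 * (2.0 * a[k] - at(a, k + 1)) for k in range(m + 1)]
    return [(Q * gp - P * gq) / (Q * Q) for gp, gq in zip(grad_P, grad_Q)]

def lambda_rows(sigma, m):
    """Yield (i, row) with row[j] = lambda_ij for j >= i, keeping one earlier row."""
    c = stationary_c(sigma, m)
    pi = [(sigma + k) * c[k] / (1.0 + sigma) for k in range(m + 1)]
    z = [c[0] - 1.0] + c[1:]                                   # z = Phi pi = c - e
    d = [pi[0]] + [pi[k - 1] + pi[k] for k in range(1, m + 1)]
    s = [-pi[k] for k in range(m)]
    b = [-(sigma + k) / (1.0 + sigma) for k in range(m + 1)]
    prev = None
    for i in range(m + 1):
        row = [0.0] * (m + 1)
        for j in range(i, m + 1):
            y = -z[i] * z[j] + (d[i] if j == i else s[i] if j == i + 1 else 0.0)
            if j == i:
                rhs = y - (2.0 * b[i - 1] * prev[i] if i > 0 else 0.0)
            else:
                rhs = y - b[j - 1] * row[j - 1] - (b[i - 1] * prev[j] if i > 0 else 0.0)
            row[j] = rhs / (1.0 - b[i] - b[j])
        yield i, row
        prev = row

def tau_squared(sigma, m):
    """tau^2 = u^T Lambda u accumulated from the upper triangle of Lambda."""
    u = grad_g_at_c(sigma, m)
    total = 0.0
    for i, row in lambda_rows(sigma, m):
        total += u[i] * u[i] * row[i]
        total += 2.0 * sum(u[i] * u[j] * row[j] for j in range(i + 1, m + 1))
    return total

tau2 = tau_squared(0.4, 200)   # illustrative call
```

Dividing `tau2` by the network size and taking the square root gives the asymptotic standard error of $\hat{\sigma}_t$ implied by (23). For a small $m$, the rows can be assembled into a dense matrix to confirm that the residual of (19) vanishes.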

Fig. 1. $\log(\bar{p}_k(T))$ versus $\log(k + 1)$, where $T = 10^5$ and $\bar{p}_k(T)$ is the sample mean of the simulated $p_k(t)$. The solid curve is $\log(c_k)$.

Table 1
Summary statistics of the maximum degree of 1000 simulated PA networks with $\sigma = 0.4$ and $T = 10^5$.

Minimum    First quartile    Median    Third quartile    Maximum
1938       4483              5791      7153              12 136

Table 2
Summary statistics of $\hat{\sigma}_T$ and their asymptotic values.

                     Mean      Standard deviation      Skewness    Kurtosis
$\hat{\sigma}_T$     0.4002    $4.189 \times 10^{-3}$  −0.096      3.092
Asymptotic values    0.4       $4.250 \times 10^{-3}$  0           3

Remark 8. Note that the condition $(I - \Phi) x \neq e$ in the definition of $g(x)$ in (22) is equivalent to $c - e - \Phi c \neq 0$ at $x = c$. Such a requirement is equivalent to the existence condition of the fixed point of $F(x) = x$ as stated in Remark 4.

5. Simulation studies

To examine our methodology, we simulate 1000 replicates for $\sigma = 0.4$ from the PA attachment scheme until $t$ reaches $T = 10^5$. Fig. 1 plots the logarithmic sample mean proportion of nodes with degree $k$ versus $\log(k + 1)$ together with the solid curve of $\log(c_k)$. Note that when $k$ is large, the sparsity of $N_k(T)/T$ is more pronounced and causes a more apparent deviation from the asymptotic mean $c$. Nevertheless, the straight line in Fig. 1 reconfirms the power tail of $p_k(T)$ captured by the proposed PA model. Similar to Fig. 1, Fig. 2 plots the logarithmic sample variance of $N_k(T)/T$ versus $\log(k + 1)$ together with the solid curve of $\log(\lambda_{kk}/T)$, where $\lambda_{kk}$ is the $k$th diagonal element of $\Lambda$. It also aligns with the result of Theorem 2. Table 1 is a summary of the maximum degree of these 1000 replicates and shows that the maximum degree varies between 1938 and 12 136. To illustrate the asymptotic property presented in (23), $m = 1500$ is chosen so that the tail behavior of $N_k(T)/T$ can be captured. Fig. 3 is the histogram of $\hat{\sigma}_T$. Table 2 compares the statistics of $\hat{\sigma}_T$ with the theoretical asymptotic values. Both Fig. 3 and Table 2 are again consistent with the aforementioned asymptotic results, which include the mean, standard deviation and normality.
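A single replicate of such a simulation study can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' simulation code: the head $A_{t+1}$ is drawn from a mixture (head of a uniformly chosen edge with probability $1/(1+\sigma)$, otherwise a uniformly chosen node), which reproduces the law (2) for the outdegree; the estimator $g$ of (22) is evaluated entrywise using the bidiagonal form of $\Phi$; and the values $T = 50000$, $m = 50$ and the seed are arbitrary choices that are smaller than those used in the paper.

```python
import random
from collections import Counter

def simulate_outdegrees(T, sigma, seed=0):
    """Outdegrees of nodes 1..T (stored 0-based) after running the PA scheme (2)."""
    rng = random.Random(seed)
    outdeg = [1]   # G_1: node 1 carries the self-loop (1, 1)
    heads = [0]    # head node of every edge added so far
    for t in range(1, T):
        if rng.random() < 1.0 / (1.0 + sigma):
            a = heads[rng.randrange(t)]   # proportional to outdegree
        else:
            a = rng.randrange(t)          # uniform over existing nodes
        outdeg.append(0)                  # new zero-degree node
        outdeg[a] += 1
        heads.append(a)
    return outdeg

def g(x):
    """Estimator function (22) evaluated at x = (x_0, ..., x_m)."""
    num = den = 0.0
    for k in range(len(x)):
        e_k = 1.0 if k == 0 else 0.0
        x_prev = x[k - 1] if k > 0 else 0.0
        a = 2.0 * x[k] - x_prev - e_k                       # (x - e - Phi x)_k
        w = -(k + 1.0) * x[k] + (k - 1.0) * x_prev + e_k    # (Phi R x + e - x)_k
        num += a * w
        den += a * a
    return num / den

T, sigma, m = 50000, 0.4, 50
counts = Counter(simulate_outdegrees(T, sigma, seed=1))
p_T = [counts.get(k, 0) / T for k in range(m + 1)]   # truncated degree distribution
sigma_hat = g(p_T)
```

Because $g$ is an explicit ratio of two inner products, the estimate requires a single pass over the truncated degree counts and no iteration.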

Fig. 2. $\log(s^2_{p_k(T)})$ versus $\log(k + 1)$, where $T = 10^5$ and $s^2_{p_k(T)}$ is the sample variance of the simulated $p_k(t)$. The solid curve is $\log(\lambda_{kk}/T)$.

Fig. 3. Histogram of σˆ T of the simulated networks. The red vertical line corresponds to the true value σ = 0.4. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


Table 3
Parameter estimates of the PA model and other relevant statistics for the indegree and outdegree distribution of Google Web Graph data.

             $\hat{\sigma}$    Standard error          m       Maximum degree
Indegree     0.172             $2.777 \times 10^{-4}$  4000    6326
Outdegree    0.123             $2.172 \times 10^{-4}$  300     456

Fig. 4. log(pk (t)) versus log(k + 1) of the indegree of Google Web Graph. The solid line is the fitted logarithmic asymptotic mean.

6. Application to Google Web Graph data

The internet, also known as the World Wide Web, can be considered as a directed network of webpages connected by hyperlinks. According to http://www.worldwidewebsize.com/, Google indexed at least 46 billion webpages between May and August of 2018 and is currently one of the biggest social networks. The data used in this section were released by Google in 2002 as part of the Google Programming Contest and thus form a snapshot of the infant internet. This data set, known as the Google Web Graph Data and available at https://snap.stanford.edu/data/web-Google.html, consists of 875 713 nodes (webpages) and 5 105 039 edges (hyperlinks). Table 3 summarizes our analysis of the indegree and the outdegree distributions of the Google Web Graph, and Figs. 4 and 5 provide the corresponding goodness-of-fit analysis.

For the indegree distribution, Fig. 4 shows that the proposed PA model fits the data satisfactorily. Moreover, the estimate $\hat{\sigma} = 0.172$ and its standard error together provide further evidence that the index of the power tail is significantly greater than 2. Also, assuming that both $\sigma$ and the ratio of the number of nodes to the number of edges are invariant from the day the Google Web Graph Data was collected to 2018, $t = 46 \times 10^9 \times 5\,105\,039 / 875\,713 = 268.2 \times 10^9$. Under such a configuration, the top $10^6$ ($10^5$, $10^4$ and $10^3$, respectively) most popular webpages would have a minimum indegree of 9246 ($6.6 \times 10^4$, $4.7 \times 10^5$ and $3.36 \times 10^6$, respectively) on average under our PA model.

For the outdegree distribution, however, Fig. 5 suggests that the PA model may not be appropriate. Such a discrepancy can be explained by the context of outdegree in the Google Web Graph Data. That is, while the indegree of a node measures the popularity of the corresponding webpage, the outdegree of a particular webpage refers to the number of hyperlinks listed on it and only measures its ability to introduce other nodes.
In fact, PA scheme can hardly explain the evolution of outdegree distribution over time because most of the webpages are designed for human access and the number of hyperlinks listed on a certain webpage should not grow indefinitely. That is also why the maximum outdegree of Google Web Graph is only 456 and is substantially smaller than 6326, its maximum indegree as listed in Table 3. Please cite this article as: N.H. Chan, S.K.C. Cheung and S.P.S. Wong, Inference for the degree distributions of preferential attachment networks with zero-degree nodes. Journal of Econometrics (2020), https://doi.org/10.1016/j.jeconom.2020.01.015.
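The extrapolation in this section can be reproduced with a few lines of arithmetic. The sketch below recomputes t from the figures quoted above. It also assumes, as a reading of the power tail rather than a formula stated in this section, that the fraction of nodes with indegree at least k decays like k^-(1 + σ), and checks that the reported minimum indegrees grow by roughly 10^(1/(1 + σ)) for each tenfold drop in rank. The function names are illustrative only.

```python
# Figures quoted in Section 6.
SIGMA_HAT = 0.172
NODES_2002 = 875_713
EDGES_2002 = 5_105_039
PAGES_2018 = 46e9


def extrapolated_edges(pages, nodes=NODES_2002, edges=EDGES_2002):
    """Scale the 2002 edge count to a larger node count, holding the
    edges-per-node ratio fixed (the invariance assumption in the text)."""
    return pages * edges / nodes


def decade_growth_factor(sigma):
    """Assumed tail: P(indegree >= k) proportional to k^-(1 + sigma).
    A tenfold drop in rank then multiplies the threshold degree
    by 10^(1 / (1 + sigma))."""
    return 10 ** (1.0 / (1.0 + sigma))


t_2018 = extrapolated_edges(PAGES_2018)
factor = decade_growth_factor(SIGMA_HAT)

# Minimum indegrees reported for the top 10^6, 10^5, 10^4 and 10^3 pages.
reported = [9246, 6.6e4, 4.7e5, 3.36e6]
ratios = [b / a for a, b in zip(reported, reported[1:])]

print(f"t = {t_2018 / 1e9:.1f} x 10^9")
print(f"assumed growth factor per decade of rank: {factor:.3f}")
print("ratios of consecutive reported values:", [round(r, 3) for r in ratios])
```

The ratios of consecutive reported values agree with the assumed factor to within about one percent, which is the rounding precision of the reported figures.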


Fig. 5. log(pk (t)) versus log(k + 1) of the outdegree of Google Web Graph. The solid line is the fitted logarithmic asymptotic mean.

Remark 9. As discussed in Section 1 and Remark 3, $N_0(t)$ is customarily not reported in real data but can be computed via the key equation (4). For the Google Web Graph, the implied values of $N_0(t)/t$ for the indegree and outdegree distributions are 86% and 85.5%, respectively. That is, the number of edges is substantially greater than the number of observed nodes, which hampers the application of standard PA models. In addition, since $N_0(t) = \sum_{k=2}^{t} (k-1) N_k(t)$ aggregates the information in $N_k(t)$ for $k \ge 2$, the proposed inference procedure gains some robustness to the choice of $m$. In fact, $m$ is chosen with two considerations in mind: (i) capturing as much information in the tail as possible and (ii) avoiding the sparsity of $N_k(t)$ for large $k$, which invalidates the asymptotic normality stated in Theorem 2. Factor (i) favors choosing $m$ as large as possible, while factor (ii) argues against a large $m$. $N_0(t)$ offers a natural compromise: it aggregates tail information (as well as other information in $p_k(t)$) while preserving the validity of asymptotic normality, and thus allows one to make use of the tail information even for a moderate $m$.

Remark 10. In the Google Web Graph data, no self-linking edges are observed. This empirical behavior differs from a feature of the proposed PA model. Yet according to the PA model, the proportion of self-linking edges is tiny for a large network. Thus, the PA model can still be considered a good approximation of the Google Web Graph.

7. Conclusion and discussion

Before concluding, we revisit the PA scheme with random initial degrees of Gao and Vaart (2017) discussed in Section 1. Note that Section 6 of Gao and Vaart (2017) presents the degenerate case of deterministic initial degrees, in which the common mean of the initial degrees, denoted by $\mu$, is a positive integer. In that case, $\#(E_t) = \mu(\#(V_t) - 1)$ (with $E_0 = \emptyset$ and singleton $V_0$), which is the constraint corresponding to (1) in the standard PA model and constitutes the main obstacle for real applications: it is quite unlikely to find a real network with $\#(E_t)$ divisible by $\#(V_t) - 1$. However, by setting $\mu = 1$, taking the attachment parameter as $\sigma - 1$ and reducing the degree of every node by 1, this particular case is equivalent to the proposed model. As the MLE derived in Gao and Vaart (2017) for random initial degrees can be applied to the deterministic case and can be computed from a single snapshot, it can be considered a viable alternative to $\hat\sigma$ because their MLE is asymptotically efficient. However, as Newton–Raphson iterations are needed to compute their MLE, the application of this alternative estimator is limited to relatively small networks. Note that if iteration is allowed, then the choice of $m$ can also be tuned to improve the statistical efficiency of $\hat\sigma$, which is explicit and non-iterative in nature.
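Two of the counting facts used above are easy to check numerically. The following sketch first tests the divisibility constraint of the standard PA model on the node and edge counts quoted in Section 6, and then evaluates the identity for N_0(t) from Remark 9 on a small set of degree counts. The toy counts are hypothetical, the function names are illustrative, and the final check assumes (as the identity in Remark 9 suggests) that the model adds one node and one edge per epoch.

```python
def satisfies_standard_pa_constraint(num_edges, num_nodes):
    """True if #(E_t) = mu * (#(V_t) - 1) for some positive integer mu."""
    return num_edges > 0 and num_edges % (num_nodes - 1) == 0


def implied_zero_degree_count(degree_counts):
    """N_0(t) = sum_{k >= 2} (k - 1) * N_k(t), where degree_counts maps
    each observed degree k >= 1 to the number of nodes N_k(t)."""
    return sum((k - 1) * n_k for k, n_k in degree_counts.items() if k >= 2)


# Counts quoted for the Google Web Graph Data.
google_edges, google_nodes = 5_105_039, 875_713
google_ok = satisfies_standard_pa_constraint(google_edges, google_nodes)
print("standard PA constraint holds for the Google data:", google_ok)

# Hypothetical toy counts: five nodes of degree 1, two of degree 2, one of degree 3.
toy_counts = {1: 5, 2: 2, 3: 1}
n0 = implied_zero_degree_count(toy_counts)
observed_nodes = sum(toy_counts.values())
total_degree = sum(k * n_k for k, n_k in toy_counts.items())

# With one node and one edge per epoch, all nodes (including the implied
# zero-degree ones) should add up to the total degree, i.e. the edge count.
print("implied N_0:", n0, "| share of all nodes:", n0 / (n0 + observed_nodes))
```

The share printed on the last line plays the role of N_0(t)/t in Remark 9; reproducing the reported 86% and 85.5% would require the actual degree counts, which are not listed in this section.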


Another important point is theoretical coherence in analyzing different network characteristics. While this paper focuses on the power law of the degree distribution, other network behaviors, such as the clustering of nodes and the robustness of connectivity against the random removal of nodes or edges, have also attracted intense research interest. Studying all these features under the same theoretical framework has been one of the main goals in the study of random networks. This paper offers such a PA framework by using zero-degree nodes and develops the corresponding statistical estimation theory for the power-law parameter. The same framework is used to analyze many of the aforementioned network features in Cheung (2016), to which interested readers are referred.

In conclusion, by means of martingale convergence theory, this paper derives a strongly consistent estimator for the power tail of the degree distribution of the PA network and offers an efficient computational implementation via a fast algorithm designed for the corresponding asymptotic covariance matrix. The proposed methodology is validated by simulation and is applied to study the degree distributions of the Google Web Graph Data, a snapshot of the World Wide Web when it was still in its infancy. It is found that the indegree distribution of the Google Web Graph Data can be captured by the PA model with a power-law parameter significantly greater than 2, while the outdegree distribution behaves quite differently. Such a difference can be attributed to the fact that the outdegree of a node in the Google Web Graph cannot grow indefinitely because typical webpages are tailored for human users and can therefore accommodate only a limited number of hyperlinks. That is, the "rich-get-richer" assumption of the PA scheme would not hold for the outdegree distribution of the Google Web Graph.
Acknowledgments

We would like to express our sincere thanks to the Guest Editor and two anonymous referees for critical suggestions and helpful references. In particular, both referees and the Editor brought to our attention the relevant results of Gao and Vaart (2017) and Deijfen et al. (2009) and suggested that we establish the connections between our models and the PA model with random initial degrees, which not only clarifies some of the issues but also highlights the subtle differences that exist in the use of zero-degree nodes. N.H. Chan also acknowledges the support of HKSAR-RGC-GRF, China Nos: 14300514 and 14325216 as well as HKSAR-RGC-TRF, China No. T32-101/15-R.

Appendix

Proof of Lemma 1. Consider $S_t$ stated in (11) as a martingale with the σ-field generated by $\{\varepsilon_s : s = t-1, \ldots, 2\}$ as its filtration. The filtration is the same as that generated by the network history $\{G_s : s = t-1, \ldots, 2\}$. We express $S_{t+1} = \sum_{j=m}^{t-1} X_j$ and

$$X_j = \left\{ \prod_{r=j+1}^{t-1} \left[ I + \frac{\Phi(R+\sigma I)}{(1+\sigma)\,r} \right] \right\} \Phi\,\varepsilon_{j+1} = L(j+1,t-1)\,\Phi\,\varepsilon_{j+1} \qquad (j = m, \ldots, t-1),$$

where

$$L(j+1,t-1) = \prod_{r=j+1}^{t-1} \left[ I + \frac{\Phi(R+\sigma I)}{(1+\sigma)\,r} \right]. \tag{24}$$

To apply Corollary 2 of Kaufmann (1987) to show that $S_t/t \to 0$ almost surely and in $L^2$, we need to show that

$$\sum_{j=m}^{t-1} \operatorname{tr}(\Psi_j)/j^2 < \infty, \tag{25}$$

where $\Psi_j = \operatorname{var}(X_j)$ can be expressed as

$$\Psi_j = L(j+1,t-1)\,\Phi\,\Sigma_{j+1}\,\Phi^T\,[L(j+1,t-1)]^T,$$

with $\Sigma_j = E[\operatorname{var}(\varepsilon_j \mid G_{j-1})] = E[\operatorname{diag}(\eta_j)] - E(\eta_j \eta_j^T)$ and $\eta_j = E(J_j \mid G_{j-1})$. To prove (25), consider

$$\begin{aligned}
\operatorname{tr}(\Psi_j) &= \operatorname{tr}\{L(j+1,t-1)\,\Phi\,\Sigma_{j+1}\,\Phi^T\,[L(j+1,t-1)]^T\} \\
&< \operatorname{tr}\{L(j+1,t-1)\,\Phi\,E[\operatorname{diag}(\eta_{j+1})]\,\Phi^T\,[L(j+1,t-1)]^T\} \\
&< \operatorname{tr}\{L(j+1,t-1)\,\Phi\Phi^T\,[L(j+1,t-1)]^T\} \\
&= \operatorname{tr}\{\Phi\Phi^T\,[L(j+1,t-1)]^T L(j+1,t-1)\} \\
&\le \operatorname{tr}(\Phi\Phi^T)\,\operatorname{tr}\{[L(j+1,t-1)]^T L(j+1,t-1)\}.
\end{aligned} \tag{26}$$


The first and the second inequalities follow from the facts that $\Sigma_j = E[\operatorname{diag}(\eta_j)] - E(\eta_j \eta_j^T)$ and that all entries of $\eta_j = E(J_j \mid G_{j-1})$ are less than 1, respectively. The equality that follows is a result of $\operatorname{tr}(M_1 M_2) = \operatorname{tr}(M_2 M_1)$, where the $M_i$ are $(m+1)$-dimensional square matrices. The last inequality is true because $\operatorname{tr}(M_1 M_2) \le \operatorname{tr}(M_1)\operatorname{tr}(M_2)$ if $M_1$ and $M_2$ are both positive semi-definite. In particular, both $\Phi\Phi^T$ and $[L(j+1,t-1)]^T L(j+1,t-1)$ are positive semi-definite.

Note that the spectral decomposition of $L(j+1,t-1)$ is $L(j+1,t-1) = V\,W(j+1,t-1)\,V^{-1}$ with $W(j+1,t-1) = \operatorname{diag}(w_0(j+1,t-1), \ldots, w_m(j+1,t-1))$ and

$$w_k(j+1,t-1) = \prod_{r=j+1}^{t-1} \{1 - (k+\sigma)[(1+\sigma)\,r]^{-1}\} \qquad (k = 0, \ldots, m).$$

Also, $V$ is the matrix of the corresponding eigenvectors whose $k$th column vector is $v_k = (\upsilon_{0k}, \ldots, \upsilon_{mk})^T$ $(k = 0, \ldots, m)$ with

$$\upsilon_{lk} = \begin{cases}
0, & \text{if } l = 0, \ldots, k-1, \\
1, & \text{if } l = k, \\
[(l-k)!]^{-1} \prod_{i=k}^{l-1} (i+\sigma), & \text{if } l = k+1, \ldots, m,
\end{cases} \tag{27}$$

and the $(l,k)$th element of $V^{-1}$ is $(-1)^{l+k}\upsilon_{lk}$. Note that $V$ is independent of $t$ and $j$ as long as $t-1 \ge j \ge m$. Thus, using the spectral decomposition and techniques similar to those used in deriving (26), the second term of (26) can be further bounded by

$$\begin{aligned}
\operatorname{tr}\{[L(j+1,t-1)]^T L(j+1,t-1)\}
&= \operatorname{tr}\{(V^{-1})^T W(j+1,t-1)\,V^T V\,W(j+1,t-1)\,V^{-1}\} \\
&= \operatorname{tr}\{W(j+1,t-1)\,V^T V\,W(j+1,t-1)\,V^{-1}(V^{-1})^T\} \\
&\le \operatorname{tr}\{W(j+1,t-1)\,V^T V\,W(j+1,t-1)\}\,\operatorname{tr}\{V^{-1}(V^{-1})^T\} \\
&= \operatorname{tr}\{V^T V\,W^2(j+1,t-1)\}\,\operatorname{tr}\{V^{-1}(V^{-1})^T\} \\
&\le \operatorname{tr}\{V^T V\}\,\operatorname{tr}\{W^2(j+1,t-1)\}\,\operatorname{tr}\{V^{-1}(V^{-1})^T\} \\
&= D \sum_{k=0}^{m} w_k^2(j+1,t-1),
\end{aligned} \tag{28}$$

where $D = \operatorname{tr}\{V^T V\}\,\operatorname{tr}\{V^{-1}(V^{-1})^T\} = \left( \sum_{k=0}^{m} \sum_{l=k}^{m} \upsilon_{lk}^2 \right)^2 > 0$. Moreover, for each $k = 0, \ldots, m$,

$$w_k(j+1,t-1) = \prod_{r=j+1}^{t-1} \{1 - (k+\sigma)[(1+\sigma)\,r]^{-1}\} < 1. \tag{29}$$

Thus, the bound (25) holds because

$$\begin{aligned}
\sum_{j=m}^{t-1} \operatorname{tr}(\Psi_j)/j^2
&< \operatorname{tr}[\Phi\Phi^T] \sum_{j=m}^{t-1} \sum_{k=0}^{m} D\,[w_k(j+1,t-1)]^2/j^2 \\
&< (m+1)\,\operatorname{tr}[\Phi\Phi^T]\,D \sum_{j=m}^{t-1} 1/j^2 \\
&< (m+1)\,\operatorname{tr}[\Phi\Phi^T]\,D\,\frac{\pi^2}{6} < \infty.
\end{aligned}$$
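The ingredients of the bound (28) can be checked numerically for a small case. The sketch below builds V from (27), forms the claimed inverse with entries (-1)^(l+k) υ_lk, and confirms that the trace bound holds for one choice of m, σ, j and t. The parameter values and helper names are illustrative assumptions; the check is a sanity test for a single configuration, not a substitute for the proof.

```python
from math import factorial, prod


def upsilon(l, k, sigma):
    """Entry (l, k) of V as defined in (27)."""
    if l < k:
        return 0.0
    # The empty product for l == k evaluates to 1, matching (27).
    return prod(i + sigma for i in range(k, l)) / factorial(l - k)


def eigvec_matrices(m, sigma):
    """Return V and the claimed inverse, whose (l, k) entry is (-1)^(l+k) * upsilon_lk."""
    idx = range(m + 1)
    V = [[upsilon(l, k, sigma) for k in idx] for l in idx]
    V_inv = [[(-1) ** (l + k) * upsilon(l, k, sigma) for k in idx] for l in idx]
    return V, V_inv


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def trace_gram(A):
    """tr(A^T A), i.e. the sum of squared entries of A."""
    return sum(x * x for row in A for x in row)


def w(k, j, t, sigma):
    """w_k(j+1, t-1) = prod_{r=j+1}^{t-1} {1 - (k + sigma) / ((1 + sigma) r)}."""
    return prod(1.0 - (k + sigma) / ((1.0 + sigma) * r) for r in range(j + 1, t))


# Illustrative values only; sigma matches the estimate reported in Section 6.
m, sigma, j, t = 3, 0.172, 3, 50
V, V_inv = eigvec_matrices(m, sigma)
ws = [w(k, j, t, sigma) for k in range(m + 1)]
W = [[ws[k] if k == l else 0.0 for k in range(m + 1)] for l in range(m + 1)]
L = matmul(matmul(V, W), V_inv)

product = matmul(V, V_inv)
identity_error = max(
    abs(x - (r == c)) for r, row in enumerate(product) for c, x in enumerate(row)
)
D = trace_gram(V) * trace_gram(V_inv)
lhs = trace_gram(L)                    # tr{L^T L}
rhs = D * sum(x * x for x in ws)       # D * sum_k w_k^2

print("max |V V^{-1} - I| =", identity_error)
print("tr(L^T L) =", lhs, "<= D * sum w_k^2 =", rhs)
```

Pure Python lists are used here to keep the example dependency-free; for larger m an array library would be the natural choice.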

Proof of Theorem 1. Let $\mu_t = E(N_t)$. Then

$$\mu_t = \prod_{j=m}^{t-1} \left[ I + \frac{\Phi(R+\sigma I)}{(1+\sigma)\,j} \right] \mu_m + \sum_{j=m}^{t-1} \prod_{r=j+1}^{t-1} \left[ I + \frac{\Phi(R+\sigma I)}{(1+\sigma)\,r} \right] e \qquad (t = m+1, m+2, \ldots). \tag{30}$$

Let $c = [I - (1+\sigma)^{-1}\Phi(R+\sigma I)]^{-1} e$. As stated in Remark 4, $c$ is the fixed point of $F$, which is equivalent to $c(t+1) = [I + (1+\sigma)^{-1} t^{-1} \Phi(R+\sigma I)]\,ct + e$ for all $t > m$. That implies

$$ct = \left\{ \prod_{j=m}^{t-1} \left[ I + \frac{\Phi(R+\sigma I)}{(1+\sigma)\,j} \right] \right\} cm + \sum_{j=m}^{t-1} \prod_{r=j+1}^{t-1} \left[ I + \frac{\Phi(R+\sigma I)}{(1+\sigma)\,r} \right] e. \tag{31}$$

Taking the difference of (30) and (31) and dividing both sides by $t$ gives

$$\frac{\mu_t}{t} - c = \frac{1}{t} \left\{ \prod_{j=m}^{t-1} \left[ I + \frac{\Phi(R+\sigma I)}{(1+\sigma)\,j} \right] \right\} (\mu_m - cm).$$

Thus,

$$\begin{aligned}
\left\| \frac{\mu_t}{t} - c \right\|^2
&= \frac{1}{t^2} (\mu_m - cm)^T [L(m,t-1)]^T L(m,t-1) (\mu_m - cm) \\
&= \frac{1}{t^2} \operatorname{tr}\{(\mu_m - cm)^T [L(m,t-1)]^T L(m,t-1) (\mu_m - cm)\} \\
&= \frac{1}{t^2} \operatorname{tr}\{[L(m,t-1)]^T L(m,t-1) (\mu_m - cm)(\mu_m - cm)^T\} \\
&\le \frac{1}{t^2} \operatorname{tr}\{[L(m,t-1)]^T L(m,t-1)\}\,\operatorname{tr}[(\mu_m - cm)(\mu_m - cm)^T] \\
&\le \frac{1}{t^2} D \left( \sum_{k=0}^{m} [w_k(m,t-1)]^2 \right) \operatorname{tr}[(\mu_m - cm)(\mu_m - cm)^T] \\
&\le \frac{1}{t^2} D (m+1)\,\operatorname{tr}[(\mu_m - cm)(\mu_m - cm)^T].
\end{aligned} \tag{32}$$

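Note that the first inequality holds because both $[L(m,t-1)]^T L(m,t-1)$ and $(\mu_m - cm)(\mu_m - cm)^T$ are positive semi-definite. The second and the third inequalities hold because of (28) and (29), respectively. The inequality (32) implies $\mu_t/t \to c$ as $t \to \infty$. Thus, Theorem 1 follows directly from the above argument and Lemma 1.

Proof of Theorem 2. Note that $S_t$ is square-integrable by the above argument. Also, the corresponding Lindeberg condition can be justified by first noting that $|\varepsilon_i| = |J_i - E[J_i \mid G_{i-1}]|$ has all entries less than 1 and that $X_i^T X_i = \varepsilon_i^T [L(i,t-1)]^T \Phi^T \Phi\,L(i,t-1)\,\varepsilon_i$. As $L(i,t-1)$ has all eigenvalues less than 1, as stated in (29), $I_{\{X_i^T X_i \ge \rho^2 t\}} = 0$ for any $\rho > 0$ when $t$ is sufficiently large. Thus,

$$\frac{1}{t} \sum_{i=i_0}^{t-1} E\{X_i^T X_i\,I_{\{X_i^T X_i \ge \rho^2 t\}} \mid G_{i-1}\} \to 0$$

in probability as $t \to \infty$.

To find the asymptotic covariance matrix of $N_t/t$, we first compute $\operatorname{Var}[N_{t+1}/(t+1)]$ by applying a conditioning argument to (9), i.e., from

$$\frac{N_{t+1}}{t+1} = \frac{t}{t+1} \left[ I + \frac{\Phi(R+\sigma I)}{(1+\sigma)\,t} \right] \frac{N_t}{t} + \frac{1}{t+1}\,e + \frac{1}{t+1}\,\Phi\,\varepsilon_{t+1},$$

compute

$$E\left( \operatorname{Var}\left\{ \frac{t}{t+1} \left[ I + \frac{\Phi(R+\sigma I)}{(1+\sigma)\,t} \right] \frac{N_t}{t} + \frac{1}{t+1}\,e + \frac{1}{t+1}\,\Phi\,\varepsilon_{t+1} \,\Big|\, G_t \right\} \right) = \frac{1}{(t+1)^2}\,\Phi\,E(\Omega_{t+1})\,\Phi^T$$

and

$$\operatorname{Var}\left( E\left\{ \frac{t}{t+1} \left[ I + \frac{\Phi(R+\sigma I)}{(1+\sigma)\,t} \right] \frac{N_t}{t} + \frac{1}{t+1}\,e + \frac{1}{t+1}\,\Phi\,\varepsilon_{t+1} \,\Big|\, G_t \right\} \right) = \left( \frac{t}{t+1} \right)^2 \left[ I + \frac{\Phi(R+\sigma I)}{(1+\sigma)\,t} \right] \operatorname{Var}\left( \frac{N_t}{t} \right) \left[ I + \frac{\Phi(R+\sigma I)}{(1+\sigma)\,t} \right]^T,$$

where $\Omega_{t+1} = \operatorname{diag}(\eta_{t+1}) - \eta_{t+1}\eta_{t+1}^T$ with $\eta_{t+1} = E(J_{t+1} \mid G_t) = (1+\sigma)^{-1}(R+\sigma I)(N_t/t)$. Let $\operatorname{Var}[N_t/t] = \Lambda_t/t$. The above computation leads to

$$\frac{\Lambda_{t+1}}{t+1} = \frac{1}{(t+1)^2}\,\Phi\,E(\Omega_{t+1})\,\Phi^T + \left( \frac{t}{t+1} \right)^2 \left[ I + \frac{\Phi(R+\sigma I)}{(1+\sigma)\,t} \right] \frac{\Lambda_t}{t} \left[ I + \frac{\Phi(R+\sigma I)}{(1+\sigma)\,t} \right]^T,$$

which, with $B = (1+\sigma)^{-1}\Phi(R+\sigma I)$, implies

$$\Lambda_{t+1} = \frac{t}{t+1}\,\Lambda_t + \frac{1}{t+1} \left[ \Lambda_t B^T + B\Lambda_t + \Phi\,E(\Omega_{t+1})\,\Phi^T \right] + \frac{1}{t(t+1)}\,B\Lambda_t B^T. \tag{33}$$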
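The step from the variance decomposition to (33) is a direct expansion. Writing $B = (1+\sigma)^{-1}\Phi(R+\sigma I)$, which is the reading of $B$ implied by matching (33) with the preceding display, the algebra can be spelled out as follows.

```latex
\begin{align*}
\Big(\frac{t}{t+1}\Big)^{2}\Big[I+\frac{B}{t}\Big]\frac{\Lambda_t}{t}\Big[I+\frac{B}{t}\Big]^{T}
  &= \frac{t}{(t+1)^{2}}\Big[\Lambda_t+\frac{1}{t}\big(B\Lambda_t+\Lambda_t B^{T}\big)
     +\frac{1}{t^{2}}\,B\Lambda_t B^{T}\Big],\\
(t+1)\cdot\frac{t}{(t+1)^{2}}\Big[\,\cdots\Big]
  &= \frac{t}{t+1}\,\Lambda_t+\frac{1}{t+1}\big(B\Lambda_t+\Lambda_t B^{T}\big)
     +\frac{1}{t(t+1)}\,B\Lambda_t B^{T},\\
(t+1)\cdot\frac{1}{(t+1)^{2}}\,\Phi E(\Omega_{t+1})\Phi^{T}
  &= \frac{1}{t+1}\,\Phi E(\Omega_{t+1})\Phi^{T}.
\end{align*}
```

Adding the last two lines reproduces the right-hand side of (33).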

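Using the Kronecker product, (33) can be expressed as

$$\operatorname{vec}(\Lambda_{t+1}) = \frac{t}{t+1}\operatorname{vec}(\Lambda_t) + \frac{1}{t+1}\left( I \otimes B + B \otimes I + \frac{1}{t}\,B \otimes B \right)\operatorname{vec}(\Lambda_t) + \frac{1}{t+1}\operatorname{vec}(Y_{t+1}), \tag{34}$$

where $Y_{t+1} = \Phi\,E(\Omega_{t+1})\,\Phi^T$. Consider $\Lambda = \Lambda B^T + B\Lambda + \Phi\Omega\Phi^T$ in the form of

$$\operatorname{vec}(\Lambda) = (I \otimes B + B \otimes I)\operatorname{vec}(\Lambda) + \operatorname{vec}(Y), \tag{35}$$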


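where $Y = \Phi\Omega\Phi^T$. Taking the difference between (34) and (35) gives

$$\begin{aligned}
\operatorname{vec}(\Lambda_{t+1}) - \operatorname{vec}(\Lambda)
&= \left[ \frac{t}{t+1}\,J + \frac{1}{t+1}(I \otimes B + B \otimes I) \right] (\operatorname{vec}(\Lambda_t) - \operatorname{vec}(\Lambda)) \\
&\quad + \frac{1}{t(t+1)}(B \otimes B)\operatorname{vec}(\Lambda_t) + \frac{1}{t+1}(\operatorname{vec}(Y_{t+1}) - \operatorname{vec}(Y)) \\
&= H_t(\operatorname{vec}(\Lambda_t) - \operatorname{vec}(\Lambda)) + \vartheta_{t+1} + \xi_{t+1},
\end{aligned} \tag{36}$$

where $J$ is the identity matrix of dimension $(m+1)^2$, $\vartheta_{t+1} = [t(t+1)]^{-1}(B \otimes B)\operatorname{vec}(\Lambda_t)$, $\xi_{t+1} = (t+1)^{-1}(\operatorname{vec}(Y_{t+1}) - \operatorname{vec}(Y))$ and

$$H_t = \frac{t}{t+1}\,J + \frac{1}{t+1}(I \otimes B + B \otimes I)$$

is a symmetric matrix of dimension $(m+1)^2$. Since the eigenvalues of $B$ are $\{-(\sigma+i)(1+\sigma)^{-1} : i = 0, \ldots, m\}$, the eigenvalues of $H_t$ are

$$\frac{t}{t+1} - \frac{2(\sigma+i)}{(t+1)(1+\sigma)} = \frac{t - 2i + \sigma(t-2)}{(t+1)(1+\sigma)}$$

for $i = 0, \ldots, m$. That implies the induced $L^2$ norm of $H_t$ is given by

$$\|H_t\| = \frac{t + \sigma(t-2)}{(t+1)(1+\sigma)} < \frac{t}{t+1}. \tag{37}$$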
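The limiting equation (35) is linear in vec(Λ), so for small m it can be solved directly by rearranging it to (J − I⊗B − B⊗I) vec(Λ) = vec(Y). The sketch below does this with a plain Gaussian elimination and then checks the residual of Λ = ΛB^T + BΛ + Y. The matrices B and Y are hypothetical 2×2 placeholders (B is lower triangular with the eigenvalues −(σ+i)/(1+σ) on its diagonal), not quantities taken from the paper, and the dense Kronecker solve is an illustration rather than the fast algorithm the paper refers to.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def solve_limit_equation(B, Y):
    """Solve Lambda = Lambda B^T + B Lambda + Y via column-stacking vec."""
    n = len(B)
    eye = [[float(i == j) for j in range(n)] for i in range(n)]
    K1, K2 = kron(eye, B), kron(B, eye)   # vec(B L) and vec(L B^T) operators
    A = [[float(r == c) - K1[r][c] - K2[r][c] for c in range(n * n)]
         for r in range(n * n)]
    vec_y = [Y[i][j] for j in range(n) for i in range(n)]   # column-major
    x = solve(A, vec_y)
    return [[x[j * n + i] for j in range(n)] for i in range(n)]


def residual(B, Y, Lam):
    """Largest entry of |Lambda - (Lambda B^T + B Lambda + Y)|."""
    n = len(B)
    Bt = [list(col) for col in zip(*B)]
    LBt, BL = matmul(Lam, Bt), matmul(B, Lam)
    return max(abs(Lam[i][j] - LBt[i][j] - BL[i][j] - Y[i][j])
               for i in range(n) for j in range(n))


sigma = 0.172
B = [[-sigma / (1 + sigma), 0.0], [0.1, -1.0]]   # hypothetical example
Y = [[0.2, -0.1], [-0.1, 0.3]]                   # hypothetical symmetric Y
Lam = solve_limit_equation(B, Y)
res = residual(B, Y, Lam)
print("Lambda =", Lam)
print("residual =", res)
```

The linear system has (m+1)^2 unknowns, so this approach scales poorly with m; dedicated Sylvester solvers such as the Bartels and Stewart (1972) algorithm listed in the references avoid forming the Kronecker matrix.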

Let $U_t = \operatorname{vec}(\Lambda_{t+1}) - \operatorname{vec}(\Lambda)$. The recursion (36) gives

$$\|U_{m+t}\| \le \|H_{m+t-1} H_{m+t-2} \cdots H_m\|\,\|U_m\| + \sum_{j=m}^{m+t-2} \left( \prod_{r=j+1}^{m+t-1} \|H_r\| \right) \|\vartheta_j + \xi_j\| + \|\vartheta_{m+t-1} + \xi_{m+t+1}\|. \tag{38}$$

By (37), the first term of (38) can be further bounded by

$$\|H_{m+t-1} H_{m+t-2} \cdots H_m\|\,\|U_m\| \le \frac{m}{m+t}\,\|U_m\| \to 0 \quad \text{as } t \to \infty.$$

The rest of (38) is bounded by

$$\begin{aligned}
&\sum_{j=m}^{m+t-2} \left( \prod_{r=j+1}^{m+t-1} \|H_r\| \right) \|\vartheta_j + \xi_j\| + \|\vartheta_{m+t-1} + \xi_{m+t+1}\| \\
&\quad \le \sum_{j=m}^{m+t-2} \left( \prod_{r=j+1}^{m+t-1} \frac{r}{r+1} \right) \|\vartheta_j + \xi_j\| + \|\vartheta_{m+t-1} + \xi_{m+t+1}\| \\
&\quad = \sum_{j=m}^{m+t-2} \frac{j+1}{m+t}\,\|\vartheta_j + \xi_j\| + \|\vartheta_{m+t-1} + \xi_{m+t+1}\| \\
&\quad \le \frac{1}{m+t} \sum_{j=m}^{m+t-1} \left[ \left( \frac{1}{j} \right) \|B \otimes B\|\,\|\operatorname{vec}(\Lambda_{j-1})\| + \left( \frac{j+1}{j} \right) \|\operatorname{vec}(Y_j) - \operatorname{vec}(Y)\| \right].
\end{aligned} \tag{39}$$

Since $\|\operatorname{vec}(\Lambda_{j-1})\|/j \to 0$ and $\|\operatorname{vec}(Y_j) - \operatorname{vec}(Y)\| \to 0$ as $j \to \infty$ by Theorem 1, the bound in (39) tends to zero as $t \to \infty$ by the Cesàro Mean Convergence Theorem (see p. 250 of Loève (1977)). Thus, $U_{m+t} \to 0$ in (38) as $t \to \infty$, i.e., $\operatorname{vec}(\Lambda_t) \to \operatorname{vec}(\Lambda)$ or $\Lambda_t \to \Lambda$.

References

Aliev, F.A., Larin, V.B., 1998. Optimization of Linear Control Systems: Analytical Methods and Computational Algorithms. Gordon and Breach Science Publishers, Amsterdam, the Netherlands.
Anderson, E.W., McGrattan, E.R., Hansen, L.P., Sargent, T.J., 1996. Mechanics of forming and estimating dynamic linear economies. In: Handbook of Computational Economics, Vol. 1. pp. 171–252.
Barabási, A.L., Albert, R., 1999. Emergence of scaling in random networks. Science 286 (5439), 509–512.
Barabási, A.L., Albert, R., Jeong, H., 2000. Scale-free characteristics of random networks: the topology of the world-wide web. Physica A 281 (1), 69–77.
Bartels, R.H., Stewart, G.W., 1972. Algorithm 432, Solution of the matrix equation AX + XB = C. Commun. ACM 15, 820–826.


Bojkov, R., Bishop, L., Hill, W.J., Reinsel, G., Tiao, G., 1990. A statistical trend analysis of Dobson total ozone data over the northern hemisphere. J. Geophys. Res. 95, 9785–9807.
Calvetti, D., Reichel, L., 1996. Application of ADI iterative methods to the restoration of noisy images. SIAM J. Matrix Anal. Appl. 17 (1), 165–186.
Chan, N.H., 1999. The ET interview: Professor George C. Tiao. Econometric Theory 15, 389–424.
Cheung, K.C., 2016. A Random Directed Multigraph and its Inference (Ph.D. thesis). The Chinese University of Hong Kong.
Deijfen, M., van den Esker, H., van der Hofstad, R., Hooghiemstra, G., 2009. A preferential attachment model with random initial degrees. Ark. Mat. 47, 41–72.
Duflo, M., 1997. Random Iterative Models. Springer-Verlag, Berlin.
Gantmacher, F.R., 1960. The Theory of Matrices, Volume One. Chelsea Pub, New York.
Gao, F., Vaart, van der A.W., 2017. On the asymptotic normality of estimating the affine preferential attachment network models with random initial degrees. Stochastic Process. Appl. 127, 3754–3775.
Gardiner, J.D., Laub, A.J., Amato, J.J., Moler, C.B., 1992. Solution of the Sylvester matrix equation AXB^T + CXD^T = E. ACM Trans. Math. Software 18 (2), 223–231.
Hansen, L.P., Sargent, T.J., 2008. Robustness. Princeton University Press, New Jersey.
Hansen, L.P., Sargent, T.J., 2014. Recursive Models of Dynamic Linear Economies. Princeton University Press, New Jersey.
Hofstad, van der R., 2016. Random Graphs and Complex Networks. Cambridge University Press, Cambridge.
Kaufmann, H., 1987. On the strong law of large numbers for multivariate martingales. Stochastic Process. Appl. 26, 73–85.
Lehmann, E.L., Romano, J., 2005. Testing Statistical Hypotheses, third ed. Springer-Verlag, New York.
Loève, M., 1977. Probability Theory I, fourth ed. Springer-Verlag, New York.
Móri, T.F., 2005. The maximum degree of the Barabási-Albert random tree. Combin. Probab. Comput. 14 (03), 339–348.
Resnick, S.I., 2007. Heavy-Tail Phenomena. Springer-Verlag, New York.
Resnick, S.I., Samorodnitsky, G., 2016. Asymptotic normality of degree counts in a preferential attachment model. Adv. Appl. Probab. 48, 283–299.
Sylvester, J., 1884. Sur l’equations en matrices px = xq. C. R. Acad. Sci. Paris 99 (2), 67–71, 115–116.
Trentelman, H.L., Stoorvogel, A.A., Hautus, M.L.J., 2001. Control Theory for Linear Systems. Springer-Verlag, London.
Wan, P., Wang, T., Davis, R.A., Resnick, S.I., 2017. Fitting the linear preferential attachment model. Electron. J. Stat. 11 (2), 3738–3780.
