Asymptotic normality of graph statistics

Asymptotic normality of graph statistics

Journal of Statistical Planning and Inference 21 (1989) 209-222 209 North-Holland ASYMPTOTIC NORMALITY Krzysztof NOWICKI Department of Mat...

766KB Sizes 2 Downloads 103 Views

Journal

of Statistical

Planning

and Inference

21 (1989) 209-222

209

North-Holland

ASYMPTOTIC

NORMALITY

Krzysztof

NOWICKI

Department

of Mathematical

Received

27 February

Recommended

Abstract:

Statistics,

OF GRAPH

University

1986; revised manuscript

of Lund,

received

STATISTICS

Box 118, S-221 00 Lund,

11 February

Sweden

1988

by 0. Frank

Various

types of graph

statistics

for Bernoulli

graphs

are represented

as numerators

of

incomplete U-statistics. Asymptotic normality of these statistics is proved for Bernoulli graphs in which the edge probability is constant. In addition it is shown that subgraph counts asymptotically AMS

are linear

Subject

functions

of the number

Classification:

Key words:

Random

graphs;

U-statistics;

Markov

graphs.

of edges in the graph.

Primary

05C99;

Secondary

subgraph

counts;

induced

62699, subgraph

92A20 counts;

U-statistics;

incomplete

1. Introduction The statistical analysis of graph data is often based on subgraph counts. In some cases, subgraph counts are sufficient statistics, as, for example, in the case of certain Markov graphs, see Frank and Strauss (1986). However, for other models and for empirical graph data, subgraph counts are merely used to describe structural properties. Kendall and Babington Smith (1940) and Moran (1947) considered measures of inconsistency and agreement between judges in paired comparisons. Several authors have proposed and investigated indices for measuring transitivity of contacts and friendship choices, e.g. Holland and Leinhardt (1971, 197.5), Harary and Kommel (1979), Frank (1980) and Frank and Harary (1982). In random graphs, the subgraph counts are random variables with complicated mutual dependencies. Even for very simple graph models, studies of their asymptotic distributions have been restricted to special classes of subgraphs. Moon (1968) studied the distribution of the numbers of 3- and 4-cycles in a random tournament and proved their asymptotic normality. For the most common random graph model, so-called Bernoulli graphs (a graph in which edges occur independently with a common probability p), subgraph counts were investigated for so-called balanced subgraphs. The classical reference is two papers by Erdos and RCnyi (1959, 1960). Further references can be found in the text books by Bollobas (1985) and Palmer (1985) and in the survey articles by Karonski (1982, 1984). 0378-3758/89/$3.50

0

1989, Elsevier

Science Publishers

B.V. (North-Holland)

210

K. Nowicki

/ Normality

of graph statistics

In statistical analysis, Bernoulli graphs can be useful as representations of pure chance phenomena; they can thus be used to test if empirical graph data are generated from some more complicated model than a Bernoulli graph (Frank, 1980). Furthermore, Bernoulli graphs have been used to describe networks of irregularly appearing contacts between vertices; see Frank (1979). Barlow and Proschan (1975) considered reliability problems by using Bernoulli graphs. Holland and Leinhardt (1975, 1979) and Wasserman (1977) considered moments of dyad and triad counts (subgraph counts of order 2 and 3) for simple random graph models, while Frank (1979) gave the first two moments of subgraph counts of arbitrary order for general random graph models. The purpose of this paper is to study the asymptotic behaviour of subgraph counts for Bernoulli graphs in which the edge probability p is constant. It is proved that subgraph counts are asymptotically normally distributed. In addition, we show that subgraph counts asymptotically are linear functions of the number of edges in the graph. This allows us to obtain simultaneous distributions for multiple counts directly from distributions of single counts. In Section 2 we will give basic concepts and notations. Section 3 presents some results from the theory of U-statistics. In Section 4 we will give the asymptotic distribution of a single subgraph count with certain extensions in Section 5. Section 6 is devoted to investigations of the simultaneous distribution of subgraph counts, as well as their application to testing a Bernoulli graph versus a ?/Iarkov graph. Finally, in Section 7 we study induced subgraph count statistics, both for a single count and for simultaneous counts.

2. Basic concepts

and notations

LetN={l,..., n} be a set of elements called vertices, and let E be a subset of the collection of n’= (;) unordered pairs of vertices. The elements of E are called edges. A graph G consists of an ordered pair (N, E) of sets of vertices and edges. The number of vertices is called the order of G while the number of edges is called the size of G. The size of G can be at most n’ and a graph on n vertices with size n’ is called a complete graph and is denoted K,,. A subgraph H= (V, 0 of G is itself a graph whose vertex set V is a subset of N, and edge set r is a subset of E. Moreover, if every pair of vertices u and u of His connected by an edge in H whenever it is connected by an edge in G, then H is an induced subgraph of G. Two graphs G and Hare isomorphic if there is a one-to-one function Y from the vertex set of G onto the vertex set of H, such that for any two vertices t and u of G we have an edge between t and u in G if and only if we have an edge between Y(t) and Y(u) in H. We let t(G,H) denote the number of subgraphs of G which are isomorphic to H. In particular, let G= K, and let H be a graph of order u, u < n. Then, we have that b(K,, H) = (E)t(K,,, H).

K. Nowicki

3. Asymptotic

normality

/ Normality

of an incomplete

211

of graph statistics

U-statistic

Hoeffding (1948) considered a sequence of independent identically distributed function g(x,, . . . ,xk), random variables (i.i.d. T.v.) X,, . . . , X, and a symmetric which we call a kernel. He estimated the parameter B =E(g(X,, . . . ,X,)) by

where

the g;‘s are obtained

by inserting

the (t) possible

X;,, **., Xjp 1 Sj, < ...
~UwXl,

ordered

subsequences

of the gj being arbitrary.

The

(3.1)

. ..>&)IX.N

and suppose that o*>O. Using Hajek’s projection method it can be proved that fi(U,0) is asymptotically normally distributed with mean 0 and variance k2a2. In order to calculate U, we have to evaluate the sum of (z) terms. To reduce these calculations Blom (1976) introduced an incomplete U-statistic (3.2) Denote by S the set of the m subsequences Xj,, . . . ,XJk in the g;‘s. Due to the strong dependence between many of the gi’s in U. we can often choose S so that U is asymptotically equivalent to U,. This is shown by the following theorem, which was essentially given by Blom (1976). The m subsequences Xj,, . . . , Xjp appearing in S, form m2 pairs. Let f,, c= 1, . . . , k, be the number of pairs with c X-values in common, and set pc=fc/m2. Then we have Theorem

1. Suppose that for m/n + 00,

p1 = k*/n + o( 1/n)

(3.3a)

;i2Pj=Wn).

(3.3b)

and

Then (A) &(U- 0) is asymptotically normally distributed with mean 0 and variance k*a*. (B) nE(U- U*)2+0 as n-+m, where U*=(k/n)

i h(X,)-(k-1)0 i=l

and h(X;)=E(g(X;,,...,Xj~)IXi,=xj),

Isi,<

...
212

K. Nowicki

/ Normality

of graph statistics

Proof. For the proof of part (A) we refer the reader to Blom (1976). To prove part (B) we write

E(UFurthermore,

U*)2=E(U-

as proved

nE(U,and,

as proved

CQ2+E(U,-

by Hoeffding

U*)2+0,

U*)}.

(1948) we have that

by Blom (1976), under

by invoking

U,)(U,-

as n-a,

nE(U - U,,)2--f 0, Finally,

U*)2+2E{(U-

the assumption

of our theorem

as ~1403.

the Minkowski

inequality

we obtain

(B).

q

We shall now give the two other conditions that ensure the asymptotic normality of U. Let a,, t=l,..., n, denote the number of gi’s in U which contain X,, and a,, the number of g,‘s which contain both X, and X,,. Then we have: Theorem

2. If

5 af/m’= k2/n + o(l/n)

(3.4a)

I=1

and afu/m2=o(l/n),

c

(3.4b)

15l
then the conditions (3.3a) and (3.3b) of Theorem 1 ure satisfied. Proof. Consider the g;‘s appearing in U. There are Q: pairs (gi,gj) which have X, We seek the in common and a:, p airs which have both X, and X, in common. number f, + ... + fk of pairs with at least one X-value in common. Clearly, we have the double inequality

,~,a:-,_,~~~~a~~~f~+...+fh~,~~a:. Similary

it is found f2 +

Dividing

..,

that +fkl

c I~/
these inequalities

4,

by m2 and using the conditions

P1f ... +P,=k2/n+o(l/n), Hence

the conditions

...

(3.3a) and (3.3b) are satisfied.

In the sequel we shall be particularly ,?I

(2,

T,=cg, i= I

P2+

and

T=zg,

i=l

interested

of the theorem

+Pk=o(l/n). q in the numerators

we have

K. Nowicki

/ Normality

of U, and U. We call these quantities

complete

and incomplete

tively. All asymptotic results for u0 and U hold, of course, multiplication by a suitable factor.

4. Subgraph

213

of graph statistics

T-statistics,

respec-

also for T0 and T after

counts

Consider a Bernoulli graph G,, on a finite set { 1, . . . , n} of vertices, i.e. a graph where edges occur independently with the same probability p, O
= 121

1 0 i

if an edge occurs otherwise.

between

vertices

t and u,

We want to count the number of subgraphs of G,, with prescribed properties. For this purpose we may use the theory of U-statistics. We begin with a simple problem leading to a complete T-statistic: count the number of all subgraphs with s edges. In what follows, we consider only subgraphs without isolated vertices, i.e. subgraphs defined by specifying their set of edges. Subgraph counts for arbitrary graphs are then obtained by multiplying these subgraph counts by a suitable factor. For this purpose introduce the symmetric function g(x,, . . . . x,)=x,x,...x,, and the corresponding ”

complete

T-statistic

T, = ‘i’ gi .

(4.1)

i= 1

Here the gj’s are obtained by inserting in g all possible s-subsequences taken from the sequence {X,,}. Clearly, gi= 1 if all X,, corresponding to the i-th subsequence are equal to 1. Hence T, counts the number of all subgraphs with s edges. It follows from Hoeffding (1948) that T, is asymptotically normally distributed. In this case we have 6 = E(g) =ps. Moreover,

using a*=

Hence

(3.1) we obtain

V(E(gjX,,))

we have proved

Theorem

(4.2a) that

= V(p”P1X,,)=p2sP1(1

the following

-a).

(4.2b)

theorem:

3. For T, defined as before,

Ilti;(T,/(“,‘) -P”) is asymptotically

normally distributed with mean 0 and variance s2ptiP ‘(1 -p).

214

K. Nowicki

/ Normality

We now consider a fixed subgraph T, = t(Gn,p, H) of subgraphs of G,, Introduce an incomplete T-statistic

of graph statistics

H of order isomorphic

v and size s and seek the number to H.

r(k’,Z, H) TH=

1

(4.3)

ST;

i=l

with the same kernel g as in (4.1) and with t(K,, H) being the number of subgraphs isomorphic to H in K,,. By suitable choice of a set SH consisting of t(K,, H) ssubsequences from {XtU} and inserting these subsequences in g, TH counts the number of subgraphs isomorphic to H. For example, take s = 4, n = 4 and let H be a 4-cycle, i.e. an undirected graph on four vertices in which each vertex is connected with two other vertices. Set T,=g(X,,,X

13,X,,,X,,)+g(X,,,X,,,X23,X34)+g(X13,X,4,X23,X24).

Clearly, TH counts the number of 4-cycles in G4,p. For any given H the incomplete T-statistic in (4.3) is asymptotically normally distributed. To prove this we shall check that the conditions of Theorem 2 are satisfied for n = ~1’. Let a,, be the frequency of the single variable X,, in SH; also let a,,,+,, be the frethe frequency of the pair (X,,,X,,). Due quency of the pair (X,,,X,,,,); and a,,.,, to the invariance of the structure of SH when vertices are relabeled, the frequences a lu, a IL!,10;9 a Iu,uI(,do not depend on the indices. Let us therefore call them c, , c2, cj, respectively. It is immediately seen that c1 =st(K,, H)/n’; taking m = t(K,, H), a,=a,, in condition (3.4a) we see that this condition is fulfilled. We now turn to condition (3.4b). Its left side takes the form (3(:)c;

+ 3(;)c:)/(%%

H))2,

(4.4)

where we have used the fact that there are 3(t) pairs of type (X,,,X,Z) and 3(t) pairs of type (X,,,X,,). In order to find c2 and c3 denote the order of H by v and note that a pair (XrU,XwZ) can be completed to a subsequence of SH in al(zIj) ways, where a, depends on the form of H but not on n. Moreover, a pair (X,,, X,,) can be completed to a subsequence of SH in a2(EIi) ways, even here a2 depends on the form of SH but not n. Hence c2 is of order nvm4, and c3 of order noP3. Noting that t(K,, H) is of order no we see from (4.4) that the left side in condition (4.3b) is of order K3. But n’ =(z). Thus the left side of (3.4b) converges faster than l/n’ and hence (3.4b) is satisfied for n = n’. We now observe that we still have 8 and o2 as in (4.2ab). Thus, from Theorem 1 we obtain: Theorem 4. Let TH count subgraphs order v in G,,. Then @(T,/t(K,,

H) -

p”)

isomorphic

to a given graph H of size s and

K. Nowicki / Normality of graph statistics

is asymptotically

215

normally distributed with mean 0 and variance ~~p~~~‘(l -p).

Let us now investigate the asymptotic distributions for some special subgraph count statistics. Frank and Strauss (1986) showed that the number of triangles and the number of k-stars for k = 1, . . . ,n - 1 are sufficient statistics for any homogeneous Markov graph of order n. The distributions of these statistics in the case of a Bernoulli graph can be obtained from Theorem 4, namely: Corollary

tf T is the number of triangles, we have that

5.

@(T/(Y) is asymptotically

-p3)

normally distributed with mean 0 and variance 9(p5 -p6).

Corollary 6. If S, is the number of k-stars, where a k-star has order k+ 1 and size k, we have that

1/55;(W((,“,,)(k+ is asymptotically

1))-ak)

normally distributed with mean 0 and variance k2(p2kP ’ -Pan).

5. Extensions The ideas developed in Section 4 can be generalized to counting the number of all subgraphs isomorphic to at least one of a given set of graphs H,, . . . , HL, each of size s. As before, we consider only graphs without isolated vertices. For this purpose introduce an incomplete T-statistic

T= f g;

(5.1)

i=l

with

the same kernel g(xr, . . . ,x,) =x,x2 “.x, as before. Let S be the union of in S. Clearly, T counts SH,, .*. , S, and let m denote the number of subsequences the number of subgraphs isomorphic to at least one of HI, . . . , HL. Now, checking the conditions of Theorem 2 for S we prove that T is asymptotically normally distributed. First, a,, is the sum of a:,, . . . , a; which are frequences of X,, in SH,, . . . ,SHI, respectively. Hence a?, = (W,,

HI) + ... + t(K,,, HtA))2s2/n’2

and condition (3.4a) in Theorem 2 is fulfilled. Second, the left side of condition (3.4b) takes the form <3(~)c~ + 3(i)cf)/m2 where now the orders

of magnitude

of c2, c3 and m are determined

by the highest

216

K. Nowicki

/ Normality

of graph

statistics

order among H,, . . . , HL. Hence condition (3.4b) holds true because it is true for each S,. Thus we have proved that T in (5.1) is asymptotically normally distributed and the following theorem can be formulated:

Theorem 7. Let T count the number of all subgraphs isomorphic to at least one of a given set of graphs H,, . . . , HL, each of size s in G,,. Then fi(T/m

-p”),

where m = t(K,, H,) f ... + t(K,,, HL), is asymptotically

normally distributed with mean 0 and variance s2p2’-‘(1 -p).

In particular, let us count all subgraphs obtain the following corollary to Theorem Corollary

in G,, 7:

8. Let T count all subgraphs in G,,

of order

u and size s. Then

we

of order v and size s, where s> (‘2 ‘).

Then fi(T/m

-p”),

where m = (C) (‘) (

is asymptotically

6. Asymptotic

,

S )

normally distributed with mean 0 and variance s’~~‘~‘(l -p).

linear dependency

between subgraph

counts

For the purpose of investigating the asymptotic distributions of multiple counts and linear combinations of subgraph counts the following theorem 9. Let Tn count subgraphs in G,, order v and size s. Then, it holds that

Theorem

subgraph is useful:

isomorphic to a given graph H of

E(@(T,/t(K,,H)-p”)-fl(spSP’R/n’-spS))2+0,

as n-)m,

where R is the number of edges in G,,,. Proof.

By Theorem

l(B)

we have

that

fl(Tn/t(K,,,H)-pS)

is asymptotically

equal (in mean square) to n’ (s/n’) 6( Furthermore,

c l~i
we have that

(6.1)

K. Nowicki

Thus,

/ Normality

217

of graph statistics

(6.1) is equal to

R = 1 ,SicjSn Xij we obtain

and by substituting

Consider now T=(T,,..., assumed to count subgraphs respectively. The following Corollary

10.

T,.) to be a vector

0

the result of the theorem. of r subgraph

counts.

isomorphic to a given graph Hj of order result is obtained from Theorem 9:

Each Vj

Tj is

and size Sj,

The random vector (F,, . . . , F,.) where

~=~(Tj/(t(K,,Hj)-pSJ),

r,

j=l,...,

is asymptotically normally distributed with mean vector 0 and covariance matrix C=(Cij)ij=, of rank 1. Here Cij= CiCj,

where

i= 1, . . . . r.

Ci=s;p’l~~,

Proof.

By Theorem 4 each component of the considered vector is asymptotically to prove that a linear combination normally distributed. Thus, it remains CT= 1aj Fj is asymptotically normally distributed. First, we introduce

Xj=il;.-~(s,pSJ-‘R/n’-sjpSJ),

j=l,...,r,

Fj and its projection on {X,, 1
i.e. the distance

between

EXj = 0 and, from Theorem

Thus,

we have that E j$, ajc-@P(R/n’-p)

‘-0

as n-+03

>

where

Finally, since R E Bin(n’,p) we have that fl(R/n’-p) are asymptotically normally distributed.

and consequently

CJ= 1CXjTj

218

K. Nowicki

To calculate

the covariance

/ Normality

matrix

of graph siatistics

we write

C,:i=C(ai+biR,aj+bjR)=b;bjVar(R). since b, = sips’ ‘/fl, bj = sips/ ’ /@ and Var(R)=n’p(l -p) we obtain the covariance matrix. Finally, since each component of the vector is a multiple of R we obtain that the rank is equal to 1. 0

Hence,

Another application of Theorem 9 is in the investigation of linear combinations of subgraph counts. We will consider the example of testing a Bernoulli graph against a Markov graph. Here, we devote ourselves to a Markov graph model used by Frank and Strauss (1986) with probability function P,(G)=c~‘exp(~r+as+rf), where r, s, f are equal to the number of edges, 2-stars and triangles in graph G, respectively. This model captures both transitivity and clustering in graph data and can be seen as a natural generalisation of a Bernoulli graph model. In fact if o = r = 0, P&G) reduces to a probability function of a Bernoulli graph G,, which will be denoted by P,(G) Consider

=p’(l

-p)“‘-’

where

p = e@/(l + ee).

now the hypothesis H,:

P,(G)

= c-l exp(pr)

H,:

P,(G)=c,‘exp(elr+a,s+slt)

against By using the where e, ei, o, and ri are specified values of the parameters. Neyman-Pearson lemma we obtain that the critical region of the likelihood test is of the type (el-~)r+a,s+s~r>k. Now, let R, S and T be the number of edges, 2-stars tively. Then, from Theorem 9 we obtain that S=2p(n-2)R-3p2(;)+&,

and

and triangles

in Gn,p, respec-

T=p’(n-2)R-2p3(;)+&,

where r.v. e, and E, are such that E(E,) = E(E,) = 0 Consequently,

under ((ei -e)

and

Ha the critical + 2adn

from which we finally

obtain

I’(&,) = 0(n2), region

- 2) + tlp2(n

I/

= 0(n2).

is approximately - 2))R > k+ 3alp2(;)

that the test is of the following

R>k,

if (~~-@)+2a,p(n-2)+r,p~(rz-2)>0,

R
if (~~-~)+2o,p(n-2)+s,p~(n-2)<0.

+ 2rlp3(;), type:

K. Nowicki

/ Normality

219

of graph statistics

7. Induced subgrapb counts Consider,

as before, G,, to be a Bernoulli graph. We want to count the number subgraphs of G,,, isomorphic to a given graph H of order u and size s. We assume that H does not have isolated vertices; for graph H with isolated vertices we count instead induced subgraphs of G, 1up isomorphic to the complement of H. Since an induced subgraph of G,, having ii, . . . , i,, say, as vertices, is either isomorphic to H or nonisomorphic to H we can represent TEd as

Tkd of induced

rid=

c l
g(Xi,!*, ..*3XjL,~,i,,)

(7.1)

where 1 g(X,,+

if the subgraph of G,, induced i,, . . . , i, is isomorphic to H, otherwise.

. . . ,X;,_ , ;,) = 1 0

For instance,

take H to be a cycle of size 4 and order g(X@ Xik, Xi,, XJk, xjh

xk/)

=xijxi,xjkxk,(l

4. Then, -

+xi,x;kx~,xk/(l + xjkxi[xjkxj/(

by

Xik)(l

-

-X,/)(1

xj/>

- xjk)

1 - X0)( 1 - Xk,).

It is easy to verify that g is no longer a symmetric function and consequently Tgd is not a numerator of an incomplete U-statistic. However, T,$‘”can be written as a linear combination of subgraph counts for all subgraphs Hi such that H C H; c K,. Namely,

Tkd=

c (- l)S’-St(H;, H)T,, HCH,CK,,

where Si is the size of Hi and TH, is the number Hi. Thus we arrive at the following theorem:

(7.2) of subgraphs

of G,,

isomorphic

to

Theorem 11. Let Tk” count induced subgraphs in G,, isomorphic to a given graph H of order v and size s. Then

I/TI;(rid/(:) is asymptotically (P(1 -P)).

- f3), where

0 = t(K,, H)pS(l -p)“‘-“,

normally distributed

with mean 0 and variance e2(s- (y)p)‘/

10 since Proof. Asymptotical normality of T2d follows from Corollary linear combination of subgraph counts of the same order. Moreover, the (7.1) since each tion of Tkd can be easily obtained from representation each with the g(X,,,, ... ,xiv_ ,i,) is a sum of t(K,,H) random indicators, tion ps(l -p)“‘-“. In order to obtain the variance we use (7.2) and write

I/;;r(T;;ld/(;)-6’)=@(

c HcH,cK,,

(-1,‘-‘t(Hj,H)TH,/(;)-8).

Tkd is a expectafunction expecta-

220

K. Nowicki / Normality of graph statistics

Thus we have to calculate n’Var (which,

,,Z,

by arguments n’Var

I- ”

(- W”WiJW,X)

as in the proof

>

of Corollary

10, is asymptotically

C (- l)S’~St(Hj,H)SiPSI~lf(Ku,H,)R/n’ ( HCH,CK,,

equal to

>

and since Var(R) = n’pq to

P4

C (

( - l)S’~St(Hj, H)s;pS’-’

2.

t(K”, Hi)

HcH,cK,,

(7.3)

>

Finally, this expression can be simplified Namely, we obtain that

by calculating

the expectation

13using (7.2).

e=Hc;cK (- l)S’~St(H,,H)t(K,,Hi)PSf - I- ” and consequently (dB/dp) =

2 (- l)S’~Sf(H;,H)t(K,,H;)~ipS~~l. HcH,cK,,

Hence, expression (7.3) can be written pq(de/dp)2 tained by evaluating the expression

and consequently

variance

is ob-

d(W,,WpS(l-p)“‘-S) dp Remark 12. Observe that when p = s/(y) the asymptotic variance in Theorem 11 is equal to 0 and we obtain only the convergence in probability to 0. Maehara (1987) proved, independently, Theorem 11 when p = 0.5. Let us now consider T=(T,,..., T,.) to be a vector of induced subgraph count statistics, in which each component Tj counts induced subgraphs Hj of order uj and size sJ, respectively. Then we can give the following theorem: Theorem

13. We have that (~(T,/(,:)-e,),...,~(T,/(G,)-er)

is asymptotically

normally

distributed

with mean vector 0 and covariance matrix

C = (Cij)ij= 1 of rank 1. Here

ej= t(K”,

H,)$‘(l

-p)“‘+

and C,= CiCj,

Ci=e;(Si-(;)P)/J/m,

i,j= 1, . . . . r.

K. Nowicki

Proof.

The asymptotic

to prove Corollary similar

normality

/ Normality

can be proved

10. In order to establish

to that used to prove Theorem

cjj = cov

221

of graph statistics

by a method

covariance

matrix

similar

It allows us to conclude

Il.

to that used

C we use a method that

H,c;iX.,, (- ~)““-“‘~(H~,H;)~~P~~--I~(K~,H~)R/~, (- l)“‘flPsJt(H,, Hj)s,pS’“- ’ t(K,, H,)R/fl c fl, r H,,,c K”

and after calculation

Finally,

the evaluation

we obtain

1

that

of this expression 7

gives the covariance

matrix.

0

Consider now T= (T,, T,, T,, T,) to be a vector of triad counts for a Bernoulli graph where the components T,. give the number of triads with r edges where

r=O , . . . ,3, i.e. T, is of order 3 and size r. Frank

(1979) obtained the first- and second-degree moments of triad counts. We can use Theorem 13 in order to generalize his results. The following corollary gives the asymptotic distribution of T: Corollary

14. Let T=(T,,,...,

fl(T4;)

T,) be a vector of triad counts in G,,.

Then

- e),

where 13=(& ,..., f?,),

f3,=(:)p’(l-p)3P’

for i=O ,..., 3,

is asymptotically normally distributed with mean vector 0 and covariance matrix C = (Cij):j=o where Cij = Ci Cj

and C,=(~)pi(l-p)3-i(3p-i)/ll(l-plp,

i,j=O,...,3.

Acknowledgement I am indebted to Professor Gunnar Blom and Professor Ove Frank for offering so much of their time for discussion, valuable suggestions and comments. I also want to thank Professor Svante Jansson who gave me valuable comments and Docent Daniel Thorburn who further extended my understanding of the asymptotic behaviour of subgraph counts.

222

K. Nowicki

/ Normality

of graph statistics

References Barlow, R.E. and F. Proschan (1975). Statistical Theory ofReliability and Life Testing. Holt, New York. Blom, G. (1976). Some properties of incomplete U-statistics. Biometrika 63, 573-580. BollobBs, ErdGs,

B. (1985).

Random

Graphs.

Academic

P. and A. Renyi (1959). On random

Press,

graphs

Erdiis, P. and A. Renyi (1960). On the evolution 5, 17-61. Frank, 0. (1979). Moment Sci. 319, 207-218.

properties

Frank,

0. (1980). Transitivity

Frank,

0. and F. Harary

Amer. Frank,

Statist.

Assoc.

0. and D. Strauss

P. and S. Leinhardt Studies 2, 107-124.

Holland,

P. and S. Leinhardt

(1979).

.I. Amer.

measures

J. Math.

Statist.

Insr. Hung. Acad. Ann.

Social.

indices

with asymptotically

normal

in structural

San Francisco,

New

Sci.

York Acad.

7, 199-213.

in empirical

graphs.

J.

Social.

6,

Seria Matematyka

M. and B. Babington

81, 832-842.

and balance.

J. Math.

distribution.

Ann.

models

in social networks.

Math.

of small groups.

Statist.

Comparitive

In: D. Heise, Ed., Sociological

CA.

Eds. (1979). Structural

Subgraphs

Assoc.

for transitivity

Sociometry

Academic Press, New York. A review of random graphs. J. Graph

M. (1984). Balanced

Kendall,

graphs.

Matrix

(1975). Local structure

1976. Jossey-Bass,

za w Poznaniu,

6, 290-297.

graphs.

by using ransitivity

(1971). Transitivity

P. and S. Leinhardt,

Karoriski,

Debrecen Publ. Math.

in stochastic

and digraphs.

inference

(1986). Markov

Holland, Group

and Commentaries. Karoriski, M. (1982).

graphs

Cluster

Hoeffding, W. (1948). A class of statistics 19, 293-325.

Holland,

counts

graphs.

77, 835-840.

Harary, F. and H. Kommel 199-210.

Methodology

of random

of subgraph

in stochastic

(1982).

New York.

1. Pubf. Math.

of Large Random

Social Networks,

Surveys,

Advances

Theory 6, 349-389. Graph.

Uniwersytet

im. Adama

Mickiewic-

nr. 7.

Smith

(1940).

On the method

of paired

comparisons.

of a random

graph.

Biometrika

31,

324-345. Maehara,

H. (1987).

309-312. Moon, J.W.

(1968).

hloran,

P.A.

Palmer,

E.M.

Wasrcrman, J. Math.

On the number

of induced

Topics on Tournaments.

(1947). On the method

Evolution.

S.S. (1977). Random

directed

Social.

5, 61-86.

Holt,

of paired

(1985). Graphical

subgraphs

graph

Math.

64,

New York.

comparisons.

Wiley,

Discrete

Biometrika

34, 363-365.

New, York.

distributions

and the triad census in social networks.