Journal
of Statistical
Planning
and Inference
21 (1989) 209-222
209
North-Holland
ASYMPTOTIC
NORMALITY
Krzysztof
NOWICKI
Department
of Mathematical
Received
27 February
Recommended
Abstract:
Statistics,
OF GRAPH
University
1986; revised manuscript
of Lund,
received
STATISTICS
Box 118, S-221 00 Lund,
11 February
Sweden
1988
by 0. Frank
Various
types of graph
statistics
for Bernoulli
graphs
are represented
as numerators
of
incomplete U-statistics. Asymptotic normality of these statistics is proved for Bernoulli graphs in which the edge probability is constant. In addition it is shown that subgraph counts asymptotically AMS
are linear
Subject
functions
of the number
Classification:
Key words:
Random
graphs;
U-statistics;
Markov
graphs.
of edges in the graph.
Primary
05C99;
Secondary
subgraph
counts;
induced
62699, subgraph
92A20 counts;
U-statistics;
incomplete
1. Introduction The statistical analysis of graph data is often based on subgraph counts. In some cases, subgraph counts are sufficient statistics, as, for example, in the case of certain Markov graphs, see Frank and Strauss (1986). However, for other models and for empirical graph data, subgraph counts are merely used to describe structural properties. Kendall and Babington Smith (1940) and Moran (1947) considered measures of inconsistency and agreement between judges in paired comparisons. Several authors have proposed and investigated indices for measuring transitivity of contacts and friendship choices, e.g. Holland and Leinhardt (1971, 197.5), Harary and Kommel (1979), Frank (1980) and Frank and Harary (1982). In random graphs, the subgraph counts are random variables with complicated mutual dependencies. Even for very simple graph models, studies of their asymptotic distributions have been restricted to special classes of subgraphs. Moon (1968) studied the distribution of the numbers of 3- and 4-cycles in a random tournament and proved their asymptotic normality. For the most common random graph model, so-called Bernoulli graphs (a graph in which edges occur independently with a common probability p), subgraph counts were investigated for so-called balanced subgraphs. The classical reference is two papers by Erdos and RCnyi (1959, 1960). Further references can be found in the text books by Bollobas (1985) and Palmer (1985) and in the survey articles by Karonski (1982, 1984). 0378-3758/89/$3.50
0
1989, Elsevier
Science Publishers
B.V. (North-Holland)
210
K. Nowicki
/ Normality
of graph statistics
In statistical analysis, Bernoulli graphs can be useful as representations of pure chance phenomena; they can thus be used to test if empirical graph data are generated from some more complicated model than a Bernoulli graph (Frank, 1980). Furthermore, Bernoulli graphs have been used to describe networks of irregularly appearing contacts between vertices; see Frank (1979). Barlow and Proschan (1975) considered reliability problems by using Bernoulli graphs. Holland and Leinhardt (1975, 1979) and Wasserman (1977) considered moments of dyad and triad counts (subgraph counts of order 2 and 3) for simple random graph models, while Frank (1979) gave the first two moments of subgraph counts of arbitrary order for general random graph models. The purpose of this paper is to study the asymptotic behaviour of subgraph counts for Bernoulli graphs in which the edge probability p is constant. It is proved that subgraph counts are asymptotically normally distributed. In addition, we show that subgraph counts asymptotically are linear functions of the number of edges in the graph. This allows us to obtain simultaneous distributions for multiple counts directly from distributions of single counts. In Section 2 we will give basic concepts and notations. Section 3 presents some results from the theory of U-statistics. In Section 4 we will give the asymptotic distribution of a single subgraph count with certain extensions in Section 5. Section 6 is devoted to investigations of the simultaneous distribution of subgraph counts, as well as their application to testing a Bernoulli graph versus a ?/Iarkov graph. Finally, in Section 7 we study induced subgraph count statistics, both for a single count and for simultaneous counts.
2. Basic concepts
and notations
LetN={l,..., n} be a set of elements called vertices, and let E be a subset of the collection of n’= (;) unordered pairs of vertices. The elements of E are called edges. A graph G consists of an ordered pair (N, E) of sets of vertices and edges. The number of vertices is called the order of G while the number of edges is called the size of G. The size of G can be at most n’ and a graph on n vertices with size n’ is called a complete graph and is denoted K,,. A subgraph H= (V, 0 of G is itself a graph whose vertex set V is a subset of N, and edge set r is a subset of E. Moreover, if every pair of vertices u and u of His connected by an edge in H whenever it is connected by an edge in G, then H is an induced subgraph of G. Two graphs G and Hare isomorphic if there is a one-to-one function Y from the vertex set of G onto the vertex set of H, such that for any two vertices t and u of G we have an edge between t and u in G if and only if we have an edge between Y(t) and Y(u) in H. We let t(G,H) denote the number of subgraphs of G which are isomorphic to H. In particular, let G= K, and let H be a graph of order u, u < n. Then, we have that b(K,, H) = (E)t(K,,, H).
K. Nowicki
3. Asymptotic
normality
/ Normality
of an incomplete
211
of graph statistics
U-statistic
Hoeffding (1948) considered a sequence of independent identically distributed function g(x,, . . . ,xk), random variables (i.i.d. T.v.) X,, . . . , X, and a symmetric which we call a kernel. He estimated the parameter B =E(g(X,, . . . ,X,)) by
where
the g;‘s are obtained
by inserting
the (t) possible
X;,, **., Xjp 1 Sj, < ...
~UwXl,
ordered
subsequences
of the gj being arbitrary.
The
(3.1)
. ..>&)IX.N
and suppose that o*>O. Using Hajek’s projection method it can be proved that fi(U,0) is asymptotically normally distributed with mean 0 and variance k2a2. In order to calculate U, we have to evaluate the sum of (z) terms. To reduce these calculations Blom (1976) introduced an incomplete U-statistic (3.2) Denote by S the set of the m subsequences Xj,, . . . ,XJk in the g;‘s. Due to the strong dependence between many of the gi’s in U. we can often choose S so that U is asymptotically equivalent to U,. This is shown by the following theorem, which was essentially given by Blom (1976). The m subsequences Xj,, . . . , Xjp appearing in S, form m2 pairs. Let f,, c= 1, . . . , k, be the number of pairs with c X-values in common, and set pc=fc/m2. Then we have Theorem
1. Suppose that for m/n + 00,
p1 = k*/n + o( 1/n)
(3.3a)
;i2Pj=Wn).
(3.3b)
and
Then (A) &(U- 0) is asymptotically normally distributed with mean 0 and variance k*a*. (B) nE(U- U*)2+0 as n-+m, where U*=(k/n)
i h(X,)-(k-1)0 i=l
and h(X;)=E(g(X;,,...,Xj~)IXi,=xj),
Isi,<
...
212
K. Nowicki
/ Normality
of graph statistics
Proof. For the proof of part (A) we refer the reader to Blom (1976). To prove part (B) we write
E(UFurthermore,
U*)2=E(U-
as proved
nE(U,and,
as proved
CQ2+E(U,-
by Hoeffding
U*)2+0,
U*)}.
(1948) we have that
by Blom (1976), under
by invoking
U,)(U,-
as n-a,
nE(U - U,,)2--f 0, Finally,
U*)2+2E{(U-
the assumption
of our theorem
as ~1403.
the Minkowski
inequality
we obtain
(B).
q
We shall now give the two other conditions that ensure the asymptotic normality of U. Let a,, t=l,..., n, denote the number of gi’s in U which contain X,, and a,, the number of g,‘s which contain both X, and X,,. Then we have: Theorem
2. If
5 af/m’= k2/n + o(l/n)
(3.4a)
I=1
and afu/m2=o(l/n),
c
(3.4b)
15l
then the conditions (3.3a) and (3.3b) of Theorem 1 ure satisfied. Proof. Consider the g;‘s appearing in U. There are Q: pairs (gi,gj) which have X, We seek the in common and a:, p airs which have both X, and X, in common. number f, + ... + fk of pairs with at least one X-value in common. Clearly, we have the double inequality
,~,a:-,_,~~~~a~~~f~+...+fh~,~~a:. Similary
it is found f2 +
Dividing
..,
that +fkl
c I~/
these inequalities
4,
by m2 and using the conditions
P1f ... +P,=k2/n+o(l/n), Hence
the conditions
...
(3.3a) and (3.3b) are satisfied.
In the sequel we shall be particularly ,?I
(2,
T,=cg, i= I
P2+
and
T=zg,
i=l
interested
of the theorem
+Pk=o(l/n). q in the numerators
we have
K. Nowicki
/ Normality
of U, and U. We call these quantities
complete
and incomplete
tively. All asymptotic results for u0 and U hold, of course, multiplication by a suitable factor.
4. Subgraph
213
of graph statistics
T-statistics,
respec-
also for T0 and T after
counts
Consider a Bernoulli graph G,, on a finite set { 1, . . . , n} of vertices, i.e. a graph where edges occur independently with the same probability p, O
= 121
1 0 i
if an edge occurs otherwise.
between
vertices
t and u,
We want to count the number of subgraphs of G,, with prescribed properties. For this purpose we may use the theory of U-statistics. We begin with a simple problem leading to a complete T-statistic: count the number of all subgraphs with s edges. In what follows, we consider only subgraphs without isolated vertices, i.e. subgraphs defined by specifying their set of edges. Subgraph counts for arbitrary graphs are then obtained by multiplying these subgraph counts by a suitable factor. For this purpose introduce the symmetric function g(x,, . . . . x,)=x,x,...x,, and the corresponding ”
complete
T-statistic
T, = ‘i’ gi .
(4.1)
i= 1
Here the gj’s are obtained by inserting in g all possible s-subsequences taken from the sequence {X,,}. Clearly, gi= 1 if all X,, corresponding to the i-th subsequence are equal to 1. Hence T, counts the number of all subgraphs with s edges. It follows from Hoeffding (1948) that T, is asymptotically normally distributed. In this case we have 6 = E(g) =ps. Moreover,
using a*=
Hence
(3.1) we obtain
V(E(gjX,,))
we have proved
Theorem
(4.2a) that
= V(p”P1X,,)=p2sP1(1
the following
-a).
(4.2b)
theorem:
3. For T, defined as before,
Ilti;(T,/(“,‘) -P”) is asymptotically
normally distributed with mean 0 and variance s2ptiP ‘(1 -p).
214
K. Nowicki
/ Normality
We now consider a fixed subgraph T, = t(Gn,p, H) of subgraphs of G,, Introduce an incomplete T-statistic
of graph statistics
H of order isomorphic
v and size s and seek the number to H.
r(k’,Z, H) TH=
1
(4.3)
ST;
i=l
with the same kernel g as in (4.1) and with t(K,, H) being the number of subgraphs isomorphic to H in K,,. By suitable choice of a set SH consisting of t(K,, H) ssubsequences from {XtU} and inserting these subsequences in g, TH counts the number of subgraphs isomorphic to H. For example, take s = 4, n = 4 and let H be a 4-cycle, i.e. an undirected graph on four vertices in which each vertex is connected with two other vertices. Set T,=g(X,,,X
13,X,,,X,,)+g(X,,,X,,,X23,X34)+g(X13,X,4,X23,X24).
Clearly, TH counts the number of 4-cycles in G4,p. For any given H the incomplete T-statistic in (4.3) is asymptotically normally distributed. To prove this we shall check that the conditions of Theorem 2 are satisfied for n = ~1’. Let a,, be the frequency of the single variable X,, in SH; also let a,,,+,, be the frethe frequency of the pair (X,,,X,,). Due quency of the pair (X,,,X,,,,); and a,,.,, to the invariance of the structure of SH when vertices are relabeled, the frequences a lu, a IL!,10;9 a Iu,uI(,do not depend on the indices. Let us therefore call them c, , c2, cj, respectively. It is immediately seen that c1 =st(K,, H)/n’; taking m = t(K,, H), a,=a,, in condition (3.4a) we see that this condition is fulfilled. We now turn to condition (3.4b). Its left side takes the form (3(:)c;
+ 3(;)c:)/(%%
H))2,
(4.4)
where we have used the fact that there are 3(t) pairs of type (X,,,X,Z) and 3(t) pairs of type (X,,,X,,). In order to find c2 and c3 denote the order of H by v and note that a pair (XrU,XwZ) can be completed to a subsequence of SH in al(zIj) ways, where a, depends on the form of H but not on n. Moreover, a pair (X,,, X,,) can be completed to a subsequence of SH in a2(EIi) ways, even here a2 depends on the form of SH but not n. Hence c2 is of order nvm4, and c3 of order noP3. Noting that t(K,, H) is of order no we see from (4.4) that the left side in condition (4.3b) is of order K3. But n’ =(z). Thus the left side of (3.4b) converges faster than l/n’ and hence (3.4b) is satisfied for n = n’. We now observe that we still have 8 and o2 as in (4.2ab). Thus, from Theorem 1 we obtain: Theorem 4. Let TH count subgraphs order v in G,,. Then @(T,/t(K,,
H) -
p”)
isomorphic
to a given graph H of size s and
K. Nowicki / Normality of graph statistics
is asymptotically
215
normally distributed with mean 0 and variance ~~p~~~‘(l -p).
Let us now investigate the asymptotic distributions for some special subgraph count statistics. Frank and Strauss (1986) showed that the number of triangles and the number of k-stars for k = 1, . . . ,n - 1 are sufficient statistics for any homogeneous Markov graph of order n. The distributions of these statistics in the case of a Bernoulli graph can be obtained from Theorem 4, namely: Corollary
tf T is the number of triangles, we have that
5.
@(T/(Y) is asymptotically
-p3)
normally distributed with mean 0 and variance 9(p5 -p6).
Corollary 6. If S, is the number of k-stars, where a k-star has order k+ 1 and size k, we have that
1/55;(W((,“,,)(k+ is asymptotically
1))-ak)
normally distributed with mean 0 and variance k2(p2kP ’ -Pan).
5. Extensions The ideas developed in Section 4 can be generalized to counting the number of all subgraphs isomorphic to at least one of a given set of graphs H,, . . . , HL, each of size s. As before, we consider only graphs without isolated vertices. For this purpose introduce an incomplete T-statistic
T= f g;
(5.1)
i=l
with
the same kernel g(xr, . . . ,x,) =x,x2 “.x, as before. Let S be the union of in S. Clearly, T counts SH,, .*. , S, and let m denote the number of subsequences the number of subgraphs isomorphic to at least one of HI, . . . , HL. Now, checking the conditions of Theorem 2 for S we prove that T is asymptotically normally distributed. First, a,, is the sum of a:,, . . . , a; which are frequences of X,, in SH,, . . . ,SHI, respectively. Hence a?, = (W,,
HI) + ... + t(K,,, HtA))2s2/n’2
and condition (3.4a) in Theorem 2 is fulfilled. Second, the left side of condition (3.4b) takes the form <3(~)c~ + 3(i)cf)/m2 where now the orders
of magnitude
of c2, c3 and m are determined
by the highest
216
K. Nowicki
/ Normality
of graph
statistics
order among H,, . . . , HL. Hence condition (3.4b) holds true because it is true for each S,. Thus we have proved that T in (5.1) is asymptotically normally distributed and the following theorem can be formulated:
Theorem 7. Let T count the number of all subgraphs isomorphic to at least one of a given set of graphs H,, . . . , HL, each of size s in G,,. Then fi(T/m
-p”),
where m = t(K,, H,) f ... + t(K,,, HL), is asymptotically
normally distributed with mean 0 and variance s2p2’-‘(1 -p).
In particular, let us count all subgraphs obtain the following corollary to Theorem Corollary
in G,, 7:
8. Let T count all subgraphs in G,,
of order
u and size s. Then
we
of order v and size s, where s> (‘2 ‘).
Then fi(T/m
-p”),
where m = (C) (‘) (
is asymptotically
6. Asymptotic
,
S )
normally distributed with mean 0 and variance s’~~‘~‘(l -p).
linear dependency
between subgraph
counts
For the purpose of investigating the asymptotic distributions of multiple counts and linear combinations of subgraph counts the following theorem 9. Let Tn count subgraphs in G,, order v and size s. Then, it holds that
Theorem
subgraph is useful:
isomorphic to a given graph H of
E(@(T,/t(K,,H)-p”)-fl(spSP’R/n’-spS))2+0,
as n-)m,
where R is the number of edges in G,,,. Proof.
By Theorem
l(B)
we have
that
fl(Tn/t(K,,,H)-pS)
is asymptotically
equal (in mean square) to n’ (s/n’) 6( Furthermore,
c l~i
we have that
(6.1)
K. Nowicki
Thus,
/ Normality
217
of graph statistics
(6.1) is equal to
R = 1 ,SicjSn Xij we obtain
and by substituting
Consider now T=(T,,..., assumed to count subgraphs respectively. The following Corollary
10.
T,.) to be a vector
0
the result of the theorem. of r subgraph
counts.
isomorphic to a given graph Hj of order result is obtained from Theorem 9:
Each Vj
Tj is
and size Sj,
The random vector (F,, . . . , F,.) where
~=~(Tj/(t(K,,Hj)-pSJ),
r,
j=l,...,
is asymptotically normally distributed with mean vector 0 and covariance matrix C=(Cij)ij=, of rank 1. Here Cij= CiCj,
where
i= 1, . . . . r.
Ci=s;p’l~~,
Proof.
By Theorem 4 each component of the considered vector is asymptotically to prove that a linear combination normally distributed. Thus, it remains CT= 1aj Fj is asymptotically normally distributed. First, we introduce
Xj=il;.-~(s,pSJ-‘R/n’-sjpSJ),
j=l,...,r,
Fj and its projection on {X,, 1
i.e. the distance
between
EXj = 0 and, from Theorem
Thus,
we have that E j$, ajc-@P(R/n’-p)
‘-0
as n-+03
>
where
Finally, since R E Bin(n’,p) we have that fl(R/n’-p) are asymptotically normally distributed.
and consequently
CJ= 1CXjTj
218
K. Nowicki
To calculate
the covariance
/ Normality
matrix
of graph siatistics
we write
C,:i=C(ai+biR,aj+bjR)=b;bjVar(R). since b, = sips’ ‘/fl, bj = sips/ ’ /@ and Var(R)=n’p(l -p) we obtain the covariance matrix. Finally, since each component of the vector is a multiple of R we obtain that the rank is equal to 1. 0
Hence,
Another application of Theorem 9 is in the investigation of linear combinations of subgraph counts. We will consider the example of testing a Bernoulli graph against a Markov graph. Here, we devote ourselves to a Markov graph model used by Frank and Strauss (1986) with probability function P,(G)=c~‘exp(~r+as+rf), where r, s, f are equal to the number of edges, 2-stars and triangles in graph G, respectively. This model captures both transitivity and clustering in graph data and can be seen as a natural generalisation of a Bernoulli graph model. In fact if o = r = 0, P&G) reduces to a probability function of a Bernoulli graph G,, which will be denoted by P,(G) Consider
=p’(l
-p)“‘-’
where
p = e@/(l + ee).
now the hypothesis H,:
P,(G)
= c-l exp(pr)
H,:
P,(G)=c,‘exp(elr+a,s+slt)
against By using the where e, ei, o, and ri are specified values of the parameters. Neyman-Pearson lemma we obtain that the critical region of the likelihood test is of the type (el-~)r+a,s+s~r>k. Now, let R, S and T be the number of edges, 2-stars tively. Then, from Theorem 9 we obtain that S=2p(n-2)R-3p2(;)+&,
and
and triangles
in Gn,p, respec-
T=p’(n-2)R-2p3(;)+&,
where r.v. e, and E, are such that E(E,) = E(E,) = 0 Consequently,
under ((ei -e)
and
Ha the critical + 2adn
from which we finally
obtain
I’(&,) = 0(n2), region
- 2) + tlp2(n
I/
= 0(n2).
is approximately - 2))R > k+ 3alp2(;)
that the test is of the following
R>k,
if (~~-@)+2a,p(n-2)+r,p~(rz-2)>0,
R
if (~~-~)+2o,p(n-2)+s,p~(n-2)<0.
+ 2rlp3(;), type:
K. Nowicki
/ Normality
219
of graph statistics
7. Induced subgrapb counts Consider,
as before, G,, to be a Bernoulli graph. We want to count the number subgraphs of G,,, isomorphic to a given graph H of order u and size s. We assume that H does not have isolated vertices; for graph H with isolated vertices we count instead induced subgraphs of G, 1up isomorphic to the complement of H. Since an induced subgraph of G,, having ii, . . . , i,, say, as vertices, is either isomorphic to H or nonisomorphic to H we can represent TEd as
Tkd of induced
rid=
c l
g(Xi,!*, ..*3XjL,~,i,,)
(7.1)
where 1 g(X,,+
if the subgraph of G,, induced i,, . . . , i, is isomorphic to H, otherwise.
. . . ,X;,_ , ;,) = 1 0
For instance,
take H to be a cycle of size 4 and order g(X@ Xik, Xi,, XJk, xjh
xk/)
=xijxi,xjkxk,(l
4. Then, -
+xi,x;kx~,xk/(l + xjkxi[xjkxj/(
by
Xik)(l
-
-X,/)(1
xj/>
- xjk)
1 - X0)( 1 - Xk,).
It is easy to verify that g is no longer a symmetric function and consequently Tgd is not a numerator of an incomplete U-statistic. However, T,$‘”can be written as a linear combination of subgraph counts for all subgraphs Hi such that H C H; c K,. Namely,
Tkd=
c (- l)S’-St(H;, H)T,, HCH,CK,,
where Si is the size of Hi and TH, is the number Hi. Thus we arrive at the following theorem:
(7.2) of subgraphs
of G,,
isomorphic
to
Theorem 11. Let Tk” count induced subgraphs in G,, isomorphic to a given graph H of order v and size s. Then
I/TI;(rid/(:) is asymptotically (P(1 -P)).
- f3), where
0 = t(K,, H)pS(l -p)“‘-“,
normally distributed
with mean 0 and variance e2(s- (y)p)‘/
10 since Proof. Asymptotical normality of T2d follows from Corollary linear combination of subgraph counts of the same order. Moreover, the (7.1) since each tion of Tkd can be easily obtained from representation each with the g(X,,,, ... ,xiv_ ,i,) is a sum of t(K,,H) random indicators, tion ps(l -p)“‘-“. In order to obtain the variance we use (7.2) and write
I/;;r(T;;ld/(;)-6’)=@(
c HcH,cK,,
(-1,‘-‘t(Hj,H)TH,/(;)-8).
Tkd is a expectafunction expecta-
220
K. Nowicki / Normality of graph statistics
Thus we have to calculate n’Var (which,
,,Z,
by arguments n’Var
I- ”
(- W”WiJW,X)
as in the proof
>
of Corollary
10, is asymptotically
C (- l)S’~St(Hj,H)SiPSI~lf(Ku,H,)R/n’ ( HCH,CK,,
equal to
>
and since Var(R) = n’pq to
P4
C (
( - l)S’~St(Hj, H)s;pS’-’
2.
t(K”, Hi)
HcH,cK,,
(7.3)
>
Finally, this expression can be simplified Namely, we obtain that
by calculating
the expectation
13using (7.2).
e=Hc;cK (- l)S’~St(H,,H)t(K,,Hi)PSf - I- ” and consequently (dB/dp) =
2 (- l)S’~Sf(H;,H)t(K,,H;)~ipS~~l. HcH,cK,,
Hence, expression (7.3) can be written pq(de/dp)2 tained by evaluating the expression
and consequently
variance
is ob-
d(W,,WpS(l-p)“‘-S) dp Remark 12. Observe that when p = s/(y) the asymptotic variance in Theorem 11 is equal to 0 and we obtain only the convergence in probability to 0. Maehara (1987) proved, independently, Theorem 11 when p = 0.5. Let us now consider T=(T,,..., T,.) to be a vector of induced subgraph count statistics, in which each component Tj counts induced subgraphs Hj of order uj and size sJ, respectively. Then we can give the following theorem: Theorem
13. We have that (~(T,/(,:)-e,),...,~(T,/(G,)-er)
is asymptotically
normally
distributed
with mean vector 0 and covariance matrix
C = (Cij)ij= 1 of rank 1. Here
ej= t(K”,
H,)$‘(l
-p)“‘+
and C,= CiCj,
Ci=e;(Si-(;)P)/J/m,
i,j= 1, . . . . r.
K. Nowicki
Proof.
The asymptotic
to prove Corollary similar
normality
/ Normality
can be proved
10. In order to establish
to that used to prove Theorem
cjj = cov
221
of graph statistics
by a method
covariance
matrix
similar
It allows us to conclude
Il.
to that used
C we use a method that
H,c;iX.,, (- ~)““-“‘~(H~,H;)~~P~~--I~(K~,H~)R/~, (- l)“‘flPsJt(H,, Hj)s,pS’“- ’ t(K,, H,)R/fl c fl, r H,,,c K”
and after calculation
Finally,
the evaluation
we obtain
1
that
of this expression 7
gives the covariance
matrix.
0
Consider now T= (T,, T,, T,, T,) to be a vector of triad counts for a Bernoulli graph where the components T,. give the number of triads with r edges where
r=O , . . . ,3, i.e. T, is of order 3 and size r. Frank
(1979) obtained the first- and second-degree moments of triad counts. We can use Theorem 13 in order to generalize his results. The following corollary gives the asymptotic distribution of T: Corollary
14. Let T=(T,,,...,
fl(T4;)
T,) be a vector of triad counts in G,,.
Then
- e),
where 13=(& ,..., f?,),
f3,=(:)p’(l-p)3P’
for i=O ,..., 3,
is asymptotically normally distributed with mean vector 0 and covariance matrix C = (Cij):j=o where Cij = Ci Cj
and C,=(~)pi(l-p)3-i(3p-i)/ll(l-plp,
i,j=O,...,3.
Acknowledgement I am indebted to Professor Gunnar Blom and Professor Ove Frank for offering so much of their time for discussion, valuable suggestions and comments. I also want to thank Professor Svante Jansson who gave me valuable comments and Docent Daniel Thorburn who further extended my understanding of the asymptotic behaviour of subgraph counts.
222
K. Nowicki
/ Normality
of graph statistics
References Barlow, R.E. and F. Proschan (1975). Statistical Theory ofReliability and Life Testing. Holt, New York. Blom, G. (1976). Some properties of incomplete U-statistics. Biometrika 63, 573-580. BollobBs, ErdGs,
B. (1985).
Random
Graphs.
Academic
P. and A. Renyi (1959). On random
Press,
graphs
Erdiis, P. and A. Renyi (1960). On the evolution 5, 17-61. Frank, 0. (1979). Moment Sci. 319, 207-218.
properties
Frank,
0. (1980). Transitivity
Frank,
0. and F. Harary
Amer. Frank,
Statist.
Assoc.
0. and D. Strauss
P. and S. Leinhardt Studies 2, 107-124.
Holland,
P. and S. Leinhardt
(1979).
.I. Amer.
measures
J. Math.
Statist.
Insr. Hung. Acad. Ann.
Social.
indices
with asymptotically
normal
in structural
San Francisco,
New
Sci.
York Acad.
7, 199-213.
in empirical
graphs.
J.
Social.
6,
Seria Matematyka
M. and B. Babington
81, 832-842.
and balance.
J. Math.
distribution.
Ann.
models
in social networks.
Math.
of small groups.
Statist.
Comparitive
In: D. Heise, Ed., Sociological
CA.
Eds. (1979). Structural
Subgraphs
Assoc.
for transitivity
Sociometry
Academic Press, New York. A review of random graphs. J. Graph
M. (1984). Balanced
Kendall,
graphs.
Matrix
(1975). Local structure
1976. Jossey-Bass,
za w Poznaniu,
6, 290-297.
graphs.
by using ransitivity
(1971). Transitivity
P. and S. Leinhardt,
Karoriski,
Debrecen Publ. Math.
in stochastic
and digraphs.
inference
(1986). Markov
Holland, Group
and Commentaries. Karoriski, M. (1982).
graphs
Cluster
Hoeffding, W. (1948). A class of statistics 19, 293-325.
Holland,
counts
graphs.
77, 835-840.
Harary, F. and H. Kommel 199-210.
Methodology
of random
of subgraph
in stochastic
(1982).
New York.
1. Pubf. Math.
of Large Random
Social Networks,
Surveys,
Advances
Theory 6, 349-389. Graph.
Uniwersytet
im. Adama
Mickiewic-
nr. 7.
Smith
(1940).
On the method
of paired
comparisons.
of a random
graph.
Biometrika
31,
324-345. Maehara,
H. (1987).
309-312. Moon, J.W.
(1968).
hloran,
P.A.
Palmer,
E.M.
Wasrcrman, J. Math.
On the number
of induced
Topics on Tournaments.
(1947). On the method
Evolution.
S.S. (1977). Random
directed
Social.
5, 61-86.
Holt,
of paired
(1985). Graphical
subgraphs
graph
Math.
64,
New York.
comparisons.
Wiley,
Discrete
Biometrika
34, 363-365.
New, York.
distributions
and the triad census in social networks.