‘L;_reviged manuscript received 2 May 1977
The Horvitz-TThompson estimation theory is a~&ci to snowball sampling and some other skmpling procedures Using’ a known or unknown graph struct’rlre in the survey population. In pirrti&ar, simplegraphyparameter estimators and variance estimators are obtained which are based on various kinds &partial information about the graph. Key words;
I?c3pulationstructure, C raph, Sociogram, Network,
1.
Snowb,all sampling,
Estimation
IntnDduction
Graph theory is a branch of finite mathematics that has been powerfully expanded due to t.he increased use of computers. Graph theory provides concepts snd toois that are par titularly convenient for analyzing the relationships and interferences between the components in large scale systems, and various graph models have been developed in the many disparate fields of applied mathematics. Nultl3meier (1976) is a recent text book on graph theory which contains a comprehensive list of references and many illustrative applications. The :use of graph models implies a demand for methods for evaluating the models and est,!mating and testing various graph parameters. The statistical analysis of data in graph models offers a rich variety of problems for research and development, and it is still in its infancy. Some different lines of progress can be distinguished. Probability problems concerning various properties of a stochastic graph have teen considered in a number of early papers, e.g. ErdBs, Renyi (1959, 1960), Erdos (1959, ‘. w*’ m*~~r+ o WQ 1 *a 1 nnd Austin, Fagen, Penney, Riordan (1959).In 19611,Kenyi (1~27~ )U4Auurs \1/U.J~ rrur,,-_. later years quite 8 !grge cur;;Ldr of papers on stochastic graphs have appeared, in partWar in Theory ofProbability and Its Applicatiorts.A few further references are Frank (1968), Stepanov (1969), Ivchenko (I973), Ling (1973, 1975), Robinson, Schwenk (1975)and Tainiter (1975). Another approach which is oriented towards the numerical algorithms is provided by the. Iresearch an clustering methods, multidimensional scaling methods and other explora Live statistical methods of search for structure in empirical data. Matula (1970) an;; Hubert (1974) dcszrtbe the applications of graph theory to clus:er analysis. The stochasric. models have rsot yet received any advanced position in these kinds of data analysiz(.An outstanding &erence is Ling ( 1973)who suggests a stochastic approach to cluster ;knalysis. 235
101Fmnk
236
.
The field of survey sampling and statistical inference in a graph has been considbzed by only a few authors so far. Scowball sampling has been treated by Goodman (1961). Bloemena (1964) has derived moments of the sampling distributions of some graph statistics, and Proctor (1966, 1967)has iestimated the edge frequency of a graph,; Some generalizations \and extensions have been given by Sherdon (1970) and !'%~ttor, Suwattee (1970). An interesting paper by Stephan (1969) discusses the possible future development of survey sampling aild the need to utilize the known or observrtble structure that may be inherent in a population. Various graph inference problems hiave been considered !sy Frank (19699. b, 1971,197,5a, b, 1976,1977), Cepobianco (1970, 1972,1974), Tapicro, Boots (1974) and Tapiero, Capobianco, Eewin (1975).There are some papers treating various sampiing and inference problems which are not explicitly stated as graph problems but which can be given such formulations, Some relevant papers are deal& with matching, overlapping and multiplicity. See for instance Goodman (1952), Deming and Glasser (1959), Ghosh (1966), Frank [1970), Sirken (1970,1972a, b, c), Sirken and Levy (1974) and Nathan (1976). In this paper we shall consider the Horvitz-Thompson estimation theory in survey sampling and show how this theory can be applied in graph models. Some basic graph concepts are given in Section 2, and some preliminaries on Horvitz-Thompson estimation are givan in Section 3. Section 4 discusses the use of snowball sampling in surveys, and Section 5 gives sorateestimators of a population total based on snowball sample dais. The questions of observability and data accessibility in the graph which generates the snowball sample are discussed in Section 6. This discussion leads to the introduction of a graph total which is a generalization of the population total. Section 7 ahows how the Horvitz-Thompson theory can be applied to graph totals, Section 8 gives a number of examples illustrating the use of graph totals for describing various graph properties. Various sampling and observation procedures are also illustrated. Finally Sections P-12 are devoted to the estimation of graph totals based on various kinds of sample data. 2. Some basic graph concepts A directed (undirected) gi”ccphconsists of a non-empty set V of elements called vertices (nodes, points), a set B of elements called edges (arcs, links, lines, arrows) and an incidenceftrnction I which associates an ordered (unordered) pair of vertices !(e) with each edge eEE. Let Eij denote the subset of edges which are incident with the ordered (unordered) vertex pair (i, j)~ V2. It IS cewenient to use the notation E, =EJi in the undirected case, and permitting a slight abuse of notations we can put I(e)=={(i,j)E V':eEEij}n
Cl)
Zn the undirected case the unordered pair (i, j) will thus be identified with the pair of ordered pairs ((i, j), (j, i)}. The sets E,, may be empty for some or all (i, j) tzVs. For a directed graph the non-empty E, constitute a partition ofE, i.e. they are disjoint and exhaustive of E. For an undirected graph E, =Eji for all (i, j)E V’, and the non-empty between i and j, and i md j are called their
Wvey samplingin graphs
237
etdvertices. The edges in Ei, are also said to leadfrom i to j, and i and j are called their ,inihland ternrfnal vertices, respectively. If there are two or more edges in B,, they are said to be parallel. The edges in & are called the loops at i. The edges in
Es.= U E,, and
Ep
U E,,
(2)
$2;
fs\
are c&led the outedges and the inedges at i, respectively. In the undirected case the outedgea and the inedges coincide. It is convenient to use the incidence concept not only for edges and vertex pairs, but also for edges and vertices and for edges with a common endvertex. Thus two vertices i and j are said to be incident if there is an edge between them. In the directed case i is also said to be incident before j,.andj is said to be incident ufkr i, if there is an edge from itoj. The incident edges at a vertex icomprise the loops, the outedges and the inedges at i. The incident edges of an edge e EE, are the incident edges at i and the incident edges at j. The incident vertices of an edge are its endvertices. Let Ii (e) and I2 (e) denote the sets of the initial and terminay vertices of e, respectively, i.e. Il(e)={iE V:ee&.}
and
I&)=-&
‘V:eEKi}.
(3
For an undirected graph both these sets coincide. For arbitrary edge subsets M z E we define the sets of initial and terminal incident vertices
U Il(e)={k
II(M)=
V: M nEi. f(B)
eehf
I,(M)=
U Iz(e)=(iE
V: M n E.i#Q)},
eahd
where 8 is the empty set. Analogously we define the set of incident verter pairs I(M)=
(J I(e)={(i,j)EV’:MnE&Q)}.
,
(9
eeM
*TWOvertices i and j are said to be adjacent if they are equal or incident. In the directed case i is also said to be adjacent before j, and j is said to be adjacent after i, if either i= j or there is an edge from i to j. The sets of vertices adjacent after and before i are denoted by Ai={i}u{jE
V/:Eii+!J}
and
Bi={i}u{jE
V:Eji+g},
(6)
respectively. The set of the adjacent vertices at i is Ai u BimThe set of vertices adjacent after at least one of the vertices in a subset M = V and the set of vertices adjacent before at least one of the vertices in a subset A4z V are denoted by A(M)== iJ Af ieM respectively.
and
(‘7)
B( ie.44
0, Prank
238
of n edges is calhi a waik of length n if there: ‘e&t5 a A sequence (e,, . . ., q&E” sequence (iO,. . ., i&5 Vn+ 1 of n + 1 verticm such that ek Gk_
!fk
for
k= 1,...,n.
(8)
The edges el,. . ., emand the ver tices iO, . . .>in are said to be incident with the walk, The walk is said to be between iOand i,,and through il,. . a9i,,_ 1. The walk is ako said to lead from iOto in, and iOand in are called the enduertices of the walk. If iO= i, the walk is said to be closled.A walk in which all the i:icidlent edges are different is said to be simple. A walk in w’hichall the incident vertices;are cliRerent,except for possible equality between the cndvertiws, is called a path. Each path is a simple walk. If a path is a closed walk it is called a close&path. A simple closed walk need not be a closed path. If there is a walk of length n from i to j, j is said to be reachubke in n steps from i. Two vertices iand j are said to be connected if either i = j or if there is a walk from i to j and a walk from j to i. This concept of connectedness defines an equivalence relation in K and its equivoience classes are called the connected components OF K Any two different vertices in a connected component are incident with a closed walk (and with a path bttt not necessarily with a closed path). No two vertices in different connected components are incident with a closed walk. Consequently the edges which are not incident with any closed walk are exactly the edges which are between vertices in two ditierent connected components. The connected components in a directed graph are called strongly connected comgonents by some authors. ?Sakly connected componevrts are then defined as the connected components which are obtamed if the directed graph is considiered as an undirected graph, i.e. if the orders in the incident vertex pairs of the edges are ignored. A graph is said to be simple if it has no loop and at most one edge incident with each vertex pair. In a simple directed graph\ two edges are said to be mutual if they are incident with the same dertices. A graph specified by the vertex Yet V and the sets Eij of edges incident with (i,j) E V2 has a subgraph specified by the vertex set V’ and the sets Elj of edges incident with (i,j)E ( V’)2 if V’CV
EfjEEij
for
(5,j)E (V’)2
Eij’Eji
implies
Eii=Ej,
for
(i, j)E (V’)‘.
(9)
The subgraph generateId by a vertex subset A4!GV is defined as the subgrapki with V’=M
and
Eij=Ei,
(10)
l’he subgruph generu ted by UFIedge subset M c E :isdefined as the subgraph with ~‘=I,(M)wJ2(A4)
and
E+=MnE,i
for (i,j)E(V’)2.
(‘I 1)
1.
Swey samplingin graphs
‘/,
239
1~ partidular, if M GE then the subgraph generated by the vertex subset I 1 (M ) u is equal. to the subgraph with
and
V’=.!~(M)UI~(M)
E;J=E,
for
(i,j)~ (V’)2.
I2 (M )
(12)
The two subgraphs given by (11) and (12) are Identical if the graph is sim;k 3. Preliminariesconcerningsurveysarbpling ‘Weshall consider a finite population of N units and a real variable y defined in the population. Let T/‘t {1,. . ., N) denote the population, let y, denote the value of y for unit i E v and let the total of the y-values be denoted by T= ~yi. A stochastic variabie S taking values in the set of subsets of V is called a randoifti !qample, and the probability distribution giving non-negative selection prdkzbilit ies P(S = M) =pfM) to each sul set M of li’ is called the sampling desiglj. The random sample S can be represented by a vector ( S1,. . ., S, ) of N stochastic variables defined by Si=
1 if id?, 0 otherwise
(13)
for k E The stochastic variable Si indicates whether unit i is included in the random sample or not, and it is called an inelusicw indicator of S. Obviously the selection probabilities can be obtained as the expected value
P(M)=E n Si n (l_Sj),
(14)
where M denotes the complement of M. The sampling design which is specified by the selection probabilities can alternatively be specified by the inclusion probuhilities n(.M)=E n Si=P(M E tS)
W)
iEM
or the exchsion
probabilities
#(M)=E 1[1(~-S~)~P(MES) i&u for all non-empty M c=:K For convenience we define ~($3)= %:(Qb) = 1 where $4is the empty set. The inclusion and exclusion probabilities are aRso denoted by it;, nij, . . ., and ii;i,itij,..,ifhfz(i},
M=(i,j),....
Proving the identity
--
ieM’
1-c M
ieL
is a straig.htforward matter. Here IL1denotes the number of elements in L, the sum is over all subsets of M, and the term corresponding to I_,=$9is understood to be 1. Taking expectations in (17) entails that the exclusion probability 2(M) can be obtained from
8. FRwk
240
the inclusion probabilities arocordingto the formula T
j@?f)=
( - ‘6)h(
L%u
08)
L).
In par titular
(1%
-3Zi-ltjf7Cijs
ii,=1
Analogously we can express R(M) in terms of the exclusion probabilities c(M) =
c
( - 1pit(L).
:
WI
A,cIU
The selection probabilities p(M) can be obtained from the inclusion or the exclusion probabilities according to p(M)= c (-l)‘%(LuM)= l&R
c (-l)‘L’~(?~w&i). IisIw
We also note the following monotonic properties of 7tand ?t:
If the inclusion indicators S 1,. . ., S, of the random sample S are mdependent stochastic variables with ESj =pi for ifz i/, then the random sample is said to have a Berfloulli (pl , . . ., pi) desip. If c”i= p %r i E V the design is called a simple Bernod6i (p) desigrg. We will use this design to illustrate and simplify the general results in this paper. It should be noticed that for practical purposes it is often satisfactory to approximate a simple random sampling design with a simple Bernoulli (p) design where p iu the sampling fraction. For later use we shall also give some well-known results concerning WorvitzThompsnn estimation in survey sampling. See for instance Kendall and Stuart (1966). If ni > 0 for all k K then the total T has an unbiased esti.mator, the lYoruitzThwnpson estimator, given by
i=g!_, id
(23) i
and the varia\:ce of $is given by
,
(24)
Survey smpling
in graph
241
where (25)
and icil==?QforietiIfq> estimator given by
i:
ralliUand_jeV;
then the variance #as an unbiased
(26) the Horvitz-Thompson variance estimator. Other variance estimators, such as for instance the Yates-Grundy variance estimator, will not be considered in this paper, and no attempt has been made to evaluate or compare variolus variance estimators. 4. Some illustratiouiof snowball samp’riiclg Assume that a graph is defined with the population V = {1,. , ., Iv> as its vertex set. Ry use of the graph it is possible to extend an initial random sainple S c V by joining the vertices which are adjacent after at least one vertex ii: S. The extended random sample becomes S’= A(S). Now S’2 S, and if this inclusion is strict, S’:I S, then a second stage is given by S” = A (S’)2 S’. If S” 2 S’ a third stage is given by S”’ = A (S”) =, S”, etc. The sequence S’, S”, . . . . of successively extended samples is called a sequence of snowball samples. The following discussion will mostly be restricted to one stage snowball samples S’.We shall indicate some situations where snowball samples might be of interest. Assume that the population can be partitioned int a equivalence classes according to some kind uf prior knowledge. A simple illustration is the partition of a pcpulation of individuals into households, Formally the household structure can be considered LSa graph with the vertices as the individuals and the edges connecting the individuals which are members of a common household. This graph is undirected, has a loop at each vertex, has no parallel edges, and has an edge between any two vertices in the same connected component. The equivalence relation “is a member of the same household as” is an example of a relation which is ordinarily known prior to the selection of the initial sample and consequently usable for this selection procedure. It is also possible to have an equivalence relation which is unknown but observable in the course of the survey. For instance the relations “belongs to the same opinion group as” or ‘ has the sarne background as” might be relations of this kind. If thq individuals in a population are supposed to report about the occurring edges in a graph representing an equivalence relation in the population, then it is natural to assume that’ the knowledge about &e occurring edges is not complete and perhaps not free fram error. The individuals might not be aware of some of their edges, and the graph mightthen represent a partially known equivalence relation. The connected components of such a graph are equal to the equivalence classes or are subsets of them. The prior k;\owledge concerning a population can also consist of more general relations than c:qui*valencerelations. For instance. there may be knowledge about
contacts or dependencies between the units ia 8 populatiolr! which can be reprek%%&d by a directed graph. The contacts can be cooperation contticts, friendship, Ikinship, economic depenciencies, etc. In the same way as for the equivalence relatidns the contacts can be considered to t e known and usable for the drawing ofthe initial sam$& ’ or the contacts can be unknown and partially observable in the survey+ In the ordinary sample survey thle sample data consist of the observed y-valu,es for the units in the sample S, i.e. yi for i E S. Whe n considering a snowball sample one has to distinguish between various kinds rJf inform,ation concerning the graph as has been indicated above. The graph can be c;ompletely or partially known prior to the survey. It can be possibie to make observations of certa!in graph properties simultzlneously as the Jr-valuesare observed for the units in the sample. We shall give two simple example. The sample information can be collected by indirect interviews, Then the information about the y-values of all the individuals in the snowball sample $’ is obtained from interviews with the individuals in the initial s,ample S, For instance each member of a household might report the y-value of any mer:&er of the household. Sample data thGn consist of yj for j E S’ and Xijfor i E S and j E V where 1 if i’ reports the p-value of j, xiJz 0 otherwise.
(27)
Another possibility is that the individuals in S give information about their y-values and their adjacencies, and that the information about the y-values of the other individuals in S’is then obtained by direct interviews with them. In this case the sample 4a;a consist of yj for j E S’ and xij for i E S’ and j E I/ where 1 if i suggests an interview with J, 0 otherwise.
Xij =
(28)
If the snowball sampling is continued with a second stage the sample data will consist of L’jfor j E S” and xii for i E S” and .j E l( and so forth if several stages are considered.
5. Estimation of a total from a snowball sample
Assume that the graph which is used in forming the snowball sample S’=A@) is completely known. The adjucency indicators are defined by .Xij = i
1 if i is adjacent to 1, 0 otherwise.
LY us denote the selection probabilities of the sampling design of S’ by
(29)
8m&-Er$@W,@on and’exclusion probabilitiesby P[M~S’)=n’(M), , ‘a/ /,I( Pl:lwcS’)=#‘(Ag). -“)
(31)
The following l&mma gives these probabilities in terms of the corresponding probabtiities for the initial raridom sample S. Lemma 1. The qwnplingdesign of a snowballsample S’= A(S) with an initial random sample S and ti fixed grapkhas the selectioit,inclusionavldexclusion probabilities given bY
.
p’(M)
= c
(-
l)‘%(B(Lu H)),
LSM
(32)
n’(M)= 2 (-l)‘%(B(zJ)), LCM I_
(33)
?t’(M)=i(B(M))
(33)
forsubsets M G Ir: Proof. The inclusion indicators of s” defined by
can be obtained from the inclusion indicators of S and the adjacency indicators of t,he graph according to . Sj’-= max S&J = ieV
1 - n (1 - Si). ie Bl
(36)
It foUowsthat
and by taking expectations we obtain (34).Substituting (34) into (20) applied to S’gives (33h,and substituting (34) into (21) applied to S’ gives (32). The followitig theorem shows that if we have recourse to snowball sample data then we can estimate the total 7’ by an unbiased estirnator which dominates the HorvitzThom;lsson estimator ?’ based on the initial random salnple S.
isdominated By the unbiased estimator
based on the snowballsample S’ = A(S).
Proof. We have to prove that ET* = T
d(T*)sa2(?)
and
for all (VI,. . .,yN).
We shall show that T* is equal to the conditional expectation T* =E( ‘I’IS’)= C fi E(SiIS’),
W
&YIti
and then the unbiasedness and the variance inequality will follow from the elementary rules of calculation for conditional stochastic variables. Now S; = 0 implies that SI~0, and it follows that
(41)
E(?iS’)= C ’ ?C(SilS’). ieS’
xi
In order to determine the conditional expectation E(S@‘) we note that
E(S,IS’= M; = l--
1 E(l-Si) P’(M)
1 =f_____._ p,(v) 1,
=l
E
C LSM
n S; n (1-S;) &M kd? (-lYL’(l_Si)
fl j&d?
(I-Sj)
(42)
where the iast equality follows by application of(37). Using Lemma 1 and substituting A4= S’ we then obtain the desired result, and the proof is demonstrated. The estimator T* depends in a complicated way on the sampling design of the initial sample, and it seems to be desirable to provide some simpler estimator. The following ves the Worvitz-Thompson estimator based on S’.
Swuey samphg iingraphs
I
245
Tbecrrcar 3. If Iz@i)<1fir i# the Horvitz-Thompson estimator of T based on S’ is giveit by
a&it
has a variance given by 02(W=C
c YiYjY;l
W)
leV jw
where r;=
it(B, u B,)-$B&(B& [1 -ir’[&)][l -*(B&]
’
Proof. Applying (23)-(25) with S’ and using Lemma 1 to obtain R;and xi, will yield the result. The following theorem implies that generally one cannot be sure that p’ dominates F Theorem 4. If there exists a pair of distinct vertices i and j such that the graph and t/w initial sampiing design satisfy
Bi = {i>c B,
and
ff(Bj)nij f nij - ninj,
(46)
then neither p nor ?‘domiriates the other.
Proof. We have to show that the condition (46) for some i # j implies that the quadratic form
is not non-negative or non-positive definite. According to well-known facts about quadratic forms it is suficient to prove that (Yii .“.. Y&!Jj
- r;l> -
(Yij -
l(j)’
< Q
W-9
for some i #j. Now using (25) and (45) implies that (49)
0. Frank
246
and
‘)‘i,- I$ # 0 if (46) is valid. Consequently (48) is valid ar d the theorem is proved. Note. To prove the existence of a sampling design satisfying (46) it is suflicient to note that
if the initial sample has a Bernoulli (pi,.. . ., pN) design with 0 c pr< 1 for i E K Theorem 4 shows that generally T’ is not dominating ? if the graph has a vertex which has no inedge but at least one outedge. The author has not been able to find a necessary and sufficient condition for ?” to be dominating t
6. Comments on graph observability
We have hitherto considered the vertex set I;/ as a population from which we have drawn the initial sample S. We have extended S to a snowball sample S’ by using a known graph, and we have observed the values of a variable y for the vertices in the sample S’. If the graph is unknown but locally observable at the sampled vertices, it may be asked what kind of graph observations are required in order to apply the estimators of the preceding section. An examination of the Horvitz-Thompson estimator T’ in Theorem 3 shows that it requires the observability of B1for ic S’. If the initial sampling design is a simple Bernoulli (p) design with p = 1 -q > 0, then
and Jt is sufficient to observe the size x. i of BI for i E S’. From Theorem 3 and the application of (26) to T’, it follows that the Horvitz-Thompson estimator of a”@‘) also requires the observabil’ity c#B, for i 6 S’. If the initial sampling design is a simple Bernoulli (p) design with !P= 1 - q > 0 then k(Biu
Bj) =
qx iexI- EkeYxkkrkj
(53)
and it fdflow~that s2( ?‘) requires ths observability of’the sizes X.i
and
C su,,xkj
(54)
lC4V
iandlBinB,fori~S’andjES’.
.
/ .,._
Szcrveyscmpli~g
ingraphs
247
observability conditions considered above require sample data concerning certala locat properties of the entire graph. In many situations, however, it is not possible to observe more than some local properties of a sampled subgraph. For instance & might ,not be obse::vable but only Sn I$. We are now going to study some estuntitionproblems based on partial graph knowledge. We shah introduce a vertexpair variablle y with values ysj for (i, j) TV V2. We shall show htiw various graph chara,cteristics can be described by the totals 8Thei
which we will call gwph totals, and we shall show how 7’c;ln be estimated from various kinds of sample da?a, 5
7. Estimationof graphtotals We:shall consider a graph with the vertex set I/= {1,. . ., IV},the edge set E and the incidence indkators xij definedby
if i is incident to j, otherwise
(56)
for (i,j)~ V2. Note that the x’s now refer to incidences and not ac’jacencies, i.e. the diagonal entries need not 5e 1. We hav.p 2 real variable y with unknown values yii for (i, j) E V2, and we are interested in estimating the graph total r At our disposal we have the values of y observed from a sample S’ of vertex pairs, i.e. yij for (i,j)~:S’ where S’ G 1V2. The sample S’ can be obtained in various ways from an initial sample S by using the graph. The initial sample will be selected according to a known sampling design. We shall consider two different kin& of initial sampling units. I[r\Section 9+-l1 we shall consider the case an initial vertex sample $ G V and some alternative ways of defining the final vertex p&r sample S’c_V2.After that we shall consider in Section 12 an initial edge sample $GE and some ways of generating a vertex pair sample S’ c V2 f~ii~ it. The sampling design of the vertex pair sample S’ depends on the graph and the sampling design of S. It should be noticed that xiJ and it; denote the inclusior and exclusion probabilities of a single vertex pair (i, j), and Z;ikland fCfjkl denote the inclusion and exclusion probabilities of a pair of vertex pairs (i, j) and (k, i). Thus (19) does not apnly without modification. For convenience we shall put nijii = 7~:~ and TZilij = %ijn ‘2 and can be determined from ltfthieinclusion probabilities zIi are positive for (i, j) E k the seimple data far (i, j)~ S’, then it is possible to apply the kiorvitz-Thompson estimation theory to obtain an unbiased estimator of 7’ as
of
0. Frank
268
The varizsnceof this edimator is
where
If A; is zero for some (i, j), ad if yij is zero for these (i,j), then the formulaecan be applied with the convention that the total is over all pairs for which it;] > 0. If the inclusion probabilities & are positive for (i, j) E V2 and (k, I) e V2 and can be determined from the sample data for (i, j) E S’ and (k, I)E s’, then the variance has the unbiased Horvitz-Thompson estimator
a2(T)=c WkS’
c
yijh
(k,ZW'
2 .
l
These formulae will be applied in the following in order to derive explicit results pertaining to various specifications of the way9 is generated from S. 8. Some illustrative examples We shall indicate some concrete situations where the graph totals may be of interest and where the sampling schemes fitting our theory may be app!icable. I A CONDUCT network or a sociogram is a graph where the vertices represent individuals, household.s, enterprises or some other type of units which are connected with each other by some kind of cc ntact represented by the edges. If the verrexapair variable y is defined as the number of edges which are incident with the vertex pair, i.e. yii = IEd, then the graph total becomes equal to the total number of contacts if these are directed, and twice the total number of contacts if these are undirected and no loops occur. With the alternative definition yiJ- xii, where xsj is the incidence indicator, the graph total becomes equal to the number of incident vertex pairs, i.e. the number of ordered pairs of units which have a contact. In particular, yri is equal to the incidence indicator xij in both the cases considered above if the graph is simple. If we define ytj = XiiXji for a simple directed graph then the graph total becomes equal to the total mutual edge frequency, i.e. twice the total number of reciprccnl choices in the terminology of sociometry. The following comprise some further examples of structure parameters that may be ofinterest i? contact networks. In an undirected sociogram the total number of isolated pairs of units with a contact can bc:obtained as a graph total by defining
Yij -
Xii
i 0
if Xi,
=X,.
t herwise.
=Xij,
Sutoey sampling In graphs
249
The tots4 number of isolated units in an xndirected sociogram can be obtained as a. graph total (which is a vertex-value total) by defining YQ=
1 if i=j andxi.=O, 0 otherwise.
(62)
In a directed sociogram the total number of contacts, which are taken by a unit which receives contacts from r units, and are taken with a unit which emits contacts to s units, can be obtained as a graph total by defining y _ l&,1 if x.I=r and +=s, ijI 0 otherwise.
..
(63)
In particular, in a simple directed graph this graph total becomes equal to the number of edges Froma vertex with Yinedges to a vertex with s outedges. Let us now consider some different ways of observing a part of a large contact network. Assume that a sample S of units is selected. Information about the occurrences of contacts for the sampled units is obtained from interviews with them. It may be convenient to ask about the contacts within the sample, for instance by presenting a list of S ad asking the units to indicate their contacts there. ‘ThisInay reduce the risk of missing contacts. Then the sample information comprise E,,, for (i, j) E S2, i.e. the subgraph generated by the vertex sample S is observed. Obviolusly, for some of the yvariables; above, it is not sufficient to observe only the contacts within the sample. By asking thleunits in the sample S about their contacts within the whole population V the informat ion is extended to E,, for (i, j) E S x V v I/ x S. The observation procedures can also be based on samples of contacts. The contacts can for instance be letters, phone calls or some other objects or occurrences which are possible to record and sample. If S is a sample of contacts it maly be convenient to ask the incident units, the units in II (S)LII2 (S), about their contacts. The infhlrmation obtained from S might then consist OfE, for
Flow storks are graph models which are used to desi;ri’Detranspcrtp!tion and communication systems in a wide sense. The flows between the lxerticescan be amounts cf money’trensferrcd between different accounts, quantities ,ofgoods distributed from producers to consumers, information, rumour or infectior[ ccmmunicated between individutJs, etc. Let _fijdenote the amount of flow transmitted Fran vxx i to vertex j for 1:#j, a,nd let gi and hi denote the amounts of flow which are gen(,:ratedand z,bsorbed, respectivlzly,at vertex i. The generated and ttbsorbed flows at ve:Tts*x i can be :he initial and the final amounts of a certain period of time, or they can blethi; flows to v&cx i from the sources outsid:: the considered ver’texset V and the flows from vertex i to the receivers outside F respactively. In a goods distribution system the total amounl: of transportation wL>rkcan be obtained as a graph total if Yijis defi:led as the product of the &mount of goods j;jand
0. frmk
250
the distance d, from
.hj‘jij
3‘@= Q i
i to j,, is.,
if i+j, atherwk.
KW
If t-e-loading of the incoming and outgoing goods is required ;;t each vertex, then the total amount of re-loaded goods can be obtained as a graph total by defining !‘ij=
uj
if i+j, gi + fti if i-j.
I
If the generated and the incoming flow requires sorting at each vertex, then the total sorting time can be obtained as a graph total by defining
where the sorting time per flow unit is used as a time unit. Consider an information transmission system where documents are generated at the vertices, are copied and distributed to all the reachable vertices. If Gi,iis a path indicator defined by -. =, [ 1 if i ==j or if there is a path from i to ;, -1 (0 otherwise
(67)
and if J’iJ= giZlithen the praph total becomes equal to the total number of generated and copied doc?rments. If we introduce the reachnbility Zi. we can obtain the graph total as a vertex-value total with vertex values giZi*for i65‘k:Distinct information groups, i.e. subsets of vertices receiving the same documents, are represented by the connected components of the graph. If the graph is directed the groups may have overlapping information, otherwise net. The total number of distinct information groups can be obtained as a vertex value total by defining th.e vertex values as the: inverted reachabilities l/zl.. Partial ol Licrvation of a flow system can be effected by taking a vertex sample S and by rnkking observations of yii for (i&e S’ where S’ can be equal to S’, S x I/ S x Vu 1’x !; etc. Another possibility is to choose a sample of llow units and observe their walks. This means that some of the path indicators ZiJax2 observed. We are not, l~we~:er, going tlj consider any kinds of walk sampling procedures in this paper. Some important kinds of partial observation based on edge sampling can be illustrated by tlo:: following example. Assume that the gr;bph describes some sort of incorrect or illegal economic transactions between the individuals in a population. Assume that some of the improper transactions can be discovered by a routine control improper ones, If an improper iscovered, tltlen a more detailed
&m?tTyimpling lrngdqdfu
2.51
e~amhati~an of the transactionsfrc?mi to J’can be performedsnd luadto the discovering of 811the improper transactions from x’to j, say91~~: = lEljl in number, A 1morethorough
control pclncedure might examine all the outgoing transactions from i and all the iqqx&g trqpq~tisns. t,oj ,nd find-theimproper ones, i.e, I$, u A!LJI Sorarer alternative contrx4.~roc&ures tiightaxtr,mina @I:thes inlcsmitig.and/ar outgoing transactions at 1
an,d&krj, ‘I%@ initially ‘f43und. 34% S of imprap6v transactions will By the u8e of a ccombinedroutine snd &t;ailerdcontra1scheme ive information about yij for (i, j) E S’ where SLc&nbeequal to I(S), II (8) x Vu V’x I#), [~,(S)u~,(S)] x v; etc, Other .Gnds of graph models which are of interest in cluster analysis and multidimer~sionnl scaling are the simSIarity or dimhilarity sta’uctwes.The vertices can be plants, bacteria, finger prints or some other kind af objects which we want to compare and separate into classes of similar objects. If object i is characterized by a d:ptrlveCfor(xll,...,XI,)forf=l , . I., N, then the dissimilarity between objects i and j can be defined as the squared Euclidean distance Jij =
kiltxik- xjk )”
l
ow
z5
The corresponding grtipti total becomes equal to
where 1 N (70) a; =--- C( Xik-X.k)2 N 1-l is the variarlce of the kth component of the data vectors. The graph total is an overcll measure of’tiissimilarity. Assume that we know the data vectors of the objects in a sample S. TThenwe can calculate the dissimilarities yrj for (i&z S2, and consequently we can consider the problem of estimating T as a graph tot4 estimation problem based on S2 data, 9. Estimatiionof a graphtotal from S2 data Let the initial sample be a vertex sample SG Y and consider the simple case of a vertex-pair sample S’= S2 consisting of the pairs of vertices both of which belong to S. Obviously $ = ltij and ltfpr= qjkl and by applica*ion of (57)-(60) we obtain the following theorem.
.
‘,
It is possible to simplify the above result considerably Iz~yspecifying,the: ~idtial sampling design to a kind of sync,nnetricdesign which halsarJddepending ~~n%l~~titi@~~.: hut not the idintities of the dis’mct vertices in the quadlruple (f9ji k, !)I Sitiple &~&ti: sampling with t;r without q2;~ment and simple Ber&oullisampling tsreWMII~~~S’~~~. such symmetric designs. Frank (1971, 1977) has @en simpUied ve&ans of the variance a2(? ) and its estimator in the case of symmetric dittsigns.In order tt, illustrate these results we shall now state some results pertaining to simple Bernoulli samphg, 7 6. Let S be 4 vertex sample selected according to ti simple Rernaul~i -(p) design with p = 1 - q B 0. Then the Hortlitz-Thompsian estimator of T based on S’ = S” is given by
Corollary
and its ouriance is given by
An trnbiused estimator ofthe variance iz given by
Proof. Substituting ,
if i=j, P2 if ifj
Cp
ltij = ltij = I
i
into (57) yields 2 From (59) we obtain
Yi;kl
J’-pr+s f+S P
if (i, j, k, I)tz C,,,
(176)
where C,,, is defined as the set of those (i, j, k, I) which have r and s diflercnt vertices ir, the two pairs (i. j)and (k, l), and t different vertices in the quadruple ii, j, k, I). The qnge of (r, s, t) is the set J=~(I
‘9
1,1),(1,1,2),(1,2,2),(1,2,3),(2,2,2),(2,2,3),(2,2,4)}.
vv
The A,, can be given by
=c
A123
Ykkhj-+Yjih
A222=chj(Yij+Yjih
where al: the sums are taken with different values in V for all the indices. Using a number of lengthy algebraic calculations it is possible to show that (78) can be simplified to (73). Finally, substituting
into (60) and simplifying yields (74). This completes the proof. In particular this corollary can be applied to obtain an estimator of the edge frequency of a general graph, if the sample data consist of the subgraph generated by the initial vertex sample S. The following corollary gives the result for a simple undirected graph. Corolhwy 7, Let R be the total edgefrequency aad Q thesum of squares @he local edge
jbequencies qf the oertices in a simple undirected graph. Let R(S) and Q(S) be the same entities in the subgraph gmemted by a vertex sample S selected according to a simple Bernoulli (p) design with p t= I- 4 > 0. Then R has an unbiased estimator R
R(S)
-=--F-
($2)
which has ths variunce
(83)
and a@ rcrsbiasedvariuace estimator is given. by #(R)=+
P
Q(S)-;;
R(S).
(84)
ProoL If yij =.xif is the incidence indicator+ then
and it follows from (80) that 411
=A112
=4*2=423=0,
A222=4R,
A223=4Q-8R
(86)
A22h=4R2-4Q+4R.
Applying Corollary 6 to ?‘= 21i yields the result.
10. Estimation of a graph total from S x vu
data
We shall now consider sar.lapledata yij for (i, j)~ S’ where S’ =:= S x Vu V x S, i.e, the vertex-pair sample S’consists of the pairs of krtices of which at Xleast c3nebelongs to the initial sample S, Theorem 8. Let S be a vertex sample with ft, < 1for (i, j)e V2. Then the MorvitzThompson estimator ofT based on S’= S x Vu V x 3”is given by (57) where $ = 1 - I&~, and its variance is given by (58) where
If ii,, + &, - tiiyl< 1 *for (i, j, k, I) E I/‘, then an unbiased variance estimator is given by (60) where $k, = 1 -. tiij -- tiki + ii,,. Proof. If SiJar,d SI are inclusion indicators of S’ and S, then it can easily be seen that
s;j-
1 --(I -&)(I
-sj,,
(fw
a’@)=$ALlI
+
j$
(A122fA,,,)+p(+$j5
W
A223
where the A,,, are given by (80). An unbiased estimator ofthe variance is given by
t?‘(lt’)=$AII1(S)+
(I2
p2(!!-q2)
A,,,CU+ A222(S) 1
+-L
A223(S)
p4(1+4)2
(93) where the A,,,(S) ure given by (80) with V repluced by S
Proof. Substituting f$r=: 1 -qp
I
p
l-q2
if;=j, if i+j
(94)
in (5’7)1 yields (91), and substituting
in (!W yi&k (92). Mere Gpsris defined as in (76). The HorvitE-Thompson variance estimkaltorcould be obtained by substituting lticiki= 1 - 4’ - @+ q1 tf (i, j, k,I ) E C,,,
(96)
2%
* Q.Fmk
-
1
-
/.
/‘ > ,I 9’ , ‘,,., I_ ‘%i *I’ ,-, ’ ,; ‘1
but the simpler variance estimator given by (93) is obtain& from (%)%y:V&B ihst A, has the unbiased estimator A,&Jlfpg. In particular, Corollary 9 can be applied to obtain an estimator of the?7dge frequency of a general graph, if the sample data consist of the observ@iona of al1 the.qes tihich’ are incident with any of thr vertices in an initial vertex ~&mpk S. “F&p: ‘~i$~&t@~ corslhry gives the result for a simpie undirected graph. The proof is strai&tftimkrd and is omitted. in (tj@
. Corollary
10. Let 19 be the dotal edge fwquency and Q the sum of.squares afthe &~f
edge.frequencies elf the vertices in ~1&q~& undirecfed graph. Ler: R’(S) and R(S) tre,the numbers ofedges whichrare irrcident with ml.) one vertex iq S antilwith two vertices & 5, respecGveiy, where S is a vertex sample selected according to a simple Bernoulli cp! design with p = 1 - y > 0. Then R has aa unbiased estimator
R
= R(S)+ R’(S) I-------7’7
_
which has the variance
An unbiased variance estimator is given hi
(99) where Q(S) is the sutn of squares of rhe local edge,freyuencifps of the vertices in the subgraph generated by S.
A comparison between Corollaries 7 and 10 shows that the variance reduction is
due to the inclusion of the sample information about the edges which are incident with only one of the sample vertices. Bfthe vertex pair values yij are available for S’ = S Y Vu V x Sit is possible to observe the vertex value
-* L
-5
Yji
Yij
yi
=
-II_
-;i-
__--
W)
Ify isa non-negative or non-positive variable, then T* is dominated by the estimator ‘7’in Co~oll~rr~~ 9; stherwise the variance of T* can be smaller than the variance of?.’
Wof. The varianceof T* is eqtlal to (103)
and using (80) and Corollary 9 it follows that (104)
Accordillgto (SQ this differemx is non-negative if yij 2.0 (or .vij d 0) for all (iJ) E V2. The diflerence becomes negative if there is a pair (hi,j) such that Yij-I-Jtji>h0 and yii is negativeand of suEGently lar ge absolute value. Corollary 12. With the asswmptiotls and notations; in Corollury 10 R has an unbiased estimator
R* =------2R(S)+ R’(S) 2P b#hichhtis a variance sutisfying
with equ&ity
(f and only if iike graph has at most one incident edge at each vertex.
(105)
with equality if and only if & yij is equal to 0 or 1,for each I G K 11.
Estimationof ti graphtotal %wrtsome other khds of wrtctx-samplegenerateddaata
The general formulae in Section 7 can be applied to many other kinds ofdata than the two kinds considered in Sections 9 and IO. We sloalIillustrate this by giving some further examples of vertex-sample generated data, We shall first consider sample data J’ij for (i, j) 6 S’ where S’ is equal to S x V or S x A(S). The inclusion indicators of S’ become qj
:‘D
si,
I
(l_SkXkj)
Sij=Si
1
in the two cases, respectively. In the first case the Norvitz-Thompson estimator of the graph total 7”becomes (110)
and the formulae (57)(60) can be directly applied. In the other case it follows after a ti:rtsin amount of algebra that 7c -ij=%i
+
X(Bj)
_
fi(fi)
V
W) Bj),
and
and (57)-(60) can be applied. In pal titular it Xrj=Q implies that yij = O9then the two estimators ( 110)and (113) become ldenticai. Consider t~ow the case of samyle data from all ths pairs in a onemstagesnowball
-~(B,uB~uB,)+~(B~vB,uB,uB,)
(116)
The application of the variance formula (58) is somewhat simplified if xlj = 0 implies :lij =O. We shall now consider such a case, If we define v..x , JJ
XijXji for if j, 0 otherwise,
I
then the total T becomes equal to the total number of mutual edges in the graph. The following theorem gives the estimator of the total mutual edge frequency pertaining to a simple Bernoulli (p) design and a graph with exactly one loop and one other outedge at each vertexTheorem 1X Let S be a vertex sample having a simple Bernoulli (p) design with’ p=l-q>O. Assumethatxl,=I andxi. =2 *forits V.Then an unbiased estimatclr qf’ the total number u$mutual edges.
btased an S’ = A( S)2 is given by
and its r?u~inmt
is given by I.-q4
a2(?+----
4”
”
l-cc
ifzv jev i+j
Xvl’kXs
1
2
j
xijxjiq _qX.i_q",j+ilX.'~X.j-2
l.-___Iu_.‘__-_
---_
1
’
Nore. Ilt should be noticed that the estjtnatar requires the fabservab%tyuf th4 kotr@ number of inedges at every vertex in the snrlwball sample .4(S). If M,., and &&St.) denote the numbersof mutual edges between the unorderedvertex pairs in Pa and S
which have r and s inedges, thl:n (118) and (119) can IN given by
Proof. According to ( 11-Z)a simple Bernoulli (p) design implies that n;=
1 _.q”.r-qX.j-(_q”.f+~*!-“f~-x~f
(122)
and ( ! !I?)foi!ows. Moreover, assuming XijXji= X&1X1& =
1 d
(123)
9
it follows from (115) and (116) that j%;ik,-~;j~;,=(
1
_y4)qx.r+4.j”x.r+x.r-4,
(124) According to (20) (58), (59) and (124j we obtain (119) by substitution and simpiificatio~.
12. Estimation of a y-aph total from some different kinds of edge-sample generated data
We shall now turrl tc, vertex-pair samples S’G V2 generated by initial edge samples SEE.
Consider first 3 directed graph and define S’ as the set of the incident vertex-pairs of the edges in S, i.e. S’ = I(S). Then the sample inclusion indicators satisfy
Igj
fl (1-S,)
lf
E;j#(b*
PCEij
=
0
otherwise,
(W
and we obtain
Cl26) Hence we have the followmg theorem.
(129)
In order to illustrate these formulae we consider a simple particular case. Assume that one observes the subgmph generated by 11(S)u12(S) in a general graph, i.e. the subgraph generated by $ GE in a simple graph. Put yrJ= lEul which implies that T becomes equal to the edge frequency in a directed graph. If S has a s+nple Bernoulli (I>1 design with p = 1 -q >‘I, the estimator becomes equal IO (131) where m, (S’) is the number of ordered vertex pairs in S’ which are incideut with exactly r parallell edges. !f qi& the total number of or&red vertex pairs which are incident with exactly r parallel edges then the variance of ‘? is given by I 132)
and 3~ unbialsed variance estimator is given by (133)
, A simple direct proof is easily obtained by using the fact that w$?) is binomially distributed \yith the parameters rn~and 1 - @‘, t ^
Q, Frunk
262
<
iat S be apt edge sample and S’= lx (S) x Vu V x 12(S)1The HoriW~ Thompsonestinwtor of T based on s” is then given by (57) if I$= I- 8(El, u E.&Al unlessyij --0. its variance is given by (58) and (59) where
lbeorean 15.
&,=
1 -@(E~.uE.~)-~~(E, uE.~)+~(E~.uE.~wE,.wE.~).
,
(134)
An urtbiusedvarianceestimator is given by (60) ifziJM> 0 unfess yIjyhr= 0.
k-f.
The sample inclusion in rficators satisfy SiJ=
’-
n
et?E.iUE.j
(l-Se)
if E,.uE.~#Q),
otherwise,
0
(135)
and it follows thnt U
~‘(M)x%
(Ei.L!E-j)
ii&&i
>
for MEV’.
(136)
An application of (20) will complete the proof. In particular we obtain for a Bernoulli design that yifarin (59) can be given by (137) where 4 is defined by (138)
For a simpk Bernoulli (p) design with 0 < p < 1 and q = 1 - p we obtain It.;{=
l-q’
(139)
kr (i, j)E C,,
~;jk,=~-6]r-@LJ'~q'fsmt
for (i9j,k,&d&
(141)
where C,=((i,j)E v2:iEi.dcj~-r),
lEk.UE./I-S* lEi,UEk;J=t}.
(142)
(143)’
’
\
Survey sampling in graphs
263
If*fenowSnWxiuce the parsmel ers (144)
(145)
(146)
(143
$(Q=
c
qr+“-‘(l -4’)
r,s,t
(1-q’)(l - 4”)
t?2(T)=c r,s,t
-
(MS;
r,,,
4 r+s-y1
-q’)
Y,tW)* (1-mu -4”)U -qr-qS+q’+S-r)
W)
The results above can be modified and extended in various ways to cope with some related sampliq procedures, for instance S’ = J,(S) x I,(S) and S’==[~,~S)UP~(S)]~. The application of our tools to these cases does not present any new problems in principle, hut may provide interesting results if different sampling and observation procedures are compared for vxious kinds of graphs. . Referegces Austin, TF.E.,R.E. Fagrn, W.F. Penney, J. Riordan (1959). The number of components in random linear graphs. Ann. Math. Statist. 30,747- 754. Bloemena, A.R. (1964). Sampling from a graph. Mathematisch Centrum, Amsterdam. Capobianco, &%I. (1970) Statistical inference in finite populations having structure. Truns. New York Acud. SC&32,401*413. Cap&ian.co, M. (1972). Estimating the connectivity of a graph. In: Y. Alavi, D.R. Lick, A.T. White, editors, Grcrpk rireory and appficatbns. Springer, Berlin. Capobianco, M. (1974). Recent progress in stagraphics. Ann. New YovrcAcud. Sci. 231,139.-141. Deming, ‘W,E.,G.J. Glasser (l!%N). On the pr6blem of matching lists by samples, J. Am. Sttrtist. Assoc. 54, 403-415. Erdos, P. (INS}. Graph theory and prabability I. Canad. J. Math. l&34-35, Erdiis, P. (1962). Graph theory atid ptwbsbility II. &tan. .T.MntEr. 13,34&352. Erdiiti P., A. Rknyi (19S9). Qn random graphs. Pub!. M&I. Debwcen 6,290~-297.
Erd6s. P., A. Renyi (l%O). On the evolution al’random graphs. Puhh b#aoh. B&H. Hrmgore &a& Sci. $,17-6&: Frank, 0. ( 1968).Stochastic competition graphs, Ret!, ILIC$~)~WC. Statist. P&qt.36,3,31%326, Frank, 0. (1969a). Structure inferenw and stochastic grs~phs~FOA Repwts 3, (2), l-8. Frank, 0. (I969b). Estimation of the flow in tree graphs. Foal RicrporfsJ (6X l--10. Frank. 0. (1970). Sampling frotmov.riwpping subpopulation% IUerrika 16,1,32-h& Frank, 0. (1971). Statistical infitrer x in graphs, Farsvarets forakningsanstal!, Stuckhola~, Frank, 0. (1975s).Sampling popdation grilphs. Paper presiented at the 1975 meeting of the :fnteraati~ns~ Association of Survey Statirrticians in Warsaw. Frank, 0. (1975b). Survey sampling in graphs. Paper presented nt the Mathematical Social Science Boerd SocialNetworks Symposia-.a.held in Hanover, New Hampshire, September 18..+21,1935. Frank, 0. (1976). A’note on Bern&Ii san. lling and Horvi &-Thompson estimation. Dept. of Statistics,Univ. of Lund. Frank, 0. (1977). Estimation of graphto\&. SC&. .I. Sttltist. 4,2. Ghosh, S. (1966). Poiycholomy sampling. Ann. Math. Scratlst.37,657-665. Gilbert, E.N. (1959). Random graphs. Ann. Mark. Statist. 30,1141.-l 144. Gilbert. E.N. (1961). Random plane networks. J. Sot. Indust. Appl. Macir. 9,533-543. Gcdman. L.A. (1952). On the analysisof,samples from k tists.A~IFI.Murk Ytcotist.23,632-634, GoodmaIn. L.A ( 1961). Snowbirll sampling. Ann. Math. Stutist, 32,148- 170, Hubert, L.J (1974). Some applications of graph theory FOclustering. Psychomerdw 39 (3), 283-309. Ivchenkc, G.I. (1973). On the asymptotic behavior of degrees of vertices in a random graph. Theory IVob. Ap~/h*uriOttsI ti ( 1). 188 - 195. Kendall. M.G ., A. St ua:l ( i 966). ‘The Aduwcad Theory I@ Statistics, 3. Griffin, London. Linp, H.F. ( 1973). A probability theory of cluster analysis. J. Am. St&t. r4~~~~. 68,1$9-164. Ling. R.F. (1973). The expected number ofcomponents in random linear graphs. Ann. Prob. 1 (J), 876-881. Ling, R.F. t 1975).An exact probability distribution on the connectivity of random graphs. J. M’nrh.Psychof. , I2.90 98.
blahdo, D.W. (1970). Ciuste: analysis via graph theoretic techniques. In; KC. Mulling, K.B. Reid an’d D-P, Ros~llc, edi!rrcG.Yro~wti:ngs CI/the Louisitrn Cot$erence on Combinatorics, Graph Theory ad computing. iJniv. of Manitoba. Wit\?\rpeg, 199.-212. Nathan. G. (1976). Aa empirical study of response and sampling errors for multiplicity estimates with differenl counting rr:les.J. Am. Sralisr. Assac. 71,808.-815. Noltemeier, H. (1976). C~raphenrheorie mit ,4lgurithmen und Anwendungen. de Gruyter, Berlin. Pr(xtor, C.H (1966). Two mcusurement and sampling methods for studying social networks. Dept. of Statistics. Ni)rth Carolina State IJniv. at Raleigh. Proctor. (.‘.H. (1967). The variance of an efstimatebf linkage density from a simple random santple of graph nodes. Prr~. Srrrial Stori t. Sect. Ant. Waist. ,4ssoc.342 -343. Proctor, C’.H., P. Suwattee (1970). Some estimators, varianctcsand variance estimators for point-cluster sampling ofdigraphs. Dept of Sta;istics, North Carolina St& Univ. at Raleigh. Rbnyi, A. \ 1959 1On connected graphs. Puhl. Math. Inst. Wungur.dcad. Sci. 4,385-388. Robinson. R.W., A.J. Schwenk (1975). The distribution of degreesin a large random tree. Dbcrete Math. 12, 4.359 372. Shcrdon. A.&. (1970). Sampling directed graphs. Dept. of Stalastics,North Carolina State Univ. at Raleigh. Sirken, M. (1970). Household surveys with muiriplicity. ./. dnr. Statist. Assoc. 65,257.-266. Sirken, M. t 1972a). Stratified sample surveyswith multiplicity. J. Am. Statist. Assoc. 67,224.22!7. Slrkcn. M. (1972b). Survuy strutegiesfor estimating rare health attributes. Ptocecdings @the Sixth Berkeley S_w~posiumm Mnthumuticul Statistics trrtd Prabddlity~ Vol. VI, Univ. of California Press, 135-144. Sirkcn. M. (1972~).Variance components of multiplicit;t, estimrrtors,Biometrics 28,869-873. Sirken, M., Levy, P. (1974). Multiplicity estimation of proportions based on ratios of random variables. 3. .4In. Sfufisl. ASSOC. 69, 68 -72. Stepanov. V.E. I 1969). Combinatol iai argebra and random graphs, Theory Prob. Applicdons 14,3,373-399. Stephan. F .I;. ( 1969g.Three extensions of sample survey technique: hybrid, nexus, and graduated sampling. In: N.L. Johnson and H Smith, editors, Nuw dcr:elopnti!ntsin SMOOCH sampling. Wiley, New York, 89.-104. ‘fainiter, M. (1975). Statistkat theory of connectivity. Dwrers Math. 13,4,391.-398. 7 apicro. C.. t3. Boots (1974 I ,’ sirctural inference in transportation networks. Enoiranmmt und Planning 6, 411 41s. T$wo. (‘.. M. C‘apobianco 3rd A. Lewis (1975). Structutd itsference in organizatios. J. M~ak Social. 4, 121 ‘10