Quo Vadis, Graph Theory? J. Gimbel, J.W. Kennedy & L.V. Quintas (eds.) Annals of Discrete Mathematics, 55, 349-366 (1993) © 1993 Elsevier Science Publishers B.V. All rights reserved.
EXPLORATORY STATISTICAL ANALYSIS OF NETWORKS Ove FRANK Department of Statistics, Stockholm University Stockholm, SWEDEN
Krzysztof NOWICKI Department of Statistics, University of Lund Lund, SWEDEN
Abstract We review standard multivariate statistical methods useful for exploring network data and discuss various problems related to statistical analysis and modeling. General methods are suggested for three main problem areas, namely whether there is a need for block models, whether there is dependence between dyads, and whether there is dependence between different networks. In particular, we illustrate the use of logit regression analysis in order to fit log-linear models. We comment on various themes in the literature that are important for future research on statistical graph modeling.
1. Exploring Network Data
Network data consist of attribute and relationship data on a set of individuals. Typically, we observe many different attributes and kinds of pairwise relationships, and thus we have multivariate data referring to individuals as well as to pairs of individuals. Essential aspects of such multivariate data can be described by graphs and multigraphs. Random graph theory owes much of its development to the attempts being made to model uncertainty in networks. Uncertainty due to sampling variation, measurement errors and other inaccuracies necessitates the use of families of random graphs that depend on parameters that can be interpreted as quantities governing or controlling the outcomes of the random graphs. For example, in the exponential family of directed graphs introduced by Holland and Leinhardt [1], each vertex is characterized by two parameters governing the outcomes of out- and in-degree while two overall parameters govern reciprocity and density. In order to find an appropriate family of random graphs for a certain application, there is a need for exploratory and confirmatory statistical methods by which empirical network data can be analyzed and fitted models evaluated. Such work might require special statistical methods but can also benefit from standard multivariate statistical program packages. Special computer software has been developed to analyze particular network models like, for instance, the Holland-Leinhardt model [1]. The approach of using log-linear analysis of multiway frequency tables with network data applied by Fienberg and Wasserman [2] and Fienberg, Meyer and Wasserman [3] is an example of the usefulness of standard multivariate computer packages for the exploratory analysis of network data. A few other references to exploratory network analysis are given by [4]-[8]. The possibility of using easily available statistical software for network analysis has great potential.
Simple tools are suggested in the following for exploring and modeling the statistical structure of network data encountered in various applications. We emphasize and illustrate these ideas by discussing in a fairly general way the choice of appropriate variables and units of analysis and the application of standard multivariate statistical techniques in order to find
useful models for describing networks and explaining their structure. To that end we focus on variation or change of various kinds in a network. Variation in the outcomes of individual statistics can be caused by some inhomogeneity that should be explicit in the model, or it can be caused just by random variation according to a model with individual homogeneity. Variation in the outcomes of dyad statistics, i.e. statistics referring to pairs of individuals, can be caused by some structural dependencies that should be expressed by the model, or it can be caused merely by random variation according to a model with dyad independence. More generally, we could consider variation in the outcomes of triad statistics or other statistics referring to more than two individuals, but such variation is more difficult to relate to a plausible model, unless prior information is available that suggests specific model assumptions.
If a network changes with time, then either there is a need for a non-stationary model or the changes are considered as random fluctuations in a stationary model. Long series of networks are usually required to obtain statistical information about changes. Short series might suffice if attention is restricted to simple summary statistics of the networks. Special attention is due to the frequency distributions of vertex statistics, dyad statistics, triad statistics, and other statistics referring to only a few individuals.

The next section introduces some terminology and notation and, in particular, systematizes the kind of statistics we need for an exploratory analysis. Section 3 reviews some basic random graph models that are later used for illustrative purposes. Section 4 discusses how cluster analysis can be used to study the effects of individual heterogeneity. The analysis of dyad statistics is discussed in Section 5. A general method for using logistic regression analysis in graph modeling is described. Section 6 considers time series analysis of graph statistics. For large graphs, the practical problems involved in finding good ways of plotting a graph can sometimes be hard, and some suggestions are given in Section 7. Section 7 also describes a real data set consisting of a sequence of social networks, and Section 8 applies some of the suggested exploratory techniques to these data. A few concluding comments on exploratory analysis and modeling of network structures are given in Section 9.

2. Preliminaries
A network on $n$ individuals is specified by a matrix $z = (z_{ij})$, where the diagonal entries $z_{ii}$ are attribute vectors characterizing the individuals $i$, and the off-diagonal entries $z_{ij}$ are attribute or relationship vectors characterizing pairs of distinct individuals $(i, j)$.
For instance, $z_{ii}$ can be a two-vector giving the gender and age of individual $i$, and $z_{ij}$ can be a two-vector giving the duration and strength of a certain type of contact from individual $i$ to individual $j$. In the simplest case with no individual attributes and just one single symmetric relationship, the matrix $z$ can be taken as the adjacency matrix of an undirected graph; that is, $z_{ii} = 0$ and $z_{ij} = z_{ji}$ is an indicator of the occurrence of the relationship between individuals $i$ and $j$. Any combination of characteristics of individual $i$ that can be derived from the matrix $z$ will be referred to as individual statistics and denoted by $x_i$. Thus, provided the components of the vectors $z_{ij}$ are numerical and can be added, $x_i$ can, for instance, be defined according to

$$x_i = \Big( z_{ii},\ \sum_{j \ne i} z_{ij},\ \sum_{j \ne i} z_{ji} \Big).$$

In the case of an adjacency matrix $z$ of an undirected graph, the two-vector

$$x_i = \Big( \sum_{j \ne i} z_{ij},\ \sum_{j \ne i} (1 - z_{ij}) \max_k z_{ik} z_{jk} \Big)$$

characterizes individual $i$ by its numbers of neighbors at distance 1 and 2.
Any combination of characteristics of the two individuals $i$ and $j$ that can be derived from the matrix $z$ will be referred to as dyad statistics and will be denoted by $x_{ij}$. For instance, $x_{ij}$ can consist of $z_{ij}$, $z_{ji}$, $z_{ii}$, $z_{jj}$, $\sum_k z_{ik}$, $\sum_k z_{jk}$, $\sum_k z_{ki}$, $\sum_k z_{kj}$, provided the components of the vectors $z_{ij}$ can be added. In the case of an adjacency matrix $z$ of an undirected graph, the four-vector

$$x_{ij} = \Big( z_{ij},\ \sum_k z_{ik},\ \sum_k z_{jk},\ \sum_k z_{ik} z_{jk} \Big)$$

characterizes dyad $(i, j)$ by its edge indicator, its initial and final degrees, and its number of vertices adjacent to both $i$ and $j$.

For a network given by matrix $z$, any function of $z$ will be referred to as network statistics and denoted by $x$. For instance, $x$ can consist of the mean vectors

$$\frac{1}{n} \sum_i z_{ii} \quad \text{and} \quad \frac{1}{n(n-1)} \sum_{i \ne j} z_{ij}$$

and the corresponding covariance matrices of the components of the diagonal entries and of the components of the off-diagonal entries, provided these are all numerical. In the case of an adjacency matrix $z$ of an undirected graph, the two-vector

$$x = \Big( \sum_{i<j} z_{ij},\ \sum_{i<j<k} z_{ij} z_{jk} z_{ik} \Big)$$

characterizes the graph by its numbers of edges and triangles.

3. Some Random Graph Models
A random subset of a set $S$ is called a Bernoulli ($p$) subset if its elements are chosen by selecting independently each element of $S$ with a common probability $p$.

An undirected random graph $Z$ on $N = \{1, \ldots, n\}$ is a random subset of $\binom{N}{2}$, i.e. of the set of all 2-element subsets of $N$. A Bernoulli ($p$) subset of $\binom{N}{2}$ is called an undirected Bernoulli ($p$) graph on $N$. Its adjacency matrix $(Z_{ij})$ has $Z_{ii} = 0$ and $Z_{ij} = Z_{ji}$ for all $i, j \in N$, and the $\binom{n}{2}$ edge indicators $Z_{ij}$ are independent Bernoulli ($p$) variables for $i < j$.
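The definition of an undirected Bernoulli ($p$) graph translates directly into a simulation. The following is a minimal sketch in Python, not part of the original paper; the function names `bernoulli_graph`, `count_edges` and `count_triangles` are hypothetical and chosen only for illustration. It samples an adjacency matrix and evaluates the two network statistics mentioned in Section 2, the numbers of edges and triangles.

```python
import random
from itertools import combinations


def bernoulli_graph(n, p, rng):
    """Return the adjacency matrix of an undirected Bernoulli (p) graph on n vertices."""
    z = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Each of the n(n-1)/2 dyads receives an edge independently with probability p.
        if rng.random() < p:
            z[i][j] = z[j][i] = 1
    return z


def count_edges(z):
    """Number of edges: sum of z_ij over i < j."""
    return sum(z[i][j] for i, j in combinations(range(len(z)), 2))


def count_triangles(z):
    """Number of triangles: sum of z_ij * z_jk * z_ik over i < j < k."""
    return sum(
        z[i][j] * z[j][k] * z[i][k] for i, j, k in combinations(range(len(z)), 3)
    )


rng = random.Random(1993)  # arbitrary seed, used only for reproducibility
z = bernoulli_graph(30, 0.2, rng)
print("edges:", count_edges(z), "triangles:", count_triangles(z))
```

The triangle count enumerates all triples and is therefore cubic in $n$, which is adequate for graphs of a few hundred vertices.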
If $\{N_1, N_2\}$ is a partition of $N$ into two disjoint non-empty subsets, and if $Z_1$ and $Z_2$ are independent undirected Bernoulli ($p_1$) and Bernoulli ($p_2$) graphs on $N_1$ and $N_2$, respectively, then the union $Z = Z_1 \cup Z_2$ is called an undirected Bernoulli block model with two blocks $N_1$
and $N_2$. If $Z_{12}$ is a Bernoulli ($p_{12}$) subset of $N_1 \times N_2$, then the union $Z = Z_1 \cup Z_2 \cup Z_{12}$ is a general undirected Bernoulli block model in which edges are allowed also between the blocks. This definition is readily extended to more than two blocks.

Bernoulli block models, like Bernoulli models, have independent edges. A simple model exhibiting dependent edges can be introduced as follows. Let $H$ be a Bernoulli ($p$) subset of $\binom{N}{3}$, i.e. of the set of all 3-element subsets of $N$. $H$ is a random hypergraph. For each hyperedge $\{i, j, k\} \in H$, consider the complete undirected graph $K_{ijk} = \{\{i, j\}, \{j, k\}, \{k, i\}\}$ defined on $\{i, j, k\}$. Define the graph $Z$ as the union of all these complete graphs $K_{ijk}$ for $\{i, j, k\} \in H$. This random graph $Z$ is called a Bernoulli ($p$) triangle graph on $N$. Various generalizations are obtained by specifying other graphs than complete ones on the hyperedges.

Frank and Strauss [9] have investigated graph models with dependent edges called Markov graphs. A simple undirected Markov graph is defined by the probability function

$$P(Z = z) = \exp(\lambda_0 + \lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3),$$

where $\lambda_0$ is a normalizing constant, $\lambda_1, \lambda_2, \lambda_3$ are three parameters governing the density, clustering and transitivity properties of the graph, and $x_1, x_2, x_3$ are three statistics given by the numbers of edges, two-paths and triangles in $z$; that is

$$x_1 = \sum_{i<j} z_{ij}, \qquad x_2 = \sum_i \sum_{j<k} z_{ij} z_{ik}, \qquad x_3 = \sum_{i<j<k} z_{ij} z_{jk} z_{ik}.$$

4. Clustering Individual Statistics
In order to decide whether or not there is a need for a model with individual heterogeneity, it is helpful to separate the individuals into clusters, that is subsets of individuals, such that individuals in the same subset are more similar than individuals in different subsets. After the clusters have been identified the approach is to specify distinct models within each cluster and between different clusters. The underlying idea is that it should be easier to find a useful network model under homogeneity assumptions. One very simple example of a model with individual heterogeneity is a Bernoulli block model with the blocks defined by the clusters. Even though this model might be unrealistic, it can be used to illustrate some of the problems involved in the search for clusters.

In order to illustrate the clustering of individual statistics we start with an undirected Bernoulli ($p$) graph $Z$ on $N = \{1, \ldots, n\}$. For the individual statistics we choose the vertex degrees $X_i = \sum_j Z_{ij}$, which are binomial $(n - 1, p)$. The $n$ vertices are clustered by similarity in degree. We denote by $F_k$ the frequency of vertices of degree $k$ for $k = 0, 1, \ldots, n - 1$. The expected frequency of vertices of degree $k$ is equal to

$$E F_k = n \binom{n-1}{k} p^k (1 - p)^{n-1-k}.$$
Since this expected frequency is a unimodal function of $k$, we expect to find only one cluster with any reasonable decision rule based on the frequencies $F_0, \ldots, F_{n-1}$, i.e. we expect to correctly accept homogeneity between individuals under this model. If we use a block model with a Bernoulli ($p_1$) block of size $n_1$ and a Bernoulli ($p_2$) block of size $n_2$, then the degrees are binomial $(n_1 - 1, p_1)$ for $n_1$ vertices and binomial $(n_2 - 1, p_2)$ for $n_2$ vertices. It follows that the expected frequency of vertices of degree $k$ is equal to

$$E F_k = \sum_{i=1}^{2} n_i \binom{n_i - 1}{k} p_i^k (1 - p_i)^{n_i - 1 - k}.$$
If the value of $(n_1 - 1)p_1$ is far from the value of $(n_2 - 1)p_2$, we can expect to find two clusters corresponding fairly well to the two blocks, but if $(n_1 - 1)p_1$ is close to $(n_2 - 1)p_2$, the identification of two clusters would require other methods. As a numerical illustration we consider the case of a two-block model with $n_1 = 10$, $n_2 = 20$, $p_1 = 0.3$, $p_2 = 0.5$. Here the expected degree distribution has a bimodal form, as shown in Figure 1. From this model we simulated 1000 networks. The smoothed average degree distribution did not deviate very much from the expected distribution, but for 38 of the networks, that is for about 4 per cent, the degree distribution after smoothing turned out to be unimodal. This percentage can be considered as an optimistic estimate of the risk of not identifying the need for a block model in this case.
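The expected degree frequencies in this numerical illustration can be evaluated directly from the binomial mixture. The sketch below is an illustration rather than the authors' program; the function name `expected_degree_frequencies` is hypothetical. It computes $E F_k$ for a block model without edges between blocks, with the block sizes and edge probabilities quoted in the text.

```python
from math import comb


def expected_degree_frequencies(blocks, max_degree):
    """Expected number of vertices of degree k, for k = 0, ..., max_degree.

    blocks is a list of (n_i, p_i) pairs; within block i the degrees are
    binomial (n_i - 1, p_i), and no edges are allowed between blocks.
    """
    frequencies = []
    for k in range(max_degree + 1):
        total = 0.0
        for n_i, p_i in blocks:
            if k <= n_i - 1:  # degree k is impossible in a block that is too small
                total += n_i * comb(n_i - 1, k) * p_i**k * (1 - p_i) ** (n_i - 1 - k)
        frequencies.append(total)
    return frequencies


ef = expected_degree_frequencies([(10, 0.3), (20, 0.5)], max_degree=19)
for k, value in enumerate(ef):
    print(k, round(value, 3))
```

The printed values show a local minimum between the two block-specific modes, which is the bimodal shape described in the text.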
[Figure 1 plot: Frequency (vertical axis) against Degree (horizontal axis, 0 to 17).]
Figure 1: Expected degree distribution in a two-block model.

In practice, we cannot very often expect to be content with Bernoulli block models, and the main advantage of the present discussion is that it can be easily extended to more interesting cases with more complicated data patterns. With several characteristics of the individuals available, the search for clusters can be based on various standard methods of cluster analysis. The efforts to separate the vertex set into clusters have of course to be balanced against the efforts needed to find appropriate models within and between the clusters. Should it, for
instance, be possible to find $k$ clusters described by $k - 1$ parameters and a set of simple two-parameter models within and between the clusters, then the total number of parameters would be equal to

$$2 \binom{k+1}{2} + k - 1,$$

and this should be compared to the possibility of finding an overall model with this number of parameters.
5. Cross-Classification of Dyads

In order to decide whether or not there is a need for a model with edge dependencies, we can cross-classify the dyads according to various statistics and count and analyze the numbers of dyads in different categories. Log-linear analysis can be applied to detect interesting interaction effects between the statistics used for the classification. Logit analysis can be used to analyze the edge proportions among the dyads in different categories. If the categories are defined in terms of statistics that measure the "local edge density", then the discovery of different edge proportions among the dyads in different categories indicates a need for a model with dependent edges. To a great extent the success of such approaches depends on the choice of appropriate statistics for the cross-classification of dyads. Data for a simple model can be used to illustrate the difficulties involved. More realistic illustrations are provided in Section 8.

Consider first a Bernoulli ($p$) model on $n$ vertices with the dyads classified according to a single statistic, say the number of two-paths between the two vertices in the dyad. Let $F_{kl}$ be the number of dyads having $k$ two-paths and $l$ edges for $k = 0, \ldots, n - 2$ and $l = 0, 1$. Set $F_k = F_{k0} + F_{k1}$ for the number of dyads having $k$ two-paths. The expected value of $F_k$ is

$$E F_k = \binom{n}{2} \binom{n-2}{k} p^{2k} (1 - p^2)^{n-2-k},$$

and the expected value of $F_{k1}$ is $E F_{k1} = p\, E F_k$. The proportion $F_{k1}/F_k$ of edges among the dyads with $k$ two-paths is roughly constant, which correctly indicates no need for a model with edge dependence.

Assume now instead that the model is a Bernoulli ($p$) triangle graph. Then for $k \ge 1$ the proportion $F_{k1}/F_k$ varies with $k$, and this strongly suggests the presence of edge dependence. In fact, this graph has very peculiar properties: It is transitive and it has no isolated edges; there are no end vertices, and every vertex of degree 2 is a corner of a triangle. If the model is modified so that instead of triangles we enter two-paths on the Bernoulli ($p$) selected triads, then the peculiarities of the graph will not be quite so revealing, but the method of examining conditional edge proportions $F_{k1}/F_k$ will still work.

The idea of considering the probability of an edge conditional on "local" properties can be modified and used to estimate graph models of exponential type. Consider an exponential model given by
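$$P(Z = z) = \exp(\lambda_0 + \lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_m x_m),$$

where $\lambda_0$ is a normalizing constant and $\lambda_1, \ldots, \lambda_m$ are parameters corresponding to graph statistics $x_1, \ldots, x_m$ evaluated at $z$. The probability of an edge at dyad $(i, j)$ conditional on all the rest of the graph $z$ can be calculated as

$$P(Z_{ij} = 1 \mid \text{rest of } z) = \frac{\exp(\lambda_1 x_{ij1} + \cdots + \lambda_m x_{ijm})}{1 + \exp(\lambda_1 x_{ij1} + \cdots + \lambda_m x_{ijm})},$$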
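and it follows that

$$\log \frac{P(Z_{ij} = 1 \mid \text{rest of } z)}{P(Z_{ij} = 0 \mid \text{rest of } z)} = \lambda_1 x_{ij1} + \cdots + \lambda_m x_{ijm},$$

where $x_{ijk}$ is the difference in statistic $x_k$ evaluated when $z$ has $z_{ij}$ substituted by 1 and 0, respectively. Thus, all the parameters $\lambda_1, \ldots, \lambda_m$ appear as coefficients in the logistic regression and can be estimated by standard methods. See [9] and the application in Section 8 below.

6. Time Series of Graph Statistics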
When a sequence of networks is available, the main question concerns how the networks are related. If the purpose of the analysis is to fit a non-stationary graph process, then a first approximation can be given by a sequence of independent random graphs governed by time dependent parameters. Frank [10] has elaborated on this idea. Any graph changes with time are considered as the effects of certain changes in the parameters governing the properties of the networks. Previous or present outcomes of the graph have no direct influence on the future outcomes. Should such influence be required, stochastic dependencies have to be introduced and a possible model is a Markov process with graph states.

Exploratory analysis of a sequence of networks can be based on various summary statistics that reflect time changes. For instance, it is natural to look for time changes in the frequency distributions of various vertex statistics, dyad statistics, triad statistics, and so forth. Time series analysis of such low order statistics can also be helpful to detect interesting patterns.

As a simple example, consider a random graph process for which the dyads are independent, identically distributed Markov chains with homogeneous transition probabilities

$$P(Z_{ij}(t + 1) = l \mid Z_{ij}(t) = k) = P_{kl}$$

for $k = 0, 1$ and $l = 0, 1$. If the evolution of the dyad processes is observable, then the transition probabilities can readily be estimated. If, however, only the "global" graph properties are available, say the total number of edges

$$R_t = \sum_{i<j} Z_{ij}(t),$$

then other methods are needed. Here $R_t$ is a Markov chain with $R_{t+1}$ conditional on $R_t = r$ given by the sum of two independent binomial variables with parameters $(r, P_{11})$ and $\big(\binom{n}{2} - r, P_{01}\big)$, respectively. It follows that the conditional edge density has an expected value

$$E\Big( R_{t+1} \Big/ \binom{n}{2} \,\Big|\, R_t = r \Big) = P_{01} + (P_{11} - P_{01})\, r \Big/ \binom{n}{2},$$
and the transition probabilities can be estimated by regression methods. A numerical experiment with $n = 20$, $P_{01} = 0.05$ and $P_{10} = 0.20$ was simulated, and from a stationary sequence of 100 graphs the regression estimates with their standard deviations turned out to be $P_{01} = 0.05 \pm 0.01$ and $P_{10} = 0.20 \pm 0.08$. A test of independence could be given as a test of whether the regression slope is zero. We omit the details.
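A numerical experiment of this kind can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function names are hypothetical, the chain is started from its stationary edge probability $P_{01}/(P_{01} + P_{10})$, and ordinary least squares is used for the regression of the edge density at time $t + 1$ on the edge density at time $t$. The intercept estimates $P_{01}$ and the slope estimates $P_{11} - P_{01} = 1 - P_{10} - P_{01}$.

```python
import random


def simulate_edge_counts(n, p01, p10, steps, rng):
    """Simulate independent two-state dyad chains; return edge counts R_t and the number of dyads."""
    m = n * (n - 1) // 2
    stationary = p01 / (p01 + p10)  # stationary probability of an edge
    state = [1 if rng.random() < stationary else 0 for _ in range(m)]
    counts = [sum(state)]
    for _ in range(steps - 1):
        new_state = []
        for s in state:
            u = rng.random()
            if s == 1:
                new_state.append(0 if u < p10 else 1)  # edge removed with probability P10
            else:
                new_state.append(1 if u < p01 else 0)  # edge created with probability P01
        state = new_state
        counts.append(sum(state))
    return counts, m


def least_squares_line(x, y):
    """Ordinary least squares fit y = a + b * x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def estimate_transitions(counts, m):
    """Regress density at t + 1 on density at t; return estimates of (P01, P10)."""
    x = [r / m for r in counts[:-1]]
    y = [r / m for r in counts[1:]]
    a, b = least_squares_line(x, y)
    return a, 1 - a - b


rng = random.Random(20)  # arbitrary seed, used only for reproducibility
counts, m = simulate_edge_counts(n=20, p01=0.05, p10=0.20, steps=100, rng=rng)
print(estimate_transitions(counts, m))
```

With a single series of 100 graphs the estimate of $P_{10}$ is noisy, in line with the larger standard deviation reported in the text; repeating the simulation with different seeds gives an impression of the spread.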
7. A Sequence of Social Networks

To get some flavour of real network data analysis we have reanalyzed sociometric network data provided by Professor Wolfgang Sodeur. The data were collected at the University of Wuppertal as part of a research project on social network analysis. Various research reports from this project are included in a general bibliography on social networks edited by Sodeur et al. [11]. The data set can briefly be described as consisting of an ordered sequence of six networks of social contacts between 208 freshmen at the University of Wuppertal. The students were interviewed about their preferences for contacts with their fellow students on six different occasions, namely at 2, 3, 4, 5, 7, and 9 weeks after the start of the semester. On each occasion, each student was asked to name the peers with whom he or she preferred to have contact. There was no restriction on the number of contacts, but most students named two or three peers. There were missing data due to the fact that some students refused to participate at all, some refused to reveal more than nicknames of their peers, and some did not report on all occasions. The present discussion will be confined to the $n = 179$ students for whom complete preference information was obtained on all six occasions, and no problems of missing data will be considered here.

To simplify matters and to improve the reliability of the contact information, we restrict attention to reciprocal preferences only. Thus, data consist of an ordered sequence of six undirected networks of reciprocal contacts that can be represented by symmetric matrices $z(t) = (z_{ij}(t))$ for $t = 2, 3, 4, 5, 7, 9$, where $z_{ii}(t) = 0$ and $z_{ij}(t) = z_{ji}(t)$ is 1 or 0 depending on whether or not students $i$ and $j$ report a reciprocal contact at time $t$. The students are labeled by integers $1, 2, \ldots, 179$, in the same manner on each occasion.
Table 1 reports some basic facts concerning number of contacts, number of triangles, number of connected components, and other characteristics of the networks on each occasion.

Table 1: Summary statistics of a social network of order 179 on six different occasions.
Statistic (by Time): Number of edges; Number of triangles; Number of isolated vertices; Number of isolated edges; Number of connected components of order 3 or more; Order of the largest connected component.

With as many as 179 vertices, it is not straightforward to draw a nice graphical representation of the networks. In order to draw a graph we found it convenient to consider the connected components of the union of the six networks, that is the components of the network defined by the matrix with elements

$$z_{ij} = \max_t z_{ij}(t).$$

This amounts to saying that two students $i$ and $j$ belong to the same component if and only if there is an ordered sequence $i_0, \ldots, i_m$ of students such that $i_0 = i$ and $i_m = j$ with $i_{k-1}$ and $i_k$ having a reciprocal contact at least once for every $k = 1, \ldots, m$. According to the network given by matrix $z$ there are 54 isolated vertices, 12 isolated edges, 2 connected components of order 3, 2 of order 4, and 4 of larger orders, namely 5, 10, 17 and 45. Now we can restrict attention to the vertex sets of these connected components separately, and consider the corresponding subsets of the original networks on each occasion. In this way we do not have to consider more than 45 vertices simultaneously, and only 3 components have more than 5 vertices. Figure 2 shows the connected component of order 17 in the network given by $z$. Figure 3 shows the corresponding networks on each occasion. We notice that the sequence of networks consists of many small components and few large ones, and there is no obvious pattern in the way components on one occasion are related to components on another occasion.
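The decomposition into components of the union network is easy to compute. The sketch below uses small made-up matrices, not the Wuppertal data, and the function names `adjacency`, `union_network` and `connected_components` are hypothetical. It forms the elementwise maximum over occasions and finds the connected components by a depth-first search.

```python
def adjacency(n, edges):
    """Build a symmetric 0/1 adjacency matrix on n vertices from a list of edges."""
    z = [[0] * n for _ in range(n)]
    for i, j in edges:
        z[i][j] = z[j][i] = 1
    return z


def union_network(networks):
    """Elementwise maximum over occasions: z_ij = max over t of z_ij(t)."""
    n = len(networks[0])
    return [[max(z[i][j] for z in networks) for j in range(n)] for i in range(n)]


def connected_components(z):
    """Return the connected components of z as sorted lists of vertex labels."""
    n = len(z)
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if z[i][j] and not seen[j]:
                    seen[j] = True
                    stack.append(j)
        components.append(sorted(component))
    return components


# Two toy occasions on six vertices (illustrative data only).
z_a = adjacency(6, [(0, 1)])
z_b = adjacency(6, [(1, 2), (3, 4)])
components = connected_components(union_network([z_a, z_b]))
print(sorted(len(c) for c in components))
```

Restricting each occasion's matrix to the vertex set of one component then gives the per-occasion subnetworks of the kind shown in the figures.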
Figure 2: A connected component of order 17 in the network obtained by taking the union over six occasions.

Some of the exploratory methods discussed above have been applied for a more detailed analysis of the networks. We illustrate these methods in the next section.
8. Statistical Analysis of the Social Networks
A natural way of summarizing the six social networks is by their distributions of vertex degrees. The degree distributions are given in Table 2, and it is evident that there is no significant difference between the distributions on the six different occasions. Under the assumption of stationarity and independence between the six networks, the expected degree distribution can be estimated by the average frequencies in Table 2. The expected proportion of edges is given by 0.0037. For a Bernoulli graph model, the expected number of isolated vertices should be close to 93 and the expected number of end vertices should be close to 61; both these numbers are far from the values given in Table 2. These facts and many others provide clear evidence that a Bernoulli model is not adequate here.

Figure 3: The network on each occasion corresponding to the connected component given in Figure 2. Panels: Time 2, Time 3, Time 4, Time 5, Time 7, Time 9.

In order to get some guidance as to what specific features are present here, we can look at the distributions of various dyad statistics. Table 3 shows the distributions of the dyads according to three statistics, namely the edge indicator, the number of edges adjacent to only one of the two vertices in the dyad, and the number of vertices adjacent to both the vertices in the dyad. For brevity, we refer to these statistics as the edge indicator, the degree, and the number of two-paths of the dyad, and they are denoted by

$$z_{ij}, \qquad d_{ij} = \sum_{k \ne i, j} (z_{ik} + z_{jk}), \qquad s_{ij} = \sum_k z_{ik} z_{jk}.$$
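The Bernoulli benchmark figures quoted in this section can be checked by a short calculation. In a Bernoulli ($p$) graph on $n$ vertices each degree is binomial $(n - 1, p)$, so the expected numbers of isolated vertices and end vertices are $n(1 - p)^{n-1}$ and $n(n - 1)p(1 - p)^{n-2}$. The sketch below evaluates these with $n = 179$ and the rounded edge proportion $p = 0.0037$ from the text; the function names are hypothetical.

```python
def expected_isolated_vertices(n, p):
    """Expected number of vertices of degree 0 in a Bernoulli (p) graph on n vertices."""
    return n * (1 - p) ** (n - 1)


def expected_end_vertices(n, p):
    """Expected number of vertices of degree 1 in a Bernoulli (p) graph on n vertices."""
    return n * (n - 1) * p * (1 - p) ** (n - 2)


n, p = 179, 0.0037
print(round(expected_isolated_vertices(n, p), 1))
print(round(expected_end_vertices(n, p), 1))
```

Both values agree with the rounded figures 93 and 61 given in the text.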
Table 2: Degree distributions of a social network of order 179 on six different occasions.

Degree   Time 2   Time 3   Time 4   Time 5
0          95      102      109      104
1          55       39       32       44
2          19       26       23       20
3           9        9       11       10
4           1        3        3        1
Table 4 yields the proportions of dyads with edges among the dyads with fixed degree and fixed number of two-paths, that is the estimated edge probabilities conditional on these numbers. Obviously there is strong evidence that edges occur more frequently among vertices with many neighbors than among vertices with few neighbors. An interesting correspondence between these findings and a Markov graph model can be noticed. Consider an exponential graph model given by

$$P(Z = z) = \exp(\lambda_0 + \lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3),$$

where

$$x_1 = \sum_{i<j} z_{ij}, \qquad x_2 = \sum_i \sum_{j<k} z_{ij} z_{ik}, \qquad x_3 = \sum_{i<j<k} z_{ij} z_{jk} z_{ik}$$

are the numbers of edges, two-paths and triangles of the graph. Frank and Strauss [9] and Frank [10] have shown how logistic regression can be used to estimate this model. The interesting connection between the cross-classification in Table 4 and the parameters of this model is that the parameters can be estimated as coefficients in a logistic regression

$$\log \frac{P(Z_{ij} = 1 \mid \text{rest of } z)}{P(Z_{ij} = 0 \mid \text{rest of } z)} = \lambda_1 + \lambda_2 d_{ij} + \lambda_3 s_{ij}.$$

If $N_z(d, s)$ is the number of dyads $(i, j)$ with $i < j$, $d_{ij} = d$, $s_{ij} = s$ and $z_{ij} = z$,
Table 3: Dyad distributions of a social network on six different occasions according to degree (d), number of two-paths (s), and edge occurrence (z). Columns: d, s, z, and the dyad counts at Time 2, 3, 4, 5, 7 and 9.
or the weighted sum of squares
can be minimized with respect to $\lambda_1, \lambda_2, \lambda_3$ by using logistic regression options in standard statistical packages. Applying this approach to the present sequence of social networks leads to the results reported in Table 5. Thus we find a fairly good fit with a sequence of Markov graphs having time dependent parameters for density, clustering and transitivity.
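The data preparation behind such a logistic regression can be sketched as follows. For the Markov graph model with edges, two-paths and triangles, toggling the edge at dyad $(i, j)$ changes the three statistics by 1, by the dyad degree $d_{ij}$ and by the number of two-paths $s_{ij}$, so these are the explanatory variables. The code below is an illustration on a toy graph with hypothetical function names, not the authors' program; it builds the cross-classification counts and the conditional edge proportions, which can then be passed to the logistic regression routine of any statistical package.

```python
from collections import Counter
from itertools import combinations


def markov_statistics(z):
    """Return (edges, two-paths, triangles) of an undirected 0/1 adjacency matrix."""
    n = len(z)
    degrees = [sum(row) for row in z]
    edges = sum(z[i][j] for i, j in combinations(range(n), 2))
    two_paths = sum(d * (d - 1) // 2 for d in degrees)
    triangles = sum(z[i][j] * z[j][k] * z[i][k] for i, j, k in combinations(range(n), 3))
    return edges, two_paths, triangles


def dyad_statistics(z):
    """Return a list of (i, j, z_ij, d_ij, s_ij) for all dyads with i < j."""
    n = len(z)
    degrees = [sum(row) for row in z]
    rows = []
    for i, j in combinations(range(n), 2):
        d = degrees[i] + degrees[j] - 2 * z[i][j]  # edges at exactly one of i and j
        s = sum(z[i][k] * z[j][k] for k in range(n))  # common neighbors of i and j
        rows.append((i, j, z[i][j], d, s))
    return rows


def cross_classify(z):
    """Count dyads by (d, s, edge indicator)."""
    return Counter((d, s, e) for _, _, e, d, s in dyad_statistics(z))


def edge_proportions(table):
    """Proportion of dyads with an edge within each (d, s) category."""
    totals, with_edge = Counter(), Counter()
    for (d, s, e), count in table.items():
        totals[(d, s)] += count
        if e == 1:
            with_edge[(d, s)] += count
    return {key: with_edge[key] / totals[key] for key in totals}


# Toy graph on five vertices (illustrative data only).
z = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
for key, proportion in sorted(edge_proportions(cross_classify(z)).items()):
    print(key, round(proportion, 2))
```

Grouping the dyads by $(d, s)$ keeps the regression input small even for large graphs, since the number of distinct categories is typically far smaller than the number of dyads.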
Table 4: Edge proportions of a social network on six different occasions according to degree (d) and number of two-paths (s).
Table 5: Fitting Markov graph parameters for density ($\lambda_1$), clustering ($\lambda_2$) and transitivity ($\lambda_3$) and goodness of fit measures on six different occasions.
9. Comments on Modeling
The lack of special software for the statistical analysis of network data should not be considered as an obstacle to exploratory graph analysis. Most standard multivariate statistical packages contain several procedures that are sufficient to undertake an exploratory analysis of network data. Cluster analysis of vertex statistics, cross-classification analysis of dyad statistics, logistic regression analysis and time series analysis of various graph statistics have been discussed here and have been intended as illustrations of the possibility of applying standard methods in order to gain insight and support for graph modeling.

There are a few basic approaches that can be varied according to the needs of the application considered and the intuition and imagination of the statistician. These approaches comprise the use of cluster analysis of vertex statistics, dyad statistics and other low order statistics in order to detect inhomogeneities, the use of cross-classifications of low order statistics and log-linear or logit analysis to detect dependencies within a graph, and the use of time series analysis of low or high order statistics in order to detect dependencies between distinct graphs.

The main obstacles to network modeling are the difficulties involved in the interpretation of the various trends and tendencies that become evident from the exploratory analysis and the lack of knowledge about how such properties can be modeled by specific multiparametric random graphs. There should be more research devoted to the synthesis of random graph models with specific properties. For instance, in order to gain insight into the formation and development of human contact networks, dynamic models should be specified by setting up transition equations and deriving equilibrium solutions. Good references for this type of work are [12]-[15].
Statistical graph theory developed during the 1970s and 1980s, and the emphasis in the literature is on different themes. Probably the main contributions have come from the development of log-linear models in the analysis of categorical data. Research work related to Markov fields and conditional independence in families of random variables has also entered this development. A selection of modern references is [16]-[20]. Other approaches are related to developments in cluster analysis: [21]-[25]. There are also several contributions to random graph theory and graph sampling designs that are not directly related to exploratory statistical modeling but of relevance to the development of probability models and statistical inference in graphs. A few examples are contained in [26]-[31]. Statistical graph modeling is increasingly recognized as a field of importance for statistical applications. It also contributes to theoretical statistics and is a source of much enjoyable research.
References

[1] P. Holland and S. Leinhardt; An exponential family of probability densities for directed graphs, Journal of the American Statistical Association, 76, 33-65 (1981).
[2] S. Fienberg and S. Wasserman; Categorical data analysis of simple sociometric relations, Sociological Methodology, S. Leinhardt (editor), Jossey-Bass, San Francisco, 156-182 (1981).
[3] S. Fienberg, M. Meyer and S. Wasserman; Statistical analysis of multiple sociometric relations, Journal of the American Statistical Association, 80, 51-67 (1985).
[4] O. Frank, M. Hallinan and K. Nowicki; Clustering of dyad distributions as a tool in network modeling, Journal of Mathematical Sociology, 11, 47-64 (1985).
[5] O. Frank, H. Komanska and K. Widaman; Cluster analysis of dyad distributions in networks, Journal of Classification, 2, 219-238 (1985).
[6] O. Frank; Growing classification and regression trees on network data, Classification as a Tool of Research, W. Gaul and M. Schader (editors), North-Holland, Amsterdam, 137-143 (1986).
[7] O. Frank; Multiple relation data analyses, Operations Research Proceedings 1986, H. Isermann et al. (editors), Springer-Verlag, Berlin Heidelberg, 455-460 (1987).
[8] D. Knoke and J. Kuklinski; Network Analysis, Sage Publications, Beverly Hills (1982).
[9] O. Frank and D. Strauss; Markov graphs, Journal of the American Statistical Association, 81, 832-842 (1986).
[10] O. Frank; Statistical analysis of change in networks, Statistica Neerlandica, 45 (1991).
[11] W. Sodeur et al.; Bibliographie zum Projekt Analyse sozialer Netzwerke, Gesamthochschule Wuppertal (1978).
[12] U. Grenander; Pattern Synthesis, Lectures in Pattern Theory, Vol. I, Springer Verlag, New York (1976).
[13] U. Grenander; Pattern Analysis, Lectures in Pattern Theory, Vol. II, Springer Verlag, New York (1978).
[14] U. Grenander; Regular Structures, Lectures in Pattern Theory, Vol. III, Springer Verlag, New York (1981).
[15] P. Whittle; Systems in Stochastic Equilibrium, John Wiley & Sons, Chichester (1986).
[16] Y. Wang and G. Wong; Stochastic block models for directed graphs, Journal of the American Statistical Association, 82, 8-19 (1987).
[17] G. Wong; Bayesian models for directed graphs, Journal of the American Statistical Association, 82, 140-148 (1987).
[18] S. Wasserman and D. Iacobucci; Sequential social network data, Psychometrika, 53, 261-285 (1988).
[19] T. Snijders; Testing for change in a digraph at two time points, Social Networks (to appear).
[20] J. Whittaker; Graphical Models in Applied Multivariate Statistics, John Wiley & Sons, Chichester (1990).
[21] J. Hartigan; Asymptotic distributions for clustering criteria, The Annals of Statistics, 6, 117-131 (1978).
[22] O. Frank and F. Harary; Cluster inference by using transitivity indices in empirical graphs, Journal of the American Statistical Association, 77, 835-840 (1982).
[23] J. Hartigan; Statistical theory in clustering, Journal of Classification, 2, 63-76 (1985).
[24] E. Godehardt; Explorative mathematische Modelle in der Medizin, Nichtlineare Regression und Numerische Klassifikation, Institut für Medizinische Dokumentation und Statistik der Universität, Köln.
[25] E. Godehardt; Graphs as Structural Models, F. Vieweg und Sohn Verlagsgesellschaft, Braunschweig (1990).
[26] T. Snijders and F. Stokman; Extensions of triad counts to networks with different subsets of points and testing underlying random graph distributions, Social Networks, 9, 249-275 (1987).
[27] O. Frank; Random sampling and social networks: A survey of various approaches, Mathématiques, Informatique et Sciences humaines, 26:104, 19-33 (1988).
[28] K. Nowicki; Asymptotic Poisson distributions with applications to statistical analysis of graphs, Advances in Applied Probability, 20, 315-330 (1988).
[29] S. Berg and L. Mutafchiev; Random mappings with an attracting center: Lagrangian distributions and a regression function, Journal of Applied Probability, 27, 622-626 (1990).
[30] K. Nowicki; Asymptotic distributions of subgraph counts in colored Bernoulli graphs, Random Graphs '87, M. Karoński, A. Ruciński and J. Jaworski (editors), John Wiley & Sons, New York, 203-221 (1990).
[31] S. Janson and K. Nowicki; The asymptotic distributions of generalized U-statistics with applications to random graphs, Probability Theory and Related Fields (to appear).