SOCIAL NETWORKS ELSEVIER
Social Networks 18 (1996) 315-318
Analysis of discrete structure: an overview
Phillip Bonacich
Department of Sociology, University of California, Los Angeles, CA 90024, USA
1. Introduction
The papers in this issue of Social Networks represent a distinct point of view outside the methodological framework in which most American social scientists operate. In this overview I wish to highlight the ways in which this analysis is different. Although these papers are appearing in a networks journal, they develop a set of data analysis methods that are quite general and can be applied to almost any data structure. I will focus first on the general data analysis approach exhibited by these papers.
0378-8733/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved SSDI 0378-8733(95)00278-2
2. Boolean analysis
What has happened to 'necessary' and 'sufficient' conditions as data analysis concepts? We learned them in our elementary methods classes, but they have almost disappeared from the set of tools that we use as practicing social scientists. They have been replaced by regression and its variants or by log-linear models for categorical data. The correlation coefficient r_xy is a symmetric measure of linear association: r_xy = r_yx. The log-linear measure of association between x and y is also symmetric. However, if x is a necessary (sufficient) condition for y, y is not required to be a necessary (sufficient) condition for x.
Consider Table 1, where x is the independent variable. This table shows that (with one exception) condition x is sufficient but not necessary for y to occur. Under a log-linear model this asymmetry disappears from the association term: the relationship coefficient is 3.2, and the asymmetry is absorbed entirely by the differing coefficients for the marginals of x and y: 3.2 for y and 0.45 for x.
Table 1
             y    Not y
x          100        1
Not x       50       50
There are occasions in which it is clearly useful to distinguish necessary or sufficient conditions, when forming Guttman scales, for example. But the distinction could profitably be reintroduced into our regular arsenal. For the past few years I have been working on the relation between network position and the exercise of power. Without much reflection, I have been assuming that occupying a favorable position in a network is a necessary but not sufficient condition for the exercise of power. Other conditions must be met as well: one must behave in an exploitative ('rational') manner; one must understand the rules of the experimental game. If it were shown that occupying a favorable position was sufficient but not necessary for the exercise of power, I think I would be more inclined to look at the alternative sufficient conditions that were powerful enough to completely overwhelm structural disadvantages.
Correlational analysis implies a larger structure within which a particular correlation is embedded and within which it should be interpreted. For correlation-based analysis, this is the covariance matrix. Similarly, as the papers in this volume ably demonstrate, there is a larger structure within which 'necessary or sufficient conditions' are embedded: Boolean algebra and variations of Boolean algebra. 'Condition A is necessary for B to occur' is equivalent to 'the set of conditions within which A occurs contains the conditions under which B occurs', or A ⊇ B. Similarly, if A is a sufficient condition for B, then A → B. Statements about truth can be translated into statements about sets, and Boolean algebra is the language of sets.
The basic operators in Boolean algebra are the union (∪), the intersection (∩) and the complement of a set. A ⊆ B is the same as A ∩ B = A, A ∪ B = B, and A → B. With these operators, more complex conditions can be expressed. For example, (A ∩ B) ∪ C → D and (A ∩ B) ∪ C ∪ D = D both say that C, and the joint occurrence of A and B, are each sufficient to produce D.
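
The contrast drawn around Table 1 can be checked with a few lines of arithmetic. The Python sketch below rests on an assumption the text does not state: that the quoted coefficients are the multiplicative parameters of a saturated log-linear model under effect coding. Under that assumption the counts in Table 1 reproduce the values 3.2, 3.2 and 0.45. All function and variable names are illustrative only.

```python
# Table 1 counts, keyed by (x present?, y present?).
table = {
    (True, True): 100,   # x and y
    (True, False): 1,    # x and not y
    (False, True): 50,   # not x and y
    (False, False): 50,  # not x and not y
}


def multiplicative_parameters(t):
    """Association and marginal parameters of a saturated 2x2 log-linear model.

    Assumption: effect coding, so each parameter is a fourth root of a
    ratio of cell products.
    """
    a, b = t[(True, True)], t[(True, False)]
    c, d = t[(False, True)], t[(False, False)]
    association = ((a * d) / (b * c)) ** 0.25
    marginal_x = ((a * b) / (c * d)) ** 0.25
    marginal_y = ((a * c) / (b * d)) ** 0.25
    return association, marginal_x, marginal_y


def transpose(t):
    """Swap the roles of x and y."""
    return {(y, x): n for (x, y), n in t.items()}


def sufficiency_exceptions(t):
    """Cases that contradict 'x is sufficient for y': x occurs without y."""
    return t[(True, False)]


def necessity_exceptions(t):
    """Cases that contradict 'x is necessary for y': y occurs without x."""
    return t[(False, True)]


association, marginal_x, marginal_y = multiplicative_parameters(table)
swapped_association = multiplicative_parameters(transpose(table))[0]

print(round(association, 1), round(marginal_y, 1), round(marginal_x, 2))
print(sufficiency_exceptions(table), necessity_exceptions(table))
```

The association parameter is unchanged when x and y are swapped, whereas the two exception counts (1 against sufficiency, 50 against necessity) trade places. This is the asymmetry that the containment statements A ⊆ B and A ⊇ B keep visible.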
Moreover, the rules of Boolean algebra (for example, A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)) can be used to combine many different observations into fewer, more abstract statements. The paper by Duquenne and Lebeaux, 'Boolean analysis of questionnaire data', illustrates the logic of this approach admirably. However, the logic of implications admits no exceptions. A → B means that all A conditions are B conditions without exception. White's paper, 'Constructing lattices for dichotomous data with noise and missing values', deals with the statistical problems of drawing implications where there are exceptions. This paper describes 'Statistical Entailment Analysis', an approach he has developed to separate the regularities from the error.
3. Galois lattices
Galois lattices are designed to simultaneously analyze two-mode binary data: data tables in which the rows and columns represent different universes of objects
and the table entries indicate a connection or the absence of a connection. For example, the rows could be people and the columns attributes which they possess or fail to possess, or the rows could represent people and the columns groups to which they belong.
Boolean algebras are 'lattices' in a mathematical sense. Every pair of objects (subsets) has a least upper bound (their union) and a greatest lower bound (their intersection). A Boolean lattice can be diagrammed so that more inclusive sets are higher and a line joins a higher and a lower set when the higher one contains one more element than the lower. The resulting diagram resembles a traditional lattice design.
Whereas Boolean lattices could be used to represent the relations between subsets formed by one classification (for example, patterns among interlocking directorates of corporations), Galois lattices represent relations between two modes simultaneously. If the modes were, for example, people and organizations, the elements of a Galois lattice would be sets of people who belonged to the same set of organizations. Read from one orientation, the Galois lattice represents relations between sets of people with similar membership patterns, their intersections (more accurately 'meets') and their 'joins'. Turned upside down, the diagram represents relations between sets of organizations with similar membership patterns, their meets and joins. The diagram can be interpreted through one's knowledge of the people (the rows of the data table) or the groups to which they belong (or traits they possess).
The Galois lattice representation is a simplification because its units are clusters of people-and-groups (or people-and-traits). These simultaneous clusterings of people and traits are called concepts by Duquenne. But the Galois lattice does much more than cluster.
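
The clusters described here, and the ordering among them, can be computed directly for a small table. The Python sketch below uses a hypothetical membership table invented for illustration; it is not data from any paper in this issue. Each concept pairs a set of people with the full set of organizations they all share. The sets of organizations are obtained as intersections of rows, a simple brute-force approach that is practical only for small tables and is not presented as the procedure used in the papers.

```python
# Hypothetical two-mode data: people (rows) and the organizations they belong to.
membership = {
    "Ann": {"Club", "Union"},
    "Bob": {"Club"},
    "Cat": {"Club", "Union", "Board"},
    "Dan": {"Board"},
}


def concepts(context):
    """Return every (people, organizations) pair that forms a concept."""
    all_orgs = set().union(*context.values())
    # Every shared-organization set is an intersection of rows;
    # the intersection of no rows at all is the full set of organizations.
    shared_sets = {frozenset(all_orgs)}
    for row in context.values():
        shared_sets |= {frozenset(row) & s for s in shared_sets}
    result = []
    for orgs in shared_sets:
        people = frozenset(p for p, row in context.items() if orgs <= row)
        result.append((people, orgs))
    return sorted(result, key=lambda c: (len(c[0]), sorted(c[0])))


def covering_pairs(concept_list):
    """Pairs of people-sets joined by a line in the lattice diagram."""
    people_sets = [people for people, _ in concept_list]
    pairs = []
    for lower in people_sets:
        for upper in people_sets:
            between = any(lower < mid < upper for mid in people_sets)
            if lower < upper and not between:
                pairs.append((lower, upper))
    return pairs


lattice = concepts(membership)
lines = covering_pairs(lattice)

for people, orgs in lattice:
    print(sorted(people), sorted(orgs))
```

Reading the output by its sets of people gives one orientation of the diagram; reading it by the sets of organizations gives the inverted one, since a larger set of people always shares a smaller set of organizations.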
It also shows, in a single diagram, how these clusters are related: which clusters link other clusters, and the hierarchies among them.
'On lattice approximations: syntactic aspects', by Duquenne, develops the logic of Galois lattices. 'Lattice analysis and the representation of handicap associations', also by Duquenne, uses Galois lattices on empirical data to describe the relations between different handicaps. But two of the papers, I believe, use Galois lattices to maximum effectiveness. Galois lattices show the relations between clusters consisting of both people and traits. Thus, their interpretation is richest when one can interpret both. In 'Actor and event orderings across time: lattice representation and Boolean analysis of the political disputes in Chen Village, China', Schweizer interprets the effects of recent historical events on the inhabitants of a village in China, using information about the events and information on the people affected by those events. In 'Cliques, Galois lattices, and the structure of human social groups', Freeman re-examines the well-known Roethlisberger and Dickson Bank Wiring Room data. I would suggest that this paper be read first; it provides some of the background necessary to understand the other papers.
Finally, White and Jorion, in 'Kinship networks and discrete structure theory: Applications and implications', present an algebra for kinship relations. In conventional kinship diagrams there are two kinds of relations between people, 'married
to' and 'parent of'. White and Jorion innovate by using diagrams in which the fundamental unit is the couple and individuals are the links between couples. Having only two relations, this structure is mathematically simpler. It also makes no assumptions about the differing meaning of marriage in different societies. White and Jorion then develop mathematically the properties of these structures.
What all these papers share, and what makes them exciting, is that they develop or apply the logic of algebra to data structures. There have been other major attempts to use algebra to analyze data, but they were all specifically tied to network data. Blockmodeling's use of the algebra of semigroups depends on the 'axiom of quality', that paths of whatever length and type that connect the same pairs of individuals can usefully be considered equal. There are very few situations in which this has proved to be true. Borgatti and Everett's work on structural equivalence is very important, but little use is made of the resulting algebras themselves to describe empirical regularities. For example, their best-known work is on 'automorphic equivalence'. Two positions in a graph are automorphically equivalent if there is a permutation π of the vertices that maps one position onto the other and is such that aRb if and only if π(a)Rπ(b). This is very useful and very subtle, but no use is made of the permutation π or of the group it generates.
The techniques described in these papers transcend network data, although they can profitably be used on network data (see the Freeman paper). Most exciting to me is that the algebras can be used to describe the data and to frame laws, especially structural ones. Of course, the tool-making and refining is clearly not at an end. The strictly statistical problems, the separation of real trends from error, are treated very meagerly in many of the papers. Statistical approaches integrally connected to the techniques remain to be further developed.
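
The definition of automorphic equivalence given above can be made concrete by brute force on a very small graph. The Python sketch below uses a hypothetical four-vertex path; the graph and all names are invented for illustration, and the permutation in the definition is called `pi` in the code. Trying every permutation is feasible only for a handful of vertices, so this is a demonstration of the definition and not a practical algorithm.

```python
from itertools import permutations

# Hypothetical undirected graph: the path a - b - c - d.
vertices = ["a", "b", "c", "d"]
edges = {("a", "b"), ("b", "c"), ("c", "d")}
# Store the relation R symmetrically so that aRb holds in both directions.
relation = edges | {(v, u) for u, v in edges}


def automorphisms(nodes, rel):
    """All permutations pi with aRb if and only if pi(a)R pi(b)."""
    found = []
    for image in permutations(nodes):
        pi = dict(zip(nodes, image))
        preserved = all(
            ((pi[a], pi[b]) in rel) == ((a, b) in rel)
            for a in nodes
            for b in nodes
        )
        if preserved:
            found.append(pi)
    return found


def equivalence_classes(nodes, rel):
    """Group vertices that some automorphism maps onto one another."""
    autos = automorphisms(nodes, rel)
    classes = {frozenset(pi[v] for pi in autos) for v in nodes}
    return sorted(sorted(c) for c in classes)


autos = automorphisms(vertices, relation)
classes = equivalence_classes(vertices, relation)
print(len(autos), classes)
```

On this path the only structure-preserving permutations are the identity and the end-to-end reversal, so the two end positions form one class and the two middle positions another.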
I would also like to see these techniques used on the same data with their natural competitors, log-linear models and correspondence analysis. Nevertheless, I think that these papers represent a provocative development.