Modeling social network processes using constrained flow representations


Social Networks 6 (1984) 259-292
North-Holland

MODELING SOCIAL NETWORK PROCESSES USING CONSTRAINED FLOW REPRESENTATIONS *

Wayne W. ZACHARY **

Drexel University

1. Introduction

The analysis of social network data has long relied heavily on models and results from mathematical graph theory, and somewhat less heavily (but perhaps more productively) on the various algebras that can be developed from graph-like models. The interest in structural aspects of human relations has motivated the use of these analytic tools, but their use is a two-edged sword, since graph theory and algebraic approaches strongly predispose one toward structural concerns. Nonetheless, the basic social network concept of a set of entities interconnected by a set of relationships is sufficiently general and powerful that many other approaches can be used to formalize and analyze data so collected. This paper concerns one such approach, that of network flow models. These models should be of particular interest to the social network community because they facilitate analysis of the processes which occur within the networks under observation. In addition, network flow models permit many new forms of analysis of the structure of social networks by allowing consideration of the structure of processes extant within them.

The intent of this paper is to present an overview of network flow models in the context of a non-trivial example of a social process (group fission). The central ideas are presented through the definitions and theorems invoked within the course of the analysis of the example. All theorems are proved, but a full and detailed examination of the proofs is not necessary to an understanding of the applicability, conceptual organization, and underlying logic of network flow models. Indeed, the mathematically timid reader should not be deterred by the widespread use of formal notation; it is included here to make the presentation meaningful to those with the mathematical background and interest to appreciate it, not to make it inaccessible to those with other interests.

* I would like to thank Henry Selby of the University of Texas and Marie Wurster of Temple University, who made helpful comments on earlier drafts of this manuscript. Thanks also go to Ms. Patsy Couch for her heroic efforts in typing this difficult manuscript and Ms. Amy Polock for designing the Figures. Of course, all remaining errors and problems are my doing alone. My thanks also go to the National Science Foundation and Analytics, Inc., which supported me during various periods in which this paper was being written.

** College of Information Studies, Drexel University, Philadelphia, PA 19104, U.S.A.

0378-8733/84/$3.00 © 1984, Elsevier Science Publishers B.V. (North-Holland)

2. Basic terminology

The basic terminology and formalism of network flow theory is almost identical to that of graph theory. A network is defined as an ordered pair (N, E), where N is a finite set containing n elements and E is a subset of $N \times N$. The elements of N, written $n_i$, are referred to by the undefined term node, and the elements of E, alternately written $e_{ij}$ or $(n_i, n_j)$, are referred to by the undefined term edge.¹ For maximum generality, networks are defined here to be directed. That is, for all $n_i, n_j$ in N, there are two possible edges, $(n_i, n_j)$ and $(n_j, n_i)$. One restriction on E not usually present in graphs is that no node in a network can be connected to itself (i.e., $(n_i, n_i) \notin E$, $\forall n_i \in N$). In networks (as in graphs) nodes are interpreted as individuals and edges as relationships among these individuals. It is useful in thinking about an edge $e_{ij} \in E$ to express it as "node i being connected to node j," rather than as "node i being related to node j," as is often the case in graph theory. The motivation for this minor semantic variation is that it helps keep in mind the fundamentally concrete nature of the edges in a network as opposed to the more conceptually abstract relations in a graph. In a network, "something" will flow through the network, and this flow requires some kind of connection between the entities (the nodes) in the network. This notion of connectedness, it should be stressed, is merely a conceptual crutch; it has no formal meaning and no relationship to the more precise technical meaning of connectedness in graph theory.

¹ The terminology here is a combination of the two forms commonly used. The term "vertex" is used by many authors in place of "node" and the term "arc" is similarly used in place of "edge." To the extent that the terms edge/vertex and arc/node are usually used in conjunction, the present usage of edge/node is a slight abuse of the current (confusing) terminology.

A simple undirected network is depicted in Fig. 1. Two additional concepts that are important in network theory as well as graph theory are paths and cycles. A path is simply a connected sequence of edges in a network, and a cycle is a special kind of path, one that begins and ends in the same node. More formally, these are defined as follows:

Definition. A path from $n_i$ to $n_j$ in (N, E) is a set of ordered pairs $\{(n_i, n_h), (n_h, n_k), (n_k, n_m), \ldots, (n_p, n_j)\}$, each of which is an edge in E.

Definition. A cycle at node $n_i$ in the network (N, E) is a path from $n_i$ to $n_j$ where $n_j = n_i$.
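The two definitions can be checked mechanically. The following is a minimal Python sketch, not part of the original paper: it assumes the "set of ordered pairs" is supplied as an ordered list (so that consecutive pairs can be matched), and the helper names `is_path` and `is_cycle` are hypothetical.

```python
from typing import Hashable, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def is_path(pairs: Sequence[Edge], edges: Set[Edge], start: Hashable, end: Hashable) -> bool:
    """True if `pairs` is a chain of edges of E running from `start` to `end`."""
    if not pairs or pairs[0][0] != start or pairs[-1][1] != end:
        return False
    # Consecutive pairs must share a node: (a, b) followed by (b, c).
    for (_, b), (c, _) in zip(pairs, pairs[1:]):
        if b != c:
            return False
    # Every ordered pair must actually be an edge of the network.
    return all(pair in edges for pair in pairs)


def is_cycle(pairs: Sequence[Edge], edges: Set[Edge], node: Hashable) -> bool:
    """A cycle at `node` is a path from `node` back to `node`."""
    return is_path(pairs, edges, node, node)


E = {(1, 2), (2, 3), (3, 1)}
print(is_path([(1, 2), (2, 3)], E, 1, 3))        # True
print(is_cycle([(1, 2), (2, 3), (3, 1)], E, 1))  # True
```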

There are two basic directions in which further mathematical elaboration is possible on this basic (N, E) structure. On the one hand, it is possible to examine in detail the properties and pattern of relationships imposed on N by E. This is the structuralist direction taken by most graph theoretic and algebraic applications over the last thirty years (e.g., Hage 1979). On the other hand is the consideration of the kinds of processes that can occur within and across the network. This leads directly to the study of flows in networks, the primary concern here.

Figure 1. A simple undirected network.

3. Processes, constraints, and network flows

The application of networks as models of social processes may not be immediately evident. No allowance for temporal variance was made in the basic definition of a network, nor was any provision given for time-dependent relations to be added to the basic structure. Since process is by definition time-based, and network structure was defined as non-varying, networks would thus appear amenable only to structural analyses. However, structural variation over time is only one (relatively global) kind of process. Within any given structure, many processes occur that do not directly involve changes in this structure. It is this kind of connection between structure and process that is of interest here: the setting that the structure provides for process. Any process observed in the natural world (and particularly those in the biological/social realm) can be partitioned into two essential aspects: the underlying mechanism and the environmental constraints. Such a partition allows the process to be approached as if its dynamics were "driven" by the underlying mechanism, and its specific sequence of behaviors was shaped by the environment in which that mechanism was operating. The notion of constraint here is simply that the "setting" of a process contributes largely (but not deterministically) to the direction that a process takes. This contribution is usually of a limiting nature; thus a constraint specifies that some relationship or quantity cannot be violated, thereby limiting or constraining what the "mechanism" can do. Simon (1981) has contributed to the understanding of the setting/mechanism relationship and the importance of constraint, pointing out that in many cases it is simply the complexity of the environment and the constraints it imposes (and not the underlying behavioral mechanism) that gives rise to the apparent complexity of a behavioral process.
This proposition provides a powerful theoretical vantage point from which to analyze and model social processes by examining the role that constraints in the social environment play in the unfolding of social processes. We can build models of social environments and their constraining effects on even simple processual mechanisms, calculate the model results, and compare these to reality. This is done not out of a belief that the constraints determine the course of events, but out of a hope that such an approach can provide an opportunity to examine how much of the course of events can be attributed to the properties of the social environment and how much must come from properties of the underlying mechanisms at work.

The inclusion of the concepts of process and constraint fundamentally differentiates a network model from a graph model. Conceptually, a network is a system of interconnections across which a process occurs, a process that is constrained (differently) by each connection in the network. The process is represented as a flow of some undefined quantity through the network, and the constraints are the limitations to the flow that are associated with each connection in the network. Before defining this concept more formally, it should be pointed out that an enormous number of generic and specific processes fit this general structure. From the perspective of political economy, for example, the flow of goods and services across the economic relationships of a society (traditional or industrialized) is the fundamental object of study. From an anthropological perspective, the flow of mates (in practice, women) across the kinship relationships in the society is a primary concern in kinship studies. And from the perspective of culture and communication, the flow of ideas, information, and sentiments across interpersonal (face-to-face) relationships has been a primary theme for decades. Thus, the basic network flow model is relevant to a wide variety of socially-situated processes of current and historical interest in social science.

Returning to more formal terms, the primary constraint within the network model is the edge capacity. This is the numerical limit on the rate at which a flow can occur across a given connection in a network. Edge capacities are defined by a function C which assigns a positive value to all edges in the network, as follows:

Definition. A function C defined on $N \times N$ for which $C(n_i, n_j) > 0$ if $(n_i, n_j) \in E$, and $C(n_i, n_j) = 0$ otherwise, is called a capacity function. Any network of the form (N, E, C) is a capacitated network.
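As a small illustration (a sketch under stated assumptions, not anything from the paper), a capacity function can be built from a table of edge capacities: listed edges get their positive value and every other pair in $N \times N$ gets 0. The helper name `make_capacity_function` and the sample capacities are hypothetical.

```python
from typing import Callable, Dict, Hashable, Tuple

Edge = Tuple[Hashable, Hashable]


def make_capacity_function(edge_caps: Dict[Edge, float]) -> Callable[[Hashable, Hashable], float]:
    """Return C on N x N: the given positive value on edges of E, 0 otherwise."""
    for (ni, nj), c in edge_caps.items():
        if ni == nj:
            raise ValueError(f"({ni}, {nj}): a node cannot be connected to itself")
        if c <= 0:
            raise ValueError(f"({ni}, {nj}): edge capacities must be positive, got {c}")

    def C(ni: Hashable, nj: Hashable) -> float:
        # Pairs not listed in E get capacity 0, as the definition requires.
        return edge_caps.get((ni, nj), 0)

    return C


# Hypothetical directed capacities; an undirected edge would be listed both ways.
C = make_capacity_function({("a", "b"): 3, ("b", "c"): 2})
print(C("a", "b"), C("b", "a"), C("a", "c"))  # 3 0 0
```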

The example network from Fig. 1 is shown as a simple undirected capacitated network (i.e., one where $C(n_i, n_j) = C(n_j, n_i)$ for all edges) in Fig. 2. The same example is shown as a simple directed capacitated network in Fig. 3. In both figures the capacity of each edge is written along the edge. In Fig. 3, two capacities are shown for each edge, with an arrow above each indicating the direction to which it applies. For the sake of the example, the net capacity along each edge in Fig. 3 has been constructed so as to be consistent with the value shown for the corresponding undirected edge in Fig. 2. That is, $|C(n_i, n_j) - C(n_j, n_i)|$ from Fig. 3 equals $C(n_i, n_j)$ from Fig. 2.
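The consistency rule between Figs. 2 and 3 amounts to a one-line computation per edge. The sketch below uses a hypothetical helper, `net_undirected_capacities`, and made-up capacities (the values drawn in the figures are not reproduced here).

```python
from typing import Dict, FrozenSet, Hashable, Tuple


def net_undirected_capacities(
    directed: Dict[Tuple[Hashable, Hashable], float]
) -> Dict[FrozenSet[Hashable], float]:
    """Collapse directed capacities to one value per unordered pair: |C(ni, nj) - C(nj, ni)|."""
    net: Dict[FrozenSet[Hashable], float] = {}
    for ni, nj in directed:
        pair = frozenset((ni, nj))
        if pair not in net:
            # A missing direction counts as capacity 0, as in the capacity function.
            net[pair] = abs(directed.get((ni, nj), 0) - directed.get((nj, ni), 0))
    return net


# Hypothetical directed capacities (not the values shown in Fig. 3).
directed = {(1, 2): 5, (2, 1): 2, (2, 3): 4}
print(net_undirected_capacities(directed))
```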

Because the flow in a network is constrained uniquely by each edge, the flow is defined in the same manner as the capacities (i.e., in terms of the individual edges in the network). At any point in time, the flow is defined by the flow function F, which is defined in a way that is constrained by C.

Figure 2. An undirected capacitated network.

Figure 3. A directed capacitated network.

Definition. A function F defined on a capacitated network (N, E, C) for which all of the following hold true:

(a) $F(n_i, n_j) = 0$ if $(n_i, n_j) \notin E$, $\forall n_i, n_j \in N$;
(b) $F(n_i, n_j) \le C(n_i, n_j)$, $\forall n_i, n_j \in N$;
(c) $F(n_i, n_j) = -F(n_j, n_i)$, $\forall n_i, n_j \in N$;

is a flow function.

From condition (c) on the flow function it can be seen that a flow is unidirectional on an edge: the flow across an edge is exactly minus the flow in the opposite direction. Any flow F in a network must then either be cycling in the network or moving into, across, and out of the network. This observation will prove important in several theorems below.

A capacitated network with a single flow constitutes a minimal network flow model. There are many more complex representations possible within the general rubric of constrained flows in networks. For example, there may be multiple flows (of different items) within a given network at a time. There could be separate edge capacities for each quantity flowing through the network. There might be minimum capacities as well as maximum; that is, for a flow to exist on a given edge $e_{ij}$, it must be at least some minimal quantity $C_{\min}$. There can also be costs associated with each node, and so on. Each of these more complex representations permits a broader range of processes to be modeled and analyzed within the network flow approach. A large body of mathematics, primarily from operations research, has been developed to support modeling and analysis using all these complex representations.

The purpose of this paper, however, is not an overview of the mathematics of network flows, but rather an introduction to the network flow approach to analyzing processes in social networks. These more complex network flow representations will therefore not be considered. Instead, a single empirical example will be used to demonstrate in detail how network flow analysis works. A problem is introduced (that of communication processes and their relationship to fission in small groups) and formalized as a network flow problem. A detailed model is built of the empirical example, and network flow analysis is applied to the model to "solve" the communication/fission problem. As it turns out, only a simple network flow representation of the type defined above (with simple edge capacities and a single flow) is needed, but the mathematical results that must be derived to solve this problem are fundamental to virtually every other result in network flow theory. Thus, the reader is provided not only with a concrete example of how network flow analysis is performed but also with a serious introduction to the mathematics underlying network flow analysis.
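Conditions (a)-(c) lend themselves to a direct check. This minimal sketch assumes E is exactly the set of pairs with positive capacity and represents F as a Python function; `skew` and `is_flow_function` are hypothetical helpers, and the network is a made-up three-node example. Because `skew` builds F from flows listed in their direction of travel, it satisfies (c) by construction; the explicit (c) check matters only for an F supplied in some other form.

```python
from itertools import product
from typing import Callable, Dict, Hashable, Iterable, Tuple

Pair = Tuple[Hashable, Hashable]
Fn = Callable[[Hashable, Hashable], float]


def skew(positive_flows: Dict[Pair, float]) -> Fn:
    """Build a skew-symmetric F from flows listed in their direction of travel."""
    def F(ni: Hashable, nj: Hashable) -> float:
        return positive_flows.get((ni, nj), 0) - positive_flows.get((nj, ni), 0)
    return F


def is_flow_function(nodes: Iterable[Hashable], C: Fn, F: Fn, tol: float = 1e-9) -> bool:
    """Check conditions (a)-(c), taking E to be the pairs with C(ni, nj) > 0."""
    nodes = list(nodes)
    for ni, nj in product(nodes, repeat=2):
        if C(ni, nj) <= 0 and abs(F(ni, nj)) > tol:  # (a) no flow off E
            return False
        if F(ni, nj) > C(ni, nj) + tol:              # (b) capacity respected
            return False
        if abs(F(ni, nj) + F(nj, ni)) > tol:         # (c) skew symmetry
            return False
    return True


# Hypothetical undirected network a - b - c (each edge listed in both directions).
caps = {("a", "b"): 3, ("b", "a"): 3, ("b", "c"): 2, ("c", "b"): 2}


def C(ni, nj):
    return caps.get((ni, nj), 0)


print(is_flow_function("abc", C, skew({("a", "b"): 2, ("b", "c"): 2})))  # True
print(is_flow_function("abc", C, skew({("a", "b"): 4})))                  # False: exceeds C(a, b)
```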

4. An example problem: communication and fission in small groups

Small corporate groups of fewer than approximately 100 members have been widely studied by a number of disciplines and have formed the mainstream of social network theory. Within small groups, solidarity is in large measure supported by the sharing of what are often called "sentiments": values, beliefs, information, knowledge, etc. about the group and its environment (see Malinowski 1913; Homans and Schneider 1955; Fortes 1969). Unfortunately, this is as much an explanatory hypothesis for solidarity as it is a proposed mechanism, since the concept of sentiment and its relationship to behavior is so ill-defined. Among the several mechanisms that have been proposed, this case study emphasizes face-to-face communications between group members as the primary means by which sentiments are transmitted and reinforced in small groups.² The communication processes in a small group are thus assumed to contribute substantially (although not solely) to a flow of sentiments within the group. This flow of sentiments is constrained in three ways: (1) by the existing direct relationships among individuals in the group, (2) by the frequency of communication across those direct relationships, and (3) by the contextual breadth of the extant relationships in the group. Because of these constraints, the flow is not likely to be uniform: some individuals will be more "in the mainstream" than others. This observation, coupled with the observation that faction formation and group fission are extremely common small group behaviors, gives rise to a proposition. If the sharing of sentiments within the group as a whole gives it its solidarity, and if the flow of sentiments is likely not to be uniform in the group because of the group's structure (i.e., its constraints on communication), then there can be subgroups which have greater internal information flow than the group as a whole, and which are thus potentially more cohesive internally than the entire group. When such a situation exists, it is reasonable to hypothesize that it could form the basis and boundary for a potential fission of the group. This hypothesis can be formulated as a network flow problem, modeled mathematically, and tested against empirical data on a real group.

² [Footnote largely illegible in the source scan. Legible fragments indicate that it concerns the debate among anthropologists over the mechanisms that maintain group sentiments, contrasting macrolevel mechanisms such as public ritual with face-to-face transactions, and that it cites Leach (1957).]

A convenient empirical case within which this proposition can be explored is a voluntary association, specifically a martial arts (karate) club, studied by Zachary (1975, 1977). The group was observed for over two years, and during that time varied in size from 28 to 56 members. Group membership was always well-defined and there was a high level of social interaction among the club members. All members of the group were university students; because of this common involvement in a larger arena, they had extensive opportunity to interact outside the context of the actual karate classes. In addition, numerous activities organized by the club and/or individual members maintained a high level of both group solidarity and social interaction among the group members.³

Midway through the two years during which the club was studied, it began to experience a sort of "identity crisis." A disagreement developed over the degree to which the group should model itself after other student organizations as opposed to traditional martial arts schools. Eventually two well-developed competing ideologies emerged about the group's purpose; a series of organizational disputes arose, culminating in the formal fission of the group into two separate organizations. A more detailed description of the associated communication, social, and political processes within the group can be found in Zachary (1975, 1977), but the above sketchy description is sufficient to motivate development of a simple network model.

³ There was no interaction during actual karate classes (which, ironically, were the only jural activity of the group) because of strictly enforced traditions against socializing during training.

4.1. Building a network model

The rudiments of a model of the communication processes within this group are easy to establish. Each of the three kinds of constraint on the flow of sentiments described above translates into one basic part of a network representation. The direct or face-to-face relationships existing among club members are represented as a set of edges connecting a set of nodes which themselves represent the club members. The breadth of these relationships (their potential for information flow) is represented by a set of edge capacities. The actual frequency and content of communication across the existing direct relationships is represented by a flow of information within the network. In the karate club example the flow is interpreted as a flow of ideological information or sentiments about what kind of organization the club actually was.

A graphic depiction of the network model representing the structure of the club shortly before the fission is shown in Fig. 4. Each edge in the network is undirected (i.e., $e_{ij} = e_{ji}$ and $C(n_i, n_j) = C(n_j, n_i)$), and the single number shown next to each edge is its capacity. Since the capacity of a relationship for supporting the flow of sentiments is not observable in any sense, the measurement process underlying the capacities shown in Fig. 4 is anything but trivial. In fact, a separate set of mathematical results not included within traditional network theory was required to specify how observable and measurable features of the direct relationships could be employed to generate a network model that preserves the key characteristics of the "true" but unobservable one. This transformed representation is what is shown in Fig. 4.
These measurement-related results are presented later in the paper. For the present, the reader is asked simply to accept the capacities shown in Fig. 4.

Adding a further level of detail to the representation in Fig. 4 requires establishing some new mathematical constructs. These new constructs can be motivated by recalling the point made earlier that a flow in a network is either cycling within the network or else entering, flowing through, and leaving it. In the example, two novel ideological positions were formed and then spread throughout the group. Such a situation favors the second of these two choices, that of information being created and distributed throughout the network. When there is a flow through a network, a certain class of nodes called sources is required. These are nodes whose flow of information outward (toward other nodes) exceeds their flow of information inward (from other nodes). As the name implies, source nodes simply designate the sources of the "moving quantity" that is flowing in the network. In the karate club, each ideological position had a primary advocate who acted as an "information source." The club's karate instructor supported the idea that the group should be modeled after traditional martial arts schools, while the club's (student) president supported the idea that it should be modeled after other student organizations.

Figure 4. Capacitated network model of the karate club social network.

In the simplest possible representation,⁴ these two individuals could be considered the only "source" nodes in the network. The other individuals in the club would then become intermediate nodes: those whose flow "in" equals their flow "out." In a network, however, the flow cannot merely be flowing into the network; it must, in some sense, also be flowing out. There must be at least one node for which the net flow of information inward is greater than that outward. Such a node is termed a sink. One way of interpreting the notion of a sink node within the karate club is to note that the information generated around each ideological position (i.e., by one source node) is at least functionally antithetical to the other ideological position (i.e., to the other source node). The network can thus be conceptualized as harboring two flows, one originating from each source, with each source node acting as a sink to the flow of information from the other source node. Together, the constructs of source, sink, and intermediate nodes can be defined formally to give a precise definition of the nature of the flow in a network at a point in time.

Definition. In a capacitated network (N, E, C), a node $n_i$ is a source if and only if

$$\sum_{n_j \in N} F(n_i, n_j) > 0.$$

A node $n_k$ is a sink if and only if

$$\sum_{n_j \in N} F(n_k, n_j) < 0.$$

Further,

$$\sum_{n_j \in N} F(n_i, n_j) = \sum_{n_j \in N} F(n_j, n_k) = f,$$

and f is the value of the current flow in the network.

Definition. In a capacitated network (N, E, C), a node $n_m$ which is neither a source nor a sink is an intermediate node, for which

$$\sum_{n_j \in N} F(n_m, n_j) = \sum_{n_j \in N} F(n_j, n_m) = 0.$$

⁴ It must be stressed that this is only the simplest possible representation. The fact that it proves productive for the problem at hand demonstrates by Occam's razor that it is a good one. This is, to a large extent, the role of modeling and model-based analysis in social science: to find what minimal representations prove useful in "explaining" complex processes. At the same time, the reader should be reassured that this is only an "as if" model, i.e., one which simplifies the real social situation so that it can be analyzed "as if" it really were like this. More complex and realistic representations are certainly possible within network flow theory. This simple one was chosen because it works and because examples should be simple.

A few observations about flows in networks are appropriate here. First, any node in a set of nodes is free to be a source, sink, or intermediate node. To the extent that the triplet (N, E, C) defines the structure of a network, the source, sink, and intermediate node designations are not structural attributes but situational characteristics of the nodes. A given node may be a sink in a situation involving one flow, a source in a situation involving another flow, and an intermediate node in yet another situation involving a third flow. Thus, a given network (N, E, C) can support models of many different processes within the group it represents. For an example of this, reconsider the simple undirected network shown earlier in Fig. 2. It is redrawn in Figs. 5a and 5b with two different flows within it. The ordered pair along each edge in Fig. 5 represents the flow along the edge and the edge capacity (respectively), and the arrow above the ordered pair represents the direction of the flow. In Fig. 5a, there are two sources (nodes 2 and 5) and one sink (node 7), while in Fig. 5b nodes 7 and 5 are sources and nodes 2 and 1 are sinks. These two flows could represent different processes occurring within the network, conceivably even at the same time. Two other points of interest are noteworthy in Fig. 5. In 5a, the net flow from sources to sink is 9 units, not 10, because there is 1 unit of flow "backward" from the sink (node 7) to one of the sources (node 1). In 5b, the net flow from the sources to the sinks is 16, although the total flow from both sources totals 18. This is because of the unit flow from source node 5 to source node 7, which reduces the net flow out of node 7 by two units. These two points reinforce the fact that source nodes can have some flow in and sink nodes can have some flow out, so long as their net flows are positive and negative, respectively.
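The source/sink/intermediate classification depends only on each node's net outflow, so it can be computed directly from a flow. The sketch below is hypothetical (helper names and flows are invented, not taken from Fig. 5); it shows a source that receives some flow yet remains a source, and a flow value that is smaller than the gross outflow of the sources.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]
Fn = Callable[[Hashable, Hashable], float]


def skew(positive_flows: Dict[Pair, float]) -> Fn:
    """Skew-symmetric F built from flows listed in their direction of travel."""
    def F(ni, nj):
        return positive_flows.get((ni, nj), 0) - positive_flows.get((nj, ni), 0)
    return F


def net_outflow(nodes: Sequence[Hashable], F: Fn, ni: Hashable) -> float:
    """Sum over nj in N of F(ni, nj)."""
    return sum(F(ni, nj) for nj in nodes)


def classify_nodes(nodes: Sequence[Hashable], F: Fn, tol: float = 1e-9) -> Dict[Hashable, str]:
    """Positive net outflow: source; negative: sink; zero: intermediate."""
    roles = {}
    for ni in nodes:
        out = net_outflow(nodes, F, ni)
        roles[ni] = "source" if out > tol else "sink" if out < -tol else "intermediate"
    return roles


def flow_value(nodes: Sequence[Hashable], F: Fn) -> float:
    """Total net outflow of the sources, which equals the total net inflow of the sinks."""
    return sum(max(net_outflow(nodes, F, ni), 0) for ni in nodes)


# Hypothetical flow: s2 passes one unit to s1, yet s1 stays a source.
nodes = ["s1", "s2", "m", "t"]
F = skew({("s1", "m"): 3, ("s2", "m"): 2, ("s2", "s1"): 1, ("m", "t"): 5})
print(classify_nodes(nodes, F))
print(flow_value(nodes, F))  # 5, although the sources send out 6 units in gross terms
```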

The earlier proposition about the flow of sentiments and group fission can now be restated within the terms of a network flow model. The proposition was that group structure, in the form of the constraints on communication processes, could give rise to the existence of two mutually exclusive and exhaustive subgroups such that the potential flow of information within each subgroup was greater than that of the group as a whole. This implies that there is some sort of structural "bottleneck" to the flow of information within the overall network, and that this bottleneck separates the two subgroups. More precisely, the bottleneck actually defines the two subgroups when there is a source on each side of the bottleneck with a corresponding sink on the opposite side.

Figure 5. A single undirected capacitated network with two different flows, (A) and (B).

By comparison, if the sink and source (for all flows) were all on the same side of the bottleneck (with no sink or source on the other side), then the two subgroups defined by the bottleneck would not have the property of greater internal information flow than the group as a whole. In this case, the portion of the network across the bottleneck from the source and sink would be (informationally speaking) the "poor relatives," since there would be no way for the flow on that (far) side of the network to exceed what was flowing from the side containing the source and sink. The "poor" side of the bottleneck would then have the same information flow as the group as a whole, although this flow might be less than that on the side containing the source and sink. Thus, to formulate all this within the karate club example, the proposition about fission processes would apply (and thus "explain" the fission process) if and only if: (1) a bottleneck to information flow existed in the communication network of club members; (2) the two leaders were on opposite sides of this bottleneck; and (3) the bottleneck separated the two ideological factions and predicted the location of the subsequent fission.
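Conditions (2) and (3) become a direct check once a cut is in hand; condition (1), the existence of a bottleneck, is a separate computation. The following sketch is purely illustrative: the function name `check_fission_prediction`, the node labels, the cut, and the `faction` mapping (each member's observed faction after the split) are all hypothetical, not the club data.

```python
from typing import Dict, Hashable, List, Set, Tuple


def check_fission_prediction(
    N0: Set[Hashable],
    N1: Set[Hashable],
    leader_a: Hashable,
    leader_b: Hashable,
    faction: Dict[Hashable, str],
) -> Tuple[bool, List[Hashable]]:
    """Condition (2): are the leaders on opposite sides of the cut?
    Condition (3): which members sit on the side opposite to their observed faction?"""
    opposite_sides = (leader_a in N0) != (leader_b in N0)
    side_of_a = N0 if leader_a in N0 else N1
    mispredicted = [
        n for n, label in faction.items()
        # A member is predicted to follow leader_a exactly when on leader_a's side.
        if (n in side_of_a) != (label == faction[leader_a])
    ]
    return opposite_sides, mispredicted


# Hypothetical cut and factions: "i" and "p" stand in for the two leaders.
N0, N1 = {"i", "x", "y"}, {"p", "z"}
faction = {"i": "A", "x": "A", "y": "B", "p": "B", "z": "B"}
print(check_fission_prediction(N0, N1, "i", "p", faction))  # (True, ['y'])
```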

5. Flows, cuts, and bottlenecks

What is the mathematical formalization of the notion of a bottleneck to communication flow in a network? Clearly, the bottleneck is a mutually exclusive and exhaustive two-way division of a capacitated network. Such a partition is called a "cut" and is defined as follows:

Definition. In a capacitated network (N, E, C), if two subsets $N_0$ and $N_1$ completely partition N (i.e., if $N_0 \cap N_1 = \emptyset$ and $N_0 \cup N_1 = N$), the ordered pair $(N_0, N_1)$ is a cut in the network.

Definition. The capacity of a cut $(N_0, N_1)$ is the sum of the capacities of all edges connecting the elements of $N_0$ to the elements of $N_1$ and is given by

$$C(N_0, N_1) = \sum_{n_i \in N_0} \sum_{n_j \in N_1} C(n_i, n_j).$$

Definition. The flow across a cut $(N_0, N_1)$ is the sum of the flow across all edges connecting the elements of $N_0$ to the elements of $N_1$ and is given by

$$F(N_0, N_1) = \sum_{n_i \in N_0} \sum_{n_j \in N_1} F(n_i, n_j).$$
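The three definitions translate almost word for word into code. In this sketch (hypothetical helper names and example values) F is skew-symmetric, as the flow function requires, so the sum that defines $F(N_0, N_1)$ already nets out any flow running from $N_1$ back to $N_0$.

```python
from typing import Callable, Hashable, Set

Fn = Callable[[Hashable, Hashable], float]


def is_cut(N0: Set[Hashable], N1: Set[Hashable], N: Set[Hashable]) -> bool:
    """(N0, N1) is a cut if the two subsets partition N."""
    return not (N0 & N1) and (N0 | N1) == N


def cut_capacity(N0: Set[Hashable], N1: Set[Hashable], C: Fn) -> float:
    """C(N0, N1): sum of C(ni, nj) over ni in N0 and nj in N1."""
    return sum(C(ni, nj) for ni in N0 for nj in N1)


def cut_flow(N0: Set[Hashable], N1: Set[Hashable], F: Fn) -> float:
    """F(N0, N1): sum of F(ni, nj) over ni in N0 and nj in N1."""
    return sum(F(ni, nj) for ni in N0 for nj in N1)


# Hypothetical undirected network a - b - c carrying 2 units from a to c.
caps = {("a", "b"): 3, ("b", "a"): 3, ("b", "c"): 2, ("c", "b"): 2}
flows = {("a", "b"): 2, ("b", "c"): 2}


def C(ni, nj):
    return caps.get((ni, nj), 0)


def F(ni, nj):
    # Skew-symmetric: a flow listed from ni to nj counts as negative from nj to ni.
    return flows.get((ni, nj), 0) - flows.get((nj, ni), 0)


N = {"a", "b", "c"}
print(is_cut({"a"}, {"b", "c"}, N), cut_capacity({"a"}, {"b", "c"}, C), cut_flow({"a"}, {"b", "c"}, F))
# True 3 2
```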

A bottleneck is a cut, but it is also a special kind of cut: one which separates the source and sink, limits the overall flow from the source to the sink, and is unique among all cuts in these properties. Thus, the bottleneck should be found at a unique cut whose capacity constrains the maximum flow within the network. A flow-restricting cut of this kind is called a minimal cut, and the flow it allows is termed the maximal flow. Several theorems are required: (1) to demonstrate that a bottleneck defined in this manner can ever exist; (2) to identify how to determine when it does exist; and (3) to locate it when it does exist.
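For a small network, the first and third tasks can be approached by brute force: enumerate every cut separating the source and sink, find the smallest capacity, and ask whether exactly one cut attains it. The sketch below assumes the network is small (there are $2^{n-2}$ such cuts); the function names are hypothetical, and this exhaustive search is a stand-in for illustration, not the Ford and Fulkerson procedure.

```python
from itertools import combinations
from typing import Callable, Hashable, Iterator, List, Optional, Sequence, Set, Tuple

Fn = Callable[[Hashable, Hashable], float]
Cut = Tuple[Set[Hashable], Set[Hashable]]


def st_cuts(nodes: Sequence[Hashable], s: Hashable, t: Hashable) -> Iterator[Cut]:
    """Every cut (N0, N1) with s in N0 and t in N1."""
    others = [n for n in nodes if n not in (s, t)]
    for k in range(len(others) + 1):
        for chosen in combinations(others, k):
            N0 = {s, *chosen}
            yield N0, set(nodes) - N0


def minimal_cuts(nodes: Sequence[Hashable], C: Fn, s: Hashable, t: Hashable
                 ) -> Tuple[Optional[float], List[Cut]]:
    """Smallest cut capacity and every cut that attains it."""
    best, winners = None, []
    for N0, N1 in st_cuts(nodes, s, t):
        cap = sum(C(ni, nj) for ni in N0 for nj in N1)
        if best is None or cap < best:
            best, winners = cap, [(N0, N1)]
        elif cap == best:
            winners.append((N0, N1))
    return best, winners


def undirected(edges):
    """Capacity function for undirected edges (each edge usable in both directions)."""
    caps = {}
    for (u, v), c in edges.items():
        caps[(u, v)] = caps[(v, u)] = c
    return lambda ni, nj: caps.get((ni, nj), 0)


best, winners = minimal_cuts("abc", undirected({("a", "b"): 3, ("b", "c"): 2}), "a", "c")
print(best, len(winners) == 1)  # 2 True: a unique minimal cut, i.e. a bottleneck
```

When two or more cuts tie for the minimum, no single bottleneck exists in the sense used above.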

The main result is the proof by Ford and Fulkerson (1962) that the maximal flow in a network is determined by the minimal cut. Their proof is important because its logic underlies that of many other key proofs in network flow theory, and because it is constructive: it provides an actual algorithm for finding the maximal flow and a minimal cut. This algorithm is essential to finding a bottleneck to information flow, or to any other kind of network flow. The most convenient approach to formalizing this discussion is to begin by noting a lemma demonstrating that a flow in a network is always constrained by any and every cut separating the source and sink. This lemma is then used in Ford and Fulkerson's proof. In it and succeeding theorems, the basic arguments are taken from Ford and Fulkerson (1956, 1957, 1962), with some additions and simplifications taken from Hu (1969).

Lemma 1. Let F be a flow between non-adjacent nodes $n_0$ and $n_1$ in (N, E, C), and let $(N_0, N_1)$ be any cut separating $n_0$ and $n_1$. If f is the value of the flow from $n_0$ to $n_1$, then

$$f = F(N_0, N_1) - F(N_1, N_0) \le C(N_0, N_1). \qquad (1)$$

Proof. For any flow, $F(n_i, n_j) = -F(n_j, n_i)$. There can be flow in two directions across a cut separating $n_0$ (the source) and $n_1$ (the sink) only if a cycle exists on an intermediate node in N which involves nodes on the other side of the cut, so (1) reads: the flow from $n_0$ to $n_1$ is equal to the positive flow across the cut from $N_0$ to $N_1$ minus the positive flow "cycling back" from $N_1$ to $N_0$. The proof of this intuitively obvious proposition is somewhat cumbersome and not germane to this paper (see Hu (1969) and Ford and Fulkerson (1962) for details).

Lemma 1 is important because it shows that the flow from a source to a sink is always equal to the net flow across any cut separating them and is always less than or equal to the capacity of that cut. The "Maximum flow-Minimum cut" theorem can now be proved using Lemma 1.

Theorem 1. In a directed capacitated network (N, E, C), the maximum flow from the source $n_0$ to the sink $n_1$ is equal to the minimum of the values of the capacities of all cuts separating $n_0$ and $n_1$.
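Lemma 1 can be spot-checked numerically. This sketch, a hypothetical example rather than part of the proof, reads $F(N_0, N_1)$ and $F(N_1, N_0)$ in (1) as the positive flows in each direction across the cut, as the proof's wording suggests, and verifies both the identity and the capacity bound on every cut of a small four-node network in which one cut is crossed in both directions. The name `check_lemma1` is an assumption of this illustration.

```python
from itertools import combinations
from typing import Dict, Hashable, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]


def check_lemma1(nodes: Sequence[Hashable], caps: Dict[Pair, float],
                 flows: Dict[Pair, float], s: Hashable, t: Hashable) -> bool:
    """Check f = (positive flow N0->N1) - (positive flow N1->N0) <= C(N0, N1) on every cut."""
    def C(ni, nj):
        return caps.get((ni, nj), 0)

    def F(ni, nj):  # skew-symmetric flow built from flows listed in their direction of travel
        return flows.get((ni, nj), 0) - flows.get((nj, ni), 0)

    f = sum(F(s, nj) for nj in nodes)  # value of the flow: net outflow of the source
    others = [n for n in nodes if n not in (s, t)]
    for k in range(len(others) + 1):
        for chosen in combinations(others, k):
            N0 = {s, *chosen}
            N1 = set(nodes) - N0
            forward = sum(max(F(ni, nj), 0) for ni in N0 for nj in N1)
            backward = sum(max(F(nj, ni), 0) for ni in N0 for nj in N1)
            capacity = sum(C(ni, nj) for ni in N0 for nj in N1)
            if forward - backward != f or f > capacity:
                return False
    return True


# Hypothetical undirected network; the cut ({s, b}, {a, t}) carries 4 units forward and 1 back.
edges = {("s", "a"): 3, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 1}
caps = {**edges, **{(v, u): c for (u, v), c in edges.items()}}
flows = {("s", "a"): 3, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 1}
print(check_lemma1(["s", "a", "b", "t"], caps, flows, "s", "t"))  # True
```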

Proof. Lemma 1 establishes the criteria for the proof. $F(N_0, N_t)$ and $C(N_0, N_t)$, as defined in Lemma 1, are always nonnegative. Hence the minimum value of $C(N_0, N_t)$ in (1) occurs when $F(N_0, N_t) - F(N_t, N_0) = C(N_0, N_t)$. Conversely, the expression on the left has its maximum value when it is equal to $C(N_0, N_t)$. The proof of the theorem therefore requires only a demonstration that there exists a flow whose value $f$ equals the capacity of some cut separating its source and sink. Consider the maximal flow in $(N, E, C)$ and a cut $(N_0, N_t)$. If $F(N_0, N_t)$ is at its maximum, then by the above it is clear that

$$F(N_0, N_t) = C(N_0, N_t), \qquad (2)$$
$$F(N_t, N_0) = 0. \qquad (3)$$

Thus, equality holds across (1), and $F$ is a maximal flow with value $f$. Using $F$, the sets $N_0$ and $N_t$ can be defined recursively. By definition, $n_0$ is a member of $N_0$. Any other member of $N$ can be assigned to $N_0$ in one of two ways.

(i) For any $n_i$ already in $N_0$, if $E(n_i, n_j) = 1$, assign $n_j$ to $N_0$ when $F(n_i, n_j) < C(n_i, n_j)$. Assigning $n_j$ to $N_t$ would place $e_{ij}$ in the cut (since $n_i$ is in $N_0$), and Eq. (2) would then be violated.

(ii) For any $n_i$ already in $N_0$, if $E(n_j, n_i) = 1$, assign $n_j$ to $N_0$ when $F(n_j, n_i) > 0$. Assigning $n_j$ to $N_t$ would place $e_{ji}$ in the cut (again because $n_i$ is in $N_0$), and Eq. (3) would then be violated.

All that remains to be proven is that neither assignment procedure can make $n_t$, the sink, an element of $N_0$. Assume that $n_t \in N_0$. It then follows from the assignment procedures that there is a path from $n_0$ to $n_t$ made up of two kinds of edges: forward edges whose flow is less than their capacity, and backward edges that carry positive flow. For the forward edges there exists some $x_1 = \min[C(n_i, n_j) - F(n_i, n_j)] > 0$, and for the backward edges there exists some $x_2 = \min[F(n_j, n_i)] > 0$. Raise the flow on every forward edge of the path by $\min(x_1, x_2)$ and lower the flow on every backward edge by the same amount. The value of $F$ then increases, so $f$ is not maximal. Since $F$ is maximal by definition, this is a contradiction, and $n_t \notin N_0$. The sets $N_0$ and $N_t$ created by the assignment procedure, starting from the maximal flow $F$, therefore form a cut $(N_0, N_t)$ in $(N, E, C)$. The rules by which $N_0$ was created give $F(n_i, n_j) = C(n_i, n_j)$ and $F(n_j, n_i) = 0$ for every edge joining some $n_i \in N_0$ and some $n_j \in N_t$. Thus $F$ is a maximal flow by (2) and (3), and $(N_0, N_t)$ is a minimal cut. The proof is complete.

The assignment procedure in the proof is constructive in that it lays out the rudiments of an algorithm for finding the maximal flow and a minimal cut in any network. The full algorithm is presented in the Appendix. An algorithm for finding a minimal cut in a network is clearly a necessary first step to finding the minimal cut that is a bottleneck to some flow in the network. The bottleneck must be a unique minimal cut. So when a bottleneck exists, an algorithm that finds any minimal cut will find it, because there is no other minimal cut to find.
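Lemma 1 and rules (i) and (ii) can be checked mechanically on a small network. The Python sketch below is only an illustration under stated assumptions. The four-node network, its capacities `CAP` and the hand-picked maximal flow `FLOW` are hypothetical, and the helper names (`across`, `all_cuts`, `assign_source_side`) are ours, not the paper's. The sketch first confirms Lemma 1: the net flow across every cut equals $f$ and never exceeds that cut's capacity. It then grows $N_0$ outward from $n_0$ using rules (i) and (ii).

```python
from itertools import combinations

# Hypothetical network: node 0 is the source n_0 and node 3 the sink n_t.
# CAP maps each directed edge (i, j) to its capacity c_ij.
CAP = {(0, 1): 3, (0, 2): 2, (1, 2): 1, (1, 3): 2, (2, 3): 3}
# A maximal flow on the same edges (value 5), chosen by hand for the example.
FLOW = {(0, 1): 3, (0, 2): 2, (1, 2): 1, (1, 3): 2, (2, 3): 3}
NODES = {0, 1, 2, 3}
SOURCE, SINK = 0, 3


def across(values, side_a, side_b):
    """Total of values[(i, j)] over edges running from side_a into side_b."""
    return sum(v for (i, j), v in values.items() if i in side_a and j in side_b)


def all_cuts():
    """Yield every (N_0, N_t) with the source in N_0 and the sink in N_t."""
    middle = sorted(NODES - {SOURCE, SINK})
    for size in range(len(middle) + 1):
        for chosen in combinations(middle, size):
            n0 = {SOURCE, *chosen}
            yield n0, NODES - n0


def assign_source_side():
    """Rules (i) and (ii) from the proof of Theorem 1, starting from n_0."""
    n0, frontier = {SOURCE}, [SOURCE]
    while frontier:
        i = frontier.pop()
        for (a, b), c in CAP.items():
            if a == i and b not in n0 and FLOW[(a, b)] < c:
                n0.add(b)              # rule (i): forward edge with spare capacity
                frontier.append(b)
            elif b == i and a not in n0 and FLOW[(a, b)] > 0:
                n0.add(a)              # rule (ii): backward edge carrying flow
                frontier.append(a)
    return n0


rest = NODES - {SOURCE}
value = across(FLOW, {SOURCE}, rest) - across(FLOW, rest, {SOURCE})

# Lemma 1: the net flow across every cut equals f and is at most C(N_0, N_t).
for n0, nt in all_cuts():
    net = across(FLOW, n0, nt) - across(FLOW, nt, n0)
    assert net == value <= across(CAP, n0, nt)

n0 = assign_source_side()
print(sorted(n0), value, across(CAP, n0, NODES - n0))  # [0] 5 5
```

On this network the procedure stops with $N_0 = \{0\}$. The resulting cut has capacity 5, equal to the flow value, as Theorem 1 requires.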
Unfortunately, more than one cut separates any source and sink in a capacitated network $(N, E, C)$, and nothing prevents several cuts from having the same minimal capacity. Theorem 1 proves only two things: a minimal cut separating the source ($n_0$) and the sink ($n_t$) exists, and the value of the maximal flow in the network equals the capacity of this minimal cut. To help find the bottleneck sought in the example model, this result must be coupled with one that indicates when the minimal cut located by the algorithm (in the proof of Theorem 1 and in the Appendix) is unique. A re-examination of the approach taken in the algorithm suggests that the minimal cut it finds is the one "closest" to the source.

This is because it augments the flow from the source along each path to the sink and stops at the first saturated edge it finds (i.e., the first edge where the flow equals the capacity). Intuitively, this does not rule out other saturated edges on the same path that merely lie farther from the source. Such edges might define additional minimal cuts. The following Lemma and Theorem make this intuition precise. They show that the minimal cut chosen in Theorem 1 is the intersection of all minimal cuts separating the source and sink.

Lemma 2. Let $(N_0', N_t')$ and $(N_0, N_t)$ be minimal cuts in $(N, E, C)$. Then $(N_0' \cap N_0, N - (N_0' \cap N_0))$ is also a minimal cut.

Proof. By Lemma 1, if the flow is at its maximum, all edges from $N_0$ to $N_t$ are saturated, as are all edges from $N_0'$ to $N_t'$. Thus all edges from $N_0' \cap N_0$ to $N - (N_0' \cap N_0)$ are also saturated. Then by (2), $(N_0' \cap N_0, N - (N_0' \cap N_0))$ is also a minimal cut.

Theorem 2. Let $(N_0, N_t)$ be the minimal cut chosen in Theorem 1, and let $(N_i, N_i')$, $i = 1, 2, \ldots, m$, be all the minimal cuts in $(N, E, C)$. Then

$$N_0 = N_1 \cap N_2 \cap \cdots \cap N_m.$$

Proof. Applying Lemma 2 repeatedly shows that $(N_1 \cap N_2 \cap \cdots \cap N_m, N - (N_1 \cap N_2 \cap \cdots \cap N_m))$ is a minimal cut. Call this minimal cut $(N_0, N_t)$; we claim it is the minimal cut chosen by Theorem 1. The source side of this cut contains no smaller source side of a minimal cut, and the network contains no flow-augmenting paths. Each path from source to sink is saturated in at least one edge. The cut we have called $(N_0, N_t)$ includes only the first saturated edge on each path (moving from the source toward the sink). The algorithm used in Theorem 1 likewise places only the first saturated edge on each path in the minimal cut. Therefore the minimal cut created by repeated application of Lemma 2 is the same minimal cut chosen by Theorem 1.

This theorem also has a very useful corollary.

Corollary 1. Let $(N_m, N_n)$ be any minimal cut in the directed capacitated network $(N, E, C)$. Let $F$ be a maximal flow, and let $(N_0, N_t)$ be the minimal cut defined by the procedure in Theorem 1. Then $N_0 \subseteq N_m$.

Proof. By Theorem 2, $N_0 = N_1 \cap N_2 \cap \cdots \cap N_m$, where each $(N_i, N_i')$ is a minimal cut. Since $(N_m, N_n)$ is one of these minimal cuts, $N_0 \subseteq N_m$.
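Lemma 2, Theorem 2 and Corollary 1 can be checked by brute force on a network small enough to list every cut. The sketch below uses a hypothetical four-node network, chosen because three of its cuts tie at the minimal capacity; the names are illustrative. It enumerates every source side $N_0$ and keeps those of minimal capacity. It then asserts two things: intersections of minimal source sides are again minimal, and the overall intersection lies inside every minimal source side. Exhaustive enumeration is feasible only for tiny networks, so it serves here as a check, not as a substitute for the labeling algorithm.

```python
from itertools import combinations

# Hypothetical network with several tied minimal cuts; source 0, sink 3.
CAP = {(0, 1): 3, (0, 2): 2, (1, 2): 1, (1, 3): 2, (2, 3): 3}
NODES = frozenset({0, 1, 2, 3})
SOURCE, SINK = 0, 3


def cut_capacity(n0):
    """C(N_0, N - N_0): capacity of the edges leaving the source side."""
    return sum(c for (i, j), c in CAP.items() if i in n0 and j not in n0)


def source_sides():
    """Every N_0 that contains the source but not the sink."""
    middle = sorted(NODES - {SOURCE, SINK})
    for size in range(len(middle) + 1):
        for chosen in combinations(middle, size):
            yield frozenset({SOURCE, *chosen})


sides = list(source_sides())
best = min(cut_capacity(s) for s in sides)
minimal = [s for s in sides if cut_capacity(s) == best]

# Lemma 2: the intersection of two minimal source sides is again minimal.
for a in minimal:
    for b in minimal:
        assert cut_capacity(a & b) == best

# Theorem 2 and Corollary 1: the intersection of all minimal source sides is
# minimal and lies inside every other minimal source side.
closest = frozenset.intersection(*minimal)
assert cut_capacity(closest) == best
assert all(closest <= s for s in minimal)
print(best, [sorted(s) for s in minimal], sorted(closest))
# 5 [[0], [0, 1], [0, 1, 2]] [0]
```

Here the cuts with source sides $\{0\}$, $\{0, 1\}$ and $\{0, 1, 2\}$ all have capacity 5. Their intersection, $\{0\}$, is the cut nearest the source.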

Theorem 2 showed that if another minimal cut existed, it would be closer to the sink than the cut defined in Theorem 1. This result immediately suggests a way to test the uniqueness of the minimal cut found by Theorem 1, or by the labeling algorithm based on it. Imagine that the entire network is turned around, so that the flow moves through it in the opposite direction, from the sink to the source. The procedure in Theorem 1 again defines a minimal cut, but now any other minimal cut would be closer to the original source than this one. If the same cut is defined in both cases, the cut is unique. If the two cuts differ, there is obviously no unique cut. This can be stated as a theorem and proved using Corollary 1.

Theorem 3. Let $(N_0, N_t)$ be the minimal cut defined by the procedure in Theorem 1 in a directed capacitated network $(N, E, C)$. Reverse all flows, making the source in $(N, E, C)$ the sink, and the sink the source. Apply the procedure in Theorem 1 to define a minimal cut $(N_h, N_k)$, where $N_k$ is the side containing the new source $n_t$. The cut $(N_0, N_t)$ is unique if and only if $(N_0, N_t) = (N_h, N_k)$.

Proof. It has already been shown that if any other minimal cut existed, say $(N_m, N_n)$, then $C(N_0, N_t) = C(N_m, N_n)$. From Corollary 1, $N_0 \subseteq N_m$. Also $N_k \subseteq N_n$, because $(N_h, N_k)$ was defined in the direction opposite to the cut $(N_0, N_t)$. In the cut $(N_h, N_k)$, $N_k$ is the set containing the source, and Corollary 1 applies to the source side of a cut, not merely to the first element of the ordered pair representing it. Since this is a two-sided theorem ("if and only if"), it must be proved in both directions. The harder direction is to prove that $(N_0, N_t) = (N_h, N_k)$ implies that the cut is unique. By Corollary 1, $N_0 \subseteq N_h$. At the same time, for any edge $(n_i, n_j) \in (N_0, N_t) = (N_h, N_k)$, $n_i$ is an element of $N_0$ and $n_j$ is an element of $N_k$. Therefore $(N_0, N_t) = (N_h, N_k) = (N_0, N_k)$. Now make some substitutions. By Corollary 1, $N_0 \subseteq N_m$ and $N_k \subseteq N_n$. Thus

$$(N_0, N_k) \subseteq (N_m, N_k) \subseteq (N_m, N_n). \qquad (4)$$

The capacity function $C$ is strictly positive, so (4) implies that

$$C(N_0, N_k) \le C(N_m, N_k) \le C(N_m, N_n). \qquad (5)$$

It has already been established that $(N_0, N_t) = (N_0, N_k)$, and therefore that $C(N_0, N_t) = C(N_0, N_k)$. Since $(N_m, N_n)$ is a minimal cut, $C(N_m, N_n) = C(N_0, N_t)$. Therefore equality must hold throughout (5). If equality holds across (5), it must also hold across (4), and

$$(N_m, N_n) = (N_0, N_t) = (N_h, N_k),$$

so the first half of the proof is complete. The second half, proving that the uniqueness of $(N_0, N_t)$ implies $(N_0, N_t) = (N_h, N_k)$, is obvious and is left to the reader.
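The reversal test of Theorem 3 needs only a maximal flow and two labeling passes. The sketch below assumes the maximal flows are already known; they are fixed by hand here for two hypothetical four-node networks. The names `labeled_set`, `reverse` and `minimal_cut_is_unique` are illustrative. Labeling from $n_0$ with the assignment rules of Theorem 1 gives the cut nearest the source. Labeling from $n_t$ in the reversed network, with every edge and its flow turned around, gives the cut nearest the sink. Reversing a maximal flow gives a maximal flow of the reversed network, because every cut keeps its capacity.

```python
# Two hypothetical four-node networks (source 0, sink 3), each with a maximal
# flow fixed by hand. In CAP three cuts tie; in CAP_U the minimal cut is unique.
CAP = {(0, 1): 3, (0, 2): 2, (1, 2): 1, (1, 3): 2, (2, 3): 3}
FLOW = {(0, 1): 3, (0, 2): 2, (1, 2): 1, (1, 3): 2, (2, 3): 3}
CAP_U = {(0, 1): 3, (0, 2): 2, (1, 2): 1, (1, 3): 2, (2, 3): 2}
FLOW_U = {(0, 1): 2, (0, 2): 2, (1, 2): 0, (1, 3): 2, (2, 3): 2}
NODES = {0, 1, 2, 3}
SOURCE, SINK = 0, 3


def labeled_set(cap, flow, start):
    """Nodes reachable from `start` under the labeling rules of Theorem 1."""
    reached, stack = {start}, [start]
    while stack:
        i = stack.pop()
        for (a, b), c in cap.items():
            if a == i and b not in reached and flow[(a, b)] < c:
                reached.add(b)         # forward edge with spare capacity
                stack.append(b)
            elif b == i and a not in reached and flow[(a, b)] > 0:
                reached.add(a)         # backward edge carrying flow
                stack.append(a)
    return reached


def reverse(mapping):
    """Turn every edge (i, j) into (j, i), keeping its capacity or flow."""
    return {(j, i): value for (i, j), value in mapping.items()}


def minimal_cut_is_unique(cap, flow):
    """Theorem 3: compare the cut nearest n_0 with the one nearest n_t."""
    n0 = labeled_set(cap, flow, SOURCE)                    # N_0
    nk = labeled_set(reverse(cap), reverse(flow), SINK)    # N_k, holds n_t
    return n0 == NODES - nk, n0, NODES - nk


print(minimal_cut_is_unique(CAP, FLOW)[0])      # False: three cuts tie at 5
print(minimal_cut_is_unique(CAP_U, FLOW_U)[0])  # True: one cut of capacity 4
```

In the first network the two passes return different cuts, $\{0\}$ and $\{0, 1, 2\}$. Lowering $c_{23}$ to 2 leaves a single minimal cut, and both passes then return $N_0 = \{0, 1, 2\}$.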

5.1. Looking for bottlenecks in the model

As the old saying goes, "the proof of the pudding is in the eating." So how does all this mathematical pudding go down? Earlier in this paper, three criteria were identified for judging whether a proposition about constraints on communication processes and group fission can account for the data collected from the karate club:

(1) existence of a bottleneck;
(2) placement of the two leaders on opposite sides of it;
(3) alignment of the bottleneck with the divisions after the fission.

The preceding theorems provide a way of building a network flow model of the club and testing it against these criteria. Criterion 1 can be tested by using Theorems 1 and 3 together to establish whether the network model of the club has a unique minimal cut. This was done in Zachary (1977), and a unique minimal cut was in fact found, so criterion 1 is met. Criterion 2 follows directly from criterion 1 because of the way the flow was defined. Each leader was treated as a source of sentiments for one ideological position and as a sink for the sentiments generated around the other. The two leaders are therefore, by definition, on opposite sides of a unique minimal cut. (This use of the model escapes being a tautology because the cut is allowed not to be unique.) In Fig. 4 the two leaders are depicted as nodes 1 and 34. (Labeling them this way was a concession to the computer program used to calculate the bottleneck.) Criterion 3 can be evaluated by simple inspection. Note that the alignment of individuals after the fission did not exactly match their alignment in (ideological) factions before the split. For some individuals, external constraints overrode their sentiments about the club and its political disputes.

These external constraints are discussed in more detail in Zachary (1977). Criterion 3 can therefore be evaluated in two ways: against the factional alignment before the fission, and against the organizational alignment after it. Table 1 summarizes both. The unique cut separates the two factions before the split with complete accuracy, and it separates the two post-split groups with 97 percent accuracy. Either way, the results suggest that criterion 3 is also met and that the model is sufficient to explain the observed process. This is of course only an example, but it is perhaps convincing enough to demonstrate the utility of network flow models. Before moving on to the measurement problem, it is worth reconsidering what this little example demonstrates. Within any small group, a network of direct, face-to-face relationships forms the structure of the group's communication processes and organizes them. This communication network constrains communication, and those constraints can have far-reaching effects on the flow of mundane, pragmatic and even ideological information within the social network it encompasses. In the example above, a very simple network flow model showed that such constraints can give rise to very complex and pervasive group-level processes: faction formation and fission. The example thus demonstrates a way of thinking about networks that network researchers rarely use, namely how the network itself constrains and directs the processes operating within it. It also points to a way of moving social network research out of the static world of structure and into a more processual world of mechanisms and their structural constraints.

6. Measurement of edge capacities and the stability of network flow solutions

Measurement has always been the Achilles heel of social science. It has been less of a problem in social network research than elsewhere. Indeed, the ease of basic formalization is arguably the main reason social network approaches have spread so widely and rapidly. However, recent difficulties in reliably measuring even simple secondary network-related variables have somewhat burst the bubble of social networks as the golden road to convenient quantification. Bernard and Killworth's "informant accuracy" dilemma is a primary example of such difficulties (Killworth and Bernard 1976, 1977, 1979; Bernard et al. 1980). Measurement is crucial for network flow models. The powerful tools of network flow analysis, such as those provided by Theorems 1 and 3, are unusable if the basic model, and especially the edge capacities, cannot be measured and quantified.

There is no general approach to quantifying edge capacities in a network flow model, because the unit of measurement (and therefore the method of measurement) varies with the kind of flow being considered. Sometimes the nature of the flow suggests a reliable and straightforward way to quantify it. For example, if the flow of interest is goods and services moving through a network, a monetary or commodity-based unit representing the value of the potential flow is easy to obtain and reliable (see Lombardi 1974). In other cases, however, the nature of the flow may totally preclude direct measurement of any kind. The flow of sentiments or ideological information in the preceding section is an example: its flows and edge capacities are essentially unmeasurable. When the edge capacities cannot be measured directly, an indirect or "back-door" approach is necessary. The most obvious indirect approach is to identify a secondary variable that is somehow related to edge capacity, measure it, and use the resulting values as surrogates for the edge capacities in the network model. Social science has long operationalized conceptual variables through more measurable surrogates in this way. The approach works well, but only when it is demonstrated mathematically that the surrogate values have the same mathematical properties as the real ones. In theory, this means that before a secondary quantity can serve as an edge capacity, it must be proved that using it this way leads to the same solution as the "true" (but unavailable) edge capacity would. In practice, it means that the model builder must do two things:

(1) know which transformations of the true edge capacities yield equivalent solutions;
(2) find some measurable quantity related to the true edge capacities by one of those transformations.

Some light can be shed here on (1), but (2) inevitably depends on the cleverness of the modeler and on the details of the situation being modeled.

Table 1
Test of the proposition

No.  Faction   Faction    Hit/   Club      Club       Hit/
     (actual)  (modeled)  miss   (actual)  (modeled)  miss
1    h         h          Hit    H         H          Hit
2    h         h          Hit    H         H          Hit
3    h         h          Hit    H         H          Hit
4    h         h          Hit    H         H          Hit
5    h         h          Hit    H         H          Hit
6    h         h          Hit    H         H          Hit
7    h         h          Hit    H         H          Hit
8    h         h          Hit    H         H          Hit
9    j         j          Hit    H         J          Miss
10   j         j          Hit    J         J          Hit
11   h         h          Hit    H         H          Hit
12   h         h          Hit    H         H          Hit
13   h         h          Hit    H         H          Hit
14   h         h          Hit    H         H          Hit
15   j         j          Hit    J         J          Hit
16   j         j          Hit    J         J          Hit
17   h         h          Hit    H         H          Hit
18   h         h          Hit    H         H          Hit
19   j         j          Hit    J         J          Hit
20   h         h          Hit    H         H          Hit
21   j         j          Hit    J         J          Hit
22   h         h          Hit    H         H          Hit
23   j         j          Hit    J         J          Hit
24   j         j          Hit    J         J          Hit
25   j         j          Hit    J         J          Hit
26   j         j          Hit    J         J          Hit
27   j         j          Hit    J         J          Hit
28   j         j          Hit    J         J          Hit
29   j         j          Hit    J         J          Hit
30   j         j          Hit    J         J          Hit
31   j         j          Hit    J         J          Hit
32   j         j          Hit    J         J          Hit
33   j         j          Hit    J         J          Hit
34   j         j          Hit    J         J          Hit

Totals: faction membership 34 hits, 0 misses (100% hits, 0% misses); club after split 33 hits, 1 miss (97% hits, 3% misses).

No. = individual number in Fig. 4; Faction = faction membership before the split; Club = club membership after the split.

This table gives the results of the simulation used to test the hypothesis. The two factions in the club before the split are called "h" and "j". The two clubs formed after the split are called "H" (the club formed by members of faction h) and "J" (the club formed by members of faction j). Columns two and five give the faction and club to which each individual actually belonged. Columns three and six give the faction and club membership predicted by the network model: an individual is assigned to a faction or club if he belongs to the cut-set containing that group's leader. Columns four and seven record whether the model's prediction matched what actually happened. The model was 97 percent accurate in predicting club membership and 100 percent accurate for faction membership.

Before looking for the kinds of transformations that preserve a network flow solution, we must say exactly what it means for one network model to yield a solution equivalent to another's. In both Theorems 1 and 3, applying the algorithm in the Appendix produces two fundamental results: the location of the minimal cut and the value of the maximal flow. Network flow theory was historically sparked by interest in physical networks such as pipeline systems. In that context the question was how much oil, water and so on could be transported through an existing network, so the "solution" to such a problem is the value of the maximum flow. When Theorem 1 was originally proved, its authors saw the minimum cut only as a convenient device for finding the maximum flow. In the information flow/group fission example above, however, the location of the bottleneck was the item of interest, and the maximum flow in Theorem 1 served only as a device for locating the bottleneck (i.e., the minimum cut). In this sociological realm, then, the solution to a network flow problem is the partition that the cut imposes on the set of nodes. This observation defines what must be preserved when Theorems 1 and 3 are applied to networks with surrogate edge capacities. It also shows how social analyses can set applied mathematical research different goals from physical analyses that use the same underlying models. The problem can now be defined intuitively: find the transformations of the edge capacities of a network model $(N, E, C)$ that yield networks with the same minimal cuts under Theorems 1 and 3 as $(N, E, C)$. Next, the problem must be stated more formally. Define $\Omega$ as the set of all capacitated networks with a given structure (i.e., fixed $N$ and $E$): $\Omega = \{(N, E, C_i) : N, E \text{ fixed, and } C_i \text{ is a unique, arbitrary capacity function on } N \text{ and } E\}$.
Next, define a relation $\cong$ on $\Omega \times \Omega$: $(N, E, C) \cong (N, E, C^*)$ if and only if the minimal cuts defined on $N$ by $C$ are exactly those defined on $N$ by $C^*$. (Note: $(N, E, C) \cong (N, E, C^*)$ is read "network $(N, E, C)$ is related to network $(N, E, C^*)$ under $\cong$.") Clearly, $\cong$ is an equivalence relation. It is:

(1) reflexive, $(N, E, C) \cong (N, E, C)$;
(2) symmetric, $(N, E, C) \cong (N, E, C^*) \Rightarrow (N, E, C^*) \cong (N, E, C)$;
(3) transitive, $(N, E, C) \cong (N, E, C^*)$ and $(N, E, C^*) \cong (N, E, C^{**}) \Rightarrow (N, E, C) \cong (N, E, C^{**})$.

Thus $\cong$ partitions $\Omega$ into equivalence classes. For any capacitated network $(N, E, C)$, the members of its equivalence class are all those $(N, E, C^*)$ for which $(N, E, C) \cong (N, E, C^*)$. They are exactly the elements of $\Omega$ whose capacity functions generate the same minimal cuts on $N$ as $C$. The equivalence class of $(N, E, C)$ is denoted $\bar{C}$. This notation lets us restate the measurement problem as the problem of characterizing $\bar{C}$. The following theorem provides a partial solution.

Theorem 4. Given capacitated networks $(N, E, C)$ and $(N, E, C^*)$, if there exists some $m \in \mathbb{Z}^+$ such that $c_{ij}/c^*_{ij} = m$ for all $c_{ij}, c^*_{ij} \neq 0$, then $(N, E, C^*) \in \bar{C}$.

Proof. Any finite network obviously has only finitely many cuts separating $n_0$ (the source) and $n_t$ (the sink). In particular there are

$$k = \sum_{j=0}^{n-2} \binom{n-2}{j} = 2^{n-2}$$

distinct cuts separating them, where $n$ is the order of $N$.⁵ Since the theorem concerns networks with fixed $N$ and $E$, $(N, E, C)$ and $(N, E, C^*)$ have the same number of distinct cuts separating the source and sink. Denote the sequence of all these cuts, in an arbitrary order, by $[(N_j, N_j')]_{j=1}^{k}$, and the sequence of their capacities, in the same order, by

$$[C(N_j, N_j')]_{j=1}^{k}. \qquad (6)$$

Expression (6) uses the capacity function $C$; if $C^*$ is used instead, $C^*$ replaces $C$ in (6). The hypothesis of the theorem ($c_{pq} = m \cdot c^*_{pq}$) and the definition of the capacity of a cut then give

$$C(N_j, N_j') = \sum_{n_p \in N_j} \sum_{n_q \in N_j'} c_{pq} = \sum_{n_p \in N_j} \sum_{n_q \in N_j'} m \cdot c^*_{pq} = m \cdot C^*(N_j, N_j'), \qquad j = 1, \ldots, k.$$

⁵ The formula for $k$ is derived as follows. There is exactly one cut in which the source set $N_0$ has one element. There are $(n - 2)$ cuts in which $N_0$ has exactly two elements. There are $\binom{n-2}{2}$ cuts in which $N_0$ has exactly three elements, that is, the number of ways to choose two elements from $n - 2$ elements, disregarding order. The expansion continues up to $n - 1$ elements in $N_0$, the maximum possible size. The sum of all these terms has the closed form for $k$ given in the text.

Thus the sequence of capacities generated by $C$ equals the sequence generated by $C^*$, with each value multiplied by $m$. Again, the order is arbitrary, since only the values matter. Because $m$ is positive, the cuts that are minimal in $[C(N_j, N_j')]_{j=1}^{k}$ are also minimal in $[m \cdot C^*(N_j, N_j')]_{j=1}^{k}$, although their capacities differ by a factor of $m$. Because $m$ is an integer, for both capacity functions $C$ and $C^*$ there is a maximal flow whose value equals the minimal cut capacity. Since $C$ and $C^*$ generate the same minimal cuts in $N$, $(N, E, C) \cong (N, E, C^*)$ and $(N, E, C^*) \in \bar{C}$.

This theorem does not characterize all the elements of $\bar{C}$, but it does identify an infinitely large subset of them. It also shows that when only the minimal cut is of interest, any measurement technique may be used that preserves the ratios among the edge capacities $C$. In other words, it gives us a chisel for chipping away at the difficult problem of measurement when the location of the minimal cut, not the value of the maximum flow, is of primary interest. The chisel is this: any scheme for assigning values to edge capacities may be used, provided it can be proven (or argued to the researcher's satisfaction) that the resulting values are proportional to the "true" values. Proportional here means linearly related with no additive constant. Adding the same constant to every capacity is not enough, because a cut made of many edges gains more than a cut made of few, and so the minimal cut can move. One important shortcoming of this result is that it ignores the errors that may creep into such indirect measurement schemes. It does not say how close to exactly proportional the transformation must be before the results of the flow analysis become invalid. Can one or more tiny errors in assigning values lead to a totally erroneous result? Unfortunately, there is no formal answer to this question, at least for now. This question has simply not arisen before, because applied network flow theory has so far been driven by disciplines concerned with the values of flows, not with the implications of cuts. But it is a mathematically interesting and tractable question. To the extent that social scientists begin to apply network flow models and need such answers, the applied mathematics world will begin to provide them. A more empirical answer (albeit one based on an $n$ of 1!) can be drawn from the results in Table 1. The reader has probably noticed that finding some measurable quantity proportional to "capacity for information flow" across an edge is only slightly less difficult than assessing that capacity directly. Yet the results in Table 1 suggest that the model's predictions are highly accurate. Considering how values were assigned to the edge capacities, in the light of Theorem 4 above, may help convince the reader, as it has the author, that the max-flow/min-cut algorithm is highly robust. In the karate club, the relevant units of information were communicated in contexts outside the regular activities of the club, and club members interacted in a number of such contexts. Moreover, the transmission of sentiments was certainly an epiphenomenon of everyday interaction in these contexts, since ideological information was implicit in mundane social, political and cultural messages. Given how context-sensitive such communications are, it seemed reasonable to use the contextual breadth of a relationship as a surrogate for its capacity for information flow.
It was assumed that the number of contexts in which any pair of individuals interacted was somehow related to the potential flow of ideological information (sentiments) between them. Put another way, the capacity for information flow across an edge was assumed to be a function of the contextual breadth of the relationship the edge represented. This assumption suggested a measurement strategy, since a notion such as contextual breadth is (at some level) operationalizable and measurable. Of course, operationalizing and measuring contextual breadth, itself a difficult task, still leaves open the even thornier issue of the nature of the context-capacity relationship. Still, if some specific relationship between the amount of information transmitted and the number of contexts of interaction could be determined or postulated (based on data from the club), a procedure for assigning values to $C$ could be devised. The relationship was clearly at least monotone increasing: the more contexts of interaction, the greater the capacity for information flow. For a more specific relationship, Zachary (1977) assumed that each context contributed about equally, so that capacity was linearly related to contextual breadth.

The procedure used to measure contextual breadth was a straightforward one. Detailed ethnographic observation of the group was used to determine a set of common contexts of interaction. This set listed the situations in which club members expected to (and did) meet and interact with other club members. The number of elements of the set that applied to each edge $(n_i, n_j)$ in Fig. 4 was then used as the value of $c_{ij}$ for that edge. A more detailed discussion can be found in the cited references. The ethnographic data, particularly as reported in detail in Zachary (1973), generally supported the assumption that all contexts contributed equally. Nevertheless, this linearity assumption should be viewed as very similar to the single source/single sink assumption built into the model earlier. It is an "as if" assumption. It arises not so much from a detailed reduction of the data as from a desire to see whether a simple (or simplified) mechanism can adequately explain the apparent complexity of the social situation. Such a model-building approach is clearly in line with the philosophy expressed by Simon (1981) and emphasized at the beginning of this paper. Table 1 has shown that such a simplified model can represent complex social phenomena.
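The contextual-breadth measure comes down to counting, for each edge, how many of the listed contexts of interaction apply to it. The sketch below rests on stated assumptions. The record of observed contexts is hypothetical: the member numbers and context labels are invented and are not Zachary's data. Each relationship is also assumed to carry sentiment in either direction, so that $c_{ji} = c_{ij}$. The function name `capacities_from_contexts` is ours.

```python
# Hypothetical observation record: for each pair of members, the contexts in
# which they were seen to interact. Member numbers and context labels are
# invented for illustration; they are not the karate club data.
observed = {
    (1, 2): {"context A", "context B", "context C"},
    (1, 3): {"context A", "context D"},
    (2, 34): {"context B"},
}


def capacities_from_contexts(record):
    """Set c_ij to the contextual breadth of each relationship.

    Every context counts equally (the linearity assumption), and each
    relationship is assumed to carry sentiment both ways, so c_ji = c_ij.
    """
    cap = {}
    for (i, j), contexts in record.items():
        breadth = len(contexts)
        cap[(i, j)] = breadth
        cap[(j, i)] = breadth
    return cap


capacities = capacities_from_contexts(observed)
print(capacities[(1, 2)], capacities[(3, 1)], capacities[(34, 2)])  # 3 2 1
```

Because each context counts equally, weighting every context by the same positive integer multiplies all capacities by that integer. By Theorem 4, this leaves the minimal cut unchanged.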
The strong "as if" assumptions of linearity and of a single source and sink give the coarsest possible representation of the social environment. Even so, they "work", in the sense that they yield reasonable predictions of the social process they model. The results in Table 1 also suggest that the max-flow/min-cut algorithm is highly robust, since the data used to build the model were clearly noisy and rested on strong assumptions. Network flow research has generally not made a detailed mathematical treatment of the robustness of this (or any other) network flow model a prime concern. Here again, this is largely because the model has not been used in situations where robustness would matter. Wider use of network flow representations in the sociological realm could well stimulate mathematical inquiry of this sort. It could generate the kind of productive interplay that has long existed between applied mathematics and other substantive domains.
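Theorem 4, and the limits of a merely "linear" surrogate, can be illustrated by brute force on a hypothetical four-node network. The sketch enumerates every cut and confirms that there are $2^{n-2}$ of them. It then compares the minimal cuts under three sets of capacities: the original ones, the originals multiplied by $m = 4$, and the originals with the constant 2 added to every edge. The capacities, the constants and the function names are invented for illustration.

```python
from itertools import combinations

# Hypothetical capacities C on a four-node network; source 0, sink 3.
C = {(0, 1): 3, (1, 2): 1, (1, 3): 1, (2, 3): 5}
NODES = frozenset({0, 1, 2, 3})
SOURCE, SINK = 0, 3


def source_sides():
    """All N_0 with the source inside and the sink outside."""
    middle = sorted(NODES - {SOURCE, SINK})
    for size in range(len(middle) + 1):
        for chosen in combinations(middle, size):
            yield frozenset({SOURCE, *chosen})


def minimal_cuts(cap):
    """Source sides of every cut whose capacity is minimal under `cap`."""
    capacity = {
        s: sum(c for (i, j), c in cap.items() if i in s and j not in s)
        for s in source_sides()
    }
    best = min(capacity.values())
    return {s for s, value in capacity.items() if value == best}


assert len(list(source_sides())) == 2 ** (len(NODES) - 2)  # k = 2^(n-2)

scaled = {edge: 4 * c for edge, c in C.items()}    # proportional, m = 4
shifted = {edge: c + 2 for edge, c in C.items()}   # linear with an intercept

print(minimal_cuts(C) == minimal_cuts(scaled))     # True
print(minimal_cuts(C) == minimal_cuts(shifted))    # False
```

Scaling leaves the minimal cut at $N_0 = \{0, 1\}$, as Theorem 4 predicts. Adding the same constant to every edge moves the cut to $\{0\}$, because the two-edge cut gains 4 while the one-edge cut gains only 2.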

Conclusions

This paper has tried to give an overview of how network flow representations can address social network problems. It has presented the motivations for using constrained process models such as network flows, and it has noted how these motivations differ from those behind traditional social network models. It has also presented some of the relevant mathematics, but in a way that does not require a deep understanding in order to appreciate the network flow approach. A single example has been worked through in detail to show how sociological and mathematical thinking about networks intertwine. It should be emphasized again that this was only one example. Many other applications are possible, to many substantive problems other than communication flow and group fission. Much more sophisticated network flow representations than the one introduced here are also possible. Multiple flows with common or separate capacities can be modeled in a single network. Costs of flow across edges can be added, as can node properties such as capacities, costs and delay times. Much more realistic models can thus be built. Questions other than minimum cuts and maximum flows can also be addressed (e.g., local flows, the vulnerability of flows to disruption, stability). Stochastic flows can be introduced to make the entire representation probabilistic rather than deterministic. For all these cases the mathematics is well developed and needs only to be applied.

Appendix. An algorithm for finding the minimal cut in a capacitated network

The proof of Theorem 1 is constructive, because its assignment procedure leads directly to an algorithm for finding the maximal flow and a minimal cut in any capacitated network. This algorithm uses the concept of a flow-augmenting path employed in the proof. There, a path through the network was sought in order to show that $n_t \notin N_0$; this is the flow-augmenting path. The central notion of the proof is that if a flow-augmenting path exists in the network, the current flow is not maximal. The assignment procedure in the proof gives a systematic way to search for flow-augmenting paths. Each time one is found, the current flow along it is incremented until the path is no longer flow-augmenting. The proof ensures that when no such paths remain, the flow is maximal and the capacity of the cut separating the source and sink is minimal. A formal algorithm for carrying out the assignment procedure and finding the maximal flow/minimal cut is given below. It is sometimes called the Ford-Fulkerson labeling algorithm, after its inventors. In the algorithm, labels are assigned to each node $n_j$ in $N$. Each label has the form $(n_i^{\pm}, e(j) = x)$, where $n_i \in N$ and $x$ is either a positive integer or $\infty$, an undefined label.

The maximum flow/minimal cut labeling algorithm

0. Label the source $n_0$ with $(-, e(0) = \infty)$, and let it be node $n_i$.
1. Node $n_i$ is now labeled and unscanned. Given its label, of the form $(n_k^{\pm}, e(i) = x)$, examine all unlabeled nodes $n_j$ adjacent to $n_i$ (for which $E(n_i, n_j) = 1$).
2. If $F(n_i, n_j) < C(n_i, n_j)$, give $n_j$ the label $(n_i^{+}, e(j) = x)$, where $x = \min[e(i), C(n_i, n_j) - F(n_i, n_j)]$.
3. If $F(n_j, n_i) > 0$, give $n_j$ the label $(n_i^{-}, e(j) = x)$, where $x = \min[e(i), F(n_j, n_i)]$.
4. Node $n_i$ is now labeled and scanned; each node $n_j$ labeled in step 2 or 3 is labeled and unscanned.
5. If the sink is still unlabeled, return to step 1, letting $n_i = n_j$ for some labeled, unscanned node $n_j$. If the sink is labeled, go to step 6. If no further labels can be assigned (if there is no node $n_j$ for which the criterion in either step 2 or step 3 holds), stop.
6. The sink has been labeled $(n_j^{\pm}, e(t) = x)$. If the sink is labeled $(n_j^{+}, e(t) = x)$, replace $F(n_j, n_t)$ by $F(n_j, n_t) + e(t)$. If the sink is labeled $(n_j^{-}, e(t) = x)$, replace $F(n_t, n_j)$ by $F(n_t, n_j) - e(t)$.
7. For node $n_j$, labeled $(n_k^{\pm}, e(j) = x)$, replace $F(n_k, n_j)$ by $F(n_k, n_j) + e(t)$ if the sign is positive, or replace $F(n_j, n_k)$ by $F(n_j, n_k) - e(t)$ if the sign is negative.
8. Let node $n_k$ be node $n_j$ and repeat step 7. Continue until $F(n_0, n_j)$ has been incremented, where $n_0$ is the source.
9. Discard all labels and repeat the procedure, beginning with step 0, using the new (incremented) values of $F$.
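The labeling procedure above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the author's code: capacities are a dictionary of directed edges with integer values (an undirected tie is assumed to be entered as two directed edges), the function name `max_flow_min_cut` is hypothetical, and labeled nodes are scanned in first-in, first-out order, a choice the appendix leaves open (this ordering is the Edmonds-Karp refinement of the labeling method).

```python
from collections import deque
from collections.abc import Hashable
from typing import Dict, List, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def max_flow_min_cut(C: Dict[Edge, int], source: Node, sink: Node
                     ) -> Tuple[int, Dict[Edge, int], Set[Node]]:
    """Labeling algorithm for maximal flow / minimal cut (steps 0-9).

    C maps each directed edge (n_i, n_j) to a nonnegative integer capacity.
    Returns the flow value, the final flows F, and the labeled set N_s
    (the source side of the minimal cut).
    """
    F = {e: 0 for e in C}
    out_nbrs: Dict[Node, List[Node]] = {}
    in_nbrs: Dict[Node, List[Node]] = {}
    for a, b in C:
        out_nbrs.setdefault(a, []).append(b)
        in_nbrs.setdefault(b, []).append(a)

    value = 0
    while True:
        # Step 0: label the source (-, e = infinity).
        # A label is stored as (previous node, sign, potential increment).
        labels = {source: (None, 0, float("inf"))}
        unscanned = deque([source])

        # Steps 1-5: scan labeled, unscanned nodes (here in FIFO order)
        # until the sink is labeled or no further labels can be assigned.
        while unscanned and sink not in labels:
            i = unscanned.popleft()
            e_i = labels[i][2]
            # Step 2: an unsaturated edge (n_i, n_j) gives n_j a + label.
            for j in out_nbrs.get(i, []):
                if j not in labels and F[(i, j)] < C[(i, j)]:
                    labels[j] = (i, +1, min(e_i, C[(i, j)] - F[(i, j)]))
                    unscanned.append(j)
            # Step 3: an edge (n_j, n_i) carrying flow gives n_j a - label.
            for j in in_nbrs.get(i, []):
                if j not in labels and F[(j, i)] > 0:
                    labels[j] = (i, -1, min(e_i, F[(j, i)]))
                    unscanned.append(j)

        if sink not in labels:
            # No flow-augmenting path remains, so the flow is maximal.
            return value, F, set(labels)

        # Steps 6-8: walk back from the sink to the source, changing the
        # flow on every edge of the path by e(t).
        e_t = labels[sink][2]
        j = sink
        while j != source:
            prev, sign, _ = labels[j]
            if sign > 0:
                F[(prev, j)] += e_t   # forward edge: add e(t)
            else:
                F[(j, prev)] -= e_t   # backward edge: cancel e(t)
            j = prev
        value += e_t
        # Step 9: discard all labels (loop back to step 0).


# Usage on a small hypothetical network.
C_example = {("s", "a"): 1, ("s", "b"): 1, ("a", "b"): 1,
             ("a", "t"): 1, ("b", "t"): 1}
value, F, N_s = max_flow_min_cut(C_example, "s", "t")
print(value, sorted(N_s))  # 2 ['s']
```

Scanning order does not change the final flow value; FIFO order is used here only because it keeps the number of augmentations bounded by a function of the network size.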

When no further labels can be applied, as indicated in step 5, the labeled nodes form $N_s$ and the unlabeled nodes form $N_t$. The flow is now maximal.

Discussion

This algorithm is based on the procedure by which nodes in $N$ are assigned to the set $N_s$ in the proof of Theorem 1. It begins in step 0 by labeling the source; all nodes adjacent to the source are then labeled. Each label, of the form $(n_i^{\pm}, e(j) = x)$, includes three items of information necessary to the assignment procedure. Item 1 indicates the node from which the label is applied, that is, the previous node on the flow-augmenting path. Item 2 is the sign indicating whether the flow is forward or backward on the path. If the sign is positive, the algorithm assigns a potential positive increment whose value is given in item 3. This increment is the minimum of the potential increment of the previous node (the node named in item 1) and the difference between the edge capacity and the flow (the excess capacity). If a forward edge is saturated (if $F(n_i, n_j) = C(n_i, n_j)$), it has no excess capacity and no positive label can be applied across it. If the sign is negative, a potential negative increment is assigned; this is the minimum of the backward flow on the edge and the potential increment of the previous node. Thus item 3 carries the results of a continuing series of tests to determine an increment to the flow that may be made on all edges in the path. If the sink is included in $N_s$, the flow is not maximal, and the flow is incremented along the path. Since $e(t)$ is the smallest of the potential increments along the path, the algorithm changes the flow on every edge of the flow-augmenting path by $e(t)$, working backwards, that is, starting with the sink and continuing until the source is reached. Because of the procedure by which $e(t)$ has been chosen, it is certain beforehand that no edge capacity will be exceeded and no backward flow will be driven below zero. This procedure of incrementing along the flow-augmenting path is done in steps 6 through 9. Labels are then assigned again, in the hope that $n_t$ will not be labeled. When the algorithm stops, there will be a set of saturated edges that separate the source and sink. Those nodes on the source side of this cut are part of $N_s$, while those on the sink side are part of $N_t$.
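The closing point of the discussion, that the final round of labeling identifies the minimal cut, can be checked on its own. The sketch below is hypothetical (the function names `labeled_set` and `minimal_cut` and the example network are illustrations, not from the paper): given a flow already known to be maximal, it repeats only the labeling steps, collects the labeled set $N_s$, and lists the edges leading from $N_s$ to $N_t$. Every such edge should be saturated, and their capacities should sum to the flow value.

```python
from collections import deque
from collections.abc import Hashable
from typing import Dict, List, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def labeled_set(C: Dict[Edge, int], F: Dict[Edge, int],
                source: Node) -> Set[Node]:
    """Repeat only the labeling steps: collect every node that can be labeled."""
    labeled = {source}
    queue = deque([source])
    while queue:
        i = queue.popleft()
        for (a, b), cap in C.items():
            if a == i and b not in labeled and F[(a, b)] < cap:
                # Forward label: edge (n_i, n_j) still has excess capacity.
                labeled.add(b)
                queue.append(b)
            elif b == i and a not in labeled and F[(a, b)] > 0:
                # Backward label: edge (n_j, n_i) carries flow that could be cancelled.
                labeled.add(a)
                queue.append(a)
    return labeled


def minimal_cut(C: Dict[Edge, int], F: Dict[Edge, int],
                source: Node) -> Tuple[Set[Node], List[Edge]]:
    """Return N_s and the edges leading from N_s to N_t."""
    N_s = labeled_set(C, F, source)
    cut = [e for e in C if e[0] in N_s and e[1] not in N_s]
    return N_s, cut


# Hypothetical network and a maximal flow of value 23 worked out by hand.
C = {("s", "v1"): 16, ("s", "v2"): 13, ("v2", "v1"): 4, ("v1", "v3"): 12,
     ("v3", "v2"): 9, ("v2", "v4"): 14, ("v4", "v3"): 7, ("v3", "t"): 20,
     ("v4", "t"): 4}
F = {("s", "v1"): 12, ("s", "v2"): 11, ("v2", "v1"): 0, ("v1", "v3"): 12,
     ("v3", "v2"): 0, ("v2", "v4"): 11, ("v4", "v3"): 7, ("v3", "t"): 19,
     ("v4", "t"): 4}

N_s, cut = minimal_cut(C, F, "s")
print(sorted(N_s))  # ['s', 'v1', 'v2', 'v4']
print(cut)          # [('v1', 'v3'), ('v4', 'v3'), ('v4', 't')]
print(all(F[e] == C[e] for e in cut), sum(C[e] for e in cut))  # True 23
```

Note that the one edge running back from $N_t$ to $N_s$, $(v_3, v_2)$, carries no flow, as the theorem requires at a maximal flow.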


References

Bernard, H.R., P.D. Killworth and L. Sailer 1980 "Informant accuracy in social network data IV: a comparison of clique-level structure in behavioral and cognitive network data." Social Networks 2: 191-218.
Boorman, Scott A. and Harrison White 1976 "Social structure from multiple networks II. Role structure." American Journal of Sociology 81(6): 1384-1446.
Breiger, Ronald L., Scott Boorman and P. Arabie 1975 "An algorithm for clustering relational data with applications for social network analysis and comparisons with multidimensional scaling." Journal of Mathematical Psychology 12: 328-383.
Busacker, Robert and Thomas Saaty 1965 Finite Graphs and Networks. New York: McGraw-Hill.
Davis, J.A. 1967 "Clustering and structural balance in graphs." Human Relations 20: 181-187.
Flament, Claude 1963 Applications of Graph Theory to Group Structure. Englewood Cliffs, N.J.: Prentice-Hall.
Ford, L.R. and D. Fulkerson 1956 "Maximal flow through a network." Canadian Journal of Mathematics 8: 399-404.
Ford, L.R. and D. Fulkerson 1957 "A simple algorithm for finding maximal network flows and an application to the Hitchcock problem." Canadian Journal of Mathematics 9: 210-218.
Ford, L.R. and D. Fulkerson 1962 Flows in Networks. Princeton: Princeton University Press.
Fortes, Meyer 1969 Kinship and the Social Order. Chicago: Aldine.
Geertz, Clifford 1965 "Religion as a cultural system." In Michael Banton (ed.), Anthropological Approaches to the Study of Religion (Association of Social Anthropologists Monographs, No. 3). London: Tavistock.
Hage, Per 1979 "Graph theory as a structural model in cultural anthropology." Annual Review of Anthropology 8: 115-136. Palo Alto: Annual Review Press.
Harary, Frank and Robert Norman 1956 Graph Theory as a Mathematical Model in the Social Sciences. Ann Arbor: University of Michigan Institute of Social Relations.
Harary, Frank, R.Z. Norman and D. Cartwright 1965 Structural Models: An Introduction to the Theory of Directed Graphs. New York: Wiley and Sons.
Homans, George and David M. Schneider 1955 Marriage, Authority, and Final Causes. Glencoe, IL: The Free Press.
Hu, T.C. 1969 Integer Programming and Network Flows. Reading, Mass.: Addison-Wesley. Part II.
Killworth, P.D. and H.R. Bernard 1976 "Informant accuracy in social network data." Human Organization 35(3): 269-286.
Killworth, P.D. and H.R. Bernard 1977 "Informant accuracy in social network data II." Human Communication Research 4(1): 3-18.
Killworth, P.D. and H.R. Bernard 1979 "Informant accuracy in social network data III: a comparison of triadic structure in behavioral and cognitive data." Social Networks 2: 19-46.
Leach, E.R. 1957 Political Systems of Highland Burma. Cambridge, Mass.: Harvard University Press.


Lombardi, John 1974 "Flows in social networks and the strength of interpersonal bonds." Paper presented at the American Anthropological Association meetings, November 19-24, Mexico City, Mexico.
Lorrain, F. and H.C. White 1971 "Structural equivalence of individuals in social networks." Journal of Mathematical Sociology 1: 49-80.
Malinowski, Bronislaw 1913 The Family among the Australian Aborigines. London: Hodder. (Reprinted New York: Schocken Books, 1963.)
Maki, Daniel and Maynard Thompson 1974 Mathematical Models and Applications. Englewood Cliffs, N.J.: Prentice-Hall. Ch. 7.
Maxwell, Lee and Myril Reed 1971 The Theory of Graphs: A Basis for Network Theory. New York: Pergamon Press.
Reitz, Karl and Douglas White 1983 "Rethinking the role concept: Social network, homomorphisms and equivalence of positions." Unpublished manuscript, School of Social Science, University of California at Irvine.
Saaty, Thomas 1973 Topics in Behavioral Mathematics. Mathematical Association of America. Ch. 7.
Selby, Henry 1970 "Continuities and perspectives in anthropology." In A. Fischer (ed.), New Directions in the Study of Anthropology. Bulletin of the American Anthropological Association.
Simon, Herbert 1981 Sciences of the Artificial. 2nd edition. Cambridge: MIT Press.
White, Harrison, Scott Boorman and Robert Breiger 1976 "Social structure from multiple networks IV: Blockmodels of roles and positions." American Journal of Sociology 81(4): 730-780.
Wolfe, Alvin and Norman Whitten 1974 "Network analysis." In J. Honigmann (ed.), Handbook of Social and Cultural Anthropology. New York: Rand McNally.
Zachary, Wayne 1975 The Cybernetics of Conflict in a Small Group: An Information Flow Model. Unpublished M.A. thesis, Department of Anthropology, Temple University.
Zachary, Wayne 1977 "An information flow model for conflict and fission in small groups." Journal of Anthropological Research 33(4): 452-473.
