Measuring association in link-node problems


Geoforum 13/73


L. P. CUMMINGS, B. J. MANLY, H. C. WEINAND, Boroko, T.P.N.G.*

Abstract: In this paper we have attempted to pose a methodological problem, and to provide answers to some fundamental measurement problems in research. In the search for the regularities which are thought to underlie the phenomena we study, geographers and others have often had recourse to Graph Theory to represent certain situations. Once some pattern of interaction can be depicted by using a group of points (nodes) joined by lines (links or edges), interest is often centered around the degree of correspondence between two graph-theoretic representations of empirical evidence. We have suggested methods for answering the question: are the link patterns for the two graphs so similar that they could not reasonably have arisen by chance? The approach to the solutions is based on probability theory and the properties of certain statistical distributions - the binomial, the hypergeometric, the hypergeometric/binomial and other combinations. In each case we have provided a rationale for the suggested solution as well as tables and computational procedures. The examples have been chosen from a wide variety of situations, for graph-theoretic representations are found in many fields of investigation.

* Geography Department, University of Papua New Guinea, Boroko, T.P.N.G., Papua New Guinea

Link-Node Problems

One of the central problems of geographic research, as in any scientific investigation, is the search for more efficient ordering of facts to discern the regularities which are hypothesized as underlying the phenomena we study.

In the past decade gains in the search for laws governing the spatial distribution of phenomena have been brought about largely by the simultaneous reevaluation of both the tools and the methodological basis for our research efforts. The problem of establishing the degree of correspondence between spatial patterns - points, lines, areas, intensities, flows, etc. - has been of major concern in Geography. Graphical, cartographical and linear statistics (e.g. correlation and regression analysis) have been used.

For some time a number of writers - among them BACHI [1], ROBINSON and BRYSON [15] and NEFT [13] - have felt the need for some method of measuring relationships that would be more suitable to the type of correspondence geographers seek. It has been pointed out that in spatial analysis some methods of varying refinement do exist for measuring the correspondence of point to point, point to area and area to area patterns. But virtually nothing has been developed or applied to measure line to line, and point to line patterns, although steps have been made in this direction (TOBLER [17]).

For example, a major problem in the quantitative consideration of networks is to find an appropriate mapping of elements of the mathematical-statistical model to elements of the real world. In seeking to reveal and investigate the underlying structure of networks, recourse is often made to establish analogies between the networks and those systems belonging to graph theory, in situations where one can use a group of points (nodes) joined by lines (links or edges), either directed or nondirected, to depict some pattern of interaction. HAGGETT and CHORLEY [9] have recently provided an adequate summary of contributions by this approach.

This paper suggests methods for testing whether two graphs are significantly related. The situation is considered where two graphs both have $n$ nodes representing the same $n$ objects (which might be cities, families, nations, etc.) but where the links between nodes are determined by different criteria for the two graphs. The question to be answered is then: are the link patterns for the two graphs so similar that they could not reasonably have arisen by chance? In other words, are the criteria for deciding the links on graph 1 ($G_1$) related to those used for deciding the links on graph 2 ($G_2$)? Situations that can arise are summarized in Table 1. We will now discuss each case in turn.

Table 1 - Types of Link-Node Problems (with suggested solutions)

| Nodes linked | 1. Non-Directed Edges | 2. Uni-Directed Edges | 3. Uni- and Bi-Directional Edges | 4. Bi-Directional Edges Only |
|---|---|---|---|---|
| A. Different | Hypergeometric | Hypergeometric/Binomial | Hypergeometric | Hypergeometric |
| B. Same | No problem | Binomial | Hypergeometric/Binomial | No problem |

Source: Authors

Case A1 - Different Nodes Linked with Non-Directed Edges

KNOX [10], in a study of the incidence of childhood leukemia ($n = 96$), was interested in analyzing not only the temporal variations but also what he called time-space interaction. He classed all $\binom{n}{2}$ pairs of cases as adjacent in space or not and as adjacent in time or not according to home addresses and "dates of onset" of the disease. If cases adjacent in time are linked in a time graph $G_1$ and cases adjacent in space are linked on a space graph $G_2$ then this is an example where:

1. a different number of links exist in each graph ($r_1$ for $G_1$, $r_2$ for $G_2$),
2. different nodes are linked in each graph, and
3. there is no direction on the links.

If the intersection of the two graphs is defined as the number of links common to both graphs, and $S$ is the total of these common links, then the question arises: is the value of $S$ so large (or small) that it could not reasonably be attributed to chance? $H_0$: the criteria for links are independent. On the basis of independent criteria the distribution of $S$ will be taken to be hypergeometric [11]:

$$P_H(S) = \frac{\binom{r_1}{S}\binom{M - r_1}{r_2 - S}}{\binom{M}{r_2}} \qquad (1)$$

where
$M = \binom{n}{2}$, the number of possible links between the $n$ nodes;
$r_1$ = the number of links in graph $G_1$;
$r_2$ = the number of links in graph $G_2$.

Consider the $M$ possible links on $G_1$ and $G_2$. In $G_1$, $r_1$ of these are actual links and $M - r_1$ are non-links. It is obviously feasible to label the $M$ possible links in such a way that $1, 2, \ldots, r_1$ are the links and $r_1 + 1, r_1 + 2, \ldots, M$ are the non-links in $G_1$. Now, if $G_1$ and $G_2$ are independent it follows that this labelling will be independent of the status of the $M$ possible links in $G_2$: a non-link in $G_2$ is just as likely as a link in $G_2$ to be one of the $1, 2, \ldots, r_1$ links in $G_1$. The hypergeometric distribution for $S$, the number of links common to $G_1$ and $G_2$, immediately arises if $G_2$ is considered equally likely to be any one of the $\binom{M}{r_2}$ possible $n$ node graphs with $r_2$ links, since the $r_2$ links of $G_2$ can then be regarded as a random sample from the $(1, 2, \ldots, r_1)$ links and $(r_1 + 1, r_1 + 2, \ldots, M)$ non-links in $G_1$. The mean and standard deviation of $S$ are

$$\mu_H = \frac{r_1 r_2}{M} \qquad (2)$$

and

$$\sigma_H = \sqrt{\frac{r_1 r_2 (M - r_1)(M - r_2)}{M^2 (M - 1)}} \qquad (3)$$

Table 2 - Cases of Childhood Leukemia

| | Adjacent in Space | Not Adjacent in Space | Totals |
|---|---|---|---|
| Adjacent in Time | 5 | 147 | 152 |
| Not Adjacent in Time | 20 | 4388 | 4408 |
| Totals | 25 | 4535 | 4560 |

Source: KNOX [10]

KNOX's data, Table 2, show that $S = 5$, $M = 4560$, $r_1$ (time) = 152 and $r_2$ (space) = 25. Hence $\mu_H = 0.8333$. Since $r_1$ and $r_2$ are small relative to $M$, which is very large, the distribution of $S$ should be well approximated by the Poisson with mean 0.8333 under the null hypothesis (see Appendix B). Calculations show that the probability of getting $S \geq 5$ is approximately 0.002 for this distribution. The links on the two graphs are therefore significantly related, and $H_0$ is rejected.

BARTON and DAVID [2], in considering KNOX's data, analysed the distribution of $S$ obtained from the intersection of two graphs $G_1$ and $G_2$ when the $n!$ possible allocations of the $n$ nodes in $G_2$ are equally likely. The mathematics of their situation became very complicated because if the nodes on a graph are relabeled the graph still consists of the same basic pattern of links. Whilst we do not regard BARTON and DAVID's approach as in any way invalid, we can see no reason why, in general, only fixed pattern graphs should be considered in evaluating the distribution of $S$.

It might be mentioned that KNOX's study is a special case in that $G_1$ is a time graph. There are restrictions on the link patterns that can be obtained using a one-dimensional link criterion such as adjacency in time. The argument for the hypergeometric presented above holds, but the time graph must be $G_1$.
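The Case A1 calculation for KNOX's data can be checked numerically. The following is a minimal sketch, not the authors' own procedure: it evaluates the hypergeometric probability (1), the mean and standard deviation (2) and (3), and both the exact and the Poisson upper tail for $S \geq 5$. The function and variable names are illustrative assumptions, and only the Python standard library is used.

```python
from math import comb, exp, factorial, sqrt


def hypergeom_pmf(s, M, r1, r2):
    """P_H(S = s) from equation (1): s links common to G1 and G2."""
    if s < 0 or s > min(r1, r2):
        return 0.0
    return comb(r1, s) * comb(M - r1, r2 - s) / comb(M, r2)


def hypergeom_mean_sd(M, r1, r2):
    """Mean and standard deviation of S, equations (2) and (3)."""
    mean = r1 * r2 / M
    var = r1 * r2 * (M - r1) * (M - r2) / (M**2 * (M - 1))
    return mean, sqrt(var)


# Values quoted in the text for KNOX's leukemia data.
n = 96                      # number of cases (nodes)
M = comb(n, 2)              # number of possible links, 4560
r1, r2, S_obs = 152, 25, 5  # time links, space links, common links

mean, sd = hypergeom_mean_sd(M, r1, r2)

# Exact upper tail P(S >= 5) under the hypergeometric null distribution.
exact_tail = sum(hypergeom_pmf(s, M, r1, r2)
                 for s in range(S_obs, min(r1, r2) + 1))

# Poisson approximation with the same mean, as used in the text.
poisson_tail = 1.0 - sum(exp(-mean) * mean**k / factorial(k)
                         for k in range(S_obs))

print(f"M = {M}, mean = {mean:.4f}, sd = {sd:.4f}")
print(f"exact P(S >= {S_obs})   = {exact_tail:.5f}")
print(f"Poisson P(S >= {S_obs}) = {poisson_tail:.5f}")
```

Both tail probabilities are of the order of 0.001 to 0.002, which agrees with the approximate value of 0.002 quoted above and leads to the same rejection of $H_0$.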

Case B2 - Same Nodes Linked, Uni-Directed Edges

A study by CUMMINGS [8] explored the possible relationships between the structure of a transportation network and the flow of people in that configuration. The structure and flow in thirteen "local" airlines in the U.S.A. were analyzed, and several measures of structure and flow illustrated. One of the methods for measuring movement was "standardized flow", a full discussion of which can be found in COLEMAN [7]. Briefly the method attempts to factor out the distortion due to differing population sizes when examining flows of people between cities 2). Centrality indexes 3), which are measures developed to examine the influence of group (spatial) structure on information flow, have been discussed by a number of writers [3], [4], [12]. One of the "local" airline systems (Ozark Airlines, 1967) is the example cited in this paper. On the basis of the flow and centrality measures two directed graphs (digraphs) were constructed (Figs. 1 and 2). Their intersection is shown in Fig. 3.

Fig. 1 - Digraph based on Standardized Flow, $G_1$. Source: CUMMINGS [8]

Fig. 2 - Digraph based on the Centrality Index, $G_2$. Source: CUMMINGS [8]

Fig. 3 - The Intersection $G_1 \cap G_2$. Source: CUMMINGS [8]

In this case the intersection of the two graphs can be defined as the links with the same direction, and $S$ as the total number of these. The question here is: could $S$ reasonably have arisen by chance? $H_0$: the criteria for assigning directions are independent. Since both graphs have the same nodes linked, the probability of $S$ agreements is given by the binomial distribution

$$P_B(S) = \binom{r}{S}\left(\tfrac{1}{2}\right)^{r} \qquad (4)$$

where $r$ is the total number of links, which is the same for both graphs. This is because there is a probability of $\tfrac{1}{2}$ that any two links will agree by chance alone. The binomial mean and standard deviation are, of course,

$$\mu_B = \tfrac{1}{2} r \qquad (5)$$

and

$$\sigma_B = \tfrac{1}{2}\sqrt{r} \qquad (6)$$

For the example cited $S = 45$ and $r = 68$ so that the binomial mean and standard deviation are 34.00 and 4.12. The binomial distribution is well approximated by areas under the normal curve under these conditions. The standardized deviate is

$$z = \frac{S - \mu_B}{\sigma_B} \qquad (7)$$

This is significant beyond the 1 % level, and $H_0$ is rejected. Thus there is evidence of a positive agreement of the directions on $G_1$ and $G_2$.

2) Although these measures exist in the published literature, and are used here as the bases for assigning directions to links, the authors are not happy with the details of their original computational procedures. A reexamination of the various measures of structure and flow is currently being made.

3) Centrality here is not to be confused with the use of the same word in CHRISTALLER [6] and THOMAS [16].

Case A2 - Different Nodes Linked with Uni-Directed Edges

POPHAM and MANLY [14] give a list of coefficients of association for the distribution of Dermaptera (earwigs) in the eight regions of the world: (a) Europe and Asia, (b) Africa, (c) Madagascar, the Seychelles and the Comoro Islands, (d) the Oriental region, (e) Australia, excluding New Zealand, (f) New Zealand, (g) South America, and (h) North America. Their study considered the relationship between the present distribution of earwigs and the theory of Continental Drift. The data that they present do, however, allow a different comparison to be made. Using their Table 1 it is possible to compare the present distributions of the Labioidea and Forficuloidea groups of genera.

Considering the Labioidea group first of all, it is possible to construct a graph ($G_1$) with eight nodes, corresponding to the eight regions of the world mentioned above. Two nodes are linked if the coefficient of correlation between the two areas is outside the range $\pm 0.25$, corresponding, rather roughly, to a 10 % level of significance. A direction can be assigned arbitrarily to each link.

The Forficuloidea group can now be considered. Again a graph ($G_2$) can be constructed in the same manner as that for the Labioidea, except that two nodes are linked if the coefficient of correlation between two areas is outside the range $\pm 0.20$, which again corresponds, rather roughly, to a 10 % level of significance. Directions must be assigned to the links on $G_2$. The link is in the same direction as that on $G_1$ if the same link exists on $G_1$ and the signs of the coefficients of association are the same for both graphs, and in the opposite direction to that on $G_1$ if the same link exists on $G_1$ but the signs of the coefficients of association are opposite for the two graphs. No genera in the Forficuloidea group are present in New Zealand. This area is therefore taken to have zero correlation with the other land areas.

Clearly, if the distributions of the Labioidea and Forficuloidea throughout the world are not related then $G_1$ and $G_2$ will be independent. We can examine this possibility by considering the distribution of $S$ (the number of links completely agreeing for $G_1$ and $G_2$) under the null hypothesis: there is no association between the links and directions on $G_1$ and $G_2$. The probability of $S$ agreements is given by a combination of the hypergeometric and binomial distributions:

$$P_{HB}(S) = \sum_{i=0}^{r_2 - S} P_H(S + i)\binom{S + i}{S}\left(\tfrac{1}{2}\right)^{S + i} \qquad (8)$$

where $P_H(S + i)$ is defined by (1). Equation (8) is derived as follows. One way of obtaining $S$ complete agreements would be for $S$ links to be common for the two graphs, these all being in the same direction. The probability of this is the product of hypergeometric and binomial probabilities:

$$P_H(S)\binom{S}{S}\left(\tfrac{1}{2}\right)^{S}$$

Another way to obtain this result is for there to be $(S + 1)$ links in common and $S$ of these in the right direction. The probability of this is

$$P_H(S + 1)\binom{S + 1}{S}\left(\tfrac{1}{2}\right)^{S + 1}$$

Obviously, the possibilities can be enumerated in this way until the maximum possible number of common links is reached. This maximum is the smaller of $r_1$ and $r_2$. Since the labelling of graphs is arbitrary this can be taken as $r_2$. The full probability of $S$ complete agreements is then given by the sum (8). It is shown (Appendix A) that the mean and standard deviation are related to the mean and standard deviation for case A1:

$$\mu_{HB} = \tfrac{1}{2}\mu_H \qquad (9)$$

$$\sigma_{HB} = \tfrac{1}{2}\sqrt{\mu_H + \sigma_H^2} \qquad (10)$$

The distribution implied by (8) has not apparently been tabulated and, at this stage, the probabilities must be computed using tables of the individual terms of the hypergeometric and binomial distributions.

Returning to the example of the distribution of earwigs: for the two graphs $n = 8$ and hence $M = \binom{8}{2} = 28$. Also, for $G_1$ (Labioidea), $r_1 = 8$; for $G_2$ (Forficuloidea), $r_2 = 12$; and $S = 7$. Using (9) and (10) these values lead to $\mu_{HB} = 1.72$ and $\sigma_{HB} = 0.89$, clearly showing that the observed value of $S = 7$ cannot reasonably be attributed to chance. Although the significance is quite clear in this example it may be useful to use the results to illustrate the computations required for an exact test of significance (Table 3).

Case A3 - Different Nodes Linked with Uni- and Bi-Directed Edges.

BERRY [5] has published the results of an experiment designed to test some hypotheses on stereotypes in Australia. One of these was that there is a positive relation between familiarity and the uniformity of stereotypes as he defines them. A graph ($G_1$) can be constructed based on BERRY's "Rank Familiarity Index" (his Table 2) where two states are linked in one direction if the Rank Familiarity Index is six or less (Fig. 4a). A second graph ($G_2$) can be constructed based on his measurement of uniformity (his Table 3) where two states are linked in a particular direction if the measure is greater than six (Fig. 4b). The intersection of these two graphs is given in Fig. 4c 4). The distribution of $S$, the number of agreements, is again given by the hypergeometric,

$$P_{Hd}(S) = \frac{\binom{r_1}{S}\binom{2M - r_1}{r_2 - S}}{\binom{2M}{r_2}} \qquad (11)$$

4) We have not included the self links in our analysis since it does not seem realistic to assume that the probability of a self link is the same as the probability of a link between different nodes.

Table 3 - Calculations for an Exact Hypergeometric/Binomial Test of Significance

The columns $S = 0, \ldots, 8$ give the probability that $S$ of the $I$ common links will agree on direction**.

| I | P_H(I)* | S=0 | S=1 | S=2 | S=3 | S=4 | S=5 | S=6 | S=7 | S=8 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.004 | 1.000 | | | | | | | | |
| 1 | 0.044 | 0.500 | 0.500 | | | | | | | |
| 2 | 0.170 | 0.250 | 0.500 | 0.250 | | | | | | |
| 3 | 0.309 | 0.125 | 0.375 | 0.375 | 0.125 | | | | | |
| 4 | 0.290 | 0.063 | 0.250 | 0.375 | 0.250 | 0.063 | | | | |
| 5 | 0.143 | 0.031 | 0.156 | 0.313 | 0.313 | 0.156 | 0.031 | | | |
| 6 | 0.036 | 0.016 | 0.094 | 0.234 | 0.313 | 0.234 | 0.094 | 0.016 | | |
| 7 | 0.004 | 0.008 | 0.055 | 0.164 | 0.273 | 0.273 | 0.164 | 0.055 | 0.008 | |
| 8 | 0.000 | 0.004 | 0.031 | 0.109 | 0.219 | 0.273 | 0.219 | 0.109 | 0.031 | 0.004 |
| P_HB(S) | | 0.130 | 0.321 | 0.321 | 0.168 | 0.050 | 0.008 | 0.001 | 0.000 | 0.000 |

* The hypergeometric probability $P_H(I)$ from equation (1).

** The binomial probability $\binom{I}{S}\left(\tfrac{1}{2}\right)^{I}$.

The $P_{HB}(S)$ are found by multiplying the binomial and hypergeometric probabilities together and adding, e.g.:

$P_{HB}(2) = 0.170 \times 0.250 + 0.309 \times 0.375 + \ldots + 0.000 \times 0.109 = 0.321$.

Source: Authors
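The exact test summarized in Table 3 can be reproduced with a few lines of code. The sketch below is illustrative only: the helper names are assumptions, and it simply evaluates the hypergeometric/binomial sum for the earwig example ($M = 28$, $r_1 = 8$, $r_2 = 12$, $S = 7$) instead of reading tabulated terms. The moments are taken directly from the computed distribution. Evaluated this way the mean comes out near 1.71 and the standard deviation near 1.10, which differs from the 0.89 quoted in the text; the observed $S = 7$ lies far in the upper tail under either figure.

```python
from math import comb, sqrt


def hypergeom_pmf(i, M, r1, r2):
    """P_H(I = i): probability of i common links, equation (1)."""
    return comb(r1, i) * comb(M - r1, r2 - i) / comb(M, r2)


def hb_pmf(s, M, r1, r2):
    """P_HB(S = s): sum over i >= s common links of
    P_H(i) times the binomial probability C(i, s) / 2**i."""
    top = min(r1, r2)  # largest possible number of common links
    return sum(hypergeom_pmf(i, M, r1, r2) * comb(i, s) / 2**i
               for i in range(s, top + 1))


# Earwig example: 8 regions, so M = C(8, 2) = 28 possible links.
M, r1, r2, S_obs = comb(8, 2), 8, 12, 7
top = min(r1, r2)

pmf = [hb_pmf(s, M, r1, r2) for s in range(top + 1)]
mean = sum(s * p for s, p in enumerate(pmf))
sd = sqrt(sum(s * s * p for s, p in enumerate(pmf)) - mean**2)
tail = sum(pmf[S_obs:])  # exact P(S >= 7) under the null hypothesis

for s, p in enumerate(pmf):
    print(f"P_HB({s}) = {p:.3f}")
print(f"mean = {mean:.3f}, sd = {sd:.3f}, P(S >= {S_obs}) = {tail:.6f}")
```

Computing the distribution directly avoids the rounding carried by three-decimal table entries, so individual values may differ from Table 3 in the last printed digit.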

In (11), $2M = 2\binom{n}{2}$ is the number of possible edges on an $n$ node bidirected graph, $r_1$ = the number of directed edges on $G_1$, and $r_2$ = the number of directed edges on $G_2$.

This is exactly the same as case A1 except that there is now the possibility of two links between any two nodes. Here

$$\mu_{Hd} = \frac{r_1 r_2}{2M} \qquad (12)$$

$$\sigma_{Hd} = \sqrt{\frac{r_1 r_2 (2M - r_1)(2M - r_2)}{(2M)^2 (2M - 1)}} \qquad (13)$$

From BERRY's data (Fig. 4) $S = 4$, $r_1 = 8$, $r_2 = 13$, $2M = 30$, so that $\mu_{Hd} = 3.47$, and $\sigma_{Hd} = 1.22$. The value of $S$ is clearly not significantly different from that which is expected. In cases where the result is not so obvious reference could be made to tables of the hypergeometric distribution [11].

Fig. 4 - Digraphs of Stereotypes in Australia. Based on BERRY [5]
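For Case A3 the same hypergeometric machinery applies with $2M$ possible directed edges in place of $M$. The sketch below recomputes the mean and standard deviation for BERRY's data and adds an exact upper-tail probability. It is an illustration under stated assumptions: $n = 6$ is inferred from $2M = 30$ and is not given explicitly in the text, and the names `two_M` and `p_hd` are hypothetical.

```python
from math import comb, sqrt

n = 6                       # assumed number of states, inferred from 2M = 30
two_M = 2 * comb(n, 2)      # possible directed edges, self links excluded
r1, r2, S_obs = 8, 13, 4    # values quoted in the text for BERRY's data


def p_hd(s):
    """Hypergeometric probability of s agreements, equation (11)."""
    return comb(r1, s) * comb(two_M - r1, r2 - s) / comb(two_M, r2)


# Mean and standard deviation, equations (12) and (13).
mean = r1 * r2 / two_M
sd = sqrt(r1 * r2 * (two_M - r1) * (two_M - r2)
          / (two_M**2 * (two_M - 1)))

# Exact probability of at least S_obs agreements under the null hypothesis.
upper_tail = sum(p_hd(s) for s in range(S_obs, min(r1, r2) + 1))

print(f"2M = {two_M}, mean = {mean:.2f}, sd = {sd:.2f}")
print(f"P(S >= {S_obs}) = {upper_tail:.3f}")
```

An upper-tail probability close to one half is what one expects when the observed $S$ sits within one standard deviation of the mean, in line with the conclusion that the value is not significant.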

Case B3 - Same Nodes Linked with Bi-Directed or Uni-Directed Edges.

In this case $G_1$ and $G_2$ will have the same $F$ node links. Let there be a total of $r_1$ links on $G_1$ and $r_2$ links on $G_2$, so that $r_1 \geq F$ and $r_2 \geq F$. It then follows that there will be $r_1 - F$ double links on $G_1$ and $F - (r_1 - F) = 2F - r_1 = U_1$ (say) single links on $G_1$. Similarly, there will be $2F - r_2 = U_2$ (say) single links on $G_2$. The situation is illustrated by Table 4, where $t$ is the number of node pairs with a single link on both $G_1$ and $G_2$.

Table 4 - Link Connections - Case B3

| | $G_2$ Single Linked | $G_2$ Double Linked | Total |
|---|---|---|---|
| $G_1$ Single Linked | $t$ | $U_1 - t$ | $U_1$ |
| $G_1$ Double Linked | $U_2 - t$ | $F - U_1 - U_2 + t$ | $F - U_1$ |
| Total | $U_2$ | $F - U_2$ | $F$ |

Now, the nodes that have a single link on one graph but a double link on the other will obviously contribute exactly $(U_1 - t) + (U_2 - t)$ agreements to the intersection of $G_1$ and $G_2$. Also, the nodes that are double linked on both graphs will contribute $2(F - U_1 - U_2 + t)$ agreements. Hence the total number of agreements contributed by single/double or double/double links is $2F - U_1 - U_2$, which is independent of $t$ and is a fixed quantity no matter what the forms of $G_1$ and $G_2$ are. We will therefore define the intersection of $G_1$ and $G_2$ as being the links that are single on both graphs and agreeing in direction, i.e. those of the $t$ nodes connected by a single link on both graphs where the directions agree.

The situation is really very similar to that for Case A2. If $G_1$ and $G_2$ are independent then the single links on $G_2$ should look like a random sample of $U_2$ from the links on $G_1$, and the probability of obtaining $t$ links that are single on both graphs is hypergeometric:

$$P_{H^*}(t) = \frac{\binom{U_1}{t}\binom{F - U_1}{U_2 - t}}{\binom{F}{U_2}} \qquad (14)$$

Following exactly the arguments used for Case A2 leads to the hypergeometric/binomial expression:

$$P_{HB^*}(S) = \sum_{i=0}^{U_2 - S} P_{H^*}(S + i)\binom{S + i}{S}\left(\tfrac{1}{2}\right)^{S + i} \qquad (15)$$

for the probability that the links that are single on both $G_1$ and $G_2$ will give $S$ agreements. The mean and standard deviation of this distribution are then

$$\mu_{HB^*} = \tfrac{1}{2}\mu_{H^*} \qquad (16)$$

and

$$\sigma_{HB^*} = \tfrac{1}{2}\sqrt{\mu_{H^*} + \sigma_{H^*}^2} \qquad (17)$$

where

$$\mu_{H^*} = \frac{U_1 U_2}{F} \qquad (18)$$

and

$$\sigma_{H^*} = \sqrt{\frac{U_1 U_2 (F - U_1)(F - U_2)}{F^2 (F - 1)}} \qquad (19)$$

are the mean and standard deviation of the hypergeometric distribution (14). (See Appendix A.)

Case A4 - Different Nodes Linked with Bi-Directed Edges.

Given a situation where all links are bidirected this clearly reduces to case A1 since there can be no variation in the direction of links.

Cases B1 and B4 - Same Nodes Linked with Non- or Bi-Directed Edges.

Here $G_1$ and $G_2$ must be the same graph and therefore no problem exists.
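No worked example is given for Case B3, so the following sketch uses hypothetical values ($F = 20$, $r_1 = 28$, $r_2 = 25$) purely to show how the quantities $U_1$, $U_2$ and the distribution of $S$ for links that are single on both graphs might be computed. The function names are assumptions. The script also checks numerically that the moments of the computed distribution satisfy the half-mean relation and the variance relation $\tfrac{1}{4}(\mu + \sigma^2)$ derived in Appendix A.

```python
from math import comb, sqrt


def single_links(F, r):
    """U = 2F - r: number of linked node pairs carrying a single link
    when r directed links are spread over F linked pairs."""
    return 2 * F - r


def hstar_pmf(t, F, U1, U2):
    """Hypergeometric probability that t pairs are single linked on both graphs."""
    return comb(U1, t) * comb(F - U1, U2 - t) / comb(F, U2)


def hbstar_pmf(s, F, U1, U2):
    """Probability that s of the doubly-single links agree in direction."""
    top = min(U1, U2)
    return sum(hstar_pmf(t, F, U1, U2) * comb(t, s) / 2**t
               for t in range(s, top + 1))


# Hypothetical inputs, chosen only for illustration.
F, r1, r2 = 20, 28, 25
U1, U2 = single_links(F, r1), single_links(F, r2)  # 12 and 15

pmf = [hbstar_pmf(s, F, U1, U2) for s in range(min(U1, U2) + 1)]
mean = sum(s * p for s, p in enumerate(pmf))
var = sum(s * s * p for s, p in enumerate(pmf)) - mean**2

# Hypergeometric moments for the number of pairs single linked on both graphs.
mu_h = U1 * U2 / F
var_h = U1 * U2 * (F - U1) * (F - U2) / (F**2 * (F - 1))

print(f"U1 = {U1}, U2 = {U2}")
print(f"mean = {mean:.4f} (half of mu_h = {mu_h / 2:.4f})")
print(f"sd   = {sqrt(var):.4f} (formula: {0.5 * sqrt(mu_h + var_h):.4f})")
```

With real data, the observed number of agreeing single links would be compared against the upper tail of `pmf` in the same way as for Case A2.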

Summary and Conclusions

In this paper we have attempted to pose a philosophical problem, and to provide answers to some fundamental measurement problems in research. In the search for the regularities which are thought to underlie the phenomena we study, geographers and others have often had recourse to Graph Theory to represent certain situations. Once some pattern of interaction can be depicted by using a group of points (nodes) joined by lines (links or edges), interest is often centered around the degree of correspondence between two graph-theoretic representations of empirical evidence. We have suggested methods for answering the question: are the link patterns for the two graphs so similar that they could not reasonably have arisen by chance?

We have examined in turn each of the situations which could possibly arise in link-node problems (Table 1), where different nodes of two graphs are linked, where the same nodes are linked in both graphs, and where there is a combination of directed and/or non-directed edges. The approach to the solutions is based on probability theory and the properties of certain statistical distributions - the binomial, the hypergeometric, the hypergeometric/binomial and other combinations. In each case we have provided a rationale for the suggested solution as well as tables and computational procedures.

The examples have been chosen from a wide variety of situations, for graph-theoretic representations are found in many fields of investigation - sociograms (Sociology and Psychology), trees (Communication Theory). We hasten to add that these examples are illustrative only, and that most of them, when originally reported, were not analyzed in the manner suggested. We hope to have advanced the search for methods of measuring association and relationships that would be more suitable to the type of correspondence geographers seek.

Appendix A

Derivation of the Mean and Standard Deviation for the Hypergeometric/Binomial Distribution

Equations (8) and (15) are both examples of a general type of distribution based upon the hypergeometric and binomial distributions. A general form of the distribution could be written as

$$P(S) = \sum_{i=0}^{n - S} P_G(S + i)\binom{S + i}{S}\left(\tfrac{1}{2}\right)^{S + i} \qquad (20)$$

where

$$P_G(j) = \frac{\binom{U}{j}\binom{N - U}{n - j}}{\binom{N}{n}} \qquad (21)$$

is the hypergeometric probability of obtaining $j$ black balls in a random sample of $n$ taken from a population of $U$ black balls and $N - U$ white balls. The mean and variance of the hypergeometric distribution defined by (21) are, of course,

$$\mu_G = \frac{nU}{N} \qquad (22)$$

$$\sigma_G^2 = \frac{nU(N - U)(N - n)}{N^2 (N - 1)} \qquad (23)$$

Now, the moment generating function of the hypergeometric/binomial (H/B) distribution defined by (20) is obtained as follows:

$$M_S(\theta) = E\left(e^{\theta S}\right) = \sum_{S=0}^{n}\sum_{j=S}^{n} \binom{j}{S}\left(\tfrac{1}{2}e^{\theta}\right)^{S}\left(\tfrac{1}{2}\right)^{j - S} P_G(j) = \sum_{j=0}^{n}\sum_{S=0}^{j} \binom{j}{S}\left(\tfrac{1}{2}e^{\theta}\right)^{S}\left(\tfrac{1}{2}\right)^{j - S} P_G(j) \qquad (24)$$

This last change of limits can be understood by consideration of Table 3. Since the inner sum is a binomial expansion, it follows that

$$M_S(\theta) = \sum_{j=0}^{n}\left(\tfrac{1}{2} + \tfrac{1}{2}e^{\theta}\right)^{j} P_G(j) \qquad (25)$$

Differentiating (25) with respect to $\theta$ then yields the relationship between the (known) moments of the hypergeometric distribution and the moments of the H/B distribution:

$$E(S) = \left.\frac{dM_S}{d\theta}\right|_{\theta = 0} = \tfrac{1}{2}\sum_{j=0}^{n} j\,P_G(j) = \tfrac{1}{2}\mu_G \qquad (26)$$

and

$$E(S^2) = \left.\frac{d^2 M_S}{d\theta^2}\right|_{\theta = 0} = \tfrac{1}{4}\sum_{j=0}^{n}\left(j^2 + j\right)P_G(j) \qquad (27)$$

Equation (27) leads to the result

$$\sigma_S^2 = \tfrac{1}{4}\left(\mu_G + \sigma_G^2\right) \qquad (28)$$

The means and standard deviations given by equations (9), (10), (16) and (17) are obtained from (26) and (28) by substituting appropriate symbols for $U$, $N$ and $n$.

Appendix B

Limiting Forms of the Distributions

The normal distribution as an approximation to the binomial is so well known that no comment is needed on this. The hypergeometric distribution can be approximated under various conditions by the binomial, Poisson and normal distributions; see the introduction to the hypergeometric tables of LIEBERMAN and OWEN [11]. The approximations to the hypergeometric/binomial distributions require more comment. They are obtained as follows. Note that if $M_j(\phi)$ is the moment generating function of the hypergeometric distribution defined by (21) then, from the result (25), we find

$$M_S(\theta) = M_j(\phi) \qquad (29)$$

where $e^{\phi} = \tfrac{1}{2} + \tfrac{1}{2}e^{\theta}$.

Now suppose that this hypergeometric distribution is well approximated by the binomial distribution with $p = U/N$. In this case it can be shown that

$$M_j(\phi) = \left\{(1 - p) + p\,e^{\phi}\right\}^{n} \qquad (30)$$

i.e. the moment generating function of the binomial distribution. But (29) therefore implies that

$$M_S(\theta) = \left\{(1 - p) + p\left(\tfrac{1}{2} + \tfrac{1}{2}e^{\theta}\right)\right\}^{n} \qquad (31)$$

$$= \left\{1 - \tfrac{1}{2}p + \tfrac{1}{2}p\,e^{\theta}\right\}^{n} \qquad (32)$$

which is the moment generating function of the binomial distribution with probabilities

$$p(S) = \binom{n}{S}\left(\frac{U}{2N}\right)^{S}\left(1 - \frac{U}{2N}\right)^{n - S} \qquad (33)$$

It follows, therefore, that when the hypergeometric part of (20) can be approximated by a binomial distribution so also can the whole expression.

Alternatively, if it is assumed that the distribution (21) is well approximated by the Poisson with mean $\mu_G$, then

$$M_j(\phi) = e^{\mu_G\left(e^{\phi} - 1\right)} \qquad (34)$$

Hence, again using (29),

$$M_S(\theta) = e^{\mu_G\left(\frac{1}{2} + \frac{1}{2}e^{\theta} - 1\right)} = e^{\frac{1}{2}\mu_G\left(e^{\theta} - 1\right)}$$

which is the moment generating function of the Poisson distribution with mean $\tfrac{1}{2}\mu_G$. It follows that whenever the hypergeometric part of (20) is well approximated by the Poisson distribution, so is the whole expression.

References

[1] BACHI, R. (1963): Standard Distance Measures and Related Methods for Spatial Analysis; Pap. Proc. Reg. Sci. Ass., 10, 83-132.

[2] BARTON, D. E. and F. N. DAVID (1966): The Random Intersection of Two Graphs. In: Research Papers in Statistics, F. N. DAVID (ed.), New York: Wiley.

[3] BAVELAS, A. (1950): Communication Patterns in Task-Oriented Groups; J. acoust. Soc. Am., 22, 725-730.

[4] BEAUCHAMP, M. A. (1966): An Improved Index of Centrality; Behavl Sci., 59, 161-163.

[5] BERRY, J. W. (1970): A Functional Approach to the Relationship between Stereotypes and Familiarity; Aust. J. Psychol., 22, 29-33.

[6] CHRISTALLER, W. (1933): Die zentralen Orte in Süddeutschland etc., Jena, translated by C. W. BASKIN as Central Places in Southern Germany. New Jersey: Prentice Hall.

[7] COLEMAN, J. S. (1965): Introduction to Mathematical Sociology. London.

[8] CUMMINGS, L. P. (1967): The Structure of Networks and Network Flows. Unpublished doctoral dissertation. Ames, Iowa State Univ. Press.

[9] HAGGETT, P. and R. J. CHORLEY (1970): Network Analysis in Geography. London: Arnold.

[10] KNOX, G. (1964): Epidemiology of Childhood Leukaemia in Northumberland and Durham; Br. J. prev. soc. Med., 18, 17-24.

[11] LIEBERMAN, G. J. and D. B. OWEN (1961): Tables of the Hypergeometric Probability Distribution. Stanford, Calif.: Stanford Univ. Press.

[12] MACKENZIE, K. D. (1966): Structural Centrality in Communication Networks; Psychometrika, 31, 17-25.

[13] NEFT, D. (1966): Statistical Analysis for Areal Distributions. Monograph No. 3. Philadelphia: Regional Science Research Institute.

[14] POPHAM, E. J. and B. F. J. MANLY (1969): Geographical Distribution of the Dermaptera and the Continental Drift Hypothesis; Nature, 222, June 7, 1969, 981-982.

[15] ROBINSON, A. H. and R. A. BRYSON (1957): A Method for Describing Quantitatively the Correspondence of Geographic Distributions; Ann. Ass. Am. Geogr., 47, 379-391.

[16] THOMAS, E. N. (1961): Towards an Expanded Central Place Model; Geogrl Rev., 51, 400-411.

[17] TOBLER, W. R. (1965): Computation of the Correspondence of Geographical Patterns; Pap. Proc. Reg. Sci. Ass., 14, 131-139.