Controlling for size in centrality scores


Social Networks 20 (1998) 135–141

Phillip Bonacich a,*, Amalya Oliver b, Tom A.B. Snijders c

a Department of Sociology, University of California, Los Angeles, CA, USA
b Department of Sociology, Hebrew University, Jerusalem, Israel
c Department of Statistics and Measurement Theory, University of Groningen, Groningen, Netherlands

Abstract

All measures of centrality in graphs seem to be correlated with degree, the sheer number of connections of a position. There are occasions in which one wants a measure that is not necessarily related to degree but whose relationship to degree is an empirical finding. Existing corrections, which force a lack of correlation, or which have no statistical justification, are inadequate for this purpose. Based on an algorithm developed by Snijders (1991) [Snijders, T.A.B., 1991. Enumeration and simulation methods for 0-1 matrices with given marginals. Psychometrika 56, 397–417.] for generating random graphs with fixed marginals, we suggest a measure of centrality that is logically but not necessarily empirically independent of degree. We examine the measure using data from Davis et al. (1941) [Davis, A., Gardner, B., Gardner, M.R., 1941. Deep South. Univ. of Chicago Press, Chicago.] and Oliver (1993) [Oliver, A., 1993. New Biotechnology Firms: A Multilevel Analysis of Interorganizational Relations in an Emerging Industry. PhD dissertation, Univ. of California, Los Angeles.]. © 1998 Elsevier Science B.V.

* Corresponding author.

0378-8733/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved. PII S0378-8733(97)00009-9

1. The problem

In Linton Freeman’s seminal paper on centrality (Freeman, 1979), he distinguishes degree centrality, the sheer number of connections of each position in a network, from more subtle types: closeness and betweenness centrality. The other two types of centrality are clearly different conceptually from degree centrality. However, measures of closeness and betweenness centralities tend to be highly correlated with degree centrality.

Consider the following example. Suppose Fig. 1 is the diagram for the communications network among a set of positions in an organization. Positions 3 and 5 communicate with the most other positions in the network. Position 3 is the highest in betweenness centrality in this network; it is on the geodesic (a shortest path) between nine pairs of other positions, which is greater than for any other position. Degree and betweenness centrality are not perfectly related. For example, position 5 has the same number of connections as position 3 (three) but its betweenness centrality is less. Nevertheless, betweenness (or closeness) centrality and degree are highly correlated; r = 0.995 for betweenness centrality and degree (Fig. 1).

Fig. 1. A graph in which centrality is highly related to degree.

Of course, it is possible to make up examples in which the correlation is less. However, in most networks degree centrality is highly correlated with closeness and betweenness centrality.

There are two attitudes one can take about this correlation. The first is that the correlation is not undesirable. All that is necessary is that the two forms of centrality be conceptually different. They need not be orthogonal. For example, individuals who seek to be central in an organization may, in pursuing this goal, find it necessary to have many connections. It may well be that high degree centrality is (almost) a necessary condition for high betweenness or closeness centralities, but this does not invalidate the usefulness of the distinction between degree and betweenness centrality, any more than the fact that height and weight are correlated makes them redundant qualities.

However, from a second perspective, the correlation is a problem. If high degree centrality almost assures high betweenness and closeness centralities, then the meaning of the measures is unclear. Does a high score on closeness or betweenness centrality merely reflect differences in degree, or is more going on?

Those who study interlocking directorate structures among corporations are faced with a related problem. The eigenvector-based measure of centrality is the most commonly used among those who study interlocking directorates (Bonacich, 1972; Mintz and Schwartz, 1985). Its advantage over the graph-theoretic measures is that it does not require a binary adjacency matrix; it can consider the degree of overlap between firms. Its formula is given in Eq. (1).

$$\lambda x = A'A\,x \qquad (1)$$

The rectangular matrix $A$ gives the board memberships of individuals; $a_{ij} = 1$ if person $i$ is a member of board $j$, $a_{ij} = 0$ otherwise. The square matrix $A'A$, the product of $A'$ and $A$, gives the overlap between all pairs of boards. The main diagonal of $A'A$ shows the sizes of the boards. The centrality scores are given by the eigenvector $x$; $\lambda$ is an eigenvalue of the matrix $A'A$. The eigenvector centrality of a firm is proportional to the sum of its overlap with other firms, each overlap weighted by the other firm’s centrality.

The uncorrected eigenvector measure of centrality is always highly correlated with the size of the board. In assessing firms, this is clearly a problem. The sheer size of a board has nothing to do with the importance of a firm. The size may simply be set by the firm’s rules. Board size is an anomaly that should be corrected for.
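To make Eq. (1) concrete, the following is a minimal sketch, not the authors' program. It builds the overlap matrix A'A from a small membership matrix and finds the eigenvector for the largest eigenvalue by power iteration. The toy matrix `A` and the function names `overlap_matrix` and `leading_eigenpair` are hypothetical and chosen only for illustration.

```python
from math import sqrt

def overlap_matrix(a):
    """Return A'A for a persons-by-boards 0/1 matrix given as a list of rows."""
    n_groups = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n_groups)]
            for i in range(n_groups)]

def leading_eigenpair(m, max_iter=1000, tol=1e-12):
    """Power iteration on a symmetric positive semi-definite matrix.

    Returns (eigenvalue, unit-length eigenvector) for the largest eigenvalue.
    """
    n = len(m)
    x = [1.0 / sqrt(n)] * n
    lam = 0.0
    for _ in range(max_iter):
        y = [sum(m[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = sqrt(sum(v * v for v in y))  # ||Mx|| approaches lambda for unit x
        y = [v / lam for v in y]
        converged = max(abs(p - q) for p, q in zip(x, y)) < tol
        x = y
        if converged:
            break
    return lam, x

# Hypothetical toy data: 5 persons (rows) by 3 boards (columns).
A = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
]

M = overlap_matrix(A)            # the main diagonal holds the board sizes
lam, x = leading_eigenpair(M)    # x is the eigenvector centrality of Eq. (1)
print("board sizes:", [M[j][j] for j in range(len(M))])
print("eigenvector centrality:", [round(v, 3) for v in x])
```

In this toy case the largest board also receives the largest score, which is the size dependence the text describes.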


Consider, for example, the data in Table 1. This well-known table reports the community events attended by 18 women in a southern community. These data are of the same form as interlocking directorate data; subsets of individuals (the eighteen women) belong to various groups (those who attended the fourteen events). The sheer number of women who attended each event (‘Degree’) and the eigenvector centrality scores (‘Eigenvector’) are given in the second and third columns of Table 2. The two vectors are highly correlated: r = 0.97.

Faced with this problem in interlocking directorate data, the most common solution is to divide each overlap $(A'A)_{ij}$ between boards of directors by the geometric mean of the sizes of the two boards $i$ and $j$ (Eq. (2)), then to take the eigenvector associated with the largest eigenvalue of this standardized matrix $S$ (Eq. (3)).

$$S_{ij} = \frac{\sum_k a_{ki} a_{kj}}{\sqrt{\sum_k a_{ki} \, \sum_k a_{kj}}} \qquad (2)$$

$$S x = \lambda x \qquad (3)$$
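The normalization in Eqs. (2) and (3) can be sketched as follows. This is an illustrative sketch only: the toy membership matrix and the function names are hypothetical, and every group is assumed to have at least one member so that the geometric mean is nonzero.

```python
from math import sqrt

def normalized_overlap(a):
    """Eq. (2): overlap of groups i and j divided by the geometric mean of their sizes."""
    n_groups = len(a[0])
    sizes = [sum(row[j] for row in a) for j in range(n_groups)]
    return [[sum(row[i] * row[j] for row in a) / sqrt(sizes[i] * sizes[j])
             for j in range(n_groups)] for i in range(n_groups)]

def leading_eigenvector(m, max_iter=1000, tol=1e-12):
    """Eq. (3): unit eigenvector for the largest eigenvalue, by power iteration."""
    n = len(m)
    x = [1.0 / sqrt(n)] * n
    for _ in range(max_iter):
        y = [sum(m[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        converged = max(abs(p - q) for p, q in zip(x, y)) < tol
        x = y
        if converged:
            break
    return x

# Hypothetical toy data: 5 persons (rows) by 3 groups (columns).
A = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
]

S = normalized_overlap(A)              # every diagonal entry is 1 after normalization
x_norm = leading_eigenvector(S)
print("normalized eigenvector centrality:", [round(v, 3) for v in x_norm])
```

Because a binary column sums to the group size, the denominator in Eq. (2) is simply the square root of the product of the two diagonal entries of A'A.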

This corrected centrality (‘Normalized eigenvector’) is given in the fourth column of Table 2. Note that the corrected scores are also highly correlated with degree (r = 0.69). This appears to be almost always the case. Division by the geometric mean of board size appears to be an inadequate correction for group size.

There is another problem with Eq. (2). The correction has no substantive or statistical justification. What is the meaning of dividing the overlap between two groups by the geometric mean of group size? Why not divide by the harmonic mean or the arithmetic mean instead?

Of course, there is a way of creating centrality scores that are uncorrelated with group size. Eq. (1) is satisfied by every eigenvector of the matrix $A'A$. All the eigenvectors are orthogonal. Only the eigenvector associated with the largest eigenvalue will be highly correlated with the sizes of the groups. All the other eigenvectors serve as alternative centrality scores that will thus have very small correlations with size. The primary difficulty with this possible solution is that it does more than control for group size; it does not allow group size to be correlated with centrality. It may happen that large groups are also central. One wants an approach that corrects for variations in size in a meaningful way (unlike the unmotivated correction in Eqs. (2) and (3)) but that does not preclude large groups being central.

2. The duality of persons and groups

Bonacich (1991) has shown that simultaneous group and individual centralities can be computed given a membership matrix like that in Table 1. The centralities of individuals can be computed in exactly the same way as the centralities of groups.

$$AA'y = \lambda y \qquad (4)$$
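The link between the group eigenproblem of Eq. (1) and the individual eigenproblem of Eq. (4) can be checked numerically. The sketch below is an illustration under stated assumptions: it uses a hypothetical toy membership matrix and hypothetical helper names. It computes the group scores x from A'A, forms individual scores as y = Ax/√λ, and confirms that this y satisfies Eq. (4) with the same eigenvalue.

```python
from math import sqrt

def transpose(m):
    return [list(col) for col in zip(*m)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(p, q):
    q_cols = transpose(q)
    return [[sum(a * b for a, b in zip(row, col)) for col in q_cols] for row in p]

def leading_eigenpair(m, max_iter=1000, tol=1e-12):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    n = len(m)
    x = [1.0 / sqrt(n)] * n
    lam = 0.0
    for _ in range(max_iter):
        y = matvec(m, x)
        lam = sqrt(sum(v * v for v in y))
        y = [v / lam for v in y]
        converged = max(abs(p - q) for p, q in zip(x, y)) < tol
        x = y
        if converged:
            break
    return lam, x

# Hypothetical toy data: 5 persons (rows) by 3 groups (columns).
A = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
]
At = transpose(A)

lam, x = leading_eigenpair(matmul(At, A))      # group centralities, Eq. (1)
y = [v / sqrt(lam) for v in matvec(A, x)]      # individual centralities from the group scores

# y should satisfy Eq. (4), AA'y = lambda * y, and have unit length.
AAt_y = matvec(matmul(A, At), y)
residual = max(abs(left - lam * right) for left, right in zip(AAt_y, y))
print("max residual in Eq. (4):", residual)
```

Each person's score is thus a rescaled sum of the scores of the groups that person belongs to, which is the dual interpretation the section describes.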

The centrality of each group is proportional to the sum of the centralities of its members, and the centralities of individuals are proportional to the sums of the centralities of the groups to which they belong. The two sets of centrality scores, for individuals and for groups, are thus related by the following equation.

$$A x = \sqrt{\lambda}\, y \qquad (5)$$

Because individual and group centralities can be computed from the same data and have dual interpretations, the ideal correction for group size would maintain this pair of dual interpretations.

3. A solution

Let us consider the space of all binary tables with a fixed set of row and column marginals. This space contains all the possible membership tables consistent with two sets of constraints: the sizes of groups and the number of groups each individual belongs to. This space provides a basis of comparison for the data. By comparing the centrality of a group or an individual with the distribution of its centrality scores in the space of all tables with the same marginals, we can determine whether the group or individual is particularly central within this distribution. The relative centrality within this space provides a measure of centrality that is conditional on the degree of this group or individual, and on all other degrees. An individual of low degree has a high relative centrality if the centrality of this individual is high when compared to centralities of individuals with the same degree, in membership tables that are random under the constraint of having the same row and column marginals.

For most data, the creation of the list of all possible tables with fixed row and column marginals is a combinatorial nightmare. Snijders (1991) has developed an algorithm and a computer program for a simulation-based evaluation of this space. The algorithm samples from the space with probabilities that are not exactly uniform (constant), but in the estimation phase this is corrected so that the conclusions do refer to a random table with the given row and column marginals (see footnote 1). We can place the data within the sample to evaluate the centralities of groups and individuals in relation to the constraints imposed by group size and number of group memberships.

Table 1
Attendance of 18 women in 14 community events (Davis et al., 1941). Rows are events; columns are women.

Event   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
1       1  1  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
2       1  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
3       1  1  1  1  1  1  0  0  0  0  0  0  0  0  0  0  0  0
4       1  0  1  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0
5       1  1  1  1  1  1  1  0  1  0  0  0  0  0  0  0  0  0
6       1  1  1  1  0  1  1  1  0  0  0  0  0  1  0  0  0  0
7       0  1  1  1  1  0  1  0  1  1  0  0  1  1  1  0  0  0
8       1  1  1  1  0  1  1  1  1  1  1  1  1  0  1  1  0  0
9       1  0  1  0  0  0  0  1  1  1  1  1  1  1  0  1  1  1
10      0  0  0  0  0  0  0  0  0  0  1  1  1  1  1  1  0  0
11      0  0  0  0  0  0  0  0  0  0  0  0  0  1  1  0  1  1
12      0  0  0  0  0  0  0  0  0  1  1  1  1  1  1  1  0  0
13      0  0  0  0  0  0  0  0  0  0  0  1  1  1  1  0  0  0
14      0  0  0  0  0  0  0  0  0  0  0  1  1  1  1  0  0  0

The harmonic mean of a set of numbers is the reciprocal of the mean value of their reciprocals. The geometric mean of a set of n positive numbers is the nth root of their product.

Table 2
Four centrality scores for the events in Table 1: degree, eigenvector centrality, normalized eigenvector centrality, and percentile centrality score

Group  Degree (size)  Eigenvector  Normalized eigenvector  Percentile
1      3              0.13         0.25                    89
2      3              0.14         0.26                    98
3      6              0.23         0.30                    67
4      4              0.16         0.27                    77
5      8              0.30         0.31                    51
6      8              0.31         0.32                    67
7      10             0.37         0.33                    69
8      14             0.50         0.35                    93
9      12             0.38         0.28                    02
10     5              0.21         0.22                    38
11     4              0.10         0.13                    03
12     6              0.24         0.22                    29
13     3              0.16         0.20                    68
14     3              0.16         0.20                    68

This has been our experience. Relationships between the row sum and the maximal eigenvector are discussed in Minc (1988), Chap. II.

A sample of 500 tables with the same row and column marginals as Table 1 was drawn. Centrality scores were computed for each of the 500 tables using Eq. (1). The actual data-based group centrality scores were then given a percentile ranking in relation to these 500 sets of scores. The results are presented in the fifth column of Table 2. These centrality scores (in the column of Table 2 labeled ‘Percentile’) are correlated −0.13 with group size (column 2). However, it is an important advantage of this correction that the scores need not be uncorrelated with group size. The mean correlation between group size and percentile centrality ranking across all 500 simulations was 0.0004, with a range from +0.65 to −0.64. Because the percentile centrality scores need not be uncorrelated with group size, a correlation in the data is an empirical finding.
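The percentile procedure of this section can be sketched in a few lines. The sketch is not the algorithm of Snijders (1991) and not the ZO program. As a stand-in, it draws tables with a simple Markov chain of "checkerboard" swaps, each of which exchanges the 1s and 0s of a 2-by-2 submatrix of the form [[1, 0], [0, 1]] and so leaves every row and column sum unchanged. The toy matrix, the function names, the number of swaps between draws, and the tie rule (counting sampled scores less than or equal to the observed score) are all assumptions made for illustration.

```python
import random
from math import sqrt

def group_centrality(a, max_iter=500, tol=1e-10):
    """Leading eigenvector of A'A (Eq. 1) by power iteration; a is persons by groups."""
    n = len(a[0])
    m = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    x = [1.0 / sqrt(n)] * n
    for _ in range(max_iter):
        y = [sum(m[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        converged = max(abs(p - q) for p, q in zip(x, y)) < tol
        x = y
        if converged:
            break
    return x

def try_swap(a, rng):
    """Attempt one checkerboard swap in place; marginals are unchanged either way."""
    r1, r2 = rng.sample(range(len(a)), 2)
    c1, c2 = rng.sample(range(len(a[0])), 2)
    if a[r1][c1] == a[r2][c2] and a[r1][c2] == a[r2][c1] and a[r1][c1] != a[r1][c2]:
        a[r1][c1], a[r1][c2] = a[r1][c2], a[r1][c1]
        a[r2][c1], a[r2][c2] = a[r2][c2], a[r2][c1]

def percentile_centrality(observed, n_tables=500, swaps_per_table=200, seed=0):
    """Percent of sampled tables in which each group's score is at most its observed score."""
    rng = random.Random(seed)
    obs = group_centrality(observed)
    table = [row[:] for row in observed]      # work on a copy of the data
    at_or_below = [0] * len(obs)
    for _ in range(n_tables):
        for _ in range(swaps_per_table):
            try_swap(table, rng)
        sim = group_centrality(table)
        for j, (s, o) in enumerate(zip(sim, obs)):
            if s <= o + 1e-9:
                at_or_below[j] += 1
    return [100.0 * count / n_tables for count in at_or_below]

# Hypothetical toy data: 5 persons (rows) by 3 groups (columns).
A = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
]
print("percentile scores:", percentile_centrality(A))
```

A sampled table may be disconnected, in which case its leading eigenvector need not be unique; the sketch ignores this, so its output is only indicative.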

1 The associated computer program, ZO, is available free from the third author.

4. A second example: centrality among biotechnology firms

Oliver (Oliver, 1993; Oliver, forthcoming; Liebeskind et al., forthcoming) has reported data on formal collaborative relations among 89 American new biotechnology firms (NBFs), which are R&D start-ups, founded since 1976, that operate as network organizations. The largest block connected directly or indirectly by relations of interlock contains 36 firms. The analysis that follows is confined to these 36 firms. This relation pattern is represented as an undirected graph, or by its adjacency matrix, which is a symmetric binary table with zero main diagonal (only interfirm ties were coded; thus no main diagonal values are listed). In this case, the relevant space to be considered for relative centrality coefficients is the space of all symmetric binary tables with the given marginal sums and a zero main diagonal, or (equivalently) the space of undirected graphs with given degrees.
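For the one-mode case, the space of undirected graphs with given degrees can be explored with a sketch like the one below, which also applies the connectedness requirement and the standard scores discussed in this section. It is a stand-in, not the method of Snijders (1991): graphs are drawn with degree-preserving double edge swaps, and any swap that would create a duplicate tie or disconnect the graph is rejected. The toy graph, the function names, and the sample sizes are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def is_connected(adj):
    """Depth-first search from vertex 0 over a list of neighbour sets."""
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)

def eigenvector_centrality(adj, max_iter=2000, tol=1e-10):
    """Leading eigenvector of the adjacency matrix, iterating with A + I.

    Adding the identity keeps the eigenvectors and avoids oscillation on bipartite graphs.
    """
    n = len(adj)
    x = [1.0 / sqrt(n)] * n
    for _ in range(max_iter):
        y = [x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        norm = sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        converged = max(abs(p - q) for p, q in zip(x, y)) < tol
        x = y
        if converged:
            break
    return x

def try_edge_swap(adj, rng):
    """Replace ties a-b and c-d by a-d and c-b if the graph stays simple and connected."""
    edges = [(u, v) for u in range(len(adj)) for v in adj[u] if u < v]
    (a, b), (c, d) = rng.sample(edges, 2)
    if rng.random() < 0.5:
        c, d = d, c
    if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
        return
    for u, v in ((a, b), (c, d)):
        adj[u].remove(v)
        adj[v].remove(u)
    for u, v in ((a, d), (c, b)):
        adj[u].add(v)
        adj[v].add(u)
    if not is_connected(adj):                 # undo a swap that disconnects the graph
        for u, v in ((a, d), (c, b)):
            adj[u].remove(v)
            adj[v].remove(u)
        for u, v in ((a, b), (c, d)):
            adj[u].add(v)
            adj[v].add(u)

def standard_scores(adj, n_graphs=200, swaps_per_graph=50, seed=0):
    """Observed centrality minus the sample mean, divided by the sample standard deviation."""
    rng = random.Random(seed)
    obs = eigenvector_centrality(adj)
    g = [set(nb) for nb in adj]               # work on a copy of the data
    samples = []
    for _ in range(n_graphs):
        for _ in range(swaps_per_graph):
            try_edge_swap(g, rng)
        samples.append(eigenvector_centrality(g))
    scores = []
    for i, o in enumerate(obs):
        column = [s[i] for s in samples]
        sd = pstdev(column)
        scores.append((o - mean(column)) / sd if sd > 0 else 0.0)
    return scores

# Hypothetical toy graph on 7 vertices, stored as neighbour sets.
graph = [{1, 2, 3}, {0, 2}, {0, 1}, {0, 4}, {3, 5}, {4, 6}, {5}]
print("standard scores:", [round(z, 2) for z in standard_scores(graph)])
```

Each swap removes one tie and adds one tie at every vertex involved, so all degrees stay fixed while the rest of the structure varies.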

Table 3
Degree and standard score for centrality based on sample of 4000 random tables

Group  Degree  Percentile
1      1       34
2      3       68
3      1       27
4      3       100
5      4       100
6      4       56
7      1       83
8      1       65
9      1       83
10     1       38
11     1       70
12     1       10
13     8       100
14     7       100
15     4       75
16     1       7
17     2       28
18     1       0.1
19     3       0.2
20     3       100
21     1       13
22     2       8
23     3       28
24     1       5
25     5       92
26     1       33
27     2       34
28     1       83
29     1       11
30     2       27
31     1       83
32     1       81
33     2       60
34     2       84
35     1       33
36     1       64


In addition, the constraint was imposed that the structure had to be connected (centrality scores cannot be computed uniquely when there are pairs of vertices in the graph that are mutually unreachable). The method of Snijders (1991) and the ZO computer program are also applicable to this space. Four thousand tables were sampled. The degree centrality and standard scores for eigenvector centralities based on the mean and standard deviation of the sample of four thousand are presented in Table 3. Interestingly, the corrected scores in this instance are highly correlated with degree; the correlation between degree and standardized centrality score is 0.601. This illustrates an advantage of this approach. Centrality scores are not forced to be uncorrelated with group size; a correlation, or lack of correlation, is an empirical finding.

5. Conclusions

We have presented a technique for controlling the effects of group size or degree that does not artificially force a low correlation, but adjusts in what we think is a meaningful way. The adjustment is based on referring the eigenvector centrality of a given individual or group to the distribution of its own centrality when all marginal sums are fixed (this includes the degrees of this and all other individuals/groups) while the further structure is random. With data on group memberships, this means that the resulting centrality scores for groups and for individuals are corrected both for the distribution of group sizes and the distribution of memberships by individuals. With one-mode data, the correction is for the degrees of vertices within the graph.

Other corrections are possible. In looking at two-mode data, one might wish to correct only for group size and not for individual memberships. In this case, one’s sample space would consist of all binary tables with the same column marginals, but not necessarily the same row marginals. Using the same approach one could, for example, correct for the degree of transitivity within a graph by placing a graph within the set of graphs with the same marginals and (approximately) the same degree of overall transitivity.

References

Bonacich, P., 1972. Techniques for analyzing overlapping memberships. In: Costner, H.L. (Ed.), Sociological Methodology. Jossey-Bass, San Francisco, pp. 176–185.
Bonacich, P., 1991. Simultaneous group and individual centralities. Social Networks 13, 155–168.
Davis, A., Gardner, B., Gardner, M.R., 1941. Deep South. Univ. of Chicago Press, Chicago.
Freeman, L., 1979. Centrality in social networks: conceptual clarification. Social Networks 1, 215–239.
Liebeskind, J.P., Oliver, A.L., Zucker, L., Brewer, M. Organization Sci. (forthcoming).
Minc, H., 1988. Nonnegative Matrices. Wiley, New York.
Mintz, B., Schwartz, M., 1985. The Power Structure of American Business. Univ. of Chicago Press, Chicago.
Oliver, A., 1993. New Biotechnology Firms: A Multilevel Analysis of Interorganizational Relations in an Emerging Industry. PhD dissertation, Univ. of California, Los Angeles.
Oliver, A. On the nexus of organizations and professions: networking through trust. Sociological Inquiry (forthcoming).
Snijders, T.A.B., 1991. Enumeration and simulation methods for 0-1 matrices with given marginals. Psychometrika 56, 397–417.