Social Science Research 28, 111–134 (1999). Article ID ssre.1998.0643, available online at http://www.idealibrary.com
Entropic Measures of Belief System Constraint

John Levi Martin
Rutgers University

The notion of the constraint (or degree of organization) of belief systems is a potentially fruitful one in both political science and sociology. Existing attempts to measure constraint, however, have had severe drawbacks, both conceptual and methodological. This paper argues that constraint is better conceived of as the absence of dispersion in a table of cross-classified beliefs and demonstrates that measures based on this conception are better at discerning the presence of constraint where it is quite likely to exist, namely, in small, ideologically focused groups. The approach to constraint outlined here has a number of other advantages: it is equally appropriate to interval, ordinal, or nominal data; it does not require external criteria of what responses should be associated with others; it is not variance-dependent; it does not assume that constraint is equivalent to either reduced dimensionality or pairwise association; and it is equally appropriate for mass and nonmass belief systems. It is also simple to conduct tests of significant difference across populations. Finally, this paper outlines the proper way to standardize such measures, which applies to other uses of the entropy measure, such as the quantification of income inequality. © 1999 Academic Press
THE PROBLEM OF MEASURING CONSTRAINT

The Conception and Measurement of Constraint

Does a certain group have an organized system of beliefs? This question, which I will refer to as the question of "constraint," is often central to both political scientists and sociologists, who generally attempt to answer it using survey data. I will begin by reviewing the conception of constraint in belief systems and then critically analyze the most common methods used to measure such constraint. After demonstrating the severe limitations of existing methods, I will demonstrate how a reconceptualization leads to a set of measures more likely to actually measure such constraint.

[Author's note: I thank Mike Hout and Jim Wiley for their encouragement on this project, and two anonymous reviewers for their helpful critique. Address correspondence and reprint requests to John L. Martin, Sociology Department, Rutgers–The State University of New Jersey, 54 Joyce Kilmer Avenue, Piscataway, NJ 08854-8045.]

The notion of constraint comes most directly from the work of Philip Converse. The Conversian idea of constraint in public opinion arose in the context of what is
now known as the "nonattitudes" hypothesis (see the review in Smith, 1984). This hypothesis was first raised in Campbell, Converse, Miller, and Stokes's (1960) analysis of the American voter, but has its classic formulation in Converse (1964). For Converse (1964, p. 207), what made a belief system systemic was the presence of constraint, which he defined as the "success we would have in predicting, given initial knowledge that an individual holds a specified attitude, that he holds certain further ideas and attitudes," which he took to imply the inspection of binary relations between items. This led to the dominant measure of constraint being the average of pairwise association measures, usually correlation coefficients (Jennings, 1992, p. 424), but sometimes (as in Converse's original study) Goodman-Kruskal gammas. It has often been noted that since correlations are based on the variances of the different items, comparisons across groups, which are of chief interest in the study of constraint, are biased unless the group variances are equal (see Barton and Parsons, 1977; Knight, 1985, p. 841).1 Concern with the variance-based property of correlation coefficients, however, often obscures an equally vexing limitation, namely their pairwise nature. The limitation of association to pairwise relations is at odds with the equation of association and sophistication, or the ability to connect the significance of different facts and opinions. Upon reflection, it is clear that what is so distinctive about the systemization of beliefs is the possibility of complicated relationships between sets of ideas, not simply pairwise interactions. As we shall see below, a correlation between two items may be zero precisely because of the particular form of constraint in a belief system. A second technique, perhaps less often explicitly introduced as a measure of constraint but no less often employed to determine whether a group's ideas really make sense, involves an attempt to reduce the dimensionality of response. While in some cases multidimensional solutions are allowed, the most common approach is to search for unidimensionality in a set of items. A common example in public opinion is the question of whether attitudes are crystallized along a liberal–conservative dimension. Methodologically, this can involve "scale tests" such as Cronbach's α, or, less frequently, fundamental measurement tests such as those developed in Item Response Theory (e.g., Reiser, 1981; Brooks, 1994). There is a well-known problem in the use of most unidimensional techniques, for they depend on the researcher's correctly imputing to all respondents a single way of thinking about things, the only alternative to which is disorganization (see the review and critical assessment of conventional approaches in Wyckoff, 1987; note that this is also true of the innovative item/person/group study of variance of Barton and Parsons [1977]).

1 Converse's original analysis (as well as the later analyses of Nie, Verba, and Petrocik [1976], pp. 23–28, 123–140, 148–155) actually did not use the correlation coefficient, but the Goodman-Kruskal gamma, which is a proportional-reduction-in-error measure (in keeping with Converse's epistemological definition of constraint). The symmetric gamma, an average of the asymmetric gammas, is not "variance" based in the way that the product-moment coefficient is, but it is extremely sensitive to the particularities of the distribution, and can be zero even if the two variables are not independent (see Liebetrau, 1983, p. 19); hence, the comparison of average gammas from different populations asked different questions (e.g., Nie and Andersen, 1974, p. 115) is at best an uncertain venture.
Differing patterns of constraint in different groups may be missed, as the researcher unknowingly imposes the structure of one group's thought on the others (see, e.g., the positions of Lane, 1962; Nie and Andersen, 1974).2

The Problem of Associationism

Conventional methods thus have severe limitations to their use as a general approach to the measurement of constraint. But even more importantly, all are grounded in the same assumption, one that necessarily leads to logical paradoxes as well as methodological difficulties, namely, the assumption that nonindependence is indicative of constraint. Why is a high association between two items considered to be indicative of high constraint? The underlying model is, first of all, based on a continuous version of the underlying beliefs. Two beliefs with a high correlation (for example, in Fig. 1) are judged to be indicative of a cognitive association between the two items in each and every respondent's mind. However, practitioners are ever on the lookout for some sort of unobserved heterogeneity, such as a situation like that of Fig. 2, where the juxtaposition of two apparently separate subgroups, each with a correlation of 0, leads to an overall significant correlation. In such a case, we would be less confident in imputing the high aggregate association to each and every respondent as an internal psychological quality. But can we be confident even for a case like that in Fig. 1? The "variance" dependence of correlations in practice means that the answer to our question, "do the members of this group connect beliefs X and Y?" will change greatly, depending on whether our sample includes values −10 < x < 10 (as in Fig. 1), or only −3 < x < 3, let alone −1 < x < 1. Indeed, for every relation of association, there is some level of resolution at which partitioning the sample into subsets based on the value of one of the variables leads to a number of subsamples in all of which there is no association. In a crystal-clear relation, this level of resolution is not reached until each person is in her own subset, but in more true-to-life circumstances, this level of resolution is significantly coarser. Conceptual difficulties naturally arise when using a global property to make individual-level statements when that global property tends to disappear as one approaches the individual level. I shall term this the "associationist paradox" that haunts conventional approaches to measuring constraint.

2 While such critiques can, to some extent, be answered with multidimensional factor analysis (as in the exemplary analysis of Stimson [1975]), there has as yet been no clear reason offered to associate any reduction in dimensionality with an archetypically constrained belief system. Even more problematically, such techniques are based on the information contained in the set of pairwise correlations and, therefore, suffer the same problems as do simple correlational methods. Thus, while they allow for "multidimensionality," the relations between all items must still be expressible in pairwise, linear relations—complex or "twisty" relations in the space of X items are not allowed and, therefore, not measured.
FIG. 1. Two highly correlated beliefs: The ideal-typical case.
The reader may assume that this is a necessary, if unfortunate, result of using aggregate data to discuss an individual-level, if fundamentally social, phenomenon. I will now demonstrate that this is not necessarily the case. There is an alternative formulation of constraint which is both epistemologically defensible and has obvious methodological advantages. To arrive at this alternative formulation, we must call into question the assumptions of the associationist perspective. The associationist perspective considers as constraint only what can be taken as evidence of an individual-level internal association of ideas with one another. Compositional "artifacts," in contrast, are not considered "constraint"—if it could be demonstrated that the two clumps of Fig. 2 represented two subgroups that could be distinguished by some third variable Z, the correlation between X and Y would have to be dismissed, since, within each subgroup, the beliefs are statistically independent. According to the associationist perspective, we must, therefore, drive from our minds the single most important fact, namely, the polarization of the group, and conclude that the beliefs in question are nowhere constrained. In certain cases, the associationist perspective may capture a social-psychological phenomenon of interest. But when it comes to those beliefs that are most central to a group, which I shall call "core" beliefs, it seems quite reasonable that any attempt to compare the constraint of different groups will approximate
FIG. 2. Two highly correlated beliefs: The pathological case.
the situation of trying to compare the two clumps in Fig. 2. To do so requires a nonassociationist approach to constraint. Let us begin to think of the data most generally as a multidimensional "belief space," that is, the space formed by treating the responses to each item in our data as an independent dimension. (Unlike correlational methods, which only look at two dimensions at a time, we will look at the full distribution in this multidimensional belief space.3) By rejecting the associationist perspective, we no longer have to relegate some of what appears to be constraint to "compositional" effects. More importantly, we can derive a single, unified understanding of what constraint is for any set of beliefs—it is the inverse of the degree of arbitrary movement in this space of all possible beliefs.

3 By looking at the full belief space, instead of beginning by reducing it to pairwise coefficients, we are in a better position to uncover actually existing constraint, for it is possible that a highly constrained distribution will have absolutely no "constraint" as measured by correlational methods, if the particular form of constraint is complex. For an example, consider a three-dimensional belief space with continuous responses, and imagine the admittedly unlikely observed distribution of the surface of a sphere in this three-dimensional space. This is an extremely tightly constrained system—given information as to one's position on any two of the items, there are only two possible values for one's responses to the third. But the correlation between any two items, and the partial correlation between any two items, is exactly zero. Hence, the conventional method would find no constraint, because in looking successively at two dimensions at a time, it misses the organization in a multidimensional belief space.
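(The sphere example of footnote 3 is easy to verify numerically. The following sketch is my own illustration, not anything from the original article: it samples points uniformly from the surface of a sphere and confirms that all pairwise correlations are essentially zero despite the perfect three-way constraint. It assumes Python 3.10 or later for statistics.correlation.)

import math
import random
import statistics

random.seed(0)
xs, ys, zs = [], [], []
for _ in range(20000):
    z = random.uniform(-1.0, 1.0)              # uniform z gives uniform area
    theta = random.uniform(0.0, 2 * math.pi)   # (Archimedes' hat-box theorem)
    r = math.sqrt(1.0 - z * z)
    xs.append(r * math.cos(theta))
    ys.append(r * math.sin(theta))
    zs.append(z)

# All three pairwise correlations come out near zero, even though any
# two coordinates pin the third down to exactly two possible values.
print(statistics.correlation(xs, ys),
      statistics.correlation(xs, zs),
      statistics.correlation(ys, zs))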
From this conception of constraint, we can then quantify and compare the degree of constraint in a group for any set of beliefs. We will find, however, that this measurement approach is not only compatible with the project of disaggregating compositional and associational constraint, but is indeed superior to other techniques for the case of mass belief systems.

AN APPROPRIATE DEFINITION OF CONSTRAINT

Any measure of constraint that is based on the idea of restriction of arbitrary movement must also have the property that the aggregation of subgroups, even when these subgroups are defined in terms of their positions in the belief space, makes substantive sense. For example, if we first looked at our group as comprising only one of the two clusters in Fig. 2, and then consider the group to include both clusters, we would have to conclude that the analysis which took them to be in the same group should find less constraint than the analysis based on only one cluster. Even though the correlation is significant in the second case, but not in the first, a significant portion of the belief space has been opened up to movement and, therefore, a major constraint has been relaxed. (Conventional methods, of course, would say the opposite.) I now present the method for quantifying belief constraint as defined above, which turns out to be based on the familiar measure of entropy.4

4 There is, so far as I know, only one significant precedent in using this approach to a multidimensional attitude table. Lieberson (1969) suggested applying measures of diversity adopted for multiple classifications of populations to multiway attitude tables, which turns out to be closely related to the entropic solution described below. Unfortunately, his suggestion seems not to have been picked up on. (See Nayak [1985] on the relation between Lieberson's measures and various forms of entropy.)

The Information Version

A summary statistic of this reconceptualized constraint, that is, the degree of resistance to arbitrary movement in the belief space, should have the following properties: (1) it should be at a maximum when there is unanimity; (2) it should be at a minimum under equiprobability; (3) it should change continuously as the cell probabilities change continuously; (4) given marginal distributions, it should be at a minimum when items are independent; (5) given equiprobability, it should decrease as the number of possible categories goes up (since people are objectively more dispersed). Shannon (1948) has demonstrated that there is only one form for such a measure to take, for it is the negative of the entropy, where entropy is defined as follows:

Entropy = H = −Σ p_i ln(p_i)     (1)

(the summation is over all cells in the cross-classification). This is the form of entropy that is used in information theory, and which has been applied to contingency table analysis (Kullback, 1959), but seems to have fallen into disuse. We will continue to discuss the application of information
theory entropy to our problem, focusing on entropy as opposed to constraint, so as to be compatible with previous work, but it should be recalled that constraint is the negative of entropy. Equation (1) is for a "one-dimensional" representation of probabilities. In the analysis of attitudes, we are looking at a table composed of different variables, each of which can be thought of as a dimension in the "belief space." The "overall entropy"—i.e., that statistic which answers our question as to the degree of arbitrary movement in the belief space—can be thought of as a one-dimensional problem, since every cell in the belief space formed by a distinct combination of values on all the items is treated as the basis for a p_i. This overall entropy is the dispersion throughout that belief space—less dispersion means that the cloud of positions is more "clumped" in the space, though it implies nothing about the shape of that clump (or clumps). In studying the constraint in belief systems, we are likely to compare two or more groups. It is well known that only differences between entropies are meaningful (i.e., that entropy is a relative, not absolute, quantity). In sociology, this has led to attempts to "standardize" it to make comparisons intuitively accessible; these attempts, however, as we shall see below, are incorrect. Such standardizations are also unnecessary for the large sample (information) version of entropy, since to make a comparison between two groups to see which has higher constraint (lower entropy), we need only compare the magnitudes and then see whether this difference is statistically significant. We, therefore, need not standardizations, but statistical tests. Miller and Madow (1963 [1954], p. 449f) show that, if there is not equiprobability, the estimated entropy has a normal limiting distribution with mean equal to the true entropy (i.e., it is unbiased) and variance

σ² = (1/N) Σ_i p(i) [ln p(i) + H]²     (2)
where the summation is over all cells and H is the true entropy.5 Substituting a null value of the entropy for H (such as 0), we get an estimate of the variance, to be used to test the null hypothesis that the true entropy is this value. More importantly, it allows us to test whether the difference between two observed entropies is significant, which is equivalent to testing the null hypothesis that the population entropies are identical. The test of the null hypothesis that H_1 = H_2 then uses the combined variance σ_1² + σ_2²; since the variance increases with the value of H, a conservative test is to use the larger of H_1 and H_2 as the estimate for H in the computation of both variances. Thus, in contrast to conventional approaches to comparing the constraint of different groups, with the entropic measures we can conduct tests of significant difference, in addition to merely comparing magnitudes.

5 Miller and Madow (1963 [1954], p. 457) also supply a small sample correction for both the estimate and variance of the entropy in log₂ terms.
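(To make the computation concrete, here is a minimal sketch of the overall entropy and the two-group test just described. The function names and toy tables are mine, not the article's; the test follows the conservative procedure of plugging the larger observed entropy into both variance estimates.)

import math

def entropy(counts):
    """Overall entropy H = -sum p ln p over all cells of the
    (flattened) cross-classified belief table; equation (1)."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def entropy_variance(counts, h):
    """Large-sample variance of the estimated entropy, equation (2):
    (1/N) sum p_i [ln p_i + H]^2 over occupied cells."""
    n = sum(counts)
    return sum((c / n) * (math.log(c / n) + h) ** 2
               for c in counts if c > 0) / n

def compare_entropies(a, b):
    """z statistic for H0: equal population entropies, conservatively
    using the larger observed entropy in both variances."""
    ha, hb = entropy(a), entropy(b)
    h_big = max(ha, hb)
    se = math.sqrt(entropy_variance(a, h_big) + entropy_variance(b, h_big))
    return (ha - hb) / se

# Two toy groups on the same 3 x 3 belief space (9 cells, flattened).
clumped = [40, 2, 1, 2, 3, 1, 1, 2, 48]           # low entropy: high constraint
dispersed = [12, 11, 10, 11, 12, 11, 10, 11, 12]  # near equiprobability
print(entropy(clumped), entropy(dispersed),
      compare_entropies(clumped, dispersed))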
Constraint beyond the Marginals

The measure of constraint based on entropy does not, in itself, distinguish the contribution to reduced dispersion due to the marginals and that due to association. This is, I believe, the most theoretically reasonable operationalization of constraint. However, it is still possible to look at the constraint only above and beyond the marginals. Following Borhek and Curtis's (1975) terminology, I will call this aspect of the constraint the "tightness"—this is what the conventional technique of average correlation coefficients aims to measure. Indeed, it is a rigorous implementation of Converse's original definition of the increased information about one variable coming from another. This aspect of the constraint may have special importance in certain cases (such as those where marginals are experimentally manipulated) and is likely to be favored by those conceiving of constraint in associationist terms. For this reason, I will now discuss its measurement and how to decompose the overall constraint into two portions, one the quantification of the degree of "consensus" at the marginals and the other this "tightness." Even given the associationist project, the techniques discussed here are superior to conventional ones because (1) unlike the case of methods based on correlations, there is no restriction to bivariate relations; (2) relatedly, we do not need to assume simple linear relations between items; (3) the method is equally appropriate to items that are measured at the interval, ordinal, and nominal level; and (4) unlike the item/person/group study of variance, it does not require the researcher to choose the "correct" relationships between beliefs. To measure tightness (constraint beyond the marginals), we must switch from the one-dimensional notation of the probabilities as in (1) to one utilizing the full dimensionality of the belief space. Not only is there an overall entropy, but there is an entropy associated with each dimension (each variable). It is, therefore, simple to decompose the overall entropy into a portion due to the marginals and a portion due to association beyond the marginals. As stated above, the overall entropy is at a maximum when the items are independent; in this case H(AB) = H(A) + H(B), where H(A) is the entropy associated with variable A, and similarly for H(B). More generally, H(AB) ≤ H(A) + H(B). This allows us to treat the residual entropy when the marginal entropies have been subtracted as a measure of the entropy beyond the marginals (or the "mutual information" I);6 thus, I(AB) = H(A) + H(B) − H(AB) (cf. Luce, 1960, p. 39), where I(AB) is called the "mutual information"—it means that given information about A, we have less uncertainty about the value of B, in keeping with Converse's predictive definition. There are tests for the statistical significance of I(AB) which are asymptotically distributed as χ²,7 and are therefore asymptotically equivalent to loglinear-type tests of independence.

6 It is also possible to use not the mutual information as defined above, but the quadratic mutual information—I₂(AB) = H₂(A) + H₂(B) − H₂(AB)—which has certain advantages in terms of the precision of the estimation from a sample (Gil et al., 1986).

7 A test of independence is a test of the hypothesis (for two variables) that I(AB) is zero. Luce (1960, p. 47) and McGill (1954, pp. 112–113) provide formulae to convert any I-type measure of transmitted information (such as the information from A to B controlling for C, etc.) to a likelihood form asymptotically distributed as χ². Miller (1955) provides a small sample correction for I(AB) (there called T).
Similarly, for three variables, H(ABC) = H(A) + H(B) + H(C) − I{ABC} = H(A) + H(B) + H(C) − I(AC) − I(AB) − I(BC) − I(ABC). It is possible to formulate an entire system of analysis on these bases, as did Kullback (1959); however, there is no need to. The loglinear system of Goodman is asymptotically equivalent and, in addition, is more flexible, clear, and well known, and seems to have computational advantages in certain circumstances (Goodman [1970], pp. 133, 135; [1971], p. 169).8 All researchers need from the entropic approach is the interpretable quantification of the tightness, which gives us a means of comparing different groups in terms of their constraint beyond the marginals. This also allows us to decompose the constraint into two portions, one due to the marginals and another due to the tightness. Generally, C = M + T, where C is the overall constraint, M is the constraint due to the marginals, which we can consider the degree of homogeneity, and T the tightness. The informational version of entropy, and the tests of significant difference, require large samples (an average cell frequency of 5 is generally considered necessary). However, the underlying goal of this approach is to be able to look at a possibly large, multidimensional belief space (as opposed to reducing away complexity by restricting attention to pairwise relations). Even with large sample sizes, the table resulting from the cross-classification of, say, only 8 trichotomies is too huge (6561 cells) for the large sample assumptions to hold. It is, therefore, necessary that we have an "exact" formulation of entropy that is appropriate for situations with a small N (small relative to the number of cells). Furthermore, as we shall argue, the treatment of such small N cases is a crucial one for a study of truly constrained belief systems. We are probably more likely to find truly constrained belief systems in nonmass settings, and our techniques should be sensitive to these small N cases. The next section derives such a formulation from the thermodynamic principles underlying the original formulation of entropy.

8 However, when we want measures of the relation between different dimensions in a table, especially for polychotomies, there are very useful statistics given by Preuss (1980) that are superior to χ²-based, proportional-reduction-in-error techniques, and correlations. There are also other entropic measures of diversity for which statistical tests have been developed, though I am sticking with the Shannon entropy because of its relation to thermodynamic entropy, which will prove helpful for small sample sizes. The reader interested in tests for other related measures can consult Salicrú et al. (1993) and Nayak (1985).
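(Before turning to the small N case, here is a small sketch of the large-sample decomposition C = M + T for a two-way table, under my own reading of the identities above: constraint taken as the negative of entropy, tightness as the mutual information. The table is invented.)

import math

def entropy(counts):
    """Shannon entropy H = -sum p ln p."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

# Toy two-way table: rows are item A, columns are item B.
table = [[30, 5, 5],
         [4, 10, 6],
         [2, 3, 35]]
joint = [c for row in table for c in row]
h_ab = entropy(joint)
h_a = entropy([sum(row) for row in table])        # marginal entropy of A
h_b = entropy([sum(col) for col in zip(*table)])  # marginal entropy of B

tightness = h_a + h_b - h_ab      # mutual information I(AB) >= 0
# With constraint as the negative of entropy, C = M + T:
#   C = -H(AB),  M = -(H(A) + H(B)),  T = I(AB)
assert abs(-h_ab - (-(h_a + h_b) + tightness)) < 1e-12
print(h_ab, tightness)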
The Thermodynamic Representation

The general relation between thermodynamic entropy and informational entropy—i.e., whether the identity is accidental or intrinsic—has been hotly debated; in this case, the relation is evident, for both are the appropriate ways of describing the relative concentration or dispersion in a multidimensional table. We will rederive the equations for entropy from thermodynamic principles and then look more closely at the lessons for contingency table analysis.9 Classical statistical thermodynamics begins with a distinction between microstates and macrostates—macrostates are the recognizable orderings of sets of particles; the microstates associated with any macrostate are the various ways in which indistinguishable particles, by their specific arrangement, produce this same aggregate macrostate. The analogy to contingency table analysis is quite straightforward—what we have in the cell counts of the table is a macrostate—the microstates are all the possible arrangements of individuals in the sample.10 If it is assumed that all microstates are equally probable, the "thermodynamic probability" of any macrostate can be derived from the combinatorial formula as follows:

Thermodynamic Probability = W = N! / (N_1! N_2! N_3! · · · N_M!)     (3)
where N_1 is the number of persons in the first cell, etc., and M is the total number of cells. The total number of microstates is M^N, and the probability of any macrostate is W divided by M^N. The thermodynamic entropy S is equal to k ln(W) (where k is Boltzmann's constant), i.e., it is proportional to the logarithm of the thermodynamic probability. As can be shown, the Shannon measure of entropy is the entropy, not of the thermodynamic probability, but of the conventional probability. This relation, however, involves an approximation for ln(x!) (Stirling's approximation), ln(x!) ≈ x(ln [x] − 1); with this approximation, H = S/(kN).11 It would seem that there is no reason to prefer one form of entropy—the thermodynamic or the information—over the other (with the exception that the decomposition of homogeneity and tightness is easier with the information form). However, when the number of persons in a cell is small, Stirling's approximation used to establish the identity H = S/(kN) becomes very inexact. At 5 persons (the usual minimum claimed for proper distribution of χ²-type measures) the approximation is 36% short of the true value; at 10 persons, it is still off by 14%, and even at 20, it is off by 6%. (While it is quite true, as a reviewer pointed out, that there are more accurate versions of Stirling's approximation, the connection between representations of entropy depends on the form given in footnote 11.)

9 My source for this section is Sears (1950). For a discussion of the link between representations of entropy, see Akaike (1983).

10 Lest one object to using an analogy with physics to clarify social statistics, I hasten to point out that it was actually social statistics à la Quetelet that inspired both Maxwell and Boltzmann to develop statistical thermodynamics, and they compared molecules to people (see Porter, 1987).

11 Stirling's approximation comes from the fact that for large x, ln x! ≈ ∫₁ˣ ln u du = x ln x − x + 1 ≈ x(ln x − 1) (the 1 is simply dropped, being relatively small).
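(The following sketch, again my own construction, computes the thermodynamic probability W of equation (3) for a small macrostate and shows how far the exact per-person entropy S/(kN) = ln(W)/N sits from the Shannon entropy of the observed proportions at small N.)

import math

def w(counts):
    """Thermodynamic probability W = N!/(N1! N2! ... NM!), equation (3)."""
    out = math.factorial(sum(counts))
    for c in counts:
        out //= math.factorial(c)
    return out

counts = [5, 2, 1]                     # a macrostate: 8 people in 3 cells
n, m = sum(counts), len(counts)
print(w(counts))                       # 168 microstates
print(w(counts) / m ** n)              # macrostate probability W / M^N
print(math.log(w(counts)) / n)         # exact per-person entropy S/(kN)
print(-sum((c / n) * math.log(c / n)   # Shannon H: noticeably larger
      for c in counts))                # at this small N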
So while Shannon's entropy is the correct measure for continuous probabilities, and the observed probabilities (= N_i/N) are the maximum likelihood estimates of the continuous probabilities (approaching them as N becomes very large), for small samples Shannon's entropy is different from the true entropy given by the thermodynamic approach (furthermore, its estimation is biased for small samples [Yuan and Kesavan, 1997, p. 140]). This has consequences for the estimation of the relative improbability of any state, to which we will next turn our attention. It is for this reason that S should be preferred over H for contingency table analysis when some counts are small (<15). However, once the connection of entropy to probability (i.e., the underlying thermodynamic probability) has been made, there is no reason to prefer one over the other, since there is a monotonic relation between the two. In fact, it turns out that it is better for contingency table analysis to use the probability itself. This is because, as mentioned above, entropy is a purely relative quantity. Only changes in entropy or differences in entropy across similar tables are meaningful. Unfortunately, in the small N case, we cannot use the asymptotic properties of the entropy function to conduct tests of significant difference, as we did above. When we were using the information version, we were assuming that we had N's large enough to consider our observed probabilities to be continuous quantities. When we switch to the thermodynamic representation for small N's, we cannot ignore the effect that sample size has over the distribution of possible values of the thermodynamic probability. (For example, given two cells and two people, the maximum crystallization occurs in 50% of the cases generated under equiprobability and is, therefore, not evidence of high crystallization; given four people, however, this crystallization is only expected in 12.5% of the cases and, therefore, does suggest crystallization, though the W's are the same.) For this reason, we need to standardize the observations. Recognizing the relative nature of entropy, social scientists have previously proposed "normalizing" entropy by comparing it to the maximum possible entropy (for example, Coleman, 1964). The logic is reasonable, akin to proportion of variance explained. In some cases, such measures can be useful to gauge explanatory power (Magidson [1981] demonstrates their similarities to and differences from other approaches). But there is a fundamental problem, which comes from the general shorthand of standardizing by division by maximum, as opposed to using the whole probability distribution, as we shall demonstrate.

Alternate Standardizations

To make this point, let us consider a standardization for an observed value (t_1) of a parameter t that has no known metric, but a maximum T. The common approach would be to make a new parameter t* = t_1/T. But let us say that we also have a probability distribution for t. We realize that it is not the value of t compared with T that is important, but the position of t_1 on a distribution of more-to-less likely. Similarly, given two populations, and two t_1's and two T's, the best comparison is one relative to each one's probability distribution. So a better
standardization would be along the lines of

∫_{t_1}^{∞} p(t) dt,     (4)
i.e., the relative improbability of the observed value of t. This, in effect, changes our approach from "normalizing" the range or "setting the metric," which is done through division-by-the-maximum, to one much more like a conventional test of significance. Furthermore, it is true that for a statistic whose properties are understood, the division by the maximum preserves a relative "sense" of just how big the observed value was. But for a statistic such as entropy, whose properties are not intuitively understood, there is no real information lost in this switch to a standardization based on a possible distribution. Instead, we avoid making false inferences due to an unreliable interpretation of the magnitude of a statistic. The incoherence of the technique of standardizing by dividing by the maximum can be seen through a simple thought experiment, which I will illustrate below. Recall that the entropy varies monotonically with the probability, and, therefore, one is as good as the other in making comparisons of magnitude. Two researchers, one using the probability metric and the other the entropic metric, might each standardize their observations by dividing by the maximum, but their answers would be different, and so they would come to different conclusions regarding the degree of constraint, even though the statistics they are using are direct transforms of one another. Such confusion is possible only because there is nothing about the metric of the measurements which necessarily makes division by the maximum reasonable. In fact, a moment's reflection tells us the technique cannot be correct for both the probability metric and the entropy metric. If Entropy = ln(Probability), then a ratio of probabilities corresponds to a difference of entropies (not a ratio). While we understand what probabilities are and, therefore, dividing by the maximum here is not so bad (though not intrinsically meaningful), we lack such an understanding of entropy, as shown by our propensity to divide by the maximum instead of subtracting it. Thus, we seek to standardize the observed probability with regard to some sort of probability distribution. Since we lack a parameter on which to condition a distribution, we instead construct a "pseudo-probability" distribution, which is a ranking of possible states in terms of probability; we can then use this to see what proportion of possible macrostates are as improbable as the observed macrostate. Consider some contingency table with M cells and N persons, leading to L possible macrostates. Let us make an ordered bar chart-type plot of the thermodynamic probability distribution of these macrostates, where, as we recall, the thermodynamic probability is the number of microstates associated with any macrostate. For example, if N = 4 and M = 2, there are 16 possible microstates and 5 distinct macrostates (see Table 1). The ordered probability distribution would look as follows (see Fig. 3). We standardize any observation by comparing the probability mass to the left or right of it (depending on convention). For a
TABLE 1
Macro- and Microstates for N = 4, M = 2 (Distribution of Persons in Two Cells)

                          Possible macrostates
                         1     2     3     4     5
Cell 1                   4     3     2     1     0
Cell 2                   0     1     2     3     4
Number of microstates    1     4     6     4     1

Total number of microstates = 16
more informative (though less pictorially simple) example, Table 2 presents us with the pseudo-probability distribution of the case for 8 people and 9 cells. If we had observed a state with Wo = 5040, we find that there are 1260 distinct macrostates having the same W, leading to a probability mass (a total number of microstates) of 6,350,400. Were we to attempt to standardize this observed Wo by dividing by the maximum possible W (= 40,320), we would get a relative probability of .125. If, on the other hand, we used the standardized entropy by dividing by the maximum in logarithmic terms à la Coleman (which is equivalent to the ratio ln(Wo)/ln(Wmax)), we would get the quite different answer of .804. The two results, both "normalized" to go from 0 to 1, are radically at odds, one being low, the other high.
FIG. 3. Ordered probability distribution for N = 4, M = 2.
TABLE 2
Equivalence Classes for N = 8, M = 9

W        Number of ways a   Total microstates b   Prob. c     P value
1               9                     9           .0000002    .0000000
8              72                   576           .0000134    .0000002
28             72                 2,016           .0000468    .0000136
56            324                18,144           .0004215    .0000604
70             36                 2,520           .0000585    .0004819
168           504                84,672           .0019670    .0005404
280           504               141,120           .0032783    .0025074
336           504               169,344           .0039340    .0057857
420           252               105,840           .0024587    .0097197
560           252               141,120           .0032783    .0121784
840          1512             1,270,080           .0295047    .0154567
1120          756               846,720           .0196698    .0449614
1680         2142             3,598,560           .0835966    .0646312
2520          126               317,520           .0073762    .1482278
3360         2520             8,467,200           .1966979    .1556040
5040         1260             6,350,400           .1475234    .3523019
6720          504             3,386,880           .0786792    .4998253
10080        1260            12,700,800           .2950469    .5785045
20160         252             5,080,320           .1180187    .8735514
40320           9               362,880           .0084299    .9915701
Total:                       43,046,721          1.0000000   1.0000000

a Number of distinct macrostates that have this thermodynamic probability. Regarding the calculation, see Feller (1950, p. 38).
b W × number of ways; corresponds to the integral area of this section of the pseudo-probability distribution.
c Total number of microstates associated with this W as a proportion of the total number of possible microstates (M^N).
They are also both absolutely misleading. As Table 2 shows, 35.2% of all possible macrostates have greater constraint and 50.0% have less constraint, leaving the observed state safely in the middle of the range.12 To make our standardization increase with constraint, so that total constraint = 1 and no constraint = 0, we can use the percentage of microstates that are greater in W than the observed state—in this last example, the standardized value would be .50. This standardization is equivalent to 1 minus the p-value of Wo minus the probability of the observed Wo. This method can be applied to an actual example from the data set we will analyze below. Table 3 has data from a small group on two items, where we have 8 people in 9 cells and, therefore, can use the probability distribution given in Table 2.

12 It is also worth noting that the "most probable" states are not the ones of the greatest thermodynamic probability. That is, the states with the greatest probability mass (column 4) are not necessarily those with the greatest W (column 1). This is because there may be more ways that less dispersed states can arise, and so the W corresponding to maximum entropy is not necessarily the most likely value to be observed.
TABLE 3
Example of Constraint in Two Trichotomies (Example: Commune 1)

                                          Society brings out the worst in man
To get ahead, you must do bad things       Yes     Unsure     No
  Yes                                       5        0         0
  Unsure                                    0        0         0
  No                                        1        0         2

W = 8!/(5!2!1!) = 168
Standardization = 1 − .0020 − .0005 = .9975
Despite the very small size of the group, the standardization demonstrates that the answers are very highly constrained—our standardized score is .9975. Our standardization of the entropy is then a generalization of the above procedure which allows us to compute, for any observed distribution of N people and M cells, how many other macrostates (weighted by their probabilities) had probabilities greater than that of the observed state. Of course, it would be far easier computationally to have a formula that directly expressed this number for any macrostate in terms of N and M (as opposed to generating all of the probabilities of the macrostates, ordering them, and counting, which is nontrivial in terms of time for middle size values of N and M). This seems not to be possible, but we can save a great deal of time by using a combinatorial algorithm (see Feller, 1957, p. 38) to produce all the "equivalence classes" of macrostates with identical numbers of microstates.13 While the number of possible macrostates grows explosively with N and M14 (with only ten people in a belief space formed by three trichotomies, there are around 250 million macrostates—with 20 people in a belief space formed by four trichotomies, there are over a billion trillion), the number of equivalence classes is far smaller, and complete enumeration is tractable for middle size N's and M's (as in Table 2). We, therefore, can compute the number of macrostates of greater W (implying lesser concentration), and weight them by their W's to get a normalized statistic that is the probability mass associated with all states having less constraint than the observed. This is what should be taken as the degree of constraint in the belief space; a sketch of the computation follows.

13 For example, the macrostates {0, 1, 1, 2} and {0, 2, 1, 1} are in the same equivalence class, because one is a permutation of the other. However, there are some members of an equivalence class that are not simple permutations of one another. For example, {6, 1, 1, 0} and {5, 3, 0, 0} are in the same equivalence class (since 6! = 6·5! = 3·2·1·5! = 3!·5!). The method of generating these classes is simply an algorithm that starts with N persons concentrated in 1 cell, and then systematically begins spreading out the persons across all cells, while using a combinatorial formula to compute the number of macrostates that would fall into the equivalence class.

14 Since we are, in effect, sorting N indistinguishable persons into M distinguishable states (which may be empty), the number of possible macrostates is given by the combinatorial formula C(M + N − 1, N), where C(a,b) = a!/[b!(a − b)!]. Such a combinatorial function increases dramatically with M and N. On combinatorics of such occupancy problems, see Riordan (1978) and Roberts (1984).
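(The sketch below is a minimal brute-force version of this standardization, enumerating occupancy macrostates directly rather than by equivalence classes; the function names are my own. It reproduces the .9975 of Table 3.)

from fractions import Fraction
from math import factorial

def w(counts):
    """Thermodynamic probability W = N!/(N1! N2! ... NM!)."""
    out = factorial(sum(counts))
    for c in counts:
        out //= factorial(c)
    return out

def macrostates(n, m):
    """Yield every distribution of n indistinguishable people over m
    cells; there are C(m + n - 1, n) such occupancy macrostates."""
    if m == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in macrostates(n - first, m - 1):
            yield (first,) + rest

def constraint(observed):
    """Standardized constraint: the probability mass (share of all M^N
    microstates) of macrostates strictly greater in W, i.e., less
    concentrated, than the observed one."""
    n, m = sum(observed), len(observed)
    w_obs = w(observed)
    mass = 0
    for state in macrostates(n, m):
        ws = w(state)
        if ws > w_obs:
            mass += ws
    return Fraction(mass, m ** n)

obs = (5, 0, 1, 0, 0, 0, 0, 0, 2)   # the commune of Table 3: 8 people, 9 cells
print(float(constraint(obs)))        # about .9975, as in the text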
This is, in effect, to do a significance test, but unlike a normal χ² test of independence, this is not conditioning on the marginals but on a uniform distribution. This demonstrates that the entropic statistics are, in effect, measures of departure from equiprobability, while the associationist perspective always looks at departures from independence. In cases of significantly larger M and N, however, even this technique using equivalence classes may be too computationally intense. For such cases, we can instead estimate the relative improbability computationally, by using the uniform distribution to generate a large number of hypothetical macrostates, and then for each, determining whether its probability is greater than or less than that of the observed.15 The percentage of hypothetical states that have W greater than the observed will converge to our standardization. This is, of course, an estimate of the true relative probability, but it is an unbiased estimate, as opposed to the unknown biases resulting from using the Stirling approximation and division-by-maximum. While it may be less time-consuming than a complete enumeration of all macrostates, it may still be nontrivial in terms of computational time.

15 Senchaudhuri, Mehta, and Patel (1995) have recently proposed methods of drastically improving the speed of such Monte Carlo exact tests.
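(A sketch of this Monte Carlo estimate, my own illustration; the w() helper repeats the one from the previous listing so the sketch stays self-contained. The sampling scheme works because dropping each person into a uniformly random cell draws macrostates with probability W/M^N.)

import random
from math import factorial

def w(counts):
    """Thermodynamic probability W = N!/(N1! ... NM!)."""
    out = factorial(sum(counts))
    for c in counts:
        out //= factorial(c)
    return out

def mc_constraint(observed, draws=100000, seed=1):
    """Monte Carlo version of the standardization: the share of random
    draws that are less concentrated (greater W) than the observed
    macrostate; an unbiased estimate of the exact enumeration."""
    rng = random.Random(seed)
    n, m = sum(observed), len(observed)
    w_obs = w(observed)
    hits = 0
    for _ in range(draws):
        cells = [0] * m
        for _ in range(n):                 # place each person uniformly
            cells[rng.randrange(m)] += 1
        hits += w(cells) > w_obs
    return hits / draws

print(mc_constraint((5, 0, 1, 0, 0, 0, 0, 0, 2)))   # about .9975 again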
Entropic Measures of Tightness

In the information (large N) version of entropic constraint, it was easy to decompose the overall constraint into that due to homogeneity (marginal effects) and that due to "tightness," or constraint beyond the marginals. In the thermodynamic case, this is not possible, due to the discretization of the probabilities. However, we can use the same tactic outlined above to formulate a standardized tightness score that will allow for the comparison of the constraint beyond the marginals in groups of different sizes; just as the above method was akin to an exact test of the hypothesis of equiprobable distribution (though our standardized value is large instead of small when constraint is high), so this would be akin to an exact test of independence given marginals, such as that proposed by Freeman and Halton (1951).16 In this case, we simply examine only those macrostates that have the marginals equal to those of the observed distribution and compute the weighted number of macrostates having less constraint than the observed (the logic being that since the constraint due to the marginals is the same in all generated macrostates having these marginals, the constraint above-and-beyond the marginals of each macrostate is reflected in the differences in their overall constraint).17

16 For a discussion of alternatives to the Freeman-Halton approach, see Agresti (1992, p. 137). Mehta and Patel (1983) provide an algorithm that computes such exact tests of independence in a fraction of the time required for a Freeman-Halton enumeration.
We are then able to tell whether or not all the observed constraint is due to likeness or homogeneity, or whether we must also take into account a more complex relation in the belief space.18

17 There is, however, a choice to make which comes from the discrete nature of the probability distribution—we must decide whether or not to include the observed macrostate as one possibly arising "by chance." For example, consider a case in which there are four other possible macrostates compatible with the observed marginals, and the thermodynamic probabilities of each are W1, W2, W3, and W4, while the thermodynamic probability of the observed macrostate is Wo. Assume that we find that W3, W4 > Wo > W1, W2. We have to decide whether our measure of constraint is (W3 + W4)/(W1 + W2 + W3 + W4) or (W3 + W4)/(W1 + W2 + W3 + W4 + Wo). (I will term the first of these versions "exclusive" and the second "inclusive.") This issue makes no difference in the measure of overall constraint, because the contribution of the observed macrostate to the overall probability is generally minuscule. In measuring tightness, however, there may be so few macrostates consistent with the marginals that it will make a noticeable difference whether or not we include Wo in the denominator of our standardization. This difference is crucial in cases where there is only one macrostate compatible with the marginals. The exclusive measure, which, in effect, considers the observed macrostate not to be a sampling result from a random procedure, but an "intrinsically meaningful" one, cannot be computed (the denominator and numerator are both zero). But the inclusive measure, which, in effect, considers the observed macrostate to have been the outcome of a random procedure, can be computed, and it is zero (indicating no constraint). Since we are trying to use this analysis to determine whether or not the observed macrostate was, in fact, the result of a random placement of persons in the belief space, there is no clear way to determine which measure to use. In the examples below (Table 6, Fig. 4), the inclusive version was used, though employing the exclusive measure did not change the results.

18 Finally, it is also worth noting that we can standardize not only by controlling for marginal effects, but also by fitting higher "subtables" such as the pairwise subtables for spaces with three or more variables; such standardizations are therefore identical to what would be exact tests of the equivalent hierarchical loglinear models in the Goodman system. Some of the simpler cases have been worked out in detail; see Agresti (1992, p. 140f).

EXAMPLES

To demonstrate the utility of the entropic approach to belief constraint, we can contrast results obtained from these measures to those obtained using the conventional approach when applied to data from groups that are quite likely on a priori grounds to have highly constrained belief systems, namely, small, ideologically focused groups. The data are taken from the Zablocki (1980) study of American communes. This study gives us the rare opportunity to compare groups asked identical items on their core and peripheral beliefs, since some of these groups had political or religious ideological focuses, and the members were asked items about religion and politics. We can thus contrast the results attained when applying both conventional and entropic methods to the analyses of those beliefs that we would imagine the most constrained and those beliefs we would imagine less constrained (see the Appendix for items used; Martin, 1997, for details). The observers' original codings as to the group's ideological type (Eastern Religious, Christian, Political, Alternative Family, Counter-Cultural, Psychological-Growth, and Cooperative) are used to classify the communes; again, the first three groups are of greatest interest here.
TABLE 4
Average Interitem Correlation for Religious Items by Ideology

Ideology              Mean AIC a    Rank     SD      Number of groups
Eastern Spiritual       .443          4      .262         (14)
Christian               .275          7      .091          (7)
Political               .435          5      .137          (6)
Counter-Cultural        .443          3      .097          (9)
Alternative Family      .384          6      .054          (5)
Cooperative             .501          2      .054          (4)
Personal Growth         .501          1      .309          (7)

η = .351; η² = .123
a AIC = Average Interitem Correlations, or averaged absolute values of the correlations of all pairs of items.
Two sets of items, four items regarding religious beliefs and four items regarding political beliefs, comprise our two belief spaces. The items were also recoded from 5-point scales to 3-point agree/no opinion/disagree scales, leading to 81 cells in each belief space. The constraint in each of these two belief spaces was then computed for each group, as described above. I will only present analyses of the religious beliefs for purposes of brevity, but make comparisons to the results obtained from similar analyses of the political beliefs, available upon request. We can compare our results to those obtained using the conventional method of what I shall call average interitem correlations (AIC), in which we take all M items and compute the M(M − 1)/2 correlations between all pairs of items for each group, and then average these correlations to come up with a score for each group.19 Table 4 shows the mean AICs (the average across all the groups in the category) for the religious items for different types of communes. We might be quite shocked to find that the Christian groups have the lowest AIC and, therefore, the least constrained belief system, according to conventional criteria.20 The results using the entropic methods are in stark contrast to those resulting from the conventional methods. (Again, recall that our standardized measure is the percent of macrostates of greater W than the observed.) Table 5 demonstrates that the constraint for religious beliefs is by far the greatest in Christian groups, and that of Eastern groups is a close second. Furthermore, the standard deviation is quite small for Christian groups and is second smallest in Eastern groups, indicating that this is a general phenomenon within these categories.

19 Here we use the trichotomized versions of the variables to make sure that the differences do not arise from recoding; however, computing the AIC coefficient with the unrecoded data, while changing the values and the ranks of the commune types, did not affect the conclusions. The recoded and unrecoded versions of AIC correlated r = .792 for the religious items and r = .917 for the political items.

20 Were we to use the unrecoded versions of the responses, the Christian groups would still have the lowest AIC for religious items, but the Eastern groups would rise from fourth place to first.
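(For concreteness, the conventional AIC score described above is trivial to compute; the toy responses and function name below are my own, and statistics.correlation requires Python 3.10 or later.)

import statistics

def aic(items):
    """Average interitem correlation: the mean absolute Pearson r over
    all M(M - 1)/2 pairs of items; `items` holds one response vector
    per item, aligned across the members of a group."""
    m = len(items)
    rs = [abs(statistics.correlation(items[i], items[j]))
          for i in range(m) for j in range(i + 1, m)]
    return statistics.mean(rs)

# Toy group: six members' trichotomized answers to three items.
items = [[1, 1, 2, 3, 3, 2],
         [1, 2, 2, 3, 3, 1],
         [3, 3, 2, 1, 1, 2]]
print(round(aic(items), 3))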
TABLE 5
Means of Entropic Constraint on Religious Variables, by Ideology

Ideology              Mean constraint    Rank     SD      Number of groups
Eastern Spiritual          .751            2      .430         (13)
Christian                  .936            1      .105          (9)
Political                  .309            6      .479          (6)
Counter-Cultural           .599            3      .457          (9)
Alternative Family         .364            5      .501          (5)
Cooperative                .220            7      .441          (4)
Personal Growth            .465            4      .451          (7)

η = .511; η² = .261
This suggests that our measure of constraint is indeed tapping the organization of central belief systems. (This constraint is not a general characteristic of those groups high in constraint of religious beliefs, since these religious groups are not highest in constraint when it comes to the political beliefs—instead, political communes are.) Finally, also note that the variance in constraint explained by group ideology (as given by the η² statistic) is higher using the entropic measure of constraint than the AIC, suggesting that the former is indeed more closely related to the ideological divisions and, therefore, to the belief systems, of these groups.21 While I have suggested that this overall constraint is the most reasonable measure of constraint as reconceptualized above, we have also seen that the entropic method can produce a measure of the tightness of belief systems, that is, the constraint beyond the marginals. It might be thought that the reason for the discrepancy between the AIC measures and the entropic measures is that the latter are only reflecting the degree of homogeneity (the constraint due to the marginals); both measures could be right, if groups tended to be low on the tightness of core beliefs, because the agreement was so high. It is, therefore, of great significance that it is not the case that groups are necessarily low in the tightness of core beliefs. For religious items (Table 6), we find that while the tightness of Christian groups is relatively low, that of Eastern groups is actually the highest. While the Christian groups were highest in overall constraint on religious items, the Eastern groups were second highest, so the high tightness of Eastern groups is not indicative of low constraint at the marginals. Furthermore, a comparison of the ranks in Table 6 to those given by the AIC statistic (Table 4) shows that even when we use the entropic approach to measure constraint beyond the marginals, the results differ from those of the correlational approach.

21 This is also true for the political beliefs: the η² using the AIC is .149, while that using the entropic statistic is .360.
TABLE 6
Means of Tightness on Religious Variables, by Ideology

Ideology              Mean tightness    Rank     SD      Number of groups
Eastern Spiritual          .360           1      .396         (13)
Christian                  .069           4      .182          (9)
Political                  .000          6–7     .000          (6)
Counter-Cultural           .127           3      .257          (9)
Alternative Family         .031           5      .069          (5)
Cooperative                .000          6–7     .000          (4)
Personal Growth            .201           2      .293          (7)

η = .478; η² = .229
A common finding using conventional measures is that constraint, as measured by correlation or unidimensionality, tends to be inversely correlated with homogeneity (e.g., Barton and Parsons, 1977, p. 76); thus, the constraint beyond the marginals (or tightness) seems to be inversely correlated with the constraint at the level of the marginals. It is, therefore, worth demonstrating that in stark contrast to most other techniques, this measure of tightness can be high whether or not there are skewed marginals. While, of course, total homogeneity does pose the difficulty of unmeasurable tightness (since there is no possible distribution of alternative macrostates with which to compare the observed macrostate), short of this, tightness may be high despite high consensus. Unfortunately, in the small N (thermodynamic) representation, unlike the information representation of entropic constraint, we cannot decompose the overall constraint into a portion due to the marginals and a portion due to "tightness"; however, it was possible to use a different measure of consensus to capture the degree of homogeneity. We can measure consensus on the group level as the average of dyadic agreement; members of a dyad are considered to agree on a topic if they "co-reside" in the same cell of the trichotomous classification, as sketched below. This measure of consensus is equivalent to the probability that any two members picked at random will agree on any question picked at random. While this definition of consensus is not directly related in mathematical terms to the overall constraint, it clearly captures the most important marginal effects. By examining groups in terms of their consensus and tightness simultaneously, we can see whether the measure of tightness, like other measures of the interrelation of beliefs, such as the AIC measure, is necessarily inversely associated with homogeneity. This seems not to be the case; indeed, for the religious variables, the correlation between tightness and consensus is insignificantly positive (r = .144). A more informative demonstration is found in Fig. 4, which graphs groups on their consensus and tightness scores simultaneously for these religious beliefs; we can see that it is not necessarily the case that core beliefs have any particular pattern of consensus and tightness. While we found above (Table 4) that the conventional (AIC) method showed that religious groups tended to have low constraint on religious items, here we find that religious groups may have high or low tightness and tend toward greater consensus rather than less.
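(A sketch of this dyadic-agreement measure of consensus; the toy data and function name are mine.)

from itertools import combinations

def consensus(members):
    """Average dyadic agreement: the probability that two members picked
    at random give the same (trichotomized) answer to an item picked at
    random; `members` holds one response tuple per group member."""
    n_items = len(members[0])
    pairs = list(combinations(members, 2))
    agreements = sum(a[k] == b[k] for a, b in pairs for k in range(n_items))
    return agreements / (len(pairs) * n_items)

# Toy group: four members' answers to four items.
group = [(1, 1, 3, 2),
         (1, 1, 3, 2),
         (1, 2, 3, 2),
         (3, 1, 3, 1)]
print(consensus(group))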
FIG. 4. Tightness vs consensus for religious items.
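To make the dyadic-agreement definition concrete, here is a minimal sketch in Python; the group data and function name are hypothetical, not the study's code. It computes, for every pair of members and every question, whether the pair co-resides in the same response cell, and averages.

```python
# Minimal sketch: consensus as average dyadic agreement.
from itertools import combinations

def dyadic_consensus(responses):
    """Probability that two members drawn at random fall in the same
    response category on a question drawn at random. `responses` is a
    list of members, each a list of trichotomous codes per question
    (e.g., 'A' = agree, 'N' = no opinion, 'D' = disagree)."""
    n_questions = len(responses[0])
    pairs = list(combinations(range(len(responses)), 2))
    agreements = sum(responses[i][q] == responses[j][q]
                     for i, j in pairs for q in range(n_questions))
    return agreements / (len(pairs) * n_questions)

# Hypothetical four-member group answering three questions.
members = [['A', 'A', 'D'],
           ['A', 'N', 'D'],
           ['A', 'A', 'D'],
           ['D', 'A', 'D']]
print(round(dyadic_consensus(members), 3))  # 12 of 18 comparisons agree
```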
While we found above (Table 4) that the conventional (AIC) method showed religious groups tending toward low constraint on religious items, here we find that religious groups may have high or low tightness and tend toward greater consensus rather than less. While here there are no groups with high tightness and consensus over .75, this is not necessarily the case; among the political groups (not shown), there is one group with high tightness and extremely high consensus as well (the latter over .8). Thus, the methods presented here, unlike conventional ones, are sensitive to the interrelation of beliefs even when there is a great deal of homogeneity.

CONCLUSION

The Decomposition of Homogeneity and Tightness

The ability of the entropic approach to formulate a measure of constraint beyond the marginals suggests that it can successfully be used to decompose constraint into a portion due to homogeneity and a portion due to tightness, or constraint beyond the marginals. In the large sample (informational) version, this is quite easy: the constraint due to the marginals can simply be subtracted from the total constraint, and the remainder is the constraint beyond the marginals. In the small sample (thermodynamic) version, this cannot be done, owing to the quantizing of the probabilities. But there are other ways of characterizing the constraint due to the marginals. We employed one, based on dyadic coresidence, that is appropriate for nominal variables. For interval variables, an average standard deviation (as in Barton and Parsons, 1977) might be appropriate. Finally, a more rigorous though more specialized measure might be the weighted average of the constraint of all macrostates compatible with the observed marginals.
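The large-sample subtraction just described can be sketched as follows (Python). This assumes that overall constraint is measured as the shortfall of the observed entropy from its maximum, and that the independence table carries only the marginal information; the function names and data are illustrative, not the paper's own code.

```python
# Minimal sketch: informational decomposition of constraint into a
# marginal portion and a "tightness" (beyond-the-marginals) portion.
from math import log
from collections import Counter

def entropy(counts):
    """Shannon entropy (natural log) of a frequency distribution."""
    n = sum(counts)
    return -sum(c / n * log(c / n) for c in counts if c > 0)

def decompose(responses, n_cats=3):
    """`responses` is a list of members, each a tuple of category codes.
    Returns (total constraint, constraint due to marginals, tightness)."""
    k = len(responses[0])
    h_max = k * log(n_cats)  # entropy of a uniform joint table
    h_obs = entropy(Counter(map(tuple, responses)).values())
    h_indep = sum(entropy(Counter(r[i] for r in responses).values())
                  for i in range(k))  # joint entropy under independence
    total = h_max - h_obs            # overall constraint
    marginal = h_max - h_indep       # constraint due to the marginals
    tightness = h_indep - h_obs      # constraint beyond the marginals
    return total, marginal, tightness

# Hypothetical data: four members, two trichotomous items coded 0/1/2.
data = [(0, 0), (0, 0), (2, 2), (2, 2)]
print(decompose(data))  # total equals marginal plus tightness
```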
Despite the possibility of such decomposition, it is worth emphasizing that the theoretical approach guiding this paper does not imply the necessity of such a technique. The constraint beyond the marginals actually seems less likely to indicate "true constraint" than the overall constraint. Groups are very high on the overall constraint for their core beliefs but not necessarily so high on the constraint beyond the marginals (tightness). For the latter to represent the organization of core beliefs accurately, there must be a significant range of variation in these beliefs. While the entropic measures introduced here are more sensitive to small variations than conventional methods, and are therefore better able to detect tightness in core belief systems, such variation seems less likely in highly constrained belief systems, and one would not want to impute disorganization to people's beliefs merely because our measures have not induced sufficient variation in response.

Final Remarks

We have seen the methodological incoherence necessarily arising from the associationist conception of constraint in mass belief systems, and have proposed a more sociologically grounded conception of constraint. The relative improbability (or, for larger samples, the negative of the entropy) is the only theoretically reasonable measure of the degree of this constraint-in-general that is appropriate for belief spaces of any size. The computational ease of the negative entropy, the fact that it has the same asymptotic properties as the likelihood ratio, the existence of a theoretically clear exact formulation for small N situations, and its wider significance in a general approach to statistics, which we have not discussed here (see Akaike, 1983), make it a natural choice for the study of belief system constraint, to replace the correlational methods currently used.
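The small-N (thermodynamic) formulation rests on counting the microstates compatible with each observed macrostate. The sketch below adopts one plausible reading, constraint as the log-improbability of the observed macrostate when all microstates are equiprobable; this is an assumption made for illustration and not the paper's exact formula.

```python
# Minimal sketch of a thermodynamic-style constraint measure.
from math import factorial, log

def microstates(counts):
    """Number of assignments of N members to cells compatible with an
    observed macrostate (the cell counts): N! / prod(n_c!)."""
    w = factorial(sum(counts))
    for c in counts:
        w //= factorial(c)
    return w

def small_n_constraint(counts, n_cells):
    """Log-improbability of the observed macrostate, assuming (for
    illustration) that all n_cells**N microstates are equiprobable.
    Larger values mean a more improbable, more constrained macrostate."""
    n = sum(counts)
    return log(n_cells ** n / microstates(counts))

# Four members over three cells: perfectly clustered vs spread out.
print(round(small_n_constraint([4, 0, 0], 3), 3))  # high constraint
print(round(small_n_constraint([2, 1, 1], 3), 3))  # lower constraint
```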
APPENDIX

List of Questions Used

Religion

"There is only one solution to the problems of the world today, and that is Christ."
"This country would be better off if religion had a greater influence in daily life."
"Some people will be saved and others will not. It is predestined. God knew who would be saved long before we were born."
"Despite the great diversity of religious orientation which currently abound [sic], it is still possible for the sincere seeker to separate truth from illusion."

Politics

"I am convinced that working slowly for reform within the American political system is preferable to working for a revolution."
"There is not much I can do about most of the important national problems we face today."
"You can never achieve freedom within the framework of contemporary American society."
"People who have the power are out to take advantage of us."

All responses were coded Strongly Agree–Agree–No Opinion–Disagree–Strongly Disagree; responses were recoded to Agree–No Opinion–Disagree for calculation of entropic constraint and tightness.

REFERENCES

Agresti, A. (1992). "A survey of exact inference for contingency tables," Statistical Science 7, 131–153.
Akaike, H. (1983). "Statistical inference and measurement of entropy," in Scientific Inference, Data Analysis, and Robustness (G. E. P. Box, T. Leonard, and C. Wu, Eds.), pp. 165–189, Academic Press, New York.
Barton, A. H., and Parsons, R. W. (1977). "Measuring belief system structure," Public Opinion Quarterly 41, 159–180.
Borhek, J. T., and Curtis, R. F. (1975). A Sociology of Belief, John Wiley and Sons, New York.
Brooks, E. C. (1994). "The selectively political citizen? Modeling attitudes, non-attitudes, and change in 1950s public opinion," Sociological Methods and Research 22, 419–459.
Campbell, A., Converse, P. E., Miller, W. E., and Stokes, D. E. (1960). The American Voter, John Wiley and Sons, New York.
Coleman, J. S. (1964). Introduction to Mathematical Sociology, The Free Press of Glencoe, New York.
Converse, P. E. (1964). "The nature of belief systems in mass publics," in Ideology and Discontent (D. E. Apter, Ed.), pp. 206–261, Free Press, New York.
Feller, W. (1957). An Introduction to Probability Theory and Its Applications, John Wiley and Sons, New York.
Freeman, G. H., and Halton, J. H. (1951). "Note on an exact treatment of contingency, goodness of fit and other problems of significance," Biometrika 38, 141–149.
Gil, M. A., Perez, R., and Martinez, I. (1986). "The mutual information estimation in the sampling with replacement," RAIRO (Recherche opérationnelle) 20, 257–268.
Goodman, L. (1970). "The multivariate analysis of qualitative data: Interactions among multiple classifications," Journal of the American Statistical Association 65, 226–256.
Goodman, L. (1971). "The analysis of multidimensional contingency tables: Stepwise procedures and direct estimation methods for building models for multiple classifications," Technometrics 13, 33–61.
Jennings, M. K. (1992). "Ideology among mass publics and political elites," Public Opinion Quarterly 56, 419–441.
Knight, K. (1985). "Ideology in the 1980 election: Ideological sophistication does matter," Journal of Politics 47, 828–853.
Kullback, S. (1959). Information Theory and Statistics, Wiley, New York.
Lane, R. E. (1962). Political Ideology, Free Press, New York.
Lieberson, S. (1969). "Measuring population diversity," American Sociological Review 34, 850–862.
Lieberson, S. (1985). Making It Count, Univ. of California Press, Berkeley.
Liebtrau, A. M. (1983). Measures of Association, Sage Publications, Newbury Park.
Luce, R. D. (1960). "The theory of selective information and some of its behavioral applications," Part I of Developments in Mathematical Psychology, Free Press, Glencoe, IL.
Magidson, J. (1981). "Qualitative variance, entropy, and correlation ratios for nominal dependent variables," Social Science Research 10, 177–194.
Martin, J. L. (1997). Power Structure and Belief Structure in 40 American Communes, Ph.D. dissertation, University of California, Berkeley.
McGill, W. J. (1954). "Multivariate information transmission," Psychometrika 19, 97–116.
Mehta, C. R., and Patel, N. R. (1983). "A network algorithm for performing Fisher's exact test in r × c contingency tables," Journal of the American Statistical Association 78, 427–434.
Miller, G. A. (1955). "Notes on the bias of information estimates," in Information Theory in Psychology (H. Quastler, Ed.), pp. 95–100, The Free Press, Glencoe.
Miller, G. A., and Madow, W. G. (1963) [1954]. "On the maximum likelihood estimate of the Shannon–Wiener measure of information," in Readings in Mathematical Psychology (R. D. Luce, R. R. Bush, and E. Galanter, Eds.), pp. 448–469, John Wiley and Sons, New York.
Nayak, T. K. (1985). "On diversity measures based on entropy functions," Communications in Statistics–Theoretical and Methodological 14, 203–215.
Nie, N. H., and Andersen, K. (1974). "Mass belief systems revisited: Political change and attitude structure," Journal of Politics 36, 540–587.
Nie, N. H., Verba, S., and Petrocik, J. R. (1976). The Changing American Voter, Harvard Univ. Press, Cambridge, MA.
Porter, T. M. (1986). The Rise of Statistical Thinking, 1820–1900, Princeton Univ. Press, Princeton, NJ.
Preuss, L. G. (1980). "A class of statistics based on the information concept," Communications in Statistics–Theoretical and Methodological A9(15), 1563–1585.
Reiser, M. (1981). "Latent trait modeling of attitude items," in Social Measurement: Current Issues (G. W. Bohrnstedt and E. F. Borgatta, Eds.), pp. 117–144, Sage, London, UK.
Riordan, J. (1978) [1958]. An Introduction to Combinatorial Analysis, Princeton Univ. Press, Princeton, NJ.
Roberts, F. S. (1984). Applied Combinatorics, Prentice-Hall, Englewood Cliffs, NJ.
Salicrú, M., Menendez, M. L., Morales, D., and Pardo, L. (1993). "Asymptotic distribution of (h, φ)-entropies," Communications in Statistics–Theoretical and Methodological 22, 2015–2031.
Sears, F. W. (1950). An Introduction to Thermodynamics, the Kinetic Theory of Gases, and Statistical Mechanics, Addison-Wesley Press, Cambridge, MA.
Senchaudhuri, P., Mehta, C. R., and Patel, N. R. (1995). "Estimating exact p values by the method of control variates or Monte Carlo rescue," Journal of the American Statistical Association 90, 640–648.
Shannon, C. E. (1963) [1948]. "The mathematical theory of communication," in The Mathematical Theory of Communication (C. E. Shannon and W. Weaver, Eds.), pp. 3–93, Univ. of Illinois Press, Urbana, IL.
Smith, T. W. (1984). "Nonattitudes: A review and evaluation," in Surveying Subjective Phenomena, Vol. II (C. F. Turner and E. Martin, Eds.), pp. 215–255, Russell Sage Foundation, New York.
Stimson, J. A. (1975). "Belief systems: Constraint, complexity, and the 1972 election," American Journal of Political Science 19, 393–417.
Wyckoff, M. L. (1987). "Measures of attitudinal consistency as indicators of ideological sophistication: A reliability and validity assessment," The Journal of Politics 49, 148–168.
Yuan, L., and Kesavan, H. K. (1997). "Bayesian estimation of Shannon entropy," Communications in Statistics: Theory and Methods 26, 139–148.
Zablocki, B. (1980). Alienation and Charisma, The Free Press, New York.