SOCIAL SCIENCE RESEARCH 5, 279-314 (1976)

Constructing Simple Theories from Propositional Inventories

OTOMAR J. BARTOS and ROBERT C. HANSON
University of Colorado
For better or worse, much of the research in the social sciences is done by researchers working independently of each other. Moreover, until quite recently, most of the data gathered have been of the correlational variety, relating two variables at a time. Consequently, those who try to make theoretical sense out of the vast amount of data often give up the task as hopeless. One way of alleviating this situation is through a propositional inventory: bringing into one volume a large number of propositions that cover similar ground. The late 1950s and the 1960s saw an upsurge in such inventories, and this trend has continued into the 1970s.¹ Yet even these inventories do not condense and organize the data sufficiently to permit one to develop theories from them. In this paper we describe and illustrate a procedure that permits one to construct simple theories from propositional inventories.
THE PROCEDURE

As we shall now show, our procedure is similar to statistical factor analysis in two important ways. First, in factor analysis each factor tells us which variables “belong together” because they all load highly on that factor; our procedure tells us which variables are “compatible” with each other in the sense of comprising a very simple deductive system. Second, the first factor in the principal components version of factor analysis represents the most condensed information one can obtain about a matrix of correlations, because this factor comes closest to being able to reproduce that matrix. Our procedure also yields highly condensed information: we shall be able to go from knowing which variables are compatible to a set of hypotheses about the relationships among them.

With Robert Burton, Ruth Hooley, Charles Huggs, James Little, P. R. Morgan, J. Robert Passmore, and Michael Thomson. We also wish to express our appreciation to the Council on Research and Creative Work at the University of Colorado for the grant-in-aid that supported part of this work. Send reprint requests to Otomar J. Bartos, Department of Sociology, University of Colorado, Boulder, CO 80302.

¹See, for example, March and Simon (1958), Berelson and Steiner (1964), Altman and McGrath (1966), Riley and Foner (1968), and Goode, Hopkins, and McClure (1971). A complete list of such inventories is given in Passmore (1974).
Copyright © 1976 by Academic Press, Inc. All rights of reproduction in any form reserved.
Yet, in spite of these similarities, our procedure differs in one all-important respect. While factor analysis presupposes that we know not only the sign but also the magnitude of relationships (i.e., that we know the coefficients of correlation), our procedure requires only that we know the sign of each relationship. Consequently it can be applied to verbal propositions that specify merely that $X_i$ influences $X_j$ positively (or negatively) without specifying the size of this influence. This, of course, renders our procedure ideally suited for something that factor analysis cannot accomplish: integrating theoretically independent research found in a variety of sources, including the propositional inventories.

Main Features

The fundamental point of departure for us is the notion that vast amounts of data can best be comprehended when the theoretical framework imposed upon the data is as simple as possible. We therefore define one of the simplest frameworks that the data might obey and then proceed to separate findings that obey it from those that do not. More specifically, we wish to group together all propositions that obey two well-known rules of derivation, rules that apply to many systems, including the graphs derived through path analysis:²

1. Rule of multiplication: If individual influences form a continuous chain such as

$$X_1 \xrightarrow{a} X_2 \xrightarrow{b} X_3 \xrightarrow{c} X_4$$

(where an arrow indicates the direction of influence and the coefficients a, b, ... indicate its sign and size), then the total influence from the beginning to the end of the chain is given by the product of the path coefficients a, b, ...:

$$X_1 \xrightarrow{abc} X_4.$$

2. Rule of addition: If there are several influences flowing from point $X_1$ to $X_2$, such as three separate paths with coefficients a, b, and c,

$$X_1 \xrightarrow{a} X_2, \qquad X_1 \xrightarrow{b} X_2, \qquad X_1 \xrightarrow{c} X_2,$$

then the total influence from $X_1$ to $X_2$ is given by the sum of the path coefficients a, b, ...:

$$X_1 \xrightarrow{a+b+c} X_2.$$
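To make the two rules concrete, here is a minimal Python sketch (Python is our choice for illustration; the paper contains no code). The helper names and the integer coefficients are hypothetical stand-ins for the unspecified a, b, c of the text.

```python
from math import prod


def chain_influence(coefficients):
    """Rule 1 (multiplication): the total influence along one continuous
    chain of influences is the product of its path coefficients."""
    return prod(coefficients)


def parallel_influence(coefficients):
    """Rule 2 (addition): the total influence carried by several separate
    paths from X1 to X2 is the sum of their path coefficients."""
    return sum(coefficients)


# Hypothetical magnitudes a = 2, b = 3, c = 4 (the paper leaves them unspecified).
print(chain_influence([2, 3, 4]))     # X1 -a-> X2 -b-> X3 -c-> X4: abc = 24
print(parallel_influence([2, 3, 4]))  # three paths X1 -> X2: a + b + c = 9
```

Negative coefficients pass through both rules unchanged, which is why the sign of a derived influence can become undetermined once positive and negative paths are added together.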
²For a discussion of the path-analytical procedures see, for example, van der Geer (1971), Chap. 12.
We should emphasize one point that is essential to understanding our procedure. Unlike some previous attempts along similar lines,³ we do not attempt to justify our procedure by examining the assumptions that the data must satisfy. Instead, our justification (if any) comes from the fact that large amounts of unrelated findings are compatible with the above two rules. In fact, the crucial point is that these two rules give us the leverage necessary to separate mutually compatible findings from those that are incompatible. We shall now examine this notion of compatibility and describe a procedure for finding the largest possible set of compatible data. We shall end our discussion of the procedure by showing what advantages accrue from having isolated a compatible subset of data.

Representation of Propositions
Verbal propositions that specify that influence flows from variable $X_i$ to variable $X_j$ can be represented in a variety of ways. One of them is a square matrix with components a, b, c, ... if the propositions specify a positive influence, -a, -b, -c, ... if they specify a negative influence, and 0 if neither positive nor negative influence is specified; the entry in row i and column j describes the influence of $X_i$ on $X_j$. Thus, for example, the following arbitrary matrix corresponds to three separate propositions (variables: 1 = High urbanization, 2 = Large families, 3 = High delinquency):

$$A = \begin{bmatrix} 0 & a & b \\ 0 & 0 & -c \\ 0 & 0 & 0 \end{bmatrix}$$
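A minimal sketch of the matrix representation, assuming each proposition has already been reduced to a (source, target, sign) triple. Only signs are stored, anticipating the +1/-1 convention adopted later in this section; the helper `proposition_matrix` is hypothetical. Rows are sources and columns are targets, as in matrix A, where a in row 1, column 2 records the influence of $X_1$ on $X_2$.

```python
def proposition_matrix(variables, propositions):
    """Build the square sign matrix: entry [i][j] is +1 or -1 when a
    proposition says variable i influences variable j positively or
    negatively, and 0 when no proposition links them (row = source)."""
    index = {name: k for k, name in enumerate(variables)}
    n = len(variables)
    matrix = [[0] * n for _ in range(n)]
    for source, target, sign in propositions:
        matrix[index[source]][index[target]] = sign
    return matrix


variables = ["High urbanization", "Large families", "High delinquency"]
propositions = [
    ("High urbanization", "Large families", +1),    # coefficient a
    ("High urbanization", "High delinquency", +1),  # coefficient b
    ("Large families", "High delinquency", -1),     # coefficient -c
]
for row in proposition_matrix(variables, propositions):
    print(row)
# [0, 1, 1]
# [0, 0, -1]
# [0, 0, 0]
```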
When one attempts to state verbally the three propositions that correspond to matrix A, one encounters a great variety of possible formats. For our purposes it is useful to utilize the following format: High (low) $X_i$ leads to high (low) $X_j$. Applying this format to two of the three propositions in A presents no problem:

High urbanization leads to large families.
High urbanization leads to high delinquency.
The third proposition, however, which specifies a negative relationship between large families and high delinquency, does pose a problem, since the words “leads to” do not allow us to distinguish between positive and negative relationships. But, as we shall discuss shortly, it is possible to use our format if we change “high delinquency” to “low delinquency”:

Large families lead to low delinquency.

³See, for example, Costner and Leik (1964). Our evaluation section provides additional comments on the differences between their approach and ours.

It is sometimes useful to use a third mode of representation, one that employs graphs. Thus, for example, matrix A corresponds to the following “causal graph”:

[Causal graph: High urbanization ($X_1$) → Large families ($X_2$), coefficient a; High urbanization ($X_1$) → High delinquency ($X_3$), coefficient b; Large families ($X_2$) - - → High delinquency ($X_3$), coefficient -c.]

Note that we use solid arrows to represent positive influences and broken-line arrows for negative influences. Finally, we can represent the very same propositions through so-called “structural equations”:⁴

$$X_2 = aX_1, \qquad X_3 = bX_1 - cX_2. \tag{1}$$

Thus we see that there are at least four isomorphic representations of a theory: verbal, graphic, through square matrices, and through structural equations.

Compatible Propositions

Let us consider why we may wish to introduce the concept of “compatibility.” In order to determine the “total” influence of an increase in urbanization $X_1$ upon an increase in delinquency $X_3$, let us derive a so-called “reduced” equation from the structural equations (1). We do this by substituting for $X_2$:

$$X_3 = bX_1 - c(aX_1) = bX_1 - caX_1 = (b - ca)X_1. \tag{2}$$
We note, parenthetically, that in deriving the “total” influence of $X_1$ upon $X_3$ we made use of Rules 1 and 2: we multiplied two influences, -c and a, and added two together, b + (-ca). Thus we see that the equations in (1) are consistent with our intention to impose upon our theory a simple framework that obeys these two rules.

⁴For the manner in which structural equations are derived from a graph see, for example, van der Geer (1971), p. 115, or Harary et al. (1965).
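The derivation of Eq. (2) can be mechanized: enumerate every path from $X_1$ to $X_3$, multiply the coefficients along each path (Rule 1), and add the path totals (Rule 2). The sketch below assumes an acyclic graph stored as nested dictionaries; the helper names are hypothetical, and the magnitudes are hypothetical values that keep the signs of Eqs. (1) fixed, so varying them shows that the sign of b - ca is not determined by the signs alone.

```python
from math import prod


def all_paths(graph, source, target, path=None):
    """Yield every directed path from source to target. The graph is a dict
    {node: {successor: coefficient}}; nodes already on the path are skipped."""
    path = [source] if path is None else path
    if source == target:
        yield path
        return
    for nxt in graph.get(source, {}):
        if nxt not in path:
            yield from all_paths(graph, nxt, target, path + [nxt])


def total_influence(graph, source, target):
    """Rule 1 along each path (product), Rule 2 across paths (sum)."""
    return sum(
        prod(graph[u][v] for u, v in zip(p, p[1:]))
        for p in all_paths(graph, source, target)
    )


def structural_equations_1(a, b, c):
    """Graph of Eqs. (1): X2 = a*X1 and X3 = b*X1 - c*X2, with a, b, c > 0."""
    return {"X1": {"X2": a, "X3": b}, "X2": {"X3": -c}}


# Same signs throughout, hypothetical magnitudes: b - ca is +, -, or 0.
for a, b, c in [(1, 3, 1), (2, 1, 1), (1, 1, 1)]:
    print((a, b, c), total_influence(structural_equations_1(a, b, c), "X1", "X3"))
# (1, 3, 1) 2
# (2, 1, 1) -1
# (1, 1, 1) 0
```

With all-positive coefficients, as in the second example of the text ($X_3 = bX_1 + cX_2$), every path product and therefore every sum is positive, which is the property the notion of compatibility builds on.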
More relevant to the present purpose, however, is the following query: If we know the sign of a relationship but not its magnitude, what can we say about “total” influences such as the one shown in Eq. (2)? The answer is that the product -ca will always be negative, no matter what (positive) quantities a and c represent. However, the sum b + (-ca) can be negative (if b < ca), positive (if b > ca), or zero (if b = ca). Consequently, unless we know the size of the influences, we cannot use our two rules to derive a new (total) relationship.

Since we intend to start from verbal propositions that specify the sign of the relationships but not their magnitude, we cannot hope to use our two rules profitably on cases such as the one we just illustrated. But this does not mean that there are no cases to which the two rules may be applied: If all influences are positive, then we get useful results even if we do not know the magnitudes. For example, suppose that we have the following three propositions:

[Causal graph: $X_1$ → $X_2$, coefficient a; $X_1$ → $X_3$, coefficient b; $X_2$ → $X_3$, coefficient c; all arrows solid.]

i.e., propositions that employ only positive relationships. Then the structural equations are

$$X_2 = aX_1, \qquad X_3 = bX_1 + cX_2,$$

and the reduced equation for $X_3$ is

$$X_3 = (b + ca)X_1.$$

Since all three coefficients, a, b, and c, are positive numbers, it follows that (b + ca) must be positive also. In general, it can be shown that if a theory can be represented by a matrix or graph in which all influences are positive, then all derived influences must be positive as well. Consequently, it is useful to have a special term for propositions that can be represented by a non-negative matrix⁵ or by a graph in which all arrows are solid (positive): we shall call such propositions mutually compatible.

⁵We speak about a “non-negative” rather than about an “all-positive” matrix because some of the components may be zero.

Reversal of Labels

Our definition of compatibility cannot be fully comprehended until we explain what we mean when we say that a set of propositions “cannot be represented by a non-negative matrix.” The crucial fact for determining compatibility is that it is possible to convert a negative influence into a positive one without changing the meaning of the proposition in any way: by changing one of the “labels.”⁶ We already saw an example of this when we changed the label “high” delinquency in

$$\text{Large families } X_2 \overset{-c}{\dashrightarrow} X_3 \text{ High delinquency}$$

into “low” delinquency, thereby changing the influence into a positive one:

$$\text{Large families } X_2 \xrightarrow{c} X_3 \text{ Low delinquency}.$$

The general principle of label reversal can be stated if we agree to use “X” for labels such as “high” or “large” and “-X” for “low” or “small.” Then, for example, the negative influence

$$X_i \dashrightarrow X_j$$

can be changed into a positive influence (with the same meaning) either by changing the label of $X_i$,

$$-X_i \longrightarrow X_j,$$

or the label of $X_j$,

$$X_i \longrightarrow -X_j.$$

Given the notion of “label reversal,” we can define an “incompatible set of propositions” as one which, once represented by a square matrix, cannot be converted into a non-negative matrix by any reversal of the labels. Matrix A given earlier is an incompatible matrix because no label reversals produce a non-negative matrix. Let us consider why. The fundamental fact to remember is that when the label of $X_i$ is reversed, all entries in the i-th row and the i-th column must change their sign. Furthermore, in order to simplify our presentation, let us agree to show all positive influences as components +1 and all negative ones as -1.⁷ Then the matrix can be written as (rows and columns: 1 = High urbanization, 2 = Large families, 3 = High delinquency)

$$A = \begin{bmatrix} 0 & 1 & 1 \\ 0 & 0 & -1 \\ 0 & 0 & 0 \end{bmatrix}$$

Suppose we start by reversing “large” families into “small.” Let us first reverse the label of the second row and change the signs of all entries in that row (rows and columns: 1 = High urbanization, 2 = Small families, 3 = High delinquency):

$$A = \begin{bmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$

⁶This, of course, is an assumption about the data, one which is satisfied if the relationship between $X_i$ and $X_j$ is either monotonic increasing or monotonic decreasing.

⁷This new convention amounts to a decision to deal with “Boolean” matrices in which 1 and 0 have a special meaning. See, for example, Harary et al. (1965), especially pp. 115 ff.
This does not complete label reversal, for we must also reverse the label and signs for the second column (rows and columns: 1 = High urbanization, 2 = Small families, 3 = High delinquency):

$$A = \begin{bmatrix} 0 & -1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$
This completes reversing the label of $X_2$, but the result is not satisfactory: we merely shifted the negative influence from one position to another; we did not change it into a positive one. The reader can amuse himself by systematically changing the labels associated with the variables $X_1$, $X_2$, and $X_3$, either singly or in combination; he will find that none of these operations renders the matrix non-negative. Consequently, we conclude that A represents incompatible propositions.
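The exhaustive check the reader is invited to perform can be sketched as follows. Reversing the labels of a set of variables multiplies entry (i, j) by -1 once for each of i and j that is reversed; `is_compatible` simply tries all $2^n$ subsets, which is feasible only for small matrices. Both helper names are hypothetical.

```python
from itertools import combinations


def reverse_labels(matrix, indices):
    """Reverse the labels of the given variables: every entry in row i and in
    column i changes sign (a diagonal entry flips twice and keeps its sign)."""
    flip = [-1 if k in indices else 1 for k in range(len(matrix))]
    return [[flip[i] * flip[j] * x for j, x in enumerate(row)]
            for i, row in enumerate(matrix)]


def is_compatible(matrix):
    """Exhaustive check: does some set of label reversals make the matrix
    non-negative? Returns (True, reversed_indices) or (False, None)."""
    n = len(matrix)
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            reversed_matrix = reverse_labels(matrix, set(subset))
            if all(x >= 0 for row in reversed_matrix for x in row):
                return True, subset
    return False, None


A = [[0, 1, 1],
     [0, 0, -1],
     [0, 0, 0]]
print(reverse_labels(A, {1}))   # reverse "large" families into "small"
# [[0, -1, 1], [0, 0, 1], [0, 0, 0]]
print(is_compatible(A))
# (False, None)
```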
Algorithm for Converting Negative Influences into Positive

Let us repeat our reasons for wanting to work with non-negative matrices: When the magnitudes of influences are unknown, we can derive new propositions only from such (non-negative) matrices. Since verbal propositions of the kind typically found in the literature specify the sign but not the magnitude of an influence, they are amenable to simple theoretical insights only if they can be represented by a non-negative matrix. Given this motivation, the question arises of what to do with matrices that cannot be converted (through label reversal) into non-negative matrices. It seems reasonable to adopt a principle often found in mathematics: If a set of propositions is incompatible, then let us find the largest possible subset that is compatible.⁸

⁸This principle is the same as that of the principal components version of factor analysis: The first factor is so defined that it explains as large a proportion of variance as possible.

We shall now outline and illustrate a simple algorithm that finds a subset of compatible propositions. Unfortunately, it is not guaranteed to find the largest subset, nor is the subset it finds necessarily unique. However, the subset it finds is almost always compatible (as will be illustrated later). The algorithm is based on the principle that one should always reverse the label that changes the most components from -1 to +1. It involves the following steps:

1. Represent the propositions by a square matrix.
2. Sum the components in each row.
3. Sum the components in each column.
4. Add the sum of the i-th row to the sum of the i-th column, calling the result $s_i$.
5. Reverse the label and the signs in the row with the largest negative $s_i$.
6. Reverse the label and the signs in the column with the largest negative $s_i$ (i.e., the column of the same variable).
7. Repeat Steps 2-6 on the altered matrix as long as this reduces the total number of negative components in the matrix.

Let us illustrate this procedure on a subset of 26 propositions taken from Goode et al. (1971). These propositions are listed in Table 1; the corresponding matrix generated in Step 1 is as follows (variables: 1 High westernization, 2 High industrialization, 3 High urbanization, 4 High social mobility, 5 High geographic mobility, 6 High family cohesion, 7 High birth rate, 8 High family size, 9 High stability of marriage, 10 High delinquency):

$$\begin{bmatrix}
0&0&0&0&0&0&-1&-1&0&0\\
0&0&0&0&0&-1&0&-1&0&0\\
0&0&0&0&1&0&-1&-1&-1&1\\
0&0&0&0&0&0&-1&-1&-1&0\\
0&0&0&0&0&-1&-1&0&-1&1\\
-1&0&0&0&0&0&1&0&1&-1\\
0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&1&0&0&0&1&1\\
0&0&0&0&0&1&1&0&0&-1\\
0&0&0&0&0&0&0&0&0&0
\end{bmatrix}$$
Step 2 involves summing each row. These sums can be represented as a vector $S_r$:

$$S_r = (-2, -2, -1, -3, -2, 0, 0, 3, 1, 0).$$

Step 3 involves summing each column:

$$S_c = (-1, 0, 0, 0, 2, -1, -2, -4, -1, 1).$$

Step 4 calls for adding the sum of row i to the sum of column i. This corresponds to adding together the above two vectors:

$$S = S_r + S_c = (-3, -2, -1, -3, 0, -1, -2, -1, 0, 1).$$

Step 5 specifies that we should reverse the label of the variable $X_i$ for which the component $s_i$ of the vector S is the largest negative number. The reason for this is that the way vector S was constructed guarantees that if we reverse the $X_i$ for which $s_i$ is negative and largest in absolute value, we will remove the largest number of negative entries from our matrix. We see that two variables vie for our attention, since two components of S have the largest negative value, $s_i = -3$:

$X_1$: high westernization
$X_4$: high social mobility.
TABLE 1
Some Illustrative Propositionsᵃ,ᵇ

a61: When family bonds are strong, the rate of acculturation is retarded (p. 142).
a67: The greater the internal family solidarity, the higher the birth rate (p. 144).
a56: Family solidarity is weakened by geographic mobility (p. 144).
a26: There is a negative correlation between industrialization and family integration (p. 144).
a6,10: Juvenile delinquency will be less likely when the solidarity of the family is high (p. 144).
a96 and a69: The cohesiveness of the family is directly correlated with the stability of the marriage pattern (p. 145).
a97: A low birth rate is directly correlated with marital instability (p. 261).
a17: The birth rate tends to decline in situations of culture contact (p. 259).
a57: A decrease in the birth rate will tend to accompany the establishment of patterns of migratory labor (p. 260).
a37: Fertility is inversely related to degree of urbanization (p. 262).
a47: Rapid social mobility in a society is associated with low fertility (p. 262).
a28: Increasing industrialization is correlated with decreasing family size (p. 314).
a59: There is a positive relationship between residential mobility and divorce (p. 189).
a9,10: The higher a country’s divorce rate, the higher its delinquency rate (p. 190).
a39: There is a direct relationship between urbanization and divorce (p. 192).
a3,10: Rural families are less likely than are urban families to produce delinquent children (p. 717).
a5,10: There is a correlation between residential mobility and delinquency of the child (p. 270).
a49: An increase in the divorce rate is correlated with a higher rate of social mobility (p. 683).
a35: Farm families are more stable in residence than urban families (p. 272).
a18: A reduction in the size of the family tends to accompany acculturation (p. 669).
a89: Marriages with few or no children have a higher divorce rate than others (p. 671).
a85: The larger the family, the more likely it is to move (p. 671).
a8,10: Children from large families are more likely to be delinquent than are children from small families (p. 671).
a48: The greater the social mobility of a family, the smaller the size of the family (p. 674).
a38: There is a direct relation between the degree of urbanization (urban, suburban, rural) of a family and that family’s size (p. 675).

ᵃSource: Goode et al. (1971); numbers in parentheses refer to page numbers.
ᵇSymbol a_ij refers to the location of the proposition in the adjacency matrix A. Thus, for example, a61 associated with the first proposition means that this proposition is represented in the 6th row and 1st column of A.
Our procedure does not provide us with a criterion for choosing between the two alternatives, so let us take, say, the first and change “high” westernization to “low” westernization.⁹ Given this choice, Step 5 calls for multiplying the first row by -1, Step 6 for multiplying the first column by -1. Performing both operations, we obtain the following matrix (variable 1 now reads Low westernization; variables 2-10 keep their “High” labels):

$$\begin{bmatrix}
0&0&0&0&0&0&1&1&0&0\\
0&0&0&0&0&-1&0&-1&0&0\\
0&0&0&0&1&0&-1&-1&-1&1\\
0&0&0&0&0&0&-1&-1&-1&0\\
0&0&0&0&0&-1&-1&0&-1&1\\
1&0&0&0&0&0&1&0&1&-1\\
0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&1&0&0&0&1&1\\
0&0&0&0&0&1&1&0&0&-1\\
0&0&0&0&0&0&0&0&0&0
\end{bmatrix}$$
Note that this matrix is identical with the one preceding it except for the first row and first column: the number of negative components has been reduced by three, just as indicated by vector S. Step 7 calls for determining whether our procedure is capable of reducing the number of negative entries further. Since we just showed that the negative components of vector S indicate how many negative relations can be reversed, we simply create vector S for the new matrix: as long as it contains at least one negative entry, we can proceed. The sums of rows for the last matrix are

$$S_r = (2, -2, -1, -3, -2, 2, 0, 3, 1, 0).$$

The sums of columns yield

$$S_c = (1, 0, 0, 0, 2, -1, 0, -2, -1, 1).$$

And vector $S = S_r + S_c$ is

$$S = (3, -2, -1, -3, 0, 1, 0, 1, 0, 1).$$

Since S contains some negative entries, we continue applying our procedure. We could continue by reversing the label on $X_4$. But since we have explained sufficiently what needs to be done, let us consider the final matrix we obtained (variables: 1 Low westernization, 2 Low industrialization, 3 Low urbanization, 4 Low social mobility, 5 Low geographic mobility, 6 High family cohesion, 7 High birth rate, 8 High family size, 9 High stability of marriage, 10 Low delinquency):

$$\begin{bmatrix}
0&0&0&0&0&0&1&1&0&0\\
0&0&0&0&0&1&0&1&0&0\\
0&0&0&0&1&0&1&1&1&1\\
0&0&0&0&0&0&1&1&1&0\\
0&0&0&0&0&1&1&0&1&1\\
1&0&0&0&0&0&1&0&1&1\\
0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&-1&0&0&0&1&-1\\
0&0&0&0&0&1&1&0&0&1\\
0&0&0&0&0&0&0&0&0&0
\end{bmatrix} \tag{3}$$

⁹It is this lack of a criterion for deciding which of the variables to select from among those tied for first place that can produce a nonunique result.
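Steps 1-7 can be sketched in Python as follows. The sketch assumes the Step 1 matrix printed above (signs only, row = influencing variable), excludes the diagonal from $s_i$ (all diagonal entries are 0 here, so this changes nothing), and breaks ties by taking the lowest-numbered variable, which mirrors the choice of $X_1$ over $X_4$ in the text. The function names are hypothetical.

```python
def label_sums(m):
    """Steps 2-4: s_i = (sum of row i) + (sum of column i), diagonal excluded."""
    n = len(m)
    return [sum(m[i][j] + m[j][i] for j in range(n) if j != i) for i in range(n)]


def greedy_reversal(matrix):
    """Steps 5-7: reverse the label of the variable with the most negative s_i
    (ties go to the lowest index) until no s_i is negative. For a matrix of
    -1/0/+1 entries, reversing X_i changes the number of -1s by exactly s_i."""
    m = [row[:] for row in matrix]
    order = []
    while True:
        s = label_sums(m)
        i = min(range(len(m)), key=s.__getitem__)
        if s[i] >= 0:
            return m, order
        for k in range(len(m)):        # negate row i and column i
            m[i][k] = -m[i][k]         # (the diagonal entry is flipped twice,
            m[k][i] = -m[k][i]         #  so it keeps its sign)
        order.append(i)


NAMES = ["westernization", "industrialization", "urbanization",
         "social mobility", "geographic mobility", "family cohesion",
         "birth rate", "family size", "stability of marriage", "delinquency"]

# Step 1 matrix (all labels "high"); row = influencing variable, column = influenced.
B0 = [
    [ 0, 0, 0, 0, 0, 0,-1,-1, 0, 0],
    [ 0, 0, 0, 0, 0,-1, 0,-1, 0, 0],
    [ 0, 0, 0, 0, 1, 0,-1,-1,-1, 1],
    [ 0, 0, 0, 0, 0, 0,-1,-1,-1, 0],
    [ 0, 0, 0, 0, 0,-1,-1, 0,-1, 1],
    [-1, 0, 0, 0, 0, 0, 1, 0, 1,-1],
    [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [ 0, 0, 0, 0, 1, 0, 0, 0, 1, 1],
    [ 0, 0, 0, 0, 0, 1, 1, 0, 0,-1],
    [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

print(label_sums(B0))          # vector S of Step 4
final, order = greedy_reversal(B0)
print([NAMES[i] for i in order])
print([(NAMES[i], NAMES[j]) for i in range(10) for j in range(10) if final[i][j] < 0])
```

Run on this matrix, the sketch reverses westernization, social mobility, industrialization, urbanization, geographic mobility, and delinquency, in that order, and leaves exactly two -1s, both in the family-size row, consistent with the labels and the two remaining negative components of matrix (3).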
The reader should satisfy himself that matrix (3) indeed is the final matrix: further application of our algorithm will not eliminate the remaining two negative components.

Algorithm for Eliminating Incompatible Variables

Inspection of the last matrix suggests that $X_8$ (family size) is incompatible with the remaining variables: if we remove it from the set, we obtain a non-negative matrix (variables renumbered 1-9: Low westernization, Low industrialization, Low urbanization, Low social mobility, Low geographic mobility, High family cohesion, High birth rate, High stability of marriage, Low delinquency):

$$\begin{bmatrix}
0&0&0&0&0&0&1&0&0\\
0&0&0&0&0&1&0&0&0\\
0&0&0&0&1&0&1&1&1\\
0&0&0&0&0&0&1&1&0\\
0&0&0&0&0&1&1&1&1\\
1&0&0&0&0&0&1&1&1\\
0&0&0&0&0&0&0&0&0\\
0&0&0&0&0&1&1&0&1\\
0&0&0&0&0&0&0&0&0
\end{bmatrix} \tag{4}$$
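The elimination of $X_8$ can be sketched the same way. Following the procedure described in the next paragraph, each variable's row and column sums now count negative entries only, and the variable with the most negative sum is deleted (its row and its column) until no -1 remains. The starting data are matrix (3); the helper names are hypothetical, and ties would again go to the lowest index.

```python
def negative_sums(m):
    """Like Steps 2-4, but summing only the negative entries of row i and column i."""
    n = len(m)
    return [sum(min(m[i][j], 0) + min(m[j][i], 0) for j in range(n) if j != i)
            for i in range(n)]


def eliminate_incompatible(matrix, names):
    """Delete the variable with the most negative sum (its row and column)
    until the matrix contains no -1s; return the matrix, kept and removed names."""
    m = [row[:] for row in matrix]
    names = list(names)
    removed = []
    while m:
        s = negative_sums(m)
        i = min(range(len(m)), key=s.__getitem__)
        if s[i] == 0:
            break
        removed.append(names.pop(i))
        del m[i]
        for row in m:
            del row[i]
    return m, names, removed


LABELS = ["Low westernization", "Low industrialization", "Low urbanization",
          "Low social mobility", "Low geographic mobility", "High family cohesion",
          "High birth rate", "High family size", "High stability of marriage",
          "Low delinquency"]

M3 = [  # final matrix (3) after label reversal
    [0, 0, 0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 0, 0, 0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0,-1, 0, 0, 0, 1,-1],
    [0, 0, 0, 0, 0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

print(negative_sums(M3))   # vector S of the elimination step
M4, kept, removed = eliminate_incompatible(M3, LABELS)
print(removed)             # the single variable dropped
```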
We are fortunate in our example in that there is no doubt about which variable should be excluded as incompatible. However, we cannot expect this always to be that obvious, and hence an additional algorithm is called for. The one we propose is similar to the one we outlined for converting negative influences into positive ones: We apply to the matrix with the minimum number of negative components the same seven steps as before, except that now we sum the negative entries only (and eliminate the selected variable rather than reverse its label). It is not difficult to see that when we apply this procedure to the final 10 × 10 matrix (3), we indeed end by eliminating $X_8$. First we create a vector that sums all negative components in the rows of matrix (3):

$$S_r = (0, 0, 0, 0, 0, 0, 0, -2, 0, 0).$$

Second, we create a vector that contains the column sums of negative entries:

$$S_c = (0, 0, 0, 0, -1, 0, 0, 0, 0, -1).$$

Creating the sum of the two vectors, $S = S_r + S_c$, we get

$$S = (0, 0, 0, 0, -1, 0, 0, -2, 0, -1).$$

Since the largest negative entry is in the eighth position, we eliminate $X_8$. By so doing we remove the largest possible number of negative influences. And, of course, we stop this procedure as soon as we obtain a matrix without any -1s.

Vector of Labels

As stated earlier, one of the similarities between our procedure and statistical factor analysis is that both help us to determine which variables form a subset that “belongs” together. Factor analysis conveys this information by displaying factor loadings for each variable: If the loading is high, then the variable “belongs” to the factor in question. Ultimately, this information can be summarized by listing the variables that load highly on a given factor, provided that we indicate for each variable whether it is related positively or negatively to the factor. We can proceed in an analogous fashion, listing the compatible variables and associating with each a number indicating how that variable relates to all others. However, if we utilize the final labels (some of which have been reversed), then it is obvious that each compatible variable “loads” in the same way: it is positively related to (at least some of) the other variables. Consequently our “factor” looks as follows:

1. Low westernization +1
2. Low industrialization +1
3. Low urbanization +1
4. Low social mobility +1
5. Low geographic mobility +1
6. High family cohesion +1
7. High birth rate +1
8. High stability of marriage +1
9. Low delinquency +1
To distinguish our “factor” from its statistical cousin, we shall refer to the above vector of +1s as the “vector of labels.” Our vector of labels represents highly concentrated information that can be used in ways similar to those available for a statistical factor. First and foremost, it permits us to construct new concepts and to examine old ones. In this particular example we are reminded of the well-known sociological concept going by names such as Gemeinschaft, rural community, or mechanical solidarity. This being the case, we conclude that concepts such as Gemeinschaft are useful concepts, useful in a very precise sense: they can be used to generate propositions that form a very simple theory because they form a compatible set. We shall now consider how we can use a vector of labels to generate propositions.

Reproducing the Propositions
The first factor in principal components analysis has the property of reproducing the matrix of coefficients of correlation (on which the analysis is based) as closely as possible. What this means can be illustrated by an example. Suppose that the first factor has components (a, b, c); then we can multiply the vector by itself in the following manner:

$$\begin{bmatrix} a \\ b \\ c \end{bmatrix} \begin{bmatrix} a & b & c \end{bmatrix} = \begin{bmatrix} a^2 & ab & ac \\ ba & b^2 & bc \\ ca & cb & c^2 \end{bmatrix}$$

The resulting square matrix comes as close as possible to being the same as the original matrix of correlations. It is perhaps clear that when we apply the very same procedure to our vector of labels, we will always obtain a matrix that has +1 everywhere. For our example,

$$\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1&1&1&1&1&1&1&1&1 \end{bmatrix} =
\begin{bmatrix}
1&1&1&1&1&1&1&1&1\\
1&1&1&1&1&1&1&1&1\\
1&1&1&1&1&1&1&1&1\\
1&1&1&1&1&1&1&1&1\\
1&1&1&1&1&1&1&1&1\\
1&1&1&1&1&1&1&1&1\\
1&1&1&1&1&1&1&1&1\\
1&1&1&1&1&1&1&1&1\\
1&1&1&1&1&1&1&1&1
\end{bmatrix} \tag{5}$$
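The outer product used in Eq. (5) is a one-liner. The sketch below (hypothetical helper `outer`) applies it first to an illustrative vector standing in for (a, b, c) and then to the nine-component vector of labels.

```python
def outer(u, v):
    """Outer product of two vectors: entry [i][j] = u[i] * v[j]."""
    return [[x * y for y in v] for x in u]


print(outer([2, 3, 5], [2, 3, 5]))   # hypothetical loadings in place of (a, b, c)
# [[4, 6, 10], [6, 9, 15], [10, 15, 25]]

labels = [1] * 9                      # the vector of labels: nine +1s
reproduced = outer(labels, labels)    # matrix (5)
print(all(x == 1 for row in reproduced for x in row))
# True
```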
Thus we see that a vector of labels requires that we generate propositions that relate every variable to every other variable. More informally, if we have a concept such as Gemeinschaft and know that it corresponds to a vector of labels, but do not have access to the specific propositions from which the vector has been constructed, then we should assume that every compatible variable influences every other variable in the compatible set. Confronted with this conclusion, the reader is likely to be dissatisfied: It is almost certain that the propositions found in the literature will not link each variable with every other variable. This objection is quite valid, but it does not present such insurmountable difficulties as may seem at first. To begin with, we do not wish to reproduce the original findings alone; we wish to reproduce the findings together with their implications. The whole thrust of our discussion has been that we wished to work with propositions that permit us to use our two rules of deduction, i.e., with compatible propositions. Now that we have them, we might as well utilize their deductive properties. And it can be shown that if we have a matrix of compatible propositions (i.e., a non-negative matrix B), then all of the propositions, original as well as derived, will be given by the “reachability” matrix R, which is the sum of powers of B:

$$R = B + B^2 + B^3 + \cdots + B^n,$$

where n is the length of the longest chain of influences that can be constructed from B.¹⁰ It should be added that the operations of addition and multiplication in the above equation are meant to be of the Boolean kind: Any positive number is recorded as +1.¹¹ The reachability matrix R computed for our example as given in matrix (4) is (variables numbered as in (4)):

$$R = \begin{bmatrix}
0&0&0&0&0&0&1&0&0\\
1&0&0&0&0&1&1&1&1\\
1&0&0&0&0&1&1&1&1\\
1&0&0&0&0&1&1&1&1\\
1&0&0&0&0&1&1&1&1\\
1&0&0&0&0&1&1&1&1\\
0&0&0&0&0&0&0&0&0\\
1&0&0&0&0&1&1&0&1\\
0&0&0&0&0&0&0&0&0
\end{bmatrix} \tag{6}$$
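A sketch of the Boolean reachability computation, using the stopping rule of footnote 10: keep multiplying by B until a power adds no new 1s to R. The four-variable system below is hypothetical (it is not matrix (4)), and the helper names are ours. Note that a variable lying on a cycle reaches itself, so its diagonal entry in R becomes 1.

```python
def boolean_product(x, y):
    """Boolean matrix product: entry (i, j) is 1 if x[i][k] and y[k][j] are
    both positive for some k, else 0."""
    n = len(x)
    return [[1 if any(x[i][k] > 0 and y[k][j] > 0 for k in range(n)) else 0
             for j in range(n)] for i in range(n)]


def reachability(b):
    """R = B + B^2 + ... with Boolean addition, stopping at the first power
    of B that adds no new 1s to R."""
    n = len(b)
    base = [[1 if x > 0 else 0 for x in row] for row in b]
    r = [row[:] for row in base]
    power = base
    while True:
        power = boolean_product(power, base)
        updated = [[r[i][j] | power[i][j] for j in range(n)] for i in range(n)]
        if updated == r:
            return r
        r = updated


# Hypothetical compatible system: X1 -> X2 -> X3 -> X4, plus a direct X1 -> X3.
B = [[0, 1, 1, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
for row in reachability(B):
    print(row)
# [0, 1, 1, 1]
# [0, 0, 1, 1]
# [0, 0, 0, 1]
# [0, 0, 0, 0]
```

Stopping at the first power that adds nothing is safe: if $B^{n+1}$ adds no new entries, then $B^{n+2} = B^{n+1}B$ cannot add any either.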
We see that this matrix R comes much closer to the reproduced matrix (5) than does the original matrix B of (4). Nevertheless, only about 37% of the components of the above R are the same as those of the reproduced (all-positive) matrix of (5). Thus the gap between the two matrices is still quite large. Let us therefore consider a way in which this gap can be narrowed.

Partitioning the Vector of Labels

One of the reasons for the discrepancy between the matrix R derived from the original propositions and the (all-positive) matrix reproduced from the vector of labels is that our variables play different roles: Some are “inputs,” i.e., they influence other variables but are themselves not influenced; others are “throughputs,” i.e., they are both influenced by some variables and influence others; still others are “outputs,” i.e., they are influenced by some variables but do not influence others. We shall now show that if we succeed in partitioning our variables into these three types, then the matrix we reproduce from the vector of labels comes much closer to the derived matrix R.

¹⁰There is a more practical way of determining n: if $B^n$ adds new influences to R (changes some 0s to +1s) while $B^{n+1}$ does not, then n is the last power of B that needs to be considered.

¹¹See Harary et al. (1965).
We begin by partitioning our nine variables into the three sets. We shall remark on how such partitioning is arrived at shortly. Right now we simply give the result for our example:

Inputs:
2 Low industrialization
3 Low urbanization
4 Low social mobility
5 Low geographic mobility
Throughputs:
6 High family cohesion
8 High stability of marriage
1 Low westernization
Outputs:
7 High birth rate
9 Low delinquency
Notice that we not only partitioned the variables but also rearranged them to some extent. Observe also that we listed the input variables at the top, the throughput variables in the middle, and the output variables at the bottom: this ordering must always be preserved. We now reproduce the matrix from the thus rearranged and partitioned vector by following these rules:

1. If variables 1, 2, ..., m are the inputs, then the first m rows of $\hat{R}$ will have 0 in columns 1, 2, ..., m and +1 everywhere else.
2. If variables m + 1, m + 2, ..., k are the throughputs arranged in proper order (i.e., so that each throughput variable is influenced by all variables listed above it and in turn influences all variables listed below it), then rows m + 1, m + 2, ..., k will have +1 above the major diagonal and 0 everywhere else.
3. If variables k + 1, k + 2, ..., n are the outputs, then rows k + 1, k + 2, ..., n will have 0 everywhere.

Below is the matrix $\hat{R}$ resulting from the application of these rules:
Order of rows and columns: 2 Low industrialization, 3 Low urbanization, 4 Low social mobility, 5 Low geographic mobility, 6 High family cohesion, 8 High stability of marriage, 1 Low westernization, 7 High birth rate, 9 Low delinquency.

$$\hat{R} = \begin{bmatrix}
0&0&0&0&1&1&1&1&1\\
0&0&0&0&1&1&1&1&1\\
0&0&0&0&1&1&1&1&1\\
0&0&0&0&1&1&1&1&1\\
0&0&0&0&\boxed{0}&1&1&1&1\\
0&0&0&0&\boxed{0}&0&1&1&1\\
0&0&0&0&0&0&0&1&\boxed{1}\\
0&0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0&0
\end{bmatrix} \tag{7}$$
The encircled entries are those that differ from the (empirically) derived matrix R given in (6): only three components are different, rendering about 96% of the components identical. This is a considerable improvement over the 37% reproducibility we dealt with earlier.
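The agreement figures can be checked directly. The sketch below copies the derived matrix R of (6) as printed, builds $\hat{R}$ from Rules 1-3 for the partition given above, and counts matching entries. The abbreviations (W = westernization, I = industrialization, U = urbanization, SM = social mobility, GM = geographic mobility, FC = family cohesion, BR = birth rate, SoM = stability of marriage, D = delinquency) and the helper names are hypothetical.

```python
ORDER_6 = ["W", "I", "U", "SM", "GM", "FC", "BR", "SoM", "D"]  # order of (4) and (6)

R6 = [  # derived reachability matrix (6), copied as printed
    [0, 0, 0, 0, 0, 0, 1, 0, 0],  # W
    [1, 0, 0, 0, 0, 1, 1, 1, 1],  # I
    [1, 0, 0, 0, 0, 1, 1, 1, 1],  # U
    [1, 0, 0, 0, 0, 1, 1, 1, 1],  # SM
    [1, 0, 0, 0, 0, 1, 1, 1, 1],  # GM
    [1, 0, 0, 0, 0, 1, 1, 1, 1],  # FC
    [0, 0, 0, 0, 0, 0, 0, 0, 0],  # BR
    [1, 0, 0, 0, 0, 1, 1, 0, 1],  # SoM
    [0, 0, 0, 0, 0, 0, 0, 0, 0],  # D
]


def reorder(matrix, names, order):
    """Permute rows and columns of matrix into the given variable order."""
    idx = [names.index(v) for v in order]
    return [[matrix[i][j] for j in idx] for i in idx]


def reproduce(n_inputs, n_throughputs, n_outputs):
    """Rules 1-3: input rows are 1 outside the input columns, throughput rows
    are 1 above the major diagonal, output rows are all 0."""
    n = n_inputs + n_throughputs + n_outputs
    rows = []
    for i in range(n):
        if i < n_inputs:
            rows.append([0 if j < n_inputs else 1 for j in range(n)])
        elif i < n_inputs + n_throughputs:
            rows.append([1 if j > i else 0 for j in range(n)])
        else:
            rows.append([0] * n)
    return rows


order = ["I", "U", "SM", "GM", "FC", "SoM", "W", "BR", "D"]   # partition of (7)
derived = reorder(R6, ORDER_6, order)
r_hat = reproduce(4, 3, 2)
diffs = [(order[i], order[j]) for i in range(9) for j in range(9)
         if derived[i][j] != r_hat[i][j]]
ones = sum(map(sum, R6))

print(f"(6) vs all-ones (5): {ones}/81 = {ones / 81:.0%} identical")
print(f"(6) vs (7): {81 - len(diffs)}/81 = {(81 - len(diffs)) / 81:.0%} identical")
print(diffs)
# (6) vs all-ones (5): 30/81 = 37% identical
# (6) vs (7): 78/81 = 96% identical
# [('FC', 'FC'), ('SoM', 'FC'), ('W', 'D')]
```

The three differing entries are exactly the three encircled positions of (7).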
In order to determine what the ordering and partitioning of the vector of labels ought to be, we note that the reproduced matrix $\hat{R}$ in (7) is partitioned into nine submatrices:

$$\hat{R} = \begin{bmatrix} 0 & I & I \\ 0 & U & I \\ 0 & 0 & 0 \end{bmatrix} \begin{matrix} \text{inputs} \\ \text{throughputs} \\ \text{outputs} \end{matrix} \tag{8}$$

where 0 refers to a submatrix composed entirely of 0s, I to a submatrix composed entirely of 1s, and U is an “upper-triangular” submatrix, i.e., one that has 1 above the major diagonal and 0 everywhere else. This fact suggests what procedure we ought to follow when searching for the appropriate partitioning of the vector of labels: We ought to derive from our original (compatible) propositions the reachability matrix R and then proceed by rearranging rows and columns until we obtain a matrix that most closely resembles $\hat{R}$ of (8). Such a matrix then immediately suggests which variables are inputs, throughputs, and outputs. Although we have not worked out a detailed paradigm that finds the optimal partitioning, some of the principles are obvious:

1. Identify the columns in R that have 0 everywhere: the variables associated with these columns are the inputs.
2. Identify the rows that have 0 everywhere: the variables associated with these rows are the outputs.
3. Arrange the remaining variables so that they form an upper-triangular matrix U: these variables are the throughputs.

A detailed paradigm would resolve ambiguous cases in such a manner that the discrepancy between the derived R and the reproduced $\hat{R}$ is as small as possible.
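The three principles can be sketched as follows, again starting from R as printed in (6), with the same hypothetical abbreviations as before. Principle 3 is the step the authors leave open; here, as an assumption of ours, we simply try every ordering of the remaining variables and keep the one with the smallest discrepancy between R and $\hat{R}$ (the first such ordering found, since ties are possible: swapping FC and SoM gives the same count). Brute force is feasible only for a handful of throughputs.

```python
from itertools import permutations

NAMES = ["W", "I", "U", "SM", "GM", "FC", "BR", "SoM", "D"]

R6 = [  # derived reachability matrix (6), copied as printed
    [0, 0, 0, 0, 0, 0, 1, 0, 0],  # W
    [1, 0, 0, 0, 0, 1, 1, 1, 1],  # I
    [1, 0, 0, 0, 0, 1, 1, 1, 1],  # U
    [1, 0, 0, 0, 0, 1, 1, 1, 1],  # SM
    [1, 0, 0, 0, 0, 1, 1, 1, 1],  # GM
    [1, 0, 0, 0, 0, 1, 1, 1, 1],  # FC
    [0, 0, 0, 0, 0, 0, 0, 0, 0],  # BR
    [1, 0, 0, 0, 0, 1, 1, 0, 1],  # SoM
    [0, 0, 0, 0, 0, 0, 0, 0, 0],  # D
]


def discrepancy(r, names, inputs, throughputs, outputs):
    """Number of entries in which r differs from the matrix R-hat that
    Rules 1-3 reproduce for the given (ordered) partition."""
    idx = [names.index(v) for v in inputs + throughputs + outputs]
    m, k = len(inputs), len(inputs) + len(throughputs)
    count = 0
    for p, i in enumerate(idx):
        for q, j in enumerate(idx):
            if p < m:
                expected = 0 if q < m else 1
            elif p < k:
                expected = 1 if q > p else 0
            else:
                expected = 0
            count += r[i][j] != expected
    return count


def partition(r, names):
    """Zero columns are inputs, zero rows are outputs, and the remaining
    variables are throughputs, ordered by brute force (our assumption)
    to minimize the discrepancy; the first best ordering wins ties."""
    n = len(r)
    inputs = [names[j] for j in range(n) if all(r[i][j] == 0 for i in range(n))]
    outputs = [names[i] for i in range(n)
               if all(x == 0 for x in r[i]) and names[i] not in inputs]
    middle = [v for v in names if v not in inputs and v not in outputs]
    best = min(permutations(middle),
               key=lambda t: discrepancy(r, names, inputs, list(t), outputs))
    return inputs, list(best), outputs


parts = partition(R6, NAMES)
print(parts)
# (['I', 'U', 'SM', 'GM'], ['FC', 'SoM', 'W'], ['BR', 'D'])
print(discrepancy(R6, NAMES, *parts))
# 3
```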
AN EXAMPLE: PROPOSITIONS ABOUT FAMILIES
Ultimately, our procedure will be evaluated not so much on the grounds of its mathematical elegance as on the grounds of usefulness: does it help the practicing sociologist to gain theoretical insights he might otherwise not have? We therefore turn to a fairly large set of propositions to demonstrate that our procedure indeed permits some theoretical insights that are of considerable practical importance.

Selection and Codification of the Propositions
We decided to work with Goode's (1971) inventory of propositions relating to the institution of the family. We chose this inventory because it seemed ready-made for our procedure. It dealt with a relatively homogeneous area of inquiry, and it contained a large number of propositions. The propositions were also formulated so that, in most cases, we could cast them into the mold required by our procedure. The actual selection and codification of the propositions was done by one of us¹² together with seven graduate students in sociology. The book was divided into eight approximately equal blocks, and one person was assigned to each block. Within each block every fourth proposition was considered, thus securing a systematic sample of the entire book. Having identified one-fourth of the propositions within his block, each researcher then attempted to transform them into a standard format. This format involved the labels of the two variables and the sign of the relationship. Thus, for example, the proposition

The larger the family, the lower the mobility aspirations of the children.

was recorded as

Size of family/upward mobility aspiration of the children (-).

Slightly more than 20% of the selected propositions proved uncodable in this manner because they were either too complex or too ambiguous. The next step was to prepare a uniform dictionary of variable labels, thus removing possible redundancy of terminology.¹³ This proved to be a time-consuming task, involving several sessions in which all eight researchers strove to arrive at a common terminology. The final product was a list of variable labels, each having an abbreviated name for the purposes of computer analysis. Thus, for example, "size of family" became FAMSIZ. Likewise, "upward mobility aspirations of children" became one of several indicators of a variable called "achievement motivation," ACHMOT. When this step was completed, we discovered that we still had too many variables. Even with careful programming, our computer could not handle more than roughly a 200 × 200 matrix. We therefore eliminated those variables that were involved in only a small number of propositions. We ended with a total of 193 variables.
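The standard format and the dictionary step lend themselves to a small amount of code. The following Python sketch is our illustration, not the authors' program. The one-entry dictionary, the regular expression, and the minimum-count cutoff are assumptions. The sign is written with an ASCII hyphen, as in the example above.

```python
import re
from collections import Counter

# Hypothetical excerpt of the uniform dictionary: verbal label -> abbreviated name.
DICTIONARY = {
    "size of family": "FAMSIZ",
    "upward mobility aspiration of the children": "ACHMOT",
}

# Standard format "cause/effect (sign)", where the sign is + or -.
FORMAT = re.compile(r"^(?P<cause>[^/]+)/(?P<effect>.+?)\s*\((?P<sign>[+-])\)$")


def codify(record):
    """Return (cause, effect, +1 or -1), or None if the record is uncodable."""
    m = FORMAT.match(record.strip())
    if m is None:
        return None
    cause = DICTIONARY.get(m["cause"].strip().lower())
    effect = DICTIONARY.get(m["effect"].strip().lower())
    if cause is None or effect is None:
        return None
    return cause, effect, 1 if m["sign"] == "+" else -1


def drop_rare(propositions, min_count):
    """Single pass: keep only propositions whose two variables each occur in
    at least min_count propositions."""
    counts = Counter(v for cause, effect, _ in propositions for v in (cause, effect))
    return [p for p in propositions
            if counts[p[0]] >= min_count and counts[p[1]] >= min_count]


print(codify("Size of family/upward mobility aspiration of the children (-)"))
# ('FAMSIZ', 'ACHMOT', -1)
```

Records that do not fit the format, or whose labels are missing from the dictionary, come back as None. These records correspond to the uncodable propositions mentioned above.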
These variables were then subjected to our procedure.

¹² Robert C. Hanson.
¹³ We found that a large number of the propositions were redundant.
¹⁴ We are indebted to Mr. Zeke Little of the Institute of Behavioral Science at the University of Colorado for writing the computer program and applying it to our data.

Application of the Procedure

The paradigm for converting negative influences into positive ones (described above) was implemented as a computer program and applied to 525 propositions connecting 193 variables.¹⁴ The paradigm reversed the labels of 67
TABLE 2
Vector of Labels Resulting from an Analysis of Propositions Listed in Goode et al. (1971)ᵃ

Society
  Inputs. Society that: 72 has money economy; 156 is individualistic; 87 is short of land; 15 is short of food supply; 3 has high population density; 132 has high migration; 193 has high social mobility.
  Throughputs. Tends to be a society which: 209 is urbanized; 179 is stratified; 133 is geographically mobile; *110 has disintegrating kinshipᵇ; 105 promotes woman's independence; 88 has low fertility; 211 has high warfare; 124 leaves children free to choose their mates.
  Outputs. And a society which: 65 has a high divorce rate; (95 has high illegitimacy); (219 does not punish adultery); (218 approves adultery); (216 expects adultery); (217 has frequent extramarital sexual relations); (186 views sex permissively); (174 has frequent remarriage); (107 has frequent interclass marriages); (159 has nonmonogamous premarital sexual relations); (61 is socially disorganized); (94 has high homicide rate); (51 has high delinquency); (56 has adults who do not depend on their family); (91 has large generation gap); (75 does not respect their elders); (54 has democratically structured family).
  However: (143 has pastoral economy)ᶜ, N = 5; (158 does not approve of premarital sex), N = 1; (152 assigns political importance to the family), N = 2; (111 assigns political importance to lineage), N = 2; (112 has stable kinship), N = 2; (32 has authoritarian fathers), N = 3.

Societal rules
  Inputs. Society that prescribes: 95 homogamy; 81 endogamy; 144 patrilocal residence.
  Throughputs. Tends to be a society that prescribes: 38 bilateralism; 136 neolocal residence.
  Outputs. And a society that prescribes: (80 class endogamy); (196 sororal polygyny); (9 genolocal residence); (37 avuncular residence); (206 nonunilocal residence); (84 family ritual).
  However: 128 matrilocal residence.

Families
  Inputs. Families that: 199 are suburban; 137 are childless; 64 have a low division of labor; 192 are socially mobile; 151 are not politically integrated.
  Throughputs. Tend to be families which: 82 are individuated; 173 are not religious; (90 are flexible in childrearing).
  Outputs. And families which: 189 have sibling rivalry; (106 do not have in-law problems).
  However: 96 do not own their residence, N = 17; 162 are black, N = 76; 170 are Catholic, N = 74; (20 have children of widely differing ages), N = 5.

Parents
  Inputs. Parents who: 18 are young; 57 do not want many children; 182 have high status; 69 are economically independent; 198 believe that feelings are important.
  Throughputs. Tend to be parents who: *83 have low family cohesion; *86 have an unstable family; 207 have unplanned pregnancies; *67 have short marriage; 8 do not share activities; 6 put academic pressure on their children; (47 disagree on child-rearing); (169 reject their children).
  Outputs. And parents who: 77 are emotionally disturbed; (185 have infrequent sexual relations); (121 have low desertion rate); (42 neglect their children); (16 do not express affection).
  Column not legible in the scan: (2 are absent from home); (46 fight with each other); (126 have high sexual drive); (103 train their children to be independent); (102 feel inadequate).
  However: (27 do not experience anxiety), N = 1; (28 do not assist their children financially), N = 1; (188 emphasize sex-role differentiation), N = 2; (158 disapprove of premarital sexual relations), N = 1; (122 have frequent sexual relations), N = 1; (59 are deviant), N = 5; (99 have an illegitimate child), N = 3; (148 bring up their children nonpermissively), N = 5.

Father
  Inputs. Father who: (200 does not have a higher status than his wife); (157 does not have low prestige in his family); (165 is cold and dominating); (1 is frequently absent from his home).
  Throughputs. Tends to be a father who: (13 has good relationship with his son).
  Outputs. And a father who: (12 is not affectionate); (78 has emotional problems); (166 rejects his children); (41 participates in child care).

Mother
  Inputs. Mother who: 79 has emotional problems; 23 is aggressive towards her children; (167 rejects her children); (163 rebels against authority).
  Throughputs. Tends to be a mother who: 15 does not have a warm relationship with her children; 29 does not like her role; (48 rears her children inconsistently).
  Outputs. And a mother who: (89 does not value fertility); (11 if unwed, readily offers her child for adoption); (39 breastfeeds her children for a long time); *(215 is satisfied with her work).
  Column not legible in the scan: (164 rejects her children); (14 has low regard for her husband).

Child
  Inputs. A child who: 210 is an urban-area girl; 184 is a male; 142 is overprotected; 140 is an only child; 17 is the oldest child; 203 is exposed to accelerated toilet training; (187 is confused about his/her sex role); (160 is given reason for punishment).
  Throughputs. Tends to be a child who: (130, 131 is mentally ill); *(50 is delinquent); (26 will become an alcoholic); (24 is not verbally aggressive); (58 matures slowly); (92, 93 is homosexual); (118 is manic depressive); (178 is schizophrenic); (127 is masochistic); (138 is obese); (139 has Oedipal complex); (150 has low motor skills); (195 is socially incompetent); (197 is not adjusted to his stepparents); (147 is not imprisoned).
  Outputs. And a child who: 25 is aggressive; (43 is outgoing); (45 is nonconformist); (49 is creative); (213 adjusts to work).

Block and column not legible in the scan: (177 is a schizophrenic); (35 is authoritarian); (108 has low intelligence); knows her children's friends (index number illegible); (76 is frequently absent); (71 is economically important); (183 is lower class); (9 has low education).

ᵃ The numbers preceding each label refer to the index number of the variable.
ᵇ The labels preceded by an * are similar to labels in Table 3.
ᶜ The labels in parentheses are involved in six or fewer propositions. N refers to the actual number of propositions in which the label is involved.
variables and reduced the number of variables involved in one or more negative propositions to 89. At that point our paradigm for eliminating incompatible variables was computerized and applied to the subset of 89 "negatively linked" variables.¹⁵ The net result was that 37 variables were identified as "incompatible." Table 2 lists the 156 variables that our procedure identified as "compatible," and Table 3 lists the 37 variables identified as "incompatible."

Two comments are in order concerning Table 2. First, note that we have divided the table into three columns. Each column corresponds to one of the three time-classifications we discussed earlier: inputs, throughputs, and outputs. We followed simple rules for this classification. Variables that served as causes but not as effects were listed as "inputs" (first column). Those that served as effects but not as causes were listed as "outputs" (third column). All the remaining variables were classified as "throughputs" (second column). We made no attempt to order the variables within the "throughput" column, even though our preceding discussion suggests that this should be done. Second, we divided the table into seven broad blocks of rows, each block containing labels referring to the same entity: society, societal rules, families, parents, father, mother, child.

Table 3 lists the variables that our procedure separated from those of Table 2 as being incompatible. Note that we used the same blocks of rows as in Table 2, but that we made no attempt to classify these variables into three columns. The reason is that we did not attempt to determine whether the variables of Table 3 are compatible with each other. Division into inputs, throughputs, and outputs makes sense in our system only if the propositions are compatible. Such a division for Table 3 would therefore be premature.

Is the Compatible Set Truly Compatible?
Given the results of Tables 2 and 3, one naturally wants to evaluate them. In particular, our procedure claims to separate variables that are compatible from those that are not. We might therefore ask whether it has succeeded in doing so in our example. Are there some labels in Table 2 that should not be there? Our procedure identified some labels as "compatible" which some might view as incompatible. For example, is "having money economy" really compatible with "having pastoral economy"? Is "being socially mobile" compatible with "being black"? Or "having high status" with "having an illegitimate child"? We searched through Table 2 and isolated labels that seemed to us incompatible. We listed them at the bottom of each block of rows, under the heading "However."

¹⁵ This program was written and applied by O. J. Bartos.
TABLE 3
Variables Found Incompatible with Those in Table 2

Societies which: *113 de-emphasize kinshipᵃ; *109 have noncohesive kinship; *114 have small kinship.
Families that: 4 are assimilated to majority culture; 208 are urbanized; 85 are large; 129 are matrilinear; (146 are patrilinear)ᵇ; (134 are monogamous).
Parents who: 161 do not punish their children; (31 are not authoritarian); (60 do not discipline their children); (204 do not toilet train their children rigidly); 74 are well educated; *123 have unstable marriage; 21 married late; *119 are poorly adjusted to marriage; 175 differentiate their roles; 176 are lower class; (62 will not adjust to divorce); (120 feel that communication with spouse is unimportant).
Father who: 66 does not dominate his wife.
Mother who: *212 is gainfully employed; (33 is authoritarian).
Child who: 191 has high self-esteem; *52, 53 is delinquent; 5 is motivated to achieve; 10 is emotionally unstable; 168 rejects his parents; 55 is dependent on his mother; (3 is academically successful); (22 if a boy, is not aggressive); (36 is autoerotic); (141 is first-born); (117 if a boy, has high sex identification); (125 will get married).

ᵃ Labels marked by * are similar to some labels in Table 2.
ᵇ Labels in parentheses are involved in six or fewer propositions.
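Two stages were reported above: reversing labels to reduce the number of negative propositions, then setting aside variables still tied to negative propositions. Both can be sketched in Python. The greedy rules here are our assumption for illustration. The authors' own paradigm is the one described earlier in the paper and may differ. Propositions are (cause, effect, sign) triples, and the variable names are hypothetical.

```python
from collections import Counter


def negatives(props):
    """Number of negative propositions."""
    return sum(1 for _, _, s in props if s < 0)


def reverse(props, var):
    """Reverse a label (e.g. 'high X' -> 'low X'): every proposition that
    involves var changes sign (twice, i.e. not at all, if var is on both ends)."""
    return [(c, e, s * (-1) ** ((c == var) + (e == var))) for c, e, s in props]


def minimize_negatives(props):
    """Stage 1 (greedy): reverse the label that removes the most negative
    propositions, and repeat until no single reversal helps."""
    variables = sorted({v for c, e, _ in props for v in (c, e)})
    reversed_labels = set()
    while True:
        best, best_props = None, props
        for v in variables:
            candidate = reverse(props, v)
            if negatives(candidate) < negatives(best_props):
                best, best_props = v, candidate
        if best is None:
            return props, reversed_labels
        props = best_props
        reversed_labels ^= {best}  # reversing a label twice restores it


def eliminate(props):
    """Stage 2 (greedy): set aside the variable involved in the most negative
    propositions until none remain; those variables are 'incompatible'."""
    dropped = []
    while negatives(props):
        counts = Counter(v for c, e, s in props if s < 0 for v in (c, e))
        worst = max(sorted(counts), key=counts.__getitem__)  # ties: alphabetical
        dropped.append(worst)
        props = [p for p in props if worst not in (p[0], p[1])]
    return props, dropped


# Hypothetical data: reversing B makes all three propositions positive.
print(minimize_negatives([("A", "B", -1), ("B", "C", -1), ("A", "C", 1)]))
# An unbalanced triangle: no reversal helps, so one variable must go.
print(eliminate([("A", "B", 1), ("B", "C", 1), ("A", "C", -1)]))
```

Each reversal strictly lowers the count of negative propositions, so the first stage always terminates. The second stage removes whatever negativity no relabeling can cure.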
Assuming that we have identified incompatible labels that our procedure missed, why did this happen? Our procedure can fail in this manner, but it is likely to do so only in certain cases. It fails only for variables that are "weakly connected," i.e., that are connected with only a few other variables. The reason is that a variable will be excluded as "incompatible" only if it participates in a proposition that is incompatible with another proposition in the set.¹⁶ The probability that any one variable will participate in a proposition that "clashes" with another proposition depends on the number of propositions to which this variable is linked, i.e., on its connectedness.¹⁷ To determine whether this reasoning is correct, we distinguished weakly connected variables from strongly connected ones. We defined a "weakly connected" variable as any variable that participates in six or fewer propositions; all other variables are "strongly connected."¹⁸ Tables 2 and 3 identify the weakly connected variables by enclosing their labels in parentheses. When one inspects Table 2, one finds that a great majority of the labels listed under "However" are indeed in parentheses. This means that most of the "wrong decisions" of our procedure involve weakly connected variables. The lesson we learn is that one should apply our procedure to strongly connected variables. If one uses weakly connected variables, one might encounter instances in which a variable is listed as compatible when, in fact, it is not.

But what about the suspect cases that are strongly connected? We identified one cluster of such labels, those describing lower-class families: families that do not own their house, are black, and are Catholic. We felt that these labels were inconsistent with the remainder of the labels, since those seem to describe middle-class families. Is it not true that many properties of the lower class are incompatible with those of the middle class?
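This argument rests on two tests. The first asks whether two propositions clash: same two labels, same direction of influence, opposite signs, as defined in footnote 16. The second asks whether a variable is weakly connected, i.e., occurs in six or fewer propositions. Both can be stated compactly. Here is a minimal Python sketch, assuming propositions are (cause, effect, sign) triples; the data are hypothetical.

```python
from collections import Counter


def clashing_pairs(props):
    """Ordered (cause, effect) pairs asserted with both a positive and a
    negative sign: same two labels, same direction, opposite signs."""
    signs = {}
    for cause, effect, sign in props:
        signs.setdefault((cause, effect), set()).add(sign)
    return sorted(pair for pair, s in signs.items() if s == {1, -1})


def connectedness(props, threshold=6):
    """Label each variable 'weak' if it occurs in at most `threshold`
    propositions and 'strong' otherwise."""
    counts = Counter(v for cause, effect, _ in props for v in (cause, effect))
    return {v: "weak" if n <= threshold else "strong" for v, n in counts.items()}


# B -> A (-) does not clash with A -> B (+): the direction differs.
props = [("A", "B", 1), ("A", "B", -1), ("B", "A", -1), ("B", "C", 1)]
print(clashing_pairs(props))            # [('A', 'B')]
print(connectedness(props, threshold=2))
```

A variable that appears in only one or two propositions has little chance of entering a clashing pair. This is why weakly connected variables can slip into the compatible set.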
¹⁶ Two propositions are incompatible with each other if (1) they involve the same two labels and (2) the same direction of influence, but (3) one specifies a positive while the other specifies a negative influence.
¹⁷ For the concept of "strong" and "weak" connectedness see, for example, Harary et al. (1965), Chap. 3.
¹⁸ Although choosing 6 as the dividing point is arbitrary, the distinction between "weakly" and "strongly" connected variables is not: most of the strongly connected variables of Tables 1 and 2 participate in about 70 propositions.

Actually, it turns out that this incompatibility is not of the kind that we wish to take into account. Note in Table 2 that both the cluster of middle-class variables (such as being suburban) and the cluster of lower-class variables occur in the input column. By definition, none of the input variables is influenced by any other variable. It follows that there are no propositions that involve two input variables! Thus these labels may strike us as incompatible, but they cannot possibly be involved in conflicting propositions. To be sure, they can have conflicting implications, but such implications
would occur under the "throughput" or "output" columns. We find no "However" in these two columns for the row block dealing with families. This means one thing: the properties of families in Table 2 (such as their being religious or having sibling rivalry) do not depend on whether they are lower class or middle class. Of course, it ought to be added that our procedure did exclude certain properties of families as incompatible. These seem to be properties of lower-class families: assimilation to majority culture, urbanization, and large size. All of these appear in Table 3.

There is one more case of seemingly incompatible labels that involves strongly connected variables: the labels 144 (patrilocal residence) and 128 (matrilocal residence). However, once again our procedure is not at fault. Specifically, our compatible set contains the following two propositions, both restated versions of propositions found in Goode et al. (1971):

Patrilocal residence leads to independence of women.
Independence of women leads to matrilocal residence.

It follows, by our Rule 1, that

Patrilocal residence leads to matrilocal residence.

Hence these two labels are indeed compatible.

Thus we conclude that our procedure can miss some incompatible labels, but that it seems to do so only in cases that involve weakly connected variables. Such failures of our procedure can be treated in two ways. First, we can limit our consideration to the strongly connected variables. Second, we can conduct further research on the weakly connected variables that seem suspect. This research would convert them into strongly connected variables that will be classified correctly by our procedure.

Is the Incompatible Set Truly Incompatible?
Just as it is possible to find labels that do not seem to belong in the compatible set of Table 2, it is possible to find some that do not seem to belong in the incompatible set of Table 3. How, for example, can one justify treating "having noncohesive kinship" as incompatible and listing it in Table 3? After all, "having disintegrating kinship" is treated as compatible and listed in Table 2. We thoroughly examined the labels of Table 3 and put an asterisk in front of those that seem similar to labels retained in Table 2. We then put an asterisk in front of those similar labels in Table 2, too. Why do we find these cases of "split" labels that seem almost identical? We investigated all such cases and found that the reason is, invariably, that the labels in Table 3 participated in "wrong" propositions while those in Table 2 participated in "right" ones. For example, we found that the societal labels of Table 3 were involved in the following propositions (derived from Goode et al.):
Societies which de-emphasize kinship tend to have stable families.
Societies with noncohesive kinship tend to have stable marriages.

These propositions seem "wrong" because they imply that as the larger kinship is dissolving, the nuclear family is becoming more stable. Dissolution of kinship is a part of the industrialization process. One would therefore be left with the proposition that industrialization tends to stabilize nuclear families. This strikes us as "wrong," precisely because it does not fit the bulk of the findings on families. Of course, our procedure allows us to do more than rely on our feeling that a proposition is not "right." It permits us to determine precisely why such propositions are "wrong." We could do this by showing that there exist other propositions, located in Goode's inventory or derivable from it, that are incompatible with them. But such a demonstration would delay our main argument too much. We therefore merely note that the compatible label in our example participates in the following proposition:

Societies with disintegrating kinships have unstable marriages.

Not only is this proposition almost the exact opposite of the above two propositions, but it also strikes us as being "right." Consequently, it is not surprising that the label "disintegrating kinship" was retained in the compatible set. Thus we see that our procedure can split very similar labels, treating one as compatible and the other as incompatible. Yet the fault does not lie with the procedure but rather with the propositions. Somehow or other, a wrong proposition found its way into our set. This happened either because of a research anomaly or because the research findings were wrongly interpreted (by Goode or by us).¹⁹ In any case, such instances again stimulate further research, research that would get at the root of the discrepancy.
¹⁹ R. C. Hanson and his team, in the course of scrutinizing a small set of propositions (approximately 200), found many errors. These were not only editorial "mistakes" but also substantive errors in which propositions reported by Goode et al. (1971) were without foundation in the source quoted by them.

Generating Plausible Propositions

We stated earlier that one of the benefits of having a vector of labels such as that given in Table 2 is that we can use it to reproduce (estimate) the reachability matrix that can be derived from the compatible propositions. In plain English, Table 2 permits us to generate a large number of plausible propositions. Let us illustrate how we would go about generating such propositions. Let us consider all propositions that can be generated from the input
label 18, "being a young parent." We begin by casting this label into the causal mold described earlier:

Being young leads to . . .

We then complete the sentence by inserting all the labels that appear in Table 2 to the right of, and at the same level as or below, the cell in which label 18 appears. For reasons just discussed, we ignore the labels in parentheses and obtain the following 10 propositions from Table 2. Being a young parent leads to:

having low family cohesion;*
having an unstable family;
having unplanned pregnancies;**
having a short marriage;**
not sharing activities;**
putting academic pressure on one's children;**
being emotionally disturbed;
not having a warm relationship with her children;*
not liking her role;**
having an aggressive child.**

We marked by * the propositions that, in point of fact, were found in Goode et al. (1971). The propositions marked by ** are those we derived from the compatible set by means of our Rules 1 and 2. Thus we can see that, for this example, reproducing propositions from a vector of labels works quite well. The reproduced propositions match those found in the literature or derived logically from them: seventy percent of the above 10 reproduced propositions are marked by * or **.

In order to clarify our procedure further, let us note which propositions should not be generated. In the first place, propositions such as "being a young parent leads to not wanting many children" are not permissible. They link together two inputs, although, by definition, an input is not caused by anything. Second, we should not generate a proposition such as "being a young parent leads to having an individuated family." Generating such propositions is inadvisable because our propositions tend to be formulated so that influence flows from larger systems to smaller systems.²⁰ The above proposition postulates a flow from a smaller system (parents) to a larger system (family) and thus is inappropriate.

²⁰ This is an arbitrary assumption on our part. Others may wish to make different assumptions, reflecting their understanding of the causal relationships.
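The generation rule can be written down directly. Effects are the labels to the right of, and at the same level as or below, the source cell. Parenthesized labels are ignored, and influence never flows from a smaller unit to a larger one. Here is a minimal Python sketch. The `Label` record and the handful of entries are our illustration. Their block and column assignments follow Table 2 as far as it can be read, and ordering within the throughput column is not modeled.

```python
from typing import NamedTuple

# Row blocks from the largest unit to the smallest, and the three columns.
BLOCKS = ["society", "societal rules", "families", "parents", "father", "mother", "child"]
COLUMNS = ["input", "throughput", "output"]


class Label(NamedTuple):
    number: int
    text: str
    block: str
    column: str
    weak: bool = False  # True if the label is printed in parentheses


def generate(source, labels):
    """Effects of `source`: non-parenthesized labels in a column to its right,
    in the same row block or a block further down (a smaller unit)."""
    b, c = BLOCKS.index(source.block), COLUMNS.index(source.column)
    return [y.text for y in labels
            if not y.weak
            and BLOCKS.index(y.block) >= b
            and COLUMNS.index(y.column) > c]


young = Label(18, "being a young parent", "parents", "input")
labels = [
    young,
    Label(57, "not wanting many children", "parents", "input"),  # input: excluded
    Label(82, "having an individuated family", "families", "throughput"),  # larger unit
    Label(83, "having low family cohesion", "parents", "throughput"),
    Label(42, "neglecting one's children", "parents", "output", weak=True),  # in parentheses
    Label(15, "not having a warm relationship with her children", "mother", "throughput"),
    Label(25, "having an aggressive child", "child", "output"),
]
for effect in generate(young, labels):
    print(f"Being a young parent leads to {effect}.")
```

The two exclusions discussed above (linking two inputs, and flowing from parents up to the family) fall out of the column and block comparisons.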
Resulting Theory

We claimed at the very beginning that our procedure helps us derive a simple theory from a large set of empirically based propositions. Table 2 provides us with the basis for such a simple theory, but it might be useful to demonstrate how this base can be used. Figure 1 displays a graph that constitutes a theory which, in our opinion, is consistent with Table 2. It would go beyond the scope of this paper to discuss how one should extract a theory from a table such as Table 2, so let us merely make a few comments. In the first place, we obtained the main variables of Fig. 1 (such as industrialization and individualism) the same way one names a factor emerging from statistical factor analysis. We searched for a name that would incorporate most of the labels in a given block of rows of Table 2. Second, we again used the two-way classification of Table 2 (by row blocks and by columns) to determine the flow of influence. In particular, we assumed that influence flows from the larger unit to the smaller and from inputs to outputs. The distinctions between different-sized units help us to draw the arrows that point downward. The distinction between inputs, throughputs, and outputs helps us to draw the arrows pointing from left to right. Consequently, the labels on the left side of Fig. 1 represent properties that may not belong to the main stream of influences but nevertheless tend to strengthen it.

As a result, our theory suggests some interesting conclusions. For example, industrialization can account for the lack of both structural and emotional stability found in our society. Yet the theory suggests that we gain an even better understanding if we invoke labels like middle class, younger generation, and parental instability.

Figure 1. [Causal graph constructed from Table 2; the recoverable node labels include industrialized societies, middle-class families, individualistic families, unstable families, and mentally ill children.]

The ideas making up this simple theory represent the result of many individual studies conducted primarily within American and Western industrialized societies. It is probable that most of the investigators were working within the so-called "functional" framework. It therefore seems somewhat ironic that our theory reads like a radical indictment of such societies: one of the "outputs" of modern industrial societies seems to be emotionally disturbed youth.

It is perhaps clear that converting Table 2 into the simple theory shown in Figure 1 is a matter of art rather than of an exact procedure. Yet it is reasonably simple to check this work of art against its base. We do not claim that the theory of Fig. 1 is the only possible theory derivable from Table 2, nor that it is the best one. We merely feel that it is consistent with the propositions of Table 2.

A SUMMARY OF THE PROCEDURE

We outlined a procedure that can convert a large set of verbal propositions into factor-like lists. These lists summarize much of the information contained in the original propositions, and in turn they can be used to construct a simple causal theory. Since the procedure is complex and involves several steps, let us summarize and briefly characterize each of the steps.

Step 1: Selection and Codification of the Propositions

The first step is time-consuming and often frustrating. It involves identifying a source of propositions (we chose Goode's 1971 inventory of propositions concerning the institution of the family). It then involves selecting the specific propositions to be considered (we chose a systematic sample of all Goode's propositions) and codifying them.
The codification is the most time-consuming process. It entails (1) assigning a code number to each variable and (2) punching an IBM card for each proposition.²¹ Each card specifies (a) the code number of the causal variable, (b) the code number of the effect variable, and (c) the sign of the causal influence, + or -. It is perhaps clear that this first step is preparatory to the application of our procedure and that, unlike the procedure itself, it is highly subjective. Not only must one decide which operationally defined variables are sufficiently similar to be treated as the same (theoretical) variable,²² but one must also convert each proposition found in the literature into a causal proposition.²³ We found that about 20% of the propositions in our sample could not be coded in this manner.

²¹ Naturally, one need not use IBM cards as inputs. If time-sharing is used, one simply types in the requisite information.
²² If one did not lump similar variables together, one would obtain mostly "weakly connected" variables, and these were shown above to be problematic.

Step 2: Finding a Non-negative Matrix
Once the propositions are codified and punched on cards, the computer program we wrote for this purpose operates on the cards.²⁴ The end product of this program is a subset of variables (a "vector of labels") that can be used in constructing a simple theory, i.e., one represented by a matrix in which all relationships are non-negative. This end product is reached in two steps. First, the number of negative relationships is reduced to a minimum. Second, the set of variables is reduced so that all negative relationships are eliminated.

Step 3: Derivations from the Theory
This third step involves partitioning the set of variables produced in Step 2 into three subsets: the inputs, the throughputs, and the outputs. This partitioning is automatic once the propositions are punched on the cards.²⁵ Still, one should not lose sight of the fact that in many cases the decisions (made in Step 1) about what is an effect and what is a cause were "intuitive." Consequently, this partitioning is only as good as the original "cause-effect" decisions. Once this partitioning is obtained, a matrix of zeros and ones is constructed in the fashion illustrated earlier in (7) and summarized in (8). This matrix is the reproduced reachability matrix R̂. It contains most of the propositions that can be "safely" made on the basis of the propositions originally punched on the cards. In other words, the 1s in the body of matrix R̂ represent all the derivations that can be made from the vector of labels produced in Step 2. Some sociologists want to work with a large number of "plausible" propositions without having to remember them all. They will be attracted to this result of our procedure: all they have to remember is the (properly partitioned) vector of labels.

²³ This requirement is not as restrictive as it may seem. If one cannot decide whether X or Y is a cause, one can always state two propositions, one for each interpretation.
²⁴ This program is easy to write using even the simplest computer language, such as BASIC. Considerable programming skill is needed only when a very large number of variables is utilized, since computer memory then has to be used very efficiently.
²⁵ Any variable that participates in propositions as a "cause" only is treated as an input variable; any variable that participates in propositions as an "effect" only is treated as an output variable; all the rest are the throughput variables.
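Step 3 can also be sketched in Python. The classification follows footnote 25 literally. The derivation is a signed version of Warshall's transitive-closure algorithm. It assumes, as the patrilocal/matrilocal example suggests, that Rule 1 multiplies the signs along a chain. Rule 2, the rule of addition, is not modeled, and a pair reached by chains of opposite sign is merely reported.

```python
from itertools import product


def classify(props):
    """Footnote 25: cause-only variables are inputs, effect-only variables
    are outputs, and all the rest are throughputs."""
    causes = {c for c, _, _ in props}
    effects = {e for _, e, _ in props}
    return causes - effects, causes & effects, effects - causes


def derive(props):
    """Signed transitive closure (Warshall). Returns {(a, b): sign} for every
    derivable proposition and the set of pairs derived with both signs."""
    sign = {(c, e): s for c, e, s in props}
    variables = sorted({v for c, e, _ in props for v in (c, e)})
    clashes = set()
    # product() varies its last element fastest, so k is the outermost loop.
    for k, i, j in product(variables, repeat=3):
        if i == j or (i, k) not in sign or (k, j) not in sign:
            continue
        s = sign[(i, k)] * sign[(k, j)]  # Rule 1, as assumed here
        if (i, j) not in sign:
            sign[(i, j)] = s
        elif sign[(i, j)] != s:
            clashes.add((i, j))
    return sign, clashes


def reachability(props, order):
    """0/1 matrix with a 1 wherever a proposition row -> column is derivable."""
    sign, _ = derive(props)
    return [[1 if (i, j) in sign else 0 for j in order] for i in order]


props = [("patrilocal residence", "independence of women", 1),
         ("independence of women", "matrilocal residence", 1)]
print(classify(props))
sign, clashes = derive(props)
print(sign[("patrilocal residence", "matrilocal residence")], clashes)  # 1 set()
order = ["patrilocal residence", "independence of women", "matrilocal residence"]
print(reachability(props, order))  # [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
```

For a non-negative set of propositions every derived sign is +, and the resulting 0/1 matrix has the input/throughput/output block pattern of (8).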
Step 4: Constructing a Causal Theory
Some sociologists are more interested in data reduction than in generating a rich set of propositions. Yet they may be dissatisfied with the condensed information contained in the vector of labels, for the same reason that some are dissatisfied with the results of factor analysis: the vector (like a factor) suppresses the causal structure of the theory. These sociologists can use the results of Steps 2 and 3 to construct a simple causal theory. They are completely on their own in doing so, though; we did not attempt to do any more than provide an illustration (see Fig. 1).
EVALUATION OF THE PROCEDURE
Now that we have summarized the procedure we propose, let us justify it and summarize its merits.

Justification
We shall attempt to justify only Steps 2 and 3 of our procedure. More specifically, we shall attempt to justify deriving new propositions by means of our two rules, the rule of multiplication and the rule of addition. Let us consider any "axiomatic" theory, i.e., a theory that consists of (1) axioms, (2) rules of derivation, and (3) theorems (i.e., propositions derived from the axioms by means of the rules). Suppose that we have verified all the axioms and wish to verify the rules of derivation; how should we proceed?

One approach is to address the rules directly and determine whether they are applicable. Costner and Leik (1964) took this tack in their pioneering analysis of when one can use our multiplication rule (called by them the "sign rule") on correlational data. If we followed this route, we would find that Steps 2 and 3 of our procedure are justified whenever Step 1 was performed correctly. That is, they are justified whenever we succeeded in distinguishing the cause from the effect in every one of the propositions we punched on our cards.²⁶ But this conclusion merely shifts the need for justification from Steps 2 and 3 to Step 1.

²⁶ This result amounts to requiring that each observed correlation between $X_1$ and $X_2$ must have the same sign as the beta weight $\beta_{12\cdot 345\ldots}$, where $X_2, X_3, X_4, X_5, \ldots$ are all the causes of $X_1$.

A different approach, one we believe more appropriate for our purposes, is indirect. Instead of attempting to show directly that our rules are valid, we try to show it indirectly: we show that the theorems implied by our rules are empirically supported. The logic of this justification is simple enough. Suppose that we have a set of axioms that have been verified empirically, and that we proceed to
derive from them all of the possible theorems. Next, we subject all these theorems to an empirical test. Suppose that every one of these theorems is supported empirically. Are we not justified in concluding that the rules we used to derive the theorems are valid? We certainly are, since this is what we mean by “valid” (“applicable”) rules: whenever applied to empirically true axioms, they yield empirically true theorems.[27] This is the basic scheme for the justification of our procedure: we wish to demonstrate that our two rules (the rule of addition and the rule of multiplication), when applied to a set of empirically true axioms, generate empirically true theorems in a substantial majority of cases. Simple as this scheme is, however, we must take care of some problems. The first problem is that when we start with a propositional inventory, we have no way of knowing which of the propositions are axioms and which are theorems. And yet, for any given set of propositions represented by a non-negative matrix, it is possible to discover which propositions can serve as axioms. Let us illustrate by returning to the 26 propositions we took more or less at random from Goode’s (1971) inventory and listed in Table 1. In order to obtain a non-negative matrix representation, let us make use of our earlier results: let us eliminate Variable 8 (family size); let us change some of the labels; and, in order to have most of the 1s above the main diagonal (which simplifies deciding which propositions can serve as axioms), let us order the labels so that the input labels come first, the throughput labels second, and the output labels last. The resulting non-negative matrix derived from Table 1 is:
[Matrix (9): rows and columns are Variables 2, 3, 4, 5, 6, 9, 1, 7, 10 (low industrialization, low urbanization, low social mobility, low geographic mobility, high family cohesion, high stability of marriage, low Westernization, high birth rate, low delinquency); cell entries not reproduced.] (9)
[27] This reasoning is quite similar to that used in, say, regression analysis. If we find that a regression equation $Y = \sum_i \beta_i X_i$ predicts $Y$ fairly accurately, we conclude that the “theory” embodied in the regression equation must be correct. Since this “theory” includes the rule that the individual influences $\beta_i X_i$ should be added, our conclusion that the “theory” is correct implies that using the rule of addition is justified.
Note that 11 components are encircled: these are the axioms of the theory. Just how we proceeded to obtain them is irrelevant for our purposes; what is important is that the encircled components indeed function as axioms for our theory: all of the theorems (the 1s that are not encircled) can be derived from the axioms. The reader may verify this by defining a matrix A that consists of the above axioms only, and then creating the reachability matrix R by raising A to successive powers and adding these powers: $R = A + A^2 + A^3 + \cdots$.
He will find that all of the theorems in Matrix (9) are contained in R (as well as many other theorems not contained in Matrix (9)). What does this example show? It shows that if we randomly choose a set of propositions from a propositional inventory, it will be possible to distinguish a set of axioms from which the remaining chosen propositions follow as theorems. Since both sets are empirically verified (both coming from a propositional inventory based on empirical research), the fact that the theorems are derived from the axioms by means of our two rules (or by means of matrix multiplication and addition, which amounts to the same thing as using the two rules) demonstrates that our rules are valid. To be sure, we did not state the axioms first and then derive the theorems and test them; but this does not matter. All that matters is that the theorems are derived from the empirically supported axioms and that they (the theorems) are themselves empirically supported. Just how confident can we be that any set of propositions chosen more or less at random from a propositional inventory will contain a large number of theorems? We feel quite confident, for two reasons. First, it is not difficult to show that the number of derivable theorems is normally larger than the number of axioms from which they were derived.[28] Second, the very fact that propositional inventories are constructed “haphazardly” (without distinguishing axioms from theorems) suggests that a theorem is as likely to be included as an axiom. As a result, one can reasonably expect that any partitioning of a propositional inventory into axioms and theorems will reveal that many of the propositions are theorems. So far we have shown that most propositional inventories will contain many theorems, and that this fact justifies the rules used to derive these theorems.
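The check the reader is invited to make above (build A from the axioms alone, then add its successive powers) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors’ own procedure: propositions are coded as a 0/1 matrix with a 1 in row i, column j when variable i is taken to affect variable j, and the arithmetic is Boolean (1 + 1 = 1), so R records only whether some chain of axioms links two variables. To keep the example concrete, it uses the counts of footnote 28 (k = 4, m = 3, n = 2) rather than Matrix (9). The bound given there expands to km + kn + mn + m(m - 1)/2, which on one natural reading counts inputs affecting throughputs and outputs, throughputs affecting outputs, and one direction for each pair of throughputs; here that is 12 + 8 + 6 + 3 = 29. The eight axioms used (every input affects the first throughput, the throughputs form a chain, and the last throughput affects both outputs) are a hypothetical choice of N - 1 axioms, not axioms taken from the paper.

```python
# Sketch: derive theorems from axioms with Boolean matrix multiplication and
# addition, R = A + A^2 + ... + A^(N-1).
# Assumption: A[i][j] == 1 means "variable i affects variable j".

def bool_matmul(x, y):
    """Boolean product: entry (i, j) is 1 if x[i][h] and y[h][j] are both 1 for some h."""
    size = len(x)
    return [[int(any(x[i][h] and y[h][j] for h in range(size)))
             for j in range(size)] for i in range(size)]

def bool_add(x, y):
    """Boolean sum: an entry is 1 if it is 1 in either matrix."""
    return [[int(a or b) for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(x, y)]

def reachability(a):
    """Sum A, A^2, ..., A^(N-1). Without mutual influences (no cycles),
    no chain of axioms is longer than N - 1 steps, so the sum can stop there."""
    r = power = a
    for _ in range(len(a) - 2):
        power = bool_matmul(power, a)   # chains one step longer
        r = bool_add(r, power)          # add newly derivable propositions
    return r

# Footnote 28's example sizes: k inputs, m throughputs, n outputs.
k, m, n = 4, 3, 2
N = k + m + n
inputs = range(k)
throughputs = list(range(k, k + m))
outputs = range(k + m, N)

# Hypothetical set of N - 1 axioms (chosen for illustration only).
axioms = [[0] * N for _ in range(N)]
for i in inputs:
    axioms[i][throughputs[0]] = 1          # each input -> first throughput
for t_from, t_to in zip(throughputs, throughputs[1:]):
    axioms[t_from][t_to] = 1               # throughputs form a chain
for o in outputs:
    axioms[throughputs[-1]][o] = 1         # last throughput -> each output

R = reachability(axioms)
num_axioms = sum(map(sum, axioms))
num_props = sum(map(sum, R))
# Footnote 28's bound, M = m(k + (m-1)/2 + n) + kn, in integer arithmetic.
M = m * k + m * (m - 1) // 2 + m * n + k * n

print(num_axioms, num_props, num_props - num_axioms, M)
```

Running it prints `8 29 21 29`: the eight axioms generate all 29 propositions, 21 of them theorems. This agrees with the conjecture in footnote 28 for this one structure; it is an illustration, not a proof of the conjecture.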
[28] A theory with k input variables, m throughput variables, and n output variables (and without mutual influences) can have, at most, M propositions, where $M = m\left(k + \frac{m-1}{2} + n\right) + kn$. We conjecture that these M propositions can be derived from as few as N - 1 axioms, where N is the total number of variables included in the theory. For example, Matrix (7) displays a total of 29 propositions. Since for this matrix k = 4, m = 3, and n = 2, we find that the maximum possible number of propositions is indeed $M = 3\left(4 + \frac{3-1}{2} + 2\right) + 4 \times 2 = 29$. Since Matrix (7) displays nine variables (nine rows and nine columns), it follows from our conjecture that the 29 propositions can be derived from 9 - 1 = 8 axioms. In other words, of the 29 propositions, eight will be axioms and 21 will be theorems.
But what if we find that some of the propositions in the inventory contradict the derivable theorems, i.e., that they are “incompatible” with our theory? Is the presence of incompatible propositions not proof that our rules cannot be applied to the data? There is considerable merit in this objection: it would seem that the larger the proportion of propositions we must discard as incompatible, the less justified we are in using our rules of derivation. At the same time, if the remaining compatible propositions form a rich set (i.e., if most of the derivable theorems are actually found in the inventory) and/or if most of the compatible propositions are supported by many pieces of independent research, then it seems reasonable to have faith in the applicability of the rules. It may be illuminating to note some analogies with factor analysis. When applying factor analysis to a matrix of correlations, we desire to extract a single factor on which all variables load highly. (In our procedure, we desire to extract a single “vector of labels,” one which includes all labels.) Frequently, however, we find that only some variables load highly on the first factor, and that additional factors need to be extracted. (In our procedure, we often find that only a subset of labels is mutually “compatible.”) This does not make our analysis useless: as long as the first factor explains a fairly large proportion of the variance in the data, we may conclude that the variables loading highly on this first factor are highly intercorrelated. (In our procedure, as long as many studies support propositions linking some of the compatible labels, we may state propositions for all of the possible links between the compatible labels. This is another way of saying that we are justified in applying our two rules to derive new propositions.) Just as the discovery that factor analysis seldom produces a single factor led to a search for criteria determining whether additional factors ought to be extracted, one ought to search for more precise criteria determining whether our method is applicable to a given set of data.
But that is beyond the scope of our paper. We are satisfied to point out that in our extended example, the number of variables we had to exclude as “incompatible” was relatively small (37 out of 193), and that most of our compatible labels were “strongly connected” (most participating in about 70 propositions). Thus we tentatively conclude that the application of our two rules to Goode’s inventory is justified.
Further Comments on the Procedure
So far, we have justified only Step 2 and, partially, Step 3 of our procedure.[29] The reader should note that we made no attempt to justify Steps 1 and 4 directly. Let us consider why.
[29] We merely justified the notion that once a non-negative matrix is obtained (Step 2), one can apply our two rules to it to derive new propositions. To justify Step 3 we would have to show on a concrete example that the derivations from the vector of labels (derivations following the pattern illustrated in Matrix (7)) correspond to a large extent to the derivations from the non-negative matrix of propositions.
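Because footnote 26 ties the validity of Steps 2 and 3 to the causal decisions of Step 1, it may help to see its condition in the smallest case. With standardized variables and exactly two causes, X2 and X3, the beta weight of X2 has the closed form $\beta_{12\cdot 3} = (r_{12} - r_{13} r_{23})/(1 - r_{23}^2)$, so the condition (the correlation r12 and the beta weight share a sign) can be checked directly. The sketch below is illustrative only: the two-cause restriction is an assumption made for brevity (with more causes one would solve the full set of normal equations), the function names are hypothetical, and the correlations are invented, not taken from Goode’s inventory.

```python
# Sketch of footnote 26's sign condition in the two-cause case.
# Assumption: X1 is caused by X2 and X3 only, and all variables are standardized.

def beta_12_3(r12, r13, r23):
    """Standardized partial regression weight of X2 when X1 is regressed on X2 and X3."""
    if abs(r23) >= 1:
        raise ValueError("X2 and X3 must not be perfectly correlated")
    return (r12 - r13 * r23) / (1 - r23 ** 2)

def sign(x):
    """Return 1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)

def sign_condition_holds(r12, r13, r23):
    """True when the observed correlation r12 has the same sign as beta_12.3."""
    return sign(r12) == sign(beta_12_3(r12, r13, r23))

# Hypothetical correlations (r12, r13, r23).
typical = (0.5, 0.4, 0.3)      # r12 and the beta weight agree in sign
suppressor = (0.1, 0.5, 0.6)   # X3 reverses the sign of X2's direct weight
for label, rs in (("typical", typical), ("suppressor", suppressor)):
    print(label, round(beta_12_3(*rs), 4), sign_condition_holds(*rs))
```

In the second case r12 is positive but the beta weight is negative, which is exactly the situation footnote 26 rules out; a correct causal ordering in Step 1 does not by itself guarantee that such suppressor patterns are absent.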
Step 1 involves a lot of intuitive decision-making. The most crucial part involves specifying the causal flow: determining, for each proposition found in the inventory, which variable is the cause and which the effect. Although we did not consider these decisions directly, our discussion (in the preceding section) suggests an ex post facto justification: if our procedure works (i.e., if only a small number of propositions are incompatible and if most compatible propositions are strongly supported by evidence), then the causal decisions made in Step 1 must have been largely correct. Matters are quite different when it comes to Step 4, the creation of a simple causal theory. We did not provide any justification because this step is not a necessary part of our procedure; it is merely an optional extension for the social scientist interested in constructing simple causal theories. In particular, the assumption that “influence flows from larger to smaller units” that we made in constructing Fig. 1 is an arbitrary one and is not needed for the application of our procedure.
Usefulness of the Procedure
At the risk of repeating ourselves, let us summarize the benefits that might accrue to the social scientist using our procedure. In the first place, although the procedure requires a considerable amount of preparatory work (Step 1), its reward is that it yields extremely condensed information in the form of a “vector of labels” (Step 2). This vector is often helpful in constructing concepts such as Gemeinschaft or bureaucracy. Second, the condensed information contained in the vector of labels can be “released” into a large number of plausible propositions (Step 3). Such propositions can be used either for the purposes of prediction and explanation or, if they offend the researcher’s intuitive understanding, for the purposes of further research. Such research is likely to be quite profitable: if the suspected propositions are supported, the research should dispel the suspicion and support the theory; if they are contradicted, one ought to raise questions about the process through which they were characterized as “compatible.”[30] Third, the vector of labels provides a theoretician with a good basis for constructing very simple causal theories (Step 4). Since the literature on this subject is plentiful, we need only reiterate that such a theory will be only as good as the skills of the theoretician who constructs it.
[30] Among the specific questions that are raised in this case are: Is the proposition in fact incompatible? If it is, which label (if any) should be removed from the compatible set? If it is not, have the labels been operationalized wrongly? Or was there an error in compiling the inventory, or in the original research?
REFERENCES
Altman, I., and McGrath, J. E. (1966), Small Group Research: A Synthesis and Critique of the Field. Holt, Rinehart & Winston, New York.
Berelson, B. R., and Steiner, G. A. (1964), Human Behavior: An Inventory of Scientific Findings. Harcourt, Brace & World, New York.
Costner, H. L., and Leik, R. K. (1964), “Deductions from ‘Axiomatic Theory’,” American Sociological Review 29, 819-835.
Goode, W. J., Hopkins, E., and McClure, H. M. (1971), Social Systems and Family Patterns: A Propositional Inventory. Bobbs-Merrill, New York.
Harary, F., Norman, R. Z., and Cartwright, D. (1965), Structural Models: An Introduction to the Theory of Directed Graphs. Wiley, New York.
March, J. G., and Simon, H. (1958), Organizations. Wiley, New York.
Passmore, J. (1974), A Selected and Annotated Bibliography in Social Sciences. Department of Sociology, University of Colorado, Boulder, Colorado (mimeographed).
van de Geer, J. P. (1971), Introduction to Multivariate Analysis for the Social Sciences. W. H. Freeman, San Francisco.