JOURNAL OF MATHEMATICAL PSYCHOLOGY 13, 89-100 (1976)

The Number of Two-Way Tables Satisfying Certain Additivity Axioms

JAMES ARBUCKLE AND JAMES LARIMER

Department of Psychology, Temple University, Philadelphia, Pennsylvania 19122
An expression is given for the number of ways of ranking the cells of an $r$ by $c$ factorial design so as to satisfy independence. For selected values of $r$ and $c$, estimates are given of the number of rankings that satisfy both independence and double cancellation, and also of the number of rankings allowing an additive representation. These results may be used in at least two ways: first, in evaluating the probability of the satisfaction of certain measurement axioms by chance; and second, in placing a lower bound on the amount of information necessary to establish the ordering of the cells of a factorial design when it is known that these axioms are satisfied.
1. INTRODUCTION

Consider an empirical relation $R$ on the elements of the product set $A \times B$, where $A = \{a_1, a_2, \ldots, a_r\}$ and $B = \{b_1, b_2, \ldots, b_c\}$ are both finite. If there exist real-valued functions $\phi_1$ on $A$ and $\phi_2$ on $B$ such that

$$(a_i, b_j)\, R\, (a_p, b_q) \quad \text{iff} \quad \phi_1(a_i) + \phi_2(b_j) \le \phi_1(a_p) + \phi_2(b_q), \tag{1}$$

then the double $(A \times B, R)$ is said to have an additive representation. Krantz, Luce, Suppes, and Tversky (1971, Chap. 9) discuss the problem of additive representations for $(A \times B, R)$ from a measurement theoretic viewpoint, giving additional references. Box and Cox (1964) and Kruskal (1965) discuss similar issues of additivity within the context of the analysis of variance for factorial experiments.

In practice, the existence of a solution to (1) is demonstrated by exhibiting $\phi_1$ and $\phi_2$, while the nonexistence of an additive representation is demonstrated by exhibiting the violation of some consequence of (1). Some useful consequences of (1), called cancellation axioms, are in the form of implications in which a certain number of inequalities of the form $(a_i, b_j)\, R\, (a_p, b_q)$ imply an additional inequality of the same form. The cancellation axioms are conveniently grouped into classes according to the number of inequalities appearing in their antecedent conditions, an axiom expressed in terms of $n$ antecedent inequalities being called an $n$th order cancellation axiom.

Copyright © 1976 by Academic Press, Inc. All rights of reproduction in any form reserved.
In this paper we attempt to find $N(r, c)$, the number of different orderings of the cells of an $r \times c$ factorial design for which an additive representation exists. We also seek $N_n(r, c)$, the number of different orderings of the cells of an $r \times c$ factorial design that satisfy all $i$th order cancellation axioms for $1 \le i \le n$. Only the case where $R$ is a simple order will be considered. An expression will be given for $N_1(r, c)$, the number of simple orderings that satisfy single cancellation (independence). A Monte Carlo procedure will be used to estimate $N_2(r, c)$ and $N(r, c)$.

There are at least two circumstances in which one would be interested in $N(r, c)$. First, suppose that one has obtained an empirical ordering on the cells of an $r \times c$ factorial design and has discovered that an additive representation exists. It is reasonable to ask how "strong," in some sense, this result is. To take a specific case, consider a $2 \times 2$ table. There are 24 ways of ordering the four cells of this design. It can be determined by examining each possibility, or from the later results of this paper, that eight of these 24 orderings, or one-third of them, allow an additive representation. By contrast, consider a $4 \times 6$ design. Among our later results is an estimate that, of the $24!$ different possible orderings of the cells of a $4 \times 6$ design, a proportion of less than $10^{-16}$ allow an additive representation. It seems as though the discovery that an additive representation exists is a much stronger result in the $4 \times 6$ case than in the $2 \times 2$ case.

It is tempting to give "strong result" a precise meaning, in terms of statistical hypothesis testing. We may say that the existence of an additive representation for an empirical ordering on the cells of a $2 \times 2$ design allows us to reject the null hypothesis that the observed ordering was drawn randomly from the population of all possible orderings at the 0.33 level of significance.
Or, after finding an additive representation in the $4 \times 6$ case, we can reject the same null hypothesis with a $10^{-16}$ level of significance. The null hypothesis of random sampling from a family of permutations has been used in a related context by Klahr (1969). It should be understood, however, that, in using $N(r, c)$ in comparison with $(rc)!$ as an overall measure of the strength of the additivity model, we are not attempting to give a model to account for empirical violations of the additivity axioms. In particular, we do not maintain that empirical departures from the additivity model arise from random sampling from the population of all orderings.

$N(r, c)$ will also be of interest in the case where an additive representation is known in advance to exist, and where it is wished to use this knowledge in determining an empirical ordering on the cells of an $r \times c$ design. In this circumstance, an economical strategy for establishing the ordering would collect only that minimum amount of data that would allow the inference of the entire ordering through the assumption of additivity. To take a simple example, once it has been ascertained that $(a_i, b_j)\, R\, (a_i, b_k)$, there is no point in asking whether $(a_p, b_j)\, R\, (a_p, b_k)$ as long as additivity is known to hold. If we restrict our attention to data collection procedures that involve a sequence of pairwise comparisons of the cells of the design, the best strategy will be the one that requires the fewest comparisons. The expression $\{\log_2 N(r, c)\}$, where $\{x\}$ is the smallest integer greater than or equal to $x$, is a theoretical lower bound on the number of pairwise comparisons necessary for picking a single ordering out of the $N(r, c)$ orderings allowing an additive representation (cf. Coombs, 1964, pp. 34-51). This is not to say that this lower bound is attainable, but only that it is something to aim for, and that if a strategy has been developed that reaches this bound, there is no point in looking for a better one. The use of the integer $\{\log_2 N(r, c)\}$ in this connection is akin to the use of the integer $\{\log_2 n!\}$ in evaluating the various strategies reviewed by Busacker and Saaty (1965) for establishing the ordering of $n$ objects through a sequence of pairwise comparisons.

We have spoken so far only of $N(r, c)$. We will also discuss $N_i(r, c)$ for three reasons. First, $N_i(r, c)$, for any $i$, is an upper bound on $N(r, c)$. We will only be able to estimate $N(r, c)$, and then only for selected small values of $r$ and $c$. On the other hand, we will give an exact expression for $N_1(r, c)$.

Second, verification of certain low-order cancellation axioms may in itself be important whether or not an additive representation is possible. For example, if each of $r$ types of student is paired with each of $c$ teaching methods, where the relation $(a_i, b_j)\, R\, (a_p, b_q)$ is read "students of type $i$ subjected to method $j$ score lower on a standard examination than do students of type $p$ subjected to method $q$," the critical question is likely to be whether or not single cancellation is satisfied (i.e., whether or not there is a single best method for all students). In such a case, one may wish to compare $N_1(r, c)$ to $(rc)!$ in order to get a measure of the "strength" of the single cancellation axiom for given $r$ and $c$.

Third, there are cases in which certain low order additivity axioms, particularly that of single cancellation, may be presumed to hold a priori.
For example, suppose that a subject is asked to make "smaller than" judgments among rectangles chosen at $r$ levels of height and $c$ levels of width. Alternatively, imagine that workers are asked to give preference judgments among compensation packages prepared by pairing each of $r$ levels of vacation time with each of $c$ salary levels. In both of these examples, the only reason for collecting data would be to check on additivity conditions beyond that of single cancellation. In evaluating the strength of the additive model in an $r \times c$ design in a case in which single cancellation is known to hold in any event, a comparison of $N(r, c)$ with $N_1(r, c)$ will be more useful than a comparison of either with $(rc)!$. If, say, most of the orderings of the cells of a $3 \times 4$ factorial design that satisfy single cancellation also allow an additive representation, then a factorial study of compensation packages involving three levels of vacation time and four levels of salary would be of little interest even though, as we later estimate, the proportion of all orderings on a $3 \times 4$ design that satisfy additivity is about 0.0001. We do not, however, suggest that empirical departures from additivity in cases like this arise from random sampling from the population of orderings satisfying single cancellation.
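The pairwise-comparison bound discussed in this section is easy to evaluate for small designs. The following is a minimal sketch rather than anything taken from the paper; the function name `comparison_lower_bound` is hypothetical. It computes the smallest integer greater than or equal to $\log_2$ of a count of candidate orderings, and applies it to the $2 \times 2$ figures quoted above (8 additive orderings out of 24).

```python
from math import factorial


def comparison_lower_bound(num_orderings: int) -> int:
    """Smallest integer >= log2(num_orderings), computed exactly on integers."""
    if num_orderings < 1:
        raise ValueError("need at least one candidate ordering")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)).
    return (num_orderings - 1).bit_length()


# 2 x 2 design: 8 additive orderings out of 4! = 24 (figures from the text).
print(comparison_lower_bound(8))             # 3
print(comparison_lower_bound(factorial(4)))  # 5
```

Under these assumptions, knowing that additivity holds lowers the bound for a $2 \times 2$ design from five comparisons to three.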
2. SINGLE CANCELLATION (INDEPENDENCE)

$N_1(r, c)$ is the number of ways of assigning the integers, $1, 2, \ldots, rc$, to the entries, $z_{ij}$, of an $r \times c$ table so that

$$z_{ij} < z_{ik} \ \text{whenever} \ z_{pj} < z_{pk}, \quad \text{and} \quad z_{ij} < z_{kj} \ \text{whenever} \ z_{iq} < z_{kq}, \tag{2}$$

where $z_{ij}$ is the integer placed in the $i$th row and $j$th column of the table. Let $\zeta$ be the set of all $r \times c$ tables that satisfy Eq. (2). Let $P_1$ be the group of operations on the elements of $\zeta$ that consist of permutations of rows and let $P_2$ be the group of operations on the elements of $\zeta$ that consist of permutations of columns. Let $P = P_2 \times P_1$. We define an equivalence relation on $\zeta$ as follows:

$$Z_1 \equiv Z_2 \quad \text{iff} \quad Z_1 = pZ_2 \ \text{for some} \ p \in P.$$

Each equivalence class contains $r!\,c!$ tables. To find the number of tables satisfying (2) we need only know the number of equivalence classes. A table will be called a regular table if it satisfies the following definition.

DEFINITION 1. An $r \times c$ table, $W$, containing the integers $1, 2, \ldots, rc$ is a regular table if

$$w_{ij} < w_{ik} \ \text{whenever} \ j < k, \quad \text{and} \quad w_{ij} < w_{kj} \ \text{whenever} \ i < k,$$
where $w_{ij}$ is the integer in the $ij$ cell of $W$. A table is a regular table if its entries increase from top to bottom and from left to right. Let $\Omega$ be the set of all regular $r \times c$ tables. Any $Z \in \zeta$ can be transformed to a regular table by some $p \in P$, and thus every equivalence class of $\zeta$ contains at least one regular table. Moreover, every equivalence class of $\zeta$ contains at most one regular table, since given a $W \in \Omega$ and $p \in P$ other than the identity operation, $pW$ is not in $\Omega$. Therefore, to find the number of equivalence classes, we need only find the number of regular tables.

Consider an arrangement of $n$ dots into rows and columns according to the following specifications: (a) the leftmost dots in each row are aligned into a single column; (b) the topmost dots in each column are aligned into a single row; (c) each dot has a dot above it unless it is in the top row and each dot has a dot to its left unless it is in the first column. Such an arrangement of dots is called a right diagram and the dots are called its nodes. In the theory of group representations there is associated with each right diagram having $n$ nodes some irreducible representation of the symmetric group $S_n$. It is known (e.g., Hamermesh, 1962, Chap. 7) that the degree of this irreducible representation is equal to the number of ways of assigning the integers, $1, 2, \ldots, n$, to the nodes of the associated right diagram subject to the restriction that the integers increase from top to bottom and from left to right. The number of regular $r \times c$ tables is thus the degree of the irreducible representation of $S_{rc}$ associated with the rectangular right diagram having $r$ rows and $c$ columns.

A useful result here is that of Frame, Robinson, and Thrall (1954), employing a device known as a hook graph (Nakayama, 1941). Let $L$ denote an arbitrary right diagram having $n$ nodes. Call the node in the $i$th row and $j$th column its $ij$ node. The $ij$ node is the corner of the $ij$ right hook that consists of the $ij$ node together with all of the nodes to the right of it and all of the nodes below it. The hook length of the $ij$ right hook, $h_{ij}$, is the number of nodes in it. The hook graph associated with $L$ is the array of positive integers obtained by placing each of the $n$ hook lengths $h_{ij}$ at the corresponding $ij$ node of the right diagram. The hook product, $H_L$, is the product of the $n$ integers $h_{ij}$ in the hook graph. Frame et al. (1954, Theorem 1) show that the degree of the irreducible representation of $S_n$ associated with $L$ is

$$K(L) = n!/H_L.$$

In the rectangular right diagram with $r$ rows and $c$ columns, written here as $L_{r,c}$, the hook length of the $ij$ node is $(r - i) + (c - j) + 1$, and the product of these hook lengths over all nodes equals $\prod_{i=1}^{r} \prod_{j=1}^{c} (i + j - 1)$. Thus, the number of regular $r \times c$ tables is

$$K(L_{r,c}) = \frac{(rc)!}{\prod_{i=1}^{r} \prod_{j=1}^{c} (i + j - 1)},$$

and the number of tables satisfying independence is

$$N_1(r, c) = r!\, c!\, K(L_{r,c}) = \frac{(rc)!}{\prod_{i=2}^{r} \prod_{j=2}^{c} (i + j - 1)}.$$
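The hook formula for the rectangular diagram can be checked numerically on very small designs. The sketch below is illustrative only and is not the authors' code; the function names are hypothetical. It counts regular tables from the hook product, multiplies by $r!\,c!$, and compares the result with a direct enumeration of every ordering that satisfies condition (2).

```python
from itertools import permutations
from math import factorial


def regular_table_count(r, c):
    """(rc)! divided by the hook product of the r x c rectangular diagram."""
    hook_product = 1
    for i in range(1, r + 1):
        for j in range(1, c + 1):
            # Nodes to the right, nodes below, plus the node itself.
            hook_product *= (c - j) + (r - i) + 1
    return factorial(r * c) // hook_product


def independence_count(r, c):
    """N_1(r, c): each regular table stands for r! c! tables."""
    return factorial(r) * factorial(c) * regular_table_count(r, c)


def satisfies_independence(z):
    """True if column order agrees across rows and row order across columns."""
    r, c = len(z), len(z[0])
    for j in range(c):
        for k in range(j + 1, c):
            if len({z[i][j] < z[i][k] for i in range(r)}) > 1:
                return False
    for i in range(r):
        for k in range(i + 1, r):
            if len({z[i][j] < z[k][j] for j in range(c)}) > 1:
                return False
    return True


def brute_force_count(r, c):
    """Enumerate all (rc)! tables; only feasible for tiny designs."""
    total = 0
    for perm in permutations(range(1, r * c + 1)):
        z = [perm[i * c:(i + 1) * c] for i in range(r)]
        total += satisfies_independence(z)
    return total


print(independence_count(2, 2), brute_force_count(2, 2))  # 8 8
print(regular_table_count(3, 3))                          # 42
```

The $2 \times 2$ result agrees with the eight orderings mentioned in the Introduction.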
$N_1(r, c)$ grows rapidly with $r$ and $c$. On the other hand, the ratio of $N_1(r, c)$ to $(rc)!$, namely $\left[\prod_{i=2}^{r} \prod_{j=2}^{c} (i + j - 1)\right]^{-1}$, shrinks rapidly as $r$ and $c$ increase, showing that single cancellation is nontrivial for fairly moderate choices of $r$ and $c$.

3. DOUBLE CANCELLATION
We do not have expressions for the number of $r$ by $c$ tables satisfying additivity axioms beyond that of independence. We shall describe here a Monte Carlo procedure for estimating $N_2(r, c)$, the number of $r$ by $c$ tables satisfying independence and double cancellation. The adaptation of this method to the estimation of the number of $r$ by $c$ tables satisfying independence together with any other set of axioms raises no essentially new points.

A couple of plausible methods for determining or estimating $N_2(r, c)$ ought to be disposed of; one of these is to list all of the $r$ by $c$ tables satisfying independence, testing the double cancellation axiom for each one. This procedure could be speeded up by noting that, if $Z_1 \in \zeta$, $Z_2 \in \zeta$, and $Z_1 \equiv Z_2$, then either $Z_1$ and $Z_2$ both satisfy double cancellation or neither does. One could thus examine all the regular $r$ by $c$ tables in order to determine the number of them that satisfy double cancellation. Multiplying this number by $r!\,c!$ would give the number of $r$ by $c$ tables satisfying independence and double cancellation. The trouble with this plan is the great number of regular $r$ by $c$ tables for reasonable sized $r$ and $c$. For example, if $r = 4$ and $c = 5$, one would have to examine about 1.7 million regular tables.

A second plan that must be discarded is a Monte Carlo procedure that consists of making several random draws from the population of all orderings of the cells of an $r$ by $c$ table, making note of the proportion of those drawn that satisfy independence and double cancellation. This proportion could be taken as an estimate of the proportion of the population satisfying both axioms, and multiplying this estimate by the size of the population of orderings, namely $(rc)!$, would give an estimate of $N_2(r, c)$. Generating a random ordering of $rc$ cells would pose no special problem. However, as the proportion of the population satisfying independence is exceedingly small for moderate choices of $r$ and $c$, a great many such tables would have to be examined in order to estimate $N_2(r, c)$. For example, about 1 out of every 500 million orderings of the cells of a 4 by 5 table satisfy independence.

The procedure we will use also involves random sampling. However, it involves random sampling from the population of regular $r$ by $c$ tables in order to estimate $\pi_2(r, c)$, the proportion of regular $r$ by $c$ tables satisfying double cancellation. $\pi_2(r, c)$ is also the proportion of $r$ by $c$ tables satisfying independence that also satisfy double cancellation, i.e., $\pi_2(r, c) = N_2(r, c)/N_1(r, c)$.
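The two figures quoted for the 4 by 5 case follow directly from the Section 2 expressions. The short sketch below is an illustration with hypothetical function names, not the authors' computation; it evaluates the product $\prod_{i=2}^{r} \prod_{j=2}^{c} (i + j - 1)$, which is the reciprocal of the proportion of orderings satisfying independence, and the number of regular tables.

```python
from math import factorial, prod


def inverse_independence_proportion(r, c):
    """(rc)! / N_1(r, c), i.e. the product of (i + j - 1) for i, j >= 2."""
    return prod(i + j - 1 for i in range(2, r + 1) for j in range(2, c + 1))


def regular_table_count(r, c):
    """Number of regular tables, N_1(r, c) / (r! c!)."""
    n1 = factorial(r * c) // inverse_independence_proportion(r, c)
    return n1 // (factorial(r) * factorial(c))


print(regular_table_count(4, 5))              # 1662804
print(inverse_independence_proportion(4, 5))  # 508032000
```

So an exhaustive check would visit roughly 1.7 million regular tables, while naive sampling from all orderings would hit an independence table about once in 500 million draws.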
Suppose that a sample of $n$ regular $r$ by $c$ tables has been drawn and that the proportion of these satisfying double cancellation is $p_2(r, c)$. Then $p_2(r, c)$ is an estimate of $\pi_2(r, c)$ and $\hat{N}_2(r, c) = N_1(r, c)\, p_2(r, c)$ is an estimate of $N_2(r, c)$.

We now describe a procedure for random sampling from the population of regular $r$ by $c$ tables. The plan is to insert the integers, $1, 2, \ldots, rc$, one at a time, into the cells of an $r$ by $c$ table in such a way that the result is guaranteed to be a regular table, and so that each regular table has the same probability of being formed. Suppose that the integers are to be inserted in descending order of magnitude, beginning with the integer $rc$. If the completed table is to be regular, no cell may be filled in until any cell immediately to its right and any cell immediately below it have already been filled in. Thus, requiring that the completed table be regular is equivalent to requiring that, at each step of its construction according to this plan, the unfilled cells form the nodes of a right diagram.

In inserting the integer $rc$, there is no choice but to put it in cell $(r, c)$. Suppose that the $k$th step has just been completed, so that the integers $rc - k + 1, rc - k + 2, \ldots, rc$ have been assigned and the problem is to find a place for the integer $rc - k$. Let $L^k$ denote the right diagram whose nodes are the $rc - k$ unfilled cells, and let $h^k_{ij}$ denote the hook number associated with the $(i, j)$ node of $L^k$. We know that the previous $k$ choices have narrowed the number of possible regular tables to $K(L^k)$. There may be several alternative choices for the placement of the integer $rc - k$. Generally, the integer $rc - k$ can be placed in cell $(s, t)$ provided that $L^k$ remains a right diagram after the removal of its $(s, t)$ node. Such a node of $L^k$ can be recognized by the fact that it lies in the last entry of its row and the last entry of its column, that is, by the fact that $h^k_{st} = 1$. Suppose that several such nodes, $(s_1, t_1), (s_2, t_2), \ldots, (s_m, t_m)$, have been found. Let $L^k_i$ denote the right diagram obtained from $L^k$ by deleting its $(s_i, t_i)$ node. Then of those $K(L^k)$ regular tables having the integers, $rc - k + 1, rc - k + 2, \ldots, rc$, in the $k$ locations already chosen, just $K(L^k_i)$ have the integer $rc - k$ in cell $(s_i, t_i)$. Thus the integer $rc - k$ should be placed in cell $(s_i, t_i)$ with probability

$$\frac{K(L^k_i)}{K(L^k)} = \frac{1}{rc - k} \prod_{u=1}^{s_i - 1} \frac{h^k_{u t_i}}{h^k_{u t_i} - 1} \prod_{v=1}^{t_i - 1} \frac{h^k_{s_i v}}{h^k_{s_i v} - 1},$$

with the understanding that the first product in this expression is equal to 1 if $s_i = 1$ and that the second product is equal to 1 if $t_i = 1$.
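The sampling plan just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' program: the function names are hypothetical, indices are zero-based, and the unfilled cells are tracked as a list of row lengths. Each removable corner is chosen with probability $K(L^k_i)/K(L^k)$, computed from the hook lengths in the corner's column above it and in its row to its left.

```python
import random
from fractions import Fraction


def hook(shape, i, j):
    """Hook length of node (i, j) in the diagram with the given row lengths."""
    arm = shape[i] - j - 1
    leg = sum(1 for k in range(i + 1, len(shape)) if shape[k] > j)
    return arm + leg + 1


def corner_probabilities(shape):
    """Map each removable corner (s, t) to its exact selection probability."""
    n = sum(shape)
    probs = {}
    for s, length in enumerate(shape):
        if length == 0:
            continue
        t = length - 1
        if hook(shape, s, t) != 1:
            continue  # not last in both its row and its column
        p = Fraction(1, n)
        for i in range(s):  # nodes above the corner, same column
            h = hook(shape, i, t)
            p *= Fraction(h, h - 1)
        for j in range(t):  # nodes left of the corner, same row
            h = hook(shape, s, j)
            p *= Fraction(h, h - 1)
        probs[(s, t)] = p
    return probs


def sample_regular_table(r, c, rng=random):
    """Fill in rc, rc - 1, ..., 1 so that every regular table is equally likely."""
    shape = [c] * r
    table = [[0] * c for _ in range(r)]
    for value in range(r * c, 0, -1):
        probs = corner_probabilities(shape)
        cells = list(probs)
        weights = [float(probs[cell]) for cell in cells]
        s, t = rng.choices(cells, weights=weights)[0]
        table[s][t] = value
        shape[s] -= 1
    return table


for row in sample_regular_table(3, 4, random.Random(1976)):
    print(row)
```

Exact fractions are used for the probabilities so that they can be checked to sum to one; floating-point weights would serve equally well for sampling.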
TABLE 1a

Number of Tables Satisfying Double Cancellation out of 1000 Randomly Sampled Regular Tables

              c
   r      3      4      5      6
   3    856
   4    648    320
   5    481    125     14
   6    327     31      3      0
   7    202     19      1      0
   8    116      2      0      0
   9     59      0      0
  10     51      1      0
  11     19      0
  12     17
  13      3
  14      7
  15      1
TABLE 1b

95% Confidence Intervals on the Proportions of Single Cancellation Tables That Also Satisfy Double Cancellation

                                       c
   r          3                4                5                6
   3    .833, .877
   4    .617, .678       .291, .350
   5    .450, .512       .105, .147       .008, .023
   6    .298, .357       .021, .044       .0006, .009      .000*, .003
   7    .178, .228       .011, .030       .000025, .006    .000*, .003
   8    .097, .137       .0002, .007      .000*, .003      .000*, .003
   9    .045, .075       .000*, .003      .000*, .003
  10    .038, .067       .000025, .006    .000*, .003
  11    .011, .030       .000*, .003
  12    .010, .027
  13    .0006, .009
  14    .003, .014
  15    .000025, .006

* Exact.
Samples of 1000 regular $r \times c$ tables were generated for each of several choices of $r$ and $c$. Table 1a shows the number of tables satisfying double cancellation in each sample. Unfilled entries in Table 1a correspond to choices of $r$ and $c$ for which no samples were taken. The estimates, $p_2(r, c)$, can be obtained by inspection of Table 1a, and it is evident that the double cancellation requirement becomes stronger as $r$ and $c$ increase. There is plenty of opportunity, so to speak, for a single cancellation table to violate double cancellation, even for the relatively small values of $r$ and $c$ for which we have estimates. Corresponding estimates of $N_2(r, c)$ are not tabulated here, although it can readily be verified that these estimates grow rapidly with $r$ and $c$. Table 1b gives 95% confidence intervals (Kendall and Stuart, 1973, Example 20.2) for $\pi_2(r, c)$.
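For the smallest design in Table 1a the sampled proportion can be compared with an exhaustive count. The sketch below is illustrative and rests on an assumption: the paper does not write out the double cancellation axiom, so the test uses the usual statement from the conjoint measurement literature (if cell $(a, x)$ precedes $(f, q)$ and $(f, p)$ precedes $(b, x)$, then $(a, p)$ precedes $(b, q)$). All function names are hypothetical.

```python
from itertools import permutations


def regular_tables(r, c):
    """Yield every regular r x c table by placing 1, 2, ..., rc in turn."""
    table = [[0] * c for _ in range(r)]
    filled = [0] * r  # number of cells already filled in each row

    def place(value):
        if value > r * c:
            yield [row[:] for row in table]
            return
        for i in range(r):
            j = filled[i]
            # The cell to the left is filled by construction; the cell
            # above must be filled as well.
            if j < c and (i == 0 or filled[i - 1] > j):
                table[i][j] = value
                filled[i] += 1
                yield from place(value + 1)
                filled[i] -= 1
                table[i][j] = 0

    yield from place(1)


def satisfies_double_cancellation(z):
    """Assumed standard form of the axiom, checked over distinct triples."""
    r, c = len(z), len(z[0])
    for a, f, b in permutations(range(r), 3):
        for x, q, p in permutations(range(c), 3):
            if z[a][x] < z[f][q] and z[f][p] < z[b][x] and z[a][p] > z[b][q]:
                return False
    return True


tables = list(regular_tables(3, 3))
passing = sum(satisfies_double_cancellation(z) for z in tables)
print(len(tables), passing)  # 42 36
```

Under this assumed form of the axiom, 36 of the 42 regular $3 \times 3$ tables pass, a proportion of about 0.857, which is close to the 856 out of 1000 reported in Table 1a.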
4. ADDITIVITY
Our procedure for estimating $N(r, c)$ is similar to that used to estimate $N_2(r, c)$. Given a sample of $n$ regular $r \times c$ tables, the proportion of these having an additive representation, $p(r, c)$, is an estimate of $\pi(r, c)$, the proportion of regular tables having an additive representation. Notice that, if $Z_1 \in \zeta$, $Z_2 \in \zeta$, and $Z_1 \equiv Z_2$, then either $Z_1$ and $Z_2$ both have an additive representation or neither does. Hence, $\pi(r, c)$ is also the proportion of single cancellation tables that have an additive representation, so that $\hat{N}(r, c) = N_1(r, c)\, p(r, c)$ is an estimate of $N(r, c)$.

Each randomly generated regular table that satisfied the double cancellation axiom in the preparation of Table 1 was examined for additivity using linear programming methods (Gale, 1960). The numbers of tables passing the additivity test are given in Table 2a. Estimates of the proportions of single cancellation tables that satisfy additivity can be read by inspection from Table 2a. Evidently, the estimates of $N(r, c)/N_1(r, c)$ decrease rapidly as $r$ and $c$ increase. This is not surprising, however, inasmuch as the earlier estimates of $N_2(r, c)/N_1(r, c)$ decreased rapidly as $r$ and $c$ increased.

A more interesting ratio is that of $N(r, c)$ to $N_2(r, c)$. Table 2b gives the numbers of observed additive tables as proportions of the numbers of observed double cancellation tables. Since those randomly generated regular tables that passed the double cancellation test can be regarded as random samples from the population of regular tables satisfying double cancellation, these proportions are estimates of the proportions of regular double cancellation tables also satisfying additivity. By an argument similar to that of Section 2 partitioning the set of double cancellation tables into equivalence classes, these observed proportions are also estimates of $N(r, c)/N_2(r, c)$, the proportions of double cancellation tables that have an additive representation. The entries of Table 2b are independent of the entries in Table 1, although the entries of Table 2a are not.

TABLE 2a

Number of Tables Satisfying Additivity out of 1000 Randomly Sampled Regular Tables

              c
   r      3      4      5      6
   3    856
   4    636    280
   5    450     97      5
   6    287     20      0      0
   7    178      7      0      0
   8     94      0      0      0
   9     39      0      0
  10     34      0      0
  11      6      0
  12      9
  13      2
  14      5
  15      1

TABLE 2b

Proportion of Sampled Double Cancellation Tables That Had an Additive Representation (a)

                     c
   r        3            4       5       6
   3      1.00
   4       .98          .88
   5       .94          .78     .36
   6       .88          .65     .00
   7       .88          .37     .00
   8       .81          .00
   9       .66
  10    (.66) .67       .00
  11    (.49) .32
  12    (.49) .53
  13    (.49) .67
  14    (.49) .71
  15    (.49) 1.00

(a) The numbers in parentheses are maximum likelihood estimates of these proportions under an assumption of monotonicity (see text). These estimates are given whenever they differ from the estimates obtained without the constraint.

Unfortunately, the samples upon which the proportions in Table 2b are based are extremely small in some cases, as can be seen from Table 1a. The resulting instability of these estimates is shown by the confidence intervals given in Table 2c. There seems to be some evidence that $N(r, c)/N_2(r, c)$ decreases with increasing $r$ and $c$. The exceptions to this pattern in the estimates appear in the last few entries of the first column of Table 2b. We do not have a significance test for these violations of monotonicity, although we note that they occur in cases of particularly small sample size. It may be of interest to estimate $N(r, c)/N_2(r, c)$ under the assumption that this ratio does in fact decrease with increasing $r$ and $c$. Maximum likelihood estimates of $N(r, c)/N_2(r, c)$ under this monotonicity assumption were obtained by the method of van Eeden (1957). These estimates are given in parentheses in Table 2b, wherever they differ from those obtained without the monotonicity constraint.

TABLE 2c

95% Confidence Intervals on the Proportion of Double Cancellation Tables Having an Additive Representation

                            c
   r          3                4                5
   3    .997, 1.00*
   4    .968, .990       .834, .909
   5    .910, .956       .693, .846       .128, .649
   6    .837, .911       .454, .808       .000*, .632
   7    .828, .922       .163, .616       .000*, .950
   8    .727, .877       .000*, .776
   9    .526, .779
  10    .521, .792       .000*, .950
  11    .126, .566
  12    .278, .770
  13    .094, .992
  14    .290, .963
  15    .050, 1.00

* Exact.

Our sampling method has proved to be too costly to permit the systematic estimation of $N(r, c)/N_2(r, c)$ to greater accuracy or for larger values of $r$ and $c$. We will mention here two additional samples that we obtained in the hope of revealing the asymptotic behavior of $N(r, c)/N_2(r, c)$. One of these was for the case $r = 7$, $c = 5$, for which our earlier results seemed to suggest that $N(r, c)/N_2(r, c)$ might be especially small. Out of 216,000 randomly constructed regular tables, 77 satisfied double cancellation, and, of these, 25 had an additive representation. Thus our estimate of $N(7, 5)/N_2(7, 5)$ based on this sample alone is 0.32. A second sample of 30,000 regular tables of size $15 \times 3$ turned up 60 tables satisfying double cancellation, of which 31 had an additive representation. Thus, our estimate of $N(15, 3)/N_2(15, 3)$ based on the new sample is 0.52, a result in line with that obtained from the original sample under the assumption that $N(r, c)/N_2(r, c)$ is monotone nonincreasing in $r$ and $c$.
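The monotone estimates shown in parentheses in Table 2b can be approximated with a pool-adjacent-violators computation. The sketch below is not the authors' code and is a one-dimensional simplification: it treats the $c = 3$ column alone and assumes only that the proportion is nonincreasing in $r$, whereas the method of van Eeden (1957) also covers partially ordered parameters. The counts are read from Tables 1a and 2a; the function name `monotone_mle` is hypothetical.

```python
def monotone_mle(successes, trials):
    """Binomial proportions constrained to be nonincreasing, by pooling
    adjacent cells whenever a later proportion exceeds an earlier one."""
    blocks = []  # each block holds [successes, trials, number of cells]
    for x, n in zip(successes, trials):
        blocks.append([x, n, 1])
        # Compare x_new / n_new > x_prev / n_prev by cross-multiplication.
        while (len(blocks) > 1
               and blocks[-1][0] * blocks[-2][1] > blocks[-2][0] * blocks[-1][1]):
            x2, n2, k2 = blocks.pop()
            blocks[-1][0] += x2
            blocks[-1][1] += n2
            blocks[-1][2] += k2
    fitted = []
    for x, n, k in blocks:
        fitted.extend([x / n] * k)
    return fitted


# Column c = 3, rows r = 3, ..., 15.
additive = [856, 636, 450, 287, 178, 94, 39, 34, 6, 9, 2, 5, 1]      # Table 2a
double_canc = [856, 648, 481, 327, 202, 116, 59, 51, 19, 17, 3, 7, 1]  # Table 1a

for r, estimate in zip(range(3, 16), monotone_mle(additive, double_canc)):
    print(r, round(estimate, 2))
```

For $r = 10$ the pooled value rounds to .66, and for $r = 11$ through 15 it rounds to .49, matching the parenthesized entries of Table 2b.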
REFERENCES

BOX, G. E. P., & COX, D. R. An analysis of transformations. Journal of the Royal Statistical Society, Series B, 1964, 26, 211-252.
BUSACKER, R. G., & SAATY, T. L. Finite graphs and networks. New York: McGraw-Hill, 1965.
COOMBS, C. H. A theory of data. New York: Wiley, 1964.
FRAME, J. S., ROBINSON, G. DE B., & THRALL, R. M. The hook graphs of the symmetric group. Canadian Journal of Mathematics, 1954, 6, 316-324.
GALE, D. The theory of linear economic models. New York: McGraw-Hill, 1960.
HAMERMESH, M. Group theory. Reading, Mass.: Addison-Wesley, 1962.
KENDALL, M. G., & STUART, A. The advanced theory of statistics, Vol. 3, 3rd ed. New York: Hafner, 1973.
KLAHR, D. A Monte Carlo investigation of the statistical significance of Kruskal's nonmetric scaling procedure. Psychometrika, 1969, 34, 319-330.
KRANTZ, D. H., LUCE, R. D., SUPPES, P., & TVERSKY, A. Foundations of measurement. New York: Academic Press, 1971.
NAKAYAMA, T. Some modular properties of irreducible representations of a symmetric group, I, II. Japanese Journal of Mathematics, 1941, 17, 165-184, 277-294.
VAN EEDEN, C. Maximum likelihood estimation of partially or completely ordered parameters, I. Proc. Akademie van Wetenschappen, Series A, 1957, 60, 128-136.

RECEIVED: June 30, 1974