Int. J. Man-Machine Studies (1978) 10, 75-86
The GUHA method and desk calculators
DAN POKORNÝ
Center of Biomathematics, Czechoslovak Academy of Sciences, 142 20 Prague, Czechoslovakia

The advantages of direct access to computing devices for the iterative use of GUHA procedures are discussed in general terms (section 1). Section 2 gives a brief survey of GUHA procedures implemented on desk calculators. Section 3 contains an example of a desk-calculator-oriented GUHA procedure, including an application in physiology. The procedure concerns a new method of identifying sources of dependence in two-way contingency tables. Its program implementation is described in the Appendix.
1. The GUHA method on a desk calculator--why and how

WHAT DEVICES ARE INDISPENSABLE FOR THE GUHA METHOD?

GUHA can be viewed as a typical example of a mathematical method whose construction is conditioned by the wide use of computers. Nevertheless, GUHA makes sense even in a world without any electronic computers. For example, the Baby-GUHA procedure described in Hájek & Havránek (1978, this issue) could be "implemented" on a mechanical device based on the principle of balls falling down grooves with sliding stops. Moreover, even a mechanical device is not indispensable: to generate hypotheses exhaustively one needs pen, paper and ... patience. We actually tried it on one occasion (far from all computers), and after three days of hard work by two men we obtained the solution of the associational GUHA procedure on a matrix of 50 objects and 20 qualities. This implementation is called a "jungle GUHA" procedure. The jungle GUHA procedure is not to be recommended, but it prompts the following question: what are the simplest devices on which a GUHA procedure can be effectively implemented? Our answer (in 1977) is: high-level programmable desk calculators. We mean, for example, the more sophisticated calculators of the Hewlett-Packard 9800 series or the Wang 2200 series.

VARIOUS ADVANTAGES OF GUHA PROCEDURES IMPLEMENTED ON DESK CALCULATORS

The author of the present paper believes in the following advantages.

(i) Direct access of a researcher to a computing device minimizes the time interval between the formulation and the solution of a problem. This is very important if we have to solve a relatively simple problem concerning data that are not too large, or if we use an iterative method of solution [cf. Ch. J. Bliss: "... data analysis by desk calculator has the advantage of forcing the analyst into intimate familiarity with the actual data values" (Dickey & Walrath, 1974)].
(ii) Direct access of a mathematician constructing an algorithm to the computing device offers a more effective way of constructing programs. The mathematician can thus try different variants of his procedure; his theoretical work is supported by experimentation.

0020-7373/78/0101-0075 $02.00/0 © 1978 Academic Press Inc. (London) Limited
(iii) On a direct-access computing device we can use conversational programs. The analyst can then control the branching of the procedure (depending on the information contained in a partial solution) and hence obtain solutions that correspond well to his research goal. We try to demonstrate the advantages of this "question-answering" approach in the last part of this paper.

(iv) A wide variety of peripherals, namely on-line measurement devices, can be used with the desk calculators considered ("measurement and computation"). Hence we can construct GUHA procedures (tailored for particular purposes) that yield not only automatic formation of hypotheses but also automatic data collection (this last idea is due to M. K. Chytil).

Clearly, points (i)-(iii) have a common core: direct man-machine interaction. Such direct access to a computing device can be realized not only by desk calculators but, perhaps more effectively, by a terminal network; we emphasize desk calculators for practical reasons: a desk calculator is the typical computing device of the prospective users of our procedures, i.e. researchers in scientific and applied research. In this paper we want to point out the possibility of direct man-machine interaction rather than desk calculators themselves. This possibility is particularly important in situations in which we cannot state the goals of the data analysis exactly at the beginning (cf. Hansen, 1976). Our considerations of desk calculators concern mainly technicalities: how to realize our idea of direct man-machine interaction.

SOME DISADVANTAGES OF DESK CALCULATORS

(i) Machine-dependent languages. We usually cannot use a program written for one type of calculator on another type. Frequently the language of a calculator is BASIC (a simple FORTRAN-style language), but the BASICs of different calculators differ in many ways; moreover, some types do not have BASIC but a specific non-standard language.
On the other hand, desk calculator languages are now, in many ways, comparable with high-level languages and we can expect increased compatibility in the future.

(ii) Low speed of computation. We can consider this from a non-traditional point of view: instead of estimating the time needed to solve a given problem, we compare the dimensions of the problems solvable in a given time. (By the "dimensions" of a problem we mean a measure on the input parameters, e.g. the number of qualities, the maximal length of sentences, etc.) If the time complexity depends exponentially, or polynomially with a high order, on the dimensions (this is typical for GUHA procedures), then a substantially quicker computer solves only slightly larger problems than the slow one. In such situations a desk calculator can solve (from the point of view of time complexity) problems of a dimension comparable with that of problems solvable by computers. This agrees with our experience. Let us demonstrate it on a theoretical example based on the associational GUHA procedure.

Example. Consider data with n + 1 qualities $F_0, F_1, \ldots, F_n$ and a fixed number of objects. Let $\sim$ be an arbitrary associational quantifier and q a natural number.

Task. Verify in the data all sentences of the form $F_0 \sim \varphi$ or $\neg F_0 \sim \varphi$, where $\varphi$ is a non-empty elementary conjunction built up from $F_1, \ldots, F_n$ with length at most q.
TABLE 1

  q    M_1   M_2   M_3   M_4        M_5        M_6        M_7        M_8        M_9
  1      2    25   250   2.5x10^3   2.5x10^4   2.5x10^5   2.5x10^6   2.5x10^7   2.5x10^8
  2      1     5    15   50         158        500        1581       5000       15811
  3      1     3     7   16         34         72         155        335        721
  4      1     3     6   10         17         30         53         94         165
  5      1     3     5   8          12         19         29         46         73
  n      1     3     5   7          9          11         14         16         18
The number of sentences to be checked (we use no helpful quantifier) is the following:

$$C(n,q) = 2 \sum_{k=1}^{q} 2^{k} \binom{n}{k}$$
which can be treated as a polynomial in n of degree q. On the other hand, we have in particular $C(n,n) = 2(3^{n} - 1)$. Let us now suppose that we are given a time t and a computing device $M_x$ that is able to check $10^{x}$ sentences in the time t. (If both the given time t and the number of objects are "reasonable", we can say that $M_x$ is a man for x about 0, a desk calculator for x about 2 or 3, a computer for x = 4, 5, 6, and a future computer for x = 9.) For which n is our task solvable on a device $M_x$? Table 1 shows the values of such n for $M_0, \ldots, M_9$, q = 1, ..., 5 and q = n.

(iii) Small internal memory. The internal memory is at most approximately 64 kB (usually less); this causes the main limitations of our abilities and leads to a specific programming philosophy.

(a) We have to use simplified versions of GUHA procedures. This disadvantage is compensated by the conversational abilities of calculators; the analyst has to make some decisions that are made by the computer in the original procedures.

(b) Programs are often segmented, even at the cost of time complexity. We cannot use auxiliary tables, e.g. of "critical" values (cf. Rauch, 1978), which would speed up a program considerably (think e.g. of the Fisher quantifier, cf. Chapter VII of Hájek & Havránek, 1977).

(c) The dimensions of the processed data file are substantially limited. Moreover, the situation is complicated because the smallest directly accessible unit of memory is often not the bit but a larger element, namely a "short integer" or a "real number".

If we consider these facts from the point of view of the dimensions of the data file needed for statistically reasonable results, we obtain the following ordering of data types by decreasing convenience for processing on desk calculators by GUHA procedures:
(i) reduced information (frequency tables, etc.);
(ii) rational-valued quantities;
(iii) multinomial quantities;
(iv) two-valued qualities.
We need to find a compromise between economy of memory and a reasonable dimension of the task; excessive saving, i.e. segmentation of the data file on external media and/or concentration of maximal information in minimal space, leads to an unacceptable computation time and to a loss of program elegance.
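The counting argument behind Table 1 can be retraced numerically. The following Python sketch is not part of the original programs; the function names `sentence_count` and `max_qualities` are illustrative assumptions. It evaluates C(n, q) from the formula above and searches for the largest n whose sentence count fits into a budget of 10^x sentences.

```python
from math import comb


def sentence_count(n: int, q: int) -> int:
    """C(n, q) = 2 * sum_{k=1}^{q} 2^k * binom(n, k)."""
    return 2 * sum(2**k * comb(n, k) for k in range(1, q + 1))


def max_qualities(x: int, q=None) -> int:
    """Largest n with C(n, q) <= 10**x; q=None stands for the case q = n."""
    budget = 10**x

    def count(n: int) -> int:
        return sentence_count(n, n if q is None else q)

    # C is increasing in n, so double an upper bound and then bisect.
    hi = 1
    while count(hi) <= budget:
        hi *= 2
    lo = hi // 2  # count(lo) <= budget < count(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if count(mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


# Closed form for q = n: C(n, n) = 2 * (3^n - 1)
assert all(sentence_count(n, n) == 2 * (3**n - 1) for n in range(1, 12))

print(max_qualities(4, 3), max_qualities(6, 2), max_qualities(9))  # 16 500 18
```

The three printed values correspond to the cells (M_4, q = 3), (M_6, q = 2) and (M_9, q = n). Bisection is used only because the q = 1 row reaches values of order 10^8, where a linear scan would be slow.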
2. On GUHA programs for desk calculators

The GUHA procedures implemented on desk calculators or equivalent computing devices can be divided into two groups:
(i) simplified versions of computer projects;
(ii) procedures substantially based on the conversational abilities of calculators (i.e. on direct man-machine interaction).

Until now there have been more procedures of the first kind; mini-versions of the implicational or associational procedures (Hájek & Havránek, 1978) have been implemented. The older ones are Havránek's programs LSM (implicational) and SSM (associational) for the CELLATRON 8206 (1972; the names of the programs are picturesque Czech acronyms). A newer one is the author's program for the HP9821A. Its limitations are as follows. The maximal dimension of the data file is 500 objects and 10 three-valued quantities. Relevant sentences are elementary associations (i.e. pairs of elementary conjunctions connected by a quantifier) with the chi-square quantifier. The maximal admissible length of a sentence is 3 literals. No helpful quantifiers are used. Compare these limitations with the abilities of the new Rauch & Havel program for the IBM 370. On the other hand, the program has some small advantages, e.g. intelligible output on an XY-plotter.

New programs for the HP9800 series, realizing the projects described in Havránek & Vosáhlo (1978) and Havránek & Pokorný (1978), are just being designed. The first belongs partially to the first group mentioned above, while the second belongs to the second group; we describe it in section 3 and the Appendix of the present paper, demonstrating the semantic advantages of a conversational procedure.
3. A method for identifying sources of dependence in two-way contingency tables--an example

We assume now a knowledge of sections 1.1-1.6 and 1.12 of Havránek & Pokorný (1978). The core of the projects described there is an implementation of the subroutine COLLAPS (K1, K2, TABLE, N, K, Q, TYPE1, TYPE2, LIST). Our implementation on the HP9821A has the following provisional limitations: K1, K2 ≤ 10 (i.e. TABLE ≤ 10 × 10), N ≤ 3, and types only "full" and "threshold".

Now we shall demonstrate the use of the COLLAPS procedure on a physiological example (we are obliged to Dr Ošťádalová of the Institute of Physiology for permission to use her experimental data; for physiological details see Ošťádalová, Babický & Obenberger (1978)). We consider two quantities on a sample of 159 suckling rats: F, the applied dose of sodium selenite (60, 40, 20, 10 or 5 µmol/kg body weight), and G, its effect (A = death, B = permanent ocular cataract, C = intermittent ocular cataract, D = no effect). For the first quantity we consider the threshold set of coefficients, and for the second one the full set. See Table 2.

RUN A (NON-CONVERSATIONAL)

We applied COLLAPS with N = 1, K = 0, Q = -1 (i.e. the maximum of significant chi-square; see section 1.12 of Havránek & Pokorný (1978)).
TABLE 2

                  G
   F       A     B     C     D     Σ
  60      24     0     0     0    24
  40       8    22     6     0    36
  20       1    20     5     5    31
  10       0     5     6    17    28
   5       0     0     0    40    40
   Σ      33    47    17    62   159
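As an illustration of how a single hypothesis (X)F ~ (Y)G is evaluated on Table 2, the following Python sketch collapses the table to a fourfold table for one pair of coefficients, here (60) and (A), and computes Pearson's chi-square. The helpers `collapse` and `chi_square` are hypothetical names chosen for this sketch; they are not the COLLAPS subroutine itself.

```python
# Table 2: rows are the doses F (60, 40, 20, 10, 5), columns the effects G (A, B, C, D).
TABLE2 = [
    [24, 0, 0, 0],
    [8, 22, 6, 0],
    [1, 20, 5, 5],
    [0, 5, 6, 17],
    [0, 0, 0, 40],
]


def collapse(table, rows, cols):
    """Return (a, r, k, m) of the fourfold table for row set `rows` and column set `cols`."""
    m = sum(map(sum, table))                                   # grand total
    r = sum(sum(table[i]) for i in rows)                       # total of the selected rows
    k = sum(row[j] for row in table for j in cols)             # total of the selected columns
    a = sum(table[i][j] for i in rows for j in cols)           # joint frequency
    return a, r, k, m


def chi_square(a, r, k, m):
    """Pearson chi-square of a fourfold table given by a, the marginals r, k and the total m."""
    return m * (a * m - k * r) ** 2 / (k * (m - k) * r * (m - r))


a, r, k, m = collapse(TABLE2, rows={0}, cols={0})   # X = (60), Y = (A)
chi2 = chi_square(a, r, k, m)
print((a, r, k, m), round(chi2, 2), round(chi2 / m, 3))  # (24, 24, 33, 159) 107.93 0.679
```

Only a, the two marginals and the total are needed, because the remaining three cells of the fourfold table follow from them.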
In the first step we obtain

(1) (60)F ~ (A)G   (χ² = 107.93, χ²/m = 0.679)

i.e. high doses ~ mortality. Recall that (1) is the most significant hypothesis of the form (X)F ~ (Y)G, where (X) can be (60), (60, 40), (60, 40, 20), (60, 40, 20, 10), (40, 20, 10, 5), (20, 10, 5), (10, 5), (5) and (Y) can be any of the non-empty proper subsets of (A, B, C, D). Then we consider the subtables (60) × (A) and (40, 20, 10, 5) × (B, C, D) (see Table 3).

TABLE 3

                G
   F       B     C     D     Σ
  40      22     6     0    28
  20      20     5     5    30
  10       5     6    17    28
   5       0     0    40    40
   Σ      47    17    62   126

In the second step we obtain for the latter subtable (Table 3) the most significant hypothesis:

(2) (40, 20)F ~ (B, C)G / (40, 20, 10, 5)F & (B, C, D)G   (χ² = 70.83, χ²/m = 0.562).

As the third and last step we obtain:

(3) (40)F ~ (C)G / (40, 20)F & (B, C)G   (χ² = 0.016, χ²/m = 0.000309).

Graphically we can present the results of our iterated procedure in two ways. The first consists of a decomposition of the table into blocks (we use the "full" set of admissible coefficients for G and hence columns can be permuted); see Table 4. Here the columns are permuted so that the following holds: if a block is divided by a vertical and a horizontal line into sub-blocks, their positive association holds in the direction of the main diagonal. The second is a dendrogram (see Fig. 1).

TABLE 4

                  G
   F       A     C     B     D
  60      24     0     0     0
  40       8     6    22     0
  20       1     5    20     5
  10       0     6     5    17
   5       0     0     0    40

Diagonal blocks: (60) × (A) and (40, 20, 10, 5) × (C, B, D); within the latter, (40, 20) × (C, B) and (10, 5) × (D); within (40, 20) × (C, B), (40) × (C) and (20) × (B).

[FIG. 1. Dendrogram with leaves ((60),(A)), ((40),(C)), ((20),(B)), ((10,5),(D)).]

Hence we successively obtain a hierarchical structure; the set of its lowest elements determines a one-to-one mapping φ: X → Y, where X (or Y) is a partition of the range of F (or G respectively), and we have: if φ(x) = y then min(card(x), card(y)) = 1. In our case φ = {((60),(A)), ((40),(C)), ((20),(B)), ((10,5),(D))}. The representation of the sources of significance in Table 2 is the whole hierarchical structure: V = φ ∪ {((40,20),(B,C)), ((40,20,10,5),(B,C,D)), ((60,40,20,10,5),(A,B,C,D))}. This method of representing relations in a contingency table is intuitively based on the simplifying assumption "the most interesting = the most statistically significant". Note that the significance of a particular hypothesis is to be considered only as a way of representing relations in the data analyzed; here we meet problems of simultaneous statistical inference (cf. Havránek, 1978). Some facts concerning the real statistical reliability of results of the kind just considered can be found in Sugiura & Otake (1973) and Havránek (1977).

RUN B (CONVERSATIONAL)

The previous run can be realized without any intervention of the analyst: all decisions can be made by the implemented procedure. This is due to our assumption "the most interesting = the most significant". On the other hand, the "interestingness" of a hypothesis for a researcher is a much more complex notion; e.g. a role is played by such factors as (i) statistical measures other than Pearson's chi-square and (ii) semantic factors which are mathematically intractable. In many situations "the most interesting" hypothesis is at least "sufficiently significant". Such a notion can be formalized: the set of a-sufficiently significant hypotheses, where a is a sequence of input parameters of the subroutine COLLAPS, is the set of all hypotheses from the corresponding output file LIST. In run B we want to have, in each step, three hypotheses in the output file LIST; i.e. we put N = 3, K = 0, Q = -1. Recall that these three hypotheses have a better chi-square value than the other hypotheses considered on the subtable just processed (cf.
section 1.5 in Havránek & Pokorný, 1978). After each step, the analyst chooses one of the three hypotheses as the most interesting. The chosen hypothesis is printed in the final solution and its choice determines the branching of the procedure (i.e. determines which subtables are to be processed in the next steps by COLLAPS). In our example we obtain, in the first step, the following sentences:

(1-i) (60)F ~ (A)G   (χ² = 107.93, χ²/m = 0.679)
(1-ii) (60, 40, 20)F ~ (A, B)G   (χ² = 87.72, χ²/m = 0.552)
(1-iii) (60, 40, 20, 10)F ~ (A, B, C)G   (χ² = 83.62, χ²/m = 0.526)

We choose (1-iii) as the most interesting. The reasons for this decision are as follows. The statistical measures of all three sentences are comparable; the sentence (1-iii) says that we observe a positive association between non-negligible doses of selenite and pathological effects, and this fact is semantically (i.e. from the physiological point of view) the most important. In further steps the choice of the most interesting sentence is clear, for both statistical and physiological reasons. In all cases, the first sentence has a considerably greater chi-square value than the other two. The solution gives a good description of the data (see Table 5 and/or Fig. 2). We have a clear relation between doses and effects. This solution is not an "unbiased" representation of all interesting sentences in the formalized sense of the word, but a "subjective" representation of all relations interesting to a specific researcher.

TABLE 5

                  G
   F       A     B     C     D
  60      24     0     0     0
  40       8    22     6     0
  20       1    20     5     5
  10       0     5     6    17
   5       0     0     0    40

[FIG. 2. Dendrogram with leaves ((60),(A)), ((40,20),(B)), ((10),(C)), ((5),(D)).]
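The non-conversational Run A can be retraced by brute force. The Python sketch below is an assumption-laden illustration, not the COLLAPS implementation: it enumerates every threshold coefficient X (taken here as the proper prefixes of the current dose list, since a suffix X with Y is the same fourfold table as the complementary prefix with the complement of Y), every non-empty proper subset Y of the current effects, keeps positive associations only, picks the largest chi-square, and then repeats the search on the two diagonal blocks. The function names `best_hypothesis` and `decompose` are hypothetical.

```python
from itertools import combinations

DOSES = (60, 40, 20, 10, 5)
EFFECTS = ("A", "B", "C", "D")
TABLE2 = (
    (24, 0, 0, 0),
    (8, 22, 6, 0),
    (1, 20, 5, 5),
    (0, 5, 6, 17),
    (0, 0, 0, 40),
)


def best_hypothesis(rows, cols):
    """Most significant positive association (X)F ~ (Y)G on the subtable rows x cols."""
    m = sum(TABLE2[i][j] for i in rows for j in cols)
    best = None
    for cut in range(1, len(rows)):                  # threshold coefficients: proper prefixes
        X = rows[:cut]
        r = sum(TABLE2[i][j] for i in X for j in cols)
        for size in range(1, len(cols)):             # full coefficients: proper subsets
            for Y in combinations(cols, size):
                k = sum(TABLE2[i][j] for i in rows for j in Y)
                a = sum(TABLE2[i][j] for i in X for j in Y)
                denom = k * (m - k) * r * (m - r)
                if denom == 0 or a * m <= k * r:     # degenerate or non-positive association
                    continue
                chi2 = m * (a * m - k * r) ** 2 / denom
                if best is None or chi2 > best[0]:
                    best = (chi2, X, Y)
    return best


def decompose(rows, cols, found):
    """Record the best hypothesis, then recurse into the blocks X x Y and (rest) x (rest)."""
    if len(rows) < 2 or len(cols) < 2:
        return
    hit = best_hypothesis(rows, cols)
    if hit is None:
        return
    found.append(hit)
    _, X, Y = hit
    decompose(X, Y, found)
    decompose(tuple(i for i in rows if i not in X),
              tuple(j for j in cols if j not in Y), found)


steps = []
decompose(tuple(range(len(DOSES))), tuple(range(len(EFFECTS))), steps)
for chi2, X, Y in steps:
    print([DOSES[i] for i in X], [EFFECTS[j] for j in Y], round(chi2, 3))
```

Under these assumptions the three recorded steps are (60) with (A), (40, 20) with (B, C) and (40) with (C), with chi-square values that round to the figures quoted for hypotheses (1)-(3). The conversational Run B differs in that the analyst, not the maximum, selects the hypothesis at each step, so it is not reproduced by this sketch.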
Conclusion

In this paper we have tried to point out two facts relevant to the construction and implementation of new GUHA projects:
(i) advantages of direct man-machine interaction;
(ii) miniaturization of computing devices.
What perspectives can we have for the future?
(i) We hope that the above advantages will be used for the GUHA method more than they have been until now.
(ii) We believe that miniaturized calculators will prosper to such a degree that within ten years (January 1988) a paper will be written for the International Journal of Man-Machine Studies with the title "The GUHA method and pocket calculators".

My thanks are due to Dr T. Havránek and Dr P. Hájek for their valuable remarks and kind help with the translation of this paper.
References

DICKEY, J. & WALRATH, J. (1974). Advanced breast cancer data. In J. W. DIXON, Ed., Exploring Data Analysis. Los Angeles: University of California Press.
HÁJEK, P. & HAVRÁNEK, T. (1977). Mechanizing Hypothesis Formation. Mathematical Foundations for a General Theory. Heidelberg: Springer-Verlag.
HÁJEK, P. & HAVRÁNEK, T. (1978). The GUHA method--its aims and techniques. International Journal of Man-Machine Studies, 10, 3-22.
HANSEN, J. V. (1976). Man-machine communication: An experimental analysis of heuristic problem-solving under on-line and batch-processing conditions. IEEE Transactions on Systems, Man and Cybernetics, SMC-6, 746-752.
HAVRÁNEK, T. (1977). On the asymptotic distribution of the maximum of chi-square statistics on 2 × 2 tables derived from an I × J table (in Czech). Research Report. Prague: Dept. of Biomathematics, Czech. Acad. Sci.
HAVRÁNEK, T. & POKORNÝ, D. (1978). GUHA-style processing of mixed data. International Journal of Man-Machine Studies, 10, 47-57.
HAVRÁNEK, T. & VOSÁHLO, J. (1978). A GUHA procedure with correlational quantifiers. International Journal of Man-Machine Studies, 10, 67-74.
OŠŤÁDALOVÁ, I., BABICKÝ, A. & OBENBERGER, J. (1978). Cataract induced by administration of a single dose of sodium selenite to suckling rats. Experientia (to be published).
RAUCH, J. (1978). Some remarks on computer realizations of GUHA procedures. International Journal of Man-Machine Studies, 10, 23-28.
SUGIURA, N. & OTAKE, M. (1973). Approximate distribution of the maximum of c-1 χ²-statistics (2 × 2) derived from 2 × c contingency table. Communications in Statistics, 1, 9-16.
Appendix: on an implementation of the subroutine COLLAPS (on HP9821A)

In this appendix we try to explain some heuristic tools speeding up the search for the most significant sentence. First we consider the particular case of 2 × J tables and later the general case of an I × J table.

PROCEDURE FOR 2 × J TABLE

Consider a two-way frequency (contingency) table (Table 6) as an input for the subroutine COLLAPS with parameters N, K, Q.

TABLE 6

          (1)G   ...   (j)G   ...   (J)G
   φ      a_1    ...   a_j    ...   a_J     r
  ¬φ      c_1    ...   c_j    ...   c_J     s
          k_1    ...   k_j    ...   k_J     m

Our aim is to construct the output file LIST: $\varphi \sim (Y_1)G, \varphi \sim (Y_2)G, \ldots, \varphi \sim (Y_l)G$, where the coefficients $Y_j$ are non-empty proper subsets of {1, ..., J} ("full" set of coefficients). LIST contains the l sentences corresponding to the greatest values of the chi-square statistic and is ordered by descending values of that statistic. Recall that for the fourfold table (Table 7) one of the possible forms of the chi-square statistic is as follows:

TABLE 7

           Y     ¬Y
   φ       a     b     r
  ¬φ       c     d     s
           k     l     m

$$\chi = \frac{(am - kr)\sqrt{m}}{\sqrt{k(m-k)\,r(m-r)}} \qquad (1)$$
Suppose, without any loss of generality, that $k_1 \ge k_2 \ge \cdots \ge k_J$. Denote

$$a_Y = \sum_{j \in Y} a_j, \qquad k_Y = \sum_{j \in Y} k_j, \qquad \delta_j = a_j m - k_j r, \qquad \delta_Y = a_Y m - k_Y r.$$

Clearly $\delta_Y = \sum_{j \in Y} \delta_j$. If we eliminate constants from (1), we see that it is sufficient to consider the expressions

$$L_Y = \frac{\delta_Y}{\sqrt{k_Y (m - k_Y)}}. \qquad (2)$$

Instead of a critical (threshold) value K we use $B = K \sqrt{r(m-r)/m}$. Denote $H = \{j;\ \delta_j > 0\}$. Sentences are generated lexicographically w.r.t. the symmetric difference with H: $Y^H = Y \triangle H = (Y - H) \cup (H - Y)$. Note an important implication:

$$\text{if } Y_1^H \subseteq Y_2^H \text{ then } \delta_{Y_1} \ge \delta_{Y_2}. \qquad (3)$$

Denote

$$\bar{Y} = \{Z \mid Y^H \subseteq Z^H \text{ and } \max(Y^H) < \min(Z^H - Y^H)\} \qquad (4)$$

(the second condition is void for Z = Y).
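The quantities just defined can be checked on a small table. The Python sketch below uses the frequencies of the 2 × 5 table that appears as Table 8 in this appendix; everything else (the function names `delta_of`, `L`, `chi`, and the 1-based dictionaries) is an assumption made for illustration. It computes the $\delta_j$ and H, confirms that $L_Y$ differs from the statistic (1) only by the constant factor $\sqrt{m/(r(m-r))}$, and verifies implication (3) exhaustively.

```python
from itertools import combinations
from math import isclose, sqrt

a = {1: 30, 2: 3, 3: 30, 4: 3, 5: 24}      # first row of the 2 x J table (the row of phi)
k = {1: 40, 2: 18, 3: 40, 4: 18, 5: 44}    # column totals
r, m = sum(a.values()), sum(k.values())    # 90 and 160

delta = {j: a[j] * m - k[j] * r for j in k}
H = frozenset(j for j in k if delta[j] > 0)


def delta_of(Y):
    return sum(delta[j] for j in Y)


def L(Y):
    """Expression (2)."""
    kY = sum(k[j] for j in Y)
    return delta_of(Y) / sqrt(kY * (m - kY))


def chi(Y):
    """Statistic (1) for the fourfold table obtained by collapsing the columns in Y."""
    aY = sum(a[j] for j in Y)
    kY = sum(k[j] for j in Y)
    return (aY * m - kY * r) * sqrt(m) / sqrt(kY * (m - kY) * r * (m - r))


all_subsets = [frozenset(c) for n in range(len(k) + 1) for c in combinations(k, n)]
proper = [Y for Y in all_subsets if 0 < len(Y) < len(k)]

scale = sqrt(m / (r * (m - r)))
same_up_to_constant = all(isclose(chi(Y), L(Y) * scale) for Y in proper)

# Implication (3): Y1^H subset of Y2^H implies delta_{Y1} >= delta_{Y2}.
implication_3 = all(
    delta_of(Y1) >= delta_of(Y2)
    for Y1 in all_subsets
    for Y2 in all_subsets
    if (Y1 ^ H) <= (Y2 ^ H)
)

print(delta)
print(sorted(H), same_up_to_constant, implication_3)  # [1, 3] True True
```

Implication (3) holds because every element of the symmetric difference with H either removes a positive $\delta_j$ or adds a non-positive one, so enlarging $Y^H$ can only decrease $\delta_Y$.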
Example. Put J = 4 (i.e. the range corresponding to G is {1, 2, 3, 4}). Let H = {2, 3} and Y = {3}. Then $Y^H = \{2\}$ and $\bar{Y}$ contains the Z's having the following symmetric differences with H: {2}, {2,3}, {2,3,4}, {2,4} (see Fig. 3).

Denote now, for j = 0, 1, ..., J - 1,

$$\mathrm{PLUS}(j) = \sum_{\substack{q = j+1 \\ q \notin H}}^{J} k_q, \qquad \mathrm{MINUS}(j) = \sum_{\substack{q = j+1 \\ q \in H}}^{J} k_q \qquad (5)$$
and PLUS(J) = MINUS(J) = 0.

[FIG. 3. Tree of the symmetric differences $Y^H$ for J = 4 (the subsets of {1, 2, 3, 4} in lexicographic order).]

Lemma. If $Z \in \bar{Y}$ and $j = \max(Y^H)$, then $k_Y - \mathrm{MINUS}(j) \le k_Z \le k_Y + \mathrm{PLUS}(j)$.

From the lemma we obtain the following inequality. Let $Z \in \bar{Y}$; then

$$k_Z(m - k_Z) \ge \min\bigl(M_Y(m - M_Y),\ P_Y(m - P_Y)\bigr) \qquad (6)$$

where $M_Y = k_Y - \mathrm{MINUS}(j)$, $P_Y = k_Y + \mathrm{PLUS}(j)$ with $j = \max(Y^H)$. Moreover, for each Z, $Z \ne \emptyset$, $Z \ne \{1, \ldots, J\}$, we have:

$$k_Z(m - k_Z) \ge k_J(m - k_J). \qquad (7)$$

Finally, for $Z \in \bar{Y} - \{\emptyset, \{1, \ldots, J\}\}$ we obtain

$$k_Z(m - k_Z) \ge S_Y \qquad (8)$$
where $S_Y = \max\bigl(k_J(m - k_J),\ \min(M_Y(m - M_Y),\ P_Y(m - P_Y))\bigr)$. Using (3) and (8) we obtain the following upper limit for the value of the statistic $L_Z$. For each $Z \in \bar{Y}$, $Z \ne \emptyset$, $Z \ne \{1, \ldots, J\}$,

$$L_Z \le Q_Y \qquad (9)$$

where $Q_Y = \delta_Y / \sqrt{S_Y}$. We shall use this upper bound for heuristic jumping in our procedure.

Sentences $\varphi \sim (Y)G$ are generated according to the lexicographical order of the differences $Y^H = Y \triangle H$ (i.e. ∅, {1}, {1,2}, ...). Sentences are checked w.r.t. their (chi-square) value in the table. If for a coefficient Y we have $L_Y > B$, then the sentence $\varphi \sim (Y)G$ is written into the file LIST (and, if necessary, another sentence is erased and a new threshold value is set; see section 1.5 in Havránek & Pokorný (1978)). If $J \notin Y^H$ (i.e. Y is not at the "bottom" of the tree of Fig. 3), then before the test "$L_Y > B$?" an auxiliary test "$Q_Y < B$?" is made, and if the answer is "yes", then the whole set $\bar{Y}$ is deleted (jumped over). See the flowchart in Fig. 4. The commands "GOON1" and "GOON2" have the following meaning: "generate the next sentence". GOON1 is used if $J \notin Y^H$; then for the next coefficient $Y_*$ we have $Y_*^H = Y^H \cup \{j+1\}$, where $j = \max(Y^H)$. GOON2 is used if $J \in Y^H$; then $Y_*^H = Y^H \cup \{k+1\} - \{k, J\}$, where $k = \max(Y^H - \{J\})$. "JUMP" means "jump over all sentences with coefficients in $\bar{Y}$", i.e. $Y_*^H = Y^H \cup \{k+1\} - \{k\}$.

Example. Consider Table 8. We want to find the most significant sentence of the form (1)F ~ (Y)G, where Y ⊂ {1,2,3,4,5}. Here H = {1,3}, and the way in which the procedure finds the most significant sentence (1)F ~ (1,3,5)G is demonstrated in Fig. 5 (a list of the symmetric differences $Y^H$). Only 6 of the 32 possible sentences must be checked.

Remark. In addition to jumping, some "technical" tools can be used to speed up the procedure. Namely, some auxiliary fields of dimension J are computed to avoid a great number of arithmetic operations.
For example, instead of Table 6 we use a table where $a_j$ is replaced by $\delta_j = a_j m - k_j r$; recall that $\delta_Y = \sum_{j \in Y} \delta_j$, where $\delta_Y = (\sum_{j \in Y} a_j)m - (\sum_{j \in Y} k_j)r$. In the block "GOON1" the values of the new fourfold table are computed as follows: $\delta_{Y_*} = \delta_Y + u(j)$, $k_{Y_*} = k_Y + v(j)$, where j = max(Y) and u, v are auxiliary vectors independent of Y. Similarly for GOON2 and JUMP.

[FIG. 4. Flowchart of the procedure. Recoverable box labels: FIND H, LIST = ∅; $Y^H$ = ∅; JUMP; CHANGE LIST (AND CRITICAL VALUE B); WRITE LIST; END.]

TABLE 8

                     G
   F       1     2     3     4     5     Σ
   1      30     3    30     3    24    90
   2      10    15    10    15    20    70
   Σ      40    18    40    18    44   160

[FIG. 5. Course of the procedure on Table 8, as a list of the symmetric differences $Y^H$: BEGIN; ∅: TEST, GOON1; 1: JUMP; 2: JUMP; 3: JUMP; 4: JUMP; 5: TEST; END.]

PROCEDURE FOR A TABLE I × J

Our aim is to find, for given input parameters I, J, TABLE, N, K, Q, TYPE1, TYPE2, an output file LIST: $(X_1)F \sim (Y_1)G, \ldots, (X_l)F \sim (Y_l)G$. Suppose that at least TYPE2 is "full" and that the columns of the table are ordered in such a way that $k_1 \ge \cdots \ge k_J$. If, moreover, TYPE1 is "full", then we assume I ≤ J and $r_1 \ge \cdots \ge r_I$. The procedure generates all literals (X)F (lexicographically w.r.t. X) where X is admissible. Tables (like Table 9) are processed by the procedure for a 2 × J table described above, with some small changes: at the start of the procedure LIST can be non-empty and the threshold value can be changed.

TABLE 9

            (1)G   ...   (J)G
  (X)F      a_1    ...   a_J     r
 ¬(X)F      c_1    ...   c_J     s
            k_1    ...   k_J

CONCLUSION
Our example shows that the jumping procedure can be highly effective. The final decision concerning its value depends on computer experiments that are now being made. The present version of the algorithm is probably not the last.
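To make the jumping procedure of this appendix concrete, here is a minimal Python sketch run on the data of Table 8. It is one reading of the text, not the HP9821A program: the name `jumping_search` is hypothetical; PLUS(j) is taken over the columns q > j outside H and MINUS(j) over those inside H, as required by the Lemma; the smallest column total is used where the text writes $k_J$ (under the ordering $k_1 \ge \cdots \ge k_J$ they coincide, but Table 8 is not sorted); and in GOON2 the elements k and J are removed before k + 1 is added.

```python
from math import sqrt


def jumping_search(a, k, r, m, N=1, K=0.0):
    """Return (LIST, visited): the N best (L_Y, Y) pairs and the number of nodes visited."""
    J = len(k)
    delta = {j: a[j] * m - k[j] * r for j in k}
    H = frozenset(j for j in k if delta[j] > 0)
    k_min = min(k.values())                      # stands for k_J of inequality (7)
    B = K * sqrt(r * (m - r) / m)                # threshold on the scale of L_Y
    best, YH, visited = [], frozenset(), 0
    while True:
        visited += 1
        Y = YH ^ H                               # coefficient with symmetric difference YH
        dY = sum(delta[q] for q in Y)
        kY = sum(k[q] for q in Y)
        j = max(YH, default=0)
        jump = False
        if J not in YH:                          # Y has a subtree: auxiliary test Q_Y < B
            minus = sum(k[q] for q in range(j + 1, J + 1) if q in H)
            plus = sum(k[q] for q in range(j + 1, J + 1) if q not in H)
            M, P = kY - minus, kY + plus
            S = max(k_min * (m - k_min), min(M * (m - M), P * (m - P)))
            jump = dY / sqrt(S) < B
        if not jump and 0 < kY < m:              # test L_Y > B for non-empty proper Y
            L = dY / sqrt(kY * (m - kY))
            if L > B:
                best = sorted(best + [(L, Y)], key=lambda t: -t[0])[:N]
                if len(best) == N:
                    B = max(B, best[-1][0])      # new critical value
        if jump:                                 # JUMP: skip the whole subtree of YH
            if not YH:
                break
            YH = (YH - {j}) | {j + 1}
        elif J not in YH:                        # GOON1: go one level deeper
            YH = YH | {j + 1}
        else:                                    # GOON2: back up from the bottom of the tree
            rest = YH - {J}
            if not rest:
                break
            kk = max(rest)
            YH = (YH - {kk, J}) | {kk + 1}
    return best, visited


a8 = {1: 30, 2: 3, 3: 30, 4: 3, 5: 24}           # row (1)F of Table 8
k8 = {1: 40, 2: 18, 3: 40, 4: 18, 5: 44}         # column totals of Table 8
LIST, visited = jumping_search(a8, k8, r=90, m=160)
print(sorted(LIST[0][1]), visited)  # [1, 3, 5] 6
```

With N = 1 and K = 0 the sketch visits the symmetric differences ∅, {1}, {2}, {3}, {4}, {5}, jumping over the subtrees of {1} to {4}, which agrees with the 6 checked sentences and the coefficient (1,3,5) stated in the example above.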