Annals of Discrete Mathematics 26 (1985) 49-58 © Elsevier Science Publishers B.V. (North-Holland)
Performance of Subset Generating Algorithms
Margaret Carkeet and Peter Eades
Department of Computer Science, University of Queensland, St. Lucia, Queensland
AUSTRALIA
Abstract
This note reports on some tests of several algorithms for generating the subsets of fixed size of a set. In particular, the speed of execution is compared.
1. Introduction
In this note the results of tests of algorithms for generating all subsets of size k of a set of size n (sometimes called combinations) are reported. We are concerned with testing the speed of the algorithms. No complexity analysis is applied; we are merely reporting the results of some tests.
There are eight such algorithms known to the authors.
BER: From [1]. We tested the optimized version of the algorithm, described in [9] (page 186).
CHASE: From [3].
EMK: From [5]. An optimized version (from B. D. McKay, private communication) was tested.
EE: Even’s version (in [7], page 42) of Ehrlich’s algorithm in [6].
LS: The optimized (third) version from [8].
LEX: The usual lexicographic algorithm. It is described in all standard texts, including [9] (page 181).
RD: The “revolving door” algorithm presented in [10] (subroutine NXSRD on page 30).
EHR: The very strong minimal change algorithm described in [4] and [2]. Note that this algorithm works only for restricted values of n and k. For this reason, and because it is much slower than the others, this algorithm was not tested.
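To make the "revolving door" idea concrete, the following Python sketch builds a revolving-door ordering by the standard recursion: the list for (n, k) is the list for (n-1, k), followed by the reversed list for (n-1, k-1) with n added to every subset. This is an illustration only. It is not the NXSRD subroutine of [10], it is not one of the programs tested in this note, and the function name `revolving_door` is an assumption of the sketch.

```python
def revolving_door(n, k):
    """Return all k-subsets of {1, ..., n} as sorted tuples, in a revolving-door order.

    Illustrative recursion only (assumed name, not the routine from [10]):
    R(n, k) = R(n-1, k) followed by reversed R(n-1, k-1) with n appended.
    """
    if k < 0 or k > n:
        return []          # no subsets of this size
    if k == 0:
        return [()]        # the empty subset
    if k == n:
        return [tuple(range(1, n + 1))]
    without_n = revolving_door(n - 1, k)
    with_n = [s + (n,) for s in reversed(revolving_door(n - 1, k - 1))]
    return without_n + with_n


# Each subset is obtained from the previous one by removing one element
# and adding another.
for subset in revolving_door(4, 2):
    print(subset)
```

The recursion builds the whole list in memory, so it is far from the constant-time-per-subset behaviour discussed in this note; it is meant only to show the order in which subsets appear.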
Some of these algorithms have "minimal change" properties, that is, successively generated subsets differ from each other by a small amount. To describe these properties we need to consider the data structures used to represent subsets. The elements of the sets are represented by the integers 1, 2, ..., n. A k-subset S of an n-set can be represented as a bitvector $(b_1, b_2, \ldots, b_n)$, where $b_i$ is 1 if i is in S and 0 if i is not in S. Alternatively, if $S = \{a_1, a_2, \ldots, a_k\}$ where $a_1 < a_2 < \cdots < a_k$, then S can be represented by the ordered array $(a_1, a_2, \ldots, a_k)$.
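The two representations can be illustrated with a short Python sketch. The function names `array_to_bitvector` and `bitvector_to_array` are hypothetical and chosen only for this example; elements are the integers 1, 2, ..., n as in the text.

```python
def array_to_bitvector(a, n):
    """Convert an ordered array of elements from {1, ..., n} to a bitvector of length n."""
    members = set(a)
    return [1 if i in members else 0 for i in range(1, n + 1)]


def bitvector_to_array(b):
    """Convert a bitvector (list of 0/1) to the ordered array of its elements."""
    return [i for i, bit in enumerate(b, start=1) if bit == 1]


# The 3-subset {2, 4, 5} of a 6-set in both representations.
print(array_to_bitvector([2, 4, 5], 6))         # [0, 1, 0, 1, 1, 0]
print(bitvector_to_array([0, 1, 0, 1, 1, 0]))   # [2, 4, 5]
```

Both conversions take time proportional to n, which is one reason an algorithm is normally written directly against one representation rather than converting at every step.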
(Aside: All the algorithms above can be implemented using either data structure. For testing, each algorithm was implemented using the data structure which made it faster: bitvectors were used for BER and EE, all the others used ordered arrays. It is usually easy to convert an ordered array algorithm to a bitvector algorithm without affecting performance significantly. The reverse conversion, however, often reduces performance.)
The minimal change properties are:
WMCP (Weak Minimal Change Property): Successively generated bitvectors differ in at most two positions. This means that the next subset is formed from the previous one by deleting one element and adding another. This property holds for all the above algorithms except LEX.
SMCP (Strong Minimal Change Property): Successively generated ordered arrays differ in only one position. Note that this implies WMCP. This property holds for EHR, CHASE, EMK, and EE.
VSMCP (Very Strong Minimal Change Property): Successively generated bitvectors differ in two adjacent positions. This implies SMCP. It holds for EHR only.
These properties are discussed in detail in [5].
2. The Results
The first seven algorithms above were tested on a Perkin-Elmer 3220 running UNIX. The language used was Pascal, and the programs were run under two different systems: the Berkeley Pascal to pcode compiler, and a UQ Pascal to C compiler. The Berkeley system reports the number of statements executed, and this was used as an indication of running time. The UNIX time utility was used to give an indication of the execution time under the UQ system. The two different Pascal systems and the two different timing systems were in substantial agreement, and only the results from the Berkeley system are quoted here.
The authors recognize the dangers of this type of measurement. The time utility is a little sensitive to the machine load at the time of execution. It is quite probable that a different programmer, a different language, or a different hardware configuration could have produced different results. Every effort was made to minimize the effect of these differences, but we admit that at best, only the first few digits of our results are significant. To obtain more significance a full complexity analysis (along the lines of the analysis of LEX in [9]) would be required.
With the exception of LEX and RD, all the algorithms tested are fast in the sense that the average time to generate a subset is bounded by a constant, independent of n and k. Further, these algorithms are loopless, or uniformly bounded, which roughly means that the time to generate each subset is bounded by a constant, independent of n and k. (See [9] for precise definitions of these properties.) LEX and RD do not have these properties when k is close to n.
The graph in figure 1 summarizes the results. The tables from which figure 1 was derived are in figure 2. The vertical axis in figure 1 is the average number of Pascal statements executed per subset produced. The average was taken over n=5 to n=12. The horizontal axis represents the range of k; the leftmost value is k=2, and the rightmost is k=n-2. The other values of k are dispersed linearly between the leftmost and rightmost. Some statement counts for larger values of n are given in figure 3.
3. Conclusions
All the algorithms except EHR are reasonably simple and can be coded in a few pages. LEX is very simple and takes only a few minutes to write.
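As an indication of how little code LEX needs, here is a minimal Python sketch of the usual lexicographic successor rule on ordered arrays. It is written for illustration, not taken from the Pascal program that was tested, and the generator name `lex_subsets` is an assumption of this sketch.

```python
def lex_subsets(n, k):
    """Yield all k-subsets of {1, ..., n} as sorted tuples, in lexicographic order."""
    a = list(range(1, k + 1))          # first subset: (1, 2, ..., k)
    while True:
        yield tuple(a)
        # Find the rightmost position that has not reached its maximum value.
        i = k - 1
        while i >= 0 and a[i] == n - k + i + 1:
            i -= 1
        if i < 0:
            return                     # (n-k+1, ..., n) was the last subset
        # Increase that position and reset everything to its right.
        a[i] += 1
        for j in range(i + 1, k):
            a[j] = a[j - 1] + 1


for subset in lex_subsets(4, 2):
    print(subset)
```

The inner `while` loop scans back from the right-hand end of the array, which is why the time for a single step is not bounded by a constant when k is close to n.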
No algorithm (except EHR) uses more than O(n) space; this is insignificant in comparison to time requirements.
The main result of the tests is that LS is significantly faster than any of the others. An implementation of LS on a VAX 11/750 generates a subset about every 45 microseconds; on a Cyber 172/2 it takes about one third of this time.
In an application, each subset has to be processed in some way. If the processing time dominates the generation time, then the processing time also determines the size of the largest problem that can be tackled. However, if the processing time is about the same as or less than the generation time, then the generation time imposes a limit on the largest problem which can be tackled: for instance, in an hour of CPU time on the Cyber 172/2, LS can process every 15-subset of a 30-set. Hand optimized assembler, or a supercomputer, could improve this limit, but not significantly.
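The arithmetic behind this limit can be checked with a few lines of Python. The sketch assumes 15 microseconds per subset (one third of the 45 microseconds quoted for the VAX) and counts only generation time. The tenfold speedup is a purely hypothetical figure used to show how slowly the limit moves, and the function name is an assumption of the sketch.

```python
from math import comb


def largest_central_problem(us_per_subset, budget_seconds):
    """Largest even n such that all (n/2)-subsets of an n-set fit in the time budget."""
    n = 0
    while comb(n + 2, (n + 2) // 2) * us_per_subset * 1e-6 <= budget_seconds:
        n += 2
    return n


HOUR = 3600.0

# 15-subsets of a 30-set at an assumed 15 microseconds each.
count = comb(30, 15)
print(count, "subsets take about", round(count * 15e-6 / 60, 1), "minutes")

print(largest_central_problem(15.0, HOUR))   # assumed Cyber 172/2 rate
print(largest_central_problem(1.5, HOUR))    # hypothetical tenfold speedup
```

Under these assumptions the 15-subsets of a 30-set take roughly 39 minutes, while the 16-subsets of a 32-set already exceed the hour. A tenfold speedup moves the limit only from n=30 to n=34, because the number of subsets grows by nearly a factor of four each time n increases by 2.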
Figure 1. [Graph: average number of Pascal statements executed per subset, on a vertical axis from 0 to 40, plotted against k.]
The only disadvantage of using LS is that it does not have SMCP. EMK, about 4 times slower than LS, is the fastest algorithm with this property. If the processing is significantly faster with SMCP, then EMK should be used. Also, if the processing time dominates generation time, then a minor speedup from SMCP may justify EMK.
The problem of finding a fast algorithm which has VSMCP is open.
Finally we note that LEX is surprisingly fast. The simplicity of this algorithm (it requires no clever stack implementation) makes it attractive.
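When choosing an algorithm for its minimal change property, it can help to check the property mechanically on a small case. The Python sketch below tests a sequence of subsets against the definitions of WMCP, SMCP and VSMCP given in Section 1. The function names are hypothetical, subsets are assumed to be sorted tuples over 1, ..., n, and the lexicographic sample comes from the standard library, not from any of the tested programs.

```python
from itertools import combinations


def differing_bit_positions(s, t, n):
    """Positions (0-based) at which the bitvectors of subsets s and t differ."""
    in_s, in_t = set(s), set(t)
    return [i for i in range(1, n + 1) if (i in in_s) != (i in in_t)]


def has_wmcp(seq, n):
    """Successive bitvectors differ in at most two positions."""
    return all(len(differing_bit_positions(s, t, n)) <= 2
               for s, t in zip(seq, seq[1:]))


def has_smcp(seq):
    """Successive ordered arrays differ in exactly one position."""
    return all(sum(x != y for x, y in zip(s, t)) == 1
               for s, t in zip(seq, seq[1:]))


def has_vsmcp(seq, n):
    """Successive bitvectors differ in exactly two adjacent positions."""
    for s, t in zip(seq, seq[1:]):
        d = differing_bit_positions(s, t, n)
        if len(d) != 2 or d[1] - d[0] != 1:
            return False
    return True


# Lexicographic order on 3-subsets of a 5-set: (1,2,5) is followed by (1,3,4),
# whose bitvector differs in four positions, so even WMCP fails.
lex = list(combinations(range(1, 6), 3))
print(has_wmcp(lex, 5), has_smcp(lex), has_vsmcp(lex, 5))
```

A check of this kind only covers the (n, k) pair it is run on; it does not prove that an algorithm has the property in general.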
Figure 2a. BER number of statements executed
        k=2    k=3    k=4    k=5    k=6    k=7    k=8    k=9   k=10
n=5     173    161
n=6     252    296    244
n=7     317    495    539    313
n=8     458    770   1058    794    434
n=9     585   1133   1893   1733   1245    533
n=10    728   1596   3152   3408   3080   1680    680
n=12   1062   2850   7454  10508  13574  10734   7166   4902   1602

Figure 2b. CHASE number of statements executed
        k=2    k=3    k=4    k=5    k=6    k=7    k=8    k=9
n=5     246    240
n=6     371    459    358
n=7     527    797    789    493
n=8     716   1285   1554   1237    664
n=9     940   1056   2803   2740   1861    848
n=10   1201   2854   4719   5486   4557   2646   1076
n=11   1501   3089   7520  10142   9995   7134   3670   1313
Figure 2c. EMK number of statements executed
n=5 183 187 n=6 2130 364 610 712 428 1363 1122 603 1321 2316 2461 1760 1792 9878 4733 4232 928 n=10 n=11 660 319 2344 6619 8361 766 9012 973 2481 6S77 3476 1168 1980 79'22 13806 17462 16626 10186 4613 k=3 k=4 k=5 k=6 k=7 k=8 k=9 k=10
Figure 2d. EE number of statements executed
n=5 326 n=6 616 n=7 760 n=8 1087 n=9 1440 n=10 1888 4448 n=11 2411 6334 n=12 1173 1181 11793 762 16916 3021 k=2 k=3 k=4 k=5
Figure 2e. LS number of statements executed
n=5 n=6 n=7 61 134 182 147 n=8 n=9 263 n=10 n=11 n=12 133 460 1104 1098 k=5 1122 2214 1882 1116 4048 4080 2980 k=6 k=7 k=8 k=9 k=10
Figure 2f. LEX number of statements executed
n=5 n=6 n=7 n=8 152 177 222 333 293 558 628 866 1188 ino 2ose n=9 446 1076 2286 3988 8314 n=12 15970 k=6 736 621 n=8 1404 1295 1592 n=9 2086 2406 3474 n=5 n=6 n=7 174 281 251 514 338 888 3604 6598 1183 3793 14914 10393 k=7 k=8 k=9 k=10 19Ss 1974 1180
n=10 690 2957 3873 6788 4641 3658 1280
n=11 832 4040 5913 12213 9728 9655 4221 1903
n=12 987 5358 8656 20829 18697 22466 11836 7109 2003
Figure 3. Number of Pascal statements executed.
n    k    BER       CHASE     EMK       EE        LS       LEX       RD
14   3    4687      gas0      4616      16481     917      6184      8790
14   6    44419     66640     63237     1osa67    i~sa     49046     6710
14   9    27239     43211     40620     72131     11618    JIOM      69380
18   3    ioaso     ia664     8727      303~7     191s     11570     19488
18   6    278111    432264    292287    >500000   68601    284169    326006
18   9    >500000   >600000   >500000   >500000   214523   >500000   >500000
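The counts in figure 3 can be put on the per-subset scale used for figure 1 by dividing each count by the number of k-subsets of an n-set. The Python sketch below does this for a few BER and LS entries of figure 3. The dictionary layout and the function name are assumptions of the sketch, and the resulting averages are derived here, not quoted from the original tests.

```python
from math import comb

# (algorithm, n, k) -> statement count, copied from figure 3.
FIGURE_3_COUNTS = {
    ("BER", 14, 3): 4687,
    ("BER", 14, 6): 44419,
    ("BER", 14, 9): 27239,
    ("LS", 14, 9): 11618,
    ("LS", 18, 6): 68601,
    ("LS", 18, 9): 214523,
}


def statements_per_subset(count, n, k):
    """Average number of statements executed per generated k-subset of an n-set."""
    return count / comb(n, k)


for (name, n, k), count in FIGURE_3_COUNTS.items():
    average = statements_per_subset(count, n, k)
    print(f"{name:4s} n={n:2d} k={k}: {average:.2f} statements per subset")
```

For example, the BER count for n=14, k=3 corresponds to 4687/364, or about 12.9 statements per subset.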
References
[1] J. R. Bitner, G. Ehrlich, and E. M. Reingold, “Efficient Generation of the Binary Reflected Gray Code and its Applications”, Communications of the Association for Computing Machinery, 19 (1976) 517-521.
[2] Margaret Carkeet and Peter Eades, “An Implementation of a Minimal Change Algorithm”, Technical Report No. 45, Department of Computer Science, University of Queensland, January 1983.
[3] Philip J. Chase, “Algorithm 382: Combinations of M out of N objects”, Communications of the Association for Computing Machinery, 13 (1970) 368.
[4] Peter Eades, Michael Hickey and Ronald C. Read, “Some Hamilton Paths and a Minimal Change Algorithm”, Journal of the Association for Computing Machinery, 31 (1984) 19-29.
[5] Peter Eades and Brendan McKay, “An Algorithm for Generating Subsets of a Fixed Size with a Strong Minimal Change Property”, Information Processing Letters, 19 (1984) 131-133.
[6] Gideon Ehrlich, “Loopless Algorithms for Generating Permutations, Combinations, and other Combinatorial Configurations”, Journal of the Association for Computing Machinery, 20 (1973) 500-513.
[7] Shimon Even, Algorithmic Combinatorics, Macmillan (1973).
[8] Clement W. H. Lam and Leonard H. Soicher, “Three New Combination Algorithms with the Minimal Change Property”, Communications of the Association for Computing Machinery, 25 (1982) 555-559.
[9] Edward M. Reingold, Jurg Nievergelt and Narsingh Deo, Combinatorial Algorithms: Theory and Practice, Prentice Hall (1977).
[10] Albert Nijenhuis and Herbert S. Wilf, Combinatorial Algorithms, Monographs in Computer Science and Applied Mathematics, Academic Press (1975).