Performance of Subset Generating Algorithms


Annals of Discrete Mathematics 26 (1985) 49-58
© Elsevier Science Publishers B.V. (North-Holland)


Margaret Carkeet and Peter Eades
Department of Computer Science
University of Queensland
St. Lucia, Queensland
AUSTRALIA

Abstract

This note reports on some tests of several algorithms for generating the subsets of fixed size of a set. In particular, the speed of execution is compared.

1. Introduction

In this note the results of tests of algorithms for generating all subsets of size k of a set of size n (sometimes called combinations) are reported. We are concerned with testing the speed of the algorithms. No complexity analysis is applied; we are merely reporting the results of some tests.

There are eight such algorithms known to the authors.

BER: From [1]. We tested the optimized version of the algorithm, described in [9] (page 186).

CHASE: From [3].

EMK: From [5]. An optimized version (from B. D. McKay, private communication) was tested.

EE: Even's version (in [7], page 42) of Ehrlich's algorithm in [6].

LS: The optimized (third) version from [8].

LEX: The usual lexicographic algorithm. It is described in all standard texts, including [9] (page 181).

RD: The "revolving door" algorithm presented in [10] (subroutine NXSRD on page 30).

EHR: The very strong minimal change algorithm described in [4] and [2]. Note that this algorithm works only for restricted values of n and k. For this reason, and because it is much slower than the others, this algorithm was not tested.


Some of these algorithms have "minimal change" properties, that is, successively generated subsets differ from each other by a small amount. To describe these properties we need to consider the data structures used to represent subsets. The elements of the sets are represented by the integers $1, 2, \ldots, n$. A k-subset $S$ of an n-set can be represented as a bitvector $(b_1, b_2, \ldots, b_n)$, where $b_i$ is 1 if $i$ is in $S$ and 0 if $i$ is not in $S$. Alternatively, if $S = \{a_1, a_2, \ldots, a_k\}$ where $a_1 < a_2 < \cdots < a_k$, then $S$ can be represented by the ordered array $(a_1, a_2, \ldots, a_k)$.
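The two representations can be illustrated with a minimal Python sketch. The function names `to_bitvector` and `to_ordered_array` are hypothetical and chosen only for this illustration; they are not taken from any of the tested programs.

```python
def to_bitvector(subset, n):
    """Return the bitvector (b_1, ..., b_n) of a subset of {1, ..., n}.

    b_i is 1 if i is in the subset and 0 otherwise.
    """
    members = set(subset)
    return [1 if i in members else 0 for i in range(1, n + 1)]


def to_ordered_array(bits):
    """Return the ordered array (a_1 < a_2 < ... < a_k) for a bitvector."""
    return [i for i, b in enumerate(bits, start=1) if b == 1]


# The 3-subset {2, 3, 5} of a 5-set in both representations.
print(to_bitvector([2, 3, 5], 5))         # [0, 1, 1, 0, 1]
print(to_ordered_array([0, 1, 1, 0, 1]))  # [2, 3, 5]
```

The bitvector always has length n, while the ordered array has length k, which is one reason an algorithm may run faster in one representation than in the other.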

(Aside: All the algorithms above can be implemented using either data structure. For testing, each algorithm was implemented using the data structure which made it faster: bitvectors were used for BER and EE; all the others used ordered arrays. It is usually easy to convert an ordered array algorithm to a bitvector algorithm without affecting performance significantly. The reverse conversion, however, often reduces performance.)

The minimal change properties are:

WMCP (Weak Minimal Change Property): Successively generated bitvectors differ in at most two positions. This means that the next subset is formed from the previous one by deleting one element and adding another. This property holds for all the above algorithms except LEX.

SMCP (Strong Minimal Change Property): Successively generated ordered arrays differ in only one position. Note that this implies WMCP. This property holds for EHR, CHASE, EMK, and EE.

VSMCP (Very Strong Minimal Change Property): Successively generated bitvectors differ in two adjacent positions. This implies SMCP. It holds for EHR only.

These properties are discussed in detail in [5].

2. The Results

The first seven algorithms above were tested on a Perkin-Elmer 3220 running UNIX. The language used was Pascal, and the programs were run under two different systems: the Berkeley Pascal to p-code compiler, and a UQ Pascal to C compiler. The Berkeley system reports the number of statements executed, and this was used as an indication of running time. The UNIX time utility was used to give an indication of the execution time under the UQ system. The two different Pascal systems and the two different timing systems were in substantial agreement, and only the results from the Berkeley system are quoted here.


The authors recognize the dangers of this type of measurement. The time utility is a little sensitive to the machine load at the time of execution. It is quite probable that a different programmer, a different language, or a different hardware configuration could have produced different results. Every effort was made to minimize the effect of these differences, but we admit that, at best, only the first few digits of our results are significant. To obtain more significance a full complexity analysis (along the lines of the analysis of LEX in [9]) would be required.

With the exception of LEX and RD, all the algorithms tested are fast in the sense that the average time to generate a subset is bounded by a constant, independent of n and k. Further, these algorithms are loopless, or uniformly bounded, which roughly means that the time to generate each subset is constant, independent of n and k. (See [9] for precise definitions of these properties.) LEX and RD do not have these properties when k is close to n.

The graph in figure 1 summarizes the results. The tables from which figure 1 was derived are in figure 2. The vertical axis in figure 1 is the average number of Pascal statements executed per subset produced. The average was taken over n=5 to n=12. The horizontal axis represents the range of k; the leftmost value is k=2, and the rightmost is k=n-2. The other values of k are dispersed linearly between the leftmost and rightmost. Some statement counts for larger values of n are given in figure 3.

3. Conclusions

All the algorithms except EHR are reasonably simple and can be coded in a few pages. LEX is very simple and takes only a few minutes to write.
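As an illustration of how little code the lexicographic algorithm needs, here is a minimal Python sketch. It is not the Pascal program that was tested; the names `lex_subsets` and `average_positions_written` are hypothetical, and the count of rewritten array positions is only a rough stand-in for the statement counts reported in the figures. The sketch assumes 0 < k < n.

```python
def lex_subsets(n, k):
    """Yield (subset, positions_written) for each k-subset of {1, ..., n}.

    Subsets are produced in lexicographic order as ordered arrays.
    positions_written is the number of array entries rewritten to reach
    the subset from its predecessor (0 for the first subset).
    """
    a = list(range(1, k + 1))
    yield tuple(a), 0
    while True:
        # Scan from the right for an entry that is below its maximum value.
        i = k - 1
        while i >= 0 and a[i] == n - k + i + 1:
            i -= 1
        if i < 0:
            return  # every entry is at its maximum: last subset reached
        a[i] += 1
        # Reset the entries to the right to the smallest legal values.
        for j in range(i + 1, k):
            a[j] = a[j - 1] + 1
        yield tuple(a), k - i


def average_positions_written(n, k):
    """Average number of entries rewritten per step (assumes 0 < k < n)."""
    steps = [written for _, written in lex_subsets(n, k)][1:]
    return sum(steps) / len(steps)


for k in (2, 6, 10):
    print(k, round(average_positions_written(12, k), 2))
```

For n=12 the average is about 1.15 rewritten positions per step at k=2 and rises above 4 at k=10, which is in line with the earlier remark that LEX is not uniformly bounded when k is close to n.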

No algorithm (except EHR) uses more than O(n) space; this is insignificant in comparison to time requirements.

The main result of the tests is that LS is significantly faster than any of the others. An implementation of LS on a VAX 11/750 generates a subset about every 45 microseconds; on a Cyber 172/2 it takes about one third of this time.

In an application, each subset has to be processed in some way. If the processing time dominates the generation time, then the processing time also determines the size of the largest problem that can be tackled. However, if the processing time is about the same as or less than the generation time, then the generation time imposes a limit on the largest problem which can be tackled: for instance, in an hour of CPU time on the Cyber 172/2, LS can process every 15-subset of a 30-set. Hand-optimized assembler, or a supercomputer, could improve this limit, but not significantly.
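The arithmetic behind this limit can be checked with a short Python sketch. It assumes that "about one third" of 45 microseconds means exactly 15 microseconds per subset on the Cyber 172/2, and it ignores processing time altogether; the function name `generation_seconds` is hypothetical.

```python
from math import comb

VAX_MICROSECONDS_PER_SUBSET = 45.0
# Assumption: "about one third of this time" is taken as exactly one third.
CYBER_MICROSECONDS_PER_SUBSET = VAX_MICROSECONDS_PER_SUBSET / 3


def generation_seconds(n, k, microseconds_per_subset):
    """Time in seconds to generate every k-subset of an n-set."""
    return comb(n, k) * microseconds_per_subset * 1e-6


for n, k in ((30, 15), (32, 16)):
    seconds = generation_seconds(n, k, CYBER_MICROSECONDS_PER_SUBSET)
    print(n, k, comb(n, k), round(seconds / 60, 1), "minutes")
```

Under these assumptions the 155,117,520 subsets of size 15 of a 30-set take roughly 39 minutes, while the 16-subsets of a 32-set would take roughly two and a half hours. Because the number of subsets grows so quickly, even a severalfold speedup moves the limit by only a few elements.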

Figure 1. [Graph: average number of Pascal statements executed per subset produced (vertical axis, 0 to 40) against k, from k=2 to k=n-2 (horizontal axis), for each algorithm tested.]

The only disadvantage of using LS is that it does not have SMCP. EMK, about 4 times slower than LS, is the fastest algorithm with this property. If the processing is significantly faster with SMCP, then EMK should be used. Also, if the processing time dominates generation time, then a minor speedup from SMCP may justify EMK. The problem of finding a fast algorithm which has VSMCP is open.

Finally, we note that LEX is surprisingly fast. The simplicity of this algorithm (it requires no clever stack implementation) makes it attractive.
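The minimal change properties that drive this choice can be checked mechanically for any generated sequence. The following Python sketch is an illustration only: `revolving_door` uses the textbook recursive construction of a revolving door ordering, not the NXSRD subroutine that was tested, and the checker names are hypothetical. Subsets are held as ordered arrays (sorted tuples).

```python
from itertools import combinations


def revolving_door(n, k):
    """k-subsets of {1, ..., n} in a revolving door order (0 <= k <= n).

    Recursive textbook construction: the list for (n-1, k), followed by
    the reversed list for (n-1, k-1) with n appended to every subset.
    """
    if k == 0:
        return [()]
    if k == n:
        return [tuple(range(1, n + 1))]
    head = revolving_door(n - 1, k)
    tail = [s + (n,) for s in reversed(revolving_door(n - 1, k - 1))]
    return head + tail


def has_wmcp(seq):
    """Bitvectors of successive subsets differ in at most two positions."""
    return all(len(set(a) ^ set(b)) <= 2 for a, b in zip(seq, seq[1:]))


def has_smcp(seq):
    """Ordered arrays of successive subsets differ in exactly one position."""
    return all(sum(x != y for x, y in zip(a, b)) == 1
               for a, b in zip(seq, seq[1:]))


def has_vsmcp(seq):
    """Bitvectors of successive subsets differ in two adjacent positions."""
    def adjacent_swap(a, b):
        diff = sorted(set(a) ^ set(b))
        return len(diff) == 2 and diff[1] - diff[0] == 1
    return all(adjacent_swap(a, b) for a, b in zip(seq, seq[1:]))


rd = revolving_door(6, 3)
lex = list(combinations(range(1, 7), 3))  # lexicographic order
print(has_wmcp(rd), has_smcp(rd))  # True False
print(has_wmcp(lex))               # False
```

For n=6 and k=3 the revolving door order passes the WMCP check but not the SMCP check, and the lexicographic order fails even WMCP, in agreement with the classification given in the introduction.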

Figure 2. Number of Pascal statements executed by each algorithm, for n=5 to n=12 and k=2 to k=n-2, with one table per algorithm: Figure 2a (BER), Figure 2b (CHASE), Figure 2c (EMK), Figure 2d (EE), Figure 2e (LS), Figure 2f (LEX), Figure 2g (RD). [The table entries are not legible in this copy.]

Figure 3. Number of Pascal statements executed.

 n   k   BER      CHASE    EMK      EE       LS      LEX      RD
14   3   4687     ?        4616     16481    917     6184     8790
14   6   44419    66640    63237    ?        ?       49046    6710
14   9   27239    43211    40620    72131    11618   ?        69380
18   3   ?        ?        8727     ?        1915    11570    19488
18   6   278111   432264   292287   >500000  68601   284169   326006
18   9   >500000  >600000  >500000  >500000  214523  >500000  >500000

(? marks an entry that is illegible in this copy.)

References

[1] J. R. Bitner, G. Ehrlich, and E. M. Reingold, "Efficient Generation of the Binary Reflected Gray Code and its Applications", Communications of the Association for Computing Machinery, 19 (1976) 517-521.

[2] Margaret Carkeet and Peter Eades, "An Implementation of a Minimal Change Algorithm", Technical Report No. 45, Department of Computer Science, University of Queensland, January 1983.

[3] Philip J. Chase, "Algorithm 382: Combinations of M out of N objects", Communications of the Association for Computing Machinery, 13 (1970) 368.

[4] Peter Eades, Michael Hickey and Ronald C. Read, "Some Hamilton Paths and a Minimal Change Algorithm", Journal of the Association for Computing Machinery, 31 (1984) 19-29.

[5] Peter Eades and Brendan McKay, "An Algorithm for Generating Subsets of a Fixed Size with a Strong Minimal Change Property", Information Processing Letters, 19 (1984) 131-133.

[6] Gideon Ehrlich, "Loopless Algorithms for Generating Permutations, Combinations, and other Combinatorial Configurations", Journal of the Association for Computing Machinery, 20 (1973) 500-513.

[7] Shimon Even, Algorithmic Combinatorics, Macmillan (1973).

[8] Clement W. H. Lam and Leonard H. Soicher, "Three New Combination Algorithms with the Minimal Change Property", Communications of the Association for Computing Machinery, 25 (1982) 555-559.

[9] Edward M. Reingold, Jurg Nievergelt and Narsingh Deo, Combinatorial Algorithms: Theory and Practice, Prentice-Hall (1977).

[10] Albert Nijenhuis and Herbert S. Wilf, Combinatorial Algorithms, Monographs in Computer Science and Applied Mathematics, Academic Press (1975).