PARALLEL COMPUTING ELSEVIER
Parallel Computing 20 (1994) 353-361
Short communication
An optimal parallel algorithm for generating permutations in minimal change order
Jong-Chuang Tsay *, Wei-Ping Lee
Institute of Computer Science and Information Engineering, College of Engineering, National Chiao Tung University, Hsinchu, Taiwan 30050
(Received 25 August 1992; revised 2 July 1993)
Abstract
Permutation generation is an important problem in combinatorial computing. In this paper we present an optimal parallel algorithm to generate all N! permutations of N objects. The algorithm is designed to be executed on a very simple computation model that is a linear array with N identical processors. Because of the simplicity and regularity of the processors, the model is very suitable for VLSI implementation. Another advantageous characteristic of this design is that it can generate all the permutations in minimal change order.
Key words: Minimal change order; Parallel algorithm; Permutation generation; VLSI
1. Introduction
Combinatorial algorithms frequently involve generating all members of a set of combinatorial objects. For example, suppose we want to solve an optimization problem of the form
$$\max\; g(x_1, x_2, x_3, \ldots, x_N),$$
where the maximum is taken over all sequences $x_1 x_2 \ldots x_N$ that satisfy some constraints. In solving such a problem, it is necessary to generate each of the possible configurations of the $x_i$'s in some systematic order. We call these sequences of $x_i$'s
* This research was supported by the National Science Council of R.O.C. under Grant NSC 82-0408-E009-116.
* Corresponding author. Email: [email protected]
0167-8191/94/$07.00 © 1994 Elsevier Science B.V. All rights reserved
SSDI 0167-8191(93)E0089-E
the combinatorial objects of the problem. Basic combinatorial objects include permutations, combinations, subsets, partitions of integers, etc. [8,10,13]. In this paper, we shall propose an optimal parallel algorithm for permutation generation. The idea used to design this parallel algorithm is based on the design concept of the sequential algorithm proposed by Johnson [5]. In the past, dozens of sequential algorithms have been proposed for generating permutations [8,10-14]. Some methods generate permutations in lexicographic order, some in minimal change order. There are also many methods that produce permutations without regard to any special generation order. Extensive and elaborate surveys of this field are given in [11,12,14]. Furthermore, in [14], Sedgewick compared over thirty sequential algorithms of permutation generation. Recently, several parallel algorithms have been developed for generating combinatorial objects, such as permutations [1,3,4,7,9,16] and combinations [1-3,15]. With respect to the design of parallel algorithms for permutation generation, common design criteria are listed in the following: (1) Computation cost [1]: The computation cost of a parallel algorithm is defined as the total execution time taken by the algorithm multiplied by the number of processing elements (in short, PE) used. A parallel algorithm is said to be cost-optimal if its computation cost matches the lower bound on the number of sequential operations required to solve the problem. In the case of generating all the permutations of N objects, there are N * N! items to be output; hence the lower bound is O(N * N!). (2) Computation model [6]: We expect the computation model upon which the algorithm is executed to be as simple as possible. The simplicity of a computation model is measured by the following factors: (a) size of local memory, (b) modularity and regularity, (c) operation simplicity, and (d) communication locality. 
A simple computation model can be effectively realized in VLSI, thus greatly reducing hardware cost. (3) Order of permutation generation [13]: Aside from producing all the permutations efficiently, the order of generation should also possess other desirable features. For example, in some applications, we want to generate permutations in which the change between successive permutations is as small as possible. Algorithms satisfying this requirement are called minimal change algorithms. In addition, generation in lexicographic order is another desirable feature in some cases.

Concerning the above criteria, we find that the algorithms in [1,3,4,9] run on more complicated computation models than those on which the algorithms in [7,16] run. All PEs in [7,16] are both simple and identical, whereas the PEs in [1,3,4,9] have to perform more complex operations. The algorithm in [16] is the only minimal change algorithm, and the algorithm in [1] can generate permutations in lexicographic order. The others [3,4,7,9] generate permutations neither in minimal change order nor in lexicographic order. In the following, the main features of these parallel algorithms are described.

In [9], a parallel algorithm to generate all the $P^N_N = N!$ permutations of N objects was presented. Though the algorithm is cost-optimal, the operations of each PE are not simple. In [4], Gupta and Bhattacharjee also presented a parallel algorithm; the advantage of their design is that it can generate $P^N_M$ permutations for any $M \in [1, N]$ by using an arbitrary number of PEs. Their design is not cost-optimal, however. In [1], Akl proposed some concise parallel algorithms which are all cost-optimal, and these algorithms can generate $P^N_M$ permutations and $C^N_M$ combinations for any $M \in [1, N]$ by using an arbitrary number of PEs. Furthermore, they are all lexicographic algorithms. Nevertheless, each PE requires memory of size O(N). In [3], a parallel algorithm which can generate $P^N_M$ permutations for all M < N was designed, and this design can easily be modified to generate combinations. But the computation model of this algorithm needs a selector, and each PE in this model has a stack of size M. The parallel algorithm to generate $P^N_N$ permutations in [7] is cost-optimal and runs on a linear array in which each PE is simple and contains only six registers as its local memory. But the permutations generated by the algorithm do not follow any well known order, such as minimal change order or lexicographic order. In [16], a parallel algorithm was proposed to generate $P^N_N$ permutations in minimal change order. It is also cost-optimal and uses a simple computation model. However, the algorithm requires a special counting procedure to avoid computing large numbers directly. This special procedure makes the PEs in the linear array somewhat complicated.

In the next section, we shall propose a new parallel algorithm for generating $P^N_N$ permutations. In Section 3, we show that this algorithm achieves all goals set by the above design criteria: it is cost-optimal, has a very simple computation model, and is a minimal change algorithm.
2. Parallel algorithm design
In [5], Johnson proposed a sequential minimal change algorithm which can produce each permutation from its predecessor by the interchange of two adjacent elements. Hence, two successive permutations differ in only two positions. In what follows, we will modify and parallelize this concise method to obtain a parallel algorithm. First, let us explain Johnson's method briefly.

Definition 1 (Index I(k)). For a permutation $\pi = (c_1, \ldots, a, \ldots, b, \ldots, c_N)$, if a > b, we call the pair (a, b) an inversion of $\pi$. For a permutation $\pi' = (c_1, c_2, c_3, \ldots, c_N)$, we define indices I(k) for k = 1, 2, ..., N, where I(k) is 0 or 1 depending on whether the number of inversions on the elements 1, 2, 3, ..., k − 1 is even or odd. By convention, let I(1) = I(2) = 0.

For example, the permutation $\pi = (2, 4, 3, 1)$ has four inversions: (2, 1), (3, 1), (4, 1) and (4, 3). With respect to the element 4, only the inversions among the elements 1, 2 and 3 count, namely (2, 1) and (3, 1); their number is even and therefore I(4) is 0.
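As an illustration of Definition 1, the following Python sketch computes I(k) directly by counting the inversions among the elements smaller than k. The function name `index_parity` is ours, not the paper's, and the quadratic counting loop is chosen for clarity rather than speed.

```python
def index_parity(perm, k):
    """Return I(k): 0 if the elements 1..k-1 of perm contain an even
    number of inversions, 1 if that number is odd."""
    # Keep only the elements smaller than k, in the order they appear in perm.
    smaller = [c for c in perm if c < k]
    inversions = sum(
        1
        for a in range(len(smaller))
        for b in range(a + 1, len(smaller))
        if smaller[a] > smaller[b]
    )
    return inversions % 2


# Example from the text: pi = (2, 4, 3, 1).
pi = (2, 4, 3, 1)
print([index_parity(pi, k) for k in range(1, 5)])  # I(1), I(2), I(3), I(4)
```

For $\pi = (2, 4, 3, 1)$ the sketch returns I(4) = 0, in agreement with the example above, and I(1) = I(2) = 0 follows automatically because fewer than two elements are involved.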
Definition 2 (Interchanging operation T(k)). T(k) is an interchanging operation which is applied to an element k and its neighboring element of a permutation according to the following cases.
(Case i). If I(k) = 0 and the element immediately to the left of k is smaller than k, k is interchanged with that left element.
(Case ii). If I(k) = 1 and the element immediately to the right of k is smaller than k, k is interchanged with that right element.
(Case iii). If the permutation satisfies neither of the above two cases, T(k) is undefined.
For example, if we apply T(4) to the permutation $\pi = (2, 4, 3, 1)$, we get the permutation $\pi' = (4, 2, 3, 1)$. Johnson's rules for generating the next permutation are listed as follows:
Rule (1). At each stage apply T(m), where m is the largest element for which T(m) is defined.
Rule (2). Complement the indices I(k) for $m < k \le N$.
By applying the two rules to the new permutation repeatedly, all the permutations can be generated. For example, utilizing these rules to produce all permutations of the elements 1, 2, 3 and 4, we obtain the results listed in Table 1. In the remainder of this paper, we will use N to denote the number of objects which we want to permute and $\pi_k$ to indicate the kth permutation. We also define the set $S_i$ to be
$$S_i = \{\, s \mid (s \bmod N) = i,\ 0 \le s \le N! - 1 \,\}, \qquad 0 \le i \le N - 1.$$
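Johnson's two rules can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the names `index_parity`, `apply_T` and `johnson` are ours, and instead of complementing stored indices as in Rule (2), the sketch recomputes I(k) from Definition 1 at every step, which gives the same values at a higher cost.

```python
def index_parity(perm, k):
    """I(k): parity of the number of inversions among the elements 1..k-1."""
    smaller = [c for c in perm if c < k]
    inversions = sum(
        1
        for a in range(len(smaller))
        for b in range(a + 1, len(smaller))
        if smaller[a] > smaller[b]
    )
    return inversions % 2


def apply_T(perm, k):
    """Return the permutation obtained by T(k), or None if T(k) is undefined."""
    p = list(perm)
    pos = p.index(k)
    # I(k) = 0 selects the left neighbour, I(k) = 1 the right neighbour.
    nb = pos - 1 if index_parity(p, k) == 0 else pos + 1
    if 0 <= nb < len(p) and p[nb] < k:
        p[pos], p[nb] = p[nb], p[pos]
        return tuple(p)
    return None


def johnson(n):
    """Yield the permutations of 1..n in the order produced by Johnson's rules."""
    perm = tuple(range(1, n + 1))
    while perm is not None:
        yield perm
        successor = None
        for m in range(n, 0, -1):  # Rule (1): largest m with T(m) defined
            successor = apply_T(perm, m)
            if successor is not None:
                break
        perm = successor  # None once no T(m) is defined: generation is complete


perms = list(johnson(4))
print(len(perms), perms[:5])
```

With N = 4 the generator yields 24 permutations, beginning (1, 2, 3, 4), (1, 2, 4, 3), (1, 4, 2, 3), (4, 1, 2, 3), (4, 1, 3, 2), which is the start of the list in Table 1.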
Table 1
The permutation list generated by Johnson's rules

  k    pi_k (c1 c2 c3 c4)    T(m)    k mod 4
  0        1  2  3  4        T(4)       0
  1        1  2  4  3        T(4)       1
  2        1  4  2  3        T(4)       2
  3        4  1  2  3        T(3)       3
  4        4  1  3  2        T(4)       0
  5        1  4  3  2        T(4)       1
  6        1  3  4  2        T(4)       2
  7        1  3  2  4        T(3)       3
  8        3  1  2  4        T(4)       0
  9        3  1  4  2        T(4)       1
 10        3  4  1  2        T(4)       2
 11        4  3  1  2        T(2)       3
 12        4  3  2  1        T(4)       0
 13        3  4  2  1        T(4)       1
 14        3  2  4  1        T(4)       2
 15        3  2  1  4        T(3)       3
 16        2  3  1  4        T(4)       0
 17        2  3  4  1        T(4)       1
 18        2  4  3  1        T(4)       2
 19        4  2  3  1        T(3)       3
 20        4  2  1  3        T(4)       0
 21        2  4  1  3        T(4)       1
 22        2  1  4  3        T(4)       2
 23        2  1  3  4         -         3

From Table 1, we can observe that Johnson's rules always apply T(4) to the kth permutation $\pi_k$, when $(k \bmod 4) \ne 3$, to get the successor permutation. We state this property in the following observation.

Observation 1. By Johnson's rules, with the initial permutation $\pi_0 = (1, 2, \ldots, N-1, N)$, T(N) is always applied to $\pi_k$ when $k \notin S_{N-1}$, $0 \le k \le N! - 2$.

Next, we shall ask how to find the element m which satisfies Johnson's Rule (1) so that T(m) can be applied to the kth permutation $\pi_k$ to derive $\pi_{k+1}$, when $k \in S_{N-1}$. We shall answer this question after the following definition.

Definition 3 (Header permutation, final permutation, permutation block).
Suppose $k \in S_0$. We call the permutations $\pi_k, \pi_{k+1}, \pi_{k+2}, \ldots, \pi_{k+N-1}$ a permutation block. The permutations $\pi_k$ and $\pi_{k+N-1}$ are the header permutation and final permutation of the block, respectively. In addition, we call the permutations $\pi_0, \pi_1, \ldots, \pi_{N-1}$ the first permutation block. By Observation 1, we know that the final permutation $\pi_{k+N-1}$ can be obtained by applying T(N) (N − 1) times to the header permutation $\pi_k$. Therefore, we can predict the final permutation of any permutation block from the header permutation of the block. We state this property in the following observation.

Observation 2. Suppose $\pi_0 = (1, 2, \ldots, N-1, N)$, $k \in S_0$, and T(m) is the interchanging operation applied to the final permutation $\pi_{k+N-1}$. The value of m can be obtained from the header permutation $\pi_k = (c_1, c_2, \ldots, c_N)$ as follows:
$$m = \max\{\, c_i \mid T(c_i) \text{ is defined in } \pi_k,\ c_i \ne N,\ 1 \le i \le N \,\}.$$

By the above observations, we present a modified version of Johnson's method in the following sequential algorithm, Algorithm 1. In the algorithm, the variable m contains the value that makes T(m) the interchanging operation applied to the final permutation, and since the initial permutation $\pi_0$ is (1, 2, ..., N), by Observation 2 we initialize m to be N − 1.

Algorithm 1
(Initialization phase)
  $\pi \leftarrow (1, 2, 3, \ldots, N)$   /* Initialize the permutation $\pi$. */
  count $\leftarrow$ 0   /* The value of count is (k mod N) shown in the last column of Table 1. */
  m $\leftarrow$ N − 1   /* T(N − 1) is applied to the final permutation $\pi_{N-1}$. */
(Execution phase)
Repeat
  (Step 1) /* We apply T(N) to $\pi$, where $\pi$ is the kth permutation and $k \notin S_{N-1}$. */
    While count < (N − 1) Do
      (Step 1.1) Apply T(N) to $\pi$ to get $\pi'$, and Output $\pi'$.
      (Step 1.2) count $\leftarrow$ count + 1; $\pi \leftarrow \pi'$
    Endwhile
  (Step 2) /* We apply T(m) to $\pi$, where $\pi$ is the kth permutation and $k \in S_{N-1}$ (i.e. $\pi$ is a final permutation). */
    (Step 2.1) Apply T(m) to $\pi$ to get $\pi' = (c_1, c_2, \ldots, c_N)$, and Output $\pi'$
    (Step 2.2) Complement $I(c_i)$, where $m < c_i \le N$.
    (Step 2.3) /* $\pi'$ in Step 2.1 is the header permutation of the next permutation block. By Observation 2, the m of the next block can be computed from $\pi'$. */
      m = Max({0} $\cup$ {$c_i$ | $T(c_i)$ is defined in $\pi'$, $c_i \ne N$, $1 \le i \le N$})
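The steps of Algorithm 1 listed above can be sketched in Python as follows. Several details are assumptions on our part rather than the paper's: the function name `algorithm1`, the dictionary `ind` holding I(c) for every element, termination when m = 0 after Step 1, and computing the initial m by the formula of Observation 2 (which equals N − 1 for N ≥ 3) so that small N are also handled. The `count` variable is replaced by a `for` loop of N − 1 iterations.

```python
def algorithm1(n):
    """Sequential sketch of Algorithm 1; returns the list of permutations."""
    perm = list(range(1, n + 1))
    ind = {c: 0 for c in perm}  # I(c), all zero for the initial permutation
    out = [tuple(perm)]

    def target(c):
        """Position that T(c) would swap c into, or None if T(c) is undefined."""
        pos = perm.index(c)
        nb = pos - 1 if ind[c] == 0 else pos + 1
        return nb if 0 <= nb < n and perm[nb] < c else None

    def apply_T(c):
        pos, nb = perm.index(c), target(c)
        perm[pos], perm[nb] = perm[nb], perm[pos]
        out.append(tuple(perm))

    def header_m():
        # Observation 2: largest c != N with T(c) defined, or 0 if none exists.
        return max([0] + [c for c in perm if c != n and target(c) is not None])

    m = header_m()  # N - 1 when N >= 3
    while True:
        for _ in range(n - 1):  # Step 1: apply T(N) to the non-final permutations
            apply_T(n)
        if m == 0:              # assumed stopping rule: no T(m) left to apply
            return out
        apply_T(m)              # Step 2.1
        for c in perm:          # Step 2.2: complement I(c) for m < c <= N
            if c > m:
                ind[c] ^= 1
        m = header_m()          # Step 2.3: m of the next block from its header


print(algorithm1(3))
```

For N = 3 this prints the six permutations in the order (1, 2, 3), (1, 3, 2), (3, 1, 2), (3, 2, 1), (2, 3, 1), (2, 1, 3), each obtained from its predecessor by one adjacent interchange.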
    While Count(i) < (N − 1) Do
      (Step 1.1) /* Apply T(N). */
                 /* We use "$\leftrightarrow$" to denote a swapping operation. */
        Case:
          (1) C(i) = N and Ind(i) = 0: C(i) $\leftrightarrow$ C(i − 1); Ind(i) $\leftrightarrow$ Ind(i − 1)
          (2) C(i) = N and Ind(i) = 1: C(i) $\leftrightarrow$ C(i + 1); Ind(i) $\leftrightarrow$ Ind(i + 1)
        Endcase
        Output C(i)
        /* Evaluate m for the final permutation. */
        M(i) $\leftarrow$ Max(M(i − 1), M(i), M(i + 1))
      (Step 1.2) Count(i) $\leftarrow$ Count(i) + 1
    Endwhile
  (Step 2) /* Apply T(m) to $\pi$, where $\pi$ is the kth permutation and $k \in S_{N-1}$. Next, initialize M(i). */
    (Step 2.1) /* Apply T(m) or terminate the computation. */
      Case:
        (1) C(i) = M(i) and Ind(i) = 0: C(i) $\leftrightarrow$ C(i − 1); Ind(i) $\leftrightarrow$ Ind(i − 1)
        (2) C(i) = M(i) and Ind(i) = 1: C(i) $\leftrightarrow$ C(i + 1); Ind(i) $\leftrightarrow$ Ind(i + 1)
        (3) M(i) = 0: Stop
      Endcase
      Output C(i)
    (Step 2.2) /* Complement $I(c_i)$, where $m < c_i \le N$. */
      If C(i) > M(i) Then Ind(i) $\leftarrow \lnot$ Ind(i) Endif
    (Step 2.3) /* Initialize M(i). */
      Case:
        (1) C(i) = N: M(i) $\leftarrow$ 0
        (2) Ind(i) = 0 and i > 1 and C(i − 1) < C(i): M(i) $\leftarrow$ C(i)
        (3) Ind(i) = 1 and i < N and C(i + 1) < C(i): M(i) $\leftarrow$ C(i)
        (4) others: M(i) $\leftarrow$ 0
      Endcase
    (Step 2.4) Count(i) $\leftarrow$ 0
Endrepeat
The operations of Algorithm 2 are similar to those of Algorithm 1 except for the computation of m. In the parallel algorithm, every PE needs to know the value of m at the same time so that T(m) can be applied to a final permutation to generate the next header permutation in one parallel step. To achieve this, we superimpose the computation of m on the generation of a permutation block as follows. Suppose $\pi_k$ and $\pi_{k+N-1}$ are the header permutation and the final permutation, respectively, of a permutation block. Directly after the generation of the header permutation $\pi_k$, Step 2.3 of Algorithm 2 initializes M(i). (Each PE(i) tests whether T(C(i)) is defined or not. If it is, we initialize M(i) to be C(i); if not, M(i) is set to zero. Since $m \ne N$, if C(i) = N, we set M(i) to zero.) If M(h) = Max{M(1), M(2), ..., M(N)}, then m is equal to M(h) and PE(h) owns the value of m. In order to broadcast the value of m = M(h) to all PEs, each PE performs the statement 'M(i) $\leftarrow$ Max(M(i − 1), M(i), M(i + 1))' in Step 1.1 (N − 1) times in parallel. Thus after (N − 1) parallel steps, m is available in each PE and the T(m) operation can be applied to the final permutation $\pi_{k+N-1}$ to generate the next header permutation.

For example, suppose that N is 4 and $\pi_k = (c_1, c_2, c_3, c_4) = (4, 1, 3, 2)$ is a header permutation. After executing Step 2.3 of Algorithm 2, M(1), M(2), M(3), and M(4) are initialized as 0, 0, 3 and 0, respectively. Next, in Step 1.1 we execute the statement 'M(i) $\leftarrow$ Max(M(i − 1), M(i), M(i + 1))' three times, and the M(i) values are changed as follows:

                 M(1)  M(2)  M(3)  M(4)
Count(i) = 0       0     3     3     3
Count(i) = 1       3     3     3     3
Count(i) = 2       3     3     3     3
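The broadcast of m in the example above can be reproduced with a few lines of Python. The helper name `broadcast_step` is ours, and the sketch assumes that PE(1) and PE(N) simply ignore their missing neighbour, a boundary convention the text does not spell out.

```python
def broadcast_step(M):
    """One parallel step of M(i) <- Max(M(i-1), M(i), M(i+1)) on a linear array.

    All PEs read the old values, so a new list is built from the old one.
    """
    n = len(M)
    return [max(M[max(i - 1, 0):i + 2]) for i in range(n)]


M = [0, 0, 3, 0]  # M(1)..M(4) after Step 2.3 for the header permutation (4, 1, 3, 2)
for count in range(3):
    M = broadcast_step(M)
    print(f"Count(i) = {count}: {M}")
```

The three printed rows are [0, 3, 3, 3], [3, 3, 3, 3] and [3, 3, 3, 3], matching the values listed above. Because the array has N PEs, N − 1 such steps always suffice for the maximum to reach both ends.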
From the above descriptions, we have the following observation.
Observation 3. Suppose T(m) is the interchanging operation applied to a final permutation. By Algorithm 2, all PEs evaluate m in time so that T(m) can be applied to the final permutation to generate the next header permutation in one parallel step.

By this observation, we see that variables M(i) and Ind(i) can be correctly updated in time by each PE(i). Therefore the correctness of Algorithm 2 can be assured.
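To make the behaviour of Algorithm 2 concrete, the following Python sketch simulates the linear array sequentially, one parallel step at a time. It is an illustration under stated assumptions, not the authors' implementation: the names `init_m`, `swap_towards` and `simulate_algorithm2` are ours; the initial state (C(i) = i, Ind(i) = 0, M(i) set by the rule of Step 2.3) is assumed; end PEs ignore their missing neighbour in the Max statement; and the Count(i) registers are replaced by a loop of N − 1 iterations.

```python
def init_m(C, Ind, n):
    """Step 2.3: M(i) = C(i) if T(C(i)) is defined and C(i) != N, otherwise 0."""
    M = []
    for i, c in enumerate(C):
        left_ok = Ind[i] == 0 and i > 0 and C[i - 1] < c
        right_ok = Ind[i] == 1 and i < n - 1 and C[i + 1] < c
        M.append(c if c != n and (left_ok or right_ok) else 0)
    return M


def swap_towards(C, Ind, i):
    """Swap C(i) and Ind(i) with the neighbour selected by Ind(i) (0: left, 1: right)."""
    j = i - 1 if Ind[i] == 0 else i + 1
    C[i], C[j] = C[j], C[i]
    Ind[i], Ind[j] = Ind[j], Ind[i]


def simulate_algorithm2(n):
    """Return the permutations output by a sequential simulation of the PE array."""
    C = list(range(1, n + 1))  # C(i): element held by PE(i)
    Ind = [0] * n              # Ind(i): index I(C(i)), travelling with the element
    M = init_m(C, Ind, n)
    out = [tuple(C)]
    while True:
        for _ in range(n - 1):                 # Step 1
            swap_towards(C, Ind, C.index(n))   # Step 1.1: apply T(N)
            out.append(tuple(C))
            # Every PE takes the maximum over itself and its neighbours.
            M = [max(M[max(i - 1, 0):i + 2]) for i in range(n)]
        m = M[0]                               # now identical in all PEs
        if m == 0:                             # Step 2.1, case (3): Stop
            return out
        swap_towards(C, Ind, C.index(m))       # Step 2.1: apply T(m)
        out.append(tuple(C))
        Ind = [1 - b if c > m else b for c, b in zip(C, Ind)]  # Step 2.2
        M = init_m(C, Ind, n)                  # Step 2.3


result = simulate_algorithm2(4)
print(len(result), result[4], result[-1])
```

For N = 4 the simulation outputs 24 distinct permutations; the fifth one is the header permutation (4, 1, 3, 2) used in the example above, and the last one is (2, 1, 3, 4), as in Table 1.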
3. Concluding remarks
In the introduction, three common design criteria were set forth for evaluating a parallel algorithm for generating permutations. We can now evaluate our design according to these criteria. All the design goals behind these criteria are met by our proposed parallel algorithm (Algorithm 2): it is a cost-optimal, minimal change algorithm and can be executed on a very simple computation model which is amenable to VLSI implementation. A closely related open problem is to design parallel algorithms on the same model for generating permutations in lexicographic order. We shall study this problem in the near future.
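The cost-optimality claim can be checked with a short count. The derivation below is a sketch under an assumption of ours that is consistent with Algorithm 2 but not stated as a formula in the text: each pass through Step 1.1 or Step 2 takes constant time per PE and outputs one permutation. Here $t(N)$ denotes the parallel running time, $p(N)$ the number of PEs and $c(N)$ the computation cost as defined in the introduction.

```latex
\begin{align*}
t(N) &= \underbrace{(N-1)!}_{\text{permutation blocks}} \times
        \underbrace{\bigl((N-1) + 1\bigr)}_{\text{parallel steps per block}} \times O(1)
      = O(N!), \\
p(N) &= N, \\
c(N) &= p(N)\, t(N) = O(N \cdot N!).
\end{align*}
```

This agrees with the bound of $N \cdot N!$ output items quoted for criterion (1) in the introduction.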
4. References
[1] S.G. Akl, Adaptive and optimal parallel algorithms for enumerating permutations and combinations, Comput. J. 30 (5) (1987) 433-436.
[2] S.G. Akl, D. Gries and I. Stojmenovic, An optimal parallel algorithm for generating combinations, Informat. Processing Lett. 33 (1989/90) 135-139.
[3] G.H. Chen and M.S. Chern, Parallel generation of permutations and combinations, BIT 26 (3) (1986) 277-283.
[4] P. Gupta and G.P. Bhattacharjee, Parallel generation of permutations, Comput. J. 26 (2) (1983) 97-105.
[5] S.M. Johnson, Generation of permutations by adjacent transposition, Math. of Computat. 17 (83) (1963) 282-285.
[6] S.Y. Kung, VLSI Array Processors, Ch. 4 (Prentice-Hall, Englewood Cliffs, NJ, 1988).
[7] C.J. Lin, Parallel generation of permutations on systolic arrays, Parallel Comput. 15 (1990) 267-276.
[8] C.L. Liu, Introduction to Combinatorial Mathematics (McGraw-Hill, New York, 1968).
[9] M. Mor and A.S. Fraenkel, Permutation generation on vector processors, Comput. J. 25 (4) (1982) 423-428.
[10] A. Nijenhuis and H.S. Wilf, Combinatorial Algorithms (Academic Press, New York, 1978).
[11] R.J. Ord-Smith, Generation of permutation sequences: Part 1, Comput. J. 13 (3) (1970) 152-155.
[12] R.J. Ord-Smith, Generation of permutation sequences: Part 2, Comput. J. 14 (2) (1971) 136-139.
[13] E.M. Reingold, J. Nievergelt and N. Deo, Combinatorial Algorithms: Theory and Practice, Ch. 5 (Prentice-Hall, Englewood Cliffs, NJ, 1977).
[14] R. Sedgewick, Permutation generation methods, Comput. Surv. 9 (2) (1977) 137-164.
[15] C.Y. Tang, M.W. Du and R.C.T. Lee, Parallel generation of combinations, in: Proc. Internat. Comput. Symp., Taipei, Taiwan (1984) 1006-1010.
[16] B.Y. Wu and C.Y. Tang, An optimal parallel algorithm for generating permutations on linear array, in: Proc. First Workshop on Parallel Processing, Hsinchu, Taiwan, Republic of China (1990) 106-110.