Copyright © IFAC Algorithms and Architectures for Real-Time Control. Ostend. Belgium. 1995
A PARALLEL DISTRIBUTED MEMORY ALGORITHM FOR THE SINGLE-INPUT POLE ASSIGNMENT PROBLEM

M. Castillo*, V. Hernández** and R. Mayo*

*Universitat Jaume I, Dept. d'Informàtica, Castelló, España
**Universidad Politécnica de Valencia, Dept. de Sistemas Informáticos y Computación, Valencia, España
Abstract. Pole assignment is a central problem in the design of linear control systems. A number of algorithms have been proposed to solve this problem, but almost all of them target sequential machines. In this paper an efficient parallel algorithm for the single-input problem, based on the modified QR method, is proposed. The algorithm has been implemented on a T805 Transputer network with a ring topology, obtaining good speedups and efficiencies.

Key Words. Pole assignment; distributed memory; linear systems; single-input systems; state feedback.
1 INTRODUCTION
Consider the controllable time-invariant single-input linear system

$$\dot{x}(t) = A x(t) + b\,u(t). \qquad (1)$$

In cases where the state vector $x(t)$ is available, the input $u(t)$ may be determined by the linear control law, or linear state feedback,

$$u(t) = -k x(t), \qquad (2)$$

where the vector $k$ is said to be a feedback vector or gain vector. The corresponding closed-loop system is described by the equation

$$\dot{x}(t) = (A - bk)\,x(t), \qquad (3)$$

and hence the behaviour of the closed-loop system is determined by the properties of the matrix $(A - bk)$, more exactly by the eigenvalues of this matrix. So, if the closed-loop system is required to have a certain behaviour, the vector $k$ must be selected to force the matrix $(A - bk)$ to have a predefined set of eigenvalues. The problem of finding an appropriate feedback vector $k$ is referred to as the pole assignment problem (Petkov et al., 1991).

From the viewpoint of matrix computation, the pole assignment problem may be considered as an inverse eigenvalue problem which requires the determination of a matrix with given eigenvalues. In fact, the eigenvalues of the closed-loop system are known and the unknown feedback row vector $k$ is sought. In contrast to the determination of the eigenvalues of a matrix, the solution of the pole assignment problem is direct rather than iterative.

The structure of the paper is the following. Section 2 gives a detailed description of the sequential algorithm on which the parallel algorithm is based. Section 3 contains the proposed distributed memory parallel implementation. In Section 4 the experimental results and conclusions are described.

2 MODIFIED QR ALGORITHM FOR THE POLE ASSIGNMENT PROBLEM OF SINGLE-INPUT LINEAR SYSTEMS

The pole assignment problem has been widely studied in the control literature. Some of the best-known methods for solving it are QR type methods (Petkov et al., 1984; Miminis and Paige, 1982; Patel and Misra, 1984), solution via the real Schur form (Varga, 1981) and those based on the solution of a matrix equation (Datta, 1987; Arnold, 1993). This paper is focused on the QR type methods, in which the ideas of the iterative QR algorithm for the eigenvalue problem have been adapted for solving the pole assignment problem.

A preliminary step of these algorithms is the reduction of the system $(A, b)$ to the orthogonal canonical form (Laub and Linnemann, 1986; Dongarra et al., 1990)
$$\tilde{A} = P^T A P, \qquad \tilde{b} = P^T b, \qquad (4)$$

where $P$ is an orthogonal matrix, $\tilde{A}$ is an upper unreduced Hessenberg matrix and all the elements of the vector $\tilde{b}$ are equal to zero except the first one. This canonical form may be obtained in a numerically stable and efficient way using Householder reflections. There are several proposed algorithms to perform this reduction, even for distributed memory architectures (Dongarra et al., 1990). So, the study of the pole assignment problem starts from the system already in orthogonal canonical form, represented by

$$\dot{x}(t) = (P^T A P - P^T b k P)\,x(t). \qquad (5)$$

The row vector $\tilde{k} = kP$ is sought in order to assign the eigenvalues of $\tilde{A} - \tilde{b}\tilde{k}$ at the desired locations $\lambda_1, \lambda_2, \ldots, \lambda_n$. In the rest of the section the modified QR algorithm (Miminis and Paige, 1982; Petkov et al., 1984) for the solution of this problem is described.

Let $A^{(1)} = \tilde{A}$, $b^{(1)} = \tilde{b}$ and $k^{(1)} = \tilde{k}$. For a given eigenvalue $\lambda_1$, the matrix $A^{(1)} - b^{(1)}k^{(1)} - \lambda_1 I$ must be singular. Hence, if this matrix is reduced to upper triangular form by applying a set of Givens rotations from the right, then one of the diagonal elements must be zero. For $n = 5$ the matrix $A^{(1)} - \lambda_1 I$ will take the form

$$(A^{(1)} - \lambda_1 I)\,Q_1 = T^{(1)} = \begin{pmatrix} t_{11} & \times & \times & \times & \times \\ 0 & t_{22} & \times & \times & \times \\ 0 & 0 & t_{33} & \times & \times \\ 0 & 0 & 0 & t_{44} & \times \\ 0 & 0 & 0 & 0 & t_{55} \end{pmatrix}, \qquad Q_1 = R_{14} R_{13} R_{12} R_{11}.$$

The elementary rotation $R_{1i}$, $i = 1, \ldots, 4$, is calculated from the elements $(i+1, i)$ and $(i+1, i+1)$ in such a way that when it is applied the element $(i+1, i)$ is zeroed. The rotations $R_{11}$, $R_{12}$, $R_{13}$ and $R_{14}$ act on the column pairs (1,2), (2,3), (3,4) and (4,5) of the matrix, respectively.

Since $A^{(1)} - \lambda_1 I$ is an unreduced Hessenberg matrix, each rotation is nontrivial, and the diagonal elements $t_{ii}$, $i = 2, \ldots, n$, of the triangular matrix are nonzero. Because $b^{(1)}k^{(1)}$ affects only the first row of $A^{(1)} - \lambda_1 I$, these diagonal elements are the same as in $(A^{(1)} - b^{(1)}k^{(1)} - \lambda_1 I)\,Q_1$, and the element (1,1) of this matrix must be zero. That is why, if we write

$$b^{(1)} = (b_1, 0)^T, \qquad k^{(1)} Q_1 = (k_1, k^{(2)}),$$

where $b_1$ and $k_1$ are scalars and $k^{(2)}$ is an $(n-1)$-row vector, then we have that

$$t_{11} - b_1 k_1 = 0$$

and so $k_1 = t_{11}/b_1$, which will zero the first column of $(A^{(1)} - b^{(1)}k^{(1)} - \lambda_1 I)\,Q_1$. Left multiplication by the matrix $Q_1^T$ gives

$$Q_1^T (A^{(1)} - \lambda_1 I)\,Q_1 = Q_1^T T^{(1)} = \begin{pmatrix} \times & \times \\ \times & A^{(2)} - \lambda_1 I \end{pmatrix},$$

where new nonzero subdiagonal elements are introduced. The matrix $A^{(2)}$ is necessarily in unreduced Hessenberg form since the rotations implemented are nontrivial and all diagonal elements $t_{ii}$, $i = 2, \ldots, n$, are nonzero.

The application of $Q_1^T$ to the vector $b^{(1)}$ yields

$$Q_1^T b^{(1)} = R_{11}^T b^{(1)} = \begin{pmatrix} \times \\ b^{(2)} \end{pmatrix},$$

where only the first element of $b^{(2)}$ is nonzero. Now we have

$$Q_1^T (A^{(1)} - b^{(1)}k^{(1)})\,Q_1 = \begin{pmatrix} \lambda_1 & \times \\ 0 & A^{(2)} - b^{(2)}k^{(2)} \end{pmatrix},$$

where the pair $(A^{(2)}, b^{(2)})$ is controllable. In this way the problem has been reduced to the allocation of the remaining $n - 1$ eigenvalues $\lambda_2, \ldots, \lambda_n$ to $A^{(2)} - b^{(2)}k^{(2)}$ by choosing $k^{(2)}$. This requires the same operations performed for the allocation of $\lambda_1$, but working with a reduced problem.

Fig. 1. Elements of a Hessenberg matrix modified by the application of R3 from the right and left.

After the $(n-1)$th step we will obtain $A^{(n)}$ as a $1 \times 1$ matrix, and the unknown scalar $k^{(n)} = k_n$ is determined from

$$A^{(n)} - b^{(n)} k_n = \lambda_n,$$

so that

$$k_n = \frac{A^{(n)} - \lambda_n}{b^{(n)}}.$$
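The deflation step above, repeated for each pole, can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: real poles, dense lists of lists, and a hypothetical function name `assign_poles`; none of these come from the paper, whose implementation was written in C. Since $k^{(j)} Q_j = (k_j, k^{(j+1)})$, the sketch recovers the gain by undoing the rotations in reverse order, $k^{(j)} = (k_j, k^{(j+1)})\,Q_j^T$.

```python
import math

def assign_poles(A, b1, poles):
    """Gain k such that A - b k has the given real poles.

    Assumes A is an unreduced upper Hessenberg matrix (list of rows)
    and b = (b1, 0, ..., 0)^T with b1 != 0. Hypothetical helper.
    """
    n = len(A)
    A = [row[:] for row in A]          # working copy, plays the role of A^(j)
    beta = b1                          # the only nonzero entry of b^(j)
    gains, sweeps = [], []             # k_1..k_{n-1} and the rotations of each Q_j
    for lam in poles[:n - 1]:
        m = len(A)
        T = [row[:] for row in A]
        for d in range(m):
            T[d][d] -= lam             # T = A^(j) - lambda_j I
        rots = []
        for i in range(m - 1, 0, -1):  # zero the subdiagonal, bottom row first
            x, y = T[i][i - 1], T[i][i]
            r = math.hypot(x, y)       # nonzero because A^(j) is unreduced
            c, s = y / r, x / r
            for row in T:              # rotate columns i-1 and i
                u, v = row[i - 1], row[i]
                row[i - 1], row[i] = c * u - s * v, s * u + c * v
            rots.append((i, c, s))
        gains.append(T[0][0] / beta)   # k_j = t_11 / b_1
        for i, c, s in rots:           # form Q_j^T T by rotating rows i-1 and i
            top, bot = T[i - 1], T[i]
            T[i - 1] = [c * u - s * v for u, v in zip(top, bot)]
            T[i] = [s * u + c * v for u, v in zip(top, bot)]
        # A^(j+1) is the trailing block of Q_j^T T + lambda_j I
        A = [[T[r][q] + (lam if r == q else 0.0) for q in range(1, m)]
             for r in range(1, m)]
        beta *= rots[-1][2]            # second entry of Q_j^T b^(j)
        sweeps.append(rots)
    k = [(A[0][0] - poles[n - 1]) / beta]          # k_n
    for kj, rots in zip(reversed(gains), reversed(sweeps)):
        k = [kj] + k
        for i, c, s in reversed(rots):             # k^(j) = (k_j, k^(j+1)) Q_j^T
            u, v = k[i - 1], k[i]
            k[i - 1], k[i] = c * u + s * v, -s * u + c * v
    return k
```

For instance, `assign_poles([[0.0, 1.0], [1.0, 0.0]], 1.0, [-1.0, -2.0])` should return a gain close to `[3, 3]`, which gives the closed-loop characteristic polynomial $s^2 + 3s + 2$.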
To obtain the gain vector $k^{(1)}$ it is necessary to perform a backward sweep. Since in the forward sweep

$$k^{(j)} Q_j = (k_j, k^{(j+1)}), \qquad j = 1, 2, \ldots, n-1,$$

where $k_1, \ldots, k_n$ and $Q_1, \ldots, Q_{n-1}$ have already been determined, the backward sweep is

$$k^{(j)} = (k_j, k^{(j+1)})\,Q_j^T, \qquad j = n-1, \ldots, 1.$$

Then we have the following algorithm.

ALGORITHM-1: Sequential Algorithm.
(Input: the pair (A, b) in orthogonal canonical form.)
BEGIN
    A^(1) = A
    b^(1) = b
    For j=1 to n-1 step 1
        Choose Q_j = R_{j,n-j} R_{j,n-j-1} ... R_{j1} such that
            (A^(j) - λ_j I) Q_j = T^(j) is upper triangular
        aux_b = R_{j1}^T b^(j)
        k_j = T^(j)(1,1) / b^(j)(1)
        aux_A = Q_j^T T^(j) + λ_j I
        b^(j+1) = aux_b(j+1 : n)
        A^(j+1) = aux_A(j+1 : n, j+1 : n)
    EndFor
    k^(n) = k_n = (A^(n) - λ_n) / b^(n)
    For j=n-1 to 1 step -1
        k^(j) = (k_j, k^(j+1)) Q_j^T
    EndFor
    k = k^(1)
END

3 PARALLEL IMPLEMENTATION

In this section the parallel version of the modified QR algorithm on a multicomputer with bidirectional ring topology is presented. The first subsection introduces the data distribution used, and the second one the algorithm itself.

3.1 Data distribution

To implement an algorithm on a distributed-memory multiprocessor, it is necessary to choose a way to distribute the matrix among the memories of the different processors in order to obtain good performance. This choice determines the load balancing and the ratio between arithmetic time and communication time.

Suppose that a Givens rotation $R_3$ is applied to a 5 x 5 Hessenberg matrix in order to eliminate the subdiagonal element in position (4,3). If $R_3$ is applied from the right, only two columns of the matrix are modified; but when $R_3^T$ is applied from the left, only two rows are modified, as can be seen in Fig. 1.

Note that neither the typical column-wrapped nor row-wrapped storage is convenient for the parallel implementation of these operations.

If a row storage scheme is used, parallelism can only be exploited when the rotation is applied from the right, because the two modified columns are distributed among different processors (see Fig. 2).

On the other hand, when a column storage scheme is used, parallelism can only be exploited when the rotation is applied from the left, because the two modified rows are distributed among different processors (see Fig. 3).

In order to avoid these problems we are going to use the data distribution described in (Van de Geijn, 1993) for the eigenvalue problem. Consider a multicomputer formed by $p$ processors $P_0, P_1, \ldots, P_{p-1}$ and a matrix $A$ of dimension $n = ph$ divided into $p \times p$ blocks of dimension $h \times h$. Then the matrix is distributed among the processors such that the block $(i, j)$ is handled by the processor $P_{(i+j-2) \bmod p}$, as can be seen in Fig. 4. Note that the blocks are cyclically distributed by antidiagonals.
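The antidiagonal block-to-processor rule $(i + j - 2) \bmod p$ is easy to check with a small sketch. The helper names `bhw_owner` and `locate` are hypothetical and only illustrate the mapping with 1-based indices; they are not taken from the original implementation.

```python
def bhw_owner(i, j, p):
    """Index of the processor handling block (i, j); block indices are 1-based."""
    return (i + j - 2) % p


def locate(row, col, h, p):
    """Map a 1-based matrix entry (row, col) to its block, local position and owner.

    Returns (block_row, block_col, local_row, local_col, processor),
    for h x h blocks distributed among p processors.
    """
    bi, li = (row - 1) // h + 1, (row - 1) % h + 1
    bj, lj = (col - 1) // h + 1, (col - 1) % h + 1
    return bi, bj, li, lj, bhw_owner(bi, bj, p)


if __name__ == "__main__":
    p = 5
    # Print the owner of every block of a p x p block matrix.
    for i in range(1, p + 1):
        print(" ".join(f"P{bhw_owner(i, j, p)}" for j in range(1, p + 1)))
```

With this rule every block row and every block column contains one block per processor, which is why a rotation applied from either side can be spread over all processors.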
This storage scheme is referred to as the block-Hankel-wrapped (BHW) storage scheme. With this storage scheme parallelism can be exploited when the rotations are applied both from the right and from the left (see Fig. 5). Now the parallel algorithm will be described.

Fig. 2. In a row distribution when the rotation is applied from the left all the work must be done by only one processor.

Fig. 3. In a column distribution when the rotation is applied from the right all the work must be done by only one processor.

Fig. 4. Partitioning of a Hessenberg matrix of dimension 20x20 into 5x5 blocks of dimension 4x4, and their distribution among 5 processors.
3.2 Parallel Algorithm
The first phase of Algorithm-1 consists of reducing an upper unreduced Hessenberg matrix to triangular form. The dependency graph for this reduction can be seen in Fig. 6. In this dependency graph the circle nodes represent applications of Givens rotations to pairs of elements, while square nodes represent calculations of Givens rotations from pairs of elements and their corresponding applications.

Fig. 5. This figure shows how the workload of the left and right application of a Givens rotation is balanced among all the processors.

Fig. 6. Dependency graph for the reduction to triangular form of an upper unreduced Hessenberg matrix of dimension 5x5.

The rotation $R_{i-1}$ that eliminates the element $(i, i-1)$ is obtained from the elements $(i, i-1)$ and $(i, i)$. But $(i, i)$ is one of the results of applying the rotation $R_i$, which eliminates element $(i+1, i)$, to $(i, i)$ and $(i, i+1)$.

Therefore, after $R_i$ has been applied to the columns $(*, i)$ and $(*, i+1)$, it is possible to apply $R_i^T$ to the rows $(i, i+1 : n)$ and $(i+1, i+1 : n)$. Note that $R_i^T$ cannot be applied to the element $(i, i)$ until the rotation $R_{i-1}$ has been applied to the columns $(*, i-1)$ and $(*, i)$.

In order to introduce the basic ideas of the parallel algorithm, consider an upper unreduced Hessenberg matrix $A$ of dimension 20x20, partitioned into 5x5 blocks of dimension 4x4, and distributed in a system with 5 processors using the BHW storage scheme (see Fig. 7).

$$\begin{pmatrix} {}^{0}A_{11} & {}^{1}A_{12} & {}^{2}A_{13} & {}^{3}A_{14} & {}^{4}A_{15} \\ {}^{1}A_{21} & {}^{2}A_{22} & {}^{3}A_{23} & {}^{4}A_{24} & {}^{0}A_{25} \\ 0 & {}^{3}A_{32} & {}^{4}A_{33} & {}^{0}A_{34} & {}^{1}A_{35} \\ 0 & 0 & {}^{0}A_{43} & {}^{1}A_{44} & {}^{2}A_{45} \\ 0 & 0 & 0 & {}^{2}A_{54} & {}^{3}A_{55} \end{pmatrix}$$

Fig. 7. The notation ${}^{k}A_{ij}$ indicates that block $A_{ij}$ is handled by processor $P_k$.

Suppose that the blocks $A_{55}$ and $A_{44}$ have been reduced to upper triangular form, and the blocks $A_{54}$ and $A_{43}$ have been annihilated (the annihilation of the subdiagonal blocks will be explained later). So the next step is the triangularization of $A_{33}$.

It is easy to see what work has to be done in each processor to reduce $A_{33}$ to triangular form.

Processor $P_4$. In this processor the necessary rotations for the reduction are obtained and applied from the right to the block $A_{33}$.

Processors $P_2$, $P_3$. In these processors the rotations obtained in processor $P_4$ are applied from the right to the blocks $A_{13}$ and $A_{23}$. Note that this is only possible after a communication between processor $P_4$ and processors $P_2$ and $P_3$.

Thus the reduction of $A_{33}$ to triangular form has been achieved. But more operations can be done in parallel in this step. The second part in the calculation of an element of the gain vector is the application of the rotations from the left. We can do this in parallel with the application of the rotations from the right. The rotations obtained to reduce $A_{33}$ will affect, besides $A_{33}$, the blocks $A_{34}$ and $A_{35}$, which are stored in processors $P_0$ and $P_1$. Note that processors $P_0$ and $P_1$ have no work to do in the application of the rotations from the right, and there is no data dependency that prevents the application of the rotations from the left to $A_{34}$ and $A_{35}$.

Processors $P_4$, $P_0$, $P_1$. Apply the rotations from the left to $A_{33}$, $A_{34}$ and $A_{35}$. There must be a communication between processor $P_4$ and processors $P_0$ and $P_1$.

Fig. 8 shows the operations performed in each processor and the necessary communications between them. The parallel algorithm to reduce a diagonal block to upper triangular form is the following.

Fig. 8. Operations and data communications performed for the reduction of block A33 to the triangular form.

ALGORITHM-2: Parallel Algorithm for the Triangularization of Diagonal Block A_ii
IN PARALLEL: For k=0,...,p-1 In Processor P_k
BEGIN
    If (A_ii ∈ P_k) then
        Calculate T_i = A_ii Q_i where Q_i = R_{i,h-1} ... R_{i2} and
            R_ij is a Givens rotation setting A_ii(j+1,j) to zero
        Broadcast Q_i
        Update A_ii = Q_i^T T_i
    Else
        Receive Q_i
        For l=1 to i-1 step 1
            If (A_li ∈ P_k) then
                Update A_li = A_li Q_i
            EndIf
        EndFor
        For l=i+1 to p step 1
            If (A_il ∈ P_k) then
                Update A_il = Q_i^T A_il
            EndIf
        EndFor
    EndIf
END

Now the problem of the annihilation of subdiagonal blocks will be described. The principal characteristic of these blocks is that they have only one nonzero element, the last one of their first row. Therefore, the only operation to perform is to annihilate this element. This is a border step, where the rows and columns affected by the transformation to be applied are in different processors.
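As a quick check of the processor roles described for the triangularization of a diagonal block $A_{ii}$, the following sketch lists which processor computes the rotations, which ones apply them from the right (blocks $A_{li}$, $l < i$) and which ones apply them from the left (blocks $A_{il}$, $l > i$). It assumes the BHW rule $(i + j - 2) \bmod p$; the function names are hypothetical.

```python
def bhw_owner(i, j, p):
    """Processor handling block (i, j) under the BHW rule (1-based block indices)."""
    return (i + j - 2) % p


def triangularization_roles(i, p):
    """Processors involved in reducing diagonal block A_ii to triangular form.

    'compute' owns A_ii and obtains the rotations; 'right' maps each block
    A_li (l < i) to its owner; 'left' maps each block A_il (l > i) to its owner.
    """
    return {
        "compute": bhw_owner(i, i, p),
        "right": {(l, i): bhw_owner(l, i, p) for l in range(1, i)},
        "left": {(i, l): bhw_owner(i, l, p) for l in range(i + 1, p + 1)},
    }


if __name__ == "__main__":
    print(triangularization_roles(3, 5))
```

For $i = 3$ and $p = 5$ this reproduces the example in the text: $P_4$ computes, $P_2$ and $P_3$ update $A_{13}$ and $A_{23}$ from the right, and $P_0$ and $P_1$ update $A_{34}$ and $A_{35}$ from the left.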
In the previous example, the block $A_{32}$ has the following form:

$$A_{32} = \begin{pmatrix} 0 & 0 & 0 & a_{98} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \qquad a_{98} \neq 0.$$

The main problem of this step is the amount of communication needed to annihilate the element $a_{98}$.

The first step is to send the necessary data to begin the operations. When the step for the annihilation of this element begins, the blocks $A_{55}$, $A_{44}$ and $A_{33}$ have been reduced to triangular form and the elements of the blocks $A_{54}$ and $A_{43}$ have been annihilated (see Fig. 9).

Fig. 9. Form of the matrix at the beginning of the step for the annihilation of the subdiagonal block A32. The blocks in black are those that have been modified until now.

The elements that will be affected by the annihilation of the block $A_{32}$ are:

• Rows 8 and 9 of the matrix. These rows are distributed, so row 8 of the matrix is row 4 of the blocks $A_{21}$ (stored in processor $P_1$), $A_{22}$ (in processor $P_2$), $A_{23}$ (in processor $P_3$), $A_{24}$ (in processor $P_4$) and $A_{25}$ (in processor $P_0$). On the other hand, row 9 of the matrix is row 1 of the blocks $A_{32}$ (stored in processor $P_3$), $A_{33}$ (in processor $P_4$), $A_{34}$ (in processor $P_0$) and $A_{35}$ (in processor $P_1$).
• Columns 8 and 9 of the matrix. Like rows 8 and 9, they are distributed among several processors (processors $P_1$, $P_2$, $P_3$ and $P_4$).

The necessary initial communications are shown in Fig. 10. Once these communications have been carried out, it is possible to obtain the rotation that annihilates $a_{98}$. Clearly, its application requires the same data communications as the reduction of a diagonal block to triangular form. But now there are some special operations in order to restore the Hessenberg form, obtaining the new subdiagonal element $a_{98}$. The sequence of operations that affects this element is:

1. Annihilation of element $a_{98}$.
2. Annihilation of element $a_{87}$. This operation modifies the element $a_{88}$.
3. Obtain the new element $a_{98}$. This element is obtained with the application of the appropriate rotation from the left to the modified element $a_{88}$ of step 2.

Fig. 10. Operations and data communications performed for the annihilation of block A32.

In consequence, the data dependency forces us to wait for the annihilation of element $a_{i-1,i-2}$ in order to get the value of the new subdiagonal element $a_{i,i-1}$. Note that this annihilation is the first step of the triangularization of the next diagonal block $A_{i-1,i-1}$.

The data dependencies of this border step make the algorithmic description difficult. For the sake of simplicity, only the most relevant phases are explained here.

ALGORITHM-3: Parallel Algorithm for the Annihilation of Subdiagonal Block A_{i,i-1}
IN PARALLEL: For k=0,...,p-1 In Processor P_k
BEGIN
    Step 1: Perform the necessary initial data communications.
    Step 2:
    If (A_{i-1,i} ∈ P_k) then
        Obtain the Givens rotation
        Broadcast Givens rotation
        Apply it from the right
        Apply the transpose Givens rotation from the left (previously it is
            necessary to receive from the left processor the updated
            element A_{i-1,i}(h,1))
        Send A_ii(1,1) to right processor
        Receive A_ii(1,1) from left processor
    Else
        For j=1 to i-1 step 1
            If (A_{j,i-1} ∈ P_k) then
                Receive Givens rotation from broadcast
                Apply it from the right
                If (j=i-1) then
                    Send A_jj(1,*) to left processor
                EndIf
            EndIf
        EndFor
        For j=i to p step 1
            If (A_{i-1,j} ∈ P_k) then
                Receive Givens rotation from broadcast
                Apply it from the left
                If (j=i) then
                    Receive A_ii(1,1) from left processor
                    Apply last Givens rotation (for triangularization of A_jj
                        from the right to A_jj(1:2,1)) in order to retrieve
                        subdiagonal element A_jj(2,1)
                    Send A_jj(1,1) to left processor
                EndIf
            EndIf
        EndFor
    EndIf
    Step 3: Return initial transferred data to the original processors.
    Step 4:
    If (last step for this pole) then
        Processor which stores A_{i-1,i-1}:
            Obtain the corresponding element of the gain vector
            Update vector b
            Send new nonzero element of vector b to the processor which stores A_ii
    Else
        Processor which stores A_{i-1,i}:
            Receive updated A_{i-1,i-1}(h,h) (in the annihilation of
                A_{i-1,i-1}(h,h-1) corresponding to the next diagonal step)
            Retrieve subdiagonal element A_{i-1,i}(1,h)
            Return A_{i-1,i-1}(h,h)
    EndIf
END.

Algorithm-2, for reducing a diagonal block to triangular form, must be modified in order to send the updated element $(h, h)$ of the diagonal block to the right processor. This can be done after the Givens rotation that annihilates element $(h, h-1)$ has been obtained and applied from the right to the pair $((h, h-1), (h, h))$. Note that this is necessary only if the diagonal block is not $A_{pp}$. Thus, the modified algorithm is the following.

ALGORITHM-4: Parallel Algorithm for the Triangularization of Diagonal Block A_ii
IN PARALLEL: For k=0,...,p-1 In Processor P_k
BEGIN
    If (A_ii ∈ P_k) then
        Calculate R_{i,h-1} such that (0, a) = A_ii(h, h-1:h) R_{i,h-1}
        If (i ≠ p) then
            Send a to right processor
            Receive a from right processor
        EndIf
        Calculate T_i = A_ii Q_i where Q_i = R_{i,h-1} ... R_{i2} and
            R_ij is a Givens rotation setting A_ii(j+1,j) to zero
        Broadcast Q_i
        If (i ≠ p) then
            A_ii(h,h) = a
        EndIf
        Update A_ii = Q_i^T T_i
    Else
        Receive Q_i
        For l=1 to i-1 step 1
            If (A_li ∈ P_k) then
                Update A_li = A_li Q_i
            EndIf
        EndFor
        For l=i+1 to p step 1
            If (A_il ∈ P_k) then
                Update A_il = Q_i^T A_il
            EndIf
        EndFor
    EndIf
END

Finally, with the iterative application of the algorithms described above, a gain vector is obtained. But this gain vector, as in the sequential algorithm, is not the desired one: it is modified by all the rotations used to reduce the state matrix to triangular form in the assignment of each pole. Then, a backward sweep must be done to obtain the gain vector. This is accomplished by applying the corresponding transpose rotations.

In this backward sweep, each processor applies the transpose Givens rotations which have been obtained in it. This has to be done respecting the data dependency graph in Fig. 11. In this dependency graph each node represents the application of a transpose Givens rotation to a pair of elements of the obtained gain vector. If a node has inputs $R_{ij}$ (the Givens rotation obtained in the assignment of the $i$-th pole in order to annihilate the element $(j, j-1)$), $k_{j-1}$ and $k_j$, the outputs $k_{j-1}$ and $k_j$ represent the result of the application of $R_{ij}^T$ to that pair.
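To make a single node of the backward-sweep dependency graph concrete, the following worked equation writes out its action on one pair of gain elements. It assumes that each Givens rotation is parameterised by a cosine-sine pair $(c, s)$ with $c^2 + s^2 = 1$; this parameterisation and its sign convention are assumptions made for illustration and are not stated in the text.

```latex
\begin{aligned}
R_{ij} &= \begin{pmatrix} c & s \\ -s & c \end{pmatrix}, \qquad c^2 + s^2 = 1, \\
(k_{j-1}',\; k_j') &= (k_{j-1},\; k_j)\, R_{ij}^T
  = \bigl(c\,k_{j-1} + s\,k_j,\;\; -s\,k_{j-1} + c\,k_j\bigr).
\end{aligned}
```

Under this convention each node touches only two adjacent entries of the gain vector and preserves $k_{j-1}^2 + k_j^2$, so nodes acting on disjoint pairs can be evaluated independently as long as the ordering imposed by the dependency graph is respected.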
Fig. 11. Data dependency graph for the backward sweep in the case of a gain vector of dimension 5.

So, the parallel algorithm for the pole assignment problem is the following.

ALGORITHM-5: Parallel Algorithm for the Single-Input Pole Assignment problem.
BEGIN
    for i=1 to n-1 do (step +1)
        In Parallel: update A_kk = A_kk - λ_i I, k=1,...,p
        last_block = i DIV h
        if ((i MOD h)==h) then
            last_block = last_block + 1
            border_end = TRUE
        else
            border_end = FALSE
        endif
        for j=p downto last_block (step -1)
            Apply Algorithm-4 to A_jj
            if ((j ≠ last_block) OR (border_end)) then
                Apply Algorithm-3 to A_{j-1,j}
            endif
        endfor
        In Parallel: update A_kk = A_kk + λ_i I, k=1,...,p
    endfor
    k_n = (A_pp(h,h) - λ_n) / b^(n)
    In Parallel: update k = k Q^T
END

4 EXPERIMENTAL RESULTS AND CONCLUSIONS

4.1 Obtaining test data

In order to obtain a linear system for testing the algorithm the following process was used:

• Generate a random linear system $(A, b)$ where $A$ is an upper unreduced Hessenberg matrix and all the elements of $b$ are zero except the first one.
• Generate a random feedback row vector $\hat{k}$.
• Using LAPACK, compute the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of the matrix $A - b\hat{k}$.
• Apply the pole assignment algorithm to obtain the feedback row vector $k$ such that the eigenvalues of $A - bk$ are $\lambda_1, \lambda_2, \ldots, \lambda_n$.
• The error is given by $\|\hat{k} - k\|$.

4.2 Results

The parallel algorithm was implemented on a bidirectional ring of T805 Transputers (Graham and King, 1990; Mitchell et al., 1990), using C as the programming language. Fig. 12 shows the speedup obtained for several system matrix sizes using 2, 3, 4 and 5 processors. Due to the maximum dimension used (limited by the amount of memory per processor, in this case 1 Mbyte), the maximum speedup is achieved only in the case of two processors. Fig. 13 shows that, for a fixed number of processors, when the problem size increases the Mflops (millions of floating-point operations per second) obtained by the algorithm also increase.

Fig. 12. Graph of the speed-up.

Fig. 13. Graph of the performance (measured in Mflops).

The following table gives the numerical results for the efficiency.

                        Efficiency by number of processors
    Matrix dimension      2       3       4       5
    60                    0.77    0.49    0.39    0.33
    120                   0.84    0.59    0.51    0.45
    180                   0.87    0.64    0.56    0.52

4.3 Conclusions

The results obtained show that the algorithms for the pole assignment problem based on the QR method can be implemented on distributed memory multiprocessors with good performance. This is especially interesting for distributed systems where the memory associated with each processor cannot store the whole system, making it necessary to distribute the data among the memories of the processors. On the other hand, it has been shown that the BHW data distribution is appropriate for this type of problem. The results show that this can be one of the ways to parallelize the different QR algorithms for the pole assignment problem. Future work will include examining the use of this storage scheme for the parallelization of other algorithms for the pole assignment problem, for both single-input and multi-input linear systems.

5 ACKNOWLEDGEMENTS

This work was partially supported by the European ESPRIT III Basic Research Project GEPPCOM No. 9072, and by "Fundació Caixa Castelló" grant No. A-35-IN.

6 REFERENCES

Arnold, M. (1993). Algorithms and Conditioning for the Eigenvalue Assignment Problem. PhD thesis. Dept. of Mathematical Sciences, Northern Illinois University.
Datta, B.N. (1987). An algorithm to assign eigenvalues in a Hessenberg matrix: Single input case. IEEE Trans. on Automatic Control AC-32(5), 414-417.
Dongarra, J.J., S.J. Hammarling and D.C. Sorensen (1990). Block reduction of matrices to condensed forms for eigenvalue computations. In: Parallel Algorithms for Numerical Linear Algebra (H.A. Van der Vorst and P. Van Dooren, Eds.). pp. 215-239. North Holland.
Graham, Ian and Tim King (1990). The Transputer Handbook. Prentice Hall.
Laub, A.J. and A. Linnemann (1986). Hessenberg and Hessenberg/triangular forms in linear system theory. Int. J. Control 44(6), 1523-1547.
Miminis, George S. and Chris C. Paige (1982). An algorithm for pole assignment of time-invariant multi-input linear systems. Int. J. Control 35(2), 341-354.
Mitchell, D.A.P., J.A. Thompson, G.A. Manson and G.R. Brookes (1990). Inside The Transputer. Blackwell Scientific Publications.
Patel, R.V. and P. Misra (1984). Numerical algorithms for eigenvalue assignment by state feedback. Proceedings of the IEEE 72(12), 1755-1765.
Petkov, P.Hr., N.D. Christov and M.M. Konstantinov (1984). A computational algorithm for pole assignment of linear single-input systems. IEEE Trans. on Automatic Control AC-29(11), 1045-1048.
Petkov, P.Hr., N.D. Christov and M.M. Konstantinov (1991). Computational Methods for Linear Control Systems. Prentice Hall International.
Van de Geijn, R.A. (1993). Deferred shifting schemes for parallel QR methods. SIAM J. Matrix Analysis and Applications 14(1), 180-194.
Varga, A. (1981). A Schur method for pole assignment. IEEE Trans. on Automatic Control AC-26(2), 517-519.