Parallel Computing 17 (1991) 273-277 North-Holland
273
Short communication
Systolic computation of characteristic polynomials of Hessenberg matrices H. Schrt~der
a
and E.V. Krishnamurthy b
Department of Electrical Engineering and Computer Science, University of Newcastle, Newcastle NS W 2308, Australia b Computer Science Laboratory, Australian National University, Canberra, A C T 2601, Australia
Received August 1989 Revised July 1990
Abstract
Schr~der H. and E.V. Krishnamurthy, Systolic computation of characteristic polynomials of Hessenberg matrices, Parallel Computing 17 (1991) 273-277 This paper describes the use of Instruction Systolic Arrays to compute a scalar multiple of the characteristic polynomial of a Hessenberg matrix in time O(n), where n is the order of the matrix. Keywords. Instruction Systolic computation; complexity; characteristic polynomial; Hessenberg matrix.
1. Introduction A matrix is said to be in Hessenberg form, if it is an upper triangular form with an extra codiagonal (below the diagonal) or in lower triangular form with an extra codiagonal (above the diagonal); the almost upper diagonal form is called the upper Hessenberg matrix and the almost lower diagonal form is called the lower Hessenberg matrix. Every matrix can be reduced to lower or upper Hessenberg form by using similarity transformation involving Gaussian type elimination; if the given matrix is symmetric, its Hessenberg form turns out to be tridiagonal. An important numerical method for finding the eigenvalues of a matrix is based on the reduction of the matrix to either upper or lower Hessenberg form using the similarity transformation, Wilkinson [4], Krishnamurthy and Sen [2], Gregory and Krishnamurthy [1]. A beautiful property of the Hessenberg matrix is that a scalar multiple of its characteristic polynomial can be computed by a very simple recursive algorithm. This algorithm can be readily adapted for systolic and instruction systolic computation [3]. Since the characteristic polynomial has numerous applications, e.g. in statistics, numerical analysis, graph theory, such an algorithm will be very useful for not only hardware realization but also for synchronous programming in S I M D machines. 0167-8191/91/$03.50 © 1991 - Elsevier Science Publishers B.V. (North-Holland)
274
H. SchrSder, E. V. Krishnamurthy
2. Algorithm Let us assume that the given matrix A is available in lower Hessenberg form, with nonzero superdiagonal elements (if there are zero superdiagonal elements then we need to partition the matrix and work with the diagonal submatrices separately):
A=
all a21 a31
a12 a22 a32
0 a23 a33
0 0 a34
... ... ...
i_
anl
an2
an3
an4
•..
a:nn
)
Suppose we let P;, i = 1 to n + 1 be the (n + 1)-dimensional row vectors of the (n + 1) x (n + 1) matrix P, with P~ = (1, 0 . . . . . 0) and i
P;+I =
-- aiy
1
Z I'E~ti,;+lJ-:--------Pi+ ai,i+l /3;
(i = 1 . . . . . n -- 1)
(1)
P~+1 = L a . j P j - / 5 . ;
(2)
j=l
here/3,, denotes a single cyclic right shifted Pr The components of P~ + 1 correspond to the coefficients of a scalar multiple of the characteristic polynomial in left to right order, with the rightmost component as the coefficient of the n th term. For convenience of systolic computation from left to right we have reversed the vector P; in left-to-right order instead of right-to-left order used in [1] and [2].
3. Systolic implementation The implementation given in this paper makes use of a new concept of parallel computation called the instruction systolic array [3]. This concept leads to high speed, area efficiency and allows to visualize parallel processes. The instruction systolic array consists of an array of processors. A matrix of instruction codes (called left program (LP) throughout this paper) is pumped through this array in a systolic fashion (see Fig. 1 ). In addition to the matrix of instruction codes indicated as L P (left program) in Fig. 1, a Boolean matrix T P (top program) is pumped through the array of processors. If a 0 meets an instruction in a processor then it prevents the instruction to be executed, while a 1 enables its execution. Thus an ISA-program consists of a pair (LP, TP). The instructions needed in the implementation presented here are defined below. The computation of P.÷1 can be done by an instruction systolic array in time O(n) using O(n 2) processors as described below. Eqs. (1) and (2) are now rewritten, such that their form reflects the systolic algorithm which produces P.+I- Let ai,i+ 1 = 1/ai,i+ l, a . , . + 1 = - 1 and Pi = ( Pn . . . . . Pi,.+ l), then t
t
Pi+l,k=a'i.i+llL--aij'Pjk+Pi,k_ll, ~ j=l Let
j=l
i = 1 . . . . . n;
k=a ..... n+l.
(3)
]
I
Sizk = -- ~., aij" Pjk = Si.,-l.k
-- ai+" Pik
(4)
Systolic computation of characteristicpolynomials
275
then !
Pi+,,k=aij+l(S,,+Pi.k_,),
i=l,...,n;
k=l ..... n+l.
(5)
Firstly the matrix A is preprocessed such that the elements of its superdiagonal a~,i+ ,, i = 1 to n - 1 are redefined by ai,i+ 1 := 1/a~,~+1 (they are all nonzero by assumption). This can either be done in time O(n) using one processor to carry out the division operations, or it can be done in time O(1) using n - 1 processors. The matrix A is augmented by one element an,n+ 1
: =
- -
1.
The algorithm to evaluate P,+a is presented here as an ISA-prograrn for a triangular instruction systolic array: Let G j, 1 ~
S:=Su-Aw'P,
A:=Aw.
(sv)
Instruction sv is used to evaluate S,k according to Eq. (4).
P : = A w . ( P u w + SN),
A : = A w.
(cp)
Instruction cp is used to evaluate P~+,., according to Eq. (5). It is assumed that the open connections of the processors at the boundaries are constantly supplying a zero. The execution of the algorithm is initialized by setting P,, to 1 and all other registers to 0 (this can be achieved easily using instruction so and supplying appropriate data at the input ports). Then the ISA-program together with the matrix A are pumped through the processor array as shown in Fig. 1. Lets assume that a n enters Cn at time step 1, then S,k is produced in Ct, at time step i + k - 1 and P~,, is produced in G+Lk at time step i + k - 1 . Thus at time step 2n + 1, P,,+1.,+1 is computed. Thus the execution time is 2n + 1 and the period is n + 1 (including the initialization).
TP
IiT,TiT-iT,-I .......................... : l : l : l : l ; .....................
if!till :I:l! '. g41 ',f131 ', Q2I '~fill '. ', SV ~ SV ', SV ', SV ', : (142 ', ~32 : ¢222 ', ~J12 :
LP
i.s..v.i?..Li.s..LU..p.! : a 4 3 : a 3 3 ', a 2 3 : : SV : s v I c p :
', sv : cp : :-1: I.c.P_~
c,, c,, ISA c,, c,, c',, ~C4, C.,2 C.,a 6'44
C's, Cs2 C.'saCs., ('ss I
Fig. 1. An instruction systolic implementation (n = 4).
H. Schrrder, E.V. Krishnamurthy
276
4. Example
(4 0)
We illustrate the above algorithm with a simple example of a 3 x 3 Hessenberg matrix: A=
14/3
5
4.
4
3
3
After preprocessing, i.e. replacing superdiagonal elements by their reciprocals (they are nonzero) and introducing element a 3 , 4 = - - 1, we get: A=
4
1/3
0
0/
14/3
5
1/4
4
3
3
)
0
-1
P1 = (1, 0, O, O) P2.k = a;2[--au" P,,k + P,.k-,]" Thus P~, _ = ½[-4-1] P23 = 0 ;
=
7'" ,
-
.'~ [ 0 +
e ~__ =
11
=
½:
P24 = 0.
P3.k = a;3[-- {a2,P,.k + a22P2.k } + P2.k-,] • Thus
p,,='[-~+~]=', P,:= ¼[~] = ,'-::
p,_
~[-~--~1=-~.
p,~=o.
P4,k = a ; 4 [ - { a3,P,.k + a32P2.k + a33P3.k } + P3.k-,] • Thus P41
3.
P4.-
7.
P4~
1"
P44 =
1 ,2
Hence the scalar multiple to the characteristic polynomial is given by:
( ~ ) - (~)x + x'-- (,5)x ~.
5. Remarks If ordinary floating point arithmetic is used the coefficients obtained in the above algorithm may be subject to large rounding errors and these may cause the resulting polynomial to be ill-conditioned for very large numerical problems. However, presently there are several error-free arithmetic procedures, which include the rational arithmetic [1]; these procedures can be used efficiently in conjunction with the above systolic approach for many problems in symbolic computation, computer algebra, combinatorial and graph theory.
Acknowledgements The authors thank Richard Brent for his interest and the Australian National University for support during this work; also they thank the referees for their constructive remarks.
Systolic computation of characteristic polynomials
277
References [1] R.T. Gregory and E.V. Krishnamurthy, Methods and Applications of Error-Free Computation (Springer Verlag, New York, 1984). [2] E.V. Krishnamurthy and S.K. Sen, Numerical Algorithms (Affiliated East West Press, New Dehli, 1986). [3] H.W. Lang, The instruction systolic array, a parallel architecture for VLSI, Integration The VLSI J. 4 (1986) 65-74. [4] J.H. Wilkinson, The Algebraic Eigenvalue Problem (Oxford University Press, Oxford, 1965).