Data parallel evaluation-interpolation algorithm for polynomial matrix inversion

Data parallel evaluation-interpolation algorithm for polynomial matrix inversion

Parallel Computing 19 (1993) 577-589 North-Holland 577 PARCO 743 Data parallel evaluation-interpolation algorithm for polynomial matrix inversion E...

580KB Sizes 0 Downloads 116 Views

Parallel Computing 19 (1993) 577-589 North-Holland

577

PARCO 743

Data parallel evaluation-interpolation algorithm for polynomial matrix inversion E.V. K r i s h n a m u r t h y

and Chen

Pin

Computer Science Laboratory, Research School of Physical Science and Engineering, The Australian National University, GPO Box 4, ACT 0200, Canberra, Australia Received 15 November 1991 Revised 16 October 1992

Abstract Krishnamurthy, E.V. and C. Pin, Data parallel evaluation-interpolation algorithm for polynomial matrix inversion, Parallel Computing 19 (1993) 577-589. This paper describes a data parallel algorithm for the inversion of polynomial matrix using evaluation and rational interpolation. The algorithm generates the inverse matrix whose elements are continued fractions, in time complexity O(max(tm, n2)) for an (m × m) polynomial matrix, whose determinant has an estimated degree n, t is the number of iteration to obtain an inverse (or Moore-Penrose inverse). The implementation of the algorithm has been done on the Connection Machine in CM FORTRAN using at most (mE(n + 1)) processors. This algorithm can be directly extended to invert arbitrary function matrices by proper choice of evaluation-interpolation points.

Keywords. Linear algebra; matrix inversion; rational interpolation; continued fractions; Connection Machine.

1. Introduction In many problems in system theory it is necessary to invert matrices whose elements are polynomials. Such matrices are called polynomial matrices. The inversion of such matrices are computationally laborious and need data-parallel operations for speedup. In this p a p e r we describe how a data parallel machine such as the Connection Machine can be used for the inversion of a polynomial matrix using evaluation and rational interpolation; see [1,2]. Let us denote by aO(x) t h e single-variable polynomial at the ijth entry in a matrix A(x) of ( m × m). We also assume that aij(x) is of degree (d - 1). Then we write

A( x)

= {aij( x ) }

(1.1)

where i, j = 1. . . . . m, and d

air(X) = E Cijk X k - x

(1.2)

k=l

where x ~ F ( F is the real field). Correspondence to: E.V. Krishnamurthy, Computer Science Laboratory, Research School of Physical Science and Engineering, The Australian National University, GPO Box 4, ACT 0200, Canberra, Australia. 0167-8191/93/$06.00 © 1993 - Elsevier Science Publishers B.V. All rights reserved

E.V. Krishnamurthy, C. Pin

For a given (m × m) matrix A ( x ) with its polynomial degree d - 1, in a SIMD architecture ch as Connection Machine, the algorithm described here requires O(max(tm, (n + 1)2)) ~,ps to construct an inverse or Moore-Penrose inverse through rational interpolation, using most (m2(n + 1)) processors, where (n + 1) is the number of data points for interpolation, lich equals m ( d - 1 ) + 1, and t is the number of iterations to obtain an inverse (or aore-Penrose inverse).

Inverting a p o l y n o m i a l matrix - P r i n c i p l e s

The procedure used for inverting a polynomial matrix is based on the principle of aluation-interpolation [1]. This procedure consists of three main stages: (i) The first stage evaluates the given polynomial matrix at a certain number of required ints to obtain a set of numerical matrices. (ii) In the second stage, the Moore-Penrose inverses of these matrices are found. (iii) In the third stage, having the values of the corresponding elements (these inverses) at 9se specified points, we use a rational interpolation scheme to reconstruct the inverse of the lynomial matrix at those given points. In this paper, our discussion is limited to single-variable polynomial and square matrices.

'Evaluation The first stage evaluation is carried out by rewriting (1.2), as an inner product: C2

aij= [1, x, x 2. . . . . x a - ' ]

=X'c~. C

(2.1)

ii

Lere cij -- {cl, c2,..., Cd}o; i, j = 1, 2 . . . . . m, and X = {1, x, x e , . . . , x a - l } . Thus if given a set of points in (2.1) we can carry out the evaluation for a polynomial matrix x) with manipulating three-dimension matrices and get a set of evaluated matrices. Let C :ich is a three-dimension real matrix denote the polynomial coefficient matrix and V the ~luated value matrix. That is

C = {cqk }

(2.2)

ere i, j = 1, 2 , . . . , m ; k = 1, 2 . . . . . d and (2.3)

V = { Uijh}

" i , j = l , 2 , . . . , m ; h = l , 2. . . . . n + l . According to the expresssion of matrix evaluation, at point x h d

Uijh = XTcTij = E xhl-lCeil .

(2.4)

l=1

Conveniently introducing V (h) which denotes the hth plane in V along the dimension 3, we ~,e V = { V (1), V (2) . . . . . V (n+l)}

ere obviously V (h) is the evaluated matrix of A ( x ) at point x h or V (h) =A(Xh).

(2.5)

Parallel algorithmfor polynomkd matrix inversion

579

2.2 g-Inversion The second stage of the algorithm requires inversion of the evaluated matrices. Since it is possible that some matrices may turn out to be singular, it is preferable to use an algorithm that gives the conventional inverse ( A - 1 ) or the Moore-Penrose inverse (A ÷). The use of this algorithm for finding Moore-Penrose inverses has several advantages: (i) It avoids failures due to singularities of matrices; (ii) The iterative procedure which is based on multiplication generates either the conventional inverse or the Moore-Penrose inverse and produces stable results; (iii) The implementation of the iterative matrix multiplication algorithm can be done efficiently in linear time in the Connection Machine in a data parallel fashion for all the evaluated values in different planes simultaneously. The Moore-Penrose inverse A ÷ satisfies the following conditions: • AA+A = A ; • A+AA+=A+; •

(AA+)r = (AA+);

• ( A + A ) r = (A+A). This algorithm carries out the quadratically convergent iteration Yk+l = Yk( 2 I - A Y k )

(2.6)

until I Y k + I - - Y g I <_E, here I" I stands for a suitable norm and • is a selected precision to control the iteration. The iterative step begins with Y0 = a A T where 0 < a < 2/trace(AA T) (where trace is the sum of the diagonal elements of a matrix). One also could use a cubic speedup with more efficiently convergent process

Yk+l = Yk( I + AYk + (AYk) 2)

(2.7)

here AY k = ( I - A Y k ) . Some distinct advantages including reliability and stability of such a process are discussed in [7]. Generally, the number of the iteration depends on two factors which are selected precision • and the original matrix Y0- In practical cases we found that if we choose • < 10 -s, the number of the iteration is of the order of ten.

2.3 Rational interpolation The third stage of the algorithm is the rational interpolation of corresponding elements in the inverse matrices, to produce the corresponding elements of the inverse of A(x). Let R(k, l) be a set of rational function of the form rk,t(X) = b k ( x ) / d l ( x ) , where bk(X) and dr(x) are polynomials of degree k and l, respectively. If a set of n + 1 = k + l + 1 pairs of points (xi, fi) for 0 _
rk,l(Xi)

=f/

for 0 _ i _ < n .

(2.8)

In a recent paper [2], a parallel systolic algorithm for rational interpolation was achieved. Let { X k} denote the set {x 0, x 1. . . . . x k} and {Xk, X} denote {x 0, X l , . . . , x k, x} which is obtained by appending {X~} to the element x. We denote P{X o} = P ( x o) = f 0 =P0 and P( Xi ) = Pr We then define

P{XI} =PI = P { x o , x,} = ( x 1 - x 0 ) / ( f I - f o ) P{X2} =P2 = { X , , x21 = [ ( X 2 - x , ) / ( P { X o ,

x21 - P { X , } ) ] +P{X0},

)

E.V. Krishnamurthy, C. Pin

d in general, for a specified i,

P{ s i ' xi+j}

= [(xi+j-xi)/(etxi-j,

xi+j} - e{ xi} )]

+e{xi-1}

r all j = 1, 2 . . . . . ( n - 1). So the interpolant r ( x ) on the given points tained as the following continued fraction [5], r(x)

=P0 + P1 +

XO, X l , . . . ~ X

n

is

X --X 0 x --x 1 x -x 2

P 2 - Po +

P3 - P , + " "

+

X --Xn_ 1

Pn -P.-2

(2.9)

dch will also be denoted by r(x)

X --X 0

=P0 -4

(2.10)

x-xi-1 I P1

+ i=2[ei - P i - 2

Then the rational interpolant rk,t(x) ~ R ( k , l) can be derived from the construction of the de approximant which can be expressed as the following recurrences:

gi+l = g i ( e i + l - e i - 1 ) +gi-l( x - x i )

(2.11)

qi + l = qi( Pi + l - Pi-1) -4-q i - ,( x - x i )

(2.12)

Lere 1 < i < n - 1, and with initial values g0 = e 0 ; gl =PoP1 + ( x - x 0 ) ; q0=l; qf=Pl" Thus, the rational interpolant can be denoted by rk,t = b k / b t = g n / q n

ere n + 1 = k + I + 1. For given n + 1 points, a rational interpolation with this algorithm may not exist due to the ginal function having singularity at some point x i ~ {x0, x 1. . . . . Xn}. Morever, there may be "h points which are unsuitable for rational interpolation because a division by zero arises at ;m in processing, which aborts the algorithm. We call these points as unattainable points. restigating the algorithm [2], we can prove that a division by zero, which arises in ~cessing, is caused by that there are at least two points selected at which any aij c A ( x ) of e (m, m) has same values. Obviously, if aij is a constant, the algorithm [2] will stop in the ;t loop due to abort. 'Remark

As described in [1], it is necessary to select suitable degrees of numerator and denominator rational interpolation so that it can fit all polynomials of the matrix A ( x ) at a given set of :a. In other words, we must know how many points should be used in interpolation. We 1sider that the degree of a polynomial matrix is the largest integer d - 1 for which cij d # 0 that the inverse of A is related to the adjoint A by (adj A ) A

= A ( a d j A ) = I . det A

Parallel algorithm for polynomial matrix inversion

581

where I is the identity matrix. Thus, the question above is equivalent to the estimation of the maximal degree of A - l(x) (here, assume A - 1 is still a polynomial matrix) because of that the rational Pade approximant (k + l) whose numerator is of degree k and denominator is of degree l has the same accuracy as a polynomial of degree k + l [6]. Further, let (d - 1) be the maximal degree of polynomial air(X) for 1 < i, j < m. Then det A ( x ) can be at most of degree m ( d - 1) since it is a sum of products of m elements of matrix A ( m × m). This also means that the accuracy for rational interpolation requires the sum of the n u m e r a t o r degree and the denominator degree can have at most degree D = ( k + l) = m ( d - 1). Therefore, we need to evaluate the polynomial matrix at D + 1 = m ( d - 1 ) + 1 points, at least, so that these interpolants can be reconstructed uniquely or to a desired accuracy over same field.

3. Algorithm

To describe the algorithm we use the following notation. • Let F denote a polynomial matrix of m × m and F(x)={fij(x)}

forl
• Let X denote a set of given n + 1 points {x0, x~ . . . . . Xn}. • Let C be the coefficient matrix of F C={cijk}

forl
where fij = x'~d "-'k=l C i~k x. k - I , and d - 1 is the maximal degree in all polynomials of F ( x ) . • L e t V ( m , m, n + 1) be the evaluation value matrix of F ( x ) at given n + 1 points where n = m ( d - 1). V={Vijk}

forl
here vii k = f i j ( x l ) = ~_,k=lCijkXl d h-i, X 1 EX. Conveniently, define V
forl
and meanwhile U ~k) = (v~k)) - 1. • L e t P ( m , m , n + 1) be used in rational interpolation to construct continued fraction, i.e. each pair of vectors {uij~, uiy2. . . . ,uiin+ ~} and {x 0, X l , . . . , x n} can generate vector { P i j l , Pij2 . . . . .

Pijn + 1}

The three main stages are illustrated in Fig. 1. The procedure shown in Fig. 1 can be sequentially programmed. In this case, given F ( m , m ) polynomial matrix with maximal degree d - 1, we summarize its complexities in Table 1 in terms of main stages. In Stage 2, the value of t is the number of iteration which is related to the precision selected [7]. In Fig. 1 it can be found that solving polynomial matrix inverse through rational interpolation can take advantage of data-level parallelism in following manner: • Since the evaluation for each element of the matrix at point x~ is computed independently, the evalutations of all elements can be carried out in parallel. • The evaluation of inverse of a polynomial matrix is achieved by inverting its evaluation matrix at same point, this operation can be done independently at each selected point.

E. I~. Krishnarnurthy, C. Pin

f Input

~ matrixC (m,m,d)

d

I-1

vu, = Z Cu, xk ,=l for l~ij~m, l~l~n+l

stage 1 evaluate Ue,) = (Va)) .l

matrix U Output

(~ mn+l) I J

for 1,k~+l

(m,m,n+l)~ /

J

stage 3 interpolate Fig. 1. The procedure of evaluation and interpolation. Fhe c o m p u t a t i o n for r a t i o n a l i n t e r p o l a t i o n is b a s e d on t h e fact t h a t t h e r e a r e two vectors vhich a r e {x o, x 1. . . . , x n} a n d {Uo, u l , . . . , uij ~} for each e l e m e n t o f t h e matrix. H e n c e , in :he stage 3 o f t h e p r o c e d u r e all v e c t o r s along t h e d i r e c t i o n o f t h e d i m e n s i o n 3 o f m a t r i x ? ( m , m, n + 1) can b e u s e d to g e n e r a t e the m a t r i x P in parallel. W i t h t h e s e f e a t u r e s , we p r o v i d e t h e a l g o r i t h m s for the t h r e e stages of the p r o c e d u r e .pectively. A l l t h e s e a l g o r i t h m s a r e b a s e d on a S I M D a r c h i t e c t u r e , such as a C o n n e c t i o n tchine. Such an a r c h i t e c t u r e can s u p p o r t g e n e r a l a r i t h m e t i c o p e r a t i o n s b e t w e e n any two erants that a r e c o n f o r m a b l e arrays, or any two o p e r a n d s in which t h e left o p e r a n d is any limensional a r r a y a n d t h e right o p e r a n d is a scalar.

;orithm Evaluation ,ut: :tput:

1. coefficient m a t r i x C = {ci~ ~} for 1 < i, j < m a n d 1 < k < d; 2. v e c t o r X = {x 0, x 1, x . . . . . x~}; e v a l u a t i o n m a t r i x F ( m , m , n + 1). Begin for l = 1, n + 1 do begin for h = 1, d d o begin V (t) = C (h) . x~_-i 1 End End End

pie 1 ge i

Arithmetic steps

Complexity

;e 1 ~e 2 ge 3

rn2(2(d - 1))(n + 1) (6m3t + 2m 3 + m2(t + 1) + m)(n + 1) 2m2(n + 1)2

O(m2dn) 0 (m3tn)) O(m2n 2)

Parallel algorithm for polynomial matrix inversion

583

Algorithm g-Inversion Input:

Output:

1. evaluation matrix V ( m , m, n + 1); 2. iterative precision e; 3. matrix l ( m , m, n + 1) in which I (i) is a diagonal unit matrix. matrix U(m, rn, n + 1) in which U (° is the inverse of V (i) for 1 < i < n + 1. Begin for 1 = 1, n + 1 do in parallel Begin a = 2 / t r a c e ( V ~ ° V a)~) X O (l) = Oil V(l)T

X 2 (~) = X0(o while • < max[ X1 q) - X 2 a ) [ do Begin X 2 (l) = ( I - Va)XO a)) X1 a) = XOV)(I + X 2 a ) ( I + X2a))) X 2 (t) = XO ~) XO ~t) = X1 ~t) End End U =X1 End

Algorithm Interpolation Input: Output:

1. matrix U(m, m, n + 1); 2. vector X = {x0, x l , . . . , xn}. matrix P ( m , m, n + 1). Begin if (Upq k -4= Upql) then subroutine A B O R T \. forl
Enddo R ( y + 1,n + 1) =

R ( j + 1,n + 1) __ S ( j + 1,n + 1)

for h = j + l, n + l do Begin M (h) = x h - x j g (h) = M ( h ) / R

(h)

R(h) = R(h) + p(j-1)

Enddo Enddo End In order to make the Algorithm Interpolation perform successfully, according to the discussion in Section 2, we require the matrix U(m, m, n + 1) to meet following condition Uij k -'~ Uij l

(3.1)

4

E.V. Krishnamurthy, C. Pin

ble 2 lge i ige 1 Lge2 Lge3

Arithmetic steps 2d(n + 1) 3m +3(m + 1)t 2(n + 1)+ ½(n + 1)2

Complexity O(dn) 0 (mt)) O(n 2)

r 1 < i, j < m; 1 < k, l < n + 1 and k 4= l. Therefore, the subroutine A B O R T is in charge of ecking up this condition to ensure that the matrix U ( m , m, n + 1) is not ill-conditioned. Therefore, the procedure of evaluation-interpolation can be represented as following ;orithm: ain algorithm 9ut: ~oefficient matrix C = {ciy}, for i, j = 1, 2 . . . . . m and k = 1, 2 . . . . . d; selected n + 1 points X = {Xo, x 1. . . . . xn}. ttput : evaluation matrix V ( m , m, n + 1); inversion matrix U ( m , m, n + 1); matrix of continued fraction P ( m , m, n + 1); :gin Step 1. Run the algorithm Evaluation to generate the evaluation value matrix V ( m , 1); Step 2. Run the algorithm g-Inverse which is used to obtain each U ~k) from V ~k) r k ~ { X o , X 1. . . . . Xn} in parallel. Step 3. For all vectors of matrix U along the third dimension, that is {Uijl, uij 2 . . . . . call the algorithm Interpolation to interpolate in parallel for each element of the mial matrix F. :d

m, n +

at point uijn+ t}, polyno-

For these algorithms above, we briefly give the complexity analysis in Table 2. Making a nparison between Table 1 and Table 2, we easily find out that these parallel algorithms npute the procedure of evaluation-inversion-interpolation for whole matrix F ( m , m ) :hin the time which it takes to do an element of matrix F without using parallelism. Thus roducing parallelism pays off very well. mark e algorithm can be easily extended to invert rational polynomial or other function matrices properly choosing the number of evaluation points to estimate the degree of the determi"it.

Implementation on the Connection Machine

]'he algorithms presented in Section 3 have be developed on the Connection Machine in i F O R T R A N . In order to make use of those parallel properties of algroithms described in Mous section, our objective of implementation is to try to use as many processors which the tem provides as possible so that the algorithms can be carried out with as high parallelism possible.

Parallelalgorithmfor polynomialmatrix inversion

585

The Connection Machine is a SIMD architecture system with many processors, in which the maximal configuration of processors can reach 64K. Therefore, any double precision three-dimension VP set (virtual processor set) of ( p x q x l) which has the VP ratio 1 must meet the condition below: p × q x l < 2048. (4.1) Consider the situation of our algorithms, we have

• p=q=m; • l=n+l=m(d-1)+l. Thus as if (m 3 x (d - i) + m 2) is less than or equal to 2048, it can be expected that the computation of the procedure of evaluation-interpolation with these algorithms can be carried out as we have designed.

4.1 The extension of Cannon's algorithm We may note that the most expensive part of running algorithm g-Inverse is the computation of matrix multiplication. For each plane V (i) of (m × m) where 1 < i < n + 1, it needs at least 3t + 1 times of matrix multiplication to compute the inverse of V (i) if the number of iteration is t. Totally, we have to have (3t' + 1)(n + 1) times for whole V(m, m, n + 1), where t' is the average number of iterations. On the other hand, it has be found that the computations for all those planes V °), V (2) . . . . . V (n + 1) actually are able to be done in parallel independently. The well-known Cannon's Algorithm is a successful program for matrix multiplication in CM, which is an optimal systolic algorithm since no processor is ever idle. An extension of Cannon's algorithm is required so that it can fit our application, i.e. we should be able to move data in each plane of a three-dimension matrix along the given dimension at same time. Our effort for this purpose is to make each plane V ~i) in the Stage 2 of Fig. 1 be able to be computed towards its inverse U ~i) in parallel. Under the modification, a two-dimension skewing array instead of a skewing vector in the original algorithm is calculated to initially skew the three-dimension operand matrices. For a matrix multiplication of A(4 X 4 x 6) and B(4 X 4 × 6), the skewing array is given by:

0000

1 1 1 1 2 2 2 2 3 3 3 3 where the element values in each column determine the amount of left-horizontal circular shift or up-vertical circular shift produced in each pair of operand matrices along the third dimension. Then our algorithm proceeds as follows: Matrix A before and after skewing along the second dimension is given below:

alli a21i a31i a41i

al2i a22i a32i a42i

al3i a23i a33i a43i

al4i I [alli al2i al3i a24i ] ~ I a22i a23i a24i a34i / [a33i a34i a31i a44i ] ~ a44i a41i a42i

al4i I a21iI a3zi[ a43i]

for 1 < i < 6 . Similarly matrix B before and after skewing along the first dimension is given by:

blli b21i b31i b41i for 1 < i < 6 .

bl2i b22i b32i b42i

bl3i b23i b33i b43i

bl4i b24i b34i b44i

blli b21i b31i b41i

b22i b32i b42i bl2i

b33i b43i bl3i b23i

b44iI blai I b24i i b34i]

i

E.V. Krishnamurthy, C. Pin

processor

5

pro I 0

4

pro20~

3

p os

2

pro4 0

1

pro5

0

~

0--~

0---~ 0

0 ~ 0 ~ 0 1

2

~ 0 ~ 0 3

4

5

time

Fig. 2. Systolic property for n + 1 = 5 points interpolated.

The remainder of the algorithm can be performed in a similar manner as Cannon's gorithm. This algorithm has an iteration loop of repetition m. In every iteration of the loop, element to element multiplication between matrix A dn matrix B will be done, and the ;ult is added in a partial result, then two shifting operations are used to change positions of :ments of matrices for next iteration. The first shifting operation moves all planes of matrix along the dimension 1 one place to the left except moving the current first plane to the last. e second moves all planes of matrix B along the dimension 2 up one position except Mng the current top plane to the last. With this improvement, the Step 2 of the main ;orithm can be carried out with O(tm).

' Systolic computation of a three-dimension Virtual Processor set The systolic property of the rational function interpolation is illustrated in Fig. 2; initially need (n + 1) processors and at the end of each step of the loop one processor is freed. At ~,last step only the pro(n + 1) is used. When dealing with matrix interpolation we initially use a three dimensional VP set(m x m n + 1) processors. After each step in the loop all processors in a plane along the third nension of the set will be released. Thus we require m2(n + 1) processors initially and m 2 )cessors at the last step as shown in Fig. 3 where m = 4, d = 2, n + 1 = 5.

Example We illustrate the above algorithm with a very simple example, for easy understanding. Let

A(x)=(3+x

X+x) 2+x 2+x ere m = 2 , d = 2 , thus n + l = m ( d - 1 ) + l = 3 , which means that three points are luired by the algorithm to obtain the inversion of A(x). We select X = {1, 2, 3}. By the :orithm Evaluation the evaluated matrices are (rounded to single precision values): Vo ) = [ 4.0013000000000000 2.000000000000000 at x 1 3.000000000000000 3.000000000000000 } ' V(2) = (5.000000000000000 3.000000000000000 / at x 2; 4.000000000000000 4.000000000000000 } ' V(s) = [ 6.000000000000000 4.0000000000000001 5.000000000000000 5.0000000000000001' at x = 3.

Parallel algorithmfor polynomial matrix inversion

587

processor

2 m, 5

O°°'" O

m2, 4

0---©---©'©

m2, 3 m2, 2 mZ, 1

1

2

3

5

4

time

Fig. 3. Systolic property for matrix rational interpolation.

T h e n their M o o r e - P e n r o s e inverses could be worked out through the algorithm g - Inverse:

(V(1))+=U(1):( (V(2))+= U(2)= (

( V(3)) + =

- 0.500000000000000

-0.333333333333333) 0.666666666666667

0.499999999999998 -- 0.499999999999997

--0.374999999999998) 0.624999999999997

0.500000000000000

U(3) = (

0.500000000000000 -

0.500000000000000

-- 0.400000000000000 ]

0.6000000000000001"

Using the matrices U (1), U (2) and U (3), the algorithm Interpolation generates a vector of continued fraction approximant for each element of the matrix A ( x ) . T h e y are PII = {0.500000000000000, -428914250225762, 0.500000000000002} P12 = { - 0.333333333333333, - 24.000000000013, - 0.500000000000040} P2i = { - 0.500000000000000, 333599972397814, - 0.500000000000003} P22 = {0.666666666666667, - 23.999999999983, 0.500000000000052}. Thus, according to (2.9), we can construct the inverse for matrix A ( x ) as below: A-1

(a'u =

a~ 1

a~z] a22 1

where a~j are the continued fraction approximants given by: x-1 a'll = 0.500000000000000 +

x -- 2 -- 428914250225762 + -

0.000000000000002

x-1 a'12 = - 0.333333333333333 +

x - 2 - 24.000000000013 +

- 0.166666666666707

E.V.. Krishnamurthy, C. Pin X-1

a~l = -- 0.500000000000000 +

x-2

333599972397814 +

-0.000000000000003

x-1

a22' = 0.666666666666677 +

x-2

-23.999999999983 +

-0.166666666666615

Inversion of multivariable polynomial matrices For inverting the multivariable polynomial matrices, the evaluation-interpolation technique n be used [1]. Like a single-variable interpolation, a Thiele-type interpolation formula which es a branched continued fractions for two-variable functions can be used [10]. Let f(x, y) a function defined on a given real field F and such that for each point (xi, y ) ~ F

f ( x i , Yj) =fij, i = 0 , 1 , . . . , n , j = O , 1 . . . . . m. t us introduce the partial inverted differences by:

q,[x,][yA =f/j, Y~ - Y o

~,[Xo][y0, yl] = q , [ x 0 l [ y l ] ~,[x0][y0 . . . . .

y,] =

~,[x0][y0] '

Yt - Yt- 1 g , [ x 0 ] [ y 0 , . . . , y,-2, y,] - g,[x0][y0 . . . . . y , ] ' X 1 --X o

~,[x0, ~ , ] [ y 0 . . . . . y,] =

~b[x1][Y0 . . . . . Yl] - ~ / [ X o ] [ Y o . . . . , Y l ] '

q,[x0 . . . . . x ~ ] [ y 0 . . . . . y,] Xk -- X k - 1 = ~b[x0 . . . . . x k - 2 , xk][Yo . . . . , Y t ] - ~ b [ x 0 . . . . . xk-1][Y0 ..... Y,] =bk" all the partial inverted differences ~b[x0,... , xk][y 0. . . . , Yl], 0 < k < n, 0 < l < m for the lction f(x, y) are defined, then a branched continued fractional interpolation for f(x, y) :r the field F can have the form as below [9]:

m° X

f(x'y)=b°°+

~-"

j=l

-

n

xj_______21 bjo

Y - Yi- 1

+ --~11 i

I

]

mi X - - X j - l [ boi +

j=ll

bji

:h m o > m l > . . . > _ m n . The computational procedure for the inverse differences b,j can be carried out in parallel But it is not clear at this time how well above method of using continued fraction can be ended to more than two-variable rational polynomial matrices. The theoretical studies •ried out so far only concern with multivariable Pad6 approximants. While polynomial and ated spline approximants can be easily extended to a tensor product space [1] in several 'iables, it is not known to the authors how similar extension can be made to continued ction approximants.

Parallel algorithm for polynomial matrix inversion

589

Acknowledgements T h e a u t h o r s w o u l d like to t h a n k t h e r e f e r e e s for s u g g e s t i n g i m p r o v e m e n t s for an e a r l i e r v e r s i o n o f this p a p e r .

References [1] E.V. Krishnamurthy, Error-Free Polynomial Matrix Computations (Springer-Verlag, New York, 1985). [2] V.K. Murthy, E.V. Krishnamurthy and Chen Pin, Systolic algorithm for rational interpolation and Pad6 approximation, Parallel Comput: 18 (Jan. 1992) 75-83. [3] E.V. Krishnamurthy and S.K. Sen, Numerical Algorithms (Affiliated, East-West Press, New Delhi, 1986). [4] Chen Pin and E.V. Krishnamurthy, Data parallel evaluation-interpolation algorithm for solving functional matrix equations, Lecture Notes in Comput. Sci. 634 (Proc. CONPAR 92-VAPP V, Lyon, France, Sep. 1992) (SpringerVerlag, Berlin) 781-782. [5] A. Ben-Israel and T.N. E Greville, Generalized Inverses: Theory and Applications (Wiley, New York, 1974). [6] E.W. Cheney, Introduction to Approximation Theory (McGraw Hill, New York, 1966). [7] E.V. Krishnamurthy and S.G. Ziavras, Matrix g-inversion on the Connection Machine, Tech. Report, CAR-TR399, Univ. of Maryland, 1988. [8] E.V. Krishnamurty and Chen Pin, Polynomial matrix inversion in the Connection Machine using data parallel evaluation-interpolation, submitted to ACSC-16, 1993. [9] A.M. Cuyt and B.M. Verdonk, A review of branched continued fraction theory for the construction of multivariate rational approximants, in: C. Brezinski, ed., Continued Fractions and Pad~ Approximants, (NorthHolland, Amsterdam, 1990) 75-83. [10] W. Siemaszko, Thiele-type branched continued fractions for two-variable functions, J. Comput. Applied Math. 9 (1983) 137-153.