
VLSI implementation of the capacitance matrix method

P. Favati, Istituto di Elaborazione dell'Informazione, CNR, Via S. Maria 46, Pisa, Italy
G. Lotti, Dipartimento di Matematica, University of Trento, Povo, Italy
F. Romani, Dipartimento di Informatica, University of Pisa, Corso Italia 40, Pisa, Italy

Received 28 September 1988; revised 15 April 1990

Abstract. The solution of linear systems by the Morrison formula often allows a clever use of the structure of the coefficient matrix. This method, when applied to the solution of Partial Differential Equations, is called the capacitance matrix method. A VLSI design for this method is studied, and area-time upper bounds are obtained. Some applications are discussed.

Keywords. VLSI models, area-time complexity, ordinary differential equations, elliptic equations, linear systems, Morrison formula, capacitance matrix method.

1. Introduction

Often, in numerical analysis, a linear system Ax = b has to be solved, where the coefficient matrix A can be expressed as the sum of a matrix B, for which efficient solvers are available, and a low-rank matrix H. In such cases the Morrison formula [7] can be used to exploit the structure of B and H. This situation occurs when solving some differential problems by finite difference discretization, and the use of the Morrison formula is then known as the capacitance matrix method [1,11].



In this paper the VLSI implementation of the capacitance matrix method is investigated. Let A be a q × q nonsingular matrix and let A = B + H, where B is nonsingular and H is of rank p, p << q, i.e. H = FG, where F and G are of size q × p and p × q, respectively. Then

    A^{-1} = B^{-1} - B^{-1}F(I + GB^{-1}F)^{-1}GB^{-1},        (1.1)

where the p × p matrix C = I + GB^{-1}F is nonsingular and is called the capacitance matrix. The solution x = A^{-1}b can be written as

    x = B^{-1}b - B^{-1}F(I + GB^{-1}F)^{-1}GB^{-1}b.        (1.2)

In Section 2 the VLSI implementation of an algorithm to compute (1.2) is studied and the corresponding AT² bound is given. In Section 3 the above results are applied to the VLSI solution of two differential problems, namely (i) ordinary differential equations with variable coefficients and periodic boundary conditions and (ii) Poisson or Helmholtz partial differential equations on irregular regions. In [14] the stability of Morrison's formula is examined and it is stated that if A and B are well conditioned then it is possible to find matrices F and G such that the computation of (1.2) is stable.
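
As a minimal numerical illustration of (1.1)-(1.2) (not part of the original paper; the sizes and the random test matrices below are arbitrary assumptions), the following NumPy sketch solves Ax = b through the p × p capacitance matrix instead of factoring A directly.

# Illustrative sketch of formula (1.2); B, F, G are random test data, not the
# matrices used later in the paper.
import numpy as np

rng = np.random.default_rng(0)
q, p = 200, 4
B = np.eye(q) + 0.1 * rng.standard_normal((q, q))    # assumed well conditioned
F = rng.standard_normal((q, p))
G = rng.standard_normal((p, q))
A = B + F @ G
b = rng.standard_normal(q)

Binv_F = np.linalg.solve(B, F)                        # B^{-1} F
C = np.eye(p) + G @ Binv_F                            # capacitance matrix C = I + G B^{-1} F
Binv_b = np.linalg.solve(B, b)                        # B^{-1} b
x = Binv_b - Binv_F @ np.linalg.solve(C, G @ Binv_b)  # formula (1.2)

print(np.linalg.norm(A @ x - b))                      # residual should be near machine precision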

Paola Favati was born in Pisa, Italy, in 1959. She graduated in Mathematics at the University of Pisa in 1982. Since 1984 she has been a researcher of CNR at the Istituto di Elaborazione dell'Informazione di Pisa. Her research interests are in Parallel Algorithms and Numerical Linear Algebra.

Grazia Lotti was born in Pisa, Italy, in 1952. She graduated in Computer Science at the University of Pisa in 1974. She is a Professor of Computer Science at the University of Trento, Italy. She has been engaged in research on Computational Mathematics, Numerical Quadrature and VLSI Complexity.

Francesco Romani was born in Grosseto, Italy, in 1952. He graduated in Computer Science at the University of Pisa in 1974. He is a Professor of Computer Science at the University of Pisa. He has been engaged in research on Cellular Automata, Number Theory, Computational Mathematics, Numerical Quadrature and VLSI Complexity.


It is easy to verify that the choice made for F and G in our examples is the best possible one in the sense of [14].

2. Area-time upper bounds for the capacitance matrix method

In this section a VLSI design is proposed for the implementation of the capacitance matrix method. The inner step of this method is the solution of a linear system having the capacitance matrix C as coefficient matrix. If C is symmetric positive definite, then the most common methods to solve the system Ct = v are Gaussian elimination (without pivoting) and the conjugate gradient method. Since, in the general case, the matrix C does not have these properties, the above-mentioned methods can be applied to solve the equivalent linear system C^T Ct = C^T v, see e.g. [12]. In this work we solve the system C^T Ct = C^T v by inverting the coefficient matrix by Gaussian elimination and then performing a matrix-vector multiplication. It is well known that this technique is especially suitable when many systems with the same coefficient matrix are to be solved. The computation of (1.2) can then be decomposed into the following steps:
(1) Solve the p linear systems BZ = F.
(2) Compute C = GZ + I.
(3) Compute M = C^T C.
(4) Compute M^{-1}.
(5) Solve the linear system Bu = b.
(6) Compute v = Gu.
(7) Compute w = C^T v.
(8) Compute t = M^{-1}w.
(9) Compute s = Ft.
(10) Solve the linear system Br = s.
(11) Compute x = u - r.
This algorithm leads to the design of Fig. 1, where module I is devoted to solving linear systems with B as coefficient matrix (steps 1, 5, 10); module II performs the matrix-vector product of size q × p (step 9) and is capable of storing and shifting out the columns of the matrix F.
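
A sequential NumPy sketch of these eleven steps may help fix the data flow; it is a model of the computation only, not of the circuit, and np.linalg.solve stands in for whatever solver module I implements for B (the test data below are arbitrary assumptions).

# Steps (1)-(11) of the capacitance matrix method, split into the preprocessing
# phase (steps 1-4) and the per-right-hand-side phase (steps 5-11).
import numpy as np

def preprocess(B, F, G):
    Z = np.linalg.solve(B, F)            # (1)  solve B Z = F
    C = G @ Z + np.eye(G.shape[0])       # (2)  C = G Z + I
    M = C.T @ C                          # (3)  M = C^T C
    Minv = np.linalg.inv(M)              # (4)  M^{-1} (Gaussian elimination)
    return C, Minv

def solve_system(B, F, G, C, Minv, b):
    u = np.linalg.solve(B, b)            # (5)  solve B u = b
    v = G @ u                            # (6)  v = G u
    w = C.T @ v                          # (7)  w = C^T v
    t = Minv @ w                         # (8)  t = M^{-1} w
    s = F @ t                            # (9)  s = F t
    r = np.linalg.solve(B, s)            # (10) solve B r = s
    return u - r                         # (11) x = u - r

# usage on random data of arbitrary size
rng = np.random.default_rng(1)
q, p = 100, 3
B = np.eye(q) + 0.1 * rng.standard_normal((q, q))
F = rng.standard_normal((q, p))
G = rng.standard_normal((p, q))
C, Minv = preprocess(B, F, G)
b = rng.standard_normal(q)
x = solve_system(B, F, G, C, Minv, b)
print(np.linalg.norm((B + F @ G) @ x - b))   # small residual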

Fig. 1. Circuit implementing the capacitance matrix method.

Fig. 2. Preprocessing phase of computation.


Fig. 3. Second phase of computation.

Fig. 4. Final phase of computation.

Module III performs the matrix-vector product of size p × q (steps 2, 6) and, after step 2 has been executed, it adds the columns of the identity matrix to the results. Module IV receives as input and stores the matrix C by columns, computes the matrix M = C^T C (step 3), and computes and stores the inverse matrix M^{-1} (step 4). Then it receives as input the vector v and solves the linear system Mt = C^T v by computing two p × p matrix-vector products (steps 7-8). Finally, in module V the partial result of step 5 is stored and the required solution is computed (step 11). The algorithm can be divided into three phases:
(i) a preprocessing phase, not depending on the right-hand side vector, in which the inverse of the capacitance matrix is computed (steps 1-4); this phase is shown in Fig. 2;
(ii) the solution of the system Bu = b and the computation of the vector v (steps 5-6); this phase is shown in Fig. 3;
(iii) a final phase in which the solution x = u - B^{-1}FM^{-1}C^T v is computed by solving another linear system with coefficient matrix B (steps 7-11); see Fig. 4.
In Fig. 5 the VLSI circuit implementing the above described algorithm is presented. Modules II and III are implemented by using the well-known mesh-of-trees structure. An r × s matrix-vector product can be performed in ⌈log r⌉ + ⌈log s⌉ + 1 time units by a mesh of trees whose layout has height h = O(r log s) and width l = O(s log r) [9]. Let A_B and T_B be, respectively, the area and the time used by module I to solve a linear system having B as coefficient matrix. Then pT_B steps suffice to solve the p systems at step 1 and p(⌈log q⌉ + ⌈log p⌉ + 1) additional steps suffice to compute the capacitance matrix.
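
As a purely illustrative instance (the values are not taken from the paper), with q = 1024, p = 16 and logarithms to base 2, step 1 costs 16 T_B time units and the capacitance matrix requires p(⌈log q⌉ + ⌈log p⌉ + 1) = 16 (10 + 4 + 1) = 240 additional time units.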

Fig. 5. VLSI layout of the circuit of Fig. 1.


In module IV the matrix C is stored in a mesh of processing elements, then its entries are suitably routed as input to a circuit performing the p × p matrix multiplication M = C^T C (step 3). The most suitable scheme seems to be that suggested by Preparata and Vuillemin in [10], with time T = O(p) and area A = O(p²). The p × p inverse matrix M^{-1} (step 4) can be computed by Gaussian elimination (without pivoting) in a systolic array in time T = O(p) and area A = O(p²) [8]. Finally, a time of order O(p) and an area of order O(p) suffice to perform steps 7 and 8 by using the same scheme of [10] for matrix-vector multiplication. It is easy to prove that the insertion of the wires needed to route data, and of the counters and delays needed to synchronize the computation, does not change the area and time bounds for module IV, which are of order O(p²) and O(p), respectively. The whole algorithm can therefore be implemented with

    A = O(A_B + pq log p log q + p²),        T = O(pT_B + p log q).

Note that the preprocessing phase shown in Fig. 2 can be performed in time T_P = O(pT_B + p log q). At the end of this phase the inverse matrix M^{-1} is available in the circuit and can be used to solve any system of the type Ax = b in time T_A = O(T_B + log q + p). If the linear system C^T Ct = C^T v is solved by the conjugate gradient method instead, then the VLSI implementation of the capacitance matrix method has the upper bounds

    A = O(A_B + pq log p log q + p²),        T = O(pT_B + p log q + pk),

where k is the number of iteration steps of the conjugate gradient method [3].

3. Applications

An immediate application of the previous algorithm is the solution of linear systems derived from the finite difference discretization of the following one-dimensional differential problem with periodic boundary conditions and variable coefficients:

    α(x) d²u(x)/dx² + β(x) du(x)/dx + γ(x) u(x) = f(x),        u(x_0) = u(x_n).

In this case the discretization leads to the linear system A x = b where

A =
    | 2δ_0 + γ_0       -δ_0           0        ...        0           -δ_0              |
    |    -δ_1       2δ_1 + γ_1      -δ_1        0         ...           0               |
    |     0             ...          ...       ...        ...          ...              |
    |  -δ_{n-1}           0           0        ...     -δ_{n-1}   2δ_{n-1} + γ_{n-1}    |

where h is the discretization step and

    δ_i = α(x_i)/h² + β(x_i)/(2h),        γ_i = γ(x_i).
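
The following NumPy fragment (an illustration added here, not taken from the paper) assembles a matrix with exactly this structure from given vectors δ and γ; the wrap-around indexing produces the two corner entries -δ_0 and -δ_{n-1} introduced by periodicity.

# Assemble the periodic-boundary matrix: diagonal 2*delta_i + gamma_i,
# both neighbours -delta_i, with wrap-around producing the corner entries.
import numpy as np

def build_periodic_matrix(delta, gam):
    n = delta.size
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, idx] = 2 * delta + gam
    A[idx, (idx + 1) % n] = -delta       # right neighbour (row 0 wraps to column n-1)
    A[idx, (idx - 1) % n] = -delta       # left neighbour (row n-1 wraps to column 0)
    return A

delta = np.full(8, 1.0)                  # arbitrary sample coefficients
gam = np.full(8, 0.5)
A = build_periodic_matrix(delta, gam)
print(A[0, -1], A[-1, 0])                # the two corner entries: -delta_0, -delta_{n-1}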


Matrix A can be decomposed as A = B + H, where B is the tridiagonal part of A and H = FG is a rank two matrix, namely

    F = | 1   0 |        G = |     0        0   ...   0    -δ_0 |
        | 0   0 |            | -δ_{n-1}     0   ...   0      0  |
        | .   . |
        | .   . |
        | 0   0 |
        | 0   1 |

i.e. the columns of F are the first and the last column of the n × n identity matrix.
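
A small self-contained sketch of the resulting rank-two capacitance solve is given below (again an added illustration, not the paper's circuit); scipy.linalg.solve_banded plays the role of the tridiagonal solver used as module I in the following paragraph.

# Solve A x = b for a tridiagonal-plus-corners matrix via A = B + F G and formula (1.2):
# B is the tridiagonal part of A, F holds the first and last identity columns,
# and G holds the two corner entries, so the capacitance matrix C is only 2 x 2.
import numpy as np
from scipy.linalg import solve_banded

def capacitance_solve_periodic(A, b):
    n = A.shape[0]
    ab = np.zeros((3, n))                        # B in solve_banded's (1, 1) storage
    ab[0, 1:] = np.diag(A, 1)
    ab[1, :] = np.diag(A)
    ab[2, :-1] = np.diag(A, -1)
    F = np.zeros((n, 2)); F[0, 0] = 1.0; F[-1, 1] = 1.0
    G = np.zeros((2, n)); G[0, -1] = A[0, -1]; G[1, 0] = A[-1, 0]
    Z = solve_banded((1, 1), ab, F)              # B Z = F
    C = np.eye(2) + G @ Z                        # 2 x 2 capacitance matrix
    u = solve_banded((1, 1), ab, b)              # B u = b
    t = np.linalg.solve(C, G @ u)
    return u - solve_banded((1, 1), ab, F @ t)   # x = u - B^{-1} F t

# arbitrary diagonally dominant test matrix of the displayed form
rng = np.random.default_rng(2)
n = 64
delta, gam = 2 + rng.random(n), 1 + rng.random(n)
idx = np.arange(n)
A = np.zeros((n, n))
A[idx, idx] = 2 * delta + gam
A[idx, (idx + 1) % n] = -delta
A[idx, (idx - 1) % n] = -delta
b = rng.random(n)
x = capacitance_solve_periodic(A, b)
print(np.linalg.norm(A @ x - b))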

The capacitance matrix method can be applied by using an efficient solver for tridiagonal linear systems such as, for example, the odd-even reduction algorithm [13]. In [4] the VLSI implementation of the odd-even algorithm was given, with O(n log n) area and O(log n) time. Therefore, in this case, the VLSI circuit of Fig. 5 produces the bounds

    A = O(n log n),        T = O(log n).

Note that in the special case α(x) = a, β(x) = b, γ(x) = c the resulting linear system Ax = b has a circulant coefficient matrix. It is well known [6] that, if A is a circulant matrix, then

    A = Ω D Ω^H,        D = diag(d_1, d_2, ..., d_n),        (d_1, d_2, ..., d_n)^T = n^{1/2} Ω a,

where a^T is the first row of A and Ω is the Fourier matrix of order n. Therefore another appropriate algorithm seems to be the one resulting from this property. However, in a VLSI environment this latter algorithm turns out to be less efficient than the capacitance matrix method, since the VLSI implementation of the DFT of order n has the lower bound AT² = Ω(n² log² n).
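
For completeness, a short sketch of the FFT-based alternative (an added illustration with arbitrary constants, not from the paper): in the constant-coefficient case the displayed matrix is circulant, its eigenvalues are the DFT of its first column, and Ax = b can be solved with two FFTs.

# Solve the constant-coefficient (circulant) case by FFT diagonalization.
import numpy as np

n, delta, gam = 128, 1.0, 0.5                               # arbitrary illustrative constants
col = np.zeros(n)
col[0], col[1], col[-1] = 2 * delta + gam, -delta, -delta   # first column of the circulant A

b = np.sin(2 * np.pi * np.arange(n) / n)                    # arbitrary right-hand side
eig = np.fft.fft(col)                                       # eigenvalues: DFT of the first column
x = np.fft.ifft(np.fft.fft(b) / eig).real                   # x = A^{-1} b via the diagonalization

# check against the explicitly assembled circulant matrix
A = np.array([[col[(i - j) % n] for j in range(n)] for i in range(n)])
print(np.linalg.norm(A @ x - b))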

The algorithm shown in Section 2 can be efficiently used to obtain the numerical solution of the Helmholtz equation

    ∂²u(x, y)/∂x² + ∂²u(x, y)/∂y² + c u(x, y) = f(x, y)        (3.1)

on an irregular region Γ with Dirichlet or Neumann boundary conditions [2]. Let R be a rectangular region containing Γ; R is covered by a uniform grid of m × n points (with m = O(n)). Let B be the coefficient matrix, of order mn, of the linear system derived from the finite difference discretization of problem (3.1) on R. Let p be the number of "irregular" mesh points, that is, the mesh points in Γ which have at least one nearest-neighbor mesh point outside Γ. It is easy to see that if Γ is regular enough (e.g. convex) then p = O(n).


The matrix A, corresponding to the difference Helmholtz problem enlarged to R while handling the given boundary conditions on ∂Γ, differs from the matrix B only in the p rows corresponding to the irregular mesh points [11]. Let G denote the p × mn matrix that is the compact representation of B - A, obtained by deleting the zero rows corresponding to the regular mesh points, and let F be the mn × p matrix obtained by selecting from the identity matrix the p columns corresponding to the irregular mesh points. Then A = B + FG and the capacitance matrix method can be applied. In this case module I is the VLSI Poisson solver described in [5], with area A_B = O(n³ log² n) and time T_B = O(log n), so that the VLSI circuit of Fig. 5 produces the bounds

    A = O(n³ log² n),        T = O(n log n).
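
As a rough illustration of this setup (added here; the region, grid size and helper below are hypothetical and not from the paper), the following sketch marks the irregular mesh points of a disk-shaped Γ inside a rectangular grid and builds F from the corresponding identity columns; G would then consist of the corresponding p nonzero rows of B - A.

# Identify the p "irregular" mesh points (points of Gamma with a nearest
# neighbour outside Gamma) on an m x n grid and build the mn x p matrix F.
import numpy as np

def irregular_points(inside):
    """inside: boolean m x n array, True where the grid point lies in Gamma."""
    m, n = inside.shape
    irregular = []
    for i in range(m):
        for j in range(n):
            if not inside[i, j]:
                continue
            nbrs = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
            if any(not (0 <= a < m and 0 <= b < n) or not inside[a, b]
                   for a, b in nbrs):
                irregular.append(i * n + j)      # row-major index among the mn unknowns
    return np.array(irregular)

m = n = 40                                       # hypothetical grid
ii, jj = np.meshgrid(np.arange(m), np.arange(n), indexing="ij")
inside = (ii - m / 2) ** 2 + (jj - n / 2) ** 2 < (0.4 * n) ** 2   # disk-shaped Gamma
idx = irregular_points(inside)
p = idx.size                                     # p = O(n) for such a convex region
F = np.zeros((m * n, p))
F[idx, np.arange(p)] = 1.0                       # the p selected identity columns
print(p, F.shape)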

References

[1] Buzbee, B.L., F.W. Dorr, J.A. George and G.H. Golub, The direct solution of the discrete Poisson equation on irregular regions, SIAM J. Numer. Anal. 8 (1971) 722-736.
[2] Birkhoff, G. and R.E. Lynch, Numerical Solution of Elliptic Problems (SIAM, Philadelphia, 1984).
[3] Codenotti, B. and P. Favati, Area-time complexity of the unconstrained minimization problem, Calcolo 23 (1986) 175-184.
[4] Codenotti, B. and G. Lotti, A VLSI fast solver for tridiagonal linear systems, Inform. Process. Lett. 23 (1986) 111-114.
[5] Codenotti, B., G. Lotti and F. Romani, VLSI implementation of fast solvers for band linear systems with constant coefficient matrix, Inform. Process. Lett. 21 (1985) 159-163.
[6] Davis, P., Circulant Matrices (John Wiley, New York, 1979).
[7] Householder, A.S., The Theory of Matrices in Numerical Analysis (Blaisdell, New York, 1964).
[8] Kramer, M.R. and J. van Leeuwen, Systolic computation and VLSI, Foundations of Computer Science IV (1983) 75-103.
[9] Leighton, F.T., New lower bound techniques for VLSI, Proc. 22nd Ann. IEEE Symp. on Foundations of Computer Science, 1981, pp. 1-12.
[10] Preparata, F.P. and J.E. Vuillemin, Area-time optimal VLSI networks for multiplying matrices, Inform. Process. Lett. 11 (1980) 77-80.
[11] Proskurowski, W., Numerical solution of Helmholtz's equation by implicit capacitance matrix methods, ACM Trans. Math. Software 5 (1979) 36-49.
[12] Proskurowski, W. and O. Widlund, On the numerical solution of Helmholtz's equation by the capacitance matrix method, Math. Comp. 30 (1976) 433-468.
[13] Stone, H.S., Parallel tridiagonal equation solvers, ACM Trans. Math. Software 1 (1975) 289-307.
[14] Yip, E.L., A note on the stability of solving a rank-p modification of a linear system by the Sherman-Morrison-Woodbury formula, SIAM J. Sci. Stat. Comput. 7 (1986) 507-513.