A Fast Algorithm for Solving the Tensor Product Collocation Equations by WEIWEI SUN and N. G. ZAMANI
Department Canada
of Mathematics
and Statistics,
University of Windsor, Windsor, Ontario,
ABSTRACT :
A fast algorithm is presentedfor solving the tensor product collocation equations (A, @ B,+B, @ A,)u = b, obtained from the discretization of the Poisson equation in a rectangular region by the collocation method. The Fast Fourier Transformation (FFT) algorithm is employed to achieve the above objective. The operation count is shown to be O(N* log, N) which makes the overall calculations very economical.
I. Introduction Over the past twenty years, fast algorithms in terms of FFT have been used to solve the Poisson equation in a rectangular region (l-3). These algorithms are designed for a system of linear equations by the finite difference method. When the collocation method with the Hermite bicubic interpolation polynomials is applied to solving a Poisson equation in a rectangular region, the system of linear equations can be written in the tensor product form as (A, 0 BY+ B, @ A,)u = RHS.
(1.1)
Since the matrix A, @ BY + B, @ A, is nonsymmetric, standard iteration methods do not converge for such a system (4,5). Most investigators have used the Gauss elimination method for solving Eq. (1.1). Recently, Dyksen (6) has used a tensor product generalized AD1 method for separable elliptic problems to tackle the above equation. He has shown the operation count to be 328 N’ on an N x N grid. This is lower than the operation count 32 N4 of the Gauss elimination method. In this paper, we present a new algorithm. FFT algorithm is used to solve Eq. (1.1). Using a matrix transformation, the two-dimensional problem is reduced to a series of one-dimensional problems. Hence, only lower order matrices are solved in this algorithm and, therefore, the operation count is 32 N2 log, N for an N x N grid. In Section II, we present the algorithm by a similarity transformation Q and a series of simpler one-dimensional problems equivalent to (1.1) are given. In Section III, the eigenvectors of matrix B; ‘A, are found explicitly and matrix Q is constructed. Section IV contains a computational procedure in which FFT algorithm is used and finally, in Section V, a numerical example is considered.
0 The Franklin lnstltute 00164032/89
%3.00+0.00
295
Weiwei Sun and N. G. Zamani II. The Algorithm For the sake of convenience, we consider a Poisson boundary conditions in a rectangular region R, i.e.
equation
with Dirichlet
(X,Y> Efl
v2u = f(X,Y) 4an = 9. Using the Hermite
bicubic
(2.1)
interpolation,
u is approximated
as
W%Y) = : % ~,jri(x)r,(Y)> i= 1 j=]
where ti(x) and u],(y) are one-dimensional Hermite cubic interpolation The collocation method forces U(x, v) to satisfy (2.1) at the Gaussian result is Eq. (1 .l), i.e.
functions. points. The
(A, 0 By +B, 0 A,)u = RHS,
(2.2)
where A 0 B denotes the tensor product (Kronecker product) of A and B. Many properties of tensor products are given in Halmos (7). It has been proven by Dyksen (6) that the eigenvalues of matrix BJ: ‘A, are real, negative and distinct. Hence, there exists a non-singular matrix Q, such that B>:‘A, = Q- ‘AQ where the column vectors of matrix Q are eigenvectors of matrix B,: ‘A,, A is a diagonal matrix, A = diag (A,,, I,,, . , A,,- ,. AN, A_ ,, . . . , A-,, ,) and lli are eigenvalues of B,; ‘A,. . Multiplying both sides of (2.2) by Z@ Q- ’ Bj: ‘, results in (A, 0 I+ B, @ A)21 = RHS,
(2.3)
U = (Z@ Q-‘)u
(2.4)
where
RHS = (Z@ Q-‘B,-‘)RHS. After interchanging
certain
rows and columns
(2.5)
u* = 1
of (2.3), it can be rewritten
A,OI+B,O& A,OI+B,Ol,
as
RHS*.
A,OZ+B,OA-N+,
I
(2.6)
Here u*=B.u
(2.7)
RHS* = R.RHS, and R is the products 296
of some permutation
(2.8)
matrices. Journal
of the Frankhn Pergamon
lnst~tute Press plc
Solving Tensor Product Collocation Thus problem (2.1) is reduced to solving Eq. (2.6), and therefore, of lower order problems, i.e. (A,O1+B,O/Zi)u,*=RHS,*, III. The Eigenvectors
of Matrix
i=O,fl,...,
+(N-l),N.
Equations
solving a series
(2.9)
B; ‘A,
The algorithm in Section II relies on the explicit knowledge of eigenvalues and eigenvectors of matrix B.,: ‘A,, i.e. the solution of a generalized matrix eigenvalue problem A,,v = l_Byv. In Dyksen’s paper (6), the eigenvalues are shown to be
(3.1) where p = h4[(16r4-16r2+3)p-8r2+2] v = h’[(-128r2+48)p+48] w = 192~ and p = tan’ (lhn/2), In this section, we derive a formula problem of differential equation,
1 z = __ 24’
for eigenvectors.
Let us consider
an eigenvalue
p”(X) = &?-p(x) x E (0, L) p(0) = p(L) = 0.
(3.2)
When the collocation method with the Hermite cubic functions (3.2), we obtain a generalized matrix eigenvalue problem (8) : Au = IuBv.
is applied
to
(3.3)
Let p be the Hermite cubic collocation approximation of the eigenvalue problem (3.2) corresponding to eigenvalue 2 and h = L/N, p can be expressed as a piecewise polynomial : pk(x) =a,thk(x-x,)+ck(X-:)2 defined Then
+dkelx,
in [xk, xki ,I, where Xk is midpoint
Vol. 326, No. 2, pp. 29S307, Pnnted in Great Britain
k=O,l,...,N-1, of interval
(3.4)
(xk, xki ,) and xk is node.
1989
297
Weiwei Sun and N. G. Zamani
PkMl) = w*k+
19
PkcT&)= (Bv)*k+2, where xt, and x& are two Gaussian points is [xk, xk+ ,I. Let U, be eigenvector of problem (3.3) corresponding to eigenvalue P=
(Bv,,Bv
,,...,
Bv,_,,Bv,,Bv_
,,...,
A, and
Bv_,+,),
i.e. P=BQ.
(3.5)
Now, we find pp(x). First, let pk(x) satisfy Eq. (3.2) at Gaussian We obtain 2
2
ck+dkTh = ;1 akibkzh+ckT ( Adding
and subtracting ck =
fdk6
point Xkfzh.
r3h3 (3.6)
)
(3.6), gives dk = A(,,,?),
,(,+bkF),
and therefore, ak=
=
bk=
(i-T)dk.
(3.7) into (3.4), pk can be rewritten
Substituting Pk(X)
(k--T)ck,
(3.7)
as
+ (x-22kJ2 )ck+((;-~).(x-%k)+(X-;)3)dk.
;-T
(3.8) By the continuity and Pi(xk+ I) = pi+
conditions ,@k+
1),
we
ofp and p’ at node xk+ ,, i.e. pk(xk+ ,) = pk+ ,(xk+ ,) obtain
r(-ck+Ck+l)
ck+ck+l
=
=
(3.9)
@k+dk+,),
(3.10)
+dk+dk+,),
where 1 z2h2 r=n-2+8,
h2
s=
1 z2h2 h h3 --6 2+48 ( I. )
and
t=--
From (3.9) and (3.10), one can arrive at rt(&,-2dk+dk+,) rt(ck_1_2ck+ck+I)
=s(dk_,+2dk+dk+,)
(3.11)
=s(ck_I+2ck+ck+I),
(3.12)
i.e. 298
Journal of the Franklin Institute Pergamon Press plc
Solving Tensor Product Collocation Equations (rt-s)dk_,--2(rt+s)dk+(rt-s)dk+,
=O,
(3.13)
(rt-s)ck_,
= 0.
(3.14)
and -2(rt+s)c,+(rt-s)ck+,
Hence, ck and dksatisfy the same difference equation. Since r, t and s are independent of k, (3.13) and (3.14) are two difference equations with constant coefficients. The general solution of (3.13) and (3.14) can be expressed by ck = acosk0+fisinkt),
(3.15)
dk = ycoske+6sinktI
(3.16)
and
where rt+s case = ~ rt-s’
sindz2*
rt-s
’
(3.17)
and a, p, y and 6, are independent of k and only dependent on eigenvalue il. In order to determine ~1,B, y and 6, we now consider the boundary conditions. Since p(O) = p(L) = 0, one can write
i.e. p,(O) = rcO-sd,
= 0.
(3.18)
Similarly, P,,-
, (L)
= rc,,- , +sd,_
, = 0
(3.19)
and from (3.9) and (3.10), we have = s(d,+d,)
r(-cO+cJ
(3.20)
and co+c, which are simplified
(3.21)
= t(-d,+d,),
to tr(-co+c,)-s(co+c,)
= 2std,
(3.22)
and 2rco = rt(-d,+d,)-s(do+d,). The coefficients OC,jI, y and 6 are determined by the boundary conditions (3.19), (3.22) and (3.23). Therefore, by (3.15) and (3.16), we obtain Vol. 326, No. 2, pp. 295-307, Printed in Great Britam
(3.23) (3.18),
1989
299
Weiwei Sun and N. G. Zamani cg = a
do =
Y
c, = acos6+/?sin6 d, = ycos8+6sin8 c,,_ r = CIcos (N-
1)6+/I sin (N-
l)d
and d,,_, = ycos(N-1)8+6sin(N-1)8. Substituting m-q
(3.24)
(3.24) into (3.18), (3.19), (3.22) and (3.23) respectively,
we have
= 0,
~[~c0s(N-l)~+psin(N-i)~]+~[~c0s(N-l)8+6sin(N-1)8] -(tr+s)a+(tr--)(acos@+fisin8)
=0,
= 2s~
and 2ra = -(rt+s)y+(rt-.s)(ycos8+6sinQ, and from these formulae,
(3.25)
the following
p -
are obtained
:
--J-try r
and (&jsrry rt
(3.26)
’
which satisfy y 2scos(N-
1>e+ -&z-
Since p(x) is an eigenfunction,
e)sin(N-l)Q]
= 0.
(3.27)
y # 0. Then by (3.17), we have
2~~0s (N- l)e+2sc0t
8sin (N-
i)e = 0,
i.e. sin NB = 0, and therefore 300
Journal of the Franklm Institute Pergamon Press plc
Solving Tensor Product Collocation
B,=$
z=o,+1,...,
&-(N-l),N,
Equations
(3.28)
satisfying rtsin’i
8
+~cos*~
6 = 0.
Equation (3.29) is just (18) in (6) and the eigenvalues of problem determined by solving (3.29). Thus the solution of Eqs (3.11) and (3.12) can be expressed by ck =
scoskO+ r
-JsinkB r
y
(3.3) can be
(3.30)
>
and (3.31) By (3.29), we have s
=
Substituting
2’
t.tan-
r
8
2’
into (3.30) and (3.31) results in c _ k-
fsin(k+i)e
_t.tan
2 dk =
Substituting given by PkcX)
J-““.=
_t.tan*!
r
cos@
cos(k+:)O
1
T2h2
j-T+
(x-.&)2 2
(3.33)
Y.
0 cos ~ 2
(3.32) and (3.33) into (3.8), the piecewise
=
(3.32)
Y
eigenfunction
p(x) can be
)e,.((;-$+,)+v’),,
L
(3.34) therefore
Pkki!,)
=
;
(ck
-
thd,)
(3.35)
and Pk(Xk*2)= Vol.326,No.2.pp.295-307.1989 Printed in Great Britain
; (Ck+ z&z).
(3.36)
301
Weiwei Sun and N. G. Zamani Since p(x) is an eigenfunction and for eigenvalue AI, let
and y is an arbitrary
constant
for k, let y = A cos2 40
(3.37)
From
(3.35) and (3.36), we get P!$ =pk(xf,)
= -t,sin:sin(k+i)tl,rhcos:cos(k+i)0,
(3.38)
and #J
= pk(x&) = - tl sin ; sin (k + 4) O1+ zh cos z cos (k + 2) 8,.
Let p=
PV)P'1"...Pi,, p$Z) p\2’ . . . Pg:,
Pjy) Pcl), . . . P!!)N+, P$‘Pcz),
Hence, by (3.5), matrix P can be determined
. . . P”’ N+I
1
(3.39)
.
by
P = R,p, where R, is the product
of some permutation
(3.40) matrices,
and Q is expressed
as
Q = B- ‘RIP.
(3.41)
IV. Use of FFT Algorithm In Section III, Matrix Q has been constructed. If we use the Gauss elimination method to solve Eq. (2.5), the algorithm shown in Section III can be achieved with operation count 18: N3 less than those of the tensor product generalized AD1 method and Gauss elimination method. But our purpose is to find an algorithm with lower order operation count rather than 0(N3). In this section, we shall modify the expression of matrix Q shown in Section III such that FFT algorithm can be applied to (2.4) and (2.5) and the operation count is only 32 N2 log, N. We consider matrix P. By (3.38), we obtain pi;’ -pjj’
= - 2zh cos f! cos (k + 4) 01, 2
(4.1)
0, . p&j)+ph:) = - 2t, sin - sm (k + 4) 0,. 2
(4.2)
Let
302
Journal
of the Franklin Pergamon
Institute Press plc
Solving Tensor Product Collocation
Equations
Q, H&f) = 2 sin - sin (k + 1) el. 2
HI,” = - 2 cos f! cos (k + ‘2)or, 2 For I # 0, 1 # N, we obtain H#) = -cot;
[sin (k + 1)8, -sin
k&l,
(4.3)
Hj$ = -tan:[sin(k+l)B,+sinkB,],
(4.4)
and furthermore jcO~j,j) = -cot:sin(k+l)A,
(4.5)
and iiO(-l)‘P’@;)=
-tan~sin(k+l)8,.
(4.6)
By (4.3) and (4.4), we have, for 1 # 0, 1 # N, Hi:’. = Hi2:,
Hi:’ = Hi”,I7
(4.7)
and Hj$=
-2 3 H&j=0
Hi? = 0,
H#
= -2sin(k+i)z
For the sake of convenience of numerical computation, operations as some matrix transformations. By (4.1)-(4.4), HD = 7. P,
(4.8) we express we obtain
the above
(4.9)
where
where D,,
= diag(-rh,
-zh ,...,
D,2 = diag(0,
-th,.
D2, = diag(0,
-t,,
D22 = diag(-tt,,
-Th),
. . , -zh), -t2 ,..., -t_
--tN_,),
,,.. ., -t_N+l),
and H(l) = (H’!‘),.,, lJ Similarly,
H’*’ = (Hj;))NxN.
by (4.5) and (4.6), we obtain
Vol. 326, No. 2, pp. 295-307, Printed in Great Britain
1989
303
Weiwei Sun and N. G. Zamani
EH = H,y [
D, 01 0 D,
(4.10)
where
-1 0, --anI
D2 =
..
where Z, = (2,4,. . . ,2(N- I))‘, Z2 = (2, -4,. . . , (- 1)N-22(Nz2 = (- l)N-’ 2N and Fs is Fourier sine transformation, i.e. sin8,
sin8,
...
sin Bx_,
sin 28,
sin282
...
sin 2dN_,
l))‘,
zI = 2N,
F, = [ sin (N-i l)O, Thus, a new expression
sin (N-
l)O,
...
sin (N-
1)8,,_,
of matrix P can be given by P = R,I-‘E-‘H,&
(4.11)
where 6=
and matrix
304
Q is also rewritten
DI [ 0
0 D, 02 1
as Journal of the Franklin Institute Pergamon Press
plc
Solving Tensor Product Collocation
(4.12)
Q = B-‘R,i-‘E-‘H,D.
Since F, is a Fourier sine transformation, algorithm proceeds as follows :
Equations
FFT algorithm
can be used. The overall
Step 1
Calculate
and
RHS = (Z@ Q- ‘B_; ‘)RHS
by
1.1
solve R;‘iEwj”
1.2
solve H,Tw\2)= wj*)
1.3
solve bwi3) = wj*),
RHS = wc3),
= RHS,
i=
1,1,...,2N
RHS* = RRHS.
Step 2
Solve equation (A, @ 1+ B, 0 A&,+ = RHS:, and U = R- ‘IA*, where h, are calculated
i=
1,2,...,2N
by (3.1).
Step 3
Calculate
u = (I @ Q) U by 3.1
calculate
zt ‘) = bUi
3.2
calculate
zj” = Hszj’)
3.3
calculate
zj3) = RF ‘iEzj*)
3.4
solve equation
Byzi4) = zj”.
Finally, we obtain ui = zf4), i = 1,2,. . . ,2N. In the above computational procedure, Step 1.2 and Step 3.2 can be achieved by FFT algorithm (1, 2). TABLE I
Comparison in operation count and storage
Method BGl (TP ordering) BG2 (SP ordering) TPGADI FIG* FFT
Operation count
Storage
128 N4 32 N4 328 N3 18: N’ 32 N2 log, N
48 N3
24 N3 12 N2 8N2 8NZ
* Gauss elimination method is applied to solving Eq. (2.4) and calculating (2.5) rather than FFT. From the theoretical results in Table I, FFT is fastest. Vol. 326, No. 2, pp. 295-307, Prmted in Great Britain
1989
305
Weiwei
Sun and N. G. Zamani TABLE II
Comparison in CPU for FFT, FIG and BGl
N = 2/h
CPU
Number of unknowns
FFT
FIG
64 256 576 1296
0.106 0.505 1.136 3.149
0.089 0.493 1.476 4.846
4
8 12 18
time (s) BGl (TP ordering) 0.589 9.239 41.866
Now we consider the performance of this algorithm. The operation count and storage of this algorithm are presented in Table I, and are compared with band Gauss elimination method with partial pivoting, band Gauss elimination method by using some special order of the collocation point (5, 9, lo), and tensor product generalized AD1 methods. V. Numerical Result and Conclusion In order to test our algorithm, we consider a simple example. Here a Poisson equation in region R = {(x, y) IO d x d 2,0 < y ,< 2) is solved by the algorithm in which S is chosen such that u(x, y) = x(2 - x)y(2 - y). The band Gauss elimination method with partial pivoting [BGI (TP ordering)] is also applied to this problem. The results are summarized in Table II. These numerical results were computed on an IBM-438 1. We have derived an algorithm of O(N* log, N) operation which employs FFT algorithm to solve a tensor product collocation equation for the Poisson problem in a rectangular region. We have shown that the operation count of this algorithm is less than any previously published results and less CPU time is needed. Because FFT algorithm is employed, the algorithm is stable. hence it can be used to solve some large obtained.
problems.
For
some
other
boundary
condition,
similar
results
can be
References (1) F. Door, “The direct solution of the discrete Poisson equation on a rectangular”, SIAM Rev., Vol. 12, pp. 248-263, 1970. (2) R. W. Hackney, “A fast direct solution of Poisson’s equation using Fourier analysis”, J. Assoc. Comp. Math., Vol. 12, pp. 95-113, 1965. (3) P. N. Swartztrauber, “The method of Cyclic reduction, Fourier analysis, and the FACR algorithm for the discrete solution of Poisson’s equation on a rectangle”, SIAM Review, Vol. 19, pp. 490-501, 1977. “On the iteration solution of (4) R. Balart, E. N. Houstis and T. S. Papatheodorou, 10th IMACS Congress, pp. 988100. collocation method equations”, (5) W. R. Dyksen and J. R. Rice, “A new ordering scheme for the Hermite bicubic collocation equations”, in “Elliptic Problem Solvers III” (Edited by G. Birkhoff and A. Schoenstadt), pp. 467480, Academic Press, New York, 1984.
306
Journalof the
Franklin lnstrtute Pergamon Press plc
Solving Tensor Product Collocation (6) W. R. Dyksen, “Tensor product generalized ADI methods for separable blems”, SIAM J. Numer. Anal., Vol. 24, pp. 59-76, 1987.
Equations elliptic pro-
(7) P. R. Halmos, “Finite-Dimensional Vector Spaces”, 2nd edn, D. Van Nostrand Company, Inc., Princeton, N.J., 1958. “A collection based finite element method for eigenvalue cal(8) N. G. Zamani, culations”, J. Franklin Inst., Vol. 324, No. 2, pp. 2055217, 1987. (9) W. R. Dyksen, R. E. Lynch, J. R. Rice and E. N. Houstis, “The performance of the collocation and Galerkin methods with Hermite bi-cubic?, SIAM J. Numer. Anal., Vol. 21, pp. 695-715, 1984. (10)P. M. Prenter and R. D. Russell, “Orthogonal collocation for elliptic partial differential equations”, SIAM J. Numer. Anal., Vol. 13, pp. 923-939, 1976.
Vol. 326, No. 2, pp. 295-307, Printed in Great Britam
1989
307