Pergamon
PIPELINED
Computers % Swucfures Vol. 58. No. 4. pp. 689-701, 1996 Copyright 0 1995 Ekvier Science Ltd Printed in Great Britain. All rights reserved 0045~7949/96 $9.50 + 0.00
0045-7949(95)00162-X
ALGORITHM FOR NEWTON’S DIVIDED DIFFERENCE INTERPOLATION A. E. Al-Ayyoub
Computer Science Department, Bahrain University, P. 0. Box 32038, Isa Town, State of Bahrain (Received 27 October 1994)
Abstract-The problem of numerical interpolation is encountered in many engineering and scientific applications. Newton’s methods are commonly used to solve these problems. Generally, Newton’s algorithms are of 0(n2) sequential complexity. In this paper, Newton’s divided difference interpolation algorithm is reorganized to well-suite vector processing. The proposed algorithm has O(logn) vector complexity. It requries 2 log n + 4 vector instructions to set up the divided differences, and log n + 4 vector instructions to evaluate the interpolating polynomial at a given point.
1. INTRODUCTION
The interpolation problem can be summarized as follows: given a set of pairs of numbers (x0&), (x1 f, ) ,..., (xnf,), with all x0, x1 ,..., x, are different, the interpolation problem is to find a polynomial P,(x) such that P,(x,) =fO, P,,(x,) =fi ,..., P,,(x,,) =S, and has degree n or less. Such P,(x) is called an interpolating polynomial. The x,s are often called nodes. The corresponding f;s may be the values of some mathematical functionf(x), or empirically obtained in an experiment or observation. The polynomial P,,(x) is used to estimate values for all x such that P,,(x) is approximatelyf(x) or to get values for xs at which no measurement was taken. There are several methods to find P,(x) for a given data. Newton’s divided difference is a standard method that defines the interpolating polynomial in terms of divided differences. This interpolation polynomial can be written in the Newton form as follows
terms are the desired divided differences. Arrows represent computation dependency between terms. A target term cannot be computed until all associated source terms are available. For example the termf,,, cannot be computed before fO, and fi2. However, terms in the same column are independent and can be evaluated simultaneously. This defines a strightforward parallel algorithm for computing the divided differences which would require O(n) steps. Alternative formulation of the divided differences is to express the terms_&, as linear combinations of the given values x0, x2, . . . , x, and f0 ,fi, . . . J,, . For instance, the term folz can be defined using the recursion in eqn (2) as follows: f J
n,*
=sol-.fl2
“1‘
x0 -
=ti
x2
-fimll
-x,1
Ul:
x0-
VI
-.hm
h
P,(x) =.fo+h(x - xo)+fol*(x-x0)(x --Xl)
fl
=~~~-~,~~~~-~2~-~~,--xo~~~,-~2~
fl
the coefficients f0 ,...p@ = 0, 1, . . n) are called the divided differences. The interpolation polynomial is mostly defined once the divided differences are evaluated. Generally, any serial setting for the divided differences derivation would be of O(n’) sequential complexity. The divided differences can be recursively computed using Nevill’s method as follows [5]: J;,. .,+r=K . z+p-l -f;+, ,...,I+pl/(x,---Y+p).
-x2)
x2
h
The above recursion can be used to compute the terms f;.....,+p in the order shown in Fig. 1. Diagonal
h +(%-xd(x2--0)
+h
689
.fl
=~x~-xi~~x~-x2~+~x,-x~~~x,-x2~
=.h
(2)
h
-(X,-X2)(XD-X2)+(X,-X2)(XD-X2)
i
n
i
/
I-I, (Xl- Xi)
(x0-xi) +fl
I= 1.2
n
,=O.l
1.
(x2-xi)
.
690
A. E. Al-Ayyoub
+-- independent terms
Fig. 1. Order of computing the terms f;, The above analysis can be safely generalized to define the pth divided difference fO, p as a linear combination of the known values as shown below [I].
where Y,,~= xi - x,, i #j. This formulation of the divided differences provides the basis for the parallel solution of the interpolation problem. One should notice that redundant calculations have been introduced by the new definition for the divided differences in eqn (3). However, since computations in the new
,,+p.
definition are inherently parallel, the new formulation will still run faster in parallel computers. An independent study has shown that evaluating the interpolating polynomial in this manner will require O(log n) time on O(n2) processors [3]. Another study to map Newton’s algorithm onto the IBM 3090/18OS Vector Facility can be found in Ref. [6]. In this paper we will present an algorithm to evaluate the interpolating polynomial that requires O(log n) vector instructions on vector computers. The rest of the paper is organized as follows: in the next section the new algorithm is presented. Analytical and experimental results are discussed in Section 3. Conclusions are given in the last section.
ALGORITHM 1 NewtonDivided D@erence Interpolation: StraightforwardVectorization [ This algorithm computes T,,(d in O(n)vector instructions The total vector length is O(G) )
Step 1:
forp=l,
.. .. n
foralc
i=O, . ... n-p
A ...#kp = If& .... tip.1 endfor&
-fi+1, .,,, It ,l/hj -$+,I
endfor Step 2:
fmaiYi=l,
.,., n
q ‘=F&I endfoTaU Step 3:
foT i=Z, . . .. n xi’=& *xi’
Step 4:
fmaKi=l,
endfor . ... n
G’G’*fOI..i endforaU_ Step 5: Step 6:
T&j = f. +Sum_~duction(() Output( and Exjt
Fig. 2. Straightforward vectorization of Newton’s divided difference interpolation.
Pipelined algorithms for Newton’s divided difference interpolation 2. VECI’ORIZED
NEWTON INTERPOLATION ALGORITHMS
A straightforward vector setting of Newton’s divided difference algorithm is given in Fig. 2. For any fixedp,thesetoftermsf; ,,,,, i+,(i=O ,..., n--p)are independent and can be computed in a pipelined manner. Running this algorithm in a vector computer requires 2n vector subtractions and n vector divisions to find the divided differences. Evaluating the polynomial P,(X) requires one vector subtraction, one vector multiplication, one vector reduction, and n - 1 scalar multiplications. The algorithm in Fig. 2 requires O(n) scalar operations and O(n) vector instructions. A powerful vectorizing compiler would discover this setting. VS Fortran vectorizing compiler from IBM, for example, could discover vector operations in step 1, however, it is difficult for VS Fortran to recognize vector operations when evaluating the interpolating polynomial as given in eqn (1). The compiler would discover vector operations if eqn (1) is expressed as shown in steps 2-5 of Algorithm 1. The vector report listing generated by VS Fortran for the IBM 3090/1808 vector machine is shown in Fig. 3. Vector operations are contained in loops flagged with “VECT”. One serious shortcoming of the Algorithm 1 when compiled for vector machines such as the IBM 3090 is that vector instructions may operate on very short vectors. Although the algorithm has better time
REQUESTED OPTIONS Ls&lzuGEJEsTING 001 003 004
UNAN
006 007 008 009
+ I
RECR VECT
t +
010
) I
complexity, short vectors will always introduce a lot of startup overhead. As we have mentioned before, the formulation in eqn (3) provides the basis for better vectorization of Newton’s divided difference algorithm. The discussion of the proposed algorithm is easier started wtih an example. Figure 4(a) shows an expansion of eqn (3) for n = 4. The columns of each semi-triangular matrix are multiplied and the result is stored in the first column. These multiplications require two vector instructions. In the first vector instruction column pairs lst, 2nd and 3rd, 4th in each matrix are multiplied and the result is stored in the left column, as shown in Fig. 4(b). In the second vector instruction column, pairs lst, 3rd, in each matrix are multiplied and the result is stored in the first column as shown in Fig. 4(c). The results in Fig. 4(d) are produced after a vector division off; by the corresponding product (first column in the ith matrix). Finally, the result of adding the first column of each matrix will produce the desired divided differences. This is shown in Fig. 4e-g. Evaluating the polynomial P4 (x) requires multiplying columns of the matrix C shown in Fig. 5 and storing the result in the first column. The dot product of the resulting column and corresponding divided differences will be the desired estimate P.,(x). This evaluation process is shown in Fig. 5a-d. The algorithm in Fig. 6 (Algorithm 2) formalizes the process in the previous example. In step 1 a set of lower-triangular matrices All? n (I = 1, . . , n + 1)
VECTOR(REPORT(XL1S.T))
REAL*8 D(O:100,0:100),X(0:100), F(O:lOO), XB(O:lOO), PN, Xl EQUIVALENCE (D(O,O), F(0)) READ(S,*) N, Xl, H IF (N.GT.lOO) STOP DO 1=0, N READ(5,*) X(I), F(1)
002
005
(EXECUTE):
691
t--I_
DO P=l, N DO I=P, N D(I,P)=(D(I-1,P-1)-D(I,P-1))/o(I-P)-X(I))
011 012
VECT
t I
DO 1=1, N XB(I)=Xl-X(1-1)
013 014
RECR
t I
DO 1=2, N XB(I)=XB(I)*XB(I-1)
015 016 017 018 019 020 021
VECT
+ I
VECT
t I
DO I=l, N XB(I)=XB(I)*D(I,I) PN=F(O) DO I=l, N PN=PN+XB(I) WRITE(G,*) PN END
Fig.3. VS Fortran version 2.5 vector report listing.
A. E. Al-Ayyoub
692
= ML
+fMrlo
=fo/!b~!hu
+fr/YloY12
+f2/Yalbl
tm
=fa/!b1*lk79
+A /YaoY12Y13
+f2/YzoYzlYz3
fom4
= fo/!bl!hz~~
+fl~YaoY~zY~3Ylr +f2/YwY8lYz1Ya
S&J
+h/YmY31Y32 +h/Y3oY31Y32Y34
+h/Y4okoklY42Y43
(a;
= 2
M%l
+fi/Ylo +f1/rlo
=fo/!h
Pw
+fz/Yzo
=h/Ym
xl3
+fi/Ym
Y13
+A/Ya
Y2.3
+fJy30
Y32
=fo/3b1
!h
+h/Yla
Yl3
+A/Y.zo
yz.3
+_h/!ho
y32
+fi/y,o
(b)
= 22
~~
h/%1
+fi/Ylo
=h/!h
+f*/Ylo
=h/3tx
+fi/Ym
+fz/Ym
+f3/y3u
=f/!Ax
+fr/Ylo
+f2/yzo
+f3/Ym
+fz/Yzo
+fr/ko
@I
k= k= =
!A71
=
%l !hl 31
+YW fYW +YW
+YW
+Ym +YW +YzO
+Y30 +y40
+Y30 ((0
ii2= = P =
OLW
=
SJl = fl2 = =
Fm =
Yol
x71 L !hl
yol !kll Yol lb;
+YW +YW +YW +Yw
+yzxl +Y30
+Yw +Yw +Yw +Yw
Fig. 4. Vector operations required to find the divided differences for
n =
4.
are installed. These matrices are necessary to expand
eqn (3) and has the following format ...
Y/-l,1
Y/-I./-Z
Y/-Z,/
...
Y/-l,/-,
YI-2.1 ... Y/-2,2
...
Y/-l,/-2
Y/-l./
..
.‘.
. Yl-1.n
1
y4z
Pipelined algorithms for Newton’s divided difference interpolation
(4
693
60
Fig. 5. Vector operations required to evaluate P4(x). C,j = (x xj_ ,) and Pa(x) =fO + Srcm(Cl*Jl).
where y,,, = xi -x,, i #j. Easier to say, elements of these matrices are defined as follows: x/-i-x,-1 a!” 1, = Xi-I-X, i
if l,
One should notice that some rows in matrices A”’ . . , A’” + 1) are unnecessarily generated. Calculatibns in steps 2 and 3 consider these extra rows, however, these row will never affect the result since they are excluded when the terms are added up to form the divided differences in step 4. Step 2 in Algorithm 2 calculates the products aj~=IIj=,u$forl
vector instructions necessary for the dot product ’ ail(1)Yi=l,..,n. The redundant calculations performed by Algorithm 2 can be removed if simultaneous memory writes are allowed, or if computation and memory accesses can be overlapped. For example, the value y,, = x,, - xl need to be stored in memory locations A(1, 1, 1). . . , A(n, 1, I), simultaneously. If the underlying architecture supports such simultaneous stores, vectors to be processed will contain no redundant elements, hence average vector length will be reduced. Algorithm 3 in Fig. 7 assumes such hardware exists. The algorithm requires 2 log n + 4 vector instructions to compute the divided differences and logn + 4 vector instructions to evaluate the polynomial P,(x). Although the number of vector instructions has increased by a constant factor, length of vectors to be processed is greatly reduced. The algorithm uses the notation A(i :j) = (expression) to indicate that the value (expression) is to be stored in array elements A(i), A(i + l), . . . , A(j), simultaneously. cil
3. ANALYTICALAND EXPERIMENTALREWLTS
The three vector algorithms given in the previous section possess different degrees of parallelism. For instance, when computing the divided differences in Algorithm 1, parallelism is constrained by the recursion in Nevill’s definition, whereas computations of the divided differences in Algorithm 2 and Algorithm 3 are mostly parallel. Furthermore, Algorithm 1 requires O(n) vector instructions and total vector length of O(n*), while Algorithm 2 and Algorithm 3 require O(logn) vector instructions and total vector length of O(n3) and O(n’), respectively. Table 1 summarizes the interpolation activities carried by each algorithm and the number of vector instructions and the total vector length needed to perform these activities. As it can be seen from the table, Algorithm 2 and Algorithm 3 require lesser number of vector instructions. Theoretically speaking, Algorithms 1 and Algorithm 3 have the same order of vector length,
694
A. E. Al-Ayyoub
Newton Divi&d di&znce Interpolation: New VectorizationApproach { This vectorisedfortn ofNewton’s divided di$erence interpolationalgorithm requires 3 L&p+6 vector instructions to set and evaluate the interpolating polynomial TJaJ ) The total vector length is o(n3) )
ALGORITHM
(x&J, (a&, .... (&J
Input: output: Step 1:
Art
n-P fm somepositiveinttgcrm
WI0
[Installtk n+l triangular
matrices producedafter expandingequation
(3.). this would require a
single vector instruction]
fbralc l-1 to n+l j&I
i-l tol-1
fmaU j-i ton SI fi,i,r)-x$-1)-&-1)
cndforau Cndf~au fmaU i=Cton fonaU j-i ton J3&i,&o-l)-.&)
cndf0ral.z cndforall CllLf~au Step
2:
[Multiply tk colwnns of tk n+l matrices and store tk result in tk first column of tk corresponding matrix. This would require I!@ vector instructions]
fm k-1 to hgn faracCC-l ton+1 forall i-l tonbyZC fmaU j=i+2’l to n ~sli,i,r)-a~,i,r)*~~,i+2~.‘,[)
cndforacl cndforau tluifaralc endfor Step
3;
by each ekment in tk first column of tk i” matrix.This process Would
[Divide fi (i-o,l,...,n)
require ok vector iktruction]
forpcc I-l w n+l fora#j-1
ton
n~,l,r)-f(c-l)/~li,l,o
cndfolatl UrdfmaE Step 4:
[Add up first columu of each matrix. Tk vector which contains tk divided differences will be tk first column of the first matrix. This process requires l@+l vector instructions, l$u vector instruction to add up tk fist n colwnns, atui one more vector instruction to add tk last vector]
fbrQ1 w t&n fdf-1 tonby24 fmaU j-C+&l to n ~~,1,9-~~,l,C)+~li,l,(+24’)
dfd cndfaatl Fig. 6 (see caption opposite)
695
Pipelined algorithms for Newton’s divided difference interpolation
cndjii
fmd j-1 to n a~,l,l)=ag’,l,l)*~Il~)
cld_fomu Step
5:
[Evaluate the interpolating multiply tk columns of tk would require logn+l vector multiply tk columns of tk
polynomial TJd. First install tk lower trianguhu matrix C. then matrix by each other and store tk result in tk fust column. This instructions, one instruction to install tk matrix C and [ogn to installed matrix)
foracC i-l to n fond j-l to i C&i)-edi) endfarafl e~f~aU for &l to&n fontIT i-l tonby2k forall j-i+&’
ton
C&i)-C&i) l Co,i+Pk.t)
endforal.I CndforalT endfar Step
6:
[Multiply the first column of the matrix C by the corresponding would require one vector instruction.] foraU j-l
divided differences.
This
to n
c(j,l)-c(i,l)‘54o’,l,l) endforalc Step
7:
[Compute tk desired estimate Tn(d u&h
Step
8:
Output(Tn(ti)
ic equal to the sum of tk vector C@), j-l,...,n]
and a&
Fig. 6 (continued). Vectorized algorithm for Newton’s divided difference interpolation.
For a single pipeline per function
available Algorithm
1 will run faster than the other two algorithms. However, if multiple pipelines per function are available, vectors will be fed into different pipelines in the same time, hence, reducing the total vector length for Algorithm 2 and Algorithm 3. One way to estimate the execution time of a vector algorithm can be summarized as follows: given a set of v vector instructions with total vector length m and a set of np pipelines, the execution time T of this set of vector instructions can be expressed as follows: T = r,unup x 0 + (k,., x m)lnp,
(4)
where tmw and tmcrare the pipeline startup time and the time required by a single pipeline stage, respectively. Using this formula the execution times of
1, Algorithm 2 and Algorithm 3 can be estimated as follows:
Algorithm
T‘,/s‘Wkn1 = t“.““‘Ix (4n + 2) + (t,,c
x &t’++n
- I))/np
T~~i~~~ = rwup X (3 log n + 6) + (Lo x(n’+3nz+4n))/np T&with3 = LuP x (3 log n + g) +(L,
x ($’ + 6n - S))/np.
The startup time and the stage time are assumed to be the same for all pipelines. This assumption is made to simplify the analysis. The execution times of the
696
A. E. Al-Ayyoub
Divided Di$erence Interpolation: New VectorizationApproach ( This vectorizedform of Newton3 divided difference interpolationalgorithm requires 3 logn+S vector instructions to set and evaluate the interpolating polynomial T,,(d ‘The total vector length is O(tG) )
ALGORITHM
9 Navton
Input: output:
Step 1:
(G,fJ, (G,fJ .. .. (dJ
wfieren=Zmfor somepositiueintegerm
%(d
[hstaN the n+l triangular matrices produced ajier expandingequation(3), this would require a single vector instruction]
forufl L=l to n+l fma(I i=l to l-1 ~(i:n,i,Q=a$l)-7-l) endforall for& i=l to n a(i:n,i,~=@l)-Ir/il
endforaCl endfmaU Step 2:
[Multiply the columns of the n+l matrices and store the result in the first column of the corresponding matrix. This would require (ogn+l vector instructions] for
k=l to&n foralr C=l to n+l ford i=l to n by Zk ~(i+2h1+l:n,i,l)=~(i~2~1~l,i,~*~(i+2~”~l,i~2~~’,~ endfmall
endforrall endfor fmaU Gl to n+l foralc i=3 to n by 2
a(l,i,C)=a(l,i,n*~(i,i,tl
endfmall endforaU Step
3:
[Divide fi (i=O,l,...,n) by each element in
thefirst column of the ikhmatrix. This process would
require one vector instruction] forall C=l lo ml f&zU j-l
to n
~~,l,o=f~~~l~/a~,l,o
endforall endfor& Step 4:
[Add up first column of each matrix. The vector which contains the divided differences will be the first column of the first matrix. This process requires loan+1 vector instructions, Goan vector instruction to add up the first n columns, and one more vector instruction to add the last vector] f0r k-1 Lo I+ fomll
l-1 ton6y2k
fmall j-l+Zkl
to n
~li,l,c)~a(ill,l/+alill,l+zcl! endjball
Cl@f~atl Fig. 7 (see caption opposite)
pipelined algorithms for Newton’s divided difference interpolation
Step
5:
697
[Evaluate the interpolating polynomial T&I. First install the lower triangular matrix C, then multiply the columns of the matrix by each other and store the result in thefirst column. This would require l$qn+Z vector instructions, one instruction to install the matrix C and loan to multiply the columns of the installed matrix]
forall i-l ton C(l:i,i)=&)
endforaa: for l&=1to logn
fod
i=l tonby2k C(i+$1+1:n,i)=C(i+2k1+l,i)*C(i+2k1+l,i+2k1)
endfor& endfor forall j=3 to n by 2 C(Lj)=C(Lj)*CJj$ endforaU Step
6:
[Multiply the first column of the matrix C by the corresponding would require one vector i~truction.]
divided differences. This
forall j=l to n Co’,l)~co’,l)*5t~,l,1) cndjaafl Step
7:
[Compute the desired estimate Tn(d wlkh is c@
Step
8:
Output(!P&J) and E*Cit Fig. 7 (continued). Vector&d
to the sum of the vector Cli,l), j=l,...,n)
algorithm for Newton divided difference interpolation.
above algorithms are greatly affected by the ratio tstartup/tstage (denoted R) and np. For large np values the execution times of the algorithms will be dominated by the startup time. In this case Algorithm 1 will be of O(n) time complexity, while Algorithms 2 and 3 will be of O(log n) time complexity. For fixed and small np values the execution time depends on the ratio R. Since the two parameters R and np are machine dependent, Algorithm 1, Algorithm 2 and Algorithm 3 may show different performance ratios on different machines. Figure 8a-c shows execution time estimates of Algorithm 1, Algorithm 2 and Algorithm 3 for R = 4 and different np and n values. These plottings show that performances of Algorithm 2 and Algorithm 3 have great response to np values. For instance, at n = 100 performance of Algorithm 3 has upgraded by a speedup factor of 12 when np is switched from 6 to 100, whereas Algorithm 1 responded by a speedup factor of only two for the same switch. It is worth mentioning here that
parallelism in Algorithm 2 and Algorithm 3 is constrained by the available hardware. As it can be seen from Fig. 9 Algorithm 2 and Algorithm 3 are in favor of large R and np values. Execution times dramatically reduces as np increases. Such large np values are always possible as today’s parallel computers support multiple parallel constructs and can scale to include thousands of processing units. 4. CONCLUSIONS
In this paper three vector algorithms for Newton’s divided difference interpolation are proposed. Time complexities of these algorithms are also discussed. The proposed algorithms possess different degrees of parallelism. The amount of parallelism in the first algorithm is bounded by O(n). The other two algorithms reduce this bound to O(logn). The major concern of the last two algorithms is to break the recursion introduced by Nevill’s definition of the divided differences. Performance estimates provided
n
n-l n n
1
n-l
1
1
Evaluating the polpomial P.(x)
4n - 1 3/2n2 + 11/2n - 1
3/2n2 + 312~1
3n
Sub-total
n+2 4n + 2
n(n + I)/2
n
Sub-total Total
n(n + 1)
2n
Total vector length
Algorithm 1
Computing the divided differences
Number of vector instructions
Compute x-x,O
Compute the divided differences Compute the divided differences
Interpolation activity
n(n - 1)/2+n
logn + 1
n
1
n2 + 2n n3+3n2+4n
n
1
logn+3 3logn +6
n(n - I)/2
n(n - I)/2
n(n + 1)/2
log n
log n
1
n’+ 2n2 + 2n
n(n + 1)/2+n
1
210gn +3
n(n + l)(n - I)/2
n(n + l)(n + I)/2
Total vector length
Algorithm 2
log n
1
Number of vector instructions
Table 1. Summary of vector operations required by AlEoorithm 1, AI.gorithm 2 and Algorithm 3
Compute C,,=C,,*f,lCiCn Compute P.(x) = El,. , c,,
Install the matrix C Compute Cj = l-I&, c,, l
Install matrices A’Ol
Interpolation activity
Sub-total Total
Evaluating the polynomial P”(X)
Sub-total
Computing the divided differences
(n + I)@/2 - 1) n(n + I)/2 +n
1
n
1 I logn +4 31ogn+8
n/2-
1
I
91211- 2 7/2n2+6n -5
n
n-l
”
1/2n 2 i- 3/2n - 3
log ”
1
210gn +4
logn + 1 I)/2 + n
(?I + l)(n - 1)
log n
n(n-
n(n + 1)
1
1
Total vector length
Number of vector instructions
Algorithm 3
Table 1 (continued)
iG’Ign+i
P”(X) =‘zy, ,c,,
C,,=C,,*1;ldi
ldi
C,,= n;=,c,,
Install the matrix C Compute
Add columns of A[LlI,
lCi
Idi
up=rIj=,u~
Install matrices A’L’l GL
Interpolation activity
700
A. E. Al-Ayyoub np=6
np=I2 np=M np=24 np=30 llp=lOO
1
I
I
I
I
I
I
20
40
60
80
103
120
140
Problem Size (a) CPU time units required by Algorithm 1
np=6 np=12 np=18 n/1=24 np=30 npdoo
, I
I
I
0
20
40
I
Go
I
I
I
la0
140
I
la,
80
Problem Size (b) CPU time units required by Algorithm 2
0 0
I
.
I
I
I
.
20
40
60
80
103
la0
1 140
Problem Size (c) CPU time units required by Algorithm 3 Fig. 8. CPU time requirements of Algorithm 1, Algorithm 2 and Algorithm 3 for R = 4 and different np and n values.
Algmirhm
0
I
Al&thin 2 Algon’thm 3 , 0
I XQ
1 1000
I lxx,
1 lam
1
2500
nP Fig. 9. CPU time of Algorithm 1, Algorithm 2 and Algorithm 3 for R = 4, n = 128 and different np values.
Pipelined algorithms for Newton’s divided difference interpolation in the last section indicate that the proposed algor-
ithms can run much faster than any straightforward setting of Newton’s divided difference interpolation given that the target machine can scale to include sufficient processing units. REFERENCES 1. M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions. Drove (1965). 2. R. Burden, J. Douglas and A. Reynolds, Numerical Analysis. PWS Publishers (1981).
701
3. 0. Egecioglue, E. Gauopoulos, E. and C. Koc, Fast and practical parallel polynomial interpolation. CSRD Report
no. 646. (1987). 4. A. Macleod, A comparison of algorithm for polynomial interpolation. J. Comput. appl. Math. 8 (1982). 5. J. Mathews, Numerical MethodF for Computer Science, Engineering and Mathematics. Prentice-Hall, Englewood
Cliffs, NJ (1987). 6. A. Yazici and A. Al-Ayyoub, An organization of the interpolation problem for vector processing. In: Parallel Processing in Engineering Applications (Edited by R. A. Adey), pp. 171-182. Computational Mechanics Publications, Springer (1990).