Pipelined algorithm for Newton's divided difference interpolation

Pipelined algorithm for Newton's divided difference interpolation

Pergamon PIPELINED Computers % Swucfures Vol. 58. No. 4. pp. 689-701, 1996 Copyright 0 1995 Ekvier Science Ltd Printed in Great Britain. All rights ...

783KB Sizes 0 Downloads 36 Views

Pergamon

PIPELINED

Computers % Swucfures Vol. 58. No. 4. pp. 689-701, 1996 Copyright 0 1995 Ekvier Science Ltd Printed in Great Britain. All rights reserved 0045~7949/96 $9.50 + 0.00

0045-7949(95)00162-X

ALGORITHM FOR NEWTON’S DIVIDED DIFFERENCE INTERPOLATION A. E. Al-Ayyoub

Computer Science Department, Bahrain University, P. 0. Box 32038, Isa Town, State of Bahrain (Received 27 October 1994)

Abstract-The problem of numerical interpolation is encountered in many engineering and scientific applications. Newton’s methods are commonly used to solve these problems. Generally, Newton’s algorithms are of 0(n2) sequential complexity. In this paper, Newton’s divided difference interpolation algorithm is reorganized to well-suite vector processing. The proposed algorithm has O(logn) vector complexity. It requries 2 log n + 4 vector instructions to set up the divided differences, and log n + 4 vector instructions to evaluate the interpolating polynomial at a given point.

1. INTRODUCTION

The interpolation problem can be summarized as follows: given a set of pairs of numbers (x0&), (x1 f, ) ,..., (xnf,), with all x0, x1 ,..., x, are different, the interpolation problem is to find a polynomial P,(x) such that P,(x,) =fO, P,,(x,) =fi ,..., P,,(x,,) =S, and has degree n or less. Such P,(x) is called an interpolating polynomial. The x,s are often called nodes. The corresponding f;s may be the values of some mathematical functionf(x), or empirically obtained in an experiment or observation. The polynomial P,,(x) is used to estimate values for all x such that P,,(x) is approximatelyf(x) or to get values for xs at which no measurement was taken. There are several methods to find P,(x) for a given data. Newton’s divided difference is a standard method that defines the interpolating polynomial in terms of divided differences. This interpolation polynomial can be written in the Newton form as follows

terms are the desired divided differences. Arrows represent computation dependency between terms. A target term cannot be computed until all associated source terms are available. For example the termf,,, cannot be computed before fO, and fi2. However, terms in the same column are independent and can be evaluated simultaneously. This defines a strightforward parallel algorithm for computing the divided differences which would require O(n) steps. Alternative formulation of the divided differences is to express the terms_&, as linear combinations of the given values x0, x2, . . . , x, and f0 ,fi, . . . J,, . For instance, the term folz can be defined using the recursion in eqn (2) as follows: f J

n,*

=sol-.fl2

“1‘

x0 -

=ti

x2

-fimll

-x,1

Ul:

x0-

VI

-.hm

h

P,(x) =.fo+h(x - xo)+fol*(x-x0)(x --Xl)

fl

=~~~-~,~~~~-~2~-~~,--xo~~~,-~2~

fl

the coefficients f0 ,...p@ = 0, 1, . . n) are called the divided differences. The interpolation polynomial is mostly defined once the divided differences are evaluated. Generally, any serial setting for the divided differences derivation would be of O(n’) sequential complexity. The divided differences can be recursively computed using Nevill’s method as follows [5]: J;,. .,+r=K . z+p-l -f;+, ,...,I+pl/(x,---Y+p).

-x2)

x2

h

The above recursion can be used to compute the terms f;.....,+p in the order shown in Fig. 1. Diagonal

h +(%-xd(x2--0)

+h

689

.fl

=~x~-xi~~x~-x2~+~x,-x~~~x,-x2~

=.h

(2)

h

-(X,-X2)(XD-X2)+(X,-X2)(XD-X2)

i

n

i

/

I-I, (Xl- Xi)

(x0-xi) +fl

I= 1.2

n

,=O.l

1.

(x2-xi)

.

690

A. E. Al-Ayyoub

+-- independent terms

Fig. 1. Order of computing the terms f;, The above analysis can be safely generalized to define the pth divided difference fO, p as a linear combination of the known values as shown below [I].

where Y,,~= xi - x,, i #j. This formulation of the divided differences provides the basis for the parallel solution of the interpolation problem. One should notice that redundant calculations have been introduced by the new definition for the divided differences in eqn (3). However, since computations in the new

,,+p.

definition are inherently parallel, the new formulation will still run faster in parallel computers. An independent study has shown that evaluating the interpolating polynomial in this manner will require O(log n) time on O(n2) processors [3]. Another study to map Newton’s algorithm onto the IBM 3090/18OS Vector Facility can be found in Ref. [6]. In this paper we will present an algorithm to evaluate the interpolating polynomial that requires O(log n) vector instructions on vector computers. The rest of the paper is organized as follows: in the next section the new algorithm is presented. Analytical and experimental results are discussed in Section 3. Conclusions are given in the last section.

ALGORITHM 1 NewtonDivided D@erence Interpolation: StraightforwardVectorization [ This algorithm computes T,,(d in O(n)vector instructions The total vector length is O(G) )

Step 1:

forp=l,

.. .. n

foralc

i=O, . ... n-p

A ...#kp = If& .... tip.1 endfor&

-fi+1, .,,, It ,l/hj -$+,I

endfor Step 2:

fmaiYi=l,

.,., n

q ‘=F&I endfoTaU Step 3:

foT i=Z, . . .. n xi’=& *xi’

Step 4:

fmaKi=l,

endfor . ... n

G’G’*fOI..i endforaU_ Step 5: Step 6:

T&j = f. +Sum_~duction(() Output( and Exjt

Fig. 2. Straightforward vectorization of Newton’s divided difference interpolation.

Pipelined algorithms for Newton’s divided difference interpolation 2. VECI’ORIZED

NEWTON INTERPOLATION ALGORITHMS

A straightforward vector setting of Newton’s divided difference algorithm is given in Fig. 2. For any fixedp,thesetoftermsf; ,,,,, i+,(i=O ,..., n--p)are independent and can be computed in a pipelined manner. Running this algorithm in a vector computer requires 2n vector subtractions and n vector divisions to find the divided differences. Evaluating the polynomial P,(X) requires one vector subtraction, one vector multiplication, one vector reduction, and n - 1 scalar multiplications. The algorithm in Fig. 2 requires O(n) scalar operations and O(n) vector instructions. A powerful vectorizing compiler would discover this setting. VS Fortran vectorizing compiler from IBM, for example, could discover vector operations in step 1, however, it is difficult for VS Fortran to recognize vector operations when evaluating the interpolating polynomial as given in eqn (1). The compiler would discover vector operations if eqn (1) is expressed as shown in steps 2-5 of Algorithm 1. The vector report listing generated by VS Fortran for the IBM 3090/1808 vector machine is shown in Fig. 3. Vector operations are contained in loops flagged with “VECT”. One serious shortcoming of the Algorithm 1 when compiled for vector machines such as the IBM 3090 is that vector instructions may operate on very short vectors. Although the algorithm has better time

REQUESTED OPTIONS Ls&lzuGEJEsTING 001 003 004

UNAN

006 007 008 009

+ I

RECR VECT

t +

010

) I

complexity, short vectors will always introduce a lot of startup overhead. As we have mentioned before, the formulation in eqn (3) provides the basis for better vectorization of Newton’s divided difference algorithm. The discussion of the proposed algorithm is easier started wtih an example. Figure 4(a) shows an expansion of eqn (3) for n = 4. The columns of each semi-triangular matrix are multiplied and the result is stored in the first column. These multiplications require two vector instructions. In the first vector instruction column pairs lst, 2nd and 3rd, 4th in each matrix are multiplied and the result is stored in the left column, as shown in Fig. 4(b). In the second vector instruction column, pairs lst, 3rd, in each matrix are multiplied and the result is stored in the first column as shown in Fig. 4(c). The results in Fig. 4(d) are produced after a vector division off; by the corresponding product (first column in the ith matrix). Finally, the result of adding the first column of each matrix will produce the desired divided differences. This is shown in Fig. 4e-g. Evaluating the polynomial P4 (x) requires multiplying columns of the matrix C shown in Fig. 5 and storing the result in the first column. The dot product of the resulting column and corresponding divided differences will be the desired estimate P.,(x). This evaluation process is shown in Fig. 5a-d. The algorithm in Fig. 6 (Algorithm 2) formalizes the process in the previous example. In step 1 a set of lower-triangular matrices All? n (I = 1, . . , n + 1)

VECTOR(REPORT(XL1S.T))

REAL*8 D(O:100,0:100),X(0:100), F(O:lOO), XB(O:lOO), PN, Xl EQUIVALENCE (D(O,O), F(0)) READ(S,*) N, Xl, H IF (N.GT.lOO) STOP DO 1=0, N READ(5,*) X(I), F(1)

002

005

(EXECUTE):

691

t--I_

DO P=l, N DO I=P, N D(I,P)=(D(I-1,P-1)-D(I,P-1))/o(I-P)-X(I))

011 012

VECT

t I

DO 1=1, N XB(I)=Xl-X(1-1)

013 014

RECR

t I

DO 1=2, N XB(I)=XB(I)*XB(I-1)

015 016 017 018 019 020 021

VECT

+ I

VECT

t I

DO I=l, N XB(I)=XB(I)*D(I,I) PN=F(O) DO I=l, N PN=PN+XB(I) WRITE(G,*) PN END

Fig.3. VS Fortran version 2.5 vector report listing.

A. E. Al-Ayyoub

692

= ML

+fMrlo

=fo/!b~!hu

+fr/YloY12

+f2/Yalbl

tm

=fa/!b1*lk79

+A /YaoY12Y13

+f2/YzoYzlYz3

fom4

= fo/!bl!hz~~

+fl~YaoY~zY~3Ylr +f2/YwY8lYz1Ya

S&J

+h/YmY31Y32 +h/Y3oY31Y32Y34

+h/Y4okoklY42Y43

(a;

= 2

M%l

+fi/Ylo +f1/rlo

=fo/!h

Pw

+fz/Yzo

=h/Ym

xl3

+fi/Ym

Y13

+A/Ya

Y2.3

+fJy30

Y32

=fo/3b1

!h

+h/Yla

Yl3

+A/Y.zo

yz.3

+_h/!ho

y32

+fi/y,o

(b)

= 22

~~

h/%1

+fi/Ylo

=h/!h

+f*/Ylo

=h/3tx

+fi/Ym

+fz/Ym

+f3/y3u

=f/!Ax

+fr/Ylo

+f2/yzo

+f3/Ym

+fz/Yzo

+fr/ko

@I

k= k= =

!A71

=

%l !hl 31

+YW fYW +YW

+YW

+Ym +YW +YzO

+Y30 +y40

+Y30 ((0

ii2= = P =

OLW

=

SJl = fl2 = =

Fm =

Yol

x71 L !hl

yol !kll Yol lb;

+YW +YW +YW +Yw

+yzxl +Y30

+Yw +Yw +Yw +Yw

Fig. 4. Vector operations required to find the divided differences for

n =

4.

are installed. These matrices are necessary to expand

eqn (3) and has the following format ...

Y/-l,1

Y/-I./-Z

Y/-Z,/

...

Y/-l,/-,

YI-2.1 ... Y/-2,2

...

Y/-l,/-2

Y/-l./

..

.‘.

. Yl-1.n

1

y4z

Pipelined algorithms for Newton’s divided difference interpolation

(4

693

60

Fig. 5. Vector operations required to evaluate P4(x). C,j = (x xj_ ,) and Pa(x) =fO + Srcm(Cl*Jl).

where y,,, = xi -x,, i #j. Easier to say, elements of these matrices are defined as follows: x/-i-x,-1 a!” 1, = Xi-I-X, i

if l,
One should notice that some rows in matrices A”’ . . , A’” + 1) are unnecessarily generated. Calculatibns in steps 2 and 3 consider these extra rows, however, these row will never affect the result since they are excluded when the terms are added up to form the divided differences in step 4. Step 2 in Algorithm 2 calculates the products aj~=IIj=,u$forl
vector instructions necessary for the dot product ’ ail(1)Yi=l,..,n. The redundant calculations performed by Algorithm 2 can be removed if simultaneous memory writes are allowed, or if computation and memory accesses can be overlapped. For example, the value y,, = x,, - xl need to be stored in memory locations A(1, 1, 1). . . , A(n, 1, I), simultaneously. If the underlying architecture supports such simultaneous stores, vectors to be processed will contain no redundant elements, hence average vector length will be reduced. Algorithm 3 in Fig. 7 assumes such hardware exists. The algorithm requires 2 log n + 4 vector instructions to compute the divided differences and logn + 4 vector instructions to evaluate the polynomial P,(x). Although the number of vector instructions has increased by a constant factor, length of vectors to be processed is greatly reduced. The algorithm uses the notation A(i :j) = (expression) to indicate that the value (expression) is to be stored in array elements A(i), A(i + l), . . . , A(j), simultaneously. cil

3. ANALYTICALAND EXPERIMENTALREWLTS

The three vector algorithms given in the previous section possess different degrees of parallelism. For instance, when computing the divided differences in Algorithm 1, parallelism is constrained by the recursion in Nevill’s definition, whereas computations of the divided differences in Algorithm 2 and Algorithm 3 are mostly parallel. Furthermore, Algorithm 1 requires O(n) vector instructions and total vector length of O(n*), while Algorithm 2 and Algorithm 3 require O(logn) vector instructions and total vector length of O(n3) and O(n’), respectively. Table 1 summarizes the interpolation activities carried by each algorithm and the number of vector instructions and the total vector length needed to perform these activities. As it can be seen from the table, Algorithm 2 and Algorithm 3 require lesser number of vector instructions. Theoretically speaking, Algorithms 1 and Algorithm 3 have the same order of vector length,

694

A. E. Al-Ayyoub

Newton Divi&d di&znce Interpolation: New VectorizationApproach { This vectorisedfortn ofNewton’s divided di$erence interpolationalgorithm requires 3 L&p+6 vector instructions to set and evaluate the interpolating polynomial TJaJ ) The total vector length is o(n3) )

ALGORITHM

(x&J, (a&, .... (&J

Input: output: Step 1:

Art

n-P fm somepositiveinttgcrm

WI0

[Installtk n+l triangular

matrices producedafter expandingequation

(3.). this would require a

single vector instruction]

fbralc l-1 to n+l j&I

i-l tol-1

fmaU j-i ton SI fi,i,r)-x$-1)-&-1)

cndforau Cndf~au fmaU i=Cton fonaU j-i ton J3&i,&o-l)-.&)

cndf0ral.z cndforall CllLf~au Step

2:

[Multiply tk colwnns of tk n+l matrices and store tk result in tk first column of tk corresponding matrix. This would require I!@ vector instructions]

fm k-1 to hgn faracCC-l ton+1 forall i-l tonbyZC fmaU j=i+2’l to n ~sli,i,r)-a~,i,r)*~~,i+2~.‘,[)

cndforacl cndforau tluifaralc endfor Step

3;

by each ekment in tk first column of tk i” matrix.This process Would

[Divide fi (i-o,l,...,n)

require ok vector iktruction]

forpcc I-l w n+l fora#j-1

ton

n~,l,r)-f(c-l)/~li,l,o

cndfolatl UrdfmaE Step 4:

[Add up first columu of each matrix. Tk vector which contains tk divided differences will be tk first column of the first matrix. This process requires l@+l vector instructions, l$u vector instruction to add up tk fist n colwnns, atui one more vector instruction to add tk last vector]

fbrQ1 w t&n fdf-1 tonby24 fmaU j-C+&l to n ~~,1,9-~~,l,C)+~li,l,(+24’)

dfd cndfaatl Fig. 6 (see caption opposite)

695

Pipelined algorithms for Newton’s divided difference interpolation

cndjii

fmd j-1 to n a~,l,l)=ag’,l,l)*~Il~)

cld_fomu Step

5:

[Evaluate the interpolating multiply tk columns of tk would require logn+l vector multiply tk columns of tk

polynomial TJd. First install tk lower trianguhu matrix C. then matrix by each other and store tk result in tk fust column. This instructions, one instruction to install tk matrix C and [ogn to installed matrix)

foracC i-l to n fond j-l to i C&i)-edi) endfarafl e~f~aU for &l to&n fontIT i-l tonby2k forall j-i+&’

ton

C&i)-C&i) l Co,i+Pk.t)

endforal.I CndforalT endfar Step

6:

[Multiply the first column of the matrix C by the corresponding would require one vector instruction.] foraU j-l

divided differences.

This

to n

c(j,l)-c(i,l)‘54o’,l,l) endforalc Step

7:

[Compute tk desired estimate Tn(d u&h

Step

8:

Output(Tn(ti)

ic equal to the sum of tk vector C@), j-l,...,n]

and a&

Fig. 6 (continued). Vectorized algorithm for Newton’s divided difference interpolation.

For a single pipeline per function

available Algorithm

1 will run faster than the other two algorithms. However, if multiple pipelines per function are available, vectors will be fed into different pipelines in the same time, hence, reducing the total vector length for Algorithm 2 and Algorithm 3. One way to estimate the execution time of a vector algorithm can be summarized as follows: given a set of v vector instructions with total vector length m and a set of np pipelines, the execution time T of this set of vector instructions can be expressed as follows: T = r,unup x 0 + (k,., x m)lnp,

(4)

where tmw and tmcrare the pipeline startup time and the time required by a single pipeline stage, respectively. Using this formula the execution times of

1, Algorithm 2 and Algorithm 3 can be estimated as follows:

Algorithm

T‘,/s‘Wkn1 = t“.““‘Ix (4n + 2) + (t,,c

x &t’++n

- I))/np

T~~i~~~ = rwup X (3 log n + 6) + (Lo x(n’+3nz+4n))/np T&with3 = LuP x (3 log n + g) +(L,

x ($’ + 6n - S))/np.

The startup time and the stage time are assumed to be the same for all pipelines. This assumption is made to simplify the analysis. The execution times of the

696

A. E. Al-Ayyoub

Divided Di$erence Interpolation: New VectorizationApproach ( This vectorizedform of Newton3 divided difference interpolationalgorithm requires 3 logn+S vector instructions to set and evaluate the interpolating polynomial T,,(d ‘The total vector length is O(tG) )

ALGORITHM

9 Navton

Input: output:

Step 1:

(G,fJ, (G,fJ .. .. (dJ

wfieren=Zmfor somepositiueintegerm

%(d

[hstaN the n+l triangular matrices produced ajier expandingequation(3), this would require a single vector instruction]

forufl L=l to n+l fma(I i=l to l-1 ~(i:n,i,Q=a$l)-7-l) endforall for& i=l to n a(i:n,i,~=@l)-Ir/il

endforaCl endfmaU Step 2:

[Multiply the columns of the n+l matrices and store the result in the first column of the corresponding matrix. This would require (ogn+l vector instructions] for

k=l to&n foralr C=l to n+l ford i=l to n by Zk ~(i+2h1+l:n,i,l)=~(i~2~1~l,i,~*~(i+2~”~l,i~2~~’,~ endfmall

endforrall endfor fmaU Gl to n+l foralc i=3 to n by 2

a(l,i,C)=a(l,i,n*~(i,i,tl

endfmall endforaU Step

3:

[Divide fi (i=O,l,...,n) by each element in

thefirst column of the ikhmatrix. This process would

require one vector instruction] forall C=l lo ml f&zU j-l

to n

~~,l,o=f~~~l~/a~,l,o

endforall endfor& Step 4:

[Add up first column of each matrix. The vector which contains the divided differences will be the first column of the first matrix. This process requires loan+1 vector instructions, Goan vector instruction to add up the first n columns, and one more vector instruction to add the last vector] f0r k-1 Lo I+ fomll

l-1 ton6y2k

fmall j-l+Zkl

to n

~li,l,c)~a(ill,l/+alill,l+zcl! endjball

Cl@f~atl Fig. 7 (see caption opposite)

pipelined algorithms for Newton’s divided difference interpolation

Step

5:

697

[Evaluate the interpolating polynomial T&I. First install the lower triangular matrix C, then multiply the columns of the matrix by each other and store the result in thefirst column. This would require l$qn+Z vector instructions, one instruction to install the matrix C and loan to multiply the columns of the installed matrix]

forall i-l ton C(l:i,i)=&)

endforaa: for l&=1to logn

fod

i=l tonby2k C(i+$1+1:n,i)=C(i+2k1+l,i)*C(i+2k1+l,i+2k1)

endfor& endfor forall j=3 to n by 2 C(Lj)=C(Lj)*CJj$ endforaU Step

6:

[Multiply the first column of the matrix C by the corresponding would require one vector i~truction.]

divided differences. This

forall j=l to n Co’,l)~co’,l)*5t~,l,1) cndjaafl Step

7:

[Compute the desired estimate Tn(d wlkh is c@

Step

8:

Output(!P&J) and E*Cit Fig. 7 (continued). Vector&d

to the sum of the vector Cli,l), j=l,...,n)

algorithm for Newton divided difference interpolation.

above algorithms are greatly affected by the ratio tstartup/tstage (denoted R) and np. For large np values the execution times of the algorithms will be dominated by the startup time. In this case Algorithm 1 will be of O(n) time complexity, while Algorithms 2 and 3 will be of O(log n) time complexity. For fixed and small np values the execution time depends on the ratio R. Since the two parameters R and np are machine dependent, Algorithm 1, Algorithm 2 and Algorithm 3 may show different performance ratios on different machines. Figure 8a-c shows execution time estimates of Algorithm 1, Algorithm 2 and Algorithm 3 for R = 4 and different np and n values. These plottings show that performances of Algorithm 2 and Algorithm 3 have great response to np values. For instance, at n = 100 performance of Algorithm 3 has upgraded by a speedup factor of 12 when np is switched from 6 to 100, whereas Algorithm 1 responded by a speedup factor of only two for the same switch. It is worth mentioning here that

parallelism in Algorithm 2 and Algorithm 3 is constrained by the available hardware. As it can be seen from Fig. 9 Algorithm 2 and Algorithm 3 are in favor of large R and np values. Execution times dramatically reduces as np increases. Such large np values are always possible as today’s parallel computers support multiple parallel constructs and can scale to include thousands of processing units. 4. CONCLUSIONS

In this paper three vector algorithms for Newton’s divided difference interpolation are proposed. Time complexities of these algorithms are also discussed. The proposed algorithms possess different degrees of parallelism. The amount of parallelism in the first algorithm is bounded by O(n). The other two algorithms reduce this bound to O(logn). The major concern of the last two algorithms is to break the recursion introduced by Nevill’s definition of the divided differences. Performance estimates provided

n

n-l n n

1

n-l

1

1

Evaluating the polpomial P.(x)

4n - 1 3/2n2 + 11/2n - 1

3/2n2 + 312~1

3n

Sub-total

n+2 4n + 2

n(n + I)/2

n

Sub-total Total

n(n + 1)

2n

Total vector length

Algorithm 1

Computing the divided differences

Number of vector instructions

Compute x-x,O
Compute the divided differences Compute the divided differences

Interpolation activity

n(n - 1)/2+n

logn + 1

n

1

n2 + 2n n3+3n2+4n

n

1

logn+3 3logn +6

n(n - I)/2

n(n - I)/2

n(n + 1)/2

log n

log n

1

n’+ 2n2 + 2n

n(n + 1)/2+n

1

210gn +3

n(n + l)(n - I)/2

n(n + l)(n + I)/2

Total vector length

Algorithm 2

log n

1

Number of vector instructions

Table 1. Summary of vector operations required by AlEoorithm 1, AI.gorithm 2 and Algorithm 3

Compute C,,=C,,*f,lCiCn Compute P.(x) = El,. , c,,

Install the matrix C Compute Cj = l-I&, c,, l
Install matrices A’Ol
Interpolation activity

Sub-total Total

Evaluating the polynomial P”(X)

Sub-total

Computing the divided differences

(n + I)@/2 - 1) n(n + I)/2 +n

1

n

1 I logn +4 31ogn+8

n/2-

1

I

91211- 2 7/2n2+6n -5

n

n-l



1/2n 2 i- 3/2n - 3

log ”

1

210gn +4

logn + 1 I)/2 + n

(?I + l)(n - 1)

log n

n(n-

n(n + 1)

1

1

Total vector length

Number of vector instructions

Algorithm 3

Table 1 (continued)

iG’Ign+i

P”(X) =‘zy, ,c,,

C,,=C,,*1;ldi
ldi
C,,= n;=,c,,

Install the matrix C Compute

Add columns of A[LlI,
lCi
Idi
up=rIj=,u~

Install matrices A’L’l GL
Interpolation activity

700

A. E. Al-Ayyoub np=6

np=I2 np=M np=24 np=30 llp=lOO

1

I

I

I

I

I

I

20

40

60

80

103

120

140

Problem Size (a) CPU time units required by Algorithm 1

np=6 np=12 np=18 n/1=24 np=30 npdoo

, I

I

I

0

20

40

I

Go

I

I

I

la0

140

I

la,

80

Problem Size (b) CPU time units required by Algorithm 2

0 0

I

.

I

I

I

.

20

40

60

80

103

la0

1 140

Problem Size (c) CPU time units required by Algorithm 3 Fig. 8. CPU time requirements of Algorithm 1, Algorithm 2 and Algorithm 3 for R = 4 and different np and n values.

Algmirhm

0

I

Al&thin 2 Algon’thm 3 , 0

I XQ

1 1000

I lxx,

1 lam

1

2500

nP Fig. 9. CPU time of Algorithm 1, Algorithm 2 and Algorithm 3 for R = 4, n = 128 and different np values.

Pipelined algorithms for Newton’s divided difference interpolation in the last section indicate that the proposed algor-

ithms can run much faster than any straightforward setting of Newton’s divided difference interpolation given that the target machine can scale to include sufficient processing units. REFERENCES 1. M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions. Drove (1965). 2. R. Burden, J. Douglas and A. Reynolds, Numerical Analysis. PWS Publishers (1981).

701

3. 0. Egecioglue, E. Gauopoulos, E. and C. Koc, Fast and practical parallel polynomial interpolation. CSRD Report

no. 646. (1987). 4. A. Macleod, A comparison of algorithm for polynomial interpolation. J. Comput. appl. Math. 8 (1982). 5. J. Mathews, Numerical MethodF for Computer Science, Engineering and Mathematics. Prentice-Hall, Englewood

Cliffs, NJ (1987). 6. A. Yazici and A. Al-Ayyoub, An organization of the interpolation problem for vector processing. In: Parallel Processing in Engineering Applications (Edited by R. A. Adey), pp. 171-182. Computational Mechanics Publications, Springer (1990).