
Parallel Computing 16 (1990) 81-97 North-Holland

Practical aspects and experiences

The implementation of the explicit block iterative methods on the Balance 8000 parallel computer

D.J. EVANS and W.S. YOUSIF

Parallel Algorithms Research Centre, Department of Computer Studies, Loughborough University of Technology, Loughborough, Leicestershire, UK

Received 10 January 1990

Abstract. The explicit block iterative method for solving elliptic p.d.e.'s was introduced by Evans and Biggins [1], whilst in Yousif and Evans [2] larger size block methods were studied and their advantages investigated and compared with other iterative methods. In this paper, several variants of the implementation of these block methods on the Balance 8000 parallel computer are discussed.

Keywords. Explicit block iterative methods, Gauss-Seidel and S.O.R. methods, shared memory multiprocessor.

1. Introduction

It has been shown that the explicit block technique is superior when compared with other iterative methods for the solution of elliptic p.d.e.'s [2]. A brief description of the block methods is given in Section 2. The blocks chosen in our experiments are the 4, 6, 9 and 16-point blocks. In Section 3, the parallel implementations of the methods are developed; three strategies are discussed and implemented in order to show the performance of each method.

2. The explicit block iterative method

Consider the second-order elliptic p.d.e., i.e. Laplace's equation in the unit square:

    ∂²U/∂x² + ∂²U/∂y² = 0,   (x, y) ∈ Ω = (0, 1) × (0, 1)                               (2.1)

and U(x, y) = g(x, y) on the boundary. Substituting the finite difference approximations for the second order derivatives leads to the following system of linear equations:

    u_{i-1,j} + u_{i+1,j} - 4u_{i,j} + u_{i,j-1} + u_{i,j+1} = 0,   1 ≤ i, j ≤ N        (2.2)

    u_{i,j} = g_{i,j} = g(ih, jh)   if i = 0 or i = N + 1 or j = 0 or j = N + 1.


Fig. 1. The 4-point block computational molecule.
Fig. 2. The 6-point block computational molecule.

Here u_{i,j} is an approximation to the exact solution U(x_i, y_j) at the grid point (x_i, y_j) = (ih, jh), and h = h_x = h_y = 1/(N + 1) is the grid spacing. The expression (2.2) is the usual five-point formula.
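For illustration (this sketch is not part of the original paper), one point Gauss-Seidel sweep of the five-point formula (2.2) can be written as follows; the NumPy array u, holding the boundary values g(ih, jh) in its first and last rows and columns, is an assumed data layout:

    import numpy as np

    def five_point_gs_sweep(u):
        """One Gauss-Seidel sweep of the five-point formula (2.2).

        u is an (N+2) x (N+2) array whose first and last rows/columns hold
        the Dirichlet boundary values; interior entries are updated in
        place in natural (lexicographic) ordering.
        """
        n = u.shape[0] - 2
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                u[i, j] = 0.25 * (u[i - 1, j] + u[i + 1, j] +
                                  u[i, j - 1] + u[i, j + 1])
        return u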

2.1 The explicit 4-point, 6-point, 9-point and 16-point block iterative methods

In the 4-point block method the mesh points are grouped together in blocks of four; this necessitates that N must be even. Then, applying equation (2.2) to each point of the blocks B_M, M = 1, 2, ..., N²/4, where each block consists of four elements, we have for a block B_M, say (see Fig. 1), the four equations:

    4u_1 - u_2 - u_4 = u_a + u_b
    4u_2 - u_1 - u_3 = u_c + u_d                                                         (2.3)
    4u_3 - u_2 - u_4 = u_e + u_f
    4u_4 - u_1 - u_3 = u_g + u_h

In matrix notation, the system can be rewritten as:

    [ 4  -1   0  -1 ] [u_1]   [u_a + u_b]
    [-1   4  -1   0 ] [u_2] = [u_c + u_d]                                                (2.4)
    [ 0  -1   4  -1 ] [u_3]   [u_e + u_f]
    [-1   0  -1   4 ] [u_4]   [u_g + u_h]

Since the 4 × 4 matrix can be easily inverted, the solution of the above system is:

    [u_1]          [7  2  1  2] [u_a + u_b]
    [u_2] = (1/24) [2  7  2  1] [u_c + u_d]                                              (2.5)
    [u_3]          [1  2  7  2] [u_e + u_f]
    [u_4]          [2  1  2  7] [u_g + u_h]

Hence, the explicit 4-point block equations are given by:

    u_1 = (7t_1 + 2t_6 + t_3)/24,    u_2 = (7t_2 + 2t_5 + t_4)/24
    u_3 = (7t_3 + 2t_6 + t_1)/24,    u_4 = (7t_4 + 2t_5 + t_2)/24                        (2.6)

where

    t_1 = u_a + u_b,  t_2 = u_c + u_d,  t_3 = u_e + u_f,  t_4 = u_g + u_h,
    t_5 = t_1 + t_3,  t_6 = t_2 + t_4.


Alternatively, by using the values of u_1 and u_3 from (2.6), u_2 and u_4 can also be determined using the five-point formula as follows: let

    xx = u_1 + u_3,

then

    u_2 = (xx + t_2)/4   and   u_4 = (xx + t_4)/4.
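The explicit 4-point update (2.6) translates directly into code. The following sketch is an illustration only (the function name and argument convention are assumptions, not taken from the paper); it computes the new values of one block from the neighbour sums t_1, ..., t_4:

    def explicit_4point_block(t1, t2, t3, t4):
        """Explicit 4-point block solve, equation (2.6).

        t1..t4 are the sums of the two exterior neighbours of the four
        block points (t1 = u_a + u_b, ..., t4 = u_g + u_h in the paper's
        notation).  Returns the new values u_1..u_4.
        """
        t5 = t1 + t3
        t6 = t2 + t4
        u1 = (7.0 * t1 + 2.0 * t6 + t3) / 24.0
        u2 = (7.0 * t2 + 2.0 * t5 + t4) / 24.0
        u3 = (7.0 * t3 + 2.0 * t6 + t1) / 24.0
        u4 = (7.0 * t4 + 2.0 * t5 + t2) / 24.0
        return u1, u2, u3, u4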

The computational molecules for the 6-point, 9-point and 16-point blocks are shown in Figs. 2, 3 and 4 respectively. The explicit 6-point block iterative equations are then given by:

    u_1 = (712t_1 + 225u_j + 208t_2 + 120u_e + 68t_4 + 47t_3)/2415
    u_2, u_5 and u_6 are found similarly,                                                (2.7)
    u_3 = (52u_j + 17u_e + 15t_5 + 8t_6)/161
    u_4 is found similarly,

where

    t_1 = u_a + u_b,  t_2 = u_c + u_d,  t_3 = u_f + u_g,  t_4 = u_h + u_i,
    t_5 = t_1 + t_4,  t_6 = t_2 + t_3.

Similarly, the explicit 9-point block iterative equations are given by:

    u_1 = (67t_1 + 22t_2 + 7t_7 + 6t_5 + 3t_6)/224
    u_3, u_7 and u_9 are found similarly,
    u_2 = (37u_c + 11t_8 + 7t_9 + 5u_i + 3t_10)/112                                      (2.8)
    u_4, u_6 and u_8 are found similarly,
    u_5 = (2t_11 + t_12)/16,

where

    t_1 = u_a + u_b,  t_2 = u_c + u_l,  t_3 = u_d + u_e,   t_4 = u_j + u_k,
    t_5 = u_f + u_i,  t_6 = u_g + u_h,  t_7 = t_3 + t_4,   t_8 = t_1 + t_3,
    t_9 = u_f + u_l,  t_10 = t_4 + t_6, t_11 = t_2 + t_5,  t_12 = t_8 + t_10.

Finally, the explicit 16-point block iterative equations are given by:

    u_1 = (1987t_1 + 674t_2 + 251t_3 + 101t_6 + 88t_9 + 74t_7 + 37t_8)/6600
    u_4, u_13 and u_16 are found similarly,
    u_2 = (2238u_c + 762u_d + 674t_1 + 458u_p + 251t_4 + 242t_10 + 162u_l
           + 158u_h + 138u_k + 101t_5 + 74t_8)/6600                                      (2.9)
    u_3, u_5, u_8, u_9, u_12, u_14 and u_15 are found similarly,
    u_6 = (916t_2 + 559t_3 + 458t_1 + 409t_6 + 316t_7 + 242t_9 + 158t_8)/6600
    u_7, u_10 and u_11 are found similarly,

where

    t_1 = u_a + u_b,  t_2 = u_c + u_p,  t_3 = u_d + u_o,  t_4 = u_e + u_f,
    t_5 = u_m + u_n,  t_6 = u_g + u_l,  t_7 = u_h + u_k,  t_8 = u_i + u_j,
    t_9 = t_4 + t_5,  t_10 = u_g + u_o.
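The coefficients appearing in (2.6)-(2.9) are simply the entries of the inverse of the block coefficient matrix, scaled to a common denominator. The sketch below is not code from the paper; it assumes a row-major numbering of the block points and only illustrates where numbers such as 7/24 or 712/2415 come from:

    import numpy as np

    def block_inverse(p, q):
        """Inverse of the five-point Laplacian matrix restricted to a p x q block.

        Row k gives the weights applied to the sums of exterior neighbours
        in the explicit update u = A^(-1) b.  For example,
        24 * block_inverse(2, 2)[0] = [7, 2, 2, 1] (the weights of (2.5) in
        row-major order) and 2415 * block_inverse(2, 3)[0] reproduces the
        6-point coefficients 712, 225, 68, 208, 120, 47.
        """
        n = p * q
        A = np.zeros((n, n))
        for r in range(p):
            for c in range(q):
                k = r * q + c                      # row-major index of point (r, c)
                A[k, k] = 4.0
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < p and 0 <= cc < q:
                        A[k, rr * q + cc] = -1.0   # coupling to a neighbour inside the block
        return np.linalg.inv(A)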

One can accelerate the algorithm by introducing a relaxation parameter ω in a Successive Block Over-Relaxation (S.B.O.R.) manner. The newly calculated solutions, i.e. (2.6)-(2.9), are then considered to be intermediate solutions u*. The new solution is defined by:

    u^(k+1) = (1 - ω)u^(k) + ωu*.                                                        (2.10)


Fig. 3. The 9-point block computational molecule.
Fig. 4. The 16-point block computational molecule.

The value of ω can be chosen such that the spectral radius of the method is minimal in the usual way.
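In code, the acceleration step (2.10) is a single weighted combination of the previous iterate and the intermediate block solution; the sketch below is illustrative only (u_old, u_star and omega are assumed names, not identifiers from the paper):

    def sbor_update(u_old, u_star, omega):
        """Successive Block Over-Relaxation step, equation (2.10).

        u_star holds the intermediate values produced by the explicit block
        formulae (2.6)-(2.9); omega is the relaxation factor (omega = 1
        recovers the block Gauss-Seidel iteration).
        """
        return (1.0 - omega) * u_old + omega * u_star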

3. Parallel strategies and implementation details

Different parallel implementations of the explicit block iterative methods presented in Section 2 have been investigated and three strategies have been applied. These are outlined as follows (note: quantities in parentheses apply to the 6- and 9-point methods, while those in brackets apply to the 16-point block method):

3.1 Strategy 1

A subset of the mesh consisting of an approximately equal number of blocks of 2, (3), [4] rows is assigned to each processor. This means that P subsets are formed, where each subset contains N/P rows of the mesh, N being the number of rows in the mesh (assumed divisible by P). In this strategy N/P should be a multiple of 2, (3), [4]. Each processor independently iterates upon its own subset, in either natural or red-black ordering, until local convergence is achieved. When this occurs, a test for global convergence is performed.

3.2 Strategy 2

Blocks of 4, 6, 9 or 16 points are dynamically allocated, one at a time, to processors in either natural or red-black ordering. Although the ordering of the blocks is fixed, it is not possible to say which blocks will be processed by a particular processor, since this depends on the relative execution speeds of the processors.

3.3 Strategy 3

In this strategy, each processor is assigned an approximately equal number of contiguous blocks of the mesh, which are evaluated in either natural or red-black ordering.

The maximum number of processors that can be used with strategy 1 is N/2, (N/3), [N/4], while for strategy 2 and strategy 3 the maximum number is N²/4, (N²/6), (N²/9), [N²/16]. The three strategies have been implemented using the following structure, illustrated by the sketch below: each participating processor evaluates a fixed number of blocks in some ordering scheme, increments its own iteration counter and then tests for local convergence. If convergence has not been achieved this cycle is repeated; otherwise the processor sets a bit in the id'th position (0 ≤ id ≤ p - 1) of the global convergence flag and then checks whether the remaining p - 1 processors have also set their bits. If they have, the algorithm terminates, else the iteration cycle is repeated.
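The following sketch illustrates this iterate/flag structure. It is only a schematic reconstruction: the paper's implementation used the Balance 8000's shared-memory facilities, whereas here Python threads and a lock stand in for the processors, and all identifiers are assumptions:

    import threading

    def worker(pid, blocks, update_block, locally_converged, flags, lock):
        """Schematic worker loop for the termination scheme of Section 3.

        Each processor repeatedly sweeps its fixed set of blocks in some
        ordering, counts its own iterations and tests local convergence;
        it then sets its bit of the shared convergence flag and stops
        only when all processors have set theirs.
        """
        iterations = 0
        while True:
            for b in blocks:                 # evaluate the assigned blocks
                update_block(b)
            iterations += 1
            if not locally_converged():
                continue                     # local convergence not yet reached
            with lock:                       # set bit 'pid' of the global flag
                flags[pid] = True
                all_done = all(flags)
            if all_done:                     # every processor has converged
                return iterations

    # flags = [False] * p and lock = threading.Lock() would be shared by
    # all p workers, each started in its own thread.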

4. Experimental results

Numerical experiments have been carried out on the Balance 8000 multiprocessor at PARC using the G.S. and S.O.R. methods to solve Laplace's equation, i.e.

    ∂²U/∂x² + ∂²U/∂y² = 0,   in Ω = (0, 1) × (0, 1)                                      (4.1)

Table 1
The elapsed time (in seconds) and speed-up values for the parallel block G.S. iterative methods for mesh size 60 × 60 (strategy one)

Method      Processors   Natural ordering           Red-Black ordering
                         Elapsed time   Speed-up    Elapsed time   Speed-up
4-point          1         1996.245      1.000        2041.485      1.000
                 2         1006.261      1.984        1023.340      1.995
                 3          668.773      2.985         680.491      3.000
                 4          512.840      3.893         519.623      3.929
                 5          401.668      4.970         410.649      4.971
                 6          335.981      5.942         341.211      5.983
                 7          287.269      6.949         291.938      6.993
                 8          266.361      7.495         270.812      7.538
                 9          228.990      8.718         231.101      8.834
6-point          1         2078.867      1.000        2098.315      1.000
                 2         1045.180      1.989        1552.820      1.351
                 3          696.824      2.983        1030.285      2.037
                 4          529.915      3.923         783.724      2.677
                 5          416.816      4.987         617.613      3.397
                 6          348.591      5.964         513.768      4.084
                 7          298.278      6.970         431.247      4.866
                 8          276.650      7.514         400.930      5.234
                 9          237.744      8.744         339.536      6.180
9-point          1         1821.130      1.000        1834.001      1.000
                 2          915.119      1.990         930.011      1.972
                 3          632.291      2.880         639.755      2.867
                 4          460.165      3.958         465.212      3.942
                 5          368.843      4.937         372.103      4.929
                 6          314.595      5.789         317.965      5.768
                 7          278.204      6.546         280.306      6.543
                 8          256.286      7.106         259.165      7.077
                 9          208.930      8.716         210.741      8.703
16-point         1         1676.415      1.000        1681.920      1.000
                 2          871.323      1.924         877.463      1.917
                 3          575.267      2.914         576.226      2.919
                 4          454.497      3.689         457.663      3.675
                 5          346.927      4.832         347.731      4.837
                 6          319.783      5.242         319.742      5.260
                 7          245.390      6.832         244.977      6.866
                 8          232.957      7.196         233.983      7.188
                 9          231.101      7.254         227.407      7.396


Table 2
The elapsed time (in seconds) and speed-up values for the parallel block G.S. iterative methods for mesh size 60 × 60 (strategy two)

Method      Processors   Natural ordering           Red-Black ordering
                         Elapsed time   Speed-up    Elapsed time   Speed-up
4-point          1         1996.245      1.000        2041.485      1.000
                 2         1441.103      1.385        1125.450      1.814
                 3         1050.501      1.900         755.991      2.700
                 4          827.755      2.412         507.851      3.595
                 5          670.462      2.977         453.773      4.499
                 6          567.275      3.519         379.110      5.385
                 7          490.601      4.069         325.392      6.274
                 8          425.325      4.693         285.615      7.148
                 9          389.383      5.127         254.470      8.022
6-point          1         2078.867      1.000        2098.315      1.000
                 2         1422.360      1.462        1114.435      1.883
                 3          995.132      2.089         746.541      2.811
                 4          774.579      2.684         560.438      3.744
                 5          633.351      3.282         448.114      4.683
                 6          527.008      3.945         374.125      5.609
                 7          450.140      4.618         320.899      6.539
                 8          404.035      5.145         281.633      7.451
                 9          352.127      5.904         250.325      8.382
9-point          1         1821.130      1.000        1834.001      1.000
                 2         1292.321      1.409         950.730      1.929
                 3          952.678      1.912         636.709      2.880
                 4          724.137      2.515         477.951      3.837
                 5          597.820      3.046         381.978      4.801
                 6          496.561      3.687         318.470      5.759
                 7          414.264      4.396         273.565      6.704
                 8          374.563      4.862         240.175      7.636
                 9          319.063      5.708         213.646      8.584
16-point         1         1676.415      1.000        1681.920      1.000
                 2         1233.001      1.360         867.122      1.940
                 3          854.752      1.961         585.091      2.875
                 4          676.839      2.477         437.079      3.848
                 5          506.878      3.307         348.628      4.824
                 6          437.960      3.828         291.271      5.774
                 7          397.963      4.212         250.460      6.715
                 8          341.986      4.902         220.402      7.631
                 9          297.201      5.641         195.493      8.603

and

    U(0, y) = 100,   0 < y < N + 1,
    U(x, 0) = U(N + 1, y) = U(x, N + 1) = 0,   0 ≤ x, y ≤ N + 1.
Throughout the experiments a tolerance of ε = 10⁻⁵ was used in the convergence test. The calculation time (in seconds) was measured in order to determine the most efficient method and the best strategy to implement. Tables 1-3 list the elapsed time and speed-up values for the 4-point, 6-point, 9-point and 16-point parallel block G.S. iterative methods for mesh size 60 × 60 with natural and red-black orderings, implementing the three strategies respectively. The elapsed time and speed-up values for the parallel block S.O.R. iterative methods using the three strategies are given in Tables 4-6 respectively.
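For reference, the test problem and convergence test described above can be set up as follows (a minimal sketch assuming a NumPy array with one layer of boundary points; only the 60 × 60 interior mesh, the boundary values and the 10⁻⁵ tolerance come from the text):

    import numpy as np

    N = 60                        # interior mesh points in each direction
    TOL = 1.0e-5                  # convergence tolerance used in the experiments

    u = np.zeros((N + 2, N + 2))  # one extra layer holds the boundary values
    u[0, 1:N + 1] = 100.0         # U(0, y) = 100; the other three sides are zero

    def converged(u_new, u_old, tol=TOL):
        """Convergence test on the largest pointwise change between sweeps."""
        return np.max(np.abs(u_new - u_old)) < tol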


Table 3
The elapsed time (in seconds) and speed-up values for the parallel block G.S. iterative methods for mesh size 60 × 60 (strategy three)

Method      Processors   Natural ordering           Red-Black ordering
                         Elapsed time   Speed-up    Elapsed time   Speed-up
4-point          1         1996.245      1.000        2041.485      1.000
                 2         1018.966      1.959        1028.162      1.986
                 3          679.561      2.938         687.475      2.969
                 4          507.593      3.933         514.081      3.971
                 5          406.897      4.906         413.824      4.933
                 6          339.185      5.885         347.066      5.882
                 7          290.902      6.862         296.514      6.885
                 8          254.135      7.855         261.777      7.799
                 9          227.120      8.789         231.953      8.801
6-point          1         2078.867      1.000        2098.315      1.000
                 2         1049.215      1.981        1068.275      1.964
                 3          696.319      2.986         708.218      2.963
                 4          523.510      3.971         545.831      3.844
                 5          417.718      4.977         425.209      4.935
                 6          351.111      5.921         362.140      5.794
                 7          299.706      6.936         307.963      6.814
                 8          263.195      7.899         270.288      7.763
                 9          234.350      8.871         238.600      8.794
9-point          1         1821.130      1.000        1834.001      1.000
                 2          925.641      1.967         929.465      1.973
                 3          620.182      2.936         624.572      2.936
                 4          464.735      3.919         473.374      3.874
                 5          371.252      4.905         378.301      4.848
                 6          310.859      5.858         316.313      5.798
                 7          265.505      6.858         269.485      6.806
                 8          233.693      7.793         237.930      7.708
                 9          207.560      8.774         213.052      8.608
16-point         1         1676.415      1.000        1681.920      1.000
                 2          858.718      1.952         865.650      1.943
                 3          572.371      2.929         578.070      2.910
                 4          427.780      3.919         435.002      3.866
                 5          344.773      4.862         351.901      4.780
                 6          286.719      5.847         293.249      5.735
                 7          245.611      6.825         252.888      6.651
                 8          216.104      7.757         227.353      7.398
                 9          192.410      8.713         198.380      8.478

The elapsed times and the speed-up values are plotted against the number of processors for the G.S. and S.O.R. methods using strategy 1 with natural ordering, strategy 2 with red-black ordering and strategy 3 with natural ordering in Figs. 5-16. Table 7 lists the iteration numbers for the sequential versions of all four block methods together with the optimum relaxation factor ω for mesh size 60 × 60.

5. Concluding remarks

5.1 Strategy 1

From Table 1 it can be noted that the parallel 4, 9 and 16-point block G.S. iterative methods performed quite well for both natural and red-black ordering, while the 6-point block G.S.


Table 4
The elapsed time (in seconds) and speed-up values for the parallel S.O.R. iterative methods for mesh size 60 × 60 (strategy one)

Method      Processors   Natural ordering           Red-Black ordering
                         Elapsed time   Speed-up    Elapsed time   Speed-up
4-point          1          107.507      1.000         109.170      1.000
                 2           54.023      1.990          55.173      1.979
                 3           36.597      2.938          37.182      2.936
                 4           30.980      3.470          31.731      3.440
                 5           21.613      4.974          22.934      4.760
                 6           17.920      5.999          19.169      5.695
                 7           17.789      6.043          18.570      5.879
                 8           15.356      7.001          16.341      6.681
                 9           14.771      7.279          15.616      6.991
6-point          1          116.798      1.000         123.332      1.000
                 2           62.763      1.861            -            -
                 3           42.372      2.766            -            -
                 4           34.210      3.414            -            -
                 5           24.233      4.820            -            -
                 6           20.660      5.653            -            -
                 7           20.963      5.572            -            -
                 8           18.386      6.353            -            -
                 9           17.432      6.700            -            -
9-point          1          115.927      1.000         116.180      1.000
                 2           57.964      2.000          58.973      1.970
                 3           43.123      2.688          42.393      2.741
                 4           26.982      4.000          29.883      3.888
                 5           23.590      4.914          24.732      4.698
                 6           25.202      4.600          24.773      4.690
                 7           19.543      5.932          20.285      5.727
                 8           22.091      5.248          21.927      5.298
                 9           18.302      6.334          19.810      5.865
16-point         1          120.250      1.000         125.220      1.000
                 2           66.762      1.801          68.538      1.827
                 3           40.100      2.999          42.208      2.967
                 4           38.618      3.114          38.833      3.400
                 5           25.850      4.652          28.025      4.468
                 6           36.208      3.321          36.950      3.389
                 7           27.030      4.449          29.542      4.239
                 8           21.538      5.583          21.720      5.765
                 9           25.649      4.688          28.625      4.374

Note: for the 6-point block with red-black ordering and P ≥ 2 the timing results were very high and are not reproduced (see text).
method performed well with natural ordering but gives poor speed-up values with red-black ordering. For the parallel block S.O.R. iterative methods (ω = ω_opt), it can be seen from Table 4 that the methods performed well for small numbers of processors P, up to P = 5, and P = 6 for the 4-point block. For the parallel 6-point block with red-black ordering, the convergence was very slow and the timing results were very high compared with those of the sequential algorithm. To achieve a linear speed-up it is necessary to distribute the work evenly amongst the processors (load balancing); this is not always possible when the number of processors P does not divide exactly into the number of blocks. The 4-point block S.O.R. method appears to be the most efficient method within this class of methods and for this strategy.


Table 5
The elapsed time (in seconds) and speed-up values for the parallel S.O.R. iterative methods for mesh size 60 × 60 (strategy two)

Method      Processors   Natural ordering           Red-Black ordering
                         Elapsed time   Speed-up    Elapsed time   Speed-up
4-point          1          107.507      1.000         109.170      1.000
                 2             -            -           60.878      1.793
                 3             -            -           40.770      2.678
                 4             -            -           30.691      3.557
                 5             -            -           24.710      4.418
                 6             -            -           20.762      5.258
                 7             -            -           18.020      6.058
                 8             -            -           15.811      6.905
                 9             -            -           14.358      7.603
6-point          1          116.798      1.000         123.332      1.000
                 2             -            -           65.474      1.884
                 3             -            -           43.790      2.816
                 4             -            -           33.035      3.733
                 5             -            -           26.533      4.648
                 6             -            -           22.224      5.549
                 7             -            -           19.237      6.411
                 8             -            -           16.920      7.289
                 9             -            -           15.168      8.131
9-point          1          115.927      1.000         116.180      1.000
                 2             -            -           61.390      1.893
                 3             -            -           40.987      2.835
                 4             -            -           30.860      3.765
                 5             -            -           24.800      4.685
                 6             -            -           20.696      5.614
                 7             -            -           17.977      6.463
                 8             -            -           15.843      7.333
                 9             -            -           14.210      8.176
16-point         1          120.250      1.000         125.220      1.000
                 2             -            -           62.550      2.001
                 3             -            -           42.466      2.949
                 4             -            -           31.793      3.939
                 5             -            -           26.156      4.767
                 6             -            -           21.747      5.758
                 7             -            -           19.016      6.585
                 8             -            -           16.730      7.485
                 9             -            -           14.930      8.387

Note: with natural ordering, divergence occurred for P ≥ 2 for all four methods; only the single-processor times are given.

5.2 Strategy 2

It can be seen from Table 2 that the parallel block G.S. iterative methods with natural ordering give poor speed-up values due to excessive processor interference, which reduces the rate of convergence. On the other hand, the red-black ordering of the blocks minimises processor interference when used with this strategy. The parallel 16-point block G.S. method performed better than the other block methods in terms of elapsed times and speed-up values. The divergence which occurs in the parallel block S.O.R. methods (see Table 5) is caused by processor interference. This can be explained by the fact that processors are always evaluating adjacent blocks of the mesh, thus providing the opportunity for the largest number of processor conflicts, and that the optimum value of ω makes the iterative method more sensitive to any fluctuation which may arise due to processor interference.


Table 6
The elapsed time (in seconds) and speed-up values for the parallel S.O.R. iterative methods for mesh size 60 × 60 (strategy three)

Method      Processors   Natural ordering           Red-Black ordering
                         Elapsed time   Speed-up    Elapsed time   Speed-up
4-point          1          107.507      1.000         109.170      1.000
                 2           55.590      1.934          57.120      1.911
                 3           36.150      2.974          39.813      2.742
                 4           27.317      3.936          31.332      3.484
                 5           21.818      4.927          24.751      4.411
                 6           17.921      5.999          22.670      4.816
                 7           15.950      6.740          19.285      5.661
                 8           13.867      7.753          18.858      5.789
                 9           12.703      8.463          16.493      6.619
6-point          1          116.798      1.000         123.332      1.000
                 2           63.323      1.844          62.860      1.962
                 3           41.552      2.811          42.551      2.898
                 4           30.640      3.812          35.707      3.454
                 5           24.847      4.701          28.856      4.274
                 6           21.574      5.414          34.337      3.592
                 7           18.046      6.472          21.245      5.805
                 8           16.103      7.253          21.470      5.744
                 9           14.938      7.819          19.481      6.331
9-point          1          115.927      1.000         116.180      1.000
                 2           57.963      2.000          62.376      1.863
                 3           38.842      2.985          47.887      2.426
                 4           29.171      3.974          32.463      3.579
                 5           23.220      4.993          27.743      4.188
                 6           19.730      5.876          27.140      4.281
                 7           18.450      6.283          23.090      5.032
                 8           17.733      6.537          22.482      5.168
                 9           15.042      7.707          22.172      5.240
16-point         1          120.250      1.000         125.220      1.000
                 2           60.125      2.000          72.262      1.733
                 3           40.880      2.942          47.960      2.611
                 4           31.430      3.826          41.200      3.039
                 5           25.975      4.629          33.173      3.775
                 6           25.040      4.802          33.340      3.756
                 7           19.790      6.076          26.877      4.659
                 8           21.122      5.693          24.533      5.104
                 9           20.704      5.808          38.540      3.249

This explanation is supported by the results for the red-black ordering scheme with ω = ω_opt, which show reasonable speed-ups taking into account the (dynamic) block allocation and indexing array initialisation overheads.

5.3 Strategy 3

For the parallel 4, 6, 9 and 16-point block G.S. iterative methods it can be seen from Table 3 that all the methods performed very well, and in the natural ordering implementations achieved a linear speed-up. From Table 6, which gives the elapsed times and speed-up values for the parallel block S.O.R. methods, the 4-point block with natural ordering performed well, while the other block methods, i.e. the 6, 9 and 16-point blocks, performed well up to P = 5.


(Figures 5-16 plot the elapsed times and speed-up values against the number of processors, 1-9, for the 4-point, 6-point, 9-point and 16-point block methods.)

Fig. 5. Elapsed times for the block GS methods, str.1, natural ord.
Fig. 6. Speed-up values for the block GS methods, str.1, natural ord.
Fig. 7. Elapsed times for the block SOR methods, str.1, natural ord.
Fig. 8. Speed-up values for the block SOR methods, str.1, natural ord.
Fig. 9. Elapsed times for the block GS methods, str.2, R-B ord.
Fig. 10. Speed-up values for the block GS methods, str.2, R-B ord.
Fig. 11. Elapsed time for the block SOR methods, str.2, R-B ord.
Fig. 12. Speed-up values for the block SOR methods, str.2, R-B ord.
Fig. 13. Elapsed times for the block GS methods, str.3, natural ord.
Fig. 14. Speed-up values for the block GS methods, str.3, natural ord.
Fig. 15. Elapsed times for the block SOR methods, str.3, natural ord.
Fig. 16. Speed-up values for the block SOR methods, str.3, natural ord.


Table 7
The number of iterations for the sequential block methods and the optimum relaxation factor ω for mesh size 60 × 60

             Number of iterations                              The optimum
             Natural ordering        Red-Black ordering        value of ω
Method       G.S.       S.O.R.       G.S.       S.O.R.
4-Point      2674       144          2686       144            1.865
6-Point      2256       127          2268       133            1.855
9-Point      1836       117          1844       116            1.840
16-Point     1406       100          1412       101            1.818

The red-black ordering scheme produced quite poor speed-up values for the 6, 9 and 16-point block S.O.R. methods. The reasons are, firstly, that the initialisation of the red-black ordering array had to be done sequentially for simplicity (the time required to do this is included in the total elapsed time), and secondly, that the organisation of the allocated blocks allows greater possibility for processor conflicts. In conclusion, the four different blocks have been shown to be viable parallel algorithms when implemented asynchronously on a tightly coupled multiprocessor, with the 4-point block as the most efficient one.

Acknowledgments The authors are indebted to Mr. M. Wheat for some programming assistance.

References

[1] D.J. Evans and M.J. Biggins, The solution of elliptic partial differential equations by a new block overrelaxation technique, Internat. J. Comput. Math. 10 (1982) 269-282.
[2] W.S. Yousif and D.J. Evans, Explicit group over-relaxation methods for solving elliptic partial differential equations, Math. Comput. Simulation 28 (1986) 453-466.