Implementing a parallel constrained l1 approximation algorithm

Parallel Computing 19 (1993) 609-620 North-Holland 609 PARCO 735 Practical aspects and experiences Implementing a parallel constrained 11 approxim...

Download PDF

622KB Sizes 1 Downloads 129 Views

Report

PDF Reader
Full Text

Parallel Computing 19 (1993) 609-620 North-Holland

609

PARCO 735

Practical aspects and experiences

Implementing a parallel constrained 11 approximation algorithm T.B. B o f f e y a n d W . A . E s s a h Centre for Mathematical Software Research, University of Liverpool, P.O. Box 147, Liverpool, L69 3BX,, UK Received 2 March 1992

Abstract Boffey, T.B. and W.A. Essah, Implementing a parallel constrained l 1 approximation algorithm, Parallel Computing 19 (1993) 609-620. The constrained 11 approximation algorithm of Barrodale and Roberts is described and the concept of pass-through explained by means of an example. Next, a parallel implementation is described. Experimental results using a transputer array are given for a range of problems indicating moderate speedups may be obtained for larger problems. A complicating feature is that the number of pass-through rows varies dramatically between iterations. Ways in which this variability may be accommodated are discussed.

Keywords. Linear programming; constrained l~ approximation; transputer system.

1. Introduction

This work is concerned with a parallel implementation of an algorithm presented by Barrodale and Roberts (BR) in 1978 for solving 11 linear approximation problems with linear constraints. In general, the l 1 problem is to find the solution x which, over a discrete point set, minimises the one norm error between the real-values of the exact function f ( t ) and the linear approximation function F(t), where F ( t ) = ~,~=txjchj(t) for a set of prescribed real valued functions ~bj(t). The mathematical basis of the BR algorithm is to modify the simplex method and the required data structures to provide an efficient method for solving an (m × n) system of equations

Ax = b. In general, when m > n the system will have no feasible solution and an 'approximation' is sought which minimises the l 1 n o r m : IIb - A x Ill where

b = [ b l, b E , . . . , bin] T a n d t h e m a t r i x A = [ A i , f i i = 1, 2 . . . . . m a n d j = 1, 2 . . . . . n ] a r e

Correspondence to: T.B. Boffey, Centre for Mathematical Software Research, University of Liverpool, P.O. Box 147, Liverpool, L69 3BX, UK. 0167-8191/93/$06.00 © 1993 - Elsevier Science Publishers B.V. All rights reserved

610

T.B. Boffey, W.A. Essah

given. The unknown vector x = [x~, x 2. . . . . x,] T is the solution to be determined. Thus, the l 1 problem is: minimise II b - A x I11 =

bi -

Ai,jxi •

i=1

Then, by letting [b i - Y?]=xAi,jxi[ = u i - v i, u i, v i >_0 for all i, and x i =x] - x 7, x], xT >~0 for all j, the 11 approximation problem is the linear program: m

minimise ~ (u i + vi) i=1

subject to A(x'-x")

+l(u-v)

=b

X', X " _ > 0 , U , V _ > 0 .

Sometimes, there will be additional linear constraints (we call them side constraints) that variables x / = x~ -x~' must satisfy. Assuming these linear constraints are represented by Cx = d and Ex <_f, then the linear constrained 11 problem (cll) will be of the form: m

minimise ]~ (u i + vi) i=1

subject to A(x'-x")

+ I(u-v)

=b

C ( x ' - x " ) + I ( u ' - , , ' ) =d E(x'-x")

+I(u"-v")

=f

x', x " > O, u, v >_O, u', v' >_O, u", v" > O where u', v', v" are vectors of artificial variables and u" is a vector of slack variables. The vector of constants [b d f ] will be denoted by x 0. Note that the identity matrices, although all denoted by I, will in general be of different sizes. The method developed by BR is based on the standard simplex method, with modifications which take into account the structure of this particular application. In the iterations of the normal standard full simplex method the choice of (pivot column) vector to enter the basis is one with the largest nonnegative marginal cost. The vector leaving the basis, chosen from among the basis vectors, is that one which has the smallest nonnegative ratio between the corresponding constant column vector and pivot column vector elements. The application of this rule can require a large number of iterations. The major modification made in BR to this rule is to select as the outgoing vector that which causes the maximum reduction in the objective function; details of how this might be achieved are given in Section 2. This results in a considerable reduction in the number of iterations required (as shown by Table 1 later). This modification, along with others such as condensing the tableau and restricting the first few iterations to the choice of x~ and x~' as incoming variable, leads to the BR algorithm. Before discussing how the BR strategy may be implemented in parallel, it is necessary to have a good understanding of the strategy itself; Section 2 is devoted to this. A characteristic of cll problems is that the (condensed) tableau is dense and so the use of sophisticated sparse matrix techniques is unnecessary. In such circumstances standard simplex may be implemented in parallel in an obvious way. Blocks of rows of the tableau are distributed to the separate processors with the principal computational burden, that required for the tableau update (the pivoting step), being performed concurrently for the separate

Implementing a parallel constrained 11 approximation algorithm

611

blocks. With this simple strategy, respectable speedups are readily obtained [4]. On :the other hand, speedup may be achieved for d l on a single processor by using the BR strategy. However, simultaneously achieving the benefits of parallelisation and the BR strategy is not straightforward. Since some applications (e.g. for solving integral equations [6]) require the solution of many It or cll problems, an effective parallel implementation of the BR strategy is desirable. The present paper is devoted to a description of how this may be achieved.

2. The BR strategy 2.1 The condensed tableau

We illustrate the modifications made to the standard simplex method, via an example. Example. Fit a line f ( t ) = z = x I + x2t to the set of points (t, z) = (2, 1), (3, 2), (3, 1) and

(5, 3) subject to the constraints x I + 2x 2 = 1 and x I < 2. Solution. The tableau in full is: Tableau 1

c1 cz

1 7

1 4

2 13

- 1 - 4

- 2 - 13

0 0

uE uz u3 u4 u'~ u '1'

1 2 1 3 1 2

1 1 1 1 1 1

2 3 3 5 2 0

-1 -1 -1 -1 -1 -1

-2 -3 -3 -5 -2

1

0 0

0 0

0 0

0 - 2

0 - 2

0 - 2

0 - 2

0 0

- 2 0

1

-1

0 0

- 1 0

1

-1

-1 1

-1

1

-1 1

-1

0

where c 1 denotes the phase I objective row and c 2 the phase II objective row. Now, for each index j - the coefficient of x 7 is the negative of the coefficient of x) - the coefficient of vj is the negative of the coefficient of uj - the coefficient of vj is the negative of the coefficient of u~ - the coefficient of v7 is the negative of the coefficient of u~' and a condensed (n + 1) column tableau (containing the constants column and the n columns corresponding to the x} variables) can be used. For the above problem the condensed tableau is: Tableau 2 x0

x~

x~

c2

1 7

1 4

2 13

uI u2 u3 u4 u~ u~

1 2 1 3 1 2

1 1 1 1 1 1

2 3 3 5 2 0

c1

Ratio

1/2 2/3 1/3" 3/5 1/2"*

612

T.B. Boffey, W..A. Essah

With the modified selection rule for the variable to leave the basis, (primal) feasibility is often lost, this being manifested by the appearance of negative element(s) in the constants column. Feasibility is restored by negating the corresponding rows and replacing in the basis each xj, uj, u~ or uj concerned by the corresponding xj, vj., vj or vi and vica versa. This scheme can be extended by noting the following, easily proved, relations between coefficients in the objective row(s): In phase I, if present, the reduced costs of: (a) xjt and xjt t sum to zero; (b) uj and vj sum to zero; (c) ujt and vjt sum to minus two; (d) uj and cj sum to minus one. In phase II, the reduced costs of: (a) xit and x~t ! sum to zero; (b) uj and vi sum to minus two; (c) u~ and ( sum to zero; (d) u~ and v) sum to zero. On the basis of this, the condensed form can be used in place of the whole tableau providing care is taken when selecting incoming variables etc. Clearly, the above tableau shows that using the condensed form will produce substantial savings in space. Moreover, the above numerical example is solved by the full standard simplex in 5 iterations whereas cll takes only 2 iterations. ¢

t

t¢

~ ¢t

II

II

tt

tt

!

tt

2.2 Pass-through For the tableau shown above, it is natural to choose x~ as the variable to enter the basis. Normally, u 3 would be chosen to leave the basis (since it corresponds to the smallest ratio) but the BR strategy would lead to u'1 being chosen. Thus variable u 3 has been 'passed through'. (A geometric interpretation of 'pass-through' is given below.) In order to explain in more detail the BR rule for selecting the leaving variable an example will be used. For ease of explanation this will be the 11 problem corresponding to the earlier example; that is the two side constraints are relaxed and phase I is not needed.

Example. Fit a line f ( t ) = x I +x2t to the set of points (t, z) = (2, 1), (3, 2), (3, 1) and (5, 3). Solution. The initial tableau is: Tableau 3 x0

x~

x~

c2

7

4

13

u1 u2 u3 u4

1 2 1 3

1 1 1 1

2 3 3 5

Ratio

1/2"* 2/3 1/3" 3/5***

Comments

7-2×2

= 3

13-2×3= 7 3-2X5 = -7

x~ is the natural choice of incoming variable and u 3 would be the normal choice of variable to leave the basis.

613

Implementing a parallel constrained I l approximation algorithm

3.5 3 2.5 2

f(t)

1.5 1 0.5 0 -0.5

I

I

I

I

I

1

2

3

4

5

6

t

Fig. 1.

Consider what h a p p e n s as x~ is increased from zero. T h e rate of decrease of c 2 is 13 per unit increase in x~ until x~ = 1 / 3 when u 3 b e c o m e s 0 (which is why u 3 would normally be chosen as leaving variable) and c 2 b e c o m e s 7 - 1 3 / 3 = 8 / 3 . Geometrically, the initial basic solution (with x 1. .-. . .x. .I. = x 2 - x 2 = 0) corresponds to the approximating line f ( t ) = 0 + Ot, i.e. the t axis (cf. Fig. 1). As x~ increases so x 2 = x ~ - x~ increases and the approximating line rotates about the origin in c o u n t e r clockwise direction. This continues until x~ = 1 / 3 at which point u 3 - v 3 = 0 and the line passes t h r o u g h the point p3(3, 1). As x~ continues to increase b e y o n d 1 / 3 so u 3 remains 0 and v 3 starts to increase (the approximating line now being above p3(3, 1)). Algebraically this means that v 3 should be in the basis not u 3 and this change may be effected as follows: (1) negative the u 3 row; (2) subtract twice the u 3 row from the objective row to re-express c 2 in terms of u 3 rather than v 3. (It may be helpful to consult Tableau 1 to see why this should be so); (3) the u 3 row is r e n a m e d the v 3 row. T h e effect of (2) is that the r e d u c e d cost corresponding to x~ is replaced by 13 - 2 × 3 = 7 (see c o m m e n t s column in Tableau 3). T h a t is, the rate of decrease of c 2 is now only 7 per unit increase in x~. Geometrically, this is a reflection of the fact that the approximating line is now moving away from P3(3, 1). W h e n x~ has r e a c h e d 1 / 2 the approximating line f ( t ) = 0 + 1 / 2 t passes t h r o u g h the point Pl(2, 1). F u r t h e r m o r e , increase in x~ requires u I to be replaced in the basis by v I with a c o n s e q u e n t c h a n g e in r e d u c e d cost of x~ to 7 - 2 x 2 = 3 (cf. c o m m e n t s in Tableau 3). Increasing x~ still further leads to c 2 being r e d u c e d until x~ = 3 / 5 . A t this point replacing u 4 in the basis by u 4 would lead to the r e d u c e d cost of x~ b e c o m i n g 3 - 2 × 5 = - 7; that is, an increase in x~ would lead to an increase in c 2. It is thus seen that a maximal reduction in c 2 is obtained by increasing x~ from 0 to 3 / 5 . R e t u r n i n g now to Tableau 3 we have establised that x~ should be the variable to enter the basis and u 4 the variable to leave. Pivoting leads to Tableau 4.

T.B. Boffey, W.A. Essah

614 Tableau 4

X~

U4

c2

X0

4/5

7/5

-13/5

U1 u2 u3 x~

--1/5 1/5 -4/5 3/5

3/5 2/5 2/5 1/5

--2/5 -3/5 -3/5 1/5

This solution is infeasible, but feasibility can be restored by: (1) negating the rows u 1 and u3; (2) changing the basic variables u 1 to v I and u 3 to v3; (3) adding twice these rows to the objective row to make the r e d u c e d costs of v~ and v 3 equal zero. Following these steps, the tableau becomes:

Tableau 5 X0

X~

U4

c2

6/5

-3/5

-3/5

X~p

3/5

U4

- 7/5

Ratio

v1 u2 U3 X~

1/5 1/5 4/5 3/5

- 3/5 2/5 -- 2/5 1/5

2/5 - 3/5 3/5 1/5

3/5 - 2/5 2/5 -- 1/5

- 2/5 3/5 -- 3/5 -- 1/5

1/3 2

T h e r e d u c e d costs of x~ and u 4 are both negative and from the previous results in ( a ) - ( d ) of Section 3, those of x'( and u 4 are 3 / 5 and - 7 / 5 respectively.: Therefore, to p r o c e e d it is necessary, explicitly or implicitly, to replace the x~ column by the x i' column and choose as the next incoming variable x'l'. T h e leaving variable will be chosen to be v 1 as shown in the above tableau. Pivoting leads to Tableau 6 which corresponds to the optimal solution.

Tableau 6 X0

U1

U4

c2

1.0

- 1.0

- 1.0

x~' u2 v3 x~

1/3 1/3 2/3 2/3

5/3 2/3 2/3 1/3

2/3 - 1/3 - 1/3 1/3

It is seen that two iterations have b e e n replaced by one together with application of Steps (1)-(3). However, the work corresponding to steps of the latter kind is an order of magnitude less than a full iteration in general. W e thus see how effectively performing m o r e than one ordinary simplex iteration by passing t h r o u g h variables in a single cll iteration requires little m o r e effort than a single simplex iteration, consequently leading to a great reduction in the computational cost. In summary, the two iterations required by the B R strategy correspond respectively to (1) rotation of the approximating line from the position of the t-axis through points p3(3, 1) and pl(2, 1) until point p4(5, 3) is encountered. (2) rotation about the point p4(5, 3) counterclockwise until point p1(2, 1) is encountered. The final approximating line is - 1 / 3 + 2 t / 3 with an 11 n o r m of 1.

Implementing a parallel constrained 11 approximationalgorithm

615

3. Parallel implementation The parallel implementation of this modified algorithm broadly follows that given in [4]. Blocks of contiguous rows are distributed to the slave processors. The master processor retains the objective row(s) and an appropriate (with respect to 'load-balancing') number of rows. A block column decomposition could have been used but is probably less convenient. 3.1 A n initial implementation In order to effect the pass-through pivot row selection strategy, it is sufficient to know the constant column and the pivot column elements. To facilitate this, the master processor keeps a copy of the full constant column and updates it. At each iteration the parts of the pivot column are sent to the master processor and assembled there. Testing for pass-through now takes place entirely within the master processor and, if the master is lightly loaded with rows, then this testing can take place while the slaves are performing their pivoting (see below). However, it is necessary to ascertain whether this is likely to lead to the master processor being a bottleneck if there are many pass-through rows as frequently occurs. To test for pass-through, the ratios (constantcolumn element)/(pivotcolumn element) are computed for each non-objective row. Suppose the pivot column is column q and the smallest ratio is selected at row p say. Then: > 0, if aoq - 2 × avq <_O,

row p is a pass-through row; row p is the pivot row.

This process is repeated until the pivot row has been found. When the identity of the pivot row has been determined, it will be broadcast for pivoting. The slave (or master) holding the pivot row broadcasts it (in updated form) and the next iteration begins. When pivoting has taken place, pass-through variables will be negative. These are removed by negating and renaming the row as in the example of Section 2.2. The sum of twice all pass-through rows is sent to the master for 'correction of objective'. In order for the above scheme to be very effective it is necessary for the master to have the pivot column as soon as possible (it already has the constant column). This can be achieved as follows: (1) Master updates the objective. (2) Master finds the incoming variable for the following iteration (call it q). (3) Master broadcasts q. (4) A slave which is busy pivoting will: (4a) suspend pivoting and accept q; (4b) update its p a r t of the next pivot column (column q) unless it has already been updated; (4c) send its part of the updated column q for assembly in master. (4d) continue with pivoting. The master processor can be doing its updating of rows while the slaves are carrying out Step 4. Step (4b) implies that, within each block of rows, pivoting should take place column by column. Considering these stages of the algorithm, it is seen that although beneficial in reducing the total time taken, the effect of pass-through rows is potentially to impose a bottleneck, in particular at the determination of the pivot row. This can be explained heuristically via an example: suppose there are m rows, n columns, p processors and that a is the proportion of

616

T.B. Boffey, W.A. Essah

pass-through rows. Also, assume the pass-through rows are spread evenly over the slaves. To find the pivot row, master searches through the ratios a m times and each time the dominant effort is m - 1 comparisons. [Here a simple search through all ratios is assumed with alternatives being considered in Section 3.2.] Hence the total effort is O(am2). Each slave has then to update a m / ( p - 1) pass-through rows and send their sum back to master to change the objective. At this time, we can see that both master and slaves are synchronised. Having finished that, slaves start updating the remaining elements which number (1 - a ) m / ( p - 1)n. Each update requires an addition and multiplication equivalent to, say, /3 comparisons, so that the total work on each slave is: (1 - a ) m n / 3

Total work

(P-O However this part is considered as an overlap time with finding next pivot column. Assume the latter forms a proportion y ( a ) of total work where 0 _< y ( a ) < 1, then work on master

otto 2

P = work on slave = (1 - a ) m n / 3 y ( a ) / ( p

- 1) "

That is:

[ P=

o (1 - a ) y ( a )

nfl

"

Hence, on the basis of this very rough calculation and practical timing experience, it seems that the master process is the bottleneck if there are more than a few pass-through rows. 3.2 Speeding up pivot row selection

In the early stages of the application of cll there are typically many pass-through rows; in fact the proportion of rows which are pass-through may rise as high as 50%. Consequently the pivot row selection routine forms a bottleneck for the calculation as a whole. How then can this deficiency be lessened? To pick out the minimum element of an unordered list it is necessary to run once through the whole list. To pick out the minimum and next minimum it is satisfactory to run through the whole list twice, though it is better to combine the two passes over the list into a single one. Using a sorting routine, such as heapsort or quicksort, will be quite effective for those iterations with a large number of pass-through rows. On the other hand, when there are few pass-through rows presorting the ratios is wasteful of effort with the sorting being a bottleneck. It has been found that, in general, it is better to use a sorting routine (heapsort in our case) throughout rather than not using one at all. However, it would clearly be better if the level of sorting were to be tailored to the effort required as measured by the number of pass-through rows. First, note that heapsort consists of two phases, namely (a) heap formation, and (b) element extraction. The top element of a heap is the best, in our case the smallest. During the extraction phase the set is ordered by successively removing the top element of the heap and reforming the remaining elements into a heap until the heap becomes empty (cf. for example [5]). Now, if there are, say, 25 pass-through rows at a particular iteration of cll then there is no point in continuing the extraction phase after 25 element have been extracted. The results given in Table 3 were obtained using such a restricted extraction phase. It is seen that moderate speedups are achieved. [Speedup here is defined as T i l T p, where Tp is the time in

Implementing a parallel constrained ll approximation algorithm

617

seconds to solve the problem with p processors.] Table 3 suggests that as n ~ oo with fixed m and p, then the speedup might tend towards p. The reason for this may be explained as follows: the more columns there are, the more time is spent by the slaves on pivoting and hence there is less scope for idle time whilst master is finding the next pivot row. The heap formation and element extraction phases of heapsort require O(m) and O(m log m) time respectively and the use of heaps with restricted extraction requires an amount of effort which varies from O ( m ) to O(m log m) as the number of pass-through rows varies from 1 to m/2. In complexity terms this is optimal. It might be possible to speed the procedure up a little on average as follows: (1) estimate the number of pass-through rows based on earlier iterations ( m / 2 could be used for the first iteration estimate); (2) on the basis of a small sample of ratios estimate a value/x above which ratios are unlikely to be needed; (3) discard ratios greater t h a n / z ; (4) use a heap with restricted extraction on the remaining set of ratios. Note that if/z turns out to be too small the algorithm is not invalidated (it merely means that not all possible pass-through rows are actually passed through at this iteration). Of the overheads (1)-(3), only (3) is at all time consuming and the extra effort required may be more than offset by the gain represented by (4). However, since any gain is likely to be small this approach was not pursued further.

4. Numerical results

In the following results, two important criteria have been tested on a number of different test problems which are also tested by the BR algorithm. The first criterion is the (considerable) saving made in the number of iterations compared to those carried out by the standard full simplex Simplex02 (cf. [4]); results are shown in Table 2. For the sake of consistency with the BR results given in [3], we have run cll and Simplex02 on those problems using double precision arithmetic as does BR. The second criterion is the speedup obtained for these test problems by using parallel implementation of cll and the results are shown in Table 3 and 4. The problems, mostly taken from [2], will now be described. Table 1 gives a summary of the problems which have been tested in terms of their sizes, the l I error obtained and the number of iterations carried out by cll. Note that, for examples 1, 3, 5 and 6, the function f(x) is approximated by the function pn(x) where pn(x) = V'n+lr~ a'J-1 for a given value of n. It is ~'j = 1 ~y~

Table 1 Numerical results for the test problems solve by cll Unconstrained

Constrained

Problem number

Problem size

11 error

Iterations number

Problem size

l1 error

Iterations number

1 2 3 4 5 6a 7 8

201x7 51x8 101 x 6 141x8 101 x 6 201 x 7

0.34705(-1) 0.81031(-3) 0.36746( - 2) 0.11127 (0) 1.14722 0.00033

19 19 17 19 15 16

215x7 102x8 103 x 6 282x8 104 x 6 229 x 7 13 x 5 6x2

0.11019 (0) 0.45296(-1) 0.85758( - 2) 0.16891 (0) 2.85128 0.00045 26.1475 1.125

14 15 12 34 8 20 3 2

T.B. Boffey, W.A. Essah

618

also necessary to point out that, due to the t r a n s p u t e r ' s m e m o r y size limitations, to enable running some of the largest test p r o b l e m s using Simplex02 in double precision, we m a d e minor changes for Simplex02 so that only half of the tableau is required. W h e r e this has been done is indicated in Table 2 by m e a n s of an asterisk.

Problem 1. Find the 11 error for the approximation of f ( x ) by

P 6 ( X ) where: f ( x ) = e - * sin(x) on x = 0(0.02)4 and subject to the constraint I aj I < 1 for j = 1, 2 . . . . . 7.

Problem 2. Find the l 1 e r r o r for the approximation of f(x) by F(x) where: f(x) = e - x 2 and 4 F ( x ) = Y'.a=lajxJ-l+ Et=lat+a(X-at )3 with knots a 1 = 0 . 1 , a 2 = 0.2, a 3 = 0 . 4 , a 4 = 0 . 7 on x = 0(0.02)1 and subject to the constraint F"(x) < O. T h e t e r m (x - a t) = 0 for x < a t. Problem 3. Find the l I error for the approximation of f ( x ) by ps(x) where: f(x) = sin(x) on x = - ~ ' / 2 ( ~ - / 1 0 0 ) r r / 2 and subject to the constraint that the approximation interpolate the derivative f ' ( x ) at the end points. Problem 4. A p p r o x i m a t e 1 f(x) = __e-X2~2 by F(x) = y~8=lai49i(x ) on x = - 3.5(0.05)3.5

6i(x)=~g(-~f--i+6),

i=1,2

. . . . . 8, O~x~l,

0.5X

/0.5X2 -- 1.5(X -- 1) 2,

g(x)=[Oo.5X2_l.5(x

1) 2 + 1 . 5 ( x

l
2
x3. and subject to the constraint F ( x ) < f ( x ) .

Problem 5. Find the best 11 approximation by Ps(X) on x = 0(0.01)1.0 to the function f(x) = min(e x, e 1/2) and subject to the constraint p s ( x ) = f ( x ) at x = 01 0.5 and 1. This example is given in [4].

Problem 6. Find the l 1 e r r o r for the approximation of f ( x ) by p,(x) where: f ( x ) = e* with n = 7, 14, 20, 25, subject to the constraint I aj [ < 1 for j = I, 2 , . . . , n and on: (1) x = - 1(0.01)1, (2) x = - 1(0.004)1. Problem 7. Find the la solution to the o v e r d e t e r m i n e d (8 × 5) system Ax = b given in [4], subject to the equality (3 × 5) constraints Cx = d and the less than (2 × 5) constraints Ex
Problem 8. Find the 11 approximation to f ( x ) = a + bx on the points (1, 2), (2, 2), (3, 3), (4, 4), (5, 3), subject to the constraints f(6) = 5 and f(1) < 3.

Implementing a parallel constrained 11 approximation algorithm

619

Table 2 Number of iterations using standard simplex and d l Problem number

Simplex02 iterations

1" 2* 2* 3" 3* 4* 5* 7

496 143 116 125 87 162 182 7

clI

Time (secs.) 259.0 5.81 18.5 17.2 12.8 43.5 25.6 0.05

iterations

Time (secs.)

Problem description

19 19 15 17 12 19 15 3

0.72 0.22 0.32 0.34 0.22 0.52 0.32 0.01

(unconstrained) (unconstrained (constrained) (unconstrained (constrained) (unconstrained (unconstrained (constrained)

5. Comments on results The results shown in Table 2 illustrate very clearly the benefit of the 'pass-through' mechanism: sequential cll is always very much faster than the standard simplex method implemented by Simplex02. Table 3 shows that speedups in the range 2-4 are achieved with 8 processors. We regard these speedups as reasonable, given the size of tableau involved and that pivoting was performed entirely on the slaves, thus implying a maximum speedup of 7. They are at least as good as to those reported for the corresponding I= problem, in [7]. We note that the examples given are typical of those encountered in one-dimensional fitting problems. Much larger 11 problems arise, with dense tableaux in, for example, image analysis; for such problems, larger speedups might be anticipated particularly if the number of columns n is large. Fitting a higher degree polynomial to the exponential function (problem 6) is not very satisfactory because of the rapid decrease in the coefficients of coefficients in the series expansion of e x. Consequently, a further problem was tested: Problem 9. Find the l 1 error for the approximation of f(x) by pn(x) w h e r e : f(x) = log x with n = 10, 20, 30, 40, 50, subject to the constraint I aj I < 1 for j = 1, 2 . . . . . n and on x = 1(0.01)5.

Table 3 Transputer timings (secs.) and the speedups achieved Example number size

Problem

1 1 1 3 3 4 4 6a 6a 6a 6a 6b 6b 6b 6b

201 x 7 215 x 7 201 x 14 103x6 101 x 12 141 x 8 282 x 8 201 x 7 201 x 14 201 x 20 201 x 25 501 x 7 501 x 14 501 x 20 501 × 25

T1 0.72 0.6 3.52 0.22 0.68 0.517 1.614 0.69 3.0 5.8 14.5 1.8 8.95 24.4 40.4

T4 0.38 0.33 1.46 0.13 0.31 0.264 0.77 0.375 1.26 2.3 5.5 0.98 3.67 9.3 14.95

T8 0.33 0.29 1.08 0.125 0.278 0.23 0.62 0.33 0.95 1.586 3.6 0.84 2.6 5.8 8.6

Iterations number

l1

(T 1 / Ts)

Speedup 2.18 2.07 3.27 1.76 2.46 2.25 2.6 2.1 3.16 3.66 4.0 2.14 3.4 4.2 4.58

19 14 60 12 25 19 34 16 51 75 158 17 62 132 184

0.347( - 1) 0.1102 7.28 ( - 9) 0.85 ( - 2 ) 3.35 ( - 9) 0.1113 1.689 0.334( - 3) 4.48 ( - 9) 4.37 ( - 9) 4.16 ( - 9) 8.1 ( - 4) 1.15 ( - 8) 1.138( - 8) 1.135( - 8)

error

620

T.B. Boffey, W.A. Essah

Table 4 Problem 9 results Problem size

T1

T4

T8

Speedup (T 1 / T8)

Iterations number

Ix error

401 401 401 401 401

2.27 15.8 44.0 66.2 60.8

1.1 5.9 15.3 22.6 21.0

0.9 3.9 9.17 12.7 11.7

2.5 4.0 4.9 5.24 5.0

22 106 213 248 183

2.40( 1.04( 5.18( 3.10( 2.43( -

× 10 x 20 × 30 × 40 x 50

3) 7) 8) 8) 8)

Numerical results are given in Table 4 where it is seen that as n increases the speedups get better and better.

References [1] I. Barrodale and F.D.K. Roberts, An improved algorithm for discrete l 1 linear approximation, Siam J. Numer. Anal. 10 (5) (1972) 839-848. [2] I. Barrodale and F.D.K. Roberts, An efficient algorithm for discrete l 1 linear approximation with linear constraints, Siam Z Numer. Anal. 15 (3) (1978) 603-611. [3] I. Barrodale and F.D.K. Roberts, Algorithm 552, solution of the constrained l 1 linear approximation problem IF4], Commun. ACM 6 (2) (1980) 231-235. [4] T.B. Boffey and W.A. Essah, A Standard Simplex LP solver, Working paper, Centre for Mathematical Software Research, University of Liverpool, 1990. [5] T.B. Boffey and K. Yates, An Introduction to Data Structures (Chartwell-Bratt, L. 1989). [6] W.A. Essah and L.M. Delves, The numerical solution of first kind integral equations, J. Computat. Appl. Math. 27 (1989) 363-387. [7] C. Phillips, Parallel minimax approximation of an overdetermined system of equations, Computer Laboratory, University of Newcastle, 1991.

Implementing a parallel constrained l1 approximation algorithm

Implementing a parallel constrained l1 approximation algorithm

Recommend Documents