Analytica Chimica Acta, 181 (1986) 107-116 Elsevier Science Publishers B.V., Amsterdam -Printed
AN IMPROVED SIMPLEX ALGORITHM BOUNDARY CONDITIONS
in The Netherlands
FOR DEALING WITH
M. R. CAVE British Geological (Great Britain)
Survey,
Nicker Hill, Keyworth,
Nottingham
NG12 5GG
(Received 4th October 1985)
SUMMARY Two existing and one new method for dealing with boundaries in simplex optimization are tested on 2-, 3-, and B-parameter test functions, each subject to five different boundary conditions. The performance of the new method is shown to be the most consistent over all conditions.
Simplex optimization has proved to be a useful tool in analytical chemistry as shown in a review of its earlier uses [l] and by more recent applications [2-61. Since the introduction of the basic simplex method by Spendley et al. [7] as a generalized optimization method, several modifications have been tested [8-131 to try to improve the speed and accuracy of the optimization. These modifications have concentrated on varying methods for calculating the new vertex for each successive simplex move from existing vertices, given their respective responses. The results of this work show that the choice of method is very important, although there is another area of practical importance which has as yet received little attention, namely, the effect of boundary conditions. The aim of the work described here is to test the two existing and one new method for dealing with boundary conditions, on 2-, 3- and 5-parameter test functions with a series of boundary conditions imposed on them. THEORY
In simplex optimization of a multiparametric system, there will be constraints on each parameter which are usually set by physical restraints on instrumentation. During the optimization, however, it is quite common that the simplex procedure will arrive at a combination of parameter settings in which one or more parameters has exceeded its boundary condition. Nelder and Mead [8] used a simple solution to this problem by assigning an artificially low response to the straying vertex, which under the variable size rules [8] will cause the simplex to contract and move back within the boundary conditions. Using this method can cause a problem if the optimum 0003-2670/86/$03.50
0 1986 Elsevier Science Publishers B.V.
108
setting of the parameter is close to or actually on the boundary condition. As the simplex tries to approach the optimum, it will be forced away by the boundary rule and then approach again from a different direction. This process may have to be repeated several times before the optimum is reached. As will be shown below, this can lead to an increase in the number of experiments required and loss of accuracy in defining the optimum. This limitation was recognised by Routh et al. [9] who devised an alternative mode of dealing with boundary conditions, which was primarily designed to be used with their super modified simplex method [9, lo] . This method is illustrated in Fig. 1 for a two-parameter example (i.e., the simplex is a triangle) with the vertices B, NW, and W representing the combinations with the best, next to worse, and worst responses, respectively. The normal simplex step would then be produced by reflecting the triangle B NW W about the line B NW to produce R which in this case would be in violation of the boundary condition. Routh et al. [9] suggested that the new vertex should be formed at R’ by correcting R back to a point just on the boundary condition, and then continuing with B NW R’ as the new simplex. The main advantage of this method over the former is the ability to explore the factor space up to and including the boundary condition, which should prove more effective when the optimum is close to or on the boundary. From the experimental results, which will be discussed in detail below, it was evident that it would be advantageous to combine in one method the ability to move back and approach the optimum from a different direction as in the first method, and the ability to search up to the boundary as in the second method. With reference to Fig. 2, the combined method only uses a corrected point R’ at the boundary to form the new simplex if the response at R’ is greater than at B. If this is not the case, then a negative contraction to form NC is used. A further modification was also made to
Boundary
Boundary
;/ W
W
R
/
Fig. 1. Explanation
of boundary rule 2. For symbols, see text.
Fig. 2. Explanation
of combined boundary rule.
E
109
remove an anomaly which would have existed when this boundary method was used with the modified simplex method [8]. Normally, when R is formed if the response at R is greater than at B, then an expansion in the direction of R to form E would be performed. This would be pointless in the case of R’ as it would serve only to place E outside the boundary condition. In the combined method, therefore, when a corrected point R’ is formed with response greater than B, a positive contraction PC is tried; if this produces a response greater than R’ a negative contraction NC is tried, the better of the two responses from NC and PC being used to form the new simplex. If the response at PC is less than that at R’, then R’ is used to form the new simplex. The three boundary methods described were then used to form three simplex algorithms based on the simplex method of Nelder and Mead [8] with the modifications suggested by Aberg and Gustavsson [ll],and given the initials MSM, MSMl and MSMB, respectively. The three test functions used to evaluate the performance of the three methods are as follows: function 1 = cos X1 + cos Xz
(1)
(optimum X1 = X2 = 0”; response = 2) function 2 = cos X1 + cos Xz + cos X3
(2)
(optimum x1 = x2 = X3 = 0”; response = 3) function 3 = cos X1 + cos Xz + cos X3 + cos X4 + cos X5
(3)
(optimum X1 = Xz = X3 = X4 = X5 = 0”; response = 5) Well-behaved functions were chosen, as the main aim was to study performance at different boundary conditions rather than the effect of function complexity on the simplex methods. Five sets of boundary conditions were chosen for the parameters in each function to try to simulate all the possible conditions occurring in practical situations: (I) -180” < Xn < 180”, the optimum being symmetrically placed between the two boundaries; (II) -60” < Xn < 180”, the optimum being slightly displaced towards one boundary; (III) -10” Q Xn < 180”, the optimum being positioned very close to one boundary; (IV) 0” < Xn < 180”, the optimum being positioned on the boundary; and (V) 20” < Xn < 180”, the true optimum being positioned outside the boundary but with a local optimum on one boundary. All of the parameters in each function (represented as Xn for the general case) were subject to the same boundary conditions. EXPERIMENTAL
The microcomputer system used was a Commodore 64 with a Commodore 1541 single disc drive unit and a Commodore MPS-801 dot matrix printer.
110
All programs were written in Commodore 64 BASIC; copies are available from the author on request. Each simplex method was evaluated by using a method similar to that described by Ryan et al. [lo] and Parker et al. [ 131. For each function and set of boundary conditions, six initial starting sizes for the simplex were chosen (2, 10, 20, 40, 60, and 80% of the factor space). For each starting size, 100 randomly-generated coordinates were produced within the particular boundary condition being used. These coordinates and the procedure described by Yarbro and Deming [14] were used to construct the initial simplexes for each function, step size and boundary condition. The same set of initial simplexes for each function was used with each simplex method (MSM, MSMl and MSMB). The simplex optimizations for all functions were terminated &hen the standard deviation on the responses for the current simplex fell below 0.001. An expansion factor of 2 and contraction factor of 0.5 were used for each simplex method. RESULTS
AND DISCUSSION
The results of each optimization are reported in Table 1 in the manner of Ryan et al. [lo], giving the average number of evaluations for each initial starting simplex, the relative standard deviation on the average number of evaluations for each initial starting simplex and the non-critical to critical nonconvergence ratio. (A non-critical non-convergence is defined as one for which the simplex converges to a false optimum 10 times the termination criterion distant from the true optimum; a critical non-convergence is defined as one for which the simplex converges to a false optimum more than 1000 times distant.) From these data, a performance value, similar to that used by Parker et al. [ 131, was calculated for each simplex and step size, which combined both speed and accuracy in one function. The performance function (P) is defined by P = -log (A),where A is the average number of evaluations. If critical or non-critical nonconvergences occur, then P is defined as P = -log (A (1ON) (lOOC)), where N is the number of non-critical non-convergences and C is the number of critical non-convergences. This is a slightly modified version of the performance function used by Parker et al. [13] , in that a negative rather than a positive logarithmic function is used. This then defines, more logically, the simplex method with the highest P value to be the best as opposed to the method with the lowest P value as suggested by Parker et al. By plotting the calculated P values against initial starting Fig. 3. Comparison of the performance of MSM (0) and MSMl (+) on the different functions for different boundary conditions. Functions: (A) a-parameter; (B) 3-parameter; (C) 5-parameter. Boundary conditions: (a) -180” to 180” ; (B) -80” to 180”; (C) -10” to 180” ;(d) 0” to 180” ;(e) 20” to 180“.
_* 2 + d
.*
e
111
+,t-t-b-
‘0
J
e
_._._.--f
112
simplex size, graphical representations of the performance trends for each simplex method on each function and boundary condition can be obtained, as shown in Figs. 3 and 4. The results for methods MSMl and MSM2 are plotted separately along with the corresponding data for MSM, the MSM data providing a reference point for comparison of methods. Initial comparison of methods MSM and MSMl (Table 1 and Fig. 3) confirmed that both speed and accuracy of determining the optimum were lost when MSM was used if the optimum condition lay close to or on the boundary conditions, and that MSMl gave very good performance when the boundary conditions coincided with a local or global optimum. There were some reservations, however, as to the overall performance of MSMl in that it tended to produce reduced performance compared to MSM (mainly because of increased inaccuracy in defining the optimum) at large initial sizes (60 and 80%) for all functions where the optimum was symmetrically placed and slightly displaced towards the boundaries, and for the 2- and 3parameter functions when the optimum was close to but not on the boundary. The reason for this can perhaps be explained by referring to Fig. 1. The operation of the boundary condition rule in MSMl when R’ is formed is, in effect, a positive contraction. If the initial starting simplex is large, covering most of the factor space, then the formation of R’ could well be a massive contraction which would lead to a premature collapse of the simplex procedure before it had arrived at the optimum, leading to a reduced performance. Similarly, if the optimum is close to a boundary, the simplex procedure would be continually “bumping” against the boundary contracting each time and again collapsing prematurely. The results of the combined method MSM2 compared with MSM are summarized in Fig. 4. The success of this method is shown by the fact that for all bar one case at initial starting step sizes greater than 40%, MSM2 is as good as or outperforms MSM. In the one case where this is not so (Fig. 4A, c), an examination of the relevant data in Table 1 shows the difference to be due to an increased number of non-critical non-convergences at initial step sizes of 60 and 80% (8 and 10, respectively, compared with 0 and 0) which is only marginally less accurate. For the 2- and 3-parameter functions, MSM2 behaves as a compromise method, producing marginally reduced performance over MSMl at boundary conditions coinciding with the optimum, and improved performance over MSMl at boundary conditions close to the optimum. For the 5-parameter function, less predictably, MSM2 produces the best performance of the three methods over all boundary conditions, even at boundary conditions where the optimum is not close to or on the boundary. The multidimensional geometry involved in this case minimizes the validity of interpretations of the relative merits of each method using Bdimensional models. Fig. 4. Comparison of the performance of MSM (0) and MSM2 (x) on the different functions for different boundary conditions. (A-C) and (a-e) as in Fig. 3.
113
114 TABLE
1
Comparison of simplex methods for the 2-, 3- and 5-parameter functions Boundary condns.
I
II
III
IV
V
Ynitial
I& (%)
MSMl
AV. no.b
RSD m,)
2 10 20 40 60 80
36 25 24 24 26 28
29 22 16 14 13 12
2 10 20 40 60 80
36 27 23 24 25 27
35 32 21 15 13 16
2 10 20 40 60 80
46 30 30 28 29 29
33 31 24 17 17 14
2 10 20 40 60 80
58 45 41 39 40 42
34 25 23 17 17 17
2 10 20 40 60 80
83 68 64 62 63 63
26 23 19 16 16 18
simplex
3-Parameters
2Barameters MSM NC/CC
l/O
o/o 010 110 l/O 010 010 010
OP 010 010 010 412 010 3/O 5/O
o/o 010 110 411 210 7/O
Av,, no.
RSD WJ)
38 25 24 24 26 28
30 22 16 14 13 12
36 27 24 24 25 33
34 37 36 15 16 29
29 22 20 19 19 17
21 19 21 25 25 22
25 18 16 14 13
19 15 17 17
3/O
12
11 13
1612 810 9/O 2/O 3/O
26 20 17 15 14 14
16 16 18 17 13 14
110
13/o
size. bAverage
NC/CC
Av,, no.
RSD (W
35 24 24 24 26 28
31 21 16 14 13 12
111 110 110 110 110 010
35 25 23 24 25 28
36 29 21 15 13 16
010
24 26 21 16 16 16
41/o 1110
71/o 8110 69/O
36 28 27 27 27 28
o/o 010 010 010 o/o 010
34 30 26 28 25 28
010 010 010 010 010 010
43 36 32 33 29 30
211
010 010 110 110 010 3/O 7/O 3/O
010 o/o 2/O 8110 56/O 80/O
number of evaluations.
-
MSM
MSM2 NC/@
RSD (W
NC/C=
81 48 42 39 41 41
44 30 27 15 14 12
15/S
74 51 42 39 38 42
48 41 30 22 14 15
3318 8P’
35 44 39 31 26 14
70/1t 3211: 24111 1lP
8/O 1010
95 74 64 55 50 45
25 26 22 18 23 16
010 010 010 010 010 010
114 85 77 78 71 64
32 31 24 27 2 15
821315 5213, 52/O j
30 35 28 23 28 16
010 010 010 010 o/o o/o
119 110 109 102 100 91
29 27 26 25 22 15
4/O
o/o 010 010 010
4/O
o/o
‘Non-critical
Av,, no.
to critical
non-
CONCLUSIONS
For the five different boundary conditions used it has been shown that major differences between the performance of the three simplex methods occur mainly when the optimum is close to or actually on the boundary. In practical situations it is unlikely that the true global optimum will be situated exactly on a boundary, although it may well be the case that it will be close to a boundary or that the boundary marks a local optimum. A case in which the physical restraint8 of instrumentation forced a local optimum at the boundary has been reported [ 151; in a simplex optimization of an inductively-coupled plasma spectrometric system the maximum gas flow rate through the nebulizer tube was proving optimal until a tube with higher flow capacity was used, thus allowing a true global optimum to be found.
4/O 110
010 010 010
1P
2P 1P’ OP’
310 1410
34Pl lS/O~ 22/o 1 9514 8818 t 7913,
7OP
74/o I 71101
-
115
5-&mmleters pdl
MSMP
MSM
A-v. IIO.b
RSD (%)
NC/C'= Av. no.b
RSD (46)
60 4S 4t se 4$ 4Z
47 30 21 15 14 11
1517 3/O l/O 010 O/O IlO
66 44 40 39 41 41
33 27 18 15 14 12
6
45 40 29 19 17 31
3719 1612 4/O l/O l/O l/O
66 45 41 40 38 42
x: 21)
34 30 21 17 17 17
9716 8512 8110 71/O 75/o 78/O
46 3 21 2 2E 22
37 26 17 14 17 15
43 a4 SQ 30 28 25
27 23 22 18 14 14
4 i fe 49 sb 4g 39
convergence
NC/CC
MSMl
Av.
RSD
NC/CC
no. b
(%)
413 2/O 2/O o/o 010 010
203 163 116 96 80 76
34 42 40 34 24 12
100/70 8216 46/O 29/O 17/o 2/O
45 36 26 23 13 15
1113 210 2/O 2/o 110 010
184 166 129 89 66 67
59 62 4s 44 42 44
28 29 21 17 16 13
7019 20/l 26/O 12/o 610 4/O
716 010 010 010 010 010
70 60 51 52 48 52
36 33 22 24 21 22
717 o/o o/o 010 010 o/o
87 85 79 82 66 63
48 39 33 28 33 24
MSMP
Av.
RSD
noP
(%Ko)
194 153 116 96 80 105
36 36 40 34 24 32
100/70 193 82111 106 46/O 84 29/O 94 18/O 81 29/O 76
33 32 28 35 25 12
3819 1110 8/O 18/O 13/o 210
44 39 43 32 16 12
100/69 179 87119 148 6412 124 26/O 86 6/O 68 110 85
42 40 40 27 18 23
lOO/SS 187 86116 114 6212 90 32/O 83 910 66 33/O 67
42 47 36 31 14 12
60118 16/l 1010 18/O 110 IlO
204 192 178 144 121 83
38 33 31 36 28 18
100/85 170 99154 134 98123 122 86/O 79 64/O 78 27/O 66
43 38 49 23 26 17
100163 169 97121 137 86/l 113 60/O 89 40/o 78 15/o 78
32 43 32 22 14 13
91130 6714 57/o 36/O 610 210
913 110 110 2/O 010 010
215 216 189 174 148 121
32 33 32 31 18 16
100/90 180 100166 148 100129 118 97/l 83 93/O 87 66/O 76
48 40 48 38 26 22
73160 49120 2712 20/o 41/o 27/O
201 182 148 119 97 87
35 39 35 39 19 18
41117 2514 12/o 6/O IlO 010
1919 14/O 7/O 810 5/O 010
216 211 209 199 183 168
29 29 29 25 20 18
100193 100/79 100/52 100116 100/O 100/o
45 46 39 36 29 30
65143 51115 52/12 65/O 72/O 92/O
260 228 232 206 182 157
38 34 33 29 22 17
67123 74121 70/l 6417 52/o 17/o
204 171 146 140 127 100
NC/Cc
Av.
RSD
no.b
(~6)
NC/CC
ratio.
The absolute data as reported in Table 1 are relevant only to the particular test functions used; the trends in performance, however, (Figs. 3 and 4) should provide a degree of guidance for practical situations. The MSM boundary method has been shown to work well as long as the optimum is placed away from boundary conditions. The very nature of an optimization experiment, however, lies in not knowing where the optimum is at the outset and, as already mentioned, physical restraints on instrumentation do not allow optimal settings and boundaries to be set at will. The combination of the two methods MSM and MSMl produces an algorithm which is far less sensitive to the positioning of boundaries and perhaps should be universally adopted as the method of choice in simplex optimization work.
116
This paper is published with the approval of the Director, British Geological Survey (N.E.R.C.). REFERENCES 1 S. N. Deming and L. R. Parker, Jr,, CRC Crit. Rev. Anal. Chem., 7 (1978) 187. 2 C. F. Lam, A. For& and H. Bank, Appl. Spectrosc., 33 (1979) 273. 3L. Ebdon, M. R. Cave and D. J. Mowthorpe, Anal. Chim. Acta, 116 (1980) 179. 4 J. C. Berridge, Analyst (London), 109 (1984) 291. 5 D. D. Burgess and P. Hayumbu, Anal. Chem., 66 (1984) 1440. 6 S. C. Rutan and S. D. Brown, Anal. Chim. Acta, 167 (1985) 39. 7 W. Spendley, G. B. Hext and F. R. Himsworth, Technometrics, 4 (1962) 441. 8 J. A. Nelder and R. Mead, Comput. J., 7 (1964) 308. 9 M. W. Routh, P.-A. Swartz and M. B. Denton, Anal. Chem., 49 (1977) 1422. 1OP. B. Ryan, R. L. Barr and H. D. Todd, Anal. Chem., 52 (1980) 1460. 11 E. R. Aberg and A. G. T. Gustavason, Anal. Chim. Acta, 144 (1982) 39. 12 A. Gustavsson and J.-E. Sundkvist, Anal. Chim. Acta, 167 (1985) 1. 13 L. R. Parker, Jr., M. R. Cave and R. M. Barnes, Anal. Chim. Acta, 175 (1985) 231. 14 L. A. Yarbro and S. N. Deming, Anal. Chim. Acta, 73 (1974) 391. 15G. L. Moore, P. J. Humphries-Cuff and A. E. Watson, Spectrochim. Acta, Part B: 39 (1984) 915.