Copyright @ IFAC Computation in Economics, Finance and Engineering: Economic Systems, Cambridge, UK, 1998

AN APPROACH TO CONTINUOUS MINIMAX: THE BASIC ALGORITHM

Melendres Howe
Portfolio Strategies & Research, Nomura International PLC, London EC1A 4NP, UK

Berç Rustem
Department of Computing, Imperial College of Science, Technology & Medicine, 180 Queen's Gate, London SW7 2BZ, UK

Abstract: Uncertainty in forecasting can be addressed by considering optimal decisions robust to all possible realisations. The requirement for point forecasts can thereby be relaxed as only forecast ranges are needed. In the present paper, we consider an algorithm for computing the best decision in view of the worst-case forecast scenario. This minimax decision provides a guaranteed basic performance that is noninferior to any outcome within the range. We discuss an algorithm for the continuous minimax, worst-case design, problem. The algorithm extends the steepest descent approach of Panin (1981) and the convex combination rule for subgradients of Kiwiel (1987) to quasi-Newton search directions, conditional on an approximate maximizer. Two alternative directions are considered. The first is relatively easy to compute. It is based on an augmented maximization to ensure that the multiplicity of maximizers does not result in an inferior search direction. The second involves a quadratic subproblem to determine the minimum norm subgradient. The descent property of the chosen direction is established. A step is taken using an Armijo-type stepsize strategy consistent with the direction. The convergence of the algorithm is discussed in Rustem and Howe (1998). Copyright @ 1998 IFAC

Keywords: Risk; Minimax techniques; Optimization; Algorithms

1. INTRODUCTION

Consider the problem

$$
\min_{x \in \mathbb{R}^n} \; \max_{y \in \mathcal{Y}} \; f(x, y) \qquad (1)
$$

where $\mathcal{Y} \subset \mathbb{R}^m$ and $f\colon \mathbb{R}^n \times \mathcal{Y} \to \mathbb{R}^1$. Let

$$
\Phi(x) = \max_{y \in \mathcal{Y}} f(x, y) \qquad (2)
$$

for all $x \in \mathbb{R}^n$. We call $\Phi(x)$ the max-function. Hence, (1) can be written as

$$
\min_{x \in \mathbb{R}^n} \Phi(x). \qquad (3)
$$

Assumptions

1. $\mathcal{Y} \subset \mathbb{R}^m$ is a convex and compact infinite set;
2. $f(x, y)$ is continuous in $x$ and $y$, and twice continuously differentiable in $x$;
3. in the neighbourhood of the solution $x_*$ of (3), there exists a scalar $b > 0$ such that, for all $x \in \{\, x \in \mathbb{R}^n \mid \|x - x_*\| < b \,\}$, the Hessian with respect to $x$ of $f(x, y)$ is positive definite for all $y \in \mathcal{Y}$.
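For illustration, the following sketch (not part of the paper; the helper name, the box-shaped $\mathcal{Y}$ and the multistart heuristic are assumptions) evaluates the max-function $\Phi(x)$ of (2) numerically by maximizing $f(x, \cdot)$ over a box from several starting points.

```python
# A minimal sketch, assuming Y is a box [lo, hi]^m and that a local maximizer
# started from several random points is an acceptable approximation of Phi(x).
import numpy as np
from scipy.optimize import minimize

def max_function(f, x, y_bounds, n_starts=5, seed=0):
    """Approximate Phi(x) = max_{y in Y} f(x, y) and return (value, maximizer)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(y_bounds, dtype=float).T    # y_bounds: [(lo, hi), ...] per coordinate
    best_val, best_y = -np.inf, None
    for _ in range(n_starts):                     # multistart: f(x, .) may be multimodal in y
        y0 = lo + rng.random(lo.size) * (hi - lo)
        res = minimize(lambda y: -f(x, y), y0, bounds=list(zip(lo, hi)))
        if -res.fun > best_val:
            best_val, best_y = -res.fun, res.x
    return best_val, best_y

# Example: f(x, y) = -(x - y)^2 + 0.1*y with Y = [-1, 1]
f = lambda x, y: float(-(x[0] - y[0])**2 + 0.1 * y[0])
print(max_function(f, np.array([0.3]), [(-1.0, 1.0)]))
```

Because only local maximizers are guaranteed by such a routine, this corresponds to the caveat, discussed below, that in practice the inner maximization is terminated once a sufficiently good maximum is attained.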

We discuss a quasi-Newton algorithm to solve (3). The salient features of the algorithm are the generation of a descent direction based on a subgradient of $f(x, \cdot)$, an approximate Hessian for possible multiple maximizers of (2), and a stepsize strategy that ensures sufficient decrease in $\Phi(x)$ at each iteration.

If $\mathcal{Y}$ is a finite set, (3) becomes the discrete minimax problem

$$
\min_{x \in \mathbb{R}^n} \; \max_{j \in \{1, \ldots, J\}} \{\, f^{(j)}(x) \,\}. \qquad (4)
$$

Algorithms for solving (4) have been proposed by a number of authors (e.g. Charalambous and Bandler, 1976; Coleman, 1978; Conn and Li, 1992; Demyanov and Malozemov, 1974; Di Pillo and Grippo, 1993; Dutta and Vidyasagar, 1977; Hald and Madsen, 1981; Han, 1981; Murray and Overton, 1980; Polak, Higgins and Mayne, 1992; Rustem, 1992; Womersley and Fletcher, 1986). Most of these involve the transformation of (4) into the nonlinear programming problem

$$
\min_{x, z \in \mathbb{R}^{n+1}} \{\, z \mid f^{(j)}(x) \le z; \; j = 1, \ldots, J \,\}. \qquad (5)
$$

In the continuous case for $\mathcal{Y}$, the formulation corresponding to (5) is the semi-infinite optimization problem

$$
\min_{x, z \in \mathbb{R}^{n+1}} \{\, z \mid f(x, y) \le z, \;\; \forall\, y \in \mathcal{Y} \,\} \qquad (6)
$$

with an infinite number of constraints corresponding to the elements in $\mathcal{Y}$.

Problem (3) poses several difficulties. First, $\Phi(x)$ is in general continuous but may have kinks. Therefore, it may not be straightforwardly differentiable. The presence of kinks makes the optimization problem difficult to solve. At a kink, the maximizer is not unique and the choice of subgradient to generate a search direction is not simple compared to smooth functions. Furthermore, the Hessian of the Lagrangian of (6) is viewed in the context of multiple maximizers. The existence of the Hessian of the max-function for finite $\mathcal{Y}$ (i.e. the discrete minimax case) is discussed by Wierzbicki (1982). The approach can be used to establish that the Hessian of (6) represents the Hessian of the max-function at the solution. The results below require the existence of the Hessian of (6) and the algorithm utilizes an approximation to this. Second, $\Phi(x)$ may not be computed accurately as it would require infinitely many iterations of an algorithm to maximize $f(x, y)$ with respect to $y \in \mathcal{Y}$. In practice, the maximization algorithm is terminated when a sufficiently good maximum is attained. Third, (3) requires a global maximum, in view of possible multiple maximizers, such as corner solutions (see Remark 1a). The use of non-global maxima cannot guarantee a monotonic decrease in $\Phi(x)$.

A number of algorithms have been proposed for (1). The method of centres by Chaney (1982) requires that the maximum in (1) be attained at a unique point $y(x)$ for all $x$. The algorithm requires that for all maximizations involving $x_k$, there is a globally convergent procedure generating a sequence $\{y_j\}$ convergent to a unique $y(x_k)$. Hence, Chaney's (1982) method does not handle kinks in $\Phi(x)$. In this case, $\Phi(x)$ is differentiable and can be solved using smooth optimization techniques. Klessig and Polak (1973) consider a first order, feasible directions, method that, like Chaney (1982), requires the maximizer to be unique. Demyanov and Malozemov (1974) solve an infinite sequence of discrete minimax problems of the form

$$
\max_{y \in \mathcal{Y}^{\ell}} f(x, y), \qquad \ell = 1, 2, \ldots,
$$

where $\mathcal{Y}^{\ell}$ are finite subsets of $\mathcal{Y}$. Panin (1981) uses an approximation to $\Phi(x)$ at $x_k$, given by

$$
\hat\Phi_k(d) = \max_{y \in \mathcal{Y}} \hat f_k(d, y), \qquad
\hat f_k(d, y) = f(x_k, y) + \langle \nabla_x f(x_k, y), d \rangle .
$$

The method is based on the assumption that for any $x \in \mathbb{R}^n$, one can determine

$$
\hat d = \arg\min_{d \in \mathbb{R}^n} \Big\{ \hat\Phi_k(d) + \tfrac{1}{2}\|d\|^2 \Big\} \qquad (7)
$$

where $\|\cdot\| = \langle \cdot, \cdot \rangle^{1/2}$. No procedure for solving (7) is given by Panin. The method of Kiwiel (1987) is based on Panin (1981). At the $k$th iteration of Kiwiel's (1987) method, the change in the objective, $\Phi(x_k + d) - \Phi(x_k)$, is approximated by $\hat\Phi_k(d) - \Phi(x_k)$. At $x$, a descent direction for $\Phi(x)$ could be found by solving (7) using an auxiliary algorithm. Since the objective in (7) is strongly convex in $d$, $\hat d$ exists and is uniquely determined by

$$
0 \in \hat d + \mathrm{conv}\{\, \nabla_x \hat f(x_k, y); \; y \in \hat{\mathcal{Y}}_{k+1} \,\}, \qquad
\hat{\mathcal{Y}}_{k+1} = \{\, y \in \mathcal{Y} \mid y = \arg\max_{y} \hat f_k(\hat d_k, y) \,\}
$$

where $\mathrm{conv}\{\cdot\}$ denotes the convex hull of $\{\cdot\}$. Kiwiel's method finds, at each $x$, a linear combination of $\nabla_x f(x, y_i)$, $y_i \in \hat{\mathcal{Y}}_{k+1}$. We discuss an algorithm that defines a direction of potential progress for $\Phi(x)$ based on the maximizer at $x$ corresponding to the minimum norm subgradient of $\nabla_x f(x, \cdot)$, and a Hessian approximation. It is shown that this is a descent direction except at the solution of (1). An Armijo-type stepsize strategy, consistent with the search direction, is used to determine the stepsize. The algorithm is globally convergent, attains unit steps and Q-superlinear convergence (Rustem and Howe, 1998).

The algorithm extends the first order approach of Panin (1981) and Kiwiel (1987) to quasi-Newton descent directions, conditional on the maximizer, and attempts to deal with the problem of multiple maximizers. Two alternative directions are considered. The first involves an augmented maximization subproblem to ensure that any multiplicity of maximizers does not result in an inferior search direction. Following Lemma 1 below, whenever possible, the maximizer with the minimum-norm subgradient is chosen. The second direction involves more computation. This is a quasi-Newton descent direction based on the combination of the gradients corresponding to the multiple maximizers. The evaluation of the combination entails the solution of a quadratic programming problem. The descent property is established in Lemma 2 below. In the discrete min-max case, the algorithm based on the second direction is equivalent to the discrete min-max algorithm in Rustem (1992).

The choice of Hessian, discussed below, is consistent with the convex duality theory in Wierzbicki (1982; Lemma 1), for the minimum-norm subgradient in the subdifferential

$$
\partial\Phi(x) = \mathrm{conv}\{\, \nabla_x f(x, y) \mid y = \arg\max_{y \in \mathcal{Y}} f(x, y) \,\} = \mathrm{conv}\{\, \nabla_x f(x, y) \mid f(x, y) = \Phi(x) \,\}
$$

of the max-function. By Wierzbicki (1982), this is equivalent to the problem of minimizing the quadratic approximation to the Lagrangian of (6). The equivalence holds for $y = \arg\max f(x, y)$, or $f(x, y) = \Phi(x)$, ensured for sufficiently large $e$ in (8)-(9). Hence we can express the above subdifferential equivalently as

$$
\partial\Phi(x_k) = \mathrm{conv}\{\, \nabla_x f(x_k, y) \mid y \in \mathcal{Y}(x_k) \,\} .
$$

We introduce useful basic concepts in Section 2 and present the algorithm and its basic descent property in Section 3. The monotonic decrease of the sequence $\{\Phi(x_k)\}$, convergence to unit stepsizes, global and local convergence results and numerical experiments are discussed in Rustem and Howe (1998).
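To make the direction-finding subproblem (7) concrete, the sketch below is an illustration, not Panin's or Kiwiel's procedure: it assumes the maximizers have already been sampled to a finite set of linearizations and uses an off-the-shelf SLSQP solver on the epigraph form.

```python
# A sketch: given linearizations f_i + <g_i, d> of f(x_k, .) at a finite set of
# candidate maximizers (a discretization assumed here), minimize over d the max
# of the linearizations plus (1/2)||d||^2 via the epigraph form
#   min_{d,z} z + 0.5*||d||^2   s.t.   f_i + <g_i, d> <= z  for all i.
import numpy as np
from scipy.optimize import minimize

def panin_direction(f_vals, grads):
    f_vals, grads = np.asarray(f_vals, float), np.asarray(grads, float)  # (J,), (J, n)
    n = grads.shape[1]
    obj = lambda v: v[n] + 0.5 * np.dot(v[:n], v[:n])                    # v = (d, z)
    cons = [{"type": "ineq",
             "fun": lambda v, i=i: v[n] - f_vals[i] - grads[i] @ v[:n]}
            for i in range(len(f_vals))]
    res = minimize(obj, np.zeros(n + 1), method="SLSQP", constraints=cons)
    return res.x[:n]                                                     # the direction d-hat

# Two linearizations in R^2 with equal function values (a "kink"):
print(panin_direction([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]))            # ~ [-0.5, -0.5]
```

In the example, the solution is the negative of an equally weighted convex combination of the two gradients, consistent with the convex-hull characterisation of $\hat d$ given above.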

2. BASIC CONCEPTS AND DEFINITIONS

At any $x$, we define the set of maximizers by

$$
\mathcal{Y}(x) \equiv \{\, y \mid y = \arg\max_{y \in \mathcal{Y}} f(x, y) \,\}
$$

(Demyanov and Malozemov, 1974).

Assumption 4: At a point $x$, all members of the set $\mathcal{Y}(x)$ are computable.

We define the directional derivative of $\Phi$ along the direction $d \in \mathbb{R}^n$, at $x_k$, as

$$
\frac{\partial \Phi(x_k)}{\partial d} = \max_{y \in \mathcal{Y}(x_k)} \langle \nabla_x f(x_k, y), d \rangle
$$

(Demyanov and Malozemov, 1974; Theorem 3.1).

Let $x_*$ be the solution of (1). At $x_*$ the following variational inequality is satisfied for all $d \in \mathbb{R}^n$ as the necessary condition for an extremum (nec):

$$
0 \le \frac{\partial \Phi(x_*)}{\partial d} = \max_{y \in \mathcal{Y}(x_*)} \langle \nabla_x f(x_*, y), d \rangle
$$

(Demyanov and Malozemov, 1974).

Assumption 5: For the Hessian approximation $H_k$, $k = 1, 2, \ldots$, there exist numbers $\zeta, \xi > 0$ such that

$$
\zeta \|x\|^2 \le \langle x, H_k x \rangle \le \xi \|x\|^2, \qquad \forall\, x \in \mathbb{R}^n .
$$

The proposed algorithm generates quasi-Newton directions that ensure descent. To this end, let $f_k(d, y)$ denote an augmented quadratic approximation to $f(x, y)$:

$$
f_k(d, y) = f(x_k, y) + \langle \nabla_x f(x_k, y), d \rangle + \tfrac{1}{2}\|d\|_{H_k}^2 - e\,\big[\Phi(x_k) - f(x_k, y)\big]^2 \qquad (8)
$$

where $\|d\|_{H_k}^2 = \langle d, H_k d \rangle$, $H_k$ is a positive definite approximation to the Hessian with respect to $x$ of the Lagrangian of (6) at the $k$th iteration, and $e \ge 0$ is a penalty parameter for deviations from the maximizer at $x_k$. The max-function corresponding to this approximation is

$$
\Phi_k(d) = \max_{y \in \mathcal{Y}} f_k(d, y). \qquad (9)
$$

Let $\hat d$ be the direction that minimizes this approximation to the max-function, i.e.

$$
\hat d = \arg\min_{d \in \mathbb{R}^n} \Phi_k(d).
$$

It is shown in Lemma 1 that the set of $y \in \mathcal{Y}$ that solve (9), (10) is a subset of $\mathcal{Y}(x_k)$. We define the maximizer set of $f_k(d, y)$ as

$$
\mathcal{Y}_{k+1} \equiv \{\, y_{k+1} \in \mathcal{Y}(x_k) \mid y_{k+1} = \arg\max_{y} f_k(\hat d, y) \,\}.
$$

To evaluate $\hat d$, we note that (9) is minimized by $d(y) = -H_k^{-1} \nabla_x f(x_k, y)$, for $y \in \mathcal{Y}$. Using this $d$ in (9) determines the maximizer $y_{k+1}$ given by

$$
y_{k+1} = \arg\max_{y \in \mathcal{Y}} \Big\{ f(x_k, y) - \tfrac{1}{2}\big\|\nabla_x f(x_k, y)\big\|_{H_k^{-1}}^2 - e\,\big[\Phi(x_k) - f(x_k, y)\big]^2 \Big\}. \qquad (10)
$$

The resulting direction

$$
\hat d = -H_k^{-1} \nabla_x f(x_k, y_{k+1}) \qquad (11)
$$

is a descent direction for $\langle \hat d, \nabla_x f(x_k, y_{k+1}) \rangle$ and, if $\hat d$ is a descent direction for all the other maximizers, it is a descent direction for the max-function. If $y_{k+1}$ is nonunique ($|\mathcal{Y}_{k+1}| > 1$), an arbitrary element of $\mathcal{Y}_{k+1}$ is chosen. Furthermore, we note that

$$
\bar y = \arg\max_{y \in \mathcal{Y}(x_k)} \langle \nabla_x f(x_k, y), \hat d \rangle. \qquad (12)
$$

To solve (3), the quasi-Newton algorithm constructs the sequence

$$
x_{k+1} = x_k + \alpha_k d_k \qquad (13)
$$

where $\alpha_k$ is calculated according to a rule discussed below while $d_k$ is given by

$$
d_k = \begin{cases} \hat d & \text{if } \langle \nabla_x f(x_k, \bar y), \hat d \rangle \le -\varepsilon \\ -H_k^{-1} g_k & \text{otherwise} \end{cases} \qquad (14)
$$

and $g_k$ is a convex combination of $\nabla_x f(x_k, y)$, $y \in \mathcal{Y}(x_k)$, given by (15)-(16) below.

For nonunique $y \in \mathcal{Y}(x_k)$, by Caratheodory's theorem (e.g. Rockafellar, 1972), a vector $g_k \in \partial\Phi(x_k)$ can be characterised by at most $(n+1)$ vectors $\nabla_x f(x_k, y) \in \partial\Phi(x_k)$ such that

$$
g_k = \sum_{y \in \mathcal{Y}(x_k)} \lambda_{k+1}^y \nabla_x f(x_k, y); \qquad \lambda_{k+1}^y \ge 0; \qquad \sum_{y \in \mathcal{Y}(x_k)} \lambda_{k+1}^y = 1. \qquad (15)
$$

As in Wierzbicki (1982; Lemmas 1 and 3), $\lambda_{k+1}$ is chosen to ensure that $g_k$ is the minimum-norm subgradient in $\partial\Phi(x_k)$ and hence

$$
\lambda_{k+1} = \arg\min_{\lambda} \bigg\{ \Big\| \sum_{y \in \mathcal{Y}(x_k)} \lambda^y \nabla_x f(x_k, y) \Big\|_{H_k^{-1}}^2 \;\Big|\; \lambda^y \ge 0; \; \sum_{y \in \mathcal{Y}(x_k)} \lambda^y = 1 \bigg\}. \qquad (16)
$$

As all $y \in \mathcal{Y}(x_k)$ correspond to the same function value, $g_k$ and the Hessian using $\lambda_{k+1}$ are consistent. The solution to the minimum-norm problem is unique when $\nabla_x f(x_k, y)$, $y \in \mathcal{Y}(x_k)$, are linearly independent. Otherwise, a minimum length $\lambda_{k+1}$ is determined.

The Lagrangian of (6) is considered given the maximizers at $x_k$:

$$
L(x, v, \lambda) = v + \sum_{y \in \mathcal{Y}(x)} \big( f(x, y) - v \big)\, \lambda^y .
$$

Thus, at $x_k$, the Hessian $\nabla_x^2 L$ is given by

$$
\nabla_x^2 L = \sum_{y \in \mathcal{Y}(x_k)} \nabla_x^2 f(x_k, y)\, \lambda_{k+1}^y
$$

where $\lambda_{k+1}$ is the solution of (16). We also define

$$
\Psi_k = \begin{cases} \displaystyle \min_{d \in \mathbb{R}^n} \Phi_k(d) - \Phi(x_k) & \text{if } \langle \nabla_x f(x_k, \bar y), \hat d \rangle \le -\varepsilon \\ -\tfrac{1}{2}\|g_k\|_{H_k^{-1}}^2 & \text{otherwise.} \end{cases}
$$
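The direction-finding machinery of this section can be summarised in a short sketch. The routine below is illustrative only: it assumes $\mathcal{Y}(x_k)$ has already been reduced to a finite list of function values and gradients, and the use of SLSQP for the small QP in (16) is an assumption, not the paper's prescription.

```python
# A sketch of (10)-(11), the minimum-norm combination (15)-(16), and the
# selection rule (14), for a finite list of maximizer candidates.
import numpy as np
from scipy.optimize import minimize

def find_direction(f_vals, grads, H, phi, e=1e6, eps=1e-6):
    f_vals, grads = np.asarray(f_vals, float), np.asarray(grads, float)
    Hinv = np.linalg.inv(H)
    # (10): pick the maximizer of the augmented quadratic approximation
    score = f_vals - 0.5 * np.einsum("ij,jk,ik->i", grads, Hinv, grads) \
            - e * (phi - f_vals) ** 2
    g_best = grads[int(np.argmax(score))]
    d_hat = -Hinv @ g_best                                   # (11)
    # (15)-(16): minimum-norm convex combination of the gradients
    J = len(f_vals)
    qp = lambda lam: (grads.T @ lam) @ Hinv @ (grads.T @ lam)
    res = minimize(qp, np.full(J, 1.0 / J), method="SLSQP",
                   bounds=[(0.0, 1.0)] * J,
                   constraints=[{"type": "eq", "fun": lambda lam: lam.sum() - 1.0}])
    g_k = grads.T @ res.x
    # (14): use d-hat only if it is a descent direction for every maximizer
    if np.max(grads @ d_hat) <= -eps:
        return d_hat
    return -Hinv @ g_k

# Two maximizers with opposing gradients at a kink: d-hat fails the test, so
# the minimum-norm combination is used (the zero vector signals stationarity).
print(find_direction([1.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]], np.eye(2), 1.0))
```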

3. THE QUASI-NEWTON ALGORITHM

The algorithm utilizes the direction $\hat d$ whenever possible, as the evaluation of $g_k$ entails the solution of a quadratic programming problem. If $\hat d$ is a descent direction, as discussed above, it is used to determine $d_k$ in Step 2, and subsequently $\Psi_k$ in Step 3. The algorithm then utilizes a stepsize strategy to determine the progress along $d_k$ and updates the Hessian approximation $H_k$. The default parameter values are those used in Rustem and Howe (1998).

The Algorithm

Step 0: Given $x_0$, $y_0$, $H_0$, set $k = 0$; final accuracy parameter $\varepsilon \ge 0$ ($\varepsilon = 10^{-6}$); line search parameter $c \in (0, 1)$ ($c = 10^{-4}$); stepsize factor $u \in (0, 1)$ ($u = 0.5$); penalty coefficient $e \in [0, \infty)$ ($e = 10^6$).

Step 1: Maximization at $x_k$: compute the global solution of the nonlinear program

$$
\Phi(x_k) = \max_{y \in \mathcal{Y}} \{\, f(x_k, y) \,\}. \qquad (17)
$$

Step 2: Direction-finding subproblem: (a) compute $y_{k+1}$ given by (10); if $y_{k+1}$ is nonunique (i.e. $|\mathcal{Y}_{k+1}| > 1$), choose an arbitrary element of $\mathcal{Y}_{k+1}$. (b) Compute $\hat d$ and $\bar y$ given by (11) and (12) respectively. If $\langle \nabla_x f(x_k, \bar y), \hat d \rangle \le -\varepsilon$ go to Step 3; else compute $g_k$ given by (15)-(16). Set $d_k$ given by (14). Stop if $\|d_k\| \le \varepsilon$.

Step 3: Line search: if $\langle \nabla_x f(x_k, \bar y), \hat d \rangle \le -\varepsilon$, compute

$$
\Psi_k = f(x_k, y_{k+1}) + \langle \nabla_x f(x_k, y_{k+1}), d_k \rangle + \tfrac{1}{2}\|d_k\|_{H_k}^2 - e\,\big[\Phi(x_k) - f(x_k, y_{k+1})\big]^2 - \Phi(x_k);
$$

otherwise compute $\Psi_k = -\tfrac{1}{2}\|g_k\|_{H_k^{-1}}^2$. Determine the stepsize

$$
\alpha_k = \max\big\{ \alpha \mid \Phi(x_k + \alpha\, d_k) - \Phi(x_k) \le c\, \alpha\, \Psi_k; \; \alpha = (u)^i, \; i = 0, 1, 2, \ldots \big\}.
$$

Set $x_{k+1}$ using (13), set $k = k + 1$, update the Hessian approximation, and go to Step 1.

The direction $\hat d = \arg\min_{d \in \mathbb{R}^n} \Phi_k(d)$ corresponds to the selection of $d_k$ based on the maximizer that yields the least steep gradient, which is straightforward to compute. On the other hand, $d_k$ based on (15)-(16) entails the solution of a quadratic programming problem. The former is ensured to be an overall descent direction if it is also a descent direction for the other maximizers. The latter is always ensured to be a descent direction. The restriction which ensures the equivalence of $d_k$ computed either way is discussed in Rustem and Howe (1998). The condition

$$
\max_{y \in \mathcal{Y}(x_k)} \langle \nabla_x f(x_k, y), d_k \rangle \le -\varepsilon \qquad (18)
$$

ensures that $d_k$ is a descent direction for the max-function. This is established in Lemma 2.

It is possible for a maximizing algorithm to terminate at a solution of (10) satisfying

$$
0 \le \Phi(x_k) - f(x_k, y_{k+1}) \le \epsilon(e)
$$

for a small number $\epsilon(e) \ge 0$. In Lemma 2, it is shown that $d_k$ is a descent direction even when Assumption 6 below is not satisfied. Nevertheless, the satisfaction of this equality is required by subsequent results, and is enforced in the algorithm by an appropriate choice of $e$.
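A compact sketch of Steps 1 to 3 follows. It is a simplified illustration, not the authors' implementation: it assumes a single maximizer per iteration, reuses the max_function and find_direction sketches given earlier, keeps $H_k$ fixed rather than BFGS-updated, and uses $-\tfrac{1}{2}\langle d, H_k d \rangle$ (the bound from Lemma 2) in place of $\Psi_k$ in the Armijo test.

```python
# A sketch of the outer iteration; max_function and find_direction are the
# illustrative helpers sketched earlier and are passed in as arguments.
import numpy as np

def continuous_minimax(f, grad_x, x0, y_bounds, max_function, find_direction,
                       max_iter=50, eps=1e-6, c=1e-4, u=0.5, e=1e6):
    x, H = np.asarray(x0, dtype=float), np.eye(len(x0))
    for _ in range(max_iter):
        phi, y = max_function(f, x, y_bounds)              # Step 1: maximize f(x, .) over Y
        f_vals, grads = [f(x, y)], [grad_x(x, y)]          # single maximizer assumed here
        d = find_direction(f_vals, grads, H, phi, e=e, eps=eps)   # Step 2: search direction
        if np.linalg.norm(d) <= eps:                       # stop when ||d_k|| is small
            break
        psi = -0.5 * d @ H @ d                             # surrogate for Psi_k (Lemma 2 bound)
        alpha = 1.0                                        # Step 3: Armijo-type backtracking
        while (alpha > 1e-12 and
               max_function(f, x + alpha * d, y_bounds)[0] - phi > c * alpha * psi):
            alpha *= u
        x = x + alpha * d                                  # update (13)
    return x
```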

In the case of the discrete min-max problem (4), with Assumption 4 and the direction of search given by $d_k = -H_k^{-1} g_k$, the above algorithm is equivalent to the discrete min-max algorithm discussed in Rustem (1992). In particular, this equivalence applies to the computation of $g_k$ in (15)-(16) and the quadratic subproblem in Rustem (1992), as well as the stepsize strategies of either algorithm.

Assumption 6: There exists an $e \ge 0$ such that the equality

$$
\Phi(x_k) - f(x_k, y_{k+1}) = 0 \qquad (20)
$$

is satisfied for all $k$.

Remark 3: Assumption 6 can be replaced by a strategy for adjusting $e$ in the algorithm to satisfy (20) (Fiacco and McCormick, 1968). The default value of $e$ has, in practice, been sufficient to ensure (20).
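A minimal sketch of such an adjustment strategy is given below; the multiplicative factor, tolerance and cap are assumptions, not values from the paper.

```python
# Increase the penalty e until the maximizer of (10) also attains Phi(x_k),
# i.e. until (20) holds to within a tolerance.
def adjust_penalty(phi_xk, f_of_yk1, e, tol=1e-8, factor=10.0, e_max=1e12):
    """f_of_yk1(e) returns f(x_k, y_{k+1}(e)) for the current penalty e."""
    while phi_xk - f_of_yk1(e) > tol and e < e_max:
        e *= factor
    return e
```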

Remark 1: Choice of Maximizer
(a) If the maximum in (17) is attained by more than one $y$, these constitute $\mathcal{Y}(x_k)$. In the case of $\mathcal{Y}$ given by upper and lower bounds, a potential subset of $\mathcal{Y}(x_k)$ is evaluated by considering local solutions of the nonlinear program and the value of $f(x_k, y)$ at every vertex of the hypercube $\mathcal{Y}$. This practical approach can be refined by adopting a global optimization procedure, based on branch-and-bound, with greater assurance of reaching the global maximum (e.g. Pardalos and Rosen, 1987; Floudas and Pardalos, 1995).
(b) If the maximum in (10) is attained by more than one $y$, these constitute $\mathcal{Y}_{k+1}$. By Lemma 2 below, $d_k$ is a descent direction for all $y_{k+1} \in \mathcal{Y}_{k+1}$, $|\mathcal{Y}_{k+1}| > 1$. The algorithm is convergent for any choice of $y_{k+1} \in \mathcal{Y}_{k+1}$.
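The vertex-evaluation device of Remark 1(a) can be sketched as follows (illustrative names and tie tolerance; for an $m$-dimensional hypercube this enumerates $2^m$ vertices).

```python
# Evaluate f(x_k, y) at every vertex of a box-shaped Y to collect candidate
# global maximizers, as suggested in Remark 1(a).
import itertools
import numpy as np

def vertex_maximizers(f, x, y_bounds, tol=1e-8):
    vertices = [np.array(v) for v in itertools.product(*y_bounds)]
    vals = np.array([f(x, v) for v in vertices])
    best = vals.max()
    return [v for v, val in zip(vertices, vals) if best - val <= tol], best

# Example with Y = [-1, 1]^2: the two vertices with y_0 = 1 are both maximizers.
f = lambda x, y: float(x[0] * y[0] + abs(y[1]))
print(vertex_maximizers(f, np.array([0.5]), [(-1.0, 1.0), (-1.0, 1.0)]))
```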

Subproblem (10) ensures $y_{k+1} \in \mathcal{Y}(x_k)$ through the penalty term. By Remark 3, Assumption 6 can be relaxed using a strategy for increasing $e$ to ensure (20). This is discussed in Lemma 1, along with the possible nonuniqueness of $y_{k+1}$. In the latter case, the descent property and subsequent convergence results apply to all members of $\mathcal{Y}_{k+1}$.

Lemma 1
Let (i) Assumptions 1, 4, 5 and 6 hold, (ii) $f(x, y)$ be continuous in $y$ and once continuously differentiable in $x$, and (iii) $e$ be chosen from $0 \le e < \infty$. Then
(a) in (10), we have $y_{k+1} \in \mathcal{Y}(x_k)$; and
(b) if (18) is satisfied, we have

$$
y_{k+1} \in \mathcal{Y}_{k+1} = \Big\{\, y \in \mathcal{Y}(x_k) \;\Big|\; y = \arg\min_{y \in \mathcal{Y}(x_k)} \big\| \nabla_x f(x_k, y) \big\|_{H_k^{-1}} \,\Big\} \subseteq \mathcal{Y}(x_k),
$$

$d_k = \hat d$ in (11) and $\nabla_x f(x_k, y_{k+1}) + H_k d_k = 0$.

Proof (Rustem and Howe, 1998).

Remark 2: Hessian approximation
The approximate Hessian $H_k$ is computed using the BFGS quasi-Newton formula (Broyden, 1970; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970). We have

$$
\gamma_k = \sum_{y \in \mathcal{Y}(x_{k+1})} \lambda_{k+2}^y \nabla_x f(x_{k+1}, y) - \sum_{y \in \mathcal{Y}(x_k)} \lambda_{k+1}^y \nabla_x f(x_k, y), \qquad \delta_k = x_{k+1} - x_k, \qquad (19a)
$$

$$
H_{k+1} = H_k - \frac{H_k \delta_k \delta_k^{\top} H_k}{\langle \delta_k, H_k \delta_k \rangle} + \frac{\gamma_k \gamma_k^{\top}}{\langle \gamma_k, \delta_k \rangle}. \qquad (19b)
$$

The algorithm converges for any positive definite matrix $H_0$. The BFGS formula is used to ensure unit stepsizes and superlinear convergence.

Lemma 2 [Descent property of $d_k$]
Let (i) Assumptions 1, 4, 5 and 6 hold; and (ii) $f(x, y)$ be continuous in $y$ and once continuously differentiable in $x$. Then we have

$$
\Psi_k \le -\tfrac{1}{2}\|d_k\|_{H_k}^2 \le 0 .
$$
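The Hessian update of Remark 2 can be sketched as follows: an illustration of (19a)-(19b), where the safeguard that skips the update when $\langle \gamma_k, \delta_k \rangle$ is not sufficiently positive is an added assumption, not part of the paper.

```python
# gamma_k is the difference of the convex combinations of gradients at x_{k+1}
# and x_k (19a), delta_k = x_{k+1} - x_k, and H is updated by the standard
# BFGS formula (19b).
import numpy as np

def bfgs_update(H, grads_new, lam_new, grads_old, lam_old, x_new, x_old):
    gamma = np.asarray(grads_new).T @ np.asarray(lam_new) \
            - np.asarray(grads_old).T @ np.asarray(lam_old)
    delta = np.asarray(x_new) - np.asarray(x_old)
    gd = gamma @ delta
    if gd <= 1e-12:                       # skip the update to keep H positive definite
        return H
    Hd = H @ delta
    return H - np.outer(Hd, Hd) / (delta @ Hd) + np.outer(gamma, gamma) / gd
```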

4. CONCLUDING REMARKS

The quasi-Newton algorithm computes minimax strategies for worst-case design problems. The algorithm utilises whenever possible a simplified direction of search based on the gradient corresponding to the minimum norm. Numerical experiments have shown that the algorithm does indeed choose this direction, over the convex combination of gradients, until the last few iterations. Descent is assured at every iteration and the algorithm converges at a superlinear rate.

REFERENCES

Broyden, C.G. (1970). "The Convergence of a Class of Double-Rank Minimisation Algorithms 2. The New Algorithm", J. Inst. Maths. Applics., 6, 222-231.
Chaney, R.W. (1982). "A Method of Centers Algorithm for Certain Minimax Problems", Math Prog, 22, 206-226.
Charalambous, C. and J.W. Bandler (1976). "Nonlinear Minimax Optimization as a Sequence of Least pth Optimization with Finite Values of p", International Journal of System Science, 7, 377-391.
Coleman, T.F. (1978). "A Note on 'New Algorithms for Constrained Minimax Optimization'", Math Prog, 15, 239-242.
Conn, A.R. and Y. Li (1992). "A Structure Exploiting Algorithm for Nonlinear Minimax Problems", SIAM J Optimization, 2, 242-263.
Demyanov, V.F. and V.N. Malozemov (1974). Introduction to Minimax, John Wiley, New York.
Di Pillo, G. and L. Grippo (1993). "A Smooth Method for the Finite Minimax Problem", Math Prog, 60, 187-214.
Dutta, S.R.K. and M. Vidyasagar (1977). "New Algorithms for Constrained Minmax Optimization", Math Prog, 13, 140-155.
Fiacco, A.V. and G.P. McCormick (1968). Nonlinear Programming: Sequential Unconstrained Minimization Techniques, John Wiley, New York.
Fletcher, R. (1970). "A New Approach to Variable Metric Algorithms", Computer J., 13, 317-322.
Floudas, C.A. and P.M. Pardalos (1995). State of the Art in Global Optimization: Computational Methods and Applications, Kluwer Academic Publishers, Dordrecht.
Goldfarb, D. (1970). "A Family of Variable Metric Algorithms Derived by Variational Means", Math of Computation, 24, 23-26.
Hald, J.H. and K. Madsen (1981). "Combined LP and Quasi-Newton Methods for Minimax Optimization", Math Prog, 20, 49-62.
Han, S.P. (1981). "Variable Metric Methods for Minimizing a Class of Nondifferentiable Functions", Math Prog, 20, 1-13.
Kiwiel, K.C. (1987). "A Direct Method of Linearization for Continuous Minimax Problems", JOTA, 55, 271-287.
Klessig, R. and E. Polak (1973). "A Method of Feasible Directions Using Function Approximations with Applications to Minimax Problems", J. Math. Anal. and Applications, 41, 583-602.
Murray, W. and M.L. Overton (1980). "A Projected Lagrangian Algorithm for Nonlinear Minmax Optimization", SIAM J. Sci. Stat. Comput., 1, 345-370.
Panin, V.M. (1981). "Linearization Method for Continuous Min-Max Problems", Kibernetika, 2, 75-78.
Pardalos, P.M. and J.B. Rosen (1987). Constrained Global Optimization: Algorithms and Applications, Lecture Notes in Computer Science No. 268, Springer Verlag, Berlin.
Polak, E., J.E. Higgins and D.Q. Mayne (1992). "A Barrier Function Method for Minimax Problems", Math Prog, 54, 155-176.
Rockafellar, R.T. (1972). Convex Analysis, Princeton University Press, Princeton, New Jersey.
Rustem, B. (1992). "A Constrained Min-Max Algorithm for Rival Models of the Same Economic System", Math Prog, 53, 279-295.
Rustem, B. and M. Howe (1998). "A Quasi-Newton Algorithm for Continuous Minimax", DoC, Imperial College.
Shanno, D.F. (1970). "Conditioning of Quasi-Newton Methods for Function Minimization", Math of Computation, 24, 647-654.
Wierzbicki, A.P. (1982). "Lagrangian Functions and Nondifferentiable Optimization", in: E.A. Nurminski, ed., Progress in Nondifferentiable Optimization, CP-82-58, IIASA, Laxenburg, Austria.
Womersley, R.S. and R. Fletcher (1986). "An Algorithm for Composite Nonsmooth Optimization Problems", JOTA, 48, 493-523.