Copyright © IFAC Nonsmooth and Discontinuous Problems of Control and Optimization, Chelyabinsk, Russia, 1998
APPROXIMATE GRADIENT METHODS AND THE NECESSARY CONDITIONS FOR THE EXTREMUM OF DISCONTINUOUS FUNCTIONS

V. D. Batukhtin, S. I. Bigil'deev, T. B. Bigil'deeva

Chelyabinsk State University, 129, Br. Kashirinyikh str., Chelyabinsk, 454021, Russia
Abstract: The theory of solving extremal problems is developed and its practical applications are discussed. On the basis of the approximate gradient theory (Batukhtin and Maiboroda, 1984, 1995; Batukhtin, 1993) a multivalued mapping is constructed that makes it possible to study the extremum of functions square integrable in Lebesgue measure. The connection with the F. H. Clarke subdifferential is established. Numerical algorithms that realize this approach in practice are given. The work of the algorithms is illustrated with examples. Copyright © 1998 IFAC

Keywords: Mathematical programming, Discontinuous function, Integrals, Numerical methods, Computer experiments.
1. INTRODUCTION

In the monographs (Batukhtin and Maiboroda, 1984, 1995) the concept of the approximate gradient was introduced, and on its basis an approach to solving discontinuous extremal problems was developed. The approximate gradient, being an integral operator, is a vector that can be considered an analogue of the gradient of a differentiable function. Yet it is defined not only for differentiable functions but for discontinuous ones as well. An interesting interpretation of this approach is found in the papers (Bigil'deev, 1996; Bigil'deev and Rolshchikov, 1997; Batukhtin, et al., 1997).

The intention of this paper is to consider the approximate gradient methods from a more general point of view, which, as the authors believe, would result in a further step in the understanding of the problem and would encourage the development of the approximate gradient theory. The theoretical part of the paper is based on the representation of the approximate gradient in a more general form and on the study of the set of approximate gradients over all possible weight functions, as well as the study of the limiting properties of this set.

The methods of the approximate gradient study functions for an extremum by constructing a sequence of points at each of which the approximate gradient is the zero vector while the diameter of the integration domain converges to zero. Such an approach has proved fairly productive in the numerical solution of extremal problems. At the same time, some rather complex problems are encountered along this way, first and foremost in constructing the numerical methods. This is, to a great extent, connected with the complex behaviour of the studied objects, that is, discontinuous functions. In particular, one important problem is to create effective methods of constructing a sequence of points at which the approximate gradient equals zero for a properly chosen weight function. The point is that, when the diameter of the integration domain converges to zero at a discontinuity of the function, the norm of the approximate gradient grows without bound, which impedes the numerical realization.
When using the approximate gradient, the problem of conditional optimization of a function $F : U \to \mathbb{R}$ on a set $U \subset \mathbb{R}^n$ may be considered as the problem of unconditional optimization of an extended function on the whole space $\mathbb{R}^n$:

$$ f(x) = \begin{cases} F(x), & \text{if } x \in U, \\ w(x), & \text{otherwise.} \end{cases} $$

The various ways of extending the function were considered by the authors in detail in the papers (Batukhtin and Maiboroda, 1984, 1995; Batukhtin, et al., 1997). Note that the problem of extending the function does not involve any principal difficulties, which is a consequence of the breadth of the class of functions studied.

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a function square integrable in Lebesgue measure $\mu$. Consider the problem of unconditional minimization:

$$ f(x) \to \min_{x \in \mathbb{R}^n}. \qquad (1) $$

Following (Liusternik and Sobolev, 1965), let us introduce the concept of the substantial minimum of a function. Let $B_\delta(x)$ be the ball with radius $\delta$ and center at the point $x$. Denote by $\mathfrak{X}$ the class of sets $X$ of measure zero that belong to $B_\delta(x)$, and consider the following function on $\mathfrak{X}$:

$$ a(X) = \inf_{y \in B_\delta(x) \setminus X} f(y). $$

Let $a_0 = \sup_{X \in \mathfrak{X}} a(X)$. If the function $a(X)$ is finite for some $X \in \mathfrak{X}$, then it takes its maximal value on a certain set $X^* \in \mathfrak{X}$ and $a_0 > -\infty$ (Liusternik and Sobolev, 1965). The value $a_0$ is called the substantial minimum of the function $f$ on the ball $B_\delta(x)$ and is denoted

$$ \operatorname*{vrai\,inf}_{y \in B_\delta(x)} f(y) = \sup_{X \in \mathfrak{X}} \Big\{ \inf_{y \in B_\delta(x) \setminus X} f(y) \Big\}. $$

Here the following limit

$$ \operatorname{vrai} f(x) = \lim_{\delta \to +0} \operatorname*{vrai\,inf}_{y \in B_\delta(x)} f(y) \qquad (2) $$

is called the substantial value of the function $f$ at the point $x$.

Definition 1. The point $x^* \in \mathbb{R}^n$ is called a point of substantial local minimum of the function $f$ if there exists $\varepsilon > 0$ such that

$$ f(x) \ge \operatorname{vrai} f(x^*) \qquad (3) $$

on $B_\varepsilon(x^*)$ almost everywhere (Kolmogorov and Fomin, 1989). If the prefix vrai is absent in (2) and (3) holds for all $x$ from $B_\varepsilon(x^*)$, then the point $x^*$ is called simply a point of local minimum. It is evident that for a continuous function the sets of minimum points and substantial minimum points coincide.

Definition 2. A nonnegative function $p_\delta : \mathbb{R}^n \to [0; +\infty)$, square integrable in Lebesgue measure $\mu$, is called a weight function if it is different from zero only in a neighborhood $\Omega_\delta$ of the origin that is contained in the ball of radius $\delta > 0$, and takes positive values there on a set of nonzero measure.

Definition 3. The integral operator

$$ a(x; \delta, p_\delta; f) = D(p_\delta)^{-1} \int_{\Omega_\delta} (s - s_\delta)\, f(x+s)\, p_\delta(s)\, \mu(ds), \qquad (4) $$

where

$$ s_\delta = \int_{\Omega_\delta} s\, p_\delta(s)\, \mu(ds) \Big/ \int_{\Omega_\delta} p_\delta(s)\, \mu(ds) $$

and

$$ D(p_\delta) = \int_{\Omega_\delta} (s - s_\delta)(s - s_\delta)^T p_\delta(s)\, \mu(ds) $$

is a positive definite matrix by the definition of the weight function $p_\delta$, is called the approximate gradient $a(x; \delta, p_\delta; f)$ of the function $f$ at the point $x$.

NOTE. In the papers mentioned (Batukhtin and Maiboroda, 1984, 1995; Batukhtin, 1993) the approximate gradient is defined through a weight function that represents the probability distribution density of a vector $s$ whose components are centered uncorrelated random values.

In order to construct algorithms for solving problem (1) numerically, let us consider the properties of the limiting points of the set of approximate gradients (4) as $\delta \to +0$ over the set of all weight functions. Here this set of limiting points is called the approximate subdifferential of the function $f$ at the point $x$ and is denoted by $\partial_a f(x)$.
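To make Definition 3 concrete, the following sketch (our illustration, not the authors' code; all identifiers are ours) estimates the approximate gradient (4) by Monte Carlo integration for a centered uniform weight function. With such a weight, $s_\delta = 0$ and $D(p_\delta)$ reduces to a multiple of the identity, so (4) becomes a ratio of two sample averages; for a smooth function the estimate approaches the ordinary gradient as $\delta$ shrinks.

```python
import numpy as np

def approximate_gradient(f, x, delta, n_samples=200000, rng=None):
    """Monte Carlo estimate of the approximate gradient (4) for the
    uniform weight function p_delta = const on the ball B_delta(0).

    Illustrative sketch only: with a uniform centered weight, s_delta = 0
    and D(p_delta) = d * I, so (4) is a ratio of two sample averages."""
    rng = np.random.default_rng(rng)
    n = len(x)
    # Sample s uniformly in the ball of radius delta: uniform direction
    # times a radius with density proportional to rho^(n-1).
    u = rng.normal(size=(n_samples, n))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    s = u * (delta * rng.random(n_samples) ** (1.0 / n))[:, None]
    fs = np.array([f(x + si) for si in s])
    # For symmetric weights D = d * I with d = E||s||^2 / n.
    d = np.mean(np.sum(s * s, axis=1)) / n
    return np.mean(s * fs[:, None], axis=0) / d

# For a smooth quadratic the estimate tends to the true gradient 2x.
if __name__ == "__main__":
    f = lambda z: float(z @ z)
    x = np.array([1.0, -2.0])
    print(approximate_gradient(f, x, delta=0.05, rng=0))  # approx [2, -4]
```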
Similarly to (Batukhtin and Maiboroda, 1984), it is not difficult to establish that in the case of a differentiable function the approximate subdifferential consists of a single point that coincides with the gradient of the function. However, if no additional conditions are imposed on the weight functions, then the approximate subdifferential may fail to be contained in the subdifferential even for a convex function $f$ (Bigil'deev and Rolshchikov, 1997). Hence, in what follows we confine ourselves to weight functions that are different from zero in the ball $B_r(s_\delta) \subseteq \Omega_\delta$, where $r > 0$, and depend only on the distance from the point $s_\delta$. As a result, for each $\delta$ the weight function is given by a certain point from the neighborhood $\Omega_\delta$ and a function of one variable on the segment $(0; r)$. We shall denote the mentioned point $s_\delta$ from $\Omega_\delta$ by $s_r$ and the weight function by $p_r(s) = \hat{p}_r(\| s - s_r \|)$. Such weight functions are called symmetric and, in the case $s_r = 0$, centered. Denote by $Q$ the set of symmetric weight functions.

When symmetric weight functions are used, approximate gradient (4) takes a simpler form. In this case the set $\Omega_\delta$ may be replaced by $B_r(s_r)$ and the matrix $D(p_r)$ becomes diagonal with identical elements on the diagonal:

$$ d_r(p_r) = \frac{1}{n} \int_{B_r(s_r)} \| s - s_r \|^2\, \hat{p}_r(\| s - s_r \|)\, \mu(ds) = \frac{1}{n} \int_{B_r} \| v \|^2\, \hat{p}_r(\| v \|)\, \mu(dv) > 0, $$

where $v = s - s_r$, $n$ is the space dimension, and $B_r$ is the ball of radius $r$ with center at the origin.

As the vector $s_r$ is defined arbitrarily, let us include it in the arguments of the approximate gradient; then formula (4) is rewritten as

$$ a(x; r, p_r, s_r; f) = \frac{1}{d_r(p_r)} \int_{B_r} v\, f(x + s_r + v)\, \hat{p}_r(\| v \|)\, \mu(dv). \qquad (5) $$

Approximate gradient (5) on symmetric weight functions is continuous in $s_r$ due to its continuity in $x$ on the centered weight functions (Batukhtin and Maiboroda, 1984).

By definition, for the symmetric weight functions

$$ \partial_a f(x) = \Big\{ \lim_{\substack{r \to +0 \\ s_r \to 0 \\ p_r \in Q}} a(x; r, p_r, s_r; f) \Big\}, \qquad (6) $$

where for each $r > 0$ the weight function $\hat{p}_r(\| s - s_r \|)$ is given arbitrarily.

Let us consider the properties of this set. It can be unbounded even for a function of one variable (Bigil'deev, 1996). By definition, this set is closed. Besides, the multivalued mapping $x \mapsto \partial_a f(x)$ is upper semicontinuous in inclusion. The latter follows from the fact that in the definition of approximate gradient (5) $s_r$ gives only a parallel transfer of the variable $x$. At the same time, $r$ and $s_r$ may converge to zero in (6) in any order. Hence, letting first $r$ and then $s_r$ tend to zero, we obtain the definition of upper semicontinuity in $x$. As $\partial_a f(x)$ is closed, its convex hull $\operatorname{co} \partial_a f(x)$ is a closed convex set.
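As a concrete instance of the normalizing factor $d_r$ (our computation, not in the paper), for the uniform weight $\hat{p}_r \equiv 1$ on $B_r$ the integral evaluates in closed form:

```latex
% For the uniform weight \hat{p}_r \equiv 1 on the ball B_r \subset R^n,
% integrate in spherical coordinates, with V_n(r) the volume of B_r:
\[
d_r(p_r) \;=\; \frac{1}{n}\int_{B_r}\|v\|^2\,\mu(dv)
        \;=\; \frac{1}{n}\int_0^r \rho^2\,\sigma_{n-1}\rho^{\,n-1}\,d\rho
        \;=\; \frac{\sigma_{n-1}\,r^{\,n+2}}{n(n+2)}
        \;=\; \frac{V_n(r)\,r^2}{n+2},
\]
% where \sigma_{n-1} is the surface area of the unit sphere and
% V_n(r) = \sigma_{n-1} r^n / n. In particular, d_r = 2 r^3/3 for n = 1.
```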
Let us define the function $\Gamma(x; u)$ as the upper limit of the following type:

$$ \Gamma(x; u) = \limsup_{\substack{r \to +0 \\ s_r \to 0 \\ p_r \in Q}} \langle a(x; r, p_r, s_r; f), u \rangle \qquad (7) $$

for all $u \in \mathbb{R}^n$. As in (Clarke, 1983), it can be established that the function $u \mapsto \Gamma(x; u)$ is positively homogeneous and subadditive. Upper semicontinuity of $\Gamma(x; u)$ as a function of $x$ follows from the upper semicontinuity in inclusion of the multivalued mapping $x \mapsto \partial_a f(x)$. On the other hand, by formulas (7) and (6) for the function $\Gamma(x; u)$ and the set $\partial_a f(x)$, $\xi \in \operatorname{co} \partial_a f(x)$ if and only if $\langle \xi, u \rangle \le \Gamma(x; u)$ for all $u \in \mathbb{R}^n$. Hence, $\Gamma(x; u)$ is the supporting function of $\operatorname{co} \partial_a f(x)$ (Pschenichny, 1980).

Following (Clarke, 1983), we call the following function the generalized derivative of the function $f$ in the direction $u \in \mathbb{R}^n$ at the point $x$:

$$ f^{\circ}(x; u) = \limsup_{\substack{y \to x \\ t \to +0}} \frac{f(y + tu) - f(y)}{t}, \qquad (8) $$

which, as well as the function $\Gamma(x; u)$, is positively homogeneous and subadditive in $u$.

Theorem 4. For any point $x$ and for all $u \in \mathbb{R}^n$ the following inequality is true:

$$ \Gamma(x; u) \le f^{\circ}(x; u). $$

PROOF. Due to the positive homogeneity of the functions (7), (8) it is sufficient to consider $u \in \mathbb{R}^n$ with $\| u \| = 1$. The following equality is true for this vector:

$$ \langle a(x; r, p_r, s_r; f), u \rangle = \frac{1}{d_r(p_r)} \int_{B_r} \langle v, u \rangle\, f(x + s_r + v)\, \hat{p}_r(\| v \|)\, \mu(dv) = \frac{1}{d_r(p_r)} \int_{B_r} \langle v, u \rangle \big[ f(y_r + v) - f(y_r + v_\perp) \big]\, \hat{p}_r(\| v \|)\, \mu(dv), $$

where $y_r = x + s_r$ and $v_\perp = v - \langle v, u \rangle u$ is the component of the vector $v$ orthogonal to the vector $u$. We have

$$ \int_{B_r} \langle v, u \rangle\, f(y_r + v_\perp)\, \hat{p}_r(\| v \|)\, \mu(dv) = 0, $$

since the function $f(y_r + v_\perp)$ is constant for all points $v$ with the same component $v_\perp$, that is, on the straight lines parallel to the vector $u$.

Let $X$ be the set of points $v$ from the ball $B_r$ for which $\langle v, u \rangle = 0$. As $\mu(X) = 0$, we obtain

$$ \langle a(x; r, p_r, s_r; f), u \rangle = \int_{B_r \setminus X} \frac{f(y_r + v) - f(y_r + v_\perp)}{\langle v, u \rangle}\, q_r(v)\, \mu(dv), $$

where

$$ q_r(v) = \frac{1}{d_r(p_r)}\, \langle v, u \rangle^2\, \hat{p}_r(\| v \|) \ge 0 $$

and

$$ \int_{B_r \setminus X} q_r(v)\, \mu(dv) = \frac{1}{d_r(p_r)} \int_{B_r} \langle v, u \rangle^2\, \hat{p}_r(\| v \|)\, \mu(dv) = \frac{1}{n\, d_r(p_r)} \int_{B_r} \| v \|^2\, \hat{p}_r(\| v \|)\, \mu(dv) = 1 $$

due to the symmetry of the weight function $\hat{p}_r$. As a result we obtain

$$ \langle a(x; r, p_r, s_r; f), u \rangle \le \sup \{ \sup\nolimits_1;\ \sup\nolimits_2 \}, $$

where

$$ \sup\nolimits_1 = \sup_{\substack{0 < t_r \le r \\ \| z_r - y_r \| \le r}} \frac{f(z_r + t_r u) - f(z_r)}{t_r} $$

corresponds to the points $v$ with $\langle v, u \rangle > 0$ (here $z_r = y_r + v_\perp$ and $t_r = \langle v, u \rangle$), and

$$ \sup\nolimits_2 = \sup_{\substack{0 < t_r \le r \\ \| z_r' - y_r \| \le r}} \frac{f(z_r' + t_r u) - f(z_r')}{t_r} $$

corresponds to the points $v$ with $\langle v, u \rangle < 0$ (here $z_r' = y_r + v$ and $t_r = -\langle v, u \rangle$). Consequently, since $y_r \to x$ as $r \to +0$ and $s_r \to 0$,

$$ \Gamma(x; u) \le \limsup_{\substack{z \to x \\ t \to +0}} \frac{f(z + tu) - f(z)}{t} = f^{\circ}(x; u). $$

Theorem 4 is proved.

Theorem 5. If the function $f$ is locally Lipschitzian in a neighborhood of the point $x$, then the F. H. Clarke subdifferential $\partial_{cl} f(x) = \operatorname{co} \partial_a f(x)$ on symmetric weight functions (Bigil'deev and Rolshchikov, 1997).

PROOF. This statement follows from the fact that the extreme points of the set $\partial_{cl} f(x)$ are limit points of sequences of gradients $f'(y)$ as $y \to x$ over points $y$ of differentiability of the function $f$ (Clarke, 1983). But the extreme points are contained in $\partial_a f(x)$ too, because, selecting $s_r = y - x$ for all $r > 0$, we have $a(x; r, p_r, y - x; f) \to f'(y)$ as $r \to 0$. Theorem 5 is proved.

2. STRUCTURE OF THE APPROXIMATE SUBDIFFERENTIAL

The idea of the structure of the set $\partial_a f(x)$ for a function $f$ square integrable in Lebesgue measure $\mu$ results from the following theorems.

Theorem 6. At fixed $r > 0$ and $s_r$, the set of approximate gradients $\{ a(x; r, p_r, s_r; f) \}$ on the symmetric weight functions is a convex set.

PROOF. For any $\lambda \in [0; 1]$ and any weight functions $p_r'(\| s - s_r \|)$, $p_r''(\| s - s_r \|)$,

$$ \lambda\, a(x; r, p_r', s_r; f) + (1 - \lambda)\, a(x; r, p_r'', s_r; f) = \frac{\lambda}{d_r(p_r')} \int_{B_r} v\, f(y_r + v)\, p_r'(\| v \|)\, \mu(dv) + \frac{1 - \lambda}{d_r(p_r'')} \int_{B_r} v\, f(y_r + v)\, p_r''(\| v \|)\, \mu(dv) = \frac{1}{d_r(p_r)} \int_{B_r} v\, f(y_r + v)\, p_r(\| v \|)\, \mu(dv) = a(x; r, p_r, s_r; f), $$

where $y_r = x + s_r$,

$$ p_r(\| v \|) = \gamma\, p_r'(\| v \|) + (1 - \gamma)\, p_r''(\| v \|), \qquad \gamma = \frac{\lambda\, d_r(p_r'')}{\lambda\, d_r(p_r'') + (1 - \lambda)\, d_r(p_r')} \in [0; 1]. $$

Theorem 6 is proved.
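The construction in the proof of Theorem 6 is easy to verify numerically. The following sketch (ours; the test function, grid, and weights are arbitrary choices) checks in one dimension that the blended weight with the coefficient $\gamma$ above reproduces the convex combination of two approximate gradients:

```python
import numpy as np

# Numerical sanity check of Theorem 6 in R^1 (our illustration).
r, m = 0.5, 20001
v = np.linspace(-r, r, m)                     # grid on B_r = [-r, r]
dv = v[1] - v[0]
x = 0.3
f = lambda z: np.abs(z) + (z > 0)             # a discontinuous test function

def d_r(p):
    # d_r(p) = integral over B_r of v^2 p(|v|) dv  (n = 1)
    return np.sum(v * v * p) * dv

def approx_grad(p):
    # a(x; r, p, 0; f) per formula (5)
    return np.sum(v * f(x + v) * p) * dv / d_r(p)

p1 = np.ones_like(v)                          # uniform symmetric weight
p2 = 1.0 - np.abs(v) / r                      # triangular symmetric weight

lam = 0.3
gamma = lam * d_r(p2) / (lam * d_r(p2) + (1 - lam) * d_r(p1))
blended = gamma * p1 + (1 - gamma) * p2       # weight realizing the combination

lhs = lam * approx_grad(p1) + (1 - lam) * approx_grad(p2)
rhs = approx_grad(blended)
print(lhs, rhs)                               # agree up to quadrature error
```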
Theorem 7. The set of the points of the approximate subdifferential $\partial_a f(x)$ that can be obtained on the set of centered symmetric weight functions is a convex set.

PROOF. Let us fix $s_r = s$ for all $r > 0$ and consider the set

$$ \Big\{ \lim_{\substack{r \to +0 \\ p_r \in Q}} a(x; r, p_r, s; f) \Big\}. $$

If $r_1 \le r_2$, then $\{ a(x; r_1, p_{r_1}, s; f) \} \subseteq \{ a(x; r_2, p_{r_2}, s; f) \}$, since for any weight function $p_{r_1}$ in the ball $B_{r_1}$ there exists the weight function

$$ p_{r_2}(\| v \|) = \begin{cases} p_{r_1}(\| v \|), & \text{if } \| v \| \le r_1, \\ 0, & \text{if } \| v \| > r_1. \end{cases} $$

Hence, at fixed $s$ the set above is an intersection of convex sets. For the case $s = 0$ we obtain the statement. Theorem 7 is proved.

Let us fix an arbitrary sequence of points $X = \{x_i\} \subset \mathbb{R}^n$ converging to the point $x$ as $i \to \infty$. Consider, as $r \to +0$, the set $\partial_X$ of limiting points of the sequences of approximate gradients computed at the points $x_i$ on the centered symmetric weight functions. Unlike the points of the set in Theorem 7, the points of the set $\partial_X$ result from changing both $r$ and the functions $p_r$ at the transition from one point of the sequence $X$ to another. At the same time, the set considered in Theorem 7 is the set $\partial_X$ for the sequence $X$ all points of which coincide with the point $x$. Denote it by $\partial_0$.

Theorem 8. On any sequence $X$ converging to the point $x$ the set

$$ \partial_X = \Big\{ \lim_{\substack{r \to +0 \\ x_i \to x \\ p_r \in Q}} a(x_i; r, p_r, 0; f) \Big\} $$

of the limiting points, as $r \to +0$, of approximate gradients on the centered symmetric weight functions is a convex closed set that contains the set $\partial_0$. In addition, the set $\partial_a f(x)$ is the union of the sets $\partial_X$ over all sequences $X = \{x_i\}$ converging to the point $x$.

PROOF. Let $\{\xi_j\}_{j=1}^{\infty}$ be a sequence of points of the set $\partial_X$ converging to a point $\xi$ as $j \to +\infty$. Then, from the definition of the set $\partial_X$, for each point $\xi_j$ there exist a sequence of numbers $r_{j_i} \to +0$ and weight functions $p_{r_{j_i}}$ such that $a(x_i; r_{j_i}, p_{r_{j_i}}, 0; f) \to \xi_j$ as $i \to +\infty$. For each index $j$ an index $i^* = i(j)$ is selected so that

$$ \big\| a(x_{i^*}; r_{j_{i^*}}, p_{r_{j_{i^*}}}, 0; f) - \xi_j \big\| < \frac{1}{j}. $$

As a result, we obtain a sequence of approximate gradients $\{ a(x_{i^*}; r_{j_{i^*}}, p_{r_{j_{i^*}}}, 0; f) \}$ converging to the point $\xi$ as $j \to +\infty$. Consequently, $\xi \in \partial_X$ and the set is closed.

The convexity of the set $\partial_X$ follows from it being the set of limiting points of sequences of convex sets. Indeed, for any two points $\xi$ and $\eta$ from $\partial_X$ there exist sequences of approximate gradients $\xi_i = a(x_i; \delta_i, p_{\delta_i}, 0; f)$ and $\eta_i = a(x_i; \rho_i, p_{\rho_i}, 0; f)$ such that $\xi_i \to \xi$, $\eta_i \to \eta$ as $i \to +\infty$. For any $\lambda \in [0; 1]$, $\lambda \xi_i + (1 - \lambda) \eta_i = a(x_i; r_i, p_{r_i}, 0; f)$ for $r_i = \max\{\delta_i, \rho_i\}$ and a certain centered weight function $p_{r_i}$, due to Theorem 6. Hence,

$$ \lambda \xi + (1 - \lambda) \eta = \lim_{\substack{i \to +\infty \\ p_{r_i} \in Q}} a(x_i; r_i, p_{r_i}, 0; f), $$

and any point that lies on the segment connecting the points $\xi$ and $\eta$ belongs to the set $\partial_X$.

To prove the inclusion $\partial_0 \subseteq \partial_X$, let us take an arbitrary unit vector $u \in \mathbb{R}^n$. For it,

$$ \limsup_{\substack{r \to +0 \\ p_r \in Q \\ x_i \to x}} \langle a(x_i; r, p_r, 0; f), u \rangle \ge \limsup_{\substack{r \to +0 \\ p_r \in Q}} \Big\{ \limsup_{x_i \to x} \langle a(x_i; r, p_r, 0; f), u \rangle \Big\} = \limsup_{\substack{r \to +0 \\ p_r \in Q}} \langle a(x; r, p_r, 0; f), u \rangle $$

due to the continuity of the approximate gradient in $x$ at fixed $r > 0$ and weight function $p_r$. As a result, it is shown that the supporting function of the convex closed set $\partial_X$ cannot be less than the supporting function of the convex closed set $\partial_0$. Hence, $\partial_0 \subseteq \partial_X$.

The inclusion of the set $\partial_X$ into the approximate subdifferential $\partial_a f(x)$ follows from the identity (see formula (5))

$$ a(x_i; r, p_r, 0; f) = a(x; r, p_r, x_i - x; f) $$

for any sequence of points $x_i$ converging to $x$. On the other hand, any element of the set $\partial_a f(x)$ is

$$ \lim_{\substack{r \to +0 \\ s_r \to 0 \\ p_r \in Q}} a(x; r, p_r, s_r; f), $$

that is, an element of a set $\partial_X$. Theorem 8 is proved.
Theorem 9. The approximate subdifferential $\partial_a f(x)$ of a function $f$ square integrable in Lebesgue measure $\mu$ is a convex set on symmetric weight functions.

PROOF. Consider two arbitrary elements $\xi$ and $\eta$ of the set $\partial_a f(x)$. Let $\xi \in \partial_X$, $\eta \in \partial_Y$, where the sets $\partial_X, \partial_Y \subset \partial_a f(x)$ are obtained, correspondingly, on the sequences $\{x_i\}$, $\{y_i\}$ converging to the point $x$. The sequence $\{z_j\}$ with $z_j = x_i$ for $j = 2i$ and $z_j = y_i$ for $j = 2i + 1$ also converges to the point $x$. In addition, the convex set $\partial_Z \subset \partial_a f(x)$ that corresponds to this sequence includes the sets $\partial_X$ and $\partial_Y$, since for any unit vector $u \in \mathbb{R}^n$ and the sequence $\{z_j\} \supseteq \{\{x_i\} \cup \{y_i\}\}$

$$ \limsup_{\substack{r \to +0 \\ p_r \in Q \\ z_j \to x}} \langle a(z_j; r, p_r, 0; f), u \rangle \ge \sup\{ ls_1;\ ls_2 \}, $$

where

$$ ls_1 = \limsup_{\substack{r \to +0 \\ p_r \in Q \\ x_i \to x}} \langle a(x_i; r, p_r, 0; f), u \rangle, \qquad ls_2 = \limsup_{\substack{r \to +0 \\ p_r \in Q \\ y_i \to x}} \langle a(y_i; r, p_r, 0; f), u \rangle. $$

Theorem 9 is proved.

Corollary 10. If the function $f$ is locally Lipschitzian in a neighborhood of the point $x$, then the F. H. Clarke subdifferential $\partial_{cl} f(x) = \partial_a f(x)$ on symmetric weight functions.

PROOF. This statement repeats Theorem 5 with due account of the convexity of the approximate subdifferential.

3. THE NECESSARY CONDITION OF OPTIMALITY

Definition 11. The point $x_0 \in \mathbb{R}^n$ is called a substantially stationary point of a function $f$ square integrable in Lebesgue measure $\mu$ if

$$ 0 \in \partial_a f(x_0). \qquad (9) $$

It is known that the necessary minimum condition for a differentiable function is its gradient being equal to zero. Hence, a point of substantial local minimum of a function that is equivalent (Kolmogorov and Fomin, 1989) to a differentiable function is a substantially stationary point.

It follows from the previous section that the approximate gradient methods (Batukhtin and Maiboroda, 1984, 1995) can be interpreted as methods of constructing a substantially stationary point. Indeed, for the point $x$ to be a substantially stationary one it is sufficient that it be a generalized stationary point (Batukhtin and Maiboroda, 1995); that is, as $r \to +0$ there exist points $x_r$ converging to the point $x$, at each of which the approximate gradient equals zero for a certain weight function.

Condition (9) introduced here is a necessary condition of minimum for different classes of functions. For convex functions $f$, zero belonging to the approximate subdifferential is a necessary as well as a sufficient condition of minimum. It is also a necessary condition of minimum for discontinuous functions obtained by a vertical shift, in hyperplanes, of parts of epigraphs of convex functions. These two statements follow from properties of the approximate gradient stated by V. E. Rolshchikov (Batukhtin and Maiboroda, 1995).

It follows from Theorem 9 that (9) is a necessary condition of minimum for locally Lipschitzian functions (Clarke, 1983).

It has not yet been established in the general case, for the class of functions square integrable in Lebesgue measure $\mu$, that (9) is a necessary condition of substantial minimum. As a generalized stationary point is at the same time a substantially stationary point, this statement is true (Theorem 13) for those functions that satisfy a special condition (Batukhtin and Maiboroda, 1995). In particular, strongly quasiconvex functions, which can be discontinuous, satisfy this special condition (Bazaraa and Shetty, 1979).

To formulate the above condition, let us introduce the following notation: $l_{y,u} = \{ x \in \mathbb{R}^n : x = y + \alpha u,\ \alpha \ge 0 \}$ is the ray originating from the point $y$ in the direction of the vector $u \ne 0$; $S(y, R) = \{ x \in \mathbb{R}^n : \| x - y \| = R \}$ is the sphere of radius $R$ with the center at the point $y$.

Definition 12. The function $f$ satisfies condition A at the point $y$ if for any sufficiently small $R > 0$ there exists $r > 0$ such that, for any $x \in \mathbb{R}^n$ and $z \in S(y, R)$, the function $f(x + \alpha(z - y))$ for $\alpha \ge 0$ is a nondecreasing function of $\alpha$ on the segment $l_{x, z-y} \cap B_r(z)$ (segment CD in Fig. 1).

Fig. 1. Condition A.
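Definition 11 can be illustrated numerically (our sketch; the test function and tolerances are ours, not from the paper). For a one-dimensional function with a substantial local minimum at the origin, the approximate gradient with a centered uniform weight vanishes for every radius, so $0 \in \partial_a f(0)$ even though the pointwise value $f(0)$ is large:

```python
import numpy as np

# f has a substantial local minimum at 0 even though f(0) = 5, because a
# single point has Lebesgue measure zero and does not affect the integral.
def f(z):
    return np.where(z == 0.0, 5.0, np.abs(z))

def approx_grad_1d(x, r, m=200001):
    """a(x; r, p_r, 0; f) with the centered uniform weight on [-r, r]."""
    v = np.linspace(-r, r, m)
    dv = v[1] - v[0]
    d = np.sum(v * v) * dv              # d_r(p_r) for n = 1, p_r = 1
    return np.sum(v * f(x + v)) * dv / d

for r in (0.1, 0.01, 0.001):
    print(r, approx_grad_1d(0.0, r))    # ~0 for every r: 0 lies in the set
print(approx_grad_1d(0.5, 0.01))        # ~1 away from the minimum
```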
Theorem 13. If a lower semicontinuous function $f$, square integrable in Lebesgue measure $\mu$, satisfies condition A at a point of local minimum $x^*$, then inclusion (9) is satisfied at this point (Batukhtin and Maiboroda, 1995).

Definition 14. The function $f$ is called strongly quasiconvex if for any $x, y \in \mathbb{R}^n$, $x \ne y$, and any $\alpha \in (0; 1)$

$$ f(\alpha x + (1 - \alpha) y) < \max \{ f(x), f(y) \}. $$

Theorem 15. Let $f$ be a strongly quasiconvex lower semicontinuous function square integrable in Lebesgue measure $\mu$, and let $x^*$ be its point of minimum at which it is continuous. Then $f$ satisfies condition A at the point $x^*$ and (9) is fulfilled there.

PROOF. As $f$ is a strongly quasiconvex function, $x^*$ is the single point of minimum of this function. Here $M_c(f) = \{ x \in \mathbb{R}^n : f(x) \le c \}$ is a nonempty convex set for any $c \ge f(x^*)$.

Let us fix an arbitrary $R > 0$ and consider the points of the sphere $S(x^*, R)$. As a lower semicontinuous function reaches its least value on a compact set, denote $c_0 = \min_{x \in S(x^*, R/2)} f(x) > f(x^*)$.

Assume $\varepsilon = c_0 - f(x^*) > 0$. Due to the continuity of the function $f$ at the point $x^*$ there exists $\delta > 0$, $\delta < R/2$, such that $B_\delta(x^*) \subseteq M_{c_0}(f)$. Consequently, $x^* \in \operatorname{int} M_{c_0}(f)$, and for any point $x \in B_{\delta/2}(x^*)$ we have $x \in \operatorname{int} M_{c_0}(f)$. But then it follows from the properties of convex sets (Vasilyev, 1988) that any ray $l_{x,u}$, for any direction $u \in \mathbb{R}^n$, intersects the boundary of the set $M_c(f)$ at $c \ge c_0$ in no more than one point. This means that for all $x \in B_{\delta/2}(x^*)$ and all $u \in \mathbb{R}^n$ the function $f$ increases along the ray $l_{x,u}$ on its parts outside $M_{c_0}(f)$ and, hence, outside $B_{R/2}(x^*)$. In addition, due to strong quasiconvexity there are no parts of the rays on which the function $f$ is constant.

Assume $r = \delta/2$. Then, for any point $y \in S(x^*, R)$ the function $f$ increases in the ball $B_r(y)$ along every ray $l_{x, y - x^*}$ for $x \in B_{\delta/2}(x^*)$. In addition, $l_{x, y - x^*} \cap B_r(y) = \emptyset$ if $x \notin B_{\delta/2}(x^*)$.

Hence, the function $f$ satisfies condition A at the point $x^*$, and statement (9) is true. Theorem 15 is proved.

Theorem 16. Let the function $f$ satisfy the conditions of Theorem 15 and let $\varphi$ be an increasing function of one variable for which the superposition $h(x) = \varphi(f(x))$ is a function square integrable in Lebesgue measure $\mu$. Then the point of minimum of the function $h(x)$ satisfies (9).

PROOF. Due to the strong monotonicity of the function $\varphi$, the point $x^*$ is the single point of minimum for the function $h$ as well, and condition A is fulfilled for $h$ at the point of minimum. Theorem 16 is proved.

To conclude this section, note that all the above theorems remain true for functions equivalent to the discussed ones. Hence, the numerical search for a point of substantial minimum of a function square integrable in Lebesgue measure $\mu$ can be effected by finding a substantially stationary point with the help of the approximate gradient, as is done in the convex and smooth cases.

4. NUMERICAL METHODS, HOW THEY WORK

The described theoretical results make it possible to construct numerical methods for solving discontinuous extremal problems. The approximate gradient (taken with the minus sign) can be used as the direction of motion when constructing a minimizing sequence.

The main scheme of the approximate gradient methods is to construct a sequence of points $\{x^{(k)}\}$, $x^{(k)} \in \mathbb{R}^n$, in accordance with the iteration procedure

$$ x^{(k+1)} = x^{(k)} - \alpha_k\, a(x^{(k)}; r^{(k)}, p^{(k)}, s^{(k)}; f), $$

where $p^{(k)} = p_{r^{(k)}}(s)$, $k = 0, 1, 2, \ldots$, and the step $\alpha_k$ is chosen either by means of one-dimensional minimization or from a condition defined by an inequality with a parameter $\sigma$, $0 < \sigma < 0.5$.

If $\| a(x^{(k)}; r^{(k)}, p^{(k)}, s^{(k)}; f) \| < \epsilon$, where $\epsilon$ is a sufficiently small positive number, or the function $f$ is not decreasing in the direction $-a(x^{(k)}; r^{(k)}, p^{(k)}, s^{(k)}; f)$, then the value of $r^{(k)}$ is decreased. The process is stopped when $r^{(k)} < r_{\min}$. The convergence of these methods is proved for nonsmooth and discontinuous functions. In addition, the main condition imposed on the function $f$ is that it belongs to the class of functions satisfying the generalized Lagrange mean-value theorem (Batukhtin and Maiboroda, 1995).
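A minimal sketch of this main scheme follows (our code, under simplifying assumptions: a Monte Carlo approximate gradient with centered uniform weights, and a crude grid search in place of the paper's one-dimensional step rule, whose exact inequality is not reproduced above):

```python
import numpy as np

def approx_grad(f, x, r, n_samples=20000, rng=None):
    """Monte Carlo approximate gradient with a centered uniform weight, see (5)."""
    rng = np.random.default_rng(rng)
    n = len(x)
    u = rng.normal(size=(n_samples, n))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    v = u * (r * rng.random(n_samples) ** (1.0 / n))[:, None]
    fv = np.array([f(x + vi) for vi in v])
    d = np.mean(np.sum(v * v, axis=1)) / n
    return np.mean(v * fv[:, None], axis=0) / d

def main_scheme(f, x0, r0=1.0, r_min=1e-4, eps=1e-6, max_iter=500):
    """Main scheme sketch: step along -a(x; r, ...; f); shrink r when the
    approximate gradient is small or its direction does not decrease f."""
    x, r = np.asarray(x0, float), r0
    for _ in range(max_iter):
        a = approx_grad(f, x, r)
        # Crude one-dimensional minimization along -a (our choice of rule).
        alphas = np.geomspace(1e-4, 1.0, 25)
        trials = [f(x - al * a) for al in alphas]
        best = int(np.argmin(trials))
        if np.linalg.norm(a) < eps or trials[best] >= f(x):
            r *= 0.5                      # no progress: refine the radius
            if r < r_min:
                break
        else:
            x = x - alphas[best] * a
    return x

# The discontinuous function from the first example later in this section;
# its minimum point is (0.1, 4.5).
f = lambda z: (2*(z[0]-0.1)**2 + (z[1]-4.5)**2
               if z[0] + z[1] <= 5 and z[0] >= 0 and z[1] >= 0
               else 3*(z[0]-0.1)**2 + 2*(z[1]-4.5)**2)
print(main_scheme(f, [3.0, 1.0]))
```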
These methods perform well when solving nonsmooth and discontinuous optimization problems. The authors have presented computational results at international workshops and published them in several papers (Batukhtin and Maiboroda, 1984, 1995; Batukhtin, et al., 1997; Bigil'deeva and Rolshchikov, 1994).

In the main scheme of the approximate gradient methods, the computation of $a(x^{(k)}; r^{(k)}, p^{(k)}, s^{(k)}; f)$ is performed on a certain uniform lattice in the integration domain. Here the function values at all nodes are taken with given (most often equal) weights. However, in this case it is not always possible to find a fairly accurate solution of discontinuous problems where: (a) a point of minimum is on a discontinuity surface, or (b) the function is decreasing along the discontinuity. This is conditioned by the fact that the approximate gradient norm converges to infinity at a discontinuity point when the radius of the domain converges to zero. Hence, at such points the approximate gradient "turns back" in the direction perpendicular to the discontinuity line, hampering the motion along it.

That is why adaptive algorithms were constructed. The main idea of the adaptive algorithms is to correct the initial approximation of the approximate gradient by adding new nodes. Each new, $(m+1)$-th, node is chosen with due account of the direction of the approximate gradient approximation obtained on $m$ nodes. Then the approximate gradient is recomputed for the $(m+1)$ points, which defines the choice of the next node.

Let us derive computational formulae for one of the node accumulation methods. For this, let us replace the approximate gradient $a(x^{(k)}; r^{(k)}, p^{(k)}, s^{(k)}; f)$ at the point $x$, for $r > 0$, by the following integral sum:

$$ a(x^{(k)}; r^{(k)}, p^{(k)}, s^{(k)}; f) \approx a^{(m)} = \frac{1}{d_m} \big( g^{(m)} - \bar{f}\, \bar{s} \big), $$

where $s^{(i)}$ are nodes in the ball $B_r$ $(0 \le i \le m)$; $f_i = f(x + s^{(i)})$;

$$ \lambda_i = \Big( \int_{B_r} p_r(s)\, \mu(ds) \Big)^{-1} \int_{D_i} p_r(s)\, \mu(ds) > 0 $$

are weight coefficients, $\sum_{i=0}^{m} \lambda_i = 1$; $D_i$ are subdomains of $B_r$ such that $\bigcup_{i=0}^{m} D_i = B_r$ and $D_i \cap D_j = \emptyset$ for $i \ne j$ $(0 \le i \le m,\ 0 \le j \le m)$; and

$$ \bar{s} = \sum_{i=0}^{m} \lambda_i s^{(i)}, \quad d_m = \sum_{i=0}^{m} \| s^{(i)} - \bar{s} \|^2 \lambda_i, \quad \bar{f} = \sum_{i=0}^{m} \lambda_i f_i, \quad g^{(m)} = \sum_{i=0}^{m} \lambda_i f_i s^{(i)}. $$

The initial approximation of the approximate gradient is computed from $(n+1)$ points, where $n$ is the space dimension, situated on the coordinate axes, that is,

$$ s^{(0)} = (0; \ldots; 0)^T, \quad s^{(1)} = (r; 0; \ldots; 0)^T, \quad \ldots, \quad s^{(n)} = (0; \ldots; 0; r)^T. $$

The weight coefficients $\lambda_i$ are chosen equal to $\frac{1}{n+1}$. The next node is chosen on the surface of the ball $B_r$ so that

$$ s^{(m+1)} = -r\, \frac{a^{(m)}}{\| a^{(m)} \|}. $$

When computing a new approximation of the approximate gradient, the weight coefficient for an additional node is assumed equal to one and the same number $\beta$ $(0 < \beta < 1)$. Then

$$ \bar{s}^{(m+1)} = (1 - \beta)\, \bar{s}^{(m)} + \beta\, s^{(m+1)}, \qquad \bar{f}^{(m+1)} = (1 - \beta)\, \bar{f}^{(m)} + \beta\, f_{m+1}, $$

$$ g^{(m+1)} = (1 - \beta)\, g^{(m)} + \beta\, f_{m+1}\, s^{(m+1)}, \qquad d_{m+1} = (1 - \beta) \big[ d_m + \beta\, \| s^{(m+1)} - \bar{s}^{(m)} \|^2 \big]. $$

As a result, $(1 - \beta)$ plays the role of a certain "memory" coefficient for the mean values $\bar{s}^{(m)}$, $\bar{f}^{(m)}$, $d_m$ and $g^{(m)}$. It should be noted that, though the adaptive algorithms contain heuristic elements and there is as yet no proof of their convergence, the idea used in them provides the lacking flexibility to the main scheme of the approximate gradient methods. This is first of all connected with the fact that the procedure of node accumulation makes it possible to construct approximations of $s_r$ and of the weight function $p_r(s)$ for which the approximate gradient norm is minimal.

The procedure of accumulating the nodes and correcting the approximate gradient is continued until one of the following three conditions is observed:

(1) the approximate gradient approximation gives a node at which the function value is not less than at the previous one;
(2) the approximate gradient norm is less than a number given beforehand;
(3) the number of nodes reaches the maximum permissible one.

Otherwise, the adaptive algorithms are similar to the main scheme of the approximate gradient methods. If for the given $r$ the accumulation of nodes does not lead to constructing a direction of decrease of the function $f$, then $r$ is subdivided until it reaches the minimum permissible value or the desired direction is found. Finding the point in the direction of function decrease is performed by one-dimensional minimization.
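The recursive updates above translate directly into code. The following sketch (ours, not the authors' implementation; the stopping thresholds are illustrative) accumulates nodes and maintains only the averaged characteristics $\bar{s}$, $\bar{f}$, $d_m$, $g^{(m)}$:

```python
import numpy as np

def adaptive_approx_grad(f, x, r, beta=0.3, max_nodes=50, eps=1e-8):
    """Adaptive node accumulation for the approximate gradient (our sketch).
    Keeps only the averaged characteristics s_bar, f_bar, d, g, updating
    them recursively with "memory" coefficient (1 - beta)."""
    n = len(x)
    # Initial (n+1) nodes on the coordinate axes, equal weights 1/(n+1).
    nodes = [np.zeros(n)] + [r * np.eye(n)[i] for i in range(n)]
    lam = 1.0 / (n + 1)
    fs = np.array([f(x + s) for s in nodes])
    s_bar = lam * sum(nodes)
    f_bar = lam * fs.sum()
    d = lam * sum(np.sum((s - s_bar) ** 2) for s in nodes)
    g = lam * sum(fi * s for fi, s in zip(fs, nodes))
    a = (g - f_bar * s_bar) / d
    f_prev = min(fs)
    for _ in range(max_nodes):                 # condition (3)
        if np.linalg.norm(a) < eps:            # condition (2)
            break
        s_new = -r * a / np.linalg.norm(a)     # next node opposite to a^(m)
        f_new = f(x + s_new)
        if f_new >= f_prev:                    # condition (1)
            break
        f_prev = f_new
        # Recursive updates; d must use the old s_bar, so update it first.
        d = (1 - beta) * (d + beta * np.sum((s_new - s_bar) ** 2))
        s_bar = (1 - beta) * s_bar + beta * s_new
        f_bar = (1 - beta) * f_bar + beta * f_new
        g = (1 - beta) * g + beta * f_new * s_new
        a = (g - f_bar * s_bar) / d
    return a
```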
At the same time, in the adaptive algorithms the initial approximation of the approximate gradient can be fairly rough and contain only information on function values in linearly independent directions. A special feature of this kind of algorithm is also that it is sufficient to preserve only the averaged characteristics of the set of nodes in order to correct the direction of the approximate gradient.

Experience shows that the adaptive algorithms appear more economical than the main scheme in the number of function computations and provide more accurate solutions of the problems. Let us consider some examples to illustrate the work of the adaptive algorithms. We shall use the following notation: $x^0$ is a point of minimum; $\tilde{x}$ is the obtained approximation of the point of minimum; $\delta_f$ and $\delta_x$ are the attained accuracies in the function value and in the argument, respectively; $k_f$ is the number of function evaluations.

So, for example, for the discontinuous function

$$ f(x_1, x_2) = \begin{cases} 2(x_1 - 0.1)^2 + (x_2 - 4.5)^2, & \text{if } x_1 + x_2 \le 5,\ 0 \le x_1,\ 0 \le x_2, \\ 3(x_1 - 0.1)^2 + 2(x_2 - 4.5)^2, & \text{otherwise,} \end{cases} $$

the main scheme provides $\delta_f = 10^{-6}$, $\delta_x = 10^{-3}$, $k_f = 1258$, while the adaptive algorithm provides $\delta_f = 10^{-9}$, $\delta_x = 10^{-4}$, $k_f = 776$.

For the smooth function

$$ f(x_1, x_2) = \begin{cases} 0, & \text{if } x_1^2 + x_2^2 = 0, \\ 5 \cdot 10^{-5}\, (x_1^2 + x_2^2) \Big( 99 \sin \dfrac{1}{\sqrt{x_1^2 + x_2^2}} + 101 \Big), & \text{if } x_1^2 + x_2^2 \ne 0, \end{cases} $$

at a sufficiently big initial $r$ the adaptive algorithm finds the absolute minimum $(0; 0)^T$ with good accuracy: $\delta_f = 10^{-9}$, $\delta_x = 6 \cdot 10^{-3}$, $k_f = 1953$. At the same time, for the smooth methods the points of local minimum that are concentrated in a neighborhood of the point $(0; 0)^T$ appear to be "traps". For example, from the same initial point the conjugate gradient method terminated its work with $\delta_f = 1.3544$.

As in the previous example, the point of absolute minimum of the discontinuous function

$$ f(x_1, x_2) = 0, \ \text{if } x_1^2 + x_2^2 = 0; \qquad f(x_1, x_2) = -\sqrt{x_1^2 + x_2^2} + 2 \cdot 10^{-j} + 1, \ \text{if } 10^{-j} \le \sqrt{x_1^2 + x_2^2} < 10^{-j+1},\ j = 1, 2, \ldots $$

is the origin, in the neighborhood of which the discontinuities are concentrated. Moreover, each point of discontinuity of the function is a point of substantial local minimum as well as maximum. For this problem the adaptive algorithm with initial $r = 1$ finds the point of absolute minimum with the following accuracy: $\delta_f = 10^{-6}$, $\delta_x = 10^{-6}$, $k_f = 4558$. Fig. 2 shows the motion trajectory of the algorithm.

Fig. 2.

The following two examples belong to problems of conditional optimization. The solution in them is constructed with the help of the extended function method.

Let $(\varphi, \varrho)$ be a polar coordinate system. Consider two logarithmic spirals $\varrho_1 = e^{\varphi}$ and $\varrho_2 = e^{2\varphi}$. If for a fixed angle $\varphi$ the point $x_1 = \varrho \cos\varphi$, $x_2 = \varrho \sin\varphi$ has $\varrho \in [\varrho_1; \varrho_2]$, we say that the point $(x_1, x_2)^T$ belongs to the set $U$.

Consider the problem of conditional minimization

$$ F(x_1, x_2) = x_1^2 + x_2^2 \to \min_{(x_1, x_2)^T \in U}. $$

To solve it with the adaptive algorithm the function $F$ was extended in the following way:

$$ f(x_1, x_2) = \begin{cases} F(x_1, x_2), & \text{if } (x_1, x_2)^T \in U, \\ F(x_1, x_2) + 1000, & \text{otherwise,} \end{cases} $$

and the minimum of the latter was found with the result $\delta_f = 10^{-6}$, $\delta_x = 10^{-4}$, $k_f = 3193$. Fig. 3 and Fig. 4 show the level lines of the function $f$, as well as the whole trajectory of the algorithm motion and its fragment in the neighborhood of the point of minimum at the bottom of the twisting ravine.

Fig. 3.

Fig. 4.
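The extended-function device used in this example is a one-line wrapper in code. The following sketch is ours; in particular, the membership test collapses the multi-turn spirals to a single branch of the polar angle, so it only approximates the true set $U$:

```python
import numpy as np

def in_spiral_set(x1, x2):
    """Approximate membership test for U between rho = e^phi and rho = e^{2 phi}.
    Using a single branch of arctan2 for phi is our simplification."""
    rho = np.hypot(x1, x2)
    if rho == 0.0:
        return False
    phi = np.arctan2(x2, x1)
    return np.exp(phi) <= rho <= np.exp(2 * phi)

def extended(F, member, penalty=1000.0):
    """Extended function: f = F on U and F + penalty outside U, as above."""
    return lambda x: F(x) + (0.0 if member(x[0], x[1]) else penalty)

F = lambda x: x[0] ** 2 + x[1] ** 2
f = extended(F, in_spiral_set)   # minimize f with the adaptive algorithm
```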
In the second example of conditional minimization it is required to minimize the function $F(x_1, x_2) = x_2^3$ in the domain $U$ pressed between the circle and the ellipse at $x_1 \ge 0$ (see Fig. 5). The curve ABCEFDA is the boundary of $U$. The point $(0; 11)$ is the point of maximum and $(0; -11)$ is the point of minimum of the function $F$ on $U$. In addition, the function $F$ is constant on the straight lines parallel to the axis $Ox_1$, and on the axis $Ox_1$ (segment CD in Fig. 5) it has its point of inflection. Outside the domain $U$ the function $F$ is extended by sufficiently large constants and a parabolic function:

$$ f(x_1, x_2) = x_2^3, \ \text{if } (x_1, x_2)^T \in U; \qquad f(x_1, x_2) = 11 x_2^2 + 1, \ \text{outside the ellipse for } x_1 \ge 0; $$

$$ f(x_1, x_2) = 1100, \ \text{if } x_1^2 + x_2^2 < 100 \ \text{and } x_1 \ge 0; \qquad f(x_1, x_2) = 1010, \ \text{if } x_1 < 0. $$

Fig. 5.

With such an extension of the function $F$, the points of the curves ABC and DFE (Fig. 5) are saddle points of the function $f$. Fig. 6 shows the general view of the motion trajectory of the adaptive algorithm along the boundary of the domain $U$ from the point of maximum of the function $F$, together with the level lines of $f$. Fig. 7 shows a fragment of the algorithm passing the narrow part of the domain $U$. For this problem $\delta_f = 10^{-3}$, $\delta_x = 10^{-2}$, $k_f = 8853$.

Fig. 6.

Fig. 7.

REFERENCES

Batukhtin, V.D. and L.A. Maiboroda (1984). Optimization of Discontinuous Functions. Nauka, Moscow.
Batukhtin, V.D. and L.A. Maiboroda (1995). Discontinuous Extremal Problems. Gippocrat, St. Petersburg.
Batukhtin, V.D. (1993). On Solving Discontinuous Extremal Problems. Journal of Optimization Theory and Applications, 77, 575-589.
Bigil'deev, S.I. (1996). Approximated Derivative as a Multivalued Mapping. Vestn. Chelyabinsk State University, 1, 21-33.
Bigil'deev, S.I. and V.E. Rolshchikov (1997). Properties of Approximate Gradient in Depending on Weight Functions. Journal of Computer and System Sciences, 4, 89-94.
Batukhtin, V.D., S.I. Bigil'deev and T.B. Bigil'deeva (1997). Numerical Methods for Solution of Discontinuous Extremal Problems. Journal of Computer and System Sciences, 3, 113-120.
Liusternik, L.A. and V.I. Sobolev (1965). Elements of Functional Analysis. Nauka, Moscow.
Kolmogorov, A.N. and S.V. Fomin (1989). Elements of the Theory of Functions and Functional Analysis. Nauka, Moscow.
Clarke, F.H. (1983). Optimization and Nonsmooth Analysis. Wiley, New York.
Pschenichny, B.N. (1980). Convex Analysis and Extremal Problems. Nauka, Moscow.
Bazaraa, M.S. and C.M. Shetty (1979). Nonlinear Programming: Theory and Algorithms. Wiley, New York.
Vasilyev, F.P. (1988). Numerical Methods of Solving Extremal Problems. Nauka, Moscow.
Bigil'deeva, T.B. and V.E. Rolshchikov (1994). Numerical Methods of Optimization of Discontinuous Functions. News of the Russian Academy of Sciences, Technical Cybernetics, 3, 47-54.