THE METHOD OF PENALTY FUNCTIONS FOR NON-LINEAR PROGRAMMING PROBLEMS*
V. D. SKARIN
Sverdlovsk
(Received 10 July 1972)

A connection is established between the initial problem of mathematical programming and an associated problem involving a penalty; the discussion hinges mainly on an estimation-type approach to the topic. The conditions to be met by the parameter and the penalty function in order for the two problems to be equivalent are determined.

The present paper aims at establishing the connection between the problems

$$\min \{ f_0(x) \mid x \in M \cap C \} \tag{0.1}$$

and

$$\inf \{ f_0(x) + \Phi(x, r) \mid x \in C \}, \tag{0.2}$$

where $C$ is a convex set of a given space $E$, the set $M$ is defined by a system of functional inequalities, and $\Phi(x, r)$ is a suitably chosen parametric penalty function.

In view of its universality, the method of penalty functions has received considerable attention in recent years. Russian papers on the topic include [1-5]. An exhaustive survey of foreign papers is to be found in [6].
In most papers, the study of the method of penalty functions has been confined to proving its asymptotic convergence. The following two points are of obvious interest. First, we need to discover the cases in which the solutions of problems (0.1) and (0.2) are the same, starting with a certain value of the parameter; and second, if the solutions are not the same, then we want to estimate the rate of asymptotic convergence of the method.

It appears that the first paper in which an estimation-type approach was applied to the method of penalties for convex programming problems was [2]. The same approach was applied in [3] for mathematical programming problems under constraints of the equation type and with a quadratic penalty.

*Zh. vychisl. Mat. mat. Fiz., 13, 5, 1186-1199, 1973.
In the present paper, in the context of convex programming problems, we obtain estimates for the closeness of the solution of problem (0.2) to the solution of (0.1), for a wide class of spaces and with a quite general structure of the penalty function. This discussion is to be found in Section 1. In Section 2 we examine similar aspects of the behaviour of the solution of the problem with a penalty, in the neighbourhood of a solution of the general mathematical programming problem in a reflexive Banach space. The statement of this topic is similar to that used in [3, 7, 8].
1. The case of convex programming

Let $E$ be a linear topological space. We consider the problem of finding

$$\min \{ f_0(x) \mid x \in M \cap C \}, \tag{1.1}$$

where $M = \{x \mid f_j(x) \le 0,\ j = 1, 2, \dots, q\}$, the $f_j(x)$ are continuous convex functionals defined in $E$, $j = 0, 1, \dots, q$, and $C$ is a solid convex set in $E$. We shall assume that problem (1.1) is solvable, that $\bar x$ is one of its optimal points, and that the following regularity condition holds for (1.1):

$$\exists\, x^0 \in C^0 \cap M:\ f_j(x^0) < 0 \ \text{ for non-linear } f_j(x), \tag{1.2}$$

where $C^0$ is the interior of the set $C$. We denote by $E^*$ the space conjugate to $E$; by $\partial f_j(y)$ the set of support functionals to $f_j(x)$ at the point $y$, i.e., $\partial f_j(y) = \{e \in E^* \mid f_j(x) - f_j(y) \ge (e, x - y),\ \forall x \in E\}$, $j = 0, 1, \dots, q$; and by $\partial C_y$ the set of support functionals to $C$ at the point $y$, i.e., $\partial C_y = \{e \in E^* \mid (e, x - y) \le 0,\ \forall x \in C\}$. The conditions listed above are sufficient [9] for the existence of functionals $l_j \in \partial f_j(\bar x)$, $j = 0, 1, \dots, q$, $l \in \partial C_{\bar x}$ and numbers $\lambda_j \le 0$, $j = 1, 2, \dots, q$, such that

$$l_0 = \sum_{j=1}^{q} \lambda_j l_j - l, \qquad \lambda_j f_j(\bar x) = 0, \quad j = 1, 2, \dots, q. \tag{1.3}$$
We construct the functional

$$F(x, r) = f_0(x) + \Phi(x, r),$$

where $\Phi(x, r) = \varphi[z(x, r)]$, $z(x, r) = (z_1(x, r), \dots, z_q(x, r))$, $z_j(x, r) = r_j f_j^+(x)$, $j = 1, 2, \dots, q$, $f_j^+(x) = \max\{0, f_j(x)\}$, $r = (r_1, \dots, r_q) > 0$. We shall assume that $\varphi(z)$ is a convex function, defined for points $z \ge 0$ and satisfying the conditions

$$\varphi(0) = 0, \qquad \min_{\|z\| = 1} \frac{\partial \varphi(0)}{\partial z} = \nu > 0 \tag{1.4}$$

(the symbol $\|\cdot\|$ denotes the norm of an element in the appropriate normed space, in our case, in $E^q$). We consider the problem of finding

$$\inf_{x \in C} F(x, r). \tag{1.5}$$

Theorem 1

Under the conditions stated above, if (1.5) is solvable, and we have $r_j \ge (\sqrt{q}/\nu)\,|\lambda_j|$, $j = 1, 2, \dots, q$, then

$$\bar m = m(r), \tag{1.6}$$

where $\bar m$ and $m(r)$ are the optimal values of problems (1.1) and (1.5) respectively, $\lambda_j$ comes from Eqs. (1.3), and $\nu$ from Eqs. (1.4); if $r_j > (\sqrt{q}/\nu)\,|\lambda_j|$, $j = 1, 2, \dots, q$, the optimal sets of problems (1.1) and (1.5) are identical.

Proof: Since $\bar x \in M$, we have $z(\bar x, r) = 0$, and from Eqs. (1.4), $F(\bar x, r) = f_0(\bar x) = \bar m$. Hence, for any $r > 0$,

$$\inf_{x \in C} F(x, r) \le \bar m. \tag{1.7}$$
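Let $x'$ be an element of $C$. Writing $z'$ for $z(x', r)$, and recalling Eqs. (1.3), we have

$$\begin{aligned}
F(x', r) &= f_0(x') + \varphi(z') \ge f_0(\bar x) + (l_0, x' - \bar x) + \varphi(z') \\
&= \bar m + \sum_{j=1}^{q} \lambda_j (l_j, x' - \bar x) - (l, x' - \bar x) + \varphi(z') \\
&\ge \bar m - \sum_{j=1}^{q} |\lambda_j| (l_j, x' - \bar x) + \varphi(z') \\
&\ge \bar m - \sum_{j=1}^{q} |\lambda_j| [f_j(x') - f_j(\bar x)] + \varphi(z') = \bar m - \sum_{j=1}^{q} |\lambda_j| f_j(x') + \varphi(z').
\end{aligned} \tag{1.8}$$

For the derivative $\partial\varphi(0)/\partial z$ we have [10]

$$\frac{\partial \varphi(0)}{\partial z} = \max_{c \in \partial\varphi(0)} (c, z),$$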
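where $\partial\varphi(0)$ is the set of support functionals to $\varphi(z)$ at the point $z = 0$, i.e., $\partial\varphi(0) = \{c \mid \varphi(z) \ge (c, z) \text{ for all } z \ge 0\}$. Hence, by (1.4),

$$\varphi(z) \ge \frac{\partial \varphi(0)}{\partial z} \ge \nu \|z\|. \tag{1.9}$$

On applying successively (1.9) and the inequality

$$\|z\| \ge \frac{1}{\sqrt{q}} \sum_{j=1}^{q} z_j, \qquad z \ge 0,$$

which follows from the Cauchy inequality, we obtain from (1.8):

$$F(x', r) \ge \bar m - \sum_{j=1}^{q} |\lambda_j| f_j^+(x') + \frac{\nu}{\sqrt{q}} \sum_{j=1}^{q} r_j f_j^+(x') = \bar m + \frac{\nu}{\sqrt{q}} \sum_{j=1}^{q} \Big( r_j - \frac{\sqrt{q}}{\nu} |\lambda_j| \Big) f_j^+(x'). \tag{1.10}$$

Using the hypotheses of the theorem, (1.10) gives $F(x', r) \ge \bar m$, whence, since $x' \in C$ is arbitrary,

$$m(r) = \inf_{x \in C} F(x, r) \ge \bar m. \tag{1.11}$$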
Combining (1.7) and (1.11), we arrive at Eq. (1.6). Next, denote by $\bar M$ and $\bar M(r)$ the optimal sets of problems (1.1) and (1.5) respectively, and let $r_j > (\sqrt{q}/\nu)\,|\lambda_j|$, $j = 1, \dots, q$. From what has been proved above, $\bar M \subset \bar M(r)$. Conversely, let $x(r) \in \bar M(r)$. It is clear from (1.10) (with $x' = x(r)$) that $m(r) = F[x(r), r] = \bar m$ only under the condition $f_j^+[x(r)] = 0$, $j = 1, 2, \dots, q$, i.e., only if $x(r)$ is an admissible point for problem (1.1). Hence $f_0[x(r)] = F[x(r), r] = f_0(\bar x)$, i.e., $\bar M(r) \subset \bar M$. The theorem is proved.

Notice that the situations covered by the above theorem occur naturally. For instance, the hypotheses are satisfied by the following functions, widely employed in the method of penalty functions:

$$\Phi_1(x, r) = \sum_{j=1}^{q} r_j f_j^+(x), \qquad \Phi_2(x, r) = \max_{j} r_j f_j^+(x).$$
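The exactness stated in Theorem 1 can be checked numerically. The sketch below uses a hypothetical one-dimensional problem that does not appear in the paper: $f_0(x) = x^2$, $f_1(x) = 1 - x$, $C = [-2, 3]$. Its solution is $\bar x = 1$ with $\bar m = 1$ and multiplier $|\lambda_1| = 2$, so for $q = 1$ and $\varphi(z) = z$ (where $\nu = 1$) the threshold of the theorem is $r_1 = 2$. The grid search and all function names are assumptions made for the illustration only.

```python
def f0(x):
    """Objective of the toy problem: f0(x) = x^2."""
    return x * x


def f1(x):
    """Constraint functional: f1(x) = 1 - x <= 0, i.e. x >= 1."""
    return 1.0 - x


def penalised(x, r):
    """F(x, r) = f0(x) + r * f1^+(x): a linear penalty with q = 1."""
    return f0(x) + r * max(0.0, f1(x))


def grid_minimise(func, lo=-2.0, hi=3.0, n=50001):
    """Brute-force minimisation over a uniform grid on C = [lo, hi]."""
    best_x, best_val = lo, func(lo)
    for i in range(1, n):
        x = lo + (hi - lo) * i / (n - 1)
        val = func(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val


def solve(r):
    """Return (x(r), m(r)) for the penalised problem."""
    return grid_minimise(lambda x: penalised(x, r))


if __name__ == "__main__":
    for r in (1.0, 3.0):
        x_r, m_r = solve(r)
        print(f"r = {r}: x(r) = {x_r:.4f}, m(r) = {m_r:.4f}")
```

With $r = 1$, below the threshold, the penalised minimum sits at the inadmissible point $x = 0.5$ with value $0.75 < \bar m$; with $r = 3$ it coincides with $\bar x = 1$ and $m(r) = \bar m = 1$.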
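It can be shown in general that, if for the function

$$\varphi(z) = \varphi(z_1, \dots, z_q) = \max_{i} (c_i, z), \qquad c_i = (c_{i1}, \dots, c_{iq}), \quad z \ge 0, \quad i = 1, \dots, s,$$

numbers $a_i > 0$ exist such that

$$\dots$$

then $\varphi(z)$ will satisfy Eqs. (1.4).

Assume that $\varphi = \varphi(z)$, defined for $z = (z_1, \dots, z_q) \ge 0$, is a uniformly convex function of $z$, or in other words, that a monotonically increasing function $\delta(\tau)$, $\tau \ge 0$, $\delta(0) = 0$, exists, such that

$$\varphi\Big(\frac{z_1 + z_2}{2}\Big) \le \frac{1}{2}\varphi(z_1) + \frac{1}{2}\varphi(z_2) - \delta(\|z_1 - z_2\|) \tag{1.12}$$

for any $z_1 \ge 0$, $z_2 \ge 0$. In addition, let $\delta(\tau)$ be continuous in its domain of definition and let

$$\varphi(z) \ge 0 \ \text{ for all } z \ge 0, \qquad \varphi(0) = 0. \tag{1.13}$$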
Theorem 2

If, in problem (1.5), the function $\varphi(z)$ is uniformly convex and satisfies (1.13), then there will exist, for any $0 < \omega < 1$, an $R > 0$ such that problem (1.5) is solvable and

$$\dots \le m(r) \le \bar m \tag{1.14}$$

for $r_0 \ge R$, where $r_0 = \min_j r_j$, and the $\lambda_j$, $j = 1, 2, \dots, q$, come from Eqs. (1.3).

Proof: The right-hand inequality of (1.14) is obvious. Consider an arbitrary point $x' \in C$ and write $z' = z(x', r)$. From (1.12) with $z_1 = z'$, $z_2 = 0$ we have

$$\varphi(z') \ge 2\varphi\Big(\frac{z'}{2}\Big) + 2\delta(\|z'\|). \tag{1.15}$$

Using this and applying the Cauchy inequality, we find that, in accordance with Eq. (1.8),

$$\dots \tag{1.16}$$

We set

$$\dots$$
where $\delta^{-1}(\beta)$ is the inverse function to $\delta = \delta(\tau)$; $h = \min_{\|z\| = 1} \varphi(z) > 0$. Let $r_0 \ge R$. There are three possible cases.

Case 1.

$$\dots$$

In this case we have from (1.16):

$$F(x', r) \ge \bar m - \dots$$

Case 2.

$$\dots$$

Since

$$\dots$$

we have

$$\dots$$

Hence

$$\dots$$

In view of this and (1.16), $F(x', r) \ge \bar m$.

Case 3.

$$\Big[ \sum_{j=1}^{q} f_j^{+2}(x') \Big]^{1/2} > 1.$$

In view of the inequality $r_0 > 1$ we always have $\|z'\| > 1$. Since the function $\varphi(z)$ is strictly convex and satisfies (1.13), we have $\varphi(z') \ge h\|z'\|$. From (1.16),

$$F(x', r) \ge \bar m - \Big( \sum_{j=1}^{q} \frac{\lambda_j^2}{r_j^2} \Big)^{1/2} \|z'\| + h\|z'\|,$$

whence $F(x', r) \ge \bar m$ for $r_0 \ge R$. Thus, for all $x' \in C$

$$\dots$$
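for $r_0 \ge R$, whence the theorem follows.

Corollary

$$\lim_{r_0 \to \infty} m(r) = \bar m.$$

Let $\varphi(z)$ be a strongly convex function of $z$, i.e., we have $\delta(\tau) = \gamma\tau^2$, $\gamma > 0$ in (1.12).

Theorem 3

If the function $\varphi(z)$ of problem (1.5) is strongly convex, problem (1.5) will be solvable for any $r > 0$ and

$$\bar m - \frac{1}{8\gamma} \sum_{j=1}^{q} \frac{\lambda_j^2}{r_j^2} \le m(r) \le \bar m,$$

where the $\lambda_j$, $j = 1, 2, \dots, q$, come from Eqs. (1.3).

Proof: Taking an arbitrary $x' \in C$ we consider (1.8) and use the Cauchy inequality and (1.15). We get

$$\begin{aligned}
F(x', r) &\ge \bar m - \sum_{j=1}^{q} |\lambda_j| f_j^+(x') + 2\gamma\|z'\|^2 \ge \bar m - \Big( \sum_{j=1}^{q} \frac{\lambda_j^2}{r_j^2} \Big)^{1/2} \|z'\| + 2\gamma\|z'\|^2 \\
&= \bar m + \Big[ \sqrt{2\gamma}\,\|z'\| - \frac{1}{2\sqrt{2\gamma}} \Big( \sum_{j=1}^{q} \frac{\lambda_j^2}{r_j^2} \Big)^{1/2} \Big]^2 - \frac{1}{8\gamma} \sum_{j=1}^{q} \frac{\lambda_j^2}{r_j^2} \ge \bar m - \frac{1}{8\gamma} \sum_{j=1}^{q} \frac{\lambda_j^2}{r_j^2}.
\end{aligned}$$

The theorem is proved.

It will not be assumed below that condition (1.2) holds. We introduce the problem of finding

$$\min \{ f_0(x) \mid x \in M_\varepsilon \cap C \}, \tag{1.17}$$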
where $M_\varepsilon = \{x \mid f_j(x) \le \varepsilon_j,\ \varepsilon_j > 0,\ j = 1, 2, \dots, q\}$. We shall assume that the following condition is satisfied: if problem (1.1) is solvable, then (1.17) is also solvable, where

$$m_\varepsilon \to \bar m \quad \text{as} \quad \max_j \varepsilon_j \to 0, \tag{1.18}$$

and $m_\varepsilon$ is the optimal value of problem (1.17) (the condition (*) in the notation of [11]).

Notice that, to satisfy the condition (*) in finite-dimensional space, it is sufficient to require that the set of optimal solutions of problem (1.1) be non-empty and bounded. Other conditions guaranteeing that (1.18) holds may be found in [12].

Let $C^0 \cap M \ne \varnothing$. We denote by $x_\varepsilon$ a solution of (1.17). Since problem (1.17) already satisfies the condition (1.2), we can find, in the same way as above, functionals $l_j^\varepsilon \in \partial f_j(x_\varepsilon)$, $j = 0, 1, \dots, q$, $l^\varepsilon \in \partial C_{x_\varepsilon}$, and numbers $u_j^\varepsilon \le 0$, $j = 1, 2, \dots, q$, such that

$$l_0^\varepsilon = \sum_{j=1}^{q} u_j^\varepsilon l_j^\varepsilon - l^\varepsilon, \qquad u_j^\varepsilon [f_j(x_\varepsilon) - \varepsilon_j] = 0, \quad j = 1, 2, \dots, q. \tag{1.19}$$
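Lemma 1

The numbers $u_j^\varepsilon$, $j = 1, 2, \dots, q$, of Eqs. (1.19) satisfy the relationships

$$\lim_{\varepsilon_j \to 0} \varepsilon_j u_j^\varepsilon = 0.$$

Proof: Using the convexity of the functions $f_j(x)$, $j = 0, 1, \dots, q$, and (1.19), we have

$$\bar m = f_0(\bar x) \ge f_0(x_\varepsilon) + (l_0^\varepsilon, \bar x - x_\varepsilon) \ge m_\varepsilon - \sum_{j=1}^{q} |u_j^\varepsilon| [f_j(\bar x) - f_j(x_\varepsilon)] \ge m_\varepsilon + \sum_{j=1}^{q} \varepsilon_j |u_j^\varepsilon|.$$

From this, in conjunction with the condition (*), the required equation follows.

Let $\varphi(z) = \varphi(z_1, \dots, z_q)$ be a continuous convex function, defined for $z \ge 0$ and such that

$$\varphi(0) = 0, \qquad \dots \tag{1.20}$$

Lemma 2

If $\varphi(z)$ satisfies the relationships (1.20), we have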
$$\varphi(z') \ge \varphi(z_1', \dots, z_{i-1}', 0, z_{i+1}', \dots, z_q').$$

Proof: Let $z' \ge 0$ while $z_i' > 0$. We introduce the notation $z'(i) = (z_1', \dots, z_{i-1}', 0, z_{i+1}', \dots, z_q')$. Given any $\varepsilon > 0$ and noting that $\varphi(z)$ is continuous, we can find a point $z^0$ in the interval $[z', z'(i)]$, different from $z'(i)$ and $z'$ and such that $\varphi(z^0) > \varphi[z'(i)] - \varepsilon$. From the properties of the directional derivative of a convex function [10] and (1.20), we have for any $h > 0$

$$\frac{\varphi(z^0 + h s_i) - \varphi(z^0)}{h} \ge \frac{\partial \varphi(z^0)}{\partial s_i} \ge 0,$$

where $s_i$ is the $i$-th unit direction. Hence, with $h = z_i' - z_i^0$,

$$\varphi(z') \ge \varphi(z^0) > \varphi[z'(i)] - \varepsilon,$$

which proves the lemma, since $\varepsilon$ is arbitrary.

Theorem 4
Let the condition (*) be satisfied for problem (1.1), let the $\varphi(z)$ in problem (1.5) satisfy (1.20), and let $r_j = \rho$, $j = 1, 2, \dots, q$. Then,

$$\lim_{\rho \to \infty} m(r) = \bar m.$$

Proof: We set $\varepsilon_j = \varepsilon = 1/\sqrt{\rho}$ in problem (1.17). We take an arbitrary element $x' \in C$. As before, let $z(x', r) = z' = (z_1', \dots, z_q')$. Using (1.19), and proceeding as in the proof of Theorem 1, we get

$$F(x', r) \ge m_\varepsilon - \sum_{j=1}^{q} |u_j^\varepsilon| f_j^+(x') + \varphi(z'). \tag{1.21}$$

Consider the derivatives of $\varphi(z)$ at the point $z^* = (1, 1, \dots, 1)$ with respect to the unit directions $s_j$, $j = 1, 2, \dots, q$. We have

$$\frac{\partial \varphi(z^*)}{\partial s_j} = \max_{c \in \partial\varphi(z^*)} (c, s_j),$$

where $\partial\varphi(z^*) = \{c = (c_1, \dots, c_q) \mid \varphi(z) - \varphi(z^*) \ge (c, z - z^*) \text{ for all } z \ge 0\}$. We denote the vector of $\partial\varphi(z^*)$ with the maximum $j$-th component by $c^j = (c_{j1}, \dots, c_{jq})$. Let the vector $z^{(j)\prime} = (0, \dots, 0, z_j', 0, \dots, 0)$. Then $\partial\varphi(z^*)/\partial z^{(j)\prime} = (c^j, z^{(j)\prime}) = c_{jj} z_j'$, where, by virtue of (1.20), $c_{jj} > 0$. By Lemma 2 and the definition of the vector $c^j$,

$$\dots$$

Summing these inequalities over $j$, we get

$$\dots \tag{1.22}$$
There are two possible cases.

Case 1.

$$\dots$$

From (1.21) and (1.22),

$$\dots$$

From this, by Lemma 1, with some $R_0$ we have

$$F(x', r) \ge m_\varepsilon \quad \text{for } \rho \ge R_0. \tag{1.23}$$

Case 2.

$$\dots$$

In this case,

$$F(x', r) \ge m_\varepsilon - \sum_{j=1}^{q} \dots$$

Hence, with $\rho \ge R_1 = \max \dots$ we get

$$\dots \tag{1.24}$$

By Lemma 1,

$$\dots \ge 0$$

for some $R_2$ and $\rho \ge R_2$. Let $\rho \ge \max\{R_1, R_2\}$. If $x' \notin M_\varepsilon$, then $f_j^+(x') > \varepsilon$ and hence we obtain (1.23) from (1.24). If $x' \in M_\varepsilon$, the fact that (1.23) holds is obvious. Since, starting with a certain $\rho$, (1.23) holds for all $x \in C$, we have $m(r) = \inf_{x \in C} F(x, r) \ge m_\varepsilon$ and, by condition (*),

$$\liminf_{\rho \to \infty} m(r) \ge \bar m. \tag{1.25}$$

In addition, we always have $m(r) \le \bar m$, so that

$$\limsup_{\rho \to \infty} m(r) \le \bar m. \tag{1.26}$$
The theorem follows from (1.25) and (1.26).

Notice that the conditions (1.20) are satisfied, e.g., by the penalty

$$\Phi(x, r) = \sum_{j=1}^{q} r f_j^{+k}(x), \qquad k > 1.$$
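The asymptotic convergence $m(r) \to \bar m$ of Theorem 4 can be observed on a small example. The sketch below is an illustration under stated assumptions, not the paper's computation: it takes the hypothetical problem $f_0(x) = x^2$, $f_1(x) = 1 - x$, $C = [-2, 3]$ (so $\bar m = 1$) and a power penalty written as $\varphi(z) = z^k$ with $z = \rho f_1^+(x)$; the exact scaling of the parameter in the penalty and the ternary-search helper are assumptions. For $k = 2$ the penalised minimum has the closed form $\rho^2/(1 + \rho^2)$, which gives a check on the numerics.

```python
def f0(x):
    """Objective of the toy problem: f0(x) = x^2."""
    return x * x


def f1_plus(x):
    """Positive part of the constraint functional f1(x) = 1 - x."""
    return max(0.0, 1.0 - x)


def penalised(x, rho, k):
    """F = f0 + phi(z) with z = rho * f1^+(x) and phi(z) = z**k (assumed form)."""
    return f0(x) + (rho * f1_plus(x)) ** k


def ternary_min(func, lo, hi, iters=200):
    """Minimise a convex (hence unimodal) function on [lo, hi] by ternary search."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if func(a) < func(b):
            hi = b  # the minimiser cannot lie to the right of b
        else:
            lo = a  # the minimiser cannot lie to the left of a
    x = 0.5 * (lo + hi)
    return x, func(x)


def m(rho, k):
    """Optimal value m(rho) of the penalised problem on C = [-2, 3]."""
    return ternary_min(lambda x: penalised(x, rho, k), -2.0, 3.0)[1]


if __name__ == "__main__":
    for rho in (1.0, 10.0, 100.0, 1000.0):
        print(f"rho = {rho:7.1f}: m(rho), k=2: {m(rho, 2):.6f}, k=3: {m(rho, 3):.6f}")
```

In contrast to the linear penalty, the values $m(\rho)$ stay strictly below $\bar m = 1$ for every finite $\rho$ and only approach it in the limit.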
2. The general case

Consider the problem

$$\min \{ f_0(x) \mid x \in M \}, \tag{2.1}$$

where $M$ has a similar form to the $M$ in (1.1). We shall assume that the functionals $f_j(x)$ are defined and weakly lower semi-continuous in a reflexive Banach space $B$, $j = 0, 1, \dots, q$. Let $\bar x$ be a point at which problem (2.1) has a local minimum. In other words, let a number $\delta_0 > 0$ exist such that $\bar m = f_0(\bar x) \le f_0(x)$ for all $x \in M \cap S$, where $S = S(\bar x, \delta_0) = \{x \mid \|x - \bar x\| \le \delta_0\}$. We assume that $f_j(\bar x) = 0$, $j = 1, 2, \dots, t$, $t \le q$. Along with (2.1), we introduce the problem of finding

$$\min_{x \in S} \{ F(x, r) = f_0(x) + \Psi(x, r) \}, \tag{2.2}$$
where the functional $\Psi(x, r)$ is defined on $S$ ($r = (r_1, \dots, r_q)$ is a vector parameter, $r > 0$) and satisfies the following conditions:

a) $\Psi(x, r)$ is weakly lower semi-continuous with respect to $x$ for any $r$,
b) $\Psi(x, r) \ge 0$ for any $x$ and $r$,
c) $\Psi(x, r) = 0$ for $x \in M$,
d) $\Psi(x, r) \to \infty$ without decreasing as $\min_j r_j \to \infty$ for $x \in S \setminus M$. (2.3)

Theorem 5

Under the conditions stated above, we can say that:

1) there exists at least one solution $x_r$ of problem (2.2),
2) $x^* \in \bar M$, where $x^*$ is a weak limit of the sequence $\{x_r\}$, $\bar M = \{x \mid x \in M \cap S,\ f_0(x) = \bar m\}$;
3) $f_0(x_r) \to \bar m$ as $\min_j r_j \to \infty$;
4) $\Psi(x_r, r) \to 0$ as $\min_j r_j \to \infty$.
Proof: Since a weakly lower semi-continuous functional attains a minimum in any bounded, weakly closed set of a reflexive Banach space [13], the existence of a solution of problem (2.2) is ensured by the hypotheses. Since $F(x_r, r) \le F(\bar x, r)$ for any $r$, it follows from c) of (2.3) that

$$\limsup_{\min_j r_j \to \infty} F(x_r, r) \le \lim_{\min_j r_j \to \infty} F(\bar x, r) = \bar m. \tag{2.4}$$

Since $S$ is weakly compact in a reflexive Banach space, there must exist a subsequence $\{x_{r'}\}$ of elements of $\{x_r\}$, weakly convergent to some point $x^* \in B$. Since $S$ is weakly closed, $x^* \in S$. Let us show that $x^* \in M$. Assume that $x_{r'} \notin M$ for $r' \ge R_1$ and $x^* \notin M$. Then, from d) of (2.3), $F(x^*, r') > \bar m + \varepsilon$ for some $R_2$ and $r' \ge R_2$, with $\varepsilon > 0$ arbitrary. By definition of weak lower semi-continuity for the functional $F(x, r)$, an $R_3$ exists such that, with $r' \ge R_3$,

$$F(x^*, R_2) - \frac{\varepsilon}{2} < F(x_{r'}, R_2).$$

Taking $r' \ge \max\{R_1, R_2, R_3\}$, it follows from this and d) of (2.3) that

$$F(x_{r'}, r') \ge F(x_{r'}, R_2) > \bar m + \frac{\varepsilon}{2},$$

which contradicts (2.4).
Further, let a subsequence $\{x_{r^k}\} \subset \{x_{r'}\}$ exist such that $x_{r^k} \in M$ for all $k$. Obviously, $\{x_{r^k}\}$ is weakly convergent to $x^*$. Since the functionals $f_j(x)$ are weakly lower semi-continuous in $S$, we have

$$f_j(x^*) \le \liminf_{\min_j r_j^k \to \infty} f_j(x_{r^k}) \le 0, \qquad j = 1, 2, \dots, q.$$

Thus, $x^* \in M$. It follows from this, together with b) of (2.3) and the properties of the function $f_0(x)$, that

$$\bar m \le f_0(x^*) \le \liminf_{\min_j r_j' \to \infty} f_0(x_{r'}) \le \liminf_{\min_j r_j' \to \infty} F(x_{r'}, r'). \tag{2.5}$$

From (2.4) and (2.5) we get the equation $f_0(x^*) = \bar m$, which proves statement 2) of the theorem. Furthermore, we find from (2.4) and (2.5) that, for any weakly convergent subsequence $\{x_{r'}\}$ of $\{x_r\}$,

$$\lim_{\min_j r_j' \to \infty} F(x_{r'}, r') = \lim_{\min_j r_j' \to \infty} f_0(x_{r'}) = \bar m.$$
Statements 3) and 4) of the theorem follow directly from this. The theorem is proved.

There are serious difficulties in the way of obtaining theorems similar to the above (for convex programming problems) in the case of the general problem of mathematical programming, indicating the conditions under which the original problem is equivalent to the problem with a continuous penalty. It is sometimes possible to achieve some success by using discontinuous penalty functionals. Theorem 5 in fact shows that it is in principle possible to employ such functionals in the method of penalty functions.

Let the functionals $f_j(x)$, $j = 0, 1, \dots, q$, be Fréchet-differentiable in $S$. We fix a point $p \in S$ and transform from (2.1) to the following problem of linear programming, denoted $L(p)$:

$$\min \{ f_0'(p)(x - p) + f_0(p) \mid \dots \}.$$

We shall say that condition (A) is satisfied at the point $p$ if the problem $L(p)$ is solvable. We consider the further problem: to find

$$\min \{ F(x, r) = f_0(x) + \Psi(x, r) \mid x \in S(\bar x, \delta) \}, \tag{2.6}$$
where $r = (r_1, \dots, r_t) > 0$, $0 < \delta \le \delta_0$,

$$\Psi(x, r) = \begin{cases} 0, & x \in M \cap S, \\ \Phi(x, r) + \delta_0^2 \sum_{j=1}^{t} r_j, & x \in S \setminus M, \end{cases}$$

$\Phi(x, r) = \varphi[z_1(x, r), \dots, z_t(x, r)]$, and $\varphi(z)$ is a convex function satisfying relationships (1.4) (here $(z_1(x, r), \dots, z_t(x, r)) = z(x, r) = z$, $z_j(x, r) = r_j f_j^+(x)$). In addition to (1.4), we shall require that $\varphi(z_1) \le \varphi(z_2)$ for $z_1 \le z_2$. It follows from this inequality and the fact that $\varphi(z)$ is convex that $\Phi(x, r)$ is a convex function with respect to $x$ in $B$, and hence is weakly lower semi-continuous with respect to $x$. Obviously, $\Psi(x, r)$ is then also weakly lower semi-continuous in $S$, whence Theorem 5 is valid for problem (2.6). Moreover,

Theorem 6

Let the functionals $f_j(x)$, $j = 0, 1, \dots, t$, have uniformly continuous second Fréchet derivatives in $S$ and let condition (A) be satisfied at the point $\bar x$. Then there exist $0 < \delta \le \delta_0$ and $R_j > 0$, $j = 1, 2, \dots, t$, such that, with $r_j \ge R_j$,

$$\min_{x \in S(\bar x, \delta)} F(x, r) = \bar m.$$

Proof:
We denote by $x_{r,\delta}$ one of the solutions of the problem $\min \{ F(x, r) \mid x \in S(\bar x, \delta),\ 0 < \delta \le \delta_0 \}$. By definition of $\Psi(x, r)$, we have $F(\bar x, r) = f_0(\bar x) = \bar m$ for arbitrary $r$ and $0 < \delta \le \delta_0$. Hence we always have

$$F(x_{r,\delta}, r) \le \bar m, \tag{2.7}$$

whatever the $r$ and $\delta$. Since condition (A) holds at the point $\bar x$ we have

$$f_0'(\bar x) = \sum_{j=1}^{t} u_j(\bar x) f_j'(\bar x) \tag{2.8}$$

for some $u_j(\bar x) \le 0$, $j = 1, 2, \dots, t$. This follows from the duality theorem for the problem $L(\bar x)$ and its dual problem $L^*(\bar x)$. We set

$$\dots$$
where

$$K = \frac{1}{2} \Big( \|f_0''(\bar x)\| + \sum_{j=1}^{t} |u_j(\bar x)|\, \|f_j''(\bar x)\| \Big),$$

and where we take the $\nu$ of (1.4) and the $u_j(\bar x)$, $j = 1, 2, \dots, t$, of (2.8). We fix $r_j \ge R_j$, and let $0 < \delta \le \delta_0$. It can be assumed without loss of generality that $x_{r,\delta} \notin M$. By Taylor's formula,

$$\dots$$

where $\lim_{\|h\| \to 0} \|\omega_0(\bar x, h)\| / \|h\|^2 = 0$. On applying (2.8) and Taylor's formula for the $f_j(x)$, $j = 1, 2, \dots, t$, we obtain

$$\dots$$

where $\|\omega_j(\bar x, h)\| = o(\|h\|^2)$. Hence

$$\dots$$

Since $\|\omega_j(\bar x, x_{r,\delta} - \bar x)\| / \delta^2 \to 0$ as $\delta \to 0$, there must exist a $\delta$ such that the sign of the expression in the brackets is determined by its first term. Hence

$$\dots \tag{2.9}$$

The theorem follows from (2.9) in conjunction with (2.7).
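The behaviour described in Theorems 5 and 6 can be seen on a small non-convex example. The sketch below is a hedged illustration, not taken from the paper: it assumes the hypothetical one-dimensional problem $f_0(x) = x^3 - 3x$, $f_1(x) = x - 0.5 \le 0$, which has a local (not global) constrained minimum at $\bar x = 0.5$ with $\bar m = -1.375$, and the ball $S = [0, 1]$ ($\delta_0 = 0.5$). Two penalties are compared: a continuous quadratic one, and a discontinuous one modelled on the penalty of problem (2.6), where the form of the jump term ($\delta_0^2 r$ added outside $M$) is an assumption. All names and the grid search are illustrative choices.

```python
X_BAR = 0.5    # local solution of the toy problem
DELTA_0 = 0.5  # radius of the ball S = [X_BAR - DELTA_0, X_BAR + DELTA_0]


def f0(x):
    """Non-convex objective: f0(x) = x^3 - 3x."""
    return x ** 3 - 3.0 * x


def f1(x):
    """Constraint functional: f1(x) = x - 0.5 <= 0."""
    return x - 0.5


def psi_quadratic(x, r):
    """Continuous penalty r * (f1^+(x))^2; zero on M, growing with r off M."""
    return r * max(0.0, f1(x)) ** 2


def psi_jump(x, r):
    """Discontinuous penalty: zero on M, r*f1^+(x) + DELTA_0^2 * r off M (assumed form)."""
    v = max(0.0, f1(x))
    if v == 0.0:
        return 0.0
    return r * v + DELTA_0 ** 2 * r


def minimise_on_ball(psi, r, n=100001):
    """Grid search for min of f0 + psi over the ball S; returns (x_r, F(x_r, r))."""
    best_x, best_val = None, float("inf")
    for i in range(n):
        x = (X_BAR - DELTA_0) + 2.0 * DELTA_0 * i / (n - 1)
        val = f0(x) + psi(x, r)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val


if __name__ == "__main__":
    for r in (1.0e1, 1.0e2, 1.0e4):
        x_r, _ = minimise_on_ball(psi_quadratic, r)
        print(f"quadratic, r = {r:8.1f}: x_r = {x_r:.5f}, "
              f"f0(x_r) = {f0(x_r):.5f}, psi = {psi_quadratic(x_r, r):.2e}")
    x_j, val_j = minimise_on_ball(psi_jump, 3.0)
    print(f"jump, r = 3.0: x_r = {x_j:.5f}, F = {val_j:.5f}")
```

With the quadratic penalty the minimiser $x_r$ lies slightly outside $M$ and only tends to $\bar x$ as $r$ grows, while $f_0(x_r) \to \bar m$ and the penalty term tends to zero. With the discontinuous penalty the minimum over $S$ is attained at $\bar x$ itself for a moderate value of $r$.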
Notice in conclusion that, though some of the above theorems contain conditions which cannot be verified constructively, they nevertheless give a fairly clear picture of the qualitative nature of the connection between the initial problem and the problem with a penalty.

Translated by D. E. Brown

REFERENCES

1. LEVITIN, E. S. and POLYAK, B. T., Constrained minimization methods, Zh. vychisl. Mat. mat. Fiz., 6, 5, 787-823, 1966.
2. EREMIN, I. I., The method of penalties in convex programming, Dokl. Akad. Nauk SSSR, 173, 4, 748-751, 1967.
3. POLYAK, B. T., The convergence rate of the method of penalty functions, Zh. vychisl. Mat. mat. Fiz., 11, 1, 3-11, 1971.
4. RYBASHOV, M. V., The gradient method for solving convex programming problems on an electronic model, Avtomatika i Telemekhan., 26, 11, 1955-1967, 1965.
5. IVANOV, V. V. and TRUTEN', V. E., On the method of penalty functions, Kibernetika, 2, 67-69, 1968.
6. FIACCO, A. V. and MCCORMICK, G. P., Nonlinear programming, Wiley, 1968.
7. FIACCO, A. V., Penalty methods for mathematical programming in En with general constraint sets, J. Opt. Theory and Appl., 6, 3, 252-268, 1970.
8. OSBORNE, M. R. and RYAN, D. M., On penalty function methods for nonlinear programming problems, J. Math. Anal. and Appl., 31, 3, 559-579, 1970.
9. ASTAF'EV, N. N., The method of linearization in convex programming, in: Mathematical methods in some optimal planning problems (Matem. metody v nekotorykh zadachakh optimal'nogo planirovaniya), 3, 3-18, Sverdlovsk, UNTs Akad. Nauk SSSR, 1971.
10. PSHENICHNYI, B. N., Necessary conditions for an extremum (Neobkhodimye usloviya ekstremuma), Nauka, Moscow, 1969.
11. EREMIN, I. I., On convex programming problems with opposed constraints, Kibernetika, 4, 124-129, 1971.
12. GOL'SHTEIN, E. G., Theory of duality in mathematical programming and its applications (Teoriya dvoistvennosti v matematicheskom programmirovanii i ee prilozheniya), Nauka, Moscow, 1971.
13. VAINBERG, M. M., Variational methods for investigating non-linear operators (Variatsionnye metody issledovaniya nelineinykh operatorov), Gostekhizdat, Moscow, 1956.