A survey of subdifferential calculus with applications

A survey of subdifferential calculus with applications

Nonlinear Analysis 38 (1999) 687 – 773 A survey of subdi erential calculus with applications Jonathan M. Borweina; 1 , Qiji J. Zhub; ∗; 2 a b Depart...

491KB Sizes 45 Downloads 227 Views

Nonlinear Analysis 38 (1999) 687 – 773

A survey of subdi erential calculus with applications Jonathan M. Borweina; 1 , Qiji J. Zhub; ∗; 2 a b

Department of Mathematics and Statistics, Simon Fraser University, Burnaby, Canada BC V5A 1S6 Department of Mathematics and Statistics, Western Michigan University, Kalamazoo, MI 49008, USA Received 3 November 1997; accepted 18 December 1997

Dedicated to Francis Clarke on the occasion of his 50th birthday and the 25th birthday of the Clarke generalized gradient

Keywords: Viscosity subdi erential; Proximal subdi erential; Subdi erential calculus; Coderivative calculus; Limiting subdi erentials; Geometric subdi erential; Generalized gradients; Limiting coderivatives; Constrained optimization problems; Mean value theorems; Mean value inequalities; Viscosity solutions; Hamilton–Jacobi equations; Smooth spaces; Open mapping; Implicit function theorems; Metric regularity; Sensitivity

1. Introduction and preliminaries 1.1. Introduction Nonsmooth analysis had its origins in the early 1970s when control theorists and nonlinear programmers attempted to deal with necessary optimality conditions for problems with nonsmooth data or with nonsmooth functions (such as the pointwise maximum of several smooth functions) that arise even in many problems with smooth data. The following are two simple examples that illustrate how such intrinsic nonsmoothness arises in problems with seemingly smooth data.

∗ 1

Corresponding author. E-mail: [email protected]. Research was supported by NSERC and by the Shrum Endowment at Simon Fraser University.

2 Research was supported by the National Science Foundation under grant DMS-9704203 and by funds from the Faculty Research and Creative Activities Support Fund, Western Michigan University.

0362-546X/99/$ - see front matter ? 1999 Elsevier Science Ltd. All rights reserved. PII: S 0 3 6 2 - 5 4 6 X ( 9 8 ) 0 0 1 4 2 - 4

688

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

Fig. 1. Smooth becomes nonsmooth.

Example 1.1. Often one wishes to deal with the maximum of two or more functions. Let f(x) := max (f1 ; f2 ). For the very simple smooth functions on R; f1 (x) := x and f2 (x) := −x one obtains f(x) = |x|, a quint essentially nonsmooth function. Example 1.2. Consider the very simple constrained minimization problem of minimizing f(x) subject to g(x) = a; x ∈ R. Here a ∈ R is a parameter allowing for perturbation of the constraint. In practice, it is often important to know how the model responds to the pertubation a. For this we need to consider, for example, the optimal value v(a) := inf {f(x): g(x) = a} as a function of a. Consider a concrete example with two smooth functions f(x) := 1− cos x and g(x) := sin(6x)−3x and a ∈ [−=2; =2] which corresponds to x ∈ [−=6; =6]. The graph of (g(x); f(x)): x ∈ [−=6; =6] is given in Fig. 1. It is easy to see from Fig. 1 that the optimal value function v is not smooth; in fact, not even continuous. In the attempt to deal exibly with such problems, various (set-valued) generalized derivative concepts were proposed to replace the nonexistent derivative. The aim was to de ne a generalized derivative for every point in the domain of a function belonging to a particular class (such as that of locally Lipschitz functions). The rst such canonical generalized gradient was the generalized gradient introduced by Clarke in his pioneering work [39]. He applied this generalized gradient systematically to nonsmooth problems in a variety of problems (see [45, 47] and the reference therein) and, thus, opens the door for methodically studying nonsmooth problems.3; 4 Several of the other frequently used generalized derivative concepts are 3 Clarke’s impact on and contribution to nonconvex nonsmooth analysis has been dramatic. We mention the development of proximal analysis, of exact penalization techniques, of the regularity and existence in the calculus of variations, of nonsmooth necessary conditions, suciency and sensitivity, and much more in mathematical programming, control theory and di erential equations. In all cases with matching technical sophistication and elegance. 4 Of course, the theory of nonsmooth analysis did not come out of the blue. A fertile precursor for this theory is the theory of convex analysis for which a standard reference is Rockafellar’s book [165]. There

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

689

the co-derivatives introduced by Mordukhovich [136, 137, 139], approximate and geometric subdi erentials introduced by Io e [101, 102], contingent derivatives (see [4,153]), Michel and Penot’s derivatives [135] and Treiman’s B-derivatives [190,191]. There are also studies on classes of generalized derivative objects. Examples are Halkin’s screen [87] and Warga’s derivate containers [200,202] and their recent re nement and extension multidi erentials by Sussmann [180, 183]. While these generalized derivatives are very useful in the study of nonsmooth problems, their de nitions are complicated and they often are hard to calculate. Over the years smooth subdi erentials (approximating a function locally by smooth supporting functions) and corresponding normal cones have attracted increasing attention for two reasons. (a) Smooth subdi erentials and their related normal cone concept are very simple yet they can recapture many frequently used generalized derivatives and normal cones such as coderivatives, geometric subderivatives and Clarke’s generalized gradients and corresponding normal cones when the underlying space has an appropriate smoothness property. Thus, in many important applications, there is no loss of information in working directly with subdi erentials and their related normal cones. (b) More importantly, the use of smooth subdi erentials largely reduces to dealing with the much better behaved supporting functions or, equivalently, exploiting that the minimum of f − g is attained, where f is the function under consideration and g is the candidate for a smooth support function (by variational analysis arguments). Thus, variational principles such as were developed by Ekeland [78], Borwein and Preiss [27] and Deville et al. [72] become core tools, and lower semicontinuity-type conditions become natural assumptions. This is an important advance in nonsmooth analysis: it also facilitates the use of unbounded subdi erentials (studying problems under lower semicontinuity assumptions) instead of con ning oneself to bounded subdi erentials (studying problems under Lipschitz-type conditions). The geometric concept of perpendicular vectors (proximal normal) to a set can be traced back to Clarke’s thesis [39], while Hirriart–Urruty [89] rst showed how to obtain an explicit formula for the corresponding convex tangent cone. An emphasis on utilizing these normals to construct possibly nonconvex-valued generalized derivative concept was developed by Mordukhovich [136]. The proximal subdi erential was introduced explicitly by Rockafellar [168] where characterizations of the Clarke generalized gradient and Clarke normal cone in term of the proximal subdi erentials and proximal normals were given. These characterizations were subsequently extended to other generalized derivatives and normal cones such as coderivatives, approximate and geometric subdi erentials in greater generality in the work of Borwein and Io e [20], Borwein and Strojwas [30], Loewen [126] and Treiman [187, 189]. Fuzzy sum rules were studied by Io e independent of any limiting generalized derivative concept in [97]. Subdi erentials have also been used in the study of continuous viscosity solutions by is a large literature somewhat more tangential to the present work. We mention in particular Pshenichnyi’s seminal notion of quasi-di erentiability [160] which requires that the directional derivative of a function f exist and be sublinear and continuous. 
In this case, Pshenichnyi de nes the generalized derivative object of f at x to be the convex subdi erential @p(0), where p(h) := f0 (x; h), and derives a satisfactory calculus for it.

690

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

Crandall and Lions [68]. Systematic study of the proximal subdi erential calculus was carried out in Clarke et al. [53]. With the increasing utilization of variational arguments many new results on smooth subdi erentials have appeared in the past few years. This survey is an attempt at summarizing the basic tools for subdi erentials and some of their applications. There are very many competing smooth subdi erentials. To simplify the exposition we will concentrate on the Frechet subdi erential for most of this paper. Many, if not most, of the results hold for more general subdi erentials. We will indicate some of those generalizations in Section 8. We should note that smooth subdi erentials cannot completely remove the role of other generalized derivatives. For example, the Clarke generalized gradient (being convex) provides beautiful primal-dual results when used to de ne normal and tangent cones; and is more convenient than the smooth subdi erential in some applications involving Lipschitz data. Correspondingly, Warga’s derivative container and its re nement Sussmann’s multidi erential can be used to give open mapping and covering theorems intrinsically related to topological methods that are crucial in certain results which do not have an adequate subdi erential version. That said, smooth subdi erentials are most convenient in presenting results derived by variational arguments. While the main theme of the current survey is smooth subdi erential theory and variational methods, we should also note that the eld of nonsmooth analysis is much wider. Clarke’s classical monograph [45] and more recent books of Clarke [47], Clarke et al. [53], Loewen [129], Modukhovich [139] and Rockafellar and Wets [172] provide a comprehensive overview of the eld. 1.2. Preliminaries Let X be a real Banach space with closed unit ball BX and with topological real dual X ∗ . In what follows, we shall usually assume that X has an equivalent Frechet smooth norm and will use this norm as the norm of X unless otherwise stated. All re exive Banach spaces and all spaces, such as c0 , with separable duals possess such renorms [71]. We denote by 2X the collection of all subsets of X and use R to denote the extended real line R ∪ {+∞}. Let f : X → R be a function. We denote by dom f := {x ∈ X : f(x) ∈ R} the e ective domain of f. We assume all our functions are proper in that they take some nite values: dom f 6= ∅. Let us now recall the de nition of the Frechet subdi erential and normal cone. Deÿnition 1.3. Let f : X → R be a lower semicontinuous function and S a closed subset of X . We say f is Frechet subdi erentiable and x∗ is a Frechet-subderivative of f at x if there exists a concave C 1 function g such that ∇g(x) = x∗ and f − g attains a local minimum at x. We denote the set of all Frechet-subderivatives of f at x by DF f(x). We de ne the Frechet-normal cone of S at x to be NF (S; x) := DF S (x) where S is the indicator function of S de ned by S (x) := 0 for x ∈ S and ∞ otherwise. Remark 1.4. Frechet subderivatives can be de ned using either a “unilateral limit” or a “viscosity” approach. Our de nition uses the viscosity approach. The limit de nition

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

691

of a Frechet subderivative of x∗ of f at x is lim inf khk→0

f(x + h) − f(x) − hx∗ ; hi ≥ 0: khk

In a Frechet-smooth Banach space the two de nitions yield the same subdi erential. In fact; it is obvious that any viscosity Frechet subderivative is a limiting Frechet subderivative. On the other hand; let x∗ be a Frechet subderivative of f at x. Then by [71] there exists a C 1 function  with (0) = 0 (0) = 0 such that z → f(z) − f(x) − hx∗ ; z − xi + (kz − xk) attains a local minimum at 0. De ne (t) := sup0≤s≤t 0 (s). Then is a nonnegative Ru continuous increasing function and (0) = 0. Thus; (u) := 0 (t) dt is a convex C 1 function satisfying (0) = 0 (0) = 0. Moreover, (t) ≥ (t); ∀t ≥ 0. Apparently; z → g(z) := f(x) + hx∗ ; z − xi − (kz − xk) is a concave local support of f at x with Frechet-derivative x∗ . The requirement that the osculating function g be concave in the viscosity de nition of the subdi erential was introduced in [32]. It is technically more convenient and; as we can see from the above discussion; equivalent to a de nition which only requires g to be C 1 . Deÿnition 1.5. Let f be Lipschitz near x, and let v ∈ X . The Clarke generalized directional derivative f◦ (x; v) of f at x is de ned as follows: f(y + tv) − f(y) ; f◦ (x; v) := lim sup t y→x; t→0+ and the Clarke generalized gradient @c f(x) of f at x is de ned by @c f(x) := {x∗ ∈ X ∗ : hx∗ ; vi ≤ f◦ (x; v) ∀v ∈ X }: As already mentioned our main entry point for recent results on subdi erentials is the appropriate use of variational principles. We will, thus, have frequent recourse to the following smooth variational principle due to Borwein and Preiss [27]. Theorem 1.6 (Smooth variational principle). Let f : X → R be a lower semicontinuous function bounded below and let the constants ¿0 and ¿0 be given. Suppose that u satis es f(u)¡ + inf f: X

Then there exists a C 1 convex function g on X and v in X such that (i) the function x → f(x) + g(x) attains a global minimum at x = v; (ii) ku − vk¡; (iii) f(v)¡ + inf X f; (iv) k∇g(v)k¡2=: Informally, we make a small smooth, convex perturbation and obtain a nearby point which minimizes the perturbed function and leaves the function value unimpaired.

692

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

2. Several basic results There are several ways of developing a set of basic tools for subdi erential analysis that are applicable to a wide range of problems. Borwein et al. [33] use the nonlocal fuzzy sum rule [217] as a basic tool; the multidirectional mean value inequality [50] is a corner stone result in Clarke et al. [53]; Io e [102] begins with the local fuzzy sum rule and Mordukhovich and Shao [147] choose the extremal principle [115,136,141] as the main device. It turns out that all these basic results are equivalent. (The equivalence between the extremal principle and the local fuzzy sum rule was established in [149] and their equivalence to the multidirectional mean value inequality and the nonlocal fuzzy sum rule was proved in [218] in arbitrary smooth spaces. Very recently, Io e showed that the equivalence of these basic results actually holds in arbitrary Banach spaces for a large class of abstract subdi erentials [105].) In fact, they exploit in di erent ways the two underlying principles: (a) a “smooth variational principle” [27] and (b) a “decoupling lemma” used by Crandall and Lions in studying the uniqueness of viscosity solutions [68]. In this section we establish the nonlocal fuzzy sum rule rst. We then deduce the other three basic results from it. 2.1. Nonlocal fuzzy sum rule The nonlocal fuzzy sum rule was established in [217] as a tool for simplifying, and generalizing, the proof of the Clarke–Ledyaev multidirectional mean value inequality [50]. It is a convenient tool for deducing the three other basic tools of this section (see [33]). We denote the diameter of a set S by diam(S) := sup {kx − yk: x; y ∈ S}. Theorem 2.1 (Nonlocal fuzzy sum rule). Let f1 ; : : : ; fN : X → R be lower semicontinuous functions bounded below with ( N ) X fn (yn ): diam(y1 ; : : : ; yN ) ≤  ¡+∞: lim inf →0

n=1

Then; for any ¿0; there exist xn and xn∗ ∈ DF fn (xn ); n = 1; : : : ; N satisfying diam(x1 ; : : : ; xN ) · max (1; kx1∗ k; : : : ; kxN∗ k)¡; and N X

fn (xn )¡lim inf →0

n=1

( N X

(1) )

fn (yn ): diam(y1 ; : : : ; yN ) ≤ 

+

(2)

n=1

such that 0∈

N X n=1

xn∗ + BX ∗ :

(3)

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

693

Proof. De ne, for any real number i¿0, wi (y1 ; : : : ; yN ) :=

N X

N X

fn (yn ) + i

n=1

kyn − ym k2

n; m=1

and Mi := inf wi . Then Mi is an increasing sequence and is bounded by ( N ) X fn (xn ): diam(x1 ; : : : ; xN ) ≤  : lim inf →0

n=1

Let M := limi→∞ Mi . Observe that the product space of N copies of X (with the Euclidean product norm) also has an equivalent Frechet-smooth norm. For each i, applying the smooth variational principle of Theorem 1.6 to function wi , we obtain a convex C 1 function i and xn; i ; n = 1; : : : ; N such that wi + i attains a local minimum at (x1; i ; : : : ; xN; i ); k∇i (x1; i ; : : : ; xN; i )k¡=N and 1 1 (4) wi (x1; i ; : : : ; xN; i )¡inf wi + ≤ M + : i i For each n, the function y → wi (x1; i ; : : : ; xn−1; i ; y; xn+1; i ; : : : ; xN; i ) + i (x1; i ; : : : ; xn−1; i ; y; xn+1; i ; : : : ; xN; i ) attains a local minimum at y = xn; i . Thus, for n = 1; : : : ; N , xn;∗ i

:= −∇xn i (x1; i ; : : : ; xN; i ) − 2i

N X

∇k · k2 (xn; i − xm; i ) ∈ DF fn (xn; i ):

m=1

Summing these N inclusions leads to N X

xn;∗ i = −

n=1

N X

∇xn i (x1; i ; : : : ; xN; i ) − 2i

n=1

Observing that k −

PN

N X N X

∇k · k2 (xn; i − xm; i ):

n=1 m=1

n=1

∇xn i (x1; i ; : : : ; xN; i )k ≤  and

∇k · k2 (xn; i − xm; i ) + ∇k · k2 (xm; i − xn; i ) = 0 so that the double sum in the previous inclusion vanishes, we obtain 0∈

N X

xn;∗ i + BX ∗ :

n=1

By the de nition of Mi we have Mi=2 ≤ wi=2 (x1; i ; : : : ; xN; i ) = wi (x1; i ; : : : ; xN; i ) −

N i X kxn; i − xm; i k2 2 n; m=1

≤ Mi +

i 1 − i 2

N X n; m=1

kxn; i − xm; i k2 :

(5)

694

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

Rewriting Eq. (5) as i lim i

i→∞

N X

PN

n; m=1

kxn; i − xm; i k2 ≤ 2(Mi − M1=2 + (1=i)) yields

kxn; i − xm; i k2 = 0:

n; m=1

Therefore, lim diam(x1; i ; : : : ; xN; i ) = 0

i→∞

and ∗ lim diam(x1; i ; : : : ; xN; i ) · max (kx1;∗ i k; : : : ; kxN; i k) = 0:

i→∞

Thus, M ≤ lim inf

( N X

→0

) fn (xn ): diam(x1 ; : : : ; xN ) ≤ 

n=1

≤ lim inf

N X

i→∞

which yields M = lim inf

fn (xn; i ) = lim inf wi (x1; i ; : : : ; xN; i ) ≤ M

n=1

( N X

→0

i→∞

) fn (xn ): diam(x1 ; : : : ; xN ) ≤  :

n=1

It remains to take xn = xn; i and xn∗ = xn;∗ i ; n = 1; : : : ; N for a suciently large i. Remark 2.2. The conditions f1 ; : : : ; fN : X → R bounded below and ( N ) X fn (yn ): diam(y1 ; : : : ; yN ) ≤  ¡∞ lim inf →0

n=1

in the nonlocal fuzzy sum rule cannot be dispensed with. For example; we need only to consider functions on X := R. Functions f1 (x) = x and f2 (x) = 0 do not satisfy the nonlocal fuzzy sum rule because f1 is not bounded from below and functions f1 (x) = {0} (x) and f2 (x) = {1} (x) do not satisfy the nonlocal fuzzy sum rule because they fail the condition lim→0 inf {f1 (y1 ) + f2 (y2 ): ky1 − y2 k ≤ }¡∞. The conclusion (3) in the nonlocal fuzzy sum rule is similar to that of the usual (local) fuzzy sum rule. However; the conclusion (1) now only tells us that the points xn are close to each other; while; in contrast; in a local fuzzy sum rule they are guaranteed to be close to a point where the sum attains its minimum (under additional assumptions). Note that the conclusion (1) also gives us control on the “size” of the subderivatives involved in the sum. This size control was rst obtained in [36] for local fuzzy sum rules and has proven to be quite useful in applications (see e.g. Section 4 for deriving the approximate mean value theorem and Section 12 for proving the

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

695

uniqueness of viscosity solutions to Hamilton–Jacobi equations). The conclusion (2), nonetheless; gives purchase on the value of lower semicontinuous functions. In applications it often also yields information on the location of points xn indirectly. We illustrate this by the following example. Example 2.3 (Density of subdi erentiable points). Let f : X → R be a lower semicontinuous function, let x ∈ dom(f) and let  ∈ (0; 1). Applying Theorem 2.1 to f1 = f + x+BX and f2 = {x} yields that there exist x1 and x2 such that kx1 −x2 k¡; 0 ∈ DF f1 (x1 ) + DF {x} (x2 ) + BX ∗ and f1 (x1 ) + {x} (x2 )¡f(x) + : The last inequality forces x2 = x and, therefore, x1 must be in the interior of x + BX so that DF f1 (x1 ) = DF f(x1 ). Therefore, dom(DF f) is dense in dom(f). This is a very potent result. In particular, since subderivatives of concave functions are automatically derivatives, it implies that continuous concave (resp. convex) functions on Frechet-smooth spaces are densely (a fortiori generically) Frechet di erentiable. 2.2. Local fuzzy sum rule The prototypes of local fuzzy sum rules rst appear in Io e’s paper [97]. Local fuzzy sum rules are important in optimization problems and can serve as a basis for a full calculus. As mentioned local fuzzy sum rules need additional assumptions. The following uniform lower semicontinuity condition introduced in [36] is the weakest such condition so far found. Deÿnition 2.4 (Uniform lower semicontinuity). Let f1 ; : : : ; fN : X → R be lower semicontinuous functions and E a closed subset of X . We say that (f1 ; : : : ; fn ) is uniformly lower semicontinuous on E if inf

x∈E

N X n=1

fn (x) ≤ lim inf →0

( N X

) fn (xn ): kxn − xm k ≤ ; xn ; xm ∈ E; n; m = 1; : : : ; N

:

n=1

We TN say that (f1 ; : : : ; fN ) is locally uniformly lower semicontinuous at x ∈ n=1 dom(fn ) if (f1 ; : : : ; fN ) is uniformly lower semicontinuous on a closed ball centered at x. Remark 2.5. Two simple yet useful sucient conditions for (f1 ; : : : ; fN ) to be locally uniformly lower semicontinuous at x are (a) all but one of fn are uniformly continuous in a neighborhood of x; (b) at least one of fn has compact level sets in a neighborhood of x.

696

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

Theorem 2.6 (Strong local fuzzy sum rule). Let f1 ; : : : ; fN : X → R be lower semicontinuous functions. PN Suppose that (f1 ; : : : ; fN ) is locally uniformly lower semicontinu Then; for any ¿0; there exist ous at x and n=1 fn attains a local minimum at x.  and xn∗ ∈ DF fn (xn ); n = 1; : : : ; N such that |fn (xn )−fn (x)|¡;  n = 1; 2; : : : ; N; xn ∈ x+B diam(x1 ; : : : ; xN ) · max (kx1∗ k; : : : ; kxN∗ k)¡ and

N

X

∗ xn ¡:

n=1

Proof. The idea is to apply the nonlocal fuzzy sum rule to the penalized function  2 + x + hBX for some h given below. Take an h¿0 such that fn + k · −xk N X

fn (x)  = inf

x∈hBX

n=1

N X

fn (x)

n=1

≤ lim inf

( N X

→0

fn (yn ): kyn − ym k ≤ ;

n=1

)

yn ; ym ∈ x + hBX ; n; m = 1; : : : ; N

:

Let 0 ¡ min(=2; 2 =8N 2 ; h=2) be a positive number small enough so that ( N N X X fn (x)  ≤ inf fn (yn ): kyn − ym k ≤ 0 ; n=1

n=1

)

yn ; ym ∈ x + hBX ; n; m = 1; : : : ; N

+ 2 =8N 2 :

(6)

 2 + x + hBX ; n = Applying the nonlocal fuzzy sum rule of Theorem 2.1 to fn + k · −xk ∗ 2 1; : : : ; N yields xn and yn ∈ DF (fn + k · −xk  + x + hBX )(xn )) = DF fn (xn ) + ∇k · k2 (xn − x);  n = 1; : : : ; N satisfying diam(x1 ; : : : ; xN )¡0 , diam(x1 ; : : : ; xN ) · kyn∗ k¡0 ; and

N X

n = 1; : : : ; N

(7)

fn (xn ) + kxn − x k2

n=1

≤ lim inf →0

( N X

fn (yn ) + kyn − x k2 : kyn − ym k ≤ ;

n=1

)

yn ; ym ∈ x + hBX ; n; m = 1; : : : ; N ≤

N X n=1

0

fn (x) +  ≤

N X n=1

fn (x) + 2 =8N 2

+ 0 (8)

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

697

such that

N

X

∗ yn ¡0 :

n=1

Combining Eqs. (6) and (8) yields that kxn − x k2 ¡2 =4N 2 and, therefore, the norm of ∇k · k2 (xn − x ) is bounded by =2N . Then we can directly check that xn∗ := yn∗ − ∇k · k2 (xn − x ) ∈ DF fn (xn ) satis es the conclusion of the theorem. The last result is “strong” in that it deduces that the subderivatives are close together in norm. A weak local fuzzy sum rule that only assumes the summand functions to be lower semicontinuous can now be derived from Theorem 2.6. Notice that the conclusion involves the weak-star topology and that the hypothesis of minimality has been relaxed.  Theorem 2.7 (Weak local fuzzy sum rule). Let PN f1 ; : : : ; fN : X → R be lower semicontinuous functions. Suppose that x∗ ∈DF ( n=1 fn )(x). Then; for any ¿0 and any weak-star neighbourhood V of 0 in X ∗ ; there exist xn ∈ x + B; xn∗ ∈ DF fn (xn ); n = 1; : : : ; N such that |fn (xn ) − fn (x)|¡; n = 1; 2; : : : ; N; diam(x1 ; : : : ; xN ) max(kx1∗ k; : : : ; kxN∗ k)¡ and x∗ ∈

N X

xn∗ + V:

n=1

Proof. Let ¿0 and V be a weak-star neighbourhood of 0 in X ∗ . Fix r¿0 and L a nite-dimensional subspace of X containing x such that L⊥ +2rBX ∗ ⊂ V . Since PN ∗ x ∈D ( n=1 fn )(x) there exists a convex C 1 function g such that ∇g(x) = x∗ and PN F n=1 fn − g attains a local minimum at x. Choose 0¡¡ min(; r) such that ky − xk¡¡ implies that k∇g(x) − ∇g(y)k¡r and let L be the indicator function of L. PN Then n=1 fn −g+L attains a local minimum at x. By Remark 2.5(b) (f1 ; : : : ; fN ; −g; L ) is locally uniformly lower semicontinuous. Applying the strong local fuzzy sum rule of Theorem 2.6 yields the existence of xn ; n = 1; : : : ; N + 2, such that kxn − xk¡¡; n = 1; : : : ; N + 2, xn∗ ∈ DF f(xn ); n = 1; : : : ; N , xN∗ +1 = −∇g(xN +1 ) and xN∗ +2 ∈ DF L (xN +2 ) satisfying the conclusion of Theorem 2.6, i.e., |fn (xn ) − fn (x)|¡¡; ||xn∗ ||diam({x1 ; : : : ; xN }) ≤ ||xn∗ ||diam({x1 ; : : : ; xN +2 })¡¡ for n = 1; : : : ; N , |L (xN +2 ) −L (x)|¡, i.e., xN +2 ∈ L, and N X

xn∗ − ∇g(xN +1 ) + xN∗ +2 ∈ rBX ∗ :

n=1

Note that DF L (xN +2 ) = L⊥ and kx∗ − ∇g(xN +1 )k¡r. Therefore, ∗

x ∈

N X n=1

xn∗



+ L + 2rB

X∗



N X n=1

xn∗ + V:

698

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

Remark 2.8. (a) Io e [95] rst proved a local fuzzy sum rule in nite-dimensional spaces. Extensions to in nite-dimensional spaces may be found in [102] under the condition that all but one of the summand functions are locally Lipschitz. Deville and Haddad [73] relaxed the Lipschitz condition to one of uniform continuity. Borwein and Io e [20] and Io e and Rockafellar [107] proposed the following sequential uniform semicontinuity condition which covers both Deville and Haddad’s condition and the case when X is nite-dimensional. Deÿnition 2.9 (Sequential uniform lower semicontinuity). Let f1 ; : : : ; fN : X → R be lower semicontinuous functions. We say that (f1 ; : : : ; fN ) is sequentially uniformly lower semicontinuous at x if there exists a closed ball x + B centered at x such that for any N sequences {xnr }, n = 1; 2; : : : ; N; r = 1; 2; : : : ; belonging to x + BX and such that kxnr − xmr k → 0 as r → ∞, there is a sequence {ur } of the elements of the ball such that kxnr − ur k → 0 and lim inf r→∞

N X

(fn (xnr ) − fn (ur )) ≥ 0:

n=1

The condition of De nition 2.4 that we use here is a more topological form of the sequential uniform semicontinuity condition in De nition 2.9. It is not hard to see that local sequential uniform lower semicontinuity implies local uniform lower semicontinuity. Indeed, let (f1 ; : : : ; fN ) be sequential uniformly lower semicontinuous and let  be as in De nition 2.9. Choose sequences {xnr }, n = 1; 2; : : : ; N; r = 1; 2; : : : ; belonging to x + BX such that kxnr − xmr k → 0 as r → ∞ and ( N N X X fn (xnr ) = lim inf fn (xn ): kxn − xm k ≤ h; lim r→∞

h→0

n=1

n=1

)

xn ; xm ∈ x + BX ; n; m = 1; : : : ; N

:

Then there is a sequence {ur } ⊂ x + BX such that kxnr − ur k → 0 and lim inf r→∞

N X

(fn (xnr ) − fn (ur )) ≥ 0:

n=1

Thus, inf

x+BX

N X n=1

fn ≤ lim inf r→∞

= lim inf h→0

N X

fn (ur ) ≤ lim inf

n=1

( N X

r→∞

N X

fn (xnr )

n=1

fn (xn ): kxn − xm k ≤ h;

n=1

xn ; xm ∈ x + BX ; n; m = 1; : : : ; N

) ;

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

699

so that (f1 ; : : : ; fN ) is local uniformly lower semicontinuous. Moreover, local uniformly lower semicontinuity is properly weaker than local sequential uniform lower semicontinuity as shown by the following example from [33]. Example 2.10. Let X be an in nite-dimensional Banach space and select a sequence ek in X such that kek k = 1 and kek −el k¿1=2 when k 6= l. De ne A := {ek =l: k; l = 1; 2; : : :} ∪ {0} and B := {(ek + e1 =k)=l: k; l = 1; 2; : : :} ∪ {0}. Then both A and B are closed sets and A ∩ B = {0}. De ne f1 := A and f2 := B . We show that, for any ¿0, (f1 ; f2 ) is not locally uniformly lower semicontinuous on B as de ned in [20, 107]. In fact, let l be an integer such that 2=l¡ and let x1r = er =l and x2r = (er + e1 =r)=l. Then kx1r − x2r k → 0 and f1 (x1r ) = f2 (x2r ) = 0 for all r. Now for any sequence ur satisfying kxnr − ur k → 0; n = 1; 2; we have ur 6= 0 for r suciently large. Thus, at least one of f1 (ur ), f2 (ur ) is ∞ so that lim inf r→∞

2 X

(fn (xnr ) − fn (ur )) = −∞¡0:

n=1

However, (f1 ; f2 ) is locally uniformly lower semicontinuous according to De nition 2.4 because the right hand side is always nonnegative while the left hand side is zero. (b) Deville and Ivanov [74] and Vanderwer and Zhu [195] independently constructed di erent examples showing that the local fuzzy sum rule does not hold for lower semicontinuous functions in in nite-dimensional spaces without additional assumptions. The following is a Hilbert space version of the example in [195]. Example 2.11. Let X P := ‘2 and let ek be the standardPbasis. Then x ∈ X can be uniquely ∞ n represented as x = k=1 x(k)ek ; for Pn (x) := k=1 x(k)ek , one has kPm (x)k ≤ kPn (x)k for m ≤ n, in particular kPn k ≤ 1 for each n. Now x(k) → 0 as k → ∞ and so kxk∞ := max{|x(k)|: 1 ≤ k¡∞} exists. Moreover, for k0 such that |x(k0 )| = kxk∞ , we have |x(k0 )| = kPk0 +1 (x) − Pk0 (x)k ≤ 2kxk. Thus k · k∞ is Lipschitz with a Lipschitz constant 2. De ne Fn = {x: kxk ≤ 3; x(i) ≥ 0 and x(i) = 0 if i mod 3 6= 0 or i¡3n}. Now we construct two functions  0 if x = 0;   1 f1 (x) := − √n − kyk∞ if x = 1n e3n−1 + y; y ∈ Fn ;   +∞ otherwise and

 0   f2 (x) := − √1n − kyk∞   +∞

if x = 0; if x = 1n e3n−2 + y; y ∈ Fn ; otherwise:

First observe that dom(f1 ) ∩ dom(f2 ) = 0 by the uniqueness of basis representations, so f1 + f2 attains a minimum at 0. From the de nitions it also follows that f1 and f2

700

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

are both bounded below by −7 since k · k∞ is Lipschitz with a Lipschitz constant 2. We now prove that f1 is lower semicontinuous (the proof for f2 is similar). Suppose xn ∈ dom(f1 ) and xn → x. If x = 0, we may assume √ xn 6= 0 and so xn = (1=kn )e3kn −1 +yn , yn ∈ Fkn . Now kn → ∞, and yn → 0 and so −(1= kn ) − kyn k∞ → 0. If x 6= 0, we know that kn 6→ ∞. Indeed, if kn → ∞, then for each i, we have xn (i) → 0 as n → ∞ because xn (i) = 0 for all i ≤ 3kn − 1. Because the norm and pointwise limit must agree if they both exist, we conclude that xn converges to 0 in norm. Now because kn 6→ ∞, we know that kn = n0 for all large n (because, when n 6= m, k(1=n)e3n−1 + yn − ((1=m)e3m−1 + ym )k ≥ max{1=n; 1=m} for ym ∈ Fm , yn ∈ Fn by the monotonicity of the basis). Therefore, xn = (1=n0 )e3n0 −1 + yn , yn ∈ Fn0 for all large n. This implies yn → y ∈ Fn0 , which with the continuity of k · k∞ implies 1 1 f1 (xn ) = − √ − kyn k∞ → − √ − kyk∞ = f1 (x): n0 n0 This proves the lower semicontinuity of f1 (and similarly of f2 ). We turn to prove that, for any xi ∈ B and xi∗ ∈ DF fi (xi ), kx1∗ + x2∗ k ≥ 1. Let gi be the function associated to xi∗ , i = 1; 2; as in the de nition. Now observe that DF f1 (0) is empty because       √ 1 1 n f1 0 + e3n−1 − f1 (0) ≤ n − √ − 0 = − n; n n = (1=m)e3m−1 + y1 and x2 = similarly DF f2 (0) is empty. P∞Thus, we can write x1P ∞ (1=n)e3n−1 +y2 where y1 = k=m ak e3k ∈ Fm and y2 = k=n bk e3k ∈ Fn . We will prove ∗ ∗ that kx1 + x2 k ≥ 1 in the case m ≤ n (the proof for m ≥ n is similar). Let bk0 = maxk ≥ n {bk }, then 0 ≤ bk0 ≤ 2kx2 k ≤ 2, and thus y2 +te3k0 ∈ Fn for 0 ≤ t ≤ 1, and ky+te3k0 k∞ = kyk∞ + t. Therefore g2 (x2 + te3k0 ) − g2 (x2 ) f2 (x2 + te3k0 ) − f2 (x2 ) ≤ = −1: t t Now, because m ≤ n, we have y1 + te3k0 ∈ Fm , and because a3k0 ≥ 0 we know that ky1 + te3k0 k∞ ≥ ky1 k∞ . Consequently g1 (x1 + te3k0 ) − g1 (x1 ) f1 (x1 + te3k0 ) − f1 (x1 ) ≤ ≤ 0: t t Therefore, hx2∗ ; e3k0 i ≤ −1 while hx1∗ ; e3k0 i ≤ 0. This shows that kx1∗ + x2∗ k ≥ 1 as desired. (c) On the other hand, the following simple example from [215] shows that the uniform lower semicontinuity condition used here is not tight. Example 2.12. Let X 2; : : : ; satisfy kei k = 1  0 f1 (x) := −1=l  ∞

be an in nite-dimensional Banach space and let ei ∈ X; i = 1; and kei − ej k¿1=2 for i 6= j. De ne if x = 0; if x = ei =l; otherwise

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

and

701

 if x = 0; 0 f2 (x) := −1=l if x = (ei + e1 =i)=l;  ∞ otherwise;

where l = 1; 2; : : :. Then (f1 ; f2 ) is not locally uniformly lower semicontinuous at 0. In fact, for any h¿0, let l be the smallest positive integer satisfying 1=l¡h. Then 0 = inf (f1 + f2 )(x) x∈hB

¿ lim inf {f1 (x1 )+f2 (x2 ): kx1 − x2 k¡; x1 ; x2 ∈ hB} = −2=l: →0

However, it is easy to see that DF f1 (ei =l) = DF f2 ((ei + e1 =i)=l) = X ∗ and, therefore, the local sum rule holds at 0. 2.3. Multidirectional mean value inequality We consider multidirectional mean value inequalities established by Clarke and Ledyaev. They estimate the extremal values of a function on sets and such results cannot be derived from the usual mean value theorems for function values on points. Many pleasing applications can be found in [50, 52, 53, 133]. Let x ∈ X and Y ⊂ X . We denote by [x; Y ] the convex hull of {x} ∪ Y , i.e., [x; Y ] := {x +t(y −x): t ∈ [0; 1]; y ∈ Y } and d(Y; x) := inf {kx −yk: y ∈ Y } the distance from x to Y . The nature of this multidirectional mean value inequality is well illustrated by the following convex case of the theorem which is useful itself in many applications. Note that in the convex version of the mean value inequality we do not need to assume that X is Frechet smooth. Theorem 2.13 (Convex multidirectional mean value inequality). Let X be a Banach space; let Y be a nonempty; closed and convex subset of X and x ∈ X and let f : X → R be a convex continuous function. Suppose that f is bounded below on [x; Y ] and inf f(y) − f(x)¿r:

y∈Y

Then; for any ¿0; there exist z ∈ [x; Y ] and z ∗ ∈ @f(z); the convex subdi erential of f at z; such that r¡hz ∗ ; y − xi + ky − xk

for all y ∈ Y:

Further; we can choose z to satisfy f(z)¡ inf f + |r| + : [x; Y ]

Proof. (1) A special case: We begin by considering the special case when inf f(y)¿f(x)

y∈Y

and

r = −¡0:

702

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

Let f := f + [x; Y ] . Then f is bounded below on X . Without loss of generality, we may assume that ¡ inf f(y) − f(x): y∈Y

Applying Ekeland’s variational principle [78], we conclude that there exists z such that  f(z)¡ inf f + 

(9)

and   − kw − zk: f(w) ≥ f(z)

(10)

That is to say u → f(u) + [x;Y ] (u) + ku − zk  attains a minimum at z. By Eq. (9), f(z)¡+∞ hence z ∈ [x; Y ]. Since [x; Y ] is convex the sum rule for the convex subdi erential implies that there exists z ∗ ∈ @f(z) such that 0 ≤ hz ∗ ; w − zi + kw − zk; ∀w ∈ [x; Y ]. Using a smaller  to begin with if necessary we have, for w 6= z, 0¡hz ∗ ; w − zi + kw − zk;

∀w ∈ [x; Y ]\{z}:

(11)

 ≤ f(x) + ¡ inf Y f, so z 6∈ Y . Thus, Moreover by inequality (9) we have f(z) = f(z) we can write z = x + t(y − x) where t ∈ [0; 1). For any y ∈ Y set w = y + t(y − y) 6= z in Eq. (11) yields 0¡hz ∗ ; y − xi + ky − xk;

∀y ∈ Y:

(12)

(2) The general case: We now turn to the general case. Consider X × R with the norm k(x; r)k = kxk + |r|. Take an 0 ∈ (0; =2) small enough so that inf f(y) − f(x)¿r + 0

y∈Y

and de ne F(z; t) := f(z) − (r + 0 )t. Obviously, F is convex lower semicontinuous on X × R and is bounded below on [(x; 0); Y × {1}]. Moreover, inf F = inf f − (r + 0 )¿f(x) = F(x; 0):

Y ×{1}

Y

Applying the special case proved above with f, x and Y replaced by F, (x; 0) and Y × {1}, we conclude that there exist (z; s) ∈ [(x; 0); Y × {1}] and z ∗ ∈ @f(z) satisfying f(z) − (r + 0 )s¡

inf

(w; t)∈[(x; 0); Y ×{1}]

(f(w) − (r + 0 )t) + 0 ;

i.e., f(z) ¡

inf

(w; t)∈[(x; 0); Y ×{1}]

(f(w) − (r + 0 )(t − s)) + 0 ≤ inf f + |r| +  [x; Y ]

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

703

such that, for all y ∈ Y , 0 ¡ hz ∗ ; y − xi − (r + 0 ) + 0 (ky − xk + 1) = hz ∗ ; y − xi − r + 0 ky − xk ≤ hz ∗ ; y − xi − r + ky − xk: This completes the proof. A general nonsmooth version of the multidirectional mean value inequality may be derived by using the nonlocal fuzzy sum rule to replace the sum rule for convex subdi erentials. This leads to: Theorem 2.14 (Multidirectional mean value inequality). Let Y be a nonempty; closed and convex subset of X and x ∈ X and let f : X → R be a lower semicontinuous function. Suppose that; for some h¿0; f is bounded below on [x; Y ] + hBX and lim

inf

→0 y∈Y +BX

f(y) − f(x)¿r:

Then; for any ¿0; there exist z ∈ [x; Y ] + B and z ∗ ∈ DF f(z) such that r¡hz ∗ ; y − xi + ky − xk

for all y ∈ Y:

Further; we can choose z to satisfy inf

f(z)¡ lim

→0 [x; Y ]+BX

f + |r| + :

Proof. As in the proof of Theorem 2.13 we can reduce the general case to the special case when lim

inf

→0 y∈Y +BX

f(y)¿f(x)

and

r = −¡0:

We now prove this special case. Let f := f + [x; Y ]+hBX . Then f is bounded below on X . Fix a h ∈ (0; h=2) such that inf y∈Y +2hB  X f(y)¿f(x). Without loss of generality, we may assume that    ¡ min inf f(y) − f(x); h :  X y∈Y +2hB

Applying the nonlocal fuzzy sum rule of Theorem 2.1 to f1 := f and f2 := [x; Y ] we  = DF f(z) and u∗ ∈ NF ([x; Y ]; u) obtain that there exist z; u with kz −uk¡, z ∗ ∈ DF f(z) satisfying max(kz ∗ k; ku∗ k) · kz − uk¡

(13)

inf

(14)

and f(z)¡ lim

→0 [x; Y ]+BX

f +  ≤ f(x) + 

such that kz ∗ + u∗ k¡:

(15)

704

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

Since [x; Y ] is convex, NF ([x; Y ]; u) coincides with the normal cone of [x; Y ] at u in the sense of convex analysis. Thus, u∗ ∈ NF (u; [x; Y ]) implies that hu∗ ; w − ui ≤ 0;

∀w ∈ [x; Y ]:

(16)

Combining Eqs. (15) and (16) yields 0¡hz ∗ ; w − ui + kw − uk;

∀w ∈ [x; Y ]\{u}:

(17)

Moreover, we must have d(u; Y ) ≥ h for otherwise we would have d(z; Y ) ≤ 2h and  − x). f(z) ≥ inf y∈Y +2hB  X f(y)¿f(x) +  which contradicts Eq. (14). Let u := x + t(y  Then h ≤ ku − yk  = (1 − t)kx − yk  implies 1 − t¿0. Clearly x 6∈ Y . For any y ∈ Y set w = y + t(y − y) 6= u in Eq. (17) yields 0¡hz ∗ ; y − ui + ky − uk;

∀y ∈ Y:

(18)

Remark 2.15. (a) Clarke and Ledyaev proved versions of the multidirectional mean value inequality in two di erent settings: (1) for a lower semicontinuous function on a Hilbert space with regard to its values on a set and a point [50]; and (2) for a continuously Gateaux di erentiable function or for a general Lipschitz function on a Banach space with regard to its values on two sets (one of them compact) [49]. Theorem 2.14 stated here extends the type (1) mean value inequality by allowing X to be a more general Banach space and Y to be unbounded. This generalization was discovered independently by Aussel et al. [7] and by Zhu [217] in di erent settings. For other generalizations see Borwein et al. [33] and Radulescu and Clarke [162]. Luc derived a less general result in [133] similar to version (1) of the mean value inequality. In [123], Lewis and Ralph established an interesting relation: the nitedimensional version of the type (2) mean value inequality is equivalent to a hybrid convex-nonlinear (Fenchel) duality theorem. (b) As indicated in [195] condition lim→0 inf y∈Y +BX f(y) − f(x)¿r cannot be replaced by the tighter condition inf y∈Y f(y)−f(x)¿r in in nite-dimensional spaces. However; the two conditions are the same when X is nite-dimensional or when f is uniformly continuous. Moreover; the tighter condition inf y∈Y f(y) − f(x)¿r suces when f is convex (Theorem 2.13) or di erentiable (see [50, 53]). (c) Theorem 2.13 is a special case of the result in [117] which in turn is a generalization of the re ned version of the multidirectional mean value inequality for di erentiable functions in [50]. The conclusion of Theorem 2.13 is ner than Theorem 2.14 in that the point z in Theorem 2.13 belongs to [x; Y ]. In the nonsmooth nonconvex setting of Theorem 2.14 this is impossible as shown by the following simple example from [50]: Set X := R; x := 0; Y := {1} and de ne  p − |u| for u ≤ 0; f(u) := 1 for u¿0: Then lim → 0 inf y∈Y +BX f(y) − f(x) = 1; and applying Theorem 2.14 for r = 1=2 gives a point z and z ∗ ∈ DF f(z) such that z ∗ ¿1=2. But DF f(z) = {0} when z¿0; and DF f(0) = ∅; so z necessarily lies outside [0; 1].

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

705

(d) The term ky −xk in Theorem 2.14 is redundant when Y is bounded but cannot be dispensed with in general. A simple example is X := Y := R and f(y) := ey (see [217, Remark 3.4]). 2.4. The extremal principle The terminology “extremal principle” was rst used by Mordukhovich in [141] while the essence of the results can be traced back to Mordukhovich [136] for the nitedimensional case and Kruger and Mordukhovich [115] for in nite-dimensional extensions – given in terms of the “-Frechet” normal (where the general de nition of an extremal point was introduced). We may usefully view the principle as an extension of the Hahn–Banach separation theorem to nonconvex sets. Diverse applications of extremal principles can be found in the work of Kruger and Mordukhovich [115], Mordukhovich [139] and Mordukhovich and Shao [147]. We rst recall the de nition of an extremal point. Deÿnition 2.16 (Extremal system). Let S1 and S2 be closed sets in a Banach space X and let x ∈ S1 ∩ S2 . Then x is called a local extremal point of the set system {S1 ; S2 } if there are a neighborhood U of x and sequences {aik } ⊂ X; i = 1; 2, such that aik → 0 for i = 1; 2 and (S1 − a1k ) ∩ (S2 − a2k ) ∩ U = ∅

∀k = 1; 2; : : : :

(19)

We say that the sets S1 and S2 generate a (local) extremal system {S1 ; S2 } if they have at least one local extremal point. Theorem 2.17 (Extremal principle). Let S1 and S2 be closed sets in X . Let x ∈ S1 ∩ S2 be a local extremal point of the system {S1 ; S2 }. Then; for any ¿0 there exist xn ∈ Sn ∩ x + B and xn∗ ∈ NF (Sn ; xn ); such that kx1∗ k; kx2∗ k ≥ 1 − 

and kx1∗ + x2∗ k¡:

(20)

Proof. Denote elements of X × X by z = (z 1 ; z 2 ). Let x be a local extremal of (S1 ; S2 ) and x +hBX ⊂ U where U is a neighborhood of x as in the de nition of a local extremal point. Let 0 ¿0 be an arbitrary positive number. Take a ∈ X such that kak¡0 and S1 ∩ (S2 + a) ∩ (x + hBX ) = ∅. Applying the nonlocal fuzzy sum rule of Theorem 2.1 to functions f1 (z) := S1 × S2 (z) + kz 1 − z 2 − ak and f2 (z) := kz − (x; x )k2 yields that there exist xn and n ∈ DF fn (xn ); n = 1; 2 satisfying kx1 − x2 k¡0

(21)

and kx2 − (x; x )k2 ≤ f1 (x1 ) + f2 (x2 ) ≤ lim inf {f1 (y1 ) + f2 (y2 ): ky1 − y2 k¡; →0

y1 ; y2 ∈ (x; x ) + hBX ×X } + 0 ≤ f1 (x; x ) + f2 (x; x ) + 0 = kak + 0 ¡20

(22)

706

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

such that k1 + 2 k¡0 :

(23)

When 0 is small enough, relations (21) and (22) imply that kzni −x k¡ min(h; ). Taking 0 even smaller we can also ensure k1 k¡ and k2 k¡. Moreover, f1 (x1 )¡∞ implies that x11 ∈ S1 and x12 ∈ S2 so that kx11 −x12 −ak¿0. It follows from (11 ; 12 ) = 1 ∈ DF f1 (x1 ) that x1∗ := 11 −∇k·k(x11 −x12 −a) ∈ NF (S1 ; x1 ) and x2∗ := 12 +∇k·k(x11 −x12 −a) ∈ NF (S2 ; x2 ) satisfy the conclusion of the theorem. Remark 2.18. Recently; Mordukhovich and Shao further extended the extremal principle to Asplund spaces (spaces in which every separable subspace has a separable dual but originally de ned as those spaces on which every continuous convex function is generically Frechet di erentiable; see [155]) in terms of Frechet normals in [148]. Moreover, they showed that it is equivalent to the Asplund property of the Banach space in [149] by using Fabian’s local fuzzy sum rule characterization of Asplund spaces [79, 80]. The equivalence results in [217] then yield that all the four basic results discussed in this section characterize Asplund spaces. In particular; they all hold in Asplund spaces (and so in all re exive spaces). Extensions to more general settings can be found in [21, 33]. 3. Constrained minimization problems and subdi erential calculus Constrained optimization problems provide important models in many di erent applications. Finding good rst-order necessary conditions for solutions to such problems is a prerequisite. For smooth nite-dimensional problems the Lagrange multiplier theorem and Karush–Kuhn–Tucker conditions are the canonical results. There is a vast literature on the generalizations of these “critical point” conditions to nonsmooth and in nite-dimensional settings. We will present a fuzzy form of such necessary conditions in terms of subdi erentials using a variational analysis argument. Variational analysis enables us to impose minimal assumptions on the data: lower semicontinuity for the inequality constraints, continuity for the equality constraints and closeness for the feasible set and, thus, yields a quite general condition. We will also discuss the relationship of such conditions with subdi erential calculus. In this section we assume that X is a re exive Banach space. 3.1. Minimization problems with nitely many constraints  i = 0; 1; : : : ; N . Consider the following optimization Let C ⊂ X and fi : X → R; problem: P

minimize

f0 (x)

subject to

fi (x) ≤ 0;

i = 1; 2; : : : ; M;

fi (x) = 0;

i = M + 1; : : : ; N;

x ∈ C:

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

707

As usual, multipliers corresponding to the inequality constraints are nonnegative and multipliers corresponding to the equality constraints have no restriction. To simplify notation, we introduce the quantities i ; i = 0; 1; : : : ; N . The i ’s associated with the inequality constraints and the cost function are always 1; i.e., i := 1; i = 0; 1; : : : ; M . This corresponds to nonnegative multipliers. The i ’s associated with the equality constraints are either 1 or −1; corresponding to multipliers with arbitrary sign, i.e., i ∈ {−1; 1}; i = M +1; : : : ; N . We use the notation i ; i = 0; 1; : : : ; N throughout this section without further explanation. Theorem 3.1 (Fuzzy multiplier rule). Let C be a closed subset of X; let fi be lower semicontinuous for i = 0; 1; : : : ; M and fi be continuous for i = M + 1; : : : ; N and let x be a local solution of P. Assume that lim inf x→x d(DF fi (x); 0)¿0; for i = 1; : : : ; M and lim inf x→x d(DF fi (x) ∪ DF (−fi )(x); 0)¿0; for i = M + 1; : : : ; N . Then; for any positive  number ¿0 and any weak neighborhood V of 0 in X ∗ ; there exist (xi ; fi (xi )) ∈ (x;  + BX ×R ; i = 0; 1; : : : ; N and xN +1 ∈ x + BX such that fi (x)) 0 ∈ DF f0 (x0 ) +

N X

i DF (i fi )(xi ) + NF (C; xN +1 ) + V

i=1

where i ¿0; i = 1; : : : ; N . Remark 3.2. (a) It is easy to see that if f0 is C 1 then DF f0 (x0 ) in the conclusion  of the theorem can be replaced by ∇f0 (x). (b) Conditions lim inf x→x d(DF fi (x); 0)¿0; for i = 1; : : : ; M and lim inf d(DF fi (x) ∪ DF (−fi )(x); 0)¿0; x→x

for i = M + 1; : : : ; N serve as “constraint quali cations” to force the coecient 0 of DF f0 to be one. However; since our necessary conditions are in a fuzzy form they are less stringent than the usual constraint quali cations such as the Mangasarian– Fromovitz condition [134]. These conditions are not necessary if we do not insist 0 to be nonzero. Indeed; if the above condition fails for one of the fi ’s then we can assign the multiplier corresponding to that fi to be 1 and the rest of the multipliers to be 0. Thus; the following form of the multiplier rule holds without any constraint quali cation. Theorem 3.3. Let X be a re exive Banach space; let C be a closed subset of X; let fi be lower semicontinuous for i = 0; 1; : : : ; M and let gi be continuous for i = M + 1; : : : ; N . Assume that x is a local solution of P. Then; for any positive number  fi (x))  + ¿0 and any weak neighborhood V of 0 in X ∗ ; there exist (xi ; fi (xi )) ∈ (x; BX ×R ; i = 0; 1; : : : ; N and xN +1 ∈ x + BX such that 0∈

N X

i DF (i fi )(xi ) + NF (C; xN +1 ) + V;

i=0

where i ≥ 0; i = 0; 1; : : : ; N are not all zero.

708

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

Proof of Theorem 3.1 (Sketch). PN Since x is a local solution of P; it is a local minimum of f0 + C∩(∩Ni=1 Si ) = f0 + i=1 S i + C where Si := {x: fi (x) ≤ 0}; i = 1; : : : ; M and Si := {x: fi (x) = 0}; i = M + 1; : : : ; N . Therefore, ! N X S i + C (x): 0 ∈ DF f0 +  i=1

Applying the weak local fuzzy sum rule of Theorem 2.7, we get a necessary condition in terms of the subdi erential of f0 and normal cones of Si ; i = 1; 2; : : : ; N and C. The key then is to express the normal cone of Si in terms of the subdi erential of the corresponding function fi . These nontrivial facts are contained in the following theorems. Theorem 3.4. Let X be a re exive Banach space; let f : X → R be a lower semicon where S := {x: f(x) ≤ 0}. Then; either tinuous function and suppose  ∈ NF (S; x) (C1) for any ; ¿0 there exists (x; f(x)) ∈ (x;  f(x))  + BX ×R such that DF f(x) ∩ BX ∗ 6= ∅ or (C2) for any ¿0; there exist (x; f(x)) ∈ (x;  f(x))  + BX ×R ;  ∈ DF f(x) and ¿0 such that k − k¡: Theorem 3.5. Let X be a re exive Banach space. Let f : X → R be a continuous  where S := {x: f(x) = 0}. Then; either function and let  ∈ NF (S; x) (D1) for any ; ¿0 there exists (x; f(x)) ∈ (x;  f(x))  + BX ×R such that [DF f(x) ∪ DF (−f)(x)] ∩ BX ∗ 6= ∅ or (D2) for any ¿0; there exist (x; f(x)) ∈ (x;  f(x))  + BX ×R ;  ∈ DF f(x) ∪ DF (−f)(x) and ¿0 such that k − k¡: A full proof for Theorem 3.4 and a sketch of the proof for Theorem 3.5 in a Hilbert space using proximal subdi erentials and proximal normal cones are given below. Using the proximal subdi erential and the proximal normal cone makes the geometric idea behind the proofs easier to comprehend. It also strengthens the results. We refer to [32] for more technical proofs of these results in general re exive Banach spaces. The proximal subdi erential of f at x is de ned by requiring the supporting function g in the de nition of the Frechet subdi erential to be of the form g(y) = hx∗ ; y − xi − ky − xk2 where  is a given positive constant and is denoted by Dp f(x). That is we require a quadratic error term. The proximal normal cone of a closed set S at x ∈ S is de ned by Np (S; x) := Dp S (x). Theorem 3.6. Let X be a Hilbert space. Let f : X → R be a lower semicontinuous  where S := {x: f(x) ≤ 0}. Then; either function and let  ∈ Np (S; x)

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

709

(C1) for any ; ¿0 there exists (x; f(x)) ∈ (x;  f(x))  + BX ×R such that Dp f(x) ∩ BX 6= ∅ or (C2) for any ¿0; there exists (x; f(x)) ∈ (x;  f(x))  + BX ×R ;  ∈ Dp f(x) and ¿0 such that k − k¡:  Let r; ¿0 be Proof. Suppose that (C1) is not true. Then, in particular, 0 6∈ Dp f(x). constants such that, for all x ∈ S ∩ (x + rkkBX ); 0 ≥ h; x − xi  − kx − xk  2: Since f is lower semicontinuous, taking a smaller r if necessary, we may assume that f is bounded from below by −m on x + rkkBX for some positive constant m. Let  ∈ (0; r=4) be a constant such that h; x − xi  − kx − xk  2 is positive on (x + 2 +  and so is f. 2kkBX )\{x} For ¡ min(; 1=m) de ne h (x) := −1 max{0; kx − x − k − kk}2 : Note that h is a Lipschitz smooth function of x that has Lipschitz smooth derivative 0 at x whenever h (x) = 0 (in particular at x).  Consider (z): p (z) := f(z) + h (z) + {x+rkkB  X} Since 0 is not a proximal subderivative of f at x inf p ¡0: X

Let e := min( =2; − inf X p =2). By the Borwein–Preiss p smooth variational principle theorem [27] (see Theorem 1.6) with p := 2;  := 2e = ≤ 1 and  := e (see [27] and, for a Hilbert space version, [129]), there exist y ; w ∈ X with ky − w k¡2 such that p (y )¡ inf p + e ¡0 X

and such that z → p (z) + kz − w k2 2 attains a global minimum at y . Since p (y )¡0; y ∈ x + rkkBX and h (y ) ≤ h (y ) + ky − w k2 ¡ − f(y ) ≤ m: 2 Thus,

√ ky − x − k ≤ m + kk:

710

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

√ That is to say that y is inside the ball x +  + ( m + kk)BX . On the other hand, f(y )¡0 (because p (y )¡0) implies that y is outside the ball x + 2 + 2kkBX . Consequently  lim y = x:

→0

Moreover,  + e ; f(y ) ≤ p (y ) ≤ inf p + e ≤ f(x) X

 as → 0. It is and as before f being lower semicontinuous implies that f(y ) → f(x) easy to see that when is suciently small y is in the interior of the ball x + rkkBX and, therefore, z → f(z) + h (z) + kz − w k2 2 attains a local minimum at y . Consequently, the vector is  = k( )(x +  − y ) + (w − y ) a proximal subderivative of f at y where ( −1 2 (ky − x − k − kk)=ky − x − k if h (y )¿0; k( ) := 0 if h (y ) = 0: Since (C1) is not true we must have lim inf →0 k( ) greater than 0. (Otherwise there would exist a sequence j → 0 making Dp f(y ) 3  j → 0.) Thus,  := k( )( +  f(x))  + BX ×R and o(1)) as → 0. Choose small enough so that (y ; f(y )) ∈ (x; k =k( ) − k¡. Set x : = y ;  := 1=k( ) and  :=  to complete the proof. Theorem 3.7. Let X be a Hilbert space. Let f : X → R be a continuous function and  where S := {x: f(x) = 0}. Then; either let  ∈ Np (S; x) (D1) for any ; ¿0 there exists (x; f(x)) ∈ (x;  f(x))  + BX ×R such that [Dp f(x) ∪ Dp (−f)(x)] ∩ BX ∗ 6= ∅ or (D2) for any ¿0; there exist (x; f(x)) ∈ (x;  f(x))  + BX ×R ;  ∈ Dp f(x) ∪ Dp (−f)(x) and ¿0 such that k − k¡: Proof (Sketch). As in the proof of Theorem 3.6 we consider  6= 0. De ne r and m similarly with m being the upper bound for |f| instead of −f and de ne K and h in the same way. Observe that K is a convex set on which f 6= 0. Since f is continuous, it has constant sign on K. De ne ( f(z) + h (z) + {x+rkkB (z) if f is positive onK;  X} p (z) := (z) if f is negative on K: −f(z) + h (z) + {x+rkkB  X} The remainder of the proof closely follows the proof of Theorem 3.6.

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

711

Remark 3.8. The convexity of K follows from the concavity of the support function in the de nition of the proximal subderivative. This convexity also plays an important role in the proof of the general case. 3.2. A chain rule, a product rule and a quotient rule Let f1 ; : : : ; fN : X → R be lower semicontinuous functions and let f : RN → R be a lower semicontinuous function nondecreasing for each of its rst M variables (M ≤ N ).  Then one can check that Suppose that f( f1 ; : : : ; fN ) attains a local minimum at x.  : : : ; fN (x)))  is a local solution to the following minimization problem (on (x;  (f1 (x); X × RN ): minimize

f(y)

subject to

fn (x) − yn ≤ 0; n = 1; : : : ; M; fn (x) − yn = 0; n = M + 1; : : : ; N:

Applying the fuzzy multiplier rule of Theorem 3.1 in the previous subsection yields the following chain rule. Theorem 3.9 (Fuzzy chain rule). Suppose that f1 ; : : : ; fM : X → R are lower semicontinuous functions; fM +1 ; : : : ; fN are continuous functions and f : RN → R is a lower semicontinuous function nondecreasing for each of its rst M variables (M ≤ N ). Sup Then; for any positive number pose that f( f1 ; : : : ; fN ) attains a local minimum at x.  ¿0 and any weak-star neighborhood U of 0 in X ∗ ; there exist (xn ; fn (xn )) ∈ (x; fn (x))  + BX ×R ; n = 0; 1; : : : ; N; (y; f(y)) ∈ (y;  f(y))  + BRN +1 where y = (f1 (x);  :::;  and  = (1 ; : : : ; N ) ∈ DF f(y) + BRN such that fN (x)) 0∈

N X

DF (n fn )(xn ) + U:

n=0

The following corollary is obvious. Corollary 3.10. Suppose that f1 ; : : : ; fN : X → R are C 1 functions and f : RN → R is a Lipschitz function. Suppose that f( f1 ; : : : ; fN ) attains a local minimum at x.  Then; for any positive number ¿0 and any weak-star neighborhood U of 0 in X ∗ ; there  : : : ; fN (x))  and  = (1 ; : : : ; N ) ∈ exist (y; f(y)) ∈ (y;  f(y))+B  RN +1 where y = (f1 (x); DF f(y) + BRN such that 0∈

N X

n ∇fn (x)  + U:

n=0

Remark 3.11. When f1 ; : : : ; fN are locally Lipschitz one can derive a sharper conclusion in Theorem 3.9: ! N X n fn (xn ) + BX ∗ ; 0 ∈ DF n=0

712

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

see [53, 151]. However; such an improvement is impossible in the general case as shown by the following example from [215]. Example 3.12. Let X = R; N = 2; f1 (x) := x1=3 ; f2 (x) := −x1=5 and f(y1 ; y2 ) := max (y1 ; y2 ). Then f(f1 ; f2 ) attains a minimum at x = 0. Thus; 0 ∈ DF f(f1 ; f2 ) (0). For any y = (y1 ; y2 ); DF f(y) ⊂ A := {(a1 ; a2 ): a1 + a2 = 1; a1 ; a2 ≥ 0}. For any xed  ∈ (0; 1=2) and (1 ; 2 ) ∈ A + BR2 ; we can check directly that DF (1 f1 + 2 f2 ) (0) = ∅ and, for x 6= 0;  2 1 |DF (1 f1 + 2 f2 )(x)| = x−2=3 − x−4=5 → ∞ as x → 0: 3 5 Remark 3.13. In considering Remark 3.2(a) when f is C 1 we have more precisely  = ∇f(y).  This smooth version of the chainPrule is useful in deriving other calculus N rules. For example; setting f( f1 ; : : : ; fN ) := n=1 fn we may rededuce the weak fuzzy sum rule of Theorem 2.7. Then Example 2.10 also indirectly shows that the arbitrary weak-star neighborhood in the conclusions of the fuzzy chain rule and the fuzzy multiplier rule cannot be improved to an arbitrary norm neighborhood. Similarly; QN by setting f( f1 ; : : : ; fN ) := n=1 fn and f(f1 ; f2 ) := f1 =f2 ; respectively; we derive the following product rule and quotient rule:  Theorem 3.14 (Fuzzy product QN rule). Let f1 ; : : : ; fN : X → R be lower semicontinuous  Then, for any positive functions. Suppose that n=1 fn attains a local minimum at x. number ¿0 and any weak-star neighborhood U of 0 in X ∗ ; there exist (xn ; fn (xn )) ∈  + BX ×R ; n = 1; : : : ; N; such that (x;  fn (x))

0∈

N X

DF ( f1 (x)  : : : fn−1 (x)f  n (·)fn+1 (x)  : : : fN (x))  (xn ) + U:

n=1

Theorem 3.15 (Fuzzy quotient rule). Let f1 : X → R be a lower semicontinuous function and let f2 : X → R be a continuous function. Suppose that f1 =f2 attains a local  6= 0. Then, for any positive number ¿0 and any weak-star minimum at x and f2 (x)  fn (x))  + BX ×R ; n = 1; 2; such neighborhood U of 0 in X ∗ ; there exist (xn ; fn (xn )) ∈ (x; that

0∈

 1 (·)](x1 ) + DF [−f1 (x)f  2 (·)](x2 ) DF [f2 (x)f + U: 2 f2 (x) 

Remark 3.16. Mordukhovich and Shao [151] showed that under additional “fuzzy quali cation conditions” one can replace the weak neighborhood U in the above fuzzy product rule and fuzzy quotient rule by an arbitrary small norm neighborhood and gain additional quantitative estimate on the subderivatives involved.

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

713

3.3. Minimization problems with in nitely many constraints Consider the fundamental optimization problem with in nitely many equality and inequality constraints PI

minimize

f0 (x)

subject to

fs (x) ≤ 0;

s ∈ S;

ft (x) = 0;

t ∈ T;

where S and T are arbitrary sets. Again we use quantities s = 1; s ∈ S and t ∈ {−1; 1}; t ∈ T to simplify notation. Theorem 3.17. Let X be a re exive Banach space; let fs be lower semicontinuous for s ∈ S and let ft be continuous for t ∈ T . Let x be a local solution of Pb . Assume that lim inf x→x d(DF fs (x); 0)¿0; for s ∈ S and lim inf x→x d(DF ft (x) ∪ DF (−ft )(x); 0)¿0; for t ∈ T . Then; for any positive number ¿0 and any weak neighborhood U of 0  fi (x))  + in X ∗ ; there exist nite sets S; U ⊂ S and T; U ⊂ T such that (xi ; fi (xi )) ∈ (x; BX ×R ; i ∈ S; U ∪ T; U and X i DF (i fi ) (xi ) + U 0 ∈ DF f0 (x0 ) + i∈S; U ∪T; U

where i ≥ 0; i ∈ S; U ∪ T; U are not all 0. Proof. The strategy is to reduce the problem to one with only nitely many constraints. Without loss of generality we may assume that BX ∗ ⊂ U . Let L be a nite-dimensional subspace of X such that L⊥ ⊂ 13 U . Note if x is a solution to PI it is also a solution to PI with f0 replaced by f := f0 + K where K := x + L ∩ BX . Thus, " # " # \ \ 2  ≤ f0 (x)  − }∩ {x: fs (x) ≤ bs } ∩ {x: ft (x) = bt } = ∅: {x: f(x) s∈S

t∈T

Since f has compact level sets there exist nite sets S; U ⊂ S and T; U ⊂ T (these sets depend on  and L and, therefore, on U ) such that     \ \  ≤ f0 (x)  − 2 } ∩  {x: fs (x) ≤ bs } ∩  {x: ft (x) = bt } = ∅: {x: f(x) s∈S; U

t∈T; U

 = bs for Removing some elements T from S; U if necessary T we may assume that fs (x) all s ∈ S; U . Set C := [ s∈S; U {x: fs (x) ≤ bs }] ∩ [ t∈T; U {x: ft (x) = bt }]. Then  x)¡ f(  inf f + 2 : C

  0 BX and s ∈ S; U ∪{0} and Choose an 0 ¡=2 such that fs (x)−=2¡f s (x) for any x ∈ x+ 0  − ft (x)|¡=2 for any x ∈ x +  BX and t ∈ T; U . Invoking the smooth variational |ft (x)

714

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

principle [27] or Theorem 1.6 there exist x ∈ x + 0 BX and a C 1 function g with k∇g(x )k¡=3 such that f + g + C attains a minimum at x = x . In other words, x is a solution to the following minimization problem: minimize

 + g(x) f(x)

subject to

fs (x) ≤ bs ;

s ∈ S; U ;

ft (x) = bt ;

t ∈ T; U :

Applying the fuzzy multiplier rule of Theorem 3.1 there exist (xi ; fi (xi )) ∈ (x ; fi (x ))+ 0 BX ×R ; i ∈ S; U ∪ T; U and y0 so close to x that k∇g(y0 )k¡=3 and X

 0 ) + ∇g(y0 ) + 0 ∈ DF f(y

i∈S; U ∪T; U

1 i DF (i fi )(xi ) + U; 3

where i ≥ 0; i ∈ S; U ∪ T; U are not all 0. Since x ∈ C; we have fs (x)  − =2¡fs (x ) ≤  Thus, (xi ; fi (xi )) ∈ (x;  fi (x))  + BX ×R ; i ∈ S; U ∪ T; U . Applying the Weak bs = fs (x). Fuzzy Sum Rule of Theorem 2.7 to f we have  0 ) ⊂ DF f0 (x0 ) + 1 U; DF f(y 3 where x0 is a point that we may choose so that (x0 ; f0 (x0 )) is very close to (y0 ; f0 (y0 ))  f0 (x))  + BX ×R . Then to make (x0 ; f0 (x0 )) ∈ (x ; f0 (x )) + 0 BX ×R ⊂(x; 1 0 ∈ DF f(x0 ) + U + ∇g(y0 ) + 3 ⊂ DF f(x0 ) +

X

X i∈S; U ∪T; U

1 i DF (i fi )(xi ) + U 3

i DF (i fi )(xi ) + U:

i∈S; U ∪T; U

3.4. Subdi erential for maximum functions “Max” functions are closely related to constrained optimization problems. Let f(x) :=  f(x))  is a sup{fs (x): s ∈ S}. Then f attaining a local minimum at x implies that (x; local solution to the constrained minimization problem minimize

y

subject to

fs (x) − y ≤ 0;

s ∈ S:

Using the last relation, a subdi erential formula for the maximum function follows directly from the Fuzzy multiplier rule of Theorem 3.1.  s ∈ S be lower semiTheorem 3.18 (Subdi erential of max functions). Let fs : X → R; continuous functions and let f(x) := sup{fs (x): s ∈ S}. Suppose that f attains a local minimum at x.  Then; for any ¿0 and weak-star neighborhood U of 0 in X ∗ ; there exist a nite subset S; U of S; nonnegative numbers s and xs ∈ x + BX ; s ∈ S; U such

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

that 0∈

X

715

s DF fs (xs ) + U

s∈S; U

and



X

s − 1

¡:

s∈S; U

We note that the following classical result on the generalized gradient of the supremum of a family of convex functions can be recaptured from Theorem 3.18 through a limiting process.  s ∈ S and let Theorem 3.19 (Clarke [45, Section 2.8, Corollary 1]). Let fs : X → R; f(x) := sup{fs (x): s ∈ S}. Assume that (i) S is a compact metric space; (ii) fs is continuous as a function of s and convex as a function of x on a convex open set U ⊂ X ; (iii) each fs ; s ∈ S; is Lipschitz of given rank L on U and {fs (x): s ∈ S} is nite for any x ∈ U . Then f(x) is convex on U and; for each x ∈ U; Z  @c fs (x)(ds):  ∈ P[M (x)] ; @c f(x) = S

where M (x) := {s ∈ S: f(x) = fs (x)} and P[M (x)] is the collection of (Radon) probability measures supported on M (x). Note that the Clarke generalized gradient coincides with the convex subderivative for continuous convex functions [165]. 3.5. Notes First-order necessary conditions for constrained optimization problems comprise one of the main topics of nonsmooth analysis. Such conditions have been discussed in many di erent settings. Examples in terms of limiting generalized derivative objects and related literature can be found in [45, 96, 129, 139]. These results requires additional constraint quali cations and compactness conditions largely due to the nature of the limiting generalized derivative objects. (We will show later in Section 7 that those limiting forms of the rst order necessary conditions typically fail in general in nite dimensional spaces without additional assumptions.) Necessary conditions in fuzzy form without constraint quali cations were rst discussed in [115] in terms of Frechet norms of the epigraphs (for the inequality constraints), graphs ( for the equality constraints) of fi ’s and C in Frechet-smooth Banach spaces. Theorem 3.3 was proved in [32]. The idea of using the indicator functions of the level sets of the constraint functions comes from Treiman [192]. The chain rule was derived in [215]. Theorem 3.17 and the maximum formula is derived in [34].

716

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

4. Approximate mean value theorem and subdi erential criteria for functional properties One of the most important applications of the classical di erential theory is in the provision of derivative criteria for various properties of functions such as monotonicity, Lipschitzness, convexity, etc. In this section we discuss corresponding subdi erential criteria. 4.1. Approximate mean value theorems An important tool is Zagrodny’s powerful approximate mean value theorem. The following is a version of his approximate mean value theorem phrased in terms of the Frechet subdi erential. Theorem 4.1 (Approximate mean value theorem). Let f : X → R be a lower semicontinuous function de ned on X and let a; b ∈ X be two distinct points with f(a)¡∞ and let r ∈ R be such that r ≤ f(b). Then there exist c ∈ [a; b) and sequence xn with (xn ; f(xn )) → (c; f(c)) and xn∗ ∈ DF f(xn ) such that (i) lim inf n→∞ hxn∗ ; c − xn i ≥ 0; (ii) lim inf n→∞ hxn∗ ; b − ai ≥ r − f(a). Proof. Take v ∈ X ∗ such that hv; a − bi = r − f(a): Then g(x) := f(x) + hv; xi + [a; b] (x) attains its minimum at some c ∈ [a; b) because g(b) ≥ g(a). Applying the local fuzzy sum rule of Theorem 2.6 there exist sequences xn ; yn ; xn∗ and yn∗ satisfying (xn ; f(xn )) → (c; f(c)); xn∗ ∈ DF f(xn ); [a; b] 3 yn → c and yn∗ ∈ NF ([a; b]; yn ) such that kxn∗ k · kxn − yn k¡1=n; kyn∗ k · kxn − yn k¡1=n and kxn∗ + yn∗ + vk¡1=n: Then (i) can be derived directly via lim inf hxn∗ ; c − xn i = lim inf hxn∗ + v; c − xn i n→∞

n→∞

= lim inf h−yn∗ ; c − yn i ≥ 0: n→∞

To show (ii) note that c ∈ [a; b) implies that yn ∈ [a; b) for n suciently large. Then hxn∗ + v; b − ai = hxn∗ + v; b − yn i

kb − ak : kb − yn k

Taking limits we obtain lim inf hxn∗ + v; b − ai = lim inf hxn∗ + v; b − yn i n→∞

n→∞

kb − ak kb − yn k

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

= lim inf h−yn∗ ; b − yn i n→∞

717

kb − ak ≥ 0: kb − ck

This is (ii) in disguise. Remark 4.2. By passing to a subsequence one can replace the limit inferior in Theorem 4.1 by limit. 4.2. Lipschitz properties Theorem 4.3. Let U ⊂ X be an open convex set with U ∩ dom( f) 6= ∅ and let L¿0. Then f is Lipschitz with rank L on U if and only if; for all x ∈ U; sup{kx∗ k: x∗ ∈ DF f(x)} ≤ L. Proof. The “only if ” part is straightforward. We prove the “if ” part. Let a; b ∈ U with a ∈ dom(f ) and a 6= b; let r ∈ R such that r ≤ f(b); and let ¿0. It follows from Theorem 4.1(ii) that there exist x ∈ U and x∗ ∈ DF f(x) such that r − f(a) ≤ hx∗ ; b − ai +  ≤ Lkb − ak + : Since r ≤ f(b) and ¿0 are arbitrary, we derive that f(b)−f(a) ≤ Lkb−ak. Therefore, f(b)¡∞. Exchanging the roles of a and b we can conclude that f is Lipschitz of rank L on U . Corollary 4.4. Let U ⊂ X be a path connected open set with U ∩ dom( f) 6= ∅. Then f is a constant function on U if and only if; for all x ∈ U; DF f(x) ⊂{0}. 4.3. Cone-monotonicity Theorem 4.5. Let K be a cone in X . If; for all x; DF f(x) ⊂K − := {x∗ ∈X ∗ : hx∗ ; ki ≤ 0; ∀k ∈ K} then f is K-nonincreasing; that is; y ∈ x + K implies f(y) ≤ f(x). Proof. Let x; y ∈ X such that f(x)¡f(y). It follows from the Approximate Mean Value Theorem that there exist z ∈ dom( f) and z ∗ ∈ DF f(z) with hz ∗ ; y−xi¿0. Therefore y − x does not belong to K. 4.4. Weak monotonicity A weak monotonicity result can be derived similarly by replacing the approximate mean value theorem with the multidirectional mean value inequality. Theorem 4.6. Let D be a nonempty; compact; convex subset of X and let f : X → R be a lower semicontinuous function. Suppose that; for any u ∈ X; u∗ ∈ DF f(u) implies that mind∈D hu∗ ; di ≤ 0. Then; for any x and for any t¿0; one has min f(y) ≤ f(x):

y∈x+tD

718

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

Proof. There is nothing to prove if f(x) = +∞. When f(x)¡+∞ applying the multidirectional mean value inequality with Y := x + tD we have inf

lim

→0 y∈Y +BX

f(y) − f(x) = min f(y) − f(x): y∈x+tD

Choosing any r¡ miny∈x+tD f(y) − f(x) Theorem 2.14 and Remark 2.14(c) assert that there exist z and z ∗ ∈ DF f(z) such that r¡ minhz ∗ ; di ≤ 0: d∈D

Letting r → miny∈x+tD f(y) − f(x) completes the proof. 4.5. Quasi-convexity We recall that a function f : X → R is called quasiconvex provided, for any x; y ∈ dom f and z ∈ [x; y], f(z) ≤ max{f(x); f(y)} and that a multifunction F : X → X ∗ is quasimonotone if x∗ ∈ F(x);

y∗ ∈ F(y)

and

hx∗ ; y − xi¿0 ⇒ hy∗ ; y − xi ≥ 0:

Theorem 4.7. If DF f is quasimonotone then f is quasiconvex. Proof. We work by way of contradiction. Assume that there exist some x; y; z ∈ X such that z ∈ [x; y] and f(z)¿max{f(x); f(y)}. Applying Theorem 4.1 with a = x and b = z, there exist sequences xn and xn∗ ∈ DF f(xn ) such that xn → x ∈ [x; z), lim infn→∞ hxn∗ ; x − xn i ≥ 0 and lim infn→∞ hxn∗ ; z − xi¿0. Combining with y − x = (ky − x k=kz − xk)(z − x) we have lim inf hxn∗ ; y − xn i¿0: n→∞

(24)

Let  ∈ (0; 1) be such that z = x + (y − x ) and set zn := xn + (y − xn ). Then zn → z. Since f is lower semicontinuous in considering relation (24) we can pick an integer n such that f(zn )¿f(y) and hxn∗ ; y − xn i¿0:

(25)

Applying Theorem 4.1 again with a := y and b := zn , there exist sequences yk and yk∗ ∈ DF f(yk ) such that yk → y ∈[y; zn ), lim infk→∞ hyk∗ ; y −yk i ≥0 and lim infk→∞ hyk∗ ; zn − yi¿0. Noting that zn − y and xn − y lie in the same direction we obtain lim inf hyk∗ ; xn − yk i¿0: k→∞

(26)

Since y ∈ [xn ; y) inequality (25) yields lim inf hxn∗ ; yk − xn i = hxn∗ ; y − xn i¿0: k→∞

(27)

Inequalities (26) and (27) imply that, for k suciently large, we have both hyk∗ ; xn − yk i¿0 and hxn∗ ; yk − xn i¿0, i.e. DF f is not quasimonotone, a contradiction.

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

719

4.6. Convexity Theorem 4.8. If DF f is monotone then f is convex. Proof. If DF f is monotone then for each  ∈ X ∗ the operator x → DF f(x)+ = DF (f+ )(x) is monotone, hence quasimonotone. By Theorem 4.7, for each  ∈ X ∗ , the function f +  is quasiconvex. This is equivalent to the convexity of f. 4.7. Maximal monotonicity The approximate mean value theorem also a ords a simple and elegant proof of the maximal monotonicity of the convex subgradient of a convex lower semicontinu∗ ous function. Recall that a monotone multifunction F : X → 2 X is said to be maximal monotone if graph F does not properly contained in the graph of any monotone multifunction. Theorem 4.9. Let f : X → R be a proper semicontinuous function. If dom f 6= ∅ and DF f is monotone then DF f is maximal monotone. Proof. Let b ∈ X and b∗ ∈ X ∗ be such that b∗ ∈ DF f(b). We need to show that there exists x ∈ X and x∗ ∈ DF f(x) such that hx∗ −b∗ ; x−bi¡0. Observing that 0 ∈DF (f−b∗ ) (b) and, therefore, b is not a minimum of f−b∗ , there exists a ∈X such that (f−b∗ ) (a)¡( f−b∗ )(b). Then it follows from Theorem 4.1 there exist a sequence xn converges to c ∈ [a; b) and xn∗ ∈ DF f(xn ) such that yn∗ := xn∗ −b∗ ∈ DF (f −b∗ )(xn ) satisfying lim infn→∞ hyn∗ ; c − xn i ≥ 0 and lim infn→∞ hyn∗ ; b − ai¿0. It follows that lim inf hxn∗ − b∗ ; b − xn i n→∞

≥ lim inf hyn∗ ; b − ci + lim inf hyn∗ ; c − xn i n→∞

n→∞

kb − ck lim inf hy∗ ; b − ai + lim inf hyn∗ ; c − xn i¿0: ≥ n→∞ kb − ak n→∞ n It remains to set x := xn and x∗ := xn∗ for n suciently large. 4.8. Notes The approximate mean value theorem was proven rst by Zagrodny in terms of the Clarke generalized subgradient [213]. The Frechet subdi erential form was given in [130] along with some applications to function behaviour. Other related research can be found in [6, 33, 133, 147, 217]. The proof given here by using the re ned local fuzzy sum rule is taken from [215]. The prototypes of the Lipschitz criterion appeared in [167] for functions on nitedimensional spaces and in [188] for functions on in nite-dimensional spaces. Results in terms of the proximal subdi erential appeared in [48, 58, 59] which also include a criterion for cone monotonicity. The proof adopted here by using Zagrodny’s approximate

720

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

mean value inequality was discovered independently by Thibault and Zagrodny [186] (in a more general setting) and Loewen [130]. The weak monotonicity result is derived by Clarke and Ledyaev [50]. It is useful, for example, in discussing controllability of di erential inclusion systems (see [52]). Characterization of convexity for lower semicontinuous functions was discussed in [59, 63, 157] and quasiconvexity was discussed in [131]. The short proofs here follow a more general version of those of Aussel et al. [6]. The proof of maximal monotonicity of a convex subgradient has a long history [155]. The approximate mean value theorem was rst used to prove maximal monotonicity, in arbitrary Banach space, by Zagrodny in terms of the Clarke generalized gradients. 5. Coderivative calculus for multifunctions Multifunctions (set-valued maps) naturally appear in various areas of nonlinear analysis, optimization, control theory and mathematical economics. Aubin and Frankowska’s book [4] is an excellent introduction to the theory of multifunctions. Coderivatives are convenient derivative-like objects for multifunctions and were introduced by Mordukhovich [137] motivated by applications to optimal control (see [145] for more discussions on the motivations and the relationship among coderivatives and other derivative like objects for multifunctions). They are de ned via “normal cones” to the graph of the multifunctions. Di erent normal cones will yield di erent coderivatives. In this section we will discuss the Frechet coderivative of multifunctions corresponding to the Frechet normal cones. Deÿnition 5.1 (Coderivative). Let F : X → 2Y be a multifunction with a closed graph between Banach spaces X and Y , and let (x; y)  ∈ graph F. The Frechet coderivative of F at (x; y)  is de ned by  ∗ ) := {x∗ ∈ X ∗ | (x∗ ; −y∗ ) ∈ NF (graph F; (x; y))}:  DF∗ F(x; y)(y Note that when F is a single-valued C 1 function the Frechet coderivative coincides with the dual of the Frechet derivative. Calculus rules for coderivatives of multifunctions may be established by reducing them to calculus for subdi erentials of indicator functions of the graphs of the corresponding multifunctions. As with the fuzzy calculus for functions, the calculus for the Frechet coderivative can be derived in weak (accurate up to a weak-star neighborhood) and strong (accurate up to a norm neighborhood) forms. 5.1. Weak calculus We start with the weak form. It only requires the multifunctions under consideration to have closed graphs. Theorem 5.2 (Weak coderivative sum rule). Let X and Y be Frechet-smooth Banach spaces with equivalent Frechet smooth norms. Let Fn ; n = 1; 2; : : : ; N and F =

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

721

PN be closed-graph multifunctions from X into Y; and y ∈ n=1 Fn (x). Fix arbiPN  ∗ ). Then trary y n ∈ Fn (x); n = 1; 2; : : : ; N with y = n=1 y n . Suppose x∗ ∈ DF∗ F(x; y)(y for any ¿0 and any weak-star neighborhoods; U and V; of the origins in X ∗ and Y ∗ respectively; there exist (xn ; yn ) ∈ (graph Fn ) ∩ ((x; y n )+B); yn∗ ∈ y∗ +V; n = 1; 2; : : : ; N and xn∗ ∈ DF∗ Fn (xn ; yn )(yn∗ ) with maxn=1;:::; N (kxn∗ k; kyn∗ k) diam((x1 ; y1 ); : : : ; (xN ; yN ))¡ such that PN

n=1 Fn

x∗ ∈

N X

xn∗ + U:

(28)

n=1

 ∗ ). Then there exist a concave C 1 function g on X × Y Proof. Let x∗ ∈ DF∗ F(x; y)(y ∗ ∗  such that with (x ; −y ) = ∇g(x; y) (x; y) → graph(PN

n=1 Fn )

(x; y) − g(x; y)

attains a local minimum 0 at (x; y).  Since ! N N N X X X graph Fn (x; yn )≥graph F x; yn and graph Fn (x; y n ) = graph F (x; y)=  0; n=1

n=1

n=1

the function (x; y1 ; y2 ; : : : ; yN ) →

N X

graph Fn (x; yn ) − g x;

n=1

N X

! yn

n=1

attains a local minimum at (x; y 1 ; y 2 ; : : : ; y N ). Thus ! N X graph Fn (x; y n ) : (x∗ ; −y∗ ; : : : ; −y∗ ) ∈ DF n=1

Invoking the weak local fuzzy sum rule of Theorem 2.7, there exist xn ∈ x + BX and yn ∈ y n +BY , as well as yn∗ ∈ Y ∗ and xn∗ ∈ DF∗ F(xn ; yn )(yn∗ ) with maxn=1;:::; N (kxn∗ k; kyn∗ k) diam((x1 ; y1 ); : : : ; (xN ; yN ))¡ such that (x∗ ; −y∗ ; : : : ; −y∗ ) ∈ (x1∗ ; −y1∗ ; 0; : : : ; 0) + (x2∗ ; 0; −y2∗ ; 0; : : : ; 0) + · · · + (x2∗ ; 0; : : : ; 0; −yN∗ ) + U × V × · · · × V: Therefore one has yn∗ ∈ y∗ + V; n = 1; 2; : : : ; N and Eq. (28). This completes the proof of the theorem. Let X; Y , and Z be Banach spaces and let G : X → 2Y and F : Y → 2Z be arbitrary multifunctions with closed graphs. We de ne the composition of F and G by [ F(y): (29) (F ◦ G)(x) := F(G(x)) = y∈G(x)

Then a chain rule follows from a similar argument.

722

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

Theorem 5.3 (Weak coderivative chain rule). Let X; Y and Z be Frechet-smooth Banach spaces; let G : X → 2Y and F : Y → 2Z be multifunctions with closed graphs; and let y ∈ G(x) and z ∈ F(y).  Suppose x∗ ∈ DF∗ (F ◦ G)(x; z )(z ∗ ). Then; for any ¿0 and any neighborhoods; U; V and W of the origins in X ∗ ; Y ∗ and Z ∗ ; respectively; there exist x2 ∈ x + BX yn ∈ y n + BY ; n = 1; 2 and z1 ∈ z + BZ ; as well as x2∗ ∈ X ∗ ; yn∗ ∈ Y ∗ ; n = 1; 2 and z1∗ ∈ Z ∗ satisfying y1∗ − y2∗ ∈ V; z1∗ ∈ z ∗ + W; y1∗ ∈ DF∗ F(y1 ; z1 )(z1∗ ); and x2∗ ∈DF∗ G(x2 ; y2 )(y2∗ ) with max(kx2∗ k; ky1∗ k; ky2∗ k; kz1∗ k) · k(x1 ; y1 ) − (x2 ; y2 )k¡ such that x∗ ∈ x2∗ + U:

(30)

Proof. Let x∗ ∈ DF∗ (F ◦ G)(x; z )(z ∗ ). Then there exists a concave C 1 function g on X × Z with (x∗ ; −z ∗ ) = ∇g(x; z ) such that (x; z) → graph (F◦G) (x; z) − g(x; z) attains a local minimum 0 at (x; z ). Observe that graph F (y; z) + graph G (x; y) ≥ graph (F◦G) (x; z) and  z ) + graph G (x; y)  = graph(F◦G) (x; z ) = 0: graph F (y; We conclude that (x; y;  z ) is a local minimum of the function (x; y; z) → graph F (y; z) + graph G (x; y) − g(x; z): Therefore (x∗ ; 0; −z ∗ ) ∈ DF (graph F (y;  z ) + graph G (x; y)):  Applying the weak local fuzzy sum rule of Theorem 2.7, we can select x2 ∈ x + BX ; yn ∈ y + BY , n = 1; 2, and z1 ∈ z + BZ as well as x2∗ ; y1∗ ; y2∗ ; z1∗ with max(kx2∗ k; ky1∗ k; ky2∗ k; kz1∗ k) · k(x1 ; y1 ) − (x2 ; y2 )k¡ such that y1∗ ∈ DF∗ F(y1 ; z1 )(z1∗ ), x2∗ ∈ DF∗ G(x2 ; y2 )(y2∗ ) and (x∗ ; 0; −z ∗ ) ∈ (0; y1∗ ; −z1∗ ) + (x2∗ ; −y2∗ ; 0) + U × V × W: Then we have y1∗ − y2∗ ∈ V , z1∗ ∈ z ∗ + W and Eq. (30). 5.2. Strong calculus The strong calculus for the Frechet coderivative can be established similarly by using the strong local fuzzy sum rule. Now the sequential uniform lower semicontinuity condition in De nition 2.8 comes into play. First it follows easily from the de nition

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

723

that the sequential uniform lower semicontinuity condition is stable when adding a “nice” function as is made precise in the following lemma. Lemma 5.4. Let f1 ; : : : ; fN : X → R be lower semicontinuous functions. If ( f1 ; : : : ; fN ) is sequentially uniform lower semicontinuous at x and fN +1 : X → R is uniformly continuous around x then (f1 ; : : : ; fN ; fN +1 ) is sequentially uniform lower semicontinuous at x. Secondly, as shown by Io e [104] the sequential uniform lower semicontinuity is equivalent to the following general metric regularity condition. Deÿnition 5.5 (General T metric regularity). Let f1 ; : : : ; fN : X → R be lower semicontinN uous functions and x ∈ n=1 dom ( fn ). We say that ( f1 ; : : : ; fN ) satis es the general metric quali cation condition at x provided that there is an ! ∈ K [the set of nondecreasing nonnegative functions on R+ which are continuous at zero with !(0) = 0] such that d (x; a); epi

N X

!! fn

≤!

N X

n=1

! d((x; an ); epi( fn ))

n=1

for all x in a neighborhood of x and all a; an satisfying a =

PN

n=1

an .

This general metric regularity condition is particularly useful for indicator functions of closed subsets as is shown by the following lemma from [106]. TN Lemma 5.6. Let S1 ; : : : ; SN be closed subsets of X and let x ∈ n=1 Sn . Then the indicator functions Sn satisfy the general metric regularity condition at x if and only if there is an ! ∈ K such that d x;

N \ n=1

! Sn

≤!

N X

! d(x; Sn )

n=1

for all x in a neighborhood of x . Geometrically, this says that the distance to the intersection is of the same order as the sum of the distances to the individual sets. Strong calculus for the coderivatives of multifunctions then can be established under this general metric regularity condition. Theorem 5.7 (Strong coderivative PN sum rule). Let X and Y be Frechet-smooth spaces; and let Fn ; n = 1; 2; : : : ; N; F = n=1 Fn be closed-graph multifunctions from X into Y; PN and suppose y ∈ n=1 Fn (x). PN  ∗ ). Fix arbitrary y n ∈ Fn (x); n = 1; 2; : : : ; N with y = n=1 y n . Let x∗ ∈ DF∗ F(x; y)(y Suppose that graph Fn ; n = 1; : : : ; N satisfy the following general metric regularity

724

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

condition: for any (x; y1 ; : : : ; yN ) suciently close to (x; y 1 ; : : : ; y N ); ! N X d((x; yn ); graph Fn ) ; d((x; y1 ; : : : ; yN ); T ) ≤ ! n=1

where T := {(x; y1 ; : : : ; yN ): (x; yn ) ∈ graph Fn ); n = 1; : : : ; N } and ! ∈ K. Then for any ¿0; there exist (xn ; yn ) ∈ (graph Fn ) ∩ ((x; y n ) + BX × Y ); kyn∗ − y∗ k¡; n = 1; 2; : : : ; N and xn∗ ∈ DF∗ Fn (xn ; yn )(yn∗ ) with max (kxn∗ k; kyn∗ k) · diam((x1 ; y1 ); : : : ; (xN ; yN ))¡

n=1;:::; N

such that

N

∗ X ∗ xn ¡:

x −

(31)

n=1

 ∗ ). Then there exists a concave C 1 function g on X × Y Proof. Let x∗ ∈ DF∗ F(x; y)(y ∗ ∗  such that (x; y) → graph (F) (x; y)−g(x; y) attains a local minwith (x ; −y ) = ∇g(x; y) imum 0 at (x; y).  Since ! N N X X graph Fn (x; yn )≥graph (F) x; yn n=1

n=1

and N X

graph Fn (x; y n )=graph F (x; y)=0; 

n=1

the function (x; y1 ; y2 ; : : : ; yN ) →

N X n=1

graph Fn (x; yn ) − g x;

N X

! yn

n=1

attains a local minimum at (x; y 1 ; y 2 ; : : : ; y N ). Since the graphs of Fn ; n = 1; : : : ; N; satisfy the general metric regularity condition, (F1 ; : : : ; FN ) is sequentially uniformly lower semicontinuous. Then Lemma 5.4 implies that (F1 ; : : : ; FN ; −g) is sequentially uniPN formly lower semicontinuous. Let ’(x; y1 ; : : : ; yN ) := g(x; n=1 yn ). Then ∇’(x; y 1 ; : : : ; y N ) = (x∗ ; −y∗ ; : : : ; −y∗ ). Since g is C 1 there exists an 0 ¡=2 such that k(x; y1 ; : : : ; yN ) − (x; y 1 ; : : : ; y N )k¡0 implies that k∇’(x; y 1 ; : : : ; y N ) − (x∗ ; −y∗ ; : : : ; −y∗ )k¡=2: Invoke the strong local fuzzy sum rule of Theorem 2.6 with 0 in place of  and use (−x∗ ; y∗ ; : : : ; y∗ ) to replace the gradient of ’ at a point in the 0 neighborhood of (x; y 1 ; : : : ; y N ) with an error of at most =2. We conclude that there exist xn ∈ x + BX and yn ∈ y n + BY , as well as yn∗ ∈ Y ∗ and xn∗ ∈ DF∗ F(xn ; yn )(yn∗ ), with max (kxn∗ k; kyn∗ k) · diam((x1 ; y1 ); : : : ; (xN ; yN ))¡

n=1;:::; N

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

725

such that 0 ∈ (−x∗ ; y∗ ; : : : ; y∗ ) + (x1∗ ; −y1∗ ; 0; : : : ; 0) + (x2∗ ; 0; −y2∗ ; 0; : : : ; 0) + · · · + (x2∗ ; 0; : : : ; 0; −yN∗ ) + BX ∗ ×Y ∗ ×···×Y ∗ : The conclusion of the theorem follows. A chain rule may be similarly derived. Theorem 5.8 (Strong coderivative chain rule). Let X; Y and Z be Frechet-smooth Banach spaces; let G : X → 2Y and F : Y → 2Z be multifunctions with closed graphs; and let y ∈ G(x) and z ∈ F(y).  Suppose that x∗ ∈ DF∗ (F ◦ G)(x; z )(z ∗ ) and suppose that graph F and graph G satisfy the following general metric regularity condition : for all (x; y; z) suciently close to (x; y;  z ); d((x; y; z); T ) ≤ !(d((x; y); graph G) + d((y; z); graph F)); where T := {(x; y; z): y ∈ G(x); z ∈ F(y)} and ! ∈ K. Then; for any ¿0 there exist x2 ∈ x +BX yn ∈ y n +BY ; n = 1; 2 and z1 ∈ z +BZ ; as well as x2∗ ∈ X ∗ ; yn∗ ∈ Y ∗ ; n = 1; 2 and z1∗ ∈ Z ∗ satisfying ky1∗ − y2∗ k¡; kz1∗ − z ∗ k¡; y1∗ ∈ DF∗ F(y1 ; z1 )(z1∗ ); and x2∗ ∈ DF∗ G(x2 ; y2 )(y2∗ ) with max(kx2∗ k; ky1∗ k; ky2∗ k; kz1∗ k) · k(x1 ; y1 ) − (x2 ; y2 )k¡ such that kx∗ − x2∗ k¡:

(32)

5.3. Notes Strong calculus rules for Frechet coderivatives were established by Mordukhovich and Shao [151] under a “fuzzy quali cation condition” and by Io e and Penot [106] using the general metric regularity condition introduced by Io e [103]. The “fuzzy” constraint quali cation is stronger than general metric regularity as pointed out in Io e [104], who also showed that general metric regularity is equivalent to sequential uniform lower semicontinuity. On the other hand, the “fuzzy” constraint quali cation in [146] gives better estimates on the size of the coderivatives involved in these calculus rules and is convenient when studying the limiting coderivatives. The weak calculus was provided by Mordukhovich et al. [152] where an “intermediate” calculus in terms of other bornological coderivatives and exact calculus for limiting coderivatives were also discussed. Reduction to subdi erential calculus is a convenient way of establishing the coderivative calculus. There are several ways to do it. We used the method of Mordukhovich et al. [152] and Zhu [215] which exploits the structure of the viscosity subdi erentials for indicator functions. Io e and Penot [106] use marginal functions, Jourani and Thibault [111, 112] use the distance function, and Rockafeller and Wets [172] use projections.

726

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

6. Implicit function theorems, open coverings and metric regularity The grandfather of this whole subject was the celebrated Liusternik theorem which says that if f is a C 1 mapping from Banach space to Banach space Y and ∇f(x)X = Y then f is an open mapping in a neighborhood of x. Liusternik’s theorem is closely related to metric regularity and has many important applications. Many extensions have been produced over the years. It transpires that using a variational argument and the coderivative de ned in the previous section this result can be extended to arbitrary closed multifunctions. Even more generally, one can thus prove an implicit function theorem which simultaneously implies both the open mapping theorem and metric regularity for multifunctions. 6.1. An implicit function theorem ∗ When considering a multifunction with “two variables” F(x; y); we will use DF; x or to denote the “partial” Frechet coderivatives, i.e., Frechet coderivatives of F as a function of x or y alone.

∗ DF; y

Theorem 6.1 (Implicit function theorem). Let U be an open set in X × Y and let F : U → 2Z be a closed-valued multifunction satisfying (i) for any xed y; x → F(x; y) is upper semicontinuous; (ii) for any xed x; y → F(x; y) is Lipschitz with rank L; i.e.; for any yi ; i = 1; 2 with (x; yi ) ∈ U one has F(x; y2 ) ⊂ F(x; y1 ) + Lky2 − y1 kBZ ; (iii) there exists a ¿0 such that for any (x; y) ∈ U with 0 ∈ F(x; y); z ∈ F(x; y) and ∗ ∗ ∗ ∗ x∗ ∈ DF; x F(x; y; z)(z ) imply that kx k ≥ kz k; (iv) there exists (x0 ; y0 ) ∈ U such that 0 ∈ F(x0 ; y0 ): Then; for any K¿L=; G(y) := {x ∈ X : (x; y) ∈ U; 0 ∈ F(x; y)} is pseudo-Lipschitz around (x0 ; y0 ) with rank K; i.e.; there exist open sets V and W containing y0 and x0 ; respectively; such that; for any y1 ; y2 ∈ V; W ∩ G(y2 ) ⊂ G(y1 ) + Kky2 − y1 kBX : Proof. Let fy (x) := inf [kuk + graph F ((x; y); u))]: u∈Z

Then fy0 (x0 ) = 0. Moreover, it is easy to check that f is lower semicontinuous in x and is Lipschitz in y with rank L. Let r be a positive number such that (x0 + 2rBX ) × (y0 + (r=2K)BY ) ⊂ U . We rst show that G(y) 6= ∅; ∀y ∈ V := int(y0 + (r=2K)BY ).

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

727

Let y be an arbitrary element of V . Since f is Lipschitz in y with rank L and fy0 (x0 ) = 0 we have fy (x0 )¡r. Choose rst a positive number  and then a positive number ¡=2 such that r − fy (x0 )¿2 + (1 + )(r − 2) + : Applying the multidirectional mean value inequality to function x → fy (x) and set x0 + (r − 2)BX yields that inf

x∈x0 +(r−2)B

fy (x) − fy (x0 ) − ¡h; x − x0 i;

∀x ∈ x0 + (r − 2)BX ;

(33)

where  ∈ DF fy (w) and w ∈ x0 + (r − )B. By the de nition of subderivative there is a Frechet-smooth function g such that ∇g(w) =  and x → fy (x) − g(x) attains a local minimum 0 at x = w. Then there exists 0 ¡ such that inf

u∈Z; x∈w+0 BX

[kuk + graph F ((x; y); u)) − g(x)] ≥ 0 = fy (w) − g(w):

Let uw ∈ Z satisfy kuw k + graph F ((w; y); uw )) − g(w) ¡

inf

u∈Z; x∈w+0 BX

[kuk + graph F ((x; y); u)) − g(x)] + 2 :

Invoking the smooth variational principle [27] or Theorem 1.6 with  :=  we conclude that there exists a C 1 function  and x0 ∈ w + BX ⊂ x0 + (r − =2)BX ⊂ x0 + rBX and u0 ∈ Z such that (x; u) → kuk + graph F ((x; y); u)) − g(x) + (x; u) attains a minimum at (x0 ; u0 ) and k∇(x0 ; u0 )k¡=3. If 0 ∈ F(x0 ; y) then we are done. Otherwise, by the Strong Local Fuzzy Sum Rule of Theorem 2.6 there exist u1 , (xi ; ui ); i = 2; 3 close to (x0 ; u0 ) so that 0 ∈ F(x2 ; y), u1 6= 0, x2 ∈ x0 + rBX , k∇g(x3 ) − k¡=3 ∗ ∗ and k∇(x3 ; u3 )k¡=3 and x∗ ∈ DF; x F(x2 ; y; u2 )(z ) such that k{0} × ∇ku1 k + (x∗ ; −z ∗ ) − ∇g(x3 ) × {0} + ∇(x3 ; u3 )k¡=3: Then we have kk ≥ kx∗ k −  ≥ kz ∗ k −  ≥ (k∇ku1 kk − ) −  =  − (1 + ):

(34)

Combining Eqs. (33) and (34) we obtain 06

inf

x∈x0 +(r−2)BX

fy (x) ≤ fy (x0 ) − [ − (1 + )](r − 2) + 

= fy (x0 ) − r + 2 + (1 + )(r − 2) +  a contradiction. It remains to show that G is pseudo-Lipschitz with rank K. Set W = int(x0 + rBX ). Let y1 ; y2 ∈ V and let x2 be an arbitrary element of W ∩ G(y2 ). Then fy2 (x2 ) = 0. Since

728

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

fy (x2 ) is Lipschitz in y with rank L we have fy1 (x2 ) ≤ Lky2 −y1 k. Let r1 := Kky2 −y1 k. Then fy1 (x2 )¡r1 . The same argument as above shows that there exists x1 ∈ x2 + r1 BX = x2 + Kky2 − y1 kBX ⊂ x0 + 2rBX such that 0 ∈ F(x1 ; y1 ), i.e., x1 ∈ G(y1 ). Since x2 is an arbitrary element of W ∩ G(y2 ) we arrive at W ∩ G(y2 ) ⊂ G(y1 ) + Kky2 − y1 kBX , as was to be shown. 6.2. Open covering at a linear rate Let F be a multifunction from X to Y . Applying the implicit function Theorem 6.1 to the multifunction (x; y) → F(x) − y with Z = Y yields the following open covering theorem. Theorem 6.2 (Open covering theorem). Let U be an open set in X and let F : U → 2Y be a closed-valued upper semicontinuous multifunction. Suppose that there exists a ¿0 such that for any x ∈ U and y ∈ F(x); y∗ ∈ D∗ F(x; y)(x∗ ) implies that ky∗ k ≥ kx∗ k; Then; for any k¡ and any x + rBX ⊂ U; F(x) + krBY ⊂ F(x + rBX ) Remark 6.3. The property in the conclusion of the theorem is more precise than the general open covering and is called covering at a linear rate k. When such a covering property holds on a neighborhood of a point x; we say F has a covering property at linear rate k around x. The supremum of all the linear covering rate for a multifunction F around x is called the covering bound; and measures openness; of F at x and is denoted by (cov F)(x) in [140]. It was shown in [146] that the covering bound can be calculated by the following formula: Theorem 6.4. Let F: X → 2Y be a closed-valued upper semicontinuous multifunction in a neighborhood of x . Then (cov F)(x) = sup inf {kx∗ k: x∗ ∈ DF∗ F(x; y)(y∗ ); ¿0

x ∈ x + BX :y ∈ F(x) and ky∗ k = 1}: Proof. Let a := sup inf {kx∗ k: x∗ ∈ DF∗ F(x; y)(y∗ ); x ∈ x + BX :y ∈ F(x) and ky∗ k = 1}: ¿0

It follows from the Open Covering Theorem 6.2 that (cov F)(x) ≥ a: Suppose that a¡(cov F)(x). Let b be a positive number such that a + 2b¡(cov F)(x) and let U be a neighborhood of x on which F has the covering property at a linear rate with a covering bound (cov F)(x). Then there exist sequences xk ; yk ; xk∗ ; yk∗ such that xk → x , yk ∈ F(xk ), kyk∗ k = 1, kxk∗ k¡a + b and xk∗ ∈ DF∗ (xk ; yk )(yk∗ ):

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

729

By the de nition of the Frechet coderivative, for each k, there exists a concave C 1 function gk such that ∇g(xk ; yk ) = (xk∗ ; −yk∗ ) and graph F (x; y) − gk (x; y) attains a minimum 0 at (xk ; yk ). In particular, for any y ∈ F(x), gk (x; y) − gk (xk ; yk ) ≤ 0. Without loss of generality we may assume that xk ∈ U for all k. Taking rk ¿0 small enough so that xk + rk BX ⊂ U and x ∈ xk + rk BX and y ∈ yk + (a + 2b)rk BY implies gk (x; y) − gk (xk ; yk ) ≥ hxk∗ ; x − xk i − hyk∗ ; y − yk i − k1 (kx − xk k + ky − yk k): (35) −hyk∗ ; vk i = 1.

Then, by the covering property Now let vk ∈ BY be an element such that of F, yk + (a + 2b)rk vk ∈ F(xk + rk BX ), i.e., there exists uk ∈ BX with yk + (a + 2b)rk vk ∈ F(xk + rk uk ). Setting y := yk + (a + 2b)rk vk and x := xk + rk uk in Eq. (35) we have gk (xk + rk uk ; yk + (a + 2b)rk vk ) − gk (xk ; yk ) 1 ≥ −(a + b)rk + (a + 2b)rk − ((a + b)rk + (a + 2b)rk ) k   2a + 3b rk : = b− k It is evident that the right-hand side is positive when k is large enough. However, yk + (a + 2b)rk vk ∈ F(xk + rk uk ) implies that gk (xk + rk uk ; yk + (a + 2b)rk vk ) − gk (xk ; yk ) ≤ 0, a contradiction. Thus, we must have (cov F)(x) = a: 6.3. Metric regularity Metric regularity and the open covering properties for multifunctions are in fact two facets of a common property. Thus, the coderivate criterion for the open covering property is also a criterion for metric regularity. We make this parallel precise in this section. A multifunction F: X → 2Y is said to be (global) metrically regular around x with modulus r¿0 if there exist a neighborhood U of x and a positive constant s¿0 such that d(x; F −1 (y)) ≤ r · d(y; F(x)) for all x ∈ U and y ∈ F(x) + sBY . The in mum of all such moduli r is called the bound of (global) metric regularity for F at x and is denoted by (greg F)(x). Theorem 6.5 (Metric regularity). F is (global) metrically regular around x if and only if it has the linear covering property around x . Moreover; one has (greg F)(x) = 1=(cov F)(x): Proof. First let us assume that F is (global) metrically regular around x with a bound of global metric regularity (greg F)(x). Let a¡1=(greg F)(x) and consider r ∈ ((greg F)

730

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

(x); 1=a). Then r is a modulus of regularity of F at x . Therefore, there exist a neighborhood U of x and a positive constant s such that, for any x ∈ U and any y with d(y; F(x)) ≤ s, d(x; F −1 (y)) ≤ r · d(y; F(x)): Now, for any x ∈ U , consider any t¡s=a with x + tBX ∈ U . Then y ∈ F(x) + atBY implies that d(y; F(x)) ≤ s and, therefore, d(x; F −1 (y)) ≤ r · d(y; F(x)) ≤ r · a · t¡t; or y ∈ F(x + tBX ). That is to say F has a covering property around x and (cov F)(x) ≥ a: Letting a approach 1=(greg F)(x) yields (cov F)(x) ≥ 1=(greg F)(x): Now we assume that F has the linear covering property around x with a covering bound (cov F)(x). Let a¡(cov F)(x). Then there exists a neighborhood U of x such  implies that that for any x ∈ U and r with x + rB  X ⊂ U , one has r ∈ (0; r) F(x) + arBY ⊂ F(x + rBX ): Let s = ar.  Consider y ∈ F(x) + sBY . For any  ∈ (0; a) we have   d(y; F(x)) d(y; F(x)) BY ⊂ F x + BX ; y ∈ F(x) + a a− a− that is to say, d(x; F −1 (y)) ≤

1 d(y; F(x)): a−

Therefore, (greg F)(x) ≤

1 : a−

Letting  → 0 and a → (cov F)(x) we get (greg F)(x) ≤

1 : (cov F)(x)

Thus, (greg F)(x) =

1 (cov F)(x)

and the proof is completed. A most satisfactory characterization of regularity then follows from Theorems 6.4 and 6.5

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

731

Theorem 6.6. Let F : X → 2Y be a closed-valued upper semicontinuous multifunction in a neighborhood of x . Then (greg F)(x) = 1= sup inf {kx∗ k: x∗ ∈ DF∗ F(x; y)(y∗ ); ¿0

x ∈ x + BX ; y ∈ F(x) and ky∗ k = 1}: 6.4. Notes Miljutin rst observed that the openness at a linear rate as de ned here is actually contained in the original proof of the Liusternik theorem. See also Dolecki [77] for a related de nition. Sucient conditions for the open covering at a linear rate were studied under di erent names by Dmitruk et al. [76] for Lipschitz mappings and by Warga [202, 205] for continuous mappings. The corresponding concept for multifunctions was formulated in [138] where the rst coderivative criterion for such a property was given. Related results can also be found in [4] where one can also nd additional references. The exact calculation formula in Theorem 6.4 and some other characterizations were proven by Mordukhovich [140] for the nite-dimensional case and later extended to the in nite-dimensional spaces by Mordukhovich and Shao [146]. Metric regularity for multifunctions with closed and convexed graphs were rst studied by Robinson [163, 164] and Ursescu [194]. Sucient conditions for metric regularity of Lipschitz nonsmooth functions were discussed in [92], using a variational argument for the rst time. Metric regularity is a property that is closely related to many other important properties and topics such as error bounds [90], open covering theorems and stability among others [13]. The equivalence of regularity and openness at a linear rate for a multifunction along with a necessary and sucient condition was established in [35]. The precise relation between the regularity bound and the covering bound was given in [140] and the coderivative formula for the covering bound in Theorem 6.4 was derived in [146]. Metric regularity is equally potent in the context of image reconstruction. A recent survey that highlights this application is [11]. Inverse mapping theorems and implicit function theorems for single-valued nonsmooth functions were discussed in [43, 89, 200, 201]. Proving implicit function and open mapping theorems by using the multidirectional mean value inequality rst appears in the Clarke et al. monograph [53]. The general implicit function theorem given here is a special case of the more general results in [117]. 7. Limiting forms Results involving smooth subderivative can often be rephrased in terms of the limiting subderivative, the singular subderivative and the limiting normal cone. Let us recall the de nition of those limiting objects. Deÿnition 7.1. Firstly let f : X → R be a lower semicontinuous function. De ne o n @f(x) := w∗ − lim vi : vi ∈ DF f(xi ); (xi ; f(xi )) → (x; f(x)) ; i→∞

732

and

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

n o @∞f(x) := w∗ − lim ti vi : vi ∈ DF f(xi ); ti → 0+ ; (xi ; f(xi )) → (x; f(x)) i→∞

and call @f(x) and @∞f(x) the subderivative and singular subderivative of f at x; respectively. Secondly, let S be a closed subset of X . De ne o n N (S; x) := w∗ − lim vi : vi ∈ NF (S; xi ); S 3 xi → x i→∞

and call N (S; x) the normal cone of S at x. Finally, let F : X → 2Y be a multifunction ∗ with closed graph and let y ∈ F(x). We de ne the coderivative @∗ F(x; y) : Y ∗ → 2X of F at (x; y) by x∗ ∈ @∗ F(x; y)(y∗ ) if and only if (x∗ ; −y∗ ) ∈ N (graph F; (x; y)): Because the unit ball in a nite-dimensional space is norm compact such a limiting process yields an equivalent formulation of the corresponding fuzzy result. Such limiting results do not, however, generalize to in nite-dimensional spaces without additional and strong assumptions. We will discuss positive results in nite-dimensional space and limiting counterexamples in in nite-dimensional spaces, respectively. 7.1. Positive limiting results in nite-dimensional spaces Throughout this section, we assume that X is a nite-dimensional Banach space. Naive limit taking produce a calculus in terms of limiting subdi erentials from the corresponding fuzzy calculus. We illustrate such a limiting process in detail using the sum rule. Theorem 7.2 (Limiting sum rule). Let f1 ; : : : ; fN : X → R be lower semicontinuous PN functions and n=1 fn attains a local minimum at x . Then; either 0∈

N X

@fn (x);

(A1)

n=1

or there exist un ∈ @∞ (fn )(x); n = 1; : : : ; N not all zero such that 0=

N X

un :

(A2)

n=1

Proof. By Theorem 2.6, for each i, there exist (xni ; fn (xni )) ∈ (x; fn (x)) + (1=i)BX ×R and in ∈ DF fn (xni ) such that

N

X 1

i n ¡ : (36)

i

n=1

De ne ti :=

PN

n=1

kin k. We consider the following two cases.

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

733

Case 1. The sequence ti is bounded. Then, without loss of generality, we may assume that in converges to n . It is obvious that n ∈ @fn (x). Upon taking limits in Eq. (36) PN PN we obtain 0 = n=1 n ∈ n=1 @fn (x): Case 2. The sequence ti is unbounded. Then, without loss of generality, we may assume that ti → ∞ and in =ti converges to un . Then un ∈ @∞fn (x) by the de nition of the singular limiting P subdi erential. Dividing Eq. (36) byP ti and taking limits we obtain PN N N i 0 = n=1 un : Since k k=t = 1 we conclude that i n n=1 n=1 kun k = 1 and, therefore, un are not all 0. Other calculus rules given in terms of the limiting subdi erential can be derived similarly. Alternatively, they can be established by, for example, rst establishing a limiting multiplier rule corresponding to Theorem 3.3 and then using the relations in Section 3. We will state these results without proof. Consider the optimization problem P in Section 3.1. Using the notation n introduced there we have: Theorem 7.3 (Limiting multiplier rule). Let X be a nite-dimensional Banach space; let C be a closed subset of X and let fn be lower semicontinuous for n = 0; 1; : : : ; M and continuous for n = M + 1; : : : ; N . Suppose that x is a local solution of problem P. Then either: (A1) there exist un∞ ∈ @∞ (n fn )(x); n = 0; 1; : : : ; N and uN∞+1 ∈ N (C; x ) not all zero such that 0=

N +1 X

un∞ ;

n=0

PN or there exist n ≥ 0; n = 0; : : : ; N satisfying n=0 n = 1 such that (A2) X X m @(m fm )(x) + @∞ (m fm )(x) + N (C; x ): 0∈ m∈{n:n ¿0}

m∈{n:n =0}

Theorem 7.4 (Limiting chain rule). Let X be a nite-dimensional Banach space; let  n = 1; : : : ; M be lower semicontinuous functions and let f : R N → R and fn : X → R; fn : X → R; n = M + 1; : : : ; N be continuous functions. Suppose that f(f1 ; : : : ; fN ) attains a minimum at x . Then either: (A1) there exist un∞ ∈ @∞ (n fn )(x); n = 1; : : : ; N; not all zero such that 0=

N X

un∞ ;

n=1

or there exist  = (1 ; : : : ; N ) ∈ @f(f1 ; : : : ; fN )(x) such that (A2) X X m @(m fm )(x) + @∞ (m fm )(x): 0∈ m∈{n:n 6= 0}

m∈{n:n =0}

734

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

The calculus for limiting coderivatives follows naturally. Recall that a multifunction F : X → 2Y is lower semicompact around x if there is a neighborhood of U of x such that for any x ∈ U and any sequence xk → x with F(xk ) 6= ∅, there is a sequence yk ∈ F(xk ) containing a norm convergent subsequence [150]. Theorem 7.5 (Limiting coderivative sum rule). Let X and Y be nite-dimensional Banach spaces; let F1 and F2 be multifunctions from X to Y with closed graphs; and let y ∈ F1 (x) + F2 (x). Assume that the multifunction S(x; y) := {(y1 ; y2 ): y1 ∈ F1 (x); y2 ∈ F2 (x); y1 + y2 = y} is lower semicompact around (x; y);  and that the following condition is ful lled: @∗ F1 (x; y1 )(0) ∩ (−@∗ F2 (x; y2 )(0)) = {0}; Then

[

 ∗) ⊂ @∗ (F1 + F2 )(x; y)(y

∀(y1 ; y2 ) ∈ S(x; y): 

[@∗ F1 (x; y1 )(y∗ ) + @∗ F2 (x; y2 )(y∗ )]:

(y1 ; y2 ) ∈ S(x;y) 

Theorem 7.6 (Limiting coderivative chain rule). Let X; Y and Z be nite-dimensional Banach spaces and let F : X × Y → 2Z and G : X → 2Y be multifunctions with closed graphs. Assume that the multifunction M (x; z) := G(x) ∩ F −1 (z) = {y ∈ G(x): z ∈ F(x; y)} is lower semicompact around (x; z ). Assume also that for any y ∈ M (x; z ) the regularity condition  z )(0) & −x∗ ∈ @∗ G(x; y)(y  ∗ )] ⇒ x∗ = 0 & y∗ = 0 [(x∗ ; y∗ ) ∈ @∗ F((x; y); holds. Then; for all z ∗ ∈ Z ∗ ; @∗ (F ◦ G)(x; z )(z ∗ ) ⊂

[

[x1∗ + x2∗ : x1∗ ∈ @∗ G(x; y)(y  ∗ );

y∈M  (x; z )

 z )(z ∗ )]: (x2∗ ; y∗ ) ∈ @∗ F((x; y); The extremal principle also has a corresponding limiting form. Theorem 7.7 (Limiting extremal principle). Let S1 and S2 be closed subsets of a nite-dimensional space X and let x be a local extremal point of (S1 ; S2 ). Then N (S1 ; x ) ∩ (−N (S2 ; x )) 6= {0}: The characterization of open covering at a linear rate, metric regularity and the pseudo-Lipschitz property for multifunctions can also be formulated in terms of the limiting coderivative. We will state only the open covering theorem.

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

735

Theorem 7.8 (Limiting open covering theorem). Let X and Y be nite-dimensional Banach spaces and let F : X → 2Y be locally bounded around x . Then the following are equivalent: 1. F has an open covering property with linear rate around (x; y)  ∈ graph(F):  ∗ ); ky∗ k = 1; y ∈ F(x)}¿0: 2. inf {kx∗ k: x∗ ∈ @∗ F(x; y)(y 3. There exist a number c¿0 and a neighborhood U of x such that ky∗ k ≤ ckx∗ k for any x∗ ∈ @∗ F(x; y)(y∗ ); x ∈ U; and y ∈ F(x): 4. There exists a neighborhood U of x such that; for any x ∈ U and y ∈ F(x); Ker @∗ F(x; y) := {y∗ ∈ Y ∗ : 0 ∈ @∗ F(x; y)(y∗ )} = {0}: 5. For all y ∈ F(x);  = {0}: Ker @∗ F(x; y) Example 7.9. Let X and Y be nite-dimensional spaces; let A be a linear operator that maps X onto Y and let C and D be closed subset of Y and X , respectively. De ne F(x) := Ax + C if x ∈ D and F(x) := ∅ if x ∈= D. Applying Theorem 7.8 to F with the help of the limiting coderivative sum rule we have: if; for x ∈ D and any y ∈ Ax + C; A∗ N (C; y) ∩ N (D; x) = {0}; then A((x + rBX ) ∩ D) + C is an open set for any r¿0 suciently small. In particular, let X = Y; let A be the identity mapping and let C and D be closed cones. Applying the above result with x := 0 we obtain the polarity result that if C 0 ∩ D0 = {0} then C + D=X. 7.2. Negative results in in nite-dimensional spaces Analyzing the proof of the limiting sum rule of Theorem 7.2, we can see that in an in nite-dimensional space we will not be able to guarantee that un are not all 0. When all the un ’s are 0 the alternative (A2) is trivial. In fact, most of the limiting results fail in in nite-dimensional spaces. We start with an example showing that the limiting sum rule does not hold in in nite-dimensional spaces. Example 7.10. Our example is built with the following basic construction: An (in nite-dimensional) Hilbert space H with two closed subspaces M1 and M2 such that M1⊥ + M2⊥ is dense in H but not closed and M1⊥ ∩ M2⊥ = {0}. De ne f1 := M1 and f2 := M2 + hv; ·i where −v ∈ H \(M1⊥ + M2⊥ ). Since M1⊥ + M2⊥ dense implies that M1 ∩ M2 = {0}, f1 + f2 attains a minimum at 0. However, it is easy to check that @f1 (0) = @∞f1 (0) = M1⊥ , @f2 (0) = M2⊥ + v and @∞f2 (0) = M2⊥ . Thus, 0 ∈= @f1 (0) + @f2 (0) and @∞f1 (0) ∩ (−@∞f2 (0)) = {0}

736

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

or, equivalently, 0 ∈ @∞f1 (0) + @∞f2 (0) holds only in the trivial case. As a concrete example of the basic construction, let H := ‘2 and denote the unit vectors by {un }. Suppose { n } is a sequence of positive real numbers with 1¿ n ≥ p 1 − 1=n2 . De ne M1 := cl span{h1 ; h2 ; : : :} and M2 := cl span{g1 ; g2 ; : : :} where hn := p u2n and gn := 1 − n2 u2n−1 − n u2n . Then we can directly verify that M1⊥ := cl span{e1 ; e2 ; : : :}

and M2⊥ := cl span; {f1 ; f2 ; : : :}; p := n u2n−1 + 1 − n2 u2n . where en := u2n−1 , fnP ∞ Then, for any x = n=1 xn un ∈ H , the partial sum ! 2N N N X X X x2n n x p 2n fn ∈ M1⊥ + M2⊥ : xn un = x2n−1 − p en + 2 1 − n 1 − n2 n=1 n=1 n=1 Therefore, M1⊥ + M2⊥ is dense in H . We can show by a similar argument that M1 + M2 is dense in H which implies that M1 ∩ M2 = 0. It remains to show that M1⊥ + M2⊥ 6= H . Consider ∞ q X 1 − n2 u2n : v := n=1

P∞ P∞ If v = y + z with y ∈ M1⊥ and z ∈ M2⊥ then y = n=1 yn en and z = n=1 zn fn because ⊥ ⊥ {en } and {fn } are orthonormal basis for M1 and M2 , respectively. Then we must have zn = 1 and yn = zn n = n → 1 which is impossible. Since Theorem 7.3 implies 7.4 and the later implies the limiting sum rule PTheorem N by setting f(f1 ; : : : ; fN ) := n=1 fn , Example 7.10 also shows that these two results fail in in nite-dimensional spaces. Variations of this basic construction also show that the coderivative calculus does not hold in in nite-dimensional spaces. Example 7.11. Again let X be a separable Banach space and let M1 and M2 be closed subspaces of X such that M1⊥ ∩ M2⊥ = 0 and M1⊥ + M2⊥ is w∗ -dense but not closed. Recall that M1⊥ + M2⊥ w∗ -dense implies that M1 ∩ M2 = {0}. De ne multifunctions F1 ; F2 : H → 2R by graph(Fi ) := Mi × R + , i = 1; 2. Then graph(F1 + F2 ) = {0} × R + . Consider 0 ∈ F1 (0) + F2 (0). The set S(x; y) := {(y1 ; y2 ) ∈ R 2 : y1 ∈ F1 (x); y2 ∈ F2 (x); y1 + y2 = y} is {(0; 0)} at (x; y) = (0; 0) and ∅ elsewhere. It is obviously lower semicompact around (0; 0). An easy calculation shows that @∗ Fi (0; 0)(0) = Mi⊥ for i = 1; 2 and @∗ (F1 + F2 )(0; 0)(0) = X ∗ . Thus, the regularity condition @∗ F1 (0; 0)(0) ∩ (−@∗ F2 (0; 0)(0)) = {0} holds yet the sum rule @∗ (F1 + F2 )(0; 0)(0) ⊂ @∗ Fi (0; 0)(0) + @∗ Fi (0; 0)(0) fails.

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

737

Example 7.12. Let the separable Banach space X and its subsets M1 and M2 be as in the previous example. Let Y = Z = R. De ne multifunctions F and G by  + R ; x ∈ M1 ; G(x) := ∅; otherwise and

 F(x; r) :=

R + ; (x; r) ∈ M2 × R + ; ∅; otherwise:

Then F(x; G(x)) =



R+; ∅;

x = 0; otherwise:

When x = 0 and z = 0 we have M (0; 0) = {0} and it is the only value for (x; z) that makes M (x; z) 6= ∅. Thus, M is lower semicompact around (0; 0). Next we check that the regularity condition is satis ed. In fact, x∗ and y∗ satisfying the regularity condition amounts to (x∗ ; y∗ ; 0) ∈ N (graph(F); (0; 0; 0)) = M2⊥ × (−R + ) × (−R + ) and (−x∗ ; −y∗ ) ∈ N (graph(G); (0; 0)) = M1⊥ × (−R + ): This obviously implies that x∗ = 0 and y∗ = 0. Nevertheless, the chain rule does not hold because @∗ (F ◦ G)(0; 0)(0) = X ∗ ; while x1∗ ∈ @∗ G(0; 0)(y∗ ) and (x2∗ ; y∗ ) ∈ @∗ F((0; 0); 0)(0) imply that x1∗ ∈ M1⊥ and x2∗ ∈ M2⊥ . An adaptation of the basic construction would also show that the extremal principle does not hold in in nite-dimensional spaces. However, we will use a di erent construction where the two sets involved are norm compact. Example 7.13. Let X be a separable Banach space and {en }∞ n=1 unit independent vecn tors that densely span X . De ne S := cl co{±e =2 } and S := {0}. Then both S1 and S2 1 n 2 P∞ are norm compact. Let v := ( n=1 (en =n2 )) ∈ X . Note that for any sequence of nonzero real numbers rk → 0, (rk v + S2 ) ∩ S1 = {rk v} ∩ S1 = ∅. Thus, 0 is an extremal point for (S1 ; S2 ). However, the extremal principle does not holds at 0 because N (S1 ; 0) = {0}.

738

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

Now we adapt Example 7.13 to show the open mapping theorem fails in in nitedimensional spaces. Example 7.14. Let X be any separable Banach space and {en }∞ n=1 unit independent vectors that P densely span X . De ne S1 := cl co{±en =2n } and S2 := {tv: t ∈ [−1; 1]} ∞ where v := ( n=1 (en =n2 )) ∈ X . Then both S1 and S2 are norm compact and S1 ∩ S2 = S1 ∩ (−S2 ) = {0}. De ne  x + S1 if x ∈ S2 ; F(x) := ∅ otherwise: It is easy to see that (0; 0) ∈ graph(F). Since span(S1 ) is dense in X we have N (graph(F); (0; 0)) ⊂ [{0} × S1 ]⊥ = X ∗ × {0}: Therefore, Ker @∗ F(0; 0) = {0}. It remains to show that F does not have an open covering property with linear rate around (0; 0). In fact, for any r¿0, [ [ v + S1 ] F(rBX ) =

∈[0; r=kvk]

P∞ does not contain any open ball around 0. To see this let u := ( n=1 (en =n3 )) and be an arbitrary positive number. Then u ∈ F(rBX ) implies that, for some ∈ [0; r=kvk], u − v ∈ S1 can only happen when u = v = 0. 7.3. Notes De nition 7.1 was given in [136, 137, 139] in nite-dimensional spaces and in [115, 147] in in nite-dimensional spaces. This limiting subdi erential has two nice features: it is contained in any derivate container [139, Theorem 2.3] and is minimal among limiting generalized derivative objects that are sequential upper semicontinuous [147, Theorem 9.7]. The limiting sum rule of Theorem 7.2 was derived in [140]. The limiting multiplier rule Theorem 7.3 is to be found in [32, 137, 139]. Similar results with quali cation conditions imposed so as to eliminate the singular subdi erentials can be found in many places. We refer to Clarke [45], Io e [98] and Loewen [129] for typical results and references. The limiting chain rule follows from Mordukhovich [138, Theorem 7]. The derivation of the limiting chain rule from the limiting multiplier rule follows the method in [215]. Calculus for limiting coderivatives, the extremal principle and the characterization for the open mapping property follow Mordukhovich [140, 141]. Various counterexamples were constructed in [38]. The reason that the limiting processes in nite-dimensional space does not generalize to in nite-dimensional spaces can be seen from the proof of Theorem 7.2. In an in nite-dimensional space the singular part, case 2, of that proof will produce several weakly convergent sequences that may converge to 0, which leads to a trivial conclusion. This diculty is dealt with primarily by two di erent types of additional assumptions: (i) Assume that the functions involved satisfy certain Lipschitz continuity conditions so as to eliminate the singular case. (ii) Assume additional

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

739

compactness conditions that ensure the weakly convergent sequences contains norm convergent subsequences. Type (i) conditions appear in most of the early literature on nonsmooth analysis (see [45, 99, 101, 114, 115, 139] and the references therein). Type (ii) conditions appeared rst in [28]. Many di erent conditions appeared in [14, 36, 86, 97, 100, 108, 109, 128, 150, 147, 148, 152, 154]. A thorough discussion of the relationships among those conditions can be found in [104]. 8. Relations to other generalized di erentials and extensions So far we have restricted our attention to Frechet subdi erentials. We now discuss generalizations and relations with the other many generalized subdi erentials. Our discussion focusses on those that are closely related to the smooth subdi erentials and, therefore, is not comprehensive. 8.1. Bornologically smooth subdi erential In in nite-dimensional Banach spaces, one often needs to consider di erential concepts other than Frechet di erentiability. Indeed, in the Banach space L∞ of essentially bounded functions no smooth norm of any form can be found and in C[0; 1] no equivalent norm is Frechet smooth. A convenient uni ed treatment of di erentiablity comes by considering bornology. Let X be a Banach space. A bornology of X is a family of closed bounded and centrally symmetric subsets of X whose union is X , which is closed under multiplication by scalars and is directed upwards (that is, the union of any two members of is contained in some member of ). We will denote by X ∗ the dual space of X endowed with the topology of uniform convergence on -sets. The most important bornologies are those formed by all (symmetric) bounded sets (the Frechet bornology, denoted by F), weak compact sets (the weak Hadamard bornology, denoted by WH ), compact sets (the Hadamard bornology, denoted by H ) and nite sets (the Gateaux bornology, denoted by G) (see [27, 72, 155] for details). Given a function f on X , we say that f is -di erentiable at x and has a -derivative ∇ f(x) if f(x) is nite and t −1 (f(x + tu) − f(x) − th∇ f(x); ui) → 0 as t → 0 uniformly in u ∈ V for every V ∈ . We say that a function f is -smooth at x if ∇ f : X → X ∗ is continuous in a neighbourhood of x. It is not hard to check that a convex function f is -smooth at x if and only if f is -di erentiable on a convex neighborhood of x. Now we can de ne -viscosity subdi erentials that generalize the Frechet subdi erential. Deÿnition 8.1. Let f: X → R be a lower semicontinuous function and f(x)¡+∞. We say f is -viscosity subdi erentiable and x∗ is a -viscosity subderivative of f at x if there exists a locally Lipschitz concave function g such that g is -smooth at x, ∇ g(x) = x∗ and f − g attains a local minimum at x. The -subdi erential of f at x is the set of all -viscosity subderivatives of f at x, denoted by D f(x).

740

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

The -subdi erential for 6= F is “larger” than the Frechet-subdi erential. One can also consider a subdi erential that is smaller than the Frechet-subdi erential. The s-Holder subdi erential (s ∈ (0; 1]) de ned by requiring g in De nition 8.1 to be of the form g(y) := f(x) + hx∗ ; y − xi − ky − xk1+s is a class of such subdi erentials. We denote by DH (s) the s-Holder subdi erential. When X is a Hilbert space Dp := DH (1) is the proximal subdi erential introduced by Rockafellar [168] and already exploited in Section 3.1. We now turn to product spaces. Let Xn ; n = Q1;N : : : ; N be Banach spaces and, for each n, let n be a bornology of Xn . Then { n=1 Vn : Vn ∈ n } is a bornology of QN the product space X := n=1 Xn . We call this bornology the product bornology of n ; n = 1; : : : ; N , and denote it by [ 1 ; : : : ; N ]. De nition 8.1 certainly applies to the product bornology for a product of several Banach spaces. However, in product spaces a decoupled viscosity subdi erential introduced in [214] is often convenient. Let Xn ; n = 1; : : : ; N , be Banach spaces with QN bornology n and let f : X := n=1 Xn → R be a lower semicontinuous function and f(x)¡+∞. We say x∗ = (x1∗ ; : : : ; xN∗ ) is a decoupled [ 1 ; : : : ; N ]-viscosity subderivative functions gn : Xn → R of f at x = (x1 ; : : : ; xN ) if there exist (concave) locally Lipschitz PN such that gn is n -smooth at xn , ∇ n gn (xn ) = xn∗ and f − n=1 gn attains a local minimum at x. It turns out that with the concavity requirement on the osculating function any [ 1 ; : : : ; N ]-viscosity subderivative is automatically decoupled. In fact, let support of f at (x1 ; : : : ; xN ). De ne g(y1 ; : : : ; yN ) be a concave [ 1 ; : : : ; N ]-smooth P N gn (yn ) := g(x1 ; : : : ; xn−1 ; yn ; xn+1 ; : : : ; xN )=N . Then n=1 gn is a decoupled support function of f at (x1 ; : : : ; xN ) with the same [ 1 ; : : : ; N ]-derivative. In contrast, without the concave Lipschitz requirement the above result fails. This again shows the virtue of requiring the osculating function in the de nition of the smooth subdi erential to be concave Lipschitz. With a little abuse of notation we will use -subdi erential to refer to the various subdi erentials discussed in this section. Since the smooth variational principle is our main tool, which remains valid in a Banach space with a -smooth Lipschitz bump function [72, 124], most of the results and their proofs discussed so far remain valid when DF is replaced by D in Banach spaces with -smooth Lipschitz bump functions. One exception is the results in Section 3 where the proof methods crucially depend on the geometry of the space X . Those results hold for s-Holder subdi erentials in superre exive Banach spaces, in particular for proximal subdi erentials in Hilbert spaces. However, whether they can be extended to general -smooth spaces for ⊂ F remains open. 8.2. Limiting subdi erential in smooth spaces The smooth subdi erentials are also closely related to several classical generalized derivatives. We have seen in Section 7 that the Frechet subdi erential naturally induces limiting subdi erentials and a limiting normal cone. Two other classical generalized

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

741

derivatives, the Clarke generalized gradient and the G-subdi erential, and their related normal cones and singular subdi erentials can also be recaptured by this limiting process. For a Lipschitz function the de nition of the Clarke generalized gradient has been given in the introduction. Following the scheme in [45] one can also extend its de nition to extended real-valued lower semicontinuous functions. We recall a related de nition rst. Deÿnition 8.2. A vector x∗ is a Clarke normal to S at x if for any ¿0 and any nite-dimensional subspace L ⊂ X there are ¿0 and u∗ ∈ X ∗ such that |hu∗ − x∗ ; hi| ≤ khk;

∀h ∈ L and

d◦ (S; x; h) ≥ hu∗ ; hi;

∀h ∈ X:

The collection of all Clarke normals to S at x is a convex weak-star closed cone denoted Nc (S; x). A vector x∗ is a G-normal to S at x if there is a ¿0 such that for any ¿0 and any nite-dimensional subspace L ⊂ X there are u ∈ X with ku − xk ≤  and u∗ ∈ X ∗ such that |hu∗ − x∗ ; hi| ≤ khk;

∀h ∈ L and

d− d(S; u; h) ≥ hu∗ ; hi;

∀h ∈ L;

where d− d(Sx; h) is the lower Dini directional derivative of d(S; ·) at x: d− d(S; x; h) = lim inf + t −1 (d(S; x + te) − d(S; x)): e→h; t→0

The collection of all G-normals to S at x form a cone denoted Ng (S; x). Let f be a lower semi-continuous function on X which is nite at x. The generalized gradient of Clarke and the G-subdi erential of f at x are de ned as follows @c f(x) = {x∗ ∈ X ∗ : (x∗ ; −1) ∈ Nc (epif; (x; f(x))}; @g f(x) = {x∗ ∈ X ∗ : (x∗ ; −1) ∈ Ng (epif; (x; f(x))}: The singular C-subdi erential and the singular G-subdi erential of f at x are de ned by the relations @c∞ f(x) = {x∗ ∈ X ∗ : (x∗ ; 0) ∈ Nc (epif; (x; f(x))}; and @g∞ f(x) = {x∗ ∈ X ∗ : (x∗ ; 0) ∈ Ng (epif; (x; f(x))}: Remark 8.3. The (regular) Clarke subgradient is de ned in a di erent way from that in De nition 1.5. Fortunately they coincide for Lipschitz functions. The connections between these two generalized derivatives and the Frechet subdifferential are given in the following theorem.

742

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

Theorem 8.4. Let X be a Frechet smooth Banach space; let S be a closed subset of X and let f : X → R be a lower semicontinuous function. Then; for any x ∈ S; Ng (S; x) = cl∗ N (S; x) ∗

@g f(x) = cl @f(x)

Nc (S; x) = cl co∗ N (S; x);

and and





(37)

@c f(x) = cl [@f(x) + @ f(x)];

(38)

@c∞ f(x) = cl∗ [@ ∞ f(x)]:

(39)

and @g∞ f(x) = cl∗ @ ∞ f(x)

and

Remark 8.5. Theorem 8.4 is the Frechet subdi erential version of the general results in [20]. In fact; when X is a -smooth Banach space (with being any bornology or H (s); s ∈ (0; 1]) relations similar to Eqs. (37) and (38) in Theorem 8.4 hold with NF and DF replaced by the corresponding N and D (see [20, 37]). Whether the singular subdi erential relation (39) holds in more general -smooth Banach spaces is unknown. For Lipschitz functions, Preiss [159] has elaborated the following marvelous result. In any -smooth space @c f(x) = cl∗ co{w∗ − lim ∇ f(xn ) : xn → x}:

(40)

In nite-dimensional space this was actually Clarke’s original de nition. Thus, in any re exive space, Eq. (40) holds for = F and in any separable or weakly compactly generated spaces, Eq. (40) holds for = G and = H . There is an attractive fuzzy mean value inequality equivalent to Preiss’s result [159]. 8.3. Partially smooth subdi erential All the results discussed so far need assumptions that the underlying spaces have adequate smoothness properties. How do we handle problems that naturally arise outside smooth-Banach spaces (e.g. optimal control problems that naturally lead to abstract optimization problems in L∞ )? A useful observation is that although many problems inevitably lie in large (nonsmooth or non-Frechet smooth) spaces, X , in some cases the “target” set may be signi cantly smaller and so lie in a much more richly renormable space, Y . For example, in most contexts, existence results in control theory will require some measure of weak compactness of an associated lower level set, S. This set perforce lies in a weakly compactly generated and so weak-Hadamard smoothable subspace Y , and it is often the case that only variations in that subspace need be examined. In such settings the partially smooth subdi erential introduced in [33] is useful. The key construction is as follows. Deÿnition 8.6. Let X be a Banach space with a closed subspace Y . Let f be a lower semicontinuous function on X and k¿0. A vector v∗ ∈ X ∗ is a Y ; k subgradient to f at x if there is a concave function g, with local Lipschitz constant k, such that 1. g is Y smooth,

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

743

2. v∗ ∈ @(−g)(x), and 3. f − g has a local minimum at x, where @ signi es the convex subgradient. We call the set of all such Y ; k subgradients to f at x the S Y ; k subdi erential of f at x, denoted by DY ; k f(x) and we denote by DY f(x) = k¿0 DY ; k f(x) the Y subdi erential of f at x. The Y subdi erential de ned above coincide with the usual viscosity subdi erential when restricted to the subspace Y . It is de ned by using an osculating function and, therefore, is suited to variational arguments. However, outside the subspace Y , we have not heavily restricted the support function. Thus, this subdi erential could be much larger than the usual generalized subderivatives. The larger a subdi erential the more inaccurately it re ects the local behavior of a function. Therefore, it is desirable to further restrict the subdi erential relative to Y ⊥ . One way to do this is to restrict the subdi erential to be contained in a generalized subdi erential that has a reasonable calculus. Recall the following general subdi erential concept. A multifunction @∗ is a subdi erential if it has the following three properties: 1. 0 ∈ @∗ f(x) if x is a local minimum of f, 2. @∗ g(x) coincides with the subgradient in convex analysis when g is convex and Lipschitz around x, and 3. 0 ∈ @∗ f(x) + @g(x) when f + g attains a local minimum at x and g is convex and Lipschitz around x and @ is the usual convex subdi erential. We de ne the ∗Y subdi erential as follows: Deÿnition 8.7. Let Y ⊂ X be a closed subspace of X with bornology and let @∗ be a subdi erential. Let f be a lower semicontinuous function and f(x)¡+∞. We de ne the partial k; -viscosity subdi erential and partial -viscosity subdi erential of f with respect to Y at x to be @∗Y ; k f(x) := @∗ f(x) ∩ DY ; k f(x) and @∗Y f(x) := @∗ f(x) ∩ DY f(x): Note that D := @∗X is the original -viscosity subdi erential and D ; k := @∗X ; k is the -viscosity subdi erential of rank k de ned in [20] in which the notation is @ k while @∗0 = @∗ is the original ∗-subdi erential. Also, by the de nition, we always have @∗Y f(x) ⊂ @∗ f(x). Thus, results in terms of @∗Y are more accurate than those in terms of @∗ .

744

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

It was proven in [33] that all the basic results in Section 2 have corresponding versions in terms of the partially smooth subdi erential. Thus, results in Sections 2, 4, 5 and 6 can also be generalized to this setting. Moreover, the limiting characterization for the Clarke generalized gradient and the G-subdi erential discussed in the previous section can be appropriately extended to this setting. We refer to [33, Section 10] for details. 8.4. Notes We emphasize again that we have restricted ourselves to those generalized derivative constructions that are closely related to the viscosity subdi erentials and where variational arguments play a crucial role. There are many other constructions. We list some of the references in several classes. Warga’s derivate container is based on the idea of uniformly approximating nonsmooth functions by smooth functions. The original work is to be found in [200,202,205]. Recent re nements and extensions to multifunctions due to Sussmann can be found in [180–182]. Such generalized derivatives are particularly useful in deriving necessary optimality conditions for control and di erential inclusion systems (see [203, 204, 206, 207, 216]) and open mapping and covering theorems that are intrinsically related to more topological methods (see [180, 193, 202, 204, 205]). Generalized derivatives for functions and multifunctions based on various tangent cones can be found in Aubin and Frankowska’s monograph [4]. One drawback of the limiting subdi erential, G-subdi erential and Clarke generalized gradient is that they do not necessarily coincide with the Frechet derivative when the later exists. Many researchers have explored the possibility of constructing generalized derivative objects that do coincide with the Frechet derivative at di erentiable points of functions. Michel and Penot’s construction [135] was the rst of this sort. Treiman introduced a smaller and not necessarily convex B-subdi erential. The de nition, calculus and applications of the B-subdi erential is to be found in [190–192]. Sussmann’s multidi erential [180, 183] also has this feature: at a Frechet di erentiable point of a function there exists a multidi erential that coincide with the Frechet derivative. Abstract de nitions of subdi erentials have attracted much attention recently. Here one de nes an abstract subdi erential operator by specifying its key properties. The core properties are always (a) coincidence with the convex subdi erential for convex functions, (b) containing 0 at a local minimum of the function (critical points), and (c) a basic sum rule (calculus). Several di erent versions of such abstract subdi erential operators were proposed in [6, 64, 185, 186]. The abstract subdi erential operator @∗ used in this section is a slight modi cation of these abstract de nitions and was used in [33]. The limiting characterization of the Clarke generalized gradient and the G-subdi erential discussed in this section is the result of a long chain of e orts by many researchers. The limiting characterization of the normal cone in terms of the proximal normal vectors can be traced back to Clarke [39,40,42]. Proximal subdi erential characterizations of the Clarke generalized gradient and singular Clarke generalized gradient are given by Rockafellar [168] in nite-dimensional spaces. These characterizations

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

745

were extended to in nite-dimensional spaces by Treiman via the -Frechet normal and subdi erentials in [187, 189] and by Borwein and Strojwas in smooth spaces in terms of proximal normals [30] (see also [126]). The limiting characterizations discussed here are based on Borwein and Io e [20] and Borwein et al. [33]. 9. Integration The question here is: does the (Frechet) subdi erential determine a function up to a constant? Corollary 4.4 gives a simple positive result – on a connected open set U , DF f(x) ⊂ {0} implies that f is a constant. Sadly, as we are using multifunctions, this result does not imply that two functions with the same subdi erential di er only by a constant, in contrast to the case in smooth calculus. Example 9.1. Let f(x) := −[0;1] (x)   (−∞; 0]; DF f(x) = DF g(x) = [0; ∞);  0;

and g(x) := 2f(x). Then x = 0; x = 1; elsewhere;

but f(x) − g(x) = [0;1] (x) is not a constant. In Example 9.1 the problem occurs at the point where the functions are discontinuous. So it is natural to ask whether we can get a positive result if in addition we demand that the functions f and g are continuous. The following example, which was originally designed to show that the Newton–Leibniz formula (Fundamental Theorem of Calculus) fails without an absolute continuity assumption, shows that the answer is still negative. Example 9.2. Let C be the Cantor ternary set on [0; 1] consisting of every ternary decimal involving only 0 and 2 in its expression (cf. [85, pp. 95–98]). As C is closed, [0; 1]\C is the union of denumerable disjoint open intervals. We write ∞ [ (ak ; bk ): [0; 1]\C := k=1

Consider the classical Cantor ternary function h : C → [0; 1] de ned as follows: h(x) :=

∞ X xn 2n+1 n=0

where xn is the nth digit of the ternary decimal expression x = 0: x1 x2 : : : of x. As, for each k, ak and bk must have “dual” ternary expressions ak = 0:c1 c2 : : : cn 0222 : : :

and

bk = 0:c1 c2 : : : cn 2000 : : : ;

we can check that h(ak ) = h(bk ). Thus, we can extend h to [0; 1] by de ning h(x) := h(ak ) = h(bk );

∀x ∈ (ak ; bk ):

746

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

We further extend h to R by setting h(x) := 0; x¡0 and h(x) := 1; x¿1. It is well known that h is continuous and non-decreasing. Moreover  x ∈ C\({bk }∪ {0});   ∅; DF h(x) = [0; ∞); x ∈ ({bk }∪{0});   0; x ∈ R\C: In fact, for any x ∈ C\({bk } ∪ {0}), its ternary expression x = 0: x1 x2 : : : contains in nitely many 2’s. If  ∈ DF h(x) 6= ∅ then h(y) − h(x) + o(|y − x|) ≥ (y − x) for y suciently close to x. Consider y m = 0:y1 y2 : : : such that yi = xi if i 6= m and ym = 0. Then y m ≤ x and y m converges to x. Substituting y m into the aforementioned inequality leads to x  xm xm m − m+1 + o m ≥ − m : 2 3 3 This is absurd since there are in nitely many xm = 2. Therefore DF h(x) = ∅. If x ∈ ({bk } ∪ {0}), say x = bk , then x has a nite ternary decimal expression x = bk = 0:x1 x2 : : : xm . For any integer p¿m and y = 0: y1 y2 : : : ∈ (x; x + 2−p ), we must have yi = xi ; i = 1; 2; : : : ; m and ym+1 = · · · = yp = 0. Thus, h(y) − h(x) ≥

∞ X n=p+1

yn : 2n+1

Therefore,

P∞ n+1 ) 3p h(y) − h(x) n=p+1 (yn =2 ≥ P∞ ≥ p+1 : n y−x 2 n=p+1 (yn =3 )

Set  := min{bk − ak ; 2−p }. Then, for any y ∈ (bk − ; bk + ) and  ∈ [0; 3p =2p+1 ], observing that h is a constant on (ak ; bk ), we have h(y) − h(x) ≥ (y − x) and hence DF h(x) ⊃ [0; 3p =2p+1 ]. As p can be taken arbitrarily large, we obtain DF h(x) = [0; ∞). It is obvious that DF h(x) = 0 when x ∈ R\C. Now f = 2h and g = h have the identical subdi erentials everywhere yet their di erence f − g = h is not a constant. Remark 9.3. Replacing DF with Dp and o(| · |) with | · |2 for some positive constant ; both Example 9.1 and Example 9.2 as well as their arguments remain valid. In particular; the proximal version of Example 9.2 gives a negative answer to the following question in [129]: Given two continuous functions f and g; satisfying f(0) = g(0) = 0 and Dp f(x) = Dp g(x) for all x ∈ Rn ; must f = g? Both examples essentially use the fact that functions involved have subdi erentials with either 0 or a half-line as their values. Consider functions with everywhere bounded

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

747

subdi erentials which by Theorem 4.3 are Lipschitz. When X is a separable Banach space, observing that Gateaux subderivatives are derivatives when the later exist, by the in nite-dimensional extension of the Radamacher theorem (see [22–24]) we deduce that if two functions have the same Gateaux subdi erential everywhere then their (Gateaux) derivatives exist and are equal almost everywhere. Thus, the Newton–Leibniz formula (Fundamental theorem of calculus) and the Fubini theorem lead to the following positive result: Theorem 9.4 (Fundamental theorem of calculus). Let f and g be Lipschitz functions on a separable Banach space X . Then f − g is a constant if and only if the Gateaux (Hadamard ) subdi erentials of f and g coincide everywhere. Remark 9.5. It is obvious that we can replace the Gateaux subdi erential by the Frechet subdi erential in the nite-dimensional version of Theorem 9.4. Indeed for Lipschitz functions Gateaux and Hadamard subderivatives coincide while in nitedimensions the Hadamard and Frechet bornologies themselves coincide. However; explicit examples in [19] show that Theorem 9.4 fails even in nite-dimensional space when the Gateaux subdi erential in the theorem is replaced by the proximal subdifferential. 9.1. Notes The examples in this section are taken from [37]. When the functions involved are not Lipschitz one can impose additional assumptions to ensure the integrability of the subdi erentials. There are essentially two types of such conditions : (i) conditions on functions [158, 161, 170, 186] and (ii) conditions on the subdi erentials [15, 209]. 10. Applications to functional analysis Linear functional analysis is a special case of the more general analysis of convex multifunctions. Taking this point of view and using variational arguments will lead to signi cant simpli cations of the proofs for many classical results. We illustrate by given several examples. A multifunction F : X → 2Y is called a closed convex multifunction if the graph S of F is a closed convex set. Recall that a set S ⊂ X is absorbing provided that X = ¿0 S and a point s is in the core of S (denoted by s ∈ core S) provided that S − s is absorbing. Theorem 10.1 (Open mapping theorem). Let F : X → 2Y be a closed convex multifunction. Suppose that y0 ∈ core F(X ). Then F is open at y0 ; that is to say; for any x0 ∈ F −1 (y0 ) and any ¿0; y0 ∈ int F(x0 + BX ): Proof. Let T : X × Y → Y be a linear operator de ned by T (x; y) = y and let A := Graph F. It is plain that we need only to show that T |A is open at (x0 ; y0 ). Since T (A − (x0 ; y0 )) = F(X ) − y0

748

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

is absorbing and A is convex, a standard category argument implies that there exists ¿0 such that BY ⊂ cl T ((A − (x0 ; y0 )) ∩ BX ×Y ):

(41)

We show that T (x0 ; y0 ) + (=2)BY ⊂ T (((x0 ; y0 ) + BX × Y ) ∩ A). Let z ∈ T (x0 ; y0 ) + (=2)BY and set h(x; y) := kT (x; y) − zk. Applying the convex multidirectional mean value inequality of Theorem 2.13 which holds in an arbitrary Banach space to function h, set Y := ((x0 ; y0 )+BX × Y ) ∩ A and point (x0 ; y0 ) yields that there exist u ∈ ((x0 ; y0 )+ BX × Y ) ∩ A and u∗ ∈ @h(u) such that inf h − h(x0 ; y0 ) − =4 ≤ hu∗ ; (x; y) − (x0 ; y0 )i; Y

∀x ∈ Y:

(42)

If h(u) = 0 then T (u) = z and we are done. Otherwise u∗ = T ∗ y∗ with y∗ ∈ @k·k(T (u)− z) being a unit vector. Then we can rewrite Eq. (42) as 0 ≤ inf h ≤ h(x0 ; y0 ) + =4 + hy∗ ; T ((x; y) − (x0 ; y0 ))i Y

≤ =2 + =4 + hy∗ ; T ((x; y) − (x0 ; y0 ))i;

∀(x; y) ∈ ((x0 ; y0 ) + BX ×Y ) ∩ A:

Observing that BY ⊂ cl T ((A − (x0 ; y0 )) ∩ BX × Y ) the in mum of the right hand side of the above inequality is −=4, a contradiction. Theorem 10.2 (Boundedness of convex functions). Let f : X → R be a lower semicontinuous convex function. Then f is continuous at any point in the core of its domain. Thus f is everywhere continuous if and only if f is nite everywhere. Proof. We need only to prove the rst assertion: F(x) := f(x) + [0; +∞): Then F and F −1 are closed convex multifunctions because graph F := epi f is a closed convex set. Let x ∈ core (dom f) = core F −1 (R). By the Open Mapping Theorem 10.1, F −1 is open at x. Now consider any open interval (a; b) that contains f(x). The lower semicontinuity of f implies that {x: f(x) ≤ a} is closed. Thus, x is in the interior of f−1 ((a; b)) = F −1 ((a; b))\{x : f(x) ≤ a}: Therefore, f is continuous at x. Theorem 10.3 (Principle of uniform boundedness). Let A be a set of bounded linear operators from X to Y such that for each x ∈ X; sup{kAxk: A ∈ A}¡+ ∞. Then sup{kAk: A ∈ A}¡+ ∞. Proof. De ne f(x) := sup{kAxk: A ∈ A}: Then it is easy to verify that f is a lower semicontinuous convex function, as a supremum of convex continuous functions. Since f(x)¡+∞ for all x ∈ X , by Theorem 10.2,

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

749

f is continuous. In particular, there exists a ¿0 such that sup{f(x): x ∈ BX }¡∞. Then sup{kAk: A ∈ A} = sup{kAxk: A ∈ A; x ∈ BX } 1 1 = sup{kAxk: A ∈ A; x ∈ BX } = sup{f(x): x ∈ BX }¡+ ∞:   ∗

We now turn to monotone operators. A multifunction F : X → 2X is called a monotone operator provided hy∗ − x∗ ; y − xi ≥ 0 for any pairs (x; x∗ ) and (y; y∗ ) in the graph of F. The subgradient of a convex function provides a central example. We say F is locally bounded at x ∈ dom F if there exist M ¿0 and ¿0 such that ky∗ k ≤ M whenever y ∈ (x + BX ) ∩ dom F and y∗ ∈ F(y). It is often possible to derive information about monotone operators from variational or convex analysis as the next result illustrates. ∗

Theorem 10.4 (Boundedness of monotone operators). Let F : X → 2X be a monotone operator. Suppose that x ∈ core (dom F). Then F is locally bounded at x. Proof. By choosing any x∗ ∈ F(x) and replacing F by the monotone operator y → F(y + x) − x∗ , we lose no generality in assuming that x = 0 and that 0 ∈ F(0). De ne, for x ∈ X , f(x) := sup{hy∗ ; x − yi: y ∈ dom F; kyk ≤ 1 and y∗ ∈ F(y)}: As the supremum of ane continuous functions, f is convex and lower semicontinuous. We show that dom f is an absorbing set. First since 0 ∈ F(0), we must have f ≥ 0. Second, whenever y ∈ dom F and y∗ ∈ F(y), monotonicity implies that 0 ≤ hy∗ −0; y − 0i, so f(0) ≤ 0. Thus, f(0) = 0. Suppose x ∈ X . By hypothesis, dom F is absorbing so there exists t¿0 such that F(t x) 6= ∅. Choose any element u∗ ∈ F(tx). If y ∈ dom F and y∗ ∈ F(y), then by monotonicity hy∗ ; t x − yi ≤ hu∗ ; t x − yi: Consequently, f(t x) ≤ sup{hu∗ ; t x − yi: y ∈ dom F; kyk ≤ 1}¡hu∗ ; t xi + ku∗ k¡+∞: By virtue of Theorem 10.2, f is continuous at 0 and hence there exists ¿0 such that f(x)¡1 for all x ∈ 2BX . Equivalently, if x ∈ 2BX , then hy∗ ; xi ≤ hy∗ ; yi+1 whenever y ∈ dom F, kyk ≤ 1 and y∗ ∈ F(y). Thus, if y ∈ BX ∩ dom F and y∗ ∈ F(y), then 2ky∗ k = sup{hy∗ ; xi: x ∈ 2BX } ≤ ky∗ k · kyk + 1 ≤ ky∗ k + 1; so ky∗ k ≤ 1=.

750

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

Remark 10.5. (a) It is well known that the (convex) subdi erential of a lower semicontinuous convex function f is monotone. Thus; Theorems 10.4 and 4.3 imply that in the core of the domain of the subdi erential of f; f is in fact locally Lipschitz. (b) Note that Theorem 10.4 does not require that the domain of F be convex. There are trivial examples which show that 0 can be an absorbing point of dom F but not an interior point (see [155]). The best development is to be found in [174]. We now use the fuzzy multiplier rule to prove a simple version of the minimax theorem. Theorem 10.6 (Minimax theorem). Let X and Y be re exive Banach spaces and let C ⊂ X and D ⊂ Y be compact sets. Suppose that g : C × D → R is convex and lower semicontinuous on C and concave and upper semicontinuous on D. Then there exists a saddle point (x; y)  ∈ C × D such that; for any x ∈ C and any y ∈ D; g(x; y)  ≥ g(x; y)  ≥ g(x; y): Proof. Let p := min{r | g(x; y) ≤ r; for all y ∈ D} x∈C

and let ¿0. Then \ {x: g(x; y) ≤ p − } = ∅: C∩ y∈D

Since C is compact there exists a nite set D ⊂ D such that \ {x: g(x; y) ≤ p − } = ∅: C∩ y∈D

That is to say the minimum value of the minimization problem G minimize subject to

r g(x; y) − r ≤ 0;

y ∈ D

C (x) ≤ 0; is no less than p − . Let x be a solution to problem G which exists by compactness. Then, for any ¿0 and any weak neighborhood U of 0 in X ∗ , applying Theorem 3.1 we have that there exists x0 ; xy ∈ x + BX ; y ∈ D and y ≥ 0; y ∈ D such that X y ({−1} × DF g(xy ; y)) + {0} × NF (C; x0 ) + (−; ) × U: 0 ∈ (1; 0) + y∈D

For convex functions g(x; y); y ∈ D , the Frechet subdi erential and normal cone coincide with the convex subderivative @ and convex normal cone when exist. Moreover, the convex subderivative and normal cone are upper semicontinuous multifunctions.

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

751

Finally, without loss of generality, P we can assume that y → y as  → 0 for y ∈ D . Thus, upon taking limits, we have y∈D y = 1 and X y @g(x ; y) + @C (x ): 0∈ y∈D

By the easy part of the sum rule for convex functions we may write the last inclusion as   X y g(·; y) + C  (x ): 0∈@ y∈D

That is to say the convex function X y g(·; y) + C y∈D

attains a minimum at x which is no less than p − . De ne y := Since g is concave in y we have, for any x ∈ C, X X y g(x; y) ≥ y g(x ; y) ≥ p − : g(x; y ) ≥ y∈D

P y∈D

y y ∈ D.

y∈D

Since D is compact; passing to a subsequence if necessary we may assume that y → y ∈ D. Taking limits in the inequality g(x; y ) ≥ p − ; ∀x ∈ C and appealing to the upper semicontinuity of g in y, yields g(x; y)  ≥ p for all x ∈ C. Similar arguments applying to −g yields an element x ∈ C such that g(x; y) ≤ p for all y ∈ D. Then (x; y)  is the saddle point. Remark 10.7. The proof given above is technically easy but conceptually less clean. In contrast; a technically hard but conceptually easy proof is outlined below. Assume now that g(x; ·) is continuous and consider p := min{r | g(x; y) ≤ r; for all y ∈ D}: x∈C

1: We have an abstract optimization problem with G(x; r)(y) := g(x; y) − r viewed as an abstract convex mapping from C × R into the continuous functions on D with the nonnegative cone. A Lagrange multiplier exists since the Slater condition applies. 2: By the Riesz representation theorem the Lagrange multiplier is a non-negative measure . Therefore; we can write Z (g(x; y) − r)(dy) + r ≥ p for all x ∈ C: D

Since r is a free variable; we must have a probability measure and Z g(x; y)(dy) ≥ p for all x ∈ C: D

752

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

3: Now observe that since D is compact and convex; the barycentre b := exists as a weak integral: for each  ∈ Y ∗ Z h; b i = h; yid:

R D

y d

D

(This follows from the Hahn–Banach Theorem and a compactness argument and can be found; for example; in Rudin [173]). 4: Now use Jensen’s inequality for integrals to observe that Z g(x; b ) ≥ g(x; y)(dy) ≥ p for all x ∈ C: D

We conclude this section with a nice application to distance functions along the lines of the analysis in [16]. We recall that a norm is Kadec-Klee (sequentially) if the weak and norm topologies coincide (sequentially) on the boundary of the unit ball. Let W be a given weakly compact set. We de ne a Hadamard(W) bornology by H (W ) := {S: S is compact ∪ W }. We will also slightly abuse the notation by denoting @∗YH (C; x) f(x) := {x∗ : x∗ ∈ @∗ f(x) ∩ (−@g(x))} where Y is the closed span of C ∪ {x}, g is a convex Lipschitz function that is uniformly di erentiable at x on conv(C; x) and that f + g attains a local minimum at x. Theorem 10.8. Let C be a closed relatively weakly compact subset of a Banach space. Let Y be the closed span of C. Suppose that the norm is Kadec-Klee on X . Then (i) The set of points in X at which every minimizing sequence clusters to a best approximation is dense in X . (ii) If in addition; the original norm is Frechet on Y then @∗YH (C; x) d(C; x) ⊂ @∗YF d(C; PC (x)) where PC (x) is the (set of) best approximations of x on C. In particular; in any Frechet smooth locally uniformly rotund norm (see e.g. [155]) on a re exive space; this holds for all sets in the Frechet sense with a single-valued metric projection. Proof. We know, from Section 3 of [33], that Y has a H (C) renorm. After using [33, Lemma 3.6], the proof parallels that in [16], and we deduce that at any of the dense set of points with @∗YH (C; x) d(C; x) nonempty, all minimizing sequences actually converge in norm to a best approximation, b; and the corresponding subgradient provides a proximal normal to C at b. Finally, when the norm on Y is H (C)-smooth, simple derivative estimates show that any member of @∗YH (C; x) d(C; x) must actually lie in @∗YH (C) d(C; PC (x)): A similar result holds for C only boundedly relatively weakly compact. Note that this result allows us to show that normal cone de ned in terms of distance functions is always contained in the normal cone de ned in terms of indicator functions. Note

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

753

also that in Hilbert space we may derive DF d(C; x) ⊂ Dp d(C; PC (x)): 10.1. Notes The open mapping Theorem for convex multifunctions was proved rst by Robinson [163, 164] and Ursescu [194]. The current form was taken from [13] and the proof is drawn from [117]. The continuity of convex functions and the principle of uniform boundednes are classical results. The short proofs show that the open mapping Theorem for convex multifunctions is quite general. The boundedness result for monotone operators and its proof follow Phelps’ book [155]. Theorem 10.8 appears in [16]. The uni ed treatment largely follows the outline in [12, 13]. 11. Sensitivity analysis The basic pattern of sensitivity analysis is well illustrated by the following example taken from Clarke [47]. Consider the optimization problem Pa of minimizing f(x) subject to h(x) = a and again de ne the optimal value or marginal function v(a) := {f(x): h(x) = a}. Then it is not hard to see that, for any x; v(h(x)) ≤ f(x). On the other hand, if x is a solution to P0 then v(h(x)) = f(x). Thus, x is a minimum point for the function, x → f(x) − v(h(x)): Assuming all the functions involved are smooth then f0 (x) − v0 (0)h0 (x) = 0: In other words, −v0 (0) is a Lagrange multiplier (shadow price) of the problem P0 . We have seen in Example 1.2 that v is rarely a smooth function. Therefore, the above argument will not apply in general. Nevertheless the general pattern does persist and it turns out that the subdi erential provides a convenient language to describe it. 11.1. Sensitivity for nonsmooth optimal value functions  i = 0; 1; : : : ; N and b = (b1 ; : : : ; Let X be a re exive Banach space. Let fi : X → R; bN ) ∈ RN . Consider the following family of optimization problems: Pb

minimize

f0 (x)

subject to

fi (x) ≤ bi ;

i = 1; 2; : : : ; M;

fi (x) = bi ;

i = M + 1; : : : ; N:

We denote the optimal value of this family of problems as a function of b by v(b). Motivated by Theorem 3.3 we de ne the multiplier set of problem Pb as follows.

754

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

Deÿnition 11.1. Let  be a positive number and let U be a weak neighborhood of 0 in X ∗ . We say  = (1 ; : : : ; N ) is a multiplier of problem Pb corresponding to (x; ; U ) if i ; i = 1; : : : ; N; are not all 0; i i ≥ 0 and there exist (xi ; fi (xi )) ∈ (x; fi (x))+ BX ×R ; i = 0; 1; : : : ; N; such that 0 ∈ DF f0 (x0 ) +

N X

i i DF (i fi )(xi ) + U:

i=1

We denote the set of all such multipliers by M;U (x): Theorem 11.2 (Sensitivity). Let xa be a solution to problem Pa . Then; for any ¿0 and any weak neighborhood U of 0 in X ∗ ; −DF v(a) ⊂ M; U (xa ) + BRN : Proof. There is nothing to prove if DF v(a) = ∅. Let  ∈ −DF v(a) 6= ∅. Then there exists a Frechet smooth function g such that v + g attains a local minimum 0 at a and ∇g(a) = . Note that for any x satisfying the constraint, fi (x) ≤ bi ; i = 1; : : : ; M and fi (x) = bi ; i = M + 1; : : : ; N , we have f0 (x) ≥ v(b) so that f0 (x) + g(b) ≥ v(b) + g(b) ≥ v(a) + g(a) = f0 (xa ) + g(a): Thus, (xa ; a) is a solution to the minimization problem minimize

f0 (x) + g(b)

subject to

fi (x) − bi ≤ 0;

i = 1; 2; : : : ; M;

fi (x) − bi = 0;

i = M + 1; : : : ; N:

0

Choose  ¿0 smaller than =2 such that a0 ∈ 0 BRN implies that k∇g(a0 )−∇g(a)k¡=2. Note that U × 0 BRN is a weak neighborhood in X ∗ × RN . By Theorem 3.3, there exists xi ∈ xa + 0 BX ⊂ xa + BX and a0 ∈ a + 0 BRN such that 0 ∈ DF f0 (x0 ) × {∇g(a0 )} +

N X

i [DF (i fi )(xi ) × (−i ei )] + U × 0 BRN ;

i=1

where {ei }Ni= 1 is the standard base of RN . We can rewrite this relation as 0 ∈ DF f0 (x0 ) +

N X

i DF (i fi )(xi ) + W

i=1

and 0 ∈ ∇g(a0 ) −  + 0 BRN : That is  ∈ M; U (xa ) + BRN . Theorem 11.2 and its proof show that the key in discussing sensitivity is to have an appropriate necessary condition for the corresponding optimization problem. Using

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

755

the necessary condition in Section 3.3 we can similarly discuss sensitivity analysis for optimization problems with in nitely many constraints. Moreover, since our sensitivity results are expressed in a fuzzy form for the subdi erential, a variational argument would enable us to omit the assumption that the problem Pa has a solution. And a natural limiting process leads to sensitivity results in terms of the limiting subdi erentials. Details reside in [34]. Note that to calculate the value of v(b) one has to solve an optimization problem which usually is costly. By contrast the multiplier set is de ned through the original data and is easier to get hold of. Hence, Theorem 11.2 provides a useful estimate for the value function v. We illustrate how to use this theorem by revisiting Example 1.2. Since both f and g in Example 1.2 are C 1 , taking limits in Theorem 11.2 we have DF v(a) ⊂ {: f0 (x) − g0 (x) = 0; g(x) = a}: Thus, for any a with g(x) = a implies g0 (x) 6= 0 DF v is bounded in a neighborhood of a. By the Lipschitz characterization in Section 4.2 v is Lipschitz in a neighborhood of such √ points. Setting g0 (x) = 6 cos(6x)−3 = 0 we nd x = ±=18 or a = ±( 3=2−=6) to be possible discontinuous√points for v. Moreover, √ √ when g(x) fall√in any of the three open intervals (−=2; =6 − 3=2), (=6 − 3=2; 3=2 − =6) and ( 3=2 − =6; ), g0 (x) 6= 0. Therefore, for any such a the corresponding DF v(a) is of the form f0 (x)=g0 (x) with g(x) = a invertable. Thus, v is C 1 on each of these open intervals. One can easily verify these facts from Fig. 1. 11.2. Notes Early works on sensitivity analysis for nonsmooth mathematical programming problems can be found in [5, 45, 83, 84, 169]. Of course, in a linear or smooth setting the literature is vaster and older. Recent extensions can be found in [141 – 143, 150, 196–198]. The sensitivity result given here for constrained optimization problems with lower semicontinuous inequality constraints and continuous equality constraints follows Borwein et al. [34]. There is a considerable literature on sensitivity analysis for classes of optimal control problems that cannot be adequately reformulated as abstract optimization problems. For those optimal control problems appropriate maximum principles ll the role of necessary conditions. Research along this vein started in [46]. Various extensions and re nements can be found in [37, 56, 57, 62, 127, 210, 212].

12. Applications to eigenvalues Eigenvalue optimization problems arise throughout numerical analysis and in many engineering design problems (see survey [122]). Since these eigenvalues vary nonsmoothly with the matrix, the subdi erential is a natural tool in the analysis. In this section we discuss applications of subdi erentials in studying eigenvalues of symmetric matrices. Our main reference is Lewis [121].

756

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

Let X := S(n) be the vector space of n × n symmetric matrices endowed with the inner product hA; Bi := trace(AB);

∀A; B ∈ X:

Let A ∈ X . We write the n eigenvalues of A including multiplicity as 1 (A) ≥ · · · ≥ n (A) and de ne the eigenvalue map  : X → Rn by (A) := (1 (A); : : : ; n (A)). We use Diag y to signal the diagonal matrix with diagonal elements y ∈ Rn . We are interested in the extended-real-valued function of the eigenvalues of matrices:  f ◦  : X → R; where f : Rn → R is lower semicontinuous and invariant under the coordinate permutation. Note this type of functions include the maximum eigenvalue (or more generally the kth largest eigenvalue) of a matrix. The main result we will present is Theorem 12.1 (Subdi erential of eigenvalue functions). DF (f ◦ )(A) = {U T (Diag)U : U orthogonal; U T (Diag (A))U = A;  ∈ DF f((A))}: Then a limiting process leads to @(f ◦ )(A) = {U T (Diag )U : U orthogonal; U T (Diag (A))U = A;  ∈ @f((A))}: The proof of this result needs several auxiliary results. We discuss them rst. Lemma 12.2. Let U be an orthogonal matrix and let A ∈ X . Then U T DF (f ◦ )(A)U = DF (f ◦ )(U T AU ): Proof. Let B ∈ DF (f ◦ )(A). Then, for any C ∈ X , (f ◦ )(A + tUDU T ) = (f ◦ )(A) + thB; Ai + o(tkUDU T k): Since similarity transformation does not change the eigenvalues and f is permutation invariant, we can rewrite the above equality as (f ◦ )(U T AU + tD) = (f ◦ )(U T AU ) + thU T BU; U T AU i + o(tkDk): That is U T BU ∈ U T DF (f ◦ )(A)U . Since B ∈ DF (f ◦ )(A) was arbitrary we have proved that U T DF (f ◦ )(A)U ⊂ DF (f ◦ )(U T AU ): The reverse inclusion may be derived by replacing A with U T AU and U with U T .

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

757

Lemma 12.3. Let f : Rn → R be a permutation invariant closed function. Then b ∈ DF f(a) if and only if Diag b ∈ DF (f ◦ )(Diag a). The “only if ” part of this Lemma is easy. In fact, for any small vector c ∈ Rn we have f(a + c) = (f ◦ )(Diag a + Diag c) ≥ (f ◦ )(Diag a) + trace(Diag b)(Diag c) + o(kDiag c)k) = f(a) + hb; ci + o(kck); whence b ∈ DF f(a). The “if” part is highly nontrivial. Since the proof relates mostly to algebra rather than nonsmooth analysis we refer the interested reader to [121, Section 5]. Lemma 12.4. Let f : Rn → R be a permutation invariant function. Then B ∈ DF (f ◦ )(A) implies that AB = BA. Proof. De ne MA := {U T AU : U orthogonal}: Then MA is a submanifold of X and the normal space of MA at A is N (MA ; A) = {B ∈ X : BA = AB} (cf. [1, 243; 88, p. 150,121]). Let B ∈ DF (f ◦ )(A). In view of the above characterization of N (MA ; A) we need only show that B ∈ N (MA ; A). Consider any vector T in the tangent space of MA at A. Then there exists a sequence (An ) ⊂ MA such that An → A and (An − A)=kAn − Ak → T=kT k. By the de nition of subdi erential we have (f ◦ )(An ) = (f ◦ )(A) + hB; An − Ai + o(kAn − Ak): Since f ◦  remains constant on MA we obtain hB; An − Ai = o(kAn − Ak): Dividing by kAn − Ak and taking limits as n → ∞ yields hB; T i = 0, as was to be shown. Proof of Theorem 12.1. Let b be an arbitrary element of DF f((A)). Lemma 12.3 shows Diag b ∈ DF (f ◦ )(Diag (A)):

758

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

For any orthogonal matrix U with U T Diag (A)U = A, Lemma 12.2 implies that U T (Diag b)U ∈ DF (f ◦ )(U T Diag (A)U ) = DF (f ◦ )(A): Thus, {U T (Diag b)U : U orthogonal; U T (Diag (A))U = A; b ∈ DF f((A))} ⊂ DF (f ◦ )(A): To prove the reverse inclusion, let B be an arbitrary element of DF (f ◦ )(A). By virtue of Lemma 12.4, B and A commute and therefore can be diagonalized simultaneously, that is to say, there exists an orthogonal matrix U and a vector b ∈ Rn such that U T AU = Diag A and U T BU = Diag b. Invoking Lemma 12.2 we have Diag b ∈ DF (f ◦ )(Diag A); whence b ∈ DF f((A)), by Lemma 12.3, as required. Remark 12.5. The formula in Theorem 12.1 also holds for the Clarke generalized gradient when f is Lipschitz using a similar argument (see [121]). An interesting application of Theorem 12.1 is when f(x) = k (x) = the kth largest element of {x1 ; : : : ; xn }. In this case f ◦ (A) = k (A) the kth largest eigenvalue of matrix A and one can derive explicit formula. In view of Theorem 12.1 the key is to calculate the subdi erential of k . Lemma 12.6. At any point x ∈ Rn ;  conv{ei : xi = k (x)}; if k−1 (x)¿k (x), DF k (x) = ∅; otherwise, @k (x) = {y ∈ conv{ei : xi = k (x)}: |supp y| ≤ }; where := 1 − k + |{i: xi ≥ k (x)}|; and @c k (x) = conv{ei : xi = k (x)}: The proof of this lemma is elementary. Details can be found in [121]. Then we have: Theorem 12.7. Let A ∈ X := S(n). Then @c k (A) = conv{uuT : u ∈ Rn ; kuk = 1; Au = k (A)u}; and @k (A) = {B ∈ @c k (A): rankB ≤ }; where := 1 − k + |{i: i (A) ≥ k ((A))}|. Example 12.8. Consider a very simple case when A is the two by two unit matrix. Then @2 (A) = {uuT : u ∈ R2 ; kuk = 1} and @c 2 (A) = conv{uuT : u ∈ R2 ; kuk = 1}. Note

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

759

@c 2 (A) contains nonsingular matrices Diag (a; 1−a) for a ∈ [0; 1] while all the elements of @2 (A) are singular matrices. These formulae show the striking di erence between the Clarke generalized gradient and the limiting subdi erential. Remark 12.9. Note that 1 (A); the maximum eigenvalue of A; is convex as is P n i=1 i (A); the trace of A; because the corresponding function f is convex. For these convex functions both @c and @ coincide with the convex subgradient. 12.1. Notes The study of sensitivity of the eigenvalues of a matrix with respect to a single perturbation parameter has a long history. Kato [113] is an excellent standard reference. The Clarke generalized gradient for the maximum eigenvalue of a matrix with respect a single parameter was rst calculated by Polak and Wardi [156]. More general results along this line of research can be found in [175]. The results presented in this section follows Lewis [121] where the author treats the eigenvalue as a function of the matrix itself in the space S(n). They extend the authors earlier work [118–120] where one can also nd additional references. 13. Generalized solutions to partial di erential equations Our nal set of applications touches on the issues involved in providing generalized solutions to partial di erential equations. These are generalized solutions in our terms and not weak solutions in the classical PDE sense. 13.1. Uniqueness of viscosity solutions to Hamilton–Jacobi equations The Hamilton–Jacobi equation u + H (x; Du) = 0

(43)

is closely related to the optimal value function of certain optimal control problems. Consider the value function u of the optimal control problem  Z ∞ −t 0 e f(x(t); c(t)) dt: x (t) = g(x(t); c(t)); c(t) ∈ C; x(0) = x ; u(x) := inf 0

where f and g are Lipschitz functions, c is a measurable function modelling the control and C is a compact set modelling the admissible range of the control function. Then when u is smooth it satis es Eq. (43) with H (x; p) := sup{h−g(x; c); pi − f(x; c): c ∈ C} (see [81]). In general, such a value function is not necessarily smooth and Eq. (43) does not necessarily have a classical solution. Viscosity solutions were introduced by Crandall and Lions [68] to replace classical solutions. We recall the de nition below. First, let

760

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

be a bornology or H (s) for some s ∈ (0; 1] as in Section 9.1 and let f : X → R ∪ {−∞} be an upper semicontinuous function. We de ne the -superdi erential of f at x, D f(x), by D f(x) := −D (−f)(x): Deÿnition 13.1. A function u : X → R is a viscosity supersolution (viscosity subsolution) of Eq. (43) if u is lower (upper) semicontinuous and, for every x ∈ X and every x∗ ∈ DF (u)(x) (x∗ ∈ DF (u)(x)): u(x) + H (x; x∗ ) ≥ 0

(u(x) + H (x; x∗ ) ≤ 0):

A continuous function u is called a viscosity solution if u is both a viscosity subsolution and a viscosity supersolution. The uniqueness of viscosity solution to the Hamilton–Jacobi equation follows readily from the following comparison theorem. Theorem 13.2 (Comparison theorem). Suppose H : X × X ∗ → R satis es the following assumption (A) for any x1 ; x2 ∈ X and x1∗ ; x2∗ ∈ X ∗ ; |H (x1 ; x1∗ ) − H (x2 ; x2∗ )| ≤ !(x1 − x2 ; x1∗ − x2∗ ) + M max(||x1∗ ||; ||x2∗ ||)||x1 − x2 || where M ¿0 is a constant and ! : X × X ∗ → R is a continuous function with !(0; 0) = 0. Let u be an upper semicontinuous function bounded above and v be a lower semicontinuous function bounded below. If u is a viscosity subsolution of Eq. (43) and v is a viscosity supersolution of Eq. (43), then u ≤ v. Proof. Let  be an arbitrary positive number. Applying the nonlocal fuzzy sum rule of Theorem 2.1 with f1 = v and f2 = − u, there exist x1 ; x2 ∈ X , x1∗ ∈ DF v(x1 ) and x2∗ ∈ DF u(x2 ) satisfying (i) kx1 − x2 k¡; (ii) ||x1∗ ||||x1 − x2 ||¡ and ||x2∗ ||||x1 − x2 ||¡; (iii) v(x1 ) − u(x2 )¡ inf (v − u) + ; (iv) kx1∗ − x2∗ k ≤ . Since the function v is a viscosity supersolution of Eq. (43) we have v(x1 ) + H (x1 ; x1∗ ) ≥ 0: Similarly u(x2 ) + H (x2 ; x2∗ ) ≤ 0: Therefore, inf (v − u) ¿ v(x2 ) − u(x1 ) −  X

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

761

≥ [H (x2 ; x2∗ ) − H (x1 ; x1∗ )] −  ≥ −[!(x2 − x1 ; x2∗ − x1∗ ) + M max(||x1∗ ||; ||x2∗ ||)||x2 − x1 ||] − : As  → 0 the right-hand side converges to 0 which yields inf X (v − u) ≥ 0. Corollary 13.3 (Uniqueness of viscosity solutions). Under the assumptions Theorem 13.2 any continuous bounded viscosity solution to Eq. (43) is unique.

of

13.2. Relations among di erent generalized solution concepts Once again in this section we assume that X is a re exive Banach space. Besides viscosity solutions there are many other viable generalized solution concepts. An obvious generalization is the following. Deÿnition 13.4. A function u : X → R is a -viscosity supersolution ( -viscosity subsolution) of Eq. (43) if u is lower (upper) semicontinuous and, for every x ∈ X and every x∗ ∈ D (u)(x) (x∗ ∈ D (u)(x)): u(x) + H (x; x∗ ) ≥ 0

(u(x) + H (x; x∗ ) ≤ 0):

A continuous function u is called a -viscosity solution if u is both a -viscosity subsolution and a -viscosity supersolution. Another important generalized solution concept is the minimax solution whose definition needs the following sequential weak lower Dini directional derivative: Dw f(x; u) := inf lim+ inf w

{ui } t → 0 u →u i

f(x + tui ) − f(x) ; t

w

where → denotes weak convergence and the in mum is taken over all sequences {ui } weakly converging to u. The sequential weak upper Dini directional derivative is de ned symmetrically by w

D f(x; u) := −Dw (−f)(x; u): Deÿnition 13.5. A function u : X → R (u : X → R ∪ {−∞}) is a minimax supersolution (minimax subsolution) of Eq. (43) if u is lower (upper) semicontinuous and, for every x ∈ X where u(x) is nite, sup inf {Dw u(x; v) − hv∗ ; vi − u(x) − H (x; v∗ )} ≤ 0;

v∗



∈X∗ v∈X

inf ∗

 w sup {D u(x; v) − hv∗ ; vi − u(x) − H (x; v∗ )} ≥ 0 :

v ∈X∗ v∈X

(44) (45)

A continuous function u is called a minimax solution if u is both a minimax subsolution and a minimax supersolution.

762

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

Despite the apparent di erences in these de nitions we have the following equivalence results. Theorem 13.6. Let X be a -smooth Banach space (where can be H (s) as noted in Section 8) and let be either a bornology or = H (s) for some s ∈ (0; 1] such that -di erentiability is a di erentiability concept weaker than the -di erentiability. Let H satisfy the following Lipschitz condition: |H (x; v∗ ) − H (x; u∗ )| ≤ M kv∗ − u∗ k

∀x ∈ X;

and let u : X → R be a lower semicontinuous function. Then the following are equivalent: (i) u is a minimax supersolution; (ii) u is a -viscosity supersolution; (iii) u is a -viscosity supersolution. Proof. (i)⇒(ii): Suppose that u is a minimax solution. Let x∗ ∈ D u(x). Then there is an -smooth function g such that ∇ g(x) = x∗ and u − g attains a minimum at x. Then, for any v ∈ X , 0 ≤ Dw (u − g)(x; v) = Dw u(x; v) − hx∗ ; vi:

(46)

Set v∗ := x∗ in relation (44). Then for any ¿0 there exists v such that Dw u(x; v) − hx∗ ; vi − u(x) − H (x; x∗ )¡: In light of Eq. (46) we have u(x) + H (x; x∗ )¿−: Letting  → 0 yields u(x) + H (x; x∗ ) ≥ 0: Thus, u is a -viscosity supersolution. (ii)⇒(iii): Since -di erentiability is weaker than the -di erentiability any -viscosity supersolution is obviously a -viscosity supersolution. (iii)⇒(i): We prove this by contradiction. Suppose that u is a -viscosity supersolution but fails to be a minimax supersolution. Then there exists x0 ∈ X , v∗ ∈ X ∗ and ¿0 such that u(x0 )¡+∞ and Dw u(x0 ; v) − hv∗ ; vi − u(x0 ) − H (x0 ; v∗ ) ≥ 2;

∀v ∈ X:

(47)

Set f(x) := u(x)−hv∗ ; vi.Then we claim that for any integer i there exists a i ∈ (0; 1=i) such that t ∈ [0; 2i ] and v ∈ MBX implies that f(x0 + tv) − f(x0 ) ¿ + u(x0 ) + H (x0 ; v∗ ): t

(48)

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

763

In fact, if the claim fails then we can take sequences ti → 0+ and vi ∈ MBX such that f(x0 + ti vi ) − f(x0 ) ¿ + u(x0 ) + H (x0 ; v∗ ): ti Since MBX is weakly compact we may assume without loss of generality that vi weakly converges to v ∈ MBX . This will yield Dw u(x0 ; v) − hv∗ ; vi − u(x0 ) − H (x0 ; v∗ ) ≤ ;

∀v ∈ X;

a contradiction to Eq. (47). Let Y := x0 + i MBX . Inequality (48) implies that lim

inf

 → 0 y ∈ Y +BX

f(y) − f(x0 )¿i ( + u(x0 ) + H (x0 ; v∗ )):

Applying the -smooth space version of the multidirectional mean value inequality [217] (see Theorem 2.14, Remark 2.14. (c) and Section 9.1) with f; Y; x = x0 ;  = 1=i and r = i ( + u(x0 ) + H (x0 ; v∗ )) we conclude that there exist xi ∈ x0 + (i + 1=i)MBX , xi∗ ∈ D u(xi ) such that hxi∗ − v∗ ; i vi ≥ i ( + u(x0 ) + H (x0 ; v∗ ));

∀v ∈ MBX

(49)

and f(xi )¡f(x0 ) + |i ( + u(x0 ) + H (x0 ; v∗ ))| + 1=i:

(50)

Note that xi → x0 . The lower semicontinuity of u and inequality (50) implies that u(xi ) → u(x0 ). Since u is a -viscosity supersolution 0 ≤ u(xi ) + H (xi ; xi∗ ). Moreover, since H (x; x∗ ) is Lipschitz in x∗ uniformly in x with a Lipschitz constant M we have H (xi ; xi∗ )¡H (xi ; v∗ )+M kxi∗ −v∗ k = H (xi ; v∗ )−hxi∗ −v∗ ; vi for some v ∈ MBX . Combining these inequalities with Eq. (49) we obtain 0¡u(xi ) + H (xi ; v∗ ) − ((u(x0 ) + H (x0 ; v∗ )) − : Taking limits as i → ∞ yields a contradiction 0 ≤ − which completes the proof. Similar equivalent relations hold for the subsolutions. Thus, we have Corollary 13.7. Let the assumptions of Theorem 13.6 be satis ed and let u : X → R be a continuous function. Then the following are equivalent: (i) u is a minimax solution; (ii) u is a -viscosity solution; (iii) u is a -viscosity solution. The assumption X is -smooth is crucial. Example 13.8. Let X be an -smooth Banach space with a nowhere -di erentiable norm k · k. Consider Eq. (43) with H (x; x∗ ) = kx∗ k. Then 1. u = 0 is the unique continuous bounded -viscosity solution; 2. u(x) = kxk=(kxk + 1) is a continuous bounded -viscosity solution. Proof. (i) One can directly check that u = 0 is an -viscosity solution. The uniqueness follows from Theorem 13.2.

764

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

(ii) It is trivial to observe that u(x) = kxk=(kxk + 1) is a -viscosity supersolution. We will show that D u(x) = ∅ for all x ∈ X . Thus, u is also a -viscosity subsolution and, therefore, a -viscosity solution. In fact, if, for some x , D u(x) 6= ∅ then there exists a -smooth function g such that u − g attains a local maximum 0 at x . Since u(x)¡1 we have g(x)¡1 in a neighborhood of x . Observing that u(x) = h(kxk) where h(t) = t=(1 + t) we have kxk − h−1 ◦ g(x) = kxk −

g(x) 1 − g(x)

attains a local maximum at x . This amounts to saying that k · k is -superdi erentiable at x . Since k · k is convex we deduce that k · k is -di erentiable at x , a contradiction. As a concrete example, we can take X := Lp , 1¡p¡2, := F and := LS. 13.3. Notes Comparison and uniqueness results for Hamilton–Jacobi equations were derived in [65,69,72]. The relationship between the fuzzy sum rule and the comparison theorem for the Hamilton–Jacobi equation was observed in [73]. The uniform continuity condition imposed on H in [73] was replaced by the weaker condition (A) in [36] using the re ned local fuzzy sum rule. This allows one to apply Theorem 8.2 and Corollary 8.3 to the Hamilton–Jacobi equation corresponding to a general optimal control problems. (The uniform continuity condition in [73] is not satis ed even for H corresponding to a linear optimal control problem.) The corresponding in nite-dimensional result [36, Theorem 3.2] requires the solution to be uniformly continuous. It is observed that this restriction can be removed by using the nonlocal fuzzy sum rule in [217] and the full proof of Theorem 13.2 is given here for the rst time. The -viscosity solutions were introduced in [72]. The minimax solution originated in the work of Subbotin [176–179] for equations in nite-dimensional spaces. The de nition used here is a modi cation of the Hilbert space de nition in [50]. Theorem 13.6 is a generalization of the Hilbert space result in [50]. The idea of Example 13.8 is similar to [36, Example 3.7] but the later was wrong. The generalized solution concepts discussed in this section all use both sub- and super-di erentials and only apply to continuous solutions. In practice, the optimal value function of a control problem is often only semicontinuous. This has stimulated many studies of discontinuous generalized solutions to the Hamilton–Jacobi equations. Solution concepts that use only subdi erentials or superdi erentials are often very e ective in handling semicontinuous solutions. Such solutions were rst studied by Barron and Jensen [8, 9]. Clarke et al. [52] is an excellent survey for results in this direction. More recent work can be found in [10, 208, 211]. The Hamilton–Jacobi equations originated from the dynamic programming method in optimal control theory. The relations between dymamic programming and the maximum principle is one of the important topics in nonsmooth analysis. Detailed analysis can be found in [61].

J.M. Borwein, Q.J. Zhu / Nonlinear Analysis 38 (1999) 687 – 773

765

14. Open problems 14.1. Second-order theory Comparison of the current status of smooth subdi erential theory and the corresponding smooth theory reveals a glaring lack of a second order theory. In nitedimensional space a beautiful sum rule for a second-order derivative-like object close to the fuzzy sum rule discussed in Section 2 was derived in [66]. Applications to secondorder viscosity solution theory were provided by Crandall et al. in their survey paper [67]. However, the proof of this result is fundamentally nite-dimensional (see [26]). There are many other approaches in nite-dimensional spaces. An excellent summary is Chapter 13 of Rockafeller and Wets’ recent monograph [172]. In in nite dimensions the eld is little developed. 14.2. Optimality conditions Whether it is possible to extend the results in Section 3 beyond re exive Banach spaces is both important for applications and challenging theoretically. 14.3. Equivalence of the basic results It is known that all the basic results discussed in Section 2 are equivalent (see [105, 149, 218]). Furthermore, it was established in [25] that they are also equivalent to the smooth variational principle Theorem 1.6 when the underlying space has a smooth norm. For the Frechet subdi erential case, Mordukhovich [144] reduced the condition that the underlying space has a Frechet smooth norm to the underlying space has a Lipschitzian and Frechet di erentiable bump function. It was also shown in [55] that in a Hilbert space the smooth variational priciple is equivalent to a minimization principle. The exact relationship of these basic results and the variational principle remains an interesting problem. The fact that all limiting subderivatives are viscosity subderivatives is known to fail for non-Lipschitz functions in the Gateaux bornology [36]. Since viscosity subderivatives are so much more exible than limiting ones, it would be very nice to see an example of a Lipschitz function f on Hilbert space with the limiting Gateaux subdi erential of f di erent from DG f at some point. 14.4. Local fuzzy sum rule There are many versions of local fuzzy sum rules for functions under competing assumptions. The weakest assumption so far is the uniform lower semicontinuity condition of De nition 2.4. However, as shown by Example 2.12 even this condition is not tight. Thus, nding weaker and useful conditions for the local fuzzy sum rule remains an interesting problem. It was shown in [218] that all the fuzzy sum rules derived so far are actually equivalent. Whether one can nd an essentially di erent local fuzzy sum rule is an interesting challenge.

14.5. Applications

As noted in the introduction, the development of nonsmooth analysis has been, and continues to be, largely motivated by control theory and mathematical programming. Applications in these areas and to other related problems are critical for the further healthy development of nonsmooth analysis.

Acknowledgements

We thank Yu. Ledyaev, B. Mordukhovich and L. Thibault for their helpful historical comments.

References

[1] V.I. Arnold, Geometrical Methods in the Theory of Ordinary Differential Equations, 2nd ed., Springer, New York, 1988.
[2] J.-P. Aubin, Contingent derivatives of set-valued maps and existence of solutions to nonlinear inclusions and differential inclusions, in: L. Nachbin (Ed.), Mathematical Analysis and Applications, Academic Press, New York, 1981, pp. 159–229.
[3] J.P. Aubin, F.H. Clarke, Monotone invariant solutions to differential inclusions, J. London Math. Soc. 16 (1977) 357–366.
[4] J.P. Aubin, H. Frankowska, Set-valued Analysis, Birkhäuser, Boston, 1990.
[5] A. Auslender, Differentiable stability in nonconvex and nondifferentiable programming, Mathematical Programming Study 10 (1979) 29–41.
[6] D. Aussel, J.-N. Corvellec, M. Lassonde, Mean value property and subdifferential criteria for lower semicontinuous functions, Trans. Amer. Math. Soc. 347 (1995) 4147–4161.
[7] D. Aussel, J.-N. Corvellec, M. Lassonde, Nonsmooth constrained optimization and multidirectional mean value inequalities, preprint.
[8] E.N. Barron, R. Jensen, Semicontinuous viscosity solutions for Hamilton–Jacobi equations with convex Hamiltonians, Commun. Partial Differential Equations 15 (1990) 1713–1742.
[9] E.N. Barron, R. Jensen, Optimal control and semicontinuous viscosity solutions, Proc. Amer. Math. Soc. 113 (1991) 396–402.
[10] E.N. Barron, R. Jensen, Semicontinuous solutions for Hamilton–Jacobi equations and the L∞-control problem, Appl. Math. Optim. 34 (1996) 325–360.
[11] H.H. Bauschke, J.M. Borwein, On projection algorithms for solving convex feasibility problems, CECM Research Report 95:034, SIAM Rev. 38 (1996) 367–426.
[12] J.M. Borwein, Convex relations in analysis and optimization, in: S. Schaible, W.T. Ziemba (Eds.), Generalized Concavity in Optimization and Economics, Academic Press, New York, 1981.
[13] J.M. Borwein, Stability and regular points of inequality systems, J. Optim. Theory Appl. 48 (1986) 9–52.
[14] J.M. Borwein, Epi-Lipschitz-like sets in Banach space: theorems and examples, Nonlinear Anal. TMA 11 (1987) 1207–1217.
[15] J.M. Borwein, Minimal cuscos and subgradients of Lipschitz functions, in: J.-B. Baillon, M. Théra (Eds.), Fixed Point Theory and its Applications, Pitman Lecture Notes in Mathematics, Longman, Essex, 1991, pp. 57–82.
[16] J.M. Borwein, S. Fitzpatrick, Existence of nearest points in Banach spaces, Canad. J. Math. 61 (1989) 702–720.
[17] J.M. Borwein, S. Fitzpatrick, A weak Hadamard smooth renorming of L1(Ω, μ), Canad. Math. Bull. 36 (1993) 407–413.

[18] J.M. Borwein, S. Fitzpatrick, Weak-star sequential compactness and bornological limit derivatives, CECM Research Report 93-02, Convex Anal. (Special Issue in Celebration of R.T. Rockafellar's 60th Birthday, Part 1) 2 (1995) 59–68.
[19] J.M. Borwein, R. Girgensohn, X. Wang, On the construction of Hölder and proximal subderivatives, CECM Research Report 97-091, Canad. Math. Bull., to appear.
[20] J.M. Borwein, A. Ioffe, Proximal analysis in smooth spaces, CECM Research Report 93-04 (1993), Set-Valued Anal. 4 (1996) 1–24.
[21] J.M. Borwein, A. Jofré, A nonconvex separation property in Banach spaces, CECM Research Report 97-103, ZDR: Math. Methods of Oper. Res., to appear.
[22] J.M. Borwein, W.B. Moors, Essentially smooth Lipschitz functions, J. Funct. Anal. 49 (1997) 305–351.
[23] J.M. Borwein, W.B. Moors, Null sets and essentially smooth Lipschitz functions, SIAM J. Optim., February 1998, to appear.
[24] J.M. Borwein, W.B. Moors, A chain rule for essentially strictly differentiable functions, CECM Research Report 96:057, SIAM J. Optim. 8 (1998) 300–308.
[25] J.M. Borwein, B.S. Mordukhovich, Y. Shao, On the equivalence of some basic principles in variational analysis, CECM Research Report 97-098.
[26] J.M. Borwein, D. Noll, Second order differentiability of convex functions in Banach spaces, Trans. Amer. Math. Soc. 342 (1994) 43–81.
[27] J.M. Borwein, D. Preiss, A smooth variational principle with applications to subdifferentiability and to differentiability of convex functions, Trans. Amer. Math. Soc. 303 (1987) 517–527.
[28] J.M. Borwein, H.M. Strojwas, Tangential approximations, Nonlinear Anal. TMA 9 (1985) 1347–1366.
[29] J.M. Borwein, H.M. Strojwas, Proximal analysis and boundaries of closed sets in Banach spaces, Part I: Theory, Canad. J. Math. 38 (1986) 431–452.
[30] J.M. Borwein, H.M. Strojwas, Proximal analysis and boundaries of closed sets in Banach spaces, Part II: Applications, Canad. J. Math. 39 (1987) 428–472.
[31] J.M. Borwein, H.M. Strojwas, The hypertangent cone, Nonlinear Anal. TMA 13 (1989) 125–144.
[32] J.M. Borwein, J.S. Treiman, Q.J. Zhu, Necessary conditions for constrained optimization problems with semicontinuous and continuous data, CECM Research Report 95-051, 1995, Trans. Amer. Math. Soc. 350 (1998) 2409–2429.
[33] J.M. Borwein, J.S. Treiman, Q.J. Zhu, Partially smooth variational principles and applications, CECM Research Report 96-088, 1996, Nonlinear Anal. TMA (1998), to appear.
[34] J.M. Borwein, J.S. Treiman, Q.J. Zhu, Sensitivity analysis in reflexive Banach spaces, paper in preparation.
[35] J.M. Borwein, D.M. Zhang, Verifiable necessary and sufficient conditions for openness and regularity of set-valued and single-valued maps, J. Math. Anal. Appl. 134 (1988) 441–459.
[36] J.M. Borwein, Q.J. Zhu, Viscosity solutions and viscosity subderivatives in smooth Banach spaces with applications to metric regularity, CECM Research Report 94-12 (1994), SIAM J. Control Optim. 34 (1996) 1568–1591.
[37] J.M. Borwein, Q.J. Zhu, Variational analysis in non-reflexive spaces and applications to control problems with L1 perturbations, CECM Research Report 94-10 (1994), Nonlinear Anal. TMA 28 (1997) 889–915.
[38] J.M. Borwein, Q.J. Zhu, Limiting convex examples for nonconvex subdifferential calculus, CECM Research Report 97-097, 1997, Convex Anal., to appear.
[39] F.H. Clarke, Necessary conditions for nonsmooth problems in optimal control and the calculus of variations, Ph.D. Thesis, Univ. of Washington, 1973.
[40] F.H. Clarke, Generalized gradients and applications, Trans. Amer. Math. Soc. 205 (1975) 247–262.
[41] F.H. Clarke, Maximum principle without differentiability, Proc. Amer. Math. Soc. 81 (1975) 219–222.
[42] F.H. Clarke, A new approach to Lagrange multipliers, Math. Oper. Res. 1 (1976) 165–174.
[43] F.H. Clarke, On the inverse function theorem, Pacific J. Math. 64 (1976) 97–102.
[44] F.H. Clarke, Optimal control and the true Hamiltonian, SIAM Rev. 21 (1979) 157–166.
[45] F.H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983; Russian edition, MIR, Moscow, 1988; reprinted as vol. 5 of Classics in Applied Mathematics, SIAM, Philadelphia, 1990.

[46] F.H. Clarke, Perturbed optimal control problems, IEEE Trans. Automat. Control AC-31 (1986) 535–542.
[47] F.H. Clarke, Methods of Dynamic and Nonsmooth Optimization, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 57, SIAM, Philadelphia, 1989.
[48] F.H. Clarke, An indirect method in the calculus of variations, Trans. Amer. Math. Soc. 336 (1993) 535–542.
[49] F.H. Clarke, Yu.S. Ledyaev, Mean value inequalities, Proc. Amer. Math. Soc. 122 (1994) 1075–1083.
[50] F.H. Clarke, Yu.S. Ledyaev, Mean value inequalities in Hilbert space, Trans. Amer. Math. Soc. 344 (1994) 307–324.
[51] F.H. Clarke, Yu.S. Ledyaev, E.D. Sontag, A.I. Subbotin, Asymptotic controllability implies feedback stabilization, IEEE Trans. Automat. Control, to appear.
[52] F.H. Clarke, Yu.S. Ledyaev, R.J. Stern, P.R. Wolenski, Qualitative properties of trajectories of control systems: a survey, J. Dyn. Control Systems 1 (1995) 1–48.
[53] F.H. Clarke, Yu.S. Ledyaev, R.J. Stern, P.R. Wolenski, Nonsmooth Analysis and Control Theory, Graduate Texts in Mathematics, vol. 178, Springer, New York, 1998.
[54] F.H. Clarke, Yu.S. Ledyaev, R.J. Stern, Fixed points and equilibria in nonconvex sets, Nonlinear Anal. TMA 25 (1995) 145–161.
[55] F.H. Clarke, Yu.S. Ledyaev, P.R. Wolenski, Proximal analysis and minimization principles, J. Math. Anal. Appl. 196 (1995) 722–735.
[56] F.H. Clarke, P.D. Loewen, The value function in optimal control: sensitivity, controllability, and time-optimality, SIAM J. Control Optim. 24 (1986) 243–263.
[57] F.H. Clarke, P.D. Loewen, State constraints in optimal control: a case study in proximal normal analysis, SIAM J. Control Optim. 25 (1987) 1440–1456.
[58] F.H. Clarke, R.M. Redheffer, The proximal subgradient and constancy, Canad. Math. Bull. 36 (1993) 30–32.
[59] F.H. Clarke, R.J. Stern, P.R. Wolenski, Subgradient criteria for monotonicity, the Lipschitz condition, and convexity, Canad. J. Math. 45 (1993) 1167–1183.
[60] F.H. Clarke, R.B. Vinter, Regularity properties of solutions to the basic problem in the calculus of variations, Trans. Amer. Math. Soc. 289 (1985) 73–98.
[61] F.H. Clarke, R.B. Vinter, The relationship between the maximum principle and dynamic programming, SIAM J. Control Optim. 25 (1987) 1291–1311.
[62] F.H. Clarke, P.R. Wolenski, The sensitivity of optimal control problems to time delay, SIAM J. Control Optim. 29 (1991) 1176–1215.
[63] R. Correa, A. Jofré, L. Thibault, Characterization of lower semicontinuous convex functions, Proc. Amer. Math. Soc. 116 (1992) 61–72.
[64] R. Correa, A. Jofré, L. Thibault, Subdifferential monotonicity as characterization of convex functions, Numer. Funct. Anal. Optim. 15 (1994) 531–535.
[65] M.G. Crandall, L.C. Evans, P.-L. Lions, Some properties of viscosity solutions of Hamilton–Jacobi equations, Trans. Amer. Math. Soc. 282 (1984) 487–502.
[66] M.G. Crandall, H. Ishii, The maximum principle for semicontinuous functions, Differential Integral Equations 3 (1990) 1001–1014.
[67] M.G. Crandall, H. Ishii, P.-L. Lions, User's guide to viscosity solutions of second order partial differential equations, Bull. Amer. Math. Soc. (N.S.) 27 (1992) 1–67.
[68] M.G. Crandall, P.-L. Lions, Viscosity solutions of Hamilton–Jacobi equations, Trans. Amer. Math. Soc. 277 (1983) 1–42.
[69] M.G. Crandall, P.-L. Lions, Hamilton–Jacobi equations in infinite dimensions, Part I: Uniqueness of viscosity solutions, J. Funct. Anal. 62 (1985) 379–396; Part II: Existence of viscosity solutions, 65 (1986) 368–405; Part III, 68, 214–247; Part IV: Unbounded linear terms, 90 (1990) 237–283; Part V: B-continuous solutions, 97 (1991) 417–465.
[70] R. Deville, Stability of subdifferentials of nonconvex functions in Banach spaces, Set-Valued Anal. 2 (1994) 141–157.
[71] R. Deville, G. Godefroy, V. Zizler, Smoothness and Renormings in Banach Spaces, Pitman Monographs and Surveys in Pure and Applied Mathematics, No. 64, Wiley, New York, 1993.
[72] R. Deville, G. Godefroy, V. Zizler, A smooth variational principle with applications to Hamilton–Jacobi equations in infinite dimensions, J. Funct. Anal. 111 (1993) 197–212.

[73] R. Deville, E.M.E. Haddad, The subdifferential of the sum of two functions in Banach spaces, I: First order case, Convex Anal. 3 (1996) 295–308.
[74] R. Deville, M. Ivanov, Smooth variational principles with constraints, Arch. Math. 69 (1997) 418–426.
[75] J. Diestel, Sequences and Series in Banach Spaces, Graduate Texts in Mathematics, Springer, New York, 1984.
[76] A.V. Dmitruk, A.A. Miljutin, N.P. Osmolovskii, Ljusternik's theorem and the theory of extrema, Russ. Math. Surv. 35 (1980) 11–51.
[77] S. Dolecki, A general theory of necessary optimality conditions, J. Math. Anal. Appl. 78 (1980) 267–308.
[78] I. Ekeland, On the variational principle, J. Math. Anal. Appl. 47 (1974) 324–353.
[79] M. Fabian, On classes of subdifferentiability spaces of Ioffe, Nonlinear Anal. TMA 12 (1988) 63–74.
[80] M. Fabian, Subdifferentiability and trustworthiness in the light of a new variational principle of Borwein and Preiss, Acta Univ. Carolina 30 (1989) 51–56.
[81] W.H. Fleming, R.W. Rishel, Deterministic and Stochastic Optimal Control, Springer, Berlin, 1975.
[82] H. Frankowska, An open mapping principle for set-valued maps, J. Math. Anal. Appl. 127 (1987) 172–180.
[83] J. Gauvin, The generalized gradient of a marginal function in mathematical programming, Math. Oper. Res. 4 (1979) 458–463.
[84] J. Gauvin, J.W. Tolle, Differential stability in nonlinear programming, SIAM J. Control Optim. 15 (1977) 294–311.
[85] B.R. Gelbaum, J.M.H. Olmsted, Counterexamples in Analysis, Holden-Day, San Francisco, 1964.
[86] B. Ginsburg, A.D. Ioffe, The maximum principle in optimal control of systems governed by semilinear equations, in: B.S. Mordukhovich, H.T. Sussmann (Eds.), Proc. IMA Workshop on Nonsmooth Analysis and Geometric Methods in Deterministic Optimal Control, IMA Volumes in Mathematics and its Applications, Springer, New York, 1995.
[87] H. Halkin, Mathematical programming without differentiability, in: D.L. Russell (Ed.), Calculus of Variations and Control Theory, Academic Press, New York, 1976, pp. 279–288.
[88] S. Helgason, Differential Geometry, Lie Groups, and Symmetric Spaces, Academic Press, New York, 1978.
[89] J.B. Hiriart-Urruty, Tangent cones, generalized gradients and mathematical programming in Banach spaces, Math. Oper. Res. 4 (1979) 79–97.
[90] A. Hoffman, On approximate solutions of systems of linear inequalities, J. Res. Nat. Bur. Standards Sect. B 49 (1952) 263–265.
[91] R.B. Holmes, Geometric Functional Analysis and Its Applications, Springer, New York, 1975.
[92] A.D. Ioffe, Regular points of Lipschitz mappings, Trans. Amer. Math. Soc. 251 (1979) 61–69.
[93] A.D. Ioffe, Nonsmooth analysis: differential calculus of nondifferentiable mappings, Trans. Amer. Math. Soc. 266 (1981) 1–56.
[94] A.D. Ioffe, On subdifferentiability spaces, Ann. N.Y. Acad. Sci. 410 (1983) 107–119.
[95] A.D. Ioffe, Subdifferentiability spaces and nonsmooth analysis, Bull. Amer. Math. Soc. 10 (1984) 87–90.
[96] A.D. Ioffe, Necessary conditions for nonsmooth optimization, Math. Oper. Res. 9 (1984) 159–189.
[97] A.D. Ioffe, Calculus of Dini subdifferentials of functions and contingent derivatives of set-valued maps, Nonlinear Anal. TMA 8 (1984) 517–539.
[98] A.D. Ioffe, Approximate subdifferentials and applications, I: The finite dimensional theory, Trans. Amer. Math. Soc. 281 (1984) 389–416.
[99] A.D. Ioffe, Approximate subdifferentials and applications, II: Functions on locally convex spaces, Mathematika 33 (1986) 111–128.
[100] A.D. Ioffe, On the local surjection property, Nonlinear Anal. TMA 11 (1987) 565–590.
[101] A.D. Ioffe, Approximate subdifferentials and applications, 3: The metric theory, Mathematika 36 (1989) 1–38.
[102] A.D. Ioffe, Proximal analysis and approximate subdifferentials, J. London Math. Soc. 41 (1990) 175–192.

[103] A.D. Ioffe, Separable reduction theorem for approximate subdifferentials, C.R. Acad. Sci. Paris 323 (1996) 107–112.
[104] A.D. Ioffe, Codirectional compactness, metric regularity and subdifferential calculus, preprint.
[105] A.D. Ioffe, Fuzzy principles and characterization of trustworthiness, preprint.
[106] A.D. Ioffe, J.P. Penot, Subdifferentials of performance functions and calculus of coderivatives of set-valued mappings, Serdica Math. J. 22 (1996) 257–282.
[107] A.D. Ioffe, R.T. Rockafellar, The Euler and Weierstrass conditions for nonsmooth variational problems, Calc. Var. 4 (1996) 59–87.
[108] A. Jourani, L. Thibault, Coderivatives of multivalued mappings, locally compact cones and metric regularity, preprint, 1995.
[109] A. Jourani, L. Thibault, Verifiable conditions for openness and regularity of multivalued mappings, Trans. Amer. Math. Soc. 347 (1995) 1255–1268.
[110] A. Jourani, L. Thibault, Extensions of subdifferential calculus rules in Banach spaces and applications, Canad. J. Math. 48 (1996) 834–848.
[111] A. Jourani, L. Thibault, Chain rules for coderivatives of multivalued mappings in Banach spaces, preprint 1994, Proc. Amer. Math. Soc. 126 (1998) 1479–1485.
[112] A. Jourani, L. Thibault, Qualification conditions for calculus rules of coderivatives of multivalued mappings, J. Math. Anal. Appl. 218 (1998) 66–81.
[113] T. Kato, A Short Introduction to Perturbation Theory for Linear Operators, Springer, New York, 1982.
[114] A.Y. Kruger, Properties of generalized differentials, Sib. Math. J. 26 (1985) 822–832.
[115] A.Y. Kruger, B.S. Mordukhovich, Extremal points and Euler equations in nonsmooth optimization, Dokl. Akad. Nauk BSSR 24 (1980) 684–687.
[116] Yu.S. Ledyaev, Theorems on an implicitly given set-valued mapping (Russian), Dokl. Akad. Nauk SSSR 276 (3) (1984) 543–546.
[117] Yu.S. Ledyaev, Q.J. Zhu, Implicit multifunction theorems, preprint.
[118] A.S. Lewis, Eigenvalue-constrained faces, Technical Report CORR 95-22, University of Waterloo, 1995.
[119] A.S. Lewis, Convex analysis on the Hermitian matrices, SIAM J. Optim. 6 (1996) 164–177.
[120] A.S. Lewis, Derivatives of spectral functions, Math. Oper. Res. 21 (1996) 576–588.
[121] A.S. Lewis, Nonsmooth analysis of eigenvalues, preprint.
[122] A.S. Lewis, M.L. Overton, Eigenvalue optimization, Acta Numerica 5 (1996) 149–190.
[123] A.S. Lewis, D. Ralph, A nonlinear duality result equivalent to the Clarke–Ledyaev mean value inequality, Nonlinear Anal. TMA 26 (1996) 343–350.
[124] Y. Li, S. Shi, A generalization of Ekeland's ε-variational principle and of its Borwein–Preiss smooth variant, J. Math. Anal. Appl., to appear.
[125] J. Lindenstrauss, L. Tzafriri, Classical Banach Spaces II: Function Spaces, Springer, Berlin, 1979.
[126] P.D. Loewen, The proximal subgradient formula in Banach space, Canad. Math. Bull. 31 (1988) 353–361.
[127] P.D. Loewen, Perturbed differential inclusion problems, in: F.H. Clarke, V.F. Dem'yanov, F. Giannessi (Eds.), Nonsmooth Optimization and Related Topics, Plenum Press, New York, 1989, pp. 255–263.
[128] P.D. Loewen, Limits of Fréchet normals in nonsmooth analysis, in: A.D. Ioffe et al. (Eds.), Optimization and Nonlinear Analysis, Pitman Research Notes in Mathematics Series No. 244, Longman, Harlow, Essex, 1992, pp. 178–188.
[129] P.D. Loewen, Optimal Control via Nonsmooth Analysis, CRM Lecture Notes Series, Summer School on Control, CRM, Université de Montréal, 1992, Amer. Math. Soc., Providence, 1993.
[130] P.D. Loewen, A mean value theorem for Fréchet subgradients, Nonlinear Anal. TMA 23 (1994) 1365–1381.
[131] D.T. Luc, Characterizations of quasiconvex functions, Bull. Austral. Math. Soc. 48 (1993) 393–405.
[132] D.T. Luc, On the maximal monotonicity of subdifferentials, Acta Math. Vietnam. 18 (1993) 99–106.
[133] D.T. Luc, A strong mean value theorem and applications, Nonlinear Anal. TMA 26 (1996) 915–923.
[134] O.L. Mangasarian, S. Fromovitz, The Fritz John necessary optimality conditions in the presence of equality and inequality constraints, J. Math. Anal. Appl. 17 (1967) 37–47.

[135] P. Michel, J.P. Penot, Calcul sous-différentiel pour des fonctions Lipschitziennes et non-Lipschitziennes, C. R. Acad. Sci. Paris Sér. I Math. 298 (1985) 269–272.
[136] B.S. Mordukhovich, Maximum principle in problems of time optimal control with nonsmooth constraints, J. Appl. Math. Mech. 40 (1976) 960–969.
[137] B.S. Mordukhovich, Metric approximations and necessary optimality conditions for general classes of nonsmooth extremal problems, Soviet Math. Dokl. 22 (1980) 526–530.
[138] B.S. Mordukhovich, Nonsmooth analysis with nonconvex generalized differentials and adjoint mappings, Dokl. Akad. Nauk BSSR 28 (1984) 976–979.
[139] B.S. Mordukhovich, Approximation Methods in Problems of Optimization and Control, Nauka, Moscow, 1988 (Russian; English transl., Wiley-Interscience).
[140] B.S. Mordukhovich, Complete characterization of openness, metric regularity, and Lipschitzian properties of multifunctions, Trans. Amer. Math. Soc. 340 (1993) 1–35.
[141] B.S. Mordukhovich, Generalized differential calculus for nonsmooth and set-valued mappings, J. Math. Anal. Appl. 183 (1994) 250–288.
[142] B.S. Mordukhovich, Stability theory for parametric generalized equations and variational inequalities via nonsmooth analysis, Trans. Amer. Math. Soc. 343 (1994) 609–658.
[143] B.S. Mordukhovich, Sensitivity analysis for constraint and variational systems by means of set-valued differentiation, Optimization 31 (1994) 13–46.
[144] B.S. Mordukhovich, On variational analysis in infinite dimensions, Research Report #33 (1997), Dept. of Math., Wayne State Univ.; Proc. 18th IFIP Conference on System Modelling and Optimization, Detroit, July 1997.
[145] B.S. Mordukhovich, Coderivatives of set-valued mappings: calculus and applications, Nonlinear Anal. TMA 30 (1997).
[146] B.S. Mordukhovich, Y. Shao, Differential characterizations of covering, metric regularity, and Lipschitzian properties of multifunctions between Banach spaces, Nonlinear Anal. TMA 24 (1995) 1401–1424.
[147] B.S. Mordukhovich, Y. Shao, Nonsmooth sequential analysis in Asplund spaces, Trans. Amer. Math. Soc. 348 (1996) 1235–1280.
[148] B.S. Mordukhovich, Y. Shao, Nonconvex differential calculus for infinite-dimensional multifunctions, Set-Valued Anal. 4 (1996) 205–236.
[149] B.S. Mordukhovich, Y. Shao, Extremal characterizations of Asplund spaces, Proc. Amer. Math. Soc. 124 (1996) 197–205.
[150] B.S. Mordukhovich, Y. Shao, Stability of set-valued mappings in infinite dimensions: point criteria and applications, SIAM J. Control Optim. 35 (1997) 285–314.
[151] B.S. Mordukhovich, Y. Shao, Fuzzy calculus for coderivatives of multifunctions, Nonlinear Anal. TMA 29 (1997) 605–626.
[152] B.S. Mordukhovich, Y. Shao, Q.J. Zhu, Viscosity coderivatives and their limiting behaviors in smooth spaces, preprint.
[153] J.P. Penot, Calcul sous-différentiel et optimisation, J. Funct. Anal. 27 (1978) 248–276.
[154] J.P. Penot, Compactness property, openness criteria and coderivatives, preprint, 1995.
[155] R.R. Phelps, Convex Functions, Monotone Operators and Differentiability, 1988; 2nd ed., Lecture Notes in Mathematics, vol. 1364, Springer, New York, 1993.
[156] E. Polak, Y. Wardi, A nondifferentiable optimization algorithm for the design of control systems having singular value inequalities, Automatica J. IFAC 18 (1982) 267–283.
[157] R.A. Poliquin, Subgradient monotonicity and convex functions, Nonlinear Anal. TMA 15 (1990) 305–317.
[158] R.A. Poliquin, Integration of subdifferentials of nonconvex functions, Nonlinear Anal. TMA 17 (1991) 385–398.
[159] D. Preiss, Fréchet derivatives of Lipschitzian functions, J. Funct. Anal. 91 (1990) 312–345.
[160] B.N. Pshenichnyi, Necessary Conditions for an Extremum, Marcel Dekker, New York, 1971.
[161] L. Qi, The maximal normal operator space and integration of subdifferentials of nonconvex functions, Nonlinear Anal. TMA 13 (1989) 1003–1011.
[162] M.L. Radulescu, F.H. Clarke, The multidirectional mean value theorem in Banach spaces, Canad. Math. Bull. 40 (1997) 88–102.

[163] S.M. Robinson, Stability theory for systems of inequalities, Part II: Differentiable nonlinear systems, SIAM J. Numer. Anal. 13 (1976) 497–513.
[164] S.M. Robinson, Regularity and stability of convex multivalued functions, Math. Oper. Res. 1 (1976) 130–145.
[165] R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.
[166] R.T. Rockafellar, On the maximal monotonicity of subdifferential mappings, Pacific J. Math. 33 (1970) 209–216.
[167] R.T. Rockafellar, Clarke's tangent cones and boundaries of closed sets in R^n, Nonlinear Anal. TMA 3 (1979) 145–154.
[168] R.T. Rockafellar, Proximal subgradients, marginal values and augmented Lagrangians in nonconvex optimization, Math. Oper. Res. 6 (1981) 424–436.
[169] R.T. Rockafellar, Lagrange multipliers and subderivatives of optimal value functions in nonlinear programming, Math. Prog. Study 17 (1982) 28–66.
[170] R.T. Rockafellar, Favorable classes of Lipschitz-continuous functions in subgradient optimization, in: E. Nurminski (Ed.), Progress in Nondifferentiable Optimization, IIASA Collaborative Proceedings Series, International Institute of Applied Systems Analysis, Laxenburg, Austria, 1982, pp. 125–144.
[171] R.T. Rockafellar, Extensions of subgradients and its applications to optimization, Nonlinear Anal. TMA 9 (1985) 665–698.
[172] R.T. Rockafellar, R.J.-B. Wets, Variational Analysis, Springer, New York, 1997.
[173] W. Rudin, Functional Analysis, McGraw-Hill, New York, 1973.
[174] S. Simons, The least slope of a convex function and the maximal monotonicity of its subdifferential, J. Optim. Theory Appl. 71 (1991) 127–136.
[175] R.J. Stern, J.J. Ye, Variational analysis of an extended eigenvalue problem, Linear Algebra Appl. 220 (1995) 391–417.
[176] A.I. Subbotin, A generalization of the basic equation of the theory of differential games, Soviet Math. Dokl. 22 (1980) 358–362.
[177] A.I. Subbotin, Continuous and discontinuous solutions of boundary value problems for first-order partial differential equations, Dokl. Akad. Nauk SSSR 323 (1992).
[178] A.I. Subbotin, Viable characteristics of Hamilton–Jacobi equations, preprint.
[179] A.I. Subbotin, Generalized Solutions of First-Order Partial Differential Equations, Birkhäuser, New York, 1994.
[180] H.J. Sussmann, A strong version of the maximum principle under weak hypotheses, Proceedings of the 33rd IEEE Conf. on Decision and Control, Lake Buena Vista, FL, December 1994.
[181] H.J. Sussmann, A strong version of the Lojasiewicz maximum principle, in: N.H. Pavel (Ed.), Optimal Control of Differential Equations, M. Dekker, New York, 1994.
[182] H.J. Sussmann, A strong maximum principle for systems of differential inclusions, Proceedings of the 35th IEEE Conf. on Decision and Control, Kobe, Japan, December 1996.
[183] H.J. Sussmann, Multidifferential calculus: chain rule, open mapping and transversal intersection theorems, in: W. Hager (Ed.), Proc. IFIP Conf. on Optimal Control: Theory, Algorithms, and Applications, University of Florida, Gainesville, 1997.
[184] L. Thibault, Subdifferentials of compactly Lipschitzian vector valued functions, Annali Math. Pura Appl. 125 (1980) 157–192.
[185] L. Thibault, A note on the Zagrodny mean value theorem, Optimization 35 (1995) 127–130.
[186] L. Thibault, D. Zagrodny, Integration of subdifferentials of lower semicontinuous functions on Banach spaces, J. Math. Anal. Appl. 189 (1995) 33–58.
[187] J.S. Treiman, Characterization of Clarke's tangent and normal cones in finite and infinite dimensions, Nonlinear Anal. TMA 7 (1983) 771–783.
[188] J.S. Treiman, Generalized gradients, Lipschitz behavior and directional derivatives, Canad. J. Math. 37 (1985) 1074–1084.
[189] J.S. Treiman, Clarke's gradients and epsilon-subgradients in Banach spaces, Trans. Amer. Math. Soc. 294 (1986) 65–78.
[190] J.S. Treiman, Finite dimensional optimality conditions: B-gradients, J. Optim. Theory Appl. 62 (1989) 139–150.

[191] J.S. Treiman, Optimal control with small generalized gradients, SIAM J. Control Optim. 28 (1990) 720–732.
[192] J.S. Treiman, Lagrange multipliers for nonconvex generalized gradients with equality, inequality and set constraints, preprint.
[193] H.D. Tuan, On controllability and extremality in nonconvex differential inclusions, J. Optim. Theory Appl. 85 (1995) 435–472.
[194] C. Ursescu, Multifunctions with closed convex graphs, Czech. Math. J. 25 (1975) 438–441.
[195] J. Vanderwerff, Q.J. Zhu, A limiting example for the local "fuzzy" sum rule in nonsmooth analysis, CECM Research Report 96-083, 1996, Proc. Amer. Math. Soc. 126 (1998) 2691–2697.
[196] D.E. Ward, Differential stability in non-Lipschitzian optimization, J. Optim. Theory Appl. 73 (1992) 101–120.
[197] D.E. Ward, Epiderivatives of the marginal function in nonsmooth parametric optimization, Optimization 31 (1994) 47–61.
[198] D.E. Ward, Dini derivatives of the marginal functions of a non-Lipschitzian program, SIAM J. Optim. 33 (1996) 198–211.
[199] D.E. Ward, J.M. Borwein, Nonsmooth calculus in finite dimensions, SIAM J. Control Optim. 25 (1987) 1312–1340.
[200] J. Warga, Derivate containers, inverse functions, and controllability, in: D.L. Russell (Ed.), Calculus of Variations and Control Theory, Academic Press, New York, 1976.
[201] J. Warga, An implicit function theorem without differentiability, Proc. Amer. Math. Soc. 69 (1978) 65–69.
[202] J. Warga, Fat homeomorphisms and unbounded derivate containers, J. Math. Anal. Appl. 81 (1981) 545–560.
[203] J. Warga, Controllability, extremality and abnormality in nonsmooth optimal control, J. Optim. Theory Appl. 41 (1983) 239–260.
[204] J. Warga, Optimization and controllability without differentiability assumptions, SIAM J. Control Optim. 21 (1983) 837–855.
[205] J. Warga, Homeomorphisms and local C^1 approximations, Nonlinear Anal. TMA 12 (1988) 593–597.
[206] J. Warga, An extension of the Kaskosz maximum principle, Appl. Math. Optim. 22 (1990) 61–74.
[207] J. Warga, Q.J. Zhu, A proper relaxation of shifted and delayed controls, J. Math. Anal. Appl. 168 (1992) 546–561.
[208] P.R. Wolenski, Z. Yu, Proximal analysis and the minimal time function, preprint, 1996.
[209] Z. Wu, Subdifferentials and their applications, Master's Thesis, Advisor: J.J. Ye, University of Victoria, 1997.
[210] J.J. Ye, Perturbed infinite horizon optimal control problems, J. Math. Anal. Appl. 182 (1994) 90–112.
[211] J.J. Ye, Q.J. Zhu, Hamilton–Jacobi theory for a generalized optimal stopping time problem, Research Report DMS-709-IR, University of Victoria, July 1995, Nonlinear Anal. TMA, to appear.
[212] J.J. Ye, Q.J. Zhu, Perturbed differential inclusion problem with nonadditive L1 perturbations and applications, Research Report DMS-668-IR, University of Victoria, 1994, J. Optim. Theory Appl. 92 (1997) 189–208.
[213] D. Zagrodny, Approximate mean value theorem for upper subderivatives, Nonlinear Anal. TMA 12 (1988) 1413–1428.
[214] Q.J. Zhu, Calculus rules for subderivatives in smooth Banach spaces, preprint, 1995.
[215] Q.J. Zhu, Subderivatives and their applications, Proc. Int. Conf. on Dynamical Systems and Differential Equations, Springfield, MO, June 1996.
[216] Q.J. Zhu, Necessary optimality conditions for nonconvex differential inclusion with endpoint constraints, J. Differential Equations 124 (1996) 186–204.
[217] Q.J. Zhu, Clarke–Ledyaev mean value inequality in smooth Banach spaces, CECM Research Report 96-78, 1996, Nonlinear Anal. TMA 32 (1998) 315–324.
[218] Q.J. Zhu, The equivalence of several basic theorems for subdifferentials, CECM Research Report 97-093, Set-Valued Anal. (1998), to appear.