Neural Networks 50 (2014) 79–89
A one-layer recurrent neural network for constrained nonsmooth invex optimization✩

Guocheng Li a, Zheng Yan b, Jun Wang b,∗

a Department of Mathematics, Beijing Information Science and Technology University, Beijing, China
b Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
Article history: Received 4 August 2013; Received in revised form 9 November 2013; Accepted 10 November 2013.

Keywords: Recurrent neural network; Invex optimization; Exact penalty function; Finite time convergence
Abstract: Invexity is an important notion in nonconvex optimization. In this paper, a one-layer recurrent neural network is proposed for solving constrained nonsmooth invex optimization problems, designed based on an exact penalty function method. It is proved herein that any state of the proposed neural network is globally convergent to the optimal solution set of constrained invex optimization problems, with a sufficiently large penalty parameter. In addition, any neural state is globally convergent to the unique optimal solution, provided that the objective function and constraint functions are pseudoconvex. Moreover, any neural state is globally convergent to the feasible region in finite time and stays there thereafter. The lower bounds of the penalty parameter and convergence time are also estimated. Two numerical examples are provided to illustrate the performance of the proposed neural network.
1. Introduction

Numerous problems in engineering applications, such as optimal control and adaptive signal processing, can be formulated as dynamic optimization problems, and many of them involve nonconvex and multiple objective functions. In robotics, for example, the real-time motion planning and control of kinematically redundant robot manipulators can be formulated as constrained dynamic optimization problems with multiple and nonconvex objective functions for simultaneously minimizing kinetic energy and maximizing manipulability. Similarly, in nonlinear and robust model predictive control, optimal control commands have to be computed within a moving time window by repetitively solving constrained optimization problems with multiple and nonconvex objective functions for error and control-variation minimization and robustness maximization. The difficulty of dynamic optimization is significantly amplified when the optimal solutions have to be obtained in real time, especially in the presence of uncertainty. In such applications, neurodynamic optimization approaches based on recurrent neural networks are usually more competent than conventional optimization methods
✩ The work described in the paper was supported by the Research Grants Council of the Hong Kong Special Administrative Region, China, under Grant CUHK416812E.
∗ Corresponding author. Tel.: +852 39438472.
E-mail addresses: [email protected] (G. Li), [email protected] (Z. Yan), [email protected], [email protected] (J. Wang).
ISSN 0893-6080. © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.neunet.2013.11.007
because of their inherent nature of parallel and distributed information processing and their salient features of biological plausibility and hardware parallelizability. Since the pioneering works of Hopfield and Tank on a recurrent neural network approach to constrained optimization (Hopfield & Tank, 1985; Tank & Hopfield, 1986), numerous researchers have investigated alternative neural networks for solving various linear and nonlinear programming problems. Neurodynamic optimization approaches have received substantial attention in the past three decades, and many neurodynamic optimization models have been developed. For example, Kennedy and Chua (1988) presented a recurrent neural network for nonlinear optimization by utilizing the finite penalty parameter method to approximate optimal solutions. Wang (1993, 1994) presented a deterministic annealing neural network for solving linear and nonlinear convex programming problems. Xia, Leung, and Wang (2002) proposed a gradient projection neural network for solving convex optimization problems, which can globally converge to the corresponding exact optimal solutions. Zhang and Constantinides (1992) presented a two-layer Lagrangian neural network to deal with optimization problems with equality constraints. Wang (1997) presented a primal–dual neural network for zero–one integer linear programming. Xia and Wang (2001) presented a dual neural network for kinematic optimal control of redundant robot manipulators. Gao (2004) presented a neural network for nonlinear convex programming. The aforementioned neural network models require the gradient information of the objective functions and constraints for solving optimization problems. Therefore, these models cannot solve nonsmooth optimization problems. However, nonsmooth
objective functions arise widely in many engineering applications, such as L1-norm optimization in manipulator control and signal processing. Hence, further study on recurrent neural networks for nonsmooth optimization problems is necessary and rewarding. In recent years, research on recurrent neural networks for nonsmooth or generalized convex optimization problems has become a very active direction in the neurodynamic optimization field, and many neural network models for nonsmooth or nonconvex optimization problems have been developed. Forti, Nistri, and Quincampoix (2004) first proposed a generalized neural network for solving nonlinear nonsmooth convex programming problems using Clarke's generalized gradient, which constituted a generalization of the network introduced by Kennedy and Chua (1988). In Forti, Nistri, and Quincampoix (2006), the neural network model proposed in Forti et al. (2004) was employed for solving nonsmooth linear or quadratic programming problems, and the convergence of the neural network was analyzed by using the Łojasiewicz inequality. Xue and Bian (2008) proposed a neural network for solving nonsmooth convex optimization problems with more general constraints. Bian and Xue (2009) proposed a neural network for nonsmooth nonconvex optimization, but the model was only quasi-convergent. In particular, finite-time convergence was discussed in Bian and Xue (2009), Forti et al. (2004, 2006) and Xue and Bian (2008). Hu and Wang (2006) proposed a projection neural network for solving pseudomonotone variational inequalities and pseudoconvex optimization problems. Li, Song, and Wu (2010) studied a generalized gradient projection neural network for solving nonsmooth optimization problems with closed convex set constraints. Guo, Liu, and Wang (2011) proposed a recurrent neural network for solving differentiable pseudoconvex optimization problems with linear equality constraints.
Using the Lagrangian function method and the saddle point theorem, Cheng et al. (2011) proposed a recurrent neural network for solving nonsmooth convex optimization problems with convex inequality, linear equality, and bounded closed convex set constraints by generalizing the model in Gao (2004). Inspired by these works, Liu and Wang (2011a) proposed a finite-time convergent recurrent neural network for constrained optimization problems with piecewise-linear objective functions. Liu and Wang (2011b) proposed a one-layer recurrent neural network for constrained nonsmooth convex optimization problems with equality and inequality constraints. Hosseini, Wang, and Hosseini (2013) proposed a penalty-based recurrent neural network for solving a class of constrained optimization problems with generalized convex objective functions. Liu and Wang (2013) proposed a one-layer projection neural network without any design parameter for solving nonsmooth optimization problems with generalized convex objective functions. Despite the success of neurodynamic optimization in the past three decades, with various globally convergent recurrent neural networks developed, almost all existing results are concerned with convex optimization problems, and effective neurodynamic approaches to constrained nonconvex optimization are rarely available. While constrained nonconvex optimization with multimodal objective functions remains largely unconquered in general, some special classes of nonconvex optimization problems (e.g., those with invex functions) have the desirable global property that any local optimal solution is also a global one. In this paper, a recurrent neural network based on an exact penalty function method is proposed for solving invex optimization problems subject to inequality constraints, where the objective and constraint functions can be nonsmooth invex functions. First, a sufficient condition is derived for the exact penalty property of the penalty function.
It is ensured that the neural network converges to the feasible region in finite time. Next, dynamical behaviors and optimization capabilities for the proposed neural network
are rigorously analyzed. It is proved that for a sufficiently large penalty parameter, any state of the neural network globally reaches the feasible region in finite time and stays there thereafter, and an estimation method for the lower bound of the penalty parameter is given. An invex function shares with a convex function the property that every stationary point is a global minimum point. Roughly speaking, convex, pseudoconvex, and quasiconvex functions are special cases of invex functions. There are several distinct differences between the proposed model and most of the existing ones such as Forti et al. (2004). First, Forti et al. (2004) requires the feasible region to be convex in order to obtain the global optimum, whereas the proposed model is capable of dealing with a nonconvex feasible region and therefore has wider applicability. Second, the result on the finite-time convergence property under nonconvex constraints is new. In addition, a novel relation between the model equilibria and almost cluster points is derived.

The remainder of this paper is organized as follows. Section 2 introduces some definitions and preliminary results. Section 3 discusses the exact penalty function method. Section 4 presents the neural network model and analyzes its convergence properties. Section 5 discusses the penalty parameter estimation. Section 6 provides simulation results on two constrained invex and nonsmooth optimization problems. Finally, Section 7 concludes this paper.

2. Preliminaries

In this section, we present definitions and properties concerning set-valued analysis, nonsmooth analysis, and invex functions which are needed in the remainder of the paper. We refer readers to Aubin and Cellina (1984), Clarke (1969), Filippov (1988), Pardalos (2008) and Penot and Quan (1997) for more detailed discussions.

Let R^n be the real n-dimensional Euclidean space with the scalar product ⟨x, y⟩ = Σ_{i=1}^n x_i y_i, x, y ∈ R^n, and the induced norm ∥x∥ = [Σ_{i=1}^n x_i^2]^{1/2}. For x ∈ R^n and A ⊂ R^n, dist(x, A) = inf_{y∈A} ∥x − y∥ is the distance of x from A.

Definition 1. F : R^n ↩→ R^n is called a set-valued map if to each point x ∈ R^n there corresponds a nonempty closed set F(x) ⊂ R^n.

Definition 2. Let F be a set-valued map. F is said to be upper semicontinuous (u.s.c.) at x_0 ∈ R^n if ∀ε > 0, ∃δ > 0 such that ∀x ∈ x_0 + δB(0, 1), F(x) ⊂ F(x_0) + εB(0, 1), where B(0, 1) is the open unit ball of R^n. F is u.s.c. if it is so at every x_0 ∈ R^n.

A solution x(t) of a differential inclusion is an absolutely continuous function whose derivative ẋ(t) is defined only almost everywhere, so that its limit as t → ∞ is not well defined. The concepts of limit and cluster point therefore have to be extended to measurable functions. Let µ(A) denote the Lebesgue measure of a measurable subset A ⊂ R.

Definition 3. Let x : [0, ∞) → R^n be a measurable function. x* ∈ R^n is the almost limit of x(·) as t → ∞ if ∀ε > 0, ∃T > 0 such that

µ{t : ∥x(t) − x*∥ > ε, t ∈ [T, ∞)} < ε.

It is written as x* = µ−lim_{t→∞} x(t). x* is an almost cluster point of x(·) as t → ∞ if ∀ε > 0,

µ{t : ∥x(t) − x*∥ ≤ ε, t ∈ [0, ∞)} = ∞.

The following propositions show that the usual concepts of limit and cluster point are particular cases of almost limit and almost cluster point (Aubin & Cellina, 1984).
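To make the almost-limit notion concrete, consider a toy spike function (our own illustration, not from the paper): x(t) = 1 on the intervals [n, n + 2^{−n}) for n = 0, 1, 2, . . ., and x(t) = 0 elsewhere. The ordinary limit as t → ∞ does not exist, since x(t) = 1 at arbitrarily large t; but the spikes at or beyond T have total measure about 2^{1−T}, so for every ε > 0 the "bad" set in Definition 3 can be made smaller than ε, and µ−lim_{t→∞} x(t) = 0. A short computation confirms the measure bound:

```python
import math

# Hypothetical spike function: x(t) = 1 on [n, n + 2**-n) for n = 0, 1, 2, ...
# and x(t) = 0 elsewhere. Its ordinary limit as t -> infinity does not exist,
# but its almost limit is 0.

def spike_measure_beyond(T, n_max=60):
    """Lebesgue measure of {t in [T, inf) : |x(t) - 0| > eps} for any eps < 1,
    i.e. the total length of the spikes at or beyond T (sum truncated at n_max)."""
    return sum(2.0 ** -n for n in range(math.ceil(T), n_max))

eps = 1e-3
T = 0
while spike_measure_beyond(T) >= eps:  # find T making the 'bad' set smaller than eps
    T += 1
# Such a T exists for every eps > 0, so mu-lim x(t) = 0 even though
# x(t) = 1 at t = n for arbitrarily large n.
assert spike_measure_beyond(T) < eps
```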
Proposition 1. The limit x* of x : [0, ∞) → R^n is an almost limit. If x(·) is uniformly continuous, any cluster point x* of x(·) is an almost cluster point.

Proposition 2. An almost limit x* of a measurable function x : [0, ∞) → R^n is its unique almost cluster point. If x(·) has a unique almost cluster point x* and {x(t) : t ∈ [0, ∞)} is a bounded subset of R^n, then µ−lim_{t→∞} x(t) = x*.

Proposition 3. Let K be a compact subset of R^n and x : [0, ∞) → K be a measurable function; then there exists an almost cluster point x* ∈ K of x(·) as t → ∞.

Definition 4. A function f : R^n → R is said to be Lipschitz near x ∈ R^n if there exist positive numbers k and ε such that |f(x_2) − f(x_1)| ≤ k∥x_2 − x_1∥ for all x_1, x_2 ∈ x + εB(0, 1). If f is Lipschitz near every point of its domain, it is said to be locally Lipschitz.

Suppose that f is Lipschitz near x ∈ R^n. The generalized directional derivative of f at x in the direction v ∈ R^n is given by

f^0(x; v) = lim sup_{y→x, t→0+} [f(y + tv) − f(y)]/t.

The quantity f^0(x; v) is well defined and finite. Furthermore, Clarke's generalized gradient of f at x is defined as

∂f(x) = {ξ ∈ R^n : f^0(x; v) ≥ ⟨v, ξ⟩, for all v ∈ R^n}.

By accounting for the properties of f^0, some properties of ∂f(x) can be obtained as follows (Clarke, 1969).

Proposition 4. Let f : R^n → R be Lipschitz near x with constant k > 0; then:
(a) ∂f(x) is a nonempty, convex, compact subset of R^n, and |∂f(x)| = sup{∥ξ∥ : ξ ∈ ∂f(x)} ≤ k;
(b) x ↩→ ∂f(x) is u.s.c.;
(c) for every v ∈ R^n, f^0(x; v) = max{⟨v, ξ⟩ : ξ ∈ ∂f(x)}.
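As a small sanity check (our own sketch, not from the paper), the generalized directional derivative of f(x) = |x| at 0 can be approximated directly from its lim sup definition; here f^0(0; v) = |v|, so by Proposition 4(c) the generalized gradient is ∂|·|(0) = [−1, 1]:

```python
# Approximate the generalized directional derivative
#   f^0(x; v) = limsup_{y -> x, t -> 0+} [f(y + t*v) - f(y)] / t
# for f = |.| at x = 0 by sampling small t and base points y near x.
f = abs

def clarke_dd(x, v, n=200):
    """Crude numerical approximation of f^0(x; v)."""
    quotients = []
    for i in range(1, n + 1):
        t = 1.0 / (10.0 * i)
        for y in (x - t, x, x + t):  # sample base points near x
            quotients.append((f(y + t * v) - f(y)) / t)
    return max(quotients)  # the lim sup is approximated by the max

# f^0(0; v) = |v|, so every xi in [-1, 1] satisfies f^0(0; v) >= v*xi,
# i.e. the generalized gradient of |.| at 0 is the interval [-1, 1].
assert abs(clarke_dd(0.0, 1.0) - 1.0) < 1e-9
assert abs(clarke_dd(0.0, -1.0) - 1.0) < 1e-9
```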
When f is differentiable at x, ∂f(x) reduces to the gradient ∇f(x). When f is convex, ∂f(x) coincides with the classical subdifferential of convex analysis, that is,

ξ ∈ ∂f(x) ⇔ ∀y ∈ R^n, f(y) ≥ f(x) + ⟨ξ, y − x⟩.

The concept of a regular function is defined as follows (Clarke, 1969).

Definition 5. A function f : R^n → R which is Lipschitz near x is said to be regular at x provided the following conditions hold:
(a) for all v ∈ R^n, the usual one-sided directional derivative f′(x; v) = lim_{t→0+} [f(x + tv) − f(x)]/t exists;
(b) for every v ∈ R^n, f′(x; v) = f^0(x; v).

Regular functions form a rather wide class, and several subclasses of them are presented in Clarke (1969) as follows.

Proposition 5. Let f : R^n → R be Lipschitz near x:
(a) if f is strictly differentiable at x, then f is regular at x;
(b) if f is a convex function, then f is regular at x.

Proposition 6. If f : R^n → R is regular at x(t) and x : R → R^n is differentiable at t and Lipschitz near t, then

(d/dt) f(x(t)) = ⟨ξ, ẋ(t)⟩, ∀ξ ∈ ∂f(x(t)).

Proposition 7. Let f_i : R^n → R be regular near x (i = 1, 2, . . . , m); then

∂(Σ_{i=1}^m f_i)(x) = Σ_{i=1}^m ∂f_i(x).

Definition 6. Let f : R^n → R be locally Lipschitz; then f is invex if there exists a function η : R^n × R^n → R^n such that

f(y) − f(x) ≥ ⟨ξ, η(y, x)⟩, ∀x, y ∈ R^n and ∀ξ ∈ ∂f(x).

The notion of invexity was first introduced to optimization theory by Hanson (1981) as a very broad generalization of convexity, and the term was coined soon after by Craven as a contraction of "invariant convexity" (Craven, 1981). Craven also showed that many multivariate composite functions are invex: if g : R^n → R is differentiable and convex and Ψ : R^r → R^n (r ≥ n) is differentiable and bijective with ∇Ψ of rank n, then f = g(Ψ) is invex in R^r (Craven, 1981). For unconstrained optimization, Ben-Israel and Mond (1986) proved that a function is invex if and only if every stationary point is a global minimum. As such, pseudoconvex functions are invex, but not vice versa. For constrained optimization, Hanson (1981) showed that, with an invex objective function and constraints with respect to the same kernel, the Karush–Kuhn–Tucker (KKT) conditions are sufficient for a global minimum. Invex optimization arises broadly, with widespread applications in wireless communication, inventory management, portfolio selection, decision making, and so on (Frenk & Schaible, 2005).

Proposition 8 (Pardalos, 2008). Let f : R^n → R be a regular invex function; then 0 ∈ ∂f(x) if and only if x is a global minimizer of f.

Proposition 9 (Pardalos, 2008). Let f_1, f_2, . . . , f_m : R^n → R be invex with respect to the same function η(y, x); then a linear combination of f_1, f_2, . . . , f_m with nonnegative coefficients is invex with respect to the same η.

A function f : R^n → R satisfies the growth condition if

lim_{∥x∥→+∞} f(x) = +∞.

Proposition 10. A function f : R^n → R satisfies the growth condition if and only if, ∀α ∈ R, its level set L(α) = {x ∈ R^n : f(x) ≤ α} is bounded.

Lemma 1 (Pardalos, 2008). Let f : R^n → R be a regular invex function with respect to η, and let g : R → R, g(x) = max{x, 0}; then the composite function g ◦ f is invex with respect to η.

Definition 7. Let f : R^n → R be locally Lipschitz. f is said to be pseudoconvex if ∀x, y ∈ R^n,

f(y) < f(x) ⇒ ⟨η, y − x⟩ < 0, ∀η ∈ ∂f(x).

Definition 7 is equivalent to the following definition.

Definition 8. Let f : R^n → R be locally Lipschitz. f is said to be pseudoconvex if ∀x, y ∈ R^n, x ≠ y,

∃η ∈ ∂f(x) : ⟨η, y − x⟩ ≥ 0 ⇒ f(y) ≥ f(x).

Definition 9. A set-valued map F : R^n ↩→ R^n is said to be pseudomonotone if ∀x, y ∈ R^n, x ≠ y,

∃η(x) ∈ F(x) : ⟨η(x), y − x⟩ ≥ 0 ⇒ ∀η(y) ∈ F(y) : ⟨η(y), y − x⟩ ≥ 0.

Proposition 11 (Penot & Quan, 1997). Let f : R^n → R be locally Lipschitz; then f is pseudoconvex if and only if ∂f(x) is pseudomonotone.
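As a numerical illustration of the Ben-Israel and Mond characterization (our own toy example, not from the paper): f(x) = (x + 0.5 sin x)² is invex by Craven's composite construction, with g(u) = u² convex and Ψ(x) = x + 0.5 sin x a differentiable bijection with Ψ′(x) = 1 + 0.5 cos x > 0, yet f is not convex; its only stationary point x = 0 is the global minimum:

```python
import math

# f = g(Psi) with g(u) = u**2 convex and Psi(x) = x + 0.5*sin(x),
# Psi'(x) = 1 + 0.5*cos(x) > 0, so f is invex by Craven's composition
# result, although f is not convex.
def f(x):
    return (x + 0.5 * math.sin(x)) ** 2

def df(x):  # f'(x) = 2 * Psi(x) * Psi'(x)
    return 2.0 * (x + 0.5 * math.sin(x)) * (1.0 + 0.5 * math.cos(x))

xs = [i * 1e-3 for i in range(-10000, 10001)]  # grid on [-10, 10]

# Every (near-)stationary grid point lies near x = 0, the global minimum.
stationary = [x for x in xs if abs(df(x)) < 1e-3]
assert stationary and all(abs(x) < 1e-2 for x in stationary)
assert min(xs, key=f) == 0.0  # the grid minimizer is x = 0, where f(0) = 0

# f is not convex: the second difference is negative near x = pi/2.
h = 1e-3
x0 = math.pi / 2
assert f(x0 + h) + f(x0 - h) - 2 * f(x0) < 0
```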
3. Exact penalty function

Consider the following constrained optimization problem:

minimize f(x)
subject to g_i(x) ≤ 0 (i = 1, 2, . . . , m),    (1)

where x = (x_1, x_2, . . . , x_n)^T ∈ R^n is the decision variable, f and g_i : R^n → R (i = 1, 2, . . . , m) are regular invex functions with respect to the same η, and f satisfies the growth condition. Assume that

S = {x ∈ R^n : g_i(x) ≤ 0, i = 1, 2, . . . , m} ≠ ∅.

Since f is a continuous function and satisfies the growth condition, problem (1) has an optimal solution. For problem (1), a penalty function is commonly defined as follows:

ϕ_λγ(x) = f(x) + λ Σ_{i=1}^m γ_i max{g_i(x), 0},    (2)

where λ > 0 is a penalty parameter and γ = (γ_1, γ_2, . . . , γ_m) ∈ (0, +∞)^m is a constant vector. By Proposition 9 and Lemma 1, the function ϕ_λγ is a regular invex function which satisfies the growth condition, and the problem

min_{x∈R^n} ϕ_λγ(x)

has an optimal solution. A penalty function is said to have the exact penalty property if there exists a penalty parameter for which a solution to the unconstrained penalized problem is a solution to the corresponding constrained problem. For each x ∈ R^n, set

I^+(x) = {i ∈ {1, 2, . . . , m} : g_i(x) > 0},
I^0(x) = {i ∈ {1, 2, . . . , m} : g_i(x) = 0},
I^−(x) = {i ∈ {1, 2, . . . , m} : g_i(x) < 0}.

The following theorem shows that the penalty function (2) has the exact penalty property under appropriate conditions. The following assumption on the constraint functions in problem (1) is needed.

Assumption 1. For every 1 ≤ q ≤ m and each strictly increasing sequence 1 ≤ i_1 < i_2 < · · · < i_q ≤ m, if

g_{i_1}(z) = g_{i_2}(z) = · · · = g_{i_q}(z) = 0,
η_{i_1} ∈ ∂g_{i_1}(z), η_{i_2} ∈ ∂g_{i_2}(z), . . . , η_{i_q} ∈ ∂g_{i_q}(z),

then η_{i_1}, η_{i_2}, . . . , η_{i_q} are linearly independent.

Theorem 1. Let Assumption 1 hold. There exists λ_0 > 0 such that for all λ > λ_0, z = argmin_{x∈R^n} ϕ_λγ(x) is an optimal solution to (1).

Proof. Let us assume the converse. Then there exist a sequence {λ_k}_{k=1}^∞ ⊂ (0, +∞) and a sequence {z^(k)}_{k=1}^∞ ⊂ R^n such that for all natural numbers k,

λ_k ≥ k,  ϕ_{λ_k γ}(z^(k)) = inf_{x∈R^n} ϕ_{λ_k γ}(x),  z^(k) ∉ S.    (3)

For each natural number k,

inf_{x∈S} f(x) = inf_{x∈S} ϕ_{λ_k γ}(x) ≥ inf_{x∈R^n} ϕ_{λ_k γ}(x) = ϕ_{λ_k γ}(z^(k)) = f(z^(k)) + λ_k Σ_{i=1}^m γ_i max{g_i(z^(k)), 0}.    (4)

Relation (4) implies that f(z^(k)) ≤ inf_{x∈S} f(x). Since f satisfies the growth condition, it follows from (4) that the sequence {z^(k)}_{k=1}^∞ is bounded. Extracting a subsequence and re-indexing, we may assume without loss of generality that there exists

lim_{k→∞} z^(k) = z̄.    (5)

Since f is locally Lipschitzian, there exists a number L > 0 such that

max_{η∈∂f(z^(k))} ∥η∥ ≤ L.    (6)

It follows from (4) and (5) that for all i ∈ {1, 2, . . . , m},

max{g_i(z̄), 0} = lim_{k→∞} max{g_i(z^(k)), 0} ≤ lim sup_{k→∞} λ_k^{−1} γ_i^{−1} (inf_{x∈S} f(x) − inf_{x∈R^n} f(x)) = 0.

Therefore

g_i(z̄) ≤ 0,  i ∈ {1, 2, . . . , m}.    (7)

Since ϕ_{λ_k γ} is invex and z^(k) is a global minimizer of ϕ_{λ_k γ}, by Proposition 8, for each integer k ≥ 1,

0 ∈ ∂ϕ_{λ_k γ}(z^(k)) = ∂f(z^(k)) + Σ_{i∈I^+(z^(k))} λ_k γ_i ∂g_i(z^(k)) + Σ_{i∈I^0(z^(k))} λ_k γ_i {α ∂g_i(z^(k)) : α ∈ [0, 1]}.    (8)

In view of (6) and (8), there exist η_{ki} ∈ ∂g_i(z^(k)), i = 1, 2, . . . , m, and α_{ki} ∈ [0, 1] such that

Σ_{i∈I^+(z^(k))} γ_i η_{ki} + Σ_{i∈I^0(z^(k))} α_{ki} γ_i η_{ki} ∈ {η ∈ R^n : ∥η∥ ≤ L/λ_k}.    (9)

Since ∂g_i(x) (i = 1, 2, . . . , m) is upper semicontinuous at z̄ ∈ R^n, ∀ε > 0, ∃k_0 such that for k > k_0,

η_{ki} ∈ ∂g_i(z^(k)) ⊂ ∂g_i(z̄) + εB(0, 1).    (10)

Since ∂g_i(z̄) is a compact subset of R^n, the sequence {η_{ki}}_{k=1}^∞ is bounded. Extracting a subsequence and re-indexing, we may assume without loss of generality that there exist

lim_{k→∞} η_{ki} = η_i (i = 1, 2, . . . , m),  lim_{k→∞} α_{ki} = α_i.    (11)

From (10) and (11), we have η_i ∈ ∂g_i(z̄) (i = 1, 2, . . . , m). Extracting a further subsequence, we may also assume that the index sets I^+(z^(k)) and I^0(z^(k)) do not depend on k. Taking k → ∞ in (9), it follows from (3), (7), (9) and (11) that

Σ_{i∈I^+(z^(k))} γ_i η_i + Σ_{i∈I^0(z^(k))} α_i γ_i η_i = 0,    (12)

where, by (7) and continuity, every index involved belongs to I^0(z̄); moreover I^+(z^(k)) ≠ ∅ (because z^(k) ∉ S) and γ_i > 0, so the combination in (12) is nontrivial. Therefore {η_i ∈ ∂g_i(z̄) : i ∈ I^0(z̄)} is linearly dependent, which contradicts Assumption 1. Theorem 1 is proved.

4. Convergence analysis
In view of the exact penalty property of (2), a recurrent neural network is proposed for solving the optimization problem (1):
ẋ(t) ∈ −∂ϕ_λγ(x(t)),  x(0) = x_0.    (13)
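The dynamics (13) can be simulated by an explicit Euler discretization that, at each step, selects one element of −∂ϕ_λγ at the current state. The sketch below is our own toy instance (not one of the paper's two numerical examples): it minimizes the nonsmooth objective f(x) = |x_1 − 2| + |x_2| subject to g(x) = x_1² + x_2² − 1 ≤ 0, whose optimal solution is (1, 0); the step size h, the penalty parameter λ, and γ = 1 are illustrative choices.

```python
# Euler discretization of x' in -subdiff(phi), phi(x) = f(x) + lam*max{g(x), 0},
# with f(x) = |x1 - 2| + |x2| and g(x) = x1^2 + x2^2 - 1 (toy instance).
lam, h, steps = 5.0, 0.005, 40000
sign = lambda u: (u > 0) - (u < 0)  # one subgradient selection of |.| (0 at 0)

def step(x):
    outside = x[0] ** 2 + x[1] ** 2 > 1.0  # penalty term active only if g > 0
    s1 = sign(x[0] - 2.0) + (lam * 2.0 * x[0] if outside else 0.0)
    s2 = sign(x[1]) + (lam * 2.0 * x[1] if outside else 0.0)
    return (x[0] - h * s1, x[1] - h * s2)

x = (2.0, 2.0)  # infeasible initial state
tail = []
for k in range(steps):
    x = step(x)
    if k >= steps - 2000:  # average late iterates to smooth boundary chattering
        tail.append(x)
avg = (sum(p[0] for p in tail) / len(tail), sum(p[1] for p in tail) / len(tail))
# The averaged late state approaches the optimal solution (1, 0).
assert abs(avg[0] - 1.0) < 0.1 and abs(avg[1]) < 0.05
```

As the theory predicts, the trajectory first enters the feasible disk in finite time and then slides along it toward the minimizer; the residual chattering is a discretization artifact of the fixed step size.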
By computing Clarke's generalized gradient of ϕ_λγ(x), (13) can be written in detail as

ẋ(t) ∈ −∂f(x(t)) − λ Σ_{i∈I^+(x(t))} γ_i ∂g_i(x(t)) − λ Σ_{i∈I^0(x(t))} γ_i {α ∂g_i(x(t)) : α ∈ [0, 1]},  x(0) = x_0.    (14)

Definition 10. x̄ is said to be an equilibrium point of system (13) if 0 ∈ −∂ϕ_λγ(x̄).

Definition 11. A solution of (13) on [0, t_1] is an absolutely continuous function satisfying x(0) = x_0 and ẋ(t) ∈ −∂ϕ_λγ(x(t)) for almost all t ∈ [0, t_1].

Since ∂ϕ_λγ(·) is an upper semicontinuous set-valued map with nonempty compact convex values, the existence of a solution to (13) is a consequence of Aubin and Cellina (1984). Denote

E_λγ = {x̄ ∈ R^n : 0 ∈ −∂ϕ_λγ(x̄)},  M = {x̄ ∈ S : f(x̄) = inf_{x∈S} f(x)}.

The following theorem shows that for a sufficiently large penalty parameter, the state of (13) is globally convergent to M.

Theorem 2.
(i) For almost all t ≥ 0,

(d/dt) ϕ_λγ(x(t)) = −∥ẋ(t)∥².

Moreover, x(t) is bounded and ẋ ∈ L²([0, +∞), R^n), where L²([0, +∞), R^n) is the set of all Lebesgue square-integrable functions.
(ii) ẋ(t) almost converges to 0, i.e., µ−lim_{t→+∞} ẋ(t) = 0.
(iii) Any almost cluster point of x(t) is an equilibrium point of (13), and lim_{t→+∞} ϕ_λγ(x(t)) = inf_{x∈R^n} ϕ_λγ(x). In addition, let Assumption 1 hold. If λ is sufficiently large, any almost cluster point of x(t) is also an optimal solution to problem (1), and lim_{t→+∞} ϕ_λγ(x(t)) = inf_{x∈S} f(x).
(iv) lim_{t→+∞} dist(x(t), E_λγ) = 0. In addition, let Assumption 1 hold. If λ is sufficiently large, then lim_{t→+∞} dist(x(t), M) = 0.

Proof. (i) From Proposition 6, for almost all t ≥ 0,

(d/dt) ϕ_λγ(x(t)) = ⟨ξ(t), ẋ(t)⟩,  ∀ξ(t) ∈ ∂ϕ_λγ(x(t)).    (15)

Replacing ξ(t) with −ẋ(t) in (15),

(d/dt) ϕ_λγ(x(t)) = −∥ẋ(t)∥²,    (16)

so ϕ_λγ(x(t)) is a nonincreasing function. Integrating (16) over [0, t], we obtain

ϕ_λγ(x(t)) − ϕ_λγ(x_0) ≤ −∫_0^t ∥ẋ(τ)∥² dτ ≤ 0,

hence ϕ_λγ(x(t)) ≤ ϕ_λγ(x_0). Since ϕ_λγ satisfies the growth condition, it follows from Proposition 10 that x(t) is bounded, so x(t) is defined on [0, +∞). Note that

∫_0^{+∞} ∥ẋ(τ)∥² dτ ≤ ϕ_λγ(x_0) − lim_{t→+∞} ϕ_λγ(x(t)),

so ẋ ∈ L²([0, +∞), R^n).

(ii) If ẋ does not almost converge to 0, then ∃ε_0 > 0 such that for all T > 0,

µ{t : ∥ẋ(t)∥ > ε_0, t ∈ [T, +∞)} ≥ ε_0,

so ∫_T^{+∞} ∥ẋ(τ)∥² dτ ≥ ε_0³ for every T > 0, which contradicts ẋ ∈ L²([0, +∞), R^n).

(iii) Let ẋ(t) = −ξ(t), ξ(t) ∈ ∂ϕ_λγ(x(t)), for t ∈ [0, +∞)\N, where µ(N) = 0. Since x(·) is bounded on [0, +∞)\N, by Proposition 3 there exists an almost cluster point x* of x(t) as t → +∞. Next we show that x* is an equilibrium point of (13). Since ẋ(t) almost converges to 0 as t → +∞, there exists an increasing sequence t_k ↑ +∞ in [0, +∞) such that

lim_{k→∞} x(t_k) = x*,  lim_{k→∞} ẋ(t_k) = 0.    (17)

Since −∂ϕ_λγ(x) is upper semicontinuous at x*, for every closed neighborhood U of the origin of R^n, ∃k_0 such that for k > k_0,

ẋ(t_k) ∈ −∂ϕ_λγ(x(t_k)) ⊂ −∂ϕ_λγ(x*) + U.

Taking k → ∞ in this inclusion, it follows from (17) that

0 ∈ −∂ϕ_λγ(x*) + U.    (18)

Since U is arbitrary, 0 ∈ −∂ϕ_λγ(x*), and x* is an equilibrium point of (13). Note that ϕ_λγ(x(t)) is a monotonically nonincreasing and lower bounded function; then, by continuity of ϕ_λγ and Proposition 8,

lim_{t→+∞} ϕ_λγ(x(t)) = lim_{k→∞} ϕ_λγ(x(t_k)) = ϕ_λγ(x*) = inf_{x∈R^n} ϕ_λγ(x).

Since ϕ_λγ is invex, Theorem 1 shows that there exists λ_0 > 0 such that E_λγ = M if λ > λ_0, and the last assertions in (iii) naturally hold.

(iv) If lim_{t→+∞} dist(x(t), E_λγ) = 0 does not hold, then there exist ε_0 > 0 and {t_k}_{k=1}^∞ ↑ +∞ such that dist(x(t_k), E_λγ) ≥ ε_0. Since {x(t_k)}_{k=1}^∞ is bounded, we may assume without loss of generality that there exists

lim_{k→∞} x(t_k) = x̄.    (19)

Next, we prove that x̄ ∈ E_λγ, i.e., 0 ∈ ∂ϕ_λγ(x̄). If this does not hold, then dist(0, ∂ϕ_λγ(x̄)) = d > 0. Since x ↩→ ∂ϕ_λγ(x) is u.s.c., the function x → dist(0, ∂ϕ_λγ(x)) is lower semicontinuous (l.s.c.). Therefore, ∀ε > 0, ∃δ > 0 such that for all x ∈ x̄ + δB(0, 1),

d = dist(0, ∂ϕ_λγ(x̄)) < dist(0, ∂ϕ_λγ(x)) + ε.

Taking ε = d/2, we have dist(0, ∂ϕ_λγ(x)) > d/2 whenever x ∈ x̄ + δB(0, 1). Since lim_{k→∞} x(t_k) = x̄, there exists a positive integer k_0 such that for all k > k_0, x(t_k) ∈ x̄ + (δ/2)B(0, 1). Note that x ↩→ ∂ϕ_λγ(x) is u.s.c. and x(t) is bounded; then ∂ϕ_λγ(x(t)) is bounded, say

sup_{η∈∂ϕ_λγ(x(t))} ∥η∥ ≤ L,  t ∈ [0, +∞).

Therefore,

∥x(t_2) − x(t_1)∥ ≤ ∫_{t_1}^{t_2} ∥ẋ(t)∥ dt ≤ L|t_2 − t_1|.
When t ∈ [t_k − δ/(4L), t_k + δ/(4L)] and k ≥ k_0,

∥x(t) − x̄∥ ≤ ∥x(t) − x(t_k)∥ + ∥x(t_k) − x̄∥ ≤ L|t − t_k| + δ/2 < δ.

It follows that dist(0, ∂ϕ_λγ(x(t))) > d/2 for all t ∈ [t_k − δ/(4L), t_k + δ/(4L)]. Since

µ(∪_{k>k_0} [t_k − δ/(4L), t_k + δ/(4L)]) = +∞,

then

∫_0^{+∞} dist²(0, ∂ϕ_λγ(x(τ))) dτ = +∞.    (20)

Note that

∫_0^{+∞} dist²(0, ∂ϕ_λγ(x(τ))) dτ ≤ ∫_0^{+∞} ∥ẋ(τ)∥² dτ < +∞,

which contradicts (20). Therefore, x̄ ∈ E_λγ. Since t → dist(x(t), E_λγ) is continuous,

lim_{k→∞} dist(x(t_k), E_λγ) = dist(x̄, E_λγ) = 0,

which is a contradiction. The contradiction proves lim_{t→+∞} dist(x(t), E_λγ) = 0. By the same argument as in the proof of (iii), the last statement in (iv) of Theorem 2 holds.

Next, we show that a state of (13) can reach the feasible region in finite time and stay there thereafter. The following lemma is very useful in finite-time convergence analysis.

Lemma 2 (Chong, Hui, & Żak, 1999). Let x(t) be a state of (13). Suppose there exists V : R^n → R such that V(x(t)) is absolutely continuous on [0, +∞), and there exists ρ > 0 such that for almost all t ∈ [0, +∞) for which x(t) ∈ {x ∈ R^n : V(x) > 0},

(d/dt) V(x(t)) ≤ −ρ.

Then the state x(t) reaches {x ∈ R^n : V(x) ≤ 0} in finite time and stays there thereafter.

Theorem 3. Under Assumption 1, if λ is sufficiently large, then for any x_0 ∈ R^n, a state of (13) with initial point x_0 is guaranteed to converge to the feasible region S in finite time and stay there thereafter.

Proof. Let x(·) be a state of (13) with initial point x_0, and let

V_γ(x) = Σ_{i=1}^m γ_i max{g_i(x), 0},

where γ = (γ_1, γ_2, . . . , γ_m) ∈ (0, +∞)^m is a constant vector. Clearly, V_γ(x) is a regular invex function, and S = {x : V_γ(x) ≤ 0}. Set

Γ = {t : V_γ(x(t)) > 0}.

If Γ is bounded, then Theorem 3 naturally holds. If Γ is unbounded, we shall prove that

inf_{t∈Γ} min_{η∈∂V_γ(x(t))} ∥η∥ = l > 0.    (21)

If (21) does not hold, i.e., l = 0, then there exist {t_k}_{k=1}^∞ ⊂ Γ with t_k → +∞ and η_k ∈ ∂V_γ(x(t_k)) such that

∥η_k∥ < 1/k.    (22)

Note that

η_k ∈ ∂V_γ(x(t_k)) = Σ_{i∈I^+(x(t_k))} γ_i ∂g_i(x(t_k)) + Σ_{i∈I^0(x(t_k))} {γ_i α_i η_i : α_i ∈ [0, 1], η_i ∈ ∂g_i(x(t_k))}.    (23)

From (23), ∃η_{ki} ∈ ∂g_i(x(t_k)), i ∈ I^+(x(t_k)) ∪ I^0(x(t_k)), and α_{ki} ∈ [0, 1] such that

η_k = Σ_{i∈I^+(x(t_k))} γ_i η_{ki} + Σ_{i∈I^0(x(t_k))} γ_i α_{ki} η_{ki}.    (24)

Note that I^+(x(t_k)) ≠ ∅. Since {x(t_k)}_{k=1}^∞ is bounded, we may assume without loss of generality that there exists lim_{k→∞} x(t_k) = z̄. Similar to the proof of (iv) in Theorem 2, we have z̄ ∈ M; thus g_i(z̄) ≤ 0, i = 1, 2, . . . , m. Extracting a subsequence and re-indexing, we assume without loss of generality that, for all natural numbers k, I^+(x(t_k)) = I^+(x(t_1)). Since g_i(x(t_k)) > 0 for i ∈ I^+(x(t_k)),

lim_{k→+∞} g_i(x(t_k)) = g_i(z̄) ≥ 0,  i ∈ I^+(x(t_1)).

Therefore

g_i(z̄) = 0,  i ∈ I^+(x(t_1)).    (25)

Since ∂g_i(x) (i = 1, 2, . . . , m) is u.s.c. at z̄ ∈ R^n, ∀ε > 0, ∃k_0 such that for k > k_0,

η_{ki} ∈ ∂g_i(x(t_k)) ⊂ ∂g_i(z̄) + εB(0, 1).    (26)

Since ∂g_i(z̄) is a compact subset of R^n, the sequence {η_{ki}}_{k=1}^∞ is bounded. Note that α_{ki} ∈ [0, 1]. Extracting a subsequence and re-indexing, we may assume without loss of generality that there exist

lim_{k→∞} η_{ki} = η̄_i (i = 1, 2, . . . , m),  lim_{k→∞} α_{ki} = ᾱ_i.    (27)

Taking k → ∞ in (24), it follows from (22), (25) and (27) that

Σ_{i∈I^+(x(t_1))} γ_i η̄_i + Σ_{i∈I^0(z̄)} γ_i ᾱ_i η̄_i = 0.    (28)

Therefore {η̄_i ∈ ∂g_i(z̄) : i ∈ I^0(z̄)} is linearly dependent, which contradicts Assumption 1. The contradiction proves (21).

Since x → V_γ(x) is a regular invex function and t → x(t) is almost everywhere differentiable, (d/dt)V_γ(x(t)) exists for almost all t ≥ 0. According to Proposition 6, for almost all t ∈ Γ,

(d/dt) V_γ(x(t)) = ⟨η(t), ẋ(t)⟩,  ∀η(t) ∈ ∂V_γ(x(t)).

From (14), there exist ῡ(t) ∈ ∂f(x(t)) and η̄(t) ∈ ∂V_γ(x(t)) such that ẋ(t) = −ῡ(t) − λη̄(t), so

(d/dt) V_γ(x(t)) = ⟨η̄(t), −ῡ(t) − λη̄(t)⟩ ≤ ∥ῡ(t)∥∥η̄(t)∥ − λ∥η̄(t)∥².    (29)
G. Li et al. / Neural Networks 50 (2014) 79–89
Since x(t ) is bounded, ∂ f (x(t )) and ∂ Vγ (x(t )) are u.s.c map with nonempty compact convex values, then ∃ l1 , l2 > 0 such that ∥υ(t )∥ ≤ l1 , ∀υ(t ) ∈ ∂ f (x(t )) and ∥η(t )∥ ≤ l2 , ∀η(t ) ∈ ∂ Vγ (x(t )). It follows from (21) and (29) that d dt
Vγ x(t ) ≤ l1 l2 − λl2 ,
thus, for λ > d dt
2l1 l2 l2
Vγ x(t ) ≤ −ρ.
(30)
According to Lemma 2, the state of (13) with initial point x0 reaches feasible S in finite time. Integrating the (30) on [0, t ], Vγ (x(t )) ≤ Vγ (x0 ) − ρ t .
ϑ = inf{t > 0 : x(t ) ̸∈ S } < +∞; by the continuity of t → x(t ), there exists h > 0 such that for any t ∈ (ϑ, ϑ + h), Vγ (x(t )) > 0.
(31)
On the other hand, by (30) and (ϑ, ϑ + h) ⊂ Γ , the function t → Vγ (x(t )) is decreasing on (ϑ, ϑ + h) and Vγ (x(ϑ)) = 0; hence, for any t ∈ (ϑ, ϑ + h), Vϑ (x(t )) < 0, which is a contradiction with (31). The proof of Theorem 3 also shows that if Assumption 1 holds and λ is sufficiently large, then S is an invariant set with respect to (13). By Theorem 3, we can further study the convergence of (13). The following lemma is very useful (Opial, 1967). Lemma 3. Let x : [0, ∞) → Rn be a curve such that there exists a nonempty subset C ⊂ Rn which satisfies the conditions below: (i) all cluster points of x(·) are contained in C ; (ii) ∀ x∗ ∈ C , limt →∞ ∥x(t ) − x∗ ∥ exists.
Theorem 4. Let the objective and constraint functions f and gᵢ (i = 1, 2, …, m) in problem (1) be regular pseudoconvex functions and let Assumption 1 hold. If λ is sufficiently large, then any state of (13) converges to the unique optimal solution of problem (1).

Proof. Let x(·) be a state of (13) with initial point x₀ ∈ ℝⁿ, and let C be the set of all cluster points of x(·). Since x(t) is bounded, C ≠ ∅. From the proof of Theorem 2, C ⊂ M when λ is sufficiently large. By Theorem 3, there exists t₀ > 0 such that x(t) ∈ S for t > t₀; in the remainder of the proof we assume t > t₀. For any x* ∈ C, consider the energy function

t ↦ E(t, x*) = φλγ(x(t)) + (1/2)∥x(t) − x*∥².  (32)

By Proposition 6, for almost all t ≥ 0 and all η(t) ∈ ∂φλγ(x(t)),

(d/dt)E(t, x*) = ⟨η(t), ẋ(t)⟩ + ⟨x(t) − x*, ẋ(t)⟩.  (33)

Replacing ẋ(t) with −η(t) in (33) yields

(d/dt)E(t, x*) = −∥ẋ(t)∥² − ⟨x(t) − x*, η(t)⟩.  (34)

By (14), there exist ξ(t) ∈ ∂f(x(t)), ζᵢ(t) ∈ ∂gᵢ(x(t)) (i ∈ I⁺(x(t)) ∪ I⁰(x(t))) and αᵢ ∈ [0, 1] (i ∈ I⁰(x(t))) such that

ẋ(t) = −ξ(t) − λ Σ_{i∈I⁺(x(t))} γᵢ ζᵢ(t) − λ Σ_{i∈I⁰(x(t))} γᵢ αᵢ ζᵢ(t).  (35)

Since the functions max{0, gᵢ(x)} (i ∈ I⁰(x(t))) are regular, in (35) we can take αᵢ = 0 (i ∈ I⁰(x(t))). Substituting (35) into (34) then gives

(d/dt)E(t, x*) = −∥ẋ(t)∥² − ⟨ξ(t), x(t) − x*⟩ − λ Σ_{i∈I⁺(x(t))} γᵢ⟨ζᵢ(t), x(t) − x*⟩.  (36)

If x(t) ∉ M, then f(x(t)) > f(x*), and since f(x) is pseudoconvex, ⟨ξ(t), x(t) − x*⟩ > 0 for all ξ(t) ∈ ∂f(x(t)); otherwise x(t) has reached M in finite time. If x(t) ∈ bdS (the boundary point set of S), then gᵢ(x(t)) > gᵢ(x*) (i ∈ I⁺(x(t))), and since the gᵢ (i ∈ I⁺(x(t))) are pseudoconvex,

⟨ζᵢ(t), x(t) − x*⟩ > 0

for all ζᵢ(t) ∈ ∂gᵢ(x(t)) (i ∈ I⁺(x(t))). By (33), (35) and the above analysis, if x(t) ∉ M and x(t) ∈ ∂S, then for almost all t > t₀,

(d/dt)E(t, x*) ≤ 0.  (37)

Clearly, (37) still holds in the other circumstances. By (37), E(t, x*) is monotonically nonincreasing and bounded with respect to t, so E(t, x*) converges as t → +∞. By (i) of Theorem 2, φλγ(x(t)) converges as t → +∞, and hence ∥x(t) − x*∥ converges as t → +∞. According to Lemma 3, x(t) converges to an element of C ⊂ M, i.e., x(t) converges to an optimal solution of (1). In this case C = {x*} and lim_{t→+∞} x(t) = x*.

5. Penalty parameter estimation

In this section, the goal is to estimate the penalty parameter λ. To this end, the assumption S = cl(int S) is needed. Let r > 0 and Sᵣ = {x : Vγ(x) ≤ r}, where Sᵣ is a compact subset and S ⊂ Sᵣ. Since f(x) and Vγ(x) are locally Lipschitz functions, x ↦ ∂f(x) and x ↦ ∂Vγ(x) are u.s.c. maps with nonempty compact convex values, and Sᵣ \ int S is a compact subset, there exist L_f > 0 and L_V > 0 such that, for x ∈ Sᵣ \ int S,

|∂f(x)| = max{∥η∥ : η ∈ ∂f(x)} ≤ L_f and |∂Vγ(x)| = max{∥η∥ : η ∈ ∂Vγ(x)} ≤ L_V.

Lemma 4. Suppose that (1) satisfies Assumption 1. Then

min_{x∈Sᵣ\int S} dist(0, ∂Vγ(x)) = m_g > 0.  (38)

Proof. Since Vγ(x) is a locally Lipschitz function and x ↦ ∂Vγ(x) is a u.s.c. map with nonempty compact convex values, x ↦ dist(0, ∂Vγ(x)) is an l.s.c. real-valued function, so there exists x̄ ∈ Sᵣ \ int S such that

dist(0, ∂Vγ(x̄)) = min_{x∈Sᵣ\int S} dist(0, ∂Vγ(x)) = m_g.  (39)

(i) If x̄ ∉ bdS, it suffices to verify that 0 ∉ ∂Vγ(x̄). Since Vγ(x) is an invex function with Vγ(x̄) > 0 and Vγ(x̂) = 0 for x̂ ∈ int S, it follows that 0 ∉ ∂Vγ(x̄).

(ii) If x̄ ∈ bdS, then I⁰(x̄) ≠ ∅. Suppose that m_g = 0. Since x ↦ dist(0, ∂Vγ(x)) is an l.s.c. real-valued function and S = cl(int S), we can take {x_k} ⊂ Sᵣ \ S with x_k → x̄ and η_k ∈ ∂Vγ(x_k) such that

lim_{k→+∞} dist(0, ∂Vγ(x_k)) = lim_{k→+∞} ∥η_k∥ = 0.  (40)

Note that I⁺(x_k) ≠ ∅ and

η_k ∈ ∂Vγ(x_k) = Σ_{i∈I⁺(x_k)} γᵢ ∂gᵢ(x_k) + Σ_{i∈I⁰(x_k)} {γᵢ αᵢ ηᵢ : αᵢ ∈ [0, 1], ηᵢ ∈ ∂gᵢ(x_k)}.  (41)
Therefore, there exist η_{ki} ∈ ∂gᵢ(x_k) (i ∈ I⁺(x_k) ∪ I⁰(x_k)) and α_{ki} ∈ [0, 1] such that

η_k = Σ_{i∈I⁺(x_k)} γᵢ η_{ki} + Σ_{i∈I⁰(x_k)} γᵢ α_{ki} η_{ki}.  (42)

Extracting a subsequence and re-indexing, we may assume without loss of generality that, for all natural numbers k, I⁺(x_k) = I⁺(x₁) and

lim_{k→∞} η_{ki} = η̄ᵢ (i = 1, 2, …, m),  lim_{k→∞} α_{ki} = ᾱᵢ.  (43)

Since ∂gᵢ(x) (i = 1, 2, …, m) is u.s.c. at x̄, for every ε > 0 there exists k₀ such that, for k > k₀,

η_{ki} ∈ ∂gᵢ(x_k) ⊂ ∂gᵢ(x̄) + εB(0, 1).

Therefore,

η̄ᵢ ∈ ∂gᵢ(x̄),  i = 1, 2, …, m.  (44)

Taking k → ∞ in (42), it follows from (40), (43) and (44) that

Σ_{i∈I⁺(x̄)} γᵢ η̄ᵢ + Σ_{i∈I⁰(x̄)} γᵢ ᾱᵢ η̄ᵢ = 0.  (45)

Therefore, {η̄ᵢ ∈ ∂gᵢ(x̄) : i ∈ I⁰(x̄)} is linearly dependent, which contradicts Assumption 1; thus m_g > 0.

Fig. 1. Objective function in Example 1.

Theorem 5. Let Assumption 1 hold. If x₀ ∈ Sᵣ and λ > L_f L_V / m_g², then x(t) ∈ Sᵣ for any t > 0.
Proof. Let x(·) be a state of (13). Since x ↦ Vγ(x) is a locally Lipschitz function, (d/dt)Vγ(x(t)) exists for a.a. t ≥ 0, and for a.a. t ≥ 0,

(d/dt)Vγ(x(t)) = ⟨η(t), ẋ(t)⟩,  ∀η(t) ∈ ∂Vγ(x(t)).

From (14), there exist ῡ(t) ∈ ∂f(x(t)) and η̄(t) ∈ ∂Vγ(x(t)) such that

(d/dt)Vγ(x(t)) = ⟨η̄(t), −ῡ(t) − λη̄(t)⟩ ≤ ∥ῡ(t)∥∥η̄(t)∥ − λ∥η̄(t)∥² ≤ L_f L_V − λm_g² < 0;  (46)

thus Vγ(x(t)) is monotonically nonincreasing. Therefore Vγ(x(t)) ≤ Vγ(x₀) ≤ r, and x(t) ∈ Sᵣ.

Theorem 6. Let Assumption 1 hold. If x₀ ∈ Sᵣ and λ > L_f L_V / m_g², then the state of (13) with initial point x₀ reaches S in finite time and stays there thereafter.

Proof. Let x(·) be a state of (13) with initial point x₀. By (46),

(d/dt)Vγ(x(t)) ≤ L_f L_V − λm_g² < 0.

Taking ρ = −(L_f L_V − λm_g²) > 0, we obtain

(d/dt)Vγ(x(t)) ≤ −ρ.  (47)

Integrating (47) on [0, t] gives Vγ(x(t)) ≤ Vγ(x₀) − ρt, so Vγ(x(t)) = 0 when t = Vγ(x₀)/ρ. Therefore, the state of (13) with initial point x₀ reaches S in finite time, with t = Vγ(x₀)/ρ an upper bound on the hit time. As in the proof of Theorem 3, when t ≥ Vγ(x₀)/ρ the state of (13) with initial point x₀ remains inside S thereafter.

Theorem 7. Let Assumption 1 hold. If x₀ ∈ Sᵣ and λ > max{L_f L_V / m_g², L_f / m_g}, then any equilibrium point of (13) is an optimal solution to problem (1).

Proof. Let x* be an equilibrium point of (13). Since φλγ is an invex function, according to Proposition 8, x* is a minimum point of φλγ on ℝⁿ. By Theorem 5, x* ∈ Sᵣ. We claim that x* ∈ S. If not, then x* ∈ Sᵣ \ S, and there exist η ∈ ∂f(x*) and v ∈ ∂Vγ(x*) such that

|∂φλγ(x*)| = min{∥ω∥ : ω ∈ ∂φλγ(x*)} = ∥η + λv∥ ≥ λ∥v∥ − ∥η∥ ≥ λm_g − L_f > 0,

which contradicts 0 ∈ −∂φλγ(x*). Therefore x* is an optimal solution to problem (1).

6. Simulation results

In this section, simulation results on two nonsmooth invex optimization problems are provided to illustrate the effectiveness and efficiency of the proposed recurrent neural network model (13).

Example 1. Consider the following invex optimization problem:

min f(x) = 1 + x₁² − e^{−x₂²}
subject to x₁² − x₂ + 1/4 ≤ 0,
      −|x₁| + x₂ − 1 ≤ 0.  (48)

As depicted in Fig. 1, the objective function f(x) is a smooth invex function and satisfies the growth condition; however, f is not quasiconvex. The feasible region S = {x ∈ ℝ² : x₁² − x₂ + 1/4 ≤ 0, −|x₁| + x₂ − 1 ≤ 0} is not a convex set. For x ∈ bdS, it can be checked that Assumption 1 holds. Taking γ = (1, 1), the generalized gradient of φλ is

∂φλ(x) = (2x₁, 2x₂e^{−x₂²})ᵀ + λ Σ_{i=1}^{2} ∂max{0, gᵢ(x)},

where

∂max{0, g₁(x)} = (2x₁, −1)ᵀ if x₁² − x₂ + 1/4 > 0; [0, 1](2x₁, −1)ᵀ if x₁² − x₂ + 1/4 = 0; (0, 0)ᵀ if x₁² − x₂ + 1/4 < 0;
∂max{0, g₂(x)} = ∂g₂(x) if −|x₁| + x₂ − 1 > 0; [0, 1]∂g₂(x) if −|x₁| + x₂ − 1 = 0; (0, 0)ᵀ if −|x₁| + x₂ − 1 < 0;

and

∂g₂(x) = (−1, 1)ᵀ if x₁ > 0; ([−1, 1], 1)ᵀ if x₁ = 0; (1, 1)ᵀ if x₁ < 0.
Fig. 2. Transient behaviors of the state variables of the neural network in Example 1.
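Model (13) for Example 1 can be sketched numerically with a forward-Euler discretization of ẋ ∈ −∂φλ(x). This is a hypothetical sketch, not the solver used in the paper: the step size, iteration count, initial point, and the particular subgradient selection are our choices, and λ = 15 is simply one value satisfying the estimate λ > 12:

```python
import numpy as np

def subgrad_phi(x, lam=15.0):
    """One selection from the generalized gradient of the penalty function
    phi_lam for Example 1 (gamma = (1, 1)); lam > 12 per the paper's estimate."""
    x1, x2 = x
    g = np.array([2.0 * x1, 2.0 * x2 * np.exp(-x2 ** 2)])   # gradient of f
    if x1 ** 2 - x2 + 0.25 > 0:                  # g1 violated: add lam * grad g1
        g += lam * np.array([2.0 * x1, -1.0])
    if -abs(x1) + x2 - 1.0 > 0:                  # g2 violated: add lam * subgrad g2
        g += lam * np.array([-np.sign(x1), 1.0])
    return g

x = np.array([1.0, 0.0])        # an arbitrary initial point
h = 5e-4                        # Euler step size (our choice)
for _ in range(40000):
    x = x - h * subgrad_phi(x)  # discretization of x' = -subgrad phi_lam(x)
```

The iterate chatters slightly along the active constraint g₁(x) = 0, as expected for an explicit discretization of a differential inclusion, and settles near the reported optimal solution x* = (0, 0.25)ᵀ.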
Fig. 4. Transient behaviors of the neural network in Nazemi (2012) in Example 1.
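The constants L_f, L_V and m_g of Section 5 can be estimated numerically by sampling. The sketch below does this for a hypothetical instance of our own (f(x) = x₁² + x₂², a single constraint g(x) = x₁ + x₂ − 1 ≤ 0, γ = 1, and a bounding box standing in for the compact set Sᵣ); here ∂Vγ is the singleton {(1, 1)ᵀ} wherever g(x) > 0, so the exact value m_g = √2 is known and the recipe can be checked:

```python
import numpy as np

# Hypothetical instance (not the paper's Example 1): f(x) = x1^2 + x2^2 and
# g(x) = x1 + x2 - 1 <= 0, so V(x) = max(0, g(x)) has subdifferential {(1, 1)}
# at every strictly infeasible point.
r = 1.0
xs = np.linspace(-2.0, 2.0, 161)
X1, X2 = np.meshgrid(xs, xs)
G = X1 + X2 - 1.0
mask = (G > 0) & (G <= r)        # grid samples of S_r \ int S (within the box)

Lf = np.max(2.0 * np.sqrt(X1[mask] ** 2 + X2[mask] ** 2))  # max ||grad f||
LV = np.hypot(1.0, 1.0)          # ||(1, 1)|| -- here the subdifferential is a singleton
mg = LV                          # dist(0, dV(x)) = sqrt(2) at every sampled point
lam_min = Lf * LV / mg ** 2      # threshold lambda > L_f * L_V / m_g^2 (Theorems 5-6)
```

The resulting threshold is only as good as the sampling grid and box; for nonsmooth constraints the minimal-norm element of the subdifferential would have to be computed per active set rather than read off a formula.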
Fig. 3. Phase plot of state variables in Example 1.

The global minimum of the unconstrained f(x) is x = (0, 0)ᵀ, but this point is not in the feasible region. Let r = 2; then the design parameter for solving the constrained optimization problem (48) is estimated as λ > 12. Let ϵ = 10⁻⁵ in the neural network model (13). Fig. 2 depicts the transient behaviors of the state vector x(t) from 10 random initial values, and Fig. 3 depicts the two-dimensional phase plot of (x₁, x₂) from 100 random initial points. The simulation results show that the state variables globally converge to the unique optimal solution x* = (0, 0.25)ᵀ. However, most projection neural networks (Hu & Wang, 2007; Liu, Guo, & Wang, 2012; Liu & Wang, 2011a, 2011b; Xia & Wang, 2004) cannot deal with the nonconvex feasible region, so they are not capable of solving (48). As shown in Fig. 4, neural network models for convex optimization, such as the one presented in Nazemi (2012), cannot converge to the optimal solution either.

Example 2. Consider the following optimization problem with a pseudoconvex objective:

min f(x) = x₁² + x₂² + |x₁ − x₂ − 1|
subject to −|x₁| + |x₂| − 1/2 ≤ 0,
      x₁² − x₂ − 3/4 ≤ 0.  (49)

The generalized gradient terms in (14) are given as follows:

∂f(x) = (2x₁ + 1, 2x₂ − 1)ᵀ if x₁ − x₂ − 1 > 0; (2x₁ + [−1, 1], 2x₂ − [−1, 1])ᵀ if x₁ − x₂ − 1 = 0; (2x₁ − 1, 2x₂ + 1)ᵀ if x₁ − x₂ − 1 < 0;
∂max{0, g₁(x)} = ∂(−|x₁| + |x₂| − 1/2) if g₁(x) > 0; [0, 1]∂(−|x₁| + |x₂| − 1/2) if g₁(x) = 0; (0, 0)ᵀ if g₁(x) < 0;
∂max{0, g₂(x)} = (2x₁, −1)ᵀ if g₂(x) > 0; [0, 1](2x₁, −1)ᵀ if g₂(x) = 0; (0, 0)ᵀ if g₂(x) < 0;

where

∂(−|x₁| + |x₂| − 1/2) = (−1, 1)ᵀ if x₁ > 0, x₂ > 0; (−1, [−1, 1])ᵀ if x₁ > 0, x₂ = 0; (−1, −1)ᵀ if x₁ > 0, x₂ < 0; ([−1, 1], 1)ᵀ if x₁ = 0, x₂ > 0; ([−1, 1], [−1, 1])ᵀ if x₁ = x₂ = 0; ([−1, 1], −1)ᵀ if x₁ = 0, x₂ < 0; (1, 1)ᵀ if x₁ < 0, x₂ > 0; (1, [−1, 1])ᵀ if x₁ < 0, x₂ = 0; (1, −1)ᵀ if x₁ < 0, x₂ < 0.

It can be easily verified that g₁(x) and g₂(x) satisfy Assumption 1, and the feasible region is a nonconvex set. Let r = 2; then the design parameter is estimated as λ > 8.4. Let ϵ = 10⁻⁵ in the neural network model (13). Fig. 5 depicts the transient behaviors of the state vector x(t) from 10 random initial values, and Fig. 6 depicts the two-dimensional phase plot of (x₁, x₂) from 100 random initial points. The simulation results show that the state variables globally converge to the unique optimal solution x* = (0.5, −0.5)ᵀ. In contrast, most existing neural networks, such as the model presented in Xia, Feng, and Wang (2008), cannot converge to the global optimal solution, as shown in Fig. 7.
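For Example 2 one can check by hand that 0 ∈ ∂f(x*) at x* = (0.5, −0.5)ᵀ: there x₁ − x₂ − 1 = 0, so ∂f(x*) = {(1 + s, −1 − s)ᵀ : s ∈ [−1, 1]}, which contains (0, 0)ᵀ at s = −1. A forward-Euler sketch of (13) (again hypothetical: step size, iteration count, initial point, and subgradient selection are our choices; λ = 10 satisfies the estimate λ > 8.4) confirms convergence:

```python
import numpy as np

def subgrad_phi(x, lam=10.0):
    """One selection from the generalized gradient of the penalty function
    for Example 2 (gamma = (1, 1)); lam > 8.4 per the paper's estimate."""
    x1, x2 = x
    s = np.sign(x1 - x2 - 1.0)                   # subgradient of |x1 - x2 - 1|
    g = np.array([2.0 * x1 + s, 2.0 * x2 - s])   # an element of subgrad f
    if -abs(x1) + abs(x2) - 0.5 > 0:             # g1 violated
        g += lam * np.array([-np.sign(x1), np.sign(x2)])
    if x1 ** 2 - x2 - 0.75 > 0:                  # g2 violated
        g += lam * np.array([2.0 * x1, -1.0])
    return g

x = np.array([-1.0, 1.0])       # an arbitrary feasible initial point
h = 5e-4
for _ in range(40000):
    x = x - h * subgrad_phi(x)  # discretization of x' = -subgrad phi_lam(x)
```

From this initial point the trajectory rides the subgradient flow of f directly to x* = (0.5, −0.5)ᵀ, reaching the boundary of g₂ exactly at the optimum.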
7. Conclusions

This paper has presented a recurrent neural network for solving nonsmooth invex optimization problems with inequality constraints based on a penalty approach. To guarantee the exact penalty property of the penalty function, a sufficient condition on the constraint functions was derived; this condition is easy to verify when the constraint functions are smooth. Based on this condition, the proposed neural network has been rigorously analyzed in the framework of differential inclusions, and results on its global convergence, finite-time convergence, and optimization capability have been obtained. Furthermore, the penalty parameter and the convergence time to the feasible region can be numerically estimated. Compared with most existing neural networks for nonsmooth optimization, the proposed neural network allows the constraint functions to be nonconvex and the feasible region to be unbounded.

Fig. 5. Transient behaviors of the state variables of the neural network in Example 2.
Fig. 6. Phase plot of state variables in Example 2.
Fig. 7. Transient behaviors of the neural network in Xia et al. (2008) in Example 2.

References
Aubin, J., & Cellina, A. (1984). Differential inclusions. Berlin: Springer-Verlag.
Ben-Israel, A., & Mond, B. (1986). What is invexity? Journal of the Australian Mathematical Society, Series B, 28, 1–9.
Bian, W., & Xue, X. (2009). Subgradient-based neural networks for nonsmooth nonconvex optimization problems. IEEE Transactions on Neural Networks, 20, 1024–1038.
Cheng, L., Hou, Z., Lin, Y., Tan, M., Zhang, W. C., & Wu, F. (2011). Recurrent neural network for nonsmooth convex optimization problems with applications to the identification of genetic regulatory networks. IEEE Transactions on Neural Networks, 22, 714–726.
Chong, E., Hui, S., & Zak, S. (1999). An analysis of a class of neural networks for solving linear programming problems. IEEE Transactions on Automatic Control, 44, 1995–2006.
Clarke, F. (1983). Optimization and nonsmooth analysis. New York: Wiley.
Craven, B. D. (1981). Duality for generalized convex fractional programs. In S. Schaible, & W. T. Ziemba (Eds.), Generalized concavity in optimization and economics. New York: Academic Press.
Filippov, A. (1988). Differential equations with discontinuous right-hand side. Dordrecht: Kluwer Academic.
Forti, M., Nistri, P., & Quincampoix, M. (2004). Generalized neural network for nonsmooth nonlinear programming problems. IEEE Transactions on Circuits and Systems. I, 51, 1741–1754.
Forti, M., Nistri, P., & Quincampoix, M. (2006). Convergence of neural networks for programming problems via a nonsmooth Łojasiewicz inequality. IEEE Transactions on Neural Networks, 17, 1471–1485.
Frenk, J., & Schaible, S. (2005). Fractional programming. In Handbook of generalized convexity and generalized monotonicity. Springer.
Gao, X. (2004). A novel neural network for nonlinear convex programming. IEEE Transactions on Neural Networks, 15, 613–621.
Guo, Z., Liu, Q., & Wang, J. (2011). A one-layer recurrent neural network for pseudoconvex optimization subject to linear equality constraints.
IEEE Transactions on Neural Networks, 22, 1892–1900.
Hanson, M. A. (1981). On sufficiency of the Kuhn–Tucker conditions. Journal of Mathematical Analysis and Applications, 80, 545–550.
Hopfield, J. J., & Tank, D. W. (1985). Neural computation of decisions in optimization problems. Biological Cybernetics, 52, 141–152.
Hosseini, A., Wang, J., & Hosseini, S. M. (2013). A recurrent neural network for solving a class of generalized convex optimization problems. Neural Networks, 44, 78–86.
Hu, X., & Wang, J. (2006). Solving pseudomonotone variational inequalities and pseudoconvex optimization problems using the projection neural network. IEEE Transactions on Neural Networks, 17, 1487–1499.
Hu, X., & Wang, J. (2007). Design of general projection neural network for solving monotone linear variational inequalities and linear and quadratic optimization problems. IEEE Transactions on Systems, Man and Cybernetics. Part B: Cybernetics, 37, 1414–1421.
Kennedy, M. P., & Chua, L. O. (1988). Neural networks for nonlinear programming. IEEE Transactions on Circuits and Systems, 35, 554–562.
Li, G., Song, S., & Wu, C. (2010). Generalized gradient projection neural networks for nonsmooth optimization problems. Science China Information Sciences, 53, 990–1005.
Liu, Q., Guo, Z., & Wang, J. (2012). A one-layer recurrent neural network for constrained pseudoconvex optimization and its application for dynamic portfolio optimization. Neural Networks, 26, 99–109.
Liu, Q., & Wang, J. (2011a). Finite-time convergent recurrent neural network with a hard-limiting activation function for constrained optimization with piecewise-linear objective functions. IEEE Transactions on Neural Networks, 22, 601–613.
Liu, Q., & Wang, J. (2011b). A one-layer recurrent neural network for constrained nonsmooth optimization. IEEE Transactions on Systems, Man and Cybernetics. Part B: Cybernetics, 40, 1323–1333.
Liu, Q., & Wang, J. (2013). A one-layer projection neural network for nonsmooth optimization subject to linear equalities and bound constraints. IEEE Transactions on Neural Networks and Learning Systems, 24, 812–824.
Nazemi, A. R. (2012). A dynamic system model for solving convex nonlinear optimization problems. Communications in Nonlinear Science and Numerical Simulation, 17, 1696–1705.
Opial, Z. (1967). Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bulletin of the American Mathematical Society, 73, 591–597.
Pardalos, P. (2008). Nonconvex optimization and its application. Berlin, Heidelberg: Springer.
Penot, J., & Quan, P. (1997). Generalized convexity of functions and generalized monotonicity of set-valued maps. Journal of Optimization Theory and Applications, 92, 343–356.
Tank, D. W., & Hopfield, J. J. (1986). Simple ‘neural’ optimization networks: an A/D converter, signal decision circuit, and a linear programming circuit. IEEE Transactions on Circuits and Systems, 33, 533–541.
Wang, J. (1993). Analysis and design of a recurrent neural network for linear programming. IEEE Transactions on Circuits and Systems. I, 40, 613–618.
Wang, J. (1994). A deterministic annealing neural network for convex programming. Neural Networks, 7, 629–641. Wang, J. (1997). Primal and dual assignment networks. IEEE Transactions on Neural Networks, 8, 784–790. Xia, Y., Feng, G., & Wang, J. (2008). A novel neural network for solving nonlinear optimization problems with inequality constraints. IEEE Transactions on Neural Networks, 19, 1340–1353. Xia, Y., Leung, H., & Wang, J. (2002). A projection neural network and its application to constrained optimization problems. IEEE Transactions on Circuits and Systems. I, 49, 447–458. Xia, Y., & Wang, J. (2001). A dual neural network for kinematic control of redundant robot manipulators. IEEE Transactions on Systems, Man and Cybernetics. Part B: Cybernetics, 31, 147–154. Xia, Y., & Wang, J. (2004). A general projection neural network for solving optimization and related problems. IEEE Transactions on Neural Networks, 15, 318–328. Xue, X., & Bian, W. (2008). Subgradient-based neural networks for nonsmooth convex optimization problems. IEEE Transactions on Circuits and Systems. I, 55, 2378–2391. Zhang, S., & Constantinides, A. G. (1992). Lagrange programming neural network. IEEE Transactions on Circuits and Systems. II, 39, 441–452.