Automatica 58 (2015) 51–59
Brief paper
Shahshahani gradient-like extremum seeking✩

Jorge I. Poveda a,1, Nicanor Quijano b

a Department of Electrical and Computer Engineering, University of California, Santa Barbara, USA
b Department of Electrical and Electronics Engineering, University of Los Andes, Bogotá, Colombia
Article history: Received 2 May 2013. Received in revised form 9 October 2014. Accepted 24 April 2015.

Keywords: Extremum seeking; Adaptive control; Optimization; Game theory; Nonlinear control systems
Abstract

In this paper we present novel extremum seeking controllers designed to achieve online constrained optimization of the uncertain input-to-output maps of a class of nonlinear dynamical systems. The algorithms, inspired by a class of evolutionary dynamics that emerge in the context of population games, generate online Shahshahani gradient-like systems able to achieve extremum seeking under simplex-like constraints on the positive orthant. The method attains semi-global practical convergence to the optimal point, even when the initial conditions are not in the feasible set, and it can be naturally adapted to address distributed extremum seeking problems in multi-agent systems where an all-to-all communication structure is not available. Potential applications include problems in distributed dynamic resource allocation, congestion and flow control in networked systems, and portfolio optimization in financial markets. Via simulations, we illustrate our results under different scenarios.
1. Introduction

Extremum seeking control (ESC) (Ariyur & Krstić, 2003) is a type of adaptive control designed to optimize the steady state input-to-output map of a nonlinear dynamical system for which, in principle, this map is not exactly known. Different types of ESCs have been proposed over the last years, aiming to solve standard optimization problems (Nešić, Tan, Moase, & Manzie, 2010; Tan, Nešić, & Mareels, 2006), as well as to achieve online learning in noncooperative and cooperative games (Frihauf, Krstić, & Başar, 2012; Kvaternik & Pavel, 2012; Stanković, Johansson, & Stipanović, 2012). More recently, there has been interest in addressing the problem of extremum seeking with constraints. For instance, in Tan, Li, and Mareels (2013), the problem of extremum seeking with saturation constraints is addressed based on penalty functions and anti-windup techniques. Also, in Dürr, Zeng, and Ebenbauer (2013),
✩ This work was supported in part by the Colombian Administrative Department of Science, Technology and Innovation (COLCIENCIAS), Programa Jóvenes Investigadores, 2012. The material in this paper was partially presented at the 51st IEEE Conference on Decision and Control, December 10–13, 2012, Maui, Hawaii, USA, and at the American Control Conference, June 17–19, 2013, Washington, USA. This paper was recommended for publication in revised form by Associate Editor Raul Ordóñez under the direction of Editor Miroslav Krstić. This work was performed while J.I. Poveda was with the University of Los Andes, Bogotá, Colombia.
E-mail addresses: [email protected] (J.I. Poveda), [email protected] (N. Quijano).
1 Tel.: +1 805 8868162.
http://dx.doi.org/10.1016/j.automatica.2015.05.002
an extremum seeking algorithm based on a Lie-bracket approximation is shown to achieve convex optimization of static maps. Other works, such as DeHaan and Guay (2005), use penalty terms in the cost functions to integrate the constraints into the optimization problem and achieve extremum seeking. Nevertheless, most of the existing approaches are not suitable for the case when multivariable global constraints are included in the extremum seeking problem for dynamical systems, or when a lack of all-to-all communication structures precludes the implementation of centralized controllers. Since this type of optimization problem arises in several engineering, economics, and finance applications, there is increasing interest in designing adaptive and distributed algorithms for dynamical systems that guarantee convergence to the optimal solution in a robust manner. From this perspective, the main contribution of this paper is the introduction of novel extremum seeking controllers based on the framework of population games (Sandholm, 2010), which combine the main features of some evolutionary dynamics that emerge in the context of biological systems (Hofbauer & Sigmund, 1998) with those of standard ESC. Since states with negative values are unrealistic in many practical applications, we consider extremum seeking (ES) in the positive orthant, where the summation of the variables under control must be less than or equal to a given positive value, allowing inequality constraints in the optimization problem. The ES dynamics proposed in this paper retain the optimality and adaptability properties of the classic evolutionary dynamics, as well as the robustness and non-model-based characteristics of standard ESC. Using the framework of population games we
show that the ES dynamics guarantee convergence, in a semi-global practical sense, to the optimal solution of a constrained optimization problem with a global cost function, as well as in systems describing strictly stable games where a potential function does not exist. Additionally, we show that these dynamics can be naturally modified to achieve ES in multi-agent systems without an all-to-all communication structure.

The remainder of this paper is organized as follows. In Section 2 some preliminary concepts are presented. In Section 3 we introduce the ESC for constrained optimization of dynamical systems with a global potential function. Section 4 extends the results to systems where the potential function does not exist. Section 5 addresses the problem of distributed extremum seeking in multi-agent systems. Section 6 discusses some further remarks on the implementation of the algorithms, while Section 7 presents some numerical simulations. Finally, Section 8 ends with some conclusions.

2. Preliminaries

We denote by $\mathbb{N}_{>0}$ the set of positive integers, and by $\mathbb{R}^n$ the set of real vectors of dimension $n$, where $n \in \mathbb{N}_{>0}$. Also, we denote by $\mathbb{R}^n_{\ge 0}$ and $\mathbb{R}^n_{>0}$ the sets of $n$-dimensional vectors with non-negative and strictly positive entries, respectively. We use $\nabla f$ to represent the classical Euclidean gradient of a continuously differentiable function $f : \mathbb{R}^n \to \mathbb{R}$, and $\nabla_i f$ to represent the $i$th entry of $\nabla f$, i.e., $\nabla_i f := \partial f(z)/\partial z_i$. Given a point $z \in \mathbb{R}^n$, the set $S(z) := \{i \in \{1,\dots,n\} : z_i > 0\}$ denotes the support of $z$. Given a set $M$ we use $\overline{M}$ to denote its closure, and $S_{z^*}(M) := \{z \in M : S(z^*) \subset S(z)\}$ to denote the set of points in $M$ whose support contains the support of $z^* \in M$. The symbol $\mathbb{B}$ denotes a closed unit ball, $\rho\mathbb{B}$ a closed ball of radius $\rho > 0$, and $M + \rho\mathbb{B}$ the union of all sets obtained by taking a closed ball of radius $\rho$ around each point of the set $M$. We use $\mathbf{1}$ to represent the column vector of appropriate dimension with entries equal to 1, and $\Delta^n_r := \{z \in \mathbb{R}^n_{\ge 0} : \mathbf{1}^\top z = r\}$ to represent the simplex in dimension $n$, parameterized by the constant $r \in \mathbb{R}_{>0}$. We define $\mathrm{int}(\Delta^n_r) := \{z \in \mathbb{R}^n_{>0} : \mathbf{1}^\top z = r\}$, $\bar{\Delta}^n_r := \{z \in \mathbb{R}^n_{\ge 0} : \mathbf{1}^\top z \le r\}$, and the corresponding set $\mathrm{int}(\bar{\Delta}^n_r)$. We use the acronyms AS and SPAS to refer to asymptotic stability (Khalil, 2002) and semi-global practical asymptotic stability (Tan et al., 2006; Teel, Moreau, & Nešić, 2003), respectively.
2.1. Population games and strictly stable games

Consider a population of entities playing a game, where each entity selects a pure strategy $i$ from the set $\mathcal{H}^n = \{1,\dots,n\}$, where $n \in \mathbb{N}_{>0}$. Let $y_i$ be the normalized proportion of entities playing strategy $i$, such that the population state $y = [y_1,\dots,y_n]^\top$ satisfies $y \in \Delta^n_r$, for some fixed $r > 0$. The payoff function associated to the $i$th strategy is defined as a real-valued continuously differentiable function $f_i : \Delta^n_r \to \mathbb{R}$, and the vector of payoffs is defined as $f(y) := [f_1(y),\dots,f_n(y)]^\top$. The interactions among the entities describe a population game, where two of the main equilibrium concepts are the Nash equilibrium (NE) and the globally evolutionarily stable state (GESS). The following definitions are adapted from Sandholm (2010).

Definition 1. For a population game defined in $\Delta^n_r$, with vector of payoffs given by $f(y)$, the point $y^*$ is a NE if $(y - y^*)^\top f(y^*) \le 0$, for all $y \in \Delta^n_r$. If this inequality is satisfied uniquely by $y^*$ we say that $\mathrm{NE}(f) = \{y^*\}$.

Definition 2. For a population game defined in $\Delta^n_r$, with vector of payoffs given by $f(y)$, the point $y^*$ is a GESS if $(y - y^*)^\top f(y) < 0$, for all $y \in \Delta^n_r \setminus \{y^*\}$. If this strict inequality is satisfied we say that $\mathrm{GESS}(f) = \{y^*\}$.

Note that according to Definitions 1 and 2, a GESS is a NE, but a NE is not necessarily a GESS. Also, it is a standard fact that every population game admits at least one Nash equilibrium. If this NE is also a GESS, then it is also unique (Sandholm, 2010). In this paper we will focus on optimization and learning problems that can be analyzed using the framework of a special type of population games, named strictly stable games.

Definition 3. A population game in $\Delta^n_r$, with vector of payoffs $f(y)$, is said to be strictly stable if it satisfies

$(y - z)^\top \left(f(y) - f(z)\right) < 0, \quad \forall\, y, z \in \Delta^n_r.$  (1)

Stable games have been studied in the context of transportation science (Dafermos, 1980; Smith, 1979), and more recently in the context of feedback control and passive systems (Fox & Shamma, 2013). Examples of strictly stable games include some types of symmetric normal form games, negative dominant diagonal games, and strictly concave potential games. For a detailed discussion about stable games the reader is referred to Sandholm and Hofbauer (2009). Since in general we will be dealing with dynamical systems for which the function $f$ is an unknown mapping defined on $\mathbb{R}^n$, and not only on $\Delta^n_r$, we will say that a continuous vector function $f : \mathbb{R}^n \to \mathbb{R}^n$ satisfies the strictly stable game condition in $\Delta^n_r$ if it satisfies Eq. (1). The following lemma, adapted from Sandholm (2010), relates the concepts of NE, GESS, and strictly stable games.

Lemma 1. Consider a strictly stable game defined in $\Delta^n_r$, with vector of payoffs $f$. Then, its unique NE is also a GESS.

2.2. Shahshahani gradients

The evolution in time of the population state $y(t)$ towards a NE or a GESS can be ruled by different dynamics known as evolutionary dynamics (Hofbauer & Sigmund, 1998). In the present work we consider the most recognized evolutionary dynamics, i.e., the replicator dynamics (RDs), which are given by

$\dot{y}_i = \alpha\, y_i \left( f_i(y) - \bar{f}(y) \right), \quad \forall\, i \in \mathcal{H}^n,$  (2)

where $\bar{f} := (1/Y)\, f^\top y$, and $\alpha$ and $Y$ are positive constants. The replicator equation has been used in applications ranging from biology (Akin, 1990) and engineering (Ramirez-Llanos & Quijano, 2010) to portfolio optimization in financial markets (Bomze, 2005), for example. A particular feature of the replicator equation (2) is that, if the vector of payoffs $f$ is selected as the classic Euclidean gradient of a smooth potential function $J$, it describes a special type of gradient system, named a Shahshahani gradient.

Definition 4. Consider the manifold $M = \mathbb{R}^n_{>0}$ associated with the Riemannian metric $G(z) = [g_{ij}(z)]$, where $g_{ij} = \delta_{ij}\,\frac{\mathbf{1}^\top z}{z_i}$, with $\delta_{ij} = 1$ only if $i = j$, and $\delta_{ij} = 0$ otherwise. Consider also a real-valued smooth function $J : M \to \mathbb{R}$, whose derivative at the point $z$ is given by the linear map $DJ : T_z M \to \mathbb{R}$, where $T_z M$ is the tangent space of $M$ at the point $z$. Hence, there exists a unique gradient vector $\nabla_g J$ such that $\langle \eta, \nabla_g J \rangle^{G(z)}_z = DJ[\eta]$, for all $\eta \in T_z M$, where the operator $\langle \cdot, \cdot \rangle^{G(z)}_z$ represents the inner product associated to the Riemannian metric $G(z)$ at the point $z$, and $DJ[\eta] = \sum_{i=1}^{n} \frac{\partial J}{\partial z_i}\,\eta_i$. Then, $\nabla_g J$ is a Shahshahani gradient with potential function $J$, and $G(z)$ is the Shahshahani metric (Edalat, 2002).
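To make the connection between the replicator dynamics (2) and Definition 4 concrete, the following minimal numerical sketch (ours, not from the paper) integrates (2) with $f = \nabla J$ for an assumed strictly concave potential $J(y) = -\|y - p\|^2/2$, where the target $p$ is a hypothetical choice, and checks that $\mathbf{1}^\top y$ is preserved along the flow, i.e., that $\Delta^n_r$ is positively invariant:

```python
import numpy as np

# Replicator dynamics (2) with f = grad J: a Shahshahani gradient flow on the simplex.
# Illustrative sketch only; J and its maximizer p are assumptions, not from the paper.
def replicator_step(y, grad_J, alpha, dt):
    f = grad_J(y)
    f_bar = f @ y / y.sum()      # average payoff: (1/Y) f^T y with Y = 1^T y
    return y + dt * alpha * y * (f - f_bar)

p = np.array([0.7, 0.2, 0.1])    # hypothetical maximizer of J (lies in the simplex)
grad_J = lambda y: -(y - p)      # gradient of J(y) = -||y - p||^2 / 2, strictly concave

r = 1.0
y = np.array([0.2, 0.3, 0.5])    # initial population state in Delta^3_1
for _ in range(20000):
    y = replicator_step(y, grad_J, alpha=1.0 / r, dt=1e-3)

print(y, y.sum())                # y approaches p, the constrained maximizer; 1^T y stays 1
```

Note that the forward Euler step preserves $\mathbf{1}^\top y$ exactly, since the increment satisfies $\mathbf{1}^\top\left(y \circ (f - \bar{f}\mathbf{1})\right) = f^\top y - \bar{f}\,\mathbf{1}^\top y = 0$.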
Considering the inner product associated with the Shahshahani metric on the set $M = \mathbb{R}^n_{>0}$, we get the gradient system $\dot{z} = G(z)^{-1}\nabla J$, with entries given by

$\dot{z}_i = \frac{z_i}{\mathbf{1}^\top z}\,\frac{\partial J(z)}{\partial z_i}, \quad \forall\, i \in \mathcal{H}^n,$

which under concavity assumptions on $J$ converges to the maximizers of $J$ in $\mathbb{R}^n_{\ge 0}$. Now, consider the set $\mathrm{int}(\Delta^n_r)$, which satisfies $\mathrm{int}(\Delta^n_r) \subset M$, and hence is endowed with the Shahshahani metric. For any $y \in \mathrm{int}(\Delta^n_r)$ the tangent space of $\mathrm{int}(\Delta^n_r)$ is defined as $T_y\Delta^n_r := \{u \in \mathbb{R}^n : \mathbf{1}^\top u = 0\}$. The $i$th term of the perpendicular projection $P(\dot{z})$ of the vector field $\dot{z}$ onto $T_y\Delta^n_r$ is given by $P_y(\dot{z})_i := \dot{z}_i - \langle \dot{z}, \frac{y}{r} \rangle^{G(y)}_y \frac{y_i}{r}$. Hence, defining $z = y$ and $\mathbf{1}^\top z = r$, the projection $P(\dot{z})$ is equal to the replicator system (2) when $\alpha = \frac{1}{r}$ and $f = \nabla J$, i.e., when the RDs are a Shahshahani gradient. As $\dot{y} \in T_y\Delta^n_r$ in (2), the manifold $\Delta^n_r$ is positively invariant under (2), i.e., if $y(t_0) \in \Delta^n_r$ for $t_0 \ge 0$, then $y(t) \in \Delta^n_r$ for all $t \ge t_0$ (Akin, 1990).

Remark 1. When the vector of payoffs associated to the strategy set $\mathcal{H}^n$ is defined as $f = \nabla J$, the population game describes a (full) potential game (Sandholm, 2010), and hence its NE can be entirely characterized by the Karush–Kuhn–Tucker (KKT) first order necessary conditions. Since these conditions are in general not sufficient for local maximization, all local maximizers of $J$ will be NE, but not all the NE will necessarily be local maximizers. However, if $J$ is also concave, the KKT conditions will be sufficient, and the set of NE will correspond to the convex set of maximizers of $J$ on $\Delta^n_r$.

Based on these ideas, and inspired by the dynamics (2), we will formulate a robust Shahshahani gradient-like extremum seeking control (SGESC) in $\mathbb{R}^n_{\ge 0}$, able to online optimize an unknown global potential function $J$, which describes the steady state input-to-output map of a dynamical system, with inputs subject to multivariable constraints similar to those imposed by the sets $\Delta^n_r$ and $\bar{\Delta}^n_r$. This type of constraint encompasses a variety of applications, including optimization with positive saturation constraints, optimal resource allocation, decision making in societal systems, and portfolio optimization, for instance. Subsequently, we will extend the results to multiple subsystems where a potential function $J$ does not exist, but where the existence of a GESS is assumed. Finally, we will address the case where the dynamical system corresponds to a multi-agent system subject to local information constraints.

3. Shahshahani gradient-like ESC

Consider a nonlinear dynamical system of the form

$\dot{\theta} = g(\theta, x), \qquad y = h(\theta),$  (3)

where $\theta \in \mathbb{R}^m$ is the state of the system, $x \in \mathbb{R}^n$ is the input, and $h : \mathbb{R}^m \to \mathbb{R}$ is an unknown smooth function characterizing the output $y$, which is only accessible by measurements. We assume that the mathematical form of the mapping $g(\cdot,\cdot)$ is unknown, but that it satisfies the following classical assumption in extremum seeking problems.

Assumption 1. The function $g : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^m$ is continuously differentiable, and satisfies $g(\theta, x) = 0$ if and only if $\theta = \ell(x)$, where $\ell : \mathbb{R}^n \to \mathbb{R}^m$ is a smooth function, and where the equilibrium $\theta = \ell(x)$ is globally asymptotically stable for Eq. (3), uniformly in $x$.

Define $J(x) := h(\ell(x))$ as the steady state input-to-output map of system (3). Then, we consider the following constrained optimization problem

$\max\; J(x) \quad \text{subject to: } x \in X,$  (4)

where the set $X$ is given either by $\Delta^n_r$ or $\bar{\Delta}^n_r$, with $r$ being a pre-specified positive number, and where the initial condition of the state $x$ is, in general, not restricted to be in $X$.

3.1. Extremum seeking control in $\Delta^n_r$

We start by making the following assumption, which together with Assumption 1 guarantees existence and uniqueness of the solution to problem (4).

Assumption 2. The vector $\nabla J(x)$ satisfies the strictly stable game condition (1) in $\Delta^n_r$, with $\mathrm{NE}(\nabla J) = \{x^*\}$.

Note that in the context of maximizing a global cost function, Assumption 2 is essentially a strict concavity assumption on $J(x)$, which by convexity of $\Delta^n_r$ guarantees the existence of a unique global optimizer $x^* \in \Delta^n_r$. Since the maximizers of $J(x)$ are necessarily NE for the vector $\nabla J$ (see Remark 1), by Lemma 1 the point $x^*$ will also be a GESS in $\Delta^n_r$. To solve problem (4) consider the following ES dynamics:

$\dot{\hat{x}} = k\,\hat{x} \circ \left( \frac{\mathbf{1}^\top \hat{x}}{r}\, h(\theta)\mu(t) + c - \frac{1}{r}\,\hat{x}^\top \left( h(\theta)\mu(t) + c \right) \right),$  (5)

where $x = \hat{x} + a\mu(t)$, $\hat{x} \in \mathbb{R}^n_{\ge 0}$ is an auxiliary state, the operator $(\circ)$ represents the Hadamard product, $c \in \mathbb{R}^n_{>0}$ is the column vector whose entries are equal to the positive parameter $c$, $\mu(t) := [\mu_1(t),\dots,\mu_n(t)]^\top$ is a vector of dither signals with $\mu_i(t) := \sin(\bar{\omega}\omega_i t)$ for $i \in \mathcal{H}^n$, $\bar{\omega} \in \mathbb{R}_{>0}$, and $\omega_i$ is a positive rational number that satisfies $\omega_i \ne \omega_j$ for every $i \ne j$. The adaptation gain is defined as $k = 2\varepsilon\bar{\omega}a^{-1}$, and the scalar parameters $a$ and $\varepsilon$ satisfy $0 < (\varepsilon, a) \ll 1$.

To analyze the closed-loop system given by Eqs. (3) and (5) we start by making the observation that since $\dot{\hat{x}} \to 0$ as $\hat{x} \to 0^+$, and $\dot{\hat{x}}_i = 0$ whenever $\hat{x}_i = 0$, the sets $\mathbb{R}^n_{\ge 0}$ and $\mathbb{R}^n_{>0}$ are positively invariant for (5), for all $\theta \in \mathbb{R}^m$. This also implies positive invariance of the sets $\mathbb{R}^n_{\ge 0} \setminus \{0\}$ and $S_{x^*}(\mathbb{R}^n_{\ge 0})$. Furthermore, note that if $\hat{x}(t) \in \Delta^n_r$, we have

$\mathbf{1}^\top \dot{\hat{x}}(t) = k\left( \frac{\mathbf{1}^\top \hat{x}}{r}\,\hat{x}^\top h(\theta)\mu(t) + \hat{x}^\top c - \frac{\mathbf{1}^\top \hat{x}}{r}\,\hat{x}^\top\left(h(\theta)\mu(t) + c\right) \right) = 0,$  (6)

which implies positive invariance of the set $\Delta^n_r$ under the dynamics (5), for all $\theta \in \mathbb{R}^m$.

We now proceed to neglect the dynamics of the plant (3), such that the output $y = J(x)$ is a direct mapping of the input $x$. By changing the time scale from $t$ to $\tau = t\bar{\omega}$, replacing $k = 2\varepsilon\bar{\omega}a^{-1}$, and expanding the Hadamard product, we obtain the $i$th term of (5), given by

$\frac{\partial \hat{x}_i}{\partial \tau} = \frac{2\varepsilon \hat{x}_i}{a} \left( \frac{\mathbf{1}^\top \hat{x}}{r}\, J(x)\mu_i(t) + c - \frac{1}{r}\, \hat{x}^\top \left( J(x)\mu(t) + c \right) \right),$  (7)

for all $i \in \mathcal{H}^n$, where $x = \hat{x} + a\mu(t)$.
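As a quick illustration of the dynamics (5), the following sketch (ours) neglects the plant (3), as in the derivation of (7), and uses an assumed strictly concave quadratic map $J$ together with the dither frequencies of Example 1 in Section 7; the tuning constants are illustrative assumptions:

```python
import numpy as np

# Forward-Euler sketch of the SGESC dynamics (5) for a static map y = J(x).
# The map J and the parameter values are assumptions for illustration.
n, r = 2, 1.0
a, eps, wbar = 0.01, 0.05, 40.0
w = np.array([1.0, 1.05])             # distinct dither frequencies: wbar*w = [40, 42]
c = 20.0
k = 2 * eps * wbar / a                # adaptation gain k = 2*eps*wbar/a

J = lambda x: -((x[0] - 0.5)**2 + (x[1] - 0.5)**2)   # assumed map, maximizer [0.5, 0.5]

xhat = np.array([0.2, 0.2])           # initialized outside Delta^2_1
dt = 1e-4
for step in range(int(60.0 / dt)):
    t = step * dt
    mu = np.sin(wbar * w * t)
    y = J(xhat + a * mu)              # measured output at x = xhat + a*mu(t)
    s = xhat @ (y * mu + c)           # xhat^T (y*mu(t) + c)
    xhat += dt * k * xhat * ((xhat.sum() / r) * y * mu + c - s / r)

print(xhat, xhat.sum())               # approx [0.5, 0.5], with 1^T xhat approx 1
```

In this sketch the trajectory first gains total mass until $\mathbf{1}^\top\hat{x} \approx r$, and then drifts along the simplex toward the constrained maximizer, consistent with the two-time-scale behavior analyzed below.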
For small values of $\varepsilon$, the right hand side of (7) varies slowly in comparison to the time-varying perturbation signal $\mu(t)$. Using the smoothness of $J$, considering $a$ as a small positive parameter, and following the ideas in Nešić et al. (2010, Proposition 1), we expand $J(\hat{x} + a\mu(t))$ in its Taylor series with respect to $\hat{x}$, and using standard averaging theory (Khalil, 2002), we obtain the average system of (7) in the $\tau$-time scale, given by $\dot{\hat{x}}^A_i = \varepsilon g^A_i(\hat{x}^A)$, where

$g^A_i = \frac{2\hat{x}^A_i}{aT}\int_0^T \frac{\mathbf{1}^\top\hat{x}^A}{r}\left(J(\hat{x}^A) + a\mu(s)^\top\nabla J(\hat{x}^A) + O(a^2)\right)\mu_i(s)\,ds + \frac{2}{T}\int_0^T \hat{x}^A_i\, c\, ds - \frac{2}{aT}\int_0^T \sum_{j=1}^{n} \frac{\hat{x}^A_i \hat{x}^A_j}{r}\left(J(\hat{x}^A) + a\mu(s)^\top\nabla J(\hat{x}^A) + O(a^2)\right)\mu_j(s)\,ds - \frac{2}{T}\int_0^T \frac{\hat{x}^A_i\, \hat{x}^{A\top} c}{r}\, ds,$  (8)

for all $i \in \mathcal{H}^n$, with $T = 2\pi \cdot \mathrm{LCM}\{1/\omega_1,\dots,1/\omega_n\}$. Eq. (8) can be expressed as $g^A_i = g^A_{i1} + g^A_{i2} + g^A_{i3} + g^A_{i4}$, where the terms $g^A_{ik}$, for $k \in \{1,\dots,4\}$, are given by

$g^A_{i1} = \hat{x}^A_i\, \frac{\mathbf{1}^\top\hat{x}^A}{r}\left(\nabla_i J(\hat{x}^A) + O(a)\right), \quad g^A_{i2} = 2\hat{x}^A_i c, \quad g^A_{i3} = -\frac{\hat{x}^A_i}{r}\left(\hat{x}^{A\top}\nabla J(\hat{x}^A) + O(a)\right), \quad g^A_{i4} = -2\hat{x}^A_i\, \frac{\hat{x}^{A\top} c}{r}.$

Therefore, the average system in the $\lambda := \varepsilon\tau$-time scale is

$\frac{\partial \hat{x}^A_i}{\partial \lambda} = \hat{x}^A_i\left(\frac{\mathbf{1}^\top\hat{x}^A}{r}\,\zeta_i - \frac{1}{r}\,\hat{x}^{A\top}\zeta\right) + 2c\,\hat{x}^A_i\left(1 - \frac{\mathbf{1}^\top\hat{x}^A}{r}\right),$  (9)

for all $i \in \mathcal{H}^n$, where $\zeta_i := \nabla_i J(\hat{x}^A) + O(a)$ corresponds to the $i$th entry of the vector $\zeta$. Note that under the dynamics (9) the sets $\mathbb{R}^n_{\ge 0}$, $\mathbb{R}^n_{\ge 0}\setminus\{0\}$, $S_{x^*}(\mathbb{R}^n_{\ge 0})$, and $\Delta^n_r$ preserve the positive invariance property. Now, consider a continuously differentiable Lyapunov function $V_1 : \mathbb{R}^n_{\ge 0} \to \mathbb{R}_{\ge 0}$, given by

$V_1 = \frac{1}{2}\left(\mathbf{1}^\top\hat{x}^A - r\right)^2,$  (10)

which satisfies $V_1 = 0$ if and only if $\hat{x}^A \in \Delta^n_r$, as well as $V_1 > 0$ for all $\hat{x}^A \in \mathbb{R}^n_{\ge 0}\setminus\Delta^n_r$. Let $\varrho(\hat{x}^A) = (\mathbf{1}^\top\hat{x}^A - r)/r$. The derivative of $V_1$ along the trajectories of (9) is given by

$\dot{V}_1 = \left(\mathbf{1}^\top\hat{x}^A - r\right)\mathbf{1}^\top\dot{\hat{x}}^A = r\varrho(\hat{x}^A)\left(\frac{\mathbf{1}^\top\hat{x}^A}{r}\,\hat{x}^{A\top}\zeta + 2c\,\mathbf{1}^\top\hat{x}^A - \frac{2c}{r}\left(\mathbf{1}^\top\hat{x}^A\right)^2 - \frac{\mathbf{1}^\top\hat{x}^A}{r}\,\hat{x}^{A\top}\zeta\right) = -2cr\,\varrho(\hat{x}^A)^2\,\mathbf{1}^\top\hat{x}^A,$  (11)

which is negative definite for all $\hat{x}^A \in \mathbb{R}^n_{\ge 0}\setminus(\Delta^n_r \cup \{0\})$. Furthermore, note that the rate of decrease of $V_1(\hat{x}^A(t))$ is proportional to the tunable parameter $c$. Using (11) and the positive invariance of $\mathbb{R}^n_{\ge 0}\setminus\{0\}$ we obtain AS of the set $\Delta^n_r$ for (9), with basin of attraction containing the set $\mathbb{R}^n_{\ge 0}\setminus\{0\}$, since for all $\hat{x}^A(t_0) \in \Omega_k := \{\hat{x}^A \in \mathbb{R}^n_{\ge 0}\setminus\{0\} : V_1(\hat{x}^A) \le k\}$, for each $k > 0$, we have that $\hat{x}^A(t) \in \Omega_k$ for all $t \ge t_0$, and $\hat{x}^A(t) \to \Delta^n_r$ as $t \to \infty$. Note that in the case that $\hat{x}^A \in \Delta^n_r$, the right hand side of Eq. (9) reduces to

$\frac{\partial \hat{x}^A_i}{\partial \lambda} = \hat{x}^A_i\left(\zeta_i - \frac{1}{r}\sum_{j=1}^{n}\hat{x}^A_j \zeta_j\right),$  (12)

which corresponds to a Shahshahani gradient of the form (2), with perturbations of order $O(a)$ on the derivatives of $J$. Now, consider a continuously differentiable function $V_2 : S_{x^*}(\mathbb{R}^n_{\ge 0}) \to \mathbb{R}$, given by

$V_2(\hat{x}^A) = -\sum_{i=1}^{n} x^*_i \ln\frac{\hat{x}^A_i}{x^*_i},$  (13)

which satisfies $V_2 \ge 0$ for all $\hat{x}^A \in S_{x^*}(\Delta^n_r)$, with equality only at $\hat{x}^A = x^*$, as well as $V_2 \to \infty$ as $\hat{x}^A \to \Delta^n_r \setminus S_{x^*}(\Delta^n_r)$. Then $V_2$ is a valid candidate Lyapunov function in $S_{x^*}(\Delta^n_r)$. The derivative of $V_2$ along the trajectories of system (9) satisfies

$\dot{V}_2 = -\sum_{i=1}^{n} x^*_i\left(\nabla_i J - \frac{1}{r}\,\hat{x}^{A\top}\nabla J\right) - \varrho(\hat{x}^A)\sum_{i=1}^{n} x^*_i\left(\nabla_i J - 2c\right) + O(a) = -\left(x^{*\top}\nabla J - \frac{\mathbf{1}^\top x^*}{r}\,\hat{x}^{A\top}\nabla J\right) - \varrho(\hat{x}^A)\, x^{*\top}\left(\nabla J - 2c\right) + O(a) \le -W(\hat{x}^A) + |\varrho(\hat{x}^A)|\,\|x^*\|\,\|\nabla J - 2c\| + |\varrho(\hat{x}^A)| + O(a),$  (14)

where due to Lemma 1 and the strictly stable game condition of Assumption 2 the function $W(\hat{x}^A) := (x^* - \hat{x}^A)^\top\nabla J(\hat{x}^A)$ is strictly positive in $S_{x^*}(\Delta^n_r)\setminus\{x^*\}$. Let $K \subset S_{x^*}(\mathbb{R}^n_{\ge 0})$ be a compact set with $x^* \in K$. Since $\Delta^n_r$ is AS for (9) in $\mathbb{R}^n_{\ge 0}\setminus\{0\}$, and the set $S_{x^*}(\mathbb{R}^n_{\ge 0}) \subset \mathbb{R}^n_{\ge 0}\setminus\{0\}$ is positively invariant for the same dynamics, we have that for $\hat{x}^A(t_0) \in K$, and each $\epsilon > 0$, there exists $t^* > 0$ such that $\hat{x}^A(t) \in B_\epsilon$, for all $t \ge t^* + t_0$, where $B_\epsilon := (\Delta^n_r + \epsilon\mathbb{B}) \cap S_{x^*}(\mathbb{R}^n_{\ge 0})$. By continuity of $\nabla J(\hat{x}^A)$ there exists $M^* > 0$ such that $\|\nabla J(\hat{x}^A) - 2c\| \le M^*$, for all $\hat{x}^A \in \overline{B}_\epsilon$. Let $\kappa = M^*\|x^*\|$. Then, we have that Eq. (14) satisfies $\dot{V}_2 \le -W(\hat{x}^A) + |\varrho(\hat{x}^A)|(\kappa + 1) + O(a)$ for all $\hat{x}^A \in B_\epsilon$. Using the fact that $|\varrho(\hat{x}^A)| = 0$ in $S_{x^*}(\Delta^n_r) \subset \Delta^n_r$, the continuity of $\hat{x}^A \mapsto W(\hat{x}^A)$ and $\hat{x}^A \mapsto \varrho(\hat{x}^A)$, the negative definiteness of $-W(\hat{x}^A)$ in $S_{x^*}(\Delta^n_r)\setminus\{x^*\}$, and the definition of $O(a)$, we obtain the existence of a pair $(\epsilon^*, a^*)$ such that for all $\hat{x}^A \in B_{\epsilon^*}$ and $a \in (0, a^*]$, the state $\hat{x}^A$ converges in finite time to a positively invariant compact set $\Lambda_a \subset S_{x^*}(\mathbb{R}^n_{\ge 0})$ containing the point $x^*$. Since this set can be made arbitrarily small by decreasing the parameter $a$, we get SPAS (with respect to $S_{x^*}(\mathbb{R}^n_{\ge 0})$) of $x^*$ for system (9). Using the positive invariance of the sets $\Delta^n_r$ and $S_{x^*}(\mathbb{R}^n_{\ge 0})$ for the original time-varying system (7), and Tan, Nešić, and Mareels (2005, Lemma 1), we get SPAS (with respect to $S_{x^*}(\mathbb{R}^n_{\ge 0})$) of $x^*$ for system (7), in the parameters $a$ and $\varepsilon$, uniformly in $a$.

Finally, we analyze the complete system (3)–(5), which in the $\tau$-time scale is in singular perturbation form (Khalil, 2002), with $\bar{\omega}$ as the small perturbation parameter. The "slow dynamics" of this singularly perturbed system are obtained by replacing $\theta = \ell(x)$ in (5), in the $\tau$-time scale. Since the input-to-output map of the plant is defined as $J(x) = h(\ell(x))$, the resulting slow dynamics are equal to system (7), which under Assumption 2 was shown to render the point $x^*$ SPAS (with respect to $S_{x^*}(\mathbb{R}^n_{\ge 0})$) in $(a, \varepsilon)$, uniformly in $a$. Given that Assumption 1 ensures global stability of the boundary layer system associated to the singularly perturbed system, using Tan et al. (2006, Lemma 1), the original closed-loop system, given by (3) and (5), will render the point $(\theta^*, x^*)$ SPAS (with respect to $\mathbb{R}^m \times S_{x^*}(\mathbb{R}^n_{\ge 0})$) in the parameters $(a, \varepsilon, \bar{\omega})$, uniformly in $a$. We summarize our results with the following theorem.

Theorem 1. Consider the closed-loop system given by Eqs. (3) and (5), under Assumptions 1 and 2. Then, the point $(\theta^*, x^*)$ is SPAS (with respect to $\mathbb{R}^m \times S_{x^*}(\mathbb{R}^n_{\ge 0})$) in the parameters $(a, \varepsilon, \bar{\omega})$, uniformly in $a$. Moreover, these dynamics render the set $\mathbb{R}^m \times \Delta^n_r$ positively invariant.

From the previous analysis we can observe that in cases where the state $\hat{x}$ has been initialized outside of the feasible set $\Delta^n_r$ (or pushed out by external perturbations), the positive parameter $c$ governs the rate of the asymptotic convergence of $\hat{x}(t)$ back to $\Delta^n_r$. If the parameter $c$ is chosen sufficiently large, a multiple-time-scale behavior emerges, leading to a faster convergence to $\Delta^n_r$ compared to the convergence to $x^*$. Indeed, by considering the auxiliary variable $z := r\varrho(\hat{x}^A)/\delta$, where $\delta = 1/c$, the average system (9) can be rewritten as

$\frac{\partial \hat{x}^A_i}{\partial \lambda} = \hat{x}^A_i\left(\left(\frac{\delta z}{r} + 1\right)\zeta_i - \frac{1}{r}\,\hat{x}^{A\top}\zeta\right) - 2\hat{x}^A_i\,\frac{z}{r},$  (15a)

$\delta\,\frac{\partial z}{\partial \lambda} = -2\,\frac{z}{r}\,\mathbf{1}^\top\hat{x}^A.$  (15b)

Eq. (15) offers a "singular perturbation" interpretation of the dynamics (9) in the case that $c \gg 1$, where the slow manifold is given by $z = 0$, i.e., $\hat{x}^A \in \Delta^n_r$; the slow dynamics are given by (12); and the fast dynamics are given by (15b) with the state $\hat{x}^A$ constant. In this case it is possible to obtain an approximate characterization of the location of the point $\hat{x}^A_\Delta \in \Delta^n_r$ to which the state $\hat{x}^A(t)$ converges in the faster time scale as $\delta \to 0^+$. This follows by noting that for $c \gg 1$ the dynamics (9) can be approximated as $\dot{\hat{x}}^A_i \approx -2c\,\hat{x}^A_i\,\varrho(\hat{x}^A)$ for $\hat{x}^A \notin \Delta^n_r$. Since we have that $\varrho(\hat{x}^A) > 0$ (resp. $< 0$) whenever $\mathbf{1}^\top\hat{x}^A(t_0) > r$ (resp. $< r$), and the rate of flow of the dynamics $\dot{\hat{x}}^A_i$ is approximately the same for all $i \in \mathcal{H}^n$, the trajectories of $\hat{x}_i(t)$ will approach in the faster time scale the point

$\hat{x}^A_{i\Delta} = \frac{r\,\hat{x}^A_i(t_0)}{\mathbf{1}^\top\hat{x}^A(t_0)} \in \Delta^n_r, \quad \forall\, i \in \mathcal{H}^n.$
Remark 2. A particular feature of system (5) is that the asymptotic stability of $\Delta^n_r$ does not depend on the cost function $J$. In this way, under the ES dynamics (5), local maximizers of $J$ located outside of the feasible set are neither attractive nor stable. Note that even though Theorem 1 requires the existence of a unique maximizer of $J$ in the feasible set, local stability results in $\Delta^n_r$ can be obtained by relaxing Assumption 2 to hold only in a neighborhood of $x^*$. In this case, the weaker and local notion of an evolutionarily stable state (ESS) (Sandholm, 2010) can be used to characterize $x^*$.
3.2. Extremum seeking control in $\bar{\Delta}^n_r$

The framework of evolutionary game theory and population games allows us to modify the dynamics (5) in a straightforward manner for the case when we are interested in achieving extremum seeking of $J$ in the feasible set $\bar{\Delta}^n_r = \{x \in \mathbb{R}^n_{\ge 0} : \mathbf{1}^\top x \le r\}$. This type of setting is usually described as a potential game with entry and exit (Sandholm, 2010). To achieve ES in $\bar{\Delta}^n_r$ we now enforce Assumption 2 to be satisfied with respect to $\bar{\Delta}^n_r$, and we introduce a fictitious slack variable $\hat{x}_{n+1}$, associated to a fictitious strategy $i = n+1$, for which the dither signal is defined as $\mu_{n+1}(t) := 0$, for all $t \ge t_0 \ge 0$. By extending the vectors $\mathbf{1}$ and $c$ to $\mathbb{R}^{n+1}_{>0}$, while retaining $x_i = \hat{x}_i + a\mu_i(t)$ only for $i \in \mathcal{H}^n$, we get an extended system (5), defined in $\mathbb{R}^{n+1}_{\ge 0}$, for which the average system is still given by (9), where now $\hat{x}^A \in \mathbb{R}^{n+1}_{\ge 0}$, and $\zeta_{n+1} = 0$ due to the choice of $\mu_{n+1}(t)$. For this system the Lyapunov function $V_1 : \mathbb{R}^{n+1}_{\ge 0} \to \mathbb{R}_{\ge 0}$ establishes again AS of the set $\Delta^{n+1}_r$, with basin of attraction containing the set $\mathbb{R}^{n+1}_{\ge 0}\setminus\{0\}$. Using again the function $V_2 : S_{x^*}(\mathbb{R}^{n+1}_{\ge 0}) \to \mathbb{R}$, which satisfies $V_2 \ge 0$ for all $\hat{x}^A \in S_{x^*}(\Delta^{n+1}_r)$, with equality only at $\hat{x}^A = x^*$, and therefore is a valid Lyapunov function in $S_{x^*}(\Delta^{n+1}_r)$, and taking the derivative of $V_2$ along the trajectories of the average system, we obtain again Eq. (14) with $\varrho : \mathbb{R}^{n+1}_{\ge 0} \to \mathbb{R}$. Note that in this case, by Assumption 2, we have that $-W(\hat{x}^A)$ is negative definite in $S_{x^*}(\bar{\Delta}^n_r)$ but only negative semidefinite in $S_{x^*}(\Delta^{n+1}_r)$. However, since $\Delta^{n+1}_r$ is AS we have that $\hat{x}^A_{n+1} \to r - \sum_{i=1}^{n}\hat{x}^A_i$. Using again the continuity of $\hat{x}^A \mapsto W(\hat{x}^A)$ and $\hat{x}^A \mapsto \varrho(\hat{x}^A)$, the negative semidefiniteness of $-W(\cdot)$, the fact that $\varrho(\hat{x}^A) \to 0$ as $t \to \infty$, and LaSalle's principle, we obtain SPAS (with respect to $S_{x^*}(\mathbb{R}^{n+1}_{\ge 0})$) of $x^*$ for the average system; and using Tan et al. (2005, Lemma 1) we obtain SPAS (with respect to $S_{x^*}(\mathbb{R}^{n+1}_{\ge 0})$) of $x^*$, in $(a, \varepsilon)$, uniformly in $a$, for the original time-varying system (5). By following the exact same singular perturbation argument of the previous subsection for the complete system (3)–(5) we get the following corollary.

Corollary 1. Consider the closed-loop system (3)–(5), where Eq. (5) is defined in $\mathbb{R}^{n+1}_{\ge 0}$, $\mu_{n+1}(t) := 0$ for all $t \ge t_0 \ge 0$, and $x_i = \hat{x}_i + a\mu_i(t)$ for all $i \in \mathcal{H}^n$. Suppose that Assumption 1 holds, and that Assumption 2 is satisfied with respect to $\bar{\Delta}^n_r$. Then, the point $(\theta^*, x^*)$ is SPAS (with respect to $\mathbb{R}^m \times S_{x^*}(\mathbb{R}^n_{\ge 0})$) in the parameters $(a, \varepsilon, \bar{\omega})$, uniformly in $a$. Moreover, these dynamics render the set $\mathbb{R}^m \times \bar{\Delta}^n_r$ positively invariant.
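The slack-variable construction of Corollary 1 can be sketched as a small extension of the previous simulation (again ours, with an assumed concave map $J$ whose maximizer $[0.35, 0.35]^\top$ lies in the interior of $\bar{\Delta}^2_1$, mimicking the point $x^*_b$ of Example 1): the state is extended with $\hat{x}_{n+1}$, the dither vector with $\mu_{n+1}(t) = 0$, and only the first $n$ entries of $\hat{x}$ are dithered and applied to the map.

```python
import numpy as np

# Slack-variable variant of (5) for ES in {x >= 0 : 1^T x <= r}; J and tuning are assumptions.
n, r = 2, 1.0
a, eps, wbar, cpar = 0.01, 0.05, 20.0, 20.0
w = np.array([1.0, 1.1, 0.0])          # mu_{n+1}(t) := 0 via a zero frequency
k = 2 * eps * wbar / a

J = lambda x: -((x[0] - 0.35)**2 + (x[1] - 0.35)**2)  # assumed maximizer [0.35, 0.35]

xhat = np.array([0.4, 1.0, 1.0])       # extended state in R^{n+1}, off the simplex
dt = 1e-4
for step in range(int(80.0 / dt)):
    t = step * dt
    mu = np.sin(wbar * w * t)          # last entry is identically zero
    y = J(xhat[:n] + a * mu[:n])       # only the physical inputs are dithered
    s = xhat @ (y * mu + cpar)
    xhat += dt * k * xhat * ((xhat.sum() / r) * y * mu + cpar - s / r)

print(xhat[:n], xhat.sum())            # first two entries near [0.35, 0.35]; total mass near r
```

The slack coordinate absorbs the mass $r - \hat{x}_1 - \hat{x}_2$, so the physical inputs can settle at an interior maximizer of the inequality-constrained set.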
Note that for the case when $c \gg 1$ the point $\hat{x}^A_\Delta \in \Delta^{n+1}_r$ will depend on the initial conditions $\hat{x}(t_0)$, and therefore the selection of the initial condition for the slack state $\hat{x}_{n+1}(t_0)$ may play an important role in the optimization problem when multiple maximizers of $J$ exist in $\bar{\Delta}^n_r$.

4. SGESC for non-potential games

According to Remark 1, the problem considered in Section 3, characterized by the existence of a potential function $J$, can be seen as a potential game with vector of payoffs given by $\nabla J$. It turns out that the entropy Lyapunov function $V_2$, considered in Section 3, allows us to directly extend the results of the previous section to the case where a potential function $J$ does not exist, but where there exists a vector of $n$ individual cost functions $J_i$, for all $i \in \mathcal{H}^n$, associated to the input-to-output mappings of $n$ different subsystems. Consequently, in this section we consider a group of $n$ different subsystems, with global state $(\theta, x) \in \mathbb{R}^m \times \mathbb{R}^n$ and individual outputs $y_i$, characterized by the following nonlinear dynamics

$\dot{\theta} = g(\theta, x), \qquad y_i = h_i(\theta), \quad \forall\, i \in \mathcal{H}^n,$  (16)

where $h_i : \mathbb{R}^m \to \mathbb{R}$ is an unknown smooth function accessible only by measurements. The function $g : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^m$ satisfies Assumption 1. Let $J_i(x) := h_i(\ell(x))$ be the steady state input-to-output mapping of the $i$th subsystem, for $i \in \mathcal{H}^n$, and define $\mathcal{D}J(x) := [\nabla_1 J_1(x),\dots,\nabla_n J_n(x)]^\top$. The following assumption is the equivalent of Assumption 2 when a global potential function $J$ is not available for measurements.

Assumption 3. The vector $\mathcal{D}J(x)$ satisfies the strictly stable game condition (1) in $\Delta^n_r$, with $\mathrm{NE}(\mathcal{D}J) := \{x^*\}$.

For this case, the vectorial extremum seeking dynamics (5) are modified as follows:

$\dot{\hat{x}} = k\,\hat{x} \circ \left( \frac{\mathbf{1}^\top\hat{x}}{r}\left(h(\theta)\circ\mu(t)\right) + c - \frac{1}{r}\,\hat{x}^\top\left(h(\theta)\circ\mu(t) + c\right) \right),$  (17)

where $h(\theta)$ is now a vector defined as $h(\theta) := [h_1(\theta),\dots,h_n(\theta)]^\top$, $(\circ)$ represents the Hadamard product, and the parameters $(k, c)$ and the dither signal $\mu(t)$ are selected as in Section 3.
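Before the analysis, a brief sketch (ours) of the dynamics (17) may help: it uses the two maps $J_1$, $J_2$ of the Hawk–Dove game of Example 2 in Section 7 with the plant dynamics neglected, the dither and the parameters $a$, $r$, $c$ of that example, and a larger gain $k$ (an assumption made here to shorten the simulated horizon; Example 2 uses $k = 0.02$):

```python
import numpy as np

# Sketch of the non-potential-game dynamics (17); Hawk-Dove payoffs from Example 2.
J1 = lambda x: 2 * x[0] * x[1] - x[0]**2 / 2
J2 = lambda x: x[1]**2 / 2

r, a, cpar = 1.0, 0.01, 5.0
w = np.array([8.0, 4.4])                    # mu(t) = [sin(8t), sin(4.4t)]
k = 20.0                                    # assumed gain for this sketch (paper example: 0.02)

xhat = np.array([0.2, 0.8])
dt = 1e-3
for step in range(int(300.0 / dt)):
    t = step * dt
    mu = np.sin(w * t)
    x = xhat + a * mu
    hm = np.array([J1(x), J2(x)]) * mu      # h(theta) o mu(t): one measurement per map
    xhat += dt * k * xhat * ((xhat.sum() / r) * hm + cpar - xhat @ (hm + cpar) / r)

print(xhat)   # approaches the Nash equilibrium [0.5, 0.5] of the Hawk-Dove game
```

Here each component of $\hat{x}$ is driven by its own demodulated measurement, which is exactly the vectorial difference between (17) and (5).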
Since the only difference between Eq. (17) in the quasi-steady state (i.e., with $\theta = \ell(x)$) and Eq. (5) is the vectorial nature of $h(\theta)$, it is easy to see that Eq. (6) holds, and the positive invariance of the sets $\mathbb{R}^n_{\ge 0}$, $\mathbb{R}^n_{>0}$, and $\Delta^n_r$ is preserved under the dynamics (17). Furthermore, given that the dynamics of the $i$th term of system (17) (at the quasi-steady state) are equal to Eq. (5) with $J$ replaced by $J_i$, for all $i \in \mathcal{H}^n$, the average system is still given by (9), now with $\zeta_i := \nabla_i J_i + O(a)$. The Lyapunov function $V_1$ given by (10) can be used again to establish AS of $\Delta^n_r$ for (17), with basin of attraction containing the set $\mathbb{R}^n_{\ge 0}\setminus\{0\}$. Using the entropy Lyapunov function (13) we obtain again Eq. (14) with $\nabla_i J_i$ instead of $\nabla_i J$, and where the positivity of $W := (x^* - \hat{x}^A)^\top \mathcal{D}J(\hat{x}^A)$ is now enforced by Assumption 3. Following now the exact same argument as in Section 3, we directly obtain the following corollary.
Corollary 2. Consider the dynamics (16) and (17) under Assumptions 1 and 3. Then, $(\theta^*, x^*)$ is SPAS (with respect to $\mathbb{R}^m \times S_{x^*}(\mathbb{R}^n_{\ge 0})$) in the parameters $(a, \varepsilon, \bar{\omega})$, uniformly in $a$. Moreover, these dynamics render the set $\mathbb{R}^m \times \Delta^n_r$ positively invariant.

5. Distributed SGESC for multi-agent systems

One of the main advantages of the evolutionary dynamics in population games lies in the fact that they can be naturally modified to achieve optimization in large-scale distributed scenarios, where using a centralized controller is not feasible. Therefore, we now extend our results from the previous sections to the case when the extremum seeking problem must be performed in a distributed way by a multi-agent system (MAS) without all-to-all communication. For this purpose, let $(\theta_i, x_i)$ be the state associated to the $i$th agent of a network comprised of $n$ different agents. This network is characterized by an undirected graph $\mathcal{G} = (\mathcal{H}^n, \mathcal{E})$, where the set of vertices $\mathcal{H}^n = \{1,\dots,n\}$ also characterizes the set of agents, and where the set of edges $\mathcal{E} = \{(i,j) : i,j \in \mathcal{H}^n\}$ determines the interaction topology among the agents. We assume that for every pair of vertices in $\mathcal{H}^n$ there is a path that has them as its end vertices, i.e., $\mathcal{G}$ is a connected graph. Due to the undirected nature of the communication links, we have that $(i,j) = (j,i)$, implying that every agent $i$ shares information only with its neighboring agents $j$, which belong to the set $\mathcal{N}_i := \{j \in \mathcal{H}^n : (i,j) \in \mathcal{E}\}$. The adjacency matrix associated to the graph $\mathcal{G}$ is defined as $A = [a_{ij}] \in \{0,1\}^{n\times n}$, where $a_{ij} = 1$ if $(i,j) \in \mathcal{E}$, and $a_{ij} = 0$ otherwise. We assume that each agent has internal unknown dynamics $\dot{\theta}_i$, and has access to measurements of a global cost function $h(\theta) := \sum_{i=1}^{n} U_i(\theta_i)$, where $U_i : \mathbb{R} \to \mathbb{R}$ is a smooth function characterizing the individual payoff of the $i$th agent. Based on this, the model of each agent is given by

$\dot{\theta}_i = g_i(\theta_i, x_i), \quad \forall\, i \in \mathcal{H}^n, \qquad y = h(\theta),$  (18)
where $g_i : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ satisfies Assumption 1 with a function $\ell_i : \mathbb{R} \to \mathbb{R}$, for each $i \in \mathcal{H}^n$. The network of agents aims to maximize an unknown global steady state input-to-output mapping $J(x) := \sum_{i=1}^{n} U_i(\ell_i(x_i))$ by controlling their own input states $x_i$, sharing information only with their neighboring agents, and subject to the global constraint imposed by $\Delta^n_r$. We make the following assumption on the cost function $J$.

Assumption 4. The vector $\nabla J(x)$ satisfies the strictly stable game condition (1), with $\mathrm{NE}(\nabla J) := \{x^*\} \in \mathrm{int}(\Delta^n_r)$.

To solve the distributed ES problem consider the following ES dynamics

$\dot{\hat{x}} = k\,\hat{x} \circ \left( \mathrm{diag}\left(h(\theta)\mu(t) + c\right)(A\hat{x}) - A\,\mathrm{diag}\left(h(\theta)\mu(t) + c\right)\hat{x} \right),$  (19)

where $x = \hat{x} + a\mu(t)$, and where the parameters $(k, c)$ and the perturbation signal $\mu(t)$ are defined as in Section 3. Let $z = h(\theta)\mu(t) + c$. Then, we have

$\mathbf{1}^\top\dot{\hat{x}} = \mathbf{1}^\top k\,\mathrm{diag}(\hat{x})\left(\mathrm{diag}(z)A\hat{x} - A\,\mathrm{diag}(z)\hat{x}\right) = k\left( \left(\mathrm{diag}(z)\hat{x}\right)^\top A\hat{x} - \hat{x}^\top A\left(\mathrm{diag}(z)\hat{x}\right) \right) = 0,$  (20)

where we used the fact that the matrix $A$ is symmetric. Eq. (20) implies positive invariance of the set $\Delta^n_r$ for the time-varying dynamics (19), for all $\theta \in \mathbb{R}^m$, where now we have $r = \mathbf{1}^\top\hat{x}(t_0)$, for $t_0 \ge 0$, and $\hat{x}(t_0) \in \mathbb{R}^n_{>0}$. Note that in this case the "size" of the invariant set $\Delta^n_r$ is defined by the initial conditions of the system, precluding attractivity of $\Delta^n_r$ in $\mathbb{R}^n_{\ge 0}$. This is a natural consequence of using only local information in the MAS, in contrast to the full information dynamics of Sections 3 and 4.
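As a quick numerical check of the conservation property (20), the following sketch (ours; the graph, the payoff functions, and the tuning are assumptions) integrates the distributed dynamics (19) on a three-node path graph:

```python
import numpy as np

# Distributed SGESC sketch for dynamics (19) on the path graph 1-2-3.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)       # symmetric adjacency matrix

a, eps, wbar, cpar = 0.01, 0.05, 40.0, 5.0
w = np.array([1.0, 1.1, 1.2])
k = 2 * eps * wbar / a

# Global measured cost h = sum_i U_i; the plant is neglected, so the U_i are
# evaluated directly at the inputs (assumed quasi-steady state). Strictly concave.
h = lambda x: -np.sum((x - np.array([0.5, 0.3, 0.2]))**2)

xhat = np.array([0.3, 0.4, 0.3])             # r = 1^T xhat(t0) = 1
dt = 1e-4
for step in range(int(40.0 / dt)):
    t = step * dt
    mu = np.sin(wbar * w * t)
    z = h(xhat + a * mu) * mu + cpar         # z = h(theta)mu(t) + c
    xhat += dt * k * xhat * (z * (A @ xhat) - A @ (z * xhat))

print(xhat, xhat.sum())   # 1^T xhat stays exactly at 1; xhat drifts toward the interior NE
```

Since $A$ is symmetric, the Euler increment satisfies $\mathbf{1}^\top\Delta\hat{x} = 0$ exactly, so the conservation in (20) holds step by step and not only in the limit.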
To analyze the distributed dynamics (19) we start again by neglecting the fast dynamics (18), such that the function $J$ is a direct mapping of the input $x$. Expanding the term $k$, using a Taylor series expansion of the cost function, and considering $\varepsilon$ as a small parameter, we obtain by standard averaging theory the average dynamics for the $i$th term of (19), which in the $\tau$-time scale are given by

$\frac{\partial \hat{x}^A_i}{\partial \tau} = \hat{x}^A_i\left( \left(\zeta_i + 2c\right)\sum_{j\in\mathcal{N}_i}\hat{x}^A_j - \sum_{j\in\mathcal{N}_i}\left(\zeta_j + 2c\right)\hat{x}^A_j \right),$  (21)

where now we have $\zeta_i := \nabla_i U_i + O(a)$. The average system (21) corresponds to a distributed replicator equation (Pantoja, Quijano, & Passino, 2011), with perturbations of order $O(a)$ on the gradients of the cost functions. To analyze this system consider the following Lyapunov function proposed in Pantoja et al. (2011)

$V_3(\hat{x}^A) = \max_{i\in\mathcal{H}^n}\left(\nabla_i U_i + 2c\right),$

which is locally Lipschitz continuous in $\hat{x}^A$, and differentiable almost everywhere. It is shown in Pantoja et al. (2011) that under the strict concavity imposed by Assumption 4, the function $V_3$ is a valid Lyapunov function in $\mathrm{int}(\Delta^n_r)$, with global minimum given by $x^*$, and time derivative given by

$\dot{V}_3(\hat{x}^A) = \sum_{k\in I}\lambda_k\,\nabla^2_k U_k\,\frac{\partial \hat{x}^A_k}{\partial \tau},$  (22)

where $I := \{j : V_3(\hat{x}^A) = \nabla_j U_j + 2c\}$ is the set of indices for which there exists a point where the differentiability of $V_3(\hat{x}^A)$ fails. Replacing (21) in (22), and using the invariance of $\Delta^n_r$ to get $\sum_{j\in\mathcal{N}_k}\hat{x}^A_j(t) \le r$, for all $k \in \mathcal{H}^n$ and for all $t \ge t_0$, we obtain

$\dot{V}_3(\hat{x}^A) = \sum_{k\in I}\hat{x}^A_k\,\lambda_k\,\nabla^2_k U_k\left( \sum_{l\in\mathcal{N}_k}\hat{x}^A_l\left(\nabla_k U_k + 2c\right) - \sum_{l\in\mathcal{N}_k}\left(\nabla_l U_l + 2c\right)\hat{x}^A_l \right) + O(a).$  (23)

By the smoothness of $U_i$, for all $i \in \mathcal{H}^n$, and the compactness of $\Delta^n_r$, there exists a $c^* > 0$ such that $|\nabla_i U_i(\hat{x}^A_i)| \le c^*$, for all $\hat{x}^A \in \Delta^n_r$ and all $i \in \mathcal{H}^n$. Given that $\nabla_j U_j \ge \nabla_l U_l$, for all $l \in \mathcal{N}_j$, since $j \in I$, we have that for $c > c^*/2$ the terms in the parentheses of (23) will be positive, and the general expression inside the brackets will be greater than or equal to zero, being zero only when $\nabla_j U_j = \nabla_l U_l$, for all $j \in I$ and $l \in \mathcal{N}_j$, which under the KKT conditions corresponds to the point $x^*$ that maximizes the global function $J$. Finally, since $\hat{x}^A_k(t) > 0$ for all $t \ge t_0$, $\lambda_k > 0$, and by the strict concavity of $J$ imposed by Assumption 4, the term outside the brackets of (23) will be negative definite for all $\hat{x}^A \in \mathrm{int}(\Delta^n_r)\setminus\{x^*\}$. Since the term $O(a)$ can be made arbitrarily small by decreasing the parameter $a$, and by continuity of the first term in (23), the point $x^*$ will be SPAS (with respect to $\mathrm{int}(\Delta^n_r)$) in the parameter $a$, for the average system (21). Using the invariance of $\mathrm{int}(\Delta^n_r)$ for the original time-varying dynamics (19), and Tan et al. (2005, Lemma 1), we obtain that the point $x^*$ is SPAS (with respect to $\mathrm{int}(\Delta^n_r)$) in the parameters $(\varepsilon, a)$, uniformly in $a$, for the dynamics (19), with $\theta_i = \ell_i(x_i)$. Considering now the complete system (18)–(19) in the $\tau$-scale, and making use of Assumption 1, we obtain again a singularly perturbed system, where the boundary layer dynamics are GAS. Using Tan et al. (2006, Lemma 1) we obtain the following theorem.
Fig. 1. Trajectories of $\hat{x}$ superimposed on the contour curves of $J$. The red balls denote the points $\hat{x}^A_\Delta$, and the green ball denotes the GESS that maximizes $J$ in $\Delta^2_1$.
Theorem 2. Consider the dynamics (18)–(19), under Assumption 1 with $\theta_i = \ell_i(x_i)$ for each $i \in \mathcal{H}^n$, and Assumption 4. Then, there exists $c^* > 0$ such that for all $c > c^*$ the point $(\theta^*, x^*)$ is SPAS (with respect to $\mathbb{R}^m \times \mathrm{int}(\Delta^n_r)$) in the parameters $(a, \varepsilon, \bar{\omega})$, uniformly in $a$. Moreover, these dynamics render the set $\mathbb{R}^m \times \Delta^n_r$ positively invariant, where $r = \mathbf{1}^\top\hat{x}(t_0)$.

Remark 3. Distributed extremum seeking under inequality constraints in $\bar{\Delta}^n_r$ can be achieved in a similar way as presented in Section 3.2, i.e., by adding to the system a fictitious agent (node) with perturbation signal $\mu_{n+1}(t) := 0$, $\forall\, t \ge 0$, which plays the role of a slack variable.

6. Further considerations

It should be noted that even though the ES dynamics (5), (17), and (19) guarantee positive invariance of $\bar{\Delta}^n_r$ (or $\Delta^n_r$) for the auxiliary state $\hat{x}$, the actual state $x$ will only satisfy $x \in \bar{\Delta}^n_r + a\mathbb{B}$, since $x = \hat{x} + a\mu(t)$. Although the parameter $a$ is generally selected to be small, the excursions of $x$ out of $\bar{\Delta}^n_r$ may be problematic if the constraints imposed by $\Delta^n_r$ correspond to physical constraints, such as a fixed available resource. In order to alleviate this potential issue, the initial conditions $\hat{x}(t_0)$ must be selected in $\mathbb{R}^n_{>0} \subset S_{x^*}(\mathbb{R}^n_{\ge 0})$, and the parameter $r$ should be selected to satisfy $r < r^*$, where $r^*$ corresponds to the critical physical constraint. In this way, some room is given to $x$ to leave $\bar{\Delta}^n_r$ while still satisfying $x \in \bar{\Delta}^n_{r^*}$. This may also be relevant in cases where the initial conditions are defined outside the feasible set, since the convergence to $\Delta^n_r$ and $\bar{\Delta}^n_r$ is only asymptotic.

Another consideration to take into account is the well-known trade-off between the size of the residual set where convergence is achieved and the rate of convergence of the ES dynamics; see for instance Tan et al. (2006). In practice, acceptable residual sets might require the selection of the parameters $(a, \varepsilon, \bar{\omega})$ sufficiently small, leading to unacceptably slow convergence rates. A standard approach to address this issue is to include a low-pass filter and/or a high-pass filter in the closed-loop system, allowing a significant increase of the adaptation gain $k$ without excessively deteriorating the steady state performance of the algorithm; see for instance Nešić et al. (2010), Tan et al. (2006), and Poveda and Quijano (2013).

7. Numerical examples

Example 1. Consider the non-quadratic static mapping

$J(x) := \sum_{j=1}^{4} B_j \exp\left( -\frac{(x_1 - x^*_{1j})^2 + (x_2 - x^*_{2j})^2}{c_j} \right),$

which must be maximized on $\Delta^2_1$. The parameters $B_j$, $c_j$, $x^*_{1j}$, and $x^*_{2j}$ correspond to the $j$th entries of the vectors $B$, $c$, $x^*_1$, and $x^*_2$, given by $B = [3, 15, 3, 3]^\top$, $c = [0.01, 1.5, 0.01, 0.01]^\top$, $x^*_1 = [0.65, 0.5, 0.35, 0.8]^\top$, and $x^*_2 = [0.9, 0.5, 0.35, 0.7]^\top$. Note that the cost function $J(x)$ has 4 local maxima in $\mathbb{R}^2_{>0}$, but only one in $\Delta^2_1$, given by $x^* = [0.5, 0.5]^\top$. Furthermore, it can be checked that $x^*$ satisfies Assumption 2. The parameters of the SGESC are defined as $a = 0.01$, $k = 0.08$, $c = 20$, $r = 1$, and $\mu(t) = [\sin(40t), \sin(42t)]^\top$. Fig. 1 shows three different trajectories of $\hat{x}(t)$ superimposed on the contour curves of $J$, each with an initial condition located outside of $\Delta^2_1$, given by $\hat{x}(0) = [0.1, 0.7]^\top$, $\hat{x}(0) = [0.2, 0.2]^\top$, and $\hat{x}(0) = [0.9, 0.75]^\top$. The trajectories converge in a faster time scale (see the inset for $\hat{x}(0) = [0.1, 0.7]^\top$) to a neighborhood of the points $\hat{x}^A_\Delta = [0.125, 0.875]^\top$, $\hat{x}^A_\Delta = [0.5, 0.5]^\top$, and $\hat{x}^A_\Delta = [0.545, 0.454]^\top$, respectively, located in $\Delta^2_1$. Then, the trajectories seek the maximizer $x^* = [0.5, 0.5]^\top$.

Now, consider the case when the cost function $J(x)$ must be optimized in $\bar{\Delta}^2_1$. In this case there are two local maximizers of $J(x)$ in $\bar{\Delta}^2_1$, given by $x^*_a = [0.5, 0.5]^\top$ and $x^*_b = [0.35, 0.35]^\top$. We define the dither signal as $\mu(t) = [\sin(20t), \sin(22t), 0]^\top$, and hence a slack variable $\hat{x}_3(t)$ is introduced into the control system to allow the optimization in the extended set $\Delta^3_1$. Fig. 2 shows the trajectory of $\hat{x}(t)$, for $\hat{x}(0) = [0.4, 1, 1]^\top$, converging in a faster time scale to a neighborhood of the point $\hat{x}^A_\Delta = [0.166, 0.416, 0.416]^\top \in \Delta^3_1$, and then flowing until the coordinates $\hat{x}_1$ and $\hat{x}_2$ converge to a neighborhood of $x^*_b$. The inset shows the evolution in time of the vector state $\hat{x}(t)$. Fig. 3 shows the case when the initial condition is defined as $\hat{x}(0) = [0.4, 1, 0.01]^\top$. Here, we have that $\hat{x}^A_\Delta = [0.283, 0.709, 0.007]^\top \in \Delta^3_1$, which is located closer to the maximizer $x^*_a$, and hence the trajectory of the system converges to a neighborhood of $x^*_a$.
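Since all the parameters of Example 1 are given explicitly, the claimed constrained maximizer can be verified numerically; the following snippet (ours) evaluates $J$ along the simplex $\Delta^2_1$, parameterized by $x_2 = 1 - x_1$:

```python
import numpy as np

# Numerical check of Example 1: J has four local maxima in R^2_{>0},
# but its maximizer restricted to Delta^2_1 is x* = [0.5, 0.5].
B   = np.array([3.0, 15.0, 3.0, 3.0])
cj  = np.array([0.01, 1.5, 0.01, 0.01])
x1s = np.array([0.65, 0.5, 0.35, 0.8])
x2s = np.array([0.9, 0.5, 0.35, 0.7])

def J(x1, x2):
    return (B * np.exp(-((x1 - x1s)**2 + (x2 - x2s)**2) / cj)).sum()

ts = np.linspace(0.0, 1.0, 100001)       # grid on the simplex
vals = np.array([J(t, 1.0 - t) for t in ts])
best = ts[vals.argmax()]
print(best, 1.0 - best)                  # approx (0.5, 0.5)
```

The dominant term on the simplex is the wide peak ($B_2 = 15$, $c_2 = 1.5$) centered at $[0.5, 0.5]^\top$; the three narrow peaks lie off the simplex and decay too quickly to matter there.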
Example 2. We now consider the case when a global potential function $J$ does not exist and Assumption 3 is satisfied. We consider the dynamical system given by

$\dot{\theta}_1(t) = -4\theta_1^2(t) + 9x_1^2(t), \qquad \dot{\theta}_2(t) = -2\theta_2(t) + x_2(t),$
$y_1(t) = \frac{8}{3}\theta_1(t)\theta_2(t) - \frac{2}{9}\theta_1^2(t), \qquad y_2(t) = 2\theta_2^2(t),$

which generates two steady state input-to-output mappings, given by $J_1 = 2x_1 x_2 - \frac{x_1^2}{2}$ and $J_2 = \frac{x_2^2}{2}$. Assumption 3 is satisfied in $\Delta^2_1$ with $x^* = [0.5, 0.5]^\top$ and $\mathcal{D}J = [2x_2 - x_1,\; x_2]^\top$, which in turn defines a classic Hawk–Dove game with $\mathrm{NE}(\mathcal{D}J) = \{x^*\} = \{[0.5, 0.5]^\top\}$. Fig. 4 shows the trajectories of $\hat{x}(t)$ converging to a neighborhood of the Nash equilibrium $x^*$ of this Hawk–Dove game, where only measurements of $J_1$ and $J_2$ are available, and where the initial conditions were selected as $\hat{x}(0) = [0.2, 0.8]^\top$.
Fig. 2. Trajectory of $\hat{x}$ converging to the simplex $\Delta^3_1$, and its component in $\mathbb{R}^2_{\ge 0}$ converging to $x^*_b$ in $\bar{\Delta}^2_1$. The red ball denotes the point $\hat{x}^A_\Delta$.
Fig. 3. Trajectory of $\hat{x}$ converging to the simplex $\Delta^3_1$, and its component in $\mathbb{R}^2_{\ge 0}$ converging to $x^*_a$ in $\bar{\Delta}^2_1$. The red ball denotes the point $\hat{x}^A_\Delta$.
Fig. 4. Trajectories of $\hat{x}(t)$ converging to a neighborhood of the interior Nash equilibrium $x^*$.
In this case the parameters of the controller are defined as $\mu(t) = [\sin(8t), \sin(4.4t)]^\top$, $a = 0.01$, $k = 0.02$, $r = 1$, and $c = 5$.

It should be noted that, in general, the strictly stable game assumption imposed on the gradients of the steady state mappings cannot be relaxed. Indeed, there are examples of (non-strictly) stable games for which the ES dynamics (5), (17), and (19) do not converge to the NE $x^*$. In these cases, however, it is possible to design ESCs based on other types of evolutionary dynamics, different from replicator-like systems, that achieve (semi-global practical) convergence to $x^*$ in $\Delta^n_r$.

8. Conclusions

We have presented novel extremum seeking controllers, inspired by evolutionary dynamics and population games, designed to perform real-time optimization and learning in dynamical
systems with constrained inputs. The algorithms also achieve distributed extremum seeking in multi-agent systems where an all-to-all communication structure is absent. Possible applications include areas such as distributed resource allocation; congestion, flow, and traffic control; and dynamic portfolio optimization.

References

Akin, E. (1990). The differential geometry of population genetics and evolutionary games. In S. Lessard (Ed.), Mathematical and statistical developments of evolutionary theory (pp. 1–93). Dordrecht: Kluwer.
Ariyur, K. B., & Krstić, M. (2003). Real-time optimization by extremum-seeking control. Hoboken, NJ: Wiley.
Bomze, I. M. (2005). Portfolio selection via replicator dynamics and projections of indefinite estimated covariances. Dynamics of Continuous, Discrete and Impulsive Systems B, 12, 527–564.
Dafermos, S. (1980). Transportation equilibrium and variational inequalities. Transportation Science, 14, 42–54.
DeHaan, D., & Guay, M. (2005). Extremum-seeking control of state-constrained nonlinear systems. Automatica, 41, 1567–1574.
Dürr, H., Zeng, C., & Ebenbauer, C. (2013). Saddle point seeking for convex optimization problems. In 9th IFAC symposium on nonlinear control systems (pp. 540–545).
Edalat, A. (2002). Shahshahani gradients. In Papers contributed in honor of Siavash Shahshahani's 60th birthday.
Fox, M. J., & Shamma, J. S. (2013). Population games, stable games, and passivity. Games, 4, 561–583.
Frihauf, P., Krstić, M., & Başar, T. (2012). Nash equilibrium seeking in noncooperative games. IEEE Transactions on Automatic Control, 57(5), 1192–1206.
Hofbauer, J., & Sigmund, K. (1998). Evolutionary games and population dynamics. Cambridge, UK: Cambridge University Press.
Khalil, H. K. (2002). Nonlinear systems. Upper Saddle River, NJ: Prentice Hall.
Kvaternik, K., & Pavel, L. (2012). An analytic framework for decentralized extremum seeking control. In Proc. of American control conference (pp. 3371–3376).
Nešić, D., Tan, Y., Moase, W. H., & Manzie, C. (2010). A unifying approach to extremum seeking: Adaptive schemes based on estimation of derivatives. In Proc. 49th IEEE conference on decision and control (pp. 4625–4630).
Pantoja, A., Quijano, N., & Passino, K. (2011). Dispatch of distributed generators using a local replicator equation. In Proc. 50th IEEE conference on decision and control (pp. 7494–7499).
Poveda, J., & Quijano, N. (2013). Distributed extremum seeking for real-time resource allocation. In Proc. of American control conference (pp. 2772–2777).
Ramirez-Llanos, E., & Quijano, N. (2010). A population dynamics approach for the water distribution problem. International Journal of Control, 83, 1947–1964.
Sandholm, W. H. (2010). Population games and evolutionary dynamics. The MIT Press.
Sandholm, W. H., & Hofbauer, J. (2009). Stable games and their dynamics. Journal of Economic Theory, 144(4), 1665–1693.
Smith, M. J. (1979). Existence, uniqueness, and stability of traffic equilibria. Transportation Research B, 13, 295–304.
Stanković, M. S., Johansson, K. H., & Stipanović, D. M. (2012). Distributed seeking of Nash equilibria with applications to mobile sensor networks. IEEE Transactions on Automatic Control, 57(4), 904–919.
Tan, Y., Li, Y., & Mareels, I. (2013). Extremum seeking for constrained inputs. IEEE Transactions on Automatic Control, 58(9), 2405–2410.
Tan, Y., Nešić, D., & Mareels, I. (2005). On non-local stability properties of extremum seeking control. In Proc. of the IFAC world congress (pp. 2807–2812).
Tan, Y., Nešić, D., & Mareels, I. (2006). On non-local stability properties of extremum seeking controllers. Automatica, 42, 889–903.
Teel, A. R., Moreau, L., & Nešić, D. (2003). A unified framework for input-to-state stability in systems with two time scales. IEEE Transactions on Automatic Control, 48(9), 1526–1544.
Jorge I. Poveda received the B.S. degree in Electronics Engineering and the B.S. degree in Mechanical Engineering, both from the University of Los Andes, Bogotá, Colombia, in 2012, and the M.S. degree in Electrical Engineering from the same university in 2013. He is the recipient of the 2013 Center for Control, Dynamical Systems, and Computation Outstanding Scholar Fellowship at the University of California, Santa Barbara, where he is currently pursuing the Ph.D. degree in the area of control systems in the Department of Electrical and Computer Engineering. His current research interests include hybrid dynamical systems, optimization, and robust learning.

Nicanor Quijano received his B.S. degree in Electronics Engineering from Pontificia Universidad Javeriana (PUJ), Bogotá, Colombia, in 1999. He received the M.S. and Ph.D. degrees in Electrical and Computer Engineering from The Ohio State University, in 2002 and 2006, respectively. In 2007 he joined the Electrical and Electronics Engineering Department, University of Los Andes (Uniandes), Bogotá, Colombia, as an Assistant Professor. In 2008 he obtained the Distinguished Lecturer Award from the School of Engineering, Uniandes. He is currently an Associate Professor, the director of the research group in control and automation systems (GIAP, Uniandes), and a member of the Board of Governors of the IEEE CSS for the 2014 period. He was the chair of the IEEE Control Systems Society (CSS), Colombia, for the 2011–2013 period. His research interests include hierarchical and distributed optimization methods, using bio-inspired and game-theoretical techniques for dynamic resource allocation, applied to problems in energy, water, and transportation.