Infinite horizon linear-quadratic Stackelberg games for discrete-time stochastic systems

Automatica 76 (2017) 301–308

Contents lists available at ScienceDirect

Automatica journal homepage: www.elsevier.com/locate/automatica

Brief paper

Hiroaki Mukaidani a,1, Hua Xu b

a Institute of Engineering, Hiroshima University, 1-4-1 Kagamiyama, Higashi-Hiroshima, 739-8527, Japan
b Graduate School of Business Sciences, The University of Tsukuba, 3-29-1 Bunkyou-ku, Tokyo, 112-0012, Japan

Article history: Received 3 April 2015; Received in revised form 15 September 2016; Accepted 15 September 2016

Keywords: Stochastic systems; Stackelberg games; Nash games; Cross-coupled stochastic algebraic equations (CSAEs); Hierarchical H∞-constraint control

Abstract: In this paper, we consider infinite horizon linear-quadratic Stackelberg games for a discrete-time stochastic system with multiple decision makers. Necessary conditions for the existence of the Stackelberg strategy set are derived in terms of the solvability of cross-coupled stochastic algebraic equations (CSAEs). As an important application, the hierarchical H∞-constraint control problem for the infinite horizon discrete-time stochastic system with multiple channel inputs is solved using the Stackelberg game approach. Computational methods for solving the CSAEs are also discussed. A numerical example is provided to demonstrate the usefulness of the proposed algorithms. © 2016 Elsevier Ltd. All rights reserved.

1. Introduction

Many researchers have extensively investigated the dynamic games of continuous- and discrete-time systems (see Basar & Olsder, 1999 and the references therein). Recently, due to the growth of interest in multi-agent and cooperative systems, optimal cooperation and team collaboration have been widely investigated. For example, neural networks were utilized to find suitable approximations of the optimal-value function and saddle-point solutions (Zhang, Qin, Jiang, & Luo, 2014). Conditions for the existence of Pareto optimal solutions for linear-quadratic infinite horizon cooperative differential games were derived (Reddya & Engwerda, 2013). A design method for the synchronization control of discrete-time multi-agent systems on directed communication graphs has been discussed (Movric, You, Lewis, & Xie, 2013). Open-loop Nash games for a class of polynomial systems via a state-dependent Riccati equation have been studied (Lizarraga, Basin,

✩ This work was supported by JSPS KAKENHI Grant Numbers 26330027, 16K00029. The material in this paper was partially presented at the 2014 American Control Conference, June 4–6, 2014, Portland, OR, USA. This paper was recommended for publication in revised form by Associate Editor Michael V. Basin under the direction of Editor Ian R. Petersen. E-mail addresses: [email protected] (H. Mukaidani), [email protected] (H. Xu). 1 Fax: +81 82 424 6476.

http://dx.doi.org/10.1016/j.automatica.2016.10.016 0005-1098/© 2016 Elsevier Ltd. All rights reserved.

Rodriguez, & Rodriguez, 2015). Among various dynamic games, the Stackelberg game is one that involves hierarchical decisions among different decision makers, yielding significant and useful work (Abou-Kandil, 1990; Basar, Bensoussan, & Sethi, 2010; Bensoussan, Chen, & Sethi, 2013, 2014, 2015; Jungers & Oara, 2010; Jungers, Trelat, & Abou-Kandil, 2008; Jungers, Trelat, & Aboukandil, 2011; Li, Cruz, & Simaan, 2002; Medanic, 1978; Xu, Zhang, & Chai, 2015). Although many researchers have studied Stackelberg games, they have only focused on those for finite horizon deterministic continuous- or discrete-time systems. To the best of our knowledge, the Stackelberg game for infinite horizon discrete-time systems is still unsolved. The infinite horizon Stackelberg game is difficult to solve, because it involves some higher-order algebraic nonlinear matrix equations rather than difference equations in the finite horizon case (Abou-Kandil, 1990) and the team-optimal state feedback Stackelberg strategy case (Li et al., 2002). Recent advances in the theory of discrete-time stochastic systems have led to the reconsideration of various control problems for infinite horizon discrete-time stochastic systems (Huang, Zhang, & Zhang, 2008; Zhang, Huang, & Xie, 2008). It has been shown that the optimal or mixed H2 /H∞ feedback controllers can be constructed by solving certain stochastic algebraic Riccati equations (SAREs). However, corresponding results have not been found in dynamic game settings, especially in dynamic games with multiple decision makers, except for Mukaidani, Tanabata, and Matsumoto (2014) who only considered Nash games. Taking into


consideration the fact that dynamic game approaches have found many applications in the control field, the investigation of hierarchical dynamic games for discrete-time stochastic systems is extremely promising. Stackelberg games for a class of stochastic systems have been considered in Mukaidani (2013, 2014a,b) and Mukaidani and Xu (2015). In Mukaidani (2013) and Mukaidani and Xu (2015), Stackelberg games for continuous-time stochastic systems with multiple decision makers were studied. In Mukaidani (2014a,b), Stackelberg games for a discrete-time stochastic system with one leader and one follower were studied. In particular, in Mukaidani (2013, 2014a), the mixed H2/H∞ control problem was investigated, where the follower is interpreted as a deterministic disturbance.

In this paper, we investigate the infinite horizon linear-quadratic Stackelberg game for a class of discrete-time linear stochastic systems with state-dependent noise. The results in Mukaidani (2014a) were merely preliminary results for a standard Stackelberg game with one follower. In this paper, we extend the results in Mukaidani (2014a) to the Stackelberg game with multiple followers. Moreover, we study the hierarchical H∞-constraint control problem with multiple decision makers using this Stackelberg game approach. Although the Lagrange multiplier technique is used to derive the necessary conditions in the same way as in Mukaidani and Xu (2015), the derivation procedures, and consequently the results, are completely different.

Notation: The notation used in this paper is fairly standard. The superscript T denotes matrix transpose. I_n denotes the n × n identity matrix. E[·] denotes the expectation operator. Tr denotes the trace of a matrix. vec denotes the column-vector stacking of a matrix. δ_ij denotes the Kronecker delta. block diag denotes the block diagonal matrix. X denotes the set {X_0, X_1, . . . , X_N}. ρ(·) denotes the spectral radius function.

2. Preliminary

Before investigating the Stackelberg game, some useful lemmas are introduced. Consider the following discrete-time stochastic system:

x(k + 1) = Ax(k) + Bu(k) + A_p x(k)w(k),  x(0) = x_0,   (1a)
y(k) = Cx(k),   (1b)

where x(k) ∈ R^n represents the state vector, u(k) ∈ R^m represents the control input, y(k) ∈ R^p represents the measured output, and w(k) ∈ R is a one-dimensional sequence of real random variables defined on a complete probability space, which is a wide-sense stationary, second-order process with E[w(k)] = 0 and E[w(s)w(k)] = δ_sk (Huang et al., 2008; Zhang et al., 2008). l_w^2(N, R^n) denotes the set of nonanticipative square-summable stochastic processes y = {y(k) : y(k) ∈ R^n} = {y(0), y(1), . . .} such that y(k) is F_{k-1}-measurable, where F_k denotes the σ-algebra generated by w(s), s = 0, 1, . . . , k. The l2-norm of y(k) ∈ l_w^2(N, R^n) is defined by ∥y(k)∥²_{l_w^2} := Σ_{k=0}^∞ E[∥y(k)∥²] (Huang et al., 2008; Zhang et al., 2008).

The following definition has been introduced in Huang et al. (2008) and Zhang et al. (2008).

Definition 1. An autonomous discrete-time stochastic system x(k + 1) = Ax(k) + A_p x(k)w(k) (u(k) ≡ 0 in (1a)) is called asymptotically mean square stable (AMSS) if for any x_0 the corresponding state satisfies lim_{k→∞} E[∥x(k)∥²] = 0. Moreover, (A, A_p | C) is exactly observable if y(k) ≡ 0 almost surely ∀k ∈ N ⇒ x(0) = 0.

When system (1) is AMSS, (A, A_p) is called stable for short. The following lemma has been proved (Huang et al., 2008).

Lemma 1. If (A, A_p) is stable, then for any C the following stochastic algebraic Lyapunov equation (SALE)

-P + A^T P A + A_p^T P A_p + C^T C = 0   (2)

admits a unique solution P ≥ 0. If (A, A_p | C) is exactly observable, then (A, A_p) is stable if and only if (2) has a unique solution P > 0. Moreover,

Σ_{k=0}^∞ E[x^T(k) C^T C x(k)] = E[x^T(0) P x(0)],   (3)

where x(k + 1) = Ax(k) + A_p x(k)w(k).

Definition 2. System (1) is stabilizable in the mean square sense if there exists a feedback control u(k) = Kx(k), with K a constant matrix, such that for any x_0 the closed-loop system is AMSS. The triple of matrices (A, B, A_p) is called stabilizable if and only if the stochastic system (1) is stabilizable.

The following lemma plays a key technical role in this paper.

Lemma 2 (Huang et al., 2008; Zhang et al., 2008). The following stochastic linear-quadratic (LQ) control problem is considered subject to (1): minimize

J(u) := Σ_{k=0}^∞ E[x^T(k) Q x(k) + u^T(k) R u(k)],  Q = Q^T ≥ 0,  R = R^T > 0.   (4)

If (A, B, A_p) is stabilizable and (A, A_p | √Q) is exactly observable, then the following stochastic algebraic Riccati equation (SARE) has a unique solution P = P*:

-P + A^T P A + A_p^T P A_p + Q - A^T P B(R + B^T P B)^{-1} B^T P A = 0.   (5)

Furthermore, J(u*) = E[x^T(0) P* x(0)], and the optimal feedback control is given by

u*(k) = K* x(k) = -(R + B^T P B)^{-1} B^T P A x(k).   (6)
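As a numerical illustration of Lemmas 1 and 2 (not part of the original paper; all matrices below are arbitrary illustrative values), the SALE (2) can be solved by vectorization, using vec(A^T P A) = (A^T ⊗ A^T) vec(P), and the SARE (5) by a plain fixed-point iteration on its right-hand side — a sketch only, with no convergence analysis:

```python
import numpy as np

def solve_sale(A, Ap, W):
    """Solve the SALE (2): -P + A'PA + Ap'P Ap + W = 0 with W = C'C,
    via vectorization: (I - kron(A',A') - kron(Ap',Ap')) vec(P) = vec(W)."""
    n = A.shape[0]
    lhs = np.eye(n * n) - np.kron(A.T, A.T) - np.kron(Ap.T, Ap.T)
    return np.linalg.solve(lhs, W.flatten(order="F")).reshape((n, n), order="F")

def solve_sare(A, Ap, B, Q, R, iters=500):
    """Fixed-point iteration on the right-hand side of the SARE (5)."""
    P = np.zeros_like(A)
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = A.T @ P @ A + Ap.T @ P @ Ap + Q - A.T @ P @ B @ G
    return P

# Illustrative stable pair (A, Ap) with exactly observable output C = I.
A = np.array([[0.6, 0.1], [0.0, 0.5]])
Ap = 0.1 * A
B = np.array([[0.0], [1.0]])
C = np.eye(2)
R = np.array([[1.0]])
P_lyap = solve_sale(A, Ap, C.T @ C)                              # Lemma 1
P_ric = solve_sare(A, Ap, B, np.eye(2), R)                       # Lemma 2
K_opt = -np.linalg.solve(R + B.T @ P_ric @ B, B.T @ P_ric @ A)   # Eq. (6)
```

For the deterministic special case A_p = 0, SciPy's `solve_discrete_lyapunov` and `solve_discrete_are` provide library implementations of the same computations.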

3. Stackelberg game with multiple followers

The following stochastic system with N + 1 decision makers involving state-dependent noise is considered:

x(k + 1) = Ax(k) + B_0 u_0(k) + Σ_{i=1}^N B_i u_i(k) + A_p x(k)w(k),  x(0) = x_0,   (7)

where u_i(k) ∈ R^{m_i}, i = 0, 1, . . . , N, represents the ith control input. Here, i = 0 corresponds to the leader's control input, and the other values of i correspond to the followers' inputs. The initial state x(0) = x_0 is assumed to be a random variable with covariance matrix E[x(0)x^T(0)] = I_n. Without loss of generality, the following basic assumption is made:

Assumption 1. (A, B_i, A_p), i = 0, 1, . . . , N, is stabilizable.
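Mean square stabilizability can be checked numerically. For this noise model (E[w(k)] = 0, E[w(s)w(k)] = δ_sk), the second moment X(k) = E[x(k)x^T(k)] of the closed-loop system evolves as X(k+1) = (A + BK)X(k)(A + BK)^T + A_p X(k)A_p^T, so AMSS of the closed loop is equivalent to ρ((A+BK) ⊗ (A+BK) + A_p ⊗ A_p) < 1 — a standard second-moment characterization (cf. Huang et al., 2008). A minimal sketch (not from the paper; matrices are illustrative):

```python
import numpy as np

def is_amss(A, Ap):
    """AMSS test for x(k+1) = A x(k) + Ap x(k) w(k): the second moment
    X = E[x x'] obeys X(k+1) = A X A' + Ap X Ap', a linear map with
    matrix kron(A, A) + kron(Ap, Ap); AMSS iff its spectral radius < 1."""
    T = np.kron(A, A) + np.kron(Ap, Ap)
    return float(np.max(np.abs(np.linalg.eigvals(T)))) < 1.0

def is_ms_stabilizing(A, B, Ap, K):
    """Does the feedback u(k) = K x(k) make (A + BK, Ap) stable?"""
    return is_amss(A + B @ K, Ap)

# Illustrative data: open loop unstable in mean square, stabilized by K.
A = np.array([[1.2, 0.0], [0.0, 0.5]])
B = np.array([[1.0], [0.0]])
Ap = 0.1 * np.eye(2)
K = np.array([[-0.8, 0.0]])
```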

H. Mukaidani, H. Xu / Automatica 76 (2017) 301–308

The cost functions for the decision makers are defined by

J_0(u_0, u_1, . . . , u_N) := Σ_{k=0}^∞ E[x^T(k)Q_0 x(k) + u_0^T(k)R_{00} u_0(k) + Σ_{j=1}^N u_j^T(k)R_{0j} u_j(k)],   (8a)

J_i(u_0, u_1, . . . , u_N) := Σ_{k=0}^∞ E[x^T(k)Q_i x(k) + u_0^T(k)R_{i0} u_0(k) + u_i^T(k)R_{ii} u_i(k)],  i = 1, . . . , N,   (8b)

where Q_0 = Q_0^T ≥ 0, Q_i = Q_i^T ≥ 0, R_{00} = R_{00}^T > 0, R_{0j} = R_{0j}^T ≥ 0, R_{i0} = R_{i0}^T ≥ 0, R_{ii} = R_{ii}^T > 0. It should be noted that although the inputs of the other decision makers do not appear in (8b), the cost functions depend on each other through the state equation (7). Therefore, J_i, i = 1, . . . , N, is a function of u_1, . . . , u_N.

Assumption 2. (A, A_p | √Q_i), i = 0, 1, . . . , N, are exactly observable.

Under the assumption that all the decision makers employ closed-loop memoryless strategies u_i := u_i(k, x(0), x(k)) (Basar & Olsder, 1999), a strategy set (u_0^*, u_1^*, . . . , u_N^*) is called a Nash-based Stackelberg strategy set if the following conditions are met:

J_i^{o*} ≤ J_{-i}^{o*},  ∀u_i^o = u_i(u_0) ∈ R^{m_i}, i = 1, . . . , N,   (9a)
J_0(u_0^*, u_1^*, . . . , u_N^*) = min_{u_0} J_0(u_0, u^{o*}),   (9b)

with u_i^* = u_i^{o*}(u_0^*), where

u^{o*} = (u_1^{o*}(u_0), . . . , u_i^{o*}(u_0), . . . , u_N^{o*}(u_0)),   (10a)
u_{-i}^{o*} = (u_1^{o*}(u_0), . . . , u_{i-1}^{o*}(u_0), u_i^o(u_0), u_{i+1}^{o*}(u_0), . . . , u_N^{o*}(u_0)),  i = 1, . . . , N,   (10b)

u_{-i}^{o*} means the equilibrium strategy set except for the ith player, and

J_i^{o*} = J_i(u_0, u^{o*}),  J_{-i}^{o*} = J_i(u_0, u_{-i}^{o*}).   (11)

We assume that the linear closed-loop memoryless Stackelberg strategy has the following form (Basar & Olsder, 1999):

u_i(k, x(0), x(k)) = K_i x(k),  i = 0, 1, . . . , N.   (12)

This restriction is reasonable because the considered system is linear and the cost functions are quadratic, and it is justifiable from a practical implementation point of view (Medanic, 1978). It should be noted that inequality (9a) expresses the Nash equilibrium condition among the followers.

Theorem 1. Under Assumptions 1 and 2, the strategy set (12) constitutes the Stackelberg strategy only if the following cross-coupled stochastic algebraic equations (CSAEs) have solutions M_i > 0, N_i, i = 0, 1, . . . , N, E_i, i = 1, . . . , N, and K_0:

-M_i + A_K^T M_i A_K + A_p^T M_i A_p + K_i^T R_{ii} K_i + Q_{Ki} = 0,  i = 1, . . . , N,   (13a)
-M_0 + A_K^T M_0 A_K + A_p^T M_0 A_p + Σ_{j=1}^N K_j^T R_{0j} K_j + Q_{K0} = 0,   (13b)
-N_i + A_K N_i A_K^T + A_p N_i A_p^T + (1/2)(B_i E_i^T A_K^T + A_K E_i B_i^T) = 0,  i = 1, . . . , N,   (13c)
-N_0 + A_K N_0 A_K^T + A_p N_0 A_p^T + I_n = 0,   (13d)
2 Σ_{j=0}^N (B_0^T M_j A_K N_j + R_{j0} K_0 N_j) + Σ_{j=1}^N B_0^T M_j B_j E_j^T = 0,   (13e)
2 Σ_{j=0}^N B_i^T M_j A_K N_j + 2R_{ii} K_i N_i + 2R_{0i} K_i N_0 + R_{ii} E_i^T + Σ_{j=1}^N B_i^T M_j B_j E_j^T = 0,  i = 1, . . . , N,   (13f)

where K_i := -R̂_i^{-1} L̂_i, R̂_i := R_{ii} + B_i^T M_i B_i, L̂_i := B_i^T M_i Â_{-i}, i = 1, . . . , N, A_K := Â_{-i} + B_i K_i = A + Σ_{j=0}^N B_j K_j, Â_{-i} := A + B_0 K_0 + Σ_{j=1, j≠i}^N B_j K_j, Q_{K0} := Q_0 + K_0^T R_{00} K_0, Q_{Ki} := Q_i + K_0^T R_{i0} K_0, and Ŝ_i := B_i R̂_i^{-1} B_i^T.

Proof. For the given arbitrary u_0(k) = K_0 x(k), the following minimizing problem for fixed i is considered:

x(k + 1) = Â_{-i} x(k) + B_i u_i(k) + A_p x(k)w(k),   (14a)
Ĵ_i(u_i) := Σ_{k=0}^∞ E[x^T(k)Q_{Ki} x(k) + u_i^T(k)R_{ii} u_i(k)].   (14b)

By using Lemma 2, the optimal state feedback controller u_i^o(k) is given by

u_i^o(k) = K_i x(k) = -R̂_i^{-1} L̂_i x(k),  i = 1, . . . , N,   (15)

where


G_{-i}(M_i, K_{-i}) := -M_i + Â_{-i}^T M_i Â_{-i} + A_p^T M_i A_p + Q_{Ki} - L̂_i^T R̂_i^{-1} L̂_i = 0,  i = 1, . . . , N,   (16a)
G_i(M_i, K) := -M_i + A_K^T M_i A_K + A_p^T M_i A_p + K_i^T R_{ii} K_i + Q_{Ki} = 0,  i = 1, . . . , N,   (16b)
H_i(M_i, K) := R̂_i K_i + L̂_i = R_{ii} K_i + B_i^T M_i (A + Σ_{j=0}^N B_j K_j) = 0,   (16c)

and K_{-i} = {K_0, K_1, . . . , K_N} - K_i. Thus, (13a) holds. Moreover, if Â_{-i} is AMSS, then the cost J_0 under u_1^o(k), . . . , u_N^o(k) can also be obtained by using Lemma 1:

J_0(K_0 x, u_1^o, . . . , u_N^o) = Tr[M_0],   (17)

where M_0 is the solution of the following SALE (18):

G_0(M_0, K) := -M_0 + A_K^T M_0 A_K + A_p^T M_0 A_p + Σ_{j=1}^N K_j^T R_{0j} K_j + Q_{K0} = 0.   (18)


Therefore, SARE (13b) holds. The Lagrangian I is considered:

I = I(M, K, N, E_1, . . . , E_N)
  = Tr[M_0] + Σ_{j=1}^N Tr[N_j G_j(M_j, K)] + Σ_{j=1}^N Tr[E_j H_j(M_j, K)] + Tr[N_0 G_0(M_0, K)],   (19)

where N_i and E_i are matrices of Lagrange multipliers, with N_i, i = 0, 1, . . . , N, symmetric. It should be noted that Eq. (16a) can be changed into Eq. (16b). As necessary conditions for minimizing Tr[M_0], the Lagrange multiplier technique results in the following equations:

∂I/∂M_i = -N_i + A_K N_i A_K^T + A_p N_i A_p^T + (1/2)(B_i E_i^T A_K^T + A_K E_i B_i^T) = 0,  i = 1, . . . , N,   (20a)
∂I/∂M_0 = -N_0 + A_K N_0 A_K^T + A_p N_0 A_p^T + I_n = 0,   (20b)
∂I/∂K_0 = 2 Σ_{j=0}^N (B_0^T M_j A_K N_j + R_{j0} K_0 N_j) + Σ_{j=1}^N B_0^T M_j B_j E_j^T = 0,   (20c)
∂I/∂K_i = 2 Σ_{j=0}^N B_i^T M_j A_K N_j + 2R_{ii} K_i N_i + 2R_{0i} K_i N_0 + R_{ii} E_i^T + Σ_{j=1}^N B_i^T M_j B_j E_j^T = 0,  i = 1, . . . , N.   (20d)

Finally, (13c)–(13f) can be obtained. □

Remark 1. It is well known that first-order conditions are sufficient for dealing with a linear-quadratic Stackelberg game. In fact, necessary and sufficient conditions for the existence and uniqueness of the two-player Stackelberg strategy have been given (Xu et al., 2015). However, it should be noted that although the cost functions under consideration are strictly convex, the conditions obtained here may not be sufficient. This is because the multiple followers play a Nash game, which may lead to multiple Nash-equilibrium strategies. Therefore, it is expected that a similar proof can be considered for this important feature; this has remained an unresolved problem. On the other hand, it should be noted that if the CSAEs (13) have a solution set, the existence of a Stackelberg strategy is guaranteed.

It should be noted that the result for the single-follower case (Mukaidani, 2014a) is a special case of the multiple-followers case of this paper. Furthermore, in Mukaidani (2014a), the following formula is needed:

∂Tr[N A^T M S M A]/∂M = S M A N A^T + A N A^T M S - S M A N A^T M S,   (21)

where S := B(R + B^T M B)^{-1} B^T. However, by introducing the variables E_i, i = 1, . . . , N, this complicated formulation can be avoided. It should be noted that because linear-quadratic cost functions are considered, convexity is satisfied; therefore, second-order conditions are not computed. Theorem 1 provides only one set of Stackelberg strategies, and it does not attribute any uniqueness feature to this solution set. It should be noted that local uniqueness has been discussed by using the Newton–Kantorovich theorem in Mukaidani (2006).

4. Application to hierarchical H∞-constraint control

As an important application of the Stackelberg game, the hierarchical H∞-constraint control problem is investigated in this section. The following stochastic system is considered:

x(k + 1) = Ax(k) + B_0 u_0(k) + Σ_{i=1}^N B_i u_i(k) + Dv(k) + A_p x(k)w(k),  x(0) = x_0,   (22a)
z(k) = [(Cx(k))^T  u_0^T(k)  u_1^T(k)  · · ·  u_N^T(k)]^T,   (22b)

where v(k) ∈ R^{n_v} represents the external disturbance and z(k) ∈ R^{n_z} represents the controlled output. The following preliminary results play an important role in the considered control problem.

Definition 3 (Zhang et al., 2008). Consider the following autonomous stochastic system:

x(k + 1) = Ax(k) + Dv(k) + A_p x(k)w(k),  x(0) = x_0,   (23a)
z(k) = Cx(k).   (23b)

In system (23), if the disturbance input is v(k) ∈ l_w^2(N, R^{n_v}) and the controlled output is z(k) ∈ l_w^2(N, R^{n_z}), then the perturbed operator L : l_w^2(N, R^{n_v}) → l_w^2(N, R^{n_z}) is defined by

L v(k) := Cx(k, 0, v),   (24)

with its norm

∥L∥ := sup_{v(k) ∈ l_w^2(N, R^{n_v}), v(k) ≠ 0, x_0 = 0} ∥z(k)∥_{l_w^2(N, R^{n_z})} / ∥v(k)∥_{l_w^2(N, R^{n_v})}.   (25)

Definition 4 (Zhang et al., 2008). System (23) is internally stable if it is AMSS in the absence of v(k).

Lemma 3 (Zhang et al., 2008). Let γ > 0 be given. If the stochastic system (23) is internally stable and ∥L∥ < γ, then there exists a stabilizing solution P̃ ≤ 0 of the following SARE:

-P̃ + A^T P̃ A + A_p^T P̃ A_p - C^T C - A^T P̃ D(γ² I_{n_v} + D^T P̃ D)^{-1} D^T P̃ A = 0,  γ² I_{n_v} + D^T P̃ D > 0,   (26)

where (A + DF_γ, A_p) is stable with

F_γ = -(γ² I_{n_v} + D^T P̃ D)^{-1} D^T P̃ A.   (27)

Conversely, if (23) is internally stable and (26) has a stabilizing solution P̃ ≤ 0, then ∥L∥ < γ.

Definition 5. If there exist a strategy set (u_0^*, u_1^*, . . . , u_N^*) and a worst-case disturbance v^* such that the following conditions are met, this strategy set is called the hierarchical H∞ strategy set:

J̃_i^{o*} ≤ J̃_{-i}^{o*},  ∀u_i^o = u_i(u_0) ∈ R^{m_i}, i = 1, . . . , N,   (28a)
J̃_v(u_0^*, u_1^*, . . . , u_N^*, v^*) = min_v J̃_v(u_0^*, u_1^*, . . . , u_N^*, v) > 0,   (28b)


where

J̃_0(u_0^*, u_1^*, . . . , u_N^*, v^*) = min_{u_0} J̃_0(u_0, u^{o*}, v^*),   (29a)
u^{o*} = (u_1^{o*}(u_0), . . . , u_i^{o*}(u_0), . . . , u_N^{o*}(u_0)),  J̃_i^{o*} = J̃_i(u_0, u^{o*}, v^*),  J̃_{-i}^{o*} = J̃_i(u_0, u_{-i}^{o*}, v^*),   (29b)
u_{-i}^{o*} = (u_1^{o*}(u_0), . . . , u_{i-1}^{o*}(u_0), u_i^o(u_0), u_{i+1}^{o*}(u_0), . . . , u_N^{o*}(u_0)),  i = 1, . . . , N,   (29c)

and the cost functions are defined by

J̃_0(u_0, u_1, . . . , u_N, v) := Σ_{k=0}^∞ E[x^T(k)Q_0 x(k) + u_0^T(k)R_{00} u_0(k) + Σ_{j=1}^N u_j^T(k)R_{0j} u_j(k)],
J̃_i(u_0, u_1, . . . , u_N, v) := Σ_{k=0}^∞ E[x^T(k)Q_i x(k) + u_0^T(k)R_{i0} u_0(k) + u_i^T(k)R_{ii} u_i(k)],  i = 1, . . . , N,
J̃_v(u_0, u_1, . . . , u_N, v) := Σ_{k=0}^∞ E[γ²∥v(k)∥² - ∥z(k)∥²].   (30)

It is assumed that the disturbance v(k) has the following form:

v(k) = F̃_γ x(k).   (31)

Since inequality (28b) ensures that the strategy set (u_0^*, u_1^*, . . . , u_N^*) attenuates the external disturbance below a given disturbance attenuation level γ > 0, it constitutes an H∞ controller set for multiple decision makers. Moreover, when the worst-case disturbance v^*, if it exists, is implemented in (29), (u_0^*, u_1^*, . . . , u_N^*, v^*) arrives at a Stackelberg equilibrium, and the Stackelberg game is a hierarchical game. Therefore, the strategy set (u_0^*, u_1^*, . . . , u_N^*) is called the hierarchical H∞ strategy set.

Theorem 2. Assume that (A, A_p | C) and (A + DF̃_γ, A_p | √Q_0) are exactly observable. For any given γ > 0, consider the following CSAEs (32):

-M̃_i + Ã_K^T M̃_i Ã_K + A_p^T M̃_i A_p + K̃_i^T R_{ii} K̃_i + Q̃_{K̃i} = 0,  i = 1, . . . , N,   (32a)
-M̃_0 + Ã_K^T M̃_0 Ã_K + A_p^T M̃_0 A_p + Σ_{j=1}^N K̃_j^T R_{0j} K̃_j + Q̃_{K̃0} = 0,   (32b)
-Ñ_i + Ã_K Ñ_i Ã_K^T + A_p Ñ_i A_p^T + (1/2)(B_i Ẽ_i^T Ã_K^T + Ã_K Ẽ_i B_i^T) = 0,  i = 1, . . . , N,   (32c)
-Ñ_0 + Ã_K Ñ_0 Ã_K^T + A_p Ñ_0 A_p^T + I_n = 0,   (32d)
2 Σ_{j=0}^N (B_0^T M̃_j Ã_K Ñ_j + R_{j0} K̃_0 Ñ_j) + Σ_{j=1}^N B_0^T M̃_j B_j Ẽ_j^T = 0,   (32e)
2 Σ_{j=0}^N B_i^T M̃_j Ã_K Ñ_j + 2R_{ii} K̃_i Ñ_i + 2R_{0i} K̃_i Ñ_0 + R_{ii} Ẽ_i^T + Σ_{j=1}^N B_i^T M̃_j B_j Ẽ_j^T = 0,  i = 1, . . . , N,   (32f)
-X̃ + Ã_{-F}^T X̃ Ã_{-F} + A_p^T X̃ A_p - Q̃_z - L̃_v^T R̃_v^{-1} L̃_v = 0,   (32g)

where K̃_i := -R̃_i^{-1} L̃_i, R̃_i := R_{ii} + B_i^T M̃_i B_i, L̃_i := B_i^T M̃_i Ã_{-i}, i = 1, . . . , N, F̃_γ := -R̃_v^{-1} L̃_v, R̃_v := γ² I_{n_v} + D^T X̃ D, L̃_v := D^T X̃ Ã_{-F}, Ã_K := Ã_{-F} + DF̃_γ, Ã_{-F} := A + Σ_{j=0}^N B_j K̃_j, Ã_{-i} := A + B_0 K̃_0 + Σ_{j=1, j≠i}^N B_j K̃_j + DF̃_γ, Q̃_{K̃0} := Q_0 + K̃_0^T R_{00} K̃_0, Q̃_{K̃i} := Q_i + K̃_0^T R_{i0} K̃_0, Q̃_z := C^T C + Σ_{j=0}^N K̃_j^T K̃_j. The following strategy set (33) constitutes the hierarchical H∞ strategy set only if the CSAEs (32) have solutions M̃_i > 0, Ñ_i, i = 0, 1, . . . , N, Ẽ_i, i = 1, . . . , N, K̃_0, F̃_γ, and X̃ < 0:

u_i^* = K̃_i x(k),  i = 0, 1, . . . , N,  with K̃_i = -R̃_i^{-1} L̃_i, i = 1, . . . , N,   (33a)
v^*(k) = F̃_γ x(k) = -R̃_v^{-1} L̃_v x(k).   (33b)

Proof. First, the disturbance attenuation property is proved. By using steps similar to those of Lemma 5 in Zhang et al. (2008), the exact observability of (A, A_p | C) implies the exact observability of (Ã_{-F}, A_p | C̃_K) from (32g), where C̃_K = [C^T  K̃_0^T  K̃_1^T  · · ·  K̃_N^T]^T. Moreover, the exact observability of (A + DF̃_γ, A_p | √Q_0) yields the exact observability of (Ã_K, A_p | C̃_0) from (32b), where C̃_0 = [(√Q_0)^T  K̃_0^T  K̃_1^T  · · ·  K̃_N^T]^T. By using the assumption that M̃_0 > 0, (Ã_K, A_p) is stable by Lemma 1 and (32b). Hence,

u_i^* = K̃_i x(k) ∈ l_w^2(N, R^{m_i}),   (34a)
v(k) = F̃_γ x(k) ∈ l_w^2(N, R^{n_v}).   (34b)

Therefore, from Lemma 1 and (32g), because (Ã_{-F}, A_p) is stable, the stochastic system (22a) is internally stabilized by u_i^*(k) = K̃_i x(k). On the other hand, substituting u_i^* = K̃_i x(k) into the stochastic system (22a) yields the following closed-loop stochastic system:

x(k + 1) = Ã_{-F} x(k) + Dv(k) + A_p x(k)w(k),   (35a)
z(k) = [C^T  K̃_0^T  K̃_1^T  · · ·  K̃_N^T]^T x(k).   (35b)

By applying Lemma 3 to the stochastic system (35), if there exists a solution X̃ < 0 of the algebraic Riccati equation (32g), we have ∥L∥ < γ.

Second, the Stackelberg equilibrium property is shown. Substituting v^*(k) = F̃_γ x(k) into the stochastic system (22a) yields the following closed-loop stochastic system:

x(k + 1) = Ã_F x(k) + B_0 u_0(k) + Σ_{i=1}^N B_i u_i(k) + A_p x(k)w(k),  x(0) = x_0,   (36)

where Ã_F := A + DF̃_γ. By applying Theorem 1 to the stochastic system (36), Eqs. (32a)–(32f) can be derived. □

Remark 2. Theorem 2 is the discrete-time counterpart of Theorem 2 in Mukaidani and Xu (2015) for continuous-time systems. Although the Lagrange multiplier technique adopted to prove the necessary conditions is the same optimization method, the two derivation procedures are completely different, and they result in completely different sets of CSAEs. The derivation of the set of CSAEs in this paper is more difficult and complicated than that for continuous-time systems due to the presence of the implicit term


R̂_i^{-1} = (R_{ii} + B_i^T M_i B_i)^{-1} in Eq. (16a). Besides these differences, we have also studied the hierarchical H∞ control in this work as an application of the Nash-based Stackelberg strategy, which is not studied in Mukaidani and Xu (2015).

It should be noted that although the mixed H2/H∞ control problems for linear discrete-time stochastic systems have been investigated (Jungers et al., 2008; Mukaidani, 2014a), those problems are formulated under the assumption that there is only one follower in the system. The mixed H2/H∞ control problem cannot be formulated if there are multiple followers. Instead, the hierarchical H∞-constraint control problem based on the Stackelberg game with multiple followers is formulated in the present paper. As a result, the H∞-constraint control problem for a system with multiple channel inputs can be solved using the Stackelberg game approach.

5. Numerical algorithm for solving (32)

To obtain the solutions of Eq. (32), a numerical algorithm based on the steepest descent method can be used. It is well known that the steepest descent method can be viewed as Euler's method for solving ordinary differential equations. Thus, the convergence of the algorithm based on the steepest descent method is guaranteed if the following nonlinear descriptor system is stable:

dM̃_i/dt = Φ_i(M̃_i) = -M̃_i + Ã_K^T M̃_i Ã_K + A_p^T M̃_i A_p + K̃_i^T R_{ii} K̃_i + Q̃_{K̃i},  i = 1, . . . , N,   (37a)
dM̃_0/dt = Φ_0(M̃_0) = -M̃_0 + Ã_K^T M̃_0 Ã_K + A_p^T M̃_0 A_p + Σ_{j=1}^N K̃_j^T R_{0j} K̃_j + Q̃_{K̃0},   (37b)
dÑ_i/dt = Ψ_i(Ñ_i) = -Ñ_i + Ã_K Ñ_i Ã_K^T + A_p Ñ_i A_p^T + (1/2)(B_i Ẽ_i^T Ã_K^T + Ã_K Ẽ_i B_i^T),  i = 1, . . . , N,   (37c)
dÑ_0/dt = Ψ_0(Ñ_0) = -Ñ_0 + Ã_K Ñ_0 Ã_K^T + A_p Ñ_0 A_p^T + I_n,   (37d)
dX̃/dt = Ξ(X̃) = -X̃ + Ã_{-F}^T X̃ Ã_{-F} + A_p^T X̃ A_p - Q̃_z - L̃_v^T R̃_v^{-1} L̃_v,   (37e)

together with the algebraic constraints (32e) and (32f). In other words, the convergence depends on the stability of the above differential equations. Therefore, as long as the stability of these systems cannot be proved, no conclusion regarding convergence can be drawn. For a special case, details of the sufficient conditions for convergence can be found in the Appendix.

The convergence speed of the steepest descent method strongly depends on the step size of Euler's method. To improve the convergence, a new algorithm based on a Lyapunov-type iterative scheme similar to the existing result (Gajić & Shen, 1993) is given here:

-M̃_i^{(ℓ+1)} + Ã_K^{(ℓ)T} M̃_i^{(ℓ+1)} Ã_K^{(ℓ)} + A_p^T M̃_i^{(ℓ+1)} A_p + K̃_i^{(ℓ)T} R_{ii} K̃_i^{(ℓ)} + Q̃_{K̃i}^{(ℓ)} = 0,  i = 1, . . . , N,   (38a)
-M̃_0^{(ℓ+1)} + Ã_K^{(ℓ)T} M̃_0^{(ℓ+1)} Ã_K^{(ℓ)} + A_p^T M̃_0^{(ℓ+1)} A_p + Σ_{j=1}^N K̃_j^{(ℓ)T} R_{0j} K̃_j^{(ℓ)} + Q̃_{K̃0}^{(ℓ)} = 0,   (38b)
-Ñ_i^{(ℓ+1)} + Ã_K^{(ℓ)} Ñ_i^{(ℓ+1)} Ã_K^{(ℓ)T} + A_p Ñ_i^{(ℓ+1)} A_p^T + (1/2)(B_i Ẽ_i^{(ℓ)T} Ã_K^{(ℓ)T} + Ã_K^{(ℓ)} Ẽ_i^{(ℓ)} B_i^T) = 0,  i = 1, . . . , N,   (38c)
-Ñ_0^{(ℓ+1)} + Ã_K^{(ℓ)} Ñ_0^{(ℓ+1)} Ã_K^{(ℓ)T} + A_p Ñ_0^{(ℓ+1)} A_p^T + I_n = 0,   (38d)
2 Σ_{j=0}^N (B_0^T M̃_j^{(ℓ)} Ã_K^{(ℓ)} Ñ_j^{(ℓ)} + R_{j0} K̃_0^{(ℓ+1)} Ñ_j^{(ℓ)}) + Σ_{j=1}^N B_0^T M̃_j^{(ℓ)} B_j Ẽ_j^{(ℓ+1)T} = 0,   (38e)
2 Σ_{j=0}^N B_i^T M̃_j^{(ℓ)} Ã_K^{(ℓ)} Ñ_j^{(ℓ)} + 2R_{ii} K̃_i^{(ℓ)} Ñ_i^{(ℓ)} + 2R_{0i} K̃_i^{(ℓ)} Ñ_0^{(ℓ)} + R_{ii} Ẽ_i^{(ℓ+1)T} + Σ_{j=1}^N B_i^T M̃_j^{(ℓ)} B_j Ẽ_j^{(ℓ+1)T} = 0,  i = 1, . . . , N,   (38f)
-X̃^{(ℓ+1)} + Ã_{-F}^{(ℓ)T} X̃^{(ℓ+1)} Ã_{-F}^{(ℓ)} + A_p^T X̃^{(ℓ+1)} A_p - Q̃_z^{(ℓ)} - L̃_v^{(ℓ)T} [R̃_v^{(ℓ)}]^{-1} L̃_v^{(ℓ)} = 0,   (38g)

where M̃_i^{(0)} = Ñ_i^{(0)} = 0, K̃_i^{(0)} = 0, i = 0, 1, . . . , N, X̃^{(0)} = 0, F̃_γ^{(0)} = 0, Ẽ_i^{(0)} = 0, i = 1, . . . , N, and K̃_i^{(ℓ+1)} := -[R̃_i^{(ℓ)}]^{-1} L̃_i^{(ℓ)}, R̃_i^{(ℓ)} := R_{ii} + B_i^T M̃_i^{(ℓ)} B_i, L̃_i^{(ℓ)} := B_i^T M̃_i^{(ℓ)} Ã_{-i}^{(ℓ)}, i = 1, . . . , N, F̃_γ^{(ℓ+1)} := -[R̃_v^{(ℓ)}]^{-1} L̃_v^{(ℓ)}, R̃_v^{(ℓ)} := γ² I_{n_v} + D^T X̃^{(ℓ)} D, L̃_v^{(ℓ)} := D^T X̃^{(ℓ)} Ã_{-F}^{(ℓ)}, Ã_K^{(ℓ)} := Ã_{-F}^{(ℓ)} + DF̃_γ^{(ℓ)}, Ã_{-F}^{(ℓ)} := A + Σ_{j=0}^N B_j K̃_j^{(ℓ)}, Ã_{-i}^{(ℓ)} := A + B_0 K̃_0^{(ℓ)} + Σ_{j=1, j≠i}^N B_j K̃_j^{(ℓ)} + DF̃_γ^{(ℓ)}, Q̃_{K̃0}^{(ℓ)} := Q_0 + K̃_0^{(ℓ)T} R_{00} K̃_0^{(ℓ)}, Q̃_{K̃i}^{(ℓ)} := Q_i + K̃_0^{(ℓ)T} R_{i0} K̃_0^{(ℓ)}, Q̃_z^{(ℓ)} := C^T C + Σ_{j=0}^N K̃_j^{(ℓ)T} K̃_j^{(ℓ)}.
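The structure of the Lyapunov-type scheme can be illustrated with a much-simplified sketch (not the paper's full algorithm (38)): with the leader gain K_0 frozen and no disturbance channel, the followers' equations (13a) and (15) are iterated by solving a discrete stochastic Lyapunov equation for each M_i at frozen gains and then updating the gains. All matrices below are illustrative values; the multiplier equations (13c)–(13f) and the H∞ part (32g) are omitted.

```python
import numpy as np

def solve_sale(Acl, Ap, W):
    """Solve -M + Acl' M Acl + Ap' M Ap + W = 0 by vectorization."""
    n = Acl.shape[0]
    lhs = np.eye(n * n) - np.kron(Acl.T, Acl.T) - np.kron(Ap.T, Ap.T)
    return np.linalg.solve(lhs, W.flatten(order="F")).reshape((n, n), order="F")

def followers_response(A, Ap, B0, B, Q, R, R0, K0, iters=200):
    """Lyapunov-type iteration of the followers' equations (13a)/(15)
    with a frozen leader gain K0; B, Q, R, R0 are lists per follower."""
    N, n = len(B), A.shape[0]
    K = [np.zeros((Bi.shape[1], n)) for Bi in B]
    M = [None] * N
    for _ in range(iters):
        AK = A + B0 @ K0 + sum(B[j] @ K[j] for j in range(N))
        for i in range(N):
            QKi = Q[i] + K0.T @ R0[i] @ K0              # Q_{Ki} = Qi + K0' R_{i0} K0
            M[i] = solve_sale(AK, Ap, K[i].T @ R[i] @ K[i] + QKi)  # (13a), gains frozen
            Ahat_i = AK - B[i] @ K[i]                   # \hat A_{-i}
            K[i] = -np.linalg.solve(R[i] + B[i].T @ M[i] @ B[i],
                                    B[i].T @ M[i] @ Ahat_i)        # Eq. (15)
    return K, M

# Illustrative data (one follower); not the paper's aircraft example.
A = np.array([[0.5, 0.1], [0.0, 0.4]])
Ap = 0.1 * A
B0 = np.array([[1.0], [0.0]])
B1 = np.array([[0.0], [1.0]])
K0 = np.zeros((1, 2))
K, M = followers_response(A, Ap, B0, [B1], [np.eye(2)],
                          [np.array([[2.0]])], [np.array([[1.0]])], K0)
```

At a fixed point, the returned gain satisfies the stationarity condition (16c) for the frozen leader gain, which can be checked by substituting back into R_{ii}K_i + B_i^T M_i A_K.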

It should be noted that although a reduction in the number of iterations can be expected, a convergence proof is not given. However, the implementation is quite easy, and the scheme works well in practice; this feature can be seen in the numerical example. Finally, Newton's method can also be used to solve the CSAEs. It is well known that Newton's method attains local quadratic convergence. However, it involves computing the Jacobian matrix at every iteration step, and the derivation of the Jacobian matrix by hand is quite tedious. The convergence property of Newton's method will be shown in the next section.

6. Numerical example

The efficiency of the proposed approach is demonstrated by solving the horizontal vibration suppression control problem for a six-dimensional discrete-time system, which is based on an aircraft model in level flight subjected to wind gust turbulence (Gajić & Shen, 1993). In particular, the following features are illustrated in this example: (1) a non-cooperative hierarchical strategy set for a practical control problem can be obtained by solving the CSAEs (32); (2) the considered numerical algorithms work well. The system matrices are given as follows:

γ = 2,  A = block diag(A_1, A_2, A_3),
A_1 = [0.905  0; 0  0.904],  A_2 = [0.883  0.14; -0.14  0.883],  A_3 = [0.447  0.714; -0.714  0.447],


Table 2 Number of iterations.

Table 1 Number of iterations. Steepest descent method

Lyapunov type iteration

Newton’s method

Iterations

Lyapunov-type

Newton’s method

144

12

7

0 1 2 3 4 5 6 7 8 9 10 11 12

7.0000 7.3033 3.8123e−001 1.3479e−002 2.7460e−003 1.0689e−004 7.6163e−006 1.7540e−006 7.3453e−008 6.1152e−009 1.1397e−009 5.2384e−011 4.7342e−012

8.4069e+001 7.5403e+001 1.0015e+001 7.5616e−001 6.8170e−003 1.6230e−005 9.1236e−010 7.6001e−014

−0.386e−4 −0.209e−1  0.557e−2   B1 =  −0.364e−1 ,   −0.675e−1 −0.460e−1 B0 =

1

(B1 + B2 ) ,

2 Q0 = I6 ,

−0.157e−7  0.724e−5   0.280e−2   B2 =  −0.121e−2 ,   −0.488e−3 −0.684e−3 Ap = 0.1 × A,



Q1 = block diag I4

D = 0.5B0 ,

2 × I2 ,



Q2 = block diag 2 × I4

0.25 × I2 ,

R11 = 2,

R12 = 5,

R21 = 1,

R22 = 1,

R10 = 2,

R20 = 3,

R00 = 1,

R01 = 3,





R02 = 2.

In this case, the exact strategies Ki , i = 0, 1, 2 and the worst case disturbance are given below: K1 =

9.6376e−5



7.7728e−2

5.1581e−2 −2.4537e−2  −6.8381e−3 2.2828e−1 ,

 −7.4762e−7 −5.6104e−4 −2.6874e−2  6.6090e−3 −1.2133e−4 4.5444e−4 ,  K0 = 9.7542e−5 5.2163e−2 −3.1867e−2  8.0455e−2 −6.5161e−3 1.2502e−1 ,  Fγ = −1.2055e−5 −6.4471e−3 3.8611e−3  −9.9942e−3 4.0328e−4 −1.4333e−2 .

K2 =

It should be noted that, although the solutions M̃_i, i = 0, 1, . . . , N, Ñ_0 and X̃ are not shown due to page limitations, the positive definiteness of M̃_i, i = 0, 1, . . . , N, and Ñ_0 and the negative definiteness of X̃ were both verified. Consequently, it can be seen that in the strategies formulated to resolve the hierarchical decision-making problem, one solution emerges as the leader's solution, while the remaining agents follow the leader as a team.

The convergence properties of the three methods given in the previous section are now discussed. The number of iterations required by each method is presented in Table 1; the computation is stopped when the norm obtained by substituting the iterative solutions into the CSAEs (32) falls below 10^{−11}. It can be observed that Newton's method requires the fewest iterations. However, taking into account the difficulty of deriving Newton's method, the Lyapunov-type iterative scheme is more suitable for implementation, even though its convergence rate is unclear and no proof of convergence is available. The number of iterations required by Newton's method versus that required by the Lyapunov-type iterative scheme (Gajić & Shen, 1993) is presented in Table 2. It can be observed from Table 2 that Newton's method reduces the number of iterations compared with the Lyapunov-type iterative scheme. Finally, since it is easy to implement and attains fast convergence, the Lyapunov-type iterative scheme is considered the more practical algorithm.

7. Conclusions

In this paper, the infinite horizon Stackelberg games for discrete-time stochastic linear-quadratic systems with multiple decision-makers have been investigated. Firstly, the necessary conditions for the existence of the Stackelberg strategy set have been given in terms of the solvability of the CSAEs. Secondly, as an important application, the hierarchical H∞-constraint control problem has been formulated and compared with the existing H2/H∞ control formulation. It should be noted that the H2/H∞ control problem has not been formulated using the Nash game approach for cases in which multiple decision-makers with a hierarchical structure are present in a stochastic system. Instead, the hierarchical decision-making problem with an H∞ constraint can be formulated using the Stackelberg game approach, even when more than three decision-makers are present in the system. Thus, such problems for systems with multiple channel inputs can be solved using the Stackelberg game approach. Moreover, it should be noted that the deterministic Stackelberg game problem can be regarded as a special case of the current stochastic Stackelberg game problem. Finally, in order to solve the CSAEs, three numerical algorithms have been developed. It has been shown that the Lyapunov-type iterative scheme, which is associated with the stochastic Lyapunov equations for solving the CSAEs, is relatively easy to implement and converges relatively quickly.

Appendix. Convergence of steepest descent method

Consider the following difference equation based on the steepest descent method:

x_{i+1} = x_i + εf(x_i),  x_0 = x^0,  i = 0, 1, . . . ,  (A.1)

where

x = [ [vecM̃_1]^T · · · [vecM̃_N]^T [vecM̃_0]^T [vecÑ_1]^T · · · [vecÑ_N]^T [vecÑ_0]^T [vecX̃]^T ]^T,
f(x) = [ [vecΦ_1]^T · · · [vecΦ_N]^T [vecΦ_0]^T [vecΨ_1]^T · · · [vecΨ_N]^T [vecΨ_0]^T [vecΞ]^T ]^T,

and ε is a sufficiently small positive parameter, which can be viewed as the step size of Euler's method for solving ordinary differential equations. It should be noted that K̃_i, i = 0, 1, . . . , N, F̃_γ and Ẽ_i, i = 1, . . . , N can be obtained by solving the linear equations for any fixed x_i at every iteration i. A Taylor expansion expresses f(x_i) as follows:

f(x_i) = f(α) + ∇f(α)(x_i − α) + O(‖x_i − α‖^2)
       = ∇f(α)(x_i − α) + O(‖x_i − α‖^2),  (A.2)

where x = α is a solution of the equation f(x) = 0, and ∇f(x) = ∂f(x)/∂x. Hence, we have

‖x_{i+1} − α‖ ≤ ‖I_{n^2} + ε∇f(α)‖ · ‖x_i − α‖ + εO(‖x_i − α‖^2).  (A.3)


In this case, by letting β = α + O(ε), we have

‖x_{i+1} − α‖ ≤ ‖I_{n^2} + ε∇f(β) + O(ε^2)‖ · ‖x_i − α‖ + εO(‖x_i − α‖^2).  (A.4)

Finally, the considered difference equation is locally asymptotically stable if

‖I_{n^2} + ε∇f(β) + O(ε^2)‖ < 1.  (A.5)
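To make the iteration (A.1) and condition (A.5) concrete, the following sketch applies the steepest-descent-style update x_{i+1} = x_i + εf(x_i) to a toy residual map and checks the spectral-radius version of (A.5) numerically. Note that the map `f` below is a hypothetical quadratic example, not the paper's residual map (which stacks the vectorized CSAE residuals Φ_i, Ψ_i and Ξ); the step size `eps` and the finite-difference Jacobian are likewise illustrative choices.

```python
import numpy as np

def f(x):
    """Toy residual map with f(x) = 0 at x = (0, 0); stands in for the
    stacked CSAE residuals of the paper (purely illustrative)."""
    return np.array([
        -x[0] + 0.2 * x[1] ** 2,
        -2.0 * x[1] + 0.1 * x[0] * x[1],
    ])

def jacobian(f, x, h=1e-7):
    """Forward-difference approximation of grad f, used to evaluate (A.5)."""
    n = x.size
    J = np.zeros((n, n))
    fx = f(x)
    for k in range(n):
        e = np.zeros(n)
        e[k] = h
        J[:, k] = (f(x + e) - fx) / h
    return J

def steepest_descent(f, x0, eps=0.1, tol=1e-11, max_iter=10000):
    """Iterate x_{i+1} = x_i + eps * f(x_i) until ||f(x_i)|| < tol,
    mirroring the difference equation (A.1)."""
    x = x0.copy()
    for i in range(max_iter):
        if np.linalg.norm(f(x)) < tol:
            return x, i
        x = x + eps * f(x)
    return x, max_iter

x0 = np.array([0.5, -0.5])
sol, iters = steepest_descent(f, x0)

# Condition (A.5), spectral-radius form: rho(I + eps * grad f) < 1
# evaluated near the computed solution.
rho = max(abs(np.linalg.eigvals(np.eye(2) + 0.1 * jacobian(f, sol))))
print(sol, iters, rho)
```

As the appendix indicates, a smaller ε makes the contraction condition easier to satisfy at the price of a slower (linear) convergence rate, since the error is multiplied by roughly ρ(I + ε∇f(β)) at each step.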

This means that if condition (A.5) holds, the steepest descent method converges to a solution; in other words, if β is sufficiently close to a solution α, the steepest descent method converges to that solution when started from x_0 = β. Consequently, the dominant part of the convergence condition of the steepest descent method is that the algorithm under consideration converges if ρ(I_{n^2} + ε∇f(β) + O(ε^2)) < 1. It may be noted that condition (A.5) is equivalent to the assumption that all the eigenvalues of I_{n^2} + ε∇f(β) + O(ε^2) are located inside the unit circle in the complex plane. In other words, the local stability condition of ẋ = f(x) can be interpreted as the convergence condition ρ(I_{n^2} + ε∇f(β) + O(ε^2)) < 1 at an equilibrium point x = α under the condition that β = α + O(ε).

References

Abou-Kandil, H. (1990). Closed-form solution for discrete-time linear-quadratic Stackelberg games. Journal of Optimization Theory and Applications, 65(1), 139–147.
Basar, T., Bensoussan, A., & Sethi, S. P. (2010). Differential games with mixed leadership: The open-loop solution. Applied Mathematics and Computation, 217(3), 972–979.
Basar, T., & Olsder, G. J. (1999). Dynamic noncooperative game theory. Philadelphia: SIAM Series in Classics in Applied Mathematics.
Bensoussan, A., Chen, S., & Sethi, S. P. (2013). Linear quadratic differential games with mixed leadership: The open-loop solution. Numerical Algebra, Control and Optimization, 3(1), 95–108.
Bensoussan, A., Chen, S., & Sethi, S. P. (2014). Feedback Stackelberg solutions of infinite-horizon stochastic differential games. In International series in operations research & management science: Vol. 198. Models and methods in economics and management science (pp. 3–15).
Bensoussan, A., Chen, S., & Sethi, S. P. (2015). The maximum principle for global solutions of stochastic Stackelberg differential games. SIAM Journal on Control and Optimization, 53(4), 1956–1981.
Gajić, Z., & Shen, X. (1993).
Parallel algorithms for optimal control of large scale linear systems. New York: Springer-Verlag.
Huang, Y., Zhang, W., & Zhang, H. (2008). Infinite horizon linear quadratic optimal control for discrete-time stochastic systems. Asian Journal of Control, 10(5), 608–615.
Jungers, M., & Oara, C. (2010). Anti-palindromic pencil formulation for open-loop Stackelberg strategy in discrete-time. In Proceedings of the 19th international symposium on mathematical theory of networks and systems, Budapest, Hungary, July (pp. 2265–2268).
Jungers, M., Trelat, E., & Abou-Kandil, H. (2008). A Stackelberg game approach to mixed H2/H∞ control. In Proceedings of the 17th IFAC world congress, Seoul, Korea, July (pp. 3940–3945).
Jungers, M., Trelat, E., & Abou-Kandil, H. (2011). Min-max and min-min Stackelberg strategies with closed-loop information structure. Journal of Dynamical and Control Systems, 17(3), 387–425.
Li, M., Cruz, J. B., & Simaan, M. A. (2002). An approach to discrete-time incentive feedback Stackelberg games. IEEE Transactions on Systems, Man & Cybernetics, Part A, 32(4), 472–481.

Lizarraga, M. J., Basin, M., Rodriguez, V., & Rodriguez, P. (2015). Open-loop Nash equilibrium in polynomial differential games via state-dependent Riccati equation. Automatica, 53, 155–163.
Medanic, J. V. (1978). Closed-loop Stackelberg strategies in linear-quadratic problems. IEEE Transactions on Automatic Control, 23(4), 632–637.
Movric, K. H., You, K., Lewis, F. L., & Xie, L. (2013). Synchronization of discrete-time multi-agent systems on graphs using Riccati design. Automatica, 49(2), 414–423.
Mukaidani, H. (2006). A numerical analysis of the Nash strategy for weakly coupled large-scale systems. IEEE Transactions on Automatic Control, 51(8), 1371–1377.
Mukaidani, H. (2013). H2/H∞ control of stochastic systems with multiple decision makers: A Stackelberg game approach. In Proceedings of the 52nd IEEE conference on decision and control, Firenze, Italy, December (pp. 1750–1755).
Mukaidani, H. (2014a). Stackelberg strategy for discrete-time stochastic system and its application to H2/H∞ control. In Proceedings of the American control conference, Portland, OR, June (pp. 4488–4493).
Mukaidani, H. (2014b). Stackelberg strategy for discrete-time stochastic system and its application to weakly coupled systems. In Proceedings of the American control conference, Portland, OR, June (pp. 4506–4511).
Mukaidani, H., Tanabata, R., & Matsumoto, C. (2014). Dynamic game approach of H2/H∞ control for stochastic discrete-time systems. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E97-A(11), 2200–2211.
Mukaidani, H., & Xu, H. (2015). Stackelberg strategies for stochastic systems with multiple followers. Automatica, 53, 53–59.
Reddya, P. V., & Engwerda, J. C. (2013). Pareto optimality in infinite horizon linear quadratic differential games. Automatica, 49(6), 1705–1714.
Xu, J., Zhang, H., & Chai, T. (2015). Necessary and sufficient condition for two-player Stackelberg strategy. IEEE Transactions on Automatic Control, 60(5), 1356–1361.
Zhang, W., Huang, Y., & Xie, L. (2008). Infinite horizon stochastic H2/H∞ control for discrete-time systems with state and disturbance dependent noise. Automatica, 44(9), 2306–2316.
Zhang, H., Qin, C., Jiang, B., & Luo, Y. (2014). Online adaptive policy learning algorithm for H∞ state feedback control of unknown affine nonlinear discrete-time systems. IEEE Transactions on Cybernetics, 44(12), 2706–2718.

Hiroaki Mukaidani received his B.S. in Integrated Arts and Sciences from Hiroshima University in 1992 and his M.Eng. and Dr.Eng. degrees in Information Engineering from Hiroshima University in 1994 and 1997, respectively. In April 1998, he joined the Faculty of Information Science at Hiroshima City University as a Research Associate. Since April 2002, he has been with the Graduate School of Education at Hiroshima University as an Assistant Professor and Associate Professor. He became a Full Professor at the Institute of Engineering at Hiroshima University in April 2012. He spent 10 months (from November 2007 to September 2008) as a research fellow at the Department of Electrical and Computer Engineering (ECE), University of Waterloo. His research interests include robust control, dynamic games, and the application of stochastic systems. Dr. Mukaidani is a member of the Institute of Electrical and Electronic Engineers (IEEE).

Hua Xu received his B.Eng. and M.Eng. degrees in Electrical Engineering from Northeastern University, China, in 1982 and 1985, respectively, and his Dr.Eng. degree in Information Engineering from Hiroshima University, Japan, in 1993. He worked with Hiroshima University as a Research Associate and Associate Professor from 1993 to 1998. Since 1998, he has been with the Graduate School of Business Sciences, the University of Tsukuba, Tokyo, Japan, as an Associate Professor, and Professor. His current research interests include dynamic optimization, dynamic games and their applications in business aspects. He is a member of International Society of Dynamic Games and several other academic societies of Japan.