Systems & Control Letters 134 (2019) 104563
Stochastic linear quadratic differential games in a state feedback setting with sampled measurements

Vasile Drăgan a,b, Ivan G. Ivanov c,d,∗, Ioan-Lucian Popa e

a Institute of Mathematics "Simion Stoilow" of the Romanian Academy, PO Box 1-764, RO-014700, Bucharest, Romania
b Academy of the Romanian Scientists, Romania
c Faculty of Economics and Business Administration, Sofia University, Sofia 1113, Bulgaria
d College Dobrich, Shumen University, Shumen, Bulgaria
e Department of Exact Science and Engineering, University "1st December 1918" of Alba Iulia, Alba Iulia, 510009, Romania
Article history: Received 26 February 2019; Received in revised form 15 September 2019; Accepted 19 October 2019.

Keywords: Stochastic linear differential games; Nash equilibria; Sampled-data controls; Stochastic systems with finite jumps
Abstract. The problem of sampled-data Nash equilibrium strategies in a state feedback setting for a stochastic linear quadratic differential game is addressed. It is assumed that the admissible strategies are constant on the interval between two consecutive measurements. The original problem is converted into an equivalent one for a linear stochastic system with finite jumps. This new formulation of the problem allows us to derive necessary and sufficient conditions for the existence of a sampled-data Nash equilibrium strategy in a state feedback form. These conditions are expressed in terms of the solvability of a system of interconnected matrix linear differential equations with finite jumps, subject to some algebraic constraints. We provide explicit formulae for the gain matrices of the Nash equilibrium strategy in the class of piecewise constant strategies in a state feedback form. The gain matrices of the feedback Nash equilibrium strategy are computed from the solution of the considered system of matrix linear differential equations with finite jumps. For the implementation of these strategies, only measurements of the states of the dynamical system at discrete-time instances are required. Finally, we show that under some additional assumptions regarding the sign of the weight matrices in the performance criteria of the two players, there exists a unique piecewise constant Nash equilibrium strategy in a state feedback form if the maximal length of the sampling period is sufficiently small. © 2019 Elsevier B.V. All rights reserved.
1. Introduction

Stochastic differential games and their extensions have been investigated extensively in the literature; for the reader's convenience we refer to the books [1–3] and their references. Lately, in the stochastic framework, sampled-data control systems have received a great deal of attention. In [4,5], the authors considered H2 and LQ robust sampled-data control problems under a unified framework. A class of uncertain sampled-data systems with random jumping parameters described by a finite-state semi-Markov process is studied in [6]. In [7] the problem of exponential synchronization of Markovian jumping chaotic neural networks with sampled-data and saturating actuators is studied, and in [8] the state estimation of T–S fuzzy delayed neural networks with Markovian jumping parameters using sampled-data control was investigated.

Simaan et al. [9] have considered a class of differential games where measurements of the state vector are available only at discrete time instances. They have investigated a two-player differential game, and the necessary conditions for the sampled-data controls have been derived by a backward translation method starting at the last interval of time and following the last state measurement. In the linear quadratic case, these conditions have been expressed in terms of Riccati-like differential equations, whose terminal conditions are updated through a set of linear differential equations. Further on, Basar [10] has investigated a linear-quadratic stochastic differential game under a stochastic formulation where the players have access to sampled-data state information. Basar has proved that the Nash equilibrium in which all players have access to sampled-data state information is unique, whenever it exists. Moreover, Basar has obtained the unique Nash equilibrium strategies of an N-player linear quadratic stochastic differential game in the case where one player has access to closed-loop information and the others have access to sampled-data information. Tsai and coauthors [11] have developed a linear quadratic Nash game-based tracker for multiparameter singularly perturbed sampled-data systems. In [12] an open-loop Nash equilibrium strategy under piecewise constant controls, with application to gas network optimization, is studied.

∗ Corresponding author at: Faculty of Economics and Business Administration, Sofia University, Sofia 1113, Bulgaria. E-mail address: [email protected] (I.G. Ivanov). https://doi.org/10.1016/j.sysconle.2019.104563
In the present paper, we study a linear-quadratic stochastic differential game under a stochastic formulation where the state space representation is described by an Itô differential equation. In our investigation we introduce a new class of controls consisting of piecewise constant stochastic processes. This allows us to convert the general problem into an equivalent problem described by a system of Itô differential equations with finite jumps. In this way, we succeed in deriving necessary and sufficient conditions for the existence of a piecewise constant Nash equilibrium strategy in a state feedback form. These conditions are expressed in terms of the solvability of a system of interconnected matrix linear differential equations with finite jumps, subject to some algebraic constraints. We provide explicit formulae for the gain matrices of the Nash equilibrium strategy in the class of piecewise constant strategies in a state feedback form. The gain matrices of the feedback Nash equilibrium strategy are computed from the solution of the considered system of matrix linear differential equations with finite jumps. For the implementation of these strategies, only measurements of the states of the dynamical system at discrete-time instances are required. Finally, we show that under some assumptions regarding the sign of the weight matrices in the performance criteria of the two players, there exists a unique piecewise constant Nash equilibrium strategy in a state feedback form if the maximal length of the sampling period is sufficiently small. It is worth pointing out that, unlike Basar's paper, where the stochastic character of the mathematical model is produced by the presence of an additive white noise perturbation, in the present work the considered dynamical system is modeled by a linear stochastic differential equation with state-multiplicative and control-multiplicative white noise perturbations.
The presence of state-dependent and control-dependent terms in the diffusion part of the Itô differential equation does not allow us to transform the considered problem into a discrete-time one, as in [12]. Another important difference between our approach and those from [9,10] consists of the class of considered admissible strategies. As already mentioned, in [9,10] the class of admissible strategies consists of piecewise continuous strategies. In the present work, we consider admissible strategies which are piecewise constant. It is obvious that piecewise constant strategies are easier to implement than piecewise continuous ones: in the latter case, even if the measurements of the states are taken at discrete-time instances, the gain matrices of the equilibrium strategy are time varying, and they have to be known at each intermediate time instance between two measurements. In our approach, the gain matrices of the equilibrium strategy are kept constant between two measurements of the states.

The outline of the paper is as follows: Section 2 contains the problem formulation. The main results are given in Section 3. First, in Section 3.1 we restate the problem in an equivalent version as a linear quadratic differential game for a stochastic system described by Itô differential equations with finite jumps. Further, we present conditions which guarantee the existence of a Nash equilibrium strategy. The feasibility of the developed methodology is illustrated by numerical experiments in Section 4.
2. The problem

Let us consider the controlled system having the state space representation described by:

dx(t) = (A_0(t)x(t) + B_01(t)u_1(t) + B_02(t)u_2(t))dt + (A_1(t)x(t) + B_11(t)u_1(t) + B_12(t)u_2(t))dw(t),  t ∈ [t_0, t_f],   (1)
x(t_0) = x_0,

where x(t) ∈ R^n are the state variables and u_k(t) ∈ R^{m_k}, k = 1, 2, are the control parameters. In (1), {w(t)}_{t≥0} is a 1-dimensional standard Wiener process defined on a given probability space (Ω, F, P). The coefficients of the system (1) are continuous matrix-valued functions defined on the interval [t_0, t_f]. According to the terminology used in the theory of differential games (see e.g. [13]), u_k(·) : [t_0, t_f] → R^{m_k} are called strategies (policies) available to the player P_k, k = 1, 2.

In the definition of a differential game problem, an important role is played by the class of admissible strategies. In the present work, we consider the case when each player has access to piecewise constant strategies of the form:

u_k(t) = F_k(j)x(t_j),  t_j ≤ t < t_{j+1},  j = 0, 1, ..., N − 1,   (2)

where t_0 < t_1 < ··· < t_{N−1} < t_N = t_f is a partition of the interval [t_0, t_f]. In (2), F_k(j) ∈ R^{m_k×n}, k = 1, 2, j = 0, 1, ..., N − 1, are arbitrary matrices and x(t_j) are the values of the solution of the closed-loop system

dx(t) = (A_0(t)x(t) + B_01(t)F_1(j)x(t_j) + B_02(t)F_2(j)x(t_j))dt + (A_1(t)x(t) + B_11(t)F_1(j)x(t_j) + B_12(t)F_2(j)x(t_j))dw(t),
t_j ≤ t < t_{j+1},  j = 0, 1, ..., N − 1,
x(t_0) = x_0.   (3)

Hence, for the implementation of a strategy of type (2), the player k needs to know the initial state x_0 and the sequence of the feedback gains

F_k = (F_k(0), F_k(1), ..., F_k(N − 1)),  F_k(j) ∈ R^{m_k×n}.   (4)

This allows us to identify an admissible piecewise constant strategy of type (2) with a pair (F_1, F_2) of sequences of feedback gains of type (4).

To each player P_k we associate a quadratic performance criterion of the form:

J_k(x_0; F_1, F_2) = E[ x_F^T(t_f)G_k x_F(t_f) + ∫_{t_0}^{t_f} x_F^T(t)M_k(t)x_F(t)dt ] + Σ_{j=0}^{N−1} E[ x_F^T(t_j)( F_k^T(j)R_kk(j)F_k(j) + F_ℓ^T(j)R_kℓ(j)F_ℓ(j) )x_F(t_j) ],   (5)

where x_F(·) is the solution of the closed-loop system (3) corresponding to the pair of feedback gains F = (F_1, F_2). Regarding the weight matrices in (5) we make the assumption:

(H1). (a) t → M_k(t) : [t_0, t_f] → R^{n×n} is a continuous matrix-valued function such that M_k(t) = M_k^T(t), ∀ t ∈ [t_0, t_f].
(b) G_k = G_k^T, R_kk(j) = R_kk^T(j), R_kℓ(j) = R_kℓ^T(j), k, ℓ = 1, 2, ℓ ≠ k, j = 0, 1, ..., N − 1.

Definition 1. We say that the pair of sequences of feedback gains (F̃_1, F̃_2) achieves a feedback Nash equilibrium strategy for the linear quadratic (LQ) differential game described by the dynamical system (1), the quadratic cost functionals (5) and the class of piecewise constant admissible strategies of type (2) if the following inequalities hold:

J_1(x_0; F̃_1, F̃_2) ≤ J_1(x_0; F_1, F̃_2),   (6a)
J_2(x_0; F̃_1, F̃_2) ≤ J_2(x_0; F̃_1, F_2),   (6b)

for all F_k = (F_k(0), F_k(1), ..., F_k(N − 1)), k = 1, 2.

In this work we shall provide explicit formulae for the matrices F̃_k(j), k = 1, 2, j = 0, 1, ..., N − 1, which are components of a Nash equilibrium strategy in the class of admissible strategies in a state feedback form with sampled measurements. We shall see that the main ingredient in the construction of the matrices F̃_k(j) is the solution of a problem with given terminal
value (TVP) associated to a system of matrix linear differential equations with finite jumps and algebraic constraints in equality and inequality form.
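To make the sampling mechanism of (2)–(3) concrete, here is a minimal simulation sketch (our own illustration; the matrices, gains and grid in the usage below are made-up examples, not data from the paper) of the closed-loop system (3) by the Euler–Maruyama scheme: within each sampling interval both controls are frozen at u_k = F_k(j)x(t_j).

```python
import numpy as np

def simulate_closed_loop(A0, A1, B01, B02, B11, B12, F1, F2,
                         x0, t_grid, substeps=200, seed=None):
    """Euler-Maruyama simulation of the sampled-data closed-loop
    system (3): on [t_j, t_{j+1}) both controls are held constant at
    u_k = F_k(j) x(t_j), computed from the sampled state x(t_j)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = [x.copy()]
    for j in range(len(t_grid) - 1):
        u1, u2 = F1[j] @ x, F2[j] @ x        # piecewise constant controls
        dt = (t_grid[j + 1] - t_grid[j]) / substeps
        for _ in range(substeps):
            dw = np.sqrt(dt) * rng.standard_normal()
            drift = A0 @ x + B01 @ u1 + B02 @ u2
            diffusion = A1 @ x + B11 @ u1 + B12 @ u2
            x = x + drift * dt + diffusion * dw
        samples.append(x.copy())             # sampled value x(t_{j+1})
    return np.array(samples)
```

Only the sampled values x(t_j) enter the controls, matching the information pattern of (2).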
3. Main results

3.1. An equivalent setting of the problem

In this subsection we transform the problem of finding a Nash equilibrium strategy of the LQ differential game described by the dynamical system (1), the performance criteria (5) and the class of admissible strategies (2)–(3) into an equivalent problem described by a system of Itô differential equations with finite jumps.

First, let us remark that the strategies (2)–(3) are a special form of the strategies of type

u_k(t) = u_k(j),  t_j ≤ t < t_{j+1},  0 ≤ j ≤ N − 1.   (7)

We set ξ(t) = (x^T(t), v_1^T(t), v_2^T(t))^T, where v_k(t) ∈ R^{m_k}, k = 1, 2, are fictitious state variables. With this extended new state variable, one easily obtains that the system (1), under inputs of type (7), is equivalent to the following controlled system with finite jumps:

dξ(t) = A_0(t)ξ(t)dt + A_1(t)ξ(t)dw(t),  t_j ≤ t < t_{j+1},   (8a)
ξ(t_j^+) = A_d ξ(t_j) + B_d1 u_1(j) + B_d2 u_2(j),  j = 0, 1, ..., N − 1,   (8b)
ξ(t_0) = (x_0^T  0^T  0^T)^T,   (8c)

where we denoted

A_k(t) = ( A_k(t)  B_k1(t)  B_k2(t) ; O_mn  O_mm1  O_mm2 ),  k = 0, 1,
A_d = ( I_n  O_nm1  O_nm2 ; O_mn  O_mm1  O_mm2 ),
B_d1 = ( O_nm1^T  I_m1  O_m2m1^T )^T,  B_d2 = ( O_nm2^T  O_m1m2^T  I_m2 )^T,   (9)

where m = m_1 + m_2 and O_qr is the null matrix of size q × r. In this work, F_t ⊂ F stands for the σ-algebra generated by the random variables w(s), 0 ≤ s ≤ t.

Applying Theorem 5.2.1 in [14] on each interval [t_j, t_{j+1}] we obtain:

Proposition 1. For each ξ_0 ∈ R^{n+m} and each finite sequence u_k = (u_k(0), u_k(1), ..., u_k(N − 1)), k = 1, 2, of m_k-dimensional random vectors that are F_{t_j}-measurable and satisfy E[|u_k(j)|²] < ∞, 0 ≤ j ≤ N − 1, k = 1, 2, the system of differential equations with finite jumps (8) has a unique solution ξ_u(t) = ξ(t; t_0, ξ_0, u_1, u_2) with the properties:

(a) ξ_u(t) is F_t-measurable;
(b) t → ξ_u(t) is left continuous a.s. in every t ∈ (t_0, t_f];
(c) E[|ξ_u(t)|²] < ∞;
(d) ξ_u(t_0) = ξ_0.

Based on Proposition 1, we deduce that for arbitrary matrices F_k(j) ∈ R^{m_k×(n+m)} the solution of the system of linear differential equations with finite jumps

dξ(t) = A_0(t)ξ(t)dt + A_1(t)ξ(t)dw(t),  t_j ≤ t < t_{j+1},
ξ(t_j^+) = (A_d + B_d1 F_1(j) + B_d2 F_2(j))ξ(t_j),  j = 0, 1, ..., N − 1,
ξ(t_0) = ξ_0,   (10)

is well defined. This allows us to consider strategies of the form

u_k(j) = F_k(j)ξ(t_j),  0 ≤ j ≤ N − 1,  k = 1, 2.   (11)

Based on (5), for all j_0 ∈ {0, 1, ..., N − 1} we associate the performance criteria

J_k(t_{j_0}, ξ_0; F_1, F_2) = E[ ξ_F^T(t_f)G_k ξ_F(t_f) + ∫_{t_{j_0}}^{t_f} ξ_F^T(t)M_k(t)ξ_F(t)dt ] + Σ_{j=j_0}^{N−1} E[ ξ_F^T(t_j)( F_k^T(j)R_kk(j)F_k(j) + F_ℓ^T(j)R_kℓ(j)F_ℓ(j) )ξ_F(t_j) ],   (12)

k, ℓ = 1, 2, ℓ ≠ k, where ξ_F(·) is the solution of (10) with ξ_F(t_{j_0}) = ξ_0 and

G_k = ( G_k  O_nm ; O_mn  O_mm ),  M_k(t) = ( M_k(t)  O_nm ; O_mn  O_mm ),   (13)

G_k, M_k(t), R_kk(j), R_kℓ(j) being from (5) and F_k = (F_k(0), ..., F_k(N − 1)), k = 1, 2. Thus, the dynamical system (8)–(9) and the performance criteria (12)–(13), together with the set of admissible strategies of type (11), define a problem of LQ dynamical game. Thus the analogue of Definition 1 is:

Definition 2. We say that a strategy of type

ũ_k(j) = F̃_k(j)ξ̃(t_j),  0 ≤ j ≤ N − 1,  k = 1, 2,   (14)

or, equivalently, the pair of sequences of feedback gains (F̃_1, F̃_2), achieves a Nash equilibrium for the LQ dynamical game described by (8), (12) and the class of admissible strategies (11) if

J_1(t_{j_0}, ξ_0; F̃_1, F̃_2) ≤ J_1(t_{j_0}, ξ_0; F_1, F̃_2),   (15a)
J_2(t_{j_0}, ξ_0; F̃_1, F̃_2) ≤ J_2(t_{j_0}, ξ_0; F̃_1, F_2),   (15b)

for all j_0 ∈ {0, 1, ..., N − 1}, F_1 = (F_1(0), ..., F_1(N − 1)), F_2 = (F_2(0), ..., F_2(N − 1)), with F_k(j) ∈ R^{m_k×(n+m)}, 0 ≤ j ≤ N − 1, k = 1, 2.

According to the terminology introduced in [13], we call the strategies of type (14) feedback Nash equilibrium strategies for the LQ differential game under consideration. In the next subsection we shall provide explicit formulae for the gain matrices of a feedback Nash equilibrium strategy of type (14).

3.2. Feedback Nash equilibrium strategy for a stochastic LQ game with finite jumps

Let ũ_1(t), ũ_2(t) be a feedback Nash equilibrium strategy of type (14). Substituting u_2(t) by ũ_2(t) = F̃_2(j)ξ̃(t_j) in (8) and in (12) written for k = 1, we obtain

dξ(t) = A_0(t)ξ(t)dt + A_1(t)ξ(t)dw(t),  t_j ≤ t < t_{j+1},   (16a)
ξ(t_j^+) = (A_d + B_d2 F̃_2(j))ξ(t_j) + B_d1 u_1(j),  j = 0, 1, ..., N − 1,   (16b)
ξ(t_{j_0}) = ξ_0,   (16c)

J_1(t_{j_0}, ξ_0; u_1, F̃_2) = E[ ξ^T(t_f)G_1 ξ(t_f) + ∫_{t_{j_0}}^{t_f} ξ^T(t)M_1(t)ξ(t)dt ] + Σ_{j=j_0}^{N−1} E[ ξ^T(t_j)M_d1(j)ξ(t_j) + u_1^T(j)R_11(j)u_1(j) ],   (17a)

M_d1(j) = F̃_2^T(j)R_12(j)F̃_2(j),   (17b)
where u_1(j) = F_1(j)ξ(t_j), F_1(j) ∈ R^{m_1×(n+m)} being arbitrary matrices.

Similarly, substituting u_1(j) by ũ_1(j) = F̃_1(j)ξ(t_j) in (8) and in (12) written for k = 2, we obtain

dξ(t) = A_0(t)ξ(t)dt + A_1(t)ξ(t)dw(t),  t_j ≤ t < t_{j+1},   (18a)
ξ(t_j^+) = (A_d + B_d1 F̃_1(j))ξ(t_j) + B_d2 u_2(j),  j = 0, 1, ..., N − 1,   (18b)
ξ(t_{j_0}) = ξ_0,   (18c)

J_2(t_{j_0}, ξ_0; F̃_1, u_2) = E[ ξ^T(t_f)G_2 ξ(t_f) + ∫_{t_{j_0}}^{t_f} ξ^T(t)M_2(t)ξ(t)dt ] + Σ_{j=j_0}^{N−1} E[ ξ^T(t_j)M_d2(j)ξ(t_j) + u_2^T(j)R_22(j)u_2(j) ],   (19a)

M_d2(j) = F̃_1^T(j)R_21(j)F̃_1(j),   (19b)

and u_2(j) = F_2(j)ξ(t_j), with F_2(j) ∈ R^{m_2×(n+m)} arbitrary matrices.

Remark 1. (a) The inequality (15a) means that ũ_1(j) = F̃_1(j)ξ̃(t_j) achieves the minimum of the quadratic cost (17) along the trajectories of the dynamical system (16) determined by the inputs in a state feedback form u_1(j) = F_1(j)ξ(t_j), for arbitrary F_1(j) ∈ R^{m_1×(n+m)}.
(b) The inequality (15b) means that ũ_2(j) = F̃_2(j)ξ̃(t_j) achieves the minimum of the quadratic cost (19) along the trajectories of the dynamical system (18) determined by the inputs in a state feedback form u_2(j) = F_2(j)ξ(t_j), with arbitrary F_2(j) ∈ R^{m_2×(n+m)}.

Employing Theorem 1 and Theorem 3 from [15] and Remark 1(a), we obtain:

Corollary 2. The quadratic performance criterion (17) has an optimal control in a state feedback form if and only if the terminal value problem (TVP) associated to the matrix linear differential equation (MLDE) with finite jumps

Ṗ_1(t) + A_0^T(t)P_1(t) + P_1(t)A_0(t) + A_1^T(t)P_1(t)A_1(t) + M_1(t) = 0,  t_j ≤ t < t_{j+1},   (20a)

P_1(t_j^−) = (A_d + B_d2 F̃_2(j))^T P_1(t_j)(A_d + B_d2 F̃_2(j)) − (A_d + B_d2 F̃_2(j))^T P_1(t_j)B_d1 (R_11(j) + B_d1^T P_1(t_j)B_d1)^† B_d1^T P_1(t_j)(A_d + B_d2 F̃_2(j)) + M_d1(j),  0 ≤ j ≤ N − 1,   (20b)

P_1(t_N^−) = G_1,   (20c)

has a solution P̃_1(t) defined on the whole interval [t_0, t_f] and satisfying the constraints:

(R_11(j) + B_d1^T P̃_1(t_j)B_d1)(R_11(j) + B_d1^T P̃_1(t_j)B_d1)^† B_d1^T P̃_1(t_j)(A_d + B_d2 F̃_2(j)) = B_d1^T P̃_1(t_j)(A_d + B_d2 F̃_2(j)),   (21a)

R_11(j) + B_d1^T P̃_1(t_j)B_d1 ≥ 0,  0 ≤ j ≤ N − 1.   (21b)

If these conditions are satisfied, then the gain matrices of the optimal control are given by

F̃_1(j) = −Π_11^†(P̃_1(t_j), j)B_d1^T P̃_1(t_j)(A_d + B_d2 F̃_2(j)) + (I_m1 − Π_11^†(P̃_1(t_j), j)Π_11(P̃_1(t_j), j))φ_1(j),   (22a)

where

Π_11(P̃_1(t_j), j) ≜ R_11(j) + B_d1^T P̃_1(t_j)B_d1,   (22b)

for all j = 0, 1, ..., N − 1, and φ_1(j) ∈ R^{m_1×(n+m)} are arbitrary matrices. The minimal value of the cost (17) is

J_1(t_0, ξ_0; F̃_1, F̃_2) = ξ_0^T P̃_1(t_0^−)ξ_0.   (23)

Here and in the sequel, the superscript † denotes the pseudoinverse of a matrix; for the precise definition of the pseudoinverse and its properties we refer to [16].

Similarly, Theorem 1 and Theorem 3 from [15] and Remark 1(b) allow us to obtain:

Corollary 3. The linear quadratic optimal control problem described by the dynamical system with finite jumps (18) and the performance criterion (19) has an optimal control in a state feedback form if and only if the TVP associated to the MLDE with finite jumps

Ṗ_2(t) + A_0^T(t)P_2(t) + P_2(t)A_0(t) + A_1^T(t)P_2(t)A_1(t) + M_2(t) = 0,  t_j ≤ t < t_{j+1},   (24a)

P_2(t_j^−) = (A_d + B_d1 F̃_1(j))^T P_2(t_j)(A_d + B_d1 F̃_1(j)) − (A_d + B_d1 F̃_1(j))^T P_2(t_j)B_d2 (R_22(j) + B_d2^T P_2(t_j)B_d2)^† B_d2^T P_2(t_j)(A_d + B_d1 F̃_1(j)) + M_d2(j),  0 ≤ j ≤ N − 1,   (24b)

P_2(t_N^−) = G_2,   (24c)

has a solution P̃_2(·) defined on the whole interval [t_0, t_f] and satisfying the constraints:

(R_22(j) + B_d2^T P̃_2(t_j)B_d2)(R_22(j) + B_d2^T P̃_2(t_j)B_d2)^† B_d2^T P̃_2(t_j)(A_d + B_d1 F̃_1(j)) = B_d2^T P̃_2(t_j)(A_d + B_d1 F̃_1(j)),   (25a)

R_22(j) + B_d2^T P̃_2(t_j)B_d2 ≥ 0,  0 ≤ j ≤ N − 1.   (25b)

If these conditions are satisfied, then the gain matrices of the optimal control are given by

F̃_2(j) = −Π_22^†(P̃_2(t_j), j)B_d2^T P̃_2(t_j)(A_d + B_d1 F̃_1(j)) + (I_m2 − Π_22^†(P̃_2(t_j), j)Π_22(P̃_2(t_j), j))φ_2(j),   (26a)

where

Π_22(P̃_2(t_j), j) ≜ R_22(j) + B_d2^T P̃_2(t_j)B_d2,   (26b)

for all j = 0, 1, ..., N − 1, and φ_2(j) ∈ R^{m_2×(n+m)} are arbitrary matrices. The minimal value of the cost (19) is

J_2(t_0, ξ_0; F̃_1, F̃_2) = ξ_0^T P̃_2(t_0^−)ξ_0.   (27)

One sees that the TVP (20) contains the matrices F̃_2(j) as parameters, while the TVP (24) contains the gain matrices F̃_1(j) as parameters. Hence, the two TVPs have to be considered together as a system of interconnected MLDEs with finite jumps. This fact becomes clearer from the next theorem. To ease the statement of the theorem, we introduce the notations:

Π(P_1(t_j), P_2(t_j), j) = ( Π_11(P_1(t_j), j)   B_d1^T P_1(t_j)B_d2 ; B_d2^T P_2(t_j)B_d1   Π_22(P_2(t_j), j) ),   (28a)

Γ(P_1(t_j), P_2(t_j)) = ( B_d1^T P_1(t_j)A_d ; B_d2^T P_2(t_j)A_d ).   (28b)
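To make the jump formulas concrete, the following minimal NumPy sketch (our own illustration; the function names and the numbers in the check are ours, not from the paper) builds the extended matrices A_d, B_d1, B_d2 of (9) and applies one Riccati-type jump of the form (20b)/(24b), handling the pseudoinverse with numpy.linalg.pinv.

```python
import numpy as np

def jump_matrices(n, m1, m2):
    """A_d, B_d1, B_d2 from (9): at a sampling instant the x-part of
    xi = (x, v1, v2) is kept and the fictitious parts are reset to the
    new control values u_1(j), u_2(j)."""
    m = m1 + m2
    Ad = np.zeros((n + m, n + m)); Ad[:n, :n] = np.eye(n)
    Bd1 = np.zeros((n + m, m1));   Bd1[n:n + m1, :] = np.eye(m1)
    Bd2 = np.zeros((n + m, m2));   Bd2[n + m1:, :] = np.eye(m2)
    return Ad, Bd1, Bd2

def riccati_jump(P, Ad, Bd, Bd_other, F_other, R, Md):
    """One jump of type (20b)/(24b): P is P_k(t_j) (the value to the
    right of t_j), F_other the other player's gain; returns P_k(t_j^-)."""
    Acl = Ad + Bd_other @ F_other          # e.g. A_d + B_d2 F2(j) in (20b)
    S = Bd.T @ P @ Acl
    Pi = R + Bd.T @ P @ Bd                 # Pi_kk of (22b)/(26b)
    return Acl.T @ P @ Acl - S.T @ np.linalg.pinv(Pi) @ S + Md
```

A quick sanity check of the jump relation (8b): applying A_d ξ + B_d1 u_1 + B_d2 u_2 to ξ = (x, v_1, v_2) indeed returns (x, u_1, u_2).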
Theorem 4. Under the assumption (H1), the following are equivalent:

(i) the LQ differential game described by the dynamical system with finite jumps (8), the quadratic performance criteria (12) and the class of admissible strategies (11) has a Nash equilibrium strategy of the form ũ_k(j) = F̃_k(j)ξ_F̃(t_j), k = 1, 2, j = 0, 1, ..., N − 1;

(ii) the solutions P̃_1(·), P̃_2(·) of the TVPs (20) and (24) are defined on the whole interval [t_0, t_f] and satisfy the constraints (21b) and (25b) together with

Π(P̃_1(t_j), P̃_2(t_j), j)Π^†(P̃_1(t_j), P̃_2(t_j), j)Γ(P̃_1(t_j), P̃_2(t_j)) = Γ(P̃_1(t_j), P̃_2(t_j)),  j = N − 1, N − 2, ..., 0.   (29)

Furthermore, if these conditions are fulfilled, then the gain matrices of the feedback Nash equilibrium strategy are given by

( F̃_1(j) ; F̃_2(j) ) = −Π^†(P̃_1(t_j), P̃_2(t_j), j)Γ(P̃_1(t_j), P̃_2(t_j)) + ( I_m − Π^†(P̃_1(t_j), P̃_2(t_j), j)Π(P̃_1(t_j), P̃_2(t_j), j) )φ(j),   (30)

j = N − 1, N − 2, ..., 0, where φ(j) ∈ R^{m×(n+m)} are arbitrary matrices. The value of the minimal cost of the player P_k is J_k(t_0, ξ_0; F̃_1, F̃_2) = ξ_0^T P̃_k(t_0^−)ξ_0, k = 1, 2.
Proof. If the LQ differential game described by the dynamical system with finite jumps (8), the quadratic performance criteria (12) and the class of admissible strategies (11) has a Nash equilibrium strategy ũ_k(j) = F̃_k(j)ξ̃(t_j), k = 1, 2, then from the necessity part of Corollaries 2 and 3 we deduce that the solutions of the TVPs (20) and (24) satisfy the constraints (21) and (25), respectively, and that the gain matrices F̃_k(j) of the equilibrium strategies are interconnected by (22) and (26), respectively. Employing the properties of the pseudoinverse of a matrix, we obtain from (22) and (26) that F̃_k(j) satisfy the system of linear equations:

Π_11(P̃_1(t_j), j)F̃_1(j) + B_d1^T P̃_1(t_j)B_d2 F̃_2(j) = −B_d1^T P̃_1(t_j)A_d,   (31a)
B_d2^T P̃_2(t_j)B_d1 F̃_1(j) + Π_22(P̃_2(t_j), j)F̃_2(j) = −B_d2^T P̃_2(t_j)A_d.   (31b)

Based on (28), we may rewrite (31) in the compact form

Π(P̃_1(t_j), P̃_2(t_j), j)( F̃_1(j) ; F̃_2(j) ) = −Γ(P̃_1(t_j), P̃_2(t_j)).   (32)

Applying Lemma A.1 from Appendix A to Eq. (32), we may conclude that (29) is satisfied and, additionally, that all solutions of (32) are given by (30). Thus we have shown that the implication (i) ⇒ (ii) holds.

We prove now the converse implication. Let us assume that the TVPs (20) and (24), respectively, have solutions P̃_1(·), P̃_2(·) satisfying the constraints (21b) and (25b), respectively, together with (29). Let ( F̃_1(j) ; F̃_2(j) ) be any of the feedback gains defined by (30). It is obvious that this is a solution of Eq. (32) and, consequently, of (31). Using (22b) and (26b), we rewrite (31a) as

Π_11(P̃_1(t_j), j)F̃_1(j) = −B_d1^T P̃_1(t_j)(A_d + B_d2 F̃_2(j))   (33a)

and (31b) as

Π_22(P̃_2(t_j), j)F̃_2(j) = −B_d2^T P̃_2(t_j)(A_d + B_d1 F̃_1(j)).   (33b)

Since Eqs. (33a) and (33b), respectively, have solutions, we deduce via Lemma A.1 that

( I_mk − Π_kk(P̃_k(t_j), j)Π_kk^†(P̃_k(t_j), j) )B_dk^T P̃_k(t_j)(A_d + B_dℓ F̃_ℓ(j)) = 0,   (34)

k, ℓ = 1, 2, ℓ ≠ k, j = 0, 1, ..., N − 1. It is obvious that, for k = 1 and ℓ = 2, (34) coincides with (21a), while, for k = 2 and ℓ = 1, (34) is just (25a). At the same time, F̃_1(j), F̃_2(j) satisfy (22) and (26), respectively. Thus, we have obtained that P̃_1(·) satisfies (20)–(22) and P̃_2(·) satisfies (24)–(26), respectively. Applying the sufficiency part of Corollary 2, we deduce that (F̃_1, F̃_2), whose components (F̃_1(j), F̃_2(j)) are computed via (30), satisfies (15a). On the other hand, the sufficiency part of Corollary 3 allows us to conclude that (F̃_1, F̃_2) satisfies also (15b). The values of the minimal cost
of the player P_k, k = 1, 2, are obtained directly from (23) and (27), respectively. Thus the proof is complete. ■

Remark 2. For the values of j ∈ {0, 1, ..., N − 1} for which the matrices Π(P̃_1(t_j), P̃_2(t_j), j) are invertible, the conditions (29) are redundant. In this case the gain matrices F̃_k(j) are uniquely determined by

( F̃_1(j) ; F̃_2(j) ) = −Π^{−1}(P̃_1(t_j), P̃_2(t_j), j)Γ(P̃_1(t_j), P̃_2(t_j)).
To compute the gain matrices F̃_k(j) of a Nash equilibrium strategy for the LQ dynamical game under consideration, one uses the implication (ii) ⇒ (i) of Theorem 4. To this end, let us describe the computation of the solutions of the TVPs (20) and (24), respectively, satisfying (29) together with (21b) and (25b), respectively. The main steps are:

STEP 1. Compute P̃_1(t_{N−1}), P̃_2(t_{N−1}) by integrating the backward linear differential equations (20a) and (24a), respectively, on the interval [t_{N−1}, t_N], using the terminal values (20c) and (24c), respectively. Form the matrices from (28) for j = N − 1 and check the feasibility of the conditions (21b), (25b) and (29) for j = N − 1. If these constraints are fulfilled by P̃_1(t_{N−1}) and P̃_2(t_{N−1}), go to STEP 2. If at least one of them is violated, the algorithm stops, because the problem has no solution (the validity of the conditions (21b), (25b) and (29) is a necessary and sufficient condition for the existence of a Nash equilibrium strategy in a state feedback form).

STEP 2. Compute F̃_k(N − 1) by solving Eq. (32) written for j = N − 1, or directly by (30). Next, compute P̃_k(t_{N−1}^−), k = 1, 2, from (20b) and (24b), respectively, written for j = N − 1.

STEP 3. Assuming that for some j ∈ {1, 2, ..., N − 1} we have already computed P̃_k(t_j^−), k = 1, 2, compute P̃_k(t_{j−1}), k = 1, 2, by integrating the backward linear differential equations (20a) and (24a), respectively, with the terminal values P̃_k(t_j^−), k = 1, 2. Check that P̃_k(t_{j−1}) satisfy the constraints (21b), (25b) and (29) written with j replaced by j − 1. If these constraints are satisfied, go to STEP 4. If at least one is violated, the algorithm stops, because in this case the LQ differential game under consideration has no Nash equilibrium strategy in a state feedback form.

STEP 4. Compute F̃_k(j − 1), k = 1, 2, as a solution of Eq. (32), or directly from (30), written with j replaced by j − 1. Further, compute P̃_k(t_{j−1}^−) from (20b) and (24b), respectively, written with j replaced by j − 1.

STEP 5. If j > 1, go to STEP 3 replacing j by j − 1. If j = 1, the algorithm stops. The matrices F̃_k(j), k = 1, 2, j = 0, 1, ...,
N − 1, obtained by this procedure, provide the gain matrices of a Nash equilibrium strategy for the LQ differential game under consideration.
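Under simplifying assumptions — time-invariant coefficients, crude explicit Euler integration of (20a)/(24a), and Π(·) invertible at every t_j, so that (29) is automatic (Remark 2) — STEPs 1–5 can be sketched as follows. This is our own illustration, not code from the paper: the jump is applied in a completed-square closed-loop form, which is algebraically equivalent to (20b)/(24b) when Π_kk is invertible, and all function names are ours.

```python
import numpy as np

def backward_interval(P_end, A0, A1, M, h, steps=200):
    """Integrate the linear MLDE (20a)/(24a) backward over one sampling
    interval of length h, starting from the value at the right end
    (explicit Euler; time-invariant coefficients assumed in this sketch)."""
    P, dt = P_end.copy(), h / steps
    for _ in range(steps):
        P = P + dt * (A0.T @ P + P @ A0 + A1.T @ P @ A1 + M)
    return P

def backward_sweep(G1, G2, M1, M2, A0, A1, Ad, Bd1, Bd2,
                   R11, R22, R12, R21, t_grid):
    """STEPs 1-5: sweep backward over the sampling instants, computing
    the Nash gains F1(j), F2(j) and the jumped values P_k(t_j^-)."""
    m1 = Bd1.shape[1]
    P1, P2 = G1.copy(), G2.copy()          # terminal values (20c), (24c)
    gains = [None] * (len(t_grid) - 1)
    for j in range(len(t_grid) - 2, -1, -1):
        h = t_grid[j + 1] - t_grid[j]
        P1 = backward_interval(P1, A0, A1, M1, h)    # P1(t_j)
        P2 = backward_interval(P2, A0, A1, M2, h)    # P2(t_j)
        # form (28) and solve the coupled linear system (32)
        Pi = np.block([[R11 + Bd1.T @ P1 @ Bd1, Bd1.T @ P1 @ Bd2],
                       [Bd2.T @ P2 @ Bd1,       R22 + Bd2.T @ P2 @ Bd2]])
        Gamma = np.vstack([Bd1.T @ P1 @ Ad, Bd2.T @ P2 @ Ad])
        F = -np.linalg.solve(Pi, Gamma)              # Pi assumed invertible
        F1j, F2j = F[:m1, :], F[m1:, :]
        gains[j] = (F1j, F2j)
        # jumps (20b)/(24b) in the equivalent closed-loop form
        Acl = Ad + Bd1 @ F1j + Bd2 @ F2j
        P1 = Acl.T @ P1 @ Acl + F1j.T @ R11 @ F1j + F2j.T @ R12 @ F2j
        P2 = Acl.T @ P2 @ Acl + F2j.T @ R22 @ F2j + F1j.T @ R21 @ F1j
    return gains, P1, P2
```

The returned P_k are the values P̃_k(t_0^−), whose (0,0)-block gives each player's equilibrium cost for a given x_0.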
3.3. The case of positive semidefinite weight matrices

In this subsection we analyze the solvability of the TVPs (20) and (24), respectively, when the assumption (H1) is replaced by a new assumption that contains more information regarding the sign of the weight matrices.

(H2). For each k = 1, 2 the weight matrices from (5) satisfy:
(a) M_k(t) = M_k^T(t) ≥ 0, ∀ t ∈ [t_0, t_f];
(b) R_kℓ(j) = R_kℓ^T(j) ≥ 0, ℓ = 1, 2, 0 ≤ j ≤ N − 1;
(c) G_k = G_k^T ≥ 0.

The next result shows that under the assumption (H2), the solutions of the TVPs (20) and (24) can be prolonged to the whole interval [t_0, t_f] and satisfy the constraints (21) and (25), respectively, provided that the conditions (29) are fulfilled.

Theorem 5. Assume that the assumption (H2) is fulfilled. If for some j_0 ∈ {1, 2, ..., N − 1} the solutions P̃_1(t), P̃_2(t) of the TVPs (20) and (24), respectively, are well defined for any t ∈ [t_{j_0}, t_N] and satisfy the conditions (29) for any j_0 ≤ j ≤ N − 1, then these solutions are defined on the whole interval [t_{j_0−1}, t_N]. Moreover, P̃_k(t) ≥ 0 for all t ∈ [t_{j_0−1}, t_N], and the constraints (21) and (25), respectively, are satisfied for any j_0 − 1 ≤ j ≤ N − 1.

Proof. See Appendix B. ■

3.4. Piecewise Nash equilibrium strategy in a state feedback form

Now we are in position to provide explicit formulae for the gain matrices F̃_k(j) ∈ R^{m_k×n}, k = 1, 2, 0 ≤ j ≤ N − 1, which are components of a Nash equilibrium strategy (F̃_1, F̃_2) in the class of admissible strategies of the form (2)–(3). Let

P̃_k(t) = ( P̃_{k,00}(t)  P̃_{k,01}(t)  P̃_{k,02}(t) ; P̃_{k,01}^T(t)  P̃_{k,11}(t)  P̃_{k,12}(t) ; P̃_{k,02}^T(t)  P̃_{k,12}^T(t)  P̃_{k,22}(t) )   (35)

be the partition of the solutions of the TVPs (20) and (24), respectively, compatible with the partitions of their coefficients given in (9) and (13). Thus we obtain that (28) has the structure:

Π(P̃_1(t_j), P̃_2(t_j), j) = ( R_11(j) + P̃_{1,11}(t_j)   P̃_{1,12}(t_j) ; P̃_{2,12}^T(t_j)   R_22(j) + P̃_{2,22}(t_j) ),   (36a)

Γ(P̃_1(t_j), P̃_2(t_j)) = ( P̃_{1,01}^T(t_j)   O_{m_1 m} ; P̃_{2,02}^T(t_j)   O_{m_2 m} ).   (36b)

The conditions (29) become:

Π(P̃_1(t_j), P̃_2(t_j), j)Π^†(P̃_1(t_j), P̃_2(t_j), j)( P̃_{1,01}^T(t_j) ; P̃_{2,02}^T(t_j) ) = ( P̃_{1,01}^T(t_j) ; P̃_{2,02}^T(t_j) ).   (37)

The linear equation (32) becomes

Π(P̃_1(t_j), P̃_2(t_j), j)( F̃_1(j) ; F̃_2(j) ) = −( P̃_{1,01}^T(t_j) ; P̃_{2,02}^T(t_j) ).   (38)

According to Lemma A.1, Eq. (38) has a solution if and only if (37) is fulfilled. Employing (28b), (30), (36b) and (38), and recalling that ξ(t) = (x^T(t) v_1^T(t) v_2^T(t))^T, we obtain

u_k(j) = ( F̃_k(j)  O )ξ(t_j) = F̃_k(j)x(t_j),

where F̃_k(j), k = 1, 2, are obtained by solving Eq. (38). From (7) and (8) with u_k(j) = F̃_k(j)x(t_j) we obtain the closed-loop system:

dx(t) = [ A_0(t)x(t) + ( B_01(t)F̃_1(j) + B_02(t)F̃_2(j) )x(t_j) ]dt + [ A_1(t)x(t) + ( B_11(t)F̃_1(j) + B_12(t)F̃_2(j) )x(t_j) ]dw(t)   (39)

for t_j ≤ t < t_{j+1}, j = 0, 1, ..., N − 1, x(t_0) = x_0. Thus we may provide the solution of the problem of the existence of a piecewise constant Nash equilibrium strategy in a state feedback form stated in Section 2.

Theorem 6. Under the assumption (H1), the LQ differential game described by the dynamical system (1), the performance criteria (5) and the class of admissible strategies (2)–(3) has a Nash equilibrium strategy if and only if the solutions P̃_1(·), P̃_2(·) of the TVPs (20) and (24), respectively, with the partitions given in (35), are defined on the whole interval [t_0, t_f] and satisfy the constraints (37) together with:

R_kk(j) + P̃_{k,kk}(t_j) ≥ 0,  k = 1, 2,  0 ≤ j ≤ N − 1.   (40)

Furthermore, if these conditions are satisfied, then a Nash equilibrium strategy is given by

ũ_k(t) = F̃_k(j)x̃(t_j),  t_j ≤ t < t_{j+1},  0 ≤ j ≤ N − 1,   (41)

F̃_k(j) being the solution of Eq. (38) and x̃(t_j) being the values of the solution of the closed-loop system (39). The equilibrium payoff of the player P_k is J_k(x_0; ũ_1, ũ_2) = x_0^T P̃_{k,00}(t_0^−)x_0, k = 1, 2.

Remark 3. From Theorem 5 it follows that if the assumption (H1) is replaced by the assumption (H2), then the conditions (40) are automatically satisfied by the solutions P̃_k(·).

Now we point out a set of conditions which guarantee that the solutions P̃_1(·), P̃_2(·) of the TVPs (20) and (24), respectively, satisfy the constraints (37). To this end, we introduce a new assumption which is a little stronger than (H2).

(H3). The weight matrices from (5) satisfy the following sign conditions:
(a) M_k(t) = M_k^T(t) ≥ 0, t_0 ≤ t ≤ t_f;
(b) G_k = G_k^T ≥ 0;
(c) R_kℓ(j) = R_kℓ^T(j) ≥ 0;
(d) R_kk(j) = R_kk^T(j) > 0;
for 0 ≤ j ≤ N − 1, k, ℓ = 1, 2, ℓ ≠ k.

Proposition 7. Let h = max_{0≤k≤N−1}{t_{k+1} − t_k}. If the assumption (H3) is fulfilled, then the solutions P̃_1(·), P̃_2(·) of the TVPs (20) and (24), respectively, are defined and positive semidefinite on the whole interval [t_0, t_f]. Furthermore, there exists h̃ > 0 with the property that the matrices Π(P̃_1(t_j), P̃_2(t_j), j) defined in (36a) are invertible for any 0 < h < h̃.
Proof. If the assumption (H3) is satisfied then, evidently, the assumption (H2) is satisfied too. Then, based on Theorem 5, the solutions P̃1(·), P̃2(·) of the TVPs (20) and (24), respectively, are well defined and positive semidefinite on the whole interval [t0, tf]. Invoking the partition (35) together with the assumption (H3)(d), we infer that there exist δk > 0 such that

Rkk(j) + P̃k,kk(tj) ≥ δk I_{mk},  0 ≤ j ≤ N − 1,  k = 1, 2.    (42)
Further, employing the linear differential equations satisfied by P̃1,12(·) and P̃2,12(·), we obtain the upper bounds

‖P̃k,12(tj)‖ ≤ γk h,  0 ≤ j ≤ N − 1,  k = 1, 2,    (43)

where γk > 0 are constants depending upon the upper bound of the coefficients. The matrix from (36a) may be rewritten as:

Π(P̃1(tj), P̃2(tj), j) = diag(R11(j) + P̃1,11(tj), R22(j) + P̃2,22(tj))(Im + Ξ(j)),    (44)

0 ≤ j ≤ N − 1, where

Ξ(j) = ( 0   Ξ1(j) ; Ξ2(j)   0 ),

with

Ξ1(j) = (R11(j) + P̃1,11(tj))^{−1} P̃1,12(tj);  Ξ2(j) = (R22(j) + P̃2,22(tj))^{−1} P̃2,12^T(tj).

Based on (42) and (43) we deduce that

‖Ξk(j)‖ ≤ γk δk^{−1} h,  0 ≤ j ≤ N − 1,  k = 1, 2,

which leads to

‖Ξ(j)‖ ≤ γ h,    (45)

0 ≤ j ≤ N − 1, where γ ≥ (γ1² δ1^{−2} + γ2² δ2^{−2})^{1/2}. So,

‖Ξ(j)‖ < 1 if 0 < h < γ^{−1}.    (46)

Thus, from (44) we may conclude that the matrices Π(P̃1(tj), P̃2(tj), j) are invertible for any 0 ≤ j ≤ N − 1 if h satisfies (46), because in this case the matrices Im + Ξ(j) are invertible. The proof is complete. ■

Remark 4. From Proposition 7 it follows that, under the assumption (H3), if the maximal length max_{0≤k≤N−1}{tk+1 − tk} of the sampling period satisfies a condition of type (46), then the conditions (37) and (40) are automatically satisfied. Furthermore, Eq. (38) has a unique solution given by

( F̃1(j) ; F̃2(j) ) = −Π^{−1}(P̃1(tj), P̃2(tj), j) ( P̃1,01^T(tj) ; P̃2,02^T(tj) ).    (47)
4. Numerical experiments

In this section we consider the time invariant case of the system (1) and the performance criteria (5). This means that Ak(t) = Ak, Bk(t) = Bk, Mk(t) = Mk, t ∈ [t0, tf], Rkℓ(j) = Rkℓ, 0 ≤ j ≤ N − 1, k, ℓ = 1, 2. Assume also that tk+1 − tk = h, k = 0, 1, ..., N − 1. In this case, the algorithm described in Remark 1 has some updates. First, from (20a) and (24a) one obtains:

Pk(tj) = e^{Lh}[Pk(t_{j+1}^−)] + ∫_0^h e^{Ls}[Mk] ds,    (48)

where tj = jh and

L[X] = A0^T X + X A0 + A1^T X A1    (49)

for all X ∈ S_{n+m1+m2}. The exponential e^{Ls} is approximated by

e^{Ls}[X] ≃ Σ_{ℓ=0}^{p} (s^ℓ/ℓ!) L^ℓ[X].    (50)

Thus (48) becomes

Pk(tj) = Σ_{ℓ=0}^{p1} (h^ℓ/ℓ!) L^ℓ[Pk(t_{j+1}^−)] + Σ_{ℓ=0}^{p2} (h^{ℓ+1}/(ℓ+1)!) L^ℓ[Mk],    (51)

where pi ≥ 1 are such that

(h^{p1+1}/(p1+1)!) λmax{L^{p1+1}[Pk(t_{j+1}^−)]} < toll

and

(h^{p2+2}/(p2+2)!) λmax{L^{p2+2}[Mk]} < toll.

The iterations L^ℓ[X] are computed from:

L^ℓ[X] = A0^T L^{ℓ−1}[X] + L^{ℓ−1}[X] A0 + A1^T L^{ℓ−1}[X] A1

for ℓ ≥ 1, with L^0[X] = X, where X = Pk(t_{j+1}^−) or X = Mk, respectively. For j = N − 1 one takes Pk(tN) = Gk, k = 1, 2 (according to (20c) and (24c), respectively). Taking Gk ≥ 0, Mk ≥ 0, Rkk > 0, Rkℓ ≥ 0, k, ℓ = 1, 2, ℓ ≠ k, one can choose h > 0 such that the matrices (36a) are invertible. In this case, F̃1(j), F̃2(j) are computed via (47).

Example. We define the matrix coefficients Ak, Bkj, k = 0, 1, j = 1, 2, Mk, Gk, k = 1, 2, Rkk, Rkℓ, k, ℓ = 1, 2, k ≠ ℓ. We take the additional constants t0 = 0, tf = 1, h = 0.2, N = 5.

A0 = ( 1.5 0.17 ; 0.07 −1.4 ),   B01 = ( 1.5 0.7 ; 0.3 0.4 ),
A1 = ( 1.2 0.8 ; 0.95 0.7 ),    B02 = ( 0.7 0.24 ; 0.19 0.9 ),
B11 = ( 0.1 0.2 ; 0.06 0.3 ),   B12 = ( 0.2 0.4 ; 0.04 0.5 ),
M1 = ( 0.8 0.7 ; 0.7 0.95 ),    M2 = ( 1.2 0.45 ; 0.45 1.5 ),
G1 = ( 0.95 0.8 ; 0.8 1.15 ),   G2 = ( 0.6 0.25 ; 0.25 0.8 ),
R11 = ( 0.09 0.04 ; 0.04 0.08 ),   R12 = ( 0.05 0.04 ; 0.04 0.08 ),
R21 = ( 0.06 0.07 ; 0.07 0.094 ),  R22 = ( 0.3 0.15 ; 0.15 0.4 ).
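Under the time-invariant assumptions above, the operator (49) and the truncated-series update (51) can be sketched in a few lines of numpy. This is a minimal illustration only: the truncation orders p1, p2 are fixed here rather than chosen from the tolerance tests, and the jump conditions linking Pk(tj) to Pk(tj−) are not included.

```python
import numpy as np
from math import factorial

def L_op(X, A0, A1):
    # Lyapunov-type operator (49): L[X] = A0^T X + X A0 + A1^T X A1
    return A0.T @ X + X @ A0 + A1.T @ X @ A1

def propagate(P_next, M, h, A0, A1, p1=8, p2=8):
    # Truncated-series update (51):
    #   P(t_j) = sum_{l=0}^{p1} h^l / l!        * L^l[P(t_{j+1}^-)]
    #          + sum_{l=0}^{p2} h^{l+1}/(l+1)!  * L^l[M]
    P = np.zeros_like(P_next)
    term = P_next.copy()                 # term = L^l[P(t_{j+1}^-)]
    for l in range(p1 + 1):
        P = P + (h**l / factorial(l)) * term
        term = L_op(term, A0, A1)
    term = M.copy()                      # term = L^l[M]
    for l in range(p2 + 1):
        P = P + (h**(l + 1) / factorial(l + 1)) * term
        term = L_op(term, A0, A1)
    return P
```

With A1 = 0 and M = 0 the recursion reduces to P(tj) ≈ e^{A0^T h} P(t_{j+1}^−) e^{A0 h}, which gives a quick sanity check of the truncation.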
Applying the algorithm introduced in Remark 1 we obtain:
F̃1(0) = ( −0.8867 −0.0415 ; −0.0024 −0.0567 ),   F̃2(0) = ( −0.6299 −0.1392 ; −0.1948 −0.0581 ),
F̃1(1) = ( −0.7976 −0.0301 ; 0.0021 −0.0627 ),    F̃2(1) = ( −0.6487 −0.1933 ; −0.2040 −0.0816 ),
F̃1(2) = ( −0.7275 −0.0731 ; −0.0053 −0.0813 ),   F̃2(2) = ( −0.6732 −0.3041 ; −0.2192 −0.1298 ),
F̃1(3) = ( −0.6491 −0.1700 ; −0.0146 −0.1189 ),   F̃2(3) = ( −0.7739 −0.5236 ; −0.2602 −0.2205 ),
F̃1(4) = ( −0.6491 −0.1700 ; −0.0146 −0.1189 ),   F̃2(4) = ( −0.7739 −0.5236 ; −0.2602 −0.2205 ).
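As an aside, once such gain sequences are available, the sampled-data closed-loop dynamics (39) can be simulated with an Euler-Maruyama scheme in which the controls are frozen between measurements. The sketch below is illustrative: the helper name `simulate` and the combined blocks B0 = (B01 B02), B1 = (B11 B12), F(j) = (F̃1(j); F̃2(j)) are our own conventions, and the matrices in the test are generic rather than those of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(A0, A1, B0, B1, F, x0, h, n_sub=200):
    # Euler-Maruyama simulation of the sampled-data closed loop (39):
    # on each interval [t_j, t_{j+1}) the control is held at F[j] @ x(t_j).
    x = x0.copy()
    path = [x0.copy()]
    dt = h / n_sub
    for j in range(len(F)):
        u = F[j] @ x                     # control frozen at the sampling instant
        for _ in range(n_sub):
            dw = rng.normal(0.0, np.sqrt(dt))
            x = x + (A0 @ x + B0 @ u) * dt + (A1 @ x + B1 @ u) * dw
        path.append(x.copy())
    return np.array(path)
```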
Thus, following Remark 1, we have found the feedback gains (F̃1, F̃2) that achieve a Nash equilibrium for the LQ dynamical game described by (8)–(12). In the example, F̃k = (F̃k(0), F̃k(1), F̃k(2), F̃k(3), F̃k(4)), k = 1, 2.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Two useful lemmas

Lemma A.1. Let K, H be matrices of compatible dimensions. The matrix equation

K Z = H    (A.1)

has a solution Z if and only if the following condition is satisfied:

K K† H = H.    (A.2)

If (A.2) is fulfilled, then all solutions of Eq. (A.1) are given by

Z = K† H + (I − K† K)φ,

φ being an arbitrary matrix of compatible dimension.

Proof. See Lemma 2.7 from [17]. ■

Lemma A.2 (Extended Schur Lemma [18]). Let U = ( U11 U12 ; U12^T U22 ) be such that Ujj = Ujj^T, j = 1, 2. Then the following are equivalent:

(i) U ≥ 0;
(ii) U22 ≥ 0, (I − U22 U22†)U12^T = 0, U11 − U12 U22† U12^T ≥ 0.

Appendix B. Proof of Theorem 5

Let us remark that for t ∈ [t_{j0}, t_{j0+1}] the solutions P̃k(·) of (20a) and (24a), respectively, have the representations:

P̃k(t) = T*(t_{j0+1}, t)[P̃k(t_{j0+1}^−)] + ∫_t^{t_{j0+1}} T*(s, t)[Mk(s)] ds,    (B.3)

k = 1, 2, T*(s, t) being the adjoint of the linear evolution operator on the Hilbert space of the symmetric matrices S_{n+m} defined by the linear differential equation

Ẋ(s) = A0(s)X(s) + X(s)A0^T(s) + A1(s)X(s)A1^T(s).    (B.4)

Applying Theorem 2.6.1 from [19] in the case of the linear differential equation (B.4), we deduce that T*(s, t) is a positive operator, that is, T*(s, t)[X] ≥ 0 for all s ≥ t ≥ t0 if X ≥ 0. So, (B.3) allows us to infer that P̃k(t) ≥ 0 for all t ∈ [t_{j0}, t_{j0+1}], provided that P̃k(t_{j0+1}^−) ≥ 0 and Mk(s) ≥ 0. From (13) and the assumption (H2)(a) we deduce that Mk(s) ≥ 0, s ∈ [t0, tf].

Further, we shall show recursively, for j0 = N − 1, N − 2, ..., that the solutions P̃k(t) can be prolonged on the interval [t_{j0−1}, t_{j0}] if they are already well defined on [t_{j0}, tf] and satisfy (29) for j0 ≤ j ≤ N − 1. To this end, we introduce the matrices:

Θkl(j0) = ( Mdk(j0)   0 ; 0   Rkk(j0) ) + ( (Ad + Bdl F̃l(j0))^T ; Bdk^T ) P̃k(t_{j0}) ( Ad + Bdl F̃l(j0)   Bdk ).    (B.5)

For j0 = N − 1, (B.3) together with (20c) and (24c), respectively, yields

P̃k(t) = T*(tN, t)[Gk] + ∫_t^{tN} T*(s, t)[Mk(s)] ds,    (B.6)

t ∈ [t_{N−1}, tN]. From (13) and the assumption (H2)(c) we deduce that Gk ≥ 0, k = 1, 2. Thus, from (B.6) we may conclude that the solutions P̃k(·) are well defined on the whole interval [t_{N−1}, tN] and P̃k(t) ≥ 0. Since (29) is assumed to be true for j = N − 1, it follows that we can compute F̃k(N − 1) from (30) written for j = N − 1. Hence, the matrices Θkl(N − 1) are well defined by (B.5) with j0 = N − 1. On the other hand, from (17b) and (19b), respectively, together with the assumption (H2)(b), we may infer that Θkl(N − 1) ≥ 0, k, l = 1, 2, l ≠ k. Applying Lemma A.2 from Appendix A in the case of the matrix U = Θkl(N − 1), we obtain that the values P̃k(t_{N−1}), k = 1, 2, satisfy the constraints (21) and (25), respectively, and additionally

(Ad + Bdl F̃l(N − 1))^T P̃k(t_{N−1})(Ad + Bdl F̃l(N − 1))
 − (Ad + Bdl F̃l(N − 1))^T P̃k(t_{N−1})Bdk (Rkk(N − 1) + Bdk^T P̃k(t_{N−1})Bdk)† Bdk^T P̃k(t_{N−1})(Ad + Bdl F̃l(N − 1)) + Mdk(N − 1) ≥ 0,    (B.7)

k, l = 1, 2, l ≠ k. From (20b) and (B.7) for k = 1, l = 2, and (24b) and (B.7) with k = 2, l = 1, respectively, we deduce that P̃k(t_{N−1}^−), k = 1, 2, are well defined and P̃k(t_{N−1}^−) ≥ 0. This allows us to deduce, via (B.3) written for j0 = N − 2, that P̃k(t) are well defined and positive semidefinite for any t ∈ [t_{N−2}, t_{N−1}].

Let us assume by induction that for some j0 ∈ {0, 1, 2, ..., N − 2}, P̃k(t_{j0+1}^−), k = 1, 2, are well defined and positive semidefinite. Then, from (B.3) we deduce that P̃k(t) are well defined and positive semidefinite for any t ∈ [t_{j0}, t_{j0+1}]. Since (29) are assumed to be true for j = j0, it follows that the gain matrices F̃k(j0) can be computed via (30) written for j = j0. Invoking again (13) together with the assumption (H2)(b), we may infer that Θkl(j0) ≥ 0, k, l = 1, 2, l ≠ k. Applying Lemma A.2 in the case U = Θkl(j0), we deduce that the values P̃k(t_{j0}) of the solutions of the TVPs (20) and (24), respectively, satisfy the constraints (21) and (25), respectively, together with

(Ad + Bdl F̃l(j0))^T P̃k(t_{j0})(Ad + Bdl F̃l(j0))
 − (Ad + Bdl F̃l(j0))^T P̃k(t_{j0})Bdk (Rkk(j0) + Bdk^T P̃k(t_{j0})Bdk)† Bdk^T P̃k(t_{j0})(Ad + Bdl F̃l(j0)) + Mdk(j0) ≥ 0.    (B.8)

From (20b) and (B.8) with k = 1, l = 2, and (24b) together with (B.8) with k = 2, l = 1, we may conclude that P̃k(t_{j0}^−) are well defined and positive semidefinite. Then, (B.3) written with j0 replaced by j0 − 1 allows us to deduce that the solutions P̃k(·), k = 1, 2, are well defined and positive semidefinite on the interval [t_{j0−1}, t_{j0}]. Thus the proof is complete. ■
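The pseudoinverse-based solvability test of Lemma A.1, which underlies the constraints (37) and the gain equation (38), can be checked numerically with numpy; the matrices below are illustrative, not taken from the example.

```python
import numpy as np

# Lemma A.1: K Z = H is solvable iff K K† H = H (condition (A.2));
# all solutions then read Z = K† H + (I - K† K) phi, phi arbitrary.
K = np.array([[1.0, 2.0],
              [2.0, 4.0]])            # rank-deficient matrix
Kp = np.linalg.pinv(K)                # Moore-Penrose inverse K†

H = K @ np.array([[1.0], [3.0]])      # H in the range of K, so (A.2) holds
assert np.allclose(K @ Kp @ H, H)     # solvability condition (A.2)

Z0 = Kp @ H                           # particular solution
phi = np.array([[5.0], [-2.0]])       # arbitrary matrix phi
Z = Z0 + (np.eye(2) - Kp @ K) @ phi   # another member of the solution set
assert np.allclose(K @ Z, H)          # still solves K Z = H
```

When H is not in the range of K, the first assertion fails, which is exactly how conditions of the form (37) can be tested in computations.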
References

[1] K.M. Ramachandran, C.P. Tsokos, Stochastic Differential Games: Theory and Applications, Atlantis Studies in Probability and Statistics, Atlantis Press, 2012.
[2] D.W.K. Yeung, L.A. Petrosyan, Cooperative Stochastic Differential Games, Springer Series in Operations Research and Financial Engineering, Springer, 2006.
[3] C.-K. Zhang, H.-N. Zhu, H.-Y. Zhou, N. Bin, Non-Cooperative Stochastic Differential Game Theory of Generalized Markov Jump Linear Systems, Studies in Systems, Decision and Control, vol. 67, Springer, 2017.
[4] L.-S. Hu, Y.-Y. Cao, H.-H. Shao, Constrained robust sampled-data control for nonlinear uncertain systems, Internat. J. Robust Nonlinear Control 12 (2002) 447-464.
[5] L.-S. Hu, J. Lam, Y.-Y. Cao, H.-H. Shao, A linear matrix inequality (LMI) approach to robust H2 sampled-data control for linear uncertain systems, IEEE Trans. Syst. Man Cybern. B 33 (2003) 149-155.
[6] L. Hu, P. Shi, B. Huang, Stochastic stability and robust control for sampled-data systems with Markovian jump parameters, J. Math. Anal. Appl. 313 (2006) 504-517.
[7] R. Rakkiyappan, V. Preethi Latha, Q. Zhu, Z. Yao, Exponential synchronization of Markovian jumping chaotic neural networks with sampled-data and saturating actuators, Nonlinear Anal. Hybrid Syst. 24 (2017) 28-44.
[8] M. Syed Ali, N. Gunasekaran, Q. Zhu, State estimation of T-S fuzzy delayed neural networks with Markovian jumping parameters using sampled-data control, Fuzzy Sets and Systems 306 (1) (2017) 87-104.
[9] M.S. Imaan, J.B. Cruz Jr., Sampled-data Nash controls in nonzero-sum differential games, Internat. J. Control 17 (6) (1973) 1201-1209.
[10] T. Başar, On the existence and uniqueness of closed-loop sampled-data Nash controls in linear-quadratic stochastic differential games, in: K. Iracki, K. Malanowski, S. Walukiewicz (Eds.), Optimization Techniques, Lecture Notes in Control and Information Sciences, vol. 22, Springer, Berlin, Heidelberg, 1980.
[11] J.S.-H. Tsai, Z.-Y. Yang, S.-M. Guo, L.-S. Shieh, C.-W. Chen, Linear quadratic Nash game-based tracker for multiparameter singularly perturbed sampled-data systems: digital redesign approach, Int. J. Gen. Syst. 36 (6) (2007) 643-672.
[12] T.P. Azevedo-Perdicoúlis, G. Jank, A disturbed Nash game approach for gas network optimization, Int. J. Tomogr. Statist. (Special Issue: Control Appl. Optim.) 6 (2007) 43-49.
[13] T. Başar, G.J. Olsder, Dynamic Noncooperative Game Theory, SIAM, 1999.
[14] B. Oksendal, Stochastic Differential Equations, Springer, 1998.
[15] V. Dragan, I.G. Ivanov, On the stochastic linear quadratic control problem in the class of piecewise constant admissible controls, J. Franklin Inst. (2019), submitted for publication.
[16] R. Penrose, A generalized inverse for matrices, Proc. Cambridge Philos. Soc. 51 (1955) 406-413.
[17] M.A. Rami, J.B. Moore, X.Y. Zhou, Indefinite stochastic linear quadratic control and generalized differential Riccati equation, SIAM J. Control Optim. 40 (4) (2001) 1296-1311.
[18] A. Albert, Conditions for positive and nonnegative definiteness in terms of pseudoinverses, SIAM J. Appl. Math. 17 (1969) 434-440.
[19] V. Dragan, T. Morozan, A.M. Stoica, Mathematical Methods in Robust Control of Linear Stochastic Systems, second ed., Springer, 2013.
[10] T. Başar, On the existence and uniqueness of closed-loop sampled-data nash controls in linear-quadratic stochastic differential games, in: K. Iracki, K. Malanowski, S. Walukiewicz (Eds.), Optimization Techniques, in: Lecture Notes in Control and Information Sciences, vol. 22, Springer, Berlin, Heidelberg, 1980. [11] J.S.-H. Tsai, Z.-Y. Yang, S.-M. Guo, L.-S. Shieh, C.-W. Chen, Linear quadratic Nash game-based tracker for multiparameter singularly perturbed sampled-data systems: digital redesign approach, Int. J. Gen. Syst. 36 (6) (2007) 643–672. [12] T.P. Azevedo-Perdicoúlis, G. Jank, A disturbed Nash game approach for gas network optimization, Int. J. Tomogr. Statist., Special Issue: Control Appl. Optim. - Optim. Methods Differential Games Time Delay Control Syst. Econ. Manag. 6 (2007) 43–49. [13] T. Basar, G.J. Olsder, Dynamic Non-Cooperative Game Theory, SIAM, 1999. [14] B. Oksendal, Stochastic Differential Equations, Springer, 1998. [15] V. Dragan, I.G. Ivanov, On the stochastic linear quadratic control problem in the class of piecewise constant admissible controls, J. Franklin Inst. (2019) submited for publication. [16] R. Penrose, A generalized inverse of matrices, Math. Proc. Cambr. Philos. Soc. 52 (1955) 17–19. [17] M.A. Rami, J.B. Moore, X.Y. Zhou, Indefinite stochastic linear quadratic control and generalized differential riccati equation, SIAM J. Control Optim. 40 (4) (2001) 1296–1311. [18] A. Albert, Conditions for positive and nonnegative definiteness in terms of pseudo-inverses, SIAM J. Appl. Math. 17 (1969) 434–440. [19] V. Dragan, T. Morozan, A.M. Stoica, Mathematical Methods in Robust Control of Linear Stochastic Systems, second ed., Springer, 2013.