Proceedings of the 20th World Congress
The International Federation of Automatic Control
Toulouse, France, July 9-14, 2017
Available online at www.sciencedirect.com
ScienceDirect
IFAC PapersOnLine 50-1 (2017) 14332–14337

Interference Induced Games in Networked Control Systems and a Class of Dual Control Solutions

Mehdi Abedinpour Fallah ∗, Roland P. Malhamé ∗ and Francesco Martinelli ∗∗

∗ GERAD and Department of Electrical Engineering, École Polytechnique de Montréal, Montréal, Québec, Canada, (e-mails: {mehdi.abedinpour-fallah; roland.malhame}@polymtl.ca)
∗∗ Dipartimento di Ingegneria Civile e Ingegneria Informatica, Università di Roma “Tor Vergata”, via del Politecnico, Rome, Italy, (e-mail: [email protected])

Abstract: Networked control systems must use communication links between control hubs and distributed components, possibly both to observe component states, and to send control commands. We consider a model of a CDMA based communication and control system where the power sent from the components to the base station, acting as the control hub, is proportional to their (scalar) state, and in turn, the base station sends back the required control commands to the components. The systems are linear, and commands are constrained to be linear, possibly time varying feedback laws on current and a limited set of recent measurements. However, the individual measurements as decoded by the base station include interference terms from the set of all other components, and this inadvertently creates an interference induced game situation. The consequence is that controls have dual effects: they steer individual systems, but they can also help create additional interference. We propose an algorithm which accounts for a combination of control and estimation costs to compute symmetric Nash equilibria if they exist.

© 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: Interference induced games; networked control; CDMA; Nash equilibria.

1. INTRODUCTION

Networked Control System (NCS) refers to a decentralized control system in which the components are connected through real-time communication channels or a data network. Thus, there may be a data link between the sensors (which collect information), the controllers (which make decisions), and the actuators (which execute the controller commands); and the sensors, the controllers, and the plant themselves could be geographically separated (Yüksel and Başar (2013)).

Following Verdú and Shamai (1999); Perreau and Anderson (2006), we consider a model of a code division multiple access (CDMA) based communication and control system in the context of a large number of users, with N users which share a channel and are assumed to be equally spaced on a circle around the base station as depicted in Figure 1, with a signal processing gain proportional to $1/N$. In particular, let $p^{(m)}_{k,i}$ and $\alpha$ denote, respectively, the transmitted power and the mean squared value of the uplink channel gain for the $i$th mobile user of the network at the base station and let $p^{(b)}_{k,i}$ denote the received power at the base station for user $i$, where $p^{(b)}_{k,i} = \alpha p^{(m)}_{k,i}$. The average over slot $k$ of the power of the CDMA signal despread by the spreading sequence of user $i$ is given by:
$$ z_{k,i} = p^{(b)}_{k,i} + \frac{h'}{N} \sum_{j \neq i}^{N} p^{(b)}_{k,j} + \sigma_{th}^2 + v_{k,i}, \tag{1} $$
where $\sigma_{th}^2$ is the variance of the background thermal noise process (modeled as a zero mean Gaussian random variable), and $v_{k,i}$ is the measurement noise. Signal processing gain is assumed to be $h'/N$. Also, a time slot corresponds to the time period between two consecutive power control commands. Furthermore, the actual controlling users are assumed to be independent and simply using the base station as a communication tool. They would not want to share in any way their private information in a cooperative scheme that would allow others to identify their state.

Fig. 1. N users using CDMA technology

The work of the first two authors was supported by Canada’s NSERC grant 6820 2011.
Copyright © 2017 IFAC
2405-8963 © 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2017.08.2018
Recently there have been research efforts to treat the power control problem from base station to individual cellular phone through a game theoretic view. In particular, a nonlinear model of the channel is used by Aziz and Caines (2017) and they formulate a mean field game problem to find Nash equilibrium strategies. In this paper, we wish to use CDMA technology to achieve distributed control in a particular way over a network. It turns out that interference due to the convergence of information signals to the base station creates a game situation, in which control laws both steer the system and affect the quality of observations, thus creating a dual control environment (for example, see Feldbaum (1960); Kim and Rock (2006)). We propose an approach to characterize potential Nash equilibrium decentralized policies within a restricted class, that of linear time varying output feedback policies involving a limited record of the most recent measurements, despite this complex dual control environment.

2. STATE SPACE MODEL FORMULATION AND PROBLEM STATEMENT

The base station itself sends the control signal to a collection of individual systems, hereon referred to as agents. Downlink channels are considered noiseless, however the controlled individual systems are stochastic. Agents encode their (scalar) state $x_{k,i}$ by sending a power proportional to it, that is to say, $p^{(m)}_{k,i} = \beta x_{k,i}$, where $\beta$ is a constant parameter. The base station in turn computes the required control based on received power which also is tainted by interference and measurement noise (as in (1)). Thus, by letting
$$ y_{k,i} = z_{k,i} - \sigma_{th}^2, \quad c = \alpha\beta\left(1 - \frac{h'}{N}\right), \quad h = \alpha\beta h', \tag{2} $$
the physics of CDMA transmission viewed as a networked control system with N agents can be cast into a state space form with individual scalar dynamics described by
$$ x_{k+1,i} = a x_{k,i} + b u_{k,i} + w_{k,i} \tag{3} $$
and partial scalar state observations given by:
$$ y_{k,i} = c x_{k,i} + h \left( \frac{1}{N} \sum_{j=1}^{N} x_{k,j} \right) + v_{k,i} \tag{4} $$
for $k \geq 0$ and $1 \leq i \leq N$, where $x_{k,i}, u_{k,i}, y_{k,i} \in \mathbb{R}$ are the state, the control input and the measured output of the $i$th agent, respectively. The random variables $w_{k,i} \sim \mathcal{N}(0, \sigma_w^2)$ and $v_{k,i} \sim \mathcal{N}(0, \sigma_v^2)$ represent independent Gaussian white noises at different times $k$ and at different agents $i$. The Gaussian initial conditions $x_{0,i} \sim \mathcal{N}(\bar{x}_0, \sigma_0^2)$ are mutually independent and are also independent of $\{w_{k,i}, v_{k,i}, 1 \leq i \leq N, k \geq 0\}$. Moreover, $a$ is a scalar parameter and $b, c, h > 0$ are positive scalar parameters. Furthermore, the individual cost function for each agent is given by
$$ J_i \triangleq E\left[ \sum_{k=0}^{T-1} \left( x_{k,i}^2 + r u_{k,i}^2 \right) + x_{T,i}^2 \right], \tag{5} $$
where $r > 0$ is a positive scalar parameter.

In Abedinpour-Fallah et al. (2016), and for the infinite horizon problem, the class of so-called separated policies (i.e. involving a constant gain feedback on the optimal local state estimate) has been considered; it has been shown that if individual agent dynamics are sufficiently stable, this class includes a particular feedback gain corresponding to an asymptotic in population size Nash equilibrium. Furthermore, it has been numerically observed that the class of separated policies appears to always contain stabilizing gain ranges irrespective of the degree of instability of individual dynamics, while predictably so, the corresponding optimal filters are linear functions of the current and past observations, however with coefficients which remain time varying. The latter observation leads us to consider existence of potential Nash equilibria, however within the special class of growing dimension time varying linear output feedback control policies given by
$$ u_{k,i} = -f_{1,k} y_{k,i} - f_{2,k} y_{k-1,i} - \dots - f_{k-1,k} y_{2,i} - f_{k,k} y_{1,i}, \tag{6} $$
where $f_{j,k}$ are time varying scalar gains. In this paper, we propose a methodology for the computation of such policies for a finite horizon problem (which obviously we could make as large as we wish). For analytical tractability and computability, we shall further narrow the considered class of candidate policies to that of linear output time varying feedback policies, with a limited look back at the most recent two measurements. Evidently, one can make the class wider by picking more measurements, and having a growing vector of time varying gains as measurements accumulate. Thus, in the rest of the paper, we consider the problem of synthesizing Nash equilibrium strategies within a class of time varying linear output feedback control laws parameterized as follows:
$$ u_{k,i} = -f_{1,k} y_{k,i} - f_{2,k} y_{k-1,i}, \tag{7} $$
i.e. the dependence is restricted to the most recent two measurements.

3. COMPUTATION OF SYMMETRIC NASH EQUILIBRIUM

The idea of the calculation is based on a “generic” agent $i_0$ attempting to develop a consistency principle for identifying the likely Nash candidate policies within the restricted class of interest, through the following steps:
• On a finite time horizon $T$, fix for every $k$ the sequence of output feedback gains $(f_{1,k}, f_{2,k})$ to be considered as a candidate.
• Assume everyone except agent $i_0$ uses that feedback strategy.
• Consider N large enough that the actions of agent $i_0$ have negligible impact on the policies of other agents (the standard starting point of mean field games analysis - see Huang et al. (2007) for example, i.e. decoupling of the mass from the individual). Indeed, while an agent’s input does not affect the quality of its estimates in the linear case, in reality the feedback gains used by $i_0$ affect its state and through interference will affect the ability of other agents to estimate their own state. This in turn impacts the interference perceived by $i_0$.
• Consider the optimal control problem to be solved by individual agent $i_0$. Note that when solving for its best response to other agents’ actions using a dynamic programming principle working backwards in time at some time $k$, based on the results at time $(k + 1)$, it is assumed in light of the previous discussion, and for estimation purposes, that up to time $k$ the agent has been using exactly the same feedback policy as the other agents.
• Under such conditions, agent $i_0$ calculates the optimal feedback gains at time $k$, and of course, a necessary condition for the posited feedback sequence to be Nash is that the agent in question recovers the feedback gains he assumed optimal in the first place, i.e. a fixed point result.

In this process, unlike the standard LQG dynamic programming solution, the estimation error cost depends on the chosen feedback gains, and thus also the structure of the fixed point equations one needs to satisfy. Also, note that in positing the quadratic structure of the optimal cost to go, we do neglect any dependencies on measurements beyond the latest two (the most general analysis would instead need to deal with optimal cost to go quadratic dependencies on a growing dimension vector of measurements).

We proceed to study the dynamical model of agent $i_0$.
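The symmetric situation posited in the steps above, in which every agent applies the same candidate gains (7) to the model (3)-(4), can be explored numerically. The following is a minimal Monte Carlo sketch, not part of the original analysis: the function name `simulate_cost`, its argument names and the convention $y_{-1,i} = 0$ for the measurement that is not yet available at $k = 0$ are all assumptions made here for illustration. It returns the population average of one realization of the cost (5).

```python
import random


def simulate_cost(a, b, c, h, N, T, f1, f2, r, x0_mean, s0, sw, sv, seed=0):
    """Average over agents of one realization of the cost (5).

    All N agents use u_k = -f1[k]*y_k - f2[k]*y_{k-1}, as in (7).
    Assumption (not from the paper): y_{-1} = 0 for every agent.
    s0, sw, sv are the standard deviations of x_0, w and v.
    """
    rng = random.Random(seed)
    x = [rng.gauss(x0_mean, s0) for _ in range(N)]
    y_prev = [0.0] * N          # assumed value of the missing measurement
    cost = [0.0] * N
    for k in range(T):
        mean_x = sum(x) / N     # interference term (1/N) * sum_j x_{k,j}
        y = [c * x[i] + h * mean_x + rng.gauss(0.0, sv) for i in range(N)]  # (4)
        u = [-f1[k] * y[i] - f2[k] * y_prev[i] for i in range(N)]           # (7)
        for i in range(N):
            cost[i] += x[i] ** 2 + r * u[i] ** 2                            # running cost in (5)
            x[i] = a * x[i] + b * u[i] + rng.gauss(0.0, sw)                 # (3)
        y_prev = y
    for i in range(N):
        cost[i] += x[i] ** 2    # terminal cost x_T^2
    return sum(cost) / N


if __name__ == "__main__":
    T = 20
    # Illustrative parameter values only; they are not taken from the paper.
    J = simulate_cost(a=0.9, b=1.0, c=1.0, h=0.5, N=200, T=T,
                      f1=[0.3] * T, f2=[0.1] * T, r=1.0,
                      x0_mean=1.0, s0=0.5, sw=0.2, sv=0.1)
    print("empirical average cost:", J)
```

Such a simulation only evaluates a given symmetric gain sequence; it does not by itself test the Nash property, which requires the best response computation of agent $i_0$ developed in the sequel.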
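Let
$$ m_k^- = \frac{1}{N} \sum_{j \neq i_0}^{N} x_{k,j}, \qquad k = 0, 1, \dots, T, \tag{8} $$
$$ \tilde{w}_k^- = \frac{1}{N} \sum_{j \neq i_0}^{N} w_{k,j}, \qquad \tilde{v}_k^- = \frac{1}{N} \sum_{j \neq i_0}^{N} v_{k,j}, \tag{9} $$
and assume that all other agents (except agent $i_0$) use (7). Thus, including (7) in (3) and combining (3), (4), (8), (9), the closed-loop dynamics model can be expressed as:
$$ X_{k+1,i_0} = A_{d,k} X_{k,i_0} + B_d u_{k,i_0} + D_{d,k} W_{k,i_0}, \tag{10} $$
$$ y_{k,i_0} = H_d X_{k,i_0} + v_{k,i_0}, \tag{11} $$
where
$$ X_{k,i_0} = [x_{k,i_0}, \; x_{k-1,i_0}, \; m_k^-, \; m_{k-1}^-, \; \tilde{v}_{k-1}^-]^T, \tag{12} $$
$$ A_{d,k} = \begin{bmatrix} a & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ -b f_{1,k} \frac{(N-1)}{N^2} h & -b f_{2,k} \frac{(N-1)}{N^2} h & a_{3,3} & a_{3,4} & -b f_{2,k} \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \tag{13} $$
$$ a_{3,3} = a - b f_{1,k} \left( c + \frac{(N-1)}{N} h \right), \tag{14} $$
$$ a_{3,4} = -b f_{2,k} \left( c + \frac{(N-1)}{N} h \right), \tag{15} $$
$$ B_d = \begin{bmatrix} b \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad D_{d,k} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & -b f_{1,k} \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad H_d = \begin{bmatrix} c + \frac{h}{N} \\ 0 \\ h \\ 0 \\ 0 \end{bmatrix}^T. \tag{16} $$
Furthermore, the noise vector $W_{k,i_0}$ and its covariance matrix $\Sigma_{w,d}$ are given by:
$$ W_{k,i_0} = \begin{bmatrix} w_{k,i_0} \\ \tilde{w}_k^- \\ \tilde{v}_k^- \end{bmatrix}, \quad \Sigma_{w,d} = \begin{bmatrix} \sigma_w^2 & 0 & 0 \\ 0 & \frac{(N-1)}{N^2} \sigma_w^2 & 0 \\ 0 & 0 & \frac{(N-1)}{N^2} \sigma_v^2 \end{bmatrix}. \tag{17} $$

3.1 Optimal control using dynamic programming

Let $\hat{x}_{k,i}$ denote the minimum mean square error estimator of $x_{k,i}$ based only on local observations of the $i$th agent. Also let $Y_i^k$ indicate the column vector of all measurements up to time $k$ at agent $i$. Next, consider agent $i_0$ with its cost function given by:
$$ J_{i_0} \triangleq E\left[ \sum_{k=0}^{T-1} \left( x_{k,i_0}^2 + r u_{k,i_0}^2 \right) + x_{T,i_0}^2 \right], \tag{18} $$
and let $V_{k,i_0}$ be the optimal expected value of $J_{i_0}$ starting from time $k$ on, knowing the measurements $Y_{i_0}^k$ and using an optimal $u^*_{k,i_0}$, i.e. the optimal cost to go starting from time $k$. Then using the dynamic programming principle we have:
$$ V_{k,i_0} = \min_{u_{k,i_0}} E\{ x_{k,i_0}^2 + r u_{k,i_0}^2 + V_{k+1,i_0} \mid Y_{i_0}^k \}. \tag{19} $$
Now, for tractability, we make the assumption that the optimal cost-to-go is a quadratic function of the last two most recent measurements. Recall however that in general, the optimal cost to go will be a quadratic function of all current and past measurements. Therefore, by substituting
$$ V_{k+1,i_0} = \begin{bmatrix} y_{k+1,i_0} \\ y_{k,i_0} \end{bmatrix}^T \begin{bmatrix} q_{11,k+1} & q_{12,k+1} \\ q_{12,k+1} & q_{22,k+1} \end{bmatrix} \begin{bmatrix} y_{k+1,i_0} \\ y_{k,i_0} \end{bmatrix} + \bar{q}_{k+1} \tag{20} $$
in (19) we get:
$$ V_{k,i_0} = \min_{u_{k,i_0}} E\{ x_{k,i_0}^2 + r u_{k,i_0}^2 + q_{11,k+1} y_{k+1,i_0}^2 + q_{22,k+1} y_{k,i_0}^2 + 2 q_{12,k+1} y_{k,i_0} y_{k+1,i_0} + \bar{q}_{k+1} \mid Y_{i_0}^k \}, \tag{21} $$
which can be written as:
$$ V_{k,i_0} = \min_{u_{k,i_0}} E\{ (x_{k,i_0} - \hat{x}_{k,i_0} + \hat{x}_{k,i_0})^2 + r u_{k,i_0}^2 + q_{11,k+1} (y_{k+1,i_0} - \hat{y}_{k+1|k,i_0} + \hat{y}_{k+1|k,i_0})^2 + q_{22,k+1} y_{k,i_0}^2 + 2 q_{12,k+1} y_{k,i_0} \hat{y}_{k+1|k,i_0} + \bar{q}_{k+1} \mid Y_{i_0}^k \}. \tag{22} $$
Now, using the orthogonality of estimation error with the estimate itself (i.e., $(x_{k,i_0} - \hat{x}_{k,i_0}) \perp \hat{x}_{k,i_0}$) in (22) we get:
$$ V_{k,i_0} = \min_{u_{k,i_0}} E\{ (x_{k,i_0} - \hat{x}_{k,i_0})^2 + \hat{x}_{k,i_0}^2 + r u_{k,i_0}^2 + q_{11,k+1} (y_{k+1,i_0} - \hat{y}_{k+1|k,i_0})^2 + q_{11,k+1} \hat{y}_{k+1|k,i_0}^2 + q_{22,k+1} y_{k,i_0}^2 + 2 q_{12,k+1} y_{k,i_0} \hat{y}_{k+1|k,i_0} + \bar{q}_{k+1} \mid Y_{i_0}^k \}, \tag{23} $$
which can be expressed as:
$$ \begin{aligned} V_{k,i_0} = \min_{u_{k,i_0}} E\Big\{ & (x_{k,i_0} - \hat{x}_{k,i_0})^2 + \hat{x}_{k,i_0}^2 + r u_{k,i_0}^2 \\ & + q_{11,k+1} \Big( (c + \tfrac{h}{N})(x_{k+1,i_0} - \hat{x}_{k+1|k,i_0}) + h (m_{k+1}^- - \hat{m}^-_{k+1|k,i_0}) + v_{k+1,i_0} \Big)^2 + q_{22,k+1} y_{k,i_0}^2 \\ & + q_{11,k+1} \Big( (c + \tfrac{h}{N}) \hat{x}_{k+1|k,i_0} + h \hat{m}^-_{k+1|k,i_0} \Big)^2 + \bar{q}_{k+1} \\ & + 2 q_{12,k+1} y_{k,i_0} \Big( (c + \tfrac{h}{N}) \hat{x}_{k+1|k,i_0} + h \hat{m}^-_{k+1|k,i_0} \Big) \;\Big|\; Y_{i_0}^k \Big\}. \end{aligned} \tag{24} $$
Note here the dependence of the cost to go on the expected state estimation error variance, itself a deterministic function of all current and past state feedback gains in the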
14899
+ q11,k+1 ((c +
Proceedings of the 20th IFAC World Congress Toulouse, France, July 9-14, 2017 Mehdi Abedinpour Fallah et al. / IFAC PapersOnLine 50-1 (2017) 14332–14337
h ˜ 2,k ) ) + ΦTk K N h ˜ 1,k ) + rf1,k f2,k + q11,k+1 (−bf1,k (c + ) + ΦTk K N h ˜ 2,k ), (36) (−bf2,k (c + ) + ΦTk K N
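control law, thus highlighting the dual effects of controls in this context. Then, substituting (3) and
\[ \hat{x}_{k+1|k,i_0} = a\hat{x}_{k,i_0} + b u_{k,i_0} \tag{25} \]
in (24), and noting that $m^-_{k+1}$ is considered independent of $u_{k,i_0}$ because of the size of $N$, we let
\[ \frac{\partial V_{k,i_0}}{\partial u_{k,i_0}} = 0, \]
which yields:
\[ u^*_{k,i_0} = -\frac{b\left(c+\frac{h}{N}\right)}{r + b^2\left(c+\frac{h}{N}\right)^2 q_{11,k+1}} \Big( a\left(c+\tfrac{h}{N}\right) q_{11,k+1}\,\hat{x}_{k,i_0} + h\, q_{11,k+1}\, \hat{m}^-_{k+1|k,i_0} + q_{12,k+1}\, y_{k,i_0} \Big). \tag{26} \]
Moreover, using (10)-(12) the optimal controller $u^*_{k,i_0}$ can be expressed as:
\[ u^*_{k,i_0} = -\frac{b\left(c+\frac{h}{N}\right)}{r + b^2\left(c+\frac{h}{N}\right)^2 q_{11,k+1}} \Big( q_{11,k+1}\,\Phi_k^T \hat{X}_{k,i_0} + q_{12,k+1}\, y_{k,i_0} \Big), \tag{27} \]
where
\[ \Phi_k = \begin{bmatrix} a\left(c+\frac{h}{N}\right) - b f_{1,k}\frac{(N-1)}{N^2}h^2 \\ -b f_{2,k}\frac{(N-1)}{N^2}h^2 \\ \left[a - b f_{1,k}\left(c+\frac{(N-1)}{N}h\right)\right]h \\ -b f_{2,k}\left(c+\frac{(N-1)}{N}h\right)h \\ -b f_{2,k}\,h \end{bmatrix}. \tag{28} \]
Now, we assume that any estimate we make depends at most on the two most recent measurements, so that the quadratic assumption about the cost structure remains true from one step to another (see the Kalman filtering Subsection 3.2 for more details, as well as the calculation of the filter gains in (44)-(46)). We then have:
\[ \hat{X}_{k,i_0} = \tilde{K}_{1,k}\, y_{k,i_0} + \tilde{K}_{2,k}\, y_{k-1,i_0}. \tag{29} \]
Thus, $u^*_{k,i_0}$ can further be expressed as follows:
\[ u^*_{k,i_0} = -f^*_{1,k}\, y_{k,i_0} - f^*_{2,k}\, y_{k-1,i_0}, \tag{30} \]
where
\[ f^*_{1,k} = \frac{b\left(c+\frac{h}{N}\right)}{r + b^2\left(c+\frac{h}{N}\right)^2 q_{11,k+1}} \left( q_{11,k+1}\,\Phi_k^T \tilde{K}_{1,k} + q_{12,k+1} \right), \tag{31} \]
\[ f^*_{2,k} = \frac{b\left(c+\frac{h}{N}\right)}{r + b^2\left(c+\frac{h}{N}\right)^2 q_{11,k+1}}\; q_{11,k+1}\,\Phi_k^T \tilde{K}_{2,k}. \tag{32} \]
As stated, a necessary condition for the posited feedback sequence to be Nash is that agent $i_0$ recovers the feedback gains he assumed optimal in the first place, i.e. the fixed point equilibrium condition holds, which yields:
\[ (f^*_{1,k}, f^*_{2,k}) = (f_{1,k}, f_{2,k}). \tag{33} \]
Furthermore, plugging the optimal controller in (23) and using the estimate (29) we get, after some calculations, the following backward coupled equations:
\[ q_{11,k} = q_{11,k+1}\left(-b f_{1,k}\left(c+\tfrac{h}{N}\right) + \Phi_k^T \tilde{K}_{1,k}\right)^2 + 2 q_{12,k+1}\left(-b f_{1,k}\left(c+\tfrac{h}{N}\right) + \Phi_k^T \tilde{K}_{1,k}\right) + q_{22,k+1} + \big(\tilde{K}^{(x)}_{1,k}\big)^2 + r f_{1,k}^2, \tag{34} \]
\[ q_{22,k} = q_{11,k+1}\left(-b f_{2,k}\left(c+\tfrac{h}{N}\right) + \Phi_k^T \tilde{K}_{2,k}\right)^2 + \big(\tilde{K}^{(x)}_{2,k}\big)^2 + r f_{2,k}^2, \tag{35} \]
\[ q_{12,k} = \tilde{K}^{(x)}_{1,k}\tilde{K}^{(x)}_{2,k} + q_{12,k+1}\left(-b f_{2,k}\left(c+\tfrac{h}{N}\right) + \Phi_k^T \tilde{K}_{2,k}\right) + q_{11,k+1}\left(-b f_{1,k}\left(c+\tfrac{h}{N}\right) + \Phi_k^T \tilde{K}_{1,k}\right)\left(-b f_{2,k}\left(c+\tfrac{h}{N}\right) + \Phi_k^T \tilde{K}_{2,k}\right) + r f_{1,k} f_{2,k}, \tag{36} \]
\[ \bar{q}_k = q_{11,k+1}\Big(\Phi_k^T P_{k|k}\Phi_k + \big(c^2 + \tfrac{2ch+h^2}{N}\big)\sigma_w^2 + \big(1 + b^2 f_{1,k}^2 h^2 \tfrac{(N-1)}{N^2}\big)\sigma_v^2\Big) + \bar{q}_{k+1} + P^{(xx)}_{k|k}, \tag{37} \]
where $\tilde{K}^{(x)}_{1,k}$, $\tilde{K}^{(x)}_{2,k}$ are, respectively, the first elements of the filter gain vectors $\tilde{K}_{1,k}$, $\tilde{K}_{2,k}$. Moreover, $P^{(xx)}_{k|k}$ denotes the first entry in the main diagonal of the estimation error covariance matrix $P_{k|k}$.

3.2 Kalman filter-based estimation

Applying the standard (time-varying) Kalman filter algorithm to the dynamics (10)-(11) we have:
\[ \hat{X}_{k+1,i_0} = A_{d,k}\hat{X}_{k,i_0} + B_d u_{k,i_0} + K_{k+1}\big(y_{k+1,i_0} - H_d(A_{d,k}\hat{X}_{k,i_0} + B_d u_{k,i_0})\big), \tag{38} \]
\[ P_{k+1|k} = A_{d,k}P_{k|k}A_{d,k}^T + D_{d,k}\Sigma_{w,d}D_{d,k}^T, \tag{39} \]
\[ K_{k+1} = P_{k+1|k}H_d^T\big(H_d P_{k+1|k}H_d^T + R_d\big)^{-1}, \tag{40} \]
\[ P_{k+1|k+1} = (I - K_{k+1}H_d)P_{k+1|k}, \tag{41} \]
which is initialized via given $\hat{X}_{0,i_0}$ and $P_{0|-1}$. Also, $R_d = \sigma_v^2$. Then we note that the filtering equation (38) can be written as
\[ \hat{X}_{k+1,i_0} = (A_{d,k} - K_{k+1}H_dA_{d,k})\hat{X}_{k,i_0} + (I - K_{k+1}H_d)B_d u_{k,i_0} + K_{k+1}y_{k+1,i_0}, \tag{42} \]
\[ \hat{X}_{k+1,i_0} = (A_{d,k} - K_{k+1}H_dA_{d,k})\big[(A_{d,k-1} - K_kH_dA_{d,k-1})\hat{X}_{k-1,i_0} + (I - K_kH_d)B_d u_{k-1,i_0}\big] + (I - K_{k+1}H_d)B_d u_{k,i_0} + (A_{d,k} - K_{k+1}H_dA_{d,k})K_k y_{k,i_0} + K_{k+1}y_{k+1,i_0}, \tag{43} \]
and thus
\[ \hat{X}_{k,i_0} = \tilde{K}_{1,k}\, y_{k,i_0} + \tilde{K}_{2,k}\, y_{k-1,i_0}, \tag{44} \]
where
\[ \tilde{K}_{1,k} = K_k, \tag{45} \]
\[ \tilde{K}_{2,k} = -f_{1,k-1}(I - K_kH_d)B_d + (A_{d,k-1} - K_kH_dA_{d,k-1})K_{k-1}. \tag{46} \]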
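The covariance and gain updates (39)-(41) are the standard Kalman recursion. A minimal NumPy sketch is given below. The function names `kalman_step` and `run_scalar_example`, and the scalar test system (a = 2, c = 1, unit noise variances, no interference), are illustrative assumptions; they are not the augmented model (10)-(11), whose matrices $A_{d,k}$, $D_{d,k}$ and $H_d$ are defined earlier in the paper.

```python
import numpy as np

def kalman_step(P_filt, A, D, Sigma_w, H, R):
    """One covariance/gain update in the form of (39)-(41).

    P_filt is P_{k|k}; returns (P_{k+1|k}, K_{k+1}, P_{k+1|k+1}).
    """
    P_pred = A @ P_filt @ A.T + D @ Sigma_w @ D.T        # (39)
    S = H @ P_pred @ H.T + R                             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)                  # (40)
    P_next = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred  # (41)
    return P_pred, K, P_next

def run_scalar_example(a=2.0, c=1.0, sigma_w=1.0, sigma_v=1.0, steps=200):
    """Hypothetical scalar system x+ = a x + w, y = c x + v (assumption)."""
    A = np.array([[a]])
    H = np.array([[c]])
    D = np.eye(1)
    Sigma_w = np.array([[sigma_w ** 2]])
    R = np.array([[sigma_v ** 2]])
    P = np.eye(1)  # arbitrary positive initial covariance
    for _ in range(steps):
        P_pred, K, P = kalman_step(P, A, D, Sigma_w, H, R)
    return float(P_pred[0, 0]), float(K[0, 0])
```

For this scalar case the predicted covariance settles at the positive root of $P^2 - 4P - 1 = 0$, that is $2 + \sqrt{5}$, and the gain at about 0.809, which agrees with the value of $K^*$ quoted in the caption of Fig. 2.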
Fig. 2. Behaviors of agent 1 when $a = 2$, $N = 100$, using the Kalman-Riccati couple $K^* = 0.809$, $f^* = 1.618$ with $\bar{J}^{(N)}/T = 15.65$ versus the proposed algorithm, which is initialized by the Kalman-Riccati couple, with $\bar{J}^{(N)}/T = 15.08$.

Fig. 3. Unstable behavior of agent 1 when $a = 2.6$, $N = 100$, using the Kalman-Riccati couple $K^* = 0.8735$, $f^* = 2.2711$ with $\bar{J}^{(N)}/T = 6.4582 \times 10^7$.

3.3 Initialization

Over a finite time horizon $[0, T]$, the solution starts with a forward sweep whereby one assumes an initial set of output feedback gains $([f_{1,1}, f_{1,2}, \dots, f_{1,T}], [f_{2,1}, f_{2,2}, \dots, f_{2,T}])$, which gives an expression of state estimates as well as their error covariances in terms of the assumed gains and the measurements that will be gathered over time, through the recursive equations (39)-(41) and (44)-(46). Then, by proceeding through a backward sweep, one can use these values to find $q_{11,k}$, $q_{22,k}$, $q_{12,k}$, $\bar{q}_k$, for all $k$, through the recursive equations (34)-(37), and hence a new set of candidate output feedback gains. The forward-backward sweep stops whenever one reaches a fixed point in the space of output feedback gains. In particular, after a first Kalman filtering based forward calculation cycle, we initialize the backward sweep calculation at time $T$ as follows:
\[ V_{T,i_0} = E\{x_{T,i_0}^2 \mid Y^T_{i_0}\} = E\{(x_{T,i_0} - \hat{x}_{T,i_0} + \hat{x}_{T,i_0})^2 \mid Y^T_{i_0}\} = E\{(x_{T,i_0} - \hat{x}_{T,i_0})^2 + \hat{x}_{T,i_0}^2 \mid Y^T_{i_0}\} = P^{(xx)}_{T|T} + \big(\tilde{K}^{(x)}_{1,T}\, y_{T,i_0} + \tilde{K}^{(x)}_{2,T}\, y_{T-1,i_0}\big)^2. \tag{47} \]
Thus, we get:
\[ q_{11,T} = \big(\tilde{K}^{(x)}_{1,T}\big)^2, \quad q_{22,T} = \big(\tilde{K}^{(x)}_{2,T}\big)^2, \tag{48} \]
\[ q_{12,T} = \tilde{K}^{(x)}_{1,T}\tilde{K}^{(x)}_{2,T}, \quad \bar{q}_T = P^{(xx)}_{T|T}. \tag{49} \]
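The backward sweep can be illustrated with scalar coefficients. The sketch below assumes the value function has the quadratic form $V_k = q_{11,k}y_k^2 + q_{22,k}y_{k-1}^2 + 2q_{12,k}y_ky_{k-1} + \bar{q}_k$ and uses the shorthand $\alpha_{j} = -bf_{j,k}(c + h/N) + \Phi_k^T\tilde{K}_{j,k}$ for $j = 1, 2$, with `var_y` standing for the bracketed variance term of (37). These names, and the way the cross-term update is written out from the quadratic expansion, are assumptions of this sketch rather than notation taken from the paper.

```python
def terminal_coefficients(K1x, K2x, Pxx):
    """Terminal values in the form of (48)-(49): (q11, q22, q12, qbar)."""
    return K1x ** 2, K2x ** 2, K1x * K2x, Pxx

def backward_step(q_next, alpha1, alpha2, K1x, K2x, f1, f2, r, var_y, Pxx):
    """One backward update of (q11, q22, q12, qbar) from time k+1 to k.

    alpha1, alpha2: coefficients of y_k and y_{k-1} in the predicted
    measurement at k+1 (shorthand assumed in this sketch).
    var_y: conditional variance of y_{k+1} (bracketed term of (37)).
    """
    q11n, q22n, q12n, qbarn = q_next
    q11 = q11n * alpha1 ** 2 + 2 * q12n * alpha1 + q22n + K1x ** 2 + r * f1 ** 2
    q22 = q11n * alpha2 ** 2 + K2x ** 2 + r * f2 ** 2
    q12 = K1x * K2x + q12n * alpha2 + q11n * alpha1 * alpha2 + r * f1 * f2
    qbar = q11n * var_y + qbarn + Pxx
    return q11, q22, q12, qbar

def quadratic_value(q, y, y_prev):
    """Evaluate q11*y^2 + q22*y_prev^2 + 2*q12*y*y_prev + qbar."""
    q11, q22, q12, qbar = q
    return q11 * y ** 2 + q22 * y_prev ** 2 + 2 * q12 * y * y_prev + qbar
```

A simple consistency check is to compare `quadratic_value` at time $k$ with the direct sum of the stage cost and the expected value at time $k+1$ for arbitrary measurement values; the two should agree to rounding error.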
4. NUMERICAL RESULTS
The numerical results reported in this section are obtained with the following parameter setting for a representative agent, i.e., the 1st agent: $b = c = h = 1$ and $\sigma_w = \sigma_v = 1$, with initial standard deviation $\sigma_0 = 1$ and $E x_{0,i} = \bar{x}_0 = 0$ for all agents $i = 1, 2, \dots, N$. In addition, we only deal with the case $a \geq 0$. We first experiment close to the stability region $a \in [0, 2.53]$ of the Kalman-Riccati couple, and initialize the gains using the steady-state isolated (naive) Kalman sequence (see Appendix for details) and the Riccati control gain $f^* = ab\Sigma/(r + b^2\Sigma)$, which is obtained from the positive solution of the associated algebraic Riccati equation
\[ b^2\Sigma^2 + (r - a^2 r - b^2)\Sigma - r = 0. \tag{50} \]
In particular, we initialize the algorithm by letting
\[ f_1 = f^*K^*, \quad f_2 = f^*(a - bf^*)(1 - cK^*)K^*. \tag{51} \]
Then, using a continuation approach, we let $a$ go to $a + \Delta a$ and initialize the algorithm with the latest sequence. It is observed that the proposed approach, which is the optimal solution with limited memory, improves the previous value of $a_{\sup} = 2.53$, obtained in Abedinpour-Fallah et al. (2016), up to the new value of 2.72, where $a_{\sup}$ is the maximum value of $a$ such that the naive optimal Kalman-Riccati couple $(K^*, f^*)$ is inside the stability region and can stabilize the population (see Subsection 4.4.2 in Abedinpour-Fallah et al. (2016)).

Fig. 4. Behavior of agent 1 when $a = 2.6$, $N = 100$, using the proposed algorithm, which is initialized by the Kalman-Riccati couple $K^* = 0.8735$, $f^* = 2.2711$, with $\bar{J}^{(N)}/T = 45.6167$.

Moreover, the behaviors of agent 1 using the Kalman-Riccati couple versus the proposed algorithm are compared in Fig. 2 and Figs. 3-4, respectively for $a = 2$ and $a = 2.6$, where $\bar{J}^{(N)}$ denotes the LQ average cost over the whole population, that is, $\bar{J}^{(N)} = \frac{1}{N}\sum_{j=1}^{N} J_j$. Furthermore, Fig. 5 depicts the behaviors of agent 1 using the Kalman gain with $a_f = a - bf = 0.9$ versus the proposed algorithm for $a = 2.6$. In addition, we report the sequence of output feedback gains obtained in every case. In particular, in Figure 2 with $a = 2$, for the Kalman-Riccati case we have:
\[ [f_{1,1}, f_{1,2}, \dots, f_{1,50}] = [1.309, 1.309, \dots, 1.309], \]
\[ [f_{2,1}, f_{2,2}, \dots, f_{2,50}] = [0.0955, 0.0955, \dots, 0.0955], \]
while for the proposed algorithm we have:
\[ [f_{1,1}, f_{1,2}, \dots, f_{1,50}] = [1.1797, 1.2532, 1.2445, \dots, 1.2445, 1.2444, 1.2442, 1.2435, 1.2404, 1.2276, 1.1688, 0.6857], \]
\[ [f_{2,1}, f_{2,2}, \dots, f_{2,50}] = [0, 0.1338, 0.1196, \dots, 0.1196, 0.1195, 0.1193, 0.1180, 0.1126, 0.0663]. \]
Similarly, for Figure 3 with $a = 2.6$ we have:
\[ [f_{1,1}, f_{1,2}, \dots, f_{1,50}] = [1.9838, 1.9838, \dots, 1.9838], \]
\[ [f_{2,1}, f_{2,2}, \dots, f_{2,50}] = [0.0825, 0.0825, \dots, 0.0825], \]
while for Figure 4 with $a = 2.6$ we have:
\[ [f_{1,1}, f_{1,2}, \dots, f_{1,50}] = [1.7997, 1.9086, 1.8895, 1.8916, 1.8913, 1.8914, \dots, 1.8914, 1.8913, \dots, 1.8913, 1.8911, 1.8914, 1.8909, 1.8912, 1.8891, 1.8948, 1.8705, 1.8411, 1.0163], \]
\[ [f_{2,1}, f_{2,2}, \dots, f_{2,50}] = [0, 0.2280, 0.1964, 0.1988, 0.1980, 0.1981, \dots, 0.1981, 0.1982, 0.1982, 0.1986, 0.1980, 0.1992, 0.1986, 0.2016, 0.1909, 0.2220, 0.1831, 0.0640]. \]
Moreover, in Figure 5 with the same $a = 2.6$, for the $(K^* = 0.8735, a_f = 0.9)$ case we have:
\[ [f_{1,1}, f_{1,2}, \dots, f_{1,50}] = [1.4849, 1.4849, \dots, 1.4849], \]
\[ [f_{2,1}, f_{2,2}, \dots, f_{2,50}] = [0.1691, 0.1691, \dots, 0.1691], \]
while for the proposed algorithm we have:
\[ [f_{1,1}, f_{1,2}, \dots, f_{1,50}] = [1.7918, 1.9039, 1.8942, 1.8971, 1.8960, 1.8963, 1.8964, \dots, 1.8964, 1.8963, 1.8963, 1.8963, 1.8963, 1.8962, 1.8962, 1.8961, 1.8960, 1.8960, 1.8958, 1.8957, 1.8955, 1.8953, 1.8951, 1.8948, 1.8945, 1.8942, 1.8938, 1.8933, 1.8928, 1.8923, 1.8917, 1.8910, 1.8902, 1.8892, 1.8884, 1.8875, 1.8820, 1.8825, 1.8563, 1.0179], \]
\[ [f_{2,1}, f_{2,2}, \dots, f_{2,50}] = [0, 0.2272, 0.1787, 0.1912, 0.1914, 0.1904, 0.1908, 0.1907, \dots, 0.1907, 0.1908, 0.1908, 0.1908, 0.1909, 0.1909, 0.1910, 0.1911, 0.1912, 0.1914, 0.1915, 0.1917, 0.1920, 0.1923, 0.1926, 0.1930, 0.1935, 0.1940, 0.1946, 0.1953, 0.1960, 0.1968, 0.1976, 0.1985, 0.1997, 0.2011, 0.2025, 0.2037, 0.2053, 0.2106, 0.1962, 0.1651, 0.0671]. \]
It is noted that, up to the fixed point numerical convergence criterion, the fixed points associated with Figure 4 and Figure 5, which are obtained using the proposed algorithm, appear to be essentially equivalent, although the two calculations were initialized differently.

Fig. 5. Behaviors of agent 1 when $a = 2.6$, $N = 100$, using the Kalman gain $K^* = 0.8735$ and $a_f = 0.9$ with $\bar{J}^{(N)}/T = 151$ versus the proposed algorithm, which is initialized by the Kalman gain $K^* = 0.8735$ and $a_f = 0.9$, with $\bar{J}^{(N)}/T = 44.75$.

5. CONCLUSION

In this paper, motivated by the application of a CDMA based communication and control system modeled as an interference induced game in a multi-agent networked control system, we considered the computation of a symmetric Nash equilibrium within a restricted class of output feedback policies. We have shown that, for some instances, the range of Nash equilibria obtained from the Kalman-Riccati couple can be extended using the proposed methodology, which involves limited measurement memory. In future work, we shall investigate the existence properties of the equilibria that can be achieved within this setup.

REFERENCES
Abedinpour-Fallah, M., Malhamé, R., and Martinelli, F. (2016). A class of interference induced games: Asymptotic Nash equilibria and parameterized cooperative solutions. Automatica, 69, 181–194.
Aziz, M. and Caines, P.E. (2017). A mean field game computational methodology for decentralized cellular network optimization. IEEE Transactions on Control Systems Technology, 25(2), 563–576.
Feldbaum, A.A. (1960). Dual control theory I-IV. Autom. Remote Control, 21(6), 874–880.
Huang, M., Caines, P., and Malhamé, R. (2007). Large-population cost-coupled LQG problems with non-uniform agents: individual-mass behavior and decentralized epsilon-Nash equilibria. IEEE Transactions on Automatic Control, 52(9), 1560–1571.
Kim, J. and Rock, S. (2006). Stochastic feedback controller design considering the dual effect. In Proceedings of the AIAA Guidance, Navigation and Control Conference.
Perreau, S. and Anderson, M. (2006). A new method for centralised and decentralised robust power control in CDMA systems. Digital Signal Processing, 16(5), 568–576.
Verdú, S. and Shamai, S. (1999). Spectral efficiency of CDMA with random spreading. IEEE Transactions on Information Theory, 45(2), 622–640.
Yüksel, S. and Başar, T. (2013). Stochastic networked control systems: Stabilization and optimization under information constraints. Springer Science & Business Media.

Appendix A. THE STEADY-STATE ISOLATED KALMAN SEQUENCE

Definition 1. The isolated (naive) Kalman filter is a Luenberger-like observer equation under the assumed state estimate feedback structure $u_{k,i} = -f\hat{x}_{k,i}$ and zero interference in the local measurements, which evolves as
\[ \hat{x}_{k+1,i} = (a - bf)\hat{x}_{k,i} + K^*\big(y_{k+1,i} - c(a - bf)\hat{x}_{k,i}\big), \tag{A.1} \]
with the steady-state scalar gain $K^* = cP_\infty/(c^2P_\infty + \sigma_v^2)$, where $P_\infty$ is the unique positive solution of
\[ c^2P_\infty^2 + \big((1 - a^2)\sigma_v^2 - c^2\sigma_w^2\big)P_\infty - \sigma_w^2\sigma_v^2 = 0. \tag{A.2} \]

Proposition 1. In the stability region, the isolated Kalman filter equivalent equation in the steady state is given by
\[ \hat{x}_{k,i} = K_1 y_{k,i} + K_2 y_{k-1,i} + \dots + K_k y_{1,i}, \tag{A.3} \]
where
\[ K_j = (a - bf)^{j-1}(1 - cK^*)^{j-1}K^*. \tag{A.4} \]

Proof: By rearranging (A.1) we have:
\[ \hat{x}_{k+1,i} = (a - bf)(1 - cK^*)\hat{x}_{k,i} + K^* y_{k+1,i}. \tag{A.5} \]
Then, substituting (A.3) and
\[ \hat{x}_{k+1,i} = K_1 y_{k+1,i} + K_2 y_{k,i} + \dots + K_{k+1} y_{1,i} \tag{A.6} \]
into (A.5), and applying the stationarity property by making the left-hand side of the resulting equation equal to its right-hand side, we get the fixed-point values (A.4).
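As a numerical cross-check, the steady-state gain from (A.2) and the Riccati gain and initial output feedback gains from (50)-(51) can be evaluated directly. The sketch below uses hypothetical function names, and it assumes $r = 1$: the control weight is not listed among the stated parameters, but this value is consistent with the gains quoted in the figure captions.

```python
import math

def positive_root(A, B, C):
    """Positive root of A*x^2 + B*x + C = 0, assuming A > 0 and C < 0."""
    return (-B + math.sqrt(B * B - 4.0 * A * C)) / (2.0 * A)

def kalman_riccati_couple(a, b=1.0, c=1.0, r=1.0, sigma_w=1.0, sigma_v=1.0):
    """Return (K*, f*); r = 1 is an assumption of this sketch."""
    # Steady-state prediction covariance from (A.2) and the gain K*.
    P_inf = positive_root(c ** 2,
                          (1.0 - a ** 2) * sigma_v ** 2 - c ** 2 * sigma_w ** 2,
                          -(sigma_w * sigma_v) ** 2)
    K_star = c * P_inf / (c ** 2 * P_inf + sigma_v ** 2)
    # Positive solution of the algebraic Riccati equation (50) and gain f*.
    Sigma = positive_root(b ** 2, r - a ** 2 * r - b ** 2, -r)
    f_star = a * b * Sigma / (r + b ** 2 * Sigma)
    return K_star, f_star

def initial_gains(a, b, c, K_star, f_star):
    """Initial output feedback gains (f1, f2) as in (51)."""
    f1 = f_star * K_star
    f2 = f_star * (a - b * f_star) * (1.0 - c * K_star) * K_star
    return f1, f2
```

With $a = 2$ this gives approximately $K^* = 0.809$, $f^* = 1.618$ and $(f_1, f_2) = (1.309, 0.0955)$; with $a = 2.6$ it gives approximately $K^* = 0.8735$, $f^* = 2.2711$ and $(f_1, f_2) = (1.9838, 0.0825)$, in line with the values reported in Section 4.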