Preprints, 1st IFAC Conference on Modelling, Identification and Control of Nonlinear Systems, June 24-26, 2015, Saint Petersburg, Russia. Available online at www.sciencedirect.com
IFAC-PapersOnLine 48-11 (2015) 802–807
Conditionally Minimax Prediction in Nonlinear Stochastic Systems
A. V. Bosov ∗, A. V. Borisov ∗∗, K. V. Semenikhin ∗∗∗

∗ RAS Institute for Informatics Problems (e-mail: [email protected])
∗∗ RAS Institute for Informatics Problems (e-mail: [email protected])
∗∗∗ Moscow Aviation Institute (National Research University) (e-mail: [email protected])
Abstract: This paper describes an approach to nonlinear prediction in discrete-time stochastic systems. The method consists of two steps: preliminary statistical estimation of second-order moment characteristics and minimax optimization of structural coefficients. The minimax prediction algorithms are based on semidefinite programming over several kinds of confidence regions for the uncertain mean and covariance matrix. The guaranteed level of the mean square error (MSE) is provided by the minimax theorem on normal correlation.

© 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved. Peer review under responsibility of International Federation of Automatic Control. DOI: 10.1016/j.ifacol.2015.09.288

Keywords: Adaptive Control and Signal Processing for Nonlinear Systems; Modeling and Identification of Nonlinear Systems.

⋆ The work is supported in part by the Russian Foundation for Basic Research (Project No. 13-07-00408).

1. INTRODUCTION

In the last decade significant progress has been made in developing various filtering techniques adapted to estimating the state of partially observed nonlinear stochastic systems [Arulampalam et al. (2002); Julier and Uhlmann (2004); Arasaratnam and Haykin (2009)]. Nevertheless, many of the heuristics in nonlinear stochastic filtering remain without a proper theoretical justification.

At the same time, the theory of robust estimation and control has created a solid basis for designing linear algorithms insensitive to the absence of full knowledge of model characteristics. Based on a game-theoretic approach and equipped with modern optimization tools, it has become possible to determine numerically minimax-optimal filters and predictors for the multidimensional state of linear stochastic systems considered with mean and covariance uncertainties [Li et al. (2002); Eldar et al. (2005); Mittelman and Miller (2010); Kogan (2014)].

Such a type of uncertainty turns out to be useful for studying the performance not only of linear estimators but of nonlinear filters as well. The key result in this area is the minimax theorem on normal correlation [Pankov and Bosov (1994); Borisov and Pankov (1994); Semenikhin (2004)]. It provides a theoretical background for the application of linear estimators to nonlinear systems, since the minimax predictor is known to be a linear one when the joint distribution of the vector to be estimated and the vector of measurements is arbitrary but has fixed mean and covariance matrix. The most important application of this result is that it explains the robustness property of the conditionally optimal nonlinear filter [Pugachev and Sinitsyn (1990)].

In this paper we develop the method of conditionally minimax prediction using semidefinite programming techniques together with constructing confidence regions for the uncertain mean and covariance matrix.

2. PROBLEM STATEMENT

Consider the following nonlinear regression model:
$X = \Phi(\upsilon), \qquad Y = \Psi(\upsilon, \eta)$,  (1)
where $X \in \mathbb{R}^m$ is the state vector to be estimated from the vector $Y \in \mathbb{R}^n$ of measurements. Suppose that $\upsilon \in \mathbb{R}^q$ is the vector of random parameters and $\eta \in \mathbb{R}^n$ is the observation noise. Let $\upsilon$ and $\eta$ be independent and have known distributions.

Instead of finding the MSE-optimal predictor $y \mapsto \mathrm{E}\{X \mid Y = y\}$, consider the problem of improving the forecasting estimate
$U_1 = \Phi(\bar\upsilon)$  (2)
using the correcting term
$U_2 = \Omega(Y - \Psi(\bar\upsilon, \eta^o))$.  (3)
The vector $\bar\upsilon$ plays the role of some predefined estimate for $\upsilon$, while $\eta^o$ stands for a nominal value (mean, median, mode, etc.) of the observation noise. The mapping $\Omega \colon \mathbb{R}^n \to \mathbb{R}^m$ can be used as a change of coordinates.

The goal of the problem is to minimize the MSE of the predictor
$\widetilde X = f + F_1 U_1 + F_2 U_2, \quad f \in \mathbb{R}^m, \; F_1, F_2 \in \mathbb{R}^{m \times m}$,  (4)
by means of choosing the structural coefficients $f$, $F_1$, $F_2$. Such a problem arises in many practical situations when one can only tune parameters of the working control system instead of designing a brand-new one. The structure of estimate (4) has become traditional in recursive parameter
estimation and stochastic filtering. In the Kalman filtering (KF) algorithm, the state estimate is calculated in accordance with the scheme “forecast + correction”, which is adopted by many nonlinear filtering techniques such as the extended KF (EKF) [Jazwinski (1970)], the conditionally optimal nonlinear filter (CONF) [Pugachev and Sinitsyn (1990)], the unscented KF (UKF) [Julier and Uhlmann (2004)], the cubature KF (CKF) [Arasaratnam and Haykin (2009)], etc. Similarly to the KF, any recursive estimation algorithm defines equations for both the state estimate vector and the error covariance matrix. However, the latter can dramatically differ from the true error covariance when implementing a nonlinear filtering scheme. Motivated by the issue of evaluating the true accuracy of nonlinear estimates, the next problem considered in the paper is to provide an upper bound for the MSE
$\mathrm{E}\|\hat X - X\|^2 \le \hat J$  (5)
of the desired predictor $\hat X = \hat f + \hat F_1 U_1 + \hat F_2 U_2$.
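Before formalizing the minimax problem, a minimal sketch may help fix ideas: it instantiates model (1) and the terms (2)–(4) for a hypothetical scalar example. The maps $\Phi$, $\Psi$, $\Omega$, the nominal values, and the distributions below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical scalar instance of model (1): X = Phi(v), Y = Psi(v, eta).
Phi = lambda v: np.sin(v) + 0.5 * v        # state map (m = 1)
Psi = lambda v, eta: v**3 + eta            # measurement map (n = 1)
Omega = lambda y: y                        # change of coordinates (identity)

v_bar, eta_nom = 0.0, 0.0                  # predefined estimate of v; nominal noise value

rng = np.random.default_rng(0)
v = rng.normal(0.0, 1.0)                   # random parameter upsilon
eta = rng.normal(0.0, 0.1)                 # observation noise

X = Phi(v)                                 # state to be predicted
Y = Psi(v, eta)                            # measurement
U1 = Phi(v_bar)                            # forecasting estimate (2)
U2 = Omega(Y - Psi(v_bar, eta_nom))        # correcting term (3)

# Predictor (4): the structural coefficients are what the minimax step optimizes.
f, F1, F2 = 0.0, 1.0, 0.5
X_hat = f + F1 * U1 + F2 * U2
```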
In order to find the structural coefficients $\hat f$, $\hat F_1$, $\hat F_2$ and the upper bound $\hat J$, we will use the confidence regions $\mathcal{M}$, $\mathcal{R}$ for the expectation and covariance of the augmented vector
$Z = \mathrm{col}[X, U_1, U_2] \in \mathbb{R}^{3m}$,  (6)
which contains the state vector, the forecasting estimate, and the correcting term. The sets $\mathcal{M}$ and $\mathcal{R}$ can be obtained by a preliminary step of Monte-Carlo simulation. The methods for construction of the confidence regions are discussed in the sequel. Now we can formulate the exact statement of the problems outlined above.

Definition 1. Given terms (2) and (3), the predictor
$\hat X = \hat f + \hat F_1 U_1 + \hat F_2 U_2$  (7)
is called conditionally minimax with respect to the uncertainty sets $\mathcal{M}$ and $\mathcal{R}$ if
$(\hat f, \hat F_1, \hat F_2) \in \arg\min_{f \in \mathbb{R}^m,\, F_1, F_2 \in \mathbb{R}^{m \times m}} \; \sup_{Z \sim \mathcal{P}(\mathcal{M}, \mathcal{R})} \mathrm{E}\|f + F_1 U_1 + F_2 U_2 - X\|^2$,  (8)
where $Z \sim \mathcal{P}(\mathcal{M}, \mathcal{R})$ means that vector (6) follows an arbitrary distribution with
$\mathrm{E}Z \in \mathcal{M}$, $\quad \mathrm{cov}\{Z, Z\} \in \mathcal{R}$.  (9)
From (8) we obtain the tight upper bound (5) as
$\hat J = \sup_{Z \sim \mathcal{P}(\mathcal{M}, \mathcal{R})} \mathrm{E}\|\hat f + \hat F_1 U_1 + \hat F_2 U_2 - X\|^2$.  (10)
3. MINIMAX THEOREM ON NORMAL CORRELATION

The approach developed in this paper is based on the following result [see Pankov and Bosov (1994)].

Theorem 2. Let $\theta$ and $\xi$ be random vectors such that
$\mathrm{col}[\theta, \xi] \sim \mathcal{P}(\{a\}, \{C\}), \quad a = \mathrm{col}[a_\theta, a_\xi], \quad C = \begin{pmatrix} C_\theta & C_{\theta\xi} \\ C_{\xi\theta} & C_\xi \end{pmatrix}$.  (11)
Then the linear predictor of $\theta$ from $\xi$
$\hat\theta = a_\theta + C_{\theta\xi} C_\xi^{+} (\xi - a_\xi)$  (12)
and the multivariate normal distribution $\mathcal{N}(a, C)$ form a saddle point of the MSE functional $\mathrm{E}\|\tilde\theta - \theta\|^2$ as $\tilde\theta$ and $\mathrm{Law}\{\theta, \xi\}$ run respectively over the class of all (linear and nonlinear) estimates and the set $\mathcal{P}(\{a\}, \{C\})$ of all distributions with mean $a$ and covariance $C$.
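In code, the saddle-point predictor (12) is a one-liner; the sketch below assumes the mean and covariance blocks are given (e.g., estimated by simulation) and uses the pseudoinverse exactly as in the theorem.

```python
import numpy as np

def normal_correlation_predictor(a_theta, a_xi, C_theta_xi, C_xi, xi):
    """Minimax predictor (12): theta_hat = a_theta + C_theta_xi C_xi^+ (xi - a_xi)."""
    return a_theta + C_theta_xi @ np.linalg.pinv(C_xi) @ (xi - a_xi)
```

Within the prediction problem of Section 2, $\theta$ plays the role of $X$ and $\xi$ that of $\mathrm{col}[U_1, U_2]$.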
The statement presented above is referred to as the minimax theorem on normal correlation, since the minimax predictor (12) coincides with the conditional expectation $\mathrm{E}\{\theta \mid \xi\}$ as soon as $\mathrm{Law}\{\theta, \xi\}$ is Gaussian [Liptser and Shiryayev (2005)].

The result analogous to Theorem 2 is valid in the case of uncertain mean $a$ and covariance $C$. It suffices to assume that $a$ belongs to an affine subspace and $C$ to a convex compact set of semidefinite matrices [Semenikhin (2004)]. The main point of such a statement is the minimaxity of the linear-optimal estimator over the class of all Borel measurable mappings. As far as we know, this problem has not yet been examined in the case of a bounded mean. Nevertheless, for linear estimates of a restricted parameter from Gaussian observations, the minimax risk is within a few percent of that considered over the class of all nonlinear rules [Donoho (1994)].

Now from Theorem 2 and the related results we can derive the following. Even if the minimization in (8) is taken over all Borel transformations of the terms $U_1$ and $U_2$, the minimax value will remain the same (at least for any one-point set $\mathcal{M}$). This important fact provides a solid basis for the concept of conditionally minimax prediction.

4. DUAL OPTIMIZATION APPROACH

Along with (6) we will use the following vector:
$U = \mathrm{col}[U_1, U_2] \in \mathbb{R}^{2m}$.
So now, any linear predictor $\widetilde X$ is represented in the form
$\widetilde X = f + F U, \quad f \in \mathbb{R}^m, \; F = [F_1 \; F_2] \in \mathbb{R}^{m \times 2m}$.
The block representation similar to (11) will be used for the mean and covariance
$\mu = \mathrm{col}[\mu_X, \mu_U], \quad R = \begin{pmatrix} R_X & R_{XU} \\ R_{UX} & R_U \end{pmatrix}$
of the augmented vector $Z = \mathrm{col}[X, U]$.
Let us choose a point $\mu^o \in \mathcal{M}$ and define the estimate
$\widetilde X_F = \mu_X^o + F (U - \mu_U^o), \quad F \in \mathbb{R}^{m \times 2m}$,
which is analogous to (12). From now on the point $\mu^o$ is supposed to be fixed, and hence the matrix $F$ entirely determines the predictor $\widetilde X_F$. Its MSE is equal to
$J(F, K) = \mathrm{tr}\{[-I \; F] \, K \, [-I \; F]^*\}$,
where $K = \mathrm{cov}\{Z, Z\} + (\mathrm{E}Z - \mu^o)(\mathrm{E}Z - \mu^o)^*$ and $I$ is the identity. Since $\mathrm{cov}\{Z, Z\}$ and $\mathrm{E}Z$ run over $\mathcal{R}$ and $\mathcal{M}$, the matrix $K$ takes its values in
$\mathcal{K} = \{R + (\mu - \mu^o)(\mu - \mu^o)^* : R \in \mathcal{R}, \; \mu \in \mathcal{M}\}$.  (13)
Thus we obtain
$\sup_{Z \sim \mathcal{P}(\mathcal{M}, \mathcal{R})} \mathrm{E}\|\widetilde X_F - X\|^2 = \sup_{K \in \mathcal{K}} J(F, K)$.  (14)
The essence of the dual optimization approach is described in the following theorem.
Theorem 3. (Pankov and Semenikhin (2000)). Let $\mathcal{R}$ and $\mathcal{M}$ be compact sets such that $\mathcal{R}$ is convex and $\mathcal{M}$ is symmetric with respect to the point $\mu^o$ (the last condition means $\mathcal{M}_o = -\mathcal{M}_o$ for $\mathcal{M}_o = \mathcal{M} - \mu^o$).

(1) Then we have the duality relation
$\min_{F \in \mathbb{R}^{m \times 2m}} \max_{K \in \mathcal{K}} J(F, K) = \max_{K \in \mathrm{co}[\mathcal{K}]} J(K)$,
where $J(K)$ denotes the dual functional
$J(K) = \min_{F \in \mathbb{R}^{m \times 2m}} J(F, K) = \mathrm{tr}[K_X - K_{XU} K_U^{+} K_{UX}]$
and $\mathrm{co}[\mathcal{K}]$ is the convex hull of $\mathcal{K}$.

(2) If $R_U$ is assumed to be always positive definite, i.e.,
$R_U \succ O \quad \forall R \in \mathcal{R}$,  (15)
then
$\hat X = \mu_X^o + \hat F (U - \mu_U^o), \quad \hat F = \hat K_{XU} \hat K_U^{-1}$,  (16)
is the conditionally minimax predictor whenever $\hat K$ is a solution to the dual optimization problem
$\hat K \in \arg\max_{K \in \mathrm{co}[\mathcal{K}]} J(K)$.  (17)

(3) The least favorable matrix $\hat K$ together with the minimax predictor $\hat F$ form a saddle point:
$J(\hat F, K) \le J(\hat F, \hat K) \le J(F, \hat K) \quad \forall F, \; \forall K \in \mathrm{co}[\mathcal{K}]$.

It should be noticed that condition (15) can be replaced with a milder one:
$\mathrm{im}[R_U] \subseteq \mathrm{im}[\hat R_U] \quad \forall R \in \mathcal{R}$  (18)
(if so, the pseudoinverse can be written in (16) instead of the inverse). For example, (18) is known to be valid in the case when $\mathcal{R}$ contains the maximal element $\bar R$, i.e.,
$R \preceq \bar R \quad \forall R \in \mathcal{R}$  (19)
($A \preceq B$ or $B \succeq A$ means that $B - A$ is positive semidefinite, both matrices being symmetric). In this case, whatever predictor $\widetilde X_F$ is used, the maximum of the MSE is attained at $\bar R$:
$\sup_{K \in \mathcal{K}} J(F, K) = J(F, \bar R) + \sup_{\mu \in \mathcal{M}} J(F, (\mu - \mu^o)(\mu - \mu^o)^*)$.
In particular, if there is no uncertainty in the mean, i.e., $\mathcal{M} = \{\mu^o\}$, then the least favorable matrix $\hat K$ coincides with the maximal covariance $\bar R$.

Requirement (15) may be ignored when the goal is to find an $\varepsilon$-minimax solution $\hat F^\varepsilon$, more specifically,
$\sup_{K \in \mathcal{K}} J(\hat F^\varepsilon, K) \le \hat J + \varepsilon$.
To determine $\hat F^\varepsilon$ one can use the regularization method adapted to the dual optimization. According to this method, the uncertainty set $\mathcal{R}$ should be replaced with its regularized version $\mathcal{R} + \varepsilon_0 I$ for some $\varepsilon_0 > 0$. A detailed discussion regarding this question and related problems can be found in [Pankov and Siemenikhin (2007)].

Since the minimax predictor $\hat F$ is defined analytically via the least favorable matrix $\hat K$, the success in constructing $\hat F$ depends on the tractability of the dual optimization problem (17). There are several advantages of the dual optimization approach in comparison with the direct minimization of the worst-case MSE (14). The dual problem is a convex program with a smooth objective of an explicit form and, for a special type of uncertainty, it can be directly solved by a standard optimization toolbox [e.g., Grant and Boyd (2011)]. However, the original minimax statement leads to minimization of a non-smooth function which has no tractable representation.
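For reference, evaluating the dual functional $J(K)$ and recovering the gain of (16) from a candidate least favorable matrix amounts to a few lines. The sketch below assumes $K$ is partitioned with the $X$-block first, as in (11), and uses the pseudoinverse so that the milder condition (18) is also covered.

```python
import numpy as np

def dual_functional(K, m):
    """Dual functional J(K) = tr[K_X - K_XU K_U^+ K_UX]."""
    KX, KXU, KU = K[:m, :m], K[:m, m:], K[m:, m:]
    # K is symmetric, so K_UX = K_XU^T.
    return np.trace(KX - KXU @ np.linalg.pinv(KU) @ KXU.T)

def minimax_gain(K_hat, m):
    """Gain F_hat = K_XU K_U^+ of the conditionally minimax predictor (16)."""
    return K_hat[:m, m:] @ np.linalg.pinv(K_hat[m:, m:])
```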
5. METHOD OF SEMIDEFINITE PROGRAMMING

To tackle the dual optimization problem, we bring into play the method of semidefinite programming. As is known, any semidefinite program (SDP) can be formulated as the optimization of a linear objective under linear matrix inequalities (LMIs):
$\langle c, u \rangle \to \max_{u \in \mathbb{R}^N}$ s.t. $A(u) - B \succeq O$,
where $c \in \mathbb{R}^N$, $A$ is a linear mapping from $\mathbb{R}^N$ to a space $\mathbb{R}^{p \times p}_s$ of symmetric matrices, and $B \in \mathbb{R}^{p \times p}_s$.
Suppose that the uncertain mean $\mu$ belongs to an ellipsoid
$\mathcal{M} = \{\mu \in \mathbb{R}^{3m} : \langle \Sigma(\mu - \mu^o), \mu - \mu^o \rangle \le 1\}$,  (20)
where $\mu^o$ is a fixed vector and $\Sigma$ is a given positive definite matrix. If the set $\mathcal{R}$ of possible covariances $R$ is defined by LMIs and contains only matrices with nondegenerate blocks $R_U$, the minimax predictor can be calculated as (16) given a solution to the dual problem (17). Our aim is to replace (17) with an equivalent SDP. To do this, we first need the convex hull of (13):
$\mathrm{co}[\mathcal{K}] = \{R + M : R \in \mathcal{R}, \; M \succeq O, \; \mathrm{tr}[M\Sigma] \le 1\}$.
Introduce a new matrix variable $T \in \mathbb{R}^{m \times m}_s$: it will serve as a lower bound for the matrix expression whose trace is to be maximized in (17). Consequently, the dual problem can be stated as follows:
$\mathrm{tr}[T] \to \max_{T, K}$ s.t. $K_X - K_{XU} K_U^{-1} K_{UX} \succeq T, \; K \in \mathrm{co}[\mathcal{K}]$.
Due to the Schur complement [Boyd and Vandenberghe (2004)], the matrix inequality given above is equivalent to the LMI
$\begin{pmatrix} K_X - T & K_{XU} \\ K_{UX} & K_U \end{pmatrix} \succeq O$.
Thus we have obtained the following theorem.

Theorem 4. If $\mathcal{M}$ is the ellipsoid (20) and $\mathcal{R}$ is determined by LMIs such that (15) is fulfilled, then:

(1) the dual optimization problem (17) is identical to the SDP
$\mathrm{tr}[T] \to \max_{T, R, M}$ s.t. $R \in \mathcal{R}$, $M \succeq O$, $\mathrm{tr}[M\Sigma] \le 1$, $R + M - \begin{pmatrix} T & O \\ O & O \end{pmatrix} \succeq O$,  (21)
where $T \in \mathbb{R}^{m \times m}_s$ and $R, M \in \mathbb{R}^{3m \times 3m}_s$;

(2) any solution to (21) is comprised of $\hat T$, $\hat R$, and $\hat M$, where $\hat T$ provides the tight upper bound (10), $\hat J = \mathrm{tr}[\hat T]$,
and $\hat K = \hat R + \hat M$ is the least favorable matrix (17) which defines the conditionally minimax predictor in the form (16).
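The dual problem translates almost verbatim into a modern modeling language. Below is a sketch in Python with CVXPY (the paper itself points to the MATLAB package CVX); for illustration, the set $\mathcal{R}$ is described by the elementwise bounds of Example 5 below. Rather than introducing the slack matrix $T$ explicitly, the sketch maximizes the dual functional directly via `matrix_frac`, which CVXPY reformulates through essentially the same Schur-complement LMI as in (21).

```python
import numpy as np
import cvxpy as cp

def solve_dual_sdp(Ro, Delta, Sigma, m):
    """Dual problem (17) with R given by elementwise bounds (Example 5 below)."""
    p = Ro.shape[0]                                   # p = 3m
    R = cp.Variable((p, p), symmetric=True)
    M = cp.Variable((p, p), symmetric=True)
    K = R + M                                         # K runs over co[K]
    # J(K) = tr[K_X] - tr[K_UX^T K_U^{-1} K_UX]; matrix_frac supplies the second term.
    objective = cp.Maximize(cp.trace(K[:m, :m]) - cp.matrix_frac(K[m:, :m], K[m:, m:]))
    constraints = [R >> 0, M >> 0,
                   cp.trace(M @ Sigma) <= 1,          # M-part of co[K]
                   cp.abs(R - Ro) <= Delta]           # R in R_el
    prob = cp.Problem(objective, constraints)
    prob.solve(solver=cp.SCS)
    return prob.value, R.value + M.value              # J_hat and K_hat = R_hat + M_hat
```

The returned $\hat K$ then yields the conditionally minimax predictor via (16).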
6. UNCERTAINTY SETS AND CONFIDENCE REGIONS

In this section, we propose a few examples of uncertainty sets for the covariance matrix $R$. Assume that we have a nominal covariance $R^o$ which is then considered as the center of the sets described below. In what follows, $\mathbb{R}^{p \times p}_+$ denotes the cone of symmetric semidefinite $p \times p$ matrices. To simplify notation we put $p = 3m$.

Example 5. The first example is defined by means of the elementwise constraints [Mittelman and Miller (2010), Ignashchenko et al. (2010)]:
$\mathcal{R}_{el} = \{R \in \mathbb{R}^{p \times p}_+ : |R_{ij} - R^o_{ij}| \le \Delta_{ij} \; \forall i, j\}$,
where $\Delta$ is a known symmetric matrix of the appropriate size with nonnegative elements. It is obvious that $R \in \mathcal{R}_{el}$ can be rewritten in the form of LMIs.
In order to apply Theorem 4, it remains to find a condition that guarantees nondegeneracy (15). For every $R \in \mathcal{R}_{el}$, writing $S = R - R^o$, we can derive the following relations:
$\langle (R - R^o) z, z \rangle = \sum_i S_{ii} z_i^2 + \sum_{i \ne j} S_{ij} z_i z_j \ge -\sum_i \Delta_{ii} z_i^2 - \sum_{i \ne j} \Delta_{ij} |z_i z_j| = \langle \widetilde\Delta |z|, |z| \rangle \ge \sigma_{\min}[\widetilde\Delta] \langle |z|, |z| \rangle = \sigma_{\min}[\widetilde\Delta] \langle z, z \rangle$,
where $|z|$ is the vector with components $|z_i|$, $\widetilde\Delta$ is the matrix with entries $\widetilde\Delta_{ii} = -\Delta_{ii}$ and $\widetilde\Delta_{ij} = -\Delta_{ij}$ for all $i \ne j$, and $\sigma_{\min}$ denotes the minimal eigenvalue. Thus we obtain
$R \succeq R^o + \sigma_{\min}[\widetilde\Delta] I \quad \forall R \in \mathcal{R}_{el}$.
Eventually this leads to the condition
$\sigma_{\min}[R^o_U] + \sigma_{\min}[\widetilde\Delta_U] > 0$,
which is sufficient for the block $R_U$ to be nondegenerate for any $R \in \mathcal{R}_{el}$.
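This nondegeneracy test is cheap to check numerically; a minimal sketch, assuming $Z$ is ordered with the $X$-block first so that the $U$-block is the trailing $2m \times 2m$ corner:

```python
import numpy as np

def nondegeneracy_margin(Ro, Delta, m):
    """sigma_min[Ro_U] + sigma_min[Delta_tilde_U]; positive means (15) holds on R_el."""
    Ro_U = Ro[m:, m:]                    # U-block of the nominal covariance
    Delta_tilde_U = -Delta[m:, m:]       # Delta_tilde has entries -Delta_ij
    # eigvalsh returns eigenvalues in ascending order, so [0] is the minimal one.
    return np.linalg.eigvalsh(Ro_U)[0] + np.linalg.eigvalsh(Delta_tilde_U)[0]
```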
Example 6. To define the next uncertainty set we need the spectral matrix norm $\|\cdot\|$. By definition, $\|A\|^2$ is equal to the maximal eigenvalue $\sigma_{\max}[A^* A]$ (or $\sigma_{\max}[A A^*]$, which is the same). If $A$ is symmetric, then $\|A\| = \max\{|\sigma_{\min}[A]|, |\sigma_{\max}[A]|\}$.
Let $\mathcal{R}_{sp}$ be the intersection of the cone $\mathbb{R}^{p \times p}_+$ with the ball of radius $\delta > 0$ and center $R^o$:
$\mathcal{R}_{sp} = \{R \succeq O : \|R - R^o\| \le \delta\}$.
This is the simplest case of those considered here, since $\mathcal{R}_{sp}$ contains the maximal matrix $\bar R = R^o + \delta I \in \mathcal{R}_{sp}$: $R \preceq \bar R$ for all $R \in \mathcal{R}_{sp}$. Hence, the dual optimization problem takes the form
$\mathrm{tr}[\bar R_X + M_X - (\bar R_{XU} + M_{XU})(\bar R_U + M_U)^{-1}(\bar R_{UX} + M_{UX})] \to \max_{M \succeq O : \, \mathrm{tr}[M\Sigma] \le 1}$.
Besides reducing it to the SDP (21) with $\mathcal{R} = \{\bar R\}$, there is an explicit solution based on the eigenvalue decomposition under specific assumptions on the observation model [Eldar et al. (2005)]. In addition, if $X \in \mathbb{R}$, then the direct minimax problem can be solved analytically due to
$\sup_{K \in \mathcal{K}} J(F, K) = J(F, \bar R + \Sigma^{-1})$,
which follows from the relation
$\sup_\mu \{\langle \mu, z \rangle^2 : \langle \Sigma\mu, \mu \rangle \le 1\} = \langle \Sigma^{-1} z, z \rangle \quad \forall z$.
Thus, in the case of a scalar $X$, the conditionally minimax predictor $\hat F$ is given by formulae (16) with $\hat K = \bar R + \Sigma^{-1}$. In particular, the tight upper bound $\hat J$ is equal to $J(\bar R + \Sigma^{-1})$. For a vector-valued $X$, the predictor $\hat F$ is not minimax, but its MSE is bounded above by the same quantity.
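For a scalar $X$, the closed-form answer of Example 6 requires no optimization at all; the following sketch assembles $\hat K = \bar R + \Sigma^{-1}$ and evaluates the predictor (16) together with the bound $J(\bar R + \Sigma^{-1})$.

```python
import numpy as np

def spectral_ball_predictor(Ro, delta, Sigma, mu_o, U, m=1):
    """Scalar-X case of Example 6: K_hat = R_bar + Sigma^{-1}, R_bar = Ro + delta*I."""
    K_hat = Ro + delta * np.eye(Ro.shape[0]) + np.linalg.inv(Sigma)
    F_hat = K_hat[:m, m:] @ np.linalg.inv(K_hat[m:, m:])      # gain of (16)
    x_hat = mu_o[:m] + F_hat @ (U - mu_o[m:])                 # predictor (16)
    J_hat = np.trace(K_hat[:m, :m] - F_hat @ K_hat[m:, :m])   # J(R_bar + Sigma^{-1})
    return x_hat, J_hat
```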
Example 7. Now let us consider the set
$\mathcal{R}_{fr} = \{R \succeq O : \|R - R^o\|_2 \le \delta\}$,
which is a neighborhood of the nominal covariance in the Frobenius norm $\|\cdot\|_2$: $\|A\|_2^2 = \mathrm{tr}[A^* A] = \mathrm{tr}[A A^*]$.

In spite of the fact that $\|R - R^o\|_2 \le \delta$ can also be represented as an LMI, such a condition on the optimization variable falls under the special category named second-order cone constraints [Boyd and Vandenberghe (2004)]. This type of constraint is incorporated in the syntax of toolboxes designed for semidefinite programming [Sturm (1999)].

If the nominal covariance has the nonsingular block $R^o_U$, then condition (18) holds true. To prove that, we first note that the left-hand side inequality for the saddle point $(\hat F, \hat K = \hat R + \hat M)$ implies
$\hat R \in \arg\max_{R \in \mathcal{R}_{fr}} \mathrm{tr}[S R], \quad S = [-I \; \hat F]^* [-I \; \hat F] \succeq O$.
Since the solution $\hat R$ is uniquely determined,
$\hat R = R^o + \delta \frac{S}{\|S\|_2}$,
we obtain that the condition $R^o_U \succ O$ is sufficient for $\hat R_U$ to be of full rank.
In order to construct statistical versions of the uncertainty sets, we first need to choose a reliability level $\rho \approx 1$ and generate mutually independent samples $\{\upsilon^{(i)}\}$ and $\{\eta^{(i)}\}$ from the distributions $\mathrm{Law}\{\upsilon\}$ and $\mathrm{Law}\{\eta\}$, respectively. Then we can obtain the samples corresponding to the vector (6):
$Z^{(i)} = \mathrm{col}[\Phi(\upsilon^{(i)}), \; \Phi(\bar\upsilon), \; \Omega(\Psi(\upsilon^{(i)}, \eta^{(i)}) - \Psi(\bar\upsilon, \eta^o))]$.
Calculate the sample mean and covariance:
$\mu^o = \frac{1}{\nu} \sum_{i=1}^{\nu} Z^{(i)}, \quad R^o = \frac{1}{\nu} \sum_{i=1}^{\nu} (Z^{(i)} - \mu^o)(Z^{(i)} - \mu^o)^*$.
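The Monte-Carlo step is straightforward; the sketch below reuses the illustrative scalar model from the code sample in Section 2 (so $m = 1$, $p = 3$), which is an assumption rather than anything fixed by the paper.

```python
import numpy as np

# Illustrative model as before (assumptions, not from the paper).
Phi = lambda v: np.sin(v) + 0.5 * v
Psi = lambda v, eta: v**3 + eta
Omega = lambda y: y
v_bar, eta_nom = 0.0, 0.0

nu = 10_000
rng = np.random.default_rng(1)
v_s = rng.normal(0.0, 1.0, size=nu)        # samples of upsilon
eta_s = rng.normal(0.0, 0.1, size=nu)      # samples of eta

# Samples of the augmented vector (6): Z = col[X, U1, U2], shape (nu, 3).
Z = np.stack([Phi(v_s),
              np.full(nu, Phi(v_bar)),
              Omega(Psi(v_s, eta_s) - Psi(v_bar, eta_nom))], axis=1)

mu_o = Z.mean(axis=0)                      # sample mean
Zc = Z - mu_o
Ro = Zc.T @ Zc / nu                        # sample covariance (1/nu convention)
```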
If the true covariance matrix $\mathrm{cov}\{Z, Z\} \in \mathbb{R}^{p \times p}$ is nonsingular, then the sample covariance $R^o$ is also nondegenerate for $\nu > p$ with probability 1. Hence, the statistic
$\nu \langle (R^o)^{-1}(\mu - \mu^o), \mu - \mu^o \rangle$
asymptotically follows the chi-square distribution $\chi^2(p)$ [Anderson (1984)]. Provided by this fact, we obtain the confidence region $\widehat{\mathcal{M}}$ in the form of the ellipsoid (20) with the weight matrix
$\Sigma = (r R^o)^{-1}, \quad r = \frac{\chi^2_\rho(p)}{\nu}$,  (22)
where $\chi^2_\rho(p)$ denotes the $\rho$-quantile of $\chi^2(p)$. Thus, for all sufficiently large $\nu$, we obtain
$\mathrm{P}\{\mathrm{E}Z \in \widehat{\mathcal{M}}\} \approx \rho$.
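The weight matrix (22) follows directly from the chi-square quantile; a short sketch using SciPy:

```python
import numpy as np
from scipy.stats import chi2

def ellipsoid_weight(Ro, nu, rho=0.95):
    """Weight matrix (22): Sigma = (r Ro)^{-1} with r = chi2_rho(p) / nu."""
    p = Ro.shape[0]
    r = chi2.ppf(rho, p) / nu          # rho-quantile of chi-square with p dof, scaled
    return np.linalg.inv(r * Ro), r
```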
Assuming that $Z \sim \mathcal{N}(\mu, R)$ and $R \succ O$, we have
$\sqrt{\nu/2}\, \big(R^{-1/2} R^o R^{-1/2} - I\big) \xrightarrow{D} \Xi$ as $\nu \to \infty$,  (23)
where “$\xrightarrow{D}$” denotes convergence in distribution and $\Xi$ is a random symmetric matrix normally distributed with zero mean and identity covariance operator on the space $\mathbb{R}^{p \times p}_s$ equipped with the inner product
$\langle S, T \rangle_2 = \mathrm{tr}[S T] \quad \forall S, T \in \mathbb{R}^{p \times p}_s$.
The last condition means that $\mathrm{tr}[\Xi S]$ is a random variable normally distributed with zero mean and variance $\langle S, S \rangle_2 = \mathrm{tr}[S^2]$ for every nonrandom symmetric $S$; or, equivalently, the entries $\Xi_{ii}$ and $\Xi_{ij} = \Xi_{ji}$ ($i < j$) are mutually independent random values such that $\mathrm{D}\{\Xi_{ii}\} = 1$ and $\mathrm{D}\{\Xi_{ij}\} = 1/2$.

From the almost sure convergence $R^o \to R$ ($\nu \to \infty$) it follows that
$R^{-1/2} (R^o)^{1/2} \xrightarrow{a.s.} I, \quad R^{1/2} (R^o)^{-1/2} \xrightarrow{a.s.} I$.
Therefore, multiplying (23) by these terms preserves the asymptotic distribution:
$\sqrt{\nu/2}\, \big(I - (R^o)^{-1/2} R (R^o)^{-1/2}\big) \xrightarrow{D} \Xi$ as $\nu \to \infty$.
To find the confidence regions analogous to $\mathcal{R}_{el}$, $\mathcal{R}_{sp}$, and $\mathcal{R}_{fr}$, we have to know the distribution of the norms
$\|\Xi\|_\infty = \max_{i,j} |\Xi_{ij}| \;\le\; \|\Xi\| = \max_{\|z\|=1} |\langle \Xi z, z \rangle| \;\le\; \|\Xi\|_2 = \sqrt{\textstyle\sum_i \Xi_{ii}^2 + \sum_{i<j} (\sqrt{2}\, \Xi_{ij})^2}$.
Evidently, the squared Frobenius norm $\|\Xi\|_2^2$ follows $\chi^2(p(p+1)/2)$. The distribution function of $\|\Xi\|_\infty$ has the explicit form
$\mathrm{P}\{\|\Xi\|_\infty \le t\} = [\Phi_0(t)]^p \, [\Phi_0(t\sqrt{2})]^{p(p-1)/2}$,
where
$\Phi_0(t) = \sqrt{\frac{2}{\pi}} \int_0^t e^{-s^2/2} \, ds$.
Denote by $\Lambda(p)$ the distribution law of the spectral norm $\|\Xi\|$. Then its quantile is bounded by
$\max\big\{\Phi_0^{-1}(\rho^{1/p}), \; \Phi_0^{-1}(\rho^{2/(p(p+1))})/\sqrt{2}\big\} \le \Lambda_\rho(p) \le \sqrt{\chi^2_\rho(p(p+1)/2)}$.
The left-hand side serves as a lower bound for the $\rho$-quantile of $\|\Xi\|_\infty$ (we denote it by $M_\rho(p)$).
Thus, for a sufficiently large $\nu$, we obtain
$\mathrm{P}\{\mathrm{cov}\{Z, Z\} \in \widehat{\mathcal{R}}\} \approx \rho$,
where $\widehat{\mathcal{R}}$ is any of the following confidence regions:
$\widehat{\mathcal{R}}_{el} = \{R \succeq O : \|I - (R^o)^{-1/2} R (R^o)^{-1/2}\|_\infty \le \delta_\infty\}$,  (24)
$\widehat{\mathcal{R}}_{sp} = \{R \succeq O : \|I - (R^o)^{-1/2} R (R^o)^{-1/2}\| \le \delta\}$,  (25)
$\widehat{\mathcal{R}}_{fr} = \{R \succeq O : \|I - (R^o)^{-1/2} R (R^o)^{-1/2}\|_2 \le \delta_2\}$  (26)
with parameters $\delta_\infty = M_\rho(p)\sqrt{2/\nu}$, $\delta = \Lambda_\rho(p)\sqrt{2/\nu}$, and $\delta_2 = \sqrt{2\chi^2_\rho(p(p+1)/2)/\nu}$.
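The radii $\delta_\infty$ and $\delta_2$ are computable exactly from the laws given above, while for $\delta$ only the two-sided bounds on $\Lambda_\rho(p)$ are explicit; taking the upper bound keeps the coverage at least $\rho$. A sketch:

```python
import numpy as np
from scipy.stats import chi2, norm
from scipy.optimize import brentq

def confidence_radii(p, nu, rho=0.95):
    """Radii for the regions (24)-(26); delta uses the upper bound on Lambda_rho(p)."""
    # delta_2: ||Xi||_2^2 is chi-square with p(p+1)/2 degrees of freedom.
    delta2 = np.sqrt(2.0 * chi2.ppf(rho, p * (p + 1) // 2) / nu)
    # M_rho(p): invert the explicit distribution function of ||Xi||_inf.
    Phi0 = lambda t: 2.0 * norm.cdf(t) - 1.0            # P{|N(0,1)| <= t}
    cdf_inf = lambda t: Phi0(t) ** p * Phi0(t * np.sqrt(2.0)) ** (p * (p - 1) // 2)
    M_rho = brentq(lambda t: cdf_inf(t) - rho, 1e-6, 20.0)
    delta_inf = M_rho * np.sqrt(2.0 / nu)
    # Conservative delta: Lambda_rho(p) <= sqrt(chi2_rho(p(p+1)/2)).
    delta_sp = np.sqrt(chi2.ppf(rho, p * (p + 1) // 2)) * np.sqrt(2.0 / nu)
    return delta_inf, delta_sp, delta2
```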
We claim that the region $\widehat{\mathcal{R}}_{sp}$ has the maximal element:
$\bar R = (1 + \delta) R^o : \; R \preceq \bar R \quad \forall R \in \widehat{\mathcal{R}}_{sp}$.  (27)
To show this, note that
$\|I - (R^o)^{-1/2} \bar R (R^o)^{-1/2}\| = \|{-\delta} I\| = \delta$,
and hence $\bar R \in \widehat{\mathcal{R}}_{sp}$. Take $z \in \mathbb{R}^p$ and $R \in \widehat{\mathcal{R}}_{sp}$ arbitrarily. Then the change of variable $w = (R^o)^{1/2} z$ yields
$\langle R z, z \rangle = \langle [(R^o)^{-1/2} R (R^o)^{-1/2} - I] w, w \rangle + \langle w, w \rangle \le \delta \langle w, w \rangle + \langle w, w \rangle = \langle \bar R z, z \rangle$,
which proves (27).
After introducing the statistical estimates of the moment characteristics $\mu^o$, $R^o$ for the vector $Z = \mathrm{col}[X, U]$, we can define the linear predictor
$\hat X^o = \mu^o_X + R^o_{XU} (R^o_U)^{+} (U - \mu^o_U)$  (28)
and the estimated value of its error covariance matrix
$P^o = R^o_X - R^o_{XU} (R^o_U)^{+} R^o_{UX}$.  (29)
Since the CONF approach is based on such an estimator, we call (28) the conditionally optimal predictor.
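In code, the conditionally optimal predictor (28) and the matrix (29) are obtained directly from the sample moments; a sketch:

```python
import numpy as np

def conditionally_optimal_predictor(mu_o, Ro, U, m):
    """Predictor (28) and the estimated error covariance (29)."""
    G = Ro[:m, m:] @ np.linalg.pinv(Ro[m:, m:])   # R^o_XU (R^o_U)^+
    x_hat = mu_o[:m] + G @ (U - mu_o[m:])         # (28)
    P_o = Ro[:m, :m] - G @ Ro[m:, :m]             # (29)
    return x_hat, P_o
```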
The theorem below describes the main result of the paper.

Theorem 8. Suppose that $Z = \mathrm{col}[X, U]$ has the expectation $\mu$ and the covariance matrix $R \succ O$, and $\rho < 1$ is a given reliability level.

(1) If $Z \sim \mathcal{N}(\mu, R)$ and $0 < \rho_0 < \rho$, then
$\mathrm{P}\{\mu \in \widehat{\mathcal{M}}, \; R \in \widehat{\mathcal{R}}\} > \rho_0^2$
for all sufficiently large $\nu$, where $\widehat{\mathcal{M}}$ is the ellipsoid with center $\mu^o$ and weight matrix (22); $\widehat{\mathcal{R}}$ is any of the confidence regions (24)–(26); $\mu^o$, $R^o$ are the sample mean and covariance obtained from the independent samples $Z^{(i)} \sim \mathcal{N}(\mu, R)$, $i = 1, \ldots, \nu$.

(2) If $Z \sim \mathcal{P}(\widehat{\mathcal{M}}, \widehat{\mathcal{R}}_{sp})$, then the MSE matrix of the conditionally optimal predictor (28) is bounded above by the estimated value of its error covariance matrix (29) with some factor:
$\mathrm{E}\{(\hat X^o - X)(\hat X^o - X)^*\} \preceq (1 + r + \delta) P^o$,
where $r$ and $\delta$ are the parameters of $\widehat{\mathcal{M}}$ and $\widehat{\mathcal{R}}_{sp}$.

(3) The predictor that is conditionally minimax with respect to the uncertainty sets $\widehat{\mathcal{M}}$ and $\widehat{\mathcal{R}}_{sp}$ coincides with the conditionally optimal predictor (28) if one of the following conditions is fulfilled: (a) $X$ is a scalar variable; (b) $\mu$ is known to be equal to $\mu^o$. The tight upper bound of the MSE is $\hat J = (1 + r + \delta) P^o$ in case (a), and $\hat J = (1 + \delta)\, \mathrm{tr}[P^o]$ in case (b).
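Combining the pieces, the guaranteed MSE levels of Theorem 8(3) follow from the quantities $r$, $\delta$, and $P^o$ computed above; a minimal sketch:

```python
import numpy as np

def guaranteed_mse_bound(P_o, r, delta, known_mean=False):
    """Tight bounds of Theorem 8(3): (1 + r + delta) P_o  or  (1 + delta) tr[P_o]."""
    if known_mean:                                        # case (b): mu = mu_o
        return (1.0 + delta) * np.trace(np.atleast_2d(P_o))
    return (1.0 + r + delta) * np.asarray(P_o).item()     # case (a): scalar X
```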
Although the statistical inferences made about the unknown covariance matrix $R$ are correct only for a Gaussian $Z$, a result similar to (23) holds in the more general case. If $\mathrm{E}\|Z\|^4 < \infty$, the asymptotic distribution is also determined, but its description is now based on the tensor of fourth-order central moments $\{\mathrm{E}\{\mathring{Z}_i \mathring{Z}_j \mathring{Z}_k \mathring{Z}_l\}\}_{i,j,k,l=1,\ldots,p}$ of the centered components $\mathring{Z} = Z - \mathrm{E}Z$. The construction of the confidence region involving sample estimates of these moments is described in [Ignashchenko et al. (2010)].

ACKNOWLEDGEMENTS

This work is dedicated to the memory of our Teacher, Professor Alexei Rostislavovich Pankov.

REFERENCES

Anderson, T.W. (1984). An introduction to multivariate statistical analysis. 2nd edition. J. Wiley & Sons, New York.
Arasaratnam, I. and Haykin, S. (2009). Cubature Kalman filters. IEEE Trans. Automat. Control, 54(6), 1254–1269.
Arulampalam, M.S., Maskell, S., Gordon, N., and Clapp, T. (2002). A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process., 50(2), 174–188.
Borisov, A.V. and Pankov, A.R. (1994). A solution of the filtering and smoothing problems for uncertain-stochastic dynamic systems. Internat. J. Control, 60(3), 413–423.
Boyd, S. and Vandenberghe, L. (2004). Convex optimization. Cambridge Univ. Press, Cambridge.
Donoho, D.L. (1994). Statistical estimation and optimal recovery. Ann. Statist., 22(1), 238–270.
Eldar, Y.C., Ben-Tal, A., and Nemirovski, A. (2005). Robust mean-squared error estimation in the presence of model uncertainties. IEEE Trans. Signal Process., 53(1), 168–181.
Grant, M. and Boyd, S. (2011). CVX: Matlab software for disciplined convex programming, version 1.21. [Online]. Available: http://cvxr.com/cvx.
Ignashchenko, E.Y., Pankov, A.R., and Semenikhin, K.V. (2010). Minimax-statistical approach to increasing reliability of measurement information processing. Automation and Remote Control, 71(2), 241–256.
Jazwinski, A. (1970). Stochastic processes and filtering theory. Acad. Press, New York.
Julier, S.J. and Uhlmann, J.K. (2004). Unscented filtering and nonlinear estimation. Proc. IEEE, 92(3), 401–422.
Kogan, M.M. (2014). LMI-based minimax estimation and filtering under unknown covariances. Internat. J. Control, 87(6), 1216–1226.
Liptser, R.S. and Shiryayev, A.N. (2005). Statistics of random processes. Springer-Verlag, New York, 3rd edition.
Li, L., Luo, Z.-Q., Davidson, T., Wong, M., and Bosse, E. (2002). Robust filtering via semidefinite programming with applications to target tracking. SIAM J. Optimization, 12, 740–755.
Mittelman, R. and Miller, E.L. (2010). Robust estimation of a random parameter in a Gaussian linear model with joint eigenvalue and elementwise covariance uncertainties. IEEE Trans. Signal Process., 58(3), 1001–1011.
Pankov, A.R. and Bosov, A.V. (1994). Conditionally minimax algorithm for nonlinear system state estimation. IEEE Trans. Automat. Control, AC-39, 1617–1620.
Pankov, A.R. and Semenikhin, K.V. (2000). The parametric identification methods for many-dimensional linear models in the presence of a priori uncertainty. Automation and Remote Control, 61(5), 785–800.
Pankov, A.R. and Siemenikhin, K.V. (2007). Minimax estimation for singular linear multivariate models with mixed uncertainty. J. Multivariate Analysis, 98(1), 145–176.
Pugachev, V.S. and Sinitsyn, I.N. (1990). Stochastic differential systems. J. Wiley & Sons, London.
Semenikhin, K.V. (2004). The optimality of linear estimation algorithms in minimax identification. Automation and Remote Control, 65(3), 493–503.
Sturm, J.F. (1999). Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software, 11(12), 625–653.