...istic, worst-case point of view basically puts equal weight on all observations.

In this paper, we consider ARX model structures with time-varying coefficients which are postulated to belong to an ellipsoidal set. This model set is suitable for modeling phenomena whose underlying data-generation mechanism is uncertain and time-varying, including non-linear behaviors, multiplicative noise, and non-stationary uncertainty. Based on time-domain data, we seek the smallest ellipsoidal set in the space of the coefficients of the model such that the resulting model is unfalsified by the data. Our main result is that the problem can be solved exactly in polynomial time, using (convex) semidefinite programming (Vandenberghe and Boyd, 1996). Our approach can be termed "optimal model unfalsification" (OMU) in that the bounds on the coefficients of the model (the ellipsoid that is identified) are not assumed a priori. We also address the issue of outlier detection, which is crucial in the context of bounded-noise approaches to estimation, by expressing the problem in a stochastic framework. We extend our results to a class of multivariable systems with unstructured uncertainty.

The identification scheme results in a model with time-varying, norm-bounded coefficients that is directly amenable to worst-case simulation (El Ghaoui and Calafiore, 1999) and robust control techniques (Boyd et al., 1994) for systems with unstructured uncertainty. The absence of structure in the uncertainty makes the simulation and control problems very simple computationally.

The proposed methodology amounts to assuming that the observed signal has been generated by a linear time-varying system with deterministic, multiplicative and additive perturbations; we seek the "smallest" multiplicative perturbation that makes the system consistent, for a given level of additive perturbation.

The proposed approach is quite different from the set-membership estimation framework. In the latter case, the mechanism generating the data is assumed to be a linear time-invariant system subject to a bounded additive disturbance, and the identification procedure is aimed at determining the set of all constant parameters that are consistent with the observed data. Extensive references to the set-membership identification approach may be found in (Milanese et al., 1996).

Notation

For a square matrix $X$, $X \succ 0$ (resp. $X \succeq 0$) means $X$ is symmetric and positive-definite (resp. semidefinite). For $\Theta \in \mathbf{R}^{n \times n}$, with $\Theta \succeq 0$, and $\hat\theta \in \mathbf{R}^n$, the notation $E(\Theta, \hat\theta)$ denotes the ellipsoid
$$E(\Theta, \hat\theta) = \bigl\{ \hat\theta + \Theta^{1/2} u \;:\; \|u\|_2 \le 1 \bigr\}.$$
Note that the above expression allows for singular $\Theta$, that is, "flat" ellipsoids, for example singletons. The matrix $\Theta$ determines the "shape" of the ellipsoid (its eigenvalues are the squared semi-axis lengths), while the vector $\hat\theta$ is the center. The size of the ellipsoid is chosen to be measured by the sum of the squared semi-axis lengths, which is equal to $\mathrm{Tr}\,\Theta$ (the trace of $\Theta$).

2. ARX MODELS

We consider discrete-time, time-varying AR models of the form
$$y(k) = \sum_{i=1}^{n} \theta_i(k)\, y(k-i) + e(k), \qquad (1)$$
where $y(k)$ is the measured output, and the vector of coefficients $\theta(k) \in \mathbf{R}^n$ is time-varying. The additive noise term is deterministic, bounded by $|e(k)| \le \gamma$, where $\gamma \ge 0$ is given. The Deterministic Optimal Model Unfalsification (DOMU) problem is as follows: Given a time horizon $T > 0$ and a data sequence $y(0), \ldots, y(n+T-1)$, find the smallest (in the sense defined above) ellipsoid $E(\Theta, \hat\theta)$ in the space of parameters such that the model defined by (1) is unfalsified by the data, i.e.,
$$\forall k,\; n \le k \le n+T-1,\;\; \exists\, \theta(k) \in E(\Theta, \hat\theta):\;\; \Bigl| y(k) - \sum_{i=1}^{n} \theta_i(k)\, y(k-i) \Bigr| \le \gamma. \qquad (2)$$

2.1 SDP solution

In the sequel, $x(k) = (y(k-1), \ldots, y(k-n))$ denotes the vector of the $n$ most recent outputs. We seek an ellipsoid $E(\Theta, \hat\theta)$ of minimal size $\mathrm{Tr}\,\Theta$, such that for every $k$, $n \le k \le n+T-1$, (1) holds for some $\theta(k) \in E(\Theta, \hat\theta)$ and $|e(k)| \le \gamma$. For every $k$, and given $e(k)$, $|e(k)| \le \gamma$, the hyperplane defined by (1) intersects the ellipsoid $E(\Theta, \hat\theta)$ if and only if
$$x(k)^T \Theta\, x(k) \ge \bigl( y(k) - x(k)^T \hat\theta - e(k) \bigr)^2$$
(since $x(k)^T \theta$ ranges over the interval $x(k)^T \hat\theta \pm \sqrt{x(k)^T \Theta\, x(k)}$ as $\theta$ varies over $E(\Theta, \hat\theta)$), or, equivalently,
$$\begin{bmatrix} x(k)^T \Theta\, x(k) & m(k) \\ m(k) & 1 \end{bmatrix} \succeq 0, \quad \mbox{where } m(k) = y(k) - x(k)^T \hat\theta - e(k). \qquad (3)$$
This yields the following result.
Theorem 1. A deterministic optimal unfalsified model is obtained by solving the following semidefinite programming (SDP) optimization problem in variables $\Theta$, $\hat\theta$ and $m(k)$, $n \le k \le n+T-1$:
$$\begin{array}{ll}
\mbox{minimize} & \mathrm{Tr}\,\Theta \\
\mbox{subject to} & \Theta \succeq 0, \\
& \begin{bmatrix} x(k)^T \Theta\, x(k) & m(k) \\ m(k) & 1 \end{bmatrix} \succeq 0, \\
& |y(k) - x(k)^T \hat\theta - m(k)| \le \gamma, \quad n \le k \le n+T-1.
\end{array} \qquad (4)$$

The convex optimization problem (4) is a semidefinite program, that is, it can be expressed as the minimization of a linear objective under linear matrix inequality (LMI) constraints:
$$\mbox{minimize } c^T z \;\mbox{ subject to }\; F(z) := F_0 + \sum_{i=1}^{m} z_i F_i \succeq 0, \qquad (5)$$
where $m = n(n+3)/2$ is the number of independent variables in $\Theta$, $\hat\theta$; $z$ is a vector containing those variables; $c \in \mathbf{R}^m$ is an objective vector; and the $F_i$'s are given symmetric, block-diagonal matrices consisting of $T$ blocks of order 2 and one block of order $n$.

The SDP (4) can be solved very efficiently, in time polynomial in $n$ (the order of the system) and $T$ (the horizon of analysis). The precise complexity result is: using a general-purpose interior-point method such as the one described in (Vandenberghe and Boyd, 1996), the number of iterations needed to compute an $\epsilon$-optimal solution grows as $O(\sqrt{T})$ in the worst case, and is constant (between 50 and 200, depending on the method) in practice. Each iteration requires solving a least-squares problem, the complexity of which grows as $O(T^2 n^6)$. No doubt it is possible to exploit the particular structure of the above problem for greater efficiency; this is left for future research.

Remark 1. If we postulate that the optimal shape matrix $\Theta$ is positive-definite at the optimum, then we can ignore the inactive constraint $\Theta \succeq 0$ in the above SDP. The problem then reduces to a simple convex quadratically constrained problem, which can be solved in $O(n^3)$ operations.

Remark 2. It is possible to develop a recursive version of the algorithm, by which the ellipsoid is updated with each new measurement. For details, we refer to the forthcoming paper (El Ghaoui and Calafiore, 2000).
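As an illustration, the SDP of Theorem 1 can be prototyped in a few lines with an off-the-shelf modeling tool. The sketch below uses Python with the cvxpy package; the function name, the data layout, and the use of the equivalent scalar form $m(k)^2 \le x(k)^T \Theta\, x(k)$ of the LMI in (4) are our own choices for illustration, not part of the paper.

```python
import numpy as np
import cvxpy as cp

def domu(y, n, gamma):
    """Sketch of the deterministic OMU problem of Theorem 1.

    y     : array of outputs y(0), ..., y(n+T-1)
    n     : model order
    gamma : bound on the additive noise, |e(k)| <= gamma
    Returns (Theta, theta_hat): shape matrix and center of the ellipsoid.
    """
    T = len(y) - n
    Theta = cp.Variable((n, n), PSD=True)   # ellipsoid shape matrix
    theta_hat = cp.Variable(n)              # ellipsoid center
    m = cp.Variable(T)                      # slack variables m(k) of (4)

    constraints = []
    for j, k in enumerate(range(n, n + T)):
        x_k = np.array([y[k - i] for i in range(1, n + 1)])  # (y(k-1),...,y(k-n))
        # m(k)^2 <= x(k)^T Theta x(k): scalar equivalent of the LMI in (4)
        constraints.append(cp.square(m[j]) <= x_k @ Theta @ x_k)
        # |y(k) - x(k)^T theta_hat - m(k)| <= gamma
        constraints.append(cp.abs(y[k] - x_k @ theta_hat - m[j]) <= gamma)

    prob = cp.Problem(cp.Minimize(cp.trace(Theta)), constraints)
    prob.solve()
    return Theta.value, theta_hat.value
```

A generic conic solver suffices at this scale; for large $T$, exploiting the block structure noted after (5) would be needed.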
2.2 A stochastic interpretation

The above result can be interpreted in a stochastic context, using ideas from large deviations theory. This interpretation will prove useful in the context of outlier detection, which is discussed in the next section. We assume now that the coefficient vector $\theta(k)$ is random, and obeys a time-varying distribution with constant mean $\hat\theta$ and covariance matrix $\Theta$. We are given a confidence level $\eta$ ($0 < \eta < 1$) and a parameter $\gamma > 0$. We now pose the Stochastic Optimal Model Unfalsification (SOMU) problem: Given a time horizon $T > 0$ and a data sequence $y(0), \ldots, y(n+T-1)$, find the smallest (in the sense of the trace) covariance matrix $\Theta$ and a mean vector $\hat\theta$ such that, for every $k$, $n \le k \le n+T-1$, there exists a distribution on the coefficient vector $\theta(k)$, having mean $\hat\theta$ and covariance matrix $\Theta$, under which (1) holds with $|e(k)| \le \gamma$ with probability at least $\eta$.

In the above problem, the parameter process $\theta(k)$ is stationary, in the sense that its first and second moments are constant in time. However, the resulting data-generation process $y(k)$ is not, in general, stationary.

Define
$$p_k(\hat\theta, \Theta) = \sup\, \mathrm{Prob}\Bigl\{ \Bigl| y(k) - \sum_{i=1}^{n} \theta_i(k)\, y(k-i) \Bigr| \le \gamma \Bigr\},$$
where the supremum is taken over all distributions of $\theta(k)$ with mean $\hat\theta$ and covariance matrix $\Theta$. Our problem is expressed as
$$\mbox{minimize } \mathrm{Tr}\,\Theta \;\mbox{ subject to }\; \Theta \succeq 0, \quad p_k(\hat\theta, \Theta) \ge \eta, \quad n \le k \le n+T-1. \qquad (6)$$

The following is a direct application of very general results reported in (Bertsimas and Sethuraman, 2000). We have
$$\sup\, \mathrm{Prob}\{ |\xi| \le \gamma \} = \frac{1}{1+d^2},$$
where the sup is taken over all distributions of a scalar random variable $\xi$ with given mean $\mu$ and variance $\sigma^2 > 0$, and where
$$d = \frac{\max(0,\, |\mu| - \gamma)}{\sigma}.$$
(Note that the result is trivial when $\sigma = 0$, and follows from the above otherwise.) We conclude that the sup above is greater than or equal to $\eta$ for some distribution with mean $\mu$ and variance $\sigma^2 \ge 0$ if and only if there exists a scalar $e$ such that
$$|e| \le \gamma, \qquad (\mu - e)^2 \le \frac{1-\eta}{\eta}\, \sigma^2.$$
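The bound $1/(1+d^2)$ is attained (when $|\mu| > \gamma$) by a two-point distribution placing mass $1/(1+d^2)$ at the boundary point $\gamma\,\mathrm{sign}(\mu)$ and the remaining mass at $\mu + \bigl(\sigma^2/(|\mu|-\gamma)\bigr)\mathrm{sign}(\mu)$. This attaining distribution is our own construction, not spelled out in the paper; the short script below is a numerical sanity check of it.

```python
import numpy as np

# Sanity check (ours, not from the paper) that a two-point distribution
# attains sup Prob{|xi| <= gamma} = 1/(1 + d^2) with the prescribed moments.
mu, sigma, gamma = 2.0, 1.5, 1.0             # example with |mu| > gamma
delta = abs(mu) - gamma
d = delta / sigma
q = 1.0 / (1.0 + d ** 2)                     # mass at the boundary point
a = gamma * np.sign(mu)                      # point inside {|xi| <= gamma}
b = mu + (sigma ** 2 / delta) * np.sign(mu)  # point outside

mean = q * a + (1 - q) * b
var = q * (a - mean) ** 2 + (1 - q) * (b - mean) ** 2
print(mean, var)   # -> 2.0, 2.25: matches mu and sigma^2
print(q)           # Prob{|xi| <= gamma} = 1/(1 + d^2), the quoted bound
```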
Let us now return to our problem, and assume that $\theta(k)$ obeys a time-varying distribution with mean $\hat\theta$ and covariance matrix $\Theta$. For every $k$, the random variable $y(k) - x(k)^T \theta(k)$ has mean $y(k) - x(k)^T \hat\theta$ and variance $x(k)^T \Theta\, x(k)$. Thus, $p_k(\hat\theta, \Theta) \ge \eta$ if and only if there exists a variable $e(k)$ such that $|e(k)| \le \gamma$ and
$$\bigl( y(k) - x(k)^T \hat\theta - e(k) \bigr)^2 \le \frac{1-\eta}{\eta}\, x(k)^T \Theta\, x(k).$$
This yields the following result.

Theorem 2. The optimization problem (6) is equivalent to the semidefinite programming problem in variables $\Theta$, $\hat\theta$ and $m(k)$, $n \le k \le n+T-1$:
$$\begin{array}{ll}
\mbox{minimize} & \mathrm{Tr}\,\Theta \\
\mbox{subject to} & \Theta \succeq 0, \\
& \begin{bmatrix} \dfrac{1-\eta}{\eta}\, x(k)^T \Theta\, x(k) & m(k) \\ m(k) & 1 \end{bmatrix} \succeq 0, \\
& |y(k) - x(k)^T \hat\theta - m(k)| \le \gamma, \quad n \le k \le n+T-1.
\end{array}$$

We observe the close connection between the above result and the one derived in a purely deterministic manner in Theorem 1. Indeed, by setting $\eta = 1/2$ in the LMI conditions above, we recover those obtained in the latter theorem. In words, the deterministic OMU problem is equivalent to searching for the "smallest" covariance matrix such that there exists a distribution that "explains" the data, up to a confidence level of $1/2$.

Likewise, letting $\eta \to 0$ makes the constraints in the stochastic OMU problem feasible for any $\Theta \succeq 0$. This should be expected: relaxing our level of confidence makes any covariance matrix feasible.

We also observe that requiring a confidence level $\eta = 1$ requires that, for every $k$, $n \le k \le n+T-1$, we have
$$|x(k)^T \hat\theta - y(k)| \le \gamma.$$
In this case, there exists a constant parameter $\hat\theta$ such that the corresponding AR model explains the data. This condition holds if and only if the deterministic OMU problem has a solution with $\Theta = 0$ (that is, the ellipsoid $E(\Theta, \hat\theta)$ is reduced to a singleton).

Finally, we observe that a solution $(\hat\theta, \Theta)$ to the deterministic OMU problem provides an immediate solution $(\hat\theta_\eta, \Theta_\eta)$ to the stochastic OMU problem for an arbitrary value of $\eta$: it suffices to set
$$\hat\theta_\eta = \hat\theta, \qquad \Theta_\eta = \frac{\eta}{1-\eta}\, \Theta.$$
This means that, by scaling the ellipsoid obtained in a deterministic setting by a factor $\eta/(1-\eta)$, we obtain the family of ellipsoids that solve the stochastic OMU problems with confidence level $\eta$.
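In code, this scaling is a one-liner on top of a single deterministic solve. A sketch, reusing the illustrative domu function from Section 2.1 (our construction, not the paper's):

```python
def somu_from_domu(y, n, gamma, eta):
    """Family of stochastic OMU solutions from one deterministic solve,
    via Theta_eta = (eta / (1 - eta)) * Theta. `domu` is the illustrative
    solver sketched in Section 2.1; 0 < eta < 1."""
    Theta, theta_hat = domu(y, n, gamma)
    return theta_hat, (eta / (1.0 - eta)) * Theta
```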
2.3 Outlier detection

Detecting outliers is especially important in the context of bounded-noise uncertainty, since every measurement is taken into account. The parallel between the stochastic and deterministic approaches allows us to develop an efficient heuristic for outlier detection.

Consider a batch of data, for which we solve the deterministic OMU problem, based on the conditions of Theorem 1, for a time period $n \le k \le n+T-1$. This results in an ellipsoid $E(\Theta, \hat\theta)$. Now consider a new datum $y(n+T)$. We can easily test whether $y(n+T)$ falsifies our model or not: it suffices to check if there exists $e(n+T)$ such that
$$x(n+T)^T \Theta\, x(n+T) \ge \bigl( y(n+T) - x(n+T)^T \hat\theta - e(n+T) \bigr)^2, \quad |e(n+T)| \le \gamma. \qquad (7)$$
(Note that $x(n+T)$ contains values of $y$ up to time $n+T-1$ only, and is postulated to be correct.) What do we do if $y(n+T)$ does not satisfy the above condition?

We face two possibilities. Either we declare the model falsified, that is, the ellipsoid $E(\Theta, \hat\theta)$ should be recomputed using the new measurement (that is, with (7) as an additional constraint). Another possibility is to declare the datum that causes the problem to be an outlier that should be discarded.

To decide upon this issue, we set an outlier threshold level $\eta_{\rm th}$, with $0 < \eta_{\rm th} < 1/2$. Define $\eta(n+T)$ to be the largest $\eta$ such that the condition
$$\frac{1-\eta}{\eta}\, x(n+T)^T \Theta\, x(n+T) \ge \bigl( y(n+T) - x(n+T)^T \hat\theta - e(n+T) \bigr)^2, \quad |e(n+T)| \le \gamma$$
holds for some $e(n+T)$. (Computing $\eta(n+T)$ is a simple exercise left to the reader.) If $\eta(n+T) < \eta_{\rm th}$, we declare the new measurement $y(n+T)$ to be an outlier, and we discard it. Otherwise, we decide that the data falsifies the model and that a new ellipsoid $E(\Theta, \hat\theta)$ should be computed.

If we are to discard the new datum $y(n+T)$, we must replace it with an estimate. Clearly, our best estimate obtains when $\theta(n+T)$ is set to the center of the ellipsoid $E(\Theta, \hat\theta)$, and $e(n+T) = 0$. The estimate is
$$\hat y(n+T) = x(n+T)^T \hat\theta.$$
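Carrying out the "exercise" explicitly: minimizing the right-hand side over $|e(n+T)| \le \gamma$ gives $\eta(n+T) = 1/(1+d^2)$, with $d = \max\bigl(0, |y(n+T) - x(n+T)^T\hat\theta| - \gamma\bigr) / \sqrt{x(n+T)^T \Theta\, x(n+T)}$, exactly the quantity appearing in the bound of Section 2.2. This closed form is our own reading of the exercise; a small helper sketching the test:

```python
import numpy as np

def outlier_index(x_new, y_new, Theta, theta_hat, gamma):
    """Largest eta for which the new datum is consistent: eta = 1/(1 + d^2),
    with d = max(0, |y - x^T theta_hat| - gamma) / sqrt(x^T Theta x).
    (Our closed-form reading of the 'exercise left to the reader'.)"""
    s = np.sqrt(x_new @ Theta @ x_new)   # std. dev. of x^T theta under E(Theta, theta_hat)
    r = max(0.0, abs(y_new - x_new @ theta_hat) - gamma)
    if r == 0.0:
        return 1.0                       # consistent at any confidence level
    if s == 0.0:
        return 0.0                       # flat ellipsoid cannot explain the datum
    return 1.0 / (1.0 + (r / s) ** 2)

def is_outlier(x_new, y_new, Theta, theta_hat, gamma, eta_th):
    """Declare an outlier when the index falls below the threshold eta_th."""
    return outlier_index(x_new, y_new, Theta, theta_hat, gamma) < eta_th
```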
2.4 One-step ahead prediction

Assuming that the parameter vector $\theta(k)$ lies in $E(\Theta, \hat\theta)$ for $k = n, \ldots, n+T-1$, we can compute predicted bounds for the next output $y(n+T)$. The possible values for this output lie in an interval $[y^-(n+T),\, y^+(n+T)]$, where
$$y^{\pm}(n+T) = \hat\theta^T x(n+T) \pm \Bigl( \gamma + \sqrt{x(n+T)^T \Theta\, x(n+T)} \Bigr). \qquad (8)$$
Thus, the one-step-ahead prediction error is $\gamma + \sqrt{x(n+T)^T \Theta\, x(n+T)}$.
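These bounds are immediate to evaluate once $(\Theta, \hat\theta)$ is available; a minimal sketch (names ours):

```python
import numpy as np

def prediction_interval(x_next, Theta, theta_hat, gamma):
    """Robust one-step-ahead bounds (8): center +/- (gamma + sqrt(x^T Theta x))."""
    center = theta_hat @ x_next
    half_width = gamma + np.sqrt(x_next @ Theta @ x_next)
    return center - half_width, center + half_width
```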
If the purpose of the identification is prediction in the above sense, it is clear from the above result that we could modify the objective in the optimization problem (4) in order to optimize the prediction error. This is done by replacing the objective $\mathrm{Tr}\,\Theta$ in the SDP by the squared prediction error $x(n+T)^T \Theta\, x(n+T)$. The resulting problem is still an SDP, with an objective of the form $\mathrm{Tr}\,\Theta W$, where $W = x(n+T)\, x(n+T)^T$.

In our experiments, we have found it better to account for the prediction errors at every step, and to include some forgetting factor. The corresponding objective is of the form $\mathrm{Tr}\,\Theta W$, with
$$W = \sum_{k=0}^{N} \eta^k\, x(n+T-k)\, x(n+T-k)^T,$$
where $\eta > 0$ is the forgetting factor (not to be confused with the confidence level of Section 2.2), and $N < n+T$ is the window length.
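Switching the objective in the sketch of Section 2.1 is then a small change, illustrated below (again our construction): build $W$ from the recent regressors and minimize $\mathrm{Tr}\,\Theta W$ instead of $\mathrm{Tr}\,\Theta$.

```python
import numpy as np
import cvxpy as cp

def weighted_objective(Theta, X_recent, forgetting):
    """Objective Tr(Theta W) with W = sum_k forgetting**k x_k x_k^T,
    where X_recent[k] is the regressor x(n+T-k), k = 0, ..., N.
    Pass the result to cp.Problem in place of cp.Minimize(cp.trace(Theta))."""
    W = sum(forgetting ** k * np.outer(x_k, x_k)
            for k, x_k in enumerate(X_recent))
    return cp.Minimize(cp.trace(Theta @ W))
```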
3. MULTIVARIATE MODELS

The simple ARX model in (1) is a special case of multivariate models of the form
$$y(k+1) = \bigl( A + L\,\Delta(k)\,R \bigr)\, x(k) + e(k), \qquad \|\Delta(k)\|_2 \le 1, \qquad (9)$$
where $x(k) = (y(k-1), \ldots, y(k-n))$ is the state, and $A$, $L$, $R$ are appropriate matrices that depend on the parameters computed by the identification procedure. The above model can be interpreted as one with norm-bounded, unstructured uncertainty (in the sense classical in robust control).

We may generalize the previous approach to arbitrary models of the form above. For simplicity, we set $e(k) = 0$, and seek the "best" model parameters $A$, $L$, $R$ that are consistent with the data values $y(k)$, $x(k)$. Let us introduce the new variables
$$P := LL^T \in \mathbf{R}^{n_y \times n_y}, \qquad Q := R^T R \in \mathbf{R}^{n_x \times n_x}.$$

We would like to minimize the "size" of the set $U := \{ A + L \Delta R \;:\; \|\Delta\| \le 1 \}$, subject to the unfalsification conditions. Note that $U$ can be interpreted as an "ellipsoid" in a matrix space equipped with the largest-singular-value norm. One obvious measure of its size is $\mathrm{Tr}\,P + \mathrm{Tr}\,Q$. By homogeneity, we may without loss of generality set $\mathrm{Tr}\,P = 1$. We obtain the following result.

Theorem 3. An optimal unfalsified model for (9) is obtained by solving the following SDP optimization problem in variables $A$, $P$, $Q$ and $m(k)$:
$$\begin{array}{ll}
\mbox{minimize} & \mathrm{Tr}\,Q \\
\mbox{subject to} & Q \succeq 0, \quad \mathrm{Tr}\,P = 1, \\
& \begin{bmatrix} P & m(k) \\ m(k)^T & x(k)^T Q\, x(k) \end{bmatrix} \succeq 0, \\
& |y(k) - A\,x(k) - m(k)| \le \gamma, \quad n \le k \le n+T-1.
\end{array} \qquad (10)$$
At the optimum, an unfalsified model is given by (9), with $L$ a full-column-rank matrix such that $LL^T = P$, and $R$ a full-row-rank matrix such that $Q = R^T R$. The above result reduces to the basic Theorem 1 when we impose an additional linear constraint on the variable $A$, namely that $A$ should be in upper companion form (the coefficients of which have the interpretation of the variables used in the previous section).
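A cvxpy sketch of (10) follows the same pattern as the scalar case; here the per-sample slack $m(k)$ is a vector, and we interpret the constraint $|\cdot| \le \gamma$ componentwise (our reading; function and variable names are ours):

```python
import numpy as np
import cvxpy as cp

def multivariate_omu(Y, X, gamma):
    """Sketch of the SDP (10). Y[k] (size ny) and X[k] (size nx) are the
    output/state samples; returns A and the factors P = L L^T, Q = R^T R."""
    ny, nx = len(Y[0]), len(X[0])
    A = cp.Variable((ny, nx))
    P = cp.Variable((ny, ny), PSD=True)
    Q = cp.Variable((nx, nx), PSD=True)
    M = [cp.Variable(ny) for _ in Y]             # slack vectors m(k)

    constraints = [cp.trace(P) == 1]
    for k, (y_k, x_k) in enumerate(zip(Y, X)):
        m_k = cp.reshape(M[k], (ny, 1))
        s_k = cp.reshape(x_k @ Q @ x_k, (1, 1))  # x(k)^T Q x(k)
        # LMI of (10): [P, m(k); m(k)^T, x(k)^T Q x(k)] >= 0
        constraints.append(cp.bmat([[P, m_k], [m_k.T, s_k]]) >> 0)
        # |y(k) - A x(k) - m(k)| <= gamma, componentwise
        constraints.append(cp.abs(y_k - A @ x_k - M[k]) <= gamma)

    prob = cp.Problem(cp.Minimize(cp.trace(Q)), constraints)
    prob.solve()
    return A.value, P.value, Q.value
```

The factors $L$ and $R$ can then be recovered from $P$ and $Q$ by any square-root factorization, as stated in Theorem 3.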
4. NUMERICAL EXAMPLES

We have generated data by a time-varying ARX(3,2) process with a prevalence of multiplicative noise over additive noise.

We applied the OMU paradigm to identify an uncertainty model of the form (1) from the input-output observations. We assumed the correct order of the model (i.e., $n = 3$ and $m = 2$) in the identification step, and then computed robust bounds on the one-step-ahead predictions, using the technique described in Section 2.4. The predictions are computed using a new set of validation data, different from the one used for identification. We also compared the results so obtained with classical probabilistic bounds obtained by identifying a standard stochastic ARX(3,2) model on the same data. The results are displayed in Figure 1, where the shaded region represents 98% confidence bounds on the predictions, computed according to the standard stochastic approach.

It is worth noticing that the 98% bounds computed from the stochastic ARX model are actually conservative with respect to the robust ones. This is to be ascribed to the fact that the standard ARX model tries to explain the data using only additive noise, while the observed process has a prevalence of multiplicative noise. A plot of the type of that