Automatica 42 (2006) 589–600. www.elsevier.com/locate/automatica
Least-squares estimation of a class of frequency functions: A finite sample variance expression ☆

Håkan Hjalmarsson a,∗, Brett Ninness b

a Department of Signals, Sensors and Systems, The Royal Institute of Technology, S-100 44 Stockholm, Sweden
b School of Electrical Engineering & Computer Science, University of Newcastle, Australia
Received 23 March 2004; received in revised form 24 November 2005; accepted 19 December 2005
Abstract

A new expression for the variance of scalar frequency functions estimated using the least-squares method is presented. The expression is valid for finite sample size and for a class of model structures, which includes finite impulse response, Laguerre and Kautz models, when the number of estimated parameters coincides with the number of excitation frequencies of the input. The expression gives direct insight into how excitation frequencies and amplitudes affect the accuracy of frequency function estimates. With the help of this expression, a severe sensitivity of the accuracy with respect to the excitation frequencies is exposed. The relevance of the expression when more excitation frequencies are used is also discussed.
© 2006 Elsevier Ltd. All rights reserved.

Keywords: Frequency function; Model variance; FIR models; Laguerre models; Kautz models; Periodic signals
1. Introduction

An essential ingredient in system identification is to quantify the error in the estimated model. It is common to discriminate between the "bias" error, which is due to the model structure being restricted compared to the system being estimated, and the "variance" error, which is due to noise and disturbances in the observed input–output data. There have been significant efforts to quantify both these types of errors, as well as the total model error. The reader is referred to Hjalmarsson (2005), Ninness and Goodwin (1995), Mäkilä, Partington, and Gustafsson (1995), and Milanese and Vicino (1991) for overviews of this topic.

When the true system is in the model set and disturbances can be modeled as stochastic, the total model uncertainty can be
☆ This paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor Antonio Vicino under the direction of Editor Torsten Söderström. This work was completed while the first author visited The School of Electrical Engineering & Computer Science, Newcastle, Australia.
∗ Corresponding author. Tel.: +46 8 790 8464; fax: +46 8 790 7329.
E-mail addresses: [email protected] (H. Hjalmarsson), [email protected] (B. Ninness).
0005-1098/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.automatica.2005.12.021
quantified in terms of the variance error. In many applications, in particular control design, it is important to assess the uncertainty of the associated frequency function estimate.

Let $\hat G_N(e^{j\omega})$ denote a scalar frequency function estimate based on N input–output samples. For model structures where the frequency function is linear in the unknown parameters, i.e. when

$$\hat G_N(e^{j\omega}) = \hat\theta_N^T \Gamma(e^{j\omega}), \tag{1}$$

where $\hat\theta_N \in \mathbb{R}^n$ is the vector of estimated parameters and $\Gamma(e^{j\omega})$ is a known vector-valued frequency function, the variance of the frequency function estimate becomes linear in the covariance matrix of the parameter estimate. For non-linearly parameterized frequency functions, a similar expression applies for large sample sizes. The latter follows from a first-order Taylor approximation of the frequency function (Ljung, 1999). Even though such expressions are useful, e.g., they can be used in experiment design to shape the variance error of the frequency function (see Bombois, Scorletti, Gevers, Hildebrand, & Van den Hof, 2004; Cooley & Lee, 2001; Hildebrand & Gevers, 2003; Jansson & Hjalmarsson, 2005; Jansson, 2004), they do not provide much insight into how different design variables affect the variance error. To this end it is instructive
to introduce

$$\kappa_{n,N}(\omega) \triangleq \operatorname{Var}(\hat G_N(e^{j\omega}))\,\frac{N\,\Phi_u(\omega)}{\Phi_v(\omega)},$$

where $\Phi_u(\omega)$ and $\Phi_v(\omega)$ denote the input and noise spectra, respectively. Thus one can write

$$\operatorname{Var}(\hat G_N(e^{j\omega})) = \kappa_{n,N}(\omega)\,\frac{\Phi_v(\omega)}{N\,\Phi_u(\omega)} \tag{2}$$

whenever $\operatorname{Var}(\hat G_N(e^{j\omega}))$ is well-defined. Under certain assumptions, exact expressions for, or accurate approximations to, $\kappa_{n,N}(\omega)$ exist. In the mid 1980s, the following result was derived (presented here for the case of open loop operation) (Ljung & Yuan, 1985; Ljung, 1985):

$$\lim_{m\to\infty}\lim_{N\to\infty} \frac{N}{m}\operatorname{Var}(\hat G_N(e^{j\omega})) = \frac{\Phi_v(\omega)}{\Phi_u(\omega)}, \tag{3}$$

where m is the model order. Result (3) implies that

$$\kappa_{n,N}(\omega) \approx m \tag{4}$$

for large enough model order m and sample size N. For Box–Jenkins and output-error models, pole-zero cancellations occur when the model order exceeds the underlying true system order. This is a complicating factor in the derivation of (3), which requires the use of regularization techniques. In Ninness and Hjalmarsson (2004a) it is shown that, as the model order tends to infinity, it is the regularization only that determines the variance. In fact, result (3) holds for these types of model structures only when the regularization is towards the origin.

Starting with Ninness, Hjalmarsson, and Gustafsson (1999), there have been a number of contributions towards more exact expressions for $\kappa_{n,N}(\omega)$ for different model structures. The case of a model with fixed denominator and fixed moving average noise model excited by an auto-regressive (AR) input is studied in Xie and Ljung (2001), where an exact expression for $\kappa_n(\omega) \triangleq \lim_{N\to\infty}\kappa_{n,N}(\omega)$ is derived. This expression is thus not asymptotic in the number of parameters. A generalization of this result, including Box–Jenkins models, is presented in Ninness and Hjalmarsson (2004b).

Here this evolution is continued. It will be assumed that the true system is single-input/single-output and given by

$$y(t) = G_o(q)u(t) + e_o(t), \tag{5}$$

where $\{e_o(t)\}$ is a zero mean independent process with time-invariant variance $\lambda_o$. The system dynamics are given by

$$G_o(q) = \theta_o^T \Gamma_n(q) F(q), \tag{6}$$

where $\theta_o \in \mathbb{R}^n$, $F(q)$ is a known stable transfer function and

$$\Gamma_n(q) = [q^{-1}, \ldots, q^{-n}]^T \tag{7}$$

($q^{-1}$ is the time-shift operator: $q^{-1}u(t) = u(t-1)$). By introducing

$$x(t) = F(q)u(t), \tag{8}$$

we may rewrite (5) as

$$y(t) = \theta_o^T \varphi(t) + e_o(t), \tag{9}$$

with $\varphi(t)$ given by

$$\varphi(t) = [x(t-1), x(t-2), \ldots, x(t-n)]^T. \tag{10}$$

It is assumed that the model structure is of the same form, so that its one-step ahead predictor is given by

$$\hat y(t, \theta) = \theta^T \varphi(t), \tag{11}$$
where $\theta \in \mathbb{R}^n$. Notice that finite impulse response (FIR) models correspond to $F(q) = 1$. Furthermore, as noted in Ninness et al. (1999), by taking $F(q) = 1/A(q)$ for a suitably chosen polynomial $A(q)$, the above model structure can parametrize exactly the same model class as orthogonal basis function structures such as Laguerre and Kautz models and their generalizations (Heuberger, Wahlberg, & Van den Hof, 2005; Ninness et al., 1999). Since a one-to-one reparametrization leaves the variance properties of the frequency function estimate intact, the variance results presented below encompass the aforementioned model structures as well. The class of model structures for which the results apply is thus fairly general.

The objective of the paper is to provide some insight into how the input excitation affects the accuracy of the estimate of $G_o$. The contribution is a new expression for $\operatorname{Var}(\hat G_N(e^{j\omega}))$ when the model parameters are estimated using the least-squares method. The expression holds when $x(t)$ is periodic with the period time equal to the sample size N and when the number of estimated parameters is equal to the number of excitation frequencies of the input. Its utility arises from the excitation frequencies and amplitudes appearing in a simple form that makes their impact easy to interpret, which adds to the general understanding of system identification. From the expression we will see that the variance is very sensitive to the choice of excitation frequencies.

Before plunging into details we would like to direct the reader's attention to Campi, Ooi, and Weyer (2004, 2005), where model uncertainty is assessed for models of a similar type as in this paper, but using a completely different approach based on measured data.

The paper unfolds as follows: Fourier representation of signals is briefly reviewed in Section 2, and least-squares estimation is then discussed in Section 3. Some fundamental results on certain quadratic forms are covered in Section 4. The main variance results are presented in Section 5 and their interpretations are discussed in Section 6. Finally, Section 7 contains a summary.
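To make the setup concrete, the following minimal Python sketch (our illustration, not taken from the paper) simulates (5)–(11) in the FIR case $F(q) = 1$, for which $x = u$; the multisine input, the parameter values and the noise seed are illustrative assumptions.

```python
import numpy as np

N, n = 14, 5
# Excitation frequencies on the DFT grid 2*pi*l/N (a conjugate-symmetric set)
omega = 2 * np.pi * np.array([0, 3, 6, 8, 11]) / N

def u_val(t):
    """N-periodic multisine input with unit amplitudes U_k = 1."""
    return np.real(sum(np.exp(1j * w * t) for w in omega))

ts = np.arange(1, N + 1)
# Regressor phi(t) = [x(t-1), ..., x(t-n)]^T from (10); x = u since F(q) = 1
Phi = np.column_stack([u_val(ts - k) for k in range(1, n + 1)])

theta_o = np.ones(n)                          # true parameter vector (illustrative)
rng = np.random.default_rng(0)
y = Phi @ theta_o + rng.standard_normal(N)    # output (9) with lambda_o = 1

# Frequency function theta^T Gamma_n(e^{jw}) F(e^{jw}) with F = 1, cf. (6)
def G(w, theta):
    return theta @ np.exp(-1j * w * np.arange(1, n + 1))

print(G(0.5, theta_o))
```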
2. Discrete Fourier transform representation

Definition 1 (N-periodic signals). Let $x : I \to \mathbb{R}$ with domain of definition $I \subset \mathbb{Z}$ (the set of integers) including $\{0, \ldots, N-1\}$. Then x is said to be N-periodic if

$$x(t + N) = x(t) \tag{12}$$

whenever $t \in I$ and $t + N \in I$.

The discrete Fourier transform (DFT) of an N-periodic signal x is defined as

$$\tilde X_k = \frac{1}{N}\sum_{t=0}^{N-1} x(t)\, e^{-j(2\pi/N)kt}, \quad k = 0, \ldots, N-1, \tag{13}$$

which gives the corresponding inverse DFT representation of x:

$$x(t) = \sum_{k=0}^{N-1} \tilde X_k\, e^{j(2\pi/N)kt}, \quad t \in I. \tag{14}$$

By removing the terms with $\tilde X_k = 0$, we get a more compact description which relates to the following definition.

Definition 2 (Excitation order and spectral decomposition). An N-periodic signal x has excitation order m equal to the number of non-zero DFT elements $\tilde X_k$ defined in (13). The corresponding inverse DFT representation

$$x(t) = \sum_{k=1}^{m} X_k\, e^{j\omega_k t}, \quad t \in I, \tag{15}$$

is called the spectral decomposition of x. Above, $0 \le \omega_1 < \omega_2 < \cdots < \omega_m < 2\pi$, where the $\omega_k$, $k = 1, \ldots, m$, on the grid $2\pi l/N$, $l = 0, \ldots, N-1$, are the (unique) excitation frequencies, and $X_k \in \mathbb{C}$, $k = 1, \ldots, m$, are the corresponding (complex-valued) excitation amplitudes.

Remark 3. Comparing with (14), we see that for a spectral decomposition it holds that, for each $k = 1, \ldots, m$, $\omega_k$ can be written as $\omega_k = 2\pi l(k)/N$ for some $l(k) \in \{0, 1, \ldots, N-1\}$ and that $X_k = \tilde X_{l(k)}$. Furthermore, since we are considering real-valued signals, for each excitation frequency $\omega_k$ there is a corresponding excitation frequency $\omega_{\bar k} = 2\pi - \omega_k$ with $X_{\bar k} = \overline{X_k}$, the complex conjugate of $X_k$.

We now define the covariance function for an N-periodic signal with domain $\{-(n-1), \ldots, N-1\}$, $n > 0$, as

$$r_{xx}(\tau) = \frac{1}{N}\sum_{t=0}^{N-1} x(t)x(t - |\tau|), \quad |\tau| \le n-1. \tag{16}$$

The restriction of the range of the argument is due to the limitation of the domain on which x is defined. Using the spectral decomposition (15), we can express the covariance function as

$$r_{xx}(\tau) = \sum_{k=1}^{m} |X_k|^2 e^{j\omega_k \tau}, \quad |\tau| \le n-1. \tag{17}$$

3. Least-squares estimation

Consider the model structure represented by the predictor (11) with $\varphi$ given by (10). When the parameter $\theta$ is estimated using the least-squares method based on N samples of $y(t)$ and $\varphi(t)$, this results in the estimate

$$\hat\theta_N = \left[\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\right]^{-1} \frac{1}{N}\sum_{t=1}^{N}\varphi(t)y(t). \tag{18}$$

When x, which has domain $\{-(n-1), \ldots, N-1\}$, is N-periodic, it follows that

$$\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\varphi^T(t) = T_n(r_{xx}), \tag{19}$$

where $T_n(r)$ denotes the $n \times n$ Toeplitz matrix with elements $t_{ij} = r(i-j)$. Hence, (18) can in such cases be written as

$$\hat\theta_N = T_n^{-1}(r_{xx})\,\frac{1}{N}\sum_{t=1}^{N}\varphi(t)y(t). \tag{20}$$

The function

$$\hat G_N(e^{j\omega}) = \hat\theta_N^T \Gamma_n(e^{j\omega}) F(e^{j\omega}) \tag{21}$$

is an estimate of the true frequency function $G_o(e^{j\omega})$. The main contribution of this paper is to derive a transparent expression for the variance of $\hat G_N(e^{j\omega})$. The starting point is the following standard lemma.

Lemma 4. Suppose that $\varphi(t)$ is given by (10), where x, with domain $\{-(n-1), \ldots, N-1\}$, is N-periodic, and that the number of estimated parameters n is not larger than the excitation order of $x(t)$. Then the least-squares estimate $\hat\theta_N \in \mathbb{R}^n$, defined by (18), is well defined and has covariance matrix

$$\operatorname{Cov}\{\hat\theta_N\} = \frac{\lambda_o}{N}\, T_n^{-1}(r_{xx}), \tag{22}$$

where $r_{xx}$ is the covariance function (16) of x. Furthermore, the variance of $\hat G_N(e^{j\omega})$, defined in (21), is given by

$$\operatorname{Var}\{\hat G_N(e^{j\omega})\} = \frac{\lambda_o}{N}\,|F(e^{j\omega})|^2\, \Gamma_n^*(e^{j\omega})\, T_n^{-1}(r_{xx})\, \Gamma_n(e^{j\omega}), \tag{23}$$

where $\Gamma_n^*$ denotes the complex conjugate transpose of $\Gamma_n$.

Proof. The conditions on $x(t)$ imply that (19) holds. It is a standard result that $T_n(r_{xx})$ is non-singular when the number of estimated parameters n is no larger than the excitation order. Hence, the least-squares estimate (20), which can be written as

$$\hat\theta_N = \theta_o + T_n^{-1}(r_{xx})\,\frac{1}{N}\sum_{t=1}^{N}\varphi(t)e_o(t), \tag{24}$$

is well-defined.
Secondly, the covariance matrix of $\hat\theta_N$ is obtained by forming $(\hat\theta_N - \theta_o)(\hat\theta_N - \theta_o)^T$ using (24) and then taking the expectation. By noticing that the regressor vector $\varphi(t)$ is deterministic, this expression simplifies to (22).

Finally, since $\hat G_N$ is a linear function of $\hat\theta_N$, (23) is straightforward. □

Formula (23) expresses exactly how model order, input excitation, noise variance and sample size all influence the model uncertainty, as expressed by the variance of the frequency function estimate. However, exactly how the excitation frequencies and amplitudes affect the variance is not obvious from this expression. The key to a more transparent variance expression for the frequency function estimate is to get some insight into the quadratic form

$$\Gamma_n^*(e^{j\omega})\, T_n^{-1}(r_{xx})\, \Gamma_n(e^{j\omega}). \tag{25}$$
4. Some results on a quadratic form

It is very easy to rewrite the quadratic form (25) in a simpler form when the following two assumptions are valid.

A.1. x has domain of definition $\{-(n-1), \ldots, N-1\}$ and is N-periodic with the excitation order m equal to n, the dimension of $\Gamma_n$.

A.2. The excitation frequencies $\omega_k$ of x are evenly distributed on the interval $[0, 2\pi)$, i.e. $\omega_k = 2\pi(k-1)/m$, $k = 1, \ldots, m$.

Notice that Assumption A.2 requires that the sample size N is a multiple of the excitation order. In Lee (2003) it is claimed (cf. Eq. (22) in Lee, 2003), without proof, that when Assumptions A.1 and A.2 both hold, then (25) evaluated at any of the excitation frequencies, i.e. $\omega = \omega_k$ for some $k \in \{1, \ldots, m\}$, can be written as

$$\Gamma_n^*(e^{j\omega_k})\, T_n^{-1}(r_{xx})\, \Gamma_n(e^{j\omega_k}) = \frac{n^2}{\Gamma_n^*(e^{j\omega_k})\, T_n(r_{xx})\, \Gamma_n(e^{j\omega_k})}. \tag{26}$$

Below we will show that this is true, but that in fact more can be said about the quadratic form (25) under Assumptions A.1 and A.2. First, notice that using (17) we can express $T_n(r_{xx})$ as

$$T_n(r_{xx}) = \sum_{k=1}^{n} |X_k|^2\, \Gamma_n(e^{j\omega_k})\, \Gamma_n^*(e^{j\omega_k}). \tag{27}$$

Now with $\omega_k = 2\pi(k-1)/n$, $k = 1, \ldots, n$, it follows that $\Gamma_n(e^{j\omega_k})$, $k = 1, \ldots, n$, are orthogonal vectors, and hence (27) is an orthogonal decomposition of $T_n(r_{xx})$. Therefore,

$$T_n^{-1}(r_{xx}) = \frac{1}{n^2}\sum_{k=1}^{n} \frac{1}{|X_k|^2}\, \Gamma_n(e^{j\omega_k})\, \Gamma_n^*(e^{j\omega_k}), \tag{28}$$

where the factor $1/n^2$ arises because $\|\Gamma_n(e^{j\omega_k})\|^2 = n$. Thus it holds for any frequency $\omega$ that

$$\Gamma_n^*(e^{j\omega})\, T_n^{-1}(r_{xx})\, \Gamma_n(e^{j\omega}) = \frac{1}{n^2}\sum_{k=1}^{n} \frac{1}{|X_k|^2}\, |\Gamma_n^*(e^{j\omega})\Gamma_n(e^{j\omega_k})|^2 = \begin{cases} \dfrac{1}{|X_k|^2}, & \omega = \omega_k, \\[2mm] \dfrac{1}{n^2}\displaystyle\sum_{k=1}^{n} \dfrac{1}{|X_k|^2}\left|\dfrac{e^{j\omega n} - 1}{e^{j\omega} - e^{j\omega_k}}\right|^2 & \text{otherwise.} \end{cases} \tag{29}$$
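Expression (29) can be checked numerically; in the sketch below (our illustration) the amplitudes are arbitrary choices, equal on conjugate frequency pairs so that x is real.

```python
import numpy as np

n = 4
omega = 2 * np.pi * np.arange(n) / n          # evenly spaced frequencies (A.2)
Xk = np.array([1.0, 0.5, 0.2, 0.5])           # |X_k|, equal on conjugate pairs

r = np.array([np.sum(Xk**2 * np.exp(1j * omega * tau)).real for tau in range(n)])
Tn = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])

def lhs(w):                                    # Gamma* T_n^{-1} Gamma directly
    g = np.exp(-1j * w * np.arange(1, n + 1))
    return (g.conj() @ np.linalg.solve(Tn, g)).real

def rhs(w):                                    # right-hand side of (29)
    d = np.mod(w - omega, 2 * np.pi)
    if np.any(np.isclose(d, 0) | np.isclose(d, 2 * np.pi)):
        k = int(np.argmin(np.minimum(d, 2 * np.pi - d)))
        return 1.0 / Xk[k]**2
    z = np.abs((np.exp(1j * w * n) - 1) / (np.exp(1j * w) - np.exp(1j * omega)))
    return float(np.sum(z**2 / Xk**2)) / n**2

for w in [0.3, 1.1, omega[1]]:
    print(lhs(w), rhs(w))                      # the two columns agree
```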
Under Assumptions A.1 and A.2, there is thus a very simple expression that relates the excitation amplitudes $X_k$ to the quadratic form (25). Replacing $T_n(r_{xx})$ on the right-hand side of (26) by (27) and using the orthogonality of $\Gamma_n(e^{j\omega_k})$, $k = 1, \ldots, n$, shows that (26) equals the more general expression (29) at the excitation frequencies $\omega = \omega_k$, $k = 1, \ldots, n$.

Inserting (29) in (23) gives us a variance expression for $\hat G_N(e^{j\omega})$ which explicitly shows how the excitation amplitudes $X_k$ influence the variance of the frequency function estimate when the excitation frequencies are evenly distributed over the whole frequency axis, i.e. $\omega_k = 2\pi(k-1)/n$, $k = 1, \ldots, n$.

In general, the excitation frequencies $\omega_k$, $k = 1, \ldots, n$, can be chosen on the grid $2\pi l/N$, $l = 0, \ldots, N-1$ (subject to the condition in Remark 3), and it is, of course, also of interest to understand how this choice influences the variance of the frequency function estimate. However, for a general selection of $\{\omega_k\}_{k=1}^n$ on this grid it no longer holds that $\Gamma_n(e^{j\omega_k})$, $k = 1, \ldots, n$, are orthogonal. This implies that (28) is no longer valid and the derivation (29) breaks down. A main technical contribution of this paper is Lemma 6 below, which extends the above analysis to this case. This will provide a tool for understanding the impact of the excitation frequencies on the accuracy of the frequency function estimate. As a prerequisite we need the next lemma.

Lemma 5 (Corollary 5.2 in Ninness and Hjalmarsson, 2004b). Suppose that

$$R = \frac{1}{2\pi}\int_{-\pi}^{\pi} \Gamma_n(e^{j\omega})\, \Gamma_n^*(e^{j\omega})\, |Q(e^{j\omega})|^2\, d\omega, \tag{30}$$

where $\Gamma_n$ is defined in (7) and where $Q(z)$ has the AR($\bar m$) (Auto-Regressive of order $\bar m$) form

$$Q(z) = \frac{z}{L(z)}, \quad L(z) = \prod_{k=0}^{\bar m - 1}(z - \xi_k), \quad |\xi_k| < 1. \tag{31}$$

Then, with $\tilde m = n - \bar m \ge 0$:

$$\Gamma_n^*(e^{j\omega})\, R^{-1}\, \Gamma_n(e^{j\omega}) = \frac{1}{|Q(e^{j\omega})|^2}\left[\tilde m + \sum_{k=0}^{\bar m - 1} \frac{1 - |\xi_k|^2}{|e^{j\omega} - \xi_k|^2}\right]. \tag{32}$$
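As a sanity check of Lemma 5, the following sketch (our illustration, with arbitrarily chosen poles $\xi_k$ forming a conjugate pair) computes R in (30) by numerical quadrature and compares both sides of (32).

```python
import numpy as np

n, mbar = 5, 2                                   # Gamma_n dimension, AR order
xi = np.array([0.7 * np.exp(1j * 1.0), 0.7 * np.exp(-1j * 1.0)])

def Q2(w):                                       # |Q(e^{jw})|^2 for Q(z) = z/L(z)
    return 1.0 / np.prod(np.abs(np.exp(1j * w) - xi)**2)

# R from (30) by the trapezoidal rule (very accurate here, since the
# integrand is smooth and periodic)
wgrid = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
R = np.zeros((n, n), dtype=complex)
for w in wgrid:
    g = np.exp(-1j * w * np.arange(1, n + 1))
    R += np.outer(g, g.conj()) * Q2(w)
R /= wgrid.size

w = 0.4                                          # test frequency
g = np.exp(-1j * w * np.arange(1, n + 1))
lhs = (g.conj() @ np.linalg.solve(R, g)).real
mtilde = n - mbar
rhs = (mtilde + np.sum((1 - np.abs(xi)**2)
                       / np.abs(np.exp(1j * w) - xi)**2)) / Q2(w)
print(lhs, rhs)                                  # should agree
```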
Lemma 6. Let Assumption A.1 be in force and let x have the spectral decomposition (15).
Then

$$\Gamma_n^*(e^{j\omega})\, T_n^{-1}(r_{xx})\, \Gamma_n(e^{j\omega}) = \sum_{k=1}^{n} \frac{1}{|X_k|^2}\, v_k(\omega), \tag{33}$$

where $r_{xx}$ is the covariance function of x, $\Gamma_n$ is defined by (7), and

$$v_k(\omega) = \prod_{l=1,\, l \ne k}^{m} \left|\frac{e^{j\omega} - e^{j\omega_l}}{e^{j\omega_k} - e^{j\omega_l}}\right|^2, \quad k = 1, \ldots, m. \tag{34}$$

Proof. See Appendix A. □
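The identity (33)–(34) is readily verified numerically for a non-uniform selection of grid frequencies; in the sketch below (our illustration) the selection and the amplitudes are arbitrary, conjugate-symmetric choices so that x is real.

```python
import numpy as np

N, n = 14, 5
grid = np.array([0, 3, 6, 8, 11])             # non-uniform selection of 2*pi*l/N
omega = 2 * np.pi * grid / N
Xk = np.array([1.0, 0.8, 0.3, 0.3, 0.8])      # |X_k|, equal on conjugate pairs

r = np.array([np.sum(Xk**2 * np.exp(1j * omega * tau)).real for tau in range(n)])
Tn = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])

def vk(w, k):                                  # interpolating function (34)
    num = np.abs(np.exp(1j * w) - np.exp(1j * np.delete(omega, k)))
    den = np.abs(np.exp(1j * omega[k]) - np.exp(1j * np.delete(omega, k)))
    return np.prod((num / den)**2)

for w in [0.25, 1.7, omega[2]]:
    g = np.exp(-1j * w * np.arange(1, n + 1))
    direct = (g.conj() @ np.linalg.solve(Tn, g)).real
    via_33 = sum(vk(w, k) / Xk[k]**2 for k in range(n))
    print(direct, via_33)                      # equal, as (33) asserts
```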
Notice that when $\omega_k = 2\pi(k-1)/n$, $k = 1, \ldots, n$ (i.e. when, in addition, Assumption A.2 is valid), it holds that

$$\prod_{l=1,\, l \ne k}^{n} (e^{j\omega_k} - e^{j\omega_l}) = \lim_{z \to e^{j\omega_k}} \prod_{l=1,\, l \ne k}^{n} (z - e^{j\omega_l}) = \lim_{z \to e^{j\omega_k}} \frac{z^n - 1}{z - e^{j\omega_k}} = n e^{-j\omega_k},$$

$$\prod_{l=1,\, l \ne k}^{n} (e^{j\omega} - e^{j\omega_l}) = \frac{\prod_{l=1}^{n} (e^{j\omega} - e^{j\omega_l})}{e^{j\omega} - e^{j\omega_k}} = \frac{e^{j\omega n} - 1}{e^{j\omega} - e^{j\omega_k}},$$

and hence (33) is identical to (29) in this case.

5. Main results

We will now use Lemmas 6 and 4 to express the variance of $\hat G_N(e^{j\omega})$ when the excitation frequencies of x in (10) are chosen arbitrarily on the grid $2\pi l/N$, $l = 0, \ldots, N-1$.

Theorem 7. Let $\varphi(t)$ and $\hat\theta_N$ be defined by (10) and (18), respectively, with $x(t)$ defined in (8) satisfying Assumption A.1, so that x has the same excitation order as the number of estimated parameters. Let

$$x(t) = \sum_{k=1}^{m} X_k e^{j\omega_k t}, \quad t = -(n-1), \ldots, N-1,$$

be the spectral decomposition of x. Then the variance of the frequency function estimate (21) is given by

$$\operatorname{Var}\{\hat G_N(e^{j\omega})\} = \frac{\lambda_o}{N}\,|F(e^{j\omega})|^2 \sum_{k=1}^{m} \frac{1}{|X_k|^2}\, v_k(\omega), \tag{35}$$

with $v_k$ defined in (34).

Proof. The variance of $\hat G_N$ is given by (23) in Lemma 4. However, Lemma 6 establishes that

$$\Gamma_n^*(e^{j\omega})\, T_n^{-1}(r_{xx})\, \Gamma_n(e^{j\omega}) = \sum_{k=1}^{m} \frac{1}{|X_k|^2}\, v_k(\omega)$$

under Assumption A.1, which when inserted in (23) proves the theorem. □

For full generality, Theorem 7 is stated in terms of the signal $x(t) = F(q)u(t)$ rather than for the input signal $u(t)$. When $F(q) = 1$, i.e. for finite impulse response (FIR) models, the condition on x is, of course, exactly the condition on the input. In this case we notice that, for Assumption A.1 to be valid, the input has to be tapered, i.e. $u(t) = 0$ for $t = N-n+1, \ldots, N$, for a system that is initially at rest when the data collection starts.

In particular, Theorem 7 applies when the input is N-periodic with excitation order n and when the transient in F has died out. The following corollary gives the variance of $\hat G_N(e^{j\omega})$ expressed in terms of the spectral decomposition of the input in this case.

Corollary 8 (Periodic input). Let $\varphi(t)$ and $\hat\theta_N$ be defined by (10) and (18), respectively. Let $u(t)$, $t < N$, have period N, excitation order equal to the model order n, and spectral decomposition

$$u(t) = \sum_{k=1}^{n} U_k e^{j\omega_k t}, \quad t \le N-1. \tag{36}$$

Then the variance of the frequency function estimate (21) is given by

$$\operatorname{Var}\{\hat G_N(e^{j\omega})\} = \frac{\lambda_o}{N}\,|F(e^{j\omega})|^2 \sum_{k=1}^{n} \frac{1}{|F(e^{j\omega_k})U_k|^2}\, v_k(\omega), \tag{37}$$

where $v_k$ is defined by (34).

Proof. The periodicity of $u(t)$ implies that the signal $x(t) = F(q)u(t)$, $t = -(n-1), \ldots, N-1$, is N-periodic with spectral decomposition

$$x(t) = \sum_{k=1}^{n} F(e^{j\omega_k})\, U_k\, e^{j\omega_k t}, \quad t = -(n-1), \ldots, N-1.$$

Hence Assumption A.1 is valid for x and Theorem 7 is applicable with $X_k = F(e^{j\omega_k})U_k$. Hence (35) proves (37). □

We now make the following remarks in relation to Corollary 8:

Remark 9. The functions $v_k(\omega)$ are interpolating functions in the sense that

$$v_k(\omega_l) = \begin{cases} 1, & l = k, \\ 0, & l \ne k, \end{cases} \quad 1 \le l, k \le m. \tag{38}$$

Hence, under the conditions of Corollary 8,

$$\operatorname{Var}\{\hat G_N(e^{j\omega_k})\} = \frac{\lambda_o}{N |U_k|^2}. \tag{39}$$

Thus at the excitation frequencies, the variance equals the noise-to-signal ratio, regardless of the properties of the known filter F.

Remark 10. The filter F does influence the variance at frequencies other than the excitation frequencies, even though this filter is known, cf. (37).

Remark 11. When the input has been zero before the data collection, there will be a transient effect that dies out
exponentially fast if the transfer function F is rational. Thus, in this case (37) will only hold approximately.

Remark 12. Expressions (35) and (37) can be used for input design, i.e. for optimizing the excitation frequencies and amplitudes with respect to some performance objective. However, there are more general input design methods available that are based directly on (23), and which therefore apply to an arbitrary number of excitation frequencies (Jansson, 2004).

5.1. Asymptotic analysis

The period time of the input is assumed to be N in Corollary 8. It may, of course, also be less than N, provided that N is an integer multiple of the period time, since then (36) is still valid. However, when the sample size N is not an integer multiple of the period time, there will be a leakage effect such that (37) applies only approximately. Nevertheless, for fixed period time, (37) will become valid as the sample size grows to infinity.

Corollary 13. Let $\varphi(t)$ and $\hat\theta_N$ be defined by (10) and (18), respectively. Let the input signal $u(t)$ have Fourier series expansion

$$u(t) = \sum_{k=1}^{n} U_k e^{j\omega_k t}, \tag{40}$$

where $U_k \in \mathbb{C}$, $U_k \ne 0$, $k = 1, \ldots, n$, for some frequencies $\omega_k$, $k = 1, \ldots, n$ (not necessarily on the grid $2\pi l/N$, $l = 0, \ldots, N-1$). Then

$$\lim_{N\to\infty} N \cdot \operatorname{Var}\{\hat G_N(e^{j\omega})\} = \lambda_o\, |F(e^{j\omega})|^2 \sum_{k=1}^{n} \frac{1}{|F(e^{j\omega_k})U_k|^2}\, v_k(\omega), \tag{41}$$

where $v_k$ is defined by (34).

Proof. It holds that (Ljung, 1999)

$$\lim_{N\to\infty} N \cdot E[(\hat\theta_N - \theta_o)(\hat\theta_N - \theta_o)^T] = \lambda_o\, T_n^{-1}(r_{xx}), \tag{42}$$

where now

$$r_{xx}(\tau) = \lim_{N\to\infty} \frac{1}{N}\sum_{t=0}^{N-1} x(t)x(t-\tau) = \sum_{k=1}^{n} |X_k|^2 e^{j\omega_k \tau}, \quad X_k = F(e^{j\omega_k})U_k.$$

This is the same expression as (17). The result now follows in exactly the same manner as for Corollary 8. □

Corollary 13 is a non-trivial extension of the asymptotic (in sample size) results presented in Ninness and Hjalmarsson (2004b).

5.2. Upper bounding the variance for an arbitrary number of excitation frequencies

For a given excitation order of the input, expression (37) is valid when the maximum number of parameters (i.e. equal to the excitation order) is estimated. If more parameters are added, the least-squares problem becomes singular. On the other hand, the right-hand side of (37) can also be used as an upper bound for the variance when the number of excitation frequencies is larger than the model order.

Lemma 14. Let n and $\bar m$ be positive integers such that $n \le \bar m$. Let $\varphi(t)$ and $\hat\theta_N$ be defined by (10) and (18), respectively, with model order n, and with $x(t)$, defined in (8), having domain of definition $\{-(n-1), \ldots, N-1\}$ and being N-periodic with spectral decomposition

$$x(t) = \sum_{k=1}^{\bar m} \bar X_k\, e^{j\bar\omega_k t}, \quad t = -(n-1), \ldots, N-1.$$

Then the variance of the frequency function estimate (21) is bounded by

$$\operatorname{Var}\{\hat G_N(e^{j\omega})\} \le \frac{\lambda_o}{N}\,|F(e^{j\omega})|^2 \sum_{k=1}^{m} \frac{1}{|X_k|^2}\, v_k(\omega) \tag{43}$$

for any $m = n, n+1, \ldots, \bar m$, with $v_k$ defined in (34), with $\{\omega_k\}_{k=1}^{m}$ being any selection of the $\{\bar\omega_k\}_{k=1}^{\bar m}$ (subject to the conditions in Remark 3) and with $\{X_k\}_{k=1}^{m}$ being the excitation amplitudes of x corresponding to the $\{\omega_k\}_{k=1}^{m}$.

Proof. First notice that adding parameters in $\theta$ will increase the variance of the frequency function estimate (21). This can be seen as follows: the decomposition

$$T_{n+1}(r_{xx}) = \begin{bmatrix} T_n(r_{xx}) & w \\ w^T & r_{xx}(0) \end{bmatrix},$$

where $w = [r_{xx}(n), \ldots, r_{xx}(1)]^T$, gives

$$T_{n+1}^{-1}(r_{xx}) = \begin{bmatrix} T_n^{-1}(r_{xx}) + \dfrac{T_n^{-1}(r_{xx})\,w w^T\, T_n^{-1}(r_{xx})}{r_{xx}(0) - w^T T_n^{-1}(r_{xx})w} & -\dfrac{T_n^{-1}(r_{xx})\,w}{r_{xx}(0) - w^T T_n^{-1}(r_{xx})w} \\[3mm] -\dfrac{w^T T_n^{-1}(r_{xx})}{r_{xx}(0) - w^T T_n^{-1}(r_{xx})w} & \dfrac{1}{r_{xx}(0) - w^T T_n^{-1}(r_{xx})w} \end{bmatrix},$$

where $r_{xx}(0) - w^T T_n^{-1}(r_{xx})w > 0$ (see, for example, Kailath, 1980). Hence

$$\Gamma_{n+1}^*(e^{j\omega})\, T_{n+1}^{-1}(r_{xx})\, \Gamma_{n+1}(e^{j\omega}) = \Gamma_n^*(e^{j\omega})\, T_n^{-1}(r_{xx})\, \Gamma_n(e^{j\omega}) + \frac{|\Gamma_n^*(e^{j\omega})\, T_n^{-1}(r_{xx})\, w - e^{j(n+1)\omega}|^2}{r_{xx}(0) - w^T T_n^{-1}(r_{xx})w} \ge \Gamma_n^*(e^{j\omega})\, T_n^{-1}(r_{xx})\, \Gamma_n(e^{j\omega}).$$
In view of (23), this implies that

$$\operatorname{Var}\{\hat G_N^{n}(e^{j\omega})\} \le \operatorname{Var}\{\hat G_N^{n+1}(e^{j\omega})\}, \tag{44}$$

where we have used a superscript to denote the model order, whenever x is N-periodic with appropriate domain of definition.

Now, notice that removing excitation frequencies will increase the variance of the frequency function estimate (21). This is seen from (27): removing excitation frequencies will make $T_n$ smaller (in the positive semidefinite ordering) and hence its inverse larger. This implies that (23) increases.

Now, consider the estimate $\hat G_N^{n}(e^{j\omega})$ and an arbitrary m in the interval $n \le m \le \bar m$. Then by (44), the variance of $\hat G_N^{n}(e^{j\omega})$ is no greater than the variance of $\hat G_N^{m}(e^{j\omega})$. In turn, the variance of the latter is no greater than the variance when only m out of the $\bar m$ excitation frequencies are present. The variance of this estimate is given by the right-hand side of (43). This completes the proof. □

6. Discussion and illustration of results

Below we will illustrate the results of the previous section and discuss the insights they provide regarding how the input excitation influences the model accuracy.
Fig. 1. Variance $\operatorname{Var}\{\hat G_N(e^{j\omega})\}$ in Example 1 with evenly distributed excitation frequencies. Solid line: sample variance for 10,000 Monte Carlo runs. Dashed line: theoretical expression (35). Crosses correspond to the excitation frequencies of the input. (Axes: variance 0–0.1 versus frequency 0–3 rad.)
6.1. The impact of excitation amplitudes

Example 1. A FIR system with $\theta_o = [1, 1, 1, 1, 1]^T$ and noise variance $\lambda_o = 1$ is identified. Fig. 1 shows the variance when the input contains the excitation frequencies $\{0,\ 3 \cdot 2\pi/N,\ 6 \cdot 2\pi/N,\ 8 \cdot 2\pi/N,\ 11 \cdot 2\pi/N\}$, each with amplitude $U_k = 1$. The data length is $N = 14$. The result of Monte Carlo simulations and the theoretical expression (35) are shown. The agreement between the sample variability and the theoretical expression is very good. Notice in particular that at the excitation frequencies the variance is approximately 0.071, which agrees well with the value $\lambda_o/(N|U_k|^2) = 1/14$ obtained from (39).
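The dashed theoretical curve of Fig. 1 can be computed directly from (35); a sketch with the values stated in the example ($|F| = 1$, so $X_k = U_k$):

```python
import numpy as np

N, n, lam_o = 14, 5, 1.0
omega = 2 * np.pi * np.array([0, 3, 6, 8, 11]) / N
Uk = np.ones(n)                                  # unit amplitudes

def vk(w, k):                                    # interpolating function (34)
    num = np.abs(np.exp(1j * w) - np.exp(1j * np.delete(omega, k)))
    den = np.abs(np.exp(1j * omega[k]) - np.exp(1j * np.delete(omega, k)))
    return np.prod((num / den)**2)

def var_G(w):                                    # expression (35) with |F| = 1
    return lam_o / N * sum(vk(w, k) / Uk[k]**2 for k in range(n))

print(var_G(omega[1]), 1 / 14)                   # on-grid value 1/14, cf. (39)
print(var_G(1.0))                                # an off-grid frequency
```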
Notice that, since each term in (37) is positive, it is necessary that all amplitudes $|F(e^{j\omega_k})U_k|$ are large for the variance to be small at any frequency which is not an excitation frequency. This is due to the fact that Theorem 7 is valid when the number of excitation frequencies is as small as possible compared to the number of estimated parameters. Hence, if the power at any of these frequencies becomes small, the identification problem becomes close to singular and the variance blows up, except at those excitation frequencies (of the input) where there is sufficient excitation, cf. (39). We illustrate this by modifying Example 1.

Example 2 (Example 1 cont'd). Let the system and input excitation be as in Example 1, but let us decrease the input amplitudes at $\omega = 6 \cdot 2\pi/N$ and $\omega = 8 \cdot 2\pi/N$ from 1 to 0.01. Fig. 2 shows the variance in this case. Comparing with Fig. 1, it can be seen that the variance has increased significantly, not
Fig. 2. Variance $\operatorname{Var}\{\hat G_N(e^{j\omega})\}$ in Example 2 with evenly distributed excitation frequencies. Solid line: sample variance for 10,000 Monte Carlo runs. Dashed line: theoretical expression (35). Crosses correspond to the excitation frequencies of the input. (Axes: variance 0–900 versus frequency 0–3 rad.)
only close to $\omega = 6 \cdot 2\pi/N$, but also in between the excitation frequencies with low amplitudes. At the excitation frequencies, the variance is still $\lambda_o/(N|U_k|^2)$.
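Example 2 corresponds to changing two amplitudes in the previous sketch; self-contained for convenience:

```python
import numpy as np

N, n, lam_o = 14, 5, 1.0
omega = 2 * np.pi * np.array([0, 3, 6, 8, 11]) / N
Uk = np.array([1.0, 1.0, 0.01, 0.01, 1.0])       # small power at 6,8 * 2pi/N

def vk(w, k):
    num = np.abs(np.exp(1j * w) - np.exp(1j * np.delete(omega, k)))
    den = np.abs(np.exp(1j * omega[k]) - np.exp(1j * np.delete(omega, k)))
    return np.prod((num / den)**2)

def var_G(w):                                     # expression (35), |F| = 1
    return lam_o / N * sum(vk(w, k) / Uk[k]**2 for k in range(n))

print(var_G(omega[2]), lam_o / (N * Uk[2]**2))    # on-grid: still (39)
print(var_G(1.0))                                 # off-grid: blows up
```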
6.2. The impact of excitation frequencies

Suppose that $n < N/2$ and that the excitation frequencies are clustered in one frequency region, at a distance $2\pi/N$ apart. Now take $\omega$ to be the point furthest away from the set of excitation
Fig. 3. Variance $\operatorname{Var}\{\hat G_N(e^{j\omega})\}$ in Example 3 with excitation frequencies concentrated at low frequency. Solid line: sample variance for 10,000 Monte Carlo runs. Dashed line: theoretical expression (35). Crosses correspond to the excitation frequencies of the input. (Axes: variance 0–1200 versus frequency 0–3 rad.)
frequencies. The distance will be at least $\sqrt 2$. Furthermore, the largest distance between two excitation frequencies will be $2\pi m/N$. This implies that

$$v_k(\omega) = \prod_{l=1,\, l \ne k}^{m} \left|\frac{e^{j\omega} - e^{j\omega_l}}{e^{j\omega_k} - e^{j\omega_l}}\right|^2 \ge \left(\frac{\sqrt 2}{2\pi m/N}\right)^{2(m-1)} = \left(\frac{N}{\sqrt 2\,\pi m}\right)^{2(m-1)}$$

when $m/N \ll 1$. Thus we see that the magnitudes of $v_k(\omega)$, $k = 1, \ldots, m$, can in this case be very large. This indicates that the variance of FIR models is very sensitive to the distribution of the excitation frequencies in the input. The underlying reason is that the covariance matrix (22) becomes very poorly conditioned when the excitation is not distributed over the whole frequency axis.

Example 3 (Example 1 cont'd). Let the system be as in Example 1, but let us now change the excitation frequencies to the set

$$\left\{0,\ \frac{2\pi}{N},\ 2\cdot\frac{2\pi}{N},\ (N-2)\cdot\frac{2\pi}{N},\ (N-1)\cdot\frac{2\pi}{N}\right\}, \tag{45}$$

all with unit amplitude. This excitation is more concentrated at low frequencies than the one in Example 1. Fig. 3 shows the variance of the frequency function estimate. Comparing with Fig. 1, we see that there is now a much higher variance at high frequencies. The discussion preceding the example suggests that the more concentrated the cluster of frequencies, the higher the variance will be. To illustrate this we increase the sample size to $N = 28$ but retain the choice (45), which implies a smaller frequency spacing. Fig. 4 shows the variance in this case. Comparing with
Fig. 4. Variance $\operatorname{Var}\{\hat G_N(e^{j\omega})\}$ in Example 3 with excitation frequencies highly concentrated at low frequency. Solid line: sample variance for 10,000 Monte Carlo runs. Dashed line: theoretical expression (35). Crosses correspond to the excitation frequencies of the input. (Axes: variance 0–2×10⁵ versus frequency 0–3 rad.)
Figs. 1 and 3, we see a dramatic increase of the variance at high frequencies.

In relation to Example 3 we point out that the sample size may influence the variance of the frequency function estimate in two ways: (1) for fixed excitation frequencies, the variance decreases as $1/N$, as exposed by (35); (2) if the frequencies are fixed to have regular spacing, then the excitation frequencies are given by $k \cdot 2\pi/N$, $k = 0, \ldots, N-1$, which has a further effect on the variance via the $v_k(\omega)$ terms in (35) and related expressions.

6.3. The impact of the known transfer function

As noted in Remarks 9 and 10 in Section 5, the known transfer function $F(q)$, which is part of the system dynamics (6), influences the variance, except at the excitation frequencies. In the next example we illustrate this.

Example 4 (Example 1 cont'd). Consider the same system and excitation as in Example 1, save that in addition the true system contains the known resonance $F(q) = (1 - 1.131q^{-1} + 0.64q^{-2})^{-1}$. This corresponds to a Kautz model (Wahlberg, 1994). Fig. 5 shows the variance of the frequency function estimate in this case. Comparing with Fig. 1, we see that the variance is the same as before (in Example 1) at the excitation frequencies of the input, whereas between those frequencies the filter F induces a significant variance increase around the resonance of F itself.
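A sketch of (37) with the resonant F of Example 4 (frequencies and amplitudes as in Example 1) illustrates Remarks 9 and 10 numerically:

```python
import numpy as np

N, n, lam_o = 14, 5, 1.0
omega = 2 * np.pi * np.array([0, 3, 6, 8, 11]) / N
Uk = np.ones(n)

def F(w):                    # the known resonance of Example 4
    z = np.exp(-1j * w)      # z stands for e^{-jw}: F = 1/(1 - 1.131 z + 0.64 z^2)
    return 1.0 / (1.0 - 1.131 * z + 0.64 * z**2)

def vk(w, k):
    num = np.abs(np.exp(1j * w) - np.exp(1j * np.delete(omega, k)))
    den = np.abs(np.exp(1j * omega[k]) - np.exp(1j * np.delete(omega, k)))
    return np.prod((num / den)**2)

def var_G(w):                # expression (37)
    s = sum(vk(w, k) / np.abs(F(omega[k]) * Uk[k])**2 for k in range(n))
    return lam_o / N * np.abs(F(w))**2 * s

print(var_G(omega[1]), lam_o / (N * np.abs(Uk[1])**2))  # equal, per Remark 9
print(var_G(1.0))            # between grid points F shapes the variance
```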
Fig. 5. Variance $\operatorname{Var}\{\hat G_N(e^{j\omega})\}$ in Example 4. Solid line: sample variance for 10,000 Monte Carlo runs. Dashed line (largely hidden behind the solid line): theoretical expression (37). Crosses correspond to the excitation frequencies of the input. Dash-dotted line: $|F(e^{j\omega})|$. (Axes: variance 0–2000 versus frequency 0–3 rad.)
Fig. 6. Variance $\operatorname{Var}\{\hat G_N(e^{j\omega})\}$ in Example 5. Solid line: sample variance for 10,000 Monte Carlo runs. Dashed line: upper bound (43). Crosses correspond to the excitation frequencies of the input. Circles correspond to the frequencies used when computing (43). Expression (39) is used when marking the crosses and circles. (Axes: variance 0–4 versus frequency 0–3 rad.)
6.4. Arbitrary number of excitation frequencies

We will now discuss the relevance of (35) and (37) when there are more excitation frequencies m than the number of estimated parameters n. Notice that the variance depends continuously on the excitation amplitudes, except at singularities. Hence, the true variance changes continuously into (35) as all but n excitation amplitudes are reduced to zero. This suggests that (35) and (37) can give qualitative information also when $m - n$ is moderate. The bounds in Lemma 14 provide further support for this notion. We illustrate this with a numerical sketch of the continuity argument below, followed by a worked example.
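In the sketch below (our illustration, FIR case; all numbers are illustrative choices), the exact variance (23) with m = 7 excitation frequencies approaches the value given by (35) for the n = 5 retained frequencies as the two extra amplitudes are faded out.

```python
import numpy as np

N, n, lam_o = 28, 5, 1.0
keep = 2 * np.pi * np.array([0, 4, 8, 20, 24]) / N    # n retained frequencies
extra = 2 * np.pi * np.array([12, 16]) / N            # conjugate pair to fade out

def var_exact(w, eps):       # exact variance (23) with m = 7 frequencies
    om = np.concatenate([keep, extra])
    amp2 = np.concatenate([np.ones(keep.size), (eps**2) * np.ones(extra.size)])
    r = np.array([np.sum(amp2 * np.exp(1j * om * tau)).real for tau in range(n)])
    Tn = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])
    g = np.exp(-1j * w * np.arange(1, n + 1))
    return lam_o / N * (g.conj() @ np.linalg.solve(Tn, g)).real

def var_35(w):               # expression (35) using only the retained set
    tot = 0.0
    for k in range(keep.size):
        num = np.abs(np.exp(1j * w) - np.exp(1j * np.delete(keep, k)))
        den = np.abs(np.exp(1j * keep[k]) - np.exp(1j * np.delete(keep, k)))
        tot += np.prod((num / den)**2)
    return lam_o / N * tot

w = 1.3
for eps in [0.5, 0.1, 0.01]:
    print(var_exact(w, eps))  # approaches the limit below as eps -> 0
print(var_35(w))
```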
2 2 2 2 2 2 0, , 2 · ,3 · ,4 · , (N − 4) · , (N − 3) · , N N N N N N 2 2 (N − 2) · , (N − 1) · . N N All amplitudes are unity except for =4·2/N (and the mirror frequency =(N −4)·2/N) where the amplitude is 0.1. Fig. 6 shows the variance of the frequency function estimate together with the upper-bound (43) computed using the frequencies
2 2 2 2 0, 2 · ,4 · , (N − 4) · , (N − 2) · , . (46) N N N N We have chosen these frequencies since they are most separated and therefore should give a less conservative bound. We see that the bound captures the rapid increase at high frequencies quite well. From the close up view in Fig. 7, we see that also the low frequency behavior is well represented by the upper bound.
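The upper bound (43) in Example 5's setting can be evaluated as follows (our sketch, F = 1); the exact variance is computed from (23) and the bound from the subset (46):

```python
import numpy as np

N, n, lam_o = 28, 5, 1.0
ls = np.array([0, 1, 2, 3, 4, 24, 25, 26, 27])        # excited grid indices l
amp = np.where((ls == 4) | (ls == 24), 0.1, 1.0)      # small power at 4, N-4
om_all = 2 * np.pi * ls / N

# Exact variance (23) for the full excitation (m = 9 > n = 5)
r = np.array([np.sum(amp**2 * np.exp(1j * om_all * tau)).real for tau in range(n)])
Tn = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])

def var_exact(w):
    g = np.exp(-1j * w * np.arange(1, n + 1))
    return lam_o / N * (g.conj() @ np.linalg.solve(Tn, g)).real

# Upper bound (43) from the subset (46): l in {0, 2, 4, 24, 26}
sel = np.array([0, 2, 4, 24, 26])
om = 2 * np.pi * sel / N
Xs = np.where((sel == 4) | (sel == 24), 0.1, 1.0)

def bound(w):
    tot = 0.0
    for k in range(sel.size):
        num = np.abs(np.exp(1j * w) - np.exp(1j * np.delete(om, k)))
        den = np.abs(np.exp(1j * om[k]) - np.exp(1j * np.delete(om, k)))
        tot += np.prod((num / den)**2) / Xs[k]**2
    return lam_o / N * tot

for w in [0.2, 1.0, 2.5]:
    print(var_exact(w), bound(w))   # exact <= bound at every frequency
```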
Fig. 7. Close-up view of the low frequency region of Fig. 6. (Axes: variance 0–0.5 versus frequency 0–0.7 rad.)
6.5. Comparing with the ETFE

It is interesting to compare the variance of the FIR estimate with the variance of the empirical transfer function estimate (ETFE),

$$\hat{\hat G}_N(e^{j\omega}) = \frac{\sum_{t=1}^{N} y(t)e^{-j\omega t}}{\sum_{t=1}^{N} u(t)e^{-j\omega t}},$$

when the underlying system is of FIR type.
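The comparison can be made by simulation; the sketch below (our illustration, using Example 1's input) estimates the variance of both the ETFE and the FIR least-squares estimate at an excitation frequency.

```python
import numpy as np

N, n, lam_o = 14, 5, 1.0
omega = 2 * np.pi * np.array([0, 3, 6, 8, 11]) / N
theta_o = np.ones(n)

def u_val(t):                       # periodic multisine, unit amplitudes
    return np.real(sum(np.exp(1j * w * t) for w in omega))

ts = np.arange(1, N + 1)
Phi = np.column_stack([u_val(ts - k) for k in range(1, n + 1)])
wk = omega[1]                       # an excitation frequency
Du = np.sum(u_val(ts) * np.exp(-1j * wk * ts))   # equals N * U_k = 14
Gam = np.exp(-1j * wk * np.arange(1, n + 1))

rng = np.random.default_rng(2)
etfe = np.empty(20000, dtype=complex)
fir = np.empty(20000, dtype=complex)
for i in range(etfe.size):
    y = Phi @ theta_o + rng.standard_normal(N) * np.sqrt(lam_o)
    etfe[i] = np.sum(y * np.exp(-1j * wk * ts)) / Du
    fir[i] = Gam @ np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
print(np.var(etfe), np.var(fir), lam_o / N)      # all approximately 1/14
```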
Let $\omega_k$ be an excitation frequency of the input. Then it is easy to show that the ETFE can be expressed as

$$\hat{\hat G}_N(e^{j\omega_k}) = G_o(e^{j\omega_k}) + \frac{1}{N U_k}\sum_{t=1}^{N} e(t)\, e^{-j\omega_k t}.$$

Hence the variance of the ETFE is given by

$$\operatorname{Var}\{\hat{\hat G}_N(e^{j\omega_k})\} = \frac{\lambda_o}{N |U_k|^2}.$$

Thus we see that the variance of the ETFE equals the variance of an FIR estimate at the excitation frequencies of the input when the number of excitation frequencies equals the number of estimated parameters.

7. Conclusions
A transparent expression for the variance of an estimated frequency function, obtained from a certain class of model structures, has been derived. The result holds, e.g., when the input is periodic and the dimension of the estimated parameter vector is the same as the number of excitation frequencies in the input. For FIR models, the only requirement is that the input is periodic with period N over the observation interval $[-(n-1), \ldots, N-1]$.

The novel variance expression provides new insights into how the choice of excitation frequencies influences the accuracy of such frequency function estimates. In particular, it has been illustrated that the extrapolation ability of, e.g., FIR models is quite limited. Concentrating the input energy in one frequency region may lead to very poor estimates in the frequency region where excitation is absent, even though the input is persistently exciting.

We have also shown how the variance is affected when a known filter is included in the model. This is of interest for, e.g., Laguerre and Kautz models, which can be seen as fixed denominator model structures (Ninness et al., 1999). Finally, the novel variance expression has been used to relate FIR modeling to the ETFE.

Appendix A. Proof of Lemma 6

Proof outline. The idea is to find a spectral factor Q such that the matrix R in Lemma 5 equals $T_n(r_{xx})$, and then use the result of Lemma 5. Since R in Lemma 5 can be interpreted as the Toeplitz matrix built up of the first n covariance functions of the signal

$$s(t) = Q(q)v(t), \tag{A.1}$$

where $v(t)$ is zero mean white noise with unit variance, one interpretation is that we will show that there also exists an AR-spectrum which has $r_{xx}(\tau)$, $\tau = 0, 1, \ldots, n-1$, as its first n covariances.

Consider therefore the spectral factor

$$Q_\rho(z) = \sqrt{1-\rho^2}\; z \prod_{k=1}^{m} \frac{1}{z - \rho_k e^{j\omega_k}}, \quad \rho_k \triangleq \rho^{\alpha_k}, \tag{A.2}$$

where $0 < \rho < 1$, $\alpha_k > 0$ and $0 \le \omega_k < 2\pi$. These constants, and m, will be specified later. Notice that, since the poles are complex conjugated, $\rho_k = \rho_p$ whenever $\omega_k + \omega_p = 2\pi$. Notice also that $Q_\rho$ corresponds to a spectral factor for which Lemma 5 is applicable.

Step 1: The auto-covariances of the signal defined in (A.1). Denote by $r_\rho(\tau)$ the auto-covariance function $E[s(t)s(t-\tau)]$ of $s(t)$ when $Q = Q_\rho$ in (A.1). Partial fraction decomposition of (A.2) gives

$$Q_\rho(z) = \sqrt{1-\rho^2} \sum_{k=1}^{m} C_\rho(k)\, \frac{z}{z - \rho_k e^{j\omega_k}}, \tag{A.3}$$

where

$$C_\rho(k) = \prod_{l=1,\, l \ne k}^{m} \frac{1}{\rho_k e^{j\omega_k} - \rho_l e^{j\omega_l}}. \tag{A.4}$$

Hence $Q_\rho$ has impulse response coefficients

$$\gamma_\rho(r) = \sum_{k=1}^{m} C_\rho(k)\sqrt{1-\rho^2}\, \rho_k^r\, e^{j\omega_k r} \triangleq \sum_{k=1}^{m} \gamma_\rho(r, k), \quad r = 0, 1, 2, \ldots. \tag{A.5}$$

Thus for $\tau \ge 0$, the auto-covariance function of $s(t)$ is given by

$$r_\rho(-\tau) = \sum_{r=0}^{\infty} \gamma_\rho(r)\gamma_\rho(r+\tau) = \sum_{r=0}^{\infty} \sum_{k=1}^{m} \gamma_\rho(r, k) \sum_{p=1}^{m} \gamma_\rho(r+\tau, p) = \sum_{k=1}^{m}\sum_{p=1}^{m}\sum_{r=0}^{\infty} \gamma_\rho(r, k)\,\gamma_\rho(r+\tau, p). \tag{A.6}$$

Now

$$\sum_{r=0}^{\infty} \gamma_\rho(r, k)\gamma_\rho(r+\tau, p) = (1-\rho^2)\sum_{r=0}^{\infty} C_\rho(k)\,\rho_k^r\, e^{j\omega_k r}\; C_\rho(p)\,\rho_p^{r+\tau}\, e^{j\omega_p(r+\tau)} = (1-\rho^2)\,C_\rho(k)C_\rho(p)\, e^{j\omega_p \tau}\rho_p^\tau \sum_{r=0}^{\infty} (\rho_k\rho_p)^r e^{j(\omega_k+\omega_p)r} = C_\rho(k)C_\rho(p)\, e^{j\omega_p \tau}\rho_p^\tau\, \frac{1-\rho^2}{1 - \rho_k\rho_p\, e^{j(\omega_k+\omega_p)}}. \tag{A.7}$$

As easily verified by l'Hôpital's rule,

$$\lim_{\rho \to 1^-} \frac{1-\rho^2}{1-\rho^{2\alpha}} = \frac{1}{\alpha}. \tag{A.8}$$

From (A.8) and by noticing that $0 \le \omega_k + \omega_p < 4\pi$ and that $\alpha_k = \alpha_p$ when $\omega_k + \omega_p = 2\pi$, it follows from (A.7) that

$$\lim_{\rho \to 1^-} \sum_{r=0}^{\infty} \gamma_\rho(r, k)\gamma_\rho(r+\tau, p) = h(k, p, \tau) \triangleq \begin{cases} \dfrac{C(k)C(p)\, e^{j\omega_p \tau}}{\alpha_k}, & \omega_k + \omega_p \in \{0, 2\pi\}, \\[2mm] 0 & \text{otherwise,} \end{cases} \tag{A.9}$$

where

$$C(k) = \lim_{\rho \to 1^-} C_\rho(k) = \prod_{l=1,\, l \ne k}^{m} \frac{1}{e^{j\omega_k} - e^{j\omega_l}}. \tag{A.10}$$

Hence,

$$\lim_{\rho \to 1^-} r_\rho(-\tau) = r(-\tau) \triangleq \sum_{k=1}^{m}\sum_{p=1}^{m} h(k, p, \tau). \tag{A.11}$$

In the next two steps we will show that, by clever choice of the integer m (Step 2) and the constants $C(k)$ and $\alpha_k$ (Step 3), it will hold that $r(\tau) = r_{xx}(\tau)$.

Step 2: $r(\tau)$ for a particular choice of constants. Let $m = n$, the number of estimated parameters, and take the same $\omega_k$ as in Lemma 6. Then, whenever $\omega_k + \omega_p = 2\pi$, we have that $C(p) = \overline{C(k)}$ and $\alpha_k = \alpha_p$. Hence,

$$h(k, p, \tau) = \begin{cases} \dfrac{|C(k)|^2}{\alpha_k}, & \omega_k = \omega_p = 0, \\[2mm] \dfrac{|C(k)|^2\, e^{-j\omega_k \tau}}{\alpha_k}, & \omega_k + \omega_p = 2\pi, \\[2mm] 0 & \text{otherwise.} \end{cases} \tag{A.12}$$

Hence for each $k = 1, \ldots, n$ there exists a unique integer $1 \le p \le n$ such that $h(k, p, \tau) = (|C(k)|^2/\alpha_k)\, e^{-j\omega_k \tau}$, whereas $h(k, p, \tau) = 0$ for all other p. Thus

$$r(-\tau) = \sum_{k=1}^{n}\sum_{p=1}^{n} h(k, p, \tau) = \sum_{k=1}^{n} \frac{|C(k)|^2}{\alpha_k}\, e^{-j\omega_k \tau}. \tag{A.13}$$

Step 3: Linking $r(\tau)$ to $r_{xx}(\tau)$. With the choice

$$\alpha_k = \frac{|C(k)|^2}{|X_k|^2}, \tag{A.14}$$

we thus have $r(\tau) = \sum_{k=1}^{n} |X_k|^2 e^{j\omega_k \tau}$, which coincides with expression (17) for $r_{xx}(\tau)$. So far we have thus proved that

$$\lim_{\rho \to 1^-} R(\rho) = T_n(r_{xx}), \tag{A.15}$$

where $R(\rho)$ is the Toeplitz matrix with $r_\rho(i-j)$ as its ijth entry.

Step 4: Links between quadratic forms based on $R^{-1}(\rho)$ and $T_n^{-1}(r_{xx})$. Define

$$J(\rho) \triangleq \Gamma_n^*(e^{j\omega})\, R^{-1}(\rho)\, \Gamma_n(e^{j\omega}). \tag{A.16}$$

Next notice that $T_n(r_{xx})$ is non-singular since $|X_k| > 0$, $k = 1, \ldots, n$. Hence

$$|\Gamma_n^*(e^{j\omega}) T_n^{-1}(r_{xx}) \Gamma_n(e^{j\omega}) - J(\rho)| = |\Gamma_n^*(e^{j\omega})(T_n^{-1}(r_{xx}) - R^{-1}(\rho))\Gamma_n(e^{j\omega})| = |\Gamma_n^*(e^{j\omega}) T_n^{-1}(r_{xx})(R(\rho) - T_n(r_{xx}))R^{-1}(\rho)\Gamma_n(e^{j\omega})| \le C\,\|T_n(r_{xx}) - R(\rho)\|_2 \to 0 \text{ as } \rho \to 1^-, \tag{A.17}$$

where the convergence is due to (A.15). Hence

$$\lim_{\rho \to 1^-} J(\rho) = \Gamma_n^*(e^{j\omega})\, T_n^{-1}(r_{xx})\, \Gamma_n(e^{j\omega}). \tag{A.18}$$

Step 5: An expression for $\lim_{\rho \to 1^-} J(\rho)$. The number of poles in (A.2) is m. Since m is also the assumed model order, Lemma 5 is applicable with $\bar m = n$ and $\tilde m = 0$. This gives that $J(\rho)$ can be expressed as

$$J(\rho) = \frac{\prod_{l=1}^{n} |e^{j\omega} - \rho_l e^{j\omega_l}|^2}{1-\rho^2} \sum_{k=1}^{n} \frac{1 - \rho_k^2}{|e^{j\omega} - \rho_k e^{j\omega_k}|^2}. \tag{A.19}$$

From (A.8) it follows that

$$\lim_{\rho \to 1^-} J(\rho) = \sum_{k=1}^{n} \alpha_k \prod_{l=1,\, l \ne k}^{n} |e^{j\omega} - e^{j\omega_l}|^2,$$

which, with $\alpha_k$ given by (A.14) and with $C(k)$ defined by (A.10), gives

$$\lim_{\rho \to 1^-} J(\rho) = \sum_{k=1}^{n} \frac{|C(k)|^2}{|X_k|^2} \prod_{l=1,\, l \ne k}^{n} |e^{j\omega} - e^{j\omega_l}|^2 = \sum_{k=1}^{n} \frac{1}{|X_k|^2} \prod_{l=1,\, l \ne k}^{n} \left|\frac{e^{j\omega} - e^{j\omega_l}}{e^{j\omega_k} - e^{j\omega_l}}\right|^2. \tag{A.20}$$

Now, (A.18) and (A.20) prove (33). □

References
Bombois, X., Scorletti, G., Gevers, M., Hildebrand, R., & Van den Hof, P. (2004). Least costly identification experiment for control. In MTNS'04. Leuven, Belgium.
Campi, M. C., Ooi, S. K., & Weyer, E. (2004). Non-asymptotic quality assessment of generalised FIR models with periodic inputs. Automatica, 40(12), 2029–2041.
Campi, M. C., Ooi, S. K., & Weyer, E. (2005). Guaranteed non-asymptotic confidence regions in system identification. Automatica, 41(10), 1751–1764.
Cooley, B. L., & Lee, J. H. (2001). Control relevant experiment design for multivariable systems described by expansions in orthonormal bases. Automatica, 37(2), 273–281.
Heuberger, P., Wahlberg, B., & Van den Hof, P. (Eds.). (2005). Modeling and identification with rational orthogonal basis functions. Berlin: Springer.
Hildebrand, R., & Gevers, M. (2003). Identification for control: Optimal input design with respect to a worst-case ν-gap cost function. SIAM Journal on Control and Optimization, 41(5), 1586–1608.
Hjalmarsson, H. (2005). From experiment design to closed loop control. Automatica, 41(3), 393–438.
Jansson, H. (2004). Experiment design with applications in identification for control. Ph.D. thesis, TRITA-S3-REG-0404.
Jansson, H., & Hjalmarsson, H. (2005). Input design via LMIs admitting frequency-wise model specifications in confidence regions. IEEE Transactions on Automatic Control, 50(10), 1534–1549.
Kailath, T. (1980). Linear systems. Englewood Cliffs, NJ: Prentice-Hall.
Lee, J. H. (2003). Control-relevant design of periodic test input signals for iterative open-loop identification of multivariable FIR systems. In Proceedings of IFAC symposium SYSID 2003. Rotterdam, The Netherlands.
Ljung, L. (1985). Asymptotic variance expressions for identified black-box transfer function models. IEEE Transactions on Automatic Control, 30(9), 834–844.
Ljung, L. (1999). System identification: Theory for the user (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Ljung, L., & Yuan, Z. D. (1985). Asymptotic properties of black-box identification of transfer functions. IEEE Transactions on Automatic Control, 30(6), 514–530.
Mäkilä, P. M., Partington, J. R., & Gustafsson, T. K. (1995). Worst-case control-relevant identification. Automatica, 31(12), 1799–1819.
Milanese, M., & Vicino, A. (1991). Optimal estimation theory for dynamic systems with set membership uncertainty: An overview. Automatica, 27(6), 997–1009.
Ninness, B., & Goodwin, G. (1995). Estimation of model quality. Automatica, 31(12), 32–74.
Ninness, B., & Hjalmarsson, H. (2004a). The effect of regularisation on variance error. IEEE Transactions on Automatic Control, 49(7), 1142–1147.
Ninness, B., & Hjalmarsson, H. (2004b). Variance error quantifications that are exact for finite model order. IEEE Transactions on Automatic Control, 49(8), 1275–1291.
Ninness, B., Hjalmarsson, H., & Gustafsson, F. (1999). The fundamental role of general orthonormal bases in system identification. IEEE Transactions on Automatic Control, 44(7), 1384–1406.
Wahlberg, B. (1994). System identification using Kautz models. IEEE Transactions on Automatic Control, 39(12), 1276–1282.
Xie, L.-L., & Ljung, L. (2001). Asymptotic variance expressions for estimated frequency functions. IEEE Transactions on Automatic Control, 46(12), 1887–1899.
Håkan Hjalmarsson was born in 1962. He received the M.S. degree in Electrical Engineering in 1988, and the Licentiate and Ph.D. degrees in Automatic Control in 1990 and 1993, respectively, all from Linköping University, Sweden. He has held visiting research positions at the California Institute of Technology, Louvain University and the University of Newcastle, Australia. He has served as an Associate Editor for Automatica and as a Guest Editor for the European Journal of Control and Control Engineering Practice. He is currently an Associate Editor for IEEE Transactions on Automatic Control and Professor in the Department of Signals, Sensors and Systems at KTH, Stockholm, Sweden. He is vice-chair of the IFAC Technical Committee on Modeling, Identification and Signal Processing. In 2001 he received the KTH award for outstanding contribution to undergraduate education. His research interests include system identification, signal processing, control and estimation in communication networks, and automated tuning of controllers.

Brett Ninness was born in 1963 in Singleton, Australia, and received his B.E., M.E. and Ph.D. degrees in Electrical Engineering from the University of Newcastle, Australia, in 1986, 1991 and 1994, respectively. He now serves at the same institution as an Associate Professor and pursues research in the areas of system identification, stochastic signal processing, wireless communications and control, on which he has published more than 100 articles. Together with Håkan Hjalmarsson, he is jointly organizing the 14th IFAC Symposium on System Identification to be held in Newcastle in 2006. He is a member of the editorial board of Automatica and an Editor-in-Chief of IEE Control Theory and Applications. Details of his research and teaching activities are available at sigpromu.org/brett.