Digital Signal Processing 20 (2010) 1027–1039
Contents lists available at ScienceDirect
Digital Signal Processing www.elsevier.com/locate/dsp
Several multi-innovation identification methods ✩ Feng Ding School of Communication and Control Engineering, Jiangnan University, Wuxi 214122, PR China
a r t i c l e
i n f o
a b s t r a c t
Article history: Available online 6 November 2009 Keywords: Recursive identification Parameter estimation Signal processing Filtering Multi-innovation identification Stochastic gradient Least squares Missing data
This paper considers connections between the cost functions of some parameter identification methods for system modelling, including the well known projection algorithm, stochastic gradient (SG) algorithm and recursive least squares (RLS) algorithm, and presents a modified SG algorithm by introducing the convergence index and a multi-innovation projection algorithm, a multi-innovation SG algorithm and a multi-innovation RLS algorithm by introducing the innovation length, aiming at improving the convergence rate of the SG and RLS algorithms. Furthermore, this paper derives an interval-varying multi-innovation SG and an interval-varying multi-innovation RLS algorithm in order to deal with missing data cases. © 2010 Elsevier Inc. All rights reserved.
1. Introduction Parameter estimation methods have received much attention in signal processing, system modelling and adaptive control, e.g. [1–5]. Typical estimation methods include the least mean square algorithm (e.g., the projection algorithm and stochastic gradient algorithm) and least squares algorithm. For example, Malik and Salman studied the least mean square algorithm for state space models [1]; James discussed total least squares matrix enhancement, and signal processing [2]; Ding and Chen presented a multi-innovation stochastic gradient (MISG) algorithm for linear regression models (CAR/ARX systems) [6] and Han and Ding gave its multivariable version for multi-input multi-output systems [3]; Ding, Liu and Liu extended the multiinnovation method to pseudo-linear regression models for output error moving average (OEMA) systems and proposed an auxiliary model based multi-innovation extended stochastic gradient (AM-MI-ESG) algorithm [7] and multi-innovation least squares (MILS) and multi-innovation extended least squares (MI-ELS) algorithms [8]. Recently, Wang and Ding analyzed the performance of the auxiliary models based multi-innovation stochastic gradient estimation algorithm for output error systems [9]. Also, the multi-innovation theory was used to study the identification problems for multirate systems [10–14], and self-tuning control problems for discrete-time systems [15]. This paper considers identification problems of the following linear regression models:
y (t ) = ϕ T (t )θ + v (t ),
(1)
where y (t ) ∈ R1 is the system output, ϕ (t ) ∈ Rn is the information vector consisting of the system observation (input– output) data, v (t ) ∈ R1 is a stochastic white noise with zero mean and variance σ 2 , θ ∈ Rn is the parameter vector to be identified, the superscript T denotes the transpose.
✩
This work was supported by the National Natural Science Foundation of China (60973043). E-mail address:
[email protected].
1051-2004/$ – see front matter doi:10.1016/j.dsp.2009.10.030
©
2010 Elsevier Inc. All rights reserved.
1028
F. Ding / Digital Signal Processing 20 (2010) 1027–1039
Many approaches can identify the parameter vector θ , e.g., [4,16]. The typical methods are the projection algorithm, stochastic gradient (SG) algorithm and recursive least squares (RLS) algorithm [4,16]. The following simply introduces the projection algorithm, SG algorithm and RLS algorithm. 1.1. The projection identification algorithm Let μt be the step-size, θˆ (t ) be the estimate of θ at time t and X 2 := tr[ X X T ] be the norm of X . Using the negative gradient search (steepest descent method) and minimizing the quadratic cost function:
J 1 (θ) := y (t ) − ϕ T (t )θ
2
(2)
,
lead to the following recursive algorithm:
ˆ t − 1) − μt grad J 1 θˆ (t − 1) ˆ t ) = θ( θ( 2 ˆ t − 1) . ˆ t − 1) + μt ϕ (t ) y (t ) − ϕ T (t )θ( = θ(
(3)
Substituting θ = θˆ (t ) into (2) gives
2
ˆ t ) = y (t ) − ϕ T (t )θ( ˆ t) g (μt ) := J 1 θ(
ˆ t − 1) − μt ϕ (t )2 y (t ) − ϕ T (t )θˆ (t − 1) 2 = y (t ) − ϕ T (t )θ( 2 2 ˆ t − 1) 2 . = 1 − μt ϕ (t ) y (t ) − ϕ T (t )θ(
Assume that
μt =
ϕ (t ) = 0. Letting g (μt ) = 0 gives the best step-size: 1
ϕ (t )2
.
μt in the above equation into (3) yields the following projection identification algorithm: ˆ t ) = θ( ˆ t − 1) + ϕ (t ) y (t ) − ϕ T (t )θ( ˆ t − 1) . θ( ϕ (t )2
Substituting
If
(4)
ϕ (t ) = 0, then we take θˆ (t ) = θˆ (t − 1).
1.2. The stochastic gradient algorithm
ˆ t ) given by the projection algorithm in (4) is sensitive to noise v (t ) and cannot converge to The parameter estimates θ( their true values. In order to overcome this drawback, one can obtain the following stochastic gradient (SG) algorithm of estimating θ by taking the step-size μt := 1/r (t ) and r (t ) = r (t − 1) + ϕ (t )2 [4]:
ˆ t ) = θ( ˆ t − 1) + ϕ (t ) y (t ) − ϕ T (t )θ( ˆ t − 1) , θ( r (t ) 2 r (t ) = r (t − 1) + ϕ (t ) , r (0) = 1.
(5) (6)
1.3. The recursive least squares algorithm The least squares method uses the following cost function:
J 2 (θ) :=
t
y (i ) − ϕ T (i )θ
2
.
i =1
Minimizing J 2 (θ ) yields the recursive least squares (RLS) algorithm of estimating θ [16]:
ˆ t ) = θ( ˆ t − 1) + P (t )ϕ (t ) y (t ) − ϕ T (t )θ( ˆ t − 1) , θ( P
−1
(t ) = P
−1
T
(t − 1) + ϕ (t )ϕ (t ),
P (0) = p 0 I n .
(7) (8)
n×n
P (t ) ∈ R is the covariance matrix and I n denotes an identity matrix of order n, and p 0 is taken to be a large positive number, e.g. p 0 = 106 . Applying the matrix inversion formula
− 1 − 1 ( A + B C )−1 = A −1 − A −1 B I m + C A −1 B CA
(9)
F. Ding / Digital Signal Processing 20 (2010) 1027–1039
1029
to (8), the algorithm in (7)–(8) can be equivalently expressed as
ˆ t ) = θ( ˆ t − 1) + P (t )ϕ (t ) y (t ) − ϕ T (t )θˆ (t − 1) , θ( P (t ) = P (t − 1) +
P (t − 1)ϕ (t )ϕ (t ) P (t − 1) T
1 + ϕ T (t ) P (t − 1)ϕ (t )
,
(10) P (0) = p 0 I n .
(11)
It is well known that the SG algorithm has slower convergence rates than the RLS algorithm although they both use all ˆ t ) − θ of the SG measurement data { y (t ), ϕ (t ): t = 1, 2, . . . , L } (L is the data length). The parameter estimation error θ( algorithm is generally greater than that of the RLS algorithm. In other words, the RLS algorithm can extract much more sufficient information from the measurement data to enhance the parameter estimation accuracy, compared with the SG algorithm. The objective of this paper is to develop highly accurate estimation algorithms to improve the parameter estimation accuracy, including the modified SG algorithm, multi-innovation projection algorithm, multi-innovation SG algorithm, multi-innovation RLS algorithm, interval-varying multi-innovation SG algorithm and interval-varying multi-innovation RLS algorithm to estimate the parameter vector θ using the observations { y (t ), ϕ (t )}. 2. The multi-innovation identification algorithms 2.1. The multi-innovation projection algorithm Consider the newest p data { y (t − i ), ϕ (t − i ): i = 0, 1, . . . , p − 1} and define the stacked output vector Y ( p , t ) and stacked information matrix Φ( p , t ) as
⎡
y (t ) y (t − 1)
⎢ ⎢ ⎣
Y ( p, t ) = ⎢
.. .
⎡
⎤ ⎥ ⎥ ⎥ ∈ Rp , ⎦
⎢ ⎢ Φ T ( p, t ) = ⎢ ⎣
y (t − p + 1)
ϕ T (t )
⎤
ϕ T (t − 1) ⎥ ⎥
⎥∈R .. ⎦ . ϕ T (t − p + 1)
p ×n
.
Define a quadratic cost function:
2
J 3 (θ) := Y ( p , t ) − Φ T ( p , t )θ .
(12)
Its gradient is
grad J 3 (θ ) =
∂ J 3 (θ ) = −2Φ( p , t ) Y ( p , t ) − Φ T ( p , t )θ . ∂θ
Using the negative gradient search (steepest descent method) and minimizing J 3 (θ ) give the following recursive algorithm:
ˆ t − 1) ˆ t ) = θ( ˆ t − 1) − μt grad J 3 θ( θ( 2 ˆ t − 1) + μt Φ( p , t ) Y ( p , t ) − Φ T ( p , t )θˆ (t − 1) = θ( ˆ t − 1) + μt Φ( p , t ) E ( p , t ), =: θ(
(13)
where
E ( p , t ) := Y ( p , t ) − Φ T ( p , t )θˆ (t − 1) ∈ R p .
ˆ t ) into (12) gives Substituting θ = θ(
ˆ t ) = Y ( p , t ) − Φ T ( p , t ) θˆ (t − 1) + μt Φ( p , t ) E ( p , t ) 2 g (μt ) := J 3 θ(
2 = I p − μt Φ T ( p , t )Φ( p , t ) E ( p , t ) 2 = E T ( p , t ) I p − μt Φ T ( p , t )Φ( p , t ) E ( p , t ) = E T ( p , t ) I p − 2μt Φ T ( p , t )Φ( p , t ) + μt2 Φ T ( p , t )Φ( p , t )Φ T ( p , t )Φ( p , t ) E ( p , t ) 2 2 2 = E ( p , t ) − 2μt Φ( p , t ) E ( p , t ) + μt2 Φ T ( p , t )Φ( p , t ) E ( p , t ) .
Let the derivative of g (μt ) with respect to
μt be zero: 2 2 g (μt ) = −2Φ( p , t ) E ( p , t ) + 2μt Φ T ( p , t )Φ( p , t ) E ( p , t ) = 0.
If Φ T ( p , t )Φ( p , t ) E ( p , t ) = 0, then the best step-size is given by
1030
F. Ding / Digital Signal Processing 20 (2010) 1027–1039
μt = = Substituting
Φ( p , t ) E ( p , t )2
(14)
Φ T ( p , t )Φ( p , t ) E ( p , t )2 E T ( p , t )Φ T ( p , t )Φ( p , t ) E ( p , t ) E ( p , t )Φ T ( p , t )Φ( p , t )Φ T ( p , t )Φ( p , t ) E ( p , t ) T
(15)
.
μt into (13) leads to the multi-innovation projection identification algorithm (MI-Proj):
ˆ t ) = θ( ˆ t − 1) + μt Φ( p , t ) E ( p , t ), θ(
(16)
Φ( p , t ) E ( p , t ) , Φ T ( p , t )Φ( p , t ) E ( p , t )2 E ( p , t ) = Y ( p , t ) − Φ T ( p , t )θˆ (t − 1), Φ( p , t ) = ϕ (t ), ϕ (t − 1), . . . , ϕ (t − p + 1) , T Y ( p , t ) = y (t ), y (t − 1), . . . , y (t − p + 1) . 2
μt =
(17) (18) (19) (20)
ˆ t ) = θ( ˆ t − 1). When the innovation length p = 1, the Similarly, if the denominator of (17) equals zero, then we take θ( MI-Proj algorithm in (16)–(20) reduces to the projection identification algorithm in (4). Because it is very complex to compute μt by (17), it is necessary to simplify μt . For any real vector x ∈ Rn and nonnegative definite matrix Q , we have xT Q x λmax [ Q ]xT x Q xT x, where λmax [ Q ] is the greatest eigenvalue of Q . From (15), we have
E T ( p , t )Φ T ( p , t )Φ( p , t ) E ( p , t ) T
T
T
E ( p , t )Φ ( p , t )Φ( p , t )Φ ( p , t )Φ( p , t ) E ( p , t )
=
E T ( p , t )Φ T ( p , t )Φ( p , t ) E ( p , t )
λmax [Φ( p , t )Φ T ( p , t )] E T ( p , t )Φ T ( p , t )Φ( p , t ) E ( p , t ) 1
λmax [Φ( p , t )Φ T ( p , t )]
.
Thus, the step-size can be conservatively taken to be
μt =
1
λmax [Φ( p , t )Φ ( p , t )] T
or
μt =
1
Φ( p , t )2
,
since it is easier to compute the trace than the greatest eigenvalue of a matrix. Hence, the MI-Proj algorithm in (16)–(20) can be simplified as (the S-MI-Proj algorithm for short)
ˆ t ) = θ( ˆ t − 1) + μt Φ( p , t ) E ( p , t ), θ(
μt =
1
λmax [Φ( p , t )Φ T ( p , t )]
or
μt =
(21) 1
Φ( p , t )2
E ( p , t ) = Y ( p , t ) − Φ T ( p , t )θˆ (t − 1),
Φ( p , t ) = ϕ (t ), ϕ (t − 1), . . . , ϕ (t − p + 1) , T Y ( p , t ) = y (t ), y (t − 1), . . . , y (t − p + 1) .
,
(22) (23) (24) (25)
2.2. The multi-innovation stochastic gradient algorithm The MI-Proj and S-MI-Proj algorithms can track the time-varying parameters but their parameter estimation errors do not converge to zero for time-invariant systems. In order to achieve the consistent estimation, we adjust the step-size μt to obtain a multi-innovation stochastic gradient (MISG) algorithm [6,7]:
ˆ t − 1) + Φ( p , t ) E ( p , t ), ˆ t ) = θ( θ( r (t ) E ( p , t ) = Y ( p , t ) − Φ T ( p , t )θˆ (t − 1), 2 r (t ) = r (t − 1) + ϕ (t ) , r (0) = 1, Φ( p , t ) = ϕ (t ), ϕ (t − 1), . . . , ϕ (t − p + 1) , T Y ( p , t ) = y (t ), y (t − 1), . . . , y (t − p + 1) .
(26) (27) (28) (29) (30)
F. Ding / Digital Signal Processing 20 (2010) 1027–1039
1031
Since e (t ) := y (t ) − ϕ T (t )θˆ (t − 1) ∈ R1 is known as the scalar innovation [16], the projection, SG, RLS algorithms may be also called the single innovation projection, SG, RLS algorithms, respectively. Because E ( p , t ) ∈ R p in (16), (21) and (27) is an innovation vector, namely, multi-innovation, from which the multi-innovation identification algorithms originate. As p = 1, the MISG algorithm in (26)–(30) reduces to the SG algorithm in (5)–(6). Eq. (28) may be replaced with
2
r (t ) = r (t − 1) + Φ( p , t ) ,
r (0) = 1.
(31)
2.3. The multi-innovation least squares algorithm Defining and minimizing the cost function:
J 4 (θ) :=
t Y ( p , i ) − Φ T ( p , i )θ 2 , i =1
we can obtain a multi-innovation recursive least squares (MILS) algorithm with the innovation length p [8]:
ˆ t ) = θ( ˆ t − 1) + P (t )Φ( p , t ) E ( p , t ), θ( E ( p , t ) = Y ( p , t ) − Φ T ( p , t )θˆ (t − 1), −1
−1
(32) (33)
T
(t ) = P (t − 1) + Φ( p , t )Φ ( p , t ), Φ( p , t ) = ϕ (t ), ϕ (t − 1), . . . , ϕ (t − p + 1) , T Y ( p , t ) = y (t ), y (t − 1), . . . , y (t − p + 1) . P
(34) (35) (36)
When p = 1, the MILS algorithm in (32)–(36) reduces to the RLS algorithm in (7)–(8). Applying the matrix inversion formula (9) to (34), then the MILS algorithm in (32)–(36) can be equivalently expressed as
ˆ t ) = θ( ˆ t − 1) + L (t ) Y ( p , t ) − Φ T ( p , t )θˆ (t − 1) , θ( −1 , L (t ) = P (t − 1)Φ( p , t ) I p + Φ T ( p , t ) P (t − 1)Φ( p , t )
(37) (38)
T
P (t ) = P (t − 1) − L (t )Φ ( p , t ) P (t − 1),
(39)
Φ( p , t ) = ϕ (t ), ϕ (t − 1), . . . , ϕ (t − p + 1) , T Y ( p , t ) = y (t ), y (t − 1), . . . , y (t − p + 1) .
(40) (41)
To initialize the identification algorithms, the initial values θˆ (0) and P (0) are generally taken to be a small real vector ˆ 0) = 1n / p 0 and P (0) = p 0 I n , where 1n is an n-dimensional column and a large positive definite matrix, respectively, e.g., θ( vector whose elements are 1 and p 0 = 106 . 3. The interval-varying multi-innovation identification algorithms The interval-varying multi-innovation identification methods contain an interval-varying MISG algorithm and an intervalvarying MILS algorithm. 3.1. The interval-varying MISG algorithm In practice, there possibly exist some missing measurements in { y (t ), ϕ (t ), t = 0, 1, 2, . . .}, i.e., some data are unavailable or lost. In order to deal with such cases with missing data, we define an integer sequence {t s : s = 0, 1, 2, . . .} satisfying
0 t 0 < t 1 < t 2 < t 3 < · · · < t s −1 < t s < · · · , with t s∗ := t s − t s−1 1 so that { y (t s ), ϕ (t s )} is always available for any s (s = 0, 1, 2, . . .). For instance, if the available measurement data are
y (0), ϕ (0) ,
y (1), ϕ (1) ,
y (2), ϕ (2) ,
y (8), ϕ (8) ,
y (9), ϕ (9) ,
y (10), ϕ (10) ,
...,
then we have
t 0 = 0,
t 1 = 1,
t 2 = 2,
t 3 = 8,
t 4 = 9,
t 5 = 10,
....
Note that the above framework for data-missing cases includes scenarios with no missing data as a special case (only take t s∗ ≡ 1).
1032
F. Ding / Digital Signal Processing 20 (2010) 1027–1039
Define
⎡ ⎢ ⎢ ⎣
Y ( p, ts ) = ⎢
⎡
⎤
y (t s ) y (t s − 1)
⎢ ⎢ Φ T ( p, ts ) = ⎢ ⎣
⎥ ⎥ ⎥ ∈ Rp , .. ⎦ . y (t s − p + 1)
ϕ T (t s )
⎤
⎥ ⎥ ⎥ ∈ R p ×n . .. ⎦ . T ϕ (t s − p + 1)
ϕ T (t s − 1)
(42)
Defining and minimizing the cost function:
2
J 5 (θ) := Y ( p , t s ) − Φ T ( p , t s )θ .
(43)
This is a criterion function over a finite data window. Let gradient algorithm
μ(t s ) be the step-size. Minimizing J 5 (θ ) gives the following
ˆ t s −1 ) ˆ t s ) = θ( ˆ t s−1 ) − μ(t s ) grad J 5 θ( θ( 2 ˆ ˆ t s −1 ) . = θ(t s−1 ) + μ(t s )Φ( p , t s ) Y ( p , t s ) − Φ T ( p , t s )θ(
(44)
Define the innovation vector:
ˆ t s −1 ) ∈ R p . E ( p , t s ) := Y ( p , t s ) − Φ T ( p , t s )θ( Then we have
ˆ t s ) = θ( ˆ t s−1 ) + μ(t s )Φ( p , t s ) E ( p , t s ). θ(
(45)
Substituting θ = θˆ (t s ) into (43) gives
g
ˆ t s−1 ) + μ(t s )Φ( p , t s ) E ( p , t s ) 2 μ(t s ) := J 5 θˆ (t s ) = Y ( p , t s ) − Φ T ( p , t s ) θ(
2 = I − μ(t s )Φ T ( p , t s )Φ( p , t s ) E ( p , t s ) 2 = E T ( p , t s ) I − μ(t s )Φ T ( p , t s )Φ( p , t s ) E ( p , t s ) = E T ( p , t s ) I − 2μ(t s )Φ T ( p , t s )Φ( p , t s ) + μ(t s )2 Φ T ( p , t s )Φ( p , t s )Φ T ( p , t s )Φ( p , t s ) E ( p , t s ) 2 2 2 = E ( p , t s ) − 2μ(t s )Φ( p , t s ) E ( p , t s ) + μ(t s )2 Φ T ( p , t s )Φ( p , t s ) E ( p , t s ) .
Minimizing g (μ(t s )) gives the best step-size:
μ(t s ) = =
Φ( p , t s ) E ( p , t s )2 Φ ( p , t s )Φ( p , t s ) E ( p , t s )2 T
E T ( p , t s )Φ T ( p , t s )Φ( p , t s ) E ( p , t s ) E ( p , t s )Φ T ( p , t s )Φ( p , t s )Φ T ( p , t s )Φ( p , t s ) E ( p , t s ) T
,
which can be taken approximately as
μ(t s ) =
1
Φ( p , t s )2
or
μ(t s ) =
1
λmax [Φ( p , t s )Φ T ( p , t s )]
(46)
.
Combining (42)–(46) gives the following interval-varying MI-Proj identification algorithm (V-MI-Proj):
ˆ t s ) = θ( ˆ t s−1 ) + μ(t s )Φ( p , t s ) Y ( p , t s ) − Φ T ( p , t s )θ( ˆ t s −1 ) , θ(
μ(t s ) =
Φ( p , t s ) E ( p , t s )
2
Φ T ( p , t s )Φ( p , t s ) E ( p , t s )2
or
μ(t s ) =
s = 1, 2, 3, . . . , 1
λmax [Φ( p , t s )Φ T ( p , t s )]
,
ˆ t ) = θ( ˆ t s ), t s t < t s+1 , θ( Φ( p , t s ) = ϕ (t s ), ϕ (t s − 1), . . . , ϕ (t s − p + 1) , T Y ( p , t s ) = y (t s ), y (t s − 1), . . . , y (t s − p + 1) . When the interval t s∗ = 1, the V-MI-Proj algorithm reduces to the MI-Proj algorithm in (16)–(20).
(47) (48) (49) (50) (51)
F. Ding / Digital Signal Processing 20 (2010) 1027–1039
1033
Furthermore, we can derive the interval-varying MISG (V-MISG) algorithm:
ˆ t s ) = θ( ˆ t s−1 ) + Φ( p , t s ) Y ( p , t s ) − Φ T ( p , t s )θ( ˆ t s −1 ) , θ( r (t s ) ˆθ(t ) = θ( ˆ t s ), t s t < t s+1 , 2 r (t s ) = r (t s−1 ) + ϕ (t s ) , r (0) = 1, Φ( p , t s ) = ϕ (t s ), ϕ (t s − 1), . . . , ϕ (t s − p + 1) , T Y ( p , t s ) = y (t s ), y (t s − 1), . . . , y (t s − p + 1) .
(52) (53) (54) (55) (56)
Eq. (54) may be replaced with
2
r (t s ) = r (t s−1 ) + Φ( p , t s ) ,
r (0) = 1.
(57)
3.2. The interval-varying MILS algorithm Define the cost function:
J 6 (θ) :=
s Y ( p , t i ) − Φ T ( p , t i )θ 2 , i =1
and
⎡ Z (t s ) := ⎢
⎣
⎡
⎤
Y ( p, t1 ) ⎢ Y ( p, t2 ) ⎥ ⎢ ⎥
Z (t s−1 ) ∈ R ps , ⎥= .. Y ( p, ts ) ⎦ . Y ( p, ts )
⎤ Φ T ( p, t1 ) T ⎢ Φ ( p, t2 ) ⎥ H (t s−1 ) ⎢ ⎥ ∈ R( ps)×n . H (t s ) := ⎢ = ⎥ .. Φ T ( p, ts ) ⎣ ⎦ . Φ T ( p, ts )
Then J 6 can be rewritten as
J 6 (θ) = Z (t s ) − H (t s )θ
T
Z (t s ) − H (t s )θ .
ˆ t s ) gives Letting the derivative of J 6 with respect to θ at θ = θ(
∂{[ Z (t s ) − H (t s )θ ]T [ Z (t s ) − H (t s )θ]} ∂ J 6 (θ ) = ˆ = 0. ∂θ θ =θ( ∂θ ˆ ts ) θ =θ (t s ) Using the formulae
∂ T a x = a, ∂x
∂ T x Ax = A + A T x ∂x
gives
ˆ t s ) = H T (t s ) Z (t s ) H T (t s ) H (t s ) θ(
or
ˆ t s ) = H T (t s ) H (t s ) −1 H T (t s ) Z (t s ) θ( −1 s s T T Φ( p , t i )Φ ( p , t i ) Φ( p , t i )Y ( p , t i ) . = i =1
(58)
i =1
Let
P −1 (t s ) :=
s
Φ( p , t i )Φ T ( p , t i ) = P −1 (t s−1 ) + Φ( p , t s )Φ T ( p , t s ).
i =1
Using (58) and (59), we have
(59)
1034
F. Ding / Digital Signal Processing 20 (2010) 1027–1039
ˆ t s ) = H T (t s ) H (t s ) −1 H T (t s ) Z (t s ) θ( T H (t s−1 ) Z (t s−1 ) = P (t s ) Y ( p, ts ) Φ T (t s ) T Z (t s−1 ) = P (t s ) H (t s−1 ), Φ( p , t s ) Y ( p, ts ) T = P (t s ) H (t s−1 ) Z (t s−1 ) + Φ( p , t s )Y ( p , t s ) = P (t s ) P −1 (t s−1 ) P (t s−1 ) H T (t s−1 ) Z (t s−1 ) + Φ( p , t s )Y ( p , t s ) = P (t s ) P −1 (t s−1 )θˆ (t s−1 ) + Φ( p , t s )Y ( p , t s ) ˆ t s−1 ) + P (t s )Φ( p , t s )Y ( p , t s ) = P (t s ) P −1 (t s ) − Φ( p , t s )Φ T ( p , t s ) θ( ˆ t s−1 ) + P (t s )Φ( p , t s ) Y ( p , t s ) − Φ T ( p , t s )θ( ˆ t s −1 ) . = θ(
(60)
Applying the matrix inversion formula (9) to (59) gives
−1
P (t s ) = P (t s−1 ) − P (t s−1 )Φ( p , t s ) I p + Φ T ( p , t s ) P (t s−1 )Φ( p , t s )
Φ T ( p , t s ) P (t s−1 ).
(61)
Post-multiplying by Φ( p , t s ) gives
−1
P (t s )Φ( p , t s ) = P (t s−1 )Φ( p , t s ) − P (t s−1 )Φ( p , t s ) I p + Φ T ( p , t s ) P (t s−1 )Φ( p , t s )
−1 = P (t s−1 )Φ( p , t s ) I p + Φ T ( p , t s ) P (t s−1 )Φ( p , t s ) .
Φ T ( p , t s ) P (t s−1 )Φ( p , t s ) (62)
Let L (t s ) := P (t s )Φ( p , t s ). Combining (59)–(62) gives the interval-varying MILS (V-MILS) algorithm:
ˆ t s ) = θ( ˆ t s−1 ) + P (t s )Φ( p , t s ) E ( p , t s ), θ( ˆ t s−1 ), E ( p , t s ) = Y ( p , t s ) − Φ T ( p , t s )θ( ˆ t ) = θ( ˆ t s ), θ( −1
s = 1, 2, 3, . . . ,
(63) (64)
t s t < t s +1 , −1
(65) T
(t s ) = P (t s−1 ) + Φ( p , t s )Φ ( p , t s ), P (0) = p 0 I n , Φ( p , t s ) = ϕ (t s ), ϕ (t s − 1), . . . , ϕ (t s − p + 1) , T Y ( p , t s ) = y (t s ), y (t s − 1), . . . , y (t s − p + 1) , P
(66) (67) (68)
or
ˆ t s ) = θ( ˆ t s−1 ) + L (t s ) Y ( p , t s ) − Φ T ( p , t s )θ( ˆ t s −1 ) , θ(
(69)
ˆ t ) = θ( ˆ t s ), θ(
(70)
t s t < t s +1 ,
−1 L (t s ) = P (t s−1 )Φ( p , t s ) I p + Φ T ( p , t s ) P (t s−1 )Φ( p , t s ) , −1 T P (t s ) = P (t s−1 ) − P (t s−1 )Φ( p , t s ) I p + Φ T ( p , t s ) P (t s−1 )Φ( p , t s ) Φ ( p , t s ) P (t s−1 )
= P (t s−1 ) − L (t s )Φ T ( p , t s ) P (t s−1 ), Φ( p , t s ) = ϕ (t s ), ϕ (t s − 1), . . . , ϕ (t s − p + 1) ∈ Rn× p , T Y ( p , t s ) = y (t s ), y (t s − 1), . . . , y (t s − p + 1) ∈ R p .
(71)
(72) P (0) = p 0 I n ,
(73) (74)
When t s∗ = 1, V-MILS = MILS; when p = 1, V-MILS = V-RLS; when p = 1 and t s∗ = 1, V-MILS = RLS. 4. Other identification algorithms 4.1. The modified stochastic gradient algorithm The SG identification algorithm in (5)–(6) has slow convergence rates. In order to improve the convergence rate of the parameter estimation, we present a novel SG algorithm by introducing a convergence index in Eq. (5), which is called the modified SG algorithm (the MSG algorithm or -SG algorithm for short). The MSG algorithm is as follows:
ˆ t ) = θ( ˆ t − 1) + ϕ (t ) y (t ) − ϕ T (t )θ( ˆ t − 1) , θ( r (t ) 2 r (t ) = r (t − 1) + ϕ (t ) , r (0) = 1.
1 2
< 1,
(75) (76)
F. Ding / Digital Signal Processing 20 (2010) 1027–1039
The analysis shows that if the convergence index satisfies can be guaranteed.
1 2
1035
< 1, then the convergence of the MSG algorithm in (75)–(76)
4.2. The multi-innovation modification stochastic gradient algorithm In order to improve the convergence rate of the MISG algorithm, introducing a convergence index in the MISG algorithm in (26)–(30) gives a multi-innovation modification stochastic gradient algorithm (the MI-MSG algorithm for short):
ˆ t ) = θ( ˆ t − 1) + Φ( p , t ) E ( p , t ), θ( r (t ) E ( p , t ) = Y ( p , t ) − Φ T ( p , t )θˆ (t − 1), 2 r (t ) = r (t − 1) + Φ( p , t ) , r (0) = 1, Φ( p , t ) = ϕ (t ), ϕ (t − 1), . . . , ϕ (t − p + 1) , T Y ( p , t ) = y (t ), y (t − 1), . . . , y (t − p + 1) .
(77) (78) (79) (80) (81)
Eq. (79) may be replaced with
2
r (t ) = r (t − 1) + ϕ (t ) , When
r (0) = 1.
(82)
= 1, the MI-MSG algorithm in (77)–(81) reduces to the MISG algorithm in (26)–(30).
4.3. The multi-innovation forgetting gradient algorithm In order to track the time-varying parameters, introducing a forgetting factor λ in the MISG algorithm in (26)–(30) leads to the following multi-innovation stochastic gradient algorithm with a forgetting factor (the multi-innovation forgetting gradient (MIFG) algorithm for short) in [6]:
ˆ t ) = θ( ˆ t − 1) + Φ( p , t ) E ( p , t ), θ( r (t ) E ( p , t ) = Y ( p , t ) − Φ T ( p , t )θˆ (t − 1), 2 r (t ) = λr (t − 1) + Φ( p , t ) , 0 < λ < 1, r (0) = 1, Φ( p , t ) = ϕ (t ), ϕ (t − 1), . . . , ϕ (t − p + 1) , T Y ( p , t ) = y (t ), y (t − 1), . . . , y (t − p + 1) .
(83) (84) (85) (86) (87)
Eq. (85) may be replaced with
2
r (t ) = λr (t − 1) + ϕ (t ) ,
0 < λ < 1, r (0) = 1.
(88)
As λ = 1, the MIFG algorithm in (83)–(87) reduces to the MISG algorithm in (26)–(30). 5. Examples Example 1. Consider the following system:
A ( z) y (t ) = B ( z)u (t ) + v (t ), where u (t ) and y (t ) are the system input and output, v (t ) a white noise with zero mean, A ( z) and B ( z) are polynomials in the unit delay operator z−1 [ z−1 y (t ) = y (t − 1)], and
A ( z) = 1 + a1 z−1 + a2 z−2 = 1 − 1.60z−1 + 0.80z−2 , B ( z) = b1 z−1 + b2 z−2 = 0.40z−1 + 0.30z−2 . Define the parameter vector θ and information vector
ϕ (t ) as
θ = [a1 , a2 , b1 , b2 ]T , T ϕ (t ) = − y (t − 1), − y (t − 2), u (t − 1), u (t − 2) . Then this example system can be rewritten as
y (t ) = ϕ T (t )θ + v (t ).
1036
F. Ding / Digital Signal Processing 20 (2010) 1027–1039
Table 1 The SG and MISG estimates and errors (σ 2 = 0.102 , δns = 14.69%). Algorithms
t
a1
a2
b1
b2
δ (%)
SG (MISG, p = 1)
100 200 500 1000 2000 3000
−0.83821 −0.85855 −0.88638 −0.90877 −0.92639 −0.93687
0.03825 0.06116 0.09153 0.11536 0.13312 0.14447
0.28133 0.28906 0.30010 0.30527 0.31059 0.31348
0.46369 0.47087 0.47724 0.48385 0.48854 0.49105
59.01289 57.41133 55.23552 53.54923 52.25394 51.45554
MISG, p = 4
100 200 500 1000 2000 3000
−1.11505 −1.15684 −1.21283 −1.25939 −1.28937 −1.30702
0.30868 0.36127 0.42945 0.46925 0.49875 0.51597
0.51381 0.49378 0.47742 0.46326 0.45532 0.45070
0.48569 0.48632 0.48004 0.47566 0.46960 0.46523
38.97234 35.40197 30.72178 27.46641 25.19876 23.85827
MISG, p = 8
100 200 500 1000 2000 3000
−1.57099 −1.56221 −1.56848 −1.57620 −1.57985 −1.58197
0.74557 0.76171 0.77275 0.77728 0.78047 0.78196
0.46999 0.44137 0.42832 0.41769 0.41282 0.41094
0.33747 0.33259 0.32491 0.32012 0.31673 0.31497
5.41259 4.05296 3.02560 2.28459 1.88964 1.69758
−1.60000
0.80000
0.40000
0.30000
b1
True values
Table 2 The SG and MISG estimates and errors (σ 2 = 1.002 , δns = 146.91%). Algorithms
t
a1
a2
b2
δ (%)
SG (MISG, p = 1)
100 200 500 1000 2000 3000
−0.87072 −0.88160 −0.90365 −0.92389 −0.94799 −0.96034
0.04447 0.05860 0.09235 0.12081 0.14509 0.15875
−0.06449 −0.05859 −0.04776 −0.03994 −0.03227 −0.02714
0.30065 0.30448 0.30842 0.31378 0.31853 0.32142
61.81849 60.81802 58.63754 56.77716 54.93691 53.92606
MISG, p = 4
100 200 500 1000 2000 3000
−1.29183 −1.30998 −1.34411 −1.38284 −1.41740 −1.42981
0.46943 0.49387 0.55052 0.58344 0.61241 0.62912
0.16033 0.16863 0.19205 0.20437 0.21785 0.22671
0.08844 0.10635 0.12648 0.14528 0.16050 0.16903
29.80354 27.91576 24.14180 21.28217 18.74067 17.47456
MISG, p = 8
100 200 500 1000 2000 3000
−1.71320 −1.68476 −1.63848 −1.64101 −1.63267 −1.62130
0.87367 0.86768 0.87032 0.84000 0.82925 0.82435
0.47940 0.46247 0.46726 0.45268 0.44697 0.44498
0.32046 0.32218 0.31480 0.30899 0.30587 0.30398
8.50669 6.84364 5.68987 4.21765 3.47388 2.99053
−1.60000
0.80000
0.40000
0.30000
True values
The inputs {u (t )} is taken as an uncorrelated persistent excitation signal sequence with zero mean and unit variance, and { v (t )} as a white noise sequence with zero mean and variance σ 2 . Applying the SG and MISG algorithms to estimate the parameters of this system, the parameter estimates and their errors with different innovation lengths are shown in ˆ t ) − θ/θ versus t are shown in Figs. 1 and 2 with p = 1, Tables 1 and 2 and the parameter estimation errors δ := θ( p = 4 and p = 8, where σ 2 = 0.102 and σ 2 = 1.002 , the corresponding noise-to-signal ratios of the system are δns = 14.69% and δns = 146.91%, respectively. Example 2. Consider the following system:
A ( z) y (t ) = B ( z)u (t ) + v (t ), A ( z) = 1 + a1 z−1 + a2 z−2 = 1 − 0.52z−1 + 0.50z−2 , B ( z) = b1 z−1 + b2 z−2 = 0.12z−1 + 0.65z−2 .
F. Ding / Digital Signal Processing 20 (2010) 1027–1039
1037
Fig. 1. The MISG estimation errors δ versus t (σ 2 = 0.102 ).
Fig. 2. The MISG estimation errors δ versus t (σ 2 = 1.002 ).
The simulation conditions are the same as in Example 1. Applying the V-RLS algorithm and V-MILS algorithm with p = 2 to estimate the parameters of this system, the results are shown in Table 3 and Fig. 3, where σ 2 = 1.002 , the corresponding noise-to-signal ratio is δns = 142.71%. From the simulation results in Tables 1 to 3 and Figs. 1 to 3, we can draw the following conclusions:
• The parameter estimation errors given by the MISG algorithm decrease with the innovation length p increasing – see Figs. 1 and 2, thus the MISG estimates with p 2 have higher accuracy than those of the SG algorithm; a low noise level results in a fast rate of convergence of the parameter estimates to the true values – see Tables 1 and 2.
• The V-MILS algorithm with p 2 have faster convergence rates than the V-RLS algorithm – see Table 3 and Fig. 3. • A large innovation length p leads to small parameter estimation errors for the same data length, but the price paid is a large computational effort. On the other hand, the increased computation is still tolerable and affordable, especially in nowadays that the computing power of computers doubles day by day. 6. Conclusions This paper constructs the stacked output vector and stacked information matrix and derives the multi-innovation projection algorithm, multi-innovation SG and multi-innovation RLS algorithms to improve the convergence rates of the SG and RLS algorithms for linear regression models. The proposed methods can generate highly accurate parameter estimates, compared with the SG and RLS algorithms. Finally, an interval-varying multi-innovation SG and RLS algorithms are given for handling missing data cases.
1038
F. Ding / Digital Signal Processing 20 (2010) 1027–1039
Table 3 The V-RLS and V-MILS estimates and errors (σ 2 = 1.002 , δns = 142.71%). Algorithms
t
a1
a2
b1
b2
δ (%)
V-RLS (V-MILS, p = 1)
100 200 500 1000 1500 2000
−0.48534 −0.48692 −0.52036 −0.52021 −0.52734 −0.51862
0.36587 0.41060 0.45447 0.50105 0.49657 0.49517
0.05397 0.17262 0.13700 0.18223 0.17855 0.14727
0.89900 0.81523 0.80604 0.74090 0.69712 0.67698
29.89506 20.22411 16.70353 11.25933 7.72617 3.95403
V-MILS, p = 2
100 200 500 1000 1500 2000
−0.49465 −0.48227 −0.52393 −0.52903 −0.52878 −0.52580
0.39268 0.43335 0.49209 0.51303 0.50466 0.50114
0.06316 0.10919 0.09241 0.13421 0.13253 0.11684
0.82179 0.73236 0.73066 0.69235 0.66512 0.64829
21.65721 11.54791 8.75977 4.84469 2.24933 0.70647
−0.52000
0.50000
0.12000
0.65000
True values
Fig. 3. The MILS estimation errors δ versus s (σ 2 = 1.002 ).
The multi-innovation theory can be extended to study identification problems for non-uniformly sampled multirate systems [17,18], nonlinear systems [19–21], time-series models [22,23], and can be combined with the hierarchical identification principle in [24,25] to propose new hierarchical multi-innovation identification methods. The multi-innovation identification methods in the paper can be applied to estimate system parameters as the basis of designing filters or feedback control laws for uncertain systems or multirate systems [26–30]. References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]
M.B. Malik, M. Salman, State-space least mean square, Digital Signal Process. 18 (3) (2008) 334–345. C. James, Total least squares, matrix enhancement, and signal processing, Digital Signal Process. 4 (1) (1994) 21–39. L.L. Han, F. Ding, Multi-innovation stochastic gradient algorithms for multi-input multi-output systems, Digital Signal Process. 19 (4) (2009) 545–554. G.C. Goodwin, K.S. Sin, Adaptive Filtering, Prediction and Control, Prentice-Hall, Upper Saddle River, NJ, 1984. F. Ding, P.X. Liu, H.Z. Yang, Parameter identification and intersample output estimation for dual-rate systems, IEEE Trans. Systems Man Cybernet. A 38 (4) (2008) 966–975. F. Ding, T. Chen, Performance analysis of multi-innovation gradient type identification methods, Automatica 43 (1) (2007) 1–14. F. Ding, P.X. Liu, G. Liu, Auxiliary model based multi-innovation extended stochastic gradient parameter estimation with colored measurement noises, Signal Process. 89 (10) (2009) 1883–1890. F. Ding, P.X. Liu, G. Liu, Multi-innovation least squares identification for system modelling, IEEE Trans. Systems Man Cybernet. B 40 (2010), doi:10.1109/TSMCB.2009.2028871, in press. D.Q. Wang, F. Ding, Performance analysis of the auxiliary models based multi-innovation stochastic gradient estimation algorithm for output error systems, Digital Signal Process. (2010), doi:10.1016/j.dsp.2009.09.002, in press. L.L. Han, F. Ding, Identification for multirate multi-input systems using the multi-innovation identification theory, Comput. Math. Appl. 57 (9) (2009) 1438–1449. L.L. Han, J. Sheng, F. Ding, Y. Shi, Auxiliary model identification method for multirate multi-input systems based on least squares, Math. Comput. Modelling 50 (7–8) (2009) 1100–1106. Y.J. Liu, Y.S. Xiao, X.L. Zhao, Multi-innovation stochastic gradient algorithm for multiple-input single-output systems using the auxiliary model, Appl. Math. Comput. 215 (4) (2009) 1477–1483.
F. Ding / Digital Signal Processing 20 (2010) 1027–1039
1039
[13] F. Ding, T. Chen, Combined parameter and output estimation of dual-rate systems using an auxiliary model, Automatica 40 (10) (2004) 1739–1748. [14] F. Ding, T. Chen, Parameter estimation of dual-rate stochastic systems by using an output error method, IEEE Trans. Automat. Control 50 (9) (2005) 1436–1441. [15] J.B. Zhang, F. Ding, Y. Shi, Self-tuning control based on multi-innovation stochastic gradient parameter estimation, Systems Control Lett. 58 (1) (2009) 69–75. [16] L. Ljung, System Identification: Theory for the User, 2nd ed., Prentice-Hall, Englewood Cliffs, NJ, 1999. [17] F. Ding, L. Qiu, T. Chen, Reconstruction of continuous-time systems from their non-uniformly sampled discrete-time systems, Automatica 45 (2) (2009) 324–332. [18] Y.J. Liu, L. Xie, F. Ding, An auxiliary model based recursive least squares parameter estimation algorithm for non-uniformly sampled multirate systems, in: Proceedings of the Institution of Mechanical Engineers, Part I, J. Syst. Control Eng. 223 (4) (2009) 445–454. [19] F. Ding, T. Chen, Identification of Hammerstein nonlinear ARMAX systems, Automatica 41 (9) (2005) 1479–1480. [20] F. Ding, Y. Shi, T. Chen, Auxiliary model based least-squares identification methods for Hammerstein output-error systems, Systems Control Lett. 56 (5) (2007) 373–380. [21] D.Q. Wang, F. Ding, Extended stochastic gradient identification algorithms for Hammerstein–Wiener ARMAX systems, Comput. Math. Appl. 56 (12) (2008) 3157–3164. [22] F. Ding, Y. Shi, T. Chen, Performance analysis of estimation algorithms of non-stationary ARMA processes, IEEE Trans. Signal Process. 54 (3) (2006) 1041–1053. [23] F. Ding, Y. Shi, T. Chen, Amendments to “Performance analysis of estimation algorithms of non-stationary ARMA processes”, IEEE Trans. Signal Process. 56 (10) (2008) 4983–4984 (Part I). [24] F. Ding, T. Chen, Hierarchical gradient-based identification of multivariable discrete-time systems, Automatica 41 (2) (2005) 315–325. [25] F. Ding, T. Chen, Hierarchical least squares identification methods for multivariable systems, IEEE Trans. Automat. Control 50 (3) (2005) 397–402. [26] Y. Shi, B. Yu, Output feedback stabilization of networked control systems with random delays modeled by Markov chains, IEEE Trans. Automat. Control 54 (7) (2009) 1668–1674. [27] M. Yan, Y. Shi, Robust discrete-time sliding mode control for uncertain systems with time-varying state delay, IET Control Theory Appl. 2 (8) (2008) 662–674. [28] B. Yu, Y. Shi, H. Huang, l2 − l∞ filtering for multirate systems using lifted models, Circuits Systems Signal Process. 27 (5) (2008) 699–711. [29] Y. Shi, F. Ding, T. Chen, Multirate crosstalk identification in xDSL systems, IEEE Trans. Commun. 54 (10) (2006) 1878–1886. [30] Y. Shi, F. Ding, T. Chen, 2-Norm based recursive design of transmultiplexers with designable filter length, Circuits Systems Signal Process. 25 (4) (2006) 447–462.
Feng Ding was born in Guangshui, Hubei Province. He received the B.Sc. degree from the Hubei University of Technology (Wuhan, China) in 1984, and the M.Sc. and Ph.D. degrees in automatic control both from the Department of Automation, Tsinghua University in 1991 and 1994, respectively. From 1984 to 1988, he was an Electrical Engineer at the Hubei Pharmaceutical Factory, Xiangfan, China. From 1994 to 2002, he was with the Department of Automation, Tsinghua University, Beijing, China and he was a Research Associate at the University of Alberta, Edmonton, Canada from 2002 to 2005. He was a Visiting Professor in the Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada from May to December 2008 and a Research Associate in the Department of Aerospace Engineering, Ryerson University, Toronto, Canada, from January to October 2009. He has been a Professor in the School of Communication and Control Engineering, Jiangnan University, Wuxi, China since 2004. He is a Colleges and Universities “Blue Project” Middle-Aged Academic Leader (Jiangsu, China). His current research interests include model identification and adaptive control. He co-authored the book Adaptive Control Systems (Tsinghua University Press, Beijing, 2002), and published over 100 papers on modelling and identification as the first author.