Signal Processing 44 (1995) 103-115

Convergence of matrix LMS algorithms with applications to LMS/Newton

C. Horn^a,*, R.H. Kwong^b

^a Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA
^b Department of Electrical Engineering, University of Toronto, Toronto, Ont., Canada M5S 1A4

Received 8 April 1994; revised 16 January 1995
Abstract
Necessary and sufficient conditions for convergence of a general matrix LMS algorithm are derived in the case where the adaptation matrix and the input correlation matrix commute. It is shown that the LMS/Newton algorithm is an important special case of the introduced algorithm. From this fact we deduce that the bounds on adaptation gain commonly quoted in the literature for LMS/Newton and sequential regression algorithms are not strict enough and must be reduced to provide convergence of the weight vector covariance matrix. The steady-state performance of the algorithm as well as its performance in a nonstationary environment are also discussed. Previous results in this area are extended while using fewer assumptions. Finally, a class of nonlinear processes is introduced. Performance on an example process is given to show that our algorithm performs well in this environment, which naturally has large eigenvalue spreads.
* Corresponding author. E-mail: [email protected].
This work was partially supported by the Natural Sciences and Engineering Research Council of Canada.
Keywords: Matrix LMS algorithms; LMS/Newton; Sequential regression; Nonlinear moving average processes
1. Introduction

The least mean square (LMS) algorithm, perhaps the most famous algorithm in the area of adaptive signal processing, was introduced by Widrow and Hoff in the late 1950s. Since then, much analysis has been performed and it is now well known that the performance of the LMS algorithm degrades in the case of a large eigenvalue spread. Among the algorithms introduced to alleviate this problem are the LMS/Newton algorithm and the related sequential regression (SER) algorithm [13, 4]. This paper examines a general matrix LMS algorithm designed to alleviate the large eigenvalue spread problem. We start with a brief review of the system identification problem considered and some of the corresponding results for the LMS algorithm in Section 2. Section 3 introduces the matrix LMS algorithm and derives necessary and sufficient conditions for convergence in the case where the adaptation matrix and the input correlation matrix commute. This is presented in Theorem 3.3. Issues of nonstationary optimal weight vector and steady-state performance are discussed in Section 3.1. In Section 4, it is shown that the LMS/Newton algorithm is an important special case of the introduced algorithm. We then show that the bounds on adaptation gain commonly quoted in the literature for LMS/Newton and sequential regression algorithms are not strict enough and must be reduced to provide convergence of the weight vector covariance matrix. The correct bounds are presented in Theorem 4.1. In Section 5, the performance of the algorithm is examined in a nonlinear environment where large eigenvalue spreads arise. A nonlinear process called the nonlinear moving average
process is introduced, and it is seen that examining such an environment can in many cases be better motivated than examining a truncated Volterra process. Finally, concluding remarks are presented in Section 6.
2. System identification and the LMS algorithm

Consider the estimation of a signal y_k (usually referred to as the desired or training signal) using the estimate ŷ_k = φ_k^T θ_k, where φ_k is a measured data vector and θ_k is a weight vector. The problem is to find the weight vector which minimizes the mean square error ξ_k = E[e_k^2] = E[(y_k - ŷ_k)^2]. If φ_k and y_k are jointly stationary, then the well-known Wiener solution is
θ* = R^{-1}P,
where R = E[φ_k φ_k^T] is the input correlation matrix and P = E[y_k φ_k] is the cross-correlation vector. The vector θ* can be estimated recursively using the steepest descent algorithm
θ_{k+1} = θ_k + μ(-∇̂_k),
where ∇̂_k is an estimate of the gradient ∂ξ_k/∂θ_k and μ is the adaptation constant which regulates the speed and stability of adaptation. The least mean square algorithm, which is well established and documented [13], uses the gradient estimate ∂ξ_k/∂θ_k ≈ -2e_k φ_k, and thus the algorithm is given by
θ_{k+1} = θ_k + 2μ e_k φ_k.  (1)
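As a concrete illustration of the recursion (1), the following minimal sketch (our own, not part of the original paper; the data-generation model, step size and dimensions are assumed for the example) runs LMS on a simulated system identification problem.

```python
import numpy as np

def lms_identify(phi, y, mu, num_taps):
    """Plain LMS, Eq. (1): theta_{k+1} = theta_k + 2*mu*e_k*phi_k.

    phi : (K, N) array of data vectors phi_k
    y   : (K,)  array of desired signals y_k
    """
    theta = np.zeros(num_taps)
    errors = []
    for phi_k, y_k in zip(phi, y):
        e_k = y_k - phi_k @ theta            # a priori estimation error
        theta = theta + 2.0 * mu * e_k * phi_k
        errors.append(e_k)
    return theta, np.asarray(errors)

# Hypothetical usage: identify theta* = [1.0, -0.5] from noisy data.
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -0.5])
phi = rng.standard_normal((2000, 2))
y = phi @ theta_star + 0.01 * rng.standard_normal(2000)
theta_hat, _ = lms_identify(phi, y, mu=0.01, num_taps=2)
```

The step size in this sketch is chosen well inside the stability region discussed next.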
To simplify the analysis, we shall assume throughout, as in [3, 7], that the various data vectors are mutually independent and that the sequences φ_k and y_k are zero mean Gaussian random processes that are uncorrelated in time. It is well known [13] that E[θ_k] → θ* as k → ∞ for
0 < μ < 1/λ_max(R),  (2)
where λ_max(R) is the maximum eigenvalue of R. However, Refs. [3, 7] have shown that for convergence of the weight vector covariance matrix the more stringent conditions
0 < μ < 1/(2λ_max(R)),  (3)
Σ_{i=1}^N μλ_i/(1 - 2μλ_i) < 1  (4)
are required. It is also noted in [3] that the interval
0 < μ < 1/(3 tr(R)),  (5)
which is relatively easy to check, satisfies the above conditions.

3. A general matrix LMS algorithm (MCLMS)

It is well known that the LMS algorithm works most efficiently when the eigenvalues of the R matrix are equal or nearly equal. With disparate eigenvalues, however, stability is dependent on the fastest modes while settling time is limited by the slowest modes. One attempt to alleviate this problem is to use multiple adaptation constants: large ones for small eigenvalues and small ones for large eigenvalues. The adaptive algorithm can then be expressed as
θ_{k+1} = θ_k + 2M e_k φ_k,  (6)
with M ∈ R^{N×N} being the adaptation matrix. Other work with matrix LMS algorithms has been performed, but usually the matrix is assumed to be R^{-1} from the start. Also, necessary and sufficient conditions for convergence are not produced. We shall now examine the above general matrix algorithm (multiple adaptation constants LMS) in detail and develop necessary and sufficient conditions for convergence when M commutes with R. In a later section we shall relate it to other matrix algorithms, namely LMS/Newton and sequential regression. Assuming MR is diagonalizable, and following the same analysis as for the standard LMS algorithm, we obtain the condition
0 < λ_i(MR) < 1  ∀ λ_i(MR) ∈ σ(MR)  (7)
for convergence of the mean of the weight vector, where σ(MR) is the set of eigenvalues of the matrix MR. Furthermore, if R = diag(λ_1, ..., λ_N), then one can use a diagonal matrix M = diag(μ_1, ..., μ_N) with the condition
0 < μ_i < 1/λ_i,  i = 1, ..., N,  (8)
for convergence of E[θ_k] to θ*. It has been shown in [3, 7] that it is not enough to just consider the mean; the covariance must also be examined. Convergence of the covariance matrix C_k = E[θ̃_k θ̃_k^T], where the deviation vector is θ̃_k = θ_k - θ*, can be examined in a manner similar to that employed in [3]. We now perform such an analysis, which culminates in Theorem 3.3. To that end, first assume that R and M are diagonal matrices. Since
e_k = y_k - ŷ_k = y_k - φ_k^T θ* + φ_k^T θ* - φ_k^T θ_k = e_k* - φ_k^T θ̃_k,  (9)
where e_k* = y_k - φ_k^T θ*, we obtain the update algorithm in terms of the deviation vector
θ̃_{k+1} = θ̃_k + 2M e_k φ_k = (I - 2M φ_k φ_k^T)θ̃_k + 2M e_k* φ_k.
Note that
E[φ_k e_k*] = E[φ_k(y_k - φ_k^T θ*)] = P - Rθ* = 0.
Also note that θ̃_k depends only on {φ_i, y_i} for i < k and therefore is independent of φ_k. Therefore,
C_{k+1} = E[θ̃_{k+1} θ̃_{k+1}^T] = E[((I - 2M φ_k φ_k^T)θ̃_k + 2M e_k* φ_k)(θ̃_k^T(I - 2φ_k φ_k^T M) + 2e_k* φ_k^T M)]
= E[θ̃_k θ̃_k^T] - 2(MR E[θ̃_k θ̃_k^T] + E[θ̃_k θ̃_k^T]RM) + 4M E[φ_k φ_k^T θ̃_k θ̃_k^T φ_k φ_k^T]M + 4ξ* MRM
= C_k - 2MRC_k - 2C_kRM + 4M E[φ_k φ_k^T θ̃_k θ̃_k^T φ_k φ_k^T]M + 4ξ* MRM
= C_k - 2(MRC_k + C_kRM) + 8MRC_kRM + 4MR tr(RC_k)M + 4ξ* MRM,  (10)
where ξ* = E[(e_k*)^2] is the minimum mean square error and, as pointed out in [7], E[φ_k φ_k^T C_k φ_k φ_k^T] = 2RC_kR + R tr(RC_k) due to the Gaussian assumption on φ_k. Eq. (10) can be divided into the diagonal terms and off-diagonal terms. For i ≠ j:
[c_ij]_{k+1} = [c_ij]_k - 2μ_iλ_i[c_ij]_k - 2μ_jλ_j[c_ij]_k + 8μ_iλ_iμ_jλ_j[c_ij]_k = ρ_ij[c_ij]_k,
where ρ_ij = 1 - 2(μ_iλ_i + μ_jλ_j) + 8μ_iλ_iμ_jλ_j. For i = j:
[c_ii]_{k+1} = ρ_ii[c_ii]_k + 4μ_i^2 λ_i Σ_{p=1}^N λ_p[c_pp]_k + 4μ_i^2 λ_i ξ*,  i = 1, ..., N,
where ρ_ii = 1 - 4μ_iλ_i + 8μ_i^2 λ_i^2 = (1 - 2μ_iλ_i)^2 + 4μ_i^2 λ_i^2 > 0. Since [c_ij]_k^2 ≤ [c_ii]_k[c_jj]_k, convergence of C_k will be assured by the convergence of the diagonal elements. Let us define
x_k^T = [[c_11]_k, ..., [c_NN]_k].
Then x_k satisfies the equation
x_{k+1} = F x_k + 4ξ* M^2 Λ,  (11)
where Λ^T = [λ_1, ..., λ_N] and F = diag(ρ_11, ..., ρ_NN) + 4M^2 Λ Λ^T. Eq. (11) shows that the condition
λ_max(F) < 1  (12)
is necessary and sufficient for convergence of C_k. By using arguments similar to those in [3], it can be shown that Eq. (12) is equivalent to the conditions
0 < μ_i < 1/(2λ_i),  i = 1, ..., N,  (13)
Σ_{i=1}^N μ_iλ_i/(1 - 2μ_iλ_i) < 1.  (14)
These conditions are thus necessary and sufficient for convergence of C_k. A simple way to generate μ_i satisfying (13) and (14) is to choose the μ_i such that μ_iλ_i = c, where c is a constant. Then conditions (13) and (14) simplify to
0 < μ_i < 1/((N + 2)λ_i),  i = 1, ..., N,  (15)
since with μ_iλ_i = c condition (14) becomes Nc/(1 - 2c) < 1, i.e. c < 1/(N + 2), which in turn implies (13). It is readily seen, with this simplification, that all μ_i of the MCLMS design can be made strictly greater than the number given by Eq. (5) if
(N + 2)λ_max(R) < 3 tr(R)  (16)
holds. If not, one can still always find a set of μ_i with all being greater than or equal to that number.

Example 3.1. If
R = diag(1, 1, 10),
then 3 tr(R) = 36 but (N + 2)λ_max(R) = 50, so that Eq. (16) is not satisfied. However, one can see that choosing μ_min = 1/(3 tr(R)) and then solving for c = μ_iλ_i, i < N, from
(N - 1)c/(1 - 2c) = 1 - [λ_max(R)/(3 tr(R))]/[1 - 2λ_max(R)/(3 tr(R))] = (3 tr(R) - 3λ_max(R))/(3 tr(R) - 2λ_max(R))
to give
c = (3 tr(R) - 3λ_max(R))/(3(N + 1) tr(R) - 2(N + 2)λ_max(R))  (17)
provides the regions 0 < μ_1, μ_2 < 3/22, 0 < μ_3 < 1/36, which satisfy the conditions to give convergence of the weight vector covariance matrix.

So far we have developed convergence conditions for diagonal R (and therefore diagonal M) matrices. We now relax that assumption and consider a general symmetric R matrix. We can then extend conditions (8), (13) and (14) by using the following lemma.

Lemma 3.2. For real, symmetric matrices A and B, there exists an orthogonal matrix Q such that
Q^T A Q = diag(λ_1, ..., λ_N)  and  Q^T B Q = diag(μ_1, ..., μ_N)
iff A and B commute.

Proof. This is a standard result in linear algebra. For example, see [1]. □

The previous lemma together with our analysis for diagonal matrices leads to our main theorem on MCLMS convergence.

Theorem 3.3 (MCLMS convergence). Suppose M commutes with R, so that there exists Q such that M = Q Λ_M Q^T and R = Q Λ_R Q^T with Λ_M = diag(μ_1, ..., μ_N) and Λ_R = diag(λ_1, ..., λ_N). Then the MCLMS algorithm θ_{k+1} = θ_k + 2M e_k φ_k provides (a) convergence of the mean of the weight vector iff condition (8), 0 < μ_i < 1/λ_i, holds; (b) convergence of the weight vector covariance iff conditions (13), 0 < μ_i < 1/(2λ_i), and (14), Σ_{i=1}^N μ_iλ_i/(1 - 2μ_iλ_i) < 1, hold.

Theorem 3.3 suggests that if we are given R = Q Λ_R Q^T, then we can design M by designing a diagonal Λ_M to satisfy the appropriate conditions for the diagonal matrix Λ_R, and then we can set
M = Q Λ_M Q^T.  (18)

Proof of Theorem 3.3. (a) Λ_M Λ_R = Q^T M Q Q^T R Q = Q^T MR Q = Λ_{MR}. Therefore, λ_i(MR) = μ_i λ_i, i = 1, ..., N, and the condition
0 < μ_i < 1/λ_i
is obtained as in Eq. (8).
(b) Let θ̃'_k = Q^T θ̃_k. Then
C'_{k+1} = E[θ̃'_{k+1} θ̃'^T_{k+1}] = E[Q^T θ̃_{k+1} θ̃_{k+1}^T Q] = Q^T C_{k+1} Q
= Q^T[C_k - 2(MRC_k + C_kRM) + 4MR tr(RC_k)M + 8MRC_kRM + 4ξ* MRM]Q
= C'_k - 2(Λ_M Λ_R C'_k + C'_k Λ_R Λ_M) + 8Λ_M Λ_R C'_k Λ_M Λ_R + 4Λ_M Λ_R tr(Λ_R C'_k)Λ_M + 4ξ* Λ_M Λ_R Λ_M,
which is equivalent to Eq. (11) and thus leads to Eqs. (13) and (14). □

A significant advantage over the LMS algorithm is seen by noticing that one can now control all of the time constants of the system. Therefore, stability and a faster speed of convergence can be simultaneously achieved. However, the price that must be paid for this advantage is the requirement of a better estimate of R. We shall discuss this point further in Section 4.
3.1. Nonstationary optimal weight vector and steady-state performance

So far, the performance of the MCLMS algorithm in a stationary environment, where θ* is a constant, has been discussed. Now we consider
a situation where θ* varies with time, and we shall explicitly assume that y_k and ŷ_k are of the form
y_k = φ_k^T θ_k* + n_k,  (19)
ŷ_k = φ_k^T θ_k,
where n_k is a zero mean additive noise, independent of φ_k. We shall assume the optimal weight vector θ_k* to be nonstationary, specifically generated by the equations
Θ_{k+1} = a Θ_k + Z_k,  (20)
θ_k* = Θ_k + θ*,  (21)
with |a| < 1 and Θ_0, Z_k mutually independent, also independent of φ_k and n_k, that satisfy E[Θ_0 Θ_0^T] = σ_0^2 I, E[Z_k] = 0 ∀k, and E[Z_j Z_k^T] = σ_z^2 I δ_{jk}. This nonstationary situation represents a Markov process converging in the mean to θ*. A slightly less general framework similar to the one above has been examined in [9, 12], but in both cases the simplifying assumption that a is approximately 1 is made. Our work handles the more general framework while also not requiring any simplifying assumptions. The minimum mean square error shall again be denoted ξ* and it is defined by
ξ* = E[(y_k - φ_k^T θ_k*)^2] = E[n_k^2].  (22)
Also the error at each iteration is now
e_k = y_k - ŷ_k = n_k - φ_k^T(θ_k - θ_k*).
The MCLMS equation (6) will now be discussed in this context. We will show that convergence of the weight vector is still obtained in this nonstationary optimal weight vector environment. Since
θ̃_{k+1} = θ_{k+1} - θ*_{k+1} = θ_k + 2M e_k φ_k - θ*_k + (1 - a)Θ_k - Z_k
= θ̃_k + 2M φ_k(n_k - φ_k^T θ̃_k) + (1 - a)Θ_k - Z_k
= (I - 2M φ_k φ_k^T)θ̃_k + 2M φ_k n_k + (1 - a)Θ_k - Z_k,  (23)
we obtain the following theorems.

Theorem 3.4. Condition (8), 0 < μ_i < 1/λ_i, is still the necessary and sufficient condition for convergence of the mean of the weight vector.

Proof. Since E[φ_k n_k] = 0, we have from (23) that
E[θ̃_{k+1}] = (I - 2MR)E[θ̃_k] + (1 - a)E[Θ_k],
which implies
E[θ̃'_{k+1}] = (I - 2Λ_M Λ_R)E[θ̃'_k] + (1 - a)Q^T E[Θ_k].
Since E[Θ_k] → 0 for any |a| < 1, therefore E[θ̃_k] → 0 iff condition (8) holds. □

Theorem 3.5. Conditions (13), 0 < μ_i < 1/(2λ_i), and (14), Σ_{i=1}^N μ_iλ_i/(1 - 2μ_iλ_i) < 1, are still the necessary and sufficient conditions for convergence of the covariance of the weight vector.

Proof. Recall (23),
θ̃_{k+1} = (I - 2M φ_k φ_k^T)θ̃_k + 2M φ_k n_k + (1 - a)Θ_k - Z_k.
Then
θ̃'_{k+1} = Q^T(I - 2M φ_k φ_k^T)Q θ̃'_k + 2Q^T M φ_k n_k + (1 - a)Q^T Θ_k - Q^T Z_k,
and so
C'_{k+1} = E[θ̃'_{k+1} θ̃'^T_{k+1}]
= E[Q^T(I - 2M φ_k φ_k^T)Q θ̃'_k θ̃'^T_k Q^T(I - 2φ_k φ_k^T M)Q]
+ E[Q^T(I - 2M φ_k φ_k^T)Q θ̃'_k (1 - a)Θ_k^T Q]
+ E[(1 - a)Q^T Θ_k θ̃'^T_k Q^T(I - 2φ_k φ_k^T M)Q]
+ E[4Q^T M φ_k n_k^2 φ_k^T M Q] + E[(1 - a)^2 Q^T Θ_k Θ_k^T Q] + E[Q^T Z_k Z_k^T Q].
Note that
E[Q^T(I - 2M φ_k φ_k^T)Q θ̃'_k (1 - a)Θ_k^T Q] = (1 - a)(I - 2Λ_M Λ_R)E[θ̃'_k Θ_k^T Q].
Therefore,
C'_{k+1} = C'_k - 2(Λ_M Λ_R C'_k + C'_k Λ_R Λ_M) + 8Λ_M Λ_R C'_k Λ_M Λ_R + 4Λ_M Λ_R tr(Λ_R C'_k)Λ_M
+ 4ξ* Λ_M Λ_R Λ_M + (1 - a)(I - 2Λ_M Λ_R)(E[θ̃'_k Θ_k^T Q] + E[Q^T Θ_k θ̃'^T_k]) + (1 - a)^2 Q^T E[Θ_k Θ_k^T]Q + σ_z^2 I,
and so
[c_ii]_{k+1} = ρ_ii[c_ii]_k + 4μ_i^2 λ_i Σ_{p=1}^N λ_p[c_pp]_k + 4μ_i^2 λ_i ξ* + σ_z^2 + [β_i]_k,
where [β_i]_k represents the time-varying terms involving Θ_k. Now
E[θ̃'_k Θ_k^T Q] = E[(Q^T(I - 2M φ_{k-1} φ_{k-1}^T)Q θ̃'_{k-1} + 2Q^T M φ_{k-1} n_{k-1} + (1 - a)Q^T Θ_{k-1} - Q^T Z_{k-1})Θ_k^T Q]
= a(I - 2Λ_M Λ_R)E[θ̃'_{k-1} Θ_{k-1}^T Q] + (1 - a)Q^T E[Θ_{k-1} Θ_k^T]Q - σ_z^2 I.
Also, by iterating (20) we have
Θ_k = a^k Θ_0 + Σ_{j=0}^{k-1} a^{k-1-j} Z_j,
and so
E[Θ_k Θ_k^T] = E[(a^k Θ_0 + Σ_{j=0}^{k-1} a^{k-1-j} Z_j)(a^k Θ_0 + Σ_{j=0}^{k-1} a^{k-1-j} Z_j)^T] = a^{2k} σ_0^2 I + Σ_{j=0}^{k-1} a^{2j} σ_z^2 I.
Similarly,
E[Θ_{k-1} Θ_k^T] = a^{2k-1} σ_0^2 I + a Σ_{j=0}^{k-2} a^{2j} σ_z^2 I.
Therefore, we can conclude that [β_i]_k is bounded and converges exponentially to a bounded quantity due to the stability of the matrix (I - 2Λ_M Λ_R). If x_k is defined as before, x_k^T = [[c_11]_k, ..., [c_NN]_k], then x_k now satisfies the equation
x_{k+1} = F x_k + 4ξ* Λ_M^2 Λ + σ_z^2 1 + β_k,  (24)
where 1^T = [1, ..., 1], β_k^T = [[β_1]_k, ..., [β_N]_k], and Λ and F are as defined in (11). Convergence of this equation only depends on F, and so the same conditions for maintaining the eigenvalues of F inside the unit disc are obtained. □

Now a detailed analysis of the mean square error is presented. Recall that ξ_k = E[e_k^2] represents the mean square error. Then
ξ_k = E[(y_k - φ_k^T θ_k)^2] = E[(n_k - φ_k^T(θ_k - θ_k*))^2] = ξ* - 2E[n_k φ_k^T θ̃_k] + E[θ̃_k^T φ_k φ_k^T θ̃_k] = ξ* + tr(Λ_R C'_k) = ξ* + Λ^T x_k,
where the cross term vanishes since n_k is independent of φ_k and θ̃_k. Now assume that Λ_M is chosen so that x_k converges to a finite value x_∞. Then
ξ_∞ = ξ* + Λ^T x_∞ = ξ* + Λ^T(I - F)^{-1}[4ξ* Λ_M^2 Λ + σ_z^2 1 + β_∞],  (25)
where β_k → β_∞ exponentially and x_∞ is found from Eq. (24).
There are two measures of performance normally considered in the examination of LMS-type algorithms. The first measure of performance, called misadjustment, is defined by
M_s = (ξ_∞ - ξ*)/ξ*.  (26)
It measures the relative difference from the minimum mean square error. From (25) we obtain
M_s = 4Λ^T(I - F)^{-1} Λ_M^2 Λ + [σ_z^2 Λ^T(I - F)^{-1} 1 + Λ^T(I - F)^{-1} β_∞]/ξ*.
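As a numerical illustration of Eqs. (24)-(26) (a sketch of ours, not from the paper; the example values of μ_i, λ_i and ξ* are assumed), the steady-state diagonal covariance x_∞ and the misadjustment can be evaluated directly:

```python
import numpy as np

def misadjustment(mu, lam, xi_star, sigma_z2=0.0, beta_inf=None):
    """Evaluate x_inf from Eq. (24) and M_s = (xi_inf - xi_star)/xi_star, Eq. (26).

    mu, lam : per-mode adaptation constants mu_i and eigenvalues lambda_i.
    """
    mu, lam = np.asarray(mu, float), np.asarray(lam, float)
    N = len(lam)
    beta_inf = np.zeros(N) if beta_inf is None else np.asarray(beta_inf, float)
    rho = (1 - 2 * mu * lam) ** 2 + 4 * mu ** 2 * lam ** 2      # diagonal of F
    F = np.diag(rho) + 4 * np.outer(mu ** 2 * lam, lam)          # F as in Eq. (11)
    drive = 4 * xi_star * mu ** 2 * lam + sigma_z2 + beta_inf    # forcing term of Eq. (24)
    assert np.max(np.abs(np.linalg.eigvals(F))) < 1, "conditions (13)-(14) violated"
    x_inf = np.linalg.solve(np.eye(N) - F, drive)
    xi_inf = xi_star + lam @ x_inf                               # Eq. (25)
    return (xi_inf - xi_star) / xi_star

# Hypothetical example: two modes, stationary case (sigma_z2 = 0).
print(misadjustment(mu=[0.01, 0.001], lam=[1.0, 100.0], xi_star=1.0))
```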
The second measure of performance is the rate of convergence of the mean square error to its steady state. This is measured by the term
J = Σ_{k=0}^∞ (ξ_k - ξ_∞),  (27)
with small J indicating a fast convergence rate. Let Δ_k = x_k - x_∞. Then
Δ_{k+1} = F x_k + 4ξ* Λ_M^2 Λ + σ_z^2 1 + β_k - (F x_∞ + 4ξ* Λ_M^2 Λ + σ_z^2 1 + β_∞) = F Δ_k + (β_k - β_∞).
Therefore,
J = Σ_{k=0}^∞ Λ^T Δ_k = Λ^T Σ_{k=0}^∞ [F^k Δ_0 + Σ_{j=0}^{k-1} F^{k-1-j}(β_j - β_∞)] = Λ^T(I - F)^{-1}[Δ_0 + Σ_{k=0}^∞ (β_k - β_∞)],
the last sum being finite due to the exponential convergence of β_k. To find (I - F)^{-1}, use a variation of the matrix inversion formula,
(Γ + uv^T)^{-1} = Γ^{-1} - Γ^{-1}uv^T Γ^{-1}/(1 + v^T Γ^{-1}u),
to obtain
M_s = [Σ_{i=1}^N μ_iλ_i/(1 - 2μ_iλ_i) + (1/ξ*) Σ_{i=1}^N (σ_z^2 + [β_i]_∞)/(4μ_i(1 - 2μ_iλ_i))] / [1 - Σ_{i=1}^N μ_iλ_i/(1 - 2μ_iλ_i)]  (28)
and
J = [Σ_{i=1}^N ([Δ_0]_i + Σ_{k=0}^∞ ([β_i]_k - [β_i]_∞))/(4μ_i(1 - 2μ_iλ_i))] / [1 - Σ_{i=1}^N μ_iλ_i/(1 - 2μ_iλ_i)],  (29)
which extend similar equations in [3] to the matrix case and to this nonstationary optimal weight vector environment.
The above two equations show the tradeoff between misadjustment M_s and rate of convergence J. M_s is made smaller by decreasing the μ_i's, but decreasing the μ_i's increases J.

4. Relation to LMS/Newton and sequential regression (SER)

Let μ_i = c_i/λ_i, where 0 < c_i < 1/2 and Σ_{i=1}^N c_i/(1 - 2c_i) < 1. Then an examination of the MCLMS design equation (18) shows that we are in fact considering matrices M of the form M = Q Λ_M Q^T with Λ_M = diag(c_1/λ_1, ..., c_N/λ_N); with a little algebra one obtains
M = Q diag(c_1, ..., c_N) Q^T R^{-1}.
Now consider the LMS/Newton algorithm, which is given by
θ_{k+1} = θ_k + 2μ R^{-1} e_k φ_k.  (30)
This is a special case of (6) with M = μ R^{-1}, which certainly commutes with R. In fact, it is the important special case of MCLMS in which c_i = μ_iλ_i = μ ∀i. As a result, we have shown that the range 0 < μ < 1 commonly quoted in the literature for convergence of the LMS/Newton (and SER) algorithm is incorrect for convergence of the weight vector covariance matrix. The following theorem provides the correct result.

Theorem 4.1. A necessary and sufficient condition for convergence of the weight vector covariance matrix for the LMS/Newton (and SER) algorithm is 0 < μ < 1/(N + 2).

Proof. Follows from Eq. (15). □
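Theorem 4.1 is easy to check numerically. With M = μ R^{-1} every mode has μ_iλ_i = μ, so the diagonal covariance recursion (11) is the same for any R; the sketch below (ours, with assumed values of μ, N and the number of iterations) iterates it on either side of the bound 1/(N + 2) = 1/6 for N = 4.

```python
import numpy as np

def covariance_trace_lmsnewton(mu, N, xi_star=1.0, iters=500):
    """Iterate x_{k+1} = F x_k + 4 xi* M^2 Lambda (Eq. (11)) for LMS/Newton.

    With M = mu * R^{-1} we have mu_i * lambda_i = mu for every mode, so the
    iteration does not depend on R; we take lambda_i = 1 for simplicity.
    """
    lam = np.ones(N)
    mu_i = mu / lam
    rho = (1 - 2 * mu) ** 2 + 4 * mu ** 2
    F = np.diag(np.full(N, rho)) + 4 * np.outer(mu_i ** 2 * lam, lam)
    x = np.ones(N)                        # arbitrary initial diagonal covariance
    for _ in range(iters):
        x = F @ x + 4 * xi_star * mu_i ** 2 * lam
    return lam @ x                        # tr(R C_k), proportional to excess MSE

for mu in (0.10, 0.15, 1.0 / 6.0 + 0.01, 0.25):
    print(f"mu = {mu:.3f}, N = 4 -> tr(RC) after 500 steps: "
          f"{covariance_trace_lmsnewton(mu, N=4):.3e}")
```

The iterates stay bounded for μ below 1/6 and grow without bound for μ above it, consistent with the theorem.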
The following simulation shows that indeed the allowable region of μ in the LMS/Newton algorithm must be decreased to insure convergence. The system to be identified was
y_k = 20u_{1,k} - 20u_{2,k} + 5u_{3,k} - 5u_{4,k},
with the u_{i,k} being zero mean Gaussian signals with a diagonal correlation matrix R chosen to give a large eigenvalue spread (variances ranging from 1 to 10000). In terms of the terminology introduced previously,
φ_k^T = [u_{1,k}, u_{2,k}, u_{3,k}, u_{4,k}],  θ*^T = [20, -20, 5, -5].
Fig. 1 shows the divergence of the algorithm for μ = 0.2 and μ = 0.25, which are well below the value 1 quoted in the literature but above 1/6, which is the cutoff as specified by the above analysis. On the other hand, convergence occurs for μ = 0.15. Fifty runs were averaged in this simulation.

Fig. 1. Mean square error.
Fig. 2. Performance criterion versus μ.

It is useful to point out that in an important special case MCLMS can be viewed to give another derivation for the LMS/Newton algorithm. However, we should note that the MCLMS algorithm is more general than LMS/Newton and does not have to give only matrices of the form μ R^{-1}. Also, an examination of Eqs. (28) and (29) shows that in a general framework where we want to find the optimal M to minimize some performance criterion based on M_s and J, the optimal M will in general not be of the form μ R^{-1}. This is illustrated in the following example.
Example 4.2. In [3], the optimal μ for the LMS algorithm is found to minimize J for the case of equal eigenvalues. We shall also use J as our performance criterion. We shall assume a stationary environment, and so Eq. (29) simplifies to
J = [Σ_{i=1}^N [Δ_0]_i/(4μ_i(1 - 2μ_iλ_i))] / [1 - Σ_{i=1}^N μ_iλ_i/(1 - 2μ_iλ_i)].
Assume λ_1 = 1, λ_2 = 100, [Δ_0]_1 = 1, and [Δ_0]_2 = 1. We now show that, for this example with two nonequal eigenvalues, a value for J can be found which is smaller than the optimal value given by the LMS/Newton algorithm. To see this, examine Fig. 2. First consider the curve labelled "LMS/Newton", which provides a plot of J(μ) versus μ for the LMS/Newton algorithm. The optimal value was seen to be approximately at μ = 0.00183. Next consider the other curve labelled "MCLMS", which provides a plot of J(μ_1 = 0.00183, μ_2) versus μ_2. We see that we can obtain values of J which are lower than the optimal for the LMS/Newton algorithm but which still satisfy the MCLMS design equations. Note, however, that we do not claim that we have provided the optimal solution for the MCLMS case, but only that we can achieve a lower cost criterion than the LMS/Newton case. In general, an examination of the Kuhn-Tucker conditions for
optimization under inequality constraints must be performed.

The MCLMS algorithm assumes knowledge of R. To create a scheme for the case of unknown R, we can use a variable matrix M_k = Q_k Λ_{M_k} Q_k^T, which we update by designing Λ_{M_k} for Λ_{R̂_k}, where R̂_k = Q_k Λ_{R̂_k} Q_k^T. In this case it may be more practical to restrict to the special case c_i = μ ∀i, since the update would then only involve R̂_k and not Q_k explicitly. In a stationary environment (where R does not vary with time), rather than updating at each iteration, we suggest a compromise LMS/MCLMS algorithm which starts with the LMS algorithm and then switches to the MCLMS algorithm when an online estimate for R is obtained. This is computationally less intensive than an algorithm where an estimate R̂_k is generated for each k, but we still obtain the benefits that multiple adaptation constants provide. In this case, using the general MCLMS algorithm without restricting to the case c_i = μ ∀i will allow better control over the speed of adaptation.

5. Performance in a nonlinear environment

It is useful and instructive to examine performance in nonlinear environments, since these types of environments naturally give rise to large eigenvalue spreads. We shall examine the MCLMS algorithm in the case of a nonlinear moving average process, which we shall introduce shortly. Other adaptive filtering work has been done in nonlinear environments involving second-order Volterra systems (see [2, 8]), but we shall see that nonlinear moving average process models are in many cases better motivated since they are suggested by state space models.

5.1. The nonlinear moving average process

Consider a single-input single-output discrete time state affine system whose state equations are of the form
x_{k+1} = A(u_k)x_k + b(u_k),  y_k = c x_k,  (31)
where
A(u_k) = Σ_{i=0}^K A_i u_k^i,  b(u_k) = Σ_{i=0}^K b_i u_k^i,
with A_i ∈ R^{n×n}, b_i ∈ R^n, c ∈ R^{1×n}, x ∈ R^n, and u, y ∈ R. For K = 1, Eq. (31) describes a bilinear system. Applications include wastewater treatment and pH neutralization problems [5].

A subclass of state affine systems, called nonlinear moving averages, are those for which the output can be written as a finite length regression on functions of the input sequence alone. For example, a strictly lower (or upper) triangular A(u_k) will give a nonlinear moving average system, since then A(u_{k-1}) ⋯ A(u_{k-n}) = 0 for all k. A nonlinear moving average system of order n and degree K shall be denoted NMA(n, K). Define
Φ_k^1 = {u_k^p: 0 ≤ p ≤ K},  Φ_k^{i+1} = Φ_k^1 × zΦ_k^i,  i ≥ 1,
where '×' denotes a set product and 'z' is the forward shift operator defined by z x_k = x_{k+1}. Also assume that any replication of terms in the sets above is discarded. Finally, denote the class of linear combinations of functions in Φ_k^i with coefficients in R by ℒ(Φ_k^i). With these definitions, it can be shown [10] that the general input-output relation of an NMA(n, K) system can be written as z^n y_k ∈ ℒ(Φ_k^n). Thus, y_k ∈ z^{-n}ℒ(Φ_k^n). But
z^{-n}ℒ(Φ_k^n) = z^{-n}ℒ(Φ_k^1 × zΦ_k^1 × ⋯ × z^{n-1}Φ_k^1) = ℒ(z^{-n}Φ_k^1 × ⋯ × z^{-1}Φ_k^1) = ℒ(Φ_{k-n}^1 × ⋯ × Φ_{k-1}^1)  (32)
= ℒ({u_{k-1}^{p_1} u_{k-2}^{p_2} ⋯ u_{k-n}^{p_n}: 0 ≤ p_1, ..., p_n ≤ K}).
Therefore, the input-output model can be written as
y_k ∈ ℒ(Φ̃_{k-1}),  Φ̃_{k-1} = {u_{k-1}^{p_1} u_{k-2}^{p_2} ⋯ u_{k-n}^{p_n}: 0 ≤ p_1, ..., p_n ≤ K}.  (33)
With this representation, it is easy to see that in general there can be up to (K + 1)^n terms in the linear combination. The model y_k = φ_k^T θ* shall now be used for the NMA(n, K) case by a suitable ordering of the terms in the set Φ̃_{k-1}. Let the list (p_1, p_2, ..., p_n) denote the term u_{k-1}^{p_1} u_{k-2}^{p_2} ⋯ u_{k-n}^{p_n}. Then φ_k can be listed as
[(1,0, ..., 0), ..., (K,0, ..., 0), (0,1,0, ..., 0), (1,1,0, ..., 0), ..., (K,K, ..., K)]^T,  (34)
where the constant term, (0, 0, ..., 0), is left out. It should be noted that this listing is closely related to counting from 1 to (K + 1)^n - 1 in a number system that has a base of K + 1, since for a number p = (p_1, p_2, ..., p_n),
p = Σ_{i=1}^n p_i(K + 1)^{i-1}.
In general, the ith element of φ_k is
[i mod(K + 1), ⌊i/(K + 1)⌋ mod(K + 1), ..., ⌊i/(K + 1)^{n-1}⌋ mod(K + 1)],  (35)
where '⌊·⌋' is the integer round down, or floor, function. For example, ⌊4.9⌋ = 4. Thus, the NMA(n, K) model has been rearranged to fit the general model y_k = φ_k^T θ* with φ_k as defined in (34). Note that for NMA(n, K) systems, R becomes very complicated as the order increases, since it becomes based on higher-order moments of the input signal u. Since NMA systems are motivated by specific state space models, the NMA structure is more appropriate to examine in many cases than simply including all terms up to a certain order, as is done with Volterra processes of length N and order p. Refer to [6] for more details on NMA systems and a comparison with Volterra processes.
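The listing (34)-(35) is just a base-(K + 1) digit expansion of the element index, which the following sketch (our own illustration; the function name is an assumption) uses to build φ_k for an NMA(n, K) model.

```python
import numpy as np

def nma_regressor(u_past, K):
    """Build phi_k for an NMA(n, K) model from u_past = [u_{k-1}, ..., u_{k-n}].

    Element i (i = 1, ..., (K+1)^n - 1) is u_{k-1}^{p_1} * ... * u_{k-n}^{p_n},
    where (p_1, ..., p_n) are the base-(K+1) digits of i, as in Eqs. (34)-(35).
    """
    n = len(u_past)
    phi = []
    for i in range(1, (K + 1) ** n):           # constant term (0,...,0) is left out
        digits = [(i // (K + 1) ** j) % (K + 1) for j in range(n)]
        phi.append(np.prod([u_past[j] ** p for j, p in enumerate(digits)]))
    return np.array(phi)

# Hypothetical check for NMA(2, 2): 8 terms, ordered
# u_{k-1}, u_{k-1}^2, u_{k-2}, u_{k-1}u_{k-2}, u_{k-1}^2 u_{k-2},
# u_{k-2}^2, u_{k-1}u_{k-2}^2, u_{k-1}^2 u_{k-2}^2.
print(nma_regressor([2.0, 3.0], K=2))
```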
5.2. Performance of MCLMS in an NMA environment

In this section, we compare, in an example, the performance of three algorithms: LMS, MCLMS, and RLS (recursive least squares). The algorithms were used to identify the NMA(2, 2) system
y_k = -4u_{k-1} + 4u_{k-1}^2 + 2u_{k-2} - 2u_{k-1}u_{k-2} - 4u_{k-1}^2 u_{k-2} + 3u_{k-2}^2 - 3u_{k-1}u_{k-2}^2 - 6u_{k-1}^2 u_{k-2}^2,
where the u_k are zero mean Gaussian signals with a variance of 10, so that
R =
[   10     0     0     0     0     0   100     0
     0   300     0     0     0   100     0  3000
     0     0    10     0   100     0     0     0
     0     0     0   100     0     0     0     0
     0     0   100     0  3000     0     0     0
     0   100     0     0     0   300     0  3000
   100     0     0     0     0     0  3000     0
     0  3000     0     0     0  3000     0 90000 ].
In this case,
θ*^T = [-4, 4, 2, -2, -4, 3, -3, -6].
The actual choices for adaptation constants were
LMS: μ = 1.11 × 10^{-7};
MCLMS: μ_1 = 1 × 10^{-3}, μ_2 = 1 × 10^{-4}, μ_3 = 5 × 10^{-5}, μ_4 = 3.33 × 10^{-6},
μ_5 = 1.5 × 10^{-3}, μ_6 = 3.33 × 10^{-6}, ..., 1.11 × 10^{-7};
RLS: experimentation gave best results with covariance resetting every 30 iterations.
Figs. 3-6 illustrate both mean square error convergence and weight convergence. Ten runs were averaged for this simulation. Only the weight convergence plots for the slowest and fastest modes are presented, and these are given in Figs. 5 and 6, respectively. These plots show that the MCLMS algorithm has performance very favorable compared to other well-known algorithms. One other algorithm, called LMS/MCLMS, was tried which started with the LMS algorithm but then switched to the MCLMS approach when an online estimate of R was obtained. The estimate was taken at the 500th iteration. This algorithm was also seen to have performance very favorable compared to other popular algorithms. The LMS/MCLMS algorithm outperformed the RLS algorithm and in fact had performance comparable to the MCLMS algorithm even though the estimate of R was obtained online. See Figs. 3 and 4.
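A minimal sketch of the LMS/MCLMS switch just described (our own illustration under assumed parameter values; the paper gives no code) accumulates a sample estimate of R during the LMS phase and then builds M from it as in Eq. (18):

```python
import numpy as np

def lms_mclms_hybrid(phi, y, mu_lms, c, switch_at=500):
    """Start with LMS, switch to MCLMS once an online estimate of R is available."""
    N = phi.shape[1]
    theta = np.zeros(N)
    R_hat = np.zeros((N, N))
    M = None
    for k, (phi_k, y_k) in enumerate(zip(phi, y)):
        R_hat += np.outer(phi_k, phi_k)                   # running sum for the R estimate
        e_k = y_k - phi_k @ theta
        if k < switch_at:
            theta = theta + 2.0 * mu_lms * e_k * phi_k     # plain LMS, Eq. (1)
        else:
            if M is None:                                  # design M once, at the switch
                lam, Q = np.linalg.eigh(R_hat / (k + 1))
                M = Q @ np.diag(c / lam) @ Q.T             # Eq. (18) with mu_i*lambda_i = c
            theta = theta + 2.0 * (M @ phi_k) * e_k        # MCLMS, Eq. (6)
    return theta
```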
Fig. 3. Mean square error.
Fig. 4. Mean square error.
Fig. 5. Convergence of first weight.
Fig. 6. Convergence of eighth weight.
6. Conclusions
Necessary and sufficient conditions for convergence of a general matrix LMS algorithm were derived for the case where the adaptation and input correlation matrices commute. Both stationary and nonstationary optimal weight vector environments were examined. The LMS/Newton algorithm was seen to be an important special case of the introduced algorithm, and it was then shown that the bounds on adaptation gain commonly quoted in the literature for LMS/Newton and SER algorithms are not strict enough and must be reduced to provide convergence of the weight vector covariance matrix. Performance of the introduced algorithm was also examined in a nonlinear moving average process environment, which was motivated by specific state space models. The algorithm performed favorably with respect to other well-known algorithms. This paper took a theoretical approach to develop necessary and sufficient conditions for convergence based on assumptions which are commonly used in the literature. It would be of interest to explore practical issues in more detail. For example, when we use the hybrid LMS/MCLMS algorithm, how accurate does the estimate of R need to be before we can switch to the MCLMS algorithm? Also, a more detailed performance comparison of our algorithm with RLS would be desirable.

References

[1] R. Bellman, Introduction to Matrix Analysis, McGraw-Hill, New York, 1960.
[2] C.E. Davila, A.J. Welch and H.G. Rylander III, "A second-order adaptive Volterra filter with rapid convergence", IEEE Trans. Acoust. Speech Signal Process., Vol. 35, September 1987, pp. 1259-1263.
[3] A. Feuer and E. Weinstein, "Convergence analysis of LMS filters with uncorrelated Gaussian data", IEEE Trans. Acoust. Speech Signal Process., Vol. 33, February 1985, pp. 222-230.
[4] R.D. Gitlin and F.R. Magee Jr., "Self-orthogonalizing adaptive equalization algorithms", IEEE Trans. Commun., Vol. 25, July 1977, pp. 666-672.
[5] G.C. Goodwin and K.S. Sin, Adaptive Filtering Prediction and Control, Prentice-Hall, Englewood Cliffs, NJ, 1984.
[6] C. Horn and R.H. Kwong, "A modified LMS algorithm with applications to nonlinear moving average processes", Systems Control Group Report #9109, Department of Electrical Engineering, University of Toronto, August 1991.
[7] L.L. Horowitz and K.D. Senne, "Performance advantage of complex LMS for controlling narrow-band adaptive arrays", IEEE Trans. Acoust. Speech Signal Process., Vol. 29, June 1981, pp. 722-736.
[8] T. Koh and E.J. Powers, "Second-order Volterra filtering and its application to nonlinear system identification", IEEE Trans. Acoust. Speech Signal Process., Vol. 33, December 1985, pp. 1445-1455.
[9] R.H. Kwong and E.W. Johnston, "A variable step size LMS algorithm", IEEE Trans. Signal Processing, Vol. 40, July 1992, pp. 1633-1642.
[10] N. Meijer, Parametrization and identification of state affine systems, Ph.D. Thesis, University of Toronto, 1989.
[11] B. Widrow and J.M. McCool, "A comparison of adaptive algorithms based on the methods of steepest descent and random search", IEEE Trans. Antennas Propagation, Vol. 24, September 1976, pp. 615-637.
[12] B. Widrow, J.M. McCool, M.G. Larimore and C.R. Johnson Jr., "Stationary and nonstationary learning characteristics of the LMS adaptive filter", Proc. IEEE, Vol. 64, August 1976, pp. 1151-1162.
[13] B. Widrow and S.D. Stearns, Adaptive Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1985.