Signal behavior of adaptive filtering algorithms in a nonstationary environment with singular data covariance matrix

Eweda Eweda
Department of Electrical Engineering, Ajman University of Science & Technology, P.O. Box 346, Ajman, United Arab Emirates
E-mail address: [email protected]

Signal Processing 85 (2005) 1263–1274
Received 19 May 2004; received in revised form 27 January 2005

Abstract

The paper analyzes the signal behavior of adaptive filtering algorithms when the target weights of the adaptive filter are time varying and the covariance matrix of the filter input is singular. The signal behavior is evaluated in terms of moments of the excess output error of the filter. Two algorithms are considered: the LMS algorithm and the sign algorithm. The analysis is done in the context of adaptive plant identification. The plant parameters vary according to a random walk model. The plant input, plant noise, and plant parameters are assumed mutually independent. Under these assumptions, it is found that the signal behavior of the algorithms is the same as the signal behavior in the case of a positive definite input covariance matrix.

Keywords: Tracking; Adaptive signal processing; Adaptive filtering

1. Introduction

A common assumption in available theoretical analyses [1–32] of adaptive filtering algorithms is that the covariance matrix of the filter input is strictly positive definite. This is usually called the condition of persistent excitation [5,27,33]. This assumption, however, may be violated in practice. One example is the case when the filter input is obtained by sampling a band-limited signal at a rate that exceeds the Nyquist rate; this situation arises when the signal bandwidth is over-estimated. Another example is the case when the power spectral density of the filter input has nulls within its band, which may happen when the filter input is dependent digital data. A third example is an adaptive linear combiner with a linear dependence among its parallel inputs; this occurs when there is no prior knowledge of the dependence, so the dependent inputs are not removed. In these cases, the input covariance matrix is singular. A well-known remedy in such a situation is to add a dithering sequence to the filter input so as to restore the positive definiteness of the input covariance matrix. This procedure avoids bad weight behavior of the filter. In [34], we have shown that dithering is not needed when the signal behavior of the adaptive filter is the main concern and the weight behavior is unimportant. This is the case in many applications; an immediate example is adaptive echo canceling [1,2], where the main concern is to minimize the mean square of the residual echo signal while the behavior of the weights of the echo canceler is of no interest. The analysis of [34] treats the case when the target weights of the adaptive filter are constant in time. Four algorithms were considered: the LMS, RLS, sign (SA), and signed regressor (SRA) algorithms. It was found that the steady-state signal behavior of the LMS and SA can be made arbitrarily fine by using sufficiently small step sizes, while the RLS and SRA diverge when the input covariance matrix is singular.

The present paper extends [34] to the case when the target weights of the adaptive filter are time varying. Since the RLS and SRA already diverge in the stationary case [34], there is no interest in studying their behavior in the nonstationary case; the present paper therefore considers the signal behavior of the LMS and SA only. The analysis is done in the context of adaptive plant identification. The plant parameters vary according to a random walk model. The plant input, plant noise, and plant parameters are assumed mutually independent. Under these assumptions, it is found that the signal behavior of the LMS and SA is the same as the signal behavior in the case of a positive definite input covariance matrix. It is important to mention that this good signal behavior is conditioned on the mutual independence of the plant input and the plant noise. For cases with dependent input and noise, bad signal behavior, such as the bursting shown in [35–39], may take place.

The paper is organized as follows. The formulation of the problem is given in Section 2. Section 3 analyzes the signal behavior of the LMS algorithm. Three theorems are provided: two concern the case of Gaussian data under the common independence assumption, and the third concerns bounded data without the independence assumption. The section also shows that the provided analysis holds for both singular and positive definite input covariance matrices. Section 4 analyzes the signal behavior of the SA. Computer simulations that validate the derived analytical results are provided in Section 5. The conclusions of the paper are given in Section 6. For convenient tracking of the results, the proofs of the theorems are deferred to appendices.

2. Formulation of the problem

Consider the case of adaptive plant identification [1,2] depicted in Fig. 1. Let the output $a_k$ of the plant be given by

$a_k = G_k^T X_k + b_k$,  (2.1)

where

$G_k \equiv (g_1(k), g_2(k), \ldots, g_N(k))^T$  (2.2)

is the vector of the plant parameters at time $k$,

$X_k = (x_{1,k}, x_{2,k}, \ldots, x_{N,k})^T$  (2.3)

is the data vector, $b_k$ is the plant noise, $N$ is the number of plant parameters, and $(\cdot)^T$ denotes the transpose of $(\cdot)$. The inputs $x_{1,k}, x_{2,k}, \ldots, x_{N,k}$ may be successive samples of the same signal, as in the cases of adaptive echo canceling and adaptive line enhancement [1]. They may also be the instantaneous outputs of $N$ parallel sensors, as in adaptive beamforming [1], or the parallel outputs of a transformation block applied to the input signal, as in transform domain adaptive filters [29,30]. The identification of the plant is made by an adaptive FIR filter whose weight vector $H_k$, assumed of dimension $N$, is adapted on the basis of the error $e_k$ given by

$e_k = a_k - H_k^T X_k$.  (2.4)
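To make the model concrete, the following minimal Python sketch simulates the signals (2.1)–(2.4). It is an illustration of the setup, not code from the paper; the dimensions, distributions, and seed are placeholder choices.

    import numpy as np

    rng = np.random.default_rng(0)
    N, T = 5, 1000                  # number of plant parameters, number of samples
    sigma_b = 0.1                   # plant noise standard deviation

    G = rng.standard_normal(N)      # plant parameter vector G_k (held fixed here)
    H = np.zeros(N)                 # adaptive filter weight vector H_k

    for k in range(T):
        X = rng.standard_normal(N)           # data vector X_k, eq. (2.3)
        b = sigma_b * rng.standard_normal()  # plant noise b_k
        a = G @ X + b                        # plant output a_k, eq. (2.1)
        e = a - H @ X                        # error e_k, eq. (2.4)
        # How H is updated from e is the subject of Sections 3 and 4.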

Fig. 1. Adaptive plant identification. (Block diagram: the input $X_k$ drives both the unknown plant $G_k$ and the adaptive filter $H_k$; the plant noise $b_k$ adds to the plant output to give $a_k$, and the filter output is subtracted from $a_k$ to form the error $e_k$.)

This error can be decomposed into two terms: the plant noise $b_k$ and the excess estimation error $\epsilon_k$ defined by

$\epsilon_k \equiv e_k - b_k$.  (2.5)

$\epsilon_k$ is also termed the adaptation noise [1], since it represents the noise that appears at the filter output due to adaptation. The weight misalignment vector is defined by

$V_k \equiv H_k - G_k$.  (2.6)

The signal behavior of the adaptive filter is described by the evolution of the moments of $\epsilon_k$ with time, while the weight behavior is described by the evolution of the moments of $V_k$. This paper is concerned with the signal behavior only. Due to (2.1) and (2.4)–(2.6),

$\epsilon_k = V_k^T X_k$.  (2.7)

The paper considers the case when the plant parameters are randomly time varying. The plant parameter vector increment is denoted by $\Delta_k$; i.e.,

$\Delta_k \equiv G_{k+1} - G_k$.  (2.8)

The following common assumptions are used throughout the paper:

Assumption A1. The sequences $\{X_k\}$, $\{\Delta_k\}$, and $\{b_k\}$ are mutually independent.

Assumption A2. The sequence $\{X_k\}$ is stationary with finite second moments.

Assumption A3. $\{b_k\}$ is a stationary sequence of independent zero-mean random variables with finite variance $\sigma_b^2$.

Assumption A4. The sequence $\{\Delta_k\}$ is i.i.d. and zero-mean with finite second moments.

Assumption A1 has been used in a number of papers, e.g. [12,16,17,20,21], and is realistic in many applications. An immediate example is adaptive echo canceling: the plant input is the near-end data, the plant noise is the far-end data, and the plant parameters are the echo path parameters. These three processes have independent physical origins and hence are mutually independent. Assumption A4 is the random-walk model of plant parameter variations; it is used in many analyses concerned with the tracking behavior of adaptive filtering algorithms, e.g. [12,16,17,20,21]. Denote

$q^2 \equiv E(\|\Delta_k\|^2)$.  (2.9)

Also, denote the covariance matrix of $X_k$ by $R$; i.e.,

$R \equiv E(X_k X_k^T)$.  (2.10)
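The random-walk model (2.8)–(2.9) and the covariance matrix (2.10) can likewise be sketched numerically. The snippet below is illustrative only; the linear dependence imposed on the fifth input mirrors the example used later in Section 5. It generates a parameter random walk, estimates $q^2$ and $R$ from samples, and confirms that $R$ is singular.

    import numpy as np

    rng = np.random.default_rng(1)
    N, T, w = 5, 100_000, 1e-3

    # Random-walk plant parameters (Assumption A4): G_{k+1} = G_k + Delta_k.
    Delta = w * rng.standard_normal((T, N))   # i.i.d. zero-mean increments
    G = np.cumsum(Delta, axis=0)              # plant parameter trajectory
    q2 = np.mean(np.sum(Delta**2, axis=1))    # sample estimate of q^2 = E(||Delta_k||^2)

    # Sample estimate of R = E(X_k X_k^T) with a linear dependence among inputs:
    X = rng.standard_normal((T, N))
    X[:, 4] = (X[:, 0] + X[:, 1]) / np.sqrt(2)   # x_{5,k} depends on x_{1,k}, x_{2,k}
    R = X.T @ X / T
    print(q2, np.linalg.eigvalsh(R))             # smallest eigenvalue is (near) zero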


In available adaptive filtering analyses [1–32], $R$ is assumed strictly positive definite. In the present paper, we treat the case when this assumption is violated. Let the eigenvalues of $R$ be denoted by $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_N$, and let $U_j$ denote the eigenvector associated with $\lambda_j$, $j = 1, 2, \ldots, N$. Let $L = N - \mathrm{rank}(R)$. Then

$\lambda_i = 0$ for $i \le L$; $\lambda_i > 0$ for $L+1 \le i \le N$.  (2.11)

The minimum non-zero eigenvalue of $R$ is denoted by $\lambda_{\min}$, while the maximum eigenvalue is denoted by $\lambda_{\max}$. It is worth mentioning that the analysis provided in the paper holds for both singular and positive definite $R$'s: for a singular $R$, $L \ge 1$; for a positive definite $R$, $L = 0$. The following common notation is used throughout the paper:

$\limsup_{k\to\infty} s_k \equiv \lim_{k\to\infty} \sup_{i \ge k} s_i$,

where "sup" means supremum.

3. Signal behavior of the LMS algorithm

The LMS algorithm [1–10] is given by

$H_{k+1} = H_k + \mu X_k e_k$,  (3.1)

where $\mu > 0$ is the algorithm step size. From (2.5)–(2.8) and (3.1),

$V_{k+1} = V_k + \mu X_k (b_k - V_k^T X_k) - \Delta_k$.  (3.2)
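A minimal sketch of the LMS recursion (3.1) tracking a random-walk plant is given below; it is illustrative only, and the step size, run length, and seed are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(2)
    N, T = 5, 50_000
    mu, sigma_b, w = 0.01, 0.1, 1e-3

    G = np.zeros(N)                      # plant parameters, random-walk model (2.8)
    H = np.zeros(N)                      # LMS weight vector
    eps2 = np.empty(T)

    for k in range(T):
        X = rng.standard_normal(N)
        b = sigma_b * rng.standard_normal()
        e = (G @ X + b) - H @ X          # error e_k, eq. (2.4)
        eps2[k] = (e - b) ** 2           # squared excess error, eqs. (2.5), (2.7)
        H += mu * X * e                  # LMS update, eq. (3.1)
        G += w * rng.standard_normal(N)  # parameter increment Delta_k

    print(eps2.mean())                   # time-averaged mean square excess error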

3.1. Case of Gaussian data

The following assumption is used in this subsection:

Assumption A5. The sequence $\{X_k\}$ is i.i.d., zero-mean, and Gaussian with finite second moments.

A5 is the common independence assumption used in a large number of papers dealing with the analysis of adaptive filtering algorithms, e.g. [1,16,17,20–26,28]. Although this assumption has been dropped in some papers, it is still used as a simplifying assumption when a difficult aspect of adaptive filtering is to be analyzed. In the case when $x_{1,k}, x_{2,k}, \ldots, x_{N,k}$ are samples of $N$ parallel sources, A5 holds when each of the sequences $x_{1,k}, x_{2,k}, \ldots, x_{N,k}$ is i.i.d. Gaussian; the covariance matrix $R$ will be singular when there is a linear dependence among $x_{1,k}, x_{2,k}, \ldots, x_{N,k}$. In the case when $x_{1,k}, x_{2,k}, \ldots, x_{N,k}$ are successive samples of the same source, the independence of successive $X_k$'s does not hold. However, the results derived under this assumption are usually close to the practical behavior of the algorithm, especially when the step size is small [32]. The following theorem describes the signal behavior of the LMS with Gaussian data.

Theorem 1. Assumptions A1 and A3–A5 imply that for $0 < \mu < 2/(3\,\mathrm{tr}(R))$, the excess error of the LMS satisfies

$\limsup_{k\to\infty} \frac{1}{k} \sum_{j=1}^{k} E(\epsilon_j^2) \le \frac{\mu \sigma_b^2\,\mathrm{tr}(R) + \mu^{-1} q^2}{2 - 3\mu\,\mathrm{tr}(R)}$.  (3.3)

Proof. Appendix A.

Notice that the assumptions of this theorem, and of all other theorems of the paper, do not include a positive definiteness condition on the input covariance matrix. Theorem 1 thus shows a satisfactory signal behavior of the LMS algorithm with a singular input covariance matrix: the bound (3.3) is the same as the bound obtained when that matrix is positive definite. The bound (3.3) consists of two terms. The first term is due to the gradient noise; it is increasing in the algorithm step size and the noise variance. The second term is due to the lag error; it is decreasing in the algorithm step size and increasing in the mean square plant parameter increment. The bound (3.3) attains its minimum at the step size

$\mu_{\min} = \left( \frac{9q^4}{4\sigma_b^4} + \frac{q^2}{\sigma_b^2\,\mathrm{tr}(R)} \right)^{1/2} - \frac{3q^2}{2\sigma_b^2}$.  (3.4)
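The bound (3.3) and the minimizing step size (3.4) are straightforward to evaluate numerically; the sketch below uses $\sigma_b^2 = 0.01$, $q^2 = 5\times 10^{-6}$, and $\mathrm{tr}(R) = 5$, the values of the Section 5 simulations.

    import numpy as np

    sigma_b2, q2, trR = 0.01, 5e-6, 5.0

    def lms_bound(mu):
        # Right-hand side of (3.3), valid for 0 < mu < 2 / (3 tr(R)).
        return (mu * sigma_b2 * trR + q2 / mu) / (2 - 3 * mu * trR)

    # Optimal step size, eq. (3.4).
    mu_min = np.sqrt(9 * q2**2 / (4 * sigma_b2**2) + q2 / (sigma_b2 * trR)) \
             - 3 * q2 / (2 * sigma_b2)
    print(mu_min, lms_bound(mu_min))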

The following theorem provides an upper bound on the mean square excess error itself rather than its long-term average:

Theorem 2. Assumptions A1, A3, A4, and A5 imply that for $0 < \mu < \mu_0$, the excess error of the LMS satisfies

$\limsup_{k\to\infty} E(\epsilon_k^2) \le \frac{\lambda_{\max}(\mu \sigma_b^2\,\mathrm{tr}(R) + \mu^{-1} q^2)}{2\lambda_{\min} - (2\lambda_{\max}^2 + \lambda_{\max}\,\mathrm{tr}(R))\mu}$,  (3.5)

where

$\mu_0 \equiv \frac{2\lambda_{\min}}{2\lambda_{\max}^2 + \lambda_{\max}\,\mathrm{tr}(R)}$.  (3.6)

Proof. Appendix B.

Each of the results (3.3) and (3.5) has an advantage over the other. (3.3) has the advantage of a wider range of $\mu$ and a tighter bound than (3.5) when $\lambda_{\max}/\lambda_{\min}$ is large. On the other hand, (3.5) has the advantage of bounding the mean square excess error itself rather than its long-term average. Thus, neither of the results (3.3) and (3.5) implies the other.
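The trade-off can be seen numerically; the following sketch compares the two bounds, assuming the eigenvalue data of the Section 5 example ($\lambda_{\min} = 1$, $\lambda_{\max} = 2$, $\mathrm{tr}(R) = 5$).

    sigma_b2, q2, trR = 0.01, 5e-6, 5.0
    lmin, lmax = 1.0, 2.0

    def bound_3_3(mu):    # time-averaged MSE bound, eq. (3.3)
        return (mu * sigma_b2 * trR + q2 / mu) / (2 - 3 * mu * trR)

    def bound_3_5(mu):    # instantaneous MSE bound, eq. (3.5)
        return (lmax * (mu * sigma_b2 * trR + q2 / mu)
                / (2 * lmin - (2 * lmax**2 + lmax * trR) * mu))

    for mu in (1e-3, 5e-3, 1e-2):       # within both validity ranges
        print(mu, bound_3_3(mu), bound_3_5(mu))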

3.2. Case of bounded data

In this subsection, Assumption A5 is replaced by the following assumption:

Assumption A6. The sequence $\{X_k\}$ is stationary, zero-mean, and bounded; i.e., there exists a finite positive number $B$ such that

$\|X_k\| \le B$ for all $k$.  (3.7)

This boundedness assumption does not represent a practical limitation, since data are usually bounded in practice. However, it excludes the case of Gaussian $X_k$, which has been treated in Subsection 3.1 above. We now have the following theorem:

Theorem 3. Assumptions A1, A3, A4, and A6 imply that for $0 < \mu < 2/B^2$, the excess error of the LMS satisfies

$\limsup_{k\to\infty} \frac{1}{k} \sum_{j=1}^{k} E(\epsilon_j^2) \le \frac{\mu \sigma_b^2 B^2 + \mu^{-1} q^2}{2 - \mu B^2}$.  (3.8)

Proof. Appendix C.

Theorem 3 is strong in the sense that it holds for arbitrary dependence among successive $X_k$'s. It also holds for an arbitrary distribution of $X_k$. Again, the upper bound in (3.8) shows a satisfactory signal behavior of the LMS algorithm with a singular input covariance matrix, since it is the same as the bound obtained when that matrix is positive definite. The above theorems lead to the conclusion that the LMS algorithm can be used with a singular input covariance matrix, without dithering, when the signal behavior is the main concern and the weight behavior is unimportant. Fortunately, this is the case in many applications. An immediate example is adaptive echo canceling [1,2], where the main concern is to minimize the mean square of the residual echo signal $\epsilon_k$; the behavior of the weights of the echo canceler is of no interest.

3.3. Weight behavior in the case of Gaussian data

Decompose $H_k$ into the two orthogonal components defined by

$\bar{H}_k \equiv \sum_{j=1}^{L} (U_j^T H_k) U_j$,  $\tilde{H}_k \equiv \sum_{j=L+1}^{N} (U_j^T H_k) U_j$.  (3.9)

$\bar{H}_k$ is the projection of $H_k$ on the null-space of the data covariance matrix $R$, while $\tilde{H}_k$ is normal to that null-space; $\bar{H}_k$ and $\tilde{H}_k$ will briefly be called the "null-space" and "normal" components of $H_k$, respectively. In an analogous way, the misalignment vector $V_k$ can be decomposed into a null-space component $\bar{V}_k$ and a normal component $\tilde{V}_k$. The behavior of $\bar{H}_k$ is described by the following theorem:

Theorem 4. Assumptions A1, A3, and A5 imply that for all $\mu > 0$,

$E(\bar{H}_k) = \bar{H}_1$ for all $k$,  (3.10)

variance of $\bar{H}_k = 0$ for all $k$.  (3.11)

Proof. See Appendix D.

This theorem implies that $\bar{H}_k$ does not adapt. Thus, the algorithm does not track the projection of the plant parameter vector on the null-space of $R$, which is not satisfactory. A geometrical interpretation of this behavior is as follows. The update term of (3.1) is parallel to $X_k$. Since $X_k$ is orthogonal to the null-space of $R$, the update term of (3.1) at each iteration step is orthogonal to $\bar{H}_k$ and, hence, cannot alter this component.

The behavior of $\tilde{H}_k$ is described by the following theorem:

Theorem 5. Assumptions A1 and A3–A5 imply that for $0 < \mu \le \mu_0$,

$\lim_{k\to\infty} E(\tilde{V}_k) = 0$,  (3.12)

$\limsup_{k\to\infty} E(\|\tilde{V}_k\|^2) \le \frac{\mu \sigma_b^2\,\mathrm{tr}(R) + \mu^{-1} q^2}{2\lambda_{\min} - (2\lambda_{\max}^2 + \lambda_{\max}\,\mathrm{tr}(R))\mu}$.  (3.13)

The proof of (3.12) follows from (B.2)–(B.7) in Appendix B and (2.11); Eq. (3.13) is the same as (B.13) in Appendix B. Eqs. (3.12) and (3.13) imply a good behavior of the normal component of the weight vector: the behavior of this component is similar to its behavior with a positive definite input covariance matrix. However, due to the non-adaptation of the null-space component $\bar{H}_k$ shown by Theorem 4, the overall weight behavior is not satisfactory.

3.4. Interpretation of the signal behavior

The following question may now arise: how does it come about that the signal behavior is satisfactory although the weight behavior is not? The question is answered as follows. Due to (2.7) and (A.1) in Appendix A,

$E(\epsilon_k^2) = E(V_k^T R V_k)$.  (3.14)

Since $V_k = \tilde{V}_k + \bar{V}_k$ and $\bar{V}_k$ belongs to the null-space of $R$, (3.14) implies that

$E(\epsilon_k^2) = E(\tilde{V}_k^T R \tilde{V}_k)$.  (3.15)

(3.15) shows that the signal behavior depends on $\tilde{V}_k$ and does not depend on $\bar{V}_k$. Consequently, the signal behavior is not affected by the bad behavior of $\bar{V}_k$. This answers the above question.

It is worth mentioning that the analysis provided in the paper holds for both singular and positive definite covariance matrices. This is due to the fact that the theorems do not include any assumption about the rank of $R$. The results for a positive definite $R$ can be derived as a particular case of the above results as follows. The case of a positive definite $R$ corresponds to $L = 0$ in (2.11). The signal behavior bounds provided by Theorems 1–3 do not depend on $L$; consequently, these bounds hold for both singular and positive definite $R$'s. The situation is different for the weight behavior described by Theorems 4 and 5. When $L = 0$, (3.9) implies that $\bar{V}_k = 0$; thus $V_k = \tilde{V}_k$, and (3.12) and (3.13) hold for the total misalignment vector $V_k$. This agrees with the results reported for the case of a positive definite $R$.
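The identity (3.15) is easy to check numerically. The sketch below is illustrative (the eigenvalues and the random misalignment vector are arbitrary choices): it builds a singular $R$ with one zero eigenvalue, splits a vector $V$ into its null-space and normal components as in (3.9), and verifies that $V^T R V$ is unchanged when the null-space component is discarded.

    import numpy as np

    rng = np.random.default_rng(3)
    N, L = 5, 1
    lam = np.array([0.0, 0.5, 1.0, 1.5, 2.0])          # eigenvalues, lambda_1 = 0
    Q = np.linalg.qr(rng.standard_normal((N, N)))[0]   # orthonormal eigenvectors U_j
    R = Q @ np.diag(lam) @ Q.T                         # singular covariance matrix

    V = rng.standard_normal(N)                 # an arbitrary misalignment vector
    V_null = Q[:, :L] @ (Q[:, :L].T @ V)       # null-space component, cf. (3.9)
    V_norm = V - V_null                        # normal component

    print(V @ R @ V, V_norm @ R @ V_norm)      # equal, as (3.15) asserts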

4. Signal behavior of the SA

The SA [18–26] is given by

$H_{k+1} = H_k + \mu X_k\,\mathrm{sgn}(e_k)$,  (4.1)

where $\mathrm{sgn}(\cdot)$ is the signum function. Let $F(\cdot)$ denote the probability distribution function of the noise $b_k$. The following assumptions are used:

Assumption A7. $F(0) = \tfrac{1}{2}$.

Assumption A8. There exist $a > 0$ and $b > 0$ such that the noise probability density function $f(u)$ exists over the interval $-a \le u \le a$ and satisfies

$f(u) \ge b$ for $-a \le u \le a$.  (4.2)

Assumption A7 means that the noise assumes positive and negative polarities equally likely, which holds in most practical applications. Assumption A8 holds for many noise distributions encountered in practice; examples are the uniform, exponential, and Gaussian distributions. Notice that A8 allows the non-existence and unboundedness of the noise probability density function for $|u| > a$, i.e., away from the origin. It also allows $f(u)$ to vanish for $|u| > a$. A7 and A8 also allow a nonsymmetric noise probability density function. Finally, we give examples of the values of $a$ and $b$ for some frequent distributions:

$a = \sqrt{3}\,\sigma_b$, $b = \frac{\sqrt{3}}{6\sigma_b}$ for the uniform distribution,

$a = \sigma_b$, $b = \frac{\sqrt{2}}{2\sigma_b}\, e^{-\sqrt{2}}$ for the exponential distribution,

$a = \sigma_b$, $b = \frac{1}{\sigma_b\sqrt{2\pi}}\, e^{-1/2}$ for the Gaussian distribution.

The following theorem describes the signal behavior of the SA.

Theorem 6. Assumptions A1–A4, A7, and A8 imply that for all $\mu > 0$, the excess error of the SA satisfies

$\limsup_{k\to\infty} \frac{1}{k} \sum_{j=1}^{k} E(|\epsilon_j|) \le \frac{1}{4ab}\left[\mu\,\mathrm{tr}(R) + \mu^{-1} q^2\right] + \sqrt{2a(\mu\,\mathrm{tr}(R) + \mu^{-1} q^2)}$.  (4.3)

Proof. Eq. (4.3) is the same as Eq. (6.8) in [31]; it is straightforward to show that the proof given in [31] holds under the assumptions of Theorem 6.

Theorem 6 shows a satisfactory signal behavior of the SA with a singular input covariance matrix: the bound (4.3) is the same as the bound obtained with a positive definite covariance matrix. The bound (4.3) attains its minimum at the step size

$\mu_{\min} = q/\sqrt{\mathrm{tr}(R)}$.

Theorem 6 is strong in the sense that it holds for (1) arbitrary dependence of the sequence $X_k$, (2) arbitrary distribution of $X_k$, (3) several distributions of the noise, and (4) all $\mu > 0$. Theorem 6 leads to the conclusion that the SA can be used with a singular input covariance matrix, without dithering, when the signal behavior is the main concern and the weight behavior is unimportant.
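As an illustration of the SA update (4.1) run at the step size $\mu_{\min} = q/\sqrt{\mathrm{tr}(R)}$, the following sketch applies the algorithm to the random-walk plant of Section 2. It is our own illustrative reconstruction; the run length and seed are arbitrary.

    import numpy as np

    rng = np.random.default_rng(4)
    N, T, sigma_b, w = 5, 50_000, 0.1, 1e-3
    trR = 5.0                             # tr(R) for unit-variance inputs
    mu = np.sqrt(N * w**2 / trR)          # mu_min = q / sqrt(tr(R)), with q^2 = N w^2

    G, H = np.zeros(N), np.zeros(N)
    abs_eps = np.empty(T)

    for k in range(T):
        X = rng.standard_normal(N)
        b = sigma_b * rng.standard_normal()
        e = (G @ X + b) - H @ X
        abs_eps[k] = abs(e - b)           # |eps_k|
        H += mu * X * np.sign(e)          # sign-algorithm update, eq. (4.1)
        G += w * rng.standard_normal(N)   # random-walk drift

    print(abs_eps.mean())                 # time-averaged mean absolute excess error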

5. Simulation results

The simulations are done for a plant with five parallel inputs $\{x_{1,k}\}, \{x_{2,k}\}, \ldots, \{x_{5,k}\}$. Each of these sequences is an i.i.d. Gaussian sequence with zero mean and unit variance; $\{x_{1,k}\}$, $\{x_{2,k}\}$, $\{x_{3,k}\}$, and $\{x_{4,k}\}$ are mutually independent, while $x_{5,k} = (x_{1,k} + x_{2,k})/\sqrt{2}$. In such a case, successive vectors $X_k$ are mutually independent and the covariance matrix of $X_k$ is given by

$R = \begin{pmatrix} 1 & 0 & 0 & 0 & 1/\sqrt{2} \\ 0 & 1 & 0 & 0 & 1/\sqrt{2} \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 1/\sqrt{2} & 1/\sqrt{2} & 0 & 0 & 1 \end{pmatrix}$.

This matrix has one zero eigenvalue, with corresponding eigenvector $U_1 = (\tfrac{1}{2}, \tfrac{1}{2}, 0, 0, -\tfrac{1}{\sqrt{2}})^T$. The minimum non-zero eigenvalue is $\lambda_{\min} = 1$, and the maximum eigenvalue is $\lambda_{\max} = 2$. The plant noise is an i.i.d. Gaussian sequence with zero mean and variance 0.01. The components of the plant parameter increment vector $\Delta_k$ are assumed mutually independent, Gaussian, and with the same standard deviation $w$; thus $q^2 = N w^2$. We take $w = 0.001$ throughout the simulations. The initial weight vector of the filter is the all-zero vector, $H_1 = 0$.

Consider the results of the LMS first. The above simulation conditions match the assumptions of Theorem 1. The time-averaged mean square excess error is shown in Fig. 2 over a wide range of $\mu$ satisfying the conditions of Theorem 1; the time averaging is done over 100 000 iterations. Fig. 2 validates the upper bound (3.3). This shows that, in spite of the singularity of the input covariance matrix, the signal behavior of the LMS algorithm is satisfactory, and it is the same as the behavior obtained when that matrix is positive definite.

The simulation results of the SA are shown in Fig. 3, which plots the time-averaged mean absolute excess error versus $\mu$. Fig. 3 validates the upper bound (4.3). Again, this shows that, in spite of the singularity of the input covariance matrix, the signal behavior of the SA is satisfactory, and it is the same as the behavior obtained when that matrix is positive definite.

Fig. 2. Time-averaged mean square excess error of the LMS algorithm versus step size; $N = 5$, $\sigma_b = 0.1$, $w = 0.001$. (The plot compares the simulation results with the theoretical upper bound (3.3).)

Fig. 3. Time-averaged mean absolute excess error of the sign algorithm versus step size; $N = 5$, $\sigma_b = 0.1$, $w = 0.001$. (The plot compares the simulation results with the theoretical upper bound (4.3).)
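For reproducibility, the following sketch mirrors the simulation setup just described (five unit-variance Gaussian inputs with $x_{5,k} = (x_{1,k} + x_{2,k})/\sqrt{2}$, $\sigma_b = 0.1$, $w = 0.001$, $H_1 = 0$); the run length and seed are our own choices, and the printed averages should lie below the corresponding bounds (3.3) and (4.3).

    import numpy as np

    rng = np.random.default_rng(5)
    N, T, sigma_b, w = 5, 100_000, 0.1, 1e-3

    def run(algorithm, mu):
        G, H = np.zeros(N), np.zeros(N)
        acc = 0.0
        for k in range(T):
            x = rng.standard_normal(4)
            X = np.array([x[0], x[1], x[2], x[3],
                          (x[0] + x[1]) / np.sqrt(2)])   # singular-R inputs
            b = sigma_b * rng.standard_normal()
            e = (G @ X + b) - H @ X
            if algorithm == "lms":
                acc += (e - b) ** 2                      # eps_k^2
                H += mu * X * e                          # eq. (3.1)
            else:
                acc += abs(e - b)                        # |eps_k|
                H += mu * X * np.sign(e)                 # eq. (4.1)
            G += w * rng.standard_normal(N)              # random-walk drift
        return acc / T

    print(run("lms", 0.01))   # compare with the bound (3.3) / Fig. 2
    print(run("sa", 1e-3))    # compare with the bound (4.3) / Fig. 3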

6. Conclusion

The signal behavior of the LMS algorithm and the SA has been analyzed when the input covariance matrix is singular and the target values of the adaptive filter weights vary with time. The analysis is done in the context of adaptive plant identification. The plant parameters vary according to a random walk model. The plant input, plant noise, and plant parameters are assumed mutually independent. The signal behavior is analyzed by investigating the excess error at the output of the filter. Upper bounds are derived for the long-term averages of the mean square excess error and the mean absolute excess error of the LMS and SA, respectively. Based on these bounds, it is found that the signal behaviors of the algorithms are satisfactory in the environment described by the above assumptions; namely, the signal behaviors are the same as those obtained when the input covariance matrix is positive definite. The main conclusion of the paper is that no dithering of the filter input is needed when the signal behavior of the adaptive filter is the main concern and the weight behavior is unimportant.

Appendix A. Proof of Theorem 1

Due to (3.2), $V_k$ is a function of $\{(b_j, X_j, \Delta_j) : j < k\}$. Hence, A1–A5 imply that

$b_k$, $V_k$, $X_k$, and $\Delta_k$ are mutually independent.  (A.1)

Squaring (3.2), taking the expectation, and using (A.1), it follows that

$E(\|V_{k+1}\|^2) = E(\|V_k\|^2) - 2\mu E[(V_k^T X_k)^2] + \mu^2 \sigma_b^2 E(\|X_k\|^2) + \mu^2 E(\|X_k\|^2 (V_k^T X_k)^2) + q^2$.  (A.2)

Let $E(x|y)$ denote the conditional expectation of $x$ for a given $y$. Evaluating the last term of (A.2) by first evaluating the conditional expectation for a given $V_k$, it follows that

$E(\|X_k\|^2 (V_k^T X_k)^2) = E[E(\|X_k\|^2 (V_k^T X_k)^2 \mid V_k)]$.  (A.3)

From (A.1) and A5, for a given $V_k$, $V_k^T X_k$ and $X_k$ are jointly Gaussian. Consequently,

$E(\|X_k\|^2 (V_k^T X_k)^2 \mid V_k) = \sum_{j=1}^{N} E(x_{j,k}^2)\, E((V_k^T X_k)^2 \mid V_k) + 2 \sum_{j=1}^{N} [E(x_{j,k} V_k^T X_k \mid V_k)]^2 \le 3 \sum_{j=1}^{N} E(x_{j,k}^2)\, E((V_k^T X_k)^2 \mid V_k)$,  (A.4)

where the last inequality follows from the Cauchy–Schwarz inequality. From (A.3) and (A.4),

$E(\|X_k\|^2 (V_k^T X_k)^2) \le 3\,\mathrm{tr}(R)\, E((V_k^T X_k)^2)$.  (A.5)

From (2.7), (A.2), and (A.5), it follows that

$E(\|V_{k+1}\|^2) \le E(\|V_k\|^2) - \mu(2 - 3\mu\,\mathrm{tr}(R))\, E(\epsilon_k^2) + \mu^2 \sigma_b^2\,\mathrm{tr}(R) + q^2$.  (A.6)

Iterating (A.6) backward $k - 1$ iterations, it follows that

$E(\|V_{k+1}\|^2) \le E(\|V_1\|^2) - \mu(2 - 3\mu\,\mathrm{tr}(R)) \sum_{j=1}^{k} E(\epsilon_j^2) + k\mu^2 \sigma_b^2\,\mathrm{tr}(R) + k q^2$.  (A.7)

Since the left-hand side of (A.7) is non-negative, dividing (A.7) by $k$ and taking the limit as $k$ tends to infinity, we obtain (3.3). □

Appendix B. Proof of Theorem 2

Due to (3.2), $V_k$ is a function of $\{(b_j, X_j, \Delta_j) : j < k\}$. Hence, A1–A5 imply that

$b_k$, $V_k$, $X_k$, and $\Delta_k$ are mutually independent.  (B.1)

Let $w_{j,k}$, $s_{j,k}$, and $f_{j,k}$ respectively denote the projections of the vectors $V_k$, $X_k$, and $\Delta_k$ on the $j$th eigenvector of $R$; i.e.,

$w_{j,k} \equiv U_j^T V_k$,  (B.2)

$s_{j,k} \equiv U_j^T X_k$,  (B.3)

$f_{j,k} \equiv U_j^T \Delta_k$.  (B.4)

Pre-multiplying (3.2) by $U_j^T$, (B.2)–(B.4) imply that

$w_{j,k+1} = w_{j,k} + \mu s_{j,k} \left( b_k - \sum_{i=1}^{N} w_{i,k} s_{i,k} \right) - f_{j,k}$.  (B.5)

From (2.10) and (B.3),

$E(s_{j,k} s_{i,k}) = \begin{cases} 0 & \text{for } i \ne j, \\ \lambda_j & \text{for } i = j. \end{cases}$  (B.6)

From (B.1),

$b_k$, $w_{i,k}$, $s_{j,k}$, $f_{p,k}$ are mutually independent, $1 \le i \le N$, $1 \le j \le N$, $1 \le p \le N$.  (B.7)

Due to A5 and (B.3), $s_{j,k}$ is zero-mean Gaussian. Then, (B.6) implies that $s_{i,k}$ and $s_{j,k}$ are mutually independent for $i \ne j$. Therefore, squaring (B.5), taking the expectation, and using (B.6) and (B.7), we obtain after straightforward calculations

$E(w_{j,k+1}^2) = (1 - 2\mu\lambda_j + 2\mu^2\lambda_j^2)\, E(w_{j,k}^2) + \mu^2 \lambda_j \sum_{i=1}^{N} \lambda_i E(w_{i,k}^2) + \mu^2 \sigma_b^2 \lambda_j + E(f_{j,k}^2)$.  (B.8)

From (2.11), it follows that

$\sum_{i=1}^{N} \lambda_i E(w_{i,k}^2) = \sum_{i=L+1}^{N} \lambda_i E(w_{i,k}^2) \le \lambda_{\max} \sum_{i=L+1}^{N} E(w_{i,k}^2) = \lambda_{\max} E(\|\tilde{V}_k\|^2)$,  (B.9)

where the last equality follows from the fact that

$\tilde{V}_k \equiv \sum_{j=L+1}^{N} w_{j,k} U_j$.  (B.10)

Inserting (B.9) in (B.8) yields

$E(w_{j,k+1}^2) \le (1 - 2\mu\lambda_{\min} + 2\mu^2\lambda_{\max}^2)\, E(w_{j,k}^2) + \mu^2 \lambda_j \lambda_{\max} E(\|\tilde{V}_k\|^2) + \mu^2 \sigma_b^2 \lambda_j + E(f_{j,k}^2)$ for $j = L+1, L+2, \ldots, N$.  (B.11)

Summing (B.11) over $j$ from $L+1$ to $N$, and using (B.4) and (2.9), one obtains

$E(\|\tilde{V}_{k+1}\|^2) \le (1 - 2\mu\lambda_{\min} + 2\mu^2\lambda_{\max}^2)\, E(\|\tilde{V}_k\|^2) + \mu^2 \lambda_{\max}\,\mathrm{tr}(R)\, E(\|\tilde{V}_k\|^2) + \mu^2 \sigma_b^2\,\mathrm{tr}(R) + q^2$.  (B.12)

(B.12) implies that

$\limsup_{k\to\infty} E(\|\tilde{V}_k\|^2) \le \frac{\mu \sigma_b^2\,\mathrm{tr}(R) + \mu^{-1} q^2}{2\lambda_{\min} - (2\lambda_{\max}^2 + \lambda_{\max}\,\mathrm{tr}(R))\mu}$, $0 < \mu < \mu_0$,  (B.13)

where $\mu_0$ is given by (3.6). Due to (2.7), (2.10), (2.11), (B.2), and (B.10),

$E(\epsilon_k^2) \le \lambda_{\max} E(\|\tilde{V}_k\|^2)$.  (B.14)

(3.5) follows immediately from (B.13) and (B.14). □

Appendix C. Proof of Theorem 3

Due to (3.2), $V_k$ is a function of $\{(b_j, X_j, \Delta_j) : j < k\}$. Hence, A1, A3, and A4 imply that

$b_k$, $V_k$, and $\Delta_k$ are mutually independent.  (C.1)

Squaring (3.2), taking the expectation, and using (C.1) and (2.7), we obtain

$E(\|V_{k+1}\|^2) = E(\|V_k\|^2) - 2\mu E(\epsilon_k^2) + \mu^2 \sigma_b^2 E(\|X_k\|^2) + \mu^2 E(\|X_k\|^2 \epsilon_k^2) + q^2$.  (C.2)

(C.2) and (3.7) yield

$E(\|V_{k+1}\|^2) \le E(\|V_k\|^2) - \mu(2 - \mu B^2)\, E(\epsilon_k^2) + \mu^2 \sigma_b^2 B^2 + q^2$.  (C.3)


Iterating (C.3) backward $k - 1$ iterations, it follows that

$E(\|V_{k+1}\|^2) \le E(\|V_1\|^2) - \mu(2 - \mu B^2) \sum_{j=1}^{k} E(\epsilon_j^2) + k \mu^2 \sigma_b^2 B^2 + k q^2$.  (C.4)

Since the left-hand side of (C.4) is non-negative, dividing (C.4) by $k$ and taking the limit as $k$ tends to infinity, (3.8) follows immediately. □

Appendix D. Proof of Theorem 4


Proof of (3.10). From (2.1), (2.4), and (3.1) we obtain

$H_{k+1} = H_k + \mu X_k (b_k + G_k^T X_k - H_k^T X_k)$.  (D.1)

Due to (D.1), $H_k$ is a function of $\{(b_j, X_j, G_j) : j < k\}$. Consequently, A1, A3, and A5 imply that

$H_k$, $X_k$, and $b_k$ are mutually independent.  (D.2)

Let $\eta_{j,k}$ and $g_{j,k}$ respectively denote the projections of the vectors $H_k$ and $G_k$ on the $j$th eigenvector of $R$; i.e.,

$\eta_{j,k} \equiv U_j^T H_k$,  (D.3)

$g_{j,k} \equiv U_j^T G_k$.  (D.4)

From (D.1), (D.3), (D.4), and (B.3) in Appendix B,

$\eta_{j,k+1} = \eta_{j,k} + \mu s_{j,k} \left( b_k + \sum_{i=1}^{N} g_{i,k} s_{i,k} - \sum_{i=1}^{N} \eta_{i,k} s_{i,k} \right)$.  (D.5)

From (D.2),

$b_k$, $s_{j,k}$, and $\eta_{i,k}$ are mutually independent, $1 \le i \le N$, $1 \le j \le N$.  (D.6)

Due to A1, (B.3), and (D.4),

$b_k$, $s_{j,k}$, and $g_{i,k}$ are mutually independent, $1 \le i \le N$, $1 \le j \le N$.  (D.7)

(D.5)–(D.7) and (B.6) imply that

$E(\eta_{j,k+1}) = (1 - \mu\lambda_j)\, E(\eta_{j,k}) + \mu\lambda_j E(g_{j,k})$.  (D.8)

Due to (2.11), $\lambda_j = 0$ for $j = 1, 2, \ldots, L$. Then, (D.8) implies that

$E(\eta_{j,k+1}) = E(\eta_{j,k}) = \eta_{j,1}$, $j = 1, 2, \ldots, L$.  (D.9)

Eq. (3.10) follows immediately from (3.9), (D.3), and (D.9).

Proof of (3.11). Due to A5 and (B.3), $s_{j,k}$ is zero-mean Gaussian. Then, (B.6) implies that $s_{i,k}$ and $s_{j,k}$ are mutually independent for $i \ne j$. Therefore, squaring (D.5), taking the expectation, and using (B.6), (D.6), and (D.7), we obtain after straightforward calculations

$E(\eta_{j,k+1}^2) = E(\eta_{j,k}^2) + 2\mu\lambda_j E(\eta_{j,k} g_{j,k} - \eta_{j,k}^2) + 3\mu^2\lambda_j^2 E[(g_{j,k} - \eta_{j,k})^2] + \mu^2\lambda_j \sum_{i=1, i\ne j}^{N} \lambda_i E[(g_{i,k} - \eta_{i,k})^2] + \mu^2 \sigma_b^2 \lambda_j$.  (D.10)

From (2.11) and (D.10),

$E(\eta_{j,k+1}^2) = E(\eta_{j,k}^2) = \eta_{j,1}^2$, $j = 1, 2, \ldots, L$.  (D.11)

Due to (D.9) and (D.11),

variance of $\eta_{j,k} = 0$, $j = 1, 2, \ldots, L$.  (D.12)

(3.11) follows immediately from (3.9), (D.3), and (D.12). □

References

[1] B. Widrow, S.D. Stearns, Adaptive Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1985.
[2] A.H. Sayed, Fundamentals of Adaptive Filtering, Wiley, New York, 2003.
[3] J.K. Kim, L.D. Davisson, Adaptive linear estimation for stationary M-dependent data, IEEE Trans. Inform. Theory 21 (1975) 23–32.
[4] A. Weiss, D. Mitra, Digital adaptive filters: conditions for convergence, rates of convergence, effects of noise and errors arising from the implementation, IEEE Trans. Inform. Theory 25 (1979) 637–652.
[5] R.R. Bitmead, B.D.O. Anderson, Lyapunov techniques for the exponential stability of linear difference equations with random coefficients, IEEE Trans. Automat. Control AC-25 (1980) 782–787.


[6] R.R. Bitmead, Convergence properties of LMS adaptive estimators with unbounded dependent inputs, IEEE Trans. Automat. Control AC-29 (1984) 477–479.
[7] O. Macchi, E. Eweda, Second order convergence analysis of stochastic linear adaptive filtering, IEEE Trans. Automat. Control AC-28 (January 1983) 76–85.
[8] D.H. Shi, F. Kozin, On almost sure convergence of adaptive algorithms, IEEE Trans. Automat. Control AC-31 (1986) 471–474.
[9] V. Solo, The limiting behavior of LMS, IEEE Trans. Acoust. Speech Signal Process. 37 (12) (December 1989) 1909–1922.
[10] V. Solo, The stability of LMS, IEEE Trans. Signal Processing 45 (12) (December 1997) 3017–3026.
[11] F. Ling, J.G. Proakis, Nonstationary learning characteristics of least squares adaptive estimation algorithms, in: Proceedings of IEEE ICASSP, 1984, pp. 3.7.1–3.7.4.
[12] E. Eleftheriou, D.D. Falconer, Tracking properties and steady state performance of RLS adaptive filtering algorithms, IEEE Trans. Acoust. Speech Signal Process. ASSP-34 (5) (October 1986) 1097–1110.
[13] O. Macchi, N.J. Bershad, M. Mboup, Steady-state superiority of LMS over LS for time-varying line enhancer in noisy environment, IEE Proc. Part F: Radar and Signal Processing 138 (4) (August 1991) 354–360.
[14] E. Eweda, O. Macchi, Convergence of the RLS and LMS adaptive filters, IEEE Trans. Circuits and Systems CAS-34 (7) (July 1987) 799–803.
[15] O. Macchi, E. Eweda, Compared speed and accuracy of the RLS and LMS with constant forgetting factors, APII 22 (March 1988) 255–267.
[16] E. Eweda, Comparison of RLS, LMS, and sign algorithms for tracking randomly time-varying channels, IEEE Trans. Signal Processing 42 (11) (November 1994) 2937–2944.
[17] E. Eweda, Maximum and minimum tracking performances of adaptive filtering algorithms over target weight cross correlations, IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing 45 (1) (January 1998) 123–132.
[18] A. Gersho, Adaptive filtering with binary reinforcement, IEEE Trans. Inform. Theory IT-30 (March 1984) 191–198.
[19] E. Eweda, A tight upper bound of the average absolute error in a constant step size sign algorithm, IEEE Trans. Acoust. Speech Signal Process. ASSP-37 (11) (November 1989) 1774–1776.
[20] C.P. Kwong, Control-theoretic design of the LMS and the sign algorithms in nonstationary environments, IEEE Trans. Acoust. Speech Signal Process. 38 (February 1990) 253–259.
[21] S.H. Cho, V.J. Mathews, Tracking analysis of the sign algorithm in nonstationary environments, IEEE Trans. Acoust. Speech Signal Process. 38 (12) (December 1990) 2046–2057.
[22] E. Eweda, Optimum step size of sign algorithm for nonstationary adaptive filtering, IEEE Trans. Acoust. Speech Signal Process. 38 (11) (November 1990) 1897–1901.
[23] E. Masry, F. Bullo, Convergence analysis of the sign algorithm for adaptive filtering, IEEE Trans. Inform. Theory 41 (2) (March 1995).
[24] E. Eweda, Tracking analysis of the sign algorithm without the Gaussian constraint, IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing 45 (1) (January 1998) 115–122.
[25] T.A.C.M. Claasen, W.F.G. Mecklenbrauker, Comparison of convergence of two algorithms for adaptive FIR digital filters, IEEE Trans. Acoust. Speech Signal Process. ASSP-29 (June 1981) 670–678.
[26] N.J. Bershad, Comments on "Comparison of convergence of two algorithms for adaptive FIR digital filters", IEEE Trans. Acoust. Speech Signal Process. ASSP-33 (December 1985) 1604–1606.
[27] W.A. Sethares, I.M.Y. Mareels, B.D.O. Anderson, C.R. Johnson Jr., R.R. Bitmead, Excitation conditions for signed regressor least mean squares adaptation, IEEE Trans. Circuits and Systems 35 (5) (June 1988).
[28] E. Eweda, Analysis and design of a signed regressor LMS for stationary and nonstationary adaptive filtering with correlated Gaussian data, IEEE Trans. Circuits and Systems CAS-37 (11) (November 1990) 1367–1374.
[29] J.C. Lee, C.K. Un, Performance of transform domain LMS adaptive digital filters, IEEE Trans. Acoust. Speech Signal Process. ASSP-34 (June 1986) 499–510.
[30] D.F. Marshall, W.K. Jenkins, J.J. Murphy, The use of orthogonal transforms for improving performance of adaptive filters, IEEE Trans. Circuits and Systems 36 (4) (April 1989) 474–484.
[31] E. Eweda, Convergence analysis of the sign algorithm without the independence and Gaussian assumptions, IEEE Trans. Signal Processing 48 (9) (September 2000) 2535–2544.
[32] J.E. Mazo, On the independence theory of equalizer convergence, Bell System Technical J. 58 (May 1979) 968–993.
[33] R.R. Bitmead, Persistence of excitation conditions and the convergence of adaptive schemes, IEEE Trans. Inform. Theory 30 (March 1984) 183–191.
[34] E. Eweda, Convergence analysis of adaptive filtering algorithms with singular data covariance matrix, IEEE Trans. Signal Processing 49 (2) (February 2001) 334–343.
[35] B.D.O. Anderson, Adaptive systems, lack of persistency of excitation and bursting phenomena, Automatica 21 (3) (May 1985) 247–258.
[36] W.A. Sethares, C.R. Johnson Jr., C.E. Rohrs, Bursting in adaptive hybrids, IEEE Trans. Commun. 37 (8) (August 1989) 791–799.
[37] M.D. Espana, Intermittent phenomena in adaptive systems: a case study, Automatica 27 (4) (1991) 717–720.
[38] K. Tsakalis, M. Deisher, A. Spanias, System identification based on bounded error constraints, IEEE Trans. Signal Processing 43 (12) (December 1995) 3071–3075.
[39] K.S. Tsakalis, Performance limitations of adaptive parameter estimation and system identification algorithms in the absence of excitation, Automatica 32 (4) (1996) 549–560.

[24] E. Eweda, Tracking analysis of the sign algorithm without the Gaussian constraint, IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing 45 (1) (January 1998) 115–122. [25] T.A.C.M. Claasen, W.F.G. Mecklenbrauker, Comparison of convergence of two algorithms for adaptive FIR digital filters, IEEE Trans. Acoust. Speech Signal Process. ASSP29 (June 1981) 670–678. [26] N.J. Bershad, Comments on Comparison of convergence of two algorithms for adaptive FIR digital filters, IEEE Trans. Acoust. Speech Signal Process. ASSP-33 (December 1985) 1604–1606. [27] W.A. Sethares, I.M.Y. Mareels, B.D.O. Anderson, C.R. Johnson Jr., R.R. Bitmead, Excitation conditions for signed regressor least mean squares adaptation, IEEE Trans. Circuits and Systems 35 (5) (June 1988). [28] E. Eweda, Analysis and design of a signed regressor LMS for stationary and nonstationary adaptive filtering with correlated Gaussian data, IEEE Trans. Circuits and Systems CAS-37 (11) (November 1990) 1367–1374. [29] J.C. Lee, C.K. Un, Performance of transform domain LMS adaptive digital filters, IEEE Trans. Acoust. Speech Signal Processing. ASSP-34 (June 1986) 499–510. [30] D.F. Marshall, W.K. Jenkins, J.J. Murphy, The use of orthogonal transforms for improving performance of adaptive filters, IEEE Trans. Circuits and Systems 36 (4) (April 1989) 474–484. [31] E. Eweda, Convergence analysis of the sign algorithm without the independence and Gaussian assumptions, IEEE Trans. Signal Processing 48 (9) (September 2000) 2535–2544. [32] J.E. Mazo, On the independence theory of equalizer convergence, Bell System Technical J. 58 (May 1979) 968–993. [33] R.R. Bitmead, Persistence of excitation conditions and the convergence of adaptive schemes, IEEE Trans. Inform. Theory 30 (March 1984) 183–191. [34] E. Eweda, Convergence analysis of adaptive filtering algorithms with singular data covariance matrix, IEEE Trans. Signal Processing 49 (2) (February 2001) 334–343. [35] B.D.O. Anderson, Adaptive systems, lack of persistency of excitation and bursting phenomena, Automatica 21 (3) (May 1985) 247–258. [36] W.A. Sethares, C.R. Johnson Jr., C.E. Rohrs, Bursting in Adaptive Hybrids, IEEE Trans. Commun. 37 (8) (August 1989) 791–799. [37] M.D. Espana, Intermittent phenomena in adaptive systems: a case study, Automatica 27 (4) (1991) 717–720. [38] K. Tsakalis, M. Deisher, A. Spanias, System identification based on bounded error constraints, IEEE Trans. Signal Processing 43 (12) (December 1995) 3071–3075. [39] K.S. Tsakalis, Performance limitations of adaptive parameter estimation and system identification algorithms in the absence of excitation, Automatica 32 (4) (1996) 549–560.