Analysis of robust recursive least squares: Convergence and tracking

Analysis of robust recursive least squares: Convergence and tracking

Analysis of Robust Recursive Least Squares: Convergence and Tracking Journal Pre-proof Analysis of Robust Recursive Least Squares: Convergence and T...

2MB Sizes 0 Downloads 50 Views

Analysis of Robust Recursive Least Squares: Convergence and Tracking

Journal Pre-proof

Analysis of Robust Recursive Least Squares: Convergence and Tracking Alireza Naeimi Sadigh, Amir Hossein Taherinia, Hadi Sadoghi Yazdi PII: DOI: Reference:

S0165-1684(20)30025-6 https://doi.org/10.1016/j.sigpro.2020.107482 SIGPRO 107482

To appear in:

Signal Processing

Received date: Revised date: Accepted date:

20 May 2019 11 December 2019 15 January 2020

Please cite this article as: Alireza Naeimi Sadigh, Amir Hossein Taherinia, Hadi Sadoghi Yazdi, Analysis of Robust Recursive Least Squares: Convergence and Tracking, Signal Processing (2020), doi: https://doi.org/10.1016/j.sigpro.2020.107482

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2020 Published by Elsevier B.V.

Highlights • This study proposes a robust method for recursive least squares. • Convergence and performance are analyzed in both the stationary and non-stationary environments. • Maximum correntropy criterion (MCC) is designed for loss function optimization problem. • A half-quadratic method is used to successively convert complex correntropy based objective to a simple quadratic for establishing convergence and performance. • A robust adaptive filter is designed when the desired signal is contaminated with non-Gaussian noises.

1

Analysis of Robust Recursive Least Squares: Convergence and Tracking Alireza Naeimi Sadigha , Amir Hossein Taherinia a Department

a,∗

, Hadi Sadoghi Yazdia

of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran.

Abstract Outliers and impulsive noise are inevitable factors in recursive least squares (RLS). Developing robust RLS is vital in practical applications such as system identification in which outliers in the desired signals may severely divert the solutions. Almost all suggested robust RLS schemes have been designed for the impulsive noise and Gaussian environments. Recently, employing the Maximum Correntropy Criterion (MCC), the RGMCC (Recursive General MCC) algorithm has been given which yields more exact results for system identification problem in non-Gaussian environments. Here, we develop a new Robust RLS (R2 LS) scheme based on the MCC. In contrast to RGMCC, the structure of our model, although being complex, makes it possible to conduct convergence and performance analysis in both the stationary and non-stationary environments. Especially, the model is capable to reasonably predict and track the signals when the original signal is contaminated by non-Gaussian noise. To establish convergence and performance, we apply a half-quadratic optimization algorithm in the multiplicative form to successively convert our model to a quadratic problem which can be effectively solved by the classical tools. Numerical experiments are done on real and synthetic datasets; they show that the proposed algorithm outperforms the conventional RLS as well as some of its recent extensions. Keywords: Robust recursive least squares, Non-Gaussian noise, ∗ corresponding

author Email addresses: [email protected] (Alireza Naeimi Sadigh), [email protected] (Amir Hossein Taherinia ), [email protected] (Hadi Sadoghi Yazdi)

Preprint submitted to Journal of LATEX Templates

January 25, 2020

Half-quadratic optimization, Convergence and tracking

1. Introduction Recursive least squares can be considered as a popular tool in many applications of adaptive filtering [1, 2], mainly due to the fast convergence rate. RLS algorithms employ Newton search directions and hence they offer faster con5

vergence relative to the algorithms that employ the steepest-descent directions. Besides, this performance gain is in return for a larger computational complexity. Nevertheless, there are several low-complexity versions of the RLS algorithm [3, 4]. Nevertheless, this methods has only been operated in application to purely RLS adaptive filters [4] or RLS with at most quadratic regularization

10

[5]. In various applications, RLS is sensitive to the presence of outliers which can reduce the performance of adaptive filters. Many algorithms have been suggested to overcome this defect. Among them, the robust RLS has attracted especial attentions; as examples see [7, 8, 9], [12], [18], [23].

15

Although these approaches improve RLS when outliers appear, they have some limitations. For example, Chan and Zou [7] minimized a convex cost function to provide some levels of robustness. However, this makes no guarantee for reaching the global optima. Moreover, these models cannot handle the correlated nominal noise with known or estimated covariance matrix. On the

20

other hand, instead of scheme cost functions in the robust statistics, the outliers sparsity model has been explicitly introduced to determine the outliers [15]. The robust fast RLS (FRLS) algorithm is proposed in [8]. Although scale factor is used along with the standard deviation of the input signal to improve the robustness, but it has no benefit on the standard deviation of the background

25

noise. A common difficulty of outliers is impulsive noise which is discussed in [9] and [12]. This kind of noise appeared in an echo cancellation setting [16] as a double talk. Different algorithms are proposed to improve RLS robustness, see for example [9, 12]. The approach of [9] is based on optimizing a certain cost

3

function subject to a time-dependent constraint on the norm of the filter update, 30

but this method is no closed-form solution in general and the algorithm of [12] endure from its computational complexity. The robust RLS method suggested in [17] employs l1 norm penalty of the outlier vector to solve a sparse minimization. In that algorithm, l1 and l2 norm minimization problems have been presented, and it is helpful to remove the impulsive noise in the contaminated signal.

35

Another approach for outlier sparsity control is named as robust RLS via outlier pursuit (RRLSvOP) [18], is different from the method of [17] in computational procedures. The RRLSvOP has used a kind of nonconvex penalization for sparsity such as multistage local linearized approximation that generalizes original l1 norm penalized to the possibly nonconvex penalized.

40

A robust RLS was recently proposed, which focuses on the regularized RLS algorithm [11]. In the initial stage of the algorithm, ill-conditioned matrix inversion is required. In all the mentioned methods, the Mean Square Error (MSE) has been considered as the cost function, whereas, it is not suitable for non-Gaussian signal

45

environments. Hence, the Maximum Correntropy Criterion (MCC) [24] in the Information Theoretical Learning (ITL) has been applied in the optimization model to handle the non-Gaussian noises [22], [19]. Recently, a generalized MCC (GMCC) [13] as well as a recursive GMCC (RGMCC) [14] have been developed in robust adaptive filtering on non-Gaussian signal environments. However, con-

50

vergence analyses have not been conducted in RGMCC because of the complex structure of the given models. As known, classical optimization models make it possible to deal with the convergence more effectively. In this study, the robustness of RLS adaptive filtering algorithm applied in system identification on the maximum correntropy criterion (MCC) is evalu-

55

ated in different non-Gaussian noise models. Also, the approach is presented convergence analysis in stationary and non-stationary environments. Being an extension of the RLS achieved by replacing SE criterion with MCC, from now on the proposed method is called R2 LS. As a popular tool in robust face recognition [22], [25], and robust feature extraction [26], the half-quadratic (HQ) optimiza4

60

tion method [20] is employed in our scheme. It is shown that R2 LS is suitable for achieving better convergence and tracking properties in a non-stationary environment by time-varying systems with not so complicated computations. Moreover, our experiments confirm that in contrast to GMCC and RGMCC, the proposed algorithm is more efficient.

65

The remainder of this paper is organized as follows. Section 2 is a brief review on RLS and HQ-optimization methods. Section 3 describes the R2 LS algorithm on half-quadratic optimization and introduces the proposed method that would be used to solve it. The convergence of the R2 LS is presented in section 4. Section 5 proposes tracking of time-varying system on R2 LS and

70

section 6 reports the experimental results.

2. Preliminaries In this section some preliminary issues with the proposed algorithm are discussed. 2.1. Recursive least squares Consider a linear adaptive filter with the observed real-valued input vector at time i, u(i) = [u(i), u(i − 1), ..., u(i − M + 1)]T

(1)

where M is the number of delay elements. In addition, M tap-weight vector as, w(n) = [w0 (n), w1 (n), ..., wM −1 (n)]T

(2)

and desired signal d(i) at time i as follows, d(i) = [d(i), d(i − 1), ..., d(i − M + 1)]T

(3)

The weight vector is found recursively in time such as to minimize the sum of error square. The adaptive filter output is computed as the inner product of w(n) and u(i), 5

Figure (1)

Adaptive filtering structure

that is, wT (n)u(i), which is illustrated in Fig. 1. It is chosen w(n) so that the following summation is minimized.

J(w) =

n X

λn−i e2i

(4)

i=1

The corresponding estimation error would be en (i) = d(i) − yn (i), that λ is forgetting factor reduces the influence of old data, it usually takes the value from 0 < λ ≤ 1. yn (i) as the filter output is defined using the tap-weight vector w(n) as follows: yn (i) = wT (n)u(i) ∀i = 1, 2, ..., n 75

(5)

2.2. Optimization procedure via half-quadratic The half-quadratic technique was first developed in [20]. HQ has been successfully used in face recognition [22], pattern recognition [25], feature extraction [26] and image processing [21]. In this section, It is derived an algorithm based on the maximum correntropy criterion (MCC) by half- quadratic algo-

80

rithm. The correntropy [24] is a method to estimate the similarity between two random variables. It is a nonlinear function and insensitive to outliers and robustness. Based on the theory of convex conjugated function, it can be easily derived the following proposition.

6

Proposition 2.1. There exists a convex conjugated function ϕ of g(x) such that:

g(x) = sup (p

0

p0 <0

85

0 kx2 k − ϕ(p )) σ2

0

(6) 0

where p ∈ R and for a fixed x, the supremum is reached at p = −g(x). (Proof in [27]) 3. Robust Recursive Least Square (R2 LS) Outliers are the abnormal data points, which significantly deviate from normal values. In real-world, outliers may produce from measurement failures or big noise disturbances. They are rarely detectable and not clearly marked. Outliers and missing data are common in the measurement data and must be addressed properly to avoid large estimation errors. For overcoming this problem in RLS, here, a new robust RLS algorithm (R2 LS) is proposed. Which uses a maximum correntropy criterion (MCC) of the error to weight each sample based on how likely it seems to be noisy and then develop a half-quadratic optimization algorithm to maximize the objective in (7), along with the convergence analysis.

max J(w) = w

n X

λn−i exp(−ηe2i )

(7)

i=1

For solving (7) by HQ optimization, firstly, it is defined as a convex function ϕ(p) = −p log(−p) + p where p < 0. According to Proposition 2.1, it obtains exp(−ηe2i ) = sup (pi ηe2i − ϕ(pi ))

(8)

pi <0

where the supremum is achieved at pi = − exp(−ηe2i )

7

(9)

Substituting (8) into (7), J(w) is obtained as follows: n X

λn−i sup (pi ηe2i − ϕ(pi ))

= sup {

λn−i (pi ηe2i − ϕ(pi ))}

max J(w) = w

i=1

w,p<0

pi <0

n X i=1

n X = max { λn−i (pi ηe2i − ϕ(pi ))} w,p<0

(10)

i=1

An alternating optimization method can be used to optimize (10). To be specific, given w, it is optimized over p and then given p, it is optimized over w. First, in given w, whose analytic solutions are pi = − exp(−ηe2i ) < 0 and it is supposed qi = −pi . Then after achieving p, w can be obtained by solving the following problem: min w

n X

λn−i (qi ηe2i ),

(11)

i=1

the minimization problem in (11) is the optimization problem of R2 LS. In the next section, it is explained why the R2 LS performs better than the 90

RLS. 3.1. R2 LS loss function The loss function which used in RLS is in the least squares form where its expectation is defined by E{l(e)} = E{e2 }. However, the loss function in R2 LS is correntropy loss function which its expectation is obtained by E{l(e)} =

95

E{1 − exp(−ηe2 )}, where e is the error. Fig. 2 shows two loss functions. As seen, the correntropy loss function is less sensitive in presence of the error and it has more stability in comparison with the other loss functions, whereas the squares loss function is sensitive to a large value of the error. Thus, RLS is deviated to the presence of outliers and impulsive noise which can reduce the

100

performance of adaptive filters. Hence, the R2 LS is suitable for achieving better convergence and tracking. The R2 LS solution is given by following.

8

Figure (2)

Illustration of the correntropy and squared loss

3.2. R2 LS formulation The correlation matrix Φ(n) and the cross correlation matrix z(n) are defined by following: Φ(n) = z(n) =

n X

i=1 n X

λn−i qi ηu(i)uT (i)

(12)

λn−i qi ηu(i)dT (i)

(13)

i=1

For this equations and expanding the summations, it can be defined recursive form following: Φ(n) = λΦ(n − 1) + qn ηu(n)uT (n)

(14)

z(n) = λz(n − 1) + qn ηu(n)dT (n)

(15)

From matrix notation, the diagonal matrix consisting of λn−1 q1 η, λn−2 q2 η, ..., qn η yield



   Λq (n) =    

λn−1 q1 η λn−2 q2 η

0



0 ..

. qn η

and the matrix observed input is defined U (n) = [u(1), u(2), ..., u(n)]

9

      

(16)

(17)

Eq. (12) and (13) can obtained by Φ(n) = U (n)Λq (n)U T (n)

(18)

z(n) = U (n)Λq (n)d(n)

(19)

Also, tap-weight vector w(n) obtains from Φ(n) and z(n) as: w(n) = Φ−1 (n)z(n)

(20)

For calculating w(n), it is needed to apply the correlation matrix inverse. According the matrix inversion lemma [1] (A + αaaT )−1 = A−1 −

αA−1 aaT A−1 1 + αaT A−1 a

(21)

with the identification: A = λΦ(n − 1), α = qn η, a = u(n)

(22)

For simplification in formulas, let qn = qn η. 105

Substituting (22) in (21) and using eq. (14), Φ−1 (n) yields: Φ−1 (n) = λ−1 Φ−1 (n − 1) −

qn λ−1 Φ−1 (n − 1)u(n)uT (n)Φ−1 (n − 1)λ−1 (23) 1 + qn λ−1 u(n)T Φ−1 (n − 1)u(n)

Φ−1 (n) = λ−1 Φ−1 (n − 1) −

qn λ−2 Φ−1 (n − 1)u(n)uT (n)Φ−1 (n − 1) 1 + qn λ−1 u(n)T Φ−1 (n − 1)u(n)

(24)

Gain vector is introduced by k(n) as follows: k(n) =

qn λ−1 Φ−1 (n − 1)u(n) 1 + qn λ−1 u(n)T Φ−1 (n − 1)u(n)

(25)

From eq. (24) and (25), gain vector is rewrite as follow: k(n) = qn Φ−1 (n)u(n)

(26)

Substituting (15) and (24) into (20), w(n) obtains as: w(n) = w(n − 1) + k(n)[d(n) − w(n − 1)T u(n)] 10

(27)

Finally, Eq. (16), (24), (26) and (27) describe the R2 LS. The proposed R2 LS is described in Algorithm 1. In details, which will reduces R2 LS complexity to a weighted RLS problem. In RLS, when d(n) sud110

denly increases by outliers, hence, the error also increases and this would deviate weighted coefficient. The effect of this abnormal data points can be bounded by R2 LS which extremely suppresses qn closed to zero with qn = exp(−ηe2n ) when en is grown. With qn ≈ 0 and also the corrupt desired signal, the gain vector in (26) greatly

115

reduced, and the weighted equations in (27) do not deviate. In other words, the R2 LS provides robust performance even in the presence of very large outliers. 3.3. Computational complexity of R2 LS The computational complexity of the R2 LS algorithm is now compared with the traditional RLS algorithm in the time domain. The comparison is based on

120

a count of the total number of operations such as multiplications and addition involved in each of these two implementations for a block size M , where M is the dimension of the tap-weight vector and the number of delay elements. In RLS algorithm for real value data, the number of multiplications/division is M 2 + 5M + 2 and number of addition/subtraction is M 2 + 3M [2]. Therefore,

125

the computational complexity of the RLS is O(M 2 ). Similar to [2], the total

number of operations of the R2 LS is listed in Table 1. The difference between the complexity of algorithm the R2 LS and RLS is an exponential function which is negligible according to the R2 LS performance. 4. Convergence Analysis of R2 LS This section focuses on the convergence of the R2 LS algorithm in the context of a system identification problem. It is assumed a multiple linear regressor is defined by following: d(n) = U T (n)wo + eo (n)

11

(28)

Alg. 1 R2 LS Algorithm

Input: u(n), d(n), λ, η Output: w(n), Φ−1 (n)

1

begin

2

Initialize w(0) = 0, Φ−1 0 = δI

3

for each instant of time, n=1,2,... do

4

a. error estimated: T en (n) = d(n) − wn−1 u(n)

5 6

b. computing qn :

7

qn = exp(−ηen (n)2 )

8

qn = qn η

9

c. computing gain vector:

10

k(n) =

11

d. weight vector adaptation : w(n) = w(n − 1) + k(n)[d(n) − w(n − 1)T u(n)]

12

e. update Φ−1 (n) :

13

Φ−1 (n) = λ−1 {Φ−1 (n − 1) − k(n)uT (n)Φ−1 (n − 1)}

14 15

130

qn λ−1 Φ−1 (n−1)u(n) 1+qn λ−1 u(n)T Φ−1 (n−1)u(n)

end

where wo is the regressor weight vector, U (n) is the tap-input vector, eo (n) = [e0 (1), e0 (2), ..., e0 (n)]T is the noise with white zero mean and variance σ 2 and independent of the input samples. d(n) is the noisy output of the system shown in Fig. 3. 4.1. Convergence of R2 LS in the Mean value

135

Proposition 4.1 is shown that the least squares estimation w(n) is an unbiased estimate of the tap-weight vector wo .

12

Computational cost of the R2 LS per iteration for real value data in terms of the

Table (1)

number of operations

Term

Multiplication/Division

Addition/Subtraction

Exp fun.

M

M −1

-

-

1

-

2

-

-

qn = exp(−ηen (n)2 )

-

-

1

qn = qn η

1

-

-

λ−1 u(n)

M

-

-

λ−1 Φ−1 (n − 1)u(n)

M2

M (M − 1)

-

1

-

-

M

(M − 1)

-

-

1

-

1

-

-

1

-

-

M

-

-

-

M

-

M2

-

-

T wn−1 u(n)

d(n) −

T wn−1 u(n)

−ηen (n)2

−1

qn λ

Φ

−1

(n − 1)u(n)

qn λ−1 u(n)T Φ−1 (n − 1)u(n) 1 + qn λ

−1

T

u(n) Φ

−1

(n − 1)u(n)

1 1+qn λ−1 u(n)T Φ−1 (n−1)u(n) λ−1 Φ−1 (n−1)u(n) k(n) = 1+qnqλn−1 u(n)T Φ−1 (n−1)u(n) T

k(n)[d(n) − w(n − 1) u(n)] w(n) k(n)uT (n)Φ−1 (n − 1) Φ

−1

T

(n − 1) − k(n)u (n)Φ

Φ−1 (n)

−1

(n − 1)

Total per Iteration

-

M

M2

-

-

3M 2 + 4M + 6

2M 2 + 2M

1

Proposition 4.1. Despite the Gaussian noise eo (n), the average tap-weight behavior of the R2 LS generated an unbiased estimate on optimal weight vector wo . Proof:

13

2

-

-

-

Figure (3)

System identification structure

Substituting (28) into (19), z(n) is derived as following: z(n) = U (n)Λq (n)d(n) = U (n)Λq (n) .(U T (n)wo + eo (n))

(29)

z(n) = U (n)Λq (n)U (n)T wo + U (n)Λq (n)eo (n)

(30)

z(n) = Φ(n)wo + U (n)Λq (n)eo (n)

(31)

From (20) and (31), w(n) obtains as: w(n) = Φ−1 (n)[Φ(n)wo + U (n)Λq (n)eo (n)]

(32)

w(n) = wo + Φ(n)−1 U (n)Λq (n)eo (n)

(33)

Express expectation on (33) and as regard that U (n) and eo (n) are independent of each other, is defined as: E(w(n)) = wo + E[Φ(n)−1 U (n)Λq (n)eo (n)] = wo + E[Φ(n)−1 U (n)Λq (n)]E[eo (n)]

(34)

In (34), eo (n) is the zero mean Gaussian white noise, E(eo (n)) = 0, so that it is concluded: E(w(n)) = wo 14

(35)

This Proposition shows that w(n) is an unbiased estimation of wo . 140

4.2. The mean squared error in weighted vector The weighted error vector is defined as: v(n) = w(n) − wo

(36)

From (33), v(n) is written as following: v(n) = Φ−1 (n)U (n)Λq (n)eo (n)

(37)

Next, weight error correlation matrix is defined as: C(n) = E{v(n)v T (n)}

(38)

Substituting (37) into (38), C(n) is obtained as following: C(n) = E{Φ−1 (n)U (n)Λq (n)eo (n)eTo (n)U T (n)Λq (n)Φ−1 (n)}

(39)

As regard that U (n) and eo (n) are independent of each other, C(n) can be defined: C(n) = E{Φ−1 (n)U (n)Λq (n)E(eo (n)eTo (n))U T (n)Λq (n)Φ−1 (n)}

(40)

eo (n) is assumed as white-noise process of variance σ02 , E(eo (n)eTo (n)) = σ02 I

(41)

Then C(n) is derived as following: C(n) = σ02 E{Φ−1 (n)U (n)Λq (n)Λq (n)U T (n)Φ−1 (n)}

(42)

From (18), Φλ (n) and Φλ2 (n) are defined as : Φλ (n) = U (n)Λq (n)U T (n) = (λn−1 q1 + λn−2 q2 + ... + qn )R

(43)

Φλ2 (n) = U (n)Λq (n)Λq (n)U T (n) = ((λn−1 q1 )2 + (λn−2 q2 )2 + ... + qn2 )R 15

(44)

where R = E(u(n)uT (n)) is auto correlation matrix of the input vector u(n). Finally, substituting (43) and (44) into (42), C(n) is defined as: C(n) = σ02

R−1 R−1 (λn−1 q1 + λn−2 q2 + ... + qn )2

.[(λn−1 q1 )2 + (λn−2 q2 )2 + ... + qn2 ]R R−1 = σ02

n X (qi λn−i )2 i=1

(45)

n X ( qi λn−i )2 i=1

Proposition 4.2. The series that appear in (45),

n X

(qi λn−i )2

i=1 n X

(

, is convergent qi λn−i )2

i=1

and less than one. (Proof in Appendix A).

Now, α is set as:

α=

n X

(qi λn−i )2

i=1 n X

(

(46) n−i 2

qi λ

)

i=1

In other words, when n → ∞, it can be shown: C(∞) = σ02 R−1 α, α < 1

(47)

The Eq. (47) is expressed two known quantity: white-noise process of variance

145

σ02 that is added to the desired signal and inverse of eigenvalue correlation matrix R.

16

4.3. Learning curve on the R2 LS algorithm The learning curve is defined in terms of the a priori error ξ(n) and as eliminating desired response d(n) using (28) and (37) as following: en−1 (n) = d(n) − wT (n − 1)u(n) = (e0 (n) + w0T u(n)) − wT (n − 1)u(n)

(48)

Now, a priori estimation error is defined by mean square error: ξn−1 (n) = E[e2n−1 (n)]

(49)

Substituting (48) into (49), it is obtained as following: ξn−1 (n) = E[e2n−1 (n)] = E([e0 (n) − v T (n − 1)u(n)]2 ) = E(e20 (n)) − 2E[e0 (n)v T (n − 1)u(n)] + E[v T (n − 1)u(n)uT (n)v(n − 1)]

(50)

eo (n) is assumed as white-noise process and eo and u(n) are independent, so we obtain: E[e0 (n)v T (n − 1)u(n)] = 0

(51)

Furthermore, it is assumed that v(n − 1) and u(n) are independent. The third term of (50) is obtained as: E[v T (n − 1)u(n)uT (n)v(n − 1)] = E[v T (n − 1)E(u(n)uT (n))v(n − 1)] = E[v T (n − 1)Rv(n − 1)].

(52)

tr[.] denotes the trace of the matrix. = E[v T (n − 1)Rv(n − 1)] = tr[E[v T (n − 1)Rv(n − 1)]] = E[tr[v T (n − 1)Rv(n − 1)]] = E[tr[v T (n − 1)v(n − 1)R]] = tr[E[v T (n − 1)v(n − 1)]R] = tr[C(n − 1)R]

(53)

17

Substituting (51) and (53) into (50), ξn−1 (n) obtains as follows: ξn−1 (n) = E[e20 (n)] + tr[C(n − 1)R]

(54)

ξn−1 (n) = ξmin + tr[C(n − 1)R]

(55)

where ξmin = E[e20 (n)] is the minimum MSE of the filter which is achieved when a perfect estimate of w0 is available. Substituting (45) into (55), ξn−1 (n) is derived as : n−1 X

(qi λn−1−i )2

ξn−1 (n) = ξmin + σ02

i=1 n−1 X

(

(56) n−1−i 2

qi λ

)

i=1

Because of the second term on the right-hand side of (56) is a positive value and less that one (prop. 4.2), deviation ξn−1 (n) obtains from ξmin .

150

5. Tracking of Time-Varying Systems In previous sections, it was considered the behavior of R2 LS algorithm in a stationary environment that wo is invariant over time. In this section, it will be shown the operation of this algorithm in a non-stationary environment that wo is varying by time. For such an environment, optimum filter solution takes on time-varying form. The adaptive filtering now has the new task of tracking. In this section will be shown a time-varying model for system identification that depicted in Fig. 4, as described next. A transversal filter tap weight vector wo (n) is derived as follows. wo (n + 1) = awo (n) + ω(n)

(57)

where a is a fixed parameter, and ω(n) is the process noise vector assumed to be zero mean. The desired response d(n) is defined as: d(n) = wo (n − 1)T u(n) + s(n)

18

(58)

Figure (4)

System identification structure in tracking model

where u(n) is the input vector and s(n) is the measurement noise to be white with zero mean and variance σs2 . The error signal e(n) is defined by: e(n) = d(n) − y(n) = wo (n − 1)T u(n) + s(n) − w(n − 1)T u(n)

(59)

The tap-weight vector wo (n) of the unknown system represents the target to be tracked by adaptive filter. 5.1. Tracking performance of the R2 LS algorithm Consider the R2 LS algorithm used the adaptive filter of Fig. 4, from (27) weight vector w(n) to be shown as following: w(n) = w(n − 1) + qn Φ−1 (n)u(n)[d(n) − w(n − 1)T u(n)]

(60)

Eq. (57) is modified as: wo (n) = awo (n − 1) + ω(n)

(61)

Weight error vector is defined as: v(n) = w(n) − wo (n) 19

(62)

Substituting (60) and (61) into (62), weight error vector is updated in the 2

R LS algorithm as following: v(n) = w(n) − wo (n) = w(n − 1) + qn Φ−1 (n)u(n)[d(n) − w(n − 1)T u(n)] − awo (n − 1) − ω(n)

(63)

After substituting desired response in (63), weight error vector is calculated as: v(n) = v(n − 1)[I − qn Φ−1 (n)u(n)u(n)T ] + qn Φ−1 (n)u(n)s(n) + (1 − a)wo (n − 1) − ω(n)

(64)

It is assumed parameter a is very close to 1 so that we can ignore term (1 − a)wo (n − 1). Reduced this equation model as: v(n) = v(n − 1)[I − qn Φ−1 (n)u(n)u(n)T ] + qn Φ−1 (n)u(n)s(n) − ω(n)

(65)

Substituting (43) into (65), v(n) = v(n − 1)[I − +

qn R−1 u(n)u(n)T n−1 λ q1 + λn−2 q2 + ...

+ qn

]

qn R−1 u(n)s(n) − ω(n) λn−1 q1 + λn−2 q2 + ... + qn

(66)

If there is no outliers, it can be assumed q1 = q2 = ... = qn and when n → ∞,

and so λn → 0. Therefor correlation matrix of the input vector u(n) is R and independence assumption as follow: v(n) = v(n − 1)λ + (1 − λ)R−1 u(n)s(n) − ω(n)

(67)

Finally, the correlation matrix of v(n) obtains: C(n) = C(n − 1)λ2 + (1 − λ)2 R−1 σs2 + Q Q is correlation matrix of ω(n). 155

20

(68)

In another situation, if there is one outlier (the ith value of d(n)), we can rewrite q1 = q2 = ... = qi−1 = qi+1 = ... = qn , qi 6= qn and when n → ∞ then λn → 0:

qn R−1 u(n)u(n)T ] (λn−1 qn + λn−2 qn + ... + λn−i+1 qn + qi λn−i + qn λn−i−1 + ... + qn − qn λn−i + qn λn−i ) qn R−1 u(n)s(n) + n−1 − ω(n) (λ qn + λn−2 qn + ... + λn−i+1 qn + qi λn−i + qn λn−i−1 + ... + qn − qn λn−i + qn λn−i )

v(n) = v(n − 1)[I −

(69)

if n is far away from i, so that λn−i → 0, v(n) will be same as (67), otherwise v(n) obtains as follow: v(n) = v(n − 1)[1 − +

1 qn 1−λ

+

qn (n−i) λ (qi

qn R−1 u(n)s(n) 1 qn 1−λ + λ(n−i) (qi − qn )

− qn )

]

− ω(n)

(70)

But qi → 0 then from (70), it obtains: v(n) = v(n − 1)[1 − +

1 qn 1−λ

qn ] − λ(n−i) qn

qn R−1 u(n)s(n) − ω(n) 1 qn 1−λ − λ(n−i) qn

= v(n − 1)(1 −

1 − λ −1 1−λ )+( )R u(n)s(n) − ω(n) λ λ

(71)

The correlation matrix of v(n) obtains: C(n) = C(n − 1)(1 −

1−λ 2 1 − λ 2 −1 2 ) +( ) R σs + Q λ λ

(72)

5.2. Analysis correlation matrix of v(n) Now, due to the error e(n) and the λ values, we review the correlation matrix of v(n) (C(n)) changes. Firstly, note that λ is defined by λ = 1 − qn

(73)

If there exist outliers, then the error e(n) grows, qn → 0, and consequently, from (73) we have λ → 1. So, from (72) it can be concluded C(n) only depended on the previews states and it is completely independent to the influence data 21

160

and outliers. On the other hand, If there is no any outlier, then error e(n) decreases, qn → 1, and consequently, from (73) we have λ → 0. So, from (68) it can be concluded that C(n) only depended on the influence data. Thus, it is suitable since there is no any outlier.

6. Simulation Result In this section 1 , the robust RLS (R2 LS) algorithm is compared with the conventional RLS algorithm and the PRRLS [12], GMC algorithm [13] and RGMC [14] in terms of robustness for a system identification application by MackeyGlass short time series prediction. This series is generated by the following time differential equation [6]: dx(t) 0.2x(t − 30) = −0.1x(t) + dt 1 + x(t − 30)10

(74)

The sampling period in this time series is 6 s. The training set is selected by a segment of 500 samples and the testing set is selected by another 100 samples in all the experiments. To evaluate the filtering performance, 50 Monte Carlo PN are operated to achieve the MSE criteria as N1 n=1 (di − yi )2 (N = 100). These comparisons are evaluated under non-Gaussian noise distributions that

divide including alpha-stable noise and mixed Gaussian noise for performance experiments. Specifications of these non-Gaussian noises are as following: 1) Alpha-stable noise model: this function is given by Ψ(t) = exp{jδt − γ | t |α [1 + jβsgn(t)S(t, α)]} in which

  tan aπ α 6= 1 2 S(t, α) =  2 log | t | α = 1 π

(75)

(76)

with the characteristic factor 0 < α 6 2, the symmetry parameter −1 6 β 6 1, the location parameter −∞ < δ < +∞ and the dispersion parameter 0 < γ < 1 The

source codes is provided in https://github.com/anaeimi/R2LS

22

Learning Curve

RGMC α = 1 RGMC α = 2 RGMC α = 3 GMC α = 1 GMC α = 2 GMC α = 3

R2 LS

R2 LS

10 -1

10 -2

0

50

100

150

200

250

300

Learning Curve

10 0 RGMC α = 1 RGMC α = 2 RGMC α = 3 GMC α = 1 GMC α = 2 GMC α = 3

MSE

MSE

10 0

350

400

450

500

10 -1

10 -2

0

50

100

150

200

iteration

250

(a) Figure (5)

(b)

Convergence curves of GMS, RGMC and R2 LS in different noises. (a) Alpha-

stable noise (b) Mixed Gaussian noise.

∞. The parameter of vector noise model is defined as Vα−stable (α, β, γ, δ). In all simulations, the alpha-stable vector noise is set as Vα−stable (1.3, 0, 1.2, 0). 2) Mixed Gaussian noise model: the mixed Gaussian noise model is given by (1 − θ)N (µ1 , σ12 ) + θN (µ2 , σ22 ) 165

(77)

For producing impulsive noise or large outliers, θ is set by a small value and σ12 

σ22 . The parameter of vector noise model is defined as Vmix (µ1 , µ2 , σ12 , σ22 , θ). In all simulations, the mixed Gaussian vector noise is set as Vmix (0, 0, 0.01, 100, 0.05).

6.1. Performance comparison among the R2 LS and other algorithms 170

In the first experiment, Convergence curves are shown for R2 LS and GMC family algorithms on the alpha-stable noise model and mixed Gaussian noise in Fig. 5. In all experiments, the recursive algorithms of forgetting factor is 0.95 and the step size for GMC is 0.01. The shape parameter (α) is valued {0.5, 2, 3}

on GMC-based algorithms and η’s value is 0.1 for the R2 LS. As one can see

175

300

iteration

clearly, the R2 LS algorithm is more robust and stable than the GMC family algorithm and obtains lower steady-state MSE. 23

350

400

450

500

In addition, Table 2 lists the testing MSE and the average execution time for one simulation, where the execution time is measured on a PC configured with a 1.9-GHz processor and 4-GB memory. As seen, the execution time of R2 LS is less than the recursive family algorithms. Table (2)

Performance Comparison Between Different Algorithms on Alpha-stable noise

model

Algorithm

α

MSE(Test)

0.5 2

2.4226 0.0479

0.0150 0.0156

3

0.0574

0.0192

0.5 2

0.0222 0.0174

0.1287 0.1426

3

0.0215

0.1536

R2 LS

-

0.0162

0.0755

RLS

-

1.4536

0.0541

GMC

RGMC

Execution Time (Sec)

180

In the following simulations, to validate the efficiency of the R2 LS algorithm, experiments are conducted on a synthetic data set. The unknown system coefficient vector, wo is obtained using Filter command in MATLAB and the length of the unknown System impulse response is assumed to be M =16. The input signal u(n) is as the vector with zero-mean, unit variance and Gaussian noise added. The desired output signal includes a zero-mean white Gaussian noise signal as detailed in [12]. At all experiment, forgetting factor λ and η were chosen by different values. The next experiment is compared to the R2 LS and the conventional RLS algorithm for weight estimation error. In order to generate the performance curves, 10 independent experiments were performed and averaged, where curves are generated by running the learning process for 300

24

15

5

10

0

5

steady-state MSD in db

steady-state MSD in db

10

-5 -10

RLS R2 LS

-15 -20

0 -5

-20

-30

-25

0

50

100

150

200

250

300

R2 LS

-15

-25

-35

RLS

-10

-30

0

50

100

Iteration

150

(a) Figure (6)

(b)

Steady-state MSD and weights changes (w − wopt ) with λ=0.9, η=0.5 with the

length of the unknown system (M =16). (a) 30 percent (b) 50 percent of the desired data is contaminated to outliers randomly.

iterations. The mean square deviation (MSD) obtained as M SD(n) = 10log10 (kwn − wopt k22 ),

(78)

is used as a performance measure. It can be seen in Fig. 6 the R2 LS algorithm is robust for a large number of outliers whereas the rate of convergence R2 LS algorithm is faster than the conventional RLS. So that it is concluded whenever outliers appear in desired signal, weighted changing is robust but in RLS it is 185

very dissemination. As seen, in Fig.6 (b) when the number of outliers increases, the weighted changes in steady-state is better from changes in Fig. 6 (a) on the R2 LS algorithm. In Fig. 6 has shown the misalignment of the RLS algorithm relative to that of the R2 LS algorithm in the presence of impulsive noise.

190

200

Iteration

The next experiment compares R2 LS with the PRRLS algorithm [12]. Simulations were done with parameters in [12] for the PRRLS. Learning process is run for 2500 iterations. Fig. 7 and 8 are shown steady-state MSD and mean square error the R2 LS algorithm and PRRLS, respectively. In Fig 8, the R2 LS converges to a lower 25

250

300

350

450 R2 LS PRRLS

400

R2 LS PRRLS

300

steady-state MSD in dB

steady-state MSD in dB

350 300 250 200 150 100

250 200 150 100 50

50 0

0 -50

0

500

1000

1500

2000

2500

-50

0

Number of iterations

500

1000

(a) Figure (7)

(b)

Comparison between PRRLS in [12] and R2 LS on steady-state MSD and weights

changes (w − wopt ) with λ=0.99, η=0.01 with the length of the unknown system (M =16). Impulsive noise was added to desired signal at iterations (a) 700, 1200, 2000 (b) 100, 200, 300, 500, 600, 700, 800, 1400, 1500, 1600, 1700, 1800, 1900.

195

MSE. As can be seen, the performance of the PRRLS algorithms are decreased with respect to increase of the percentage of impulsive noise in the desired signal. They are become divergent due to non-Gaussian noise, whereas the R2 LS algorithm is not affected by the increase of the impulsive noise. 6.2. Performance the R2 LS with different parameters

200

First, it is illustrated how the η will influence the convergence performance of R2 LS. Fig. 9 shows the convergence curves of R2 LS with different η values, where the alpha-stable noise and mixed Gaussian noise are chosen for measurement noise and the vector noise parameters are the same as the previous simulation. The η is set at 0.01, 0.7, 1, 2. Noticeably, the value of η has a

205

1500

Number of iterations

significant effect on the convergence curve. If η is large or too small, then the filtering accuracy of R2 LS as well as the robustness process will act poorly. Second, the steady-state MSE of R2 LS and RGMC family algorithms with different γ = (1.2, 1.4, 1.6, 1.8) is shown by Fig. 10. Other parameters are the same as in the previous simulation for all algorithms. As expected, the proposed

26

2000

2500

600

450 R2 LS PRRLS

500

R2 LS PRRLS

400 350 300

MSE in dB

MSE in dB

400

300

200

250 200 150 100

100

50 0

-100

0

0

500

1000

1500

2000

-50

2500

0

500

1000

Number of iterations

1500

2000

2500

Number of iterations

(a)

(b)

Comparison between PRRLS in [12] and R2 LS on Squared error system identifi-

Figure (8)

cation and difference between the true and the estimated system with λ=0.99, η=0.01 with the length of the unknown system (M =16). Impulsive noise was added to the desired signal at iterations (a) 700, 1200, 2000 (b) 100, 200, 300, 500, 600, 700, 800, 1400, 1500, 1600, 1700, 1800, 1900.

η

η

0.9 R2 LS η = 0.01

0.8

R2 LS η = 0.7

0.7

2

R LS η = 1 2

R LS η = 2

0.6

R2 LS η = 0.01

0.9

R2 LS η = 1

0.8

R2 LS η = 2

R2 LS η = 0.7

0.7

MSE

0.5

MSE

1.2 1.1 1

0.4

0.6 0.5 0.4

0.3

0.3 0.2 0

50

100

150

200

250

300

350

400

450

iteration

500

0

50

100

150

200

250

(a) Figure (9)

300

iteration

(b)

Convergence curves of R2 LS with different η on different noises. (a) Alpha-stable

noise (b) Mixed Gaussian noise.

27

350

400

450

500

210

algorithm can achieve much better steady-state performance than RGMC in all cases. 0.032 0.03 0.028

R2 LS RGMC α = 0.5 RGMC α = 2 RGMC α = 3

MSE

0.026 0.024 0.022 0.02 0.018 0.016 1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

γ

Figure (10)

Performance evolution curve of Recursive family algorithm with different γ on

alpha stable noise model

6.3. Tracking of time-varying systems of a noisy chirped signal The next experiment, the tracking of a chirped signal will be shown by the R2 LS in non-stationary environments. The chirped input signal given by 215

y = chirp(t, f0 , t1 , f1 ) generates samples of a linear swept-frequency cosine signal that defined in array t, where f0 (Hz.) and f1 (Hz.) are the instantaneous frequency at time 0 and time t1 , respectively. If y is added with impulsed noise or outliers then tracking of noisy chirp would be a proper experiment for testing R2 LS. In this experiment, f0 and f1 are respectively 10Hz and 400Hz, and t

220

varies from 0 to 10 seconds. Fig 11 shows a good tracking of time-varying systems of a noisy chirped signal by the R2 LS algorithm.

28

3 Original Signal Noisy Signal Predict

2

Signal value

1

0

-1

-2

-3

0

20

40

60

80

100

120

Iteration

Figure (11)

Tracking noisy chirp signal by R2 LS with λ = 0.9, η=0.01 and 30 percentage of

the chirp signal is contaminated to outliers randomly.

6.4. Prediction signal GPS in the navigation system Global Position System (GPS) is frequently used in navigation applications in the open environment. However, GPS signals are attenuated whenever signals 225

are blocked by buildings or in tunnels. This study aims to deal with prediction of such attenuated signals by the R2 LS approach, making it possible to perform continuous high precision navigation when no GPS signal is available. In this context, data is collected by the GPSNEO-8 module (Fig 12) with Arduino to achieve a path by latitude and longitude coordinates.

Figure (12)

230

GPS Module (GPSNEO-8m)

At first, the latitude and longitude coordinates of a route are collected by 29

the GPSNEO-8 module. In parts of the route, the GPS signal is not available. For example, we present some of data collected in Table 3. Table (3)

The latitude and longitude coordinates of a route, in the red data, the signals are

blocked by the tunnels and the module cant obtain coordinates.

Latitude

Longitude

36.29036167

59.61955333

36.29032333

59.61948667

36.29027833

59.619415

36.29022667

59.61933

36.29016833

59.61923667

36.29010833

59.61913333

36.29004167

59.619025

36.28997167

59.61891

36.28990167

59.61879

36.28983333

59.61866833

36.28976667

59.61854833

36.289705

59.61843

36.289645

59.61831167

,,,,18365, V,N*5A

,,,,18365, V,N*5A

36.29044667

59.61396833

36.29064167

59.61381667

In the second step, we plot the given main signals for the latitude and the longitude separately (Fig. 13-a and Fig. 13-b) that the signals are contaminated 235

to the noises. Then, these noisy signals are predicted by R2 LS and the path can be smooth for using navigation applications. Fig 14-a shows a path in Mashhad city and GPS outages determine on the map. Finally, in Fig 14-b, the R2 LS predicts the signal where there are no GPS signals.

30

(a) Figure (13)

(b)

Prediction the noisy signal by R2 LS with λ = 0.9 and η = 0.01. (a) the latitude

noisy signal (b) the longitude noisy signal.

(b)

(a) Figure (14)

Prediction signal GPS during GPS outages by R2 LS with λ=0.99, η=0.01 and

the length of the unknown system (M =5). (a) GPS signals are blocked by underpass and tunnel in zone 1, 2 and 3. (b) the signals are predicted where there are no GPS signals by R2 LS and the path can be smooth for using navigation applications.

6.5. Target tracking by the R2 LS The last experiment, it’s considered a target tracking problem by the R2 LS that used in control systems. The target of state transition equation is expressed as x(i + 1) = T ∗ x(i) + z(i)

31

(79)

240

where x(i) is the 2D coordinates for the target at time i. Also, x(i) as input

245

data and x(i + 1) as desired output is considered. T ∗ is 2 ∗ 2 state transition   1.0019 −0.0129  and z(i) is the noise model with matrices that is set to  0.0187 1.0034 zero-mean Gaussian and covariance matrix σz I, that σz is set to 0.01. The h iT initial coordinate for the target is x[0] = 1 −1 . The trajectory of the target from time 0 to 100 is shown in Fig. (15-a).

(a) Figure (15)

(b)

Target tracking by R2 LS for Gaussian noise with λ=0.99, σs =σz =0.01, η=0.01

and the length of the unknown system (M =2). (a) The trajectory of the target. (b) MSD learning curves

It’s assumed that the proposed method tracks the target with noisy observations as follows: x ˜(i) = x(i) + s(i)

(80)

where s(i) is the observation noise with zero-mean Gaussian and covariance matrix σs I, that σs is set to 0.01. The R2 LS can track a target, also to estimate its transition matrix given input and output noisy data. This act is performed by minimizing the cost function Eq. 11 with respect to matrix T . In the cost function, the error obtains as ei = x ˜(i + 1) − T x ˜(i) 32

(81)

Fig. (15-b) shows the mean square deviation (MSD) learning curves of transition matrices estimated by the R2 LS. The following equation obtains it. M SD(i) = kTi − T ∗ k22 ,

(82)

It can be seen the rate of convergence the R2 LS algorithm is faster than the conventional RLS and also the R2 LS is robust with respect to Gaussian noisy data.

7. Conclusion 250

This paper proposed robust recursive least squares R2 LS algorithms, which applied in system identification that used half-quadratic optimization. Robustness to outliers is achieved by a modification on the weights of conventional RLS. Firstly, the cost function of the algorithm is constructed on maximum correntropy criterion (MCC), that derived the update equation of weight based

255

on how likely each sample seems to be noisy. Secondly, it also is presented with theoretical results for the convergence in the mean-square of the misalignment vector, which are valid for a large variety of input processes and noise distributions. Finally, simulation results present that the proposed algorithm is robust when desired data is contaminated to outliers and non-Gaussian noise

260

and provide evidence of the excellent performance behavior and robustness of the proposed algorithm.

Appendix A

Proof of the convergence of

n X (qi λn−i )2 i=1 n X

(

and less than one.

n−i 2

qi λ

)

i=1

Proof:

n X

(qi λn−i )2 = (q1 λn−1 )2 + (q2 λn−2 )2 + ... + qn2

i=1

33

(83)

Also n X ( qi λn−i )2 = (q1 λn−1 +q2 λn−2 +...+qn )2 = (q1 λn−1 )2 +(q2 λn−2 )2 +...+qn2 +2q1 q2 λn−1 λn−2 +...+2qn−1 qn λ i=1

(84)

Because of qi > 0 and λ > 0 then, 2q1 q2 λn−1 λn−2 + ... + 2qn−1 qn λ > 0.

By (83) and (84), it’s obtained as, (

n X i=1

qi λn−i )2 ≥ (q1 λn−1 )2 + (q2 λn−2 )2 + ... + qn2 =

Therefor,

n X (qi λn−i )2 i=1 n X

(

qi λ

n−i 2

)

n X

(qi λn−i )2

(85)

i=1

≤1

(86)

i=1

For the proof convergence, using a method called the comparison test. It 265

consists of a pair of statements about infinite series with non-negative (realX valued) terms: If the infinite series bn converges and 0 ≤ an ≤ bn for all X sufficiently large n then the infinite series an also converges [10]. Now according to inequalities relations as:

n n n−1 X X X (qi λn−i )2 ≤ λn−i = λm (87)

(qi λn−i )2 ≤ (λn−i )2 ≤ λn−i −→ qi ≤1

The n X

n−1 X

λ≤1

λm is convergent to

i=1

1 1−λ ,

m=0

i=1

according to comparison test, the series

m=0

(qi λn−i )2 is convergent too.

i=1

With the same reasoning above as: qi λn−i ≤ λn−i −→ qi ≤1

The n X

n−1 X

λm is convergent to

n X i=1

qi λn−i ≤

1 1−λ ,

i=1

λn−i =

n−1 X

λm

(88)

m=0

according to comparison test, the series

m=0

qi λn−i is convergent too.

i=1

270

n X

34

Now by proving the convergence of the above two series and also by Eq. n X (qi λn−i )2

(86), it can be proved that the

i=1 n X

(

is convergent.

n−i 2

qi λ

)

i=1

References References 275

[1] S. Haykin, Adaptive Filter Theory. Fourth Edition, Upper Saddle River, NJ: Prentice-Hall, 2002. [2] A. H. Sayed, Adaptive Filters. New York, NY: Wiley, 2008. [3] Zakharov, Y. V., White, G. P., and Liu, J. ”Low-complexity RLS algorithms using dichotomous coordinate descent iterations,” IEEE Transac-

280

tions on Signal Processing, vol. 56, No. 7 , pp. 3150-3161, 2008. [4] Raffaello Claser, Vtor H. Nascimento, Yuriy V. Zakharov. ”A Lowcomplexity RLS-DCD algotithm for volterra system identification,” in: European Signal Processing Conference (EUSIPCO), 2016. [5] Liu, J., and Y. Zakharov. ”Low complexity dynamically regularised RLS

285

algorithm.” Electronics Letters 44, No. 14, pp. 886-885, 2008. [6] Liu, Weifeng, Jose C. Principe, and Simon Haykin. ”Kernel adaptive filtering: a comprehensive introduction,” Vol. 57. John Wiley & Sons, 2011. [7] S.-C. Chan and Y.-X. Zou, ”A recursive least M-estimate algorithm for robust adaptive filtering in impulsive noise: Fast algorithm and convergence

290

performance analysis,” IEEE Trans. Signal Process., vol. 52, no.4, pp. 975991, Apr. 2004. [8] J. Benesty and T. Gaensler, ”A robust fast recursive least squares adaptive algorithm,” in Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), Salt Lake City, UT, pp. 3785-3788, 2001. 35

295

[9] L. R. Vega, H. Rey, J. Benesty, and S. Tressens, ”A fast robust recursive least-squares algorithm,” IEEE Trans. Signal Process., vol. 57, no. 3, pp. 1209-1216, Mar. 2009. [10] Ayres, Frank Jr., Mendelson, Elliott. Schaum’s Outline of Calculus. Fourth Edition, New York: McGraw-Hill., 1999.

300

[11] Elisei-Iliescu, Camelia, Cristian Stanciu, Constantin Paleologu, Jacob Benesty, Cristian Anghel, and Silviu Ciochin. ”Robust variable-regularized RLS algorithms.” In Hands-free Speech Communications and Microphone Arrays (HSCMA), 2017, pp. 171-175. IEEE, 2017. [12] M. Z. A. Bhotto and A. Antoniou, ”Robust recursive least-squares

305

adaptive-filtering algorithms for impulsive-noise environments,” IEEE Signal Process. Lett., vol. 18, no. 3, pp. 185-188, Mar. 2011 . [13] B. Chen, L. Xing, H. Zhao, N. Zheng, and J. C. Principe, ”Generalized correntropy for robust adaptive filtering,” IEEE Trans. Signal Process., vol. 64, no. 13, pp. 33763387, Jul. 2016.

310

[14] W. Ma, J. Duan, B. Chen, G. Gui, and W. Man, ”Recursive generalized maximum correntropy criterion algorithm with sparse penalty constraints for system identification,” Asian J. Control, vol. 19, no. 1, pp. 16, 2016. [15] G.B. Giannakis, G. Mateos, S. Farahmand, V. Kekatos, and H. Zhu, ”USPACOR: Universal sparsity-controlling outlier rejection,” in 2011 IEEE

315

International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1952-1955, 2011. [16] Paleologu, Constantin, Jacob Benesty, Cristian Stanciu, Cristian Anghel, and Mircea Stena. ”Robust regularization of the recursive least-squares algorithm.” In Electronics, Computers and Artificial Intelligence (ECAI),

320

2016 8th International Conference on, pp. 1-4. IEEE, 2016.

36

[17] Shahrokh Farahmand and Georgios G. Giannakis, ”Robust RLS in the presence of correlated noise using outlier sparsity,” IEEE Transaction on Signal Processing, vol. 60, no. 6, pp. 3308-3313, 2012. [18] Xiao, Longshuai, Ming Wu, Jun Yang and Jing Tian. ”Robust RLS via the 325

nonconvex sparsity prompting penalties of outlier components,” In Signal and Information Processing (ChinaSIP), IEEE China, pp. 997-1001, 2015. [19] Ma, Wentao, Hua Qu, Guan Gui, Li Xu, Jihong Zhao, and Badong Chen. ”Maximum correntropy criterion based sparse adaptive filtering algorithms for robust channel estimation under non-Gaussian environments.” Journal

330

of the Franklin Institute 352, no. 7, 2708-2727, 2015. [20] M. Nikolova and M.K. NG, ”Analysis of Half-Quadratic Minimization Methods for Signal and Image Recovery,” SIAM J. Scientific Computing, vol. 27, no. 3, pp. 937-966, 2005. [21] D. Geman and G. Reynolds, ”Constrained restoration and the recovery of

335

discontinuities,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 14,no. 3, pp. 367-383, Mar. 1992. [22] R. He, W.S. Zheng, B.-G. Hu, ”Maximum correntropy criterion for robust face recognition,” IEEE Trans. Pattern Anal. Mach. Intell. vol. 33, pp. 1561-1576, 2011.

340

[23] Chen, Yu, Guan Gui, and Lei Wang. ”Who care for channel sparsity? Robust sparse recursive least square based channel estimation.” In Wireless Communications and Signal Processing (WCSP), 2016 8th International Conference on, pp. 1-5. IEEE, 2016. [24] Liu, Weifeng, Puskal P. Pokharel, and Jos C. Prncipe. ”Correntropy: Prop-

345

erties and applications in non-Gaussian signal processing.” IEEE Transactions on Signal Processing 55, no. 11, pp. 5286-5298, 2007.

37

[25] R. He, W.-S. Zheng, B.-G. Hu, X.-W. Kong, ”A regularized correntropy framework for robust pattern recognition,” Neural Comput. vol. 23 ,no. 8, pp 2074-2100, 2011. 350

[26] X.-T. Yuan, B.-G. Hu, ”Robust feature extraction via information theoretic learning,” in: Proceedings of International Conference on Machine Learning (ICML), pp. 1193-1200, 2009. [27] Xu, Guibiao, Zheng Cao, Bao-Gang Hu, and Jose C. Principe. ”Robust support vector machines based on the rescaled hinge loss function.” Pattern

355

Recognition. No. 63, pp 139-148, 2017.

38

Declaration of Competing Interest We wish to submit our manuscript entitled Analysis of Robust Recursive Least Squares: Convergence and Tracking for consideration by Signal Processing journal. 360

This manuscript has not been published and is not under consideration for publication elsewhere. Hereby I declare that there are no conflicts of interest for the authors of this study.

39

Author Contribution Alireza Naeimi Sadigh: Software, Formal analysis, Data Curation, Writ365

ing - Original Draft. Amir Hossein Taherinia: Conceptualization, Validation, Investigation, Writing - Review & Editing. Hadi Sadoghi Yazdi: Supervision, Methodology, Project administration.

40