Variable Step Size Least Mean Square Algorithm Based on Censored Regression


Available online at www.sciencedirect.com

ScienceDirect IFAC PapersOnLine 52-24 (2019) 88–92

Feng Zhao, Haiquan Zhao

The Key Laboratory of Magnetic Suspension Technology and Maglev Vehicle, Ministry of Education, The School of Electrical Engineering, Southwest Jiaotong University, Chengdu, 610031, China (e-mail: [email protected]; [email protected]).

Corresponding author: Haiquan Zhao.

Abstract: In numerous practical applications, censored observations often occur. Using traditional adaptive algorithms to identify this type of system may lead to performance degradation. To address this problem, distributed censored regression algorithms have been proposed. However, the distributed Least Mean Square (D-LMS) algorithm based on censored regression has a slow convergence speed. To solve this problem, a variable step size LMS based on censored regression (CR-VSS-LMS) is proposed in this paper.

© 2019, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Key words: censored regression, Least Mean Square, variable step size, tongue-like curve

1. INTRODUCTION

The linear regression model is widely used in signal processing, communications, and many other fields, where the observed data are assumed to be completely available. To identify this model, researchers have proposed plenty of adaptive algorithms in previous references [S. S. Haykin. (2008); A. H. Sayed. (2003)], such as the least mean-square (LMS) and recursive least-squares (RLS) algorithms.

Unfortunately, the requirement of linear regression usually cannot be met in practical applications. In general, output data whose values exceed the limit of the recording device cannot be observed [J. J. Heckman. (1976); Powell, J. L. (1984); Bottai, M., & Zhang, J. (2010)]. In other words, only the output data whose values lie in a certain range are available. Actually, censored regression can be seen as a nonlinear regression model consisting of a saturation nonlinearity and a linear system [Liu, Z., & Li, C. (2017)]. This type of model has attracted increasing attention and has been applied in diverse fields. Some sensors in real engineering applications saturate, which leads to an intrinsic threshold for the observed data; to put it another way, signals whose values exceed the threshold are censored. In addition, some data in econometrics are often censored. Since the output data of a censored regression may lose significant information, using traditional algorithms to identify this type of model may result in biased and wrong estimates [Liu, Z., & Li, C. (2015)]. Recently, in an attempt to deal with the censored regression problem, numerous algorithms have been proposed. In reference [Cook, J., & Mcdonald, J. (2013)], researchers proposed a maximum likelihood algorithm that possesses desirable properties; however, it carries a heavy computational burden. Reference [J. J. Heckman. (1976)] proposed the Heckman two-step algorithm, in which the first step estimates the bias and the second step fits the original linear model. Then, reference [Powell, J. L. (1984)] proposed the least absolute deviation method to address robust censored regression. To solve online censored regression problems, reference [Liu et al., (2015)] proposed the adaptive Heckman two-step algorithm (TSA), which significantly outperforms the conventional adaptive algorithms when the output data is censored.

In this paper, a variable step size LMS algorithm based on censored regression (CR-VSS-LMS) is proposed. Owing to the variable step size, the proposed algorithm has a faster convergence speed than the previous algorithm [Liu et al., (2015)].

2405-8963 © 2019, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved. Peer review under responsibility of International Federation of Automatic Control. 10.1016/j.ifacol.2019.12.386



Contributions of this paper:

1. A simple review of the variable step size LMS algorithm based on the tongue-like curve.
2. A variable step size LMS algorithm based on the censored regression model is proposed.
3. Simulation examples are presented to demonstrate the performance of the proposed algorithm.

2. PROPOSED CR-VSS-LMS ALGORITHM

2.1 Review of a VSS-LMS based on tongue-like curve

In [Deng et al., (2004)], Jiangbo Deng et al. proposed a variable step size LMS adaptive algorithm based on a tongue-like curve. The iterative formulas are as follows:

$$e(n) = d(n) - X^T(n)W(n) \qquad (1)$$

where $X(n)$ and $W(n)$ denote the $L \times 1$ input vector and coefficient vector, respectively, and $d(n)$ denotes the expected signal.

$$\mu(n) = \beta \left( 1 - \frac{1}{1 + \alpha e^2(n)} \right) \qquad (2)$$

where $0 < \mu_{\max} \le 1/\lambda_{\max}$, $\lambda_{\max}$ is the maximum eigenvalue of the input signal autocorrelation matrix, and $\alpha$, $\beta$ are the step adjustment factors.

$$W(n+1) = W(n) + 2\mu(n)e(n)X(n) \qquad (3)$$
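As a concrete reference for Eqs. (1)-(3), here is a minimal Python sketch of one iteration of this tongue-like-curve VSS-LMS; the function and variable names are ours, not from the paper.

```python
import numpy as np

def vss_lms_step(w, x_vec, d, alpha, beta):
    """One iteration of the tongue-like-curve VSS-LMS, Eqs. (1)-(3)."""
    e = d - x_vec @ w                                # Eq. (1): a priori error
    mu = beta * (1.0 - 1.0 / (1.0 + alpha * e**2))   # Eq. (2): tongue-like step size
    w = w + 2.0 * mu * e * x_vec                     # Eq. (3): coefficient update
    return w, e, mu
```

The step size grows with the error magnitude and shrinks toward zero as the filter converges, which is what yields fast initial convergence together with a small steady-state misadjustment.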

2.2 Proposed new algorithm

Consider the following linear regression model

$$d(n) = X^T(n)W_o + \varepsilon(n) \qquad (4)$$

where $W_o$ is an $L \times 1$ unknown system and $\varepsilon(n)$ denotes the background noise with zero mean and variance $\sigma^2$. The traditional adaptive filtering algorithms can be used to estimate the parameters of the unknown system, provided that the output signal is fully observed. However, in real life, the output is often not fully observed due to the saturation characteristics of the sensor. For example, in microphone array signal processing, if the amplitude of the speech signal exceeds the acceptable range of the sensor, it cannot be correctly observed. In this paper, we assume that data less than zero in the output signal cannot be observed. The censored output $\hat{d}(n)$ can be expressed as

$$\hat{d}(n) = \phi\big(X^T(n)W_o + \varepsilon(n)\big) = \phi(d(n)) \qquad (5)$$

where $\phi(d(n)) = \max(0, d(n))$. In other words, when $d(n) \le 0$, the data $d(n)$ is missing, which leads to the bias and the inequality $E[\hat{d}(n) \mid X^T(n)W_o] \ne X^T(n)W_o$.
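To illustrate the censoring model (4)-(5), the following sketch generates left-censored observations from a linear system; the dimensions, seed, and names are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N = 8, 5000                        # filter length and sample count (assumed values)
w_o = rng.standard_normal(L)          # unknown system W_o of Eq. (4)
sigma = 0.1                           # background-noise standard deviation

X = rng.standard_normal((N, L))       # input vectors X(n)
d = X @ w_o + sigma * rng.standard_normal(N)   # Eq. (4): uncensored output
d_hat = np.maximum(0.0, d)            # Eq. (5): left-censored output phi(d) = max(0, d)

print("fraction of censored samples:", np.mean(d_hat == 0.0))
```

Because $\hat{d}(n) \ge d(n)$ always, the sample mean of `d_hat` is biased upward relative to $X^T(n)W_o$; this is exactly the bias compensated in the sequel.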

To compensate the bias, we first correct the sample selection bias, inspired by the Heckman two-stage approach [J. J. Heckman. (1976)]. Due to the left-censored property of $\hat{d}(n)$, only positive values of $d(n)$ can be correctly obtained. Recalling (5) and noting that the background noise $\varepsilon(n)$ is zero-mean, the expectation of $d(n)$ under the condition $d(n) > 0$ can be expressed by

$$E[\hat{d}(n) \mid X^T(n)W_o, \hat{d}(n) > 0] = E[d(n) \mid X^T(n)W_o, d(n) > 0] = X^T(n)W_o + E[\varepsilon(n) \mid \varepsilon(n) > -X^T(n)W_o] \qquad (6)$$

Before calculating the last term of (6), the following lemma is introduced.

Lemma: The conditional expectation $E[x \mid x > c]$ satisfies

$$E[x \mid x > c] = \frac{f(c)}{F(-c)} \triangleq \lambda(-c) \qquad (7)$$

Proof: See the Appendix for details.

Using the lemma, we have

$$E[\varepsilon(n) \mid \varepsilon(n) > -X^T(n)W_o] = \sigma \lambda(X^T(n)\gamma) \qquad (8)$$

where the vector $\gamma$ is given by

$$\gamma = W_o / \sigma \qquad (9)$$
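For Gaussian noise, $\lambda(\cdot) = f(\cdot)/F(\cdot)$ is the inverse Mills ratio; it can be evaluated stably in the log domain, as in the sketch below (our implementation choice, not prescribed by the paper).

```python
import numpy as np
from scipy.stats import norm

def inverse_mills(z):
    """lambda(z) = f(z) / F(z) for the standard Gaussian, computed in the log
    domain to avoid 0/0 underflow for very negative z."""
    return np.exp(norm.logpdf(z) - norm.logcdf(z))

# Bias term of Eq. (8): E[eps(n) | eps(n) > -X^T(n) W_o] = sigma * inverse_mills(X^T(n) gamma)
```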

Then, using (6) and probability theory yields

$$E[\hat{d}(n) \mid X^T(n)W_o] = \Pr(\hat{d}(n) > 0)\, E[\hat{d}(n) \mid X^T(n)W_o, \hat{d}(n) > 0] = F(X^T(n)\gamma)\, X^T(n)W_o + \sigma f(X^T(n)\gamma) \qquad (10)$$

where the second equality comes from the fact that the probability of $\hat{d}(n) > 0$ is equal to $F(X^T(n)\gamma)$, i.e., $\Pr(\hat{d}(n) > 0) = F(X^T(n)\gamma)$. According to (10), the censored regression model (5) can be expressed as

$$\hat{d}(n) = F(X^T(n)\gamma)\, X^T(n)W_o + \sigma f(X^T(n)\gamma) + \xi(n) \qquad (11)$$

where $\xi(n)$ is a random variable with zero mean, i.e.,

$$E[\xi(n) \mid X^T(n)W_o] = 0 \qquad (12)$$

Then the new error function can be expressed as

$$e(W, \gamma, \sigma) = \hat{d}(n) - F(X^T(n)\gamma)\, X^T(n)W(n) - \sigma f(X^T(n)\gamma) \qquad (13)$$

Then the step size iteration formula can be expressed as

$$\mu(n) = \beta \left( 1 - \frac{1}{1 + \alpha e^2(W, \gamma, \sigma)} \right) \qquad (14)$$
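A sketch of the compensated error (13) and the resulting variable step size (14) for the Gaussian case; `gamma_hat` and `sigma_hat` are our names for the running estimates of $\gamma$ and $\sigma$.

```python
import numpy as np
from scipy.stats import norm

def cr_error_and_step(w, gamma_hat, sigma_hat, x_vec, d_hat, alpha, beta):
    """Compensated error of Eq. (13) and tongue-like step size of Eq. (14)."""
    z = x_vec @ gamma_hat
    e = d_hat - norm.cdf(z) * (x_vec @ w) - sigma_hat * norm.pdf(z)  # Eq. (13)
    mu = beta * (1.0 - 1.0 / (1.0 + alpha * e**2))                   # Eq. (14)
    return e, mu
```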

Obviously, to estimate $W_o$, estimates of $\gamma$ and $\sigma$ must also be available. In the sequel, we introduce an indicator variable $\delta(n)$ to estimate $\gamma$:

$$\delta(n) = \begin{cases} 1, & \text{if } \hat{d}(n) > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (15)$$

The probabilities of $\delta(n)$ are expressed as

$$\Pr(\delta(n) = 1) = F(X^T(n)\gamma) \qquad (16)$$

$$\Pr(\delta(n) = 0) = F(-X^T(n)\gamma) \qquad (17)$$

Then, we consider the following optimization problem to estimate $\gamma$ [Liu, Z., & Li, C. (2015)]:

$$\hat{\gamma} = \arg\max_{\gamma} \Gamma(\gamma) = \arg\max_{\gamma} E[\ell_n(\gamma)] \qquad (18)$$

where

$$\Gamma(\gamma) = E[\ell_n(\gamma)] \qquad (19)$$

and

$$\ell_n(\gamma) = \log\big(\Pr(\hat{d}(n) \mid X(n), \gamma)\big) \qquad (20)$$

with

$$\Pr(\hat{d}(n) \mid X(n), \gamma) = [F(X^T(n)\gamma)]^{\delta(n)}\, [F(-X^T(n)\gamma)]^{1 - \delta(n)} \qquad (21)$$

In the sequel, using the steepest ascent principle yields [7]

$$\hat{\gamma}(n+1) = \hat{\gamma}(n) + \eta \left. \frac{\partial \ell_n(\gamma)}{\partial \gamma} \right|_{\hat{\gamma}(n)} = \hat{\gamma}(n) + \eta \big[ \delta(n) \lambda(X^T(n)\gamma) X(n) - (1 - \delta(n)) \lambda(-X^T(n)\gamma) X(n) \big] \qquad (22)$$
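The recursion (22) is the stochastic-gradient ascent of the probit-type log-likelihood (20)-(21) and needs only the indicator (15) and the inverse Mills ratio. A sketch, reusing `inverse_mills` from above, with `eta` as our name for the ascent step size:

```python
def gamma_step(gamma_hat, x_vec, d_hat, eta):
    """Steepest-ascent update of Eq. (22); assumes numpy arrays and the
    inverse_mills helper defined in the earlier sketch."""
    z = x_vec @ gamma_hat
    delta = 1.0 if d_hat > 0 else 0.0          # indicator of Eq. (15)
    grad = (delta * inverse_mills(z) - (1.0 - delta) * inverse_mills(-z)) * x_vec
    return gamma_hat + eta * grad              # Eq. (22)
```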

Then, the estimation of $\sigma$ is considered. Using the descent method and the cost function $E[e^2(W, \gamma, \sigma)]$ with respect to $\sigma(n)$, we have

$$\sigma(n+1) = \sigma(n) - \frac{\mu(n)}{2} \frac{\partial E[e^2(W, \gamma, \sigma(n))]}{\partial \sigma(n)} = \sigma(n) + \mu(n)\, f(X^T(n)\gamma(n+1))\, e(W, \gamma, \sigma(n)) \qquad (23)$$

Similarly, by minimizing the cost function and using the descent theory, we have

$$W(n+1) = W(n) - \mu(n) \left. \frac{\partial E[e^2(W, \gamma, \sigma(n))]}{\partial W} \right|_{W(n)} = W(n) + \mu(n)\, F(X^T(n)\gamma)\, X(n)\, e(W, \gamma, \sigma(n)) \qquad (24)$$
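Putting the pieces together, here is a sketch of the complete CR-VSS-LMS recursion (Eqs. (13)-(15) and (22)-(24)) under Gaussian noise, reusing the helper functions defined above; the initial values are our assumptions.

```python
import numpy as np
from scipy.stats import norm

def cr_vss_lms(X, d_hat, alpha, beta, eta):
    """Sketch of the full CR-VSS-LMS recursion (illustrative, Gaussian noise)."""
    N, L = X.shape
    w = np.zeros(L)                    # coefficient vector W(0)
    gamma_hat = np.zeros(L)            # estimate of gamma, updated by Eq. (22)
    sigma_hat = 1.0                    # initial guess for sigma (assumption)
    for n in range(N):
        x_vec = X[n]
        gamma_hat = gamma_step(gamma_hat, x_vec, d_hat[n], eta)       # Eq. (22)
        e, mu = cr_error_and_step(w, gamma_hat, sigma_hat, x_vec,
                                  d_hat[n], alpha, beta)              # Eqs. (13)-(14)
        z = x_vec @ gamma_hat
        sigma_hat += mu * norm.pdf(z) * e                             # Eq. (23)
        w += mu * norm.cdf(z) * e * x_vec                             # Eq. (24)
    return w
```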

4. SIMULATION

This section consists of three parts. The first part verifies the advantages of the algorithm proposed in this paper, while the second and third parts analyze the parameters $\beta$ and $\alpha$ in Eq. (14), respectively.

In the first experiment, we compared the CR-VSS-LMS algorithm proposed in this paper with the LMS, VSS-LMS, and CR-LMS algorithms. The input signal is a zero-mean Gaussian signal, and the background noise is Gaussian white noise at 30 dB. The step size in LMS and CR-LMS was set to $\mu = 0.05$. As can be seen from Fig. 1, the LMS and VSS-LMS algorithms without bias compensation have a larger MSD than the CR-LMS and CR-VSS-LMS algorithms, both reaching only an undesirable -5 dB. The CR-LMS and CR-VSS-LMS algorithms, which perform bias compensation on the output signal, achieve -35 dB. Moreover, it can be seen from the figure that the CR-VSS-LMS algorithm proposed in this paper reaches a stable -35 dB at 500 iterations, while the CR-LMS algorithm only reaches the steady state at about 2500 iterations, fully demonstrating that the CR-VSS-LMS algorithm not only has a smaller MSD, but also has a faster convergence speed.
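The MSD curves can be reproduced in outline with the driver below, built on the sketches above; the filter length, run count, and ascent step are not stated in the paper and are our assumptions.

```python
import numpy as np

def msd_db(w_est, w_o):
    """Mean-square deviation in dB: 10 log10(||w_est - w_o||^2)."""
    return 10.0 * np.log10(np.sum((w_est - w_o) ** 2))

# Illustrative driver with alpha = 70, beta = 0.06 as in Fig. 1; uses the data
# generator and cr_vss_lms sketches defined earlier.
rng = np.random.default_rng(42)
L, N, runs = 8, 3000, 20
msd = np.zeros(runs)
for r in range(runs):
    w_o = rng.standard_normal(L)
    X = rng.standard_normal((N, L))
    d_hat = np.maximum(0.0, X @ w_o + 0.1 * rng.standard_normal(N))   # Eqs. (4)-(5)
    w_est = cr_vss_lms(X, d_hat, alpha=70.0, beta=0.06, eta=0.01)
    msd[r] = msd_db(w_est, w_o)
print("average final MSD (dB):", msd.mean())
```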

Fig. 1 Simulated MSD curves for the proposed algorithm, where $\beta = 0.06$, $\alpha = 70$.

We know that, when $0 < \beta \le 1/\lambda_{\max}$, the CR-VSS-LMS is convergent. Therefore, we can find that $\beta_{\max} = 1/\lambda_{\max}$ by observing (14). In the second experiment, we selected three different values of $\beta$ in this range for analysis. The simulations show that the larger $\beta$ is, the faster the convergence rate and the larger the corresponding MSD.

Fig. 2 MSDs of the CR-VSS-LMS algorithm with the parameter $\beta$, where $\alpha = 70$.

In the third experiment, we analyzed the parameter $\alpha$ in Eq. (14). As can be seen from Fig. 3, the larger $\alpha$ is, the faster the algorithm converges, but the larger the MSD it causes. At the same time, we find that neither the convergence speed nor the MSD differs much between $\alpha = 100$ and $\alpha = 50$; that is, when $\alpha$ is large, its impact on the algorithm is reduced.

Fig. 3 MSDs of the CR-VSS-LMS algorithm with the parameter $\alpha$, where $\beta = 0.06$.

5. CONCLUSION

In this paper, a variable step size algorithm based on the tongue-like curve is chosen in consideration of computational complexity. We can see that the CR-VSS-LMS algorithm not only shows a very small MSD, but also has a very fast convergence speed.

APPENDIX

For a random variable $x$ with a standard distribution, i.e., whose cumulative distribution function satisfies the conditions in Table 1, the conditional density function of the conditional random variable $x \mid x > c$ is given by

$$p(x \mid x > c) = \frac{d}{dx} \Pr(V \le x \mid V > c) = \frac{\frac{d}{dx} \Pr(V \le x,\ V > c)}{\Pr(V > c)} = \frac{f(x)}{1 - F(c)}, \quad x > c$$

where $f(x)$ and $F(x)$ are expressed in Table 1. Using probability theory, the conditional expectation $E[x \mid x > c]$ can be computed by

$$E[x \mid x > c] = \int_c^{\infty} x\, p(x \mid x > c)\, dx = \frac{1}{1 - F(c)} \int_c^{\infty} x f(x)\, dx = \left. \frac{-f(x)}{1 - F(c)} \right|_c^{\infty} = \frac{f(c)}{1 - F(c)} = \frac{f(c)}{F(-c)} \triangleq \lambda(-c)$$

where the third equality uses the identity $x f(x)\, dx = -df(x)$ of the standard Gaussian density, and the last equality uses the symmetry of the density, i.e., $1 - F(c) = F(-c)$.

Table 1: Probability density function and distribution function of three different noises

Noise type: Gaussian
Probability density function: $f(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Cumulative distribution function: $F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(y-\mu)^2}{2\sigma^2}}\, dy$

ACKNOWLEDGMENTS

This work was partially supported by the National Science Foundation of P.R. China (Grants: 61871461, 61571374, 61433011) and the Sichuan Science and Technology Program (Grant: 19YYJC0681).

REFERENCES

S. S. Haykin. (2008). Adaptive Filter Theory. Chennai, India: Pearson Education.

A. H. Sayed. (2003). Fundamentals of Adaptive Filtering. Hoboken, NJ, USA: Wiley.

J. J. Heckman. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables. Ann. Econ. Social Meas., 5(4), 475-492.

Powell, J. L. (1984). Least absolute deviations estimation for the censored regression model. Journal of Econometrics, 25(3), 303-325.

Bottai, M., & Zhang, J. (2010). Laplace regression with censored data. Biometrical Journal, 52(4), 487-503.

Cook, J., & Mcdonald, J. (2013). Partially adaptive estimation of interval censored regression models. Computational Economics, 42(1), 119-131.

Liu, Z., & Li, C. (2015). Censored regression with noisy input. IEEE Transactions on Signal Processing, 63(19), 5071-5082.

Liu, Z., & Li, C. (2017). Recursive least squares for censored regression. IEEE Transactions on Signal Processing, 65(6), 1565-1579.

Liu, Z., Li, C., & Liu, Y. (2015). Distributed censored regression over networks. IEEE Transactions on Signal Processing, 63(20), 5437-5449.

Deng Jiangbo, Hou Xinguo, & Wu Zhengguo. (2004). Variable step size LMS adaptive algorithm based on tongue line. Data Acquisition and Processing, 19(3).