A new robust variable weighting coefficients diffusion LMS algorithm

Author's Accepted Manuscript, Signal Processing. PII: S0165-1684(16)30209-2. DOI: http://dx.doi.org/10.1016/j.sigpro.2016.08.023. Reference: SIGPRO6250.
Received 18 May 2016; revised 31 July 2016; accepted 16 August 2016.

A new robust variable weighting coefficients diffusion LMS algorithm Do-Chang Ahn, Jae-Woo Lee, Seung-Jun Shin, Woo-Jin Song∗ Department of Electrical Engineering, Pohang University of Science and Technology (POSTECH), Korea

Abstract

We introduce a new robust algorithm that is insensitive to impulsive noise (IN) for the distributed estimation problem over adaptive networks. Motivated by the fact that each node can access multiple spatial data, we propose to discard IN-contaminated data. Under the assumption that IN is successfully detected, we propose a cost function that considers only the uncontaminated data. The derived algorithm is an ATC diffusion LMS algorithm whose weighting coefficients vary with the IN detection result, which leads both to insensitivity to IN and to good estimation performance. A method to detect IN is also presented. Simulation results show that the proposed algorithm achieves good estimation performance in an environment subject to IN and outperforms the conventional robust algorithms.

Keywords: Adaptive networks, Distributed estimation, Impulsive noise, Robust algorithm, Diffusion LMS algorithm

1. Introduction

Distributed estimation over adaptive networks has been studied extensively because of its potential in many applications [1, 2, 3, 4, 5]. In the distributed estimation problem, numerous sensor nodes that have processing and communication ability cooperate to estimate a common parameter. Depending on the cooperation strategy, the algorithms proposed so far can be categorized mainly as incremental [6, 7] or diffusion [8, 9, 10]. The diffusion strategy requires no cyclic path, which makes it more popular than the incremental strategy. In signal processing, measurement noise is usually assumed to have a Gaussian distribution, and many algorithms are designed to perform well in that case. However, in real-world applications, impulsive noise (IN) also occurs and degrades the estimation performance of many algorithms [11]. Especially in distributed estimation, IN can propagate over the entire network, so its influence must be reduced.

∗E-mail address: [email protected]
1This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2016-H8601-16-1005) supervised by the IITP (Institute for Information & communications Technology Promotion).

Preprint submitted to Signal Processing, August 17, 2016

To achieve insensitivity to IN, many robust algorithms [12, 13, 14, 15, 16, 17] have been developed. Most of them are designed to reduce the influence of IN during the update process. Several algorithms [18, 19, 20, 21, 22] for distributed estimation have also been proposed, but most of them simply adopt the schemes of the conventional robust algorithms. The main problem with these schemes is that data contaminated with IN are still used, even if their influence is decreased. In distributed estimation, each node can communicate with other nodes, so even a node whose own data are contaminated with IN can obtain uncontaminated data from other nodes. If IN is detected, each node can discard the IN-contaminated data and process only the uncontaminated data. Here, we propose an algorithm that uses only the uncontaminated data based on the detection result. We also use a mean square error (MSE) criterion, which shows good estimation performance in an IN-free environment. The derived algorithm is based on the adapt-then-combine (ATC) diffusion least mean square (DLMS) algorithm [10], one of the most popular algorithms based on the MSE criterion, and has variable weighting coefficients depending on IN detection. In contrast to the conventional robust algorithms, the proposed algorithm rejects IN-contaminated data and thereby improves estimation performance.

This paper is organized as follows. In Section 2, we formulate the problem and briefly introduce the DLMS algorithm. In Section 3, the proposed update equations are derived under the assumption that IN is detected; the detection method follows. In Section 4 we show simulation results, and in Section 5 we conclude the paper.

Notation: We use boldface letters for random variables and normal letters for deterministic quantities. E[·] denotes the expectation operator and [·]^T the transpose operator.

2. Background

2.1. Problem formulation

Consider a network of N nodes, each connected to other nodes in its own subset (Fig. 1). The subset of node k is called the neighborhood of node k and is denoted by N_k; node k can communicate with the other nodes in it. At each time i, node k senses the M × 1 input regression vector u_{k,i} and the desired signal d_k(i); the two signals are assumed to be related as

d_k(i) = u_{k,i}^T w^o + v_k(i)    (1)

where w^o is an M × 1 unknown parameter vector and v_k(i) is additive noise, assumed to be independent of u_{k,i}. The goal is for each node to estimate w^o by using information from its neighborhood as well as its own sensed data.
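As a sketch, the linear data model (1) can be simulated as follows; the dimension M matches the later simulation setup, while the noise level `sigma_beta` and the seeded generator are illustrative choices of ours, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

M = 8                                   # filter length, as in Section 4
w_o = rng.standard_normal(M)
w_o /= np.linalg.norm(w_o)              # unknown parameter with unit norm

def sense(sigma_beta=0.1):
    """Return one measurement pair (u_k, d_k) following d = u^T w_o + v."""
    u = rng.standard_normal(M)          # zero-mean Gaussian regressor
    v = sigma_beta * rng.standard_normal()
    return u, u @ w_o + v
```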

2.2. DLMS algorithm

The MSE criterion provides good estimation performance when IN is absent. The DLMS algorithm [10] was derived from the MSE criterion, and its ATC version is

\psi_{k,i} = w_{k,i-1} + \mu_k \sum_{l \in N_k} c_{l,k} u_{l,i} \left( d_l(i) - u_{l,i}^T w_{k,i-1} \right)
w_{k,i} = \sum_{l \in N_k} a_{l,k} \psi_{l,i}    (2)

where \mu_k is a positive step-size and \{c_{l,k}, a_{l,k}\} are non-negative weighting coefficients that satisfy c_{l,k} = a_{l,k} = 0 if l \notin N_k, and

\sum_{k=1}^{N} c_{l,k} = 1, \qquad \sum_{l=1}^{N} a_{l,k} = 1.    (3)
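A minimal sketch of one ATC iteration per (2); the matrix-based interface (weights passed as N × N arrays `C` and `A`, measurements as a list of pairs) is our own convention, not from the paper.

```python
import numpy as np

def atc_dlms_step(w_prev, data, C, A, mu):
    """One ATC diffusion-LMS iteration (2) for all N nodes.

    w_prev : (N, M) array of previous estimates w_{k,i-1}
    data   : list of N pairs (u_l, d_l) for the current time i
    C, A   : (N, N) weighting matrices with C[l, k] = c_{l,k}, A[l, k] = a_{l,k}
    """
    N, M = w_prev.shape
    psi = np.empty((N, M))
    for k in range(N):                      # adaptation step
        grad = np.zeros(M)
        for l in range(N):
            if C[l, k] > 0:                 # l is in the neighborhood N_k
                u, d = data[l]
                grad += C[l, k] * u * (d - u @ w_prev[k])
        psi[k] = w_prev[k] + mu * grad
    return A.T @ psi                        # combination step: sum_l a_{l,k} psi_l
```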

The DLMS algorithm consists of two steps. The first equation of (2) is an adaptation step in which each node k uses the data from N_k to calculate an intermediate estimate \psi_{k,i}. The second equation of (2) is a combination step that averages the intermediate estimates in the neighborhood. In both steps, each node uses the data from its neighborhood; this strategy yields spatial diversity of information and improves estimation performance. However, like other algorithms that use the MSE criterion, the DLMS algorithm suffers from poor estimation performance when IN is present.

2.3. Additive noise model

In real-world applications, the additive noise v_k(i) can contain not only a background measurement noise \beta_k(i) but also IN \eta_k(i):

v_k(i) = \beta_k(i) + \eta_k(i).    (4)

The background measurement noise can be assumed to be zero-mean Gaussian with power \sigma_{\beta,k}^2, and IN can be modeled as

\eta_k(i) = \theta_k(i) I_k(i)    (5)

where \theta_k(i) is a Bernoulli process with P[\theta_k(i) = 1] = p_k and I_k(i) is zero-mean Gaussian with power \sigma_{I,k}^2. Although the occurrence probability p_k is low, \sigma_{I,k}^2 is much larger than \sigma_{\beta,k}^2. Therefore, unlike the background measurement noise, IN severely disturbs the convergence of many algorithms.

3. Proposed algorithm

In distributed estimation, each node uses not only its own data but also multiple spatial data from its neighborhood. Therefore, a node that receives IN-contaminated data can discard them; if each node can detect the occurrence of IN in its own data, it can instead use the uncontaminated data from its neighborhood to perform the estimation task. As a result, the entire network can operate with only the uncontaminated data. To exploit this advantage, we propose a cost function that considers only the data without IN. Under the assumption that the IN-contaminated data are successfully eliminated, the MSE criterion can be minimized by using the remaining data for good estimation performance. The proposed algorithm is expected to outperform the conventional robust algorithms because IN-contaminated data that disturb the estimation are not used.

3.1. Derivation of update equations

Assuming that the occurrence of IN is known, we propose a global cost function:

J_i^{glob}(w) = \sum_{k \in \{l \mid \theta_l(i) = 0\}} E\left[ \left( d_k(i) - u_{k,i}^T w \right)^2 \right].    (6)

At the same time, we propose a local cost function that uses only the data available to each node:

J_{k,i}^{loc}(w) = \sum_{l \in N_k} \tilde{c}_{l,k}(i) E\left[ \left( d_l(i) - u_{l,i}^T w \right)^2 \right]    (7)

where \tilde{c}_{l,k}(i) is a non-negative weighting coefficient that satisfies

\tilde{c}_{l,k}(i) = \begin{cases} 0 & \text{if } \theta_l(i) = 1 \\ c_{l,k} & \text{otherwise} \end{cases}    (8)

where c_{l,k} is given in (3). The coefficient \tilde{c}_{l,k}(i) represents the proportion in which node l shares its data with node k. If IN occurs at node l, the data from node l are rejected by \tilde{c}_{l,k}(i) = 0, so node k does not use those data (Fig. 2). Considering \tilde{c}_{l,k}(i), the global cost function (6) can be expressed as the sum of the local cost functions (7) over the entire network, and the procedures in [10] yield an alternative global cost function:

J_i^{glob}(w) = \sum_{l \in N_k} \tilde{c}_{l,k}(i) E\left[ \left( d_l(i) - u_{l,i}^T w \right)^2 \right] + \sum_{l \neq k}^{N} \| w - w^o \|_{\Gamma_{l,i}}^2    (9)

where the notation \|a\|_\Sigma^2 = a^T \Sigma a denotes a weighted vector norm, \Gamma_{l,i} = \sum_{m \in N_l} \tilde{c}_{m,l}(i) R_{u,m} is a covariance matrix, and R_{u,m} = E\left[ u_{m,i} u_{m,i}^T \right]. As in [10], we replace the unknown parameter w^o with an available parameter \psi_l from node l, and replace \Gamma_{l,i} with a diagonal weighting matrix \Gamma_{l,i} = \tilde{b}_{l,k}(i) I_M, where \tilde{b}_{l,k}(i) is a non-negative weighting coefficient that satisfies

\tilde{b}_{l,k}(i) = 0 \quad \text{if } l \notin N_k \text{ or } \sum_{m \in N_l} \tilde{c}_{m,l}(i) = 0.    (10)

Note that the second condition of (10) results from the fact that an all-zero set of coefficients in the covariance matrix yields a null matrix. These modifications finally yield the following distributed cost function, which is based only on the information available to each node:

J_{k,i}^{dist}(w) = \sum_{l \in N_k} \tilde{c}_{l,k}(i) E\left[ \left( d_l(i) - u_{l,i}^T w \right)^2 \right] + \sum_{l \in N_k \setminus \{k\}} \tilde{b}_{l,k}(i) \| w - \psi_l \|^2.    (11)

Using the steepest-descent method to minimize (11) yields the following update equations:

\psi_{k,i} = w_{k,i-1} + \mu_k \sum_{l \in N_k} \tilde{c}_{l,k}(i) \left( R_{du,l} - R_{u,l} w_{k,i-1} \right)
w_{k,i} = \psi_{k,i} + \nu_k \sum_{l \in N_k \setminus \{k\}} \tilde{b}_{l,k}(i) \left( \psi_l - w_{k,i-1} \right)    (12)

where \{\mu_k, \nu_k\} are positive step-sizes and R_{du,l} = E[d_l(i) u_{l,i}]. Note that the intermediate estimate \psi_{k,i} is introduced in the process of dividing the solution into two steps. As in [10], we replace \psi_l and w_{k,i-1} in the second equation of (12) with \psi_{l,i} and \psi_{k,i}, respectively; the result can be rewritten as

w_{k,i} = \left( 1 - \nu_k \sum_{l \in N_k \setminus \{k\}} \tilde{b}_{l,k}(i) \right) \psi_{k,i} + \nu_k \sum_{l \in N_k \setminus \{k\}} \tilde{b}_{l,k}(i) \psi_{l,i},    (13)

and if we set the non-negative coefficients

\tilde{a}_{k,k}(i) = 1 - \nu_k \sum_{l \in N_k \setminus \{k\}} \tilde{b}_{l,k}(i), \qquad \tilde{a}_{l,k}(i) = \nu_k \tilde{b}_{l,k}(i) \text{ for } l \neq k,    (14)

then equation (13) can be expressed as

w_{k,i} = \sum_{l \in N_k} \tilde{a}_{l,k}(i) \psi_{l,i}.    (15)

If we introduce the effective neighborhood of node k as

\tilde{N}_{k,i} = \left\{ l \,\middle|\, l \in N_k \text{ and } \sum_{m \in N_l} \tilde{c}_{m,l}(i) \neq 0 \right\} \cup \{k\},    (16)

the coefficient \tilde{a}_{l,k}(i) satisfies

\tilde{a}_{l,k}(i) = 0 \quad \text{if } l \notin \tilde{N}_{k,i}, \qquad \sum_{l=1}^{N} \tilde{a}_{l,k}(i) = 1.    (17)

The effective neighborhood of a node includes only the node itself and the neighborhood nodes that receive at least one uncontaminated datum, i.e., \sum_{m \in N_l} \tilde{c}_{m,l}(i) \neq 0. If IN occurs at every node in N_l, then \tilde{N}_{k,i} excludes node l, so \tilde{a}_{l,k}(i) becomes 0; this prevents \psi_{l,i} from participating in the combination for w_{k,i} in (15) (Fig. 3). Finally, using instantaneous approximations for R_{du,l} and R_{u,l} in (12), the proposed update equations are obtained as

\psi_{k,i} = w_{k,i-1} + \mu_k \sum_{l \in N_k} \tilde{c}_{l,k}(i) u_{l,i} \left( d_l(i) - u_{l,i}^T w_{k,i-1} \right)
w_{k,i} = \sum_{l \in N_k} \tilde{a}_{l,k}(i) \psi_{l,i}.    (18)
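Under the assumption that the per-node detection flags are available, one iteration of (18) might be sketched as follows. The matrix bookkeeping and the renormalization of the combination weights by column scaling are our own simplifications; the paper only requires the coefficients to satisfy (8), (16), and (17).

```python
import numpy as np

def rvwc_step(w_prev, data, C, A, theta_hat, mu):
    """One RVWC DLMS iteration (18) for all N nodes.

    theta_hat[l] = 1 flags IN at node l; its data are excluded everywhere, and
    its intermediate estimate is excluded if it receives no clean data at all.
    """
    N, M = w_prev.shape
    C_t = C * (1.0 - theta_hat)[:, None]        # c~_{l,k}: zero contaminated rows (8)
    clean = C_t.sum(axis=0) > 0                 # nodes with >= 1 uncontaminated datum
    psi = np.empty((N, M))
    for k in range(N):                          # adaptation step of (18)
        grad = np.zeros(M)
        for l in range(N):
            if C_t[l, k] > 0:
                u, d = data[l]
                grad += C_t[l, k] * u * (d - u @ w_prev[k])
        psi[k] = w_prev[k] + mu * grad          # psi = w_prev when no clean data
    A_t = A * clean[:, None]                    # drop psi_l outside eff. nbhd (16)
    np.fill_diagonal(A_t, np.diag(A))           # node k always keeps its own psi_k
    A_t = A_t / A_t.sum(axis=0, keepdims=True)  # enforce unit column sums (17)
    return A_t.T @ psi                          # combination step of (18)
```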

Compared to the DLMS algorithm, the proposed algorithm has weighting coefficients that vary depending on the existence of IN, \theta_k(i). In the adaptation step, if IN occurs at node l, \tilde{c}_{l,k}(i) becomes 0, so neighborhood node k does not use \{d_l(i), u_{l,i}\}. Thus, the entire network can reject IN-contaminated data. If IN occurs at every node in N_l, i.e., \sum_{m \in N_l} \tilde{c}_{m,l}(i) = 0, then node l cannot be updated in the adaptation step. At the same time, \tilde{a}_{l,k}(i) becomes 0, so \psi_{l,i} does not participate in the combination step at node k. As a result, out-of-date intermediate estimates due to IN can be eliminated. By this process, both insensitivity to IN and good estimation performance can be achieved. For this reason, the proposed algorithm is named the Robust Variable Weighting Coefficients DLMS (RVWC DLMS) algorithm.

3.2. Detection of impulsive noise

We basically adopt the detection method used in M-estimate algorithms [13, 23]. We first introduce the estimation error

e_k(i) = d_k(i) - u_{k,i}^T w_{k,i-1}.    (19)

Because IN generally has much greater power than the background measurement noise, the magnitude of the error increases abruptly when IN occurs. Therefore, the occurrence of IN can be detected by comparing the magnitude of the error to a threshold T:

\hat{\theta}_k(i) = \begin{cases} 1 & \text{if } |e_k(i)| \geq T \\ 0 & \text{otherwise.} \end{cases}    (20)

To determine the threshold T, we assume that the error has a Gaussian distribution when IN is not present. Then, the probability that the magnitude of the error exceeds T is

P\left[ |e_k(i)| > T \right] = \mathrm{erfc}\left( \frac{T}{\sqrt{2}\, \sigma_{e,k}(i)} \right)    (21)

where \sigma_{e,k}(i) is the standard deviation of the error without IN, and \mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2} dt is the complementary error function. If the magnitude of the error becomes so large that the probability in (21) is very low, the error is likely to contain IN. Therefore, by choosing a confidence degree (or probability) in (21), we obtain a threshold of the form T = \alpha \sigma_{e,k}(i), where \alpha is a non-negative constant depending on the confidence degree. Methods to estimate \sigma_{e,k}(i) are proposed in [13, 23], but they are sensitive to the probability of IN and require heavy computation due to the median operator. To overcome these drawbacks, we propose the following conditional time-averaging method:

\hat{\sigma}_{e,k}^2(i) = \begin{cases} \lambda \hat{\sigma}_{e,k}^2(i-1) + (1-\lambda) e_k^2(i) & \text{if } \hat{\theta}_k(i) = 0 \\ \hat{\sigma}_{e,k}^2(i-1) & \text{otherwise} \end{cases}    (22)

where \lambda is a forgetting factor. To ensure that the estimate is insensitive to IN, \hat{\sigma}_{e,k}^2(i) is not updated if IN is detected. Also, because \hat{\sigma}_{e,k}^2(i) is updated after the detection, \sigma_{e,k}(i) is estimated not by \hat{\sigma}_{e,k}(i) but by \hat{\sigma}_{e,k}(i-1); hence T = \alpha \hat{\sigma}_{e,k}(i-1).
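A sketch of the detector (20) combined with the conditional variance recursion (22). The defaults α = 2.58 and λ = 1 − 1/M = 0.875 mirror the values used in Section 4; the function signature itself is our own choice.

```python
import numpy as np

def detect_in(e, sigma2_prev, alpha=2.58, lam=0.875):
    """IN detection (20) with the conditional variance update (22).

    e           : current error e_k(i) = d_k(i) - u_{k,i}^T w_{k,i-1}
    sigma2_prev : running estimate of the IN-free error power at time i-1
    Returns (theta_hat, sigma2_new); the threshold is T = alpha * sqrt(sigma2_prev).
    """
    theta_hat = 1 if abs(e) >= alpha * np.sqrt(sigma2_prev) else 0
    if theta_hat == 0:                       # update only on clean samples
        sigma2_new = lam * sigma2_prev + (1.0 - lam) * e * e
    else:                                    # freeze the estimate under IN
        sigma2_new = sigma2_prev
    return theta_hat, sigma2_new
```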

3.3. Practical considerations

When IN is detected at a node, its neighborhood nodes do not use that node's data in the adaptation step. Although this property guarantees insensitivity to IN, the energy of the update decreases because data are missing. To maintain the update energy, we modify the step-size to

\tilde{\mu}_k(i) = \begin{cases} \dfrac{\sum_{l \in N_k} c_{l,k}}{\sum_{l \in N_k} \tilde{c}_{l,k}(i)}\, \mu_k & \text{if } \sum_{l \in N_k} \tilde{c}_{l,k}(i) \neq 0 \\ \mu_k & \text{otherwise.} \end{cases}    (23)

If some nodes in N_k contain IN so that \tilde{c}_{l,k}(i) = 0 for those nodes, then \tilde{\mu}_k(i) is adjusted to be larger than \mu_k; this adjustment compensates for the loss of update energy due to the missing data.

The variance estimate (22) helps reduce sensitivity to IN, but it also loses the ability to track the true value if that value increases suddenly. When w^o changes, this loss prevents rapid recovery. To solve this problem, we include a control method that also provides the initial value \hat{\sigma}_{e,k}^2(0). First, using the L × 1 error regression vector A_{k,i} = \left[ e_k^2(i), \ldots, e_k^2(i-L+1) \right]^T, we obtain the initial value \hat{\sigma}_{e,k}^2(0) = \mathrm{median}(A_{k,0}). In addition, we introduce a stack parameter

S_k(i) = \begin{cases} S_k(i-1) + 1 & \text{if } \hat{\theta}_k(i) = 1 \\ 0 & \text{otherwise} \end{cases}    (24)

which counts consecutive detections of IN. If S_k(i) becomes too large, the cause is probably a change in w^o rather than the occurrence of IN. Thus, a re-initialization for \hat{\sigma}_{e,k}^2(i) can be included:

\hat{\sigma}_{e,k}^2(i) = \begin{cases} \lambda \hat{\sigma}_{e,k}^2(i-1) + (1-\lambda) e_k^2(i) & \text{if } \hat{\theta}_k(i) = 0 \\ \mathrm{median}(A_{k,i}) & \text{else if } S_k(i) \geq L \\ \hat{\sigma}_{e,k}^2(i-1) & \text{else.} \end{cases}    (25)
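The recursions (24) and (25) can be collected into a small state machine. This sketch seeds the variance with a fixed `sigma2_init` instead of the paper's median-based initial value, and the class and attribute names are ours.

```python
import numpy as np
from collections import deque

class VarianceTracker:
    """Error-variance tracking with the stack-based re-initialization (24)-(25)."""

    def __init__(self, L=30, alpha=2.58, lam=0.875, sigma2_init=1.0):
        self.L, self.alpha, self.lam = L, alpha, lam
        self.sigma2 = sigma2_init                       # stands in for median(A_{k,0})
        self.stack = 0                                  # S_k(i)
        self.errs = deque([sigma2_init] * L, maxlen=L)  # squared-error window A_{k,i}

    def step(self, e):
        self.errs.append(e * e)
        theta_hat = 1 if abs(e) >= self.alpha * np.sqrt(self.sigma2) else 0
        self.stack = self.stack + 1 if theta_hat else 0       # (24)
        if theta_hat == 0:                                    # first branch of (25)
            self.sigma2 = self.lam * self.sigma2 + (1 - self.lam) * e * e
        elif self.stack >= self.L:                            # L detections in a row:
            self.sigma2 = float(np.median(self.errs))         # treat as a change in w_o
        return theta_hat
```

L consecutive detections are interpreted as a change in w^o rather than IN, at which point the variance is re-seeded from the median of the last L squared errors.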

Compared to the DLMS algorithm, the RVWC DLMS algorithm requires 5 additional multiplications, n_k + 1 additions, 1 division, and extra computation for \tilde{a}_{l,k}(i) per iteration, where n_k is the number of nodes in N_k. The median operator is not counted because it is performed only in the special case. The computational complexity for \tilde{a}_{l,k}(i) depends on how the weights are assigned, but the commonly used weighting rules [10] do not require much. Therefore, the RVWC DLMS algorithm costs only a little additional computation compared to the DLMS algorithm.

4. Simulation results

To demonstrate the estimation performance of the RVWC DLMS algorithm, we present a simulation of a system identification scenario in which the unknown system has M = 8 and \|w^o\|^2 = 1. We consider N = 30 nodes (Fig. 4). The input regressor is zero-mean Gaussian, and IN occurs with probability p_k = 0.1 and has power \sigma_{I,k}^2 = 10^4 \sigma_{\beta,k}^2. All results are obtained by taking the ensemble average, over 200 independent trials, of the network mean square deviation (MSD), defined as

\mathrm{MSD}^{\mathrm{network}}(i) = \frac{1}{N} \sum_{k=1}^{N} E\left[ \| w^o - w_{k,i} \|^2 \right].    (26)

In every algorithm, we use the Metropolis rule for the adaptation weights c_{l,k} and the relative-degree rule for the combination weights a_{l,k}; both rules are stated in [10]. For the RVWC DLMS algorithm, the relative-degree rule for the combination weights \tilde{a}_{l,k}(i) becomes

\tilde{a}_{l,k}(i) = \frac{n_l}{\sum_{m \in \tilde{N}_{k,i}} n_m}.    (27)
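A sketch of the two weighting rules from [10] used here, together with the network-MSD metric (26). We assume degree counts include the node itself (n_k = |N_k| with k ∈ N_k); that convention, and the function names, are ours.

```python
import numpy as np

def metropolis_weights(adj):
    """Metropolis rule for the adaptation weights c_{l,k} (one common form).

    adj : (N, N) 0/1 symmetric adjacency matrix with ones on the diagonal,
          so deg[k] = n_k counts node k itself.
    """
    N = adj.shape[0]
    deg = adj.sum(axis=0)
    C = np.zeros((N, N))
    for k in range(N):
        for l in range(N):
            if l != k and adj[l, k]:
                C[l, k] = 1.0 / max(deg[l], deg[k])
        C[k, k] = 1.0 - C[:, k].sum()       # each column sums to one
    return C

def relative_degree_weights(adj):
    """Relative-degree rule: a_{l,k} proportional to the degree n_l, as in (27)."""
    deg = adj.sum(axis=0).astype(float)
    A = adj * deg[:, None]
    return A / A.sum(axis=0, keepdims=True)

def network_msd_db(W, w_o):
    """Network MSD (26) in dB for one trial; averaging such values over
    independent trials approximates the expectation."""
    return 10.0 * np.log10(np.mean(np.sum((W - w_o) ** 2, axis=1)))
```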

Fig. 5 shows the network MSD for the DLMS algorithm and the RVWC DLMS algorithm. Both algorithms use step-size \mu_k = 0.05. For the RVWC DLMS algorithm, the parameters are L = 30, \alpha = 2.58 (which corresponds to a 99% confidence degree), and \lambda = 1 - 1/M (a common choice in estimation problems). Because of the presence of IN, the DLMS algorithm has very poor estimation performance, but the RVWC DLMS algorithm is insensitive to IN and achieves good estimation performance by using variable weighting coefficients. We also changed w^o by multiplying it by −1 at time i = 501 to test the tracking ability. Although the system change is detected only after L iterations, the RVWC DLMS algorithm tracks the sudden system change well.

The next simulation tests the estimation performance at different confidence degrees \alpha (Fig. 6). All parameters except \alpha are the same as in the previous simulation. A small value of \alpha gives good insensitivity to IN but increases the probability that uncontaminated data are discarded; as a result, the convergence rate decreases when the 95% confidence degree is used. In contrast, a large value of \alpha reduces false alarms in detection but may allow inclusion of IN-contaminated data. Fig. 6 suggests that the 99% confidence degree yields both insensitivity and a low false-alarm rate.

In Fig. 7, the IN detection rate of the RVWC DLMS algorithm is shown for different powers of IN. All parameters are the same as in the previous simulations. The detection rate declines as \sigma_{I,k}^2 decreases. However, the estimation performance is hardly affected by \sigma_{I,k}^2, because IN that is not detected has a magnitude too small to disturb the estimation. In addition, the detection rate is low during the transient state, when the additive noise does not dominate the magnitude of the error; the rate increases as w_{k,i} converges and IN of small magnitude then amplifies the error. Considering that noise of small magnitude is not regarded as IN, the proposed detection method works well without degrading the estimation performance.

Fig. 8 shows how the probability of IN affects the estimation performance of the RVWC DLMS algorithm. All parameters are the same as in the previous simulations. As the probability increases, the estimation performance is expected to degrade, as in any estimation problem; this is also natural for the RVWC DLMS algorithm, because more data are missing. Nevertheless, the RVWC DLMS algorithm converges successfully at all probabilities tested. When p_k = 0.5, the influence of IN appears during the transient state; this increased miss rate in IN detection is due to an incorrect initial value \hat{\sigma}_{e,k}^2(0). However, \hat{\sigma}_{e,k}^2(i) adjusts well over time, and the RVWC DLMS algorithm eventually becomes insensitive to IN. Considering that p_k = 0.5 is unrealistic, the RVWC DLMS algorithm is insensitive to IN at various probabilities.

Finally, we compare the RVWC DLMS algorithm to several previously reported algorithms [18, 20, 21]. All parameters for the RVWC DLMS algorithm are the same as in the previous simulations, and the parameters of the other algorithms are set to give the best estimation performance at the same convergence rate for every algorithm (p = 1.2, \mu_k = 0.035 for [18]; \sigma_s^2 = 0.01, \mu_k = 0.23 for [20]; \sigma = 1, \mu_k = 0.15 for [21]). The diffusion strategy is used for the algorithm in [20], and the data are not shared in the adaptation step for any algorithm. Fig. 9 shows that the RVWC DLMS algorithm outperforms all the other algorithms. This result occurs because the RVWC DLMS algorithm excludes IN-contaminated data, whereas the other algorithms use them. We also tried other IN models, such as alpha-stable noise and Laplace noise (not shown here). With alpha-stable noise, the DLMS algorithm deteriorates, but the RVWC DLMS algorithm retains good estimation performance. With Laplace noise, however, the RVWC DLMS algorithm shows no remarkable improvement over the DLMS algorithm, because the proposed IN detection method is not appropriate for this kind of noise: most noise from a Laplace distribution is not considered IN, and only the rare very large samples are discarded.

5. Conclusion

A new robust algorithm for the distributed estimation problem over adaptive networks was introduced. We proposed a cost function that considers only the data from nodes that are not affected by IN. The derived algorithm uses only the uncontaminated data in the adaptation step and averages only updated intermediate estimates in the combination step by using variable weighting coefficients. A method to detect IN was also presented.
Simulation results showed that the proposed algorithm has good estimation performance in an environment that is subject to IN and outperforms the conventional robust algorithms.

References

[1] F.S. Cattivelli, A.H. Sayed, "Modeling bird flight formations using diffusion adaptation," IEEE Trans. Signal Process., vol. 59, no. 5, pp. 2038–2051, May 2011.
[2] S.-Y. Tu, A.H. Sayed, "Mobile adaptive networks," IEEE J. Sel. Top. Signal Process., vol. 5, no. 4, pp. 649–664, August 2011.
[3] A.H. Sayed, S.-Y. Tu, J. Chen, X. Zhao, Z.J. Towfic, "Diffusion strategies for adaptation and learning over networks," IEEE Signal Process. Mag., vol. 30, no. 3, pp. 155–171, May 2013.


[4] P. Di Lorenzo, S. Barbarossa, A.H. Sayed, "Bio-inspired decentralized radio access based on swarming mechanisms over adaptive networks," IEEE Trans. Signal Process., vol. 61, no. 12, pp. 3183–3197, June 2013.
[5] P. Di Lorenzo, S. Barbarossa, A.H. Sayed, "Distributed spectrum estimation for small cell networks based on sparse diffusion adaptation," IEEE Signal Process. Lett., vol. 20, no. 12, pp. 1261–1265, December 2013.
[6] C.G. Lopes, A.H. Sayed, "Incremental adaptive strategies over distributed networks," IEEE Trans. Signal Process., vol. 55, no. 8, pp. 4064–4077, August 2007.
[7] L. Li, J.A. Chambers, "A new incremental affine projection-based adaptive algorithm for distributed networks," Signal Process., vol. 88, pp. 2599–2603, October 2008.
[8] F.S. Cattivelli, C.G. Lopes, A.H. Sayed, "Diffusion recursive least-squares for distributed estimation over adaptive networks," IEEE Trans. Signal Process., vol. 56, no. 5, pp. 1865–1877, May 2008.
[9] C.G. Lopes, A.H. Sayed, "Diffusion least-mean squares over adaptive networks: formulation and performance analysis," IEEE Trans. Signal Process., vol. 56, no. 7, pp. 3122–3136, July 2008.
[10] F.S. Cattivelli, A.H. Sayed, "Diffusion LMS strategies for distributed estimation," IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1035–1048, March 2010.
[11] S.R. Kim, A. Efron, "Adaptive robust impulse noise filtering," IEEE Trans. Signal Process., vol. 43, no. 8, pp. 1855–1866, August 1995.
[12] J. Chambers, A. Avlonitis, "A robust mixed-norm adaptive filter algorithm," IEEE Signal Process. Lett., vol. 4, no. 2, pp. 46–48, February 1997.
[13] Y. Zou, S.-C. Chan, T.-S. Ng, "Least mean M-estimate algorithms for robust adaptive filtering in impulse noise," IEEE Trans. Circuits Syst. II, vol. 47, no. 12, pp. 1564–1569, December 2000.
[14] W. Liu, P.P. Pokharel, J.C. Principe, "Correntropy: properties and applications in non-Gaussian signal processing," IEEE Trans. Signal Process., vol. 55, no. 11, pp. 5286–5298, November 2007.
[15] L.R. Vega, H. Rey, J. Benesty, S. Tressens, "A new robust variable step-size NLMS algorithm," IEEE Trans. Signal Process., vol. 56, no. 5, pp. 1878–1893, May 2008.
[16] N.J. Bershad, "On error saturation nonlinearities for LMS adaptation in impulsive noise," IEEE Trans. Signal Process., vol. 56, no. 9, pp. 4526–4530, September 2008.
[17] B. Majhi, G. Panda, B. Mulgrew, "Robust identification using new Wilcoxon least mean square algorithm," Electron. Lett., vol. 45, no. 6, pp. 334–335, March 2009.
[18] F. Wen, "Diffusion least-mean P-power algorithms for distributed estimation in alpha-stable noise environments," Electron. Lett., vol. 49, no. 21, pp. 1355–1356, October 2013.
[19] U.K. Sahoo, G. Panda, B. Mulgrew, B. Majhi, "Development of robust distributed learning strategies for wireless sensor networks using rank based norms," Signal Process., vol. 101, pp. 218–228, August 2014.
[20] T. Panigrahi, G. Panda, B. Mulgrew, "Error saturation nonlinearities for robust incremental LMS over wireless sensor networks," ACM Trans. Sensor Netw., vol. 11, no. 2, December 2014.
[21] W.M. Bazzi, A. Rastegarnia, A. Khalili, "A robust diffusion adaptive network based on the maximum correntropy criterion," Proceedings of the International Conference on Computer Communication and Networks (ICCCN), Las Vegas, USA, August 2015, pp. 1–4.
[22] J. Ni, J. Chen, X. Chen, "Diffusion sign-error LMS algorithm: formulation and stochastic behavior analysis," Signal Process., vol. 128, pp. 142–149, November 2016.
[23] Y. Zou, S.-C. Chan, T.-S. Ng, "A robust M-estimate adaptive filter for impulse noise suppression," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Phoenix, USA, March 1999, pp. 1765–1768.


Table 1: RVWC DLMS algorithm

IN detection:
    e_k(i) = d_k(i) - u_{k,i}^T w_{k,i-1}
    A_{k,i} = \left[ e_k^2(i), \ldots, e_k^2(i-L+1) \right]^T
    \hat{\theta}_k(i) = 1 if |e_k(i)| \geq \alpha \hat{\sigma}_{e,k}(i-1), and 0 otherwise
    S_k(i) = S_k(i-1) + 1 if \hat{\theta}_k(i) = 1, and 0 otherwise
    \hat{\sigma}_{e,k}^2(i) = \lambda \hat{\sigma}_{e,k}^2(i-1) + (1-\lambda) e_k^2(i) if \hat{\theta}_k(i) = 0; \mathrm{median}(A_{k,i}) else if S_k(i) \geq L; \hat{\sigma}_{e,k}^2(i-1) else

Adaptation:
    \tilde{c}_{l,k}(i) = 0 if \hat{\theta}_l(i) = 1, and c_{l,k} otherwise
    \tilde{\mu}_k(i) = \left( \sum_{l \in N_k} c_{l,k} / \sum_{l \in N_k} \tilde{c}_{l,k}(i) \right) \mu_k if \sum_{l \in N_k} \tilde{c}_{l,k}(i) \neq 0, and \mu_k otherwise
    \psi_{k,i} = w_{k,i-1} + \tilde{\mu}_k(i) \sum_{l \in N_k} \tilde{c}_{l,k}(i) u_{l,i} \left( d_l(i) - u_{l,i}^T w_{k,i-1} \right)

Combination:
    \tilde{N}_{k,i} = \left\{ l \mid l \in N_k \text{ and } \sum_{m \in N_l} \tilde{c}_{m,l}(i) \neq 0 \right\} \cup \{k\}
    \tilde{a}_{l,k}(i): non-negative, with \tilde{a}_{l,k}(i) = 0 if l \notin \tilde{N}_{k,i} and \sum_{l=1}^{N} \tilde{a}_{l,k}(i) = 1
    w_{k,i} = \sum_{l \in N_k} \tilde{a}_{l,k}(i) \psi_{l,i}

Figure 1: A network with N nodes. At time i, each node k senses \{d_k(i), u_{k,i}\} and communicates with nodes in its neighborhood N_k.

Figure 2: When IN occurs at node l, \tilde{c}_{l,k}(i) becomes 0.

Figure 3: When IN occurs at every node in the neighborhood N_l, \tilde{a}_{l,k}(i) becomes 0.

Figure 4: Network topology (top), trace of the input regressor covariance Tr(R_{u,k}) (bottom left), and background measurement noise variance \sigma_{\beta,k}^2 (bottom right) for N = 30 nodes.

Figure 5: Network MSD (dB) versus iteration number for (a) the DLMS algorithm and (b) the RVWC DLMS algorithm.

Figure 6: Network MSD (dB) versus iteration number for the RVWC DLMS algorithm with different confidence degrees: (a) \alpha = 1.96 (95%), (b) \alpha = 2.58 (99%), (c) \alpha = 2.81 (99.5%), (d) \alpha = 3.28 (99.9%).

Figure 7: Network MSD (top) and IN detection rate (bottom) versus iteration number for the RVWC DLMS algorithm with different powers of IN: (a) \sigma_{I,k}^2 = 10^3 \sigma_{\beta,k}^2, (b) \sigma_{I,k}^2 = 10^4 \sigma_{\beta,k}^2, (c) \sigma_{I,k}^2 = 10^5 \sigma_{\beta,k}^2, (d) \sigma_{I,k}^2 = 10^6 \sigma_{\beta,k}^2.

Figure 8: Network MSD (dB) versus iteration number for the RVWC DLMS algorithm with different occurrence probabilities of IN: (a) p_k = 0.1, (b) p_k = 0.2, (c) p_k = 0.3, (d) p_k = 0.4, (e) p_k = 0.5.

Figure 9: Network MSD (dB) versus iteration number for the conventional algorithms ((a) the algorithm in [18], (b) the algorithm in [20], (c) the algorithm in [21]) and (d) the RVWC DLMS algorithm.