Digital Signal Processing

Contents lists available at ScienceDirect

www.elsevier.com/locate/dsp
Robust distributed Lorentzian adaptive filter with diffusion strategy in impulsive noise environment
Annet Mary Wilson a,∗, Trilochan Panigrahi a, Ankit Dubey b

a Dept. of Electronics & Communication Engineering, National Institute of Technology, Goa, India
b Dept. of Electrical Engineering, Indian Institute of Technology Jammu, India

a r t i c l e   i n f o

Article history:
Available online xxxx

Keywords:
Distributed
Lorentzian
Impulse noise
Channel estimation

a b s t r a c t
Many works in the literature support the potential of distributed channel estimation based on the traditional LMS algorithm and its variants. These conventional LMS algorithms, however, fail in an impulsive noise environment, which is unavoidable in many communication systems. Hence, in this paper we study distributed channel estimation with robust cost functions. Most robust adaptive algorithms are less efficient in terms of convergence rate. To deal with this, we propose the use of the window-based Lorentzian norm in a distributed framework, gaining an improved convergence rate from both distribution and data reuse. The performance of the proposed algorithm is validated through simulation results. Our contribution in this work is the application of the Lorentzian norm in sensor networks with diffusion cooperation, together with a stability analysis of the resulting algorithm.
© 2019 Elsevier Inc. All rights reserved.
1. Introduction
Wireless sensor networks (WSNs) have emerged as a trending technology, with applications in diverse fields such as industry, environmental monitoring, and the military. This has stirred researchers' interest in dealing with the limited power and computational capability of nodes in a WSN [1], tackling noisy links [2], and so on. The focus of this paper is channel estimation in a WSN under an impulsive noise environment. The presence of impulsive noise in communication systems has been reported in many works of literature and is duly noted later in this section. Numerous research works on channel estimation have elucidated its importance in wireless communication systems. Due to multipath propagation of a signal in a wireless channel, the received signal may suffer from intersymbol interference (ISI). To mitigate the effect of ISI, inverse filtering (equalization) is performed at the receiver front end, which requires exact channel information [3]. Consider a common source signal that is received by sensor nodes distributed over the destination with a certain topology. We consider an additive impulse noise channel. It can be described mathematically as a linear finite impulse response (FIR) filter connected in series with a noise source modeled
* Corresponding author.
E-mail addresses: [email protected] (A.M. Wilson), [email protected] (T. Panigrahi), [email protected] (A. Dubey).
https://doi.org/10.1016/j.dsp.2019.102589
1051-2004/© 2019 Elsevier Inc. All rights reserved.
as a Gaussian mixture [4,5]. We aim to estimate the wireless channel parameters at the network nodes from the collected noisy received signals. There are mainly two approaches to estimating a desired quantity in a WSN: centralized estimation and distributed estimation. In centralized estimation, the data collected over the nodes are sent to a central processor, which filters the desired quantity out of the noise [6]. This requires a powerful central processor and is more susceptible to link failure. In the latter approach, an adaptive filter at each node processes the noisy data according to certain rules, and the estimated data is shared with the neighboring nodes. This approach is more robust to link failure and adapts to environmental change [7]. The discussion of distributed estimation is not complete without mentioning its two main modes of operation, incremental and diffusion. Unlike the consensus strategy, which takes care of only temporal information, these incorporate spatial information into the estimation along with the temporal information [8]. In the incremental approach, the nodes are linked in a cyclic manner, with node k being the only neighbor of node k − 1 and node 1 being the only neighbor of node N [9–11]. The network topology of the incremental scheme is thus simpler than that of the diffusion network, where each node is a member of a neighborhood with many neighbors connected in a certain topology [7,8,12,13]; diffusion, however, is more robust to link failure than the incremental scheme. Adaptive filtering is at the core of statistical signal processing and is implemented in this paper under the diffusion distributed framework, owing to the fact that it improves the convergence rate and steady-state performance [8] as compared to simple non-cooperative filtering.

Due to the mathematical tractability of the l2 norm, the least mean square (LMS) approach is the most common adaptive filtering rule when the system is linear and the noise is Gaussian. The LMS algorithm was developed by Widrow and Hoff [14] in 1959 and soon became popular among researchers working in the field of statistical signal processing. But the inability of this algorithm to deal with outliers or impulsive noise (heavy-tailed distributions) led to further research on robust adaptive algorithms. Many works in the literature support the presence of impulsive noise in communication systems [15]. In an indoor wireless communication channel, electrical equipment such as microwave ovens, or office equipment like printers and copying machines, can cause impulsive noise [16], whereas in an outdoor environment it is caused by switching transients in power lines or automobile ignition [17]. In a radar system, natural causes like thunderstorms and lightning can produce impulsive noise [18]. Reinforcing this, in the recent past countless research works in communication systems, such as source localization based on direction of arrival (DOA) and time difference of arrival (TDOA) estimation [19,20] and speech signal recovery using a variational Bayesian algorithm [21], have been carried out considering impulsive interference. An equivalence of the minimum mean squared error (MMSE) and maximum signal-to-noise-ratio (MSNR) criteria in Bayesian estimation over an additive non-Gaussian channel is proved in [22]. In [23], the authors proposed a method based on the empirical characteristic function (ECF) to estimate the signal level in a binary communication system under impulsive noise. An impulse noise reduction technique in telecommunications based on minimum mean square error (MMSE)/maximum a posteriori (MAP) estimation is proposed in [24].
It has been noted that a lower-order statistical measure of the error signal, such as the l_p norm with 0 ≤ p ≤ 1, works better in an impulsive noise environment. The sign algorithm (SA) [25] uses the l1 norm of the error as the cost function and is shown to work in an impulsive environment. Lower-order statistics, however, are found to have a detrimental effect on the convergence rate. To incorporate the attractive features of the l2 norm, such as a faster convergence rate and better steady-state error, while remaining less sensitive to outliers, the l1 and l2 norms are combined through a threshold in the Huber cost function [26]. But setting the threshold properly is found to be tiresome. Other robust cost functions that can improve the convergence rate include the mixed norm [27,28], which uses both l1 and l2 terms, the least logarithmic absolute difference (LLAD) [29], etc. Another way to tackle a slow convergence rate is to reuse data, as in the affine projection sign algorithm (APSA) [30], which combines multiple projections with l1-norm optimization. Another class of robust cost functions is developed by exploiting the saturation property of error nonlinearities, such as the maximum correntropy criterion (MCC) [31–33], the Lorentzian adaptive filter (LAF) [34], and the sigmoid cost function [35]. In [36], a study on robust adaptive algorithms with error nonlinearity is carried out in which the existing algorithms are classified as V-shaped and Λ-shaped based on the weighting function, and the authors propose a new family of robust M-shaped algorithms that brings together the merits of both. Lately, many research works address distributed estimation under an impulsive noise environment. An adapt-then-combine diffusion strategy is employed in [37] using the l1 norm of the error as the cost function at each node. A distributed 1-bit compressed sensing algorithm is applied to sparse channel estimation in impulsive noise in [38].
A weighted least mean p-power (LMP) algorithm with distributed optimization is proposed in [39] for a non-uniform noise environment. The efficacy of the correntropy criterion in the presence of outliers is extended to distributed estimation in [40]. While these works proved their efficacy in the presence of impulsive noise, not much discussion is carried out in terms of convergence rate.
Fig. 1. Networked channel estimation model.
The features of affine projection (AP) improve the convergence rate and are used together with l1-norm optimization to achieve robustness, as mentioned earlier; this is extended to distributed parameter estimation in [41]. To achieve the merits of data reuse along with strong outlier rejection in distributed estimation, we extend the LAF filter of [34] to a diffusion framework and compare its performance with the Huber cost function and MCC in a distributed network, and with the diffusion affine projection sign algorithm (DAPSA). In LAF, a sliding-window-based Lorentzian cost function is used, where the trade-off between convergence rate and steady-state error is controlled by the window length. The Lorentzian norm is convex near the origin and, unlike the Huber norm, it is continuous everywhere [34]. The main attraction of the Lorentzian norm is its ability to reduce the influence of outliers considerably. Also, the lack of nonlinear mathematical operations in the filter update equation makes it a suitable candidate for hardware implementation. Our main contribution is the extension of the LAF filter to distributed network channel estimation and the convergence analysis of the same in an impulsive noise environment.

The remainder of the paper is organized as follows. Section 2 presents the adaptive solution derived for channel estimation in a distributed framework. The distributed robust adaptive algorithms used in this paper are discussed in Section 3. The proposed Combine-then-Adapt diffusion LAF is discussed in Section 4. The performance analysis of the proposed diffusion LAF algorithm in a global framework is presented in Section 5. Section 6 illustrates the simulation results and the performance comparison of the proposed Combine-then-Adapt diffusion LAF with other robust algorithms. Section 7 provides the concluding remarks.

2. Distributed channel estimation
The channel estimation scenario is described as follows. Consider a single source transmitting a signal through a wireless channel to a receiver that is modeled as a distributed network with N nodes. At each node there are spatially independent noise sources (a mixture of Gaussian and impulsive distributions), as shown in Fig. 1. Mathematically, the channel can be
modeled as a linear finite impulse response (FIR) filter of order M, connected to a noise source at each node. The estimation problem is described below. From the collection of noisy measurements across the nodes of the sensor network, the M × 1 unknown channel coefficient vector w^(o) is to be estimated. Each node k, where k = 1, . . . , N, has access to time realizations of scalar data d_k(i) and the M × 1 regressor vector u_{k,i} = [u_k(i), u_k(i − 1), . . . , u_k(i − M + 1)]^T of zero-mean random data {d_k, u_k}, which are related to w^(o) by the model [8]:

d_k(i) = u_{k,i}^T w^(o) + v_k(i)   (1)

where the background noise v_k(i) is independent of u_k(j) for any i, j.
2.1. Local optimization
Local optimization of a diffusion distributed network to identify the unknown parameter is elaborated in [7]. For ease of reference, a slightly modified set of the relevant equations is reproduced in this section. The local cost function is given as

J_k^loc(w) = Σ_{ℓ∈N_k} o_{ℓ,k} E |d_ℓ(i) − u_{ℓ,i}^T w|²   (2)

where the o_{ℓ,k} are the coefficients of a matrix O which satisfy the condition

o_{ℓ,k} = 0 if ℓ ∉ N_k,  and  1^T O = 1^T   (3)

where 1 is an N × 1 vector with unit entries. Applying the stochastic gradient descent solution [42] to the cost function in (2) we get

w_k^i = w_k^{i−1} − μ_k′ (∇ J_k^loc(w))^*   (4)
      = w_k^{i−1} + μ_k Σ_{ℓ∈N_k} o_{ℓ,k} u_{ℓ,i} (d_ℓ(i) − u_{ℓ,i}^T w_k^{i−1})   (5)

where μ_k = 2μ_k′. There are two schemes of diffusion estimation: Adapt-then-Combine (ATC) and Combine-then-Adapt (CTA) [7,8]. In ATC the first stage of operation is adaptation, followed by diffusion of the neighborhood estimates, whereas in CTA a local diffusion step precedes the adaptation step. The CTA diffusion scheme is given below:

φ_k^{i−1} = Σ_{ℓ∈N_k} c_{ℓ,k} w_ℓ^{i−1}   (6)

w_k^i = φ_k^{i−1} + μ_k Σ_{ℓ∈N_k} o_{ℓ,k} u_{ℓ,i} (d_ℓ(i) − u_{ℓ,i}^T φ_k^{i−1})   (7)

Here, c_{ℓ,k} denotes the entries of a combiner matrix C, which is a mathematical representation of the network topology, i.e. if nodes k and ℓ are not connected, then c_{ℓ,k} = 0. A popular combiner rule, used in this paper, is the Metropolis rule. Let n_k and n_ℓ denote the degrees of nodes k and ℓ respectively. The Metropolis rule is defined as [8]:

c_{ℓ,k} = 1 / max(n_k, n_ℓ),   if k and ℓ (k ≠ ℓ) are linked   (8)
c_{ℓ,k} = 0,   if k and ℓ are not linked   (9)
c_{k,k} = 1 − Σ_{ℓ∈N_k \ {k}} c_{ℓ,k},   for k = ℓ   (10)
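A minimal pure-Python sketch of the Metropolis rule (8)–(10), assuming an undirected topology given as a neighbor list (the function name and input format are our illustrative choices):

```python
def metropolis_weights(adj):
    """Build the combiner matrix C from the Metropolis rule.
    adj[k] is the set of neighbors of node k, excluding k itself;
    C[l][k] holds c_{l,k}, the weight node k applies to node l's estimate."""
    N = len(adj)
    deg = [len(adj[k]) for k in range(N)]
    C = [[0.0] * N for _ in range(N)]
    for k in range(N):
        for l in adj[k]:                        # linked nodes, l != k, rule (8)
            C[l][k] = 1.0 / max(deg[k], deg[l])
        # diagonal entry, rule (10); unlinked pairs stay 0.0, rule (9)
        C[k][k] = 1.0 - sum(C[l][k] for l in range(N) if l != k)
    return C
```

Each column of C then sums to one, so the combine step forms a convex combination of the neighborhood estimates.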
An uncomplicated CTA scheme with O = I leads to the simple CTA diffusion algorithm,

φ_k^{i−1} = Σ_{ℓ∈N_k} c_{ℓ,k} w_ℓ^{i−1}   (11)

w_k^i = φ_k^{i−1} + μ_k u_{k,i} (d_k(i) − u_{k,i}^T φ_k^{i−1})   (12)

This paper follows the uncomplicated CTA diffusion scheme, but with a robust adaptive algorithm, since LMS-based distributed adaptive algorithms fail in an impulsive environment. The following section elaborates on the robust adaptive algorithms implemented in this paper.
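The combine step (11) and adapt step (12) can be sketched for the whole network as follows (pure Python with our own illustrative data structures; W[k] is node k's current estimate and C is a combiner matrix whose columns sum to one):

```python
def cta_lms_iteration(W, C, regressors, desired, mu):
    """One CTA diffusion-LMS iteration over the whole network, per (11)-(12).
    W[k] is node k's current M-vector estimate; C[l][k] is the weight that
    node k applies to node l's estimate in the combine step."""
    N, M = len(W), len(W[0])
    # combine: phi_k = sum_l c_{l,k} w_l
    Phi = [[sum(C[l][k] * W[l][m] for l in range(N)) for m in range(M)]
           for k in range(N)]
    # adapt: local LMS step on (d_k(i), u_{k,i})
    W_next = []
    for k in range(N):
        u, d = regressors[k], desired[k]
        e = d - sum(u[m] * Phi[k][m] for m in range(M))
        W_next.append([Phi[k][m] + mu * e * u[m] for m in range(M)])
    return W_next
```

Setting C = I recovers the non-cooperative LMS filters; any doubly substochastic combiner such as the Metropolis matrix gives the diffusion behavior discussed above.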
3. Robust distributed algorithms
Least-mean-square-based distributed adaptive algorithms, built upon the assumptions of Gaussian noise and linearity, cannot cope with an impulsive environment, which causes them to fail to converge. This has prompted researchers to work on robust adaptive algorithms. In this paper, we implement the Lorentzian cost function in the diffusion distributed framework and compare it with the Combine-then-Adapt diffusion maximum correntropy criterion (CTA-DMCC) [40], the Combine-then-Adapt diffusion affine projection sign algorithm (CTA-DAPSA) [41], and the Combine-then-Adapt Huber diffusion LMS (CTA-HuberDLMS). Recently, two papers were published on diffusion Huber adaptive filters, [43] and [44]. The former deals with a normalized Huber adaptive algorithm in the diffusion framework with the Adapt-then-Combine (ATC) strategy, while the latter introduces a variable step size and a variable threshold, also with the ATC strategy but without the normalization of the former. Hence, for a comparable framework with CTA-DLAF and CTA-DMCC, an uncomplicated CTA strategy for the diffusion Huber adaptive algorithm with constant step size is presented in this paper. The distributed framework of Section 2 with the data model in (1) is used for the following algorithms.
3.1. Huber diffusion LMS algorithm

The Huber cost function is a fusion of the l1 and l2 norms which switches dynamically according to the relation between e(i) and a threshold δ [43]:

J(i) = E[ (1/2) e²(i) ],   if |e(i)| ≤ δ
J(i) = E[ δ|e(i)| − (1/2) δ² ],   if |e(i)| > δ   (13)

In a distributed network, the local cost function at each node, which is a linear combination of the cost in (13) over the neighborhood of the local node k, is minimized to arrive at the optimal solution w^(o). The local cost function is given by

J_k^local(w) = Σ_{ℓ∈N_k} o_{ℓ,k} E[ (1/2) e²_{ℓ,k}(i) ],   if |e_{ℓ,k}(i)| ≤ δ
J_k^local(w) = Σ_{ℓ∈N_k} o_{ℓ,k} E[ δ|e_{ℓ,k}(i)| − (1/2) δ² ],   if |e_{ℓ,k}(i)| > δ   (14)

where e_{ℓ,k}(i) = d_ℓ(i) − u_{ℓ,i}^T w_k^{i−1} and e_k(i) = e_{k,k}(i). The coefficients o_{ℓ,k} satisfy the condition in (3). Choosing O = I_N, a simple uncomplicated CTA strategy can be derived by minimizing (14) with the gradient descent method, which is given below:

φ_k^{(i−1)} = Σ_{ℓ∈N_k} c_{ℓ,k} w_ℓ^{i−1}   (15)

w_k^{(i)} = φ_k^{(i−1)} + μ_k e_k(i) u_{k,i},   if |e_k(i)| ≤ δ
w_k^{(i)} = φ_k^{(i−1)} + μ_k δ csgn(e_k(i)) u_{k,i},   otherwise   (16)

where e_k(i) = d_k(i) − u_{k,i}^T φ_k^{(i−1)}. The mathematical operator csgn{·} for a complex number x is defined as

csgn{x} = sign(real(x)) + j · sign(imag(x))
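A pure-Python sketch of the Huber adaptation step (16) and the csgn operator (the function names are our illustrative choices; for real data the csgn operator reduces to the ordinary sign):

```python
def csgn(x):
    """Complex sign: sign(real(x)) + j*sign(imag(x))."""
    s = lambda v: (v > 0) - (v < 0)
    return complex(s(x.real), s(x.imag))

def huber_adapt(phi, u, d, mu, delta):
    """One Huber adaptation at a node, per (16), for real data:
    an LMS step while |e| <= delta, a clipped (sign) step beyond it."""
    e = d - sum(ui * pi for ui, pi in zip(u, phi))
    g = e if abs(e) <= delta else delta * ((e > 0) - (e < 0))
    return [pi + mu * g * ui for pi, ui in zip(phi, u)]
```

The clipping is what buys the robustness: an impulse of any magnitude can move the estimate by at most mu * delta per regressor entry.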
3.2. Diffusion maximum correntropy criterion algorithm

The DMCC algorithm proposed in [40] is a correntropy-based robust adaptive algorithm in the diffusion distributed framework. The Gaussian kernel, the most popular kernel used in correntropy, leads to the instantaneous MCC cost

G_σ^MCC(e_ℓ(i)) = (1 / (σ √(2π))) exp( −e_ℓ²(i) / (2σ²) )   (17)

which is used in [40] to derive the DMCC algorithm. In a distributed network with N nodes, the local cost function at each node k is given by

J_k^local = Σ_{ℓ∈N_k} o_{ℓ,k} G_σ^MCC(e_ℓ(i))   (18)

where the o_{ℓ,k} satisfy (3). A gradient-based CTA-DMCC algorithm is derived by taking the derivative of (18), which is expressed below:

φ_k^{(i−1)} = Σ_{ℓ∈N_k} c_{ℓ,k} w_ℓ^{i−1}

w_k^{(i)} = φ_k^{(i−1)} + (μ_k / σ²) Σ_{ℓ∈N_k} o_{ℓ,k} G_σ^MCC(e_ℓ(i)) e_ℓ(i) u_{ℓ,i}   (19)

where e_ℓ(i) = d_ℓ(i) − u_{ℓ,i}^T φ_k^{(i−1)}. Keeping O = I_N, the uncomplicated CTA version is arrived at and is given by

φ_k^{(i−1)} = Σ_{ℓ∈N_k} c_{ℓ,k} w_ℓ^{i−1}   (20)

w_k^{(i)} = φ_k^{(i−1)} + (μ_k / σ²) G_σ^MCC(e_k(i)) e_k(i) u_{k,i}   (21)
          = φ_k^{(i−1)} + μ̄_k G_σ^MCC(e_k(i)) e_k(i) u_{k,i}   (22)

where μ̄_k = μ_k / σ².

3.3. Diffusion affine projection sign algorithm

A distributed data-reuse algorithm with diffusion strategy that is robust against impulsive interference is given in [41]. With the projection order L given, the regressor matrix is defined as

U_{k,i} = [u_{k,i} u_{k,i−1} . . . u_{k,i−L+1}]^T   (23)

The CTA-DAPSA is stated below:

φ_k^{(i−1)} = Σ_{ℓ∈N_{k,i−1}} c_{ℓ,k} w_ℓ^{i−1}   (24)

w_k^{(i)} = φ_k^{(i−1)} + μ_k U_{k,i}^* csgn(e_{k,i}) / √( [U_{k,i}^* csgn(e_{k,i})]^* [U_{k,i}^* csgn(e_{k,i})] + δ )   (25)

where e_{k,i} = d_{k,i} − U_{k,i} φ_k^{i−1}.

4. Proposed diffusion LAF algorithm

In this section, a distributed solution with diffusion strategy is implemented for the sliding-window-based Lorentzian adaptive filter algorithm [34]. The sliding-window-based cost function of the Lorentzian adaptive filter [34] is given as

J_{w_k^{(i)}}(i) = Σ_{τ=0}^{L−1} ψ_γ( e_{w_k^{(i)}}(i − τ) ) ≡ ‖ d_{k,i} − U_{k,i} w_k^{(i)} ‖_{LL2,γ}   (26)

where the Lorentzian function is ψ_γ(x) = log( 1 + (x/γ)² ), the window error is

e_{w_k^{(i)}}(n) = d(n) − u_n^T w_k^{(i)},   for i − L + 1 ≤ n ≤ i   (27)

and U_{k,i} is the L × M regressor matrix, with

U_{k,i} = [u_{k,i} u_{k,i−1} . . . u_{k,i−L+1}]^T  and  d_{k,i} = [d_k(i) d_k(i − 1) . . . d_k(i − L + 1)]^T   (28)

Applying a gradient descent search on the linear combination of the cost function (26) at each node k within the neighborhood N_k, the weight update equation in [34] is extended to the diffusion distributed framework as follows. The local cost function is defined as

J_k^local(w_k) = Σ_{ℓ∈N_k} o_{ℓ,k} J_{w_k^{(i)}}(i)   (29)

Taking the derivative of (29) and using (26) we get

∇ J_k^local(w_k) = −(2/γ²) Σ_{ℓ∈N_k} o_{ℓ,k} U_{ℓ,i}^* W_{ℓ,i} (d_{ℓ,i} − U_{ℓ,i} w_k^{(i−1)})   (30)

where W_{k,i} = diag{ g_{k,i}(0), g_{k,i}(1), . . . , g_{k,i}(L − 1) } is an L × L diagonal matrix with

g_{k,i}(τ) = γ² / ( γ² + e²_{w_k^{(i)}}(i − τ) ),   τ = 0, 1, . . . , L − 1

and the coefficients o_{ℓ,k} follow the relation in (3). The gradient-based algorithm at node k to get the estimate of w^(o) at time i is given as follows:

w_k^{(i)} = w_k^{(i−1)} + μ_k Σ_{ℓ∈N_k} o_{ℓ,k} U_{ℓ,i}^* W_{ℓ,i} (d_{ℓ,i} − U_{ℓ,i} w_k^{(i−1)})   (31)

(the parameter 2/γ² is absorbed in the step size μ_k). The mathematical framework for diffusion LAF is expressed below, where we use a linear combiner to diffuse the data in the spatial domain, along with a sliding-window-based Lorentzian LMS adaptive rule at each node. In this paper, we follow the combine-then-adapt (CTA) rule with O = I_N, which yields the uncomplicated CTA-DLAF algorithm,

φ_k^{(i−1)} = Σ_{ℓ∈N_{k,i−1}} c_{ℓ,k} w_ℓ^{i−1}

w_k^{(i)} = φ_k^{(i−1)} + μ_k U_{k,i}^* W_{k,i} (d_{k,i} − U_{k,i} φ_k^{(i−1)})   (32)

The outline of the proposed CTA-DLAF is given in Algorithm 1:

Algorithm 1: Diffusion LAF algorithm.
1  Initialize: w_k^{(0)} = 0;
2  for i = 1, 2, . . . do
3    for k = 1, 2, . . . , N do
4      φ_k^{(i−1)} = Σ_{ℓ∈N_{k,i−1}} c_{ℓ,k} w_ℓ^{i−1};
5      e_{k,i} = d_{k,i} − U_{k,i} φ_k^{(i−1)};
6      g_{k,i}(j) = γ² / ( γ² + e²_{k,i}(j) ), for j = 0, 1, . . . , L − 1;
7      W_{k,i} = diag{ g_{k,i}(0), g_{k,i}(1), . . . , g_{k,i}(L − 1) };
8      w_k^{(i)} = φ_k^{(i−1)} + μ_k U_{k,i}^* W_{k,i} e_{k,i}
Table 1
Computational complexity of CTA-HuberDLMS, CTA-DAPSA, CTA-DMCC, and CTA-DLAF for each node per iteration; n_k denotes the cardinality (degree) of node k.

Robust algorithm                 | Multiplications      | Additions             | Exponent | Sgn() | Sqrt()
CTA-HuberDLMS, |e_k(i)| ≤ δ      | 2M + 1 + n_k M       | n_k M + M             | 0        | 0     | 0
CTA-HuberDLMS, |e_k(i)| > δ      | 2M + 2 + n_k M       | n_k M + M             | 0        | 1     | 0
CTA-DAPSA                        | 2LM + L + M + n_k M  | 2LM + n_k M           | 0        | 0     | 1
CTA-DMCC                         | 2M + 4 + n_k M       | n_k M + M             | 1        | 0     | 0
CTA-DLAF                         | 2LM + 4L + n_k M     | L + 2LM + n_k M − M   | 0        | 0     | 0
4.1. Complexity analysis

In Table 1, we provide a complexity comparison of our proposed algorithm with the other robust algorithms, considering real data. The slight increase in the computational complexity of the proposed algorithm is attributed to the projection order L. Nevertheless, the absence of exponential computation gives our proposed algorithm an upper hand over the diffusion MCC algorithm.
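The per-node multiplication counts of Table 1 can be compared directly; the helpers below encode the CTA-DLAF and CTA-DMCC rows (the function names are ours):

```python
def mults_dlaf(M, L, nk):
    """Multiplications per node per iteration for CTA-DLAF (Table 1)."""
    return 2 * L * M + 4 * L + nk * M

def mults_dmcc(M, nk):
    """Multiplications per node per iteration for CTA-DMCC (Table 1)."""
    return 2 * M + 4 + nk * M
```

For an example setting of M = 8 taps, window length L = 4, and n_k = 3 neighbors, CTA-DLAF spends 104 multiplications against 44 for CTA-DMCC, but avoids the per-iteration exponential evaluation.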
5. Performance analysis of the diffusion LAF algorithm

The diffusion of adaptive filters in the spatial domain makes the performance analysis complex compared with a single adaptive filter. Nevertheless, resorting to some assumptions on the random data and the background noise, we proceed with the stability analysis by following the procedure adopted in [8]. The background noise is modeled as a Gaussian mixture in this analysis, different from the Gaussian noise scenario in [8]. The global matrices collecting the data and filter updates across the local nodes are given below:

w_i ≜ col{ w_1^{(i)}, w_2^{(i)}, . . . , w_N^{(i)} }   (NM × 1)
φ^{i−1} ≜ col{ φ_1^{(i−1)}, φ_2^{(i−1)}, . . . , φ_N^{(i−1)} }   (NM × 1)
U_i ≜ diag{ U_{1,i}, U_{2,i}, . . . , U_{N,i} }   (NL × NM)
d_i ≜ col{ d_{1,i}, d_{2,i}, . . . , d_{N,i} }   (NL × 1)
W_i ≜ diag{ W_{1,i}, W_{2,i}, . . . , W_{N,i} }   (NL × NL)

where U_i and W_i are block diagonal. The local step sizes are collected into an NM × NM block diagonal matrix

D = diag{ μ_1 I_M, μ_2 I_M, . . . , μ_N I_M }   (33)

The desired data available at the local nodes, as expressed in (1), is used to write the global vector of measurements d_i:

d_i = U_i w̄^o + v_i   (34)

where w̄^o = Q w^(o) and v_i = col{ v_{1,i}, v_{2,i}, . . . , v_{N,i} }, with Q an NM × M matrix defined as Q = col{ I_M, I_M, · · · , I_M }. Using the above relations, the global representation of the adaptive algorithm in (32) is given as

w_i = φ^{i−1} + D U_i^* W_i (d_i − U_i φ^{i−1})   (35)

Replacing φ^{i−1} by G w_{i−1} we get

w_i = G w_{i−1} + D U_i^* W_i (d_i − U_i G w_{i−1})   (36)

where G = C ⊗ I_M is the NM × NM transition matrix, C is the N × N diffusion combination matrix with entries [c_{k,ℓ}], and ⊗ denotes the Kronecker product.

5.1. Mean stability analysis

The mean behavior of the CTA-DLAF algorithm is derived here. The global weight error vector is defined as

w̃_i = w̄^o − w_i   (37)

Also note that

G w̄^o = w̄^o   (38)

Subtract the left-hand side of the global update equation in (36) from w̄^o and the right-hand side from G w̄^o [8] to get

w̃_i = (I_NM − D U_i^* W_i U_i) G w̃_{i−1} − D U_i^* W_i v_i   (39)

We need to evaluate the NM × NM data moment E(U_i^* W_i U_i) prior to establishing the condition for mean stability. Assume that U_{k,i}^* U_{k,i} and W_{k,i} are statistically independent [34]. The (k, ℓ) block of E(U_i^* W_i U_i) is then given by

A_{kℓ} = [E(U_i^* W_i U_i)]_{kℓ} = R_{u,k} Tr(E W_{k,i})  for k = ℓ,  and  0  for k ≠ ℓ   (40)

where R_{u,k} = E[u_{k,i} u_{k,i}^*]. Taking the expectation on both sides of (39), and assuming spatial and temporal independence of the regressor data u_{k,i} from the background noise, we get

E w̃_i = [ I_NM − D E(U_i^* W_i U_i) ] G E w̃_{i−1}   (41)

The above evaluation breaks down E(U_i^* W_i U_i) to

E(U_i^* W_i U_i) = R_U S   (42)

where

R_U = diag{ R_{u,1}, R_{u,2}, . . . , R_{u,N} }
S = diag{ Tr(E W_{1,i}) I_M, . . . , Tr(E W_{N,i}) I_M }

For stability in the mean we must have

|λ_max(P G)| < 1   (43)

where λ_max(·) denotes the maximum eigenvalue of a matrix and P = I_NM − D R_U S. In [8] it is shown that cooperation improves stability, i.e. |λ_max(P G)| ≤ |λ_max(P)|. Hence, to show that CTA-DLAF is mean stable, it only remains to show that

|λ_max(P)| < 1

i.e. P = I_NM − D R_U S must be a stable matrix. As time i → ∞, S → L I_NM. Keeping this in mind, the condition for stability is

0 < μ_k < 2 / ( L λ_max(R_{u,k}) )
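A numerical illustration of this mean-stability condition (assuming NumPy is available; the 3-node line network and regressor powers are made-up example values). It builds P = I − D R_U S for scalar channels (M = 1) with the steady-state limit S = L·I, forms G = C ⊗ I_M from the Metropolis combiner of the line network, and checks both the step-size bound and the cooperation inequality |λ_max(PG)| ≤ |λ_max(P)|:

```python
import numpy as np

# Illustrative 3-node line network (0-1, 1-2) with M = 1 tap, window L = 4.
N, M, L = 3, 1, 4
C = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])            # Metropolis combiner: columns sum to 1
Ru = np.array([1.0, 2.0, 0.5])             # R_{u,k}; scalar, so lambda_max = Ru
bound = 2.0 / (L * Ru)                     # step-size bound 2/(L*lambda_max(R_u,k))
mu = np.array([0.3, 0.5, 0.7]) * bound     # step sizes strictly inside the bound

D = np.diag(mu)                            # D = diag{mu_k I_M}, M = 1
S = L * np.eye(N * M)                      # steady-state limit S -> L*I
P = np.eye(N * M) - D @ np.diag(Ru) @ S
G = np.kron(C, np.eye(M))                  # G = C (x) I_M

rho = lambda A: max(abs(np.linalg.eigvals(A)))
assert rho(P) < 1.0                        # P stable under the step-size bound
assert rho(P @ G) <= rho(P) + 1e-12        # cooperation does not hurt stability
```

With these example values the assertions pass, and the spectral radius of PG is in fact strictly smaller than that of P, consistent with the claim that cooperation improves stability.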
JID:YDSPR
AID:102589 /FLA
[m5G; v1.261; Prn:3/10/2019; 8:27] P.6 (1-13)
A.M. Wilson et al. / Digital Signal Processing ••• (••••) ••••••
6
1
we consider a weaker assumption that U∗i Wi Ui is independent of
5.2. Mean square transient analysis
i −1
2 3 4 5 6
The study of mean square transient analysis is done in this section. The analysis is based on weighted energy conservation relation and variance relation [8,42]. The error vector of length L at each node k is defined as,
˜ w . This assumption helps us to replace the random weighting matrix by its mean = E a deterministic quantity. i 2
˜ = E w˜ E w
7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
ek,i = dk,i − Uk,i φ ki −1
˜ = Ui G w
25 26 27 28 29 30 31 32 33 34 35 36
(45)
˜ i −1 eaG,i = Ui G w
(46)
The global a priori and a posteriori weighted estimation errors are defined as,
˜ i −1 eaD,i G Ui DGw ˜i eDp ,i Ui Dw
46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
(49)
+ (eaG,i )∗ Wi Ui DDU∗i Wi v i − v ∗i Wi eaD,i G +
41
45
− DUi Wi ei
− (eaG,i )∗ Wi eaD,i G + (eaG,i )∗ Wi Ui DDU∗i Wi eaG,i
40
44
∗
w˜ i 2 = w˜ i −1 2G∗ G − (eaD,i G )∗ Wi eaG,i − (eaD,i G )∗ Wi v i
39
43
(48)
Substitute (45) into (49) and perform weighted energy balance on the resulting equation, which leads to the equation (50):
38
42
(47)
for some arbitrary N M × N M matrix ≥ 0 [8]. The weight error vector curve is given below, which is obtained by following the same path as that is used in deriving (39):
37
∗
∗
v i Wi Ui DDUi Wi eaG,i
i
+2
∗
(50)
∗
+ v i Wi Ui DDUi Wi v i
i −1 2
+ E [ v ∗i C∗i DDCi v i ] −1
n E v i Ci D G[I N M − DPi −l ] DCi −n v i −n
L −1
n =1
70 72 73 74 75 76 77 78
By looking into (52) we could see that we need to evaluate certain data moments namely,
79
∗ ∗
l =0
85
(53)
E [U∗i Wi Ui ], E [ v ∗i Wi Ui DDU∗i Wi v i ] and
+ G∗ U∗i Wi Ui DDU∗i Wi Ui G
E [Ui Wi Ui DDUi Wi Ui ]
88
prior to establishing the learning behavior of the CTA-DLAF algorithm. The fourth order data moment in (53) is difficult to be evaluated in closed form for arbitrary distributions and hence Gaussian distributed data is considered. The evaluation of data moments and subsequent analysis is simplified if we use remodeled variables by appealing to the eigendecomposition [42] of E [U∗i Wi Ui ] = RU S. The eigen decomposition is given by
90
∗
E [Ui Wi Ui ] = RU S = V V
89
∗
(54)
where = diag{ 1 , 2 , . . . , N }. Eigen decomposition of Ru ,k is given by
where C_i = U*_i W_i and P_i = U*_i W_i U_i. The extra noise term in (51) accounts for the dependency of the weight error vector on the past noise, and its derivation follows [45]. Since W_i is itself a random quantity, being dependent on the input data, the transient analysis becomes complex. Hence, for the sake of mathematical tractability of (51), some simplifying assumptions are taken into consideration. The independence of the regressor vector from w̃_{i−1} is a common assumption [42] in adaptive filter analysis; but since U_i consists of successive regressors, this is a stronger condition than the usual independence assumption. A further assumption, beyond those in [8], is taken in this convergence analysis regarding the dependency of the weight error vector on the past noise [45]. Recall that the NL × 1 global error vector e_i = col{e_{1,i}, e_{2,i}, ..., e_{N,i}} is written as e_i = d_i − U_i G w_{i−1} = e_{aG,i} + v_i (44). Substituting (46) and (47) into (50) and taking expectations on both sides, the variance relation is obtained as

E‖w̃_i‖²_Σ = E‖w̃_{i−1}‖²_{Σ′} + E[v*_i C_i D Σ D C_i v_i]
        + 2 Σ_{n=1}^{L−1} E[ v*_i C*_i D Σ G ∏_{l=0}^{n−1} (I_{NM} − D P_{i−l}) D C_{i−n} v_{i−n} ]      (52)

where

Σ′ = G* Σ G − G* Σ D U*_i W_i U_i G − G* U*_i W_i U_i D Σ G + G* U*_i W_i U_i D Σ D U*_i W_i U_i G      (53)

Under the independence assumptions above, this becomes

Σ′ = G* Σ G − G* Σ D E[U*_i W_i U_i] G − G* E[U*_i W_i U_i] D Σ G + G* E[U*_i W_i U_i D Σ D U*_i W_i U_i] G      (54)

The input regressor autocorrelation matrix, remodeled in (54) by the Lorentzian weight matrix, calls for a slight change in the eigendecomposition used for the diffusion LMS in [8]:

R_{u,k} = V_k Λ_k V*_k      (55)

From (54) and (55) we vectorize Λ̄_k into

vec{Λ̄_{k,i}} = Tr(E W_{k,i}) λ_k      (56)

where λ_k = vec{Λ_k}. The new transformed variables are defined below:

Ū_i = U_i V,  w̄_i = V* w̃_i,  Σ̄ = V* Σ V,  Ḡ = V* G V,  Λ̄ = V* Λ V,  D̄ = V* D V = D,  W̄_i = V* W_i V = W_i

The variance relation in (52) is rewritten in terms of the transformed variables:

E‖w̄_i‖²_Σ̄ = E‖w̄_{i−1}‖²_{Σ̄′} + E[v*_i C̄_i D Σ̄ D C̄_i v_i]
        + 2 Σ_{n=1}^{L−1} E[ v*_i C̄*_i D Σ̄ Ḡ ∏_{l=0}^{n−1} (I_{NM} − D P̄_{i−l}) D C̄_{i−n} v_{i−n} ]

with

Σ̄′ = Ḡ* Σ̄ Ḡ − Ḡ* Σ̄ D E[Ū*_i W_i Ū_i] Ḡ − Ḡ* E[Ū*_i W_i Ū_i] D Σ̄ Ḡ + Ḡ* E[Ū*_i W_i Ū_i D Σ̄ D Ū*_i W_i Ū_i] Ḡ      (57)

To evaluate the data moments, we closely follow the procedure in [8]. A block matrix to vector conversion operator bvec{·} is used [8]. For ease of reference, note that bvec{Σ̄} = σ̄.
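As a quick numerical sanity check on this change of variables (an illustration of ours, not part of the original analysis), the sketch below verifies that the eigenvector transform preserves weighted norms, i.e. ‖w̃‖²_Σ = ‖w̄‖²_Σ̄ when w̄ = V*w̃ and Σ̄ = V*ΣV, which is what allows the variance relation to be rewritten in the transformed variables. A real-valued case is used for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4

# Eigendecomposition of a (real, for simplicity) autocorrelation matrix R_u = V Lam V^T
A = rng.standard_normal((M, M))
R_u = A @ A.T + M * np.eye(M)          # symmetric positive definite
lam, V = np.linalg.eigh(R_u)           # V is orthogonal (unitary in the real case)

w_tilde = rng.standard_normal(M)       # stand-in for the weight error vector
S = rng.standard_normal((M, M))
Sigma = S @ S.T                        # arbitrary symmetric weighting matrix

# Transformed variables, mirroring the derivation
w_bar = V.T @ w_tilde                  # w_bar = V* w_tilde
Sigma_bar = V.T @ Sigma @ V            # Sigma_bar = V* Sigma V

# Weighted norms coincide because V V^T = I
assert np.isclose(w_tilde @ Sigma @ w_tilde, w_bar @ Sigma_bar @ w_bar)
```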
1 2 3 4 5 6 7
operator at hand, the next step is to evaluate the transformed data moments in the remodeled variance relation (57), namely,
¯∗
¯∗
¯ i ], E [ v ∗ Wi U¯ i D ¯ DUi Wi v i ], E [Ui Wi U i −1
n ∗¯ ¯ [I N M − DP¯ i −l ] D¯ C¯ i −n v i −n ¯ E v ∗i C¯ i D G
l =0
8 9 10 11 12
¯∗
¯∗
¯ i D ¯ DUi Wi U¯ i ] E [Ui Wi U
and
¯ i = Ui V and from (54), the first data moment By substituting U in (58) gets evaluated to
15
The second term in (58) is expressed as
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
34
∗
37 38 39 40 41 42 43 44
∗
(60)
¯∗
¯ DUi are independent, (60) is writAssuming Wi v i v ∗i Wi and Ui D ten as
¯∗
¯∗
¯ i D ¯ DUi Wi v i ] = T r { E [Wi v i v ∗ Wi ] E [U¯ i D ¯ DUi ]} E [ v ∗i Wi U i
(61)
For noise with less power, we can assume Wi and v i are independent which fails if noise contains sufficient power. Hence the term EWi v i v ∗i Wi cannot be further approximated due to mathematical complexity and kept the same for the impulsive noise scenario. Let
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
−1
n ¯ [I N M − DP¯ i −l ] DC¯ i −n v i −n = xnT σ¯ ¯ E v i C i D G
72
∗ ¯∗
where
μk2 T r (k ¯ kk )IL for k =
for k =
0,
¯∗
bk =
vec{I L }μ 0,
σ¯ kk for k = for k =
¯∗
(65)
where 0 is L 2 × M 2 null matrix and
σ¯ = col{σ¯ 1 , σ¯ 2 , . . . , σ¯ N } B = diag {0, 0, . . . , vec{I L }μ2 λT , . . . , 0} Thus we get,
(66)
81 82 83 84 85 86
(70)
(71)
87
92
¯ k u¯ ,i u¯ ∗ ] ¯ k,i u¯ k∗,i Fk = E [ T r {Wk,i W,i }u ,i L −1 L −1
93 94
¯ k u¯ ,i −τ2 u¯ ∗ ¯ k,i −τ1 u¯ k∗,i −τ1 E [u ,i −τ2
95
(72)
98 99
∗
τ 2 =0 τ 1 =0 τ1 =τ2
¯ k u¯ ,i −τ2 u¯ ∗ ¯ k,i −τ1 u¯ k∗,i −τ1 E [u ,i −τ2 ]
101 102 104 105
(73)
106 107 108 109
Applying fourth moment of Gaussian variable Lemma in [42], the M × M block Fk in (73) gets evaluated to (74),
⎧ ¯ ) + ξ k ¯ ) k,k,i (k T r (k ⎪ ⎪ kk L −1 kk k ⎪ L − 1 ⎪ ¯ kk k ⎨ + k τ 2 =0 τ1 =0 E [Wk,i (τ1 , τ1 ) × Wk,i (τ2 , τ2 )] ¯ k k,,i k
100
103
× E [Wk,i (τ1 , τ1 )W,i (τ2 , τ2 )]
⎪ ⎪ ⎪ ⎪ ⎩
96 97
× Wk,i (τ1 , τ1 )W,i (τ2 , τ2 )]
L −1 L −1
90 91
τ1 =τ2
110 111 112 113 114 115
for k = for k = (74)
b = col{0σ¯ 1 , 0σ¯ 2 , . . . , vec{I L }μ2 λT σ¯ , . . . , 0σ¯ N }
where Bd = diag {B1 , B2 , . . . , B N }. Thus we get (61) as
79
The M × M k block is given by
Fk =
78
89
(63)
b = col{b1, , b2, , . . . , bk, , . . . , b N , }
bvec{B} = col{B1 σ¯ 1 , B2 σ¯ 2 , . . . , B N σ¯ N } = Bd σ¯
¯∗
¯ ∗i Wi U¯ i ¯ U¯ ∗i Wi U¯ i ]. F = E [U
(64)
74
88
+
Finally bvec{B} = col{b1 , b2 , . . . , b , . . . , b N } where
B σ¯
(69)
¯ k u¯ ,i u¯ ∗ ] ¯ k,i u¯ k∗,i Fk = E [ T r {Wk,i W,i }] E [u ,i
bk = vec {Bk }
73
77
Taking the statistical independence of Ui Ui and Wi into account (72) is rewritten as follows:
In order to vectorize B we need to apply the vec{} operator on Bk and let us denote it by
71
76
Let
(62)
70
80
¯ i D ¯ DUi Wi U¯ i ] = DE [Ui Wi U¯ i ¯ Ui Wi U¯ i ]D E [Ui Wi U
where B is the th block column:
2 T λ k k
T ¯ [I N M − DP¯ i −l ] DC¯ i −n v i −n v ∗ C¯ ∗i D G i
τ 2 =0 τ 1 =0 τ1 =τ2
B = [B1 B2 . . . B . . . B N ]
B = col{B1, , B2, , . . . , Bk, , . . . , B N , }
−1
n
¯∗
+
Now express B as
69
75
No further simplifications are possible to the expectation term due to the dependency of Wi and v i . The fourth order moment in (58) is solved by appealing to the ¯ ∗i Wi U¯ i and D are Gaussian factorization theorem [8,42]. Now both U block diagonal and hence we can write
Bk =
(68)
l =0
The L × L k-block of B is given by
47 48
68
∗
45 46
where x = bvec{ E [Wi v i v ∗i Wi ] T } and σ¯ = col{σ¯ 1 , σ¯ 2 , . . . , σ¯ N }. By invoking the properties of trace operator just as done previously, the third term in (58) gets evaluated to the following.
¯ i D ¯ DU¯ i B EU
35 36
67
(59)
¯ i D ¯ DU¯ i ]} ¯ DU¯ i Wi v i ] = T r { E [Wi v i v ∗ Wi U¯ i D E [ v ∗i Wi U i
32 33
(67)
xn = bvec E
14 16
∗
¯ i D ¯ DU¯ i Wi v i ] = xT Bd σ¯ E [ v ∗i Wi U
l =0
¯ ∗i Wi U¯ i ] = E [U
13
(58)
7
where k,,i = T r { EWk,i W,i } and ξ = 1 for complex data and ξ = 2 for real data. This differs from that in [8] by the presence of T r { EWk,i W,i } and the summation term which takes care of previous L error vector. For small step size the summation term could be neglected. For a compact representation of recursive equation in (57), the ¯ is converted to a vector by applying bvec{} opblock matrix erator on the same. bvec{} is the vectorization operator for block matrices and is explained in detail in [8]. For any three block matrices P , Q and the following relation holds:
116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131
bvec{PQ} = (Q T P)σ
(75)
132
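The Gaussian fourth-moment factorization invoked above can be checked numerically. The sketch below is an illustration of ours, not part of the paper's derivation: for a real Gaussian regressor u with diagonal covariance Λ it verifies the standard identity E[u u^T Σ u u^T] = Λ Tr(ΛΣ) + 2 ΛΣΛ, matching the ξ = 2 factor for real data.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 3
lam = np.array([1.0, 2.0, 0.5])        # eigenvalues (Lambda) of the regressor covariance
S = rng.standard_normal((M, M))
Sigma = S @ S.T                        # arbitrary symmetric weighting matrix

# Draw u ~ N(0, Lambda) and estimate E[u u^T Sigma u u^T] = E[(u^T Sigma u) u u^T]
n = 500_000
u = rng.standard_normal((n, M)) * np.sqrt(lam)
q = np.einsum('ni,ij,nj->n', u, Sigma, u)      # scalar u^T Sigma u per sample
est = np.einsum('n,ni,nj->ij', q, u, u) / n

Lam = np.diag(lam)
theory = Lam * np.trace(Lam @ Sigma) + 2 * Lam @ Sigma @ Lam   # xi = 2 (real data)

# Monte-Carlo estimate matches the closed form to within sampling error
assert np.max(np.abs(est - theory)) / np.max(np.abs(theory)) < 0.05
```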
Now we proceed to express bvec{Σ̄′}, keeping all of the above relations in mind. Σ̄′ is rewritten here:

Σ̄′ = Ḡ* Σ̄ Ḡ − Ḡ* Σ̄ Λ̄ D Ḡ − Ḡ* Λ̄ D Σ̄ Ḡ + Ḡ* E[Ū*_i W_i Ū_i D Σ̄ D Ū*_i W_i Ū_i] Ḡ      (76)

The first term is vectorized as

bvec{Ḡ* Σ̄ Ḡ} = (Ḡ^T ⊗_b Ḡ*) σ̄      (77)

The second and third terms in (76) are vectorized respectively as

bvec{Ḡ* Σ̄ Λ̄ D Ḡ} = (Ḡ^T ⊗_b Ḡ*)(Λ̄ D ⊗_b I_{NM}) σ̄      (78)

and

bvec{Ḡ* Λ̄ D Σ̄ Ḡ} = (Ḡ^T ⊗_b Ḡ*)(I_{NM} ⊗_b Λ̄ D) σ̄      (79)

Finally we apply the bvec{·} operator on the fourth term:

bvec{Ḡ* E[Ū*_i W_i Ū_i D Σ̄ D Ū*_i W_i Ū_i] Ḡ} = (Ḡ^T ⊗_b Ḡ*)(D ⊗_b D) bvec{E[Ū*_i W_i Ū_i Σ̄ Ū*_i W_i Ū_i]}      (80)

By following the same procedure used in [8] for the bvec{·} operator, bvec{E[Ū*_i W_i Ū_i Σ̄ Ū*_i W_i Ū_i]} evaluates to

bvec{E[Ū*_i W_i Ū_i Σ̄ Ū*_i W_i Ū_i]} = F σ̄      (81)

with F coming from (74), F = diag{F_1, F_2, ..., F_N} and σ̄ = col{σ̄_1, σ̄_2, ..., σ̄_N}, where

F_ℓ = diag{ Γ_{1,ℓ}(Λ_1 ⊗ Λ_ℓ), Γ_{2,ℓ}(Λ_2 ⊗ Λ_ℓ), ...,
        (λ_ℓ λ_ℓ^T + ξ Λ_ℓ ⊗ Λ_ℓ) Γ_{ℓ,ℓ} + (Λ_ℓ ⊗ Λ_ℓ) Σ_{τ₂=0}^{L−1} Σ_{τ₁=0, τ₁≠τ₂}^{L−1} E[W_{ℓ,i}(τ₁,τ₁) W_{ℓ,i}(τ₂,τ₂)],
        ..., Γ_{N,ℓ}(Λ_N ⊗ Λ_ℓ) }      (82)

The time index i is dropped for compactness; σ̄_k is obtained by applying the vec{·} operator on each block Σ̄_k of the block matrix Σ̄. Hence

bvec{Ḡ* E[Ū*_i W_i Ū_i D Σ̄ D Ū*_i W_i Ū_i] Ḡ} = (Ḡ^T ⊗_b Ḡ*)(D ⊗_b D) F σ̄      (83)

Summarizing,

bvec{Σ̄′} = σ̄′ = F̄ σ̄      (84)

where

F̄ = (Ḡ^T ⊗_b Ḡ*)[ I_{N²M²} − (I_{NM} ⊗_b Λ̄ D) − (Λ̄ D ⊗_b I_{NM}) + (D ⊗_b D) F ]      (85)

By taking the inverse operation bvec⁻¹{·}, (57) is expressed as

E‖w̄_i‖²_{bvec⁻¹{σ̄}} = E‖w̄_{i−1}‖²_{bvec⁻¹{F̄ σ̄}} + ( x^T B_d + 2 Σ_{n=1}^{L−1} x_n^T ) σ̄      (86)

Dropping the inverse notation for compactness, the recursive relation for the Diffusion LAF algorithm is given as

E‖w̄_i‖²_σ̄ = E‖w̄_{i−1}‖²_{F̄ σ̄} + ( x^T B_d + 2 Σ_{n=1}^{L−1} x_n^T ) σ̄      (87)

The global learning curves are derived by iterating (87) as in [8]:

E‖w̄_i‖²_σ̄ = E‖w̄_0‖²_{F̄^{i+1} σ̄} + ( x^T B_d + 2 Σ_{n=1}^{L−1} x_n^T ) Σ_{j=0}^{i} F̄^{j} σ̄      (88)

The global MSD and global EMSE error curves are taken as in [8]. The global mean square deviation is defined as η(i) = (1/N) E‖w̃_i‖². Choosing σ̄ = (1/N) bvec{I_{NM}} ≜ κ_η in (88) yields the global MSD curve

η(i) = η(i−1) + ( x^T B_d + 2 Σ_{n=1}^{L−1} x_n^T ) F̄^{i} κ_η − ‖w̄_0‖²_{F̄^{i}(I − F̄) κ_η}      (89)

In a similar way, choosing σ̄ = (1/N) bvec{Λ} ≜ λ_ζ yields the global learning curve for EMSE:

ζ(i) = ζ(i−1) + ( x^T B_d + 2 Σ_{n=1}^{L−1} x_n^T ) F̄^{i} λ_ζ − ‖w̄_0‖²_{F̄^{i}(I − F̄) λ_ζ}      (90)

As the learning process reaches steady state, the global steady state MSD and EMSE are defined as [8]

η = (1/N) E‖w̄_{i−1}‖²_{I}   (MSD)      (91)

ζ = (1/N) E‖w̄_{i−1}‖²_{Λ}   (EMSE)      (92)

which are further solved to

η = ( x^T B_d + 2 Σ_{n=1}^{L−1} x_n^T ) (I − F̄)^{−1} (1/N) bvec{I_{NM}}   (MSD)      (93)

ζ = ( x^T B_d + 2 Σ_{n=1}^{L−1} x_n^T ) (I − F̄)^{−1} (1/N) bvec{Λ}   (EMSE)      (94)

The verification of the theoretical expressions derived here is done in the next section.

6. Simulation results

The performance comparison of the proposed CTA-DLAF with CTA-DMCC, CTA-DAPSA, and CTA-HuberDLMS is carried out for different percentages of impulsive noise in this section. The theoretical results for the proposed CTA-DLAF are also compared with simulation results. Fig. 2 [46] shows the topology of the WSN adopted in this paper. In this work, we assume the wireless channel to be Rayleigh fading and time invariant over the block of pilot symbols transmitted. Also, since the real and imaginary parts of the channel taps are independent, we have assumed real channel coefficients for the simulation study.
Fig. 2. Network topology.

The 3-path Rayleigh channel coefficients w⁰ are generated as w⁰ = randn(1, 3)/norm(w⁰). The pilot symbols u_i at each node are taken to be the same and Gaussian distributed, and the noise source is a mixture of Gaussian and impulsive noise. The impulsive noise is modeled by a Bernoulli-Gaussian distribution [29]:

v(i) = v_G(i) + b(i) v_I(i)      (95)

where v_G(i) is zero-mean Gaussian noise with variance σ_G², v_I(i) is zero-mean Gaussian noise with large variance σ_I², and b(i) follows a Bernoulli distribution, i.e. P(b(i) = 1) = p_i and P(b(i) = 0) = 1 − p_i. The ratio of the impulsive to the Gaussian noise variance is defined as

η = σ_I² / σ_G²      (96)

In the experimental results to follow, the same noise power is considered at each node with SNR 30 dB, and η is chosen as 10⁶. The threshold value for CTA-HuberDLMS is chosen as [29] δ_opt = 1/α_opt with α_opt = (p_i / (1 − p_i))(1/σ_G), and the kernel width σ of CTA-DMCC is chosen as 1 in all experiments.

The analysis of the robustness performance in this work is based on three parameters: mean square deviation (MSD), excess mean square error (EMSE) and absolute percentage of deviation. The global MSD and EMSE, which are the averages of the local performance measures over the nodes, are used. The ensemble averages of the global MSD and EMSE in dB over N independent runs of the experiment are given as

MSD(i) = 10 log( (1/N) Σ_{l=1}^{N} (1/N) Σ_{k=1}^{N} ‖w⁰ − w^{i−1}_{k,l}‖² )

EMSE(i) = 10 log( (1/N) Σ_{l=1}^{N} (1/N) Σ_{k=1}^{N} |u_{k,l}(i)(w^{(o)} − w^{i−1}_{k,l})|² )

where w^{i−1}_{k,l} is the estimate at node k in the lth run of the experiment. The absolute percentage of deviation is given as

APD = (1/(NM)) Σ_{l=1}^{NM} ( |w^{(o)}(l) − w_∞(l)| / |w^{(o)}(l)| ) × 100%      (97)

where w_∞ is the global estimate of the unknown parameter.

6.1. Comparison of analytical and simulation results

To confirm the theoretical analysis of the proposed CTA-DLAF, the mathematical learning curve is compared with the plot of simulation results averaged over 50 trials in a 10% impulsive noise environment. The step size is taken as 0.01 and the tuning parameter γ for CTA-DLAF is chosen as 1. The result is plotted for window lengths L = 1 and 2 and shows how the parameter L controls the convergence and the steady state error. The expectation terms E[W_i v_i v*_i W_i], E[W_{k,i} W_{ℓ,i}] and E[W_{k,i}] are evaluated by ensemble averaging over the time index i. Fig. 3 shows the comparison of the analytical and simulated learning curves; a mismatch is observed in the transient part due to the ensemble averaging of E[W_{k,i} W_{ℓ,i}] and E[W_{k,i}]. The time dependency of E[W_{k,i} W_{ℓ,i}] and E[W_{k,i}], which arises from the error term they contain, is lost in the ensemble averaging. This, unlike in simulation, leads to constant matrices and causes the mismatch in the transient curve.

6.2. Effect of L and μ on steady state performance of CTA-DLAF

The theoretical expression for the steady state MSD given in (93) is verified in Fig. 4 by varying the step size μ and the window length L, respectively. As i → ∞, the error tends to zero and hence E[W_{k,i}] is approximated by the identity matrix [34]. The behavior of CTA-DLAF with different step sizes is plotted in Fig. 4(a) with window length L = 2 and γ = 1, in a 10% impulsive noise environment. Fig. 4(b) shows the steady state behavior of CTA-DLAF with varying window length L for step size μ = 0.01 and γ = 1 in a 10% impulsive noise environment. While increasing L leads to faster convergence at the expense of steady state error, choosing L > 2 leads to a larger memory requirement; since memory is a constraint for the nodes in a WSN, L = 2 is an optimum choice.
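The Bernoulli-Gaussian mixture of (95)–(96) used throughout these experiments is straightforward to generate. The sketch below is an illustrative implementation of ours; the unit-power pilot normalization behind the SNR-to-variance conversion is our assumption.

```python
import numpy as np

def impulsive_noise(n, snr_db=30.0, eta=1e6, p=0.1, rng=None):
    """Bernoulli-Gaussian noise v(i) = vG(i) + b(i) vI(i), as in (95)."""
    rng = rng or np.random.default_rng()
    sigma_g2 = 10.0 ** (-snr_db / 10.0)   # Gaussian variance for unit-power pilots (assumed)
    sigma_i2 = eta * sigma_g2             # eta = sigma_I^2 / sigma_G^2, as in (96)
    b = rng.random(n) < p                 # Bernoulli impulse indicator, P(b = 1) = p
    return (np.sqrt(sigma_g2) * rng.standard_normal(n)
            + b * np.sqrt(sigma_i2) * rng.standard_normal(n))

v = impulsive_noise(200_000, rng=np.random.default_rng(2))
# Total variance is sigma_G^2 (1 + p * eta); for eta = 1e6 the impulses
# carry essentially all the noise power, which is what breaks plain LMS.
```

With p = 0.1 and η = 10⁶, roughly one sample in ten is an impulse about a thousand times stronger than the background noise, which is the regime in which the Lorentzian weighting pays off.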
Fig. 3. Comparison of global theoretical and simulated curve for CTA-DLAF: (a) MSD, (b) EMSE.
Fig. 4. Comparison of global steady state MSD for CTA-DLAF: (a) varying μ, (b) varying L.
Fig. 5. Steady state comparison of CTA-DLAF and CTA-DMCC with σ = 1 and varying γ in 10% impulsive noise: (a) steady state MSD (dB) of CTA-DLAF and CTA-DMCC, (b) steady state EMSE (dB) of CTA-DLAF and CTA-DMCC.
Hence, in the following sections, the window length L for CTA-DLAF is chosen as 2.

Table 2
Comparison of steady state MSD and EMSE (dB).

Adaptive algorithm    10% MSD     10% EMSE    20% MSD     20% EMSE    30% MSD     30% EMSE
CTA-DLAF              −48.2225    −48.0786    −48.2919    −48.2642    −49.0365    −49.0098
CTA-DMCC              −39.5700    −39.7806    −37.7340    −37.7235    −36.0198    −36.1820
CTA-HuberDLMS         −27.8501    −27.1291    −28.7468    −28.2109    −29.7635    −29.6070
CTA-DAPSA             −29.0925    −29.6178    −28.5562    −29.1192    −28.1170    −28.6086
CTA-DLMS                8.8471      8.6681     11.6021     11.6546     13.4244     13.2722

6.3. Steady state performance comparison with varying γ

In this experiment the tuning parameter γ of CTA-DLAF is varied and the steady state performance is compared with that of CTA-DMCC in a 10% impulsive noise environment. The step sizes of CTA-DLAF and CTA-DMCC are adjusted to provide the same convergence rate for each value of γ. Fig. 5 shows the comparable performance of CTA-DMCC and CTA-DLAF for γ > 0.4, with CTA-DLAF having a slight upper hand in the 10% impulsive noise environment. Fig. 5 also shows the necessity of properly tuning the parameter γ to provide robustness in an impulsive noise environment, as for γ < 0.4 CTA-DMCC shows better performance than CTA-DLAF. In the following experiments a time varying γ is chosen, i.e., γ = 10 √(EMSE(i)).

6.4. Steady state performance comparison

In this experiment the steady state performance of CTA-DLAF, CTA-DMCC, CTA-DAPSA, CTA-HuberDLMS, and CTA-DLMS is compared. The step size of each algorithm is adjusted to give the same convergence rate, and the steady state errors are compared. The projection order L is chosen as 2 for CTA-DAPSA and CTA-DLAF. The step sizes for CTA-DLMS, CTA-HuberDLMS, CTA-DAPSA, CTA-DMCC and CTA-DLAF are taken as 0.1, 0.5, 0.03, 0.2, and 0.05 respectively. The performance results in 10, 20, and 30% impulsive noise environments are given in Table 2. The robustness of all four robust algorithms remains intact as the impulsive noise percentage increases. Fig. 6 shows the comparison of the learning curves in 30% impulsive noise. The comparable performance of CTA-HuberDLMS and CTA-DAPSA in 30% impulsive noise is attributed to the projection order L of CTA-DAPSA. Table 2 and Fig. 6 show the improvement in the performance of CTA-DLAF obtained by choosing γ = 10 √(EMSE(i)).
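For intuition about what is being compared here, the following is a schematic combine-then-adapt diffusion update with Lorentzian-type error weighting. It is a sketch of ours, not the paper's exact recursion: the ring topology, uniform combiner weights, noise parameters, and the weight γ²/(γ² + e²) (the paper's windowed Lorentzian recursion may differ by constants) are illustrative choices. Each node first averages its neighbors' estimates and then takes LMS-like steps in which large (impulsive) errors are strongly down-weighted.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, L = 6, 3, 2                       # nodes, channel taps, window length
mu, gamma = 0.05, 1.0                   # step size and Lorentzian tuning parameter
w0 = rng.standard_normal(M)
w0 /= np.linalg.norm(w0)                # unknown 3-tap channel, as in the experiments

def noise():
    """Bernoulli-Gaussian sample: 10% impulses, eta = 1e6, SNR 30 dB."""
    v = np.sqrt(1e-3) * rng.standard_normal()
    if rng.random() < 0.1:
        v += np.sqrt(1e3) * rng.standard_normal()
    return v

# Ring topology with self-loops; column-stochastic uniform combiner C (our choice)
C = np.zeros((N, N))
for k in range(N):
    for l in (k - 1, k, k + 1):
        C[l % N, k] = 1.0
C /= C.sum(axis=0)

W = np.zeros((N, M))                    # local estimates, one row per node
for i in range(2000):
    Phi = (W.T @ C).T                   # combine: phi_k = sum_l C[l,k] w_l
    for k in range(N):
        w = Phi[k]
        for _ in range(L):              # data reuse over a window of L regressors
            u = rng.standard_normal(M)  # Gaussian pilot regressor
            e = (u @ w0 + noise()) - u @ w
            w = w + mu * (gamma**2 / (gamma**2 + e**2)) * e * u  # down-weighted step
        W[k] = w

msd = np.mean(np.sum((W - w0) ** 2, axis=1))
# Despite 10% strong impulses, the network estimates converge close to w0,
# because an impulse with e ~ 30 is scaled by roughly 1/e^2 before it acts.
```

An unweighted LMS step (weight fixed at 1) run on the same data diverges in this regime, which is the behavior Table 2 reports for CTA-DLMS.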
Fig. 6. Learning curves for CTA-DLAF and other algorithms in 30% impulsive noise: (a) MSD, (b) EMSE.
Fig. 7. Transient Performance of CTA-DLAF, CTA-DMCC, CTA-HuberDLMS and CTA-DLMS in 30% impulsive noise: (a) MSD, (b) EMSE.
Table 3
Comparison of convergence rate (iterations, approx.) in impulsive noise.

Adaptive algorithm    10%         20%         30%
CTA-DLAF              123         159         183
CTA-DMCC              447         612         956
CTA-HuberDLMS         1062        1077        1498
CTA-DAPSA             1390        1570        1925
CTA-DLMS              Diverging   Diverging   Diverging

Table 4
Absolute percentage deviation (η = 10⁶).

Adaptive algorithm    0%        10%       20%       30%
CTA-DLAF              0.0097    0.0154    0.0181    0.0086
CTA-DMCC              0.0110    0.0390    0.0178    0.0462
CTA-HuberDLMS         0.0477    0.1181    0.0389    0.0831
CTA-DAPSA             0.0228    0.0526    0.0331    0.0600
CTA-DLMS              0.0119    8.9863    3.5590    14.0859

6.5. Transient performance comparison

In this experiment the transient performance of CTA-DLAF is analyzed in the channel estimation scenario at 10, 20 and 30% impulsive noise with γ = 10 √(EMSE(i)). Table 3 compares the algorithms in terms of approximate convergence rate. In the 10% impulsive noise scenario, the step sizes are 0.06, 0.051, 0.0015, 0.03, and 0.05 for CTA-DLAF, CTA-DMCC, CTA-DAPSA, CTA-HuberDLMS and CTA-DLMS respectively, and the kernel width σ of CTA-DMCC is chosen as 1. In the 20% impulsive noise scenario, the step sizes are 0.06, 0.038, 0.0013, 0.03, and 0.05, and in the 30% scenario 0.06, 0.03, 0.0013, 0.03, and 0.05, for the same ordering of algorithms. The learning curves in the 30% impulsive noise environment are given in Fig. 7, which confirms that CTA-DLAF provides a faster convergence rate with window length L = 2 compared to the other algorithms.

The absolute percentage of deviation (APD) of CTA-DLMS, CTA-HuberDLMS, CTA-DMCC and CTA-DLAF in 0, 10, 20, and 30% impulsive noise is given in Table 4. The parameters are kept the same as in Section 6.4. The APD is averaged over 100 independent experiments, with the estimated weight taken as the average over the last 500 iterations. Table 4 clearly shows the failure of diffusion LMS in an impulsive noise environment. In the 0% case, i.e. when the channel is affected only by Gaussian noise, CTA-DAPSA gives a slightly degraded performance compared with CTA-DLMS, while CTA-DMCC performs on par with CTA-DLMS. CTA-DLAF outperforms all of these algorithms in estimating the AWGN channel when the tuning parameter γ is chosen from the a priori excess mean square error.
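The absolute percentage of deviation used in Table 4 is a direct per-tap relative error. A minimal sketch follows; the channel and estimate vectors in the example are made up for illustration.

```python
import numpy as np

def apd(w_true, w_est):
    """Absolute percentage of deviation: mean per-tap relative error, times 100."""
    w_true = np.asarray(w_true, dtype=float)
    w_est = np.asarray(w_est, dtype=float)
    return 100.0 * np.mean(np.abs(w_true - w_est) / np.abs(w_true))

# Hypothetical 3-tap channel and a nearby steady-state estimate (values illustrative only)
w0 = np.array([0.8, -0.5, 0.2])
w_inf = np.array([0.8008, -0.5005, 0.1998])
print(round(apd(w0, w_inf), 6))   # each tap is off by 0.1%, so APD = 0.1
```

Note that the metric is undefined for exactly-zero true taps, which is not an issue here since the Rayleigh-drawn taps are nonzero almost surely.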
7. Conclusion
In this paper we proposed CTA-DLAF, a faster-converging channel estimation algorithm for a sensor network affected by impulsive noise. The proposed algorithm differs from the Lorentzian adaptive filter in that it is extended to a sensor network and follows the diffusion strategy. Numerical simulations show the comparable performance of CTA-DLAF and CTA-DMCC in terms of both steady state error and convergence rate when a constant tuning parameter γ is chosen for CTA-DLAF. The simulation studies show an improved trade-off between convergence rate and steady state error compared to CTA-DMCC when γ² is taken 100 times larger than the a priori EMSE, i.e. when it is time varying. In both cases CTA-DLAF outperforms CTA-HuberDLMS and CTA-DAPSA. Also, the steady state expression for CTA-DLAF shows close agreement with the simulation results.
Declaration of competing interest
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.
Acknowledgments
This work was supported in part by the Science and Engineering Research Board (SERB), Govt. of India (Ref. No. SB/S3/EECE/210/2016 Dated 28/11/2016). References [1] T. Panigrahi, P.M. Pradhan, G. Panda, B. Mulgrew, Block least mean squares algorithm over distributed wireless sensor network, J. Comput. Netw. Commun. 2012 (2012) 5564–5569. [2] W. Xia, Y. Wang, A variable step-size diffusion LMS algorithm over networks with noisy links, Signal Process. 148 (2018) 205–213. [3] A. Goldsmith, Wireless Communication, Cambridge University Press, 2005. [4] J.G. Proakis, Digital Communications, McGraw Hill, 2000. [5] S.R. Kim, A. Efron, Adaptive robust impulse noise filtering, IEEE Trans. Signal Process. 43 (8) (1995) 1855–1866. [6] R. Abdolee, B. Champagne, Centralized adaptation for parameter estimation over wireless sensor networks, IEEE Commun. Lett. 19 (9) (2015) 1624–1627. [7] F.S. Cattivelli, A.H. Sayed, Diffusion LMS strategies for distributed estimation, IEEE Trans. Signal Process. 58 (3) (2010) 1035–1048. [8] C.G. Lopes, A.H. Sayed, Diffusion least-mean squares over adaptive networks: formulation and performance analysis, IEEE Trans. Signal Process. 56 (7) (2008) 3122–3136. [9] C.G. Lopes, A.H. Sayed, Incremental adaptive strategies over distributed networks, IEEE Trans. Signal Process. 55 (8) (2007) 4064–4077. [10] B.P. Mishra, T. Panigrahi, A. Dubey, Robust distributed estimation of wireless channel, in: IEEE International Conference on Applied Electromagnetics, Signal Processing and Communication, 2019, in press. [11] T. Panigrahi, G. Panda, B. Mulgrew, Error saturation nonlinearities for robust incremental LMS over wireless sensor networks, ACM Trans. Sens. Netw. 11 (2) (2014) 27, 20 pp. [12] J. Chen, C. Richard, A.H. Sayed, Multitask diffusion adaptation over networks, IEEE Trans. Signal Process. 62 (16) (2014) 4129–4144. [13] J. Chen, C. Richard, A.H. Sayed, Diffusion LMS over multitask networks, IEEE Trans. Signal Process. 63 (11) (2015) 2733–2748. [14] B. 
Widrow, Adaptive Filters I: Fundamentals, Tech. rep., Stanford Electronic Laboratories, 1966. [15] A.M. Zoubir, V. Koivunen, Y. Chakhchoukh, M. Muma, Robust estimation in signal processing: a tutorial-style treatment of fundamental concepts, IEEE Signal Process. Mag. 29 (4) (2012) 61–80. [16] T.K. Blankenship, D.M. Kriztman, T.S. Rappaport, Measurements and simulation of radio frequency impulsive noise in hospitals and clinics, in: IEEE 47th Vehicular Technology Conference. Technology in Motion, 1997. [17] D. Middleton, Non-Gaussian noise models in signal processing for telecommunications: new methods an results for class A and class B noise models, IEEE Trans. Inf. Theory 45 (4) (1999) 1129–1149. [18] Y. Abramovich, P. Turcaj, Impulsive noise mitigation in spatial and temporal domains for surface-wave over-the-horizon radar, in: 9th Annual MIT Workshop on Adaptive Sensor Array Processing. ASAP, 2001.
[19] J. Zhang, T. Qiu, S. Luan, H. Li, Bounded non-linear covariance based esprit method for noncircular signals in presence of impulsive noise, Digit. Signal Process. 87 (2019) 104–111. [20] Y. Liu, Y. Zhang, T. Qiu, J. Gao, S. Na, Improved time difference of arrival estimation algorithms for cyclostationary signals in α -stable impulsive noise, Digit. Signal Process. 76 (2018) 94–105. [21] H. Wan, X. Ma, X. Li, Variational Bayesian learning for removal of sparse impulsive noise from speech signals, Digit. Signal Process. 73 (2018) 106–116. [22] L. Rugini, P. Banelli, On the equivalence of maximum SNR and MMSE estimation: applications to additive non-Gaussian channels and quantized observations, IEEE Trans. Signal Process. 64 (23) (2016) 6190–6199. [23] S.B. Babarsad, S.M. Saberali, M. Majidi, Analytic performance investigation of signal level estimator based on empirical characteristic function in impulsive noise, Digit. Signal Process. 92 (2019) 20–25. [24] P.A. Lopes, J.A. Gerald, Iterative MMSE/MAP impulsive noise reduction for OFDM, Digit. Signal Process. 69 (2017) 252–258. [25] V. Mathews, S.H. Cho, Improved convergence analysis of stochastic gradient adaptive filters using the sign algorithm, IEEE Trans. Acoust. Speech Signal Process. 35 (4) (1987) 450–454. [26] P. Petrus, Robust Huber adaptive filter, IEEE Trans. Signal Process. 47 (4) (1999) 1129–1133. [27] J. Chambers, A. Avlonitis, A robust mixed-norm adaptive filter algorithm, IEEE Signal Process. Lett. 4 (2) (1997) 46–48. [28] E.V. Papoulis, T. Stathaki, A normalized robust mixed-norm adaptive algorithm for system identification, IEEE Signal Process. Lett. 11 (1) (2004) 56–59. [29] M.O. Sayin, N.D. Vanli, S.S. Kozat, A novel family of adaptive filtering algorithms based on the logarithmic cost, IEEE Trans. Signal Process. 62 (17) (2014) 4411–4424. [30] T. Shao, Y.R. Zheng, J. Benesty, An affine projection sign algorithm robust against impulsive interferences, IEEE Signal Process. Lett. 
17 (4) (2010) 327–330. [31] B. Chen, L. Xing, H. Zhao, N. Zheng, J.C. Príncipe, Generalized correntropy for robust adaptive filtering, IEEE Trans. Signal Process. 64 (13) (2016) 3376–3387. [32] A. Singh, J.C. Príncipe, Using correntropy as a cost function in linear adaptive filters, in: International Joint Conference on Neural Networks, 2009, pp. 2950–2955. [33] V.C. Gogineni, S. Mula, Improved proportionate-type sparse adaptive filtering under maximum correntropy criterion in impulsive noise environments, Digit. Signal Process. 79 (2018) 190–198. [34] R.L. Das, M. Narwaria, Lorentzian based adaptive filters for impulsive noise environments, IEEE Trans. Circuits Syst. I, Regul. Pap. 64 (6) (2017) 1529–1539. [35] F. Huang, J. Zhang, S. Zhang, A family of robust adaptive filtering algorithms based on sigmoid cost, Signal Process. 149 (2018) 179–192. [36] S. Zhang, W.X. Zheng, J. Zhang, H. Han, A family of robust M-shaped error weighted least mean square algorithms: performance analysis and echo cancellation application, IEEE Access 5 (2017) 14716–14727. [37] J. Ni, J. Chen, X. Chen, Diffusion sign-error LMS algorithm: formulation and stochastic behavior analysis, Signal Process. 128 (2016) 142–149. [38] H. Zayyani, M. Korki, F. Marvasti, A distributed 1-bit compressed sensing algorithm robust to impulsive noise, IEEE Commun. Lett. 20 (6) (2016) 1132–1135. [39] M. Korki, H. Zayyani, Weighted diffusion continuous mixed p-norm algorithm for distributed estimation in non-uniform noise environment, Signal Process. 164 (2019) 225–233. [40] W. Ma, B. Chen, J. Duan, H. Zhao, Diffusion maximum correntropy criterion algorithms for robust distributed estimation, Digit. Signal Process. 58 (2016) 10–19. [41] W. Huang, L. Li, Q. Li, X. Yao, Distributed affine projection sign algorithms against impulsive interferences, Tein Tzu Hsueh Pao/Acta Electron. Sin. 44 (7) (2016) 1555–1560. [42] A.H. Sayed, Fundamentals of Adaptive Filtering, Wiley, 2003. [43] Z. Li, S. 
Guan, Diffusion normalized Huber adaptive filtering algorithm, J. Franklin Inst. 355 (8) (2018) 3812–3825. [44] W. Huang, L. Li, Q. Li, X. Yao, Diffusion robust variable step-size LMS algorithm over distributed networks, IEEE Access 6 (2018) 47511–47520. [45] S. Kim, J. Lee, W. Song, A theory on the convergence behaviour of the affine projection algorithm, IEEE Trans. Signal Process. 59 (12) (2011) 6233–6239. [46] C. Yu, L. Xie, Y.C. Soh, Blind channel and source estimation in networked systems, IEEE Trans. Signal Process. 62 (17) (2014) 4611–4626.
Annet Mary Wilson received her M.Tech. in Electronics and Communication Engineering from Calicut University, India, in 2015. She is currently working as SRF in the Department of Electronics and Communication Engineering, National Institute of Technology, Goa, India. Her research includes adaptive signal processing and signal processing for sensor networks.
Trilochan Panigrahi received his MTech in Electronics and Communication Engineering from Biju Patnaik University of Technology Rourkela,
India, in 2005, the Ph.D. in Electronics and Communication Engineering from National Institute of Technology, Rourkela, India, in 2012. He is currently an Associate Professor in the Department of Electronics and Communication Engineering, National Institute of Technology Goa, India. His research interests include signal processing for wireless sensor network and wireless communication, application of evolutionary algorithms in signal processing and source localization. Ankit Dubey received the B.E. degree in Electronics and Telecommunication Engineering from the Chhattisgarh Swami Vivekanand Technical
University, Bhilai, India, in 2009 and the Ph.D. degree in Electrical Engineering from the Indian Institute of Technology, Delhi, India, in 2014. Since January 2019, he has been with the faculty of the Department of Electrical Engineering, Indian Institute of Technology, Jammu, India, where he is currently an Assistant Professor. His research interests are in diversity combining, multi-hop transmission, and physical layer security for power line and wireless communications and smart grid communications.