ARTICLE IN PRESS
Signal Processing 85 (2005) 1759–1779 www.elsevier.com/locate/sigpro
Efficient fast recursive least-squares adaptive complex filtering using real valued arithmetic

George-Othon Glentis

Technological Education Institute of Crete, Branch at Chania, Department of Electronics, 3, Romanou Str., Halepa, 73133 Chania, Greece

Received 26 January 2004; received in revised form 29 March 2005
Abstract

In this paper, a novel implementation of the fast recursive least squares (FRLS) algorithm for complex adaptive filtering is presented. Complex signals are treated as pairs of real signals, and operations are re-organized on a real arithmetic basis. A new set of signals and filtering parameters is introduced, expressed either as the sum or as the difference of the real and the imaginary parts of the original variables. The algorithmic strength reduction technique is subsequently applied, resulting in significant computational savings. In this way, an efficient FRLS algorithm is derived, where only two real valued filters are propagated, based on novel three-term recursions for the time updating of the pertinent parameters. The proposed algorithm is implemented using real valued arithmetic only, while reducing the number of required real valued multiplication operations by 23.5%, at the expense of a marginal 2.9% increase in the number of real valued additions. The performance of the proposed FRLS algorithm is investigated by computer simulations, in the context of adaptive system identification and adaptive channel equalization.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Adaptive filtering; Fast algorithms; Complex valued arithmetic
1. Introduction

Complex valued signals are encountered in many adaptive signal processing applications. Typical examples include the design of adaptive algorithms for channel identification or channel
equalization of telecommunication links, the design of adaptive beamformers or sidelobe cancellers in array signal processing, etc. [1–5]. While most of the existing real valued adaptive algorithms can easily be extended to handle complex valued signals, it is only recently that attention has been given to the specific implementation of the complex valued operations [6–9]. The fast complex valued multiplication method [10,11] has
0165-1684/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.sigpro.2005.03.013
been adopted for the design of efficient digital signal processing algorithms, including digital filtering and adaptive filtering. In the context of VLSI signal processing, the fast complex valued multiplication method has been introduced as an algorithmic strength reduction (SR) transform, and it has been successfully applied to the design of low-power, high-speed adaptive filters and equalizers [12–14,8,15,16].

In this paper, the fast recursive least squares (FRLS) algorithm for complex adaptive filtering is considered and an efficient real valued FRLS scheme is derived. Throughout this paper, the FRLS nomenclature introduced by the FAEST algorithm is adopted [17]. The FRLS FTF counterpart can be handled in a similar way [18]. Direct application of a fast complex valued multiplication method to the original FRLS scheme, where each individual complex multiplication is performed using a fast scheme [10,19,11], can reduce the number of required real multiplications by 23.5%, at the expense of a 70% increase in the required number of additions. To keep the overhead due to the extra additions low, both the input and the desired response signals, as well as all variables associated with the FRLS algorithm, are treated as pairs of real signals, and operations are re-organized on a real arithmetic basis. Auxiliary signals and filter parameters are introduced, which correspond to the generic signal transformation s_R(n) ± s_I(n), where s_R(n) and s_I(n) represent the real and the imaginary part of the complex valued signal s(n), respectively. The algorithmic strength reduction technique is subsequently applied to the transformed FRLS algorithm [9,16]. In this way, novel real valued three-term recursions are derived for the efficient time update of the FRLS parameters, resulting in significant computational savings.
The proposed FRLS algorithm utilizes real arithmetic only, for the efficient realization of the transversal filtering part as well as of the parameter updating part. The computational complexity of the proposed scheme, measured by the required number of real valued multiplications, is reduced by 23.5%, with a marginal 2.9% increase in the number of real valued additions. Computational savings without sacrificing performance is a task of primary interest in the design of high-speed adaptive systems, where processing power of several GOPS (giga operations per second) is needed. On the other hand, in slow sampling rate adaptive filtering systems, for example in speech processing and hearing aid applications, where special VLSI ASICs are required for small size and low-power implementation, the reduction in computational complexity is of primary interest, since it is directly related to the size and the power consumption of the final design. The performance of the proposed FRLS implementation is investigated by computer simulations, in the context of adaptive system identification and adaptive channel equalization. To avoid numerical instability, error feedback stabilization is utilized [20]. Simulation results indicate that both the original complex valued FRLS algorithm and the proposed efficient FRLS implementation using real valued arithmetic have similar convergence and stabilization properties.
2. Fast RLS adaptive filtering overview

Let us consider a complex valued FIR model described by the difference equation [3]

y(n) = Σ_{i=1}^{M} c_i^* x(n−i+1).  (1)

Here x(n) is a complex valued input signal and c_i, i = 1, ..., M, are the complex valued filter coefficients. (As for the superscripts throughout this paper, '*' means the complex conjugate, 'T' denotes the vector transpose, and 'H' stands for the Hermitian operator (conjugate and transpose).) The above equation is compactly written as

y(n) = c_M^H x_M(n).  (2)

Vector x_M(n) is the regressor

x_M(n) = [x(n) x(n−1) ... x(n−M+1)]^T  (3)

and vector

c_M = [c_1 c_2 ... c_M]^T  (4)

is the filter coefficients vector.
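As an illustrative sketch (variable names are ours, not from the paper), the model of Eqs. (1)–(3) amounts to a Hermitian inner product between the coefficient vector and the regressor:

```python
# Complex FIR filtering, y(n) = c_M^H x_M(n): conjugate the coefficients,
# multiply with the regressor [x(n), x(n-1), ..., x(n-M+1)]^T and sum.

def regressor(x, n, M):
    """x_M(n) of Eq. (3); samples before time 0 are taken as zero."""
    return [x[n - i] if n - i >= 0 else 0j for i in range(M)]

def fir_output(c, x, n):
    """y(n) of Eqs. (1)-(2)."""
    return sum(ci.conjugate() * xi
               for ci, xi in zip(c, regressor(x, n, len(c))))

c = [1 + 1j, 0.5 - 2j]            # filter coefficients, M = 2
x = [2 + 1j, -1 + 0j, 0 + 3j]     # input samples x(0), x(1), x(2)
y = fir_output(c, x, 2)           # (1-1j)*(3j) + (0.5+2j)*(-1) = 2.5 + 1j
```

Python's built-in complex type makes the conjugation in the Hermitian product explicit, which is exactly the operation the real valued reformulation of Section 4 decomposes.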
Given a sequence of a complex valued input signal x(k), and a complex valued desired response signal z(k), over a finite observation horizon, k = 0, 1, ..., n, the optimum parameters of the FIR model of Eq. (1) are estimated by minimizing the exponentially forgetting total squared error, i.e.,

c_M(n) : min_{c_M} ( Σ_{k=0}^{n} λ^{n−k} e_M^{c*}(k) e_M^c(k) ),
e_M^c(k) = z(k) − c_M^H(n) x_M(k).  (5)
Here, 0 < λ ≤ 1 is a parameter that controls the size of the exponentially fading memory of the adaptation mechanism. Minimization of the cost function (5), with respect to the filter coefficients vector, results in a set of linear equations, the so-called normal equations

R_M(n) c_M(n) = d_M(n).  (6)
Matrix R_M(n) is the sampled autocorrelation matrix of the input signal x(n), defined as

R_M(n) = Σ_{k=0}^{n} λ^{n−k} x_M(k) x_M^H(k).  (7)

Vector d_M(n) is the sampled cross-correlation vector between the input signal x(n) and the desired response signal z(n), i.e.,

d_M(n) = Σ_{k=0}^{n} λ^{n−k} x_M(k) z^*(k).  (8)
The RLS algorithm provides a method for the time updating of the inverse autocorrelation matrix, bypassing the direct matrix inversion imposed by Eq. (6). Indeed, application of the matrix inversion lemma [3, p. 480] to Eqs. (6)–(8) results in the following updating scheme:

ε_M^c(n) = (z(n) − c_M^H(n−1) x_M(n)) / (1 + w_M^H(n) x_M(n)),  (9)
c_M(n) = c_M(n−1) + w_M(n) ε_M^{c*}(n).  (10)

Parameter w_M(n) that appears in the above equations is the so-called Kalman gain vector, defined as [17,18]

w_M(n) = λ^{−1} R_M^{−1}(n−1) x_M(n).  (11)
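The recursion (9)–(11) can be checked numerically against a direct solution of the normal equations (6)–(8). The following minimal sketch is our own illustration, not the paper's code: M = 2, the inverse P(n) = R_M^{-1}(n) is propagated with the matrix inversion lemma, and the small regularization δI implied by the recursive initialization is included on both sides of the comparison.

```python
# RLS for M = 2: propagate P(n) = R_M^{-1}(n) with the matrix inversion lemma,
# update c_M(n) via Eqs. (9)-(11), then compare with solving Eq. (6) directly.

lam = 0.9        # forgetting factor lambda
delta = 1e-6     # R(-1) = delta*I, hence P(-1) = (1/delta)*I

def matvec(A, v):
    return [A[0][0]*v[0] + A[0][1]*v[1], A[1][0]*v[0] + A[1][1]*v[1]]

def hdot(u, v):  # Hermitian inner product u^H v
    return u[0].conjugate()*v[0] + u[1].conjugate()*v[1]

def solve2(A, b):  # Cramer's rule for a 2x2 complex system A c = b
    det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    return [(b[0]*A[1][1] - A[0][1]*b[1]) / det,
            (A[0][0]*b[1] - b[0]*A[1][0]) / det]

x = [1 + 1j, 2 - 1j, -1 + 2j, 0.5 + 0.5j]   # input signal
z = [1 + 0j, -1 + 1j, 2 + 0j, 1 - 1j]       # desired response

def xM(n):  # regressor [x(n), x(n-1)]^T, zero before time 0
    return [x[n], x[n - 1] if n >= 1 else 0j]

P = [[1/delta, 0j], [0j, 1/delta]]
c = [0j, 0j]
for n in range(len(x)):
    xn = xM(n)
    w = [v / lam for v in matvec(P, xn)]                 # Eq. (11)
    eps = (z[n] - hdot(c, xn)) / (1 + hdot(w, xn))       # Eq. (9)
    c = [c[i] + w[i] * eps.conjugate() for i in (0, 1)]  # Eq. (10)
    Px = matvec(P, xn)                                   # inversion-lemma step
    den = lam + hdot(xn, Px)
    P = [[(P[i][j] - Px[i] * Px[j].conjugate() / den) / lam for j in (0, 1)]
         for i in (0, 1)]

# Direct solution of R_M(n) c_M(n) = d_M(n), Eqs. (6)-(8), including the
# regularization term implied by the recursive initialization.
n = len(x) - 1
R = [[delta * lam**(n + 1) * (i == j) for j in (0, 1)] for i in (0, 1)]
d = [0j, 0j]
for k in range(n + 1):
    xk = xM(k)
    for i in (0, 1):
        d[i] += lam**(n - k) * xk[i] * z[k].conjugate()          # Eq. (8)
        for j in (0, 1):
            R[i][j] += lam**(n - k) * xk[i] * xk[j].conjugate()  # Eq. (7)
c_direct = solve2(R, d)
```

Both paths produce the same coefficient vector, which is the point of the recursion: the O(M³) solve is replaced by an O(M²) update, and the fast RLS algorithms below reduce this further to O(M).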
According to the RLS algorithm, the inverse matrix R_M^{−1}(n−1) can be updated using an O(M²) recursive scheme [3]. Fast RLS algorithms go a step further, using fast O(M) adaptive schemes for the time update of the Kalman gain vector, bypassing the need for the adaptation of the inverse matrix R_M^{−1}(n−1) [17,18,21,4]. The key point in the development of fast RLS algorithms is the proper utilization of the shift invariance property of the data matrix. Indeed, the increased order data vector x_{M+1}(n) can be partitioned either in a lower or in an upper form, as

x_{M+1}(n) = [x_M(n); x(n−M)] = [x(n); x_M(n−1)].  (12)

These partitions of the data vector are utilized to obtain appropriate partitions of the increased order autocorrelation matrix R_{M+1}(n). It has been shown [17,18] that the Kalman gain vector w_M(n−1) at time n−1 is related to the Kalman gain vector at time n, as it follows from the upper and the lower form solution of the increased order system, Eq. (11). The fast RLS algorithm for complex adaptive filtering is tabulated in Table 1. Auxiliary vectors a_M(n) and b_M(n) are the least squares forward and backward predictors; they result from the general filtering scheme of Eq. (1) by setting z(n) = x(n+1) and z(n) = x(n−M), respectively.

The intrinsic numerical instability of the FRLS algorithm is treated using the error feedback stabilization method, originally proposed in [22,20]. Two specific variables are used for this purpose, the a priori backward prediction error e_M^b(n) and the Kalman gain power α_M(n). These are estimated by the algorithm as dictated by Eq. (10) and Eqs. (6), (14) of Table 1, respectively. At the same time, these parameters are re-estimated using Eqs. (11) and (15) of Table 1. The error signals Δ_M^b(n) and Δ_M^α(n), corresponding to the backward prediction error and the Kalman gain power, are utilized as a feedback mechanism to control the error propagation and to provide a new set of variables to be used by the algorithm, as dictated by Eqs. (13) and (17) of Table 1. Tuning variables s^b and s^α are carefully selected so that
Table 1
The stabilized complex valued fast RLS (FRLS) adaptive algorithm

Initialization:
c_M(−1) = 0, w_M(−1) = 0, a_M(−1) = 0, b_M(−1) = 0,
α_M(−1) = 1, α_M^f(−1) = λ^M α_M^b(−1), α_M^b(−1) = σ_x².

Forward prediction part:
1. e_M^f(n) = x(n) + a_M^H(n−1) x_M(n−1)
2. ε_M^f(n) = e_M^f(n)/α_M(n−1)
3. a_M(n) = a_M(n−1) − w_M(n−1) ε_M^{f*}(n)
4. α_M^f(n) = λ α_M^f(n−1) + e_M^f(n) ε_M^{f*}(n)
5. k_{M+1}^f(n) = λ^{−1} e_M^f(n)/α_M^f(n−1)
6. α_{M+1}(n) = α_M(n−1) + e_M^f(n) k_{M+1}^{f*}(n)

Kalman gain part:
7. w_{M+1}(n) = [0; w_M(n−1)] + k_{M+1}^f(n) [1; a_M(n−1)]
8. w_{M+1}(n) = [q_M(n); k_{M+1}^b(n)]
9. w_M(n) = q_M(n) − b_M(n−1) k_{M+1}^b(n)

Backward prediction part:
10. e_M^b(n) = λ α_M^b(n−1) k_{M+1}^b(n)
11. E_M^b(n) = x(n−M) + b_M^H(n−1) x_M(n)
12. Δ_M^b(n) = E_M^b(n) − e_M^b(n)
13. e_M^{b,i}(n) = E_M^b(n) + s_i^b Δ_M^b(n), i = 1, 2, 3
14. α_M(n) = α_{M+1}(n) − e_M^{b,1}(n) k_{M+1}^{b*}(n)
15. A_M(n) = 1 + w_M^H(n) x_M(n)
16. Δ_M^α(n) = A_M(n) − α_M(n)
17. α_M(n) = A_M(n) + s^α Δ_M^α(n)
18. ε_M^{b,i}(n) = e_M^{b,i}(n)/α_M(n), i = 2, 3
19. b_M(n) = b_M(n−1) − w_M(n) ε_M^{b,2*}(n)
20. α_M^b(n) = λ α_M^b(n−1) + e_M^{b,3}(n) ε_M^{b,3*}(n)

Filtering part:
21. e_M^c(n) = z(n) − c_M^H(n−1) x_M(n)
22. ε_M^c(n) = e_M^c(n)/α_M(n)
23. c_M(n) = c_M(n−1) + w_M(n) ε_M^{c*}(n)

Forward and backward prediction powers α_M^b(−1) and α_M^f(−1) are initialized using an estimate of the input signal power, σ_x². Eq. (8) defines a vector partition.
the overall algorithm remains stable. An additional constraint of the stabilized fast RLS algorithms is the nominal range within which the forgetting factor λ should lie in order for the algorithm to remain stable; as was explained in [18], λ ∈ (1 − 1/3M, 1) [20].
3. Algorithmic strength reduction

Complex valued arithmetic is required for the implementation of the FRLS algorithm when the input signals and/or the filter model are represented by complex valued variables. Complex valued addition is realized by a set of two real valued additions, i.e.,

(a + jb) + (c + jd) = (a + c) + j(b + d).  (13)

Complex valued multiplication can be realized by the classic method, which requires four real valued multiplications and two real valued additions, i.e.,

(a + jb)(c + jd) = (ac − bd) + j(ad + bc).  (14)
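As a quick numerical illustration (helper names are ours), the four-multiplication rule (14) and a three-multiplication rearrangement of the same product agree exactly:

```python
# Classic complex multiplication, Eq. (14): 4 real mults, 2 real adds.
def cmul_classic(a, b, c, d):
    return (a*c - b*d, a*d + b*c)

# Fast method: 3 real mults (t1, t2, t3) and 5 real additions/subtractions.
def cmul_fast(a, b, c, d):
    t1 = b * (c - d)
    t2 = (a - b) * c
    t3 = a * (c + d)
    return (t1 + t2, t3 - t2)   # (ac - bd, ad + bc)
```

Both functions return the real and imaginary parts of (a + jb)(c + jd); the fast variant trades one real multiplication for three extra real additions, which is exactly the trade-off the strength reduction transform exploits.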
Alternatively, the fast complex valued multiplication method can be applied, where the inherent dependencies of the partial products and sums are utilized to reduce the number of real valued multiplication operations, at the expense of some extra real valued additions [10,14,15,19,11]. In this case, three real valued multiplications and five real valued additions are required. Among others, two possible implementations of the fast complex valued multiplication are described by the following equations [19]:

ac − bd = b(c − d) + (a − b)c,
ad + bc = a(c + d) − (a − b)c  (15)

and

ac − bd = a(c + d) − (a + b)d,
ad + bc = b(c − d) + (a + b)d.  (16)

Efficient signal processing algorithms, for digital filtering and adaptive filtering, have been developed by taking into account the reduced complexity, fast complex valued multiplication method [6–9]. In the context of VLSI signal processing, transformations are modifications to the computational structure of a given algorithm, in a manner such that the input–output behavior is preserved. The application of the fast complex valued multiplication method belongs to a general class of algorithmic transformations, and is known as the algorithmic strength reduction transform. It has been successfully applied for the design of low-power, high-speed adaptive filters and equalizers [12,13,16].

Direct implementation of the fast complex valued multiplication method in the FRLS scheme can reduce the required number of real multiplication operations. However, although the number of real valued multiplications is reduced, there is a significant increase in the number of real valued additions and in the total number of real valued operations performed by the algorithm. Further improvement in the number of required real valued additions can be achieved by algorithm re-organization, using a set of auxiliary signals and filter parameters that are propagated through the algorithm and updated by the pertinent recursive equations, in accordance with the strength reduction real valued arithmetic imposed by Eqs. (15) and (16).

4. The SR FRLS adaptive algorithm

Application of the algorithmic strength reduction transform, described by Eqs. (15) and (16), to the FRLS adaptive algorithm results in an equivalent algorithmic description, called the strength reduced (SR) FRLS adaptive algorithm, where only real valued arithmetic is required. The main features of the SR FRLS algorithm are
the introduction of a set of real valued auxiliary signals that are fed into the transversal filters and are propagated through the algorithm, and the use of a set of real valued, modified filter parameters that are updated instead of the original ones, using novel three-term recursions. As a result, a minimum number of operations, i.e., real valued multiplications and additions, is attained.
Let us consider the complex filtering set up dictated by Eqs. (1)–(4). The input signal x(n) is expressed in terms of a real and an imaginary part, as x(n) = x_R(n) + j x_I(n) (throughout this paper, subscripts R and I denote the real and the imaginary parts of a scalar, complex valued variable, respectively). As a consequence, the corresponding complex regressor vector x_M(n) is written as

x_M(n) = x_M^R(n) + j x_M^I(n),  (17)

where vectors x_M^R(n) and x_M^I(n) denote the real and the imaginary part of x_M(n), respectively, i.e.,

x_M^R(n) = [x_R(n) x_R(n−1) ... x_R(n−M+1)]^T  (18)

and

x_M^I(n) = [x_I(n) x_I(n−1) ... x_I(n−M+1)]^T  (19)

(superscripts R and I denote the real and the imaginary parts of a vector, complex valued variable, respectively). In a similar way, the filter coefficients complex vector c_M(n) is expressed as

c_M(n) = c_M^R(n) + j c_M^I(n).  (20)

Using Eqs. (17) and (20), the filter output at time n, i.e., y(n) = c_M^H(n−1) x_M(n), is expanded as

y(n) = c_M^{R,T}(n−1) x_M^R(n) + c_M^{I,T}(n−1) x_M^I(n) + j (c_M^{R,T}(n−1) x_M^I(n) − c_M^{I,T}(n−1) x_M^R(n)).  (21)

Let us consider a set of auxiliary filter output signals, defined as

ŷ(n) = y_R(n) + y_I(n),  (22)
ỹ(n) = y_R(n) − y_I(n).  (23)

Using Eq. (21), variables ŷ(n) and ỹ(n) are estimated as

ŷ(n) = c̃_M^T(n−1) x_M^R(n) + ĉ_M^T(n−1) x_M^I(n),  (24)
ỹ(n) = ĉ_M^T(n−1) x_M^R(n) − c̃_M^T(n−1) x_M^I(n).  (25)

Here, vectors ĉ_M(n) and c̃_M(n) are transformed versions of the original filter coefficient vector c_M(n). They are defined as

ĉ_M(n) = c_M^R(n) + c_M^I(n),  (26)
c̃_M(n) = c_M^R(n) − c_M^I(n).  (27)

Consider the complex variable

Y(n) = ỹ(n) + j ŷ(n).  (28)

It is computed using Eqs. (24) and (25) as

Y(n) = (ĉ_M(n−1) + j c̃_M(n−1))^T (x_M^R(n) + j x_M^I(n)).  (29)

Application of the SR transform, Eq. (15), to the above equation results in

ŷ(n) = ĉ_M^T(n−1) x̂_M(n) − (ĉ_M(n−1) − c̃_M(n−1))^T x_M^R(n),  (30)
ỹ(n) = c̃_M^T(n−1) x̃_M(n) + (ĉ_M(n−1) − c̃_M(n−1))^T x_M^R(n).  (31)

The auxiliary regressors x̂_M(n) and x̃_M(n) that appear above are compatibly defined as

x̂_M(n) = x_M^R(n) + x_M^I(n),  (32)
x̃_M(n) = x_M^R(n) − x_M^I(n).  (33)
Eqs. (30) and (31) provide a cost effective way of computing variables ŷ(n) and ỹ(n), using 3M real valued multiplications and 4M − 1 real valued additions. On the contrary, evaluation of either ŷ(n) and ỹ(n) using Eqs. (24) and (25), or y(n) using Eq. (21), requires 4M real valued multiplications and 4M − 2 real valued additions. Thus, working with the transformed variables ŷ(n) and ỹ(n) instead of the original parameter y(n) results in a significant reduction in the required number of real valued multiplications. If, instead, the SR technique is applied on an individual basis, for each complex valued multiplication involved in the inner product computation y(n) = c_M^H(n−1) x_M(n), 3M real valued multiplications and 7M real valued additions are utilized. Estimation of the complex convolution y(n) = c_M^H(n−1) x_M(n) using the SR technique was originally introduced in the context of LMS adaptive filtering [9], where y_R(n) and y_I(n) were computed in an efficient way. Here, we approach
the subject in a slightly different way, estimating ŷ(n) and ỹ(n) instead of y_R(n) and y_I(n). Of course, the latter variables can be retrieved from the former by applying the inverse transformation implied by Eqs. (22) and (23).

Since ŷ(n) and ỹ(n) are estimated using the transformed filters ĉ_M(n) and c̃_M(n), it would be more efficient if these variables were updated directly, instead of updating the original c_M(n). Using Eq. (23) of Table 1, we get a couple of recursive equations of the form

c_M^R(n) = c_M^R(n−1) + w_M^R(n) ε_M^{c,R}(n) + w_M^I(n) ε_M^{c,I}(n),  (34)
c_M^I(n) = c_M^I(n−1) − w_M^R(n) ε_M^{c,I}(n) + w_M^I(n) ε_M^{c,R}(n).  (35)
If we insert the above recursions into Eqs. (26) and (27), we get

ĉ_M(n) = ĉ_M(n−1) + (ŵ_M(n) ε_M^{c,R}(n) − w̃_M(n) ε_M^{c,I}(n)),  (36)
c̃_M(n) = c̃_M(n−1) + (w̃_M(n) ε_M^{c,R}(n) + ŵ_M(n) ε_M^{c,I}(n)).  (37)

Here, the transformed Kalman gain vectors ŵ_M(n) and w̃_M(n) are defined as

ŵ_M(n) = w_M^R(n) + w_M^I(n),  (38)
w̃_M(n) = w_M^R(n) − w_M^I(n).  (39)

Consider the complex valued transformed coefficients vector defined as

C_M(n) = ĉ_M(n) − j c̃_M(n).  (40)

It is updated using Eqs. (36) and (37) as

C_M(n) = C_M(n−1) + (ŵ_M(n) − j w̃_M(n)) (ε_M^{c,R}(n) − j ε_M^{c,I}(n)).  (41)
Finally, application of the algorithmic strength reduction transform, Eq. (16), to the above equation results in

ĉ_M(n) = ĉ_M(n−1) + ŵ_M(n) ε̃_M^c(n) + (ŵ_M(n) − w̃_M(n)) ε_M^{c,I}(n),  (42)
c̃_M(n) = c̃_M(n−1) + w̃_M(n) ε̂_M^c(n) + (ŵ_M(n) − w̃_M(n)) ε_M^{c,I}(n).  (43)

In the above recursions, a set of transformed filtering error variables has been introduced according to the scheme

ê_M^c(n) = ẑ(n) − ŷ(n),  (44)
ẽ_M^c(n) = z̃(n) − ỹ(n),  (45)

where

ẑ(n) = z_R(n) + z_I(n),   z̃(n) = z_R(n) − z_I(n).  (46)
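The three-term recursions (42)–(43) can be checked against the complex valued update of Eq. (10); in this sketch (names ours) the updated hat/tilde pair is mapped back through Eqs. (26)–(27):

```python
# Three-term update of the transformed coefficients, Eqs. (42)-(43),
# versus the complex valued update c_M(n) = c_M(n-1) + w_M(n) * conj(eps).

w   = [0.2 + 0.1j, -0.3 + 0.4j]   # Kalman gain w_M(n)
cM  = [1.0 + 1.0j, 0.5 - 2.0j]    # c_M(n-1)
eps = 0.7 - 0.3j                  # a posteriori error eps_M^c(n)

c_ref = [c + wi * eps.conjugate() for c, wi in zip(cM, w)]   # Eq. (10)

w_hat = [wi.real + wi.imag for wi in w]        # Eq. (38)
w_til = [wi.real - wi.imag for wi in w]        # Eq. (39)
d_w   = [h - t for h, t in zip(w_hat, w_til)]  # w_hat - w_til
c_hat = [c.real + c.imag for c in cM]          # Eq. (26)
c_til = [c.real - c.imag for c in cM]          # Eq. (27)
e_hat = eps.real + eps.imag                    # eps_hat
e_til = eps.real - eps.imag                    # eps_til
e_I   = 0.5 * (e_hat - e_til)                  # eps^{c,I} = Im(eps)

g3 = [dw * e_I for dw in d_w]                  # shared third term
c_hat_new = [ch + wh * e_til + g
             for ch, wh, g in zip(c_hat, w_hat, g3)]   # Eq. (42)
c_til_new = [ct + wt * e_hat + g
             for ct, wt, g in zip(c_til, w_til, g3)]   # Eq. (43)
```

Per coefficient, the update costs three real multiplications (against four in the classical form), with the product `g3` shared between the two recursions.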
The computational complexity of Eqs. (42) and (43) is 3M real valued multiplications and 5M real valued additions. This figure should be compared with the computational complexity of the original adaptive scheme implied by Eqs. (34) and (35), which is either 4M real valued multiplications and 4M real valued additions, or 3M real valued multiplications and 7M real valued additions, depending on whether the classical or the fast elementwise type of complex valued multiplication is adopted. Clearly, the proposed recursions provide the best low complexity compromise among all the competitive schemes.

The SR transform can be applied to the forward and the backward prediction parameters that are involved in the computational flow of the FRLS algorithm of Table 1. Indeed, fast, three-term recursions can be derived for the efficient updating of the transformed forward prediction parameters, implied by Eqs. (1)–(4) of Table 1, as well as of the transformed backward prediction parameters, resulting from Eqs. (10)–(20) of Table 1. The SR FRLS formulas for updating the transformed forward and backward parameters are tabulated in Table 2, Eqs. (1)–(15) and (36)–(57), respectively. These reduced complexity recursions result from the general filtering scheme discussed so far, by setting ẑ(n) = x̂(n+1), z̃(n) = x̃(n+1), and ẑ(n) = x̂(n−M), z̃(n) = x̃(n−M), respectively. Finally, the derivation of the SR Kalman gain updating scheme is provided by Eqs. (16)–(35) of Table 2. The proof of the pertinent SR recursions is supplied in the Appendix.

The signal flow chart of the proposed FRLS algorithm is presented in Fig. 1. Twelve basic processing blocks are utilized, named after the
Table 2
Real valued implementation of the stabilized FRLS adaptive algorithm, using the strength reduction transform (SR FRLS)

Initialization:
ĉ_M(−1) = 0, c̃_M(−1) = 0, ŵ_M(−1) = 0, w̃_M(−1) = 0, â_M(−1) = 0, ã_M(−1) = 0,
b̂_M(−1) = 0, b̃_M(−1) = 0, d_M^w(−1) = 0,
α_M(−1) = 1, α_M^f(−1) = λ^M α_M^b(−1), α_M^b(−1) = σ_x².
x̂(n) = x_R(n) + x_I(n), x̃(n) = x_R(n) − x_I(n),
ẑ(n) = z_R(n) + z_I(n), z̃(n) = z_R(n) − z_I(n).

Forward prediction part:
1. ξ_1^f(n) = â_M^T(n−1) x̂_M(n−1)
2. ξ_2^f(n) = ã_M^T(n−1) x̃_M(n−1)
3. d_M^a(n−1) = â_M(n−1) − ã_M(n−1)
4. ξ_3^f(n) = d_M^{a,T}(n−1) x_M^R(n−1)
5. ê_M^f(n) = x̂(n) + (ξ_1^f(n) − ξ_3^f(n))
6. ẽ_M^f(n) = x̃(n) + (ξ_2^f(n) + ξ_3^f(n))
7. ε̂_M^f(n) = ê_M^f(n)/α_M(n−1)
8. ε̃_M^f(n) = ẽ_M^f(n)/α_M(n−1)
9. ε_M^{f,I}(n) = 0.5(ε̂_M^f(n) − ε̃_M^f(n))
10. g_M^{f,1}(n) = ŵ_M(n−1) ε̃_M^f(n)
11. g_M^{f,2}(n) = w̃_M(n−1) ε̂_M^f(n)
12. g_M^{f,3}(n) = d_M^w(n−1) ε_M^{f,I}(n)
13. â_M(n) = â_M(n−1) − g_M^{f,1}(n) − g_M^{f,3}(n)
14. ã_M(n) = ã_M(n−1) − g_M^{f,2}(n) − g_M^{f,3}(n)
15. α_M^f(n) = λ α_M^f(n−1) + 0.5(ê_M^f(n) ε̂_M^f(n) + ẽ_M^f(n) ε̃_M^f(n))

Kalman gain part:
16. k̂_{M+1}^f(n) = λ^{−1} ê_M^f(n)/α_M^f(n−1)
17. k̃_{M+1}^f(n) = λ^{−1} ẽ_M^f(n)/α_M^f(n−1)
18. g_{M+1}^{wf,1}(n) = k̂_{M+1}^f(n) [1; â_M(n−1)]
19. g_{M+1}^{wf,2}(n) = k̃_{M+1}^f(n) [1; ã_M(n−1)]
20. k_{M+1}^{f,I}(n) = 0.5(k̂_{M+1}^f(n) − k̃_{M+1}^f(n))
21. g_{M+1}^{wf,3}(n) = k_{M+1}^{f,I}(n) [0; d_M^a(n−1)]
22. ŵ_{M+1}(n) = [0; ŵ_M(n−1)] + g_{M+1}^{wf,1}(n) − g_{M+1}^{wf,3}(n)
23. w̃_{M+1}(n) = [0; w̃_M(n−1)] + g_{M+1}^{wf,2}(n) − g_{M+1}^{wf,3}(n)
24. α_{M+1}(n) = α_M(n−1) + 0.5(k̂_{M+1}^f(n) ê_M^f(n) + k̃_{M+1}^f(n) ẽ_M^f(n))
25. ŵ_{M+1}(n) = [q̂_M(n); k̂_{M+1}^b(n)], w̃_{M+1}(n) = [q̃_M(n); k̃_{M+1}^b(n)]
26. g_M^{wb,1}(n) = b̂_M(n−1) k̂_{M+1}^b(n)
27. g_M^{wb,2}(n) = b̃_M(n−1) k̃_{M+1}^b(n)
28. k_{M+1}^{b,I}(n) = 0.5(k̂_{M+1}^b(n) − k̃_{M+1}^b(n))
29. d_M^b(n−1) = b̂_M(n−1) − b̃_M(n−1)
30. g_M^{wb,3}(n) = d_M^b(n−1) k_{M+1}^{b,I}(n)
31. ŵ_M(n) = q̂_M(n) − g_M^{wb,1}(n) + g_M^{wb,3}(n)
32. w̃_M(n) = q̃_M(n) − g_M^{wb,2}(n) + g_M^{wb,3}(n)
33. ê_M^b(n) = λ α_M^b(n−1) k̂_{M+1}^b(n)
34. ẽ_M^b(n) = λ α_M^b(n−1) k̃_{M+1}^b(n)
35. A_M(n) = 1 + 0.5(ŵ_M^T(n) x̂_M(n) + w̃_M^T(n) x̃_M(n))

Backward prediction part:
36. ξ_1^b(n) = b̂_M^T(n−1) x̂_M(n)
37. ξ_2^b(n) = b̃_M^T(n−1) x̃_M(n)
38. ξ_3^b(n) = d_M^{b,T}(n−1) x_M^R(n)
39. Ê_M^b(n) = x̂(n−M) + (ξ_1^b(n) − ξ_3^b(n))
40. Ẽ_M^b(n) = x̃(n−M) + (ξ_2^b(n) + ξ_3^b(n))
41. Δ̂_M^b(n) = Ê_M^b(n) − ê_M^b(n)
42. Δ̃_M^b(n) = Ẽ_M^b(n) − ẽ_M^b(n)
43. ê_M^{b,i}(n) = Ê_M^b(n) + ŝ_i^b Δ̂_M^b(n), i = 1, 2, 3
44. ẽ_M^{b,i}(n) = Ẽ_M^b(n) + s̃_i^b Δ̃_M^b(n), i = 1, 2, 3
45. α_M(n) = α_{M+1}(n) − 0.5(ê_M^{b,1}(n) k̂_{M+1}^b(n) + ẽ_M^{b,1}(n) k̃_{M+1}^b(n))
46. Δ_M^α(n) = A_M(n) − α_M(n)
47. α_M(n) = A_M(n) + s^α Δ_M^α(n)
48. ε̂_M^{b,i}(n) = ê_M^{b,i}(n)/α_M(n), i = 2, 3
49. ε̃_M^{b,i}(n) = ẽ_M^{b,i}(n)/α_M(n), i = 2, 3
50. ε_M^{b,I}(n) = 0.5(ε̂_M^{b,2}(n) − ε̃_M^{b,2}(n))
51. g_M^{b,1}(n) = ŵ_M(n) ε̃_M^{b,2}(n)
52. g_M^{b,2}(n) = w̃_M(n) ε̂_M^{b,2}(n)
53. d_M^w(n) = ŵ_M(n) − w̃_M(n)
54. g_M^{b,3}(n) = d_M^w(n) ε_M^{b,I}(n)
55. b̂_M(n) = b̂_M(n−1) − g_M^{b,1}(n) − g_M^{b,3}(n)
56. b̃_M(n) = b̃_M(n−1) − g_M^{b,2}(n) − g_M^{b,3}(n)
57. α_M^b(n) = λ α_M^b(n−1) + 0.5(ê_M^{b,3}(n) ε̂_M^{b,3}(n) + ẽ_M^{b,3}(n) ε̃_M^{b,3}(n))

Filtering part:
58. y_1(n) = ĉ_M^T(n−1) x̂_M(n)
59. y_2(n) = c̃_M^T(n−1) x̃_M(n)
60. d_M^c(n−1) = ĉ_M(n−1) − c̃_M(n−1)
61. y_3(n) = d_M^{c,T}(n−1) x_M^R(n)
62. ê_M^c(n) = ẑ(n) − (y_1(n) − y_3(n))
63. ẽ_M^c(n) = z̃(n) − (y_2(n) + y_3(n))
64. ε̂_M^c(n) = ê_M^c(n)/α_M(n)
65. ε̃_M^c(n) = ẽ_M^c(n)/α_M(n)
66. ε_M^{c,I}(n) = 0.5(ε̂_M^c(n) − ε̃_M^c(n))
67. g_M^{c,1}(n) = ŵ_M(n) ε̃_M^c(n)
68. g_M^{c,2}(n) = w̃_M(n) ε̂_M^c(n)
69. g_M^{c,3}(n) = d_M^w(n) ε_M^{c,I}(n)
70. ĉ_M(n) = ĉ_M(n−1) + g_M^{c,1}(n) + g_M^{c,3}(n)
71. c̃_M(n) = c̃_M(n−1) + g_M^{c,2}(n) + g_M^{c,3}(n)

Forward and backward prediction powers α_M^b(−1) and α_M^f(−1) are initialized using an estimate of the input signal power, σ_x². Eqs. (25) define vector partitions.
main vector parameter that is processed by each block. Apart from the 'state' vector variables â_M(n), ã_M(n), b̂_M(n), b̃_M(n), ŵ_M(n), w̃_M(n), ĉ_M(n) and c̃_M(n), four extra local vector variables are used, i.e., d_M^a(n), d_M^b(n), d_M^w(n) and d_M^c(n). These are the differences between the corresponding 'hat' and 'tilde' state parameters, and are defined by Eqs. (3), (29), (53) and (60) of Table 2, respectively. The equations of Table 2 that describe the proposed stabilized FRLS adaptive algorithm can be partitioned to fit into the processing scheme of Fig. 1. Take, for example, the top-left block, characterized by the vector variable â_M(n). Then, Eqs. (1), (5), (7), (10), (13) and (16) of Table 2 may be assigned to this particular processing block.

Fig. 1. Block diagram of the proposed SR FRLS algorithm.
Remark 1. Eq. (29) implies a least-squares filtering interpretation, where the FIR model is parameterized by the transformed parameters vector C_M(n) = ĉ_M(n) − j c̃_M(n) and the desired response signal is set equal to Z(n) = z̃(n) + j ẑ(n). Obviously, the original filter c_M(n) can be recovered from the transformed parameters, as

c_M^R(n) = 0.5(ĉ_M(n) + c̃_M(n)),  (47)
c_M^I(n) = 0.5(ĉ_M(n) − c̃_M(n)).  (48)
In a similar way, the original filtering error, e_M^c(n) = z(n) − y(n), is retrieved from the corresponding transformed filtering errors, as

e_M^{c,R}(n) = 0.5(ê_M^c(n) + ẽ_M^c(n)),  (49)
e_M^{c,I}(n) = 0.5(ê_M^c(n) − ẽ_M^c(n)).  (50)
Remark 2. Despite the fact that the FRLS implements an RLS scheme, the algorithm exhibits poor numerical error performance, which appears even when it is carefully implemented in double precision arithmetic. Severe divergence problems are caused by the intrinsic instability of the FRLS algorithm, and extra care has to be taken to compensate for this undesirable effect. The most promising method for the stabilization of the FRLS algorithm was proposed in [22], which suggested the use of a small amount of computational redundancy, combined with a suitable feedback mechanism, to control the numerical errors. The key quantity is the difference between the instantaneous backward error and its value as determined by the algorithm. It was shown in [20] how this difference can be fed back into the algorithm and considerably improve its numerical performance. Two specific variables are used for this purpose, the a priori backward error e_M^b(n) and the Kalman gain power α_M(n). These are estimated by the complex valued FRLS algorithm as dictated by Eqs. (10) and (14) of Table 1, and defined by

E_M^b(n) = x(n−M) + b_M^H(n−1) x_M(n),
A_M(n) = 1 + w_M^H(n) x_M(n).  (51)

We form the differences

Δ_M^b(n) = E_M^b(n) − e_M^b(n),
Δ_M^α(n) = A_M(n) − α_M(n).  (52)

In infinite precision, these differences should be zero; in practice, however, Δ_M^b(n) grows exponentially while Δ_M^α(n) grows linearly. These errors are utilized as a feedback mechanism to control the error propagation and to provide a new set of variables to be used by the algorithm, as dictated by Eqs. (13) and (17) of Table 1. Tuning variables
s_i^b, i = 1, 2, 3, and s^α are carefully selected so that the overall algorithm remains stable.

In analogy with the complex valued FRLS algorithm, the differences between the backward errors, ẽ_M^b(n) and ê_M^b(n), and the Kalman gain power, α_M(n), and their estimated values have been identified as sensitive stability indicators for the SR FRLS adaptive algorithm of Table 2, where a real valued implementation of the FRLS adaptive algorithm using the strength reduction transform is adopted. These variables are defined as

Ẽ_M^b(n) = x̃(n−M) + b̃_M^T(n−1) x̃_M(n) + d_M^{b,T}(n−1) x_M^R(n),  (53)
Ê_M^b(n) = x̂(n−M) + b̂_M^T(n−1) x̂_M(n) − d_M^{b,T}(n−1) x_M^R(n),  (54)

where d_M^b(n−1) = b̂_M(n−1) − b̃_M(n−1), and

A_M(n) = 1 + 0.5(ŵ_M^T(n) x̂_M(n) + w̃_M^T(n) x̃_M(n)).  (55)

They are alternatively estimated by Eqs. (33), (34) and (45) of Table 2. Once again, we form the differences

Δ̃_M^b(n) = Ẽ_M^b(n) − ẽ_M^b(n),
Δ̂_M^b(n) = Ê_M^b(n) − ê_M^b(n)  (56)

and

Δ_M^α(n) = A_M(n) − α_M(n).  (57)
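The feedback mechanism of Eqs. (43)–(44) of Table 2 simply blends the two redundant evaluations of the same quantity; a scalar sketch (values and names ours):

```python
# E_b: backward error computed through the transversal filter (Table 2, (39)-(40));
# e_b: the same quantity obtained from the Kalman gain (Table 2, (33)-(34)).
# In infinite precision Delta = E_b - e_b is zero; in finite precision the
# difference is fed back with a tuning constant s.

E_b = 0.501
e_b = 0.499
delta_b = E_b - e_b          # stability indicator

def corrected(s):
    """Blended backward error: s = 0 keeps E_b, s = -1 recovers e_b."""
    return E_b + s * delta_b
```

Intermediate choices of s (and the analogous s^α for the Kalman gain power) damp the otherwise exponentially growing indicator Δ^b instead of letting it recirculate unchecked.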
These variables, which should be equal to zero in infinite precision, are used through appropriate feedback to improve the stability behavior of the algorithm, as dictated by Eqs. (43), (44) and (47) of Table 2. Parameters s̃_i^b, ŝ_i^b, i = 1, 2, 3, and s^α are tuning variables that are utilized for the stabilization of the algorithm. The stable performance of the resulting stabilized algorithm is verified via extensive simulations, as demonstrated in Section 5. Notice that the strength reduction technique adopted here implies a close relationship between the backward error indicators utilized by the FRLS algorithm of Table 1 and the SR FRLS
algorithm of Table 2, i.e.,

Δ̂_M^b(n) = Δ_M^{b,R}(n) + Δ_M^{b,I}(n),
Δ̃_M^b(n) = Δ_M^{b,R}(n) − Δ_M^{b,I}(n),  (58)

where Δ_M^{b,R}(n) and Δ_M^{b,I}(n) denote the real and the imaginary part of the variable Δ_M^b(n) defined by Eq. (52).
4.1. Complexity assessment

The computational complexity of the complex valued FRLS algorithm of Table 1 is given by

$C^{classical}_{FRLS} \simeq 8M\,\mathrm{CMUL} + 8M\,\mathrm{CADD} + 2M\,\mathrm{RMUL} + 2M\,\mathrm{RADD}$.   (59)

Here, CMUL, CADD, RMUL and RADD denote the two-operand complex valued multiplication, the two-operand complex valued addition, the two-operand real valued multiplication, and the two-operand real valued addition, respectively. Notice that: (a) multiplication by $\lambda$ is ignored, since it can be realized by a digit shift operation, provided that $\lambda$ is chosen to be a power of two, and (b) real valued divisions are treated as real valued multiplications.

Direct application of the fast complex valued multiplication method to the complex valued FRLS can reduce the number of real multiplications at the expense of an increased number of additions. Consider Eq. (59), which describes the complexity of the original FRLS adaptive algorithm. The equivalent complexity in terms of real valued operations is estimated using the fact that the classical complex valued multiplication requires 4 RMULs and 2 RADDs, while a complex valued addition can be performed with 2 RADDs. Thus, we get

$C^{classical}_{FRLS} = 34M\,\mathrm{RMUL} + 34M\,\mathrm{RADD}$.   (60)
On the other hand, if the direct fast complex valued multiplication is applied, where 3 RMULs and 5 RADDs are engaged, we get

$C^{direct\ SR}_{FRLS} = 26M\,\mathrm{RMUL} + 58M\,\mathrm{RADD}$.   (61)
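For concreteness, the 3-multiplication complex product referenced above can be sketched as follows. This is a minimal illustration of the classic trick; the variable names are ours, not taken from the paper's tables.

```python
# The classic 3-multiplication complex product (a + jb)(c + jd): three real
# multiplications and five real additions/subtractions, instead of the
# classical four multiplications and two additions.
def complex_mul_3rmul(a, b, c, d):
    """Return (real, imag) of (a + jb) * (c + jd) using 3 real multiplies."""
    m1 = c * (a + b)          # RMUL 1, RADD 1
    m2 = a * (d - c)          # RMUL 2, RADD 2
    m3 = b * (c + d)          # RMUL 3, RADD 3
    return m1 - m3, m1 + m2   # RADDs 4 and 5
```

For example, `complex_mul_3rmul(3, 4, 5, 6)` returns `(-9, 38)`, matching $(3+4j)(5+6j) = -9+38j$.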
Comparing Eqs. (60) and (61), we conclude that the direct application of the fast complex valued multiplication reduces the number of real valued multiplications by 23.5%, at the expense of 70.7% extra real valued additions.

The computational complexity of the proposed novel SR FRLS adaptive algorithm of Table 2 is

$C^{proposed\ SR}_{FRLS} \simeq 26M\,\mathrm{RMUL} + 35M\,\mathrm{RADD}$.   (62)

Comparing Eq. (62) with Eq. (60), which gives the equivalent real valued arithmetic complexity of the FRLS algorithm, it is established that the proposed SR FRLS adaptive algorithm requires about 23.5% fewer real valued multiplications, at the expense of 2.9% extra real valued additions. A comparison of the relative computational requirements among all FRLS adaptive algorithms discussed in this paper is provided in Table 3.

The amount of memory required for the implementation of the proposed SR FRLS adaptive algorithm of Table 2 is slightly increased compared to the memory requirements of the direct implementation of the FRLS algorithm of Table 1. Indeed, the complex valued FRLS adaptive scheme of Table 1 requires approximately 12M real valued memory cells, while the proposed real valued SR FRLS adaptive algorithm requires approximately 16M real valued memory cells. This figure can be further reduced to 14M real valued memory cells, at the expense of a small increase in the number of real valued additions (i.e., 37M instead of 35M real valued additions).

Achieving computational savings without sacrificing performance is of primary interest in the design of high-processing-rate adaptive systems. Implementation of the proposed FRLS algorithm
Table 3
Comparison of the relative computational requirements among three different realizations of the complex valued FRLS algorithm

Realization of the FRLS algorithm   Relative RMUL   Relative RADD
Classical                           100%            100%
Direct SR                           77.5%           171%
Proposed SR                         77.5%           103%
on a typical DSP processor with a single-cycle MAC (multiply and accumulate) unit requires 20% fewer cycles per processed sample, compared to the implementation of the original complex valued FRLS scheme. In contrast, if the fast multiplication is applied directly to each complex valued operation, the processing rate deteriorates by a factor of about 70%. When multiple DSP units are utilized for the implementation of the adaptive processing task, the proposed FRLS method results in hardware savings [23]. When implementation of the adaptive algorithm in an ASIC is under consideration, the proposed method could result in about a 20% reduction of the power dissipation and the silicon area of the fabricated circuit, since hardware multiplier units are an order of magnitude more complex than hardware addition/subtraction units [24,9,16].
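The operation counts of Eqs. (59)-(62) and the quoted percentage figures can be cross-checked with a few lines of arithmetic (a sketch; the per-sample counts 8M CMUL, 8M CADD, 2M RMUL and 2M RADD are taken from Eq. (59)):

```python
# Cross-check of the operation counts in Eqs. (59)-(62): per sample the FRLS
# recursions cost 8M CMULs, 8M CADDs, 2M RMULs and 2M RADDs; a CMUL costs
# 4 RMUL + 2 RADD classically, or 3 RMUL + 5 RADD with the fast method.
M = 100
cmul, cadd = 8 * M, 8 * M
classical = (4 * cmul + 2 * M, 2 * cmul + 2 * cadd + 2 * M)   # Eq. (60): (34M, 34M)
direct_sr = (3 * cmul + 2 * M, 5 * cmul + 2 * cadd + 2 * M)   # Eq. (61): (26M, 58M)
proposed = (26 * M, 35 * M)                                   # Eq. (62)
mul_saving = 100 * (classical[0] - proposed[0]) / classical[0]   # about 23.5%
add_penalty = 100 * (proposed[1] - classical[1]) / classical[1]  # about 2.9%
```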
5. Simulation results

The performance of the proposed FRLS adaptive algorithm for complex valued adaptive filtering and system identification is illustrated by means of computer simulations. Two important adaptive filtering applications are considered: (a) FIR system identification, and (b) equalization of a telecommunication channel using a linear equalizer.

5.1. Adaptive FIR system identification

The input time series used in the experiment is a complex valued stationary AR process of order two, i.e., $x(n) = a_1 x(n-1) + a_2 x(n-2) + \eta(n)$. The driving signal $\eta(n)$ is unit variance complex valued white Gaussian noise. This AR process is used as input to an FIR system of order $M = 100$. The filter taps are randomly selected from a complex valued, uniformly distributed random process. At the output of the FIR system, complex valued white Gaussian noise is added, resulting in an SNR of about 30 dB. Thus, the noise-contaminated output of the system is given by

$y(n) = \sum_{i=1}^{100} c_i\, x(n-i+1) + \nu(n)$.   (63)
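A minimal sketch of this experimental setup follows. The AR coefficients $a_1$, $a_2$ are not listed numerically in the text, so illustrative values are assumed:

```python
import numpy as np

# Sketch of the Section 5.1 setup: AR(2) input, unknown order-100 FIR system,
# output observed at roughly 30 dB SNR, as in Eq. (63). The AR coefficients
# a1, a2 below are illustrative assumptions (the text does not list them).
rng = np.random.default_rng(1)
N, M = 5000, 100
a1, a2 = 0.5, -0.3                               # assumed AR(2) parameters
eta = (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)  # unit-variance CWGN
x = np.zeros(N, dtype=complex)
for n in range(2, N):
    x[n] = a1 * x[n - 1] + a2 * x[n - 2] + eta[n]

c = rng.uniform(-1, 1, M) + 1j * rng.uniform(-1, 1, M)  # unknown FIR taps
y_clean = np.convolve(x, c)[:N]                  # y(n) = sum_i c_i x(n-i+1)
p_sig = np.mean(np.abs(y_clean) ** 2)
p_noise = p_sig / 10 ** (30 / 10)                # noise power for a 30 dB SNR
nu = np.sqrt(p_noise / 2) * (rng.normal(size=N) + 1j * rng.normal(size=N))
y = y_clean + nu                                 # noisy observation
```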
The eigenvalue spread of the sampled autocorrelation matrix of the input signal is controlled by the parameters of the AR process. In our case, the AR model parameters are selected so that the resulting condition number of the sampled autocorrelation matrix is $\chi_R = O(10^2)$. The unknown FIR system is estimated using the classical FRLS adaptive algorithm of Table 1 and the SR FRLS adaptive algorithm of Table 2. The exponential forgetting parameter is set equal to $\lambda = 0.997$. The filtering error power for each case is computed by averaging the squared instantaneous filtering errors. Since the input data are stationary, ensemble averaging is replaced by time averaging over an exponentially decaying window with effective memory equal to 128 time instants. Simulation results are presented in Fig. 2, where the mean squared filtering error $E(e^c_M(n)\, e^{c*}_M(n))$ is plotted versus time.

The output of the SR FRLS algorithm is mathematically equivalent to that of the original FRLS algorithm, since the filtering errors $e^{c,R}_M(n)$, $e^{c,I}_M(n)$ and the filter coefficients $c^R_M(n)$, $c^I_M(n)$ can be retrieved from the transformed parameters $\hat e^c_M(n)$, $\tilde e^c_M(n)$ and $\hat c_M(n)$, $\tilde c_M(n)$, as dictated by Eqs. (47)–(50). The numerical equivalence of the two algorithms is demonstrated by computing the difference signals

$\Delta e^{c,R}_M(n) = e^{c,R}_M(n) - 0.5\left(\hat e^c_M(n) + \tilde e^c_M(n)\right)$,   (64)

$\Delta e^{c,I}_M(n) = e^{c,I}_M(n) - 0.5\left(\hat e^c_M(n) - \tilde e^c_M(n)\right)$,   (65)

$\Delta c^R_M(n) = \left\| c^R_M(n) - 0.5\left(\hat c_M(n) + \tilde c_M(n)\right) \right\|_1$,   (66)

$\Delta c^I_M(n) = \left\| c^I_M(n) - 0.5\left(\hat c_M(n) - \tilde c_M(n)\right) \right\|_1$.   (67)

These signals are plotted in Fig. 3. Clearly, the FRLS algorithm and the proposed SR FRLS algorithm are numerically equivalent, since the difference signals are very close to the machine precision constant.
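The SR ("hat"/"tilde") variables are the sum and difference of the real and imaginary parts, and Eqs. (64)-(67) rely on inverting this map. A sketch with a stand-in random complex error signal:

```python
import numpy as np

# The SR transform maps a complex signal to its sum ("hat") and difference
# ("tilde") of real and imaginary parts; the inverse map underlies the
# difference signals of Eqs. (64)-(67).
rng = np.random.default_rng(0)
e = rng.normal(size=8) + 1j * rng.normal(size=8)   # stand-in for e^c_M(n)
e_hat = e.real + e.imag                            # "hat": sum variable
e_til = e.real - e.imag                            # "tilde": difference variable
e_R = 0.5 * (e_hat + e_til)                        # recovered real part
e_I = 0.5 * (e_hat - e_til)                        # recovered imaginary part
assert np.allclose(e_R, e.real) and np.allclose(e_I, e.imag)
```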
Fig. 2. Adaptive FIR system identification: plot of the Mean Squared Error (MSE), $E(e^c_M(n)\, e^{c*}_M(n))$, versus time; (a) convergence behavior and (b) long run behavior.
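The learning curves of Fig. 2 are obtained by smoothing the squared errors with the exponentially decaying time average mentioned above. A sketch, assuming a leak factor of $1 - 1/128$ for an effective memory of 128 samples:

```python
import numpy as np

# Exponentially decaying time average of the squared error; an effective
# memory of `memory` samples corresponds to a leak factor rho = 1 - 1/memory.
def smoothed_mse(err, memory=128):
    rho = 1.0 - 1.0 / memory
    out = np.empty(len(err))
    acc = 0.0
    for n, e in enumerate(err):
        acc = rho * acc + (1.0 - rho) * np.abs(e) ** 2
        out[n] = acc
    return out

# A constant |e(n)| = 1 error converges to 0 dB on the learning curve.
mse_db = 10 * np.log10(smoothed_mse(np.ones(1000)))
```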
Simulation results allow us to conclude that both algorithms exhibit similar stabilization characteristics. The error feedback signals $\Delta^{b,R}_M(n)$, $\Delta^{b,I}_M(n)$ and $\Delta^{\alpha}_M(n)$, utilized by the FRLS algorithm, are depicted in the first column of Fig. 4. The error feedback signals $\hat\Delta^b_M(n)$, $\tilde\Delta^b_M(n)$ and $\Delta^{\alpha}_M(n)$, utilized by the SR FRLS algorithm, are depicted in the second column of Fig. 4. Clearly, both algorithms perform well when feedback stabilization is utilized.

5.2. Adaptive channel equalization

In this experiment, the proposed SR FRLS adaptive algorithm is used for the equalization of a telecommunication channel [2,3,5]. QPSK signalling is considered, and the ISI is simulated by passing the transmitted signal through a discrete-time linear channel. At the receiver, a linear symbol-spaced equalizer is engaged in order to cancel the ISI introduced by the channel. Although linear, symbol-spaced equalizers are considered here, the proposed SR FRLS methodology can be extended to the case of fractionally spaced and/or decision feedback equalizers, where multichannel SR FRLS algorithms can be applied. Several communication channels are considered for the evaluation of the SR FRLS linear equalizer. Following [8], channels $h_1$ and $h_2$ are considered, with impulse responses

$h_1 = \left[0.2194 e^{j\phi_0},\ e^{j\phi_1},\ 0.2194 e^{j\phi_3}\right]^T$
Fig. 3. Comparison between the output of the classical complex valued FRLS and the proposed SR FRLS adaptive algorithm. Difference signals $\Delta e^{c,R}_M(n)$, $\Delta e^{c,I}_M(n)$, $\Delta c^R_M(n)$ and $\Delta c^I_M(n)$ are plotted versus time.
and

$h_2 = \left[0.227 e^{j\phi_0},\ 0.406 e^{j\phi_1},\ 0.688 e^{j\phi_2},\ 0.406 e^{j\phi_3},\ 0.227 e^{j\phi_4}\right]^T$.

Here, the $\phi_i$ are random phases. Moreover, channel $h_3$ is also considered, with impulse response [25]

$h_3 = \left[0.1897 e^{j\phi_0},\ 0.5097 e^{j\phi_1},\ 0.6847 e^{j\phi_2},\ 0.4600 e^{j\phi_3},\ 0.1545 e^{j\phi_4}\right]^T$.

Finally, a time-varying channel $h_4(n)$ is considered, with impulse response given by the first order model [26,27,29]

$h_4(n) = \beta\, h_4(n-1) + q(n)$.

Here, $q(n)$ is a zero mean complex valued random signal having a diagonal covariance matrix, i.e., $E(q(n)\, q^H(n)) = \sigma^2_q I$. The model parameters are set equal to $\beta = 0.9999$, $\sigma^2_q = 0.01$, and $h_4(1) = h_3$. In all cases, additive complex white Gaussian noise is added to the received signal, resulting in an SNR equal to 20 dB.

The stabilized SR FRLS adaptive algorithm is employed for the equalization of all four channels tested. In all cases, feedback stabilization is employed, where the stabilization parameters are all set equal to $\sigma^b_i = \hat\sigma^b_i = \tilde\sigma^b_i = 0.5$, $i = 1, 2, 3$, and $\sigma^{\alpha} = 0.5$. The equalizers are initially trained using a known data sequence consisting of 100 data samples. After the training period, the equalizers are switched to a decision-directed mode. A set of 100 independent trials is run for each experiment, and the ensemble mean squared error (MSE) is estimated for each case. Details of each experiment, such as the equalizer length $M$, the decision delay
Fig. 4. Stability issues. Signals utilized for the error feedback stabilization of the classical complex valued FRLS (first column) and the proposed SR FRLS adaptive algorithm (second column). Notice that all signals of interest are kept at a low level, in the range of the machine precision.
$D$, and the forgetting factor $\lambda$, are summarized below:

Channel $h_1$: $M = 9$,  $D = 5$,  $\lambda = 0.98$;
Channel $h_2$: $M = 21$, $D = 11$, $\lambda = 0.985$;
Channel $h_3$: $M = 21$, $D = 11$, $\lambda = 0.985$;
Channel $h_4$: $M = 21$, $D = 11$, $\lambda = 0.985$.
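A sketch of the channel $h_3$ training experiment under these settings follows. The SR FRLS recursions themselves are not reproduced; a plain least-squares fit over the 100 training symbols stands in for the RLS-trained equalizer, and the random phases are drawn here for illustration:

```python
import numpy as np

# Sketch of the channel-h3 equalization experiment (M = 21, D = 11, 100
# training symbols, 20 dB SNR). A least-squares fit over the training block
# stands in for the SR FRLS-trained linear equalizer.
rng = np.random.default_rng(2)
N, M, D = 2000, 21, 11
sym = ((rng.integers(0, 2, N) * 2 - 1)
       + 1j * (rng.integers(0, 2, N) * 2 - 1)) / np.sqrt(2)   # unit-power QPSK
phi = rng.uniform(0, 2 * np.pi, 5)                            # random phases
h3 = np.array([0.1897, 0.5097, 0.6847, 0.4600, 0.1545]) * np.exp(1j * phi)
r = np.convolve(sym, h3)[:N]                                  # ISI channel output
p_noise = np.mean(np.abs(r) ** 2) / 10 ** (20 / 10)           # 20 dB SNR
r = r + np.sqrt(p_noise / 2) * (rng.normal(size=N) + 1j * rng.normal(size=N))

# M-tap regressor rows [r(n), r(n-1), ..., r(n-M+1)]; desired = delayed symbols
X = np.array([r[n - M + 1:n + 1][::-1] for n in range(M - 1, N)])
d = sym[np.arange(M - 1, N) - D]
w = np.linalg.lstsq(X[:100], d[:100], rcond=None)[0]          # training-mode fit
```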
The learning curve for each experiment is depicted in Fig. 5. Notice the striking difference between the results reported here and those of [8], as far as the number of training symbols is concerned. This is due to the fact that the convergence rate of RLS-type algorithms is independent of the eigenvalue distribution of the input data signals, a property that LMS-type algorithms do not possess. No numerical instability occurred during the simulation of the SR FRLS adaptive equalization algorithm for any of the four channels tested.
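The time-varying channel $h_4(n)$ of the last experiment can be simulated directly from its first order model (a sketch; zero initial phases are assumed, so that $h_4(1) = h_3$ with $\phi_i = 0$):

```python
import numpy as np

# First order Markov model for the time-varying channel:
# h4(n) = beta * h4(n-1) + q(n), with beta = 0.9999 and sigma_q^2 = 0.01,
# as quoted in the text. Zero initial phases are assumed for h4(1) = h3.
rng = np.random.default_rng(3)
beta, sigma_q2 = 0.9999, 0.01
h4 = np.array([0.1897, 0.5097, 0.6847, 0.4600, 0.1545], dtype=complex)
track = []
for n in range(1000):
    q = np.sqrt(sigma_q2 / 2) * (rng.normal(size=5) + 1j * rng.normal(size=5))
    h4 = beta * h4 + q        # channel taps drift as a near-unit-root process
    track.append(h4.copy())
```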
6. Conclusions

A novel implementation of the fast recursive least squares algorithm for complex adaptive filtering has been presented. The proposed method results in a reduced computational complexity adaptive scheme, where all complex valued operations are replaced by real valued counterparts. A set of auxiliary signals and filtering parameters has
been introduced, which are propagated through the algorithm and updated by the pertinent recursive equations, in accordance with the strength reduction real valued arithmetic. In this way, an efficient adaptive scheme has been established, based on novel three-term recursions for the time updating of the pertinent parameters. The proposed algorithm is implemented using real valued
Fig. 5. Adaptive channel equalization: Mean Squared Error plot $E(e^c_M(n)\, e^{c*}_M(n))$ versus time (convergence behavior and long run behavior), for four different communication channels: (a) channel $h_1$; (b) channel $h_2$; (c) channel $h_3$; (d) channel $h_4$.
arithmetic only, while reducing the number of required real valued multiplications by 23.5%, at the expense of a marginal 2.9% increase in the number of real valued additions. This provides a significant advantage over the direct application of a fast complex valued multiplication method to the original FRLS scheme, where the reduction in the number of real valued multiplications is overshadowed by a 70% increase in the required number of additions. The proposed scheme is
suitable for multiprocessor or VLSI ASIC implementation, achieving a reduction in the number of required DSP units, or in the power dissipation and the silicon area of the fabricated circuit. The performance of the proposed FRLS algorithm has been investigated by computer simulations, in the context of adaptive system identification and adaptive channel equalization. Simulation results indicate that both the original complex valued FRLS algorithm and the proposed efficient FRLS implementation using real valued arithmetic attain similar convergence and stabilization properties. Although single-channel complex adaptive filtering has been considered, the proposed SR FRLS methodology can be extended to the case of multichannel complex adaptive filtering [28,27].
Appendix A. SR Kalman gain updating

It remains to show how the Kalman gain parameters are updated using the proposed SR method. Consider the order updating equation of the Kalman gain vector, Eq. (7) of Table 1. Let us define

$w_M(n) = w^R_M(n) + j w^I_M(n)$

and

$k^f_{M+1}(n) = k^{f,R}_{M+1}(n) + j k^{f,I}_{M+1}(n)$,

where $k^{f,R}_{M+1}(n)$ and $k^{f,I}_{M+1}(n)$ represent the real and the imaginary part of the parameter $k^f_{M+1}(n)$. Then, Eq. (7) of Table 1 can be expanded as

$w^R_{M+1}(n) + j w^I_{M+1}(n) = \begin{bmatrix} 0 \\ w^R_M(n-1) \end{bmatrix} + j \begin{bmatrix} 0 \\ w^I_M(n-1) \end{bmatrix} + \begin{bmatrix} 1 \\ a^R_M(n-1) \end{bmatrix} k^{f,R}_{M+1}(n) - \begin{bmatrix} 0 \\ a^I_M(n-1) \end{bmatrix} k^{f,I}_{M+1}(n) + j \left( \begin{bmatrix} 1 \\ a^R_M(n-1) \end{bmatrix} k^{f,I}_{M+1}(n) + \begin{bmatrix} 0 \\ a^I_M(n-1) \end{bmatrix} k^{f,R}_{M+1}(n) \right)$.   (68)

Let us consider the transformed Kalman gain parameters, defined by Eqs. (38) and (39), in conjunction with Eq. (68). Thus, we get

$\hat w_{M+1}(n) = \begin{bmatrix} 0 \\ \hat w_M(n-1) \end{bmatrix} + \begin{bmatrix} k^{f,R}_{M+1}(n) + k^{f,I}_{M+1}(n) \\ \hat a_M(n-1)\, k^{f,R}_{M+1}(n) + \tilde a_M(n-1)\, k^{f,I}_{M+1}(n) \end{bmatrix}$   (69)

and

$\tilde w_{M+1}(n) = \begin{bmatrix} 0 \\ \tilde w_M(n-1) \end{bmatrix} + \begin{bmatrix} k^{f,R}_{M+1}(n) - k^{f,I}_{M+1}(n) \\ \tilde a_M(n-1)\, k^{f,R}_{M+1}(n) - \hat a_M(n-1)\, k^{f,I}_{M+1}(n) \end{bmatrix}$.   (70)

Define the vector

$W_M(n) = \tilde w_M(n) - j \hat w_M(n)$.   (71)

Using Eqs. (69) and (70), Eq. (71) is updated as follows:

$W_{M+1}(n) = \begin{bmatrix} 0 \\ W_M(n) \end{bmatrix} + \begin{bmatrix} 1 - j \\ \tilde a_M(n-1) - j \hat a_M(n-1) \end{bmatrix} \left( k^{f,R}_{M+1}(n) - j k^{f,I}_{M+1}(n) \right)$.   (72)

Finally, application of the strength reduction operation, Eq. (16), results in the following couple of equations:

$\hat w_{M+1}(n) = \begin{bmatrix} 0 \\ \hat w_M(n-1) \end{bmatrix} + \begin{bmatrix} 1 \\ \hat a_M(n-1) \end{bmatrix} \hat k^f_{M+1}(n) - \begin{bmatrix} 0 \\ \hat a_M(n-1) - \tilde a_M(n-1) \end{bmatrix} k^{f,I}_{M+1}(n)$   (73)

and

$\tilde w_{M+1}(n) = \begin{bmatrix} 0 \\ \tilde w_M(n-1) \end{bmatrix} + \begin{bmatrix} 1 \\ \tilde a_M(n-1) \end{bmatrix} \tilde k^f_{M+1}(n) - \begin{bmatrix} 0 \\ \hat a_M(n-1) - \tilde a_M(n-1) \end{bmatrix} k^{f,I}_{M+1}(n)$.   (74)

The auxiliary parameters $\hat a_M(n)$ and $\tilde a_M(n)$ introduced above are related to the forward predictor $a_M(n) = a^R_M(n) + j a^I_M(n)$ as

$\hat a_M(n) = a^R_M(n) + a^I_M(n), \qquad \tilde a_M(n) = a^R_M(n) - a^I_M(n)$.   (75)

Similarly, we get

$\hat k^f_{M+1}(n) = k^{f,R}_{M+1}(n) + k^{f,I}_{M+1}(n), \qquad \tilde k^f_{M+1}(n) = k^{f,R}_{M+1}(n) - k^{f,I}_{M+1}(n)$.   (76)

The order downdating recursion associated with the recursive updating of the Kalman gain vector can be treated in a similar way. Let us consider the dual order updating recursion for the Kalman gain vector, i.e.,

$w_{M+1}(n) = \begin{bmatrix} w_M(n) \\ 0 \end{bmatrix} + \begin{bmatrix} b_M(n-1) \\ 1 \end{bmatrix} k^b_{M+1}(n)$.   (77)

Notice that the above equation is implicitly expressed by Eqs. (8) and (9) of Table 1. Working in a similar way as before, we get the set of recursive equations

$\hat w_{M+1}(n) = \begin{bmatrix} \hat w_M(n) \\ 0 \end{bmatrix} + \begin{bmatrix} \hat b_M(n-1) \\ 1 \end{bmatrix} \hat k^b_{M+1}(n) - \begin{bmatrix} \hat b_M(n-1) - \tilde b_M(n-1) \\ 0 \end{bmatrix} k^{b,I}_{M+1}(n)$   (78)

and

$\tilde w_{M+1}(n) = \begin{bmatrix} \tilde w_M(n) \\ 0 \end{bmatrix} + \begin{bmatrix} \tilde b_M(n-1) \\ 1 \end{bmatrix} \tilde k^b_{M+1}(n) - \begin{bmatrix} \hat b_M(n-1) - \tilde b_M(n-1) \\ 0 \end{bmatrix} k^{b,I}_{M+1}(n)$.   (79)

Here, we have

$\hat b_M(n) = b^R_M(n) + b^I_M(n), \qquad \tilde b_M(n) = b^R_M(n) - b^I_M(n)$,   (80)

where $b^R_M(n)$ and $b^I_M(n)$ represent the real and the imaginary part of the backward predictor $b_M(n)$. Moreover,

$\hat k^b_{M+1}(n) = k^{b,R}_{M+1}(n) + k^{b,I}_{M+1}(n), \qquad \tilde k^b_{M+1}(n) = k^{b,R}_{M+1}(n) - k^{b,I}_{M+1}(n)$,   (81)

where $k^{b,R}_{M+1}(n)$ and $k^{b,I}_{M+1}(n)$ are the real and the imaginary part of the parameter $k^b_{M+1}(n)$. The order updating recursions, as well as the order downdating recursions, that are utilized for the efficient time updating of the Kalman gain vector using the SR transform, are listed in Eqs. (16)–(35) of Table 2. Eqs. (15), (24), (35), (45) and (57) of Table 2, which are used for the computation of the 'power' of the forward predictor, the Kalman gain and the backward predictor, result from applying the SR updating recursions developed so far to the definition equations of the pertinent 'power' parameters.
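The SR order updating pair of Eqs. (73) and (74) can be verified numerically against the complex valued recursion of Eq. (7) (a sketch with random data; note that the last term of the two updates is a single shared product, which is where the multiplication savings come from):

```python
import numpy as np

# Numerical check of the SR order-update pair, Eqs. (73)-(74), against the
# complex valued Kalman gain recursion w_{M+1} = [0; w_M] + [1; a_M] k^f.
rng = np.random.default_rng(4)
M = 6
w = rng.normal(size=M) + 1j * rng.normal(size=M)   # w_M(n-1)
a = rng.normal(size=M) + 1j * rng.normal(size=M)   # a_M(n-1)
kf = complex(rng.normal(), rng.normal())           # k^f_{M+1}(n)

w_next = np.concatenate(([0], w)) + np.concatenate(([1], a)) * kf  # Eq. (7)

# SR variables: hat = real + imag, tilde = real - imag
w_hat, w_til = w.real + w.imag, w.real - w.imag
a_hat, a_til = a.real + a.imag, a.real - a.imag
kf_hat, kf_til = kf.real + kf.imag, kf.real - kf.imag

shared = np.concatenate(([0], a_hat - a_til)) * kf.imag  # product shared by (73), (74)
w_hat_next = np.concatenate(([0], w_hat)) + np.concatenate(([1], a_hat)) * kf_hat - shared
w_til_next = np.concatenate(([0], w_til)) + np.concatenate(([1], a_til)) * kf_til - shared

assert np.allclose(w_hat_next, w_next.real + w_next.imag)
assert np.allclose(w_til_next, w_next.real - w_next.imag)
```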
References

[1] M. Bellanger, Adaptive Digital Filters and Signal Analysis, Marcel Dekker, New York, 1987.
[2] S. Benedetto, E. Biglieri, Principles of Digital Transmission, with Wireless Applications, Kluwer Academic Publishers, Dordrecht, 1999.
[3] S. Haykin, Adaptive Filter Theory, second ed., Prentice-Hall, Englewood Cliffs, NJ, 1991.
[4] N. Kalouptsidis, S. Theodoridis (Eds.), Adaptive System Identification and Signal Processing Algorithms, Prentice-Hall, Englewood Cliffs, NJ, 1993.
[5] J. Proakis, Digital Communications, third ed., McGraw-Hill, New York, 1995.
[6] R. Baghaie, T. Laakso, Implementation of low-power CDMA RAKE receivers using strength reduction transformation, in: Proceedings of the IEEE Nordic Signal Processing Symposium, Denmark, 1998, pp. 169–172.
[7] D. Bull, Efficient IQ filter structure for use in adaptive equalisation, Electron. Lett. 30 (24) (November 1994) 2018–2019.
[8] R. Perry, D. Bull, A. Nix, Efficient adaptive complex filtering algorithm with application to channel equalisation, IEE Proc.-Vis. Image Signal Process. 146 (2) (February 1999) 57–64.
[9] N. Shanbhag, M. Goel, Low-power adaptive filter architectures and their application to 51.84 Mb/s ATM-LAN, IEEE Trans. Signal Process. 45 (5) (May 1997) 1276–1290.
[10] A. Lamagna, Fast computer algebra, IEEE Comput. 15 (9) (September 1982) 43–56.
[11] S. Winograd, Arithmetic Complexity of Computations, SIAM, Philadelphia, 1980.
[12] A. Chandrakasan, M. Potkonjak, R. Mehra, J. Rabaey, R. Brodersen, Optimizing power using transformations, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 14 (1) (January 1995) 12–31.
[13] A. Chandrakasan, R. Brodersen, Minimizing power consumption in digital CMOS circuits, Proc. IEEE 83 (4) (April 1995) 498–523.
[14] K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley, New York, 1999.
[15] P. Pirsch, Architectures for Digital Signal Processing, Wiley, New York, 1996.
[16] N. Shanbhag, Algorithmic transformation techniques for low-power wireless VLSI systems design, Internat. J. Wireless Informat. Networks 5 (2) (1998) 147–171.
[17] G. Carayannis, D. Manolakis, N. Kalouptsidis, A fast sequential algorithm for least-squares filtering and prediction, IEEE Trans. Acoust. Speech Signal Process. 31 (12) (December 1983) 1394–1402.
[18] J. Cioffi, T. Kailath, Fast recursive LS transversal filters for adaptive processing, IEEE Trans. Acoust. Speech Signal Process. 32 (4) (April 1984) 304–337.
[19] A. Wenzler, E. Luder, New structures for complex multipliers and their noise analysis, in: Proceedings of the IEEE International Symposium on Circuits and Systems, ISCAS-1995, Seattle, pp. 1431–1435.
[20] D. Slock, T. Kailath, Numerically stable fast transversal filters for recursive least squares adaptive filtering, IEEE Trans. Signal Process. 39 (1) (January 1991) 92–114.
[21] G. Glentis, K. Berberidis, S. Theodoridis, Efficient least squares adaptive algorithms for FIR transversal filtering: a unified view, IEEE Signal Process. Mag. 16 (4) (April 1999) 13–42.
[22] J. Botto, G. Moustakides, Stabilizing the fast Kalman algorithms, IEEE Trans. Acoust. Speech Signal Process. 37 (9) (September 1989) 1342–1348.
[23] J. Sanchez, H. Barral, Multiprocessor implementation models for adaptive algorithms, IEEE Trans. Signal Process. 44 (9) (September 1996) 2319–2331.
[24] J. Diguet, D. Chillet, O. Sentieys, A framework for high level estimations of signal processing VLSI implementations, J. VLSI Signal Process. Syst. Signal Image Video Technol. 25 (July 2000).
[25] T. Vaidis, C. Weber, Block adaptive techniques for channel identification and data demodulation over band-limited channels, IEEE Trans. Comm. 46 (2) (February 1998) 232–243.
[26] E. Eweda, Comparison of RLS, LMS and Sign algorithms for tracking randomly time-varying channels, IEEE Trans. Signal Process. 42 (11) (November 1994) 2937–2944.
[27] D. Slock, L. Chisci, H. Lev-Ari, T. Kailath, Modular and numerically stable fast transversal filters for multichannel and multiexperiment RLS, IEEE Trans. Signal Process. 40 (4) (April 1992) 784–802.
[28] G. Glentis, N. Kalouptsidis, Fast adaptive algorithms for multichannel filtering and system identification, IEEE Trans. Signal Process. 40 (10) (October 1992) 2433–2458.
[29] L. Lindbom, M. Sternad, A. Ahlen, Tracking of time-varying mobile radio channels—Part I: The Wiener LMS algorithm, IEEE Trans. Comm. 49 (12) (December 2001) 2207–2217.