Efficient fast recursive least-squares adaptive complex filtering using real valued arithmetic

Signal Processing 85 (2005) 1759–1779 www.elsevier.com/locate/sigpro

George-Othon Glentis
Technological Education Institute of Crete, Branch at Chania, Department of Electronics, 3, Romanou Str., Halepa, 73133 Chania, Greece

Received 26 January 2004; received in revised form 29 March 2005

Abstract

In this paper, a novel implementation of the fast recursive least squares (FRLS) algorithm for complex adaptive filtering is presented. Complex signals are treated as pairs of real signals, and operations are re-organized on a real arithmetic basis. A new set of signals and filtering parameters is introduced, expressed either as the sum or as the difference of the real and the imaginary parts of the original variables. The algorithmic strength reduction technique is subsequently applied, resulting in significant computational savings. In this way, an efficient FRLS algorithm is derived, where only two real valued filters are propagated, based on novel three-term recursions for the time updating of the pertinent parameters. The proposed algorithm is implemented using real valued arithmetic only, whilst reducing the number of required real valued multiplication operations by 23.5%, at the expense of a marginal 2.9% increase in the number of real valued additions. The performance of the proposed FRLS algorithm is investigated by computer simulations, in the context of adaptive system identification and adaptive channel equalization.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Adaptive filtering; Fast algorithms; Complex valued arithmetic

Tel.: +30 28210 23003; fax: +30 821 281 90. E-mail address: [email protected].

1. Introduction

Complex valued signals are encountered in many adaptive signal processing applications. Typical examples include the design of adaptive algorithms for channel identification or channel equalization of telecommunication links, the design of adaptive beamformers or sidelobe cancellers in array signal processing, etc. [1–5]. While most existing real valued adaptive algorithms can easily be extended to handle complex valued signals, it was not until recently that attention was paid to the specific implementation of the complex valued operations [6–9]. The fast complex valued multiplication method [10,11] has

doi:10.1016/j.sigpro.2005.03.013


been adopted for the design of efficient digital signal processing algorithms, including digital filtering and adaptive filtering. In the context of VLSI signal processing, the fast complex valued multiplication method has been introduced as an algorithmic strength reduction (SR) transform, and it has been successfully applied to the design of low-power, high-speed adaptive filters and equalizers [12–14,8,15,16]. In this paper, the fast recursive least squares (FRLS) algorithm for complex adaptive filtering is considered and an efficient real valued FRLS scheme is derived. Throughout this paper, the FRLS nomenclature introduced by the FAEST algorithm is adopted [17]. The FRLS FTF counterpart can be handled in a similar way [18]. Direct application of a fast complex valued multiplication method to the original FRLS scheme, where each individual complex multiplication is performed using a fast scheme [10,19,11], can reduce the number of required real multiplications by 23.5%, at the expense of a 70% increase in the required number of additions. To keep the overhead due to the extra additions low, both the input and the desired response signals, as well as all variables associated with the FRLS algorithm, are treated as pairs of real signals, and operations are re-organized on a real arithmetic basis. Auxiliary signals and filter parameters are introduced, which correspond to the generic signal transformation $s_R(n) \pm s_I(n)$, where $s_R(n)$ and $s_I(n)$ represent the real and the imaginary part of the complex valued signal $s(n)$, respectively. The algorithmic strength reduction technique is subsequently applied to the transformed FRLS algorithm [9,16]. In this way, novel real valued three-term recursions are derived for the efficient time update of the FRLS parameters, resulting in significant computational savings.
The proposed FRLS algorithm utilizes real arithmetic only, for the efficient realization of the transversal filtering part as well as of the parameter updating part. The computational complexity of the proposed scheme, measured by the required number of real valued multiplications, is reduced by 23.5%, with a marginal 2.9% increase in the number of real valued additions. Achieving computational savings without sacrificing performance is a task of primary interest in the design of high-speed adaptive systems, where processing power of several GOPS (giga operations per second) is needed. On the other hand, in slow sampling rate adaptive filtering systems, for example in speech processing and hearing aid applications, where special VLSI ASICs are required for small size and low-power implementation, the reduction in computational complexity is of primary interest, since it is directly related to the size and the power consumption of the final design. The performance of the proposed FRLS implementation is investigated by computer simulations, in the context of adaptive system identification and adaptive channel equalization. To avoid numerical instability, error feedback stabilization is utilized [20]. Simulation results indicate that both the original complex valued FRLS algorithm and the proposed efficient FRLS implementation using real valued arithmetic have similar convergence and stabilization properties.

2. Fast RLS adaptive filtering overview

Let us consider a complex valued FIR model described by the difference equation [3]

$$y(n) = \sum_{i=1}^{M} c_i^* x(n-i+1). \qquad (1)$$

Here $x(n)$ is a complex valued input signal and $c_i$, $i = 1, \ldots, M$, are the complex valued filter coefficients. (As for the superscripts throughout this paper, '*' means the complex conjugate, 'T' denotes the vector transpose, and 'H' stands for the Hermitian operator (conjugate and transpose).) The above equation is compactly written as

$$y(n) = \mathbf{c}_M^H \mathbf{x}_M(n). \qquad (2)$$

Vector $\mathbf{x}_M(n)$ is the regressor

$$\mathbf{x}_M(n) = [x(n)\; x(n-1)\; \ldots\; x(n-M+1)]^T \qquad (3)$$

and

$$\mathbf{c}_M = [c_1\; c_2\; \ldots\; c_M]^T \qquad (4)$$

is the filter coefficients vector.
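As a small illustration, the filtering operation of Eq. (2) can be sketched as follows; the coefficient and signal values are arbitrary and chosen only for the example.

```python
import numpy as np

# Complex FIR filtering of Eqs. (1)-(4): the output y(n) is the inner
# product of the conjugated coefficient vector with the regressor x_M(n).
M = 3
c_M = np.array([0.5 + 1.0j, -0.25j, 1.0])      # filter coefficients, Eq. (4)
x_M = np.array([1.0 - 1.0j, 2.0, 0.5 + 0.5j])  # regressor [x(n), x(n-1), x(n-2)], Eq. (3)

y = np.conj(c_M) @ x_M                         # y(n) = c_M^H x_M(n), Eq. (2)
```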


Given a sequence of a complex valued input signal $x(k)$ and a complex valued desired response signal $z(k)$, over a finite observation horizon $k = 0, 1, \ldots, n$, the optimum parameters of the FIR model of Eq. (1) are estimated by minimizing the exponentially forgetting total squared error, i.e.,

$$\mathbf{c}_M(n) : \min_{\mathbf{c}_M}\left(\sum_{k=0}^{n} \lambda^{n-k} e_M^{c*}(k)e_M^c(k)\right), \qquad e_M^c(k) = z(k) - \mathbf{c}_M^H(n)\mathbf{x}_M(k). \qquad (5)$$

Here, $0 < \lambda < 1$ is a parameter that controls the size of the exponentially fading memory of the adaptation mechanism. Minimization of the cost function (5) with respect to the filter coefficients vector results in a set of linear equations, the so-called normal equations

$$\mathbf{R}_M(n)\mathbf{c}_M(n) = \mathbf{d}_M(n). \qquad (6)$$

Matrix $\mathbf{R}_M(n)$ is the sampled autocorrelation matrix of the input signal $x(n)$, defined as

$$\mathbf{R}_M(n) = \sum_{k=0}^{n} \lambda^{n-k}\mathbf{x}_M(k)\mathbf{x}_M^H(k). \qquad (7)$$

Vector $\mathbf{d}_M(n)$ is the sampled cross-correlation vector between the input signal $x(n)$ and the desired response signal $z(n)$, i.e.,

$$\mathbf{d}_M(n) = \sum_{k=0}^{n} \lambda^{n-k}\mathbf{x}_M(k)z^*(k). \qquad (8)$$

The RLS algorithm provides a method for the time updating of the inverse autocorrelation matrix, bypassing the direct matrix inversion imposed by Eq. (6). Indeed, application of the matrix inversion lemma [3, p. 480] to Eqs. (6)–(8) results in the following updating scheme:

$$\epsilon_M^c(n) = \frac{z(n) - \mathbf{c}_M^H(n-1)\mathbf{x}_M(n)}{1 + \mathbf{w}_M^H(n)\mathbf{x}_M(n)}, \qquad (9)$$

$$\mathbf{c}_M(n) = \mathbf{c}_M(n-1) + \mathbf{w}_M(n)\epsilon_M^{c*}(n). \qquad (10)$$

Parameter $\mathbf{w}_M(n)$ that appears in the above equations is the so-called Kalman gain vector, defined as [17,18]

$$\mathbf{w}_M(n) = \lambda^{-1}\mathbf{R}_M^{-1}(n-1)\mathbf{x}_M(n). \qquad (11)$$
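Before the fast $O(M)$ schemes are introduced, the plain $O(M^2)$ recursion behind Eqs. (9)–(11) can be sketched as below. The gain is written in the common normalized form $\mathbf{k} = \mathbf{P}\mathbf{x}/(\lambda + \mathbf{x}^H\mathbf{P}\mathbf{x})$, which is algebraically equivalent to the pair (9), (11); the model order, forgetting factor and simulated signals are illustrative choices, not values from the paper.

```python
import numpy as np

# O(M^2) complex RLS sketch: the inverse correlation matrix P = R_M^{-1}
# is propagated via the matrix inversion lemma, and the filter is updated
# with the conjugated a priori error, as in Eq. (10).
rng = np.random.default_rng(0)
M, lam = 4, 0.99
c_true = rng.normal(size=M) + 1j * rng.normal(size=M)   # unknown system
x = rng.normal(size=400) + 1j * rng.normal(size=400)    # complex input

c = np.zeros(M, dtype=complex)
P = 1e3 * np.eye(M)                     # large initial R^{-1}
for n in range(M - 1, len(x)):
    xv = x[n - M + 1:n + 1][::-1]       # regressor x_M(n), Eq. (3)
    z = np.conj(c_true) @ xv            # noiseless desired response
    g = P @ xv
    k = g / (lam + np.conj(xv) @ g)     # normalized Kalman gain
    e = z - np.conj(c) @ xv             # a priori filtering error
    c = c + k * np.conj(e)              # coefficient update, Eq. (10)
    P = (P - np.outer(k, np.conj(xv) @ P)) / lam

err = np.max(np.abs(c - c_true))        # converges to the true filter
```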


According to the RLS algorithm, the inverse matrix $\mathbf{R}_M^{-1}(n-1)$ can be updated using an $O(M^2)$ recursive scheme [3]. Fast RLS algorithms go a step further, using fast $O(M)$ adaptive schemes for the time update of the Kalman gain vector, bypassing the need for the inverse matrix $\mathbf{R}_M^{-1}(n-1)$ adaptation [17,18,21,4]. The key point in the development of fast RLS algorithms is the proper utilization of the shift invariance property of the data matrix. Indeed, the increased order data vector $\mathbf{x}_{M+1}(n)$ can be partitioned either in a lower or in an upper form, as

$$\mathbf{x}_{M+1}(n) = \begin{bmatrix} \mathbf{x}_M(n) \\ x(n-M) \end{bmatrix} = \begin{bmatrix} x(n) \\ \mathbf{x}_M(n-1) \end{bmatrix}. \qquad (12)$$

These partitions of the data vector are utilized to obtain appropriate partitions for the increased order autocorrelation matrix $\mathbf{R}_{M+1}(n)$. It has been shown [17,18] that the Kalman gain vector $\mathbf{w}_M(n-1)$ at time $n-1$ is related to the Kalman gain vector at time $n$, as follows from the upper and the lower form solution of the increased order system, Eq. (11). The fast RLS algorithm for complex adaptive filtering is tabulated in Table 1. Auxiliary vectors $\mathbf{a}_M(n)$ and $\mathbf{b}_M(n)$ are the least squares forward and backward predictors, and result from the general filtering scheme of Eq. (1) by setting $z(n) = x(n+1)$ and $z(n) = x(n-M)$, respectively. The intrinsic numerical instability of the FRLS algorithm is treated using the error feedback stabilization method, originally proposed in [22,20]. Two specific variables are used for this purpose: the a priori backward prediction error $e_M^b(n)$ and the Kalman gain power $\alpha_M(n)$. These are estimated by the algorithm as dictated by Eqs. (10) and (14) of Table 1, respectively. At the same time, these parameters are re-estimated using Eqs. (11) and (15) of Table 1. The error signals $\Delta_M^b(n)$ and $\Delta_M^\alpha(n)$, corresponding to the backward prediction error and the Kalman gain power, are utilized as a feedback mechanism to control the error propagation and provide a new set of variables to be used by the algorithm, as dictated by Eqs. (13) and (17) of Table 1. Tuning variables $\sigma^b$ and $\sigma^\alpha$ are carefully selected so that

Table 1
The stabilized complex valued fast RLS (FRLS) adaptive algorithm

Initialization:
$\mathbf{c}_M(-1) = \mathbf{0}$; $\mathbf{w}_M(-1) = \mathbf{0}$; $\mathbf{a}_M(-1) = \mathbf{0}$; $\mathbf{b}_M(-1) = \mathbf{0}$; $\alpha_M(-1) = 1$; $\alpha_M^f(-1) = \lambda^M\alpha_M^b(-1)$; $\alpha_M^b(-1) = \sigma_x^2$

Forward prediction part:
1. $e_M^f(n) = x(n) + \mathbf{a}_M^H(n-1)\mathbf{x}_M(n-1)$
2. $\epsilon_M^f(n) = e_M^f(n)/\alpha_M(n-1)$
3. $\mathbf{a}_M(n) = \mathbf{a}_M(n-1) - \mathbf{w}_M(n-1)\epsilon_M^{f*}(n)$
4. $\alpha_M^f(n) = \lambda\alpha_M^f(n-1) + e_M^f(n)\epsilon_M^{f*}(n)$
5. $k_{M+1}^f(n) = \lambda^{-1}e_M^f(n)/\alpha_M^f(n-1)$
6. $\alpha_{M+1}(n) = \alpha_M(n-1) + e_M^f(n)k_{M+1}^{f*}(n)$

Kalman gain part:
7. $\mathbf{w}_{M+1}(n) = \begin{bmatrix} 0 \\ \mathbf{w}_M(n-1) \end{bmatrix} + k_{M+1}^f(n)\begin{bmatrix} 1 \\ \mathbf{a}_M(n-1) \end{bmatrix}$
8. $\mathbf{w}_{M+1}(n) = \begin{bmatrix} \mathbf{q}_M(n) \\ k_{M+1}^b(n) \end{bmatrix}$
9. $\mathbf{w}_M(n) = \mathbf{q}_M(n) - \mathbf{b}_M(n-1)k_{M+1}^b(n)$

Backward prediction part:
10. $e_M^b(n) = \lambda\alpha_M^b(n-1)k_{M+1}^b(n)$
11. $E_M^b(n) = x(n-M) + \mathbf{b}_M^H(n-1)\mathbf{x}_M(n)$
12. $\Delta_M^b(n) = E_M^b(n) - e_M^b(n)$
13. $e_M^{b,i}(n) = E_M^b(n) + \sigma_i^b\Delta_M^b(n)$, $i = 1, 2, 3$
14. $\alpha_M(n) = \alpha_{M+1}(n) - e_M^{b,1}(n)k_{M+1}^{b*}(n)$
15. $A_M(n) = 1 + \mathbf{w}_M^H(n)\mathbf{x}_M(n)$
16. $\Delta_M^\alpha(n) = A_M(n) - \alpha_M(n)$
17. $\alpha_M(n) = A_M(n) + \sigma^\alpha\Delta_M^\alpha(n)$
18. $\epsilon_M^{b,i}(n) = e_M^{b,i}(n)/\alpha_M(n)$, $i = 2, 3$
19. $\mathbf{b}_M(n) = \mathbf{b}_M(n-1) - \mathbf{w}_M(n)\epsilon_M^{b,2*}(n)$
20. $\alpha_M^b(n) = \lambda\alpha_M^b(n-1) + e_M^{b,3}(n)\epsilon_M^{b,3*}(n)$

Filtering part:
21. $e_M^c(n) = z(n) - \mathbf{c}_M^H(n-1)\mathbf{x}_M(n)$
22. $\epsilon_M^c(n) = e_M^c(n)/\alpha_M(n)$
23. $\mathbf{c}_M(n) = \mathbf{c}_M(n-1) + \mathbf{w}_M(n)\epsilon_M^{c*}(n)$

Forward and backward prediction powers $\alpha_M^b(-1)$ and $\alpha_M^f(-1)$ are initialized using an estimate of the input signal power, $\sigma_x^2$. Eq. (8) defines a vector partition.

the overall algorithm remains stable. An additional constraint of the stabilized fast RLS algorithms is the nominal range within which the forgetting factor $\lambda$ should lie in order for the algorithm to remain stable; as was explained in [18], $\lambda \in (1 - 1/3M,\, 1)$ [20].

3. Algorithmic strength reduction

Complex valued arithmetic is required for the implementation of the FRLS algorithm when the input signals and/or the filter model are represented by complex valued variables. Complex valued addition is realized by a set of two real valued additions, i.e.,

$$(a + jb) + (c + jd) = (a + c) + j(b + d). \qquad (13)$$

Complex valued multiplication can be realized by the classic method, which requires four real valued multiplications and two real valued additions, i.e.,

$$(a + jb)(c + jd) = (ac - bd) + j(ad + bc). \qquad (14)$$
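The trade between multiplications and additions can be seen in a small sketch; the function names are ours, and the fast variant follows the first of the two strength-reduced schemes given below.

```python
# Classic complex multiplication, Eq. (14): 4 real mults, 2 real adds.
def cmul_classic(a, b, c, d):
    return a * c - b * d, a * d + b * c

# Strength-reduced complex multiplication: the product (a - b)*c is
# shared by both output parts, so only 3 real mults are needed,
# at the cost of 5 real adds.
def cmul_fast(a, b, c, d):
    t1 = b * (c - d)
    t2 = (a - b) * c
    t3 = a * (c + d)
    return t1 + t2, t3 - t2   # (ac - bd, ad + bc)
```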

Alternatively, the fast complex valued multiplication method can be applied, where the inherent dependencies of the partial products and sums are utilized to reduce the number of real valued multiplication operations, at the expense of some extra real valued additions [10,14,15,19,11]. In this case, three real valued multiplications and five real valued additions are required. Among others, two possible implementations of the fast complex valued multiplication are described by the following equations [19]:

$$ac - bd = b(c - d) + (a - b)c, \qquad ad + bc = a(c + d) - (a - b)c \qquad (15)$$

and

$$ac - bd = -(a + b)d + a(c + d), \qquad ad + bc = b(c - d) + (a + b)d. \qquad (16)$$

Efficient signal processing algorithms for digital filtering and adaptive filtering have been developed by taking into account the reduced complexity fast complex valued multiplication method [6–9]. In the context of VLSI signal processing, transformations are modifications to the computational structure of a given algorithm, in a manner such that the input–output behavior is preserved. The application of the fast complex valued multiplication method belongs to a general class of algorithmic transformations, and it is known as the algorithmic strength reduction transform. It has been successfully applied to the design of low-power, high-speed adaptive filters and equalizers [12,13,16]. Direct implementation of the fast complex valued multiplication method in the FRLS scheme can reduce the required number of real multiplication operations. However, although the number of real valued multiplications is reduced, there is a significant increase in the number of real valued additions and in the total number of real valued operations performed by the algorithm. Further improvement in the number of required real valued additions can be achieved by algorithm re-organization, using a set of auxiliary signals and filter parameters that are propagated through the algorithm and are updated by the pertinent recursive equations, in accordance with the strength reduction real valued arithmetic imposed by Eqs. (15) and (16).

4. The SR FRLS adaptive algorithm

Application of the algorithmic strength reduction transform, described by Eqs. (15) and (16), to the FRLS adaptive algorithm results in an equivalent algorithmic description, called the strength reduced (SR) FRLS adaptive algorithm, where only real valued arithmetic is required. The main features of the SR FRLS algorithm are

the introduction of a set of real valued auxiliary signals that are fed into the transversal filters and are propagated through the algorithm, and the use of a set of real valued, modified filter parameters that are updated instead of the original ones, using novel three-term recursions. As a result, a minimum number of operations, i.e., real valued multiplications and additions, is attained.


Let us consider the complex filtering setup dictated by Eqs. (1)–(4). The input signal $x(n)$ is expressed in terms of a real and an imaginary part as $x(n) = x_R(n) + jx_I(n)$ (throughout this paper, subscripts R and I denote the real and the imaginary parts of a scalar, complex valued variable, respectively). As a consequence, the corresponding complex regressor vector $\mathbf{x}_M(n)$ is written as

$$\mathbf{x}_M(n) = \mathbf{x}_M^R(n) + j\mathbf{x}_M^I(n), \qquad (17)$$

where vectors $\mathbf{x}_M^R(n)$ and $\mathbf{x}_M^I(n)$ denote the real and the imaginary part of $\mathbf{x}_M(n)$, respectively, i.e.,

$$\mathbf{x}_M^R(n) = [x_R(n)\; x_R(n-1)\; \ldots\; x_R(n-M+1)]^T \qquad (18)$$

and

$$\mathbf{x}_M^I(n) = [x_I(n)\; x_I(n-1)\; \ldots\; x_I(n-M+1)]^T \qquad (19)$$

(superscripts R and I denote the real and the imaginary parts of a vector, complex valued variable, respectively). In a similar way, the filter coefficients complex vector $\mathbf{c}_M(n)$ is expressed as

$$\mathbf{c}_M(n) = \mathbf{c}_M^R(n) + j\mathbf{c}_M^I(n). \qquad (20)$$

Using Eqs. (17) and (20), the filter output at time $n$, i.e., $y(n) = \mathbf{c}_M^H(n-1)\mathbf{x}_M(n)$, is expanded as

$$y(n) = \mathbf{c}_M^{R,T}(n-1)\mathbf{x}_M^R(n) + \mathbf{c}_M^{I,T}(n-1)\mathbf{x}_M^I(n) + j\big(\mathbf{c}_M^{R,T}(n-1)\mathbf{x}_M^I(n) - \mathbf{c}_M^{I,T}(n-1)\mathbf{x}_M^R(n)\big). \qquad (21)$$

Let us consider a set of auxiliary filter output signals, defined as

$$\hat{y}(n) = y_R(n) + y_I(n), \qquad (22)$$
$$\tilde{y}(n) = y_R(n) - y_I(n). \qquad (23)$$

Using Eq. (21), variables $\hat{y}(n)$ and $\tilde{y}(n)$ are estimated as

$$\hat{y}(n) = \tilde{\mathbf{c}}_M^T(n-1)\mathbf{x}_M^R(n) + \hat{\mathbf{c}}_M^T(n-1)\mathbf{x}_M^I(n), \qquad (24)$$
$$\tilde{y}(n) = \hat{\mathbf{c}}_M^T(n-1)\mathbf{x}_M^R(n) - \tilde{\mathbf{c}}_M^T(n-1)\mathbf{x}_M^I(n). \qquad (25)$$

Here, vectors $\hat{\mathbf{c}}_M(n)$ and $\tilde{\mathbf{c}}_M(n)$ are transformed versions of the original filter coefficient vector $\mathbf{c}_M(n)$. They are defined as

$$\hat{\mathbf{c}}_M(n) = \mathbf{c}_M^R(n) + \mathbf{c}_M^I(n), \qquad (26)$$
$$\tilde{\mathbf{c}}_M(n) = \mathbf{c}_M^R(n) - \mathbf{c}_M^I(n). \qquad (27)$$

Consider the complex variable

$$Y(n) = \tilde{y}(n) + j\hat{y}(n). \qquad (28)$$

It is computed using Eqs. (24) and (25) as

$$Y(n) = (\hat{\mathbf{c}}_M(n-1) + j\tilde{\mathbf{c}}_M(n-1))^T(\mathbf{x}_M^R(n) + j\mathbf{x}_M^I(n)). \qquad (29)$$

Application of the SR transform, Eq. (15), to the above equation results in

$$\hat{y}(n) = \hat{\mathbf{c}}_M^T(n-1)\hat{\mathbf{x}}_M(n) - (\hat{\mathbf{c}}_M(n-1) - \tilde{\mathbf{c}}_M(n-1))^T\mathbf{x}_M^R(n), \qquad (30)$$
$$\tilde{y}(n) = \tilde{\mathbf{c}}_M^T(n-1)\tilde{\mathbf{x}}_M(n) + (\hat{\mathbf{c}}_M(n-1) - \tilde{\mathbf{c}}_M(n-1))^T\mathbf{x}_M^R(n). \qquad (31)$$

The auxiliary regressors $\hat{\mathbf{x}}_M(n)$ and $\tilde{\mathbf{x}}_M(n)$ that appear above are compatibly defined as

$$\hat{\mathbf{x}}_M(n) = \mathbf{x}_M^R(n) + \mathbf{x}_M^I(n), \qquad (32)$$
$$\tilde{\mathbf{x}}_M(n) = \mathbf{x}_M^R(n) - \mathbf{x}_M^I(n). \qquad (33)$$
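A minimal numeric check of the transformed filtering equations, with randomly chosen illustrative data:

```python
import numpy as np

# Strength-reduced filtering of Eqs. (30)-(33): the pair
# (y_hat, y_til) = (yR + yI, yR - yI) is formed from 3 length-M inner
# products instead of the 4 needed by the direct expansion of Eq. (21).
rng = np.random.default_rng(1)
M = 8
c = rng.normal(size=M) + 1j * rng.normal(size=M)    # filter c_M(n-1)
x = rng.normal(size=M) + 1j * rng.normal(size=M)    # regressor x_M(n)

c_hat, c_til = c.real + c.imag, c.real - c.imag      # Eqs. (26)-(27)
x_hat, x_til = x.real + x.imag, x.real - x.imag      # Eqs. (32)-(33)

d = (c_hat - c_til) @ x.real                         # shared inner product
y_hat = c_hat @ x_hat - d                            # Eq. (30)
y_til = c_til @ x_til + d                            # Eq. (31)

# Reference: direct complex filtering output y(n) = c^H x, Eq. (2).
y = np.conj(c) @ x
```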

Eqs. (30) and (31) provide a cost effective way of computing variables $\hat{y}(n)$ and $\tilde{y}(n)$, using $3M$ real valued multiplications and $4M-1$ real valued additions. On the contrary, evaluation of either $\hat{y}(n)$ and $\tilde{y}(n)$ using Eqs. (24) and (25), or of $y(n)$ using Eq. (21), requires $4M$ real valued multiplications and $4M-2$ real valued additions. Thus, working with the transformed variables $\hat{y}(n)$ and $\tilde{y}(n)$ instead of the original parameter $y(n)$ results in a significant reduction in the required number of real valued multiplications. On the contrary, if the SR technique is applied on an individual basis, for each complex valued multiplication involved in the inner product computation $y(n) = \mathbf{c}_M^H(n-1)\mathbf{x}_M(n)$, $3M$ real valued multiplications and $7M$ real valued additions are utilized. Estimation of the complex convolution $y(n) = \mathbf{c}_M^H(n-1)\mathbf{x}_M(n)$ using the SR technique was originally introduced in the context of LMS adaptive filtering [9], where $y_R(n)$ and $y_I(n)$ were computed in an efficient way. Here, we approach


the subject in a slightly different way, estimating $\hat{y}(n)$ and $\tilde{y}(n)$ instead of $y_R(n)$ and $y_I(n)$. Of course, the latter variables can be retrieved from the former by applying the inverse transformation implied by Eqs. (22) and (23). Since $\hat{y}(n)$ and $\tilde{y}(n)$ are estimated using the transformed filters $\hat{\mathbf{c}}_M(n)$ and $\tilde{\mathbf{c}}_M(n)$, it could be more efficient if these variables were updated directly, instead of updating the original $\mathbf{c}_M(n)$. Using Eq. (23) of Table 1, we get a couple of recursive equations of the form

$$\mathbf{c}_M^R(n) = \mathbf{c}_M^R(n-1) + \mathbf{w}_M^R(n)\epsilon_M^{c,R}(n) + \mathbf{w}_M^I(n)\epsilon_M^{c,I}(n), \qquad (34)$$
$$\mathbf{c}_M^I(n) = \mathbf{c}_M^I(n-1) - \mathbf{w}_M^R(n)\epsilon_M^{c,I}(n) + \mathbf{w}_M^I(n)\epsilon_M^{c,R}(n). \qquad (35)$$

If we insert the above recursions into Eqs. (26) and (27), we get

$$\hat{\mathbf{c}}_M(n) = \hat{\mathbf{c}}_M(n-1) + (\hat{\mathbf{w}}_M(n)\epsilon_M^{c,R}(n) - \tilde{\mathbf{w}}_M(n)\epsilon_M^{c,I}(n)), \qquad (36)$$
$$\tilde{\mathbf{c}}_M(n) = \tilde{\mathbf{c}}_M(n-1) + (\tilde{\mathbf{w}}_M(n)\epsilon_M^{c,R}(n) + \hat{\mathbf{w}}_M(n)\epsilon_M^{c,I}(n)). \qquad (37)$$

Here, the transformed Kalman gain vectors $\hat{\mathbf{w}}_M(n)$ and $\tilde{\mathbf{w}}_M(n)$ are defined as

$$\hat{\mathbf{w}}_M(n) = \mathbf{w}_M^R(n) + \mathbf{w}_M^I(n), \qquad (38)$$
$$\tilde{\mathbf{w}}_M(n) = \mathbf{w}_M^R(n) - \mathbf{w}_M^I(n). \qquad (39)$$

Consider the complex valued transformed coefficients vector defined as

$$\mathbf{C}_M(n) = \hat{\mathbf{c}}_M(n) - j\tilde{\mathbf{c}}_M(n). \qquad (40)$$

It is updated using Eqs. (36) and (37) as

$$\mathbf{C}_M(n) = \mathbf{C}_M(n-1) + (\hat{\mathbf{w}}_M(n) - j\tilde{\mathbf{w}}_M(n))(\epsilon_M^{c,R}(n) - j\epsilon_M^{c,I}(n)). \qquad (41)$$

Finally, application of the algorithmic strength reduction transform, Eq. (16), to the above equation results in

$$\hat{\mathbf{c}}_M(n) = \hat{\mathbf{c}}_M(n-1) + \hat{\mathbf{w}}_M(n)\tilde{\epsilon}_M^c(n) + (\hat{\mathbf{w}}_M(n) - \tilde{\mathbf{w}}_M(n))\epsilon_M^{c,I}(n), \qquad (42)$$
$$\tilde{\mathbf{c}}_M(n) = \tilde{\mathbf{c}}_M(n-1) + \tilde{\mathbf{w}}_M(n)\hat{\epsilon}_M^c(n) + (\hat{\mathbf{w}}_M(n) - \tilde{\mathbf{w}}_M(n))\epsilon_M^{c,I}(n). \qquad (43)$$

In the above recursions, a set of transformed filtering error variables has been introduced according to the scheme

$$\hat{e}_M^c(n) = \hat{z}(n) - \hat{y}(n), \qquad (44)$$
$$\tilde{e}_M^c(n) = \tilde{z}(n) - \tilde{y}(n), \qquad (45)$$

where

$$\hat{z}(n) = z_R(n) + z_I(n), \qquad \tilde{z}(n) = z_R(n) - z_I(n). \qquad (46)$$

The computational complexity of Eqs. (42) and (43) is $3M$ real valued multiplications and $5M$ real valued additions. This figure should be compared with the computational complexity of the original adaptive scheme implied by Eqs. (34) and (35), which is either $4M$ real valued multiplications and $4M$ real valued additions, or $3M$ real valued multiplications and $7M$ real valued additions, depending on the type of complex valued multiplication (i.e., classical or fast, applied elementwise) that is adopted. Clearly, the proposed recursions provide the best low complexity compromise among all the competitive schemes. The SR transform can be applied to the forward and the backward prediction parameters that are involved in the computational flow of the FRLS algorithm of Table 1. Indeed, fast, three-term recursions can be derived for the efficient updating of the transformed forward prediction parameters, implied by Eqs. (1)–(4) of Table 1, as well as of the transformed backward prediction parameters, resulting from Eqs. (10)–(20) of Table 1. The SR FRLS formulas for updating the transformed forward and backward parameters are tabulated in Table 2, Eqs. (1)–(15) and (36)–(57), respectively. These reduced complexity recursions result from the general filtering scheme discussed so far, by setting $\hat{z}(n) = \hat{x}(n+1)$, $\tilde{z}(n) = \tilde{x}(n+1)$, and $\hat{z}(n) = \hat{x}(n-M)$, $\tilde{z}(n) = \tilde{x}(n-M)$, respectively. Finally, the derivation of the SR Kalman gain updating scheme is provided by Eqs. (16)–(35) of Table 2. The proof of the pertinent SR recursions is supplied in the Appendix. The signal flow chart of the proposed FRLS algorithm is presented in Fig. 1. Twelve basic processing blocks are utilized, named after the

Table 2
Real valued implementation of the stabilized FRLS adaptive algorithm, using the strength reduction transform (SR FRLS)

Initialization:
$\hat{\mathbf{c}}_M(-1) = \mathbf{0}$; $\tilde{\mathbf{c}}_M(-1) = \mathbf{0}$; $\hat{\mathbf{w}}_M(-1) = \mathbf{0}$; $\tilde{\mathbf{w}}_M(-1) = \mathbf{0}$; $\hat{\mathbf{a}}_M(-1) = \mathbf{0}$; $\tilde{\mathbf{a}}_M(-1) = \mathbf{0}$; $\hat{\mathbf{b}}_M(-1) = \mathbf{0}$; $\tilde{\mathbf{b}}_M(-1) = \mathbf{0}$; $\delta\mathbf{w}_M(-1) = \mathbf{0}$; $\alpha_M(-1) = 1$; $\alpha_M^f(-1) = \lambda^M\alpha_M^b(-1)$; $\alpha_M^b(-1) = \sigma_x^2$
$\hat{x}(n) = x_R(n) + x_I(n)$; $\tilde{x}(n) = x_R(n) - x_I(n)$; $\hat{z}(n) = z_R(n) + z_I(n)$; $\tilde{z}(n) = z_R(n) - z_I(n)$

Forward prediction part:
1. $\xi_1^f(n) = \hat{\mathbf{a}}_M^T(n-1)\hat{\mathbf{x}}_M(n-1)$
2. $\xi_2^f(n) = \tilde{\mathbf{a}}_M^T(n-1)\tilde{\mathbf{x}}_M(n-1)$
3. $\delta\mathbf{a}_M(n-1) = \hat{\mathbf{a}}_M(n-1) - \tilde{\mathbf{a}}_M(n-1)$
4. $\xi_3^f(n) = \delta\mathbf{a}_M^T(n-1)\mathbf{x}_M^R(n-1)$
5. $\hat{e}_M^f(n) = \hat{x}(n) + (\xi_1^f(n) - \xi_3^f(n))$
6. $\tilde{e}_M^f(n) = \tilde{x}(n) + (\xi_2^f(n) + \xi_3^f(n))$
7. $\hat{\epsilon}_M^f(n) = \hat{e}_M^f(n)/\alpha_M(n-1)$
8. $\tilde{\epsilon}_M^f(n) = \tilde{e}_M^f(n)/\alpha_M(n-1)$
9. $\epsilon_M^{f,I}(n) = 0.5(\hat{\epsilon}_M^f(n) - \tilde{\epsilon}_M^f(n))$
10. $\gamma_M^{f,1}(n) = \hat{\mathbf{w}}_M(n-1)\tilde{\epsilon}_M^f(n)$
11. $\gamma_M^{f,2}(n) = \tilde{\mathbf{w}}_M(n-1)\hat{\epsilon}_M^f(n)$
12. $\gamma_M^{f,3}(n) = \delta\mathbf{w}_M(n-1)\epsilon_M^{f,I}(n)$
13. $\hat{\mathbf{a}}_M(n) = \hat{\mathbf{a}}_M(n-1) - \gamma_M^{f,1}(n) - \gamma_M^{f,3}(n)$
14. $\tilde{\mathbf{a}}_M(n) = \tilde{\mathbf{a}}_M(n-1) - \gamma_M^{f,2}(n) - \gamma_M^{f,3}(n)$
15. $\alpha_M^f(n) = \lambda\alpha_M^f(n-1) + 0.5(\hat{e}_M^f(n)\hat{\epsilon}_M^f(n) + \tilde{e}_M^f(n)\tilde{\epsilon}_M^f(n))$

Kalman gain part:
16. $\hat{k}_{M+1}^f(n) = \lambda^{-1}\hat{e}_M^f(n)/\alpha_M^f(n-1)$
17. $\tilde{k}_{M+1}^f(n) = \lambda^{-1}\tilde{e}_M^f(n)/\alpha_M^f(n-1)$
18. $\gamma_{M+1}^{wf,1}(n) = \hat{k}_{M+1}^f(n)\begin{bmatrix} 1 \\ \hat{\mathbf{a}}_M(n-1) \end{bmatrix}$
19. $\gamma_{M+1}^{wf,2}(n) = \tilde{k}_{M+1}^f(n)\begin{bmatrix} 1 \\ \tilde{\mathbf{a}}_M(n-1) \end{bmatrix}$
20. $k_{M+1}^{f,I}(n) = 0.5(\hat{k}_{M+1}^f(n) - \tilde{k}_{M+1}^f(n))$
21. $\gamma_{M+1}^{wf,3}(n) = k_{M+1}^{f,I}(n)\begin{bmatrix} 0 \\ \delta\mathbf{a}_M(n-1) \end{bmatrix}$
22. $\hat{\mathbf{w}}_{M+1}(n) = \begin{bmatrix} 0 \\ \hat{\mathbf{w}}_M(n-1) \end{bmatrix} + \gamma_{M+1}^{wf,1}(n) - \gamma_{M+1}^{wf,3}(n)$
23. $\tilde{\mathbf{w}}_{M+1}(n) = \begin{bmatrix} 0 \\ \tilde{\mathbf{w}}_M(n-1) \end{bmatrix} + \gamma_{M+1}^{wf,2}(n) - \gamma_{M+1}^{wf,3}(n)$
24. $\alpha_{M+1}(n) = \alpha_M(n-1) + 0.5(\hat{k}_{M+1}^f(n)\hat{e}_M^f(n) + \tilde{k}_{M+1}^f(n)\tilde{e}_M^f(n))$
25. $\hat{\mathbf{w}}_{M+1}(n) = \begin{bmatrix} \hat{\mathbf{q}}_M(n) \\ \hat{k}_{M+1}^b(n) \end{bmatrix}$; $\tilde{\mathbf{w}}_{M+1}(n) = \begin{bmatrix} \tilde{\mathbf{q}}_M(n) \\ \tilde{k}_{M+1}^b(n) \end{bmatrix}$
26. $\gamma_M^{wb,1}(n) = \hat{\mathbf{b}}_M(n-1)\hat{k}_{M+1}^b(n)$
27. $\gamma_M^{wb,2}(n) = \tilde{\mathbf{b}}_M(n-1)\tilde{k}_{M+1}^b(n)$
28. $k_{M+1}^{b,I}(n) = 0.5(\hat{k}_{M+1}^b(n) - \tilde{k}_{M+1}^b(n))$
29. $\delta\mathbf{b}_M(n-1) = \hat{\mathbf{b}}_M(n-1) - \tilde{\mathbf{b}}_M(n-1)$
30. $\gamma_M^{wb,3}(n) = \delta\mathbf{b}_M(n-1)k_{M+1}^{b,I}(n)$
31. $\hat{\mathbf{w}}_M(n) = \hat{\mathbf{q}}_M(n) - \gamma_M^{wb,1}(n) + \gamma_M^{wb,3}(n)$
32. $\tilde{\mathbf{w}}_M(n) = \tilde{\mathbf{q}}_M(n) - \gamma_M^{wb,2}(n) + \gamma_M^{wb,3}(n)$
33. $\hat{e}_M^b(n) = \lambda\alpha_M^b(n-1)\hat{k}_{M+1}^b(n)$
34. $\tilde{e}_M^b(n) = \lambda\alpha_M^b(n-1)\tilde{k}_{M+1}^b(n)$
35. $A_M(n) = 1 + 0.5(\hat{\mathbf{w}}_M^T(n)\hat{\mathbf{x}}_M(n) + \tilde{\mathbf{w}}_M^T(n)\tilde{\mathbf{x}}_M(n))$

Backward prediction part:
36. $\xi_1^b(n) = \hat{\mathbf{b}}_M^T(n-1)\hat{\mathbf{x}}_M(n)$
37. $\xi_2^b(n) = \tilde{\mathbf{b}}_M^T(n-1)\tilde{\mathbf{x}}_M(n)$
38. $\xi_3^b(n) = \delta\mathbf{b}_M^T(n-1)\mathbf{x}_M^R(n)$
39. $\hat{E}_M^b(n) = \hat{x}(n-M) + (\xi_1^b(n) - \xi_3^b(n))$
40. $\tilde{E}_M^b(n) = \tilde{x}(n-M) + (\xi_2^b(n) + \xi_3^b(n))$
41. $\hat{\Delta}_M^b(n) = \hat{E}_M^b(n) - \hat{e}_M^b(n)$
42. $\tilde{\Delta}_M^b(n) = \tilde{E}_M^b(n) - \tilde{e}_M^b(n)$
43. $\hat{e}_M^{b,i}(n) = \hat{E}_M^b(n) + \hat{\sigma}_i^b\hat{\Delta}_M^b(n)$, $i = 1, 2, 3$
44. $\tilde{e}_M^{b,i}(n) = \tilde{E}_M^b(n) + \tilde{\sigma}_i^b\tilde{\Delta}_M^b(n)$, $i = 1, 2, 3$
45. $\alpha_M(n) = \alpha_{M+1}(n) - 0.5(\hat{e}_M^{b,1}(n)\hat{k}_{M+1}^b(n) + \tilde{e}_M^{b,1}(n)\tilde{k}_{M+1}^b(n))$
46. $\Delta_M^\alpha(n) = A_M(n) - \alpha_M(n)$
47. $\alpha_M(n) = A_M(n) + \sigma^\alpha\Delta_M^\alpha(n)$
48. $\hat{\epsilon}_M^{b,i}(n) = \hat{e}_M^{b,i}(n)/\alpha_M(n)$, $i = 2, 3$
49. $\tilde{\epsilon}_M^{b,i}(n) = \tilde{e}_M^{b,i}(n)/\alpha_M(n)$, $i = 2, 3$
50. $\epsilon_M^{b,I}(n) = 0.5(\hat{\epsilon}_M^{b,2}(n) - \tilde{\epsilon}_M^{b,2}(n))$
51. $\gamma_M^{b,1}(n) = \hat{\mathbf{w}}_M(n)\tilde{\epsilon}_M^{b,2}(n)$
52. $\gamma_M^{b,2}(n) = \tilde{\mathbf{w}}_M(n)\hat{\epsilon}_M^{b,2}(n)$
53. $\delta\mathbf{w}_M(n) = \hat{\mathbf{w}}_M(n) - \tilde{\mathbf{w}}_M(n)$
54. $\gamma_M^{b,3}(n) = \delta\mathbf{w}_M(n)\epsilon_M^{b,I}(n)$
55. $\hat{\mathbf{b}}_M(n) = \hat{\mathbf{b}}_M(n-1) - \gamma_M^{b,1}(n) - \gamma_M^{b,3}(n)$
56. $\tilde{\mathbf{b}}_M(n) = \tilde{\mathbf{b}}_M(n-1) - \gamma_M^{b,2}(n) - \gamma_M^{b,3}(n)$
57. $\alpha_M^b(n) = \lambda\alpha_M^b(n-1) + 0.5(\hat{e}_M^{b,3}(n)\hat{\epsilon}_M^{b,3}(n) + \tilde{e}_M^{b,3}(n)\tilde{\epsilon}_M^{b,3}(n))$

Filtering part:
58. $y_1(n) = \hat{\mathbf{c}}_M^T(n-1)\hat{\mathbf{x}}_M(n)$
59. $y_2(n) = \tilde{\mathbf{c}}_M^T(n-1)\tilde{\mathbf{x}}_M(n)$
60. $\delta\mathbf{c}_M(n-1) = \hat{\mathbf{c}}_M(n-1) - \tilde{\mathbf{c}}_M(n-1)$
61. $y_3(n) = \delta\mathbf{c}_M^T(n-1)\mathbf{x}_M^R(n)$
62. $\hat{e}_M^c(n) = \hat{z}(n) - (y_1(n) - y_3(n))$
63. $\tilde{e}_M^c(n) = \tilde{z}(n) - (y_2(n) + y_3(n))$
64. $\hat{\epsilon}_M^c(n) = \hat{e}_M^c(n)/\alpha_M(n)$
65. $\tilde{\epsilon}_M^c(n) = \tilde{e}_M^c(n)/\alpha_M(n)$
66. $\epsilon_M^{c,I}(n) = 0.5(\hat{\epsilon}_M^c(n) - \tilde{\epsilon}_M^c(n))$
67. $\gamma_M^{c,1}(n) = \hat{\mathbf{w}}_M(n)\tilde{\epsilon}_M^c(n)$
68. $\gamma_M^{c,2}(n) = \tilde{\mathbf{w}}_M(n)\hat{\epsilon}_M^c(n)$
69. $\gamma_M^{c,3}(n) = \delta\mathbf{w}_M(n)\epsilon_M^{c,I}(n)$
70. $\hat{\mathbf{c}}_M(n) = \hat{\mathbf{c}}_M(n-1) + \gamma_M^{c,1}(n) + \gamma_M^{c,3}(n)$
71. $\tilde{\mathbf{c}}_M(n) = \tilde{\mathbf{c}}_M(n-1) + \gamma_M^{c,2}(n) + \gamma_M^{c,3}(n)$

Forward and backward prediction powers $\alpha_M^b(-1)$ and $\alpha_M^f(-1)$ are initialized using an estimate of the input signal power, $\sigma_x^2$. Eqs. (25) define vector partitions.

main vector parameter that is processed by each block. Apart from the 'state' vector variables $\hat{\mathbf{a}}_M(n)$, $\tilde{\mathbf{a}}_M(n)$, $\hat{\mathbf{b}}_M(n)$, $\tilde{\mathbf{b}}_M(n)$, $\hat{\mathbf{w}}_M(n)$, $\tilde{\mathbf{w}}_M(n)$, $\hat{\mathbf{c}}_M(n)$ and $\tilde{\mathbf{c}}_M(n)$, four extra local vector variables are used, i.e., $\delta\mathbf{a}_M(n)$, $\delta\mathbf{b}_M(n)$, $\delta\mathbf{w}_M(n)$ and $\delta\mathbf{c}_M(n)$. These are the differences between the corresponding 'hat' and 'tilde' state parameters, and are defined by Eqs. (3), (29), (53) and (60) of Table 2, respectively. The equations of Table 2 that describe the proposed stabilized FRLS adaptive algorithm can be partitioned to fit into the processing scheme of Fig. 1. Take for example the top-left block, characterized by the vector variable $\hat{\mathbf{a}}_M(n)$. Then, Eqs. (1), (5), (7), (10), (13) and (16) of Table 2 may be assigned to this particular processing block.

Fig. 1. Block diagram of the proposed SR FRLS algorithm.

Remark 1. Eq. (29) implies a least-squares filtering interpretation, where the FIR model is parameterized by the transformed parameters vector $\mathbf{C}_M(n) = \hat{\mathbf{c}}_M(n) - j\tilde{\mathbf{c}}_M(n)$ and the desired response signal is set equal to $Z(n) = \tilde{z}(n) + j\hat{z}(n)$. Obviously, the original filter $\mathbf{c}_M(n)$ can be recovered from the transformed parameters as

$$\mathbf{c}_M^R(n) = 0.5(\hat{\mathbf{c}}_M(n) + \tilde{\mathbf{c}}_M(n)), \qquad (47)$$
$$\mathbf{c}_M^I(n) = 0.5(\hat{\mathbf{c}}_M(n) - \tilde{\mathbf{c}}_M(n)). \qquad (48)$$

In a similar way, the original filtering error, $e_M^c(n) = z(n) - y(n)$, is retrieved from the corresponding transformed filtering errors as

$$e_M^{c,R}(n) = 0.5(\hat{e}_M^c(n) + \tilde{e}_M^c(n)), \qquad (49)$$
$$e_M^{c,I}(n) = 0.5(\hat{e}_M^c(n) - \tilde{e}_M^c(n)). \qquad (50)$$
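The forward and inverse transformations are a one-line computation each; a small sketch with arbitrary filter values:

```python
import numpy as np

# Recovery of the original complex filter from the transformed pair
# (c_hat, c_til) via Eqs. (47)-(48); the filter values are arbitrary.
rng = np.random.default_rng(2)
M = 6
c = rng.normal(size=M) + 1j * rng.normal(size=M)   # original filter c_M(n)

c_hat = c.real + c.imag          # Eq. (26)
c_til = c.real - c.imag          # Eq. (27)

# Inverse transformation, Eqs. (47)-(48); Eqs. (49)-(50) recover the
# filtering error from its hat/tilde versions in exactly the same way.
c_rec = 0.5 * (c_hat + c_til) + 1j * 0.5 * (c_hat - c_til)
```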

Remark 2. Despite the fact that the FRLS implements an RLS scheme, the algorithm exhibits poor numerical error performance, which appears even when it is carefully implemented in double precision arithmetic. Severe divergence problems are caused by the intrinsic instability of the FRLS algorithm, and extra care has been given to compensate for this undesirable effect. The most promising method for the stabilization of the FRLS algorithm was proposed in [22], which suggested the use of a small amount of computational redundancy combined with a suitable feedback mechanism to control the numerical errors. The redundancy is the difference between the instantaneous backward error and its value as determined by the algorithm. It was shown in [20] how this can be fed back into the algorithm and considerably improve its numerical performance. Two specific variables are used for this purpose: the a priori backward error $e_M^b(n)$ and the Kalman gain power $\alpha_M(n)$. These are estimated by the complex valued FRLS algorithm as dictated by Eqs. (10) and (14) of Table 1, and defined by

$$E_M^b(n) = x(n-M) + \mathbf{b}_M^H(n-1)\mathbf{x}_M(n), \qquad A_M(n) = 1 + \mathbf{w}_M^H(n)\mathbf{x}_M(n). \qquad (51)$$

We form the differences

$$\Delta_M^b(n) = E_M^b(n) - e_M^b(n), \qquad \Delta_M^\alpha(n) = A_M(n) - \alpha_M(n). \qquad (52)$$

In infinite precision, these differences should be zero; in practice, however, $\Delta_M^b(n)$ grows exponentially while $\Delta_M^\alpha(n)$ grows linearly. These errors are utilized as a feedback mechanism to control the error propagation and provide a new set of variables to be used by the algorithm, as dictated by Eqs. (13) and (17) of Table 1. Tuning variables


$\sigma_i^b$, $i = 1, 2, 3$, and $\sigma^\alpha$ are carefully selected so that the overall algorithm remains stable. In analogy with the complex valued FRLS algorithm, the differences between the backward errors $\tilde{e}_M^b(n)$ and $\hat{e}_M^b(n)$ and the Kalman gain power $\alpha_M(n)$ and their estimated values have been identified as sensitive stability indicators for the SR FRLS adaptive algorithm of Table 2, where a real valued implementation of the FRLS adaptive algorithm using the strength reduction transform is adopted. These variables are defined as

$$\tilde{E}_M^b(n) = \tilde{x}(n-M) + \tilde{\mathbf{b}}_M^T(n-1)\tilde{\mathbf{x}}_M(n) + \delta\mathbf{b}_M^T(n-1)\mathbf{x}_M^R(n), \qquad (53)$$
$$\hat{E}_M^b(n) = \hat{x}(n-M) + \hat{\mathbf{b}}_M^T(n-1)\hat{\mathbf{x}}_M(n) - \delta\mathbf{b}_M^T(n-1)\mathbf{x}_M^R(n), \qquad (54)$$

where $\delta\mathbf{b}_M(n-1) = \hat{\mathbf{b}}_M(n-1) - \tilde{\mathbf{b}}_M(n-1)$, and

$$A_M(n) = 1 + 0.5(\hat{\mathbf{w}}_M^T(n)\hat{\mathbf{x}}_M(n) + \tilde{\mathbf{w}}_M^T(n)\tilde{\mathbf{x}}_M(n)). \qquad (55)$$

They are alternatively estimated by Eqs. (33), (34) and (45) of Table 2. Once again, we form the differences

$$\tilde{\Delta}_M^b(n) = \tilde{E}_M^b(n) - \tilde{e}_M^b(n), \qquad \hat{\Delta}_M^b(n) = \hat{E}_M^b(n) - \hat{e}_M^b(n) \qquad (56)$$

and

$$\Delta_M^\alpha(n) = A_M(n) - \alpha_M(n). \qquad (57)$$

These variables, which should be equal to zero in infinite precision, are used through appropriate feedback to improve the stability behavior of the algorithm, as dictated by Eqs. (43), (44) and (47) of Table 2. Parameters $\tilde{\sigma}_i^b$, $\hat{\sigma}_i^b$, $i = 1, 2, 3$, and $\sigma^\alpha$ are tuning variables that are utilized for the stabilization of the algorithm. The stable performance of the resulting stabilized algorithm is verified via extensive simulations, as demonstrated in Section 5. Notice that the strength reduction technique adopted here implies a close relationship between the backward error indicators utilized by the FRLS algorithm of Table 1 and the SR FRLS

ARTICLE IN PRESS G.-O. Glentis / Signal Processing 85 (2005) 1759–1779


algorithm of Table 2, i.e.,
$$\hat{\Delta}^b_M(n) = \Delta^{b,R}_M(n) + \Delta^{b,I}_M(n), \qquad \tilde{\Delta}^b_M(n) = \Delta^{b,R}_M(n) - \Delta^{b,I}_M(n), \qquad (58)$$
where $\Delta^{b,R}_M(n)$ and $\Delta^{b,I}_M(n)$ denote the real and the imaginary part of the variable $\Delta^b_M(n)$ defined by Eq. (52).

4.1. Complexity assessment

The computational complexity of the complex valued FRLS algorithm of Table 1 is given by
$$C^{classical}_{FRLS} = 8M\,\mathrm{CMUL} + 8M\,\mathrm{CADD} + 2M\,\mathrm{RMUL} + 2M\,\mathrm{RADD}. \qquad (59)$$

Here, CMUL, CADD, RMUL and RADD denote the two-operand complex valued multiplication, the two-operand complex valued addition, the two-operand real valued multiplication, and the two-operand real valued addition, respectively. Notice that: (a) multiplication by $\lambda$ is ignored, since it can be realized by a digit shift, provided that $\lambda$ is chosen to be a power of two, and (b) real valued divisions are treated as real valued multiplications. Direct application of the fast complex valued multiplication method to the complex valued FRLS can reduce the number of real multiplications, at the expense of an increased number of additions. Consider Eq. (59), which describes the complexity of the original FRLS adaptive algorithm. The equivalent complexity in terms of real valued operations is estimated using the fact that the classical complex valued multiplication requires 4 RMULs and 2 RADDs, while a complex valued addition requires 2 RADDs. Thus, we get
$$C^{classical}_{FRLS} = 34M\,\mathrm{RMUL} + 34M\,\mathrm{RADD}. \qquad (60)$$

On the other hand, if the direct fast complex valued multiplication is applied, where 3 RMULs and 5 RADDs are engaged, we get
$$C^{direct\;SR}_{FRLS} = 26M\,\mathrm{RMUL} + 58M\,\mathrm{RADD}. \qquad (61)$$
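The 3-RMUL/5-RADD complex multiplication referenced here can be realized, for example, with Gauss' factorization; the paper does not state which variant it assumes, so the sketch below is illustrative:

```python
def cmul_fast(a: complex, b: complex) -> complex:
    """Complex product a*b with 3 real multiplications and 5 real additions."""
    k1 = b.real * (a.real + a.imag)   # RMUL 1, RADD 1
    k2 = a.real * (b.imag - b.real)   # RMUL 2, RADD 2
    k3 = a.imag * (b.real + b.imag)   # RMUL 3, RADD 3
    return complex(k1 - k3, k1 + k2)  # RADD 4 and 5
```

Applied blindly to every complex multiplication this trades one multiplication for three additions, which is exactly the unfavorable exchange quantified in Eq. (61).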

Comparing Eqs. (60) and (61), we conclude that the direct application of the fast complex valued multiplication reduces the number of real valued multiplications by 23.5%, at the expense of 70.7% extra real valued additions. The computational complexity of the proposed novel SR FRLS adaptive algorithm of Table 2 is
$$C^{proposed\;SR}_{FRLS} = 26M\,\mathrm{RMUL} + 35M\,\mathrm{RADD}. \qquad (62)$$
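The real-operation totals of Eqs. (60) and (61) follow mechanically from the per-operation costs; a small bookkeeping sketch (the helper name is mine):

```python
def real_ops(cmul, cadd, rmul, radd, mul_cost=4, add_cost=2):
    """Convert a (CMUL, CADD, RMUL, RADD) budget into real multiplications
    and additions, given that one complex multiplication costs mul_cost
    RMULs plus add_cost RADDs and one complex addition costs 2 RADDs."""
    rm = cmul * mul_cost + rmul
    ra = cmul * add_cost + cadd * 2 + radd
    return rm, ra

M = 1
assert real_ops(8 * M, 8 * M, 2 * M, 2 * M) == (34 * M, 34 * M)  # Eq. (60)
assert real_ops(8 * M, 8 * M, 2 * M, 2 * M,
                mul_cost=3, add_cost=5) == (26 * M, 58 * M)      # Eq. (61)
```

The proposed algorithm reaches the same 26M RMUL count with only 35M RADD because the strength reduction is applied to the recursions as a whole, rather than to each complex multiplication in isolation.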

Comparing Eq. (62) with Eq. (60), which gives the equivalent real valued arithmetic complexity of the FRLS algorithm, it is established that the proposed SR FRLS adaptive algorithm requires about 23.5% fewer real valued multiplications, at the expense of 2.9% extra real valued additions. A comparison of the relative computational requirements among all FRLS adaptive algorithms discussed in this paper is provided in Table 3. The amount of memory required for the implementation of the proposed SR FRLS adaptive algorithm of Table 2 is slightly increased compared to that of the direct implementation of the FRLS algorithm of Table 1. Indeed, the complex valued FRLS adaptive scheme of Table 1 requires approximately 12M real valued memory cells, while the proposed real valued SR FRLS adaptive algorithm requires approximately 16M real valued memory cells. This figure can be further reduced to 14M real valued memory cells, at the expense of a small increase in the number of real valued additions (i.e., 37M instead of 35M real valued additions). Achieving computational savings without sacrificing performance is a task of primary interest in the design of high processing rate adaptive systems. Implementation of the proposed FRLS algorithm

Table 3
Comparison of the relative computational requirements among three different realizations of the complex valued FRLS algorithm

Realization of the FRLS algorithm   Relative RMUL   Relative RADD
Classical                           100%            100%
Direct SR                           77.5%           171%
Proposed SR                         77.5%           103%


on a typical DSP processor with a single-cycle MAC (multiply-accumulate) unit requires 20% fewer cycles per processed sample compared to the implementation of the original complex valued FRLS scheme. On the contrary, if the fast multiplication is applied directly to each complex valued operation, the processing rate deteriorates by a factor of about 70%. When multiple DSP units are utilized for the implementation of the adaptive processing task, the proposed FRLS method results in hardware savings [23]. When implementation of the adaptive algorithm in an ASIC is under consideration, the proposed method could result in about a 20% reduction of the power dissipation and the silicon area of the fabricated circuit, since hardware multiplier units are an order of magnitude more complex than hardware addition/subtraction units [24,9,16].

5. Simulation results

The performance of the proposed FRLS adaptive algorithm for complex valued adaptive filtering and system identification is illustrated by means of computer simulations. Two important adaptive filtering applications are considered: (a) FIR system identification, and (b) equalization of a telecommunication channel using a linear equalizer.

5.1. Adaptive FIR system identification

The input time series used in the experiment is a complex valued stationary AR process of order two, i.e.,
$$x(n) = a_1 x(n-1) + a_2 x(n-2) + \eta(n).$$
The driving signal $\eta(n)$ is a unit variance complex valued white Gaussian noise. This AR process is used as input to an FIR system of order $M = 100$. The filter taps are randomly selected from a complex valued, uniformly distributed random process. At the output of the FIR system, complex valued white Gaussian noise is added, resulting in an SNR of about 30 dB. Thus, the noise-contaminated output of the system is


given by
$$y(n) = \sum_{i=1}^{100} c_i\, x(n-i+1) + \nu(n). \qquad (63)$$
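The experimental setup can be reproduced along the following lines; the AR coefficients a1 and a2 below are illustrative stand-ins, since the paper only states the resulting condition number, not their values:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 5000, 100

# complex AR(2) input driven by unit-variance complex white Gaussian noise
a1, a2 = 0.9, -0.5                                   # illustrative, stable choice
eta = (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)
x = np.zeros(N, dtype=complex)
for n in range(2, N):
    x[n] = a1 * x[n - 1] + a2 * x[n - 2] + eta[n]

# unknown FIR system with uniformly distributed complex taps, as in Eq. (63)
c = rng.uniform(-1, 1, M) + 1j * rng.uniform(-1, 1, M)
y_clean = np.convolve(x, c)[:N]

# additive complex white Gaussian noise at about 30 dB SNR
p = np.mean(np.abs(y_clean) ** 2)
sigma = np.sqrt(p * 10 ** (-30 / 10) / 2)
y = y_clean + sigma * (rng.normal(size=N) + 1j * rng.normal(size=N))
```

The pair (x, y) then serves as the input/desired-response data for both the classical FRLS and the SR FRLS identification runs.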

The eigenvalue spread of the sampled autocorrelation matrix of the input signal is controlled by the parameters of the AR process. In our case, the AR model parameters are selected so that the resulting condition number of the sampled autocorrelation matrix is $\chi_R = O(10^2)$. The unknown FIR system is estimated using the classical FRLS adaptive algorithm of Table 1 and the SR FRLS adaptive algorithm of Table 2. The exponential forgetting factor is set equal to $\lambda = 0.997$. The filtering error power for each case is computed by averaging the squared instantaneous filtering errors. Since the input data are stationary signals, ensemble averaging is replaced by time averaging over an exponentially decaying window with effective memory equal to 128 time instants. Simulation results are presented in Fig. 2, where the mean squared filtering error $E\left(e^c_M(n)e^{c*}_M(n)\right)$ is plotted versus time. The output of the SR FRLS algorithm is mathematically equivalent to that of the original FRLS algorithm, since the filtering errors $e^{c,R}_M(n)$, $e^{c,I}_M(n)$ and the filter coefficients $\mathbf{c}^R_M(n)$, $\mathbf{c}^I_M(n)$ can be retrieved from the transformed parameters $\hat{e}^c_M(n)$, $\tilde{e}^c_M(n)$ and $\hat{\mathbf{c}}_M(n)$, $\tilde{\mathbf{c}}_M(n)$, as dictated by Eqs. (47)–(50). The numerical equivalence of the two algorithms is demonstrated by computing the difference signals

$$\Delta e^{c,R}_M(n) = e^{c,R}_M(n) - 0.5\left(\hat{e}^c_M(n) + \tilde{e}^c_M(n)\right), \qquad (64)$$
$$\Delta e^{c,I}_M(n) = e^{c,I}_M(n) - 0.5\left(\hat{e}^c_M(n) - \tilde{e}^c_M(n)\right), \qquad (65)$$
$$\Delta c^R_M(n) = \left\|\mathbf{c}^R_M(n) - 0.5\left(\hat{\mathbf{c}}_M(n) + \tilde{\mathbf{c}}_M(n)\right)\right\|_1, \qquad (66)$$
$$\Delta c^I_M(n) = \left\|\mathbf{c}^I_M(n) - 0.5\left(\hat{\mathbf{c}}_M(n) - \tilde{\mathbf{c}}_M(n)\right)\right\|_1. \qquad (67)$$

These signals are plotted in Fig. 3. Clearly, the FRLS algorithm and the proposed SR FRLS algorithm are numerically equivalent, since the difference signals are very close to machine precision.
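The retrieval of the real and imaginary parts from the transformed (hat/tilde) parameters, as used in the difference signals (64)–(67), is just the inverse of the SR mapping; a minimal check:

```python
import numpy as np

rng = np.random.default_rng(2)
c = rng.normal(size=4) + 1j * rng.normal(size=4)   # complex filter c_M(n)

c_hat = c.real + c.imag                            # transformed parameters
c_til = c.real - c.imag

# inverse mapping used when forming the difference signals (64)-(67)
assert np.allclose(c.real, 0.5 * (c_hat + c_til))
assert np.allclose(c.imag, 0.5 * (c_hat - c_til))
```

Because the mapping is an invertible linear change of variables, propagating only the hat/tilde quantities loses no information about the underlying complex filter.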


Fig. 2. Adaptive FIR system identification: plot of the mean squared error (MSE), $E\left(e^c_M(n)e^{c*}_M(n)\right)$, versus time; (a) convergence behavior and (b) long run behavior.

Simulation results allow us to conclude that both algorithms show similar stabilization characteristics. The error feedback signals $\Delta^{b,R}_M(n)$, $\Delta^{b,I}_M(n)$ and $\Delta^\alpha_M(n)$, utilized by the FRLS algorithm, are depicted in the first column of Fig. 4. The error feedback signals $\hat{\Delta}^b_M(n)$, $\tilde{\Delta}^b_M(n)$ and $\Delta^\alpha_M(n)$, utilized by the SR FRLS algorithm, are depicted in the second column of Fig. 4. Clearly, both algorithms perform well when feedback stabilization is utilized.

5.2. Adaptive channel equalization

In this experiment, the proposed SR FRLS adaptive algorithm is used for the equalization of a telecommunication channel [2,3,5]. QPSK signalling is considered, and the ISI is simulated by passing the transmitted signal through a discrete-time linear channel. At the receiver, a linear symbol spaced equalizer is engaged in order to cancel the ISI introduced by the channel. Although linear, symbol spaced equalizers are considered, the proposed SR FRLS methodology can be extended to the case of fractionally spaced and/or decision feedback equalizers, where multichannel SR FRLS algorithms can be applied. Several communication channels are considered for the evaluation of the SR FRLS linear equalizer. Following [8], channels $\mathbf{h}_1$ and $\mathbf{h}_2$ are considered, with impulse responses
$$\mathbf{h}_1 = \left[0.2194e^{j\phi_0}\;\; e^{j\phi_1}\;\; 0.2194e^{j\phi_2}\right]^T$$

Fig. 3. Comparison between the output of the classical complex valued FRLS and the proposed SR FRLS adaptive algorithm. Difference signals $\Delta e^{c,R}_M(n)$, $\Delta e^{c,I}_M(n)$, $\Delta c^R_M(n)$ and $\Delta c^I_M(n)$ are plotted versus time.

and
$$\mathbf{h}_2 = \left[0.227e^{j\phi_0}\;\; 0.406e^{j\phi_1}\;\; 0.688e^{j\phi_2}\;\; 0.406e^{j\phi_3}\;\; 0.227e^{j\phi_4}\right]^T.$$
Here, the $\phi_i$ are random phases. Moreover, channel $\mathbf{h}_3$ is also considered, with impulse response [25]
$$\mathbf{h}_3 = \left[0.1897e^{j\phi_0}\;\; 0.5097e^{j\phi_1}\;\; 0.6847e^{j\phi_2}\;\; 0.4600e^{j\phi_3}\;\; 0.1545e^{j\phi_4}\right]^T.$$
Finally, a time-varying channel $\mathbf{h}_4(n)$ is considered, with impulse response given by the first order model [26,27,29]
$$\mathbf{h}_4(n) = \beta\,\mathbf{h}_4(n-1) + \mathbf{q}(n).$$
Here, $\mathbf{q}(n)$ is a zero mean complex valued random signal having diagonal covariance matrix, i.e.,

$E\left(\mathbf{q}(n)\mathbf{q}^H(n)\right) = \sigma^2_q\mathbf{I}$. The model parameters are set equal to $\beta = 0.9999$, $\sigma^2_q = 0.01$, and $\mathbf{h}_4(1) = \mathbf{h}_3$. In all cases, additive complex white Gaussian noise is added to the received signal, resulting in an SNR equal to 20 dB. The stabilized SR FRLS adaptive algorithm is employed for the equalization of all four channels tested. In all cases, feedback stabilization is employed, where the stabilization parameters are all set equal to $\sigma^b_i = \hat{\sigma}^b_i = \tilde{\sigma}^b_i = 0.5$, $i = 1,2,3$, and $\sigma^\alpha = 0.5$. The equalizers are initially trained using a known data sequence consisting of 100 data samples. After the training period, the equalizers are switched to a decision-directed mode. A set of 100 independent trials is run for each experiment, and the ensemble mean squared error (MSE) is estimated for each case. Details of each experiment, such as the equalizer length M, the decision delay

Fig. 4. Stability issues. Signals utilized for the error feedback stabilization of the classical complex valued FRLS (first column) and the proposed SR FRLS adaptive algorithm (second column). Notice that all signals of interest are kept at a low level, in the range of machine precision.

$\Delta$, and the forgetting factor $\lambda$, are summarized below:

Channel $\mathbf{h}_1$: $M = 9$,  $\Delta = 5$,  $\lambda = 0.98$;
Channel $\mathbf{h}_2$: $M = 21$, $\Delta = 11$, $\lambda = 0.985$;
Channel $\mathbf{h}_3$: $M = 21$, $\Delta = 11$, $\lambda = 0.985$;
Channel $\mathbf{h}_4$: $M = 21$, $\Delta = 11$, $\lambda = 0.985$.

The learning curve for each experiment is depicted in Fig. 5. Notice the striking difference between the results reported here and those of [8], as far as the number of training symbols is concerned. This is due to the fact that the convergence rate of RLS-type algorithms is independent of the eigenvalue distribution of the input data signals, a property that LMS-type algorithms do not possess. No numerical instability occurred during the simulation of the SR FRLS adaptive equalization algorithm for any of the four channels tested.
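The simulated transmission chain can be sketched as follows, assuming unit-energy QPSK and the quoted channel taps with random phases; the training/decision-directed switch follows the protocol described above, while the equalizer update itself (the SR FRLS recursions of Table 2) is omitted:

```python
import numpy as np

rng = np.random.default_rng(3)
N, n_train = 2000, 100

# unit-energy QPSK symbols
s = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)

# channel h1 with random phases phi_i, as in [8]
phi = rng.uniform(0, 2 * np.pi, 3)
h1 = np.array([0.2194, 1.0, 0.2194]) * np.exp(1j * phi)

# time-varying channel h4(n): first order model with h4(1) = h3
h3 = np.array([0.1897, 0.5097, 0.6847, 0.4600, 0.1545]) * \
     np.exp(1j * rng.uniform(0, 2 * np.pi, 5))
beta, s2q = 0.9999, 0.01
h4 = h3.copy()
for _ in range(N):
    q = np.sqrt(s2q / 2) * (rng.normal(size=5) + 1j * rng.normal(size=5))
    h4 = beta * h4 + q

# received signal through h1, with additive complex noise at 20 dB SNR
r = np.convolve(s, h1)[:N]
p = np.mean(np.abs(r) ** 2)
sig = np.sqrt(p * 10 ** (-20 / 10) / 2)
r = r + sig * (rng.normal(size=N) + 1j * rng.normal(size=N))

def desired(n, equalizer_out):
    """Desired response: known training symbols first, then
    decision-directed slicing to the nearest QPSK point."""
    if n < n_train:
        return s[n]
    z = equalizer_out
    return (np.sign(z.real) + 1j * np.sign(z.imag)) / np.sqrt(2)
```

Any adaptive equalizer of length M with decision delay $\Delta$ can be driven by this chain by feeding it the received vector and the output of `desired`.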

6. Conclusions

A novel implementation of the fast recursive least squares algorithm for complex adaptive filtering has been presented. The proposed method results in a reduced computational complexity adaptive scheme, where all complex valued operations are replaced by real valued counterparts. A set of auxiliary signals and filtering parameters has been introduced, which are propagated through the algorithm and updated by the pertinent recursive equations, in accordance with the strength reduction real valued arithmetic. In this way, an efficient adaptive scheme has been established, based on novel three-term recursions for the time updating of the pertinent parameters. The proposed algorithm is implemented using real valued

Fig. 5. Adaptive channel equalization: mean squared error plot $E\left(e^c_M(n)e^{c*}_M(n)\right)$ versus time (convergence behavior and long run behavior), for four different communication channels: (a) channel equalization: $\mathbf{h}_1$; (b) channel equalization: $\mathbf{h}_2$; (c) channel equalization: $\mathbf{h}_3$; (d) channel equalization: $\mathbf{h}_4$.


arithmetic only, whilst reducing the number of the required real valued multiplications by 23.5%, at the expense of a marginal 2.9% increase in the number of real valued additions. This provides a significant advantage over the direct application of a fast complex valued multiplication method to the original FRLS scheme, where the reduction in the number of real valued multiplications is overshadowed by the 70% increase in the required number of additions. The proposed scheme is suitable for multiprocessor or VLSI ASIC implementation, achieving a reduction in the required DSP units, or in the power dissipation and the silicon area of the fabricated circuit. The performance of the proposed FRLS algorithm has been investigated by computer simulations, in the context of adaptive system identification and adaptive channel equalization. Simulation results indicate that both the original complex valued FRLS algorithm and the proposed efficient FRLS implementation using real valued arithmetic attain similar convergence and stabilization properties. Although single-channel complex adaptive filtering has been considered, the proposed SR FRLS methodology can be extended to the case of multichannel complex adaptive filtering [28,27].

Appendix A. SR Kalman gain updating

It remains to show how the Kalman gain parameters are updated using the proposed SR method. Consider the order updating equation of the Kalman gain vector, Eq. (7) of Table 1. Let us define
$$\mathbf{w}_M(n) = \mathbf{w}^R_M(n) + j\mathbf{w}^I_M(n), \qquad \mathbf{a}_M(n) = \mathbf{a}^R_M(n) + j\mathbf{a}^I_M(n)$$
and
$$k^f_{M+1}(n) = k^{f,R}_{M+1}(n) + j k^{f,I}_{M+1}(n).$$
Then, Eq. (7) of Table 1 can be expanded as
$$
\mathbf{w}^R_{M+1}(n) + j\mathbf{w}^I_{M+1}(n)
= \begin{bmatrix} 0 \\ \mathbf{w}^R_M(n-1) \end{bmatrix}
+ j\begin{bmatrix} 0 \\ \mathbf{w}^I_M(n-1) \end{bmatrix}
+ \left( \begin{bmatrix} 1 \\ \mathbf{a}^R_M(n-1) \end{bmatrix} k^{f,R}_{M+1}(n)
- \begin{bmatrix} 0 \\ \mathbf{a}^I_M(n-1) \end{bmatrix} k^{f,I}_{M+1}(n) \right)
+ j\left( \begin{bmatrix} 0 \\ \mathbf{a}^I_M(n-1) \end{bmatrix} k^{f,R}_{M+1}(n)
+ \begin{bmatrix} 1 \\ \mathbf{a}^R_M(n-1) \end{bmatrix} k^{f,I}_{M+1}(n) \right). \qquad (68)
$$
Let us consider the transformed Kalman gain parameters, defined by Eqs. (38) and (39), in conjunction with Eq. (68). Thus, we get
$$
\hat{\mathbf{w}}_{M+1}(n) = \begin{bmatrix} 0 \\ \hat{\mathbf{w}}_M(n-1) \end{bmatrix}
+ \begin{bmatrix} k^{f,R}_{M+1}(n) + k^{f,I}_{M+1}(n) \\ \hat{\mathbf{a}}_M(n-1)\, k^{f,R}_{M+1}(n) + \tilde{\mathbf{a}}_M(n-1)\, k^{f,I}_{M+1}(n) \end{bmatrix} \qquad (69)
$$
and
$$
\tilde{\mathbf{w}}_{M+1}(n) = \begin{bmatrix} 0 \\ \tilde{\mathbf{w}}_M(n-1) \end{bmatrix}
+ \begin{bmatrix} k^{f,R}_{M+1}(n) - k^{f,I}_{M+1}(n) \\ \tilde{\mathbf{a}}_M(n-1)\, k^{f,R}_{M+1}(n) - \hat{\mathbf{a}}_M(n-1)\, k^{f,I}_{M+1}(n) \end{bmatrix}. \qquad (70)
$$
Define the vector
$$\mathbf{W}_M(n) = \tilde{\mathbf{w}}_M(n) - j\hat{\mathbf{w}}_M(n). \qquad (71)$$
Using Eqs. (69) and (70), Eq. (71) is updated as follows:
$$
\mathbf{W}_{M+1}(n) = \begin{bmatrix} 0 \\ \mathbf{W}_M(n-1) \end{bmatrix}
+ \begin{bmatrix} 1-j \\ \tilde{\mathbf{a}}_M(n-1) - j\hat{\mathbf{a}}_M(n-1) \end{bmatrix}
\left(k^{f,R}_{M+1}(n) - j k^{f,I}_{M+1}(n)\right). \qquad (72)
$$
Finally, application of the strength reduction operation, Eq. (16), results in the following pair of equations:
$$
\hat{\mathbf{w}}_{M+1}(n) = \begin{bmatrix} 0 \\ \hat{\mathbf{w}}_M(n-1) \end{bmatrix}
+ \begin{bmatrix} 1 \\ \hat{\mathbf{a}}_M(n-1) \end{bmatrix} \hat{k}^f_{M+1}(n)
- \begin{bmatrix} 0 \\ \hat{\mathbf{a}}_M(n-1) - \tilde{\mathbf{a}}_M(n-1) \end{bmatrix} k^{f,I}_{M+1}(n) \qquad (73)
$$
and
$$
\tilde{\mathbf{w}}_{M+1}(n) = \begin{bmatrix} 0 \\ \tilde{\mathbf{w}}_M(n-1) \end{bmatrix}
+ \begin{bmatrix} 1 \\ \tilde{\mathbf{a}}_M(n-1) \end{bmatrix} \tilde{k}^f_{M+1}(n)
- \begin{bmatrix} 0 \\ \hat{\mathbf{a}}_M(n-1) - \tilde{\mathbf{a}}_M(n-1) \end{bmatrix} k^{f,I}_{M+1}(n). \qquad (74)
$$
Auxiliary parameters $\hat{\mathbf{a}}_M(n)$ and $\tilde{\mathbf{a}}_M(n)$ introduced above are related to the forward predictor $\mathbf{a}_M(n) = \mathbf{a}^R_M(n) + j\mathbf{a}^I_M(n)$ as
$$\hat{\mathbf{a}}_M(n) = \mathbf{a}^R_M(n) + \mathbf{a}^I_M(n), \qquad \tilde{\mathbf{a}}_M(n) = \mathbf{a}^R_M(n) - \mathbf{a}^I_M(n). \qquad (75)$$
Similarly, we get
$$\hat{k}^f_{M+1}(n) = k^{f,R}_{M+1}(n) + k^{f,I}_{M+1}(n), \qquad \tilde{k}^f_{M+1}(n) = k^{f,R}_{M+1}(n) - k^{f,I}_{M+1}(n), \qquad (76)$$
where $k^{f,R}_{M+1}(n)$ and $k^{f,I}_{M+1}(n)$ represent the real and the imaginary part of the parameter $k^f_{M+1}(n)$.

The order downdating recursion associated with the recursive updating of the Kalman gain vector can be treated in a similar way. Let us consider the dual order updating recursion for the Kalman gain vector, i.e.,
$$\mathbf{w}_{M+1}(n) = \begin{bmatrix} \mathbf{w}_M(n) \\ 0 \end{bmatrix} + \begin{bmatrix} \mathbf{b}_M(n-1) \\ 1 \end{bmatrix} k^b_{M+1}(n). \qquad (77)$$
Notice that the above equation is implicitly expressed by Eqs. (8) and (9) of Table 1. Working in a similar way as before, we get the set of recursive equations
$$
\hat{\mathbf{w}}_{M+1}(n) = \begin{bmatrix} \hat{\mathbf{w}}_M(n) \\ 0 \end{bmatrix}
+ \begin{bmatrix} \hat{\mathbf{b}}_M(n-1) \\ 1 \end{bmatrix} \hat{k}^b_{M+1}(n)
- \begin{bmatrix} \hat{\mathbf{b}}_M(n-1) - \tilde{\mathbf{b}}_M(n-1) \\ 0 \end{bmatrix} k^{b,I}_{M+1}(n) \qquad (78)
$$
and
$$
\tilde{\mathbf{w}}_{M+1}(n) = \begin{bmatrix} \tilde{\mathbf{w}}_M(n) \\ 0 \end{bmatrix}
+ \begin{bmatrix} \tilde{\mathbf{b}}_M(n-1) \\ 1 \end{bmatrix} \tilde{k}^b_{M+1}(n)
- \begin{bmatrix} \hat{\mathbf{b}}_M(n-1) - \tilde{\mathbf{b}}_M(n-1) \\ 0 \end{bmatrix} k^{b,I}_{M+1}(n). \qquad (79)
$$
Here, we have
$$\hat{\mathbf{b}}_M(n) = \mathbf{b}^R_M(n) + \mathbf{b}^I_M(n), \qquad \tilde{\mathbf{b}}_M(n) = \mathbf{b}^R_M(n) - \mathbf{b}^I_M(n), \qquad (80)$$
where $\mathbf{b}^R_M(n)$ and $\mathbf{b}^I_M(n)$ represent the real and the imaginary part of the backward predictor $\mathbf{b}_M(n)$. Moreover,
$$\hat{k}^b_{M+1}(n) = k^{b,R}_{M+1}(n) + k^{b,I}_{M+1}(n), \qquad \tilde{k}^b_{M+1}(n) = k^{b,R}_{M+1}(n) - k^{b,I}_{M+1}(n), \qquad (81)$$
where $k^{b,R}_{M+1}(n)$ and $k^{b,I}_{M+1}(n)$ are the real and the imaginary part of the parameter $k^b_{M+1}(n)$.

The order updating recursions, as well as the order downdating recursions, utilized for the efficient time updating of the Kalman gain vector using the SR transform are listed in Eqs. (16)–(35) of Table 2. Eqs. (15), (24), (35), (45) and (57) of Table 2, which are used for the computation of the 'power' of the forward predictor, the Kalman gain and the backward predictor, result by applying the SR updating recursions developed so far to the definition equations of the pertinent 'power' parameters.
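The SR order updating of the Kalman gain can be verified numerically against the complex recursion it replaces. The sketch below implements Eq. (7) directly, together with hat/tilde recursions of the form of Eqs. (69) and (70), and checks that they agree:

```python
import numpy as np

rng = np.random.default_rng(9)
M = 6
w = rng.normal(size=M) + 1j * rng.normal(size=M)   # w_M(n-1)
a = rng.normal(size=M) + 1j * rng.normal(size=M)   # a_M(n-1)
kf = rng.normal() + 1j * rng.normal()              # k^f_{M+1}(n)

# complex order update, Eq. (7): w_{M+1} = [0; w] + [1; a] * k^f
w_new = np.concatenate(([0], w)) + np.concatenate(([1], a)) * kf

hat = lambda z: z.real + z.imag                    # SR "hat" transform
til = lambda z: z.real - z.imag                    # SR "tilde" transform

# SR recursions, Eqs. (69)-(70): only hat/tilde quantities are propagated
w_hat = np.concatenate(([0], hat(w))) + np.concatenate(
    ([kf.real + kf.imag], hat(a) * kf.real + til(a) * kf.imag))
w_til = np.concatenate(([0], til(w))) + np.concatenate(
    ([kf.real - kf.imag], til(a) * kf.real - hat(a) * kf.imag))

assert np.allclose(w_hat, hat(w_new))
assert np.allclose(w_til, til(w_new))
```

The same kind of check applies to the dual (downdating) recursions (78) and (79), with the backward predictor and backward gain in place of the forward quantities.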

References

[1] M. Bellanger, Adaptive Digital Filters and Signal Analysis, Marcel Dekker, New York, 1987.
[2] S. Benedetto, E. Biglieri, Principles of Digital Transmission, with Wireless Applications, Kluwer Academic Publishers, Dordrecht, 1999.
[3] S. Haykin, Adaptive Filter Theory, second ed., Prentice-Hall, Englewood Cliffs, NJ, 1991.
[4] N. Kalouptsidis, S. Theodoridis (Eds.), Adaptive System Identification and Signal Processing Algorithms, Prentice-Hall, Englewood Cliffs, NJ, 1993.
[5] J. Proakis, Digital Communications, third ed., McGraw-Hill, New York, 1995.
[6] R. Baghaie, T. Laakso, Implementation of low-power CDMA RAKE receivers using strength reduction transformation, in: Proceedings of the IEEE Nordic Signal Processing Symposium, Denmark, 1998, pp. 169–172.
[7] D. Bull, Efficient IQ filter structure for use in adaptive equalisation, Electron. Lett. 30 (24) (November 1994) 2018–2019.
[8] R. Perry, D. Bull, A. Nix, Efficient adaptive complex filtering algorithm with application to channel equalisation, IEE Proc.-Vis. Image Signal Process. 146 (2) (February 1999) 57–64.
[9] N. Shanbhag, M. Goel, Low-power adaptive filter architectures and their application to 51.84 Mb/s ATM-LAN, IEEE Trans. Signal Process. 45 (5) (May 1997) 1276–1290.
[10] A. Lamagna, Fast computer algebra, IEEE Comput. 15 (9) (September 1982) 43–56.
[11] S. Winograd, Arithmetic Complexity of Computations, SIAM, Philadelphia, 1980.
[12] A. Chandrakasan, M. Potkonjak, R. Mehra, J. Rabaey, R. Brodersen, Optimizing power using transforms, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 14 (1) (January 1995) 12–31.
[13] A. Chandrakasan, R. Brodersen, Minimizing power consumption in digital CMOS circuits, Proc. IEEE 83 (4) (April 1995) 498–523.
[14] K. Parhi, VLSI Digital Signal Processing Systems, Design and Implementation, Wiley, New York, 1999.
[15] P. Pirsch, Architectures for Digital Signal Processing, Wiley, New York, 1996.
[16] N. Shanbhag, Algorithmic transformation techniques for low-power wireless VLSI systems design, Internat. J. Wireless Informat. Networks 5 (2) (1998) 147–171.
[17] G. Carayannis, D. Manolakis, N. Kalouptsidis, A fast sequential algorithm for least-squares filtering and prediction, IEEE Trans. Acoust. Speech, Signal Process. 31 (12) (December 1983) 1394–1402.
[18] J. Cioffi, T. Kailath, Fast recursive LS transversal filters for adaptive processing, IEEE Trans. Acoust. Speech, Signal Process. 32 (4) (April 1984) 304–337.
[19] A. Wenzler, E. Luder, New structures for complex multipliers and their noise analysis, in: Proceedings of the IEEE International Symposium on Circuits and Systems, ISCAS-1995, Seattle, pp. 1431–1435.
[20] D. Slock, T. Kailath, Numerically stable fast RLS transversal adaptive filtering, IEEE Trans. Acoust. Speech, Signal Process. 39 (1) (January 1991) 92–114.
[21] G. Glentis, K. Berberidis, S. Theodoridis, Efficient least squares adaptive algorithms for FIR transversal filtering: a unified view, IEEE Signal Process. Mag. 16 (4) (April 1999) 13–42.
[22] J. Botto, G. Moustakides, Stabilizing the fast Kalman algorithms, IEEE Trans. Acoust. Speech, Signal Process. 37 (9) (September 1989) 1342–1348.
[23] J. Sanchez, H. Barral, Multiprocessor implementation models for adaptive algorithms, IEEE Trans. Signal Process. 44 (9) (September 1996) 2319–2331.
[24] J. Diguet, D. Chillet, O. Sentieys, A framework for high level estimations of signal processing VLSI implementations, J. VLSI Signal Process. Syst. Signal, Image Video Technol. 25 (July 2000).
[25] T. Vaidis, C. Weber, Block adaptive techniques for channel identification and data demodulation over band-limited channels, IEEE Trans. Comm. 46 (2) (February 1998) 232–243.
[26] E. Eweda, Comparison of RLS, LMS and sign algorithms for tracking randomly time-varying channels, IEEE Trans. Signal Process. 42 (11) (November 1994) 2937–2944.
[27] D. Slock, L. Chisci, H. Lev-Ari, T. Kailath, Modular and numerically stable fast transversal filters for multichannel and multiexperiment RLS, IEEE Trans. Signal Process. 40 (4) (April 1992) 784–802.
[28] G. Glentis, N. Kalouptsidis, Fast adaptive algorithms for multichannel filtering and system identification, IEEE Trans. Signal Process. 40 (10) (October 1992) 2433–2458.
[29] L. Lindbom, M. Sternad, A. Ahlen, Tracking of time-varying mobile radio channels, Part I: The Wiener LMS algorithm, IEEE Trans. Comm. 49 (12) (December 2001) 2207–2217.