Digital Signal Processing 2, 14-26 (1992)

A Fast Method for Regularized Adaptive Filtering

John F. Doherty* and Richard J. Mammone†

*Iowa State University, Ames, Iowa 50011-1045; †Rutgers University, Piscataway, New Jersey 08903

1. INTRODUCTION

Regression models are used in many areas of signal processing, e.g., spectral analysis and speech LPC, where block processing methods have typically been used to estimate the unknown coefficients. Iterative methods for adaptive estimation fall into two categories: the least-mean-square (LMS) algorithm and the recursive-least-squares (RLS) algorithm. The LMS algorithm offers low complexity and stable operation at the expense of convergence speed. The RLS algorithm offers improved convergence performance at the expense of possible stability problems. Note that low complexity is used here to denote that the computational burden is proportional to the number of adaptive coefficients. The LMS algorithm is the standard algorithm for applications in which low complexity is paramount. The LMS algorithm produces an approximation to the minimum mean square error estimate; that is, the expected value of the output error approaches zero. The major drawback of the LMS algorithm is that its rate of convergence depends on the eigenvalue spread of the input data correlation matrix, usually precluding its use in high-speed, real-time signal processing applications [1]. However, it is widely used where convergence speed is not a problem [2,3]. The convergence rate of the LMS algorithm can be increased by introducing methods that attempt to orthogonalize the input data and reduce the eigenvalue spread of the correlation matrix [4,5]. The convergence rate of any adaptive algorithm can be specified by either the coefficient error or the estimation error, or both. It is well known that, for the LMS algorithm, the mean square error in the output converges faster than the mean square error in the coefficients [6,7]. Thus, the LMS algorithm is useful for situations in which the primary function of the adaptive system is to estimate a signal in the presence of noise.

This is in contrast to system identification, i.e., the estimation of the coefficients of an unknown system from its output. In the specific case of an adaptive equalizer, minimizing the equalizer output error (signal estimation) is more important than minimizing the coefficient error (coefficient estimation) because the output error determines the bit error rate of the communications system. The class of RLS adaptive algorithms converges faster than the LMS algorithm based upon the same number of input samples. The RLS algorithm forms a deterministic least-squares estimate based solely upon solving the deterministic normal equations. The convergence rate of the RLS algorithm can be an order of magnitude faster than that of the LMS algorithm, with much less dependence on the eigenvalue spread of the data correlation matrix [8]. However, severely ill-conditioned data, i.e., data with high eigenvalue spread, may still deleteriously affect the convergence of the RLS algorithm [9]. The generic RLS algorithm has O(N²) complexity because it involves a recursive matrix inverse. The special case of a transversal filter structure results in fast RLS (FRLS) implementations with O(N) complexity [10]. The RLS and FRLS algorithms are also potentially unstable and usually employ rescue devices that detect the onset of instability [11]. The unstable behavior has been attributed to finite precision implementation and the effects of ill-conditioned input data [12]. Much analysis has been devoted to numerical round-off error propagation in the FRLS algorithm, e.g., [13-16]; however, ill-conditioned correlation matrices still pose a stability problem. Ill-conditioned data naturally occur in a variety of signal processing problems, such as fractionally spaced channel equalization [17], equalization for channels with spectral nulls [18], narrowband jammer excision [19], autoregressive spectral estimation [20], maximum entropy linear prediction [21], and linear predictive coding of speech [22].

The RLS algorithm implicitly computes the direct inverse of the data correlation matrix that arises in the deterministic normal equations. The inverse may not exist analytically, and the inversion process can be numerically unstable for correlation matrices with high eigenvalue spread. Thus, it is desirable to use a matrix inversion method that is insensitive to the eigenvalue spread of the matrix. This paper presents an adaptive method that combines the simplicity of the LMS algorithm with the performance of the FRLS algorithm and is also stable when operating with ill-conditioned data. The iterative row-action projection (RAP) algorithm updates the unknown coefficients by using the rows of the data matrix more than once. The RAP algorithm offers O(N) complexity, easy implementation, facile multiprocessor implementation, stable operation, and convergence performance comparable to that of the RLS algorithm. This paper also addresses the fact that the least-squares estimate is not robust either in the presence of statistical outliers in the data or when the data matrix is ill conditioned. The RAP algorithm is stable for ill-conditioned data matrices because it performs an implicit pseudo-inverse of the data matrix to obtain an estimate of the unknown coefficients. Unlike the direct inverse, the pseudo-inverse is well defined and bounded for rank-deficient matrices. This property is achieved by not inverting the zero-valued eigenvalues of the correlation matrix. Spectral tapering of the inverse eigenvalues, i.e., regularization, avoids noise enhancement for ill-conditioned data corrupted by noise. We present a method of controlling the regularization inherent in the RAP algorithm. In Section 2 the problem formulation is presented along with the common obstacles to its solution. The LMS and RLS algorithms are reviewed in Section 3. In Section 4 the iterative RAP algorithm is described. In Section 5 the algorithm's performance for time-varying channel equalization is illustrated with computer simulations. The paper is concluded in Section 6.

2. PROBLEM FORMULATION

The following matrix formulation is applicable in most least-squares signal processing problems:

    [d_1]   [x_11  x_12  ...  x_1N] [c_1]   [e_1]
    [d_2]   [x_21  x_22  ...  x_2N] [c_2]   [e_2]
    [ .  ] = [ .            .      ] [ .  ] + [ .  ]                          (1)
    [d_L]   [x_L1  x_L2  ...  x_LN] [c_N]   [e_L]

In compact notation,

    d = Xc + e.                                                               (2)

For example, in an adaptive transversal filter application, d represents the desired output of the adaptive filter, X is the state matrix of the tapped delay line, c is the unknown coefficient vector, and e is the residual error of the estimation process. The parameter L denotes the processing block size and the parameter N signifies the number of unknown coefficients. The well-known least-squares solution that minimizes ||e|| in (2) is [3]

    c_LS = [X^T X]^{-1} X^T d.                                               (3)

The matrix quantity X^T X is sometimes called the Grammian matrix of the data matrix X. The Grammian matrix also serves as the deterministic autocorrelation matrix for the input sequence x. The deterministic cross-correlation vector is given by X^T d. A stochastic approximation to these quantities is frequently made [23]. A more general form of the least-squares solution of (2) is

    c_PI = X^+ d,                                                            (4)

where X^+ is the Moore-Penrose pseudo-inverse of the data matrix X [24]. The pseudo-inverse solution is unique and is the least-squares solution of minimum norm. The pseudo-inverse solution is identical to the least-squares solution only if the data matrix is full rank. Indeed, when the data matrix X is rank deficient, the Grammian matrix inverse, [X^T X]^{-1}, does not exist. The pseudo-inverse, however, is well defined for any m × n matrix regardless of its rank. The pseudo-inverse of the data matrix can be described by its singular value decomposition (SVD). Let

    X = U Σ_x V^T                                                            (5)

be the SVD of the data matrix, where U is an L × L unitary matrix, V is an N × N unitary matrix, and Σ_x is an L × N matrix of singular values. The overdetermined case of (1) produces an equivalent SVD decomposition in which U is L × N and Σ_x = diag(σ_x1, σ_x2, ..., σ_xN) is an N × N diagonal matrix, where σ_xi ≥ 0. The Grammian matrix can also be diagonalized using the SVD decomposition in (5),

    X^T X = V Φ V^T,                                                         (6)

where

    Φ = Σ_x^T Σ_x.                                                           (7)

Equation (6) is the similarity transformation that produces the diagonal eigenvalue matrix Φ. The condition number of the data matrix is defined as γ = σ_x,max / σ_x,min. A matrix is considered ill conditioned if the condition number is large relative to the numerical precision used. Inverting an ill-conditioned matrix usually causes numerical anomalies due to the limited dynamic range of the computing hardware. In practice, a large condition number imposes serious numerical problems for many methods of inversion. The singular values of the data matrix are the square roots of the eigenvalues of the Grammian matrix (see (7)); thus, the RAP algorithm will be less ill conditioned than the RLS algorithm. Using (5), the pseudo-inverse of X is

    X^+ = V Σ_x^+ U^T,                                                       (8)

where Σ_x^+ = diag(σ_x1^+, σ_x2^+, ..., σ_xN^+), with σ_xi^+ = 1/σ_xi if σ_xi ≠ 0 and σ_xi^+ = 0 otherwise. The number of nonzero singular values is the rank of the data matrix. Thus, if the data matrix possesses a null space, i.e., is rank deficient, then at least one of the singular values will equal zero. Components of the desired signal that reside in the null space of the data matrix are not observable; i.e., these components are not coupled to the observation vector. It is for this reason that the components of the solution c_LS residing in the null space of X will not contribute to the residual error e in (2). However, it is still desirable to control the null space behavior of c_LS in order to maintain certain characteristics of the estimate d̂ = X c_LS. For example, in a nonstationary signal environment, the null space of the data matrix may abruptly change with time, with the result that previous null space components of c_LS now operate on the data. Thus, the algorithm will produce a large residual error if the null space components of the coefficient vector had previously become arbitrarily large. The solution to this problem is to ensure that there are no components of c_LS in the null space of X. This is equivalent to requiring the least-squares solution to be minimum norm, i.e., the pseudo-inverse solution. For this reason, the pseudo-inverse solution will be more robust than any other least-squares solution. The presence of additive noise in the system has the tendency to improve the condition number of the data matrix. That is, the noisy data matrix is of the form

    X = X_signal + X_noise,                                                  (9)

where X_noise represents a (probably) full-rank noise matrix of random values and X_signal is a rank-deficient signal matrix of rank r. The data matrix will now have singular values of the form Σ_x = diag(σ_s1 + σ_n1, ..., σ_sr + σ_nr, σ_n,r+1, ..., σ_nN). The inverse of the composite Grammian matrix exists, since it is full rank; however, inversion of the singular values beyond σ_sr essentially serves only to enhance the noise components in the data matrix. Thus, truncating the diagonal matrix Σ_x prior to inversion, to eliminate noise-only components, improves the solution [25]. This method requires ascertaining the rank of the correlation matrix from the data, usually a formidable task. The predominant case for the data matrix is to have a full-rank signal matrix component and a full-rank noise matrix component. Assuming a noiseless case, the data matrix may still be ill conditioned even though it is not rank deficient, e.g., σ_s,min ≪ 1. As stated previously, this poses a problem in implementation and stability, due to the numerical overflow in inverting very small numbers. The presence of noise will usually alleviate this problem by reducing the condition number of the data matrix. However, for those singular values, σ_x = σ_s + σ_n, satisfying σ_s ≪ σ_n, direct inversion will result in noise enhancement. Since none of the signal component singular values are zero, setting them to zero as in the pseudo-inverse is inappropriate. What is necessary is to apply a taper to the inversion of the singular values such that modes with low signal-to-noise ratios are attenuated. This process is called regularization. Essentially, this amounts to minimizing the energy in the uncoupled modes of the system, i.e., the null space components.
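The truncation and tapering just described follow directly from the SVD in (5)-(8). The following NumPy sketch, which is not part of the original paper, contrasts a truncated pseudo-inverse with a tapered (regularized) one; the function name, the tolerance rule, and the Tikhonov-style taper are illustrative assumptions.

    import numpy as np

    def regularized_pinv(X, d, tol=None, taper=None):
        """Solve X c ~ d via the SVD, zeroing or tapering small singular values.

        tol   : singular values below tol are treated as zero (pseudo-inverse).
        taper : optional function sigma -> weight in [0, 1] applied to 1/sigma,
                e.g. a Tikhonov-style taper sigma**2 / (sigma**2 + alpha).
        """
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        if tol is None:
            tol = max(X.shape) * np.finfo(float).eps * s.max()
        s_inv = np.where(s > tol, 1.0 / s, 0.0)          # truncated pseudo-inverse
        if taper is not None:
            s_inv = s_inv * taper(s)                     # attenuate low-SNR modes
        return Vt.T @ (s_inv * (U.T @ d))

    # Example: rank-deficient signal matrix plus a small full-rank noise matrix
    rng = np.random.default_rng(0)
    X_sig = rng.standard_normal((50, 11)) @ np.diag([1.0] * 10 + [0.0])
    X = X_sig + 0.01 * rng.standard_normal((50, 11))
    c_true = rng.standard_normal(11)
    d = X_sig @ c_true
    alpha = 1e-2
    c_reg = regularized_pinv(X, d, taper=lambda s: s**2 / (s**2 + alpha))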

3. REVIEW OF CONVENTIONAL ALGORITHMS

In order to compare the RAP algorithm to the LMS and standard RLS algorithms, we briefly describe these algorithms below.

3.1. The LMS Algorithm

The LMS algorithm is an iterative algorithm with O(N) complexity. It is a stochastic steepest descent method whose performance metric is the minimization of the expected value of the output error e. The LMS algorithm transforms the standard steepest descent update,

    c_{k+1} = c_k + μ[p − R c_k],                                            (10)

into the LMS update,

    c_{k+1} = c_k + μ[d_k − c_k^T x_k] x_k,                                  (11)

using the stochastic approximations

    R = E[x x^T] ≈ x_k x_k^T                                                 (12a)

    p = E[d_k x_k] ≈ d_k x_k.                                                (12b)

The step size μ, with 0 < μ < 2/λ_max, where λ_max is the maximum eigenvalue of the autocorrelation matrix, controls the rate of convergence and the excess noise after convergence [8]. The normalized LMS (NLMS) algorithm uses 1/||x_k||² as a lower bound for 1/λ_max and includes it in the update as a normalization factor. The NLMS convergence criterion is 0 < μ < 2. Note that the LMS algorithm (11) forms only one update for each input vector, x_k.
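As a point of reference for the RAP update introduced later, the following Python sketch (ours, not from the paper) implements the normalized LMS update just described; the filter length, step size, and the small constant eps guarding the division are illustrative choices.

    import numpy as np

    def nlms_update(c, x, d, mu=0.5, eps=1e-8):
        """One normalized LMS update: c <- c + mu * e * x / ||x||^2, with 0 < mu < 2."""
        e = d - c @ x                       # a priori output error
        c = c + mu * e * x / (eps + x @ x)  # normalization bounds the effective step
        return c, e

    # Adapt a 4-tap filter to a known system from noisy input/output pairs
    rng = np.random.default_rng(1)
    c_true = np.array([1.0, -0.5, 0.25, 0.1])
    c = np.zeros(4)
    for _ in range(2000):
        x = rng.standard_normal(4)
        d = c_true @ x + 0.01 * rng.standard_normal()
        c, _ = nlms_update(c, x, d)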


3.2. The RLS Algorithm

The standard RLS algorithm has the form

    g_k = (λ^{-1} P_{k-1} x_k) / (1 + λ^{-1} x_k^T P_{k-1} x_k)              (13a)

    e_k = d_k − c_{k-1}^T x_k                                                (13b)

    c_k = c_{k-1} + e_k g_k                                                  (13c)

    P_k = λ^{-1}[P_{k-1} − g_k x_k^T P_{k-1}],                               (13d)

where g_k is the gain vector and P_k is the estimate of [X_k^T X_k]^{-1} obtained using the matrix inversion lemma [26]. The parameter 0 < λ < 1 allows the more recent samples to have greater influence on the output error, allowing for the tracking of nonstationary signals. The iteration (13a)-(13d) is initialized with P_0 = δ^{-1} I to guarantee invertibility of the Grammian matrix, where 0 < δ ≪ 1. The introduction of the forgetting factor, λ, and the initial condition, P_0 = δ^{-1} I, is equivalent to transforming the Grammian matrix into

    X_k^T Λ X_k + λ^k δ I,                                                   (14)

where Λ = diag[1, λ, ..., λ^{k-1}]. The initial condition adds a small positive term, λ^k δ, to each of the singular values of the Grammian matrix. This regularization effect eventually decays if the forgetting factor λ is less than unity, as is the case for time-varying system tracking.
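A compact Python sketch of the recursion (13a)-(13d) follows; it is ours rather than the paper's, and the forgetting factor, the initialization P_0 = δ^{-1} I, and the toy identification problem are illustrative.

    import numpy as np

    def rls_step(c, P, x, d, lam=0.99):
        """One RLS update following (13a)-(13d): gain, a priori error, coefficients, P."""
        Px = P @ x
        g = Px / (lam + x @ Px)             # (13a) gain vector, with lambda factored out
        e = d - c @ x                       # (13b) a priori error
        c = c + e * g                       # (13c) coefficient update
        P = (P - np.outer(g, x) @ P) / lam  # (13d) inverse-Grammian update
        return c, P, e

    N = 8
    c = np.zeros(N)
    P = 1e4 * np.eye(N)                     # P_0 = (1/delta) I with a small delta
    rng = np.random.default_rng(2)
    c_true = rng.standard_normal(N)
    for _ in range(500):
        x = rng.standard_normal(N)
        d = c_true @ x + 0.01 * rng.standard_normal()
        c, P, _ = rls_step(c, P, x, d)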

4. THE ROW-ACTION PROJECTION METHOD

4.1. Preliminary Discussion

The operation of the RAP algorithm is based upon the fact that each equation in (1) describes a hyperplane in the coefficient space. A two-dimensional example of this concept is shown in Fig. 1. There are an infinite number of solution vectors that terminate on each individual hyperplane; however, for a consistent set of equations there is only one solution vector that satisfies all the hyperplanes. The optimal solution for noiseless data satisfies all the equations exactly, implying that all the hyperplanes share a common point. The presence of noise in the data perturbs the hyperplanes from the common point, resulting in an optimal coefficient vector that terminates at a point whose distance from each hyperplane is minimal in some sense. The point with the minimum sum of squared distances (least squares) is a typical optimality criterion. The goal of the RAP algorithm is to attain this optimal solution via a sequence of orthogonal projections toward the hyperplanes. This process is a special case of the method of projections onto convex sets (POCS), which has found widespread use in medical imaging and image recovery [27,28]. The projection of an arbitrary solution vector onto a hyperplane is accomplished with the row-action projection in the equivalent forms

    c_{i+1} = c_i + μ [d_j − c_i^T x_j] x_j / ||x_j||²                       (15a)

    c_{i+1} = [I − μ x_j x_j^T / ||x_j||²] c_i + μ d_j x_j / ||x_j||²,       (15b)

where c_i is a previous estimate and c_{i+1} is the updated estimate that, if μ = 1, exactly satisfies the jth equation (row) in (1). The iteration index is related to the equation index by j = i mod L. This update relationship is shown geometrically in Fig. 2. The distance of the ith estimate to the jth hyperplane is

    t_{i,j} = e_{i,j} / ||x_j||,                                             (16)

where e_{i,j} = [d_j − c_i^T x_j] and x_j/||x_j|| is the unit normal to the jth hyperplane.

FIG. 1. Sequence of projections. The rows of the data matrix correspond to a set of hyperplanes in the coefficient space. The sequence of orthogonal projections eventually converges to the optimum fixed point solution.

FIG. 2. Each coefficient vector update can correspond to a projection toward a hyperplane. The step size μ determines the projection distance.
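A minimal sketch of the row-action projection (15a) and the hyperplane distance (16) is given below, assuming a small guard constant eps in the normalization; the two-hyperplane example only illustrates the geometry of Fig. 1 and is not taken from the paper.

    import numpy as np

    def rap_project(c, x_j, d_j, mu=1.0, eps=1e-12):
        """Project c toward the hyperplane {c : x_j^T c = d_j} as in (15a).

        mu = 1 lands exactly on the hyperplane; 0 < mu < 2 is required for convergence.
        """
        e = d_j - c @ x_j                          # residual for this row
        c_new = c + mu * e * x_j / (eps + x_j @ x_j)
        dist = e / np.sqrt(x_j @ x_j)              # signed distance (16) to the hyperplane
        return c_new, dist

    # Two rows (hyperplanes) in a 2-D coefficient space; alternate the projections
    c = np.zeros(2)
    rows = [(np.array([1.0, 1.0]), 2.0), (np.array([1.0, -1.0]), 0.0)]
    for _ in range(25):
        for x_j, d_j in rows:
            c, _ = rap_project(c, x_j, d_j, mu=1.0)
    # c converges toward the intersection point [1, 1]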

4.2. The RAP Algorithm

The basic operation of row-action projection described in Section 4.1 is used to form the basis of the RAP algorithm. The repeated sequential application of (15a) on the rows (hyperplanes) of the data matrix is an iterative process that converges to the pseudo-inverse solution for the unknown coefficients [27]. The steps that constitute the RAP algorithm are as follows:

• Retain data vectors whose elements are the rows of the data matrix along with the desired output, i.e., s_k = [x_k^T, d_k]. For a transversal filter, this amounts to storing the input sample and the associated desired output sample at each sampling instant.
• Once the data buffer is full, make multiple updates of the adaptive coefficients by sequentially using the data vectors, s_k, and the update equation (15a) (Fig. 1).

The following optional steps may improve the performance of the algorithm:

• Ascertain the hyperplane that produces the largest residual error, using (16), for each cycle through the data. Do not reuse this hyperplane to update the coefficients on the subsequent cycle. This step helps to make the algorithm robust; i.e., an inconsistent datum will consistently be disregarded by the coefficient updating process (Fig. 3). A sketch of this step is given after this list.
• Compute an accelerated update that uses the difference of two coefficient estimates. This essentially finds a vector in the direction of maximum change in the coefficient space.

The pseudo-code of the algorithm is provided in Appendix A.

FIG. 3. The cyclic reuse of the system matrix allows the RAP algorithm to detect outliers in the data. An outlier will have a projection distance that is always largest when the algorithm is near convergence.
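The following sketch, not taken from the paper, illustrates the block-processing steps above together with the optional residual-monitoring step; the block size, cycle count, and the rule of re-measuring all residuals at the start of each cycle are illustrative assumptions.

    import numpy as np

    def rap_block(c, X_blk, d_blk, mu=1.0, cycles=10, drop_worst=True, eps=1e-12):
        """Cycle repeatedly over one block of rows with the update (15a),
        optionally skipping the row whose hyperplane distance (16) was largest
        on the previous pass (outlier rejection)."""
        skip = -1
        for _ in range(cycles):
            # distance (16) of the current estimate to every hyperplane in the block
            dists = np.abs(d_blk - X_blk @ c) / np.sqrt(eps + np.sum(X_blk**2, axis=1))
            for j, (x_j, d_j) in enumerate(zip(X_blk, d_blk)):
                if j == skip:
                    continue                       # disregard the suspected outlier this cycle
                c = c + mu * (d_j - c @ x_j) * x_j / (eps + x_j @ x_j)
            if drop_worst:
                skip = int(np.argmax(dists))       # row to disregard on the next cycle
        return c

    rng = np.random.default_rng(3)
    c_true = rng.standard_normal(5)
    X_blk = rng.standard_normal((30, 5))
    d_blk = X_blk @ c_true
    d_blk[7] += 50.0                               # one inconsistent row (outlier)
    c_hat = rap_block(np.zeros(5), X_blk, d_blk)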

4.3. RAP Convergence Analysis

Regularization of ill-conditioned inversion problems was first proposed by Tikhonov and was later applied to signal restoration [29,30]. In this section, we show that the RAP algorithm is a regularized adaptive update method. This can be seen by converting the row-action operator in (15a) into an equivalent matrix operator,

    c_l = H c_{l-1} + d̂                                                      (17a)

    H = P_L P_{L-1} ⋯ P_1                                                    (17b)

    d̂ = Σ_{j=1}^{L} (P_L P_{L-1} ⋯ P_{j+1}) μ d_j x_j / ||x_j||²             (17c)

    P_j = I − μ x_j x_j^T / ||x_j||²,                                        (17d)

where each increment of l represents an entire cycle through the data block of L equations as in (1). Further substitutions cast (17a)-(17d) into a recursive form for the pseudo-inverse of the data matrix X [31],

    c_l = G_l d                                                              (18a)

    d̂ = B d                                                                  (18b)

    B X = I − H,                                                             (18c)

where G_l is an estimate of the pseudo-inverse of X after the lth cycle. The matrix B is obtained by substituting the vector 1_i = [0, ..., 0, 1 (ith position), 0, ..., 0] for the scalar d_i in (17c). Substituting (18a)-(18c) into (17a), the pseudo-inversion recursion becomes

    G_{l+1} = G_l + B[I − X G_l].                                            (19)

Defining E_l = I − G_l X, with B = G_0, the error in the pseudo-inverse estimate can be stated in the equivalent forms

    E_l = E_0[I − G_{l-1} X]                                                 (20a)

    E_l = E_0^{l+1}.                                                         (20b)

The convergence of the inversion process (19) can be described by taking the singular value decomposition of (20b), i.e.,

    V[I − Σ_{g,l} Σ_x] V^T = V[I − Σ_{g,0} Σ_x]^{l+1} V^T.                   (21)

Assuming that Σ_{g,l} is diagonal, the inversion of the singular values of X follows

    σ_{g,l} = (1/σ_x)[1 − (1 − σ_{g,0} σ_x)^{l+1}].                          (22)

The general conclusion regarding (22) is that σ_{g,l} → 1/σ_x if 0 < 1 − σ_{g,0} σ_x < 1. If μ ≪ 1, then the first-order approximation of G_0 = B becomes

    G_0 ≈ μ X^T W,                                                           (23)

where W = diag(1/||x_1||², 1/||x_2||², ..., 1/||x_L||²). The singular values of G_0 in (23) are approximately given by

    σ_{g,0} = (μ/N) σ_x,                                                     (24)

where N is the number of adaptive coefficients. Substituting (24) into (22) yields

    σ_{g,l} = (1/σ_x)[1 − (1 − (μ/N) σ_x²)^{l+1}].                           (25)

The RAP algorithm is regularized via the iteration index, l, in (25). That is, for a fixed number of cycles through the data, the smaller singular values will be attenuated in the inversion process. This regularization gives more weight to inverting the stronger modes of the system because these modes are usually dominated by signal energy. An example of this taper is given in Fig. 4 for a rank-deficient data matrix with dimensions 50 × 11 and for μ = 0.2. Note that the eigenvalue inverses are plotted using (7), i.e., φ_i = σ_xi²; the singular value matrix is given by Σ_x = diag(1.0, 0.9, 0.8, ..., 0.1, 0.0). The smaller singular values are inverted by the RAP algorithm only as the number of cycles through the block of equations increases. The smallest singular value is identically zero, which corresponds to a rank-deficient matrix. The RAP algorithm always produces zero as an estimate of the zero-valued singular value; i.e., it tends toward the pseudo-inverse solution. The singular value inversion characteristic of the RLS algorithm is given by the uppermost curves, indicating that all the singular values tend to get inverted, even if some are zero. The initialization constant δ in (14) avoids the problem of division by zero; however, the mode corresponding to the null space is still greatly amplified, i.e., δ^{-1} ≫ 1.

FIG. 4. Rank-deficient correlation matrix example (11th eigenvalue = zero). The regularization action of the RAP algorithm is demonstrated. The RAP curves are plotted as 5, 10, 15, 20, and 25 cycles through the matrix. The RLS results represent forgetting factors of 1, 0.96, and 0.92.

Since the RAP algorithm tends to attenuate small singular values, which can be dominated by noise energy, it will perform more effectively with regard to noise attenuation when the data matrix is ill conditioned. This conclusion is supported by the simulation results in Section 5.
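A short numerical check of the taper in (25) makes this regularization concrete; the step size, filter length, and singular values below are illustrative and merely mirror the flavor of the Fig. 4 example.

    import numpy as np

    # Singular-value inversion achieved by RAP after l cycles, per (25):
    #   sigma_g(l) = (1/sigma) * (1 - (1 - (mu/N) * sigma**2) ** (l + 1))
    mu, N = 0.2, 11                                  # step size and filter length (illustrative)
    sigmas = np.array([1.0, 0.5, 0.1])               # strong, moderate, and weak modes
    for l in (5, 10, 25):
        inv = (1.0 - (1.0 - (mu / N) * sigmas**2) ** (l + 1)) / sigmas
        print(l, np.round(inv, 3), np.round(1.0 / sigmas, 3))
    # The strong mode approaches its true inverse soonest, while the weak mode
    # stays heavily attenuated, which is the regularization effect described above.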

4.4. Acceleration Techniques

In applying regularization to adaptive filtering, the concern is not only asymptotic convergence but also the rate of convergence. This section presents techniques that can be used to accelerate the rate of convergence of the RAP algorithm. Two simple acceleration techniques are presented that can be viewed as a modification of the regularization process given by (25).

A TWO-POINT ACCELERATOR. This accelerator uses the difference between two successive block coefficient estimates to obtain a vector in the direction of maximum change (Fig. 5). The accelerated update equations become

c;’ = Cl+ a[q - c;“-,] Cl+ c,A,

(26a) (26b)

where the estimate cl is obtained via (17a) and cA is the accelerated estimate obtained after each cycle. The effect of ( 26) on the regularization process can be seen by rewriting it as

= Ci+2

+ Jl ( Ci+2

- Ci)

/

FIG. 6. An acceleration technique (three-point accelerator) that uses geometrical principles. Three coefficient updates are used to determine an improved estimate. This technique is equivalent to pairwise orthogonalization for a two-by-two system.

c;’ = H&L,

+ B’d,,

(28)

where Hi = I - B;X. The accelerated update is now in the form of the unaccelerated update ( 17a) ; thus, the regularization is the same except for the constant 1 + LY,that is, A ,+

U6-l

l-

fJx1

l(

/.L(1 + (Y) 2 L+l fix . N 1 I

The bound for the acceleration

parameter

(29)

CYis

ct’ = H,c;~_, + B,d, + cu[H,cfl_, + B, - ck,](27a) ClA = [(l Making yields

+ (Y)H~ - olI]cf’_, + (1 + cu)B,dJ27b)

the substitution

B; = (1 + a) B, in (27b)

o
2N I.4 max

1.

(30)
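A sketch of the two-point acceleration (26a)-(26b) wrapped around full RAP cycles follows; it is not from the paper, and the acceleration parameter is simply fixed to a modest illustrative value (in practice it must satisfy the bound (30)).

    import numpy as np

    def rap_cycle(c, X, d, mu=0.5, eps=1e-12):
        """One full cycle of row-action projections (15a) over the block."""
        for x_j, d_j in zip(X, d):
            c = c + mu * (d_j - c @ x_j) * x_j / (eps + x_j @ x_j)
        return c

    rng = np.random.default_rng(4)
    c_true = rng.standard_normal(6)
    X = rng.standard_normal((40, 6))
    d = X @ c_true

    alpha = 0.5                                 # acceleration parameter (illustrative); see (30)
    c_prev = np.zeros(6)
    for _ in range(20):
        c_l = rap_cycle(c_prev, X, d)           # standard estimate after this cycle
        c_acc = c_l + alpha * (c_l - c_prev)    # (26a) two-point accelerated estimate
        c_prev = c_acc                          # (26b) carry the accelerated estimate forward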

A THREE-POINT ACCELERATOR. The three-point accelerated update is given by

    c_l^A = c_l + α[c_l − c_{l-2}^A]                                         (31a)

    c_l ← c_l^A,                                                             (31b)

where c_l is given by (17a). This technique forms one accelerated update (31a) for every three standard updates (17a) (Fig. 6). The motivation behind (31) is to obtain a better direction estimate by using older updates. This approach to acceleration can also be interpreted as a modification to the singular values in (25), although the derivation is not as straightforward as that for the two-point accelerator. The update in (31) can be converted to dependence on c_{l-2}^A, that is,

    c_l^A = [β H² − (β − 1) I] c_{l-2}^A + β[I + H] d̂,                       (32)

where β = 1 + α. Using the transformations previously established, (32) can be written as a pseudo-inverse recursion

    G_l^A = Q G_{l-2}^A + P G_0^A                                            (33a)

    Q = β[I − G_0^A X]² − (β − 1) I                                          (33b)

    P = β[2I − G_0^A X]                                                      (33c)

    G_0^A = B                                                                (33d)

    G_1^A = B + B[I − XB].                                                   (33e)

The recursion (33a) can be written in terms of the initial pseudo-inverse estimate G_0^A as

    G_l^A = {Q^l [2I − G_0^A X] + [Q^{l-1} + ⋯ + Q + I] P} G_0^A.            (34)

The terms in (34) can be diagonalized to yield

    σ_{g,l}^A = (1/σ_x)[1 − (1 − 2(1 + α) μ σ_x² / N)^l].                    (35)

The form of (35) is similar to that of (29). The major difference between the two is the exponent, which is larger by one in (29). Thus, the expected rate of convergence is higher when using the two-point accelerator. An interesting special case when using the three-point accelerator occurs for a block size of 2. The projections in (15a) will alternate between two hyperplanes in this case. The three-point accelerator will converge to the intersection of the two hyperplanes in one step if

    α = ||c_1 − c_2||² / ||c_1 − c_3||².                                     (36)

This procedure is equivalent to pairwise orthogonalization of the rows of the data matrix [32].

FIG. 6. An acceleration technique (three-point accelerator) that uses geometrical principles. Three coefficient updates are used to determine an improved estimate. This technique is equivalent to pairwise orthogonalization for a two-by-two system.

4.4.1. Application of accelerators. The acceleration techniques presented in the previous two sections were applied to the example problem used at the end of Section 4.3. The data matrix was slightly altered by the addition of a small positive term to the diagonal of the singular value matrix, i.e., Σ_x = diag(1.1, 1.0, 0.9, ..., 0.2, 0.1), and the step size was increased to μ = 0.5. The additive term, which can represent an average noise level, makes the data matrix full rank; i.e., the condition number becomes γ = 11, or equivalently, the eigenvalue spread becomes γ² = 121. Because the RAP algorithm inverts the singular values in order from largest to smallest, the acceleration techniques are applied to reduce the convergence time associated with the smaller singular values. The application of the acceleration techniques is shown in Fig. 7. The graph depicts the inversion of the smallest singular value for the standard RAP algorithm, the two-point accelerator, the three-point accelerator, and the combination of the two-point and three-point accelerators. The combined accelerator uses the two-point technique after every block and the three-point technique after every third cyclic update. The true inverse of the smallest eigenvalue is 1/φ_min = 100.

FIG. 7. The application of the acceleration techniques to RAP for inversion of the smallest eigenvalue. From bottom to top the lines represent, respectively, standard RAP, the two-point accelerator, the three-point accelerator, and the combined two-point and three-point accelerator. The actual inverse is 100.

4.5. Computational Complexity and Stability

The objective of this section is to show that the computational complexity of the RAP algorithm is comparable to that of the FRLS algorithm. The computational complexity of the RAP algorithm can be derived from that of the NLMS algorithm. Each coefficient vector update using (15a) requires O(2N) multiply/adds and one division. Thus, the computational requirement for each input sample is O(2NK) multiply/adds and one division, where K is the total number of cycles for each block of equations. This estimate assumes that the normalization, 1/||x||², can be computed recursively. If this assumption does not hold, then the complexity becomes O(2NK + N) multiply/adds per input sample, with still only one division. The RAP algorithm is stable, i.e., convergent, if the step size satisfies 0 < μ < 2. The stability is independent of the eigenvalue spread of the Grammian matrix. The FRLS algorithms have computational complexity in the range O(7N)-O(40N) multiply/adds.

However, the faster versions of FRLS are prone to instability and are usually monitored to detect and eliminate its onset. Some procedures include utilizing a standard LMS algorithm during stabilization of the FRLS algorithm [11]. The slower versions of the FRLS algorithm use normalization techniques to alleviate numerical round-off, thus introducing many more divisions and square root operations not found in the lowest complexity FRLS implementations [10]. The computational complexity comparison of various adaptive algorithms, in multiply/adds per input sample, is shown in the chart in Fig. 8. The chart is based on the assumption that the average number of cycles used in the RAP algorithm does not exceed 20.

FIG. 8. The computational complexity comparison of various adaptive algorithms on a per sample basis. The dominating factor in the RLS algorithm is the multiply/add complexity, which usually excludes it from fast adaptive applications.

                    LMS     RAP        FRLS       RLS
    Multiply/add    2N      2N-40N     7N-40N     O(N²)
    Divide          1       1          2-4N       O(N²)
    Square root     0       0          0-5N       --

The following comments pertain to Fig. 8:

• The O(N) complexity of the RAP algorithm is not dependent upon the data structure. This contrasts with the FRLS algorithm, which relies on a tapped delay line model to obtain O(N) complexity. This is important in applications where the data matrix cannot be modeled as a Toeplitz system, as in, for example, adaptive array processing.
• As the multiply/add complexity of the RAP algorithm increases, the number of divisions required remains unity and the number of square roots remains zero. The FRLS, in comparison, requires an increasing number of divisions and square roots as the multiply/add complexity increases. Also, the faster versions of FRLS usually incorporate recovery techniques to circumvent stability problems, incurring overhead costs in the algorithm implementation. Thus, the total computational complexity of the RAP algorithm compares favorably to the total computational complexity of the FRLS algorithms.
• The lower bound on the complexity of the RAP algorithm, 2N/1/0 (multiply-adds/divides/square roots), is smaller than that for the FRLS algorithm, 7N/2/0. That is, the RAP algorithm can still be used when the computational resources become too low for FRLS implementation. This has been shown to be advantageous in applying the RAP algorithm to perform frequency subband processing with dynamic allocation of computational resources [33].
• Although the chart shows only complexity-per-sample, a better performance metric is the product (complexity-per-sample) × (samples-for-convergence). The RAP algorithm and the FRLS algorithm still provide good performance when this metric is used. However, the standard RLS algorithm, using the matrix inversion lemma, has a high complexity-per-sample cost, while the LMS algorithm usually incurs a high samples-for-convergence cost.

FIG. 9. Parallel implementation. The system of equations can be parsed to multiple processing units. Each processor produces an independent coefficient vector estimate that is eventually combined with the other estimates.

4.6. Parallel Implementation

The task of parallel implementation of a signal processing algorithm is greatly simplified if the algorithm can be decomposed into a sequence of multiplies and additions [34]. The RAP algorithm is decomposable in two respects: the rows of the data matrix can be operated on in any order that preserves the local stationarity, and each coefficient update consists of a vector inner product, a scalar multiply/add, and a vector addition. A block diagram of a coarse-grain parallel implementation of the RAP algorithm is shown in Fig. 9. The processing units consist of local memory, an I/O processor, and an arithmetic processor. Assume that there are P separate processing units, each accepting an input sample every Pth sampling time. Thus, each processing unit stores L rows of the data matrix spaced P samples apart, cycles through the L equations, and produces an estimate independently from the other processing units. The separate estimates are combined to produce an aggregate coefficient estimate that is passed to the adaptive filter.
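The coarse-grain decomposition of Fig. 9 can be sketched as follows; assigning every Pth row to a unit follows the text, while combining the unit estimates by a plain average is an illustrative choice of combining rule that the paper does not specify.

    import numpy as np

    def rap_cycles(c, X, d, mu=1.0, cycles=10, eps=1e-12):
        """Repeated row-action projections (15a) over a fixed set of rows."""
        for _ in range(cycles):
            for x_j, d_j in zip(X, d):
                c = c + mu * (d_j - c @ x_j) * x_j / (eps + x_j @ x_j)
        return c

    rng = np.random.default_rng(5)
    c_true = rng.standard_normal(8)
    X = rng.standard_normal((120, 8))
    d = X @ c_true

    P = 4                                          # number of processing units
    estimates = [rap_cycles(np.zeros(8), X[p::P], d[p::P]) for p in range(P)]
    c_combined = np.mean(estimates, axis=0)        # aggregate coefficient estimate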

5. SIMULATION RESULTS

This section presents simulation results obtained for the RAP algorithm in a time-varying channel tracking application using an adaptive transversal equalizer. The fast start-up performance of the RAP algorithm and its application to decision feedback equalizers has been presented previously [35]. The RAP algorithm is contrasted with the RLS algorithm to compare tracking ability and robustness. The simulations presented in this section were obtained using a transversal filter structure, as shown in Fig. 10. The adaptive filter length was 21 coefficients. The unknown system is modeled as an HF (time-varying) communications channel. This is accomplished by modeling the channel as an FIR filter with coefficients obtained by low-pass filtering white Gaussian noise. The 3-dB point of the low-pass filter used was f_s/4000, where f_s is the sampling rate of the communication system. This model has been used previously to describe an HF link [18,36]. The variation of the channel is depicted in Fig. 11. The symbol sequence used was a binary antipodal sequence randomly chosen from the set {-1, 1}. The tracking performance of the RAP algorithm compared to that of the RLS algorithm is shown in Figs. 12 and 13. In Fig. 12, the forgetting factor equals unity, i.e., λ = 1. The expectedly poor tracking of the RLS algorithm is borne out in Fig. 12. The forgetting factor is reduced in Fig. 13 to λ = 0.9 to provide a performance improvement. The RAP algorithm operated with a block size of 30 equations with 10 cycles and a step size μ = 1. This configuration is denoted as RAP(30,10). The performance of both algorithms for a rank-deficient data matrix is shown in Fig. 14. The rank deficiency was artificially induced by setting the second column of the data matrix equal to the first column.
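A sketch of the channel model described above is given below; only the f_s/4000 3-dB bandwidth comes from the text, while the SciPy Butterworth filter, its order, the sampling rate, and the 3-tap channel length are illustrative assumptions.

    import numpy as np
    from scipy.signal import butter, lfilter

    def fading_taps(n_taps, n_samples, fs, seed=0):
        """Each FIR tap is white Gaussian noise low-pass filtered to a 3-dB
        bandwidth of fs/4000, giving a slowly time-varying channel."""
        b, a = butter(2, (fs / 4000.0) / (fs / 2.0))   # 2nd-order Butterworth (illustrative)
        rng = np.random.default_rng(seed)
        return lfilter(b, a, rng.standard_normal((n_taps, n_samples)), axis=1)

    taps = fading_taps(n_taps=3, n_samples=450, fs=2400.0)
    rng = np.random.default_rng(1)
    symbols = np.where(rng.random(450) < 0.5, -1.0, 1.0)  # binary antipodal sequence
    received = np.zeros(450)
    for k in range(2, 450):
        # time-varying convolution: y[k] = sum_m h_m[k] * s[k - m]
        received[k] = taps[:, k] @ symbols[k - 2:k + 1][::-1]
    received += 0.01 * rng.standard_normal(450)           # additive channel noise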

FIG. 11. The time variation of the channel impulse response.

This only affects the low-level tails of the impulse response, since the reference signal is delayed to center the impulse response of the equalizer. Physically, the presence of nulls in the frequency response of the channel induces ill conditioning in the data matrix [18]. The average signal-to-noise ratio (SNR) at the input to the equalizer was measured at 25.8 dB. The noise dominates the lowest order eigenvalue of the Grammian matrix and is amplified when the lowest eigenvalue gets inverted, as in the RLS algorithm. In contrast, the RAP algorithm attenuates the noise-only modes of the data matrix. This noise attenuation property of the RAP algorithm is evident in Fig. 14 by its significant performance advantage over the RLS algorithm. The robustness of the RAP algorithm was demonstrated by a simulation in which large noise spikes, i.e., shot noise, were injected into the additive noise of the channel. The shot noise consisted of noise samples of amplitude 100 standard deviations spaced at intervals of 50 samples. The first spike occurs at sample 40. The signal and noise are otherwise identical to those in the previous simulation.


FIG. 10. The adaptive transversal filter used for the equalization simulations. The reference signal is delayed to center the impulse response of the equalizer.

FIG. 12. A comparison of RAP and RLS for λ = 1. The output error of the RAP algorithm is 5.4 dB better on average.

FIG. 13. A comparison of RAP and RLS for λ = 0.9. The output error of the RAP algorithm is 2.0 dB better on average.

The graph of Fig. 15 shows the quick recovery time and marginally robust performance of the standard RAP algorithm. The simple procedure of residual monitoring described in Section 4.2 produces a significant performance improvement, as shown in Fig. 16. In comparison, the performance of the RLS algorithm for the same conditions is shown in Fig. 17. The performance degradation is much more pronounced than the RAP algorithm degradation. Much of this difference can be attributed to the effective memory of the two algorithms. The effective memory of the RAP algorithm is essentially the block size, L. Thus, the effect of past data is retained only through the evolution of the filter coefficients. In contrast, the RLS algorithm has its effective memory given by the time constant of the exponential decay of the data, τ = 1/(1 − λ). Since all the weighted data are used to form coefficient updates, the effect of a statistical outlier on the RLS algorithm will generally last longer than with the RAP algorithm.

FIG. 15. The performance of RAP in the presence of shot noise. The shot noise causes a degradation of 6.3 dB. The average SNR (without shot noise) is 25.8 dB.

6. CONCLUSION

This paper presented an adaptive algorithm that is based upon matrix inversion techniques. The algorithm achieves O(N) complexity by iteratively operating on the rows of the system data matrix individually to form updates of the unknown adaptive coefficients. The iterative process converges to the pseudo-inverse of the data matrix, which stabilizes the algorithm's performance for rank-deficient systems. Additional techniques for increasing the algorithm's robustness and convergence performance were introduced. The RAP algorithm is easily implemented on a multiprocessor architecture due to the decomposability of the coefficient update equation. Computer simulations show that the RAP algorithm provides tracking performance comparable to that of the RLS algorithm for time-varying channel equalization. The RAP algorithm is also shown to have a performance advantage for rank-deficient systems, e.g., channels with spectral nulls, and for channel shot noise.


FIG. 14. The performance of RAP and RLS for a rank-deficient data matrix. The output error of the RAP algorithm is 6.0 dB better on average. The average SNR is 25.8 dB.

FIG. 16. The performance of RAP in the presence of shot noise disregarding large residuals. The shot noise causes a degradation of 4.3 dB, an average improvement of 2.0 dB over the standard RAP algorithm.

FIG. 17. The performance of RLS in the presence of shot noise. The shot noise causes a degradation of 12.8 dB. The RAP algorithm shows a 4.6-dB average improvement over the RLS algorithm.

ACKNOWLEDGMENTS

The research presented in this paper was made possible in part through the support of the Rutgers Wireless Information Network Laboratory and the Rutgers Center for Computer Aids for Industrial Productivity.


APPENDIX A: RAP Algorithm Pseudo-code

    M      number of equations in data matrix
    C      number of equations in processing block
    K      number of iterations per processing block
    ε      small positive constant used to avoid division by zero
    c      adaptive coefficient vector
    c[3]   three-point accelerated coefficient vector
    c[2]   two-point accelerated coefficient vector

    Begin Algorithm
    for N_b = 1 to M/C
        μ ← starting value for μ
        for N_k = 1 to K
            μ ← h·μ
            for N_i = 1 to C
                x ← ([N_b − 1]·C + N_i)th row of data matrix
                d ← ([N_b − 1]·C + N_i)th desired output
                e = μ·(d − c^T x) / (ε + ||x||²)
                c ← c + e·x
            endfor
            if N_k mod 3 = 0 then
                diff_1 = c − c_(−1)
                diff_2 = c − c_(−2)
                c[3] = c + [||diff_2||² / (ε + ||diff_1||²)]·diff_1
                c ← c[3]
            else
                c_(N_k mod 3) ← c
            endif
        endfor
        c[2] = c + α·[c − c_save]
        c_save ← c[2]
        c ← c[2]
    endfor
    End Algorithm
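A Python rendering of the pseudo-code is sketched below; it follows one reading of the bookkeeping above, and the residual-acceptance test guarding both acceleration steps is a safeguard added here, not part of the pseudo-code. The block size, cycle count, acceleration parameter, and the toy identification problem are illustrative.

    import numpy as np

    def rap_filter(X, d, block=30, cycles=10, mu=1.0, alpha=0.5, eps=1e-8):
        """Block RAP with the two- and three-point acceleration hooks of Appendix A.
        X is the (M, N) data matrix and d the length-M desired output."""
        M, N = X.shape
        c = np.zeros(N)
        for nb in range(M // block):
            Xb = X[nb * block:(nb + 1) * block]
            db = d[nb * block:(nb + 1) * block]
            blk_err = lambda v: np.linalg.norm(db - Xb @ v)
            c_save = c.copy()                      # estimate carried in from the previous block
            prev1 = prev2 = c.copy()               # estimates from the last two cycles
            for nk in range(1, cycles + 1):
                for x_j, d_j in zip(Xb, db):       # row-action projections (15a)
                    c = c + mu * (d_j - c @ x_j) * x_j / (eps + x_j @ x_j)
                if nk % 3 == 0:                    # three-point acceleration, every 3rd cycle
                    diff1, diff2 = c - prev1, c - prev2
                    cand = c + (diff2 @ diff2) / (eps + diff1 @ diff1) * diff1
                    if blk_err(cand) < blk_err(c): # safeguard added here, not in the pseudo-code
                        c = cand
                prev2, prev1 = prev1, c.copy()
            cand = c + alpha * (c - c_save)        # two-point acceleration after the block
            if blk_err(cand) < blk_err(c):         # same safeguard
                c = cand
        return c

    # Toy usage: identify a 5-tap system from 300 mildly ill-conditioned equations
    rng = np.random.default_rng(6)
    c_true = rng.standard_normal(5)
    X = rng.standard_normal((300, 5)) * np.array([1.0, 0.7, 0.4, 0.1, 0.05])
    d = X @ c_true + 0.001 * rng.standard_normal(300)
    c_hat = rap_filter(X, d)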

REFERENCES

1. Treichler, J., Johnson, C., Jr., and Larimore, M. Theory and Design of Adaptive Filters. Wiley, New York, 1987.
2. Widrow, B., et al. Adaptive noise canceling: Principles and applications. Proc. IEEE (Dec. 1975).
3. Orfanidis, S. Optimum Signal Processing. McGraw-Hill, New York, 1988.
4. Gitlin, R., and Magee, F., Jr. Self-orthogonalizing adaptive equalization algorithms. IEEE Trans. Commun. COM-25 (July 1977), 666-672.
5. Chang, R. A new equalizer structure for fast start-up digital communication. Bell Syst. Tech. J. 50 (July/Aug. 1971), 1969-2013.
6. Ungerboeck, G. Theory on the speed of convergence in adaptive equalizers for digital communication. IBM J. Res. Dev. (Nov. 1972).
7. Lee, E., and Messerschmitt, D. Digital Communication. Kluwer Academic, Dordrecht, The Netherlands, 1988.
8. Haykin, S. Adaptive Filter Theory. Prentice-Hall, New York, 1986.
9. Ybarra, G., and Alexander, S. Effects of ill-conditioned data on least squares adaptive filters. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 1988, pp. 1387-1390.
10. Cioffi, J., and Kailath, T. Fast recursive least squares transversal filters for adaptive filtering. IEEE Trans. Acoust. Speech Signal Process. ASSP-32 (Apr. 1984), 304-337.
11. Eleftheriou, E., and Falconer, D. Restart methods for stabilizing FRLS adaptive equalizers in digital HF transmissions. In IEEE Global Communications Conference, Dec. 1984, pp. 48.4.1-48.4.5.
12. Bottomley, G., and Alexander, S. A theoretical basis for the divergence of conventional recursive least squares filters. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 1989.
13. Slock, D., and Kailath, T. Numerically stable fast recursive least-squares transversal filters. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 1988, pp. 1365-1368.
14. Cioffi, J. Limited-precision effects in adaptive filtering. IEEE Trans. Circuits Syst. 34 (July 1987), 821-833.
15. Kim, D., and Alexander, W. Stability analysis of the fast RLS adaptation algorithm. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 1988, pp. 1361-1364.
16. Benallal, A., and Gilloire, A. A new method to stabilize fast RLS algorithms based on a first-order model of the propagation of numerical errors. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 1988, pp. 1373-1376.
17. Gitlin, R., and Weinstein, S. Fractionally-spaced equalization: An improved digital transversal equalizer. Bell Syst. Tech. J. 60 (Feb. 1981), 275-296.
18. Ling, F., and Proakis, J. Adaptive lattice decision feedback equalizers: Their performance and application to time variant multipath channels. IEEE Trans. Commun. COM-33 (Apr. 1985), 348-356.
19. Milstein, L. Interference suppression to aid acquisition in direct-sequence spread-spectrum communication. IEEE Trans. Commun. 36 (Nov. 1988), 1200-1207.
20. Kay, S. Recursive maximum likelihood estimation of autoregressive processes. IEEE Trans. Acoust. Speech Signal Process. 31 (Feb. 1983), 56-65.
21. Makhoul, J. Linear prediction: A tutorial review. Proc. IEEE 63 (Apr. 1975), 561-580.
22. Flanagan, J., et al. Speech coding. IEEE Trans. Commun. COM-27 (Apr. 1979), 710-736.
23. Robbins, H., and Monro, S. A stochastic approximation method. Ann. Math. Stat. 22 (1951), 400-407.
24. Ben-Israel, A., and Greville, T. Generalized Inverses: Theory and Applications. Wiley, New York, 1974.
25. Long, G., Ling, F., and Proakis, J. Fractionally-spaced equalizers based on singular value decomposition. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 1988, pp. 1514-1517.
26. Alexander, S. Adaptive Signal Processing: Theory and Applications. Springer-Verlag, New York/Berlin, 1986.
27. Youla, D. Generalized image reconstruction by the method of alternating projections. IEEE Trans. Circuits Syst. 25 (1978).
28. Podilchuk, C., and Mammone, R. A comparison of projection techniques for image restoration. In SPIE Proc. Electron Microscopy and Video Processing, Los Angeles, CA, Jan. 1989.
29. Tikhonov, A., and Arsenin, V. Solutions to Ill-Posed Problems. V. H. Winston and Sons, Washington, DC, 1977.
30. Rushforth, C. Signal restoration, functional analysis, and Fredholm integral equations of the first kind. In Image Recovery: Theory and Application (H. Stark, Ed.), Chap. 1, Academic Press, San Diego, 1987.
31. Tanabe, K. Projection method for solving a singular system of linear equations and its applications. Numer. Math. 17 (1971), 203-214.
32. Goodwin, G., and Sin, K. Adaptive Filtering Prediction and Control. Prentice-Hall, New York, 1984.
33. Gay, S., and Mammone, R. Acoustic echo cancellation using the RAP on the DSP16A. In IEEE International Conference on Acoustics, Speech, and Signal Processing, Apr. 1990.
34. Haynes, L., et al. A survey of highly parallel computing. IEEE Comput. Mag. 15 (Jan. 1982), 9-24.
35. Doherty, J., and Mammone, R. A new method for robust fast tracking channel equalization. In IEEE Military Communications Conference, Oct. 1989, pp. 13.4.1-13.4.5.
36. Eleftheriou, E., and Falconer, D. Adaptive equalization techniques for HF channels. IEEE J. Selected Areas Commun. SAC-5 (Feb. 1987), 238-247.

JOHN F. DOHERTY is an assistant professor at Iowa State University in the Department of Electrical Engineering and Computer Engineering. He received the B.S. (honors) degree from The City University of New York in 1982, the M.Eng. degree from Stevens Institute of Technology, Hoboken, New Jersey, in 1985, and the Ph.D. degree from Rutgers University, New Brunswick, New Jersey, in 1990, all in electrical engineering. He worked as a Member of Technical Staff at AT&T Bell Laboratories from 1985 to 1988 in the area of underwater acoustical signal processing. Prior to that, he worked as an integrated circuit reliability engineer at IBM, Poughkeepsie, New York. His current research activities include signal processing for communications, image analysis, and magnetic field sensing. Dr. Doherty is currently serving as the Executive Secretary of the IEEE Communication Society Scholarship Program.

RICHARD J. MAMMONE received the B.E.E., M.E.E.E., and Ph.D. degrees from the City College of New York in 1975, 1977, and 1981, respectively. He was on the faculty of Manhattan College for 1 year before coming to Rutgers University in 1982. He is currently an associate professor in the Department of Electrical and Computer Engineering at Rutgers. He is the coeditor of the book Neural Networks: Theory and Applications (Academic Press, 1991). He is the editor of the forthcoming book Computational Methods of Signal Recovery and Recognition (Wiley). He was an associate editor of the IEEE Communications Magazine. He is an associate editor of the journal Pattern Recognition. Dr. Mammone is a senior member of IEEE and a member of Eta Kappa Nu and Sigma Xi. He is listed in Marquis' Who's Who in the World, as well as several other biographies.