Signal Processing 50 (1996) 123-136

Asymptotic convergence analysis of the projection approximation subspace tracking algorithms

Bin Yang*
Department of Electrical Engineering, Ruhr University Bochum, 44780 Bochum, Germany

Received 30 March 1995; revised 20 November 1995
Abstract

Subspace tracking plays an important role in a variety of adaptive subspace methods. In this paper, we present a theoretical convergence analysis of two recently proposed projection approximation subspace tracking algorithms (PAST and PASTd). By invoking Ljung's ordinary differential equation approach, we derive a pair of coupled matrix differential equations whose trajectories describe the asymptotic convergence behavior of the subspace tracking algorithms. We discuss properties of the matrix differential equations and determine their asymptotically stable equilibrium states and domains of attraction. It turns out that, under weak conditions, both PAST and PASTd globally converge to the desired signal subspace or signal eigenvectors and eigenvalues with probability one. Numerical examples are also included to illustrate the asymptotic convergence rate of the algorithms.
* E-mail: [email protected]
Keywords: Array signal processing; Subspace methods; Recursive estimation; Convergence analysis
1. Introduction

In the last couple of decades, adaptive subspace methods have been applied successfully to a variety of signal processing applications. One computation step which plays a central role in these methods is subspace estimation and tracking [1, 10, 12, 15-17, 20, 21, 24]. Given the samples of an input process, we are interested in estimating and tracking the subspace spanned by the eigenvectors corresponding to the largest or smallest eigenvalues of the correlation matrix of the input process.

The classical subspace estimation technique uses batch eigenvalue decomposition (ED) of the sample correlation matrix or batch singular value decomposition (SVD) of the data matrix. This approach is computationally too expensive for adaptive methods because it would require repeated ED/SVD. Modern subspace tracking methods are recursive in nature. They update the subspace in a sample-by-sample way. Hence, they significantly reduce the computational complexity and are well suited to tracking applications with time-varying signal statistics.

Recently, we presented two adaptive algorithms for tracking the signal subspace spanned by the eigenvectors corresponding to the largest eigenvalues [19, 21]. Based on a novel interpretation of the signal subspace as the solution of an unconstrained minimization task, we applied a projection approximation to reduce the minimization task to the well-known exponentially weighted least squares problem. Recursive least squares (RLS) methods are then used to track the signal subspace. The resulting algorithm is referred to as projection approximation subspace tracking (PAST). It requires 3nr + O(r^2) operations for each update, where n is the input vector dimension and r (r < n) denotes the dimension of the signal subspace. The PAST algorithm computes an asymptotically orthonormal basis of the signal subspace. In order to get the signal eigenvectors (and eigenvalues), we derived a second algorithm from PAST by applying the deflation technique. The algorithm is called PASTd. It updates the r dominant eigenvalues and eigenvectors by only 4nr + O(r) operations.

Both PAST and PASTd have been tested successfully in various applications such as frequency estimation, direction-of-arrival tracking, and image compression [4, 19, 21]. They have also been extended to joint rank and subspace tracking at a low computational complexity O(nr) [20, 22]. It turns out that PAST and PASTd offer a faster rate of convergence than gradient type subspace tracking algorithms [10, 12, 16, 24] because they use a Newton type update. In addition, they behave more robustly in scenarios with closely spaced signals and/or low signal-to-noise ratio than the ED/SVD approach.

One open question, however, is the global convergence behavior of the algorithms. Though all simulation results reveal a global convergence of PAST and PASTd to the desired signal subspace, there has been no theoretical analysis up to now. In this paper, we adopt Ljung's ordinary differential equation (ODE) approach [8, 9] for analyzing the asymptotic behavior of PAST and PASTd. In Section 2, both algorithms are reviewed briefly. Section 3 applies Ljung's method to derive a set of ODEs which describe the asymptotic dynamics of PAST and PASTd. Properties of the ODEs, in particular asymptotically stable equilibrium states and their domains of attraction, are studied in Section 4. Several computer simulations are given in Section 5 to illustrate the asymptotic convergence rate of the algorithms.

The following notations are used in the paper. Matrices and vectors are represented by bold italic and underlined italic characters, respectively. Vectors are by default in column orientation. The superscript T denotes transposition. I is an identity matrix. E(·) and tr(·) denote the expectation and trace operators. ||a|| is the Euclidean norm of the vector a, and ||A||_F is the Frobenius norm of the matrix A.
2. Projection approximation subspace tracking
Let x ∈ R^n be a random vector with the correlation matrix C = E{x x^T}. The non-negative eigenvalues and the corresponding orthonormal eigenvectors of C are denoted by λ_i and u_i (i = 1, …, n). In matrix notation, C = UΣU^T with U = [u_1, …, u_n], U^T U = I, and Σ is a diagonal matrix with the diagonal elements λ_1, …, λ_n. We define

S(i_1, …, i_r) = {W ∈ R^{n×r} | W = [u_{i_1}, …, u_{i_r}] Q, Q orthogonal},   (1)

where {i_1, …, i_r} is a subset of {1, …, n}. S(i_1, …, i_r) is the set of all n × r matrices whose columns form an orthonormal basis of the subspace spanned by the eigenvectors u_{i_1}, …, u_{i_r}. Suppose that the eigenvalues are non-increasingly ordered, λ_1 ≥ … ≥ λ_n ≥ 0. The subspace we are interested in is the signal subspace as represented by S(1, …, r). In order for this subspace to be unique, λ_r must be strictly larger than λ_{r+1}.

In [19, 21], we considered the scalar function¹

J(W) = E||x − W W^T x||² = tr(C) − 2 tr(W^T C W) + tr(W^T C W · W^T W)   (2)

with a matrix argument W ∈ R^{n×r} (r < n). Let dJ/dW ∈ R^{n×r} denote the derivative matrix of J(W) with respect to W. Its (i, j)-th element is the derivative of J(W) with respect to the (i, j)-th element of W. A straightforward calculation shows

dJ/dW = 2(−2C + C W W^T + W W^T C) W.   (3)

Under the assumption that W has full rank r, we have proven two results in [21].

Theorem 1. Each stationary point of J(W) is characterized by W ∈ S(i_1, …, i_r). To be more specific, dJ/dW = 0 implies W^T W = I and (I − W W^T) C W = 0. The last equation is satisfied if and only if W ∈ S(i_1, …, i_r), where {i_1, …, i_r} is an arbitrary subset of {1, …, n}.

Theorem 2. If λ_1 ≥ … ≥ λ_r > λ_{r+1} ≥ … ≥ λ_n, then J(W) > J(W*) = Σ_{i=r+1}^n λ_i for W* ∈ S(1, …, r) and W ∉ S(1, …, r). In addition, all stationary points not belonging to S(1, …, r) are saddle points.

A conclusion of Theorem 2 is that J(W) has no local minima and attains the global minimum if the columns of W form an orthonormal basis of the signal subspace. This motivates the minimization of J(W) for seeking the signal subspace.

In real applications, only sample vectors x(k) (k = 1, 2, …) of the stochastic vector process x are available. We consider the exponentially weighted sum

J(W(k)) = Σ_{i=1}^k β^{k−i} ||x(i) − W(k) W^T(k) x(i)||²
        = tr[C(k)] − 2 tr[W^T(k) C(k) W(k)] + tr[W^T(k) C(k) W(k) · W^T(k) W(k)],   (4)

in which 0 < β ≤ 1 is an exponential forgetting factor and C(k) = Σ_{i=1}^k β^{k−i} x(i) x^T(i) is a sample correlation matrix. By using the same arguments as above, we conclude that J(W(k)) is minimized if the columns of W(k) form an orthonormal basis of the signal subspace spanned by the r dominant eigenvectors of C(k).

J(W(k)) is a fourth order function of the elements of W(k). Iterative algorithms are thus necessary to minimize J(W(k)). In order to facilitate a recursive minimization, we suggested to approximate W^T(k) x(i) in (4), the unknown projection of x(i) onto the columns of W(k), by y(i) = W^T(i − 1) x(i). Note that all y(i) for 1 ≤ i ≤ k are known at the time instant k. This leads to the modified cost function

J'(W(k)) = Σ_{i=1}^k β^{k−i} ||x(i) − W(k) y(i)||²,   (5)

which has the same structure as the exponentially weighted least squares problem in adaptive filtering [6]. It is now straightforward to apply the RLS method (by using the matrix inversion lemma) to derive the PAST algorithm given in [19, 21]. In the following, we give another formulation of the PAST algorithm, which is more convenient for our convergence analysis:

y(k) = W^T(k − 1) x(k),   (6a)
C_yy(k) = β C_yy(k − 1) + y(k) y^T(k),   (6b)
W(k) = W(k − 1) + [x(k) − W(k − 1) y(k)] y^T(k) C_yy^{−1}(k).   (6c)

¹ Since the ODE approach was originally formulated for real case algorithms, we consider in this paper the real version of the subspace tracking algorithms PAST and PASTd. We believe, however, that the convergence results derived in this paper also hold for the complex case. In order to show this, the ODE approach has to be extended to the complex case, or the complex algorithms have to be recast into real ones by considering the real and imaginary parts separately. This is currently being studied.
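For concreteness, the following NumPy sketch implements one update step (6a)-(6c) literally; the function name and the explicit r × r solve are our choices for illustration, whereas the O(nr) implementation in [19, 21] maintains C_yy^{−1}(k) recursively via the matrix inversion lemma.

```python
import numpy as np

def past_update(W, Cyy, x, beta=1.0):
    """One PAST update step, following recursions (6a)-(6c) literally.

    W    : (n, r) array, previous subspace estimate W(k-1)
    Cyy  : (r, r) array, previous compressed correlation matrix C_yy(k-1)
    x    : (n,) array, current sample x(k)
    beta : forgetting factor, 0 < beta <= 1
    """
    y = W.T @ x                                    # (6a)
    Cyy = beta * Cyy + np.outer(y, y)              # (6b)
    e = x - W @ y                                  # projection approximation error
    W = W + np.outer(e, np.linalg.solve(Cyy, y))   # (6c): e y^T C_yy^{-1}
    return W, Cyy
```

Iterating past_update over a stream of samples x(1), x(2), … then yields the sequence of subspace estimates W(k).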
The PASTd algorithm is a modification of PAST. It applies the deflation technique to compute the eigencomponents sequentially. First the PAST algorithm with r = 1 is used to update w_1(k), the estimate of the first eigenvector of C(k). Then we remove the projection of the current sample vector x(k) onto w_1(k) from x(k) itself. Because now the second dominant eigenvector becomes the most dominant one, w_2(k) can be updated in the same way as w_1(k). Below we give a brief summary of the PASTd algorithm:

y_i(k) = w_i^T(k − 1) x_i(k),   (7a)
c_yy,i(k) = β c_yy,i(k − 1) + y_i²(k),   (7b)
w_i(k) = w_i(k − 1) + [x_i(k) − w_i(k − 1) y_i(k)] y_i(k) / c_yy,i(k),   (7c)
x_{i+1}(k) = x_i(k) − y_i(k) w_i(k).   (7d)

The index i runs from 1 to r, and the loop is initialized with x_1(k) = x(k).
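A corresponding sketch of one PASTd update (7a)-(7d); again the names are ours. Note that the deflation step (7d) uses the already updated eigenvector estimate w_i(k), and that the scalars c_yy,i must be initialized with positive values.

```python
import numpy as np

def pastd_update(W, c, x, beta=1.0):
    """One PASTd update step, following the deflation loop (7a)-(7d).

    W : (n, r) array, columns are eigenvector estimates w_1(k-1), ..., w_r(k-1)
    c : (r,) float array, scalars c_yy,i(k-1); must be initialized positive
    x : (n,) array, current sample x(k)
    """
    x_i = x.astype(float)                       # x_1(k) = x(k)
    for i in range(W.shape[1]):
        y = W[:, i] @ x_i                       # (7a)
        c[i] = beta * c[i] + y * y              # (7b)
        W[:, i] += (x_i - W[:, i] * y) * y / c[i]   # (7c)
        x_i = x_i - y * W[:, i]                 # (7d): deflation with updated w_i(k)
    return W, c
```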
3. The ODE of the PAST algorithm

PAST and PASTd are stochastic approximation algorithms. A rigorous convergence analysis of these algorithms is in general difficult. A major reason for that is the stochastic and nonlinear character of (6) and (7). Note that the quantity y(k) in (6a) is formed using the current sample vector x(k) and the previous subspace estimate W(k − 1). It is then used in (6b) to update the r × r matrix C_yy(k), whose inverse C_yy^{−1}(k), together with y(k), is involved in (6c). Hence, W(k) is a strongly nonlinear function of W(k − 1), and the mapping from the input data {x(1), …, x(k)} to the subspace estimate W(k) is highly complex. In particular, the effect of the projection approximation, i.e. replacing W^T(k) x(i) in (4) by y(i) = W^T(i − 1) x(i), is not completely studied.

Ljung [8], Kushner and Clark [7], and Ljung and Söderström [9] have independently developed a theory to analytically study the asymptotic behavior of stochastic approximation algorithms. The basic idea is to associate a continuous time deterministic ODE with the discrete time stochastic approximation algorithm at hand. They have shown that, under weak conditions, the parameter estimates of the stochastic approximation algorithm asymptotically follow the trajectories of the ODE and converge to the stable equilibrium states of the dynamic system described by the ODE.

In this paper, we apply the ODE approach to analyze the asymptotic dynamics of the PAST and PASTd algorithms. In particular, we use Theorem 4.2 in [9] to derive an ODE associated with (6). Before doing this, let us first reformulate and extend Ljung's theorem to a form which better suits our purpose.²

Given a sequence of sample vectors {x(k) | k = 1, 2, …}, which is a particular realization of an underlying stochastic vector process, we consider the following stochastic approximation algorithm:

y(k) = A(W(k − 1)) y(k − 1) + B(W(k − 1)) x(k),   (8a)

[x̂(k); ŷ(k)] = C(W(k − 1)) y(k),   (8b)

e(k) = x(k) − x̂(k),   (8c)

R(k) = R(k − 1) + γ(k) H(R(k − 1), W(k − 1), e(k), ŷ(k)),   (8d)

W(k) = W(k − 1) + γ(k) h(W(k − 1), e(k), ŷ(k)) R^{−1}(k).   (8e)

² Originally, Ljung's theorem was formulated for algorithms with a parameter vector. It is not hard to extend the theorem to algorithms with parameter matrices.
A(W), B(W), C(W), H(R, W, e, y) and h(W, e, y) are matrix valued functions with appropriate dimensions. {γ(k)} is a gain sequence of positive numbers. In the sequel, we shall work with the following conditions:

A1: D is a compact subset of R^{n×r}, such that A(W) for W ∈ D has all eigenvalues strictly inside the unit circle.

A2: The matrices A(W), B(W) and C(W) are continuously differentiable with respect to W ∈ D.

A3: H(R, W, e, y) is differentiable with respect to R, W, e and y, such that, for some positive constant C < ∞,

||H||_F ≤ C(1 + ||R||_F + ||e||² + ||y||²),   (9a)
||∂H/∂R||_F + ||∂H/∂W||_F ≤ C(1 + ||e||² + ||y||²),   (9b)
||∂H/∂e||_F + ||∂H/∂y||_F ≤ C(1 + ||e|| + ||y||).   (9c)

Note that for two arbitrary matrices X ∈ R^{n_x × m_x} and Y ∈ R^{n_y × m_y}, the derivative of Y with respect to X is defined as an n_x n_y × m_x m_y matrix dY/dX = Σ_{i,j} E_{ij} ⊗ ∂Y/∂x_{ij}, where ⊗ is the Kronecker tensor product [5], and x_{ij} is the (i, j)-th element of X. The matrix E_{ij} has the same dimension as X. It has zero elements but 1 at the position (i, j). In other words, the (i, j)-th block of dY/dX is ∂Y/∂x_{ij}.

A4: h(W, e, y) is differentiable with respect to W, e and y, such that, for some positive constant C < ∞,

||h||_F + ||∂h/∂W||_F ≤ C(1 + ||e||² + ||y||²),  ||∂h/∂e||_F + ||∂h/∂y||_F ≤ C(1 + ||e|| + ||y||).   (10)

A5: The matrix R(k) in (8d) is symmetric. In addition, there exists some δ > 0 such that R(k) ≥ δI for all k. With A ≥ B, we mean that A − B is non-negative definite. Similarly, A > B means that A − B is positive definite.

A6: The gain sequence {γ(k)} satisfies lim_{k→∞} k γ(k) = μ > 0.³

A7: The data sequence {x(k)} is such that the following limits exist for all W ∈ D:

H̄(R, W) = lim_{k→∞} (1/k) Σ_{i=1}^k E{H(R, W, e(i, W), y(i, W))},   (11a)

h̄(W) = lim_{k→∞} (1/k) Σ_{i=1}^k E{h(W, e(i, W), y(i, W))}.   (11b)

Here e(i, W) and y(i, W) are defined in a similar way as in (8a)-(8c) except for the use of a fixed value W instead of W(k − 1). The expectation E{·} in (11) is over the stochastic input process {x(k)}.

A8: The sequence of random vectors {x(k)} is required to be "asymptotically independent". A precise definition of this so-called mixing condition is given in [9, p. 169].

Ljung has proven that, under assumptions A1-A8, R(k) and W(k) of the stochastic approximation algorithm (8) stay close to the corresponding trajectory of the ODE

Ṙ(t) := dR(t)/dt = H̄(R(t), W(t)),   (12a)

Ẇ(t) := dW(t)/dt = h̄(W(t)) R^{−1}(t)   (12b)

with higher and higher probability as k increases. Here t denotes a fictitious continuous time, which is related to the gain sequence {γ(k)} by t(k) = Σ_{i=1}^k γ(i). In addition, if

A9: R(k) and W(k) enter a compact subset of the domain of attraction of an asymptotically stable equilibrium state of the ODE (12) infinitely often with probability one,

then R(k) and W(k) tend to this equilibrium state with probability one.

In order to apply Ljung's theorem to the PAST algorithm (6), we choose the forgetting factor β to be 1. This means no discarding of previous data, which is implicitly required by assumption A6.

³ Weaker conditions on the gain sequence, in the form Σ_{k=1}^∞ γ(k) = ∞ and Σ_{k=1}^∞ γ^p(k) < ∞ for some p > 1, are possible as well.
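As a side remark on the time scale of this association (our observation, not part of the original argument): with the gain sequence γ(k) = 1/(k + 1) used below for PAST, the fictitious time

t(k) = Σ_{i=1}^k 1/(i + 1) ≈ ln k

grows only logarithmically in k. Following the ODE trajectory up to continuous time t therefore corresponds to roughly e^t discrete updates, so late phases of an ODE trajectory are traversed ever more slowly by the algorithm.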
To bring algorithm (6) to the general form (8), we define R(k) = γ(k) C_yy(k) with γ(k) = 1/(k + 1) (k ≥ 0). The PAST algorithm can then be reformulated to

y(k) = W^T(k − 1) x(k),   (13a)

e(k) = x(k) − W(k − 1) y(k),   (13b)

R(k) = R(k − 1) + γ(k) [y(k) y^T(k) − R(k − 1)],   (13c)

W(k) = W(k − 1) + γ(k) e(k) y^T(k) R^{−1}(k).   (13d)
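A NumPy sketch of one step of (13a)-(13d) with γ(k) = 1/(k + 1); the δI lift below is one simple realization (our choice) of the regularization of R(k) required by condition A5 and discussed further down.

```python
import numpy as np

def past_normalized_update(W, R, x, k, delta=1e-6):
    """One step of the reformulated PAST algorithm (13a)-(13d) with
    beta = 1 and gain gamma(k) = 1/(k+1), where R(k) = gamma(k) C_yy(k).
    """
    gamma = 1.0 / (k + 1.0)
    y = W.T @ x                                   # (13a)
    e = x - W @ y                                 # (13b)
    R = R + gamma * (np.outer(y, y) - R)          # (13c)
    if np.linalg.eigvalsh(R)[0] < delta:          # R(k) nearly singular?
        R = R + delta * np.eye(R.shape[0])        # enforce R(k) >= delta*I (A5)
    W = W + gamma * np.outer(e, np.linalg.solve(R, y))   # (13d)
    return W, R
```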
Comparing this algorithm with (8) reveals that the PAST algorithm is a member of the family of stochastic approximation algorithms (8) with

A(W) = 0,   B(W) = W^T,   C(W) = [W; I],
H(R, W, e, y) = y y^T − R,   h(W, e, y) = e y^T,   (14)

where C(W) = [W; I] denotes W stacked over the r × r identity, so that x̂(k) = W(k − 1) y(k) and ŷ(k) = y(k). Now it is easy to verify conditions A1-A5. Conditions A1 and A2 are obviously satisfied with D = R^{n×r}. Conditions A3 and A4 are satisfied as well, since, for the functions H(R, W, e, y) and h(W, e, y) given in (14), we find

||H||_F ≤ ||R||_F + ||y||²,   (15a)
||∂H/∂R||_F + ||∂H/∂W||_F + ||∂H/∂e||_F + ||∂H/∂y||_F ≤ C(1 + ||y||),   (15b)
||h||_F + ||∂h/∂W||_F ≤ ½(||e||² + ||y||²),   (15c)
||∂h/∂e||_F + ||∂h/∂y||_F ≤ C(1 + ||e|| + ||y||)   (15d)

for some constant C.

Condition A5 is merely a regularization requirement imposed on the matrix R(k) in (13c). Remember that R(k) is always symmetric and positive definite if its initial value R(0) is. However, R(k) could be almost singular for large k if x(k) is not persistently exciting. In order to assure that R^{−1}(k) is uniformly bounded, it is necessary to modify Eq. (13c) to satisfy condition A5. There exist various possibilities for achieving this. The simplest way is to add δI to R(k) if we find R(k) nearly singular. Other more efficient techniques can be found in [9]. Therefore, condition A5 is satisfied by a careful implementation of the update (13c). Regarding the gain sequence γ(k) = 1/(k + 1), condition A6 is fulfilled with μ = 1.

The last two assumptions A7 and A8 describe conditions imposed on the observed input data {x(k)}. Note that, for a fixed matrix W, the quantities y(k, W) and e(k, W) in (11) become y(k, W) = W^T x(k) and e(k, W) = x(k) − W y(k, W) for the PAST algorithm. According to (14), we obtain

H(R, W, e(k, W), y(k, W)) = y(k, W) y^T(k, W) − R = W^T x(k) x^T(k) W − R,   (16a)

h(W, e(k, W), y(k, W)) = e(k, W) y^T(k, W) = x(k) x^T(k) W − W W^T x(k) x^T(k) W.   (16b)

Condition A7 is thus satisfied with the limits

H̄(R, W) = W^T C W − R,   (17a)

h̄(W) = C W − W W^T C W = (I − W W^T) C W,   (17b)

if lim_{k→∞} (1/k) Σ_{i=1}^k E[x(i) x^T(i)] = C holds. This is, for example, the case if x(k) is stationary with the correlation matrix C. Finally, we mention that assumption A8 does not exclude a sequence of correlated random vectors. It just requires that x(i) for i ≪ k be asymptotically independent of x(k).

The ODE (12) associated with the PAST algorithm is hence given by

Ṙ(t) = W^T(t) C W(t) − R(t),   (18a)

Ẇ(t) = [I − W(t) W^T(t)] C W(t) R^{−1}(t),   (18b)

where C is a fixed correlation matrix, and R = R(t) and W = W(t) are continuous time versions of R(k) and W(k) in (13). In the next section, we shall discuss to which values R(t) and W(t) tend as t increases.
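Before the formal analysis, the behavior of (18) can be previewed numerically. The following forward-Euler sketch (step size, horizon and test matrix are our choices, and Euler integration is only a rough tool here) integrates the coupled ODE and shows W(t) approaching an orthonormal basis of the span of the three dominant eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, dt, steps = 6, 3, 0.01, 5000
C = np.diag([10.0, 9.0, 8.0, 3.0, 2.0, 1.0])      # u_1, u_2, u_3 = e_1, e_2, e_3
W = np.linalg.qr(rng.standard_normal((n, r)))[0]  # W(0): random orthonormal columns
R = np.eye(r)                                     # R(0): symmetric positive definite

for _ in range(steps):
    S = W.T @ C @ W
    R_new = R + dt * (S - R)                          # (18a)
    W = W + dt * (C @ W - W @ S) @ np.linalg.inv(R)   # (18b)
    R = R_new

print(np.linalg.norm(W.T @ W - np.eye(r)))            # should be small: W^T W -> I
print(np.linalg.norm(W @ W.T - np.diag([1., 1., 1., 0., 0., 0.])))  # signal projector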
4. Convergence analysis
4.1. Two lemmas and the equilibrium states

Eq. (18) describes an autonomous nonlinear dynamic system. Before we discuss its asymptotic behavior, we first prove two useful lemmas.

Lemma 1. If R(0) is symmetric and positive definite, R(t) in (18a) will be symmetric and positive definite for t > 0 as well. In addition, R(t) is bounded uniformly in time if W^T(t) W(t) is uniformly bounded.

Proof. Note that Eq. (18a) does not describe a simple non-homogeneous linear matrix differential equation of first order with constant coefficients, due to the W and R dependence as given in (18b). Hence, it is hard to find the explicit solution R(t) to the initial values R(0) and W(0). Nevertheless, it is straightforward to verify that R(t) can still be expressed in the implicit form

R(t) = e^{−t} [∫_0^t W^T(τ) C W(τ) e^τ dτ + R(0)],   (19)

with W^T(τ) C W(τ) being a function of R(τ). Since W^T(τ) C W(τ) is always symmetric and non-negative definite, the first part of Lemma 1 follows immediately. To prove the second part, we assume W^T(t) W(t) ≤ M I for some M > 0 and all t ≥ 0. Clearly, W^T(t) C W(t) ≤ λ_1 W^T(t) W(t) ≤ λ_1 M I, where λ_1 is the largest eigenvalue of C. Therefore, R(t) ≤ e^{−t} [λ_1 M (e^t − 1) I + R(0)] ≤ λ_1 M I + R(0). □

In the following, we always assume R(0) to be symmetric and positive definite. Remember that, in the implementation of the PAST algorithm, a similar constraint is imposed on the initial value of the discrete time update R(k) [21].

Lemma 2. For each eigenvector u_i of C, both {W | W^T u_i = 0} and {W | W^T u_i ≠ 0} are invariant sets for the ODE (18b). In other words, each trajectory W(t) starting inside one of the sets remains there for all subsequent times. In the single vector case, i.e. if W simplifies to a vector w, {w | w^T u_i = 0}, {w | w^T u_i > 0} and {w | w^T u_i < 0} are all invariant sets.

Proof. Let us define θ = W^T u_i. By exploiting the relation C u_i = λ_i u_i, we easily obtain

θ̇ = R^{−1} (λ_i I − W^T C W) θ   (20)

from (18b). Obviously, θ = 0 is an equilibrium state. This means, θ(0) = 0 implies θ(t) = 0 for all t. Since Eq. (20) satisfies the Lipschitz condition, the solution θ(t) = 0 to the initial value θ(0) = 0 is unique [2, 14]. In other words, θ(t) ≠ 0 if θ(0) ≠ 0. In the single vector case (r = 1), (20) simplifies to the scalar differential equation

θ̇ = (1/R)(λ_i − w^T C w) θ = α(w(t), R(t)) · θ(t).   (21)

Using similar arguments as in the proof of Lemma 1, an implicit solution of (21) to the initial value θ(0) can be written as

θ(t) = θ(0) exp(∫_0^t α(w(τ), R(τ)) dτ).   (22)

Therefore, θ(0) = 0 (> 0, < 0) implies θ(t) = 0 (> 0, < 0). □

Lemma 2 has a nice interpretation. If the initial value W(0) does not have a component along the direction u_i, then W(t) will never gain such a component in future times. As a consequence, a necessary condition for W(t) converging to the desired signal subspace S(1, …, r) is that W(0) must have non-zero projections onto each of the r dominant eigenvectors u_1, …, u_r. Similarly, if W(0) has a non-zero component θ(0) along u_i, then this component will not vanish identically for finite time. This does not exclude, however, θ(t) from converging to zero as t tends to infinity.

Now we turn to the equilibrium states of the autonomous system (18). Each equilibrium state is characterized by Ẇ = 0 and Ṙ = 0. Since R(t) is positive definite due to Lemma 1 and the assumption R^T(0) = R(0) > 0, Ẇ = 0 implies (I − W W^T) C W = 0. Using Theorem 1, we conclude W ∈ S(i_1, …, i_r) for any full rank W. Actually, S(i_1, …, i_r) represents an equilibrium set instead of an isolated equilibrium point, since S(i_1, …, i_r) defines the set of all n × r matrices sharing the same column space as [u_{i_1}, …, u_{i_r}]. The reason for that is the invariance of the ODE (18) with respect to a rotational transformation. It is easy to check that replacing W and R in (18) by WQ and Q^T R Q does not change the ODE at all if Q is a fixed orthogonal matrix.

We conjecture that S(1, …, r) is the only asymptotically stable equilibrium set for the ODE (18b). Its domain of attraction seems as large as the whole parameter space R^{n×r}, provided that the initial value W(0) has non-zero components along all r dominant eigenvectors. In the following, we prove our conjecture by using Liapunov's second method (or direct method) [2, 14]. We first consider the single vector case (r = 1), in which we only seek the eigenvector corresponding to the largest eigenvalue. The result is used to show the convergence of the PASTd algorithm. Then we extend the proof to the multiple vector case (r > 1), which corresponds to the PAST algorithm.
4.2. The single vector case

In the single vector case, the n × r matrix W simplifies to an n × 1 vector w, and the r × r matrix R reduces to a scalar R. The ODE (18) becomes

Ṙ = w^T C w − R,   (23a)

ẇ = (I − w w^T) C w / R.   (23b)

What we have to show is that R(t) and w(t) converge to the largest eigenvalue λ_1 of C and the corresponding normalized eigenvector u_1 or −u_1, respectively. The basic idea of Liapunov's second method is to find an appropriate scalar function V(R, w) which decreases along any trajectory of the ODE (23) and attains a (possibly local) minimum at the equilibrium state (λ_1, u_1) or (λ_1, −u_1). Such a function is called a Liapunov function. Unfortunately, it is hard to find such a function for the whole dynamic system (23) with a domain of attraction as large as possible. To combat this problem, we first ignore Eq. (23a) and consider only the subsystem (23b) with the state vector w while treating R(t) as a given function in time. By doing this, Eq. (23b) describes a non-autonomous dynamic system, since ẇ depends both on w and on the time t (through R(t)). We shall show that this non-autonomous system has two asymptotically stable equilibrium states at u_1 and −u_1 regardless of the given function R(t), provided that the condition R(t) > 0 is satisfied. By exploiting this result, we then show the convergence of R(t) towards λ_1.

Theorem 3. Given a symmetric and positive definite matrix C whose largest eigenvalue λ_1 has multiplicity one. The autonomous system (23) has two asymptotically stable equilibrium states at (λ_1, u_1) and (λ_1, −u_1). Their domains of attraction are {R, w | R > 0, w^T u_1 > 0} and {R, w | R > 0, w^T u_1 < 0}, respectively. This means, if R(0) > 0 and w^T(0) u_1 > 0 (or < 0), R(t) and w(t) always converge to λ_1 and u_1 (or −u_1).

Proof. The line of proof is to show (a) ||w(t)|| → 1, (b) w(t) → ±u_1, and (c) R(t) → λ_1 as t tends to infinity.

(a) Let z = ||w||² = w^T w and V(z) = (z − 1)². Their time derivatives are

ż = 2 w^T ẇ = −2 w^T C w (z − 1)/R,   (24a)

V̇(z) = 2(z − 1) ż = −4 w^T C w (z − 1)²/R.   (24b)

We consider the function V(z) in a neighborhood of z = 1:

Ω(l) = {z | V(z) < l, z > 0} = {z | max(0, 1 − √l) < z < 1 + √l}   (25)

for some l > 0. For z ∈ Ω(l), z is bounded. By Lemma 1, there exists a fixed ρ > 0 such that R(t) ≤ ρ for t ≥ 0. In addition, w^T C w ≥ λ_n w^T w = λ_n z, where λ_n > 0 is the smallest eigenvalue of C. Clearly, V(z) > V(1) = 0 and V̇(z) ≤ −4 λ_n z (z − 1)²/ρ < V̇(1) = 0 for z ∈ Ω(l) and z ≠ 1. According to the theory of Liapunov stability for non-autonomous systems [14], V(z) is a Liapunov function, and z = 1 is asymptotically stable. Moreover, each trajectory starting in Ω(l) does not leave Ω(l) and must tend to z = 1. Since the quantity l in (25) can be chosen as large as desired, the domain of attraction of the asymptotically stable equilibrium state z = 1 is Ω(∞) = {z | z > 0}.
(b) In order to prove w → ±u_1, we use the cost function in (2) as our Liapunov function:

V(w) = tr(C) − 2 w^T C w + w^T C w · w^T w.   (26)

According to Theorem 2, V(w) > V(±u_1) = Σ_{j=2}^n λ_j for w ≠ ±u_1. The derivative of V(w) with respect to w is

dV/dw = 2(−2C + C w w^T + w w^T C) w,   (27)

see (3). Since ||w|| approaches one for each w(0) ≠ 0, we may write

dV/dw = 2(−C + w w^T C) w = −2(I − w w^T) C w   (28)

for large t. Therefore, the time derivative of V(w) is

V̇(w) = [dV/dw]^T ẇ = −(2/R) ||(I − w w^T) C w||² ≤ −(2/ρ) ||(I − w w^T) C w||²,   (29)

where we used Eq. (23b) and the boundedness of R(t). By Theorem 1, we know V̇(w) = 0 for w = ±u_1 and V̇(w) < 0 for w ∈ {w | w^T u_1 ≠ 0} and w ≠ ±u_1. This shows the asymptotic stability of the equilibrium states w = ±u_1. Concerning the domain of attraction, we recall that {w | w^T u_1 > 0} and {w | w^T u_1 < 0} are invariant sets due to Lemma 2. In addition, V(w) → ∞ as ||w|| → ∞. Hence each trajectory starting inside one of the above invariant sets is bounded and remains there for all subsequent times. Therefore, all trajectories starting in {w | w^T u_1 > 0} or {w | w^T u_1 < 0} tend to u_1 or −u_1, respectively.

(c) w → ±u_1 implies w^T C w → λ_1. Using the Liapunov function V(R) = (R − λ_1)², R → λ_1 follows immediately from (23a). This completes the proof. □
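A quick numerical illustration of Theorem 3 (setup, step size and horizon are our choices): integrating (23) by forward Euler from a random w(0) drives (R, w) towards (λ_1, ±u_1), with the sign fixed by the sign of w(0)^T u_1.

```python
import numpy as np

rng = np.random.default_rng(1)
C = np.diag([10.0, 3.0, 2.0, 1.0])     # lambda_1 = 10 with eigenvector u_1 = e_1
w = rng.standard_normal(4)
w /= np.linalg.norm(w)                 # w(0) with w(0)^T u_1 != 0 almost surely
R = 1.0                                # R(0) > 0
dt, steps = 0.01, 20000
for _ in range(steps):
    q = w @ C @ w
    R, w = R + dt * (q - R), w + dt * (C @ w - q * w) / R   # (23a), (23b)

print(R)   # close to lambda_1 = 10
print(w)   # close to e_1 or -e_1, sign given by the sign of w(0)^T u_1
```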
By combining the above results with Ljung's theorem, we now prove the convergence of the PASTd algorithm.

Corollary 1. Suppose that the input data x(k) of the PASTd algorithm (7) satisfies conditions A7 and A8. The eigenvalues of the correlation matrix C satisfy λ_1 > … > λ_r > λ_{r+1} ≥ … ≥ λ_n. The forgetting factor β is set equal to one. Then for c_yy,i(0) > 0 and randomly chosen initial weights w_i(0), c_yy,i(k)/k and w_i(k) tend to λ_i and ±u_i (i = 1, …, r) with probability one.

Proof. For a randomly chosen initial value w_1(0), w_1^T(k) u_1 ≠ 0 occurs infinitely often with probability one [13]. Hence, all conditions A1 to A9 are satisfied. According to Ljung's theorem, c_yy,1(k)/k, which corresponds to the quantity R in (23), and w_1(k) tend to λ_1 and ±u_1 with probability one. The PASTd algorithm uses the same procedure to update the second eigencomponent c_yy,2(k) and w_2(k) as for the first one, except for the use of the modified input data x_2(k) = [I − w_1(k) w_1^T(k − 1)] x(k) due to the deflation step (7d). Since w_1(k) tends to ±u_1 with probability one, x_2(k) tends to [I − u_1 u_1^T] x(k) with probability one as well. This modified input data has a correlation matrix equal to C − λ_1 u_1 u_1^T, whose eigenvalues are λ_2, …, λ_n, 0. Therefore, c_yy,2(k)/k and w_2(k) converge to λ_2 and ±u_2 with probability one. Using the same arguments, the convergence of the other eigencomponents can be shown similarly. □

4.3. The multiple vector case

The multiple vector case (r > 1) is more complicated because we are dealing with a pair of coupled matrix differential equations (18) instead of scalar and vector ones (23). In particular, we cannot expect a convergence of the PAST algorithm to the signal eigenvectors and eigenvalues. We rather guess that W(k) will tend to a matrix with orthonormal columns spanning the signal subspace. The line of the proof for the PAST algorithm is similar to that of the single vector case. Nevertheless, there exist essential differences in the computations.

Theorem 4. Assume R(0) > 0 and λ_1 ≥ … ≥ λ_r > λ_{r+1} ≥ … ≥ λ_n. S(1, …, r) is an asymptotically stable equilibrium set for the ODE (18b) with a domain of attraction

D_W = {W | W^T u_i ≠ 0 for 1 ≤ i ≤ r, W full rank}.
Proof. As for the single vector case, we first show W^T W → I. Let us define Z = W^T W = Z^T. Its time derivative is given by

Ż = Ẇ^T W + W^T Ẇ = −[R^{−1} W^T C W (Z − I) + (Z − I) W^T C W R^{−1}].   (30)

Clearly, Z = I is an equilibrium state. The next step is to find a Liapunov function V(Z) such that V(Z) attains a minimum at Z = I and has negative time derivatives in a neighborhood of Z = I. At a first glance, we would try the function V(Z) = ||Z − I||_F² = tr[(Z − I)²]. After some calculations, we obtain

V̇(Z) = 2 tr[(Z − I) Ż] = −2 tr[R^{−1} W^T C W (Z − I)² + (Z − I)² W^T C W R^{−1}].   (31)

Unfortunately, we cannot show V̇(Z) ≤ 0 in general. For three arbitrary symmetric and non-negative definite matrices A_1, A_2 and A_3, tr(A_1 A_2 A_3) < 0 could happen, in particular if A_1 A_2 is 'strongly' unsymmetric. Hence, V(Z) = ||Z − I||_F² is not a suitable Liapunov function. We propose to use

V(Z) = ||R^{1/2} (Z − I) R^{1/2}||_F² = tr[(Z − I) R (Z − I) R],   (32)

where R^{1/2} is the symmetric and positive definite square root of R = R^{1/2} R^{1/2}. Such a square root always exists and is unique because R is symmetric and positive definite. R^{1/2} has the same eigenvectors as R, and its eigenvalues are the positive square roots of the eigenvalues of R. We know from the regularization condition A5 in Section 3 that R ≥ δI or R^{1/2} ≥ √δ I for some δ > 0. Therefore,

V(Z) = ||R^{1/2} (Z − I) R^{1/2}||_F² ≥ δ² ||Z − I||_F² > V(I) = 0  for Z ≠ I.   (33)

The time derivative of V(Z) in (32) is given by

V̇(Z) = 2 tr[Ż R (Z − I) R + (Z − I) Ṙ (Z − I) R].   (34)

By substituting Eqs. (18a) and (30) for Ṙ and Ż,

V̇(Z) = −2 tr[(Z − I) R (Z − I) R + (Z − I) R (Z − I) W^T C W]
      = −2 tr[R^{1/2} (Z − I) R (Z − I) R^{1/2}] − 2 tr[R^{1/2} (Z − I) W^T C W (Z − I) R^{1/2}]   (35)

results. Since R ≥ δI, we find W^T C W ≥ W^T (λ_n I) W = λ_n Z and

V̇(Z) ≤ −2δ² ||Z − I||_F² − 2δλ_n tr[(Z − I) Z (Z − I)]
      = −2δ² ||Z − I||_F² − 2δλ_n ||Z^{1/2} (Z − I)||_F² < V̇(I) = 0  for Z ≠ I.   (36)

In other words, the equilibrium state Z = I is asymptotically stable. The domain of attraction is the set of all symmetric and positive-definite matrices.

At the second step, we prove that W tends to a matrix in the set S(1, …, r). In order to do this, we again use the cost function (2) as our Liapunov function

V(W) = tr(C) − 2 tr(W^T C W) + tr(W^T C W · W^T W).   (37)

According to Theorem 2, we know that V(W) attains its global minimum in S(1, …, r), i.e. V(W) > V(W*) for W* ∈ S(1, …, r) and W ∉ S(1, …, r). The derivative of V(W) with respect to W is given in (3). By taking W^T W → I into account, the derivative reads dV/dW = −2(I − W W^T) C W for large t. The time derivative of V(W) is hence

V̇(W) = tr[Ẇ (dV/dW)^T] = −2 tr[(I − W W^T) C W R^{−1} W^T C (I − W W^T)],   (38)

where we used the ODE (18b) for the last step. Since W^T W globally converges to I, we conclude the boundedness of W^T W. By invoking Lemma 1, there exists a fixed ρ > 0 such that R ≤ ρI or −R^{−1} ≤ −I/ρ. Correspondingly, we may write

V̇(W) ≤ −(2/ρ) tr[(I − W W^T) C W W^T C (I − W W^T)] = −(2/ρ) ||(I − W W^T) C W||_F² < V̇(W*) = 0   (39)

for W* ∈ S(1, …, r) and W ∈ D_W but W ∉ S(1, …, r), due to Theorem 1. This shows the asymptotic stability of the equilibrium set S(1, …, r). Its domain of attraction is D_W as given in Theorem 4, because D_W is an invariant set according to Lemma 2 and because each trajectory starting in D_W remains bounded due to V(W) → ∞ as ||W||_F → ∞. This completes the proof. □
Interestingly, D_W given in Theorem 4 is not the maximum domain of attraction of S(1, …, r). Even if W(0) has rank less than r, there still exists R(0) such that W(t) will converge to S(1, …, r). This is a significant difference between Newton type subspace tracking algorithms such as PAST and PASTd and gradient type algorithms. In the convergence analysis of gradient type subspace tracking algorithms [3, 11, 13, 18], one was dealing with a single ODE of the type

Ẇ = (I − W W^T) C W.   (40)

It has been shown that such an ODE has a rank preserving property [18]. In the PAST algorithm, the presence of the matrix R^{−1} in (18b) has the effect of 'mixing' the r columns of the time derivative Ẇ in (40). This may cause, despite a faster initial convergence rate, the rank of W to increase, thus leading to a much larger domain of attraction. Below we show two examples to illustrate this phenomenon.

In the first example, we assume r = 2 and set W(0) equal to [w(0), 0]. Clearly, W(0) is rank deficient. The derivative Ẇ(t)|_{t=0} in (18b) becomes [ẇ(0), 0] R^{−1}(0) with ẇ(0) = [I − W(0) W^T(0)] C w(0). If R^{−1}(0) has non-zero off-diagonal elements, Ẇ(t)|_{t=0} has obviously a non-zero second column, and W(t) will gain a non-zero second column as well. In the second example, we use the initial value W(0) = [w(0), w(0)], which is again rank deficient. When we choose R(0) to be a diagonal matrix with the diagonal entries 1 and c, we obtain Ẇ(t)|_{t=0} = [ẇ(0), ẇ(0)/c]. If c ≠ 1, the two columns of W(t) have different derivatives at t = 0 and will become gradually linearly independent as t proceeds.

Note that the rank property of W must not be confused with Lemma 2. We say that a matrix W has a projection along a direction if at least one column of W has a component along that direction. The remaining columns of W may be zero vectors or identical to the aforementioned one, as in the above two examples. Hence a matrix may have projections along all desired eigenvectors and can still be rank deficient.

Finally, the asymptotic convergence of the PAST algorithm is established.

Corollary 2. Suppose that the input data x(k) of the PAST algorithm (6) fulfills conditions A7 and A8. The eigenvalues of the correlation matrix C satisfy λ_1 ≥ … ≥ λ_r > λ_{r+1} ≥ … ≥ λ_n. Assume that W(k) visits the domain of attraction of S(1, …, r) infinitely often with probability one. Then W(k) tends to S(1, …, r) with probability one as k → ∞.

Proof. This result follows immediately from Theorem 4 and Ljung's theorem. □
5. Computer simulations

In this section, we present three computer simulations to support our theoretical results and to illustrate the convergence rate of the algorithms. Throughout, the more complicated multiple vector case and hence the PAST algorithm are considered.

In the first example, the correlation matrix C = E(x x^T) of a random vector process x ∈ R^6 is set equal to a diagonal matrix C = diag(10, 9, 8, 3, 2, 1), where the numbers in diag(·) indicate the diagonal entries. Clearly, the eigenvalues of C are given by 10, 9, 8, 3, 2, 1, and the eigenvectors are the columns e_i (i = 1, …, 6) of a 6 × 6 identity matrix. We used the PAST algorithm to track the column space of U_s = [e_1, e_2, e_3]. The forgetting factor β is set equal to one. For each of a total number of 10 trials, independent and identically Gaussian distributed random vectors x(k) with the given correlation matrix C are generated. The initial value W(0) is chosen randomly and its columns are orthonormalized. The initial value C_yy(0) in (6) is set equal to diag(10, 9, 8) in order to shorten the transient phase, because we are interested in the asymptotic convergence of the algorithms. In order to evaluate the quality of the subspace estimator, we computed two 'learning curves'

f_orth(k) = ||W^T(k) W(k) − I||_F²   (41)

and

f_proj(k) = ||W(k) W^T(k) − U_s U_s^T||_F².   (42)

While f_orth(k) measures the deviation of the columns of W(k) from orthonormality, f_proj(k) compares W(k) W^T(k) with the true projector of the desired subspace. Fig. 1(a) shows the results. We observe that W(k) converges fast to a matrix with orthonormal columns, and W(k) W^T(k) approaches the desired subspace projector at a satisfactory rate as well.

Motivated by a recent study of the asymptotic distribution of W(k) [23], we postulate for λ_r > 2λ_{r+1} the model f_orth(k) ∝ 1/k² and

f_proj(k) ≈ (2/k) Σ_{i=1}^r Σ_{j=r+1}^n v(λ_j/λ_i)   (43)

with v(x) = x/(1 − 2x). Fig. 1(b) shows the averaged learning curves of the 10 trials. It also depicts the least squares fit of f_orth(k) to the model a/k² in the time interval [500, 2000] and the theoretical prediction of f_proj(k) according to (43). We see that the experimental results agree excellently with the theoretical predictions.
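The first experiment is easy to reproduce qualitatively. The following sketch (our code, a single trial rather than an average over 10 trials) runs PAST with β = 1 and evaluates (41)-(42) along with the prediction (43).

```python
import numpy as np

rng = np.random.default_rng(0)
lam = np.array([10.0, 9.0, 8.0, 3.0, 2.0, 1.0])
n, r, K = 6, 3, 2000
W = np.linalg.qr(rng.standard_normal((n, r)))[0]   # random, orthonormalized W(0)
Cyy = np.diag(lam[:r])                             # C_yy(0) = diag(10, 9, 8)
P_true = np.diag([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])   # U_s U_s^T for U_s = [e1 e2 e3]
f_orth, f_proj = np.empty(K), np.empty(K)

for k in range(K):
    x = np.sqrt(lam) * rng.standard_normal(n)      # Gaussian x(k) with E{x x^T} = C
    y = W.T @ x                                    # (6a)
    Cyy += np.outer(y, y)                          # (6b), beta = 1
    W += np.outer(x - W @ y, np.linalg.solve(Cyy, y))    # (6c)
    f_orth[k] = np.linalg.norm(W.T @ W - np.eye(r)) ** 2     # (41)
    f_proj[k] = np.linalg.norm(W @ W.T - P_true) ** 2        # (42)

v = lambda x: x / (1.0 - 2.0 * x)   # valid here since lambda_3 = 8 > 2 * 3
pred = 2.0 / K * sum(v(lam[j] / lam[i]) for i in range(r) for j in range(r, n))
print(f_proj[-1], pred)   # the single-trial value fluctuates around the prediction
```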
Fig. 1. (a) Learning curves of the PAST algorithm for a total number of 10 trials with the correlation matrix C = diag(10, 9, 8, 3, 2, 1). (b) Averaged learning curves and their least squares fit and prediction, respectively. (Curves are plotted over time k with a logarithmic vertical axis.)

Fig. 2. (a) Learning curves of the PAST algorithm for a total number of 10 trials with the correlation matrix C = diag(7, 7, 7, 3, 2, 1). (b) Averaged learning curves and their least squares fit and prediction, respectively.
In a second example, we performed the same experiment for a different set of random data with the correlation matrix C = diag(7, 7, 7, 3, 2, 1). Note that in this case there are no uniquely determined eigenvectors corresponding to the largest eigenvalue 7 of multiplicity 3. Instead, only a three-dimensional eigen-subspace is well defined, which has been tracked by the PAST algorithm. Fig. 2 shows the results. Again, the global convergence and the predicted asymptotic convergence rates are verified.

In the last experiment, we demonstrate the rank increasing capability of the PAST algorithm. Actually, we repeated the first experiment in Fig. 1 except for the use of different initial values W(0). In each trial, we set the second and third column of W(0) equal to the first one. Though W(0) is now a rank one matrix, Fig. 3 shows that W(k) still converges to a full rank matrix with orthonormal columns which span the desired subspace.

Fig. 3. The same experiment as in Fig. 1(b) except for the use of rank deficient initial values W(0).
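The third experiment can be sketched in the same way (again our illustration, not the original simulation code): starting from a rank one W(0) with three identical unit-norm columns and the non-scalar C_yy(0) = diag(10, 9, 8), whose inverse 'mixes' the columns, the PAST iteration recovers full rank.

```python
import numpy as np

rng = np.random.default_rng(3)
lam = np.array([10.0, 9.0, 8.0, 3.0, 2.0, 1.0])
n, r = 6, 3
w0 = rng.standard_normal(n)
w0 /= np.linalg.norm(w0)
W = np.column_stack([w0, w0, w0])    # rank(W(0)) = 1: three identical columns
Cyy = np.diag(lam[:r])               # non-scalar C_yy(0) mixes the columns

for k in range(2000):
    x = np.sqrt(lam) * rng.standard_normal(n)
    y = W.T @ x
    Cyy += np.outer(y, y)
    W += np.outer(x - W @ y, np.linalg.solve(Cyy, y))

print(np.linalg.matrix_rank(W))              # 3: full rank recovered
print(np.linalg.norm(W.T @ W - np.eye(r)))   # small: orthonormal columns
```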
6. Conclusions

This paper presents a theoretical analysis of the asymptotic convergence of PAST and PASTd, two efficient subspace tracking algorithms based on projection approximation. We derived a pair of coupled deterministic matrix differential equations, whose phase portrait describes the asymptotic dynamics of the stochastic approximation algorithms. By analyzing their asymptotically stable equilibrium states and domains of attraction, we found that both PAST and PASTd globally converge to the desired signal subspace with probability one. In addition, simulation results are reported which verify the global convergence and reveal the asymptotic convergence rate of the algorithms.

References

[1] P. Comon and G.H. Golub, "Tracking a few extreme singular values and vectors in signal processing", Proc. IEEE, August 1990, pp. 1327-1343.
[2] P.A. Cook, Nonlinear Dynamical Systems, Prentice-Hall, Englewood Cliffs, NJ, 1994, 2nd Edition.
[3] E.M. Dowling and R.D. DeGroat, "Adaptation dynamics of the spherical subspace tracker", IEEE Trans. Signal Process., Vol. 40, 1992, pp. 2599-2602.
[4] F. Gersemsky, B. Yang and J.F. Böhme, "A robust and efficient MUSIC-based algorithm for tracking weak and closely spaced signals", Proc. 7th EUSIPCO (Edinburgh), September 1994, pp. 641-644.
[5] A. Graham, Kronecker Products and Matrix Calculus with Applications, Wiley, New York, 1981.
[6] S. Haykin, Adaptive Filter Theory, Prentice-Hall, Englewood Cliffs, NJ, 1991, 2nd Edition.
[7] H.J. Kushner and D.S. Clark, Stochastic Approximation Methods for Constrained and Unconstrained Systems, MIT Press, Cambridge, MA, 1978.
[8] L. Ljung, "Analysis of recursive stochastic algorithms", IEEE Trans. Automatic Control, Vol. 22, 1977, pp. 551-575.
[9] L. Ljung and T. Söderström, Theory and Practice of Recursive Identification, MIT Press, Cambridge, MA, 1983.
[10] E. Oja, "A simplified neuron model as a principal component analyzer", J. Math. Biol., Vol. 15, 1982, pp. 267-273.
[11] E. Oja and J. Karhunen, "On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix", J. Math. Anal. Appl., Vol. 106, 1985, pp. 69-84.
[12] N.L. Owsley, "Adaptive data orthogonalization", Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., 1978, pp. 109-112.
[13] T.D. Sanger, "Optimal unsupervised learning in a single-layer linear feedforward neural network", Neural Networks, Vol. 2, 1989, pp. 459-473.
[14] D.D. Šiljak, Nonlinear Systems, Wiley, New York, 1969.
[15] G.W. Stewart, "An updating algorithm for subspace tracking", IEEE Trans. Signal Process., Vol. 40, 1992, pp. 1535-1541.
[16] P.A. Thompson, "An adaptive spectral analysis technique for unbiased frequency estimation in the presence of white noise", Proc. 13th Asilomar Conference on Circuits, Systems and Computers, November 1980, pp. 529-533.
[17] G. Xu and T. Kailath, "Fast subspace decomposition", IEEE Trans. Signal Process., Vol. 42, 1994, pp. 539-551.
[18] W.Y. Yan, U. Helmke and J.B. Moore, "Global analysis of Oja's flow for neural networks", Neural Networks, Vol. 5, 1994, pp. 674-683.
[19] B. Yang, "Subspace tracking based on the projection approach and the recursive least squares method", Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., Minneapolis, April 1993, pp. IV145-IV148.
[20] B. Yang, "An extension of the PASTd algorithm to both rank and subspace tracking", IEEE Signal Process. Lett., Vol. 2, 1995, pp. 179-182.
[21] B. Yang, "Projection approximation subspace tracking", IEEE Trans. Signal Process., Vol. 43, 1995, pp. 95-107.
[22] B. Yang and F. Gersemsky, "An adaptive algorithm of linear computational complexity for both rank and subspace tracking", Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., Adelaide, April 1994, pp. IV33-IV36.
[23] B. Yang and F. Gersemsky, "Asymptotic distribution of recursive subspace estimators", Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., Atlanta, May 1996.
[24] J. Yang and M. Kaveh, "Adaptive eigensubspace algorithms for direction or frequency estimation and tracking", IEEE Trans. Acoust. Speech Signal Process., Vol. 36, 1988, pp. 241-251.