theVLSI journal ELSEVIER
INTEGRATION, the VLSI Journal 20 (1995) 3-19
Factored spherical subspace tracking" Filiep Vanpoucke*, Marc Moonen Katholieke UniversiteitLeuven, Department of Electrical Engineering, ESAT, K. Mereierlaan 94, 3001 Leuven, Belgium
Abstract
Tracking the dominant subspace of a data matrix is an essential part of many signal processing algorithms. We present a modification to the so-called spherical subspace tracking algorithm. This algorithm has a low computational complexity, but suffers from accumulation of numerical errors. We show that this numerical instability can be circumvented by using a minimal orthogonal parameterization of the subspace basis matrix. The resulting factored spherical SVD updating algorithm then consists exclusively of rotation operations. In view of implementing the algorithm in real-time applications with high data rates, a linear systolic architecture is derived.
Keywords: Subspace tracking; Givens rotations; Systolic arrays
1. Introduction
Tracking the singular value decomposition (SVD) of a time-varying data matrix is an important component of several algorithms for engineering applications. Examples include subspace algorithms for high resolution direction finding of narrow-band signal sources using antenna arrays [1-1 and subspace algorithms for system identification [2]. The first step in these algorithms consists in determining the dominant subspace of a recursively defined data matrix X t k j e R k ×M:
r ~Xtk- 1l 1 X[k] = L XtTk] '
A condensed version of this full paper has appeared in the following conference proceedings. F. Vanpoucke and M. Moonen, Parallel and stable spherical subspace tacking. Proc. ICASSP 3, 119, pp. 2064-2067. This work was supported in part by the ESPRIT BRA 6632 project of the EU. Filiep Vanpoucke is a research assistant with the NFWO. Marc Moonen is a research associate with the NFWO (Belgian National Fund for Scientific Research). *Corresponding author. Current address: Lernout and Hauspie Speech Products, Koning Albert-I laan 64, 1780 Wemmel, Belgium. E-mail: filiep @brussels.lhs.be. 0167-9260/95/$9.50 © 1995 Elsevier Science B.V. All rights reserved SSDI 0 1 6 7 - 9 2 6 0 ( 9 5 ) 0 0 0 1 5 - 1
4
F. Vanpoucke, M. Moonen/INTEGRATION, the VLSIJournal 20 (1995) 3-19
where the scalar ct < 1 is an exponential weighting factor and X[k] • ~M is the data vector 1 at the kth sampling time tk. The singular value decomposition of Xtk J is defined by X[k I = U[k]'z~[k]"
T
V[k ]
Esl lr S[k]
=
utn l] "
•
(1)
V[~ ~
L vtk)
,
where Utk I e ~k×M and Vtkje I~ m×m are orthogonal matrices (Ut~1• Utk j ~ VtkT l" Vtk] = I u ) and "~]k]• Rm ×m is a diagonal matrix with positive real numbers in non-increasing order, the so-called singular values. When partitioning the matrices according to the set of D largest (in S(~k]) and M - D smallest (in '~(kl) singular values, the dominant subspace 6e is defined as the column space of the matrix Vt~kj. In the direction finding application D denotes the number of narrow-band emitters in the frequency band of interest, and M is the number of antennas in the array. The accuracy of the signal parameter estimates ultimately depends on the accuracy of the signal subspace estimate. Several applications, e.g. in communication systems or radar systems, require tracking of the dominant subspace in real-time at data rates in the range of 105-106 sample/s. However, tracking the SVD exactly is computationally expensive• Therefore, in the literature several algorithms have been developed which trade accuracy for lower complexity [3-6]• In this paper we propose a modification to the spherical subspace tracking algorithm of DeGroat [3], which will be reviewed in Section 2. It is a non-iterative algorithm with a low complexity, (9 ( M . D ) . This is attained by keeping track of the matrix Vtkl instead of the full matrix Vtk l, and of two averaged singular levels instead of the exact singular values. Although the latter approximation may seem crude, it can be shown that the algorithm still provides a consistent estimate of the signal subspace in a stationary environment [7]• Recently the algorithm has been generalized to multiple levels [8]• Here, we concentrate on the two-level algorithm. The generalization to the multi-level algorithm does not pose any additional difficulty. The spherical subspace tracking algorithm has one major drawback. In finite precision arithmetic the subspace tracking matrix VtkI gradually loses its orthogonality due to error accumulation. Current solutions to stabilize the algorithm are based on re-orthogonalization by means of pairwise Gram-Schmidt transformations [4] or renormalization [9]• However, they only keep the deviation from orthogonality bounded and are difficult to implement on parallel architectures. As an alternative, in Section 3 we introduce a minimal parameterization of Vi*kjin terms of a sequence of Givens rotations, each characterized by a single rotation angle. By computing with the rotation angles instead of the explicit matrix Vt~k],the orthogonality is preserved at each time instant. This parameterization has already been applied successfully to stabilize a related Jacobi algorithm for SVD updating [10]. Due to its regularity, this factored spherical subspace tracking algorithm is well suited for implementation on application specific parallel hardware• Starting from the signal flow graph, in
1In the antenna application the data matricesare complex.However,for ease of expositionwe restrict our attention to real matrices. The generalizationto complex matrices is straightforward.
F. Vanpoucke, M. Moonen / INTEGRATION, the VLSI Journal 20 (1995) 3-19
5
Section 4 a linear systolic array is derived by a placement and scheduling step. With D + 1 processors a pipelining period of 2M - 1 cycles is obtained.
2. Spherical SVD updating In this section the spherical SVD updating algorithm is reviewed. It is a minor variation on the ROSA algorithm (Rank-One update, Signal Averaging) of DeGroat [3] for eigenvalue decomposition updating. The spherical SVD updating algorithm aims at tracking the following approximate orthogonal decomposition of Xtk[:
XEkl ~ XEkl = UEkj"Otkj" Vt]j, where Utk j e ~k×.~4 and VLkI e R M×M are orthogonal matrices and D[kj e ~M×M is defined by
D[k] =
0
~7[kllM- D
(2)
where IM is the M × M identity matrix. The orthogonal matrices U[k ] and V[k ] a r e given by the SVD of X[k]. The exact singular values in the signal and noise part of Eq. (1) are replaced by constant S rl singular levels O'[k], O'[k ]. These levels are chosen as the root mean square values of the two sets of singular values. A subspace with constant associated singular values is called a spherical subspace, since there is no longer a unique basis of singular vectors. Any orthogonal basis for the subspace can be chosen. This leads to important savings in computation when deriving a recursive algorithm. The spherical SVD updating algorithm is given below. In the description the symbol * denotes a quantity of no further interest.
Algorithm 1. Spherical SVD updating ID s
0"[Ol, 0"~oI ~-- 0
for k -- 1,..., oc 1. Orthogonal matrix-vector multiplication ~S T X[k] ~-
T • g s XIk] [k-l]
2. Noise singular vector X[kn ] ~-- X[k ] - nT
n
n X[k]
V~k- 1 ] ' X [~s k] 11
6
F. Vanpoucke, M. Moonen/INTEGRATION, the VLSIJournal 20 (1995) 3-19
Column rotations
3.
[ 0 "'" 01~tsk]]" (I~)~;? . . . t ' ~ D - IID'I T ~sT =[k] ~ *-" Xtk]
(-~ 113. (-¢ 213. [~1 112. " [kl ~[k] "-" [k]
0
0
• d) DID+ lr ~'T[k]
]
0
0~0"~ _ 11
0
[v k]l *]
111
"'
1)
. Updating the singular values
,~ ~$2 +
s O'[k ]
(D
O'(k]
--
1) c~2 O'[ks~
1]
D
+ (M
~n 2
n if[k] =
O" [kl
--
D
--
1) o~2 0"[knz
1
M-D
end for
In each time iteration four operations have to be performed. The first operation is a multiplication of the incoming data vector Xtk ] and the orthogonal matrix V[~_ 1j- The second operation is the "T 1] • Xtk] which is the projection of Xtk ] onto the noise calculation of the vector Xtk" I = V "[k- 1]" V [ksubspace at time tk-1. This vector can be computed alternatively by n
S
(IM -
X[k ] =
l / t k _ 1]"
l/s
T
[ k - 1])" X[ k]
such that storage and updating of Vtkj is avoided. Therefore, the m e m o r y and computation requirements are cut from C (M 2) to (-9(MD), which is particularly interesting in applications where D ~ M. Finally, XEk] has to be normalized. In the third operation the matrix l/[~k- 1] is updated to V~R] by means of sequences of column rotations ~i~i]+ 1. A rotation matrix (pilJ is a 2 × 2 Givens rotation operating on columns/rows i and j embedded in a larger identity matrix, 2 i.e. ¢Pq3differs from the identity matrix only in four entries ~P~Ij = ~ f f = cos(gY Ij) and ~PlY= - g ~ l j = sin(~YlJ)• The ~hil~+ 1 is steered by the annihilation of the first D - 1 generation of the first D - 1 Givens rotations ",'tk] ~S components of Xtk ]. s [0 ... 01 ?rk]]
~-- )~s r
(d)l12 .. D 1 [k]'~T(k] . Cp[k ] IO).
(3)
Below an example is given for the case D = 3.
[× k
(/711/2
× ×]---* [0 × ×] Y
(I)D-11D
,[00
×].
)
#s r
This implies that the rotation angle ~1~ + ~is such that tan (q~il~+ 1) = × i/x~+ 1. The column rotations (~}ili+ [k] 1 are then applied to V [Sk - 1 ] " They align the last signal singular vector along the projection of 2The notation ~biliis used with some liberty concerning the size of the embedding identity matrix, which may differ from M x M depending on the context.
F. Vanpoucke, M. Moonen / INTEGRATION, the VLSI Journal 20 (1995) 3-19
7
X[k] onto the signal subspace at time tk- 1. All signal power added by X[k] is captured by the aligned signal singular vector and the normalized noise vector and only the corresponding a singular values (D and D + 1) are altered. The last column rotation a~Dl° ~[kl + 1, adjusting the "border" between signal and noise subspaces, is generated by the following 3 × 2 SVD problem which can be solved as a 3 x 2 Q R D updating operation followed by a triangular 2 x 2 SVD.
0 0 S ~[kl
n
a[k- 11
Gll3 T tkl )
n ~[k]
kX x] 0
0
0~0"~k_l]
x G213I Ikl ,
0 0
×
0]
x
x 0
oi~l 2.
) ~DID + 1
[kl
0
aO 1 "
(4)
The fourth operation in the spherical SVD updating algorithm consists in calculating the new singular levels by re-averaging in the mean square sense. This operation requires a multiplication, a sum, a division and a square root C = 4b 2 + (K -- 1)a2/x/~,
where K e N and a, b, c E ~. Efficient execution of multiplications and square roots in VLSI hardware is expensive. Therefore, we are interested in an algorithm which solely consists of rotation operations for which an efficient hardware component, i.e. the C O R D I C processor [11], is known. The re-averaging operation can be converted into a sequence of Givens rotations. The numerator can be obtained by a reduction of the vector v ~ ~r, [bla
... a ] ~
[× IO-" 0].
Also the division by ~
can be performed as a single rotation over an angle @ = t a n - l ( x / ~ - 1).
;la I 3. Factored spherical SVD updating Algorithm 1 has one major disadvantage. Due to accumulation of rounding errors in finite precision arithmetic, the orthogonality of the matrix VS[k- H is gradually destroyed. In step 3 of Algorithm 1 the matrix Vt~_ 11is continuously multiplied by sequences of column rotations. Errors in these operations accumulate, and the matrix V s[ k - 11 drifts away from orthogonality. This is illustrated in Fig. 1. An attractive cure for the numerical error build-up is to decompose the orthogonal matrix V[~j into a sequence of Givens rotations. The factorization has been developed in [10] for the related Jacobi SVD updating algorithm of [4] which suffers from a similar error accumulation problem. It is based on the fact that a full orthogonal matrix V e R M×M with positive determinant can be represented as a product of M ( M - 1)/2 Givens rotations, each parameterized by a single rotation angle V = (Ql12" Q 113 . - . Q I l M ) ' ( Q 213 . . . Q 21M) . . . ( Q M
IIM).
8
F. Vanpoucke, M. Moonen/INTEGRATION, the VLSIJournal 20 (1995) 3-19
1.2
x I0 -14
0.9
I
0.6
o.3
0
i
0
i
200
i
i
400 600 iteration
800
1000
Fig. 1. Error accumulation averaged over 10 independent runs. The columns of an orthogonal matrix V e R 2°× 10 are multiplied by sequences of Givens rotations with random rotation angles. The deviation from orthogonality grows approximately linearly with time. Numerical experiment in MATLAB with machine precision e = 2.2 x 10-16
The rotations can be c o m p u t e d by the Givens m e t h o d for QR decomposition [12] applied to the matrix V, as is illustrated in the 3 x 3 example below: V =
x
x
x
x
x
x
x
x
x
I
Q1127
)
x'
x'
x'
0
x'
x'
x
x
x
0~13~ ~
1
0
0
x '
0
x'
0 X r
x'
Q213~ ,
1 0
0
0
1
0
0
0
1
= 13.
F r o m Q 213T" Q 113~" Qll2T.
V = 13 it follows that V = Q112. Q l l 3 . Q213 as anticipated. The matrix V is gradually reduced to the identity matrix by a rotation scheme. Several orderings of the rotation planes can be chosen. In the above ordering the zeros are introduced columnwise using the diagonal element in each column as the pivot. In the case of spherical SVD updating we are only interested in a factorization of the matrix V tk-lJ s S RMxO with D columns. In order to characterize V s, it suffices to truncate the above factorization to column D. V s = ( Q l 1 2 . Q I I 3 ... Q I l M ) . . . (QDID+ 1 ... QDIM).
N o w it remains to be shown how we can directly work with the rotation angles instead of the matrix Vtk 1. In Algorithm 1 parts of the matrix Vtkl are involved in several steps. The f i r s t o p e r a t i o n is the matrix-vector product which is readily performed in factored form ~s ~
T
.
s
XIk] ~ X[k] V[k-1]
T
=
./')112
X[k] Y-.[k-l]
.[)113 Y-,[k-1]
... ()DIM Y--ik-1]"
A signal flow graph (SFG) of the factored product is shown in Fig. 2. It consists of a trapezoidal array of simple rotation cells, each performing a rotation over a stored angle. The new data vector X[R] is fed in at the left side, and the product Xtkj becomes available at the b o t t o m side.
F. Vanpoucke, M. Moonen / INTEGRATION, the VLSI Journal 20 (1995) 3-19
8
o
9
8
012 x2--~ Q1[3
) Q213
Ql14
I Q 214
X3~
Xi
] Q 314 xj
Qlls
] Q21s
.
" ~
Xjout
[ Q31S
Xiout Qll6
i Qm
i Qal~ xjout
= oilff .
xjin
Fig. 2. SFG for factored orthogonal matrix-vector multiplication (M = 6, D = 3). The nodes apply the rotation Q~lJ to their input pair.
The second operation in Algorithm 1 is the computation of the normalized projection of the incoming data vector onto the noise subspace. In factored form this computation is illustrated in Fig. 3. The rotation array of Fig. 2 is first extended to a full triangular array, parameterizing the full s 1][ V[k" 1]]" Now V"[k- II has to be modified such that the last orthogonal matrix V[k- 1] = [ V [kT X[k] equal zero. Given that these components are M - D - 1 components of the v e c t o r )~[k] = V[k]" the outcome of rotating the vector of right-hand-side outputs of the nodes in column D + 1 of the full SFG and that a rotation operation (dashed box in Fig. 3) preserves the norm, the inputs to this rotation operator should be zero as well. This implies that the nodes in column D + 1 should be given a rotation angle such that their right-hand-side output is zeroed. It follows that only one additional column (the D + 1st) of nodes is needed. This computation is more efficient and elegant than the computation of VtkI in Algorithm 1. The data flow in Fig. 3 is identical to the data flow in Fig. 2. Only the functionality of the rotation nodes differs. In Fig. 2 the nodes rotate their input pair over a given angle (rotation mode), whereas in Fig. 3 the nodes rotate their input pair over an angle such that the right-hand-side output is zeroed (angle accumulation mode). Both types of operations can be implemented with the same complexity on a CORDIC processor. The third operation is the computation of the updating rotations (bili ~[k]+ l, i = 1,..., D, given in Eq. (3). As shown in Fig. 4 this operation can also be performed by a single row of rotation nodes in
10
F. Vanpoucke, M. Moonen / INTEGRATION, the VLSI Journal 20 (1995) 3-19 C'4
+
+
-o
...j o
94
X~
1
Qilj .
05Q41~
~J
z jin
t_0_.... "'", ,
Z~ ut
i ........
q ,o
:: = QO r . 7 [k] n
T
~
0
0
zji n
0
Fig. 3. Computation of the factorization of the vector x[~] (M = 6, D = 3). The nodes in column D + 1 compute the rotation QiU such that the right-hand-side output is zeroed.
~],2i
~],3
i[a?]
~8
n
[k],~[k]
,~i, + ~ (D = 3). The functionality of the nodes is given in Fig. 7. The left Fig. 4. Computation of the updating rotations ~ik] D - 1 nodes are rotation nodes in the angle accumulation mode zeroing their first output. The right node solves a 3 × 2 SVD problem.
accumulation mode. These nodes take in the vector ,~[~] and gradually zero it from left to ~iii+l which become available at the upper side. right. Meanwhile, they compute the rotations --[k] The computation of the last rotation ,~Dto +1 requires a node solving the small SVD problem of ~[k] Eq. (4). The f o u r t h - and most difficult - operation is the updating of the factorization of V [ks
in terms of the rotation angles. s
s
n
.{~112
[Vt~]l*] "- [ v t k - n lvtkl] ,-t~l ""
. tdsDID + 1 ),
~t~J
11 directly
F. Vanpoucke, M. Moonen / INTEGRATION, the VLSI Journal 20 (1995) 3-19 $
11
n
ili+ The rotations a, T[k ] ~ have to be worked into the factorization of [ V[k_111/)[k]]" /,')112
O+IlM
"~[k] • "" Qtkl
~
112
Qtk- 11
...Q~k!l]
j
•
t~ik I
...
,
As shown in [ 10] this operation involves 3 types of transformations depending on the indices of the interacting rotations Qkli and q~ili+ 1 1. The index pairs (k, l) and (i, i + l) are disjoint: In this case the rotation matrices Qklt and ~iqi + 1 commute since they affect different rows or columns. ~ili + 1• Qklt = Qkll. (I)ili + 1.
2. The index pairs (k, l) and (i, i + 1) are equal: Here the rotation angles of Qili+ 1 and q~ili+ simply add together. Qili + 1 = Qili + 1. (I)i[i + 1
3. The index pairs (k, l) and (i, i + 1) share a common index: Let k = i + 1. Generically, the matrices Qi+ lit and ~ili+l do not commute. Re-ordering the indices is only possible if a third rotation, Qil~, is taken into account• The sequence of 3 Givens rotations in the (i, l), (i + 1, l), (i, i + 1)-planes, defines a rotation in the three-dimensional (i, i + 1,/)-space Vili+ll; = Qm. Qi+ 11;. q, ili+l This matrix V iii + lit can also be represented by a different set of three Givens rotations by choosing an ordering of the coordinate planes in which the (i, i + 1)-plane is in front. Viii+
111 __ d)il i+ l . l')ill. [)i + lll
In order to determine the equivalent set of angles, V ili+ll~ has to be computed explicitly using the first ordering and then factored using the second ordering. The computational complexity of this 3 × 3 core problem is relatively low and independent of the matrix dimension M. It is even sufficient to compute only two columns of V iii +~lt. By selecting the first and last column, the operation count is optimized to 7 rotations over a given angle (vector rotation) and 3 rotations in which a coordinate is zeroed (angle accumulation). As an example of the use of these transformation types, the first update V +-- V. ~12 for V e []~4x2 is detailed below. To interchange ~112 with Q214, QII4 must be adjacent to Q214. Therefore, on the first line Ql14 is commuted with Q213 (type 1). Then the equivalent set of rotations in the (1, 2, 4)-space is determined (type 3). The same operation is repeated in the (1, 2, 3)-space. On the last line the angles in the (1, 2)-plane are summed (type 2). V " 4112 = Ql12
= Qll2
•Qil3. Q213. QI14. Q214. (~i12 (type 1)
= Q l l 2 • Q ] I 3 . Q213. ¢.bll 2 . [)114./,)214
(type 3)
= Q l l 2 . t b 112 ()l13.tQ213./,)114.f)214
(type 3)
= /')112 • N I l 3 • D I I 4 . Q2,13. D214
(type 2, 1).
--**'~,
~,
~,
~,
12
F. Vanpoucke, M. Moonen / INTEGRATION, the VLSI Journal 20 (1995) 3-19
The S F G for updating the angles in factored form is shown in Fig. 5. The updating process starts at the bottom-left node and gradually evolves towards the diagonal. When merging all SFGs together, the final S F G for factored spherical SVD of Fig. 6 is obtained. The node descriptions are given in Fig. 7. A textual description of the factored algorithm is given below.
Algorithm 2. Factored spherical S V D updating algorithm Qilj [o1 +__i 2
for l <~ i <. D; i + l <<.j <~ M
s
ato j, a~o] ,- 0 for k = l , . . . , ~ 1. O r t h o g o n a l matrix-vector multiplication ~ s x ~T [XtkllY[kl]
*--
T (O112 . tQll3 "X:lkl'~,Y-.tk-ll Y - , t k - l ]
..
t')DIM
" "~-tk-l]!
2. Noise singular vector rotations ~T
Qll
.Q
Qili+] out
Qll:
Qili+l = QII~+I . ~ili+I OUt
Qll,
•ili+1 OUt Q41S
Qll,
Qitj ~ o~t
.
i+llj Oi,
Q416
t o u t b" F" ~l ioluj it ~112
~213
.
l
FIi+llj "~out
i=
+ . O~ i ni +ll l j
O!lJ ~ln
ili+l " ¢~in
t~314
Fig. 5. SFG for updating the angles (M = 6, D = 3). Type 2 (Type 3) operations are performed in the nodes labeled A (B).
F. Vanpoucke, M. Moonen / INTEGRATION, the VLSI Journal 20 (1995) 3-19
I,-)112
P
'~ [k-l]
z[k],2 (3113
~[k-l]
~114
"~ [k-l]
0115 "~ [k-l]
[k-~]
13
~-~~ ~ I
~, t
~
[3213
'~Ik-1]
~214
~314
Q21s
QalS
~'~[k-l]
"# [k-1]
[k-l]
~1 / ? I"~[k--a]
[k-l]
I/
"[k-~]
/
~r~k],O['~1 Fig. 6. Detailed SFG for the factored spherical SVD updating algorithm (M = 6, D = 3). The description of the rotation nodes is given in the next figure. The vectors p and sl, s2 are the placement and scheduling vectors for a linear systolic array. The short arrows indicate the delay-less vertical bidirectional-directional dependency path of length 2M - 1.
14
F. Vanpoucke, M. Moonen /INTEGRA TION, the VLSI Journal 20 (1995) 3-19 Q Xin
[-1:
Qout
Q i,, Yin
.
Your
[-]
Your
Qout =
Q~,,
Yin
"
Xout t~
Q
"
¢out
Qlout ~ Qll.
Qrin Qrout
Yin
Your
@out' Qlout "Qrout = Qlin " Qri,~ . @i,~
Xout t~in .......................
~... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Xin
°
Yin
Q
•
[][] X°ut
=
0
Zin
Yln
Xout
.........................................................................................................
dp
Yin
U
Xin
Your
@ "
[0]
=
Yout
as
0
GlI3"G213"Ol12" / 0
0"n
,
0
Lo
aal',,
0
0
aa~',,
x
y
• 4 'T =
s
flout =
~
"~_J"
Yin
i(D-1) a2°'~2+a~2 D In
,/(M-D-a).,~r'. +~°'
fl°nut = V
M-D
flout ~ flout
Fig. 7. Node descriptions for the factored spherical SVD updating algorithm.
F. Vanpoucke, M. Moonen / INTEGRATION, the VLSI Journal 20 (1995) 3-19
15
3. Column rotations [ 0 . . . 0l
k]]
'
-s if[k]
G113./-2_213./21112. [k] ',J[k] V [ k ]
0]
,-- x[kl
0 . (.bDID + 1T
~n
0
if[k]
0
~[k]
0 ^S ~[k]
0
,j ~k]
for j ~ 2 to M ()llj
QI, lj ~--
-~[k-l]
endfor for i ,,-- 1 to D ~i[i+ *
1 ~
~ili+l [ k - 1]
for j ~- M downto i + 2 (~ili + 1 , f ) i l j . i + l l j * Y--[k] Q *
i')ilj
i-)i-f-llj.~ili÷l
~-- .~,* " ' / - - [ k - l ]
r,
endfor
Q i[k]l i + I
i')ili + l ~-- ~ *
(bili + l ~*
endfor
. updating the singular values s
alkl + (D ~n 2
n O'[k] =
--
D
O'[k] =
atkl + ( M
1)~ 2 O '~[ k -
11
D 1)~ 2 O ' ~ [kM - D --
--
11
endfor 4. A linear array
In many signal processing applications subspace tracking has to be performed in real-time at high data rates. Often the amount of computation of the factored spherical SVD updating algorithm and the stringent throughput demands require a computational power which can only be delivered by pipelined parallel computers. Below we develop an application specific linear array. The derivation of the linear array is based on the SFG of Fig. 6. This SFG has a bidirectional vertical dependency path indicated by the small arcs. In the first step of Algorithm 2 the product vector ~[~J is accumulated from top to bottom, whereas in the third step the updating rotations ~ili + l [k] are applied from bottom to top. This cycle of C(2M) operations has to be completed before a new vector Xtk+ X]can be processed. A natural way to assign computation to processors is to map all nodes in one column to a single processor (placement vector p in Fig. 6). This mapping keeps the amount of communication between the processors low. The linear array then consists of D + 1 processors.
16
F. Vanpoucke, M. Moonen / INTEGRATION, the VLSI Journal 20 (1995) 3-19
Because of the bidirectional vertical data flow, the scheduling (i.e. allocation of the computation over time) cannot be done with a single scheduling vector. Instead, two scheduling vectors are necessary, resulting in a piecewise linear schedule. The first three operations (calculation of 2l~kl, QI~ 'jj and --lkla'iJi+')have a data flow from top to bottom and from left to right. Their dependencies are respected if the first scheduling Q316 Q216
Qzls Q314
X6
Qll6
Q215
X5
Qlls
Q214
X4
Qll4
Q213
X3
Qlla
X2
Qal2
Xl
t=0
t=2
t=4
t=6 Q416
0415
Fig. 8. Phase 1 of the linear systolic array (M = 6, D = 3): orthogonal matrix-vector multiplication. The t = i expressions control the switches.
F. Vanpoucke, M. Moonen / INTEGRA TION, the VLSI Journal 20 (1995) 3-19
17
vector (sl in Fig. 6) points from the top-left to the bottom-right corner of the SFG. The fourth operation (updating of r~ilJ ~ [ k - ll]~ has dependencies pointing upwards and to the right. Therefore, the second scheduling vector (s2 in Fig. 6) should point from the bottom-left corner to the diagonal. X2
~112
~3
dp213
~n
s ~Gin n Gin
alp314
s
n
Gout ~Gout
Fig. 9. Phase 2 of the linear systolic array (M --- 6, D = 3): calculation of the column rotations. Ql12
Qlla
Q213
Q314
Q415
Ql14
Q214
Q31S
Q416
QllS
Q21S
Q316
Qll6
Q216
Q,12 Q,13 Ql[4 Q,15 Q116
Q213 Q214 Q2Is Q216
Q314 Q3}5 Q316
Q415 Q416
Fig. 10. Phase 3 of the linear systolic array (M = 6, D = 3): application of the column rotations.
18
F. Vanpoucke, M. Moonen / INTEGRATION, the VLSI Journal 20 (1995) 3-19
The resulting linear array has three phases of operation. In the first phase the linear array computes the orthogonal matrix-vector product and the rotations in column D + 1. The timing is indicated in Fig. 8. The first component "~1 becomes available after cycle M - 1. The initialization of the Yis require some control of the inputs. During the second phase the column rotations ~ili + 1 are computed (Fig. 9). Finally, during the last phase the factorization is updated (Fig. 10). The pipelining period of the linear array is 2M - 1 cycles.
5. Conclusion
In this paper a factored spherical subspace tracking algorithm was introduced. Its main feature is that the accumulation of numerical errors in the original spherical SVD algorithm is cured by a minimal orthogonal factorization of the subspace basis matrix. The factored algorithm consists solely of rotation operations and has a regular data flow. Therefore, it is amenable to parallel implementation. By a linear placement and piecewise linear schedule an efficient linear systolic array was derived. On D + 1 processors the algorithm with a complexity of C (M- D) is executed in 2M - 1 cycles. It is also possible to fully pipeline the algorithm into a planar systolic array with C (M.D) processors and a pipelining period of 3 cycles. However, the derivation of the planar array is rather complicated involving algorithmic transformations. The interested reader is referred to [13].
References [1] R. Schmidt, Multiple emitter location and signal parameter estimation, IEEE Trans. antennas Propagat. 34 (1986) 276-280. [2] P. Van Overschee and B. De Moor, N4SID: subspace algorithms for the identification of combined deterministicstochastic systems, Automatica 30 (1994) 75-93. [3] R. DeGroat, Noniterative subspace tracking, IEEE Trans. Signal Process. 40 (1992) 571-577. [4] M. Moonen, P. Van Dooren and J. Vandewalle, An SVD updating algorithm for subspace tracking, SIAM J. Matrix. Anal. Appl. 13 (1992) 1015-1038. [5] G. Stewart, An updating algorithm for subspace tracking, Tech. Rep. CS-TR 2494, Dept. of Computer Science, Univ. of Maryland, 1990. [6] B. Yang, Gradient based subspace tracking algorithms and systolic implementation, Int. J. High Speed Electronics Systems 4 (1993) 203-218. [7] E. Dowling and R. DeGroat, Adaptation dynamics of the spherical subspace tracker, IEEE Trans. Signal Process. 40 (1992) 2599-2602. [8] R. DeGroat and E. Dowling, Sphericalized subspace tracking: analysis, convergence and detection schemes, Proc. 26th Ann. Asilomar Conf. (1992) 561-565. [9] M. Moonen, P. Van Dooren and J. Vandewalle, A note on 'efficient numerically stabilized rank-one eigenstructure updating', IEEE Trans. Signal Process. 39 (1991) 1911-1914. [10] F. Vanpoucke, M. Moonen and E. Deprettere, A numerically stable Jacobi array for parallel SVD updating, in: F. Luk (Ed.), Advanced Signal Processing Algorithms, Architectures and Implementations V, Proc. of SPIE, Vol. 2296, July 1994, San Diego, USA, pp. 403-412. [11] J. G6tze and G. Hekstra, An algorithm and architecture based on orthonormal/~-rotations for computing the symmetric EVD, Integration 20 (1995) 21-39, this issue. [12] G. Golub and C.V. Loan, Matrix Computations (Johns Hopkins Univ. Press, Baltimore, MD, 2nd ed., 1989). [13] F. Vanpoucke, Algorithms and architectures for adaptive array signal processing, Ph.D. Thesis, Katholieke Universiteit Leuven, 1995.
F. Vanpoucke, M. Moonen / INTEGI~4 TION, the VLSI Journal 20 (1995) 3-19
19
Filiep Vanpoucke was born in Kortrijk, Belgium, in 1967. He received his electrical engineering degree (1991) and Ph,D. degree (1995) from the Katholieke Universiteit Leuven, where he has been with the Electrical Engineering Department as a research assistant (1991-1995) of the Belgian NFWO (National Fund for Scientific Research). Currently he is with Lernout and Hauspie Speech Products, Brussels. His main interests are in adaptive algorithms and parallel architectures for array signal processing and echo cancelation.
Marc Moonen was born in St. Truiden, Belgium, in 1963, He received the electrical engineering
degree and the Ph.D. degree in applied sciences from the Katholieke Universiteit Leuven, Leuven, Belgium, in 1986 and 1990, respectively. Since 1994 he has been a research associate with the Belgian NFWO (National Fund for Scientific Research) at the Electrical Engineering Department of the Katholieke Universiteit Leuven. His research activities are in mathematical systems theory and signal processing, and parallel computing. He is a member of the editorial board of Integration, the VLSI Journal.