Fast QR Factorization of Vandermonde Matrices*

Cédric J. Demeure
Electrical and Computer Engineering Department, University of Colorado, Campus Box 425, Boulder, Colorado 80309

Submitted by Jan C. Willems

ABSTRACT

We introduce a fast algorithm for computing the QR factors of a complex column Vandermonde matrix V. The complexity of the algorithm is 5mn + 7n²/2 + O(m), where m is the number of rows in V and n is the number of columns (we assume that m > n). The matrices Q and R may be computed independently if desired. Two special cases for the row Vandermonde matrix (with real elements, or unit magnitude elements) are also studied, and similar results are obtained.

INTRODUCTION

The problem of fitting theoretical exponential modes to experimental data has several applications in signal processing. Such a problem arises in transfer function identification in linear systems, in channel identification for communications, in direction of arrival estimation for radars, and in the study of oscillations on power lines. The data model consists of a linear combination of several damped or undamped exponentials in additive white noise:

y_t = Σ_{i=0}^{n} c_i z_i^t + n_t,   t = 0, 1, ..., N.

*This work was supported by the Office of Naval Research, Arlington, Va., under contract N00014-85-K-0256.

LINEAR ALGEBRA AND ITS APPLICATIONS 122/123/124:165-194 (1989)

This model corresponds to a noisy impulse response of a rational discrete linear system H(z) = B(z)/A(z), with

A(z) = Π_{i=0}^{n} (1 − z_i z^{-1}).

The estimation of the exponential model is equivalent to estimating a rational system from its noisy impulse response. The difference between the two system identification procedures lies in the choice of the parameter set. In 1795, Prony [1] developed a two-step procedure for solving the noise-free case. His method was based on the computation of the linear prediction polynomial A(z) that annihilates the data. The roots of the polynomial are equal to the modes {z_i}_{i=0}^{n}, and the linear coefficients {c_i}_{i=0}^{n} are then obtained by solving a system of linear equations defined by a column Vandermonde matrix V containing the identified modes. The linear prediction step in this method has received a lot of attention lately, and several new variants of the Prony method have recently been developed to account for the noise component [2-5]. In this paper, we focus on the second step of this method. In the case N = n, the algorithm of Björck and Pereyra [6] may be used to solve the square linear system for the linear coefficients. If N > n, a least squares solution for the coefficients {c_i}_{i=0}^{n} is computed by solving the normal equations, based on a Hilbert-type matrix H = V*V, which is the Gramian of the Vandermonde matrix V. These normal equations are usually solved implicitly using the QR factorization of the original matrix V. The QR factorization of V is preferred to the Cholesky factorization of H to compute the solution, as it leads to better numerical properties. We develop in this paper a fast algorithm to perform this factorization, by taking advantage of the very special structure of the Vandermonde matrix and its Gramian. A fast algorithm for the Cholesky factorization of H^{-1} is first derived using the technique of Heinig and Rost [7]. This algorithm is then used to derive an algorithm for the matrix Q in the QR factorization.
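The second Prony step described above, solving for the linear coefficients by least squares through a QR factorization, can be illustrated with a short numpy sketch (the modes, noise level, and variable names here are our own illustrative choices, not the paper's):

```python
import numpy as np

# Given identified modes z_i, recover the linear coefficients c_i from noisy
# data y by least squares, via the standard (Householder) QR factorization
# that this paper's fast algorithm is designed to replace for Vandermonde V.
rng = np.random.default_rng(0)
N = 40
modes = np.array([0.9, 0.7 * np.exp(0.5j), 0.7 * np.exp(-0.5j)])
c_true = np.array([1.0, 0.5 - 0.2j, 0.5 + 0.2j])
t = np.arange(N + 1)
V = modes[np.newaxis, :] ** t[:, np.newaxis]         # column Vandermonde matrix
y = V @ c_true + 0.01 * rng.standard_normal(N + 1)   # noisy exponential data

Q, R = np.linalg.qr(V)                      # thin QR, Q*Q = I
c_hat = np.linalg.solve(R, Q.conj().T @ y)  # least squares coefficients
```

With the small noise level above, `c_hat` matches `c_true` to within a few percent.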
We obtain a vector based algorithm that allows for a saving of an order of magnitude in the complexity of the computation in comparison with standard QR factorization algorithms such as Gram-Schmidt, Householder, or Givens. We also show that, for the row-wise Vandermonde matrix that arises in least squares polynomial data fitting, only two special cases allow for a fast algorithm, namely, when the modes all lie either on the real axis or on the unit circle. The Gramian is then Hankel or Toeplitz, respectively. The algorithm to compute the Cholesky factor of the inverse of the Gramian is then the Lanczos or the Levinson algorithm, respectively. In these cases, the reduction technique of [7] leads to other algorithms, which we did not use here. The following diagram illustrates several connections between the various matrix factorizations that we shall exploit over and over again in our development of fast algorithms. It also establishes our notation:

    V = QΣB*            VA = QΣ   (2)
    H = BΣ²B*           H^{-1} = AΣ^{-2}A*
    HA = BΣ² = V*QΣ   (1)
    V*Q = BΣ          (3)

In this diagram and throughout this paper, Qᵀ denotes the transpose of Q, Q̄ denotes its complex conjugate, and Q* = Q̄ᵀ. In the first line of the diagram, the equation V = QΣB* defines the QR factors of V in the direct form that synthesizes columns of V from the orthogonal columns of Q. The upper triangular matrix B* is defined with ones on its diagonal, and the diagonal matrix Σ contains real positive numbers. Just beside this equation, the equation VA = QΣ defines the QR factors of V in the inverse form that analyzes the columns of V into the orthogonal columns of Q. The matrix A in this equation is upper triangular and related to B by B* = A^{-1}. The equation A*HA = Σ² defines the inverse Cholesky factors of the Gramian H, and the equation H = BΣ²B* defines its direct Cholesky factors. The third line of the diagram shows factors of the Gramian that mix variables from the inverse Cholesky, the QR, and the direct Cholesky factors. The last line explicitly shows the triangular correlation between the columns of V and those of Q. The numbers (1) through (3) beside the equations correspond to the three fundamental equations that we use to derive algorithms, in the order we use them. A fast recursion for the columns of A, derived from the Cholesky factor HA = BΣ², may be used in the QR factor VA = QΣ to derive fast recursions for the columns of Q. These recursions may then be used in the equation V*Q = BΣ to derive fast recursions for the columns of B.

I. BACKGROUND

Let V be the following complex Vandermonde matrix:

V = [ 1      1      ...   1
      z_0    z_1    ...   z_n
      z_0²   z_1²   ...   z_n²
      ...    ...          ...
      z_0^m  z_1^m  ...   z_n^m ].   (1.1)

We assume that m > n. If m = n, the algorithm of Björck and Pereyra [6] may be used to solve a square system of linear equations involving the matrix V. We also assume that the values z_i are pairwise distinct, so that V has full column rank. The QR factorization of V is defined as

V = QΣB*   with   Q*Q = I.   (1.2)

The Gramian of V is the matrix H = V*V = {h_{i,j}}_{i,j=0}^{n}, with

h_{i,j} = (1 − z̄_i^{m+1} z_j^{m+1}) / (1 − z̄_i z_j)   if z̄_i z_j ≠ 1,
h_{i,j} = m + 1                                        otherwise.   (1.3)

H is a complex Hermitian matrix that belongs to the class of Hilbert matrices [7].
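The closed form (1.3) is just the geometric sum Σ_{t=0}^{m} z̄_i^t z_j^t evaluated in closed form; a quick numpy check (illustrative mode values, chosen so that no pair has z̄_i z_j = 1):

```python
import numpy as np

# Verify the closed form (1.3) of the Gramian entries against the direct
# product V*V, for modes where no pair satisfies conj(z_i) z_j = 1.
z = np.array([0.5 + 0.1j, -0.3 + 0.4j, 1.25 + 0j])
m = 5
V = z[np.newaxis, :] ** np.arange(m + 1)[:, np.newaxis]
H_direct = V.conj().T @ V

zi = np.conj(z)[:, np.newaxis]
zj = z[np.newaxis, :]
H_closed = (1 - (zi * zj) ** (m + 1)) / (1 - zi * zj)

assert np.allclose(H_direct, H_closed)
```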

The UL factorization of H^{-1} is defined to be

H^{-1} = AΣ^{-2}A*,   or   A*HA = Σ²,   (1.4)

where A is an upper triangular matrix with ones on its main diagonal:

A = [ 1  a_1(0)  a_2(0)  ...  a_n(0)
      0  1       a_2(1)  ...  a_n(1)
      0  0       1       ...  ...
      ...                ...  a_n(n−1)
      0  ...             0    1       ].   (1.5)

The diagonal matrix Σ = Diag(σ_0, σ_1, ..., σ_n) is constructed from real positive numbers σ_i for i = 0, 1, ..., n. The matrix A may be characterized by the fact that HA = A^{-*}Σ² is a lower triangular matrix with σ_i² as its ith diagonal element. The recursions for the columns of A are derived in Section II, in much the same way as in [7-9], with a different scaling. Once the recursions for the columns of A are known, they are used to derive recursions on the columns of Q in VA = QΣ. The recursions for the columns of Q are derived in Section III. Once the recursions for the columns of Q are found, they are used to induce recursions on the columns of the lower triangular matrix B in the equation V*Q = BΣ. The recursions for the columns of B are derived in Section IV, and a complete algorithm for the QR factorization of a column Vandermonde matrix is given in Section V. The Cholesky factorization of the Hilbert matrix H is then given by

H = BΣ²B*,   (1.6)

where

B = [ 1       0       0    ...       0
      b_0(1)  1       0    ...       0
      b_0(2)  b_1(2)  1    ...       ...
      ...     ...          ...       0
      b_0(n)  b_1(n)  ...  b_{n−1}(n)  1 ].   (1.7)

The row Vandermonde matrix does not in general lead to a fast QR factorization. This is due to the fact that the Gramian is not structured in a simple way. The Gramian is the sum of rank one matrices:

H = Σ_{k=0}^{m} w̄_k w_kᵀ   (1.8)

with

w_k = [1, z_k, z_k², ..., z_k^n]ᵀ.   (1.9)

This rank one matrix has a Toeplitz structure on the phase of the elements and a Hankel structure on the magnitude of the elements. Only the two special cases of real modes (z_k lies on the real axis) or phase-only modes (z_k lies on the unit circle) lead to a usable structure. These two special cases are studied in Sections VI and VII, and a fast QR factorization algorithm is derived for each case. The Lanczos algorithm [10] for Cholesky factoring the inverse of a Hankel matrix is used to derive a fast QR factorization in the real Hankel Gramian case. The Levinson algorithm [11] is used to derive the QR factorization algorithm in the complex Toeplitz Gramian case.
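The structural claims above can be checked directly in numpy (mode values are our own illustrations):

```python
import numpy as np

# With unit magnitude modes the row Vandermonde Gramian entry (i, j) is
# sum_k exp(1j*(j - i)*theta_k), so H depends only on j - i (Toeplitz);
# with real modes the entry is sum_k x_k^(i + j), so it depends only on
# i + j (Hankel).
n = 3
theta = np.array([0.3, 1.1, 2.0, 2.7])
z = np.exp(1j * theta)                       # modes on the unit circle
V = z[:, np.newaxis] ** np.arange(n + 1)     # rows w_k^T = [1, z_k, ..., z_k^n]
H = V.conj().T @ V
assert np.allclose(H[:-1, :-1], H[1:, 1:])   # constant diagonals: Toeplitz

x = np.array([0.4, -0.7, 1.3, 0.9])          # real modes
W = x[:, np.newaxis] ** np.arange(n + 1)
G = W.T @ W
assert np.allclose(G[:-1, 1:], G[1:, :-1])   # constant antidiagonals: Hankel
```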

II. FAST CHOLESKY FACTORIZATION OF THE INVERSE OF THE HILBERT MATRIX

The reduction technique of Heinig and Rost [7] (see also [8, 9]) may be used to reduce the matrix H:

D^{-*}H − HD = g¹(f¹)ᵀ + g²(f²)ᵀ.   (2.1)

Here D is the diagonal matrix containing the modes, D = Diag(z_0, ..., z_n), and the vectors g^i and f^i are defined as follows:

g¹ = [z̄_0^{-1}, ..., z̄_n^{-1}]ᵀ,   f¹ = [1, ..., 1]ᵀ,   g² = −[z̄_0^m, ..., z̄_n^m]ᵀ,   f² = [z_0^{m+1}, ..., z_n^{m+1}]ᵀ.   (2.2)

The columns of A may be characterized by the equation HA = BΣ². Define H_k to be the leading minor of H of dimension (k+1) × (k+1). Then a_k, the vector containing the k+1 nonzero elements of the kth column of A, is given by the equation

H_k a_k = σ_k² e_k,   (2.3)

where e_k = [0, ..., 0, 1]ᵀ. Due to the structure of H we also have the equation

H_k a_k = H_k [a_k(0), ..., a_k(k−1), 1]ᵀ = σ_k² e_k,   and, by conjugation,   H_kᵀ ā_k = σ_k² e_k.

V has full column rank; therefore H is strongly regular, meaning all of its leading minors H_k are invertible matrices, and so σ_k² ≠ 0 for all k. Similarly, define the vectors φ_k^i and ψ_k^i of length k+1 as

H_k φ_k^i = g_k^i   and   H_kᵀ ψ_k^i = f_k^i   for   i = 1, 2.   (2.4)

Here g_k^i (f_k^i) contains the first k+1 elements of g^i (f^i). The difference equation (2.1) is still valid for the leading minors, and we can write

D_k^{-*}H_k − H_k D_k = g_k¹(f_k¹)ᵀ + g_k²(f_k²)ᵀ.   (2.5)

Multiply Equation (2.5) by H_k^{-1} on both sides and apply the result to e_k, to get

(z̄_k^{-1} I − D_k) a_k = Σ_{i=1}^{2} ν_k^i φ_k^i.   (2.6)

Similarly, using the transpose of (2.5), we can get

(D_k^{-*} − z_k I) ā_k = Σ_{i=1}^{2} μ_k^i ψ_k^i.   (2.7)

If a_k is known, the updates for the auxiliary vectors are given by

φ_k^i = [φ_{k−1}^i; 0] + (μ_k^i/σ_k²) a_k   and   ψ_k^i = [ψ_{k−1}^i; 0] + (ν_k^i/σ_k²) ā_k.   (2.8)

These equations follow directly from (2.3) and (2.4) and the definitions of μ_k^i and ν_k^i:

μ_k^i = g^i(k) − Σ_{j=0}^{k−1} h_{k,j} φ_{k−1}^i(j)   and   ν_k^i = f^i(k) − Σ_{j=0}^{k−1} h_{j,k} ψ_{k−1}^i(j).   (2.9)

The kth elements of φ_k^i and ψ_k^i are φ_k^i(k) = μ_k^i/σ_k² and ψ_k^i(k) = ν_k^i/σ_k².

If z̄_k z_k ≠ 1, then the solution for σ_k² is

σ_k² = (μ_k¹ν_k¹ + μ_k²ν_k²) / (z̄_k^{-1} − z_k).   (2.10)

The final equation for a_k is then obtained by using (2.8) in (2.6) to get

(z_k I − D_k) a_k = Σ_{i=1}^{2} ν_k^i [φ_{k−1}^i; 0],   (2.11)

or equivalently

a_k = e_k + Σ_{i=1}^{2} ν_k^i [ φ_{k−1}^i(0)/(z_k − z_0), ..., φ_{k−1}^i(k−1)/(z_k − z_{k−1}), 0 ]ᵀ.   (2.12)

Alternatively, using (2.7) for ā_k, we obtain

(D_k^{-*} − z̄_k^{-1} I) ā_k = Σ_{i=1}^{2} μ_k^i [ψ_{k−1}^i; 0],   (2.13)

or

ā_k = e_k + Σ_{i=1}^{2} μ_k^i [ ψ_{k−1}^i(0)/(z̄_0^{-1} − z̄_k^{-1}), ..., ψ_{k−1}^i(k−1)/(z̄_{k−1}^{-1} − z̄_k^{-1}), 0 ]ᵀ.   (2.14)

If z̄_k z_k = 1, we have μ_k¹ν_k¹ + μ_k²ν_k² = 0, so that (2.6) and (2.7) lead to the equations

[a_k(0), ..., a_k(k−1)]ᵀ = (z̄_k^{-1} I − D_{k−1})^{-1} Σ_{i=1}^{2} ν_k^i φ_{k−1}^i,   (2.15)

[ā_k(0), ..., ā_k(k−1)]ᵀ = (D_{k−1}^{-*} − z_k I)^{-1} Σ_{i=1}^{2} μ_k^i ψ_{k−1}^i.   (2.16)

Define the variables π_k^i, for i = 1, 2, by

π_k^i = Σ_{j=0}^{k−1} h_{k,j} φ_{k−1}^i(j) / (z_k − z_j),   (2.17)

so that

Σ_{j=0}^{k−1} h_{k,j} a_k(j) = σ_k² − h_{k,k} = ν_k¹π_k¹ + ν_k²π_k².   (2.18)

An alternative formula for σ_k², based on (2.15) (when z̄_k z_k = 1), is then

σ_k² = h_{k,k} + Σ_{i=1}^{2} ν_k^i π_k^i.   (2.19)

Another formula (with the same complexity) can be obtained using (2.16) instead of (2.15). The total complexity of computing all the columns a_k is then 6n² + O(n). The procedure of computing the matrices A and Σ² from the Hilbert matrix H may be summarized in the following algorithm (similar to that of [8] with a different scaling):

ALGORITHM A. Let H be the Hilbert matrix defined by (1.3). Then its inverse Cholesky factorization, given by (1.4) and (1.5), may be computed in the following way:

1. Initialization:

   a_0(0) = 1,   σ_0² = h_{0,0},
   φ_0^i = g^i(0)/h_{0,0}   and   ψ_0^i = f^i(0)/h_{0,0}   for   i = 1, 2.

2. Loop on k = 1, ..., n:

   μ_k^i = g^i(k) − Σ_{j=0}^{k−1} h_{k,j} φ_{k−1}^i(j)   and   ν_k^i = f^i(k) − Σ_{j=0}^{k−1} h_{j,k} ψ_{k−1}^i(j)   for   i = 1, 2.

   If z̄_k z_k ≠ 1, then

   σ_k² = (μ_k¹ν_k¹ + μ_k²ν_k²) / (z̄_k^{-1} − z_k);

   otherwise compute

   π_k^i = Σ_{j=0}^{k−1} h_{k,j} φ_{k−1}^i(j) / (z_k − z_j)   for   i = 1, 2,

   and then set

   σ_k² = h_{k,k} + Σ_{i=1}^{2} ν_k^i π_k^i.

   a_k(k) = 1   and   a_k(j) = Σ_{i=1}^{2} ν_k^i φ_{k−1}^i(j) / (z_k − z_j)   for   j = 0, ..., k−1,

   φ_k^i = [φ_{k−1}^i; 0] + (μ_k^i/σ_k²) a_k   and   ψ_k^i = [ψ_{k−1}^i; 0] + (ν_k^i/σ_k²) ā_k   for   i = 1, 2.

Note that the vectors a_k may be used to solve a Hilbert system of equations Hx = y [8].
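The recursions reconstructed above can be exercised numerically. The following numpy sketch (the function name and code organization are ours, not the paper's) implements the loop of Algorithm A for the branch with all modes off the unit circle, and checks A*HA = Σ² against a dense computation:

```python
import numpy as np

def inverse_cholesky_vandermonde(z, m):
    """Unit upper triangular A and positive diagonal sig2 with
    A* H A = diag(sig2), where H = V*V is the Gramian of the column
    Vandermonde matrix built from modes z (all with |z_k| != 1 here)."""
    z = np.asarray(z, dtype=complex)
    n = len(z) - 1
    V = z[np.newaxis, :] ** np.arange(m + 1)[:, np.newaxis]
    H = V.conj().T @ V
    # Displacement generators of (2.1)-(2.2).
    g = [1.0 / np.conj(z), -np.conj(z) ** m]
    f = [np.ones(n + 1, dtype=complex), z ** (m + 1)]
    A = np.eye(n + 1, dtype=complex)
    sig2 = np.zeros(n + 1)
    sig2[0] = H[0, 0].real
    phi = [g[i][:1] / H[0, 0] for i in range(2)]   # phi_0^i
    psi = [f[i][:1] / H[0, 0] for i in range(2)]   # psi_0^i
    for k in range(1, n + 1):
        mu = [g[i][k] - H[k, :k] @ phi[i] for i in range(2)]   # (2.9)
        nu = [f[i][k] - H[:k, k] @ psi[i] for i in range(2)]
        sig2[k] = ((mu[0] * nu[0] + mu[1] * nu[1])
                   / (1.0 / np.conj(z[k]) - z[k])).real        # (2.10)
        a = np.zeros(k + 1, dtype=complex)
        a[k] = 1.0
        for j in range(k):                                     # (2.12)
            a[j] = (nu[0] * phi[0][j] + nu[1] * phi[1][j]) / (z[k] - z[j])
        A[: k + 1, k] = a
        phi = [np.append(phi[i], 0) + mu[i] / sig2[k] * a for i in range(2)]
        psi = [np.append(psi[i], 0) + nu[i] / sig2[k] * np.conj(a)
               for i in range(2)]                              # (2.8)
    return A, sig2

z = np.array([0.5 + 0.1j, -0.3 + 0.4j, 0.8, 1.5 - 0.2j])
A, s2 = inverse_cholesky_vandermonde(z, 7)
V = z[np.newaxis, :] ** np.arange(8)[:, np.newaxis]
H = V.conj().T @ V
assert np.allclose(A.conj().T @ H @ A, np.diag(s2), atol=1e-6)
```

For modes on the unit circle, the (2.17)-(2.19) branch would be needed instead.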

III. FAST ORTHOGONALIZATION OF THE COLUMN VANDERMONDE MATRIX

Given the matrix A, and using the equation VA = QΣ, we can write out the kth column q_k of the orthogonal matrix Q as

σ_k q_k = V [a_k; 0],   (3.1)

where 0 = [0, ..., 0]ᵀ. Extend the vectors in Equation (2.11) with zeros to have length n + 1:

(z_k I − D) [a_k; 0] = Σ_{i=1}^{2} ν_k^i [φ_{k−1}^i; 0].

Multiplying this equation on the left by V, we have

(z_k V − VD) [a_k; 0] = Σ_{i=1}^{2} ν_k^i α_{k−1}^i,   (3.2)

where we define the vectors α_k^i by

α_k^i = V [φ_k^i; 0]   for   i = 1, 2.   (3.3)

Using the structure of the Vandermonde matrix V, we can write

VD = ZV + e_m (f²)ᵀ,   (3.4)

where the vector f² is defined in (2.2), e_m = [0, ..., 0, 1]ᵀ has length m + 1, and Z is the shift matrix with ones on the superdiagonal:

Z = [ 0  1
         0  1
            ...  1
                 0 ].

By using (3.1) and (3.4) in (3.2), we have

σ_k (z_k I − Z) q_k = Σ_{i=1}^{2} ν_k^i α_{k−1}^i + γ_k e_m,   (3.5)

where

γ_k = (f²)ᵀ [a_k; 0] = Σ_{j=0}^{k} z_j^{m+1} a_k(j) = (f_k²)ᵀ a_k = ν_k².

An alternative procedure is based on the other equation (2.13) for a_k, with the same extension:

(D^{-*} − z̄_k^{-1} I) [ā_k; 0] = Σ_{i=1}^{2} μ_k^i [ψ_{k−1}^i; 0].

Multiplying the conjugate of this equation on the left by V, we have

(VD^{-1} − z_k^{-1} V) [a_k; 0] = Σ_{i=1}^{2} μ̄_k^i β_{k−1}^i,   (3.6)

where the vectors β_k^i, for i = 1, 2, are defined by

β_k^i = V [ψ̄_k^i; 0]   for   i = 1, 2.   (3.7)

Using

VD^{-1} = Zᵀ V + e_0 [z_0^{-1}, ..., z_n^{-1}],

where [z_0^{-1}, ..., z_n^{-1}]ᵀ = ḡ¹ is the conjugate of the vector g¹ defined in (2.2), we have

σ_k (Zᵀ − z_k^{-1} I) q_k = Σ_{i=1}^{2} μ̄_k^i β_{k−1}^i − δ_k e_0,   (3.8)

where δ_k is given by

δ_k = (ḡ¹)ᵀ [a_k; 0] = Σ_{j=0}^{k} z_j^{-1} a_k(j) = μ̄_k¹.

Note that the two ways of computing q_k have the same complexity. The inner products (2.9) defining μ_k^i and ν_k^i may be replaced by

μ_k^i = g^i(k) − [1, z̄_k, ..., z̄_k^m] α_{k−1}^i = g^i(k) − V_k* α_{k−1}^i,
ν_k^i = f^i(k) − [1, z_k, ..., z_k^m] β̄_{k−1}^i = f^i(k) − V_kᵀ β̄_{k−1}^i,   (3.9)

where V_k is defined as the kth column of V. The updates for the vectors α_k^i and β_k^i are then given by (2.8) premultiplied by V:

α_k^i = α_{k−1}^i + (μ_k^i/σ_k) q_k   and   β_k^i = β_{k−1}^i + (ν̄_k^i/σ_k) q_k   for   i = 1, 2.   (3.10)

This means that we can compute the matrix Q without carrying the computation for the matrix A. The multiplier σ_k² may be computed as before when z̄_k z_k ≠ 1, using (2.10); otherwise it may be obtained using the relation

σ_k² = V_k* (σ_k q_k),   (3.11)

which comes from the equation V*Q = BΣ and b_k(k) = 1. The procedure for computing the matrices Q and Σ from the Vandermonde matrix V may be summarized in the following algorithm:

ALGORITHM B. Given the Vandermonde matrix V defined in (1.1), the matrices Q and Σ in VA = QΣ may be computed in the following way:

1. Initialization:

   σ_0² = h_{0,0},
   q_0(j) = z_0^j/σ_0   for   j = 0, ..., m,
   α_0^i(j) = g^i(0) z_0^j / h_{0,0}   for   i = 1, 2   and   j = 0, ..., m,
   β_0^i(j) = f̄^i(0) z_0^j / h_{0,0}   for   i = 1, 2   and   j = 0, ..., m.

2. Loop on k = 1, ..., n:

   μ_k^i = g^i(k) − V_k* α_{k−1}^i   and   ν_k^i = f^i(k) − V_kᵀ β̄_{k−1}^i   for   i = 1, 2.

   Compute u_k = σ_k q_k by substitution in (3.8), using δ_k = μ̄_k¹:

   u_k(0) = z_k ( μ̄_k¹ − Σ_{i=1}^{2} μ̄_k^i β_{k−1}^i(0) ),
   u_k(j) = z_k ( u_k(j−1) − Σ_{i=1}^{2} μ̄_k^i β_{k−1}^i(j) )   for   j = 1, ..., m.

   σ_k² = (μ_k¹ν_k¹ + μ_k²ν_k²)/(z̄_k^{-1} − z_k)   if z̄_k z_k ≠ 1,   and   σ_k² = V_k* u_k   otherwise.

   q_k = u_k/σ_k.

   α_k^i = α_{k−1}^i + (μ_k^i/σ_k) q_k   and   β_k^i = β_{k−1}^i + (ν̄_k^i/σ_k) q_k   for   i = 1, 2.

The total cost for computing Q is 11mn + O(m).

IV. FAST CHOLESKY FACTORIZATION OF THE HILBERT MATRIX

Given the matrix Q, we can write out the kth column b_k of the lower triangular matrix B as

V*V [a_k; 0] = V* q_k σ_k = b_k σ_k².   (4.1)

Multiplying Equation (3.8) on the left by V*, we have

σ_k V* Zᵀ q_k − z_k^{-1} b_k σ_k² = Σ_{i=1}^{2} μ̄_k^i s_{k−1}^i − δ_k f¹,

where the vectors s_k^i are defined by

s_k^i = V* β_k^i = V*V [ψ̄_k^i; 0]   for   i = 1, 2.   (4.2)

Using the structure of V*, we can write V* e_0 = [1, ..., 1]ᵀ = f¹ and V* Zᵀ = D* V* − f̄² e_mᵀ, where the vector f² is defined in (2.2). We have the equation for b_k

(D* − z_k^{-1} I) b_k σ_k² = Σ_{i=1}^{2} μ̄_k^i s_{k−1}^i − δ_k f¹ + ρ_k f̄²,   (4.3)

where

ρ_k = e_mᵀ q_k σ_k = e_mᵀ V [a_k; 0] = Σ_{j=0}^{k} z_j^m a_k(j) = −μ̄_k².

An alternative procedure is based on the other equation (3.5) for q_k,

multiplied on the left by V*:

σ_k V* (z_k I − Z) q_k = Σ_{i=1}^{2} ν_k^i r_{k−1}^i + γ_k V* e_m,   (4.4)

where the vectors r_k^i are defined by

r_k^i = V* α_k^i = V*V [φ_k^i; 0]   for   i = 1, 2.   (4.5)

Using

V* Z = D^{-*} V* − g¹ e_0ᵀ   and   V* e_m = −g²,

where the vector g¹ is defined in (2.2), the equation (4.4) is simplified into

(z_k I − D^{-*}) b_k σ_k² = Σ_{i=1}^{2} ν_k^i r_{k−1}^i − γ_k g² − θ_k g¹,   (4.6)

where

θ_k = e_0ᵀ q_k σ_k = e_0ᵀ V [a_k; 0] = (f_k¹)ᵀ a_k = ν_k¹.

The inner products (3.9) defining μ_k^i and ν_k^i may be replaced by

μ_k^i = g^i(k) − r_{k−1}^i(k)   and   ν_k^i = f^i(k) − s̄_{k−1}^i(k).   (4.7)

The updates for the vectors r_k^i and s_k^i are given by (3.10) premultiplied by V*:

r_k^i = r_{k−1}^i + μ_k^i b_k   and   s_k^i = s_{k−1}^i + ν̄_k^i b_k   for   i = 1, 2.   (4.8)

The recursions may be simplified, using the following definitions:

x_k^i = g^i − r_k^i   and   y_k^i = f^i − s̄_k^i   for   i = 1, 2,   (4.9)

so that (4.7) is replaced by

μ_k^i = x_{k−1}^i(k)   and   ν_k^i = y_{k−1}^i(k)   for   i = 1, 2.   (4.10)

The equations (4.3) and (4.6) are then simplified into

(z_k^{-1} I − D*) b_k σ_k² = Σ_{i=1}^{2} μ̄_k^i ȳ_{k−1}^i   (4.11)

and

(z_k I − D^{-*}) b_k σ_k² = − Σ_{i=1}^{2} ν_k^i x_{k−1}^i.   (4.12)

The updates for x_k^i and y_k^i are given by

x_k^i = x_{k−1}^i − μ_k^i b_k   and   y_k^i = y_{k−1}^i − ν_k^i b̄_k   for   i = 1, 2.   (4.13)

The elements in the matrix Σ may be computed when z̄_k z_k ≠ 1 as before, using (2.10); otherwise, if z̄_k z_k = 1, we use the fact that b_k(k) = 1, together with

h_{k,k} = m + 1 = σ_k² + Σ_{j=0}^{k−1} b_j(k) b̄_j(k) σ_j².

We then have the following alternative formula for σ_k²:

σ_k² = m + 1 − Σ_{j=0}^{k−1} b_j(k) b̄_j(k) σ_j².   (4.14)

The procedure for computing the matrices B and Σ² from the Hilbert matrix H may be summarized in the following algorithm:

ALGORITHM C. Given the Hilbert matrix H in (1.3), its Cholesky factorization H = BΣ²B* may be computed in the following way:

1. Initialization:

   σ_0² = h_{0,0},
   b_0(j) = h_{j,0}/σ_0²   for   j = 0, ..., n,
   x_0^i(j) = g^i(j) − g^i(0) b_0(j)   for   i = 1, 2   and   j = 1, ..., n,
   y_0^i(j) = f^i(j) − f^i(0) b̄_0(j)   for   i = 1, 2   and   j = 1, ..., n.

2. Loop on k = 1, ..., n:

   μ_k^i = x_{k−1}^i(k)   and   ν_k^i = y_{k−1}^i(k)   for   i = 1, 2.

   If z̄_k z_k ≠ 1, then

   σ_k² = (μ_k¹ν_k¹ + μ_k²ν_k²) / (z̄_k^{-1} − z_k),

   else

   σ_k² = m + 1 − Σ_{j=0}^{k−1} b_j(k) b̄_j(k) σ_j².

   b_k(k) = 1   and   b_k(j) = Σ_{i=1}^{2} μ̄_k^i ȳ_{k−1}^i(j) / [σ_k² (z_k^{-1} − z̄_j)]   for   j = k+1, ..., n,

   x_k^i(j) = x_{k−1}^i(j) − μ_k^i b_k(j)   for   j = k+1, ..., n   and   i = 1, 2,
   y_k^i(j) = y_{k−1}^i(j) − ν_k^i b̄_k(j)   for   j = k+1, ..., n   and   i = 1, 2.

The complexity for computing B is then 7n²/2 + O(n). Note that this algorithm is perfectly parallel, as it does not require any inner product, when the modes z_i do not lie on the unit circle.

V. COMPLETE QR FACTORIZATION OF THE COLUMN VANDERMONDE MATRIX

A complete algorithm for computing the QR factorization of the Vandermonde matrix V is given by using Algorithms B and C together and eliminating the inner products in Algorithm B, as well as its unnecessary variables. The algorithm produces the matrices Q, B, and Σ, where V = QΣB* = QR, or equivalently B = R*Σ^{-1}.

ALGORITHM D. Given the Vandermonde matrix V in (1.1), its QR factorization V = QΣB* may be computed in the following way:

1. Initialization:

   σ_0² = h_{0,0},
   q_0(j) = z_0^j/σ_0   for   j = 0, ..., m,
   b_0(j) = h_{j,0}/σ_0²   for   j = 0, ..., n,
   β_0^i(j) = f̄^i(0) z_0^j / h_{0,0}   for   i = 1, 2   and   j = 0, ..., m,
   x_0^i(j) = g^i(j) − g^i(0) b_0(j)   and   y_0^i(j) = f^i(j) − f^i(0) b̄_0(j)   for   i = 1, 2   and   j = 1, ..., n.

2. Loop on k = 1, ..., n:

   μ_k^i = x_{k−1}^i(k)   and   ν_k^i = y_{k−1}^i(k)   for   i = 1, 2.

   If z̄_k z_k ≠ 1, then

   σ_k² = (μ_k¹ν_k¹ + μ_k²ν_k²) / (z̄_k^{-1} − z_k),   else   σ_k² = m + 1 − Σ_{j=0}^{k−1} b_j(k) b̄_j(k) σ_j².

   Back substitution for u_k = σ_k q_k as in (3.8), using δ_k = μ̄_k¹:

   u_k(0) = z_k ( μ̄_k¹ − Σ_{i=1}^{2} μ̄_k^i β_{k−1}^i(0) ),
   u_k(j) = z_k ( u_k(j−1) − Σ_{i=1}^{2} μ̄_k^i β_{k−1}^i(j) )   for   j = 1, ..., m,

   and q_k = u_k/σ_k.

   b_k(k) = 1   and   b_k(j) = Σ_{i=1}^{2} μ̄_k^i ȳ_{k−1}^i(j) / [σ_k² (z_k^{-1} − z̄_j)]   for   j = k+1, ..., n,

   x_k^i(j) = x_{k−1}^i(j) − μ_k^i b_k(j)   and   y_k^i(j) = y_{k−1}^i(j) − ν_k^i b̄_k(j)   for   j = k+1, ..., n   and   i = 1, 2,

   β_k^i = β_{k−1}^i + (ν̄_k^i/σ_k) q_k   for   i = 1, 2.

The complexity of the algorithm is 5mn + 7n²/2 + O(m). Note that the back substitution for q_k is the only step of the algorithm which is not parallel. Recall that the complexity of standard QR factorization algorithms is mn² + O(m) [12], which means that our algorithm is an order of magnitude faster.
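The leading terms of the two complexity counts compared above give the expected speedup directly (plain arithmetic; the sizes are our own illustrative choices):

```python
# Leading-term operation counts: fast algorithm (5mn + 7n^2/2) versus a
# standard QR factorization (mn^2). Pure arithmetic, not a benchmark.
def fast(m, n):
    return 5 * m * n + 3.5 * n * n

def classical(m, n):
    return m * n * n

for m, n in [(1000, 10), (1000, 50), (10000, 100)]:
    print(f"m={m}, n={n}: speedup ~ {classical(m, n) / fast(m, n):.1f}x")
```

The ratio grows roughly like n/5, which is the "order of magnitude" claimed for m much larger than n.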

VI. REAL ROW VANDERMONDE MATRIX QR FACTORIZATION

Let V be the following real Vandermonde matrix:

V = [ 1  z_0  z_0²  ...  z_0^n
      1  z_1  z_1²  ...  z_1^n
      ...              ...
      1  z_m  z_m²  ...  z_m^n ],   z_k real.   (6.1)

The Gramian is defined by VᵀV = H = {h_{i,j}}_{i,j=0}^{n}, with

h_{i,j} = Σ_{k=0}^{m} z_k^{i+j},   (6.2)

so that H is a Hankel matrix. The Lanczos algorithm [10] is then the classical way to compute the matrix A column by column. We have H_k a_k = σ_k² e_k, and

H_k [a_{k−1}; 0] = σ_{k−1}² (e_{k−1} + μ_{k−1} e_k)   and   H_k [0; a_{k−1}] = σ_{k−1}² (e_{k−2} + μ_{k−1} e_{k−1} + ν_{k−1} e_k),   (6.3)

where

μ_{k−1} = (1/σ_{k−1}²) Σ_{j=0}^{k−1} h_{k,j} a_{k−1}(j)   and   ν_{k−1} = (1/σ_{k−1}²) Σ_{j=0}^{k−1} h_{k+1,j} a_{k−1}(j).   (6.4)

Similarly,

H_k [a_{k−2}; 0; 0] = σ_{k−2}² (e_{k−2} + μ_{k−2} e_{k−1} + ν_{k−2} e_k),   (6.5)

so that a recursion for a_k is obtained as a combination of (6.3) and (6.5):

a_k = [0; a_{k−1}] + (μ_{k−2} − μ_{k−1}) [a_{k−1}; 0] − (σ_{k−1}²/σ_{k−2}²) [a_{k−2}; 0; 0]   (6.6)

and

σ_k² = σ_{k−1}² [ν_{k−1} − ν_{k−2} + μ_{k−1}(μ_{k−2} − μ_{k−1})].   (6.7)

The complete Lanczos algorithm for computing the upper triangular matrix A from the Hankel matrix H in the Cholesky factorization H^{-1} = AΣ^{-2}Aᵀ consists of the equations (6.4), (6.7), and (6.6), together with the initialization

a_0 = 1,   σ_0² = h_{0,0},   μ_0 = h_{1,0}/h_{0,0},   ν_0 = h_{2,0}/h_{0,0},
a_1 = [−μ_0; 1],   σ_1² = σ_0² (ν_0 − μ_0²).



The Lanczos algorithm has a complexity of 2n² + O(n). If we premultiply the recursion (6.6) for a_k, extended with zeros to have length n + 1, by V, we get

σ_k q_k = V [0; a_{k−1}; 0] + (μ_{k−2} − μ_{k−1}) σ_{k−1} q_{k−1} − (σ_{k−1}²/σ_{k−2}²) σ_{k−2} q_{k−2}.   (6.8)

As before, we use

VZ = DV − [z_0^{n+1}, ..., z_m^{n+1}]ᵀ [0, ..., 0, 1],

where the diagonal matrix D is defined as D = Diag(z_0, z_1, ..., z_m). Then

q_k = (σ_{k−1}/σ_k) ( [D + (μ_{k−2} − μ_{k−1}) I] q_{k−1} − (σ_{k−1}/σ_{k−2}) q_{k−2} ).   (6.9)

The multipliers μ_{k−1} and ν_{k−1} are given by (6.4), which may be rewritten as

μ_{k−1} = (1/σ_{k−1}) Σ_{j=0}^{m} z_j^k q_{k−1}(j)   and   ν_{k−1} = (1/σ_{k−1}) Σ_{j=0}^{m} z_j^{k+1} q_{k−1}(j).   (6.10)

An algorithm for computing Q and Σ from V is given by:

ALGORITHM E. Given the Vandermonde matrix V defined by (6.1), one may compute the orthogonal matrix Q and the diagonal matrix Σ in VA = QΣ in the following way:

1. Initialization:

   σ_0² = m + 1,   μ_0 = h_{1,0}/h_{0,0},   ν_0 = h_{2,0}/h_{0,0},   σ_1² = σ_0² (ν_0 − μ_0²),
   q_0(j) = 1/σ_0   and   q_1(j) = (z_j − μ_0)/σ_1   for   j = 0, ..., m.

2. Loop on k = 2, ..., n:

   μ_{k−1} = (1/σ_{k−1}) Σ_{j=0}^{m} z_j^k q_{k−1}(j)   and   ν_{k−1} = (1/σ_{k−1}) Σ_{j=0}^{m} z_j^{k+1} q_{k−1}(j),
   σ_k² = σ_{k−1}² [ν_{k−1} − ν_{k−2} + μ_{k−1}(μ_{k−2} − μ_{k−1})],
   q_k = (σ_{k−1}/σ_k) ( [D + (μ_{k−2} − μ_{k−1}) I] q_{k−1} − (σ_{k−1}/σ_{k−2}) q_{k−2} ).

This algorithm has a complexity of 5mn + O(m). If we premultiply the recursion (6.9) for q_k by Vᵀ, we get

b_k σ_k² = Vᵀ q_k σ_k = σ_{k−1} ( [VᵀD + (μ_{k−2} − μ_{k−1}) Vᵀ] q_{k−1} − (σ_{k−1}/σ_{k−2}) Vᵀ q_{k−2} ),

and if we use

VᵀD = Zᵀ Vᵀ + [0, ..., 0, 1]ᵀ [z_0^{n+1}, ..., z_m^{n+1}],

we have

b_k = (σ_{k−1}²/σ_k²) [Zᵀ b_{k−1} − b_{k−2} + (μ_{k−2} − μ_{k−1}) b_{k−1}] + (σ_{k−1}² P_{k−1}/σ_k²) e_n,   (6.11)

where P_{k−1} = (1/σ_{k−1}) Σ_{j=0}^{m} z_j^{n+1} q_{k−1}(j). Note that P_{k−1} would have been obtained if Vᵀ had an extra row on its bottom. The idea to obtain P_{k−1} without any extra inner product is then to extend H on its bottom by its natural extension (similar to what is done in [9]). We also have

b_i(i) = 1,   μ_i = b_i(i+1),   and   ν_i = b_i(i+2).

This means that no inner product is necessary to get the recursion multipliers. An algorithm for computing B from V is given by:

ALGORITHM F. The Cholesky factorization of the real Hankel matrix VᵀV = BΣ²Bᵀ may be computed in the following way:

1. Initialization:

   σ_0² = m + 1,
   b_0(j) = (1/σ_0²) Σ_{l=0}^{m} z_l^j   for   j = 0, ..., 2n,
   μ_0 = b_0(1),   ν_0 = b_0(2),   σ_1² = σ_0² (ν_0 − μ_0²),
   b_1(j) = (σ_0²/σ_1²) [b_0(j+1) − μ_0 b_0(j)]   for   j = 1, ..., 2n − 1.

2. Loop on k = 2, ..., n:

   μ_{k−1} = b_{k−1}(k)   and   ν_{k−1} = b_{k−1}(k+1),
   σ_k² = σ_{k−1}² [ν_{k−1} − ν_{k−2} + μ_{k−1}(μ_{k−2} − μ_{k−1})],
   b_k(j) = (σ_{k−1}²/σ_k²) [b_{k−1}(j+1) − b_{k−2}(j) + (μ_{k−2} − μ_{k−1}) b_{k−1}(j)]   for   j = k, ..., 2n − k.

The initialization phase has a complexity of 2mn + O(m), and it contains inner products and so is not parallel. The rest of the algorithm has a complexity of 4n² + O(n), and it is perfectly parallel. A complete QR factorization of the real row Vandermonde matrix V is then given by the following algorithm:

ALGORITHM G. Given the Vandermonde matrix V in (6.1), its QR factorization V = QΣBᵀ may be computed in the following way:

1. Initialization:

   σ_0² = m + 1,
   b_0(j) = (1/σ_0²) Σ_{l=0}^{m} z_l^j   for   j = 0, ..., 2n,
   μ_0 = b_0(1),   ν_0 = b_0(2),   σ_1² = σ_0² (ν_0 − μ_0²),
   b_1(j) = (σ_0²/σ_1²) [b_0(j+1) − μ_0 b_0(j)]   for   j = 1, ..., 2n − 1,
   q_0(j) = 1/σ_0   and   q_1(j) = (z_j − μ_0)/σ_1   for   j = 0, ..., m.

2. Loop on k = 2, ..., n:

   μ_{k−1} = b_{k−1}(k)   and   ν_{k−1} = b_{k−1}(k+1),
   σ_k² = σ_{k−1}² [ν_{k−1} − ν_{k−2} + μ_{k−1}(μ_{k−2} − μ_{k−1})],
   b_k(j) = (σ_{k−1}²/σ_k²) [b_{k−1}(j+1) − b_{k−2}(j) + (μ_{k−2} − μ_{k−1}) b_{k−1}(j)]   for   j = k, ..., 2n − k,
   q_k = (σ_{k−1}/σ_k) ( [D + (μ_{k−2} − μ_{k−1}) I] q_{k−1} − (σ_{k−1}/σ_{k−2}) q_{k−2} ).

The total complexity of the QR factor computation is 4mn + 2n² + O(m). Note that the alternative algorithm given in [8] to compute A leads to another (and equivalent in complexity) algorithm for the computation of Q and R.

VII. COMPLEX ROW VANDERMONDE MATRIX QR FACTORIZATION

Let V be the following complex Vandermonde matrix:

V = [ 1  z_0  z_0²  ...  z_0^n
      1  z_1  z_1²  ...  z_1^n
      ...              ...
      1  z_m  z_m²  ...  z_m^n ],   z_k = e^{jθ_k}.   (7.1)

The Gramian is defined by V*V = H = {h_{i,l}}_{i,l=0}^{n}, with

h_{i,l} = Σ_{k=0}^{m} z_k^{l−i}   with   z_k = e^{jθ_k},   (7.2)

so that H is a Hermitian Toeplitz matrix: Hᵀ = H̄ = JHJ, where J is the exchange matrix that has ones on its main antidiagonal. The Levinson algorithm [11] is the classical way to compute A column by column. We have H_k a_k = σ_k² e_k, and H_k J ā_k = σ_k² e_0; then

H_k [0; a_{k−1}] = [ν_k, 0, ..., 0, σ_{k−1}²]ᵀ   and   H_k [J ā_{k−1}; 0] = [σ_{k−1}², 0, ..., 0, ν̄_k]ᵀ,

where

ν_k = Σ_{j=0}^{k−1} h_{0,j+1} a_{k−1}(j).   (7.3)

The recursion for a_k is then obtained as

a_k = [0; a_{k−1}] + K_k [J ā_{k−1}; 0],   (7.4)

where the complex multiplier K_k (called the reflection coefficient in signal processing) is given by

K_k = −ν_k / σ_{k−1}²,   (7.5)

so that the "prediction error" σ_k² is updated using

σ_k² = σ_{k−1}² (1 − K_k K̄_k).   (7.6)

The Levinson algorithm for computing the upper triangular matrix A from the Hermitian Toeplitz matrix H in the Cholesky factorization H^{-1} = AΣ^{-2}A* consists of the equations (7.5), (7.6), (7.3), and the recursion (7.4), with the initialization

a_0 = 1   and   σ_0² = h_{0,0} = m + 1.
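The Levinson recursion (7.3)-(7.6) is the classical Toeplitz solver. As a hedged illustration, here is its standard form normalized on the first coefficient (the paper's columns a_k carry the 1 in the last position instead; the two conventions are exchanged through J), checked against a dense Toeplitz Gramian; the function name is ours:

```python
import numpy as np

def levinson(r):
    """Levinson-Durbin recursion for a Hermitian Toeplitz matrix T with
    T[i, j] = r[i - j] (and r[-d] = conj(r[d])): returns a with a(0) = 1
    and T a = E e_0, E the final "prediction error"."""
    n = len(r) - 1
    a = np.array([1.0 + 0j])
    E = r[0].real
    for k in range(1, n + 1):
        gamma = np.dot(r[1:k + 1][::-1], a)            # sum_j r[k-j] a[j]
        K = -gamma / E                                 # reflection coefficient
        a = np.append(a, 0) + K * np.conj(np.append(a, 0))[::-1]
        E = E * (1 - abs(K) ** 2)                      # error update, cf. (7.6)
    return a, E

# Toeplitz Gramian of a row Vandermonde matrix with unit magnitude modes.
theta = np.array([0.3, 1.1, 2.0, 2.7, 4.0])
V = np.exp(1j * theta)[:, np.newaxis] ** np.arange(4)
H = V.conj().T @ V                                     # 4 x 4 Hermitian Toeplitz
a, E = levinson(H[:, 0])
e0 = np.zeros(4)
e0[0] = 1
assert np.allclose(H @ a, E * e0, atol=1e-8)
```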

The Levinson algorithm has a complexity of n² + O(n). If we premultiply the recursion (7.4) for a_k, extended with zeros to have length n + 1, by V, we get

σ_k q_k = V [0; a_{k−1}; 0] + K_k V [J ā_{k−1}; 0].   (7.7)

As before, we use

VZ = DV − [e^{j(n+1)θ_0}, ..., e^{j(n+1)θ_m}]ᵀ [0, ..., 0, 1],

where the diagonal matrix D is defined as D = Diag(e^{jθ_0}, e^{jθ_1}, ..., e^{jθ_m}), so that

q_k = (σ_{k−1}/σ_k) (D q_{k−1} + K_k q̂_{k−1}),   (7.8)

where q̂_k is defined as

q̂_k σ_k = V [J ā_k; 0].   (7.9)

It is updated using

q̂_k = (σ_{k−1}/σ_k) (q̂_{k−1} + K̄_k D q_{k−1}).   (7.10)

The scalar ν_k may be computed using

ν̄_k = σ_{k−1} Σ_{l=0}^{m} e^{−jkθ_l} q̂_{k−1}(l),

so that

K_k = −(1/σ_{k−1}) ( Σ_{l=0}^{m} e^{−jkθ_l} q̂_{k−1}(l) )*.   (7.11)

A complete algorithm for computing the matrices Q and Σ from the Vandermonde matrix V in (7.1) consists of the equations (7.11), (7.6), and

the coupled recursions (7.8) and (7.10), with the initialization

σ_0² = m + 1,   q_0(i) = q̂_0(i) = 1/√(m+1)   for   i = 0, ..., m.

This algorithm has a complexity of 5mn + O(m), which could be reduced to 3mn + O(m) if the matrix QΣ were computed instead of Q, as a normalization multiplication would be saved. The drawback of such a choice is the fact that the variables are no longer scaled, which prevents an implementation in fixed point arithmetic. This can be a problem if the algorithm is implemented on special hardware. This algorithm is actually a specialization of the algorithm derived by Cybenko [13]. Here the columns are related by the diagonal unitary matrix D containing the values on the unit circle. If we premultiply the recursion (7.8) for q_k by V*, we get

b_k σ_k² = V* q_k σ_k = σ_{k−1} (V* D q_{k−1} + K_k V* q̂_{k−1}).   (7.12)

If we use

V* D = Z V* + [1, 0, ..., 0]ᵀ [e^{jθ_0}, ..., e^{jθ_m}]

and define b̂_k σ_k = V* q̂_k, we have the following recursion for b_k:

b_k = (σ_{k−1}²/σ_k²) (Z b_{k−1} + K_k b̂_{k−1}) + (ν_k/σ_k²) e_0.   (7.13)

Define the vector c_k = b̂_k − e_0; then, since ν_k = −K_k σ_{k−1}², (7.13) is simplified into

b_k = (σ_{k−1}²/σ_k²) (Z b_{k−1} + K_k c_{k−1}).   (7.14)

Similarly we get the update

c_k = (σ_{k−1}²/σ_k²) (c_{k−1} + K̄_k Z b_{k−1}).   (7.15)

We also have b_k(k) = 1, and

K_k = −c̄_{k−1}(k),   (7.16)

so that no inner product is necessary to get the reflection coefficient. A complete algorithm for computing the lower triangular matrix B from the Vandermonde matrix V consists of the equations (7.16), (7.6), and the coupled recursions (7.14) and (7.15), with the initialization

σ_0² = m + 1,
b_0(i) = (1/(m+1)) Σ_{l=0}^{m} e^{−jiθ_l}   for   i = 0, ..., n,
c_0(0) = 0   and   c_0(i) = b_0(i)   for   i = 1, ..., n.

The algorithm has a complexity of mn + 2n² + O(m), and its body is perfectly parallel, as it contains no inner product. Note that in fact the inner products are done in the initialization phase. This algorithm is in fact a Schur type algorithm. A complete QR factorization of the complex row Vandermonde matrix V with unit magnitude modes is then given by:

ALGORITHM H. Given the Vandermonde matrix V in (7.1), its complete QR factorization V = QΣB* may be computed in the following way:

1. Initialization:

   σ_0² = m + 1,
   b_0(i) = (1/(m+1)) Σ_{l=0}^{m} e^{−jiθ_l}   for   i = 0, ..., n,
   c_0(0) = 0   and   c_0(i) = b_0(i)   for   i = 1, ..., n,
   q_0(i) = q̂_0(i) = 1/√(m+1)   for   i = 0, ..., m.

2. Loop on k = 1, ..., n:

   K_k = −c̄_{k−1}(k)   and   σ_k² = σ_{k−1}² (1 − K_k K̄_k),
   b_k = (σ_{k−1}²/σ_k²) (Z b_{k−1} + K_k c_{k−1})   and   c_k = (σ_{k−1}²/σ_k²) (c_{k−1} + K̄_k Z b_{k−1}),
   q_k = (σ_{k−1}/σ_k) (D q_{k−1} + K_k q̂_{k−1})   and   q̂_k = (σ_{k−1}/σ_k) (q̂_{k−1} + K̄_k D q_{k−1}).

The complexity of the algorithm is 7mn + 2n² + O(m), and can be reduced to 5mn + n² + O(m) if QΣ and BΣ² are computed instead of Q and B (different scaling).

VIII. CONCLUSION

In this paper, we have introduced a fast algorithm to compute the QR factors of a complex column Vandermonde matrix. The obtained algorithm has a complexity of 5mn + 7n²/2 + O(m). Such an algorithm may be used to solve overdetermined Vandermonde systems in the least squares sense, as well as in the problem of finding an orthogonal basis spanning the same subspace as the columns of V. The matrices Q and R may be computed independently if desired. Two special cases of the row Vandermonde matrix (with real only elements, or unit magnitude only elements) were also studied, and similar results were obtained. The generalization of these results to confluent Vandermonde matrices is given in [14]. The special case of Vandermonde matrices with complex conjugate pairs of modes (which arise in modeling real data) is also treated in [14].

It is a pleasure to thank Professor L. L. Scharf and German Feyh for their helpful comments.

REFERENCES

1. Baron R. de Prony (Gaspard Riche), "Essai expérimental et analytique: Sur les lois de la dilatabilité de fluides élastiques et sur celles de la force expansive de la vapeur de l'eau et de la vapeur de l'alcool, à différentes températures," J. École Polytechnique 1bis (1):24-76 (1795).
2. F. B. Hildebrand, Introduction to Numerical Analysis, McGraw-Hill, New York, 1956, p. 379.
3. R. O. Schmidt, Multiple emitter location and signal parameter estimation, in Proceedings of the RADC Spectrum Estimation Workshop, Griffiss AFB, N.Y., 1979, pp. 243-258; IEEE Trans. Antennas and Propagation 34:276-280 (Mar. 1986).
4. D. Tufts and R. Kumaresan, Frequency estimation of multiple sinusoids: Making linear prediction perform like maximum likelihood, Proc. IEEE 70:975-989 (Sept. 1982).
5. R. Roy, A. Paulraj, and T. Kailath, ESPRIT—a subspace rotation approach to estimation of parameters of cisoids in noise, IEEE Trans. Acoust. Speech Signal Process. 34(5):1340-1342 (Oct. 1986).
6. A. Björck and V. Pereyra, Solution of Vandermonde systems of equations, Math. Comp. 24, No. 112 (Oct. 1970).
7. G. Heinig and K. Rost, Algebraic Methods for Toeplitz-like Matrices and Operators, Birkhäuser, Berlin, 1984.
8. I. Gohberg, T. Kailath, and I. Koltracht, Efficient solution of linear systems of equations with recursive structure, Linear Algebra Appl. 80:81-113 (1986).
9. I. Gohberg, T. Kailath, I. Koltracht, and P. Lancaster, Linear complexity parallel algorithms for linear systems of equations with recursive structure, Linear Algebra Appl. 88:271-315 (1987).
10. C. Lanczos, Solution of systems of linear equations by minimized iterations, J. Res. Nat. Bur. Standards 49 (1952).
11. N. Levinson, Appendix, in N. Wiener, Extrapolation, Interpolation and Smoothing of Stationary Time Series, Wiley, New York, 1949.
12. G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins U.P., Baltimore, 1983.
13. G. Cybenko, A general orthogonalization technique with applications to time series analysis and signal processing, Math. Comp. 40(161):323-336 (Jan. 1983).
14. C. Demeure, Fast Algorithms for Linear Prediction and Modal Analysis, Ph.D. Dissertation, ECE Dept., Univ. of Colorado, Boulder, 1988.

Received 27 May 1988; final manuscript accepted 31 October 1988