Copyright © IFAC Adaptive Systems in Control and Signal Processing, Grenoble, France, 1992
CONSISTENCY AND MINIMALITY FOR THE PREWINDOWED PREDICTION PROBLEM P.A. Regalia Departement Electronique et Communications,lnstitut National des Telecommunications, 9 rue Charles Fourier, 91011 Evry Cedex, France
Abstract: The issues of consistency and minimal parametrization of the prewindowed prediction problem are presented in detail. The paper establishes an equivalence class between the set of p x p symmetric, positive definite matrices P and the set of 2 x I stable causal allpass functions 'llp(z) or McMillan degree p. The minimal parametrization of such matrices P is obtained by parametrizing 'llp(z), resulting in an inherently consistent parametrization . The application to numerically stable fast least-squares filtering is highlighted.
Keywords: Least-squares estimation; stability of numerical methods; adaptive systems; prediction theory; recursive algorithms. 11 Background
I Introduction
Let X(n) be a p-column data matrix, defined recursively via
This paper presents a tutorial but theoretically detailed account of consistency and minimality questions as they relate to the numerical stability problem of fast least-squares prediction algorithms. Recent works [1]-[3] show that stable error propagation in this algorithm class is ensured provided the set of numerically reachable state variables of the algorithm is contained within the set of theoretically reachable state variables, a property henceforth called consistency . The numerical instability of certain fast algorithms thus derives from the violation of consistency properties, and hence it is essential to understand the consistency properties of the prewindowed prediction problem which is central to the development of all fast least-squares algorithms.
(1)
where the p-element row vector x'(n) contains the p most recent data samples, and 0 « A < I. A key quantity propagated in recursive least-squares algorithms is pen) = [X'(n) X(nWI . Considering the time recursive structure of (1), one obtains the familiar recursion pen) = A-I [Ip - c(n) x'(n)] P(n-I)
(2a)
with the Kalman gain vector c(n) given by c(n ) =
The paper details how the entire information of the p,h-order prewindowed prediction problem may be compressed into 2p + I independent parameters [1][3]. More precisely, an equivalence class is established between the set of p x p positive definite matrices P of displacement rank three, and the set of 2x I stable, causal all-pass functions 'llp(z) of McMillan degree p. The all-pass function 'llp(z) plus one scale factor thus completely parametrize the prediction problem such that, for any perturbation of the filter parameters proper to 'llp(Z), there corresponds a perturbed covariance matrix of displacement rank three which remains symmetric and positive definite. This leads to a minimal parametrization. The importance of these results in the development of numerically stable fast least-squares algorithms is highlighted.
P(n-I)x(n)
.,---...:.....--.:..~...:.....-
(2b)
A + x'(n) P(n-I) x(n)
The error propagation question--of paramount practical importance-is obtained by perturbing P(no) to obtain P(no) = P(no) + dP(no) for some time index no and comparing the related system c(n) =
P(n-~)x(n)
A+ x'(n) P(n-I)x(n) pen) = A-I [Ip - c(n)x'(n)] P(n-I)
}
rt>no
(3)
to its primal counterpart (2), assuming the same driving sequence x(n) for n > no. If there exists a norm 11·11 for which IIP(n)- P(n)11 ~ 0 exponentially fast as n ~ 00, the system (2) is exponentially stable with respect to the perturbation dP(no) in question . A standard result is rephrased as follows: 471
°
Theorem 1. Suppose the input vector sequence xC)
with x( ·) some scalar sequence with x(n) == for all n < 0. Then X(n) in (I) takes an exponentially weighted prewindowed Toeplitz structure. The matrix Pen) is then theoretically of displacement rank three:
is persistently spanning. Then (2) is exponentially stable with respect to any perturbation/or which the remains symmetric and positive defiresulting nite.
PO
n
An-kx(k) x/(k)
k="O+1 n
L
[-bp ~p
I]+A[O][O cp
cp ] Yp
Since X(n-I) also has a prewindowed Toeplitz structure, an analogous decomposition applies to P(n-I) such that, instead of propagating PO according to (2), one may propagate the three vector dyads appearing in (5), equivalent to propagating a prediction problem. The propagation of such vectors is algebraically equivalent to the propagation of P in (2) among the class of matrices P of displacement rank three from (5). Hence, any perturbation of the displacement structure which does not destroy the symmetric, positive definite character of P will decay as time progresses, by Theorem I. The instability of certain fast algorithms thus must derive from the violation of consistency requirements [I], which we study next.
future sequence x(no + I), x(no + 2), . .. , then we find, for all n > no,
p-I (n) = X'-"OX/(no) X(no) +
Pen)
(5) where ap contains the coefficients of a p-th order forward predictor, bp the coefficients of a p-th order backward predictor, with cp the gain vector of (2b). The positive constants CJ.p , ~P' and yp are, respectively, the forward prediction error energy, the backward prediction error energy, and the likelihood variable, for the p-th order prediction problem.
If we now drive systems (2) and (3) with the same
L
0
p] [_Iap ][1 CJ.-ap]_[-b I p
Algebraic proofs of Theorem I are widely available [4], [5]; we present here a conceptual counterpart which is closer in spirit to the notion of backwards stabi!!ty in numerical analysis . If the perturbed matrix P(no) remains symmetric and positive_definite, then there exists a perturbed data matrix X(no) for which (4)
p-I (n) = X'-"OXrcno) X(no) +
°0] -A [0 0/] =
Pen) [ 0/
The perturbations arise from arithmetic errors, and the exponential qualifier on the stability is necessary to ensure that the composite inftuence of errors incurred at each iteration remains bounded. Note also that Theorem I applies to any recursion which is algebraically equivalent to (2) .
X'-kx(k) x/(k)
k= "0+1
so that
III Consistency in Fast Algorithms The symmetric, positive definite character of P may be rephrased as algebraic properties of the displacement structure (5) using Theorems 2 and 3 below, whose proofs are included for completeness. For convenience, introduce three polynomials via
which tends to zero exponentially fast as n ~ 00. Simply restated , it is known that, for persistently spanning data xC), past inputs x have an exponentially decaying inftuence on the future solution P, owing to exponential data weighting. Hence any error that may be rephrased as a past data perturbation [cL (4)] must likewise have an exponentially decaying inftuence. This is to say that any perturbation which results in the exact solution to a perturbed data set will decay as time progresses, owing to the forgetting factor A. Hence consistency of the computed solution forms a meaningful sufficiency criterion for stable error propagation. It is known [5] that a numerical ~mplementation of (2) may give rise to a computed PC) which loses symmetry and/or positi ve definiteness (violating consistency), in which case stability of the time recursion (2) is no longer guaranteed.
[1 zJ A Ap(z) =
. ..
(zJ
CJ.~f2(n)
AY] [
I
]
-ap(n)
B (z) = [I zJA ... (zJAY] [-bp(n)] ~~f2(n) I p
C
_
p(Z) -
t.! f2 [1
zJ
A ... y~f2(n)
(zJ
A)P] [
(6)
°]
-cp(n)
Theorem 2. Given Ap(z), Bp(z) and Cp(Z) in (6), there exists a P(n)fuljilling (5) if and only if Bp(z) Bp (Z-I ) = Ap(z) Ap([1 ) + Cp(Z) Cp(Z-I).
(7)
Proof" For the "only if' clause, suppose (5) holds . Pre- and post-multiply the equation by [1 zJA .. . (zJAY] and [1 w/A ·· · (w/AYY, with Z
The same concepts apply to fast algorithms, which derive from special structural constraints occuring in the data matrix . Specifically, suppose
and w two complex variables, to obtain (1 - zw) P(z, w) =
x/en) = [x(n) x(n-I) .. . x(n-p+2) x(n-p+I)], for all n,
Ap(z)Ap(w) - Bp(z) Bp(w) + Cp(z) Cp(w)
472
(8)
where
Separating positive definiteness from positive semidefiniteness is more subtle, and will be inferred later.
w/~ 1:
P(z, w) = [1 z/~ ... (z/~rl] Pen)
[
1
(w/~rl (9)
Equation (7) now results upon setting w =
Z-I .
For the "if' clause, consider the two-variable polynomial
of degree p in both z and w. If (7) holds, then (10) con tains a factor (1 - zw), so that Ap(z)Ap(w) - Bp(z) Bp(w) + Cp(z) Cp(w)
1-zw
(11)
The key point of [1] is to recognize that the instability problems of fast transversal filter algorithms derive from the violation of the above consistency properties. Theorem 2 shows the three polynomials A, B, and C to be redundant; independent arithmetic errors in Ap(z), Bp(z) and Cp(z) will lead to violation of (7) . In this case, there no longer exists any matrix Pen) fulfilling (5), such that propagation of the quantities on the right-hand side of (5) has no formal connection with the propagation of a covariance matrix, which is to say that consistency is lost. Only in this case can unstable propagation result. This instability problem may theoretically be resolved by reducing the propagated quantities to a minimal parameter set [1]. To this end, consider the 2 x 1 rational vector
is a two-variable polynomial of degree p - 1 in both w. It may thus be written in the form (9) for some coefficient matrix Pen) . Moreoever, since (11) is a symmetric function of z and w, the matrix P is symmetric: Pen) = pt(n). 0
z and
(15)
Theorem 2 is satisfied if and only if T]p(z) is all-pass:
The "only if' clause and its proof are due to Slock [3]; the "if' clause is a modest extension . We next characterize the positive definite property:
while (12) from Theorem 3 may be rearranged as
Theorem 3. Suppose (7) holds. If Pen) is positive definite, then Izl < 1; Izl = 1; Izl> I.
lIT]p(z)11
If P is positive definite, then P(z, z·) > 0 for all z, so that the right-hand side of (13) must have the same sign as (I -lzI 2 ), which gives (12).
Conversely, if (12) holds, then
Z, z
=
2
IA(z)1 -IB(z)12 + lC(z)1 1 _ Izl2
2
0 > ,
< 1,
Izl < 1; Izl = 1; Izl> 1.
(16)
These observations allow us to infer an equivalence class between the set of stable 2x 1 all-pass functions of formal degree p, and the set of p x p symmetric positive-definite matrices of displacement rank three (cf. Theorem 4 below). Hence a minimal parametrization for the prewindowed prediction problem may be derived by parametrizing the all-pass function T]p(z) . All-pass parametrization is a classical problem of network synthesis, and explicit solutions are fully known [6], [7] . We derive now one such description via lattice parameters, and show its equivalence to the Schur parameter description of [6].
Proof' Set now w = z' (complex conjugate) in (8) to obtain
')
{
= 1,
One may show [7] that (16) holds if and only if T]p(z) has all its poles inside the unit circle of the z-plane, i.e., if and only if B(z) is minimum-phase.
(12)
Conversely. if (12) holds, then P is at least positive semi-definite.
P(
> 1,
2
Section 8.6 of Bellanger [8] gives formulas for converting the parameters of an unnormalized lattice least-squares filter to the forward and backward predictors plus the Kalman gain vectors of orders zero through p. With the subscript i now denoting the order index, we have for the forward predictor [8]
Vlzl ~ I. (14)
On the unit circle, z· = Z-I so that (14) becomes a polynomial function of z = eiO) which, by continuity arguments applied to (14), is nonnegati ve:
[ -.:(n)
This may be interpreted as a polynomial spectral density function to show that Pen) in (9) is at least positive semi-definite. 0
1 -.,~ = [
(n) ] - 'b,(n) [
- 'b,,(n) ",;_, (n) [
473
-b~
c'_~(n) ]
(n) ]
(17)
where eb,i(n) is the eh-order a priori backward prediction error residual, and
dividing the equation by p!l2(n) = p!!i(n-I) cos i(n), and finally using identity (19) once more leads to
k . n _ Lli(n) b,l( ) - Pi(n-I)
_1_ [-bi(n)] = sin i(n) _1_ [-ai\(n)] p!l2(n) I cos i(n) a!!i(n)
'0
with Lli(n) a cross-correlation between the forward and backward (i-Iyt-order prediction residuals. The least-squares reflection coefficient - sin (Mn) may be obtained as [8], [9]
I I ).J12 [ -b - (n) -1-12i 1 cos i(n) cos 8i-1 (n) Pi-l (n) 1
+
I sin8i-l(n) cos i(n) COs8i-l(n) Yi_l(n)
1~12 [ci_~(n)](21)
.
-
Lli(n) Sill i (n) = ---;-;:1/2:;-(-)-'::A7112;;-(-_-I) a i_1 n I-'i-I n
° ]
+
Last but not least, the recursion [8] for the Kalman gain vector reads
= aI!i(n) cos i(n)
Hence dividing (17) by aI 12 (n) gives
_I [ I ] = I _I a!l2(n) -ai(n) cos i(n) aL~(n)
[-a '0
i II(n)]
where Eb,H(n) = eb,i-l(n)Yi-l(n) is an a posteriori backward prediction residual. Dividing the equation by yJl2(n) = yI!i(n)cos8i-l(n) and again using (19) leads to
1 [bO()] - i-I n
+
sin (Mn) . 112 cos i(n) Pi_l(n-l)
+
sin
1
O()]
°
° ]
'\ 112 [ 0 ] = Sill . 8 i-I ( n) _ 1'1112 _11._ 1 . _ [ -bi I (n) y!l2(n) ci(n) cos 8i- l (n) p!~~(n) "1
(18)
+
One may now define a rotation angle 8i-l (n) via [9]
cos 8i-1 (n) =
'\1/2
(22)
112 Pi-l (n) . 112 eb,i-I (n) Sill 8i-l (n) = Yi-I (n) 112 Pi-l (n)
If we now premultiply equations (20)-(22) by the vector [1 zlV'i. ... (zlV'i.i], and let Ai(z), Bi(Z), and Ci(z) be the eh-order counterparts to (6), we obtain
(19)
Ai(z) Bi(Z)] = [ Ci(Z)
such that (18) becomes
'0
° ] °] °
sin i(n) I ')...112 [ + --w--b i- I (n) cos i(n) cos 8i-1 (n) Pi-l (n) 1
.
.
')...112
~
cos i(n) cos 8i-1 (n) Yi-I (n)
[
I COS'i(n) sin'i(n) [ cos'i(n)
xl'
_1 [ 1 ] = 1 _ I [-ai\(n)] a!l2(n) -ai(n) cos
+
sinMn) cos'i(n) I cos ' i(n)
_z_
sin 9H (n) cos 9i_1 (n) I cos 9 i_ l (n)
cos 9 i - 1 (n) Z sin 9;-1(n) cos 9i_l(n)
1
which begin with the initialization Ao(z) = Bo(z) = lIa612 (n),
Co(Z) = 0,
(24)
Ci-I (n) (20) with ao(n) the exponentially weighted signal energy. Let us show now that (26) below holds for all i. To this end, set
Similarly, starting from the backward predictor recursion [8]
Si(Z) =
[~:~~~] C;(z)
[-bien)1= -",en) [-a,~ en)] + [ -b,;' en) ] + 'bH
~12 [Ci-~(n)] °
1
cos 8 i- 1(n) yJ_I (n)
PL~(n-l)
11.
sin i(n) sin 8i-l (n)
°
so that (23) becomes Si(Z) = Li(Z) SH (Z), where Li(z) is the matrix product from (23) . Note that
en) [,;-~en) 1
Ai(z) Ai(Z-I) - Bi(Z) Bi(Z-I) + C;(z) C;(Z-I)
= S:(Z-I) J Si(Z) where
with J=diag[1,-I,I), so that ka,i(n)
Lli(n)
= -(-) , ai n
S:(Z-I)JSi(Z)= SLI(Z-I)[L:(Z-I)JLi(Z)]SH(Z) (25)
474
Theorem 4. The set of pxp symmetric, positive def-
Direct calculations show that
inite Pen) of displacement rank 3 in (5), and the set of stable 2 x I all-pass vectors IIp(z) in (15), form an equivalence class. such that (25) reduces to
Proof" The set of all-pass functions IIp(z) defined above leads to three polynomials Ap(z), Bp (z), and Cp(Z) which satisfy Theorems 2 and 3, to which one may then associate a Pen) of diplacement rank three which is at least positive semi-definite. The results of [2] show that Pen) is in fact positive definite if and only if l<1>d < re/2, 18d < re12 for all i, which is equivalent to stability of TJp(z) from the above arguments . Conversely, if Pen) in (5) is positive definite, then TJ(z) is necessarily stable by Theorem 3, in which case [6] and [7] show that a lattice synthesis outlined above always exists. 0
Via (24) one sees trivially that ~b J ~o = 0, which gives ~:(Z-l) J ~i(Z) = for all i, or equivalently,
°
Bi(z) Bi(z-l) = Ai(z)Ai([I) + c;(z) Ci(Z-I),
Vz, i. (26)
The physical interpretation is clear: the i'h-order prediction problem is obtained by row-deleting X(n) to i columns; the new data matrix so obtained remains pre-windowed and Toeplitz, and hence (26) must result for all i by Theorem 2.
This equivalence class is the key to numerical stability. For suppose we can find an algorithm which propagates the 2p + I parameters <1>i(n), 8i(n), and CXQ(n) , and suppose that the algorithm can, for all time indices n, numerically guarantee satisfaction of the consistency contraints l<1>i(n)1 < re12, 18 i(n)1 < re/2 for all i and CXQ(n) > 0. Then such an algorithm is algebraically consistent with the propagation of a covariance matrix for which all numerical errors are restricted to the class of perturbations which retain the symmetric, positive definite character of Pen). The time recursions are stable with respect to all such perturbations by Theorem I, which is to say that error propagation is inherently stable for such an algorithm. A recent fast QR algorithm [9] fulfills precisely these properties, and its numerical stability is detailed in [2].
If we now define
lli(z)
I
= Bi(Z)
[Ai(Z)]
C;(Z)
then (26) shows that lli(z) is all-pass for all i:
Hence we may define p+ I all-pass functions 110, ... , IIp(Z), which are related by rearranging (23) into (common time-index n suppressed): Ai(Z) [
C;(z)
zB i-1(Z)
1 [I
1
=
cos 8i-l sin 8i-l - sin 8i-l cos 8i-l COS <1>i sin <1>i [Ai-l (z) Ci-l (z) x [ -sin<1>i cos <1>i Bi(z)
1
1
IV Some Open Questions The fast QR algorithm of [9] programs nicely on a CORDIC machine but, as with many numerically well-behaved algorithms, the equations do not map so nicely onto a multiply-accumulate machine. The widespread availability of multiply-accumulate machines would favor fast transversal algorithms, and issue of their stabilization remains of practical interest [3]. Fast transversal algorithms propagate all three polynomiais Ap(Z), Bp(z), and Cp(z), which Theorem 2 show to be redundant. Via Theorem 3, it is clear that such algorithms need only propagate the normalized forward predictor Ap(z) and the normalized Kalman gain Cp(Z); the backward predictor Bp(z) is uniquely determined as the minimum-phase spectral factor from (7), and thus need not be calculated explicitly. The polynomials Ap(z) and Cp(Z) can always be associated with a consistent parameter set by deliberately choosing Bp(z) to give a stable allpass function TJp(Z) in (15) . This raises the open question [1] of whether there exists a fast transversal algorithm that propagates only the forward prediction and Kalman gain vectors, without explicit recourse
which is sketched in Figure I. This shows that the all-pass functions lli(z) may be understood as a Richards section constrained by TJi-l (z), giving identically the all-pass order recursions of [6], [7]. The lattice synthesis for p = 2 is sketched in Figure 2. For arbitrary p, the parametrization requires 2p rotation angles plus one scale factor CXQ(n) in (24), giving 2p+ I parameters as required [1]-(3]. Finally, the results of [7] show that each TJi(Z) is a stable all-pass function (equivalently, each Bi(Z) is minimum-phase) if and only if IITJ i(oo )11 < I,
(27)
Vi.
Evaluating (23) gives
TJi( (0)
sin <1>i
= [ cos <1>i sin 8i-l
1
Hence, (27) holds provided l<1>d < Ttl2, 18d < re/2 for all i. We summarize as: 475
References
to the backward prediction equations. Such an algorithm would likewise be inherently consistent and thus stable [1].
[1] P. A. Regalia, "Numerical stability issues in fast
least-squares adaptation algorithms," Opt. Engr., vol. 21, July 1992.
Finally, from the equivalence class of Theorem 4, one is led to the following query: Given Pen) of displacement rank three in (5), can one always find an exponentially weighted prewindowed Toeplitz matrix X(n) of finite row dimension which satisfies p-I (n) = [XI(n) X(nW I ?
[2] P. A. Regalia, "Numerical stability properties of a QR-based fast least-squares algorithm," IEEE Trans. Signal Proc., under review. [3] D . T. M. Slock, "An overview of some recent advances in fast RLS algorithms," in: Proc. Int. Wkshp. Parallel Algorithms and Architectures, Pont-a-Mousson, France, June 1990.
Insight to this question is obtained by examining the diplacement rank two case, obtained by setting c(n) = o in (5), and hence Cp(Z) = 0 in (6). One then has Ap(z) Ap(Z-1 ) = Bp(z) Bp (Z-I ),
Vz,
[4] S. Ljung and L. Ljung, "Error propagation properties of recursive least-squares adaptation algorithms," Automatica, vol. 21, pp. 157-167, 1985.
and if Pen) is positive definite, then Theorem 3 shows that Izl < 1; Izl = 1; Izl> 1; which is equivalent to saying that Ap(z)IBp(z) is a stable scalar all-pass function . (Set Si '= 0 in Figs . 1 and 2 to obtain the Gray-Markel lattice). If one drives the stable, causal all-pole function zP IBp(z) with white noi se to obtain the output sequence {y(-)} then
p-I = E{ [
[5] M. Verhaegen, "Round-off error propagation in four generally applicable recursive least-squares estimation schemes," Automatica, vol. 25, pp. 437-444, 1989. [6] S. K. Rao and T. Kailath, "Orthogonal digital filters for VLSI implementation," IEEE Trans. Circuits and Systems, vol. 31, pp. 933-945, Nov. 1984.
yr~~~) 1
[7] P. P. Vaidyanathan and S. K. Mitra, "Discrete version of Richards's theorem and application to cascaded lattice realization of digital transfer matrices," IEEE Trans. Circuits and Systems, vol. 33, pp. 26-34, Jan . 1986.
[Yen) ... y(n-P+l)]}
y(n-p+l) with E{-} the expectation operator. If we set p
-z- = Bp(z)
00
'" ~ h KZ -k
[8] M. G . Bellanger, Adaptive Filters and Signal Analysis, New York: Marcel-Dekker, 1988.
(28)
k= 0
then P = [XI Xr l with ho hI
0 ha
[9] P. A. Regalia and M . G. Bellanger, "On the duality between fast QR methods and lattice methods in least-squares adaptive filtering," IEEE Trans. Signal Proc., vol. 39, pp. 879-891 , April 1991.
0 0
X=
which has an infinite number of rows, because (28) is, by construction, an HR model matching the first p autocorrelation coefficients contained in the top row of p-I. Restricting X to having a finite number of rows is equivalent to finding a finite length filter whose first p autocorrelation coefficients match the top row of p-I . For a given positive definite Toeplitz p-I, such a finite length description mayor may not exist.
Figure 1: Order recursion between all-pass functions Tli(z) and 1'1 i-I (z).
It is expected that a similar problem intervenes in the displacement rank three case: one may always find a prewindowed Toeplitz data matrix X which, as the number of rows tends to infinity, matches arbitrarily closely the constraint P = [XI Xrl, but if X is constrained to have a finite number of rows, the set of "reachable" P will conceivably diminish. A more complete answer is needed, and may be considered an open problem for future work.
Figure 2: Lattice synthesis for p = 2.
476