Automatica, Vol. 26, No. 1, pp. 51-61, 1990. Printed in Great Britain.
0005-1098/90 $3.00 + 0.00. Pergamon Press plc. © 1990 International Federation of Automatic Control.
Consistent Parameter Estimation for Non-causal Autoregressive Models via Higher-order Statistics†

JITENDRA K. TUGNAIT‡
The parameters of non-causal autoregressive models with non-Gaussian system and observation noises can be consistently estimated by matching a small set of second- and third- (or higher-) order cumulants of the data.

Key Words—Parameter estimation; system identification; time-series analysis; (non-causal autoregressive models); (higher-order statistics).

Abstract—The problem of estimating the parameters of a non-causal autoregressive (AR) model given noisy observations of the system output is considered. The system is driven by an i.i.d. (independent and identically distributed) non-Gaussian sequence that is not observed. The measurement noise is additive, i.i.d., and possibly non-Gaussian. A parameter estimator that involves simultaneous matching of data autocorrelations and third-order autocumulants was recently proposed for the estimation of the parameters of non-causal AR models. In the proposed method the maximum cumulant lag used in cumulant matching was specified as "large" but finite. In this paper we give a specific lower bound to the maximum cumulant lag, and we prove the strong consistency of the parameter estimator under the specified choice of lags. Some recently developed fundamental results concerning the recovery of the poles of a causal system from the third-order statistics of the system output play a crucial role in this paper.
† Received 11 October 1988; revised 4 April 1989; received in final form 26 June 1989. The original version of this paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor Y. Sunahara under the direction of Editor P. C. Parks.
‡ Department of Electrical Engineering, 200 Broun Hall, Auburn University, Auburn, AL 36849-5201, U.S.A.

1. INTRODUCTION

ALTHOUGH causal system representation of stochastic signals for the purposes of filtering, smoothing, parameter estimation and system identification is quite common, there are several practical situations where the impulse response of the underlying system is non-causal. Such examples occur, for example, in blurring distortion of images (Andrews and Hunt, 1977; Tekalp et al., 1986; Tekalp and Kaufman, 1988), astronomical data processing (Scargle, 1981a,b), and geophysical signal processing (Hsueh and Mendel, 1985; Wiggins, 1978) (see also Rosenblatt (1985), Nikias and Raghuveer (1987), Mendel (1987), and references therein). Non-causal system representation has also been used to model non-minimum phase causal systems as, for example, in Benveniste et al. (1980) and Benveniste and Goursat (1984), where the problem of blind equalization of communications signals is addressed.

This paper is concerned with the problem of parameter estimation for non-causal (hence, non-minimum phase) autoregressive (AR) models given noisy measurements. An interesting example of a non-causal system model occurs in seismic signal processing (Hsueh and Mendel, 1985; Sheriff and Geldart, 1982). Vibroseis is a widely used seismic source that produces a frequency modulated (chirp) acoustic signal in the earth. Correlation of the received (reflected) signal with a stored copy of the transmitted chirp signal results effectively in a zero-phase non-causal pulse (wavelet, or system impulse response), as is the case, for example, for chirp radar signals (Klauder et al., 1960). For any further processing, one regards this non-causal pulse as the effective source signature. To exploit the standard algorithms for filtering, estimation, and identification, a standard approach to handle the non-causality of the system impulse response is: truncate the impulse response $h(k)$ ($-\infty < k < \infty$) to a finite support $-L_1 \le k \le L_2$, then shift the truncated response by $L_1$ samples to the right of the origin and, finally, use an MA (moving average) model for the shifted impulse response (Abutalib et al., 1977; Murphy and Silverman, 1978). This leads to a state-space model of dimension $L_1 + L_2$, which can be prohibitively large for long impulse responses. Such models have been used for image restoration (deblurring) in Abutalib et al. (1977) and Woods (1981), and for system (blur)
identification in Tekalp et al. (1986), Tekalp and Kaufman (1988), and Murphy and Silverman (1978). From the viewpoint of system identification, the MA modeling of non-causal impulse responses used in Tekalp et al. (1986), Tekalp and Kaufman (1988), and Woods (1981) can be quite inefficient computationally as well as statistically (i.e. it leads to poor accuracy of the impulse response estimate), because the formulation involves a high-order system with a large number of unknown parameters to be estimated. The computational complexity of the MA models for non-causal impulse responses prompted Hsueh and Mendel (1985) to seek alternative models. They have developed low-order, "causal" state-space models for a class of non-causal impulse responses in Hsueh and Mendel (1985), where they have also developed recursive, optimal estimation algorithms for such models. However, they assume that the non-causal impulse response is known.

In this paper we address the problem of parameter estimation for a class of non-causal systems, namely, those modeled by a non-causal AR model, given noisy data. Once the non-causal model has been estimated, one can transform the estimated model into a "causal" state-space model as defined in Hsueh and Mendel (1985) and then use it for estimation purposes. The model is assumed to be driven by an i.i.d. non-Gaussian noise sequence with its third cumulant at zero lag being nonzero. [The approach of this paper extends "trivially" to the case where the system noise has zero odd-order cumulants but non-zero even-order cumulants, in which case one would exploit the fourth- (or higher-) order cumulants of the observations; see, e.g. Tugnait (1987a).] The exact knowledge of the probability density of the driving noise is not required. However, it is essential that the system noise be non-Gaussian, because it is impossible to identify non-minimum phase/non-causal models from Gaussian processes (Benveniste et al., 1980). Fortunately, for several cases of practical interest, the system noise is known to be non-Gaussian. For example, analysis of real data has shown (Walden and Hosken, 1986) that the real reflection coefficient series ("system noise" in seismic deconvolution problems) is indeed non-Gaussian. In the communications channel equalization problem, the system noise is a (modulated) digital communications signal, which is clearly non-Gaussian (Benveniste and Goursat, 1984).

System identification for non-minimum phase or non-causal models may be accomplished by the use of higher-order statistics (Huzii, 1981; Rosenblatt, 1985; Nikias and Raghuveer,
1987; Mendel, 1987; Tugnait, 1987a). (In certain special cases second-order statistics suffice (Tugnait, 1986c).) The area of parametric modeling via cumulant statistics has attracted considerable attention in recent years; for a tutorial and a perspective, see Nikias and Raghuveer (1987) and Mendel (1987), respectively, where further references may be found. Non-parametric approaches to non-minimum phase transfer function estimation may be found in Lii and Rosenblatt (1982), Brillinger (1965), Matsuoka and Ulrych (1984), and Rosenblatt (1985) (and references therein), where essentially an MA model is fitted to the data. In Nikias and Raghuveer (1987), Nikias and Chiang (1987), and Nikias (1988) various approaches to MA model fitting via higher-order statistics have been discussed. These approaches include non-causal AR or non-causal ARMA modeling of non-minimum phase MA models. In this paper we focus on processes generated by non-causal AR models, rather than MA models.

The problem of non-causal AR model identification has been addressed before in Scargle (1981a,b), Donoho (1981), Benveniste et al. (1980), Wiggins (1978), Huzii (1981), Tugnait (1987b), Giannakis (1987, 1988), and Giannakis and Swami (1987). Statistical consistency of the parameter estimators in Scargle (1981a,b), Donoho (1981), Benveniste et al. (1980), Wiggins (1978) and Huzii (1981) has been shown only for the case when the observations are noise-free. Even in the noise-free case, the approach of Huzii (1981) does not yield consistent parameter estimates for an arbitrary AR model. Also, none of the approaches given in the above-mentioned papers addresses the problem of consistent order selection. In this paper we allow measurement noise. We address the order selection problem in a companion paper (Tugnait, 1990).

In Giannakis (1987, 1988) and Giannakis and Swami (1987) the problem of parameter estimation for non-causal ARMA (hence, AR) models has been investigated. The approach of Giannakis (1987) is erroneous; in particular, the recursion (5) in Giannakis (1987) is false for non-causal models. The approach of Giannakis (1988) and Giannakis and Swami (1987) (which is different from that in Giannakis (1987)) is also erroneous. We show in the companion paper (Tugnait, 1990) that the approach of Giannakis (1988) and Giannakis and Swami (1987) cannot yield consistent parameter estimates for a simple (single-pole) causal AR(1) model, which is a special case of general non-causal AR models. Since the parameter estimation approach of these two papers is erroneous, it follows that the
order selection method presented therein is also in error, since it relies on the same arguments that were used to justify the parameter estimation method.

In Tugnait (1987b) two techniques that use both autocorrelations and third-order autocumulants of the noisy observations were presented for the estimation of the parameters of non-causal AR signal-in-noise models. One of them (Algorithm 1 of Tugnait (1987b)) is an extension of Huzii (1981) to include measurement noise, and it is also more general than the approach of Huzii (1981) in that models that are not identifiable via Huzii (1981) are identifiable via Tugnait (1987b). However, both Algorithm 1 of Tugnait (1987b) and Huzii (1981) require "testing" of $2^p$ candidate models for fitting a non-causal AR model with $p$ poles; thus, both of them are impractical for moderate to large values of $p$. Algorithm 2 of Tugnait (1987b), involving simultaneous matching of correlations and third-order cumulants of the data, was shown to be free from this exponentially increasing complexity with system order. Moreover, it was also shown to be more general than Algorithm 1 of Tugnait (1987b) in its ability to capture near all-pass factors in the system model; see Example 4 in Tugnait (1987b). Algorithm 2 of Tugnait (1987b) is the focus of this paper.

In Tugnait (1987b) it was claimed that the proposed method (Algorithm 2 of Tugnait (1987b)) yields strongly consistent parameter estimates provided that the maximum cumulant lag parameters are selected to be "large" but finite. Also, the order selection problem was not addressed. In this paper we provide a specific lower bound to the maximum cumulant lags used in cumulant matching. In a companion paper (Tugnait, 1990), we propose and analyze a strongly consistent order selection method that does not require testing of $2^p$ candidate models as in Algorithm 1 of Tugnait (1987b). Both of the above results exploit some recent results on pole recovery from output cumulants of causal ARMA models (Tugnait, 1988b, 1989).

The paper is organized as follows. In Section 2 we give a precise statement of the problem under consideration. In Section 3 the identification criterion of Algorithm 2 of Tugnait (1987b) is briefly reviewed. Strong consistency of the parameter estimator assuming complete knowledge of the system order is proved in Section 4. Convergence of the parameter estimator under overparametrization is analyzed in Section 5. Finally, a summary of the results is presented in Section 6, and some details are given in the Appendix.
2. MODEL ASSUMPTIONS
Let $y_s(t)$ denote a $p$th-order stationary scalar AR signal given by

$$\sum_{i=0}^{p} a_i y_s(t - i) = w(t) \qquad (1)$$

where $t$ is an integer (discrete time), $a_0 \neq 0$ and $a_p \neq 0$. The signal $y_s(t)$ is observed in additive noise as

$$y(t) = y_s(t) + v(t). \qquad (2)$$
The following conditions are assumed to hold.

(H1) $A(z) = \sum_{i=0}^{p} a_i z^{-i} \neq 0$ for $|z| = 1$, where $z$ is a complex variable.
(H2) The random sequence $\{w(t)\}$ is zero-mean, i.i.d. and non-Gaussian with $E\{w^2(t)\} =: \sigma_w^2 \neq 0$, $E\{w^3(t)\} =: \gamma_{3w} \neq 0$, and $E\{w^6(t)\} < \infty$.
(H3) The noise sequence $\{v(t)\}$ is zero-mean, i.i.d., and also independent of $\{w(t)\}$. Moreover, we have $E\{v^2(t)\} =: \sigma_v^2$, $E\{v^3(t)\} =: \gamma_{3v}$, and $E\{v^6(t)\} < \infty$.

As in Tugnait (1987b) we may factorize the polynomial $A(z)$ into two polynomials, one having all its roots inside the unit circle, the other with its roots outside the unit circle. That is, following Tugnait (1987b) we have

$$A(z) = d(z)\,A_1(z)\,A_2(z) \qquad (3)$$
where

$$A_1(z) = 1 + \sum_{i=1}^{p_1} a_{1i} z^{-i} \neq 0 \quad \text{for } |z| \ge 1 \qquad (4)$$

$$A_2(z) = 1 + \sum_{i=1}^{p_2} a_{2i} z^{i} \neq 0 \quad \text{for } |z| \le 1 \qquad (5)$$

$$d(z) = d_0 z^{-p_2} \quad \text{for some real } d_0 \qquad (6)$$

$$p = p_1 + p_2. \qquad (7)$$
Thus, there are $p_1$ system poles ("causal poles") that lie inside the unit circle, and $p_2$ poles ("anticausal poles") that lie outside the unit circle. In the sequel, we will take $d_0 := 1$ without any loss of generality by "absorbing" it in $\{w(t)\}$. Instead of parametrizing (1) in terms of the $a_i$'s, we shall do so in terms of the parameters $a_{1i}$'s and $a_{2i}$'s.
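As an illustration of the model (1)-(7) and of its causal/anticausal factorization, the following minimal Python sketch (not part of the original paper; the coefficients, noise law, and sample size are illustrative assumptions) simulates a non-causal AR signal by passing skewed i.i.d. noise through the causal factor $1/A_1(z)$ and then through the anticausal factor $1/A_2(z)$ via time-reversed filtering, and adds measurement noise as in (2).

```python
# Illustrative simulation of the non-causal AR model (1)-(2); all numerical
# values are assumptions for demonstration, not values used in the paper.
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
N = 4096
a1 = [1.0, -0.5]   # A1(z) = 1 - 0.5 z^{-1}: root inside the unit circle (causal pole)
a2 = [1.0, 0.4]    # A2(z) = 1 + 0.4 z: root outside the unit circle (anticausal pole)
sigma_v = 0.1

# Zero-mean exponential noise is skewed, so gamma_3w != 0 as required by (H2).
w = rng.exponential(1.0, N) - 1.0

# Causal factor: u(t) = -a11*u(t-1) + w(t).
u = lfilter([1.0], a1, w)

# Anticausal factor: run the recursion backwards in time (the pure delay d(z) is ignored).
ys = lfilter([1.0], a2, u[::-1])[::-1]

# Noisy observations, equation (2); Gaussian v(t) gives gamma_3v = 0.
y = ys + sigma_v * rng.standard_normal(N)
```

Filtering the time-reversed sequence with denominator $[1, a_{21}, \ldots, a_{2p_2}]$ and reversing back realizes the stable anticausal recursion implied by (5).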
Define vectors of the unknown parameters in the system model as

$$\theta(p_1, p_2) := (a_{11}, a_{12}, \ldots, a_{1p_1}, a_{21}, a_{22}, \ldots, a_{2p_2}, \sigma_w^2, \sigma_v^2) \qquad (8)$$

$$\psi(p_1, p_2) := (\theta(p_1, p_2), \gamma_{3w}, \gamma_{3v}) \qquad (9)$$

where $\theta(p_1, p_2)$ has dimension $p + 2$ and $\psi(p_1, p_2)$ is of dimension $p + 4$. Let $p_0$, $p_{10}$, $p_{20}$, $\theta_0(p_{10}, p_{20})$, and $\psi_0(p_{10}, p_{20})$ denote the true
values of the respective parameters/vectors. In the sequel, we also use the abbreviations $\theta$ for $\theta(p_1, p_2)$, $\theta_0$ for $\theta_0(p_{10}, p_{20})$, $\psi$ for $\psi(p_1, p_2)$, etc. Finally, define the following sets:

$$\Theta := \{\theta : \text{(H1) holds}\}, \qquad \psi \in \Psi := \Theta \times R^2$$

where $R$ = real line. The objective is to estimate $\theta(p_1, p_2)$ given the noisy data $Y_N := \{y(t),\ 1 \le t \le N\}$. We will use the second- and the third-order statistics of the observations to estimate the desired parameters. Therefore, before we conclude this section, we introduce some notation and definitions. For a zero-mean stationary process $\{y(t)\}$ the third-order cumulant function may be defined as (Rosenblatt, 1985; Brillinger, 1965)

$$C_3(t_1, t_2) = E\{y(t)y(t + t_1)y(t + t_2)\}. \qquad (10)$$

For a Gaussian random variable the $k$th-order cumulant $\gamma_k$ vanishes for $k > 2$. For a Gaussian time series, the (joint) $k$th-order cumulant function vanishes for $k > 2$. It can be shown (Rosenblatt, 1985) that for the model (1)

$$C_{3s}(t_1, t_2) := E\{y_s(t)y_s(t + t_1)y_s(t + t_2)\} = \gamma_{3w} \sum_{i=-\infty}^{\infty} h(i)h(i + t_1)h(i + t_2) \qquad (11)$$

where $\{h(i),\ -\infty < i < \infty\}$ is the impulse response of (1). Also, note the identities (Rosenblatt, 1985; Tugnait, 1987b)

$$C_3(t_1, t_2) = C_3(t_2, t_1) = C_3(-t_1, t_2 - t_1) = C_3(-t_2, t_1 - t_2). \qquad (12)$$

Finally, for the model (1) and (2), we have

$$C_3(t_1, t_2) = C_{3s}(t_1, t_2) + \gamma_{3v}\,\delta_{t_1,0}\,\delta_{t_2,0} \qquad (13)$$

where $\delta_{t,0}$ is the Kronecker delta function.

3. IDENTIFICATION CRITERION

In Tugnait (1987b) two techniques that use both autocorrelations and third-order autocumulants of the noisy observations were presented for the estimation of the parameters of a non-causal AR signal model. In Tugnait (1987b) it was demonstrated via a numerical example that one of them (Algorithm 2 of Tugnait (1987b), involving simultaneous matching of second- and third-order statistics of the data) was more general than the other (Algorithm 1 of Tugnait (1987b), involving a two-step approach); see Example 4 in Tugnait (1987b). Therefore, in this paper, we will concentrate on Algorithm 2 of Tugnait (1987b). The main purpose of this section is to briefly review Algorithm 2 of Tugnait (1987b) and to introduce some notation used in the sequel.

3.1. Criterion

Following Tugnait (1987b) we select $\psi(p_1, p_2)$ to minimize the following cost function subject to (1), (2) and (H1):

$$J_N(\psi) = J_{1N}(\psi) + \lambda J_{2N}(\psi) \qquad (14)$$

where $\lambda > 0$ is a scalar and

$$J_{1N}(\psi) = 0.5 \sum_{\tau=-L_1}^{0} [R_2(\tau \mid \theta) - \hat R_2(\tau)]^2 \qquad (15)$$

$$J_{2N}(\psi) = 0.5 \sum_{t_1=-L_2}^{0} \sum_{t_2=t_1}^{0} [C_3(t_1, t_2 \mid \psi) - \hat C_3(t_1, t_2)]^2 \qquad (16)$$

where

$$R_2(\tau \mid \theta) = E\{y(t)y(t + \tau) \mid \theta\} \qquad (17)$$

$$\hat R_2(\tau) = (1/N) \sum_{t=1-\tau}^{N} y(t)y(t + \tau) \quad (\tau \le 0) \qquad (18)$$

$$C_3(t_1, t_2 \mid \psi) = E\{y(t)y(t + t_1)y(t + t_2) \mid \psi\} \qquad (19)$$

$$\hat C_3(t_1, t_2) = (1/N) \sum_{t=N_1}^{N_2} y(t)y(t + t_1)y(t + t_2) \qquad (20)$$

$$N_1 = \max(0, -t_1, -t_2), \qquad N_2 = \min(N, N - t_1, N - t_2). \qquad (21)$$

Note that $\hat R_2(\tau)$ and $\hat C_3(t_1, t_2)$ are the sampled autocorrelation and third-order autocumulant functions, respectively, of the given observations. We show in the next section that it is sufficient to take $L_1 \ge 2p$ and $L_2 \ge L_0$ (where $L_0$ is given by (23)) to obtain strongly consistent parameter estimates. The positive scalar $\lambda$ acts as a weight that maintains a balance between the contributions of the second- and third-order cumulants. As shown in the sequel, it has no influence on the consistency of the parameter estimates (so long as $\lambda > 0$). However, it would appear to influence the accuracy of the estimates, a topic outside the scope of this paper. In the sequel (as in Tugnait (1987b)) we will choose $\lambda$ as follows:

$$\lambda = \lambda_0 \left[ \sum_{\tau=-L_1}^{0} [\hat R_2(\tau)]^2 \right] \Big/ \left[ \sum_{t_1=-L_2}^{0} \sum_{t_2=t_1}^{0} [\hat C_3(t_1, t_2)]^2 \right] \qquad (22)$$

with the "nominal" value of $\lambda_0 = 1$. The above choice is motivated by the desire to equally penalize errors in matching the estimated (sampled) correlation and cumulant functions, respectively.
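For concreteness, the sampled statistics (18) and (20)-(21) and the weight (22) can be sketched in Python as follows. This is a minimal sketch under the index conventions reconstructed above; the function names are our own, and `y` is assumed to be a NumPy array holding $y(1), \ldots, y(N)$ with 0-based indexing.

```python
# Sample second- and third-order statistics used in the cost (14)-(16).
import numpy as np

def r2_hat(y, tau):
    """Sampled autocorrelation R2_hat(tau) of (18), for tau <= 0."""
    N = len(y)
    t = np.arange(-tau, N)                    # indices with both y[t] and y[t+tau] in range
    return np.dot(y[t], y[t + tau]) / N

def c3_hat(y, t1, t2):
    """Sampled third-order cumulant C3_hat(t1, t2) of (20)-(21)."""
    N = len(y)
    n1 = max(0, -t1, -t2)
    n2 = min(N, N - t1, N - t2)
    t = np.arange(n1, n2)
    return np.dot(y[t] * y[t + t1], y[t + t2]) / N

def weight_lambda(y, L1, L2, lambda0=1.0):
    """Weight (22): balances correlation and cumulant matching errors."""
    num = sum(r2_hat(y, tau) ** 2 for tau in range(-L1, 1))
    den = sum(c3_hat(y, t1, t2) ** 2
              for t1 in range(-L2, 1) for t2 in range(t1, 1))
    return lambda0 * num / den
```

With these helpers, $J_{1N}$ and $J_{2N}$ of (15)-(16) are simply sums of squared differences between the model quantities and the values returned here.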
We now turn to the computational aspects.

3.2. Optimization

We refer the reader to Tugnait (1987b) for details. Briefly, we have used standard software packages for gradient-type minimization of (14) under (H1). Efficient procedures are available for computation of $J_{1N}(\psi)$ and its gradients w.r.t. each element of $\psi$ (Tugnait, 1987b). However, computation of $J_{2N}(\psi)$ and its gradients is done via a truncated infinite-sum approximation by use of (11). We should note that the software packages referred to in Tugnait (1987b) also provide subroutines that compute gradients via forward differencing. We have had very good experience with NL2SOL (Dennis and Schnabel, 1983) in this respect. A Kronecker product formulation for closed-form computation of output cumulants for causal AR/ARMA models is available in Giannakis (1988). It is not yet clear whether the method of Giannakis (1988) extends to non-causal models.

4. CONSISTENCY

It was claimed in Tugnait (1987b) that for $L_1 \ge 2p$, there exists a finite $L_0$ such that the parameter estimate obtained by minimization of (14) is strongly consistent for $L_2 \ge L_0$. We show in this section that $L_0 = 2p + 2[\min(p_1, p_2)]$ suffices, under the assumption that the system order is known a priori. Recall that for a non-causal AR model, knowledge of the system order requires specification of two integers: the numbers of system poles lying inside and outside the unit circle, respectively. In the next section we address the convergence issues for the case of system overparametrization, and in a companion paper (Tugnait, 1990) we propose a consistent order determination procedure.

We first state the main result. Before we can prove it, we need to develop several auxiliary results, some of which are of independent interest. The lower bounds to the maximum lag parameters given below are only sufficient. It is not known if they are also necessary.

Theorem 1. Consider the algorithm discussed in Section 3. Assume that $p_1 = p_{10}$, $p_2 = p_{20}$ (so that $p = p_0$), $\lambda > 0$, $L_1 \ge 2p$ and $L_2 \ge L_0$ where $L_0$ satisfies (23):

$$L_0 = \begin{cases} 2p + 2[\min(p_1, p_2)] & \text{if } \gamma_{3v} \neq 0 \\ p + 2[\min(p_1, p_2)] & \text{if } \gamma_{3v} = 0. \end{cases} \qquad (23)$$

Assume that the assumptions (H1)-(H3) hold for $\psi = \psi_0(p_{10}, p_{20}) = \psi_0$. Let $\hat\psi_N$ denote the estimate obtained by the minimization of (14) w.r.t. $\psi$ for $\psi \in \Psi_c \subset \Psi$ where $\Psi_c$ is compact, i.e., given the data $Y_N$,

$$J_N(\hat\psi_N) = \inf_{\psi \in \Psi_c} J_N(\psi).$$

Then $\lim_{N \to \infty} \hat\psi_N = \psi_0$ w.p.1 if $\psi_0 \in \Psi_c$.
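The truncated-sum evaluation of the model cumulants (11), (13) mentioned in Section 3.2 can be sketched as follows. This is our own illustrative code, not the author's; the truncation length `K` and the coefficient layout follow the simulation sketch above, and the approximation assumes $|t_1|, |t_2| \le K/2$.

```python
# Model third-order cumulant C3(t1, t2 | psi) via (11) truncated, plus the
# measurement-noise term of (13).
import numpy as np
from scipy.signal import lfilter

def impulse_response(a1, a2, K=100):
    """Two-sided impulse response h(-K..K) of 1/[A1(z)A2(z)] (pure delay ignored)."""
    delta = np.zeros(2 * K + 1)
    delta[K] = 1.0
    h = lfilter([1.0], a1, delta)            # causal factor 1/A1(z)
    h = lfilter([1.0], a2, h[::-1])[::-1]    # anticausal factor 1/A2(z)
    return h                                 # entry k corresponds to lag k - K

def c3_model(t1, t2, a1, a2, g3w, g3v, K=100):
    """Truncated version of (11) plus the Kronecker-delta term of (13)."""
    h = impulse_response(a1, a2, K)
    i = np.arange(K // 2, len(h) - K // 2)   # indices where i, i+t1, i+t2 all stay in range
    c3s = g3w * np.sum(h[i] * h[i + t1] * h[i + t2])
    return c3s + (g3v if (t1 == 0 and t2 == 0) else 0.0)
```

Replacing $\hat R_2$ and $\hat C_3$ in (15)-(16) by their model counterparts computed this way gives the residuals that a general-purpose least-squares routine (NL2SOL in the paper, or any gradient-type solver) can minimize over $\psi$.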
We will prove Theorem 1 by following a procedure very similar to that used in Tugnait (1987a). We also exploit some facets of the problem that are not present in causal models. From (3)-(7) the signal model (1) can be decomposed into two models in cascade: a causal, stable, minimum-phase AR($p$) model with transfer function $[A_1(z)A_2(z^{-1})]^{-1}$ and an anticausal, stable, all-pass ARMA($p_2$, $p_2$) model with transfer function $A_2(z^{-1})[A_2(z)]^{-1}$. It should be noted that this observation is a well-known one; see, e.g. p. 349, equation (7.26) in Oppenheim and Schafer (1975), where it has been noted that any non-minimum phase transfer function can be decomposed into a minimum-phase model cascaded with an all-pass model. Furthermore, this fact has also been used in the context of higher-order statistics in Huzii (1981), Tugnait (1988a), and Giannakis and Mendel (1986a,b). We also note that the above decomposition of the model (1) is by no means unique. For example, we can also write it as a cascade of two models: an anticausal, stable, maximum-phase AR($p$) model with transfer function $[A_1(z^{-1})A_2(z)]^{-1}$ and a causal, stable, all-pass ARMA($p_1$, $p_1$) model with transfer function $A_1(z^{-1})[A_1(z)]^{-1}$. The above two cases are illustrated in Fig. 1. In the sequel we shall make use of both of these decompositions (Figs 1(c) and 1(d)). Define
$$\bar A(z) = 1 + \sum_{i=1}^{p} \bar a_i z^{-i} \neq 0 \quad \text{for } |z| \ge 1 \qquad (24)$$

such that

$$\sigma_w^2 [A(z)A(z^{-1})]^{-1} = \bar\sigma_w^2 [\bar A(z)\bar A(z^{-1})]^{-1} \qquad (25)$$

for some $\bar\sigma_w^2$. That is, $\bar A(z)$ is minimum phase. Then $\bar H(z) := \bar A^{-1}(z)$ is the transfer function of the signal in the causal stable signal-plus-noise model

$$\bar y_s(t) = -\sum_{i=1}^{p} \bar a_i \bar y_s(t - i) + \bar w(t) \qquad (26)$$

$$y(t) = \bar y_s(t) + v(t) \qquad (27)$$

where $E\{\bar w^2(t)\} = \bar\sigma_w^2$. In other words, (26) and (27) constitute a spectrally equivalent, causal, stable model representation for the process (1) and (2).
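Numerically, the minimum-phase equivalent of (24)-(25) is just the product of $A_1(z)$ with the coefficients of $A_2$ reinterpreted in powers of $z^{-1}$, i.e. $\bar A(z) = A_1(z)A_2(z^{-1})$. A short check (our own sketch, reusing the illustrative coefficients from the earlier simulation):

```python
# The spectrally equivalent minimum-phase denominator Abar(z) = A1(z) A2(z^{-1})
# is the convolution of the two coefficient vectors; all of its roots lie
# inside the unit circle. (Illustrative coefficients, as before.)
import numpy as np

a1 = np.array([1.0, -0.5])      # A1(z) = 1 - 0.5 z^{-1}
a2 = np.array([1.0, 0.4])       # A2(z) = 1 + 0.4 z

abar = np.convolve(a1, a2)      # coefficients of Abar(z) in powers of z^{-1}
assert np.all(np.abs(np.roots(abar)) < 1.0)   # minimum phase
```

These $\bar a_i$ are the coefficients used to form the residual sequence $x(t)$ in (28) below.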
[Fig. 1 (block diagrams, not reproduced). Various equivalent representations of the non-causal autoregressive signal model: (a) non-causal stable; (b) causal stable factor cascaded with anticausal stable factor; (c) minimum-phase model cascaded with anticausal all-pass model; (d) maximum-phase model cascaded with causal all-pass model.]

Now we would like to make use of the minimum-phase/all-pass decomposition discussed earlier in this section. Let us filter out the minimum-phase part of the signal model to form an all-pass signal residual sequence $\{x(t)\}$. Define

$$x(t) := \sum_{i=0}^{p} \bar a_i y(t - i), \qquad \bar a_0 := 1 \qquad (28)$$

$$x(t) = x_s(t) + \sum_{i=0}^{p} \bar a_i v(t - i) \qquad (29)$$

where

$$x_s(t) := \sum_{i=0}^{p} \bar a_i y_s(t - i). \qquad (30)$$

By (1)-(7) and (24) and (25), we obtain an anticausal all-pass ARMA($p_2$, $p_2$) signal model for the process $x_s(t)$ described by the recursion

$$x_s(t) = -\sum_{i=1}^{p_2} a_{2i} x_s(t + i) + \sum_{i=0}^{p_2} a_{2(p_2 - i)} w'(t + i) \qquad (31)$$

where $\{w'(t)\}$ is a scaled version of $\{w(t)\}$. The process $x_s(t)$ is "observed" in colored noise

$$\bar v(t) := \sum_{i=0}^{p} \bar a_i v(t - i) \qquad (32)$$

as specified by (29). It should be noted that analysis of time series via formation of residual sequences is a well-established technique. It has been used in Kay and Marple (1981) and Cadzow (1982) (and references therein) for parametric spectral analysis and in Giannakis (1987, 1988), Giannakis and Mendel (1986a,b), and Giannakis and Swami (1987) for analysis of non-minimum phase systems.

Multiply both sides of (31) by the product $x_s(t + r)x_s(t + r - m)$ and take expectations to obtain (Nikias and Raghuveer, 1987; Mendel, 1987; Tugnait, 1986b, 1987a; Nikias, 1988; Giannakis et al., 1987; Giannakis, 1988; Parzen, 1967)† the scalar recursion

$$C_{3xs}(r - m, r \mid \psi) = -\sum_{i=1}^{p_2} a_{2i}\, C_{3xs}(r - m - i,\ r - i \mid \psi) \quad \text{for } r \ge p_2 + 1 \qquad (33)$$

where

$$C_{3xs}(r - m, r \mid \psi) := E\{x_s(t)x_s(t + r - m)x_s(t + r) \mid \psi\}. \qquad (34)$$

Define $C_{3x}(t_1, t_2)$ in a similar fashion (see also (10)) as

$$C_{3x}(t_1, t_2 \mid \psi) := E\{x(t)x(t + t_1)x(t + t_2) \mid \psi\}. \qquad (35)$$

Then by conditions (H2) and (H3), and (29), we have

$$C_{3x}(t_1, t_2 \mid \psi) = C_{3xs}(t_1, t_2 \mid \psi) \quad \begin{cases} \text{for any } t_1 \text{ and } t_2 & \text{if } \gamma_{3v} = 0 \\ \text{for } \max(t_1, t_2) \ge p + 1 & \text{if } \gamma_{3v} \neq 0. \end{cases}$$

Define a $(p_2 + 1)$-column vector $\bar C_{3x}(r)$‡ as

$$\bar C_{3x}(r) := [C_{3x}(r, r)\ \ C_{3x}(r - 1, r)\ \cdots\ C_{3x}(r - p_2, r)]^T. \qquad (36)$$

Then by (33)-(36) we have the vector recursion

$$\bar C_{3x}(r) = -\sum_{i=1}^{p_2} a_{2i}\, \bar C_{3x}(r - i) \quad \text{for } r \ge l_0 \qquad (37)$$

where

$$l_0 = \begin{cases} p_2 + 1 & \text{if } \gamma_{3v} = 0 \\ p_2 + p + 1 & \text{if } \gamma_{3v} \neq 0. \end{cases} \qquad (38)$$

† Parzen (1967) deals only with causal stable AR models.
‡ To simplify notation, when there is no risk of confusion, we will suppress explicit dependence on parameters such as $\psi$.
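To see the recursion (37) at work numerically, the following sketch (ours, with the same illustrative coefficients as before and Gaussian measurement noise, so that $\gamma_{3v} = 0$ and $l_0 = p_2 + 1$) forms the residual $x(t)$ of (28) and compares both sides of (37) using sample cumulants.

```python
# Empirical check of the residual sequence (28) and the cumulant recursion (37).
# All numerical choices are illustrative assumptions, not taken from the paper.
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
a1, a2, p2 = [1.0, -0.5], [1.0, 0.4], 1
w = rng.exponential(1.0, 200000) - 1.0                    # skewed i.i.d. driving noise; long
ys = lfilter([1.0], a2, lfilter([1.0], a1, w)[::-1])[::-1]  # record tames cumulant noise
y = ys + 0.1 * rng.standard_normal(len(ys))               # Gaussian noise: gamma_3v = 0

abar = np.convolve(a1, a2)                                # minimum-phase coefficients abar_i
x = lfilter(abar, [1.0], y)                               # residual x(t), equation (28)

def c3x(t1, t2):
    """Sample third-order cumulant of the residual (normalization as in (20))."""
    n1, n2 = max(0, -t1, -t2), min(len(x), len(x) - t1, len(x) - t2)
    t = np.arange(n1, n2)
    return np.sum(x[t] * x[t + t1] * x[t + t2]) / len(x)

for r in range(2, 5):                                     # r >= l0 = p2 + 1
    lhs = c3x(r, r)                                       # first entry of the vector (36)
    rhs = -sum(a2[i] * c3x(r - i, r - i) for i in range(1, p2 + 1))
    print(r, round(lhs, 4), round(rhs, 4))                # approximately equal, per (37)
```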
Finally, define an $n_k \times m$ ($n_k = k \times (p_2 + 1)$, $k \ge p_2$) block-Hankel matrix $C(k, m; n)$ as

$$C(k, m; n) = \begin{bmatrix} \bar C_{3x}(n) & \bar C_{3x}(n+1) & \cdots & \bar C_{3x}(n+m-1) \\ \bar C_{3x}(n+1) & \bar C_{3x}(n+2) & \cdots & \bar C_{3x}(n+m) \\ \vdots & \vdots & & \vdots \\ \bar C_{3x}(n+k-1) & \bar C_{3x}(n+k) & \cdots & \bar C_{3x}(n+m+k-2) \end{bmatrix} \qquad (39)$$

i.e., its $ij$th block is $\bar C_{3x}(n + i + j - 2)$.

Lemma 1 has been proved in Tugnait (1989) (see also Tugnait (1988a,b)).

Lemma 1. Assume (H1)-(H3). For any fixed $k$, $m$, $n$, $l$ such that $k \ge p_2$, $m \ge p_2$, $l \ge l_0$, and $n \ge l - p_2$, rank$\{C(k, m; n)\} = p_2$ if and only if (31) is minimal.

Corollary 1. Under (H1)-(H3), rank$\{C(k, m; n)\} = p_2$ if $l \ge l_0$, $m \ge p_2$, $n \ge l - p_2$, and $k \ge p_2$.

Proof. Since (31) is all-pass, it is minimal if and only if $a_{2p_2} \neq 0$. However, this is true by assumption. Hence, the desired result follows from Lemma 1. □

Lemma 2. If $C_{3x}(t_1, t_2 \mid \psi) = C_{3x}(t_1, t_2 \mid \psi_0)$ for $0 \ge t_i \ge -L_2 \le -\bar L$ ($i = 1, 2$), then $a_{2i}(\psi) = a_{2i}(\psi_0)$ for $1 \le i \le p_2$, where

$$\bar L = \begin{cases} p + 2p_2 & \text{if } \gamma_{3v} \neq 0 \\ 2p_2 & \text{if } \gamma_{3v} = 0. \end{cases} \qquad (40)$$

Proof. Take $l = l_0$ and $k = p_2$ in (39). Then the desired result follows from (36), Corollary 1 and the identities (12). □

Lemma 3. Suppose that $\bar a_i(\psi) = \bar a_i(\psi_0)$ ($1 \le i \le p$). If $C_3(t_1, t_2 \mid \psi) = C_3(t_1, t_2 \mid \psi_0)$ for $0 \ge t_i \ge -L_2 \le -L'$ ($i = 1, 2$), then $a_{2i}(\psi) = a_{2i}(\psi_0)$ ($1 \le i \le p_2$), where $L' := \bar L + p$ and $\bar L$ is given by (40).

Proof. By (28) we have (as, e.g., in Tugnait (1987a), proof of Lemma 7)

$$C_{3x}(t_1, t_2 \mid \psi) = \sum_{i=0}^{p} \sum_{j=0}^{p} \sum_{k=0}^{p} \bar a_i(\psi)\bar a_j(\psi)\bar a_k(\psi)\, C_3(i + t_1 - j,\ i + t_2 - k \mid \psi) \qquad (41)$$

where $\bar a_0 := 1$. By (12) it follows then that any $C_3(\cdot, \cdot)$ term on the right-hand side of (41) can be expressed in terms of a member of the set

$$\{C_3(\tau_1, \tau_2 \mid \psi),\ 0 \ge \tau_i \ge -L',\ i = 1, 2\} \qquad (42)$$

if we have $0 \ge t_i \ge -\bar L$ ($i = 1, 2$) in (41). Thus if the hypotheses of the lemma are true, then $C_{3x}(t_1, t_2 \mid \psi) = C_{3x}(t_1, t_2 \mid \psi_0)$ for $0 \ge t_i \ge -\bar L$ ($i = 1, 2$). The desired result then follows from Lemma 2. □

The proofs of Lemmas 4-6 are given in the Appendix. The convergence results for sampled cumulants given in Huzii (1981), Rosenblatt (1985), Nikias and Chiang (1987), Nikias (1988), Giannakis (1987), Giannakis et al. (1987), Giannakis and Mendel (1986a,b), and Brillinger (1965) prove weak consistency, whereas we show strong consistency following Lemma 3 in Tugnait (1987a).

Lemma 4. (a) For any given integer $\tau$, we have

$$\lim_{N \to \infty} \hat R_2(\tau) = R_2(\tau \mid \theta_0) = E\{y(t)y(t + \tau) \mid \theta_0\} \quad \text{w.p.1.}$$

(b) For any given pair of integers $(t_1, t_2)$, we have

$$\lim_{N \to \infty} \hat C_3(t_1, t_2) = C_3(t_1, t_2 \mid \psi_0) = E\{y(t)y(t + t_1)y(t + t_2) \mid \psi_0\} \quad \text{w.p.1.}$$

Lemma 5. Under (H1)-(H3) we have

$$\lim_{N \to \infty} J_N(\psi) = \bar J(\psi) := \bar J_1(\psi) + \lambda \bar J_2(\psi) \qquad (43)$$

w.p.1 uniformly in $\psi$ for $\psi \in \Psi_c \subset \Psi$, where $\Psi_c$ is an arbitrary compact set, $J_N(\psi)$ is given by (14), and

$$\bar J_1(\psi) = 0.5 \sum_{\tau=-L_1}^{0} [R_2(\tau \mid \theta) - R_2(\tau \mid \theta_0)]^2 \qquad (44)$$

$$\bar J_2(\psi) = 0.5 \sum_{t_1=-L_2}^{0} \sum_{t_2=t_1}^{0} [C_3(t_1, t_2 \mid \psi) - C_3(t_1, t_2 \mid \psi_0)]^2. \qquad (45)$$

Lemma 6. The limit $\bar J(\psi)$ defined in (43) is continuous in $\psi$ for $\psi \in \Psi_c$.

Lemma 7. Assume the hypotheses of Theorem 1 with the exception that $L_0$ is replaced with $L'$. Then we have

$$\bar J(\psi) \ge \bar J(\psi_0) = 0$$

with equality if and only if $\psi = \psi_0$.

Proof. It follows trivially that $\bar J(\psi_0) = 0$. It is well known (Mehra, 1971) that for $L_1 \ge 2p_0$, we have
$$\bar J_1(\psi) \neq \bar J_1(\psi_0) \qquad (46)$$

for any $\psi = (\theta, \gamma_{3w}, \gamma_{3v})$ such that $\theta \notin \Theta_0$, where

$$\Theta_0 := \{\theta : S_{2y}(z \mid \theta) = S_{2y}(z \mid \theta_0)\} \qquad (47)$$

and

$$S_{2y}(z \mid \theta) := \sigma_w^2 H(z \mid \theta) H(z^{-1} \mid \theta) + \sigma_v^2, \qquad H(z \mid \theta) := A^{-1}(z \mid \theta).$$

That is, $\bar J_1(\psi)$ can distinguish only among classes of $\theta$ that are spectrally not equivalent. (We also have $\bar J_1(\psi) = \bar J_1(\psi_0)$ for any $\psi = (\theta, \gamma_{3w}, \gamma_{3v})$ such that $\theta \in \Theta_0$.) Thus, we need to confine our attention to the members of $\Theta_0$. We now claim that for every $L_2 \ge L'$, $\psi = (\theta, \gamma_{3w}, \gamma_{3v})$, $\theta \in \Theta_0$, we have $\bar J_2(\psi) > 0$ for $\psi \neq \psi_0$. The proof is by contradiction. If it is not true, then $C_3(t_1, t_2 \mid \psi) = C_3(t_1, t_2 \mid \psi_0)$ for $0 \ge t_i \ge -L_2 \le -L'$ ($i = 1, 2$). Then by Lemma 3, $a_{2i}(\psi) = a_{2i}(\psi_0)$, $1 \le i \le p_2$, and hence (since $\theta \in \Theta_0$) $a_i(\psi) = a_i(\psi_0)$, $1 \le i \le p$. It then follows that $C_3(t_1, t_2 \mid \psi) = C_3(t_1, t_2 \mid \psi_0)$ for $-\infty < t_i < \infty$ ($i = 1, 2$). This implies that

$$S_3(z_1, z_2 \mid \psi) = S_3(z_1, z_2 \mid \psi_0) \qquad (48)$$

where

$$S_3(z_1, z_2 \mid \psi) := \sum_{t_1=-\infty}^{\infty} \sum_{t_2=-\infty}^{\infty} C_3(t_1, t_2 \mid \psi)\, z_1^{-t_1} z_2^{-t_2} = \gamma_{3w}(\psi) H(z_1 \mid \psi) H(z_2 \mid \psi) H([z_1 z_2]^{-1} \mid \psi) + \gamma_{3v}(\psi) \qquad (49)$$

and $H(z \mid \psi) = H(z \mid \theta)$ is the transfer function of the signal model (1) parametrized by $\psi$, hence by $\theta$. Set $z_1 = 0$ and $z_2 = 0$ in (48) and (49) to obtain $\gamma_{3v}(\psi) = \gamma_{3v}(\psi_0)$. Then use this fact in (48) to conclude that $\gamma_{3w}(\psi) = \gamma_{3w}(\psi_0)$ since $a_i(\psi) = a_i(\psi_0)$. That is, we have $\psi = \psi_0$, a contradiction. Combine the above claim with (43), (46) and the facts that $\bar J_i(\psi) \ge 0$ ($i = 1, 2$) for any $\psi \in \Psi$ and that $\bar J(\psi_0) = 0$, to obtain the desired result. □

We are now ready to prove Theorem 1.

Proof of Theorem 1. We will prove the desired result by exploiting some results from Nakajima and Kozin (1979). The following result has been proved in Nakajima and Kozin (1979), which we state by using the notation of this paper instead of that of Nakajima and Kozin (1979). Consider the following four conditions.

(C1) $J_N(\psi)$ is measurable with respect to $P(Y_N \mid \psi_0)$, the probability measure of $Y_N$ given the true parameter $\psi_0$.
(C2) $\lim_{N \to \infty} J_N(\psi) = \bar J(\psi)$ w.p.1 uniformly in $\psi$ for $\psi \in \Psi_c$.
(C3) $\bar J(\psi)$ is continuous in $\psi$ for $\psi \in \Psi_c$.
(C4) $\bar J(\psi) \ge \bar J(\psi_0)$ w.p.1, with equality if and only if $\psi = \psi_0$.

It has been shown in Theorem 1 of Nakajima and Kozin (1979) that if (C1)-(C4) hold, then $\lim_{N \to \infty} \hat\psi_N = \psi_0$ w.p.1. The results of Lemmas 5-7 satisfy the sufficient conditions (C2)-(C4), respectively. Since $J_N(\psi)$ is continuous in the observation vector $Y_N$, it is measurable with respect to $P(Y_N \mid \psi_0)$ (Ash, 1972). Thus (C1) is also satisfied. This yields the consistency result for $L_2 \ge L'$. It remains to show that we need to consider only $L_2 \ge L_0$. This follows by noting that instead of using the decomposition consisting of a causal minimum-phase model cascaded with an anticausal all-pass model to derive the results, we can also use the decomposition involving an anticausal maximum-phase model cascaded with a causal all-pass model; see Fig. 1. Now use this option to carry through all the developments of this section to conclude that everything goes through if $p_2$ is replaced with $p_1$ in $\bar L$ (see (40)). Thus the desired result follows. □

5. CONVERGENCE UNDER OVERPARAMETRIZATION

In the previous section we proved the strong consistency of the parameter estimator of Section 3 under the assumption that the system order (integers $p_{10}$ and $p_{20}$) is known a priori. In this section we analyze the convergence of the parameter estimator under the (relaxed) assumption that $p_1 \ge p_{10}$ and $p_2 \ge p_{20}$. That is, we only assume that the system order used is equal to or greater than the true order. The results are of interest when the model order may be unknown but an upper bound on it is available. For example, via a second-order statistics based method, one may know the value of $p_0$. Then, clearly, $p_{10} \le p_0$ and $p_{20} \le p_0$.

The main result of this section is that under overparametrization, the non-causal AR signal plus noise model is parameter identifiable (Ljung, 1978), i.e. the parameter estimator converges to a unique limit that in our case also turns out to be the true parameter vector. This turns out to be exactly as it is for causal AR models. This is in marked contrast to the case of (causal) ARMA models, where parameter identifiability is
nonexistent under overparametrization (Åström and Söderström, 1974).

Let us first introduce some more notation and clarification of earlier notation. In the previous sections we used $\psi_0$ for $\psi_0(p_{10}, p_{20})$ and similarly for $\theta_0$. In this section, since $p_i \ge p_{i0}$ ($i = 1, 2$), we will explicitly show the dependence of $\psi$ and $\theta$ on $p_1$ and $p_2$. Therefore by $\theta_0(p_1, p_2)$ we mean a vector such as given by (8) such that

$$a_{1i}(\theta_0(p_1, p_2)) = \begin{cases} a_{1i0} = a_{1i}(\theta_0(p_{10}, p_{20})) & \text{for } 1 \le i \le p_{10} \\ 0 & \text{for } p_{10} < i \le p_1 \end{cases} \qquad (50)$$

$$a_{2i}(\theta_0(p_1, p_2)) = \begin{cases} a_{2i0} = a_{2i}(\theta_0(p_{10}, p_{20})) & \text{for } 1 \le i \le p_{20} \\ 0 & \text{for } p_{20} < i \le p_2 \end{cases} \qquad (51)$$

$$\sigma_w^2(\theta_0(p_1, p_2)) = \sigma_w^2(\theta_0(p_{10}, p_{20})) = \sigma_{w0}^2 \qquad (52)$$

$$\sigma_v^2(\theta_0(p_1, p_2)) = \sigma_v^2(\theta_0(p_{10}, p_{20})) = \sigma_{v0}^2. \qquad (53)$$

Similar comments hold true for $\psi_0(p_1, p_2) = (\theta_0(p_1, p_2), \gamma_{3w0}, \gamma_{3v0})$. Also, in (26) with $p \ge p_0$, we have

$$\bar a_i(\psi_0(p_1, p_2)) = \begin{cases} \bar a_{i0} = \bar a_i(\psi_0(p_{10}, p_{20})) & \text{for } 1 \le i \le p_0 \\ 0 & \text{for } p_0 < i \le p. \end{cases} \qquad (54)$$

We first state the main result. Before it can be proved, we need to develop some auxiliary results.

Theorem 2. Consider the algorithm discussed in Section 3. Assume that $p_1 \ge p_{10}$, $p_2 \ge p_{20}$ (so that $p \ge p_0$), $\lambda > 0$, $L_1 \ge 2p$ and $L_2 \ge L_0$ where $L_0$ satisfies (23). Assume that the assumptions (H1)-(H3) hold for $\psi = \psi_0(p_1, p_2) =: \psi_0$. Let $\hat\psi_N$ denote the estimate obtained by the minimization of (14) w.r.t. $\psi$ for $\psi \in \Psi_c \subset \Psi$ where $\Psi_c$ is compact. Then $\lim_{N \to \infty} \hat\psi_N = \psi_0$ w.p.1 if $\psi_0 \in \Psi_c$.

We now turn to the development of the needed auxiliary results.

Lemma 8. Let $\psi(p_1, p_2) = (\theta(p_1, p_2), \gamma_{3w}, \gamma_{3v})$ be such that $\theta(p_1, p_2) \in \Theta_0(p_1, p_2)$, where $\Theta_0(p_1, p_2)$ is as in (47) with obvious modifications. Consider (28) with $\bar a_i = \bar a_i(\psi_0(p_1, p_2))$ (see also (54)). If $C_{3x}(t_1, t_2 \mid \psi(p_1, p_2)) = C_{3x}(t_1, t_2 \mid \psi_0(p_1, p_2))$ for $0 \ge t_i \ge -L_2 \le -\bar L$ ($i = 1, 2$), then $a_{2i}(\psi(p_1, p_2)) = a_{2i}(\psi_0(p_1, p_2))$ for $1 \le i \le p_2 \ge p_{20}$, where $\bar L$ is in (40).

Proof. By (50)-(54) we note that $a_{2i}(\psi_0(p_1, p_2)) = a_{2i}(\theta_0(p_1, p_2))$ and that it is zero for $p_{20} < i \le p_2$. Moreover, by the hypothesis of the lemma, (36) and the identities (12),

$$C(p_2, p_2; l_0 - p_2 \mid \psi_0(p_1, p_2)) = C(p_2, p_2; l_0 - p_2 \mid \psi(p_1, p_2)) \qquad (55)$$

where the matrix $C(\cdot, \cdot; \cdot \mid \cdot)$ is as given by (39), with explicit dependence upon $\psi_0 = \psi_0(p_1, p_2)$ shown. In order to prove the desired result let us consider two cases. If $\psi(p_1, p_2)$ is such that $a_{2i}(\theta(p_1, p_2)) = 0$ for $p_{20} < i \le p_2$, then the argument used for Lemma 2 applies. Otherwise, let $j$ be the largest index for which $a_{2j}(\theta(p_1, p_2)) \neq 0$; a rank argument based on (55) and Corollary 1 rules out $j > p_{20}$. Thus, we must have $j \le p_{20}$, in which case we are back to the previous case. This yields the desired result. □

We now turn to the counterpart to Lemma 7.

Lemma 9. Assume the hypotheses of Theorem 2 with the exception that $L_0$ is replaced with $L' = \bar L + p$. Then we have

$$\bar J(\psi(p_1, p_2)) \ge \bar J(\psi_0(p_1, p_2)) = 0$$

with equality if and only if $\psi(p_1, p_2) = \psi_0(p_1, p_2)$.

Proof. It follows trivially that $\bar J(\psi_0(p_1, p_2)) = 0$. As in the proof of Lemma 7, (46) holds true. Observe now that for any $\psi(p_1, p_2) = (\theta(p_1, p_2), \gamma_{3w}, \gamma_{3v})$, if $\theta(p_1, p_2) \in \Theta_0$, then $\bar a_i(\psi(p_1, p_2)) = \bar a_i(\psi_0(p_1, p_2))$, so that $\bar a_i(\theta(p_1, p_2)) = 0$ for $p_0 < i \le p$,

$$a_{2i}(\psi) = 0 \ \text{for } p_2' < i \le p_2, \qquad a_{1i}(\psi) = 0 \ \text{for } p_1' < i \le p_1,$$

where $p_1' + p_2' = p_0$. By Lemma 8, (51) and by mimicking the proof of Lemma 3, it follows that $p_2' = p_{20}$, hence $p_1' = p_{10}$. Now we may mimic the proof of Lemma 7 to establish that for every $L_2 \ge L'$, $\psi(p_1, p_2) = (\theta(p_1, p_2), \gamma_{3w}, \gamma_{3v})$, $\theta(p_1, p_2) \in \Theta_0$, we have $\bar J_2(\psi(p_1, p_2)) > 0$ for
$\psi(p_1, p_2) \neq \psi_0(p_1, p_2)$. This yields the desired result. □
We now turn to the proof of Theorem 2.
Proof of Theorem 2. Replace Lemma 7 with Lemma 9 and just mimic the proof of Theorem 1 to obtain the desired result. (Note that Lemmas 5 and 6 continue to hold true.) □

6. CONCLUDING REMARKS
Strong consistency of a parameter estimator for non-causal AR model fitting with noisy observations was proved under certain sufficient conditions that are sharper than the previously known conditions. The parameter estimator involves simultaneous matching of data autocorrelations and third-order autocumulants, and is the same as Algorithm 2 of Tugnait (1987b). The earlier consistency results had been obtained with maximum cumulant lags taken as "large" but finite. In this paper we have given a specific lower bound to the maximum cumulant lags. In a companion paper (Tugnait, 1990), we propose and analyze a strongly consistent order selection method for non-causal AR model fitting. Unlike the method presented in Algorithm 1 of Tugnait (1987b), it does not call for testing of $2^p$ candidate models, where $p$ is the number of system poles. Also, it does not call for selection of thresholds for "absolute" rank determination; rather, it works with "relative" ranks of two Hankel matrices, and hence, is quite practical.

REFERENCES

Abutalib, A. O., M. S. Murphy and L. M. Silverman (1977). Digital restoration of images degraded by general motion blurs. IEEE Trans. Aut. Control, AC-22, 294-302.
Andrews, H. C. and B. R. Hunt (1977). Digital Image Processing. Prentice-Hall, Englewood Cliffs, New Jersey.
Ash, R. (1972). Real Analysis and Probability. Academic Press, New York.
Åström, K. J. and T. Söderström (1974). Uniqueness of maximum likelihood estimates of the parameters of an ARMA model. IEEE Trans. Aut. Control, AC-19, 769-774.
Benveniste, A. and M. Goursat (1984). Blind equalizers. IEEE Trans. Commun., COM-32, 871-883.
Benveniste, A., M. Goursat and G. Ruget (1980). Robust identification of a nonminimum phase system: blind adjustment of a linear equalizer in data communications. IEEE Trans. Aut. Control, AC-25, 385-398.
Brillinger, D. R. (1965). An introduction to polyspectra. Annals Math. Statist., 36, 1351-1374.
Cadzow, J. A. (1982). Spectral estimation: an overdetermined rational model equation approach. IEEE Proc., 70, 907-938.
Dennis, J. E., Jr. and R. B. Schnabel (1983). Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Sect. 10.3. Prentice-Hall, Englewood Cliffs, New Jersey.
Donoho, D. (1981). On minimum entropy deconvolution. In D. F. Findley (Ed.), Applied Time Series Analysis, II. Academic Press, New York.
Giannakis, G. B. (1987). Cumulants in identification of causal and noncausal 1-D (time-series) and 2-D (image) ARMA models. Proc. Conf. Inf. Sci. Syst., Baltimore, Maryland.
Giannakis, G. B. (1988). A Kronecker product formulation of the cumulant based realization of stochastic systems. Proc. 1988 Am. Control Conf., Atlanta, Georgia, pp. 2096-2101.
Giannakis, G. B. and J. M. Mendel (1986a). Stochastic realization of nonminimum phase systems. Proc. 1986 Am. Control Conf., Seattle, Washington, pp. 1254-1259.
Giannakis, G. B. and J. M. Mendel (1986b). Approximate realization and model reduction of nonminimum phase stochastic systems. Proc. 25th IEEE Conf. Decision Control, Athens, Greece, pp. 1079-1084.
Giannakis, G. B., J. M. Mendel and W. Wang (1987). ARMA modeling using cumulant and autocorrelation statistics. Proc. 1987 IEEE Int. Conf. Acoustics, Speech, Signal Proc., Dallas, Texas, pp. 61-64.
Giannakis, G. B. and A. Swami (1987). New results on state-space and input-output identification of non-Gaussian processes using cumulants. Proc. SPIE, San Diego, California, pp. 199-205.
Hsueh, A.-C. and J. M. Mendel (1985). Minimum-variance and maximum-likelihood deconvolution for noncausal channel models. IEEE Trans. Geoscience Remote Sensing, GE-23, 791-808.
Huzii, M. (1981). Estimation of coefficients of an autoregressive process by using a higher-order moment. J. Time Series Analysis, 2, 87-93.
Kay, S. M. and S. L. Marple, Jr. (1981). Spectrum analysis—a modern perspective. IEEE Proc., 69, 1380-1419.
Klauder, J. R., A. C. Price, S. Darlington and W. J. Albersheim (1960). The theory and design of chirp radars. The Bell System Tech. J., 39, 745-820.
Lii, K. S. and M. Rosenblatt (1982). Deconvolution and estimation of transfer function phase and coefficients for non-Gaussian linear processes. Annals Statist., 10, 1195-1208.
Ljung, L. (1978). Convergence analysis of parametric identification methods. IEEE Trans. Aut. Control, AC-23, 770-783.
Matsuoka, T. and T. J. Ulrych (1984). Phase estimation using the bispectrum. IEEE Proc., 72, 1403-1411.
Mehra, R. K. (1971). On-line identification of linear dynamic systems with applications to Kalman filtering. IEEE Trans. Aut. Control, AC-16, 12-21.
Mendel, J. M. (1987). Use of higher-order statistics in signal processing and system theory: a short perspective. 1987 Math. Theory of Networks and Systems Conf., Phoenix, Arizona.
Murphy, M. S. and L. M. Silverman (1978). Image model representation and line-by-line recursive restoration. IEEE Trans. Aut. Control, AC-23, 809-816.
Nakajima, F. and F. Kozin (1979). A characterization of consistent estimators. IEEE Trans. Aut. Control, AC-24, 758-765.
Nikias, C. L. (1988). ARMA bispectrum approach to nonminimum phase system identification. IEEE Trans. Acoustics, Speech, Signal Processing, ASSP-36, 513-524.
Nikias, C. L. and H. H. Chiang (1987). Non-causal autoregressive bispectrum estimation and deconvolution. Proc. 1987 IEEE Int. Conf. Acoustics, Speech, Signal Proc., Dallas, Texas, pp. 1557-1560.
Nikias, C. L. and M. R. Raghuveer (1987). Bispectrum estimation: a digital signal processing framework. IEEE Proc., 75, 869-891.
Oppenheim, A. V. and R. W. Schafer (1975). Digital Signal Processing. Prentice-Hall, Englewood Cliffs, New Jersey.
Parzen, E. (1967). Time series analysis of models of signal plus white noise. In B. Harris (Ed.), Spectral Analysis of Time Series. Wiley, New York.
Rosenblatt, M. (1985). Stationary Sequences and Random Fields. Birkhäuser, Boston, Massachusetts.
Scargle, J. D. (1981a). Phase-sensitive deconvolution to model random processes, with special reference to astronomical data. In D. F. Findley (Ed.), Applied Time Series Analysis, II. Academic Press, New York.
Scargle, J. D. (1981b). Studies in astronomical time series analysis, I: modeling random processes in the time domain. Astrophys. J. Supplement Series, 45, 1-71.
Sheriff, R. E. and L. P. Geldart (1982). Exploration Seismology, Vol. 1, Sect. 5.4.3. Cambridge University Press, Cambridge, U.K.
Tekalp, A. M. and H. Kaufman (1988). On statistical identification of a class of linear space-invariant image blurs using nonminimum-phase ARMA models. IEEE Trans. Acoustics, Speech, Signal Processing, ASSP-36, 1360-1363.
Tekalp, A. M., H. Kaufman and J. W. Woods (1986). Identification of image and blur parameters for the restoration of noncausal blurs. IEEE Trans. Acoustics, Speech, Signal Processing, ASSP-34, 963-972.
Tugnait, J. K. (1986a). Recursive parameter estimation for noisy autoregressive signals. IEEE Trans. Inf. Theory, IT-32, 426-430.
Tugnait, J. K. (1986b). Identification of nonminimum phase linear stochastic systems. Automatica, 22, 457-464.
Tugnait, J. K. (1986c). Modeling and identification of symmetric noncausal impulse responses. IEEE Trans. Acoustics, Speech, Signal Processing, ASSP-34, 1171-1181.
Tugnait, J. K. (1987a). Identification of linear stochastic systems via second- and fourth-order cumulant matching. IEEE Trans. Inf. Theory, IT-33, 393-407.
Tugnait, J. K. (1987b). Fitting noncausal autoregressive signal plus noise models to noisy non-Gaussian linear processes. IEEE Trans. Aut. Control, AC-32, 547-552.
Tugnait, J. K. (1988a). On selection of maximum cumulant lags for noncausal autoregressive model fitting. Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, New York, pp. 1360-1363.
Tugnait, J. K. (1988b). Recovering the poles from fourth-order cumulants of system output. Proc. 1988 Am. Control Conf., Atlanta, Georgia, pp. 2090-2095.
Tugnait, J. K. (1989). Recovering the poles from third-order cumulants of system output. IEEE Trans. Aut. Control, AC-34, 1085-1089.
Tugnait, J. K. (1990). Consistent order selection for noncausal autoregressive models via higher-order statistics. Automatica, to be published.
Walden, A. T. and J. W. J. Hosken (1986). The nature of non-Gaussianity of primary reflection coefficients and its significance for deconvolution. Geophysical Prospecting, 34, 1038-1066.
Wiggins, R. A. (1978). Minimum entropy deconvolution. Geoexploration, 16, 21-35.
Woods, J. W. (1981). Two-dimensional Kalman filtering. In T. S. Huang (Ed.), Two-dimensional Digital Signal Processing, I. Springer, New York.

APPENDIX. PROOFS OF LEMMAS 4-6

Here we provide proofs of Lemmas 4-6. But first some preliminaries. From (1) we have

$$y_s(t) = \sum_{i=-\infty}^{\infty} h(i)\, w(t - i) \qquad (A1)$$

where $h(i)$ is the impulse response function of the signal model (1). By (H1) it is straightforward to show that

$$|h(t)| \le M \beta^{|t|}, \quad -\infty < t < \infty \qquad (A2)$$

for some $M < \infty$ and $0 < \beta < 1$. The bound given by (A2) proves useful in proving the desired lemmas.

Proof of Lemma 4. It follows by the use of (A2) and mimicking the proof of Lemma 3 in Tugnait (1987a). □

Proof of Lemma 5. It follows by the use of (A2) and mimicking the proof of Lemma 4 in Tugnait (1987a). □

Proof of Lemma 6. It is a direct consequence of the continuity of $R_2(\tau \mid \theta)$ in $\theta$ and of $C_3(t_1, t_2 \mid \psi)$ in $\psi$, for $\psi = (\theta, \gamma_{3w}, \gamma_{3v}) \in \Psi$, which, in turn, follows from (A2). □