Signal Processing 77 (1999) 139–157

Autoregression and irregular sampling: Spectral estimation

R.J. Martin*

GEC-Marconi Research Centre, Elstree Way, Borehamwood, Herts WD6 1RX, UK

Received 7 February 1997; received in revised form 28 January 1999
Abstract

We describe how to model an irregularly sampled data sequence as a set of observations from a linear stochastic process, permitting data classification and spectral estimation. The technique is a generalisation of the prediction error approach taken by the Burg and Covariance methods, in that a generalised prediction error energy is minimised with respect to the AR coefficients. Various tests on synthetic data, using line and broad-band spectra, show that the method is robust. In a companion paper we show that the generalised prediction error approach permits filtering or signal separation. © 1999 Published by Elsevier Science B.V. All rights reserved.

Keywords: Autoregression; Irregular sampling; Model fitting; Spectral estimation
* Corresponding author. Paribas, 10 Harewood Ave, London NW1 6AA, UK.
1. Introduction

An AR process is a discrete-time linear stochastic process given by
$$x_n + \sum_{j=1}^{p} a_j x_{n-j} = e_n, \qquad (1)$$
in which the RHS is uncorrelated and, for the representation to be useful, small in the sense that $\mathrm{Var}\, e_n \ll \mathrm{Var}\, x_n$ (here Var can be taken to mean a time-average or a realisation-average). AR modelling is known to be effective for modelling short regularly sampled data sets in which the underlying signal contains strong spectral features; by estimating a process that is assumed to hold for all time, it is capable of achieving higher resolution than a simple FFT, which makes no realistic assumption about the origin of the signal [10]. The spectrum is easily obtained from Eq. (1) as
$$P(f) = \frac{\sigma_e^2\,\mathrm{d}t}{\bigl|1 + \sum_{j=1}^{p} a_j z^{-j}\bigr|^2}, \qquad z = e^{2\pi i f\,\mathrm{d}t},$$
with dt the intersample spacing. Of course, for any given set of data, we need to be able to estimate the AR coefficients $(a_j)_{j=1}^{p}$ conveniently. Three methods for achieving this are the Yule–Walker, Burg and Covariance methods [10]. The Yule–Walker technique uses the sample autocovariance to obtain the coefficients; the Covariance method defines, for a set of numbers $a = (a_j)_{j=1}^{p}$, a quantity known as the total forward and backward prediction error power
$$E(x,a) = \frac{1}{2(N-p)} \sum_{n=p+1}^{N} \left[ \Bigl| x_n + \sum_{j=1}^{p} a_j x_{n-j} \Bigr|^2 + \Bigl| x_{n-p} + \sum_{j=1}^{p} a_j^{*} x_{n-p+j} \Bigr|^2 \right] \qquad (2)$$
and minimises this w.r.t. a. As E(x,a) is a quadratic function of a, $\partial E(x,a)/\partial a$ is linear in a and so this is a linear optimisation problem. The Burg method is a constrained minimisation of E(x,a) using the Levinson recursion, a computational device derived from the Yule–Walker method.
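For regularly sampled data the minimisation of Eq. (2) reduces to ordinary linear least squares. The following sketch (ours, not from the paper; numpy-based, with illustrative names) sets up the forward and backward prediction rows and solves them:

```python
import numpy as np

def modified_covariance_ar(x, p):
    """Fit a_1..a_p by minimising the total forward and backward
    prediction error power E(x, a) of Eq. (2): E is quadratic in a,
    so the minimiser solves a linear least-squares problem."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    rows, rhs = [], []
    for n in range(p, N):
        # forward error:  x_n + sum_j a_j x_{n-j}
        rows.append(x[n - p:n][::-1])               # [x_{n-1}, ..., x_{n-p}]
        rhs.append(-x[n])
        # backward error: x_{n-p} + sum_j a_j* x_{n-p+j}, conjugated so
        # that the unknowns are again the a_j themselves
        rows.append(np.conj(x[n - p + 1:n + 1]))    # [x*_{n-p+1}, ..., x*_n]
        rhs.append(-np.conj(x[n - p]))
    a, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return a                                        # a[j-1] estimates a_j

def ar_spectrum(a, sigma2, dt, f):
    """P(f) = sigma^2 dt / |1 + sum_j a_j z^{-j}|^2, z = exp(2 pi i f dt)."""
    z = np.exp(2j * np.pi * np.asarray(f) * dt)
    denom = np.abs(1 + sum(aj * z ** -(j + 1) for j, aj in enumerate(a))) ** 2
    return sigma2 * dt / denom
```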
It is apparent that the benefits of AR modelling could be brought to bear on irregularly sampled signals, if only one could find a way of generalising Eq. (1) and devising a method for estimating the associated parameters. This problem has been addressed before, and there are several strands of research. The work on maximum-entropy techniques [2,5,16–18] relies on using the autocorrelation moments, which for regular sampling are defined as
$$R_{xx}[k] = \mathrm{E}^{x}\, x_j^{*} x_{j+k} = \mathrm{E}_j\, x_j^{*} x_{j+k} \quad \text{(regular sampling)}.$$
Here $\mathrm{E}^{x}$ denotes an average over all realisations of the random process x, and $\mathrm{E}_j$ denotes a time average over j. The first step to a general theory is the development of techniques which use irregularly spaced values of the ACF. In practice the ACF is not available explicitly. It can be estimated in the missing data problem, i.e. when some of the $x_j$'s are unknown. In that case the average in the above equation is, for each k desired by the user, simply taken over the available pairs (j, j+k). Unless one is prepared to try and reconstruct the underlying continuous waveform, the method is not applicable to the general problem of irregular sampling.

To clear up any misunderstanding that may arise later, we adopt the following notation for irregularly sampled data: the data are numbered in the order that they are observed, i.e. $\ldots, x_1, x_2, x_3, \ldots$ observed at time instants $\ldots, t_1, t_2, t_3, \ldots$. This is not the same notation as used in work on the missing data problem, in which the grid points are numbered consecutively but not every grid point has a data point associated with it. Using as we will the $(t_k, x_k)$ notation, we must remember that the expression
$$\mathcal{R}_{xx}[k] = \mathrm{E}\, x_j^{*} x_{j+k} \quad \text{(irregular sampling)}$$
implicitly involves an average over the observation times, and hence depends on the sampling scheme. Accordingly, one must not use the $\mathcal{R}_{xx}[k]$'s as estimates of the underlying ACF and insert them into the Yule–Walker equations in an attempt to find the autoregression coefficients. We shall use them, but only in arguments where we wish to show that a certain observation is statistically independent of previous ones.
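As a concrete illustration of the pairwise averaging just described for the missing data problem, here is a small sketch (ours, not from the paper; missing grid points are marked NaN):

```python
import numpy as np

def acf_missing(x, k):
    """Estimate R_xx[k] on a regular grid with missing samples (NaN) by
    averaging x_j* x_{j+k} over the available pairs (j, j+k)."""
    x = np.asarray(x, dtype=complex)
    prod = np.conj(x[:len(x) - k]) * x[k:] if k > 0 else np.conj(x) * x
    good = ~np.isnan(prod)          # pairs where both samples are present
    if not good.any():
        raise ValueError("no pairs available at this lag")
    return prod[good].mean()
```

As the text stresses, under genuinely irregular sampling the analogous averages depend on the sampling scheme, so they should not be fed into the Yule–Walker equations.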
The second possibility for the missing data problem is the dual estimation of coefficients and missing data points, often by a recursive procedure in which the data are reconstructed with fixed coefficients, then the coefficients re-estimated, and so on [4,20]. One could consider the general problem in an approximate way as the missing data problem with a very high proportion of missing data points, but [6,7] this is not very realistic. This has led to the consideration of the continuous-time model (3). Masry [9] shows that the coefficients in that equation may be obtained from the $\mathcal{R}_{xx}[k]$'s, but, as hinted above, the estimation of these requires a large amount of data and the results are asymptotic in the limit of infinite data. The other continuous-time approach is that of Jones [6,7], who has used Kalman recursive estimation [3] to obtain a likelihood function lik(x | b) which is then maximised w.r.t. b to obtain an estimate of the true parameters. In principle this should give 'optimal' results, but it did not in Jones' experience, for he found that the likelihood function was not 'well behaved' but instead had numerous local maxima. This might be due to the fact that the Kalman filter propagates an initial state through the data set, and small changes in the input state can produce large changes in the final estimate and hence in the likelihood function; also, the Kalman recursion is not always numerically stable and there were difficulties with roundoff error.

The approach taken here is somewhat simpler, and generalises the Covariance method, in the sense that it generalises the above definition of E(x,a) by replacing the $a_j$'s with time-varying coefficients. This gives an explicit definition of E(x,a) which can be estimated quickly from the data. The computational inconvenience is thereby reduced to minimising E(x,a), which is of course a nonlinear optimisation. The initial ideas behind this piece of work, and the concept of a time-varying predictor based on an underlying AR model, were presented in [11], but the author had not realised at that time that the method was suitable for the general case, and instead concentrated on the missing data problem. Intervening developments are given in [12,15].

This paper is organised as follows. First we relate the discrete process (1) to the Ito diffusion (3) by showing that by discrete regular sampling of Eq. (3)
we obtain an ARMA(p, p−1) process (a well-known result). The MA(p−1) part is then ignored. We then show that the same result holds for irregular sampling, if the prediction coefficients are allowed to be time-varying, and construct these coefficients explicitly. By an irregular ARMA process we mean that the sequence of prediction errors $\{f_n\}$, obtained using the time-varying prediction coefficients, has a finite autocorrelation sequence, i.e. $\mathcal{R}_{ff}[k]$ is zero for $k \ge p$. We then define a z-transform so that calculations can be done in the more familiar z-domain; this corresponds to imposing a bandwidth limit and defining a notional sampling time τ for the model. The data are not moved on to a regular grid, though. The underlying model, i.e. the z-domain poles, which relate via an exponential to the Laplace-domain poles of Eq. (3), is then found from a data set by minimising the generalised forward and backward prediction error energy. Test examples are given using both narrow-band and broad-band data. The emphasis is on using short data sets and investigating whether the use of irregular sampling leads to different estimates from regular sampling.
2. Method

2.1. Connection with a stochastic integral

The continuous-time formulation that we adopt is the following Ito diffusion:
$$\{L\,y(t)\}\,\mathrm{d}t = \mathrm{d}B(t), \qquad L \equiv \sum_{j=0}^{p} b_j \left(\frac{\mathrm{d}}{\mathrm{d}t}\right)^{p-j}, \quad b_0 = 1, \qquad (3)$$
in which B(t) is a Brownian motion on $\mathbb{R}$ or $\mathbb{C}$. An introduction to these processes is given in [19,8]; the latter discusses a wide variety of applications, including geophysics, fluid dynamics, biomedicine, and the pricing of financial 'derivatives'. We shall only need some elementary facts and these are discussed in the appendix. The solution of Eq. (3) is
$$y(t) = \int_{-\infty}^{t} G(t - t')\,\mathrm{d}B(t'), \qquad (4)$$
where G(t) is the Green's function satisfying
$$L\,G(t) = \delta(t), \qquad t < 0 \;\Rightarrow\; G(t) = 0.$$
For a stable process, $G(t) \to 0$ exponentially as $t \to \infty$. For an unstable process the integral does not exist.

Suppose now that Eq. (4) is sampled at regular time intervals ($t_n = n\,\mathrm{d}t$), and call the samples $y_n$, so $y_n = y(t_n)$. Define the poles of model (3) as numbers $\beta_i \in \mathbb{C}$ obeying
$$\sum_{j=0}^{p} b_j s^{-j} \equiv \prod_{i=1}^{p} (1 - \beta_i/s).$$
Any linear combination of functions $e^{\beta_i t}$ satisfies the homogeneous (unforced) version of Eq. (3), so for distinct poles the Green's function is constructed thus:
$$G(t) = \begin{cases} 0, & t < 0, \\ \sum_{i=1}^{p} \lambda_i e^{\beta_i t}, & t > 0, \end{cases}$$
for appropriate coefficients $\lambda_i$ which, incidentally, are given by
$$\lambda_i = (-1)^{i+1} \prod_{\substack{j<k\\ j,k \ne i}} (\beta_j - \beta_k) \Bigm/ \prod_{j<k} (\beta_j - \beta_k),$$
with L'Hôpital's rule being used if there are indistinct roots; however we shall not need the $(\lambda_i)$ explicitly. Define also $\alpha_i = e^{\beta_i\,\mathrm{d}t}$, and coefficients $a_j$ by
$$1 + \sum_{j=1}^{p} a_j z^{-j} \equiv \prod_{i=1}^{p} (1 - \alpha_i/z). \qquad (5)$$
Defining
$$f_n = \sum_{j=0}^{p} a_j y_{n-j} \qquad (a_0 := 1),$$
we have
$$f_n = \sum_{j=0}^{p} a_j \int_{-\infty}^{t_{n-j}} G(t_{n-j} - t')\,\mathrm{d}B(t').$$
On expanding the Green's function we obtain
$$f_n = \sum_{j=0}^{p} a_j \left[ \int_{t_{n-p}}^{t_{n-j}} + \int_{-\infty}^{t_{n-p}} \right] \sum_{i=1}^{p} \lambda_i e^{\beta_i (t_{n-j} - t')}\,\mathrm{d}B(t'). \qquad (6)$$
But by design we have that
$$\sum_{j=0}^{p} e^{\beta_i t_{n-j}} a_j = e^{\beta_i t_n}\Bigl(1 + \sum_{j=1}^{p} a_j \alpha_i^{-j}\Bigr) = 0 \qquad (1 \le i \le p). \qquad (7)$$
So the terms arising from the second integral in Eq. (6) (i.e. that going back to $t' = -\infty$) vanish, and
$$f_n = \int_{t_{n-p}}^{t_n} (\text{function of } t')\,\mathrm{d}B(t'). \qquad (8)$$
Using the overlap formula (see the appendix)
$$\mathrm{E}\left[\left(\int f(t)\,\mathrm{d}B(t)\right)^{\!*}\left(\int g(t)\,\mathrm{d}B(t)\right)\right] = \sigma^2 \int f(t)^{*} g(t)\,\mathrm{d}t,$$
we can see that if f(t) and g(t) have non-overlapping support then the LHS must vanish. Applying this result to the sequence $\{f_n\}$ as given by the stochastic integral in Eq. (8), we have that
• $\{f_n\}$ is an MA(p−1) process: $m \ge p \Rightarrow \mathrm{E}\, f_n^{*} f_{n+m} = 0$;
• $\{f_n\}$ is independent of past observations: $m \ge p \Rightarrow \mathrm{E}\, f_n^{*} y_{n-m} = 0$.
Thus $\{y_n\}$ is an ARMA(p, p−1) process, confirming a well-known result of Bartlett [1]. Crucially, though, we can go through the above working when the sampling is irregular and arrive at the same conclusion; we shall have to replace the $(a_j)$ with time-varying coefficients so that Eq. (7) is obeyed for each n.
where K"ker L is the set of functions sent to 0 by L. We showed above that for regular sampling the predictor coe$cients are simply the AR coe$cients, up to a factor. For arbitrary sampling Eq. (7) implies that rL must satisfy the matrix equation e@GRL\H\RL\NrL"0 (1)i)p) (10) H (where the t term in the exponent is for conveniL\N ence). There are p equations and p#1 unknowns, so a solution will always exist and generically the solution space will have dimension 1. Then the forward innovation errors, de"ned by
as in the standard AR de"nition. We then propose to minimise E(x,b) with respect to b. However we have not quite "nished, because the de"nitions for rL and sL are incomplete. At present they are determined only up to scalar multiple, which must be set in order for f and b to be well-de"ned. In the next L L section we resolve this problem for the AR(1) case and then state a workable de"nition for the general case. 2.3. AR(1) case (the Ornstein}Uhlenbeck process) Let us consider the "rst-order continuous-time stochastic process (Ornstein}Uhlenbeck or OU process)
N f " rLy , L H L\H H can be expressed as
dy(t)#by(t) dt"dB(t),
N f " rL G(t !t) dB(t) L\H L H \ H RL N G(t !t) dB(t) as before " rL L\H H RL\N H and so +f , enjoys the same two properties as L above, viz. m*pNE fHf "E fHy "0. L L>K L L\K Reverse innovation errors are de"ned similarly. If an AR process is stable and de"ned by the parameters (a ), then its reverse is also an AR process, H and has parameters (aH). The reverse innovation H errors are therefore de"ned for irregular sampling by N b " sLHy , L H L>H H (1)i)p).
143
N e@GRL\RL\H>NsL"0 H H (11)
To construct the function E that generalises Eq. (2) we take our data set (t ,x ), construct the forward L L and backward predictor coe$cients rL, sL, and deH H "ne forward and backward prediction errors according to N N f " rLx , b " sLHx L H L\H L H L>H H H with rL and sL given by Eqs. (10) and (11), and then write 1 , E(x,b)" "f "#"b " L L\N 2(N!p) LN>
(12)
Re b'0,
(13)
which is solved by the stochastic integral
y(t)"
R
e@YRY\R dB(t). \ Therefore y "e\@YRL\RL\y L L\
#e\@YRL\RL\
RL
e@YRY\RL\ dB(t).
RL\ First let us consider the case when we know b and merely wish to simulate observations from this process. For brevity let us write F for the last term in L the above equation. It satis"es the equations m*1NE FHy "E FHF "0. L L\K L L!K We can go further, for p E "F "" (1!e0 @Y RL\\RL), L 2Re b p E "y "" L 2Re b and so to simulate a "rst-order stochastic process of variance p and decay parameter b we have only W to calculate y "e\@YRL\RL\y #p (1!e0 @Y RL\\RL)g , L L\ W L where +g , are independent observations from the L standard normal distribution.
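The recursion above translates directly into code. A minimal sketch (ours, not from the paper; real b for simplicity, with unit-mean exponential spacings as an illustrative sampling scheme):

```python
import numpy as np

def simulate_ou(t, b, sigma_y, rng=None):
    """Exact simulation of the OU process (13) at arbitrary ordered
    times t: stationary std sigma_y, decay parameter b (real, > 0)."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.empty(len(t))
    y[0] = sigma_y * rng.standard_normal()          # stationary start
    for n in range(1, len(t)):
        rho = np.exp(-b * (t[n] - t[n - 1]))        # exp(-b (t_n - t_{n-1}))
        y[n] = rho * y[n - 1] + sigma_y * np.sqrt(1 - rho ** 2) * rng.standard_normal()
    return y

rng = np.random.default_rng(0)
t = np.cumsum(rng.exponential(1.0, size=64))        # Poisson-process times
y = simulate_ou(t, b=0.2, sigma_y=50.0, rng=rng)
```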
Now let us consider how our generalised prediction error approach fares when we wish to estimate b from a section of data. Suppose we think, in the process of doing the parameter estimation, that the Laplace transform-domain pole (of the stochastic system (13) from which the data are generated) is at $s = -b'$; of course, it is actually at $s = -b$. Let us construct prediction errors; we would like these to be bigger 'in the long run' if $b' \ne b$ than if $b' = b$. By Eqs. (10) and (11)
$$\begin{pmatrix} r_0^n \\ r_1^n \end{pmatrix} \propto \begin{pmatrix} s_0^n \\ s_1^n \end{pmatrix} \propto \begin{pmatrix} 1 \\ -e^{-b'(t_n - t_{n-1})} \end{pmatrix}.$$
Suppose that we remove the proportionality symbol by simply declaring $r_0^n = s_0^n = 1$. This is just as in the regularly sampled case, where $a_0 = 1$. Then the forward and backward prediction errors are
$$f_n = y_n - e^{-b'(t_n - t_{n-1})} y_{n-1} \quad (= F_n \text{ as defined earlier, when } b' = b), \qquad b_n = y_{n-1} - e^{-b'(t_n - t_{n-1})} y_n,$$
and we can calculate their variances using the overlap formula for stochastic integrals:
$$\mathrm{E}\,|f_n|^2 = \mathrm{E}\,|b_n|^2 = \frac{\sigma^2}{2\,\mathrm{Re}\,b}\Bigl(1 + \bigl|e^{-b'(t_n - t_{n-1})}\bigr|^2 - 2\,\mathrm{Re}\bigl\{e^{-(b^{*} + b')(t_n - t_{n-1})}\bigr\}\Bigr).$$
For brevity write $\mathrm{PE}(b')$ for this quantity. Then
$$\mathrm{PE}(b) = \frac{\sigma^2}{2\,\mathrm{Re}\,b}\bigl(1 - \bigl|e^{-b(t_n - t_{n-1})}\bigr|^2\bigr)$$
and so
$$\mathrm{PE}(b') - \mathrm{PE}(b) = \frac{\sigma^2}{2\,\mathrm{Re}\,b}\,\bigl|e^{-b(t_n - t_{n-1})} - e^{-b'(t_n - t_{n-1})}\bigr|^2 \ge 0.$$
This is an important conclusion, because we can now say that 'on average' (i.e. over all realisations of the underlying random walk B(t)) $E(y, b') \ge E(y, b)$, and it follows that minimising $E(y, b')$ with respect to $b'$ is a good way to estimate b. The same conclusion is obtained even if the prediction error energies are 'weighted' with weights depending on the observation times. One might, for example, notice that $\mathrm{PE}(b)$ is approximately proportional to $(t_n - t_{n-1})$, and minimise instead the function
$$E_w(y, b') = \frac{1}{2(N-1)} \sum_{n=2}^{N} \frac{|f_n|^2 + |b_n|^2}{t_n - t_{n-1}}.$$
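A quick numerical illustration of this conclusion (our sketch, reusing `simulate_ou` and the record `(t, y)` from the previous example; real b'): scanning the AR(1) prediction error energy over trial values should place the minimum near the true decay parameter.

```python
import numpy as np

def pe_energy_ar1(t, y, b_trial):
    """E(y, b') of Eq. (12) for p = 1, with r_0 = s_0 = 1 and
    r_1 = s_1 = -exp(-b'(t_n - t_{n-1})) as derived above."""
    rho = np.exp(-b_trial * np.diff(t))
    fwd = y[1:] - rho * y[:-1]            # forward prediction errors
    bwd = y[:-1] - rho * y[1:]            # backward prediction errors
    return 0.5 * np.mean(np.abs(fwd) ** 2 + np.abs(bwd) ** 2)

b_grid = np.linspace(0.01, 1.0, 200)
E_vals = [pe_energy_ar1(t, y, b) for b in b_grid]
b_hat = b_grid[int(np.argmin(E_vals))]    # should lie near the true b
```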
2.4. Fitting the model (part 2)

An analysis similar to that above is extremely difficult if the model order is greater than 1. However, one does not have to perform much analysis before coming across the following problem, which presents itself when the model order is greater than 1 and the sampling is irregular. Suppose that, for p = 2,
$$\beta_1 = -\beta_2 = \tfrac{1}{2}\pi i\,\hat{t}^{-1}, \qquad t_n = 3\hat{t}, \quad t_{n-1} = 2\hat{t}, \quad t_{n-2} = 0,$$
where $\hat{t}$ is the unit of time that is being used ($\beta_i$ has dimensions (time)$^{-1}$). The matrix equation (10) reads
$$\begin{pmatrix} -i & -1 & 1 \\ i & -1 & 1 \end{pmatrix} \begin{pmatrix} r_0^n \\ r_1^n \\ r_2^n \end{pmatrix} = 0, \quad \text{implying } r_0^n = 0.$$
In other words, requiring $r_0^n = 1$ would make $r_1^n$ and $r_2^n$ infinite and there would be a singularity in the function $b \mapsto E(x, b)$. Rather than imposing $r_0^n = 1$ we therefore constrain the length (2-norm) of the vector $r^n = (r_j^n)_{j=0}^{p}$. But now that we have departed from the convention that the zeroth prediction coefficient be 1, which is the case when the sampling is regular (recall that $a_0 = 1$), we must be careful that our new definition coincides with the classical one when the sampling is regular. So, although the normalisation
$$\sum_{j=0}^{p} |r_j^n|^2 = 1 \quad \text{(not used)}$$
is superficially attractive, it does not agree with the classical definition of the prediction coefficients, for the simple reason that $1 + \sum_{j=1}^{p} |a_j|^2$ is not 1. However, all we have to do is make it agree. That procedure is very straightforward when one realises that regular sampling involves a characteristic time, namely the intersample spacing.
In classical DSP one has the z-transform, in which the z-transform variable is related to the Laplace transform variable s by $z = e^{s\,\mathrm{d}t}$. A Nyquist interval [−B, B] then arises, where $B = 1/(2\,\mathrm{d}t)$. For irregular sampling there is no specific 'characteristic time', and we therefore choose one ourselves; indeed, since the sampling no longer fixes a Nyquist interval, we may choose the Nyquist interval ourselves and put $\tau = 1/(2B)$, $z = e^{s\tau}$. We also cut the z-plane along the negative real axis, which corresponds to frequency B. Having established a normalising frequency B of our choice, or a 'characteristic time' τ (we can talk about either), we can now operate in the z-domain as we are accustomed to do for autoregression problems, and propose the following normalisation for the predictors:
$$|r^n| = |s^n| = N(\alpha), \qquad (14)$$
where the norm function N(·) is defined for a set of z-plane poles $(\alpha_i)$ thus: first, expand the product $\prod_{i=1}^{p}(1 - \alpha_i/z) \equiv 1 + \sum_{j=1}^{p} c_j z^{-j}$, and then compute $N(\alpha) = \bigl(1 + \sum_{j=1}^{p} |c_j|^2\bigr)^{1/2}$. Now, when the sampling is regular, we have τ = dt and $r^n = s^n = l\,(1, a_1, \ldots, a_p)$, where l is an arbitrary scalar of modulus 1 and the $(a_j)$ are the classical AR coefficients. The fact that we do not know l is not a problem, because we only need $|f_n|$ and $|b_n|$ (not $f_n$ and $b_n$ explicitly) when we come to evaluate the error energy E. The definition of E is complete.

2.5. Performing the optimisation

In this section we discuss how we have gone about the task of minimising E(x, α) with respect to α. The optimisation is nonlinear, so we have employed the following method, which consists of two parts: a search and then a gradient descent. The search is necessary to ensure that the gradient descent does not become stuck in spurious local minima.

First, a quick search of the performance surface is performed. The annulus $0.95 < |z| < 1.00$ is cut into sectors of angle π/6.
For a model order p there are $\binom{p+11}{p}$ different ways of choosing poles (with repetition) such that each pole lies within one of the twelve sectors. If the poles are selected in complex-conjugate pairs, which we have done because we are working with real-valued signals in our simulations, then the count reduces to $\binom{p/2+5}{p/2}$. For each possible choice of sectors, poles are chosen at random positions within their sectors and E is calculated. The choice of poles that gave the minimum E value is then taken as the starting point in a gradient descent algorithm, which is the second part of the method. The local gradient at any particular point is estimated by looking at nearby points. One then walks down the performance surface in small steps in the direction of steepest descent. When no further progress can be made, the step size is made smaller and the process is repeated. The resulting minimum is found to a precision $|\delta\alpha| < 0.001$.

It might appear that, by choosing all the search points very near the unit circle, one cannot obtain a model with poles away from the unit circle; but this is not so. Unnecessary poles are moved away from the unit circle during the optimisation. Hence, if one is picking the poles in complex-conjugate pairs, a sixth-order model may have a spectrum with three, two or one pronounced spikes, or even none at all. This is seen in the test results.

2.6. Summary

This section summarises the model fitting procedure for arbitrary model order; a code sketch follows the list. If the desired model order is no greater than 1, the method of Section 2.3, which is very similar, can be employed.
• Take a block of data $(t_n, x_n)$, $n = 1, \ldots, N$.
• Set the bandwidth limit to B and choose a notional sampling interval $\tau = 1/(2B)$.
• For a set of z-plane poles $\alpha_i$ the function E is defined as follows. The normalising index $N(\alpha)$ is obtained from Eq. (14); then the predictors $r^n$, $s^n$ are found from the kernel conditions below; then the prediction errors are formed and averaged:
$$E = \frac{1}{2(N-p)} \sum_{n=p+1}^{N} \bigl( |f_n|^2 + |b_n|^2 \bigr),$$
$$f_n = \sum_{j=0}^{p} r_j^n x_{n-j}, \qquad b_n = \sum_{j=0}^{p} s_j^{n*} x_{n-p+j},$$
$$R^n r^n = 0, \qquad R^n_{ij} = \alpha_i^{(t_{n-j} - t_{n-p})/\tau} \qquad (1 \le i \le p,\ 0 \le j \le p),$$
SL "aRL\RL\N>HO GH G (1)i)p, 0)j)p),
SLsL"0,
"rL"""sL""N(a). [All this means is that we need the kernels of RL and SL. A simple way to do this is to factorise each as LQ (L lower-triangular, Q orthogonal), after which the (column) vector that generates the kernel is the transpose of the bottom row of Q; that vector will be of length 1.] E Minimise E w.r.t. the a . G E In the degenerate case when the data consist of p pure tones Ae SR, E is zero precisely when the (a ) are in the correct places (a "e SGO); so of G G course minimising E gives the right answer. E When the sampling is regular, there is no di!erence between this and the modi"ed Covariance method [10].
3. Test examples

It is almost universal to test autoregressive algorithms on data consisting of sinusoids in noise; by so doing one can investigate the resolution, the performance for different SNR, and so on. We have therefore run some tests on this kind of data. One objection to this kind of test is that tones in noise do not obey an AR model. Therefore we have also tested the algorithm in question on genuinely broad-band data. The specific aim in carrying out these tests has been to examine if, and how, the model fitting procedure depends on the sampling scheme. Ideally, there should be no dependence except when there are spectral components above the Nyquist limit; in those cases regular sampling would give rise to aliasing, so the results for different sampling schemes would not agree.

3.1. Test on one sinusoid in noise

The first test was to take 64 samples of a sinusoid in white additive Gaussian noise of standard deviation 1. The frequency of the sinusoid was taken (arbitrarily) as 0.27 Hz. The quality of fit was assessed as a function of the following three parameters:
• Signal-to-noise ratio. Four values of the amplitude A of the sinusoid were considered: 8, 4, 2 (SNR +15, +9, +3 dB) and 0. The purpose of trying A = 0 was to examine the level of spurious features thrown up.
• Model order. Orders 2 and 6 were used.
• Sampling scheme. Six sampling schemes were applied:
(a) regular sampling with period 1 (i.e. 1 s),
(b) additive random sampling, intersample spacings drawn from the rectangular ('uniform') distribution on [0.5, 1.5],
(c) additive random sampling, intersample spacings drawn from the Γ(2,2) distribution,
(d) additive random sampling, intersample spacings drawn from the Γ(3,3) distribution,
(e) jittering the sampling in scheme (a), varying the positions of the samples randomly, with uniform probability, up to ±0.25,
(f) jittering the sampling in scheme (a), varying the positions of the samples randomly, with uniform probability, up to ±0.5.
The Γ(ν, λ) distribution has pdf
$$\frac{\lambda^{\nu} t^{\nu-1} e^{-\lambda t}}{\Gamma(\nu)} \qquad (\nu > 0,\ \lambda > 0)$$
and has mean ν/λ. (When ν = 1 it is the exponential distribution, so that the samples form a Poisson process.) In each of the six cases enunciated above, the average sampling rate is 1 Hz. The parameter τ was set to 1 for each of these tests, corresponding to a notional Nyquist interval [−½, ½]. A sketch of how such sampling schemes might be generated appears below.

The results are shown in Figs. 1–4. Some general trends are clear. For SNR = +15 dB (corresponding to A = 8) the results for model order 2 are all good (and virtually identical); the peak becomes lower and wider as the SNR reduces. The effect of increasing the model order is to sharpen the peak, a well-known effect with the classical AR spectral estimator; but the results for sampling schemes (c) and (d) are consistently not as good as for the other four schemes. We are not sure why this should be, but it may be due to the wider spread of intersample spacings for those two schemes.
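A sketch of the six schemes (ours, not from the paper; note that numpy's Gamma parametrisation uses shape and scale, so a rate λ becomes scale 1/λ):

```python
import numpy as np

def sample_times(scheme, N=64, rng=None):
    """Observation times with unit average rate for schemes (a)-(f)."""
    rng = np.random.default_rng() if rng is None else rng
    if scheme == 'a':   # regular sampling, period 1 s
        return np.arange(N, dtype=float)
    if scheme == 'b':   # additive random sampling, spacings ~ U[0.5, 1.5]
        return np.cumsum(rng.uniform(0.5, 1.5, N))
    if scheme == 'c':   # additive random sampling, spacings ~ Gamma(2, 2)
        return np.cumsum(rng.gamma(shape=2.0, scale=0.5, size=N))
    if scheme == 'd':   # additive random sampling, spacings ~ Gamma(3, 3)
        return np.cumsum(rng.gamma(shape=3.0, scale=1.0 / 3.0, size=N))
    if scheme == 'e':   # jittered regular sampling, up to +/- 0.25
        return np.arange(N) + rng.uniform(-0.25, 0.25, N)
    if scheme == 'f':   # jittered regular sampling, up to +/- 0.5
        return np.arange(N) + rng.uniform(-0.5, 0.5, N)
    raise ValueError(scheme)

# e.g. a Section 3.1 test record: sinusoid at 0.27 Hz, amplitude 4, in
# unit-variance white Gaussian noise
rng = np.random.default_rng(1)
t = sample_times('b', rng=rng)
x = 4.0 * np.sin(2 * np.pi * 0.27 * t) + rng.standard_normal(len(t))
```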
Fig. 1. AR(p) spectrum for a single sinusoid (frequency 0.27) in white noise (SNR +15 dB). Sampling schemes: see Section 3.1. Orders: [solid line] p = 6, [dotted line] p = 2.
3.2. Test on two sinusoids in noise

The second test was to take 64 samples of a pair of sinusoids, each of amplitude 4, in white additive Gaussian noise of standard deviation 1. The same sampling schemes as in Section 3.1 were applied, and model orders of 4 and 6 were used. See Figs. 5 and 6. First the frequencies were selected as 0.27 and 0.37; one sees that the AR(4) model does not separate the two components, but the addition of a third pair of poles is sufficient to split them.
Fig. 2. AR(p) spectrum for a single sinusoid (frequency 0.27) in white noise (SNR +9 dB). Sampling schemes: see Section 3.1. Orders: [solid line] p = 6, [dotted line] p = 2.
When the frequencies were altered to 0.43 and 0.83, the uniform sampling gives rise to aliasing, but the nonuniform sampling schemes correctly identify the frequencies (the 'Nyquist interval' was expanded by putting τ = 0.5 for this example). In fact the uniform sampling gave us some problems, because the error function E(x, α) had multiple minima, and depending on the starting conditions different answers were produced. Of course, this is not a fault of the method, which if the sampling is regular cannot be expected to differentiate between frequencies above the Nyquist limit and their aliases.
Fig. 3. AR(p) spectrum for a single sinusoid (frequency 0.27) in white noise (SNR +3 dB). Sampling schemes: see Section 3.1. Orders: [solid line] p = 6, [dotted line] p = 2.
3.3. Test on broad-band signals

A 64-point test sequence was created by calculating
$$x(t) = y_1(t)\cos 2\pi f_0 t + y_2(t)\sin 2\pi f_0 t \qquad (15)$$
with $y_1(t)$ and $y_2(t)$ independent observations from the OU process (13) with parameters b = 0.02 or 0.20 and $\sigma_y = 50$ (see Section 2.3).
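A sketch of how this test signal might be assembled (ours, reusing `simulate_ou` from the Section 2.3 sketch and `sample_times` from Section 3.1):

```python
import numpy as np

rng = np.random.default_rng(2)
t = sample_times('b', rng=rng)            # e.g. scheme (b)
f0, b, sigma_y = 0.21, 0.20, 50.0
y1 = simulate_ou(t, b, sigma_y, rng)      # two independent OU envelopes
y2 = simulate_ou(t, b, sigma_y, rng)
x = y1 * np.cos(2 * np.pi * f0 * t) + y2 * np.sin(2 * np.pi * f0 * t)  # Eq. (15)
```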
Fig. 4. AR(p) spectrum for white noise. Sampling schemes: see Section 3.1. Orders: [solid line] p = 6, [dotted line] p = 2.
Different values of b give different degrees of incoherence, or equivalently different notional bandwidths. A typical realisation of the process generating $y_1(t)$ and $y_2(t)$ is shown in Fig. 7. The 'centre frequency' $f_0$ was taken as 0.21. The model order was taken as 2 and the fit was assessed as a function of the sampling scheme. The same six schemes as in Section 3.1 were applied. As can be seen from Fig. 8, the AR spectra are virtually identical. In particular, the results for sampling schemes (c) and (d) are not demonstrably worse, as they are in the results of Section 3.1. Perhaps this is because the signals under consideration in these tests are closer to being autoregressive.
Fig. 5. AR(p) spectrum for a pair of tones in white noise (frequencies 0.27, 0.37; SNR +9 dB). Sampling schemes: see Section 3.1. Orders: [solid line] p = 6, [dotted line] p = 4.
As an alternative method of spectral broadening we have considered amplitude modulation by a continuous-time chaotic signal, the Lorenz attractor [21]:
$$x(t) = Z_1(ct)\cos 2\pi f_0 t + Z_2(ct)\sin 2\pi f_0 t. \qquad (16)$$
Fig. 6. AR(p) spectrum for a pair of tones in white noise (frequencies 0.43, 0.83; SNR +9 dB). Sampling schemes: see Section 3.1. Orders: [solid line] p = 6, [dotted line] p = 4. Note that aliasing has occurred in (a).
Fig. 7. A typical realisation of the stochastic process (13), used to generate (15), with b = 0.05. (a) 256-point time series, (b) power spectrum as estimated by windowed FFT.
The Lorenz equations are
$$\frac{\mathrm{d}Z_1}{\mathrm{d}t} = 10(Z_2 - Z_1), \qquad \frac{\mathrm{d}Z_2}{\mathrm{d}t} = -Z_1 Z_3 + 28 Z_1 - Z_2, \qquad \frac{\mathrm{d}Z_3}{\mathrm{d}t} = Z_1 Z_2 - \tfrac{8}{3} Z_3.$$
$Z_1$ and $Z_2$ in Eq. (16) are 'independent' realisations of the chaotic process, in that although they are both observations on the $Z_1$ variable, their starting conditions are different. The integration of the Lorenz equations was performed using NAG routine D02BAF, an adaptive step-size fourth-order Runge–Kutta algorithm, with a small specified relative accuracy. The sampling could therefore be performed by nominating the next sample point, and integrating in time as far as that point. The initial conditions, which correspond to points very close to the attractor (as opposed to arbitrary points in space), were $(Z_1, Z_2, Z_3)(0) = (-15, -21, 30)$ and $(Z_1, Z_2, Z_3)(0) = (7, 15, 10)$. A typical realisation of data from the $Z_1$ variable, and its characteristically broad, continuous spectrum, are shown in Fig. 9. By altering c we can control the coherence of x: the lower c is, the more slowly the modulator varies, and the more coherent the process. We have considered two values, 0.02 and 0.10. Again (see Fig. 10) the AR spectra are seen to be virtually identical.
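A sketch of the integrate-to-the-next-sample procedure (ours, substituting scipy's adaptive Runge–Kutta `solve_ivp` for NAG routine D02BAF; the tolerances are illustrative):

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, Z):
    """Lorenz equations with the classical parameters (10, 28, 8/3)."""
    return [10.0 * (Z[1] - Z[0]),
            -Z[0] * Z[2] + 28.0 * Z[0] - Z[1],
            Z[0] * Z[1] - (8.0 / 3.0) * Z[2]]

def lorenz_Z1(times, Z0):
    """Integrate from t = 0 and record Z_1 at the nominated sample points."""
    sol = solve_ivp(lorenz, (0.0, times[-1]), Z0, t_eval=times,
                    rtol=1e-8, atol=1e-10)
    return sol.y[0]

rng = np.random.default_rng(3)
t = np.cumsum(rng.uniform(0.5, 1.5, 64))         # irregular sample times
c, f0 = 0.10, 0.21
Z1 = lorenz_Z1(c * t, [-15.0, -21.0, 30.0])      # first modulator
Z2 = lorenz_Z1(c * t, [7.0, 15.0, 10.0])         # second, different start
x = Z1 * np.cos(2 * np.pi * f0 * t) + Z2 * np.sin(2 * np.pi * f0 * t)  # Eq. (16)
```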
Fig. 8. AR(2) spectrum for the continuous-time stochastic process (15). [Solid line] b = 0.02, [dotted line] b = 0.20. Sampling schemes: see Section 3.1.
Fig. 9. A typical realisation of the $Z_1$ variable of the Lorenz attractor, used to generate (16), with c = 0.05. (a) 256-point time series, (b) power spectrum as estimated by windowed FFT.
4. Conclusions

We have presented a new method for fitting a continuous-time stochastic process to a set of irregularly observed data points. The method makes no assumptions about the sampling, and does not attempt to resample or reconstruct the signal, or to move the observation points onto a regular lattice. It works by defining a generalised prediction error energy function, which is minimised with respect to the AR coefficients. It is useful for short data sets: 64-point sets were used in our test examples. In extracting sinusoids from noise it appears that for a correct estimate the minimum SNR is about +3 dB (at least for small model orders). We have shown also that the method works well with genuinely broad-band data, and on this basis we can say that our method is a natural generalisation of commonly used AR estimation schemes that cope only with regularly sampled data. In a companion paper [13] we have shown how the generalised prediction error approach may be used in filtering and signal separation schemes; a combined discussion of the issues appears in [14].
Acknowledgements

I thank the referees for their generous and helpful comments. I am also grateful to Dr. Simon Godsill (University of Cambridge Engineering Dept.) for his many perceptive remarks, to Dr. Jaroslav Stark (University College London), and to the Royal Commission for the Exhibition of 1851 for awarding me a financially generous Industrial Fellowship.
Fig. 10. AR(2) spectrum for the continuous-time dynamical process (16). [Solid line] c = 0.02, [dotted line] c = 0.10. Sampling schemes: see Section 3.1.
Appendix A

A.1. Continuous-time stochastic processes

A Brownian motion process B(t) may be defined in the following way. Independent Gaussian
random variables $e_i$ of variance $\sigma_e^2$ are chosen and a process
$$B_n = \sum_{i=1}^{n} e_i$$
is generated. Define now a time spacing dt and set the variance of each $e_i$ to $\sigma^2\,\mathrm{d}t$. The parameter $\sigma^2$ is known as the diffusion coefficient and has dimensions $[B]^2[T]^{-1}$, where [B], [T] denote the dimensions of the physical variable and of time. Now define the piecewise constant function B(t) by
$$B(t) = B_n \qquad (n\,\mathrm{d}t < t < (n+1)\,\mathrm{d}t).$$
Then B(t) is defined as this in the limit $\mathrm{d}t \to 0$. As a result, B(t) is a random walk obeying the condition
$$\mathrm{E}\,\mathrm{d}B(t_1)^{*}\,\mathrm{d}B(t_2) = \begin{cases} \sigma^2\,\mathrm{d}t, & t_1 = t_2, \\ 0, & t_1 \ne t_2, \end{cases}$$
where $\mathrm{d}B(t_i)$ are increments of the process at times $t_i$ over an infinitesimal time dt. We may regard dB(t) as a continuous-time white-noise process, in which case if it is passed into a linear system then the output of that system will be a convolution product of dB(·) with the impulse response. To examine the second-order statistics of the output we need the 'overlap formula', easily obtained from the above expression:
$$\mathrm{E}\left[\left(\int f(t)\,\mathrm{d}B(t)\right)^{\!*}\left(\int g(t)\,\mathrm{d}B(t)\right)\right] = \sigma^2 \int f(t)^{*} g(t)\,\mathrm{d}t.$$
References

[1] M.S. Bartlett, On theoretical specification of sampling properties of autocorrelated time series, J. Roy. Statist. Soc. B 8 (1946) 27–41.
[2] F.U. Dowla, J.H. McClellan, MEM spectral analysis for nonuniformly sampled signals, in: Proceedings of ICASSP, 1981, pp. 79–85.
[3] A.C. Harvey, Filtering, Structural Time Series Analysis and the Kalman Filter, CUP, Cambridge, 1989.
[4] A.J.E. Janssen, R.N. Veldhuis, L.B. Vries, Adaptive interpolation of discrete time signals that can be modeled as autoregressive processes, IEEE Trans. Acoust. Speech Signal Process. 34 (1986) 317–330.
[5] R.H. Jones, Maximum likelihood fitting of ARMA models to time series with missing observations, Technometrics 22 (1980) 389–395.
[6] R.H. Jones, Fitting a continuous time autoregression to discrete data, in: D.F. Findley (Ed.), Applied Time Series Analysis II, Academic Press, New York, 1981.
[7] R.H. Jones, Fitting multivariate models to unequally-spaced data, in: E. Parzen (Ed.), Time Series Analysis of Irregularly Observed Data, Springer, Berlin, 1984, pp. 158–188.
[8] P.E. Kloeden, E. Platen, Numerical Solution of Stochastic Differential Equations, Springer, Berlin, 1992.
[9] K.-S. Lii, E. Masry, Model-fitting for continuous-time stationary processes from discrete-time data, J. Multivariate Anal. 41 (1992) 56–79.
[10] S.L. Marple Jr, Digital Spectral Analysis with Applications, Prentice-Hall, Englewood Cliffs, NJ, 1987.
[11] R.J. Martin, All-pole modelling of irregularly-sampled data, in: Sampling Theory and Applications, Proceedings of the Workshop SAMPTA'95, Jurmala, Riga, Latvia, September 1995, pp. 247–252.
[12] R.J. Martin, Autoregressive modelling of irregularly-sampled data, in: G. Ramponi, G.L. Sicuranza, S. Carrato, S. Marsi (Eds.), Signal Processing VIII, Proceedings of EUSIPCO-96, 1996, pp. 2033–2036.
[13] R.J. Martin, Autoregression and irregular sampling: filtering, Signal Processing 69 (1998) 229–248.
[14] R.J. Martin, Irregularly sampled signals: theories and techniques for analysis, Ph.D. Thesis, University of London, 1998.
[15] R.J. Martin, S.J. Godsill, Autoregression and irregular sampling: estimation and filtering, in: Sampling Theory and Applications, Proceedings of Workshop SAMPTA'97, Aveiro, Portugal, 1997, pp. 193–198.
[16] R.N. McDonough, Maximum-entropy spatial processing of array data, Geophysics 39 (1974) 843–851.
[17] W.I. Newman, Extension to the maximum entropy method, IEEE Trans. Inform. Theory 23 (1977) 89–93.
[18] W.I. Newman, Extension to the maximum entropy method II, IEEE Trans. Inform. Theory 25 (1979) 705–708.
[19] B. Øksendal, Stochastic Differential Equations, Springer, Berlin, 1992.
[20] R.H. Shumway, Some applications of the EM algorithm to analysing incomplete time series data, in: E. Parzen (Ed.), Time Series Analysis of Irregularly Observed Data, Springer, Berlin, 1983, pp. 290–324.
[21] J.M.T. Thompson, H.B. Stewart, Nonlinear Dynamics and Chaos, Wiley, New York, 1986.