Statistics & Probability Letters 35 (1997) 307-315
Nonparametric estimation of the time-varying frequency and amplitude¹

Vladimir Katkovnik*

Department of Statistics, University of South Africa, Pretoria 0001, South Africa

Received 1 May 1994; received in revised form 1 June 1995
Abstract

The local polynomial approximation is used to develop a nonparametric estimator of the time-varying frequency, phase and amplitude of a real-valued harmonic signal in noise. The estimator is shown to be strongly consistent and asymptotically Gaussian. © 1997 Elsevier Science B.V.
Keywords: Instantaneous frequency; Frequency estimation; Nonparametric regression; Sinusoidal regression; Time-varying parameters
1. Introduction

The model considered is
y(t) = m(t) + ε(t),    m(t) = A(t) sin(φ(t)),    (1)
where t is an integer and the ε(t) are i.i.d. normal random variables with E ε(t) = 0, E ε²(t) = σ². A harmonic with a time-varying phase, together with the concept of the instantaneous frequency (IF), defined as Ω(t) = dφ(t)/dt, is a key model that has been used to study a wide range of signals, including speech, music, other acoustic signals, biological signals, and so on (see, e.g., Boashash, 1992). The unknown A(t), φ(t), and Ω(t) are to be estimated from the observations y(t).
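As a simple illustration of model (1) (not part of the original paper), the following Python fragment simulates a noisy harmonic with a linear chirp; the grid length, phase coefficients and noise level are arbitrary choices made only for this example.

```python
import numpy as np

# Simulate model (1): y(t) = A(t) * sin(phi(t)) + eps(t), with a quadratic phase
# phi(t) = b0 + b1*t + b2*t**2/2, so that the instantaneous frequency is
# Omega(t) = dphi/dt = b1 + b2*t (a linear chirp).
rng = np.random.default_rng(0)

t = np.arange(1024)                  # integer time, as in the model
b0, b1, b2 = 0.5, 0.20, 1e-4         # arbitrary phase coefficients
a0, a1 = 1.0, 0.0                    # constant amplitude (the case a1 = 0)
sigma = 0.1                          # noise standard deviation

A = a0 + a1 * t                      # time-varying amplitude A(t)
phi = b0 + b1 * t + b2 * t**2 / 2    # time-varying phase phi(t)
Omega = b1 + b2 * t                  # instantaneous frequency Omega(t)

y = A * np.sin(phi) + sigma * rng.standard_normal(t.size)
```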
* Fax: (012) 429-3434; e-mail: [email protected].
¹ This work was supported by the Foundation for Research Development of South Africa.
Let C⁰(t) ∈ R⁵ be the vector composed of the true values of the phase, the IF, the amplitude, and the first derivatives of the IF and the amplitude:
C⁰(t) = \begin{pmatrix} C⁰_φ(t) \\ C⁰_A(t) \end{pmatrix},    C⁰_φ(t) = (φ(t), Ω(t), Ω^{(1)}(t))ᵀ,    C⁰_A(t) = (A(t), A^{(1)}(t))ᵀ.    (2)
Here and in what follows x^{(s)}(t) = ∂^s x(t)/∂t^s. Let Q be a compact set given as the direct product of one-dimensional sets Q_j = {C⁰_j : …
assumed, and as a result the variance of these estimates and the Cramer-Rao lower bounds are monotonically decreasing functions of the number of observations.
The contribution of this paper includes:
(a) A method for estimating both the time-varying frequency and amplitude and their derivatives from noisy observations.
(b) Strong consistency and asymptotic normality of the estimator, established for frequency and amplitude that are linear in time. In particular, it is proved that the estimation error of the frequency has the orders O(h^{-3/2}) and O(h^{-5/2}) for the time-invariant and the linear-in-time amplitude, respectively.
(c) Asymptotic normality of the estimates of the time-varying frequency, amplitude, and their derivatives, established for a fairly general class of time-varying frequencies and amplitudes with bounded second derivatives. The orders of the optimal window size h and of the MSE are obtained.
2. Estimation algorithm

The main idea of the LPA approach is to apply a truncated Taylor series in order to approximate the time-varying φ(t) and A(t), and to use this approximation only locally in time. These approximations are used to calculate estimates for a single time instant. For the next time instant the calculations are repeated and a new approximation arises, which is again used to obtain estimates for that single time instant. This point-wise estimation gives the estimator its nonparametric character. The polynomial approximation is applied in order to ensure the reproduction properties with respect to the polynomial components of φ(t) and A(t). It is emphasized that φ(t) and A(t) are not assumed to be polynomial functions of time. This is the principal difference between the nonparametric LPA and the parametric maximum likelihood estimator mentioned above, where only the constant coefficients of a polynomial series for the phase and the amplitude can be estimated. In order to implement the LPA approach for the time-varying frequency and amplitude, the following loss function is applied:
J_h(C) = Σ_i ρ_h(i) (y(t + i) - m(C, i))²,    (3)
where ρ_h(i) ≥ 0 is the rectangular symmetric window, ρ_h(i) = 1/h for |i| ≤ h/2, h > 0 determines the window length, Σ_i = Σ_{i=-∞}^{∞}, C = (c_j) in (3) is the vector of estimated parameters with elements corresponding to those given by (2), and
m(C, i) = A(i, t) sin(φ(i, t)),    φ(i, t) = ψ_{p_φ}(i) C_φ,    A(i, t) = ψ_{p_A}(i) C_A,    (4)
ψ_j(t) = (1, t, t²/2, …, t^j/j!). Here ψ_{p_φ}(i)C_φ, ψ_{p_A}(i)C_A, and m(C, i) are the LPA of the phase, the amplitude and m(t + i), respectively, for the instant t. The degrees p_φ and p_A determine the orders of the models used for φ(t) and A(t). In what follows it is assumed that

p_φ = 2,    p_A = 1,    (5)
i.e. the orders of the LPA in (4) correspond to the hypothesis (H1). The high-order LPA estimates of φ(t) and A(t) are introduced in this paper as the solution of the following constrained optimization problem (Katkovnik, 1994b, 1997):

Ĉ(t, h) = arg min_{C ∈ Q} J_h(C).    (6)
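To make the estimator (3)-(6) concrete, here is a minimal numerical sketch (an illustration only, not the author's implementation): at a fixed interior time instant t the five parameters C = (φ, Ω, Ω^{(1)}, A, A^{(1)}) are fitted by nonlinear least squares over the window |i| ≤ h/2. scipy.optimize.least_squares is used here in place of the constrained minimization over Q, and the constant window weight 1/h is omitted since it does not affect the minimizer.

```python
import numpy as np
from scipy.optimize import least_squares

def lpa_estimate(y, t, h, C_init):
    """Local polynomial approximation estimate at an interior time instant t.

    C = (phi, Omega, Omega', A, A') are the local polynomial coefficients of
    the phase (order p_phi = 2) and the amplitude (order p_A = 1), cf. (2),
    (4) and (5).  The loss (3) uses a rectangular window of length h.
    """
    half = h // 2
    i = np.arange(-half, half + 1)             # window indices, |i| <= h/2
    yi = y[t + i]                               # observations in the window

    def residuals(C):
        phi, Om, Om1, A, A1 = C
        phase = phi + Om * i + Om1 * i**2 / 2   # psi_2(i) C_phi
        amp = A + A1 * i                        # psi_1(i) C_A
        return yi - amp * np.sin(phase)         # y(t+i) - m(C, i)

    return least_squares(residuals, C_init).x
```

As with any nonlinear least squares fit of a sinusoid, the result depends strongly on the initial value C_init; in practice a coarse frequency estimate over the window (e.g. from a periodogram) can be used to initialise Ω.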
The term "high-order" has a double meaning in (6). Firstly, it means that the order of the algorithm (the number of estimated parameters) is larger than the number of the original parameters in question. In our case those are the three parameters ~o(t), f2(t), and A(t). The number of parameters estimated in (6) is five. Secondly, the bias of estimation error is proportional to the high (second) order derivatives given in (HI). The aim of the high-order polynomial approximation in (4) is firstly to obtain a high accuracy of estimation of q~(t), f2(t) and A(t), with the estimates of the derivatives as a secondary result, and then, to enable one to use the a priori information given in (H2).
3. Convergence of the estimates

In this section the strong convergence and asymptotic normality of the estimator (6) are considered. The expectation of (3) is of the form

E J_h(C) = I_h(C) + σ²,

where

I_h(C) = Σ_i ρ_h(i) (A(t + i) sin(φ(t + i)) - m(C, i))².
The following hypotheses are used.

(H3) A(t) = a₀ + a₁t,    φ(t) = b₀ + b₁t + b₂t²/2.    (7)

(H4) I_h(C) > 0 for C ∈ Q with ||C - C⁰(t)|| > δ, for all δ > 0 and h → ∞.
This last condition seems rather unrestrictive and natural in the sense that if it fails for some C ≠ C⁰(t), then one can hardly expect to distinguish effectively between C and C⁰(t) by means of the quadratic criterion.

(H5) Let C⁰(t) be an interior point of Q and φ(t) be such that

S_r(h) = (1/h^r) Σ_i ρ_h(i) \begin{pmatrix} sin(φ(t + i)) \\ cos(φ(t + i)) \end{pmatrix} i^r → 0 as h → ∞,    (8)

where r is an integer, 0 ≤ …
Theorem 1. As h → ∞, almost surely,

Ĉ(t, h) → C⁰(t),    J_h(Ĉ(t, h)) → σ².
Theorem 2. Under the assumptions of Theorem 1 and (H5), as h → ∞,

h^{1/2} D(h)(Ĉ(t, h) - C⁰(t)) converges in law to N(0, σ²W^{-1}),
where W = diag(W_φ, W_A) and:

1°. If in (H3) a₁ = 0, then

D(h) = diag(1, h, h², 1, h),    (9)

W_φ = A²(t) \begin{pmatrix} 1/2 & 0 & 1/48 \\ 0 & 1/24 & 0 \\ 1/48 & 0 & 1/640 \end{pmatrix},    W_φ^{-1} = A^{-2}(t) \begin{pmatrix} 9/2 & 0 & -60 \\ 0 & 24 & 0 \\ -60 & 0 & 1440 \end{pmatrix},    (10)

W_A = diag(1/2, 1/24),    W_A^{-1} = diag(2, 24).

2°. If in (H3) a₁ ≠ 0, then

D(h) = diag(h, h², h³, 1, h),    (11)

W_φ = (A^{(1)}(t))² \begin{pmatrix} 1/24 & 0 & 1/320 \\ 0 & 1/160 & 0 \\ 1/320 & 0 & 1/3584 \end{pmatrix},    W_φ^{-1} = (A^{(1)}(t))^{-2} \begin{pmatrix} 150 & 0 & -1680 \\ 0 & 160 & 0 \\ -1680 & 0 & 22400 \end{pmatrix},    (12)

W_A = diag(1/2, 1/24),    W_A^{-1} = diag(2, 24).
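The numerical entries in (10) and (12) are second-moment matrices of the polynomial bases over the rectangular window, with the cos² (respectively sin²) factor averaging to 1/2. The following fragment (an illustrative check, not from the paper) reproduces the numerical part of W_φ and W_φ^{-1} in (10), i.e. the case a₁ = 0 with the amplitude factor set to A(t) = 1, by direct summation.

```python
import numpy as np

# Numerical check of W_phi and its inverse in (10), case a1 = 0, A(t) = 1:
# W_phi ~ (1/h) * sum over |i| <= h/2 of (1/2) * psi(i/h)^T psi(i/h),
# where psi(u) = (1, u, u^2/2) and the factor 1/2 is the average of cos^2.
h = 100001
u = np.arange(-(h // 2), h // 2 + 1) / h           # normalised window points
psi = np.vstack([np.ones_like(u), u, u**2 / 2])    # basis (1, u, u^2/2)

W_phi = 0.5 * (psi @ psi.T) / h                    # averaged cos^2 factor = 1/2
print(np.round(W_phi, 5))                 # approx [[1/2, 0, 1/48], [0, 1/24, 0], [1/48, 0, 1/640]]
print(np.round(np.linalg.inv(W_phi), 1))  # approx [[4.5, 0, -60], [0, 24, 0], [-60, 0, 1440]]
```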
Hints about the proofs of the theorems. The proofs of Theorems 1 and 2 are based on Lemma 1 and Theorem 5 of Wu (1981). The uniform almost sure convergence for weighted random errors, which is necessary for Theorem 5 of Wu, is obtained by a slight extension of the technique given by Kundu (1993). The asymptotic covariance matrix of Ĉ(t, h) can be represented in the form

cov(C⁰(t) - Ĉ(t, h)) = σ² \begin{pmatrix} F_φφ & F_φA \\ F_φAᵀ & F_AA \end{pmatrix}^{-1},    (13)
where ph(i)" A2(t + i) cos2(q~(t + i)). ~be(i)xo2(i),
Fee = ~
i
FAA = Z ph(i) " sin2(q)(t + i))-~bl(i)TI~I(i), i ph(i)" A(t + i) sin(2~p(t + i)). ~2(i)T~b1(i).
1~
i
(14)
The hypothesis (H5) is important for the limit passage in (13) giving (9)-(12). The proofs of Theorem 1 and Theorem 2 are given in Katkovnik (1994b).

Comments on Theorems 1 and 2. (1) The asymptotic convergence rates which follow from (9)-(10) for the case a₁ = 0 (A(t) = a₀) are as follows:

|φ̂ - φ| = O(h^{-1/2}),  |Ω̂ - Ω| = O(h^{-3/2}),  |Ω̂^{(1)} - Ω^{(1)}| = O(h^{-5/2}),  |Â - A| = O(h^{-1/2}),  |Â^{(1)} - A^{(1)}| = O(h^{-3/2}),    (15)

where O(h^α) means that |O(h^α)/h^α| ≤ L for some constant L.
For comparison, for the classical least squares estimator of a harmonic with constant amplitude and constant frequency (three parameters φ, Ω and A),

|Â - A| = O(h^{-1/2}),  |Ω̂ - Ω| = O(h^{-3/2}),    (16)

and h^{1/2} D(h)(Ĉ(t, h) - C⁰(t)) converges in law to N(0, σ²W^{-1}) as h → ∞, where

W_φ^{-1} = A^{-2}(t) \begin{pmatrix} 2 & 0 \\ 0 & 24 \end{pmatrix},    W_A^{-1} = 2,    D(h) = diag(1, h, 1).    (17)
The comparison of (16)-(17) with (9)-(10) shows that the LPA and the least squares method give equal asymptotic variances for the estimators of the frequency and the amplitude. The remarkable advantage of the LPA over the least squares method is that these estimates are obtained by the LPA for a linear-in-time frequency. Furthermore, the LPA gives estimates of the first derivatives of the amplitude and the frequency.
(2) The asymptotic convergence rates which follow from (11)-(12) for the case a₁ ≠ 0 (A(t) = a₀ + a₁t) are as follows:

|φ̂ - φ| = O(h^{-3/2}),  |Ω̂ - Ω| = O(h^{-5/2}),  |Ω̂^{(1)}(t) - Ω^{(1)}(t)| = O(h^{-7/2}).    (18)
The linear-in-time amplitude vastly improves the order of the convergence rate for the estimators of the phase, the frequency and the first derivative of the frequency. The convergence rates for the amplitude and its first derivative turn out to be the same in the a₁ = 0 and a₁ ≠ 0 cases.
(3) Approximate confidence intervals for φ(t), Ω(t), Ω^{(1)}(t), A(t) and A^{(1)}(t) can be obtained in the usual way by using the consistent estimator of the covariance matrix obtained by substituting the estimators of A(t), A^{(1)}(t), and σ² into the formulas given in Theorem 2. For example, an approximate 95% confidence interval for Ω(t) is

(Ĉ₂(t, h) - 1.96 δ̂,  Ĉ₂(t, h) + 1.96 δ̂),    δ̂ = [24 σ̂² / (h³ Ĉ₄²(t, h))]^{1/2},    (19)

if a₁ = 0, and

(Ĉ₂(t, h) - 1.96 δ̂,  Ĉ₂(t, h) + 1.96 δ̂),    δ̂ = [160 σ̂² / (h⁵ Ĉ₅²(t, h))]^{1/2},    (20)

if a₁ ≠ 0. Here Ĉ₂(t, h) is the estimator of Ω(t), and Ĉ₄(t, h) and Ĉ₅(t, h) are the estimators of A(t) and A^{(1)}(t), respectively.
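For illustration (a sketch assuming the LPA estimates and a noise-variance estimate σ̂² are already available), the interval (19) or (20) can be computed as follows.

```python
import numpy as np

def omega_conf_int(C_hat, sigma2_hat, h, a1_zero=True):
    """Approximate 95% confidence interval for Omega(t), cf. (19)-(20).

    C_hat = (phi, Omega, Omega', A, A') is the LPA estimate at time t,
    sigma2_hat an estimate of the noise variance, h the window length.
    """
    Om = C_hat[1]
    if a1_zero:                                   # time-invariant amplitude, (19)
        s = np.sqrt(24.0 * sigma2_hat / (h**3 * C_hat[3]**2))
    else:                                         # linear-in-time amplitude, (20)
        s = np.sqrt(160.0 * sigma2_hat / (h**5 * C_hat[4]**2))
    return Om - 1.96 * s, Om + 1.96 * s
```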
It is clear that the choice between (19) and (20) is a problem connected with testing the hypothesis a₁ = A^{(1)}(t) = 0. An unusual feature here is the norming factor h^{5/2} for (Ĉ₂(t, h) - Ω(t)): the asymptotic variance of Ĉ₂(t, h) is proportional to h^{-5} in the case a₁ ≠ 0.
(4) The asymptotic accuracy seems to be independent of the features of the time-varying frequency; at least, the asymptotic covariance matrices (10) and (12) do not depend on the time-varying phase φ(t) = b₀ + b₁t + b₂t²/2. However, more subtle facts can be noted here. As a matter of fact, the asymptotic behaviour of S_r determines the speed with which the covariance matrix approaches the asymptotic limit values (10) and (12). It seems that this influence exists only as a second-order effect. Let b₂ = 0 and b₁ ≠ 0 in (7); then the sums (8) can be calculated explicitly and it is obvious that S_r(h) = O(1/h), which is sufficient for (H5). It was proved by Katkovnik (1994b, Lemma 3) that S_r can actually approach zero faster than O(1/h). Let b₂ ≠ 0 and the sampling period Δ be small, Δ → 0, so that all sums in the formulas given above can be transformed into integrals over the continuous time interval [-T/2, T/2], T = hΔ. Then (8) takes the form

S_r = (1/T^{r+1}) ∫_{-T/2}^{T/2} sin(φ(t + u)) u^r du → 0 as h → ∞,

and if b₂ > 0 then S_r is of the order O(1/T²) for any integer r > 0.
(5) The normality hypothesis is not of prime importance in these theorems. The assumption was made for the sake of simplicity only. As a matter of fact, it was used only to obtain the exponential inequality for weighted sums of the random ε(t). It is known that similar inequalities hold under different hypotheses on the random values ε(t). For example, it is enough to assume that E ε(t) = 0, E ε²(t) = σ², and |ε(t)| ≤ …
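To see why S_r(h) = O(1/h) in the linear-phase case b₂ = 0, b₁ ≠ 0, one can integrate the continuous-time form of (8) by parts once (a sketch added here for the reader; the argument for the cosine component is identical):

∫_{-T/2}^{T/2} u^r sin(b₀ + b₁(t + u)) du = [-(u^r/b₁) cos(b₀ + b₁(t + u))]_{-T/2}^{T/2} + (r/b₁) ∫_{-T/2}^{T/2} u^{r-1} cos(b₀ + b₁(t + u)) du = O(T^r),

so that S_r = T^{-(r+1)} O(T^r) = O(1/T), which corresponds to the O(1/h) rate stated above.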
4. Bias estimates

Consider the general case when A(t) and φ(t) are not polynomials. The problem of the bias and the usual bias-variance trade-off is studied in this section.

Theorem 3. Let Ĉ(t, h) be a solution of (6), and let (H1), (H2), (H4), (H5) hold. Let also

L_Ω² h⁴,  L_A² h²,  L_Ω²/σ,  L_A²/σ → 0 as h → ∞.

Then:
1. D(h)(Ĉ(t, h) - C⁰(t)) converges in law to N(b(h), σ²W^{-1}/h), where D(h) and W are given by (9)-(10) if A^{(1)}(t) = 0 and by (11)-(12) if A^{(1)}(t) ≠ 0, respectively, and the bias vector b(h) ∈ R⁵ has the orders

b(h) = O(h³)L_Ω + O(h²)L_A   if A^{(1)}(t) = 0,
b(h) = O(h⁴)L_Ω + O(h²)L_A   if A^{(1)}(t) ≠ 0.    (21)
2. The asymptotic mean square errors are as follows:
(a) If A^{(1)}(t) = 0,

E[(D_s(h)(Ĉ_s(t, h) - C_s⁰(t)))²] = O( σ²/(h A²(t)) + h⁶L_Ω² + h⁴L_A² ),

where D(h) is given by (9), s = 1, …, 5.
(b) If A^{(1)}(t) ≠ 0,

E[(D_s(h)(Ĉ_s(t, h) - C_s⁰(t)))²] = O( σ²/(h (A^{(1)}(t))²) + h⁸L_Ω² + h⁴L_A² ),    (22)

where D(h) is given by (11), s = 1, …, 5.
The justifications of the limit passages in Theorem 3 and Theorem 2 are quite similar. The proof of Theorem 3 is given in Katkovnik (1994b).

Comments on Theorem 3. (1) This theorem determines the MSE of the LPA. The optimal choice of h gives an optimal trade-off between the bias and the noise reduction. Let L_A = 0 and L_Ω ≠ 0; then the least asymptotic MSEs are as follows:
(a) If A^{(1)}(t) = 0,

E(φ - φ̂)² = O(1/h*),  E(Ω - Ω̂)² = O(1/h*³),  E(A - Â)² = O(1/h*),  E(A^{(1)} - Â^{(1)})² = O(1/h*³),  E(Ω^{(1)} - Ω̂^{(1)})² = O(1/h*⁵),    (23)
where

h* = O( (σ/(A(t) L_Ω))^{2/7} ).    (24)
(b) If A^{(1)}(t) ≠ 0,

E(φ - φ̂)² = O(1/h*³),  E(Ω - Ω̂)² = O(1/h*⁵),  E(A - Â)² = O(1/h*),  E(A^{(1)} - Â^{(1)})² = O(1/h*³),  E(Ω^{(1)} - Ω̂^{(1)})² = O(1/h*⁷),    (25)

where

h* = O( (σ/(A^{(1)}(t) L_Ω))^{2/9} ).    (26)
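The order of h* in (26) can be read off from the bias-variance balance in Theorem 3 (a sketch under the assumption that the frequency component dominates the choice of h): dividing the bound in (22) for the frequency component (s = 2, D₂(h) = h²) by h⁴, and setting L_A = 0, gives

E(Ω̂ - Ω)² = O( σ²/(h⁵ (A^{(1)}(t))²) + h⁴ L_Ω² ),    σ²/(h⁵ (A^{(1)}(t))²) ≍ h⁴ L_Ω²  ⟹  h*⁹ ≍ σ²/((A^{(1)}(t))² L_Ω²),

i.e. h* ≍ (σ/(A^{(1)}(t) L_Ω))^{2/9}; the same balance in the case A^{(1)}(t) = 0 gives the exponent 2/7 in (24).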
(3) Formula (21) shows that the bias of the errors is proportional to the second derivatives of the frequency and the amplitude. Degrees p_A > 1 and p_φ > 2 can also be applied in (4). It can be seen then that the bias is proportional to the corresponding higher-order derivatives A^{(p_A+1)}(t) and Ω^{(p_φ)}(t). The higher-order local polynomial models can improve the accuracy of estimation only if the absolute values of these derivatives of Ω(t) and A(t) are small enough. This information about Ω(t) and A(t) is basic for a reasonable choice of the powers in the local polynomial models.
5. Conclusions

On the basis of the LPA approach, nonlinear nonparametric high-order kernel estimates have been developed for the estimation of rapidly time-varying frequency and amplitude. Convergence and asymptotic normality results are presented for these estimates.
Acknowledgements I would like to thank the anonymous referee for the helpful comment.
References

Boashash, B., 1992. Estimating and interpreting the instantaneous frequency of a signal - Part 1: Fundamentals and Part 2: Algorithms and applications. Proc. IEEE 80 (4), 520-568.
Brown, R.G., 1963. Smoothing, Forecasting and Prediction of Discrete Time Series. Prentice-Hall, Englewood Cliffs, NJ.
Cleveland, W.S., 1979. Robust locally-weighted regression and smoothing scatterplots. J. Amer. Statist. Assoc. 74, 829-836.
Cohen, L., 1989. Time-frequency distributions - a review. Proc. IEEE 77, 941-981.
Hannan, E.J., 1973. The estimation of frequency. J. Appl. Probab. 10, 510-519.
Hlawatsch, F., Boudreaux-Bartels, G.F., 1992. Linear and quadratic time-frequency signal representations. IEEE Signal Process. Mag. 9, 21-67.
Huang, D., Hannan, E.J., 1994. Estimating time-varying frequency, submitted.
Katkovnik, V., 1971. Homogeneous integral averaging operators obtained by the method of least squares, Part 1. Autom. Remote Control 32 (11), 1767-1775.
Katkovnik, V., 1979. Linear and nonlinear methods of nonparametric regression analysis. Soviet J. Autom. Inform. Sci. 5, 25-34.
Katkovnik, V., 1985. Nonparametric Identification and Smoothing of Data (Local Approximation Method). Nauka, Moscow (in Russian).
Katkovnik, V., 1993. High-order local approximation adaptive control of rapidly time-varying dynamics. In: Preprints 12th IFAC Congress, Sydney, Australia, vol. 2, pp. 21-26.
Katkovnik, V., 1994a. Identification of physical time-varying parameters. In: Blanke, M., Söderström, T. (Eds.), Preprints 10th IFAC Symp. on System Identification, Copenhagen, Denmark, vol. 2, pp. 349-354.
Katkovnik, V., 1994b. Local polynomial approximation of rapidly time-varying frequency. Research Report 94/1, May, UNISA, RSA.
Katkovnik, V., 1997. Nonparametric estimation of instantaneous frequency. IEEE Trans. Inform. Theory 43, 183-189.
Kundu, D., 1993. Asymptotic theory of least squares estimator of a particular nonlinear regression model. Statist. Probab. Lett. 18, 13-17.
Peleg, S., Porat, B., 1991. The Cramer-Rao lower bound for signals with constant amplitude and polynomial phase. IEEE Trans. Acoust. Speech Signal Process. 39 (3), 749-752.
Stone, C.J., 1977. Consistent nonparametric regression. Ann. Statist. 5, 595-645.
Walker, A.M., 1971. On the estimation of a harmonic component in a time series with stationary independent residuals. Biometrika 58 (1), 21-36.
Wu, C.F., 1981. Asymptotic theory of nonlinear least squares estimation. Ann. Statist. 9 (3), 501-513.