Journal of Statistical Planning and Inference 68 (1998) 359-371
Nonlinear time series with long memory: a model for stochastic volatility¹

Peter M. Robinson*, Paolo Zaffaroni

Department of Economics, London School of Economics, Houghton Street, London WC2A 2AE, UK
Received 18 September 1996; received in revised form 25 October 1996; accepted 28 October 1996
Abstract

We introduce a nonlinear model of stochastic volatility within the class of "product type" models. It allows different degrees of dependence for the "raw" series and for the "squared" series, for instance implying weak dependence in the former and long memory in the latter. We discuss its main statistical properties with respect to the common set of stylized facts characterizing the dynamics of financial asset return series, and apply it to several series of asset returns. © 1998 Elsevier Science B.V. All rights reserved.
1. Introduction

The focus of attention in much nonlinear time series modelling has been the form of the nonlinear dynamics, rather than memory properties. In particular many of the stationary nonlinear time series models that have been studied, such as those of nonlinear autoregressive form, satisfy some more or less strong form of mixing condition (implying, for example, that the mean of n consecutive observations, normalized by n^{1/2}, is asymptotically normal). Short memory properties do not usually seem uppermost in the intentions of nonlinear modellers, and are frequently not checked in empirical applications. In fact some empirical evidence seems to contradict the short memory hypothesis. Financial data, such as stock returns and exchange rate returns, can have small sample autocorrelations, whereas certain instantaneous nonlinear functions, such as squares, can have sample autocorrelations which die away very slowly, possibly consistent with the notion of long memory, where theoretical autocovariances are not summable (see e.g. Ding et al., 1993). A process which is white noise but whose squares are autocorrelated cannot be modelled linearly (Robinson and Zaffaroni, 1995). The financial data in question are also apt to exhibit greater than Gaussian kurtosis.

* Corresponding author. Tel.: +44 0171 955 7509; fax: +44 0171 831 1840.
¹ The research of both authors was funded by ESRC Grant no. R000235892. The second author's research was also funded by EU Human Capital and Mobility Grant no. ERBCHBITC941742.
The ARCH type of models first introduced by Engle (1982) accounts for these phenomena, but with short memory autocorrelation in the squares. Robinson (1991) introduced an extended form of GARCH model which can produce long memory in the squares and used it as an alternative in testing for no-ARCH. This type of model was then further developed by Baillie et al. (1994). A difficulty with it is the lack of a rigorous asymptotic theory for estimates of the parameters explaining the long memory, and the apparent difficulty of deriving such a theory; indeed the available asymptotic theory for short memory GARCH models is limited to very simple models. This difficulty was a partial motivation for the introduction by Robinson and Zaffaroni (1995) of the nonlinear moving average model

x_t = ε_t h_t,   (1)

where

h_t = μ + Σ_{i=1}^∞ α_i ε_{t−i},   Σ_{i=1}^∞ α_i² < ∞,   (2)
and {ε_t} is a sequence of independent identically distributed (i.i.d.) random variables with finite fourth moment (for example Gaussian white noise). This model partially extends a short memory nonlinear moving average of Robinson (1977). Under (1) and (2) x_t is white noise, but if the α_j decay suitably slowly (and are certainly not absolutely summable) h_t has long memory, and the squares y_t = x_t² may also have long memory. Robinson and Zaffaroni (1995) considered a number of properties of this model. They also considered statistical inference in the case of a finite parameterization of the α_j. Maximum likelihood estimation (with Gaussian ε_t, for example) is possible, but the likelihood and scores can only be computed recursively, and expensively, while asymptotic theory for the parameter estimates would seem very difficult to derive. Instead Robinson and Zaffaroni (1995) stressed Gaussian estimates, under the fiction that the y_t are Gaussian. This enables an asymptotic theory for the parameter estimates, although it does not follow from usual limit laws for short or long memory variates and involves complicated formulae for higher order moments. An alternative approach is suggested by the stochastic volatility model of Taylor (1986). We can replace (1) by
x_t = η_t h_t,   (3)
with h_t given in (2), but with {η_t} an i.i.d. sequence independent of {ε_t}. There is thus a decoupling of the two factors, and (3) can be called a "two-shock" model, as distinct from the "one-shock" case (1). In the stochastic volatility literature h_t and y_t have short memory, the weights α_j in (2) being assumed to be at least summable (so that they could be the coefficients in the moving average representation of a stationary autoregressive moving average sequence), but, as did Robinson and Zaffaroni (1995) with the model (1), we can choose the α_j to impart long memory to h_t and y_t. An advantage of (3) over (1) is that the independence of the two factors leads to simplification
in moment formulae, and thus should also simplify asymptotic theory for Gaussian estimation relative to (1). An alternative two-shock model is in Harvey (1993). In the following section we describe a model that is actually rather more general than (3), in two respects, and derive its memory properties and kurtosis. The greater generality is due in part to allowing arbitrary memory in the raw x_t, and in part to the fact that we do not impose linearity anywhere, so far as consideration of autocorrelation properties is concerned. We make use of some general results on the second order properties of certain nonlinear functions, which are stated in an Appendix. Section 3 specializes to the case of linear processes, because these are the likely vehicles for parametric modelling. Section 4 discusses Gaussian estimation of a parametric model. Section 5 estimates a simple version of our model from empirical data, with comparison to the model of Robinson and Zaffaroni (1995).
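As a numerical illustration of the stylized facts the two-shock model (3) is designed to capture (ours, not part of the original analysis; the AR(1)-type weights α_i = φ^i and all parameter values are arbitrary choices), the model can be simulated and the sample autocorrelations of the levels and of the squares compared:

```python
import numpy as np

rng = np.random.default_rng(0)
T, phi, mu = 200_000, 0.8, 1.0

eps = rng.standard_normal(T)   # shock driving the volatility process h_t
eta = rng.standard_normal(T)   # independent second shock eta_t

# h_t = mu + sum_{i>=0} phi**i * eps_{t-i}: short-memory (AR(1)-type) weights,
# computed recursively; long memory would require hyperbolically decaying weights.
h = np.empty(T)
s = 0.0
for t in range(T):
    s = phi * s + eps[t]
    h[t] = mu + s

x = eta * h        # two-shock model (3): x_t = eta_t * h_t
y = x ** 2

def acf1(z):
    """Sample lag-1 autocorrelation."""
    zc = z - z.mean()
    return (zc[:-1] * zc[1:]).mean() / zc.var()

r_x, r_y = acf1(x), acf1(y)
print(r_x, r_y)    # levels roughly uncorrelated, squares clearly autocorrelated
```

With summable weights the squares here are only weakly persistent; the point of the model is that hyperbolically decaying α_i would give y_t long memory while x_t stays white noise.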
2. The model: definitions and statistical properties

We extend (3) to

x_t = g_t + η_t h_t,

where the right-hand side variates obey the following condition.

Condition A. The process {η_t} is serially uncorrelated, with E(η_t) = E(η_t³) = 0, var(η_t) = σ_ηη and finite fourth cumulant κ_ηη. The bivariate process {g_t, h_t} is independent of {η_t} and fourth order stationary with zero joint third cumulants, and for a_t, b_t ∈ {g_t, h_t}, including the case a_t = b_t, we define

μ_a = E(a_t),   γ_ab(j) = cov(a_0, b_j),   j = 0, ±1, ...,

κ_ab(j) = cum₄(a_0, a_0, b_j, b_j),   j = 0, 1, ....

We also introduce a stronger condition, which holds under Gaussianity.

Condition B. Condition A holds, κ_ηη = 0, and for a_t, b_t ∈ {g_t, h_t} and j = 0, 1, ..., κ_ab(j) = 0.
Theorem 1. Under Condition A, for all u = 0, 1, ...,

(i) γ_xx(u) = γ_gg(u) + σ_ηη(μ_h² + γ_hh(0)) δ(u, 0),

(ii) γ_yy(u) = 2[γ_gg²(u) + ½κ_gg(u) + σ_ηη(γ_gh²(u) + γ_hg²(u) + ½κ_gh(u) + ½κ_hg(u)) + 2σ_ηη μ_g μ_h (γ_gh(u) + γ_hg(u)) + σ_ηη²(γ_hh²(u) + ½κ_hh(u) + 2μ_h²γ_hh(u)) + 2μ_g²γ_gg(u)] + v_yy δ(u, 0),

where

v_yy = (κ_ηη + 2σ_ηη²)(κ_hh + 2γ_hh²(0) + 4μ_h²γ_hh(0) + (μ_h² + γ_hh(0))²) + 8σ_ηη μ_g μ_h γ_gh(0)
     + 4σ_ηη(κ_gh + γ_gg(0)γ_hh(0) + 2γ_gh²(0) + μ_g²(γ_hh(0) + μ_h²) + μ_h²γ_gg(0) + 2μ_g μ_h γ_gh(0)),

κ_ab = κ_ab(0), and δ(i, j) denotes the Kronecker delta.
Proof. (i) Immediate, given the independence of {g_t, h_t} and {η_t}, and observing that μ_x = μ_g.

(ii) Writing x_t − μ_x = (g_t − μ_g) + η_t h_t, we have

γ_yy(u) = cov((x_0 − μ_x)², (x_u − μ_x)²) + 2μ_x[cov((x_0 − μ_x)², x_u − μ_x) + cov(x_0 − μ_x, (x_u − μ_x)²)] + 4μ_x²γ_xx(u),

the middle terms equalling 4σ_ηη μ_g μ_h (γ_gh(u) + γ_hg(u)) for u ≥ 1 and 24σ_ηη μ_g μ_h γ_gh(0) for u = 0. By Lemma A.2 in the Appendix the first term on the right-hand side becomes, for u ≥ 1,

cov((g_0 − μ_g + η_0 h_0)², (g_u − μ_g + η_u h_u)²)
= cov((g_0 − μ_g)² + 2(g_0 − μ_g)η_0 h_0 + η_0²h_0², (g_u − μ_g)² + 2(g_u − μ_g)η_u h_u + η_u²h_u²)
= a + σ_ηη(b + c) + σ_ηη² d,

where a = cov((g_0 − μ_g)², (g_u − μ_g)²), b = cov((g_0 − μ_g)², h_u²), c = cov(h_0², (g_u − μ_g)²) and d = cov(h_0², h_u²). From Lemmas A.1 and A.3 in the Appendix,

a = 2γ_gg²(u) + κ_gg(u),   b = 2γ_gh²(u) + κ_gh(u),   c = 2γ_hg²(u) + κ_hg(u),   d = κ_hh(u) + 2γ_hh²(u) + 4μ_h²γ_hh(u).

The result follows from (i) and routine computation, the additional u = 0 contributions being collected in v_yy. □

Under Condition A the autocovariance properties of x_t are inherited from those of g_t. Moreover if η_t and g_t are also martingale differences, so is x_t. The different possibilities allowed by the model in terms of degree of dependence of the squared process y_t are indicated as follows. Assuming that Condition B and the mild ergodicity conditions

γ_gg(u) → 0,
γ_hh(u) → 0,   as u → ∞,

hold, we deduce that, as u → ∞:

(i) When μ_g = 0,
γ_yy(u) ∼ 2[γ_gg²(u) + σ_ηη(γ_gh²(u) + γ_hg²(u)) + 2σ_ηη²μ_h²γ_hh(u)].

(ii) When μ_h = 0,
γ_yy(u) ∼ 2[2μ_g²γ_gg(u) + σ_ηη(γ_gh²(u) + γ_hg²(u)) + σ_ηη²γ_hh²(u)].

(iii) When μ_g = μ_h = 0,
γ_yy(u) ∼ 2[γ_gg²(u) + σ_ηη(γ_gh²(u) + γ_hg²(u)) + σ_ηη²γ_hh²(u)].

(iv) Otherwise,
γ_yy(u) ∼ 2[2μ_g²γ_gg(u) + σ_ηη(γ_gh²(u) + γ_hg²(u)) + 2σ_ηη²μ_h²γ_hh(u)].

Returning to the general setting, we can apply these results to the circumstances envisaged in Section 1, to achieve a white noise or short memory x_t and a long memory y_t. Assume that, for j → ∞,
γ_hh(j) ∼ K_h |j|^{2d−1},   γ_gg(j) = o(|j|^{4d−2}),

with 0 < d < ½ and 0 < K_h < ∞. Here h_t has long memory (with memory parameter d) because the γ_hh(j) are not summable. Clearly g_t has shorter memory than y_t for all d ∈ (0, ½), and for d ∈ (0, ¼) it does not have long memory because the γ_gg(j) are summable, while long memory in g_t is a possibility when ¼ < d < ½. Assume also that g_t and h_t are uncorrelated, so that γ_gh(j) = 0, or more generally that γ_gh(j) = o(|j|^{2d−1}) as |j| → ∞. Then from (i)–(iv) we deduce that asymptotically it does not matter whether or not μ_g is zero, and
γ_yy(j) ∼ 4σ_ηη²μ_h²γ_hh(j) ∼ 4K_h μ_h² σ_ηη² |j|^{2d−1},   μ_h ≠ 0,
γ_yy(j) ∼ 2σ_ηη²γ_hh²(j) ∼ 2K_h² σ_ηη² |j|^{4d−2},   μ_h = 0.   (4)

Thus when μ_h ≠ 0, y_t exhibits long memory for all d ∈ (0, ½), while when μ_h = 0 it does so when d ∈ (¼, ½). We can also give an expression for the coefficient of kurtosis of the process x_t.
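The exact autocovariance formula in Theorem 1(ii) can be sanity-checked by simulation in a simple special case (our own numerical check; the MA(1) volatility component and all parameter values are arbitrary). With g_t = ε_t and h_t = μ_h + ε_t + θε_{t−1}, ε_t and η_t i.i.d. standard Gaussian, all fourth cumulants vanish, μ_g = 0, γ_gg(1) = γ_hg(1) = 0 and γ_gh(1) = γ_hh(1) = θ, so Theorem 1(ii) gives γ_yy(1) = 2[θ² + θ² + 2μ_h²θ] = 4θ(θ + μ_h²):

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta, mu_h = 4_000_000, 0.5, 1.0

eps = rng.standard_normal(n + 1)
eta = rng.standard_normal(n)

g = eps[1:]                               # g_t = eps_t (white noise, mu_g = 0)
h = mu_h + eps[1:] + theta * eps[:-1]     # h_t = mu_h + eps_t + theta*eps_{t-1}
x = g + eta * h                           # x_t = g_t + eta_t h_t
y = x ** 2

# Sample lag-1 autocovariance of the squares
yc = y - y.mean()
gamma_yy_1 = (yc[:-1] * yc[1:]).mean()

# Theorem 1(ii) with Gaussian shocks (fourth cumulants zero, sigma_etaeta = 1):
# gamma_yy(1) = 2*(theta**2 + theta**2 + 2*mu_h**2*theta) = 4*theta*(theta + mu_h**2)
theory = 4 * theta * (theta + mu_h ** 2)
print(gamma_yy_1, theory)
```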
Theorem 2. Under Condition B,

kurt(x_t) = 3 + [12σ_ηη γ_gh²(0) + 6σ_ηη² γ_hh(0)(γ_hh(0) + 2μ_h²)] / (γ_gg(0) + σ_ηη(μ_h² + γ_hh(0)))².
Proof. Writing

kurt(x_t) = var((x_t − μ_x)²)/(var(x_t))² + 1,

we evaluate the numerator and the denominator separately. By direct calculation using Lemmas A.2 and A.3 in the Appendix we get

var(x_t) = γ_gg(0) + σ_ηη(γ_hh(0) + μ_h²),

var((x_t − μ_x)²) = var((g_t − μ_g)² + η_t²h_t² + 2η_t h_t(g_t − μ_g))
= var((g_t − μ_g)²) + 2 cov((g_t − μ_g)², η_t²h_t²) + var(η_t²h_t²) + 4 var(η_t h_t(g_t − μ_g))
= a + 2b + c + 4d,

where

a = 2γ_gg²(0),
b = 2σ_ηη γ_gh²(0),
c = 3σ_ηη²(4μ_h²γ_hh(0) + 2γ_hh²(0)) + 2σ_ηη²(γ_hh(0) + μ_h²)²,
d = σ_ηη(γ_hh(0)γ_gg(0) + μ_h²γ_gg(0) + 2γ_gh²(0)).

The result follows by straightforward manipulation. □
Because the second term on the right-hand side in Theorem 2 is positive, x_t has fatter tails than a Gaussian process, but as μ_h² → ∞ Gaussian kurtosis is approached.
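Theorem 2 can likewise be verified by Monte Carlo in a degenerate special case (our own check, with arbitrary parameter values): take g_t = bε_t and h_t = μ_h + aε_t with ε_t, η_t i.i.d. standard Gaussian, so that γ_gg(0) = b², γ_hh(0) = a², γ_gh(0) = ab and σ_ηη = 1:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4_000_000
a, b, mu_h = 1.0, 1.0, 1.0

eps = rng.standard_normal(n)
eta = rng.standard_normal(n)

g = b * eps               # gamma_gg(0) = b**2
h = mu_h + a * eps        # gamma_hh(0) = a**2, gamma_gh(0) = a*b
x = g + eta * h           # model x_t = g_t + eta_t h_t, sigma_etaeta = 1

xc = x - x.mean()
sample_kurt = (xc ** 4).mean() / (xc ** 2).mean() ** 2

# Theorem 2 with these (Gaussian, hence Condition B) ingredients:
num = 12 * (a * b) ** 2 + 6 * a ** 2 * (a ** 2 + 2 * mu_h ** 2)
den = (b ** 2 + mu_h ** 2 + a ** 2) ** 2
theory = 3 + num / den
print(sample_kurt, theory)
```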
We shall now derive the power spectra of the x_t and y_t processes, assuming that {g_t, h_t} has an absolutely continuous spectral distribution function. We denote by f_ab(λ) the cross-spectral density of processes a_t, b_t, satisfying

γ_ab(u) = ∫_{−π}^{π} f_ab(ω) e^{iuω} dω,   u = 0, ±1, ....   (5)
Theorem 3. For any −π < λ ≤ π:

(i) under Condition A,
f_xx(λ) = f_gg(λ) + σ_ηη(μ_h² + γ_hh(0))/(2π);

(ii) under Condition B,
f_yy(λ) = 2[∫_{−π}^{π} f_gg(μ)f_gg(λ−μ) dμ + 2σ_ηη ∫_{−π}^{π} Re(f_gh(μ)f_gh(λ−μ)) dμ + 2μ_g² f_gg(λ) + 4σ_ηη μ_g μ_h Re f_gh(λ) + σ_ηη²(∫_{−π}^{π} f_hh(μ)f_hh(λ−μ) dμ + 2μ_h² f_hh(λ))] + v_yy/(2π).
Proof. (i) Follows directly from Theorem 1. (ii) In the expression for γ_yy(u) in Theorem 1(ii), with all fourth-order cumulant terms set to zero, substitute using (5) to obtain

γ_yy(u) = 2[∫∫ f_gg(λ)f_gg(ω) e^{iu(λ+ω)} dλ dω + 2μ_g² ∫ f_gg(λ) e^{iuλ} dλ
+ σ_ηη ∫∫ (f_gh(λ)f_gh(ω) + f_hg(λ)f_hg(ω)) e^{iu(λ+ω)} dλ dω + 2σ_ηη μ_g μ_h ∫ (f_gh(λ) + f_hg(λ)) e^{iuλ} dλ
+ σ_ηη²(∫∫ f_hh(λ)f_hh(ω) e^{iu(λ+ω)} dλ dω + 2μ_h² ∫ f_hh(λ) e^{iuλ} dλ)] + v_yy δ(u, 0),

all integrals extending over (−π, π]. Now make the change of variables from ω to μ = ω + λ and equate the integrand with respect to μ to f_yy(μ) e^{iuμ}, in view of (5). □
3. Linear g_t and h_t

For the purpose of finite-parameter modelling it is likely that we will specify g_t and h_t to be linear processes, as in:

Condition C. We have

g_t = μ_g + Σ_{i=0}^∞ β_i ε_{t−i},   h_t = μ_h + Σ_{i=0}^∞ α_i ε_{t−i},

where the coefficients {α_i} and {β_i} at minimum satisfy

Σ_{i=0}^∞ α_i² < ∞,   Σ_{i=0}^∞ β_i² < ∞,

and

E(ε_t) = 0,   t = 0, ±1, ...,

E(ε_t ε_s) = σ_εε,  s = t;   = 0,  s ≠ t,

E(ε_s ε_t ε_u) = 0   for all s, t, u,

E(ε_s ε_t ε_u ε_v) = κ_εε + 3σ_εε²,  s = t = u = v;
                   = σ_εε²,  s = t ≠ u = v, s = u ≠ t = v, or s = v ≠ t = u;
                   = 0,  otherwise.   (6)

Thus ε_t behaves as an i.i.d. sequence up to fourth moments. Under Condition C, h_t and g_t satisfy Condition A.
Corollary 1. Under Condition C, for u = 0, 1, ...,

(i) γ_xx(u) = σ_εε Σ_{i=0}^∞ β_i β_{i+u} + σ_ηη(μ_h² + σ_εε Σ_{i=0}^∞ α_i²) δ(u, 0),

(ii) γ_yy(u) = κ_εε Σ_{i=0}^∞ β_i²β_{i+u}² + 2σ_εε²(Σ_{i=0}^∞ β_i β_{i+u})² + 4μ_g²σ_εε Σ_{i=0}^∞ β_i β_{i+u}
+ σ_ηη[κ_εε Σ_{i=0}^∞ (β_i²α_{i+u}² + α_i²β_{i+u}²) + 2σ_εε²((Σ_{i=0}^∞ β_i α_{i+u})² + (Σ_{i=0}^∞ α_i β_{i+u})²) + 4μ_g μ_h σ_εε Σ_{i=0}^∞ (β_i α_{i+u} + α_i β_{i+u})]
+ σ_ηη²[κ_εε Σ_{i=0}^∞ α_i²α_{i+u}² + 2σ_εε²(Σ_{i=0}^∞ α_i α_{i+u})² + 4μ_h²σ_εε Σ_{i=0}^∞ α_i α_{i+u}]
+ v_yy δ(u, 0),

with

v_yy = (κ_ηη + 2σ_ηη²)(κ_εε Σ_{α⁴} + 2σ_εε²(Σ_{α²})² + 4μ_h²σ_εε Σ_{α²} + (μ_h² + σ_εε Σ_{α²})²) + 8σ_ηη μ_g μ_h σ_εε Σ_{αβ}
+ 4σ_ηη(κ_εε Σ_{α²β²} + σ_εε² Σ_{α²} Σ_{β²} + 2σ_εε²(Σ_{αβ})² + μ_g²(σ_εε Σ_{α²} + μ_h²) + μ_h²σ_εε Σ_{β²} + 2μ_g μ_h σ_εε Σ_{αβ}),

where Σ_c denotes Σ_{i=0}^∞ c_i for any sequence {c_i}.
It follows that when β_i = 0, i ≥ 1, the raw process x_t is white noise, but not a martingale difference sequence. To achieve the latter property we would require ε_t to be a martingale difference sequence, a stronger condition than (6).

Corollary 2. Under Condition C,

kurt(x_t) = 3 + [12σ_ηη σ_εε²(Σ_{αβ})² + 6σ_ηη²σ_εε Σ_{α²}(σ_εε Σ_{α²} + 2μ_h²) + κ_εε(Σ_{β⁴} + 6σ_ηη Σ_{α²β²} + 3σ_ηη² Σ_{α⁴}) + κ_ηη(κ_εε Σ_{α⁴} + 2σ_εε²(Σ_{α²})² + 4μ_h²σ_εε Σ_{α²} + (μ_h² + σ_εε Σ_{α²})²)] / (σ_εε Σ_{β²} + σ_ηη(μ_h² + σ_εε Σ_{α²}))².
Again note that with κ_ηη = κ_εε = 0 the coefficient of kurtosis decreases to 3 as μ_h² → ∞. Denoting the transfer functions of the α_i and β_i coefficients, respectively, by

α(λ) = Σ_{j=0}^∞ α_j e^{ijλ},   β(λ) = Σ_{j=0}^∞ β_j e^{ijλ},
we obtain the power spectra of the raw process x_t and the squared process y_t.

Corollary 3. Under Condition C,

(i) f_xx(λ) = (σ_εε/2π)|β(λ)|² + (σ_ηη/2π)(μ_h² + σ_εε Σ_{α²}),

and if also κ_εε = 0,

(ii) f_yy(λ) = (σ_εε²/2π²)(∫_{−π}^{π} |β(μ)β(λ−μ)|² dμ + 2σ_ηη ∫_{−π}^{π} Re(α(μ)β(−μ)α(λ−μ)β(μ−λ)) dμ + σ_ηη² ∫_{−π}^{π} |α(μ)α(λ−μ)|² dμ)
+ (2σ_εε/π)(μ_g²|β(λ)|² + σ_ηη μ_g μ_h Re(β(λ)α(−λ)) + σ_ηη²μ_h²|α(λ)|²) + v_yy/(2π).
4. Estimation
We propose as a simple expedient a pseudo maximum likelihood estimate (PMLE) based on a Gaussian likelihood, as if y_t were Gaussian. Of course, from the assumptions made on the distribution of the unobservable processes g_t, η_t, h_t, the squared process y_t cannot be Gaussian, being always non-negative. Indeed, x_t can also not be Gaussian even assuming g_t, h_t, η_t to be so. However, given the latent structure of the model there is no simple way to invert the model and write down the true likelihood on the basis of, say, Gaussian g_t, h_t, η_t. In the linear set-up of the previous section introduce functions α(λ; θ), β(λ; θ) of λ and a p × 1 vector θ, and define f_yy(λ; ψ) by the formula for f_yy(λ) in Corollary 3 with α(λ), β(λ) replaced by α(λ; θ), β(λ; θ), for ψ = (μ_g², μ_h², σ_ηη, θ′)′. Denote by ψ_0 the true value of ψ, so that f_yy(λ) = f_yy(λ; ψ_0).
Introducing the periodogram based on T observations,

I(λ) = (2πT)^{−1} |Σ_{t=1}^{T} y_t e^{itλ}|²,

and denoting λ_j = 2πj/T, a discrete version of Whittle's Gaussian pseudo log-likelihood is

Q_T(ψ) = T^{−1} Σ_{j=1}^{T−1} (ln f_yy(λ_j; ψ) + I(λ_j)/f_yy(λ_j; ψ)).   (7)
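A minimal sketch of the objective (7) follows (ours, not the authors' code; the AR(1) spectral density and the crude grid search are placeholders for the model's f_yy(λ; ψ) and a proper numerical optimizer):

```python
import numpy as np

def periodogram(y):
    """I(lambda_j) = |sum_t y_t exp(i t lambda_j)|^2 / (2 pi T), j = 1, ..., T-1."""
    T = len(y)
    dft = np.fft.fft(y)               # dft[j] = sum_t y_t exp(-2*pi*i*j*t/T)
    return (np.abs(dft) ** 2 / (2 * np.pi * T))[1:]

def whittle_objective(psi, y, f):
    """Discrete Whittle pseudo log-likelihood Q_T(psi) of Eq. (7);
    f(lam, psi) is the parametric spectral density."""
    T = len(y)
    lam = 2 * np.pi * np.arange(1, T) / T   # Fourier frequencies lambda_j
    fj = f(lam, psi)
    return np.mean(np.log(fj) + periodogram(y) / fj)

# Illustration with a hypothetical AR(1) spectral density (not the model's f_yy):
def f_ar1(lam, psi):
    rho, s2 = psi
    return s2 / (2 * np.pi * (1 - 2 * rho * np.cos(lam) + rho ** 2))

rng = np.random.default_rng(3)
e = rng.standard_normal(5000)
z = np.empty_like(e)
z[0] = e[0]
for t in range(1, len(e)):
    z[t] = 0.5 * z[t - 1] + e[t]      # AR(1) with rho = 0.5

# crude grid search over rho (innovation variance held at its true value)
grid = np.linspace(0.05, 0.95, 19)
rho_hat = grid[np.argmin([whittle_objective((r, 1.0), z, f_ar1) for r in grid])]
print(rho_hat)
```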
The PMLE is

ψ̂ = arg min_{ψ∈Ψ} Q_T(ψ),

for a compact set Ψ. The potentially most cumbersome aspect from a computational point of view is induced by the two convolutions contained in the expression for f_yy(λ; ψ), which are a result of the nonlinearity. To indicate how we deal with this, suppose that we wish to evaluate

h(μ) = ∫_{−π}^{π} f(λ) g(μ − λ) dλ,

for some functions f(λ), g(λ). By a standard result in harmonic analysis the Fourier coefficients of h(λ) are the products of the Fourier coefficients of f(λ) and g(λ). Thus to approximate h(μ) we can use the fast Fourier transform to transform f(λ_j), g(λ_j), j = 1, ..., T − 1, take the product of the results, and then transform back. It should not be difficult to establish T^{1/2} consistency and asymptotic normality of the PMLE, given the stationarity of the processes involved and the relatively simple moment structure. With respect to consistency we should be able to adapt the approach of Hannan (1973) under ergodicity assumptions. A critical aspect is checking identifiability, which depends on the parameterization chosen; e.g. in the linear case above we might need to set σ_ηη = 1, depending on whether we set μ_g = 0 or not. So far as asymptotic normality is concerned, we cannot use central limit theorems for weakly dependent processes on the one hand, or the available results on linear long memory
processes on the other (see, e.g., Giraitis and Surgailis, 1990; Heyde and Gay, 1993), but it seems that the method of moments can be applied, especially under the simplifying assumption that η_t and ε_t are Gaussian, as in Robinson and Zaffaroni (1995) and Zaffaroni (1997). We conjecture that, as T → ∞,
T^{1/2}(ψ̂ − ψ_0) →_d N(0, A^{−1}BA^{−1}),

where

A(ψ) = (4π)^{−1} ∫_{−π}^{π} c(λ; ψ) c′(λ; ψ) dλ,

B(ψ) = (4π)^{−1} ∫_{−π}^{π} c(λ; ψ) c′(λ; ψ) dλ + (8π²)^{−1} ∫_{−π}^{π} ∫_{−π}^{π} (c(λ; ψ) c′(ω; ψ)/(f_yy(λ; ψ) f_yy(ω; ψ))) Q_yyy(−λ, ω, −ω) dλ dω,

and

c(λ; ψ) = ∂ ln f_yy(λ; ψ)/∂ψ,

with A = A(ψ_0), B = B(ψ_0), and where Q_yyy(λ_1, λ_2, λ_3) denotes the trispectrum of y_t. To perform approximate statistical inference we can plug ψ̂ into the expressions for A(ψ) and B(ψ), or discrete approximations of these; for example we can replace A(ψ) by

(2T)^{−1} Σ_{j=1}^{T−1} c(λ_j; ψ̂) c′(λ_j; ψ̂),

and estimate the double integral on the right-hand side of B(ψ), involving Q_yyy(·,·,·), as in Taniguchi (1982) or Keenan (1987).
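The convolution device described earlier in this section can be sketched as follows (our own illustration; the MA(1) spectral density is chosen only because the convolution is then available in closed form):

```python
import numpy as np

def spectral_convolution(fvals, gvals):
    """Approximate h(mu_k) = int_{-pi}^{pi} f(lam) g(mu_k - lam) dlam on the
    grid mu_k = 2*pi*k/T, given f, g sampled on the same grid. Because both
    functions are 2*pi-periodic this is a circular convolution, computed with
    FFTs: transform, multiply, transform back."""
    T = len(fvals)
    return (2 * np.pi / T) * np.real(
        np.fft.ifft(np.fft.fft(fvals) * np.fft.fft(gvals)))

# Check against an MA(1) spectral density, where the convolution is known in
# closed form: f(lam) = (1 + th**2 + 2 th cos(lam))/(2 pi) has Fourier
# coefficients gamma_0 = 1 + th**2, gamma_{+-1} = th, and
# (f * f)(0) = (gamma_0**2 + 2 gamma_1**2)/(2 pi).
T, th = 1024, 0.5
lam = 2 * np.pi * np.arange(T) / T
f = (1 + th ** 2 + 2 * th * np.cos(lam)) / (2 * np.pi)

h = spectral_convolution(f, f)
closed_form = ((1 + th ** 2) ** 2 + 2 * th ** 2) / (2 * np.pi)
print(h[0], closed_form)
```

Since the integrands here are trigonometric polynomials of low degree, the grid approximation is exact up to floating-point error.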
5. An empirical application

We consider seven time series of asset returns. In all cases we calculate the return as x_t = ln(P_t/P_{t−1}) for t = 1, ..., 547, where P_t denotes the speculative price of the asset. In particular we consider the Yen/Pound exchange rate, spot and forward, the Dollar/Pound exchange rate, spot and forward, and the stock return indices FTSE 100, FTSE All and S&P 500. The foreign exchange rate (forex) data are weekly and run from 8 January 1985 through 7 June 1995, while the stock index data are daily and run from 1 January 1993 through 6 February 1995. In Table 1 we report the Ljung and Box (1978) statistic based on the first 24 sample autocorrelations for the raw data x_t (Q(24)) and for the squares y_t (Q²(24)) in the first and second columns, respectively. In the third column we report the Ljung–Box statistic based on the first 70 sample autocorrelations for the squared returns (Q²(70)). In parentheses we report the p-value based on the usual χ²
Table 1
Summary statistics

Data   Q(24) (p-value)   Q²(24) (p-value)   Q²(70) (p-value)   Kurtosis   Skewness
sYP    26.34 (0.33)      50.83 (0.001)      65.23 (0.62)       8.43       1.27
fYP    26.01 (0.35)      50.85 (0.001)      65.93 (0.61)       8.34       1.25
sUP    32.07 (0.12)      102.17 (0.00)      118.94 (0.00)      5.80       0.27
fUP    31.99 (0.13)      101.16 (0.00)      117.79 (0.00)      5.69       0.25
F100   19.35 (0.73)      49.73 (0.001)      94.29 (0.02)       3.41       0.02
FAll   16.38 (0.87)      51.91 (0.001)      101.45 (0.008)     3.10       0.02
S500   33.00 (0.10)      37.33 (0.04)       83.63 (0.13)       4.98       0.07

Note: "sYP" = spot Yen/Pound, "fYP" = forward Yen/Pound, "sUP" = spot Dollar/Pound, "fUP" = forward Dollar/Pound, "F100" = FTSE 100, "FAll" = FTSE All, "S500" = S&P 500.
approximation. Finally, in the last two columns we report the sample coefficients of kurtosis and skewness for the raw returns. The results clearly indicate little serial correlation in the levels, but significant serial correlation in the squares. In particular, for the forex Dollar/Pound and the stock indices the degree of dependence in the squares appears particularly strong, given the high significance of the portmanteau statistic up to the 70th lag. For all but the FTSE series the kurtosis is much greater than that for Gaussian data, and the forex Yen/Pound series show the greatest skewness. The results of Table 1 provide some motivation for considering a nonlinear model such as that presented in this paper. We consider the linear parameterization of Section 3 with

β_i = β^i,  |β| < 1,   α_i = α_i(d) = 1, i = 0;  = Π_{j=1}^{i} (j + d − 1)/j, i = 1, 2, ...,   0 < d < ½,

so that

ψ = (μ_g², μ_h², β, d, σ_εε)′.

Hence we specify g_t − μ_g as a stationary AR(1) and h_t − μ_h as a stationary ARFIMA(0, d, 0), thus obtaining a very parsimonious model. Finally, we take σ_ηη = 1, so that the mean parameters and σ_εε assume the meaning of "variance ratios". To optimize the pseudo likelihood for y_t (see Section 4) we used the Gauss subroutine OPTMUM with the Polak–Ribière-type option, with 50 iterations from estimates obtained by a grid search. Standard errors, and thus Student t-statistics, use the estimates of the trispectrum for the squared data of Taniguchi (1982) and Keenan (1987) with a Fejér window. The results are displayed in Table 2, the hatted quantities indicating parameter estimates, with t-ratios in parentheses.
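The ARFIMA(0, d, 0) weights α_i(d) above satisfy the recursion α_i = α_{i−1}(i + d − 1)/i and the closed form α_i = Γ(i + d)/(Γ(d)Γ(i + 1)), whence α_i ∼ i^{d−1}/Γ(d) as i → ∞, consistent with the hyperbolic decay assumed in Section 2. A sketch (ours, for illustration only):

```python
import math

def arfima_weights(d, n):
    """MA(infinity) coefficients alpha_i(d) of an ARFIMA(0, d, 0) process:
    alpha_0 = 1, alpha_i = prod_{j=1}^{i} (j + d - 1)/j, computed via the
    recursion alpha_i = alpha_{i-1} * (i + d - 1) / i."""
    a = [1.0]
    for i in range(1, n):
        a.append(a[-1] * (i + d - 1) / i)
    return a

d = 0.3
alpha = arfima_weights(d, 5000)

# Closed form check: alpha_i = Gamma(i + d) / (Gamma(d) * Gamma(i + 1))
closed = math.gamma(10 + d) / (math.gamma(d) * math.gamma(11))
print(alpha[10], closed)

# Hyperbolic decay: alpha_i ~ i**(d - 1) / Gamma(d) for large i
ratio = alpha[4000] / (4000 ** (d - 1) / math.gamma(d))
print(ratio)   # close to 1
```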
Table 2
Pseudo maximum likelihood estimates

Data   μ̂_g² (t)             μ̂_h² (t)      β̂ (t)          d̂ (t)           σ̂_εε
sYP    exp(−13.00) (0.00)   1.22 (2.28)    0.03 (0.07)    0.313 (3.22)    0.000132
fYP    exp(−13.09) (0.00)   1.21 (2.27)    0.03 (0.06)    0.312 (3.11)    0.000133
sUP    exp(−14.89) (0.00)   1.33 (2.46)    0.11 (0.19)    0.336 (5.43)    0.000143
fUP    exp(−14.93) (0.00)   1.34 (2.50)    0.10 (0.17)    0.337 (5.38)    0.000144
F100   exp(−15.97) (0.00)   19.69 (3.83)   0.09 (0.04)    0.475 (35.03)   0.000020
FAll   exp(−16.26) (0.00)   19.13 (3.92)   0.04 (0.002)   0.474 (25.49)   0.000027
S500   exp(−17.24) (0.00)   1.00 (1.56)    0.02 (0.004)   0.476 (38.39)   0.000016
For each raw series the estimates of the mean μ_x = μ_g are insignificantly different from zero, as are those of the AR(1) coefficient β. Things are much more interesting once we consider the estimates of the parameters of the nonlinear part of the model. In fact all the data display a strong degree of dependence in the squares, some of the d̂ values being close to the boundary of the stationary region. For all but the S&P 500 index μ̂_h² is significantly different from zero, so in view of (4) the d̂ estimates are directly interpretable as expressing the memory property of the squared process. For the S&P 500 index, taking e as the memory parameter of the squares, we have ê_S500 = 2d̂_S500 − ½ = 0.452 from the relation 2e − 1 = 4d − 2. Then, in agreement with the preliminary analysis of Table 1, we find that the weakest (yet significant) degree of persistence of volatility characterizes the forex Yen/Pound data. Finally, we observe that the biggest estimates of the μ_h² parameter characterize the series with the smallest coefficients of kurtosis, in agreement with the theoretical result of Corollary 2.
Appendix

The following lemmas may be obtained as special cases of Brillinger (1975, Theorem 2.3.2), which itself results from work of Leonov and Shiryaev (1959), and are presented here without proof for ease of reference.

Lemma A.1. If X and Y have zero mean, finite fourth moments and zero third cumulants,

(i) cov((X + a)², (Y + b)²) = cov(X², Y²) + 4ab cov(X, Y),

(ii) cov(X², Y²) = cum₄(X, X, Y, Y) + 2(cov(X, Y))².

Lemma A.2. For W, X, Y, Z with finite variance and such that W, Y and (X, Z) are independent,

cov(WX, YZ) = E(W)E(Y) cov(X, Z).
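For jointly Gaussian variates (so that the fourth cumulant vanishes) Lemma A.1 can be checked numerically (our own sanity check; the correlation and the shifts a, b are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n, rho, a, b = 2_000_000, 0.6, 1.0, 2.0

# Correlated zero-mean, unit-variance Gaussian pair with corr(X, Y) = rho
z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
X = z1
Y = rho * z1 + np.sqrt(1 - rho ** 2) * z2

def cov(u, v):
    return ((u - u.mean()) * (v - v.mean())).mean()

# Lemma A.1(ii) for Gaussians (cum4 = 0): cov(X^2, Y^2) = 2*cov(X, Y)^2, so
# Lemma A.1(i) gives cov((X+a)^2, (Y+b)^2) = 2*rho**2 + 4*a*b*rho here.
lhs = cov((X + a) ** 2, (Y + b) ** 2)
rhs = 2 * rho ** 2 + 4 * a * b * rho
print(lhs, rhs)
```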
Lemma A.3. For any X, Y with finite fourth moments and zero third cumulants,

var(XY) = cum₄(X, X, Y, Y) + var(X)var(Y) + (cov(X, Y))² + E(X)²var(Y) + 2E(X)E(Y)cov(X, Y) + E(Y)²var(X).

References

Baillie, R.T., Bollerslev, T., Mikkelsen, H.O., 1994. Fractionally integrated generalized autoregressive conditional heteroskedasticity. Preprint.
Brillinger, D., 1975. Time Series: Data Analysis and Theory. Holt, Rinehart and Winston, New York.
Ding, Z., Granger, C.W.J., Engle, R.F., 1993. A long memory property of stock market returns and a new model. J. Empirical Finance 1, 83-106.
Engle, R.F., 1982. Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987-1007.
Giraitis, L., Surgailis, D., 1990. A central limit theorem for quadratic forms in strongly dependent linear variables and its application to asymptotical normality of Whittle's estimate. Probab. Theory Rel. Fields 86, 105-129.
Hannan, E., 1973. The asymptotic theory of linear time series models. J. Appl. Probab. 10, 130-145.
Harvey, A., 1993. Long memory in stochastic volatility. Discussion Paper 10, London School of Economics, SPS.
Heyde, C., Gay, R., 1993. Smoothed periodogram asymptotics and estimation for processes and fields with possible long-range dependence. Stochast. Process. Appl. 45, 169-182.
Keenan, D.M., 1987. Limiting behaviour of functionals of high-order sample cumulant spectra. Ann. Statist. 15 (1), 134-151.
Leonov, V., Shiryaev, A., 1959. On a method of calculation of semi-invariants. Theory Probab. Appl. 4, 319-329.
Ljung, G.M., Box, G.E.P., 1978. On a measure of lack of fit in time series models. Biometrika 65, 297-303.
Robinson, P.M., 1977. The estimation of a nonlinear moving average model. Stochast. Process. Appl. 5, 81-90.
Robinson, P.M., 1991. Testing for strong serial correlation and dynamic conditional heteroskedasticity in multiple regression. J. Econometrics 47, 67-84.
Robinson, P.M., Zaffaroni, P., 1995. Modelling nonlinearity and long memory in time series. In: Nonlinear Dynamics and Time Series. American Mathematical Society, Providence, RI.
Taniguchi, M., 1982. On estimation of the integrals of the fourth order cumulant spectral density. Biometrika 69, 117-122.
Taylor, S., 1986. Modelling Financial Time Series. Wiley, Chichester, UK.
Zaffaroni, P., 1997. Ph.D. thesis, in progress. London School of Economics.