Computational Statistics & Data Analysis 24 (1997) 169-178
Estimation of nonlinear time series with conditional heteroscedastic variances by iteratively weighted least squares

T.K. Mak a,*, H. Wong b, W.K. Li c,1

a Department of Decision Science/MIS, Concordia University, 1455 de Maisonneuve Boulevard West, Montreal, Quebec, Canada H3G 1M8
b Hong Kong Polytechnic University, Hong Kong
c The University of Hong Kong, Hong Kong

Received January 1995; revised June 1996
Abstract
In this paper we consider a unified approach for fitting nonlinear time series models with conditionally heteroscedastic variances. The model considered is completely general, requiring only that the forms of the mean and conditional variance functions be specified. Based on the recent results of Mak (1993) on general estimating equations, we derive a convenient expression for the conditional information matrix. Furthermore, it is shown that estimation in such models can be performed via an iteratively weighted least squares (IWLS) algorithm, so that the computational problems involved can be conveniently handled by many popular statistical packages. Its implementation is numerically illustrated using the "threshold plus ARCH" model. The algorithm is also demonstrated, using both simulated and real data, to be superior to the popular BHHH algorithm, which requires a much longer computing time and fails to converge if initial values are not chosen properly.
Keywords: Autoregressive conditional heteroscedasticity; Iteratively weighted least squares; Maximum likelihood estimation; Nonlinear time series models
* Corresponding author. Supported by an operating grant from the Natural Sciences and Engineering Research Council of Canada.
1 Supported by the Hong Kong Research Grants Committee.

0167-9473/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved
PII S0167-9473(96)00060-6
1. Introduction

There has been increasing interest in recent years in linear or nonlinear time series models with heteroscedastic variances, since many econometric and financial data display considerable heterogeneity. In this regard, the seminal paper by Engle (1982) has generated considerable interest in models with autoregressive conditional heteroscedasticity, or ARCH models. Similarly, the linearity assumptions in many classical linear time series models are frequently found to be unrealistic (Tong, 1990). Consequently, alternative models, such as the bilinear model (Subba Rao, 1981) and the threshold model (Tong, 1978; Tong and Lim, 1980), have been proposed in the literature. In some applications, it is also probable (Gourieroux and Monfort, 1992; Li and Lam, 1995) that models permitting both nonlinearity and heterogeneous variances are needed to describe the observed data adequately.

While time series models with nonlinear means and heteroscedastic variances are more flexible and would in general provide a better fit to the data, they undoubtedly pose considerably more complex computational problems in estimation and inference. So far, these models have had to be dealt with individually in the literature, and a unified method of maximum likelihood estimation is not readily available. As new models continue to emerge to meet the needs of specific applications, a more systematic approach to model estimation seems desirable. Although general-purpose optimization algorithms such as quasi-Newton methods can be used for maximizing the conditional likelihood, their implementation, as well as the resulting approximation of the Hessian matrix, is not without problems in practice (Quandt, 1983; Thisted, 1988).

In this paper, a method for fitting general nonlinear time series with heteroscedastic variances is developed based on the general algorithms of Mak (1993). A general expression for the associated conditional information matrix is also given. It is then seen that the algorithm can be implemented in the form of iteratively weighted least squares (IWLS) for maximizing the conditional likelihood (Section 2). Consequently, maximum likelihood estimation of any nonlinear time series model with heteroscedastic variances can be conveniently handled by existing statistical packages, such as GLIM. Green (1984) discussed the computational advantages of IWLS as well as its applications in model diagnostics and outlier-resistant estimation. In Section 3, we use the threshold plus ARCH model (Tong, 1990) to illustrate the implementation of the suggested IWLS procedure. In Section 4, simulation studies are performed to compare the proposed algorithm with the widely used BHHH algorithm for parameter estimation in ARCH models. It is seen that the BHHH algorithm frequently experiences convergence problems if the starting values are not sufficiently close to the solution. Even when the BHHH algorithm converges, the proposed algorithm is considerably more efficient in terms of the computing time required to arrive at the final estimate.
2. Estimation

The results in this section are developed mainly from the general theory in Mak (1993). For clarity, we first summarize the results in Mak (1993) that are relevant to the present discussion. Let $y$ be an $n \times 1$ random vector of observations and $p(y; \theta)$ its corresponding density function, where $\theta$ is a vector parameter. The maximum likelihood (ML) estimate $\hat{\theta}$ of $\theta$ is obtained by solving
$$f(y, \hat{\theta}) = 0, \quad \text{where } f(y, \theta) = \partial \ln p(y; \theta) / \partial \theta.$$

Then:

(1) Fisher's information matrix is given by
$$I(\theta) = \left[ \frac{\partial g(\theta, \vartheta)}{\partial \vartheta} \right]_{\vartheta = \theta},$$
where $g(\theta, \vartheta) = E_y[f(y, \theta) \mid \vartheta]$, and $E_y(\cdot \mid \vartheta)$ is the customary notation denoting the expectation taken under the density $p(y; \vartheta)$ of $y$.

(2) If $\theta_{(0)}$ is any starting value and we define, in the $(r+1)$th iteration, $\theta_{(r+1)}$ as a root of the equation (in $\theta$)
$$g(\theta, \theta_{(r)}) = f(y, \theta_{(r)}),$$
then $\theta_{(r)} \to \hat{\theta}$ as $r \to \infty$. Furthermore, $\theta_{(r)} - \hat{\theta}$ is of the order $O_p(n^{-r/2})$. Thus (2) implies that if the equation
$$g(\theta, \vartheta) = f(y, \vartheta) \tag{2.1}$$
can be solved explicitly for $\theta$, the algorithm in (2) can be easily implemented, and a high degree of accuracy is obtained in very few iterations. For a numerical demonstration of this, see Mak (1993). When Eq. (2.1) does not have an explicit solution, Mak (1993) suggests the linearization
$$g(\vartheta, \vartheta) + \left[ \frac{\partial g(\theta, \vartheta)}{\partial \theta'} \right]_{\theta = \vartheta} (\theta - \vartheta) = f(y, \vartheta),$$
i.e., since $g(\vartheta, \vartheta) = 0$ for the score function,
$$\left[ \frac{\partial g(\theta, \vartheta)}{\partial \theta'} \right]_{\theta = \vartheta} (\theta - \vartheta) = f(y, \vartheta). \tag{2.2}$$
Thus $\theta_{(r+1)}$ is the solution of the linear Eq. (2.2) (with $\vartheta$ replaced by $\theta_{(r)}$).

We now apply the above results to the ML estimation of a general nonlinear time series model with conditionally heterogeneous variances. Let $\{y_t\}$ be a discrete-parameter time series with a vector regressor $x_t$. Suppose that $y_t$ is described by a stochastic model of the form
$$y_t = \mu(x_t, y_{t-1}, \ldots, y_{t-p}, \theta) + \varepsilon_t = \mu_t + \varepsilon_t,$$
say, where $\varepsilon_t$ is conditionally normally distributed with $E(\varepsilon_t) = 0$ and variance
$$\psi_t = \psi(x_t, y_{t-1}, \ldots, y_{t-p}, \theta),$$
and $\theta$ is a vector of unknown parameters. This is a very general formulation (with the functions $\mu$ and $\psi$ permitted to assume any functional forms), encompassing a wide range of linear and nonlinear time series; see, for example, the "threshold plus ARCH" model formulated in Section 3. By differentiating the log conditional likelihood we have, setting $y = (y_n, y_{n-1}, \ldots)'$,
$$f(y, \theta) = -\frac{1}{2} \sum_t \frac{1}{\psi_t} \frac{\partial \psi_t}{\partial \theta} \left\{ 1 - \frac{(y_t - \mu_t)^2}{\psi_t} \right\} + \sum_t \frac{1}{\psi_t} \frac{\partial \mu_t}{\partial \theta} (y_t - \mu_t). \tag{2.3}$$
Hence
$$g(\theta, \vartheta) = -\frac{1}{2} \sum_t \frac{1}{\psi_t} \frac{\partial \psi_t}{\partial \theta} \left\{ 1 - \frac{\bar{\psi}_t + (\bar{\mu}_t - \mu_t)^2}{\psi_t} \right\} + \sum_t \frac{1}{\psi_t} \frac{\partial \mu_t}{\partial \theta} (\bar{\mu}_t - \mu_t), \tag{2.4}$$
where $\bar{\mu}_t = \mu(x_t, y_{t-1}, \ldots, y_{t-p}, \vartheta)$ and $\bar{\psi}_t = \psi(x_t, y_{t-1}, \ldots, y_{t-p}, \vartheta)$. Thus
$$\left[ \frac{\partial g(\theta, \vartheta)}{\partial \theta'} \right]_{\theta = \vartheta} = -\frac{1}{2} \sum_t \frac{1}{\psi_t^2} \left( \frac{\partial \psi_t}{\partial \theta} \right) \left( \frac{\partial \psi_t}{\partial \theta} \right)' - \sum_t \frac{1}{\psi_t} \left( \frac{\partial \mu_t}{\partial \theta} \right) \left( \frac{\partial \mu_t}{\partial \theta} \right)'. \tag{2.5}$$
At $\vartheta = \theta$, we have by result (1), for Fisher's information matrix,
$$I(\theta) = \frac{1}{2} \sum_t \frac{1}{\psi_t^2} \left( \frac{\partial \psi_t}{\partial \theta} \right) \left( \frac{\partial \psi_t}{\partial \theta} \right)' + \sum_t \frac{1}{\psi_t} \left( \frac{\partial \mu_t}{\partial \theta} \right) \left( \frac{\partial \mu_t}{\partial \theta} \right)'.$$
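The score expression (2.3) is easy to verify numerically. The following minimal Python sketch (our own illustration, not part of the original paper) checks (2.3) against a central finite difference of the Gaussian log conditional likelihood for a zero-mean ARCH(1) specification, where $\mu_t = 0$ and $\psi_t = \alpha_0 + \alpha_1 y_{t-1}^2$; the parameter values and simulated data are assumptions made for the check.

```python
import numpy as np

def loglik(theta, y):
    # Log conditional likelihood up to an additive constant, mu_t = 0.
    a0, a1 = theta
    psi = a0 + a1 * y[:-1] ** 2                 # psi_t for t = 2..n
    return -0.5 * np.sum(np.log(psi) + y[1:] ** 2 / psi)

def score(theta, y):
    # Analytic score from (2.3): mu_t = 0, d(psi_t)/d(theta) = (1, y_{t-1}^2)'.
    a0, a1 = theta
    psi = a0 + a1 * y[:-1] ** 2
    dpsi = np.vstack([np.ones_like(psi), y[:-1] ** 2])
    return -0.5 * (dpsi / psi * (1.0 - y[1:] ** 2 / psi)).sum(axis=1)

rng = np.random.default_rng(0)
y = np.zeros(400)
for t in range(1, 400):                         # simulate ARCH(1) data
    y[t] = rng.normal(scale=np.sqrt(0.25 + 0.25 * y[t - 1] ** 2))

theta = np.array([0.3, 0.2])
eps = 1e-6
fd = np.array([(loglik(theta + eps * np.eye(2)[i], y) -
                loglik(theta - eps * np.eye(2)[i], y)) / (2 * eps)
               for i in range(2)])
print(np.allclose(score(theta, y), fd, atol=1e-4))  # expect True
```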
We now develop an iteratively weighted least squares algorithm for general maximum likelihood estimation. Eq. (2.2) in Mak's algorithm described above then yields, using Eqs. (2.3)-(2.5), the following linear equation in $\theta$:
$$\frac{1}{2} \sum_t \frac{1}{\psi_t^2} \frac{\partial \psi_t}{\partial \theta} \left\{ \left( \frac{\partial \psi_t}{\partial \theta} \right)' (\theta - \vartheta) + \psi_t - (y_t - \mu_t)^2 \right\} + \sum_t \frac{1}{\psi_t} \frac{\partial \mu_t}{\partial \theta} \left\{ \left( \frac{\partial \mu_t}{\partial \theta} \right)' (\theta - \vartheta) - (y_t - \mu_t) \right\} = 0.$$
Rearranging terms, we have
$$\sum_t w_{1t} x_{1t} \{ y_{1t} - x_{1t}' \theta \} + \sum_t w_{2t} x_{2t} \{ y_{2t} - x_{2t}' \theta \} = 0, \tag{2.6}$$
where
$$w_{1t} = \frac{1}{2 \psi_t^2}, \quad w_{2t} = \frac{1}{\psi_t}, \quad x_{1t} = \frac{\partial \psi_t}{\partial \theta}, \quad x_{2t} = \frac{\partial \mu_t}{\partial \theta},$$
$$y_{1t} = x_{1t}' \vartheta - \psi_t + (y_t - \mu_t)^2, \quad y_{2t} = x_{2t}' \vartheta + (y_t - \mu_t). \tag{2.7}$$
Thus, at the $(r+1)$th step, $\theta_{(r+1)}$ is computed as the weighted least squares estimate
$$\theta_{(r+1)} = \left\{ \sum_{i=1}^{2} \sum_t w_{it} x_{it} x_{it}' \right\}^{-1} \left\{ \sum_{i=1}^{2} \sum_t w_{it} x_{it} y_{it} \right\}, \tag{2.8}$$
where $w_{it}$, $x_{it}$ and $y_{it}$, $i = 1, 2$, are as defined in (2.7), with $\vartheta$ replaced by $\theta_{(r)}$ and all the partial derivatives evaluated at $\theta_{(r)}$. The implementation of this algorithm is illustrated in detail with a numerical example in the next section. Its performance is then compared with the popular BHHH algorithm in Section 4.
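The update (2.8) is an ordinary weighted least squares computation on a "design" formed by stacking the gradients of $\psi_t$ and $\mu_t$. The following Python sketch implements the iteration for a user-supplied model; the function names (`iwls`, `mu_psi_grads`), the convergence rule, and the absence of safeguards (e.g., against nonpositive $\psi_t$) are our own illustrative choices — the paper's computations were carried out in GAUSS.

```python
import numpy as np

def iwls(y, theta0, mu_psi_grads, max_iter=50, tol=1e-5):
    """Iteratively weighted least squares for ML estimation (Eqs. 2.7-2.8).

    mu_psi_grads(theta, y) must return arrays
      mu (n,), psi (n,), dmu (n, k), dpsi (n, k)
    holding mu_t, psi_t and their gradients with respect to theta.
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        mu, psi, dmu, dpsi = mu_psi_grads(theta, y)
        w1 = 1.0 / (2.0 * psi ** 2)              # weights for the variance part
        w2 = 1.0 / psi                           # weights for the mean part
        y1 = dpsi @ theta - psi + (y - mu) ** 2  # working responses (2.7)
        y2 = dmu @ theta + (y - mu)
        # Stack the two weighted "regressions" and solve the normal equations (2.8).
        X = np.vstack([dpsi, dmu])
        z = np.concatenate([y1, y2])
        w = np.concatenate([w1, w2])
        XtWX = X.T @ (w[:, None] * X)
        XtWz = X.T @ (w * z)
        theta_new = np.linalg.solve(XtWX, XtWz)
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    return theta
```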
3. A numerical example

It is believed by some financial market practitioners that a stock price series behaves differently when the market is in a bull rather than a bear situation. It is also reasonable to assume that the conditional variance of such data is not constant. An appropriate time series model is thus the following threshold plus ARCH model (Tong, 1990, p. 116; Li and Lam, 1995), referred to by Tong (1990) as a "second generation" model. Let $\{y_t\}$ be a series representing the first difference of the log of actual market indices. We have
$$y_t = \begin{cases} \phi_{11} y_{t-1} + \cdots + \phi_{1p_1} y_{t-p_1} + \varepsilon_t & \text{if } y_{t-d} > c, \\ \phi_{21} y_{t-1} + \cdots + \phi_{2p_2} y_{t-p_2} + \varepsilon_t & \text{if } y_{t-d} \le c, \end{cases}$$
where the conditional variance of $\varepsilon_t$ is given by
$$\psi_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2, \quad \alpha_0, \alpha_1, \ldots, \alpha_q \ge 0.$$
Here $d$ is called the delay parameter and $c$ the threshold parameter. If we let $c = 0$ and $d = 1$, then the rise and fall of the market on the previous day can be reflected in the model. Denote by $I_t$ the indicator function which equals 1 if $y_{t-1} > c$ and 0 otherwise. Let $\Phi_1 = (\phi_{11}, \ldots, \phi_{1p_1})'$, $\Phi_2 = (\phi_{21}, \ldots, \phi_{2p_2})'$, $\alpha = (\alpha_0, \ldots, \alpha_q)'$, and $\theta' = (\Phi_1', \Phi_2', \alpha')$. Using the notation of Section 2, the model can be rewritten as
$$y_t = I_t \sum_{j=1}^{p_1} \phi_{1j} y_{t-j} + (1 - I_t) \sum_{j=1}^{p_2} \phi_{2j} y_{t-j} + \varepsilon_t = \mu_t + \varepsilon_t,$$
with the conditional variance of $\varepsilon_t$ given by
$$\psi_t = \alpha_0 + \alpha_1 (y_{t-1} - \mu_{t-1})^2 + \cdots + \alpha_q (y_{t-q} - \mu_{t-q})^2.$$
We have, by direct differentiation,
$$\frac{\partial \mu_t}{\partial \Phi_1} = I_t \begin{pmatrix} y_{t-1} \\ \vdots \\ y_{t-p_1} \end{pmatrix}, \qquad \frac{\partial \mu_t}{\partial \Phi_2} = (1 - I_t) \begin{pmatrix} y_{t-1} \\ \vdots \\ y_{t-p_2} \end{pmatrix}, \qquad \frac{\partial \mu_t}{\partial \alpha} = 0,$$
and
$$\frac{\partial \psi_t}{\partial \theta} = \frac{\partial \alpha_0}{\partial \theta} + \sum_{j=1}^{q} \left\{ -2 \alpha_j (y_{t-j} - \mu_{t-j}) \frac{\partial \mu_{t-j}}{\partial \theta} + (y_{t-j} - \mu_{t-j})^2 \frac{\partial \alpha_j}{\partial \theta} \right\}.$$

Table 1
Iterative estimates of the parameters of the "threshold plus ARCH" model in Section 3

Iteration        -2 ln L        $\phi_{11(r)}$   $\alpha_{0(r)} \times 10^3$   $\alpha_{1(r)}$
0                -956.0752      0.020            20.000                        0.020
1                -1862.7880     0.151            0.171                         0.104
2                -1866.3272     0.217            0.151                         0.205
3                -1867.5318     0.261            0.140                         0.272
4                -1867.8179     0.283            0.136                         0.307
5                -1867.8727     0.294            0.134                         0.323
6                -1867.8821     0.298            0.133                         0.330
7                -1867.8838     0.300            0.132                         0.332
8                -1867.8840     0.301            0.132                         0.333
9                -1867.8840     0.301            0.132                         0.334
10               -1867.8840     0.301            0.132                         0.334
Standard error                  0.0879           0.017                         0.111
At the $(r+1)$th step, these expressions are used to compute (again, with $\vartheta$ replaced by $\theta_{(r)} = (\Phi_{1(r)}', \Phi_{2(r)}', \alpha_{(r)}')'$) the $w_{it}$, $x_{it}$ and $y_{it}$, $i = 1, 2$, in (2.7). The updated estimate $\theta_{(r+1)}$ can then be computed using the weighted least squares expression (2.8). The "threshold plus ARCH" model is used to fit a set of real data representing the daily closings $L_t$ of the Hang Seng index in 1979. Here $y_t = \ln L_t - \ln L_{t-1}$, and the model
$$y_t = \begin{cases} \phi_{11} y_{t-1} + \varepsilon_t & \text{if } y_{t-1} \ge 0, \\ \varepsilon_t & \text{if } y_{t-1} < 0, \end{cases}$$
$$\psi_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 = \alpha_0 + \alpha_1 (y_{t-1} - \mu_{t-1})^2$$
is postulated. Thus $p_1 = 1$, $p_2 = 0$, $q = 1$, $d = 1$ and $c = 0$. The starting value $\theta_{(0)} = (\phi_{11(0)}, \alpha_{0(0)}, \alpha_{1(0)})' = (0.02, 0.02, 0.02)'$ is used. Table 1 gives the iterative estimates at each step until convergence is attained (the algorithm is taken to have converged if the change in $-2 \ln L$ is less than $10^{-5}$). It is seen that the final estimate is obtained in only 9 iterations.
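For readers who wish to experiment, the quantities required by (2.7) for this threshold plus ARCH model can be assembled as in the following Python sketch, which plugs into the `iwls()` routine given after (2.8). The Hang Seng data are not reproduced here, so the usage example runs on simulated data with assumed parameter values; the conditioning convention $\psi_1 = \alpha_0$ for the first observation is our own choice.

```python
import numpy as np

def tar_arch_grads(theta, y):
    """mu_t, psi_t and gradients for the TAR(1) + ARCH(1) model, theta = (phi11, a0, a1)."""
    phi11, a0, a1 = theta
    n = len(y)
    mu, dmu = np.zeros(n), np.zeros((n, 3))
    for t in range(1, n):
        I = 1.0 if y[t - 1] >= 0 else 0.0       # regime indicator (c = 0, d = 1)
        mu[t] = I * phi11 * y[t - 1]
        dmu[t, 0] = I * y[t - 1]                # d(mu_t)/d(phi11); zero w.r.t. a0, a1
    psi, dpsi = np.full(n, a0), np.zeros((n, 3))
    dpsi[:, 1] = 1.0                            # d(psi_t)/d(a0)
    for t in range(1, n):
        e = y[t - 1] - mu[t - 1]
        psi[t] = a0 + a1 * e ** 2
        dpsi[t, 2] = e ** 2                     # d(psi_t)/d(a1)
        dpsi[t, :] -= 2.0 * a1 * e * dmu[t - 1, :]  # chain rule through mu_{t-1}
    return mu, psi, dmu, dpsi                   # psi_1 = a0 by convention

# Usage on simulated data (illustrative parameter values):
rng = np.random.default_rng(1)
y = np.zeros(500)
mu_prev = 0.0
for t in range(1, 500):
    e_prev = y[t - 1] - mu_prev
    mu_t = 0.3 * y[t - 1] if y[t - 1] >= 0 else 0.0
    y[t] = mu_t + rng.normal(scale=np.sqrt(0.25 + 0.25 * e_prev ** 2))
    mu_prev = mu_t
theta_hat = iwls(y, np.array([0.02, 0.02, 0.02]), tar_arch_grads)
print(theta_hat)   # should be near (0.3, 0.25, 0.25)
```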
4. Comparison with the BHHH algorithm

To investigate the performance of the proposed algorithm, we compare it with one very popular in applied work for estimating ARCH models, the Berndt, Hall, Hall and Hausman (BHHH, 1974) algorithm. Its popularity is documented in the review article by Bera and Higgins (1993). The BHHH version we used is the one built into the "Maximum Likelihood" procedure of the GAUSS package; the GAUSS version used is "GAUSS-386i VM Version 3.1.4 (12 March 1993)". All the computing work was performed on an AST Bravo LC 4/66 personal computer, an 80486 machine. Our aim is to compare the computing time of the BHHH algorithm and ours in estimating ARCH models. In the simulations, ARCH models of both order 1 and order 2 are considered:

Model (a):
$$y_t = \varepsilon_t, \quad \psi_t = 0.25 + 0.25 \varepsilon_{t-1}^2.$$

Model (b):
$$y_t = \varepsilon_t, \quad \psi_t = 0.25 + 0.25 \varepsilon_{t-1}^2 + 0.25 \varepsilon_{t-2}^2.$$

Model (b) is more complex than model (a); this facilitates comparison of the two algorithms as the complexity of the model increases. The number of observations in each generated time series is 400.

It is found that BHHH can fairly easily run into trouble if the data are not properly scaled or if the starting value is too far from the optimum; this is in fact mentioned in the GAUSS manual (p. 17, Maximum Likelihood, GAUSS Applications, 1994). If these conditions are not met, the usual consequence is that the GAUSS system stops and reports an error message indicating trouble in a matrix manipulation. The convergence problem of the BHHH algorithm was indeed confirmed in our initial simulation studies. By contrast, the proposed algorithm converges in most cases even when BHHH does not. To facilitate the comparison, we therefore reran our simulation using the true values as the starting values and preprocessed the generated data to avoid extreme values, as follows: let $m$ and $s$ be the sample mean and sample standard deviation of a generated series; any value smaller than $m - 3s$ or larger than $m + 3s$ is replaced by $m - 3s$ or $m + 3s$, respectively.

One hundred data sets were simulated for each model. The convergence criteria for both algorithms are the same, i.e., a tolerance of 0.01 in every element of the gradient vector. Using the gradient vector as the convergence criterion is dictated by the GAUSS package; the default tolerance in GAUSS is $10^{-4}$, and we used a less stringent criterion because of the convergence problem experienced by BHHH mentioned earlier. Finally, for the sake of comparison, the algorithm of this paper is also coded in GAUSS.
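The data generation and truncation steps are straightforward; a minimal Python sketch of both follows (our own illustration — the study itself was coded in GAUSS).

```python
import numpy as np

def simulate_arch(alpha, n=400, seed=None):
    """Simulate y_t = eps_t with psi_t = alpha[0] + sum_j alpha[j] * eps_{t-j}^2."""
    rng = np.random.default_rng(seed)
    q = len(alpha) - 1
    eps = np.zeros(n + q)                       # q zero pre-sample values
    for t in range(q, n + q):
        psi = alpha[0] + sum(alpha[j] * eps[t - j] ** 2 for j in range(1, q + 1))
        eps[t] = rng.normal(scale=np.sqrt(psi))
    return eps[q:]

def truncate_3s(y):
    """Replace values outside m +/- 3s by the nearest bound (the paper's rule)."""
    m, s = y.mean(), y.std(ddof=1)
    return np.clip(y, m - 3 * s, m + 3 * s)

y_a = truncate_3s(simulate_arch([0.25, 0.25], seed=42))        # model (a): ARCH(1)
y_b = truncate_3s(simulate_arch([0.25, 0.25, 0.25], seed=42))  # model (b): ARCH(2)
```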
Table 2
Computing time comparison of the two algorithms (total over 100 simulated data sets)

Model   Algorithm   Computing time
(a)     Proposed    412.71 s
        BHHH        690.74 s
(b)     Proposed    695.14 s
        BHHH        1571.75 s
Table 3
Comparison of the two algorithms using SP500 index data

                   Proposed                               BHHH
Initial value      Computing time   Estimate              Computing time   Estimate
(0.008, 0.09)'     8.46 s           (0.00766, 0.0967)     18.84 s          (0.0077, 0.0966)
(0.004, 0.04)'     19.28 s          (0.00766, 0.0968)     87.38 s          (0.0077, 0.0967)
(0.001, 0.01)'     59.54 s          (0.00766, 0.0971)     Failed to give an estimate
Table 2 reports, for each model, the total time required to compute the MLEs for the 100 simulated data sets using each of the two algorithms. For model (a), the proposed method takes only 59.7% of the time required by BHHH. For model (b), which is more complex, the improvement is even more substantial: the proposed method takes only 44.2% of the BHHH time. This suggests that the gain in efficiency from the proposed algorithm could be even more substantial for complicated models with heavier computing demands.

To further compare the two algorithms, we use a set of real data to compare their efficiencies under different starting values (recall that we used the true values as the initial values in the simulation study above). The data considered are the daily closings of the Standard and Poor's 500 (SP500) index from January 1989 to December 1990, giving 505 observations. The data series $y_t$ used is ten times the first difference of the logs. It is found that the data can be fitted by an ARCH(1) model of the same zero-mean form as the models in the simulations. Mathematically, the model can be written as
$$y_t = \varepsilon_t, \quad \psi_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2.$$
Our task is to estimate the vector $(\alpha_0, \alpha_1)'$. Three initial vectors, $(0.008, 0.09)'$, $(0.004, 0.04)'$ and $(0.001, 0.01)'$, are used, and the results are summarized in Table 3. The MLE is found to be $(0.0077, 0.0967)'$. It is seen that when the initial vector is furthest from the final estimate ($\theta_{(0)} = (0.001, 0.01)'$), the BHHH algorithm fails to converge. In the two remaining cases, the proposed algorithm gives a solution considerably faster than the BHHH algorithm. The gain in efficiency appears greater as the initial vector gets further away from the solution.
5. Discussion

In this paper, an iteratively weighted least squares algorithm is developed for maximum likelihood estimation of general linear or nonlinear time series models with conditional heteroscedasticity. The algorithm is applicable to virtually any such model, as long as the practitioner can supply subroutines for computing the derivatives of the mean and variance functions. It is convenient to implement, since efficient algorithms for least squares estimation exist and are widely available in many statistical packages.

It is noted from the simulation studies in Section 4 that if the initial values are not close to the solution, popular packages can run into convergence problems even in relatively simple models such as ARCH models with known means. The convergence problem, as seen in the simulation studies, seems in general much less severe for the proposed algorithm. In fact, Mak (1993) showed that if the linearization (2.2) is exact, the algorithm converges regardless of the starting values chosen, unless the model is badly misspecified. It is also shown numerically that for the ARCH models the proposed algorithm outperforms the popular BHHH algorithm, requiring considerably less computing time to obtain the MLE. These properties are not surprising in view of the theoretical results in Mak (1993).

Some of the parameters in the variance function are typically constrained to be nonnegative. Our experience with many real data sets suggests that the simple approach of setting negative estimates to 0 works well in practice. Alternatively, since each step of the proposed iterative algorithm is a least squares minimization, the algorithm can be modified so that, in the presence of negative final estimates, a least squares solution is sought at each step with the variance parameters constrained to be nonnegative. Methods for linear least squares with linear inequality constraints are well established (Lawson and Hanson, 1974) and can therefore be readily adapted to the proposed algorithm; a sketch of such a constrained step is given below. Green (1984) also showed how the IWLS procedure can be modified to yield robust procedures and diagnostic techniques, and his results can in principle be adapted to the present algorithm.
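For concreteness, one way to impose the nonnegativity constraints at each weighted least squares step is bounded linear least squares, e.g. via scipy.optimize.lsq_linear. The following minimal sketch assumes the stacked design, working responses, and weights of the `iwls()` sketch in Section 2, with the variance parameters occupying the trailing components of $\theta$ (an assumption about the parameter ordering, not part of the paper).

```python
import numpy as np
from scipy.optimize import lsq_linear

def constrained_wls_step(X, z, w, n_var_params):
    """One IWLS step with the variance parameters constrained to be >= 0.

    X, z, w: stacked design, working responses and weights as in iwls();
    n_var_params: number of trailing components of theta that must be >= 0.
    """
    k = X.shape[1]
    sw = np.sqrt(w)
    # Minimizing sum w_i (z_i - x_i' theta)^2 equals ordinary least squares
    # on the rows scaled by sqrt(w_i).
    lo = np.concatenate([np.full(k - n_var_params, -np.inf),
                         np.zeros(n_var_params)])
    res = lsq_linear(sw[:, None] * X, sw * z, bounds=(lo, np.inf))
    return res.x
```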
Acknowledgements

The authors are grateful to the editor-in-chief Dr. Azen and the referees for their comments and suggestions, which led to considerable improvement of the paper.

References

Bera, A.K. and M.L. Higgins, A survey of ARCH models: properties, estimation and testing, J. Econom. Surveys, 7 (1993) 305-336.
Berndt, E.K., B.H. Hall, R.E. Hall and J.A. Hausman, Estimation and inference in nonlinear structural models, Ann. Econom. Social Measurement, 3/4 (1974) 653-665.
Engle, R.F., Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, Econometrica, 50 (1982) 987-1008.
GAUSS applications manual, maximum likelihood (1994).
Gourieroux, C. and A. Monfort, Qualitative threshold ARCH models, J. Econometrics, 52 (1992) 159-199.
Green, P.J., Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives (with discussion), J. Roy. Statist. Soc. Ser. B, 46 (1984) 149-192.
Lawson, C.L. and R.J. Hanson, Solving least squares problems (Prentice-Hall, Englewood Cliffs, NJ, 1974).
Li, W.K. and K. Lam, Modelling the asymmetry in stock returns using threshold ARCH models, Statistician, 44 (1995) 333-341.
Mak, T.K., Solving nonlinear estimation equations, J. Roy. Statist. Soc. Ser. B, 55 (1993) 945-955.
Quandt, R.E., Computational problems and methods, in: Z. Griliches and M.D. Intriligator (Eds.), Handbook of econometrics (North-Holland, Amsterdam, 1983) 699-764.
Subba Rao, T., On the theory of bilinear time series models, J. Roy. Statist. Soc. Ser. B, 43 (1981) 244-255.
Thisted, R.A., Elements of statistical computing (Chapman and Hall, New York, 1988).
Tong, H., On a threshold model, in: C.H. Chen (Ed.), Pattern recognition and signal processing (Sijthoff and Noordhoff, Amsterdam, 1978) 575-586.
Tong, H., Nonlinear time series: a dynamical system approach (Oxford University Press, Oxford, 1990).
Tong, H. and K.S. Lim, Threshold autoregression, limit cycles and cyclical data (with discussion), J. Roy. Statist. Soc. Ser. B, 42 (1980) 245-292.