Journal of Econometrics 88 (1999) 365—401
Monte Carlo inference in econometric models with symmetric stable disturbances

Efthymios G. Tsionas*

Council of Economic Advisers, Ministry of National Economy, Suite 607, 5 Nikis Street, Syntagma Square, 10180 Athens, Greece

Received 1 September 1995; received in revised form 1 April 1998
Abstract

The paper develops Markov chain Monte Carlo methods to perform exact posterior analysis in models with symmetric stable Paretian disturbances. It is shown how posterior moments and marginal densities of functions of parameters can be computed methodically by combining a Gibbs sampler with Metropolis independence chains. The new method is shown to perform satisfactorily in constructed examples. The method is also applied to a set of monthly real exchange rates, and the question of difference versus trend stationarity is taken up jointly with the problem of inference about the parameter of the stable distribution. © 1999 Elsevier Science S.A. All rights reserved.

JEL classification: C11, C15

Keywords: Bayesian inference; Stable distributions; Markov chain Monte Carlo; Exchange rate models; Difference stationarity
1. Introduction

Beginning with the pioneering work of Mandelbrot (1960, 1963a, b, 1967), several studies have been concerned with the problem of applying stable Paretian distributions in economics. The problem of estimation and inference emerged naturally, motivated by Mandelbrot's arguments, and the first techniques were developed by Fama and Roll (1968, 1971). After nearly twenty-five years the question of whether or not stable distributions might be reasonable
alternatives to normality (or other candidates) is still left unanswered. This is partially because the question of inference in linear models with stable disturbances is not fully resolved. Despite many efforts, fundamental problems associated with the stable distributions force attention to be restricted to simple location-scale models, and even then only with great difficulty. These problems can be described briefly as follows: First, the density function of stable distributions is not expressible in closed form with the exception of a few values of the parameters (see Zolotarev, 1986, Chapter 1). Second, no practical generalizations from location-scale to general linear models exist. Third, important numerical problems appear in connection with approximations of densities. Fourth, nearly all the ad hoc methods that have been proposed require arbitrary choices to be made regarding important intermediate steps. Ad hoc methods rely mainly on minimum distance estimators that exploit the simple nature of characteristic functions. See for example Akgiray and Lamoureux (1989), Koutrouvelis (1980, 1981), McCulloch (1986) and Press (1972). However, even with ad hoc methods, numerical problems restrict attention to simple models in practice.

The appearance of stable distributions in certain limit theorems (especially the fact that stable distributions arise as the only limits of sums of independent and identically distributed (i.i.d.) random variables) has always been an appealing property for their use in economics and a natural alternative to normality when it is justified by an interpretation of error terms as sums of a large number of i.i.d. variates. The fact that non-normal stable distributions exhibit infinite second moments was another motivation. Mandelbrot (1963b) argued that when second or higher moments behave erratically due to a number of extreme outliers, stable distributions might still be used as a reasonable approximation. Whether or not this approximation is satisfactory enough motivated many studies, especially in finance and foreign exchange rates; Mandelbrot's (1960) work was macroeconomic in nature. For some applications see Akgiray and Booth (1988), Blattberg and Gonedes (1974), Dumouchel and Ohlsen (1975), Leitch and Paulson (1975) and Mittnik and Rachev (1989), to name only a few.

Because of the important problems associated with inference in stable samples, important questions such as estimation, comparison of alternative models, and relevance for asset pricing remained unanswered. In the past, interest concentrated mainly on model comparison involving the stable distributions, the normal distribution (a special case), discrete normal mixtures and the Student-t distribution. See for example Blattberg and Gonedes (1974) and Lau et al. (1990). Therefore, likelihood inference for the stable distributions has been a relatively obscure area without many applications (see, however, DuMouchel, 1973, 1975). These difficulties recently motivated Mittnik and Rachev (1993) to focus instead on Weibull distributions, based on certain statistical invariance properties (which were motivated in turn by the additivity properties of stable random variables).
Baillie (1993), in his comments on Mittnik and Rachev (1993, p. 344), argues that 'for stable distributions to be natural competitors to the ARCH family, it would seem necessary for them to be capable of relaxing the independence assumption and also for more to be known about inference. At the moment, the efficient estimation of a regression equation with a disturbance term drawn from a stable distribution appears intractable'. In this study, it is shown that complete posterior distributions of parameters for general models with stable disturbances can be obtained quite easily. The present paper formulates a general framework for Monte Carlo posterior inference in models with symmetric stable errors. In more detail:

1. It proposes a practical method for exact posterior inference in models with symmetric stable Paretian disturbances. Models can be linear, non-linear or of the seemingly unrelated regression type, but for clarity the emphasis of this paper is on linear models. The method is based on the representation of symmetric stable variates as normal scale mixtures.

2. It shows how to compute posterior marginal moments and densities of functions of parameters, as well as how to compare the normal specification with other members of the symmetric stable Paretian distributions. It is emphasized that complete posterior distributions are produced rather than just marginal moments or posterior modes.

3. It proposes a Markov chain Monte Carlo technique to perform the computations. The method is based on a combination of the Gibbs sampler with data augmentation and Metropolis–Hastings chains. Recently, Buckle (1995) exploited Zolotarev's (1966) integral representation and proposed a Monte Carlo method for posterior inference, but not for general models.

4. The new method is applied to constructed examples and is shown to perform satisfactorily. Subsequently, inference is taken up in a model with stable disturbances that allows one to study the question of trend versus difference stationarity. The model is applied to a set of monthly real exchange rates of seven currencies versus the US dollar, and the questions of Paretian stability and trend-difference stationarity or non-stationarity are jointly examined.

There are a number of differences with Buckle's (1995) approach. Buckle (1995) applied his method to location-scale models only. Although Buckle's (1995) approach is useful in scale-location stable models, the principal problem has always been to analyze arbitrary models with stable errors. In the present paper it is shown that Monte Carlo draws for location and scale parameters can be realized by drawing from normal and chi-square distributions, which is an easy task. Buckle's (1995) approach relies on data augmentation with a total of 2n augmenting variables, where n is the sample size. For symmetric stable errors, this study shows that one can use a normal scale mixture representation that requires n augmenting variables. Therefore, in moderate to large samples, one expects increases in efficiency. A problem with Buckle's (1995) approach is that it
is not well suited to address the case of stable distributions with characteristic exponent (α) near unity, because certain degeneracies arise. Cauchy distributions are special cases of stable distributions with α = 1. In other words, to follow Buckle's (1995) approach one needs to assume a prior for the characteristic exponent (α) that places zero mass in a neighborhood of α = 1. The present approach is free from this problem.

The remainder of the paper is organized as follows: The model is developed in Section 2, where it is shown that the posterior distribution of parameters in a model with symmetric stable disturbances reduces to an augmented posterior corresponding to a heteroscedastic normal model. Section 3 explains how to exploit the conditional structure of the problem and presents the conditional distributions of parameters. Based on these conditional distributions, a Markov chain Monte Carlo method is developed to perform the necessary computations. Section 4 takes up the question of sampling the conditional distribution of the critical parameter (namely the characteristic exponent) and develops a Metropolis subchain based on a fast Fourier transform. Section 5 illustrates the new method in constructed examples for various values of the characteristic exponent and sample sizes. Section 6 develops the trend-difference stationary model and applies it to monthly exchange rate data, taking up the question of non-stationarity jointly with the question of stable Paretian distributed exchange rates. Section 7 concludes the paper and proposes some directions for future research.
2. The model

Before proceeding, the following definition is useful in establishing notation.

Definition 1. The random variable u is distributed according to the stable law if and only if its characteristic function φ(t) ≡ E[exp(itu)], t ∈ ℝ, is given by the following representation:

    φ(t) = exp[ iμt − c|t|^α {1 + iβ sgn(t) W(t, α)} ],    (1)

    0 < α ≤ 2,   −1 ≤ β ≤ 1,   c > 0,

    W(t, α) = −tan(πα/2)  if α ≠ 1,   W(t, α) = (2/π) log|t|  otherwise.
This is also known as Lévy's representation. Here c is a scale parameter, μ a location parameter, α is the characteristic exponent and β a skewness parameter which equals zero for symmetric distributions. I refer to this model as a location-scale model. If α = 2 the stable distribution function reduces to that of
the N(μ, 2c), and if α = 1 (with β = 0) to that of the Cauchy (Student-t with one degree of freedom) with median μ. It is well known that when α < 1, the support of u is ℝ₊ (ℝ₋) when β = 1 (β = −1). The parameter α measures the degree of leptokurtosis: lower values of α correspond to distributions with heavier tails. We shall refer to the class of symmetric stable distributions as S(α) and to the class of positive stable distributions (i.e. α < 1 and β = 1) as S₊(α). The standard form (μ = 0, c = 1) is assumed throughout. For the standard form of symmetric stable distributions, the characteristic function is simply given by
(t)"exp[!"t"?]. It is assumed that the data set is (y, X) where y"[y ,2,y ] is a n;1 vector L of observations on a dependent variable and X"[x , x ,2, x ] is an n;k L matrix of observations on covariates. Suppose also that y "xb#(p/(2)u , i"1,2, n G G G
(2)
where +u , is distributed as i.i.d. S(a), b is a k;1 vector of unknown parameters, G and p is an unknown scale parameter. The parameter space is HM "H ;H ;H "1I;(0,R);(0, 2]. @ N ? Let now f (."a) be the density function of distributions in S(a). Let n(b, p, a) denote a prior distribution on b, p and a. In this study it is assumed that n(b, p, a),n(b"p, a)n(p"a)n(a)J ;exp[!1/2(b!b )A(b!b )]p\I (a), (3)
where β₀ and A are the (normal) prior mean vector and precision matrix, respectively. Also, I_A(x) = 1 if x ∈ A and 0 otherwise. It is clear that under Eq. (3), the posterior distribution of the parameters of Eq. (2) is given by
    P*(β, σ, α | y, X) ∝ σ⁻ⁿ ∏ᵢ₌₁ⁿ f( (yᵢ − xᵢ′β)/(σ/√2) | α ) π(β, σ, α) I_Θ̄(β, σ, α).    (4)
The fact that a closed-form expression for f does not exist is well known. As a result, a full likelihood analysis of samples from a stable distribution is an extremely difficult undertaking even under the unlikely assumption that the only unknown parameter is α. Although f(·|α) possesses series expansions and integral representations (Zolotarev, 1986, pp. 71–81 and 87–92; Ibragimov and Chernin, 1959; Holt and Crow, 1973), computation of stable densities to sufficient precision over a range of values for α is a task that involves arcane numerical issues that may well account for the notable absence of likelihood inference in the stable class of distributions. The numerical problems are treated in Holt and Crow (1973) and Bol'shev et al. (1968).
Another main obstacle in developing inference procedures has always been the fact that likelihood analysis does not generalize easily from the stable model where only α is unknown to the general linear model or (of course) non-linear models. When only a scale and location parameter are involved, simple ad hoc normalization rules (Leitch and Paulson, 1975) or grid searches (Feuerverger and McDunnough, 1980) might be used. In higher dimensions these procedures are costly and prone to numerical failure. In addition, although they could locate the mode of the likelihood function, they could not be helpful when complete posterior distributions are sought. With this motivation in mind, we will express the model in the following form:

    yᵢ = xᵢ′β + uᵢ,
    uᵢ = σ ωᵢ^{1/2} εᵢ,    (5)
    εᵢ ~ i.i.d. N(0, 1),   i = 1, …, n,

where (β, σ, α) are as before, and the ωᵢ are unknown parameters distributed a priori i.i.d. according to S₊(α/2). In this formulation, the model is augmented with a set of new parameters (ωᵢ, i = 1, …, n) and appropriate priors. This assumption essentially expresses uᵢ as a scale mixture of normal distributions, the mixing variates ωᵢ being from the class of positive stable laws. Let h(·|α) be the density of random variables with distributions in S₊(α/2). Under the new assumption, the posterior distribution of the parameters (β, ω, σ, α) of Eq. (5) is given by
    P(β, ω, σ, α | y, X) ∝ σ⁻ⁿ ∏ᵢ₌₁ⁿ ωᵢ^{−1/2} exp[ −(yᵢ − xᵢ′β)² ωᵢ⁻¹/(2σ²) ] ∏ᵢ₌₁ⁿ h(ωᵢ|α) π(β, σ, α) I_Θ̄(β, σ, α),    (6)

where ω = [ω₁, …, ωₙ]′. Notice that ∏ᵢ₌₁ⁿ h(ωᵢ|α) π(β, σ, α) is simply the prior of the augmented parameter vector (β, ω, σ, α), and the remaining terms form the likelihood function of a heteroscedastic normal linear model. Although this last assumption may seem arbitrary, quite the opposite is the case, and in fact Eqs. (4) and (6) are equivalent in the sense that, for all (β, σ, α) ∈ Θ̄, ∫_{ℝ₊ⁿ} P(β, ω, σ, α | y, X) dω = P*(β, σ, α | y, X). The proof appears in Appendix A, as the proof of Proposition A.1. The importance of the scale-mixture-of-normals representation is the following: inference for models with stable errors has to be based on P*(β, σ, α | y, X). However, as mentioned above, this posterior is intractable. Augmenting the parameter vector with ω results in a posterior P(β, ω, σ, α | y, X) that corresponds to a normal heteroscedastic model.
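The following short verification is not part of the original text; it is a standard characteristic-function calculation, added here to make explicit why positive stable mixing reproduces a symmetric stable error. It assumes that S₊(α/2) is normalized so that E[exp(−sωᵢ)] = exp(−s^{α/2}), the convention implicit in Lemma A.2 of Appendix A. Conditioning on ωᵢ and then integrating over the mixing law,

    E[ exp(it σ ωᵢ^{1/2} εᵢ) ] = E_ω[ exp(−σ²ωᵢt²/2) ] = exp[ −(σ²t²/2)^{α/2} ] = exp[ −|(σ/√2)t|^α ],

which is exactly the characteristic function of (σ/√2)uᵢ with uᵢ ~ S(α), as required for the equivalence of Eqs. (4) and (6).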
Since the two posteriors are equivalent and the second is clearly tractable, inferences can, as we will show, be based on the augmented posterior. It should be mentioned that scale mixture representations in the context of gamma mixing variables (which lead to the Student-t family) were used by Carlin and Polson (1991), Chib (1993) and Albert and Chib (1993). Using Eq. (6), one arrives at
    P(β, ω, σ, α | y, X) ∝ σ^{−(n+1)} |Ω|^{−1/2} ∏ᵢ₌₁ⁿ h(ωᵢ|α)
        × exp{ −(1/(2σ²)) (y − Xβ)′Ω⁻¹(y − Xβ) − ½ (β − β₀)′A(β − β₀) } π(α),    (7)

where Ω ≡ diag(ω). Existence of moments and the fact that Eq. (6) or Eq. (7) corresponds to a proper posterior are shown in Proposition A.2 in Appendix A.

For a general non-linear regression model the posterior is more complicated only to the extent implied by the non-linearity itself. In other words, the assumption that the disturbances are stable Paretian introduces no additional complications. To see this, replace the model in Eq. (5) by

    yᵢ = g(xᵢ; β) + uᵢ,
    uᵢ = σ ωᵢ^{1/2} εᵢ,   εᵢ ~ i.i.d. N(0, 1),   i = 1, …, n,    (8)
where g: ℝ^{k+m} → ℝ, β ∈ ℝ^m, and σ, ωᵢ have the same interpretation as before. The posterior in (β, σ, ω) corresponding to the non-linear model (8) is given by

    P(β, ω, σ, α | y, X) ∝ σ^{−(n+1)} |Ω|^{−1/2} ∏ᵢ₌₁ⁿ h(ωᵢ|α)
        × exp{ −(1/(2σ²)) [y − μ(X; β)]′Ω⁻¹[y − μ(X; β)] − ½ (β − β₀)′A(β − β₀) } π(α),    (9)

where μ(X; β) ≡ [g(xᵢ; β), i = 1, …, n]′. It is, therefore, clear that inference about (β, σ) given (ω, α) is not more complicated than standard Bayesian procedures applied to non-linear models (Zellner, 1971, Chapter 4). Regarding inference about (ω, α) given (β, σ), it is clear that a successful attack on this problem in the linear case adapts readily
to account for nonlinearities. For Gibbs sampling methods in non-linear models see Carlin and Gelfand (1991), Carlin and Polson (1991), Ritter and Tanner (1992) or Tanner (1993) for a summary. Although these statements pertain to conditional distributions, the conditional structure of the problem has important implications for marginal inference as will be shown in the next section.
3. Posterior inference

3.1. Conditional distributions

Although the marginal in β, σ and α of the posterior kernel in Eq. (6) can be interpreted as the posterior kernel in Eq. (4), the full posterior kernel (in β, σ, α and ω) cannot be interpreted in a similar manner. However, it does imply conditional distributions that are easy to understand. These conditional distributions are critical to computing posterior moments of functions of the parameters and form the basis of the Gibbs sampler, to be discussed in the next section. It is, therefore, necessary to provide a discussion of their form. The full conditional distributions β | σ, ω, y, X and σ | β, ω, y, X take the same form as in Carlin and Polson (1991), Chib (1993) and Albert and Chib (1993).

3.1.1. Distribution of β | σ, ω, y, X

Using Eq. (6) it is clear that

    β | σ, ω, y, X ~ N( β̂, σ² [σ²A + X′Ω⁻¹X]⁻¹ ),    (10)

where β̂ = [σ²A + X′Ω⁻¹X]⁻¹ [X′Ω⁻¹y + σ²Aβ₀], i.e. a mixed GLS-type estimator.

3.1.2. Distribution of σ | β, ω, y, X

Since, on the other hand, P(σ | β, ω, y, X) ∝ σ^{−(n+1)} exp[ −(y − Xβ)′Ω⁻¹(y − Xβ)/(2σ²) ], it follows that

    (y − Xβ)′Ω⁻¹(y − Xβ)/σ² | β, ω, y, X ~ χ²(n),    (11)

i.e. a chi-square distribution with n degrees of freedom.

3.1.3. Distribution of ω | β, σ, y, X

Clearly the ωᵢ | β, σ, y, X are independent, and their conditional kernel is

    P(ωᵢ | β, σ, y, X) ∝ ωᵢ^{−1/2} exp[ −qᵢ(β, σ) ωᵢ⁻¹ ] h(ωᵢ|α),   i = 1, …, n,    (12)
where qᵢ(β, σ) ≡ (yᵢ − xᵢ′β)²/(2σ²), i = 1, …, n.

3.1.4. Distribution of α | β, σ, y, X

It is possible to integrate out ω explicitly in Eq. (6), and find

    p(α | β, σ, y, X) ∝ ∏ᵢ₌₁ⁿ f( (yᵢ − xᵢ′β)/(σ/√2) | α ) π(α).    (13)
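The two Gaussian-model conditionals in Eqs. (10) and (11) translate directly into simulation code. The sketch below is a minimal illustration and not code from the paper; it assumes the normal prior N(β₀, A⁻¹) of Eq. (3), and the function names and the use of NumPy are choices of this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_beta(y, X, sigma, omega, b0, A):
    """Draw beta from Eq. (10): N(beta_hat, sigma^2 [sigma^2 A + X' Om^{-1} X]^{-1})."""
    W = X / omega[:, None]                        # Om^{-1} X (rows scaled by 1/omega_i)
    prec = sigma**2 * A + X.T @ W                 # sigma^2 A + X' Om^{-1} X
    cov = sigma**2 * np.linalg.inv(prec)
    beta_hat = np.linalg.solve(prec, W.T @ y + sigma**2 * (A @ b0))
    return rng.multivariate_normal(beta_hat, cov)

def draw_sigma(y, X, beta, omega):
    """Draw sigma from Eq. (11): (y - X b)' Om^{-1} (y - X b) / sigma^2 ~ chi^2(n)."""
    e = y - X @ beta
    ss = np.sum(e**2 / omega)
    return np.sqrt(ss / rng.chisquare(len(y)))
```

These draws slot into the Gibbs cycle listed in Section 3.2 below; the ω and α updates are taken up in Sections 3.3 and 4.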
3.2. Computation of posterior moments

Let θ be the parameter vector, i.e. θ = [β′, σ, ω′, α]′. The problem is to compute integrals of the form ∫_Θ G(θ)P(θ | y, X) dθ / ∫_Θ P(θ | y, X) dθ for integrable functions of the parameters G(θ). These functions may include posterior moments, probabilities that θ lies in a given area, etc. (see, e.g., van Dijk et al. (1987), especially p. 85). The Bayesian multiple integration problem cannot be solved analytically in this case even when α is considered known. One has, therefore, to rely on numerical methods in exploring the properties of Eq. (6). One such technique is the Gibbs sampler (Gelfand and Smith, 1989) with data augmentation (Tanner and Wong (1987) or Tanner (1993)). The method produces a sequence of correlated drawings θ^{(i)} = [β^{(i)′}, σ^{(i)}, ω^{(i)′}, α^{(i)}]′, i = 1, …, M, that converges to the posterior distribution whose kernel is given by Eq. (6). Let Θ = Θ̄ × ℝ₊ⁿ be the augmented parameter space. Starting from a given θ ∈ Θ, the Gibbs sampler produces subsequent iterates by repeated draws from the conditional distributions of parameters as follows: given θ^{(i)},

- Draw ω^{(i+1)} conditional on y, X, β^{(i)}, σ^{(i)} and α^{(i)} using Eq. (12).
- Draw σ^{(i+1)} conditional on y, X, β^{(i)}, ω^{(i+1)} and α^{(i)} using Eq. (11).
- Draw β^{(i+1)} conditional on y, X, ω^{(i+1)}, σ^{(i+1)} and α^{(i)} using Eq. (10).
- Draw α^{(i+1)} conditional on y, X, β^{(i+1)}, σ^{(i+1)} and ω^{(i+1)} using Eq. (13).
The ability to generate random drawings from the conditional distributions is critical. Whether or not this is possible in the present case is taken up next.
If this strategy were not followed, the conditional distribution of α would have been P(α | ω, y, X) ∝ ∏ᵢ₌₁ⁿ h(ωᵢ|α) π(α). It will be explained in Section 4 that this form is not well suited for numerical work and, as a result, Eq. (13) will be used instead.
3.3. Comments

Generating random drawings from the conditional distributions (10) and (11) is straightforward. Although Eq. (12) involves the density of positive stable laws, sampling ωᵢ is straightforward via standard rejection methods (Ripley, 1987, pp. 60–71): one such obvious method is to use S₊(α/2) as a source density. An alternative is to generate random drawings from Eq. (12) based on an independence chain (Tierney, 1994). Specifically, for each i = 1, …, n and at the tth pass of the sampler, one draws ω̃ᵢᵗ from S₊(α/2). Define R(w; q) ≡ w^{−1/2} exp(−q/w). With probability

    π(ω̃ᵢᵗ, ωᵢᵗ⁻¹; qᵢ) = min{ R(ω̃ᵢᵗ; qᵢ)/R(ωᵢᵗ⁻¹; qᵢ), 1 },

accept ω̃ᵢᵗ as the current draw, and with probability 1 − π(ω̃ᵢᵗ, ωᵢᵗ⁻¹; qᵢ) set ωᵢᵗ = ωᵢᵗ⁻¹. The Metropolis transition kernel for (ωᵢ, i = 1, …, n) is defined as the product of the kernels for each i. To use h(ω|α) as a proposal density in a Metropolis–Hastings step one needs R(w; q) to be bounded. Since R(w; q) → 0 as w → 0 and R(w; q) is maximized at w = 2q, this condition is satisfied. This method is discussed by Chib and Greenberg (1995) and Tierney (1994, Section 2.4). It has been used by Jacquier et al. (1994) in the context of a stochastic volatility model and also by Muller and Pole (1993) in the context of GARCH models. It should be noted that sampling stable variates is straightforward. For example, Chambers et al. (1976) present an efficient algorithm based on integral representations of the distribution function.

When α is known (or when a specified value of α must be assumed) this completes the discussion of Monte Carlo inference, with the conclusion that it affords a straightforward solution. It should perhaps be pointed out that treating α as known is not completely unrealistic when one adopts a stable distribution for robustness purposes, just as one might be willing to fix the degrees of freedom of a Student-t distribution. However, when such an assumption cannot be made, generating random draws from Eq. (13) is necessary. One idea is to use the Ibragimov and Chernin (1959) representation of ωᵢ as a product of a standard exponential variate and a (complicated trigonometric) function of a standard uniform variate. This would require additional augmentation by an n-dimensional vector of parameters with uniform priors. Other integral representations that could suggest data augmentation do exist. However, they do not result in a proper posterior distribution since the integrands can become negative; integral representations (as well as representations in convergent series that could motivate finite mixture formulations) invariably involve non-positive oscillatory terms. Buckle (1995) essentially uses this idea.
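A minimal sketch of this independence-Metropolis update follows. It is not code from the paper; the positive stable proposal is generated with a Kanter/Chambers–Mallows–Stuck type formula (the kind of generator referred to above), under the assumed normalization E[exp(−sω)] = exp(−s^{α/2}) for S₊(α/2). Function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_pos_stable(a, size):
    """Positive stable S_+(a), 0 < a < 1, with Laplace transform exp(-s^a).
    Kanter-type representation: a trigonometric function of a uniform variate
    divided by an exponential variate, raised to the power (1 - a)/a."""
    u = rng.uniform(0.0, np.pi, size)
    e = rng.exponential(1.0, size)
    return (np.sin(a * u) / np.sin(u) ** (1.0 / a)) * \
           (np.sin((1.0 - a) * u) / e) ** ((1.0 - a) / a)

def update_omega(omega, y, X, beta, sigma, alpha):
    """One independence-Metropolis pass over the omega_i of Eq. (12)."""
    q = (y - X @ beta) ** 2 / (2.0 * sigma ** 2)      # q_i(beta, sigma)
    cand = draw_pos_stable(alpha / 2.0, len(omega))   # proposal from h(.|alpha)
    log_R = lambda w: -0.5 * np.log(w) - q / w        # log R(w; q) = log[w^{-1/2} exp(-q/w)]
    accept = np.log(rng.uniform(size=len(omega))) < log_R(cand) - log_R(omega)
    return np.where(accept, cand, omega)
```

Each ωᵢ is accepted or retained independently, which matches the product form of the Metropolis transition kernel described above.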
4. Sampling the conditional posterior distribution of α

Given (β, σ, y, X), the conditional posterior distribution of α is given by Eq. (13). The primary difficulty in generating random drawings from Eq. (13) is that the density f(·|α) is in general unavailable in closed form and is usually computed using quadrature (Holt and Crow (1973) or Bol'shev et al. (1971)). Quadrature is expensive when n is large, and it would be highly desirable (although atypical in most Bayesian applications) to make the cost of sampling (13) nearly independent of n. One such method is to rely on a fast Fourier transform (FFT) of the characteristic function (Feuerverger and McDunnough, 1980, 1981). By the inversion theorem

    f(x|α) = (1/2π) ∫_{−∞}^{∞} exp(−itx − |t|^α) dt,

so it is clear that when the FFT is applied to the sequence 1/2, φ(Δt), …, φ((N−1)Δt), one obtains (save for a constant) values of the density f(0|α), f(Δx|α), f(2Δx|α), …, f(−2Δx|α), f(−Δx|α), where Δt = 2π/(NΔx) and N is a given integer. The computation of the likelihood of a sample (uᵢ, i = 1, …, n), where uᵢ = (yᵢ − xᵢ′β)/(σ/√2), is then achieved via interpolation. In the present work N = 1024 (FFTs are faster when N is a product of small primes) and linear interpolation is used. Spacings Δx = 0.1 were found adequate. This approach works well for values α ≥ 1. For α < 1 the FFT is applied to a transformed stable density (see Appendix B) with index 1/α but non-zero skewness. This guarantees good performance for small values of α. Further details are provided in Appendix B.

Even though an FFT greatly simplifies the task of computing the stable density (with computational costs increasing negligibly with n, due only to the fact that more interpolations are needed), the problem of generating random numbers from Eq. (13) remains. The obvious solution of using a grid-based Gibbs sampler (Ritter and Tanner, 1992) is prohibitive, since many FFTs per pass would greatly increase computational costs for moderate n. It was estimated that about 20 grid points are needed for sufficient precision. An efficient approach is to use a Metropolis random walk subchain (Hastings, 1970; Tierney, 1994, Sections 2.3 and 2.4). Given the current draw α^{(i)}, a candidate α̃ is drawn from a (symmetric) transition density J(·|α^{(i)}). With probability r(α^{(i)}, α̃) = min{1, p(α̃)/p(α^{(i)})} the candidate is accepted and α^{(i+1)} = α̃; otherwise one remains at the current draw, i.e. α^{(i+1)} = α^{(i)}.
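The sketch below (not from the paper) illustrates the two ingredients just described: a real-FFT evaluation of f(·|α) on a grid of spacing Δx, and a random-walk Metropolis update for α with proposal scale cσ*. The grid size N = 1024 and Δx = 0.1 follow the text, while the function names and the simple clipping of near-zero density values are assumptions of this illustration; the paper's refinements (the asymptotic series for |x| > 7 and the transformed density for α < 1) are omitted here.

```python
import numpy as np

rng = np.random.default_rng(2)

def stable_pdf_grid(alpha, N=1024, dx=0.1):
    """Symmetric stable density on a grid via the FFT of exp(-|t|^alpha).
    Returns sorted abscissae x_k and approximate values f(x_k | alpha)."""
    dt = 2.0 * np.pi / (N * dx)
    t = dt * np.arange(N)
    c = np.exp(-t ** alpha)
    c[0] = 0.5                                  # the leading 1/2 of the sequence
    f = (dt / np.pi) * np.real(np.fft.fft(c))   # f(0), f(dx), ..., f(-2 dx), f(-dx)
    x = dx * np.arange(N)
    x[N // 2:] -= N * dx                        # wrap the second half to negative x
    order = np.argsort(x)
    return x[order], np.clip(f[order], 1e-300, None)

def log_lik_alpha(alpha, u):
    """Log of the kernel in Eq. (13) under a flat prior: interpolate log f(u_i | alpha)."""
    x, f = stable_pdf_grid(alpha)
    return np.sum(np.log(np.interp(u, x, f)))

def update_alpha(alpha, u, step=0.1):
    """One random-walk Metropolis step for alpha on (0, 2]; step plays the role of c*sigma_star.
    (In practice the current log-likelihood would be cached rather than recomputed.)"""
    cand = alpha + step * rng.standard_normal()
    if not (0.0 < cand <= 2.0):
        return alpha                            # candidate outside the support: stay put
    if np.log(rng.uniform()) < log_lik_alpha(cand, u) - log_lik_alpha(alpha, u):
        return cand
    return alpha
```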
In the work reported here, J has been specified as N(α^{(i)}, (cσ*)²) I(α ∈ (0, 2]), where c is an 'inflation factor' and σ* was determined as follows: given initial estimates of β and σ obtained via L₁ regression, an optimization technique is used to obtain the mode of the likelihood function, and σ* is determined by a numerical approximation to the appropriate element of the information matrix. This step is done only once, at the beginning, before any other computations. Even this step can be avoided by using, as an approximation to the mode, a value of α determined via an empirical characteristic function approach (e.g. Koutrouvelis, 1980). That approach results in estimates that are sensitive to the choice of grid points (for the variable t) and more variable than the conditional mode, so the additional cost involved in an initial optimization is worth taking. One shortcut that also behaved well in practice (but was not pursued further) is to fix β and σ at their L₁ regression values and maximize Eq. (13) instead of the full likelihood. If full ML is expensive, this reduces the problem to a univariate optimization with respect to α.

This provides a complete approach to the problem of posterior inference for (β, σ, α) in symmetric stable samples. The proposed combination of the Gibbs sampler with Hastings–Metropolis subchains is seen to be feasible and provides a general answer to the problem. For non-linear models it was explained in Section 2 that only trivial modifications to the posterior distribution are required. After the discussion in this section it is clear that, since sampling (ω, α | β, σ, y, X) has been successfully attacked in the linear case, the only remaining problem is to sample β | σ, ω, y, X in the normal heteroscedastic non-linear regression model: techniques explained in Tanner (1993, Chapter 4), Carlin and Gelfand (1991) and Carlin and Polson (1991) could potentially be helpful. Naturally, the parameters ω, σ and α should not enter μ(X; β) in Eq. (9), otherwise the form of the conditional distributions would be modified. Time series extensions are also possible, and the model in Section 6 offers an example.

It is perhaps instructive to point out that ad hoc frequentist estimation methods result in relatively straightforward estimators of α but are generally unable to solve the so-called 'normalization problem', i.e. to obtain estimates of the many location and scale parameters, such as β and σ. The Gibbs sampler presented in this
A referee suggested that efficiency might be increased by utilizing the adaptive rejection Metropolis sampling method of Gilks (1992). A similar idea (but in the context of importance sampling) was suggested by West (1993).

It should be noted that α could be sampled using its conditional (on ω, β, σ) kernel P(α | β, σ, ω, y, X); see footnote 1. Computations based on Eq. (13) will be more precise, because interpolation for small values of α is inaccurate: P(α | β, σ, ω, y, X) involves positive stable densities with exponent α/2, whereas Eq. (13) involves symmetric stable densities with exponent α. Second, smaller values of α are prone to numerical problems and generally slow down convergence of σ. In addition, the behavior of P(α | β, σ, ω, y, X) for values of α near (but not equal to) one is degenerate and creates numerical problems that are easily avoided by using Eq. (13) instead. Third, P(α | β, σ, ω, y, X) would involve complex FFTs whereas Eq. (13) involves real FFTs and is, therefore, about twice as fast.
paper gives straightforward conditional posterior distributions for β, σ and ω. Generating random drawings from the conditional of α | β, σ is less straightforward, but the only technical requirement is to use an FFT. Provided marginal densities are available, computing probabilities of the form Pr(α ∈ A), A ⊆ (0, 2], can be done easily and thus facilitates comparisons of different members of the stable family. Regarding Bayes probability intervals, quantiles of the chain draws can be used in the obvious way. See Section 6 for further details.

5. Constructed examples

To see whether the Gibbs–Metropolis sampler performs satisfactorily, the following design is assumed:

    yᵢ = β₁ + β₂x₂ᵢ + β₃x₃ᵢ + (σ/√2) uᵢ,   i = 1, …, n,

where x₂ᵢ, x₃ᵢ are i.i.d. N(0, 1) and uᵢ is i.i.d. S(α). Two sample sizes will be considered, n = 200 and n = 1000. The characteristic exponent is set to α = 0.8, 1.5, 1.8 and 2.0. It is also assumed that σ = 1, β = [1 −1 1]′ and that x₂ᵢ and x₃ᵢ are fixed across different values of α. Independent N(0.0001, 0.0001⁻¹) priors are placed on βⱼ (j = 1, 2, 3). A flat prior (π(α) = 1) is adopted for α. Symmetric stable variates were generated using the normal scale mixture representation uᵢ = (2ωᵢ)^{1/2} εᵢ, where ωᵢ is S₊(α/2) and εᵢ is i.i.d. N(0, 1). One-sided stable variates are generated using the Ibragimov and Chernin (1959) representation. Several other values of α were tested with comparable results that are not reported as they do not add much to the main point.

Table 1 summarizes the results for all parameters of the model. These results are based on 1000 preliminary passes followed by 5000 passes on which computations of posterior moments were based. Initial conditions for β, α and σ were obtained via full maximum likelihood estimation. The results in Table 1 are according to expectations.

We examine next the question of whether or not the algorithm exhibits sensitive dependence on initial conditions. The question of one long run (Geyer, 1992) versus restarts is not resolved. Following the principles in Gelman and Rubin (1992), and focusing on the case α = 1.5, the chain was restarted at two other points that are many standard deviations away from the posterior mean, namely α = 1.9 and α = 0.9. Results for other initial conditions (and values of α) are the same, and therefore the analysis focuses on these two starting points. Two parallel chains of length 5000 were simulated, and sequences of ergodic averages were compared with ergodic averages of parameters when the chain starts at the full ML estimate. For β and σ convergence was not found to be an issue. The results for α (by far the parameter that one would a priori suspect as prone to problems) and n = 200 are depicted in Fig. 1. The chains immediately jump to the neighborhood of the true value, and a few passes are generally enough to guarantee convergence.
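A minimal data-generation sketch for this design follows; it is an illustration rather than the author's code. Positive stable variates are produced with a Kanter-type formula that serves the same purpose as the Ibragimov–Chernin representation cited above (repeated from the Section 3.3 sketch so that this fragment is self-contained), under the assumed normalization E[exp(−sω)] = exp(−s^{α/2}).

```python
import numpy as np

rng = np.random.default_rng(3)

def rpostable(a, size):
    # positive stable S_+(a), 0 < a < 1, Laplace transform exp(-s^a)
    u = rng.uniform(0.0, np.pi, size)
    e = rng.exponential(1.0, size)
    return (np.sin(a * u) / np.sin(u) ** (1.0 / a)) * \
           (np.sin((1.0 - a) * u) / e) ** ((1.0 - a) / a)

def make_dataset(alpha, n, beta=(1.0, -1.0, 1.0), sigma=1.0):
    """y_i = b1 + b2 x_2i + b3 x_3i + (sigma/sqrt(2)) u_i with u_i ~ S(alpha)."""
    X = np.column_stack([np.ones(n), rng.standard_normal(n), rng.standard_normal(n)])
    if alpha == 2.0:
        u = np.sqrt(2.0) * rng.standard_normal(n)          # S(2) is N(0, 2)
    else:
        omega = rpostable(alpha / 2.0, n)                   # omega_i ~ S_+(alpha/2)
        u = np.sqrt(2.0 * omega) * rng.standard_normal(n)   # normal scale mixture
    y = X @ np.asarray(beta) + (sigma / np.sqrt(2.0)) * u
    return y, X

y, X = make_dataset(alpha=1.5, n=200)
```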
Table 1
Posterior moments for the constructed examples. Posterior standard deviations in parentheses. Each column corresponds to a dataset generated with the indicated true α.

n = 200
              α = 0.8            α = 1.5            α = 1.8            α = 2.0
β₁            0.968  (0.068)     1.0365 (0.081)     1.019  (0.078)     1.044  (0.073)
β₂           −1.0277 (0.076)    −0.994  (0.087)    −1.002  (0.080)    −1.041  (0.074)
β₃            1.0924 (0.083)     1.080  (0.082)     1.089  (0.079)     1.070  (0.073)
σ             1.120  (0.250)     1.106  (0.16)      1.084  (0.133)     0.992  (0.108)
α             0.836  (0.067)     1.472  (0.103)     1.769  (0.090)     1.923  (0.065)
95% Bayes probability intervals for α:
              (0.71, 0.98)       (1.27, 1.67)       (1.57, 1.93)       (1.77, 2.00)

n = 1000
              α = 0.8            α = 1.5            α = 1.8            α = 2.0
β₁            1.012  (0.028)     1.053  (0.033)     1.013  (0.033)     1.046  (0.032)
β₂           −1.026  (0.028)    −1.073  (0.032)    −1.055  (0.031)    −1.001  (0.033)
β₃            0.983  (0.027)     0.959  (0.033)     0.941  (0.032)     1.013  (0.033)
σ             0.923  (0.097)     0.948  (0.062)     0.962  (0.053)     1.012  (0.047)
α             0.823  (0.029)     1.489  (0.045)     1.795  (0.044)     1.979  (0.019)
95% Bayes probability intervals for α:
              (0.77, 0.88)       (1.40, 1.58)       (1.70, 1.88)       (1.93, 2.00)
The inflation factor was set to 1.5. Since acceptance rates of the α-subchain average near 50%, this seems to be an acceptable choice. Two other convergence diagnostics were used as supplements. One is due to Geweke (1991) and tests for equality of means in two subsets of the same long run (based on variance estimates computed via the spectral density at the
Fig. 1. Ergodic means of α (true α = 1.5, n = 200).
origin), and the other is a diagnostic based on the behavior of quantiles in (increasing) subsets of the same long run: as the chain converges in distribution, the quantiles should converge as well. (The diagnostics are not reported but are available from the author on request.)

With interest presumably focused on α for most applications, posterior densities of α are presented in Fig. 2. Although the densities look fairly symmetric for values of α less than about 1.80, symmetry fails, as expected, when nearly normal stable laws are considered, and posterior densities of α are in fact monotonically increasing when α is near 2. One advantage of the present approach (or any Bayesian approach, for that purpose) is of course that the need to resort to asymptotic normal approximations is eliminated. (Computations required on average 35 min of CPU time per run, 90 min for n = 1000, on a 486 DX2/50 MHz running GAUSS. Nearly half the time is spent in interpolation of FFT values.)

6. Application to stationarity in real exchange rates

6.1. Model

Considerable attention has been devoted to modeling exchange rates. After the study of Meese and Rogoff (1983), time series models (especially of the random walk variety) received considerable attention. Beginning with Diebold
Fig. 2. Constructed examples: marginal posterior densities of α.
(1988), several authors have studied the time series properties of exchange rates, focusing on their conditional distributional properties (Hsieh (1988, 1989), Baillie and Bollerslev (1990) and Muller and Pole (1993), among many others), the issue of trend versus difference stationarity (Schotman and van Dijk, 1991), or their unconditional distributional properties (Boothe and Glassman (1987), Friedman and Vandersteel (1982) or Westerfield (1977)). Stable distributions have been prominent in the third class, and the available empirical evidence from various ad hoc or grid-based likelihood estimates (e.g. Table 1 in Boothe and Glassman, 1987) reveals that estimates of the characteristic exponent range from 1.12 to 2.06 for a variety of currencies (sometimes unrestricted ML estimates converge to values of α > 2). It is evident that confidence in these estimates should be limited for a variety of reasons: (i) log-differenced exchange rates are invariably considered (to impose stationarity), thus leaving the important question of trend versus difference stationarity unanswered; in other words, these studies impose a unit root; (ii) exchange rate changes are assumed to be independent (in order to reduce the problem to a location-scale one), an assumption that is unrealistic in view of the evidence in Diebold (1988); (iii) measures of uncertainty associated with point estimates of characteristic exponents (such as standard errors) are not given.

Following Schotman and van Dijk (1991), the issue of unit root inference in exchange rates is given special attention here. The available evidence (Diebold, 1988) generally points in the direction that unit roots are present in many exchange rate series. Schotman and van Dijk (1991) show that this evidence is considerably weakened once unit roots are given less prior weight. This point has been emphasized by many authors, including Sims (1988) and Geweke (1994), who also considered the effect of Student-t disturbances on unit root inference. The motivation for considering the relevance of fat-tailed distributions in unit root inference is here the same as in Perron (1989) and Geweke (1994).

The present study uses monthly real exchange rate data for seven currencies versus the US dollar after 1973:1. The currencies are the British Pound (BP), Canadian Dollar (CD), German Mark (DM), French Franc (FF), Japanese Yen (JY), Italian Lira (LI) and Swiss Franc (SF). The data were obtained from the IMF-IFS databank. Real rates were obtained using the consumer price indices from the same databank. There is a different number of observations for each currency, depending on the availability of price data for 1993:12. Ending dates are 1993:9 for the British Pound, 1993:11 for the Canadian Dollar, 1993:10 for the Japanese Yen and 1993:12 for all remaining currencies. Nominal rates are all available up to 1994:1.

The model developed here is based upon the general principles in Schotman and van Dijk (1991), DeJong (1992) and Geweke (1994). Let y_t denote the logarithm of the real exchange rate of a currency versus the US dollar and assume

    y_t = μ + δt + ρ y_{t−1} + Σ_{j=1}^{m} β_j (y_{t−j} − y_{t−j−1}) + u_t,   t = 1, …, n,    (14)

where u_t is i.i.d. S(α).
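Conditional on the data, Eq. (14) is a linear regression of the form of Eq. (2), so the machinery of Sections 2–4 applies directly. The sketch below is an illustration only (the function name and the m = 4 default are assumptions); it builds the dependent variable and the regressor matrix [1, t, y_{t−1}, Δy_{t−1}, …, Δy_{t−m}] from a series of log real exchange rates.

```python
import numpy as np

def build_trend_difference_design(logy, m=4):
    """Cast Eq. (14) as y = X b + error with b = (mu, delta, rho, beta_1, ..., beta_m)."""
    logy = np.asarray(logy, dtype=float)
    dy = np.diff(logy)                                   # dy[i] = y_{i+1} - y_i
    T = len(logy)
    y = logy[m + 1:]                                     # observations t = m+1, ..., T-1
    X = np.column_stack(
        [np.ones(T - m - 1),                             # intercept (mu)
         np.arange(m + 1, T, dtype=float),               # linear trend (delta * t)
         logy[m:-1]]                                     # lagged level (rho * y_{t-1})
        + [dy[m - j:T - 1 - j] for j in range(1, m + 1)] # lagged differences (beta_j)
    )
    return y, X
```

The resulting (y, X) can be fed to the same conditional draws as before, with the truncated prior on ρ handled by the rejection step described in Section 6.2.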
The substantive questions can be posed as follows: (i) What is the posterior distribution of α for the major currencies? How likely is normality a posteriori? (ii) What are the posterior properties of ρ, i.e. is trend stationarity more plausible than a unit root? (iii) What is the posterior relationship between α and ρ?

Question (i) is by far the most important as far as the distributional properties of exchange rates are concerned, and it is innovative. Question (ii) has been taken up in models with normally distributed disturbances by Schotman and van Dijk (1991) and in models with Student-t disturbances by Geweke (1993, 1994), in the context of quarterly US macroeconomic time series. Question (iii) summarizes the empirical implications of Paretian stability for stationarity. If posterior independence between ρ and α cannot be assumed, then the assumption of normality will not be without consequences for stationarity inference. Abrupt deviations from the 'mean' part of y_t should have an effect on inferences regarding ρ (and possibly other parameters), because models that assume normality are more sensitive to outliers than models where this assumption is relaxed. What exactly is the magnitude of this effect is a question that can be answered by looking at joint posterior distributions. A related argument is made by Perron (1989), whose emphasis is, however, on structural breaks rather than directly on distributional specification.

6.2. Prior distributions

The methods developed in this paper allow for arbitrary priors for α, but it is desirable at this stage to choose a 'non-informative' prior in order to convey mainly evidence from the data. We therefore choose π(α) = I_{(0,2]}(α). It should be
mentioned, however, that beta priors would be a natural choice in view of the bounded support of α. Another prior that might be natural is the one derived by placing an inverse gamma prior on the scale parameter c = σ^α along with Jeffreys' prior on σ. The resulting prior is π(α) ∝ α^ν for some constant ν; our non-informative prior results in the limit as ν → 0. Notice that a non-informative prior on c would put more mass at lower values of α and would have a corner minimum at α = 2 and a corner supremum at α = 0: this prior might be unreasonable in certain applications.

Following Eq. (3), independent normal priors are placed on (μ, δ, ρ) and π(σ) ∝ σ⁻¹. More specifically,

    μ ~ N(0, 2),   δ ~ N(0, 10),
    ρ ~ N(0.98, 0.5) I_{(−1,1]}(ρ),
    β_j ~ N(0, f j⁻¹),   j = 1, …, m,   f = 0.5,   m = 4.
The prior on μ reflects the idea that this parameter extends with high probability from −4 to 4. This prior is flat for all practical purposes given historical experience with exchange rates. The prior on δ is also rather diffuse. Evidence in Frankel and Meese (1987) points in the direction that first-order autocorrelations should be about 0.9 for monthly data. The value 0.98 adopted here is closer to the empirical evidence in Schotman and van Dijk (1991). Adopting a relatively flat prior for ρ over the stationary region is not recommended, although it is formally possible; that ρ is concentrated at values higher than about 0.8 is well documented in many studies, and this fact should be reflected formally in the prior information. Since this prior is truncated, an obvious rejection step (in conjunction with Eq. (10)) is required to maintain the correct posterior. Finally, the 'smoothness prior' on the β_j reflects the idea that these parameters quickly shrink towards zero.

The knife-edge role of α (i.e. the fact that first moments of y_t abruptly cease to exist in a neighborhood of α = 1) makes it extremely difficult to use non-independent priors that would rely on properties of average growth rates, standard deviations or other moments, as for example in Schotman and van Dijk (1991) or Geweke (1994). These quantities would not even be defined over the whole range of variation of α. Medians are still defined but are intractable to work with. Non-independent priors that do not rely on such quantities, on the other hand, are likely to be quite arbitrary.

6.3. Results

To perform Bayesian analysis of the model in Eq. (14), the Markov chain Monte Carlo technique described in Section 4 was used with 1500 preliminary passes and 5000 subsequent passes (which required about 20 min per dataset). Full ML is used to obtain the ML estimate of α and its standard error. The Metropolis chain parameter c is selected to imply average acceptance rates of about 50%. According to ergodic averages, quantile diagnostics and Geweke's (1991) test, convergence to the posterior distributions obtains within 500–1000 passes. Posterior means and standard deviations are reported in Table 2.

The evidence can be summarized as follows. First, no evidence of a linear trend is apparent, and posterior standard deviations of δ are 1–4 times as large as the posterior means. Second, posterior means of ρ are fairly concentrated around 0.98 with standard errors of about 0.01. Figs. 3 and 4 present the marginal posterior densities of ρ. These densities generally peak sharply at values of ρ less than unity.
Figures are reported for a limited number of currencies. For the remaining currencies, marginal posteriors are similar in nature and are not reported. The full set of figures is available from the author on request.
Table 2
Empirical results for the exchange rate data. Posterior standard deviations in parentheses.

        BP          CD          DM          FF          JY          LI          SF
μ      −0.010       0.002       0.013       0.036       0.134       0.108       0.014
       (0.008)     (0.002)     (0.008)     (0.021)     (0.072)     (0.072)     (0.010)
ρ       0.975       0.986       0.982       0.980       0.975       0.985       0.975
       (0.015)     (0.009)     (0.011)     (0.011)     (0.013)     (0.010)     (0.014)
δ*      2.600       0.390       0.750       0.350       5.900       1.200       1.280
       (0.003)     (0.001)     (0.003)     (0.003)     (0.004)     (0.003)     (0.004)
β₁      0.107      −0.031       0.049       0.029       0.106       0.099       0.074
       (0.066)     (0.0599)    (0.064)     (0.064)     (0.064)     (0.061)     (0.063)
β₂      0.0365     −0.033       0.085       0.091       0.069       0.131       0.092
       (0.062)     (0.059)     (0.062)     (0.059)     (0.064)     (0.063)     (0.062)
β₃     −0.016       0.036       0.002       0.066       0.116       0.081       0.037
       (0.062)     (0.059)     (0.060)     (0.060)     (0.063)     (0.058)     (0.061)
β₄      0.035       0.053      −0.028       0.067       0.042       0.012       0.032
       (0.065)     (0.055)     (0.062)     (0.060)     (0.062)     (0.059)     (0.060)
α       1.866       1.694       1.804       1.783       1.913       1.669       1.869
       (0.080)     (0.097)     (0.093)     (0.110)     (0.059)     (0.127)     (0.074)

95% Bayes probability intervals for α:
        1.68–1.98   1.51–1.88   1.56–1.96   1.53–1.96   1.77–1.99   1.41–1.90   1.70–1.98

95% Bayes probability intervals for ρ:
        0.944–0.998  0.967–0.999  0.959–0.999  0.956–0.998  0.947–0.997  0.963–0.999  0.944–0.998

Correlation coefficients between α and ρ:
        −0.160      −0.069      −0.174      −0.205      −0.081      −0.154      −0.140

* ×10⁻³.
Table 2 also reports 95% Bayes probability intervals for ρ. Since the prior for ρ does not assign mass at ρ = 1, testing for a unit root is not straightforward. However, the 95% Bayes probability intervals range from about 0.95 to 0.997 (for JY) and 0.999 for most of the other currencies. Nevertheless, as in Schotman and van Dijk (1991), the marginal posterior densities of ρ do not peak at ρ = 1, and in some cases (e.g. BP, JY and SF) the difference of probability masses near the posterior mode and at ρ = 1 is considerable.

Third, regarding α, posterior means range from 1.69 (CD) to 1.91 (JY). Marginal posterior densities are reported in the upper panels of Figs. 5 and 6. It is remarkable that evidence against normality is rather strong, and these densities peak sharply at values below 2. Bayes 95% probability intervals for α are reported in Table 2. It seems that exchange rates are consistent with values of α as high as 1.88 for CD, 1.96 for DM, 1.99 for JY and values in this range for the remaining currencies. The lower endpoints are, however, between 1.41 (for LI) and 1.77 (for JY). Despite the fact that one has to impose the restriction α ≤ 2, it
Fig. 3. Marginal posterior density of ρ: German Mark.
seems that exchange rates do exhibit deviations from normality, with values of α = 1.8 being fairly typical. Apparently, marginal densities of α are not even approximately symmetric. As a result, asymptotic approximations (or ad hoc methods that rely on asymptotic normality, at least for inference purposes) are not expected to work well for the data at hand.

Fourth, evidence on the joint posterior of α and ρ is conveyed by the lower panels of Figs. 5 and 6. The most important feature of the bivariate posterior is the fact that α and ρ tend to be negatively correlated, as can be seen from the contour lines of these posteriors in Fig. 7. This is also obvious upon inspection of the posterior correlation coefficients between α and ρ in Table 2. These are between about −0.1 and −0.2, with two exceptions, namely the JY and CD. The restrictions ρ ≤ 1 and α ≤ 2 result in 'cuts' in the joint posteriors of α and ρ along both axes (lower panels of Figs. 5 and 6), with the exception of the LI, for which the posterior of α is more symmetric compared to other currencies. This places even less mass near α = 2. The same is true for the CD and, to a lesser extent, for the DM (Fig. 5). The effect of the 'cut' is that the posterior mean of α is lower compared to a hypothetical situation where α could vary in (0, ∞). For that reason it is easier to 'reject' normality, because marginal posteriors of α have
Fig. 4. Marginal posterior density of ρ: Japanese Yen.
peaks well below α = 2. Monotonically increasing posteriors in the range (0, 2] would naturally offer more support to normality. Upon inspection of Fig. 7 it is evident that restricting α ≤ 2 is generally not important, with the possible exception of the Japanese Yen. The stationarity restriction ρ ≤ 1 is more important, but the JY and SF offer an exception.

The issue of negative correlation between ρ and the shape parameter α is investigated further by assuming that the errors are Student-t with ν degrees of freedom. For given ν, Bayes analysis of the model is performed as in Geweke (1993). We assume ν = 1, 2, 3, 5, 7, 10 and 50. It is found invariably that posterior means of ρ increase as ν decreases, with the exception of ν ≤ 2; posterior standard errors of ρ, of course, have to decrease, since leptokurtosis permits more precise estimation of location. The same phenomenon occurs when one fixes α (at 1.99, 1.9, 1.8, 1.7, 1.5, 1.1 and 0.9). The effect, however, is larger on the standard errors than on the posterior means themselves, as one would expect from the low correlation coefficients between α and ρ in Table 2.
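For the fixed-ν Student-t comparison just described, the only change to the sampler of Section 3 is that the positive stable mixing law is replaced by a gamma law whose conditional posterior is available in closed form. The sketch below shows this standard scale-mixture step; it is not a reproduction of Geweke's (1993) code, and the parametrization (precision-type weights λᵢ with εᵢ | λᵢ ~ N(0, σ²/λᵢ) and λᵢ ~ Gamma(ν/2, rate ν/2)) is an assumption of this illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

def draw_student_t_weights(y, X, beta, sigma, nu):
    """Conditional draw of gamma mixing weights for Student-t errors:
    lambda_i | . ~ Gamma((nu + 1)/2, rate = (nu + e_i^2 / sigma^2) / 2)."""
    e = y - X @ beta
    shape = 0.5 * (nu + 1.0)
    rate = 0.5 * (nu + (e / sigma) ** 2)
    return rng.gamma(shape, 1.0 / rate)   # NumPy's gamma takes a scale parameter
```

In the β and σ draws of Section 3.1 one then uses Ω = diag(1/λ₁, …, 1/λₙ) in place of diag(ω₁, …, ωₙ).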
Gibbs sampling with fixed ν or α uses 1100 passes, 100 of which are ignored.
Fig. 5. Marginal posterior densities of α and ρ: German Mark.
Fig. 6. Marginal posterior densities of α and ρ: Japanese Yen.
Fig. 7. Contours of the joint posterior of α and ρ: German Mark and Japanese Yen.
The negative correlation between α and ρ is also evidenced by the contours of the joint posteriors given in Fig. 7. Notice that conditioning on ρ = 1 implies a great deal of uncertainty about α. It is remarkable that the 'cuts' are more pronounced for ρ than for α. The data are informative in deciding against normality, but they would be less decisive in deciding about strict non-stationarity (ρ > 1). The same pattern is generally observed in other studies (e.g. Schotman and van Dijk, 1991).
7. Conclusions and directions for future research

The paper proposed a practical method for posterior inference in general models with symmetric stable Paretian disturbances. For clarity, the emphasis has been on linear models, but generalizations are straightforward. The new method is based on a combination of Gibbs sampling with data augmentation, Metropolis independence chains and an FFT-based Metropolis chain for the characteristic exponent. It has been emphasized that the method produces the posterior distributions, and not just the mode of the likelihood function. The method has been applied to some artificial models and to a model that allows for unit roots. The second model has been applied to a set of monthly exchange rates of seven currencies versus the US dollar, and unit root inference has been developed jointly with inference regarding the characteristic exponent.

There are important extensions suggested by the present work. First, it is important to resolve the issue of Paretian stability versus normality (or other candidate distributions for that purpose) for important economic time series and (naturally) stock prices. Formulating capital asset or arbitrage pricing models with stable disturbances is an important and feasible extension of the present work. Second, comparisons with ARCH-type models (Bollerslev, 1986) are important. Whether or not stable distributions are more useful than ARCH models is an empirical matter that remains to be resolved. The results of Vries (1991) indicate relationships between GARCH and stable processes. However, putting model comparison on more formal ground is necessary. Third, Section 6 points in the direction that posterior inference in stable models with a general time series structure and possible non-stationarities is in fact exactly the same as the analysis of the linear model once the data are given. Further extensions to time series models with possible non-linearities would be both important and useful in such areas as business cycle research, asset pricing, etc. One of the advantages of this line of research is that it removes the unrealistic assumption that the underlying time series must be an i.i.d. process, an assumption that was used without exception in empirical applications of stable processes so far. Fourth, stable Paretian processes imply the restriction that second moments are infinite. To the extent that second moments vary considerably, the stable Paretian hypothesis might be a good approximation. One would, however, like to remove this restriction while maintaining Pareto tails, which are likely to capture well the tail behavior of many economic time series. In other words, it would be desirable to work with 'hyper-stable' distributions where α is allowed to vary in (0, r) for r substantially greater than 2 and ideally r = ∞. Although one can certainly think of non-stable distributions with Pareto-like behavior (see Phillips (1993) for example), model comparison and estimation is not a trivial undertaking, unless one is willing to restrict attention to order statistic
or ad hoc methods. An exact likelihood analysis in classes of distributions of this form would be especially important in its own right, as well as in several financial data applications. Fifth, extensions to the case of stable models with time-varying or stochastic volatility are important. In this case the stable disturbance in Eq. (2) would have to be represented as a normal scale mixture with stochastic scale. Conditionally on volatility parameters, Bayesian analysis of the model can be carried out along the lines discussed in this study. Generating random drawings from the conditional distributions of volatility or GARCH parameters would proceed via standard methods (e.g. Jacquier et al., 1994; Muller and Pole, 1993). However, the feasibility and efficiency of these methods still remain to be examined in this context.

Acknowledgements

This work was supported in part by the University of Toronto Connaught Fund Grant No. 3-373-156-20, and was completed while the author was with the Department of Economics, University of Toronto. The author is indebted to an Associate Editor and two anonymous referees for their detailed comments that helped to improve the paper considerably. The author also wishes to acknowledge very helpful discussions and comments from John Geweke, who encouraged and motivated this study. Discussions with Andrey Feuerverger, Carmen Fernandez, Dale Poirier, Mark Steel and comments from Hu McCulloch are gratefully acknowledged. Any remaining errors are, however, the author's responsibility.

Appendix A

Proposition A.1. For all (β, σ, α) ∈ Θ̄, ∫_{ℝ₊ⁿ} P(β, ω, σ, α | y, X) dω = P*(β, σ, α | y, X).
Proof. The integral exists by Lemma A.3 below. By Lemma A.2, uᵢ ~ (2ωᵢ)^{1/2}εᵢ whenever uᵢ ~ S(α), ωᵢ ~ S₊(α/2) and εᵢ ~ N(0, 1), all independent and α ∈ (0, 2]. Therefore (σ/√2)uᵢ ~ σωᵢ^{1/2}εᵢ, from which it follows that (since the {uᵢ} are independent)

    f( (yᵢ − xᵢ′β)/(σ/√2) | α ) ∝ σ⁻¹ ∫_{ℝ₊} ωᵢ^{−1/2} exp[ −(yᵢ − xᵢ′β)² ωᵢ⁻¹/(2σ²) ] h(ωᵢ|α) dωᵢ,

and hence

    ∏ᵢ₌₁ⁿ f( (yᵢ − xᵢ′β)/(σ/√2) | α ) ∝ σ⁻ⁿ ∫_{ℝ₊ⁿ} ∏ᵢ₌₁ⁿ ωᵢ^{−1/2} exp[ −(yᵢ − xᵢ′β)² ωᵢ⁻¹/(2σ²) ] h(ωᵢ|α) dω₁ ⋯ dωₙ.    (A.1)
Multiplying by π(β, σ, α) proves the proposition. □
Lemma A.1. Assume (i) W ~ S₊(θ), 0 < θ < 1, and u ~ S(ρ), 0 < ρ ≤ 2, and (ii) W and u are independent. Then uW^{1/ρ} ~ S(ρθ).

Proof. Feller (1971, p. 176). □

Lemma A.2. Assume (i) ω ~ S₊(α/2), ε ~ N(0, 1), 0 < α/2 < 1, and (ii) ω and ε are independent. Adopt the convention that there is a point mass at ω = 1 when α/2 = 1, and let h(·|α) denote the density of random variables with distributions in S₊(α/2). Then (2ω)^{1/2}ε ~ S(α).

Proof. For 0 < α/2 < 1 the lemma follows from Lemma A.1 with ρ = 2 and θ = α/2. When α/2 = 1 the equivalence follows by the convention in the statement of this lemma. □

Lemma A.3. ∫_{ℝ₊ⁿ} P(β, ω, σ, α) dω exists and is finite.
Proof. Consider Eq. (A.1). Since the term
(y !x b) G p\Lexp ! G n(b, p) 2p is bounded, the lemma would follow provided one can show that the integral u\exp(!qu\)h(u"a) du exists, where q,(y !xb)/2p, some G G i"1,2, n. Since u\Bh(u"a) du"C(1#d/a)/C(1#d),d*0, ( Brockwell and Brown, 1981, p. 628) finite integrability follows. 䊐 Proposition A.2. (i) Provided n'k, Eq. (6) is the kernel of a posterior density for (b, u, p, a). (ii) ¼hen A is positive definite and n'k, all posterior moments of b exist. (iii) E(pP" y, X)(R provided n!k'r. (iv) E(aK" y, X )(R, m'0. Proof. (i) Follows immediately from the proof of (ii). (ii) Suppose first that the prior n (b, p"a)Jp\. By Eq. (6), existence of the (r 2r )th moment of b requires that the following integral is finite: I
I L "b "PH f ([y !xb]/]p/(2]"a) db p\L> dp da. H G G H G 1I 1>
Using Eq. (7), this integral is proportional to
$$\int_{(0,2]}\int_{\mathbb{R}_+^n}\int_{\mathbb{R}_+}\Big[\int_{\mathbb{R}^k}\prod_{j=1}^{k}|\beta_j|^{r_j}\,f_N^{(k)}\big(\beta\mid\hat\beta(\Lambda),\,\sigma^2[X'\Lambda^{-1}X]^{-1}\big)\,d\beta\Big]\,\sigma^{-(n-k+1)}\exp\big[-s(\Lambda)/2\sigma^2\big]\prod_{i=1}^{n} u_i^{-1/2}h(u_i\mid\alpha)\,\big|X'\Lambda^{-1}X\big|^{-1/2}\,d\sigma\,du\,d\alpha, \qquad \text{(A.2)}$$
where $\Lambda\equiv\mathrm{diag}(u_1,\dots,u_n)$, $\hat\beta(\Lambda)=[X'\Lambda^{-1}X]^{-1}X'\Lambda^{-1}y$, $s(\Lambda)=y'\Lambda^{-1}y-y'\Lambda^{-1}X[X'\Lambda^{-1}X]^{-1}X'\Lambda^{-1}y$, and $f_N^{(k)}(\beta\mid\mu,\Sigma)$ denotes the $k$-variate normal density for $\beta$ with mean $\mu$ and covariance matrix $\Sigma$. Using the continuity of $h(u_i\mid\alpha)$ in $\alpha$, and applying the mean value theorem for integrals with respect to $\alpha$, the integral in Eq. (A.2) is proportional to
$$\int_{\mathbb{R}_+^n}\int_{\mathbb{R}_+}\Big[\int_{\mathbb{R}^k}\prod_{j=1}^{k}|\beta_j|^{r_j}\,f_N^{(k)}\big(\beta\mid\hat\beta(\Lambda),\,\sigma^2[X'\Lambda^{-1}X]^{-1}\big)\,d\beta\Big]\,\sigma^{-(n-k+1)}\exp\big[-s(\Lambda)/2\sigma^2\big]\prod_{i=1}^{n} u_i^{-1/2}h(u_i\mid\alpha^*)\,\big|X'\Lambda^{-1}X\big|^{-1/2}\,d\sigma\,du \qquad \text{(A.3)}$$
for some $\alpha^*\in(0,2)$. The above integral was shown to exist (for more general classes of $u_i$-priors) by Fernandez and Steel (1996, Theorem 2), provided $r<\min\{n-k,\,n-k+m-p(r)\}$, where $p(r)=\max\{p_j:r_j>0\}$, $p_j$ is their singularity index of $X$, $m=\min\{|m^-|,m^+\}$, $m^-=\inf S$, $m^+=\sup S$, and $S=\{s\in\mathbb{R}:E(u_i^{-s})\equiv\int u_i^{-s}h(u_i\mid\alpha)\,du_i<\infty\}$. Under the positive stable prior for $u_i$, $u_i$ has finite moments of every non-positive order (see the proof of Lemma A.3), and as a result $m^-=-\infty$, $m^+=0$, $m=0$. If every $k\times k$ submatrix of $X$ has rank $k$, we obtain $\max\{p_j:r_j>0\}=0$, and therefore $r<n-k$ implies existence of moments of $\beta$ of order less than $n-k$.
Now, with a normal prior for $\beta$ as in Eq. (3), it is easy to see that $E\{\prod_{j=1}^{k}|\beta_j|^{r_j}\mid y, X\}$ is bounded above by a multiple of Eq. (A.3) provided $A$ is positive definite, and therefore all moments of order less than $n-k$ exist as well. (iii) To show that $E(\sigma^r\mid y, X)$ exists, one must show that the following integral is finite:
$$\int_{(0,2]}\int_{\mathbb{R}_+}\int_{\mathbb{R}^k}\sigma^{-(n+r+1)}\prod_{i=1}^{n} f\big([y_i - x_i'\beta]/[\sigma/\sqrt{2}]\,\big|\,\alpha\big)\,d\beta\,d\sigma\,d\alpha,$$
or (using the mean value theorem for integrals) that for some $\alpha^*\in(0,2)$ the following integral is finite:
$$\int_{\mathbb{R}_+}\int_{\mathbb{R}^k}\sigma^{-(n+r+1)}\prod_{i=1}^{n} f\big([y_i - x_i'\beta]/[\sigma/\sqrt{2}]\,\big|\,\alpha^*\big)\,d\beta\,d\sigma.$$
Using Theorem 4 of Fernandez and Steel (1996, pp. 8 and 21), existence of this integral requires that $r<n-k$ and that $E(u_i^{-r})\equiv\int u_i^{-r}h(u_i\mid\alpha)\,du_i$ is finite. As shown in the proof of Lemma A.3, the last condition is satisfied. (iv) To show that $E(\alpha^m\mid y, X)$ is finite, we apply again the mean value theorem for integrals, to see that the following integral must be finite:
$$\int_{\mathbb{R}_+^n}\int_{\mathbb{R}_+}\int_{\mathbb{R}^k}(\alpha^*)^m\prod_{i=1}^{n} u_i^{-1/2}h(u_i\mid\alpha^*)\;\sigma^{-(n+1)}\exp\Big[-\frac{1}{2\sigma^2}(y-X\beta)'\Lambda^{-1}(y-X\beta)\Big]\,d\beta\,d\sigma\,du.$$
Existence of this integral follows by Theorem 1 of Fernandez and Steel (1996). □
Appendix B. Practical issues
B.1. Random number generation from the conditional distribution of $\alpha$

Sampling the conditional distribution of $\alpha$ in Eq. (11) requires knowledge of the log-likelihood function for stable laws in standard form. The method employed in this paper relies on interpolation of values of stable density functions computed using fast Fourier transforms (FFT) of the characteristic function. Given the characteristic function of symmetric stable laws, $\varphi(t)=\exp(-|t|^\alpha)$, the probability density function is, by the inversion theorem,
$$f(x\mid\alpha)=\frac{1}{2\pi}\int_{-\infty}^{\infty}\exp(-|t|^\alpha)\exp(-\mathrm{i}tx)\,dt.$$
For $\alpha\ge 1$ the FFT is as described in the main text. For $\alpha<1$ the FFT is increasingly inaccurate. In order to avoid this and maintain accuracy, it is possible to perform an FFT of a transformed stable density with characteristic exponent greater than one. More specifically, one exploits the property
$$f(x\mid\alpha,0)=x^{-(1+\alpha)}f(x^{-\alpha}\mid a, b), \qquad \text{(B.1)}$$
where $a>1$ and $b=a-1$ (see Zolotarev, 1986, pp. 86–88). However, the FFT performs reasonably well even with values of $\alpha$ as low as 0.8. For a few lower values exact quadrature can be used instead, provided the proportion of such low values of $\alpha$ is not exceedingly large. All computations were checked against quadrature. FFTs will generally give 3–4 significant digits provided $|x|$ is not larger than about 7.0. For $\alpha<1$, FFTs are accurate to 2–3 digits in the entire range of the FFT, i.e. $|x|<51$, but the asymptotic series are generally faster and were used instead. However, to increase accuracy, an FFT of a transformed density with exponent greater than 1 was used, as in Eq. (B.1). For $|x|>7$ five terms of the asymptotic series were used. The asymptotic series is
$$f(x\mid\alpha)=\frac{1}{\pi x}\sum_{n=1}^{\infty}(-1)^{n-1}\,\frac{\Gamma(n\alpha+1)}{n!}\,\sin(n\pi\alpha/2)\,|x|^{-n\alpha}.$$
See for example Zolotarev (1986, pp. 106–107). Bounds for $|x|$ such that the FFT is sufficiently accurate were prepared in advance by comparing the FFT to exact quadrature. For these test purposes integration was performed using the two integrands given in Holt and Crow (1973, Paragraphs 2.12 and 4.2), namely
$$f(x\mid\alpha)=\frac{1}{\pi}\int_0^{\infty}\exp(-t^\alpha)\cos(tx)\,dt,$$
$$f(x\mid\alpha)=\frac{1}{\pi}\int_0^{\infty}\exp(-D_1 t^\alpha-xt)\sin(D_2 t^\alpha)\,dt,$$
where $D_1=\cos(\alpha\pi/2)$ and $D_2=\sin(\alpha\pi/2)$. The first expression is used for $\alpha>1$ and the second for $\alpha<1$. For $\alpha=1$ and $\alpha=2$ well-known exact formulae are available, i.e. $f(x\mid 1)=\pi^{-1}(1+x^2)^{-1}$ and $f(x\mid 2)=(4\pi)^{-1/2}\exp(-x^2/4)$. The value of the density at the origin is also available analytically,
$$f(0\mid\alpha)=\frac{\Gamma(1/\alpha)}{\pi\alpha}, \qquad \text{(B.2)}$$
see Zolotarev (1986, Formula 2.2.11).
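A rough numerical sketch of this inversion follows; it is not the interpolation scheme actually used in the paper, the grid size and truncation point are illustrative, and NumPy is assumed. The sketch also evaluates a few terms of the asymptotic series and checks the output against the exact Cauchy, Gaussian and origin values quoted above.

```python
import numpy as np
from math import factorial, gamma, pi

def stable_pdf_fft(alpha, n_grid=2**14, t_max=60.0):
    """Symmetric stable density on a grid via an FFT of exp(-|t|^alpha).
    Discretizes f(x) = (1/2pi) * integral of exp(-|t|^alpha) exp(-i t x) dt.
    n_grid should be a multiple of 4 so the centring phase factors cancel."""
    dt = 2.0 * t_max / n_grid
    t = dt * (np.arange(n_grid) - n_grid / 2)      # t-grid on [-t_max, t_max)
    dx = 2.0 * pi / (n_grid * dt)                  # implied x-spacing
    x = dx * (np.arange(n_grid) - n_grid / 2)      # matching centred x-grid
    phi = np.exp(-np.abs(t) ** alpha)              # characteristic function
    signs = (-1.0) ** np.arange(n_grid)            # re-centres both grids at zero
    f = (dt / (2.0 * pi)) * signs * np.fft.fft(signs * phi)
    return x, f.real                               # imaginary part is rounding noise

def stable_pdf_tail(x, alpha, n_terms=5):
    """Leading terms of the asymptotic series for f(x|alpha), |x| large."""
    total = 0.0
    for n in range(1, n_terms + 1):
        total += ((-1.0) ** (n - 1) * gamma(n * alpha + 1.0) / factorial(n)
                  * np.sin(n * pi * alpha / 2.0) * abs(x) ** (-n * alpha))
    return total / (pi * abs(x))

# f(0|alpha) against Gamma(1/alpha)/(pi*alpha)
for alpha in (0.9, 1.0, 1.5, 2.0):
    x, f = stable_pdf_fft(alpha)
    print(alpha, f[len(x) // 2], gamma(1.0 / alpha) / (pi * alpha))

x, f = stable_pdf_fft(1.0)                         # Cauchy case
j = int(np.argmin(np.abs(x - 1.0)))
print(f[j], 1.0 / (pi * (1.0 + x[j] ** 2)))

x, f = stable_pdf_fft(2.0)                         # Gaussian case (variance 2)
j = int(np.argmin(np.abs(x - 1.0)))
print(f[j], np.exp(-x[j] ** 2 / 4.0) / np.sqrt(4.0 * pi))

print(stable_pdf_tail(10.0, 1.0), 1.0 / (pi * (1.0 + 10.0 ** 2)))   # tail series vs. Cauchy
```

Comparing the grid value nearest the origin with Eq. (B.2) is the kind of check that motivates the error correction described below.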
The asymptotic series is again used for large values of the argument ($x\ge 20$). Because quadrature is the only available way to check FFT results, several (standard form) likelihoods were computed for $\alpha$ values for which the stable density is not available in terms of special functions. Plots of the likelihoods versus the only parameter ($\alpha$) looked reasonable and, therefore, this served as an independent check of the validity of the exact quadrature subroutines. These subroutines were not used at all in the Markov chain Monte Carlo computations, since quadratures for large sample sizes ($n$ greater than about 50) are extremely computer intensive. Feuerverger and McDunnough (1980) point out that the approximation error of the FFT is essentially constant for any $\alpha$ and in the range of values of $x$ for which the FFT should be used (i.e. $|x|$ sufficiently small). The author verified this fact in his own numerical experiments. Since the approximation error is practically constant and $f(0\mid\alpha)$ is known analytically (see Eq. (B.2)), it follows that subtracting this known value from the available FFT-based density approximation reduces the error substantially. This strategy was followed in both the constructed examples and the exchange rates application of this paper. With this construction, reasonable accuracy is expected when results are compared to quadrature or Zolotarev's integral approximation (see Zolotarev (1986, Theorem 2.2.1) or Zolotarev (1966)). In the tails (i.e. $|x|$ sufficiently large) asymptotic expansions are used, and therefore machine precision is obtained.

B.2. Remarks on alternatives

Experimentation with alternative ways to compute the stable density suggests that various problems exist and that the FFT is by far the fastest exact method. Quadrature is of course possible but very expensive for moderate sample sizes. First, various simulated likelihood techniques seem to work well only at excessive computational cost. One approach is to compute the density $f(x\mid\alpha)$ by averaging (in $u$) the quantity $(2u)^{-1/2}\phi\big(x/\sqrt{2u}\big)$, where $\phi$ is the standard normal density and $u$ is $S_+(\alpha/2)$; a sketch of this estimator appears at the end of this appendix. This idea exploits the fact that symmetric stable variates have a normal scale mixture representation. With 500 replications standard errors were still large for values of $\alpha$ less than about 1, and computation time was prohibitive with more than 50 replications. If, however, one desires the density for a few $\alpha$ values, then 10,000 replications (plus some smoothing) will work well. Embedding this in a Markov chain Monte Carlo scheme is, however, very poor practice. Second, another alternative to the rough estimates produced by Monte Carlo integration of normal densities is to use a kernel smoother to estimate the stable density at a given $\alpha$ using a random sample from that distribution. However, this technique was not used because it should fail with certainty; for all $\alpha\ne 2$, if $X_i$ ($i=1,\dots,n$) are i.i.d. $S(\alpha)$ then $\max(X_i)$ will eventually dominate the average. In effect, the smoother must fail to be precise exactly where we hope to extract more information about $\alpha$, namely at the extreme order statistics of the sample. One could of course use the asymptotic series in the tails, but then FFTs are faster. A related idea is explored next. Third, the idea of using a Dirichlet mixture to estimate the density could be used. Muller (1992) used the method to build up estimates of conditional distributions that are hard to compute. This requires embedding a Gibbs sampler in the chain just to get a single draw of $\alpha$. While it is obvious that the accuracy of this method depends on (i) how many Gibbs subpasses are used and (ii) how complex the mixture is, it is conjectured that even 2–3 Gibbs passes with 5–10 mixants will be slower than a Metropolis-FFT pass. Because the Dirichlet step has unknown accuracy, and in view of the fact that FFTs are quite accurate, this idea was not given further consideration.
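As a concrete illustration of the first alternative, the following sketch (hypothetical code assuming NumPy, with illustrative replication counts; the positive stable generator is the same Kanter construction sketched after Proposition A.1) estimates $f(x\mid\alpha)$ by averaging the conditional normal density over positive stable draws, and uses the Cauchy case $\alpha=1$ as an exact check.

```python
import numpy as np

def positive_stable(theta, size, rng):
    """Kanter's generator: W with E[exp(-s W)] = exp(-s**theta), 0 < theta < 1."""
    u = rng.uniform(0.0, np.pi, size)
    e = rng.exponential(1.0, size)
    return (np.sin(theta * u) / np.sin(u) ** (1.0 / theta)) \
        * (np.sin((1.0 - theta) * u) / e) ** ((1.0 - theta) / theta)

def stable_pdf_mc(x, alpha, n_rep, rng):
    """Estimate f(x|alpha) as the average over u ~ S+(alpha/2) of the
    conditional N(0, 2u) density (2u)**(-1/2) * phi(x / sqrt(2u))."""
    u = positive_stable(alpha / 2.0, n_rep, rng)
    scale = np.sqrt(2.0 * u)                       # conditional standard deviation given u
    dens = np.exp(-0.5 * (x[:, None] / scale) ** 2) / (scale * np.sqrt(2.0 * np.pi))
    return dens.mean(axis=1), dens.std(axis=1) / np.sqrt(n_rep)

rng = np.random.default_rng(1)
x = np.array([0.0, 1.0, 3.0, 7.0])
for n_rep in (500, 10_000):                        # few vs. many replications
    est, se = stable_pdf_mc(x, alpha=0.9, n_rep=n_rep, rng=rng)
    print(n_rep, np.round(est, 4), np.round(se, 4))
# Exact check in the Cauchy case: f(x|1) = 1 / (pi * (1 + x**2))
est, _ = stable_pdf_mc(x, alpha=1.0, n_rep=100_000, rng=rng)
print(np.round(est, 4), np.round(1.0 / (np.pi * (1.0 + x ** 2)), 4))
```

The reported standard errors illustrate the point made above: the estimator is usable for a handful of $\alpha$ values with many replications, but it is too noisy and too slow to embed inside a Markov chain Monte Carlo pass.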
References

Akrigay, V., Booth, G., 1988. The stable-law model of stock returns. Journal of Business and Economic Statistics 6, 51–57.
Akrigay, V., Lamoureux, C.G., 1989. Estimation of stable-law parameters: a comparative study. Journal of Business and Economic Statistics 7, 85–93.
Albert, J.H., Chib, S., 1993. Bayes inference via Gibbs sampling of autoregressive time series subject to Markov mean and variance shifts. Journal of Business and Economic Statistics 11, 1–15.
Baillie, R.T., 1993. Comment on 'Modeling asset returns with alternative stable distributions'. Econometric Reviews 12, 343–345.
Baillie, R.T., Bollerslev, T., 1990. A multivariate generalized ARCH approach to modeling risk premia in forward foreign exchange rate markets. Journal of International Money and Finance 9, 309–324.
Blattberg, R.C., Gonedes, N.J., 1974. A comparison of the stable and student distributions as statistical models for stock prices. Journal of Business 47, 244–280.
Bollerslev, T., 1986. Generalized autoregressive conditional heteroscedasticity. Journal of Econometrics 31, 307–327.
Bol'shev, L.N., Zolotarev, V.M., Kedrova, E.S., Rubinskaya, M.A., 1968. Tables of cumulative functions of one-sided stable distributions. Theory of Probability and its Applications 15, 299–309.
Boothe, P., Glassman, D., 1987. The statistical distribution of exchange rates: empirical evidence and economic implications. Journal of International Economics 22, 297–319.
Brockwell, P.J., Brown, B.M., 1981. High-efficiency estimation for the positive stable laws. Journal of the American Statistical Association 76, 626–631.
Buckle, D.J., 1995. Bayesian inference for stable distributions. Journal of the American Statistical Association 90, 605–613.
Carlin, B.P., Gelfand, A.E., 1991. An iterative Monte Carlo method for nonconjugate Bayesian analysis. Statistics and Computing 1, 119–128.
Carlin, B.P., Polson, N.G., 1991. Inference for nonconjugate Bayesian models using the Gibbs sampler. Canadian Journal of Statistics 19, 399–405.
Chambers, J.M., Mallows, C.L., Stuck, B.W., 1976. A method for simulating stable random variables. Journal of the American Statistical Association 71, 340–344.
Chib, S., 1993. Bayes regression with autoregressive errors. Journal of Econometrics 58, 275–294.
Chib, S., Greenberg, E., 1995. Understanding the Metropolis-Hastings algorithm. The American Statistician 49, 327–335.
DeJong, D.N., 1992. Cointegration and trend-stationarity in macroeconomic time series: evidence from the likelihood function. Journal of Econometrics 52, 347–370.
Diebold, F.X., 1988. Empirical Modeling of Exchange Rate Dynamics, Lecture Notes in Economics and Mathematical Systems, vol. 303. Springer, New York.
DuMouchel, W.H., 1973. Stable distributions in statistical inference. I: symmetric stable distributions compared to other symmetric long-tailed distributions. Journal of the American Statistical Association 68, 469–477.
DuMouchel, W.H., 1975. Stable distributions in statistical inference. II: information from stably distributed samples. Journal of the American Statistical Association 70, 386–393.
DuMouchel, W.H., Olshen, R.A., 1975. On the distributions of claim costs. In: Credibility: Theory and Applications, Proceedings of the Berkeley Actuarial Research Conference. Academic Press, New York, pp. 23–50, pp. 409–414.
Fama, E.F., Roll, R., 1968. Some properties of symmetric stable distributions. Journal of the American Statistical Association 63, 817–836.
Fama, E.F., Roll, R., 1971. Parameter estimates for symmetric stable distributions. Journal of the American Statistical Association 66, 331–338.
Feller, W., 1971. An Introduction to Probability Theory and Its Applications, vol. II, 2nd ed. Wiley, New York.
Fernandez, C., Steel, M., 1996. On Bayesian inference under sampling from scale mixtures of normals. Center for Economic Research Discussion Paper No. 9602, Tilburg University, Netherlands.
Feuerverger, A., McDunnough, P., 1980. On efficient inference in symmetric stable laws and processes. In: Csorgo, M., Dawson, D.A., Rao, J.N.K., Saleh, A.K.Md.E. (Eds.), Statistics and Related Topics. North-Holland, Amsterdam, pp. 109–122.
Feuerverger, A., McDunnough, P., 1981. On some Fourier methods for inference. Journal of the American Statistical Association 76, 379–387.
Frankel, J.A., Meese, R., 1987. Are exchange rates excessively variable? National Bureau of Economic Research Macroeconomic Annual.
Friedman, D., Vandersteel, S., 1982. Short-run fluctuations in foreign exchange rates: evidence from the data 1973–1979. Journal of International Economics 13, 171–186.
Gelfand, A.E., Smith, A.F.M., 1989. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85, 398–409.
Gelman, A., Rubin, D.B., 1992. Inference from iterative simulation using multiple sequences (with discussion). Statistical Science 7, 457–511.
Geweke, J., 1991. Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. In: Bernardo, J.M., Berger, J.O., Dawid, A.P., Smith, A.F.M. (Eds.), Bayesian Statistics, vol. 4. Clarendon Press, Oxford.
Geweke, J., 1993. Bayesian treatment of the independent Student-t linear model. Journal of Applied Econometrics 8, S19–S40.
Geweke, J., 1994. Priors for macroeconomic time series and their application. Econometric Theory 10, 609–632.
Geyer, C.J., 1992. Practical Markov chain Monte Carlo. Statistical Science 7, 473–482.
Gilks, W.R., 1992. Derivative-free adaptive rejection sampling for Gibbs sampling. In: Bernardo, J.M., Berger, J.O., Dawid, A.P., Smith, A.F.M. (Eds.), Bayesian Statistics, vol. 4. Oxford University Press, Oxford, pp. 631–649.
Hastings, W.K., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109.
Holt, D.R., Crow, E.L., 1973. Tables and graphs of the stable probability functions. Journal of Research of the National Bureau of Standards 77, 143–198.
Hsieh, D.A., 1988. The statistical properties of daily foreign exchange rates: 1974–1983. Journal of International Economics 24, 129–145.
Hsieh, D.A., 1989. Modeling heteroscedasticity in daily foreign exchange rate markets. Journal of Business and Economic Statistics 7, 307–317.
Ibragimov, I.A., Chernin, K.E., 1959. On the unimodality of stable laws. Theory of Probability and its Applications 4, 417–419.
Jacquier, E., Polson, N.G., Rossi, P.E., 1994. Bayesian analysis of stochastic volatility models. Journal of Business and Economic Statistics 12, 371–417.
Koutrouvelis, I.A., 1980. Regression-type estimation of the parameters of stable laws. Journal of the American Statistical Association 75, 918–928.
Koutrouvelis, I.A., 1981. An iterative procedure for the estimation of the parameters of the stable law. Communications in Statistics-Simulation and Computation 10, 17–28.
Lau, A.H., Lau, L., Wingender, J.R., 1990. The distribution of stock returns: new evidence against the stable model. Journal of Business and Economic Statistics 8, 217–223.
Leitch, R.A., Paulson, A.S., 1975. Estimation of stable law parameters: stock price behavior application. Journal of the American Statistical Association 70, 690–697.
Mandelbrot, B., 1960. The Pareto-Levy law and the distribution of income. International Economic Review 1, 79–106.
Mandelbrot, B., 1963a. New methods in statistical economics. Journal of Political Economy 71, 421–440.
Mandelbrot, B., 1963b. The variation of certain speculative prices. Journal of Business of the University of Chicago 36, 394–419.
Mandelbrot, B., 1967. The variation of some other speculative prices. Journal of Business of the University of Chicago 40, 393–413.
McCulloch, J.H., 1986. Simple consistent estimators of stable distribution parameters. Communications in Statistics-Simulation and Computation 15, 1109–1136.
Meese, R.A., Rogoff, K., 1983. Empirical exchange rate models of the seventies: do they fit out of sample? Journal of International Economics 14, 3–24.
Mittnik, S., Rachev, S.T., 1989. Stable distributions for asset returns. Applied Mathematics Letters 2, 301–304.
Mittnik, S., Rachev, S.T., 1993. Modeling asset returns with alternative stable distributions (with discussion and reply). Econometric Reviews 12, 261–389.
Muller, P., 1992. Alternatives to the Gibbs sampling scheme. Institute of Statistics and Decision Sciences, Discussion Paper 92-A014, Duke University.
Muller, P., Pole, A., 1993. Monte Carlo posterior integration in GARCH models. Manuscript, Institute of Statistics and Decision Sciences, Duke University.
Perron, P., 1989. The great crash, the oil price shock and the unit root hypothesis. Econometrica 57, 1361–1401.
Phillips, P.C.B., 1993. Comment on 'Modeling asset returns with alternative stable distributions'. Econometric Reviews 12, 331–338.
Press, S.J., 1972. Estimation of univariate and multivariate stable distributions. Journal of the American Statistical Association 67, 842–846.
Ripley, B.D., 1987. Stochastic Simulation. Wiley, New York.
Ritter, C., Tanner, M.A., 1992. Facilitating the Gibbs sampler: the Gibbs stopper and the griddy-Gibbs sampler. Journal of the American Statistical Association 87, 861–868.
Schotman, P., Dijk, H., 1991. A Bayesian analysis of the unit root in real exchange rates. Journal of Econometrics 49, 195–238.
Sims, C.A., 1988. Bayesian scepticism on unit root econometrics. Journal of Economic Dynamics and Control 12, 463–474.
Tanner, M.A., 1993. Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. Springer, New York.
Tanner, M.A., Wong, W.H., 1987. The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association 82, 528–550.
Tierney, L., 1994. Markov chains for exploring posterior distributions (with discussion). Annals of Statistics 22, 1701–1762.
West, M., 1993. Approximating posterior distributions by mixtures. Journal of the Royal Statistical Society B 55, 409–422.
Westerfield, J.M., 1977. An examination of foreign exchange risk under fixed and floating rate regimes. Journal of International Economics 7, 181–200.
van Dijk, H.K., Hop, P.J., Louter, A.S., 1987. An algorithm for the computation of posterior moments and densities using simple importance sampling. The Statistician 36, 83–90.
de Vries, C.G., 1991. On the relation between GARCH and stable processes. Journal of Econometrics 48, 313–324.
Zellner, A., 1971. An Introduction to Bayesian Inference in Econometrics. Wiley, New York.
Zolotarev, V.M., 1966. On Representation of Stable Laws by Integrals. American Mathematical Society Selected Translations in Mathematical Statistics and Probability. AMS, Providence, RI, pp. 84–83.
Zolotarev, V.M., 1986. One-dimensional Stable Distributions. American Mathematical Society Translations of Mathematical Monographs, vol. 65. AMS, Providence, RI.