Journal of the Korean Statistical Society 37 (2008) 309–311 www.elsevier.com/locate/jkss
Rejoinder: The t family and their close and distant relations
M.C. Jones
Department of Mathematics & Statistics, The Open University, Walton Hall, Milton Keynes MK7 6AA, United Kingdom
Tel.: +44 1908 652209; fax: +44 1908 655515
Received 31 August 2008
Available online 26 September 2008
© 2008 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.
doi:10.1016/j.jkss.2008.09.001

I am extremely grateful to the discussants for the kindness of their words and the excellence of their contributions. In my response – in which I am sure I cannot keep up any sort of reasonable level of narrative, clarity or exposition! – I will refer to the individual discussion contributions by the initial letters of the authors’ surnames as follows: Professor Azzalini [A] and Professors Balakrishnan and Capitanio [BC]. It seems I am spared comments from the remaining 23 letters of the alphabet.

I couldn’t agree more with every word that [A] has written on the question of the underlying motivation and purpose of this kind of work. I heartily commend his excellent outline of the issues to the reader. [BC] make some of the same points more briefly towards the end of their discussion. I will not seriously attempt the “detailed discussion of these points [that] would require extensive space” but I will make some further fairly specific comparative points that will make a variable degree of reference to those general issues.

It is one reasonable way of defining a skew-t family to insist that such a family retain power tails controlled by the degrees of freedom parameter while introducing a skewness parameter that has only the kind of scale effect on tails that I sketched in Sections 3.1 and 3.2 (while containing the Student-t distributions as their symmetric special cases, of course). I am grateful to [BC] for putting more flesh on the skeleton I provided in this regard (or perhaps it should be for identifying a skeleton from fragmentary bones?). With respect to the tail behaviour of the Azzalini skew-t distribution, we are both right. [BC] demonstrate the less appealing tail behaviour of what I meant – but did not explicitly state – towards the end of Section 3.1 by “a more naive skew-t distribution based directly on (3.1)”; the alternative skew-t distribution at (3.2), arising by dividing a skew-normal random variable by the square root of an independent chi-squared random variable, has the different tail behaviour I claimed of it, which “seems to me to be an extra, unheralded, advantage of formulation (3.2)”. That said, I recently found that the point that (3.2) joins (3.3) with $g = t_\nu$ in having $|x|^{-(\nu+1)}$ tails has previously been “heralded” by Aas and Haff (2006), although the skewness parameter match-up was not.

The above skew-t definition would disqualify the Jones and Faddy approach but includes other existing, indeed long-standing, skew-t families that I didn’t mention in the paper, namely, the noncentral-t (Johnson, Kotz and Balakrishnan, 1994, Chapter 31) and Pearson Type IV distributions (e.g. Willink (2008), and references therein). The Barndorff–Nielsen generalised hyperbolic skew-t distribution investigated by Aas and Haff has one power tail and one semi-heavy tail and therefore differs further in this regard (which, of course, might sometimes be to the good).

So, let me concentrate a further brief comparison of skew-t distributions on the Azzalini ((3.2); henceforth (A)) and two-piece (T) versions, with side-mentions of other skew-t distributions as appropriate. As in the paper, common properties of A and T are their tail behaviour and half-t limiting distribution as skewness increases. Other common ground includes unimodality, limiting (if different) skew-normal distributions as ν → ∞, explicit formulae for moments and broadly comparable ease of random variate generation. By the way, limiting skew-normal distributions do not exist for most alternative skew-t distributions (a disadvantage of them?); there again, some of them have more attractive (?) high skewness limits (typically of inverse gamma type).

Let me continue my comparison by referring more specifically to [A]’s general issues in this context. “Actual numerical fits are quite [very?] similar” because appropriately centred, scaled and skewness-matched skew-t densities are very similar at least for small and moderate amounts of skewness. (This goes for pretty much all skew-t’s I know of, not just A and T.) Clear differences between density families appear largely only because of differing large skewness limits, and this doesn’t apply to A and T. (The ‘kink’ at 0 corresponding to T’s discontinuity in second derivative there is a less noticeable feature than one might imagine until γ becomes large.)

[A] is, of course, quite right to say that the A and T skew-t families differ considerably in genetic makeup (i.e. genesis). My badly made point was that the resulting families differ perhaps rather less than people might think – but this must be the effect of nurture rather than nature! However, I guess A scores over T in having more, and more meaningful, methods of genesis.

Each of A and T has the same clearly interpretable (location, scale and) tail-weight parameter; it is also pretty clear in both cases, but especially in the latter, how the skewness parameter goes about controlling skewness. What of more formal links with the theory of skewness? The sinh–arcsinh skewing transformation of Section 4.3 respects the classical van Zwet (1964) notion of a skewness ordering.
This is easy to show (Jones and Pewsey, 2008, Section 2.2); with much more difficulty the same can be shown to be true of T (Klein & Fischer, 2006). I am not aware of any such result for A although I imagine that most scalar skewness measures turn out to be monotone in λ. T, by the way, has an interesting place in the as yet obscure theory of density-based asymmetry (Avérous, Fougères, & Meste, 1996; Boshnakov, 2007; Critchley & Jones, 2008), its density-based asymmetry function being constant.

I would argue that T wins over A in terms of mathematical tractability. For example, the mode of T is at 0, the mode of A is . . . where? The distribution and quantile functions of the T distribution can be written explicitly in terms of the incomplete beta function and its inverse which means they are at the same level of complexity as those of the symmetric, Student-t, special case; the A distribution function seems to be a bit more complicated. The Arnold and Groeneveld (1995) skewness measure 1 − 2F(mode) is simply $(\gamma^2 - 1)/(\gamma^2 + 1)$ for T (Fernandez and Steel, 1998) but available only numerically for A. That all said, the mathematical aesthete, as opposed to statistician, in me remains unfulfilled by T’s lack of continuous even order differentiability and its very piecewise nature! (Tractability, by the way, is where noncentral-t and Pearson Type IV distributions are at a disadvantage.)

Multivariate extensions are indeed important; A certainly scores well on that count and T perhaps less well (but see Bauwens and Laurent (2005)). Sinh–arcsinh distributions generalise naturally by applying the sinh–arcsinh transformation to marginals of the multivariate normal distribution. But I would caution that ‘natural’ multivariate constructions, like these and as emphasised by [BC], are not always useful multivariate constructions.
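
As a purely illustrative aside (not taken from the paper), the sinh–arcsinh construction just mentioned can be sketched in a few lines of Python: draw a correlated bivariate normal pair and transform each marginal separately. The transformation form x = sinh((arcsinh z + eps)/delta), the function names and the parameter names `eps` (skewness) and `delta` (tail weight) are assumptions made for this sketch; Section 4.3 of the paper should be consulted for the exact parametrisation intended there.

```python
import math
import random


def sinh_arcsinh(z, eps=0.0, delta=1.0):
    """Transform a standard normal value z into a sinh-arcsinh value.

    Assumed form (an illustration, not quoted from the paper):
        x = sinh((asinh(z) + eps) / delta),  delta > 0.
    eps = 0 and delta = 1 give the identity, i.e. the normal case.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    return math.sinh((math.asinh(z) + eps) / delta)


def sample_bivariate_sas(n, rho, eps=(0.0, 0.0), delta=(1.0, 1.0), seed=0):
    """Draw n pairs: bivariate normal with correlation rho, then
    a sinh-arcsinh transformation applied to each marginal."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Second coordinate built so that corr(z1, z2) = rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((sinh_arcsinh(z1, eps[0], delta[0]),
                      sinh_arcsinh(z2, eps[1], delta[1])))
    return pairs


def sample_skewness(xs):
    """Moment-based sample skewness m3 / m2^(3/2)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    # Skew only the first marginal; the second stays normal.
    pts = sample_bivariate_sas(20000, rho=0.5, eps=(0.5, 0.0), seed=1)
    print("skewness of marginal 1:", sample_skewness([p[0] for p in pts]))
    print("skewness of marginal 2:", sample_skewness([p[1] for p in pts]))
```

In this sketch all of the dependence comes from the normal correlation `rho`, while skewness and tail weight are set marginal by marginal, so the pieces are at least easy to interpret separately; whether that makes the construction useful in practice is exactly the question raised above.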
For example (and there are numerous generalised versions of this), the classical multivariate F distribution naturally extends its univariate counterpart by dint of a common denominator in the ratios of chi-squared random variables defining its marginals. But its correlations must be positive, while dependence and marginal properties are conflated. For an example in the paper, the lletbai-t distribution has a natural extension to the multivariate case through (slightly) generalising the joint distribution of several order statistics (Jones & Larsen, 2004), but the result is limited for practice because it has support $x_1 < \cdots < x_d$, d denoting dimensionality.

But all is far from lost for distributions without a natural multivariate extension (or with an unsatisfactory one) because there are many general ways of constructing multivariate distributions with given univariate marginals. Copulas form one obvious approach. But there are others which might well be more attractive in this context. I have long felt that multiplicative marginal replacement might be further explored. By this I don’t necessarily mean the trivial version expounded by Jones (2002b) but perhaps its ‘full-blown’ counterpart formed by multiplying, say, a multivariate t initial density by the ratio of each desired skew-t marginal and the existing marginal in turn, and iterating. This is essentially the continuous analogue of a famous approach of Deming and Stephan in the discrete case (Kullback, 1968; Wang, 1993). So much, however, for tractability!

Back on Planet Earth, the multivariate A skew-t distribution is itself the product of a general approach, as made beautifully clear recently by Capitanio (2008), for a copy of which I must thank discussant C! Canonically, the multivariate skew-normal distribution consists of d − 1 independent normal random variables and a single independent skew-normal one (Azzalini & Capitanio, 1999). The canonical multivariate skew-t distribution follows by scale mixing the multivariate skew-normal. And, throughout this paragraph, the general form follows by introducing location, scale and correlation parameters by linear transformation in the usual way. So, this approach can be applied to any univariate skew-t distribution with a skew-normal limit, such as A and T. An obvious extension is to more than one, perhaps all, skew-normal rather than normal marginals, although parsimony dictates the hope that just one, or a few, skewing parameter(s) is usually enough. An alternative is simply to linearly transform a set of independent marginal skew-t components. See Bauwens and Laurent (2005) and Ferreira and Steel (2007).

[A]’s final comparative point concerns inferential methods and comparison. As in the paper, I cop out for a second time. But this is certainly a very important area in which much more needs to be done.

[BC] make one new suggestion, for a two-piece skew-normal distribution and its accompanying two-piece skew-t distribution; let me refer to the latter as (AT) (or should that be (BC)?)! I would set $c_1 = c_2$ so that the density is continuous. AT – with location and scale parameters incorporated – would then be a five-parameter distribution. I have to admit to being dubious of the worth of a fifth parameter in unimodal distributions, on grounds of limited differences between densities, parameter interpretability, and potential inferential near-nonidentifiability. We do know that the mode of the A density is nonnegative (resp. negative) as λ ≥ (resp. <) 0. I think it follows that the AT density is unimodal with

\[
\text{mode}
\begin{cases}
> 0 & \text{if } \alpha_1, \alpha_2 > 0, \\
= 0 & \text{if } \alpha_1 \ge 0,\ \alpha_2 \le 0, \\
< 0 & \text{if } \alpha_1, \alpha_2 < 0,
\end{cases}
\]

and that the AT distribution is bimodal if $\alpha_1 < 0$, $\alpha_2 > 0$, with the same number of parameters as a two-component normal mixture distribution. Unfortunately, I fear that the AT density is (first-order) continuously differentiable at 0 only in the A case ($\alpha_1 = \alpha_2$).

In conclusion, the global Student/skew-t family is clearly in rude health.
Branches of the family are currently jostling for position, and it is not yet entirely clear which will prove to be the most influential in the years to come. Whichever they are, the practical benefits of empirical modelling with skew-t (and other) families of distributions incorporating skewness and control over tail-weight (e.g. Azzalini and Genton (2008), especially Section 1.1) will surely come to the fore.

I would very much like to thank the editor of the journal, Professor Byeong Uk Park, for inviting this paper and the talk on which it is based, and for organising this discussion. I am very grateful to Byeong, several students and other members of the Korean Statistical Society for their fabulous hospitality when attending the 2008 Spring Meeting of the Society. I am greatly impressed by the strength and vigour of the Korean statistical community.

Additional References in Rejoinder

Aas, K., & Haff, I. H. (2006). The generalised hyperbolic skew Student’s t-distribution. Journal of Financial Econometrics, 4, 275–309.
Arnold, B. C., & Groeneveld, R. A. (1995). Measuring skewness with respect to the mode. The American Statistician, 49, 34–38.
Avérous, J., Fougères, A. L., & Meste, M. (1996). Tailweight with respect to the mode for unimodal distributions. Statistics and Probability Letters, 28, 367–373.
Azzalini, A., & Capitanio, A. (1999). Statistical applications of the multivariate skew normal distribution. Journal of the Royal Statistical Society, Series B, 61, 579–602.
Bauwens, L., & Laurent, S. (2005). A new class of multivariate skew densities, with application to generalized autoregressive conditional heteroscedasticity models. Journal of Business and Economic Statistics, 23, 346–354.
Boshnakov, G. N. (2007). Some measures for asymmetry of distributions. Statistics and Probability Letters, 77, 1111–1116.
Capitanio, A. (2008). The canonical form of scale mixtures of skew-normal distributions with applications. Unpublished manuscript.
Critchley, F., & Jones, M. C. (2008). Asymmetry and gradient asymmetry functions: density-based skewness and kurtosis. Scandinavian Journal of Statistics, 35, 415–437.
Ferreira, J. T. A. S., & Steel, M. F. J. (2007). A new class of skewed multivariate distributions with applications to regression analysis. Statistica Sinica, 17, 505–529.
Jones, M. C. (2002b). Marginal replacement in multivariate densities, with application to skewing spherically symmetric distributions. Journal of Multivariate Analysis, 81, 85–99.
Jones, M. C., & Larsen, P. V. (2004). Multivariate distributions with support above the diagonal. Biometrika, 91, 975–986.
Klein, I., & Fischer, M. (2006). Skewness by splitting the scale parameter. Communications in Statistics - Theory and Methods, 35, 1159–1171.
Kullback, S. (1968). Probability densities with given marginals. Annals of Mathematical Statistics, 39, 1236–1243.
van Zwet, W. R. (1964). Convex transformations of random variables. Amsterdam: Mathematisch Centrum.
Wang, Y. J. (1993). Construction of continuous bivariate density functions. Statistica Sinica, 3, 173–187.
Willink, R. (2008). A closed form expression for the Pearson Type IV distribution function. Australian and New Zealand Journal of Statistics, 50, 199–205.