Statistics & Probability Letters 49 (2000) 285 – 290
The skew-Cauchy distribution
Barry C. Arnold, Robert J. Beaver ∗
Department of Statistics, University of California, Riverside, CA 92521-0128, USA
Received April 1999; received in revised form September 1999
Abstract
Suppose $(X, Y)$ has a $(k+1)$-dimensional Cauchy distribution. Consider the conditional distribution of $X$ given $Y > y_0$, for some fixed value of $y_0 \in \mathbb{R}$. The resulting distribution is the multivariate skewed Cauchy, in which there is truncation with respect to $Y$; this is but one of a general class of skewed distributions for which the initial distribution is symmetric. The skewing function, which depends upon the distribution of $Y$, need not be from the same family as the initial density. © 2000 Elsevier Science B.V. All rights reserved.
MSC: 62H05; 62H12
Keywords: Skewed-Cauchy; Truncated Cauchy; Generalized skewed distributions
1. Introduction
Azzalini and Dalla Valle (1996) introduced a family of skewed $k$-variate normal distributions. The present paper deals with a heavier-tailed alternative, a skew-Cauchy model. The parallel with the Azzalini and Dalla Valle distribution, and with its extension introduced in Arnold and Beaver (1997), is quite close; nevertheless, the distinguishing features of the Cauchy model (no moments, for example) justify a separate treatment.
2. The basic k-dimensional skew-Cauchy distribution
Begin with $k+1$ independent standard Cauchy random variables $W_1, W_2, \ldots, W_k, U$. Now consider the conditional distribution of $W = (W_1, \ldots, W_k)'$ given $\lambda_0 + \lambda_1' W > U$, where $\lambda_0 \in \mathbb{R}$ and $\lambda_1 \in \mathbb{R}^k$. This distribution is called the basic skew-Cauchy distribution.
∗ Corresponding author. E-mail address: [email protected] (R.J. Beaver).
0167-7152/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0167-7152(00)00059-6
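To determine the density of the distribution we may argue as follows. For notational simplicity introduce the event
$$A(\lambda_0, \lambda_1) = \{\lambda_0 + \lambda_1' W > U\} = \{(W, U) \in B(\lambda_0, \lambda_1)\},$$
where
$$B(\lambda_0, \lambda_1) = \{(w, u) : \lambda_0 + \lambda_1' w > u\}.$$
The conditional density of $(W, U)$ given $A(\lambda_0, \lambda_1)$ is clearly given by
$$f_{W,U \mid A(\lambda_0,\lambda_1)}(w, u) = \frac{\left[\prod_{i=1}^{k} \psi(w_i)\right] \psi(u)\, I\bigl((w,u) \in B(\lambda_0,\lambda_1)\bigr)}{P(A(\lambda_0,\lambda_1))}, \tag{2.1}$$
where here and henceforth we denote the standard Cauchy density and distribution function by $\psi$ and $\Psi$, respectively. Thus,
$$\psi(u) = \frac{1}{\pi(1+u^2)}, \quad u \in \mathbb{R},$$
and
$$\Psi(u) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1} u, \quad u \in \mathbb{R}.$$
Now if we integrate density (2.1) with respect to $u$ we obtain the desired density
$$f_{W \mid A(\lambda_0,\lambda_1)}(w) = \left[\prod_{i=1}^{k}\psi(w_i)\right] \Psi(\lambda_0 + \lambda_1' w) \Big/ P(A(\lambda_0,\lambda_1)). \tag{2.2}$$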
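The integration step from (2.1) to (2.2) is short enough to spell out. The display below is only a worked restatement using the definitions of the standard Cauchy density, its distribution function and the set $B(\lambda_0,\lambda_1)$ given above; it introduces no new notation.

```latex
\int_{-\infty}^{\infty} \psi(u)\, I\bigl((w,u) \in B(\lambda_0,\lambda_1)\bigr)\, du
  = \int_{-\infty}^{\lambda_0 + \lambda_1' w} \psi(u)\, du
  = \Psi(\lambda_0 + \lambda_1' w)
```

The factor $\prod_{i=1}^{k}\psi(w_i)$ does not involve $u$ and is carried through unchanged.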
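However, the denominator in (2.2) is not difficult to evaluate. We have
$$P(A(\lambda_0,\lambda_1)) = P(\lambda_0 + \lambda_1' W > U) = P(U - \lambda_1' W < \lambda_0).$$
Note that $U - \lambda_1' W$ is a linear combination of independent Cauchy random variables and in fact $U - \lambda_1' W \sim \text{Cauchy}\bigl(0, 1 + \sum_{i=1}^{k}|\lambda_{1i}|\bigr)$. Consequently,
$$P(A(\lambda_0,\lambda_1)) = \Psi\left(\frac{\lambda_0}{1+\sum_{i=1}^{k}|\lambda_{1i}|}\right).$$
Thus, the basic $k$-dimensional skew-Cauchy distribution is of the form
$$f(w;\lambda_0,\lambda_1) = \left[\prod_{i=1}^{k} \psi(w_i)\right]\Psi(\lambda_0+\lambda_1' w) \Big/ \Psi\left(\frac{\lambda_0}{1+\sum_{i=1}^{k}|\lambda_{1i}|}\right). \tag{2.3}$$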
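As a rough illustration, density (2.3) can be coded directly from the standard Cauchy density and distribution function. The sketch below uses only the Python standard library; the function names (`skew_cauchy_pdf`, `integrate_k1`) are hypothetical and chosen for this example. The numerical check for $k = 1$ substitutes $w = \tan\theta$ so that the heavy tails are mapped onto a bounded interval, which is an implementation choice of this sketch and not part of the paper.

```python
import math


def cauchy_pdf(u):
    """Standard Cauchy density: 1 / (pi * (1 + u^2))."""
    return 1.0 / (math.pi * (1.0 + u * u))


def cauchy_cdf(u):
    """Standard Cauchy distribution function: 1/2 + atan(u) / pi."""
    return 0.5 + math.atan(u) / math.pi


def skew_cauchy_pdf(w, lam0, lam1):
    """Density (2.3) at the point w, where w and lam1 are sequences of length k."""
    base = 1.0
    for wi in w:
        base *= cauchy_pdf(wi)
    linear = lam0 + sum(l * wi for l, wi in zip(lam1, w))
    norm = cauchy_cdf(lam0 / (1.0 + sum(abs(l) for l in lam1)))
    return base * cauchy_cdf(linear) / norm


def integrate_k1(lam0, lam1, n=20_000):
    """Midpoint rule for the k = 1 density after the substitution w = tan(theta)."""
    h = math.pi / n
    total = 0.0
    for j in range(n):
        w = math.tan(-math.pi / 2 + (j + 0.5) * h)
        # dw = (1 + w^2) d(theta), which cancels the Cauchy tail decay
        total += skew_cauchy_pdf([w], lam0, [lam1]) * (1.0 + w * w) * h
    return total
```

For example, `integrate_k1(0.7, -1.5)` should be very close to 1, and with `lam0 = 0` and `lam1` a zero vector `skew_cauchy_pdf` returns the product of standard Cauchy densities.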
If $\lambda_0 = 0$ and $\lambda_1 = 0$, the density reduces to a $k$-variate density with independent standard Cauchy marginals. Generally, it is a weighted version of that density with weight function $\Psi(\lambda_0 + \lambda_1' w)$. If $\lambda_0 = 0$, the density (2.3) reduces to a close parallel of Azzalini and Dalla Valle's (1996) $k$-dimensional skew-normal distribution. In the case where $\lambda_0 = 0$, since $U - \lambda_1' W$ is symmetric about zero, the denominator in (2.3) reduces to $\Psi(0) = 1/2$. Thus,
$$f(w; 0, \lambda_1) = 2\left[\prod_{i=1}^{k}\psi(w_i)\right]\Psi(\lambda_1' w). \tag{2.4}$$
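The conditioning construction also suggests a simple way to simulate from the basic skew-Cauchy distribution: draw independent standard Cauchy variables and keep the draw only when the conditioning event occurs. The sketch below is a minimal rejection sampler under that reading, with hypothetical function names; it also estimates the acceptance probability, which should match the normalizing constant in (2.3) and equal 1/2 when the first parameter is zero.

```python
import math
import random


def cauchy_cdf(u):
    """Standard Cauchy distribution function."""
    return 0.5 + math.atan(u) / math.pi


def standard_cauchy(rng):
    """Inverse-cdf draw: tan(pi * (V - 1/2)) with V uniform on [0, 1)."""
    return math.tan(math.pi * (rng.random() - 0.5))


def sample_skew_cauchy(lam0, lam1, rng):
    """Return (w, attempts): one draw of W given lam0 + lam1'W > U, by rejection."""
    attempts = 0
    while True:
        attempts += 1
        w = [standard_cauchy(rng) for _ in lam1]
        u = standard_cauchy(rng)
        if lam0 + sum(l * wi for l, wi in zip(lam1, w)) > u:
            return w, attempts


def acceptance_rate(lam0, lam1, n_draws, seed=0):
    """Monte Carlo estimate of P(lam0 + lam1'W > U) from the rejection loop."""
    rng = random.Random(seed)
    total_attempts = sum(
        sample_skew_cauchy(lam0, lam1, rng)[1] for _ in range(n_draws)
    )
    return n_draws / total_attempts


def theoretical_rate(lam0, lam1):
    """Normalizing constant of (2.3): cdf evaluated at lam0 / (1 + sum |lam1_i|)."""
    return cauchy_cdf(lam0 / (1.0 + sum(abs(l) for l in lam1)))
```

Rejection is wasteful when the acceptance probability is small (strongly negative first parameter), so this is meant as a check on the derivation, not as an efficient generator.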
Remark 1. Density (2.4) remains a valid density if the distribution function $\Psi$ is replaced by the distribution function $\Psi^*$ of any distribution symmetric about zero. Thus, variant skew-Cauchy densities could be of the form
$$f(w; 0, \lambda_1) = 2\left[\prod_{i=1}^{k}\psi(w_i)\right]\Psi^*(\lambda_1' w), \tag{2.5}$$
where $\Psi^*$ could be, for example, normal, logistic or Laplace.

If $W$ has the basic skew-Cauchy distribution (2.3), we will also say that affine transformations of $W$ are skew-Cauchy. Thus, any random variable $X$ of the form
$$X = \mu + \Sigma^{1/2} W \tag{2.6}$$
will be said to be skew-Cauchy if $W$ has density (2.3), $\mu \in \mathbb{R}^k$ and $\Sigma^{1/2}$ is positive definite.

Remark 2. The starting point for the development of our skew-Cauchy distribution was a vector of independent standard Cauchy random variables. An alternative "basic" $k$-dimensional Cauchy distribution is to be found in the literature and might have been used instead of the independent Cauchy model. The easiest development of the alternative model begins with $k+1$ independent standard normal random variables $Z_1, Z_2, Z_3, \ldots, Z_k$ and $Z_0$ and defines $W = (W_1, \ldots, W_k)$ by $W_i = Z_i/|Z_0|$. It is not difficult to verify that the joint density of $W$ is of the form
$$f_W(w) = c(k)(1 + w'w)^{-(k+1)/2}, \tag{2.7}$$
where $c(k) = \Gamma((k+1)/2)/\pi^{(k+1)/2}$ is a normalizing constant. Density (2.7) is spherically symmetric and has Cauchy marginals (see Fang et al., 1990). Now we can consider $W_1, W_2, \ldots, W_k$ with joint density (2.7) and $U$ an independent Cauchy$(0, 1)$ random variable. Then focus on the conditional density of $W$ given $\lambda_0 + \lambda_1' W > U$. Just as we argued to obtain (2.2) we may obtain this variant skew-Cauchy distribution as
$$f(w;\lambda_0,\lambda_1) \propto (1+w'w)^{-(k+1)/2}\,\Psi(\lambda_0+\lambda_1' w) \Big/ \Psi\left(\frac{\lambda_0}{\sqrt{1+\lambda_1'\lambda_1}}\right), \tag{2.8}$$
where the missing constant is $c(k)$. The skew-Cauchy in (2.3) will be shown to have marginals and conditionals of the same type, a feature not shared by the variant skew-Cauchy density. When $\lambda_0 = 0$, the density in (2.8) corresponds to a member of the family of elliptical densities mentioned in Azzalini and Capitanio (1999).

3. Marginal and conditional distributions
Suppose that $W$ has the skew-Cauchy density (2.3) and we wish to determine the marginal density of $\dot W = (W_1, \ldots, W_{k_1})$, the first $k_1$ coordinates of $W$. We also will use the notation $\ddot W$ to denote the remaining $k_2 = k - k_1$ coordinates of $W$, i.e. $W = (\dot W, \ddot W)$. This dot double-dot notation for partitioning $k$-dimensional vectors will be used quite generally in the following development. In order to determine the distribution of $\dot W$ we may return to the genesis of density (2.3). We are interested in the conditional distribution of $\dot W$ given $\lambda_0 + \lambda_1' W > U$, i.e. given $\lambda_0 + \dot\lambda_1' \dot W > U - \ddot\lambda_1' \ddot W$. Now $U - \ddot\lambda_1' \ddot W \sim \text{Cauchy}\bigl(0, 1 + \sum_{i=k_1+1}^{k}|\lambda_{1i}|\bigr)$ so, if we define $\tilde U = (U - \ddot\lambda_1' \ddot W)/(1 + \sum_{i=k_1+1}^{k}|\lambda_{1i}|)$, then $\tilde U \sim \text{Cauchy}(0, 1)$ and we are interested in the conditional distribution of $\dot W$ given $\tilde\lambda_0 + \tilde{\dot\lambda}_1' \dot W > \tilde U$, where
$$\tilde\lambda_0 = \lambda_0 \Big/ \left(1 + \sum_{i=k_1+1}^{k}|\lambda_{1i}|\right) \quad\text{and}\quad \tilde{\dot\lambda}_1 = \dot\lambda_1 \Big/ \left(1 + \sum_{i=k_1+1}^{k}|\lambda_{1i}|\right).$$
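But this is obviously a $k_1$-dimensional skew-Cauchy distribution and we have
$$\begin{aligned}
f(\dot w; \lambda_0, \lambda_1) &= \left[\prod_{i=1}^{k_1}\psi(w_i)\right]\Psi(\tilde\lambda_0 + \tilde{\dot\lambda}_1' \dot w) \Big/ \Psi\left(\frac{\tilde\lambda_0}{1+\sum_{i=1}^{k_1}|\tilde\lambda_{1i}|}\right) \\
&= \left[\prod_{i=1}^{k_1}\psi(w_i)\right]\Psi\left(\frac{\lambda_0 + \dot\lambda_1' \dot w}{1+\sum_{i=k_1+1}^{k}|\lambda_{1i}|}\right) \Big/ \Psi\left(\frac{\lambda_0}{1+\sum_{i=1}^{k}|\lambda_{1i}|}\right).
\end{aligned} \tag{3.1}$$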
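The marginal formula (3.1) can be checked numerically in the smallest non-trivial case, $k = 2$ and $k_1 = 1$, by integrating the bivariate density (2.3) over the second coordinate. The sketch below is one way to do this with the standard library alone; the function names are hypothetical, and the tangent substitution and midpoint rule are choices made for this example.

```python
import math


def psi(u):
    """Standard Cauchy density."""
    return 1.0 / (math.pi * (1.0 + u * u))


def Psi(u):
    """Standard Cauchy distribution function."""
    return 0.5 + math.atan(u) / math.pi


def joint_pdf(w1, w2, lam0, l1, l2):
    """Bivariate density (2.3) with lam1 = (l1, l2)."""
    norm = Psi(lam0 / (1.0 + abs(l1) + abs(l2)))
    return psi(w1) * psi(w2) * Psi(lam0 + l1 * w1 + l2 * w2) / norm


def marginal_numeric(w1, lam0, l1, l2, n=20_000):
    """Integrate the joint density over w2 using w2 = tan(theta) and the midpoint rule."""
    h = math.pi / n
    total = 0.0
    for j in range(n):
        w2 = math.tan(-math.pi / 2 + (j + 0.5) * h)
        total += joint_pdf(w1, w2, lam0, l1, l2) * (1.0 + w2 * w2) * h
    return total


def marginal_formula(w1, lam0, l1, l2):
    """Second line of (3.1) for k = 2, k1 = 1."""
    top = Psi((lam0 + l1 * w1) / (1.0 + abs(l2)))
    return psi(w1) * top / Psi(lam0 / (1.0 + abs(l1) + abs(l2)))
```

With, say, `lam0 = 0.7`, `l1 = 2.0` and `l2 = -1.5`, the two functions should agree to within the quadrature error at any `w1`.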
Thus all marginals of the skew-Cauchy distribution (2.3) are again skew-Cauchy. What about conditional densities? For any $\ddot w \in \mathbb{R}^{k_2}$, what can we say about the conditional density of $\dot W$ given $\ddot W = \ddot w$? Referring to (3.1) we can write the marginal density of $\ddot W$ and then consider the ratio $f(\dot w, \ddot w)/f(\ddot w) = f(w)/f(\ddot w)$. Alternatively and equivalently, we need to just write down the joint density $f(w)$ given by (2.3), consider $\ddot w$ to be fixed and normalize to get a density. In either of these ways, we get
$$f(\dot w \mid \ddot w) = \left[\prod_{i=1}^{k_1}\psi(w_i)\right] \frac{\Psi(\lambda_0 + \ddot\lambda_1' \ddot w + \dot\lambda_1' \dot w)}{\Psi\bigl((\lambda_0 + \ddot\lambda_1' \ddot w)/(1+\sum_{i=1}^{k_1}|\lambda_{1i}|)\bigr)}. \tag{3.2}$$
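Thus all conditionals of the skew-Cauchy distribution are again skew-Cauchy. In summary, if we use the notation $W \sim SC_k(\lambda_0, \lambda_1)$ to indicate that $W$ has density (2.3), then
$$\dot W \sim SC_{k_1}\left(\frac{\lambda_0}{1+\sum_{i=k_1+1}^{k}|\lambda_{1i}|},\ \frac{\dot\lambda_1}{1+\sum_{i=k_1+1}^{k}|\lambda_{1i}|}\right) \tag{3.3}$$
and for each $\ddot w \in \mathbb{R}^{k_2}$,
$$\dot W \mid \ddot W = \ddot w \sim SC_{k_1}(\lambda_0 + \ddot\lambda_1' \ddot w,\ \dot\lambda_1). \tag{3.4}$$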
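The closure properties (3.3) and (3.4) amount to simple maps on the parameters. The following sketch, with hypothetical helper names, computes the marginal and conditional parameters for the first $k_1$ coordinates and can be used to confirm that the ratio of joint to marginal density agrees with the skew-Cauchy density having the parameters in (3.4).

```python
import math


def psi(u):
    """Standard Cauchy density."""
    return 1.0 / (math.pi * (1.0 + u * u))


def Psi(u):
    """Standard Cauchy distribution function."""
    return 0.5 + math.atan(u) / math.pi


def sc_pdf(w, lam0, lam1):
    """Density (2.3) of SC_k(lam0, lam1) at the point w."""
    base = 1.0
    for wi in w:
        base *= psi(wi)
    linear = lam0 + sum(l * wi for l, wi in zip(lam1, w))
    return base * Psi(linear) / Psi(lam0 / (1.0 + sum(abs(l) for l in lam1)))


def marginal_params(lam0, lam1, k1):
    """(3.3): parameters of the first k1 coordinates."""
    scale = 1.0 + sum(abs(l) for l in lam1[k1:])
    return lam0 / scale, [l / scale for l in lam1[:k1]]


def conditional_params(lam0, lam1, k1, w_rest):
    """(3.4): parameters of the first k1 coordinates given the rest equal w_rest."""
    shift = sum(l * w for l, w in zip(lam1[k1:], w_rest))
    return lam0 + shift, list(lam1[:k1])
```

Because the coordinates of the basic model are exchangeable up to relabelling of the second parameter vector, the marginal of a trailing block can be obtained by permuting it to the front before calling `marginal_params`.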
4. Parameter estimation for the skew-Cauchy distribution
Suppose that $W \sim SC_k(\lambda_0, \lambda_1)$ and that
$$X = \mu + \Sigma^{1/2} W. \tag{4.1}$$
The corresponding inverse transformation is $w = \Sigma^{-1/2}(x - \mu)$ and consequently the joint density of $X$ is given by
$$f_X(x; \lambda_0, \lambda_1, \mu, \Sigma) = \frac{|\Sigma^{-1/2}| \prod_{i=1}^{k}\left(1 + (x-\mu)'\bigl(\Sigma^{-1/2}_{(i)}\bigr)'\,\Sigma^{-1/2}_{(i)}(x-\mu)\right)^{-1}}{\pi^{k}} \times \frac{\Psi\bigl(\lambda_0 + \lambda_1' \Sigma^{-1/2}(x-\mu)\bigr)}{\Psi\bigl(\lambda_0/(1+\sum_{i=1}^{k}|\lambda_{1i}|)\bigr)}, \tag{4.2}$$
where $\Sigma^{-1/2}_{(i)}$ denotes the $i$th row of $\Sigma^{-1/2}$. Now suppose that we have a sample $X_1, X_2, \ldots, X_n$ of size $n$ from density (4.2) and we wish to estimate the parameters $\mu$, $\Sigma$, $\lambda_0$ and $\lambda_1$. In principle maximum likelihood can be considered, perhaps reparameterizing by setting $A = \Sigma^{-1/2}$ to simplify the likelihood somewhat. A search over a parameter space of dimension $[(k^2 + 5k)/2] + 1$ is required. In principle also, a Bayesian approach can be implemented using, most probably, diffuse priors for the parameters involved. Integration over a high-dimensional space is required for this. Density (4.2) does not seem to lend itself well to other standard inference techniques. For example, its fractional moments and quantiles are distressingly complex and so variations on the method of moments theme do not appear promising. In two dimensions, the parameter space is eight dimensional and so both maximum likelihood and diffuse prior Bayesian estimation approaches can be implemented. In the final section we will illustrate this using the "Australian athletes" data previously fitted to skew-normal models in Azzalini and Dalla Valle (1996) and Arnold and Beaver (1997).

Table 1
Estimates for the five-parameter model

Parameter      Estimate
$\mu_1$        172.631
$\mu_2$        79.1961
$\sigma_1$     0.0000013
$\sigma_2$     0.0625834
$\sigma_3$     0.0000006

Table 2
Estimates for the eight-parameter model

Parameter      Estimate
$\mu_1$        185.00
$\mu_2$        70.000
$\sigma_1$     0.00449
$\sigma_2$     −0.16261
$\sigma_3$     0.0000212
$\lambda_0$    0.0000014
$\lambda_1$    −1.200000
$\lambda_2$    −1.19996
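To make the likelihood in (4.2) concrete, the sketch below writes out the negative log-likelihood for the bivariate case under the reparameterization by the symmetric inverse square-root matrix, with entries called `a1`, `a2`, `a3` here. All names are hypothetical and the code is not the authors' Matlab implementation; it only evaluates the objective that a numerical optimizer would minimize. The distribution function is written with `atan2` so that its logarithm stays finite far in the lower tail, which is a choice of this sketch.

```python
import math


def Psi(u):
    """Standard Cauchy cdf, written with atan2 so the lower tail stays positive."""
    return math.atan2(1.0, -u) / math.pi


def n_params(k):
    """Parameter count: mu (k), symmetric A (k(k+1)/2), lam0 (1), lam1 (k)."""
    return (k * k + 5 * k) // 2 + 1


def neg_log_likelihood(params, data):
    """Negative log-likelihood from (4.2) for k = 2.

    params = (mu1, mu2, a1, a2, a3, lam0, l1, l2), where [[a1, a2], [a2, a3]]
    plays the role of A = Sigma^{-1/2}; data is a list of (x1, x2) pairs.
    """
    mu1, mu2, a1, a2, a3, lam0, l1, l2 = params
    det = a1 * a3 - a2 * a2
    if det == 0.0:
        raise ValueError("A must be non-singular")
    # Terms that do not depend on the observation
    log_norm = (
        math.log(abs(det))
        - 2.0 * math.log(math.pi)
        - math.log(Psi(lam0 / (1.0 + abs(l1) + abs(l2))))
    )
    total = 0.0
    for x1, x2 in data:
        d1, d2 = x1 - mu1, x2 - mu2
        w1 = a1 * d1 + a2 * d2  # first row of A times (x - mu)
        w2 = a2 * d1 + a3 * d2  # second row of A times (x - mu)
        total += (
            log_norm
            - math.log1p(w1 * w1)
            - math.log1p(w2 * w2)
            + math.log(Psi(lam0 + l1 * w1 + l2 * w2))
        )
    return -total
```

Here `n_params(2)` returns 8, matching the eight-dimensional parameter space mentioned for two dimensions; fixing `lam0 = l1 = l2 = 0` leaves the five-parameter model.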
5. Australian athletes
We shall use the data set reported by Cook and Weisberg (1994) concerning 13 variables measured on 202 athletes at the Australian Institute of Sport, courtesy of Richard Telford and Ross Cunningham. The variables to be analyzed are a person's height and weight, denoted as $(H, W)$, using a model based upon (2.3) with the transformed variables $x$, where $x = \mu + \Sigma^{1/2} w$, that is, $w = \Sigma^{-1/2}(x - \mu)$ with
$$\Sigma^{-1/2} = \begin{bmatrix} \sigma_1 & \sigma_2 \\ \sigma_2 & \sigma_3 \end{bmatrix}.$$
The method of maximum likelihood was implemented using the Matlab minimization routine. Values obtained using a genetic algorithm and simulated annealing were used as starting values for the Matlab routine. The maximum likelihood estimates for the usual five-parameter bivariate Cauchy, which are given in Table 1, produced a log-likelihood equal to −462.471. The maximum likelihood estimates for the eight-parameter model, given in Table 2, produced a log-likelihood equal to −380.879. To test whether the eight-parameter model provides a significantly better fit we use as a test statistic minus twice the logarithm of the likelihood ratio, equal to $-2[-462.471 - (-380.879)] = 163.184$. When this is compared to a $\chi^2$-distribution with three degrees of freedom the p-value is close to zero. Therefore, we can conclude that the parameters representing hidden truncation are significantly non-zero, and that there may be a truncation mechanism at work on both the height and weight of athletes.
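The likelihood ratio arithmetic above is easy to reproduce. The sketch below recomputes the statistic from the two reported log-likelihoods and evaluates the upper tail of a chi-square distribution with three degrees of freedom using a closed form that holds only for that number of degrees of freedom. The function names are hypothetical and the code is an illustration, not the routine used in the paper.

```python
import math


def lr_statistic(loglik_null, loglik_alt):
    """Minus twice the log of the likelihood ratio."""
    return -2.0 * (loglik_null - loglik_alt)


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# Log-likelihoods reported for the five- and eight-parameter models
stat = lr_statistic(-462.471, -380.879)
p_value = chi2_sf_3df(stat)
```

The three degrees of freedom correspond to the three additional parameters of the eight-parameter model relative to the five-parameter one.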
Acknowledgements
The authors would like to thank Yating Wang and Mark E. Lehr for their role in analyzing these data. The genetic algorithm and the simulated annealing routines used in maximizing the likelihood were written by Mark Lehr and are available by e-mailing him at [email protected] or [email protected].

References
Azzalini, A., Capitanio, A., 1999. Statistical applications of the multivariate skewed normal distribution. J. Roy. Statist. Soc. B 61, 579–602.
Azzalini, A., Dalla Valle, A., 1996. The multivariate skew-normal distribution. Biometrika 83, 715–726.
Arnold, B.C., Beaver, R.J., 1997. Some skewed multivariate models. Technical Report No. 249, Department of Statistics, University of California, Riverside.
Cook, R.D., Weisberg, S., 1994. An Introduction to Regression Graphics. Wiley, New York.
Fang, K.-T., Kotz, S., Ng, K.W., 1990. Symmetric Multivariate and Related Distributions. Chapman & Hall, London.