Statistics & Probability Letters 77 (2007) 335–342
www.elsevier.com/locate/stapro

On Srivastava's multivariate sample skewness and kurtosis under non-normality

Yosihito Maruyama
The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106-8569, Japan

Received 21 February 2006; received in revised form 6 June 2006; accepted 24 July 2006
Available online 30 August 2006

Abstract

This paper is concerned with the asymptotic behaviour of sample skewness and kurtosis. Asymptotic expansions under non-normality are obtained for the sample measures of multivariate skewness and kurtosis defined by Srivastava [1984. A measure of skewness and kurtosis and a graphical method for assessing multivariate normality. Statist. Probab. Lett. 2, 263–267].

© 2006 Elsevier B.V. All rights reserved.

Keywords: Asymptotic expansion; Moment; Multivariate kurtosis; Multivariate skewness; Non-normality; Principal component
1. Introduction

Let $x = {}^t(x_1, \ldots, x_p)$ be a $p$-variate random vector with $E(x) = \mu$ and $\mathrm{Cov}(x) = \Sigma$, where the symbol ${}^t x$ denotes the transpose of $x$. Measures of multivariate skewness and kurtosis in the sense of Srivastava (1984) are defined by the principal component method. Let $Q$ be an orthogonal matrix such that ${}^t Q \Sigma Q =: D_\lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$, where $\lambda_1, \ldots, \lambda_p$ are the eigenvalues of $\Sigma$. Also define the principal components as $y = {}^t(y_1, \ldots, y_p) := {}^t Q x$. Then

$$\gamma_{1,p} := \sum_{i=1}^p \lambda_i^{-3} \bigl(E[(y_i - \theta_i)^3]\bigr)^2, \qquad
\beta_{2,p} := \sum_{i=1}^p \lambda_i^{-2} E[(y_i - \theta_i)^4],$$

where $\theta = {}^t(\theta_1, \ldots, \theta_p) := {}^t Q \mu$; these are proposed as measures of skewness and kurtosis, respectively.

Suppose that $x_1, \ldots, x_n$ are independent and identically distributed random vectors distributed as $x$, and let $\bar{x} := (1/n) \sum_{i=1}^n x_i$ be the sample mean and $U := (1/(n-1)) \sum_{i=1}^n (x_i - \bar{x})\,{}^t(x_i - \bar{x})$ the unbiased sample covariance matrix. Further let $y_j = {}^t(y_{1j}, \ldots, y_{pj}) := {}^t H x_j$ for $j = 1, \ldots, n$ and $\bar{y} = {}^t(\bar{y}_1, \ldots, \bar{y}_p) := {}^t H \bar{x}$, where $H$ is an orthogonal matrix such that ${}^t H U H =: D_w = \mathrm{diag}(w_1, \ldots, w_p)$ and $w_1, \ldots, w_p$ are the eigenvalues of $U$.

E-mail address: [email protected]. doi:10.1016/j.spl.2006.07.014
Then the affine-invariant sample analogues of $\gamma_{1,p}$ and $\beta_{2,p}$ are given by

$$g_{1,p} := \frac{1}{n^2} \sum_{i=1}^p w_i^{-3} \Biggl( \sum_{j=1}^n (y_{ij} - \bar{y}_i)^3 \Biggr)^2, \qquad
b_{2,p} := \frac{1}{n} \sum_{i=1}^p w_i^{-2} \sum_{j=1}^n (y_{ij} - \bar{y}_i)^4. \tag{1}$$
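For reference, the sample measures in (1) can be computed directly from an $n \times p$ data matrix. The following NumPy sketch (function and variable names are ours, not from the paper) rotates the centred data by the eigenvectors $H$ of $U$:

```python
import numpy as np

def srivastava_measures(x):
    """Sample skewness g_{1,p} and kurtosis b_{2,p} in the sense of
    Srivastava (1984), computed via the principal-component rotation."""
    n, p = x.shape
    u = np.cov(x, rowvar=False, ddof=1)   # unbiased covariance U
    w, h = np.linalg.eigh(u)              # eigenvalues w_i, eigenvectors H
    d = (x - x.mean(axis=0)) @ h          # deviations y_ij - ybar_i
    g1p = np.sum((d**3).sum(axis=0)**2 / w**3) / n**2
    b2p = np.sum((d**4).sum(axis=0) / w**2) / n
    return g1p, b2p

rng = np.random.default_rng(0)
g1p, b2p = srivastava_measures(rng.standard_normal((500, 3)))
```

For normal data one expects $b_{2,p}$ near $3p$ and $g_{1,p}$ near 0, in line with the expansions derived below.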
On the other hand, Mardia (1970) has also defined population measures of multivariate skewness and kurtosis as $\gamma^M_{1,p} := E[\{{}^t(x - \mu)\Sigma^{-1}(x^* - \mu)\}^3]$ and $\beta^M_{2,p} := E[\{{}^t(x - \mu)\Sigma^{-1}(x - \mu)\}^2]$, respectively. Here $x^*$ is an independent copy of $x$. The sample analogues of $\gamma^M_{1,p}$ and $\beta^M_{2,p}$ are

$$g^M_{1,p} := \frac{1}{n^2} \sum_{j,l} \bigl\{{}^t(x_j - \bar{x})\, U^{-1} (x_l - \bar{x})\bigr\}^3, \qquad
b^M_{2,p} := \frac{1}{n} \sum_{j=1}^n \bigl\{{}^t(x_j - \bar{x})\, U^{-1} (x_j - \bar{x})\bigr\}^2, \tag{2}$$
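Mardia's sample measures (2) admit an equally direct, if $O(n^2)$-memory, implementation; again the names below are ours:

```python
import numpy as np

def mardia_measures(x):
    """Sample analogues of Mardia's multivariate skewness and kurtosis,
    as in (2); a direct O(n^2) implementation."""
    n, p = x.shape
    d = x - x.mean(axis=0)
    uinv = np.linalg.inv(np.cov(x, rowvar=False, ddof=1))
    g = d @ uinv @ d.T         # g_jl = t(x_j - xbar) U^{-1} (x_l - xbar)
    g1p_m = np.sum(g**3) / n**2
    b2p_m = np.sum(np.diag(g)**2) / n
    return g1p_m, b2p_m

rng = np.random.default_rng(1)
g1p_m, b2p_m = mardia_measures(rng.standard_normal((400, 2)))
```

For normal data $b^M_{2,p}$ concentrates near $p(p+2)$, the theoretical Mardia kurtosis of the multivariate normal.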
respectively. These are at present the commonly used estimators of multivariate skewness and kurtosis. It is well known that the asymptotic expansions of basic statistics in multivariate analysis (for example, Hotelling's $T^2$, the test statistics for MANOVA, the multivariate linear hypothesis, etc.) involve parameters which measure the deviation from normality, especially the skewness and kurtosis, and so their estimation becomes an issue. In Mardia (1970), the asymptotic distributions of the sample measures (2) were obtained when the population is normal (see also Mardia and Kanazawa, 1983). A related discussion of $b^M_{2,p}$ under elliptical distributions has been given by Berkane and Bentler (1990) and Maruyama (2005a, b). For general distributions, the asymptotic properties have been discussed by Koziol (1986), Baringhaus and Henze (1992) and Henze (1994). For Srivastava's measures (1), however, almost no work has been done on estimation under non-normality compared with the case of Mardia's measures. In addition, it seems that the properties of Srivastava's measures themselves are not completely known. The purpose of this paper is therefore to describe the asymptotic behaviour of $g_{1,p}$ and $b_{2,p}$ for a general multivariate distribution, thus generalizing the results of Maruyama (2005a) to the non-normal case.

2. The main results

In this section we give the asymptotic mean of $g_{1,p}$, and the mean and variance of $b_{2,p}$, to order $n^{-1}$ under non-normality. We denote the moments of $y$ by $m_{i_1 i_2 \cdots i_k} := E(y_{i_1} y_{i_2} \cdots y_{i_k})$. Note that $y$ is a $p$-variate random vector with $E(y) = \theta$ and $\mathrm{Cov}(y) = D_\lambda$, and the sample measures (1) are invariant under the transformation from $y$ to $D_\lambda^{-1/2}(y - \theta)$. This means that $y$ may be assumed standardized, with $E(y) = 0$ and $\mathrm{Cov}(y) = I_p$, the identity matrix of order $p$. In the following, the notation $m_{i_1 i_2 \cdots i_k}$ refers to the moments of the standardized variate. More precisely, let $z = {}^t(z_1, \ldots, z_p) := D_\lambda^{-1/2}(y - \theta)$, and write $E(z_{i_1} z_{i_2} \cdots z_{i_k}) = m_{i_1 i_2 \cdots i_k}$. We introduce measures for general moments:

$$\gamma_{m,p} := \sum_a m^2_{\underbrace{a \cdots a}_{2m+1}} \quad (m \geq 1), \qquad
\beta_{m,p} := \sum_a m_{\underbrace{a \cdots a}_{2m}} \quad (m \geq 2),$$

which correspond to the $(2m+1)$th (odd) and $2m$th (even) order moments, and are obtained by generalizing $\gamma_{1,p}$ and $\beta_{2,p}$, respectively. In terms of the original variables, $\gamma_{m,p} = \sum_{i=1}^p \lambda_i^{-(2m+1)} \bigl(E[(y_i - \theta_i)^{2m+1}]\bigr)^2$ and $\beta_{m,p} = \sum_{i=1}^p \lambda_i^{-m} E[(y_i - \theta_i)^{2m}]$.
Proposition 1. As $n \to \infty$, the asymptotic mean and variance of $b_{2,p}$ to order $n^{-1}$ are

$$E(b_{2,p}) = \beta_{2,p} + \frac{1}{n} \bigl\{ -2\beta_{3,p} + 3\eta_{(4^2),p} - 5\beta_{2,p} + 8\gamma_{1,p} + 6p \bigr\} + O(n^{-2}),$$

$$\mathrm{Var}(b_{2,p}) = \frac{1}{n} \bigl\{ \beta_{4,p} + \eta_{(8),p(p-1)} + 4(\eta_{(4^3)} - \eta_{(6,4)}) - \beta_{2,p}^2 - 8\eta_{(5,3)} + 16(\eta_{(4,3^2)} + \gamma_{1,p}) \bigr\} + O(n^{-2}),$$

where

$$\eta_{(4^2),p} := \sum_a m^2_{aaaa}, \qquad \eta_{(8),p(p-1)} := \sum_{a \neq b} m_{aaaabbbb},$$
$$\eta_{(4^3)} := \sum_{a,b} m_{aaaa} m_{bbbb} m_{aabb}, \qquad \eta_{(6,4)} := \sum_{a,b} m_{aaaabb} m_{bbbb},$$
$$\eta_{(5,3)} := \sum_{a,b} m_{aaaab} m_{bbb}, \qquad \eta_{(4,3^2)} := \sum_{a,b} m_{aaaa} m_{bbb} m_{aab}.$$
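To see how the $\eta$-quantities collapse when $p = 1$ (each sum over $a, b$ then has a single term and $\eta_{(8),p(p-1)}$ is empty), the $n^{-1}$ coefficient of $\mathrm{Var}(b_{2,p})$ can be checked numerically against the univariate closed form stated in Corollary 3 below. The moment values used here are arbitrary placeholders, not taken from any distribution:

```python
# For p = 1 every index equals 1, so each eta-quantity reduces to a product
# of univariate moments m_k = E(y^k) and eta_{(8),p(p-1)} vanishes.  Check
# that the n^{-1} coefficient of Var(b_{2,p}) matches the univariate form.
m3, m4, m5, m6, m8 = 0.7, 4.1, 2.3, 30.0, 400.0   # arbitrary placeholder values

beta4  = m8             # beta_{4,p}
eta8   = 0.0            # sum over a != b is empty for p = 1
eta43  = m4**3          # eta_{(4^3)}
eta64  = m6 * m4        # eta_{(6,4)}
beta2  = m4             # beta_{2,p}
eta53  = m5 * m3        # eta_{(5,3)}
eta432 = m4 * m3**2     # eta_{(4,3^2)}
gamma1 = m3**2          # gamma_{1,p}

multivariate = (beta4 + eta8 + 4.0 * (eta43 - eta64) - beta2**2
                - 8.0 * eta53 + 16.0 * (eta432 + gamma1))
univariate = (m8 + 4.0 * (m4**3 - m6 * m4) - m4**2
              - 8.0 * m5 * m3 + 16.0 * m3**2 * (m4 + 1.0))
```

The two expressions agree identically, term by term, for any placeholder values.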
Proposition 2. The asymptotic expectation of $g_{1,p}$ is

$$E(g_{1,p}) = \gamma_{1,p} + \frac{1}{n} \bigl\{ \beta_{3,p} - 6\eta_{(5,3),p} + 6\eta_{(4,3^2),p} + 11\gamma_{1,p} - 6\beta_{2,p} + 9p \bigr\} + O(n^{-2}),$$

where

$$\eta_{(5,3),p} := \sum_a m_{aaaaa} m_{aaa}, \qquad \eta_{(4,3^2),p} := \sum_a m_{aaaa} m^2_{aaa}.$$
Corollary 3. In the univariate case $p = 1$, let $m_k := E(y^k)$, so that $m_1 = 0$ and $m_2 = 1$ under the standardization of Proposition 1. A simple calculation shows that in the univariate case $\mathrm{Var}(b_{2,p})$ of Proposition 1 becomes

$$\mathrm{Var}(b_{2,p}) = \frac{1}{n} \bigl\{ m_8 + 4(m_4^3 - m_6 m_4) - m_4^2 - 8 m_5 m_3 + 16 m_3^2 (m_4 + 1) \bigr\} + O(n^{-2}),$$

which is essentially equal to the asymptotic variance of $b^M_{2,p}$ given by Henze (1994).

Corollary 4. Consider the case where all odd-order moments are 0, as follows from symmetry of the distribution of $y$. In this case, the results of Propositions 1 and 2 reduce to

$$E(g_{1,p}) = \frac{1}{n} (\beta_{3,p} - 6\beta_{2,p} + 9p) + O(n^{-2}),$$

and

$$E(b_{2,p}) = \beta_{2,p} + \frac{1}{n} (-2\beta_{3,p} + 3\eta_{(4^2),p} - 5\beta_{2,p} + 6p) + O(n^{-2}),$$

$$\mathrm{Var}(b_{2,p}) = \frac{1}{n} \bigl\{ \beta_{4,p} + \eta_{(8),p(p-1)} + 4(\eta_{(4^3)} - \eta_{(6,4)}) - \beta_{2,p}^2 \bigr\} + O(n^{-2}).$$

As a general class of examples, we consider the class of elliptical distributions with characteristic function $\phi(s) = \exp[\mathrm{i}\,{}^t s \mu]\, \psi({}^t s \Lambda s)$. It follows that the odd-order moments are 0 and the $2m$th-order moment is given by $m_{i_1 i_2 \cdots i_{2m}} = (K^{(m)} + 1) \sum^{(d_m)} \delta_{i_1 i_2} \cdots \delta_{i_{2m-1} i_{2m}}$, where $K^{(m)} := \psi^{(m)}(0)/(\psi'(0))^m - 1$ is called the $2m$th-order moment parameter; in particular, $\kappa := K^{(2)}$ is the kurtosis parameter. Here $\sum^{(d_m)}$ denotes the sum over all $d_m = \prod_{k=1}^m (2k - 1)$ possible combinations, $\delta_{aa} = 1$, and $\delta_{ab} = 0$ for $a \neq b$. Since $m_{aaaaaa} = 15(K^{(3)} + 1)$ and $m_{aaaa} = 3(\kappa + 1)$, Propositions 1 and 2 imply that

$$E(g_{1,p}) = \frac{p}{n} \bigl\{ 15(K^{(3)} + 1) - 18(\kappa + 1) + 9 \bigr\} + O(n^{-2}),$$
and

$$E(b_{2,p}) = 3p(\kappa + 1) + \frac{p}{n} \bigl\{ 6 + 27(\kappa + 1)^2 - 15(\kappa + 1) - 30(K^{(3)} + 1) \bigr\} + O(n^{-2}),$$

$$\mathrm{Var}(b_{2,p}) = \frac{p}{n} \bigl\{ 3(3p + 32)(K^{(4)} + 1) + 36(p + 2)(\kappa + 1)^3 - 36(p + 4)(K^{(3)} + 1)(\kappa + 1) - 9p(\kappa + 1)^2 \bigr\} + O(n^{-2}).$$

These results were obtained by Maruyama (2005a). More specifically, consider the case where the underlying distribution is normal, so that $K^{(m)} = 0$. The results then reduce to

$$E(g_{1,p}) = \frac{6p}{n} + O(n^{-2}), \qquad
E(b_{2,p}) = 3p - \frac{12p}{n} + O(n^{-2}), \qquad
\mathrm{Var}(b_{2,p}) = \frac{24p}{n} + O(n^{-2}).$$

These expressions are essentially equivalent to those obtained by Srivastava (1984). Now define estimators of the standardized multivariate kurtosis as $\tilde{k}_4 := b_{2,p} - 3p$ and $\hat{k}_4 := b^M_{2,p} - p(p + 2)$. Both theoretical values are 0, and the differences of the bias and the variance up to order $n^{-1}$ are
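As a quick arithmetic check, setting $\kappa = K^{(3)} = K^{(4)} = 0$ in the elliptical-case expansions above should reproduce the normal-case coefficients $6p$, $-12p$ and $24p$. The helper below (our naming) collects the $n^{-1}$ coefficients:

```python
# Collect the n^{-1} coefficients of the elliptical-case expansions; with
# kappa = K3 = K4 = 0 they must reduce to the normal-case values
# 6p, -12p and 24p.
def elliptical_coefficients(p, kappa, K3, K4):
    eg = p * (15 * (K3 + 1) - 18 * (kappa + 1) + 9)
    eb = p * (6 + 27 * (kappa + 1)**2 - 15 * (kappa + 1) - 30 * (K3 + 1))
    vb = p * (3 * (3 * p + 32) * (K4 + 1) + 36 * (p + 2) * (kappa + 1)**3
              - 36 * (p + 4) * (K3 + 1) * (kappa + 1) - 9 * p * (kappa + 1)**2)
    return eg, eb, vb
```

For instance, at $\kappa = K^{(3)} = K^{(4)} = 0$ the three coefficients are $6p$, $-12p$ and $24p$ for any $p$.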
$$|\mathrm{bias}(\hat{k}_4)| - |\mathrm{bias}(\tilde{k}_4)| = \frac{4}{n} p(p - 1) + O(n^{-2}),$$

$$\mathrm{Var}(\hat{k}_4) - \mathrm{Var}(\tilde{k}_4) = \frac{8}{n} p(p - 1) + O(n^{-2}),$$

so if the underlying distribution is normal and $p \geq 2$, then $\tilde{k}_4$ is superior to $\hat{k}_4$, and hence $b_{2,p}$ is better than $b^M_{2,p}$ from the viewpoint of bias and asymptotic variance.

3. Numerical example

We compare the bias of $\tilde{k}_4$ and $\hat{k}_4$ (equivalently, of $b_{2,p}$ and $b^M_{2,p}$) by simulation based on 10,000 replications. Table 1 lists the bias when the underlying distribution is normal (denoted M1), the sample sizes are 100, 200, 500 and 4000, and $p = 2, 5$. The mean squared error (MSE) is presented in Table 2.

Table 1
Simulation results for the bias

              n = 100             n = 200             n = 500             n = 4000
     p     b_{2,p}  b^M_{2,p}  b_{2,p}  b^M_{2,p}  b_{2,p}  b^M_{2,p}  b_{2,p}   b^M_{2,p}
M1   2     0.124    0.315      0.0619   0.157      0.0267   0.0674     0.00320   0.00801
M1   5     0.800    1.364      0.348    0.714      0.138    0.283      0.0165    0.0342
M2   2     0.0577   0.0679     0.0262   0.0350     0.0100   0.0137     0.00132   0.00150
M2   5     0.389    0.986      0.188    0.471      0.0768   0.179      0.00772   0.0222
M3   2     3.873    4.252      2.295    2.513      1.095    1.185      0.159     0.170
M3   5     7.952    12.15      5.014    7.216      2.344    3.278      0.352     0.465
M4   2     0.880    1.117      0.497    0.622      0.198    0.244      0.0297    0.0352
M4   5     1.022    3.653      0.625    1.951      0.304    0.855      0.0523    0.118
M5   2     193.1    194.0      177.9    178.5      153.6    153.8      82.85     82.85
M5   5     479.4    488.8      444.5    450.1      381.3    383.9      212.8     212.8
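A small Monte Carlo sketch of the normal-case comparison (our code, with a fixed seed and 5,000 rather than 10,000 replications) estimates both standardized kurtosis statistics:

```python
import numpy as np

def kurtosis_estimators(x):
    """Standardized kurtosis estimators: k4_tilde = b_{2,p} - 3p (Srivastava)
    and k4_hat = b^M_{2,p} - p(p+2) (Mardia); both have theoretical value 0
    under normality."""
    n, p = x.shape
    d = x - x.mean(axis=0)
    u = np.cov(x, rowvar=False, ddof=1)
    w, h = np.linalg.eigh(u)                         # eigenvalues/vectors of U
    b2p = np.sum(((d @ h)**4).sum(axis=0) / w**2) / n
    q = np.sum((d @ np.linalg.inv(u)) * d, axis=1)   # t(x_j - xbar) U^{-1} (x_j - xbar)
    b2p_m = np.sum(q**2) / n
    return b2p - 3 * p, b2p_m - p * (p + 2)

rng = np.random.default_rng(42)
p, n, reps = 2, 100, 5000
k_tilde = np.empty(reps)
k_hat = np.empty(reps)
for r in range(reps):
    k_tilde[r], k_hat[r] = kurtosis_estimators(rng.standard_normal((n, p)))
```

With these settings, both estimators come out negative on average (underestimation), and the sample variance of $\tilde{k}_4$ is smaller than that of $\hat{k}_4$, consistent with Tables 1 and 2.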
Table 2
Simulation results for the MSE

              n = 100             n = 200             n = 500             n = 4000
     p     b_{2,p}  b^M_{2,p}  b_{2,p}  b^M_{2,p}  b_{2,p}  b^M_{2,p}  b_{2,p}    b^M_{2,p}
M1   2     0.463    0.608      0.236    0.323      0.0942   0.125      0.0119     0.0156
M1   5     2.696    4.015      0.910    1.727      0.284    0.617      0.0305     0.0707
M2   2     0.0568   0.0589     0.0208   0.0277     0.00629  0.0106     0.000666   0.00130
M2   5     0.513    1.398      0.207    0.327      0.0554   0.0706     0.00231    0.00822
M3   2     43.13    45.56      31.72    32.37      19.18    19.53      3.649      3.702
M3   5     141.3    214.1      91.45    117.8      52.74    59.85      9.360      9.996
M4   2     5.462    5.948      3.599    3.828      1.840    1.925      0.278      0.292
M4   5     15.08    25.55      9.361    13.19      4.674    5.961      0.697      0.819
M5   2     37660    37980      32580    32750      26150    26220      7989       7986
M5   5     231300   239700     199900   204700     152200   154200     91870      91950
Table 3
Theoretical values

                              M1 (normal)  M2 (uniform)  M3 (exponential)  M4 (chi-square)  M5 (log-normal)
γ_{1,p}, γ^M_{1,p}   p = 2    0            0             8                 2                76.5
                     p = 5    0            0             20                5                191.26
β_{2,p}              p = 2    6            3.6           18                9                227.87
                     p = 5    15           9             45                22.5             569.68
β^M_{2,p}            p = 2    8            5.6           20                11               229.87
                     p = 5    35           29            65                42.5             589.68
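The entries of Table 3 for populations with $p$ i.i.d. standardized coordinates follow from the univariate skewness $s := E(z^3)$ and kurtosis $k := E(z^4)$ of a single coordinate, via $\gamma_{1,p} = \gamma^M_{1,p} = p s^2$, $\beta_{2,p} = p k$ and $\beta^M_{2,p} = p k + p(p-1)$. A sketch (our naming) of these relations:

```python
# Theoretical values for p i.i.d. standardized coordinates, from the
# univariate skewness s = E(z^3) and kurtosis k = E(z^4) of one coordinate:
#   gamma_{1,p} = gamma^M_{1,p} = p s^2,
#   beta_{2,p}  = p k,   beta^M_{2,p} = p k + p (p - 1).
def theoretical_values(s, k, p):
    return p * s**2, p * k, p * k + p * (p - 1)
```

For example, the exponential distribution (M3) has $s = 2$ and $k = 9$, giving $(8, 18, 20)$ at $p = 2$, as in Table 3.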
Similarly, Tables 1 and 2 also list the results when the underlying distribution is one of the following four types of non-normal distributions:

M2 Uniform distribution: each of the $p$ variables is generated independently from a uniform distribution on the interval $[-2, 2]$;
M3 Exponential distribution: each of the $p$ variables is generated independently from an exponential distribution with mean unity;
M4 Chi-square distribution: each of the $p$ variables is generated independently from a chi-square distribution with 8 degrees of freedom;
M5 Log-normal distribution: each of the $p$ variables is generated independently from a log-normal distribution such that $\log x$ has the standard normal distribution.

The first of these (M2) belongs to the symmetric distributions covered by Corollary 4. The theoretical values of the population multivariate skewness and kurtosis are computed easily and given in Table 3. We note that $\gamma_{1,p}$ and $\gamma^M_{1,p}$ coincide under each condition. In addition, the bias of $g_{1,p}$ and $g^M_{1,p}$ is presented in Table 4.

4. Conclusion

It may be seen from the simulation results that the expectation of the sample kurtosis $b_{2,p}$ converges to the multivariate kurtosis $\beta_{2,p}$ as the sample size grows. From Table 1 we also notice that both $b_{2,p}$ and $b^M_{2,p}$ are underestimated for each population; moreover, the bias of $\tilde{k}_4$ is smaller in magnitude than that of $\hat{k}_4$.
Table 4
Simulation results for the bias of g_{1,p}

              n = 100             n = 200             n = 500             n = 4000
     p     g_{1,p}  g^M_{1,p}  g_{1,p}  g^M_{1,p}  g_{1,p}  g^M_{1,p}  g_{1,p}    g^M_{1,p}
M1   2     0.112    0.216      0.0580   0.113      0.0238   0.0466     0.00301    0.00837
M1   5     0.330    0.920      0.156    0.493      0.0606   0.203      0.00749    0.0212
M2   2     0.0414   0.0975     0.0205   0.0488     0.00594  0.0190     0.00101    0.00242
M2   5     0.127    0.656      0.0567   0.340      0.0262   0.137      0.00259    0.0173
M3   2     1.151    3.934      0.566    2.429      0.232    1.158      0.0333     0.171
M3   5     1.459    10.96      0.851    6.878      0.304    3.244      0.0354     0.462
M4   2     0.0992   0.978      0.0465   0.558      0.00862  0.237      0.00173    0.0324
M4   5     0.179    3.060      0.0960   1.746      0.0371   0.769      0.000511   0.107
M5   2     52.98    67.68      43.52    61.25      30.52    50.44      3.255      23.26
M5   5     127.5    171.2      107.9    154.7      74.29    127.0      11.76      58.95
This coincides with the theoretical result of Corollary 4. Moreover, the same behaviour as for $b_{2,p}$ is seen for the sample skewness $g_{1,p}$: it can be seen from Table 4 that the bias of $g_{1,p}$ is small in comparison with that of $g^M_{1,p}$. Note that both of them are underestimates for each population. The same is true of the MSE: it may be noted from Table 2 that the MSE of $b_{2,p}$ is small compared with that of $b^M_{2,p}$. Therefore $g_{1,p}$ is superior to $g^M_{1,p}$ from the viewpoint of bias, and $b_{2,p}$ is better than $b^M_{2,p}$ from the viewpoint of bias and MSE.

In this paper we considered Srivastava's multivariate skewness and kurtosis and compared them with Mardia's measures, both theoretically and by simulation. There are tests of normality based on these measures of multivariate skewness and kurtosis. It may be noted that the test statistics are derived by using the asymptotic formulae in the propositions, which give efficient approximations. However, it is not easy to find others. It will be important to continue the study of these problems in the future.

Acknowledgement

The author thanks the referee for his/her helpful comments.

Appendix A. Proofs of propositions

Since $y_j$, $\bar{y}$ and $D_w$ are not independent, we put $T_{ij} := w_i^{-1} (y_{ij} - \bar{y}_i)^2$ and
$$\bar{y}_{(a)} := \frac{1}{n-1} \sum_{j \neq a} y_j.$$

Similarly, the covariance matrix should be estimated without the $a$th observation, that is, by

$$L_{(a)} := \frac{1}{n-2} \sum_{j \neq a} (y_j - \bar{y}_{(a)})\, {}^t(y_j - \bar{y}_{(a)}).$$

Then it can be shown that

$$D_w = \Bigl(1 - \frac{1}{n-1}\Bigr) L_{(a)} + \frac{1}{n} (y_a - \bar{y}_{(a)})\, {}^t(y_a - \bar{y}_{(a)}),$$
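This leave-one-out decomposition is a matrix identity for the sample covariance of any vectors $y_j$ (rotating both sides by $H$ gives the stated form for $D_w$), so it can be verified numerically on arbitrary data:

```python
import numpy as np

# Numeric check of the leave-one-out decomposition of the sample covariance:
#   D_w = (1 - 1/(n-1)) L_(a) + (1/n) (y_a - ybar_(a)) t(y_a - ybar_(a)).
rng = np.random.default_rng(3)
n, p, a = 20, 4, 7
y = rng.standard_normal((n, p))

dw = np.cov(y, rowvar=False, ddof=1)                 # (1/(n-1)) sum_j (y_j - ybar) t(y_j - ybar)
rest = np.delete(y, a, axis=0)                       # drop the a-th observation
ybar_a = rest.mean(axis=0)
la = (rest - ybar_a).T @ (rest - ybar_a) / (n - 2)   # L_(a)
u = y[a] - ybar_a
rhs = (1 - 1 / (n - 1)) * la + np.outer(u, u) / n
```

The two sides agree to machine precision for any choice of $n$, $p$ and $a$.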
and also

$$D_w^{-1} = \frac{n-1}{n-2}\, L_{(a)}^{-1} - \frac{\bigl((n-1)^2 / (n(n-2)^2)\bigr)\, L_{(a)}^{-1} (y_a - \bar{y}_{(a)})\, {}^t(y_a - \bar{y}_{(a)})\, L_{(a)}^{-1}}{1 + \bigl((n-1)/(n(n-2))\bigr)\, {}^t(y_a - \bar{y}_{(a)})\, L_{(a)}^{-1} (y_a - \bar{y}_{(a)})}.$$
Further, let

$$\bar{y}_{(a)} = \frac{1}{\sqrt{n-1}}\, f, \qquad L_{(a)} = I_p + \frac{1}{\sqrt{n-1}}\, M. \tag{3}$$

Note that it is enough to calculate only the diagonal elements. Then $T^2_{ij}$ is stochastically expanded, with $f = {}^t(z_1, \ldots, z_p)$ and $M = (m_{ab})$, as

$$T^2_{ij} = y^4_{ij} + \frac{1}{\sqrt{n}} \bigl( 4 y^3_{ij} z_i - 2 m_{ii} y^4_{ij} \bigr) + \frac{1}{n} \bigl( 2 y^6_{ij} - 2 y^4_{ij} + 3 m^2_{ii} y^4_{ij} + 6 z^2_i y^2_{ij} + 8 m_{ii} y^3_{ij} z_i \bigr) + O_p(n^{-3/2}).$$

By using the asymptotic expansion of the joint probability density function (j.p.d.f.) of $f$ and $M$ (see Fujikoshi, 2002), we have $E(m^2_{aa}) = m_{aaaa} - 1 - (m_{aaaa} - 3)/n$ and $E(m_{aa} m_{bb}) = m_{aabb} - 1 - (m_{aabb} - 1)/n$ for $a \neq b$. Therefore, we obtain the second-order asymptotic mean of $b_{2,p}$.

Using a similar idea, we may obtain the variance. In order to avoid the dependence of $y_a$, $y_b$, $\bar{y}$ and $D_w$, we define
n 1 X y, n 2 jaa;b j
Lða;bÞ :¼
n 1 X ðy yða;bÞ Þt ðyj yða;bÞ Þ. n 3 jaa;b j
Note that Eðg22;p Þ ¼ ð1=nÞG1 þ ð1 1=nÞG2 where 2 !2 3 " # p p X X 2 2 2 G1 :¼E 4 T ij 5; G2 :¼E T ia T kb ðaabÞ. i¼1
i;k
In order to compute G2 , let 1 yða;bÞ ¼ pffiffiffiffiffiffiffiffiffiffiffi f , n2 1 Lða;bÞ ¼ I p þ pffiffiffiffiffiffiffiffiffiffiffi M . n2
ð4Þ
For evaluation of G1 , we calculate the expectation with respect to yj , f and M, and then for calculating G2 , we use an asymptotic expansion of j.p.d.f. of f and M . Then we have the asymptotic variance up to the order n1 . 3=2 We can get Proposition 2 by the same method above. Let V ij :¼wi ðyij yi Þ3 . Then we have p p 1X 1 X 2 Eðg1;p Þ ¼ EðV ij Þ þ 1 EðV ia V ib Þ ðaabÞ. n i¼1 n i¼1
As for $E(V^2_{ij})$, we may rewrite $V^2_{ij}$ in terms of $\bar{y}_{(a)}$ and $L_{(a)}$ and expand with $f$ and $M$ given by (3). Also, for $E(V_{ia} V_{ib})$ we make use of $\bar{y}_{(a,b)}$ and $L_{(a,b)}$. Expanding $V_{ia} V_{ib}$ with $f^*$ and $M^*$ given by (4), we have

$$V_{ia} V_{ib} = y^3_{ia} y^3_{ib} - \frac{3}{\sqrt{n}}\, y^2_{ia} y^2_{ib} \bigl( m_{ii} y_{ia} y_{ib} + y_{ia} z_i + y_{ib} z_i \bigr)
+ \frac{1}{n} \bigl[ y^2_{ia} y^2_{ib} \bigl\{ 9 m_{ii} z_i (y_{ia} + y_{ib}) + 6 m^2_{ii} y_{ia} y_{ib} - 3 (y^2_{ia} + y^2_{ib})(y_{ia} y_{ib} + 1) \bigr\} + 3 y_{ia} y_{ib} z^2_i (y^2_{ia} + y^2_{ib} + 3 y_{ia} y_{ib}) \bigr] + O_p(n^{-3/2}).$$

Calculating the expectation with respect to $y_a$, $y_b$, $f^*$ and $M^*$, we obtain the asymptotic mean $E(g_{1,p})$ up to order $n^{-1}$. This completes the proof. □

References

Baringhaus, L., Henze, N., 1992. Limit distributions for Mardia's measure of multivariate skewness. Ann. Statist. 20, 1889–1902.
Berkane, M., Bentler, P.M., 1990. Mardia's coefficient of kurtosis in elliptical populations. Acta Math. Appl. Sin. Engl. Ser. 6, 289–294.
Fujikoshi, Y., 2002. Asymptotic expansions for the distributions of multivariate basic statistics and one-way MANOVA test under nonnormality. J. Statist. Plann. Inference 108, 263–282.
Henze, N., 1994. On Mardia's kurtosis test for multivariate normality. Comm. Statist. Theory Methods 23, 1031–1045.
Koziol, J.A., 1986. A note on the asymptotic distribution of Mardia's measure of multivariate kurtosis. Comm. Statist. Theory Methods 15, 1507–1513.
Mardia, K.V., 1970. Measures of multivariate skewness and kurtosis with applications. Biometrika 57, 519–530.
Mardia, K.V., Kanazawa, M., 1983. The null distribution of multivariate kurtosis. Comm. Statist. Simulation Comput. 12, 569–576.
Maruyama, Y., 2005a. Asymptotic properties for measures of multivariate kurtosis in elliptical distributions. Int. J. Pure Appl. Math. 25, 407–421.
Maruyama, Y., 2005b. The effect of non-normality on the distribution of sample fourth order cumulant under elliptical populations. SUT J. Math. 41, 97–116.
Srivastava, M.S., 1984. A measure of skewness and kurtosis and a graphical method for assessing multivariate normality. Statist. Probab. Lett. 2, 263–267.