Bias-reduced estimates for skewness, kurtosis, L-skewness and L-kurtosis


Journal of Statistical Planning and Inference 141 (2011) 3839–3861


Christopher S. Withers (a), Saralees Nadarajah (b)

(a) Applied Mathematics Group, Industrial Research Limited, Lower Hutt, New Zealand
(b) School of Mathematics, University of Manchester, Manchester M13 9PL, UK

Article history: Received 23 March 2010; received in revised form 24 December 2010; accepted 29 June 2011; available online 7 July 2011.

Abstract

Estimates based on L-moments are less non-robust than estimates based on ordinary moments because the former are linear combinations of order statistics for all orders, whereas the latter take increasing powers of deviations from the mean as the order increases. Estimates based on L-moments can also be more efficient than maximum likelihood estimates. Similarly, L-skewness and L-kurtosis are less non-robust and more informative than the traditional measures of skewness and kurtosis. Here, we give nonparametric bias-reduced estimates of both types of skewness and kurtosis. Their asymptotic computational efficiency is infinitely better than that of corresponding bootstrapped estimates.

Keywords: Bias; Kurtosis; L-moments; Maximum likelihood; Nonparametric; Skewness

1. Introduction

The concept of L-moments was introduced in the seminal paper by Hosking (1990). See also Hosking (1992, 2006, 2007a,b) for developments on L-moments. L-moments have several advantages over ordinary moments (Hosking, 1990; Theorem 1, Theorem 3 and p. 115): (1) for L-moments to be defined one requires only that the distribution has a finite mean; no higher-order moments need be finite; (2) for the variance of L-moments to be finite one requires only that the distribution has finite variance; no higher-order moments need be finite; (3) sample standardized L-moments can take any values that the corresponding population quantities can; and (4) univariate L-moments determine the underlying distribution in general.

Hosking's L-moments are increasingly widely used in preference to moment estimators based on ordinary moments. For suitable weight functions they are less non-robust than ordinary moment estimators. They can also be more efficient than maximum likelihood estimators: see, for example, an application to the three-parameter generalized extreme-value distribution by Hosking et al. (1985). Hosking's L-moments also have other uses than providing useful parameter estimators. For example, plots of L-kurtosis versus L-skewness are used by hydrologists and others to identify what family or type of distribution a sample is best fitted to. Such plots are much more informative than straight kurtosis-skewness plots. See Hosking (1990), Pearson (1993) and Vogel and Fennessey (1993). Hosking's L-moments have been studied by many other authors in the statistics community. We mention Royston (1992), Ulrych et al. (2000), David and Nagaraja (2003, Section 9.9), Jones (2004), Serfling and Xiao (2007), Delicado and Goria (2008), and Alkasasbeh and Raqab (2009).
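Before turning to the estimators themselves, it may help to fix the computational side of the notation. Sample L-moments are conveniently computed from unbiased probability-weighted moment estimates b_0, ..., b_3 (a standard identity, assumed here as background rather than taken from this paper):

```python
import numpy as np

def sample_lmoments(x):
    """First four sample L-moments l1..l4 from the standard unbiased
    probability-weighted moment estimates b0..b3 (requires n >= 4)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((i - 1) * x) / (n * (n - 1))
    b2 = np.sum((i - 1) * (i - 2) * x) / (n * (n - 1) * (n - 2))
    b3 = np.sum((i - 1) * (i - 2) * (i - 3) * x) / (n * (n - 1) * (n - 2) * (n - 3))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3, l4
```

For the sample (1, 2, 3, 4) this returns l1 = 2.5, l2 = 5/6 and l3 = l4 = 0.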

* Corresponding author. E-mail address: [email protected] (S. Nadarajah).

doi:10.1016/j.jspi.2011.06.024


Hosking's L-moments have also received extensive applications in many areas. The most prominent application areas are engineering, hydrology, limnology, meteorology and climatology, geosciences, computer science, economics, business and finance, and biology.

The aim of this paper is to provide nonparametric bias-reduced estimators for some measures based on L-moments and ordinary moments. Let X_1, \ldots, X_n be a random sample of size n from an unknown distribution F on \mathbb{R} = (-\infty, \infty), with finite mean \mu and central moments \{\mu_r\}. The skewness and kurtosis of F are

\beta_3 = \mu_3 \mu_2^{-3/2}, \qquad \beta_4 = \mu_4 \mu_2^{-2}.

The rth L-moment of F is defined to be

\lambda_r = r^{-1} \sum_{j=0}^{r-1} (-1)^j \binom{r-1}{j} E X_{r-j,r},   (1.1)

where X_{1,r} \le \cdots \le X_{r,r} are the order statistics of a random sample of size r from F (Hosking, 1990). A more informative way to write (1.1) is as an (r-1)th-order difference of the order statistics:

\lambda_r = r^{-1} \Delta^{r-1} E X_{m,r} at m = 1, where \Delta x_m = x_{m+1} - x_m, that is, \Delta is the forward difference operator. Note that \lambda_1 = \mu, so is a measure of location; \lambda_2 is a measure of scale, \lambda_3 is a measure of skewness and \lambda_4 is a measure of kurtosis. Set

\beta_r = \mu_r \mu_2^{-r/2},   (1.2)

the standardized rth moment, and

\tau_r = \lambda_r \lambda_2^{-1},   (1.3)

the standardized rth L-moment. As mentioned, \{\tau_r\} are much more informative than \{\beta_r\} for distinguishing between families of distributions. In particular, plots of \tau_4 against \tau_3 are more useful than plots of \beta_4 against \beta_3. See, for example, Figs. 2 and 3 and Table 1 of Hosking (1990), Pearson (1993) and Vogel and Fennessey (1993). Withers and Nadarajah (2010) give unbiased estimators for \mu_r for r \le 7. Hosking (1990, Eq. (3.1)) gives the following unbiased estimator of \lambda_r:

l_r = \binom{n}{r}^{-1} r^{-1} \sum_{1 \le i_1 < \cdots < i_r \le n} \sum_{k=0}^{r-1} (-1)^k \binom{r-1}{k} X_{i_{r-k},n},   (1.4)

where \binom{n}{r}^{-1} r^{-1} = (r-1)!/(n)_r with (n)_r = n!/(n-r)! = n(n-1)\cdots(n-r+1). However, while unbiased estimators exist for moments and L-moments, they do not exist for their standardized versions \beta_r and \tau_r of (1.2) and (1.3). This paper provides nonparametric bias-reduced estimators for \beta_r and \tau_r. A summary of the theory needed is given in Section 2. Unlike resampling methods, these analytic formulas require only O(n) or O(n^2) calculations, where n is the sample size. So, their asymptotic computational efficiency is infinitely better than that of corresponding bootstrapped estimators, since these require O(n^p) calculations to reduce the bias to O(n^{-p}). Sections 3 and 4 apply the theory to standardized moments and standardized L-moments, respectively. Applications and a simulation study are given in Sections 5 and 6. Some conclusions and future work are noted in Section 7. Some technical results needed for the theory are presented in Appendices A and B. The proofs of all results are given in Appendix C. Unless otherwise stated, we assume throughout that F is continuous and differentiable and has all \{\mu_r\} finite and all \{\lambda_r\} finite too. These conditions may be too restrictive, but they are sufficient for all of the results given.

2. The theory of nonparametric bias reduction

Let \hat{F} be the empirical distribution of a random sample of size n from an unknown distribution F defined on some space \Omega. Let T(F) be a real functional with rth-order von Mises (functional) derivative T_{x_1 \cdots x_r} = T_F(x_1, \ldots, x_r) finite almost everywhere (F) for all r \ge 1. By Withers and Nadarajah (2010),

E\,T(\hat{F}) = T(F) + \sum_{i=1}^{\infty} n^{-i} C_i(F),

where C_i(F) is a certain functional of \{T_F(x_1, \ldots, x_r), r \le 2i\}. Explicit forms for C_i(F) are given in Section 3 of Withers and Nadarajah (2010). Also there exist \{T_i(F), S_i(F)\} such that

T_{pn}(\hat{F}) = \sum_{i=0}^{p-1} n^{-i} T_i(\hat{F}), \qquad S_{pn}(\hat{F}) = \sum_{i=0}^{p-1} S_i(\hat{F})/(n-1)^i   (2.1)


satisfy

E\,T_{pn}(\hat{F}) = T(F) + O(n^{-p}), \qquad E\,S_{pn}(\hat{F}) = T(F) + O(n^{-p})

as n \to \infty. Explicit forms for \{T_i(F), S_i(F)\} are given in Eqs. (1.2) and (1.3) and Section 4 of Withers and Nadarajah (2010). These pth-order estimators of T(F) require only O(n) or O(n^2) calculations. This compares with the pth-order bootstrap (see Theorem 1.3 of Hall, 1992, p. 28, in which (-1)^{i+1} should be inserted in the right-hand side of Eq. (1.35) in the book), which requires O(n^p) calculations. So, the computational efficiency of the pth-order bootstrap relative to the analytic estimators T_{pn}(\hat{F}), S_{pn}(\hat{F}) is O(n^{2-p}) \to 0 as n \to \infty for fixed p > 2. If T(F) is a polynomial in F of degree less than or equal to p (for example, the pth central moment \mu_p, or pth cumulant \kappa_p), then S_{pn}(\hat{F}) is an unbiased estimator of T(F).

Theorem 2.1. The second-, third- and fourth-order estimators based on (2.1) are

S_{2n}(\hat{F}) = T(\hat{F}) + S_1(\hat{F})/(n-1),
S_{3n}(\hat{F}) = T(\hat{F}) + S_1(\hat{F})/(n-1) + S_2(\hat{F})/(n-1)^2,
S_{4n}(\hat{F}) = S_{3n}(\hat{F}) + S_3(\hat{F})/(n-1)^3

and

T_{2n}(\hat{F}) = T(\hat{F}) + T_1(\hat{F})/n,
T_{3n}(\hat{F}) = T(\hat{F}) + T_1(\hat{F})/n + T_2(\hat{F})/n^2,
T_{4n}(\hat{F}) = T_{3n}(\hat{F}) + T_3(\hat{F})/n^3,

where

T_1 = S_1 = -C_1,   (2.2)

S_2(F) = -T(1^3)/3 + T(1^2 1^2)/8, \qquad T_2(F) = S_1(F) + S_2(F),   (2.3)

S_3(F) = -T(1^4)/4 + 3T(1^2 1^2)/8 - T(1^3 1^2)/6 - T(1^2 1^2 1^2)/48, \qquad T_3(F) = S_3(F) + 3S_2(F) + S_1(F),

where C_1(F) = T(1^2)/2 and

T(1^2) = \int T_F(x,x)\,dF(x),
T(1^3) = \int T_F(x,x,x)\,dF(x),
T(1^2 1^2) = \iint T_F(x,x,y,y)\,dF(x)\,dF(y),
T(1^4) = \int T_F(x,x,x,x)\,dF(x),
T(1^3 1^2) = \iint T_F(x,x,x,y,y)\,dF(x)\,dF(y),
T(1^2 1^2 1^2) = \iiint T_F(x,x,y,y,z,z)\,dF(x)\,dF(y)\,dF(z),

where

T(a^i b^j \cdots) = \int \int \cdots T_F(x^i y^j \cdots)\,dF^a(x)\,dF^b(y) \cdots,

and x^i denotes a string of i x's (not a product), and similarly for a^i. Furthermore, the estimators T(\hat{F}), T_{pn}(\hat{F}) and S_{pn}(\hat{F}) all have variance n^{-1} V_T(F) + O(n^{-2}), where

V_T(F) = \int T_F(x)^2\,dF(x).   (2.4)
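Theorem 2.1 can be checked on the simplest nontrivial functional. For T(F) = \mu_2 (the variance), E\,m_2 = \mu_2(1 - 1/n), so C_1 = -\mu_2, S_1 = -C_1 = \mu_2, and S_{2n}(\hat{F}) = m_2 + m_2/(n-1) = m_2\,n/(n-1), the usual unbiased sample variance. A small Monte Carlo confirmation in Python (our own illustration; sample size, replication count and seed are arbitrary choices):

```python
import numpy as np

# Worked check of S_2n(Fhat) = T(Fhat) + S_1(Fhat)/(n-1) for T(F) = mu_2.
# Since E m_2 = mu_2 (1 - 1/n), C_1 = -mu_2, so S_1 = mu_2 and the
# corrected estimator is m_2 * n/(n-1): the unbiased sample variance.
rng = np.random.default_rng(0)
n, reps = 10, 20000
plugin_err, corrected_err = [], []
for _ in range(reps):
    x = rng.normal(size=n)                 # true mu_2 = 1
    m2 = np.mean((x - x.mean()) ** 2)      # plug-in T(Fhat)
    plugin_err.append(m2 - 1.0)
    corrected_err.append(m2 + m2 / (n - 1) - 1.0)
mean_plugin = float(np.mean(plugin_err))       # approx -1/n = -0.1
mean_corrected = float(np.mean(corrected_err)) # approx 0
print(mean_plugin, mean_corrected)
```

The plug-in bias is close to -\mu_2/n = -0.1, while the corrected estimator is (here exactly) unbiased, in line with the polynomial case of the theorem.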

If TðFÞ ¼ gðSðFÞÞ, where SðFÞ is bivariate, then Tð12 Þ ¼ g1 S 1 þ g2 S 2 þ g11 S 11 þ 2g12 S 12 þ g22 S 22 ,

ð2:5Þ

3842

C.S. Withers, S. Nadarajah / Journal of Statistical Planning and Inference 141 (2011) 3839–3861

VT ðFÞ ¼ g12 S 11 þ 2g1 g2 S 12 þg22 S 22 ,

ð2:6Þ

where gi ¼ @gðsÞ=@si

at s ¼ SðFÞ,

gij ¼ @2 gðsÞ=@si @sj Si ¼ S ij ¼

Z Z

at s ¼ SðFÞ,

SiF ðx,xÞ dFðxÞ, SiF ðxÞSjF ðxÞ dFðxÞ:

ð2:7Þ

Since VT ðFb Þ estimates VT ðFÞ with bias Oðn1 Þ, n1 VT ðFb Þ estimates var fTðFb Þg with bias Oðn2 Þ. To reduce the bias of n1 VT ðFb Þ to Oðn3 Þ one can apply Section 6 of Withers and Nadarajah (2010). 3. Bias-reduced standardized moments Theorem 3.1 applies Theorem 2.1 to

br ¼ br ðFÞ ¼ m2r=2 mr , the standardized rth moment. r=2

Theorem 3.1. Let TðFÞ ¼ br ðFÞ ¼ m2 S1 ðFÞ ¼

2 X

cr,r þ i br þ i ,

mr . Then the estimators of Theorem2.1 hold with 4 X

S2 ðFÞ ¼

i ¼ 2

dr,r þ i br þ i ,

ð3:1Þ

i ¼ 4

where cr,r þ 2 ¼ r=2, cr,r þ 1 ¼ 0, cr,r ¼ rðr þ2Þðb4 1Þ=8, cr,r1 ¼ r 2 b3 =2, cr,r2 ¼ rðr1Þ=2, and dr,r þ 4 ¼ ðr=2Þðr=2 þ1Þ,

dr,r þ 3 ¼ 0,

  dr,r þ 2 ¼ r ðr þ2Þðr þ4Þb4 þ r 2 þ 18r þ 16 =16,

dr,r þ 1 ¼ 0,

 2 dr,r ¼ ðr=2Þðr=2 þ 1Þðr=2 þ 2Þb6 =3 þðr=2Þðr=2 þ 1Þðr=2 þ 2Þðr=2þ 3Þ b4 1 =8    ðr=2Þðr=2 þ 1Þ 5r=2þ 7 b4 1 rðr1Þðr þ10Þ=24, dr,r1 ¼ r 2 ðr þ 2Þb5 =4 þr 2 fðr þ 2Þðr þ 4Þðb4 1Þ þ 4ð5r þ 4Þgb3 =16,   dr,r2 ¼ r 2 ðr1Þf3ðr þ 2Þ b4 1 þ 4g=16, dr,r3 ¼ rðr1Þðr2Þð3r4Þb3 =12,

dr,r4 ¼ rðr1Þðr2Þðr3Þ=8:

r=2 Furthermore, br ðFb Þ ¼ mr ðFb Þm2 ðFb Þ has variance n1 vr þ Oðn2 Þ, where

mr ðFbÞ ¼

n  r 1X X X , ni¼1 i

the estimator of mr for a sample X1 ,X2 , . . . ,Xn from F, and 2

2

2

vr ¼ b2r br 2r br þ 1 br1 r br ðbr þ 2 br r br1 b3 Þ þr 2 br ðb4 1Þ=4þ r 2 br1 , where X ¼ ðX1 þ X2 þ    þ Xn Þ=n. Example 3.1. For b3 , S1 ðFÞ ¼ 3ð4b5 5b4 b3 7b3 Þ=8,

ð3:2Þ

S2 ðFÞ ¼ 15b7 =4 þ35b6 b3 =83b5 ð35b4 þ 61Þ=16þ 945ðb4 1Þ2 b3 =128555ðb4 1Þb3 =16 þ 75b3 =2, 2

2

and v3 ¼ b6 3b5 b3 6b4 b2 þ b3 ð9b4 þ35Þ=4 þ 9b2 . Example 3.2. For b4 , 2

S1 ðFÞ ¼ 2b6 3ðb4 1Þb4 8b3 6,

ð3:3Þ 2

S2 ðFÞ ¼ 6b8 2b6 ð2b4 1Þ24b5 b3 þ 8b3 ð6b4 þ 5Þ þ 15ðb4 1Þ3 87ðb4 1Þ2 55ðb4 1Þ þ14 2

2

2

and v4 ¼ b8 4b6 b4 8b5 b3 þ b4 ð4b4 1Þ þ16b4 b3 þ 16b3 .

C.S. Withers, S. Nadarajah / Journal of Statistical Planning and Inference 141 (2011) 3839–3861

3843

4. Probability-weighted moments Theorem 4.1 gives bias-reduced estimators of probability-weighted moments of bias Oðn4 Þ. Given a distribution function F(x) on some space O, by a probability-weighted moment we mean a functional of the form Z TðFÞ ¼ EaðX,FðXÞÞ ¼ aðx,FðxÞÞ dFðxÞ ¼ Tða,FÞ ð4:1Þ say, where X  FðxÞ and aðx,yÞ : O  ½0,1-R is a given function, which we call the weight function. In Section 5, we apply these to the L-moments. L-moments are linear combinations of the rth probability-weighted moment, b1r ¼ EXFðXÞr , where O ¼ R. Theorem 4.1. Let TðFÞ ¼ Tða,FÞ of (4.1) with ai ðx,yÞ ¼ ð@=@yÞi aðx,yÞ assumed to exist for iZ 0. Then the estimators of Theorem2.1 hold with Si ðFÞ ¼ TðAi ,FÞ,

ð4:2Þ

where Ai ðx,yÞ ¼

2i X

sij ðyÞaj ðx,yÞ:

j¼i

The first few sij are s11 ðFÞ ¼ ð1FÞ,

s12 ðFÞ ¼ ðFF 2 Þ=2,

s22 ðFÞ ¼ ð1FÞð12FÞ,

s23 ðFÞ ¼ ðFF 2 Þð510F þ 3F 2 Þ=6,

s24 ðFÞ ¼ ðFF 2 Þ2 =8, s33 ðFÞ ¼ ð1FÞð2 þ 9F9F 2 Þ=2, s34 ðFÞ ¼ ð1FÞð20 þ 54F99F 2 21F 3 Þ=24, s35 ðFÞ ¼ Fð1FÞ2 ð4 þ 5F þ 3F 2 Þ=24, s36 ðFÞ ¼ ðFF 2 Þ3 =48: Furthermore, Z TðFb Þ ¼ aðx, FbðxÞÞ dFb ðxÞ ¼ Tða, Fb Þ has asymptotic variance vn1 þ Oðn2 Þ, where v ¼ c00 þ 2c01 þc11 Tða,FÞ2 , c00 ¼

Z

aðx,FðxÞÞ2 dFðxÞ,

c01 ¼ d01 Tða,FÞTðb,FÞ, d01 ¼

ZZ xry

aðx,FðxÞÞa1 ðy,FðyÞÞ,

bðx,yÞ ¼ a1 ðx,yÞy, c11 ¼ d11 Tðb,FÞ2 , d11 ¼

ZZ

a1 ðy,FðyÞÞa1 ðz,FðzÞÞFðy4zÞ,

3844

C.S. Withers, S. Nadarajah / Journal of Statistical Planning and Inference 141 (2011) 3839–3861

where y4z ¼ minðy,zÞ. If F is continuous then d11 can be simplified to ZZ a1 ðy,FðyÞÞa1 ðz,FðzÞÞFðyÞ: d11 ¼ 2 yoz

Note that (4.2) provides the estimator S4n ðFb Þ of bias Oðn4 Þ. The bias-reduced estimators are linear combinations of probability-weighted moments. These results can be extended using the formulas for Si ðFÞ for i r 8 given in Withers and Nadarajah (2008). Suppose that O ¼ R. Then TðFb Þ ¼ n1

n X

aðXðkÞ ,k=nÞ

k¼1

and Si ðFb Þ ¼ n1

n X

Ai ðXðkÞ ,k=nÞ,

k¼1

where Xð1Þ r    r XðnÞ are the ordered sample values. It is important to note that the estimators for which bias reduction is sought by Theorem 4.1 are not the same as the estimators defined by Hosking (1990). For example, take aðx,FÞ ¼ xð2F1Þ. Then TðFÞ ¼ l2 , the second L-moment. The unbiased estimator for l2 given by Hosking (1990) is l2 defined by (1.4). But TðFb Þ ¼ l2 ðn1Þ=n. Similar arguments apply to Theorem 5.1 in Section 5: it can be seen that the estimator TðFb Þ corresponding to 1 TðFÞ ¼ l2 lr in (5.2) is not the same as Hosking’s ‘‘sample L-moment ratio’’ lr =l2 . The observations of the two preceding paragraphs were suggested by a referee. We are most grateful to this referee. 5. Application to L-moments Suppose that O ¼ R. Set

bjr ¼ EX j FðXÞr ¼ Tða,FÞ

ð5:1Þ j

r

of (4.1) for weight function aðx,yÞ ¼ x y . By Eq. (2.6) of Hosking (1990) or Eq. (7) of Vogel and Fennessey (1993), the Lmoment, lr , defined by (1.1), can be written as    r X r r þk lr ¼ b1r ð1Þrk : k k k¼0 So,

l1 ¼ g0 , l2 ¼ 2g1 g0 , l3 ¼ 6g2 6g1 þ g0 , l4 ¼ 20g3 30g2 þ 12g1 g0 , where gr ¼ b1r . Theorem 5.1 applies Theorem 2.1 to 1

TðFÞ ¼ tr ¼ l2 lr ,

ð5:2Þ

the standardized L-moment. Theorem 5.1. Let TðFÞ ¼ tr of (5.2). Then the estimators of Theorem2.1 hold with S1 ðFÞ ¼ Tð12 Þ=2, where Tð12 Þ is given by (2.5) with 2

gðsÞ ¼ s1 1 s2 ,

g1 ¼ l2 lr ,

Z

Z

S1 ¼

l2xx , S 2 ¼

1

g2 ¼ l2 ,

lrxx , S 11 ¼

Z

3

g11 ¼ 2l2 lr ,

l22x , S 12 ¼

Z

2

g12 ¼ l2 ,

g22 ¼ 0,

l2x lrx

for S i , S ij of (2.7), where integrals are with respect to F(x). In particular, S 11 ¼ 4a11 4a10 þ a00 , where aij ¼

Z

gix gjx :

ð5:3Þ

C.S. Withers, S. Nadarajah / Journal of Statistical Planning and Inference 141 (2011) 3839–3861

3845

The faij g needed here are given in Theorem B.1 in terms of Z y bjr ðyÞ ¼ xj FðxÞr dFðxÞ:

ð5:4Þ

If F is continuous then (5.3) can be reduced to S 11 ¼ 4b22 4b21 þ b20 16g21 þ 8g1 g0 3g20 þ 16

Z

b11 db10 :

ð5:5Þ

br has asymptotic variance Vr n1 þ Oðn2 Þ, where Furthermore, t Vr ¼ g12 S 11 þ2g1 g2 S 12 þ g22 S 22 , and Z

S 22 ¼

l2rx :

If AðFÞ ¼

ZZ

aðx,FðxÞÞbðy,FðyÞÞ, xoy

then AðFb Þ ¼ n2

n X

IðXi oXj ÞaðXi , Fb ðXi ÞÞ bðXj , Fb ðXj ÞÞ:

i,j ¼ 1

For F continuous, X

AðFb Þ ¼ n2

aðXðiÞ ,i=nÞbðXðjÞ ,j=nÞ:

MSE (Skew)

Bias (Skew)

1riojrn

0.000

−0.005 12

14

−0.3 4

6 8 10 Square root of n

12

0e+00

−8e−04 4

6 8 10 Square root of n

12

0.0030

0.0000 2

4

6 8 10 Square root of n

12

14

4

6 8 10 Square root of n

12

14

2

4

6 8 10 Square root of n

12

14

2

4

6 8 10 Square root of n

12

14

2

4

6 8 10 Square root of n

12

14

0.2

0.007

0.001

14

MSE (L−Kurt)

2

2

0.8

14

MSE (L−Skew)

Bias (L−Skew)

6 8 10 Square root of n

0.0

2

Bias (L−Kurt)

4

0.05

MSE (Kurt)

Bias (Kurt)

2

0.25

0.010

0.002

Fig. 6.1. Biases and mean squared errors for the standard normal case. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3846

C.S. Withers, S. Nadarajah / Journal of Statistical Planning and Inference 141 (2011) 3839–3861

This provides sample versions of faij g above. R Integrals of the form b1r db1s in Theorem 5.1 do have explicit forms: using (5.4), we obtain ZZ y ZZ Z bir ðyÞ dbjs ðyÞ ¼ xi fFðxÞgr dFðxÞyj fFðyÞgs dFðyÞ ¼ xi yj fFðxÞgr fFðyÞgs dFðxÞ dFðyÞ xoy

 ZZ   s X s s r!t! ¼ ð1Þ xi yj fFðxÞgr f1FðyÞgt dFðxÞ dFðyÞ ¼ ð1Þt EðXri þ 1,r þ t þ 2 Xrj þ 2,r þ t þ 2 Þ: t t ðr þ t þ 2Þ! xoy t¼0 t¼0 s X

t

ð5:6Þ This explicit form and its derivation were suggested by a referee to whom we are most grateful. So, the bias-reduced estimators of Theorem 5.1 involve terms no more complicated than EðXr,r þ s Xr þ 1,s Þ. These expectations can be estimated by XX pjk XðjÞ XðkÞ , ð5:7Þ jok

where the weights pjk depend on r and s. This double sum involves Oðn2 Þ calculations. So, the calculations for each Si ðFb Þ and hence those for the pth-order estimates of L-moments require Oðn2 Þ calculations. See discussion in Sections 1 and 2. Example 5.1. Suppose that r ¼3. So, S 12 ¼ 6a20 þ 12a12 12a11 þ 8a10 þ a00 : So, by Theorem B.1, for F continuous, S 12 ¼ 12b23 18b22 þ8b21 þ8b20 72g2 g1 þ18g2 g0 þ 72g21 64g1 g0 þ 3g20 þ 36

b10 dðb12 þ b11 Þ:

0.15 MSE (L−Skew)

Bias (L−Skew)

0.0005

Z

−0.0010

0.10

0.05 −0.0025 2

4

6 8 10 12 Square root of n

14

2

4

6 8 10 12 Square root of n

14

2

4

6 8 10 12 Square root of n

14

0.00 0.05 MSE (L−Kurt)

Bias (L−Kurt)

−0.02

−0.04

0.03

−0.06 0.01 2

4

6 8 10 12 Square root of n

14

Fig. 6.2. Biases and mean squared errors for the Student’s t case with two degrees of freedom. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

C.S. Withers, S. Nadarajah / Journal of Statistical Planning and Inference 141 (2011) 3839–3861

3847

Set Ui ¼

Z

gixx :

ð5:8Þ

By (5.1), U0 ¼ 0,

U1 ¼ 2ðg0 g1 Þ,

U2 ¼ 6ðg1 g2 Þ,

U3 ¼ 12ðg2 g3 Þ,

so S 1 ¼ 2U1 U0 ¼ 4ðg0 g1 Þ, S 2 ¼ 6U2 6U1 þU0 ¼ 12ðg0 þ 4g1 3g2 Þ: So, by (2.2), an estimator of TðFÞ ¼ t3 of bias Oðn2 Þ is TðFb Þ þS1 ðFb Þ=ðn1Þ,

ð5:9Þ

where   Z 2 1 3 2 2 S1 ðFÞ ¼ 2l2 l3 ðb10 b11 Þ6l2 ðb10 þ 4b11 3b12 Þl2 l3 4b22 4b21 þ b20 16b11 þ 8b11 b10 3b10 þ16 b11 db10 2

þ l2

 Z   2 2 12b23 18b22 þ8b21 þ 8b20 72b12 b11 þ18b12 b10 þ72b11 64b11 b10 þ3b10 þ36 b10 d b12 þ b11 : ð5:10Þ

0.000

MSE (Skew)

Bias (Skew)

b3 has variance V3 n1 þ Oðn2 Þ, where Also t   Z 4 2 2 2 V3 ¼ l2 l3 4b22 4b21 þ b20 16b11 þ 8b11 b10 3b10 þ16 b11 db10  Z 3 2 2 2l2 l3 12b23 18b22 þ8b21 þ 8b20 72b12 b11 þ18b12 b10 þ72b11 64b11 b10 þ3b10 þ36 b10 dðb12 þ b11 Þ

−0.002

1.0 0.8 0.6

2

4

6 8 10 12 Square root of n

14

2

4

6 8 10 12 Square root of n

14

2

4

6 8 10 12 Square root of n

14

2

4

6 8 10 12 Square root of n

14

0.025

0.0020

MSE (L−Skew)

Bias (L−Skew)

1.2

0.0010

0.0000 2

4

6 8 10 12 Square root of n

0.015

0.005

14

MSE (L−Kurt)

Bias (L−Kurt)

0.000

−0.006

−0.012 2

4

6 8 10 12 Square root of n

14

0.006

0.002

Fig. 6.3. Biases and mean squared errors for the Student’s t case with four degrees of freedom. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3848

C.S. Withers, S. Nadarajah / Journal of Statistical Planning and Inference 141 (2011) 3839–3861

  Z Z Z 2 2 2 36 b24 9b12 þ12 b12 db11 2b23 þ 12b12 b11 4b11 6 b10 db12 þ b22 4b11 þ4 b11 db10   Z 2 2 þ 12 b22 3b12 b10 þ 2 b10 db11 b21 þ 2b11 b10 b10 =2 þ b20 b10 : 2

þ l2



Note that l1 ¼ b10 , l2 ¼ 2b11 b10 and l3 ¼ 6b12 6b11 þ b10 . Example 5.2. Suppose that r ¼4. Then S 12 ¼ 40a31 20a30 60a21 þ30a20 þ 24a11 14a10 þ a00 : So, by Theorem B.1, for F continuous, S 12 ¼ 40b24 20b33 60b23 þ 54b22 14b21 þ b20 þ 10ð8g3 þ 9g2 Þð4g1 g0 Þ þ 8ð27g21 þ 11g1 g0 g20 Þ Z Z þ 240 ðb11 b10 Þ db12 þ 4 ð40b13 9b11 Þ db10 : In terms of Ui of (5.8), S 2 ¼ 20U3 30U2 þ 12U1 U0 ¼ 12ð2g0 17g1 þ 35g2 20g3 Þ: So, by (2.2), an estimator of TðFÞ ¼ t4 of bias Oðn2 Þ is TðFb Þ þ S1 ðFb Þ=ðn1Þ,

ð5:11Þ

where 2

1

MSE (Skew)

Bias (Skew)

S1 ðFÞ ¼ 2l2 l4 ðb10 b11 Þ6l2 ð2b10 17b11 þ 35b12 20b13 Þ   Z 3 2 2 l2 l4 4b22 4b21 þ b20 16b11 þ 8b11 b10 3b10 þ16 b11 db10

0.003

−0.001 12

−1.2 4

6 8 10 Square root of n

12

6 8 10 Square root of n

12

−0.0030 2

4

6 8 10 Square root of n

12

14

4

6 8 10 Square root of n

12

14

2

4

6 8 10 Square root of n

12

14

2

4

6 8 10 Square root of n

12

14

2

4

6 8 10 Square root of n

12

14

1.0

0.030

0.005

14 MSE (L−Kurt)

4

2

2.5

14

−8e−04 2

0.1

14

MSE (L−Skew)

Bias (L−Skew)

6 8 10 Square root of n

−0.2

2

Bias (L−Kurt)

4

MSE (Kurt)

Bias (Kurt)

2

0.4

0.012

0.002

Fig. 6.4. Biases and mean squared errors for the standard logistic case. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

C.S. Withers, S. Nadarajah / Journal of Statistical Planning and Inference 141 (2011) 3839–3861

3849

n 2 2 40b24 20b33 60b23 þ54b22 14b21 þ b20 þ 10ð8b13 þ9b12 Þð4b11 b10 Þ þ 8ð27b11 þ 11b11 b10 b10 Þ Z Z ð5:12Þ þ 240 ðb11 b10 Þ db12 þ 4 ð40b13 9b11 Þ db10 : 2

þ l2

b4 has variance V4 n1 þ Oðn2 Þ, where Also t   Z 4 2 2 2 V4 ¼ l2 l4 4b22 4b21 þ b20 16b11 þ 8b11 b10 3b10 þ16 b11 db10  3 2l2 l4 40b24 20b33 60b23 þ 54b22 14b21 þ b20 2

2

þ10ð8b13 þ 9b12 Þð4b11 b10 Þ þ 8ð27b11 þ 11b11 b10 b10 Þ þ240 2

þ l2

Z

ðb11 b10 Þ db12 þ 4

Z

ð40b13 9b11 Þ db10



   Z Z Z 2 2 100 4b26 64b13 þ 96 b13 db12 12b25 þ 144b12 b13 96 b13 db11 þ9b24 135b12 þ 108 b12 db11

  Z Z Z 2 þ240 2b24 16b11 b13 þ 12 b11 db12 þ 8 b13 db10 3b23 þ 18b12 b11 6b11 9 b10 db12     Z Z 2 þ144 b22 4b11 þ4 b11 db10 40 b33 4b13 b10 þ3 b10 db12   Z 2 2 þ60 b22 3b12 b10 þ 2 b10 db11 24ðb21 2b11 b10 þ b10 =2Þ þ b20 b10 :

MSE (Skew)

Bias (Skew)

Note that l1 ¼ b10 , l2 ¼ 2b11 b10 , l3 ¼ 6b12 6b11 þ b10 and l4 ¼ 20b13 30b12 þ 12b11 b10 .

−0.002 6 8 10 Square root of n

12

14

−0.5

−2.5 4

6 8 10 Square root of n

12

0.0005

−0.0030 4

6 8 10 Square root of n

12

0.000

−0.008 2

4

6 8 10 Square root of n

12

14

4

6 8 10 Square root of n

12

14

2

4

6 8 10 Square root of n

12

14

2

4

6 8 10 Square root of n

12

14

2

4

6 8 10 Square root of n

12

14

7 5 3

0.030

0.005

14 MSE (L−Kurt)

Bias (L−Kurt)

2

2

9

14 MSE (L−Skew)

2 Bias (L−Skew)

4

0.4

MSE (Kurt)

Bias (Kurt)

2

0.8

0.020

0.005

Fig. 6.5. Biases and mean squared errors for the standard Laplace case. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3850

C.S. Withers, S. Nadarajah / Journal of Statistical Planning and Inference 141 (2011) 3839–3861

6. A simulation study Here, we perform a simulation study to compare the second-order bias-reduced estimators for skewness, kurtosis, L-skewness and L-kurtosis with those obtained by using unbiased estimators of the numerator and denominator in (1.2) and (1.3). We use two criteria to compare performance: the bias and the mean squared error. These were calculated by simulating 10,000 samples each of size n, n ¼ 4,5, . . . ,200 from the following distributions: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.

standard normal distribution (b3 ¼ 0, b4 ¼ 3, t3 ¼ 0, t4 ¼ 0:1226Þ; Student’s t distribution with two degrees of freedom (b3 undefined, b4 undefined, t3 ¼ 0, t4 ¼ 0:375Þ; Student’s t distribution with four degrees of freedom (b3 ¼ 0, b4 undefined, t3 ¼ 0, t4 ¼ 0:2168Þ; standard logistic distribution (b3 ¼ 0, b4 ¼ 21=5, t3 ¼ 0, t4 ¼ 0:1667Þ; standard Laplace distribution (b3 ¼ 0, b4 ¼ 6, t3 ¼ 0, t4 ¼ 0:2357Þ; uniform ð1,1Þ distribution (b3 ¼ 0, b4 ¼ 9=5, t3 ¼ 0, t4 ¼ 0Þ; standard exponential distribution (b3 ¼ 2, b4 ¼ 9, t3 ¼ 1=3, t4 ¼ 0:1667Þ; standard Gumbel distribution (b3 ¼ 1:14, b4 ¼ 27=5, t3 ¼ 0:1699, tp ¼ 0:1504Þ; 4 ffiffiffi gamma distribution with shape parameter 2 and unit p scale (b3 ¼ffi 2, b4 ¼ 6, t3 ¼ 0:2346, t4 ¼ 0:1416Þ; ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi standard log-normal distribution (b3 ¼ fexpð2Þ þ2g expð1Þ1, b4 ¼ expð4Þ þ 2 expð3Þ þ 3 expð2Þ3, t4 ¼ 0:2931Þ.

t3 ¼ 0:4625,

Of these 10 distributions, nine are nonnormal and four are asymmetric. Two of the 10 distributions have heavy tails (the two Student’s t distributions). Seven of the 10 distributions have light tails (the normal, logistic, Laplace, exponential, Gumbel, gamma and the log-normal distributions). To compute the second-order bias-reduced estimates for skewness and kurtosis and their biases and mean squared errors, we use the following algorithm:

MSE (Skew)

Bias (Skew)

1. simulate a sample of size n from the chosen distribution; 2. using (1.2), estimate b3 , b4 , b5 and b6 for the sample generated in step 1;

−5e−04 6 8 10 Square root of n

12

0.00 6 8 10 Square root of n

12

−2e−04 4

6 8 10 Square root of n

12

0.008

0.000 2

4

6 8 10 Square root of n

12

14

4

6 8 10 Square root of n

12

14

2

4

6 8 10 Square root of n

12

14

2

4

6 8 10 Square root of n

12

14

2

4

6 8 10 Square root of n

12

14

0.02

0.005

14 MSE (L−Kurt)

Bias (L−Kurt)

2

2 0.10

14 MSE (L−Skew)

4

0.05

14

0.08

2 Bias (L−Skew)

4

MSE (Kurt)

Bias (Kurt)

2

0.20

0.0025

0.0005

Fig. 6.6. Biases and mean squared errors for the uniform ( 1,1) case. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

C.S. Withers, S. Nadarajah / Journal of Statistical Planning and Inference 141 (2011) 3839–3861

3851

3. calculate S1 ðFb Þ in (3.2) and S1 ðFb Þ in (3.3); b þS ðFb Þ=ðn1Þ for S ðFb Þ in (3.2) and b b þ S ðFb Þ=ðn1Þ for S ðFb Þ in (3.3); 4. calculate b 1 1 1 1 3 4 5. repeat steps 1–4 10,000 times. To compute the second-order bias-reduced estimates for L-skewness and L-kurtosis and their biases and mean squared errors, we use the following algorithm: 1. simulate a sample of size n from the chosen distribution; 2. using (5.4), estimate b10 , b11 , b12 , b22 , b21 , b20 , b23 , b24 , b13 and b33 for the sample generated in step 1; 3. use the estimates in step 2 and the facts l2 ¼ 2b11 b10 , l3 ¼ 6b12 6b11 þ b10 , l4 ¼ 20b13 30b12 þ12b11 b10 to estimate l2 , l3 and l4 ; 4. use the estimates in step 3 and the facts t3 ¼ l3 =l2 , t4 ¼ l4 =l2 to estimate t3 and t4 ; R R R R R 5. using (5.6) and (5.7), estimate b11 db10 , b11 db12 , b10 db12 , b13 db10 and b10 db11 for the sample generated in step 1; 6. calculate S1 ðFb Þ in (5.10) and S1 ðFb Þ in (5.12); 7. calculate (5.9) and (5.11); 8. repeat steps 1–7 10,000 times.

−0.2

MSE (Skew)

Bias (Skew)

These algorithms can be implemented in most platforms. We implemented the algorithms using the statistical software package R. The computer code can be obtained from the second author, email: [email protected] The biases and mean squared errors for skewness, kurtosis, L-skewness and L-kurtosis for the 10 distributions are plotted in Figs. 6.1–6.10. The actual values plotted are the lowess (Cleveland, 1979, 1981) smoothed versions versus the square root of n for n ¼ 4,5, . . . ,200. While lowess smoothing, we used the default options. These are a smoothing span of 2/3, three ‘‘robustifying’’ iterations and the speed of computations determined by 0.01th of the range of the n values. Each plot in Figs. 6.1–6.10 contains two curves. The curves in red show the biases and mean squared errors for those estimates obtained by using unbiased estimators. The curves in black show the biases and mean squared errors for the bias-reduced estimators. The broken line in black in some of the plots corresponds to the bias being zero.

−0.8 12

−3 −6 4

6 8 10 Square root of n

12

−0.015 4

6 8 10 Square root of n

12

MSE (L−Kurt)

−0.010 2

4

6 8 10 Square root of n

12

14

4

6 8 10 Square root of n

12

14

2

4

6 8 10 Square root of n

12

14

2

4

6 8 10 Square root of n

12

14

2

4

6 8 10 Square root of n

12

14

25 15

0.005

14

0.000

2 35

14

0.000

2

0.2

14

MSE (L−Skew)

Bias (L−Skew)

6 8 10 Square root of n

0

2

Bias (L−Kurt)

4

MSE (Kurt)

Bias (Kurt)

2

1.0

0.005

Fig. 6.7. Biases and mean squared errors for the standard exponential case. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3852

C.S. Withers, S. Nadarajah / Journal of Statistical Planning and Inference 141 (2011) 3839–3861

Skewness and kurtosis are not defined for the Student’s t distribution with tow degrees of freedom. Also kurtosis is not defined for the Student’s t distribution with four degrees of freedom. So, the corresponding estimates are not shown. We now present several observations – some of which are evident from the figures – and others determined by empirical investigations not detailed here. The bias-reduced estimators outperform those based on unbiased estimators for all values of n and for the 10 distributions considered. The bias is consistently smaller and often much smaller for the bias-reduced estimators. The biases reduce by a factor of approximately 1=n for all of the 10 distributions as expected by the theory. The mean squared errors appear larger for the bias-reduced estimators, but the difference is only slight. In most of the plots, this amount appears to diminish as n increases. The estimates for L-skewness and L-kurtosis are generally superior to those for skewness and kurtosis. They have generally smaller and often much smaller biases and mean squared errors. Many values for biases and mean squared errors appear unacceptably high. If we take ‘‘unacceptably high’’ as being j bias j 41 or mean squared error 4 1 then we can observe the following:

 The mean squared errors for the skewness estimate appear unacceptably high for the exponential distribution for small n.  The mean squared errors for the skewness estimate appear unacceptably high for the Student’s t distribution with four degrees of freedom for all sufficiently large n.

 Both the biases and mean squared errors for the skewness estimate appear unacceptably high for all n for the lognormal distribution.

 The biases for the kurtosis estimate appear unacceptably high for small n for the logistic, Laplace, and the Gumbel distributions.  The mean squared errors for the kurtosis estimate appear unacceptably high for all n for the logistic, Laplace and the Gumbel distributions.

 Both the biases and mean squared errors for the kurtosis estimate appear unacceptably high for all n for the exponential, gamma and the log-normal distributions.

Fig. 6.8. Biases and mean squared errors for the standard Gumbel case. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 6.9. Biases and mean squared errors for the gamma case. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Both the bias and mean squared error generally decrease with increasing n for all of the 10 distributions. The normal and uniform distributions generally take the lowest values for the bias and mean squared error. The Student's t distribution with two degrees of freedom and the log-normal distribution generally take the highest values for the bias and mean squared error. Among the six symmetric distributions, the highest values for the bias and mean squared error are obtained by the:

- Student's t distribution with four degrees of freedom for skewness;
- Laplace distribution for kurtosis;
- Student's t distribution with two degrees of freedom for L-skewness and L-kurtosis.

Among the four asymmetric distributions, the:

- Gumbel and gamma distributions generally take the lowest values for the bias and mean squared error;
- log-normal distribution generally takes the highest values for the bias and mean squared error.

Among the two Student's t distributions, the one with four degrees of freedom takes the higher values for the bias and mean squared error. For all of the distributions, excluding the two heavy tailed ones,

- the biases and mean squared errors for skewness are generally smaller than those for kurtosis;
- the biases and mean squared errors for L-skewness are generally larger than those for L-kurtosis;
- among the biases and mean squared errors, those for kurtosis and L-kurtosis generally take the largest and smallest values, respectively.

Finally, for the two heavy tailed distributions, the

- biases for L-skewness are generally smaller than those for L-kurtosis;
- mean squared errors for L-skewness are generally larger than those for L-kurtosis.
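For reference, the L-skewness and L-kurtosis estimates compared above are ratios of sample L-moments, computed from Hosking's (1990) unbiased probability-weighted moment estimators b_r. The following is a minimal sketch of that standard computation (our own illustration; it is the uncorrected estimator, not the bias-reduced version proposed in this paper).

```python
def l_moment_ratios(data):
    # Unbiased PWM estimates b_0..b_3 (Hosking, 1990), then sample L-moments
    # l_1..l_4 and the ratios t3 = l3/l2 (L-skewness), t4 = l4/l2 (L-kurtosis).
    x = sorted(data)
    n = len(x)
    b = [0.0] * 4
    for i, v in enumerate(x, start=1):
        b[0] += v
        b[1] += v * (i - 1) / (n - 1)
        b[2] += v * (i - 1) * (i - 2) / ((n - 1) * (n - 2))
        b[3] += v * (i - 1) * (i - 2) * (i - 3) / ((n - 1) * (n - 2) * (n - 3))
    b = [bj / n for bj in b]
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2, l3 / l2, l4 / l2

# For symmetric, evenly spaced data the sample L-skewness and L-kurtosis vanish.
print(l_moment_ratios([1.0, 2.0, 3.0, 4.0]))
```

For the data 1, 2, 3, 4 this gives l1 = 2.5, l2 = 5/6 and both ratios equal to zero, as symmetry dictates.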


Fig. 6.10. Biases and mean squared errors for the standard log-normal case. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

In the case of the Student's t distribution with two degrees of freedom, the biases for L-kurtosis are the largest among all biases and the mean squared errors for L-skewness are the largest among all mean squared errors.

7. Conclusions

We have proposed bias-reduced estimators for skewness, kurtosis, L-skewness and L-kurtosis. We have provided algorithms for computing the bias-reduced estimators. These algorithms can be implemented on most platforms. The proposed estimators for skewness and kurtosis are infinitely more efficient than pth-order bootstrapped estimators if p > 1. The proposed estimators for L-skewness and L-kurtosis are infinitely more efficient than pth-order bootstrapped estimators if p > 2. Simulation studies show that the proposed estimators produce much smaller biases compared with those based on unbiased estimators. The mean squared errors for the proposed estimators are only slightly larger. However, the mean squared errors for the proposed and unbiased-based estimators appear indistinguishable when n is sufficiently large. The work of this paper can be extended in several ways: (1) extend the results of Sections 3-5 to obtain higher-order corrections for skewness, kurtosis, L-skewness and L-kurtosis; (2) provide extensions for multivariate measures of skewness and kurtosis; (3) provide real data applications. We hope to address some of these issues in a future paper.

Acknowledgments

The authors would like to thank the Executive Editor and the three referees for carefully reading the paper and for their comments which greatly improved the paper.

Appendix A

Theorem A.1 derives b_r(1^2), b_r(1^3) and b_r(1^2 1^2), needed for Theorem 3.1 to estimate β_r of (1.2) with bias O(n^{-3}).


Theorem A.1. We have

b_r(1^2) = -r β_{r+2} + (r/2)(r/2+1)(β_4 - 1) β_r + r^2 β_{r-1} β_3 + r(r-1) β_{r-2},   (A.1)

b_r(1^3) = Σ_{i=-3}^{4} a_{r,r+i} β_{r+i}   (A.2)

and

b_r(1^2 1^2) = Σ_{i=-4}^{2} b_{r,r+i} β_{r+i},   (A.3)

where

a_{r,r+4} = 3r(r+2)/4,   a_{r,r+3} = 0,   a_{r,r+2} = 3r^2/2,   a_{r,r+1} = 0,

a_{r,r} = (r/2)(r/2+1)(r/2+2) β_6 - 3(r/2)(r/2+1)(r/2+5)(β_4 - 1) - r(r^2 + 44)/8,

a_{r,r-1} = 3r^2 {(r+2) β_5 + 2(r-2) β_3}/4,

a_{r,r-2} = 3r(r-1)(r β_4 + r - 2)/2,

a_{r,r-3} = r(r-1)(r-2) β_3

and

b_{r,r+2} = r(r+2){(r+4) β_4 - r + 8}/2,   b_{r,r+1} = 0,

b_{r,r} = (r/2)(r/2+1)(r/2+2)(r/2+3)(β_4 - 1)^2 - 4r(r+1)(r+2)(β_4 - 1) + 3r(6 - r),

b_{r,r-1} = r^2 (r+2){(r+4)(β_4 - 1) + 12} β_3/2,

b_{r,r-2} = r(r-1){r(3r+14) β_4 - 3r^2 - 10r + 16}/2,

b_{r,r-3} = 2r^2 (r-1)(r-2) β_3,

b_{r,r-4} = r(r-1)(r-2)(r-3).

Appendix B

Set γ_r = β_{1r} = E[X F(X)^r]. Theorem B.1 gives the terms {a_{ij} = ∫ g_{ix} g_{jx} dF(x)} needed for Theorem 5.1.

Theorem B.1. Suppose F is continuous. In terms of ∫ β_{jr}(y) dβ_{is}(y), where

β_{jr}(y) = ∫_{-∞}^{y} x^j F(x)^r dF(x),

we have

a_{00} = β_{20} - γ_0^2,

a_{10} = β_{21} - 2γ_1 γ_0 + α_1,

a_{11} = β_{22} - 4γ_1^2 + α_2,

a_{20} = β_{22} - 3γ_2 γ_0 + 2α_5,

a_{21} = β_{23} - 6γ_2 γ_1 + 2γ_1^2 + 3 ∫ β_{12} dβ_{10},

a_{22} = β_{24} - 9γ_2^2 + 12 ∫ β_{12} dβ_{11},

a_{30} = β_{23} - 4γ_3 γ_0 + 3α_{11},

a_{31} = β_{24} - 8γ_3 γ_1 + 6 ∫ β_{11} dβ_{12} + 4 ∫ β_{13} dβ_{10},

a_{32} = β_{25} + 9γ_2^2/2 - 12γ_2 γ_3 + 8 ∫ β_{13} dβ_{11},

a_{33} = β_{26} - 16γ_3^2 + 24 ∫ β_{13} dβ_{12},

where

α_1 = γ_0^2/2,   α_2 = 4 ∫ β_{11} dβ_{10},   α_5 = ∫ β_{10} dβ_{11},   α_{11} = ∫ β_{10} dβ_{12}.

Appendix C

Proof of Theorem 2.1. The proof is immediate from (2.1) and the explicit expressions for {T_i(F), S_i(F)} given in Withers and Nadarajah (2010). Note (2.6) and (2.5) follow by Eqs. (A.8) and (A.16) of Withers and Nadarajah (2008). □

Proof of Theorem 3.1. For T(F) = β_r(F), explicit expressions for T(1^2), T(1^3) and T(1^2 1^2) are given by Theorem A.1. So, (3.1) follows from (2.2) and (2.3). Let μ_{rF}(·) denote the derivative of μ_r(F) = μ_r, the r-th central moment of X ~ F, with respect to the arguments in (·). For a detailed definition, see Example 5.6 of Withers and Nadarajah (2010). For T(F) = β_r(F),

T_F(x) = μ_{rF}(x) μ_2^{-r/2} - (r/2) β_r μ_2^{-1} μ_{2F}(x).

Now

∫ μ_{rF}(x)^2 dF(x) = μ_{2r} - μ_r^2 - 2r μ_{r-1} μ_{r+1} + r^2 μ_{r-1}^2 μ_2

and

∫ μ_{rF}(x) μ_{2F}(x) dF(x) = μ_{r+2} - μ_r μ_2 - r μ_{r-1} μ_3.

So, by (2.4), β_r(F̂) has variance n^{-1} v_r + O(n^{-2}), where

v_r = μ_2^{-r} ∫ μ_{rF}(x)^2 dF(x) - r β_r μ_2^{-r/2-1} ∫ μ_{rF}(x) μ_{2F}(x) dF(x) + (r^2/4) β_r^2 μ_2^{-2} ∫ μ_{2F}(x)^2 dF(x)

reduces to the expression given by the theorem. □
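As a check on the variance expansion in the proof above (our own illustration), the formula for v_r can be evaluated at the central moments of the standard normal distribution, for which μ_2 = 1, μ_4 = 3, μ_6 = 15 and μ_8 = 105. It recovers the classical asymptotic variances 6/n for sample skewness (r = 3) and 24/n for sample kurtosis (r = 4).

```python
def v_r(r, mu):
    # Asymptotic variance coefficient v_r for beta_r(Fhat); mu[k] is the
    # k-th central moment mu_k (so mu[0] = 1, mu[1] = 0).
    m2 = mu[2]
    beta = lambda k: mu[k] / m2 ** (k / 2)
    # The three integrals appearing in v_r, written out in central moments.
    int_rr = (mu[2 * r] - mu[r] ** 2 - 2 * r * mu[r - 1] * mu[r + 1]
              + r ** 2 * mu[r - 1] ** 2 * m2)
    int_r2 = mu[r + 2] - mu[r] * m2 - r * mu[r - 1] * mu[3]
    int_22 = mu[4] - m2 ** 2
    return (m2 ** -r * int_rr
            - r * beta(r) * m2 ** (-r / 2 - 1) * int_r2
            + r ** 2 / 4 * beta(r) ** 2 * m2 ** -2 * int_22)

# Central moments mu_0..mu_8 of N(0,1): mu_{2k} = (2k-1)!!, odd moments zero.
normal = [1, 0, 1, 0, 3, 0, 15, 0, 105]
print(v_r(3, normal), v_r(4, normal))  # 6.0 and 24.0
```

The values 6 and 24 are the well-known leading-order variances of the sample skewness and kurtosis under normality, so the reconstruction of v_r above is consistent on this case.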

Proof of Theorem 4.1. Set h_i(x) = h_{iF}(x) = a_i(x, F(x)). Differentiating (4.1) and simplifying, we obtain

T_x = h_0(x) - T(F) + ∫ h_1 F_x,

T_{xy} = Σ^2_{xy} {h_1(x) F(x)_y - ∫ h_1 F_y} + ∫ h_2 F_x F_y,

where

Σ^s_{x_1 ⋯ x_s} p(x_1, …, x_s) = p(x_1, …, x_s) + p(x_2, …, x_s, x_1) + ⋯ + p(x_s, x_1, …, x_{s-1}),

∫ h_1 F_y = ∫ h_1(z) F(z)_y dF(z),   ∫ h_2 F_x F_y = ∫ h_2(z) F(z)_x F(z)_y dF(z),

F(y)_x = I(x ≤ y) - F(y), the first derivative of F(y), and I(A) = 1 or 0 for A true or false. More generally, one obtains

T_{x_1 ⋯ x_s} = Σ^s_{x_1 ⋯ x_s} {h_{s-1}(x_1) F(x_1)_{x_2} ⋯ F(x_1)_{x_s} - ∫ h_{s-1} F_{x_2} ⋯ F_{x_s}} + ∫ h_s F_{x_1} ⋯ F_{x_s}.   (C.1)

So, the general derivative of the probability-weighted moment, (4.1), does not involve delta-functions, despite the discontinuity in F(x)_y. Set

T(1^s) = ∫ T_F(x^s) dF(x) = ∫ T_{x ⋯ x} dF(x) with s arguments.

By (C.1),

T(1^s) = ∫ (s h_{s-1} b_s + h_s M_s) dF,

where

M_s(y) = ∫ F(y)_x^s dF(x) = F(1-F)^s + (1-F)(-F)^s = (F - F^2){(1-F)^{s-1} - (-F)^{s-1}} at F = F(y),

and

b_s(y) = (1-F)^{s-1} - M_{s-1}(y) = (1-F){(1-F)^{s-1} - (-F)^{s-1}} at F = F(y).

So,

T(1^2) = ∫ (1-F)(2h_1 + F h_2),

T(1^3) = ∫ (1-F)(1-2F)(3h_2 + F h_3),

T(1^2 1^2) = ∫ F(1-F)^2 (4h_3 + F h_4),

T(1^4) = ∫ (1-F){4(1 - 3F + 3F^2) h_3 + F(1 - 2F + 2F^2) h_4},

T(1^3 1^2) = ∫ (1-F)^2 (1-2F)(5h_4 + F h_5),

T(1^2 1^2 1^2) = ∫ F^2 (1-F)^3 (6h_5 + F h_6).

So, we can write

S_i(F) = Σ_{j=i}^{2i} ∫ s_{ij}(F) h_j

with the s_{ij}(F) given by the statement of the theorem. So, (4.2) follows. By (2.4), T(a, F̂) has the stated asymptotic variance, where

c_{00} = ∫ h_0^2,

c_{01} = ∫ h_0(x) {∫ h_1 F_x} dF(x) = ∫∫ a(x, F(x)) a_1(y, F(y)) F(y)_x dF(y) dF(x),

b(x,y) = a_1(x,y) y,

c_{11} = ∫ {∫ h_1 F_x}^2 dF(x) = ∫∫∫ a_1(y, F(y)) F(y)_x a_1(z, F(z)) F(z)_x dF(y) dF(z) dF(x).

The stated expressions for c_{00}, c_{01}, b(x,y), c_{11} and others follow by simplification and using the fact that

∫ F(y)_x F(z)_x dF(x) = F(y ∧ z) - F(y) F(z).

For F continuous, use the fact

∫∫ b(x) b(y) F(x ∧ y) = 2 ∫∫_{x<y} b(x) b(y) F(x)

to simplify d_{11}. □

Proof of Theorem 5.1. The proof is an immediate application of Theorem 2.1. To prove (5.5), note, by (5.1), that β_{jr} = β_{jr}(∞), and that, for F continuous,

∫ β_{jr} dβ_{is} = ∫∫_{x<y} x^j F(x)^r y^i F(y)^s dF(x) dF(y) = β_{jr} β_{is} - ∫ β_{is} dβ_{jr}. □
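The integration-by-parts identity used in this proof can be verified numerically. The sketch below (our own illustration) takes F uniform on (0, 1), for which β_{jr}(y) = y^{j+r+1}/(j+r+1) in closed form, and checks ∫ β_{jr} dβ_{is} = β_{jr} β_{is} - ∫ β_{is} dβ_{jr} by the midpoint rule; for j = 1, r = 2, i = s = 1 both sides equal 1/28.

```python
def beta_jr(y, j, r):
    # beta_{jr}(y) = integral_0^y x^j F(x)^r dF(x); for F = Uniform(0,1) this
    # is integral_0^y x^(j+r) dx = y^(j+r+1) / (j+r+1), in closed form.
    return y ** (j + r + 1) / (j + r + 1)

def int_beta_dbeta(j, r, i, s, grid=20000):
    # integral beta_{jr} dbeta_{is} = integral_0^1 beta_{jr}(y) y^i F(y)^s dF(y);
    # for the uniform case the integrand is beta_{jr}(y) * y^(i+s) (midpoint rule).
    h = 1.0 / grid
    return h * sum(beta_jr((k + 0.5) * h, j, r) * ((k + 0.5) * h) ** (i + s)
                   for k in range(grid))

j, r, i, s = 1, 2, 1, 1
lhs = int_beta_dbeta(j, r, i, s)
rhs = beta_jr(1.0, j, r) * beta_jr(1.0, i, s) - int_beta_dbeta(i, s, j, r)
print(lhs, rhs)  # both sides agree, equal to 1/28 for these indices
```

The agreement of the two sides illustrates that the identity is just the splitting of the double integral over {x < y} and {y < x} for a continuous F.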

Proof of Theorem A.1. By Appendix A of Withers and Nadarajah (2008), for T(F) = g(S(F)) with S(F) a 2 × 1 vector,

T(1^2) = g_{ij} S̄_{ij}(1,1) + g_i S̄_i(1^2),   (C.2)

T(1^3) = g_{ijk} S̄_{ijk}(1,1,1) + 3 g_{ij} S̄_{ij}(1,1^2) + g_i S̄_i(1^3),   (C.3)

T(1^2 1^2) = g_{ijkl} S̄_{ij}(1,1) S̄_{kl}(1,1) + 6 g_{ijk} S̄_{ij}(1,1) S̄_k(1^2) + g_{ij} {4 S̄_{ij}(a, ab^2) + S̄_i(1^2) S̄_j(1^2) + 2 S̄_{ij}(ab, ab)} + g_i S̄_i(1^2 1^2),   (C.4)

where repeated indices i, j, k in each term are implicitly summed over 1 and 2 (for example, g_i S̄_i(1^2) = g_1 S̄_1(1^2) + g_2 S̄_2(1^2)), g_{ij⋯} = (∂/∂s_i)(∂/∂s_j) ⋯ g(s) at s = S(F),

S̄_{ij⋯}(1^I, 1^J, …) = ∫ S_{iF}(x^I) S_{jF}(x^J) ⋯ dF(x),

S̄_{ij}(a^I b^J, a^K b^L) = ∫∫ S_{iF}(x^I y^J) S_{jF}(x^K y^L) dF(x) dF(y),

where (x^I) = (x, …, x) with I arguments. To apply this to S(F) = (μ_2, μ_r), g(S(F)) = μ_2^{-r/2} μ_r = β_r, one uses Example 5.3 of Withers and Nadarajah (2008):

μ_{rF}(x) = -r μ_{r-1} m_x + m_{rx} - μ_r,

μ_{rF}(x,y) = r(r-1) μ_{r-2} m_x m_y - r Σ^2_{x,y} (m_{r-1,x} - μ_{r-1}) m_y,   Σ^2_{x,y} f_{xy} = f_{xy} + f_{yx},

μ_{rF}(x,y,z) = -r(r-1)(r-2) μ_{r-3} m_x m_y m_z + r(r-1) Σ^3_{xyz} (m_{r-2,x} - μ_{r-2}) m_y m_z,   Σ^3_{xyz} f_{xyz} = f_{xyz} + f_{yzx} + f_{zxy},

where m_x = m_F(x) = x - μ and m_{rx} = (x - μ)^r. In particular, μ_{2F}(x) = m_{2x} - μ_2, μ_{2F}(x,y) = -2 m_x m_y and μ_{2F}(x,y,z) = 0. The nonzero derivatives up to order four are

g_1 = -(r/2) β_r μ_2^{-1},   g_2 = μ_2^{-r/2},

g_{11} = (r/2)(r/2+1) β_r μ_2^{-2},   g_{12} = -(r/2) μ_2^{-r/2-1},

g_{111} = -(r/2)(r/2+1)(r/2+2) β_r μ_2^{-3},   g_{112} = (r/2)(r/2+1) μ_2^{-r/2-2},

g_{1111} = (r/2)(r/2+1)(r/2+2)(r/2+3) β_r μ_2^{-4},   g_{1112} = -(r/2)(r/2+1)(r/2+2) μ_2^{-r/2-3}.

Also

S̄_1(1^2) = ∫ μ_{2F}(x,x) dF(x) = -2μ_2,

S̄_2(1^2) = ∫ μ_{rF}(x,x) dF(x) = r(r-1) μ_{r-2} μ_2 - 2r μ_r,

S̄_{11}(1,1) = ∫ μ_{2F}(x)^2 dF(x) = μ_4 - μ_2^2,

S̄_{12}(1,1) = ∫ μ_{2F}(x) μ_{rF}(x) dF(x) = μ_{r+2} - μ_r μ_2 - r μ_{r-1} μ_3.

So, by (C.2),

b_r(1^2) = g_1 S̄_1(1^2) + g_2 S̄_2(1^2) + g_{11} S̄_{11}(1,1) + 2 g_{12} S̄_{12}(1,1)

reduces to (A.1). Also

S̄_{111}(1,1,1) = ∫ μ_{2F}(x)^3 dF(x) = μ_6 - 3μ_4 μ_2 + 2μ_2^3,

S̄_{112}(1,1,1) = μ_{r+4} - μ_r μ_4 - r μ_{r-1} μ_5 - 2μ_2 (μ_{r+2} - μ_r μ_2 - r μ_{r-1} μ_3),

S̄_{11}(1,1^2) = ∫ μ_{2F}(x) μ_{2F}(x,x) dF(x) = -2(μ_4 - μ_2^2),

S̄_{12}(1,1^2) = r(r-1) μ_{r-2} μ_4 - 2r(μ_{r+2} - μ_{r-1} μ_3) - μ_2 {r(r-1) μ_{r-2} μ_2 - 2r μ_r},

S̄_{21}(1,1^2) = -2(μ_{r+2} - μ_r μ_2 - r μ_{r-1} μ_3),

S̄_1(1^3) = 0,

S̄_2(1^3) = -r(r-1)(r-2) μ_{r-3} μ_3 + 3r(r-1)(μ_r - μ_{r-2} μ_2),

S̄_{11}(ab, ab) = 4μ_2^2,   S̄_{11}(a, ab^2) = 0,

S̄_{12}(a, ab^2) = -r(r-1)(r-2) μ_{r-3} μ_3 μ_2,   S̄_{21}(a, ab^2) = 0,

S̄_{12}(ab, ab) = -2r(r-1) μ_{r-2} μ_4 + 4r μ_r μ_2,

S̄_1(1^2 1^2) = 0,

S̄_2(1^2 1^2) = r(r-1)(r-2)(r-3) μ_{r-4} μ_2^2 - 4r(r-1)(r-2) μ_{r-2} μ_2.

So, by (C.3),

T(1^3) = g_{111} S̄_{111}(1,1,1) + 3 g_{112} S̄_{112}(1,1,1) + 3 g_{11} S̄_{11}(1,1^2) + 3 g_{12} {S̄_{12}(1,1^2) + S̄_{21}(1,1^2)} + g_1 S̄_1(1^3) + g_2 S̄_2(1^3),

giving (A.2). Also, by (C.4),

T(1^2 1^2) = g_{1111} S̄_{11}(1,1)^2 + 4 g_{1112} S̄_{11}(1,1) S̄_{12}(1,1) + 6 g_{111} S̄_{11}(1,1) S̄_1(1^2) + 6 g_{112} {S̄_{11}(1,1) S̄_2(1^2) + 2 S̄_{12}(1,1) S̄_1(1^2)} + g_{11} {4 S̄_{11}(a, ab^2) + S̄_1(1^2)^2 + 2 S̄_{11}(ab, ab)} + 2 g_{12} {2 S̄_{12}(a, ab^2) + 2 S̄_{21}(a, ab^2) + S̄_1(1^2) S̄_2(1^2) + 2 S̄_{12}(ab, ab)} + g_1 S̄_1(1^2 1^2) + g_2 S̄_2(1^2 1^2),

giving (A.3). □

Proof of Theorem B.1. We obtain a_{00} and a_{10} since

g_{0x} = x - γ_0,   g_{1x} = x F(x) - 2γ_1 + ∫_x^∞ y dF(y)

and

α_1 = ∫ x {∫_x^∞ y dF(y)} dF(x) = ∫∫_{x<y} xy = γ_0^2/2,

since, for F continuous,

∫∫_{x<y} a(x) a(y) = {∫ a(x) dF(x)}^2 / 2.

We obtain a_{11} since α_2 = 2α_3 + α_4, where

α_3 = ∫∫_{x<y} xy F(x) = ∫ β_{11} dβ_{10}

and

α_4 = ∫∫∫_{x<y, x<z} yz = ∫∫ yz F(min(y,z)) = 2 ∫∫_{y<z} yz F(y) = 2 ∫ β_{11} dβ_{10}.

We obtain a_{20} since

g_{2x} = x F(x)^2 - 3γ_2 + 2 ∫_x^∞ y F(y) dF(y)

and

α_5 = ∫∫_{x<y} xy F(y) = ∫ β_{10} dβ_{11}.

We obtain a_{21} since

a_{21} = β_{23} - 3γ_2 γ_1 + α_6 + α_7 - 3γ_2 α_8 + 2α_9,

where

α_6 = 2 ∫∫_{x<y} x F(x) y F(y) = γ_1^2,

α_7 = ∫∫_{x<y} x F(x)^2 y = ∫ β_{12} dβ_{10},

α_8 = ∫∫_{x<y} y = ∫ y F(y) dF(y) = γ_1,

α_9 = ∫∫∫_{x<y, x<z} y F(y) z = ∫∫ y F(y) z F(min(y,z)) = α_7 + α_{10},

α_{10} = ∫∫_{y>z} y F(y) z F(z) = γ_1^2/2.

We obtain a_{30} since

g_{3x} = x F(x)^3 - 4γ_3 + 3 ∫_x^∞ y F(y)^2 dF(y)

and

α_{11} = ∫∫_{x<y} xy F(y)^2 = ∫ β_{10} dβ_{12}.

We obtain a_{31} since a_{31} = β_{24} - 8γ_3 γ_1 + 3α_{12} + α_{13} + 3α_{14}, where

α_{12} = ∫∫_{x<y} x F(x) y F(y)^2 = ∫ β_{11} dβ_{12},

α_{13} = ∫∫_{x<y} x F(x)^3 y = ∫ β_{13} dβ_{10},

α_{14} = ∫∫∫_{x<y, x<z} y F(y)^2 z = ∫∫ y F(y)^2 z F(min(y,z)) = ∫∫_{y<z} y F(y)^3 z + ∫∫_{y>z} y F(y)^2 z F(z) = ∫ β_{13} dβ_{10} + ∫ β_{11} dβ_{12}. □

References

Alkasasbeh, M.R., Raqab, M.Z., 2009. Estimation of the generalized logistic distribution parameters: comparative study. Statistical Methodology 6, 262–279.
Cleveland, W.S., 1979. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74, 829–836.
Cleveland, W.S., 1981. LOWESS: a program for smoothing scatterplots by robust locally weighted regression. The American Statistician 35, 54.
David, H.A., Nagaraja, H.N., 2003. Order Statistics, third ed. John Wiley and Sons, Hoboken, New Jersey.
Delicado, P., Goria, M.N., 2008. A small sample comparison of maximum likelihood, moments and L-moments methods for the asymmetric exponential power distribution. Computational Statistics and Data Analysis 52, 1661–1673.
Hall, P., 1992. The Bootstrap and Edgeworth Expansion. Springer-Verlag, New York.
Hosking, J.R.M., 1990. L-moments: analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society B 52, 105–124.
Hosking, J.R.M., 1992. Moments or L-moments? An example comparing two measures of distributional shape. The American Statistician 46, 186–189.
Hosking, J.R.M., 2006. On the characterization of distributions by their L-moments. Journal of Statistical Planning and Inference 136, 193–198.
Hosking, J.R.M., 2007a. Some theory and practical uses of trimmed L-moments. Journal of Statistical Planning and Inference 137, 3024–3039.
Hosking, J.R.M., 2007b. Distributions with maximum entropy subject to constraints on their L-moments or expected order statistics. Journal of Statistical Planning and Inference 137, 2870–2891.
Hosking, J.R.M., Wallis, J.R., Wood, E.F., 1985. Estimation of the generalized extreme-value distribution by the method of probability-weighted moments. Technometrics 27, 251–261.
Jones, M.C., 2004. On some expressions for variance, covariance, skewness and L-moments. Journal of Statistical Planning and Inference 126, 97–106.
Pearson, C.P., 1993. Application of L-moments to maximum river flows. The New Zealand Statistician 28, 2–10.
Royston, P., 1992. Which measures of skewness and kurtosis are best? Statistics in Medicine 11, 333–343.
Serfling, R., Xiao, P., 2007. A contribution to multivariate L-moments: L-comoment matrices. Journal of Multivariate Analysis 98, 1765–1781.
Ulrych, T.J., Velis, D.R., Woodbury, A.D., Sacchi, M.D., 2000. L-moments and C-moments. Stochastic Environmental Research and Risk Assessment 14, 50–68.
Vogel, R.M., Fennessey, N.M., 1993. L-moment diagrams should replace product moment diagrams. Water Resources Research 29, 1745–1752.
Withers, C.S., Nadarajah, S., 2008. Analytic bias reduction for k-sample functionals. Sankhyā A 70, 186–222.
Withers, C.S., Nadarajah, S., 2010. Nonparametric estimates of low bias. Technical Report, Applied Mathematics Group, Industrial Research Ltd., Lower Hutt, New Zealand. Available online at http://arxiv.org/abs/1008.0127.