A ratio goodness-of-fit test for the Laplace distribution


Statistics and Probability Letters xx (xxxx) xxx–xxx

Contents lists available at ScienceDirect

Statistics and Probability Letters journal homepage: www.elsevier.com/locate/stapro


Elizabeth González-Estrada ∗, José A. Villaseñor ∗

Colegio de Postgraduados, Km. 36.5 Carr. México-Texcoco, Montecillo, 56230, Mexico

Article info

Article history: Received 24 March 2016; received in revised form 5 July 2016; accepted 5 July 2016; available online xxxx

Abstract

A test based on the ratio of the sample mean absolute deviation and the sample standard deviation is proposed for testing the Laplace distribution hypothesis. The asymptotic null distribution of this test statistic is found to be normal. The use of the Anderson–Darling test based on a data transformation is also discussed. © 2016 Elsevier B.V. All rights reserved.

Keywords: Data transformation; Test for exponentiality; Mean absolute deviation; Asymptotic distributions; Anderson–Darling test

1. Introduction

A random variable $X$ has the Laplace distribution, also called the double exponential distribution, with location and scale parameters $-\infty < \theta < \infty$ and $\beta > 0$, denoted by $X \sim L(\theta, \beta)$, if its cumulative distribution function (cdf) is given by

$$F_X(x) = \begin{cases} \dfrac{1}{2}\exp\left\{-\dfrac{\theta - x}{\beta}\right\}, & x \le \theta, \\[2mm] 1 - \dfrac{1}{2}\exp\left\{-\dfrac{x - \theta}{\beta}\right\}, & x \ge \theta. \end{cases}$$

The mean and variance of $X$ are $\theta$ and $\sigma_X^2 = 2\beta^2$. This distribution is closely related to the exponential distribution with cdf $F(z) = 1 - \exp\{-z/\beta\}$, where $z > 0$ and $\beta > 0$, which is denoted as $\mathrm{Exp}(\beta)$. In fact, if $X \sim L(\theta, \beta)$ then the random variable

$$Y^{(\theta)} = |X - \theta| \sim \mathrm{Exp}(\beta). \tag{1}$$

Some applications of the Laplace family of distributions are found in the areas of economics, finance, health sciences, hydrology, etc. (Kotz et al., 2001; Puig and Stephens, 2000), where it is used for modeling symmetric datasets.

In this paper we consider the problem of testing the null hypothesis $H_0$: a random sample $X_1, \ldots, X_n$ comes from a $L(\theta, \beta)$ distribution with unknown parameters. This topic has also been addressed by Puig and Stephens (2000), Meintanis (2004), Choi and Kim (2006), Best et al. (2008), Gel (2010) and Lafaye de Micheaux and Tran (2016), among others. Current references on goodness-of-fit are Torabi et al. (2016) and Roberts (2015).

For testing $H_0$ we propose a test based on the ratio of two estimators of the scale parameter $\beta$: the sample mean absolute deviation and $1/\sqrt{2}$ times the sample standard deviation. A similar approach has been used before by Geary (1936) and Gel et al. (2007) for testing normality. On the other hand, as a second test for $H_0$, using property (1) we also propose to transform $X_1, \ldots, X_n$ to approximately exponential random variables and then use the Anderson–Darling test for testing exponentiality.

Puig and Stephens (2000) considered the Anderson–Darling, Watson and Cramér–von Mises tests for testing $H_0$, which compare the empirical distribution function (EDF) to the Laplace cumulative distribution function. Their power studies indicate that, among the EDF tests, the Watson test is in general the best test against symmetric distributions and that the Anderson–Darling test performs poorly against this kind of alternative. The simulation results presented in Section 3 show that the Anderson–Darling test performs better than the Watson test when it is based on transformed observations instead of the original observations. These results also indicate that the ratio test is powerful against symmetric alternative distributions.

This manuscript is organized as follows. In Section 2 the proposed tests are presented and the asymptotic null distribution of the ratio test is obtained. The results of a Monte Carlo simulation study conducted in order to assess the power properties of the tests are presented in Section 3. Some conclusions are provided in Section 4.

∗ Corresponding authors. E-mail address: [email protected] (E. González-Estrada).

http://dx.doi.org/10.1016/j.spl.2016.07.003 0167-7152/© 2016 Elsevier B.V. All rights reserved.

2. New tests for the Laplace distribution

Let $X_1, \ldots, X_n$ be a random sample of size $n$ from a continuous population with cdf $F$. Next we present two tests for the composite null hypothesis

$$H_0 : X \sim L(\theta, \beta), \tag{2}$$

where $-\infty < \theta < \infty$ and $\beta > 0$ are unknown.

2.1. A test based on the ratio of two estimators for β

Notice that an estimator of the scale parameter $\beta$ is the sample mean absolute deviation (MAD) about the sample mean $\bar{X}_n$, defined as

$$\check{\beta}_n = \sum_{i=1}^{n} |X_i - \bar{X}_n| / n. \tag{3}$$

On the other hand, a moments estimator of $\beta$ is $\tilde{\beta}_n = \sqrt{S_n^2/2}$, where $S_n^2$ is the sample variance. If $H_0$ in (2) holds, then the statistic

$$R_n = \tilde{\beta}_n / \check{\beta}_n \tag{4}$$

is expected to take on values close to one. Therefore, a test for the Laplace distribution based on $R_n$ rejects $H_0$ at a significance level $\alpha \in (0, 1)$ if $R_n < c_{\alpha/2}$ or $R_n > c_{1-\alpha/2}$, where these critical values satisfy the equation

$$1 - \alpha = P(c_{\alpha/2} < R_n < c_{1-\alpha/2} \mid H_0). \tag{5}$$

Since $R_n$ is a location-scale invariant statistic, these critical values do not depend on $\theta$ and $\beta$. Critical values for this test can be obtained approximately from the asymptotic null distribution of $R_n$; however, for small sample sizes these values can be computed by Monte Carlo simulation for each sample size $n$. The next theorem provides the asymptotic null distribution of $R_n$.
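The ratio statistic in (4) and the Monte Carlo approximation of the critical values in (5) can be sketched as follows. This is a minimal illustration in Python, not the authors' R code: the function names are hypothetical, the sample variance is assumed to use the divisor n (as in the Appendix), and the empirical quantile rule is one simple choice among several.

```python
import math
import random


def ratio_statistic(x):
    """R_n = sqrt(S_n^2 / 2) / MAD, with MAD taken about the sample mean."""
    n = len(x)
    mean = sum(x) / n
    mad = sum(abs(v - mean) for v in x) / n       # estimator in (3)
    s2 = sum((v - mean) ** 2 for v in x) / n      # S_n^2 with divisor n (assumption)
    return math.sqrt(s2 / 2.0) / mad


def laplace_sample(n, rng, theta=0.0, beta=1.0):
    """Draw n Laplace(theta, beta) variates as a random sign times Exp(beta)."""
    return [theta + rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / beta)
            for _ in range(n)]


def mc_critical_values(n, alpha=0.05, reps=20000, seed=1):
    """Approximate c_{alpha/2} and c_{1-alpha/2} by simulating R_n under H0.

    Because R_n is location-scale invariant, sampling from L(0, 1) suffices.
    """
    rng = random.Random(seed)
    stats = sorted(ratio_statistic(laplace_sample(n, rng)) for _ in range(reps))
    lower = stats[math.floor(reps * alpha / 2)]
    upper = stats[math.ceil(reps * (1 - alpha / 2)) - 1]
    return lower, upper
```

With `lower, upper = mc_critical_values(len(x))`, the test would reject the Laplace hypothesis for a sample `x` when `ratio_statistic(x)` falls outside the interval `(lower, upper)`.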



Theorem 1. Under $H_0$, $\sqrt{4n}\,(R_n - 1) \xrightarrow{d} N(0, 1)$, as $n \to \infty$.

For the proof of this theorem, see Appendix A.

Corollary 1. Under $H_0$, $\sqrt{n}\,(\check{\beta}_n - \beta)/\beta \xrightarrow{d} N(0, 1)$, as $n \to \infty$.

Remark 1. From expression (A.9) in Appendix A, a well-known result (Kotz et al., 2001) follows: $\sqrt{n}\,(S_n/\sqrt{2} - \beta) \xrightarrow{d} N(0, 5\beta^2/4)$.
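The limit stated in Remark 1 can be recovered from the second component of (A.9) by a univariate delta-method step. The following is a sketch of that calculation; the helper function $h$ is introduced here only for illustration, and $\sigma_{22} = 20\beta^4$ is the asymptotic variance of $S_n^2$ given in Appendix A.

```latex
h(y) = \sqrt{y/2}, \qquad h(2\beta^2) = \beta, \qquad
h'(y) = \frac{1}{2\sqrt{2y}}, \qquad h'(2\beta^2) = \frac{1}{4\beta},
\\[2mm]
\sqrt{n}\left(S_n/\sqrt{2} - \beta\right)
  = \sqrt{n}\left(h(S_n^2) - h(2\beta^2)\right)
  \xrightarrow{d} N\!\left(0, \left[h'(2\beta^2)\right]^2 \sigma_{22}\right)
  = N\!\left(0, \frac{20\beta^4}{16\beta^2}\right)
  = N\!\left(0, \frac{5\beta^2}{4}\right).
```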

From Theorem 1, a test based on $R_n^* = \sqrt{4n}\,(R_n - 1)$ rejects $H_0$ at a test size $\alpha \in (0, 1)$ when the sample size is large if $|R_n^*| > z_{1-\alpha/2}$, where $z_{1-\alpha/2}$ is the $100(1 - \alpha/2)\%$ quantile of the standard normal distribution.
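For large samples, the decision rule based on the standardized statistic can be sketched in a few lines. This is an illustrative Python version under the same assumptions as the text (sample variance with divisor n); the function name and the two-sided p-value it returns are additions made here, not part of the paper.

```python
import math
from statistics import NormalDist


def asymptotic_ratio_test(x, alpha=0.05):
    """Large-sample ratio test: reject H0 when |sqrt(4n) (R_n - 1)| > z_{1-alpha/2}."""
    n = len(x)
    mean = sum(x) / n
    mad = sum(abs(v - mean) for v in x) / n       # sample mean absolute deviation
    s2 = sum((v - mean) ** 2 for v in x) / n      # S_n^2 with divisor n (assumption)
    r_n = math.sqrt(s2 / 2.0) / mad
    r_star = math.sqrt(4.0 * n) * (r_n - 1.0)     # approximately N(0, 1) under H0
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(r_star)))
    return r_star, p_value, abs(r_star) > z
```

The normal approximation is only asymptotic, so for small n the simulated critical values discussed above are the safer choice.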

Remark 2. An additional test for the Laplace distribution can be based on the ratio $R_n' = \tilde{\beta}_n / \hat{\beta}_n$, where $\hat{\beta}_n = \sum_{i=1}^{n} |X_i - \hat{\theta}_n| / n$ is the maximum likelihood estimator of $\beta$ and $\hat{\theta}_n$ is the sample median.

2.2. Anderson–Darling test based on a data transformation

Under $H_0$, $Y^{(\theta)} = |X - \theta|$ is distributed as $\mathrm{Exp}(\beta)$. Hence, if the unknown value of the location parameter $\theta$ is replaced by the sample mean $\bar{X}_n$, then the transformed observations $Y_i = |X_i - \bar{X}_n|$, $i = 1, \ldots, n$, are asymptotically independent and identically distributed (i.i.d.) random variables with $\mathrm{Exp}(\beta)$ distribution. In fact, since by the CLT $\bar{X}_n - \theta \xrightarrow{p} 0$ as $n \to \infty$, then $X_i - \bar{X}_n = (X_i - \theta) - (\bar{X}_n - \theta) \xrightarrow{p} X_i - \theta$; hence, $Y_i \xrightarrow{d} Y^{(\theta)}$, $i = 1, \ldots, n$. Therefore, testing $H_0$ is asymptotically equivalent to testing

$$H_0' : Y_1, \ldots, Y_n \sim \mathrm{Exp}(\beta). \tag{6}$$

Since $\beta > 0$ is unknown, it is convenient to consider scale invariant tests like the Anderson–Darling test for exponentiality, which has good power properties against a wide range of alternative distributions. The Anderson–Darling test rejects $H_0'$ at a significance level $\alpha \in (0, 1)$ if $A_n^2 > A_{1-\alpha}^2$, where

$$A_n^2 = n \int_{-\infty}^{\infty} \frac{[G(y) - G_n(y)]^2}{G(y)[1 - G(y)]} \, dG(y), \tag{7}$$

$G(y)$ is the exponential cdf and $G_n(y)$ is the EDF of the $Y_i$'s. The critical value $A_{1-\alpha}^2$ is the $100(1 - \alpha)\%$ quantile of the null distribution of $A_n^2$, which is independent of $\beta$ since $A_n^2$ is scale invariant. For small sample sizes, the critical value $A_{1-\alpha}^2$ can be obtained by Monte Carlo simulation, generating pseudo-random samples from the $\mathrm{Exp}(1)$ distribution.
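The transformation-based test can be sketched as below. This Python illustration uses the standard computing formula for the Anderson–Darling integral in (7), namely $A_n^2 = -n - n^{-1}\sum_{i=1}^{n}(2i-1)[\log G(Y_{(i)}) + \log(1 - G(Y_{(n+1-i)}))]$. The text does not say how the scale in $G$ is estimated, so estimating it by the mean of the transformed observations is an assumption made here, and the function names are hypothetical.

```python
import math
import random


def ad_exponential(y):
    """Anderson-Darling statistic for exponentiality.

    The scale is estimated by the sample mean (an assumption). All values
    must be strictly positive, which holds with probability one for
    continuous data.
    """
    n = len(y)
    scale = sum(y) / n
    z = sorted(v / scale for v in y)
    total = 0.0
    for i in range(n):
        log_cdf = math.log(-math.expm1(-z[i]))    # log G(y_(i)), G(y) = 1 - exp(-y)
        log_surv = -z[n - 1 - i]                  # log(1 - G(y_(n+1-i)))
        total += (2 * i + 1) * (log_cdf + log_surv)
    return -n - total / n


def ad_laplace_test(x, alpha=0.05, reps=5000, seed=1):
    """Apply the test to Y_i = |X_i - mean(X)|, with a simulated critical value."""
    n = len(x)
    mean = sum(x) / n
    a2 = ad_exponential([abs(v - mean) for v in x])
    rng = random.Random(seed)
    # Null distribution approximated from Exp(1) samples, as described in the text.
    null = sorted(ad_exponential([rng.expovariate(1.0) for _ in range(n)])
                  for _ in range(reps))
    crit = null[math.ceil(reps * (1.0 - alpha)) - 1]
    return a2, crit, a2 > crit
```

Working with `expm1` and the log-survival form avoids taking the logarithm of zero when an observation lies far in the tail.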


3. Simulation study


In order to assess the power performance of the proposed tests, a Monte Carlo simulation study was conducted using R (R Core Team, 2016). For comparison purposes, the powers of the Watson ($U^2$) and Anderson–Darling ($A_X^2$) tests introduced by Puig and Stephens (2000), and the $G$ and $Z$ tests implemented by Lafaye de Micheaux and Tran (2016), were estimated. The Anderson–Darling test based on the transformed observations will be denoted by $A_Y^2$. The $R_n'$ test mentioned in Remark 2 has also been included in this comparison.

The following symmetric alternative distributions were considered: normal, logistic, Cauchy, Uniform(0,1) and Beta(2,2). Even though symmetric distributions are the most interesting alternatives for the Laplace distribution, the following asymmetric distributions were also considered: standard Gumbel; skew-normal (sn) with slant parameter 3, sn(3); and skew-t (st) with slant parameter 3 and 10 degrees of freedom, st(3, 10). For each alternative distribution, 20 000 samples of each of the sizes $n = 20, 35, 50, 100, 200$ were generated. The test size was chosen to be $\alpha = 0.05$.

Fig. 1 presents the estimated sizes and powers of the tests. The top left-hand side graph presents the estimated sizes of the tests for different sample sizes. Notice that these values are close to the nominal test size $\alpha = 0.05$.

Against the symmetric alternative distributions considered, it is observed that $R$, $R'$ and $A_Y^2$ are in general more powerful than the $G$, $Z$, $U^2$ and $A_X^2$ tests and that $R$ and $R'$ produce similar results. Notice that, in general, there is a substantial gain in power when the Anderson–Darling test is used to test exponentiality on the transformed observations ($A_Y^2$) in comparison to the case when it is applied directly to the original observations ($A_X^2$).

The graphs in the bottom row of Fig. 1 present the estimated powers of the tests against the asymmetric alternative distributions considered. In general, $A_Y^2$ and $A_X^2$ are more sensitive than the other tests. The $R$ test outperforms $R'$ against this kind of alternative.

4. Conclusions


The asymptotic null distribution of the proposed ratio test has been obtained. Simulation results indicate that this test has in general competitive power properties against the symmetric alternative distributions considered in the simulation study. There is a substantial improvement in terms of power when the Anderson–Darling test is based on the transformed observations, instead of the original data, against symmetric alternative distributions. Furthermore, this test is competitive against skewed alternative distributions.

Acknowledgments


The authors wish to thank two anonymous reviewers and the associate editor for their constructive comments on the original version of the manuscript, which helped to improve the presentation of this paper.

Appendix A


The following lemma is useful to prove Theorem 1.

Lemma 1. Let

$$Q_{i,n}^{(\theta)} = \frac{(\bar{X}_n - X_i) + (\theta - X_i)}{|X_i - \bar{X}_n| + |X_i - \theta|}$$

and define $\bar{Q}_n^{(\theta)} = \sum_{i=1}^{n} Q_{i,n}^{(\theta)} / n$; then $\bar{Q}_n^{(\theta)} \xrightarrow{p} 0$.

Proof. Notice that under $H_0$, by the CLT, $\bar{X}_n - \theta \xrightarrow{p} 0$; then for $i \ge 1$, $Q_{i,n}^{(\theta)} \xrightarrow{p} Q$, which implies $Q_{i,n}^{(\theta)} \xrightarrow{d} Q$, where $P(Q = -1) = P(Q = 1) = 1/2$ due to the fact that the Laplace distribution is symmetric. Now, since $|Q_{i,n}^{(\theta)}| \le 1$, by the Dominated Convergence Theorem,

$$\lim_{n \to \infty} E(Q_{i,n}^{(\theta)}) = \int_{-1}^{1} x \lim_{n \to \infty} f_{Q_{i,n}^{(\theta)}}(x) \, dx = E(Q) = 0,$$

where $f_{Q_{i,n}^{(\theta)}}$ is the density function of $Q_{i,n}^{(\theta)}$.

Similarly, for $i \ne k$, $\lim_{n \to \infty} E(Q_{i,n}^{(\theta)} Q_{k,n}^{(\theta)}) = 0$, since $Q_{i,n}^{(\theta)} Q_{k,n}^{(\theta)} \xrightarrow{p} Q$. Let $k_n(i, k) = \mathrm{cov}(Q_{i,n}^{(\theta)}, Q_{k,n}^{(\theta)})$. Therefore,

$$\lim_{n \to \infty} k_n(i, k) = 0, \quad \text{if } i \ne k, \tag{A.1}$$

$$\lim_{n \to \infty} k_n(i, k) = 1, \quad \text{if } i = k. \tag{A.2}$$

Notice that if we let $c_{k,n} = \sum_{j=1}^{k} k_n(j, k)/k$ then

$$n^2 \, \mathrm{Var}(\bar{Q}_n^{(\theta)}) = \sum_{k=1}^{n} \mathrm{Var}(Q_{k,n}^{(\theta)}) + 2 \sum_{k=1}^{n} \sum_{j=1}^{k-1} k_n(j, k) = 2 \sum_{k=1}^{n} k c_{k,n} - \sum_{k=1}^{n} \mathrm{Var}(Q_{k,n}^{(\theta)}) \le 2 \sum_{k=1}^{n} k c_{k,n}. \tag{A.3}$$

Now, let $N$ be such that $0 < N < n$ and notice that

$$\left| \frac{1}{n^2} \sum_{k=1}^{n} k c_{k,n} \right| \le \frac{1}{n^2} \sum_{k=1}^{N} |k c_{k,n}| + \frac{1}{n^2} \sum_{k=N+1}^{n} |k c_{k,n}| \le \frac{1}{n^2} \sum_{k=1}^{N} |k c_{k,n}| + \frac{1}{n} \sum_{k=N+1}^{n} |c_{k,n}| \le \frac{1}{n^2} \sum_{k=1}^{N} |k c_{k,n}| + \sup_{k > N} |c_{k,n}|. \tag{A.4}$$

By (A.1) and (A.2),

$$\lim_{n \to \infty} c_{k,n} = \lim_{n \to \infty} \frac{1}{k} \left[ \sum_{i=1}^{k-1} k_n(i, k) + k_n(k, k) \right] = \frac{1}{k}.$$

Hence, $\lim_{k \to \infty} \lim_{n \to \infty} c_{k,n} = 0$. Therefore, if first we let $n \to \infty$ and second $N \to \infty$ in expression (A.4) we conclude that $\lim_{n \to \infty} \frac{1}{n^2} \sum_{k=1}^{n} k c_{k,n} = 0$. Hence, from (A.3) we have $\lim_{n \to \infty} \mathrm{Var}(\bar{Q}_n^{(\theta)}) = 0$. By Chebyshev's inequality, it follows that $\bar{Q}_n^{(\theta)} \xrightarrow{p} 0$. □

Fig. 1. Estimated size and power of the tests against some symmetric and skew alternative distributions for different sample sizes and α = 0.05.

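Proof of Theorem 1. Define $Y_i = |X_i - \bar{X}_n|$ and $Y_i^{(\theta)} = |X_i - \theta|$, $i = 1, \ldots, n$. Notice that under $H_0$,

$$Y_i - Y_i^{(\theta)} = \frac{(X_i - \bar{X}_n)^2 - (X_i - \theta)^2}{|X_i - \bar{X}_n| + |X_i - \theta|} = (\bar{X}_n - \theta) Q_{i,n}^{(\theta)}, \tag{A.5}$$

where $Q_{i,n}^{(\theta)}$ is defined in Lemma 1. Let $\bar{Y}_n^{(\theta)} = \sum_{i=1}^{n} Y_i^{(\theta)}/n$ and $D_n(\theta) = \bar{Y}_n - \bar{Y}_n^{(\theta)}$; then by (A.5),

$$D_n(\theta) = (\bar{X}_n - \theta) \bar{Q}_n^{(\theta)}. \tag{A.6}$$

Since by the CLT, $\sqrt{n}(\bar{X}_n - \theta) \xrightarrow{d} Z \sim$ Normal and by Lemma 1, $\bar{Q}_n^{(\theta)} \xrightarrow{p} 0$, applying Slutsky's Theorem in (A.6) it follows that $\sqrt{n} D_n(\theta) \xrightarrow{p} 0$. Hence

$$(\bar{Y}_n - \beta) = D_n(\theta) + (\bar{Y}_n^{(\theta)} - \beta) = \bar{Y}_n^{(\theta)} - \beta + o_p(1/\sqrt{n}), \tag{A.7}$$

where $E(Y_i^{(\theta)} - \beta) = 0$. On the other hand, by the CLT applied to the $X_i$'s,

$$S_n^2 - 2\beta^2 = \frac{1}{n} \sum_{i=1}^{n} Y_i^2 - 2\beta^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i^{(\theta)2} - 2\beta^2) - (\bar{X}_n - \theta)^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i^{(\theta)2} - 2\beta^2) + O_p(1/n), \tag{A.8}$$

where $E(Y_i^{(\theta)2} - 2\beta^2) = 0$. For arbitrary but fixed $t_1, t_2 \in \Re$, by (A.7) and (A.8) we define:

$$V_n = \sqrt{n} \, [t_1(\bar{Y}_n - \beta) + t_2(S_n^2 - 2\beta^2)]$$

$$= \sqrt{n} \left[ \frac{1}{n} \sum_{i=1}^{n} \{ t_1(Y_i^{(\theta)} - \beta) + t_2(Y_i^{(\theta)2} - 2\beta^2) \} + o_p(1/\sqrt{n}) \right]$$

$$= \sqrt{n} \left[ \frac{1}{n} \sum_{i=1}^{n} W_i + o_p(1/\sqrt{n}) \right],$$

where $W_i = t_1(Y_i^{(\theta)} - \beta) + t_2(Y_i^{(\theta)2} - 2\beta^2)$, $i = 1, \ldots, n$, are independent random variables such that $E(W_i) = 0$ and $\mathrm{Var}(W_i) = t_1^2 \sigma_{11} + t_2^2 \sigma_{22} + 2 t_1 t_2 \sigma_{12}$, where $\sigma_{11} = \mathrm{Var}(Y_i^{(\theta)} - \beta) = \beta^2$, $\sigma_{22} = \mathrm{Var}(Y_i^{(\theta)2} - 2\beta^2) = 20\beta^4$ and $\sigma_{12} = \sigma_{21} = \mathrm{cov}(Y_i^{(\theta)} - \beta, Y_i^{(\theta)2} - 2\beta^2) = 4\beta^3$. Then by the CLT, $V_n$ is asymptotically normally distributed. Since $t_1, t_2$ are arbitrary, by the Cramér–Wold Theorem:

$$\sqrt{n} \, T_n \xrightarrow{d} N^{(2)}(0, \Sigma), \tag{A.9}$$

where $T_n = (\bar{Y}_n - \beta, S_n^2 - 2\beta^2)'$ and $\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}$.

Now, define $g : \Re^+ \times \Re^+ \to \Re^+$ such that $g(x, y) = \sqrt{y}/(\sqrt{2}\,x)$. Notice that $g(\bar{Y}_n, S_n^2) = S_n/(\sqrt{2}\,\bar{Y}_n)$. Let $\eta = (\beta, 2\beta^2)'$ and notice that $\frac{\partial}{\partial x} g(x, y)|_{(x,y)=\eta} = -1/\beta$ and $\frac{\partial}{\partial y} g(x, y)|_{(x,y)=\eta} = 1/(4\beta^2)$. Let $\dot{g}(\eta) = (-1/\beta, 1/(4\beta^2))'$.

Therefore, by the multivariate delta method,

$$\sqrt{n} \left( \frac{S_n}{\sqrt{2}\,\bar{Y}_n} - 1 \right) \xrightarrow{d} N(0, \gamma^2),$$

where $\gamma^2 = [\dot{g}(\eta)]' \Sigma \, \dot{g}(\eta) = 1/4$. Since $R_n = S_n/(\sqrt{2}\,\bar{Y}_n)$, the theorem follows. □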
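The value $\gamma^2 = 1/4$ can be checked numerically by expanding the quadratic form with the covariance entries and gradient given in the proof. The short Python sketch below does only that arithmetic; the function name is hypothetical.

```python
def gamma_squared(beta):
    """Evaluate g'(eta)^T Sigma g'(eta) for a given scale parameter beta > 0."""
    s11, s12, s22 = beta ** 2, 4.0 * beta ** 3, 20.0 * beta ** 4   # entries of Sigma
    g1, g2 = -1.0 / beta, 1.0 / (4.0 * beta ** 2)                  # gradient of g at eta
    # Quadratic form: g1^2 s11 + 2 g1 g2 s12 + g2^2 s22 = 1 - 2 + 5/4
    return g1 * g1 * s11 + 2.0 * g1 * g2 * s12 + g2 * g2 * s22
```

The result does not depend on `beta`, which agrees with the location-scale invariance of $R_n$.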




Appendix B. Supplementary data

Supplementary material related to this article can be found online at http://dx.doi.org/10.1016/j.spl.2016.07.003.

References

Best, D.J., Rayner, J.C.W., Thas, O., 2008. Comparison of some tests of fit for the Laplace distribution. Comput. Statist. Data Anal. 52 (12), 5338–5343.

Choi, B., Kim, K., 2006. Testing goodness-of-fit for Laplace distribution based on maximum entropy. Statistics 40 (6), 517–531.

Geary, R.C., 1936. Moments of the ratio of the mean deviation to the standard deviation for normal samples. Biometrika 28, 295–305.

Gel, Y., 2010. Test of fit for a Laplace distribution against heavier tailed alternatives. Comput. Statist. Data Anal. 54, 958–965.

Gel, Y.R., Miao, W., Gastwirth, J.L., 2007. Robust directed tests of normality against heavy tailed alternatives. Comput. Statist. Data Anal. 51, 2734–2746.

Kotz, S., Kozubowski, T., Podgorski, K., 2001. The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering and Finance. Birkhäuser, Boston.

Lafaye de Micheaux, P., Tran, V.A., 2016. PoweR: A reproducible research tool to ease Monte Carlo power simulation studies for goodness-of-fit tests in R. J. Stat. Softw. 69 (3), 1–42.

Meintanis, S., 2004. A class of omnibus tests for the Laplace distribution based on the empirical characteristic function. Comm. Statist. Theory 33 (4), 925–948.

Puig, P., Stephens, M., 2000. Tests of fit for the Laplace distribution, with applications. Technometrics 42 (4), 417–424.

R Core Team, 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria.

Roberts, L.A., 2015. Distribution free testing of goodness of fit in a one dimensional parameter space. Statist. Probab. Lett. 99, 215–222.

Torabi, H., Montazeri, N.H., Grané, A., 2016. A test for normality based on the empirical distribution function. SORT Statist. Oper. Res. Trans. 40 (1), 3–36.