Computational Statistics and Data Analysis 53 (2009) 1132–1141
New goodness-of-fit tests based on fiducial empirical distribution function

Xingzhong Xu, Xiaobo Ding, Shuran Zhao
Department of Mathematics, Beijing Institute of Technology, Beijing 100081, China
Article history: Received 13 December 2007; received in revised form 7 October 2008; accepted 8 October 2008; available online 14 October 2008.

Abstract
In this paper we derive new tests for goodness of fit based on the fiducial empirical distribution function (EDF) after the probability integral transformation of the sample. Note that the fiducial EDF for a set of given sample observations is a randomized distribution function. By substituting the fiducial EDF for the classical EDF in the Kolmogorov–Smirnov, Cramér–von Mises and related statistics, randomized statistics are derived, of which the qth quantile and the expectation are chosen as test statistics. Monte Carlo simulations show that in most cases some of the new tests have better power properties than the corresponding tests based on the classical EDF and Pyke's modified EDF. © 2008 Elsevier B.V. All rights reserved.
1. Introduction

Let X1, . . . , Xn be an iid sample from a cumulative distribution function F(x) = P(X1 ≤ x), which is assumed to be continuous. The general problem of goodness of fit is to test the null hypothesis H0 that F(x) = F0(x; θ1, . . . , θk) for every x against unspecified alternatives. Here the θi's denote the parameters of the hypothesized distribution function. Two cases of this problem are of interest: one is that the hypothesis H0 is simple, i.e. the distribution function F0 is completely specified; the other is that the hypothesis H0 is composite, i.e. F0 is defined only as a member of a family of distributions, such as the normal or exponential family, but some or all of the parameters are unspecified.

Several classes of tests for goodness of fit have been developed. One class of tests is based on the discrepancy between the estimator of F0 under H0, say F̂0, and the nonparametric estimator of F. Of course, when H0 is simple, we replace F̂0 by F0. The classical nonparametric estimator of a distribution function is the empirical distribution function (EDF) Fn. Three of the best known tests based on the EDF are the Kolmogorov–Smirnov (KS), Cramér–von Mises (CM) and Anderson–Darling (AD) weighted tests. Although the EDF is a nonparametric maximum likelihood estimator (see Owen (2001)) and has optimum asymptotic properties (see Aggarwal (1955) or Dvoretzky et al. (1956)), a modified EDF can have some small-sample advantages.

Since F is continuous, F(Xi) has the same distribution as Ui, i = 1, . . . , n, where the Ui's are independently distributed with uniform distribution U(0, 1). Denote by X(1) ≤ X(2) ≤ · · · ≤ X(n) the order statistics of X1, . . . , Xn, and in addition let X(0) = −∞ and X(n+1) = ∞. Similarly, U(1) ≤ U(2) ≤ · · · ≤ U(n) are the order statistics from U(0, 1), with U(0) = 0 and U(n+1) = 1. Note that EU(i) = i/(n + 1).
Pyke (1959) suggested the modified one-sided KS statistics Cn+ = max_{1≤i≤n} [i/(n+1) − F0(X(i))] and Cn− = max_{1≤i≤n} [F0(X(i)) − i/(n+1)], and the modified two-sided KS statistic Cn = max(Cn+, Cn−), which have some desirable small-sample properties. Green and Hegazy (1976) considered that Pyke's statistics were based on Gn(x) = i/(n + 1) for
The work is supported by the Chinese National Natural Science Foundation, grant No. 10771015.
Corresponding author: X. Ding. Tel.: +86 10 68915494. E-mail address: [email protected].
doi:10.1016/j.csda.2008.10.003
X(i) ≤ x < X(i+1) , i = 1, . . . , n, and then constructed some modified goodness-of-fit tests based on Gn . Also, Brunk (1962) considered that Pyke’s modified EDF was
Pn(x) = [ (F0(x) − F0(X(i−1))) / (F0(X(i)) − F0(X(i−1))) + i − 1 ] / (n + 1)    (1)
for X(i−1) ≤ x ≤ X(i), i = 1, 2, . . . , n + 1, and obtained a modified Kuiper's statistic for circular data (see Kuiper (1960) and Stephens (1969)). One might prefer, for testing purposes, Pn to Gn, for the simple reason that Pn is continuous, as F is.

The case when H0 is composite is in fact of more relevance, since it is more realistic in practice to have unknown parameters in the null distribution. This case is relatively more difficult to handle, and many techniques have been developed for it (see D'Agostino and Stephens (1986)). It is well known that one can apply a test for a simple hypothesis to the case when the hypothesis is composite, by substituting F̂0 for F0. There are several methods to obtain F̂0 (see, for example, Srinivasan (1970) and Parr and Schucany (1982)). One of the most convenient methods is to estimate the values of the unknown parameters. It is proved by David and Johnson (1948) that if F0 depends only on a location θ1 and a scale θ2, and if θ̂1 and θ̂2 are proper estimators of θ1 and θ2 respectively, then the distribution of the random variable F0(X; θ̂1, θ̂2) under H0 depends only on the functional form of F0. Hence such a method is possibly valid for a location–scale family (see e.g. Lilliefors (1967) and Lilliefors (1969)). A more general discussion will be given in Section 2.

In this paper we introduce a new randomized estimator, called the fiducial EDF, for an unknown continuous distribution function. To apply the fiducial EDF more conveniently, we first apply the probability integral transformation to the given sample. Then the fiducial EDF has a similar functional form to Pyke's modified EDF, presented in (1), and its conditional expectation given an observed sample is Pyke's modified EDF. Analogously to the test statistics based on the classical EDF or Pyke's modified EDF, we can construct new statistics based on the fiducial EDF. However, the new statistics are random variables for a given sample.
Then the expectation and the qth quantile are chosen as test statistics. From Monte Carlo simulations it can be seen that, at least in most of the cases we consider, some of the new tests have better power properties than the corresponding tests based on the classical EDF or Pyke's modified EDF.

This paper is organized as follows. In Section 2, we construct the new tests through the fiducial EDF and show that they are consistent. In Section 3, we provide some power comparisons between the new tests and the corresponding tests based on the classical EDF or Pyke's modified EDF and show that some of the new tests may have optimum performance in power. In Section 4, a numerical example is provided. In Section 5, some comments are given.

2. The new test statistics

We first construct a new nonparametric estimator of F, say F̃n, called the fiducial EDF, in the sense of fiducial distribution and maximum entropy. Given observations x1, . . . , xn of the sample X1, . . . , Xn, F̃n is defined by
F̃n(x; U, x) =
    U(1) exp[(x − x(1))/µ1],                                x ≤ x(1),
    (U(i+1) − U(i)) (x − x(i))/(x(i+1) − x(i)) + U(i),      x(i) ≤ x ≤ x(i+1), 1 ≤ i ≤ n − 1,
    1 − (1 − U(n)) exp[−(x − x(n))/µ2],                     x(n) ≤ x,
where U = (U1, . . . , Un)T, x = (x1, . . . , xn)T, and µ1 and µ2 are constants or depend on x. F̃n(x; U, x) is a consistent randomized estimator of F. It can be seen that F̃n(x(i); U, x) = U(i), i = 1, . . . , n, which is constructed in the sense that F(X(i)) has the same distribution as U(i), i = 1, . . . , n. This is why we call F̃n a fiducial distribution: the essence of fiducial inference is to estimate an unknown realization of a random variable with an independent and identically distributed random element. For example, suppose X is distributed according to N(µ, 1), where µ is unknown. Then √n(X̄ − µ) is distributed as N(0, 1). Let E be a random element which is independent of X̄ and distributed as N(0, 1). Setting √n(x̄ − µ) = E implies that the fiducial distribution of µ is N(x̄, 1/n).

Another property of F̃n(x; U, x) is that a random variable with distribution function F̃n(x; u, x) has exponential tails and is uniformly distributed on [x(i), x(i+1)], i = 1, . . . , n − 1. This is obtained in the sense of entropy. It is known that if the support of a random variable is a bounded interval V, then the random variable uniformly distributed on V has maximum entropy; if the random variable takes values on (−∞, a) or (a, ∞), where a is a real number, and the expectation is known, then the exponential random variable has maximum entropy. Under the conditions that the distribution function should pass through the points (x(1), u(1)), . . . , (x(n), u(n)) and E[X1 | X1 ≤ x(1)] = µ1, E[X1 | X1 ≥ x(n)] = µ2, the variable X1 with distribution function F̃n(x; u, x) has maximum entropy.
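The fiducial-distribution example above can be sketched numerically; the following Python fragment (ours, not the authors' code) draws from the fiducial distribution N(x̄, 1/n) of µ by equating √n(x̄ − µ) to an independent N(0, 1) element:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100
mu_true = 2.0                           # unknown in practice; fixed here only to simulate
x = rng.normal(mu_true, 1.0, size=n)    # sample from N(mu, 1)
xbar = x.mean()

# Equate sqrt(n)*(xbar - mu) to an independent N(0, 1) element E and solve for mu:
E = rng.normal(0.0, 1.0, size=200_000)
mu_fiducial = xbar - E / np.sqrt(n)     # draws from the fiducial distribution N(xbar, 1/n)

print(mu_fiducial.mean(), mu_fiducial.var())
```

The draws have mean close to x̄ and variance close to 1/n, matching the fiducial distribution derived in the text.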
The exponential tails of F̃n(x; U, x) may cause some difficulties in the construction of test statistics. First, we should determine the parameters µ1 and µ2. Usually, µ1 and µ2 are determined to make the distribution function F̃n(x; U, x) satisfy some conditions. For example, denote by f1(u, x) and f2(u, x) the expectation and variance, respectively, of a random variable with distribution function F̃n(x; U, x) when U = u. One could find µ1 and µ2 so that EU[f1(U, x)] and EU[f2(U, x)] are equal to the sample mean and sample variance of x respectively. The other problem is that the exponential tails would give rise to difficulties in calculation. For example, it would take much time to obtain the value of the KS-type statistic sup_x |F̃n(x) − F0(x)|, because the supremum is not always attained at the sample points. On the other hand, the information about the underlying distribution function that the sample gives us is concentrated on the sample points, suggesting that the tails of the fiducial EDF play a minor role in the power of tests based on the fiducial EDF.

To overcome the difficulties caused by the exponential tails, we first transform the sample X1, . . . , Xn into random variables between 0 and 1. Then the tails become linear in the sense of maximum entropy. The natural transformation is the probability integral transformation. Let Yi = F̂0(Xi), i = 1, . . . , n, and let 0 = Y(0) ≤ Y(1) ≤ · · · ≤ Y(n+1) = 1 be the
corresponding order statistics. Under the simple hypothesis H0 , Yi ’s are iid U (0, 1) random variables since Fˆ0 is replaced with F0 . The fiducial EDF based on the transformed sample y = (y1 , . . . , yn )T is given by
F̃n(y; U, y) = (U(i+1) − U(i)) (y − y(i)) / (y(i+1) − y(i)) + U(i)    (2)

for y(i) ≤ y ≤ y(i+1), 0 ≤ i ≤ n. Note that F̃n(y; U, y) has a similar functional form to Pn(y; y), and its conditional expectation given y is Pn(y; y).

Consistency is a basic requirement for any estimator. Below we show that the fiducial EDF is a consistent estimator. Suppose the distribution function of Y1 is G(·). Note that
sup_{0≤y≤1} |Pn(y) − Fn(y)| = max_{1≤i≤n} max( |i/n − i/(n+1)|, |i/(n+1) − (i−1)/n| ) = 1/(n+1) → 0, n → ∞.
Then

sup_{0≤y≤1} |Pn(y) − y| ≥ sup_{0≤y≤1} |Fn(y) − y| − sup_{0≤y≤1} |Pn(y) − Fn(y)| → sup_{0≤y≤1} |G(y) − y|, n → ∞,

and

sup_{0≤y≤1} |Pn(y) − y| ≤ sup_{0≤y≤1} |Fn(y) − y| + sup_{0≤y≤1} |Pn(y) − Fn(y)| → sup_{0≤y≤1} |G(y) − y|, n → ∞.

It follows that

sup_{0≤y≤1} |Pn(y) − y| → sup_{0≤y≤1} |G(y) − y|, n → ∞.
Note that sup_y |Pn(y) − y| = max_{1≤i≤n} |Y(i) − i/(n+1)|. It can be obtained that

max_{1≤i≤n} |U(i) − Y(i)| ≥ max_{1≤i≤n} |Y(i) − i/(n+1)| − max_{1≤i≤n} |U(i) − i/(n+1)| → sup_{0≤y≤1} |G(y) − y|, n → ∞,

and

max_{1≤i≤n} |U(i) − Y(i)| ≤ max_{1≤i≤n} |Y(i) − i/(n+1)| + max_{1≤i≤n} |U(i) − i/(n+1)| → sup_{0≤y≤1} |G(y) − y|, n → ∞.

Then

sup_{0≤y≤1} |F̃n(y; U, Y) − y| = max_{1≤i≤n} |U(i) − Y(i)| → sup_{0≤y≤1} |G(y) − y|, n → ∞.    (3)
This implies that the fiducial EDF is consistent.

Now we will construct tests through the fiducial EDF. The classical tests based on the EDF are the KS statistic

Dn = sup_{0≤y≤1} |Fn(y) − y|    (4)
and the CM statistic

Wn = ∫₀¹ (Fn(y) − y)² dy.    (5)
Substituting Pn for Fn in (4) and (5), we obtain

Cn = sup_{0≤y≤1} |Pn(y) − y| = max_{1≤i≤n} |y(i) − i/(n+1)|

and

CWn = ∫₀¹ (Pn(y) − y)² dy.
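As a concrete illustration (our sketch, not the paper's FORTRAN code), Cn and CWn can be computed from the ordered transformed sample, with Pn evaluated as the piecewise-linear function through (0, 0), (y(i), i/(n+1)) and (1, 1):

```python
import numpy as np

def pyke_statistics(y_sorted):
    """Compute Cn = sup|Pn(y) - y| and CWn = integral of (Pn(y) - y)^2 dy."""
    n = len(y_sorted)
    i = np.arange(1, n + 1)
    # Pn and the identity are both linear between knots, so the supremum
    # is attained at a knot: Cn = max|y_(i) - i/(n+1)|.
    Cn = float(np.max(np.abs(y_sorted - i / (n + 1))))

    # CWn by fine trapezoidal quadrature of the piecewise-linear Pn.
    knots_y = np.concatenate(([0.0], y_sorted, [1.0]))
    knots_P = np.concatenate(([0.0], i / (n + 1), [1.0]))
    t = np.linspace(0.0, 1.0, 200_001)
    f = (np.interp(t, knots_y, knots_P) - t) ** 2
    CWn = float(np.sum(f[:-1] + f[1:]) / (2 * (len(t) - 1)))
    return Cn, CWn

rng = np.random.default_rng(1)
y = np.sort(rng.uniform(size=20))
Cn, CWn = pyke_statistics(y)
```

The quadrature is only for illustration; the closed form for CWn given below is what one would use in practice.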
Cn is obtained in another way when testing the periodogram in time series analysis (see Durbin (1969a) and Durbin (1969b)). When replacing Fn by F̃n defined in (2), we have

FCn = sup_{0≤y≤1} |F̃n(y; U, y) − y| = max_{1≤i≤n} |U(i) − y(i)|

and

FWn = ∫₀¹ (F̃n(y; U, y) − y)² dy.
It is convenient to express CWn and FWn in terms of the y(i)'s and U(i)'s (with y(n+1) = 1); then

CWn = n²/(3(n+1)²) − (2/(n+1)²) Σ_{i=1}^{n} i y(i) + (2/(3(n+1))) Σ_{i=1}^{n} y(i)² + (1/(3(n+1))) Σ_{i=1}^{n} y(i) y(i+1)
and

FWn = (1/3) y(n)² + (1/3) U(n)² − (2/3) U(n) y(n) + (1/3) Σ_{i=1}^{n−1} U(i+1) y(i)² − (1/3) Σ_{i=1}^{n−1} U(i) y(i+1)²
    + (1/3) Σ_{i=1}^{n−1} U(i)² y(i+1) − (1/3) Σ_{i=1}^{n−1} U(i+1)² y(i) + (1/3) Σ_{i=1}^{n−1} (U(i+1) − U(i)) y(i) y(i+1) + (1/3) Σ_{i=1}^{n−1} U(i) U(i+1) (y(i+1) − y(i)).
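The closed form for FWn can be checked against brute-force numerical integration (an illustrative sketch; the function names are ours):

```python
import numpy as np

def fwn_closed_form(y, U):
    """FWn via the displayed closed form (y, U: sorted arrays of length n)."""
    val = (y[-1] ** 2 + U[-1] ** 2 - 2.0 * U[-1] * y[-1]) / 3.0
    for i in range(len(y) - 1):          # i = 1, ..., n-1 in the paper's indexing
        u0, u1, y0, y1 = U[i], U[i + 1], y[i], y[i + 1]
        val += (u1 * y0 ** 2 - u0 * y1 ** 2 + u0 ** 2 * y1 - u1 ** 2 * y0
                + (u1 - u0) * y0 * y1 + u0 * u1 * (y1 - y0)) / 3.0
    return float(val)

def fwn_numeric(y, U, m=200_000):
    """Integrate (F~n(t) - t)^2 with F~n piecewise linear through
    (0, 0), (y_(i), U_(i)) and (1, 1), by trapezoidal quadrature."""
    ky = np.concatenate(([0.0], y, [1.0]))
    kU = np.concatenate(([0.0], U, [1.0]))
    t = np.linspace(0.0, 1.0, m + 1)
    f = (np.interp(t, ky, kU) - t) ** 2
    return float(np.sum(f[:-1] + f[1:]) / (2 * m))

rng = np.random.default_rng(2)
y = np.sort(rng.uniform(size=10))
U = np.sort(rng.uniform(size=10))
print(fwn_closed_form(y, U), fwn_numeric(y, U))
```

The two values agree to quadrature accuracy, which also confirms the transcription of the closed form.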
D'Agostino and Stephens (1986, page 110) pointed out that the AD statistic

An = n ∫₀¹ (Fn(y) − y)² / (y(1 − y)) dy

has better power than the KS and CM statistics. But the related modification ∫₀¹ (Pn(y) − y)² / var(Pn(y)) dy is difficult to calculate. Using the fact that the variance of U(i) is i(n+1−i)(n+2)⁻¹(n+1)⁻², we introduce summational modified AD statistics, which are defined as
CAn = ((n+2)(n+1)²/n) Σ_{i=1}^{n} (y(i) − i/(n+1))² / (i(n+1−i))

and

FAn = ((n+2)(n+1)²/n) Σ_{i=1}^{n} (y(i) − U(i))² / (i(n+1−i)).    (6)
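Both summational statistics in (6) share the same weights, so they can be computed by one routine (a sketch; CAn simply replaces U(i) by its expectation i/(n+1)):

```python
import numpy as np

def modified_ad(y, U=None):
    """Summational modified AD statistic: FAn if U is given, else CAn."""
    n = len(y)
    i = np.arange(1, n + 1)
    center = i / (n + 1) if U is None else U     # E U_(i) = i/(n+1) for CAn
    weight = (n + 2) * (n + 1) ** 2 / (n * i * (n + 1 - i))
    return float(np.sum(weight * (y - center) ** 2))

rng = np.random.default_rng(3)
y = np.sort(rng.uniform(size=20))
U = np.sort(rng.uniform(size=20))
CAn = modified_ad(y)
FAn = modified_ad(y, U)
```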
It is worth noting that FCn, FWn and FAn are random variables when x is given, so a test based directly on FCn, FWn or FAn is a randomized test and may have poor power. A convenient way to deal with this problem is to transform the random variables into real numbers. Of course, the transformation should summarize some or all of the characteristics of the random variables. For example, one can choose the expected value as a test statistic. Then we have the following three test statistics:

FCnµ = EU(FCn) = EU[ max_{1≤i≤n} |U(i) − y(i)| ],
FWnµ = EU(FWn) = n/(3(n+2)) − (1/((n+1)(n+2))) Σ_{i=1}^{n} (2i+1) y(i) + (2/(3(n+1))) Σ_{i=1}^{n} y(i)² + (1/(3(n+1))) Σ_{i=1}^{n} y(i) y(i+1),    (7)
and

FAnµ = 1 + CAn.    (8)
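Relation (8) holds because the weights in (6) are the reciprocals of n·var(U(i)), so the variance terms of E(y(i) − U(i))² each contribute 1/n and sum to 1. It can also be checked by simulation (a sketch with our own, much smaller, replication counts):

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 20, 50_000
i = np.arange(1, n + 1)
weight = (n + 2) * (n + 1) ** 2 / (n * i * (n + 1 - i))

y = np.sort(rng.uniform(size=n))                    # one fixed transformed sample
CAn = float(np.sum(weight * (y - i / (n + 1)) ** 2))

U = np.sort(rng.uniform(size=(reps, n)), axis=1)    # uniform order statistics
FAn = np.sum(weight * (y - U) ** 2, axis=1)         # draws of the randomized FAn
FAn_mu = float(FAn.mean())                          # Monte Carlo estimate of EU(FAn)

print(FAn_mu, 1.0 + CAn)                            # the two nearly coincide
```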
From (8), it can be seen that FAnµ and CAn are the same test.

We can also choose the qth (0 < q < 1) quantile as a test statistic. The qth quantile of a random variable X is defined by inf{x : P(X ≤ x) ≥ q}. Let FCnq, FWnq and FAnq be the qth quantiles of FCn, FWn and FAn respectively. Note that P(FCn ≤ x) = P(y(i) − x ≤ U(i) ≤ y(i) + x, i = 1, . . . , n) can be evaluated by the recursion due to Noé (1972). Then one can obtain FCnq by numerical search.

The expectation test can be viewed as a combination of the quantile tests, since the expectation can be expressed as the average of the quantiles. Hence the performance of the expectation test will lie between the best and the poorest performance of the quantile tests for a given alternative distribution function. But which test performs best over the whole range of alternative distribution functions? From the next section, it can be seen that in most cases the qth quantile test with a very small or large q performs best, and in some cases the expectation test is also comparable.

Recall that the fiducial EDF is consistent. It can be seen from (3) that as n tends to infinity, the limits of FCnµ, FWnµ, FCnq and FWnq (0 < q < 1) are equal to 0 under the null hypothesis, so the limits of the critical values are equal to 0 too. On the other hand, the limits of the test statistics are larger than 0 under any alternative distribution function. Thus as n tends to infinity, the power of FCnµ, FWnµ, FCnq and FWnq (0 < q < 1) tends to 1, which shows that they are consistent tests.

With some slight modification, the new tests can be used for testing a composite hypothesis H0. Given the estimates θ̂1, . . . , θ̂k of θ1, . . . , θk, the unknown parameters involved in F0, let F̂0(x) = F0(x; θ̂1, . . . , θ̂k). Then FCn, FWn and FAn follow from the transformation Yi = F̂0(Xi), i = 1, . . . , n. In this case, FCn, FWn and FAn are not distribution-free and depend on the functional form of F0.
Hence the range of application of any test based on this transformation is restricted. However, as proved by David and Johnson (1948), given suitable estimators (usually maximum likelihood estimators (MLEs)) of the location and scale parameters, tests based on this transformation can be applied for a location–scale family. More generally, this method can be applied to a group family, which is obtained by subjecting a random variable with a fixed distribution to a family of transformations. This can be seen as follows. Let X be a population with distribution function F0(x; θ), where θ = (θ1, . . . , θk)T, and let X1, . . . , Xn be an iid sample from X. Suppose that for any g belonging to the transformation family G, gX is distributed as F0(x; ḡθ). If the test statistic T satisfies T(gx, ḡθ̂(x)) = T(x, θ̂(x)) for all x, where gx = (gx1, . . . , gxn)T, and if the estimator θ̂ satisfies θ̂(gx) = ḡθ̂(x), then T(gX, θ̂(gX)) = T(X, θ̂(X)), so the test statistic has the same distribution for any g ∈ G.
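For the normal location–scale family this invariance is easy to verify numerically: with the equivariant estimators x̄ and the sample standard deviation s, the transformed values y(i) = Φ((x(i) − x̄)/s) are unchanged by any affine map gx = cx + d with c > 0 (our sketch):

```python
from math import erf, sqrt
import numpy as np

def transformed_sample(x):
    """y_(i) = Phi((x_(i) - xbar)/s): probability integral transform with
    estimated (equivariant) location and scale."""
    xbar = x.mean()
    s = x.std(ddof=1)
    z = (np.sort(x) - xbar) / s
    return np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in z])

rng = np.random.default_rng(5)
x = rng.normal(size=20)
g_x = 3.5 * x - 2.0            # an affine transformation g from the group G

y1 = transformed_sample(x)
y2 = transformed_sample(g_x)   # identical: the estimators cancel g
print(np.max(np.abs(y1 - y2)))
```

Any statistic built from the y(i)'s (FCn, FWn, FAn, . . .) therefore has the same null distribution for every member of the family.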
In this paper, we mainly apply the new tests for the normal, uniform and exponential hypotheses by estimating the unknown parameters. For the null hypothesis that a random sample is generated from a normal distribution N(µ, σ²), consider the case in which both µ and σ are unknown. Then the estimators of µ and σ are given by µ̂ = x̄ = Σ xi/n and σ̂ = (Σ (xi − x̄)²/(n−1))^{1/2} respectively. For the hypothesized uniform distribution U(a, b) with a and b unknown, the recommended estimators of a and b are the best linear unbiased estimators (BLUEs) â = (n x(1) − x(n))/(n−1) and b̂ = (n x(n) − x(1))/(n−1), which lead the resulting tests to be more powerful than the ones based on the MLEs ã = x(1) and b̃ = x(n). For the null hypothesis of an underlying exponential distribution Exp(α, γ) with density (1/γ) exp[−(x − α)/γ], x ≥ α, we are mainly concerned with the case in which α = 0 and γ > 0 is unknown, because mathematical properties of the exponential distribution can be used to reduce the case in which both α and γ are unknown to the case we consider. Then γ is estimated by γ̂ = x̄. The next section gives power comparisons of tests for testing normality, uniformity and exponentiality. It should be emphasized that the alternative distributions considered here are really families of distributions, for which the location and scale parameters can take any values.

3. Power comparisons

We employ FORTRAN 95 to develop the programs. Critical values of the FBnq and FCnµ tests are derived from 50 000 samples from the null distribution and, for a given sample, FBnq and FCnµ are estimated by the empirical quantile and average of 50 000 values of FBn, B = C, W, A. For convenience, we say the critical values are derived by 50 000 × 50 000 replications. Similarly, the power of the FBnq and FCnµ tests is evaluated by 10 000 × 10 000 replications. The critical values and power of FWnµ and FAnµ are derived by 50 000 replications, since their values for a given sample can be computed according to (7) and (8) respectively. Critical values and power of tests based on the classical EDF and Pyke's modified EDF are obtained by 100 000 replications.

The test statistics are divided into three groups: Group 1, the supremum statistics FCnq, FCnµ, Cn and Dn; Group 2, the quadratic statistics FWnq, FWnµ, CWn and Wn; and Group 3, the weighted statistics FAnq, FAnµ and An. Tables for FCnq, FWnq and FAnq with q = 0.05, 0.10, 0.15, . . . , 0.95 have been made, but only the statistics with q = 0.05 and q = 0.95 are presented here, to save space and make the comparisons clearer.
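The nested simulation protocol can be sketched as follows (in Python rather than the paper's FORTRAN 95, and with much smaller replication counts than 50 000 × 50 000); it approximates the 5% critical value of FCn,q=0.05 for the simple hypothesis:

```python
import numpy as np

rng = np.random.default_rng(6)
n, outer, inner, q, alpha = 20, 1000, 1000, 0.05, 0.05

stats = np.empty(outer)
for r in range(outer):
    y = np.sort(rng.uniform(size=n))                   # null sample after the PIT
    U = np.sort(rng.uniform(size=(inner, n)), axis=1)  # uniform order statistics
    FCn = np.max(np.abs(U - y), axis=1)                # draws of the randomized FCn
    stats[r] = np.quantile(FCn, q)                     # FCn,q for this sample

crit = float(np.quantile(stats, 1.0 - alpha))          # upper 5% point over null samples
print(np.sqrt(n) * crit)   # comparable with the sqrt(n)-scaled entries of Table 1
```

With the reduced counts the result is only a rough approximation of the tabulated value; the paper's 50 000 × 50 000 replications sharpen it considerably.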
This is because the statistic with q = 0.05 or q = 0.95 is more powerful than the statistics with other q values in most cases and only a little less powerful in some cases. On the other hand, the power of FBnµ lies between the greatest and smallest power of FBnq and is close to that of FBnq with moderate q values, B = C, W, A. Critical values of FBn,q=0.05, FBn,q=0.95 and FBnµ at significance level 5% are tabulated in Tables 1–4.

Before the power studies, it is worth stressing that there is no test which is powerful against every alternative, since the space of alternative distribution functions is very large. A test can be said to have optimum performance if it is more
Table 1
Critical values for simple hypothesis at significance level 5%.

Statistic        n = 10   n = 20   n = 30   n = 40   n = 50   n = 80   n = 160  n = 250
√n FCn,q=0.05    0.6338   0.7466   0.7831   0.8050   0.8215   0.8522   0.8814   0.8919
√n FCn,q=0.95    1.7779   1.9483   2.0131   2.0537   2.0861   2.1370   2.1859   2.2070
√n FCnµ          1.1922   1.3175   1.3611   1.3906   1.4124   1.4521   1.4898   1.5034
n FWn,q=0.05     0.0976   0.1235   0.1308   0.1349   0.1377   0.1414   0.1442   0.1429
n FWn,q=0.95     1.0806   1.2439   1.3168   1.3442   1.3697   1.4001   1.4390   1.4455
n FWnµ           0.4831   0.5466   0.5751   0.5852   0.5966   0.6044   0.6232   0.6249
FAn,q=0.05       0.8708   1.0135   1.0321   1.0486   1.0525   1.0619   1.0488   1.0426
FAn,q=0.95       8.1479   7.9002   7.7655   7.8070   7.7625   7.7666   7.7148   7.6692
FAnµ             3.7136   3.6475   3.5937   3.6031   3.5786   3.5809   3.5465   3.5148
Table 2
Critical values for testing normality at significance level 5%.

Statistic        n = 10   n = 20   n = 30   n = 40   n = 50   n = 80   n = 160  n = 250
√n FCn,q=0.05    0.4904   0.5925   0.6344   0.6565   0.6734   0.7006   0.7335   0.7466
√n FCn,q=0.95    1.4301   1.5676   1.6235   1.6531   1.6766   1.7109   1.7518   1.7679
√n FCnµ          0.8988   1.0080   1.0534   1.0786   1.0988   1.1292   1.1656   1.1805
n FWn,q=0.05     0.0524   0.0722   0.0795   0.0828   0.0855   0.0886   0.0917   0.0923
n FWn,q=0.95     0.5380   0.6107   0.6354   0.6465   0.6547   0.6654   0.6761   0.6777
n FWnµ           0.2123   0.2500   0.2635   0.2699   0.2748   0.2804   0.2872   0.2878
FAn,q=0.05       0.5232   0.6044   0.6341   0.6535   0.6578   0.6721   0.6893   0.6945
FAn,q=0.95       3.6867   3.6630   3.6452   3.6378   3.6185   3.6010   3.5904   3.5773
FAnµ             1.6471   1.6918   1.7065   1.7258   1.7220   1.7294   1.7437   1.7428
Table 3
Critical values for testing uniformity at significance level 5%.

Statistic        n = 10   n = 20   n = 30   n = 40   n = 50   n = 80   n = 160  n = 250
√n FCn,q=0.05    0.5451   0.6892   0.7484   0.7796   0.7964   0.8321   0.8714   0.8866
√n FCn,q=0.95    1.6667   1.8863   1.9786   2.0259   2.0572   2.1154   2.1719   2.1975
√n FCnµ          1.0801   1.2510   1.3237   1.3595   1.3838   1.4265   1.4784   1.4960
n FWn,q=0.05     0.0663   0.1004   0.1149   0.1220   0.1245   0.1324   0.1398   0.1403
n FWn,q=0.95     0.8664   1.1254   1.2212   1.2724   1.3075   1.3628   1.4125   1.4298
n FWnµ           0.3597   0.4762   0.5218   0.5436   0.5590   0.5835   0.6081   0.6130
FAn,q=0.05       0.5842   0.7518   0.8254   0.8699   0.8877   0.9297   0.9808   0.9947
FAn,q=0.95       6.1215   6.7608   7.0242   7.1843   7.2114   7.3332   7.4826   7.5332
FAnµ             2.5655   2.9429   3.1027   3.2085   3.2169   3.2972   3.4154   3.4310
Table 4
Critical values for testing exponentiality with scale parameter unknown at significance level 5%.

Statistic        n = 10   n = 20   n = 30   n = 40   n = 50   n = 80   n = 160  n = 250
√n FCn,q=0.05    0.5459   0.6509   0.6915   0.7132   0.7282   0.7561   0.7877   0.7994
√n FCn,q=0.95    1.5527   1.7027   1.7607   1.7955   1.8210   1.8640   1.9070   1.9245
√n FCnµ          1.0002   1.1127   1.1600   1.1857   1.2041   1.2384   1.2755   1.2888
n FWn,q=0.05     0.0685   0.0911   0.0977   0.1008   0.1033   0.1063   0.1095   0.1096
n FWn,q=0.95     0.7074   0.8018   0.8384   0.8568   0.8720   0.8939   0.9094   0.9206
n FWnµ           0.2896   0.3364   0.3530   0.3586   0.3659   0.3763   0.3817   0.3840
FAn,q=0.05       0.6444   0.7487   0.7842   0.8021   0.8104   0.8255   0.8317   0.8294
FAn,q=0.95       5.0791   5.0564   5.0417   5.0106   5.0014   5.0167   4.9642   4.9634
FAnµ             2.2886   2.3213   2.3234   2.3255   2.3366   2.3415   2.3217   2.3227
powerful than the other tests against most of the alternatives and a little less powerful against the other alternatives. Since this cannot be defined precisely, different people may have different points of view. Of course, besides the power criterion, one can compare tests by the Pitman efficiency or the Bahadur efficiency. In this paper we try to compare the tests in an objective manner.

To study the power of the tests for the simple hypothesis H0 that F(x) = x for 0 ≤ x ≤ 1, we choose the distribution functions F1(x) = 1/(1 + log(1/x)), F2(x) = exp(1 − 1/x), β(3, 3) and β(1.3, 0.7) as alternatives. F1 and F2 were introduced in Jager and Wellner (2005). F1, stochastically smaller than the uniform distribution U(0, 1), is an example of a
Table 5
Power for testing simple hypothesis H0: F(x) = x, 0 ≤ x ≤ 1, at significance level 5%.

Statistic       F1               F2               β(3,3)           β(1.3,0.7)
                n=20    n=50     n=20    n=50     n=20    n=50     n=20    n=50
FCn,q=0.05      0.5231  0.9107   0.4843  0.9979   0.4584  0.9882   0.5536  0.9324
FCn,q=0.95      0.5228  0.8512   0.1112  0.5736   0.0059  0.1158   0.5713  0.9251
FCnµ            0.4874  0.8637   0.2651  0.9250   0.0714  0.7004   0.5812  0.9370
Cn              0.4600  0.8398   0.2861  0.9236   0.1039  0.6350   0.5527  0.9113
Dn              0.4042  0.8099   0.3437  0.9476   0.1505  0.7078   0.5451  0.9074
FWn,q=0.05      0.5410  0.9095   0.4867  0.9959   0.5176  0.9945   0.5724  0.9336
FWn,q=0.95      0.4979  0.8035   0.1528  0.5712   0.0012  0.0496   0.6513  0.9569
FWnµ            0.4717  0.8255   0.2466  0.8406   0.0341  0.6623   0.6426  0.9549
CWn             0.4379  0.7947   0.2861  0.8792   0.0710  0.7670   0.6313  0.9519
Wn              0.3980  0.7679   0.3303  0.9089   0.1460  0.8490   0.6217  0.9495
FAn,q=0.05      0.4705  0.9007   0.7013  1.0000   0.7833  0.9999   0.5593  0.9296
FAn,q=0.95      0.4265  0.7911   0.4187  0.9866   0.1373  0.9234   0.6620  0.9626
FAnµ            0.4234  0.8391   0.5683  0.9992   0.4478  0.9917   0.6379  0.9595
An              0.8648  0.9933   0.3092  0.9549   0.1572  0.9505   0.6657  0.9641

F1(x) = 1/(1 + log(1/x)), F2(x) = exp(1 − 1/x), and β(m1, m2) is the beta distribution.
Table 6
Power for testing normality at significance level 5%.

Statistic       Uniform          Exponential      Cauchy           Laplace
                n=20    n=40     n=20    n=40     n=20    n=40     n=20    n=40
FCn,q=0.05      0.1741  0.3505   0.5807  0.9255   0.8319  0.9795   0.2125  0.4481
FCn,q=0.95      0.2494  0.4459   0.5456  0.9049   0.6781  0.9440   0.3162  0.5790
FCnµ            0.2373  0.4497   0.5483  0.8980   0.7994  0.9737   0.2843  0.5590
Cn              0.1320  0.2484   0.5788  0.8948   0.8254  0.9771   0.2032  0.4010
Dn              0.0964  0.1962   0.5770  0.9023   0.8461  0.9811   0.1648  0.3509
FWn,q=0.05      0.1755  0.3534   0.5811  0.9173   0.8420  0.9811   0.2275  0.4781
FWn,q=0.95      0.2914  0.5100   0.7382  0.9806   0.7039  0.9560   0.3866  0.6934
FWnµ            0.2564  0.4977   0.6852  0.9636   0.8182  0.9781   0.3327  0.6501
CWn             0.2033  0.4106   0.7152  0.9655   0.8570  0.9863   0.2940  0.5786
Wn              0.1401  0.3289   0.7199  0.9667   0.8810  0.9896   0.2293  0.5095
FAn,q=0.05      0.1427  0.3887   0.7788  0.9911   0.8772  0.9881   0.2389  0.6379
FAn,q=0.95      0.2406  0.5243   0.8139  0.9919   0.8397  0.9824   0.3472  0.7252
FAnµ            0.1744  0.4439   0.7947  0.9905   0.8759  0.9881   0.2800  0.6688
An              0.1699  0.4342   0.7715  0.9837   0.8824  0.9902   0.2708  0.6215
distribution function with high density near zero, while F2, which is the inverse function of F1 and thus stochastically larger than U(0, 1), is an example of a distribution function with low density near zero. β(3, 3) and β(1.3, 0.7) are beta distributions; β(3, 3) is stochastically larger than U(0, 1) conditionally on x ≤ 0.5 and smaller than U(0, 1) conditionally on x ≥ 0.5, and β(1.3, 0.7) is stochastically larger than U(0, 1).

Table 5 shows the power for testing the simple hypothesis H0 that F(x) = x for 0 ≤ x ≤ 1 at significance level 5% with sample sizes n = 20 and 50. It can be seen that FBn,q=0.05 performs much better than FBn,q=0.95 and FBnµ, B = C, W, A. In Group 1, FCn,q=0.05 is more powerful than Cn and Dn for all alternatives. In Group 2, FWn,q=0.05 is more powerful than CWn and Wn against F1, F2 and β(3, 3), while a little less powerful against β(1.3, 0.7). In Group 3, against F1, An is much more powerful than FAn,q=0.05, whereas against F2 and β(3, 3) it is much less powerful than FAn,q=0.05. This shows that An is much more sensitive to alternatives with high density in the tails. Note that F1 is an extreme example with infinite density at zero, and against β(1.3, 0.7), whose density near one is not as high as the density of F1 near zero, An is not so much more powerful than FAn,q=0.05. Roughly speaking, FAn,q=0.05 has optimum performance for testing the simple hypothesis. It can also be seen from Table 5 that tests in Group 3 perform better than the corresponding tests in Groups 1 and 2.

Table 6 gives the power for testing normality at significance level 5% with sample sizes n = 20 and 40. In Group 1, FCnµ is a good test. FCnµ and FCn,q=0.95 have similar power results against the uniform, exponential and Laplace distributions, while against the Cauchy distribution, FCnµ is more powerful.
Against the uniform and Laplace distributions, FCnµ is much more powerful than FCn,q=0.05, Cn and Dn, while against the exponential and Cauchy distributions, they perform similarly. In Group 2, FWn,q=0.95 and FWnµ are good tests; to be conservative, one may prefer FWnµ. In Group 3, FAn,q=0.95 performs better than the other tests and hence is a good test. In general, FAn,q=0.95 is the best statistic for testing normality among the three groups of tests. Simulations of tests based on MLEs of the uniform distribution parameters were also carried out, and they show that tests based on BLUEs perform better than the corresponding ones based on MLEs. The results are not given here, because the main purpose of this paper is to show that the fiducial EDF leads to more powerful tests than the classical EDF or Pyke's modified EDF.
Table 7
Power for testing uniformity at significance level 5%.

Statistic       Normal           Exponential      Gamma(4)^a       Laplace
                n=20    n=40     n=20    n=40     n=20    n=40     n=20    n=40
FCn,q=0.05      0.2692  0.7279   0.8686  0.9960   0.1605  0.3214   0.4774  0.8667
FCn,q=0.95      0.0937  0.2781   0.8722  0.9969   0.1762  0.3574   0.4007  0.7932
FCnµ            0.1759  0.5109   0.8869  0.9971   0.1782  0.3613   0.4756  0.8457
Cn              0.1929  0.5041   0.8757  0.9969   0.1741  0.3367   0.4800  0.8418
Dn              0.2201  0.5445   0.8747  0.9968   0.1718  0.3345   0.4972  0.8518
FWn,q=0.05      0.2795  0.7501   0.8683  0.9961   0.1637  0.3250   0.4822  0.8739
FWn,q=0.95      0.0883  0.2020   0.8969  0.9976   0.1932  0.3929   0.4044  0.7811
FWnµ            0.1437  0.4423   0.9024  0.9979   0.1919  0.3875   0.4670  0.8350
CWn             0.1723  0.4935   0.9039  0.9984   0.1941  0.3823   0.4916  0.8484
Wn              0.2040  0.5516   0.9038  0.9984   0.1920  0.3796   0.5101  0.8610
FAn,q=0.05      0.3942  0.8471   0.8726  0.9961   0.1603  0.3066   0.5524  0.9092
FAn,q=0.95      0.1762  0.5019   0.9113  0.9978   0.1977  0.3795   0.4940  0.8448
FAnµ            0.2502  0.6911   0.9078  0.9976   0.1918  0.3659   0.5275  0.8772
An              0.1948  0.5886   0.9044  0.9984   0.1961  0.3817   0.4943  0.8567

a Gamma(4) is a gamma distribution with shape parameter 4.
Table 8
Power for testing exponential distribution with location parameter 0 and scale parameter unknown at significance level 5% for sample size 20.

                IFR alternatives                     DFR alternatives
Statistic       Chi(4)^a U(0,1)  W(1.5)^b ½N^c      Chi(1)^a W(0.8)^b LN(0,1)^d ½Ca^e
FCn,q=0.05      0.2935   0.4341  0.2670   0.0996    0.5313   0.1970   0.1165    0.5692
FCn,q=0.95      0.1069   0.4197  0.1344   0.0785    0.6145   0.2617   0.0859    0.6226
FCnµ            0.2577   0.4710  0.2658   0.1145    0.5840   0.2414   0.1028    0.6150
Cn              0.3492   0.5189  0.3532   0.1592    0.5350   0.2062   0.1237    0.6124
Dn              0.4092   0.5341  0.4014   0.1792    0.4736   0.1752   0.1393    0.6035
FWn,q=0.05      0.2970   0.4225  0.2726   0.1040    0.5491   0.2065   0.1180    0.5690
FWn,q=0.95      0.1264   0.5426  0.1623   0.0751    0.6808   0.3135   0.0926    0.6672
FWnµ            0.2795   0.5572  0.2985   0.1213    0.6430   0.2794   0.1030    0.6475
CWn             0.4009   0.6368  0.4115   0.1716    0.5975   0.2450   0.1313    0.6497
Wn              0.4828   0.6763  0.4839   0.2106    0.5251   0.1997   0.1545    0.6350
FAn,q=0.05      0.4986   0.7101  0.4704   0.1920    0.4414   0.1259   0.2072    0.4376
FAn,q=0.95      0.4727   0.6362  0.4653   0.1718    0.6412   0.2677   0.1277    0.6295
FAnµ            0.5295   0.6573  0.4990   0.1917    0.5813   0.2182   0.1696    0.5726
An              0.4545   0.6351  0.4445   0.1735    0.7081   0.2733   0.1437    0.6403

a Chi(m) is a chi-square distribution with m degrees of freedom.
b W(m) is a Weibull distribution with shape parameter m.
c ½N: X is |Y|, where Y ~ N(0, 1).
d LN(m1, m2): X = e^Y, where Y ~ N(m1, m2).
e ½Ca: X is |Y|, where Y has the Cauchy distribution with median 0.
Table 7 provides the power for testing uniformity at significance level 5% with sample sizes n = 20 and 40, where Gamma(4) denotes the gamma distribution with shape parameter 4. In Group 1, FCn,q=0.05, FCnµ, Cn and Dn perform similarly against the exponential, Laplace and Gamma(4) alternatives, while against normality FCn,q=0.05 performs much better than the others; hence FCn,q=0.05 is a good test. By the same argument, the good test in Group 2 is FWn,q=0.05 and in Group 3 is FAn,q=0.05. Tests in Group 3 perform better than tests in Groups 1 and 2, and FAn,q=0.05 has the best performance for testing uniformity among the tests in Groups 1, 2 and 3.
Table 8 gives power studies of tests for exponentiality with location parameter 0 and scale parameter unknown at significance level 5% with sample size n = 20. Two classes of alternatives are considered: alternatives with increasing failure rate (IFR) and alternatives with decreasing failure rate (DFR). In Group 1, Dn performs best against IFR alternatives, while FCn,q=0.95 and FCnµ perform well against DFR alternatives; when both IFR and DFR alternatives are considered, Dn and Cn are the preferred tests. In Group 2, the power comparisons are similar to those in Group 1, and Wn and CWn are good tests. In Group 3, FAn,q=0.05, FAn,q=0.95 and FAnµ perform better than An against IFR alternatives, and An performs best against DFR alternatives; against both classes, FAn,q=0.95, FAnµ and An have comparable power. Across Groups 1, 2 and 3, Wn, FAn,q=0.05 and FAnµ perform best against IFR alternatives; FWn,q=0.95 and An perform best against DFR alternatives; over the whole range of (IFR and DFR) alternatives, FAn,q=0.95, FAnµ and An might be preferred.
Some salient features emerge from the power studies.
(1) In most cases the fiducial EDF tests improve on the corresponding classical EDF or Pyke's modified EDF tests, except for the supremum and quadratic tests for exponentiality with scale parameter unknown.
(2) In general, the qth quantile tests with extremely small or large q have the best power. This shows that the tails of the random statistics FCn, FWn and FAn are more informative.
(3) Tests in Group 3 perform better than tests in Groups 1 and 2, a natural consequence of the widely known fact that tests of AD type are more powerful than tests of KS and CM types.
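The power entries in Tables 7 and 8 were obtained by Monte Carlo simulation. As a minimal sketch of how such an entry is estimated (not the authors' code), the following Python fragment approximates the power of the classical KS statistic Dn for testing uniformity; the Beta(3, 1) alternative, the replication count and the seed are our own illustrative choices:

```python
import random

random.seed(0)

def ks_stat(sample):
    """Two-sided KS statistic Dn of a sample against the U(0,1) CDF."""
    s = sorted(sample)
    n = len(s)
    return max(max((i + 1) / n - s[i], s[i] - i / n) for i in range(n))

def mc_power(n=20, reps=2000, alpha=0.05):
    # Step 1: simulate the null distribution of Dn under U(0,1)
    # and take its (1 - alpha) quantile as the critical value.
    null = sorted(ks_stat([random.random() for _ in range(n)])
                  for _ in range(reps))
    crit = null[int((1 - alpha) * reps)]
    # Step 2: estimate power as the rejection frequency under the
    # alternative, here Beta(3, 1) (CDF x^3 on [0, 1]).
    hits = sum(ks_stat([random.betavariate(3, 1) for _ in range(n)]) > crit
               for _ in range(reps))
    return hits / reps

print(mc_power())
```

For the composite hypotheses of Table 8, the same scheme applies except that the parameters must be estimated in each replication, so the null distribution of the statistic (Lilliefors-type) must itself be simulated with estimated parameters.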
Table 9
Leghorn chick data.

X^a  156 162 168 182 186 190 190 196 202 210 214 220 226 230 230 236 236 242 246 270
Y^b  0.0401 0.0602 0.0873 0.1839 0.2206 0.2612 0.2612 0.3286 0.4021 0.5052 0.5571 0.6328 0.7037 0.7472 0.7472 0.8055 0.8055 0.8548 0.8825 0.9756

^a Original values X of the weights of 20 chicks in grams.
^b Values Y given by the probability integral transformation for a test for normality, using sample mean 209.6 and sample standard deviation 30.65.
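The transformation producing Y from X in Table 9, together with the classical Kolmogorov–Smirnov value reported in the text, can be reproduced with a short script. This is an illustrative sketch, not the authors' code; the standard normal CDF is obtained from the error function:

```python
from math import erf, sqrt

def norm_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# weights in grams of the 20 chicks (Table 9)
x = [156, 162, 168, 182, 186, 190, 190, 196, 202, 210,
     214, 220, 226, 230, 230, 236, 236, 242, 246, 270]
mu_hat, sigma_hat = 209.6, 30.65

# probability integral transformation, sorted
y = sorted(norm_cdf((xi - mu_hat) / sigma_hat) for xi in x)
n = len(y)

# two-sided Kolmogorov-Smirnov statistic Dn
d_plus = max((i + 1) / n - y[i] for i in range(n))
d_minus = max(y[i] - i / n for i in range(n))
dn = max(d_plus, d_minus)
print(round(sqrt(n) * dn, 4))  # close to the reported 0.4639
```

Small discrepancies in the last digit can arise from rounding of the estimates 209.6 and 30.65.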
The extension of the simulation results to general conclusions depends on how well the chosen alternative distributions represent the whole range of alternatives. For the three composite hypotheses considered, the fiducial EDF tests perform worse than for the simple hypothesis. However, this may be because the alternatives chosen for the composite hypothesis testing problems are common but possibly less representative than those chosen for the simple hypothesis.
4. Numerical example
Table 9 gives the weights in grams of n = 20 twenty-one-day-old leghorn chicks, provided by Bliss (1967). Consider the hypothesis that the sample comes from a normal distribution N(µ, σ²) with µ and σ² unknown. The estimates for µ and σ are µ̂ = 209.6 and σ̂ = 30.65. Probability integral transformations give the values Y of Table 9. The observed values of the classical test statistics are √n Dn = 0.4639, nWn = 0.0338 and An = 0.2142. The fiducial EDF test statistics are √n FCnµ = 0.8378, nFWn,q=0.95 = 0.5086, nFWnµ = 0.1875 and FAn,q=0.95 = 3.1146, obtained by generating 100 000 values of FBn, B = C, W, A. It can be seen from Table 2 with n = 20 that none of these test results is significant at the 5% level, so at this level the sample would not be rejected as coming from a normal population.
5. Comments
We have seen that the new tests based on the fiducial EDF improve the power of the tests based on the classical EDF or Pyke's modified EDF. This means that FBn, B = C, W, A, provides more information about how well the sample agrees with the null distribution. For FBn we choose its quantile or expectation as a test statistic; that is, we use only part of the information in FBn. It is expected that the more information about FBn is used, the more powerful the test will be. A question then arises of how to extract as much information from FBn as possible.
Perhaps one can obtain a critical band and accept the null hypothesis if the distribution function of FBn lies completely within the band. It is expected that the new technique can be applied to further settings, such as incomplete or censored samples and two- or k-sample problems. Note that the fiducial EDF has a functional form similar to Pyke's modified EDF and calls for the probability integral transformation, which may cause problems in some cases. For example, when testing whether two sets of random samples are generated from the same underlying distribution, there is no knowledge of the null distribution and hence the probability integral transformation cannot be carried out. One might use other transformations to restrict the sample to the interval between 0 and 1, but the resulting test may then depend not only on the underlying distribution but also on the particular transformation. If no transformation of the sample is made, the question arises of how to handle the exponential tails of the fiducial EDF based on the original sample. In such a case the discrete fiducial EDF, defined by

F̆n(x) = U(i),  x(i) ≤ x < x(i+1),  i = 0, . . . , n,    (9)

is more easily applied than the continuous fiducial EDF.
There are many recent goodness-of-fit techniques. Jager and Wellner (2007) proposed tests via phi-divergences, which are similar to the nonparametric likelihood ratio tests given by Zhang (2002). Of course, these tests could also be improved by the fiducial EDF. The supremum statistics are easily obtained because the supremum is taken over the sample points. The integral statistics, however, have no explicit expression, and computing their values is laborious. In such cases we may develop summational statistics like FAn defined in (6), or employ the discrete fiducial EDF (9) instead of (2). Further work will be carried out in another paper.
Acknowledgements
We are very grateful to the anonymous reviewers for their helpful suggestions and comments, which helped in improving the paper.
References
Aggarwal, O.P., 1955. Some minimax invariant procedures for estimating a cumulative distribution function. Ann. Math. Statist. 26, 450–463.
Bliss, C., 1967. Statistics in Biology: Statistical Methods for Research in the Natural Sciences. McGraw–Hill, New York.
Brunk, H.D., 1962. On the range of the difference between hypothetical distribution function and Pyke's modified empirical distribution function. Ann. Math. Statist. 33, 525–532.
D'Agostino, R.B., Stephens, M.A., 1986. Goodness-of-fit Techniques. Marcel Dekker, New York.
David, F.N., Johnson, N.L., 1948. The probability integral transformation when parameters are estimated from the sample. Biometrika 35, 182–190.
Durbin, J., 1969a. Tests for serial correlation in regression analysis based on the periodogram of least-squares residuals. Biometrika 56, 1–16.
Durbin, J., 1969b. Tests of serial independence based on the cumulated periodogram. Bull. Inst. Internat. Statist. 42, 1039–1048.
Dvoretsky, A., Kiefer, J., Wolfowitz, J., 1956. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Ann. Math. Statist. 27, 642–669.
Green, J.R., Hegazy, Y.A.S., 1976. Powerful modified-EDF goodness-of-fit tests. J. Amer. Statist. Assoc. 71, 204–209.
Jager, L., Wellner, J.A., 2005. A new goodness-of-fit test: The reversed Berk–Jones statistic. Technical Report 443, Department of Statistics, University of Washington.
Jager, L., Wellner, J.A., 2007. Goodness-of-fit via phi-divergences. Ann. Statist. 35, 2018–2053.
Kuiper, N.H., 1960. Tests concerning random points on a circle. Proc. Kon. Nederl. Akad. van Wet. Amsterdam Ser. A 63 (Indag. Math. 22), 38–47.
Lilliefors, H.W., 1967. On the Kolmogorov–Smirnov test for normality with mean and variance unknown. J. Amer. Statist. Assoc. 62, 399–402.
Lilliefors, H.W., 1969. On the Kolmogorov–Smirnov test for the exponential distribution with mean unknown. J. Amer. Statist. Assoc. 64, 387–389.
Noé, M., 1972. The calculation of distributions of two-sided Kolmogorov–Smirnov type statistics. Ann. Math. Statist. 43, 58–64.
Owen, A.B., 2001. Empirical Likelihood. Chapman & Hall, London.
Parr, W.C., Schucany, W.R., 1982. Minimum distance estimation and components of goodness-of-fit statistics. J. Roy. Statist. Soc. B 44, 178–189.
Pyke, R., 1959. The supremum and infimum of the Poisson process. Ann. Math. Statist. 30, 568–576.
Srinivasan, R., 1970. An approach to testing the goodness of fit of incompletely specified distributions. Biometrika 57, 605–611.
Stephens, M.A., 1969. Results from the relation between two statistics of the Kolmogorov–Smirnov type. Ann. Math. Statist. 40, 1833–1837.
Zhang, J., 2002. Powerful goodness-of-fit tests based on the likelihood ratio. J. Roy. Statist. Soc. B 64, 281–294.