Accepted Manuscript
Improved inference in dispersion models Francisco M.C. Medeiros, Silvia L.P. Ferrari, Artur J. Lemonte PII: DOI: Reference:
S0307-904X(17)30437-7 10.1016/j.apm.2017.06.045 APM 11849
To appear in:
Applied Mathematical Modelling
Received date: Revised date: Accepted date:
20 February 2017 12 June 2017 21 June 2017
Please cite this article as: Francisco M.C. Medeiros, Silvia L.P. Ferrari, Artur J. Lemonte, Improved inference in dispersion models, Applied Mathematical Modelling (2017), doi: 10.1016/j.apm.2017.06.045
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
Highlights • A Bartlett-type correction to the gradient statistic in dispersion models is derived. • The correction improves the chi-squared approximation when the sample size is finite. • A comprehensive simulation study compares several different tests in small samples.
CR IP T
• Large sample tests, including the gradient test, may perform poorly in small samples.
AC
CE
PT
ED
M
AN US
• Bartlett and Bartlett-type corrections remove size distortions with no power loss.
1
ACCEPTED MANUSCRIPT
Improved inference in dispersion models Francisco M.C. Medeiros
Silvia L.P. Ferrari∗
CR IP T
Department of Statistics, Federal University of Rio Grande do Norte, Brazil
Department of Statistics, University of S˜ao Paulo, Brazil
Artur J. Lemonte
AN US
Department of Statistics, Federal University of Rio Grande do Norte, Brazil
Abstract
M
We derive a general matrix Bartlett–type correction factor to the gradient statistic in the class of dispersion models. The correction improves the large–sample χ2 approximation to the null
ED
distribution of the gradient statistic when the sample size is finite. We conduct Monte Carlo simulation experiments to evaluate and compare the performance of various different tests, namely the usual Wald, likelihood ratio, score, and gradient tests, the Bartlett–corrected versions of the
PT
likelihood ratio, score, and gradient tests, and bootstrap–based tests. The simulation results suggest that the analytical and computational corrections are effective in removing size distortions
CE
of the type I error probability with no power loss. The impact of the corrections in two real data
AC
applications is considered for illustrative purposes.
Keywords: Bartlett correction; Bartlett–type correction; Bootstrap–based tests; Large–sample test statistics; Proper dispersion models.
∗
Corresponding author:
[email protected]
2
ACCEPTED MANUSCRIPT
1
Introduction
The Wald, likelihood ratio (LR), and score statistics, referred to as the “Holy Trinity” in the statistical literature, are often used to test hypotheses in parametric models. The gradient statistic (Terrell [1]) has received attention in the last few years (Muggeo and Lovision [2], Lemonte [3]) because it is very simple to compute (Rao [4, § 1.8]). It only involves the score vector and the maximum likelihood
CR IP T
estimates (unrestricted and restricted to the null hypothesis) of the parameter vector. Unlike the Wald and score statistics, the gradient statistic does not require the information matrix, it does not involve matrix inversion, and it shares the same first order asymptotic properties with the other three statistics (Lemonte and Ferrari [5]). The four statistics differ in their second-order asymptotic properties. In small and moderate sized samples, the approximation of the true, usually unknown, distributions of
AN US
the statistics by their χ2 asymptotic distribution may lead to considerable size distortions in the type I error probability of the tests. Bartlett and Bartlett–type corrections use second-order likelihood asymptotic results to produce modified statistics that follow more closely a χ2 distribution under the null hypothesis.
Bartlett corrections have a long history. Lawley [6], generalizing the findings of Bartlett [7], pro-
M
posed a correction factor to the LR statistic in wide generality. The factor, called Bartlett correction, reduces the approximation error of the null distribution of the statistic by χ2 distribution from O(n−1 )
ED
to O(n−2 ). Cordeiro and Ferrari [8] derived a Bartlett–type correction to the score statistic. Unlike
the Bartlett correction, it is a factor that involves a second-order polynomial in the statistic itself.
PT
Recently, Vargas et al. [9] obtained a Bartlett–type correction to the gradient statistic. Bartlett–type corrections to the Wald statistic are available only for some special models, which depend on cumu-
CE
lants of log–likelihood derivatives and may be difficult or even impossible to obtain depending on the model and the hypothesis under test. The reader is referred to Cordeiro and Cribari–Neto [10] for a
AC
detailed survey on Bartlett and Bartlett–type corrections. In recent years there has been a renewed interest in Bartlett and Bartlett–type correction fac-
tors and research papers have been published giving expressions for computing these corrections for special models. Lemonte et al. [11, 12] obtained Bartlett and Bartlett–type corrections to the LR and score statistics in the nonlinear Birnbaum–Saunders regression model, respectively. Ferrari and Uribe–Opazo [13] and Uribe–Opazo et al. [14] derived, respectively, a Bartlett correction to the LR statistic and a Bartlett–type correction to the score statistic in symmetric linear regression models,
3
ACCEPTED MANUSCRIPT
whereas Medeiros and Ferrari [15] obtained a Bartlett–type correction to the gradient statistic in this class of models and showed that their results are valid for log–symmetric linear models as well. Lagos et al. [16] derived a Bartlett–type correction to the score statistic in the Gaussian ARMA model. In generalized linear models, Cordeiro [17, 18] derived Bartlett corrections to the LR statistic, Cribari– Neto and Ferrari [19] and Cordeiro et al. [20] obtained a Bartlett–type correction to the score statistic, while Vargas et al. [21] derived a Bartlett–type correction to the gradient statistic. Other references are
CR IP T
Zucker et al. [22], Manor and Zucker [23], Lagos and Morettin [24], Tu et al. [25], van Giersbergen [26], Bai [27], Lemonte et al. [28], Fujita et al. [29], and Noma [30], among others.
Dispersion models (Jørgensen [31]) correspond to a wide class of models that allows several choices for the distribution of the response variable and a nonlinear function relation between its conditional mean and the linear predictor. Unlike generalized linear models, dispersion models en-
AN US
compass distributions outside the exponential family. An example is the von Mises distribution, which is useful for modeling directional data (Lee [32], Mardia and Jupp [33], Fisher [34]). Second-order likelihood–based inference in dispersion models has been the subject of recent papers. For example, Lemonte and Ferrari [35] derived second-order asymptotic expansions to the distribution of the Wald,
M
LR, score, and gradient statistics under a sequence of Pitman alternatives hypotheses. They showed that none of the four tests is uniformly more powerful than the others. Simas et al. [36] studied the
ED
asymptotic behavior of the probability density function of dispersion models. Simas et al. [37] derived a second-order unbiased maximum likelihood estimator, and Rocha et al. [38] derived a second-order approximation to the covariance matrix of the maximum likelihood estimator. Bartlett correction to
PT
the LR statistic and Bartlett–type correction to the score statistic were obtained by Cordeiro et al. [39] and Ferrari et al. [40], respectively. It is worth emphasizing that simulation experiments provided by
CE
Lemonte and Ferrari [35] in the class of dispersion models reveal considerable size distortions in the type I error probability for the test that uses the gradient statistic (and for the tests that use the usual
AC
Wald, LR, and score statistics as well) coupled with its null asymptotic χ2 distribution when the sample size is small or moderate. Hence, corrections are needed. This paper goes in this direction and we will provide a Bartlett–type corrected gradient statistic in the class of dispersion models. In this paper, we revisit the work of Cordeiro et al. [39] and Ferrari et al. [40] and derive a Bartlett– type correction to the gradient statistic in dispersion models. Our results generalize the findings in Vargas et al. [21], which hold only for generalized linear models. We perform Monte Carlo simulations to evaluate and compare the type I error probability and the power of the usual tests, i.e. the 4
ACCEPTED MANUSCRIPT
“Holy Trinity” plus the gradient test, and the corrected LR, score and gradient tests. Additionally, bootstrap–based tests are also included in the Monte Carlo experiments for the sake of comparison. The simulations suggest that the uncorrected tests may be markedly liberal in small and moderate sized samples, and the analytically and bootstrap corrected tests display type I error probability close to the chosen nominal level with virtually no power loss. It is worth emphasizing that we have not found any comprehensive simulation study in the statistical literature comparing the classical uncor-
CR IP T
rected and corrected large–sample tests in dispersion models. This paper fills this gap and includes the gradient test and its Bartlett–type corrected version derived here in the Monte Carlo simulation study.
The following sections are organized as follows. In Section 2, we define dispersion models and present likelihood–based inferential tools. In Section 3, we present the Bartlett correction to the LR
AN US
statistic and the Bartlett–type correction to the score statistic, and we derive a Bartlett–type correction to the gradient statistic in dispersion models. Section 4 deals with Monte Carlo simulation experiments. In Section 5 we illustrate the impact of the corrections in two real data applications. Section 6
Dispersion models, estimation, and testing
ED
2
M
closes the paper with some final remarks.
Let Y be a continuous random variable with probability density function (1)
PT
π(y; θ, φ) = exp[φt(y, θ) + a(y, φ)],
for y in a suitable subset of the real line, where t(·, ·) and a(·, ·) are known functions, θ ∈ IR and
CE
it is usually the parameter of interest, and φ > 0 is the precision parameter. The parameter θ may
be generally interpreted as a kind of location parameter, not necessarily the mean of the distribution.
AC
The class of models defined in (1) is called dispersion models (Jørgensen [31]). Some distributions in this class are presented in Table 1. The exponential family (Brown [41]), obtained from (1) with t(y, θ) = yθ − b(θ) and b(·) being a known function, is a remarkable subclass of dispersion models, where E(Y ) = µ = db(θ)/dθ. Another important subclass of (1) is the proper dispersion models,
and it is obtained by taking a(y, φ) = d1 (φ) + d(y), with d1 (·) and d(·) being known functions. The von Mises, log–gamma, and simplex distributions also belong to this subclass. The proper dispersion models have the following properties: the distribution of t = t(Y, θ) does not depend on θ, i.e., t 5
ACCEPTED MANUSCRIPT
is pivotal when φ is known; if θ is known, the proper dispersion models belong to the exponential family, with t being a canonical statistic. Table 1: t(y, θ) and a(y, θ) for some dispersion models.a Model
t(y, θ)
a(y, φ)
Normal
− 12 (y − θ)2
− 12 log(2πφ−1 )
yθ + (−2θ)1/2
Gamma
−{2[log(2πy 3 φ−1 ) + φ/y]}−1
yθ + log(−θ)
− log Γ(φ) + φ log(φy) − log y
y − θ − exp(y − θ)
φ log φ − log Γ(φ)
cos(y − θ)
log I0 (φ)
Log–gamma von Mises a
CR IP T
Inverse normal
Iν (·) is the modified Bessel function of first kind and order ν; and Γ(·) is the gamma function.
AN US
Let Y1 , . . . , Yn be n random variables with Yl having probability density function (1) with θ = θl , for l = 1, . . . , n. A regression structure is introduced in model (1) by assuming that g(θl ) = ηl = x> l β,
l = 1, . . . , n,
(2)
where x> l = (xl1 , . . . , xlp ) is a vector of known covariates associated to the l–th observation, β =
M
(β1 , . . . , βp )> is a p–vector (p < n) of unknown parameters to be estimated from the data, and g(·) is a continuous, invertible, and twice differentiable function, called the link function, which links the
ED
covariates xl to the parameter of interest θl . The precision parameter φ is unknown but the same for all observations.
PT
The log–likelihood function for the parameter vector (β, φ) takes the form
CE
` = `(β, φ) = φ
n X
t(yl , θl ) +
l=1
n X
a(yl , θl ),
(3)
l=1
(i)
−1 (i) i i where θl = g −1 (x> l β), and g (·) is the inverse of g(·). Let tl = t (yl , θl ) = ∂ t(yl , θl )/∂θl and
AC
Dil = Dil (θl , φ) = E[t(i) (Yl , θl )] for i = 1, 2, 3, 4 and l = 1, . . . , n. We assume the usual regularity conditions for large sample inference based on likelihood estimation Cox and Hinkley [42, Cap. 9]. We then have D1l = 0 and D2l = −φE[t(1) (Yl , θl )2 ]. Table 2 lists D2l , D3l , and D4l for some
distributions in the class of dispersion models.
Let X = (x1 , . . . , xn )> be the n × p design matrix of full column rank (rank(X) = p). The
score vector and the Fisher information matrix for β are given, respectively, by Uβ = φX > t(1) and (1)
(1)
Kβ = φX > W X, where t(1) = (t1 , . . . , tn )> and W = W (β, φ) = diag{w1 , . . . , wn } with 6
ACCEPTED MANUSCRIPT
Table 2: D2l , D3l , and D4l for some dispersion models. Model
D2l
D3l
D4l
Normal
−1
0
0
−(−2θl )−3/2
−3(−2θl )−5/2
−15(−2θl )−7/2
−1
1
−1
−I1 (φ)/I0 (φ)
0
I1 (φ)/I0 (φ)
Inverse normal Gamma Log–gamma von Mises
2/θl3
−6/θl4
CR IP T
−1/θl2
wl = −D2l (dθl /dηl )2 . The maximum likelihood estimate of β = (β1 , . . . , βp )> , βb = (βb1 , . . . , βbp )> ,
comes from the system of equations Uβ = 0, which does not have explicit solution except for normal models. From the Fisher scoring method, the maximum likelihood estimate of β can be obtained by
AN US
iteratively solving the system of equations X > W (m) Xβ (m+1) = X > W (m) y ∗(m) ,
m = 0, 1, . . . ,
−1 −1 where y ∗(m) = Xβ (m) + N (m) t(1)(m) with N = −diag{D21 dη1 /dθ1 , . . . , D2n dηn /dθn }, and m is
the iteration counter. Note that βb depends on the distribution assumed for the data through D2l , and b = 0, [t(yl , θbl ) + a(1) (yl , φ)]
ED
n X
M
b comes from the equation does not depend on φ. The maximum likelihood estimate of φ, φ, l=1
(4)
where a(1) (yl , φ) = da(yl , θl )/dφ. It can be shown that E[∂ 2 `(β, φ)/∂β∂φ] = 0 and, hence, the
PT
Fisher information matrix is block–diagonal, i.e. β and φ are globally orthogonal in the sense of Cox
CE
(1) b and Reid [43]. In particular, for the proper dispersion models, equation (4) reduces to d1 (φ) = Pn (1) − l=1 t(yl , θbl )/n, where d1 (φ) = dd1 (φ)/dφ, and the Fisher information matrix for the parameter
vector (β, φ) becomes K = K(β, φ) = diag{φX > W X, −nd2 d1 (φ)/dφ2 }. (0)
AC
Consider the problem of testing the null hypothesis H0 : β1 = β1 against the alternative hypoth(0)
esis H1 : β1 6= β1 , where β is partitioned as β = (β1> , β2> )> , with β1 = (β1 , . . . , βq )> being the (0)
vector of parameters of interest, and β2 = (βq+1 , . . . , βp )> . Here, β1 is a fixed vector of dimension
q and β2 and φ act as nuisance parameters. The partition of β induces the following partitions: X1> t(1) X1> W X1 X1> W X2 , Kβ = φ , Uβ = φ X2> t(1) X2> W X1 X2> W X2 7
ACCEPTED MANUSCRIPT
where the matrix X is partitioned as X = X1 X2 , with X1 being an n × q matrix, and X2 being an n × (p − q) matrix. The Wald, LR, score, and gradient statistics for testing H0 against H1 are given, respectively, by
b βb1 − β (0) )> R b >W c R( b βb1 − β (0) ), SW = φ( 1 1 b − `(β (0) , βe2 , φ) e , SLR = 2 `(βb1 , βb2 , φ) 1 f 1/2 X1 (βb1 − β (0) ), ST = φe1/2 se> W 1
CR IP T
f R) e −1 X > W f 1/2 se, f 1/2 X1 (R e >W SR = se> W 1
(1)
where R = X1 − X2 (X2> W X2 )−1 X2> W X1 , s = (s1 , . . . , sn )> with sl = φ1/2 tl (−D2l )−1/2 , b is the unrestricted maximum likelihood estimator of (β1 , β2 , φ), and (β (0) , βe2 , φ) e is the (βb1 , βb2 , φ) 1
AN US
restricted maximum likelihood estimator of (β1 , β2 , φ) obtained by imposing that the null hypothesis
holds. “Hat” and “tilde” indicate evaluation at the unrestricted and restricted maximum likelihood estimator of (β1 , β2 , φ), respectively. Under the null hypothesis H0 , the limiting distribution of the four statistics is χ2q with approximation error of order O(n−1 ); see, for instance, Sen and Singer [44]
and Lemonte [3]. The null hypothesis is rejected for a given nominal level, α say, if the observed
Improved tests
ED
3
M
value of the test statistic exceeds the upper 100(1 − α)% quantile of the χ2q distribution.
Henceforth, we assume that a(y, φ) = d1 (φ) + d(y) in (1). The Bartlett correction to the LR statistic
PT
and the Bartlett–type correction to the score statistic in proper dispersion models are given, respectively, in Cordeiro et al. [39] and Ferrari et al. [40], and will be revisited in this section. Also, from
CE
the general results in Vargas et al. [9], we will derive a general Bartlett–type correction factor to the gradient statistic. This result is new and represents an additional tool to perform testing inference
AC
in proper dispersion models. Further, we also consider the alternative testing procedure based on bootstrap.
To define the corrected LR and score statistics as well as to derive Bartlett–type corrections for the
gradient statistic in dispersion models, some additional notation is in order. We define the n × n matrices: Z = X(X > W X)−1 X > = ((zlc )), Z2 = X2 (X2> W X2 )−1 X2> = ((z2,lc )) when q < p and
Z2 = 0 when q = p, Zd = diag{z11 , . . . , znn }, Z2d = diag{z2,11 , . . . , z2,nn }, F = diag{f1 , . . . , fn },
8
ACCEPTED MANUSCRIPT
G = diag{g1 , . . . , gn }, E = diag{e1 , . . . , en }, J = diag{j1 , . . . , jn }, R = diag{r1 , . . . , rn },
CR IP T
H = diag{h1 , . . . , hn }, S = diag{s∗1 , . . . , s∗n }, Q = diag{q1 , . . . , qn }, where 3 3 dθl d2 θl dθl dθl d2 θl dθl 0 fl = − D2l − D3l , gl = − D2l , el = − D2l , 2 2 dηl dηl dηl dηl dηl dηl 4 dθl 00 0 (2) 2 , jl = −3D4l + 8D3l − 6D2l + 3φD2l − 3φD2l dηl " # 4 2 2 2 dθl dθl d θl 0 d2 θl dθl d3 θl 00 rl = D2l + 5 + D2l , D +2 dηl dηl dηl2 2l dηl2 dηl dηl3 4 dθl 00 0 (2) 2 hl = −2D4l + 6D3l − 5D2l + 2φD2l − 2φD2l dηl 2 2 dθl d θl 0 + 2D3l − 3D2l , dηl dηl2
AN US
# 2 2 2 3 2 dθl dθ d θ d θ d θl l l l ∗ + D2l + 3 D3l sl = 3 2 3 dηl dηl dηl dηl dηl2 4 2 2 dθl dθl d θl 0 0 + D3l + 3 D , dηl dηl dηl2 2l " # 4 2 2 2 dθl dθl d θl d2 θl dθl d3 θl ql = D4l + 6 D3l + 3 +4 D2l , dηl dηl dηl2 dηl2 dηl dηl3
M
"
0
00
(j)
0
(2)
ED
where Dil = ∂Dil /∂θl , Dil = ∂Dil /∂θl , and Dil = E[(∂ i t(Yl , θl )/∂θli )j ] for i, j = 1, 2, 3, 4. Let Z (2) = Z Z, Z2 = Z2 Z2 , Z (3) = Z (2) Z, etc., where “ ” denotes the element–wise product
PT
of two matrices (Hadamard product). The matrices φ−1 Z and φ−1 Z2 are the asymptotic covariance matrices of X βb and X2 βe2 , respectively.
(0)
CE
Cordeiro et al. [39] derived a Bartlett correction to the LR statistic for testing H0 : β1 = β1
in proper dispersion models by using the general results in Lawley [6]. The Bartlett-corrected LR
AC
statistic is given by
∗ SLR = SLR (1 − aLR ),
where aLR = (ALR + ALR,βφ )/q, with 1 > 1 (3) (2) (2) 1n G(Z (3) − Z2 )(F + G)1n + 1> [Q + 4(R − S)](Zd − Z2d )1n 3φ 4φ n 1 > (3) 1n F [2(Z (3) − Z2 ) + 3(Zd ZZd − Z2d Z2 Z2d )]F 1n + 12φ
ALR = −
(3)
(3) + φ−1 1> − Z2 ) + (Zd ZZd − Z2d Z2 Z2d )](E − F + G)1n , n (E + G)[(Z
9
(5)
ACCEPTED MANUSCRIPT
" # q d(2) + d(3) 2p − q = , − 2n d2(2) 2d(2)
ALR,βφ
00
where 1n = (1, . . . , 1)> is an n-dimensional vector of ones, d(2) = d(2) (φ) = φ2 d1 (φ) and d(3) = 000
00
0
000
00
d(3) (φ) = φ3 d1 (φ), with d1 (φ) = dd1 (φ)/dφ and d1 (φ) = dd1 (φ)/dφ. Under the null hypothesis, ∗ has an asymptotic χ2q distribution with approximation error reduced from O(n−1 ) to O(n−2 ). SLR
(0)
Ferrari et al. [40] derived a Bartlett–type correction to the score statistic for testing H0 : β1 = β1
corrected score statistic is
CR IP T
in proper dispersion models by taking into account the general results of Cordeiro and Ferrari [8]. The SR∗ = SR 1 − cR + bR SR + aR SR2 ,
(6)
where aR = AR3 /[12q(q + 2)(q + 4)], bR = (AR22 − 2AR3 )/[12q(q + 2)], cR = (AR11 − AR22 +
AN US
AR3 )/(12q), with AR11 = AR1 + AR1,βφ , AR22 = AR2 + AR2,βφ ,
AR1 = 3φ−1 1> n (2E + 2G − F ) Z2d (Z − Z2 )Z2d (2E + 2G − F ) 1n − 6φ−1 1> n (2E + 2G − F ) Z2d Z2 (Z − Z2 )d (2F − 2G − 3E) 1n (2)
− 6φ−1 1> n (3F − 2G − 4E) [Z2 (Z − Z2 )] (2E + 2G − F ) 1n
M
− 6φ−1 1> n H(Z − Z2 )d Z2d 1n ,
ED
AR2 = 6φ−1 1> n (2E + 2G − F ) Z2d (Z − Z2 )(Z − Z2 )d (2F − 2G − 3E) 1n − 3φ−1 1> n (2F − 2G − 3E) [(Z − Z2 )d Z2 (Z − Z2 )d (2)
PT
+ 2(Z − Z2 )d Z2 ] (2F − 2G − 3E) 1n (2)
CE
+ 3φ−1 1> n J (Z − Z2 )d 1n ,
AC
AR3 = φ−1 1> n (2F − 2G − 3E) [3(Z − Z2 )d (Z − Z2 )(Z − Z2 )d + 2(Z − Z2 )(3) ] (2F − 2G − 3E) 1n ,
AR1,βφ =
6 ˙ (Z − Z2 )}] [d(3) − (p − q − 2)d(2) ][q − φtr{W nd2(2) +
6φ ˙ )(Z − Z2 )W ˙ Z2 } [2tr{(2W − φW nd(2) ¨ (Z − Z2 )}], + φtr{W
10
ACCEPTED MANUSCRIPT
AR2,βφ =
3 h ˙ (Z − Z2 )})2 (−q + φtr{W nd(2) +
i ˙ (Z − Z2 )W (Z − Z2 )}) + 2(q − 2φtr{W 6φ2 ˙ (Z − Z2 )W ˙ (Z − Z2 )}, tr{W nd(2)
where the notation (·)d indicates that the off–diagonal elements are zero, tr{·} is the trace operator,
CR IP T
˙ = diag{w˙ 1 , . . . , w˙ n } and W ¨ = diag{w¨1 , . . . , w¨n } with w˙ l = dwl /dφ and w¨l = dw˙ l /dφ. In and W ˙ =W ¨ = 0. The correction factor [1 − (cR + exponential family nonlinear models, it follows that W
bR SR + aR SR2 )] in (6) reduces the approximation error of the asymptotic χ2q distribution of SR∗ from O(n−1 ) to O(n−3/2 ).
Vargas et al. [9] obtained a general formula for a Bartlett–type correction to the gradient statistic in
AN US
regular parametric models. Their formula may, in principle, be particularized to specific models. For instance, Vargas et al. [21] obtained a Bartlett-type correction to the gradient statistic in generalized linear models.
From the general formula in Vargas et al. [9] we now derive a Bartlett–type correction to the gradient statistic in proper dispersion models. Let κrs = E(∂ 2 `/∂βr ∂βs ), κrst = E(∂ 3 `/∂βr ∂βs ∂βt ), (t)
(tu)
M
κrstu = E(∂ 4 `/∂βr ∂βs ∂βt ∂βu ), κrs = ∂κrs /∂βt , κrs = ∂ 2 κrs /∂βt ∂βu , κφφ = E(∂ 2 `/∂φ2 ), κrφ = (s)
E(∂ 2 `/∂βr ∂φ), κrφ = ∂ 2 κrφ /∂βs , and so on. The indices r, s, t, and u range from 1 to p. Note dispersion models, we have n X
κrstu = φ
l=1 n X
AC
κ(tu) rs = φ
κrsφ = − (φ)
κrsφ = −
l=1 n X
l=1 n X
wl xlr xls , w˙ l xlr xls ,
ql xlr xls xlt xlu ,
(u)
κrst = φ
κ(t) rs
n X
sl xlr xls xlt xlu ,
l=1
n X = −φ (e + 2g)l xlr xls xlt , l=1
rl xlr xls xlt xlu ,
l=1 n X
n X l=1
(f + 2g)l xlr xls xlt ,
CE
κrst = −φ
wl xlr xls ,
PT
κrs = −φ
ED
that κrs , κrst , κrstu , κφφ , κrφ , κrsφ and κrsφφ are cumulants of the log-likelihood function. In proper
κ(φ) rs
κφφ = nd(2) /φ2 , =−
n X
(φ)
κφφφ = κφφ = nd(3) /φ3 ,
(7)
(wl + φw˙ l )xlr xls ,
l=1
(s)
κrsφφ = κrφφ = κrφφ = 0.
l=1
From Vargas et al. [9], the Bartlett–type corrected gradient statistic takes the form ST∗ = ST 1 − cT + bT ST + aT ST2 , 11
(8)
ACCEPTED MANUSCRIPT
where aT = AT3 /[12q(q + 2)(q + 4)], bT = (AT22 − 2AT3 )/[12q(q + 2)], cT = (AT11 − AT22 +
AT3 )/(12q), with AT11 = AT1 + AT1,βφ and AT22 = AT2 + AT2,βφ . By replacing the cumulants given in (7) in the expressions of Theorem 1 in Vargas et al. [9], and after some algebra, we arrive at1
(2)
AT1 = 3φ−1 1> n (F + 2G)[Z2d (Z − Z2 )Z2d + 2Z2 (Z − Z2 )+ (2)
CR IP T
+ 2(Z − Z2 )d Z2 Z2d + Z2d (Z − Z2 )(Z − Z2 )d ](F + 2G)1n
(2) − 6φ−1 1> − Z2 ) + (Z − Z2 )d (ZZd + Z2 Z2d ) n (F + 2G)[(Z + Z2 ) (Z (2)
+ 2Z2d (ZZd − Z2 Z2d ) + 2Z2 (Z − Z2 )](E + 2G)1n (3)
(3) + 12φ−1 1> − Z2 ](E + 2G)1n n (E + 2G)[Zd ZZd − Z2d Z2 Z2d + Z
(2)
(2)
+ 12φ−1 1> n R(Zd − Z2d )1n ,
3 (Z − Z2 )d (Z − Z2 )(Z − Z2 )d + 2Z2 (Z − Z2 )(2) 4 1 (3) + (Z − Z2 ) + Z2d (Z − Z2 )(Z − Z2 )d + (Z − Z2 )d Z2 (Z − Z2 )d (F + 2G)1n 2 −3φ−1 1> n (F
+ 2G)
M
AT2 =
AN US
−1 > − 6φ−1 1> n S(Z − Z2 )d (Zd + 3Z2d )1n + 6φ 1n Q(Z − Z2 )d Z2d 1n
(2)
(2) + 6φ−1 1> − Z2 ) + (Z − Z2 )d (ZZd − Z2 Z2d )](E + 2G)1n n (F + 2G)[(Z − Z2 ) (Z (2)
3 1 (3) + 2G) (Z − Z2 )d (Z − Z2 )(Z − Z2 )d + (Z − Z2 ) (F + 2G)1n , 4 2
PT
AT3 =
φ−1 1> n (F
ED
+ 3φ−1 1> n (2S − Q)(Z − Z2 )d 1n ,
AT1,βφ =
6q [d(3) + (2 − p + q)d(2) ], nd2(2)
AT2,βφ =
3q(q + 2) . nd(2)
CE
The correction factor [1−(cT +bT ST +aT ST2 )] in (8) reduces the approximation error of the asymptotic χ2q distribution of ST∗ from O(n−1 ) to O(n−3/2 ).
AC
A brief commentary on the quantities that define the improved statistics is in order. The coeffi-
cients ALR , AR ’s and AT ’s are functions of the model matrix X (through Z and Z2 ), the parameter φ, and the second and third order derivatives of the link function. The terms ALR,βφ and AT1,βφ depend on the number of regression parameters (p) and the number of parameters of interest (q), while AR1,βφ and AR2,βφ are functions of p, q and the model matrix, and the term AT2,βφ is function of the number of parameters of interest. Unfortunately, they are not easy to interpret in generality and provide no 1
The detailed derivation of the formulas for the A’s is omitted but may be requested to the authors.
12
ACCEPTED MANUSCRIPT
indication as to what structural aspects of the model contribute significantly to their magnitude. It is noteworthy that the A’s can be easily implemented in computational programming languages with support for matrix operations, such as Ox (Doornik [45]), R (R Core Team [46]), and MAPLE (Rafter et al. [47]). Finally, all unknown parameters in the quantities that define the improved statistics are replaced by their restricted maximum likelihood estimates. The improved LR, score, and gradient tests that employ (5), (6), and (8), respectively, as test statistics, follow from the comparison of the
CR IP T
∗ , SR∗ , and ST∗ with the critical value obtained as the appropriate χ2q quantile. observed values of SLR
For generalized linear models, where t(y, θ) = yθ −b(θ) in equation (1), we have g(db(θl )/dθl ) = 0
00
−1 −2 −2 2 −1 2 2 g(µl ) = ηl = x> l β, with D2l = −Vl , D2l = Vl dVl /dµl , D2l = Vl [d Vl /dµl −2Vl (dVl /dµl ) ], (2)
0
D2l = Vl−2 + Vl−3 (dVl /dµl )2 /φ, D3l = 2Vl−2 dVl /dµl , D3l = 2Vl−2 [d2 Vl /dµ2l − 2Vl−1 (dVl /dµl )2 ] and D4l = 3Vl−2 [d2 Vl /dµ2l − 2Vl−1 (dVl /dµl )2 ]. Additionally,
3
dµl dηl
s∗l = 9λ1l − 3λ2l − 3λ3l − 4λ4l + 2λ5l ,
dµl dηl
3
,
hl = λ1l + λ5l ,
M
jl = λ4l + λ5l ,
PT
ED
2 2 2 dµl d µl 1 d 2 µl 1 dµl d3 µl , λ = , λ = , 2l 3l dηl dηl2 Vl dηl2 Vl dηl dηl3 2 4 4 1 dVl dµl 1 d2 Vl dµl λ4l = 3 , λ5l = 2 2 . Vl dµl dηl Vl dµl dηl
1 dVl λ1l = 2 Vl dµl
,
1 dVl gl = 2 Vl dµl
rl = 5λ1l − 2λ2l − 2λ3l − 2λ4l + λ5l ,
ql = 12λ1l − 3λ2l − 4λ3l − 6λ4l + 3λ5l , where
AN US
1 dµl d2 µl 1 dVl fl = −2 2 2 Vl dηl dηl Vl dµl
By replacing these formulas in (5), (6), and (8), and after some algebraic manipulations, we obtain the
CE
Bartlett correction to the LR statistic given by Cordeiro [18], the Bartlett–type correction to the score statistic derived by Cribari–Neto and Ferrari [19], and the Bartlett–type correction to the gradient
AC
statistic obtained by Vargas et al. [21].2 Another way of improving the size properties of tests is achieved by the bootstrap technique. The
bootstrap corrected tests are performed as follows. Let S denote any of the uncorrected statistics. First, one generates B bootstrap samples from the assumed model with the parameters replaced by restricted estimates computed using the original sample, under H0 , i.e. imposing the restrictions stated 2
In the formula for bl in Vargas et al. [21, p. 113], λ3l should be replaced by λ5l ; see the expression for b in Cribari–
Neto and Ferrari [19, p. 428].
13
ACCEPTED MANUSCRIPT
in the null hypothesis. Second, for each pseudo–sample, one computes the statistic; let S b denote the statistic for the b–th sample, b = 1, . . . , B. Third, the 1 − α percentile of S is estimated by qb1−α , such
that #{S b ≤ qb1−α }/B = 1 − α. Finally, one rejects the null hypothesis if S > qb1−α . Alternatively, the test may be based on the bootstrap p–value given by pb = #{S b ≥ S}/B. We denote the bootstrap
b b corrected test statistics by SW , SLR , SRb and STb . For a good discussion of bootstrap tests, see Efron
and Tibshirani [48, Ch. 16]. Also, it is possible to prove second-order asymptotic results on the
constructed using bootstrap estimate of the critical point is O(n−2 ).
CR IP T
bootstrap procedure. In fact, Carpenter [49] proved that the coverage error of LR confidence intervals
The corrections presented in this section theoretically reduce the order of approximation of the
true, unknown distribution of the test statistics by the limiting χ2 distribution under the null hypothesis. In the next section, we numerically compare the size and power of the tests that employ the
AN US
∗ , SR∗ and ST∗ using Monte uncorrected statistics SW , SLR , SR , and ST , and the corrected statistics SLR
Carlo simulations. Bootstrap–based tests are also included in the simulation study for comparison.
Monte Carlo simulations
M
4
We performed some Monte Carlo simulation experiments to investigate the size and power perfor-
ED
mances of the usual Wald, LR, score, and gradient tests, and the tests that use the corrected statistics ∗ , SR∗ , and ST∗ in small and moderate sized samples in dispersion models. We also included bootSLR b b strap tests (SW , SLR , SRb , and STb ) in the simulations for the sake of comparison. The simulations were
PT
carried out using the Ox matrix programming language, which is freely distributed for academic purposes and is available at http://www.doornik.com. All maximizations of log–likelihood func-
CE
tion were performed using the Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi–Newton method with analytical first derivatives through the MaxBFGS subroutine.
AC
The null hypothesis under test is H0 : β1 = · · · = βq = 0. We computed the type I error
probability of the tests that use the above statistics for the nominal levels α = 10%, 5%, and 1%;
i.e., we computed the proportion of the times that the test statistic exceeds the asymptotic critical value in 10, 000 Monte Carlo replicates. We considered different values for the number of regression parameters (p), the number of parameters being tested (q), and the sample sizes (n = 20, 25, and 30). We used B = 600 replicates for the bootstrap–based tests. The results are presented in Tables 3 and 4. Entries are percentages. 14
ACCEPTED MANUSCRIPT
First, we consider the von Mises model, which is the standard model for circular data; see Carta et al. [50] and Mardia and Jupp [33]. Let π(y; θ, φ) =
exp[φ cos(y − θ)] , 2πI0 (φ)
y ∈ (−π, π),
(9)
where θ ∈ (−π, π), φ > 0 is the precision parameter, and Iν (·) denotes modified Bessel function of
first kind and order ν. This distribution is symmetric around y = θ, and the larger the value of φ,
CR IP T
the greater the concentration around θ. Model (9) is a special case of (1) when t(y, θ) = cos(y − θ)
and a(y, φ) = − log(I0 (φ)) − log(2π), and hence it is a proper dispersion model. The systematic
component of the model is
tan(θl /2) = ηl = β0 + β1 xl1 + · · · + βp−1 xl,p−1 ,
l = 1, . . . , n,
AN US
and hence θl = 2 arctan(ηl ). The covariates were taken as random draws of the uniform distribution U(−0.5; 0.5) and were kept fixed in the Monte Carlo replicates. Also, the regression parameters,
except for those fixed at the null hypothesis, were set at one. The precision parameter was fixed at φ = 4.
Tables 3 and 4 indicate that the usual Wald and LR tests are the most liberal (rejecting the null
M
hypothesis more frequently than expected based on the selected nominal level). For instance, for p = 6, n = 25, and α = 5%, the null rejection rates are, respectively, 14.06% and 11.84% (q = 3),
ED
12.71% and 11.23% (q = 2), and 10.33% and 9.24% (q = 1); see Table 4. The rejection rates of the LR test, although being smaller than those of the Wald test, are considerably bigger than those of the
PT
usual score and gradient tests. For the example just mentioned, the rejection rates of the score and gradient tests are 6.70% and 7.03% (q = 3), 8.27% and 8.59% (q = 2), and 7.79% and 8.04% (q = 1), respectively.
CE
When the number of parameters being tested decreases keeping the number of regression parameters fixed, the null rejection rates of the tests that employ SW and SLR tend to get smaller, and the
AC
opposite is observed for the tests that use SR and ST . For instance, for p = 4, n = 20, and α = 10%, the rejection rates of the Wald and LR tests are 20.70% (SW ) and 16.98% (SLR ) for q = 3, 17.13% (SW ) and 16.59% (SLR ) for q = 2, and 15.72% (SW ) and 15.43% (SLR ) for q = 1, while the rejection rates for the tests that employ SR and ST are 11.49% (SR ) and 11.48% (ST ) for q = 3, 12.92% (SR ) and 13.47% (ST ) for q = 2, and 14.16% (SR ) and 14.33% (ST ) for q = 1; see Table 3. The tests that use the corrected statistics and the bootstrap tests present null rejection rates close to the nominal levels. It is also noteworthy that the modified tests are influenced neither by the number 15
ACCEPTED MANUSCRIPT
of regression parameters nor by the number of parameter under test. For example, for p = 6, n = 20, ∗ b and α = 5%, the null rejection rates are 4.82% (SLR ), 4.79% (SR∗ ), 4.85% (ST∗ ), 4.78% (SW ), 4.84% b ∗ b ), ), 5.34% (SR∗ ), 5.19% (ST∗ ), 4.80% (SW ), 4.91% (SRb ) and 5.10% (STb ) when q = 3; 4.91% (SLR (SLR ∗ b ), 5.21% (SR∗ ), 5.05% (ST∗ ), 5.07% (SLR ), 5.12% (SRb ), and 5.10% (STb ) when q = 2; and 4.79% (SLR b b 4.72% (SW ), 4.88% (SLR ), 4.91% (SRb ), and 4.85% (STb ) when q = 1; see Table 4.
The second simulation experiment deals with the log–gamma dispersion model, which is often
CR IP T
applied in survival data analysis; see Ortega et al. [51], Ortega et al. [52], and Lawless [53, Cap. 5]. We have
π(y; θ, φ) = exp{φ[y − θ − exp(y − θ)] + φ log φ − log Γ(φ)},
y > 0,
(10)
where θ ∈ IR and φ > 0. The density function in (10) is a special case of (1) with t(y, θ) =
AN US
y − θ − exp(y − θ) and a(y, θ) = φ log(φ) − log(Γ(φ)), and hence the model is a proper dispersion model. Let
g(θl ) = ηl = θl = β0 + β1 xl1 + · · · + βp−1 xl,p−1 ,
l = 1, . . . , n.
The covariates were drawn from the uniform distribution U(0; 1) and were kept fixed in the Monte
Carlo replicates. The regression parameters, except for those in the fixed in the null hypothesis, are
M
set equal to one, and φ = 3.
The simulation findings for the log–gamma regression model are qualitatively similar to those
ED
reported for the von Mises regression model (results not shown to save space). Again, the corrected tests and the bootstrap tests are clearly much less size distorted than the original tests. The Bartlett
PT
and Bartlett–type corrections as well as the bootstrap technique are remarkably effective in correcting the size distortions of type I error of the large sample tests when the sample size is small or moderate.
CE
We also performed simulation experiments to investigate the power of the tests. As the simulation results above show, the tests have different sizes when one uses their asymptotic χ2 distribution in
AC
small– and moderate–sized samples. In evaluating the power of these tests, it is important to ensure that they all have the correct size under the null hypothesis. To overcome this difficulty, we first ran 500, 000 Monte Carlo replicates to estimate the exact critical value for each test (except for the bootstrap tests). So, the exact critical value guarantees that all tests have the same type I error, which allows us to compare their powers. Figure 1 displays the power curves for the tests of H0 : β1 = β2 =
0 in von Mises and log–gamma models with n = 30, p = 3, q = 2, and α = 10%. The power of
the tests were calculated under the alternative hypothesis H1 : β1 = β2 = δ for a grid of values for 16
ACCEPTED MANUSCRIPT
20
2
25
30
SR∗
ST∗
b SW
b SLR
SRb
STb
10
20.70
16.98
11.49
11.48
10.38
10.25
10.18
10.00
10.34
10.50
10.57
5
13.72
10.05
5.06
4.89
5.18
4.86
4.86
5.02
5.25
5.31
5.36
1
5.46
2.66
0.52
0.44
0.98
0.78
0.80
1.04
1.05
1.13
1.03
10
18.52
14.85
10.65
10.26
9.75
9.63
9.66
9.93
9.72
9.73
9.85
5
11.71
8.43
4.77
4.50
4.84
4.74
4.65
4.84
4.86
4.90
4.99
1
4.23
2.05
0.47
0.39
0.79
0.65
0.65
0.79
0.84
0.85
0.85
10
16.77
14.28
10.76
10.74
10.09
9.87
9.96
10.19
10.09
10.03
10.09
5
10.25
7.65
4.91
4.82
4.93
4.78
4.82
4.99
4.98
4.95
5.05
1
3.29
2.00
0.83
0.72
1.10
0.96
0.97
1.06
1.19
1.09
1.16
10
17.13
16.59
12.92
13.47
9.74
10.04
9.75
9.64
9.85
10.01
9.83
5
10.46
9.43
5.99
6.49
4.86
4.71
4.67
4.69
4.92
4.86
4.88
1
3.49
2.63
0.94
1.08
1.01
0.93
0.90
1.04
1.14
1.12
1.16
10
17.05
15.38
12.68
12.96
10.45
10.62
10.55
10.50
10.51
10.63
10.57
5
10.51
8.96
6.24
1
3.70
2.50
1.01
10
15.02
13.97
11.64
5
8.91
7.79
1
2.91
10
6.41
5.30
5.13
5.25
5.36
5.42
5.22
5.45
1.09
1.15
0.99
1.08
1.29
1.21
1.22
1.27
12.11
10.05
9.97
10.17
10.03
10.06
9.97
10.17
5.91
6.04
5.09
5.10
5.08
5.13
5.12
5.25
5.18
2.13
1.05
1.08
1.11
1.03
1.05
1.30
1.19
1.20
1.20
15.72
15.43
14.16
14.33
10.12
10.53
10.40
10.09
10.19
10.21
10.26
5
9.21
9.08
7.68
7.75
5.08
5.24
5.31
4.94
5.21
5.23
5.33
1
2.77
2.47
1.33
1.42
0.96
0.86
0.96
0.95
1.04
1.02
1.09
10
14.35
13.63
12.33
12.65
9.84
9.92
9.97
9.69
9.90
9.84
9.90
5
8.34
7.58
6.43
6.65
4.93
5.05
4.99
4.97
4.99
5.03
5.03
1
2.46
2.08
1.27
1.14
0.94
0.94
0.93
1.02
0.99
1.02
0.97
10
13.88
13.71
12.75
13.02
10.14
10.21
10.20
9.97
10.21
10.16
10.22
5
7.64
7.42
6.43
6.64
5.01
5.06
5.05
5.04
5.07
5.04
5.11
1
2.16
2.00
1.24
1.32
0.83
0.93
0.89
0.93
0.97
1.02
0.98
AC
20
∗ SLR
1
25
30
CR IP T
30
ST
AN US
25
SR
M
3
SLR
ED
20
SW
α(%)
PT
n
CE
q
Table 3: Null rejection rates (%) of H0 : β1 = · · · = βq = 0 with p = 4; von Mises model.
17
ACCEPTED MANUSCRIPT
20
2
25
30
SR∗
ST∗
b SW
b SLR
SRb
STb
10
26.37
22.68
15.20
15.77
9.90
10.29
10.29
9.77
9.92
10.09
10.06
5
18.27
14.29
7.02
7.41
4.82
4.79
4.85
4.78
4.84
4.91
5.10
1
7.95
4.62
0.89
1.05
1.01
0.81
0.77
1.12
1.11
1.15
1.13
10
21.85
19.53
14.13
14.76
10.02
10.35
10.43
9.97
10.03
10.21
10.35
5
14.06
11.84
6.70
7.03
4.90
4.82
4.81
4.66
4.93
4.93
4.99
1
5.06
3.27
0.94
1.09
0.93
0.87
0.93
1.00
1.07
1.08
1.09
10
20.15
17.86
13.47
14.01
10.19
10.52
10.36
10.24
10.14
10.44
10.34
5
12.80
10.29
6.64
7.09
5.49
5.32
5.43
5.36
5.56
5.44
5.55
1
4.86
3.33
1.03
1.12
1.11
0.94
0.93
1.18
1.24
1.16
1.17
10
24.02
20.96
16.21
16.82
9.96
10.36
10.56
10.15
10.10
9.86
10.27
5
16.43
13.28
8.35
8.98
4.91
5.34
5.19
4.80
5.07
5.12
5.10
1
6.96
4.41
1.35
1.51
0.89
0.85
0.92
0.93
0.96
1.02
1.07
10
19.80
18.51
15.61
16.04
10.39
10.83
10.72
10.15
10.40
10.53
10.56
5
12.71
11.23
8.27
1
4.75
3.31
1.66
10
17.83
16.23
13.81
5
11.47
9.59
1
4.13
10
8.59
5.24
5.25
5.26
5.13
5.27
5.18
5.22
1.86
1.18
1.13
1.17
1.20
1.32
1.35
1.40
13.96
9.99
10.26
10.10
9.93
10.03
10.08
10.01
7.00
7.30
5.15
4.94
5.11
5.17
5.18
4.89
5.13
2.95
1.53
1.50
1.21
1.19
1.10
1.25
1.29
1.29
1.26
19.04
17.84
15.90
16.42
9.59
10.14
9.93
9.67
9.69
9.75
9.77
5
12.01
10.92
8.90
9.23
4.79
5.21
5.05
4.72
4.88
4.91
4.85
1
4.37
3.67
1.99
2.14
0.87
0.90
0.94
0.87
1.01
0.94
1.00
10
16.81
15.98
14.61
15.02
9.74
10.13
10.13
9.83
9.72
9.84
9.84
5
10.33
9.24
7.79
8.04
4.78
5.12
4.94
4.79
4.81
4.90
4.84
1
3.39
2.71
1.68
1.74
0.98
1.08
1.07
1.00
1.09
1.12
1.13
10
15.28
14.76
13.53
13.81
10.08
10.25
10.18
10.01
10.07
10.07
10.16
5
9.35
8.74
7.45
7.75
5.08
5.28
5.26
5.07
5.14
5.17
5.26
1
2.81
2.35
1.49
1.63
0.88
0.91
0.92
0.72
0.95
0.97
0.97
AC
20
∗ SLR
1
25
30
CR IP T
30
ST
AN US
25
SR
M
3
SLR
ED
20
SW
α(%)
PT
n
CE
q
Table 4: Null rejection rates (%) of H0 : β1 = · · · = βq = 0 with p = 6; von Mises model.
18
ACCEPTED MANUSCRIPT
δ. The power of all tests are similar and, as expected, as |δ| increases, the power tends to 1. Power
−1.5
−1.0
−0.5
0.0
0.5
1.0
1.5
−1.5
−1.0
−0.5
0.0
0.5
1.0
1.5
δ
AN US
δ
SW SLR SR ST S*LR S*R S*T SbW SbLR SbR SbT
CR IP T
0.6 power 0.4 0.0
α = 0.1
0.0
α = 0.1
0.8
1.0
0.2
0.4
power
0.6
SW SLR SR ST S*LR S*R S*T SbW SbLR SbR SbT
0.2
0.8
1.0
simulations for different values of n, p, q, φ, and α presented comparable results.
Figure 1: Power curves for the von Mises (left) and log–gamma (right) dispersion models.
In summary, the simulation study reveals that the Bartlett and Bartlett–type corrections as well
M
as the bootstrap method are effective in correcting the size distortion of the tests with no power loss. The Wald, LR, score, and gradient tests are liberal, i.e. they wrongly reject the null hypothesis more
ED
frequently than expected from the nominal level. Among all tests considered in the study, the usual Wald test is undoubtedly the most liberal. As earlier remarked, there is no general Bartlett–type
PT
correction available for the Wald test and hence the bootstrap method is a suitable option to correct its liberal behavior in small and moderate sized samples. For the other tests, the analytical corrections
Real data applications
AC
5
CE
are attractive because they are not computationally costly, unlike the bootstrap.
The first application deals with the data set presented in Fisher [34] on the distance travelled by 31 small blue periwinkles (Nodilittorina unifasciata) after they have moved down–shore from the height at which they normally live. Following Fisher and Lee [54], we assume a von Mises model for the animals’ path with constant precision parameter and link function tan(θl /2) = ηl = β0 + β1 xl , 19
l = 1, . . . , 31,
ACCEPTED MANUSCRIPT
where θl is the mean direction for a given distance moved xl (cm). The angular responses were transformed to the range (−π, π). Lemonte and Ferrari [35] considered the problem of testing H0 : (0)
(0)
β1 = 0 and H0 : β1 6= β1 , for different values of β1 , using the Wald, LR, score, and gradient tests. We revisit their analysis and include the analytically corrected tests and the bootstrap tests. The maximum likelihood estimates of the parameters (asymptotic standard errors in parentheses) are: βb0 = −0.3233 (0.1506), βb1 = −0.0129 (0.0039), and φb = 3.2654 (0.7262). The observed values
CR IP T
of the test statistics (p–values in parentheses) for testing H0 : β1 = 0 are: SW = 11.0308 (0.0009), ∗ SLR = 9.5257 (0.0020), SR = 7.1261 (0.0076), ST = 8.2801 (0.0040), SLR = 8.7682 (0.0031), b b SR∗ = 7.1901 (0.0073), and ST∗ = 8.8840 (0.0029). The p–values of the bootstrapped tests SW , SLR ,
SRb , and STb are, respectively, 0.0035, 0.0030, 0.0060, and 0.0015. At the usual significance levels, all tests reject the null hypothesis. (0)
(0)
AN US
We now turn to the null hypothesis H0 : β1 = β1 , for different values of β1 . The values chosen (0)
for β1 do not have a scientific meaning behind them; they were chosen to best highlight our results in different settings. The results are summarized in Table 5. The asterisks indicate rejection at 10% (∗ ), 5% (∗∗ ), and 1% (∗∗∗ ) significance levels. From Table 5, all tests reject H0 : β1 = −0.018 at the
M
b SRb , and STb lead to the usual significance levels. In all cases, the tests that use SR , ST , SR∗ , ST∗ , SLR
same decision. On the other hand, there may be considerable conflict among different tests in many
ED
instances. For example, the p–values of the tests of H0 : β1 = −0.022 vary from 1.89% (SW ) to values bigger than 7% (SR∗ , SRb , and STb ).
The second application regards a data set reported in Rosillo et al. [55] and Rosillo [56] on the
PT
lifetime of n = 14 fluorescent lamps. We assume the systematic component
CE
x2l θl = ηl = β0 + β1 x1l + β2 + β3 x3l
x2l x3l
2
,
l = 1, . . . , 14,
(11)
to describe the effect of the covariates glow (x1 ), voltage (x2 ), and nominal voltage (x3 ) on the
AC
lamp lifetime (Y ). We assume a log–gamma distribution for the lifetimes with unknown but constant precision φ for all observations. The maximum likelihood estimates (standard errors in parentheses) are: βb0 = 18.4791 (1.7089), βb1 = 9.4318 (4.2940), βb2 = −35.3273 (3.6169), βb3 = 16.7075 (1.8929),
and φb = 30.6994 (11.5408). The test statistics and corresponding p–values for testing H0 : β1 = 0
are given in Table 6. At the 5% significance level, the tests that employ SW , SLR , and SR reject the ∗ b b null hypothesis, but ST , SLR , SR∗ , ST∗ , SW , SLR , SRb , and STb lead to the opposite decision. Note that
p–values range from 2.81% to 9.67%. Also, the null hypotheses H0 : β2 = 0 and H0 : β3 = 0 are 20
ACCEPTED MANUSCRIPT
(0)
(0)
Table 5: Test statistics of H0 : β1 = β1 against H1 : β1 6= β1 (p–values in parentheses); periwinkles
data.
(0)
−0.026
∗∗∗
5.5088(0.0189)∗∗
11.408(0.0007)
SLR
7.3137(0.0068)∗∗∗
5.6057(0.0179)∗∗
4.0115(0.0452)∗∗
SR
5.8719(0.0154)∗∗
4.6364(0.0313)∗∗
3.4071(0.0649)∗
ST
5.7277(0.0167)∗∗
4.6110(0.0318)∗∗
3.4577(0.0630)∗
∗ SLR
6.7663(0.0093)∗∗∗
5.1847(0.0228)∗∗
3.7091(0.0541)∗
SR∗
5.7598(0.0164)∗∗
ST ∗
5.5647(0.0183)∗∗
b SW
(0.0035)∗∗∗
b SLR
(0.0105)∗∗
SRb
(0.0160)∗∗
STb
(0.0205)∗∗
SLR
3.2603(0.0710)∗
4.4667(0.0346)∗∗
3.3220(0.0684)∗
(0.0110)∗∗
(0.0295)∗∗
(0.0280)∗∗
(0.0580)∗
(0.0380)∗∗
(0.0730)∗
(0.0365)∗∗
(0.0730)∗
M
4.4935(0.0340)∗∗
(0)
β1
−0.018
∗
1.7331(0.1880)
2.5906(0.1075)
1.4114(0.2348)
2.2511(0.1335)
1.2493(0.2637)
3.3554(0.0670)
PT
SR
ED
SW
−0.020
8.1932(0.0042)
−0.022
∗∗∗
SW
statistics
ST
2.3320(0.1267)
1.3207(0.2505)
∗ SLR
2.3946(0.1218)
1.3043(0.2534)
SR∗
2.1270(0.1447)
1.1669(0.2800)
ST ∗
2.2138(0.1368)
1.2370(0.2600)
b SW
(0.0775)∗
(0.2105)
b SLR
(0.1095)
(0.2585)
SRb
(0.1335)
(0.2795)
STb
(0.1220)
(0.2740)
CE AC
−0.024
AN US
statistics
CR IP T
β1
21
ACCEPTED MANUSCRIPT
strongly rejected by all tests.
statistic
observed value
p–value
SW
4.8247
0.0281
SLR
4.2598
0.0390
SR
3.9429
0.0471
ST
3.7532
0.0527
∗ SLR
3.2287
0.0724
SR∗
3.2483
0.0715
ST∗
3.0578
0.0803 0.0967
b SLR
0.0933
SRb
0.0783
AN US
b SW
CR IP T
Table 6: Test statistics of H0 : β1 = 0 against H1 : β1 6= 0 and p–values; fluorescent lamps data.
STb
0.0900
From our simulations, we concluded that the usual Wald, LR, score, and gradient tests are size distorted when the sample size is small (here, n = 14) and that the corrections are effective in remov-
M
ing the type I error distortion. We then rely on the corrected tests, that indicate that the covariate glow
ED
may be removed from model (11). Hence, the final model is 2 x2l x2l , θl = ηl = β0 + β2 + β3 x3l x3l
l = 1, . . . , 14.
The maximum likelihood estimates (standard errors in parentheses) are: βb0 = 19.3698 (1.9430),
6
CE
PT
βb2 = −36.7263 (4.1548), βb3 = 17.3502 (2.1798), and φb = 22.7321 (8.5296).
Final remarks
AC
The main theoretical contribution of this paper is the derivation of a Bartlett–type correction factor in matrix notation to the gradient statistic in dispersion models. The simulation study compares the performance of several different tests in small and moderate sized samples. Our empirical results suggest that the usual large sample tests, namely the Wald, LR, score tests, and the more recently proposed test, the gradient test, may perform poorly if the sample is not large enough to guarantee close approximation to the asymptotic distribution. The effectiveness of the Bartlett correction to the LR statistic and the Bartlett–type corrections to the score and gradient statistics is remarkable 22
ACCEPTED MANUSCRIPT
and comparable to the improvement achieved by the bootstrap method without computational burden. The empirical applications with real data show that the uncorrected tests may lead to misleading conclusions if the sample is not large. We, therefore, recommend the use of analytically or bootstrap corrected tests when performing inference in dispersion models.
CR IP T
Acknowledgements We gratefully acknowledge the financial support of the Brazilian agencies FAPESP (grant 2012/21788– 2) and CNPq (grants 304388/20149 and 301808/2016–3).
AN US
References
[1] G.R. Terrell, The gradient statistic. Computing Science and Statistics, 34, (2002) 206–215. [2] V.M.R. Muggeo, G. Lovison, The ‘three plus one’ likelihood-based test statistics: uniffied geometrical and graphical interpretations. American Statistician, 68, (2014) 302–306.
M
[3] A.J. Lemonte, The Gradient Test: Another Likelihood-Based Test. Academic Press, London, 2016. [4] C.R. Rao, Score test: historical review and recent developments. In: Balakrishnan, N., Kannan, N.,
Birkhauser, Boston, 2005.
ED
Nagaraja, H.N. (Eds.) Advances in Ranking and Selection, Multiple Comparisons, and Reliability.
PT
[5] A.J. Lemonte, S.L.P. Ferrari, The local power of the gradient test. Annals of the Institute of Statistical
CE
Mathematics, 64, (2012) 373–381. [6] D. Lawley, A general method for approximating to the distribution of likelihood ratio criteria. Biometrika,
AC
43, (1956) 295–303.
[7] M.S. Bartlett, Properties of sufficiency and statistical tests. Proceedings of the Royal Society A, 160, (1937) 268–282.
[8] G.M. Cordeiro, S.L.P. Ferrari, A modified score test statistic having chi–squared distribution to order n−1 . Biometrika, 78, (1991) 573–582. [9] T.M. Vargas, S.L.P. Ferrari, A.J. Lemonte, Gradient statistic: higher order asymptotics and Bartlett–type correction. Electronic Journal of Statistics, 7, (2013) 43–61.
23
ACCEPTED MANUSCRIPT
[10] G.M. Cordeiro, F. Cribari–Neto, An Introduction to Bartlett Correction and Bias Reduction. Springer, New York, 2014. [11] A.J. Lemonte, G.M. Cordeiro, G. Moreno–Arenas, Bartlett corrections in Birnbaum–Saunders nonlinear regression models. Journal of Statistical Computation and Simulation, 82, (2012) 927–935. [12] A.J. Lemonte, G.M. Cordeiro, G. Moreno–Arenas, Improved likelihood-based inference in Birnbaum–
CR IP T
Saunders nonlinear regression models. Applied Mathematical Modelling, 40, (2016) 8185–8200. [13] S.L.P. Ferrari, M.A. Uribe–Opazo, Corrected likelihood ratio tests in a class of symmetric linear regression models. Brazilian Journal of Probability and Statistics, 15, (2001) 49–67.
[14] M.A. Uribe–Opazo, S.L.P. Ferrari, G.M. Cordeiro, Improved score tests in symmetric linear regression
AN US
models. Communications in Statistics–Theory and Methods, 37, (2008) 261–276.
[15] F.M.C. Medeiros, S.L.P. Ferrari, Small-sample testing inference in symmetric and log-symmetric linear regression models. Statistica Neerlandica, doi: 10.1111/stan.12107.
[16] B.M. Lagos, P.A. Morettin, L.P. Barroso, Some corrections of the score test statistic for Gaussian ARMA
M
models. Brazilian Journal of Probability and Statistics, 24, (2010) 434–456. [17] G.M. Cordeiro, Improved likelihood ratio statistics for generalized linear models. Biometrika, 74, (1983)
ED
265–274.
[18] G.M. Cordeiro, On the corrections to the likelihood ratio tatistics. Journal of the Royal Statistical Society
PT
B, 45, (1987) 404–413.
[19] F. Cribari–Neto, S.L.P. Ferrari, Second order asymptotics for score tests in generalised linear models.
CE
Biometrika, 82, (1995) 426–432.
[20] G.M. Cordeiro, S.L.P. Ferrari, G.A. Paula, Improved score tests for generalized linear models. Journal of
AC
the Royal Statistical Society B 55, (1993) 661–674.
[21] T.M. Vargas, S.L.P. Ferrari, A.J. Lemonte, Improved likelihood inference in generalized linear models. Computational Statistics & Data Analysis, 74, (2014)110–124.
[22] D.M. Zucker, O. Lieberman, O. Manor, Improved small sample inference in the mixed linear model: Bartlett correction and improved likelihood. Journal of the Royal Statistical Society B, 62, (2000) 827– 838.
24
ACCEPTED MANUSCRIPT
[23] O. Manor, D.M. Zucker, Small sample inference for the fixed effects in the mixed linear model. Computational Statistics & Data Analysis, 46, (2004) 801–817 [24] B.M. Lagos, P.A. Morettin, Improvement of the likelihood ratio test statistic in ARMA models. Journal of Time Series Analysis, 25, (2004) 83–101. [25] D. Tu, J. Chen, P. Shi, Y. Wu, A Bartlett type correction for Rao’s score test in Cox regression model.
CR IP T
Sankhya, 67, (2005) 722–735. [26] N.P.A. van Giersbergen, Bartlett correction in the stable AR(1) model with intercept and trend. Econometric Theory, 25, (2009) 857–872.
[27] P. Bai, Sphericity test in a GMANOVA–MANOVA model with normal error. Journal of Multivariate
AN US
Analysis, 100, (2009) 2305–2312.
[28] A.J. Lemonte, S.L.P. Ferrari, F. Cribari–Neto, Improved likelihood inference in Birnbaum–Saunders regressions. Computational Statistics & Data Analysis, 54, (2010) 1307–1316. [29] A. Fujita, K. Kojima, A.G. Patriota, J.R. Sato, P. Severino, S. Miyano, A fast and robust statistical test
formatics, 26, (2010) 2349–2351.
M
based on likelihood ratio with Bartlett correction to identify Granger causality between gene sets. Bioin-
ED
[30] H. Noma, Confidence intervals for a random-effects meta-analysis based on Bartlett-type corrections. Statistics in Medicine, 30, (2011) 3304–3312.
PT
[31] B. Jørgensen, The Theory of Dispersion Models. Chapman and Hall, London, 1997. [32] A. Lee, Circular data. Wiley Interdisciplinary Reviews: Computational Statistics, 2, (2010) 477–486.
CE
[33] K.V. Mardia, P.E. Jupp, Directional Statistics. Wiley, New Jersey, (2000).
AC
[34] N.I. Fisher, Statistical Analysis of Circular Data. Cambridge University Press, (1993). [35] A.J. Lemonte, S.L.P. Ferrari, Local power and size properties of the LR, Wald, score and gradient tests in dispersion models. Statistical Methodology, 9, (2012) 537–554.
[36] A.B. Simas, G.M. Cordeiro, S. Nadarajah, Asymptotic tail properties of the distributions in the class of dispersion models. Theory of Probability and its Applications, 56, (2012) 703–710. [37] A.B. Simas, A.V. Rocha, W. Barreto–Souza, Bias-corrected estimators for dispersion models with dispersion covariates. Journal of Statistical Planning and Inference, 141, (2011) 3063–3074.
25
ACCEPTED MANUSCRIPT
[38] A.V. Rocha, A.B. Simas, G.M. Cordeiro, Second-order asymptotic expressions for the covariance matrix of maximum likelihood estimators in dispersion models. Statistics and Probability Letters, 80, (2010) 718–725. [39] G.M. Cordeiro, G.A. Paula, D.A. Botter, Improved likelihood ratio tests for dispersion models. International Statistical Review, 62, (1994) 257–274.
CR IP T
[40] S.L.P. Ferrari, G.M. Cordeiro, F. Cribari-Neto, Higher-order asymptotic refinements for score tests in proper dispersion models. Journal of Statistical Planning and Inference, 97, (2001) 177-190.
[41] L.D. Brown, Fundamentals of Statistical Exponential Families. Lecture Notes and Monographs Series, Hayward, 1986.
AN US
[42] D.R. Cox, D.V. Hinkley, Theoretical Statistics. Chapman and Hall, London, 1974
[43] D.R. Cox, N. Reid, Parameter orthogonality and approximate conditional inference. Journal of the Royal Statistical Society B, 40, (1987) 1–39.
[44] P.K. Sen, J.M. Singer, Large Sample Methods in Statistics: An Introduction with Applications, Chapman
M
and Hall/CRC, Boca Raton, 1993.
[45] J.A. Doornik, Object-Oriented Matrix Programming using Ox. Timberlake Consultants Press, London,
ED
2013.
[46] R Core Team R: A Language and Environment for Statistical Computing. R Foundation for Statistical
PT
Computing, Vienna, Austria, 2014.
[47] J. Rafter, M.L. Abell, J.P. Braselton, Statistics with Maple. Elsevier Science & Technology, London, 2003.
CE
[48] B. Efron, R.J. Tibshirani, An Introduction to the Bootstrap. Chapman and Hall, New York, 1993. [49] J. Carpenter, Assessing parameter uncertainty via bootstrap likelihood ratio confidence regions. Journal
AC
of Applied Statistics, 25, (1998) 639–649.
[50] C. Carta, J.A. Bueno, P. Ramirez, Statistical modelling of directional wind speeds using mixtures of von Mises distributions. Energy Conversion and Management, 49, (2008) 897–907.
[51] E.M.M. Ortega, V.G. Gancho, G.A. Paula, Generalized log–gamma regression models with cure fraction. Lifetime Data Analysis, 15, (2009) 79–106.
26
ACCEPTED MANUSCRIPT
[52] E.M.M. Ortega, G.A. Paula, H. Bolfarine, Deviance residuals in generalised log–gamma regression models with censored observations. Journal of Statistical Computation and Simulation, 78, (2008) 747–764. [53] J.F. Lawless, Statistical Models and Methods for Lifetime Data. 2nd ed. Wiley, New Jersey, 2003. [54] N.I. Fisher, A.J. Lee, Regression models for an angular response. Biometrics, 48, (1992) 525–529. [55] F.G. Rosillo, N. Mart´ın, M.A. Egido, Comparison of conventional and accelerated lifetime testing of
CR IP T
fluorescent lamps. Lighting Research and Technology, 42, (2010) 243–259.
[56] F.G. Rosillo, Predicci´on del tempo de vida de l´amparas fluorescentes aplicaci´on a usos fotovoltaicos.
AC
CE
PT
ED
M
AN US
PhD thesis, Instituto de Energ´ıa Solar, Universidad Polit´ecnica de Madrid, Madrid, 2011.
27