Improved inference in dispersion models

Improved inference in dispersion models

Accepted Manuscript Improved inference in dispersion models Francisco M.C. Medeiros, Silvia L.P. Ferrari, Artur J. Lemonte PII: DOI: Reference: S030...

604KB Sizes 1 Downloads 109 Views

Accepted Manuscript

Improved inference in dispersion models Francisco M.C. Medeiros, Silvia L.P. Ferrari, Artur J. Lemonte PII: DOI: Reference:

S0307-904X(17)30437-7 10.1016/j.apm.2017.06.045 APM 11849

To appear in:

Applied Mathematical Modelling

Received date: Revised date: Accepted date:

20 February 2017 12 June 2017 21 June 2017

Please cite this article as: Francisco M.C. Medeiros, Silvia L.P. Ferrari, Artur J. Lemonte, Improved inference in dispersion models, Applied Mathematical Modelling (2017), doi: 10.1016/j.apm.2017.06.045

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

Highlights • A Bartlett-type correction to the gradient statistic in dispersion models is derived. • The correction improves the chi-squared approximation when the sample size is finite. • A comprehensive simulation study compares several different tests in small samples.

CR IP T

• Large sample tests, including the gradient test, may perform poorly in small samples.

AC

CE

PT

ED

M

AN US

• Bartlett and Bartlett-type corrections remove size distortions with no power loss.

1

ACCEPTED MANUSCRIPT

Improved inference in dispersion models Francisco M.C. Medeiros

Silvia L.P. Ferrari∗

CR IP T

Department of Statistics, Federal University of Rio Grande do Norte, Brazil

Department of Statistics, University of S˜ao Paulo, Brazil

Artur J. Lemonte

AN US

Department of Statistics, Federal University of Rio Grande do Norte, Brazil

Abstract

M

We derive a general matrix Bartlett–type correction factor to the gradient statistic in the class of dispersion models. The correction improves the large–sample χ2 approximation to the null

ED

distribution of the gradient statistic when the sample size is finite. We conduct Monte Carlo simulation experiments to evaluate and compare the performance of various different tests, namely the usual Wald, likelihood ratio, score, and gradient tests, the Bartlett–corrected versions of the

PT

likelihood ratio, score, and gradient tests, and bootstrap–based tests. The simulation results suggest that the analytical and computational corrections are effective in removing size distortions

CE

of the type I error probability with no power loss. The impact of the corrections in two real data

AC

applications is considered for illustrative purposes.

Keywords: Bartlett correction; Bartlett–type correction; Bootstrap–based tests; Large–sample test statistics; Proper dispersion models.



Corresponding author: [email protected]

2

ACCEPTED MANUSCRIPT

1

Introduction

The Wald, likelihood ratio (LR), and score statistics, referred to as the “Holy Trinity” in the statistical literature, are often used to test hypotheses in parametric models. The gradient statistic (Terrell [1]) has received attention in the last few years (Muggeo and Lovision [2], Lemonte [3]) because it is very simple to compute (Rao [4, § 1.8]). It only involves the score vector and the maximum likelihood

CR IP T

estimates (unrestricted and restricted to the null hypothesis) of the parameter vector. Unlike the Wald and score statistics, the gradient statistic does not require the information matrix, it does not involve matrix inversion, and it shares the same first order asymptotic properties with the other three statistics (Lemonte and Ferrari [5]). The four statistics differ in their second-order asymptotic properties. In small and moderate sized samples, the approximation of the true, usually unknown, distributions of

AN US

the statistics by their χ2 asymptotic distribution may lead to considerable size distortions in the type I error probability of the tests. Bartlett and Bartlett–type corrections use second-order likelihood asymptotic results to produce modified statistics that follow more closely a χ2 distribution under the null hypothesis.

Bartlett corrections have a long history. Lawley [6], generalizing the findings of Bartlett [7], pro-

M

posed a correction factor to the LR statistic in wide generality. The factor, called Bartlett correction, reduces the approximation error of the null distribution of the statistic by χ2 distribution from O(n−1 )

ED

to O(n−2 ). Cordeiro and Ferrari [8] derived a Bartlett–type correction to the score statistic. Unlike

the Bartlett correction, it is a factor that involves a second-order polynomial in the statistic itself.

PT

Recently, Vargas et al. [9] obtained a Bartlett–type correction to the gradient statistic. Bartlett–type corrections to the Wald statistic are available only for some special models, which depend on cumu-

CE

lants of log–likelihood derivatives and may be difficult or even impossible to obtain depending on the model and the hypothesis under test. The reader is referred to Cordeiro and Cribari–Neto [10] for a

AC

detailed survey on Bartlett and Bartlett–type corrections. In recent years there has been a renewed interest in Bartlett and Bartlett–type correction fac-

tors and research papers have been published giving expressions for computing these corrections for special models. Lemonte et al. [11, 12] obtained Bartlett and Bartlett–type corrections to the LR and score statistics in the nonlinear Birnbaum–Saunders regression model, respectively. Ferrari and Uribe–Opazo [13] and Uribe–Opazo et al. [14] derived, respectively, a Bartlett correction to the LR statistic and a Bartlett–type correction to the score statistic in symmetric linear regression models,

3

ACCEPTED MANUSCRIPT

whereas Medeiros and Ferrari [15] obtained a Bartlett–type correction to the gradient statistic in this class of models and showed that their results are valid for log–symmetric linear models as well. Lagos et al. [16] derived a Bartlett–type correction to the score statistic in the Gaussian ARMA model. In generalized linear models, Cordeiro [17, 18] derived Bartlett corrections to the LR statistic, Cribari– Neto and Ferrari [19] and Cordeiro et al. [20] obtained a Bartlett–type correction to the score statistic, while Vargas et al. [21] derived a Bartlett–type correction to the gradient statistic. Other references are

CR IP T

Zucker et al. [22], Manor and Zucker [23], Lagos and Morettin [24], Tu et al. [25], van Giersbergen [26], Bai [27], Lemonte et al. [28], Fujita et al. [29], and Noma [30], among others.

Dispersion models (Jørgensen [31]) correspond to a wide class of models that allows several choices for the distribution of the response variable and a nonlinear function relation between its conditional mean and the linear predictor. Unlike generalized linear models, dispersion models en-

AN US

compass distributions outside the exponential family. An example is the von Mises distribution, which is useful for modeling directional data (Lee [32], Mardia and Jupp [33], Fisher [34]). Second-order likelihood–based inference in dispersion models has been the subject of recent papers. For example, Lemonte and Ferrari [35] derived second-order asymptotic expansions to the distribution of the Wald,

M

LR, score, and gradient statistics under a sequence of Pitman alternatives hypotheses. They showed that none of the four tests is uniformly more powerful than the others. Simas et al. [36] studied the

ED

asymptotic behavior of the probability density function of dispersion models. Simas et al. [37] derived a second-order unbiased maximum likelihood estimator, and Rocha et al. [38] derived a second-order approximation to the covariance matrix of the maximum likelihood estimator. Bartlett correction to

PT

the LR statistic and Bartlett–type correction to the score statistic were obtained by Cordeiro et al. [39] and Ferrari et al. [40], respectively. It is worth emphasizing that simulation experiments provided by

CE

Lemonte and Ferrari [35] in the class of dispersion models reveal considerable size distortions in the type I error probability for the test that uses the gradient statistic (and for the tests that use the usual

AC

Wald, LR, and score statistics as well) coupled with its null asymptotic χ2 distribution when the sample size is small or moderate. Hence, corrections are needed. This paper goes in this direction and we will provide a Bartlett–type corrected gradient statistic in the class of dispersion models. In this paper, we revisit the work of Cordeiro et al. [39] and Ferrari et al. [40] and derive a Bartlett– type correction to the gradient statistic in dispersion models. Our results generalize the findings in Vargas et al. [21], which hold only for generalized linear models. We perform Monte Carlo simulations to evaluate and compare the type I error probability and the power of the usual tests, i.e. the 4

ACCEPTED MANUSCRIPT

“Holy Trinity” plus the gradient test, and the corrected LR, score and gradient tests. Additionally, bootstrap–based tests are also included in the Monte Carlo experiments for the sake of comparison. The simulations suggest that the uncorrected tests may be markedly liberal in small and moderate sized samples, and the analytically and bootstrap corrected tests display type I error probability close to the chosen nominal level with virtually no power loss. It is worth emphasizing that we have not found any comprehensive simulation study in the statistical literature comparing the classical uncor-

CR IP T

rected and corrected large–sample tests in dispersion models. This paper fills this gap and includes the gradient test and its Bartlett–type corrected version derived here in the Monte Carlo simulation study.

The following sections are organized as follows. In Section 2, we define dispersion models and present likelihood–based inferential tools. In Section 3, we present the Bartlett correction to the LR

AN US

statistic and the Bartlett–type correction to the score statistic, and we derive a Bartlett–type correction to the gradient statistic in dispersion models. Section 4 deals with Monte Carlo simulation experiments. In Section 5 we illustrate the impact of the corrections in two real data applications. Section 6

Dispersion models, estimation, and testing

ED

2

M

closes the paper with some final remarks.

Let Y be a continuous random variable with probability density function (1)

PT

π(y; θ, φ) = exp[φt(y, θ) + a(y, φ)],

for y in a suitable subset of the real line, where t(·, ·) and a(·, ·) are known functions, θ ∈ IR and

CE

it is usually the parameter of interest, and φ > 0 is the precision parameter. The parameter θ may

be generally interpreted as a kind of location parameter, not necessarily the mean of the distribution.

AC

The class of models defined in (1) is called dispersion models (Jørgensen [31]). Some distributions in this class are presented in Table 1. The exponential family (Brown [41]), obtained from (1) with t(y, θ) = yθ − b(θ) and b(·) being a known function, is a remarkable subclass of dispersion models, where E(Y ) = µ = db(θ)/dθ. Another important subclass of (1) is the proper dispersion models,

and it is obtained by taking a(y, φ) = d1 (φ) + d(y), with d1 (·) and d(·) being known functions. The von Mises, log–gamma, and simplex distributions also belong to this subclass. The proper dispersion models have the following properties: the distribution of t = t(Y, θ) does not depend on θ, i.e., t 5

ACCEPTED MANUSCRIPT

is pivotal when φ is known; if θ is known, the proper dispersion models belong to the exponential family, with t being a canonical statistic. Table 1: t(y, θ) and a(y, θ) for some dispersion models.a Model

t(y, θ)

a(y, φ)

Normal

− 12 (y − θ)2

− 12 log(2πφ−1 )

yθ + (−2θ)1/2

Gamma

−{2[log(2πy 3 φ−1 ) + φ/y]}−1

yθ + log(−θ)

− log Γ(φ) + φ log(φy) − log y

y − θ − exp(y − θ)

φ log φ − log Γ(φ)

cos(y − θ)

log I0 (φ)

Log–gamma von Mises a

CR IP T

Inverse normal

Iν (·) is the modified Bessel function of first kind and order ν; and Γ(·) is the gamma function.

AN US

Let Y1 , . . . , Yn be n random variables with Yl having probability density function (1) with θ = θl , for l = 1, . . . , n. A regression structure is introduced in model (1) by assuming that g(θl ) = ηl = x> l β,

l = 1, . . . , n,

(2)

where x> l = (xl1 , . . . , xlp ) is a vector of known covariates associated to the l–th observation, β =

M

(β1 , . . . , βp )> is a p–vector (p < n) of unknown parameters to be estimated from the data, and g(·) is a continuous, invertible, and twice differentiable function, called the link function, which links the

ED

covariates xl to the parameter of interest θl . The precision parameter φ is unknown but the same for all observations.

PT

The log–likelihood function for the parameter vector (β, φ) takes the form

CE

` = `(β, φ) = φ

n X

t(yl , θl ) +

l=1

n X

a(yl , θl ),

(3)

l=1

(i)

−1 (i) i i where θl = g −1 (x> l β), and g (·) is the inverse of g(·). Let tl = t (yl , θl ) = ∂ t(yl , θl )/∂θl and

AC

Dil = Dil (θl , φ) = E[t(i) (Yl , θl )] for i = 1, 2, 3, 4 and l = 1, . . . , n. We assume the usual regularity conditions for large sample inference based on likelihood estimation Cox and Hinkley [42, Cap. 9]. We then have D1l = 0 and D2l = −φE[t(1) (Yl , θl )2 ]. Table 2 lists D2l , D3l , and D4l for some

distributions in the class of dispersion models.

Let X = (x1 , . . . , xn )> be the n × p design matrix of full column rank (rank(X) = p). The

score vector and the Fisher information matrix for β are given, respectively, by Uβ = φX > t(1) and (1)

(1)

Kβ = φX > W X, where t(1) = (t1 , . . . , tn )> and W = W (β, φ) = diag{w1 , . . . , wn } with 6

ACCEPTED MANUSCRIPT

Table 2: D2l , D3l , and D4l for some dispersion models. Model

D2l

D3l

D4l

Normal

−1

0

0

−(−2θl )−3/2

−3(−2θl )−5/2

−15(−2θl )−7/2

−1

1

−1

−I1 (φ)/I0 (φ)

0

I1 (φ)/I0 (φ)

Inverse normal Gamma Log–gamma von Mises

2/θl3

−6/θl4

CR IP T

−1/θl2

wl = −D2l (dθl /dηl )2 . The maximum likelihood estimate of β = (β1 , . . . , βp )> , βb = (βb1 , . . . , βbp )> ,

comes from the system of equations Uβ = 0, which does not have explicit solution except for normal models. From the Fisher scoring method, the maximum likelihood estimate of β can be obtained by

AN US

iteratively solving the system of equations X > W (m) Xβ (m+1) = X > W (m) y ∗(m) ,

m = 0, 1, . . . ,

−1 −1 where y ∗(m) = Xβ (m) + N (m) t(1)(m) with N = −diag{D21 dη1 /dθ1 , . . . , D2n dηn /dθn }, and m is

the iteration counter. Note that βb depends on the distribution assumed for the data through D2l , and b = 0, [t(yl , θbl ) + a(1) (yl , φ)]

ED

n X

M

b comes from the equation does not depend on φ. The maximum likelihood estimate of φ, φ, l=1

(4)

where a(1) (yl , φ) = da(yl , θl )/dφ. It can be shown that E[∂ 2 `(β, φ)/∂β∂φ] = 0 and, hence, the

PT

Fisher information matrix is block–diagonal, i.e. β and φ are globally orthogonal in the sense of Cox

CE

(1) b and Reid [43]. In particular, for the proper dispersion models, equation (4) reduces to d1 (φ) = Pn (1) − l=1 t(yl , θbl )/n, where d1 (φ) = dd1 (φ)/dφ, and the Fisher information matrix for the parameter

vector (β, φ) becomes K = K(β, φ) = diag{φX > W X, −nd2 d1 (φ)/dφ2 }. (0)

AC

Consider the problem of testing the null hypothesis H0 : β1 = β1 against the alternative hypoth(0)

esis H1 : β1 6= β1 , where β is partitioned as β = (β1> , β2> )> , with β1 = (β1 , . . . , βq )> being the (0)

vector of parameters of interest, and β2 = (βq+1 , . . . , βp )> . Here, β1 is a fixed vector of dimension

q and β2 and φ act as nuisance parameters. The partition of β induces the following partitions:     X1> t(1) X1> W X1 X1> W X2  , Kβ = φ  , Uβ = φ  X2> t(1) X2> W X1 X2> W X2 7

ACCEPTED MANUSCRIPT

  where the matrix X is partitioned as X = X1 X2 , with X1 being an n × q matrix, and X2 being an n × (p − q) matrix. The Wald, LR, score, and gradient statistics for testing H0 against H1 are given, respectively, by

b βb1 − β (0) )> R b >W c R( b βb1 − β (0) ), SW = φ( 1 1   b − `(β (0) , βe2 , φ) e , SLR = 2 `(βb1 , βb2 , φ) 1 f 1/2 X1 (βb1 − β (0) ), ST = φe1/2 se> W 1

CR IP T

f R) e −1 X > W f 1/2 se, f 1/2 X1 (R e >W SR = se> W 1

(1)

where R = X1 − X2 (X2> W X2 )−1 X2> W X1 , s = (s1 , . . . , sn )> with sl = φ1/2 tl (−D2l )−1/2 , b is the unrestricted maximum likelihood estimator of (β1 , β2 , φ), and (β (0) , βe2 , φ) e is the (βb1 , βb2 , φ) 1

AN US

restricted maximum likelihood estimator of (β1 , β2 , φ) obtained by imposing that the null hypothesis

holds. “Hat” and “tilde” indicate evaluation at the unrestricted and restricted maximum likelihood estimator of (β1 , β2 , φ), respectively. Under the null hypothesis H0 , the limiting distribution of the four statistics is χ2q with approximation error of order O(n−1 ); see, for instance, Sen and Singer [44]

and Lemonte [3]. The null hypothesis is rejected for a given nominal level, α say, if the observed

Improved tests

ED

3

M

value of the test statistic exceeds the upper 100(1 − α)% quantile of the χ2q distribution.

Henceforth, we assume that a(y, φ) = d1 (φ) + d(y) in (1). The Bartlett correction to the LR statistic

PT

and the Bartlett–type correction to the score statistic in proper dispersion models are given, respectively, in Cordeiro et al. [39] and Ferrari et al. [40], and will be revisited in this section. Also, from

CE

the general results in Vargas et al. [9], we will derive a general Bartlett–type correction factor to the gradient statistic. This result is new and represents an additional tool to perform testing inference

AC

in proper dispersion models. Further, we also consider the alternative testing procedure based on bootstrap.

To define the corrected LR and score statistics as well as to derive Bartlett–type corrections for the

gradient statistic in dispersion models, some additional notation is in order. We define the n × n matrices: Z = X(X > W X)−1 X > = ((zlc )), Z2 = X2 (X2> W X2 )−1 X2> = ((z2,lc )) when q < p and

Z2 = 0 when q = p, Zd = diag{z11 , . . . , znn }, Z2d = diag{z2,11 , . . . , z2,nn }, F = diag{f1 , . . . , fn },

8

ACCEPTED MANUSCRIPT

G = diag{g1 , . . . , gn }, E = diag{e1 , . . . , en }, J = diag{j1 , . . . , jn }, R = diag{r1 , . . . , rn },

CR IP T

H = diag{h1 , . . . , hn }, S = diag{s∗1 , . . . , s∗n }, Q = diag{q1 , . . . , qn }, where  3  3 dθl d2 θl dθl dθl d2 θl dθl 0 fl = − D2l − D3l , gl = − D2l , el = − D2l , 2 2 dηl dηl dηl dηl dηl dηl  4   dθl 00 0 (2) 2 , jl = −3D4l + 8D3l − 6D2l + 3φD2l − 3φD2l dηl " #  4  2 2 2 dθl dθl d θl 0 d2 θl dθl d3 θl 00 rl = D2l + 5 + D2l , D +2 dηl dηl dηl2 2l dηl2 dηl dηl3 4  dθl  00 0 (2) 2 hl = −2D4l + 6D3l − 5D2l + 2φD2l − 2φD2l dηl  2 2   dθl d θl 0 + 2D3l − 3D2l , dηl dηl2

AN US



#  2 2 2 3 2 dθl dθ d θ d θ d θl l l l ∗ + D2l + 3 D3l sl = 3 2 3 dηl dηl dηl dηl dηl2  4  2 2 dθl dθl d θl 0 0 + D3l + 3 D , dηl dηl dηl2 2l "  #  4  2 2 2 dθl dθl d θl d2 θl dθl d3 θl ql = D4l + 6 D3l + 3 +4 D2l , dηl dηl dηl2 dηl2 dηl dηl3

M

"

0

00

(j)

0

(2)

ED

where Dil = ∂Dil /∂θl , Dil = ∂Dil /∂θl , and Dil = E[(∂ i t(Yl , θl )/∂θli )j ] for i, j = 1, 2, 3, 4. Let Z (2) = Z Z, Z2 = Z2 Z2 , Z (3) = Z (2) Z, etc., where “ ” denotes the element–wise product

PT

of two matrices (Hadamard product). The matrices φ−1 Z and φ−1 Z2 are the asymptotic covariance matrices of X βb and X2 βe2 , respectively.

(0)

CE

Cordeiro et al. [39] derived a Bartlett correction to the LR statistic for testing H0 : β1 = β1

in proper dispersion models by using the general results in Lawley [6]. The Bartlett-corrected LR

AC

statistic is given by

∗ SLR = SLR (1 − aLR ),

where aLR = (ALR + ALR,βφ )/q, with 1 > 1 (3) (2) (2) 1n G(Z (3) − Z2 )(F + G)1n + 1> [Q + 4(R − S)](Zd − Z2d )1n 3φ 4φ n 1 > (3) 1n F [2(Z (3) − Z2 ) + 3(Zd ZZd − Z2d Z2 Z2d )]F 1n + 12φ

ALR = −

(3)

(3) + φ−1 1> − Z2 ) + (Zd ZZd − Z2d Z2 Z2d )](E − F + G)1n , n (E + G)[(Z

9

(5)

ACCEPTED MANUSCRIPT

" # q d(2) + d(3) 2p − q = , − 2n d2(2) 2d(2)

ALR,βφ

00

where 1n = (1, . . . , 1)> is an n-dimensional vector of ones, d(2) = d(2) (φ) = φ2 d1 (φ) and d(3) = 000

00

0

000

00

d(3) (φ) = φ3 d1 (φ), with d1 (φ) = dd1 (φ)/dφ and d1 (φ) = dd1 (φ)/dφ. Under the null hypothesis, ∗ has an asymptotic χ2q distribution with approximation error reduced from O(n−1 ) to O(n−2 ). SLR

(0)

Ferrari et al. [40] derived a Bartlett–type correction to the score statistic for testing H0 : β1 = β1

corrected score statistic is

CR IP T

in proper dispersion models by taking into account the general results of Cordeiro and Ferrari [8]. The   SR∗ = SR 1 − cR + bR SR + aR SR2 ,

(6)

where aR = AR3 /[12q(q + 2)(q + 4)], bR = (AR22 − 2AR3 )/[12q(q + 2)], cR = (AR11 − AR22 +

AN US

AR3 )/(12q), with AR11 = AR1 + AR1,βφ , AR22 = AR2 + AR2,βφ ,

AR1 = 3φ−1 1> n (2E + 2G − F ) Z2d (Z − Z2 )Z2d (2E + 2G − F ) 1n − 6φ−1 1> n (2E + 2G − F ) Z2d Z2 (Z − Z2 )d (2F − 2G − 3E) 1n (2)

− 6φ−1 1> n (3F − 2G − 4E) [Z2 (Z − Z2 )] (2E + 2G − F ) 1n

M

− 6φ−1 1> n H(Z − Z2 )d Z2d 1n ,

ED

AR2 = 6φ−1 1> n (2E + 2G − F ) Z2d (Z − Z2 )(Z − Z2 )d (2F − 2G − 3E) 1n − 3φ−1 1> n (2F − 2G − 3E) [(Z − Z2 )d Z2 (Z − Z2 )d (2)

PT

+ 2(Z − Z2 )d Z2 ] (2F − 2G − 3E) 1n (2)

CE

+ 3φ−1 1> n J (Z − Z2 )d 1n ,

AC

AR3 = φ−1 1> n (2F − 2G − 3E) [3(Z − Z2 )d (Z − Z2 )(Z − Z2 )d + 2(Z − Z2 )(3) ] (2F − 2G − 3E) 1n ,

AR1,βφ =

6 ˙ (Z − Z2 )}] [d(3) − (p − q − 2)d(2) ][q − φtr{W nd2(2) +

6φ ˙ )(Z − Z2 )W ˙ Z2 } [2tr{(2W − φW nd(2) ¨ (Z − Z2 )}], + φtr{W

10

ACCEPTED MANUSCRIPT

AR2,βφ =

3 h ˙ (Z − Z2 )})2 (−q + φtr{W nd(2) +

i ˙ (Z − Z2 )W (Z − Z2 )}) + 2(q − 2φtr{W 6φ2 ˙ (Z − Z2 )W ˙ (Z − Z2 )}, tr{W nd(2)

where the notation (·)d indicates that the off–diagonal elements are zero, tr{·} is the trace operator,

CR IP T

˙ = diag{w˙ 1 , . . . , w˙ n } and W ¨ = diag{w¨1 , . . . , w¨n } with w˙ l = dwl /dφ and w¨l = dw˙ l /dφ. In and W ˙ =W ¨ = 0. The correction factor [1 − (cR + exponential family nonlinear models, it follows that W

bR SR + aR SR2 )] in (6) reduces the approximation error of the asymptotic χ2q distribution of SR∗ from O(n−1 ) to O(n−3/2 ).

Vargas et al. [9] obtained a general formula for a Bartlett–type correction to the gradient statistic in

AN US

regular parametric models. Their formula may, in principle, be particularized to specific models. For instance, Vargas et al. [21] obtained a Bartlett-type correction to the gradient statistic in generalized linear models.

From the general formula in Vargas et al. [9] we now derive a Bartlett–type correction to the gradient statistic in proper dispersion models. Let κrs = E(∂ 2 `/∂βr ∂βs ), κrst = E(∂ 3 `/∂βr ∂βs ∂βt ), (t)

(tu)

M

κrstu = E(∂ 4 `/∂βr ∂βs ∂βt ∂βu ), κrs = ∂κrs /∂βt , κrs = ∂ 2 κrs /∂βt ∂βu , κφφ = E(∂ 2 `/∂φ2 ), κrφ = (s)

E(∂ 2 `/∂βr ∂φ), κrφ = ∂ 2 κrφ /∂βs , and so on. The indices r, s, t, and u range from 1 to p. Note dispersion models, we have n X

κrstu = φ

l=1 n X

AC

κ(tu) rs = φ

κrsφ = − (φ)

κrsφ = −

l=1 n X

l=1 n X

wl xlr xls , w˙ l xlr xls ,

ql xlr xls xlt xlu ,

(u)

κrst = φ

κ(t) rs

n X

sl xlr xls xlt xlu ,

l=1

n X = −φ (e + 2g)l xlr xls xlt , l=1

rl xlr xls xlt xlu ,

l=1 n X

n X l=1

(f + 2g)l xlr xls xlt ,

CE

κrst = −φ

wl xlr xls ,

PT

κrs = −φ

ED

that κrs , κrst , κrstu , κφφ , κrφ , κrsφ and κrsφφ are cumulants of the log-likelihood function. In proper

κ(φ) rs

κφφ = nd(2) /φ2 , =−

n X

(φ)

κφφφ = κφφ = nd(3) /φ3 ,

(7)

(wl + φw˙ l )xlr xls ,

l=1

(s)

κrsφφ = κrφφ = κrφφ = 0.

l=1

From Vargas et al. [9], the Bartlett–type corrected gradient statistic takes the form   ST∗ = ST 1 − cT + bT ST + aT ST2 , 11

(8)

ACCEPTED MANUSCRIPT

where aT = AT3 /[12q(q + 2)(q + 4)], bT = (AT22 − 2AT3 )/[12q(q + 2)], cT = (AT11 − AT22 +

AT3 )/(12q), with AT11 = AT1 + AT1,βφ and AT22 = AT2 + AT2,βφ . By replacing the cumulants given in (7) in the expressions of Theorem 1 in Vargas et al. [9], and after some algebra, we arrive at1

(2)

AT1 = 3φ−1 1> n (F + 2G)[Z2d (Z − Z2 )Z2d + 2Z2 (Z − Z2 )+ (2)

CR IP T

+ 2(Z − Z2 )d Z2 Z2d + Z2d (Z − Z2 )(Z − Z2 )d ](F + 2G)1n

(2) − 6φ−1 1> − Z2 ) + (Z − Z2 )d (ZZd + Z2 Z2d ) n (F + 2G)[(Z + Z2 ) (Z (2)

+ 2Z2d (ZZd − Z2 Z2d ) + 2Z2 (Z − Z2 )](E + 2G)1n (3)

(3) + 12φ−1 1> − Z2 ](E + 2G)1n n (E + 2G)[Zd ZZd − Z2d Z2 Z2d + Z

(2)

(2)

+ 12φ−1 1> n R(Zd − Z2d )1n ,

3 (Z − Z2 )d (Z − Z2 )(Z − Z2 )d + 2Z2 (Z − Z2 )(2) 4  1 (3) + (Z − Z2 ) + Z2d (Z − Z2 )(Z − Z2 )d + (Z − Z2 )d Z2 (Z − Z2 )d (F + 2G)1n 2 −3φ−1 1> n (F

+ 2G)

M

AT2 =



AN US

−1 > − 6φ−1 1> n S(Z − Z2 )d (Zd + 3Z2d )1n + 6φ 1n Q(Z − Z2 )d Z2d 1n

(2)

(2) + 6φ−1 1> − Z2 ) + (Z − Z2 )d (ZZd − Z2 Z2d )](E + 2G)1n n (F + 2G)[(Z − Z2 ) (Z (2)

  3 1 (3) + 2G) (Z − Z2 )d (Z − Z2 )(Z − Z2 )d + (Z − Z2 ) (F + 2G)1n , 4 2

PT

AT3 =

φ−1 1> n (F

ED

+ 3φ−1 1> n (2S − Q)(Z − Z2 )d 1n ,

AT1,βφ =

6q [d(3) + (2 − p + q)d(2) ], nd2(2)

AT2,βφ =

3q(q + 2) . nd(2)

CE

The correction factor [1−(cT +bT ST +aT ST2 )] in (8) reduces the approximation error of the asymptotic χ2q distribution of ST∗ from O(n−1 ) to O(n−3/2 ).

AC

A brief commentary on the quantities that define the improved statistics is in order. The coeffi-

cients ALR , AR ’s and AT ’s are functions of the model matrix X (through Z and Z2 ), the parameter φ, and the second and third order derivatives of the link function. The terms ALR,βφ and AT1,βφ depend on the number of regression parameters (p) and the number of parameters of interest (q), while AR1,βφ and AR2,βφ are functions of p, q and the model matrix, and the term AT2,βφ is function of the number of parameters of interest. Unfortunately, they are not easy to interpret in generality and provide no 1

The detailed derivation of the formulas for the A’s is omitted but may be requested to the authors.

12

ACCEPTED MANUSCRIPT

indication as to what structural aspects of the model contribute significantly to their magnitude. It is noteworthy that the A’s can be easily implemented in computational programming languages with support for matrix operations, such as Ox (Doornik [45]), R (R Core Team [46]), and MAPLE (Rafter et al. [47]). Finally, all unknown parameters in the quantities that define the improved statistics are replaced by their restricted maximum likelihood estimates. The improved LR, score, and gradient tests that employ (5), (6), and (8), respectively, as test statistics, follow from the comparison of the

CR IP T

∗ , SR∗ , and ST∗ with the critical value obtained as the appropriate χ2q quantile. observed values of SLR

For generalized linear models, where t(y, θ) = yθ −b(θ) in equation (1), we have g(db(θl )/dθl ) = 0

00

−1 −2 −2 2 −1 2 2 g(µl ) = ηl = x> l β, with D2l = −Vl , D2l = Vl dVl /dµl , D2l = Vl [d Vl /dµl −2Vl (dVl /dµl ) ], (2)

0

D2l = Vl−2 + Vl−3 (dVl /dµl )2 /φ, D3l = 2Vl−2 dVl /dµl , D3l = 2Vl−2 [d2 Vl /dµ2l − 2Vl−1 (dVl /dµl )2 ] and D4l = 3Vl−2 [d2 Vl /dµ2l − 2Vl−1 (dVl /dµl )2 ]. Additionally, 

3

dµl dηl

s∗l = 9λ1l − 3λ2l − 3λ3l − 4λ4l + 2λ5l ,

dµl dηl

3

,

hl = λ1l + λ5l ,

M

jl = λ4l + λ5l ,

PT

ED

2 2  2 dµl d µl 1 d 2 µl 1 dµl d3 µl , λ = , λ = , 2l 3l dηl dηl2 Vl dηl2 Vl dηl dηl3  2  4  4 1 dVl dµl 1 d2 Vl dµl λ4l = 3 , λ5l = 2 2 . Vl dµl dηl Vl dµl dηl

1 dVl λ1l = 2 Vl dµl



,

1 dVl gl = 2 Vl dµl

rl = 5λ1l − 2λ2l − 2λ3l − 2λ4l + λ5l ,

ql = 12λ1l − 3λ2l − 4λ3l − 6λ4l + 3λ5l , where



AN US

1 dµl d2 µl 1 dVl fl = −2 2 2 Vl dηl dηl Vl dµl

By replacing these formulas in (5), (6), and (8), and after some algebraic manipulations, we obtain the

CE

Bartlett correction to the LR statistic given by Cordeiro [18], the Bartlett–type correction to the score statistic derived by Cribari–Neto and Ferrari [19], and the Bartlett–type correction to the gradient

AC

statistic obtained by Vargas et al. [21].2 Another way of improving the size properties of tests is achieved by the bootstrap technique. The

bootstrap corrected tests are performed as follows. Let S denote any of the uncorrected statistics. First, one generates B bootstrap samples from the assumed model with the parameters replaced by restricted estimates computed using the original sample, under H0 , i.e. imposing the restrictions stated 2

In the formula for bl in Vargas et al. [21, p. 113], λ3l should be replaced by λ5l ; see the expression for b in Cribari–

Neto and Ferrari [19, p. 428].

13

ACCEPTED MANUSCRIPT

in the null hypothesis. Second, for each pseudo–sample, one computes the statistic; let S b denote the statistic for the b–th sample, b = 1, . . . , B. Third, the 1 − α percentile of S is estimated by qb1−α , such

that #{S b ≤ qb1−α }/B = 1 − α. Finally, one rejects the null hypothesis if S > qb1−α . Alternatively, the test may be based on the bootstrap p–value given by pb = #{S b ≥ S}/B. We denote the bootstrap

b b corrected test statistics by SW , SLR , SRb and STb . For a good discussion of bootstrap tests, see Efron

and Tibshirani [48, Ch. 16]. Also, it is possible to prove second-order asymptotic results on the

constructed using bootstrap estimate of the critical point is O(n−2 ).

CR IP T

bootstrap procedure. In fact, Carpenter [49] proved that the coverage error of LR confidence intervals

The corrections presented in this section theoretically reduce the order of approximation of the

true, unknown distribution of the test statistics by the limiting χ2 distribution under the null hypothesis. In the next section, we numerically compare the size and power of the tests that employ the

AN US

∗ , SR∗ and ST∗ using Monte uncorrected statistics SW , SLR , SR , and ST , and the corrected statistics SLR

Carlo simulations. Bootstrap–based tests are also included in the simulation study for comparison.

Monte Carlo simulations

M

4

We performed some Monte Carlo simulation experiments to investigate the size and power perfor-

ED

mances of the usual Wald, LR, score, and gradient tests, and the tests that use the corrected statistics ∗ , SR∗ , and ST∗ in small and moderate sized samples in dispersion models. We also included bootSLR b b strap tests (SW , SLR , SRb , and STb ) in the simulations for the sake of comparison. The simulations were

PT

carried out using the Ox matrix programming language, which is freely distributed for academic purposes and is available at http://www.doornik.com. All maximizations of log–likelihood func-

CE

tion were performed using the Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi–Newton method with analytical first derivatives through the MaxBFGS subroutine.

AC

The null hypothesis under test is H0 : β1 = · · · = βq = 0. We computed the type I error

probability of the tests that use the above statistics for the nominal levels α = 10%, 5%, and 1%;

i.e., we computed the proportion of the times that the test statistic exceeds the asymptotic critical value in 10, 000 Monte Carlo replicates. We considered different values for the number of regression parameters (p), the number of parameters being tested (q), and the sample sizes (n = 20, 25, and 30). We used B = 600 replicates for the bootstrap–based tests. The results are presented in Tables 3 and 4. Entries are percentages. 14

ACCEPTED MANUSCRIPT

First, we consider the von Mises model, which is the standard model for circular data; see Carta et al. [50] and Mardia and Jupp [33]. Let π(y; θ, φ) =

exp[φ cos(y − θ)] , 2πI0 (φ)

y ∈ (−π, π),

(9)

where θ ∈ (−π, π), φ > 0 is the precision parameter, and Iν (·) denotes modified Bessel function of

first kind and order ν. This distribution is symmetric around y = θ, and the larger the value of φ,

CR IP T

the greater the concentration around θ. Model (9) is a special case of (1) when t(y, θ) = cos(y − θ)

and a(y, φ) = − log(I0 (φ)) − log(2π), and hence it is a proper dispersion model. The systematic

component of the model is

tan(θl /2) = ηl = β0 + β1 xl1 + · · · + βp−1 xl,p−1 ,

l = 1, . . . , n,

AN US

and hence θl = 2 arctan(ηl ). The covariates were taken as random draws of the uniform distribution U(−0.5; 0.5) and were kept fixed in the Monte Carlo replicates. Also, the regression parameters,

except for those fixed at the null hypothesis, were set at one. The precision parameter was fixed at φ = 4.

Tables 3 and 4 indicate that the usual Wald and LR tests are the most liberal (rejecting the null

M

hypothesis more frequently than expected based on the selected nominal level). For instance, for p = 6, n = 25, and α = 5%, the null rejection rates are, respectively, 14.06% and 11.84% (q = 3),

ED

12.71% and 11.23% (q = 2), and 10.33% and 9.24% (q = 1); see Table 4. The rejection rates of the LR test, although being smaller than those of the Wald test, are considerably bigger than those of the

PT

usual score and gradient tests. For the example just mentioned, the rejection rates of the score and gradient tests are 6.70% and 7.03% (q = 3), 8.27% and 8.59% (q = 2), and 7.79% and 8.04% (q = 1), respectively.

CE

When the number of parameters being tested decreases keeping the number of regression parameters fixed, the null rejection rates of the tests that employ SW and SLR tend to get smaller, and the

AC

opposite is observed for the tests that use SR and ST . For instance, for p = 4, n = 20, and α = 10%, the rejection rates of the Wald and LR tests are 20.70% (SW ) and 16.98% (SLR ) for q = 3, 17.13% (SW ) and 16.59% (SLR ) for q = 2, and 15.72% (SW ) and 15.43% (SLR ) for q = 1, while the rejection rates for the tests that employ SR and ST are 11.49% (SR ) and 11.48% (ST ) for q = 3, 12.92% (SR ) and 13.47% (ST ) for q = 2, and 14.16% (SR ) and 14.33% (ST ) for q = 1; see Table 3. The tests that use the corrected statistics and the bootstrap tests present null rejection rates close to the nominal levels. It is also noteworthy that the modified tests are influenced neither by the number 15

ACCEPTED MANUSCRIPT

of regression parameters nor by the number of parameter under test. For example, for p = 6, n = 20, ∗ b and α = 5%, the null rejection rates are 4.82% (SLR ), 4.79% (SR∗ ), 4.85% (ST∗ ), 4.78% (SW ), 4.84% b ∗ b ), ), 5.34% (SR∗ ), 5.19% (ST∗ ), 4.80% (SW ), 4.91% (SRb ) and 5.10% (STb ) when q = 3; 4.91% (SLR (SLR ∗ b ), 5.21% (SR∗ ), 5.05% (ST∗ ), 5.07% (SLR ), 5.12% (SRb ), and 5.10% (STb ) when q = 2; and 4.79% (SLR b b 4.72% (SW ), 4.88% (SLR ), 4.91% (SRb ), and 4.85% (STb ) when q = 1; see Table 4.

The second simulation experiment deals with the log–gamma dispersion model, which is often

CR IP T

applied in survival data analysis; see Ortega et al. [51], Ortega et al. [52], and Lawless [53, Cap. 5]. We have

π(y; θ, φ) = exp{φ[y − θ − exp(y − θ)] + φ log φ − log Γ(φ)},

y > 0,

(10)

where θ ∈ IR and φ > 0. The density function in (10) is a special case of (1) with t(y, θ) =

AN US

y − θ − exp(y − θ) and a(y, θ) = φ log(φ) − log(Γ(φ)), and hence the model is a proper dispersion model. Let

g(θl ) = ηl = θl = β0 + β1 xl1 + · · · + βp−1 xl,p−1 ,

l = 1, . . . , n.

The covariates were drawn from the uniform distribution U(0; 1) and were kept fixed in the Monte

Carlo replicates. The regression parameters, except for those in the fixed in the null hypothesis, are

M

set equal to one, and φ = 3.

The simulation findings for the log–gamma regression model are qualitatively similar to those

ED

reported for the von Mises regression model (results not shown to save space). Again, the corrected tests and the bootstrap tests are clearly much less size distorted than the original tests. The Bartlett

PT

and Bartlett–type corrections as well as the bootstrap technique are remarkably effective in correcting the size distortions of type I error of the large sample tests when the sample size is small or moderate.

CE

We also performed simulation experiments to investigate the power of the tests. As the simulation results above show, the tests have different sizes when one uses their asymptotic χ2 distribution in

AC

small– and moderate–sized samples. In evaluating the power of these tests, it is important to ensure that they all have the correct size under the null hypothesis. To overcome this difficulty, we first ran 500, 000 Monte Carlo replicates to estimate the exact critical value for each test (except for the bootstrap tests). So, the exact critical value guarantees that all tests have the same type I error, which allows us to compare their powers. Figure 1 displays the power curves for the tests of H0 : β1 = β2 =

0 in von Mises and log–gamma models with n = 30, p = 3, q = 2, and α = 10%. The power of

the tests were calculated under the alternative hypothesis H1 : β1 = β2 = δ for a grid of values for 16

ACCEPTED MANUSCRIPT

20

2

25

30

SR∗

ST∗

b SW

b SLR

SRb

STb

10

20.70

16.98

11.49

11.48

10.38

10.25

10.18

10.00

10.34

10.50

10.57

5

13.72

10.05

5.06

4.89

5.18

4.86

4.86

5.02

5.25

5.31

5.36

1

5.46

2.66

0.52

0.44

0.98

0.78

0.80

1.04

1.05

1.13

1.03

10

18.52

14.85

10.65

10.26

9.75

9.63

9.66

9.93

9.72

9.73

9.85

5

11.71

8.43

4.77

4.50

4.84

4.74

4.65

4.84

4.86

4.90

4.99

1

4.23

2.05

0.47

0.39

0.79

0.65

0.65

0.79

0.84

0.85

0.85

10

16.77

14.28

10.76

10.74

10.09

9.87

9.96

10.19

10.09

10.03

10.09

5

10.25

7.65

4.91

4.82

4.93

4.78

4.82

4.99

4.98

4.95

5.05

1

3.29

2.00

0.83

0.72

1.10

0.96

0.97

1.06

1.19

1.09

1.16

10

17.13

16.59

12.92

13.47

9.74

10.04

9.75

9.64

9.85

10.01

9.83

5

10.46

9.43

5.99

6.49

4.86

4.71

4.67

4.69

4.92

4.86

4.88

1

3.49

2.63

0.94

1.08

1.01

0.93

0.90

1.04

1.14

1.12

1.16

10

17.05

15.38

12.68

12.96

10.45

10.62

10.55

10.50

10.51

10.63

10.57

5

10.51

8.96

6.24

1

3.70

2.50

1.01

10

15.02

13.97

11.64

5

8.91

7.79

1

2.91

10

6.41

5.30

5.13

5.25

5.36

5.42

5.22

5.45

1.09

1.15

0.99

1.08

1.29

1.21

1.22

1.27

12.11

10.05

9.97

10.17

10.03

10.06

9.97

10.17

5.91

6.04

5.09

5.10

5.08

5.13

5.12

5.25

5.18

2.13

1.05

1.08

1.11

1.03

1.05

1.30

1.19

1.20

1.20

15.72

15.43

14.16

14.33

10.12

10.53

10.40

10.09

10.19

10.21

10.26

5

9.21

9.08

7.68

7.75

5.08

5.24

5.31

4.94

5.21

5.23

5.33

1

2.77

2.47

1.33

1.42

0.96

0.86

0.96

0.95

1.04

1.02

1.09

10

14.35

13.63

12.33

12.65

9.84

9.92

9.97

9.69

9.90

9.84

9.90

5

8.34

7.58

6.43

6.65

4.93

5.05

4.99

4.97

4.99

5.03

5.03

1

2.46

2.08

1.27

1.14

0.94

0.94

0.93

1.02

0.99

1.02

0.97

10

13.88

13.71

12.75

13.02

10.14

10.21

10.20

9.97

10.21

10.16

10.22

5

7.64

7.42

6.43

6.64

5.01

5.06

5.05

5.04

5.07

5.04

5.11

1

2.16

2.00

1.24

1.32

0.83

0.93

0.89

0.93

0.97

1.02

0.98

AC

20

∗ SLR

1

25

30

CR IP T

30

ST

AN US

25

SR

M

3

SLR

ED

20

SW

α(%)

PT

n

CE

q

Table 3: Null rejection rates (%) of H0 : β1 = · · · = βq = 0 with p = 4; von Mises model.

17

ACCEPTED MANUSCRIPT

20

2

25

30

SR∗

ST∗

b SW

b SLR

SRb

STb

10

26.37

22.68

15.20

15.77

9.90

10.29

10.29

9.77

9.92

10.09

10.06

5

18.27

14.29

7.02

7.41

4.82

4.79

4.85

4.78

4.84

4.91

5.10

1

7.95

4.62

0.89

1.05

1.01

0.81

0.77

1.12

1.11

1.15

1.13

10

21.85

19.53

14.13

14.76

10.02

10.35

10.43

9.97

10.03

10.21

10.35

5

14.06

11.84

6.70

7.03

4.90

4.82

4.81

4.66

4.93

4.93

4.99

1

5.06

3.27

0.94

1.09

0.93

0.87

0.93

1.00

1.07

1.08

1.09

10

20.15

17.86

13.47

14.01

10.19

10.52

10.36

10.24

10.14

10.44

10.34

5

12.80

10.29

6.64

7.09

5.49

5.32

5.43

5.36

5.56

5.44

5.55

1

4.86

3.33

1.03

1.12

1.11

0.94

0.93

1.18

1.24

1.16

1.17

10

24.02

20.96

16.21

16.82

9.96

10.36

10.56

10.15

10.10

9.86

10.27

5

16.43

13.28

8.35

8.98

4.91

5.34

5.19

4.80

5.07

5.12

5.10

1

6.96

4.41

1.35

1.51

0.89

0.85

0.92

0.93

0.96

1.02

1.07

10

19.80

18.51

15.61

16.04

10.39

10.83

10.72

10.15

10.40

10.53

10.56

5

12.71

11.23

8.27

1

4.75

3.31

1.66

10

17.83

16.23

13.81

5

11.47

9.59

1

4.13

10

8.59

5.24

5.25

5.26

5.13

5.27

5.18

5.22

1.86

1.18

1.13

1.17

1.20

1.32

1.35

1.40

13.96

9.99

10.26

10.10

9.93

10.03

10.08

10.01

7.00

7.30

5.15

4.94

5.11

5.17

5.18

4.89

5.13

2.95

1.53

1.50

1.21

1.19

1.10

1.25

1.29

1.29

1.26

19.04

17.84

15.90

16.42

9.59

10.14

9.93

9.67

9.69

9.75

9.77

5

12.01

10.92

8.90

9.23

4.79

5.21

5.05

4.72

4.88

4.91

4.85

1

4.37

3.67

1.99

2.14

0.87

0.90

0.94

0.87

1.01

0.94

1.00

10

16.81

15.98

14.61

15.02

9.74

10.13

10.13

9.83

9.72

9.84

9.84

5

10.33

9.24

7.79

8.04

4.78

5.12

4.94

4.79

4.81

4.90

4.84

1

3.39

2.71

1.68

1.74

0.98

1.08

1.07

1.00

1.09

1.12

1.13

10

15.28

14.76

13.53

13.81

10.08

10.25

10.18

10.01

10.07

10.07

10.16

5

9.35

8.74

7.45

7.75

5.08

5.28

5.26

5.07

5.14

5.17

5.26

1

2.81

2.35

1.49

1.63

0.88

0.91

0.92

0.72

0.95

0.97

0.97

AC

20

∗ SLR

1

25

30

CR IP T

30

ST

AN US

25

SR

M

3

SLR

ED

20

SW

α(%)

PT

n

CE

q

Table 4: Null rejection rates (%) of H0 : β1 = · · · = βq = 0 with p = 6; von Mises model.

18

ACCEPTED MANUSCRIPT

δ. The power of all tests are similar and, as expected, as |δ| increases, the power tends to 1. Power

−1.5

−1.0

−0.5

0.0

0.5

1.0

1.5

−1.5

−1.0

−0.5

0.0

0.5

1.0

1.5

δ

AN US

δ

SW SLR SR ST S*LR S*R S*T SbW SbLR SbR SbT

CR IP T

0.6 power 0.4 0.0

α = 0.1

0.0

α = 0.1

0.8

1.0

0.2

0.4

power

0.6

SW SLR SR ST S*LR S*R S*T SbW SbLR SbR SbT

0.2

0.8

1.0

simulations for different values of n, p, q, φ, and α presented comparable results.

Figure 1: Power curves for the von Mises (left) and log–gamma (right) dispersion models.

In summary, the simulation study reveals that the Bartlett and Bartlett–type corrections as well

M

as the bootstrap method are effective in correcting the size distortion of the tests with no power loss. The Wald, LR, score, and gradient tests are liberal, i.e. they wrongly reject the null hypothesis more

ED

frequently than expected from the nominal level. Among all tests considered in the study, the usual Wald test is undoubtedly the most liberal. As earlier remarked, there is no general Bartlett–type

PT

correction available for the Wald test and hence the bootstrap method is a suitable option to correct its liberal behavior in small and moderate sized samples. For the other tests, the analytical corrections

Real data applications

AC

5

CE

are attractive because they are not computationally costly, unlike the bootstrap.

The first application deals with the data set presented in Fisher [34] on the distance travelled by 31 small blue periwinkles (Nodilittorina unifasciata) after they have moved down–shore from the height at which they normally live. Following Fisher and Lee [54], we assume a von Mises model for the animals’ path with constant precision parameter and link function tan(θl /2) = ηl = β0 + β1 xl , 19

l = 1, . . . , 31,

ACCEPTED MANUSCRIPT

where θl is the mean direction for a given distance moved xl (cm). The angular responses were transformed to the range (−π, π). Lemonte and Ferrari [35] considered the problem of testing H0 : (0)

(0)

β1 = 0 and H0 : β1 6= β1 , for different values of β1 , using the Wald, LR, score, and gradient tests. We revisit their analysis and include the analytically corrected tests and the bootstrap tests. The maximum likelihood estimates of the parameters (asymptotic standard errors in parentheses) are: βb0 = −0.3233 (0.1506), βb1 = −0.0129 (0.0039), and φb = 3.2654 (0.7262). The observed values

CR IP T

of the test statistics (p–values in parentheses) for testing H0 : β1 = 0 are: SW = 11.0308 (0.0009), ∗ SLR = 9.5257 (0.0020), SR = 7.1261 (0.0076), ST = 8.2801 (0.0040), SLR = 8.7682 (0.0031), b b SR∗ = 7.1901 (0.0073), and ST∗ = 8.8840 (0.0029). The p–values of the bootstrapped tests SW , SLR ,

SRb , and STb are, respectively, 0.0035, 0.0030, 0.0060, and 0.0015. At the usual significance levels, all tests reject the null hypothesis. (0)

(0)

AN US

We now turn to the null hypothesis H0 : β1 = β1 , for different values of β1 . The values chosen (0)

for β1 do not have a scientific meaning behind them; they were chosen to best highlight our results in different settings. The results are summarized in Table 5. The asterisks indicate rejection at 10% (∗ ), 5% (∗∗ ), and 1% (∗∗∗ ) significance levels. From Table 5, all tests reject H0 : β1 = −0.018 at the

M

b SRb , and STb lead to the usual significance levels. In all cases, the tests that use SR , ST , SR∗ , ST∗ , SLR

same decision. On the other hand, there may be considerable conflict among different tests in many

ED

instances. For example, the p–values of the tests of H0 : β1 = −0.022 vary from 1.89% (SW ) to values bigger than 7% (SR∗ , SRb , and STb ).

The second application regards a data set reported in Rosillo et al. [55] and Rosillo [56] on the

PT

lifetime of n = 14 fluorescent lamps. We assume the systematic component

CE

x2l θl = ηl = β0 + β1 x1l + β2 + β3 x3l



x2l x3l

2

,

l = 1, . . . , 14,

(11)

to describe the effect of the covariates glow (x1 ), voltage (x2 ), and nominal voltage (x3 ) on the

AC

lamp lifetime (Y ). We assume a log–gamma distribution for the lifetimes with unknown but constant precision φ for all observations. The maximum likelihood estimates (standard errors in parentheses) are: βb0 = 18.4791 (1.7089), βb1 = 9.4318 (4.2940), βb2 = −35.3273 (3.6169), βb3 = 16.7075 (1.8929),

and φb = 30.6994 (11.5408). The test statistics and corresponding p–values for testing H0 : β1 = 0

are given in Table 6. At the 5% significance level, the tests that employ SW , SLR , and SR reject the ∗ b b null hypothesis, but ST , SLR , SR∗ , ST∗ , SW , SLR , SRb , and STb lead to the opposite decision. Note that

p–values range from 2.81% to 9.67%. Also, the null hypotheses H0 : β2 = 0 and H0 : β3 = 0 are 20

ACCEPTED MANUSCRIPT

(0)

(0)

Table 5: Test statistics of H0 : β1 = β1 against H1 : β1 6= β1 (p–values in parentheses); periwinkles

data.

(0)

−0.026

∗∗∗

5.5088(0.0189)∗∗

11.408(0.0007)

SLR

7.3137(0.0068)∗∗∗

5.6057(0.0179)∗∗

4.0115(0.0452)∗∗

SR

5.8719(0.0154)∗∗

4.6364(0.0313)∗∗

3.4071(0.0649)∗

ST

5.7277(0.0167)∗∗

4.6110(0.0318)∗∗

3.4577(0.0630)∗

∗ SLR

6.7663(0.0093)∗∗∗

5.1847(0.0228)∗∗

3.7091(0.0541)∗

SR∗

5.7598(0.0164)∗∗

ST ∗

5.5647(0.0183)∗∗

b SW

(0.0035)∗∗∗

b SLR

(0.0105)∗∗

SRb

(0.0160)∗∗

STb

(0.0205)∗∗

SLR

3.2603(0.0710)∗

4.4667(0.0346)∗∗

3.3220(0.0684)∗

(0.0110)∗∗

(0.0295)∗∗

(0.0280)∗∗

(0.0580)∗

(0.0380)∗∗

(0.0730)∗

(0.0365)∗∗

(0.0730)∗

M

4.4935(0.0340)∗∗

(0)

β1

−0.018



1.7331(0.1880)

2.5906(0.1075)

1.4114(0.2348)

2.2511(0.1335)

1.2493(0.2637)

3.3554(0.0670)

PT

SR

ED

SW

−0.020

8.1932(0.0042)

−0.022

∗∗∗

SW

statistics

ST

2.3320(0.1267)

1.3207(0.2505)

∗ SLR

2.3946(0.1218)

1.3043(0.2534)

SR∗

2.1270(0.1447)

1.1669(0.2800)

ST ∗

2.2138(0.1368)

1.2370(0.2600)

b SW

(0.0775)∗

(0.2105)

b SLR

(0.1095)

(0.2585)

SRb

(0.1335)

(0.2795)

STb

(0.1220)

(0.2740)

CE AC

−0.024

AN US

statistics

CR IP T

β1

21

ACCEPTED MANUSCRIPT

strongly rejected by all tests.

statistic

observed value

p–value

SW

4.8247

0.0281

SLR

4.2598

0.0390

SR

3.9429

0.0471

ST

3.7532

0.0527

∗ SLR

3.2287

0.0724

SR∗

3.2483

0.0715

ST∗

3.0578

0.0803 0.0967

b SLR

0.0933

SRb

0.0783

AN US

b SW

CR IP T

Table 6: Test statistics of H0 : β1 = 0 against H1 : β1 6= 0 and p–values; fluorescent lamps data.

STb

0.0900

From our simulations, we concluded that the usual Wald, LR, score, and gradient tests are size distorted when the sample size is small (here, n = 14) and that the corrections are effective in remov-

M

ing the type I error distortion. We then rely on the corrected tests, that indicate that the covariate glow

ED

may be removed from model (11). Hence, the final model is  2 x2l x2l , θl = ηl = β0 + β2 + β3 x3l x3l

l = 1, . . . , 14.

The maximum likelihood estimates (standard errors in parentheses) are: βb0 = 19.3698 (1.9430),

6

CE

PT

βb2 = −36.7263 (4.1548), βb3 = 17.3502 (2.1798), and φb = 22.7321 (8.5296).

Final remarks

AC

The main theoretical contribution of this paper is the derivation of a Bartlett–type correction factor in matrix notation to the gradient statistic in dispersion models. The simulation study compares the performance of several different tests in small and moderate sized samples. Our empirical results suggest that the usual large sample tests, namely the Wald, LR, score tests, and the more recently proposed test, the gradient test, may perform poorly if the sample is not large enough to guarantee close approximation to the asymptotic distribution. The effectiveness of the Bartlett correction to the LR statistic and the Bartlett–type corrections to the score and gradient statistics is remarkable 22

ACCEPTED MANUSCRIPT

and comparable to the improvement achieved by the bootstrap method without computational burden. The empirical applications with real data show that the uncorrected tests may lead to misleading conclusions if the sample is not large. We, therefore, recommend the use of analytically or bootstrap corrected tests when performing inference in dispersion models.

CR IP T

Acknowledgements We gratefully acknowledge the financial support of the Brazilian agencies FAPESP (grant 2012/21788– 2) and CNPq (grants 304388/20149 and 301808/2016–3).

AN US

References

[1] G.R. Terrell, The gradient statistic. Computing Science and Statistics, 34, (2002) 206–215. [2] V.M.R. Muggeo, G. Lovison, The ‘three plus one’ likelihood-based test statistics: uniffied geometrical and graphical interpretations. American Statistician, 68, (2014) 302–306.

M

[3] A.J. Lemonte, The Gradient Test: Another Likelihood-Based Test. Academic Press, London, 2016. [4] C.R. Rao, Score test: historical review and recent developments. In: Balakrishnan, N., Kannan, N.,

Birkhauser, Boston, 2005.

ED

Nagaraja, H.N. (Eds.) Advances in Ranking and Selection, Multiple Comparisons, and Reliability.

PT

[5] A.J. Lemonte, S.L.P. Ferrari, The local power of the gradient test. Annals of the Institute of Statistical

CE

Mathematics, 64, (2012) 373–381. [6] D. Lawley, A general method for approximating to the distribution of likelihood ratio criteria. Biometrika,

AC

43, (1956) 295–303.

[7] M.S. Bartlett, Properties of sufficiency and statistical tests. Proceedings of the Royal Society A, 160, (1937) 268–282.

[8] G.M. Cordeiro, S.L.P. Ferrari, A modified score test statistic having chi–squared distribution to order n−1 . Biometrika, 78, (1991) 573–582. [9] T.M. Vargas, S.L.P. Ferrari, A.J. Lemonte, Gradient statistic: higher order asymptotics and Bartlett–type correction. Electronic Journal of Statistics, 7, (2013) 43–61.

23

ACCEPTED MANUSCRIPT

[10] G.M. Cordeiro, F. Cribari–Neto, An Introduction to Bartlett Correction and Bias Reduction. Springer, New York, 2014. [11] A.J. Lemonte, G.M. Cordeiro, G. Moreno–Arenas, Bartlett corrections in Birnbaum–Saunders nonlinear regression models. Journal of Statistical Computation and Simulation, 82, (2012) 927–935. [12] A.J. Lemonte, G.M. Cordeiro, G. Moreno–Arenas, Improved likelihood-based inference in Birnbaum–

CR IP T

Saunders nonlinear regression models. Applied Mathematical Modelling, 40, (2016) 8185–8200. [13] S.L.P. Ferrari, M.A. Uribe–Opazo, Corrected likelihood ratio tests in a class of symmetric linear regression models. Brazilian Journal of Probability and Statistics, 15, (2001) 49–67.

[14] M.A. Uribe–Opazo, S.L.P. Ferrari, G.M. Cordeiro, Improved score tests in symmetric linear regression

AN US

models. Communications in Statistics–Theory and Methods, 37, (2008) 261–276.

[15] F.M.C. Medeiros, S.L.P. Ferrari, Small-sample testing inference in symmetric and log-symmetric linear regression models. Statistica Neerlandica, doi: 10.1111/stan.12107.

[16] B.M. Lagos, P.A. Morettin, L.P. Barroso, Some corrections of the score test statistic for Gaussian ARMA

M

models. Brazilian Journal of Probability and Statistics, 24, (2010) 434–456. [17] G.M. Cordeiro, Improved likelihood ratio statistics for generalized linear models. Biometrika, 74, (1983)

ED

265–274.

[18] G.M. Cordeiro, On the corrections to the likelihood ratio tatistics. Journal of the Royal Statistical Society

PT

B, 45, (1987) 404–413.

[19] F. Cribari–Neto, S.L.P. Ferrari, Second order asymptotics for score tests in generalised linear models.

CE

Biometrika, 82, (1995) 426–432.

[20] G.M. Cordeiro, S.L.P. Ferrari, G.A. Paula, Improved score tests for generalized linear models. Journal of

AC

the Royal Statistical Society B 55, (1993) 661–674.

[21] T.M. Vargas, S.L.P. Ferrari, A.J. Lemonte, Improved likelihood inference in generalized linear models. Computational Statistics & Data Analysis, 74, (2014)110–124.

[22] D.M. Zucker, O. Lieberman, O. Manor, Improved small sample inference in the mixed linear model: Bartlett correction and improved likelihood. Journal of the Royal Statistical Society B, 62, (2000) 827– 838.

24

ACCEPTED MANUSCRIPT

[23] O. Manor, D.M. Zucker, Small sample inference for the fixed effects in the mixed linear model. Computational Statistics & Data Analysis, 46, (2004) 801–817 [24] B.M. Lagos, P.A. Morettin, Improvement of the likelihood ratio test statistic in ARMA models. Journal of Time Series Analysis, 25, (2004) 83–101. [25] D. Tu, J. Chen, P. Shi, Y. Wu, A Bartlett type correction for Rao’s score test in Cox regression model.

CR IP T

Sankhya, 67, (2005) 722–735. [26] N.P.A. van Giersbergen, Bartlett correction in the stable AR(1) model with intercept and trend. Econometric Theory, 25, (2009) 857–872.

[27] P. Bai, Sphericity test in a GMANOVA–MANOVA model with normal error. Journal of Multivariate

AN US

Analysis, 100, (2009) 2305–2312.

[28] A.J. Lemonte, S.L.P. Ferrari, F. Cribari–Neto, Improved likelihood inference in Birnbaum–Saunders regressions. Computational Statistics & Data Analysis, 54, (2010) 1307–1316. [29] A. Fujita, K. Kojima, A.G. Patriota, J.R. Sato, P. Severino, S. Miyano, A fast and robust statistical test

formatics, 26, (2010) 2349–2351.

M

based on likelihood ratio with Bartlett correction to identify Granger causality between gene sets. Bioin-

ED

[30] H. Noma, Confidence intervals for a random-effects meta-analysis based on Bartlett-type corrections. Statistics in Medicine, 30, (2011) 3304–3312.

PT

[31] B. Jørgensen, The Theory of Dispersion Models. Chapman and Hall, London, 1997. [32] A. Lee, Circular data. Wiley Interdisciplinary Reviews: Computational Statistics, 2, (2010) 477–486.

CE

[33] K.V. Mardia, P.E. Jupp, Directional Statistics. Wiley, New Jersey, (2000).

AC

[34] N.I. Fisher, Statistical Analysis of Circular Data. Cambridge University Press, (1993). [35] A.J. Lemonte, S.L.P. Ferrari, Local power and size properties of the LR, Wald, score and gradient tests in dispersion models. Statistical Methodology, 9, (2012) 537–554.

[36] A.B. Simas, G.M. Cordeiro, S. Nadarajah, Asymptotic tail properties of the distributions in the class of dispersion models. Theory of Probability and its Applications, 56, (2012) 703–710. [37] A.B. Simas, A.V. Rocha, W. Barreto–Souza, Bias-corrected estimators for dispersion models with dispersion covariates. Journal of Statistical Planning and Inference, 141, (2011) 3063–3074.

25

ACCEPTED MANUSCRIPT

[38] A.V. Rocha, A.B. Simas, G.M. Cordeiro, Second-order asymptotic expressions for the covariance matrix of maximum likelihood estimators in dispersion models. Statistics and Probability Letters, 80, (2010) 718–725. [39] G.M. Cordeiro, G.A. Paula, D.A. Botter, Improved likelihood ratio tests for dispersion models. International Statistical Review, 62, (1994) 257–274.

CR IP T

[40] S.L.P. Ferrari, G.M. Cordeiro, F. Cribari-Neto, Higher-order asymptotic refinements for score tests in proper dispersion models. Journal of Statistical Planning and Inference, 97, (2001) 177-190.

[41] L.D. Brown, Fundamentals of Statistical Exponential Families. Lecture Notes and Monographs Series, Hayward, 1986.

AN US

[42] D.R. Cox, D.V. Hinkley, Theoretical Statistics. Chapman and Hall, London, 1974

[43] D.R. Cox, N. Reid, Parameter orthogonality and approximate conditional inference. Journal of the Royal Statistical Society B, 40, (1987) 1–39.

[44] P.K. Sen, J.M. Singer, Large Sample Methods in Statistics: An Introduction with Applications, Chapman

M

and Hall/CRC, Boca Raton, 1993.

[45] J.A. Doornik, Object-Oriented Matrix Programming using Ox. Timberlake Consultants Press, London,

ED

2013.

[46] R Core Team R: A Language and Environment for Statistical Computing. R Foundation for Statistical

PT

Computing, Vienna, Austria, 2014.

[47] J. Rafter, M.L. Abell, J.P. Braselton, Statistics with Maple. Elsevier Science & Technology, London, 2003.

CE

[48] B. Efron, R.J. Tibshirani, An Introduction to the Bootstrap. Chapman and Hall, New York, 1993. [49] J. Carpenter, Assessing parameter uncertainty via bootstrap likelihood ratio confidence regions. Journal

AC

of Applied Statistics, 25, (1998) 639–649.

[50] C. Carta, J.A. Bueno, P. Ramirez, Statistical modelling of directional wind speeds using mixtures of von Mises distributions. Energy Conversion and Management, 49, (2008) 897–907.

[51] E.M.M. Ortega, V.G. Gancho, G.A. Paula, Generalized log–gamma regression models with cure fraction. Lifetime Data Analysis, 15, (2009) 79–106.

26

ACCEPTED MANUSCRIPT

[52] E.M.M. Ortega, G.A. Paula, H. Bolfarine, Deviance residuals in generalised log–gamma regression models with censored observations. Journal of Statistical Computation and Simulation, 78, (2008) 747–764. [53] J.F. Lawless, Statistical Models and Methods for Lifetime Data. 2nd ed. Wiley, New Jersey, 2003. [54] N.I. Fisher, A.J. Lee, Regression models for an angular response. Biometrics, 48, (1992) 525–529. [55] F.G. Rosillo, N. Mart´ın, M.A. Egido, Comparison of conventional and accelerated lifetime testing of

CR IP T

fluorescent lamps. Lighting Research and Technology, 42, (2010) 243–259.

[56] F.G. Rosillo, Predicci´on del tempo de vida de l´amparas fluorescentes aplicaci´on a usos fotovoltaicos.

AC

CE

PT

ED

M

AN US

PhD thesis, Instituto de Energ´ıa Solar, Universidad Polit´ecnica de Madrid, Madrid, 2011.

27