Computational Statistics and Data Analysis 55 (2011) 1379–1393
Bias and skewness in a general extreme-value regression model

Wagner Barreto-Souza*, Klaus L.P. Vasconcellos

Universidade Federal de Pernambuco, Departamento de Estatística, Cidade Universitária, 50740-540, Recife, PE, Brazil
Article history: Received 10 January 2010; received in revised form 10 September 2010; accepted 11 September 2010; available online 1 October 2010.

Keywords: Extreme-value regression model; Dispersion covariates; Maximum likelihood estimates; Bias correction; Skewness
Abstract

In this paper we introduce a general extreme-value regression model and derive Cox and Snell's (1968) general formulae for the second-order biases of the maximum likelihood estimates (MLEs) of the parameters. We obtain formulae which can be computed by means of weighted linear regressions. Furthermore, we give the skewness of order $n^{-1/2}$ of the maximum likelihood estimators of the parameters by using Bowman and Shenton's (1998) formula. A simulation study evaluating the results obtained with the use of Cox and Snell's (1968) formulae is discussed. Practical uses of this model and of the derived formulae for bias correction are also presented.
1. Introduction

The extreme-value distribution, named after Emil Julius Gumbel (1891-1966), is perhaps the most widely applied statistical distribution for climate modelling and has also found some applications in engineering. It is also known as the Gumbel distribution and as the log-Weibull distribution. The extreme-value distribution is very useful for predicting the probability that an extreme earthquake, flood or other natural disaster will occur. Applications in accelerated life testing, earthquakes, horse racing, rainfall, queues in supermarkets, sea currents, wind speeds, track race records and so on are listed by Kotz and Nadarajah (2000). The extreme-value regression model is one of the models most commonly used in analyzing lifetime data. Nelson (1982) and Meeker and Escobar (1998) have discussed this regression model extensively and have also illustrated its application. Chan et al. (2008) discussed point and interval estimation for a simple extreme-value linear regression model under Type-II censoring. It is important to observe that the results derived in this paper are not valid under censoring.

In this paper we have two aims. First, we introduce a general extreme-value regression model. We suppose that the location and dispersion parameters vary across observations through nonlinear regression models. The systematic components of the general extreme-value regression model introduced here follow those of the generalized nonlinear models with dispersion covariates (Cordeiro and Udo, 2008); the latter extend the generalized linear models (GLMs) introduced by Nelder and Wedderburn (1972) and the generalized nonlinear models (GNLMs) introduced by Cordeiro and Paula (1989). In contrast with the GNLMs, the dispersion parameter in our model is defined by a nonlinear regression structure, as mentioned above. Other similar regression models have been introduced in the literature. Dey et al. (1997) introduced the overdispersed generalized linear models (OGLMs), and Cordeiro and Botter (2001) considered an additional regression model for a scale parameter in the OGLMs, which is incorporated in the variance function. Jørgensen (1987) introduced a general class of models called exponential dispersion models; this class contains the GNLMs as a particular case. Moreover, the location and scale parameters of exponential dispersion models are orthogonal (Cox and Reid, 1987), in contrast to the general extreme-value regression models. Ferrari and Cribari-Neto (2004) proposed a class
* Corresponding author.
E-mail addresses: [email protected] (W. Barreto-Souza), [email protected] (K.L.P. Vasconcellos).
doi:10.1016/j.csda.2010.09.028
of regression models for modelling rates and proportions, assuming that the response has a beta distribution. Azzalini and Capitanio (1999) examined further probabilistic properties of the multivariate skew-normal distribution, which extends the class of normal distributions; in particular, they discussed skew-normal linear regression models.

Our second aim is to provide formulae for the second-order biases of the maximum likelihood estimates (MLEs) of the parameters of our model. When the sample size is large, the biases of the MLEs are negligible, since, in general, they are of order $O(n^{-1})$, while the asymptotic standard errors are of order $O(n^{-1/2})$. However, bias correction is important when the sample size is small. Many authors have derived expressions for the second-order biases of MLEs in a variety of regression models. Box (1971) gives a general expression for the $n^{-1}$ bias in multivariate nonlinear models with known covariance matrices. Young and Bakir (1987) show the usefulness of bias correction in generalized log-gamma regression models, which contain the extreme-value linear regression model as a particular case. Cordeiro and McCullagh (1991) give general matrix formulae for bias correction in GLMs. Cordeiro and Vasconcellos (1997) obtain general matrix formulae for bias correction in multivariate nonlinear regression models with normal errors. This result is extended by Vasconcellos and Cordeiro (1997) to cover heteroscedastic models, while Cordeiro et al. (1998) derive the second-order bias for the univariate nonlinear Student-t regression. Vasconcellos and Cordeiro (2000) provide an expression for the second-order bias in multivariate nonlinear Student-t regression models. Cordeiro and Botter (2001) and Cordeiro and Udo (2008) derive general formulae for the second-order biases of MLEs in OGLMs and in GNLMs with dispersion covariates, respectively. Ospina et al. (2006) provide the second-order biases of MLEs in beta linear regression models. Recently, Simas et al. (2010) obtained the second-order bias for a general beta regression model.

The paper is organized as follows. In Section 2 we introduce the model and obtain the score function and Fisher's information matrix; with this, we discuss estimation by maximum likelihood and provide asymptotic confidence intervals for the parameters. In Sections 3 and 4, formulae for the second-order biases and the skewness of the maximum likelihood estimators of the parameters are derived. Monte Carlo simulations evaluating the performance of Cox and Snell's (1968) general formula are presented in Section 5.1. An application to a real data set is presented in Section 5.2, and we conclude the paper in Section 6.

2. The general extreme-value regression model

Let Y be a random variable with an extreme-value distribution. Then, the probability density function of Y is given by

$$g(y; \mu, \phi) = \phi^{-1}\exp\left\{-\frac{y-\mu}{\phi} - \exp\left(-\frac{y-\mu}{\phi}\right)\right\}, \qquad y \in \mathbb{R}, \qquad (1)$$
where $\mu \in \mathbb{R}$ and $\phi > 0$ are the location and dispersion parameters, respectively. The moment generating function of Y is given by $E(e^{tY}) = e^{t\mu}\,\Gamma(1 - \phi t)$, with $|t| < \phi^{-1}$. Hence, we obtain the mean and variance of Y:

$$E(Y) = \mu + \gamma\phi \qquad (2)$$

and

$$\mathrm{Var}(Y) = \frac{\pi^2}{6}\,\phi^2, \qquad (3)$$
respectively, where $\gamma$ is the Euler constant, $\gamma = \lim_{n\to\infty}\left(\sum_{k=1}^{n} 1/k - \log n\right) \approx 0.5772$.

Let $Y_1, \ldots, Y_n$ be a random sample, where each $Y_i$ has a probability density function (pdf) given by (1) with location parameter $\mu_i$ and dispersion parameter $\phi_i$. We assume that the components of both parameter vectors $\mu = (\mu_1, \ldots, \mu_n)^\top$ and $\phi = (\phi_1, \ldots, \phi_n)^\top$ vary across observations through nonlinear regression models. The extreme-value regression model with dispersion covariates is defined by (1) and by two systematic components, which are parameterized as

$$g_1(\mu) = \eta_1 = f_1(X; \beta), \qquad g_2(\phi) = \eta_2 = f_2(Z; \theta), \qquad (4)$$

where $g_1(\cdot)$ and $g_2(\cdot)$ are known strictly monotonic and twice-differentiable link functions that map $\mathbb{R}$ and $\mathbb{R}^{+}$, respectively, onto $\mathbb{R}$, $f_1(\cdot\,; \beta)$ and $f_2(\cdot\,; \theta)$ are continuously twice-differentiable nonlinear functions, $\beta = (\beta_1, \ldots, \beta_p)^\top \in \mathbb{R}^p$ and $\theta = (\theta_1, \ldots, \theta_q)^\top \in \mathbb{R}^q$ are unknown parameter vectors to be estimated, and X and Z are $n \times p$ and $n \times q$ matrices of ranks p and q, respectively, with columns representing different covariates; X and Z are not necessarily different.
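For illustration only, model (1) with systematic components (4) is straightforward to simulate; the minimal Python sketch below is not the authors' code (the simulations in Section 5.1 were carried out in Ox) and assumes an identity link with linear $f_1$ and a log link with linear $f_2$:

```python
import numpy as np

rng = np.random.default_rng(2010)

def simulate_gev_regression(X, Z, beta, theta, rng):
    # mu_i via identity link with linear f1, phi_i via log link with
    # linear f2 -- both are illustrative choices, not fixed by the model
    mu = X @ beta
    phi = np.exp(Z @ theta)
    # numpy's Gumbel sampler has exactly density (1)
    return rng.gumbel(loc=mu, scale=phi)

n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = X                                   # X and Z need not be different
beta, theta = np.array([1.0, -1.0]), np.array([-0.7, 0.5])
y = simulate_gev_regression(X, Z, beta, theta, rng)

# sanity check against (2): E(Y_i) = mu_i + gamma*phi_i
EULER_GAMMA = 0.5772156649015329
print(y.mean(), (X @ beta + EULER_GAMMA * np.exp(Z @ theta)).mean())
```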
2.1. Estimation and inference

The log-likelihood function for $Y_1, \ldots, Y_n$ with observed values $y_1, \ldots, y_n$ is

$$\ell \equiv \ell(\beta, \theta) = -\sum_{i=1}^{n} \log(\phi_i) - \sum_{i=1}^{n} \frac{y_i - \mu_i}{\phi_i} - \sum_{i=1}^{n} \exp\left(-\frac{y_i - \mu_i}{\phi_i}\right), \qquad (5)$$

with $\mu_i$ and $\phi_i$ defined by (4).
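As a concrete illustration of maximizing (5) numerically, here is a hedged sketch continuing the simulation above; the identity/log link choices and the BFGS optimizer are assumptions for the example, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import minimize

def negloglik(tau, y, X, Z):
    # negative of (5) under the illustrative identity/log links;
    # tau stacks (beta, theta)
    p = X.shape[1]
    mu = X @ tau[:p]
    phi = np.exp(Z @ tau[p:])
    z = (y - mu) / phi
    return np.sum(np.log(phi) + z + np.exp(-z))

# y, X, Z as in the previous sketch; crude zero starting values
tau0 = np.zeros(X.shape[1] + Z.shape[1])
fit = minimize(negloglik, tau0, args=(y, X, Z), method="BFGS")
tau_hat = fit.x                     # MLE of (beta, theta)
```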
The score function is defined by $U \equiv U(\beta, \theta) = (\partial\ell/\partial\beta^\top, \partial\ell/\partial\theta^\top)^\top$. Let $Y_i^* = \exp(-Y_i/\phi_i)$, $y_i^* = \exp(-y_i/\phi_i)$, $\mu_i^* = E(Y_i^*) = \exp(-\mu_i/\phi_i)$ and $v_i = -1 + (y_i - \mu_i)[1 - \exp(-(y_i - \mu_i)/\phi_i)]/\phi_i$, for $i = 1, \ldots, n$. Hence, we have that

$$U_j(\beta, \theta) = \frac{\partial\ell}{\partial\beta_j} = -\sum_{i=1}^{n} \frac{1}{\mu_i^*\phi_i}(y_i^* - \mu_i^*)\frac{d\mu_i}{d\eta_{1i}}\frac{\partial\eta_{1i}}{\partial\beta_j}, \qquad j = 1, \ldots, p,$$

and

$$U_J(\beta, \theta) = \frac{\partial\ell}{\partial\theta_J} = \sum_{i=1}^{n} \frac{v_i}{\phi_i}\frac{d\phi_i}{d\eta_{2i}}\frac{\partial\eta_{2i}}{\partial\theta_J}, \qquad J = 1, \ldots, q.$$
In matrix notation, we have that

$$\frac{\partial\ell}{\partial\beta} = \tilde{X}^\top \Omega^{-1} M_1 (y^* - \mu^*), \qquad \frac{\partial\ell}{\partial\theta} = \tilde{S}^\top \Phi^{-1} M_2 v, \qquad (6)$$

where we define the $n \times 1$ vectors $y^* = (y_1^*, \ldots, y_n^*)^\top$, $\mu^* = (\mu_1^*, \ldots, \mu_n^*)^\top$ and $v = (v_1, \ldots, v_n)^\top$, as well as the $n \times p$ matrix $\tilde{X} = (\partial\eta_{1i}/\partial\beta_j)_{i,j}$, the $n \times q$ matrix $\tilde{S} = (\partial\eta_{2i}/\partial\theta_J)_{i,J}$ and the $n \times n$ diagonal matrices $\Omega = -\mathrm{diag}(\mu_i^*\phi_i)$, $\Phi = \mathrm{diag}(\phi_i)$, $M_1 = \mathrm{diag}(d\mu_i/d\eta_{1i})$ and $M_2 = \mathrm{diag}(d\phi_i/d\eta_{2i})$.

The maximum likelihood estimators of the parameters β and θ are obtained by solving the nonlinear system U = 0 and do not have closed form. Therefore, nonlinear optimization algorithms, such as Newton or quasi-Newton algorithms, are needed to find the maximum likelihood estimates; for details, see, e.g., Nocedal and Wright (1999).

We now obtain an expression for Fisher's information matrix. First, we define the diagonal matrices $W_{\beta\beta} = \mathrm{diag}\{(d\mu_i/d\eta_{1i})^2/\phi_i^2\}$, $W_{\theta\theta} = \mathrm{diag}\{c\,(d\phi_i/d\eta_{2i})^2/\phi_i^2\}$ and $W_{\beta\theta} = \mathrm{diag}\{(\gamma - 1)(d\mu_i/d\eta_{1i})(d\phi_i/d\eta_{2i})/\phi_i^2\}$, the constant c in $W_{\theta\theta}$ being $c = (1-\gamma)^2 + \pi^2/6 \approx 1.82368$. Fisher's information matrix is then obtained in Appendix A as

$$K = K(\beta, \theta) = \begin{pmatrix} K_{\beta\beta} & K_{\beta\theta} \\ K_{\theta\beta} & K_{\theta\theta} \end{pmatrix} = \begin{pmatrix} \tilde{X}^\top W_{\beta\beta} \tilde{X} & \tilde{X}^\top W_{\beta\theta} \tilde{S} \\ \tilde{S}^\top W_{\theta\beta} \tilde{X} & \tilde{S}^\top W_{\theta\theta} \tilde{S} \end{pmatrix}.$$

Define the matrices $\mathbb{X}$ and $\mathbb{W}$, with dimensions $2n \times (p+q)$ and $2n \times 2n$, respectively, as

$$\mathbb{X} = \begin{pmatrix} \tilde{X} & 0 \\ 0 & \tilde{S} \end{pmatrix} \qquad \text{and} \qquad \mathbb{W} = \begin{pmatrix} W_{\beta\beta} & W_{\beta\theta} \\ W_{\theta\beta} & W_{\theta\theta} \end{pmatrix}.$$

Then, we can write Fisher's information matrix as

$$K = \mathbb{X}^\top \mathbb{W} \mathbb{X}. \qquad (7)$$
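Under the same illustrative links used in the earlier sketches, the weight matrices and (7) translate directly into code; a sketch of our own construction (with identity link $d\mu_i/d\eta_{1i} = 1$ and log link $d\phi_i/d\eta_{2i} = \phi_i$, the derivative matrices reduce to X and Z):

```python
import numpy as np
from scipy.linalg import block_diag

EULER_GAMMA = 0.5772156649015329

def fisher_information(X, Z, theta):
    # K = Xbb' W Xbb as in (7); with identity link (dmu/deta1 = 1) and
    # log link (dphi/deta2 = phi) the derivative matrices reduce to X and Z
    phi = np.exp(Z @ theta)
    c = (1 - EULER_GAMMA) ** 2 + np.pi ** 2 / 6      # ~1.82368
    w_bb = 1.0 / phi ** 2                            # W_beta-beta weights
    w_tt = np.full_like(phi, c)                      # c*(phi/phi)^2 = c
    w_bt = (EULER_GAMMA - 1) / phi                   # W_beta-theta weights
    Xbb = block_diag(X, Z)                           # 2n x (p+q)
    W = np.block([[np.diag(w_bb), np.diag(w_bt)],
                  [np.diag(w_bt), np.diag(w_tt)]])   # 2n x 2n
    return Xbb.T @ W @ Xbb

K = fisher_information(X, Z, tau_hat[X.shape[1]:])
K_inv = np.linalg.inv(K)            # asymptotic covariance of the MLE
```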
We have $W_{\beta\theta} \neq 0$; thus, the parameters β and θ are not orthogonal, in contrast to the case for GLMs (Nelder and Wedderburn, 1972).

Let $\tau = (\beta^\top, \theta^\top)^\top$ be the parameter vector. Under the usual regularity conditions for maximum likelihood estimation and assuming that $J(\tau) = \lim_{n\to\infty} K(\tau)/n$ exists and is nonsingular, we have that

$$\sqrt{n}(\hat{\tau} - \tau) \xrightarrow{d} N_{p+q}(0, J^{-1}(\tau)),$$

where $\hat{\tau}$ is the maximum likelihood estimator of τ and "$\xrightarrow{d}$" denotes convergence in distribution. Hence, denoting by $\hat{\tau}_r$ the r-th component of $\hat{\tau}$, it emerges that $(\hat{\tau}_r - \tau_r)[K^{rr}(\hat{\tau})]^{-1/2} \xrightarrow{d} N(0, 1)$, where $K^{rr}(\hat{\tau})$ is the r-th diagonal element of $K^{-1}(\hat{\tau})$. Thus, an asymptotic confidence interval for $\tau_r$ is given by $\hat{\tau}_r \pm z_{1-\alpha/2}[K^{rr}(\hat{\tau})]^{1/2}$, with asymptotic coverage $100(1-\alpha)\%$, where α is the significance level and $z_\xi$ is the ξ quantile of the standard normal distribution.

We now discuss inference for large samples in the extreme-value regression model with dispersion covariates. Consider the partition $\tau = (\tau_1^\top, \tau_2^\top)^\top$, where $\tau_1$ has dimension g. Suppose we are interested in testing the hypothesis $H_0: \tau_1 = \tau_1^{(0)}$ (the null hypothesis) versus $H_1: \tau_1 \neq \tau_1^{(0)}$ (the alternative hypothesis). The statistic of the likelihood ratio (LR) test is

$$\omega_1 = 2\{\ell(\hat{\tau}) - \ell(\tilde{\tau})\},$$

where ℓ is the log-likelihood function defined in (5) and $\tilde{\tau}$ is the maximum likelihood estimator of τ under the null hypothesis. The score and Wald tests can also be performed. Let $U_{\tau_1}$ and $K^{\tau_1\tau_1}$ be, respectively, the vector and the matrix containing the components of the score vector (6) and of the inverse of the matrix (7) corresponding to $\tau_1$. Rao's score test statistic is given by

$$\omega_2 = \tilde{U}_{\tau_1}^\top \tilde{K}^{\tau_1\tau_1} \tilde{U}_{\tau_1},$$

where tildes indicate that quantities are evaluated under the null hypothesis. Wald's test has the statistic

$$\omega_3 = (\hat{\tau}_1 - \tau_1^{(0)})^\top [\hat{K}^{\tau_1\tau_1}]^{-1} (\hat{\tau}_1 - \tau_1^{(0)}),$$

where $\hat{\tau}_1$ is the maximum likelihood estimator of $\tau_1$. Under $H_0$ and some regularity conditions, we have that $\omega_i \xrightarrow{d} \chi_g^2$, for i = 1, 2, 3, where g is the difference between the number of parameters estimated under $H_1$ and the number estimated under $H_0$. Hence, the decision rule for the i-th test is as follows: we reject the null hypothesis if $\omega_i > \chi_{g,1-\alpha}^2$, where α is the significance level and $\chi_{g,\xi}^2$ represents the ξ quantile of the chi-squared distribution with g degrees of freedom.
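A short sketch of the asymptotic interval and the LR statistic above, continuing the previous snippets; the tested restriction is hypothetical, chosen only for illustration:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, chi2

# asymptotic 95% confidence intervals from K_inv (previous sketch)
alpha = 0.05
se = np.sqrt(np.diag(K_inv))
z = norm.ppf(1 - alpha / 2)
ci = np.column_stack([tau_hat - z * se, tau_hat + z * se])

# LR test of the hypothetical restriction "last theta coefficient = 0":
# refit with the last column of Z dropped and compare log-likelihoods
fit0 = minimize(negloglik, np.zeros(X.shape[1] + Z.shape[1] - 1),
                args=(y, X, Z[:, :-1]), method="BFGS")
omega1 = 2 * (fit0.fun - fit.fun)   # 2{l(hat tau) - l(tilde tau)}
p_value = chi2.sf(omega1, df=1)
```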
3. Bias correction of the MLEs

In this section we obtain an expression for the second-order biases of the MLEs of the parameters of the general extreme-value regression model using Cox and Snell's (1968) general formula. First, we introduce some notation. The partial derivatives of the log-likelihood (5) with respect to the components of the unknown vectors β and θ are indicated by the indices {j, l, ...} and {J, L, ...}, respectively. Hence, we define $U_j = \partial\ell/\partial\beta_j$, $U_J = \partial\ell/\partial\theta_J$, $U_{jL} = \partial^2\ell/\partial\beta_j\partial\theta_L$, $U_{jlM} = \partial^3\ell/\partial\beta_j\partial\beta_l\partial\theta_M$, etc. To denote the moments of these partial derivatives, we use the notation introduced by Lawley (1956): $\kappa_{jl} = E(U_{jl})$, $\kappa_{j,l} = E(U_jU_l)$, $\kappa_{jl,M} = E(U_{jl}U_M)$, etc., where all κ's refer to a total over the sample and are, in general, of order O(n). The derivatives of these moments are denoted by $\kappa_{jl}^{(m)} = \partial\kappa_{jl}/\partial\beta_m$, $\kappa_{jl}^{(M)} = \partial\kappa_{jl}/\partial\theta_M$, etc. Not all of the κ's are functionally independent. For example, $\kappa_{jl,m} = \kappa_{jl}^{(m)} - \kappa_{jlm}$ gives the covariance between the first partial derivative of the log-likelihood (5) with respect to $\beta_m$ and its mixed second partial derivative with respect to $\beta_j, \beta_l$. Note also that $\kappa_{j,l} = -\kappa_{jl}$ and $\kappa_{L,M} = -\kappa_{LM}$ are typical elements of the information matrices $K_{\beta\beta}$ and $K_{\theta\theta}$ for β and θ, respectively. Let $\kappa^{j,l} = -\kappa^{jl}$ and $\kappa^{J,L} = -\kappa^{JL}$ be the corresponding elements of their inverses $K^{\beta\beta}$ and $K^{\theta\theta}$, which are $O(n^{-1})$.

Cox and Snell's (1968) formula can be used to obtain the second-order bias of the MLE of the a-th component of the parameter vector $\hat{\tau} = (\hat{\tau}_1, \ldots, \hat{\tau}_{p+q})^\top = (\hat{\beta}^\top, \hat{\theta}^\top)^\top$:

$$
\begin{aligned}
B(\hat{\tau}_a) ={}& \sum_{j,l,m} \kappa^{aj}\kappa^{lm}\Big(\kappa_{jl}^{(m)} - \tfrac{1}{2}\kappa_{jlm}\Big) + \sum_{J,l,m} \kappa^{aJ}\kappa^{lm}\Big(\kappa_{Jl}^{(m)} - \tfrac{1}{2}\kappa_{Jlm}\Big) \\
&+ \sum_{j,L,m} \kappa^{aj}\kappa^{Lm}\Big(\kappa_{jL}^{(m)} - \tfrac{1}{2}\kappa_{jLm}\Big) + \sum_{j,l,M} \kappa^{aj}\kappa^{lM}\Big(\kappa_{jl}^{(M)} - \tfrac{1}{2}\kappa_{jlM}\Big) \\
&+ \sum_{J,L,m} \kappa^{aJ}\kappa^{Lm}\Big(\kappa_{JL}^{(m)} - \tfrac{1}{2}\kappa_{JLm}\Big) + \sum_{J,l,M} \kappa^{aJ}\kappa^{lM}\Big(\kappa_{Jl}^{(M)} - \tfrac{1}{2}\kappa_{JlM}\Big) \\
&+ \sum_{j,L,M} \kappa^{aj}\kappa^{LM}\Big(\kappa_{jL}^{(M)} - \tfrac{1}{2}\kappa_{jLM}\Big) + \sum_{J,L,M} \kappa^{aJ}\kappa^{LM}\Big(\kappa_{JL}^{(M)} - \tfrac{1}{2}\kappa_{JLM}\Big). \qquad (8)
\end{aligned}
$$
The parameters β and θ are not orthogonal; thus, the entries of the matrix $W_{\beta\theta}$ are not all zero. Hence, all terms in (8) must be considered, which makes the derivation onerous. After tedious algebra, we obtain an expression for the second-order bias of $\hat{\beta}$ in matrix form:

$$B(\hat{\beta}) = K^{\beta\beta}\tilde{X}^\top[W_1Z_{\beta d} + W_2D_\beta + (W_3 + W_5)Z_{\beta\theta d} + W_4D_\theta + W_7Z_{\theta d}]1_{n\times 1} + K^{\beta\theta}\tilde{S}^\top[W_3Z_{\beta d} + W_4D_\beta + (W_6 + W_7)Z_{\beta\theta d} + W_8Z_{\theta d} + W_9D_\theta]1_{n\times 1}, \qquad (9)$$

where the matrices $W_1, \ldots, W_9$ are defined in Appendix B, $1_{n\times 1}$ denotes an $n \times 1$ vector of ones, $Z_{\beta d} = \mathrm{diag}(\tilde{X}K^{\beta\beta}\tilde{X}^\top)$, $Z_{\beta\theta d} = \mathrm{diag}(\tilde{X}K^{\beta\theta}\tilde{S}^\top)$, $Z_{\theta d} = \mathrm{diag}(\tilde{S}K^{\theta\theta}\tilde{S}^\top)$, $D_\beta = \mathrm{diag}\{d_{1\beta}, \ldots, d_{n\beta}\}$ and $D_\theta = \mathrm{diag}\{d_{1\theta}, \ldots, d_{n\theta}\}$, with $d_{i\beta} = \mathrm{trace}(\ddot{X}_iK^{\beta\beta})$ and $d_{i\theta} = \mathrm{trace}(\ddot{S}_iK^{\theta\theta})$, where $\ddot{X}_i = (\partial^2\eta_{1i}/\partial\beta_j\partial\beta_l)_{j,l}$ and $\ddot{S}_i = (\partial^2\eta_{2i}/\partial\theta_J\partial\theta_L)_{J,L}$, for $i = 1, \ldots, n$.

In a similar way, we have an expression for the second-order bias of $\hat{\theta}$ in matrix form:

$$B(\hat{\theta}) = K^{\theta\beta}\tilde{X}^\top[W_1Z_{\beta d} + W_2D_\beta + (W_3 + W_5)Z_{\beta\theta d} + W_4D_\theta + W_7Z_{\theta d}]1_{n\times 1} + K^{\theta\theta}\tilde{S}^\top[W_3Z_{\beta d} + W_4D_\beta + (W_6 + W_7)Z_{\beta\theta d} + W_8Z_{\theta d} + W_9D_\theta]1_{n\times 1}. \qquad (10)$$
We define the $2n \times 1$ vectors $\delta_1$ and $\delta_2$ as

$$\delta_1 = \begin{pmatrix} [W_1Z_{\beta d} + (W_3 + W_5)Z_{\beta\theta d} + W_7Z_{\theta d}]1_{n\times 1} \\ [W_3Z_{\beta d} + (W_6 + W_7)Z_{\beta\theta d} + W_8Z_{\theta d}]1_{n\times 1} \end{pmatrix} \qquad \text{and} \qquad \delta_2 = \begin{pmatrix} [W_2D_\beta + W_4D_\theta]1_{n\times 1} \\ [W_4D_\beta + W_9D_\theta]1_{n\times 1} \end{pmatrix},$$

and the $p \times (p+q)$ upper and $q \times (p+q)$ lower blocks of the matrix $K(\tau)^{-1}$ by $K^{\beta*} = (K^{\beta\beta}\; K^{\beta\theta})$ and $K^{\theta*} = (K^{\theta\beta}\; K^{\theta\theta})$, respectively. With these definitions, we can write the second-order biases of $\hat{\beta}$ and of $\hat{\theta}$ as

$$B(\hat{\beta}) = K^{\beta*}\mathbb{X}^\top(\delta_1 + \delta_2) \qquad (11)$$

and

$$B(\hat{\theta}) = K^{\theta*}\mathbb{X}^\top(\delta_1 + \delta_2), \qquad (12)$$
respectively. Hence, from (11) and (12) we conclude that the second-order bias of the MLE of the joint vector $\tau = (\beta^\top, \theta^\top)^\top$ has the form

$$B(\hat{\tau}) = K^{-1}\mathbb{X}^\top(\delta_1 + \delta_2) = (\mathbb{X}^\top\mathbb{W}\mathbb{X})^{-1}\mathbb{X}^\top(\delta_1 + \delta_2).$$

Defining $\epsilon_1 = \mathbb{W}^{-1}\delta_1$ and $\epsilon_2 = \mathbb{W}^{-1}\delta_2$, we obtain

$$B(\hat{\tau}) = (\mathbb{X}^\top\mathbb{W}\mathbb{X})^{-1}\mathbb{X}^\top\mathbb{W}(\epsilon_1 + \epsilon_2). \qquad (13)$$

Formula (13) shows that the second-order bias of $\hat{\tau}$ is easily obtained as the vector of regression coefficients in the formal linear regression of $\epsilon_1 + \epsilon_2$ on the columns of $\mathbb{X}$ with $\mathbb{W}$ as weight matrix. We can express (13) as $B(\hat{\tau}) = B_1(\hat{\tau}) + B_2(\hat{\tau})$, with $B_1(\hat{\tau}) = (\mathbb{X}^\top\mathbb{W}\mathbb{X})^{-1}\mathbb{X}^\top\mathbb{W}\epsilon_1$ and $B_2(\hat{\tau}) = (\mathbb{X}^\top\mathbb{W}\mathbb{X})^{-1}\mathbb{X}^\top\mathbb{W}\epsilon_2$. If $\epsilon_2 = 0$, formula (13) gives the second-order bias for the extreme-value linear regression with linear dispersion covariates. Therefore, $B_1(\hat{\tau})$ and $B_2(\hat{\tau})$ may be regarded, respectively, as the linearity and nonlinearity terms of the total bias.

Hence, we can define a corrected estimator $\bar{\tau}$ as $\bar{\tau} = \hat{\tau} - \hat{B}(\hat{\tau})$, where $\hat{B}(\hat{\tau})$ denotes the bias (13) with the unknown parameters replaced by their MLEs. The corrected estimator $\bar{\tau}$ is expected to have better sampling properties than the uncorrected $\hat{\tau}$ (see Section 5.1). The asymptotic distribution of $\sqrt{n}(\bar{\tau} - \tau)$ is $N_{p+q}(0, J^{-1}(\tau))$, where, as before, we assume that $J(\tau) = \lim_{n\to\infty}K(\tau)/n$ exists and is nonsingular. Hence, an asymptotic confidence interval for $\tau_r$ (the r-th element of τ) is given by $\bar{\tau}_r \pm z_{1-\alpha/2}[K^{rr}(\bar{\tau})]^{1/2}$, with asymptotic coverage $100(1-\alpha)\%$, where α is the significance level and $z_\xi$ is the ξ quantile of the standard normal distribution.
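To make the weighted-regression reading of (13) concrete, a minimal sketch of ours; building $\epsilon_1$ and $\epsilon_2$ requires the $W_1, \ldots, W_9$ matrices of Appendix B, so they are taken here as given inputs:

```python
import numpy as np

def bias_correct(tau_hat, Xbb, W, eps1, eps2):
    # (13): the bias is the coefficient vector of the formal weighted
    # least-squares regression of eps1 + eps2 on the columns of Xbb;
    # eps1, eps2 require the W_1..W_9 matrices of Appendix B and are
    # taken here as given inputs
    XtW = Xbb.T @ W
    bias = np.linalg.solve(XtW @ Xbb, XtW @ (eps1 + eps2))
    return tau_hat - bias           # the corrected estimator tau-bar
```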
3.1. Bias correction of the MLEs of µ and φ

We now give matrix formulae for the second-order biases of the MLEs of µ and φ. First, expanding the functions $\hat{\eta}_{1i} = f_1(x_i; \hat{\beta})$ and $\hat{\eta}_{2i} = f_2(z_i; \hat{\theta})$ given in (4) in Taylor series up to second order around the points β and θ, respectively, we obtain

$$\hat{\eta}_{1i} - \eta_{1i} = \tilde{X}_i(\hat{\beta} - \beta) + \frac{1}{2}(\hat{\beta} - \beta)^\top\ddot{X}_i(\hat{\beta} - \beta) + o_p(\|\hat{\beta} - \beta\|^2)$$

and

$$\hat{\eta}_{2i} - \eta_{2i} = \tilde{S}_i(\hat{\theta} - \theta) + \frac{1}{2}(\hat{\theta} - \theta)^\top\ddot{S}_i(\hat{\theta} - \theta) + o_p(\|\hat{\theta} - \theta\|^2),$$

where $\tilde{X}_i$ and $\tilde{S}_i$ are the i-th rows of the matrices $\tilde{X}$ and $\tilde{S}$, respectively. Hence, the second-order biases of $\hat{\eta}_1$ and $\hat{\eta}_2$ in matrix notation are given by

$$B(\hat{\eta}_1) = \tilde{X}B(\hat{\beta}) + \frac{1}{2}D_\beta 1_{n\times 1} \qquad \text{and} \qquad B(\hat{\eta}_2) = \tilde{S}B(\hat{\theta}) + \frac{1}{2}D_\theta 1_{n\times 1}.$$

We now expand the functions $\hat{\mu}_i = g_1^{-1}(\hat{\eta}_{1i})$ and $\hat{\phi}_i = g_2^{-1}(\hat{\eta}_{2i})$ in Taylor series up to second order around the points $\eta_{1i}$ and $\eta_{2i}$, respectively. With this, it follows that
$$\hat{\mu}_i - \mu_i = \frac{d\mu_i}{d\eta_{1i}}(\hat{\eta}_{1i} - \eta_{1i}) + \frac{1}{2}\frac{d^2\mu_i}{d\eta_{1i}^2}(\hat{\eta}_{1i} - \eta_{1i})^2 + o_p((\hat{\eta}_{1i} - \eta_{1i})^2)$$

and

$$\hat{\phi}_i - \phi_i = \frac{d\phi_i}{d\eta_{2i}}(\hat{\eta}_{2i} - \eta_{2i}) + \frac{1}{2}\frac{d^2\phi_i}{d\eta_{2i}^2}(\hat{\eta}_{2i} - \eta_{2i})^2 + o_p((\hat{\eta}_{2i} - \eta_{2i})^2).$$

Hence, we obtain the second-order biases of $\hat{\mu}_i$ and $\hat{\phi}_i$:

$$B(\hat{\mu}_i) = B(\hat{\eta}_{1i})\frac{d\mu_i}{d\eta_{1i}} + \frac{1}{2}\mathrm{Var}(\hat{\eta}_{1i})\frac{d^2\mu_i}{d\eta_{1i}^2}$$

and

$$B(\hat{\phi}_i) = B(\hat{\eta}_{2i})\frac{d\phi_i}{d\eta_{2i}} + \frac{1}{2}\mathrm{Var}(\hat{\eta}_{2i})\frac{d^2\phi_i}{d\eta_{2i}^2}.$$
Defining the $n \times n$ diagonal matrices $T_1 = \mathrm{diag}\{d^2\mu_i/d\eta_{1i}^2\}$ and $T_2 = \mathrm{diag}\{d^2\phi_i/d\eta_{2i}^2\}$, we provide an expression for the second-order biases of the MLEs of µ and φ in matrix notation, as follows:

$$B(\hat{\mu}) = \frac{1}{2}\{M_1[2\tilde{X}B(\hat{\beta}) + D_\beta 1_{n\times 1}] + Z_{\beta d}T_11_{n\times 1}\}$$

and

$$B(\hat{\phi}) = \frac{1}{2}\{M_2[2\tilde{S}B(\hat{\theta}) + D_\theta 1_{n\times 1}] + Z_{\theta d}T_21_{n\times 1}\}.$$

For the extreme-value regression model, using (11) and (12) we obtain

$$B(\hat{\mu}) = \frac{1}{2}\{M_1[2\tilde{X}K^{\beta*}\mathbb{X}^\top(\delta_1 + \delta_2) + D_\beta 1_{n\times 1}] + Z_{\beta d}T_11_{n\times 1}\} \qquad (14)$$

and

$$B(\hat{\phi}) = \frac{1}{2}\{M_2[2\tilde{S}K^{\theta*}\mathbb{X}^\top(\delta_1 + \delta_2) + D_\theta 1_{n\times 1}] + Z_{\theta d}T_21_{n\times 1}\}. \qquad (15)$$
The corrected estimators $\bar{\mu} = \hat{\mu} - \hat{B}(\hat{\mu})$ and $\bar{\phi} = \hat{\phi} - \hat{B}(\hat{\phi})$ of µ and φ, respectively, have biases of order $O(n^{-2})$, where $\hat{B}(\cdot)$ denotes the MLE of $B(\cdot)$, i.e., the unknown parameters are replaced by their MLEs.

4. Skewness

We here give an asymptotic formula of order $n^{-1/2}$, where n is the sample size, for the skewness of the distribution of the maximum likelihood estimators of the parameters of the general extreme-value regression model. The assumption of symmetry plays a crucial role in many statistical models. The best-known measure of skewness is Pearson's standardized third cumulant, defined by $\gamma_1 = \kappa_3/\kappa_2^{3/2}$, where $\kappa_2$ and $\kappa_3$ are the second and third cumulants of the distribution, respectively; when $\gamma_1 > 0$ ($\gamma_1 < 0$) the distribution is positively (negatively) skewed. If the distribution is symmetric, $\gamma_1$ vanishes; therefore, its value gives some indication of the departure from symmetry. It has also been suggested and used as a possible measure of nonnormality of a distribution. Bowman and Shenton (1998) give an asymptotic formula for the skewness of the distribution of maximum likelihood estimators and discuss applications related to the maximum likelihood estimators of the parameters of the Poisson, binomial, normal, gamma and beta distributions. Cordeiro and Cordeiro (2001) give an asymptotic formula of order $n^{-1/2}$ for the skewness of the distribution of the maximum likelihood estimates of the linear parameters in generalized linear models; moreover, they also give asymptotic formulae for the skewness of the distribution of the maximum likelihood estimators of the dispersion and precision parameters.

We use the notation introduced in Section 3. Let $\kappa_3(\hat{\tau}_a) = E[(\hat{\tau}_a - \tau_a)^3]$ be the third cumulant of the maximum likelihood estimator $\hat{\tau}_a$ of $\tau_a$, for $a = 1, \ldots, p+q$. Using the formula of Bowman and Shenton (1998), we can write, to order $n^{-2}$,
$$
\begin{aligned}
\kappa_3(\hat{\tau}_a) ={}& \sum_{j,l,m}\kappa^{a,j}\kappa^{a,l}\kappa^{a,m}\{\kappa_{j,l,m} + 3\kappa_{jlm} + 6\kappa_{jl,m}\} + \sum_{J,l,m}\kappa^{a,J}\kappa^{a,l}\kappa^{a,m}\{\kappa_{J,l,m} + 3\kappa_{Jlm} + 6\kappa_{Jl,m}\} \\
&+ \sum_{j,L,m}\kappa^{a,j}\kappa^{a,L}\kappa^{a,m}\{\kappa_{j,L,m} + 3\kappa_{jLm} + 6\kappa_{jL,m}\} + \sum_{j,l,M}\kappa^{a,j}\kappa^{a,l}\kappa^{a,M}\{\kappa_{j,l,M} + 3\kappa_{jlM} + 6\kappa_{jl,M}\} \\
&+ \sum_{J,L,m}\kappa^{a,J}\kappa^{a,L}\kappa^{a,m}\{\kappa_{J,L,m} + 3\kappa_{JLm} + 6\kappa_{JL,m}\} + \sum_{J,l,M}\kappa^{a,J}\kappa^{a,l}\kappa^{a,M}\{\kappa_{J,l,M} + 3\kappa_{JlM} + 6\kappa_{Jl,M}\} \\
&+ \sum_{j,L,M}\kappa^{a,j}\kappa^{a,L}\kappa^{a,M}\{\kappa_{j,L,M} + 3\kappa_{jLM} + 6\kappa_{jL,M}\} + \sum_{J,L,M}\kappa^{a,J}\kappa^{a,L}\kappa^{a,M}\{\kappa_{J,L,M} + 3\kappa_{JLM} + 6\kappa_{JL,M}\}. \qquad (16)
\end{aligned}
$$
Plugging the Bartlett identities $\kappa_{j,l,m} = 2\kappa_{jlm} - \kappa_{jl}^{(m)} - \kappa_{jm}^{(l)} - \kappa_{lm}^{(j)}$ and $\kappa_{jl,m} = \kappa_{jl}^{(m)} - \kappa_{jlm}$ into (16), we obtain the third cumulant of $\hat{\tau}_a$ as a function of the cumulants calculated in the previous section:

$$
\begin{aligned}
\kappa_3(\hat{\tau}_a) ={}& -\sum_{j,l,m}\kappa^{a,j}\kappa^{a,l}\kappa^{a,m}\{\kappa_{jlm} - 5\kappa_{jl}^{(m)} + \kappa_{jm}^{(l)} + \kappa_{lm}^{(j)}\} - \sum_{J,l,m}\kappa^{a,J}\kappa^{a,l}\kappa^{a,m}\{\kappa_{Jlm} - 5\kappa_{Jl}^{(m)} + \kappa_{Jm}^{(l)} + \kappa_{lm}^{(J)}\} \\
&- \sum_{j,L,m}\kappa^{a,j}\kappa^{a,L}\kappa^{a,m}\{\kappa_{jLm} - 5\kappa_{jL}^{(m)} + \kappa_{jm}^{(L)} + \kappa_{Lm}^{(j)}\} - \sum_{j,l,M}\kappa^{a,j}\kappa^{a,l}\kappa^{a,M}\{\kappa_{jlM} - 5\kappa_{jl}^{(M)} + \kappa_{jM}^{(l)} + \kappa_{lM}^{(j)}\} \\
&- \sum_{J,L,m}\kappa^{a,J}\kappa^{a,L}\kappa^{a,m}\{\kappa_{JLm} - 5\kappa_{JL}^{(m)} + \kappa_{Jm}^{(L)} + \kappa_{Lm}^{(J)}\} - \sum_{J,l,M}\kappa^{a,J}\kappa^{a,l}\kappa^{a,M}\{\kappa_{JlM} - 5\kappa_{Jl}^{(M)} + \kappa_{JM}^{(l)} + \kappa_{lM}^{(J)}\} \\
&- \sum_{j,L,M}\kappa^{a,j}\kappa^{a,L}\kappa^{a,M}\{\kappa_{jLM} - 5\kappa_{jL}^{(M)} + \kappa_{jM}^{(L)} + \kappa_{LM}^{(j)}\} - \sum_{J,L,M}\kappa^{a,J}\kappa^{a,L}\kappa^{a,M}\{\kappa_{JLM} - 5\kappa_{JL}^{(M)} + \kappa_{JM}^{(L)} + \kappa_{LM}^{(J)}\}. \qquad (17)
\end{aligned}
$$
S ⊤ , Nβ = (hβ 1 , . . . , hβ n ), Nθ = We now define the matrices Bββ = K ββ X ⊤ , Bβθ = K βθ S ⊤ , Bθ θ = K θ θ S ⊤ , Bθ β = K θ β (hθ 1 , . . . , hθ n ), Nβθ = (hβθ 1 , . . . , hβθ n ) and Nθβ = (hθβ 1 , . . . , hθ β n ), with hβ i = Hβ i 1p×1 , hθ i = Hiθ 1q×1 , hβθ i = Hiβθ 1p×1 ,
i K ββ }, Hθ i = diag{K θ θ i K βθ }, for hθ β i = Hiθβ 1q×1 , Hβ i = diag{K ββ X Si K θ θ }, Hβθ i = diag{K βθ Si K θ β } and Hθ β i = diag{K θ β X i = 1, . . . , n. Here, diag{A}, for a given matrix A, denotes a diagonal matrix with the same diagonal as A. With these definitions and after tedious algebra, which is shown with some details in Appendix C, the third cumulant of β is obtained as
3) (3) (2) (2)⊤ κ3 ( β) = [B(ββ V1 + Bβθ V8 ]1n×1 + Bββ [2V3 + V5 ]B⊤ βθ 1p×1 + Bββ [V6 + 2V7 ]Bβθ 1p×1
⊤ ⊤ ⊤ + diag{Nβ [V2 B⊤ ββ + V4 Bβθ ] + Nβθ [V4 Bββ + V9 Bβθ ]}1p×1 ,
where V1 , . . . , V9 are defined in Appendix C, A(2) = A ⊙ A and A(3) = A ⊙ A ⊙ A, with ⊙ denoting the direct product.
(18)
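Given (18) and its analogue (19) below, the skewness of this section is the ratio of each third cumulant to the corresponding variance raised to the power 3/2; a short sketch of ours (the vector kappa3, stacking the two third-cumulant vectors, is assumed to have been computed from (18) and (19)):

```python
import numpy as np

def mle_skewness(kappa3, K_inv):
    # Pearson's gamma_1 = kappa_3 / kappa_2^{3/2} for each component of
    # the MLE; kappa_2(tau_a) is approximated by the a-th diagonal
    # element of the inverse information K^{-1}
    kappa2 = np.diag(K_inv)
    return kappa3 / kappa2 ** 1.5
```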
Table 1
MLEs and corrected MLEs of the parameters, with their respective mean squared errors in parentheses, for n = 15, 40, 60.

                 β1                   β2                    θ1                    θ2
n = 15
MLEs             1.0157 (0.029573)    −1.1169 (0.79610)     −1.0983 (0.58132)     0.26502 (0.45394)
Corrected MLEs   1.0074 (0.028112)    −1.0618 (0.68206)     −0.85299 (0.38411)    0.37290 (0.34129)
n = 40
MLEs             1.0060 (0.0047162)   −0.99541 (0.046229)   −0.77236 (0.040934)   0.48221 (0.022347)
Corrected MLEs   1.0002 (0.0045484)   −0.99912 (0.045846)   −0.70555 (0.034298)   0.49453 (0.020585)
n = 60
MLEs             1.0036 (0.0035434)   −0.9961 (0.033480)    −0.7501 (0.028271)    0.4857 (0.016972)
Corrected MLEs   0.9998 (0.0034738)   −1.0016 (0.033376)    −0.70428 (0.025105)   0.49542 (0.016105)
Table 2
Uncorrected (UN) and corrected (CO) confidence intervals for the parameters, with their respective empirical coverages in parentheses, for n = 15.

(1 − α) × 100%; UN   β̂0                          β̂1                            θ̂0                            θ̂1
99%                  (0.8187; 1.2126) (0.7091)   (−2.2074; −0.0264) (0.8265)   (−1.9129; −0.2836) (0.7578)   (−0.4546; 0.9846) (0.7441)
95%                  (0.8658; 1.1655) (0.6159)   (−1.9467; −0.2871) (0.7565)   (−1.7182; −0.4784) (0.6450)   (−0.2825; 0.8126) (0.6156)
90%                  (0.8899; 1.1415) (0.5549)   (−1.8133; −0.4205) (0.7028)   (−1.6185; −0.5780) (0.5693)   (−0.1945; 0.7245) (0.5408)

(1 − α) × 100%; CO
99%                  (0.7607; 1.2542) (0.7975)   (−2.3674; 0.2438) (0.8791)    (−1.6675; −0.0384) (0.8476)   (−0.3478; 1.0937) (0.8117)
95%                  (0.8197; 1.1952) (0.7134)   (−2.0553; −0.0683) (0.8195)   (−1.4728; −0.2331) (0.7427)   (−0.1755; 0.9213) (0.6961)
90%                  (0.8498; 1.1650) (0.6518)   (−1.8955; −0.2280) (0.7730)   (−1.3731; −0.3328) (0.6627)   (−0.0873; 0.8331) (0.6185)
In an analogous way, we obtain the third cumulant of the maximum likelihood estimator $\hat{\theta}$ of θ as

$$\kappa_3(\hat{\theta}) = [B_{\theta\beta}^{(3)}V_1 + B_{\theta\theta}^{(3)}V_8]1_{n\times 1} + \mathrm{diag}\{B_{\theta\beta}^{(2)}[2V_3 + V_5]B_{\theta\theta}^\top\}1_{q\times 1} + \mathrm{diag}\{B_{\theta\beta}[V_6 + 2V_7]B_{\theta\theta}^{(2)\top}\}1_{q\times 1} + \mathrm{diag}\{N_{\theta\beta}[V_2B_{\theta\beta}^\top + V_4B_{\theta\theta}^\top] + N_\theta[V_4B_{\theta\beta}^\top + V_9B_{\theta\theta}^\top]\}1_{q\times 1}. \qquad (19)$$

From the third cumulants of order $n^{-2}$ of the maximum likelihood estimators of β and θ, given by (18) and (19), and the asymptotic covariance matrix $(\mathbb{X}^\top\mathbb{W}\mathbb{X})^{-1}$, we obtain the skewness of order $n^{-1/2}$ of the maximum likelihood estimators of β and θ.

5. Numerical results

5.1. Simulation

The aim of this section is to compare the finite-sample performances of the uncorrected and corrected MLEs of the parameters of the extreme-value regression model through Monte Carlo simulation. We have considered the extreme-value regression model with the systematic components
$$\log\mu_i = \beta_1x_{1i} + e^{\beta_2x_{2i}} \qquad \text{and} \qquad \log\phi_i = \theta_1 + \theta_2x_{1i}, \qquad i = 1, \ldots, n.$$

The values of the covariates $x_1$ and $x_2$ were obtained as random draws from the N(−1, 1) and U(0, 1) distributions, respectively. We used 10,000 Monte Carlo replications, and all simulations were carried out using the Ox matrix programming language (Doornik, 2006). The regression model was simulated with $\beta_1 = 1$, $\beta_2 = -1$, $\theta_1 = -0.7$ and $\theta_2 = 0.5$.
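The simulations were written in Ox; the following Python re-sketch of the uncorrected-MLE part of the experiment is ours (the seed, the optimizer and the use of the true values as starting points are illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(123)
n, R = 15, 10_000          # sample size and replications (reduce R for a quick run)
x1, x2 = rng.normal(-1, 1, n), rng.uniform(0, 1, n)
true = np.array([1.0, -1.0, -0.7, 0.5])   # beta1, beta2, theta1, theta2

def nll(tau, y):
    mu = np.exp(tau[0] * x1 + np.exp(tau[1] * x2))   # log mu = beta1*x1 + e^{beta2*x2}
    phi = np.exp(tau[2] + tau[3] * x1)               # log phi = theta1 + theta2*x1
    z = (y - mu) / phi
    return np.sum(np.log(phi) + z + np.exp(-z))

mu = np.exp(true[0] * x1 + np.exp(true[1] * x2))
phi = np.exp(true[2] + true[3] * x1)
estimates = np.empty((R, 4))
for r in range(R):
    y = rng.gumbel(mu, phi)
    estimates[r] = minimize(nll, true, args=(y,), method="BFGS").x
print(estimates.mean(axis=0) - true)    # Monte Carlo estimate of the bias
```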
Table 3
Uncorrected (UN) and corrected (CO) confidence intervals for the parameters, with their respective empirical coverages in parentheses, for n = 40.

(1 − α) × 100%; UN   β̂0                           β̂1                            θ̂0                            θ̂1
99%                  (0.8554; 1.1566) (0.96230)   (−1.5148; −0.4760) (0.96740)  (−1.1972; −0.3475) (0.96020)  (0.1590; 0.8054) (0.96760)
95%                  (0.8914; 1.1206) (0.89900)   (−1.3906; −0.6002) (0.91470)  (−1.0956; −0.4491) (0.89510)  (0.2363; 0.7281) (0.90080)
90%                  (0.9098; 1.1021) (0.83820)   (−1.3271; −0.6637) (0.85870)  (−1.0437; −0.5011) (0.83300)  (0.2758; 0.6886) (0.83800)

(1 − α) × 100%; CO
99%                  (0.8404; 1.1600) (0.97180)   (−1.5542; −0.4431) (0.97760)  (−1.1304; −0.2807) (0.97720)  (0.1712; 0.8179) (0.97360)
95%                  (0.8786; 1.1218) (0.92030)   (−1.4215; −0.5767) (0.93190)  (−1.0288; −0.3823) (0.91950)  (0.2485; 0.7406) (0.91420)
90%                  (0.8982; 1.1023) (0.86500)   (−1.3536; −0.6446) (0.88510)  (−0.9769; −0.4342) (0.86270)  (0.2880; 0.7010) (0.85360)
Table 4
Uncorrected (UN) and corrected (CO) confidence intervals for the parameters, with their respective empirical coverages in parentheses, for n = 60.

(1 − α) × 100%; UN   β̂0                           β̂1                            θ̂0                            θ̂1
99%                  (0.8639; 1.1433) (0.97620)   (−1.4439; −0.5483) (0.97220)  (−1.1258; −0.3745) (0.97010)  (0.1911; 0.7802) (0.97370)
95%                  (0.8973; 1.1099) (0.92170)   (−1.3369; −0.6554) (0.92310)  (−1.0360; −0.4643) (0.91040)  (0.2615; 0.7098) (0.91390)
90%                  (0.9144; 1.0928) (0.86330)   (−1.2821; −0.7102) (0.86720)  (−0.9900; −0.5102) (0.85290)  (0.2976; 0.6738) (0.85440)

(1 − α) × 100%; CO
99%                  (0.8545; 1.1449) (0.98120)   (−1.4712; −0.5318) (0.97950)  (−1.0799; −0.3286) (0.98130)  (0.2007; 0.7901) (0.97730)
95%                  (0.8892; 1.1102) (0.93460)   (−1.3589; −0.6441) (0.93860)  (−0.9901; −0.4184) (0.92820)  (0.2711; 0.7196) (0.92770)
90%                  (0.9070; 1.0925) (0.88020)   (−1.3015; −0.7016) (0.88660)  (−0.9441; −0.4644) (0.87140)  (0.3072; 0.6835) (0.86530)
Table 1 shows the MLEs and corrected MLEs, with their respective mean squared errors (MSEs), for the sample sizes 15, 40 and 60. We see that the biases of the MLEs were reduced, mainly those relating to the dispersion covariates. The MSEs of the MLEs were also reduced after the second-order correction in all cases. We also observe that both the uncorrected and corrected MLEs have their biases reduced when n increases, as expected.

Tables 2-4 give uncorrected and corrected confidence intervals for the parameters and the empirical coverages of those intervals; we use the significance levels α = 0.01, 0.05, 0.1. We observe that the corrected confidence intervals present coverages that are closer to the nominal coverages than their uncorrected counterparts. We also observe that the coverages of both the uncorrected and corrected confidence intervals approach the nominal coverages when the sample size increases.

5.2. Application

Our main aim here is to motivate the use of the extreme-value regression model with dispersion covariates. With this aim, we present an application to a real data set. The data set used here was presented in Huet et al. (2004) and also in Faivre and Masle (1988). It considers the growth of winter wheat, focusing on the differences in dry weights of the wheat tillers and stems. The explanatory variable x is measured on a cumulative degree-days scale. This degree-days measure is an integral in time of all temperatures to which the wheat is submitted that are above the smallest temperature at which wheat can develop. Temperatures are measured in degrees Celsius and time is measured in days, the initial time being determined by the physiological state of the wheat. Plants growing on n = 18 randomly chosen small areas of about 0.15 m² are harvested each week, and the dry weights of the tillers for plants harvested from each area are measured in milligrams.

We assume that the dry weight (y) of tillers follows an extreme-value distribution with pdf of the form (1). We have considered a nonlinear regression with the systematic components parameterized, for i = 1, ..., 18, as

$$\mu_i = \beta_0 + e^{\beta_1 + \beta_2x_i} \qquad \text{and} \qquad \log\phi_i = \theta_0 + \theta_1x_i.$$

Results for this regression have led us to try a restricted regression, with $\theta_0 = 0$, whose systematic components are parameterized, for i = 1, ..., 18, as

$$\mu_i = \beta_0 + e^{\beta_1 + \beta_2x_i} \qquad \text{and} \qquad \log\phi_i = \theta x_i.$$
Wald’s test and the AIC and BIC criteria show that the restricted model is preferable and we then considered bias correction of maximum likelihood estimates for the latter model. Table 5 shows the uncorrected and corrected MLEs of the parameters with their respective asymptotic standard errors and asymptotic confidence intervals (ACI) at the significance level of 5%. Theoretically, it is not difficult to see that the difference between the standard deviations of the uncorrected and corrected estimators is O(n−3/2 ). Therefore, like for the original MLE, the difference between the standard deviation of the corrected estimator and the inverse information is also O(n−1/2 ). In the first table, the asymptotic quantities (standard error and ACI) are estimated using the original MLEs, while,
Table 5
Uncorrected and corrected MLEs of the extreme-value nonlinear regression model with linear dispersion covariates for the data set analyzed.

Uncorrected   MLE       Stand. error   ACI (95%)
β0            81.6      14.7           (52.7, 110.)
β1            −2.74     0.928          (−4.56, −0.922)
β2            0.0140    0.00135        (0.0114, 0.0167)
θ             0.00661   0.000297       (0.00603, 0.00719)

Corrected     MLE       Stand. error   ACI (95%)
β0            79.9      16.1           (48.4, 111.)
β1            −2.69     1.03           (−4.71, −0.673)
β2            0.0140    0.00151        (0.0110, 0.0169)
θ             0.00680   0.000297       (0.00622, 0.00738)
Fig. 1. Graphics of the uncorrected and corrected mean and dispersion curves for the real data set.
in the second, we use the corrected estimators. Fig. 1 shows the uncorrected and corrected mean and dispersion curves for the real data set. Looking at these graphics, we may note that the mean curves provide a good approximation to the data, i.e., they yield good fits. Further, the uncorrected and corrected dispersion curves provide different fits. From Table 5, it can be seen that the correction in the θ parameter, from 0.00661 to 0.00680, corresponds to a shift of 0.00019, which is approximately 64% of the estimated standard deviation of 0.000297. We see, therefore, that, in practice, the bias correction can produce a significant change in the estimation of the parameters, in this case of 64% of the standard deviation.

6. Conclusions

In this paper we introduced a general extreme-value regression model by considering two nonlinear regression structures, for the location and scale parameters. For this model, we obtained the score function and Fisher's information matrix, discussed estimation by maximum likelihood, and provided asymptotic confidence intervals for the parameters. We also obtained the second-order biases and the skewness of the maximum likelihood estimators of the parameters, the main results of this paper. Moreover, for a nonlinear extreme-value regression model with linear dispersion covariates, we observed through Monte Carlo simulations the usefulness of the bias correction, mainly for the dispersion parameters, as can be seen in Section 5.1. We also concluded that interval estimation using the corrected MLEs performs better than with the uncorrected MLEs.

An application to real data was also presented. We considered a data set describing the growth of winter wheat, reported in Huet et al. (2004) and also in Faivre and Masle (1988). The response y is the total dry weight, in milligrams, of tillers for plants growing on n = 18 randomly chosen small areas of about 0.15 m², and the explanatory variable x is measured on a cumulative degree-days scale. We fitted a nonlinear extreme-value regression model with linear dispersion covariates. The homoscedasticity hypothesis was rejected for the model, thus showing the usefulness of dispersion covariates. The fit of the corrected standard deviation shows that the original MLE was underestimating it, as also happened in the simulation study. Furthermore, we observe a reasonably large correction in the estimate of the parameter θ defining the dispersion; the correction corresponds to a shift of approximately 64% of the estimated standard deviation.
Acknowledgements

We gratefully acknowledge financial support from CAPES and CNPq. We also thank an anonymous referee for comments and suggestions.

Appendix A

We now derive Fisher's information matrix. The second-order derivatives of (5) with respect to the parameters β and θ are given by
$$U_{jl} = \frac{\partial^2\ell}{\partial\beta_j\partial\beta_l} = -\sum_{i=1}^{n}\frac{1}{\phi_i^2}\exp\left(-\frac{y_i-\mu_i}{\phi_i}\right)\left(\frac{d\mu_i}{d\eta_{1i}}\right)^2\frac{\partial\eta_{1i}}{\partial\beta_j}\frac{\partial\eta_{1i}}{\partial\beta_l} + \sum_{i=1}^{n}\frac{1}{\phi_i}\left[1-\exp\left(-\frac{y_i-\mu_i}{\phi_i}\right)\right]\left[\frac{d^2\mu_i}{d\eta_{1i}^2}\frac{\partial\eta_{1i}}{\partial\beta_j}\frac{\partial\eta_{1i}}{\partial\beta_l}+\frac{d\mu_i}{d\eta_{1i}}\frac{\partial^2\eta_{1i}}{\partial\beta_j\partial\beta_l}\right], \qquad (20)$$

$$U_{JL} = \frac{\partial^2\ell}{\partial\theta_J\partial\theta_L} = -\sum_{i=1}^{n}\frac{1}{\phi_i^2}\left[-1 + 2\,\frac{y_i-\mu_i}{\phi_i}\left\{1-\exp\left(-\frac{y_i-\mu_i}{\phi_i}\right)\right\} + \left(\frac{y_i-\mu_i}{\phi_i}\right)^2\exp\left(-\frac{y_i-\mu_i}{\phi_i}\right)\right]\left(\frac{d\phi_i}{d\eta_{2i}}\right)^2\frac{\partial\eta_{2i}}{\partial\theta_J}\frac{\partial\eta_{2i}}{\partial\theta_L} + \sum_{i=1}^{n}\frac{v_i}{\phi_i}\left[\frac{d^2\phi_i}{d\eta_{2i}^2}\frac{\partial\eta_{2i}}{\partial\theta_J}\frac{\partial\eta_{2i}}{\partial\theta_L}+\frac{d\phi_i}{d\eta_{2i}}\frac{\partial^2\eta_{2i}}{\partial\theta_J\partial\theta_L}\right] \qquad (21)$$

and

$$U_{jL} = \frac{\partial^2\ell}{\partial\beta_j\partial\theta_L} = -\sum_{i=1}^{n}\frac{1}{\phi_i^2}\,\frac{y_i-\mu_i}{\phi_i}\exp\left(-\frac{y_i-\mu_i}{\phi_i}\right)\frac{d\mu_i}{d\eta_{1i}}\frac{d\phi_i}{d\eta_{2i}}\frac{\partial\eta_{1i}}{\partial\beta_j}\frac{\partial\eta_{2i}}{\partial\theta_L} - \sum_{i=1}^{n}\frac{1}{\phi_i^2}\left[1-\exp\left(-\frac{y_i-\mu_i}{\phi_i}\right)\right]\frac{d\mu_i}{d\eta_{1i}}\frac{d\phi_i}{d\eta_{2i}}\frac{\partial\eta_{1i}}{\partial\beta_j}\frac{\partial\eta_{2i}}{\partial\theta_L}. \qquad (22)$$

Under regularity conditions, it is known that the expected value of the score function vanishes. Hence, we obtain

$$E\left[\exp\left(-\frac{Y_i-\mu_i}{\phi_i}\right)\right] = 1 \qquad \text{and} \qquad E\left[\frac{Y_i-\mu_i}{\phi_i}\exp\left(-\frac{Y_i-\mu_i}{\phi_i}\right)\right] = \gamma - 1,$$

for $i = 1, \ldots, n$, where γ is the Euler constant, $\gamma = \lim_{n\to\infty}\left(\sum_{k=1}^{n}k^{-1} - \log n\right) \approx 0.5772$. It is easy to show that

$$E\left[\left(\frac{Y_i-\mu_i}{\phi_i}\right)^2\exp\left(-\frac{Y_i-\mu_i}{\phi_i}\right)\right] = \Gamma^{(2)}(2) = \frac{\pi^2}{6} - \gamma(2-\gamma),$$

with $\Gamma^{(2)}(\cdot)$ denoting the second derivative of the gamma function. Using the results above and taking expected values in (20)-(22), it emerges that

$$\kappa_{jl} = -\sum_{i=1}^{n}\frac{1}{\phi_i^2}\left(\frac{d\mu_i}{d\eta_{1i}}\right)^2\frac{\partial\eta_{1i}}{\partial\beta_j}\frac{\partial\eta_{1i}}{\partial\beta_l}, \qquad \kappa_{JL} = -[\Gamma^{(2)}(2)+1]\sum_{i=1}^{n}\frac{1}{\phi_i^2}\left(\frac{d\phi_i}{d\eta_{2i}}\right)^2\frac{\partial\eta_{2i}}{\partial\theta_J}\frac{\partial\eta_{2i}}{\partial\theta_L}$$

and

$$\kappa_{jL} = -\sum_{i=1}^{n}\frac{\gamma-1}{\phi_i^2}\frac{d\mu_i}{d\eta_{1i}}\frac{d\phi_i}{d\eta_{2i}}\frac{\partial\eta_{1i}}{\partial\beta_j}\frac{\partial\eta_{2i}}{\partial\theta_L}.$$

Note that $\Gamma^{(2)}(2) + 1 = (1-\gamma)^2 + \pi^2/6 = c$, the constant defined in Section 2.1. In matrix notation, we have

$$E\left(-\frac{\partial^2\ell}{\partial\beta\partial\beta^\top}\right) = \tilde{X}^\top W_{\beta\beta}\tilde{X}, \qquad E\left(-\frac{\partial^2\ell}{\partial\theta\partial\theta^\top}\right) = \tilde{S}^\top W_{\theta\theta}\tilde{S} \qquad \text{and} \qquad E\left(-\frac{\partial^2\ell}{\partial\beta\partial\theta^\top}\right) = \tilde{X}^\top W_{\beta\theta}\tilde{S}.$$
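The three expectations above are easily checked numerically; a small sketch of ours using scipy quadrature over the standard Gumbel density (whose closed form, $e^{-z}e^{-e^{-z}}$, is the standard case of (1)):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gumbel_r

EULER_GAMMA = 0.5772156649015329
targets = {0: 1.0,                                               # E[e^{-Z}]
           1: EULER_GAMMA - 1.0,                                 # E[Z e^{-Z}]
           2: np.pi ** 2 / 6 - EULER_GAMMA * (2 - EULER_GAMMA)}  # Gamma''(2)
for k, target in targets.items():
    # gumbel_r.pdf(z) = exp(-z - e^{-z}) is the standard form of (1)
    val, _ = quad(lambda z: z ** k * np.exp(-z) * gumbel_r.pdf(z), -20, 40)
    print(k, round(val, 6), round(target, 6))
```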
Appendix B

We here present the third-order cumulants. We use the notation of Section 3 together with $(jl)_i = \partial^2\eta_{1i}/\partial\beta_j\partial\beta_l$, $(JL)_i = \partial^2\eta_{2i}/\partial\theta_J\partial\theta_L$, $(j,l)_i = (\partial\eta_{1i}/\partial\beta_j)(\partial\eta_{1i}/\partial\beta_l)$, $(jl,m)_i = (\partial^2\eta_{1i}/\partial\beta_j\partial\beta_l)(\partial\eta_{1i}/\partial\beta_m)$, $(jlm)_i = \partial^3\eta_{1i}/\partial\beta_j\partial\beta_l\partial\beta_m$, and so on.

Taking expected values of the third-order partial derivatives of the log-likelihood (5), it follows that

$$\kappa_{jlm} = -\sum_{i=1}^{n}\left[\phi_i^{-3}\left(\frac{d\mu_i}{d\eta_{1i}}\right)^3 + 3\phi_i^{-2}\frac{d\mu_i}{d\eta_{1i}}\frac{d^2\mu_i}{d\eta_{1i}^2}\right](j,l,m)_i - \sum_{i=1}^{n}\phi_i^{-2}\left(\frac{d\mu_i}{d\eta_{1i}}\right)^2[(lm,j)_i + (jm,l)_i + (jl,m)_i],$$

$$\kappa_{jlM} = \sum_{i=1}^{n}\left[(3-\gamma)\phi_i^{-3}\left(\frac{d\mu_i}{d\eta_{1i}}\right)^2 + (1-\gamma)\phi_i^{-2}\frac{d^2\mu_i}{d\eta_{1i}^2}\right]\frac{d\phi_i}{d\eta_{2i}}(j,l,M)_i + (1-\gamma)\sum_{i=1}^{n}\phi_i^{-2}\frac{d\mu_i}{d\eta_{1i}}\frac{d\phi_i}{d\eta_{2i}}(jl,M)_i,$$

$$\kappa_{JLM} = \sum_{i=1}^{n}\left\{[\Gamma^{(3)}(2) + 6\Gamma^{(2)}(2) + 4]\phi_i^{-3}\left(\frac{d\phi_i}{d\eta_{2i}}\right)^3 - 3[\Gamma^{(2)}(2)+1]\phi_i^{-2}\frac{d\phi_i}{d\eta_{2i}}\frac{d^2\phi_i}{d\eta_{2i}^2}\right\}(J,L,M)_i - [\Gamma^{(2)}(2)+1]\sum_{i=1}^{n}\phi_i^{-2}\left(\frac{d\phi_i}{d\eta_{2i}}\right)^2[(JM,L)_i + (LM,J)_i + (JL,M)_i]$$

and

$$\kappa_{JLm} = \sum_{i=1}^{n}\left\{[4(\gamma-1) - \Gamma^{(2)}(2)]\phi_i^{-3}\left(\frac{d\phi_i}{d\eta_{2i}}\right)^2 + (1-\gamma)\phi_i^{-2}\frac{d^2\phi_i}{d\eta_{2i}^2}\right\}\frac{d\mu_i}{d\eta_{1i}}(J,L,m)_i + (1-\gamma)\sum_{i=1}^{n}\phi_i^{-2}\frac{d\mu_i}{d\eta_{1i}}\frac{d\phi_i}{d\eta_{2i}}(JL,m)_i.$$

The partial derivatives of the elements of Fisher's information matrix are given by

$$\kappa_{jl}^{(m)} = -2\sum_{i=1}^{n}\phi_i^{-2}\frac{d\mu_i}{d\eta_{1i}}\frac{d^2\mu_i}{d\eta_{1i}^2}(j,l,m)_i - \sum_{i=1}^{n}\phi_i^{-2}\left(\frac{d\mu_i}{d\eta_{1i}}\right)^2[(jm,l)_i + (lm,j)_i], \qquad \kappa_{JL}^{(m)} = 0,$$

$$\kappa_{JL}^{(M)} = 2[\Gamma^{(2)}(2)+1]\sum_{i=1}^{n}\left[\phi_i^{-3}\left(\frac{d\phi_i}{d\eta_{2i}}\right)^3 - \phi_i^{-2}\frac{d\phi_i}{d\eta_{2i}}\frac{d^2\phi_i}{d\eta_{2i}^2}\right](J,L,M)_i - [\Gamma^{(2)}(2)+1]\sum_{i=1}^{n}\phi_i^{-2}\left(\frac{d\phi_i}{d\eta_{2i}}\right)^2[(JM,L)_i + (LM,J)_i],$$

$$\kappa_{jL}^{(m)} = (1-\gamma)\sum_{i=1}^{n}\phi_i^{-2}\frac{d\phi_i}{d\eta_{2i}}\left[\frac{d^2\mu_i}{d\eta_{1i}^2}(j,L,m)_i + \frac{d\mu_i}{d\eta_{1i}}(jm,L)_i\right], \qquad \kappa_{jl}^{(M)} = 2\sum_{i=1}^{n}\phi_i^{-3}\left(\frac{d\mu_i}{d\eta_{1i}}\right)^2\frac{d\phi_i}{d\eta_{2i}}(j,l,M)_i$$

and

$$\kappa_{jL}^{(M)} = (\gamma-1)\sum_{i=1}^{n}\left[2\phi_i^{-3}\left(\frac{d\phi_i}{d\eta_{2i}}\right)^2 - \phi_i^{-2}\frac{d^2\phi_i}{d\eta_{2i}^2}\right]\frac{d\mu_i}{d\eta_{1i}}(j,L,M)_i - (\gamma-1)\sum_{i=1}^{n}\phi_i^{-2}\frac{d\mu_i}{d\eta_{1i}}\frac{d\phi_i}{d\eta_{2i}}(LM,j)_i.$$

We define the quantities

$$\omega_{1i} = \frac{1}{2}\left[\phi_i^{-3}\left(\frac{d\mu_i}{d\eta_{1i}}\right)^3 - \phi_i^{-2}\frac{d\mu_i}{d\eta_{1i}}\frac{d^2\mu_i}{d\eta_{1i}^2}\right], \qquad \omega_{2i} = -\frac{1}{2}\phi_i^{-2}\left(\frac{d\mu_i}{d\eta_{1i}}\right)^2,$$

$$\omega_{3i} = \frac{1-\gamma}{2}\phi_i^{-2}\frac{d^2\mu_i}{d\eta_{1i}^2}\frac{d\phi_i}{d\eta_{2i}} - \frac{3-\gamma}{2}\phi_i^{-3}\left(\frac{d\mu_i}{d\eta_{1i}}\right)^2\frac{d\phi_i}{d\eta_{2i}}, \qquad \omega_{4i} = \frac{1-\gamma}{2}\phi_i^{-2}\frac{d\mu_i}{d\eta_{1i}}\frac{d\phi_i}{d\eta_{2i}},$$

$$\omega_{5i} = \frac{\gamma+1}{2}\phi_i^{-3}\left(\frac{d\mu_i}{d\eta_{1i}}\right)^2\frac{d\phi_i}{d\eta_{2i}} + \frac{\gamma-1}{2}\phi_i^{-2}\frac{d^2\mu_i}{d\eta_{1i}^2}\frac{d\phi_i}{d\eta_{2i}},$$

$$\omega_{6i} = -\frac{1}{2}[4(\gamma-1)-\Gamma^{(2)}(2)]\phi_i^{-3}\frac{d\mu_i}{d\eta_{1i}}\left(\frac{d\phi_i}{d\eta_{2i}}\right)^2 - \frac{1-\gamma}{2}\phi_i^{-2}\frac{d\mu_i}{d\eta_{1i}}\frac{d^2\phi_i}{d\eta_{2i}^2},$$

$$\omega_{7i} = \frac{\Gamma^{(2)}(2)}{2}\phi_i^{-3}\frac{d\mu_i}{d\eta_{1i}}\left(\frac{d\phi_i}{d\eta_{2i}}\right)^2 + \frac{1-\gamma}{2}\phi_i^{-2}\frac{d\mu_i}{d\eta_{1i}}\frac{d^2\phi_i}{d\eta_{2i}^2},$$

$$\omega_{8i} = -\left[\frac{\Gamma^{(3)}(2)}{2} + \Gamma^{(2)}(2)\right]\phi_i^{-3}\left(\frac{d\phi_i}{d\eta_{2i}}\right)^3 - \frac{\Gamma^{(2)}(2)+1}{2}\phi_i^{-2}\frac{d\phi_i}{d\eta_{2i}}\frac{d^2\phi_i}{d\eta_{2i}^2} \qquad \text{and} \qquad \omega_{9i} = -\frac{\Gamma^{(2)}(2)+1}{2}\phi_i^{-2}\left(\frac{d\phi_i}{d\eta_{2i}}\right)^2,$$

and the diagonal matrices $W_k = \mathrm{diag}\{\omega_{k1}, \ldots, \omega_{kn}\}$, for $k = 1, \ldots, 9$; $\Gamma^{(3)}(\cdot)$ denotes the third derivative of the gamma function. It can be verified that

$$\Gamma^{(3)}(2) = (1-\gamma)\frac{\pi^2}{2} + \gamma^2(3-\gamma) - 2\zeta(3),$$

where $\zeta(3) = \sum_{k=1}^{\infty}k^{-3} \approx 1.202$.

With the notation introduced above, we have that

$$\kappa_{jl}^{(m)} - \tfrac{1}{2}\kappa_{jlm} = \sum_{i=1}^{n}\omega_{1i}(j,l,m)_i - \sum_{i=1}^{n}\omega_{2i}[(jl,m)_i - (jm,l)_i - (lm,j)_i],$$
$$\kappa_{Jl}^{(m)} - \tfrac{1}{2}\kappa_{Jlm} = \sum_{i=1}^{n}\omega_{3i}(J,l,m)_i + \sum_{i=1}^{n}\omega_{4i}(lm,J)_i,$$
$$\kappa_{jL}^{(m)} - \tfrac{1}{2}\kappa_{jLm} = \sum_{i=1}^{n}\omega_{3i}(j,L,m)_i + \sum_{i=1}^{n}\omega_{4i}(jm,L)_i,$$
$$\kappa_{jl}^{(M)} - \tfrac{1}{2}\kappa_{jlM} = \sum_{i=1}^{n}\omega_{5i}(j,l,M)_i - \sum_{i=1}^{n}\omega_{4i}(jl,M)_i,$$
$$\kappa_{JL}^{(m)} - \tfrac{1}{2}\kappa_{JLm} = \sum_{i=1}^{n}\omega_{6i}(J,L,m)_i - \sum_{i=1}^{n}\omega_{4i}(JL,m)_i,$$
$$\kappa_{Jl}^{(M)} - \tfrac{1}{2}\kappa_{JlM} = \sum_{i=1}^{n}\omega_{7i}(J,l,M)_i + \sum_{i=1}^{n}\omega_{4i}(JM,l)_i,$$
$$\kappa_{jL}^{(M)} - \tfrac{1}{2}\kappa_{jLM} = \sum_{i=1}^{n}\omega_{7i}(j,L,M)_i + \sum_{i=1}^{n}\omega_{4i}(LM,j)_i$$

and

$$\kappa_{JL}^{(M)} - \tfrac{1}{2}\kappa_{JLM} = \sum_{i=1}^{n}\omega_{8i}(J,L,M)_i + \sum_{i=1}^{n}\omega_{9i}[(LM,J)_i - (JL,M)_i + (JM,L)_i].$$
We now calculate the terms in (8) to obtain the second-order biases for the MLEs of the extreme-value regression model with dispersion covariates. ea as the ath column vector of the matrix, 1n×1 as a vector of ones with ∑nDefine∑ ∑pn × p identity ∑ aj lM aJ Lm dimension n × 1, L1,a = ω κ κ ( jl , M ) and L = − ω κ κ (JL, m)i . We have 4i i 2 , a 4i i=1 j ,l ,M i=1 J ,L,m
−
n n − − − − − − 1 κ aj κ lm κjl(m) − κjlm = ω1i κ aj (j)i κ lm (l, m)i + ω2,i κ aj (j)i κ lm (lm)i 2
j,l,m
−
2
j,l,M
i =1
i =1
l ,m
J
i =1
βθ⊤ βθ⊤ = e⊤ S W3 Zβ d 1n×1 + e⊤ S W4 Dβ 1n×1 , a K a K
κ κ aj
Lm
(m)
1
κjL − κjLm 2
j,L,m
−
l ,m
j
j
l ,m
n n − − − − − − 1 κ aJ κ lm κJl(m) − κJlm = ω3i κ aJ (J )i κ lm (l, m)i + ω4,i κ aJ (J )i κ lm (lm)i
J ,l ,m
−
i =1
ββ ⊤ ββ ⊤ = e⊤ X W1 Zβ d 1n×1 + e⊤ X W2 Dβ 1n×1 , a K a K
=
n − i=1
ω3i
− j
κ aj (j)i
−
κ Lm (L, m)i + L1,a
L,m
ββ ⊤ = e⊤ X W3 Zβθ d 1n×1 + L1,a , a K
κ κ aj
lM
n − − − 1 (M ) κjl − κjlM = ω5i κ aj (j)i κ lM (l, M )i − L1,a 2
i=1
j
l ,M
ββ ⊤ = e⊤ X W5 Zβθ d 1n×1 − L1,a , a K
J
l ,m
W. Barreto-Souza, K.L.P. Vasconcellos / Computational Statistics and Data Analysis 55 (2011) 1379–1393
−
1
κ aJ κ Lm κJL(m) − κJLm
n −
ω6i
−
i=1
κ aJ (J )i
−
1391
κ Lm (L, m)i + L2,a
L ,m
J
βθ⊤ S W6 Zβθ d 1n×1 + L2,a , = e⊤ a K
n − − − 1 ω7i κ aJ (J )i κ lM (l, M )i − L2,a κ aJ κ lM κJl(M ) − κJlM = 2
J ,l,M
−
=
2
J ,L,m
−
i =1
l ,M
J
βθ⊤ S W7 Zβθ d 1n×1 − L2,a , = e⊤ a K
κ κ aj
LM
(M )
κjL
1
− κjLM
2
j,L,M
=
n −
ω7i
−
i =1
−
κ aj (j)i
κ LM (L, M )i +
n −
L,M
j
−
ω4i
i =1
κ aj (j)i
−
κ LM (LM )i
L,M
j
ββ ⊤ ββ ⊤ X W4 Dθ 1n×1 , X W7 Zθ d 1n×1 + e⊤ = e⊤ a K a K
and
−
n n − − − − − − 1 κ aJ κ LM κJL(M ) − κJLM = ω8i κ aJ (J )i κ LM (L, M )i + ω9i κ aJ (J )i κ LM (LM )i 2
J ,L,M
i=1
L,M
J
i=1
⊤ βθ⊤
L,M
J
⊤ βθ⊤
= ea K S W8 Zθ d 1n×1 + ea K S W9 Dθ 1n×1 . Appendix C We here calculate the cumulants of expression (17). With the cumulants obtained previously, it is easy to show that (m)
−κjlm + 5κjl
(l)
(j)
− κjm − κlm =
n −
φi
−3
n −
φi−2
−κJlm + 5κJl
(l)
(J )
− κJm − κlm =
n −
d µi
− 3φi
(γ − 5)φi
−3
2
(L)
(j)
−κjLm + 5κjL − κjm − κLm =
n −
(γ − 5)φi
−3
d µi
n −
φi−2
i=1
(M )
−κjlM + 5κjl
(l)
(j)
− κjM − κlM =
n −
(γ + 7)φi
−3
n −
φi−2
i =1
(m)
(L)
(J )
−κJLm + 5κJL − κJm − κLm =
n −
dφi
dµi dφi
2
dη2i
(2)]φi
−3
n −
φi−2
i=1
(M )
−κJlM + 5κJl
(l)
(J )
− κJM − κlM =
n −
dµi dφi dη1i dη2i
− 3(1 − γ )φi
d µi
[−4(1 − γ ) + Γ
(2)]φi
i=1
+ 3(1 − γ )
n − i =1
φi−2
d µi d φ i dη1i dη2i
2
dφi
2
dη2i
− 3(1 − γ )φi
−2
dµi d2 φi
2 dη1i dη2i
(J , L, m)
(JL, m)i ,
(2)
µi dφi (j, l, M )i 2 dη2i dη1i
−2 d
dη1i
i =1
− 3(1 − γ )
+ 3(1 − γ )φi
2
(jl, M )i ,
dη1i dη2i
[8(1 − γ ) + Γ
µi dφi (j, L, m)i 2 dη2i dη1i
−2 d
(jm, L)i ,
dφi
dµi dφi
(2)
+ 3(1 − γ )φi
dη2i
dη1i
i=1
− 3(1 − γ )
2
µi dφi (J , l, m)i 2 dη2i dη1i 2
(J , lm)i ,
dη1i dη2i
d µi
(j, l, m)i
−2 d
dη2i
dη1i
i=1
+ 3(1 − γ )
dφi
dη1i dη2i
i=1
(m)
2
dµi dφi
φi−2
2 dη1i dη1i
dη1i
n −
d µi d 2 µi
[(jm, l)i + (j, lm)i − (jl, m)i ],
d µi
i=1
+ 3(1 − γ )
−2
dη1i
i=1
(m)
3
dη1i
i=1
−3
d µi
−3
d µi dη1i
(JM , l)i ,
dφ i dη2i
2
+ 3(1 − γ )φi
−2
dµi d2 φi 2 dη1i dη2i
(J , l , M )
1392
W. Barreto-Souza, K.L.P. Vasconcellos / Computational Statistics and Data Analysis 55 (2011) 1379–1393
(M )
−κjLM + 5κjL
n −
(j)
(L)
− κjM − κLM =
[−4(1 − γ ) + Γ
(2)
(2)]φi
n −
dµi dφi
φi−2
dη1i dη2i
i=1
dφi
dη1i
i =1
+ 3(1 − γ )
d µi
−3
2
dη2i
+ 3(1 − γ )φi
−2
dµi d2 φi 2 dη1i dη2i
(j, L, M )
(j, LM )i
and (M )
n −
(J )
(L)
−κJLM + 5κJL − κJM − κLM =
(−Γ
(3)
(2) + 2)φi n −
3
− 3(Γ
dη2i
i=1
− 3(Γ (2) (2) + 1)
dφ i
−3
φi−2
dφ i
2
dη2i
i=1
(2)
(2) + 1)φi
−2
dφi d2 φi 2 dη2i dη2i
[(J , LM )i + (JM , L)i − (JL, M )i ].
We define
v1i = φi−3
d µi
3
dη1i
v3i = (γ − 5)φi−3 v5i = (γ + 7)φi−3
− 3φi−2 d µi dη1i
d µi
dφ i dη2i
2
dφ i dη2i
(2)
−3
v7i = [−4(1 − γ ) + Γ v8i = [2 − Γ
2 dη1i dη1i
dη1i
v6i = [8(1 − γ ) + Γ
(3)
2
d µi d 2 µi
(2)]φi
(2)]φi
(2)
−3
d µi
dη1i
3
dη2i
2
dη2i
d µi
−3
dφ i
dφ i
d2 µi dφi dη
dη2i
2 1i
d2 µi dφi
− 3(1 − γ )φi−2
dη1i
dφi
v2i = −3φi−2
+ 3(1 − γ )φi−2
(2)]φi
,
2 dη2i dη1i
d µi dη1i
,
v4i = 3(1 − γ )φi−2
dµi d2 φi 2 dη1i dη2i
+ 3(1 − γ )φi−2
dη2i
− 3[Γ (2) (2) + 1]φi−2
, dµi dφi dη1i dη2i
,
− 3(1 − γ )φi−2
2
2
,
dµi d2 φi 2 dη1i dη2i
,
dφi d2 φi 2 dη2i dη2i
and
v9i = −3[Γ (2) (2) + 1]φi−2
dφ i
2
dη2i
,
and the diagonal matrices Vk = diag{vk1 , . . . , vkn }, for i = 1, . . . , n and k = 1, . . . , 9. Hence, we obtain (j) (l) −κjlm + 5κjl(m) − κjm − κlm =
n −
v1i (j, l, m)i +
n −
i=1
(J ) (l) − κlm = −κJlm + 5κJl(m) − κJm
n −
v3i (J , l, m)i +
n −
i=1
v3i (j, L, m)i +
n −
i=1
(j) (l) −κjlM + 5κjl(M ) − κjM − κlM =
n −
v5i (j, l, M )i −
n −
n −
n −
n −
v6i (J , L, m)i − v7i (J , l, M )i +
n − i=1
v4i (JL, m)i ,
i=1 n −
i =1
(j) (L) −κjLM + 5κjL(M ) − κjM − κLM =
v4i (jl, M )i ,
i=1
i=1
(J ) (l) −κJlM + 5κJl(M ) − κJM − κlM =
v4i (L, jm)i ,
i=1
i =1
(J ) (L) −κJLm + 5κJL(m) − κJm − κLm =
v4i (J , lm)i ,
i=1
n −
(j) (L) = −κjLm + 5κjL(m) − κjm − κLm
v2i [(jm, l)i + (j, lm)i − (jl, m)i ],
i =1
v4i (JM , l)i ,
i =1
v7i (j, L, M )i +
n − i =1
v4i (j, LM )i ,
,
(J , L, M )i
W. Barreto-Souza, K.L.P. Vasconcellos / Computational Statistics and Data Analysis 55 (2011) 1379–1393
1393
and (J ) (L) −κJLM + 5κJL(M ) − κJM − κLM =
n − i=1
v8i (J , L, M )i +
n −
v9i [(J , LM )i + (JM , L)i − (JL, M )i ].
i =1
Finally, we calculate the terms in (17) to obtain the third cumulant of order $n^{-2}$ of the MLEs of the extreme-value regression model with dispersion covariates. It follows that

$$-\sum_{j,l,m}\kappa^{a,j}\kappa^{a,l}\kappa^{a,m}\{\kappa_{jlm} - 5\kappa_{jl}^{(m)} + \kappa_{jm}^{(l)} + \kappa_{lm}^{(j)}\} = e_a^\top[B_{\beta\beta}^{(3)}V_11_{n\times 1} + \mathrm{diag}\{N_\beta V_2B_{\beta\beta}^\top\}1_{p\times 1}],$$
$$-\sum_{J,l,m}\kappa^{a,J}\kappa^{a,l}\kappa^{a,m}\{\kappa_{Jlm} - 5\kappa_{Jl}^{(m)} + \kappa_{Jm}^{(l)} + \kappa_{lm}^{(J)}\} = e_a^\top[\mathrm{diag}\{B_{\beta\beta}^{(2)}V_3B_{\beta\theta}^\top\}1_{p\times 1} + \mathrm{diag}\{N_\beta V_4B_{\beta\theta}^\top\}1_{p\times 1}],$$
$$-\sum_{j,L,m}\kappa^{a,j}\kappa^{a,L}\kappa^{a,m}\{\kappa_{jLm} - 5\kappa_{jL}^{(m)} + \kappa_{jm}^{(L)} + \kappa_{Lm}^{(j)}\} = e_a^\top[\mathrm{diag}\{B_{\beta\beta}^{(2)}V_3B_{\beta\theta}^\top\}1_{p\times 1} + \mathrm{diag}\{N_\beta V_4B_{\beta\theta}^\top\}1_{p\times 1}],$$
$$-\sum_{j,l,M}\kappa^{a,j}\kappa^{a,l}\kappa^{a,M}\{\kappa_{jlM} - 5\kappa_{jl}^{(M)} + \kappa_{jM}^{(l)} + \kappa_{lM}^{(j)}\} = e_a^\top[\mathrm{diag}\{B_{\beta\beta}^{(2)}V_5B_{\beta\theta}^\top\}1_{p\times 1} - \mathrm{diag}\{N_\beta V_4B_{\beta\theta}^\top\}1_{p\times 1}],$$
$$-\sum_{J,L,m}\kappa^{a,J}\kappa^{a,L}\kappa^{a,m}\{\kappa_{JLm} - 5\kappa_{JL}^{(m)} + \kappa_{Jm}^{(L)} + \kappa_{Lm}^{(J)}\} = e_a^\top[\mathrm{diag}\{B_{\beta\beta}V_6B_{\beta\theta}^{(2)\top}\}1_{p\times 1} - \mathrm{diag}\{N_{\beta\theta}V_4B_{\beta\beta}^\top\}1_{p\times 1}],$$
$$-\sum_{J,l,M}\kappa^{a,J}\kappa^{a,l}\kappa^{a,M}\{\kappa_{JlM} - 5\kappa_{Jl}^{(M)} + \kappa_{JM}^{(l)} + \kappa_{lM}^{(J)}\} = e_a^\top[\mathrm{diag}\{B_{\beta\beta}V_7B_{\beta\theta}^{(2)\top}\}1_{p\times 1} + \mathrm{diag}\{N_{\beta\theta}V_4B_{\beta\beta}^\top\}1_{p\times 1}],$$
$$-\sum_{j,L,M}\kappa^{a,j}\kappa^{a,L}\kappa^{a,M}\{\kappa_{jLM} - 5\kappa_{jL}^{(M)} + \kappa_{jM}^{(L)} + \kappa_{LM}^{(j)}\} = e_a^\top[\mathrm{diag}\{B_{\beta\beta}V_7B_{\beta\theta}^{(2)\top}\}1_{p\times 1} + \mathrm{diag}\{N_{\beta\theta}V_4B_{\beta\beta}^\top\}1_{p\times 1}]$$

and

$$-\sum_{J,L,M}\kappa^{a,J}\kappa^{a,L}\kappa^{a,M}\{\kappa_{JLM} - 5\kappa_{JL}^{(M)} + \kappa_{JM}^{(L)} + \kappa_{LM}^{(J)}\} = e_a^\top[B_{\beta\theta}^{(3)}V_81_{n\times 1} + \mathrm{diag}\{N_{\beta\theta}V_9B_{\beta\theta}^\top\}1_{p\times 1}].$$
References

Azzalini, A., Capitanio, A., 1999. Statistical applications of the multivariate skew-normal distribution. Journal of the Royal Statistical Society B 61, 579-602.
Bowman, K.O., Shenton, L.R., 1998. Asymptotic skewness and the distribution of maximum likelihood estimators. Communications in Statistics. Theory and Methods 27, 2743-2760.
Box, M., 1971. Bias in nonlinear estimation (with discussion). Journal of the Royal Statistical Society B 33, 171-201.
Chan, P.S., Ng, H.K.T., Balakrishnan, N., Zhou, Q., 2008. Point and interval estimation for extreme-value regression model under Type-II censoring. Computational Statistics & Data Analysis 52, 4040-4058.
Cordeiro, G.M., Botter, D.A., 2001. Second-order biases of maximum likelihood estimates in overdispersed generalized linear models. Statistics & Probability Letters 55, 269-280.
Cordeiro, G.M., McCullagh, P., 1991. Bias correction in generalized linear models. Journal of the Royal Statistical Society B 53, 629-643.
Cordeiro, G.M., Paula, G.A., 1989. Improved likelihood ratio statistics for exponential family nonlinear models. Biometrika 76, 93-100.
Cordeiro, G.M., Udo, M.C.T., 2008. Bias correction in generalized nonlinear models with dispersion covariates. Communications in Statistics. Theory and Methods 37, 2219-2225.
Cordeiro, G.M., Vasconcellos, K.L.P., 1997. Bias correction for a class of multivariate nonlinear regression models. Statistics & Probability Letters 35, 155-164.
Cordeiro, G.M., Vasconcellos, K.L.P., Santos, M.L., 1998. On the second-order bias of parameter estimates in nonlinear regression models with Student t errors. Journal of Statistical Computation and Simulation 60, 363-378.
Cordeiro, H.H., Cordeiro, G.M., 2001. Skewness for parameters in generalized linear models. Communications in Statistics. Theory and Methods 30, 1317-1334.
Cox, D.R., Reid, N., 1987. Parameter orthogonality and approximate conditional inference. Journal of the Royal Statistical Society B 49, 1-39.
Cox, D.R., Snell, E.J., 1968. A general definition of residuals. Journal of the Royal Statistical Society B 30, 248-275.
Dey, D.K., Gelfand, A.E., Peng, F., 1997. Overdispersed generalized linear models. Journal of Statistical Planning and Inference 64, 93-107.
Doornik, J.A., 2006. An Object-Oriented Matrix Language - Ox 4, 5th ed. Timberlake Consultants Press, London.
Faivre, R., Masle, J., 1988. Modeling potential growth of tillers in winter wheat. Acta Ecologica 9, 179-196.
Ferrari, S.L.P., Cribari-Neto, F., 2004. Beta regression for modeling rates and proportions. Journal of Applied Statistics 31, 799-815.
Huet, S., Bouvier, A., Poursat, M.-A., Jolivet, E., 2004. Statistical Tools for Nonlinear Regression, 2nd ed. Springer.
Jørgensen, B., 1987. Exponential dispersion models (with discussion). Journal of the Royal Statistical Society B 49, 127-162.
Kotz, S., Nadarajah, S., 2000. Extreme Value Distributions: Theory and Applications. Imperial College Press.
Lawley, D.N., 1956. A general method for approximating to the distribution of likelihood ratio criteria. Biometrika 43, 295-303.
Meeker, W.Q., Escobar, L.A., 1998. Statistical Methods for Reliability Data. John Wiley & Sons, New York.
Nelder, J.A., Wedderburn, R.W.M., 1972. Generalized linear models. Journal of the Royal Statistical Society A 135, 370-384.
Nelson, W., 1982. Applied Life Data Analysis. John Wiley & Sons, New York.
Nocedal, J., Wright, S.J., 1999. Numerical Optimization. Springer.
Ospina, R., Cribari-Neto, F., Vasconcellos, K.L.P., 2006. Improved point and interval estimation for a beta regression model. Computational Statistics & Data Analysis 51, 960-981.
Simas, A.B., Barreto-Souza, W., Rocha, A.V., 2010. Improved estimators for a general class of beta regression models. Computational Statistics & Data Analysis 54, 348-366.
Vasconcellos, K.L.P., Cordeiro, G.M., 1997. Approximate bias for multivariate nonlinear heteroscedastic regressions. Brazilian Journal of Probability and Statistics 11, 141-159.
Vasconcellos, K.L.P., Cordeiro, G.M., 2000. Bias corrected estimates in multivariate Student-t regression models. Communications in Statistics B 29, 797-822.
Young, D., Bakir, S., 1987. Bias correction for a generalized log-gamma regression model. Technometrics 29, 183-191.