Computational Statistics & Data Analysis 51 (2006) 960–981
Improved point and interval estimation for a beta regression model

Raydonal Ospina, Francisco Cribari-Neto, Klaus L.P. Vasconcellos

Departamento de Estatística, Universidade Federal de Pernambuco, Cidade Universitária, Recife/PE, 50740-540, Brazil
Received 27 May 2004; received in revised form 27 September 2005; accepted 4 October 2005. Available online 25 October 2005.
Abstract

In this paper we consider the beta regression model recently proposed by Ferrari and Cribari-Neto [2004. Beta regression for modeling rates and proportions. J. Appl. Statist. 31, 799–815], which is tailored to situations where the response is restricted to the standard unit interval and the regression structure involves regressors and unknown parameters. We derive the second order biases of the maximum likelihood estimators and use them to define bias-adjusted estimators. As an alternative to the two analytically bias-corrected estimators discussed, we consider a bias correction mechanism based on the parametric bootstrap. The numerical evidence favors the bootstrap-based estimator and also one of the analytically corrected estimators. Several different strategies for interval estimation are also proposed. We present an empirical application.

Keywords: Beta distribution; Beta regression; Bias; Bias correction; Interval estimation; Maximum likelihood estimation
1. Introduction

The beta distribution provides a useful tool for modeling data restricted to the interval (0, 1). The class of regression models introduced by Ferrari and Cribari-Neto (2004) for modeling rates and proportions assumes that the response follows a beta law and is linked to a linear predictor (defined by regressors and unknown regression parameters) through a link function. The model is similar to the well known class of generalized linear models (McCullagh and Nelder, 1989). Our chief interest in this paper is to obtain accurate point and interval estimation strategies for the class of beta regression models.

It is important to note that there are different specifications for beta regressions in the literature; see, for instance, Kieschnick and McCullough (2003), Paolino (2001) and Vasconcellos and Cribari-Neto (2005), among others. We chose the specification given by Ferrari and Cribari-Neto (2004) for five reasons, namely: (i) some of the proposed specifications place regression structures directly on the two parameters that index the beta distribution, and not on the mean of the response; the approach we follow models the mean directly using a linear predictor and a link function; (ii) the specification used here is quite similar to that of the well known class of generalized linear models (McCullagh and Nelder, 1989); (iii) the model proposed by Ferrari and Cribari-Neto (2004), unlike several other specifications, employs a general link function relating the mean response to the linear predictor; (iv) Ferrari and Cribari-Neto (2004), in addition to proposing their beta regression model, develop point and interval estimation, hypothesis tests and diagnostic measures, including a scheme for selecting initial values for maximum likelihood point estimation; that is, they develop a complete set of inference tools; (v) finally, their proposal has been implemented in a piece of statistical software: the betareg package for R (www.r-project.org); it is thus widely available to practitioners.

An important area of research is the investigation of the behavior of maximum likelihood estimators (MLEs) in small samples, in particular the study of bias. Bias does not pose a serious problem when the sample size n is large, since its order is typically O(n⁻¹), whereas the asymptotic standard error has order O(n⁻¹ᐟ²). However, for moderate sample sizes bias can become problematic, and bias correction becomes relevant when the available information is limited. The derivation of expressions that can be used to define bias correction schemes typically leads to more precise estimators. Bias reduction was thoroughly studied by Box (1971), who provided a general expression for the O(n⁻¹) bias in multivariate nonlinear models with known covariance matrix. Cook et al. (1986) relate bias to the position of the explanatory variables in the sample space. Young and Bakir (1987) show that bias correction can improve estimation in generalized log-gamma regression models. Cordeiro and McCullagh (1991) provide matrix formulae for bias correction in generalized linear models. Recent papers in this area include Cordeiro and Vasconcellos (1997), who obtained formulae for bias correction in multivariate nonlinear regression models with normal errors, and Cordeiro and Vasconcellos (1999), who derived the second order biases of MLEs in von Mises regression models. Bias-corrected versions of the MLEs of the parameters that index the beta distribution were obtained by Cordeiro et al. (1997) and Cribari-Neto and Vasconcellos (2002). However, such results do not hold for models with regression structures. Vasconcellos and Cribari-Neto (2005) study the behavior of MLEs in a class of beta regression models where the parameters of the distribution are nonlinear functions of linear combinations of the explanatory variables with unknown coefficients.

The usual bias correction approach uses the second order bias to adjust the MLEs after these estimators have been computed; it is thus performed in a corrective fashion. An alternative approach was proposed by Firth (1993), who suggested a preventive method of bias reduction that modifies the score function prior to obtaining the parameter estimates. It is also possible to perform bias adjustment using the estimated bias from a bootstrap resampling scheme, which requires no explicit derivation of the bias function. Ferrari and Cribari-Neto (1998) used Edgeworth expansions to explore the relationship between estimators that were bias-adjusted analytically and via the bootstrap.

The chief goal of our paper is to obtain and evaluate reliable point and interval estimation strategies in the class of beta regression models. In particular, we derive closed-form expressions for the second order biases of the MLEs. The results are used to define bias corrections up to order O(n⁻¹). We also consider bootstrap bias adjustment, and investigate different strategies for constructing confidence intervals. The remainder of the paper unfolds as follows.
In Section 2, we introduce the beta regression model of interest along with the maximum likelihood equations for point estimation and asymptotic confidence intervals (ACI). In Section 3, we derive a matrix expression for the second order biases of MLEs, and consider analytical (corrective and preventive) and bootstrap bias correction schemes. We also provide confidence intervals of three types, namely: asymptotic, percentile and BCa; the latter two intervals are bootstrap-based. In Section 4, we present simulation results that show that the proposed estimators have better performance in small samples, in terms of bias, than the original MLEs. In Section 5, we consider an empirical example. Finally, Section 6 concludes the paper.
2. The model

According to Johnson et al. (1995, p. 235), "the beta distributions are among the most frequently employed to model theoretical distributions". In particular, the beta distribution is among the most commonly employed for modeling random experiments that produce results in the (0, 1) interval, since its parameters allow for great flexibility. Bury (1999) lists some applications of the beta distribution in engineering. Janardan and Padmanabhan (1986) model hydrological variables using the beta distribution; McNally (1990) uses the beta distribution in the study of the reproductive performance of cows; Graham and Hollands (1990) and Milyutin and Yaromenko (1991) use the beta distribution in the study of indices related to the transmission of solar radiation; the power of radar signals is modeled by Maffet and Wackerman (1991) using the beta law. Wiley et al. (1989) develop a beta model to estimate the probability of HIV transmission during sexual intercourse involving infected and non-infected individuals.
The random variable Y follows a beta distribution with parameters p, q > 0, denoted by Y ~ B(p, q), if the distribution of Y admits the density

$$f(y; p, q) = \frac{\Gamma(p+q)}{\Gamma(p)\Gamma(q)}\, y^{p-1}(1-y)^{q-1}, \quad y \in (0,1), \tag{2.1}$$

where Γ(·) is the gamma function. The rth moment about zero is

$$\mu'_r = \frac{p^{(r)}}{(p+q)^{(r)}}, \quad \text{with } a^{(r)} = a(a+1)(a+2)\cdots(a+r-1).$$
Therefore, the mean and variance of Y are p/(p+q) and pq/{(p+q)²(p+q+1)}, respectively. The skewness γ₃ and kurtosis γ₄ can also be easily obtained:

$$\gamma_3 = 2(q-p)\sqrt{p^{-1}+q^{-1}+(pq)^{-1}}\,(p+q+2)^{-1}, \qquad \gamma_4 = \frac{3(p+q+1)\{2(p+q)^2 + pq(p+q-6)\}}{pq(p+q+2)(p+q+3)}.$$

We note that (2.1) is a product of continuous functions admitting derivatives of all orders with respect to p and q. Furthermore, the support of the density is not a function of p or q. Let y = (y₁, …, yₙ)ᵀ be a random sample, where each observation is B(p, q) distributed. The log-likelihood function is given by

$$\ell(p,q) = n\left\{\log\frac{\Gamma(p+q)}{\Gamma(p)\Gamma(q)} + (p-1)\log g_1 + (q-1)\log g_2\right\} = n\{A(p,q) + (p-1)\log g_1 + (q-1)\log g_2\}, \tag{2.2}$$

where A(p,q) = log{Γ(p+q)[Γ(p)Γ(q)]⁻¹}. The quantities g₁ = (∏ᵢ yᵢ)¹ᐟⁿ and g₂ = (∏ᵢ (1−yᵢ))¹ᐟⁿ are the geometric means of the yᵢ's and (1−yᵢ)'s, respectively. We note that g₁ and g₂ are jointly sufficient statistics for p and q.

Ferrari and Cribari-Neto (2004) define a regression structure for beta distributed responses using a parameterization that differs from (2.1). Let μ = p/(p+q) and φ = p+q, i.e., p = μφ and q = (1−μ)φ. Under this new parameterization, if Y ~ B(p, q), then E(Y) = μ and Var(Y) = V(μ)/(1+φ), where V(μ) = μ(1−μ) denotes the "variance function". Also, φ plays the role of a precision parameter, in the sense that, for fixed μ, the larger the φ, the smaller the variance of the response. Using this new parameterization, the beta density in (2.1) can be written as

$$f(y;\mu,\phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\Gamma((1-\mu)\phi)}\, y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1}, \quad y \in (0,1),$$
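(As a quick illustration — our addition, not part of the original article — the following Python sketch maps the (μ, φ) parameterization to the standard (p, q) shape parameters and checks the mean and variance identities; the helper name to_pq and the use of scipy are our own choices.)

```python
# Minimal sketch (assumption: scipy available) of the mean/precision
# parameterization mu = p/(p+q), phi = p+q of Ferrari and Cribari-Neto (2004).
import numpy as np
from scipy import stats

def to_pq(mu, phi):
    """Map (mu, phi), 0 < mu < 1, phi > 0, to the standard shape parameters."""
    return mu * phi, (1.0 - mu) * phi

mu, phi = 0.25, 15.0
p, q = to_pq(mu, phi)
Y = stats.beta(p, q)

# E(Y) = mu and Var(Y) = mu(1-mu)/(1+phi) under this parameterization.
assert np.isclose(Y.mean(), mu)
assert np.isclose(Y.var(), mu * (1 - mu) / (1 + phi))
```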
and the log-density becomes

$$\log f(y) = \log\Gamma(\phi) - \log\Gamma(\mu\phi) - \log\Gamma((1-\mu)\phi) + (\mu\phi-1)\log y + \{(1-\mu)\phi-1\}\log(1-y),$$

with 0 < μ < 1 and φ > 0, since p, q > 0. Fig. 1 presents different beta densities under the new parameterization for different values of (μ, φ). Note that the shape of the density changes according to different choices of the parameters. In particular, when μ = 1/2, the density is symmetric around μ, which does not hold when μ ≠ 1/2. Note also the two J-shaped densities as well as two inverted J-shaped densities.

Let y = (y₁, …, yₙ)ᵀ be a random sample, where yₜ ~ B(μₜ, φ), t = 1, …, n. Ferrari and Cribari-Neto (2004) proposed the following beta regression structure. Suppose the mean of yₜ satisfies a functional relation of the form

$$g(\mu_t) = \sum_{i=1}^{k} x_{ti}\beta_i = \eta_t, \tag{2.3}$$

where β = (β₁, …, βₖ)ᵀ is a vector of unknown regression parameters, β ∈ ℝᵏ, ηₜ is a linear predictor and xₜ₁, …, xₜₖ are observations on k known covariates, k < n. We also assume that the link function g : (0, 1) → ℝ is strictly monotonic and twice differentiable.
Fig. 1. Beta densities for different values of (μ, φ).
A number of different link functions can be used, such as the logit specification g(μ) = log{μ/(1−μ)}, the probit function g(μ) = Φ⁻¹(μ), where Φ(·) denotes the standard normal distribution function, the complementary log–log function g(μ) = log{−log(1−μ)}, the log–log function g(μ) = −log{−log(μ)}, among others. A rich discussion of link functions can be found in McCullagh and Nelder (1989); see also Atkinson (1985, Chapter 7).

A particularly interesting specification of the model is obtained when the logit link function is used; in this case, the mean of yₜ can be written as

$$\mu_t = \frac{\exp(x_t^\top\beta)}{1+\exp(x_t^\top\beta)},$$

where xₜ = (xₜ₁, …, xₜₖ)ᵀ, t = 1, …, n. Here, the regression parameters have an important interpretation. Suppose the value of the ith regressor is incremented by c units while the values of all other regressors are held fixed. Let μ† be the mean of the response variable under the new covariate value and let μ be the mean of the response variable under the original values of the covariates. It follows that

$$\exp(c\beta_i) = \frac{\mu^{\dagger}/(1-\mu^{\dagger})}{\mu/(1-\mu)},$$

i.e., exp(cβᵢ) is the odds ratio.

The log-likelihood function for the beta regression model has the form

$$\ell(\beta,\phi) = \sum_{t=1}^{n} \ell_t(\mu_t,\phi), \tag{2.4}$$

where

$$\ell_t(\mu_t,\phi) = \log\Gamma(\phi) - \log\Gamma(\mu_t\phi) - \log\Gamma((1-\mu_t)\phi) + (\mu_t\phi-1)\log y_t + \{(1-\mu_t)\phi-1\}\log(1-y_t);$$

μₜ = g⁻¹(ηₜ), as defined in (2.3), is a function of β. It is possible to show that this beta regression model is a regular model since all regularity conditions described in Cox and Hinkley (1974, p. 107) hold. Furthermore, since (2.4) is a rewritten form of (2.2), obtained through an invertible reparameterization, one can guarantee that the MLEs are unique. The
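For concreteness, a minimal sketch of (2.4) under the logit link follows (our illustration; the function name loglik and the scipy-based implementation are assumptions, not the authors' code):

```python
# Sketch of the log-likelihood (2.4) under the logit link.
import numpy as np
from scipy.special import gammaln, expit

def loglik(beta, phi, X, y):
    """l(beta, phi) = sum_t l_t(mu_t, phi) for responses y_t in (0, 1)."""
    mu = expit(X @ beta)                      # logit link: mu_t = g^{-1}(eta_t)
    return np.sum(gammaln(phi) - gammaln(mu * phi) - gammaln((1 - mu) * phi)
                  + (mu * phi - 1) * np.log(y)
                  + ((1 - mu) * phi - 1) * np.log(1 - y))
```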
components of the score vector, obtained by differentiation of the log-likelihood function with respect to the parameters, are given, for r = 1, …, k, as

$$U_r(\beta,\phi) = \frac{\partial\ell(\beta,\phi)}{\partial\beta_r} = \sum_{t=1}^{n}\frac{\partial\ell_t(\mu_t,\phi)}{\partial\mu_t}\frac{d\mu_t}{d\eta_t}\frac{\partial\eta_t}{\partial\beta_r} = \phi\sum_{t=1}^{n}(y_t^* - \mu_t^*)\frac{d\mu_t}{d\eta_t}\,x_{tr},$$

where dμₜ/dηₜ = dg⁻¹(ηₜ)/dηₜ = 1/g′(μₜ), yₜ* = log{yₜ/(1−yₜ)} and μₜ* = ψ(μₜφ) − ψ((1−μₜ)φ), with ψ(·) denoting the digamma function,¹ together with

$$U_\phi(\beta,\phi) = \frac{\partial\ell(\beta,\phi)}{\partial\phi} = \sum_{t=1}^{n}\left\{\mu_t(y_t^*-\mu_t^*) + \log(1-y_t) + \psi(\phi) - \psi((1-\mu_t)\phi)\right\}.$$

Consider the complete parameter vector θ = (βᵀ, φ)ᵀ. Defining the vectors y* = (y₁*, …, yₙ*)ᵀ and μ* = (μ₁*, …, μₙ*)ᵀ and the matrix T = diag{dμₜ/dηₜ}, with diag{aₜ} denoting the n×n diagonal matrix with typical element aₜ, t = 1, …, n, we can write the (k+1)×1 dimensional score vector U(θ) in the form (U_β(β,φ)ᵀ, U_φ(β,φ))ᵀ, with

$$U_\beta(\beta,\phi) = \phi X^\top T(y^* - \mu^*),$$

and U_φ(β,φ) as given above. The MLEs of β and φ are obtained as the solution of the nonlinear system U(θ) = 0. In practice, the maximum likelihood estimates can be obtained through numerical maximization of (2.4) using a nonlinear optimization algorithm, e.g., some form of Newton's method (Newton–Raphson, Fisher's scoring, BHHH, etc.) or a quasi-Newton algorithm such as BFGS. For details, see Press et al. (1992). Define X̃ as the (n+1)×(k+1) dimensional matrix
$$\tilde{X} = \begin{pmatrix} X & 0 \\ 0 & 1 \end{pmatrix}, \tag{2.5}$$

where X is the n×k matrix of explanatory variables. Also, let W̃ be the (n+1)×(n+1) matrix

$$\tilde{W} = \begin{pmatrix} W_{\beta\beta} & W_{\beta\phi} \\ W_{\phi\beta} & W_{\phi\phi} \end{pmatrix}, \tag{2.6}$$

with

$$W_{\beta\beta} = \phi^2\,\mathrm{diag}\!\left\{\left(\frac{d\mu_t}{d\eta_t}\right)^{\!2} w_t\right\}, \qquad W_{\beta\phi} = Tc, \qquad W_{\phi\beta} = W_{\beta\phi}^\top, \qquad W_{\phi\phi} = \mathrm{tr}(\mathrm{diag}\{d_t\}).$$

Here, wₜ = ψ′(μₜφ) + ψ′((1−μₜ)φ), the n-vector c = (c₁, …, cₙ)ᵀ has typical element cₜ = φ{μₜwₜ − ψ′((1−μₜ)φ)},

$$d_t = (1-\mu_t)^2\psi'((1-\mu_t)\phi) + \mu_t^2\psi'(\mu_t\phi) - \psi'(\phi),$$

and tr(·) is the trace operator. After some algebra, and using (2.5), it is possible to obtain Fisher's information matrix for the parameter vector θ = (βᵀ, φ)ᵀ as

$$K(\theta) = \tilde{X}^\top \tilde{W} \tilde{X}.$$
¹ The polygamma functions are defined, for m = 0, 1, …, as ψ⁽ᵐ⁾(x) = (dᵐ⁺¹/dxᵐ⁺¹) log Γ(x), x > 0. For further details, see Dishon and Weiss (1980).
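A sketch assembling the score vector and K(θ) under the logit link, for which dμₜ/dηₜ = μₜ(1−μₜ), is given below (our illustration; the function name is hypothetical):

```python
# Sketch of the score U(theta) and of K(theta) = X~' W~ X~ under the logit link.
import numpy as np
from scipy.special import digamma, polygamma, expit

def score_and_information(beta, phi, X, y):
    n, k = X.shape
    mu = expit(X @ beta)
    dmu = mu * (1 - mu)                                  # dmu_t/deta_t (logit)
    ystar = np.log(y / (1 - y))
    mustar = digamma(mu * phi) - digamma((1 - mu) * phi)
    # Score: U_beta = phi X' T (y* - mu*); U_phi as in the text.
    U_beta = phi * X.T @ (dmu * (ystar - mustar))
    U_phi = np.sum(mu * (ystar - mustar) + np.log(1 - y)
                   + digamma(phi) - digamma((1 - mu) * phi))
    # Blocks of W~ from (2.6); polygamma(1, .) is the trigamma function psi'.
    w = polygamma(1, mu * phi) + polygamma(1, (1 - mu) * phi)
    c = phi * (mu * w - polygamma(1, (1 - mu) * phi))
    d = (1 - mu)**2 * polygamma(1, (1 - mu) * phi) \
        + mu**2 * polygamma(1, mu * phi) - polygamma(1, phi)
    K = np.empty((k + 1, k + 1))
    K[:k, :k] = phi**2 * (X.T * (dmu**2 * w)) @ X        # K_{beta beta}
    K[:k, k] = K[k, :k] = X.T @ (dmu * c)                # K_{beta phi}
    K[k, k] = d.sum()                                    # K_{phi phi}
    return np.concatenate([U_beta, [U_phi]]), K
```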
Note that W_βφ ≠ 0, thus indicating that the parameters β and φ are not orthogonal, in contrast to the class of generalized linear models (McCullagh and Nelder, 1989), where such orthogonality holds. Also, the MLE θ̂ and K(θ̂) are consistent estimators of θ and K(θ), respectively, where K(θ̂) is Fisher's information matrix evaluated at θ̂. Assuming that J(θ) = limₙ→∞ K(θ)/n exists and is nonsingular, we have

$$\sqrt{n}\,(\hat\theta - \theta) \xrightarrow{D} N_{k+1}\!\left(0, J(\theta)^{-1}\right),$$

where →_D denotes convergence in distribution. Hence, if θᵣ denotes the rth component of θ, it follows that

$$(\hat\theta_r - \theta_r)\,(K^{rr})^{-1/2} \xrightarrow{D} N(0,1),$$

where K^{rr} is the rth diagonal element of K(θ̂)⁻¹. Let K^{φφ} be the last diagonal element of K(θ̂)⁻¹. Then, if 0 < α < 1/2 and z_δ represents the δ quantile of the N(0,1) distribution, we have, for r = 1, …, k, β̂ᵣ ± z₁₋α/₂ (K^{rr})¹ᐟ² and φ̂ ± z₁₋α/₂ (K^{φφ})¹ᐟ² as the limits of the asymptotic confidence intervals (ACI) for βᵣ and φ, respectively, both with asymptotic coverage 100(1−α)%. The asymptotic variances of β̂ᵣ and φ̂ are estimated by K^{rr} and K^{φφ}, respectively. These confidence intervals may lack precision in small samples, since the asymptotically pivotal quantities used to construct them may have asymmetric distributions and may not have zero mean. Another shortcoming is that these confidence intervals may include values outside the parameter space. For more details, see Davison and Hinkley (1997, Chapter 5) or Efron and Tibshirani (1993, Chapter 12).
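A compact sketch of these asymptotic limits, reusing the hypothetical score_and_information helper above, might read:

```python
# Sketch: ACI limits from the inverse information (our helper, not the
# authors' code).
import numpy as np
from scipy import stats

def aci(beta_hat, phi_hat, X, y, level=0.95):
    _, K = score_and_information(beta_hat, phi_hat, X, y)
    se = np.sqrt(np.diag(np.linalg.inv(K)))   # (K^{rr})^{1/2}, r = 1,...,k+1
    z = stats.norm.ppf(1 - (1 - level) / 2)
    est = np.concatenate([beta_hat, [phi_hat]])
    return est - z * se, est + z * se
```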
3. Bias correction of MLEs and interval estimation

We shall now obtain an expression for the second order biases of the MLEs in the class of beta regression models using Cox and Snell's (1968) general formula. This expression will, in turn, allow us to obtain bias corrected estimates of the unknown parameters.

At the outset, we shall introduce some notation. The derivatives of the log-likelihood function with respect to the unknown parameters are indicated by indices, where the letters r, s, u, … correspond to derivatives with respect to the components of β and φ is used to denote derivatives with respect to the precision parameter. Then, Uᵣ = ∂ℓ/∂βᵣ, U_φ = ∂ℓ/∂φ, U_φs = ∂²ℓ/∂φ∂βₛ, U_rsφ = ∂³ℓ/∂βᵣ∂βₛ∂φ, etc. We shall also use the notation introduced by Lawley (1956) for the moments of these derivatives: κ_rs = E(U_rs), κ_φφ = E(U_φφ), κ_rs,φ = E(U_rs U_φ), κ_rst = E(U_rst), etc., where all κ's are moments under the sample distribution and are, in general, of order O(n). Finally, the derivatives of these moments are denoted as κ_rs⁽ᵘ⁾ = ∂κ_rs/∂βᵤ, κ_rφ⁽ᵠ⁾ = ∂κ_rφ/∂φ, etc.

Cox and Snell's (1968) formula can be used to obtain the second order bias of the MLE of the ath component of the parameter vector θ = (θ₁, …, θₖ, θₖ₊₁)ᵀ = (βᵀ, φ)ᵀ:

$$
\begin{aligned}
B(\hat\theta_a) ={}& \sum_{r,s,u}\kappa^{ar}\kappa^{su}\Big(\kappa_{rs}^{(u)}-\tfrac12\kappa_{rsu}\Big) + \sum_{s,u}\kappa^{a\phi}\kappa^{su}\Big(\kappa_{\phi s}^{(u)}-\tfrac12\kappa_{\phi su}\Big) \\
&+ \sum_{r,u}\kappa^{ar}\kappa^{\phi u}\Big(\kappa_{r\phi}^{(u)}-\tfrac12\kappa_{r\phi u}\Big) + \sum_{r,s}\kappa^{ar}\kappa^{s\phi}\Big(\kappa_{rs}^{(\phi)}-\tfrac12\kappa_{rs\phi}\Big) \\
&+ \sum_{u}\kappa^{a\phi}\kappa^{\phi u}\Big(\kappa_{\phi\phi}^{(u)}-\tfrac12\kappa_{\phi\phi u}\Big) + \sum_{s}\kappa^{a\phi}\kappa^{s\phi}\Big(\kappa_{\phi s}^{(\phi)}-\tfrac12\kappa_{\phi s\phi}\Big) \\
&+ \sum_{r}\kappa^{ar}\kappa^{\phi\phi}\Big(\kappa_{r\phi}^{(\phi)}-\tfrac12\kappa_{r\phi\phi}\Big) + \kappa^{a\phi}\kappa^{\phi\phi}\Big(\kappa_{\phi\phi}^{(\phi)}-\tfrac12\kappa_{\phi\phi\phi}\Big),
\end{aligned}
\tag{3.1}
$$

where the κ's with superscripts denote elements of K(θ)⁻¹.
From (2.6) we note that the entries of W_βφ are not all zero, which makes the derivation cumbersome, since all terms in (3.1) must be considered. These terms can be deduced from the moments and cumulants obtained in the
Appendix. After some algebra, we arrive at the following expression, in matrix form, for the second order bias of β̂:

$$B(\hat\beta) = K^{\beta\beta}X^\top\big\{W_1\delta + (W_2+W_3)XK^{\beta\phi} + \mathrm{diagonal}(W_4)^\top K^{\phi\phi}\big\} + K^{\beta\phi}\big\{\mathrm{tr}(W_3 XK^{\beta\beta}X^\top) + K^{\phi\phi}\,\mathrm{tr}(S) + \mathrm{diagonal}(W_4+W_5)XK^{\beta\phi}\big\},$$

where W₁ to W₅ are given in (A.1)–(A.5) in the Appendix, S = diag{−½sₜ}, with the sₜ's also given in the Appendix, diagonal(·) is the row vector formed with the entries in the main diagonal of a square matrix and δ is the n×1 dimensional vector containing the diagonal elements of XK^{ββ}Xᵀ. Furthermore,

$$K^{\beta\beta} = (X^\top W_{\beta\beta} X)^{-1}\left\{I_k + \frac{X^\top Tcc^\top T^\top X\,(X^\top W_{\beta\beta} X)^{-1}}{\gamma}\right\},$$

$$\gamma = \mathrm{tr}(\mathrm{diag}\{d_t\}) - c^\top T^\top X (X^\top W_{\beta\beta} X)^{-1} X^\top Tc,$$

$$K^{\beta\phi} = (K^{\phi\beta})^\top = -\frac{1}{\gamma}(X^\top W_{\beta\beta} X)^{-1} X^\top Tc, \qquad K^{\phi\phi} = \frac{1}{\gamma},$$

where I_k denotes the k×k identity matrix. We define the (n+1)-vector ξ as

$$\xi = \begin{pmatrix} W_1\delta + (W_2+W_3)XK^{\beta\phi} + \mathrm{diagonal}(W_4)^\top K^{\phi\phi} \\[2pt] \mathrm{tr}(W_3 XK^{\beta\beta}X^\top) + K^{\phi\phi}\,\mathrm{tr}(S) + \mathrm{diagonal}(W_4+W_5)XK^{\beta\phi} \end{pmatrix}.$$

We also consider the k×(k+1) upper block of the matrix K(θ)⁻¹, given by

$$K^{\beta *} = \big(K^{\beta\beta} \;\; K^{\beta\phi}\big).$$

The second order bias of β̂ can now be written as

$$B(\hat\beta) = K^{\beta *}\tilde{X}^\top\xi. \tag{3.2}$$
Similarly, we have that

$$B(\hat\phi) = K^{\phi\beta}X^\top\big\{W_1\delta + (W_2+W_3)XK^{\beta\phi} + \mathrm{diagonal}(W_4)^\top K^{\phi\phi}\big\} + K^{\phi\phi}\big\{\mathrm{tr}(W_3 XK^{\beta\beta}X^\top) + K^{\phi\phi}\,\mathrm{tr}(S) + \mathrm{diagonal}(W_4+W_5)XK^{\beta\phi}\big\}.$$

Then, considering the 1×(k+1) lower block of the matrix K(θ)⁻¹, given by

$$K^{\phi *} = \big(K^{\phi\beta} \;\; K^{\phi\phi}\big),$$

we can write the second order bias of φ̂ as

$$B(\hat\phi) = K^{\phi *}\tilde{X}^\top\xi. \tag{3.3}$$

Hence, from (3.2) and (3.3) we conclude that the second order bias of the MLE of the joint vector θ = (βᵀ, φ)ᵀ has the form

$$B(\hat\theta) = K(\theta)^{-1}\tilde{X}^\top\xi = (\tilde{X}^\top\tilde{W}\tilde{X})^{-1}\tilde{X}^\top\xi.$$
Defining ε = W̃⁻¹ξ, the previous expression becomes

$$B(\hat\theta) = (\tilde{X}^\top\tilde{W}\tilde{X})^{-1}\tilde{X}^\top\tilde{W}\varepsilon. \tag{3.4}$$
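In computational terms, (3.4) is a weighted least-squares fit; a minimal sketch (ours), assuming X̃, W̃ and ξ have already been assembled from the Appendix quantities, is:

```python
# Sketch of (3.4): the second order bias as the coefficient vector of a
# weighted least-squares regression of eps = W~^{-1} xi on X~.
import numpy as np

def second_order_bias(Xt, Wt, xi):
    eps = np.linalg.solve(Wt, xi)                  # eps = W~^{-1} xi
    A = Xt.T @ Wt @ Xt                             # K(theta) = X~' W~ X~
    return np.linalg.solve(A, Xt.T @ Wt @ eps)     # (X~'W~X~)^{-1} X~'W~ eps

# Corrective estimator: theta_tilde = theta_hat - B_hat(theta_hat).
```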
Therefore, the components of B(θ̂) can be estimated through a weighted linear regression, as the sketch above illustrates. We can now define a corrected estimator θ̃ as

$$\tilde\theta = \hat\theta - \hat B(\hat\theta),$$
where B̂(θ̂) denotes the MLE of B(θ̂), i.e., the bias function evaluated at the maximum likelihood estimates. This is the corrective procedure for bias reduction. It is not difficult to show that √n(θ̃ − θ) →_D N_{k+1}(0, J⁻¹(θ)), where, as before, we assume that J(θ) = limₙ→∞ K(θ)/n exists and is nonsingular. From the asymptotic normality of θ̃, we have that, for r = 1, …, k, β̃ᵣ ± z₁₋α/₂ (K̃^{rr})¹ᐟ² and φ̃ ± z₁₋α/₂ (K̃^{φφ})¹ᐟ² are the limits of the ACI for βᵣ and φ, respectively, both with asymptotic coverage 100(1−α)%. We shall refer to these intervals, constructed from the estimators obtained with the corrective procedure, as CACI (corrective asymptotic confidence intervals). The asymptotic variances of β̃ᵣ and φ̃ are estimated, respectively, by K̃^{rr} and K̃^{φφ}, where K̃^{rr} is the rth diagonal element of the inverse information matrix evaluated at θ̃.

There are other approaches to bias-correcting MLEs. Firth (1993) developed the so-called "preventive" method, which also allows for the removal of the second order bias. His method consists of modifying the original score function:

$$U^*(\theta) = U(\theta) - K(\theta)B(\theta), \tag{3.5}$$

where K(θ) is the information matrix and B(θ) is the second order bias. The solution θ̂* of U*(θ) = 0 is a bias-corrected estimator for θ. For the beta regression model, the substitution of B(θ) by (3.4) yields the following form for the modified score function (3.5):

$$U^*(\theta) = U(\theta) - \tilde{X}^\top\xi.$$

The estimator θ̂*, obtained as the root of the modified score function in (3.5), is consistent and asymptotically normal: √n(θ̂* − θ) →_D N(0, J⁻¹(θ)), with J(θ) as given before. Then, for r = 1, …, k, the quantities β̂ᵣ* ± z₁₋α/₂ ((K*)^{rr})¹ᐟ² and φ̂* ± z₁₋α/₂ ((K*)^{φφ})¹ᐟ² are the limits of the confidence intervals for βᵣ and φ, respectively, both with 100(1−α)% asymptotic coverage. These intervals are constructed using K*, the information matrix K(θ) evaluated at θ̂*; in particular, (K*)^{rr} denotes the rth diagonal element of (K*)⁻¹. We shall refer to these intervals, constructed from the estimators obtained with the preventive procedure described above, as PACI (preventive asymptotic confidence intervals). The asymptotic variances of β̂ᵣ* and φ̂* are estimated, respectively, as (K*)^{rr} and (K*)^{φφ}.
A third approach to bias-correcting MLEs is based upon the numerical estimation of the bias through the bootstrap resampling scheme introduced by Efron (1979). Consider a random sample y = (y₁, …, yₙ)ᵀ with common distribution defined by the distribution function F. We write the parameter as θ = t(F), that is, θ is viewed as a function of F. Let θ̂ = S(y) be an estimator of θ. The parametric form of the bootstrap consists of obtaining, from the original sample y, a large number of pseudo-samples y⁺ = (y₁⁺, …, yₙ⁺)ᵀ, obtaining the corresponding bootstrap replicates θ̂* = S(y⁺) and, based upon the empirical distribution of θ̂*, estimating the distribution function of θ̂. We assume that F belongs to a known parametric family of finite dimension, F_θ, so that, using a consistent estimator θ̂ = S(y) of θ, we obtain a parametric estimate of F, which we denote by F̂. The bias of the estimator θ̂ = S(y) is written as

$$B_F(\hat\theta, \theta) = E_F[S(y)] - t(F),$$

where the index F indicates that the expected value is taken with respect to the measure defined by F. The estimated bias via the parametric bootstrap is obtained when, instead of the true distribution F, from which the original sample was taken, we consider the distribution F̂. The bias can then be expressed as

$$B_{\hat F}(\hat\theta, \theta) = E_{\hat F}[S(y)] - t(\hat F).$$

By generating R bootstrap samples y₁⁺, …, y_R⁺, independently, and obtaining the corresponding bootstrap replicates θ̂₁*, …, θ̂_R*, one can approximate E_F̂[S(y)] by θ̂*₍·₎ = (1/R)∑ᵢ₌₁ᴿ θ̂ᵢ*. The bootstrap estimate of the bias based upon the R replicates of θ̂ is then B̂_F̂(θ̂, θ) = θ̂*₍·₎ − S(y). We can now obtain an estimator that is bias corrected up to the second order:

$$\bar\theta = S(y) - \hat B_{\hat F}(\hat\theta, \theta) = 2\hat\theta - \hat\theta^*_{(\cdot)}.$$

Following MacKinnon and Smith (1998), we shall refer to θ̄ as CBC (constant-bias-corrected).

Using the bootstrap method to obtain the empirical distribution F̂_θ̂* of θ̂, we can construct the percentile confidence interval (PCI), with approximate coverage 1 − α, defined by the α/2 and 1 − α/2 percentiles of F̂_θ̂*. The interval is

$$\left[\hat F^{-1}_{\hat\theta^*}(\alpha/2),\; \hat F^{-1}_{\hat\theta^*}(1-\alpha/2)\right].$$

By ordering the R bootstrap replicates of θ̂, we can obtain the lower and upper limits of the percentile confidence interval, respectively, as the R(α/2)th and R(1−α/2)th replicates, assuming that R(α/2) and R(1−α/2) are integers and 0 < α < 0.5. This interval is not necessarily symmetric around θ̂. Note also that the interval will only include proper values of the parameter. We shall consider confidence intervals of the percentile type for θ̂, and also those based on the empirical distributions of the corrected estimators θ̃ and θ̂*; these latter intervals will be referred to, respectively, as corrective empirical confidence intervals (CECI) and preventive empirical confidence intervals (PECI).
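A sketch of the parametric bootstrap correction and the percentile interval follows (our illustration; fit_mle stands for any routine that maximizes (2.4) and is hypothetical):

```python
# Sketch of the CBC estimator and the percentile interval (PCI) for a beta
# regression under the logit link.
import numpy as np
from scipy.special import expit

def bootstrap_correct(theta_hat, X, R=600, alpha=0.05, rng=None, fit_mle=None):
    rng = np.random.default_rng() if rng is None else rng
    beta_hat, phi_hat = theta_hat[:-1], theta_hat[-1]
    mu = expit(X @ beta_hat)
    reps = np.empty((R, theta_hat.size))
    for i in range(R):                      # R parametric pseudo-samples y+
        y_plus = rng.beta(mu * phi_hat, (1 - mu) * phi_hat)
        reps[i] = fit_mle(X, y_plus)        # bootstrap replicate theta*
    theta_bar = 2 * theta_hat - reps.mean(axis=0)   # CBC: 2*theta_hat - mean
    pci = np.quantile(reps, [alpha / 2, 1 - alpha / 2], axis=0)  # percentile CI
    return theta_bar, pci
```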
Efron (1981, 1982) introduced a bootstrap method for interval estimation that can be extended to handle variable standard errors for different values of θ. This extension yields the so-called bias-corrected and accelerated confidence interval (BCaCI). The BCaCI with approximate coverage 1 − α for θ has the form (DiCiccio and Tibshirani, 1987)

$$\left[\hat F^{-1}\big(\Phi(z^*_{[\alpha/2]})\big),\; \hat F^{-1}\big(\Phi(z^*_{[1-\alpha/2]})\big)\right],$$

where

$$z^*_{[\alpha/2]} = v_0 + \frac{v_0 + z_{\alpha/2}}{1 - a(v_0 + z_{\alpha/2})}, \qquad z^*_{[1-\alpha/2]} = v_0 + \frac{v_0 + z_{1-\alpha/2}}{1 - a(v_0 + z_{1-\alpha/2})},$$

v₀ being the bias-correction constant and a being a constant that measures the rate of change of the standard error of θ̂ with respect to the true parameter value. The latter is called the acceleration constant and can be estimated as

$$\hat a = \tfrac{1}{6}\,\mathrm{Skew}\big(\dot\ell\big)\big|_{\theta=\hat\theta},$$

where Skew(·) denotes the skewness coefficient of a given distribution and ℓ̇ is the derivative of the log-likelihood function evaluated at θ̂. The limits of the BCa interval are then computed from the levels

$$\alpha_1 = \Phi\!\left(v_0 + \frac{v_0 + z_{\alpha/2}}{1 - a(v_0 + z_{\alpha/2})}\right), \qquad \alpha_2 = \Phi\!\left(v_0 + \frac{v_0 + z_{1-\alpha/2}}{1 - a(v_0 + z_{1-\alpha/2})}\right).$$
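As an illustration (ours), the BCa levels α₁ and α₂ can be computed from the bootstrap replicates of a scalar parameter, with v₀ estimated in the usual way as Φ⁻¹ of the fraction of replicates below θ̂:

```python
# Sketch of the BCa percentile levels for a scalar parameter.
import numpy as np
from scipy import stats

def bca_levels(reps, theta_hat, a_hat, alpha=0.05):
    # reps: 1-D array of bootstrap replicates; a_hat: acceleration estimate.
    v0 = stats.norm.ppf(np.mean(reps < theta_hat))  # bias-correction constant
    def level(z):
        return stats.norm.cdf(v0 + (v0 + z) / (1 - a_hat * (v0 + z)))
    z_lo, z_hi = stats.norm.ppf(alpha / 2), stats.norm.ppf(1 - alpha / 2)
    return level(z_lo), level(z_hi)   # alpha_1, alpha_2: quantile levels of reps
```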
The main disadvantage of the BCa method is the large number of bootstrap replicates of θ̂ that are needed (typically from 1000 to 2000 replicates are used).

4. Numerical results

We shall use Monte Carlo simulation to evaluate the finite sample performance of the original MLEs and of their corrected versions. All simulations were performed using the Ox matrix programming language (Cribari-Neto and
Table 1
Simulation results for estimation of β₀ = 1.5

n    Estimator    Mean       Bias        Rel. bias   Variance   MSE        √MSE
20   MLE          1.50310    0.00310     0.00206     0.006654   0.006664   0.081634
     Cox–Snell    1.50198    0.00198     0.00132     0.006653   0.006657   0.081588
     Firth        1.50465    0.00465     0.00310     0.006664   0.006685   0.081763
     Bootstrap    1.50153    0.00153     0.00102     0.006653   0.006656   0.081583
40   MLE          1.501338   0.001338    0.000892    0.003423   0.003425   0.058524
     Cox–Snell    1.500165   0.000165    0.000110    0.003422   0.003422   0.058494
     Firth        1.502564   0.002564    0.001709    0.003427   0.003433   0.058593
     Bootstrap    1.500458   0.000458    0.000306    0.003426   0.003426   0.058536
60   MLE          1.500627   0.000627    0.000418    0.002387   0.002387   0.048859
     Cox–Snell    1.499857   −0.000143   −0.000095   0.002386   0.002386   0.048850
     Firth        1.501424   0.001424    0.000949    0.002388   0.002390   0.048884
     Bootstrap    1.500083   0.000083    0.000055    0.002390   0.002390   0.048890

Note: MLE, maximum likelihood estimator; Cox–Snell, analytically corrected MLE; Firth, preventively corrected MLE; Bootstrap, bootstrap corrected MLE.
Zarkos, 2003; Doornik, 2001). The beta regression model used in the numerical exercise was

$$g(\mu_t) = \beta_0 + \beta_1 x_t, \quad t = 1,\ldots,n, \tag{4.1}$$

where the logit link function was used. The values of β₀, β₁ and φ were set at β₀ = 1.5, β₁ = 1.8, φ = 120; the covariate values xₜ, used to form the matrix X of regressors, were selected as random draws from an exponential distribution with mean equal to 3. We emphasize that the values of X are kept constant for a given sample size. The sample sizes considered were n = 20, 40 and 60, the number of Monte Carlo replications was M = 5000 and the number of bootstrap replications, R = 600. The MLEs of the parameters in (4.1) were obtained by maximizing the log-likelihood function using the BFGS method with analytical derivatives; the BFGS quasi-Newton method is generally regarded as the best-performing nonlinear optimization method (Mittelhammer et al., 2000, p. 199). After obtaining the MLE θ̂, we computed the corrected estimator θ̃. Using B(θ̂) and the inverse of Fisher's information matrix evaluated at θ̂, we obtained the preventively corrected estimate θ̂* by solving the nonlinear equation (3.5) using the Newton–Raphson algorithm; for further details, see Press et al. (1992). For each estimate θ̂, we generated R bootstrap replicates of the model in parametric form, and then computed the bootstrap corrected estimate θ̄. For each Monte Carlo replication and for each maximum likelihood estimate of the parameters of the model, we obtained interval estimates of the asymptotic type, of the percentile type and of the BCa type (bootstrap). All intervals were obtained by estimating the two limits separately.

In order to analyze the point estimation results, we computed, for each sample size and for each estimator: the mean of the estimates, bias, relative bias, variance, mean squared error (MSE) and root mean squared error (√MSE). As concerns interval estimation, we graphically present the means of the lower and upper limits together with the empirical coverage probabilities, obtained from the relative frequencies at which the true parameter value belongs to the intervals. The observed frequencies at which the lower limit of the interval was larger/smaller than the true parameter value are also indicated.

Table 1 presents simulation results for the intercept parameter β₀. We observe that, in absolute value, the estimated biases of the corrective estimator β̃₀ and of the bootstrap estimator β̄₀ were smaller than that of the original MLE β̂₀ for all sample sizes considered, thus showing the effectiveness of the bias correction schemes used in the definition of these estimators. The bootstrap bias correction is slightly more pronounced than the analytically corrective bias correction. As an illustration, we note that for n = 20 the bias of the bootstrap-adjusted intercept estimator is less than half that of the corresponding MLE. It is also noteworthy that the preventive estimator β̂₀* has worse performance than the other two adjusted estimators, in terms of both bias and variance, for all sample sizes considered. Additionally, we note that all estimators have similar mean squared errors.
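For reference, one Monte Carlo draw under this design can be sketched as follows (our code; the seed is arbitrary):

```python
# Sketch of one draw under model (4.1): beta0 = 1.5, beta1 = 1.8, phi = 120,
# exponential covariate with mean 3, as described in the text.
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(123)          # seed chosen arbitrarily
n = 20
x = rng.exponential(scale=3.0, size=n)    # held fixed across replications
X = np.column_stack([np.ones(n), x])
beta0, beta1, phi = 1.5, 1.8, 120.0
mu = expit(beta0 + beta1 * x)             # logit link
y = rng.beta(mu * phi, (1 - mu) * phi)    # y_t ~ B(mu_t, phi)
```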
Table 2
Simulation results for estimation of β₁ = 1.8

n    Estimator    Mean       Bias        Rel. bias   Variance   MSE        √MSE
20   MLE          1.790476   −0.009524   −0.005291   0.979154   0.979245   0.989568
     Cox–Snell    1.805542   0.005542    0.003079    0.980788   0.980819   0.990363
     Firth        1.779916   −0.020084   −0.011158   0.979601   0.980004   0.989952
     Bootstrap    1.786805   −0.013195   −0.007331   0.977723   0.977897   0.988887
40   MLE          1.790608   −0.009392   −0.005218   0.291355   0.291444   0.539855
     Cox–Snell    1.804120   0.004120    0.002289    0.291850   0.291867   0.540248
     Firth        1.778862   −0.021138   −0.011743   0.291086   0.291533   0.539938
     Bootstrap    1.789677   −0.010323   −0.005735   0.291422   0.291529   0.539934
60   MLE          1.790686   −0.009314   −0.005174   0.194875   0.194962   0.441545
     Cox–Snell    1.799312   −0.000688   −0.000382   0.195062   0.195063   0.441659
     Firth        1.782815   −0.017185   −0.009547   0.194753   0.195048   0.441642
     Bootstrap    1.789645   −0.010355   −0.005753   0.194921   0.195028   0.441620

Note: MLE, maximum likelihood estimator; Cox–Snell, analytically corrected MLE; Firth, preventively corrected MLE; Bootstrap, bootstrap corrected MLE.
Table 3
Simulation results for estimation of φ = 120

n    Estimator    Mean       Bias        Rel. bias   Variance   MSE        √MSE
20   MLE          148.9531   28.95312    0.24127     3103.577   3941.860   62.7842
     Cox–Snell    119.1283   −0.87162    −0.00726    1986.266   1987.025   44.5760
     Firth        186.4067   66.40670    0.55338     4849.477   9259.328   96.2254
     Bootstrap    111.6602   −8.33972    −0.06949    1758.780   1828.332   42.7589
40   MLE          134.0331   14.03311    0.11694     1096.540   1293.468   35.9648
     Cox–Snell    120.6078   0.60787     0.00506     888.1870   888.5571   29.8086
     Firth        148.9867   28.98676    0.24155     1353.774   2194.007   46.8402
     Bootstrap    119.0727   −0.92725    −0.00772    866.9244   867.7842   29.4581
60   MLE          128.8105   8.81059     0.073422    613.7862   691.4128   26.2947
     Cox–Snell    120.2088   0.20888     0.001741    534.6695   534.7131   23.1238
     Firth        138.0418   18.0418     0.150349    704.6131   1030.120   32.0954
     Bootstrap    119.5717   −0.42822    −0.00356    530.2472   530.4306   23.0310

Note: MLE, maximum likelihood estimator; Cox–Snell, analytically corrected MLE; Firth, preventively corrected MLE; Bootstrap, bootstrap corrected MLE.
Table 2 summarizes the simulation results for the slope parameter β₁. Again, the corrective and bootstrap estimators for β₁ display the best performances, both in terms of bias and relative bias. The corrective estimator has slightly better small sample behavior than the bootstrap estimator, in contrast to what happened for β₀. For instance, when n = 60 the bias of the correctively adjusted estimator is over 13 times smaller than that of the original MLE. The preventively corrected estimate β̂₁*, similarly to what happened in the previous case, had worse finite sample behavior than the original MLE. The estimator obtained using Firth's (1993) approach does not appear to work well in the beta regression setting, even though it proved reliable in the independent and identically distributed case (Cribari-Neto and Vasconcellos, 2002). Again, all estimators have similar mean squared errors.

Table 3 presents results relating to the estimation of the parameter φ in (4.1). We can readily see that the MLE is, on average, far from the true parameter value (φ = 120), thus displaying large bias, for the different sample sizes considered. This stresses the importance of bias correction. Similarly to what was observed in the estimation of the regression parameters, the corrective and bootstrap-based estimators improved upon the finite sample performance of the original MLE, the corrective estimator having superior performance in terms of bias and relative bias. On the other hand, the bootstrap-based estimator had the smallest variance for all sample sizes. We also note that the estimated variance of φ̂ is larger than those of the remaining estimators. For instance, when n = 20, the estimated variance of φ̂ was 3103.57, whereas the estimated variances of φ̃ and φ̄ were 1986.26 and 1758.78, respectively.
Fig. 2. Conditional densities, true and estimated (φ = 50): true, MLE, Cox–Snell and bootstrap.
The same occurs with the MSE and √MSE, for all sample sizes considered. Thus, for the parameter φ, we strongly recommend the use of bias correction schemes, in particular bootstrap bias correction.

We have performed a second set of simulations where the covariate values were obtained as random draws from a standard normal distribution. We used β₀ = 1.5, β₁ = 1.8, φ = 50 and n = 20. Again, there was no considerable variation in the point estimates of the parameters included in the linear predictor, and again the estimates of φ were considerably different: 61.5696 (ML), 49.0240 (CS), 45.4659 (boot) and 77.7864 (Firth). Fig. 2 presents the true and three estimated conditional densities at x = 0.47. Note that the estimated conditional density obtained using the bias corrected estimates from the Cox–Snell approach is much closer to its true counterpart than that obtained by maximum likelihood. The conditional density obtained via bootstrap is also closer to the true density than that computed via maximum likelihood. It is noteworthy that the maximum likelihood density estimate is "more peaked" than the true density, which follows from the overestimation of the precision parameter. Fig. 2 thus shows that more accurate estimation of the precision parameter translates into more accurate estimation of the conditional density.

Following Paolino (2001), we compute efficiency measures for the different corrected estimators relative to the MLE by comparing the average values of the first differences for the mean response from the parameters estimated with and without bias correction with the true values of the first differences. The first differences are computed by varying the covariate from one standard deviation above its average value to one standard deviation below it. That is, we compute

$$\mathrm{EFFIC}^m_j = 100 \times \frac{\sum_{i=1}^{M}(\hat\Delta_{i,\mathrm{ML}} - \Delta_{\mathrm{true}})^2}{\sum_{i=1}^{M}(\hat\Delta_{i,j} - \Delta_{\mathrm{true}})^2},$$

where Δ_true is the true value of the first difference, Δ̂_{i,ML} is the first difference estimate obtained using maximum likelihood point estimates, and Δ̂_{i,j} is the corresponding estimate yielded by the jth bias correction scheme, with j = CS, DF, boot relating to the Cox–Snell (CBCE), David Firth (PBCE) and bootstrap (BBCE) schemes; M is the total number of Monte Carlo replications. In addition to the estimation methods considered in the previous simulations, we now add a new estimation strategy: quasi-likelihood estimation with logit link and variance function specified as V(μ) = μ(1−μ) (see Cox, 1996; McCullagh and Nelder, 1989);² its efficiency is denoted as EFFIC^m_QL. Three different

² Quasi-likelihood parameter estimation was carried out as described in McCullagh and Nelder (1989, Section 9.23).
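A sketch of the first-difference computation and of EFFIC^m_j follows (our illustration; the arrays of estimated first differences are assumed to be available):

```python
# Sketch of the first-difference efficiency measure EFFIC^m_j for the simple
# model (4.1) under the logit link.
import numpy as np
from scipy.special import expit

def first_difference(beta, x_bar, s_x):
    """Mean change when the covariate moves from x_bar + s_x to x_bar - s_x."""
    return expit(beta[0] + beta[1] * (x_bar + s_x)) \
         - expit(beta[0] + beta[1] * (x_bar - s_x))

def effic_m(delta_ml, delta_j, delta_true):
    # delta_ml, delta_j: arrays of M estimated first differences.
    return 100 * np.sum((delta_ml - delta_true)**2) \
               / np.sum((delta_j - delta_true)**2)
```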
Table 4
Efficiency with respect to the MLE: mean change

φ     Scheme   n = 20      n = 40      n = 60
20    CS       100.3206    100.1487    100.1071
      DF       99.11278    99.65190    99.83123
      Boot     101.4135    100.8637    100.6661
      QL       87.38461    99.74616    99.96335
50    CS       100.1175    100.0602    100.0423
      DF       99.52401    99.88204    99.94458
      Boot     100.8800    100.4674    100.2322
      QL       80.26937    99.36556    99.89812
120   CS       100.0583    100.0249    100.009
      DF       99.80083    99.9484     99.97745
      Boot     100.4282    100.2657    100.3930
      QL       67.88276    98.59355    100.1807

Table 5
Efficiency with respect to the MLE: variance change

φ     Scheme   n = 20      n = 40      n = 60
20    CS       125.6726    110.6675    107.1620
      DF       77.84475    90.25059    93.40565
      Boot     138.5901    113.351     108.6689
      QL       95.99583    92.04008    91.31474
50    CS       125.4593    111.1303    107.156
      DF       81.69985    90.8314     93.91896
      Boot     138.0299    113.8717    108.427
      QL       95.10046    95.2521     96.18197
120   CS       124.2345    110.7002    106.9798
      DF       84.44393    91.6084     94.23788
      Boot     135.6462    112.9300    108.0284
      QL       95.98951    97.83907    95.98873
sample sizes and three different values of φ were considered. Figures greater than 100 indicate that the corresponding estimator is more efficient, in the sense discussed above, than the MLE. The results are grouped in Table 4. They show that the estimators that were adjusted for bias using the Cox–Snell and bootstrap corrections are slightly more efficient than the MLE, and that the estimator corrected using David Firth's preventive approach is slightly less efficient than the MLE. The results also show that the quasi-likelihood estimator is considerably less efficient than the benchmark MLE. For instance, for n = 20 and φ = 120, the efficiency measures of the Cox–Snell, the bootstrap and the David Firth estimators were 100.06, 100.43 and 99.80; the quasi-likelihood efficiency measure was 67.88. It is also noteworthy that the quasi-likelihood efficiency (relative to maximum likelihood) is quite affected by the true value of the precision parameter: the larger the φ, the less efficient the estimation by quasi-likelihood when the number of observations is small.

We shall now consider a second efficiency measure, in which differences are taken with respect to variances and not to means; the measure compares the estimated change in the variance to the true value, in the same fashion as above. Note that, unlike the usual linear regression model, the beta regression model explores an attractive feature of the beta distribution: the relationship between mean and variance that may occur with proportions; in particular, when the mean is close to 0 or 1, the beta variable has smaller variance than when the mean is close to 0.5. In many instances researchers may be interested not just in how a covariate influences the expected value of the response, but also its variance (Paolino, 2001, p. 326). These efficiency measures, EFFICᵛⱼ, j = CS, DF, boot, QL, are given in Table 5. The results show that the Cox–Snell and bootstrap estimators become considerably more efficient than the MLE, and that David Firth's estimator and the quasi-likelihood estimator are considerably less efficient than the benchmark MLE.
Fig. 3. Interval estimation for β₀ with n = 20 (intervals: ACI, CACI, PACI, PCI, CECI, PECI, BCaCI).
The preventive analytically corrected estimator now has the worst performance. For example, when n = 20 and φ = 120, EFFICᵛⱼ = 124.23, 84.44, 135.64 and 95.99 for j = CS, DF, boot and QL, respectively. Note that the Cox–Snell and bootstrap-based estimators were approximately 25% and 35% more efficient than the MLE, respectively.

The two efficiency measures considered above, EFFICᵐ and EFFICᵛ, relate to the estimation of the impact of a covariate change on the mean and on the variance, respectively. Measuring such impact is oftentimes of interest. The results in Tables 4 and 5 indicate that the corrective analytically corrected estimator and the bootstrap-based estimator are superior to the preventive analytically corrected estimator, to the MLE and to the quasi-likelihood estimator.

We now turn to the evaluation of the confidence intervals described in Section 3. A total of 243 intervals were obtained, with nominal coverages 1 − α = 0.90, 0.95 and 0.99, and for sample sizes n = 20, 40 and 60. All confidence intervals were defined such that the probability that the true parameter value belongs to the interval is 1 − α, the probability that the true parameter value is smaller than the lower limit of the interval is α/2 and the probability that the true value of the parameter is greater than the upper limit of the interval is α/2, for 0 < α < 1/2. Figs. 3–5 contain histograms constructed from the 5000 maximum likelihood estimates of the parameters, for n = 20. (In the interest of space, we only report results for n = 20.) The different lines represent the different confidence intervals under evaluation, their lengths corresponding to the respective average interval lengths. The figures above (below) the lines are the percentages of replications in which the true parameter value fell above (below) the upper (lower) limit.

For the three different nominal coverage levels and the three different sample sizes, the PECI and PACI intervals have the smallest average lengths, whereas the CACI and CECI intervals have the best coverages. Upon inspection of Fig. 3 we note that all confidence intervals for β₀ are approximately symmetric around the true parameter value and well balanced. For the slope parameter β₁ = 1.8, the PECI and PACI intervals are again the ones with smallest average length. In general, for the three different nominal levels, the CACI interval is the one with best coverage, in the sense that the empirical coverage is nearest to its nominal counterpart. Fig. 4 shows that all confidence intervals for β₁ are approximately symmetric around the true parameter value, but less well balanced than those for β₀. (The intervals become more well balanced as the sample size increases.) We observe that, in general, the empirical and asymptotic distributions of the MLEs of the regression parameters are in agreement for moderate sample sizes, thus yielding approximately balanced and approximately symmetric confidence intervals for the different sample sizes and nominal coverage levels considered.
Fig. 4. Interval estimation for β₁ with n = 20 (intervals: ACI, CACI, PACI, PCI, CECI, PECI, BCaCI).
Fig. 5. Interval estimation for φ with n = 20 (intervals: ACI, CACI, PACI, PCI, CECI, PECI, BCaCI).
Also, the CACI intervals have the best coverage in all situations. In addition, the PECI intervals have the smallest lengths on average, and in nearly all cases the lengths of the CACI exceed those of the other intervals, which holds for both β₀ and β₁. Fig. 5 displays confidence intervals for φ considering the same three nominal coverage levels. We note that the BCaCI intervals are approximately well balanced for the 0.90 and 0.95 coverage levels.
5. An application

We shall now present an application of the beta regression model described in Section 2. We analyze the petrol data given in Prater (1956), for which our interest is to model the proportion of crude oil that is converted to petrol after fractional distillation. The model has two explanatory variables. The first is the level of crude oil, where the 10 different possible levels correspond to the proportion of crude oil that was vaporized. The second is the temperature, in degrees Fahrenheit, at which all petrol contained in the amount of crude oil vaporizes. There are n = 32 different types of crude oil. The model specification includes an intercept (x₁ = 1), 9 dummy variables (x₂, …, x₁₀) to represent the 10 possible different situations for the level of crude oil and the covariate x₁₁, measuring the temperature in degrees Fahrenheit at which all petrol vaporizes. The logit link function was used to relate the mean of the response variable to the linear predictor. The unknown coefficients were estimated through maximum likelihood using the quasi-Newton optimization method known as BFGS, with analytical derivatives. The corrective, preventive and bootstrap bias corrected estimates of the coefficients were also computed.

Table 6 presents the maximum likelihood estimates along with their corrected versions and the corresponding standard errors. The adjusted versions are the corrective bias-corrected estimator (CBCE), the preventive bias-corrected estimator (PBCE) and the bootstrap bias-corrected estimator (BBCE). It can be seen that, for each regression parameter, the four different estimates are very similar. This indicates that, although the alternative estimators can be effective in reducing the bias of the MLEs, their use does not produce a sizeable increase in precision. On the other hand, for the precision parameter φ, we observe a considerable difference between the maximum likelihood estimate and the bias corrected ones, the bootstrap bias correction, in particular, producing a large standard error. In this example, the analytically corrective procedure seems to produce the best estimate of φ.
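A hedged sketch of such a fit follows (ours, not the authors' Ox code); it reuses the loglik function sketched in Section 2 and optimizes φ on the log scale — a choice of ours to enforce positivity; the data objects X and y must be supplied by the reader:

```python
# Sketch: maximum likelihood fit of a beta regression with logit link via BFGS.
import numpy as np
from scipy.optimize import minimize

def fit_beta_regression(X, y):
    k = X.shape[1]
    start = np.zeros(k + 1)
    start[-1] = np.log(10.0)                 # arbitrary starting value for phi
    obj = lambda th: -loglik(th[:k], np.exp(th[-1]), X, y)
    res = minimize(obj, start, method="BFGS")
    beta_hat, phi_hat = res.x[:k], np.exp(res.x[-1])
    return beta_hat, phi_hat
```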
6. Conclusions

We considered the class of beta regression models and derived a closed-form expression for the second order bias of the maximum likelihood estimator of the unknown parameter vector. We have shown that this bias can be expressed as the vector of estimated coefficients in a weighted linear regression, which makes its calculation easier. Simulation results show that bias correction can be effective even for small sample sizes, the analytically corrective and the bootstrap bias corrected estimates displaying the best performance in terms of bias, relative bias, MSE and √MSE, while the preventively corrected estimators behave worse than their MLE counterparts. The numerical results have shown that bias reduction can be effective in small samples. In particular, for the precision parameter, we observe that the MLE can become considerably biased and, therefore, we strongly recommend its bias correction.

Two efficiency measures were computed through Monte Carlo simulation. They evaluate the accuracy of the estimated mean and variance changes associated with a covariate change. They show that two of the bias correction approaches can lead to substantial gains in accuracy relative to maximum likelihood. They also show that estimation by quasi-likelihood tends to be inefficient when the sample size is small.

We have also considered interval estimation. We concluded that, in general, for the regression parameters, the confidence intervals of the PECI and PACI types have the shortest lengths and the CACI intervals have the most precise coverage. In general, all confidence intervals that were obtained for the regression parameters are approximately symmetric. For the precision parameter, however, the confidence intervals are asymmetric and unbalanced, the CACI intervals having the shortest lengths and the ACI intervals the most precise coverages when the sample sizes are relatively small. For the 90% coverage level, the BCa intervals have the shortest lengths and are also the most well balanced. In general, for the precision parameter φ, the asymptotic intervals have the shortest lengths.
Table 6
Maximum likelihood estimates and their corrected versions, along with standard errors (in parentheses), for Prater's regression model

Parameter   MLE                     CBCE                    PBCE                    BBCE
β1          −6.15957 (0.18232)      −6.15473 (0.23610)      −6.19875 (0.17176)      −6.13479 (0.18448)
β2          1.72773 (0.10123)       1.72699 (0.13115)       1.74411 (0.09518)       1.73227 (0.10165)
β3          1.32260 (0.11790)       1.32149 (0.15270)       1.33326 (0.11087)       1.31756 (0.11851)
β4          1.57231 (0.11610)       1.57108 (0.15039)       1.57829 (0.10925)       1.56388 (0.11775)
β5          1.05971 (0.10236)       1.05943 (0.13259)       1.06642 (0.09624)       1.05534 (0.10201)
β6          1.13375 (0.10352)       1.13324 (0.13412)       1.13551 (0.09736)       1.13384 (0.10181)
β7          1.04016 (0.10604)       1.03966 (0.13736)       1.04606 (0.09969)       1.03940 (0.10193)
β8          0.54369 (0.10913)       0.54384 (0.14129)       0.53526 (0.10279)       0.54428 (0.10965)
β9          0.49590 (0.10893)       0.49582 (0.14107)       0.48789 (0.10255)       0.49891 (0.11456)
β10         0.38579 (0.11859)       0.38508 (0.15363)       0.39056 (0.11143)       0.38778 (0.12342)
β11         0.01097 (0.00041)       0.01096 (0.00053)       0.01107 (0.00039)       0.01090 (0.00039)
φ           440.27840 (110.02563)   261.19964 (65.25724)    498.42340 (124.56206)   139.87717 (226.34557)

Pseudo R²: 0.9616 (for more diagnostics, see Ferrari and Cribari-Neto, 2004).
Acknowledgements The partial financial support from CNPq is gratefully acknowledged. We also thank two anonymous referees for comments and suggestions that led to a much improved paper. The usual disclaimer applies.
Appendix A

We differentiate the log-likelihood function (2.4) up to the third order with respect to the unknown parameters and obtain the moments of those derivatives. We define the quantities

$$w_t = \psi'(\mu_t\phi) + \psi'((1-\mu_t)\phi), \qquad m_t = \psi''(\mu_t\phi) - \psi''((1-\mu_t)\phi),$$

$$a_t = 3\left(\frac{d\mu_t}{d\eta_t}\right)^{\!2}\frac{\partial}{\partial\mu_t}\!\left(\frac{d\mu_t}{d\eta_t}\right), \qquad b_t = \left(\frac{d\mu_t}{d\eta_t}\right)^{\!2}\frac{\partial^2}{\partial\mu_t^2}\!\left(\frac{d\mu_t}{d\eta_t}\right) + \frac{d\mu_t}{d\eta_t}\left\{\frac{\partial}{\partial\mu_t}\!\left(\frac{d\mu_t}{d\eta_t}\right)\right\}^2,$$
$$d_t = (1-\mu_t)^2\psi'((1-\mu_t)\phi) + \mu_t^2\psi'(\mu_t\phi) - \psi'(\phi),$$
$$s_t = (1-\mu_t)^3\psi''((1-\mu_t)\phi) + \mu_t^3\psi''(\mu_t\phi) - \psi''(\phi),$$
$$c_t = \phi\{\mu_t w_t - \psi'((1-\mu_t)\phi)\},$$
$$u_t = -\big[2w_t + \phi\{\mu_t m_t + \psi''((1-\mu_t)\phi)\}\big],$$
$$r_t = \Big[2\{\mu_t w_t - \psi'((1-\mu_t)\phi)\} + \phi\{\mu_t^2\psi''(\mu_t\phi) - (1-\mu_t)^2\psi''((1-\mu_t)\phi)\}\Big]\frac{d\mu_t}{d\eta_t}.$$

The second and third order derivatives of the log-likelihood function are
$$U_{rs} = -\sum_{t=1}^{n}\left\{\phi^2\left(\frac{d\mu_t}{d\eta_t}\right)^{\!2} w_t - \phi(y_t^*-\mu_t^*)\frac{\partial}{\partial\mu_t}\!\left(\frac{d\mu_t}{d\eta_t}\right)\frac{d\mu_t}{d\eta_t}\right\}x_{ts}x_{tr},$$

$$U_{rsu} = -\sum_{t=1}^{n}\left\{\phi^3 m_t\left(\frac{d\mu_t}{d\eta_t}\right)^{\!3} + \phi^2 w_t a_t - \phi(y_t^*-\mu_t^*)\,b_t\right\}x_{ts}x_{tr}x_{tu},$$

$$U_{\phi\phi} = \sum_{t=1}^{n}(-d_t), \qquad U_{\phi\phi\phi} = -\sum_{t=1}^{n}s_t,$$

$$U_{r\phi} = \sum_{t=1}^{n}\left\{(y_t^*-\mu_t^*) - c_t\right\}\frac{d\mu_t}{d\eta_t}\,x_{tr},$$

$$U_{rs\phi} = \sum_{t=1}^{n}\left[\phi u_t\left(\frac{d\mu_t}{d\eta_t}\right)^{\!2} + \left\{(y_t^*-\mu_t^*) - c_t\right\}\frac{\partial}{\partial\mu_t}\!\left(\frac{d\mu_t}{d\eta_t}\right)\frac{d\mu_t}{d\eta_t}\right]x_{tr}x_{ts},$$

$$U_{r\phi\phi} = -\sum_{t=1}^{n}r_t\,x_{tr}.$$
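(As a sanity check — our addition — the mixed derivative can be verified symbolically; the snippet below confirms ∂²ℓₜ/∂μₜ∂φ = (yₜ* − μₜ*) − cₜ:)

```python
# Symbolic verification of the mixed second derivative using sympy.
import sympy as sp

mu, phi, y = sp.symbols("mu phi y", positive=True)
l = (sp.loggamma(phi) - sp.loggamma(mu * phi) - sp.loggamma((1 - mu) * phi)
     + (mu * phi - 1) * sp.log(y) + ((1 - mu) * phi - 1) * sp.log(1 - y))
ystar = sp.log(y) - sp.log(1 - y)
mustar = sp.polygamma(0, mu * phi) - sp.polygamma(0, (1 - mu) * phi)
w = sp.polygamma(1, mu * phi) + sp.polygamma(1, (1 - mu) * phi)
c = phi * (mu * w - sp.polygamma(1, (1 - mu) * phi))
assert sp.simplify(sp.diff(l, mu, phi) - ((ystar - mustar) - c)) == 0
```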
Taking expected values of the derivatives above, we obtain the cumulants

$$\kappa_{rs} = -\phi^2\sum_{t=1}^{n}\left(\frac{d\mu_t}{d\eta_t}\right)^{\!2} w_t\,x_{ts}x_{tr}, \qquad \kappa_{rsu} = -\sum_{t=1}^{n}\left\{\phi^3 m_t\left(\frac{d\mu_t}{d\eta_t}\right)^{\!3} + \phi^2 w_t a_t\right\}x_{ts}x_{tr}x_{tu},$$

$$\kappa_{\phi\phi} = -\sum_{t=1}^{n}d_t, \qquad \kappa_{\phi\phi\phi} = -\sum_{t=1}^{n}s_t, \qquad \kappa_{r\phi} = -\sum_{t=1}^{n}c_t\frac{d\mu_t}{d\eta_t}\,x_{tr},$$

$$\kappa_{rs\phi} = \sum_{t=1}^{n}\left\{\phi u_t\left(\frac{d\mu_t}{d\eta_t}\right)^{\!2} - c_t\frac{\partial}{\partial\mu_t}\!\left(\frac{d\mu_t}{d\eta_t}\right)\frac{d\mu_t}{d\eta_t}\right\}x_{tr}x_{ts},$$

$$\kappa_{r\phi\phi} = E(U_{r\phi\phi}) = -\sum_{t=1}^{n}r_t\,x_{tr}.$$
Differentiating the second order cumulants with respect to the parameters and defining

$$z_t = \Big[\mu_t w_t - \psi'((1-\mu_t)\phi) + \phi\{\mu_t^2\psi''(\mu_t\phi) - (1-\mu_t)^2\psi''((1-\mu_t)\phi)\}\Big]\frac{d\mu_t}{d\eta_t},$$

we have

$$\kappa_{rs}^{(u)} = -\sum_{t=1}^{n}\left\{\phi^3 m_t\left(\frac{d\mu_t}{d\eta_t}\right)^{\!3} + \frac{2}{3}\phi^2 w_t a_t\right\}x_{tr}x_{ts}x_{tu}, \qquad \kappa_{rs}^{(\phi)} = \phi\sum_{t=1}^{n}u_t\left(\frac{d\mu_t}{d\eta_t}\right)^{\!2}x_{ts}x_{tr},$$

$$\kappa_{r\phi}^{(u)} = -\sum_{t=1}^{n}\left[\phi\left\{w_t + \phi\big(\mu_t m_t + \psi''((1-\mu_t)\phi)\big)\right\}\left(\frac{d\mu_t}{d\eta_t}\right)^{\!2} + c_t\frac{\partial}{\partial\mu_t}\!\left(\frac{d\mu_t}{d\eta_t}\right)\frac{d\mu_t}{d\eta_t}\right]x_{tr}x_{tu},$$

$$\kappa_{r\phi}^{(\phi)} = -\sum_{t=1}^{n}z_t\,x_{tr}, \qquad \kappa_{\phi\phi}^{(u)} = -\sum_{t=1}^{n}r_t\,x_{tu}, \qquad \kappa_{\phi\phi}^{(\phi)} = -\sum_{t=1}^{n}s_t.$$
We now define the diagonal matrices

$$W_1 = -\frac{1}{2}\,\mathrm{diag}\left\{\phi^3 m_t\left(\frac{d\mu_t}{d\eta_t}\right)^{\!3} + \frac{\phi^2}{3}w_t a_t\right\}, \tag{A.1}$$

$$W_2 = \frac{1}{2}\,\mathrm{diag}\left\{\phi u_t\left(\frac{d\mu_t}{d\eta_t}\right)^{\!2} + c_t\frac{\partial}{\partial\mu_t}\!\left(\frac{d\mu_t}{d\eta_t}\right)\frac{d\mu_t}{d\eta_t}\right\}, \tag{A.2}$$

$$W_3 = -\frac{1}{2}\,\mathrm{diag}\left\{\phi^2\big(\mu_t m_t + \psi''((1-\mu_t)\phi)\big)\left(\frac{d\mu_t}{d\eta_t}\right)^{\!2} + c_t\frac{\partial}{\partial\mu_t}\!\left(\frac{d\mu_t}{d\eta_t}\right)\frac{d\mu_t}{d\eta_t}\right\}, \tag{A.3}$$

$$W_4 = -\frac{1}{2}\,\mathrm{diag}\left\{\phi\big(\mu_t^2\psi''(\mu_t\phi) - (1-\mu_t)^2\psi''((1-\mu_t)\phi)\big)\frac{d\mu_t}{d\eta_t}\right\}, \tag{A.4}$$

$$W_5 = -\frac{1}{2}\,\mathrm{diag}\{r_t\}. \tag{A.5}$$

From the above expressions, we obtain the following combinations of cumulants and their derivatives, where w_{it} denotes the tth diagonal element of W_i:

$$\kappa_{rs}^{(u)} - \tfrac{1}{2}\kappa_{rsu} = \sum_{t=1}^{n}w_{1t}\,x_{tr}x_{ts}x_{tu}, \qquad \kappa_{rs}^{(\phi)} - \tfrac{1}{2}\kappa_{rs\phi} = \sum_{t=1}^{n}w_{2t}\,x_{tr}x_{ts},$$
$$\kappa_{r\phi}^{(u)} - \tfrac{1}{2}\kappa_{r\phi u} = \kappa_{\phi r}^{(u)} - \tfrac{1}{2}\kappa_{\phi r u} = \sum_{t=1}^{n}w_{3t}\,x_{tr}x_{tu},$$

$$\kappa_{r\phi}^{(\phi)} - \tfrac{1}{2}\kappa_{r\phi\phi} = \sum_{t=1}^{n}w_{4t}\,x_{tr}, \qquad \kappa_{\phi\phi}^{(u)} - \tfrac{1}{2}\kappa_{\phi\phi u} = \sum_{t=1}^{n}w_{5t}\,x_{tu},$$

$$\kappa_{\phi\phi}^{(\phi)} - \tfrac{1}{2}\kappa_{\phi\phi\phi} = -\frac{1}{2}\sum_{t=1}^{n}s_t.$$
We now compute each term in (3.1), which gives the second order biases of the MLEs in the beta regression model. For a = 1, …, k, let e_a be the ath column vector of the k×k identity matrix and let x_t denote the tth row of X, written as a column vector. For the first term,

$$\sum_{r,s,u}\kappa^{ar}\kappa^{su}\Big(\kappa_{rs}^{(u)}-\tfrac12\kappa_{rsu}\Big)=\sum_{t=1}^{n}w_{1t}\left(\sum_{r}\kappa^{ar}x_{tr}\right)\left(\sum_{s,u}x_{ts}\kappa^{su}x_{tu}\right)=e_a^\top K^{\beta\beta}\sum_{t=1}^{n}w_{1t}\,x_t\,x_t^\top K^{\beta\beta}x_t.$$

Let δ be the n×1 vector with components obtained from the main diagonal of the matrix XK^{ββ}Xᵀ. We then have

$$\sum_{r,s,u}\kappa^{ar}\kappa^{su}\Big(\kappa_{rs}^{(u)}-\tfrac12\kappa_{rsu}\Big)=e_a^\top K^{\beta\beta}X^\top W_1\delta.$$

Let tr(·) represent the trace of a square matrix. It follows that

$$\sum_{s,u}\kappa^{a\phi}\kappa^{su}\Big(\kappa_{\phi s}^{(u)}-\tfrac12\kappa_{\phi su}\Big)=e_a^\top K^{\beta\phi}\,\mathrm{tr}\!\left(W_3XK^{\beta\beta}X^\top\right).$$

Now,

$$\sum_{r,u}\kappa^{ar}\kappa^{\phi u}\Big(\kappa_{r\phi}^{(u)}-\tfrac12\kappa_{r\phi u}\Big)=\sum_{t=1}^{n}w_{3t}\,e_a^\top K^{\beta\beta}x_t\,x_t^\top K^{\beta\phi}=e_a^\top K^{\beta\beta}X^\top W_3XK^{\beta\phi},$$

and, analogously,

$$\sum_{r,s}\kappa^{ar}\kappa^{s\phi}\Big(\kappa_{rs}^{(\phi)}-\tfrac12\kappa_{rs\phi}\Big)=e_a^\top K^{\beta\beta}X^\top W_2XK^{\beta\phi}.$$
Hence,

$$\sum_{u}\kappa^{a\phi}\kappa^{\phi u}\Big(\kappa_{\phi\phi}^{(u)}-\tfrac12\kappa_{\phi\phi u}\Big)=e_a^\top K^{\beta\phi}\sum_{t=1}^{n}w_{5t}\,\iota_t^\top XK^{\beta\phi}=e_a^\top K^{\beta\phi}\,\mathrm{diagonal}(W_5)XK^{\beta\phi},$$

where ι_t denotes the tth column vector of the n×n identity matrix. Consequently,

$$\sum_{s}\kappa^{a\phi}\kappa^{s\phi}\Big(\kappa_{\phi s}^{(\phi)}-\tfrac12\kappa_{\phi s\phi}\Big)=e_a^\top K^{\beta\phi}\,\mathrm{diagonal}(W_4)XK^{\beta\phi},$$

and

$$\sum_{r}\kappa^{ar}\kappa^{\phi\phi}\Big(\kappa_{r\phi}^{(\phi)}-\tfrac12\kappa_{r\phi\phi}\Big)=e_a^\top K^{\beta\beta}X^\top\,\mathrm{diagonal}(W_4)^\top K^{\phi\phi}.$$

Finally, we get

$$\kappa^{a\phi}\kappa^{\phi\phi}\Big(\kappa_{\phi\phi}^{(\phi)}-\tfrac12\kappa_{\phi\phi\phi}\Big)=\kappa^{a\phi}\kappa^{\phi\phi}\left(-\frac12\sum_{t=1}^{n}s_t\right)=e_a^\top K^{\beta\phi}K^{\phi\phi}\,\mathrm{tr}(S),$$

where S represents the matrix diag{−½sₜ}. Collecting the eight terms above, for a = 1, …, k and a = k+1, yields the matrix expressions for B(β̂) and B(φ̂) given in Section 3.
References

Atkinson, A.C., 1985. Plots, Transformations and Regression: An Introduction to Graphical Methods of Diagnostic Regression Analysis. Oxford University Press, New York.
Box, M., 1971. Bias in nonlinear estimation (with discussion). J. Roy. Statist. Soc. Ser. B 33, 171–201.
Bury, K., 1999. Statistical Distributions in Engineering. Cambridge University Press, New York.
Cook, R., Tsai, C., Wei, B., 1986. Bias in nonlinear regression. Biometrika 73, 615–623.
Cordeiro, G.M., McCullagh, P., 1991. Bias correction in generalized linear models. J. Roy. Statist. Soc. Ser. B 53, 629–643.
Cordeiro, G.M., Vasconcellos, K.L.P., 1997. Bias correction for a class of multivariate nonlinear regression models. Statist. Probab. Lett. 35, 155–164.
Cordeiro, G.M., Vasconcellos, K.L.P., 1999. Second-order biases of the maximum likelihood estimates in von Mises regression models. Austral. New Zealand J. Statist. 41, 901–910.
Cordeiro, G.M., Rocha, E.C., Rocha, J.G.C., Cribari-Neto, F., 1997. Bias-corrected maximum likelihood estimation for the beta distribution. J. Statist. Comput. Simulation 58, 21–35.
Cox, C., 1996. Nonlinear quasi-likelihood models: applications to continuous proportions. Comput. Statist. Data Anal. 21, 449–461.
Cox, D.R., Hinkley, D.V., 1974. Theoretical Statistics. Chapman and Hall, London.
Cox, D., Snell, E., 1968. A general definition of residuals. J. Roy. Statist. Soc. Ser. B 30, 248–275.
Cribari-Neto, F., Vasconcellos, K.L.P., 2002. Nearly unbiased maximum likelihood estimation for the beta distribution. J. Statist. Comput. Simulation 72, 107–118.
Cribari-Neto, F., Zarkos, S.G., 2003. Econometric and statistical computing using Ox. Comput. Econom. 21, 277–295.
Davison, A.C., Hinkley, D.V., 1997. Bootstrap Methods and Their Application. Cambridge University Press, New York.
DiCiccio, T.J., Tibshirani, R., 1987. Bootstrap confidence intervals and bootstrap approximations. J. Amer. Statist. Assoc. 82, 163–170.
Dishon, M., Weiss, G.H., 1980. Small sample comparison of estimation methods for the beta distribution. J. Statist. Comput. Simulation 11, 1–11.
Doornik, J.A., 2001. Ox: An Object-oriented Matrix Programming Language, fourth ed. Timberlake Consultants & Oxford, London. http://www.doornik.com
Efron, B., 1979. Bootstrap methods: another look at the jackknife. Ann. Statist. 7, 1–26.
Efron, B., 1981. Nonparametric standard errors and confidence intervals. Canad. J. Statist. 9, 139–172.
Efron, B., 1982. The Jackknife, the Bootstrap and Other Resampling Plans. Society for Industrial and Applied Mathematics, Philadelphia.
Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman & Hall, New York.
Ferrari, S.L.P., Cribari-Neto, F., 1998. On bootstrap and analytical bias corrections. Econom. Lett. 58, 7–15.
Ferrari, S.L.P., Cribari-Neto, F., 2004. Beta regression for modeling rates and proportions. J. Appl. Statist. 31, 799–815.
Firth, D., 1993. Bias reduction of maximum likelihood estimates. Biometrika 80, 27–38.
Graham, V.A., Hollands, K.G.T., 1990. Method to generate synthetic hourly solar radiation globally. Solar Energy 44, 333–341.
Janardan, K.G., Padmanabhan, G., 1986. Double bounded beta distribution for hydrologic variables. Proc. 17th Annu. Pittsburgh Conference (part 3), vol. 17, pp. 1107–1111.
Johnson, N., Kotz, S., Balakrishnan, N., 1995. Continuous Univariate Distributions, second ed. Wiley, New York.
Kieschnick, R., McCullough, B., 2003. Regression analysis of variates observed on (0,1): percentages, proportions, and fractions. Statist. Model. 3, 193–213.
Lawley, D., 1956. A general method for approximating to the distribution of likelihood ratio criteria. Biometrika 43, 295–303.
MacKinnon, J.G., Smith Jr., A.A., 1998. Approximate bias correction in econometrics. J. Econometrics 85, 205–230.
Maffet, A.L., Wackerman, C.C., 1991. The modified beta density function as a model for synthetic aperture radar clutter statistics. IEEE Trans. Geoscience and Remote Sensing 29, 277–283.
McCullagh, P., Nelder, J., 1989. Generalized Linear Models, second ed. Chapman & Hall, London.
McNally, R.J., 1990. Maximum likelihood estimation of the parameters of the prior distributions of three variables that strongly influence reproductive performance in cows. Biometrics 46, 501–514.
Milyutin, E.R., Yaromenko, Y.I., 1991. Statistical characteristics of atmospheric transparency index over tilted routes. Meteorologiya i Gidrologiya 12, 72–76.
Mittelhammer, R.C., Judge, G.G., Miller, D.J., 2000. Econometric Foundations. Cambridge University Press, New York.
Paolino, P., 2001. Maximum likelihood estimation of models with beta-distributed dependent variables. Political Anal. 9, 325–346.
Prater, N.H., 1956. Estimate gasoline yields from crude. Petroleum Refiner 35, 236–238.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 1992. Numerical Recipes in C: The Art of Scientific Computing, second ed. Cambridge University Press, New York.
Vasconcellos, K.L.P., Cribari-Neto, F., 2005. Improved maximum likelihood estimation in a new class of beta regression models. Brazilian J. Probab. Statist. 19, 13–31.
Wiley, J.A., Herschkorn, S.J., Padian, N.S., 1989. Heterogeneity in the probability of HIV transmission per sexual contact: the case of male-to-female transmission in penile–vaginal intercourse. Statist. Medicine 8, 93–102.
Young, D., Bakir, S., 1987. Bias correction for a generalized log-gamma regression model. Technometrics 29, 183–191.