Statistics and Probability Letters 78 (2008) 3269–3273
Residuals and their statistical properties in symmetrical nonlinear models
Francisco José A. Cysneiros ∗, Luis Hernando Vanegas
Departamento de Estatística, CCEN-UFPE, Cidade Universitária, Recife, PE 50740-540, Brazil
Article history: Received 11 July 2005; received in revised form 24 April 2008; accepted 2 June 2008; available online 17 June 2008.
Abstract
In this work we present theoretical details of a general residual for symmetric nonlinear regression models. This class of models includes all symmetric continuous distributions, such as the normal, Student-t, Pearson VII, power exponential and logistic. Such regression models are used to analyse data sets containing influential or outlying observations, which can significantly affect inferential conclusions. On the basis of the expansions of Cox and Snell [Cox, D.R., Snell, E.J., 1968. A general definition of residuals. Journal of the Royal Statistical Society B 30, 248–275], we calculate the first and second moments of a general definition of a residual for symmetrical nonlinear regression models. The statistical properties of some proposed residuals are also studied using Monte Carlo simulations for the Michaelis–Menten model, which is frequently used in chemical and biological experiments. © 2008 Elsevier B.V. All rights reserved.
∗ Corresponding author. E-mail addresses: [email protected] (F.J.A. Cysneiros), [email protected] (L.H. Vanegas).
0167-7152/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.spl.2008.06.011
1. Introduction
Statistical modelling is one of the tools most widely used by researchers in many areas of knowledge whose main interest is to answer research questions through the statistical analysis of a data set. Classical analysis based on models with normal errors is among the most popular methods when the response is continuous, because it is easy to apply and is supported by a large body of theory. However, it is well known that modelling under the assumption of normally distributed errors can be highly influenced by extreme observations (see, e.g., Cook and Weisberg (1982)). An alternative to the classical analysis for data sets with extreme observations is provided by models whose error distributions belong to the class of symmetrical distributions (see Fang et al. (1990)).
In this article we present theoretical details of a general residual for symmetrical nonlinear regression models, of which the standardized residual proposed by Galea et al. (2005) is a particular case. We also derive the deviance and quantal residuals and, following the methodology proposed by Cox and Snell (1968), obtain approximate expressions for the expectation and the variance of these residuals.
The work is organized as follows. In Section 2 we present the symmetrical nonlinear regression model. In Section 3 we develop the first and second moments of a general residual and present its standardization. In Section 4 we present the results of a simulation experiment using the Michaelis–Menten model, in which the statistical properties of the residuals are studied empirically. In the last section, we present some conclusions.
2. Symmetrical nonlinear regression model
Let the random variables $Y_1, \ldots, Y_n$ be independent, each $Y_i$ having a density in the symmetrical class of distributions with location parameter $\mu_i \in \mathbb{R}$ and scale parameter $\phi > 0$. The density function of $Y_i$, denoted by $Y_i \sim S(\mu_i, \phi)$, is defined as
$$f(y; \mu_i, \phi) = \frac{1}{\sqrt{\phi}}\, g\!\left(\frac{(y - \mu_i)^2}{\phi}\right), \quad y \in \mathbb{R}, \tag{1}$$
for some function $g(\cdot)$, called the density generator, with $g(u) > 0$ for $u > 0$ and $\int_0^\infty u^{-1/2} g(u)\,du = 1$ (see, for example, Chmielewski (1981), Fang and Anderson (1990) and Fang et al. (1990)). The characteristic function of $Y_i$, $\varsigma_{Y_i}(t) = E(e^{itY_i})$, is given by $e^{it\mu_i}\varphi(t^2\phi)$, $t \in \mathbb{R}$, for some function $\varphi$, with $\varphi(u) \in \mathbb{R}$ for $u > 0$. When they exist, the mean and variance of $Y_i$ are, respectively, $E(Y_i) = \mu_i$ and $\mathrm{Var}(Y_i) = \xi\phi$, where $\xi > 0$ is a constant given by $\xi = -2\varphi'(0)$, with $\varphi'(0) = \partial\varphi(u)/\partial u|_{u=0}$, which does not depend on the parameters $\mu_i$ and $\phi$. The symmetrical nonlinear regression model is given by
$$y_i = \mu_i(\beta; x_i) + \epsilon_i, \quad i = 1, \ldots, n, \tag{2}$$
where $\epsilon_i \sim S(0, \phi)$, $\mu_i = \mu_i(\beta; x_i)$ is a nonlinear function of $\beta = (\beta_1, \ldots, \beta_p)^T$ that is twice continuously differentiable, such that the derivative matrix $D_\beta = \partial\mu/\partial\beta$ has rank $p$ ($p < n$) for all $\beta$, with $\mu = (\mu_1, \ldots, \mu_n)^T$; $y = (y_1, \ldots, y_n)^T$ is the vector of response variables and $x_i = (x_{i1}, \ldots, x_{ik})^T$ is a vector of the values of $k$ explanatory variables. The log-likelihood function of $\theta = (\beta^T, \phi)^T$ is given by
$$L(\theta) = \sum_{i=1}^{n}\left[-\frac{1}{2}\log\phi + \log\{g(z_i^2)\}\right], \tag{3}$$
where $z_i = (y_i - \mu_i)/\phi^{1/2}$. The log-likelihood function $L(\theta)$ is assumed regular (Cox and Hinkley, 1974) with respect to $\beta$ and $\phi$. The Fisher information matrix for $\theta$ is block-diagonal, namely $K_{\theta\theta} = \mathrm{diag}\{K_{\beta\beta}, K_{\phi\phi}\}$, where $K_{\beta\beta} = \frac{4d_g}{\phi} D_\beta^T D_\beta$ and $K_{\phi\phi} = \frac{n}{4\phi^2}(4f_g - 1)$, with $d_g = E(W_g^2(u^2)u^2)$, $f_g = E(W_g^2(u^2)u^4)$ for $u \sim S(0, 1)$, $W_g(u) = g'(u)/g(u)$, $g'(u) = \partial g(u)/\partial u$ and $v(u) = 2W_g(u)$. Galea et al. (2005) developed an iterative process for finding the maximum likelihood
estimates.
3. Residuals
In many of the regression models available in the statistical literature, the response $y_i$ for the $i$th individual in the sample is regarded as a function of the systematic component and the $i$th element of a vector of errors. Consequently, it is reasonable for model validation in regression analysis to be based on checking that the residuals have statistical properties similar to those of the model errors. Belsley et al. (1980) and Cook and Weisberg (1982) discussed the standardization of residuals for the normal case. Pregibon (1981) proposed a deviance residual for generalized linear models, together with its standardized version. McCullagh (1987) gave analytical proofs of this standardization using the approach of Cox and Snell (1968), and a quantal residual was proposed by Dunn and Smyth (1996) for the class of generalized linear models.
Cox and Snell (1968) presented a general methodology that can be applied to a nonlinear symmetrical regression model. We consider the residual as a function $t(\hat z_i) = t(Y_i, \mu_i)$, where $t(\cdot)$ is an odd, twice continuously differentiable function. Following Cox and Snell (1968) and expanding in a Taylor series, we obtain expressions to order $n^{-1}$ for the expectation and variance of $t(\hat z_i)$, given by
$$
\begin{aligned}
E(t(\hat z_i)) &= \frac{E(t'(z_i))}{\sqrt{\phi}}\Big[(1 - h_{ii})\eta_i - \sum_{j \neq i} h_{ij}\eta_j\Big] = \frac{E(t'(z_i))}{\sqrt{\phi}}\,E(y_i - \hat\mu_i),\\
E(t^2(\hat z_i)) &= \mathrm{Var}(t(z_i))\,(1 - \tau_{ii}^* h_{ii}),\\
\mathrm{Cov}(t(\hat z_i), t(\hat z_j)) &= -\tau_{ij} h_{ij}, \quad i \neq j,
\end{aligned}
\tag{4}
$$
where $z_i = (y_i - \mu_i)/\phi^{1/2}$, $\tau_{ii}^* = \tau_{ii}/\mathrm{Var}(t(z_i))$, $D_{\beta\beta}(i) = \partial^2\mu_i/\partial\beta\,\partial\beta^T$, $\tau_{ii} = (4d_g)^{-1} E\big(2t(z_i)t'(z_i)v(z_i)z_i - t'^2(z_i) - t''(z_i)t(z_i)\big)$, $h_{ij} = d_i^T(D_\beta^T D_\beta)^{-1} d_j$, $\eta_i = -\frac{\phi}{8d_g}\mathrm{tr}\{(D_\beta^T D_\beta)^{-1} D_{\beta\beta}(i)\}$, $d_i = (d_{i1}, \ldots, d_{ip})^T$ and $\tau_{ij} = (4d_g)^{-1} E\big(t(z_i)t'(z_j)v(z_i)z_i + t(z_j)t'(z_i)v(z_j)z_j - t'(z_i)t'(z_j)\big)$.
Galea et al. (2005) presented a standardized residual for symmetrical nonlinear regression models, which is the particular case of $t(\hat z_i)$ in which $t(\cdot)$ is the identity function. We propose some usual choices of $t(\cdot)$, such as the deviance and quantal functions, together with the standardized versions of the resulting residuals (see Tables 1 and 2). In particular, $E(t(\hat z_i)) = 0$ in symmetrical linear regression models for all odd functions $t(\cdot)$. The correction factor $\tau_{ii}$ takes the values $1$ and $1/(4d_g)$, for all $i$, for the deviance and the Cox–Snell residual, respectively. Also, $\tau_{ii}^* = 1$ for all $i$ when $y_i$ has a normal distribution, and $\mathrm{Var}(t(z_i)) = 1$ when $t(\cdot) = t_q(\cdot)$. Simulation studies omitted here show that the correction factor $\tau_{ii}^*$ for the Student-t model approaches the correction factor for the normal model as the degrees of freedom increase.
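To make the quantities in (4) concrete, the following minimal sketch computes the leverages $h_{ii} = d_i^T(D_\beta^T D_\beta)^{-1} d_i$ for the two-parameter Michaelis–Menten mean function of Section 4 and applies the identity-function (Cox–Snell) standardization. The function names, the design points `xs` and the plain-Python 2×2 inversion are illustrative assumptions rather than the authors' S-Plus/R code; $\xi$ and $d_g$ are passed in as known constants ($\xi = 1$, $d_g = 1/4$ gives the normal case).

```python
import math

def mm_jacobian_row(theta1, theta2, x):
    """Row d_i of D_beta: derivatives of theta1*x/(theta2+x) w.r.t. (theta1, theta2)."""
    return (x / (theta2 + x), -theta1 * x / (theta2 + x) ** 2)

def leverages(rows):
    """h_ii = d_i^T (D^T D)^{-1} d_i for a derivative matrix with two columns."""
    a = sum(d[0] * d[0] for d in rows)
    b = sum(d[0] * d[1] for d in rows)
    c = sum(d[1] * d[1] for d in rows)
    det = a * c - b * b
    if det <= 0.0:
        raise ValueError("D^T D is singular; D_beta must have rank 2")
    # (D^T D)^{-1} = [[c, -b], [-b, a]] / det, expanded into the quadratic form.
    return [(c * d[0] ** 2 - 2.0 * b * d[0] * d[1] + a * d[1] ** 2) / det for d in rows]

def standardized_identity_residual(z_hat, h_ii, xi, d_g):
    """z_hat / [xi^(1/2) * (1 - (4 d_g xi)^(-1) h_ii)^(1/2)]."""
    return z_hat / math.sqrt(xi * (1.0 - h_ii / (4.0 * d_g * xi)))

# Hypothetical design points; parameter values taken from Section 4.
xs = [0.02, 0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0]
rows = [mm_jacobian_row(212.0, 0.0641, x) for x in xs]
h = leverages(rows)
```

Because $H = D_\beta(D_\beta^T D_\beta)^{-1}D_\beta^T$ is a projection matrix of rank $p$, the leverages should sum to $p = 2$, which is a convenient sanity check on any implementation.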
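Table 1
Expressions for $t_d(\hat z_i)$ and $\sigma_g^2$ for some symmetrical distributions
| Distribution | $t_d(\hat z_i)$ | $\sigma_g^2$ |
|---|---|---|
| Normal | $\hat z_i$ | $1$ |
| Student-t | $\mathrm{sign}(\hat z_i)\,\{(\nu + 1)\log(1 + \hat z_i^2/\nu)\}^{1/2}$ | $\nu/(\nu - 1)$ |
| Generalized Student-t | $\mathrm{sign}(\hat z_i)\,\{(r + 1)\log(1 + \hat z_i^2/s)\}^{1/2}$ | $r/(r - 1)$ |
| Logistic II | $\mathrm{sign}(\hat z_i)\,\{4\log[\tfrac{1}{2}(1 + e^{\lvert\hat z_i\rvert})] - 2\lvert\hat z_i\rvert\}^{1/2}$ | $1.233$ |
| Power exponential | $\mathrm{sign}(\hat z_i)\,\lvert\hat z_i\rvert^{1/(k+1)}$ | $k + 1$ |
$\sigma_g^2 = E(2\log[g(0)/g(z^2)])$, $z \sim S(0, 1)$.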
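The closed forms in Table 1 all follow from the deviance function $t_d(z) = \mathrm{sign}(z)\{2\log[g(0)/g(z^2)]\}^{1/2}$. The sketch below evaluates that generic form from a log density generator, so it can be compared with the table entries. The generators are written up to their normalizing constants (which cancel in the ratio) and are assumptions chosen to reproduce the table; the function names are hypothetical.

```python
import math

def deviance_residual(z, log_g):
    """t_d(z) = sign(z) * sqrt(2 * log(g(0) / g(z^2))) for a log generator log_g."""
    dev = 2.0 * (log_g(0.0) - log_g(z * z))
    # max() guards against tiny negative values caused by rounding near z = 0.
    return math.copysign(math.sqrt(max(dev, 0.0)), z)

def log_g_normal(u):
    return -0.5 * u

def log_g_student(u, nu=5.0):
    return -0.5 * (nu + 1.0) * math.log1p(u / nu)

def log_g_logistic2(u):
    s = math.sqrt(u)
    return -s - 2.0 * math.log1p(math.exp(-s))

def log_g_power_exp(u, k=0.5):
    return -0.5 * u ** (1.0 / (1.0 + k))
```

For example, `deviance_residual(z, log_g_student)` should agree with the Student-t row of Table 1 for $\nu = 5$, and `deviance_residual(z, log_g_normal)` should return `z` itself.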
Table 2
Some expressions for standardized residuals
| Residual | $t(z)$ | Standardized |
|---|---|---|
| Deviance | $t_d = \mathrm{sign}(z)\,\{2\log[g(0)/g(z^2)]\}^{1/2}$ | $t_d^* = \dfrac{t_d(\hat z_i)}{(\sigma_g^2 - \hat h_{ii})^{1/2}}$ |
| Quantal | $t_q = \Phi^{-1}[F(z)]$ | $t_q^* = \dfrac{t_q(\hat z_i)}{(1 - \sigma_g^{-2}\hat h_{ii})^{1/2}}$ |
| Cox–Snell | Identity | $t_{r_i} = \dfrac{\hat z_i}{\xi^{1/2}\,(1 - (4d_g\xi)^{-1}\hat h_{ii})^{1/2}}$ |
$F$ and $\Phi$ are the distribution functions of $z$ and of the standard normal, respectively.
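As an illustration of the quantal row of Table 2, the sketch below evaluates $t_q = \Phi^{-1}[F(z)]$ and its standardized version using only the Python standard library. It assumes the logistic II case with generator $g(u) = e^{-\sqrt{u}}/(1 + e^{-\sqrt{u}})^2$, for which $F$ is the standard logistic distribution function; other distributions would need their own `cdf`. All function names are hypothetical.

```python
import math
from statistics import NormalDist

def logistic2_cdf(z):
    """F(z) for the assumed logistic II generator (standard logistic cdf)."""
    return 1.0 / (1.0 + math.exp(-z))

def quantal_residual(z, cdf):
    """t_q(z) = Phi^{-1}[F(z)]; F(z) must lie strictly inside (0, 1)."""
    return NormalDist().inv_cdf(cdf(z))

def standardized_quantal(z_hat, h_ii, sigma2_g, cdf):
    """t_q* = t_q(z_hat) / (1 - h_ii / sigma_g^2)^(1/2), as in Table 2."""
    return quantal_residual(z_hat, cdf) / math.sqrt(1.0 - h_ii / sigma2_g)
```

With the Table 1 value $\sigma_g^2 = 1.233$ for the logistic II distribution, a call such as `standardized_quantal(z_hat, h_ii, 1.233, logistic2_cdf)` gives the standardized quantal residual for one observation.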
4. Results of simulations
The Monte Carlo experiments were carried out using a symmetrical nonlinear regression model, for which we studied the statistical properties of the deviance, quantal and Cox–Snell residuals. We consider the two-parameter Michaelis–Menten model, used in chemical kinetics to relate the initial velocity of an enzymatic reaction to the substrate concentration, where $\theta_1$ is the maximum speed of the reaction and $\theta_2$ is a kinetic constant. The Michaelis–Menten model is given by
$$Y_i = \frac{\theta_1 x_i}{\theta_2 + x_i} + \epsilon_i,$$
with $\theta_1 > 0$, $\theta_2 > 0$, $x_i > 0$ and $\epsilon_i \sim S(0, \phi)$, $i = 1, \ldots, n$. For illustration, we consider a Student-t distribution with $\nu = 5$ d.f. for the errors, with true parameter values $\theta_1 = 212$, $\theta_2 = 0.0641$ and $\phi = 2$, and with $x_i \sim U(0, 1)$ held fixed across the 10,000 replications. In each of the 10,000 replications, the maximum likelihood estimates of $\theta_1$, $\theta_2$ and $\phi$ were calculated, and the residuals $t_d^*$ and $t_{r_i}$ were evaluated at these estimates. Table 3 presents summary statistics for $t_{r_i}$ and $t_d^*$; both have approximately zero mean and unit variance. Also, the empirical asymmetry coefficient ($\hat\gamma_1$) is very close to zero, indicating that the distribution of the residuals is approximately symmetrical even though the sample size is small. The residual $t_d^*$ has an empirical kurtosis coefficient ($\hat\gamma_2$) close to 3, indicating that its empirical distribution agrees well with the normal distribution. Meanwhile, the empirical kurtosis coefficients for $t_{r_i}$ are, in most cases, smaller than the theoretical kurtosis coefficient of the Student-t distribution with 5 d.f. Similar results, omitted here, were found in several other situations and, in particular, $t_q^*$ behaves similarly to $t_d^*$.
5. Concluding remarks
In this work we developed theoretical details for the approximate moments of residuals in symmetrical nonlinear regression models. Codes in S-Plus and R to fit these models can be found on the web page www.de.ufpe.br/~cysneiros/elliptical/elliptical.html. We also verified, by simulation, that Cox–Snell residuals have approximately zero mean, unit variance, negligible skewness and kurtosis similar to that of the error distribution, while quantal and deviance residuals show good agreement with the normal distribution. The residual $t_{r_i}$ has a simple expression that can be implemented easily in a matrix language. On the other hand, we suggest using the deviance residual, because it does not require knowledge of the distribution function of $z_i$ and agrees well with the normal distribution.
Acknowledgements
The authors received financial support from CNPq and FACEPE, Brazil, and they are grateful to the referees for helpful comments and suggestions.
Table 3
Summary statistics of the residuals for a fitted Michaelis–Menten model with Student-t(5) errors and $\phi = 2$
| $i$ | $t_d^*$: Mean | Var | $\hat\gamma_1$ | $\hat\gamma_2$ | $t_{r_i}$: Mean | Var | $\hat\gamma_1$ | $\hat\gamma_2$ |
|---|---|---|---|---|---|---|---|---|
| 1 | −0.011 | 1.077 | −0.005 | 2.929 | −0.011 | 1.053 | 0.077 | 5.666 |
| 2 | −0.013 | 1.066 | 0.004 | 2.815 | −0.012 | 0.945 | −0.069 | 9.281 |
| 3 | 0.006 | 1.085 | −0.013 | 2.911 | 0.005 | 1.075 | −0.122 | 6.211 |
| 4 | −0.005 | 1.087 | 0.032 | 2.794 | −0.002 | 1.054 | 0.128 | 5.106 |
| 5 | −0.013 | 1.069 | −0.018 | 2.852 | −0.015 | 1.047 | −0.238 | 6.668 |
| 6 | −0.003 | 1.077 | −0.009 | 2.915 | −0.003 | 1.078 | 0.138 | 7.515 |
| 7 | −0.013 | 1.087 | 0.070 | 2.879 | −0.008 | 1.078 | 0.182 | 7.224 |
| 8 | 0.022 | 1.082 | −0.023 | 2.943 | 0.020 | 1.055 | −0.003 | 5.534 |
| 9 | −0.002 | 1.092 | −0.008 | 2.976 | −0.002 | 1.113 | 0.069 | 7.545 |
| 10 | 0.012 | 1.065 | −0.015 | 2.835 | 0.011 | 1.046 | 0.010 | 6.225 |
| 11 | −0.009 | 1.087 | −0.007 | 2.971 | −0.009 | 1.120 | 0.014 | 9.390 |
| 12 | 0.005 | 1.091 | −0.029 | 2.854 | 0.003 | 1.085 | −0.057 | 5.954 |
| 13 | −0.006 | 1.047 | −0.030 | 2.909 | −0.009 | 1.024 | −0.233 | 8.523 |
| 14 | −0.013 | 1.095 | −0.029 | 3.015 | −0.015 | 1.082 | −0.145 | 6.192 |
| 15 | −0.001 | 1.107 | −0.040 | 2.890 | −0.003 | 1.112 | 0.107 | 7.540 |
| 16 | 0.002 | 1.059 | −0.020 | 2.902 | 0.002 | 1.045 | 0.037 | 7.228 |
| 17 | 0.009 | 1.081 | 0.004 | 2.833 | 0.007 | 1.078 | −0.109 | 7.962 |
| 18 | 0.002 | 1.109 | 0.002 | 2.812 | 0.003 | 1.090 | 0.009 | 5.545 |
| 19 | 0.019 | 1.087 | 0.039 | 2.854 | 0.019 | 1.075 | 0.090 | 5.824 |
| 20 | −0.002 | 1.083 | 0.015 | 2.917 | 0.001 | 1.069 | 0.045 | 5.767 |
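The per-observation summaries reported in Table 3 (mean, variance, $\hat\gamma_1$, $\hat\gamma_2$) can be computed across replications along the lines of the sketch below. To stay short and dependency-free it is only a rough illustration: instead of residuals from a fitted model it summarizes the true standardized errors $z_i/\xi^{1/2}$ with Student-t(5) errors, using the standard value $\xi = \nu/(\nu - 2)$ for the Student-t variance. The function names, the seed and the use of the moment ratios $m_3/m_2^{3/2}$ and $m_4/m_2^2$ for $\hat\gamma_1$ and $\hat\gamma_2$ are assumptions.

```python
import math
import random

def moment_summary(values):
    """Mean, unbiased variance, skewness (gamma1) and kurtosis (gamma2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "var": m2 * n / (n - 1),
        "gamma1": m3 / m2 ** 1.5,
        "gamma2": m4 / m2 ** 2,
    }

def student_t_draw(nu, rng):
    """One Student-t(nu) variate: N(0, 1) / sqrt(chi2_nu / nu)."""
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu d.f.
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / nu)

def simulate_summaries(n_obs=20, n_rep=10000, nu=5.0, seed=1):
    """Summaries, per observation index, of standardized true errors."""
    rng = random.Random(seed)
    xi = nu / (nu - 2.0)  # variance of a standard Student-t(nu) variate
    columns = [[] for _ in range(n_obs)]
    for _ in range(n_rep):
        for i in range(n_obs):
            columns[i].append(student_t_draw(nu, rng) / math.sqrt(xi))
    return [moment_summary(col) for col in columns]
```

Replacing the inner draw by a residual computed from a model refitted in each replication would give the analogue of Table 3; that fitting step is deliberately left out here.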
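Appendix
Let $T_i = t(\hat z_i) = t(Y_i, \mu_i)$, where $t(\cdot)$ is an odd, twice continuously differentiable function. Following Cox and Snell (1968) and expanding in a Taylor series, we can obtain expressions to order $n^{-1}$ for the expectation and variance of the $T_i$:
$$t(\hat z_i) = t(z_i) + \sum_{r=1}^{p}(\hat\beta_r - \beta_r)M_r^{(i)} + \frac{1}{2}\sum_{r=1}^{p}\sum_{s=1}^{p}(\hat\beta_r - \beta_r)(\hat\beta_s - \beta_s)M_{rs}^{(i)}, \tag{5}$$
where
$$M_r^{(i)} = \frac{\partial t(z_i)}{\partial\beta_r} = -\frac{t'(z_i)}{\sqrt{\phi}}\frac{\partial\mu_i}{\partial\beta_r}, \qquad M_{rs}^{(i)} = \frac{\partial^2 t(z_i)}{\partial\beta_r\,\partial\beta_s} = \frac{t''(z_i)}{\phi}\frac{\partial\mu_i}{\partial\beta_r}\frac{\partial\mu_i}{\partial\beta_s} - \frac{t'(z_i)}{\sqrt{\phi}}\frac{\partial^2\mu_i}{\partial\beta_r\,\partial\beta_s}. \tag{6}$$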
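Following (25) in Cox and Snell (1968) and taking the expectation on both sides of (5), we obtain
$$E(t(\hat z_i)) = E(t(z_i)) + \sum_{r=1}^{p} E(\hat\beta_r - \beta_r)\,E\big(M_r^{(i)}\big) + \sum_{r=1}^{p}\sum_{s=1}^{p} I^{rs}\,E\Big(M_r^{(i)}u_s^{(i)} + \frac{1}{2}M_{rs}^{(i)}\Big), \tag{7}$$
that is, $E(t(\hat z_i)) = E(t(z_i)) + a_i$, where $I^{rs}$ is the $(r, s)$th element of the matrix $K_{\beta\beta}^{-1}$, $d_{is} = \partial\mu_i/\partial\beta_s$ and $u_s^{(i)} = v_i z_i d_{is}/\sqrt{\phi}$. Here $u_s^{(i)}$ and $M_r^{(i)}u_s^{(i)}$ are odd functions and $z_i$ has a symmetrical distribution around zero; therefore, when it exists, $E\big(M_r^{(i)}u_s^{(i)}\big) = 0$. Since $t(\cdot)$ is odd, $E(t(z_i)) = 0$ as well, and Eq. (7) can be written as
$$E(t(\hat z_i)) = \sum_{r=1}^{p} E(\hat\beta_r - \beta_r)\,E\big(M_r^{(i)}\big) + \frac{1}{2}\sum_{r=1}^{p}\sum_{s=1}^{p} I^{rs}\,E\big(M_{rs}^{(i)}\big).$$
Evaluating the expectations in (6), where the term in $t''$ vanishes because $t''$ is odd, we have $E\big(M_r^{(i)}\big) = -\frac{E(t'(z_i))}{\sqrt{\phi}}\,d_{ir}$ and $E\big(M_{rs}^{(i)}\big) = -\frac{E(t'(z_i))}{\sqrt{\phi}}\frac{\partial^2\mu_i}{\partial\beta_r\,\partial\beta_s}$. Then,
$$E(t(\hat z_i)) = -\frac{E(t'(z_i))}{\sqrt{\phi}}\sum_{r=1}^{p} E(\hat\beta_r - \beta_r)\,d_{ir} - \frac{E(t'(z_i))}{2\sqrt{\phi}}\sum_{r=1}^{p}\sum_{s=1}^{p} I^{rs}\frac{\partial^2\mu_i}{\partial\beta_r\,\partial\beta_s}. \tag{8}$$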
Cordeiro et al. (2000) derived the second-order bias of the maximum likelihood estimate $\hat\beta$, which can also be written as $B(\hat\beta) = (D_\beta^T D_\beta)^{-1}D_\beta^T\eta$, where $\eta$ is an $n$-vector with $i$th element $\eta_i = -\frac{\phi}{8d_g}\mathrm{tr}\{(D_\beta^T D_\beta)^{-1}D_{\beta\beta}(i)\}$ and $D_{\beta\beta}(i) = \partial^2\mu_i/\partial\beta\,\partial\beta^T$. Applying these second-order biases in (8), we obtain, to order $n^{-1}$,
$$E(t(\hat z_i)) = \frac{E(t'(z_i))}{\sqrt{\phi}}\Big[(1 - h_{ii})\eta_i - \sum_{j \neq i} h_{ij}\eta_j\Big] = \frac{E(t'(z_i))}{\sqrt{\phi}}\,E(y_i - \hat\mu_i).$$
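The step from (8) to the bracketed expression can be spelled out as follows. This is a brief sketch rather than part of the original derivation; it assumes that $E(\hat\beta_r - \beta_r)$ is replaced by the $r$th component $B_r(\hat\beta)$ of the second-order bias and uses $K_{\beta\beta}^{-1} = \{\phi/(4d_g)\}(D_\beta^T D_\beta)^{-1}$:
$$\sum_{r=1}^{p} B_r(\hat\beta)\,d_{ir} = d_i^T(D_\beta^T D_\beta)^{-1}D_\beta^T\eta = \sum_{j=1}^{n} h_{ij}\eta_j, \qquad \sum_{r=1}^{p}\sum_{s=1}^{p} I^{rs}\frac{\partial^2\mu_i}{\partial\beta_r\,\partial\beta_s} = \frac{\phi}{4d_g}\,\mathrm{tr}\{(D_\beta^T D_\beta)^{-1}D_{\beta\beta}(i)\} = -2\eta_i.$$
Substituting both identities into (8) gives $\frac{E(t'(z_i))}{\sqrt{\phi}}\big(\eta_i - \sum_{j=1}^{n} h_{ij}\eta_j\big)$, which is the bracketed form once the $j = i$ term is separated from the sum.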
For the second moment, up to order $n^{-1}$, we have
$$E(t^2(\hat z_i)) = E(t^2(z_i)) + 2\sum_{r=1}^{p} E(\hat\beta_r - \beta_r)\,E\big(t(z_i)M_r^{(i)}\big) + 2\sum_{r=1}^{p}\sum_{s=1}^{p} I^{rs}\,E\Big(t(z_i)M_r^{(i)}u_s^{(i)} + \frac{1}{2}M_r^{(i)}M_s^{(i)} + \frac{1}{2}t(z_i)M_{rs}^{(i)}\Big).$$
By the same argument, $t(z_i)M_r^{(i)}$ and $t(z_i)t'(z_i)$ are odd functions; when these expectations exist, we have $E\big(t(z_i)M_r^{(i)}\big) = E\big(t(z_i)t'(z_i)\big) = 0$. Thus,
$$E(t^2(\hat z_i)) = \mathrm{Var}(t(z_i)) - \frac{2}{\phi}\sum_{r=1}^{p}\sum_{s=1}^{p} I^{rs}d_{ir}d_{is}\,E\big(t(z_i)t'(z_i)v(z_i)z_i\big) + \frac{1}{\phi}\sum_{r=1}^{p}\sum_{s=1}^{p} I^{rs}d_{ir}d_{is}\,E\big(t'^2(z_i) + t''(z_i)t(z_i)\big).$$
Therefore,
$$E(t^2(\hat z_i)) = \mathrm{Var}(t(z_i)) - \tau_{ii}\,\mathrm{tr}\{(D_\beta^T D_\beta)^{-1}d_i d_i^T\} = \mathrm{Var}(t(z_i)) - \tau_{ii}h_{ii}, \tag{9}$$
with $\tau_{ii} = (4d_g)^{-1}E\big(2t(z_i)t'(z_i)v(z_i)z_i - t'^2(z_i) - t''(z_i)t(z_i)\big)$ and $d_i = (d_{i1}, \ldots, d_{ip})^T$. Then (9) can be written as
$$E(t^2(\hat z_i)) = \mathrm{Var}(t(z_i))\,(1 - \tau_{ii}^* h_{ii}), \tag{10}$$
where $\tau_{ii}^* = \tau_{ii}/\mathrm{Var}(t(z_i))$ and $h_{ii} = d_i^T(D_\beta^T D_\beta)^{-1}d_i$. And, for the covariance,
$$E\big(t(\hat z_i)\,t(\hat z_j)\big) = E^2(t(z_i)) + (a_i + a_j)E(t(z_i)) + \sum_{r=1}^{p}\sum_{s=1}^{p} I^{rs}\,E\Big(t(z_i)M_r^{(j)}u_s^{(i)} + t(z_j)M_r^{(i)}u_s^{(j)} + M_r^{(i)}M_s^{(j)}\Big).$$
Applying $E\big(t(z_i)M_r^{(j)}u_s^{(i)}\big) = -\phi^{-1}E\big(t(z_i)t'(z_j)v(z_i)z_i\big)d_{jr}d_{is}$, $E(t(z_i)) = 0$, $E\big(t(z_j)M_r^{(i)}u_s^{(j)}\big) = -\phi^{-1}E\big(t(z_j)t'(z_i)v(z_j)z_j\big)d_{ir}d_{js}$, $E^2(t(z_i)) = 0$ and $E\big(M_r^{(i)}M_s^{(j)}\big) = \phi^{-1}E\big(t'(z_i)t'(z_j)\big)d_{ir}d_{js}$ in the covariance, we have
$$\mathrm{Cov}\big(t(\hat z_i), t(\hat z_j)\big) = -\phi^{-1}\sum_{r=1}^{p}\sum_{s=1}^{p} I^{rs}\Big[E\big(t(z_i)t'(z_j)v(z_i)z_i\big)d_{jr}d_{is} + E\big(t(z_j)t'(z_i)v(z_j)z_j - t'(z_i)t'(z_j)\big)d_{ir}d_{js}\Big] = -\tau_{ij}h_{ij},$$
where $\tau_{ij} = (4d_g)^{-1}E\big(t(z_i)t'(z_j)v(z_i)z_i + t(z_j)t'(z_i)v(z_j)z_j - t'(z_i)t'(z_j)\big)$ and $h_{ij}$ is the $(i, j)$th element of $H = D_\beta(D_\beta^T D_\beta)^{-1}D_\beta^T$, for $i \neq j$.
References
Belsley, D.A., Kuh, E., Welsch, R.E., 1980. Regression Diagnostics. John Wiley, New York.
Chmielewski, M.A., 1981. Elliptically symmetric distributions: A review and bibliography. International Statistical Review 49, 67–74.
Cook, R.D., Weisberg, S., 1982. Residuals and Influence in Regression. Chapman and Hall, New York.
Cordeiro, G.M., Ferrari, S.L.P., Uribe-Opazo, M.A., Vasconcellos, K.L.P., 2000. Corrected maximum likelihood estimation in a class of symmetric nonlinear regression models. Statistics and Probability Letters 53, 629–643.
Cox, D.R., Hinkley, D.V., 1974. Theoretical Statistics. Chapman and Hall, London.
Cox, D.R., Snell, E.J., 1968. A general definition of residuals. Journal of the Royal Statistical Society B 30, 248–275.
Dunn, P.K., Smyth, G.K., 1996. Randomized quantile residuals. Journal of Computational and Graphical Statistics 5, 236–244.
Fang, K.T., Anderson, T.W., 1990. Statistical Inference in Elliptically Contoured and Related Distributions. Allerton Press, New York.
Fang, K.T., Kotz, S., Ng, K.W., 1990. Symmetric Multivariate and Related Distributions. Chapman and Hall, London.
Galea, M., Paula, G.A., Cysneiros, F.J.A., 2005. On diagnostic in symmetrical nonlinear models. Statistics and Probability Letters 73, 459–467.
McCullagh, P., 1987. Tensor Methods in Statistics. Chapman and Hall, London.
Pregibon, D., 1981. Logistic regression diagnostics. Annals of Statistics 9, 705–724.