Journal of the Korean Statistical Society 40 (2011) 115–124
Contents lists available at ScienceDirect
Journal of the Korean Statistical Society journal homepage: www.elsevier.com/locate/jkss
Nonlinear regression models based on scale mixtures of skew-normal distributions Aldo M. Garay a , Víctor H. Lachos a , Carlos A. Abanto-Valle b,∗ a
Departamento de Estatística, Universidade Estatual de Campinas, Brazil
b
Departamento de Estatística, Universidade Federal de Rio de Janeiro, Brazil
article
info
Article history: Received 18 November 2009 Accepted 23 August 2010 Available online 27 September 2010 AMS 2000 subject classifications: primary 62F35 secondary 62J02 Keywords: EM algorithm Nonlinear regression models Scale mixtures of skew-normal distributions Skew-normal distribution
abstract An extension of some standard likelihood based procedures to nonlinear regression models under scale mixtures of skew-normal (SMSN) distributions is developed. This novel class of models provides a useful generalization of the symmetrical nonlinear regression models since the random terms distributions cover both symmetric as well as asymmetric and heavy-tailed distributions such as skew-t, skew-slash, skew-contaminated normal, among others. A simple EM-type algorithm for iteratively computing maximum likelihood estimates is presented and the observed information matrix is derived analytically. In order to examine the robust aspect of this flexible class against outlying and influential observations, some simulation studies have also been presented. Finally, an illustration of the methodology is given considering a data set previously analyzed under normal and skew-normal nonlinear regression models. © 2010 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.
1. Introduction Normal nonlinear regression models (N-NLM) are usually applied in science and engineering to model symmetrical data using nonlinear functions of unknown parameters in order to explain or describe the phenomena under study. However, it is well-known that several phenomena are not always in agreement with the normal model due to lack of symmetry in the distribution or presence of heavy-and-lightly tailed distributions related to the normal law in the data. Particularly, it is well-known that the parameter estimates of the normal model based on maximum likelihood (ML) methods are often sensitive to atypical observations. To deal with this problem, some proposals have been made in the literature by replacing the normality assumption by more flexible classes of distributions. For instance, Cysneiros and Vanegas (2008) studied the symmetrical nonlinear regression model and performed an analytical and empirical study to describe the behavior of the standardized residuals. Vanegas and Cysneiros (2010) propose diagnostic procedures based on case-deletion model for symmetrical nonlinear regression models. Cancho, Lachos, and Ortega (2009) introduce the skew-normal nonlinear regression models (SN–NLM) and they present a complete likelihood based analysis, including an efficient EM algorithm to maximum likelihood estimation. Xie, Lin, and Wei (2009) and Xie, Wei, and Lin (2009) develop score test statistics for testing homogeneity in the SN–NLM proposed by Cancho et al. (2009). A common feature of these classes of NLM is that the N-NLMs is also a member of the same class. Although these models are attractive, there is a need to check the distributional assumptions of the model errors because these can present skewness and be heavy-tailed (Basso, Lachos, Cabral, & Ghosh, 2009). Branco and Dey (2001) propose using
∗ Corresponding address: Departamento de Estatística, Universidade Federal de Rio de Janeiro, Caixa Postal 68530, CEP: 21945-970, Rio de Janeiro-RJ, Brazil. E-mail addresses:
[email protected] (A.M. Garay),
[email protected] (V.H. Lachos),
[email protected] (C.A. Abanto-Valle). 1226-3192/$ – see front matter © 2010 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved. doi:10.1016/j.jkss.2010.08.003
116
A.M. Garay et al. / Journal of the Korean Statistical Society 40 (2011) 115–124
scale mixtures of skew-normal (SMSN) distributions in order to deal with the problem of atypical data in an asymmetrical context. In this article, we extend the SN–NLM by assuming that the model errors follow a mean-zero SMSN distribution. Interestingly, the SMSN class contains the entire family of scale mixtures of normal distributions (Lange & Sinsheimer, 1993) and skewed versions of classical symmetric distributions such as the skew-t (ST), the skew-slash (SSL) and the skew contaminated normal (SCN) distribution. Therefore, they can be a reasonable choice for robust inference in many types of models. In addition, this rich class of distributions may naturally attribute different weights to each observation and consequently control the influence of a single observation on the parameter estimates. The rest of the paper is organized as follows. In Section 2, we present some properties of the univariate SMSN family. Section 3 outlines the asymmetric model as well as some inferential results. In Section 4 an EM-type algorithm for maximum likelihood estimation is developed. In addition, the use of standardized residuals in these asymmetrical nonlinear models are evaluated in the presence of outliers. The methodology proposed is illustrated in Section 6 by analyzing a real data set and some conclusive remarks are presented in Section 7. 2. Scale mixtures of skew-normal distributions Before we start, let us introduce us some notation. A random variable Y is in the SMSN family if it can be written as Y = µ + κ 1/2 (U )Z ,
(1)
where µ is a location parameter, Z is a skew-normal random variable with location 0, scale σ , skewness λ(Z ∼ SN(0, σ 2 , λ)), κ(u) is a positive function of u, U is a random variable with distribution function H (·; ν) and density h(·; ν) and ν is a scalar or vector parameter indexing the distribution of U. Although we can deal with any κ function, in this paper we restrict our attention to the case in that κ(u) = 1/u, since it leads to good mathematical properties. Given U = u, we have that Y |U = u ∼ SN(µ, u−1 σ 2 , λ). Thus, the density of Y is given by 2
f (y) = 2
∞
∫
φ(y; µ, u−1 σ 2 )Φ
0
u1/2 λ(y − µ)
σ
dH (u; ν),
(2)
where φ(·; µ, σ 2 ) denotes the density of the univariate normal distribution with mean µ and variance σ 2 > 0 and Φ (·) is the distribution function of the standard univariate normal distribution. The notation Y ∼ SMSN(µ, σ 2 , λ; H ) will be used when Y has pdf (2). When H is degenerate, with u = 1, we obtain the SN(µ, σ 2 , λ) distribution. When λ = 0, the SMSN ∞distributions reduces to the class of scale-mixtures of the normal (SMN) distribution represented by the pdf f0 (y) = 0 φp (y; µ, u−1 σ 2 )dH (u; ν). The distributions in the SMSN class that will be considered in this work are:
• The skew-normal distribution. Denoted by SN(µ, σ 2 , λ), with pdf given by f (y) = 2φ(y; µ, u−1 σ 2 )Φ (A),
y ∈ R,
(3)
where A = λ(y − µ)/σ . • The skew-t distribution with ν degrees of freedom. In this case, the density of Y takes the form − ν+2 1 Γ ν+2 1 d ν+1 √ f (y) = 1+ T A; ν + 1 , y ∈ R, ν d+ν Γ ν2 π νσ
(4)
where d = (y −µ)2 /σ 2 and T (·; ν) denotes the distribution function of the standard Student-t distribution, with location zero, scale one and ν degrees of freedom, namely t (0, 1, ν). We use the notation Y ∼ ST (µ, σ 2 , λ; ν). It is known that if ν → ∞ we obtain the skew normal distribution. In the special case where λ = 0, we obtain the cdf of the ordinary Student-t distribution with ν degrees of freedom. Although the ST distribution has nice properties (see, Azzalini & Genton, 2008), there are some inferential problems. For instance, it can be shown that the observed information matrix is singular when λ = 0 and ν → ∞. In this article, we only consider the case that the degrees of freedom ν is finite. • The skew-slash distribution. Denoted by Y ∼ SSL(µ, σ 2 , λ; ν) and the associated density is given by f (y) = 2ν
1
∫
uν−1 φ(y; µ, u−1 σ 2 )Φ (u1/2 A)du,
y ∈ R.
(5)
0
The skew-slash is a heavy-tailed distribution having as a limiting distribution the skew-normal one (when ν → ∞).
• The skew contaminated normal distribution. We denote it by Y ∼ SCN(µ, σ 2 , λ; ν, γ ). Its density is given by f (y) = 2{νφ(y; µ, γ −1 σ 2 )Φ (γ 1/2 A) + (1 − ν)φ(y; µ, σ 2 )Φ (A)},
ν, γ ∈ (0, 1).
The parameters ν and γ can be interpreted as the proportion of outliers and a scale factor, respectively. The skew contaminated normal distribution reduces to the skew-normal distribution when γ = 1.
A.M. Garay et al. / Journal of the Korean Statistical Society 40 (2011) 115–124
117
3. The SMSN nonlinear regression model The nonlinear regression model based on SMSN distributions–hereafter SMSN–NLM—is defined as Yi = η(β, xi ) + εi ,
i = 1, . . . , n,
(6)
where the Yi are responses, η(.) is an injective and twice continuously differentiable function with respect to theparameter vector β = (β1 , . . . , βp )⊤ , xi is a vector of explanatory variable values and the random errors εi ∼ SMSN(−
k ∆, π 1 2
σ 2 , λ; H ), with k1 = E {U −1/2 }, ∆ = σ δ and δ = λ/(1 + λ2 )1/2 , which corresponds to the regression model where the error distribution has mean zero and hence the regression parameters are all comparable, once from Lemma 2 given in Basso et al. (2009), we have that E [Yi ] = η(β, xi ),
where b = −
2
k π 1
Var[Yi ] = k2 σ 2 − b2 ∆2 ,
and Yi ∼ SMSN(η(β, xi ) + b∆, σ 2 , λ; H ), for i = 1, . . . , n. As recommended by Lange, Little, and Taylor
(1989) and Berkane, Kano, and Bentler (1994), who pointed out difficulties in estimating ν due to problems of unbounded and local maximum in the likelihood function, we taken the value of ν to be known. ⊤ 2 ⊤ ⊤ ∑nThe log-likelihood function for θ = (β , σ , λ) given the observed sample y = (y1 , . . . , yn ) is given by ℓ(θ) = i=1 ℓi (θ), where
ℓi (θ) = log 2 − with Ki = 1/2
∞ 0
1/2
ui
1 2
log 2π −
1 2
log σ 2 + log Ki , 1/2
exp − 12 ui di Φ (ui
Ai )dH (ui ), di = (yi − η(β, xi ) − b∆)2 /σ 2 is the Mahalanobish distance, and ∂ℓ(θ)
Ai = di λ. The score function is given by U (θ) = ∂θ = Ui (γ), for γ = β, σ 2 or λ, has the form
∑n
i=1
Ui (θ), where Ui (θ) =
∂ℓi (θ) ∂θ
= (Ui (β)⊤ , Ui (σ 2 ), Ui (λ))⊤ and
1 ∂ log σ 2 1 ∂ Ki ∂ℓi (θ) =− + , i = 1, . . . , n, ∂γ 2 ∂γ Ki ∂γ ∂d ∑ φ ∂K ∂A ∂ 2 ℓ(θ) where ∂γi = Ii (1) ∂γi − 12 IiΦ 32 ∂γi and the observed information matrix J(θ) = − = − ni=1 ∂θ∂θ⊤ ∂ 2 ℓi (θ) ,for γ, τ = β, σ 2 or λ, where given by Jγ τ = − ∂γ∂τ⊤ Ui (γ) =
(7) ∂ 2 ℓi (θ) , ∂θ∂θ⊤
have elements
∂ 2 ℓi (θ) 1 ∂ 2 log σ 2 1 ∂ Ki ∂ Ki 1 ∂ 2 Ki = − − + ∂γ∂τ⊤ 2 ∂γ∂τ⊤ Ki ∂γ∂τ⊤ Ki2 ∂γ ∂τ⊤ and
∂ 2 Ki 1 = IiΦ ⊤ ∂γ∂τ 4
2 ∂ di ∂ di 1 Φ 3 ∂ di 1 φ ∂ Ai ∂ di ∂ di ∂ Ai − I − I ( 2 ) + i 2 ∂γ ∂τ⊤ 2 2 ∂γ∂τ⊤ 2 i ∂γ ∂τ⊤ ∂γ ∂τ⊤ 2 ∂ Ai ∂ Ai ∂ Ai φ φ − Ii (2)Ai + Ii (1) , ⊤ ∂γ ∂τ ∂γ∂τ⊤ 1 1 1/2 φ 2 with IiΦ (w) = EH uw Ai ) and Ii (w) = √1 EH uw i exp − 2 ui di Φ1 (ui i exp − 2 ui (di + Ai ) . Notice that we can also 2π φ write Ki = IiΦ 21 . Attractive expressions for the quantities IiΦ (w) and Ii (w) have been given in Basso et al. (2009), which 5
can be easily implemented with low computational burden. In addition, the derivatives of di and Ai involve standard algebraic manipulations and are not given here. For readers interested in these derivatives, we refer to Cancho et al. (2009). Note that since one has an analytical expression for the observed information matrix, the Newton–Raphson method can be easily applied to get the ML estimates. In the next section we discuss a more elaborate technique to find the ML estimates of the parameters vector θ based on an EM-type algorithm. 4. Parameter estimation via the EM-algorithm
In this subsection we develop an Expectation–Maximization (EM) algorithm (Dempster, Laird, & Rubin, 1977) for maximum likelihood estimation of the parameters of SMSN–NLM. In order to do this, we first represent the SMSN–NLM in an incomplete data framework using the Lemma 2 given in Basso et al. (2009). We consider the following hierarchical representation for Yi : Yi |Ti = ti ∼ N1 (η(β, xi ) + 1ti , Ui−1 Γ ),
1 Ti |Ui ∼ TN1 (b, u− i ; (b, ∞)),
Ui ∼ H (.; ν),
(8)
118
A.M. Garay et al. / Journal of the Korean Statistical Society 40 (2011) 115–124
where Γ = (1 − δ 2 )σ 2 , ∆ = σ δ and where TN1 (r , s; (a, b)) denotes the univariate normal distribution (N (r , s)), truncated on the interval (a, b). A useful straightforward result is that the conditional distribution of Ti given yi and ui is 1 2 TN1 (µTi + b, u− i MT ; (b, ∞)), with MT2 =
Γ , ∆2 + Γ
µTi =
∆ (yi − η(β, xi ) − 1b). ∆2 + Γ
Now we proceed for the E-step of the algorithm. To represent the estimator of the parameter ξ = g (θ), we will use the general notation ξ = g ( θ), where g (·) is a generic function of θ = (β⊤ , σ 2 , λ)⊤ . Thus, let y = (y1 , . . . , yn )⊤ , t = (t1 , . . . , tn )⊤ and u = (u1 , . . . , un )⊤ . It follows that the complete log-likelihood function associated with (y, t, u) is given by n
ℓc (θ|y, t, u) = c −
2
log Γ −
n 1 −
2Γ i=1
ui (yi − η(β, xi ) − 1ti )2 ,
(9)
2 = E [U t 2 |θ = where c is a constant that is independent of θ. Letting ui = E [Ui |θ = θ, yi ], ut i = E [Ui ti |θ = θ, yi ], ut θ, yi ] i i i and using known properties of conditional expectation we obtain T ut i = ui ( µTi + b) + M τ1i , [
1/2
where τ1i = E Ui
WΦ
1/2
Ui
µTi
2 = T2 + M T ( ut ui ( µTi + b)2 + M µTi + 2b) τ1i , i
T M
] | θ, yi . In each step, the conditional expectations ui = u1i and τ1i can be easily de-
rived from the results given in Basso et al. (2009, Section 2.1). For the skew-t, skew-slash and skew contaminated normal distribution we have computationally attractive expressions that can be easily implemented (see also, Lachos, Ghosh, & Arellano-Valle, 2010). These expressions are quite useful in implementing the M-step, which consists in maximizing the expected complete data function or the Q -function over θ, given by
] n [ (k) 1 − (k) n (k) (k) (k) 2 2 2 , Q (θ|θ ) = E [ℓc (θ)|y, θ ] = c − log(Γ ) − ui (yi − η(β, xi )) − 2∆(yi − η(β, xi ))ut i + ∆ uti 2 2Γ i=1 (k)
where θ is an updated value of θ. When the M-step turns out to be analytically intractable, it can be replaced with a sequence of conditional maximization (CM) steps. The resulting procedure is known as ECM algorithm (Meng & Rubin, 1993). Next, we describe this EM-type algorithm (ECM) for maximum likelihood estimation of the parameters of the SMSN–NLM. (k)
(k)
(k)
2 E-step: Given a current estimate θ , compute ui ut i , ut i (k)
CM-step: Update θ
(k)
, for i = 1, . . . , n.
(k)
by maximizing Q (θ| θ ) over θ, which leads to the following nice expressions
(k+1) β = argmin(z(k) − η(β, x))⊤ U(k) (z(k) − η(β, x)),
(10)
β
n ∑
(k+1)
∆
=
i =1
(k)
ut i (yi − η(β(k+1) , xi )) n ∑ 2 (k)
,
(11)
uti
i=1
(k+1) = Γ
n 1 −
(k)
(k)
(k) 2 (yi − η(β(k+1) , xi ))2 ui − 2∆(k+1) (yi − η(β(k+1) , xi ))ut i + (∆2 )(k+1) ut i
n i=1 (k)
,
(12)
(k)
(k) where U(k) = diag( u1 , . . . , un ), z(k) is the corrected observed response given by z(k) = y − ∆ τ(k) , with τ(k) = ( τ1 (k) , (k)
. . . , τn (k) )⊤ , τi
(k)
(k)
= ut i / ui and η(β, x) = (η(β, x1 ), . . . , η(β, xn ))⊤ . This process is iterated until a suitable convergence (k+1) (k) rule is satisfied, e.g. if ‖ θ − θ ‖ is sufficiently small, or until some distance involving two successive evaluations of the (k+1) (k) (k+1) (k) actual log-likelihood ℓ(θ), like ‖ℓ( θ ) − ℓ( θ )‖ or ‖ℓ( θ )/ℓ( θ ) − 1‖, is small enough. An interesting observation is that the M-step to estimate β is equivalent to the weighted nonlinear least squares in the NLM, z = η(β, x) + ϵ, in which the reliable and efficient implementation of algorithms is available such as SAS, R, Ox and Matlab. Note that σ 2(k+1) √ in software (k+1) 2 2 and λ can be recovered by using the fact that λ = ∆/ Γ and σ = ∆ + Γ .
A.M. Garay et al. / Journal of the Korean Statistical Society 40 (2011) 115–124
119
4.1. Notes on implementation It is well known that maximum likelihood estimation in nonlinear models may face some computational hurdles, in the sense that the method may not give maximum global solutions if the starting values are far from the real parameter values. Thus, the choice of starting values for the EM algorithm in the non-linear context plays a big role in parameter estimation. In our example we consider the following procedure: (0) • Compute β modeling using the standard nonlinear least squares • compute the initial values ( σ 2 )(0) and λ(0) using the residuals and the method of moments estimators given in Basso et al. (2009) with M1 = 0. The range for the skewness coefficient γ1 of the SN distribution is approximately (−0.9953, 0.9953)—see Azzalini (1985). But the method of moments can produce an initial value of γ1(0) that is not in this interval. (0) (0) In this case, we use as starting points the values −0.99 (if γ1 ≤ −0.9953) or 0.99 (if γ1 ≥ 0.9953). To use these initial
values to find the ML estimates of a SN–NLM.
• We use the EM estimates of the SN–NLM as initial values for the corresponding ST–NLM, SSL–NLM and SCN–NLM parameters;
• In order to estimate ν in the ST–NLM and SSL–NLM we have fixed integer values for ν from 3 to 100 and 2 to 100 by 1, respectively, choosing the value of ν that maximizes the likelihood function. A similar procedure has been adopted for the SCN–NLM.
• If we suspect that λ = 0, then we proceed to fit the corresponding SMN distribution. The EM algorithm for the symmetrical SMN class arises as a particular case of the above iterative scheme. All we have to do is replace λ = 0 in the expressions of the E and M steps and proceed with the simplifications. Moreover, computation of the observed information matrix is accomplished by dropping the elements corresponding to λ from J(θ). Even though these procedures look reasonable for computing the starting points, the tradition in practice is to try several initial values for the EM algorithm, in order to get the highest likelihood value. It is important to note that the highest maximized likelihood is an essential ingredient of some model choice criteria such as Akaike Information Criterion (AIC). R codes are available in the address www.ime.unicamp.br/~hlachos/nlsmsn.html to implement the authors’ method. 4.2. Residuals Residual analysis aims at identifying atypical observations and/or model misspecification once residuals are measures of agreement between the data and the fitted model. Most residuals are based on the differences between the observed responses and the fitted conditional mean. We define the following standardized ordinary residual (Pearson residuals): yi − µi ri = , Var(yi )
i = 1 , . . . , n,
(yi ) = k2 where Var σ 2 − π2 k21 σ 2 δ 2 . Here, µi = η( β, xi ), and β, σ 2 and δ denoting the maximum likelihood estimators of β, σ 2 and δ , respectively. As the distribution of the standardized residual is not known, we follow the suggestion given by Atkinson (1981) to construct the simulated envelope. The simulated envelope can be used as a helpful diagnostic tool to detect incorrect specification of the error distribution and the systematic component η(β, xi ), as well as the presence of outlying observations. An oft-voiced complaint of this method is that it may very slow in some situations, once we need to generate and fit the model for a number k ≥ 100 of simulated samples. 5. Simulation study Initially we employ simulated non-normal data to illustrate the use of our proposal when the structure of the data is known. The goal of this simulation study is to compare the performance of the estimates on nonlinear models in the presence of outliers on the response variable. We performed a Monte Carlo simulation study with the following nonlinear growthcurve model: Yi = where εi
β1 1 + β2 exp(−β3 xi )
+ εi ,
i = 1, . . . , 50,
(13)
i.i.d.
∼ SMSN − π2 k1 ∆, σ 2 , λ; H are independent and identically distributed variables with mean zero. The
variable xi ranging from 4 to 53 and these values were held fixed throughout the simulations. The parameter values were set 2 around the estimates obtained in the application √β1 = 37.32 , β2 = 44.5, β3 = 0.73, σ = 2.95, λ = −2 (i.e., ∆ = 1.537). For errors, we generated 2000 samples of SN(− 2/π ∆, σ , λ). Following Vanegas and Cysneiros (2010), to guarantee the presence of one outlier we constructed Yi∗ = Yi −ϑ , where i is a central value of samples and ϑ = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. In each replication, we obtained the parameters estimates with and without outliers denoted by θˆ and θˆ (i) , respectively, under skew normal (SN–NLM), the skew-T (ST–NLM) with different values of ν = 3, 6 and 10, skew-slash (SSL–NLM) with
120
A.M. Garay et al. / Journal of the Korean Statistical Society 40 (2011) 115–124
Fig. 1. Simulated data. Average changes on estimates of β1 , β2 , β3 and σ 2 for the nonlinear growth-curve model.
ν = 2 and skew contaminated normal models (SCN–NLM) with ν = (0.2, 0.2). Then we computed the relative changes θˆ (i) − 1 on estimates of β1 , β2 , β3 and σ 2 when the outlier is removed from the dataset. θˆ Fig. 1 shows the average values of relative changes on the estimation for the 2000 replications. We can observe, that for all the models, the influence of the outliers in the estimative increases as ϑ also increases. In the ST models the influence increases when ν also increases. However, for the SMSN models with heavy tails these measure varied little, which indicates that these models are more robust than the SN–NLM in the presence of discrepant observations. Note also that, unlike the other models considered, in the ST–NLM and SSL–NLM the influence of outliers on β and σ 2 is not strictly increasing when ϑ increases. These figures shows that, although there exists an extreme observation in all models considered, this observation seems to be more influent in the skew-normal errors model, which reflects the ability of models with heavier tails than skew-normal to reduce and control the influence of outliers on parameter estimates. As suggested by one referee, we also conducted the analysis for various values of the skewness parameter, say λ = 0 and λ = 2. The relative changes of the estimates do not present a remarkable difference and not impair the behavior depicted in Fig. 1, and hence are not present here. In conclusion, heavy-tailed in a asymmetric context, are less sensitive in the presence of outliers. These results agree with similar considerations presented in Vanegas and Cysneiros (2010) in a symmetric context (λ = 0). 6. An application In this section we consider a likelihood analysis of the data set presented in Foong (1999) that describe the oil palm yield. The motivation for the use of this data set is the presence of the outliers when it is analyzed under normal and skewnormal distribution errors—see Fig. 1 in Cancho et al. (2009). We reanalyze the data set with the aim of providing additional inferences by using SMSN distributions. Assuming a nonlinear growth-curve model, we fit a NLM to the data as specified by Cancho et al. (2009) Yi =
β1 1 + β2 exp(−β3 xi )
+ εi ,
i.i.d.
εi ∼ SMSN −
2
π
k1 ∆, σ , λ; H 2
,
(14)
for i = 1, . . . , 19, where H denote the distribution function for the mixture variable Ui , for i = 1, . . . , 19. In our analysis we will assume SN, ST, SSL and SCN distributions from the SMSN class for comparative purposes. We choose the value of ν by maximizing the likelihood function as described in Section 4.1. For the ST model we found ν = 3, for the SSL we found ν = 2 and for the SCN we found ν = (0.3, 0.3). Actually, with ν = 3 and ν = 2 the variance of the skew-t and slash distributions is finite. Table 1 contains the ML estimates of the parameters from the four models, together with their corresponding standard errors calculated via the observed information matrix. The AIC model selection criterion indicate that the ST distribution present the best fit. Although the regression estimates parameters are similar in all the four fitted models (see Table 1) the standard errors of the SMSN–NLM with heavy tails are smaller than those in the SN–NLM. This suggests that the three models with longer tails than the SN model seem to produce more accurate maximum likelihood estimates. The estimates for the variance components (σ 2 and λ) are not comparable since they are on a different scale. The QQ-plots and envelopes for the Pearson residuals are shown in Fig. 2. The lines in these figures represent the 5th percentile, the mean, and the 95th percentile of 100 simulated points for each observation. This figure clearly indicates that the ST–NLM
A.M. Garay et al. / Journal of the Korean Statistical Society 40 (2011) 115–124
121
Fig. 2. Oil palm yield data set. Q–Q plots and simulated envelopes for the Pearson residuals. Table 1 ML estimation results for fitting various mixture models on the oil palm yield data set. SE are the asymptotic standard errors based on the observed information matrix. Parameter
SN–NLM
ST–NLM
SCN–NLM
SSL–NLM
Estimate
SE
Estimate
SE
Estimate
SE
Estimate
SE
β1 β2 β3 σ2 λ ν γ
37.351 44.576 0.731 6.919 −4.453 – –
0.462 17.039 0.070 2.655 3.125 – –
37.529 43.483 0.732 1.644 −1.871 3 –
0.441 10.364 0.045 1.152 1.332 – –
37.714 42.579 0.722 2.077 −2.969 0.3 0.3
0.413 11.826 0.052 1.343 1.641 – –
37.463 43.373 0.728 3.105 −3.489 2 –
0.486 14.982 0.063 1.708 2.481 – –
log-likelihood AIC
−35.03691
−33.829
80.07382
79.659
−34.132 82.265
−34.781 81.562
is more suitable for modeling the current data than the SN–NLM, SSL–NLM and SCN–NLM, since there is only one observation falling outside the envelope. Moreover, there is no evidence of lack of fit for the ST–NLM model. In order to detect outlying observations, we use the Mahalanobis distance di = (yi −η(β, xi )− b∆)2 /σ 2 , for i = 1, . . . , 19, as recommended by Osorio, Paula, and Galea (2007) and Pinheiro, Liu, and Wu (2001). In the skew-normal case, we have di ∼ χ12 . Thus, we can use as cutoff points the quantile υ = χ12 (ξ ), where 0 < ξ < 1, to identify outliers. From Zeller, Labra, Lachos, and Balakrishnan (2010), we have the following properties relating to the Mahalanobis distance: di ∼ F (1, ν) for 2ν Γ (ν+1/2) ST, Pr(di ≤ υ) = Pr(χ12 ≤ υ) − υ ν Γ (1/2) Pr(χ22ν+1 ≤ υ) for SSL, and Pr(di ≤ υ) = ν Pr(χ12 ≤ γ υ) + (1 − ν) Pr(χ12 ≤ υ) for the SCN distribution, and these may be used to identify outliers. Fig. 3 displays these distances for the four fitted models. The cutoff lines corresponds to the quantile ξ = 0.95. We can see from these figures that observations 15 and 18 appear as possible outliers in all cases.
122
A.M. Garay et al. / Journal of the Korean Statistical Society 40 (2011) 115–124
Fig. 3. Oil palm yield data set. Index plots of the Mahalanobis distances for the four fitted models.
Fig. 4. Oil palm yield data set. Estimated ui for the ST, SSL and SCN models.
In Table 2 we presented the relative changes of the estimates of the parameters when the observations 15 and 18 are deleted. It is of interest to note that larger changes occur under the SN distribution. As expected, the results show that the maximum likelihood estimators are less sensitive to the presence of atypical data when distributions with than the SN are used. In fact, when we use distributions with tails heavier than the skew-normal distribution, the EM algorithm allows to accommodate discrepant observations attributing to them small weights in the estimation procedure. The weights for the skew-normal distribution ( ui , i = 1, . . . , 19,) are indicated in Fig. 4 as a continuous line. Note from this figure that for the ST, SSL and SCN distributions ui is inversely proportional to the Mahalanobis distance. Therefore, this rich class of distributions may naturally attribute different weights to each observation and consequently control the influence of a single observation on the parameter estimates. These results agree with similar considerations, presented in Osorio et al. (2007), in a symmetric context. The robustness of the ST–NLM, SSL–NLM and SCN–NLM can be also studied through the influence of a single outlying observation on the ML estimate of β. In particular, we can assess how much the ML estimates of θ is influenced by a change
A.M. Garay et al. / Journal of the Korean Statistical Society 40 (2011) 115–124
123
Table 2 Oil palm yield data set. Relative changes in estimates (%) with dropout observations (15, 18). Model fit
β1
β2
β3
SN–NLM ST–NLM SSL–NLM SCN–NLM
1.624986 1.120948 1.576584 0.768666
11.47517 6.471933 11.08431 5.867261
2.161559 0.8630556 1.988226 0.9005396
Fig. 5. Oil palm yield data set. Relative changes on the ML estimates of β2 and β3 when fitting a SN–NLM, ST–NLM SSL–NLM and SCN–NLM for different contaminations of ∇ in the fifth observation on subject 5. Relative change = (( θ(∇) − θ)/ θ), where θ denotes the original estimate and θ(∇) the estimate for the contaminated data.
of ∇ units in a single observation Yk . We replace a single observation yk by yk (∇) = yk − ∇ , and record the relative change in the estimates (( θ (∇) − θ)/ θ ), where θ denotes the original estimate and θ (∇) the estimate for the contaminated data. In this example, we contaminated the observation on subject 5, and varied ∇ between 0 and 5 in increments of 0.4. In Fig. 5 we have presented the results under SN–NLM, ST–NLM, SSL–NLM and SCN–NLM, for β1 and β2 . As expected, the ST, SSL and SCN models are less adversely affected by variations of ∇ than the SN model. 7. Conclusions In this paper, we have proposed the application of a new class of asymmetric distributions, called SMSN distributions, to nonlinear regression models. An EM-type algorithm is developed by exploring the statistical properties of the SMSN class that can be implemented efficiently in software such as SAS, R, Ox and Matlab. The observed information matrix is derived analytically which allows direct implementation of inference on this class of models. Additionally, the simulation study reveals that the asymmetrical nonlinear regression models with heavier tails than skew-normal are more robust against extreme observations. As in the symmetric case, it is important to emphasize the capacity of such models to attenuate outlying observations, by means of attributing to these observations a small weight in the estimation process. Our analysis indicates that a skew-t nonlinear regression model with 3 degrees of freedom seems to fit the data better than the skewnormal nonlinear regression model as well as other asymmetrical nonlinear models in the sense of robustness against outlying observations. The main conclusion is that the ML estimates from the SMSN–NLM with heavy tails are more robust against outlying observations as compared to the corresponding estimates from the SN–NLM. Although the methodology can be conceptually extended when ν is unknown, we faced some methodological and computational hurdles. When ν is unknown, a more efficient ECME algorithm (Meng & van Dyk, 1997) should be used, where one can replace the usual M-step by a step that maximize the restricted actual log-likelihood function. Unfortunately, for the SSL–NLM we have encountered some difficulties (very slow convergence) once it involves an integral in the marginal likelihood (M-step), which complicates the practical use of the proposal. On the other hand, due to recent advances in computational technology, it is worthwhile carrying out Bayesian treatments via Markov chain Monte Carlo (MCMC) sampling methods in the context of SMSN–NLM. Other extensions of the current work include, for example, the implementation of some diagnostic procedures to SMSN–NLMs as those presented in Galea, Paula, and Cysneiros (2005) and Vanegas and Cysneiros (2010). Acknowledgements This work was supported by FAPESP-Brazil and CNPq- Brazil. The authors are very grateful to two Referees and the Associate Editor for helpful constructive comments and suggestions.
124
A.M. Garay et al. / Journal of the Korean Statistical Society 40 (2011) 115–124
References Atkinson, A. (1981). Two graphical displays for outlying and influential observations in regression. Biometrika, 68, 13. Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics, 12, 171–178. Azzalini, A., & Genton, M. (2008). Robust likelihood methods based on the skew-t and related distributions. International Statistical Review, 76, 1490–1507. Basso, R. M., Lachos, V. H., Cabral, C. R., & Ghosh, P. (2009). Robust mixture modeling based on scale mixtures of skew-normal distributions. Computational Statistics & Data Analysis, doi:10.1016/j.csda.2009.09.031. Berkane, M., Kano, Y., & Bentler, P. M. (1994). Pseudo maximum likelihood estimation in elliptical theory: effects of misspecification. Computational Statistics & Data Analysis, 18, 255–267. Branco, M. D., & Dey, D. K. (2001). A general class of multivariate skew-elliptical distributions. Journal of Multivariate Analysis, 79, 99–113. Cancho, V. C., Lachos, V. H., & Ortega, E. M. M. (2009). A nonlinear regression model with skew-normal errors. Statistical Papers, doi:10.1007/s00362-0080139-y. Cysneiros, F. J. A., & Vanegas, L. H. (2008). Residuals and their statistical properties in symmetrical nonlinear models. Statistics & Probability Letters, 78, 3269–3273. Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1–38. Foong, F. S. (1999). Impact of mixture on potential evapotranspiration, growth and yield of palm oil. In PORIM Interl. Palm Oil Cong. (Agric.) (pp. 265–287). Galea, M., Paula, G., & Cysneiros, F. (2005). On diagnostics in symmetrical nonlinear models. Statistics & Probability Letters, 73(4), 459–467. Lachos, V. H., Ghosh, P., & Arellano-Valle, R. B. (2010). Likelihood based inference for skew-normal independent linear mixed models. Statistica Sinica, 20, 303–322. Lange, K. L., Little, R., & Taylor, J. (1989). Robust statistical modeling using t distribution. Journal of the American Statistical Association, 84, 881–896. Lange, K. L., & Sinsheimer, J. S. (1993). Normal/independent distributions and their applications in robust regression. Journal of Computational and Graphical Statistics, 2, 175–198. Meng, X., & Rubin, D. B. (1993). Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika, 81, 633–648. Meng, X., & van Dyk, D. (1997). The EM algorithm—an old folk-song sung to a fast new tune. Journal of the Royal Statistical Society. Series B, 59(3), 511–567. Osorio, F., Paula, G. A., & Galea, M. (2007). Assessment of local influence in elliptical linear models with longitudinal structure. Computational Statistics and Data Analysis, 51, 4354–4368. Pinheiro, J. C., Liu, C. H., & Wu, Y. N. (2001). Efficient algorithms for robust estimation in linear mixed-effects models using a multivariate t-distribution. Journal of Computational and Graphical Statistics, 10, 249–276. Vanegas, L. H., & Cysneiros, F. J. A. (2010). Assesment of diagnostic procedures in symmetrical nonlinear regression models. Computational Statistics & Data Analysis, 54, 1002–1016. Xie, F. C., Lin, J. G., & Wei, B. C. (2009). Diagnostics for skew-normal nonlinear regression models with ar(1) errors. Computational Statistics & Data Analysis, 53, 4403–4416. Xie, F. C., Wei, B. C., & Lin, J. G. (2009). Homogeneity diagnostics for skew-normal nonlinear regression models. Statistics & Probability Letters, 79, 821–827. Zeller, C. B., Labra, F. V., Lachos, V. H., & Balakrishnan, N. (2010). Influence analyses of skew-normal/independent linear mixed models. Computational Statistics & Data Analysis, 54, 1266–1280.