Nonlinear regression models based on the normal mean–variance mixture of Birnbaum–Saunders distribution

Mehrdad Naderi a,b, Alireza Arabpour a, Tsung-I Lin c,d,∗, Ahad Jamalizadeh a,∗∗

a Department of Statistics, Faculty of Mathematics and Computer, Shahid Bahonar University of Kerman, Kerman, Iran
b Young Researchers Society, Shahid Bahonar University of Kerman, Kerman, Iran
c Institute of Statistics, National Chung Hsing University, Taichung 402, Taiwan
d Department of Public Health, China Medical University, Taichung 404, Taiwan

Article info

Article history: Received 10 December 2016; Accepted 21 February 2017.
AMS 2000 subject classifications: 62F10; 62J02.

Abstract

This paper presents a new extension of nonlinear regression models constructed by assuming the normal mean–variance mixture of Birnbaum–Saunders distribution for the unobserved error terms. An analytically tractable EM-type algorithm is developed for computing maximum likelihood estimates. The observed information matrix is derived for obtaining the asymptotic standard errors of parameter estimates. The practical utility of the methodology is illustrated through both simulated and real data sets.

© 2017 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.

Keywords: Birnbaum–Saunders distribution; ECME algorithm; GIG distribution; Nonlinear regression

1. Introduction

The normal nonlinear regression model (N-NLM), which allows an arbitrary mean function for the response variable, has been widely used to investigate complex relationships among random phenomena under study. However, the N-NLM is often criticized for its lack of stability and robustness against atypical observations exhibiting non-normal features such as asymmetry and heavy tails (Cancho, Dey, Lachos, & Andrade, 2011; Contreras-Reyes & Arellano-Valle, 2013; Garay, Lachos, & Abanto-Valle, 2011; Jamalizadeh & Lin, 2016). Moreover, it may produce misleading inferential results when the underlying normality assumption is violated. To overcome this potential deficiency, there has been growing interest in seeking robust extensions of the N-NLM based on more general families of skewed distributions. For instance, Cancho, Lachos, and Ortega (2010) introduced the skew-normal nonlinear regression model (SN-NLM) to accommodate asymmetric errors. Unfortunately, the SN-NLM is still sensitive to the presence of extreme outlying observations. To enhance the resistance against outliers, Garay et al. (2011) extended the SN-NLM using the scale mixtures of skew-normal (SMSN) distributions, called the SMSN-NLM, which consists of the skew-t (ST-NLM), skew-slash (SSL-NLM) and skew contaminated normal (SCN-NLM) as base members and includes the SN-NLM as a special case. However, the shape parameter and degrees of freedom (say ν and γ) in the ST-NLM and SCN-NLM cannot be estimated directly because the likelihood function is unbounded in these parameters. Instead, ν and γ are fixed at the values that maximize the associated profile likelihood function, as recommended by Lange and

∗ Corresponding author at: Institute of Statistics, National Chung Hsing University, Taichung 402, Taiwan.
∗∗ Corresponding author.
E-mail addresses: [email protected] (T.-I. Lin), [email protected] (A. Jamalizadeh).

http://dx.doi.org/10.1016/j.jkss.2017.02.002


Sinsheimer (1993). Recently, Ferreira and Lachos (2016) proposed an extension of nonlinear regression models using the skew-scale mixtures of normal (SSMN) distributions introduced by Ferreira, Bolfarine, and Lachos (2011). Their proposed nonlinear regression model, called the SSMN-NLM, contains two further nonlinear regression models based on the skew-t-normal (STN) and skew power exponential (SPE) distributions, referred to as the STN-NLM and SPE-NLM, respectively. Pourmousa, Jamalizadeh, and Rezapour (2015) introduced the normal mean–variance mixture of Birnbaum–Saunders (NMVBS) distribution and showed that it can be considered a good alternative to existing skewed distributions for robust inference in many types of models.

The main objective of this paper is to propose an alternative robust extension of the N-NLM with errors following a mean-zero NMVBS distribution. This new model will be referred to as the NMVBS-NLM henceforth. In our estimation procedure, unlike the SMSN-NLM, all parameters of the NMVBS-NLM can be obtained explicitly without resorting to additional ad-hoc steps. Furthermore, we offer an information-based approach to approximating the variance–covariance matrix of the maximum likelihood (ML) estimators.

The paper unfolds as follows. In Section 2, we briefly review the NMVBS distribution and study some related properties. In Section 3, we develop an EM-type algorithm for carrying out ML estimation and provide a general information-based method for obtaining the asymptotic standard errors of the ML estimates. Section 4 gives an application to real data. Two simulation studies are carried out in Section 5 to examine the robustness of the model and to study the finite-sample properties of the proposed EM-based estimators. We end the paper with a short summary in Section 6.

2. The NMVBS distribution

A random variable X is said to have a univariate normal mean–variance mixture distribution if it admits the stochastic representation

X = µ + Wλ + W^{1/2} Z,   (1)

where µ, λ ∈ R, Z ∼ N(0, σ²), and W is an independent non-negative random variable with distribution function H(·; θ), parameterized by a vector of parameters θ. Based on representation (1), the generalized hyperbolic (GH) distribution (McNeil, Frey, & Embrechts, 2005) arises by taking the generalized inverse Gaussian (GIG) distribution (Barndorff-Nielsen & Halgreen, 1977; Good, 1953) as the mixing variable. The GIG distribution is widely used for modeling and analyzing lifetime data. We write W ∼ GIG(κ, χ, ψ) to indicate that the random variable W follows the GIG distribution with probability density function (pdf)

f_GIG(w; κ, χ, ψ) = (ψ/χ)^{κ/2} w^{κ−1} / (2K_κ(√(ψχ))) · exp{ −(1/2)(χw^{−1} + ψw) },   w > 0,   (2)

where K_κ(·) denotes the modified Bessel function of the third kind, κ ∈ R, and χ and ψ are parameters that satisfy χ ≥ 0, ψ > 0 if κ > 0; χ > 0, ψ ≥ 0 if κ < 0; and χ > 0, ψ > 0 if κ = 0. If W in (1) has a GIG distribution, then X follows the GH distribution with pdf

f_GH(x; µ, λ, σ², κ, χ, ψ) = C · K_{κ−0.5}(q(x)) / q(x)^{0.5−κ} · e^{(x−µ)λ/σ²},   x ∈ R,

where q(x) = √{ (ψ + λ²/σ²)(χ + ((x − µ)/σ)²) } and

C = (ψ/χ)^{κ/2} (ψ + λ²/σ²)^{0.5−κ} / ( √(2πσ²) K_κ(√(ψχ)) )

is the normalizing constant. Using the law of iterated expectations, the mean and variance of X are, respectively,

E[X] = µ + λ (χ/ψ)^{1/2} R_{(κ,1)}(√(χψ)),

Var(X) = (χ/ψ)^{1/2} R_{(κ,1)}(√(χψ)) σ² + λ² [ (χ/ψ) R_{(κ,2)}(√(χψ)) − ( (χ/ψ)^{1/2} R_{(κ,1)}(√(χψ)) )² ],

where R_{(κ,a)}(c) = K_{κ+a}(c)/K_κ(c).

The Birnbaum–Saunders (BS) distribution was originally introduced by Birnbaum and Saunders (1969) to model failure times and has found numerous applications over the last three decades. A random variable W follows the BS distribution, denoted by W ∼ BS(α, β), if its cumulative distribution function (cdf) is

F(w; α, β) = Φ( (1/α) ( √(w/β) − √(β/w) ) ),   w > 0, α > 0, β > 0,   (3)


Fig. 1. The density plots for the NMVBS distribution.

where Φ(·) is the cdf of N(0, 1), and α and β are the shape and scale parameters, respectively. In general, a BS random variable can be generated through the stochastic representation

W =_d (β/4) ( αZ + √( (αZ)² + 4 ) )²,   (4)

where Z ∼ N(0, 1). This aspect of the BS model renders it an alternative to the normal model for data with non-negative support and positive skewness. Moreover, the cdf of W in (3) can be formulated as a mixture of two GIG distributions (Desmond, 1986). Specifically,

F(w; α, β) = (1/2) F_GIG( w; 1/2, β/α², 1/(βα²) ) + (1/2) F_GIG( w; −1/2, β/α², 1/(βα²) ).   (5)
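Representation (4) also gives a direct recipe for simulating BS variates from standard normal draws. The following is a minimal Python sketch of this generator (the authors' own implementation is in R, per the Acknowledgments; the function name rbs is ours):

import numpy as np

def rbs(n, alpha, beta, seed=None):
    """Simulate W ~ BS(alpha, beta) via representation (4): W = (beta/4)*(alpha*Z + sqrt((alpha*Z)^2 + 4))^2."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    return (beta / 4.0) * (alpha * z + np.sqrt((alpha * z) ** 2 + 4.0)) ** 2

# Quick check against E(W) = beta*(1 + alpha^2/2), the constant b used in Section 3.
w = rbs(200000, alpha=0.5, beta=1.0, seed=1)
print(w.mean())  # should be close to 1.125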

Consider W ∼ BS(α, 1) as the mixing variable in (1). Notice that the scale parameter β of the BS distribution is fixed at unity to avoid an identifiability problem. This gives rise to the NMVBS distribution introduced by Pourmousa et al. (2015), denoted by X ∼ NMVBS(µ, λ, σ², α), whose pdf is

f_NMVBS(x; µ, λ, σ², α) = (1/2) f_GH( x; µ, λ, σ², 1/2, 1/α², 1/α² ) + (1/2) f_GH( x; µ, λ, σ², −1/2, 1/α², 1/α² ).   (6)
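For concreteness, the densities (2) and (6), together with the GH pdf above, can be evaluated in a few lines of code. The sketch below is ours rather than the authors' R implementation; it relies on scipy.special.kv for the modified Bessel function of the third kind K_κ, and all function names are our own:

import numpy as np
from scipy.special import kv

def dgig(w, kappa, chi, psi):
    """GIG pdf (2)."""
    norm = (psi / chi) ** (kappa / 2.0) / (2.0 * kv(kappa, np.sqrt(psi * chi)))
    return norm * w ** (kappa - 1.0) * np.exp(-0.5 * (chi / w + psi * w))

def dgh(x, mu, lam, sig2, kappa, chi, psi):
    """GH pdf of X = mu + W*lam + sqrt(W)*Z with Z ~ N(0, sig2) and W ~ GIG(kappa, chi, psi)."""
    q = np.sqrt((psi + lam ** 2 / sig2) * (chi + (x - mu) ** 2 / sig2))
    C = ((psi / chi) ** (kappa / 2.0) * (psi + lam ** 2 / sig2) ** (0.5 - kappa)
         / (np.sqrt(2.0 * np.pi * sig2) * kv(kappa, np.sqrt(psi * chi))))
    return C * kv(kappa - 0.5, q) / q ** (0.5 - kappa) * np.exp((x - mu) * lam / sig2)

def dnmvbs(x, mu, lam, sig2, alpha):
    """NMVBS pdf (6): equal-weight mixture of two GH densities with kappa = +1/2 and -1/2."""
    a = alpha ** -2.0
    return 0.5 * (dgh(x, mu, lam, sig2, 0.5, a, a) + dgh(x, mu, lam, sig2, -0.5, a, a))

# Sanity check: the density should integrate to (approximately) one.
xs = np.linspace(-30.0, 40.0, 140001)
print(np.sum(dnmvbs(xs, 0.0, 1.0, 1.0, 1.0)) * (xs[1] - xs[0]))  # ~ 1.0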

The skewness and kurtosis of X are given by

γ_x = ( µ₃ − 3µ₁µ₂ + 2µ₁³ ) / ( µ₂ − µ₁² )^{1.5}   and   κ_x = ( µ₄ − 4µ₁µ₃ + 6µ₁²µ₂ − 3µ₁⁴ ) / ( µ₂ − µ₁² )² − 3,

where, under the standardization µ = 0 and σ² = 1,

µ₁ = E(X) = λ(1 + 0.5α²),
µ₂ = E(X²) = 1 + λ²(1 + 1.5α⁴) + α²(0.5 + 2λ²),
µ₃ = E(X³) = λ³(1 + 7.5α⁶) + λ[ 3 + α²( α²(4.5 + 9λ²) + 6 + 4.5λ² ) ],
µ₄ = E(X⁴) = λ⁴(1 + 52.5α⁸) + 3(1 + 2α² + 1.5α⁴) + λ²[ 6 + α²(27 + 8λ²) + α⁴(54 + 30λ²) + α⁶(45 + 57.5λ²) ].

Fig. 1 displays the density curves of (6) for µ = 0, σ = 1 and various values of λ and α. It is clearly seen that the NMVBS distribution can produce very strong skewness and considerably heavier tails than the normal distribution. Table 1 presents the maximum ranges of skewness and kurtosis of the NMVBS distribution with respect to λ for different values of α. As α increases and λ approaches zero, the NMVBS distribution tends to the symmetric class of scale mixtures of normal distributions based on the BS distribution. On the other hand, the NMVBS distribution tends to N(µ + λ, σ²) as α approaches zero. Moreover, it can be observed from Table 1 that the NMVBS distribution covers wider ranges of skewness and kurtosis than the ST and STN distributions (Ho, Pyne, & Lin, 2012; Lin, Ho, & Lee, 2014).
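The moment expressions above make it straightforward to trace the skewness and kurtosis ranges reported in Table 1 numerically; a short sketch under the standardization µ = 0 and σ² = 1 (the function name is ours):

import numpy as np

def nmvbs_shape(lam, alpha):
    """Skewness and excess kurtosis of NMVBS(0, lam, 1, alpha) from the raw moments above."""
    a2, a4, a6, a8 = alpha ** 2, alpha ** 4, alpha ** 6, alpha ** 8
    m1 = lam * (1 + 0.5 * a2)
    m2 = 1 + lam ** 2 * (1 + 1.5 * a4) + a2 * (0.5 + 2 * lam ** 2)
    m3 = lam ** 3 * (1 + 7.5 * a6) + lam * (3 + a2 * (a2 * (4.5 + 9 * lam ** 2) + 6 + 4.5 * lam ** 2))
    m4 = (lam ** 4 * (1 + 52.5 * a8) + 3 * (1 + 2 * a2 + 1.5 * a4)
          + lam ** 2 * (6 + a2 * (27 + 8 * lam ** 2) + a4 * (54 + 30 * lam ** 2) + a6 * (45 + 57.5 * lam ** 2)))
    var = m2 - m1 ** 2
    skew = (m3 - 3 * m1 * m2 + 2 * m1 ** 3) / var ** 1.5
    kurt = (m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4) / var ** 2 - 3
    return skew, kurt

# Scanning lam for a fixed alpha approaches the bounds in Table 1, e.g. for alpha = 1:
print(max(nmvbs_shape(l, 1.0)[0] for l in np.linspace(-20, 20, 4001)))  # close to 2.519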


Table 1
The ranges of skewness and kurtosis for different values of α.

α        Skewness             Kurtosis
0.01     (−0.029, 0.029)      (0.001, 0.002)
0.5      (−1.455, 1.455)      (0.778, 3.079)
1        (−2.519, 2.519)      (3.000, 9.358)
5        (−3.879, 3.879)      (13.273, 21.883)
10       (−3.980, 3.980)      (14.539, 22.990)
15       (−3.999, 3.999)      (14.802, 23.213)
20       (−4.007, 4.007)      (14.904, 23.292)
30       (−4.012, 4.012)      (14.998, 23.349)
50       (−4.014, 4.014)      (15.122, 23.379)
100      (−4.016, 4.016)      (15.542, 23.388)

Consequently, we establish the following proposition, which is useful for the calculation of some conditional expectations involved in the proposed EM-type algorithm discussed in the next section.

Proposition 1. Let X ∼ NMVBS(µ, λ, σ², α) and W ∼ BS(α, 1). By Bayes' rule, we have

f_{W|X=x}(w; x, µ, λ, σ², α) = p(x) f_GIG( w; 0, χ(x, µ, σ², α), ψ(λ, σ², α) ) + (1 − p(x)) f_GIG( w; −1, χ(x, µ, σ², α), ψ(λ, σ², α) ),   w > 0,

where χ(x, µ, σ², α) = ((x − µ)/σ)² + α^{−2}, ψ(λ, σ², α) = (λ/σ)² + α^{−2} and

p(x) = f_GH(x; µ, λ, σ², 0.5, α^{−2}, α^{−2}) / [ f_GH(x; µ, λ, σ², 0.5, α^{−2}, α^{−2}) + f_GH(x; µ, λ, σ², −0.5, α^{−2}, α^{−2}) ].

Also,

E(W^r | X = x) = ( χ(x, µ, σ², α)/ψ(λ, σ², α) )^{r/2} [ p(x) R_{(0,r)}( √(χ(x, µ, σ², α) ψ(λ, σ², α)) ) + (1 − p(x)) R_{(−1,r)}( √(χ(x, µ, σ², α) ψ(λ, σ², α)) ) ],   r = ±1, ±2, . . . .   (7)
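Expression (7) is the workhorse of the E-step in Section 3, where it supplies ŵ_i = E(W_i | y_i) and v̂_i = E(W_i^{−1} | y_i). A Python sketch (ours, not the authors' R code; dgh is repeated from the earlier sketch so this block runs standalone):

import numpy as np
from scipy.special import kv

def dgh(x, mu, lam, sig2, kappa, chi, psi):
    q = np.sqrt((psi + lam ** 2 / sig2) * (chi + (x - mu) ** 2 / sig2))
    C = ((psi / chi) ** (kappa / 2.0) * (psi + lam ** 2 / sig2) ** (0.5 - kappa)
         / (np.sqrt(2.0 * np.pi * sig2) * kv(kappa, np.sqrt(psi * chi))))
    return C * kv(kappa - 0.5, q) / q ** (0.5 - kappa) * np.exp((x - mu) * lam / sig2)

def cond_moment_w(x, mu, lam, sig2, alpha, r):
    """E(W^r | X = x) via (7), with R_(kappa,a)(c) = K_(kappa+a)(c) / K_kappa(c)."""
    chi = (x - mu) ** 2 / sig2 + alpha ** -2.0   # chi(x, mu, sig2, alpha)
    psi = lam ** 2 / sig2 + alpha ** -2.0        # psi(lam, sig2, alpha)
    f_pos = dgh(x, mu, lam, sig2, 0.5, alpha ** -2.0, alpha ** -2.0)
    f_neg = dgh(x, mu, lam, sig2, -0.5, alpha ** -2.0, alpha ** -2.0)
    p = f_pos / (f_pos + f_neg)                  # p(x) in Proposition 1
    c = np.sqrt(chi * psi)
    R = lambda k: kv(k + r, c) / kv(k, c)
    return (chi / psi) ** (r / 2.0) * (p * R(0.0) + (1.0 - p) * R(-1.0))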

3. Model and estimation

The nonlinear regression model based on the NMVBS distribution, called the NMVBS-NLM hereafter, is defined as

Y_i = η(β, x_i) + ε_i,   ε_i ∼ iid NMVBS(−bλ, λ, σ², α),   i = 1, . . . , n,   (8)

where Y = (Y₁, . . . , Yₙ) are the response variables, η is an injective function that is twice continuously differentiable with respect to β = (β₁, . . . , β_p)^⊤, and x_i is a vector of explanatory variable values corresponding to Y_i. In model (8), we choose b = E(W) = 1 + 0.5α² such that E(ε_i) = 0 and the regression coefficients are comparable to those of the N-NLM. It can be verified that Y_i ∼ ind NMVBS(η(β, x_i) − bλ, λ, σ², α) with E[Y_i] = η(β, x_i) and

var(Y_i) = (1 + 0.5α²) σ² + (1 + (5/4)α²) λ² α².

Alternatively, the NMVBS-NLM can be formulated through the two-level hierarchical representation

Y_i | W_i = w_i ∼ N( η(β, x_i) − bλ + w_i λ, w_i σ² ),   W_i ∼ BS(α, 1).   (9)

The complete-data log-likelihood function for θ = (β, λ, σ², α) associated with the observed responses y = (y₁, . . . , yₙ) and hidden variables w = (w₁, . . . , wₙ), omitting additive constants, is

ℓ_c(θ | y, w) = −n log α − (n/2) log σ² − Σ_{i=1}^{n} (w_i − 1)²/(2α² w_i) − (1/(2σ²)) Σ_{i=1}^{n} [ (y_i − η(β, x_i) + bλ)²/w_i − 2λ( y_i − η(β, x_i) + bλ ) + w_i λ² ].   (10)


To compute the ML estimates of the unknown parameters involved in (10), we adopt the Expectation Conditional Maximization Either (ECME) algorithm proposed by Liu and Rubin (1994), which is a variant of the EM algorithm (Dempster, Laird, & Rubin, 1977) and a simple extension of the ECM algorithm (Meng & Rubin, 1993), with the maximization (M) step of EM replaced by a sequence of computationally simpler conditional maximization (CM) steps that maximize either the Q-function or the corresponding constrained actual likelihood function. The ECME algorithm for ML estimation of the NMVBS-NLM proceeds as follows:

• E-step: At iteration k, compute the Q-function, defined as the expected value of the complete-data log-likelihood (10) with θ evaluated at θ̂^{(k)}:

Q(θ | θ̂^{(k)}) = −n log α − (n/2) log σ² − Σ_{i=1}^{n} Â_i^{(k)}/(2α²) − (1/(2σ²)) Σ_{i=1}^{n} [ v̂_i^{(k)} ( y_i − η(β, x_i) + bλ )² − 2λ( y_i − η(β, x_i) + bλ ) + ŵ_i^{(k)} λ² ],   (11)

where Â_i^{(k)} = ŵ_i^{(k)} + v̂_i^{(k)} − 2, with ŵ_i^{(k)} = E(W_i | y_i, θ̂^{(k)}) and v̂_i^{(k)} = E(W_i^{−1} | y_i, θ̂^{(k)}) calculated by using (7) in Proposition 1.

• CM-steps: To update θ̂^{(k)}, suppose first that η(β, x) is a linear function, i.e., η(β, x) = x^⊤β. Maximizing (11) over θ leads to the following CM estimators:

β̂^{(k+1)} = ( Σ_{i=1}^{n} v̂_i^{(k)} x_i x_i^⊤ )^{−1} Σ_{i=1}^{n} ( v̂_i^{(k)} y_i − λ̂^{(k)} Û_{1i}^{(k)} ) x_i,

λ̂^{(k+1)} = Σ_{i=1}^{n} Û_{1i}^{(k)} ( y_i − η(β̂^{(k+1)}, x_i) ) / Σ_{i=1}^{n} Û_{2i}^{(k)},

σ̂^{2(k+1)} = (1/n) Σ_{i=1}^{n} [ v̂_i^{(k)} ( y_i − η(β̂^{(k+1)}, x_i) )² − 2 Û_{1i}^{(k)} λ̂^{(k+1)} ( y_i − η(β̂^{(k+1)}, x_i) ) + Û_{2i}^{(k)} λ̂^{2(k+1)} ],

where Û_{1i}^{(k)} = 1 − b̂^{(k)} v̂_i^{(k)}, Û_{2i}^{(k)} = b̂^{2(k)} v̂_i^{(k)} − 2b̂^{(k)} + ŵ_i^{(k)}, and b̂^{(k)} = 1 + 0.5α̂^{2(k)}. Since there is no closed-form solution for α̂^{(k+1)}, we update α by maximizing the constrained actual observed log-likelihood function, namely

α̂^{(k+1)} = arg max_α ℓ_obs( α, β̂^{(k+1)}, λ̂^{(k+1)}, σ̂^{2(k+1)} ),

where

ℓ_obs( α, β̂^{(k+1)}, λ̂^{(k+1)}, σ̂^{2(k+1)} ) = Σ_{i=1}^{n} log f_NMVBS( y_i; x_i^⊤β̂^{(k+1)} − (1 + 0.5α²)λ̂^{(k+1)}, λ̂^{(k+1)}, σ̂^{2(k+1)}, α ).

A schematic sketch of one complete iteration is given below.

For an arbitrary nonlinear regression function η(β, x), we use a first-order Taylor expansion of model (8) around β̂^{(k)}:

y_i = η(β̂^{(k)}, x_i) + η′(β̂^{(k)}, x_i)( β − β̂^{(k)} ) + ε_i,

where η′(β̂^{(k)}, x_i) is the first partial derivative of η(β, x_i) with respect to β evaluated at β = β̂^{(k)}. It follows that

y_i − η(β̂^{(k)}, x_i) + η′(β̂^{(k)}, x_i) β̂^{(k)} = η′(β̂^{(k)}, x_i) β + ε_i.

Define the pseudo responses as ỹ_i = y_i − η(β̂^{(k)}, x_i) + η′(β̂^{(k)}, x_i) β̂^{(k)} and the pseudo covariates as x̃_i = η′(β̂^{(k)}, x_i). As such, the regression coefficients β can be estimated in the same manner as described in the above CM-steps; a sketch of this linearization step follows.

The above procedure is iterated until a suitable convergence rule is satisfied, e.g., ℓ(θ̂^{(k+1)}) − ℓ(θ̂^{(k)}) is less than a user-specified tolerance, where ℓ(θ̂) = Σ_{j=1}^{n} log f_NMVBS( y_j; η(β̂, x_j) − b̂λ̂, λ̂, σ̂², α̂ ) is the observed log-likelihood evaluated at θ̂.
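A sketch of this linearization step, taking the inverted S-shaped curve η(β, x) = β₁/(1 + β₂e^{−β₃x}) fitted in Section 4 as a concrete example (the gradient is worked out by hand, and the function names are ours):

import numpy as np

def eta(beta, x):
    """Growth curve (14): beta1 / (1 + beta2 * exp(-beta3 * x))."""
    b1, b2, b3 = beta
    return b1 / (1.0 + b2 * np.exp(-b3 * x))

def eta_grad(beta, x):
    """Jacobian of eta with respect to beta, an n x 3 matrix."""
    b1, b2, b3 = beta
    e = np.exp(-b3 * x)
    d = 1.0 + b2 * e
    return np.column_stack([1.0 / d, -b1 * e / d ** 2, b1 * b2 * x * e / d ** 2])

def pseudo_data(y, x, beta):
    """Pseudo responses y~ = y - eta(beta, x) + eta'(beta, x) beta and pseudo covariates x~ = eta'(beta, x)."""
    G = eta_grad(beta, x)
    return y - eta(beta, x) + G @ beta, G

The pairs (ỹ_i, x̃_i) then play the roles of (y_i, x_i) in the linear CM-steps above.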

To avoid mistaking a temporary lack of progress of the algorithm for convergence (McNicholas, Murphy, McDaid, & Frost, 2010), we recommend adopting the Aitken acceleration method (Aitken, 1926) as the stopping criterion. At iteration k, we first compute the Aitken acceleration factor

a^{(k)} = ( ℓ(θ̂^{(k+1)}) − ℓ(θ̂^{(k)}) ) / ( ℓ(θ̂^{(k)}) − ℓ(θ̂^{(k−1)}) ).


Following Böhning, Dietz, Schaub, Schlattmann, and Lindsay (1994), the asymptotic estimate of the log-likelihood at iteration k + 1 is given by

ℓ_∞(θ̂^{(k+1)}) = ℓ(θ̂^{(k+1)}) + ( 1/(1 − a^{(k)}) ) ( ℓ(θ̂^{(k+1)}) − ℓ(θ̂^{(k)}) ).

As pointed out by Lindsay (1995), the algorithm can be considered to have reached convergence when ℓ_∞(θ̂^{(k+1)}) − ℓ(θ̂^{(k)}) < ε. In our study, the tolerance ε is set to 10^{−5}.

To compute the asymptotic covariance matrix of the ML estimates, denoted by θ̂ = (β̂, λ̂, σ̂², α̂), we employ the information-based method suggested by Meilijson (1989). Formally, the empirical information matrix is defined as

I_e(θ | y) = Σ_{j=1}^{n} s(y_j | θ) s^⊤(y_j | θ) − (1/n) S(y | θ) S^⊤(y | θ),   (12)

where S(y | θ) = Σ_{j=1}^{n} s(y_j | θ), and the individual scores s(y_j | θ) can be determined from the result of Louis (1982) as

s(y_j | θ) = ∂ log f(y_j | θ)/∂θ = E[ ∂ℓ_c(θ | y_j, w_j)/∂θ | y_j, θ ],

where ℓ_c(θ | y_j, w_j) is the individual complete-data log-likelihood defined in (10) formed by the single observation y_j and mixing variable w_j.

Substituting the ML estimates θ̂ = (β̂, λ̂, σ̂², α̂) into (12) gives

I_e(θ̂ | y) = Σ_{j=1}^{n} ŝ_j ŝ_j^⊤,   (13)

where ŝ_j = (ŝ_{j,β₁}, . . . , ŝ_{j,β_p}, ŝ_{j,λ}, ŝ_{j,σ²}, ŝ_{j,α})^⊤. The explicit expressions for the elements of ŝ_j are summarized as follows:

ŝ_{j,α} = Â_j/α̂³ − 1/α̂ − (1/σ̂²) [ v̂_j α̂λ̂( y_j − η(β̂, x_j) + b̂λ̂ ) − α̂λ̂² ],

ŝ_{j,β_r} = (1/σ̂²) [ v̂_j η′_r(β̂, x_j)( y_j − η(β̂, x_j) + b̂λ̂ ) − λ̂ η′_r(β̂, x_j) ],   r = 1, . . . , p,

ŝ_{j,λ} = −(1/σ̂²) [ b̂ v̂_j ( y_j − η(β̂, x_j) + b̂λ̂ ) − ( y_j − η(β̂, x_j) ) − 2b̂λ̂ + ŵ_j λ̂ ],

ŝ_{j,σ²} = (1/(2σ̂⁴)) [ v̂_j ( y_j − η(β̂, x_j) + b̂λ̂ )² − 2λ̂( y_j − η(β̂, x_j) + b̂λ̂ ) + ŵ_j λ̂² ] − 1/(2σ̂²),

where η′(β̂, x_j) is the first partial derivative of η(β, x_j) with respect to β evaluated at β = β̂. As a result, the standard errors of θ̂ are obtained as the square roots of the diagonal elements of the inverse of (13).

4. An application to real data

As an illustration, we apply the proposed method to the oil palm data studied previously by Foong (1999) and described in Table 1 of Cancho et al. (2010), who considered skew-normal errors in nonlinear regression models. Ferreira and Lachos (2016) analyzed the same data set using the SSMN family of distributions and showed that the STN-NLM has the best fitting performance. We follow Cancho et al. (2010) in adopting the following three-parameter nonlinear regression model with an inverted S-shaped curve:

Y_i = β₁ / ( 1 + β₂ exp{−β₃ x_i} ) + ε_i,   (14)

where ε_i ∼ iid NMVBS(−bλ, λ, σ², α) with E(ε_i) = 0 for i = 1, . . . , 19. We perform the ECME algorithm with the pseudo responses (ỹ_i) and covariates (x̃_i) described in Section 3 to carry out ML estimation for model (14). The respective SN-NLM, ST-NLM, SCN-NLM and SSL-NLM considered in Garay et al. (2011) are also fitted for the sake of model comparison. The competing models are compared using the AIC and BIC, defined as

AIC = 2m − 2ℓ_max   and   BIC = m log n − 2ℓ_max,

where m is the number of free parameters and ℓ_max is the maximized log-likelihood value. Models with lower values of AIC or BIC are preferred.

Fig. 2. QQ-plots and simulated envelopes for the Pearson residuals of the oil palm yield data under various nonlinear regression models.

Table 2 shows the ML results obtained by fitting the five considered models. We follow the same specification as Garay et al. (2011) when implementing the ST-NLM (ν = 3) to fit the data. Results based on AIC and BIC indicate that the NMVBS-NLM provides a substantially improved fit to the data over the SMSN-NLM. Moreover, we found that the NMVBS-NLM yields noticeably

smaller standard errors for the estimated regression coefficients, indicating that the NMVBS distribution produces more precise estimates for this data example.

To verify the validity of the model and to identify atypical observations, we performed a residual analysis. In general, residuals are defined as the differences between the observed values of the response variable and the fitted conditional mean. The standardized ordinary residuals, namely the Pearson residuals, are defined as

r_i = ( y_i − µ̂_i ) / √( V̂ar(Y_i) ),   i = 1, . . . , 19,

where µ̂_i = η(β̂, x_i) − b̂λ̂ and V̂ar(Y_i) = (1 + 0.5α̂²) σ̂² + (1 + 1.25α̂²) λ̂² α̂².

Atkinson (1981) suggested that when the distribution of the standardized residuals cannot be obtained, the simulated envelope can be used as a helpful diagnostic tool to detect an incorrect specification of the error distribution and of the systematic component η(β, x_i), as well as the presence of outlying observations. The QQ-plots and envelopes for the Pearson residuals of the data set are shown in Fig. 2. The lines in these figures represent the 5th percentile, the mean, and the 95th percentile of 100 simulated points for each observation. The figure highlights that the NMVBS-NLM is the most appropriate model for the oil palm yield data, since no observation falls outside the envelope. A short sketch of the residual computation is given below.
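import numpy as np

def pearson_residuals(y, x, beta, lam, sig2, alpha):
    """Pearson residuals r_i = (y_i - mu_hat_i) / sqrt(Var_hat(Y_i)) for the fitted NMVBS-NLM.

    Assumes the eta helper from the linearization sketch in Section 3 and fitted parameters
    from the ECME fit (a sketch, not the authors' R code)."""
    b = 1.0 + 0.5 * alpha ** 2
    mu_hat = eta(beta, x) - b * lam
    var_y = (1.0 + 0.5 * alpha ** 2) * sig2 + (1.0 + 1.25 * alpha ** 2) * lam ** 2 * alpha ** 2
    return (y - mu_hat) / np.sqrt(var_y)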

5. Simulation study

We carry out two simulation studies to illustrate the performance of our model and computational method. The first simulation study aims at showing the robustness of the NMVBS-NLM estimates when some outliers are introduced into the simulated data. The second examines whether our proposed ECME algorithm provides estimators with good asymptotic properties.

Table 2
ML estimates and estimated standard errors (SE) for the oil palm data set.

             NMVBS-NLM          SN-NLM             ST-NLM             SCN-NLM            SSL-NLM
Parameter    MLE      SE        MLE      SE        MLE      SE        MLE      SE        MLE      SE
β1           37.967   0.234     37.351   0.462     37.529   0.441     37.714   0.413     37.463   0.486
β2           46.349   1.506     44.576   17.039    43.483   10.364    42.579   11.826    43.373   14.982
β3           0.744    0.004     0.731    0.070     0.732    0.045     0.722    0.052     0.728    0.063
σ²           0.080    0.004     6.919    2.655     1.644    1.520     2.077    1.343     3.105    1.708
λ            −0.003   0.002     −4.453   3.125     −1.871   1.332     −2.969   1.641     −3.489   2.481
α            16.798   6.063     –        –         –        –         –        –         –        –
ν            –        –         –        –         3        –         0.3      –         2.000    –
γ            –        –         –        –         –        –         0.3      –         –        –

ℓ_max        −24.521            −35.037            −33.829            −34.132            −34.781
AIC          61.042             80.074             79.659             82.265             81.562
BIC          66.708             84.796             85.324             88.875             86.986

Fig. 3. Average MMRE as a function of the contamination level (ϑ).

5.1. Robustness of the EM estimates

In this experiment, we simulate 300 Monte Carlo samples of size n = 50 from model (14), where the errors are randomly generated from the skew-normal distribution with parameters µ = −1.537√(2/π), σ² = 2.95 and λ = 2. The presumed parameters are β₁ = 37.3, β₂ = 44.5 and β₃ = 0.73; the covariates x_i are chosen sequentially from 4 to 53 and are held fixed across all replications. Further, we perturb the sixth observation, namely y₆* = y₆ − ϑ, with ϑ varying from 1 to 10. The parameter estimates are computed under the NMVBS-NLM, SN-NLM, ST-NLM and SCN-NLM models in each replication, with and without contamination, denoted by θ̂(ϑ) and θ̂, respectively. To assess relative changes in the parameter estimates, we calculate the mean magnitude of relative error (MMRE; Fagundes, De Souza, & Cysneiros, 2013), defined as

MMRE = (1/4) [ Σ_{j=1}^{3} |β̂_j(ϑ) − β̂_j| / β̂_j + |σ̂²(ϑ) − σ̂²| / σ̂² ],

as computed in the sketch below.

Fig. 3 displays the average MMRE as a function of the contamination level ϑ. It can be observed that the influence of the outlier on estimation increases with ϑ for all models. As expected, the NMVBS model is the least adversely affected, indicating that our proposed model is robust against the presence of outlying observations. On the other hand, an extreme observation has a much stronger effect on the SN-NLM, reflecting the limited ability of skew-normal models to reduce the influence of outliers.

5.2. Finite sample properties of ML estimates and standard errors

We conduct a second simulation study to examine the finite-sample performance of the ML estimates obtained using the ECME algorithm and of their standard errors obtained via the information-based method described in Section 3.


Table 3
Simulation results for assessing the consistency of parameter estimates and standard errors with various sample sizes. True parameter values are shown in parentheses.

Sample size   Measure   β1 (37.9)   β2 (46.13)   β3 (0.74)     σ² (1)    λ (−1)    α (16.5)
100           bias      −0.045      0.802        0.018         −0.279    0.139     −0.343
              MSE       0.051       1.551        0.072         0.369     0.539     1.106
              STD       0.217       0.966        0.243         0.549     0.713     0.997
              ASE       0.144       1.128        0.198         0.614     0.672     0.879
200           bias      −0.018      0.396        −0.009        −0.238    −0.107    −0.267
              MSE       0.029       0.982        0.010         0.256     0.379     0.629
              STD       0.148       0.904        0.089         0.466     0.603     0.782
              ASE       0.118       0.959        0.047         0.508     0.584     0.701
500           bias      −0.007      0.061        −8.615e−04    −0.164    0.082     −0.100
              MSE       0.011       0.346        5.320e−05     0.176     0.143     0.217
              STD       0.105       0.563        0.005         0.378     0.359     0.459
              ASE       0.091       0.572        0.004         0.354     0.348     0.412
1000          bias      −0.001      0.007        7.629e−05     0.088     0.007     0.012
              MSE       0.007       0.098        6.928e−06     0.106     0.058     0.093
              STD       0.077       0.311        0.002         0.307     0.239     0.300
              ASE       0.076       0.315        0.002         0.307     0.241     0.297

In each replication of 300 trials, we generate data from the NMVBS-NLM with true parameter values displayed in Table 3 for various sample sizes n = 100, 200, 500 and 1000. The covariates x_i are randomly chosen from a uniform distribution over the interval (0, 15). To investigate the estimation accuracy, we compute the bias and the mean squared error (MSE):

bias = (1/300) Σ_{i=1}^{300} ( θ̂_i − θ_true )   and   MSE = (1/300) Σ_{i=1}^{300} ( θ̂_i − θ_true )²,

    300 300 300  2  1  1 1  STD =  θˆi2 − θˆi se(θˆi ). and ASE = 299

i =1

300

i=1

300 i=1
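import numpy as np

def mc_summary(est, se, true):
    """bias, MSE, STD and ASE over Monte Carlo replications for one scalar parameter.

    est, se: arrays of length 300 holding the estimates and their information-based SEs
    (a sketch of the summary measures above; names are ours)."""
    n = len(est)
    bias = np.mean(est - true)
    mse = np.mean((est - true) ** 2)
    std = np.sqrt((np.sum(est ** 2) - np.sum(est) ** 2 / n) / (n - 1))
    ase = np.mean(se)
    return bias, mse, std, ase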

The detailed numerical results are reported in Table 3. It can be observed that the magnitudes of both bias and MSE tend to decrease toward zero as the sample size increases, empirically supporting the consistency of the ML estimates obtained via the ECME algorithm. Moreover, the results reveal that the values of STD and ASE become very close as n increases, signifying that the expected information matrix can be accurately approximated.

6. Conclusion

In this paper, we have dealt with the application of the NMVBS distribution to nonlinear regression models, called the NMVBS-NLM, which allows practitioners to perform regression analysis under a wide variety of considerations. We have presented a convenient hierarchical representation for the NMVBS distribution and developed a simple EM-type algorithm for parameter estimation. Numerical results suggest that the proposed NMVBS-NLM is well suited to the experimental data and can be more robust against extreme observations than its SMSN-NLM competitors. The utility of our current approach can be extended to accommodate censored responses, building on recent work by Garay, Lachos, and Lin (2016), Wang and Lin (2017) and Wang, Lin, and Lachos (2015). In addition, a Bayesian approach could be adopted in order to obtain more exact inference on the parameters using appropriate prior distributions; see Cancho et al. (2010), López Quintero, Contreras-Reyes, Wiff, and Arellano-Valle (2017) and Wang and Lin (2015).

Acknowledgments

The authors are grateful to the anonymous reviewers and the editor for their insightful comments and suggestions that greatly improved this paper. This work was partially supported by the Ministry of Science and Technology of Taiwan under Grant No. MOST 105-2118-M-005-003-MY2. The computer program coded in the R language is available from the first author upon request.

References

Aitken, A. C. (1926). On Bernoulli's numerical solution of algebraic equations. Proceedings of the Royal Society of Edinburgh, 46, 289–305.
Atkinson, A. (1981). Two graphical displays for outlying and influential observations in regression. Biometrika, 68, 13–20.


Barndorff-Nielsen, O., & Halgreen, C. (1977). Infinite divisibility of the hyperbolic and generalized inverse Gaussian distributions. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 38, 309–311.
Birnbaum, Z. W., & Saunders, S. C. (1969). A new family of life distributions. Journal of Applied Probability, 6, 319–327.
Böhning, D., Dietz, E., Schaub, R., Schlattmann, P., & Lindsay, B. (1994). The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Annals of the Institute of Statistical Mathematics, 46, 373–388.
Cancho, V. G., Dey, D. K., Lachos, V. H., & Andrade, M. G. (2011). Bayesian nonlinear regression models with scale mixtures of skew-normal distributions: Estimation and case influence diagnostics. Computational Statistics & Data Analysis, 55, 588–602.
Cancho, V. G., Lachos, V. H., & Ortega, E. M. M. (2010). A nonlinear regression model with skew-normal errors. Statistical Papers, 51, 547–558.
Contreras-Reyes, J. E., & Arellano-Valle, R. B. (2013). Growth estimates of cardinalfish (Epigonus crassicaudus) based on scale mixtures of skew-normal distributions. Fisheries Research, 147, 137–144.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society. Series B. Statistical Methodology, 39, 1–38.
Desmond, A. F. (1986). On the relationship between two fatigue-life models. IEEE Transactions on Reliability, 35, 167–169.
Fagundes, R. A., De Souza, R. M., & Cysneiros, F. J. A. (2013). Robust regression with application to symbolic interval data. Engineering Applications of Artificial Intelligence, 26, 564–573.
Ferreira, C. S., Bolfarine, H., & Lachos, V. H. (2011). Skew scale mixtures of normal distributions: Properties and estimation. Statistical Methodology, 8, 154–171.
Ferreira, C. S., & Lachos, V. H. (2016). Nonlinear regression models under skew scale mixtures of normal distributions. Statistical Methodology, 33, 131–146.
Foong, F. S. (1999). Impact of mixture on potential evapotranspiration, growth and yield of palm oil. In PORIM interl. palm oil cong. (agric.) (pp. 265–287).
Garay, A. M., Lachos, V. H., & Abanto-Valle, C. A. (2011). Nonlinear regression models based on scale mixtures of skew-normal distributions. Journal of the Korean Statistical Society, 40, 115–124.
Garay, A. M., Lachos, V. H., & Lin, T. I. (2016). Nonlinear censored regression models with heavy-tailed distributions. Statistics and its Interface, 9, 281–293.
Good, I. J. (1953). The population frequencies of species and the estimation of population parameters. Biometrika, 40, 237–260.
Ho, H. J., Pyne, S., & Lin, T. I. (2012). Maximum likelihood inference for mixtures of skew Student-t-normal distributions through practical EM-type algorithms. Statistics and Computing, 22, 287–299.
Jamalizadeh, A., & Lin, T. I. (2016). A general class of scale-shape mixtures of skew-normal distributions: Properties and estimation. Computational Statistics, http://dx.doi.org/10.1007/s00180-016-0691-1.
Lange, K. L., & Sinsheimer, J. S. (1993). Normal/independent distributions and their applications in robust regression. Journal of Computational and Graphical Statistics, 2, 175–198.
Lin, T. I., Ho, H. J., & Lee, C. R. (2014). Flexible mixture modelling using the multivariate skew-t-normal distribution. Statistics and Computing, 24, 531–546.
Lindsay, B. G. (1995). Mixture models: Theory, geometry and applications. Hayward, CA: Institute of Mathematical Statistics.
Liu, C., & Rubin, D. B. (1994). The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence. Biometrika, 81, 633–648.
López Quintero, F. O., Contreras-Reyes, J. E., Wiff, R., & Arellano-Valle, R. B. (2017). Flexible Bayesian analysis of the von Bertalanffy growth function with the use of a log-skew-t distribution. Fishery Bulletin, 115, 13–26.
Louis, T. A. (1982). Finding the observed information when using the EM algorithm. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 44, 226–232.
McNeil, A., Frey, R., & Embrechts, P. (2005). Quantitative risk management: Concepts, techniques and tools. Princeton University Press.
McNicholas, P. D., Murphy, T. B., McDaid, A. F., & Frost, D. (2010). Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Computational Statistics & Data Analysis, 54, 711–723.
Meilijson, I. (1989). A fast improvement to the EM algorithm on its own terms. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 51, 127–138.
Meng, X. L., & Rubin, D. B. (1993). Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika, 80, 267–278.
Pourmousa, R., Jamalizadeh, A., & Rezapour, M. (2015). Multivariate normal mean–variance mixture distribution based on Birnbaum–Saunders distribution. Journal of Statistical Computation and Simulation, 85, 2736–2749.
Wang, W. L., & Lin, T. I. (2015). Bayesian analysis of multivariate t linear mixed models with missing responses at random. Journal of Statistical Computation and Simulation, 85, 3594–3612.
Wang, W. L., & Lin, T. I. (2017). Multivariate-t nonlinear mixed models with application to censored multi-outcome AIDS studies. Biostatistics, http://dx.doi.org/10.1093/biostatistics/kxx013.
Wang, W. L., Lin, T. I., & Lachos, V. H. (2015). Extending multivariate-t linear mixed models for multiple longitudinal data with censored responses and heavy tails. Statistical Methods in Medical Research, http://dx.doi.org/10.1177/0962280215620229.
Hayward, CA: Institute of Mathematical Statistics. Liu, C., & Rubin, D. B. (1994). The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika, 81, 633–648. López Quintero, F. O., Contreras-Reyes, J. E., Wiff, R., & Arellano-Valle, R. B. (2017). Flexible Bayesian analysis of the von Bertalanffy growth function with the use of a log-skew-t distribution. Fishery Bulletin, 115, 13–26. Louis, T. A. (1982). Finding the observed information when using the EM algorithm. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 44, 226–232. McNeil, A., Frey, R., & Embrechts, P. (2005). Quantitative risk management: Concepts, techniques and tools. Princeton University Press. McNicholas, P. D., Murphy, T. B., McDaid, A. F., & Frost, D. (2010). Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Computational Statistics & Data Analysis, 54, 711–723. Meilijson, I. (1989). A fast improvement to the EM algorithm to its own terms. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 51, 127–138. Meng, X. L., & Rubin, D. B. (1993). Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika, 80, 267–278. Pourmousa, R., Jamalizadeh, A., & Rezapour, M. (2015). Multivariate normal mean–variance mixture distribution based on Birnbaum-Saunders distribution. Journal of Statistical Computation and Simulation, 85, 2736–2749. Wang, W. L., & Lin, T. I. (2015). Bayesian analysis of multivariate t linear mixed models with missing responses at random. Journal of Statistical Computation and Simulation, 85, 3594–3612. Wang, W. L., & Lin, T. I. (2017). Multivariate-t nonlinear mixed models with application to censored multi-outcome AIDS studies. Biostatistics, http://dx.doi.org/10.1093/biostatistics/kxx013. Wang, W. L., Lin, T. I., & Lachos, V. H. (2015). Extending multivariate-t linear mixed models for multiple longitudinal data with censored responses and heavy tails. Statistical Methods in Medical Research, http://dx.doi.org/10.1177/0962280215620229.