On some mixture models based on the Birnbaum–Saunders distribution and associated inference


Journal of Statistical Planning and Inference 141 (2011) 2175–2190


N. Balakrishnan (a), Ramesh C. Gupta (b), Debasis Kundu (c), Víctor Leiva (d,*), Antonio Sanhueza (e)

(a) Department of Mathematics and Statistics, McMaster University, Hamilton, Ontario, Canada
(b) Department of Mathematics and Statistics, University of Maine, Orono, ME, USA
(c) Department of Mathematics and Statistics, Indian Institute of Technology Kanpur, Kanpur, India
(d) Departamento de Estadística, CIMFAV, Universidad de Valparaíso, Valparaíso, Chile
(e) Departamento de Matemática y Estadística, Universidad de La Frontera, Temuco, Chile


Abstract

Article history: Received 7 June 2010; Received in revised form 5 December 2010; Accepted 8 December 2010; Available online 16 December 2010.

In this paper, we consider three different mixture models based on the Birnbaum–Saunders (BS) distribution, viz., (1) a mixture of two different BS distributions, (2) a mixture of a BS distribution and a length-biased version of another BS distribution, and (3) a mixture of a BS distribution and its length-biased version. For all these models, we study their characteristics, including the shape of their density and hazard rate functions. For the maximum likelihood estimation of the model parameters, we use the EM algorithm. For the purpose of illustration, we analyze two data sets related to enzyme and depressive condition problems. In the case of the enzyme data, it is shown that Model 1 provides the best fit, while for the depressive condition data, it is shown that all three models fit well, with Model 3 providing the best fit. © 2010 Elsevier B.V. All rights reserved.

Keywords: EM algorithm; Fisher information; Goodness-of-fit; Hazard rate function; Inverse Gaussian distribution; Length-biased distributions; Maximum likelihood methods

1. Introduction

Birnbaum and Saunders (1969) proposed a fatigue failure model based on a physical argument derived from the cumulative damage or Miner law. This model is known as the Birnbaum–Saunders (BS) distribution. Desmond (1985) strengthened the physical justification of this model and relaxed some of the assumptions made by Birnbaum and Saunders (1969). The BS model is a positively skewed, unimodal, two-parameter distribution with non-negative support possessing many attractive properties. This model has found several applications in the literature, including lifetime, survival and environmental data analysis; see, e.g., Balakrishnan et al. (2007), Leiva et al. (2007, 2008, 2009, 2010b), Kundu et al. (2008), and Barros et al. (2008). Extensive work has been done on the BS distribution with regard to its properties, inference and applications. A comprehensive treatment of the BS distribution until the mid-1990s can be found in Johnson et al. (1995, pp. 651–662). For more recent references on this distribution and some of its generalizations, the readers may refer to Ng et al. (2003, 2006), Kundu et al. (2008), Sanhueza et al. (2008), Balakrishnan et al. (2009b), and Xiao et al. (2010). Since a random variable (RV) following the BS distribution is defined through a standard normal RV, the probability density function (PDF) and the cumulative distribution function (CDF) of the BS model can be expressed in terms of the standard normal PDF and CDF, respectively. As mentioned, the PDF of the BS distribution is unimodal. Kundu et al. (2008)

* Corresponding author at: Departamento de Estadística, Universidad de Valparaíso, Avda. Gran Bretaña 1111, Playa Ancha, Valparaíso, Chile. Tel.: +56 32 2508273; fax: +56 32 2508322. E-mail addresses: [email protected], [email protected] (V. Leiva).

0378-3758/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2010.12.005


showed that the hazard rate function (HRF) of this distribution is not monotone and is in fact unimodal for all values of the parameters; see also Gupta and Akman (1995, 1997) and Bebbington et al. (2008). Length-biased (LB) versions of a distribution have received considerable attention in the literature; see Fisher (1934), Rao (1965), and Cox (1969). The LB distributions are a special case of the weighted distributions, which arise from a LB sampling procedure; see Patil (2002). These distributions have found applications in diverse areas such as biometry, ecology, environmental science, and reliability analysis. LB versions of several distributions have been considered. Patil and Rao (1978) provided a table of some basic distributions and their LB versions, while Khattree (1989) presented relationships among variates and their LB versions for some specific distributions. In the context of reliability, this relationship was treated by some other authors; see, e.g., Gupta and Keating (1985), Jain et al. (1989), and references therein. A review of different LB distributions and their applications has been provided by Gupta and Kirmani (1990); one may also refer to Oluyede and George (2002) for some more recent work in this direction. Specifically, LB versions of the inverse Gaussian (IG) and lognormal distributions can be seen in Gupta and Akman (1995) and Sansgiry and Akman (2001). For more details on the IG model, one may refer to Chhikara and Folks (1989), Johnson et al. (1994, Chapter 15), Jørgensen (1982, 1997), and Seshadri (1993, 1999), and to Balakrishnan et al. (2009a) for more recent results on the BS and IG models and their related mixture distributions. Recently, Leiva et al. (2009) considered the LB version of the BS (LBS) distribution and illustrated its application in water quality analysis.
In reliability studies, populations often turn out to be heterogeneous simply because there are at least two subpopulations, with one being the standard subpopulation (sometimes called strong) and the other being the defective subpopulation (sometimes called weak). For this reason, data arising from such heterogeneous populations need to be modeled by a mixture of two or more life distributions. More details about life distributions can be seen in Johnson et al. (1995, pp. 639–681) and Marshall and Olkin (2007), and about mixture distributions in McLachlan and Peel (2000). In this paper, we consider three different two-component mixture models based on the BS and LBS distributions. Model 1 is a mixture of two BS distributions with different sets of parameters, which involves five parameters including one mixing parameter. Model 2 is a mixture of the BS and LBS distributions with different sets of parameters, which is once again a five-parameter model. Finally, Model 3 is a special case of Model 2, namely a mixture of a BS distribution and its LB version, and consequently it has only three parameters including one mixing parameter. Here, we study different characteristics and reliability aspects of these three mixture distributions. It is well known (see, e.g., Ng et al., 2003) that the maximum likelihood (ML) estimators of the parameters of a two-parameter BS distribution cannot be obtained in closed form. Thus, one needs to solve a non-linear equation for the determination of the ML estimates. This is also the case when the parameters of the LBS distribution are estimated by the ML method. For the direct computation of the ML estimates of the model parameters, one needs to solve five-dimensional optimization problems for Models 1 and 2, but the optimization problem reduces to a three-dimensional one for Model 3.
To facilitate the associated numerical procedure, we use the expectation-maximization (EM) algorithm proposed by Dempster et al. (1977) for the computation of the ML estimates of all the model parameters, through which the required multi-dimensional optimization is solved by a sequence of one-dimensional optimizations. Finally, using the idea of Louis (1982), it also becomes possible to compute the Fisher information matrix, which is useful in constructing confidence intervals for the model parameters. Bhattacharyya and Fries (1982), Desmond (1986), and Jørgensen et al. (1991) noted a relationship between the BS and IG distributions. All of them observed that the BS distribution can be obtained as an equally weighted mixture of an IG distribution and its LB (LIG) version (or complementary reciprocal). In the case of Model 1, this relationship can be utilized for computing the ML estimates more efficiently by the EM algorithm. It is observed that for each E-step of this algorithm, the corresponding M-step can be obtained in an explicit form by using the fact that the ML estimates of the parameters of the IG and LIG distributions can be obtained explicitly. This renders the implementation of the EM algorithm efficient and effective. In the case of Model 2, while implementing the EM algorithm, it is observed that in each E-step, the corresponding M-step requires only two one-dimensional optimizations, which significantly reduces the computational burden. Model 3 is, in fact, a weighted version of the two-parameter BS distribution with a linear weight function. In this last model, the corresponding M-step requires solving only one non-linear equation in each E-step. The rest of the paper is organized as follows. In Section 2, we briefly describe the BS and LBS distributions. In Section 3, we present the three mixture models and examine some of their characteristics. In Section 4, we deal with the estimation problem for the three models.
In Section 5, we analyze two real data sets for the purpose of illustrating the three mixture models discussed in this work. Finally, in Section 6, we make some concluding comments.

2. Background

In this section, we provide several characteristics of the BS and LBS distributions, including their PDF, CDF, quantile function, HRF, moments and some properties.

2.1. The Birnbaum–Saunders distribution

From now on, if a RV T follows the two-parameter BS distribution, the notation T ~ BS(α, β) is used. In this case, the PDF of T can be written as

\[
f_T(t;\alpha,\beta) = \frac{\{\beta/t\}^{1/2}+\{\beta/t\}^{3/2}}{2\sqrt{2\pi}\,\alpha\beta}\,\exp\!\left(-\frac{1}{2\alpha^2}\left[\frac{t}{\beta}+\frac{\beta}{t}-2\right]\right), \quad t>0, \tag{1}
\]


where α > 0 and β > 0 are the shape and scale parameters of the BS distribution, respectively. The CDF of T is given by

\[
F_T(t;\alpha,\beta) = \Phi\!\left(\frac{1}{\alpha}\left[\left\{\frac{t}{\beta}\right\}^{1/2}-\left\{\frac{\beta}{t}\right\}^{1/2}\right]\right), \quad t>0, \tag{2}
\]

where Φ(·) is the CDF of the standard normal distribution. The quantile function of T is expressed as

\[
t(q;\alpha,\beta) = \frac{\beta}{4}\left[\alpha z(q)+\sqrt{\{\alpha z(q)\}^2+4}\right]^2, \quad 0<q<1, \tag{3}
\]

where z(q) is the qth quantile of the standard normal distribution. Since t(0.5; α, β) = β, β is also the median of the BS distribution. The HRF of T can be easily obtained from (1) and (2) as

\[
h_T(t;\alpha,\beta) = \frac{\exp\!\left(-[1/\{2\alpha^2\}][t/\beta+\beta/t-2]\right)\left[\{\beta/t\}^{1/2}+\{\beta/t\}^{3/2}\right]}{\sqrt{8\pi}\,\alpha\beta\,\Phi\!\left(-[1/\alpha]\left[\{t/\beta\}^{1/2}-\{\beta/t\}^{1/2}\right]\right)}, \quad t>0. \tag{4}
\]

Some basic properties of the BS distribution include: (i) cT ~ BS(α, cβ), with c > 0, and (ii) 1/T ~ BS(α, 1/β). From (3), it is possible to note that T =_d [β/4][αZ + √({αZ}² + 4)]², where Z ~ N(0,1) and =_d means equality in distribution. Thus, it can be proved that

\[
X = \frac{1}{2}\left[\left\{\frac{T}{\beta}\right\}^{1/2}-\left\{\frac{\beta}{T}\right\}^{1/2}\right] \sim N\!\left(0,\frac{\alpha^2}{4}\right), \tag{5}
\]

where N(a, b²) denotes the normal distribution with mean a and variance b², which implies T =_d β[1 + 2X² + 2X{1+X²}^{1/2}] ~ BS(α, β). Using the transformation given in (5), the rth moment of T can be shown to be

\[
E[T^r] = \beta^r \sum_{j=0}^{r}\binom{2r}{2j}\sum_{i=0}^{j}\binom{j}{i}\frac{(2r-2j+2i)!}{2^{r-j+i}\,(r-j+i)!}\left[\frac{\alpha}{2}\right]^{2r-2j+2i}. \tag{6}
\]

In particular, from (6), we have

\[
E[T] = \beta\left[1+\frac{\alpha^2}{2}\right], \quad V[T] = [\alpha\beta]^2\left[1+\frac{5\alpha^2}{4}\right], \quad \beta_1[T] = \frac{16\alpha^2[11\alpha^2+6]^2}{[5\alpha^2+4]^3}, \tag{7}
\]

and

\[
\beta_2[T] = 3+\frac{6\alpha^2[93\alpha^2+40]}{[5\alpha^2+4]^2},
\]

where β₁[T] and β₂[T] denote the coefficients of skewness (CS) and kurtosis (CK) of T, respectively, γ[T] = √V[T]/E[T] is the coefficient of variation (CV) of T, and √V[T] is its standard deviation (SD). From β₁[T] in (7), it may be observed that, as α → 0, the skewness also goes to zero, so that the BS distribution tends in this case to be symmetrical, but it degenerates at the scale parameter β; see Kundu et al. (2008). If β = 1, the mode of the BS distribution can be obtained as the solution of the nonlinear equation (the stationarity condition of the PDF in (1)):

\[
\alpha^2\,x(x+3) = (1-x)(1+x)^2. \tag{8}
\]

If x*(α) is the solution of (8), then the modal value becomes βx*(α). From (8), it is immediate that x*(α) < 1, so that the mode of the BS distribution is less than β. Table 1 presents the BS modal values for different choices of α. From this table, it is readily seen that x* is a decreasing function of α.

We now briefly describe how the ML estimation of the parameters of the BS distribution can be done based on a random sample of size n, say T = (T₁, ..., T_n)^⊤, where T_i ~ BS(α, β), for i = 1, ..., n. For given β, the ML estimate of α, say α̂(β), is obtained as

\[
\hat{\alpha}(\beta) = \left[\frac{s}{\hat{\beta}}+\frac{\hat{\beta}}{r}-2\right]^{1/2}, \tag{9}
\]

where s = [1/n] Σ_{i=1}^{n} t_i and r = [{1/n} Σ_{i=1}^{n} {1/t_i}]^{-1} are the sample arithmetic and harmonic means, respectively, and β̂ is the ML estimate of β. However, it is not possible to find an explicit form for β̂. Thus, the ML estimate of β can be obtained as the

Table 1. Values of x* for different choices of α.

α     0.5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0
x*    0.720  0.471  0.337  0.258  0.207  0.172  0.146  0.126  0.110  0.098


unique positive root of the equation

\[
\beta^2 - \beta[2r+K(\beta)] + r[s+K(\beta)] = 0, \tag{10}
\]

where K(u) = [{1/n} Σ_{i=1}^{n} {u+t_i}^{-1}]^{-1} is the harmonic mean function, for u ≥ 0, such that K(0) = r. Note that (10) can be easily solved by using the fixed-point type equation g(β) = β, where

\[
g(\beta) = \frac{\beta^2 + r[s+K(\beta)]}{2r+K(\beta)}.
\]

Then, a simple iterative scheme such as β^{(m+1)} = g(β^{(m)}) can be used to determine the ML estimate of β from an initial guess β^{(0)}, which can be the sample median, for example. Once β̂ is obtained as a solution of (10), the ML estimate of α can be obtained from (9). Engelhardt et al. (1981) obtained asymptotic inference for θ = (α, β)^⊤ by using the result that

\[
\begin{pmatrix}\hat{\alpha}\\ \hat{\beta}\end{pmatrix} \;\dot\sim\; N_2\!\left(\begin{pmatrix}\alpha\\ \beta\end{pmatrix},\ \begin{pmatrix}\dfrac{\alpha^2}{2n} & 0\\[4pt] 0 & \dfrac{\beta^2}{n[0.25+\alpha^{-2}+I(\alpha)]}\end{pmatrix}\right),
\]

where

\[
I(\alpha) = 2\int_0^{\infty}\left[\{1+h(\alpha x)\}^{-1}-0.5\right]^2\,d\Phi(x) \quad\text{and}\quad h(y) = 1+\frac{y^2}{2}+y\left[1+\frac{y^2}{4}\right]^{1/2}.
\]

An alternative way of determining the variance-covariance matrix of θ̂ = (α̂, β̂)^⊤, say Σ_{θ̂}, is from the expected information matrix, say J(θ), as Σ_{θ̂} = J(θ)^{-1}. Instead of the expected information matrix, one may prefer to use the observed information matrix, since it is easier to compute. In the application part of this work, we carry out asymptotic inference on the model parameters based on the observed information matrix; for more details on this approach, see Efron and Hinkley (1978).

2.2. Length-biased distributions

As mentioned earlier, the LB distributions are a special case of the weighted distributions. Specifically, let Y be a non-negative RV with PDF f_Y(y). Then, the weighted version of Y, say Y_B, has a weighted distribution with PDF given by

\[
f_{Y_B}(y) = \frac{B(y)\,f_Y(y)}{E[B(Y)]}, \quad y>0, \tag{11}
\]

where B(·) is the weight function. Here, it is assumed that E[B(Y)] < ∞. A particular case of the distributions with PDF as given in (11) is obtained when we consider B(y) = y. In this case, Y_B is called the LB (or size-biased) version of Y, denoted by L. The PDF of L is given by f_L(l) = l f_Y(l)/E[Y], for l > 0, with E[Y] < ∞. The distribution of L is the LB version of the distribution of Y. Note that the LB distribution involves the same parameters as the original distribution.

2.3. Connection between the BS and IG distributions

Consider the following mixture representation of the BS distribution. Suppose X₁ ~ IG(μ, λ), i.e., X₁ has an IG distribution with parameters μ > 0 and λ > 0. The PDF of X₁ is given by

\[
f_{X_1}(x;\mu,\lambda) = \sqrt{\frac{\lambda}{2\pi x^3}}\,\exp\!\left(-\frac{\lambda[x-\mu]^2}{2\mu^2 x}\right), \quad x>0. \tag{12}
\]

Furthermore, let X₂ be a RV, independent of X₁, such that 1/X₂ ~ IG(1/μ, λ/μ²). Then, consider a new RV given by

\[
T = \begin{cases} X_1 & \text{with probability } 1/2,\\ X_2 & \text{with probability } 1/2. \end{cases} \tag{13}
\]

Evidently, the PDF of T (being a mixture of X₁ and X₂) is defined as

\[
f_T(x;\mu,\lambda) = \tfrac{1}{2}f_{X_1}(x;\mu,\lambda) + \tfrac{1}{2}f_{X_2}(x;\mu,\lambda), \quad x>0, \tag{14}
\]

where f_{X_1}(x; μ, λ) and f_{X_2}(x; μ, λ) are the densities of X₁ and X₂, respectively. Here, the PDF of X₂ can be expressed as

\[
f_{X_2}(x;\mu,\lambda) = \frac{x\,f_{X_1}(x;\mu,\lambda)}{\mu}, \quad x>0, \tag{15}
\]


where μ = E[X₁], so that X₂ is the LB version of X₁. Therefore, if α = √(μ/λ) and β = μ in (14), then T ~ BS(α, β). The representation given in (14) is quite useful for developing ML estimation of the BS distribution parameters via the EM algorithm.

2.4. The length-biased Birnbaum–Saunders distribution

Let T ~ BS(α, β) and L be its LB version, i.e., L follows a LBS distribution, denoted by L ~ LBS(α, β). Then, the PDF of L is expressed as

\[
f_L(l;\alpha,\beta) = \frac{l\left[\{\beta/l\}^{1/2}+\{\beta/l\}^{3/2}\right]}{\sqrt{2\pi}\,\alpha[\alpha^2+2]\beta^2}\,\exp\!\left(-\frac{1}{2\alpha^2}\left[\frac{l}{\beta}+\frac{\beta}{l}-2\right]\right), \quad l>0. \tag{16}
\]

The CDF of L is given by

\[
F_L(l;\alpha,\beta) = \Phi\!\left(\frac{\xi(l/\beta)}{\alpha}\right) - \frac{\alpha^2}{2+\alpha^2}\left\{\exp\!\left(\frac{2}{\alpha^2}\right)\Phi\!\left(-\frac{\sqrt{\xi^2(l/\beta)+4}}{\alpha}\right) + \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{\xi^2(l/\beta)}{2\alpha^2}\right)\frac{\xi(l/\beta)+\sqrt{\xi^2(l/\beta)+4}}{\alpha}\right\}, \quad l>0, \tag{17}
\]

where ξ(u) = u^{1/2} − u^{-1/2}. The HRF of L can be easily obtained from (16) and (17) as

\[
h_L(l;\alpha,\beta) = \frac{f_L(l;\alpha,\beta)}{1-F_L(l;\alpha,\beta)}, \quad l>0. \tag{18}
\]

Using the fact that the BS distribution in (1) is unimodal, it can be shown that the LBS distribution in (16) is also unimodal. The mode of the LBS distribution can be obtained as the solution of the cubic equation

\[
\left[\frac{x}{\beta}\right]^3 + [1-\alpha^2]\left[\frac{x}{\beta}\right]^2 + [\alpha^2-1]\left[\frac{x}{\beta}\right] - 1 = 0. \tag{19}
\]

It can be further shown that the HRF of the LBS distribution obtained from (18) is also unimodal for all values of α and β. A basic property of this distribution is cL ~ LBS(α, cβ), with c > 0. Since L is the LB version of T, its rth moment follows directly from the moments of T as

\[
E[L^r] = \frac{E[T^{r+1}]}{E[T]}, \tag{20}
\]

where T ~ BS(α, β) and its moments are as given in (6). In particular, from (20), we obtain

\[
E[L] = \frac{\beta[2+4\alpha^2+3\alpha^4]}{2+\alpha^2}, \quad V[L] = \frac{[\alpha\beta]^2[4+17\alpha^2+24\alpha^4+6\alpha^6]}{[2+\alpha^2]^2}, \quad \beta_1[L] = \frac{9\alpha^2[8+48\alpha^2+95\alpha^4+48\alpha^6+8\alpha^8]^2}{[4+17\alpha^2+24\alpha^4+6\alpha^6]^3}, \tag{21}
\]

and

\[
\beta_2[L] = 3+\frac{3\alpha^2[80+602\alpha^2+1508\alpha^4+1149\alpha^6+384\alpha^8+48\alpha^{10}]}{[4+17\alpha^2+24\alpha^4+6\alpha^6]^2}.
\]

For more details about the LBS distribution, one may refer to Leiva et al. (2009). We now briefly describe how the ML estimation of the parameters of the LBS distribution can be done based on a random sample of size n, say L = (L₁, ..., L_n)^⊤, where L_i ~ LBS(α, β), for i = 1, ..., n. In this case, the log-likelihood function is given (up to an additive constant) by

\[
\ell_{LBS}(\alpha,\beta) = -n\log(\alpha) - 2n\log(\beta) - n\log(\alpha^2+2) + \sum_{i=1}^{n}\log\!\left(\left[\frac{\beta}{l_i}\right]^{1/2}+\left[\frac{\beta}{l_i}\right]^{3/2}\right) - \frac{1}{2\alpha^2}\sum_{i=1}^{n}\left[\frac{l_i}{\beta}+\frac{\beta}{l_i}-2\right] + \sum_{i=1}^{n}\log(l_i). \tag{22}
\]

For a given β, the ML estimate of α, say α̂(β), can be obtained as

\[
\hat{\alpha}(\beta) = \frac{1}{\sqrt{6}}\left[\{A(\beta)-2\}+\sqrt{\{A(\beta)-2\}^2+24A(\beta)}\right]^{1/2}, \tag{23}
\]

where A(β) = [1/n] Σ_{i=1}^{n} [l_i/β + β/l_i − 2]. Thus, the ML estimate of β, say β̂, can be obtained by first maximizing the profile log-likelihood function of β, viz., ℓ_{LBS}(α̂(β), β). Once β̂ is numerically computed, the ML estimate of α, say α̂, can be obtained from (23) as α̂(β̂). Note that the maximization of the profile log-likelihood function ℓ_{LBS}(α̂(β), β) can be done by solving the


non-linear equation

\[
-\frac{3}{\beta} + \frac{1}{n\beta}\sum_{i=1}^{n}\frac{2[\beta/l_i]^{3/2}}{[\beta/l_i]^{1/2}+[\beta/l_i]^{3/2}} + \frac{s}{\beta^2\hat{\alpha}^2(\beta)} - \frac{1}{r\,\hat{\alpha}^2(\beta)} = 0, \tag{24}
\]

where s and r are the sample arithmetic and harmonic means as defined in Section 2.1 and α̂(β) is as in (23). Once again, observe that (24) can be solved by using a fixed-point type equation g(β) = β, where now

\[
g(\beta) = \frac{s}{\hat{\alpha}^2(\beta)}\left[3 - \frac{2}{n}\sum_{i=1}^{n}\frac{[\beta/l_i]^{3/2}}{[\beta/l_i]^{1/2}+[\beta/l_i]^{3/2}} + \frac{\beta}{r\,\hat{\alpha}^2(\beta)}\right]^{-1}. \tag{25}
\]

A simple iterative scheme, as detailed in Section 2.1, can be used to solve (25).

3. Three new mixture models and their characteristics

In this section, we present the three mixture models and examine some of their characteristics.

3.1. Model 1

A RV Y has a mixture distribution of two different BS models (MTBS) if and only if its PDF is given by

\[
f_Y(y;\alpha_1,\beta_1,\alpha_2,\beta_2,p) = p\,f_{T_1}(y;\alpha_1,\beta_1) + [1-p]\,f_{T_2}(y;\alpha_2,\beta_2), \quad y>0, \tag{26}
\]

where T_j ~ BS(α_j, β_j), with α_j > 0, β_j > 0, and 0 ≤ p ≤ 1, for j = 1, 2. The RV Y with PDF as given in (26) is said to have a MTBS distribution, which is denoted by Y ~ MTBS(α₁, β₁, α₂, β₂, p). This PDF is either unimodal or bimodal, and its shapes for some choices of α₁, β₁, α₂, β₂, and p are presented in Fig. 1. The CDF of Y is given by

\[
F_Y(y;\alpha_1,\beta_1,\alpha_2,\beta_2,p) = p\,F_{T_1}(y;\alpha_1,\beta_1) + [1-p]\,F_{T_2}(y;\alpha_2,\beta_2), \quad y>0, \tag{27}
\]

Fig. 1. PDF of the MTBS distribution given in (26) for the choices of α₁, β₁, α₂, β₂, and p as (a) 1.0, 1.0, 0.5, 1.0, and 0.75, (b) 2.5, 1.0, 0.5, 1.0, and 0.25, (c) 1.0, 0.5, 1.0, 1.0, and 0.5, and (d) 1.0, 0.5, 0.5, 5.0, and 0.25, respectively.


where F_{T_j}(·; α_j, β_j) is as given in (2), for j = 1, 2. The HRF of Y is of the form

\[
h_Y(y;\alpha_1,\beta_1,\alpha_2,\beta_2,p) = \frac{p\,f_{T_1}(y;\alpha_1,\beta_1)+[1-p]\,f_{T_2}(y;\alpha_2,\beta_2)}{1-\left[p\,F_{T_1}(y;\alpha_1,\beta_1)+[1-p]\,F_{T_2}(y;\alpha_2,\beta_2)\right]}, \quad y>0, \tag{28}
\]

from which it is clear that it can be expressed as

\[
h_Y(y;\alpha_1,\beta_1,\alpha_2,\beta_2,p) = \varphi_1(y)\,h_{T_1}(y;\alpha_1,\beta_1) + \varphi_2(y)\,h_{T_2}(y;\alpha_2,\beta_2), \quad y>0, \tag{29}
\]

where h_{T_1}(·; α₁, β₁) and h_{T_2}(·; α₂, β₂) are the HRFs of T₁ and T₂, respectively, and φ₁(·) and φ₂(·) are non-linear weight functions depending on y. As mentioned, the HRF of the BS distribution is unimodal for all values of α and β. However, due to the complicated form of the HRF of the MTBS distribution, it is difficult to examine its shape characteristics theoretically. Graphically, we observe that the HRF in (28) is either unimodal or bimodal depending on the values of α₁, β₁, α₂, β₂, and p. For some choices of these parameters, a plot of the HRF is presented in Fig. 2. For Y having the PDF in (26), the first two raw moments are given by

\[
E[Y] = p\beta_1\left[1+\tfrac{1}{2}\alpha_1^2\right] + [1-p]\beta_2\left[1+\tfrac{1}{2}\alpha_2^2\right] \tag{30}
\]

and

\[
E[Y^2] = \tfrac{3}{2}\,p\beta_1^2\left[\alpha_1^4+2\alpha_1^2+\tfrac{2}{3}\right] + \tfrac{3}{2}\,[1-p]\beta_2^2\left[\alpha_2^4+2\alpha_2^2+\tfrac{2}{3}\right]. \tag{31}
\]

Moreover, it is evident that if Y has PDF as in (26), then 1/Y has its PDF as

\[
f_{1/Y}(y;\alpha_1,\beta_1,\alpha_2,\beta_2,p) = p\,f_{1/T_1}(y;\alpha_1,\beta_1) + [1-p]\,f_{1/T_2}(y;\alpha_2,\beta_2), \quad y>0, \tag{32}
\]

where 1/T_j ~ BS(α_j, 1/β_j), for j = 1, 2. Therefore, 1/Y also has a MTBS distribution.
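The MTBS quantities above are straightforward to evaluate from the BS building blocks. The sketch below (Python, arbitrary parameter choices, not from the paper's data analysis) evaluates the density (26), the hazard rate (28), and checks the mean formula (30) against direct numerical integration.

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_pdf(t, a, b):
    """BS density, eq. (1)."""
    c = (math.sqrt(b / t) + (b / t) ** 1.5) / (2.0 * math.sqrt(2.0 * math.pi) * a * b)
    return c * math.exp(-(t / b + b / t - 2.0) / (2.0 * a * a))

def bs_cdf(t, a, b):
    """BS CDF, eq. (2)."""
    return Phi((math.sqrt(t / b) - math.sqrt(b / t)) / a)

a1, b1, a2, b2, p = 0.5, 1.0, 0.3, 3.0, 0.4   # arbitrary illustration

def mtbs_pdf(y):
    return p * bs_pdf(y, a1, b1) + (1 - p) * bs_pdf(y, a2, b2)   # eq. (26)

def mtbs_hazard(y):
    F = p * bs_cdf(y, a1, b1) + (1 - p) * bs_cdf(y, a2, b2)      # eq. (27)
    return mtbs_pdf(y) / (1.0 - F)                               # eq. (28)

# E[Y] from (30) versus numerical integration of y * f(y) over (0, 30)
mean_formula = p * b1 * (1 + a1 ** 2 / 2) + (1 - p) * b2 * (1 + a2 ** 2 / 2)
h = 0.001
mean_num = sum(h * k * mtbs_pdf(h * k) for k in range(1, 30000)) * h
assert abs(mean_num - mean_formula) < 1e-3
assert mtbs_hazard(1.0) > 0.0
```

The truncation at 30 is safe for these parameters because both BS components place negligible mass beyond that point.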


Fig. 2. HRF of the MTBS distribution given in (28) for the choices of α₁, β₁, α₂, β₂, and p as (a) 1.0, 1.0, 0.5, 1.0, and 0.75, (b) 2.5, 1.0, 0.5, 1.0, and 0.25, (c) 1.0, 0.5, 1.0, 1.0, and 0.5, and (d) 1.0, 0.5, 0.25, 4.0, and 0.75, respectively.


3.2. Model 2

A RV Y has a mixture distribution of BS and LBS models (MBSLBS) if its PDF is given by

\[
f_Y(y;\alpha_1,\beta_1,\alpha_2,\beta_2,p) = p\,f_T(y;\alpha_1,\beta_1) + [1-p]\,f_L(y;\alpha_2,\beta_2), \quad y>0, \tag{33}
\]

where T ~ BS(α₁, β₁) and L ~ LBS(α₂, β₂), with 0 ≤ p ≤ 1 being the mixing parameter. The RV Y with PDF in (33) is said to have a MBSLBS distribution, which is denoted by Y ~ MBSLBS(α₁, β₁, α₂, β₂, p). By graphical plots, it has been observed that this PDF is either unimodal or bimodal, and its shapes for some choices of the parameters α₁, β₁, α₂, β₂, and p are presented in Fig. 3. The CDF of Y is then

\[
F_Y(y;\alpha_1,\beta_1,\alpha_2,\beta_2,p) = p\,F_T(y;\alpha_1,\beta_1) + [1-p]\,F_L(y;\alpha_2,\beta_2), \quad y>0, \tag{34}
\]

where F_T(·; α₁, β₁) and F_L(·; α₂, β₂) are as given in (2) and (17), respectively. As in the case of Model 1, the HRF of the MBSLBS distribution can also be expressed as a weighted mixture of the HRFs of the BS and LBS distributions, with the weight functions depending on y. The HRF of the MBSLBS distribution can take on different shapes, and its plots for some choices of the parameters are presented in Fig. 4. Since the rth moment of the LBS distribution can be expressed in terms of the moments of the BS distribution, the moments of the MBSLBS distribution can be expressed purely in terms of the moments of the BS distribution. Though Model 2 has five parameters and thus provides great flexibility in modeling, we now consider Model 3, a special case of Model 2 with α₁ = α₂ and β₁ = β₂, as this simpler form provides an adequate fit in some cases (as for the depressive condition data studied in Section 5).


Fig. 3. PDF of the MBSLBS distribution given in (33) for the choices of α₁, β₁, α₂, β₂, and p as (a) 1.0, 1.0, 0.5, 1.0, and 0.75, (b) 2.5, 1.0, 0.5, 1.0, and 0.25, (c) 1.0, 0.5, 1.0, 1.0, and 0.5, and (d) 1.0, 0.5, 0.5, 5.0, and 0.25, respectively.



Fig. 4. HRF of the MBSLBS distribution obtained from (33) and (34) for the choices of α₁, β₁, α₂, β₂, and p as (a) 1.0, 1.0, 0.5, 1.0, and 0.75, (b) 2.5, 1.0, 0.5, 1.0, and 0.25, (c) 1.0, 0.5, 1.0, 1.0, and 0.5, and (d) 1.0, 0.5, 0.5, 5.0, and 0.25, respectively.
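The MBSLBS hazard rates plotted in Fig. 4 are computed as h_Y = f_Y/(1 − F_Y) from (33) and (34). A minimal evaluation sketch is given below (Python; parameter values match panel (a) of Fig. 4, the LBS CDF is obtained here by direct numerical integration rather than the closed form (17), and the LBS density is written as l f_T(l)/E[T]).

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_pdf(t, a, b):
    """BS density, eq. (1)."""
    c = ((b / t) ** 0.5 + (b / t) ** 1.5) / (2.0 * math.sqrt(2.0 * math.pi) * a * b)
    return c * math.exp(-(t / b + b / t - 2.0) / (2.0 * a * a))

def bs_cdf(t, a, b):
    """BS CDF, eq. (2)."""
    return Phi((math.sqrt(t / b) - math.sqrt(b / t)) / a)

def lbs_pdf(l, a, b):
    ET = b * (1.0 + a * a / 2.0)       # E[T] for BS(a, b), from (7)
    return l * bs_pdf(l, a, b) / ET    # length-biased density

a1, b1, a2, b2, p = 1.0, 1.0, 0.5, 1.0, 0.75   # as in Fig. 4(a)
h = 1e-4

def lbs_cdf_num(l):
    """Numerical CDF of the LBS component (Riemann sum)."""
    return sum(lbs_pdf(h * k, a2, b2) for k in range(1, int(l / h) + 1)) * h

def hazard(y):
    fY = p * bs_pdf(y, a1, b1) + (1 - p) * lbs_pdf(y, a2, b2)     # (33)
    FY = p * bs_cdf(y, a1, b1) + (1 - p) * lbs_cdf_num(y)         # (34)
    return fY / (1.0 - FY)

for y in [0.5, 1.0, 2.0]:
    assert hazard(y) > 0.0
```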

3.3. Model 3

Here, we concentrate on a special case of the MBSLBS distribution in which the two components share the same parameter values. Such a three-parameter MBSLBS distribution has its PDF as

\[
f_Y(y;\alpha,\beta,p) = p\,f_T(y;\alpha,\beta) + [1-p]\,f_L(y;\alpha,\beta), \quad y>0, \tag{35}
\]

where f_T(·; α, β) and f_L(·; α, β) are the densities of the variates T and L defined in Model 2. The RV Y with PDF given in (35) is said to have a reduced MBSLBS (RMBSLBS) distribution, which is denoted by Y ~ RMBSLBS(α, β, p). Noting that (35) is a weighted version of the BS model with a linear weight function, it can be shown that the PDF of Y is unimodal for all values of the parameters α > 0 and β > 0. The parameter β only modifies the scale, while the parameter α affects the asymmetry and kurtosis of the distribution. The mixing parameter p modifies both scale and kurtosis. The CDF, HRF and moments of this model can all be easily obtained from those of Model 2. Using the fact that the HRF of the BS distribution is unimodal, it can be shown that the HRF of the RMBSLBS distribution with PDF given in (35) is also always unimodal.

4. Estimation via EM algorithm

In this section, we deal with the estimation problem for the three models by using the EM algorithm. Specifically, we discuss the ML estimation of the parameters of Models 1–3 with densities given in (26), (33) and (35), respectively.

4.1. Model 1

For ease of notation, we denote p₁ = p and p₂ = 1 − p. Now, consider a random sample of size n, say Y = (Y₁, ..., Y_n)^⊤, where Y_i, for i = 1, ..., n, has a distribution with PDF as given in (26). To facilitate the EM algorithm, we first use (14) and rewrite the


PDF in (26) as

\[
f_Y(y;\mu_1,\lambda_1,\mu_2,\lambda_2,p) = \frac{1}{2}\sum_{j=1}^{2} p_j\,f_{X_1}(y;\mu_j,\lambda_j) + \frac{1}{2}\sum_{j=1}^{2} p_j\,f_{X_2}(y;\mu_j,\lambda_j), \quad y>0, \tag{36}
\]

where f_{X_1}(·; μ_j, λ_j) and f_{X_2}(·; μ_j, λ_j) are as defined in (12) and (15), respectively, with μ_j = β_j and λ_j = β_j/α_j², for j = 1, 2. It is well known that a mixture model can be treated as a missing value problem, and so the EM algorithm can be used to efficiently compute the ML estimates of the model parameters; see McLachlan and Peel (2000, pp. 47–50). For this purpose, we now assume that the complete data are as follows. For Y being the RV with PDF given in (36), we define an associated random vector W = (U₁, V₁, U₂, V₂), where each U_j and V_j can take on values 0 or 1, with Σ_{j=1}^{2}[U_j + V_j] = 1 and P(U_j = 1) = P(V_j = 1) = p_j/2, for j = 1, 2. Moreover, Y|(U_j = 1) and Y|(V_j = 1) have densities f_{X_1}(·; μ_j, λ_j) and f_{X_2}(·; μ_j, λ_j), respectively, for j = 1, 2. These facts yield the joint distribution of (Y, W). Now, if we have complete data, say y^{(c)} = ({y₁, w₁}, ..., {y_n, w_n})^⊤, where w_i = {u_{i1}, v_{i1}, u_{i2}, v_{i2}}, for i = 1, ..., n, then the corresponding log-likelihood function for the parameter vector θ = (μ₁, λ₁, p₁, μ₂, λ₂, p₂)^⊤ is given by

\[
\ell^{(c)}(\theta) = c + \sum_{j=1}^{2}[u_j+v_j]\log(p_j) + \sum_{i=1}^{n}\sum_{j=1}^{2} u_{ij}\log(f_{X_1}(y_i;\mu_j,\lambda_j)) + \sum_{i=1}^{n}\sum_{j=1}^{2} v_{ij}\log(f_{X_2}(y_i;\mu_j,\lambda_j)), \tag{37}
\]

where u_j = Σ_{i=1}^{n} u_{ij} and v_j = Σ_{i=1}^{n} v_{ij}, for j = 1, 2, and c is a constant independent of the parameter vector θ. Therefore, for the complete-data case, the ML estimates of the unknown parameters can be obtained as follows. For p_j, we have

\[
\hat{p}_j = \frac{u_j+v_j}{n}, \quad j=1,2, \tag{38}
\]

and for μ_j and λ_j, we must maximize

\[
\sum_{i=1}^{n} u_{ij}\log(f_{X_1}(y_i;\mu_j,\lambda_j)) + \sum_{i=1}^{n} v_{ij}\log(f_{X_2}(y_i;\mu_j,\lambda_j)), \quad j=1,2, \tag{39}
\]

with respect to μ_j and λ_j, respectively. Up to an additive constant, the function in (39) can be shown to be

\[
[u_j+v_j]\,\frac{\log(\lambda_j)}{2} - \lambda_j\sum_{i=1}^{n}\frac{[u_{ij}+v_{ij}][y_i-\mu_j]^2}{2\,y_i\,\mu_j^2} - v_j\log(\mu_j), \quad j=1,2, \tag{40}
\]

and so the ML estimates of μ_j and λ_j can be obtained by maximizing (40) with respect to μ_j and λ_j, respectively. After some algebraic calculations, it can be seen that these ML estimates are given by

\[
\hat{\mu}_j = \frac{B_j[A_j-B_j] + \sqrt{B_j^2[A_j-B_j]^2 + A_j C_j D_j[2B_j-A_j]}}{D_j A_j} \tag{41}
\]

and

\[
\hat{\lambda}_j = \frac{[u_j+v_j]\,\hat{\mu}_j^2}{\sum_{i=1}^{n}[u_{ij}+v_{ij}][y_i-\hat{\mu}_j]^2/y_i}, \quad j=1,2, \tag{42}
\]

P P where Aj ¼ vj , Bj ¼ ½1=2½uj þ vj , Cj ¼ ½1=2 ni¼ 1 yi ½uij þ vij , and Dj ¼ ½1=2 ni¼ 1 ½uij þ vij =yi , for j =1,2. Thus, in the case of complete-data, the ML estimates of the parameters can all be obtained explicitly. Now, we are ready to implement the EM algorithm. The existence of explicit expressions of the ML estimates simplifies the implementation of the EM algorithm as well as makes it computationally efficient. ðmÞ Suppose, at the mth stage of the EM algorithm, the ML estimate of the unknown parameter h is hðmÞ ¼ ðmðmÞ 1 , l1 , ðm þ 1Þ ðmÞ ðmÞ ðmÞ ðmÞ > p1 , m2 , l2 ,p2 Þ . We now describe how the next iteration h can be obtained. At the E-step of the EM algorithm, the pseudo-log-likelihood function is formed by replacing the missing values with their expectation. Since in this case uij’s and vij’s are missing, the pseudo-log-likelihood function at the mth stage is obtained from (m) (m) and bij , respectively, where aðmÞ ¼ E½Uij jyi , ðmÞ  and bðmÞ ¼ E½Vij jyi , ðmÞ , for i= 1,y,n and ij ij (m) (m) j = 1,2. Note that here aij and bij are the usual posterior probabilities associated with the ith observation, which are given by

(37) by replacing uij and vij with aij

h

h

ðmÞ

ðmÞ pðmÞ j fX1 ðyi ; mj , lj Þ P ðmÞ ðmÞ ðmÞ ðmÞ ðmÞ ðmÞ k r ¼ 1 pr fX1 ðyi ; mr , lr Þ þ r ¼ 1 pr fX2 ðyi ; mr , lr Þ

¼ Pk aðmÞ ij

ð43Þ

and ðmÞ

pjðmÞ fX2 ðyi ; mðmÞ j , lj Þ , Pk ðmÞ ðmÞ ðmÞ ðmÞ ðmÞ ðmÞ r ¼ 1 pr fX1 ðyi ; mr , lr Þ þ r ¼ 1 pr fX2 ðyi ; mr , lr Þ

bðmÞ ij ¼ Pk

i ¼ 1, . . . , n,

j ¼ 1, 2,

ð44Þ

N. Balakrishnan et al. / Journal of Statistical Planning and Inference 141 (2011) 2175–2190

2185

respectively; see McLachlan and Peel (2000, pp. 48–49). Consequently, the pseudo-log-likelihood function at the $m$th stage can be expressed as

$$\ell^{(ps)}(\theta\,|\,\theta^{(m)}) = \sum_{i=1}^{n}\sum_{j=1}^{2} a_{ij}^{(m)}\log\!\left(\frac{p_j}{2}\,f_{X_1}(y_i;\mu_j,\lambda_j)\right) + \sum_{i=1}^{n}\sum_{j=1}^{2} b_{ij}^{(m)}\log\!\left(\frac{p_j}{2}\,f_{X_2}(y_i;\mu_j,\lambda_j)\right). \tag{45}$$

The M-step of the EM algorithm involves the maximization of (45) with respect to the unknown parameters. Let us now denote $a_j^{(m)} = \sum_{i=1}^{n} a_{ij}^{(m)}$, $b_j^{(m)} = \sum_{i=1}^{n} b_{ij}^{(m)}$, $A_j^{(m)} = b_j^{(m)}$, $B_j^{(m)} = \frac{1}{2}[a_j^{(m)}+b_j^{(m)}]$, $C_j^{(m)} = \frac{1}{2}\sum_{i=1}^{n} y_i[a_{ij}^{(m)}+b_{ij}^{(m)}]$, and $D_j^{(m)} = \frac{1}{2}\sum_{i=1}^{n}[a_{ij}^{(m)}+b_{ij}^{(m)}]/y_i$. Then, the elements of $\theta^{(m+1)}$ can be obtained as

$$p_j^{(m+1)} = \frac{a_j^{(m)}+b_j^{(m)}}{n}, \tag{46}$$

$$\mu_j^{(m+1)} = \frac{B_j^{(m)}[A_j^{(m)}-B_j^{(m)}] + \sqrt{[B_j^{(m)}\{A_j^{(m)}-B_j^{(m)}\}]^2 + A_j^{(m)}C_j^{(m)}D_j^{(m)}[2B_j^{(m)}-A_j^{(m)}]}}{D_j^{(m)}A_j^{(m)}}, \tag{47}$$

and

$$\lambda_j^{(m+1)} = \frac{[a_j^{(m)}+b_j^{(m)}][\mu_j^{(m+1)}]^2}{\sum_{i=1}^{n}[a_{ij}^{(m)}+b_{ij}^{(m)}][y_i-\mu_j^{(m+1)}]^2/y_i}, \qquad j=1,2. \tag{48}$$
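To make the E- and M-steps above concrete, the following is a minimal numerical sketch of one full iteration (43)–(48). It assumes that $f_{X_1}$ is the inverse Gaussian IG$(\mu,\lambda)$ density and $f_{X_2}$ its length-biased version, in line with the mixture representation used for the BS distribution earlier in the paper; the data and starting values in any call are purely illustrative.

```python
import numpy as np

def f_x1(y, mu, lam):
    # Inverse Gaussian IG(mu, lambda) density (assumed form of f_{X1})
    return np.sqrt(lam / (2 * np.pi * y**3)) * np.exp(-lam * (y - mu)**2 / (2 * mu**2 * y))

def f_x2(y, mu, lam):
    # Length-biased IG density: y f_{X1}(y) / E[X1], with E[X1] = mu
    return y * f_x1(y, mu, lam) / mu

def em_step(y, mu, lam, p):
    """One EM iteration: E-step (43)-(44), M-step (46)-(48)."""
    n = len(y)
    dens1 = np.array([p[j] * f_x1(y, mu[j], lam[j]) for j in range(2)])  # 2 x n
    dens2 = np.array([p[j] * f_x2(y, mu[j], lam[j]) for j in range(2)])
    denom = dens1.sum(axis=0) + dens2.sum(axis=0)
    a, b = dens1 / denom, dens2 / denom          # posterior probabilities a_ij, b_ij
    mu_n, lam_n, p_n = np.empty(2), np.empty(2), np.empty(2)
    for j in range(2):
        w = a[j] + b[j]
        A, B = b[j].sum(), 0.5 * w.sum()
        C, D = 0.5 * np.sum(y * w), 0.5 * np.sum(w / y)
        # Closed-form mu update (47); the discriminant is nonnegative since 2B - A >= 0
        mu_n[j] = (B * (A - B) + np.sqrt((B * (A - B))**2 + A * C * D * (2 * B - A))) / (D * A)
        lam_n[j] = w.sum() * mu_n[j]**2 / np.sum(w * (y - mu_n[j])**2 / y)   # (48)
        p_n[j] = w.sum() / n                                                 # (46)
    return mu_n, lam_n, p_n
```

Iterating `em_step` until the relative change in the log-likelihood is small yields the ML estimates; since the M-step is available in closed form, each iteration is inexpensive.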

4.2. Model 2

Now, consider a random sample of size $n$, say $Y = (Y_1,\ldots,Y_n)^\top$, where $Y_i$, for $i=1,\ldots,n$, has a distribution with PDF as in (33). Based on this sample, we wish to estimate the five model parameters, viz., $\alpha_1$, $\beta_1$, $\alpha_2$, $\beta_2$, and $p$. It is evident that if we want to maximize the log-likelihood function directly with respect to the unknown parameters, we would need to solve a five-dimensional optimization problem, which would be computationally quite involved. On the other hand, it has already been mentioned that the computation of the ML estimates for both the BS and LBS distributions involves solving only one non-linear equation each. We now treat this problem as a standard missing value problem. The complete data are of the form $\{y,\delta\}$, where $\delta$ is the observed value of the binary RV $\Delta$ taking on the values 0 or 1. If $\Delta = 1$, the corresponding $Y$ arises from the BS distribution, and if $\Delta = 0$, then $Y$ arises from the LBS distribution. Moreover, $P(\Delta=1) = p$ and $P(\Delta=0) = 1-p$. If the complete data, say $y^{(c)} = (y,\delta)^\top = (\{y_1,\delta_1\},\ldots,\{y_n,\delta_n\})^\top$, where $y = (y_1,\ldots,y_n)^\top$ and $\delta = (\delta_1,\ldots,\delta_n)^\top$, are available, then the complete-data log-likelihood function is given by

$$\ell^{(c)}(\theta) = \log(p)\sum_{i=1}^{n}\delta_i + \log(1-p)\sum_{i=1}^{n}[1-\delta_i] + \sum_{i=1}^{n}\delta_i\log(f_T(y_i;\alpha_1,\beta_1)) + \sum_{i=1}^{n}[1-\delta_i]\log(f_L(y_i;\alpha_2,\beta_2)), \tag{49}$$

where now $\theta = (\alpha_1,\beta_1,\alpha_2,\beta_2,p)^\top$ and $f_T(\cdot\,;\alpha_1,\beta_1)$ and $f_L(\cdot\,;\alpha_2,\beta_2)$ are the densities of the BS and LBS distributions, respectively. The ML estimate of $\theta$ can be obtained by maximizing (49) with respect to its elements. First, we obtain the ML estimate of $p$ based on the complete data as

$$\hat{p}^{(c)} = \frac{1}{n}\sum_{i=1}^{n}\delta_i = \bar{\delta}. \tag{50}$$

The ML estimates of $\alpha_1$ and $\beta_1$ based on the complete data, say $\hat{\alpha}_1^{(c)}$ and $\hat{\beta}_1^{(c)}$, can be obtained by maximizing $\sum_{i=1}^{n}\delta_i\log(f_T(y_i;\alpha_1,\beta_1))$ with respect to $\alpha_1$ and $\beta_1$, respectively. It may be easily seen that $\hat{\beta}_1^{(c)}$ can be obtained as the solution of $g^{(c)}(\beta) = \beta$, where

$$g^{(c)}(\beta) = \left[\frac{2}{n}\sum_{i=1}^{n}\frac{\beta\,\delta_i\{\beta/y_i\}^{3/2}}{\{\beta/y_i\}^{1/2}+\{\beta/y_i\}^{3/2}} + \frac{1}{n\{\hat{\alpha}_1^{(c)}(\beta)\}^{2}}\sum_{i=1}^{n}\delta_i y_i\right]\left[\bar{\delta} + \frac{\beta}{n\{\hat{\alpha}_1^{(c)}(\beta)\}^{2}}\sum_{i=1}^{n}\frac{\delta_i}{y_i}\right]^{-1}, \tag{51}$$

with

$$\hat{\alpha}_1^{(c)}(\beta) = \left[\frac{1}{n\bar{\delta}}\sum_{i=1}^{n}\delta_i\left(\frac{y_i}{\beta} + \frac{\beta}{y_i} - 2\right)\right]^{1/2}. \tag{52}$$
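The fixed-point recursion $g^{(c)}(\beta)=\beta$ in (51), with $\hat{\alpha}_1^{(c)}(\beta)$ from (52), can be sketched as follows. The damping of the iterates and the starting value are pragmatic choices for numerical stability, not prescriptions from the paper.

```python
import numpy as np

def alpha1_hat(beta, y, d):
    # alpha_1^{(c)}(beta) as in (52); d holds the binary weights delta_i
    return np.sqrt(np.sum(d * (y / beta + beta / y - 2)) / (len(y) * d.mean()))

def g_c(beta, y, d):
    # One evaluation of the fixed-point map g^{(c)} in (51)
    n = len(y)
    h = (beta / y) ** 0.5 + (beta / y) ** 1.5
    a2 = alpha1_hat(beta, y, d) ** 2
    num = (2.0 / n) * np.sum(beta * d * (beta / y) ** 1.5 / h) + np.sum(d * y) / (n * a2)
    den = d.mean() + (beta / (n * a2)) * np.sum(d / y)
    return num / den

def solve_beta(y, d, beta0, tol=1e-10, max_iter=2000):
    # Damped fixed-point iteration: beta <- (beta + g(beta)) / 2
    beta = beta0
    for _ in range(max_iter):
        beta_new = 0.5 * (beta + g_c(beta, y, d))
        if abs(beta_new - beta) < tol * beta:
            return beta_new
        beta = beta_new
    return beta
```

With all $\delta_i = 1$ this reduces to the usual ML estimation for a complete BS sample, which provides a simple sanity check of the recursion.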

We can use an iterative scheme similar to that suggested in Subsection 2.1 to compute $\hat{\beta}_1^{(c)}$ from (51). Once $\hat{\beta}_1^{(c)}$ is obtained, we can readily compute $\hat{\alpha}_1^{(c)} = \hat{\alpha}_1^{(c)}(\hat{\beta}_1^{(c)})$ from (52).

Similarly, the ML estimates of $\alpha_2$ and $\beta_2$ based on the complete data, say $\hat{\alpha}_2^{(c)}$ and $\hat{\beta}_2^{(c)}$, can be obtained by maximizing $\sum_{i=1}^{n}[1-\delta_i]\log(f_L(y_i;\alpha_2,\beta_2))$ with respect to $\alpha_2$ and $\beta_2$, respectively. In this case, $\hat{\beta}_2^{(c)}$ can once again be computed as the solution of $g^{(c)}(\beta) = \beta$, where now

$$g^{(c)}(\beta) = \left[\frac{2\beta}{n}\sum_{i=1}^{n}\frac{\{1-\delta_i\}\{\beta/y_i\}^{3/2}}{\{\beta/y_i\}^{1/2}+\{\beta/y_i\}^{3/2}} + \frac{1}{n\{\hat{\alpha}_2^{(c)}(\beta)\}^{2}}\sum_{i=1}^{n}\{1-\delta_i\}y_i\right]\left[3\{1-\bar{\delta}\} + \frac{\beta}{n\{\hat{\alpha}_2^{(c)}(\beta)\}^{2}}\sum_{i=1}^{n}\frac{1-\delta_i}{y_i}\right]^{-1}, \tag{53}$$

with

$$\hat{\alpha}_2^{(c)}(\beta) = \left(\frac{\{\tilde{A}(\beta)-2[1-\bar{\delta}]\} + \sqrt{\{\tilde{A}(\beta)-2[1-\bar{\delta}]\}^2 + 24[1-\bar{\delta}]\tilde{A}(\beta)}}{6[1-\bar{\delta}]}\right)^{1/2} \tag{54}$$

and

$$\tilde{A}(\beta) = \frac{1}{n}\sum_{i=1}^{n}[1-\delta_i]\left(\frac{y_i}{\beta} + \frac{\beta}{y_i} - 2\right).$$

By employing an iterative scheme similar to the one suggested earlier for Model 1, we can compute $\hat{\beta}_2^{(c)}$ from (53). Once $\hat{\beta}_2^{(c)}$ is obtained, we can readily compute $\hat{\alpha}_2^{(c)} = \hat{\alpha}_2^{(c)}(\hat{\beta}_2^{(c)})$ from (54).

Now, we are ready to implement the EM algorithm for the computation of the ML estimates of the parameters of the MBSLBS model. Suppose, at the $m$th stage of the EM algorithm, the corresponding estimate of the parameter vector $\theta$ is $\theta^{(m)} = (\alpha_1^{(m)},\beta_1^{(m)},\alpha_2^{(m)},\beta_2^{(m)},p^{(m)})^\top$. Then, at this stage of the EM algorithm, the E-step can be formed by replacing the missing

values with their expectations in the complete-data log-likelihood function, thus forming the pseudo-log-likelihood function. Since $\delta_i$ is missing, we replace it by $\delta_i^{(m)}$, where

$$\delta_i^{(m)} = \frac{p^{(m)} f_T(y_i;\alpha_1^{(m)},\beta_1^{(m)})}{p^{(m)} f_T(y_i;\alpha_1^{(m)},\beta_1^{(m)}) + [1-p^{(m)}] f_L(y_i;\alpha_2^{(m)},\beta_2^{(m)})}, \tag{55}$$

which is the conditional posterior expectation of $\Delta_i$ given the observation and $\theta^{(m)}$; see McLachlan and Peel (2000, pp. 47–50). Once the pseudo-log-likelihood function is formed, at the M-step we maximize it to obtain $\theta^{(m+1)}$. Therefore, $\alpha_1^{(m+1)}$, $\beta_1^{(m+1)}$, $\alpha_2^{(m+1)}$, $\beta_2^{(m+1)}$, and $p^{(m+1)}$ are obtained as $\hat{\alpha}_1^{(c)}$, $\hat{\beta}_1^{(c)}$, $\hat{\alpha}_2^{(c)}$, $\hat{\beta}_2^{(c)}$, and $\hat{p}^{(c)}$ by replacing $\delta_i$ with $\delta_i^{(m)}$, where $\hat{\alpha}_1^{(c)}$, $\hat{\beta}_1^{(c)}$, $\hat{\alpha}_2^{(c)}$, $\hat{\beta}_2^{(c)}$, and $\hat{p}^{(c)}$ are

as defined earlier in Section 4.2.

4.3. Model 3

Now, consider a random sample of size $n$, say $Y = (Y_1,\ldots,Y_n)^\top$, where $Y_i$, for $i=1,\ldots,n$, has a distribution with PDF as in (35). Based on this sample, we wish to estimate the parameter vector now given by $\theta = (\alpha,\beta,p)^\top$. In this case as well, we define $\Delta$ exactly in the same way as in Model 2. Thus, up to an additive constant, the complete-data log-likelihood becomes

$$\ell^{(c)}(\theta) = -n\log(\alpha) - \left[n-\sum_{i=1}^{n}\delta_i\right]\log(\alpha^2+2) - \frac{1}{2\alpha^2}\sum_{i=1}^{n}\left(\frac{y_i}{\beta}+\frac{\beta}{y_i}-2\right) + \left[\sum_{i=1}^{n}\delta_i - 2n\right]\log(\beta) + \sum_{i=1}^{n}\log\!\left(\left\{\frac{\beta}{y_i}\right\}^{1/2}+\left\{\frac{\beta}{y_i}\right\}^{3/2}\right) + \sum_{i=1}^{n}\delta_i\log(p) + \sum_{i=1}^{n}[1-\delta_i]\log(1-p). \tag{56}$$

The ML estimate of $p$ for the complete data, say $\hat{p}^{(c)}$, is the same as before, i.e., $\hat{p}^{(c)} = \frac{1}{n}\sum_{i=1}^{n}\delta_i$. For fixed $\beta$, the ML estimate of $\alpha$ for the complete-data set, say $\hat{\alpha}^{(c)}(\beta)$, is of the same form as in (23). The corresponding ML estimate of $\beta$, say $\hat{\beta}^{(c)}$, can then be obtained by maximizing the profile complete-data log-likelihood function $\ell^{(c)}(\hat{\alpha}^{(c)}(\beta),\beta,\hat{p}^{(c)})$ with respect to $\beta$. As with the maximization of the profile log-likelihood function $\ell_{LBS}(\hat{\alpha}^{(c)}(\beta),\beta)$ given in (22), this can be done by solving the non-linear equation

$$-\frac{3-2\hat{p}^{(c)}}{\beta} + \frac{1}{n\beta}\sum_{i=1}^{n}\frac{2[\beta/y_i]^{3/2}}{[\beta/y_i]^{1/2}+[\beta/y_i]^{3/2}} + \frac{s}{\beta^{2}[\hat{\alpha}^{(c)}(\beta)]^{2}} - \frac{1}{r[\hat{\alpha}^{(c)}(\beta)]^{2}} = 0, \tag{57}$$

where $s$ and $r$ are as defined earlier in (10). It is of interest to observe that (57) can also be solved as a fixed-point type equation $g(\beta)=\beta$, where now

$$g(\beta) = \left[\frac{2\beta}{n}\sum_{i=1}^{n}\frac{\{\beta/y_i\}^{3/2}}{\{\beta/y_i\}^{1/2}+\{\beta/y_i\}^{3/2}} + \frac{s}{\{\hat{\alpha}^{(c)}(\beta)\}^{2}}\right]\left[\frac{\beta}{r\{\hat{\alpha}^{(c)}(\beta)\}^{2}} + \{3-2\hat{p}^{(c)}\}\right]^{-1}. \tag{58}$$

A simple iterative scheme, such as the one outlined for Models 1 and 2, can be used to solve the fixed-point type equation in (58). Once $\hat{\beta}^{(c)}$ is obtained, the ML estimate of $\alpha$ for the complete-data set can be obtained as $\hat{\alpha}^{(c)} = \hat{\alpha}^{(c)}(\hat{\beta}^{(c)})$, where $\hat{\alpha}^{(c)}(\beta)$ is the same as defined in (23).

Now, we are ready to implement the EM algorithm for the computation of the corresponding estimates of the parameters of the RMBSLBS model. Suppose, at the $m$th stage of the EM algorithm, the corresponding estimate of the parameter vector $\theta$ is $\theta^{(m)} = (\alpha^{(m)},\beta^{(m)},p^{(m)})^\top$. At the $m$th stage of the EM algorithm, the E-step is formed by


replacing the missing value of $\delta_i$ with its expectation in the complete-data log-likelihood function, thus forming the pseudo-log-likelihood function. In this case, the missing $\delta_i$ is replaced by its conditional expectation $\delta_i^{(m)}$, given by

$$\delta_i^{(m)} = \frac{p^{(m)} f_T(y_i;\alpha^{(m)},\beta^{(m)})}{p^{(m)} f_T(y_i;\alpha^{(m)},\beta^{(m)}) + [1-p^{(m)}] f_L(y_i;\alpha^{(m)},\beta^{(m)})}. \tag{59}$$

Once the pseudo-log-likelihood function is formed, at the M-step we maximize it to obtain $\theta^{(m+1)}$. Hence, $\alpha^{(m+1)}$, $\beta^{(m+1)}$, and $p^{(m+1)}$ are obtained as $\hat{\alpha}^{(c)}$, $\hat{\beta}^{(c)}$, and $\hat{p}^{(c)}$, by replacing $\delta_i$ with $\delta_i^{(m)}$, where $\hat{\alpha}^{(c)}$, $\hat{\beta}^{(c)}$, and $\hat{p}^{(c)}$ are as defined earlier in Subsection 4.3.
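The E-steps in (55) and (59) only require evaluating the BS and LBS densities. The following is a minimal sketch of these posterior probabilities and of the resulting update of $p$; the updates of $\alpha$ and $\beta$ via the fixed-point equations are omitted here, and $f_L$ uses $E[T]=\beta(1+\alpha^2/2)$.

```python
import numpy as np

def f_T(t, alpha, beta):
    # Birnbaum-Saunders density
    h = (beta / t) ** 0.5 + (beta / t) ** 1.5
    q = t / beta + beta / t - 2.0
    return h / (2 * alpha * beta * np.sqrt(2 * np.pi)) * np.exp(-q / (2 * alpha**2))

def f_L(t, alpha, beta):
    # Length-biased BS density: t f_T(t) / E[T], with E[T] = beta (1 + alpha^2 / 2)
    return t * f_T(t, alpha, beta) / (beta * (1 + alpha**2 / 2))

def e_step(y, alpha, beta, p):
    # delta_i^{(m)} in (59): posterior probability that y_i comes from the BS component
    num = p * f_T(y, alpha, beta)
    return num / (num + (1 - p) * f_L(y, alpha, beta))

def update_p(y, alpha, beta, p):
    # M-step update of p: the average of the posterior probabilities
    return e_step(y, alpha, beta, p).mean()
```

For Model 2, the same construction applies with separate $(\alpha_1,\beta_1)$ passed to `f_T` and $(\alpha_2,\beta_2)$ passed to `f_L`, as in (55).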

5. Analysis of two data sets

In this section, for the purpose of illustration, we analyze two data sets by using the three mixture models proposed in the preceding sections.

5.1. Data sets

Next, the two data sets under analysis are described and presented.

5.1.1. Enzyme data (S1)

These data correspond to the enzymatic activity in the blood and represent the metabolism of carcinogenic substances among 245 unrelated individuals. This enzymatic activity is quantified by the molar ratio between two metabolites of caffeine, identified as 5-acetylamino-6-formylamino-3-methyluracil (AFMU) and 1-methyl-xanthine (1X), both measured in urine. This ratio is known as AFMU/1X, which is a marker of genetic acetylation status 24 h after ingestion of 200 mg caffeine. The observed values for these data are the following (frequency in parentheses and none when it is equal to one): 0.021, 0.031, 0.044, 0.061, 0.070, 0.077, 0.078, 0.080, 0.081, 0.083(2), 0.084, 0.085, 0.088, 0.096(2), 0.100, 0.106, 0.108(2), 0.109, 0.111, 0.112, 0.113, 0.118(2), 0.122, 0.124(3), 0.126(3), 0.128, 0.129, 0.130(2), 0.131, 0.132(2), 0.134, 0.137(2), 0.138, 0.142(2), 0.144, 0.148(2), 0.149(2), 0.151, 0.152(2), 0.157, 0.159, 0.162(3), 0.166(2), 0.167(2), 0.171, 0.172(2), 0.174, 0.175, 0.176(2), 0.178, 0.179(2), 0.180(2), 0.182, 0.183, 0.184(2), 0.185(2), 0.190(2), 0.191, 0.192(4), 0.193, 0.194, 0.195, 0.198(3), 0.200(2), 0.204, 0.205, 0.207, 0.210, 0.213, 0.214, 0.215, 0.216, 0.222, 0.224, 0.225, 0.230(2), 0.232(2), 0.236, 0.238, 0.240, 0.241, 0.246, 0.250, 0.255, 0.258(3), 0.263, 0.264, 0.265, 0.275, 0.277, 0.279, 0.280, 0.291(2), 0.292, 0.293, 0.299, 0.305(2), 0.308, 0.309, 0.313, 0.320(2), 0.325, 0.340, 0.342, 0.347, 0.357, 0.360, 0.368, 0.379, 0.387, 0.408, 0.409, 0.466, 0.520, 0.673, 0.687, 0.709, 0.718, 0.818, 0.839, 0.853, 0.860, 0.867, 0.875, 0.895, 0.902, 0.909, 0.923, 0.929, 0.933, 0.945, 0.953(2), 0.962, 0.967, 0.978, 0.985, 0.998, 1.003, 1.004, 1.007, 1.010, 1.018, 1.036(2), 1.040, 1.052, 1.073, 1.075, 1.113, 1.123, 1.161, 1.166, 1.169, 1.173, 1.176, 1.195(2), 1.200, 1.229, 1.231, 1.241, 1.261, 1.271, 1.279, 1.293, 1.298, 1.300, 1.310, 1.326, 1.329, 1.361, 1.369, 1.371, 1.374, 1.405, 1.419, 1.452, 1.461, 1.488, 1.516, 1.567, 1.573, 1.604, 1.622, 1.625, 1.633, 1.672, 1.713, 1.741, 1.742, 1.768, 1.811, 1.848, 1.861, 1.885, 1.944, 1.950, 2.016, 2.183, 2.264, 2.338, 2.427, 2.518, 2.545, 2.880. These data have been analyzed earlier by Bechtel et al. (1993). They observed that a mixture of two skewed distributions is suitable for analyzing these enzyme data.

5.1.2. Depressive condition data (S2)

These data correspond to a scale that measures behavioral and emotional problems by an instrument composed of 19 items among 134 unrelated individuals. Each item is scored from 1 to 3, where the value "3" indicates a greater tendency to a depressive condition. Thus, the total score of the scale for each individual is in the range [19, 57]. In this way, the individual can be classified as having a depressive condition or not according to this scale. A study conducted by Dr. Nelson Araneda from the University of La Frontera, Temuco, Chile, in collaboration with some of the authors, in a city located in the southern part of Chile, resulted in real data corresponding to the scores of this scale. The observed values for these data are the following (frequency in parentheses and none when it is equal to one): 19(16), 20(15), 21(14), 22(9), 23(12), 24(10), 25(6), 26(9), 27(8), 28(5), 29(6), 30(4), 31(3), 32(4), 33, 34, 35(4), 36(2), 37(2), 39, 42, 44. These data have been analyzed earlier by Leiva et al. (2010a). As in the case of the enzyme data, these authors observed that a mixture of two skewed distributions is suitable for analyzing the depressive condition data.

5.2. Exploratory data analysis

Next, an exploratory data analysis for both data sets is carried out. For S2, the value 16 was subtracted from each observation. Table 2 summarizes some descriptive statistics and Fig.
5 displays the histograms for S1 and S2.

Table 2
Descriptive statistics for the indicated data.

Data   Median   Mean   SD     CV (%)   CS     CK     Range   Min.   Max.    n
S1     0.26     0.62   0.62   99.94    1.19   3.58   2.86    0.02   2.88    245
S2     8.00     8.96   5.36   59.80    1.11   3.89   25.00   3.00   28.00   134

From this table,
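The entries of Table 2 can be reproduced with standard sample formulas. The conventions below — CV as 100·SD/mean, CS as the standardized third sample moment, and CK as the standardized fourth (non-excess) sample moment — are assumptions, since the table does not spell them out.

```python
import numpy as np

def describe(x):
    # Descriptive statistics in the layout of Table 2 (conventions assumed, see text)
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std(ddof=1)
    z = (x - m) / s
    return {
        "median": np.median(x),
        "mean": m,
        "sd": s,
        "cv_pct": 100 * s / m,   # coefficient of variation, in percent
        "cs": np.mean(z**3),     # sample skewness coefficient
        "ck": np.mean(z**4),     # sample kurtosis (not excess kurtosis)
        "range": x.max() - x.min(),
        "min": x.min(),
        "max": x.max(),
        "n": len(x),
    }
```

Applying `describe` to the raw S1 values (and to the shifted S2 scores) gives statistics directly comparable with the table, up to the exact moment conventions used by the authors.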


Fig. 5. Histograms of S1 (a) and S2 (b).

positively skewed distributions with non-negative support seem to be appropriate for S1 and S2. From the histograms of the data, it is possible to observe that each underlying distribution can be a mixture of two distributions. Next, we fit the three proposed mixture models to S1 and S2.

5.3. Analysis using the mixture models

5.3.1. Model 1

In order to fit the MTBS distribution to S1 and S2, we need an initial guess of the parameters to start the EM algorithm. For this purpose, we use the method of Finch et al. (1989), which is as follows. Choose $p_1$ uniformly on (0, 1), say $p_1^{(0)}$. Order the observations as $y_{(1)},\ldots,y_{(n)}$. Now, based on the subsets $\{y_{(1)},\ldots,y_{(r)}\}$ and $\{y_{(r+1)},\ldots,y_{(n)}\}$, estimate $\theta_1 = (\alpha_1,\beta_1)^\top$ and $\theta_2 = (\alpha_2,\beta_2)^\top$, respectively, where $r$ is the integer part of $np_1^{(0)}$. For the initial guess of the parameters $\alpha_j$ and $\beta_j$, for $j=1,2$, we utilize the modified moment estimates of $\alpha$ and $\beta$ proposed by Ng et al. (2003), which are given explicitly as

$$\tilde{\alpha}_j = \left[2\left\{\sqrt{\frac{s_j}{r_j}} - 1\right\}\right]^{1/2} \quad\text{and}\quad \tilde{\beta}_j = [s_j r_j]^{1/2}, \qquad j=1,2,$$

where

$$s_1 = \frac{1}{r}\sum_{i=1}^{r} y_{(i)}, \qquad s_2 = \frac{1}{n-r}\sum_{i=r+1}^{n} y_{(i)}, \qquad r_1 = \left[\frac{1}{r}\sum_{i=1}^{r}\frac{1}{y_{(i)}}\right]^{-1}, \quad\text{and}\quad r_2 = \left[\frac{1}{n-r}\sum_{i=r+1}^{n}\frac{1}{y_{(i)}}\right]^{-1}.$$

The computation of these initial guess values can easily be carried out with the R package bs, which is available from CRAN.R-project.org; see Leiva et al. (2006). For different choices of $p_1$, we perform the EM algorithm and, if the relative absolute difference of the log-likelihood values between two consecutive EM steps is less than $10^{-4}$, we stop the EM algorithm. Through this approach, we determine the ML estimates of the Model 1 parameters based on S1 and S2, which are given in Table 3.

5.3.2. Model 2

In order to fit the MBSLBS distribution to S1 and S2, we employ the following method to obtain initial guess values for the unknown parameters. We choose some guess value of $p$, say $0 < p^{(0)} < 1$, and then for each data point we assign group 1 (BS)

Table 3
ML estimates, MLL and SIC values, and KS distances with their associated p-values for the three mixture models fitted to the indicated data.

Set   Model   α1      β1      α2       β2       p       −MLL      SIC     KS      p-value
S1    1       0.533   0.175   0.319    1.274    0.629   59.168    0.298   0.053   0.507
S1    2       0.365   0.171   1.274    0.213    0.450   71.091    0.346   0.111   0.005
S1    3       1.038   0.216   1.038    0.216    0.417   115.899   0.507   0.151   <0.001
S2    1       0.579   7.300   0.135    20.256   0.963   387.522   2.983   0.091   0.218
S2    2       0.603   7.579   61.406   0.001    0.997   388.094   2.988   0.092   0.206
S2    3       0.598   5.665   0.598    5.665    0.176   388.726   2.992   0.085   0.283
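The KS distances reported in Table 3 measure the largest discrepancy between the empirical CDF and a fitted CDF. A minimal sketch follows, assuming the BS CDF $\Phi([\sqrt{t/\beta}-\sqrt{\beta/t}]/\alpha)$; for a fitted mixture, the CDF passed in would be the corresponding weighted combination of component CDFs.

```python
import math
import numpy as np

def bs_cdf(t, alpha, beta):
    # Birnbaum-Saunders CDF: Phi((sqrt(t/beta) - sqrt(beta/t)) / alpha)
    x = (np.sqrt(t / beta) - np.sqrt(beta / t)) / alpha
    return 0.5 * (1 + np.vectorize(math.erf)(x / math.sqrt(2)))

def ks_distance(y, cdf):
    # Maximum distance between the empirical CDF and a fitted CDF
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    F = cdf(y)
    upper = np.max(np.arange(1, n + 1) / n - F)
    lower = np.max(F - np.arange(0, n) / n)
    return max(upper, lower)
```

For Model 1, for instance, one would pass `lambda t: p * bs_cdf(t, a1, b1) + (1 - p) * bs_cdf(t, a2, b2)` with the fitted values from Table 3; evaluating against a single BS CDF, as below, is only a smoke test of the machinery.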

Fig. 6. ESF and FSF of the three mixture models for S1 (a) and S2 (b).

with probability $p^{(0)}$ and group 2 (LBS) with probability $1-p^{(0)}$. Next, based on the group 1 observations, we estimate the parameters $\alpha_1$ and $\beta_1$, and those are the guess values of $\alpha_1$ and $\beta_1$. Similarly, based on the group 2 observations, we obtain the guess values of the parameters $\alpha_2$ and $\beta_2$. Using these initial guess values and for different choices of $p$, we perform the EM algorithm and adopt the same stopping criterion as in Model 1. Through this approach, we determine the ML estimates of the Model 2 parameters based on S1 and S2, which are given in Table 3.

5.3.3. Model 3

In order to fit the RMBSLBS distribution to S1 and S2, we determine the initial estimates for the parameters in the same way as in Model 2. First, we obtain the initial estimates of $\alpha_1$ and $\alpha_2$ and then take the initial estimate of $\alpha$ as the average of these two estimates. Similarly, we also obtain the initial estimate of $\beta$, and we adopt the same stopping criterion as stated above. Through this approach, we compute the ML estimates of the Model 3 parameters based on S1 and S2, which are given in Table 3.

5.3.4. Model checking

The natural question that arises now is how good the fits of these models are to S1 and S2. For this purpose, we calculate the Kolmogorov–Smirnov (KS) distance between the empirical CDF and the estimated theoretical CDF for the three models. The KS distances for Models 1–3 are also given in Table 3. The empirical survival function (ESF) and the fitted (estimated theoretical) survival function (FSF) for the three models are presented in Fig. 6. Based on the KS distances and the associated p-values, we conclude that Model 1 provides the best fit to the enzyme data and that Model 3 provides the best fit to the depressive condition data. These results are corroborated by the negative of the maximized log-likelihood (MLL) values and by the Schwarz information criterion (SIC).

6. Concluding comments

In this paper, we have considered three different mixture models based on the Birnbaum–Saunders and length-biased Birnbaum–Saunders distributions, and discussed some of their characteristics. Since in this case the ML estimation by direct maximization of the log-likelihood function is numerically quite involved, we have proposed the EM algorithm for this purpose in order to simplify the estimation procedure. We have analyzed two real data sets for the purpose of illustration, and it has been demonstrated that the proposed mixture models as well as the suggested EM algorithm work very well. The shapes of the density and hazard rate functions are important indicators for data analysis as well as for model selection. For the three mixture models, we have used graphical tools to examine their shapes. An analytical examination of the form and features of these functions, for the proposed mixture models, is of interest and remains open. For Model 3, however, it can be shown that the density and hazard rate functions are both unimodal.

Acknowledgments The authors wish to thank the anonymous referees for their constructive comments on an earlier version of this manuscript which resulted in this improved version. Research work of V. Leiva and A. Sanhueza was partially supported by FONDECYT 1080326 and 1090265 Grants from Chile.


References

Balakrishnan, N., Leiva, V., López, J., 2007. Acceptance sampling plans from truncated life tests from generalized Birnbaum–Saunders distribution. Comm. Statist. Simulation Comput. 36, 643–656.
Balakrishnan, N., Leiva, V., Sanhueza, A., Cabrera, E., 2009a. Mixture inverse Gaussian distribution and its transformations, moments and applications. Statistics 43, 91–104.
Balakrishnan, N., Leiva, V., Sanhueza, A., Vilca, F., 2009b. Estimation in the Birnbaum–Saunders distribution based on scale-mixture of normals and the EM-algorithm. Statist. Oper. Res. Trans. 33, 171–192.
Barros, M., Paula, G.A., Leiva, V., 2008. A new class of survival regression models with heavy-tailed errors: robustness and diagnostics. Lifetime Data Anal. 14, 316–332.
Bebbington, M., Lai, C.-D., Zitikis, R., 2008. A proof of the shape of the Birnbaum–Saunders hazard rate function. Math. Sci. 33, 49–56.
Bechtel, Y.C., Bonaiti-Pellié, C., Poisson, N., Magnette, J., Bechtel, P.R., 1993. A population and family study of N-acetyltransferase using caffeine urinary metabolites. Clin. Pharmacol. Therapeut. 54, 134–141.
Bhattacharyya, G.K., Fries, A., 1982. Fatigue failure models: Birnbaum–Saunders vs. inverse Gaussian. IEEE Trans. Rel. 31, 439–440.
Birnbaum, Z.W., Saunders, S.C., 1969. A new family of life distributions. J. Appl. Probab. 6, 319–327.
Chhikara, R.S., Folks, J.L., 1989. The Inverse Gaussian Distribution. Marcel Dekker, New York.
Cox, D.R., 1969. Some sampling problems in technology. In: Johnson, N.L., Smith Jr., H. (Eds.), New Developments in Survey Sampling. John Wiley and Sons, New York, pp. 506–527.
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B 39, 1–38.
Desmond, A.F., 1985. Stochastic models of failure in random environment. Canad. J. Statist. 13, 171–183.
Desmond, A.F., 1986. On the relationship between two fatigue-life models. IEEE Trans. Rel. 35, 167–169.
Efron, B., Hinkley, D., 1978. Assessing the accuracy of the maximum likelihood estimator: observed vs. expected Fisher information. Biometrika 65, 457–487.
Engelhardt, M., Bain, L.J., Wright, F.T., 1981. Inferences on the parameters of the Birnbaum–Saunders fatigue life distribution based on maximum likelihood estimation. Technometrics 23, 251–256.
Fisher, R.A., 1934. The effects of methods of ascertainment upon the estimation of frequencies. Ann. Eugenics 6, 13–25.
Finch, S.J., Mendell, N.R., Thode Jr., H.C., 1989. Probabilistic measures of adequacy of a numerical search for a global maximum. J. Amer. Statist. Assoc. 84, 1020–1023.
Gupta, R.C., Akman, O., 1995. On the reliability studies of a weighted inverse Gaussian distribution. J. Statist. Plann. Inference 48, 69–83.
Gupta, R.C., Akman, O., 1997. Estimation of critical points in the mixture of inverse Gaussian model. Statist. Papers 38, 445–452.
Gupta, R.C., Keating, J.P., 1985. Relations for reliability measures under length biased sampling. Scand. J. Statist. 13, 49–56.
Gupta, R.C., Kirmani, S.N.U.A., 1990. The role of weighted distributions in stochastic modeling. Comm. Statist. Theory Methods 19, 3147–3162.
Jain, K., Singh, H., Bagai, I., 1989. Relations for reliability measures of weighted distributions. Comm. Statist. Theory Methods 18, 4393–4412.
Johnson, N.L., Kotz, S., Balakrishnan, N., 1994. Continuous Univariate Distributions, Vol. 1, second ed. John Wiley and Sons, New York.
Johnson, N.L., Kotz, S., Balakrishnan, N., 1995. Continuous Univariate Distributions, Vol. 2, second ed. John Wiley and Sons, New York.
Jørgensen, B., 1982. Statistical Properties of the Generalized Inverse Gaussian Distribution. Springer-Verlag, New York.
Jørgensen, B., Seshadri, V., Whitmore, G.A., 1991. On the mixture of the inverse Gaussian distribution with its complementary reciprocal. Scand. J. Statist. 18, 77–89.
Jørgensen, B., 1997. The Theory of Dispersion Models. Chapman and Hall, London.
Khattree, R., 1989. Characterization of inverse Gaussian and gamma distributions through their length biased versions. IEEE Trans. Rel. 38, 610–611.
Kundu, D., Kannan, N., Balakrishnan, N., 2008. On the hazard function of Birnbaum–Saunders distribution and associated inference. Comput. Statist. Data Anal. 52, 2692–2702.
Leiva, V., Hernández, H., Riquelme, M., 2006. A new package for the Birnbaum–Saunders distribution. R Journal 6, 35–40.
Leiva, V., Barros, M., Paula, G.A., Galea, M., 2007. Influence diagnostics in log-Birnbaum–Saunders regression models with censored data. Comput. Statist. Data Anal. 51, 5694–5707.
Leiva, V., Barros, M., Paula, G., Sanhueza, D., 2008. Generalized Birnbaum–Saunders distributions applied to air pollutant concentration. Environmetrics 19, 235–249.
Leiva, V., Sanhueza, A., Angulo, J.M., 2009. A length-biased version of the Birnbaum–Saunders distribution with application in water quality. Stochastic Environ. Res. Risk Assess. 23, 299–307.
Leiva, V., Sanhueza, A., Kotz, S., Araneda, N., 2010a. A unified mixture model based on the inverse Gaussian distribution. Pakistan J. Statist. 26, 445–460.
Leiva, V., Vilca, F., Balakrishnan, N., Sanhueza, A., 2010b. A skewed sinh-normal distribution and its properties and application to air pollution. Comm. Statist. Theory Methods 39, 426–443.
Louis, T.A., 1982. Finding the observed information matrix when using the EM algorithm. J. Roy. Statist. Soc. B 44, 226–233.
Marshall, A.W., Olkin, I., 2007. Life Distributions. Springer-Verlag, New York.
McLachlan, G.J., Peel, D., 2000. Finite Mixture Models. John Wiley and Sons, New York.
Ng, H.K.T., Kundu, D., Balakrishnan, N., 2003. Modified moment estimation for the two-parameter Birnbaum–Saunders distribution. Comput. Statist. Data Anal. 43, 283–298.
Ng, H.K.T., Kundu, D., Balakrishnan, N., 2006. Point and interval estimations for the two-parameter Birnbaum–Saunders distribution based on type-II censored samples. Comput. Statist. Data Anal. 50, 3222–3242.
Olyede, B.O., George, E.O., 2002. On stochastic inequalities and comparisons of reliability measures for weighted distributions. Math. Probl. Eng. 8, 1–13.
Patil, G.P., 2002. Weighted distributions. In: El-Shaarawi, A., Piegorsch, W.W. (Eds.), Encyclopedia of Environmetrics, vol. 4. John Wiley and Sons, New York, pp. 2369–2377.
Patil, G.P., Rao, C.R., 1978. Weighted distributions and size-biased sampling with applications to wildlife populations and human families. Biometrics 34, 179–189.
Rao, C.R., 1965. On discrete distributions arising out of methods of ascertainment. Sankhyā A 27, 311–324.
Sanhueza, A., Leiva, V., Balakrishnan, N., 2008. The generalized Birnbaum–Saunders distribution and its theory, methodology and application. Comm. Statist. Theory Methods 37, 645–670.
Sansgiry, P.S., Akman, O., 2001. Reliability estimation via length-biased transformation. Comm. Statist. Theory Methods 30, 2473–2479.
Seshadri, V., 1993. The Inverse Gaussian Distribution: A Case Study in Exponential Families. Oxford University Press, Oxford, England.
Seshadri, V., 1999. The Inverse Gaussian Distribution: Statistical Theory and Applications. Springer-Verlag, New York.
Xiao, Q., Lu, Z., Balakrishnan, N., Lu, X., 2010. Estimation of the Birnbaum–Saunders regression model with current status data. Comput. Statist. Data Anal. 54, 326–332.