Statistics and Probability Letters 82 (2012) 252–261
Contents lists available at SciVerse ScienceDirect
Statistics and Probability Letters journal homepage: www.elsevier.com/locate/stapro
A two-parameter of weighted exponential distributions M.K. Shakhatreh Department of Statistcs, Yarmouk University, Irbid, Jordan
article
abstract
info
Article history: Received 30 March 2011 Received in revised form 9 October 2011 Accepted 12 October 2011 Available online 19 October 2011 Keywords: Asymptotic distribution Hazard function Maximum likelihood estimates Method of moments
In this paper, a class of distributions called the two-parameter weighted exponential distributions is introduced (TWE). This new class of distributions generalizes the ones – weighted exponential distributions (WE) – proposed by Gupta and Kundu (2009). The main properties of this new class of distributions are investigated. Several statistical properties and statistical inferences are then obtained and studied. Two real data sets of which one is a right censored data set were analyzed, and it is shown that in both two cases our model fits much better than WE or some other existing models. © 2011 Elsevier B.V. All rights reserved.
1. Introduction Azzalini (1985) proposed a novel method for introducing a skewness parameter to the normal distribution which is known in the literature as a skew-normal distribution. Several authors have done an extensive work on introducing skewness parameters for other symmetric distributions such as skew-logistic, skew-Cauchy, skew-Laplace, and skew-t. Moreover, their statistical properties and inference procedures have been investigated thoroughly in the literature; see for example, Genton (2004), Arnold and Beaver (2000a), and Nadarajah (2009). However, it seems that no attempts were made to implement Azzalini’s idea for non-symmetric distributions. Recently, Gupta and Kundu (2009) have followed a similar approach as of Azzalini for introducing a shape parameter to the exponential distribution. They showed that by applying Azzalini’s method to the exponential distribution, a new class of weighted exponential distribution denoted by WE can be obtained. Furthermore, they showed that the WE distribution possess some good properties and can be used as a good fit to survival time data compared to other popular distributions such as gamma, Weibull, or generalized exponential distribution. In this paper, a class of two-parameter weighted exponential distribution is introduced. This class of distributions is closely related to the weighted exponential distribution introduced by Gupta and Kundu (2009) and defined as follows: a random variable X is said to have a weighted exponential distribution with a shape parameter α > 0, and scale parameter λ > 0, denoted by WE(λ, α ), if its probability density function (PDF) is given by gX (x; λ, α) =
1+α
α
λe−λx (1 − e−λαx ),
x > 0.
(1)
When α → ∞, then (1) converges to exp(λ) and when α → 0, it converges to Γ (2, λ). Moreover, it has a unimodal √ that occurs at ln(1 + α)/α . As shown by Gupta and Kundu (2009), the skewness function assumes values in the interval [ 2, 2]. While the WE distribution has several good properties and can be used as a good fit for modeling lifetime data, a limitation of using it is that when the sample skewness measure for data lies outside its skewness function range. In this case, the WE distribution may not be able to capture the skewness in the data well. This motivates us to look at a new class of distributions with skewness function of a wider range in order to capture the skewness that some models may miss it.
E-mail address:
[email protected]. 0167-7152/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.spl.2011.10.008
M.K. Shakhatreh / Statistics and Probability Letters 82 (2012) 252–261
253
The main objective of this article is to introduce a flexible class of distributions that generalizes the WE distribution and provide wider range for the skewness measure. The rest of the paper is organized as follows. Section 2 provides the definition and some basic properties. Important properties of this class of distribution are given in Section 3. Statistical properties and inferences are given in Sections 4 and 5, respectively. Section 6 gives an application to two real data sets. Conclusions of the paper are given in Section 7. 2. Definition and basic properties In this section, we introduce the definition of the weighted two-parameter exponential distribution denoted by TWE (λ, α1 , α2 ), and then we list some of its basic properties. Definition 1. A random variable X is said to have a two-parameter weighted exponential distribution with shape parameters α1 > 0, α2 > 0, and scale parameter λ > 0 if the PDF of X is given below fX (x; λ, α1 , α2 ) = k(α1 , α2 ) λe−λx (1 − e−λα1 x )(1 − e−λα2 x ), and 0 otherwise, where k(α1 , α2 ) =
x>0
(2)
(1+α1 )(1+α2 )(1+α1 +α2 ) . α1 α2 (2+α1 +α2 )
Remark. When α1 = α2 = α , then Eq. (2) becomes fX∗ (x, λ, α1 , α2 ) =
(1 + α)(1 + 2α) −λx λe (1 − e−λαx )2 . 2α 2
(3)
Eq. (3) can be obtained from a class of generalized distributions proposed by Eugene et al. (2002) as follows. Let G(x) be the cdf of a random variable X and consider the following cdf which is based on the beta PDF: F (x) =
Γ (a + b) Γ (a)Γ (b)
G(x)
∫
w a−1 (1 − w)b−1 dw.
(4)
0
Therefore (3) can be obtained from (4) by taking G(x) = 1 − exp(−αλx) with a = 1/α , and b = 3. Fig. 1 shows the graph of (2) for different values of α1 , α2 , and λ = 1.
Fig. 1. Examples of the TWE density for (α1 , α2 ) = (5, 10) (solid line), (α1 , α2 ) = (2, 2) (solid dashed line), (α1 , α2 ) = (0.02, 5) (dashed line), and (α1 , α2 ) = (0.02, 0.5) (line).
Based on (2), we can derive the distribution function, survival, and hazard function of X which can be placed in the following compact forms respectively. FX (x; λ, α1 , α2 ) = 1 − k(α1 , α2 ) e
−λx
SX (x; λ, α1 , α2 ) = k(α1 , α2 ) e−λx
1−
1−
1 1 + α1
1 1 + α1
−λα1 x
e
e−λα1 x −
−
1 1 + α2
1 1 + α2
e
−λα2 x
e−λα2 x +
+
1 1 + α1 + α2 1
1 + α1 + α2
−λ(α1 +α2 )x
e
e−λ(α1 +α2 )x
,
,
(5)
(6)
254
M.K. Shakhatreh / Statistics and Probability Letters 82 (2012) 252–261
and HX (x) =
λ(1 − e−λα1 x )(1 − e−λα2 x ) 1 1 e−λα1 x − 1+α e−λα2 x + 1+α1+α e−λ(α1 +α2 )x 1 − 1+α 1 2 1 2
.
(7)
2.1. Basic properties We list some basic properties of the TWE(λ, α1 , α2 ) distribution. Proposition 2.1. Let G = {TWE(λ, α1 , α2 ): λ > 0; α1 > 0; α2 > 0} be a family of TWE distributions; then, we have the following properties 1. 2. 3. 4.
limα1 −→∞ fX (x; λ, α1 , α2 ) = gX (x; λ, α2 ); limα2 −→∞ fX (x; λ, α1 , α2 ) = gX (x; λ, α1 ); If α1 , α2 −→ ∞ then fX (x; λ, α1 , α2 ) converges to the Exp(λ) If α1 , α2 −→ 0, then f (x; λ, α1 , α2 ) converges to Gamma(3, λ).
Proposition 2.2. The hazard function HX is an increasing function for all α1 > 0, α2 > 0. Proof. Since the PDF of the TWE, fX (x; λ, α1 , α2 ) is a log-concave, see Lemma 4.1, it then follows that the PDF belongs to the class of Pólya frequency of order 2 (PF2), see Gupta et al. (2010, p. 88, Lemma 5.1). Now Theorem 5.3 of Gupta et al. (2010, p. 91) implies that the hazard function is an increasing function. In this case, for any α1 > 0, α2 > 0, HX (x) will start at 0 and converge to λ, as x −→ ∞. Fig. 2 provides further illustration of the hazard function for different values of (α1 , α2 ) and when the scale parameter λ = 1.
Fig. 2. Examples of the hazard for (α1 , α2 ) = (5, 10) (solid line), (α1 , α2 ) = (2, 2) (solid dashed line), (α1 , α2 ) = (0.02, 5) (dashed line), and (α1 , α2 ) = (0.02, 0.5) (line).
3. Important properties The TWE (λ, α1 , α2 ) model can be explained as follows. Proposition 3.1. Suppose that Y , Y1 , and Y2 are independent and identically distributed (i.i.d) exponential random variables with rate λ. Let X be a random variable whose PDF is given by f (x; λ, α1 , α2 ) = k(α1 , α2 )fY (x)FY (α1 x)FY (α2 x),
x > 0.
(8)
Eq. (2) can be obtained from (8) by putting fY (x) = λe , and FY (x) = 1 − e . Since (8) is a PDF, it then follows that the constant k(α1 , α2 ) can be determined by k(α1 , α2 ) = 1/E (F (α1 X )F (α2 X )) = 1/P (Y1 ≤ α1 Y , Y2 ≤ α2 Y ). −λx
−λx
It should be noted that Jammalizadeh et al. (2008) obtained a two-parameter generalized skew normal distribution by taking FY (x) = Φ (x) in (8), where Φ (x) is the cdf of the normal density function. The following proposition establishes that the TWE distribution can be obtained hierarchically as follows.
M.K. Shakhatreh / Statistics and Probability Letters 82 (2012) 252–261
255
Proposition 3.2. Suppose that W , Y1 and Y2 are random variables such that y1 , y2 |w, λ ∼i.i.d Exp(λw),
w|λ ∼ Exp(λ).
Let X be a random variable defined by X = W 1 {Y1 ≤ α1 , Y2 ≤ α2 }, where 1 (A) stands for the indicator function of a set A. It then follows that the PDF of X is given in (2). d
Proof. Set A = {Y1 ≤ α1 , Y2 ≤ α2 }. From the definition of X , it follows that X = (W |(Y1 , Y2 ) ∈ A). Therefore, the PDF of X is; fX (x) = f (w|(Y1 , Y2 ) ∈ A) =
fW (x) P ((Y1 , Y2 ) ∈ A | W = x) P ((Y1 , Y2 ) ∈ A)
P ((Y1 , Y2 ) ∈ A) =
0
0
0
2 −λw(1+y1 +y2 )
λ w e 3
α1 α2
λ2 x2 e−λx(y1 +y2 ) dy2 dy1 = (1 − e−λα1 x )(1 − e−λα2 x ), and dy2 dy1 dw = 1/k(α1 , α2 ), and we get the desired result.
Since fW (x) = λe−λx , then P ((Y1 , Y2 ) ∈ A | W = x) =
∞ α1 α2
.
0
0
Observe that Proposition 3.2 has a relation to the hidden truncation models discussed by Arnold and Beaver (2000b), and selection models proposed by Arellano-Valle et al. (2006). The following proposition shows that the TWE distribution can be obtained from a mixture of exponential distributions. With out loss of generality suppose that the scale parameter λ = 1, and for brevity we denote TWE(1, α1 , α2 ) by TWE(α1 , α2 ), the corresponding PDF and CDF by f (x, α1 , α2 ) and F (x, α1 , α2 ), respectively. Proposition 3.3. Suppose that Y1 follows (∼) exp(1), Y2 ∼ exp(1 + α1 ), Y3 ∼ exp(1 + α2 ), and Y4 ∼ exp(1 + α1 + α2 ), and they are independent. Let Z = Y1 + Y2 + Y3 + Y4 and W = Y2 + Y3 + Y4 ; then (2) can be obtained from fX (x; α1 , α2 ) = θ fZ (x; α1 , α2 ) + (1 − θ )fW (x, α1 , α2 ) where θ = (α1 + α2 )/(2 + α1 + α2 ), and fZ (x; α1 , α2 ) = a1 exp(1) + a2 exp(1 + α1 ) + a3 exp(1 + α2 ) + a4 exp(1 + α1 + α2 ), with a1 = (1 + α1 )(1 + α2 )(1 + α1 + α2 )/α1 α2 (α2 + α1 );
(9)
a2 = (1 + α2 )(1 + α1 + α2 )/α1 α2 (α1 − α2 ); a3 = (1 + α1 )(1 + α1 + α2 )/α1 α2 (α2 − α1 ); a4 = −(1 + α1 )(1 + α2 )/α1 α2 (α2 + α1 ), and fW (x; α1 , α2 ) = b1 exp(1 + α1 ) + b2 exp(1 + α2 ) + b3 exp(1 + α1 + α2 )
(10)
b1 = (1 + α2 )(1 + α1 + α2 )/α2 (α2 − α1 );
(11)
b2 = (1 + α2 )(1 + α1 + α2 )/α1 (α1 − α2 ); b3 = (1 + α1 )(1 + α2 )/α1 α2 . Proof. The moment generating function of X (see Section 4) is given by
(1 + α1 )(1 + α2 )(1 + α1 + α2 )(2(1 − t ) + α1 + α2 ) , (1 − t )(1 + α1 − t )(1 + α2 − t )(1 + α1 + α2 − t )(2 + α1 + α2 ) 2 α1 + α2 MW (t ) + MZ (t ), = 2 + α1 + α2 2 + α1 + α2
MX (t ) =
where MZ (t ) and MW (t ) are the moment generating functions of Z and W , respectively. Now the MGF of W can be written as a partial fraction, i.e., MW (t ) = b1
(1 + α1 ) (1 + α2 ) (1 + α1 + α2 ) + b2 + b3 , 1 + α1 − t 1 + α2 − t (1 + α1 + α2 − t )
where b1 , b2 , and b3 are given in (11). From the uniqueness and the linearity of the inverse Laplace transform, the PDF of W is given via (10). The PDF of Z can be obtained similarly and this completes the proof. 4. Statistical properties In this section, we derive some statistical quantities.
256
M.K. Shakhatreh / Statistics and Probability Letters 82 (2012) 252–261
4.1. Moment generating function The moment generating function (MGF) of X for |t | < 1 is given by MX (t ) = E (etX ) = k(α1 , α2 )
[
1 1−t
−
1 1 + α1 − t
−
1 1 + α1 − t
+
1
]
1 + α1 + α2 − t
.
Simple algebra leads to MX (t ) =
(1 + α1 )(1 + α2 )(1 + α1 + α2 )(2(1 − t ) + α1 + α2 ) . (1 − t )(1 + α1 − t )(1 + α2 − t )(1 + α1 + α2 − t )(2 + α1 + α2 )
To find the mean, the variance, and the kth moment we use the cumulant function. The cumulant function (KX (t ) = log MX (t )) is given by KX (t ) = log(2(1 − t ) + α1 + α2 ) − (log(1 − t ) + log(1 + α1 − t ) + log(1 + α2 − t )
+ log(1 + α1 + α2 ) + c )
(12)
where c is a constant which does not depend on t. By differentiating the above and letting t = 0, we obtain
µ=1+
1 1 + α1
Var(X ) = 1 +
+
1
1 + α2 1
+
1 1 + α1 + α2 1
−
2 2 + α1 + α2 1
, 4
, (2 + α1 + α2 )2 ] 1 1 8 1 + + − , µ3 = 2 1 + (1 + α1 )3 (1 + α2 )3 (1 + α1 + α2 )3 (2 + α1 + α2 )3 and the kth cumulant τk can be obtained as [ ] 1 1 1 2k τk = (k − 1)! 1 + + + − . (1 + α1 )k (1 + α2 )k (1 + α1 + α2 )k (2 + α1 + α2 )k (1 + α1 )
2
+
(1 + α 2 )
2
+
(1 + α1 + α2 )
2
−
[
Fig. 3, shows that the skewness and the coefficient of variation are both increasing functions in α1 > 0, and α2 > 0. Therefore, these functions can reach their maximum when α1 , α2 → ∞, and their minimum when α1 , α2 → 0. Consequently, the skewness function assumes values in [ √2 , 2], and the coefficient of variation function assumes values 3
in [ √1 , 1]. This implies that the TWE distribution possess a wider range of skewness and coefficient of variation measures 3
than the ones provided by the WE distribution. Finally, the mean, the median, the mode, and the variance are all decreasing functions in α1 > 0, and α2 > 0; see Fig. 4.
Fig. 3. Skewness and coefficient of variation measures for different values of (α1 , α2 ) when λ = 1.
4.2. Modes and quantiles Lemma 4.1. The TWE(α1 , α2 ) is strongly unimodal. To show that the TWE is strongly unimodal, it is enough to show ∂ 2 log f (x, α1 , α2 )/∂ x2 < 0, for all x > 0, see Karlin (1968), and this condition can be verified easily for the TWE. Therefore, the mode can be obtained as a solution for the equation: −α x −α x α1 1−e e−α1 1 x + α2 1−e e−α2 2 x = 1. This applies as well to the quantiles that cannot be obtained in closed form and in this case, we have to solve a non-linear equation: F (xq , α1 , α2 ) = q, 0 < q < 1, where xq is the qth quantile. While the mode and the median of TWE do not possess a closed form, Fig. 4 shows that the median lies between the mode and the mean and therefore the TWE distribution is a positively skewed distribution.
M.K. Shakhatreh / Statistics and Probability Letters 82 (2012) 252–261
257
Fig. 4. Mean, mode, median, and skewness measure for different values of (α1 , α2 ) when λ = 1.
5. Statistical inferences The statistical inferences about the unknown parameters of the model, namely, the maximum likelihood estimators (MLEs), their asymptotic distributions, and the method of moments (MMEs) are considered in this section. 5.1. Maximum likelihood estimators In this subsection, we discuss the MLEs for the unknown parameters in our proposed model. We show in this section that the MLE of the unknown vector parameters θ = (λ, α1 , α2 ) (if they exist), will be in the region R = {(λ, α1 , α2 ): λ > 0, α1 > 0, α2 = w(α1 ) > 0}, where w(.) is a monotone function. To see this, let X1 , . . . , Xn be a random sample from TWE(λ, α1 , α2 ). The log-likelihood function based on xn = (x1 , . . . , xn ) is given by l(λ, α1 , α2 ) = n (ln(1 + α1 ) − ln(α1 ) + ln(1 + α2 ) − ln(α1 ) + ln(1 + α1 + α2 ) − ln(2 + α1 + α2 ) + λ)
−λ
n − i=1
xi +
n −
ln 1 − e−α1 λxi +
i =1
n −
ln 1 − e−α2 λxi .
(13)
i=1
Our goal is to maximize the log-likelihood function with respect to λ, α1 , α2 . To find the MLE estimates for the TWE model parameters, we first differentiate the log-likelihood function and equating the resulting equations to 0, i.e., n n − − ∂l n = − xi + α1 xi ∂λ λ i=1 i=1
e−λα1 xi 1 − e−λα1 xi
+ α2
n − xi
i=1
e−λα2 xi
1 − e−λα2 xi
n − ∂l n n =− + +λ xi ∂α1 (1 + α1 )α1 (1 + α1 + α2 )(2 + α1 + α2 ) i =1
= 0,
e−λα1 xi 1 − e−λα1 xi
(14)
= 0,
(15)
−λα2 xi n − ∂l n n e =− + +λ xi = 0. (16) ∂α2 (1 + α2 )α2 (1 + α1 + α2 )(2 + α1 + α2 ) 1 − e−λα2 xi i =1 −λα x −λα x ∑n ∑ 1 i e = λn α1 (11+α1 ) − (1+α1 +α2 )(1 2+α1 +α2 ) , and ni=1 xi 1−e e−λα2 2ixi = Eqs. (15) and (16), implies that i=1 xi 1−e−λα1 xi n 1 1 n 1 1 − (1+α1 +α2 )(2+α1 +α2 ) . Substitute these quantities in Eq. (17), to see that, λα1 ,α2 = ∑n x (1 + 1+α1 + 1+α2 − λ α2 (1+α2 ) α1 +α2 ). (1+α1 +α2 )(2+α1 +α2 )
i=1 i
Therefore, for given α1 > 0 and α2 > 0, the log-likelihood function given in (13) is maximized for the
258
M.K. Shakhatreh / Statistics and Probability Letters 82 (2012) 252–261
value of λα1 ,α2 , given above. Moreover, the three-optimization problem reduces now to solve a two-optimization problem namely, the following non-linear equations: n
(1 + α1 + α2 )(2 + α1 + α2 ) n
(1 + α1 + α2 )(2 + α1 + α2 )
n −
+ λα1 ,α2
xi
e−λα1 xi 1 − e−λα1 xi
i=1
n −
+ λα1 ,α2
xi
e−λα2 xi
(1 + α1 )α1
=
(1 + α2 )α2
1 − e−λα2 xi
i=1
n
=
n
,
(17)
.
(18)
Notice that the lead terms in Eqs. (17) and (18) are equal. Now, subtract Eq. (17) from Eq. (18) to get n
(1 + α2 )α2
−
n
(1 + α1 )α1
+ λα1 ,α2
n − xi
i=1
e−λα1 xi
−
1 − e−λα1 xi
n − xi
e−λα2 xi
1 − e−λα2 xi
i =1
= 0.
(19)
Eq. (19) is effectively equal to 0 if and only if α2 = α1 . Such situation can happen when the distribution is not an identifiable distribution which is the case in our proposed distribution. Therefore, identifiability of the distribution is a necessary condition for the existing consistence estimators of the unknown parameters. Let θ = (λ, α1 , α2 ) be the parameters of the TWE distribution. Clearly, θ determines the distribution of the data say X. We say that a distribution or a model is nonidentifiable if there exists θ ̸= θ⋆ , such that f (x; θ) = f (x, θ⋆ ), for all x ∈ X. Otherwise, the distribution or the model is said to be identifiable. To see that the TWE distribution is not identifiable, take θ = (3, 2, 6) ̸= θ⋆ = (3, 6, 2), but TWE(θ) = TWE(θ⋆ ), for all x ∈ X. Throughout the paper we shall assume that α1 = α , and α2 = w(α) > 0. Therefore, the log-likelihood function is given l1 (λ, α, w(α)) = n (ln(1 + α) − ln(α) + ln(1 + w(α)) − ln(w(α)) + ln(1 + α + w(α)) − ln(2 + α + w(α)) + λ)
−λ
n −
xi +
i=1
n −
ln 1 − e−αλ xi +
n −
i=1
ln 1 − e−w(α)λxi .
(20)
i=1
Similarly, our goal now is to maximize the log-likelihood function given (20) with respect to (λ, α), and again we have to solve the score function equations. Let w ′ (·) be the derivative of w(·), then the score function equations is given by the following non-linear system of equations, n
λ −
−
n −
xi + α
i =1
n
n − xi
i =1
−
e−λα xi
1 − e−λα xi
n w ′ (α)
+ w(α)
n − xi
i=1
e−λw(α)xi
1 − e−λw(α)xi
= 0,
(21)
n(1 + w ′ (α))
+
(1 + α)α (1 + w(α))w(α) (1 + α + w(α)) (2 + α + w(α)) −λw(α)xi −λαxi n n − − e e ′ + λw (α) x = 0. +λ xi i −λα xi 1 − e 1 − e−λw(α)xi i=1 i =1
(22)
To solve the above non-linear equation, we observe that Eq. (21) has a unique root for fixed α > 0. To show ∑ this, let vα (λ; xn ) be the left-hand side of Eq. (21), where xn = (x1 , . . . , xn ) ∈ X. It follows that limλ→∞ vα (λ; xn ) = − ni=1 xi ,
and limλ→0 vα (λ; x ) = ∞. Since dvα (λ; x )/dλ = − n
n
n
λ2
+α
2
∑n
2 i =1 x i
e−λα xi 1−e−λα xi
2
+ w(α)
e−λw(α)xi
∑n
2 i=1 xi
1−e−λw(α)xi
2
< 0, for
all α > 0, it follows that vα (λ; xn ) is a decreasing function in λ, and thus a solution for λ can be obtained uniquely. To finish, Eq. (22) can be re-written as follows
−
w ′ (α) (1 + w ′ (α)) + (1 + w(α))w(α) (1 + α + w(α)) (2 + α + w(α)) −λαxi −λw(α)xi n n e λw′ (α) − e 1 λ− + xi + x = . i −λw(α)xi n i =1 1 − e−λα xi n 1 − e ( 1 + α)α i=1
The solution of Eq. (22) can be obtained numerically. In fact the system of non-linear equations given by Eqs. (21) and (22) can be solved by using nlm() procedure implemented in R software. We are now in a position to establish the asymptotic distribution of the MLEs estimates derived in the previous subsection. We have the following theorem.
ˆ n , αˆ n ) of (λ, α) are consistent estimators, and Theorem 5.1. The maximum likelihood estimators (λ
√
is asymptotically normal with mean vector 0 and the variance covariance matrix I−1 , where I =
ˆ n − λ, αˆ n − α)T n(λ − 1n E
∂ l21 (θ)
∂θ∂θT
. Since
M.K. Shakhatreh / Statistics and Probability Letters 82 (2012) 252–261
E
X2
(
e−λα X
1−e−λα X
)
k(α,w(α))
=
2
λ2 α 3
1 0
(ln u)2 1−u
1
uα
1−u
w(α) α
du, and E X
e−λα X
(1−e−λαX )
k(α,w(α))w(α) , (1+α)(1+α+w(α))
=
259
then the components of
the Fisher information matrix can be easily computed. 5.2. Method of moment estimators An alternative approach to the MLE approach for estimating the unknown parameters θ = (λ, α1 , α2 ) is to use the method of moments. Let x¯ , s2 and m3 be the sample mean, sample variance, and the third sample central moment of the observed sample xn = (x1 , . . . , xn ). The MEs can be obtained by equating and solving the sample mean, the sample variance, and the third sample central moment to their distribution counterparts, i.e. x¯ =
1
1+
λ
s2 = m3 =
1+
λ2 2 3
+
1 + α1
1
λ
1
1 1 + α2
1
+
(1 + α1 )2
1+
1
(1 + α1 )
3
1
+
1 + α1 + α2
1
+
(1 + α2 )2
+
1
(1 + α2 )
3
−
2 + α1 + α2
1
−
(1 + α1 + α2 )2
+
2
1
(1 + α1 + α2 )
3
,
4
(2 + α1 + α2 )2
−
8
(2 + α1 + α2 )3
,
.
While the above system of equations can be reduced from a 3-dimensional problem to solve a 2-dimensional problem, it is difficult to check whether these equations possess a unique solution all the time. For example, consider the case when α1 = α and α2 = w(α) = α , then to find the estimates of λ and α by using the ME method, we have to solve
1 1 1 + 1+12α , and s2 = λ12 1 + (1+α) simultaneously. Similarly these two equations can be x¯ = λ1 1 + 1+α 2 + (1+2α )2 2 reduced to find a unique root of α > 0, for the following function
q(α; x¯ , s2 ) = 1 −
6(1 + 2α)(1 + α)2
(1 + α)2 [(1 + 2α)(7 + 2α) + 1] + (1 + 2α)2
−
s2 x¯ 2
,
or equivalently, we have to find a root for a polynomial of degree 4. This polynomial might have more than one root. Interestingly, root as long as the c v, the coefficient of variation measure, √ for this case we can show that 2there is a unique c v ∈ [1/ 3, 1√ ]. Notice that limα→0 q(α; x¯ , s ) = 1 − c v 2 < 0, when c v > 1, and limα→∞ q(α; x¯ , s2 ) = 1/3 − c v 2 > 0, when c v √ < 1/ 3. Since q(α; x¯ , s2 ) is an increasing function in α , it follows that there is a unique solution for α as long as c v ∈ [1/ 3, 1]. From the above, it seems that to find a unique solution to the equations of the ME method which is based on the full model is not an easy task, because in this case we have to work with higher polynomials of degrees 6 and 8. 6. Data illustration In this section, we consider two real data sets for illustrative purposes. DATA SET 1: consider the data given by Gupta and Kundu (2009) which represent the marks in Mathematics for the students in the pace slow program in the year 2003.
29 25 50 15 13 27 15 18 7 7 8 19 12 18 5 21 15 86 21 15 14 39 15 14 70 44 6 23 58 19 50 23 11 6 34 18 28 34 12 37 4 60 20 23 40 65 19 31. For this data, we have that n = 48, x¯ = 25.89 √ , S = 18.60, and b1 = 1.332. Observe that the sample skewness (b1 = 1.332) lies outside the range of skewness of WE ([ 2, 2]), while it is in the range of the TWE distribution ([ √2 , 2]). It follows that 3
the flexibility of the TWE and increased range of skewness were able to capture features of the data much better than WE. For example, it is clear that fitting a WE model to this data would be inappropriate due to the sample skewness value. These issues are further demonstrated in Fig. 5. To see how the distribution is a good fit, we compute the Kolmogorov–Smirnov (K–S) distance between the empirical distribution and the fitted model, its corresponding P-value, log-likelihood, and Akaike information criterion (AIC). We have included the AIC because parametric models usually do not have an equal number of parameters. One reasonable way of selecting parametric models with different number of parameters, is to base the decision on the AIC. The results are summarized in Table 1, considering three special cases from TWE(λ, α1 , α2 ), and generalized Weibull distribution: GW(λ, α1 , α2 ), where the CDF of GW is F (x; λ, α1 , α2 ) = [1 − exp(−λxα1 )]α2 . For these models, the AIC is given by AIC = −2 ∗ log − likelihood + 2k, where k is the number of parameters in the model. It is clear from Table 1 that our sub-model TWE(λ, α , α ), with AIC = 396.878, provides the best fit for this data, and TWE(λ, α , α1 ,) with AIC = 397.342, is a close second that performed much better than WE and GW as well. The GW which has the smallest log-likelihood does not have a smaller AIC than
260
M.K. Shakhatreh / Statistics and Probability Letters 82 (2012) 252–261
these three models and the simpler models are preferred. Finally, Gupta and Kundu (2009) compared the WE with Weibull, generalized exponential, and gamma distributions based on this data set, and they observed that the WE was better than all of them. Table 1 MLEs of the marks data set.
λ α1 α2 Log-likelihood AIC K–S test P-value
WE(λ, α )
GW(λ, α1 , α2 )
TWE(λ, α, α )
TWE(λ, α , α1 )
0.0688 0.2797 – −197.247 398.494 0.0933 0.7976
0.2953 0.6747 6.4805 −196.428 398.856 0.068 0.9795
0.0563 2.4562 2.4562 −196.439 396.878 0.0777 0.934
0.0721 3.6932 0.2707 −196.671 397.342 0.0951 0.7782
Fig. 5. Histogram of the marks data. The lines represent distributions fitted using MLEs: TWE(λ, α1 , α2 ) (solid line); TWE(λ, α, α1 ) (dotted line); WE(λ, α ) (solid dashed line); GW(λ, α1 , α2 ) (line).
DATA SET 2: the survival times in months of 100 patients who have been infected by HIV were provided by Hosmer and Lemeshow (1999), where the plus sign in the data indicates a right-censored time.
5 6+ 8 3 22 1+ 7 9 3 12 2+ 12 1 15 34 1 4 19+ 3+ 2 2+ 6 60+ 7+ 60+ 11 2+ 5 4+ 1+ 13 3+ 2+ 1+ 30 7+ 4+ 8+5+ 10 2+ 9+ 36 3+ 9+ 3+ 35 8+ 1+ 5+ 11 56+ 2+ 3+ 15 1+10 1+ 7+ 3+ 3+ 2+ 32 3+ 10+ 11 3+ 7+ 5+ 31 5+ 58 1+ 2+1 3+ 43 1+ 6+ 53 14 4+ 54 1+ 1+ 8+ 5+ 1+ 1+ 2+ 7+ 1+10 24+ 7+ 12+ 4+ 57 1+ 12+. For this data set, there are 37 uncensored time and 63 right censored time. For each patient, we observe pairs (T1 , δ1 ), . . . , (T100 , δ100 ), where Ti stand for the survival time for patient i, and δi indicates whether the survival time is observed (δi = 1) or not (δi = 0). We estimate the parameters of our proposed model using the likelihood method. The likelihood function for a censored data looks like the usual likelihood function, but we have to add information from the ∏100 censored variable. For a right-censored, the likelihood function based on this data is lc (θ) = i=1 f (ti ; θ)δi S (ti ; θ)1−δi , where f is the PDF, and S is the survival function. Similarly, we use numerical methods to estimate the unknown parameters of the model. Once the maximum likelihood estimates are of the survival function is available. We fit three √ computed, estimate √ submodels from the full model,TWE: TWE(λ, α, α/ λ); TWE(λ, α, α ), and TWE(λ, α, α ). To examine the proposed model, we fit the weighted exponential distribution (WE), the generalized exponential distribution, the Weibull distribution, the generalized Weibull distribution, and log logistic distribution separately to this data. The log-likelihood and the AIC for each model are reported in Table√2. From this table, we see that our submodels provide a good fit for this data. In particular, our submodel TWE(λ,√α, α/ λ) provides the best fit for this data among the other models included in the analysis, and the submodel TWE(λ, α, α ) is a close second. The generalized Weibull GW has a much poorer fit than the Weibull or log logistic distributions and there is no evidence of an improved fit for the HIV data, using these distributions. The Kaplan–Meiers and the fitted survival functions for the HIV data are plotted in Fig. 6.
M.K. Shakhatreh / Statistics and Probability Letters 82 (2012) 252–261
261
Table 2 MLEs of the survival time in month for the HIV data.
λ
α1
α2
Log-likelihood
AIC
0.037 0.038 0.035 0.170 0.037 0.043 0.018 0.010
18.856 201.421 46.110 0.700 15.793 1.326 1.183 1.559
98.457 14.192 46.110 2.414 – – – –
−162.088 −162.131 −162.155 −162.232 −162.160 −162.416 −162.632 −162.808
328.176 328.262 328.31 330.464 328.32 328.832 329.264 329.616
√
TWE(λ, α, α/ √ λ) TWE(λ, α, α ) TWE(λ, α, α ) GW(λ, α1 , α2 ) WE(λ, α ) GE(λ, α ) Weibull(λ, α ) log logistic(λ, α )
√
Fig. 6. Kaplan–Meiers and the fitted survival functions for the HIV data: TWE(λ, α, α/ λ).
7. Conclusions In this paper, we have introduced a TWE distribution. The proposed distribution generalizes the WE and the exponential distributions. These distributions can be obtained as a limiting distribution of TWE. Moreover, the proposed distribution has a wider range of skewness than WE distribution. The flexibility of the TWE distribution and increased range of skewness were able to fit and capture features in two real data sets much better than WE and other popular distributions. Acknowledgments I would like to thank the Associate Editor and an anonymous referee for their constructive comments that led to improvements in this paper. References Arellano-Valle, R.B., Branco, D., Genton, M.G., 2006. A unified view on skewed distributions arising from selections. The Canadian Journal of Statistics 34, 581–601. Arnold, B.C., Beaver, R.J., 2000a. The skew Cauchy distribution. Statistics and Probability Letters 49, 285–290. Arnold, B.C., Beaver, R.J., 2000b. Hidden truncation models. Sankhya Serial A 62, 23–35. Azzalini, A., 1985. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 12, 171–178. Eugene, N, Lee, C., Famoye, F., 2002. Beta-normal distributions and its application. Communication in Statistics—Theory and Methods 81, 497–512. Genton, M.G., 2004. Skew-Elliptical Distributions and their Applications: A Journey Beyond Normality. Chapman & Hall/CRS, New York. Gupta, A.K., Zeng, W., Wu, Y., 2010. Probability and Statistical Models: Foundations for Problems in Reliability and Financial Mathematics. Springer & Business Media, New York. Gupta, R.D., Kundu, D., 2009. A new class of weighted exponential distributions. Statistics 43, 621–634. Hosmer, D.W., Lemeshow, S., 1999. Applied Survival Analysis: Regression Modelling of Time to Event Data. John Wiley& Sons, New York. Jammalizadeh, A., Behboodian, J., Balakrishnan, N., 2008. A two-parameter generalized skew-normal distribution. Statistics and Probability Letters 78, 1722–1726. Karlin, S., 1968. Total Positivity, vol. 1. Stanford University press, Stanford. Nadarajah, S., 2009. The skew logistic distribution. Advances in Statistical Analysis 93, 197–203.