Applied Mathematics and Computation 247 (2014) 762–779
Contents lists available at ScienceDirect
Applied Mathematics and Computation journal homepage: www.elsevier.com/locate/amc
The exponentiated generalized Birnbaum–Saunders distribution Gauss M. Cordeiro, Artur J. Lemonte ⇑ Departamento de Estatística, Universidade Federal de Pernambuco, Cidade Universitária, 50740-540 Recife, PE, Brazil
a r t i c l e
i n f o
a b s t r a c t A new distribution called the exponentiated generalized Birnbaum–Saunders distribution is proposed and studied. The new distribution is quite flexible to model survival data, reliability problems, fatigue life studies and hydrological data. It can have decreasing, increasing, upside-down bathtub, bathtub-shaped, increasing–decreasing–increasing and decreasing–increasing–decreasing hazard rate functions. We provide a comprehensive account of the mathematical properties of the new distribution and various structural quantities are derived. We discuss maximum likelihood estimation of the model parameters for uncensored and censored data. Two empirical applications of the new model to real data are presented for illustrative purposes. Ó 2014 Elsevier Inc. All rights reserved.
Keywords: Birnbaum–Saunders distribution Exponentiated distribution Fatigue life distribution Lifetime data Maximum likelihood estimation
1. Introduction Generalizing distributions is always precious for applied statisticians. The recent literature has suggested several ways of extending well-known distributions. One of the earliest class of models generated by the cumulative beta distribution was pioneered by Eugene et al. [19]. The more recent ones are: the class of distributions generated by gamma random variables introduced by Zografos and Balakrishnan [51]; the class of distributions generated by Kumaraswamy’s [27] random variable proposed by Cordeiro and de Castro [13]; the class of distributions generated by McDonald’s [38] generalized beta random variable formulated by Alexander et al. [1]; and the class of exponentiated generalized distributions defined by Cordeiro et al. [15]. These generators allow us to define new families of distributions to extend well-known distributions and at the same time provide great flexibility in modeling real data, i.e., we can develop more realistic statistical models in a great variety of applications. Some of the above methods were recently discussed in [29]. The generator approach in [15] is as follows. For a baseline continuous cumulative distribution function (cdf) GðxÞ, these authors defined the exponentiated generalized (‘‘EG’’ for short) family of distributions by the cdf
b FðxÞ ¼ 1 ½1 GðxÞa ;
x 2 R;
ð1Þ
where a > 0 and b > 0 are two additional parameters whose role is to govern skewness and generate distributions with heavier/lighter tails. They are sought as a manner to furnish a more flexible distribution. Because of its tractable distribution function (1), the EG class of distributions can be used quite effectively even if the data are censored. The EG family is a continuous univariate class of distributions for modeling continuous univariate data that can be in any interval of the real ⇑ Corresponding author. E-mail addresses:
[email protected] (G.M. Cordeiro),
[email protected] (A.J. Lemonte). http://dx.doi.org/10.1016/j.amc.2014.09.054 0096-3003/Ó 2014 Elsevier Inc. All rights reserved.
G.M. Cordeiro, A.J. Lemonte / Applied Mathematics and Computation 247 (2014) 762–779
763
line. Therefore, this family is motivated to analyze continuous univariate data that have any type of support. The probability density function (pdf) corresponding to (1) is
b1 f ðxÞ ¼ a b ½1 GðxÞa1 1 ½1 GðxÞa gðxÞ;
x 2 R;
ð2Þ
where gðxÞ ¼ dGðxÞ=dx is the baseline pdf. The two extra parameters in (2) can control both tail weights and possibly adding entropy to the center of the EG pdf. The baseline cdf GðxÞ is a special case of (1) when a ¼ b ¼ 1. Setting a ¼ 1 it gives the exponentiated-G distribution. If b ¼ 1, we obtain the Lehmann type II distribution. So, the distribution (2) generalizes both Lehmann types I and II alternative distributions; that is, this method can be interpreted as a double construction of Lehmann alternatives. Note that even if gðxÞ is a symmetric density, the density f ðxÞ will not be symmetric. The class of EG distributions shares an attractive physical interpretation whenever a and b are positive integers. Consider a device made of b independent components in a parallel system. Further, each component is made of a independent subcomponents identically distributed according to GðxÞ in a series system. The device fails if all b components fail and each component fails if any subcomponent fails. Let X j1 ; . . . ; X ja denote the lifetimes of the subcomponents within the jth component (for j ¼ 1; . . . ; b) having a common cdf GðxÞ. Let X j denote the lifetime of the jth component and let X denote the lifetime of the device. Thus, the cdf of X is
PrðX 6 xÞ ¼ PrðX 1 6 x; . . . ; X b 6 xÞ ¼ PrðX 1 6 xÞb ¼ ½1 PrðX 1 > xÞb ¼ ½1 PrðX 11 > x; . . . ; X 1a > xÞb b
¼ f1 ½1 GðxÞa g : So, it follows that the lifetime of the device obeys the EG distribution. In this paper, we propose a generalization of the two-parameter Birnbaum–Saunders (‘‘BS’’ for short) distribution [8] on the basis of Cordeiro et al.’s [15] generator. The advantage of this approach for constructing new distributions lies in its flexibility to model both monotonic and non-monotonic hazard rate functions (hrfs) even though the baseline hrf may be monotonic. It should mentioned that the two-parameter BS distribution has been studied by several authors in the recent years and has been applied in modeling survival data, reliability problems, fatigue life studies and hydrological data (see Section 2). We go further and generalize the two-parameter BS distribution. The new model is called the exponentiated generalized Birnbaum–Saunders (‘‘EGBS’’) distribution. One of the aims for introducing the EGBS distribution relies on the fact that practitioners will have a new quite flexible BS distribution to use in practice. We hope that the EGBS distribution may serve as an alternative model to the BS distribution. We also hope that the proposed distribution may work better (at least in terms of model fitting) than the BS distribution in certain practical situations, although it cannot always be guaranteed. We present applications to real data which demonstrated that the EGBS distribution yields a better fit than the BS distribution. Additionally, we provide a comprehensive account of the mathematical properties of the proposed distribution. As we will see later, the formulae related with the new distribution are simple and manageable, and with the use of modern computer resources and their numerical capabilities, the new model may prove to be an useful addition to the arsenal of applied statisticians in areas like biology, medicine, economics, reliability, engineering, among others. In summary, the basic motivation for introducing the EGBS distribution is based on the fact that we are proposing a quite flexible distribution which can be used in several fields of applications. The rest of the paper is organized as follows. In Section 2, we provide a brief review about the two-parameter BS distribution. We introduce the new EGBS distribution in Section 3. In Section 4, we provide several structural properties of the EGBS distribution: two useful expansions are derived in Section 4.1; explicit expressions for the ordinary and incomplete moments and mean deviations are obtained in Sections 4.2 and 4.3, respectively; the moment generating function (mgf) of X is provided in Section 4.4; the stress-strength parameter is derived in Section 4.5; and Section 4.6 deals with order statistics and their moments. Non-standard measures for the skewness and kurtosis are provided in Section 5. In Section 6, we discuss maximum likelihood estimation and likelihood inference. Two applications to real data are presented in Section 7 to show the usefulness of the proposed distribution. Finally, concluding remarks are addressed in Section 8. 2. The Birnbaum–Saunders distribution Fatigue is a structural damage which occurs when a material is exposed to stress and tension fluctuations. When the effect of vibrations on material specimens and structures is studied, the first point to be considered is the mechanism that could cause fatigue of these materials. The fatigue process (fatigue life) begins with an imperceptible fissure, the initiation, growth, and propagation of which produces a dominant crack in the specimen due to cyclic patterns of stress, whose ultimate extension causes the rupture or failure of this specimen. The failure occurs when the total extension of the crack exceeds a critical threshold for the first time. The partial extension of a crack produced by fatigue in each cycle is modeled by a random variable which depends on the type of material, the magnitude of the stress, and the number of previous cycles, among other factors. Motivated by problems of vibration in commercial aircraft that caused fatigue in the materials, Birnbaum and Saunders [8] proposed the two-parameter BS distribution, also known as the fatigue life distribution, with shape parameter a > 0 and scale parameter b > 0, say BSða; bÞ. It was originally derived from a model for a physical fatigue process where dominant
764
G.M. Cordeiro, A.J. Lemonte / Applied Mathematics and Computation 247 (2014) 762–779
crack growth causes failure. It was later derived by Desmond [17] using a biological model which followed from relaxing some of the assumptions originally made by Birnbaum and Saunders [8]. This distribution can be used to model lifetime data and it is widely applicable to model failure times for fatigue materials. A random variable T has the BSða; bÞ distribution, if it can be expressed as T ¼ bfaZ þ ½ðaZÞ2 þ 4
GðtÞ ¼ Uðv t Þ; where
1=2 2
g =4, where Z N ð0; 1Þ. Its cdf is given by
t > 0;
ð3Þ
v t ¼ a qðt=bÞ; qðzÞ ¼ z
1=2
1
z
1=2
and UðÞ is the standard normal distribution function. It is easy to verify that b is
the median of the distribution: GðbÞ ¼ Uð0Þ ¼ 1=2. Also, we have that T 1 BSða; b1 Þ and k T BSða; kbÞ (for any k > 0). The pdf corresponding to (3) takes the form
gðtÞ ¼ /ðv t Þ V t ;
t > 0;
ð4Þ
pffiffiffi where V t ¼ dv t =dt ¼ t ðt þ bÞ=ð2a bÞ and /ðÞ is the standard normal density function. The density function (4) is right skewed and the skewness decreases with a. The fractional moments of T are given by EðT p Þ ¼ bp Iðp; aÞ, where 3=2
Iðp; aÞ ¼
K pþ1=2 ða2 Þ þ K p1=2 ða2 Þ : 2K 1=2 ða2 Þ
ð5Þ
Here, the function K m ðzÞ denotes the modified Bessel function of the third kind with m representing its order and z the argument. The mean and variance of T are EðTÞ ¼ bð1 þ a2 =2Þ and VðTÞ ¼ ðabÞ2 ð1 þ 5a2 =4Þ, respectively. Notice that both mean and variance increase as the shape parameter a increases. The BS quantile function (qf), say Q BS ðuÞ, is given by
Q BS ðuÞ ¼
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2 b a U1 ðuÞ þ 4 þ a2 U1 ðuÞ2 ; 4
u 2 ð0; 1Þ;
ð6Þ
where U1 ðÞ is the standard normal qf. The BS distribution has received significant attention over the last few years by many researchers. The shape of the hrf of the BS distribution was discussed in [28]. The authors showed that the BS hrf is not monotone and is upside-down bathtub (unimodal) for all ranges of the parameter values. Results on improved statistical inference and interval estimation for the BS distribution were investigated by Lemonte et al. [34,36] and Wang [50], respectively. Other important works about the BS distribution are Guiraud et al. [21], Balakrishnan et al. [5], Xu and Tang [49], Bhatti [7], Meintanis [39], Lemonte and Ferrari [35] and Pradhan and Kundu [46], among many others. For book treatments of the BS distribution, the reader is referred to [25, Chapter 33], [37, Chapter 13] and [47, Chapter 10]. The BS distribution has been applied in a wide variety of fields. We refer the reader to [4,30,31] for applications in reliability and in other fields, respectively. It is worth emphasizing that the major weakness of the BS distribution relies on the fact that its hrf can only be upside-down bathtub-shaped [28]; that is, the BS hrf does not accommodate monotone (increasing or decreasing) or bathtub-shaped hrfs. It can be a problem in practical applications since many real-life data exhibit monotone or bathtub-shaped hrfs. An example of bathtub-shaped hrf is the human mortality experience with a high infant mortality rate which reduces rapidly to reach a low level. It then remains at that level for quite a few years before picking up again. In this paper, we shall propose a generalization of the BS distribution in which the new hrf can be decreasing, increasing, upside-down bathtub or bathtub-shaped. 3. The EGBS distribution Here, we define and study basic properties of the EGBS distribution. The EGBS cdf is given by
b FðxÞ ¼ 1 Uðv x Þa ;
x > 0;
ð7Þ
where a > 0; b > 0 and a > 0 control the shapes of the distribution, the parameter b > 0 is the scale parameter, and v x is defined after Eq. (3). Clearly, if a ¼ b ¼ 1, the EGBS distribution reduces to the BS distribution. The exponentiated BS (EBS) distribution corresponds to a ¼ 1. If b ¼ 1, we obtain the Lehmann type II BS (LeBS) distribution. Hereafter, a random variable X having the cdf (7) is denoted by X EGBSða; b; a; bÞ. If X EGBSða; b; a; bÞ, then kX EGBSða; b; a; kbÞ (for k > 0), i.e., the class of EGBS distributions is closed under scale transformations as occurs with the BS distribution. The pdf corresponding to (7) takes the form
b1 f ðxÞ ¼ a b V x /ðv x Þ Uðv x Þa1 1 Uðv x Þa ;
x > 0;
ð8Þ
where V x is defined after Eq. (4). It is clear that the EGBS pdf (8) is a weighted BS pdf with weight function b1 a b Uðv x Þa1 1 Uðv x Þa . The density function (8) can take various forms depending on its shape parameters. In particular, it can be a decreasing function or it can be a skewed unimodal density. Fig. 1 displays some plots of (8). They reveal that the EGBS distribution is very versatile and that the additional shape parameters a and b have substantial effects on its skewness and kurtosis. Further calculations lead to
765
G.M. Cordeiro, A.J. Lemonte / Applied Mathematics and Computation 247 (2014) 762–779
α = 0.5
α = 1.5
2.0
a = 0.5, b = 0.5 a = 1.4, b = 0.5 a = 0.8, b = 1.5 a = 2.5, b = 1.5 a = 1.9, b = 3.7 a = 3.5, b = 2.5 a = 2.5, b = 5.5
1.0
0.4
0.5
0.2
0.0
0.0 0.0
0.5
1.0
1.5 x
2.0
2.5
3.0
a = 0.1, b = 0.5 a = 1.2, b = 0.1 a = 0.2, b = 0.8 a = 0.4, b = 0.8 a = 0.6, b = 1.1 a = 0.5, b = 0.2 a = 0.9, b = 1.2
0.6
density
1.5
density
0.8
0.0
0.5
1.0
1.5
x
Fig. 1. The EGBS density function for some parameter values; b ¼ 1.
" # d log½f ðxÞ 3 1 ða 1Þ a ðb 1ÞUðv x Þa1 : ¼ þ v x V x /ðv x Þ V x dx 2x x þ b Uðv x Þ 1 Uðv x Þa The EGBS density function in (8) may have a mode. The x value, x say, corresponding to the mode of the density function (8) is obtained from ½d log½f ðxÞ=dxx¼x ¼ 0. b The EGBS survival function is given by SðxÞ ¼ 1 1 Uðv x Þa (for x > 0). The simple structure of SðxÞ implies that the EGBS distribution can be used quite effectively in analyzing lifetime data in the presence of censoring. The EGBS hrf is given by
hðxÞ ¼
b1 a b V x /ðv x Þ Uðv x Þa1 1 Uðv x Þa ; b 1 1 Uðv x Þa
x > 0:
ð9Þ
The reverse hrf of the EGBS distribution is
rðxÞ ¼
a b V x /ðv x Þ Uðv x Þa1 ; 1 Uðv x Þa
x > 0:
Fig. 2 displays some plots of the EGBS hrf (9) for some parameter values. The parameter b does not change the shape of the hrf since it is a scale parameter. It is evident that the hrf of the proposed distribution can be decreasing, increasing, upsidedown bathtub shaped or bathtub-shaped. It is difficult (or even impossible) to determine analytically the parameter spaces corresponding to these shapes. It is interesting to point out that the EGBS hrf can also be increasing–decreasing–increasing or decreasing–increasing–decreasing (see Fig. 3). So, the new generalized BS distribution is quite flexible and can be used effectively in analyzing real data in many areas. Thus, the beauty and importance of the new distribution lies in its ability to model monotone as well as non-monotone failure rates which are quite common in reliability and biological studies. The EGBS cdf (7) has tractable properties. For example, it yields a very simple qf
1=a xu ¼ Q ðuÞ ¼ Q BS 1 ð1 u1=b Þ ;
u 2 ð0; 1Þ;
ð10Þ
where Q BS ðÞ is defined in (6). It follows immediately that the median of X EGBSða; b; a; bÞ is x1=2 ¼ Q ð1=2Þ. In addition, Q ðuÞ gives a trivial random variable generation: if U Uð0; 1Þ, then X EGBSða; b; a; bÞ is given by
1=a X ¼ Q BS 1 ð1 U 1=b Þ : Since the uniform random variables are easily generated in most statistical packages, the above scheme is very useful to generate EGBS random variates. 4. General properties General properties of the EG family such as moments, mean deviations, entropy, among several others, are analytically tractable only in special cases when both functions GðxÞ and gðxÞ have simple analytic expressions. Therefore, power series for the cdf and pdf of the EG family represent the usual method to derive some of its structural properties. In this section, based on these expansions, we study some EGBS mathematical properties.
766
G.M. Cordeiro, A.J. Lemonte / Applied Mathematics and Computation 247 (2014) 762–779
decreasing 50
increasing 0.12
a = 18, b = 0.1, α = 1.85
0.10
40
hazard rate
hazard rate
0.08 30
20
0.06
0.04
0.02
10
a = 1.2, b = 8.0, α = 2.0
0.00 0
1
2
3
4
0
1
2
3
x
x
upside−down bathtub
bathtub−shaped
a = 1.6, b = 1.2, α = 1.5
a = 0.35, b = 0.01, α = 0.19
5.0
1.2
4
4.5
0.8
hazard rate
hazard rate
1.0
0.6
4.0
0.4 0.2
3.5
0.0 0
1
2
3
4
0
1
2
x
3
4
x
Fig. 2. The EGBS hrf for some parameter values; b ¼ 1.
increasing−decreasing−increasing 5.5e−06
decreasing−increasing−decreasing
a = 0.5, b = 10, α = 10
a = 35, b = 0.05, α = 1.15 30
hazard rate
hazard rate
5.0e−06 28
26
4.5e−06
24 4.0e−06 0
1
2
3
4
0.0
0.2
x
0.4
0.6 x
Fig. 3. The EGBS hrf for some parameter values; b ¼ 1.
0.8
1.0
G.M. Cordeiro, A.J. Lemonte / Applied Mathematics and Computation 247 (2014) 762–779
767
4.1. Useful expansions Useful expansions for (1) and (2) can be derived using the concept of exponentiated distributions. For an arbitrary continuous baseline cdf GðxÞ, a random variable is said to have the exponentiated-G (‘‘exp-G’’ for short) distribution with power parameter a > 0, say X exp-GðaÞ, if its pdf and cdf are
Ha ðxÞ ¼ GðxÞa ;
ha ðxÞ ¼ a gðxÞGðxÞa1 ;
ð11Þ
respectively. The properties of the exponentiated distributions have been studied by many researchers. We refer the reader to [41,23,24,44,22,10,2,42,33,43,32], among many others. We consider the generalized binomial expansion
ð1 zÞb ¼
1 X
ð1Þk
b k
k¼0
zk ;
ð12Þ
which is valid for any real non-integer b and jzj < 1. Using (12) twice in Eq. (1), the EG cumulative function can be expressed as
FðxÞ ¼
1 X wjþ1 Hjþ1 ðxÞ; j¼0
where wjþ1 ¼ we obtain
f ðxÞ ¼
P1
jþmþ1 m¼1 ð1Þ
b m
ma jþ1
and Hjþ1 ðxÞ is the exp-G cdf with power parameter j þ 1. By differentiating FðxÞ,
1 X wjþ1 hjþ1 ðxÞ;
ð13Þ
j¼0
where hjþ1 ðxÞ is the exp-G pdf with power parameter j þ 1. Eq. (13) reveals that the EG density function is a linear combination of exp-G densities. This result is important to derive some properties of the EG distribution such as the ordinary and incomplete moments, generating function and mean deviations from those of the exp-G distribution. Let T jþ1 be a random variable having the EBS distribution with power parameter j þ 1. The density function of T jþ1 is obtained from (3) and (4) as
hjþ1 ðtÞ ¼ ðj þ 1Þ Uðv t Þj gðtÞ ¼ ðj þ 1Þ Uðv t Þj /ðv t Þ V t ;
t > 0:
From the above density and (13), the density function of X can be reduced to
f ðxÞ ¼
1 X ðj þ 1Þ wjþ1 Uðv x Þj /ðv x Þ V x ;
x > 0:
ð14Þ
j¼0
So, several EGBS mathematical properties can be determined directly from Eq. (14). 4.2. Moments The rth ordinary moment of X is given by
l0r ¼ b
Z
1
h
l0s ¼ EðX r Þ ¼
ir Q BS 1 ð1 uÞ1=a ub1 du;
R1 0
xr f ðxÞdx. After some algebra, we obtain
ð15Þ
0
where Q BS ðÞ is the BS qf given in (6). There is no closed-form expression for the above integral and hence it needs to be computed numerically in software such as MAPLE, MATLAB, MATHEMATICA, Ox and R. On the other hand, the ordinary moments of X can be obtained in closed-form as an infinite weighted sum of the probability weighted moments (PWMs) of the BS distribution. First, we review the PWMs of the BS distribution since they are required for the ordinary moments of the EGBS distribution. If T BSða; bÞ has the pdf (4), the ðr; jÞth PWM of T is formally defined by
sr;j ¼ E½T r GðTÞj ¼
Z
1
t r Uðv t Þj /ðv t Þ V t dt:
0
Cordeiro and Lemonte [14] obtained
sr;j in the form
2s j X 1 i þi X 2si þ i ð2si þi2mÞ=2 b X j b ¼ j Aðk1 ; . . . ; ki Þ ð1Þm Iðr þ ð2si þ i 2mÞ=2; aÞ; m 2 i¼0 i k ;...;k ¼0 m¼0 r
sr;j
1
ð16Þ
i
pffiffiffiffi 1 where ak ¼ ð1Þk 2ð12kÞ=2 ½ pð2k þ 1Þk! ; sj ¼ k1 þ þ kj , Aðk1 ; . . . ; kj Þ ¼ a2sj j ak1 . . . akj and Iðr þ ð2si þ i 2mÞ=2; aÞ is determined from (5). Finally, the moments of X can be expressed from (14) as
l0r ¼ EðX r Þ ¼
1 X j¼0
ðj þ 1Þ wjþ1 sr;j :
ð17Þ
768
G.M. Cordeiro, A.J. Lemonte / Applied Mathematics and Computation 247 (2014) 762–779
α = 0.5
α = 2.5
Skewness
Skewness
a
a
b
b
Fig. 4. Skewness of the EGBS distribution as functions of a and b; b ¼ 1.
α = 1.5
α = 2.5
Kurtosis
Kurtosis
a
a
b
b
Fig. 5. Kustosis of the EGBS distribution as functions of a and b; b ¼ 1.
Eqs. (16) and (17) are readily computed numerically using standard statistical software with good accuracy by substituting infinity for 20 or 30. They (and other expansions in the paper) can also be evaluated in symbolic computation software such as MAPLE and MATHEMATICA, for example. These symbolic software have currently the ability to deal with analytic expressions of formidable size and complexity. The skewness and kurtosis of X can be computed from the ordinary moments using well-known relationships. Figs. 4 and 5 show how these measures vary with respect to a and b for some values of a. 4.3. Incomplete moments Many important questions in econometrics require more than just knowing the mean of a distribution, but its shape as well. This is also obvious not only in the study of econometrics and income distributions but in many other areas of research. For empirical purposes, the shape of many distributions can be usefully described by the incomplete moments. These types of moments play an important role for measuring inequality, for example, income quantiles and Lorenz and Bonferroni curves, which depend upon the incomplete moments of a distribution. The nth incomplete moment of X is given by Ry mn ðyÞ ¼ 0 xn f ðxÞdx. By inserting (14) in mn ðyÞ, we obtain
mn ðyÞ ¼ jða; bÞ
1 X
wjþ1
Z
0
j¼0
y
sðx=bÞ xn3=2 ðx þ bÞ exp Uðv x Þj dx; 2a2
pffiffiffiffiffiffiffiffiffi where jða; bÞ ¼ expða2 Þ=ð2a 2pbÞ and sðzÞ ¼ z þ z1 . Following [14], we have
Uðv x Þj ¼
X j 1 m þm 2sm þ m ð2sm þm2pÞ=2 1X j Aðk1 ; . . . ; km Þ 2sX x ðbÞp ; ð2sm þmÞ=2 j p 2 m¼0 m k ;...;km ¼0 b p¼0 1
where sm and Aðk1 ; . . . ; kn Þ are defined in the last section. Thus,
X j 1 1 X bjþ1 X j
bð2sm þmÞ=2 Aðk1 ; . . . ; km Þ j m 2 m¼0 j¼0 k1 ;...;km ¼0 Z y 2sX m þm 2s þ m sðx=bÞ m dx: ðbÞp xnþð2sm þm2p3Þ=2 ðx þ bÞ exp 2 2a p 0 p¼0
mn ðyÞ ¼ jða; bÞ
G.M. Cordeiro, A.J. Lemonte / Applied Mathematics and Computation 247 (2014) 762–779
769
Let
Dðp; qÞ ¼
Z
q
0
Z q=b ðx=b þ b=xÞ ðu þ u1 Þ pþ1 q dx ¼ b du: xq exp u exp 2a2 2a2 0
From [48], we can write
Dðp; qÞ ¼ bpþ1 K pþ1 ða2 Þ qpþ1 K pþ1
q
;
b
2a2 b 2a2 q
;
where K p ðx1 ; x2 Þ denotes the incomplete Bessel function with arguments x1 and x2 and order p. Hence, the nth incomplete moment of X becomes
X j 1 1 X bjþ1 X k mn ðyÞ ¼ jða; bÞ bð2sm þmÞ=2 Aðk1 ; . . . ; km Þ j m j¼0 2 m¼0 k1 ;...;km ¼0 2sX m þm 2sm þ j 2p 1 2sm þ m 2p 3 2sm þ m D nþ ðbÞp ; y þ bD n þ ;y : p 2 2 p¼0 An application of the first incomplete moment m1 ðyÞ is to obtain the Bonferroni and Lorenz curves. These curves are important not only in economics to study income and poverty, but also in other fields like reliability, demography, insurance and medicine. For a given probability p, they are defined by BðpÞ ¼ m1 ðqÞ=ðpl01 Þ and LðpÞ ¼ m1 ðqÞ=l01 , respectively, where q ¼ Q ðpÞ. The mean deviations about the mean (d1 ) and about the median (d2 ) are also obtained from m1 ðyÞ. They can be expressed as d1 ¼ 2½l01 Fðl01 Þ m1 ðl01 Þ and d2 ¼ l01 2m1 ðMÞ, respectively, where FðÞ is the EGBS cdf, l01 ¼ EðXÞ; M is the median M ¼ Q ð1=2Þ and Q ðuÞ is the EGBS qf given by (10). 4.4. Generating function Here, we derive the generating function MðsÞ ¼ Eðes X Þ of the EGBS distribution. The formula for MðsÞ comes from (14) as
MðsÞ ¼
1 X
wjþ1 M jþ1 ðsÞ;
ð18Þ
j¼0
where M jþ1 ðsÞ is the generating function of T jþ1 . Thus, MðsÞ can be determined from the EBS generating function. We can write
Mjþ1 ðsÞ ¼ ðj þ 1Þ
Z
1
es y Uðv x Þj /ðv x Þ V x dx:
0
Setting u ¼ Uðv x Þ in the last equation, we obtain
Mjþ1 ðsÞ ¼ ðj þ 1Þ
Z
1
uj exp½s Q BS ðuÞdu:
ð19Þ
0
Next, we use an equation in Section 0.314 of [20] for a power series raised to a positive integer k given by 1 X ai xi
!k
¼
i¼0
1 X ck;i xi ;
ð20Þ
i¼0
where the coefficients ck;i (for i ¼ 1; 2; . . .) are computed from the recurrence equation
ck;i ¼ ði a0 Þ
1
i X
½mðk þ 1Þ i am ck;im
ð21Þ
m¼1
and ck;0 ¼ ak0 . The coefficient ck;i can be determined from ck;0 ; . . . ; ck;i1 and then from the quantities a0 ; . . . ; ai . Following [14], we obtain an expansion for the BS qf and correct their expression using the following six steps. First, we define the following quantities determined recursively by (for k P 0)
bkþ1 ¼
k X 1 ð2r þ 1Þ ð2k 2r þ 1Þ br bkr : 2ð2k þ 3Þ r¼0 ðr þ 1Þð2r þ 1Þ
The first constants are b0 ¼ 1; b1 ¼ 1=6; b2 ¼ 7=120; b3 ¼ 127=7560; . . .. Second, if the condition 2 < ðy=bÞ1=2 ðb=yÞ1=2 < 2 holds (which is not very restrictive), we obtain an expansion for y ¼ Q BS ðuÞ given in (6) as (in Cordeiro and Lemonte’s paper there was an error in this step)
y ¼ Q BS ðuÞ ¼
1 X i¼0
pi ðu 1=2Þi
" #i 1 X dk ðu 1=2Þk ; k¼0
u 2 ð0; 1Þ;
ð22Þ
770
G.M. Cordeiro, A.J. Lemonte / Applied Mathematics and Computation 247 (2014) 762–779
pffiffiffiffiffiffiffi 2iþ1 1=2 ¼ 22i b a2iþ1 2p (for i P 0), p2 ¼ 2 p ba2 =2; p2i ¼ 0 (for i P 2) and dk ¼ bk=2 for k even and i
where p0 ¼ b; p2iþ1
dk ¼ 0 for k odd. Third, using Eqs. (20) and (21), Q BS ðuÞ can be expressed as 1 X
Q BS ðuÞ ¼
pi ei;k
i;k¼0
iþkr iþk X iþk 1 ur ; 2 r r¼0
u 2 ð0; 1Þ;
where the quantities ei;k (for i P 0) are easily determined from 1 Pk ei;k ¼ ðk d0 Þ m¼1 ½mði þ 1Þ k dm ei;km . Fourth, Q BS ðuÞ can be simplified to 1 X
Q BS ðuÞ ¼
pi ei;k
i;k¼0 iþk¼q¼0
qr q X q 1 ur ; 2 r r¼0
the
constants
dk ’s
as
i
ei;0 ¼ d0
and
u 2 ð0; 1Þ
and by substituting q 1 X X
1 X 1 X
by
r¼0
i;k¼0 r¼0 iþk¼q¼0
i;k¼0 iþk¼q¼r
and defining 1 X
nr ¼
i;k¼0 iþk¼q¼r
qr q 1 ; pi ei;k 2 r
we can rewrite the BS qf as the power series
Q BS ðuÞ ¼
1 X nr ur ;
u 2 ð0; 1Þ:
ð23Þ
r¼0
Eq. (23) is important do derive the mgf and other mathematical quantities for the EGBS distribution. Inserting (23) in (19), expanding the exponential function and using (20) and (21), we obtain
Mjþ1 ðsÞ ¼ ðj þ 1Þ
1 X
r dp;r sp ; ðr þ j þ 1Þ p! p;r¼0
where (for p P 0) dp;0 ¼ np0 and dp;r ¼ ðr n0 Þ1 EGBS mgf reduces to
MðsÞ ¼
1 X gp sp
p!
p¼0
ð24Þ Pr
m¼1 ½mðp
þ 1Þ r nm dp;rm (for r P 1). Finally, substituting (24) in Eq. (18), the
ð25Þ
;
P where gp ¼ 1 r;j¼0 ½r ðj þ 1Þ wjþ1 dp;r =ðr þ j þ 1Þ. Clearly, from Eq. (25), we obtain an alternative expression for the pth moment p of X as EðX Þ ¼ gp =p!. 4.5. Stress-strength parameter In the context of reliability, the stress-strength model describes the life of a component which has a random strength X 1 subjected to a random stress X 2 . The component fails at the instant that the stress applied to it exceeds the strength, and the component will function satisfactorily whenever X 1 > Y 2 . In probability terms, R ¼ PrðX 1 > X 2 Þ is a measure of component reliability. It has many applications, especially in engineering concepts such as structures, deterioration of rocket motors, static fatigue of ceramic components, fatigue failure of aircraft structures, and the aging of concrete pressure vessels. Let X 1 EGBSða1 ; b1 ; a; bÞ and X 2 EGBSða2 ; b2 ; a; bÞ be two independent random variables with the same baseline parameters a and b. Let f i denote the pdf of X i and F i denote the cdf of X i . The stress-strength parameter is obtained as
R¼
Z
1 0
f 1 ðxÞ F 2 ðxÞ dx ¼ a1 b1
Z
1
V x /ðv x Þ Uðv x Þa1 1
0
½1 Uðv x Þa1
b1 1
½1 Uðv x Þa2
b2
dx:
Setting u ¼ Uðv x Þ, the above expression reduces to
R ¼ a1 b1
Z
1
ua1 1 ð1 ua1 Þ
b1 1
b
ð1 ua2 Þ 2 du:
ð26Þ
0
Notice that R does not depend on the baseline parameters a and b but only on the additional shape parameters. The integral in (26) has to be computed numerically. If a1 ¼ a2 , then Eq. (26) can be reduced to
R¼
b1 : b1 þ b2
G.M. Cordeiro, A.J. Lemonte / Applied Mathematics and Computation 247 (2014) 762–779
771
In the particular case a1 ¼ a2 and b1 ¼ b2 , we have R ¼ 1=2. Finally, using the generalized binomial twice in (26), we obtain
R ¼ a1 b1
1 X
ð1Þrþs ½ðr þ 1Þ a1 þ s a2 r;s¼0
b1 1
r
b2 s
:
4.6. Order statistics The density f i:n ðxÞ of the ith order statistic X i:n , for i ¼ 1; . . . ; n, from independent identically distributed random variables X 1 ; . . . ; X n is given by
f ðxÞ FðxÞi1 ½1 FðxÞni ; Bði; n i þ 1Þ
f i:n ðxÞ ¼
where Bð; Þ is the beta function. Substituting (1) and (2) into the above expression, we can write
ib1 n b oni ab gðxÞ½1 GðxÞa1 1 ½1 GðxÞa 1 1 ð1 GðxÞÞa : Bði; n iÞ
f i:n ðxÞ ¼
To obtain an expansion for f i:n ðxÞ we proceed as follows. Using the binomial expansion (12), the above density can be expressed as
f i:n ðxÞ ¼
ni X 1 ni ði þ kÞb 1 a b gðxÞ X ½1 GðxÞaþr1 : ð1Þkþr Bði; n iÞ k¼0 r¼0 k r
Using again (12) and the BS cdf (3), the density function of the EGBS order statistics reduces to 1 X
mjþ1 hjþ1 ðxÞ; x > 0;
f i:n ðxÞ ¼
ð27Þ
j¼0
where
mjþ1 ¼
ni X 1 X aþr1 ni ði þ kÞb 1 ab ð1Þkþrþj ðj þ 1Þ Bði; n iÞ k¼0 r¼0 j k r
and hjþ1 ðxÞ is the EBS density function with power parameter j þ 1 given in Section 4.1. Thus, the density function (27) of the EGBS order statistics is an infinite linear combination of the EBS density functions. So, several mathematical properties like moments, mean deviations and generating function of the EGBS order statistics can be directly obtained from those of the EBS distribution. For example, the moments of X i:n can be expressed in terms of the PWMs of the BS distribution as
EðX si:n Þ ¼
1 X ðj þ 1Þ mjþ1 ss;j :
ð28Þ
j¼0
We can obtain another closed-form expression for the moments using a result due to [6] applied to the independent and identically distributed case. We have (for s ¼ 1; 2 . . .) n X
EðX si:n Þ ¼ s
ð1Þjnþi1
j1
n j
ni
j¼niþ1
J j ðsÞ;
where (for j ¼ 1; . . . ; n)
J j ðsÞ ¼
Z
1
xj1 ½1 FðxÞs dx:
0
We require only J j ðsÞ to obtain EðX si:n Þ. We can insert (7) in the binomial term in J j ðsÞ and obtain
½1 FðxÞs ¼
1 X t p Uðv x Þp ; p¼0
where
t p ¼ t p ðs; a; bÞ ¼
1 X s X ra s kb : ð1Þkþrþp p k r r¼0 k¼0
Then,
EðX si:n Þ ¼ s
n X j¼niþ1
ð1Þjnþi1
j1 ni
X n 1 t p sj1;p ; j p¼0
which is an alternative formula for (28).
772
G.M. Cordeiro, A.J. Lemonte / Applied Mathematics and Computation 247 (2014) 762–779
5. Quantile measures The EGBS qf can be easily determined from the BS qf as given in (10). The effects of the additional shape parameters a and b on the skewness and kurtosis of the EGBS distribution can be considered based on quantile measures. The shortcomings of the classical kurtosis measure are well-known. There are many heavy-tailed distributions for which this measure is infinite. Thus, it becomes uninformative precisely when it needs to be. The Bowley skewness [26] is one of the earliest skewness measures defined by the average of the quartiles minus the median, divided by half the interquartile range, namely
B¼
Qð3=4Þ þ Q ð1=4Þ 2Qð1=2Þ : Q ð3=4Þ Q ð1=4Þ
Since only the middle two quartiles are considered and the outer two quartiles are ignored, this adds robustness to the measure. The Moors kurtosis ([40]) is based on octiles
M¼
Qð3=8Þ Q ð1=8Þ þ Q ð7=8Þ Qð5=8Þ : Q ð6=8Þ Q ð2=8Þ
Clearly, M > 0 and there is good concordance with the classical kurtosis measures for some distributions. These measures are less sensitive to outliers and they exist even for distributions without moments. Because M is based on the octiles, it is not sensitive to variations of the values in the tails or to variations of the values around the median. The basic justification for M as an alternative measure of kurtosis is the following: keeping Q ð6=8Þ Q ð2=8Þ fixed, M clearly decreases as Q ð3=8Þ Q ð1=8Þ and Q ð7=8Þ Q ð5=8Þ decrease. If Q ð3=8Þ Q ð1=8Þ ! 0 and Q ð7=8Þ Q ð5=8Þ ! 0, then M ! 0 and half of the total probability mass is concentrated in the neighborhoods of the octiles Q ð2=8Þ and Q ð6=8Þ. In Figs. 6 and 7, we plot the measures B and M as functions of the additional shape parameters a and b for some values of a. It is clear from the above definitions of B and M that these measures do not depend on the scale parameter b. Notice that these plots indicate that both measures B and M depend on all shape parameters. In particular, the measure B decreases as a and b increase, whereas the measure M increases as a and b increase. In short, on the basis of these measures, the additional shape parameters a and b have substantial effect on the skewness and kurtosis of the proposed distribution.
α = 0.5
α = 2.5
B B
a
a
b
b
Fig. 6. Plots of the measure B as functions of a and b.
α = 0.5
α = 2.5
M
M
a
b
a
Fig. 7. Plots of the measure M as functions of a and b.
b
G.M. Cordeiro, A.J. Lemonte / Applied Mathematics and Computation 247 (2014) 762–779
773
6. Maximum likelihood estimation >
Let x1 ; . . . ; xn be a random sample of size n of the EGBS distribution with unknown parameter vector h ¼ ða; b; a; bÞ . We consider the estimation of the model parameters of this distribution by the method of maximum likelihood. The log-likelihood function for h based on a given sample is given by
‘ðhÞ ¼ n logða bÞ þ
n X
log½V xi /ðv xi Þ þ ða 1Þ
n X
i¼1
log½Uðv xi Þ þ ðb 1Þ
i¼1
n X
log 1 Uðv xi Þa ;
ð29Þ
i¼1
pffiffiffi where v xi ¼ a1 qðxi =bÞ; V xi ¼ x3=2 ðxi þ bÞ=ð2a bÞ, and qðÞ is defined after Eq. (3). The maximum likelihood estimates i (MLEs) of the unknown parameters are obtained by maximizing the log-likelihood function (29) in relation to h. We make some assumptions [16, Chapter 9] on the behavior of ‘ðhÞ as the sample size n approaches infinity, such as the regularity of the first two derivatives of ‘ðhÞ with respect to h and the existence and uniqueness of the MLE of h. The likelihood equations are given by n n X Uðv xi Þa log½Uðv xi Þ @‘ðhÞ n X ¼ þ log½Uðv xi Þ ðb 1Þ ; @a a i¼1 1 Uðv xi Þa i¼1 n @‘ðhÞ n X ¼ þ log 1 Uðv xi Þa ; @b b i¼1
n n n
ða 1Þ X v xi /ðv xi Þ a ðb 1Þ X v xi /ðv xi Þ Uðv xi Þa1 @‘ðhÞ n 2 1X ¼ 1þ 2 þ 3 f ðxi =bÞ2 þ ; @a a a a i¼1 a i¼1 Uðv xi Þ a 1 Uðv xi Þa i¼1 n n n n
ða 1Þ X /ðv xi Þ fðxi =bÞ a ðb 1Þ X /ðv xi Þ fðxi =bÞ Uðv xi Þa1 @‘ðhÞ n X 1 1 X 2 ¼ þ þ q ðx =bÞ ; þ i @b 2b i¼1 t i þ b 2 a2 b i¼1 2 a b i¼1 Uðv xi Þ 2 a b i¼1 1 Uðv xi Þa >
> b; b b; a a; b bÞ of h ¼ ða; b; a; bÞ can be obtained by solving simultaneously the likelihood where fðzÞ ¼ z1=2 þ z1=2 . The MLE b h ¼ ðb equations
@‘ðhÞ ¼ 0; @a h¼bh
@‘ðhÞ ¼ 0; @b h¼bh
@‘ðhÞ ¼ 0; @ a h¼bh
@‘ðhÞ ¼ 0: @b h¼bh
There is no closed-form expression for the MLE b h and its computation has to be performed numerically using a nonlinear optimization algorithm. The Newton–Raphson iterative technique could be applied to solve the likelihood equations and obtain b h numerically. To start the Newton–Raphson algorithm or some other optimization algorithm, we require some initial guesses for the unknown parameters. We suggest for a and b the initial guesses
x e b þ 2 e h b
ae ¼
!1=2 ;
1=2 e b ¼ x h ;
respectively, where n 1X x ¼ xi ; n i¼1
n 1X 1 n i¼1 xi
¼ h
!1 :
a , is obtained from the solution of the non-linear equation The initial guess for a, say e
0
n þ e a
n X i¼1
B e xi Þ þ B log½Uð v @
1
ea n CX Uð ve xi Þ log½Uð ve xi Þ þ 1C ¼0 A Pn e e i¼1 e xi Þ a e a 1 Uð v i¼1 log 1 Uð v xi Þ
n
e 1 q xi = e e xi ¼ a a , where v b . Finally, after computing the initial guess for a from the above non-linear equation, the initial for e guess for b becomes
e b¼
Pn
n
e e a i¼1 log 1 Uð v xi Þ
:
The observed information matrix used for computing asymptotic confidence intervals for the parameters a; b; a and b can be determined numerically from standard maximization routines, which now provide the observed information matrix as part of their output; e.g., one can use the R functions optim or nlm, the Ox function MaxBFGS, the SAS procedure NLMixed, among others, to compute the observed information matrix numerically.
774
G.M. Cordeiro, A.J. Lemonte / Applied Mathematics and Computation 247 (2014) 762–779
Table 1 Empirical means. a
b
n ¼ 90
n ¼ 150
b a
b b
ab
b b
b a
b b
ab
b b
2.0
2.5 3.5 4.5 5.5
4.113 3.759 4.082 3.231
0.607 0.778 1.327 1.365
0.213 0.242 0.208 0.175
1.967 2.540 2.193 1.587
2.837 3.681 2.920 2.819
0.876 1.052 1.383 2.806
0.298 0.266 0.281 0.225
1.351 1.772 1.560 1.514
2.5
2.5 3.5 4.5 5.5
3.901 4.107 4.019 4.091
0.830 0.817 0.983 1.310
0.241 0.192 0.181 0.206
1.621 1.649 1.889 1.576
3.551 2.991 3.077 2.911
0.870 1.173 1.277 1.421
0.273 0.268 0.249 0.210
1.498 1.175 1.165 1.355
3.0
2.5 3.5 4.5 5.5
4.443 5.023 3.900 3.532
0.922 0.835 1.313 1.229
0.259 0.201 0.184 0.175
3.453 1.748 1.526 1.202
4.341 3.534 2.841 2.985
1.015 1.401 1.492 1.850
0.293 0.291 0.236 0.221
1.678 1.182 0.994 1.117
3.5
2.5 3.5 4.5 5.5
5.141 5.285 4.201 4.331
0.820 0.989 1.002 1.166
0.216 0.200 0.158 0.153
1.410 1.484 1.206 1.606
3.940 3.491 3.102 2.873
1.056 1.289 1.388 1.545
0.282 0.267 0.231 0.231
1.053 1.290 0.939 0.850
Next, a small Monte Carlo simulation experiment is conducted to evaluate the estimations of the EGBS parameters. The simulation was performed using the Ox matrix programming language [18]. The Ox program is freely distributed for academic purposes and available at http://www.doornik.com. The number of Monte Carlo replications was R ¼ 5000. For maxe and e imizing the log-likelihood function, we use the subroutine MaxBFGS with analytical derivatives and take e b, a a; e b as initial values for the model parameters to start the algorithm. The evaluation of point estimation was performed based on the empirical mean for each sample size. We set the sample size at n ¼ 90 and 150, the parameter a at a ¼ 2:0, 2.5, 3.0 and 3.5, and the parameter b at b ¼ 2:5, 3.5, 4.5 and 5.5. The parameters a and b were fixed at 0.5 and 1.0, respectively. The simulation results reveal that the parameters a and b are overestimated, whereas the parameters b and a are underestimated (see Table 1). It should be emphasized that the MLEs are also biased for the BS model parameters [45,34] and hence our simulation results are in according to the above papers. It indicates that correction methods as those considered in [34] could be used to improve the MLEs for the EGBS parameters. However, this is beyond the scope of this paper and could be considered in a separate paper elsewhere. Censored data occur very frequently in lifetime data analysis. Some mechanisms of censoring are identified in the literature as, for example, types I and II censoring. The EGBS survival function has a simple expression and hence this distribution can be used in analyzing data in the presence of censoring. In what follows, we shall consider the general case of multicensored data. Suppose there are n ¼ n0 þ n1 þ n2 subjects of which n0 are known to have failed at the times x1 ; . . . ; xn0 , n1 are known to have failed in the interval ½si1 ; si for i ¼ 1; . . . ; n1 , and n2 survived at a time ri (i ¼ 1; . . . ; n2 Þ but not observed any longer. Notice that types I and II censoring are contained as particular cases of multicensoring. The log-likelihood function of > h ¼ ða; b; a; bÞ for this multicensoring data takes the form
‘ ðhÞ ¼ n0 logða bÞ þ
n0 X
n0 n0 X X log½V xi /ðv xi Þ þ ða 1Þ log½Uðv xi Þ þ ðb 1Þ log 1 Uðv xi Þa
i¼1
þ
n2 X
i¼1
i¼1
n1 n n b o X b b o þ : log 1 1 Uðv ri Þa log 1 Uðv si Þa 1 Uðv si1 Þa
i¼1
i¼1
The MLEs are obtained by maximizing the above log-likelihood function with respect to the unknown parameters. The likelihood equations are b1 n0 n0 n2 X X Uðv xi Þa log½Uðv xi Þ Uðv ri Þa ½1 Uðv ri Þa log½Uðv ri Þ @‘ ðhÞ n0 X ¼ þ log½Uðv xi Þ ðb 1Þ þ b b @a a 1 Uðv xi Þa 1 ½1 Uðv r Þa i¼1 i¼1 i¼1 i
b1 b1 n1 n1 X X Uðv si Þa ½1 Uðv si Þa log½Uðv si Þ Uðv si1 Þa ½1 Uðv si1 Þa log½Uðv si1 Þ b þ b ; b b b b 1 Uðv si Þa 1 Uðv si1 Þa 1 Uðv si Þa 1 Uðv si1 Þa i¼1 i¼1 b1
b1
0 2 1 X ½1 Uðv ri Þa log½1 Uðv ri Þa X ½1 Uðv si Þa log½1 Uðv si Þa @‘ ðhÞ n0 X ¼ þ log 1 Uðv xi Þa þ b b a a b @b b 1 1 Uðv r Þ 1 Uðv s Þa i¼1 i¼1 i¼1 1 Uðv s Þ
n
n
n
i
b1 n1 X ½1 Uðv si1 Þa log½1 Uðv si1 Þa b ; b 1 Uðv si Þa 1 Uðv si1 Þa i¼1
i
i1
775
G.M. Cordeiro, A.J. Lemonte / Applied Mathematics and Computation 247 (2014) 762–779
n0 n0 n0
ða 1Þ X v xi /ðv xi Þ a ðb 1Þ X v xi /ðv xi Þ Uðv xi Þa1 @‘ ðhÞ n0 2 1X ¼ 1þ 2 þ 3 f ðxi =bÞ2 þ @a a a a i¼1 a i¼1 Uðv xi Þ a 1 Uðv xi Þa i¼1 b1
þ
2 v ri /ðv ri Þ Uðv ri Þa1 ½1 Uðv ri Þa ab X b a i¼1 1 1 Uðv ri Þa
þ
n1 v si1 /ðv si1 Þ Uðv si1 Þa1 ½1 Uðv si1 Þa ab X b b a i¼1 1 Uðv si Þa 1 Uðv si1 Þa
n
b1
1 v si /ðv si Þ Uðv si Þa1 ½1 Uðv si Þa ab X a i¼1 1 Uðv s Þa b 1 Uðv s Þa b i i1
n
b1
;
n0 n0 n0 n0
ða 1Þ X /ðv xi Þ fðxi =bÞ a ðb 1Þ X /ðv xi Þ fðxi =bÞ Uðv xi Þa1 @‘ ðhÞ n0 X 1 1 X 2 ¼ þ þ q ðx =bÞ þ i @b 2 a b i¼1 Uðv xi Þ 2 a b i¼1 2b i¼1 t i þ b 2 a2 b i¼1 1 Uðv xi Þa
þ
n2 /ðv ri Þ fðr i =bÞ Uðv ri Þa1 ½1 Uðv ri Þa ab X b 2 a b i¼1 1 1 Uðv r Þa i
b1
b1
n1 /ðv si Þ fðsi =bÞ Uðv si Þa1 ½1 Uðv si Þa ab X b b 2 a b i¼1 1 Uðv s Þa 1 Uðv s Þa i
i1
b1 n1 /ðv si1 Þ fðsi1 =bÞ Uðv si1 Þa1 ½1 Uðv si1 Þa ab X þ : b b 2 a b i¼1 1 Uðv s Þa 1 Uðv s Þa i
i1
The Newton–Raphson algorithm or some other optimization algorithm needs to be used to solve
@‘ ðhÞ ¼ 0; @a h¼bh
@‘ ðhÞ ¼ 0; @b h¼bh
@‘ ðhÞ ¼ 0; @ a h¼bh
@‘ ðhÞ ¼0 @b h¼bh
with respect to a; b; a and b, since the MLEs of these parameters cannot be obtained explicitly from this system of equations. The observed information matrix can be computed numerically. > > Besides estimation of the model parameters, hypothesis tests can be taken into account. Let h ¼ ðh> 1 ; h2 Þ , where h1 and h2 are disjoint subsets of h. Consider the test of the null hypothesis H0 : h1 ¼ h01 against H1 : h1 – h01 , where h01 is a specified h be the restricted MLE of h under H0 . The likelihood ratio (LR) statistic to test H0 is given by x ¼ 2½‘ðb hÞ ‘ðe hÞ. vector. Let e Under H0 and some regularity conditions, the LR statistic converges in distribution to a chi-square distribution with dim(h1 ) degrees of freedom. In particular, the LR statistic to test the null hypothesis H0 : a ¼ 1 against H1 : a – 1 is x1 ¼ 2½‘ð ba ; bb; ab ; bbÞ ‘ð1; eb; ae ; ebÞ, whereas the LR statistic to test H0 : b ¼ 1 against H1 : b – 1 is x2 ¼ 2½‘ð ba ; bb; ab ; bbÞ ‘ð ea ; 1; ae ; ebÞ. The limiting distributions of x1 and x2 are v2 under the null hypotheses, which are 1
rejected if the test statistics exceed the upper 100ð1 gÞ% quantile of the v21 distribution. Also, for testing b; b e; e b; a a; b bÞ ‘ð1; 1; a bÞ, which has a v22 distribution under H0 : ða; bÞ ¼ ð1; 1Þ against H1 : ða; bÞ – ð1; 1Þ, we have x3 ¼ 2½‘ð b the null hypothesis. The null hypothesis is rejected if the observed value of x3 exceeds the upper 100ð1 gÞ% quantile of the v22 distribution. 7. Real data illustrations In the following, we present two applications of the EGBS distribution to real data for illustrative purposes. We also consider their sub-models to fit the data, i.e., the EBS, LeBS and BS models. All the computations presented were performed using the Ox matrix programming language (http://www.doornik.com). The first real data set corresponds to the exceedances of flood peaks (in m3 =s) of the Wheaton River near Carcross in Yukon Territory, Canada. The data consist of 72 exceedances for the years 1958–1984, rounded to one decimal place [12]: 1.7, 2.2, 14.4, 1.1, 0.4, 20.6, 5.3, 0.7, 1.9, 13.0, 12.0, 9.3, 1.4, 18.7, 8.5, 25.5, 11.6, 14.1, 22.1, 1.1, 2.5, 14.4, 1.7, 37.6, 0.6, 2.2, 39.0, 0.3, 15.0, 11.0, 7.3, 22.9, 1.7, 0.1, 1.1, 0.6, 9.0, 1.7, 7.0, 20.1, 0.4, 2.8, 14.1, 9.9, 10.4, 10.7, 30.0, 3.6, 5.6, 30.8, 13.3, 4.2, 25.5, 3.4, 11.9, 21.5, 27.6, 36.4, 2.7, 64.0, 1.5, 2.5, 27.4, 1.0, 27.1, 20.2, 16.8, 5.3, 9.7, 27.5, 2.5, 27.0. Table 2 gives a descriptive summary for these data. Table 3 lists the MLEs (and the corresponding standard errors in parentheses) of the unknown parameters of the EGBS, EBS, LeBS and BS distributions for the exceedances of flood peaks data. The initial choices for e ¼ 1:7530 and a; b; a and b to start the Newton–Raphson algorithm were the values e b ¼ 7:8984, a a ¼ 2:4820; e e b ¼ 4:8114, respectively. The convergence takes place after 36 steps. Table 4 presents the maximized log-likelihood function and the values of AIC (Akaike information criterion) and CAIC (Corrected Akaike information criterion) statistics for all fitted models. From the figures in this table, we conclude that the new EGBS, EBS and LeBS models outperform the BS distribution in terms of model fitting for these data. The values of AIC for the EBS and LeBS models are very close (see Table 4). There is some uncertainty as to what constitutes a ‘‘substantial’’ difference in AIC values in practical situations. According to Burnham and Anderson’s [9, p. 70] empirical evidence scale, the difference can be computed between the best approximating model and the next best models, denoted by Dm . The models with values of Dm 2 ½0; 2 have substantial support to be considered as good as the best approx-
776
G.M. Cordeiro, A.J. Lemonte / Applied Mathematics and Computation 247 (2014) 762–779 Table 2 Descriptive statistics; exceedances of flood peaks data. Mean
Median
Variance
Skewness
Kurtosis
Minimum
Maximum
12.20
9.50
151.22
1.47
5.89
0.10
64.00
Table 3 MLEs and standard errors in parentheses; exceedances of flood peaks data. Distribution
Estimates
EGBSða; b; a; bÞ
0.4032 (0.9211)
1.4662 (1.8272)
1.7261 (1.7540)
EBSðb; a; bÞ
2.9000 (0.7105)
2.7682 (0.5483)
1.1230 (0.5425)
LeBSða; a; bÞ
0.1596 (0.1277)
1.1991 (0.2274)
0.8313 (0.5689)
BSða; bÞ
1.7583 (0.1477)
4.4179 (0.6497)
1.0420 (0.6654)
Table 4 Maximized log-likelihood, AIC and CAIC for the fitted models; exceedances of flood peaks data. Distribution
‘ðb hÞ
AIC
CAIC
EGBS EBS LeBS BS
249.97 250.08 250.04 256.03
507.95 506.15 506.07 516.05
508.54 506.51 506.42 516.23
Table 5 Some measures for the fitted models; exceedances of flood peaks data. Distribution
Dm
ERm
wm
EGBS EBS LeBS BS
1.88 0.08 0 9.98
2.56 1.04 – 146.94
0.166 0.407 0.424 0.003
imating model. Two additional measures are then introduced to provide the ‘‘strengths’’ of each model: the evidence ratio (ERm) and the weight (wm ) of model m. These measures are defined as
ERm ¼
expðDmin =2Þ ¼ expðDm =2Þ; expðDm =2Þ
expðDm =2Þ wm ¼ PM ; m¼1 expðDm =2Þ
where M denotes the total number of models considered, and given that Dmin ¼ 0. The values of ERm can be interpreted as the greater likelihood of the best approximating model with respect to model m, whereas the values of wm can be interpreted as the probability of a given model being the best approximating model. The values of these measures are given in Table 5. From this table, we note that the LeBS model is about two times more likely to be the best approximating model than the EGBS model. Also, the value of ERm for the EBS is near one, which means that this model is as likely as the LeBS model. Additionally, the chance of the BS model with respect to the LeBS model (and to the EBS model as well) is non-existent. The best LeBS model has a weight of 0.424; that is, there is a 42.4% chance that it really is the best approximating model among the current models to describe the data. Notice that, by definition, the strength of evidence in favor of model i over the model j is obtained simply by considering wi =wj . The second real data set corresponds to lifetimes for 50 industrial devices put on life test at time zero [3]: 0.1, 0.2, 1, 1, 1, 1, 1, 2, 3, 6, 7, 11, 12, 18, 18, 18, 18, 18, 21, 32, 36, 40, 45, 46, 47, 50, 55, 60, 63, 63, 67, 67, 67, 67, 72, 75, 79, 82, 82, 83, 84, 84, 84, 85, 85, 85, 85, 85, 86, 86. Table 6 gives a descriptive summary for these data. Table 7 lists the MLEs (and the corresponding standard errors in parentheses) of the unknown parameters of the EGBS, EBS, LeBS and BS distributions for these data. e ¼ 2:5464 The initial choices for a; b; a and b to start the Newton–Raphson algorithm are taken as e b ¼ 10:159, a a ¼ 2:6446; e and e b ¼ 10:249, respectively. The convergence takes place after 28 steps. Table 8 presents the maximized log-likelihood function, AIC and CAIC statistics for all fitted models for the industrial devices data. We can conclude that the new EGBS, EBS and LeBS models outperform the BS distribution in terms of model fitting for these data. The values of the measures
777
G.M. Cordeiro, A.J. Lemonte / Applied Mathematics and Computation 247 (2014) 762–779 Table 6 Descriptive statistics; industrial devices data. Mean
Median
Variance
Skewness
Kurtosis
Minimum
Maximum
35.88
34.00
861.61
0.14
1.41
0.10
83.00
Table 7 MLEs and standard errors in parentheses; industrial devices data. Distribution
Estimates
EGBSða; b; a; bÞ
0.4842 (0.5946)
2.3661 (1.9400)
3.8217 (2.5563)
EBSðb; a; bÞ
4.2523 (0.8832)
5.6919 (1.6179)
0.8769 (0.5549)
LeBSða; a; bÞ
0.0786 (0.0593)
1.7322 (0.4321)
0.7021 (0.5239)
BSða; bÞ
2.7455 (0.2982)
7.1877 (1.5499)
0.8528 (0.5200)
Table 8 Maximized log-likelihood, AIC and CAIC for the fitted models; industrial devices data. Distribution
‘ðb hÞ
AIC
CAIC
EGBS EBS LeBS BS
237.05 237.24 237.99 253.47
482.09 480.48 481.97 510.94
482.98 481.00 482.49 511.20
Dm , ERm and wm are given in Table 9. From this table, we have that the EBS model is about two times more likely to be the best approximating model than the EGBS and LeBS models. Additionally, the chance of the BS model with respect to the EBS model is non-existent. The best EBS model has a weight of 0.520; that is, there is a 52% chance that it really is the best approximating model among the current models to describe the data. Finally, we shall apply formal goodness-of-fit tests in order to verify which distribution fits better the exceedances of flood peaks and industrial devices data sets. We consider the Cramér–von Mises (W ) and Anderson–Darling (A ) statistics, which are described in details in [11]. These non-parametric statistics are very easy to carry out and work fairly well as demonstrated by [11]. In general, the smaller the values of these statistics, the better the fit to the data. The values of W and A are given in Table 10 for both data sets. Note that the EGBS distribution and their sub-models outperform the BS distribution and hence fit the exceedances of flood peaks and industrial devices data sets better than the BS distribution according to
Table 9 Some measures for the fitted models; industrial devices data. Distribution
Dm
ERm
wm
EGBS EBS LeBS BS
1.61 0 1.49 30.46
2.24 – 2.11 4.1E6
0.233 0.520 0.247 0
Table 10 Statistics W and A . Distribution
EGBS EBS LeBS BS
Exceedances of flood peaks
Industrial devices
W
A
W
A
0.09963 0.10308 0.10082 0.18572
0.56571 0.58330 0.57213 1.13518
0.42614 0.42916 0.44602 0.67087
2.59420 2.61017 2.69735 3.82853
778
G.M. Cordeiro, A.J. Lemonte / Applied Mathematics and Computation 247 (2014) 762–779
these non-parametric statistics. In summary, the proposed EGBS distribution (and their sub-models) may be interesting alternatives to the BS distribution for modeling positive real data. 8. Concluding remarks We propose a generalization of the Birnbaum–Saunders distribution called the exponentiated generalized Birnbaum–Saunders (EGBS) distribution. The new distribution is constructed using Cordeiro et al.’s [15] generator. The EGBS model has three shape parameters and one scale parameter, and includes as special sub-models the exponentiated Birnbaum–Saunders, Lehmann type II Birnbaum–Saunders and Birnbaum–Saunders distributions. The EGBS density function can take various forms depending on its shape parameters. Moreover, its hazard rate function can have different forms: (i) decreasing; (ii) increasing; (iii) upside-down bathtub; (iv) bathtub-shaped; (v) increasing–decreasing–increasing; and (vi) decreasing–increasing– decreasing. Therefore, it can be used quite effectively in analyzing real data in practice. We provide a comprehensive account of the mathematical properties of the EGBS distribution, including explicit expressions for the ordinary and incomplete moments, generating function, mean deviations and Bonferroni and Lorenz curves. The estimation of the model parameters is approached by the maximum likelihood method for uncensored and censored data. We present two applications to real data for illustrative purposes. In conclusion, the EGBS distribution may provide a rather flexible mechanism for fitting a wide spectrum of real world data and therefore it may serve as an alternative model to the BS distribution for modeling real data in areas such as engineering, survival analysis, hydrology, economics, among others. Acknowledgments We are very grateful to the referee for the comments and suggestions. The authors gratefully acknowledge financial support from CNPq (Brazil) and FACEPE (Pernambuco, Brazil). References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34]
C. Alexander, G.M. Cordeiro, E.M.M. Ortega, J.M. Sarabia, Generalized beta-generated distributions, Comput. Stat. Data Anal. 56 (2012) 1880–1897. E.K. AL-Hussaini, Inference based on censored samples from exponentiated populations, Test 19 (2010) 487–513. M.V. Aarset, How to identify bathtub hazard rate, IEEE Trans. Reliab. 36 (1987) 106–108. N. Balakrishnan, V. Leiva, J. López, Acceptance sampling plans from truncated life tests from generalized Birnbaum–Saunders distribution, Commun. Stat. Simul. Comput. 36 (2007) 643–656. N. Balakrishnan, V. Leiva, A. Sanhueza, L. Filidor-Vilca, Estimation in the Birnbaum–Saunders distribution based on scale-mixture of normals and the EM-algorithm, Test 33 (2009) 171–192. H.M. Barakat, Y.H. AbdelKader, Computing the moments of order statistics from nonidentical random variables, Stat. Methods Appl. 13 (2004) 15–26. C.R. Bhatti, The Birnbaum–Saunders autoregressive conditional duration model, Math. Comput. Simul. 80 (2010) 2062–2078. Z.W. Birnbaum, S.C. Saunders, A new family of life distributions, J. Appl. Probab. 6 (1969) 319–327. K.P. Burnham, D.R. Anderson, Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, Springer, New York, 2002. J.M.F. Carrasco, E.M.M. Ortega, G.M. Cordeiro, A generalized modified Weibull distribution for lifetime modeling, Comput. Stat. Data Anal. 53 (2008) 450–462. G. Chen, N. Balakrishnan, A general purpose approximate goodness-of-fit test, J. Qual. Technol. 27 (1995) 154–161. V. Choulakian, M.A. Stephens, Goodness-of-fit for the generalized Pareto distribution, Technometrics 43 (2001) 478–484. G.M. Cordeiro, M. de Castro, A new family of generalized distributions, J. Stat. Comput. Simul. 81 (2011) 883–898. G.M. Cordeiro, A.J. Lemonte, The b-Birnbaum–Saunders distribution: an improved distribution for fatigue life modeling, Comput. Stat. Data Anal. 55 (2011) 1445–1461. G.M. Cordeiro, E.M.M. Ortega, D.C.C. Cunha, The exponentiated generalized class of distributions, J. Data Sci. 11 (2013) 1–27. D.R. Cox, D.V. Hinkley, Theoretical Statistics, Chapman and Hall, London, 1974. A.F. Desmond, Stochastic models of failure in random environments, Can. J. Stat. 13 (1985) 171–183. J.A. Doornik, An Object-Oriented Matrix Language – Ox 6, Timberlake Consultants Press, London, 2009. N. Eugene, C. Lee, F. Famoye, Beta-normal distribution and its applications, Commun. Stat. Theory Methods 31 (2002) 497–512. I.S. Gradshteyn, I.M. Ryzhik, Table of Integrals, Series, and Products, Academic Press, New York, 2007. P. Guiraud, V. Leiva, R. Fierro, A non-central version of the Birnbaum–Saunders distribution for reliability analysis, IEEE Trans. Reliab. 58 (2009) 152– 160. R.C. Gupta, R.D. Gupta, Analyzing skewed data by power normal model, Test 17 (2008) 197–210. R.C. Gupta, P.L. Gupta, R.D. Gupta, Modeling failure time data by Lehman alternatives, Commun. Stat. Theory Methods 27 (1998) 887–904. R.D. Gupta, D. Kundu, Exponentiated exponential family: an alternative to gamma and Weibull distributions, Biom. J. 33 (2001) 117–130. N. Johnson, S. Kotz, N. Balakrishnan, Continuous Univariate Distributions, second ed., vol. 2, Wiley, New York, 1995. J.F. Kenney, E.S. Keeping, Mathematics of Statistics, Part 1, third ed., Princeton, NJ, 1962, pp. 101–102. P. Kumaraswamy, Generalized probability density-function for double-bounded random-processes, J. Hydrol. 46 (1980) 79–88. D. Kundu, N. Kannan, N. Balakrishnan, On the hazard function of Birnbaum–Saunders distribution and associated inference, Comput. Stat. Data Anal. 52 (2008) 2692–2702. C. Lee, F. Famoye, A.Y. Alzaatreh, Methods for generating families of univariate continuous distributions in the recent decades, WIREs Comput. Stat. 5 (2013) 219–238. V. Leiva, M. Barros, G.A. Paula, A. Sanhueza, Generalized Birnbaum–Saunders distributions applied to air pollutant concentration, Environmetrics 19 (2008) 235–249. V. Leiva, A. Sanhueza, J.M. Angulo, A length-biased version of the Birnbaum–Saunders distribution with application in water quality, Stochastic Environ. Res. Risk Assess. 23 (2009) 299–307. A.J. Lemonte, W. Barreto–Souza, G.M. Cordeiro, The exponentiated Kumaraswamy distribution and its log-transform, Braz. J. Probab. Stat. 27 (2013) 31–53. A.J. Lemonte, G.M. Cordeiro, The exponentiated generalized inverse Gaussian distribution, Stat. Probab. Lett. 81 (2011) 506–517. A.J. Lemonte, F. Cribari–Neto, K.L.P. Vasconcellos, Improved statistical inference for the two-parameter Birnbaum–Saunders distribution, Comput. Stat. Data Anal. 51 (2007) 4656–4681.
G.M. Cordeiro, A.J. Lemonte / Applied Mathematics and Computation 247 (2014) 762–779
779
[35] A.J. Lemonte, S.L.P. Ferrari, Testing hypotheses in the Birnbaum–Saunders distribution under type-II censored samples, Comput. Stat. Data Anal. 55 (2011) 2388–2399. [36] A.J. Lemonte, A.B. Simas, F. Cribari–Neto, Bootstrap-based improved estimators for the two-parameter Birnbaum–Saunders distribution, J. Stat. Comput. Simul. 78 (2008) 37–49. [37] A.W. Marshall, I. Olkin, Life Distributions. Structure of Nonparametric, Semiparametric and Parametric Families, Springer, New York, 2007. [38] J.B. McDonald, Some generalized functions for the size distribution of income, Econometrica 52 (1984) 647–663. [39] S.G. Meintanis, Inference procedures for the Birnbaum–Saunders distribution and its generalizations, Comput. Stat. Data Anal. 54 (2010) 367–373. [40] J.J.A. Moors, A quantile alternative for kurtosis, J. R. Stat. Soc. D 37 (1998) 25–32. [41] G.S. Mudholkar, D.K. Srivastava, Exponentiated Weibull family for analyzing bathtub failure-rate data, IEEE Trans. Reliab. 42 (1993) 299–302. [42] S. Nadarajah, The exponentiated exponential distribution: a survey, AStA Adv. Stat. Anal. 95 (2011) 219–251. [43] S. Nadarajah, G.M. Cordeiro, E.M.M. Ortega, The exponentiated Weibull distribution: a survey, Stat. Pap. 54 (2013) 839–877. [44] S. Nadarajah, S. Kotz, The exponentiated type distributions, Acta Appl. Math. 92 (2006) 97–111. [45] H.K.T. Ng, D. Kundu, N. Balakrishnan, Modified moment estimation for the two-parameter Birnbaum–Saunders distribution, Comput. Stat. Data Anal. 43 (2003) 283–298. [46] B. Pradhan, D. Kundu, Inference and optimal censoring schemes for progressively censored Birnbaum–Saunders distribution, J. Stat. Plann. Inference 143 (2013) 1098–1108. [47] S.C. Saunders, Reliability, Life Testing, and Prediction of Service Lives, Springer, New York, 2007. [48] R. Terras, A Miller algorithm for an incomplete Bessel function, J. Comput. Phys. 39 (1981) 233–240. [49] A. Xu, Y. Tang, Reference analysis for Birnbaum–Saunders distribution, Comput. Stat. Data Anal. 54 (2010) 185–192. [50] B.X. Wang, Generalized interval estimation for the Birnbaum–Saunders distribution, Comput. Stat. Data Anal. 56 (2012) 4320–4326. [51] K. Zografos, N. Balakrishnan, On families of beta-and generalized gamma-generated distribution and associate inference, Stat. Methodol. 6 (2009) 344– 362.