Statistical Methodology 29 (2016) 32–50
Contents lists available at ScienceDirect
Statistical Methodology journal homepage: www.elsevier.com/locate/stamet
A two-parameter general inflated Poisson distribution: Properties and applications Athanasios C. Rakitzis a,∗ , Philippe Castagliola a , Petros E. Maravelakis b a
LUNAM Université, Université de Nantes & IRCCyN UMR CNRS 6597, Nantes, France
b
Department of Business Administration, University of Piraeus, 80 Karaoli & Dimitriou Street, 18534, Piraeus, Greece
highlights • • • •
A new two-parameter general inflated Poisson distribution is proposed and studied. The proposed model can be effectively used for fitting underdispersed and overdispersed count data. A general mechanism for inflating the zero and non-zero values of any discrete distribution is offered. Real-data examples illustrate the practical usefulness of the proposed model.
article
info
Article history: Received 7 March 2015 Received in revised form 16 October 2015 Accepted 26 October 2015 Available online 3 November 2015 MSC: 62F99 62P99 62E15 Keywords: Count data EM algorithm Maximum likelihood estimation Overdispersion Poisson distribution Underdispersion
abstract In this work, we propose and study a two-parameter modification of the ordinary Poisson distribution that is suitable for the modeling of non-typical count data. This model can be viewed as an extension of the zero-inflated Poisson distribution. We derive the proposed model as a special case of a general one and focus our study on it. The theoretical properties for each model are given, while estimation methods for the two-parameter model are discussed as well. Three practical examples illustrate its usefulness. The results show that the proposed model is very flexible in the modeling of various types of count data. © 2015 Elsevier B.V. All rights reserved.
∗
Corresponding author. E-mail addresses:
[email protected] (A.C. Rakitzis),
[email protected] (P. Castagliola),
[email protected] (P.E. Maravelakis). http://dx.doi.org/10.1016/j.stamet.2015.10.002 1572-3127/© 2015 Elsevier B.V. All rights reserved.
A.C. Rakitzis et al. / Statistical Methodology 29 (2016) 32–50
33
1. Introduction The Poisson distribution is a widely used probability model in the description of count related data that arise very frequently across many disciplines. When a large number of zero counts is present in the data, there is evidence of overdispersion and the ordinary Poisson distribution is not an appropriate model, because it often underestimates the observed dispersion. The use of the zero-inflated Poisson (ZIP) distribution has been considered by several authors as an appropriate model for count data with an excessive number of zeros. Xie and Goh [31] and Xie et al. [32] used the ZIP distribution in the context of statistical quality control, in order to describe a zero-defect process, subject to random shocks. According to this model, the random shocks will cause some defects (non-conforming items) and the number of defects follows a Poisson distribution with parameter λ. Note also that the shocks occur independently with parameter 1 − φ , where φ ∈ [0, 1] is the zero-inflated parameter. Other applications of the ZIP model can be found in manufacturing (Lambert [17], Bae et al. [1]), epidemiology (Bohning [4]), biology (Yip [33]), healthcare (Brown et al. [5]), insurance (Yip [34]) and accident data (Lord et al. [18], Ullah et al. [29]). Bohning [3] provided several practical examples where the ZIP distribution is the appropriate model for the available data. Also, he discussed the numerical aspects of parameter estimation for the ZIP distribution, using the Computer Assisted Analysis of Mixtures (C.A.MAN) package. A natural extension of the ZIP model is to assume that random shocks influence not only the zero values but also all the first r + 1 values {0, 1, . . . , r } of a Poisson distribution with parameter λ. The idea is that instead of a single inflated parameter φ0 ∈ [0, 1], which only impacts on the smallest value x = 0, there are r + 1 inflated parameters φ0 , φ1 , . . . , φr , which impact on the first r + 1 smallest values x = 0, . . . , r. When data are not inflated only at ‘‘zero’’, they are usually referred to as ‘‘non-zero’’ or as ‘‘general inflated’’ models. It seems that Yoneda [35] was the first who studied a modification of the Poisson distribution with arbitrary probabilities to the first r + 1 values and the remaining probabilities proportional to e−x λx /x!. Padney [22] proposed the Poisson distribution inflated at point ‘‘8’’, as an appropriate model for the number of flowers of the plant Primula Veris. Murat and Szynal [21] considered modified Power Series distributions inflated at point r , r > 0, and extended the results of Gupta et al. [13] who studied the distributional properties of zero-inflated Power Series distributions. Melkersson and Rooth [20] proposed the zero-two inflated Poisson (ZTIP) distribution (i.e., their model is only inflated at ‘‘0’’ and ‘‘2’’) as an appropriate model for describing the Swedish women’s fertility. Recently, Begum et al. [2] proposed and studied an inflated Poisson distribution (called Generalized Inflated Poisson), considering also special cases of it in order to fit the data of Melkersson and Rooth [20]. The main motivation of our work is to develop an appropriate discrete probability model that generalizes the ZIP model while it can be used for modeling unusual count data, i.e., data with various level of dispersion or non-unimodal data. Moreover, it would be desirable to have the least possible number of model parameters, in order to keep things simple. According to Weiß [30] (see also Sellers et al. [26]), when the equidispersion property of the Poisson distribution is violated, there is a vast amount of research articles that proposes several alternative models in order to handle the overdispersion of the data but there are only few, when underdispersion is present. Moreover, we will focus on two-parameter models, in order to have only one additional parameter that controls the degree of inflation in all the first r + 1 values of the Poisson distribution. The paper is laid out as follows: In Section 2 we start with a general model that consists of r + 2 parameters, for inflating the first r +1 values of the Poisson distribution, and then we derive as a special case, the proposed two-parameter model. The properties for both models are given. In Section 3 we discuss the ability of the proposed model in describing overdispersed and underdispersed data while estimation methods are discussed in Section 4. Three practical applications, concerning the fitting of the considered models, are provided in Section 5 while conclusions and future research are discussed in Section 6. 2. The proposed model In the current section, we present two discrete probability models that can be considered as generalizations of the well known zero-inflated Poisson distribution (see, for example, Johnson et al.
34
A.C. Rakitzis et al. / Statistical Methodology 29 (2016) 32–50
[14, p. 193]). Our study will mainly focus on the second model, a two-parameter one, which is a special case of the first. Note also that the same inflation mechanism can be applied to any other discrete distribution defined on N or a subset of it. 2.1. The r-First Inflated Poisson distribution By definition, X is a r-First Inflated Poisson random variable with parameters θ = (φ0 , . . . , φr , λ), denoted as X ∼ FIP r (φ0 , . . . , φr , λ), if it is defined for x ∈ {0, 1, . . .} and if its probability mass function (p.m.f.) fFIP r (x |θ ) is given by
fFIP r
r 1 φi fP (x |λ ) , φx + r + 1 − r + 1 i =0 (x |θ ) = r 1 φi fP (x |λ ) , r +1− r +1 i=0
if x ∈ {0, . . . , r } if x ∈ {r + 1, . . .},
or, equivalently, fFIP r
r φx 1 I{0,1,...,r } (x) + r +1− φi fP (x |λ ) , (x |θ ) = r +1 r +1 i =0
x ∈ {0, 1, . . .},
(1)
where IA (x) is an indicator function, such that IA (x) = 1 if x ∈ A, IA (x) = 0 if x ̸∈ A and fP (x |λ ) is the p.m.f. of the standard Poisson distribution with parameter λ. The proposed FIP r distribution can be written as a special mixture model having two classes where the first class has a fixed value for x = 0, 1, . . . , r and the second class has the standard Poisson distribution with parameter λ (see, Johnson et al. [14, p. 348]). The parameter r ≥ 0 defines the number of the first r + 1 values of the distribution that are inflated. Also, φ0 , φ1 , . . . , φr are the inflated parameters for each of the values 0, 1, . . . , r, whereas λ > 0 is the mean of the Poisson distribution. Depending on the values of φ0 , φ1 , . . . , φr , the FIP r (φ0 , . . . , φr , λ) distribution reduces to
• the so-called zero-inflated Poisson distribution with parameters φ0 and λ (i.e., ZIP (φ0 , λ)) if r = 0, • the standard Poisson distribution with parameter λ if r > 0 and φx = 0 for x ∈ {0, . . . , r }, • the Dirac distribution at x = x0 if, for x ∈ {0, 1, . . . , r }, φx = r + 1 when x = x0 and φx = 0 when x ̸= x0 . In order for fFIP r (x |θ ) to be a true p.m.f., we notice that, by construction, no matter the values of φ0 , φ1 , . . . , φr (positive or negative, smaller or larger than 1), we necessarily have ∞
fFIP r (x |θ ) = 1.
x=0
Thus, the only remaining constraint is that fFIP r (x |θ ) ≥ 0, for x ∈ {0, 1, . . .}, which implies the following r + 2 inequalities:
φx + r + 1 −
r
φi fP (x |λ ) ≥ 0 for x ∈ {0, . . . , r },
i=0
r +1−
r
φi ≥ 0.
i=0
Now, if we also assume that φ0 ≥ 0, φ1 ≥ 0, . . . , φr ≥ 0 (this will be our assumption in the sequel of this paper), then, since fP (x |λ ) ≥ 0 for x ∈ {0, 1, . . .}, the only constraint for parameters φ0 , φ1 , . . . , φr is r +1−
r i=0
φi ≥ 0.
(2)
A.C. Rakitzis et al. / Statistical Methodology 29 (2016) 32–50
35
Note that, unlike the ZIP (φ0 , λ) distribution for which φ0 ∈ [0, 1], inequality (2) is satisfied even if some of the φi ’s are larger than 1. The cumulative distribution function (c.d.f.) FFIP r (x |θ ) of the FIP r (φ0 , . . . , φr , λ) distribution can be derived from the p.m.f. fFIP r (x |θ ) and it is equal to
FFIP r
0, ⌊ x⌋ r 1 φi + r + 1 − φi FP (x |λ ) , (x |θ ) = r + 1 i=0 i =0 r r 1 φi + r + 1 − φi FP (x |λ ) , r + 1 i=0 i=0
for x < 0 for 0 ≤ x ≤ r
for x ≥ r + 1,
where ⌊. . .⌋ is the rounded down integer and FP (x |λ ) is the c.d.f. of the Poisson distribution with parameter λ. Also, the probability generating function (p.g.f.) GFIP r (s) of the FIP r (φ0 , . . . , φr , λ) distribution is given by
1
GFIP r (s) =
r +1
r
s φx + x
r +1−
r
x =0
φi GP (s) ,
(3)
i=0
while its moment generating function (m.g.f.) MFIP r (t ) can be directly obtained by substituting s = et in (3) and it is given by MFIP r (t ) =
1
r
r +1
etx φx +
r +1−
x =0
r
φi MP (t ) ,
i=0
where GP (s) = e−λ(1−s) and MP (t ) = e−λ(1−e ) are the p.g.f. and the m.g.f. of the Poisson distribution with parameter λ, respectively. By using the formula t
E (X ) = MX (t ) dt k d
k
,
t =0
we directly obtain the non-central moments of order k for the random variable X and, after successive differentiations, it is not difficult to verify that
1
E (X ) = k
r
r +1
x φx + k
r +1−
x =1
r
φi EP (X ) , k
i=0
where EP (X k ) is the kth order non-central moment of the Poisson distribution with parameter λ. In particular, using the fact that EP (X ) = λ and EP (X 2 ) = λ(λ + 1), the first two moments of the FIP r (φ0 , . . . , φr , λ) distribution are E (X ) =
1 r +1
E (X ) = 2
1 r +1
r
xφx +
r +1−
x =1
r x =1
r
φi λ ,
i=0
x φx + 2
r +1−
r
φi λ(1 + λ) ,
i =0
and, consequently, the variance of the FIP r (φ0 , . . . , φr , λ) distribution can be computed by the well2 = E (X 2 ) − E 2 (X ). known formula V (X ) ≡ σFIP r Before closing this subsection, we mention that the proposed model provides a general mechanism for inflating one (or more) non-zero values of the Poisson distribution. For example, if r > 0 with φi0 ∈ (0, r + 1) and φj = 0, for (i0 , j) ∈ {0, 1, . . . , r }2 and j ̸= i0 , then compared to the ordinary Poisson distribution with parameter λ, the resulted FIP r (θ) distribution will have an excessive number of i0 values. We will refer to this distribution as the i0 -inflated Poisson distribution while for i0 = 0, the zero-inflated Poisson distribution is obtained. Also, for r = 2, (φ0 , φ2 ) ∈ (0, 3)2 , φ1 = 0 and φ0 + φ1 + φ2 < 3, we obtain the ZTIP distribution of Melkersson and Rooth [20]. For illustrative purposes, we provide the p.m.f. of various FIP r distributions for different parameter sets (see Fig. 1).
36
A.C. Rakitzis et al. / Statistical Methodology 29 (2016) 32–50
(a) (r , φ0 , φ1 , φ2 , φ3 , λ) = (3, 0.8, 0.7, 0, 0, 2).
(b) (r , φ0 , φ1 , φ2 , φ3 , λ) = (3, 0.8, 0.6, 0.4, 0, 1).
(c) (r , φ0 , φ1 , φ2 , λ) = (2, 0.5, 0.75, 0.2, 2).
(d) (r , φ0 , φ1 , λ) = (1, 0.7, 0.4, 3).
Fig. 1. Plots of the p.m.f. for various FIP r distributions.
Remark. The FIP r distribution can be also viewed as a special case of the Generalized Inflated Poisson distribution (GIP) of Begum et al. [2]. More specifically, the p.m.f. of the GIP distribution is of the form
fGIP (x |λ, θ1 , . . . , θm ) = θx I{k1 ,k2 ,...,km } (x) +
1−
m
θi fP (x |λ ) ,
x ≥ 0,
(4)
i=1
where x ∈ {0, 1, . . .} and λ > 0. Also, for 1 ≤ i ≤ m, it is (θ1 , θ2 , . . . , θm ) ∈ (0, 1)m , (k1 , k2 , . . . , km ) ∈ m {0, 1, . . .}m and 0 < i=1 θi < 1. Therefore, for m = r + 1, k1 = 0, . . . , kr +1 = r and θi = φi /r + 1, i = 1, 2, . . . , m, the GIP distribution coincides with the FIP r distribution. Conversely, the FIP r distribution with φi = (r + 1)θi and φx = 0 for x ̸∈ {k1 , k2 , . . . , km }, is the GIP distribution of Begum et al. [2]. Even though these two models coincide, the FIP r distribution can be considered as more practical than the GIP distribution, since in practice, we do not know a priori which values of the Poisson distribution are inflated. On the other hand, the GIP distribution is more flexible than the FIP r distribution, whereas both models have a large number of parameters, especially for large values of r or m, which have to be determined. In the sequel we will not consider the GIP distribution, since our aim is to establish a model with the least possible number of parameters. 2.2. The r-geometrically inflated Poisson distribution The main drawback of the previous model is that there are r + 2 parameters to handle. In practice, this may be a problem if these parameters have to be estimated, because of the large size of the preliminary sample that might be necessary. A special case of the r-First Inflated Poisson distribution is obtained if we replace each parameter φi , i = 0, . . . , r, by a function that depends on a single parameter φ ∈ [0, 1] and i. Since in many practical applications, low Poisson counts are observed (e.g., in case of high-yield processes or in public-health surveillance), it is natural to assume that we have φ0 ≥ φ1 ≥ · · · ≥ φr ≥ 0.
A.C. Rakitzis et al. / Statistical Methodology 29 (2016) 32–50
37
In that sense, a logical choice is to replace φi with φ i+1 , i = 0, . . . , r. Obviously, instead of φ i+1 , any other function of φ can be used. For φ ∈ [0, 1], the sequence φ0 , φ1 , . . . is a geometric sequence. We suggest to call the resulting distribution as the Geometrically Inflated Poisson distribution of order r, with parameters φ and λ, and to denote it GIP r (φ, λ). Clearly, by using this function, all the first r + 1 values of the Poisson distribution with parameter λ are inflated. Next, our study will mainly focus on the case of the r-Geometrically Inflated Poisson distribution with parameters φ and λ. Also, we assume that the number r of the inflated values according to the model, is known. The rational behind this choice was the need for a two-parameter model, i.e., compared with the Poisson distribution, we have one additional parameter (φ ) while compared with the ZIP distribution, the number of the parameters that has to be estimated is the same. Guidelines for the choice of r are provided in Section 5. Consequently, the p.m.f. fGIP r (x |φ, λ ), the c.d.f. FGIP r (x |φ, λ ), the p.g.f. GGIP r (s) and the m.g.f. MGIP r (t ) of the GIP r (φ, λ) distribution can be simply obtained from the the c.d.f., the p.g.f. and p.m.f., x x x the m.g.f. of the FIP r (λ, φ0 , . . . , φr ) distribution by replacing the sums i=0 φi , i=0 iφi and i=0 i2 φi with their respective expressions, that is
x +1 φ 1 − φ , for φ ∈ [0, 1) φi = φ i+1 = g0 (x, φ) = 1−φ i=0 i=0 x + 1, for φ = 1, x +1 xφ − (x + 1)φ x + 1 2 , x x φ (1 − φ)2 iφi = iφ i+1 = g1 (x, φ) = i=0 i=0 x(x + 1) , x
x
2
for φ ∈ [0, 1) for φ = 1,
and x
i2 φi =
i=0
x
i2 φ i+1 = g2 (x, φ)
i=0
φ(1 + φ) − φ x+1 ((φ x)2 − φ(2x2 + 2x − 1) + (x + 1)2 ) , φ (1 − φ)3 = 2 x(2x + 3x + 1) , 6
for φ ∈ [0, 1) for φ = 1.
Note also that r + 1 − g0 (r , φ) > 0 holds for every φ ∈ (0, 1). Thus, we directly obtain the p.m.f. fGIP r (x |φ, λ ) and the c.d.f. FGIP r (x |φ, λ ) of the GIP r distribution, which are respectively given by
1 φ x+1 + (r + 1 − g0 (r , φ)) fP (x |λ ) , if x ∈ {0, . . . , r } r +1 fGIP r (x |φ, λ ) = 1 (r + 1 − g0 (r , φ)) fP (x |λ ) , if x ∈ {r + 1, . . .}, r +1 0, for x < 0 1 ((g0 (⌊x⌋, φ) + (r + 1 − g0 (r , φ)) FP (x |λ ))) , for 0 ≤ x ≤ r FGIP r (x |φ, λ ) = r + 1 1 (g (r , φ) + (r + 1 − g (r , φ)) F (x |λ )) , for x ≥ r + 1. 0 0 P r +1 In a similar manner, the 1st and 2nd non-central moments are, respectively, given by
µGIP r ≡ E (X ) = E (X 2 ) =
1 r +1
1
(g1 (r , φ) + (r + 1 − g0 (r , φ)) λ) ,
(5)
(g2 (r , φ) + (r + 1 − g0 (r , φ)) λ(1 + λ)) .
(6)
r +1
38
A.C. Rakitzis et al. / Statistical Methodology 29 (2016) 32–50
(a) r = 3, φ = 0.8, λ = 2.
(b) r = 2, φ = 0.6, λ = 1.
(c) r = 3, φ = 0.7, λ = 5.
(d) r = 1, φ = 0.8, λ = 3.
Fig. 2. Plots of the p.m.f. for various GIP r distributions.
As for the case of the FIP r (φ0 , φ1 , . . . , φr , λ) distribution, the variance of the GIP r (φ, λ) distribution 2 is evaluated by σGIP = E (X 2 ) − µ2GIP r where the E (X 2 ) is given in (6). r Moreover, the p.g.f. and the m.g.f. of the GIP r (λ, φ0 , . . . , φr ) distribution are, respectively, given by
1 − (φ s)r +1 φ + (r + 1 − g0 (r , φ)) e−λ(1−s) , r +1 1 − φs 1 − (φ et )r +1 1 −λ(1−et ) MGIP r (t ) = φ + r + 1 − g ( r , φ)) e . ( 0 r +1 1 − φ et
GGIP r (s) =
1
Before closing this subsection, it is worth to mention that in the special case of φ = 1, the proposed model reduces to the Discrete Uniform distribution, defined on the set {0, 1, . . . , r } with equal probabilities 1/(r + 1). Also, for r → ∞, we obtain the standard Poisson distribution with parameter λ. Moreover, if φ = 1 and r = 0, the GIP r (φ, λ) distribution reduces to the Dirac distribution on x = 0. Next, we provide for illustrative purposes the p.m.f. of the GIP r (φ, λ) distribution, for different values of the parameters r, φ and λ (see Fig. 2). 2.3. Simulate from FIP r and GIP r distributions Because the proposed models are new, they are not yet implemented in statistical software packages. However, for both models, closed-form expressions for the p.m.f. are available (see Sections 2.1 and 2.2), which can be numerically evaluated. So, in order to simulate a random sample from the GIP r i distribution, the following procedure is suggested. Let also k=j fGIP r (k |φ, λ ) = 0 for i < j. 1. Choose an upper limit xmax , such that P (X ≥ xmax ) is very close to zero (e.g., less than 10−20 ). 2. Generate a random number U from the Uniform distribution defined on (0, 1), i.e., U ∼ U (0, 1).
A.C. Rakitzis et al. / Statistical Methodology 29 (2016) 32–50
39
3. For all i = 0 to xmax : if
i−1
fGIP r (k |φ, λ ) < U ≤
k=0
i
fGIP r (k |φ, λ ) ,
k=0
then we get the value i from the GIP r distribution. The procedure is (almost) the same for generating random numbers from the FIP r distribution. The above mentioned steps can be directly implemented in R (R Core Team [23]) by using the command sample(x=0 : xmax ,size= m ,replace=TRUE,prob=...), where m is the size of the random sample and prob is the vector of probabilities (P (X = 0), P (X = 1), . . . , P (X = xmax )). 3. Modeling data with various levels of dispersion In the literature there are several flexible discrete distributions, which aim to model either the overdispersion or the underdispersion of the data. The most common are the Consul’s generalized Poisson (GP) distribution (Consul [8]), the double Poisson (DP) distribution (Efron [12]), the Conway–Maxwell–Poisson (CMP) distribution (Shmueli et al. [27]) and the Power-law (PL) weighted Poisson distribution (del Castillo and Pérez-Casany [10]). Under certain circumstances, the proposed GIP r distribution can be used to model not only overdispersed, but underdispersed data as well. Note also that all the above mentioned distributions are two-parameter models. According to del Castillo and Pérez-Casany [10], given a random variable X , the index of dispersion is defined as I (X ) = V (X )/E (X ). When I (X ) > 1 (I (X ) < 1), the distribution is overdispersed (underdispersed) while when I (X ) = 1, the distribution is equidispersed. The dispersion index for the GIP r distribution is equal to I (X ) =
E (X 2 ) − µ2GIP r
µGIP r
,
where µGIP r and E (X 2 ) are given by (5) and (6), respectively. Therefore, since µGIP r > 0, the I (X ) > 1 if
(r + 1 − g0 (r , φ))(g0 (r , φ)λ2 − 2g1 (r , φ)λ) > g1 (r , φ)(r + 1 + g1 (r , φ)) − (r + 1)g2 (r , φ). (7) Given the value of r, inequality (7) can be used for characterizing the level of dispersion of the GIP r distribution in terms of φ and λ. For example, for r = 1 and λ = 2,
• when φ ∈ (0, 0.931142), the GIP 1 (φ, 2) distribution is overdispersed, • when φ ∈ (0.931142, 1), the GIP 1 (φ, 2) distribution is underdispersed, • when φ = 0 or φ = 0.931142, the GIP 1 (φ, 2) distribution is equidispersed. In a similar manner, for r = 1 and φ = 0.9,
• when λ > 1.71764, the GIP 1 (φ, 2) distribution is overdispersed, • when λ ∈ (0, 1.71764), the GIP 1 (φ, 2) distribution is underdispersed, • when λ = 1.71764, the GIP 1 (0.9, λ) distribution is equidispersed. Moreover, in Fig. 3(a)–(d) we provide the contour plots for r ∈ {1, 2, 3, 4}, in terms of the dispersion index I (X ). These plots depict the (λ, φ)-region where the GIP r is underdispersed. It is not possible to derive an analytical expression for that region. The values for λ are in the horizontal axis while the φ values are in the vertical axis. The boundary is labeled with ‘‘1’’ (i.e., I (X ) = 1) and the region above (below) it, corresponds to an under- (over-) dispersed GIP r distribution. It is not difficult to verify that as r increases, this area becomes narrower and for r ≥ 4, all region is overdispersed. Finally, it goes without saying that since the GIP r distribution can be used for modeling underdispersed/overdispersed data, the FIP r distribution is also appropriate for that purpose, due to its flexibility. However, it is in general more difficult to determine an analytical expression (similar to (7)) for characterizing the dispersion level, or provide useful graphs, because of the large number of its parameters.
40
A.C. Rakitzis et al. / Statistical Methodology 29 (2016) 32–50
4. Estimation Suppose that X = (X1 , X2 , . . . , Xm ) is a random sample from the GIP r (φ, λ) distribution. We want to estimate the parameters φ and λ, given the value of r. A simple approach is to employ the method of moments, where the theoretical moments E (X ), E (X 2 ) are equated to the respective sample moments, resulting to the following system of equations m
m
Xi
i=1
Xi2
i=1
, E (X ) = . m m After some necessary algebraic manipulations, we get the following two equations, i.e., E (X ) =
2
(r + 1)
m
Xi2 −
+ m(g1 (r , φ) − g2 (r , φ)) =
Xi
i =1
i=1
λ=
m
1 m(r + 1 − g0 (r , φ))
(r + 1)
m i=1
Xi2
−
m
Xi
(r + 1)
m
2 Xi − mg1 (r , φ)
i =1
,
m(r + 1 − g0 (r , φ))
(8)
1/2 + m(g1 (r , φ) − g2 (r , φ))
.
(9)
i =1
Thus, the moment estimator φ˜ of φ is obtained numerically (i.e., using an iterative method) from ˜ for λ, directly Eq. (8) and then, by using the estimate of φ , we can deduce the moment estimator λ from Eq. (9). It should be mentioned that this method can lead to a value of φ˜ that is outside the interval [0, 1], especially when the sample size is not very large. Clearly, in that case it does not make sense to rely on moment estimators and another estimation method must be used. Apart from the moment estimation method, estimates for φ and λ can be obtained via the Maximum Likelihood (ML) method. In order to find the maximum likelihood estimators (MLE) of φ and λ, we have to maximize the likelihood function, which is given by
L(φ, λ |x ) =
m
fGIP r (xi |φ, λ ).
i=1
Clearly, it is difficult to have an analytical expression for L(φ, λ |x ). Thus, the maximization can be ˆ that made via numerical methods (e.g., Newton–Raphson), where the MLE are the values φˆ and λ maximize L(φ, λ |x ), i.e.,
ˆ λ) ˆ = argmax L(φ, λ |x ), (φ, φ,λ
under the constraints φ ∈ (0, 1) and λ > 0. Apart from directly maximizing the likelihood function, the EM algorithm (Dempster et al. [11]) can be employed in order to estimate the parameters of the GIP r distribution. This technique has been strongly advocated by several authors (see, for example, McLachlan and Peel [19], Karlis [15], Sur et al. [28]) for estimating the parameters of a mixture distribution. Let X = (X1 , X2 , . . . , Xm ) be a random sample from the GIP r (φ, λ) distribution and suppose that we know which of the values {0, 1, . . . , r } are structural (i.e., they come from the first class of the mixture distribution in Eq. (1)) and which come from the Poisson distribution. We denote as N0,s , N1,s , . . . , Nr ,s the total number of values {0, 1, . . . , r } in the sample that are structural and as N0 , N1 , . . . , Nr the total number of values {0, 1, . . . , r }. Next, we will estimate φ and λ by assuming that Ns = (N0,s , N1,s , . . . , Nr ,s ) are missing values (latent variables). Let also N = (N0 , N1 , . . . , Nr ). The (complete) likelihood function L(φ, λ; N, Ns , X) is written in the form
L(φ, λ; N, Ns , X) =
Nj,s Nj −Nj,s r 1 1 φ j+1 · (r + 1 − g0 (r , φ))fP (j |λ ) r +1 r +1 j =0 1 × (r + 1 − g0 (r , φ)) fP (Xj |λ ). r +1 X ̸∈{0,1,...,r } j
A.C. Rakitzis et al. / Statistical Methodology 29 (2016) 32–50 1.0
2
1.0
7
6
0.8
0.8
41
1 3 5
4
3
0.6
4
5
0.6
0.4
0.4
0.2
0.2
2
0.0
0.0
1
0
2
4
6
8
10
(a) r = 1.
0
2
4
6
8
10
(b) r = 2.
1.0
1.0
3
1 3
4.5 5 1
0.8
0.8
3.5
3.5 4
4 2
0.6
0.6 2
2
2.5
2.5
0.4
0.4
1.5
0.2
1.5
0.2
1.5
1.5
0.0
0.0 0
2
4
(c) r = 3.
6
8
10
0
2
4
6
8
10
(d) r = 4. Fig. 3. Contour plots of the I (X ) index for r ∈ {1, 2, 3, 4} and different values for φ , λ.
Since the Ns values are not available, we can find the MLE of φ and λ by applying the following two steps of the EM algorithm: E-Step: Given X, N and the estimate of φ and λ at the kth step, say φ (k) and λ(k) , compute the conditional expected value of the log-likelihood function Q φ, λ φ (k) , λ(k)
= ENs |X,N,φ (k) ,λ(k) (ln L(φ, λ; N, Ns , X)) r φ j+1 = E Nj,s Nj , φ (k) , λ(k) ln (r + 1 − g0 (r , φ))fP (i |λ ) j =0 r r + 1 − g0 (r , φ) |λ + Nj ln fP (j ) r +1 i=0 r r + 1 − g0 (r , φ) + m− Nj ln + ln fP (xj |λ ). r +1 j =0 x ̸∈{0,1,...,r } j
42
A.C. Rakitzis et al. / Statistical Methodology 29 (2016) 32–50
Table 1 Simulated observations (m = 150) from the GIP 2 (0.6, 3) distribution. 6 5 3 3 1 0 5
1 2 5 1 2 2 2
3 3 2 3 4 6 1
2 5 2 3 0 2 4
5 0 2 1 2 1 1
0 0 1 4 0 2 5
5 1 1 0 0 3 2
1 1 2 2 2 3 0
2 3 2 0 4 2 0
3 2 3 1 0 1 1
5 4 4 2 2 3 2
4 3 4 5 8 8 2
1 0 2 2 5 1
1 1 1 2 2 4
1 3 0 6 2 0
0 2 2 2 5 1
2 0 1 5 1 0
3 5 1 1 2 0
5 0 6 2 0 1
3 3 2 7 1 1
0 1 1 0 2 1
0 1 0 2 3 4
2 3 4 1 3 1
The expected value E(Nj,s Nj , φ, λ ), is equal to
E(Nj,s Nj , φ, λ ) =
Nj φ j+1
j ∈ {0, 1, . . . , r }. M-Step: Find the values φ (k+1) , λ(k+1) that maximize Q (φ, λ φ (k) , λ(k) ), i.e., (φ (k+1) , λ(k+1) ) = argmax Q φ, λ φ (k) , λ(k) .
φ j+1 + (r + 1 − g0 (r , φ))fP (j |λ )
,
(φ,λ)
Since closed form solutions for the estimates φ (k+1) , λ(k+1) are unavailable, they will be obtained numerically via an appropriate iterative procedure. The above steps are repeated until we achieve the ˆ (e.g., when |(φ (k+1) −φ (k) )/φ (k) | ≤ 10−5 , |(λ(k+1) −λ(k) )/λ(k) | ≤ desired accuracy in the estimates φˆ , λ
ˆ λˆ ) of the unobserved 10−5 ). Using this estimation procedure, we can also get an estimate E(Nj,s Nj , φ,
frequencies of the structural values {0, 1, . . . , r }. Moreover, for establishing approximate confidence intervals for φ and λ, we can exploit the observed Fisher information matrix, which takes on the (general) form
I I = 11 I21
I12 , I22
where
∂2 , = ENs |X,N,φ,λ − 2 ln(L(φ, λ; N, Ns , X)) ∂φ ˆ λ) ˆ (φ,λ)=(φ, ∂2 , = I21 = ENs |X,N,φ,λ − ln(L(φ, λ; N, Ns , X)) ∂φ∂λ ˆ λ) ˆ (φ,λ)=(φ, ∂2 = ENs |X,N,φ,λ − 2 ln(L(φ, λ; N, Ns , X)) . ∂λ ˆ λ) ˆ (φ,λ)=(φ,
I11
I12
I22
As an example of the above mentioned estimation method, suppose that m = 150 (simulated) observations are available from a GIP 2 (0.6, 3) distribution. The x1 , x2 , . . . , x150 values are given in Table 1 (read from left to right, by row). From the data in Table 1 we obtain the moment estimates for φ and λ, which are φ˜ = 0.463 and λ˜ = 2.801. Also, the N = (N0 , N1 , N2 ) = (25, 35, 38). Using the moment estimates as the initial valˆ = 2.891, whereas the ues, we employ the EM algorithm and obtain at iteration 158 that φˆ = 0.551, λ estimates of the Nj,s values, j = 0, 1, 2 are N0,s = 20.85, N1,s = 17.11 and N2,s = 10.16, respectively.
ˆ are I (φ) ˆ −1/2 = 0.0353 and I (λ) ˆ −1/2 = 0.1685 respectively, The estimated standard errors of φˆ and λ whereas the approximate 1 − α = 95% confidence intervals for φ and λ are ˆ −1/2 = (0.4820, 0.6202), φˆ ± zα/2 I (φ)
ˆ −1/2 = (2.5611, 3.2215). λˆ ± zα/2 I (λ)
Note also that by directly maximizing the likelihood function, we obtain the same point estimates ˆ = 2.892), with corresponding standard errors I (φ) ˆ −1/2 = 0.1570 and I (λ) ˆ −1/2 = (φˆ = 0.551, λ 0.3800. The respective approximate 95% confidence intervals are
ˆ −1/2 = (0.2435, 0.8590), φˆ ± zα/2 I (φ)
ˆ −1/2 = (2.1477, 3.6355). λˆ ± zα/2 I (λ)
A.C. Rakitzis et al. / Statistical Methodology 29 (2016) 32–50
43
Before closing this section we mention that the above mentioned estimation methods can be used in a similar way, after some necessary modifications, in order to estimate the parameters (φ0 , φ1 , . . . , φr , λ) in the case of the FIP r (φ0 , φ1 , . . . , φr , λ) distribution. The details are left to the reader. 5. Examples In the current section we will fit the FIP r and GIP r models into three different sets of count data. The aim is to highlight their practical value in the modeling of count data. The first example is about the number of outbreaks of strikes, the second is about the number of roots of a specific apple variety while the third is about the number of responses in a 15-point rating scale. Clearly, the techniques applied in this section can be similarly applied to data from any area of applied research with similar characteristics. Before proceeding to the examples, it is worth to mention some practical information about how to choose the value of r. In general, it is advised to pre-specify the value of r before using the preferred estimation method. When previous knowledge is not available, model selection criteria can be applied in order to determine r. Two of the most common criteria are the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which are respectively given by AIC = −2 ln(Lmax ) + 2k,
BIC = −2 ln(Lmax ) + k ln(n),
where n is the sample size, k is the number of the parameters that have been estimated and Lmax is the maximum value of the likelihood function for the estimated model. The usual practice is to estimate first a number, say M , of candidate models (e.g., FIP r ) using the MLE method, for r ∈ {0, 1, . . . , M − 1} and then to compute the AIC (BIC) for each model. The model with the smallest AIC (BIC) is chosen as the ‘‘best’’ model. Note also that as r increases, the number of the parameters for the FIP r model increases as well. More formal statistical tests, like the likelihood ratio test (see, for example, Casella and Berger [6]), can be used for determining the number of inflated points and choose the value of r for the FIP r distribution. In that case, it is of interest to test the hypothesis H0 : φ0 = φ1 = · · · = φr = 0
vs. H1 : ∃ φi ̸= 0,
for at least one i ∈ {0, 1, . . . , r }, i.e., a reduced model against a more general one. For the GIP r model, the number of parameters does not change as r increases, and thus, the best model can also be determined from profile likelihood estimation, by considering r as a nuisance parameter (see, for example, Davison [9]). In the sequel, we will use the AIC as the main criterion for selecting the r value for the FIP r and GIP r models. Finally, it is worth mentioning that there is no simple conclusion about which criterion is better in selecting the ‘‘best’’ model. However, the BIC has a heavier penalty than the AIC (i.e., ln(n) instead of 2) on the over-fitting and thus, it tends to select the model with relatively simpler structure than that of AIC. 5.1. The strike counts data example Let us consider the number of outbreaks of strikes (in 4-week periods) in the UK coal mining industry for the years 1948–1959. These data can be found in Kendall [16], while they were further analyzed by del Castillo and Pérez-Casany [10], Ridout and Besbeas [24] and Weiß [30]. From the previous studies, it is known that the mean and variance are equal to 0.994 and 0.742, respectively. Therefore, there is an indication of underdispersion in the data since the variance–mean ratio is equal to 0.742/0.994 = 0.747 < 1. We have already seen that the FIP r and GIP r distributions can be used for modeling data with underdispersion. Next, we will fit the FIP r distribution to the data by computing numerically the ML estimates of the parameters φ and λ, along with the corresponding standard errors. The R’s function optim was used for that purpose. Further, we assume that r = 1, since from the barplot of the strike counts data (see Fig. 4) it is evident that there is an excessive number of 1’s and (maybe) 0’s in the data.
44
A.C. Rakitzis et al. / Statistical Methodology 29 (2016) 32–50 Table 2 AIC values of different FIP r models. r
0
1
2
3
4
AIC
406.75
380.79
387.24
394.92
403.03
Fig. 4. Barplot for the strike counts data.
Therefore, we consider as possible models for describing appropriately the data, the FIP 1 (φ0 , φ1 , λ) and FIP 1 (0, φ1 , λ) distributions. Our choices can be further justified by the results in Table 2, from which we deduce that the FIP 1 (φ0 , φ1 , λ) model has the minimum AIC value. Note also that since the ZIP model (for r = 0) has the largest AIC value among the models in Table 2, we may also conclude that there is not only zero-excess in the data. The data might demonstrate an excessive number of 0’s and 1’s or an excessive number of only 1’s. The latter, justifies our choice for the FIP 1 (0, φ1 , λ). Also, for comparison purposes, we fitted in the data the CMP distribution using R’s package comPoisson. The results are given in Table 3. For each model, we also provide the expected frequencies under the specific models using a three-decimal accuracy. The values for the AIC, BIC and Pearson’s χ 2 goodness-of-fit test are given in the respective rows while in the column labeled ‘‘P-value’’, we provide the P-value of the Pearson’s goodness-of-fit χ 2 -test. The degrees of freedom (d.f.) for the χ 2 -test are given in the row ‘‘df’’ while the standard errors of the estimates are given in the parenthesis. It is worth mentioning that even if some of the expected frequencies are small, there are no more than 20% of cells (for each of the considered probability models) with an expected frequency lower than 5 while all the expected frequencies are at least 1 (we considered the value 0.835 as very close to 1). Weiß [30] fitted in these data the Good, the PL1 and the PL2 distributions and evaluated the same model selection criteria. Therefore, it is not difficult to verify (by a direct comparison between Table 1 in Weiß [30] and Table 3) that the FIP 1 distribution shows a very good fit in terms of the AIC, BIC and χ 2 . Especially, the FIP 1 (0, φ1 , λ) distribution attains the lowest value in all the three criteria and, consequently, it is preferable than the GP, Good, PL1 , PL2 and CMP distributions. Note also that the differences between the FIP 1 (φ0 , φ1 , λ) and the Good, PL1 and PL2 distributions, in terms of AIC and BIC, are not very big; however, the FIP 1 (φ0 , φ1 , λ) distribution has two extra parameters than the Poisson distribution and one extra parameter than all other models. Finally, the CMP distribution shows a better fitting in the data only than the Poisson and GP distributions, even though the values of AIC and BIC are not very large, compared to the remaining distributions.
A.C. Rakitzis et al. / Statistical Methodology 29 (2016) 32–50
45
Table 3 Fitting various models in the strike counts data. Value
Observed counts
FIP 1 (φ0 , φ1 , λ)
FIP 1 (0, φ1 , λ)
CMP
0 1 2 3 4
46 76 24 9 1
47.655 76.129 22.424 7.476 1.869
46.927 76.000 23.094 7.637 1.894
47.501 70.406 30.663 6.523 0.835
380.79 386.89 0.88 1 0.3473
378.64 384.79 0.72 2 0.6980
380.01 386.11 2.91 2 0.2331
λˆ = 1.0002 (0.1983) φˆ 0 = 0.03622 (0.2910) φˆ 1 = 0.401151 (0.2037)
λˆ = 0.9921 (0.0885) φˆ 1 = 0.3775 (0.1266)
λˆ = 1.4822 (0.2522) νˆ = 1.7670 (0.2954)
AIC BIC
χ2 df P-value Estimates
Fig. 5. Barplot for the roots data.
5.2. The roots data example In the current subsection, we will use the data given in Ridout et al. [25]. This dataset contains the number of roots produced by 270 micropropagated shoots of the columnar apple cultivar Trajan. The estimated mean and variance of the data are, respectively, equal to 5.059 and 15.707. Therefore, the variance–mean ratio is equal to 15.707/5.059 = 3.105 > 1 which is a strong indication of overdispersed data. In Fig. 5, we give the barplot of the data. A clear indication from this chart is that there is an excessive number of zeros. However, the frequencies for the non-zero values does not seem to be very small, so a zero-inflated distribution might not be an appropriate model. The number of roots has an obvious peak at zero (i.e., the mode of the dataset) and then has a symmetric shape. Therefore, since the shape of the data in Fig. 5 is similar to a mixture distribution, it seems logical to assume that the GIP r (or the FIP r ) distribution, might be an appropriate model for this kind of data. Similar to the approach that was followed in Section 5.1, we provide in Table 4 fitting results for several discrete distributions for the roots dataset, along with the values of AIC, BIC and χ 2 . The considered models are the Poisson distribution (‘‘Poisson’’), the zero-inflated Poisson distribution (‘‘ZIP’’), the GIP 2 distribution (‘‘GIP 2 ’’), the FIP 2 distribution (‘‘FIP 2 ’’), the generalized Poisson
λˆ = 7.1394 (0.0416) φˆ = 0.4824 (0.0315)
λˆ = 6.6223 (0.1800) φˆ = 0.2360 (0.0259)
λˆ = 5.05926 (0.1369)
Estimates
χ
2
1367.74 1374.93 28.20 13 0.0085
43.572 22.052 14.057 9.407 16.791 23.975 28.527 29.095 25.965 20.597 14.705 9.544 5.678 3.118 1.590 0.757 0.338 0.142
GIP 2
df P-value
64.000 1.817 6.018 13.283 21.991 29.126 32.147 30.412 25.175 18.524 12.267 7.385 4.075 2.076 0.982 0.434 0.179 0.070
ZIP
1381.67 1388.87 63.16 12 <0.001
1.715 8.674 21.943 37.005 46.805 47.360 39.934 28.863 18.253 10.261 5.191 2.388 1.007 0.392 0.142 0.048 0.015 0.004
Poisson
1823.62 1827.22 2453.45 11 <0.001
64 10 13 15 21 18 24 21 23 21 17 12 5 2 3 0 0 1
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
AIC BIC
Observed counts
Value
Table 4 Fitting various models in the roots data.
φˆ1 = 0.0991 (0.0347) φˆ2 = 0.1015 (0.0402)
λˆ = 7.1175 (0.2067) φˆ0 = 0.7094 (0.0778)
1354.71 1369.11 12.11 11 0.3554
64.000 10.000 13.000 9.165 16.308 23.214 27.538 28.000 24.912 19.701 14.022 9.073 5.381 2.946 1.498 0.711 0.316 0.132
FIP 2
λˆ = 0.5637 (0.1470) θˆ = 2.2072 (0.1470)
1493.24 1500.43 144.31 13 <0.001
29.703 37.309 35.400 30.604 25.453 20.805 16.875 13.648 11.035 8.930 7.239 5.881 4.788 3.907 3.195 2.619 2.151 1.771
GP
λˆ = 1.1469 (0.0045) νˆ = 0.1568 (0.0039)
1451.35 1458.54 116.90 13 <0.001
25.691 29.466 30.315 29.268 27.011 24.071 20.847 17.624 14.590 11.858 9.479 7.466 5.800 4.450 3.374 2.531 1.880 1.383
CMP
46 A.C. Rakitzis et al. / Statistical Methodology 29 (2016) 32–50
A.C. Rakitzis et al. / Statistical Methodology 29 (2016) 32–50
47
Fig. 6. Observed (bars) and expected (lines) counts for the 15-point rating data.
distribution (‘‘GP’’) and the CMP distribution (‘‘CMP’’). Once again, the selection of r value for the FIP r and GIP r models, was based on the minimum AIC value. However, we decided to include the case r = 0 (i.e., the ZIP model) in order to have a complete comparison between various candidate models. Note also that the ZIP model has been considered by Ridout et al. [25] as an appropriate model for these data. Our analysis showed that the lowest AIC for the GIP r and FIP r models is attained for r = 2. Due to space economy, we do not present here the AIC values for the GIP r and FIP r models when r ̸∈ {0, 2}; they are available from the authors upon request. We also mention that the degrees of freedom for each model have been obtained by following Cochran [7]. Specifically, we combined the last cells (the ones corresponding to the large number of the roots) so that all the expected frequencies are at least equal to 1 and at the same time, the cells with values less than 5 are at most 20% of the total number of cells. Contrary to what we did in Section 5.1, we decided to merge the cells since there was a large number with expected frequencies less than 1 (see Table 4). Table 4 reveals that the FIP 2 distribution has the best fit in terms of AIC, BIC and χ 2 , with a P-value larger than 0.10. The P-values for the remaining models are lower than 0.01. Even though the FIP 2 distribution has the best fit in these data, it has two extra parameters than the ZIP, GIP r , GP and CMP models and three extra parameters than the Poisson model. 5.3. The bimodal data example In the current subsection, we will use the data from Table 5 in Sur et al. [28]. This dataset contains 1000 (simulated) observations from a CMP mixture, representing the responses of 1000 people in a 15-point rating scale. See Fig. 6 where the bars correspond to the observed counts. Clearly, there is double truncation in the data, at values 0 and 15. Therefore, we will fit in these data a truncated version of the GIP r distribution. It is not difficult to verify that for data in the range 1, 2, . . . , M, the p.m.f. of the truncated GIP r distribution, denoted as fTGIP r (x |φ, λ ), is given by fTGIP r (x |φ, λ ) =
fGIP r (x |φ, λ ) FGIP r (M |φ, λ ) − FGIP r (0 |φ, λ )
,
x ∈ {1, 2, . . . , M }.
From the data we easily deduce that they are also bimodal, with a first mode at value 1 and a second one at value 10. In addition, the first anti-mode (or lode, i.e., the location of the lowest value) is at value 4 and the second anti-mode is at value 15. According to Sur et al. [28], in the modeling of bimodal data it is important to capture the shape of the observed distribution as well as the magnitude of the modes and lodes of it. Therefore, we will not base our conclusions solely on a single measure,
48
A.C. Rakitzis et al. / Statistical Methodology 29 (2016) 32–50
Table 5 Fitting the truncated GIP r distribution in the 15-point rating scale data. Value
Simulated counts
TGIP 1
TGIP 2
TGIP 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
302 115 24 13 21 37 51 81 80 84 64 49 36 30 13
302.00 6.96 18.82 38.17 61.96 83.80 97.14 98.54 88.85 72.10 53.19 35.97 22.45 13.01 7.04
233.72 170.46 6.69 15.87 30.10 47.57 64.46 76.41 80.52 76.37 65.84 52.04 37.96 25.72 16.26
174.16 130.72 101.24 13.24 26.00 42.55 59.68 73.24 79.91 78.45 70.03 57.30 43.27 30.35 19.87
λˆ = 8.1149 (0.1133) φˆ = 0.6375 (0.0126)
λˆ = 9.4840 (0.1432) φˆ = 0.7217 (0.0104)
λˆ = 9.8185 (0.1538) φˆ = 0.7426 (0.0099)
1 2 8 15
1 3 9 15
1 4 9 15
5252.18 5262.00
4741.08 4750.90
4838.06 4847.88
Estimates First mode First lode Second mode Second lode AIC BIC
1 4 10 15
such as the AIC and the BIC. What is important, is to identify the magnitude and the locations of the popular and unpopular values of the distribution. In Table 5 and Fig. 6, we provide the results of fitting the TGIP 1 , TGIP 2 and TGIP 3 distributions in the data. R’s function optim was used for estimating the parameters of each model by directly maximizing the likelihood function. From Table 5 we deduce that the TGIP 3 model correctly identifies the modes and the first anti-mode in the data, whereas indicates a neighboring value (9) as the second mode. However, the estimate of the magnitude for the first mode (at value 1) is not very good while for both anti-modes, their magnitude is captured. With the exception of the magnitude for the first mode, the TGIP 3 distribution estimates very well the shape of the empirical distribution. Furthermore, comparing the fitting of the TGIP 3 distribution to the Poisson mixture and CMP mixture (see Table 5 in Sur et al. [28]), we notice that all these three models detect the same modes and anti-modes whereas the proposed model has the advantage of using only two parameters instead of 3 (for the Poisson mixture) or 5 (for the CMP mixture). It is worth mentioning that the AIC for the TGIP 2 distribution is comparable to the AIC value for the CMP mixture but it fails to identify the first anti-mode. However, it indicates a neighboring value (3) and it approximates very well the shape of the observed distribution. Consequently, under certain circumstances, the truncated GIP r distribution can be used for modeling that kind of data, even though it is not as flexible as the CMP mixture model. 6. Conclusions In this work, two general inflated Poisson distributions, a general one and a special case of it, were proposed and their properties (p.m.f., c.d.f., m.g.f.) were derived. The considered models are useful to model count data with an excessive number of the first r + 1 values 0, 1, . . . , r while they can be used for approximating several types of non-standard count data. Especially, the r-geometrically inflated Poisson distribution is a two-parameter model that offers a simple, unified, inflating mechanism for the first r + 1 values of a Poisson distribution with parameter λ. Apart from the theoretical properties of the GIP r distribution, three procedures for estimating its parameters were discussed. The usefulness of the considered models was further examined by fitting them in three datasets that arise in practice. The results showed that under certain circumstances, they can be used for fitting
A.C. Rakitzis et al. / Statistical Methodology 29 (2016) 32–50
49
underdispersed and overdispersed data or even, bimodal data. In addition, the GIP r distribution is, in terms of the AIC or BIC, a viable alternative to several popular discrete distributions such as the CMP or the GP. Finally, it goes without saying that instead of the Poisson distribution, several other distributions (e.g., binomial, negative binomial) can be inflated in the same way, offering a broad class of discrete distributions that can be used in the modeling of different types of count data. Acknowledgments The authors would like to thank the two anonymous reviewers and the associate editor for the constructive comments they made, which helped us to improve the content of the paper. The work of the first author has received funding from the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme (FP7/2007–2013) under the REA grant agreement no 328037. References [1] S.J. Bae, J.Y. Hwang, W. Kuo, Yield prediction via spatial modeling of clustered defect counts across a wafer map, IIE Trans. 39 (2007) 1073–1083. [2] A. Begum, A. Mallick, N. Pal, A generalized inflated Poisson distribution with application to modeling fertility data, Thail. Stat. 12 (2) (2014) 135–139. [3] D. Bohning, Zero-inflated Poisson models and C.A.MAN: A tutorial collection of evidence, Biom. J. 40 (7) (1998) 833–843. [4] D. Bohning, E. Dietz, P. Schlattmann, L. Mendonca, U. Kirchner, The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology, J. Roy. Statist. Soc. Ser. A 162 (2) (1999) 195–209. [5] S. Brown, A. Duncan, M.N. Harris, J. Roberts, K. Taylor, A zero-inflated regression model for grouped data, Oxford Bull. Econ. Stat. 77 (2015) 822–831. [6] G. Casella, R.L. Berger, Statistical Inference. Vol. 2, Duxbury Pacific Grove, CA, 2002. [7] W.G. Cochran, The χ 2 test of goodness of fit, Ann. Math. Statist. 23 (1952) 315–345. [8] P.C. Consul, Generalized Poisson Distributions-Properties and Applications, Marcel Dekker Inc., 1989. [9] A.C. Davison, Statistical Models. Vol. 11, Cambridge University Press, 2003. [10] J. del Castillo, M. Pérez-Casany, Weighted Poisson distributions for overdispersion and underdispersion situations, Ann. Inst. Statist. Math. 50 (3) (1998) 567–585. [11] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. Ser. B 39 (1) (1977) 1–38. [12] B. Efron, Double exponential families and their use in generalized linear regression, J. Amer. Statist. Assoc. 81 (3) (1986) 709–721. [13] P.L. Gupta, R.L. Gupta, R.C. Tripathi, Inflated modifed power series distributions with applications, Comm. Statist. Theory Methods 24 (9) (1995) 2355–2374. [14] N.L. Johnson, A.W. Kemp, S. Kotz, Univariate Discrete Distributions, third ed., John Wiley & Sons, 2005. [15] D. Karlis, EM algorithm for mixed Poisson and other discrete distributions, ASTIN Bull. 35 (1) (2005) 3–24. [16] M.G. Kendall, Natural law in social sciences, J. Roy. Statist. Soc. Ser. A 124 (1961) 1–19. [17] D. Lambert, Zero-inflated Poisson regression, with an application to defects in manufacturing, Technometrics 34 (1) (1992) 1–14. [18] D. Lord, S.P. Washington, J.N. Ivan, Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory, Accid. Anal. Prev. 37 (1) (2005) 35–46. [19] G.J. McLachlan, D. Peel, Finite Mixture Models, Wiley, 2000. [20] M. Melkersson, D.-O. Rooth, Modeling female fertility using inflated count data models, J. Popul. Econ. 13 (2) (2000) 189–203. [21] M. Murat, D. Szynal, Non-zero inflated modifed power series distributions, Comm. Statist. Theory Methods 27 (12) (1998) 3047–3064. [22] K.N. Padney, Generalized inflated Poisson distribution, J. Sci. Res. Banaras Hindu Univ. XV (2) (1964–1965) 157–162. [23] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2014, URL: http://www.R-project.org. [24] M.S. Ridout, P. Besbeas, An empirical model for underdispersed count data, Stat. Model. 4 (1) (2004) 77–89. [25] M. Ridout, C.G.B. Demétrio, J. Hinde, Models for count data with many zeros, in: Proceedings of the XIXth International Biometrics Conference, Cape Town, 1998, pp. 179–192, Invited Papers. [26] K.F. Sellers, S. Borle, G. Shmueli, The COM-Poisson model for count data: A survey of methods and applications, Appl. Stoch. Models Bus. Ind. 28 (2012) 104–116. [27] G. Shmueli, T.P. Minka, J.B. Kadane, S. Borle, P. Boatwright, A useful distribution for fitting discrete data: revival of the Conway–Maxwell–Poisson distribution, J. R. Stat. Soc. Ser. C. Appl. Stat. 54 (2005) 127–142. [28] P. Sur, G. Shmueli, S. Bose, P. Dubey, Modeling bimodal discrete data using Conway–Maxwell–Poisson mixture models, J. Bus. Econom. Statist. 33 (3) (2015) 352–365. [29] S. Ullah, C.F. Finch, L. Day, Statistical modelling for falls count data, Accid. Anal. Prev. 42 (2010) 384–392. [30] C.H. Weiß, Integer-valued autoregressive models for counts showing underdispersion, J. Appl. Stat. 40 (9) (2013) 1931–1948. [31] M. Xie, T.N. Goh, SPC of a near zero-defect process subject to random shocks, Qual. Reliab. Eng. Int. 9 (1993) 89–93. [32] M. Xie, B. He, T.N. Goh, Zero-inflated Poisson model in statistical process control, Comput. Statist. Data Anal. 38 (2001) 191–201.
50
A.C. Rakitzis et al. / Statistical Methodology 29 (2016) 32–50
[33] P. Yip, Inference about the mean of a Poisson distribution in the presence of a nuisance parameter, Austral. J. Statist. 30 (1988) 299–306. [34] K.C.H. Yip, On modeling claim frequency data in general insurance with extra zeros, Insurance Math. Econom. 36 (2005) 153–163. [35] K. Yoneda, Estimations in some modified Poisson distributions, Yokohama Math. J. 10 (1962) 73–96.