Computational Statistics and Data Analysis 55 (2011) 3381–3385
Contents lists available at ScienceDirect
Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda
Short communication
A note on the binomial model with simplex constraints Guo-Liang Tian ∗ , Kai Wang Ng, Philip L.H. Yu Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam Road, Hong Kong, China
article
abstract
info
Article history: Received 10 September 2009 Received in revised form 20 May 2011 Accepted 6 June 2011 Available online 15 June 2011
Liu (2000) considered maximum likelihood estimation and Bayesian estimation in a binomial model with simplex constraints using the expectation–maximization (EM) and data augmentation (DA) algorithms. By introducing latent variables {Zij } and {Yij } (to be defined later), he formulated the constrained parameter problem into a missing data problem. However, the derived DA algorithm does not work because he actually assumed that the {Yij } are known. Furthermore, although the final results from the derived EM algorithm are correct, his findings are based on the assumption that the {Yij } are observable. This note provides a correct DA algorithm. In addition, we obtained the same E-step and M-step under the assumption that the {Yij } are unobservable. A real example is used for illustration. © 2011 Elsevier B.V. All rights reserved.
Keywords: Constrained binomial model DA algorithm EM algorithm Simplex constraints
1. A constrained binomial model 1.1. The statistical model Suppose that the Ni are known positive integers and 0 ≤ pi ≤ 1, i = 1, . . . , m, and that p = (p1 , . . . , pm )⊤ . Liu (2000) considers the following binomial model: ind
ni |(Ni , pi ) ∼ Binomial(Ni , pi ),
i = 1, . . . , m,
(1.1)
subject to simplex constraints p = Bθ
or pi =
q −
bik θk ,
i = 1, . . . , m,
(1.2)
k=1
and q −
B∗k θk ≤ 1,
(1.3)
k=1
where the θ = (θ1 , . . . , θq )⊤ with θk ≥ 0 are unknown parameters, Bm×q = (bik ), bik ≥ 0, are known scalars, and B∗k ≥ max1≤i≤m bik > 0 are known scalars for k = 1, . . . , q. The objective is to estimate p or equivalently θ on the basis of the observed data Yobs = {Ni , ni }m i=1 . ∑ For convenience, Liu (2000) further defines θ0 = ˆ 1 − qk=1 B∗k θk , bi0 = ˆ 0 and B∗0 = ˆ 1. It follows from (1.3) that 0 ≤ θ0 = B∗0 θ0 = 1 −
q −
B∗k θk ,
k=1
∗
Corresponding author. Tel.: +852 28591984; fax: +852 28589041. E-mail address:
[email protected] (G.-L. Tian).
0167-9473/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2011.06.005
3382
G.-L. Tian et al. / Computational Statistics and Data Analysis 55 (2011) 3381–3385 Table 1 Illustration for the physical particle model. Particle •
Location Probability Detector # detected
i.e.,
∑q
k=0
1, . . . , N1
1, . . . , N2
N1
0 B∗0 θ0 1 n1
N2
···
1, . . . , j, . . . , Ni
··· ··· ··· ···
1 B∗1 θ1 2 n2
Ni
···
k B∗k θk i ni
1, . . . , Nm
··· ··· ··· ···
Nm
q B∗q θq m nm
B∗k θk = 1. Therefore, from (1.2), it is easy to obtain
0 ≤ pi =
q −
bik θk ≤
k=0
q −
B∗k θk = 1,
(1.4)
k =0
for all i = 1, . . . , m. Liu (2000) claims that monotonic increasing, monotonic decreasing, increasing convex, decreasing convex, increasing concave, and decreasing concave constraints on pi are special cases of (1.2) and (1.3). To formulate the above constrained parameter problem into a missing data problem, Liu (2000) first proposes a so-called physical particle model.
1.2. A physical particle model Motivated by the EM implementation of Shepp and Vardi (1982) and Lange and Carson (1984) for the Poisson emission tomography, Liu (2000) establishes a physical particle model, which suggests a DA scheme∑ for the constrained binomial m model (1.1). We summarize the physical particle model in Table 1. There are a total of N = i=1 Ni particles, where each q ∗ particle is emitted at one of q + 1 locations at random with probabilities {Bk θk }k=0 . When emitted at location k, the particle is detected by detector i with probability bik /B∗k . Among the Ni particles, we observe ni particles detected by detector i. Therefore, the observed data set is Yobs = {Ni , ni }m i =1 .
2. MLE and Bayesian estimation
2.1. MLE based on complete observations Note that the counts emitted at the q + 1 locations are not available. If the counts emitted at the q + 1 locations were available, estimation of the unknown parameters θ would be trivial. For 1 ≤ i ≤ m, 1 ≤ j ≤ Ni , k = 0, 1, . . . , q, we introduce 0–1 latent variables Zij (k) =
1, 0,
if particle j to be detected by detector i is emitted at location k, otherwise,
(2.1)
and define Zij =( ˆ Zij (0), Zij (1), . . . , Zij (q))⊤ ; then Zij |θ ∼ Multinomial(1; B∗0 θ0 , B∗1 θ1 , . . . , B∗q θq ),
(2.2)
∑q
where k=0 Zij (k) = 1. Therefore, the likelihood function for the complete data Ycom = {Yobs , Z } = Yobs ∪ {Zij : 1 ≤ i ≤ m, 1 ≤ j ≤ Ni } is given by L(θ|Ycom ) ∝
Ni m ∏ ∏
(B∗0 θ0 )Zij (0) · · · (B∗q θq )Zij (q) =
i=1 j=1
q ∏
(B∗k θk )Sk ,
k=0
∑m ∑Ni ∑q ∑q ∑m ∑Ni j=1 Zij (k) is the sufficient statistic of θk . Since k=0 Zij (k) = 1, we have k=0 Sk = j =1 i =1 i =1 ∑q ∑ q { k=0 Zij (k)} = m i=1 Ni = N. Hence, the complete-data MLEs of {θk }k=0 are given by where Sk =
θˆk =
Sk NB∗k
,
k = 0, 1, . . . , q.
(2.3)
G.-L. Tian et al. / Computational Statistics and Data Analysis 55 (2011) 3381–3385
3383
2.2. Consistency of the statistical model and the physical model Next, we need to verify that the statistical model (1.1) with simplex constraints (1.2) and (1.3) can describe the physical particle model. In fact, by further introducing the unobservable variables
Yij =
1, 0,
if particle j is detected by detector i, otherwise,
we obtain Yij |(Zij (k) = 1, θ ) ∼ Bernoulli(bik /B∗k ).
(2.4)
By using (2.4) and (2.2), we have Pr{Yij = 1|θ} =
q −
Pr{Yij = 1|Zij (k) = 1, θ} Pr{Zij (k) = 1|θ }
k=0
=
=
q − bik
B∗k k=0 q −
(B∗k θk ) (1.4)
bik θk = pi ,
∀ j = 1, . . . , Ni .
(2.5)
k=0
∑N
i Finally, suppose that ni = j=1 Yij (0 ≤ ni ≤ Ni ); then ni denotes the number of particles detected by detector i among the Ni particles. It is clear that n1 , . . . , nm are independent and satisfy (1.1).
2.3. MLE via the EM algorithm The conditional predictive distribution is given by f (Z |Yobs , θ ) =
Ni m ∏ ∏
f (Zij |Yobs , θ )
i=1 j=1
=
Ni − m ∏ 1 ∏
f (Zij , Yij = y|Yobs , θ )
i=1 j=1 y=0
=
Ni − m ∏ 1 ∏
f (Zij |Yobs , Yij = y, θ ) · Pr(Yij = y|Yobs , θ )
i=1 j=1 y=0
=
Ni − m ∏ 1 ∏
f (Zij |Yij = y, θ ) · Pr(Yij = y|Yobs , θ ),
(2.6)
i=1 j=1 y=0
where1 Yij |(Yobs , θ ) ∼ Hyper-geometric(1, Ni − 1, ni ),
(2.7)
(1)
Zij |(Yij = 1, θ ) ∼ Multinomialq+1 (1, πi ),
(2.8)
(0)
Zij |(Yij = 0, θ ) ∼ Multinomialq+1 (1, πi ),
(2.9)
(1)
= (bi0 θ0 , . . . , biq θq ) /pi ,
(0)
= ((B0 − bi0 )θ0 , . . . , (Bq − biq )θq )⊤ /(1 − pi ).
πi πi
⊤
∗
and
∗
It is not difficult to derive (2.7)–(2.9) by using the following results. q
Theorem 2.1. Suppose that Xk ∼ Binomial(nk , p) and let the {Xk }k=1 be independent; then
Xk |
q −
k ′ =1
Xk′ = x+
∼ Hyper-geometric nk ,
q −
nk − nk , x +
.
k=1
1 Liu (2000) does not give formulae like (2.6) and (2.7). It seems to us that he assumes that the {Y } are observed. ij
(2.10)
3384
G.-L. Tian et al. / Computational Statistics and Data Analysis 55 (2011) 3381–3385
Theorem 2.2. Pr{Yij = 1|Zij , θ ) =
Z (k) q ∏ bik ij B∗k
k=0
Pr{Yij = 0|Zij , θ ) =
q ∏
and
Zij (k)
bik
1−
,
B∗k
k=0
(2.11)
.
(2.12)
Now we consider the E-step and M-step of the EM algorithm. On the one hand, from (2.8) and (2.9), we obtain bik θk
E (Zij (k)|Yij = 1, θ ) =
E (Zij (k)|Yij = 0, θ ) =
and
pi
(B∗k − bik )θk , 1 − pi
which can rewritten as bik θk
E (Zij (k)|Yij , θ ) =
· Yij +
pi
(B∗k − bik )θk · (1 − Yij ). 1 − pi
(2.13)
On the other hand, from (2.7), we have E (Yij |Yobs , θ ) =
ni Ni
,
and E (1 − Yij |Yobs , θ ) = 1 −
ni Ni
.
(2.14)
Therefore, the E-step is computing the conditional expectation of the sufficient statistic Sk = (t )
Sk
Ni
m
i=1
j =1
Zij (k), i.e.,
E (Zij (k)|Yobs , θ (t ) )
−−
=
∑m ∑Ni
i=1 j=1 Ni m − −
=
E {E (Zij (k)|Yobs , Yij , θ (t ) )|Yobs , θ (t ) }
i=1 j=1 Ni m − −
=
E {E (Zij (k)|Yij , θ (t ) )|Yobs , θ (t ) }
i=1 j=1 (2.13)
=
Ni m − −
E
(t )
bik θk Yij
=
(t )
pi Ni
i=1 j=1
θk
=
m − ni bik (t ) (t )
for k = 0, 1, . . . , q, where pi (t )
Sk
NB∗k
,
+
+
(B∗k − bik )θk(t ) (Ni − ni )
(1 − p(i t ) )Ni (Ni − ni )(B∗k − bik ) (t )
1 − pi
pi
i =1
θk(t +1) =
(t )
1 − pi
Ni (t ) m − − bik θk ni
(t )
+
(t )
pi
i=1 j=1 (2.14)
(B∗k − bik )θk(t ) (1 − Yij )
=
|Yobs , θ
(t )
,
(2.15)
(t )
∑q
bik θk . From (2.3), the M-step updates as follows:
k=0
k = 0, 1, . . . , q.
(2.16)
2.4. Bayes estimation via the DA algorithm For Bayesian estimation, with the conjugate prior distribution
π (θ ) ∝
q ∏ k=0
c −1
θk k
,
θk ≥ 0,
q −
B∗k θk = 1,
(2.17)
k=0
the complete-data posterior of (B∗0 θ0 , . . . , B∗q θq )⊤ is
(B∗0 θ0 , . . . , B∗q θq )⊤ |(Yobs , Z ) ∼ Dirichlet(c0 + S0 , . . . , cq + Sq ). The predictive distribution for the missing values is given by (2.6). This leads to the following DA algorithm: I-step: Given the current sample of θ , impute the missing values Z by generating a sample from (2.6). P-step: Given the current imputed complete data, generate θ from (2.18). We noted that the DA algorithm developed by Liu (2000) does not work.
(2.18)
G.-L. Tian et al. / Computational Statistics and Data Analysis 55 (2011) 3381–3385
3385
Fig. 1. The observed mortality frequencies and the MLE of the mortality rate under the increasing convex constraints.
3. An illustrative example Broffitt (1988) considered the estimation of mortality rate in life insurance. Table 1 of Liu (2000) displays the age ti from 35 to 64 years, the number Ni of people insured under a certain policy, and the number ni of the insured who died (i = 1, . . . , m and m = 30). Anyone who started or ended a policy in the middle of a year is counted as half. Let pi denote the true mortality rate (i.e., the probability of death) at age ti . It is reasonable to assume that the {pi } are increasing convex over the observed range, that is, 0 ≤ p2 − p1 ≤ p3 − p2 ≤ · · · ≤ pm − pm−1 ≤ 1. This increasing convex constraints can be written as p = (p1 , . . . , pm )⊤ = Bθ ,
(3.1)
where
B=
1
1m−1
0⊤ m−1
Ωm−1
,
Ωm−1
1 2
. = .. m − 2 m−1
0 1
.. . m−3 m−2
··· ··· .. .
0 0
··· ···
1 2
.. .
0 0
.. . , 0 1
and θ = (θ1 , . . . , θm ) ∈ R+ . The aim is to estimate the unknown mortality rates p subject to the constraints (3.1). The observed mortality rates (=ni /Ni ) against the ages are shown in Fig. 1 as a solid line. The observed deaths ni at each age are assumed to follow independent binomial distributions with unknown mortality rates pi and known population sizes Ni (cf. (1.1)). Suppose that θ (0) = 0.0001 · 1m , as the initial value of θ . Using (2.15) and (2.16), the EM algorithm converged to the constrained MLE after 3000 iterations. The constrained MLEs {ˆpi } are dotted in Fig. 1. ⊤
m
References Broffitt, J.D., 1988. Increasing and increasing convex Bayesian graduation. Transaction of the Society of Actuaries 40, 115–148. Lange, K., Carson, R., 1984. EM reconstruction for emission and transmission tomography. Journal of Computer Assisted Tomography 8, 306–312. Liu, C.H., 2000. Estimation of discrete distributions with a class of simplex constraints. Journal of the American Statistical Association 95, 109–120. Shepp, L.A., Vardi, Y., 1982. Maximum likelihood reconstruction for emission tomography. IEEE Transactions on Image Processing 2, 113–122.