Characterization of multinomial exponential families by generalized variance

Statistics and Probability Letters 80 (2010) 939–944


Abdelaziz Ghribi, Afif Masmoudi ∗
Sfax University, Tunisia

Article history: Received 18 December 2009; Received in revised form 3 February 2010; Accepted 4 February 2010; Available online 10 February 2010

Abstract: In this paper, we show that the multinomial exponential families on a d-dimensional linear space are characterized by the determinant of their covariance matrix, called the generalized variance. © 2010 Elsevier B.V. All rights reserved.

Keywords: Exponential family; Generalized variance; Means domain; Multinomial distribution; Variance function

1. Introduction

Exponential families have been a distinguished topic of theoretical statistics and probability. In this context, natural exponential families (NEFs) are characterized by their variance functions. Recently, several papers have investigated the so-called generalized variance, that is, the determinant of the covariance matrix of a NEF (e.g., Kokonendji and Seshadri (1996), Hassairi (1999), Kokonendji and Pommeret (2001, 2007), Kokonendji and Masmoudi (2006) and Bernardoff et al. (2008)). In general, a NEF is not determined by the knowledge of its means domain and its generalized variance function, as the following example shows.

Example 1.1. Let F1 be the NEF on R² generated by a negative binomial distribution and a gamma distribution (see Casalis, 1996, page 1834). Its variance function is defined on MF1 = (0, +∞)² by

VF1(m) = [ m1² + m1   m1 m2 ]
         [ m1 m2      m2²   ].

Let F2 be the NEF on R² defined as the product of independent Poisson and gamma families, with variance function VF2(m1, m2) = diag(m1, m2²) on MF2 = (0, +∞)². We then have det(VF1(m1, m2)) = det(VF2(m1, m2)) = m1 m2² with MF1 = MF2 = (0, +∞)², yet F1 and F2 are distinct. However, uniqueness does hold for the NEF of Poisson–Gaussian distributions (Kokonendji and Masmoudi, 2006). The aim of the present paper is to show that this phenomenon, uniqueness of a NEF when its generalized variance function is given, also occurs when the NEF is the family of all multinomial distributions p0 δ0 + ∑_{j=1}^d pj δ_{ej}, with pj > 0 and ∑_{j=0}^d pj = 1, where (ej)_{j=1}^d is the canonical basis of R^d and δ_{ej} denotes the Dirac measure at ej.
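Example 1.1 can be verified numerically. The following sketch (not part of the paper) checks at random sample points that the two variance functions are genuinely different matrices while their determinants both equal m1·m2².

```python
import numpy as np

def V_F1(m1, m2):
    # Variance function of the negative binomial x gamma family (Casalis, 1996)
    return np.array([[m1**2 + m1, m1 * m2],
                     [m1 * m2,    m2**2]])

def V_F2(m1, m2):
    # Variance function of the product of independent Poisson and gamma
    return np.diag([m1, m2**2])

rng = np.random.default_rng(0)
for m1, m2 in rng.uniform(0.1, 5.0, size=(100, 2)):
    d1 = np.linalg.det(V_F1(m1, m2))
    d2 = np.linalg.det(V_F2(m1, m2))
    # Same generalized variance m1 * m2**2 ...
    assert np.allclose(d1, m1 * m2**2) and np.allclose(d2, m1 * m2**2)
    # ... but distinct variance functions (off-diagonal m1*m2 > 0)
    assert not np.allclose(V_F1(m1, m2), V_F2(m1, m2))
```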
Corresponding author. Tel.: +216 098667552; fax: +216 74274437. E-mail address: [email protected] (A. Masmoudi).

0167-7152/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.spl.2010.02.004


The present paper is structured as follows. In Section 2, definitions and the generalized variance function of the d-dimensional multinomial NEF are given. The main result of the paper is stated in Section 3, and the technical material needed for its proof is presented in Section 4.

2. NEFs and generalized variance

Natural exponential families (NEFs) represent a very important class of distributions in probability and statistical theory. Over the past few years, various interesting works have been devoted to the theory of NEFs (see Barndorff-Nielsen (1978), Brown (1986) and Letac (1992)). Let M(R^d) be the set of σ-finite positive measures on the Borel sets of R^d which are not concentrated on an affine subspace of R^d. For µ ∈ M(R^d), let Θ(µ) denote the (non-empty) interior of the domain of its Laplace transform,

Lµ(θ) = ∫_{R^d} exp⟨θ, x⟩ µ(dx) < ∞,  where ⟨θ, x⟩ = θ^T x.

The NEF generated by µ ∈ M(R^d), denoted F = F(µ), is the family of probability measures

{P(θ, µ)(dx) = [Lµ(θ)]^{−1} exp⟨θ, x⟩ µ(dx); θ ∈ Θ(µ)}.

Let us denote the cumulant function of the measure µ by kµ(θ) = ln Lµ(θ), for all θ ∈ Θ(µ). If X is a random vector distributed according to P(θ, µ), then Eθ(X) = ∂kµ(θ)/∂θ = k′µ(θ) and Varθ(X) = ∂²kµ(θ)/(∂θ ∂θ^T) = k″µ(θ). The function m(θ) = k′µ(θ) is a one-to-one transformation from Θ(µ) onto MF = k′µ(Θ(µ)), and thus m = m(θ) provides an alternative parametrization of the family, F = {P(m, F); m ∈ MF}, called the mean parametrization; one returns to the canonical parametrization via θ = θ(m). Note that MF depends only on F, not on the choice of the generating measure µ of F. If MF = ΩF, where ΩF = int(conv(supp µ)) denotes the interior of the convex hull of the support of µ, the family F is said to be steep. In particular, if the support of a NEF F is bounded, then F is a steep family. The covariance matrix of the probability measure P(m, F) can be written as a function of the mean parameter m, VF(m) = k″µ(θ(m)), called the variance function of F. Note that (VF(m))^{−1} is the Fisher information matrix when the model is parameterized by the mean parameter m. An important result concerning natural exponential families, due to Tweedie and Bartlett (1947), states that together with the means domain MF, the variance function VF of a natural exponential family characterizes the family F within the class of all NEFs.

We now introduce the generalized variance function of a NEF F = F(µ), defined by det(k″µ(θ)) = det(VF(m)); it was introduced in Kokonendji and Seshadri (1996). Recall that the generalized variance function of a quadratic NEF F = F(µ) (Casalis, 1996) satisfies

det(k″µ(θ)) = exp{α kµ(θ) + ⟨b, θ⟩ + c},  where (α, b, c) ∈ R × R^d × R and θ ∈ Θ(µ).

We conclude this section by recalling the multinomial NEF and its generalized variance function. Let N be a positive integer, and define the NEF F generated by the measure

µ = (p0 δ0 + ∑_{j=1}^d pj δ_{ej})^{*N},  with pj > 0 and ∑_{j=0}^d pj = 1.

Then

Θ(µ) = R^d  and  kµ(θ) = N ln( p0 + ∑_{i=1}^d pi e^{θi} ).

Its generalized variance function is defined on

MF = { m = (m1, . . . , md) ∈ (0, +∞)^d ; ∑_{i=1}^d mi < N }

by

det(VF(m)) = (1/N) m1 · · · md (N − m1 − · · · − md).
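This closed-form determinant can be checked numerically. The sketch below (not part of the paper) assumes the standard multinomial covariance N(diag(p) − pp^T) with mean m = Np, and compares its determinant with the formula above.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 7, 3
for _ in range(100):
    p = rng.dirichlet(np.ones(d + 1))   # (p0, p1, ..., pd), summing to 1
    m = N * p[1:]                       # mean vector, with sum(m) < N
    # Covariance of the multinomial with N trials, restricted to (x1,...,xd)
    V = N * (np.diag(p[1:]) - np.outer(p[1:], p[1:]))
    lhs = np.linalg.det(V)
    rhs = np.prod(m) * (N - m.sum()) / N
    assert np.allclose(lhs, rhs)
```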

3. Main result

In what follows, we denote by F a NEF on R^d with bounded support and by ∂MF the boundary of its means domain. We first state our main result, which gives a characterization of the multinomial exponential family by the generalized variance function.


Theorem 3.1. Let F be a NEF on R^d, with bounded support assumed to be included in R^d_+, parameterized by m ∈ MF, with variance function VF. Then the following two statements are equivalent:

(a) For all m = (m1, . . . , md) ∈ MF,

det(VF(m)) = m1 · · · md (1 − m1 − · · · − md).  (3.1)

(b) F is the family of multinomial distributions p0 δ0 + ∑_{j=1}^d pj δ_{ej}, where (ej)_{j=1}^d is the canonical basis of R^d, with pj > 0 and ∑_{j=0}^d pj = 1.

The following result shows that the means domain of a NEF F whose generalized variance function satisfies (3.1) is a convex polytope.

Proposition 3.1. Let F be a NEF on R^d, with bounded support assumed to be included in R^d_+, parameterized by m ∈ MF, with variance function VF defined on MF. Suppose that, for all m = (m1, . . . , md) ∈ MF, det(VF(m)) = m1 · · · md (1 − m1 − · · · − md). Then

MF = { m = (m1, . . . , md) ∈ (0, 1)^d ; ∑_{i=1}^d mi < 1 }.

4. Proofs

We first introduce the following notation. Let (e1, e2, . . . , ed) be the canonical basis of R^d, with ei = (0, . . . , 0, 1, 0, . . . , 0), 1 in the ith place. If θ = (θ1, θ2, . . . , θd) ∈ R^d, we write θ̌i = (θ1, . . . , θ_{i−1}, 0, θ_{i+1}, . . . , θd) and denote by µ_{θ̌i}(dxi) the image measure of exp⟨θ̌i, x⟩ µ(dx1, . . . , dxd) under the projection τi : R^d → R; x = (x1, x2, . . . , xd) ↦ xi. We also denote by Si the support of µ_{θ̌i}, which is independent of θ. If m = (m1, m2, . . . , md) ∈ MF, we set mi = mi(θ) = ∂kµ(θ)/∂θi, and we denote by Vii(m) the ith diagonal element of VF(m).

For the proof of Proposition 3.1, we need the following lemma, due to Hassairi and Masmoudi (2005).

Lemma 4.1. If the support of F is bounded, then the following properties hold.
(i) The variance function VF of F extends to the closure M̄F of MF.
(ii) For all m ∈ ∂MF, the extended variance VF(m) is a degenerate matrix.

Proof. (i) Apply Corollary 2.4 of Hassairi and Masmoudi (2005): the variance function VF extends to M̄F. (ii) Let m ∈ ∂MF and let H be an exposed face containing m; the result then follows from Corollary 3.4 of Hassairi and Masmoudi (2005). □

Proof of Proposition 3.1. Let m = (m1, m2, . . . , md) ∈ MF ⊂ R^d_+. Since VF(m) is a positive definite matrix, its determinant is strictly positive; since, by hypothesis, the support of the family F is bounded and included in R^d_+, condition (a) of Theorem 3.1 implies that ∑_{i=1}^d mi < 1. Thus MF ⊆ {m = (m1, . . . , md) ∈ (0, 1)^d ; ∑_{i=1}^d mi < 1}. To conclude the proof of the proposition, it suffices to show the converse inclusion.

Let m = (m1, m2, . . . , md) ∈ MF and let t0 = inf{t ∈ [0, 1] ; tm ∈ MF}. Since MF is a bounded convex set, t0 m ∈ ∂MF. According to Lemma 4.1, the variance function VF extends to t0 m and, moreover, det(VF(t0 m)) = 0. Using (3.1), this implies that t0 m1 m2 · · · md = 0 or t0 (m1 + m2 + · · · + md) = 1. As 0 < m1 + m2 + · · · + md < 1 and 0 ≤ t0 ≤ 1, we must have t0 m1 m2 · · · md = 0. Necessarily t0 = 0, because 0 < mi < 1 for i = 1, 2, . . . , d. Hence 0 ∈ ∂MF.

Now we show that e1, . . . , ed are elements of ∂MF. Let ti = inf{t ∈ [0, 1] ; tm + (1 − t)ei ∈ MF}; then ti m + (1 − ti)ei ∈ ∂MF, that is, (ti m1, . . . , ti mi + 1 − ti, . . . , ti md) ∈ ∂MF. Using the fact that 0 < mi < 1, one deduces that ti (m1 + m2 + · · · + md) = ti. Necessarily ti = 0, because 0 < m1 + m2 + · · · + md < 1. Therefore ei ∈ ∂MF.

Since {0, e1, e2, . . . , ed} ⊂ ∂MF, we get {m = (m1, . . . , md) ∈ [0, 1]^d ; ∑_{i=1}^d mi ≤ 1} = conv({0, e1, e2, . . . , ed}) ⊂ M̄F, and the proposition follows. □

Recall that if µ ∈ M(R^d), then µ_{θ̌i} ∈ M(R). We set ai = inf(Si \ {0}).


The proof of Theorem 3.1 requires some technical lemmas. The following result gives a precise description of the support of F.

Lemma 4.2. Si = {0, 1}.

Proof. We prove this lemma in four steps, successively showing that:

Step 1: Si ⊂ [0, 1] and inf(Si) = 0.
Step 2: lim_{θi→−∞} (Vii(m(θ)) + mi²(θ)) / mi(θ) = ai.
Step 3: ai ≤ (Vii(m(θ)) + mi²(θ)) / mi(θ).
Step 4: liminf_{mi→0} (Vii(m) + mi²) / mi = ai = 1.

Step 1. Since supp(µ) ⊂ [0, 1]^d, we have Si = supp(µ_{θ̌i}) ⊂ [0, 1]. On the other hand, 0 is an extreme point of the convex support of µ, so 0 ∈ supp(µ). This implies that inf(Si) = 0.

Step 2. Observe that L_{µ_{θ̌i}}(θi) = Lµ(θ), so mi(θ) = k′_{µ_{θ̌i}}(θi). Since the support Si of µ_{θ̌i} is included in [0, 1] and inf(Si) = 0, it is shown in Jørgensen et al. (1994) that

lim_{θi→−∞} mi(θ) = 0  and  lim_{θi→−∞} L″_{µ_{θ̌i}}(θi) / L′_{µ_{θ̌i}}(θi) = ai ∈ [0, 1].

Using the fact that Vii(m(θ)) + mi²(θ) = ∫_{R^d} xi² P(θ, µ)(dx), we obtain

(Vii(m(θ)) + mi²(θ)) / mi(θ) = ∫_{R^d} xi² P(θ, µ)(dx) / ∫_{R^d} xi P(θ, µ)(dx) = L″_{µ_{θ̌i}}(θi) / L′_{µ_{θ̌i}}(θi).

Therefore

lim_{θi→−∞} (Vii(m(θ)) + mi²(θ)) / mi(θ) = ai.  (4.1)

Step 3. Since

(Vii(m(θ)) + mi²(θ)) / mi(θ) = ∫_{R^d} xi² P(θ, µ)(dx) / ∫_{R^d} xi P(θ, µ)(dx)

and

∫_{R^d} xi² P(θ, µ)(dx) ≥ ai ∫_{R^d} xi P(θ, µ)(dx),

we deduce that

(Vii(m(θ)) + mi²(θ)) / mi(θ) ≥ ai.

Step 4. According to Step 3 and using formula (4.1), we obtain

liminf_{mi→0} (Vii(m) + mi²) / mi = ai.

Since the support of the exponential family F is bounded, F is steep, and we have ΩF = MF = {(x1, . . . , xd) ∈ (0, 1)^d ; ∑_{i=1}^d xi < 1}. Therefore, for all m ∈ MF, since 0 ≤ xi ≤ 1 on the support, one has

Vii(m) + mi² = ∫ xi² P(m, F)(dx) ≤ ∫ xi P(m, F)(dx) = mi.

It follows that

0 < Vii(m) ≤ mi − mi².  (4.2)
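Inequality (4.2) reflects a general elementary fact: any random variable X with values in [0, 1] satisfies E[X²] ≤ E[X], hence Var(X) ≤ m − m² where m = E[X]. A quick empirical sketch (not part of the paper), over random discrete distributions supported in [0, 1]:

```python
import numpy as np

rng = np.random.default_rng(2)
for _ in range(100):
    x = rng.uniform(0.0, 1.0, size=5)   # support points in [0, 1]
    w = rng.dirichlet(np.ones(5))       # probability weights
    m = float(w @ x)                    # mean
    var = float(w @ x**2) - m**2        # variance
    assert var <= m - m**2 + 1e-12      # Var(X) <= m*(1 - m)
```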

It is well known that det(VF(m)) ≤ ∏_{1≤i≤d} Vii(m). To see this, write the Cholesky decomposition of the positive definite matrix, P = TT*, where T is an upper triangular matrix with positive elements on the diagonal. Thus, from the hypothesis det(VF(m)) = m1 · · · md (1 − m1 − · · · − md), we obtain

1 − m1 − · · · − md ≤ ∏_{1≤i≤d} Vii(m) / mi.
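The determinant bound invoked here is Hadamard's inequality for positive definite matrices, which can be illustrated numerically (a sketch, not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
for _ in range(100):
    A = rng.standard_normal((4, 4))
    P = A @ A.T + 4 * np.eye(4)   # positive definite by construction
    # Hadamard: det(P) <= product of diagonal entries
    assert np.linalg.det(P) <= np.prod(np.diag(P)) + 1e-9
```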


By applying formula (4.2), one has Vii(m)/mi ≤ 1 − mi ≤ 1. Therefore, for all i ∈ {1, . . . , d},

1 − ∑_{1≤j≤d} mj ≤ ∏_{1≤j≤d} Vjj(m)/mj ≤ Vii(m)/mi.

Since

liminf_{mi→0} ( 1 − ∑_{j≠i} mj ) ≤ liminf_{mi→0} ( Vii(m)/mi + mi ),

we get

1 − ∑_{j≠i} mj ≤ ai,

and by letting m tend to zero, one has 1 ≤ ai; since ai ≤ 1 by Step 2, necessarily ai = 1. Together with Step 1, this gives Si = {0, 1}. □

Lemma 4.3. One has Vii(m) = mi(1 − mi).

Proof. Since supp(µ_{θ̌i}) ⊂ [0, 1] and ai = 1, the measure µ_{θ̌i} is a linear combination of δ0 and δ1: write µ_{θ̌i} = p_{θ̌i} δ1 + q_{θ̌i} δ0, where (p_{θ̌i}, q_{θ̌i}) ∈ (0, +∞)². The Laplace transform of µ_{θ̌i} is L_{µ_{θ̌i}}(θi) = p_{θ̌i} exp(θi) + q_{θ̌i}. This implies that L″_{µ_{θ̌i}}(θi) / L′_{µ_{θ̌i}}(θi) = 1. Consequently,

(Vii(m(θ)) + mi²(θ)) / mi(θ) = 1.

Therefore, for all m = (m1, m2, . . . , md) ∈ MF, we have Vii(m) = mi(1 − mi). □

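A small numerical sketch of Lemma 4.3 (not part of the paper): a probability measure on {0, 1} with mean m is Bernoulli(m), whose variance is exactly m(1 − m).

```python
import numpy as np

# E[X] = m, E[X^2] = m for X supported on {0, 1}, so Var(X) = m - m^2
for m in np.linspace(0.05, 0.95, 19):
    var = m * 1**2 + (1 - m) * 0**2 - m**2   # E[X^2] - (E[X])^2
    assert np.isclose(var, m * (1 - m))
```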
To finish the proof of our main result, we recall the following result, due to Bar-Lev et al. (1994), concerning diagonal multivariate NEFs. A family F is said to be a diagonal multivariate NEF if there exist d functions a1, a2, . . . , ad such that the diagonal of the covariance matrix VF(m1, m2, . . . , md) equals (a1(m1), a2(m2), . . . , ad(md)). Recall that a NEF F is said to be irreducible if it is not the product of two independent NEFs in R^k and R^{d−k} for some k = 1, 2, . . . , d − 1.

Lemma 4.4 (Bar-Lev et al., 1994). There are only six types of irreducible diagonal NEFs in R^d, called normal, Poisson, multinomial, negative multinomial, gamma, and hybrid.

We are now in a position to prove Theorem 3.1.

Proof of Theorem 3.1. We first show, by way of contradiction, that F is an irreducible family. Suppose that there exists k ∈ {1, 2, . . . , d − 1} such that F is the product of two independent NEFs, F1 in R^k and F2 in R^{d−k}. Necessarily, Ω̄F = Ω̄F1 × Ω̄F2. As Ω̄F = M̄F = {x = (x1, . . . , xd) ∈ [0, 1]^d ; ∑_{i=1}^d xi ≤ 1}, we observe that

(1/k) ∑_{j=1}^k ej = (1/k, . . . , 1/k, 0, 0, . . . , 0) ∈ Ω̄F

and

(1/(d−k)) ∑_{j=k+1}^d ej = (0, 0, . . . , 0, 1/(d−k), . . . , 1/(d−k)) ∈ Ω̄F.

Since Ω̄F = Ω̄F1 × Ω̄F2, it follows that

(1/k) ∑_{j=1}^k ej ∈ Ω̄F1 × {0}^{d−k}  and  (1/(d−k)) ∑_{j=k+1}^d ej ∈ {0}^k × Ω̄F2.

So

(1/k, . . . , 1/k, 1/(d−k), . . . , 1/(d−k)) ∈ Ω̄F1 × Ω̄F2 = Ω̄F,

and this is impossible, because the sum of the components of this vector equals 2 > 1. Hence F is irreducible. According to Lemma 4.3 and by applying Lemma 4.4, we deduce that F is an irreducible diagonal multivariate NEF. Necessarily, F is a multinomial family, because F has a bounded support. □
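The contradiction in this argument can be checked directly (a sketch, not part of the paper): the concatenated vector would have to lie in the closed simplex {x ∈ [0, 1]^d ; ∑ xi ≤ 1}, but its components sum to 2.

```python
import numpy as np

# Concatenate the two barycenters from the factored domains
d, k = 5, 2
v = np.concatenate([np.full(k, 1.0 / k), np.full(d - k, 1.0 / (d - k))])
# Its component sum is 2 > 1, so it cannot lie in the closed simplex
assert np.isclose(v.sum(), 2.0) and v.sum() > 1.0
```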


Acknowledgements

We sincerely thank the Editor and a referee for their various suggestions and comments.

References

Bar-Lev, S., Bschouty, D., Enis, P., Letac, G., Lu, I., Richard, D., 1994. The diagonal multivariate natural exponential families and their classification. J. Theoret. Probab. 7, 883–929.
Barndorff-Nielsen, O., 1978. Information and Exponential Families in Statistical Theory. Wiley, New York.
Bernardoff, Ph., Kokonendji, C.C., Puig, B., 2008. Generalized variance estimators in the multivariate gamma models. Math. Meth. Statist. 17, 66–73.
Brown, L.D., 1986. Fundamentals of Statistical Exponential Families. In: IMS Lecture Notes — Monograph Series, vol. 9. Hayward, CA.
Casalis, M., 1996. The 2d + 4 simple quadratic natural exponential families on R^d. Ann. Statist. 24, 1828–1854.
Hassairi, A., 1999. Generalized variance and exponential families. Ann. Statist. 27, 374–385.
Hassairi, A., Masmoudi, A., 2005. Extension of the variance function of a steep exponential family. J. Multivariate Anal. 92, 239–256.
Jørgensen, B., Martinez, R., Tsao, M., 1994. Asymptotic behaviour of the variance function. Scand. J. Statist. 21, 223–243.
Kokonendji, C.C., Masmoudi, A., 2006. A characterization of Poisson–Gaussian families by generalized variance. Bernoulli 12, 371–379.
Kokonendji, C.C., Pommeret, D., 2001. Estimateurs de la variance généralisée pour des familles exponentielles non gaussiennes. C.R. Acad. Sci. Paris Sér. I 332, 351–356.
Kokonendji, C.C., Pommeret, D., 2007. Comparing UMVU and ML estimators of the generalized variance for natural exponential families. Statistics 41, 547–558.
Kokonendji, C.C., Seshadri, V., 1996. On the determinant of the second derivative of a Laplace transform. Ann. Statist. 24, 1813–1827.
Letac, G., 1992. Lectures on natural exponential families and their variance functions. In: Monograf. Mat., vol. 50. Instituto de Matemática Pura e Aplicada, Rio de Janeiro.
Tweedie, M.C.K., Bartlett, M.S., 1947. Functions of a statistical variate with given means, with special reference to Laplacian distributions. Proc. Cambridge Philos. Soc. 43, 41–49.