Minimax estimation of a restricted mean for a one-parameter exponential family

Éric Marchand^a, Fanny Rancourt^b & William E. Strawderman^c

a Université de Sherbrooke, Département de mathématiques, Sherbrooke, QC, Canada, J1K 2R1 (e-mail: [email protected])
b Commission scolaire de Montréal, 3737 rue Sherbrooke E, Montréal, QC, Canada, H1X 3B3 (e-mail: [email protected])
c Rutgers University, Department of Statistics, 501 Hill Center, Busch Campus, Piscataway, NJ, USA, 08855 (e-mail: [email protected])

Summary

For one-parameter exponential families, we provide a unified minimax result for estimating the mean under weighted squared error losses in the presence of a lower-bound restriction. The finding recovers cases for which the result is known, as well as others which are new, such as for a negative binomial model. We also study a related self-minimaxity property, obtaining several non-minimax results. Finally, for discrete models such as the Poisson and negative binomial, we obtain classes of minimax estimators of the mean under a lower-bound restriction, which include range-preserving solutions.


AMS 2010 subject classifications: 62C15, 62C20, 62F10, 62F30

Keywords and phrases: Admissibility; Discrete distributions; Exponential family; Karlin’s theorem; Minimax; Negative binomial; Poisson; Restricted parameters.

1. Introduction


Restricted parameter space inference brings many challenges, several of which are less apparent in corresponding unrestricted parameter space versions. Over the years, a better understanding of decision-theoretic properties of such problems has emerged, and several techniques and methods have been developed to address such challenges (e.g., Marchand & Strawderman, 2004; van Eeden, 2006), including minimax analysis (e.g., Casella & Strawderman, 1981; Marchand & Strawderman, 2012). Interestingly, there exist a large number of problems where the minimax risk for X ∼ pθ, θ ∈ Θ, remains the same under the parametric restriction θ ∈ Θ0 ⊂ Θ, and this for many choices of loss functions (e.g., Marchand & Strawderman, 2012). An early example was addressed by Katz (1961) for X ∼ N(θ, 1), squared error loss, Θ = ℝ, and Θ0 = ℝ⁺. In this case, the estimator δ0(X) = X is clearly not adapted to the parametric restriction, but knowledge of its minimaxity is nevertheless a useful benchmark, as dominating estimators are necessarily minimax themselves. In such situations, it is then further of interest to specify classes of plausible estimators, Bayesian or otherwise, that dominate the given benchmark.


This paper deals with one-parameter exponential families and a corresponding minimax property for normalized squared error loss. We find it useful to record a unifying minimax finding, which not only includes existing results such as for the normal case among others, but also leads to new findings such as for a lower-bounded negative binomial mean. We also focus on determining dominating minimax estimators, in particular for discrete models as there exists a relative paucity of dominance findings in the literature. Let

X ∼ pη(x) = e^{ηx − A(η)},   (1.1)

with η ∈ (η̲, η̄), be a one-parameter exponential family density with respect to a σ-finite measure ν. Set θ = E(X) = A′(η), V(θ) = Varθ(X) = A″(η), and assume θ → θ̄ as η → η̄ and θ → θ̲ as η → η̲. The minimax and non-minimax results of this paper depend critically on the variation of λ(θ) = V(θ)/θ². We study the potential minimaxity of estimators δλ,γ(X) = X/(1+λ) + γλ/(1+λ) of θ under weighted squared error loss

Lλ,γ(θ, δ) = (δ − θ)²/w(θ), with w(θ) = Eθ{δλ,γ(X) − θ}² and λ ≥ 0.   (1.2)

Karlin’s Theorem (Karlin, 1958) concerns exponential families (1.1) and provides sufficient conditions (see Section 2.2) for the admissibility of δλ,γ (X) under squared error loss, and hence under loss (1.2). Moreover, in cases where admissibility holds, it follows immediately that δλ,γ (X) is unique minimax under loss (1.2) with constant risk equal to 1.


When θ̄ = ∞, taking λ = lim_{θ→∞} λ(θ), we show with Theorem 2.2 that the estimator δλ,0(X) = X/(1+λ) is also minimax under a lower bound restriction θ ≥ θ0, under loss Lλ,0, provided Karlin's conditions for admissibility in the full parameter space hold. The finding is unified, and we provide examples, some of which are known minimax results and some of which are, to the best of our knowledge, new. Also, the result is significant, as dominating estimators of δ0(X) will necessarily be minimax. Extensions to a much larger class of loss functions (Corollary 2.1) are then obtained. An analogous minimax result (Theorem 2.3) is also established for an upper-bound restriction, and implications for the potential minimaxity of estimators δλ,γ(X) under loss Lλ,γ are addressed with Corollary 2.2.


The angle examined here can be described as one of self-minimaxity, as the loss (1.2) is linked to the risk function of the estimator being assessed for minimaxity. This interesting question is first studied in Section 2.1, with examples of non-minimaxity provided for estimators of a mean θ of the type δ(X) = aX. Although such findings are not necessarily linked to exponential families, they do apply to families in (1.1), and we identify many cases of non self-minimaxity when the choice a = 1/(1+λ) is not made. Section 2.2 contains the principal minimax results described in the paragraph above, and examples are presented in Section 2.3. Section 3 deals with improved estimators, under squared error loss, of a lower-bounded expectation for discrete models, using an adaptation of Kubokawa's integral expression of risk difference technique. For Poisson and negative binomial models, the classes of improved estimators are minimax under normalized squared error loss by virtue of the minimax findings of Section 2.2.


2. Main results

2.1. Non-minimaxity results


The following non-minimaxity result applies to one-parameter exponential families, but also more generally to estimating a mean θ with finite variance.

Theorem 2.1. Let X be a random variable on the real line with θ = E(X), Varθ(X) = V(θ), θ ∈ Θ, and λ(θ) = V(θ)/θ². Assume that λ(θ) is continuous and that Θ is connected. Let λ̲ = inf_{θ∈Θ} λ(θ) and λ̄ = sup_{θ∈Θ} λ(θ), and assume that λ̲ > 0 or λ̄ < ∞ (or both). Consider estimating θ under loss Lλ,0 in (1.2). Then, for any λ < λ̲ or λ > λ̄, the estimator X/(1+λ) is inadmissible and dominated, in both risk and maximum risk, by X/(1+λ̲) in the former case and by X/(1+λ̄) in the latter case.

Proof. With

Eθ(aX − θ)² = a²V(θ) + (a−1)²θ²
  = θ²{a²λ(θ) + (a−1)²}
  = θ²(1+λ(θ)){(a − 1/(1+λ(θ)))² + λ(θ)/(1+λ(θ))²}, θ ≠ 0,

we have

R(θ, aX) = [(a − 1/(1+λ(θ)))² + λ(θ)/(1+λ(θ))²] / [(1/(1+λ) − 1/(1+λ(θ)))² + λ(θ)/(1+λ(θ))²], θ ≠ 0.   (2.3)

Since R(0, aX) = a²(1+λ)², and since the above risk depends on a only through the distance |a − 1/(1+λ(θ))|, we infer that

1 = R(θ, X/(1+λ)) > R(θ, X/(1+λ̲)) for all θ ∈ Θ, λ < λ̲,

and

1 = R(θ, X/(1+λ)) > R(θ, X/(1+λ̄)) for all θ ∈ Θ, λ > λ̄,

which establishes the dominance parts. Since the risk R(θ, aX) depends on θ through λ(θ) only, it suffices to show, for the dominance in maximum risk part, that:


(i) L1 = lim_{λ(θ)→λ̄} R(θ, X/(1+λ̲)) < 1, for λ < λ̲;

(ii) L2 = lim_{λ(θ)→λ̲} R(θ, X/(1+λ̄)) < 1, for λ > λ̄.

Clearly, (i) holds if λ̄ < ∞, and (ii) holds if λ̲ > 0. Finally, we obtain readily from (2.3) that L1 = ((1+λ)/(1+λ̲))² < 1 in (i) if λ̄ = ∞, and L2 = (λ̄(1+λ)/(λ(1+λ̄)))² < 1 in (ii) if λ̲ = 0. The proof is thus complete.

It is immediate that the dominance part of the above result holds as well for weighted squared error losses q(θ)(δ − θ)² with q(·) > 0. As well, the above result is applicable whether or not a parametric restriction on θ is imposed. Here are some applications of Theorem 2.1.

Example 2.1. (Poisson) For X ∼ Poisson(θ), we have E(X) = θ, V(θ) = θ, and λ(θ) = V(θ)/θ² = 1/θ.


(A) Under an upper bound restriction θ ≤ θ0, we have λ̲ = inf_{θ≤θ0} λ(θ) = 1/θ0. Theorem 2.1 thus applies and tells us that estimators X/(1+λ) are, for 0 ≤ λ < 1/θ0 and loss Lλ,0, not minimax for θ ≤ θ0, being dominated by X/(1+λ̲) = θ0X/(θ0+1) in both risk and maximum risk. The dominated estimators include X which, for loss (δ−θ)²/θ, is in contrast unique minimax for the unconstrained parameter space θ ∈ ℝ⁺, and minimax under a lower bound restriction θ ≥ θ0 as addressed below (e.g., Theorem 2.2, Section 2.3). As a complement, interesting minimax analyses for θ ≤ θ0 were given by Johnstone & MacGibbon (1992).

(B) Under a lower bound restriction θ ≥ θ0, we have λ̄ = sup_{θ≥θ0} λ(θ) = 1/θ0. It thus follows from Theorem 2.1 that estimators X/(1+λ) are, for λ > 1/θ0 and loss Lλ,0, not minimax for θ ≥ θ0 and dominated by X/(1+λ̄) = θ0X/(θ0+1) in both risk and maximum risk. Finally, observe as well that we can fix λ ∈ (0, ∞) and infer the same dominance result for θ ≥ θ0 with θ0 > 1/λ. This tells us that the non self-minimaxity of X/(1+λ) is inevitable for sufficiently large θ0 and λ > 0. As mentioned above in (A), it thus follows that only the case λ = 0 is immune to this property.
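As a numerical complement to (A) and (B), the following is a minimal sketch built directly on the closed form (2.3); the bound θ0 = 2 and the evaluation grids are our own choices rather than values from the paper:

```python
import numpy as np

# Risk of aX under loss L_{lam,0} of (1.2), via the closed form (2.3);
# lam_theta(theta) = V(theta)/theta^2, equal to 1/theta in the Poisson case.
def risk_aX(a, lam, theta, lam_theta):
    lt = lam_theta(theta)
    v = lt / (1 + lt) ** 2
    return ((a - 1 / (1 + lt)) ** 2 + v) / ((1 / (1 + lam) - 1 / (1 + lt)) ** 2 + v)

theta0 = 2.0
lam_poisson = lambda th: 1.0 / th

# (A): for lam = 0 and theta <= theta0, X (constant risk 1) is beaten by theta0*X/(theta0+1)
th_upper = np.linspace(0.05, theta0, 200)
assert np.all(risk_aX(theta0 / (theta0 + 1), 0.0, th_upper, lam_poisson) < 1)

# (B): for lam = 1 > 1/theta0 and theta >= theta0, X/2 (constant risk 1) is likewise beaten
th_lower = np.linspace(theta0, 100, 200)
assert np.allclose(risk_aX(0.5, 1.0, th_lower, lam_poisson), 1)
assert np.all(risk_aX(theta0 / (theta0 + 1), 1.0, th_lower, lam_poisson) < 1)
print("dominance in Example 2.1 (A) and (B) confirmed on the grids")
```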

Example 2.2. (Negative binomial) Consider a negative binomial model NB(r, θ) for X with

pθ(x) = ((r)x/x!) (r/(r+θ))^r (θ/(r+θ))^x, for x ∈ ℕ.   (2.4)

Here, we have E(X) = θ, V(θ) = θ(θ+r)/r, and λ(θ) = 1/θ + 1/r.


(A) Under an upper bound restriction θ ≤ θ0, we have λ̲ = inf_{θ≤θ0} λ(θ) = 1/θ0 + 1/r. Theorem 2.1 thus applies and tells us that estimators X/(1+λ) are, for 0 ≤ λ < 1/θ0 + 1/r and loss (1.2), not minimax for θ ≤ θ0 and dominated by X/(1+λ̲) = X/(1 + 1/θ0 + 1/r) in both risk and maximum risk. The dominated estimators in both risk and maximum risk include rX/(r+1) for loss (1.2) with w(θ) = Eθ(rX/(r+1) − θ)² = (rθ(θ+r) + θ²)/(r+1)². The dominance persists of course for other weighted squared error losses q(θ)(δ−θ)², namely for q(θ) = 1/Eθ{(X−θ)²} = r/(θ(θ+r)). With respect to this latter loss, and in contrast to the non-minimaxity example here for θ ≤ θ0, the estimator rX/(r+1) is unique minimax for the unconstrained parameter space θ ∈ ℝ⁺, and minimax under a lower bound restriction θ ≥ θ0 as addressed below (see Example 2.6, iv).


(B) Under a lower bound restriction θ ≥ θ0, we have λ̲ = 1/r and λ̄ = 1/θ0 + 1/r. It follows from Theorem 2.1 that, for λ < 1/r, X/(1+λ) is not minimax for loss Lλ,0 and dominated in both risk and maximum risk by X/(1+λ̲) = rX/(r+1). Actually this is true for θ0 = 0 as well. For λ > 1/θ0 + 1/r, Theorem 2.1 tells us that X/(1+λ) is not minimax for loss Lλ,0 and dominated in both risk and maximum risk by X/(1+λ̄) = X/(1 + 1/θ0 + 1/r). Alternatively, for any fixed λ > 1/r, the same dominance result applies as long as θ0 > r/(λr − 1).

Example 2.3. (Location families) Let X ∼ N(θ, σ²) with known σ², or more generally let X ∼ f(x−θ) with E(X) = θ, V(θ) = σ², and λ(θ) = σ²/θ². Then, for the parameter restriction |θ| ≤ θ0, we have λ̲ = σ²/θ0², and Theorem 2.1 tells us that estimators X/(1+λ) are, for λ ∈ [0, λ̲) and loss (1.2), not minimax for |θ| ≤ θ0 and dominated by X/(1+λ̲) = θ0²X/(θ0²+σ²) in both risk and maximum risk. Such dominated estimators include X (e.g., Lehmann & Casella, 1998, page 327).


Example 2.4. (Inverse Gaussian) Consider an inverse Gaussian distribution for X with density

p(x) = √(ξ/(2πx³)) e^{−ξ/(2x)} e^{−xξ/(2θ²) + ξ/θ} I_(0,∞)(x).

Here, for known ξ > 0, we have an example of model (1.1) with η = −ξ/(2θ²) ∈ (−∞, 0), A(η) = −(−2ξη)^{1/2}, Eθ(X) = A′(η) = θ, V(θ) = A″(η) = θ³/ξ, and λ(θ) = θ/ξ. Under an upper bound restriction θ ≤ θ0, we have λ̄ = θ0/ξ, and it follows from Theorem 2.1 that estimators X/(1+λ) are, for λ > θ0/ξ and loss (1.2), not minimax and dominated by X/(1+λ̄) = ξX/(ξ+θ0). As well, under the restriction θ ≥ θ0, we have λ̲ = θ0/ξ, and we infer from Theorem 2.1 that estimators X/(1+λ) are, for λ ∈ [0, θ0/ξ) and loss (1.2), dominated in risk and maximum risk by ξX/(ξ+θ0).

Example 2.5. Consider a random variable X with θ = E(X) ∈ (θ̲, ∞), V(θ) = cθ^α, and λ(θ) = cθ^{α−2} with α ≥ 0; the location families of Example 2.3 are covered here by the case α = 0, the Poisson case by α = 1, and the inverse Gaussian arises with α = 3. For α = 2, such as for Gamma models, the risk of aX is, under any loss Lλ,0, constant in θ, and an optimal choice among multiples of X is given by X/(1+c). We do not need Theorem 2.1, but it applies anyway with λ̲ = λ̄ = c. For 0 ≤ α < 2 and θ ≥ θ0 > 0, it follows from Theorem 2.1 that X/(1+λ) is, for λ > λ̄ and loss Lλ,0, dominated in risk and maximum risk by X/(1+λ̄) with λ̄ = λ(θ0). For 0 ≤ α < 2 and 0 < θ ≤ θ0, we obtain that X/(1+λ) is, for 0 ≤ λ < λ̲ and loss Lλ,0, dominated in risk and maximum risk by X/(1+λ̲) with λ̲ = λ(θ0). Similar inferences, such as those in the inverse Gaussian example, follow for cases α > 2.


2.2. Minimaxity results

As implied by Theorem 2.1 for exponential family model (1.1), when λ(θ) is monotone increasing or decreasing with lim_{θ→θ̄} λ(θ) = λ0, there exist some (and indeed many) values θ0 such that X/(1+λ), for λ ≠ λ0, is not minimax for θ ≥ θ0 under weighted squared error loss (1.2). In contrast, as expanded upon in this section for one-parameter exponential families, for λ = λ0, and whenever X/(1+λ) is admissible as an estimator of θ ∈ (θ̲, ∞) by Karlin's theorem, the estimator X/(1+λ) is minimax for θ ≥ θ0 under loss Lλ,0 in (1.2).

For X exponential family distributed as in model (1.1), Karlin's sufficient condition for X/(1+λ) + γλ/(1+λ) to be an admissible estimator of θ, θ ∈ (θ̲, ∞), is that the integrals

∫_{η0}^{η̄} e^{λ{A(η) − γη}} dη and ∫_{η̲}^{η0} e^{λ{A(η) − γη}} dη   (2.5)

diverge for some η0. Under the given model, the quadratic risk of X/(1+λ), or weight in (1.2), is given by:

w(θ) = Eθ(X/(1+λ) − θ)² = V(θ)/(1+λ)² + (λ/(1+λ))²θ².   (2.6)

Here is the main result of this section.

Theorem 2.2. Assume model (1.1), θ̄ = ∞, lim_{θ→∞} λ(θ) = λ ∈ [0, ∞), and that δ0(X) = δλ,0(X) = X/(1+λ) is admissible by Karlin's theorem with the integrals in (2.5) divergent. Then, δ0(X) is minimax under the parametric restriction θ ≥ θ0 for weighted squared error loss Lλ,0 in (1.2) with w(θ) as in (2.6).


Proof. Assume that δ0(X) is not minimax. Then, there would exist an estimator δ(X) and ε > 0 such that for all θ ≥ θ0:

Eθ{(δ(X) − θ)²} / Eθ{(δ0(X) − θ)²} < 1 − ε,

or, equivalently via (2.6),

Eθ(δ(X) − θ)² ≤ (1 − ε){V(θ)/(1+λ)² + (λ/(1+λ))²θ²}.   (2.7)

Now, with I(θ) = 1/V(θ) for this given model, where I(θ) is the Fisher information, and setting b(θ) = Eθ(δ(X)) − Eθ(δ0(X)) = Biasθ(δ(X)) + λθ/(1+λ), the Cramér–Rao inequality

Eθ(δ(X) − θ)² ≥ {Biasθ(δ(X))}² + {1 + (d/dθ)Biasθ(δ(X))}²/I(θ)
  = (b(θ) − λθ/(1+λ))² + (b′(θ) + 1/(1+λ))² V(θ),

along with inequality (2.7), would imply

b′(θ){b′(θ) + 2/(1+λ)} V(θ) + b²(θ) − 2θb(θ)λ/(1+λ) ≤ −ε{V(θ) + λ²θ²}/(1+λ)²,

and, as well,

2b′(θ)V(θ)/(1+λ) + b²(θ) − 2θb(θ)λ/(1+λ) ≤ −ε{V(θ) + λ²θ²}/(1+λ)².   (2.8)

We pursue by showing that there cannot exist a b(θ) (equivalently, an estimator δ(X)) such that (2.8) is satisfied for all θ ≥ θ0. We first make use of the fact that if b(θ′) < 0 for some θ′ ≥ θ0, there exists no solution to (2.8); this follows from the proof of Karlin's theorem (e.g., Lehmann & Casella, 1998; Brown, 1986). We cannot have b(θ′) = 0 for some θ′ either, since (2.8) would imply b′(θ′) < 0 and then b(θ′ + h) < 0 for some h > 0. We therefore assume hereafter that b(θ) > 0 for all θ ≥ θ0 and proceed by treating the cases (I) λ = 0 and (II) λ > 0 separately.


(I) For λ = 0, expression (2.8) implies that b′(θ) < −ε/2 for all θ ≥ θ0, which in turn implies that b(θ′) < 0 for some θ′ ≥ θ0, leading to a contradiction as outlined in the above paragraph.

(II) For λ > 0, we establish and make use of three intermediate results, the proofs of which are relegated to the Appendix.

Lemma 2.1. For some θ̃ ≥ θ0 and K > 0, we have b(θ) ≤ Kθ for all θ ≥ θ̃.

Lemma 2.2. We have lim_{θ→∞} b(θ)/θ = 0.

Lemma 2.3. There exists a sequence {θn : n ∈ ℕ⁺} such that θn → ∞ and b′(θn) → 0 as n → ∞.


Now, Lemma 2.3, together with inequality (2.8), implies that for the sequence {θn : n ≥ 1} there exists ε0 with 0 < ε0 < λ/(1+λ) such that

b²(θn)/θn² − (2λ/(1+λ)) b(θn)/θn ≤ −ε0 λ/(1+λ),

for sufficiently large n. But this quadratic inequality implies that

b(θn)/θn ≥ λ/(1+λ) − √{(λ/(1+λ))(λ/(1+λ) − ε0)} > 0,

contradicting Lemma 2.2, showing finally that (2.8) cannot hold and establishing the result that X/(1+λ) is a minimax estimator under the given conditions.

The above finding relates to the minimaxity of δ0(X) = X/(1+λ) for weighted squared error loss (1.2) with weight given by the mean squared error of δ0(X). The following intermediate result will permit us to infer the minimaxity of δ0(X) for a larger class of weighted squared error losses.

Lemma 2.4. Consider a model X ∼ fθ, a non-negative loss function L0(θ, δ) for estimating θ ∈ Θ0 = (θ0, ∞), and positive functions w1(θ) and w2(θ) such that w1(θ)/w2(θ) is non-decreasing on Θ0 with limit c = lim_{θ→∞} w1(θ)/w2(θ). Further consider the weighted loss functions Li(θ, δ) = L0(θ, δ)/wi(θ) for i = 1, 2, and suppose that δ*(X) is, under L1(θ, δ) and for θ ∈ Θ0, a minimax estimator with constant risk equal to 1. Now, assume


Condition C0: For any estimator δ(X), there exists a subsequence {θn : n ∈ ℕ⁺} tending to infinity such that for any ε > 0 there is an nε with Eθn L1(θn, δ(X)) > Eθn L1(θn, δ*(X)) − ε = 1 − ε for all n > nε.

Then, δ*(X) is a minimax estimator of θ under loss L2(θ, δ) and for θ ∈ Θ0, with minimax risk equal to c.

Jou

θ∈Θ0

Since  is arbitrary, we infer that supθ∈Θ0 R2 (θ, δ) ≥ c = supθ∈Θ0 R2 (θ, δ ∗ ), which establishes the result. Corollary 2.1. Consider the context of Theorem 2.2, but with loss E(δ0 (X)−θ)2 w2 (θ)

(δ−θ)2 w2 (θ)

instead for positive and

continuous w2 (θ) such that is non-decreasing in θ ∈ Θ0 with c = limθ→∞ E(δ0w(X)−θ) . 2 (θ) Then, the estimator δ0 (X) = X/(1 + λ) remains minimax with minimax risk equal to c. 7

2


Proof. Apply Lemma 2.4 with δ* = δ0, L0(θ, δ) = (δ − θ)², and w1(θ) = Eθ(δ0(X) − θ)². It follows from Theorem 2.2 that δ* = δ0 is a minimax estimator for L1(θ, δ) = (δ−θ)²/w1(θ). As a consequence, Condition C0 is met and the result follows.


For an upper bound restriction, we have the following analogous result, the proof of which is essentially the same as that of Theorem 2.2.

Theorem 2.3. Assume model (1.1), θ̲ = −∞, lim_{θ→−∞} λ(θ) = λ ∈ [0, ∞), and that δ0(X) = X/(1+λ) is admissible by Karlin's theorem with the integrals in (2.5) divergent. Then, δ0(X) is minimax under the parametric restriction θ ≤ θ0 and for weighted squared error loss Lλ,0 in (1.2).

We conclude this section with an extension of Theorem 2.2 to estimators δλ,γ(X).

Corollary 2.2. Assume model (1.1), θ̄ = ∞, lim_{θ→∞} λ(θ) = λ ∈ [0, ∞), and that δλ,γ(X) = X/(1+λ) + γλ/(1+λ) is admissible by Karlin's theorem with the integrals in (2.5) divergent. Then, δλ,γ(X) is minimax under the parametric restriction θ ≥ θ0 and for weighted squared error loss Lλ,γ(θ, δ) = (δ−θ)²/w(θ) with

w(θ) = E(δλ,γ(X) − θ)² = V(θ)/(1+λ)² + (λ/(1+λ))²(θ − γ)².

Proof. We make use of an argument similar to Brown (1986). Let Y = X − γ. With model (1.1) representative of the distribution of Y, with Eθ(Y) = θ − γ and AY(η) = A(η) − ηγ, we have

λY(θ) = Varθ(Y)/(Eθ(Y))² = λ(θ)θ²/(θ − γ)² → λ as θ → ∞.

Hence, Y/(1+λ) is, by Theorem 2.2, minimax for estimating Eθ(Y) under the restriction E(Y) ≥ θ0 − γ and the weighted squared error loss {δ − (θ−γ)}²/wY(θ−γ) with wY(θ−γ) = E(Y/(1+λ) − (θ−γ))². Finally, the result follows as wY(θ−γ) = E((X + γλ)/(1+λ) − θ)².

2.3. Examples

This section is devoted to illustration and further observations.


Example 2.6. (i) (Poisson) Let X ∼ Poisson(θ) with θ ≥ θ0. Here λ = lim_{θ→∞} V(θ)/θ² = lim_{θ→∞} 1/θ = 0. With η = log θ ∈ (−∞, ∞), it is easily verified that the integrals in (2.5) are divergent with λ = γ = 0, and it thus follows from Theorem 2.2 that δ0(X) = X is minimax under loss (δ−θ)²/w(θ) with w(θ) = θ obtained from (2.6). This is a known result from van Eeden (2006, page 57).

(ii) (Normal) Theorem 2.2 applies for X ∼ N(θ, σ0²), for the restriction θ ≥ θ0 and with known σ0². With λ(θ) = σ0²/θ², λ = 0, and η = θ ∈ (−∞, ∞), it is easily verified that the integrals in (2.5) are divergent with λ = γ = 0. It thus follows from Theorem 2.2 that δ0(X) = X is minimax under loss (δ − θ)²/σ0², and under squared error loss as well. The result is due to Katz (1961) (also see van Eeden, 2006). Analogously, Theorem 2.3 applies, yielding the minimaxity of δ0(X) for the same loss and for θ ≤ θ0.

(iii) (Gamma) Consider X ∼ G(α, β) with known α, unknown β, and mean θ = αβ. Density (1.1) is representative of the density of X for η = −1/β ∈ (−∞, 0) and A(η) = −α log(−η).


With V(θ) = θ²/α, we have λ(θ) = 1/α = lim_{θ→∞} λ(θ) = λ, and it is thus pertinent to study the applicability of Theorem 2.2, Corollary 2.2, and Theorem 2.3 to estimators δ_{1/α,γ}(X) = αX/(α+1) + γ/(α+1). As is well known (e.g., Lehmann & Casella, 1998), the integrals in (2.5), which reduce to

∫_{η0}^{0} −(1/η) e^{−γη/α} dη and ∫_{−∞}^{η0} −(1/η) e^{−γη/α} dη,   (2.9)

diverge for some η0 if and only if γ ≥ 0. Theorem 2.2 hence applies and tells us that δ_{1/α,0}(X) is minimax for estimating θ ∈ [θ0, ∞) under loss (1.2) with w(θ) = θ²/(α+1), or equivalently for loss (δ−θ)²/θ². Translating the above to the problem of estimating β, we infer that β̂0(X) = X/(α+1) is minimax for estimating β with β ≥ β0 and loss (β̂ − β)²/β². This is a known result from van Eeden (1995). Corollary 2.2 also applies and tells us that δ_{1/α,γ}(X) is minimax for estimating θ ∈ [θ0, ∞) under weighted squared error loss (δ−θ)²/w(θ) with w(θ) = αθ²/(α+1)² + (θ−γ)²/(α+1)².

The problem of specifying dominating estimators of δ_{1/α,0}(X) for the parametric restriction θ ≥ θ0 is of interest and has been studied by van Eeden (1995), as well as Marchand & Strawderman (2005), among others. Such dominating estimators include the Bayes estimator of θ with respect to the truncation onto [θ0, ∞) of the non-informative prior 1/θ, for a large class of losses which include the weighted squared error loss (δ−θ)²/θ².
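The divergence criterion for (2.9) can also be seen numerically. The following rough sketch (with α = 1, η0 = −1, and truncation points of our own choosing) shows the partial integrals toward −∞ growing without bound when γ ≥ 0 and settling when γ < 0:

```python
import numpy as np
from scipy.integrate import quad

alpha, eta0 = 1.0, -1.0
f = lambda eta, gamma: -(1.0 / eta) * np.exp(-gamma * eta / alpha)  # integrand of (2.9)

for gamma in (1.0, 0.0, -1.0):
    partials = [quad(f, -T, eta0, args=(gamma,))[0] for T in (5, 10, 20)]
    print(gamma, [round(p, 4) for p in partials])
# gamma >= 0: partial integrals keep growing (divergence, hence admissibility);
# gamma < 0: they converge, and Karlin's condition fails.
```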


(iv) (Negative binomial) Consider a negative binomial model for X with p(x) = ((r)x/x!) p^r (1−p)^x I_ℕ(x), and the parametric restriction p ∈ (0, p0]. We have θ = E(X) = r(1−p)/p, V(θ) = θ(θ+r)/r, and the restriction on p corresponds to θ ≥ θ0 with θ0 = r(1−p0)/p0. With λ = lim_{θ→∞} (r+θ)/(rθ) = 1/r, and the integrals in (2.5) shown to be divergent, Theorem 2.2 applies, confirming the minimaxity of δ_{1/r,0}(X) = rX/(r+1) for weighted squared error loss (1.2) with w(θ) = Eθ(rX/(r+1) − θ)² = (rθ(θ+r) + θ²)/(r+1)² obtained from (2.6). Furthermore, an application of Corollary 2.1 with w1(θ) = w(θ) and w2(θ) = θ(θ+r)/r tells us that δ_{1/r,0}(X) is also minimax for loss (δ−θ)²/w2(θ), with minimax risk c = lim_{θ→∞} w1(θ)/w2(θ) = r/(r+1). In the latter case, the minimaxity property of δ_{1/r,0}(X), applicable here for θ ≥ θ0 with θ0 > 0, extends the known minimaxity result for δ_{1/r,0}(X) and θ0 = 0. The findings of this example have not been previously established, to the best of our knowledge.


Remark 2.1. The minimax properties in parts (ii) and (iii) above extend much more generally to location families in (ii) and scale families in (iii), with either a lower-bound or upper-bound parametric constraint on θ, with the minimum risk equivariant estimator, of the form X + c* in (ii) and a*X in (iii), being minimax for invariant losses of the form ρ(δ − θ) in (ii) and of the form ρ(δ/θ) in (iii). This follows from findings of Marchand & Strawderman (2012), which are applicable to a general framework with an invariance structure. Their results however do not cover discrete exponential families such as the Poisson and negative binomial families studied above.

In the context of Theorem 2.2, dominating estimators of δ0(X) = X/(1+λ) are necessarily minimax. These include for instance their truncations onto the restricted parameter space, given by max{θ0, δ0(X)}. For the Poisson and negative binomial models, further dominating estimators are given in Section 3. It is instructive to immediately identify estimators δλ,γ(X) that dominate δλ,0(X) whenever λ > 0.


Lemma 2.5. Let X be a non-negative random variable with X ∼ pθ for θ ≥ 0, such that E(X) = θ and Varθ(X) = V(θ) < ∞. Consider estimators δλ,γ(X) for λ > 0 and the parametric restriction θ ≥ θ0 > 0. Then, the estimators δλ,γ(X) = X/(1+λ) + γλ/(1+λ) dominate δλ,0(X) under squared error loss (δ − θ)² if and only if 0 < γ ≤ 2θ0.

Proof. The result follows immediately from the risk evaluation

R(θ, δλ,γ) = V(θ)/(1+λ)² + (λ/(1+λ))²θ² + (γλ/(1+λ))² − 2(λ/(1+λ))²γθ = R(θ, δλ,0) + (λ/(1+λ))²(γ² − 2γθ),

since, for γ > 0, γ² − 2γθ ≤ 0 for all θ ≥ θ0 if and only if γ ≤ 2θ0.
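A Monte Carlo sanity check of this risk identity, as a minimal sketch with an exponential model (for which λ(θ) = 1) and parameter values of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, gamma, theta, theta0 = 1.0, 1.5, 2.0, 1.0     # 0 < gamma <= 2*theta0 and theta >= theta0
X = rng.exponential(scale=theta, size=10**6)       # E(X) = theta, V(theta) = theta^2

mse = lambda g: np.mean((X / (1 + lam) + g * lam / (1 + lam) - theta) ** 2)
simulated = mse(gamma) - mse(0.0)                  # risk difference between delta_{lam,gamma} and delta_{lam,0}
exact = (lam / (1 + lam)) ** 2 * (gamma**2 - 2 * gamma * theta)
print(simulated, exact)   # agree up to Monte Carlo error; negative, as the lemma predicts
```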

The following is an immediate consequence of the above and Theorem 2.2.

Corollary 2.3. In the context of Theorem 2.2 with λ > 0, estimators δλ,γ(X) are minimax, and dominate δλ,0(X), whenever 0 < γ ≤ 2θ0, under loss (1.2).

The above dominating estimators δλ,γ(X) can actually take values solely on the restricted parameter space. With P(X ≤ t) > 0 for all t > 0, such a condition requires γ ≥ (1+λ)θ0/λ as well as γ ≤ 2θ0, in other words λ ≥ 1 in Corollary 2.3. The above corollary is applicable in both the Gamma and Negative Binomial cases.

Example 2.7. • For X ∼ G(α, β) as above in (iii), estimators δ_{1/α,γ}(X) of θ = Eθ(X) with 0 < γ ≤ 2θ0 are minimax and dominate δ_{1/α,0}(X) for weighted squared error loss (δ − θ)²/θ² and the parametric restriction θ ≥ θ0. For instance, in the exponential case with α = 1, the estimator (X/2) + θ0 is minimax and takes values solely on the parameter space.

• For the Negative Binomial model as above in part (iv) of Example 2.6, estimators δ_{1/r,γ}(X) of θ = Eθ(X) with 0 < γ ≤ 2θ0 are minimax and dominate δ_{1/r,0}(X) for the weighted squared error losses (δ−θ)²/wi(θ), i = 1, 2, and the parametric restriction θ ≥ θ0, with the given wi's. For instance, in the geometric case with r = 1, the estimator (X/2) + θ0 is minimax and takes values solely on the parameter space.
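For the geometric claim just made, here is a small exact-risk check (θ0 = 1.5 is our choice; scipy's nbinom(r, p) with p = r/(r+θ) matches the parametrization (2.4)). The estimator (X/2) + θ0 corresponds to γ = 2θ0, and dominance in squared error, which carries over to the weighted losses above, is confirmed on a grid:

```python
import numpy as np
from scipy.stats import nbinom

r, theta0 = 1, 1.5
xs = np.arange(0, 1500)                            # support truncated far into the tail

def sq_risk(est_vals, theta):                      # squared error risk of an estimator
    pmf = nbinom.pmf(xs, r, r / (r + theta))
    return (pmf * (est_vals - theta) ** 2).sum()

for theta in np.linspace(theta0, 30, 25):
    assert sq_risk(xs / 2 + theta0, theta) <= sq_risk(xs / 2, theta) + 1e-9
print("(X/2) + theta0 improves on X/2 for theta >= theta0")
```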

3. Minimax estimation of a lower-bounded expectation for discrete models


For X ∼ Poisson(θ) as well as X ∼ NB(r, θ) with θ ≥ θ0 > 0, we have seen in Section 2 (Examples 2.6 (i) and (iv)) that δ0(X) = X and δ0(X) = rX/(r+1) are, respectively for these models, minimax under weighted squared error loss (1.2). As well, following Corollary 2.1 and as illustrated for the negative binomial model with loss r(δ−θ)²/(θ(θ+r)), the minimax property carries over to a large class of weighted squared error loss functions. In this section, for a class of discrete distributions, we provide estimators that dominate a target estimator δ0(X) under squared error loss. Target estimators such as the above δ0(X) are of interest because of their minimaxity, and because we wish to have at our disposal alternative estimators that pass the minimum test of dominating δ0(X). As particular cases, we obtain, for both the Poisson and negative binomial models above with θ ≥ θ0 > 0, minimax estimators under weighted squared error loss (1.2) and Corollary 2.1's variants. To achieve this, we make use of Kubokawa's Integral Expression for Risk Difference technique (IERD; Kubokawa, 1994), adapted to count variables taking values on ℕ, and of conditional moment properties of these distributions. Analogous results have been obtained for continuous models (see for instance Kubokawa, 1999, and Marchand & Strawderman, 2004, for reviews and additional references), but the discrete model analysis presented here is not available in the literature, to the best of our knowledge.


Hereafter, we consider target estimators δ0(X) such that δ0(x) is non-decreasing in x ∈ ℕ, and (possibly) shrinkers with respect to X, i.e., δ0(x) ≤ x for all x ∈ ℕ. These plausible conditions are satisfied by the target minimax estimators for both the Poisson and negative binomial models given at the outset of this section, as well as in other situations described next.

Remark 3.2. A large class of interesting models for which the dominance results below will apply, other than Poisson and negative binomial, are Poisson mixtures such that

X|λ ∼ Poisson(λ), λ ∼ gθ,

with E(λ) = θ and Var(λ) = k²θ², k > 0 known and representing the coefficient of variation of λ. The NB(r, θ) model is one such model with k = 1/√r. For such Poisson mixtures, we have E(X) = θ and V(θ) = θ + k²θ².

From this, for weighted loss (δ−θ)²/V(θ), the frequentist risk of a multiple cX of X, with c ≥ 0, is given by

R(θ, cX) = c² + (c − 1)²θ/(1 + k²θ).

It is easy to see then that cX has lower risk than X for all θ ≥ 0 whenever c ∈ [0 ∨ (1−k²)/(1+k²), 1). As well, the choice X/(1+k²) is unique minimax among such multiples for θ ≥ θ0 and any θ0 > 0. We thus see that these estimators are shrinkers with respect to the unbiased estimator X.
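A short sketch of this computation (the value of k and the grid are our own choices):

```python
import numpy as np

k = 0.5                                            # coefficient of variation of the mixing variable
risk = lambda c, th: c**2 + (c - 1) ** 2 * th / (1 + k**2 * th)   # R(theta, cX) above

theta = np.linspace(0.0, 500, 5000)
c_lo = max(0.0, (1 - k**2) / (1 + k**2))           # smallest multiple improving on X
c_star = 1 / (1 + k**2)                            # X/(1+k^2), minimax among multiples

assert np.all(risk(c_lo, theta) <= 1 + 1e-12)      # risk of X itself is identically 1
print(risk(c_star, theta).max())                   # stays below 1: c*^2 + (1-c*)^2/k^2 < 1
```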

The next result represents the principal dominance finding of this section. Motivated by the Poisson and negative binomial models, it is cast more generally, not only in continuation of the previous Remark, but also to handle cases where X is supported on the finite set {0, 1, . . . , M} with a parametric restriction of the form θ = E(X) ∈ [θ0, M] and θ0 > 0. A prominent example, further detailed in Example 3.8 below, is given by a Binomial model X ∼ B(n, p) with θ = np, M = n, and p ≥ p0 = θ0/n.


Theorem 3.4. Consider X ∼ pθ, θ ≥ 0, with pθ(t) = Pθ(X = t), Eθ(X) = θ, Eθ(X²) < ∞, and support S independent of θ, either of the form S = ℕ or S = {0, 1, . . . , M}. Consider estimators δψ(X) = δ0(X) + ψ(X), where δ0(X) is a target estimator such that δ0(x) is non-decreasing in x and δ0(x) ≤ x for x ∈ S, and ψ is a non-increasing function on S such that lim_{t→s̄} ψ(t) = 0, where s̄ = sup S. Let

ψ1(t) = inf_{θ≥θ0} Eθ(θ − δ0(X) | X ≤ t), t ∈ S, θ0 > 0.

Then, with the further specification that ψ(M+1) = 0 when S = {0, 1, . . . , M}, under squared error loss and the lower-bound constraint θ ≥ θ0:

(a) δψ(X) dominates δ0(X) whenever

{ψ(t+1) + ψ(t)}/2 ≤ ψ1(t), for all t ∈ S;   (3.10)

(b) δψ*(X) dominates δ0(X), where ψ*(t) = ψ1(t) − ℓ with ℓ = lim_{t→s̄} ψ1(t);


(c) δψ*(X) is range-preserving whenever δ0(X) = X, i.e., Pθ(δψ*(X) ≥ θ0) = 1 for all θ ≥ θ0, while δ*(X) = max{θ0, δψ*(X)} is range-preserving and dominates δ0(X).

Proof. (a) First, condition (3.10) is not vacuous since

Eθ(θ − δ0(X) | X ≤ t) ≥ Eθ(θ − X | X ≤ t) ≥ θ − min{θ, t} = max{0, θ − t},   (3.11)

which implies that

ψ1(t) ≥ inf_{θ≥θ0}(max{0, θ − t}) ≥ max{0, θ0 − t}, for all t ∈ S.   (3.12)

Now, we have for the difference in losses:

(δ0(x) − θ)² − (δψ(x) − θ)² = (δ0(x) − θ + ψ(t))² |_{t=x}^{s̄}
  = Σ_{t≥x} [{δ0(x) − θ + ψ(t+1)}² − {δ0(x) − θ + ψ(t)}²]
  = 2 Σ_{t≥x} {ψ(t+1) − ψ(t)} {δ0(x) − θ + (ψ(t+1) + ψ(t))/2}.

As a difference in risks, we hence obtain

Δ(θ) = Eθ(δ0(X) − θ)² − Eθ(δψ(X) − θ)²
  = 2 Σ_{x∈S} pθ(x) Σ_{t≥x} {ψ(t+1) − ψ(t)} {δ0(x) − θ + (ψ(t+1) + ψ(t))/2}
  = 2 Σ_{t∈S} {ψ(t+1) − ψ(t)} Σ_{x≤t} pθ(x) {δ0(x) − θ + (ψ(t+1) + ψ(t))/2}.

Since by assumption ψ(t+1) − ψ(t) ≤ 0 for all t ∈ S, a sufficient condition to have Δ(θ) ≥ 0 is:

Σ_{x≤t} pθ(x)(θ − δ0(x)) / Pθ(X ≤ t) ≥ {ψ(t+1) + ψ(t)}/2, for all t ∈ S.

Finally, the result follows by minimizing the l.h.s. of the above expression over θ ≥ θ0.

(b) Since δ0(x) is non-decreasing in x, it follows that Eθ(δ0(X)|X ≤ t) is non-decreasing in t ∈ S. Since this holds for all θ, it follows that both ψ1(t) and ψ*(t) are non-increasing in t ∈ S. With lim_{t→s̄} ψ*(t) = 0 and {ψ*(t+1) + ψ*(t)}/2 ≤ ψ*(t) ≤ ψ1(t), condition (3.10) is satisfied and the result follows from part (a).


(c) Clearly, δ*(X) is range-preserving and dominates δψ*(X). Part (b) therefore implies that δ*(X) dominates δ0(X). Finally, for δ0(X) = X, since we have lim_{t→s̄} Eθ(θ − X|X ≤ t) = 0 for all θ ≥ θ0, it follows that ℓ = 0 and

δψ*(t) = δψ1(t) = inf_{θ≥θ0} Eθ(θ + t − X | X ≤ t) ≥ θ0, for all t ∈ S.
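The proof can be mirrored numerically. Below is a sketch for the finite-support Binomial setting treated in Example 3.8 (n = 10, θ0 = 3, and the grids are our own choices): ψ1 is approximated by a brute-force infimum over a θ grid, δψ* is formed as in part (b) with δ0(X) = X, and the risk gain Δ(θ) is then checked:

```python
import numpy as np
from scipy.stats import binom

n, theta0 = 10, 3.0
xs = np.arange(n + 1)
grid = np.linspace(theta0, n - 1e-9, 2001)         # theta grid approximating the infimum

def gap(theta, t):                                 # E_theta(theta - X | X <= t)
    pmf = binom.pmf(xs[: t + 1], n, theta / n)
    return theta - (xs[: t + 1] * pmf).sum() / pmf.sum()

psi1 = np.array([min(gap(th, t) for th in grid) for t in xs])
delta_star = xs + (psi1 - psi1[-1])                # psi*(t) = psi1(t) - ell, with ell = psi1(n) = 0

gains = []
for th in grid[::100]:
    pmf = binom.pmf(xs, n, th / n)
    gains.append((pmf * (xs - th) ** 2).sum() - (pmf * (delta_star - th) ** 2).sum())
print(min(gains))   # non-negative up to grid error: delta_psi* dominates X
```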

Remark 3.3. The dominance results of Theorem 3.4 can be extended directly to an upper-bounded restriction θ ≤ θ1 in cases where S is finite. This is achieved with the reflection X′ = M − X and the correspondence θ′ = M − θ, because the risk of δ(X) at θ for estimating θ ∈ [0, θ1] is equivalent to the risk of M − δ(M − X′) at θ′ for estimating θ′ ∈ [M − θ1, M].


Example 3.8. As previously mentioned, Theorem 3.4 applies to Binomial models X ∼ B(n, p) with M = n and a lower bound restriction θ = np ≥ θ0 with θ0 ∈ (0, n). Such situations correspond to cases where the success parameter p is lower-bounded, and applications also cover upper-bounded restrictions, as pointed out in Remark 3.3. There has been some earlier work on estimating an upper-bounded or lower-bounded Binomial proportion; see for instance Marchand & MacGibbon (2000), as well as van Eeden (2006), for findings and useful references. A specific application of Theorem 3.4 is given by the target estimator δ0(X) = X with ℓ = 0 and ψ1(t) = inf_{θ≥θ0} Eθ(θ − X|X ≤ t) for t = 0, 1, . . . , n.

The dominance finding of Theorem 3.4 leaves untouched the question of where the minimum value of Eθ(θ − δ0(X)|X ≤ t) is attained on the set θ ∈ [θ0, ∞). We show below, for both the Poisson and negative binomial models, and when δ0(X) is the target minimax estimator given in examples (i) and (iv) of Section 2.3, that the infimum is attained at the boundary θ = θ0. We first address the Poisson case.

Corollary 3.4. For X ∼ Poisson(θ) with θ ≥ θ0 > 0, and for estimating θ under loss (δ−θ)²/θ, dominating estimators δψ(X) of δ0(X) = X are given by Theorem 3.4 with ℓ = 0 and

ψ1(t) = (θ0^{t+1}/t!) / Σ_{0≤j≤t} (θ0^j/j!).

Moreover, these dominating estimators are minimax.
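A minimal sketch of this construction (θ0 = 2 is our choice): the closed form for ψ1 is cross-checked against a direct truncated-mean computation at θ0 (formula (3.13) in the proof below), and the risk improvement over X under the loss (δ−θ)²/θ is verified on a grid:

```python
import numpy as np
from scipy.stats import poisson
from math import factorial

theta0 = 2.0
xs = np.arange(0, 150)

def psi1(t):   # theta0^(t+1)/t! divided by sum_j theta0^j/j!
    return (theta0 ** (t + 1) / factorial(t)) / sum(theta0**j / factorial(j) for j in range(t + 1))

t = 4          # spot check against E_{theta0}(theta0 - X | X <= t)
pmf_t = poisson.pmf(np.arange(t + 1), theta0)
print(psi1(t), theta0 - (np.arange(t + 1) * pmf_t).sum() / pmf_t.sum())

delta = xs + np.array([psi1(s) for s in xs])       # dominating estimator of Theorem 3.4, ell = 0
for th in np.linspace(theta0, 25, 40):
    p = poisson.pmf(xs, th)
    assert (p * (delta - th) ** 2).sum() / th <= 1 + 1e-9   # risk of X is identically 1
print("improvement over X, hence minimaxity, verified on the grid")
```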


Proof. The dominating estimators δψ(X) here are minimax since δ0(X) is minimax as a consequence of Theorem 2.2. The dominance follows from Theorem 3.4 and from the calculation (e.g., Gurmu & Trivedi, 1992; Remark 3.6)

Eθ(θ − X|X ≤ t) = (θ^{t+1}/t!) / Σ_{0≤j≤t} (θ^j/j!),   (3.13)

which is clearly increasing in θ ∈ [θ0, ∞) for all t ∈ ℕ. This implies that the infimum value for the determination of ψ1(t), t ∈ ℕ, is attained at θ = θ0, and establishes the result.

Now, for the negative binomial case, we will make use of the following one-parameter exponential family connection between the determination of ψ1(t) and the comparison of the conditional variance Var(X|X ≤ t) with the unconditional variance Var(X).


Lemma 3.6. For exponential family model (1.1) and an estimator δ0(X) of θ = E(X) such that t − δ0(t) is non-decreasing in t ∈ ℕ, we have, for t belonging to the support of X,

ψ1(t) = inf_{θ≥θ0} Eθ(θ − δ0(X) | X ≤ t) = E_{θ0}(θ0 − δ0(X) | X ≤ t),

as long as Varθ(X|X ≤ t) ≤ Varθ(X) for all θ ≥ θ0.

Proof. The truncated distribution X|X ≤ t has exponential family density as in (1.1) with

pη(x|X ≤ t) = e^{ηx − At(η)}, where At(η) = A(η) + log Pη(X ≤ t).   (3.14)


From this, we have

Eη(X|X ≤ t) = A′(η) + (d/dη) log Pη(X ≤ t), and Varη(X|X ≤ t) = Varη(X) + (d²/dη²) log Pη(X ≤ t).   (3.15)

From the above, recalling that θ = A′(η), we obtain

(d/dη) Eη(θ − δ0(X)|X ≤ t) = (d/dη) Eη(θ − X|X ≤ t) + (d/dη) Eη(X − δ0(X)|X ≤ t)
  = Varη(X) − Varη(X|X ≤ t) + (d/dη) Eη(X − δ0(X)|X ≤ t)
  ≥ 0,

by the assumption on the conditional and unconditional variances, and since the monotonicity assumption on t − δ0(t), paired with an increasing monotone likelihood ratio ordering in X for the distributions X|X ≤ t in the parameter η, implies that (d/dη) Eη(X − δ0(X)|X ≤ t) ≥ 0. Finally, the result follows since θ is a monotone increasing function of η.

rna lP

Remark 3.5. The ordering required in Lemma 3.6 between the truncated and untruncated variances is quite plausible as a property, but it is not guaranteed without some conditions on the underlying distribution, and such issues have been studied mainly for continuous models by Mailhot (1987), Burdett (1996), Chen et al. (2010), and Chen (2013), among others. As can be inferred from the above, such an ordering does hold for both the Poisson and negative binomial distributions. Remark 3.6. Expression (3.13), which relates to the expectation of a truncated Poisson distribution, can be derived from (3.15), with θ = eη as follows: d log Pη (X ≤ t) dη X e−eη eηj d log = − dη j! 0≤j≤t

Eθ (θ − X|X ≤ t) = −

1/t! , η(j−t−1) /j! 0≤j≤t e

which is (3.13) indeed.

= P


To conclude, here is an application of our findings (Corollary 2.1, Theorem 3.4, and Lemma 3.6) to negative binomial models. The minimax implication is presented for loss r(δ−θ)²/(θ(θ+r)), but applies as well to loss (1.2) and Corollary 2.1's variants.

Corollary 3.5. For X ∼ NB(r, θ) with probability mass function as in (2.4) and with θ ≥ θ0 > 0, and for estimating θ under loss r(δ−θ)²/(θ(θ+r)), estimators δψ(X) given by Theorem 3.4 with ℓ = θ0/(r+1) and

ψ1(t) = θ0/(r+1) + ((r+θ0)/(r+1)) ((r)_{t+1}/t!) (θ0/(θ0+r))^{t+1} / Σ_{0≤j≤t} ((r)_j/j!) (θ0/(θ0+r))^j,   (3.16)

dominate δ0(X) = rX/(r+1) and are minimax. Namely, δψ*(X) = δ0(X) + ψ*(X) dominates δ0(X) with ψ*(t) = ψ1(t) − θ0/(r+1).


Proof. As in Corollary 3.4, the dominating estimators δψ(X) here are minimax since δ0(X) is minimax as a consequence of Corollary 2.1 (see Example 2.6, iv). The dominance of the above δψ's follows from Theorem 3.4, while the determination of ℓ and the representation of δψ*(X) follow since lim_{t→∞} ψ1(t) = θ0/(r+1) from (3.16).

There remains to show that ψ1(t) is indeed given by (3.16). Shonkwiler (2016) gives expressions for the truncated mean and variance of a negative binomial distribution. Namely, denoting σ² = Var(X) = θ(1 + θ/r), he obtains the following expressions for θt = E(X|X ≤ t) and σt² = Var(X|X ≤ t), t ∈ ℕ⁺:

θt = θ − (r+θ)(t+1)pθ(t+1) / (r Pθ(X ≤ t)),
σt² = θt + t(θt − θ) + θt θ(1 + 1/r) − θt².   (3.17)

From this, it follows that

σ² − σt² = (θ − θt)(1 + θ/r + t − θt) > 0,

as θt < min{θ, t} for t ∈ ℕ⁺. Hence, Lemma 3.6 applies with t − δ0(t) = t/(r+1) non-decreasing in t ∈ ℕ, and implies that

ψ1(t) = E_{θ0}(θ0 − rX/(r+1) | X ≤ t) = θ0 − (r/(r+1)) θt,

which from (3.17) yields (3.16) and completes the proof.

Jou

We have provided a unified treatment of minimaxity for one-parameter exponential family subject to a lower-bound restriction on the expectation with several instances where the minimax property persists from the unrestricted case to the restricted parameter case. Our findings recovers previous known cases, but also leads to new results, namely for negative binomial models and for a large variety of weighted squared error loss functions. In contrast, we have elaborated on a plentiful number of non-minimax results and provided illustrations. In the latter part of the work, we have focussed on determining improved estimators in discrete models under squared error loss in presence of a lower-bound constrained expectation, which lead to classes of minimax estimators for estimating lower-bounded Poisson or negative binomial expectations under normalized squared error loss. An open question in relationship to the class of minimax estimators for a lower-bounded Poisson mean given in Corollary 3.4 is the determination of a Bayesian solution. In fact, to the best of our knowledge, no such solution has been found. A plausible candidate is the Bayes estimator δπ (X) with respect to the uniform prior π0 (θ) = I[θ0 ,∞) (θ); which is the truncation of π(θ) = I(0,∞) (θ) for which δπ (X) = X is minimax. However, it it is known (i.e., van Eeden, 2006, pp. 58-59, reporting on a personal communication of Bill Strawderman) that δπ0 (X) is never minimax for any θ0 > 0. In a related problem with interesting techniques and findings that may be helpful in addressing the open question described above as well as related ones, Hamura & Kubokawa (2019) consider X ∼ Poisson(θ), θ ≤ θ0 , and loss L(θ, δ) = θ{δ/θ − log(δ/θ) − 1}. They further consider the class of prior densities π(θ) = θβ−1 and πθ0 (θ) = θβ−1 I(0,θ0 ) (θ), with β > 0 and including the Jeffreys 15

Journal Pre-proof

prior for β = 1/2 in the former case, and the associated Bayes estimators θˆβ (X) and θˆβ,θ0 (X), showing that θˆβ,θ0 (X) dominates θˆβ (X) for sufficiently small θ0 .


Appendix

PROOF OF LEMMA 2.1. Since lim_{θ→∞} λ(θ) = λ by assumption, for any δ ∈ (0, 1) there exists θ̃ > max{0, θ0} such that

λ(1 − δ) ≤ λ(θ) ≤ λ(1 + δ), for all θ ≥ θ̃.   (4.18)

From (2.8), it follows for θ ≥ θ̃ that:

b′(θ) − (λ/λ(θ)) b(θ)/θ + ((1+λ)/2) b²(θ)/(θ²λ(θ)) ≤ −(ε/2) (λ(θ) + λ²)/((1+λ)λ(θ))
⇔ b′(θ) + ((1+λ)/(2λ(θ))) {b(θ)/θ − λ/(1+λ)}² ≤ −(ε/2) (λ(θ) + λ²)/((1+λ)λ(θ)) + λ²/(2λ(θ)(1+λ))
⇒ b′(θ) ≤ (1/(2λ(θ)(1+λ))) {λ²(1 − ε) − ελ(θ)}
⇒ b′(θ) ≤ (1/(2(1+λ))) {λ(1−ε)/(1−δ) − ε} = K′ > 0,   (4.19)

for ε < λ/(λ + 1 − δ). Recalling that θ̃ > 0 and that b(θ) > 0 for all θ ≥ θ̃, an integration of both sides of the inequality b′(t) ≤ K′ on t ∈ (θ̃, θ) establishes that b(θ) ≤ K′(θ − θ̃) + b(θ̃) ≤ K′θ + b(θ̃) ≤ Kθ, with K = K′ + b(θ̃)/θ̃, establishing the result.


PROOF OF LEMMA 2.2. Since b(θ) > 0 for all θ ≥ θ0, expression (2.8), along with (4.18) and Lemma 2.1, implies, for sufficiently large θ, say θ ≥ θ2, that

b′(θ)/b(θ) ≤ λ/(θλ(θ)) − (ε/(2b(θ)(1+λ))) {1 + λ²/λ(θ)}
  ≤ (1/θ) {1 + δ − (ε/(2(1+λ)K)) (1 + λ/(1+δ))}
  ≤ (1 − ε′)/θ,   (4.20)

for some ε′ > 0, since δ may be chosen arbitrarily small and ε is fixed here and positive. Hence, integrating both sides of the inequality b′(t)/b(t) ≤ (1 − ε′)/t for t ∈ (θ2, θ), we infer that log b(θ) ≤ log(c θ^{1−ε′}), for some c > 0 and θ sufficiently large. Finally, this implies that b(θ)/θ ≤ c/θ^{ε′}, which goes to 0 as θ goes to +∞.

PROOF OF LEMMA 2.3. It follows from (4.20) and the last line of the proof of Lemma 2.2 that

b′(θ) ≤ (1 − ε′) b(θ)/θ ≤ (1 − ε′) c/θ^{ε′} → 0 as θ → ∞.   (4.21)


Hence, for any h > 0, b′(θ) < h for all large θ. In addition, if b′(θ) < −h for all large θ, then b(θ′) < 0 for some θ′, and the contradiction in the proof of Karlin's theorem would arise. Therefore, for every hn > 0, there exists a θn such that −hn < b′(θn) < hn. By choosing hn → 0, it follows that b′(θn) → 0 as θn → ∞.

Acknowledgements

We are quite thankful for the useful comments and suggestions of two reviewers. Éric Marchand's research is supported in part by the Natural Sciences and Engineering Research Council of Canada, and William Strawderman's research is partially supported by a grant from the Simons Foundation (#418098).

References

[1] Burdett, K. (1996). Truncated means and variances. Economics Letters, 52, 263-267.

[2] Brown, L.D. (1986). Fundamentals of Statistical Exponential Families with Applications in Statistical Decision Theory. Lecture Notes-Monograph Series, Vol. 9, Institute of Mathematical Statistics.


[3] Casella, G. & Strawderman, W.E. (1981). Estimating a bounded normal mean. Annals of Statistics, 9, 870-878.

[4] Chen, J., van Eeden, C. & Zidek, J.V. (2010). Uncertainty and the conditional variance. Statistics & Probability Letters, 80, 1764-1770.

[5] Chen, J. (2013). A partial order on uncertainty and information. Journal of Theoretical Probability, 26, 349-359.

[6] Gurmu, S. & Trivedi, P.K. (1992). Overdispersion tests for truncated Poisson regression models. Journal of Econometrics, 54, 347-370.


[7] Hamura, Y. & Kubokawa, T. (2019). Bayesian predictive distributions for a Poisson model with a parametric restriction. Communications in Statistics: Theory and Methods, 49, 3257-3266.

[8] Johnstone, I.M. & MacGibbon, K.B. (1992). Minimax estimation of a constrained Poisson vector. Annals of Statistics, 20, 807-831.

[9] Karlin, S. (1958). Admissibility for estimation with quadratic loss. Annals of Mathematical Statistics, 29, 406-436.

[10] Katz, M. (1961). Admissible and minimax estimates of parameters in truncated spaces. Annals of Mathematical Statistics, 32, 136-142.

[11] Kubokawa, T. (1999). Shrinkage and modification techniques in estimation of variance and the related problems: a review. Communications in Statistics: Theory and Methods, 28, 613-650.


[12] Kubokawa, T. (1994). A unified approach to improving equivariant estimators. Annals of Statistics, 22, 290-299.

[13] Lehmann, E.L. & Casella, G. (1998). Theory of Point Estimation. Springer Texts in Statistics. Springer-Verlag, New York.


[14] Mailhot, L. (1987). Ordre de dispersion et lois tronquées. Cahiers de recherche de l'Académie des sciences de Paris, Sér. I Math., 304, 499-501.

[15] Marchand, É. & MacGibbon, K.B. (2000). Minimax estimation of a constrained Binomial proportion. Statistics & Decisions, 18, 129-167.

[16] Marchand, É. & Strawderman, W.E. (2012). A unified minimax result for restricted parameter spaces. Bernoulli, 18, 635-643.

[17] Marchand, É. & Strawderman, W.E. (2005). On improving on the minimum risk equivariant estimator of a scale parameter under a lower-bound constraint. Journal of Statistical Planning and Inference, 134, 90-101.

[18] Marchand, É. & Strawderman, W.E. (2004). Estimation in Restricted Parameter Spaces: A Review. Festschrift for Herman Rubin, Institute of Mathematical Statistics Lecture Notes-Monograph Series, pp. 21-44.

[19] Shonkwiler, J.S. (2016). Variance of the truncated negative binomial distribution. Journal of Econometrics, 195, 209-210.


[20] van Eeden, C. (2006). Restricted parameter space problems: Admissibility and minimaxity properties. Lecture Notes in Statistics, 188, Springer.


[21] van Eeden, C. (1995). Minimax estimation of a lower-bounded scale parameter of a gamma distribution for scale-invariant squared-error loss. Canadian Journal of Statistics, 23, 245–256.


Highlights

• Unified minimax result for estimating a restricted mean for a one-parameter exponential family
• New applications including for a lower-bounded negative binomial mean
• Non-minimax results

Author contributions

Both authors worked together on the conception, the derivation of findings, and the writing.