Statistics and Probability Letters 109 (2016) 224–231
Stokes’ theorem, Stein’s identity and completeness

Dominique Fourdrinier (a,∗), William E. Strawderman (b)

(a) Normandie Université, Université de Rouen, LITIS EA 4108, Avenue de l’Université, BP 12, 76801 Saint-Étienne-du-Rouvray, France
(b) Rutgers University, Department of Statistics, 561 Hill Center, Busch Campus, Piscataway, NJ 08854-8019, USA
Article history: Received 23 October 2015 Accepted 3 November 2015 Available online 28 November 2015
Abstract

We study the relation between Stein’s theorem and Stokes’ theorem (or the divergence theorem) and show, using completeness of certain exponential families, that they are equivalent in a certain sense, by using each to prove a version of the other. © 2015 Elsevier B.V. All rights reserved.
MSC: 62C10; 62C20
Keywords: Stein’s identity; p-variate normal distribution; Stokes’ theorem; Integration by slices; Minimax estimation
1. Introduction

We study the relationship between Stein’s lemma (Stein, 1981) and Stokes’ theorem and demonstrate, using completeness of certain exponential families, that they are in fact equivalent in a certain sense. Stein’s lemma has been used extensively in minimax shrinkage estimation in the normal case. Stokes’ theorem, in turn, has been used to prove versions of Stein’s lemma that are useful in developing minimax shrinkage procedures for general spherically and elliptically symmetric distributions.

In Section 2, we study the relation between the two results for spherical regions and prove, using Stein’s lemma, an almost everywhere version of Stokes’ theorem. Since Stein’s lemma can be proved without the use of Stokes’ theorem (e.g. Stein’s original proof), this implies a certain type of equivalence between the two results, as noted above.

In Section 3, we extend this equivalence to regions with quite general contours. We prove an extension of Stein’s lemma to densities of the form $\exp(-\phi(x))$, where $\phi$ is a smooth function tending to $\infty$ in all directions, which defines level sets of the form $[\phi \le r]$ with boundaries of the form $[\phi = r]$. The proof we give is analogous to Stein’s original proof for the normal case and does not use Stokes’ theorem. We also give an almost everywhere version of Stokes’ theorem whose proof is based on the extended Stein’s lemma and which, again, uses completeness of a certain exponential family in an essential way. For completeness, we give in the Appendix a proof of the extended version of Stein’s lemma using Stokes’ theorem. This again establishes a certain equivalence between the (generalized) Stein’s lemma and Stokes’ theorem for quite general regions. To us, it is quite striking that completeness can be used to establish a connection between these two well-known results.

In Section 4, we give several examples, mostly known, of applications of Stein’s lemma to the estimation of a mean vector. Section 5 gives some concluding remarks.
∗ Corresponding author. E-mail addresses: [email protected] (D. Fourdrinier), [email protected] (W.E. Strawderman).
http://dx.doi.org/10.1016/j.spl.2015.11.003 0167-7152/© 2015 Elsevier B.V. All rights reserved.
D. Fourdrinier, W.E. Strawderman / Statistics and Probability Letters 109 (2016) 224–231
2. Equivalence of Stokes’ theorem and Stein’s lemma on spheres

Stein’s lemma (Stein, 1981) and Stokes’ theorem (see e.g. Stroock, 1990 and Kavian, 1990) are well known to statisticians; indeed, Stokes’ theorem has been used to prove an extension of Stein’s lemma to spherically symmetric distributions, including the normal (see e.g. Fourdrinier and Strawderman, 2008). The purpose of this section is to show that one may use Stein’s lemma to derive a version of Stokes’ theorem on spheres so that, in a certain sense, the two results are equivalent. In Section 3, this connection will be extended to regions which are not spherical.

Here is Stein’s lemma.

Lemma 2.1 (Stein’s lemma; Stein, 1981). Let $X \sim N_p(\theta, \tau^2 I)$ for fixed $\theta \in \mathbb{R}^p$ and $\tau > 0$. If $g$ is a weakly differentiable function from $\mathbb{R}^p$ into $\mathbb{R}^p$ then, denoting by $E_{\theta,\tau^2}$ the expectation with respect to $N_p(\theta, \tau^2 I)$, we have

$$E_{\theta,\tau^2}\big[(X-\theta)^{t} g(X)\big] = \tau^2\, E_{\theta,\tau^2}\big[\operatorname{div} g(X)\big], \tag{2.1}$$
provided that either expectation exists.

Stein’s lemma follows immediately from a generalization given in Section 3 (Lemma 3.1). For completeness, we provide two proofs of this generalization. The first, given in Section 3, follows the lines of Stein’s original proof and does not use Stokes’ theorem; the second, relying on Stokes’ theorem, is given in the Appendix.

The main result of this section is an almost everywhere version of Stokes’ theorem for weakly differentiable functions from $\mathbb{R}^p$ into $\mathbb{R}^p$, when the sets of integration are the balls $B_{r,\theta}$ and spheres $S_{r,\theta}$ of radius $r > 0$ centered at $\theta \in \mathbb{R}^p$, and the measures are, respectively, the Lebesgue measure on $B_{r,\theta}$ and the uniform measure on $S_{r,\theta}$ (see Appendix A.1 for more details). The proof uses Stein’s lemma and completeness of the normal scale family. Stokes’ theorem follows as a corollary for functions which are sufficiently smooth near the boundary $S_{r,\theta}$ of $B_{r,\theta}$, approached from inside the ball. Thus equivalence between Stein’s lemma and Stokes’ theorem is established for spheres and balls. As noted above, in Section 3 we extend this equivalence to more general regions.

Theorem 2.1. Let $\theta \in \mathbb{R}^p$ be fixed and let $g$ be a weakly differentiable function from $\mathbb{R}^p$ into $\mathbb{R}^p$. Then, provided that the following integrals exist, we have, for almost every $r > 0$,

$$\int_{S_{r,\theta}} \Big(\frac{x-\theta}{\|x-\theta\|}\Big)^{t} g(x)\, d\sigma_{r,\theta}(x) = \int_{B_{r,\theta}} \operatorname{div} g(x)\, dx, \tag{2.2}$$
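where $\sigma_{r,\theta}$ is the uniform measure on $S_{r,\theta}$. Further, for every $r$ for which

$$\lim_{r'\to r^-} \int_{S_{r',\theta}} \Big(\frac{x-\theta}{\|x-\theta\|}\Big)^{t} g(x)\, d\sigma_{r',\theta}(x) = \int_{S_{r,\theta}} \Big(\frac{x-\theta}{\|x-\theta\|}\Big)^{t} g(x)\, d\sigma_{r,\theta}(x), \tag{2.3}$$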
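the two integrals in (2.2) are equal.

Proof. Let $\theta \in \mathbb{R}^p$ and $\tau > 0$ be fixed and let $X \sim N_p(\theta, \tau^2 I)$. Assume first that $E_{\theta,\tau^2}[\|g(X)\|] < \infty$, so that (2.1) holds. Integrating through uniform measures on spheres (equivalently, through spherical coordinates; see Appendix A.1), we have

$$\begin{aligned} E_{\theta,\tau^2}\big[(X-\theta)^{t} g(X)\big] &= \int_{\mathbb{R}^p} (x-\theta)^{t} g(x)\, \frac{1}{(2\pi\tau^2)^{p/2}} \exp\Big(-\frac{\|x-\theta\|^2}{2\tau^2}\Big)\, dx \\ &= \int_{\mathbb{R}_+} \int_{S_{r,\theta}} \Big(\frac{x-\theta}{\|x-\theta\|}\Big)^{t} g(x)\, d\sigma_{r,\theta}(x)\, \psi_{\tau^2}(r)\, dr, \end{aligned} \tag{2.4}$$

where

$$\psi_{\tau^2}(r) = \frac{r}{(2\pi\tau^2)^{p/2}} \exp\Big(-\frac{r^2}{2\tau^2}\Big). \tag{2.5}$$

Here the second equality in (2.4) writes $(x-\theta)^{t} g(x) = r\, \big((x-\theta)/\|x-\theta\|\big)^{t} g(x)$ on $S_{r,\theta}$, which produces the factor $r$ in $\psi_{\tau^2}$.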
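Also,

$$\begin{aligned} \tau^2 E_{\theta,\tau^2}\big[\operatorname{div} g(X)\big] &= \tau^2 \int_{\mathbb{R}^p} \operatorname{div} g(x)\, \frac{1}{(2\pi\tau^2)^{p/2}} \exp\Big(-\frac{\|x-\theta\|^2}{2\tau^2}\Big)\, dx \\ &= \int_{\mathbb{R}^p} \operatorname{div} g(x)\, \Big[\frac{1}{(2\pi\tau^2)^{p/2}} \Big(-\tau^2 \exp\Big(-\frac{r^2}{2\tau^2}\Big)\Big)\Big]_{r=\|x-\theta\|}^{\infty} dx \\ &= \int_{\mathbb{R}^p} \operatorname{div} g(x) \int_{\|x-\theta\|}^{\infty} \psi_{\tau^2}(r)\, dr\, dx \\ &= \int_{\mathbb{R}_+} \int_{B_{r,\theta}} \operatorname{div} g(x)\, dx\; \psi_{\tau^2}(r)\, dr, \end{aligned} \tag{2.6}$$

since, according to (2.5),

$$\frac{\partial}{\partial r}\Big[\frac{1}{(2\pi\tau^2)^{p/2}}\Big(-\tau^2 \exp\Big(-\frac{r^2}{2\tau^2}\Big)\Big)\Big] = \psi_{\tau^2}(r),$$

and by Fubini’s theorem (noticing that $[x \in \mathbb{R}^p \text{ and } r \ge \|x-\theta\|] \Leftrightarrow [r \in \mathbb{R}_+ \text{ and } x \in B_{r,\theta}]$). Therefore, it follows from (2.1), (2.4) and (2.6) that, for all $\tau^2 > 0$,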
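$$\int_{\mathbb{R}_+} \int_{S_{r,\theta}} \Big(\frac{x-\theta}{\|x-\theta\|}\Big)^{t} g(x)\, d\sigma_{r,\theta}(x)\, \psi_{\tau^2}(r)\, dr = \int_{\mathbb{R}_+} \int_{B_{r,\theta}} \operatorname{div} g(x)\, dx\; \psi_{\tau^2}(r)\, dr, \tag{2.7}$$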
and hence, since the family $(\psi_{\tau^2}(r))_{\tau^2>0}$ defined in (2.5) is complete as an exponential family, we have equality of the innermost integrals in (2.7) for almost every $r > 0$ ($\theta$ being fixed). This gives the first result, assuming that $E_{\theta,\tau^2}[\|g(X)\|] < \infty$. If not, for fixed $R > 0$, multiplying $g$ by a smooth function which equals 1 on the ball $B_{R,\theta}$ of radius $R$ centered at $\theta$ and vanishes off a compact set containing $B_{R,\theta}$ gives a function with finite expectation which equals $g$ on $B_{R,\theta}$. Then (2.2) holds for this modified function, and hence for $g$, for almost every $0 < r < R$; since $R$ is arbitrary, (2.2) holds for almost every $r > 0$. Finally, note that the right-hand side of (2.2) is absolutely continuous in $r$, and hence continuous, since

$$\int_{B_{r,\theta}} \operatorname{div} g(x)\, dx = \int_0^r h(\rho)\, d\rho \quad \text{where} \quad h(\rho) = \int_{S_{\rho,\theta}} \operatorname{div} g(x)\, d\sigma_{\rho,\theta}(x)$$

(see Appendix A.1). This, together with (2.3), implies the second result.
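As a purely numerical sanity check of (2.2), not part of the authors' argument, the sketch below takes $p = 2$ and a hypothetical smooth field $g(x) = (x_1^2 x_2, \sin x_2)$, and compares the outward flux of $g$ through the circle $S_{r,\theta}$ with the integral of $\operatorname{div} g$ over the disk $B_{r,\theta}$, both computed by midpoint quadrature. The field, the center, the radius and the grid sizes are illustrative assumptions. Because this $g$ is smooth, (2.3) holds for every $r$, so Theorem 2.1 predicts agreement up to quadrature error.

```python
import math

# Hypothetical smooth test field on R^2 (not taken from the paper):
# g(x) = (x1^2 * x2, sin(x2)), whose divergence is 2*x1*x2 + cos(x2).
def g(x1, x2):
    return x1 * x1 * x2, math.sin(x2)

def div_g(x1, x2):
    return 2.0 * x1 * x2 + math.cos(x2)

def flux_through_circle(theta, r, n_angles=400):
    """Left side of (2.2) for p = 2: integral over S_{r,theta} of the outward
    normal component of g, using arc length r * dphi as the sphere measure."""
    h = 2.0 * math.pi / n_angles
    total = 0.0
    for k in range(n_angles):
        phi = (k + 0.5) * h                      # midpoint rule in the angle
        u1, u2 = math.cos(phi), math.sin(phi)    # outward unit normal
        g1, g2 = g(theta[0] + r * u1, theta[1] + r * u2)
        total += (u1 * g1 + u2 * g2) * r * h
    return total

def divergence_over_disk(theta, r, n_angles=400, n_radii=400):
    """Right side of (2.2): integral of div g over B_{r,theta} in polar
    coordinates (Jacobian rho), midpoint rule in both rho and phi."""
    h_phi = 2.0 * math.pi / n_angles
    h_rho = r / n_radii
    total = 0.0
    for i in range(n_radii):
        rho = (i + 0.5) * h_rho
        for k in range(n_angles):
            phi = (k + 0.5) * h_phi
            x1 = theta[0] + rho * math.cos(phi)
            x2 = theta[1] + rho * math.sin(phi)
            total += div_g(x1, x2) * rho * h_rho * h_phi
    return total

theta, r = (0.3, -0.2), 1.2
lhs = flux_through_circle(theta, r)
rhs = divergence_over_disk(theta, r)
print(f"flux through S_r = {lhs:.6f}, integral of div g over B_r = {rhs:.6f}")
```

The midpoint rule is used in the angle because the integrands are smooth and periodic there, where it converges very quickly; the radial direction carries the only noticeable discretization error.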
Remark 2.1. Some form of regularity of $g$ near the boundary $S_r$ of $B_r$ is clearly required for equality in (2.2). For example, the function $g = (g_1, \ldots, g_p)$ defined, for $i = 1, \ldots, p$ and $x = (x_1, \ldots, x_p)$, by $g_i(x) = 0$ if $\|x\| \neq r$ and $g_i(x) = x_i$ if $\|x\| = r$ is weakly differentiable on $\mathbb{R}^p$ (it vanishes almost everywhere) with weak divergence $\operatorname{div} g \equiv 0$. However, $g$ does not satisfy Stokes’ equation on $B_r$, since $\int_{B_r} \operatorname{div} g(x)\, dx = 0$ but $\int_{S_r} (x/\|x\|)^{t} g(x)\, d\sigma_r(x) = r^p\, \sigma(S) \neq 0$, where $\sigma(S)$ is the area of the unit sphere in $\mathbb{R}^p$. Stroock (1990), for example, requires $g$ to be smooth so that (2.2) is satisfied. Kavian (1990) allows $g$ to be weakly differentiable while guaranteeing, through a trace operator, that $g$ is properly defined on the boundary, so that (2.2) holds.

3. More general contours

In this section, we prove an extension of Stein’s lemma (Lemma 2.1) to densities proportional to $\exp(-\phi(x))$, where $\phi$ is a continuously differentiable function. Additionally, we extend Theorem 2.1 to the case where the sets of integration $B_r$, with boundary $S_r$, are replaced by $[\phi \le r] = \{x \in \mathbb{R}^p : \phi(x) \le r\}$, with boundary $[\phi = r] = \{x \in \mathbb{R}^p : \phi(x) = r\}$.

Here is an extension of Stein’s lemma.

Lemma 3.1. Let $\phi$ be a continuously differentiable function from $\mathbb{R}^p$ into $\mathbb{R}_+$ such that $\varphi : x \mapsto \exp(-\phi(x))$ is a density and such that, for any $i = 1, \ldots, p$, $\lim_{|x_i|\to\infty} \phi(x_1, \ldots, x_p) = \infty$. If $g = (g_1, \ldots, g_p)$ is a weakly differentiable function from $\mathbb{R}^p$ into $\mathbb{R}^p$ then, denoting by $E$ the expectation with respect to $\varphi$, we have

$$E\big[\nabla\phi(X)^{t} g(X)\big] = E\big[\operatorname{div} g(X)\big], \tag{3.1}$$
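provided that either expectation exists.

Remark 3.1. Lemma 2.1 follows immediately by choosing $\phi(x) = \|x-\theta\|^2/(2\tau^2)$, up to the additive constant that makes $\exp(-\phi)$ a density, which does not affect $\nabla\phi$.

Proof of Lemma 3.1. Let $x = (x_1, \ldots, x_p) \in \mathbb{R}^p$. For fixed $i = 1, \ldots, p$, set $x_{-i} = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_p)$ and, with a slight abuse of notation, write $x = (x_i, x_{-i})$. Note that

$$\frac{\partial \varphi(x)}{\partial x_i} = -\frac{\partial \phi(x)}{\partial x_i}\, \varphi(x),$$

so that $\varphi(x)$ can be written as

$$\varphi(x) = -\int_{-\infty}^{x_i} \frac{\partial \phi(y, x_{-i})}{\partial y}\, \varphi(y, x_{-i})\, dy = \int_{x_i}^{\infty} \frac{\partial \phi(y, x_{-i})}{\partial y}\, \varphi(y, x_{-i})\, dy, \tag{3.2}$$

noticing that, by assumption, $\lim_{|x_i|\to\infty} \phi(x_1, \ldots, x_p) = \infty$ implies $\lim_{|x_i|\to\infty} \varphi(x_i, x_{-i}) = \lim_{|x_i|\to\infty} \exp(-\phi(x_i, x_{-i})) = 0$.

Thanks to the existence of the expectations in (3.1), we can write, for almost every $x_{-i}$, using (3.2) and then Fubini’s theorem,

$$\begin{aligned} \int_{-\infty}^{\infty} \frac{\partial g_i(x_i, x_{-i})}{\partial x_i}\, \varphi(x_i, x_{-i})\, dx_i &= -\int_{-\infty}^{0} \frac{\partial g_i(x_i, x_{-i})}{\partial x_i} \int_{-\infty}^{x_i} \frac{\partial \phi(y, x_{-i})}{\partial y}\, \varphi(y, x_{-i})\, dy\, dx_i + \int_{0}^{\infty} \frac{\partial g_i(x_i, x_{-i})}{\partial x_i} \int_{x_i}^{\infty} \frac{\partial \phi(y, x_{-i})}{\partial y}\, \varphi(y, x_{-i})\, dy\, dx_i \\ &= -\int_{-\infty}^{0} \frac{\partial \phi(y, x_{-i})}{\partial y}\, \varphi(y, x_{-i}) \int_{y}^{0} \frac{\partial g_i(x_i, x_{-i})}{\partial x_i}\, dx_i\, dy + \int_{0}^{\infty} \frac{\partial \phi(y, x_{-i})}{\partial y}\, \varphi(y, x_{-i}) \int_{0}^{y} \frac{\partial g_i(x_i, x_{-i})}{\partial x_i}\, dx_i\, dy \\ &= \int_{-\infty}^{\infty} \frac{\partial \phi(y, x_{-i})}{\partial y}\, \varphi(y, x_{-i})\, \big[g_i(y, x_{-i}) - g_i(0, x_{-i})\big]\, dy \\ &= \int_{-\infty}^{\infty} \frac{\partial \phi(y, x_{-i})}{\partial y}\, \varphi(y, x_{-i})\, g_i(y, x_{-i})\, dy, \end{aligned}$$

since, using (3.2) again (letting $x_i \to \infty$ in its first form),

$$\int_{-\infty}^{\infty} \frac{\partial \phi(x_i, x_{-i})}{\partial x_i}\, \varphi(x_i, x_{-i})\, dx_i = -\int_{-\infty}^{\infty} \frac{\partial \varphi(x_i, x_{-i})}{\partial x_i}\, dx_i = 0.$$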
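Then integrating with respect to $x_{-i}$ gives

$$E\Big[\frac{\partial g_i(X)}{\partial x_i}\Big] = \int_{\mathbb{R}^{p-1}} \int_{-\infty}^{\infty} \frac{\partial g_i(x_i, x_{-i})}{\partial x_i}\, \varphi(x_i, x_{-i})\, dx_i\, dx_{-i} = \int_{\mathbb{R}^{p}} \frac{\partial \phi(x)}{\partial x_i}\, \varphi(x)\, g_i(x)\, dx = E\Big[\frac{\partial \phi(X)}{\partial x_i}\, g_i(X)\Big],$$

and hence summing over $i$ gives the desired result.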
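Identity (3.1) can be checked by simulation for a non-normal density. The sketch below is only an illustration under stated assumptions: it takes independent logistic coordinates centered at a hypothetical $\theta$, i.e. $\phi(x) = \sum_i l(x_i - \theta_i)$ with $l(z) = z + 2\log(1 + e^{-z})$, which satisfies the hypotheses of Lemma 3.1 and has $\partial\phi/\partial x_i = \tanh((x_i - \theta_i)/2)$, together with a hypothetical field $g(x) = (x_1 x_2, \sin x_2)$. Both sides of (3.1) are estimated by Monte Carlo and compared with a closed form for $E[\operatorname{div} g(X)]$.

```python
import math
import random

# Illustrative density (an assumption, not from the paper): independent
# logistic coordinates centred at THETA, i.e. phi(x) = sum_i l(x_i - theta_i)
# with l(z) = z + 2*log(1 + exp(-z)). Then exp(-phi) integrates to 1 and
# d phi / d x_i = tanh((x_i - theta_i) / 2).
THETA = (0.5, -1.0)

def sample_logistic(rng):
    u = rng.random()
    while u == 0.0:                        # avoid log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))         # inverse CDF of the standard logistic

def grad_phi(x):
    return [math.tanh((xi - ti) / 2.0) for xi, ti in zip(x, THETA)]

# Hypothetical test field g = (x1 * x2, sin x2), with div g = x2 + cos x2.
def g(x):
    return [x[0] * x[1], math.sin(x[1])]

def div_g(x):
    return x[1] + math.cos(x[1])

def check_identity(n=200_000, seed=1):
    """Monte Carlo estimates of E[grad phi(X)^t g(X)] and E[div g(X)] in (3.1)."""
    rng = random.Random(seed)
    lhs = rhs = 0.0
    for _ in range(n):
        x = [t + sample_logistic(rng) for t in THETA]
        lhs += sum(d * gi for d, gi in zip(grad_phi(x), g(x)))
        rhs += div_g(x)
    return lhs / n, rhs / n

lhs, rhs = check_identity()
# For this g, E[div g(X)] = theta_2 + cos(theta_2) * pi / sinh(pi), because
# E[cos L] = pi / sinh(pi) and E[sin L] = 0 for a standard logistic L.
exact = THETA[1] + math.cos(THETA[1]) * math.pi / math.sinh(math.pi)
print(f"lhs ~ {lhs:.4f}, rhs ~ {rhs:.4f}, exact {exact:.4f}")
```

The logistic family is chosen only because it is easy to sample by inversion and its score $\tanh(z/2)$ is explicit; any density meeting the hypotheses of Lemma 3.1 would serve.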
In preparation for an extension of Theorem 2.1, the following co-area theorem, used in Fourdrinier et al. (2003), is relevant and serves as a general replacement for spherical coordinates. It can be derived from the co-area theorem of Federer (1969, Theorem 3.2.12).

Lemma 3.2 (Fourdrinier et al., 2003). For any real number $r$, let $[\phi = r]$ be the manifold in $\mathbb{R}^p$ associated with a given continuously differentiable function $\phi$ defined on $\mathbb{R}^p$ with nonnegative values, whose gradient does not vanish at any point. Then, for any Lebesgue integrable function $f$, we have

$$\int_{\mathbb{R}^p} f(x)\, dx = \int_{\{r \in \mathbb{R} \,:\, [\phi = r] \neq \emptyset\}} \int_{[\phi = r]} \frac{f(x)}{\|\nabla\phi(x)\|}\, d\sigma_r(x)\, dr, \tag{3.3}$$
where $\sigma_r$ is the area measure on $[\phi = r]$.

The following theorem is an extension of Theorem 2.1, i.e. an almost everywhere version of Stokes’ theorem. As in typical statements of Stokes’ theorem, we assume that the set $[\phi \le r]$ is bounded, and hence compact, for every $r > 0$. The proof again relies on Lemma 3.1 and on completeness of a certain exponential family. Thus equivalence, in a certain sense, between Stokes’ theorem and the generalized Stein’s lemma is established for quite general sets.

Theorem 3.1. Let $\phi$ be a continuously differentiable function from $\mathbb{R}^p$ into $\mathbb{R}_+$ whose gradient does not vanish at any point and such that $\lim_{|x_i|\to\infty} \phi(x_1, \ldots, x_p) = \infty$ for any $i = 1, \ldots, p$. Also assume that, for every $r > 0$, $[\phi \le r]$ is compact and that $\phi$ determines a density $\varphi$ as in Lemma 3.1. Let $g$ be a weakly differentiable function from $\mathbb{R}^p$ into $\mathbb{R}^p$. Then, for almost every $r > 0$,

$$\int_{[\phi = r]} \Big(\frac{\nabla\phi(x)}{\|\nabla\phi(x)\|}\Big)^{t} g(x)\, d\sigma_r(x) = \int_{[\phi \le r]} \operatorname{div} g(x)\, dx. \tag{3.4}$$
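Further, for every $r$ for which

$$\lim_{r'\to r^-} \int_{[\phi = r']} \Big(\frac{\nabla\phi(x)}{\|\nabla\phi(x)\|}\Big)^{t} g(x)\, d\sigma_{r'}(x) = \int_{[\phi = r]} \Big(\frac{\nabla\phi(x)}{\|\nabla\phi(x)\|}\Big)^{t} g(x)\, d\sigma_{r}(x), \tag{3.5}$$

the two integrals in (3.4) are equal.

Proof. Let $X \sim \varphi_\tau(x) = K_\tau \exp(-\phi(x)/\tau)$, with $\phi$ as in Lemma 3.1 and where $K_\tau$ is a normalizing constant. As in the proof of Theorem 2.1, we may assume, without loss of generality, that $E_\tau[\|g(X)\|] < \infty$, where $E_\tau$ denotes the expectation with respect to $\varphi_\tau$. Applying Lemma 3.1 to $\phi/\tau$ (plus a constant), Identity (3.1) becomes

$$E_\tau\big[\nabla\phi(X)^{t} g(X)\big] = \tau\, E_\tau\big[\operatorname{div} g(X)\big]. \tag{3.6}$$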
By Lemma 3.2, we have

$$E_\tau\big[\nabla\phi(X)^{t} g(X)\big] = \int_{\mathbb{R}^p} \nabla\phi(x)^{t} g(x)\, K_\tau \exp\Big(-\frac{\phi(x)}{\tau}\Big)\, dx = K_\tau\, \tau \int_0^{\infty} \int_{[\phi = r]} \Big(\frac{\nabla\phi(x)}{\|\nabla\phi(x)\|}\Big)^{t} g(x)\, d\sigma_r(x)\, \xi_\tau(r)\, dr, \tag{3.7}$$

with

$$\xi_\tau(r) = \frac{1}{\tau} \exp\Big(-\frac{r}{\tau}\Big). \tag{3.8}$$
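We also have

$$\begin{aligned} \tau E_\tau\big[\operatorname{div} g(X)\big] &= \tau \int_{\mathbb{R}^p} \operatorname{div} g(x)\, K_\tau \exp\Big(-\frac{\phi(x)}{\tau}\Big)\, dx = K_\tau\, \tau \int_{\mathbb{R}^p} \operatorname{div} g(x)\, \Big[-\exp\Big(-\frac{r}{\tau}\Big)\Big]_{r=\phi(x)}^{\infty} dx \\ &= K_\tau\, \tau \int_{\mathbb{R}^p} \operatorname{div} g(x) \int_{\phi(x)}^{\infty} \xi_\tau(r)\, dr\, dx = K_\tau\, \tau \int_0^{\infty} \int_{[\phi \le r]} \operatorname{div} g(x)\, dx\; \xi_\tau(r)\, dr, \end{aligned} \tag{3.9}$$

by Fubini’s theorem. Therefore, it follows from (3.6), (3.7) and (3.9) that, for all $\tau > 0$,

$$\int_0^{\infty} \int_{[\phi = r]} \Big(\frac{\nabla\phi(x)}{\|\nabla\phi(x)\|}\Big)^{t} g(x)\, d\sigma_r(x)\, \xi_\tau(r)\, dr = \int_0^{\infty} \int_{[\phi \le r]} \operatorname{div} g(x)\, dx\; \xi_\tau(r)\, dr, \tag{3.10}$$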
and hence, since the family $(\xi_\tau(r))_{\tau>0}$ defined in (3.8) is complete as an exponential family, we have equality of the innermost integrals in (3.10) for almost every $r > 0$. This gives the first result. Finally, as in the proof of Theorem 2.1, the right-hand side of (3.4) is absolutely continuous in $r$, and hence continuous, so that (3.5) implies the second result.

4. Applications

In this section we give several brief examples, most of which have appeared in the literature, of the use of the generalized Stein’s lemma (Lemma 3.1) in estimating the mean vector $\theta$.

Example 1 (Spherical normal distributions). Let $X \sim N_p(\theta, \tau^2 I)$ with $\tau^2$ known. By Stein’s lemma, $E_{\theta,\tau^2}[(X-\theta)^{t} g(X)] = \tau^2 E_{\theta,\tau^2}[\operatorname{div} g(X)]$. For the loss function $L(\theta, d) = \sum_{i=1}^{p} (d_i - \theta_i)^2 = \|d - \theta\|^2$ and an estimator of the form $\delta(X) = X + \tau^2 g(X)$, the risk is

$$\begin{aligned} R(\theta, \delta) = E_{\theta,\tau^2}[L(\theta, \delta(X))] &= E_{\theta,\tau^2}\big[\|X-\theta\|^2 + \tau^4 \|g(X)\|^2 + 2\tau^2 (X-\theta)^{t} g(X)\big] \\ &= E_{\theta,\tau^2}\big[\|X-\theta\|^2 + \tau^4 \|g(X)\|^2 + 2\tau^2\, \tau^2 \operatorname{div} g(X)\big] \\ &= R(\theta, X) + \tau^4 E_{\theta,\tau^2}\big[\|g(X)\|^2 + 2 \operatorname{div} g(X)\big]. \end{aligned}$$

Thus, for a weakly differentiable function $g$ such that $E_{\theta,\tau^2}[\|g(X)\|^2] < \infty$ which satisfies the differential inequality $\|g\|^2 + 2 \operatorname{div} g \le 0$ (with strict inequality on a set of positive measure), the estimator $\delta$ dominates the usual minimax estimator $X$. The James–Stein estimator, with $g(X) = -a X/\|X\|^2$, is the classical example and gives domination for $p \ge 3$ provided $0 < a < 2(p-2)$ (see e.g. Stein, 1981); indeed, since $\operatorname{div}(X/\|X\|^2) = (p-2)/\|X\|^2$, we have $\|g\|^2 + 2 \operatorname{div} g = (a^2 - 2a(p-2))/\|X\|^2$.

Example 2 (General normal distributions). Let $X \sim N_p(\theta, \Sigma)$ with $\Sigma$ known. Then $\phi(x) = (x-\theta)^{t} \Sigma^{-1} (x-\theta)/2$ (up to an additive constant) and, since $\nabla\phi(x) = \Sigma^{-1}(x-\theta)$, Lemma 3.1 gives $E_{\theta,\Sigma}[(X-\theta)^{t} \Sigma^{-1} g(X)] = E_{\theta,\Sigma}[\operatorname{div} g(X)]$. Hence, with loss function $L(\theta, d) = (d-\theta)^{t} \Sigma^{-1} (d-\theta)$, an estimator of the form $\delta(X) = X + g(X)$ has risk (as in the calculation of Example 1)

$$\begin{aligned} R(\theta, \Sigma, \delta) &= R(\theta, \Sigma, X) + E_{\theta,\Sigma}\big[g^{t}(X) \Sigma^{-1} g(X) + 2 (X-\theta)^{t} \Sigma^{-1} g(X)\big] \\ &= R(\theta, \Sigma, X) + E_{\theta,\Sigma}\big[g^{t}(X) \Sigma^{-1} g(X) + 2 \operatorname{div} g(X)\big]. \end{aligned}$$
Thus, if $g^{t} \Sigma^{-1} g + 2 \operatorname{div} g < 0$, $\delta(X)$ dominates the usual minimax estimator $X$. For example, the James–Stein estimator $\delta(X) = \big(1 - a/(X^{t} \Sigma^{-1} X)\big) X$ satisfies this inequality for $p \ge 3$ and $0 < a < 2(p-2)$.

Example 3 (Spherically and elliptically symmetric distributions). Let $X$ have density $F(x) = \exp\big(-\psi(\|x-\theta\|^2/2)\big)$. Then (3.1) gives
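$$E_F\big[\psi'\big(\|X-\theta\|^2/2\big)\, (X-\theta)^{t} g(X)\big] = E_F\big[\operatorname{div} g(X)\big].$$

Let the loss be given by $L(\theta, d) = \psi'\big(\|X-\theta\|^2/2\big)\, \|d-\theta\|^2$. Then, as above, it follows that

$$\begin{aligned} R_F(\theta, X + g(X)) &= E_F\big[\psi'\big(\|X-\theta\|^2/2\big) \|X-\theta\|^2\big] + E_F\big[\psi'\big(\|X-\theta\|^2/2\big) \|g(X)\|^2 + 2\, \psi'\big(\|X-\theta\|^2/2\big) (X-\theta)^{t} g(X)\big] \\ &= R_F(\theta, X) + E_F\big[\psi'\big(\|X-\theta\|^2/2\big) \|g(X)\|^2 + 2 \operatorname{div} g(X)\big]. \end{aligned} \tag{4.1}$$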
Hence, provided, for example, that $0 < \psi'(t) \le C < \infty$ and $C\|g(X)\|^2 + 2 \operatorname{div} g(X) < 0$, we have $R_F(\theta, X + g(X)) < R_F(\theta, X)$.

Since the above result is for a somewhat non-standard loss function, it is instructive to make a connection with the usual (squared error) loss $L(\theta, d) = \|d-\theta\|^2$. To this end, assume that $\psi'(t) \ge 0$ for any $t$ and also that, for some $K > 0$, $f(x) = K\, \psi'\big(\|x-\theta\|^2/2\big) F(x)$ is a density over $\mathbb{R}^p$. For $X \sim f(x)$, (4.1) may be used to show (since, for any function $H$, $E_F[H(X)\, \psi'(\|X-\theta\|^2/2)] = E_f[H(X)]/K$) that

$$\begin{aligned} R_f(\theta, X + g(X)) &= E_f\big[\|X-\theta\|^2\big] + E_f\Big[\|g(X)\|^2 + \frac{2 \operatorname{div} g(X)}{\psi'\big(\|X-\theta\|^2/2\big)}\Big] \\ &= E_f\big[\|X-\theta\|^2\big] + E_f\Big[\frac{\psi'\big(\|X-\theta\|^2/2\big) \|g(X)\|^2 + 2 \operatorname{div} g(X)}{\psi'\big(\|X-\theta\|^2/2\big)}\Big] \\ &\le E_f\big[\|X-\theta\|^2\big] + E_f\Big[\frac{C \|g(X)\|^2 + 2 \operatorname{div} g(X)}{\psi'\big(\|X-\theta\|^2/2\big)}\Big]. \end{aligned}$$

Hence, if $X \sim f(x)$, $X + g(X)$ dominates $X$ under the usual squared error loss whenever $C\|g(x)\|^2 + 2 \operatorname{div} g(x) < 0$ for all $x$. This is an extension of a result due to Berger (1975). Brandwein et al. (1993) studied minimaxity of $X + g(X)$ when the sphere means of $\|g\|^2$ are decreasing in the radius and $\psi'$ is nondecreasing, and showed minimaxity of $X + a\, g(X)$ provided $\|g\|^2 + 2 \operatorname{div} g \le 0$ and $0 < a < E_0[\|X\|^2/p]$. All of these results require $p \ge 3$. A parallel development can be given for the elliptically symmetric case $X \sim F(x) = \exp\big(-\psi((x-\theta)^{t} \Sigma^{-1} (x-\theta)/2)\big)$; we omit the details. See Fourdrinier et al. (2003) and Kubokawa and Srivastava (2001) for detailed developments in the case where a residual vector is available.
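The first identity of Example 3 can be illustrated numerically. The sketch assumes one particular spherical density that is not singled out in the text: the multivariate Student $t$ with $\nu$ degrees of freedom and identity scale, $F(x) \propto (1 + \|x-\theta\|^2/\nu)^{-(\nu+p)/2}$, which corresponds to $\psi(t) = ((\nu+p)/2)\log(1 + 2t/\nu)$ up to a constant and hence $\psi'(\|x-\theta\|^2/2) = (\nu+p)/(\nu + \|x-\theta\|^2)$. The values of $\theta$, $\nu$ and the field $g$ are hypothetical; both sides are Monte Carlo estimates.

```python
import math
import random

# Illustrative instance of Example 3 (an assumption, not from the paper):
# the multivariate Student t with NU degrees of freedom and identity scale,
# F(x) proportional to (1 + ||x - theta||^2 / nu)^(-(nu + p) / 2), i.e.
# psi(t) = ((nu + p) / 2) * log(1 + 2 t / nu) + const, so that
# psi'(||x - theta||^2 / 2) = (nu + p) / (nu + ||x - theta||^2).
THETA = (0.5, -1.0, 2.0)
NU = 5.0

def sample_t(rng):
    """X = theta + Z * sqrt(nu / W), Z ~ N(0, I_p), W ~ chi-square(nu)."""
    w = rng.gammavariate(NU / 2.0, 2.0)      # chi-square(nu) = Gamma(nu/2, scale 2)
    s = math.sqrt(NU / w)
    return [t + s * rng.gauss(0.0, 1.0) for t in THETA]

# Hypothetical bounded test field g with bounded divergence.
def g(x):
    return [math.sin(x[0]), math.cos(x[1]), x[2] / (1.0 + x[2] ** 2)]

def div_g(x):
    return math.cos(x[0]) - math.sin(x[1]) + (1.0 - x[2] ** 2) / (1.0 + x[2] ** 2) ** 2

def check_example3(n=200_000, seed=2):
    """Monte Carlo estimates of E[psi' (X - theta)^t g(X)] and E[div g(X)]."""
    rng = random.Random(seed)
    p = len(THETA)
    lhs = rhs = 0.0
    for _ in range(n):
        x = sample_t(rng)
        z = [xi - ti for xi, ti in zip(x, THETA)]
        psi_prime = (NU + p) / (NU + sum(zi * zi for zi in z))
        lhs += psi_prime * sum(zi * gi for zi, gi in zip(z, g(x)))
        rhs += div_g(x)
    return lhs / n, rhs / n

lhs, rhs = check_example3()
print(f"E[psi' (X - theta)^t g(X)] ~ {lhs:.4f}, E[div g(X)] ~ {rhs:.4f}")
```

A bounded $g$ with bounded divergence is used so that both expectations clearly exist despite the heavy tails of the $t$ distribution.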
Example 4 ($l_q$ contours). Let $X \sim F(x) = \exp\big(-\psi\big(\sum_{i=1}^{p} |x_i - \theta_i|^q\big)\big)$. Identity (3.1) gives

$$E\Big[\psi'\Big(\sum_{i=1}^{p} |X_i - \theta_i|^q\Big)\, q \sum_{i=1}^{p} |X_i - \theta_i|^{q-2} (X_i - \theta_i)\, g_i(X)\Big] = E\big[\operatorname{div} g(X)\big].$$

Fourdrinier and Lemaire (2000) studied the case $q = 1$ and Fourdrinier et al. (2008) the case $q = 4$, both with $\psi(t) = t$. Berger (1978) studied the case $q = 4$, $\psi(t) = t^{1/4}$. In each case, minimax improvements over $X$ were found, but the dimension required was sometimes greater than 3.
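For the case $q = 4$, $\psi(t) = t$ mentioned above (with $\psi$ shifted by an additive normalizing constant), the coordinates are independent with density proportional to $\exp(-|x_i - \theta_i|^4)$, and the identity reduces to $E[4 \sum_i (X_i - \theta_i)^3 g_i(X)] = E[\operatorname{div} g(X)]$. The sketch below checks this by simulation, assuming a hypothetical $\theta$ and test field $g$; it uses the fact that if $Z$ has density proportional to $\exp(-|z|^q)$ then $|Z|^q$ follows a Gamma$(1/q, 1)$ law.

```python
import math
import random

# Illustrative instance of Example 4 (assumptions): q = 4 and psi(t) = t up
# to an additive normalizing constant, so the coordinates are independent
# with density proportional to exp(-|x_i - theta_i|^4) and
# d phi / d x_i = q * |x_i - theta_i|^(q - 2) * (x_i - theta_i).
Q = 4.0
THETA = (1.0, -0.5, 0.0)

def sample_coordinate(rng):
    """If Z has density proportional to exp(-|z|^q), then |Z|^q ~ Gamma(1/q, 1)
    and the sign of Z is symmetric."""
    t = rng.gammavariate(1.0 / Q, 1.0)
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * t ** (1.0 / Q)

# Hypothetical test field g and its divergence.
def g(x):
    return [math.sin(x[0] + x[1]), x[0] * x[2], math.exp(-x[2] ** 2)]

def div_g(x):
    return math.cos(x[0] + x[1]) - 2.0 * x[2] * math.exp(-x[2] ** 2)

def check_example4(n=100_000, seed=4):
    """Monte Carlo estimates of both sides of the Example 4 identity."""
    rng = random.Random(seed)
    lhs = rhs = 0.0
    for _ in range(n):
        x = [t + sample_coordinate(rng) for t in THETA]
        # psi'(t) = 1 for psi(t) = t, so only the gradient of the sum remains.
        grad = [Q * abs(xi - ti) ** (Q - 2.0) * (xi - ti) for xi, ti in zip(x, THETA)]
        lhs += sum(d * gi for d, gi in zip(grad, g(x)))
        rhs += div_g(x)
    return lhs / n, rhs / n

lhs, rhs = check_example4()
print(f"lhs ~ {lhs:.4f}, rhs ~ {rhs:.4f}")
```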
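Example 5 (Independent symmetric distributions). Let $X \sim F(x) = \exp\big(-\sum_{i=1}^{p} h\big((x_i - \theta_i)^2/2\big)\big)$. Lemma 3.1 implies

$$E\Big[\sum_{i=1}^{p} h'\Big(\frac{(X_i - \theta_i)^2}{2}\Big) (X_i - \theta_i)\, g_i(X)\Big] = E\big[\operatorname{div} g(X)\big].$$

Assume $h' > 0$ and consider the non-standard loss function $L(\theta, d) = \sum_{i=1}^{p} h'\big((x_i - \theta_i)^2/2\big) (d_i - \theta_i)^2$. For an estimator of the form $\delta(X) = X + g(X)$, we have

$$\begin{aligned} R_F(\theta, \delta) &= E\Big[\sum_{i=1}^{p} h'\Big(\frac{(X_i - \theta_i)^2}{2}\Big) (X_i - \theta_i)^2\Big] + E\Big[\sum_{i=1}^{p} h'\Big(\frac{(X_i - \theta_i)^2}{2}\Big) g_i^2(X) + 2 \sum_{i=1}^{p} h'\Big(\frac{(X_i - \theta_i)^2}{2}\Big) (X_i - \theta_i)\, g_i(X)\Big] \\ &= R_F(\theta, X) + E\Big[\sum_{i=1}^{p} h'\Big(\frac{(X_i - \theta_i)^2}{2}\Big) g_i^2(X) + 2 \sum_{i=1}^{p} \frac{\partial g_i(X)}{\partial X_i}\Big]. \end{aligned}$$

Suppose that $0 < h'\big((x_i - \theta_i)^2/2\big) \le C$ for any $x = (x_1, \ldots, x_p) \in \mathbb{R}^p$. Then

$$R_F(\theta, \delta) \le R_F(\theta, X) + E\big[C \|g(X)\|^2 + 2 \operatorname{div} g(X)\big].$$

Hence, in this case, $X + g(X)$ dominates $X$ provided $C\|g(X)\|^2 + 2 \operatorname{div} g(X) < 0$. This particular result seems to be new. Shinozaki (1984) analyzed several different independent location models under the usual quadratic loss $\|d - \theta\|^2$; in each case, he gave minimax shrinkage estimators improving on $X$ for $p \ge 3$.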
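Returning to Example 1, a small simulation makes the risk formula concrete. The sketch below (our illustration, with a hypothetical $\theta$, $\tau = 1$ and the illustrative choice $a = p - 2$) estimates the squared error risk of $X$ and of the James–Stein estimator $\delta(X) = X - a\tau^2 X/\|X\|^2$, and averages the per-draw quantity $p\tau^2 + \tau^4(\|g(X)\|^2 + 2\operatorname{div} g(X)) = p\tau^2 + \tau^4 (a^2 - 2a(p-2))/\|X\|^2$, whose mean is the risk according to the computation in Example 1.

```python
import random

def js_risk_check(theta, tau=1.0, a=None, n=100_000, seed=3):
    """Monte Carlo risks under squared error loss for X ~ N_p(theta, tau^2 I).

    Returns (risk of X, risk of James-Stein, mean of the Example 1 risk
    expression), where the James-Stein estimator is delta(X) = X + tau^2 g(X)
    with g(X) = -a X / ||X||^2.
    """
    p = len(theta)
    if a is None:
        a = p - 2.0                      # illustrative choice inside (0, 2(p - 2))
    rng = random.Random(seed)
    risk_x = risk_js = formula = 0.0
    for _ in range(n):
        x = [t + tau * rng.gauss(0.0, 1.0) for t in theta]
        sq = sum(xi * xi for xi in x)            # ||X||^2
        shrink = 1.0 - a * tau ** 2 / sq         # delta(X) = shrink * X
        risk_x += sum((xi - t) ** 2 for xi, t in zip(x, theta))
        risk_js += sum((shrink * xi - t) ** 2 for xi, t in zip(x, theta))
        # p tau^2 + tau^4 (||g||^2 + 2 div g), with
        # ||g||^2 + 2 div g = (a^2 - 2 a (p - 2)) / ||X||^2 for this g
        formula += p * tau ** 2 + tau ** 4 * (a * a - 2.0 * a * (p - 2)) / sq
    return risk_x / n, risk_js / n, formula / n

theta = [1.0] * 10
rx, rjs, rf = js_risk_check(theta)
print(f"risk of X ~ {rx:.3f}, James-Stein ~ {rjs:.3f}, risk formula ~ {rf:.3f}")
```

The risk of $X$ should be close to $p\tau^2 = 10$, while the two estimates of the James–Stein risk should agree with each other and lie below it.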
5. Concluding remarks

We have studied the relation between Stokes’ theorem and Stein’s lemma and have shown that these two useful results are equivalent in a certain sense. The connection between them relies in an essential way on completeness of particular exponential families. That completeness can be used to help establish the connection is, to us at least, quite striking and interesting. In the course of studying the connection, we have established a generalization of Stein’s lemma to distributions with very general contours, and have extended the connection with Stokes’ theorem to these contours. We have also given several examples (mostly known) of applications of the generalized Stein’s lemma to improved minimax shrinkage estimation.

Acknowledgments

The authors would like to thank the associate editor and a referee for their careful reading and useful comments. This work was partially supported by a grant from the Simons Foundation (#209035 to William Strawderman).

Appendix

A.1. Uniform measures on spheres

Theorem 2.1 relies on uniform measures on spheres. The uniform measure $\sigma$ on the unit sphere $S = \{x \in \mathbb{R}^p : \|x\| = 1\}$ can be defined, for any Borel set $\Omega$ of $S$, by
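$$\sigma(\Omega) = p\, \lambda\big(\{\rho u \in \mathbb{R}^p : 0 < \rho < 1,\ u \in \Omega\}\big),$$

where $\lambda$ is the Lebesgue measure on $\mathbb{R}^p$. For more general spheres, let $\theta \in \mathbb{R}^p$ and $r \ge 0$ be fixed. The uniform measure $\sigma_{r,\theta}$ on the sphere $S_{r,\theta} = \{x \in \mathbb{R}^p : \|x-\theta\| = r\}$ of radius $r$ centered at $\theta$ (note that $S = S_{1,0}$ and $\sigma = \sigma_{1,0}$) is given, for any Borel set $\Omega$ of $S_{r,\theta}$, by

$$\sigma_{r,\theta}(\Omega) = r^{p-1}\, \sigma\Big(\frac{\Omega - \theta}{r}\Big),$$

where $(\Omega - \theta)/r = \{u \in S : r u + \theta \in \Omega\}$ is a Borel set of $S$. Then, for any Lebesgue integrable function $\gamma$, we have

$$\int_{\mathbb{R}^p} \gamma(x)\, dx = \int_{\mathbb{R}_+} \int_{S} \gamma(r u + \theta)\, r^{p-1}\, d\sigma(u)\, dr = \int_{\mathbb{R}_+} \int_{S_{r,\theta}} \gamma(x)\, d\sigma_{r,\theta}(x)\, dr,$$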
which expresses a change of variables through spherical coordinates.

A.2. A proof of Lemma 3.1

We give here a proof of Lemma 3.1 using Stokes’ theorem. First assume that $g$ is smooth and has compact support, so that Stokes’ theorem is valid for every contour. Thus, for such $g$, we have

$$\int_{[\phi = r]} \Big(\frac{\nabla\phi(x)}{\|\nabla\phi(x)\|}\Big)^{t} g(x)\, d\sigma_r(x) = \int_{[\phi \le r]} \operatorname{div} g(x)\, dx. \tag{A.1}$$
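Using Lemma 3.2, we have (with $K$ a normalizing constant)

$$\begin{aligned} E\big[\nabla\phi(X)^{t} g(X)\big] &= \int_{\mathbb{R}^p} \nabla\phi(x)^{t} g(x)\, K \exp(-\phi(x))\, dx \\ &= K \int_0^{\infty} \int_{[\phi = r]} \Big(\frac{\nabla\phi(x)}{\|\nabla\phi(x)\|}\Big)^{t} g(x)\, d\sigma_r(x)\, \exp(-r)\, dr \\ &= K \int_0^{\infty} \int_{[\phi \le r]} \operatorname{div} g(x)\, dx\, \exp(-r)\, dr \end{aligned} \tag{A.2}$$

according to (A.1). Applying Fubini’s theorem, (A.2) becomes

$$\begin{aligned} E\big[\nabla\phi(X)^{t} g(X)\big] &= K \int_{\mathbb{R}^p} \int_{\phi(x)}^{\infty} \exp(-r)\, dr\; \operatorname{div} g(x)\, dx \\ &= K \int_{\mathbb{R}^p} \exp(-\phi(x))\, \operatorname{div} g(x)\, dx \\ &= E\big[\operatorname{div} g(X)\big], \end{aligned} \tag{A.3}$$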
which proves the lemma for such smooth $g$. To extend the proof to weakly differentiable $g$ with compact support, we use the fact (see e.g. Brezis, 1983) that $g$ can be approximated by a sequence $(g_n)_{n\in\mathbb{N}}$ of smooth functions with compact support which converges to $g$ in the Sobolev space $W^{1,1}(\mathbb{R}^p)$. This implies that

$$E\big[\nabla\phi(X)^{t} g(X)\big] = \lim_{n\to\infty} E\big[\nabla\phi(X)^{t} g_n(X)\big] \quad \text{and} \quad E\big[\operatorname{div} g(X)\big] = \lim_{n\to\infty} E\big[\operatorname{div} g_n(X)\big].$$

Hence we have the desired extension. To complete the proof for general weakly differentiable functions, it suffices to apply the dominated convergence theorem to the sequence $(h_n g)_{n\in\mathbb{N}}$, where $h_n$ is smooth, equals 1 on $\Omega_n$ and vanishes off a compact set containing $\Omega_n$, and where $(\Omega_n)_{n\in\mathbb{N}}$ is an increasing sequence of open sets tending to $\mathbb{R}^p$.

References

Berger, J.O., 1975. Minimax estimation of location vectors for a wide class of densities. Ann. Statist. 3 (6), 1318–1328.
Berger, J.O., 1978. Minimax estimation of a multivariate normal mean under polynomial loss. J. Multivariate Anal. 8, 173–180.
Brandwein, A.C., Ralescu, S., Strawderman, W.E., 1993. Shrinkage estimators of the location parameter for certain spherically symmetric distributions. Ann. Inst. Statist. Math. 45 (3), 551–565.
Brezis, H., 1983. Analyse fonctionnelle, Théorie et applications. Masson, Paris, New York.
Federer, H., 1969. Geometric Measure Theory. Springer, Berlin.
Fourdrinier, D., Lemaire, A.-S., 2000. Estimation of the mean of a l1-exponential multivariate distribution. Statist. Decisions 18, 259–273.
Fourdrinier, D., Ouassou, I., Strawderman, W.E., 2008. Estimation of a mean vector under quartic loss. J. Statist. Plann. Inference 138, 3841–3857.
Fourdrinier, D., Strawderman, W.E., 2008. Generalized Bayes minimax estimators of location vector for spherically symmetric distributions. J. Multivariate Anal. 99 (4), 735–750.
Fourdrinier, D., Strawderman, W.E., Wells, M.T., 2003. Robust shrinkage estimation for elliptically symmetric distributions with unknown covariance matrix. J. Multivariate Anal. 85, 24–39.
Kavian, O., 1990. Introduction à la Théorie des Points Critiques et Applications aux Problèmes Elliptiques. Springer, Berlin, Heidelberg, New York.
Kubokawa, T., Srivastava, M.S., 2001. Robust improvement in estimation of a mean matrix in an elliptically contoured distribution. J. Multivariate Anal. 76, 138–152.
Shinozaki, N., 1984. Simultaneous estimation of location parameters under quadratic loss. Ann. Statist. 12, 322–335.
Stein, C., 1981. Estimation of the mean of a multivariate normal distribution. Ann. Statist. 9, 1135–1151.
Stroock, D.W., 1990. A Concise Introduction to the Theory of Integration. World Scientific Publishing Co. Pte. Ltd.