Statistics and Probability Letters 80 (2010) 1980–1984
A convolution identity and more with illustrations

Nitis Mukhopadhyay

Department of Statistics, University of Connecticut, Storrs, CT 06269-4120, USA

E-mail address: [email protected]
Article history: Received 8 June 2010; Accepted 3 September 2010; Available online 12 October 2010.
Abstract

We begin with a new, general, and useful identity (Theorem 2.1) based on recursive convolutions. Next, we extend the basic message of Theorem 2.1 by generalizing it under a broadened set of assumptions. Interesting examples are provided involving the multivariate Cauchy as well as the multivariate equi-correlated normal and equi-correlated t distributions. Then, we discuss yet another approach (Theorem 4.1) that also works in evaluating the expectation of a ratio of suitable random variables. Example 4.3 especially stands out because it cannot be handled by the previous approaches, but Theorem 4.1 steps in to help.
MSC: 60E10; 62E99

Keywords: Basu's theorem; Cauchy; Chi-square; Convolution; Multivariate Cauchy; Multivariate normal; Multivariate t; Normal; U-statistic
1. Introduction

Convolution operators play important roles in probability theory. Their transforms and other well-known variants are very useful in substantive applied fields within electrical engineering, signal processing, and imaging. In the present discourse, we first report an interesting identity (Theorem 2.1) based on recursive convolutions; it is very general and useful and, as far as we know, it is new.

Let us begin with a probability density function (p.d.f.) $f(x)$ defined on the real line $\Re$ or an appropriate subinterval of $\Re$. Then, we denote the p.d.f.'s obtained through successive convolutions as follows:

$$f_1(x) \equiv f(x); \qquad f_2(x) \equiv f * f(x) = \int_{\Re} f(x - y) f(y)\, dy; \qquad f_3(x) \equiv f_2 * f(x) = \int_{\Re} f_2(x - y) f(y)\, dy; \quad \ldots; \quad f_k(x) \equiv f_{k-1} * f(x) = \int_{\Re} f_{k-1}(x - y) f(y)\, dy, \tag{1}$$

where $k = 2, 3, \ldots$ is a fixed integer and $x \in \Re$.

One of our main results (Theorem 2.1) is stated and proved in Section 2. It gives an identity involving integrals of the convoluted p.d.f.'s $f_k(x)$ and $f_{n-k}(x)$, $1 \le k < n$, where $n(\ge 2)$ is a fixed integer. Theorem 2.1 looks complicated in its appearance, but readers will soon find that we have provided a very easy way to prove it; some may even find the proof rather pretty. We follow this up with a number of interesting illustrations.
Section 3 extends the basic message carried by Theorem 2.1. The identity in (8) generalizes the identity in (2) under a considerably broadened set of assumptions. This is supplemented by a range of interesting examples involving the multivariate Cauchy as well as the multivariate equi-correlated normal and equi-correlated t distributions. In Section 4, we discuss an approach (Theorem 4.1), different from those in Sections 2 and 3, that works in evaluating the expectation of a ratio of suitable random variables. This approach applies when a certain independence property can be verified, often via Basu's (1955) theorem. Example 4.3 especially stands out because it cannot be handled by the approaches laid down in Sections 2 and 3; Theorem 4.1, however, steps in beautifully in that illustration.

2. A general identity

We begin with the statement of our newfound identity and its simple proof.

Theorem 2.1. We have:
$$\int_{\Re} \int_{\Re} \frac{u}{u+v}\, f_k(u)\, f_{n-k}(v)\, du\, dv = \frac{k}{n}, \tag{2}$$

where $1 \le k < n$ and $n(\ge 2)$ are fixed integers.

Proof. Let us consider independent and identically distributed (i.i.d.) real-valued random variables $X_1, \ldots, X_n$ with a common p.d.f. $f(x)$. Obviously,

$$E_f\left[\frac{\sum_{j=1}^{k} X_j}{\sum_{i=1}^{n} X_i}\right] = E_f\left[\frac{U}{U+V}\right], \tag{3}$$

where we denote the random variables $U = \sum_{i=1}^{k} X_i$ and $V = \sum_{i=k+1}^{n} X_i$. Next, we note that the p.d.f.'s of $U$ and $V$ are respectively given by $f_k(u)$ and $f_{n-k}(v)$. Also, $U, V$ are independent random variables. Thus, we can express

$$E_f\left[\frac{U}{U+V}\right] = \int_{\Re} \int_{\Re} \frac{u}{u+v}\, f_k(u)\, f_{n-k}(v)\, du\, dv. \tag{4}$$

At this point, observe that $X_j / \sum_{i=1}^{n} X_i$, $1 \le j \le n$, are identically distributed random variables, and thus $E_f\left[X_j / \sum_{i=1}^{n} X_i\right] = n^{-1}$, $1 \le j \le n$, since we can claim:

$$1 \equiv E_f\left[\frac{\sum_{j=1}^{n} X_j}{\sum_{i=1}^{n} X_i}\right] = \sum_{j=1}^{n} E_f\left[\frac{X_j}{\sum_{i=1}^{n} X_i}\right] = n\, E_f\left[\frac{X_1}{\sum_{i=1}^{n} X_i}\right],$$

which implies that

$$E_f\left[\frac{X_1}{\sum_{i=1}^{n} X_i}\right] = n^{-1}, \tag{5}$$

assuming that these expectations are finite. Now, a combination of (3)–(5) leads to (2). □
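Whenever the $X$'s are positive, the ratio in (3) lies in $(0, 1)$, so identity (2) lends itself to a quick simulation check. A minimal Monte Carlo sketch in Python, assuming NumPy (the choices $n = 10$, $k = 3$, and the exponential scale are arbitrary):

```python
import numpy as np

# Monte Carlo check of (2)-(3) with X_1, ..., X_n i.i.d. exponential.
# The ratio (X_1 + ... + X_k)/(X_1 + ... + X_n) is bounded in (0, 1),
# so its sample mean converges quickly to k/n.
rng = np.random.default_rng(7)
n, k, reps = 10, 3, 200_000

X = rng.exponential(scale=2.0, size=(reps, n))
ratio = X[:, :k].sum(axis=1) / X.sum(axis=1)
print(ratio.mean(), k / n)  # both approximately 0.3
```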
2.1. Some illustrations
We may plug in, for example,

(i) $f(x) = \beta^{-1} \exp(-x/\beta)\, I(x > 0)$, $\beta > 0$;
(ii) $f(x) = (2\pi)^{-1/2} \exp(-x/2)\, x^{-1/2}\, I(x > 0)$;
(iii) $f(x) = \theta^{-1}\, I(0 < x < \theta)$, $\theta > 0$;
(iv) $f(x) = (\sigma\sqrt{2\pi})^{-1} \exp\left(-\tfrac{1}{2} x^2/\sigma^2\right) I(x \in \Re)$, $\sigma > 0$.

In each case, we can claim that $E\left[X_1 / \sum_{i=1}^{n} X_i\right] = n^{-1}$. In what follows, we especially draw attention to cases (ii) and (iv).
Example 2.1. Let $U, V$ be independent random variables, $U \sim \chi^2_k$ and $V \sim \chi^2_{n-k}$. Then, in case (ii), we can rewrite (2) as follows:

$$\frac{k}{n} = E_f\left[\frac{U}{U+V}\right] = E_f\left[\frac{1}{1 + k^{-1}(n-k) W}\right],$$

where $W \sim F_{n-k,k}$. Hence, we can immediately write down the following identity:

$$\int_0^{\infty} \frac{1}{1 + k^{-1}(n-k) w}\, g_{n,k}(w)\, dw = \frac{k}{n}, \tag{6}$$

where $g_{n,k}(w)$ is the p.d.f. of the $F_{n-k,k}$ distribution.
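Identity (6) is also easy to confirm numerically; a sketch assuming SciPy, whose `scipy.stats.f` supplies the $F_{n-k,k}$ density $g_{n,k}$:

```python
import numpy as np
from scipy import integrate, stats

# Numerical check of (6): integrate the left-hand side against the F density.
n, k = 10, 3

def integrand(w):
    # stats.f.pdf(w, n - k, k) is g_{n,k}(w), the F_{n-k,k} p.d.f.
    return stats.f.pdf(w, n - k, k) / (1.0 + (n - k) * w / k)

value, err = integrate.quad(integrand, 0.0, np.inf)
print(value, k / n)  # both approximately 0.3
```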
Example 2.2. Next, we let $U, V$ be independent random variables, $U \sim N(0, k\sigma^2)$ and $V \sim N(0, (n-k)\sigma^2)$. Then, in case (iv), we can rewrite (2) as follows:

$$\frac{k}{n} = E_f\left[\frac{U}{U+V}\right] = E_f\left[\frac{1}{1 + \{k^{-1}(n-k)\}^{1/2} W}\right],$$

where $W \sim \text{Cauchy}(0, 1)$, since $V/U$ is distributed as $\{k^{-1}(n-k)\}^{1/2}$ times a standard Cauchy variable. Hence, we can immediately write down the following identity:

$$\int_{-\infty}^{\infty} \frac{1}{1 + \{k^{-1}(n-k)\}^{1/2} w} \cdot \frac{1}{1 + w^2}\, dw = \frac{k\pi}{n}. \tag{7}$$
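The integrand in (7) has a simple pole at $w = -\{k^{-1}(n-k)\}^{-1/2}$, so the integral is naturally read as a Cauchy principal value. One way to check it numerically is to fold the real line about the pole so that the odd singular parts cancel; a sketch assuming SciPy:

```python
import numpy as np
from scipy import integrate

# Principal-value check of (7): fold the axis about the pole at w0.
n, k = 10, 3
c = np.sqrt((n - k) / k)   # the constant {k^{-1}(n - k)}^{1/2} in (7)
w0 = -1.0 / c              # simple pole of the integrand

def f(w):
    return 1.0 / ((1.0 + c * w) * (1.0 + w * w))

def paired(t):
    # f(w0 + t) + f(w0 - t) stays bounded as t -> 0: the 1/t parts cancel.
    return f(w0 + t) + f(w0 - t)

value, err = integrate.quad(paired, 0.0, np.inf, limit=200)
print(value, k * np.pi / n)  # both approximately 0.9425
```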
The two integrals seen on the left-hand sides of (6) and (7) are both quite complicated, and neither one is particularly easy to evaluate directly.

3. Some closely related stuff

The nature of our proof of the identity (2) makes it abundantly clear that we are really finding the expectation of the ratio of $\sum_{j=1}^{k} X_j$ and $\sum_{i=1}^{n} X_i$, where the $X$'s are i.i.d. with a common p.d.f. $f(x)$, $1 \le k < n$, and $n(\ge 2)$ is fixed. The point is that finding expectations of ratios of dependent random variables frequently involves not-so-simple calculations. We also show briefly how the core idea from Section 2 may be expanded well beyond ratios of sums of i.i.d. random variables: the numerator and the denominator may instead involve sums of suitable non-i.i.d. random variables.

Again, let us begin with i.i.d. real-valued random variables $X_1, \ldots, X_n$ with a common p.d.f. $f(x)$. Suppose that $g(x_1, \ldots, x_r)$ is a real-valued symmetric kernel of degree $r(\le n)$, and let the notation $\sum_{(n,r)} g(X_{i_1}, \ldots, X_{i_r})$ denote the sum over all indices $1 \le i_1 < \cdots < i_r \le n$. Then, arguing similarly as in the proof of Theorem 2.1, we may state the following result:
$$E_f\left[\frac{g(X_{j_1}, \ldots, X_{j_r})}{\sum_{(n,r)} g(X_{i_1}, \ldots, X_{i_r})}\right] = \binom{n}{r}^{-1}, \tag{8}$$
for any fixed set of indices $1 \le j_1 < \cdots < j_r \le n$. When $r = 1$, with $g(x_1) = x_1$, (5) is equivalent to (8). Suppose that $g(x_1, x_2) = \frac{1}{2}(x_1 - x_2)^2$, the kernel that is customarily seen in the construction of the sample variance, which happens to be a U-statistic (Hoeffding, 1948). Now, from (8), we can claim that

$$E_f\left[\frac{(X_{j_1} - X_{j_2})^2}{\sum_{(n,2)} (X_{i_1} - X_{i_2})^2}\right] = \frac{2}{n(n-1)}, \quad n \ge 2, \tag{9}$$
for any fixed set of indices $1 \le j_1 < j_2 \le n$. This does not follow from Theorem 2.1.

A close inspection reveals that the core idea behind result (8) is simply this: as long as the distribution of $g(X_{j_1}, \ldots, X_{j_r}) / \sum_{(n,r)} g(X_{i_1}, \ldots, X_{i_r})$ remains unchanged over each fixed set of indices $1 \le j_1 < \cdots < j_r \le n$, (8) would hold. Indeed, one may require a condition far less stringent than the i.i.d. assumption for the $X$'s. All one needs is that

$$E_f\left[\frac{g(X_{j_1}, \ldots, X_{j_r})}{\sum_{(n,r)} g(X_{i_1}, \ldots, X_{i_r})}\right]$$

remains the same for each fixed set of indices $1 \le j_1 < \cdots < j_r \le n$, and that these expectations are finite. Then, (8) would hold. In what follows, we give a few examples, preceded by a quick numerical check of (9).
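As that check, one can simulate i.i.d. standard normal $X$'s; a minimal Monte Carlo sketch assuming NumPy, which uses the algebraic identity $\sum_{(n,2)} (x_i - x_j)^2 = n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2$:

```python
import numpy as np

# Monte Carlo check of (9) with i.i.d. N(0, 1) observations.
rng = np.random.default_rng(11)
n, reps = 6, 200_000

X = rng.standard_normal((reps, n))
num = (X[:, 0] - X[:, 1]) ** 2                      # kernel at the pair (j1, j2) = (1, 2)
# sum over all pairs: sum_{i<j} (x_i - x_j)^2 = n * sum x_i^2 - (sum x_i)^2
den = n * (X ** 2).sum(axis=1) - X.sum(axis=1) ** 2
print((num / den).mean(), 2.0 / (n * (n - 1)))      # both approximately 0.0667
```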
Example 3.1. Let $(X_1, \ldots, X_n)$ be distributed as $n$-dimensional normal with the mean vector zero, unit variances, and equi-correlation $\rho$, $-(n-1)^{-1} < \rho < 1$. This is referred to as Rao's (1973, Section 3c.2) symmetric normal distribution. Then, we have:

$$E\left[\frac{(X_{j_1} - X_{j_2})^2}{\sum_{(n,2)} (X_{i_1} - X_{i_2})^2}\right] = \frac{2}{n(n-1)}, \quad \text{for } n \ge 2.$$
Example 3.2. Suppose that $(X_1, \ldots, X_n)$ has an $n$-dimensional joint p.d.f. given by

$$f(x_1, \ldots, x_n) = k_n \left(1 + \sum_{i=1}^{n} x_i^2\right)^{-\frac{1}{2}(n+1)} \quad \text{for } -\infty < x_1, \ldots, x_n < \infty, \quad \text{where } k_n = \Gamma\left(\tfrac{1}{2}(n+1)\right) \pi^{-\frac{1}{2}(n+1)}.$$

This is the p.d.f. of an $n$-dimensional Cauchy distribution. See Johnson and Kotz (1972, Chapter 42, Eq. (53)); see also Ferguson (1962). Then, we have:

$$E\left[\frac{(X_{j_1} - X_{j_2})^2}{\sum_{(n,2)} (X_{i_1} - X_{i_2})^2}\right] = \frac{2}{n(n-1)}, \quad \text{for } n \ge 2.$$
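Even though the coordinates of this multivariate Cauchy have no finite moments, the ratio in question is bounded in $[0, 1]$, so Monte Carlo verification is straightforward. A sketch assuming NumPy and the familiar representation of the spherically symmetric Cauchy vector as a multivariate $t$ with one degree of freedom:

```python
import numpy as np

# Monte Carlo check for the n-dimensional Cauchy of Example 3.2:
# X = Z / sqrt(chi^2_1), with Z ~ N(0, I_n), has the density displayed above.
rng = np.random.default_rng(3)
n, reps = 5, 200_000

Z = rng.standard_normal((reps, n))
X = Z / np.sqrt(rng.chisquare(1, size=(reps, 1)))

num = (X[:, 0] - X[:, 1]) ** 2
den = n * (X ** 2).sum(axis=1) - X.sum(axis=1) ** 2   # sum of (X_i - X_j)^2 over all pairs
print((num / den).mean(), 2.0 / (n * (n - 1)))        # both approximately 0.1
```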
Example 3.3. Suppose that $(X_1, \ldots, X_n)$ has the p.d.f. of an $n$-dimensional multivariate $t$ distribution with $\nu$ degrees of freedom and equi-correlation $\rho$, $-(n-1)^{-1} < \rho < 1$. One may look at Johnson and Kotz (1972, Chapter 37), Tong (1990, Chapter 9), or Mukhopadhyay (2000, Section 4.6.2) for a review of multivariate $t$ distributions. Again, we have:

$$E\left[\frac{X_1}{\sum_{i=1}^{n} X_i}\right] = n^{-1} \quad \text{and} \quad E\left[\frac{(X_{j_1} - X_{j_2})^2}{\sum_{(n,2)} (X_{i_1} - X_{i_2})^2}\right] = \frac{2}{n(n-1)}, \quad \text{for } n \ge 2.$$
Example 3.4. Suppose that $(X_1, \ldots, X_n)$ has an $n$-dimensional p.d.f. given by

$$f(x_1, \ldots, x_n) = \frac{1}{(2\pi)^{n/2}} \left\{1 + \frac{1}{2} \prod_{i=1}^{n} \sin(2\pi x_i)\right\} \exp\left(-\frac{1}{2} \sum_{i=1}^{n} x_i^2\right) \quad \text{for } -\infty < x_1, \ldots, x_n < \infty.$$

Here, every proper subset of the $n$ random variables $X_1, \ldots, X_n$ consists of i.i.d. standard normal variables, but $X_1, \ldots, X_n$ taken together are dependent random variables, and $(X_1, \ldots, X_n)$ has an $n$-dimensional non-normal multivariate distribution. One may refer to Mukhopadhyay (2009, Eq. (8)). We immediately claim:

$$E\left[\frac{(X_{j_1} - X_{j_2})^2}{\sum_{(n,2)} (X_{i_1} - X_{i_2})^2}\right] = \frac{2}{n(n-1)}, \quad \text{for } n \ge 2.$$
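Since the perturbation factor above lies in $[\tfrac{1}{2}, \tfrac{3}{2}]$, the density is dominated by $\tfrac{3}{2}$ times the product of standard normal densities, so rejection sampling yields exact draws and the claim can be checked by simulation. A sketch assuming NumPy and the density displayed above:

```python
import numpy as np

# Monte Carlo check for Example 3.4 via rejection sampling:
# target = phi-product * (1 + 0.5 * prod sin(2*pi*x_i)) <= 1.5 * phi-product,
# so N(0, I_n) serves as the proposal with acceptance probability w / 1.5.
rng = np.random.default_rng(17)
n, want = 5, 200_000

chunks, total = [], 0
while total < want:
    Z = rng.standard_normal((want, n))
    w = 1.0 + 0.5 * np.prod(np.sin(2 * np.pi * Z), axis=1)  # weight in [0.5, 1.5]
    keep = Z[rng.uniform(0.0, 1.5, size=want) < w]
    chunks.append(keep)
    total += len(keep)
X = np.concatenate(chunks)[:want]

num = (X[:, 0] - X[:, 1]) ** 2
den = n * (X ** 2).sum(axis=1) - X.sum(axis=1) ** 2
print((num / den).mean(), 2.0 / (n * (n - 1)))  # both approximately 0.1
```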
4. Any recourse if (8) fails?

We are not in a position to discuss one recourse over another in general when (8) may elude us. But we may pursue another idea that allows us to evaluate the expectation of a ratio of random variables in the case of some specific parametric families of distributions. This is a very useful result, but we have not seen it stated in this generality.

Theorem 4.1. Suppose that $S, T$ are two statistics and that $S$ and $T/S$ are distributed independently. Then, we have:
$$E\left[\frac{T^k}{S^k}\right] = \frac{E[T^k]}{E[S^k]},$$

if $E[S^k] \ne 0$; $E[S^k]$, $E[T^k]$, and $E[T^k/S^k]$ are all finite; $S, T$ are non-zero w.p. 1; and $k$ is a fixed real number.
Proof. Note that

$$T^k = S^k \times \frac{T^k}{S^k} \ \text{ w.p. 1} \ \Rightarrow\ E[T^k] = E[S^k]\, E\left[\frac{T^k}{S^k}\right] \ \Rightarrow\ E\left[\frac{T^k}{S^k}\right] = \frac{E[T^k]}{E[S^k]}. \quad \square$$

One crucial point we need to be careful about is that $E\left[S^k \times (T^k/S^k)\right]$ need not equal $E[S^k]\, E\left[T^k/S^k\right]$, even though $S$ and $T/S$ are assumed independent, unless each expectation is assumed finite to begin with. Mukhopadhyay (2010, in press) has provided a number of cautionary tales.

Example 4.1. Let $X_1, \ldots, X_n$ be i.i.d. with a common p.d.f. $f(x) = \beta^{-1} \exp(-x/\beta) I(x > 0)$, where $\beta(>0)$ is an unknown parameter. Let $T = X_1$ and $S = \sum_{i=1}^{n} X_i$, $n \ge 2$. Now, $S$ is a complete and sufficient statistic for $\beta$, whereas $T/S$ is an ancillary statistic for $\beta$. Thus, by Basu's (1955) theorem, it follows that $S$ and $T/S$ are independent. Hence, by Theorem 4.1, we have:
(i) $E\left[\dfrac{T}{S}\right] = \dfrac{E[T]}{E[S]} = \dfrac{\beta}{n\beta} = \dfrac{1}{n}$,

(ii) $E\left[\dfrac{T^2}{S^2}\right] = \dfrac{E[T^2]}{E[S^2]} = \dfrac{2\beta^2}{(n + n^2)\beta^2} = \dfrac{2}{n(n+1)}$,

since $2T/\beta \sim \chi^2_2$ and $2S/\beta \sim \chi^2_{2n}$. The conclusion in part (ii) would not immediately follow from (2) or (5) or (8).
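Both parts are easy to confirm by simulation, since $T/S$ lies in $(0, 1)$. A minimal sketch assuming NumPy ($\beta = 2$ and $n = 8$ are arbitrary):

```python
import numpy as np

# Monte Carlo check of Example 4.1, parts (i) and (ii), with exponential data.
rng = np.random.default_rng(5)
n, beta, reps = 8, 2.0, 500_000

X = rng.exponential(scale=beta, size=(reps, n))
T, S = X[:, 0], X.sum(axis=1)
print((T / S).mean(), 1.0 / n)                        # part (i):  ~ 0.125
print((T ** 2 / S ** 2).mean(), 2.0 / (n * (n + 1)))  # part (ii): ~ 0.0278
```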
Example 4.2. Let $X_1, \ldots, X_n$ be i.i.d. with a common p.d.f. $f(x) = (\sigma\sqrt{2\pi})^{-1} \exp\left(-\tfrac{1}{2} x^2/\sigma^2\right) I(x \in \Re)$, where $\sigma(>0)$ is an unknown parameter. Let $T = X_1^2$ and $S = \sum_{i=1}^{n} X_i^2$, $n \ge 2$. Now, $S$ is a complete and sufficient statistic for $\sigma$, whereas $T/S$ is an ancillary statistic for $\sigma$. Again, by Basu's (1955) theorem, $S$ and $T/S$ are independent. Hence, by Theorem 4.1, we have:

(i) $E\left[\dfrac{T}{S}\right] = \dfrac{E[T]}{E[S]} = \dfrac{\sigma^2}{n\sigma^2} = \dfrac{1}{n}$,

(ii) $E\left[\dfrac{T^2}{S^2}\right] = \dfrac{E[T^2]}{E[S^2]} = \dfrac{3\sigma^4}{(2n + n^2)\sigma^4} = \dfrac{3}{n(n+2)}$,

since $T/\sigma^2 \sim \chi^2_1$ and $S/\sigma^2 \sim \chi^2_n$. The conclusion in part (ii) would not immediately follow from (2) or (5) or (8).
The full worth of Theorem 4.1 will be more apparent from the following illustration.

Example 4.3. Let $X_1, \ldots, X_n$ be i.i.d. with a common Uniform$(0, \theta)$ distribution, where $\theta(>0)$ is an unknown parameter. Let $T = X_1$ and $S = X_{n:n}$, the maximum order statistic, $n \ge 2$. Now, $S$ is a complete and sufficient statistic for $\theta$, whereas $T/S$ is an ancillary statistic for $\theta$. Thus, by Basu's (1955) theorem, $S$ and $T/S$ are independent. Hence, by Theorem 4.1, we have:

$$E\left[\frac{T^k}{S^k}\right] = \frac{E[T^k]}{E[S^k]} = \frac{\theta^k/(k+1)}{n\theta^k/(n+k)} = \frac{n+k}{n(k+1)}, \tag{10}$$

for any real number $k \ne -1, -n$. Now, note that (10) does not follow from Theorem 2.1 or (8).
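A Monte Carlo check of (10) is immediate, since $T/S$ lies in $(0, 1]$; a sketch assuming NumPy, with the arbitrary choices $n = 8$, $k = 2$, $\theta = 5$, for which $(n+k)/\{n(k+1)\} = 10/24 \approx 0.4167$:

```python
import numpy as np

# Monte Carlo check of (10): T = X_1 and S = X_{n:n} for Uniform(0, theta) data.
rng = np.random.default_rng(13)
n, theta, reps, k = 8, 5.0, 500_000, 2

X = rng.uniform(0.0, theta, size=(reps, n))
T, S = X[:, 0], X.max(axis=1)
print(((T / S) ** k).mean(), (n + k) / (n * (k + 1)))  # both approximately 0.4167
```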
References

Basu, D., 1955. On statistics independent of a complete sufficient statistic. Sankhya 15, 377–380.
Ferguson, T.S., 1962. A representation of the symmetric bivariate Cauchy distribution. Annals of Mathematical Statistics 33, 1256–1266.
Hoeffding, W., 1948. A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics 19, 293–325.
Johnson, N.L., Kotz, S., 1972. Distributions in Statistics: Continuous Multivariate Distributions. Wiley, New York.
Mukhopadhyay, N., 2000. Probability and Statistical Inference. Dekker, New York.
Mukhopadhyay, N., 2009. On p × 1 dependent random variables having each (p − 1) × 1 sub-vector made up of IID observations with examples. Statistics and Probability Letters 79, 1585–1589.
Mukhopadhyay, N., 2010. When finiteness matters: counterexamples to notions of covariance, correlation, and independence. American Statistician 64 (in press).
Rao, C.R., 1973. Linear Statistical Inference and its Applications, 2nd ed. Wiley, New York.
Tong, Y.L., 1990. The Multivariate Normal Distribution. Springer, New York.