Accepted Manuscript A quantitative discounted central limit theorem using the Fourier metric Guy Katriel
PII: DOI: Reference:
S0167-7152(18)30337-7 https://doi.org/10.1016/j.spl.2018.10.013 STAPRO 8352
To appear in:
Statistics and Probability Letters
Received date : 12 April 2018 Revised date : 12 July 2018 Accepted date : 15 October 2018 Please cite this article as: Katriel G., A quantitative discounted central limit theorem using the Fourier metric. Statistics and Probability Letters (2018), https://doi.org/10.1016/j.spl.2018.10.013 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
*Manuscript Click here to view linked References
A quantitative discounted central limit theorem using the Fourier metric Guy Katriel Department of Mathematics, ORT Braude College, Karmiel, Israel1
Abstract The discounted central limit theorem concerns the convergence of an infinite discounted sum of i.i.d. random variables to normality as the discount factor approaches 1. We show that, using the Fourier metric on probability distributions, one can obtain the discounted central limit theorem, as well as a quantitative version of it, in a simple and natural way, and under weak assumptions. Keywords: Discounted sums, Central limit theorem, Fourier metric 1. Introduction Let Xn (n ≥ 0) be a sequence of i.i.d. real-valued random variables, with µ = E(X0 ),
σ 2 = V ar(X0 ) < ∞.
(1)
For a ∈ [0, 1), we define the random variable Sa =
∞ X
an Xn .
(2)
n=0
1
email:
[email protected], Phone: 972-48663107, Fax: 972-49901802
Preprint submitted to Probability and Statistics Letters
July 12, 2018
Standard results ensure that (2) converges almost surely (see e.g. [Chung 2001], Sec. 5.3). Sa can be understood as the present value of a future stream of i.i.d. payments, where a is the discount factor. [Gerber 1971] proved, assuming that Xn have finite third moments, that as a → 1−, the distribution of Sa approaches a normal distribution: normalizing Sa by setting
we have
Sa − E(Sa ) Sˆa = p = V ar(Sa ) Sˆa
D →
√
µ 1 − a2 , · Sa − σ 1−a
N(0, 1)
as a → 1−,
(3)
(4)
that is, defining the corresponding cumulative distribution functions Fa (x) = P Sˆa ≤ x ,
1 Φ(x) = P (N(0, 1) ≤ x) = √ 2π
(5)
Z
x
u2
e− 2 du,
(6)
−∞
we have, for all x ∈ R, lima→1− Fa (x) = Φ(x). This is the discounted central limit theorem. Gerber also gave a quantitative bound of Berry-Esseen type for this convergence with respect to the Kolmogorov metric: 1 E(|X0 − µ|3) . · (1 − a) 2 , dK (Fa , Φ) = sup |Fa (x) − Φ(x)| ≤ C · 3 σ x∈R
(7)
where one can take C = 5.4 (we note that the formulation given in [Gerber 1971] is slightly different, but equivalent, because of the different normalization taken there). Further explicit bounds, with respect to a non-uniform Kolmogorov metric, again under the assumption that third (or higher) moments exist, are given in [Saulis and Deltuvien˙e 2006]. 2
Assuming only the existence of a second moment, [Embrechts and Maejima 1984] (Theorem 1) proved that sup (1 + |x|)2 · |Fa (x) − Φ(x)| → 0, as a → 1 − .
(8)
x∈R
Note that (8) immediately implies convergence with respect to the Wasserstein (or Kantorovich) metric: Z ∞ . dW (Fa , Φ) = |Fa (x) − Φ(x)|dx → 0, as a → 1−,
(9)
−∞
which is stronger than convergence with respect to the Kolmogorov metric (see, e.g., [Gibbs and Su 2002]). Quantitative (Berry-Esseen type) bounds, without an assumption of finiteness of the the third moment, have been obtained in [Sunklodas 2006]. Assuming the existence of a finite moment of order 2 < s < 3, Corollary 6 of [Sunklodas 2006] gives dW (Fa , Φ) ≤ C ·
1 E(|X0 − E(X0 )|s ) · (1 − a) 2 (s−2) . s σ
(10)
Here we will prove a discounted central limit theorem without any assumption on moments higher than 2, and also give a new and different quantitative bound for the convergence, in the case that some moment of order s = 2 + ǫ (ǫ > 0) exists. This bound will be given in terms of a Fourier-based metric, which will be seen to provide a simple and natural approach to the study of discounted sums. A key observation underlying our proofs is that the distibution Fa can be realized as a fixed point of a mapping on a space of distributions, which is a contraction with respect to this metric. Fourier-based metrics were introduced in connection with study of the Boltzmann equation [Gabetta et al. 1995], and have since found many 3
applications (see [Carrillo and Toscani 2007] for a review). In particular in [Goudon et al. 2002] these metrics have been used to obtain Berry-Esseen type inequalities for sums of i.i.d random variables in terms of Sobolev norms of densities. We recall the definition of the Fourier-based metrics. For any real s > 0, we denote by P s the set of all distribution functions G on R with finite moment of order s, and with expectation 0 and variance 1: Z ∞ |x|s dG(x) < ∞, Z
−∞
∞
Z
xdG(x) = 0,
−∞
∞
x2 dG(x) = 1.
(11) (12)
−∞
To each distribution function G we associate its characteristic function Z ∞ CG (ξ) = e−iξx dG(x). −∞
If s > 0, and G, H are probability distributions, their Fourier distance of type s is defined by ds (G, H) = sup ξ6=0
|CG (ξ) − CH (ξ)| . |ξ|s
(13)
If s ∈ [2, 3] and G, H ∈ P s , then ds (G, H) < ∞ (see [Carrillo and Toscani 2007], Proposition 2.6). Our first result concerns convergence without any assumption on moments higher than 2: Theorem 1. If (1) holds, then lim d2 (Fa , Φ) = 0.
a→1−
4
(14)
We note that since convergence in the d2 metric implies convergence in the Wasserstein metric (see [Carrillo and Toscani 2007], Theorem 2.21), the result of theorem 1 implies (9), which, as noted above, also follows from the result (8) of [Embrechts and Maejima 1984]. However, it does not appear that the convergence in the Fourier metric, given by of Theorem 1, can be directly derived from (8). A quantitative version of Theorem 1 can be obtained if we assume that Xn have a finite s-moment for some s > 2. Set ˆ 0 = σ −1 (X0 − µ), X and let F denote its distribution function:
ˆ F (x) = P X0 ≤ x .
(15)
Note that we have F, Fa ∈ P 2 . Theorem 2. Assume F ∈ P s where s ∈ (2, 3]. Then, for a ∈ (0, 1)
(s − 2)(1 − a2 ) d2 (Fa , Φ) ≤ e · a2
12 (s−2)
· ds (F, Φ).
(16)
Note that s > 2 implies that the right-hand side of (16) goes to 0 as a → 1, so that (16) implies (14) for s > 2 (but not for s = 2, which is the reason that Theorem 1 needs a separate proof). Comparing the bound of Theorem 2 with the bound (10) given by [Sunklodas 2006], we note: (1) The bound (10), (like Gerber’s bound (7) in the case s = 3) is a universal one, hence does not take into account the distance between the distribution of X0 and the normal distribution. In (16), the bound becomes 5
small if X0 is close to normal. In this sense the bounds are of a quite different nature. (2) While the distance between Fa and Φ is measured by different metrics in the two bounds, one may derive a bound on the Wasserstein metric 1 1 π 3 from (16) by using the inequality dW (G, H) ≤ 18 d2 (G, H) 6 , given by
[Carrillo and Toscani 2007], Theorem 2.21. However if we do this we see
that the bound obtained on the distance in the Wasserstein metric is of or1
1
der O((1 − a) 12 (s−2) ), a weaker rate than the O((1 − a) 2 (s−2) ) given by (10). We thus conclude that neither of the inequalities (10) and (16) is a consequence of the other, and each has its advantages. It might be an interesting problem to obtain bounds which combine the advantages of the two inequalities. We remark that the study of discounted sums has been extended in several directions. The i.i.d. assumption on the random variables Xk has been weakened in different ways: [Miao et al. 2013] study the case in which Xk are given by an autoregressive process, while [Sunklodas 2006] studies the case of non-independent random variables satisfying a weak-mixing condition, and [Aghbilagh et al. 2014] consider independent random variables whose distribution varies in a periodic manner. Another line of research has considered finite stochastic sums, in which the upper limit of the sum is (1 − a)−1 t, for some parameter t, and it is proved that the sum weakly converges to a time-transformed Wiener process with respect to t (see e.g. [Horv´ath 1985, Whitt 1972]).
6
2. Proofs of the theorems Noting that Sˆa does not change if a linear function is applied to all Xn ’s, there is no loss of generality in proving our results under the normalization E(Xn ) = 0, V ar(Xn ) = 1,
(17)
which we will henceforth assume. Under this assumption the distribution F given by (15) is simply the distribution of the Xn ’s: F (x) = P (X0 ≤ x) , and (3) becomes Sˆa =
√
1 − a2 · Sa .
(18)
(19)
For a ∈ [0, 1), we now define the operator Ta : P 2 → P 2 , whose unique
fixed point will later be shown to be the distribution function Fa of Sˆa . If G ∈ P 2 , let Y be a random variable with distribution function G. Let X be a random variable independent of Y , with distribution function F given by √ (18). Then Ta [G] is defined to be the distribution function of aY + 1 − a2 ·X: √ Ta [G](x) = P aY + 1 − a2 · X ≤ x . Since
√ √ E aY + 1 − a2 · X = aE(Y ) + 1 − a2 · E(X) = 0, √ 2 V ar aY + 1 − a · X = a2 V ar(Y ) + (1 − a2 )V ar(X) = a2 + (1 − a2 ) = 1, we indeed have Ta [G] ∈ P 2 .
By the above definition, the n-fold composition Tan [G] (n ≥ 1) is the
distribution function of the random variable Yn given by the autoregressive process Yn+1 = aYn +
√
1 − a2 · Xn , n ≥ 0, 7
(20)
where Y0 is distributed according to G, and the Xn ’s are independent and distributed according to F . In terms of characteristic functions we have CTa [G] (ξ) = CF
√
1 − a2 · ξ · CG (aξ).
(21)
The following Lemma summarizes properties of the operator Ta . Lemma 1. For a ∈ [0, 1): (i) If G, H ∈ P 2 then For any integer n ≥ 1 d2 (Tan [G], Tan [H]) ≤ a2n · d2 (G, H).
(22)
(ii) Fa , given by (5), is a fixed point of Ta , and if a > 0 it is the unique fixed point. (iii) For any G ∈ P 2 we have d2 (G, Fa ) ≤
d2 (G, Ta [G]) . 1 − a2
(23)
Proof : (i) For n = 1 we have, using (21) and the fact that |CF (ξ)| ≤ 1, √ |CF ( 1 − a2 · ξ)| · |CG (aξ) − CH (aξ)| d2 (Ta [G], Ta [H]) = sup ξ2 ξ6=0 ≤ sup ξ6=0
|CG (aξ) − CH (aξ)| |CG (aξ) − CH (aξ)| = a2 · sup = a2 · d2 (G, H). 2 2 ξ (aξ) ξ6=0
Proceeding by induction, we have d2 Tan+1 [G], Tan+1 [H] ≤ a2 · d2 (Tan [G], Tan [H]) = a2(n+1) · d2 (G, H). 8
(ii) We denote equality in distribution of two random variables by D = . Let X be a random variable with distribution F , independent of Sa . We claim that aSˆa +
√
1 − a2 · X
D =
Sˆa ,
(24)
which implies Ta [Fa ] = Fa . To show (24), note that we have ∞ X
Sa =
an Xn = X0 + a
∞ X
an Xn+1
D =
X + aSa ,
n=0
n=0
so that, using (19), aSˆa +
√
1 − a2 · X =
√
1 − a2 · [aSa + X]
D =
√
1 − a2 · Sa = Sˆa ,
so we have (24). To show uniqueness of the fixed point when a > 0, assume G ∈ P 2 , Ta [G] = G. Using (22), d2 (Fa , G) = d2 (T [Fa ], T [G]) ≤ a2 ·d2 (Fa , G) ⇒ d2 (Fa , G) = 0 ⇒ G = Fa . (iii) By (22) we have d2 Tak [G], Tak+1[G] ≤ a2k d2 (G, Ta [G]) ,
so the triangle inequality gives d2 (G, Tan [G])
≤
n−1 X
1 − a2n · d2 (G, Ta [G]) . d2 Tak [G], Tak+1 [G] ≤ 2 1 − a k=0
(25)
From (i),(ii) we have d2 (Tan [G], Fa ) = d2 (Tan [G], Tan [Fa ]) ≤ a2n · d2 (G, Fa ). 9
(26)
Therefore, using the triangle inequality and (25),(26), d2 (G, Fa ) ≤ d2 (G, Tan [G]) + d2 (Tan [G], Fa ) 1 − a2n ≤ · d2 (G, Ta [G]) + a2n · d2 (G, Fa ) , 1 − a2 and taking n → ∞ we obtain (23). n The following Lemma plays a key role in proving the theorems: Lemma 2. Let Φ be the Normal distribution function (6). Then for a ∈ [0, 1)
−
d2 (Fa , Φ) ≤ sup e w6=0
(aw)2 2(1−a2 )
|CF (w) − CΦ (w)| · . w2
Proof : Applying (23) with G = Φ we have d2 (Φ, Fa ) ≤
d2 (Φ, Ta [Φ]) . 1 − a2
(27)
The characteristic function of the normal distribution Φ is given by CΦ (ξ) = √ 1 2 e− 2 ξ , so using (21) and the substitution w = 1 − a2 · ξ, we have |CTa [Φ] (ξ) − CΦ (ξ)| d2 (Ta [Φ], Φ) = sup (28) 1 − a2 (1 − a2 )ξ 2 ξ6=0 √ 1 2 1 2 |CF ( 1 − a2 · ξ)e− 2 (aξ) − e− 2 ξ | = sup (1 − a2 )ξ 2 ξ6=0 " # √ 1 2 · ξ) − e− 2 (1−a2 )ξ 2 | (aξ)2 |C ( 1 − a F = sup e− 2 · (1 − a2 )ξ 2 ξ6=0 # " 2 2 − 21 w 2 | |C (w) − e |CF (w) − CΦ (w)| − (aw) 2 − (aw) 2 F 2(1−a ) 2(1−a ) = sup e . = sup e · · w2 w2 w6=0 w6=0 Combining (27) and (28) we have the result. n We can now give the proofs of the theorems.
10
Proof of Theorem 1: By Lemma 2 it suffices to show that (aw)2 |C (w) − C (w)| − F Φ lim sup e 2(1−a2 ) · = 0. a→1 w6=0 w2 Fix ǫ > 0. By the assumption (17), CF and CΦ are twice differentiable, with CF (0) = CΦ (0) = 1, CF′ (0) = CΦ′ (0) = 0, CF′′ (0) = CΦ′′ (0) = −1 so application of L’Hospital’s rule gives CF (w) − CΦ (w) = 0. w→0 w2 lim
Therefore we can choose δ > 0 so that −
|w| < δ ⇒ e
(aw)2 2(1−a2 )
·
|CF (w) − CΦ (w)| |CF (w) − CΦ (w)| ≤ < ǫ. 2 w w2
(29)
Using the fact that |CF (w)|, |CΦ(w)| ≤ 1, we have −
|w| ≥ δ ⇒ e
(aw)2 2(1−a2 )
·
(aw)2 (aδ)2 2 − 2(1−a |CF (w) − CΦ (w)| 2 − 2(1−a 2) 2) ≤ e e ≤ . w2 w2 δ2
The right-hand side of the above inequality goes to 0 as a → 1, hence for a sufficiently close to 1 we have −
|w| ≥ δ ⇒ e
(aw)2 2(1−a2 )
·
|CF (w) − CΦ (w)| < ǫ. w2
From (29),(30) we have that, for a sufficiently close to 1, (aw)2 |CF (w) − CΦ (w)| − 2) 2(1−a · sup e < ǫ, w2 w6=0 concluding the proof. n
11
(30)
Proof of Theorem 2: Assume s > 2. Using Lemma 2 we have 2 |CF (w) − CΦ (w)| − (aw) 2 2(1−a ) d2 (Fa , Φ) ≤ sup e · w2 w6=0 (aw)2 |C (w) − C (w)| − F Φ s−2 = sup e 2(1−a2 ) · |w| · |w|s w6=0 (aw)2 − ≤ ds (F, Φ) · sup e 2(1−a2 ) |w|s−2
(31)
w6=0
Computing the supremum on the right-hand side of (31) by elementary cal√ (s−2)(1−a2 ) culus we find that it is attained at w = ± , hence a
−
sup e w6=0
(aw)2 2(1−a2 )
|w|
s−2
(s − 2)(1 − a2 ) = e · a2
12 (s−2)
,
which gives (16). n References [Aghbilagh et al. 2014] Aghbilagh, B. A., Shishebor, Z. and Saber, M. M. 2014. A Generalization of Discounted Central Limit Theorem and its Berry-Esseen Analogue, JSTA 13, 289-295. [Carrillo and Toscani 2007] Carrillo, J.A., Toscani, G., 2007. Contractive probability metrics and asymptotic behavior of dissipative kinetic equations, Riv. Mat. Univ. Parma 6, 75-198. [Chung 2001] Chung, K.L., 2001. A Course in Probability Theory, 3rd edition, Academic Press, London. [Embrechts and Maejima 1984] Embrechts, P. and Maejima, M. 1984. The central limit theorem for summability methods of iid random variables, Z. Wahrsch. Verw. Gebiete 68, 191-204. 12
[Feller 1968] Feller, W., 1968. An Introduction to Probability Theory and its Applications, Vol. 2, Wiley, New-York. [Gabetta et al. 1995] Gabetta, E., Toscani, G., Wennberg, W., 1995. Metrics for probability distributions and the trend to equilibrium for solutions of the Boltzmann equation, J. Statist. Phys. 81, 901-934. [Gerber 1971] Gerber, H.U., 1971. The discounted central limit theorem and its Berry-Esseen analogue, The Annals of Mathematical Statistics 42, 389-392. [Gibbs and Su 2002] Gibbs, A. L. and Su, F. E. 2002. On choosing and bounding probability metrics, Int. Stat. Rev. 70, 419-435. [Goudon et al. 2002] Goudon, T., Junca, S., Toscani, G., 2002. Fourierbased distances and Berry-Esseen like inequalities for smooth densities, Monatsh. Math 135, 115-136. [Horv´ath 1985] Horv´ath, L., Approximation for Abel sums of independent, identically distributed random variables, Statistics & Probability Letters 3, 221-225. [Miao et al. 2013] Miao, Y., Xue, T. and Du, T., 2013. The Discounted Berry-Essen Analogue for Autoregressive Processes, Communications in Statistics-Theory and Methods 45, 2684-2693. [Saulis and Deltuvien˙e 2006] Saulis, L., Deltuvien˙e, D., 2006. The discounted limit theorems, Acta Applicandae Mathematicae 90, 219-226.
13
[Sunklodas 2006] Sunklodas, J., 2006. On the discounted global CLT for some weakly dependent random variables, Lithuanian Math. J., 46, 475486. [Whitt 1972] Whitt, W., 1972. Stochastic Abelian and tauberian theorems, Z. Wahrsch. Verw. Gebiete 22, 251-267.
14