New deviation inequalities for martingales with bounded increments

To appear in: Stochastic Processes and their Applications.

Received 3 November 2015; revised 2 September 2016; accepted 12 September 2016. DOI: 10.1016/j.spa.2016.09.003.

Please cite this article as: E. Rio, New deviation inequalities for martingales with bounded increments, Stochastic Processes and their Applications (2016).


Emmanuel Rio

Université de Versailles, Laboratoire de mathématiques de Versailles, UMR 8100 CNRS, Bâtiment Fermat, 45 Avenue des Etats-Unis, 78035 Versailles, France. E-mail: [email protected]

Key words and phrases: Hoeffding's inequality, Bennett's inequality, martingales with bounded increments, Freedman's inequality, deviation inequalities, bounded differences inequality.

Mathematics Subject Classification (2010): 60E15.

Abstract In this paper we give new deviation inequalities for martingales with increments bounded from above. Our results are based on an improvement of the results of Bennett (1968) for random variables bounded from above.

1 Introduction

In this paper, we are interested in deviations on the right of sums of square-integrable martingale differences which are bounded from above. We will assume throughout the paper that the random variables are bounded on the right. Our aim is to continue the research started in Bentkus (2003, 2004) and Fan, Grama and Liu (2012) on deviation inequalities for martingales with binomial rate functions. Throughout the introduction, in order to explain the results to readers who are not specialized in martingale inequalities, we will focus our attention on independent random variables. So, let X_1, X_2, ..., X_n be a finite sequence of independent and square-integrable random variables satisfying the conditions below:

IE(X_k) = 0, Var X_k ≤ v_k and X_k ≤ c_k almost surely, (1.1)

for positive reals v_1, v_2, ..., v_n and c_1, c_2, ..., c_n. The sum S_n is defined by

S_n = X_1 + X_2 + · · · + X_n. (1.2)

We now recall some known results on random variables bounded on the right. Bennett (1962, page 42) proved that, for a centered random variable X with variance v, bounded on the right by some positive constant c, the value of IE(exp(tX)) is maximized, for any positive t, by the discrete distribution µ given by

µ({c}) = v/(c² + v) and µ({−v/c}) = c²/(c² + v). (1.3)
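As a numerical illustration (not from the paper; the competing three-point law below is an arbitrary choice), one can check that µ is centered with variance v and that its Laplace transform dominates that of another centered law bounded on the right by c with variance at most v:

```python
import math

c, v = 1.0, 0.25

p_top = v / (c * c + v)          # mu({c})    = 0.2
p_bot = c * c / (c * c + v)      # mu({-v/c}) = 0.8

mean_mu = p_top * c + p_bot * (-v / c)
var_mu = p_top * c * c + p_bot * (v / c) ** 2

def laplace_mu(t):
    # IE(exp(tX)) for X distributed as mu of (1.3)
    return p_top * math.exp(t * c) + p_bot * math.exp(-t * v / c)

# a competing centered law on {c, 0, -1/2} with the same variance 1/4
def laplace_alt(t):
    return math.exp(t * c) / 6 + 1.0 / 2 + math.exp(-t / 2) / 3
```

On a grid of positive t, `laplace_mu(t) >= laplace_alt(t)`, in accordance with Bennett's extremality result.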


Now, let us define the log-Laplace transform ℓ_w by

ℓ_w(t) := log(w e^t + e^{−wt}) − log(1 + w). (1.4)
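As a sanity check (an illustration, not part of the paper; the values of c and v are arbitrary), with w = v/c² the function ℓ_w of (1.4) evaluated at ct is exactly the log-Laplace transform log IE(exp(tX)) of the extremal law µ of (1.3):

```python
import math

def l(w, t):
    # the log-Laplace transform of (1.4): l_w(t) = log(w e^t + e^{-wt}) - log(1 + w)
    return math.log(w * math.exp(t) + math.exp(-w * t)) - math.log(1.0 + w)

c, v = 2.0, 0.5
w = v / c ** 2                   # w = 1/8 in this example

def log_laplace_mu(t):
    # log IE(exp(tX)) for X distributed as mu of (1.3)
    p_top = v / (c * c + v)      # mu({c})
    p_bot = c * c / (c * c + v)  # mu({-v/c})
    return math.log(p_top * math.exp(t * c) + p_bot * math.exp(-t * v / c))

gap = max(abs(l(w, c * t) - log_laplace_mu(t)) for t in (0.0, 0.3, 1.0, 2.5))
```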

Bennett’s result ensures that, for any k in [1, n],

log IE(exp(tX_k)) ≤ log((w_k e^{ct} + e^{−w_k ct})/(1 + w_k)) = ℓ_{w_k}(ct), with c = sup_{k∈[1,n]} c_k and w_k = v_k/c². (1.5)

Next, Hoeffding (1963, Lemma 3, page 23) proved that, for any t > 0, the function v ↦ ℓ_v(t) is concave with respect to v. Combining this result with Bennett's result, Hoeffding proved that

log IE(exp(tS_n)) ≤ n ℓ_{(V_n/nc²)}(ct), where V_n = v_1 + v_2 + · · · + v_n. (1.6)

Now, let ℓ*_w denote the Legendre transform of ℓ_w. Then ℓ*_w(u) = +∞ for u > 1 and

ℓ*_w(u) = sup_{t≥0} (ut − ℓ_w(t)) = ((w + u)/(w + 1)) log(1 + u/w) + ((1 − u)/(w + 1)) log(1 − u) (1.7)

for u in [0, 1]. The usual Chernoff calculation and the above inequality ensure that, for any positive x,

IP(S_n ≥ ncx) ≤ exp(−n ℓ*_{(V_n/nc²)}(x)). (1.8)
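The closed form (1.7) can be cross-checked against a brute-force maximisation of ut − ℓ_w(t) (a numerical illustration only; the grid bounds are ad hoc):

```python
import math

def l(w, t):
    return math.log(w * math.exp(t) + math.exp(-w * t)) - math.log(1.0 + w)

def l_star(w, u):
    # closed form (1.7), for 0 <= u < 1
    return ((w + u) / (w + 1)) * math.log(1 + u / w) \
        + ((1 - u) / (w + 1)) * math.log(1 - u)

def l_star_numeric(w, u, t_max=20.0, steps=20000):
    # crude approximation of sup_{t >= 0} (u t - l_w(t)) on a grid
    return max(u * (t_max * k / steps) - l(w, t_max * k / steps)
               for k in range(steps + 1))

gap = max(abs(l_star(w, u) - l_star_numeric(w, u))
          for w in (0.3, 1.0, 2.0) for u in (0.1, 0.5, 0.8))
```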

This result is exactly Theorem 3 in Hoeffding (1963). We refer to Bentkus (2004) and Pinelis (2014) for additional results concerning Hoeffding-type inequalities in the independent case. Notice that

n ℓ*_{(V_n/nc²)}(x) ∼ (2V_n)^{−1}(ncx)² as x → 0. (1.9)

Hence (1.8) provides the adequate moderate deviations rate function for x small enough. However, as noted by Bennett (1968), from the assumption (1.1),

IP(S_n > c_1 + c_2 + · · · + c_n) = 0, (1.10)

which cannot be deduced from (1.8). Therefore, starting from (1.3), Bennett proved that

log IE(exp(tS_n)) ≤ (|c|_1/c) ℓ_w(ct), where |c|_1 = Σ_{k=1}^n c_k and w = sup_{k∈[1,n]} (v_k/c_k²) (1.11)

(see Bennett (1968), Eq. (7), page 567), from which he derived the large deviations inequality

IP(S_n ≥ |c|_1 x) ≤ exp(−(|c|_1/c) ℓ*_w(x)) (1.12)

(see Bennett (1968), Eq. (9), page 567). Clearly this inequality yields (1.10). However, the upper bound (1.11) is suboptimal in the moderate deviations bandwidth, even in the case v_1/c_1² = · · · = v_n/c_n² = w. Indeed, in that case, one can prove that w = (V_n/|c|_2²), with |c|_2² = Σ_{k=1}^n c_k², which shows that

(|c|_1/c) ℓ*_w(x) ∼ (|c|_2²/(c|c|_1)) (|c|_1 x)²/(2V_n) as x → 0. (1.13)


Since |c|_2² < c|c|_1, except in the case c_1 = c_2 = · · · = c_n, (1.13) shows that (1.12) is less efficient than (1.8) for small values of x. In this paper we will obtain an improved version of (1.12) which implies (1.8). To this end, we will prove in Section 2 that ℓ_v(at) ≤ a ℓ_{av}(t) for any t > 0 and any a in [0, 1], and derive from this inequality an improved version of Inequality (1.12). Next, the results of Section 2 are extended to martingale differences in Section 3: more precisely, we will give improved versions of the Hoeffding-type inequalities of Fan, Grama and Liu (2012) in the specific case of martingale differences.

2 An improvement of Bennett's 1968 inequality

In this section, we improve Inequality (1.12) of Bennett (1968) and we give some applications of this improved version. Let us now state the main result of this section.

Theorem 2.1. Let S_n be defined by (1.2). Assume that (1.1) holds and set |c|_1 = Σ_{k=1}^n c_k. Then, for any positive t and any c ≥ max(c_1, c_2, ..., c_n),

log IE(exp(tS_n)) ≤ (|c|_1/c) ℓ_{(V_n/c|c|_1)}(ct), where V_n = v_1 + v_2 + · · · + v_n. (2.1)

Consequently, for any positive x,

IP(S_n ≥ |c|_1 x) ≤ exp(−(|c|_1/c) ℓ*_{(V_n/c|c|_1)}(x)). (2.2)

Remark 2.1. From the concavity of ℓ_v with respect to v and the fact that ℓ_0 = 0,

|c|_1 ℓ_{(V_n/c|c|_1)}(ct) ≤ nc ℓ_{(V_n/nc²)}(ct).

Hence Inequality (2.1) is more efficient than (1.6), and therefore (2.2) improves on (1.8). Let w be defined as in (1.11). Then w ≥ sup_k (v_k/c_k c) ≥ (V_n/c|c|_1). Since ℓ_w(x) is nondecreasing with respect to w, this ensures that (2.2) is more efficient than (1.12).

Remark 2.2. Hoeffding (1963) proved that, if w ≥ 1, then ℓ*_w(x) ≥ x²/(2w) for any positive x. Consequently, if V_n ≥ c|c|_1, then (2.2) implies that, for any positive y,

−2V_n log IP(S_n ≥ y) ≥ y². (2.3)
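Hoeffding's bound quoted in Remark 2.2 can be spot-checked on a grid (an illustration; the grid values are arbitrary):

```python
import math

def l_star(w, u):
    # closed form (1.7) of the Legendre transform, for 0 <= u < 1
    return ((w + u) / (w + 1)) * math.log(1 + u / w) \
        + ((1 - u) / (w + 1)) * math.log(1 - u)

# check l*_w(x) >= x^2/(2w) for w >= 1 on a grid of (w, x)
violations = [(w, x)
              for w in (1.0, 1.5, 3.0, 10.0)
              for x in [0.05 * k for k in range(1, 20)]
              if l_star(w, x) < x * x / (2 * w)]
```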

Under the assumptions of Theorem 2.1, Theorem 2.2 in Bentkus (2003) yields

−2V_n log IP(S_n ≥ y) ≥ V_n y² / Σ_{k=1}^n max(c_k², v_k) ≥ y² if v_k ≥ c_k² for any k ∈ [1, n]. (2.4)

It appears here that (2.3) cannot be deduced from (and does not imply) (2.4).

We now give an application of Theorem 2.1 to random variables bounded from below.

Corollary 2.1. Let X_1, X_2, ..., X_n be a finite sequence of centered independent random variables. Suppose that, for any k in [1, n],

−1 ≤ X_k ≤ c_k almost surely, (2.5)

for positive constants c_1, c_2, ..., c_n. Let |c|_1 = Σ_{k=1}^n c_k. Then, for any positive x and any c ≥ max(c_1, c_2, ..., c_n),

IP(S_n ≥ |c|_1 x) ≤ exp(−(|c|_1/c) ℓ*_{(1/c)}(x)). (2.6)


Remark 2.3. If c_k ≤ 1 for any k in [1, n], then (2.6) holds true with c = 1. Since ℓ*_1(x) > x²/2, (2.6) improves on the inequality

IP(S_n ≥ |c|_1 x) ≤ exp(−|c|_1 x²/2), (2.7)

which is a byproduct of (2.4), since max(v_k, c_k²) ≤ c_k if c_k ≤ 1 and X_k ≥ −1.

We now prove Theorem 2.1 and Corollary 2.1. The proof of Theorem 2.1 is based on the elementary lemma below.

Lemma 2.1. Let the functions ℓ_w be defined by (1.4). Then ℓ'_v(as) ≤ ℓ'_{av}(s) for any s > 0 and any a in [0, 1]. Consequently ℓ_v(as) ≤ a ℓ_{av}(s) for any s > 0 and any a in [0, 1].

Proof of Lemma 2.1. From the definition of the function ℓ_w,

ℓ'_v(as) = v(e^{as} − e^{−vas})/(v e^{as} + e^{−vas}) and ℓ'_{av}(s) = av(e^s − e^{−avs})/(av e^s + e^{−avs}). (2.8)

Consequently ℓ'_v(as) ≤ ℓ'_{av}(s) if and only if

(e^{as} − e^{−vas})(av e^s + e^{−avs}) ≤ a (e^s − e^{−avs})(v e^{as} + e^{−vas}),

which is equivalent to

(1 + av) e^{(1−v)as} ≤ (1 − a) e^{−2avs} + (a + av) e^{(1−av)s}. (2.9)

Now (2.9) is an immediate byproduct of the convexity of the exponential function, since the weights (1 − a) and (a + av) sum to 1 + av and

(1 + av)(1 − v)a = (1 − a)(−2av) + (a + av)(1 − av).

Hence ℓ'_v(as) ≤ ℓ'_{av}(s), which gives the first part of Lemma 2.1. The second part of Lemma 2.1 follows by integrating this bound with respect to s.

End of the proof of Theorem 2.1. From (1.3), for any k in [1, n],

log IE(exp(tX_k)) ≤ ℓ_{(v_k/c_k²)}(c_k t).

Now, applying Lemma 2.1 with a = c_k/c and s = ct, we get that

ℓ_{(v_k/c_k²)}(c_k t) = ℓ_{(v_k/c_k²)}(as) ≤ (c_k/c) ℓ_{(v_k/c_k c)}(ct).

From the two above inequalities and the independence of the random variables X_k, we then get that

log IE(exp(tS_n)) ≤ Σ_{k=1}^n (c_k/c) ℓ_{(v_k/c_k c)}(ct). (2.10)

Next, from the concavity of ℓ_w with respect to w,

Σ_{k=1}^n (c_k/c) ℓ_{(v_k/c_k c)}(ct) ≤ (|c|_1/c) ℓ_{(V_n/|c|_1 c)}(ct).

The two above inequalities then imply (2.1). Finally (2.2) follows from (2.1) via the Chernoff calculation, exactly in the same way as (1.12) follows from (1.11).
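Lemma 2.1, on which the proof above rests, can be spot-checked numerically (an illustration only; the grid is arbitrary):

```python
import math

def l(w, t):
    # the log-Laplace transform of (1.4)
    return math.log(w * math.exp(t) + math.exp(-w * t)) - math.log(1.0 + w)

# check l_v(a s) <= a * l_{a v}(s) on a grid of (v, a, s)
lemma_holds = all(l(v, a * s) <= a * l(a * v, s) + 1e-12
                  for v in (0.2, 1.0, 3.0)
                  for a in (0.1, 0.5, 0.9, 1.0)
                  for s in (0.1, 1.0, 4.0))
```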


Proof of Corollary 2.1. From condition (2.5), (X_k + 1)(X_k − c_k) ≤ 0 almost surely. Taking the expectation in this inequality, we easily get that Var X_k ≤ c_k. Hence V_n ≤ |c|_1. Now ℓ*_w is nonincreasing with respect to w. It follows that ℓ*_{(V_n/c|c|_1)}(x) ≥ ℓ*_{(1/c)}(x). Now Corollary 2.1 follows from Theorem 2.1 via this inequality.

Remark 2.4. Starting from Lemma 2.1, one can easily prove the property below for the dual functions ℓ*_w: for any positive x, the function w ↦ w ℓ*_w(x) is nondecreasing on ]0, +∞[. Define h by h(u) = (1 + u) log(1 + u) − u for u ≥ −1 and h(u) = +∞ for u < −1, and let ψ_w(x) = w ℓ*_w(x). Then

ψ_w(x) = (1 + w)^{−1} (w² h(x/w) + w h(−x)).

Now, for any x ≥ 0, w² h(x/w) ≤ h(−x), whence w² h(x/w) ≤ ψ_w(x) ≤ h(−x). Furthermore lim_{w→∞} ψ_w(x) = h(−x).

To conclude this section, we give another application of Lemma 2.1, which improves and extends (2.4).

Theorem 2.2. Let w be any real in [1, ∞[. With the same notations as in Theorem 2.1, for any positive t,

log IE(exp(tS_n)) ≤ (|c|_1²/V_n(w)) w ℓ_w(tV_n(w)/(w|c|_1)), where V_n(w) = Σ_{k=1}^n max(v_k, c_k² w). (2.11)

Consequently, for any positive x,

−|c|_1^{−2} V_n(w) log IP(S_n ≥ |c|_1 x) ≥ w ℓ*_w(x) ≥ ℓ*_1(x) > (x²/2). (2.12)

Proof. Set w_k = (v_k/c_k²). From (1.3) and the monotonicity of ℓ_w with respect to w,

log IE(exp(tS_n)) ≤ Σ_{k=1}^n ℓ_{(w_k ∨ w)}(c_k t).

Now, if w_k > w, by Lemma 2.1 applied with a = w/w_k and s = c_k w_k t/w, ℓ_{w_k}(c_k t) ≤ (w/w_k) ℓ_w(c_k w_k t/w), whence

log IE(exp(tS_n)) ≤ L(t) := Σ_{k=1}^n (w/(w ∨ w_k)) ℓ_w((w ∨ w_k) c_k t / w). (2.13)

Next, differentiating L and using the concavity of ℓ'_w for w ≥ 1 (see Bercu, Delyon and Rio (2015), page 47), we obtain that L'(t) ≤ |c|_1 ℓ'_w(V_n(w)t/(|c|_1 w)). (2.11) follows by integrating this upper bound on L' with respect to t and applying (2.13). (2.12) follows from (2.11) via the usual Chernoff calculation.

Example 2.1. Let X_1, X_2, ... be an infinite sequence of independent random variables fulfilling (1.1). Set V_n = v_1 + · · · + v_n and A_n = c_1 + · · · + c_n. Assume that lim_n V_n = ∞ and lim inf_n c_n^{−2} v_n = z ∈ [1, ∞]. Then, by the Toeplitz lemma, for any finite w in [1, z], lim_n (V_n(w)/V_n) = 1. If z is finite, (2.12) applied with w = z yields

lim inf_{n→∞} −A_n^{−2} V_n log IP(S_n ≥ A_n x) ≥ w ℓ*_w(x), where w = z. (2.14)

If z = ∞, the above result holds true for any w in [1, ∞[. Taking the limit as w → ∞, we then get that the liminf is larger than h(−x), according to the end of Remark 2.4. Note that, under the same assumptions, (2.4) yields the less efficient asymptotic inequality

lim inf_{n→∞} −A_n^{−2} V_n log IP(S_n ≥ A_n x) ≥ x²/2. (2.15)
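The identity and the monotonicity stated in Remark 2.4, on which the limiting argument above relies, can be checked numerically (an illustration; the grid and the value of x are arbitrary):

```python
import math

def l_star(w, u):
    # closed form (1.7) of the Legendre transform, for 0 <= u < 1
    return ((w + u) / (w + 1)) * math.log(1 + u / w) \
        + ((1 - u) / (w + 1)) * math.log(1 - u)

def h(u):
    return (1 + u) * math.log(1 + u) - u

def psi(w, x):
    return w * l_star(w, x)

x = 0.4
ws = [0.25 * k for k in range(1, 41)]   # w from 0.25 to 10
identity_gap = max(abs(psi(w, x) - (w * w * h(x / w) + w * h(-x)) / (1 + w))
                   for w in ws)
monotone = all(psi(ws[i], x) <= psi(ws[i + 1], x) + 1e-12
               for i in range(len(ws) - 1))
```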

3 Extensions to martingale differences

Let (F_k)_{k∈IN} denote an increasing filtration. Let (X_k)_{k>0} be a sequence of real-valued and integrable random variables, adapted to the above filtration and such that

IE(X_k | F_{k−1}) = 0 for any positive integer k. (3.1)

The martingale (M_n)_{n≥0} is defined by

M_0 = 0 and M_n = X_1 + X_2 + · · · + X_n for n > 0. (3.2)

If the random variables X_k are square-integrable, the quadratic compensator (⟨M⟩_n)_{n≥0} associated to the martingale (M_n)_{n≥0} is defined by

⟨M⟩_0 = 0 and ⟨M⟩_n = Σ_{k=1}^n IE(X_k² | F_{k−1}) for any n > 0. (3.3)

Throughout this section, we assume that there exists a sequence (C_k)_{k>0} of nonnegative and integrable random variables such that

C_k is F_{k−1}-measurable and X_k ≤ C_k a.s. for any k > 0. (3.4)

Our first result is an extension of Corollary 2.1 to martingale differences.

Theorem 3.1. Let X_1, X_2, ... be a sequence of random variables satisfying (3.1) and (3.4). Define the nonnegative random variables A_n and C*_n by

A_n = C_1 + C_2 + · · · + C_n and C*_n = max(C_1, C_2, ..., C_n). (3.5)

Suppose that

X_k ≥ −1 almost surely for any positive integer k. (3.6)

Let (M_n) be defined by (3.2). Then, for any positive reals x, y and z,

IP(M_n ≥ x, A_n ≤ y, C*_n ≤ z) ≤ exp(−(y/z) ℓ*_{(1/z)}(x/y)). (3.7)

Remark 3.1. Suppose that the random variables C_k are almost surely bounded by some positive deterministic constant c. Then, applying Theorem 3.1 with z = c, one obtains that

IP(M_n ≥ x, A_n ≤ y) ≤ exp(−(y/c) ℓ*_{(1/c)}(x/y)), (3.8)

which extends Corollary 2.1 to martingale differences.

Proof of Theorem 3.1. Let t be any positive real. Define the sequence (R_n(t))_{n≥0} by R_0(t) = 1 and

R_n(t) = R_{n−1}(t) exp(tX_n − ℓ_{(1/C_n)}(C_n t)) for any integer n > 0. (3.9)

From the convexity of the exponential function, since X_n takes its values in [−1, C_n],

exp(tX_n) ≤ (1 + C_n)^{−1} ((X_n + 1) e^{C_n t} + (C_n − X_n) e^{−t}).


Therefrom, taking the conditional expectation with respect to F_{n−1} in the above convexity bound,

IE(e^{tX_n} | F_{n−1}) ≤ (1 + C_n)^{−1} (e^{C_n t} + C_n e^{−t}) = exp(ℓ_{(1/C_n)}(C_n t)).

It follows that

IE(R_n(t) | F_{n−1}) ≤ R_{n−1}(t) almost surely, for any n > 0. (3.10)

Consequently the sequence (R_n(t))_n is a supermartingale adapted to the filtration (F_n)_n. In particular

IE(R_n(t)) ≤ 1 for any n ≥ 0. (3.11)

Let us define the event Γ_n(y, z) by Γ_n(y, z) = (A_n ≤ y, C*_n ≤ z). Then, for any positive x,

IP(M_n ≥ x, A_n ≤ y, C*_n ≤ z) ≤ exp(−tx) IE(e^{tM_n} II_{Γ_n(y,z)}). (3.12)

Now

IE(e^{tM_n} II_{Γ_n(y,z)}) = IE(R_n(t) exp(Σ_{k=1}^n ℓ_{(1/C_k)}(C_k t)) II_{Γ_n(y,z)}).

Next, on the event Γ_n(y, z), by Lemma 2.1 applied with v = 1/C_k, s = zt and a = (C_k/z), ℓ_{(1/C_k)}(C_k t) ≤ (C_k/z) ℓ_{(1/z)}(zt), from which

Σ_{k=1}^n ℓ_{(1/C_k)}(C_k t) ≤ (y/z) ℓ_{(1/z)}(zt) on the event Γ_n(y, z). (3.13)

Hence

IE(e^{tM_n} II_{Γ_n(y,z)}) ≤ IE(R_n(t)) exp((y/z) ℓ_{(1/z)}(zt)) ≤ exp((y/z) ℓ_{(1/z)}(zt)). (3.14)

The above inequality and (3.12) ensure that, for any positive t,

IP(M_n ≥ x, A_n ≤ y, C*_n ≤ z) ≤ exp((y/z)(ℓ_{(1/z)}(tz) − (x/y)tz)). (3.15)

Minimizing with respect to t the upper bound in the above inequality, we then get Theorem 3.1.

Example 3.1. Let (ξ_n)_{n≥0} be a sequence of independent random variables with common law the uniform law over [0, 1]. For any k ≥ 0, let F_k = σ(ξ_i : 0 ≤ i ≤ k). Let (b_k)_{k>0} be a sequence of reals in [0, 1] such that lim_{k→∞} b_k = 0 and Σ_k b_k = ∞. Set

X_k = II(ξ_{k−1} ≤ b_k)(II(ξ_k ≤ 1/2) − II(ξ_k > 1/2)) for any k > 0.

Then (X_k)_{k>0} satisfies the assumptions of Theorem 3.1 with C_k = II(ξ_{k−1} ≤ b_k). Set B_n = b_1 + b_2 + · · · + b_n. Since C*_n ≤ 1 almost surely, Theorem 3.1 applied with z = 1 gives, for any positive reals w and y,

IP(M_n ≥ wy, A_n ≤ y) ≤ exp(−y ℓ*_1(w)). (3.16)

It follows that the Laplace transform of A_n is bounded by the Laplace transform of the Poisson law P(B_n) (see Bentkus (2004) for a proof). Consequently, for any positive z,

IP(A_n > B_n(1 + z)) ≤ exp(−B_n h(z)), where h(z) = (1 + z) log(1 + z) − z. (3.17)

Combining (3.17) and (3.16) applied with y = B_n(1 + z), we then obtain that

IP(M_n ≥ B_n(1 + z)w) ≤ exp(−B_n h(z)) + exp(−B_n(1 + z) ℓ*_1(w)).
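The Poisson domination used here reduces, for each k, to the elementary bound 1 + y ≤ e^y applied to y = b(e^t − 1); a quick numerical illustration (grid values arbitrary):

```python
import math

# For a Bernoulli(b) variable C, IE(exp(tC)) = 1 + b(e^t - 1); the bound
# 1 + y <= e^y with y = b(e^t - 1) dominates it by the Poisson(b) transform,
# and the product over k is then at most exp(B_n (e^t - 1)).
ok = all(1 + b * (math.exp(t) - 1) <= math.exp(b * (math.exp(t) - 1)) + 1e-12
         for b in (0.05, 0.3, 0.9)
         for t in (0.1, 1.0, 3.0))
```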

In order to get a more tractable inequality, we now choose z and w in such a way that (1 + z) ℓ*_1(w) = h(z) = u. With this choice of z and w, the above inequality yields

IP(M_n ≥ B_n (h^{−1}(u) + 1) ℓ*_1^{−1}(u/(h^{−1}(u) + 1))) ≤ 2 exp(−B_n u). (3.18)

Now, by Inequality (3.6) in Rio (2013),

ℓ*_1(x) ≥ (1 − x²/2)^{−1/3} (x²/2) ≥ (1 − x²/6)^{−1} (x²/2),

which implies that, for any positive reals u and z,

ℓ*_1^{−1}(u/(1 + z)) ≤ √(2u/(1 + z + u/3)). (3.19)

Next h(x) ≥ x²/(1 + (x/3) + √(1 + 2x/3)), which ensures that

h^{−1}(u) ≤ z := √(2u) + (u/3). (3.20)

Combining the two above upper bounds on these inverse functions, we then get that

(h^{−1}(u) + 1) ℓ*_1^{−1}(u/(h^{−1}(u) + 1)) ≤ (1 + √(2u) + u/3) (2u/(1 + √(2u) + 2u/3))^{1/2}.

Now, it can be proved that the upper bound in the above inequality is bounded from above by √(2u) + u. Combining these facts with (3.18), we finally obtain that, for any positive u,

IP(M_n ≥ B_n(√(2u) + u)) ≤ 2 exp(−B_n u). (3.21)
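The bound behind (3.21) can be checked numerically by inverting h and ℓ*_1 with a bisection (an illustration only; the range of u and the bisection brackets are ad hoc):

```python
import math

def h(u):
    return (1 + u) * math.log(1 + u) - u

def l_star_1(x):
    # l*_1 from (1.7) with w = 1
    return ((1 + x) / 2) * math.log(1 + x) + ((1 - x) / 2) * math.log(1 - x)

def inverse(f, y, lo, hi):
    # bisection for an increasing function f on [lo, hi]
    for _ in range(80):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def lhs(u):
    z = inverse(h, u, 0.0, 10.0)                       # z = h^{-1}(u)
    w = inverse(l_star_1, u / (1 + z), 0.0, 1.0 - 1e-12)
    return (1 + z) * w

ok = all(lhs(u) <= math.sqrt(2 * u) + u + 1e-9
         for u in (0.1, 0.5, 1.0, 2.0))
```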

Notice here that B_n = Var M_n. Hence the above inequality provides optimal results in the moderate deviations bandwidth. Note also that, although the random variables X_k are symmetric, the random variable M_n has a positive cubic moment. Consequently the sub-Gaussian inequality IP(M_n ≥ B_n√(2u)) ≤ exp(−B_n u) cannot be reached.

We now give an extension of Theorem 2.1 to martingale differences. For the sake of simplicity, we will assume that the increments X_k of the martingale satisfy the additional condition

X_k ≤ 1 almost surely, for any positive k. (3.22)

Under this condition, (3.4) holds true for random variables C_k with values in [0, 1]. Starting from Lemma 2.1, we now prove the following result.

Theorem 3.2. Let X_1, X_2, ... be a sequence of square-integrable random variables satisfying (3.1), (3.22) and condition (3.4) for random variables C_k with values in [0, 1]. Let (A_n) be defined as in Theorem 3.1. Let ⟨M⟩_n denote the quadratic compensator associated to the martingale (M_n) defined in (3.2). Then, for any positive x, y and z,

IP(M_n ≥ x, ⟨M⟩_n ≤ y, A_n ≤ z) ≤ exp(−z ℓ*_{(y/z)}(x/z)). (3.23)


Remark 3.2. Clearly A_n ≤ n almost surely. Consequently Theorem 3.2 applied with z = n yields

IP(M_n ≥ x, ⟨M⟩_n ≤ y) ≤ exp(−n ℓ*_{(y/n)}(x/n)),

which is exactly Inequality (5) in Fan, Grama and Liu (2012).

Proof of Theorem 3.2. For any positive integer k, set B_k = ⟨M⟩_k − ⟨M⟩_{k−1}. Then, applying Inequality (1.3) conditionally to F_{k−1} with c = C_k and v = B_k, we get

log IE(exp(tX_k) | F_{k−1}) ≤ ℓ_{(B_k/C_k²)}(C_k t).

Next, applying Lemma 2.1 with a = C_k and s = t (recall that C_k takes its values in [0, 1]) to the above upper bound, we obtain that

log IE(exp(tX_k) | F_{k−1}) ≤ C_k ℓ_{(B_k/C_k)}(t). (3.24)

It follows that, for any positive t, the sequence (Q_n(t))_{n≥0} defined by Q_0(t) = 1 and

Q_n(t) = Q_{n−1}(t) exp(tX_n − C_n ℓ_{(B_n/C_n)}(t)) for any integer n > 0

is a positive supermartingale adapted to the filtration (F_n)_n. In particular

IE(Q_n(t)) ≤ 1 for any n ≥ 0. (3.25)

Let us define the event Λ_n(y, z) by Λ_n(y, z) = (⟨M⟩_n ≤ y, A_n ≤ z). Then, for any x > 0,

IP(M_n ≥ x, ⟨M⟩_n ≤ y, A_n ≤ z) ≤ exp(−tx) IE(e^{tM_n} II_{Λ_n(y,z)}). (3.26)

Now

IE(e^{tM_n} II_{Λ_n(y,z)}) = IE(Q_n(t) exp(Σ_{k=1}^n C_k ℓ_{(B_k/C_k)}(t)) II_{Λ_n(y,z)}).

Next, on the event Λ_n(y, z), by the concavity of ℓ_v(t) with respect to v,

Σ_{k=1}^n C_k ℓ_{(B_k/C_k)}(t) ≤ A_n ℓ_{(⟨M⟩_n/A_n)}(t) ≤ z ℓ_{(⟨M⟩_n/z)}(t),

from which, using the monotonicity of ℓ_v with respect to v,

Σ_{k=1}^n C_k ℓ_{(B_k/C_k)}(t) ≤ z ℓ_{(y/z)}(t) on the event Λ_n(y, z). (3.27)

Hence

IE(e^{tM_n} II_{Λ_n(y,z)}) ≤ IE(Q_n(t)) exp(z ℓ_{(y/z)}(t)) ≤ exp(z ℓ_{(y/z)}(t)). (3.28)

The above inequality and (3.26) ensure that, for any positive t,

IP(M_n ≥ x, ⟨M⟩_n ≤ y, A_n ≤ z) ≤ exp(z ℓ_{(y/z)}(t) − xt). (3.29)

Minimizing with respect to t the upper bound in the above inequality, we then get Theorem 3.2.

To conclude this section, we now give an application of Theorem 3.2 to martingale transforms.


Corollary 3.1. Let (ξ_k)_{k>0} be a sequence of martingale differences adapted to an increasing filtration (F_k)_{k∈IN}. Assume that

ξ_k ≤ 1 a.s. and IE(ξ_k² | F_{k−1}) = v for any k > 0, for some positive v. (3.30)

Let (C_k)_{k>0} be a sequence of random variables with values in [0, 1], such that C_k is F_{k−1}-measurable for any positive k. Let X_k = C_k ξ_k and let the martingale (M_n)_n be defined from the random variables X_k by (3.2). Then, for any positive x and any δ in [0, 1],

IP(M_n ≥ x, ⟨M⟩_n ≤ nvδ²) ≤ exp(−nδ ℓ*_{δv}(x/(nδ))). (3.31)

Remark 3.3. Let (η_k)_{k>0} be a sequence of martingale differences adapted to an increasing filtration (F_k)_{k∈IN}. Assume that −1 ≤ η_k ≤ 1 a.s. and IE(η_k² | F_{k−1}) = v for any k > 0, for some v > 0. Let (D_k)_{k>0} be a sequence of random variables with values in [−1, 1], such that D_k is F_{k−1}-measurable for any positive k. Let X_k = D_k η_k and let the martingale (M_n)_n be defined from the random variables X_k by (3.2). Define ε_k by ε_k = 1 if D_k ≥ 0 and ε_k = −1 if D_k < 0. Let then ξ_k = ε_k η_k and C_k = ε_k D_k = |D_k|, so that we also have X_k = C_k ξ_k. Then the sequences (ξ_k)_k and (C_k)_k fulfill the assumptions of Corollary 3.1, and consequently (3.31) holds true.

Remark 3.4. Inequality (5) in Fan, Grama and Liu (2012) yields

IP(M_n ≥ x, ⟨M⟩_n ≤ nvδ²) ≤ exp(−n ℓ*_{δ²v}(x/n)).

Let us compare this result with Corollary 3.1. Recall that ℓ_v is concave with respect to v, which implies that the map v ↦ ℓ_v(t)/v is nonincreasing for any positive t. It follows that the map v ↦ v^{−1} ℓ*_v(xv) is nondecreasing for any positive x. Hence, for any δ in [0, 1] and any positive s,

δ ℓ*_{δv}(s) ≥ ℓ*_{δ²v}(δs).

From this inequality applied with s = x/(nδ), Corollary 3.1 is more efficient than Inequality (5) in Fan, Grama and Liu (2012) in this specific case.

Proof of Corollary 3.1. Let A_n be defined as in (3.5). By the Schwarz inequality, A_n² ≤ n(C_1² + C_2² + · · · + C_n²). Now, from the definition of (M_n)_n, ⟨M⟩_n = v(C_1² + C_2² + · · · + C_n²). Consequently

A_n² ≤ (n/v)⟨M⟩_n. (3.32)

We now apply Theorem 3.2 with y = nvδ² and z = nδ. From the above inequality, if ⟨M⟩_n ≤ y, then A_n ≤ √((n/v)nvδ²) = nδ = z. Consequently, for the above choices of the reals y and z, (⟨M⟩_n ≤ y, A_n ≤ z) = (⟨M⟩_n ≤ y). Applying now Theorem 3.2, we get Corollary 3.1.

Example 3.2. Let (ξ_k)_{k≥0} be a sequence of independent and identically distributed random variables with discrete distribution µ given by µ({1}) = a/(1 + a) = 1 − µ({−a}). Then IE(ξ_k) = 0 and Var ξ_k = a. For any k ≥ 0, let F_k = σ(ξ_i : 0 ≤ i ≤ k). Set

b_k = (1 + a)^{−1}(ξ_k + a) for k ≥ 0 and X_k = b_{k−1} ξ_k for any k > 0. (3.33)


Clearly the random variables b_k have the Bernoulli law b(a/(1 + a)). Let M_n and ⟨M⟩_n be defined by (3.2) and (3.3) respectively. Then

⟨M⟩_n = a Σ_{k=1}^n b²_{k−1} = a Σ_{k=1}^n b_{k−1} = a A_n.

Now let u and w be reals in [0, 1]. Since ⟨M⟩_n = aA_n, the event (A_n ≤ nw) is contained in (⟨M⟩_n ≤ anw), so that, applying Theorem 3.2 with x = nuw, z = nw and y = anw, we get that

IP(M_n ≥ nuw, A_n ≤ nw) ≤ exp(−nw ℓ*_a(u)). (3.34)

Next

A_n = na/(1 + a) + (1 + a)^{−1} Σ_{k=0}^{n−1} ξ_k,

whence

IP(A_n > nw) = IP(ξ_0 + · · · + ξ_{n−1} > n(w(1 + a) − a)).

Now, by Inequality (1.8), i.e. Theorem 3 in Hoeffding (1963), applied with c = 1 and V_n = na,

IP(A_n > nw) ≤ exp(−n ℓ*_a(w(1 + a) − a)), for any w > a/(1 + a). (3.35)

We now choose w = (a + u)/(a + 1). Combining (3.34) and (3.35) with this choice of w, we get that

IP(M_n ≥ n((a + u)/(a + 1))u) ≤ exp(−n ℓ*_a(u)) + exp(−n((a + u)/(a + 1)) ℓ*_a(u)) ≤ 2 exp(−n((a + u)/(a + 1)) ℓ*_a(u)). (3.36)

For the same choices of x, y and w, Inequality (5) in Fan, Grama and Liu (2012) yields the slightly less efficient inequality

IP(M_n ≥ nwu) ≤ 2 exp(−n ℓ*_{aw}(uw)). (Here w = (a + u)/(a + 1).) (3.37)

Notice now that, for any x in [0, 1], the equation x = u(a + u)/(a + 1) has a unique solution u_x in [0, 1]. More precisely,

u_x = (−a + √(a² + 4(a + 1)x))/2. (3.38)

Choosing u = u_x in (3.36), we have the equivalent formulation

IP(M_n ≥ nx) ≤ 2 exp(−n((a + u_x)/(a + 1)) ℓ*_a(u_x)). (3.39)

For example, suppose now that a ≥ 1. Then ℓ*_a(u_x) ≥ u_x²/(2a). In that case, (3.39) and straightforward computations yield

IP(M_n ≥ nx) ≤ 2 exp(−n(a + 1)x²/(a(a + √(a² + 4(a + 1)x)))) ≤ 2 exp(−n(a + 1)x²/(2(a² + (a + 1)x))). (3.40)

The above inequality can be derived from (3.37) only under the more restrictive condition a² ≥ a + 1.
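The root formula (3.38) and the reduction from (3.39) to (3.40) via ℓ*_a(u) ≥ u²/(2a) can be spot-checked numerically (illustration only; the grid values are arbitrary):

```python
import math

def u_x(a, x):
    # root formula (3.38) of the equation x = u (a + u)/(a + 1)
    return (-a + math.sqrt(a * a + 4 * (a + 1) * x)) / 2

grid = [(a, x) for a in (1.0, 2.0, 5.0) for x in (0.1, 0.5, 0.9)]

ok_root = all(abs(x - u_x(a, x) * (a + u_x(a, x)) / (a + 1)) < 1e-12
              for a, x in grid)
# exponent of (3.39), bounded below through l*_a(u) >= u^2/(2a), dominates
# the first exponent of (3.40)
ok_bound = all((a + u_x(a, x)) / (a + 1) * u_x(a, x) ** 2 / (2 * a)
               >= (a + 1) * x * x / (2 * (a * a + (a + 1) * x)) - 1e-12
               for a, x in grid)
```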

References

[1] Bennett, G. Probability inequalities for the sum of independent random variables. J. Amer. Statist. Assoc. 57, 33-45 (1962).

[2] Bennett, G. A one-sided inequality for the sum of independent, bounded random variables. Biometrika 55, no. 3, 565-569 (1968).

[3] Bentkus, V. An inequality for tail probabilities of martingales with differences bounded from one side. J. Theoret. Probab. 16, no. 2, 161-173 (2003).

[4] Bentkus, V. On Hoeffding's inequalities. Ann. Probab. 32, no. 2, 1650-1673 (2004).

[5] Bercu, B., Delyon, B. and Rio, E. Concentration inequalities for sums and martingales. SpringerBriefs in Mathematics. Springer (2015).

[6] Fan, X., Grama, I. and Liu, Q. Hoeffding's inequality for supermartingales. Stochastic Process. Appl. 122, no. 10, 3545-3559 (2012).

[7] Hoeffding, W. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 13-30 (1963).

[8] Pinelis, I. On the Bennett-Hoeffding inequality. Ann. Inst. Henri Poincaré Probab. Stat. 50, 15-27 (2014).

[9] Rio, E. Extensions of the Hoeffding-Azuma inequalities. Electron. Commun. Probab. 18, (54), 1-6 (2013).
