A reversion of the Chernoff bound

Ted Theodosopoulos

IKOS Research, 9 Castle Square, Brighton, East Sussex, BN1 1DZ, UK

Statistics & Probability Letters 77 (2007) 558–565

Received 11 May 2005; received in revised form 31 August 2006; accepted 20 September 2006; available online 18 October 2006

Abstract

This paper describes the construction of a lower bound for the tails of general random variables, using solely knowledge of their moment generating function. The tilting procedure used allows for the construction of lower bounds that are tighter and more broadly applicable than existing tail approximations.
© 2006 Elsevier B.V. All rights reserved.

MSC: primary 60E10; 62E17; secondary 60F10; 60J50

Keywords: Cumulant Legendre transform; Tail bounds; Saddlepoint approximation; Exponential tilting

1. A lower bound to complement the Chernoff bound

This paper presents and solves a nonlinear optimization problem arising in the construction of lower bounds for the tails of distributions which possess a moment generating function on an open subset of $\mathbb{R}_+$. The resulting lower bounds complement the classical Chernoff (upper) bound (Chernoff, 1952) in a set of cases more general than has been previously achieved. The methodology for the construction of the bounds was motivated by the presentation of the lower bound in Cramér's large deviations theorem on p. 29 of Stroock (1993). An earlier version of the results presented here was used in Theodosopoulos (1999) to establish a lower bound to the asymptotic convergence rate for an algorithm for global optimization.

The bounds presented here share numerous methodological characteristics with the development of saddlepoint approximations (Daniels, 1954, 1980; Jensen, 1995). Both schemes use tilting, a technique first developed by Esscher, in order to center the power series expansions at the desired tail of the distribution. Our method concentrates on the restriction of the Laplace transform to the real line. A nonlinear optimization problem is constructed by adding two degrees of freedom to the tilting procedure. This allows us to obtain tighter lower bounds, which hold even in cases where existing lower bounds break down. The same direction was explored independently in Bagdasarov and Ostrovskii (1996), where a rough lower bound is computed using some rudiments of the methodology utilized here. We parameterize the problem more efficiently, thereby arriving at a lower bound which possesses significantly better tightness.


Let $X$ be a real-valued, positive random variable on a probability space $(\Omega, \mu)$. Assume that $X$ has exponential moments with respect to $\mu$, i.e. $g(x) := E^\mu[e^{xX}] < \infty$ for $x$ in some open set $(-\infty, x^*)$, where $x^* > 0$. Let the rate function be given by the Legendre transform of the cumulant (i.e. the logarithm of the moment generating function), $I_\mu(y) := \sup_x \{xy - \log g(x)\}$. Further, let

$$\Xi(y) := \begin{cases} \arg\sup_{x \geq 0}\{xy - \log g(x)\} & \text{if } y \geq E^\mu[X], \\ \arg\sup_{x \leq 0}\{xy - \log g(x)\} & \text{if } y < E^\mu[X] \end{cases}$$

be the corresponding 'Legendre dual'. It is well known (Stroock, 1993) that $\Xi$ is an increasing concave function with $\Xi(E^\mu[X]) = 0$, and thus it is generally invertible. Using integration by parts we notice that

$$\frac{d}{dy}\, g(\Xi(y)) = y\, g(\Xi(y))\, \Xi'(y) \;\Longrightarrow\; g(\Xi(y)) = \exp\left\{ y\,\Xi(y) - \int_{E^\mu[X]}^{y} \Xi(t)\, dt \right\}. \tag{1.1}$$

The above formula leads to the following concise representation of the rate function:

$$I_\mu(y) = \int_{E^\mu[X]}^{y} \Xi(t)\, dt. \tag{1.2}$$

Of course, for any $x < x^*$,

$$\int_{\Xi^{-1}(x)}^{x} \Xi(t)\, dt = \int_{x}^{\Xi(x)} \frac{\lambda}{\Xi'(\Xi^{-1}(\lambda))}\, d\lambda,$$

and

$$\Xi'(\Xi^{-1}(\lambda)) = \left( \frac{d}{d\lambda}\, \frac{g'}{g} \right)^{-1} = \frac{g^2}{g''g - g'^2},$$

so that

$$\int_{\Xi^{-1}(x)}^{x} \Xi(t)\, dt = x\,\Xi(x) - x\,\Xi^{-1}(x) - \log \frac{g(\Xi(x))}{g(x)}. \tag{1.3}$$
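As a concrete illustration of (1.1)–(1.3), the following minimal Python sketch (ours, not from the paper) recovers $\Xi$ and $I_\mu$ numerically from the cumulant alone. We use a standard Gaussian purely because its cumulant $\log g(x) = x^2/2$, dual $\Xi(y) = y$ and rate function $I_\mu(y) = y^2/2$ are available in closed form for checking; the paper's standing positivity assumption on $X$ is not needed for this computation.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.integrate import quad

def log_g(x):
    # Cumulant of N(0, 1); swap in any cumulant finite on an open set.
    return 0.5 * x**2

def Xi(y, lo=-50.0, hi=50.0):
    # Legendre dual: the maximizer of x*y - log g(x).
    res = minimize_scalar(lambda x: log_g(x) - x * y,
                          bounds=(lo, hi), method="bounded")
    return res.x

def I_mu(y, mean=0.0):
    # Rate function via (1.2): integrate Xi from E[X] to y.
    val, _ = quad(Xi, mean, y)
    return val

print(Xi(2.0), I_mu(2.0))   # approximately 2.0 and 2.0 (= y and y^2/2)
```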

However, the integral representation of the exponent in (1.1) has two advantages. Firstly, it does not depend explicitly on the moment generating function, creating the possibility of constructing the tail bounds we are after using only the Legendre dual, $\Xi$, side-stepping the moment generating function. We will explore this idea in a subsequent paper. Secondly, the integral representation of the exponent in (1.1) allows us to combine rate functions in a straightforward manner by adding the corresponding Legendre duals.

An application of the Markov inequality to the random variable $e^{xX}$ suffices to obtain a surprisingly accurate upper bound to the tails of $X$, the celebrated Chernoff bound (Chernoff, 1952; Bahadur and Rao, 1960) (assuming $y > E^\mu[X]$):

$$\mu(X \geq y) = \inf_{x \geq 0} \mu\left(e^{xX} \geq e^{xy}\right) \leq \inf_{x \geq 0} \frac{g(x)}{e^{xy}} = \exp\{-I_\mu(y)\}.$$

Note that this bound works equally well when $y < E^\mu[X]$, by considering the left-hand tail instead of the right-hand one. Specifically we obtain

$$\mu(X \leq y) = \mu(-X \geq -y) = \inf_{x \leq 0} \mu\left(e^{xX} \geq e^{xy}\right),$$

which leads to the same upper bound for the left-hand tail as the one for the right-hand tail we obtained above in the case $y > E^\mu[X]$. This short exposition of the Chernoff bound emphasizes the optimizing degrees of freedom afforded to us by the free parameter $x$.

The estimates in the rest of the paper involve the construction of a lower bound to accompany the Chernoff bound. The tools we use in the construction of the desired lower bound are exponential tilting and the incorporation of a second optimizing degree of freedom. The former leads to a centering of the measure around the tail of interest (Jensen, 1995). The latter allows us to tailor the tilting procedure in order to optimize the iterative use of the Chernoff bound applied to the tilted measure.
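Before proceeding, a minimal numerical sketch of the Chernoff upper bound itself (the Gaussian test case is our choice, not the paper's):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def chernoff_upper(y, log_g, x_max=50.0):
    # inf_{x >= 0} g(x) e^{-xy} = exp{-sup_{x >= 0}(xy - log g(x))} = exp{-I(y)}
    res = minimize_scalar(lambda x: log_g(x) - x * y,
                          bounds=(0.0, x_max), method="bounded")
    return np.exp(res.fun)

log_g = lambda x: 0.5 * x**2        # cumulant of N(0, 1)
for y in (1.0, 2.0, 3.0):
    print(y, norm.sf(y), chernoff_upper(y, log_g))
# e.g. y = 2: true tail ~ 0.0228 <= bound e^{-2} ~ 0.1353
```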


In particular, let $\nu_x \in M_1(\Omega)$ be a new probability measure on $\Omega$ defined by $\nu_x(dt) := g(x)^{-1} e^{xt}\, \mu(dt)$. This is called an exponentially tilted measure, after the theory of Esscher tilting (Jensen, 1995). To simplify the notation, we will write $\nu_a$ for $\nu_{\Xi(ay)}$, use $I_a$ to signify $I_{\nu_{\Xi(ay)}}$, and write $E^a$ for $E^{\nu_{\Xi(ay)}}$. Observe that $I_\mu = I_{E^\mu[X]/y}$. Finally, let

$$L(a, d, y) := \left( 1 - e^{-I_a(dy)} - e^{-I_a(y)} \right) \exp\left\{ -I_\mu(ay) - \Xi(ay)\, y\, (d - a) \right\}.$$

With this terminology we are in a position to state the proposed reversion of the Chernoff bound as a nonlinear constrained optimization in two dimensions:

Theorem 1.1. For any $y > E^\mu[X]$, the following inequality holds:

$$\mu(X \geq y) \geq \sup_{1 < a < d} L(a, d, y). \tag{1.4}$$

Moreover, there exist feasible values of $a$ and $d$ which make the right-hand side of (1.4) strictly positive.

Proof. Following the traditional proof of Cramér's theorem (Stroock, 1993), we let $Y$ be a $\nu_a$-distributed random variable. Observe that, for any $a$,

$$E^a[Y] = g(\Xi(ay))^{-1} \int_{-\infty}^{\infty} t\, e^{t\,\Xi(ay)}\, \mu(dt) = \left.\frac{d}{dx}\, \log g\right|_{x = \Xi(ay)} = ay.$$

Then, for every $1 < a < d$ we have

$$\mu(X \geq y) = g(\Xi(ay)) \int_{y}^{\infty} e^{-t\,\Xi(ay)}\, \nu_a(dt) \geq g(\Xi(ay)) \int_{y}^{dy} e^{-t\,\Xi(ay)}\, \nu_a(dt) \geq g(\Xi(ay))\, e^{-dy\,\Xi(ay)}\, \nu_a([y, dy]). \tag{1.5}$$

We are now in a position to apply the Chernoff bound iteratively, as it were, to estimate the last term on the right-hand side. Specifically, we observe that, for any $d > a > 1$, $\nu_a(Y > dy) \leq \exp\{-I_a(dy)\}$, because $dy > E^{\nu_a}[Y] = ay$, and $\nu_a(Y < y) \leq \exp\{-I_a(y)\}$, because $y < E^{\nu_a}[Y] = ay$. Consequently we can estimate the last term on the right-hand side of (1.5) as

$$\nu_a([y, dy]) = 1 - \nu_a(Y > dy) - \nu_a(Y < y) \geq 1 - e^{-I_a(dy)} - e^{-I_a(y)}. \tag{1.6}$$
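For the Gaussian test case (ours), the tilted measure $\nu_a$ is $N(ay, 1)$ and the estimate (1.6) can be checked directly; the closed forms for $I_a$ used below follow from (1.7)–(1.8), which are derived next.

```python
import numpy as np
from scipy.stats import norm

# Check of (1.6) for X ~ N(0, 1): nu_a is N(ay, 1), and (1.7)-(1.8)
# below give I_a(dy) = y^2 (d - a)^2 / 2 and I_a(y) = y^2 (a - 1)^2 / 2.
y, a, d = 2.0, 1.5, 2.5
mass = norm.cdf(d * y, loc=a * y) - norm.cdf(y, loc=a * y)   # nu_a([y, dy])
bound = 1 - np.exp(-0.5 * y**2 * (d - a)**2) - np.exp(-0.5 * y**2 * (a - 1)**2)
print(mass, bound)   # ~0.82 >= ~0.26
```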

Substituting into (1.5) we see that, for any $1 < a < d$, $\mu(X \geq y) \geq L(a, d, y)$. Noting that the left-hand side does not depend on $a$ or $d$, we conclude that the inequality is maintained if we maximize the right-hand side with respect to $a$ and $d$, thus obtaining (1.4).

In order to evaluate the rate function for the tilted measure we observe that, for any $\theta$,

$$E^a\left[e^{\theta Y}\right] = \frac{g(\theta + \Xi(ay))}{g(\Xi(ay))}.$$

Thus, $\Xi_a(dy) := \arg\sup_{\theta \geq 0} \{dy\,\theta - \log E^a[e^{\theta Y}]\}$ must satisfy

$$\left.\frac{d}{d\theta}\right|_{\theta = \Xi_a(dy)} \log \frac{g(\theta + \Xi(ay))}{g(\Xi(ay))} = dy = \left.\frac{d}{dx}\, \log g\right|_{x = \theta + \Xi(ay)},$$

and therefore $\Xi_a(dy) = \Xi(dy) - \Xi(ay)$. Thus, using (1.2) in this situation we obtain

$$I_a(dy) = \int_{E^a[Y]}^{dy} \Xi_a(t)\, dt = y(a - d)\,\Xi(ay) + \int_{ay}^{dy} \Xi(t)\, dt \tag{1.7}$$


and

$$I_a(y) = \int_{E^a[Y]}^{y} \Xi_a(t)\, dt = y(a - 1)\,\Xi(ay) - \int_{y}^{ay} \Xi(t)\, dt. \tag{1.8}$$
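Formulas (1.7)–(1.8) are computable from $\Xi$ alone, without access to $g$. A minimal sketch, using the dual $\Xi(t) = c - t^{-1}$ that reappears at the end of this section, with $c = 1$ so that $\Xi(E^\mu[X]) = 0$ at $E^\mu[X] = 1$ (an assumption of ours, purely for illustration):

```python
import numpy as np
from scipy.integrate import quad

Xi = lambda t: 1.0 - 1.0 / t    # c - 1/t with c = 1

def I_a(z, a, y):
    # (1.7) at z = d*y and (1.8) at z = y collapse to one formula:
    # I_a(z) = (ay - z) Xi(ay) + integral of Xi from ay to z.
    val, _ = quad(Xi, a * y, z)
    return (a * y - z) * Xi(a * y) + val

y, a, d = 2.0, 1.5, 2.5
print(I_a(d * y, a, y), I_a(y, a, y))   # both positive: ~0.156 and ~0.072
```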

We are now in a position to show that there exists a feasible choice of $a$ and $d$ such that $e^{-I_a(dy)} + e^{-I_a(y)} < 1$, thus ensuring that the right-hand side of (1.4) is strictly positive. Specifically, observe that the monotonicity of $\Xi(\cdot)$ leads to

$$\frac{d}{dd}\, I_a(dy) = y\left(\Xi(dy) - \Xi(ay)\right) > 0, \qquad \frac{d^2}{dd^2}\, I_a(dy) = y^2\, \Xi'(dy) \geq 0,$$

which implies that

$$\lim_{d \to \infty} I_a(dy) = \infty. \tag{1.9}$$

On the other hand,

$$\frac{d}{da}\, I_a(y) = y^2 (a - 1)\, \Xi'(ay) \geq 0,$$

which, together with the observation that $I_{a=1}(y) = 0$, implies that there exist an $\epsilon > 0$ and $1 < a_\epsilon < \infty$ such that $I_{a_\epsilon}(y) = \epsilon$. Choose a $d_\epsilon < \infty$ such that $I_{a_\epsilon}(d_\epsilon y) > -\log(1 - e^{-\epsilon})$. We can always do that because of (1.9). Then, manifestly, $e^{-I_{a_\epsilon}(y)} + e^{-I_{a_\epsilon}(d_\epsilon y)} < 1$ and therefore the right-hand side of (1.4) is strictly positive. □

Note that the lower bound in (1.4) does not depend on explicit knowledge of the moment generating function $g$. Indeed, (1.2), (1.7) and (1.8) show that all the components of (1.4) can be computed directly from $\Xi(\cdot)$. This opens the possibility, discussed further in the Conclusions, that no precise knowledge, or perhaps even existence, of the moment generating function may be required for (1.4).

At this point it is worthwhile to compare the lower bound in Theorem 1.1 to three other approximations. The first one is Daniels' saddlepoint approximation which, using our notation, is given by (Daniels, 1954, 1980)

$$\mu(X \geq y) \approx (2\pi)^{-1/2} \int_{y}^{\infty} \sqrt{\Xi'(t)}\; e^{-I_\mu(t)}\, dt. \tag{1.10}$$

Compared to (1.4) in Theorem 1.1, (1.10) has the disadvantage that it involves an extra integral. In that sense, the Chernoff bound and (1.4) can be thought of as upper and lower bounds to the integral expression in (1.10).

Second, we look at the lower bound proposed by Bagdasarov and Ostrovskii (1996). While they use different notation, their methodology is very close to the one presented here. Specifically, their lower bound to $\mu(X \geq y)$ has only one free parameter, $\Delta > 0$, which, using the notation of the current paper, is equivalent to $1 - \Xi(y)/\Xi(ay)$. As in the proof of Theorem 1.1 above, their lower bound needs access to a point $y^+ > y$; this point corresponds to the point $dy$ in our notation. They describe it as the point where the function $\lambda(1 + \Delta)y - I(y)$ attains its supremum, where $\lambda$ takes the place of $\Xi(ay)$ in the notation used here. But the supremum of the function $\lambda(1 + \Delta)y - I(y)$ over $y$ is the Legendre transform of $I(y)$, which is itself the Legendre transform of $\log g(x)$. Using Legendre duality we conclude that $y^+ = dy = \Xi^{-1}(2\Xi(ay) - \Xi(y))$. With these notational translations, the Bagdasarov–Ostrovskii (B–O) lower bound can be described as

$$\mu(X \geq y) \geq \sup_{a > 1} \tilde L(a, y), \tag{1.11}$$

where

$$\tilde L(a, y) := \frac{1 - \dfrac{\Xi(ay)}{\Xi(ay) - \Xi(y)}\left[ e^{-I_a(d(a,y)\,y)} + e^{-I_a(y)} \right]}{1 - e^{-I_a(d(a,y)\,y)} - e^{-I_a(y)}}\; L(a, d(a, y), y)$$


and

$$d(a, y) := y^{-1}\, \Xi^{-1}\left(2\Xi(ay) - \Xi(y)\right).$$

So we can see immediately that the B–O lower bound is inferior to the one described in Theorem 1.1 for two reasons. On the one hand, it foregoes one of the two optimizing degrees of freedom (making $d$ a function of $a$, which we will recognize as a suboptimal choice in the following section). Furthermore, the term $\Xi(ay)/(\Xi(ay) - \Xi(y))$ makes the fraction on the right-hand side of the expression for $\tilde L$ strictly less than 1.

In the same spirit is the lower bound presented in Stroock (1993). The construction of Stroock's lower bound is very similar to the one we present here, and in fact our presentation mirrors his. The main difference lies with Stroock's use of the Chebyshev inequality to bound $\nu_a([y, dy])$, as opposed to our iterative use of the Chernoff bound. The symmetry of the Chebyshev inequality around the mean determines one of the two optimizing degrees of freedom, and consequently Stroock's lower bound can be described as

$$\mu(X \geq y) \geq \sup_{a > 1} \bar L(a, y), \tag{1.12}$$

where

$$\bar L(a, y) := \frac{1 - 1/\left(\Xi'(ay)\, y^2 (a - 1)^2\right)}{1 - e^{-I_a((2a - 1)y)} - e^{-I_a(y)}}\; L(a, 2a - 1, y).$$

The first disadvantage of (1.12) when compared to (1.4) is the fact that it lacks one degree of freedom, whose optimization could only improve the latter. Secondly, unlike (1.4), which is guaranteed to work in general by Theorem 1.1, the range of applicability of (1.12) is limited by the requirement that $\Xi'(ay)\, y^2 (a - 1)^2 > 1$. There are indeed applications of interest that do not conform to this requirement, and for which (1.12) is therefore inapplicable. In particular, one such application involves $\Xi(t) = c - t^{-1}$ for some constant $c > 0$. One readily concludes that this choice makes $\Xi'(ay)\, y^2 (a - 1)^2 = (1 - 1/a)^2 < 1$, thus invalidating (1.12).
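To make the comparison tangible, here is a hedged numerical sketch of Theorem 1.1 for the Gaussian test case used earlier (our choice): with $\Xi(t) = t$ and $I_\mu(t) = t^2/2$, equations (1.7)–(1.8) give $I_a(dy) = y^2(d - a)^2/2$ and $I_a(y) = y^2(a - 1)^2/2$, so $L(a, d, y)$ can be maximized over a crude grid and set against the true tail and the Chernoff bound.

```python
import numpy as np
from scipy.stats import norm

def L_bound(a, d, y):
    # L(a, d, y) for X ~ N(0, 1): Xi(t) = t, I_mu(t) = t^2 / 2.
    bracket = (1.0 - np.exp(-0.5 * y**2 * (d - a)**2)
                   - np.exp(-0.5 * y**2 * (a - 1)**2))
    return bracket * np.exp(-0.5 * (a * y)**2 - (a * y) * y * (d - a))

y = 2.0
grid_a = np.linspace(1.01, 3.0, 200)
grid_d = np.linspace(1.02, 6.0, 300)
best = max(L_bound(a, d, y) for a in grid_a for d in grid_d if d > a)
print(best, norm.sf(y), np.exp(-0.5 * y**2))
# lower bound (~1e-5) <= true tail (~0.0228) <= Chernoff bound (~0.1353)
```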

2. A nonlinear optimization problem

We are now in a position to compute the lower bound presented in an implicit way in (1.4). In order to arrive at an explicit computation, we need to solve the optimization over the two parameters, $a$ and $d$, that determine the tightest achievable lower bound. The first step in our computation is the reduction of the optimization in (1.4) to one variable. In what follows we will use the following symbols to simplify the presentation:

$$A(a, y) := \exp\{-I_a(y)\}, \qquad B(a, d, y) := \exp\{-I_a(dy)\}.$$

We proceed by evaluating the first-order condition with respect to $a$:

$$\frac{\partial L}{\partial a}(a, d, y) = -dy^2\, \Xi'(ay)\, L + ay^2\, \Xi'(ay)\, L - \frac{L y^2\, \Xi'(ay)}{1 - A - B}\left[ (d - a)B - (a - 1)A \right] = 0$$

$$\Longleftrightarrow\quad a - d - \frac{(1 - a)A + (d - a)B}{1 - A - B} = 0 \quad\Longleftrightarrow\quad d^*(a, y) = \frac{a - A}{1 - A}. \tag{2.1}$$

The following lemma describes the properties of the resulting optimum choice of $d$:
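A small sketch of (2.1) under the same Gaussian assumption as before (ours):

```python
import numpy as np

def d_star(a, y):
    # d*(a, y) = (a - A) / (1 - A), A = exp{-I_a(y)} = exp{-y^2 (a-1)^2 / 2}
    A = np.exp(-0.5 * y**2 * (a - 1)**2)
    return (a - A) / (1.0 - A)

y = 2.0
for a in (1.05, 1.5, 2.0, 3.0, 4.0):
    print(a, round(d_star(a, y), 3))
# d* blows up as a -> 1+, dips to a minimum near a ~ 1.8, then grows like a,
# in line with the quasiconvexity asserted in Lemma 2.1 below.
```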

Lemma 2.1. For every $y$, $d^*$ is a quasiconvex function of $a$ which attains a unique minimum at some $a^*(y) \in (1, \infty)$.

Proof. Observe that

$$\lim_{a \to 1^+} A = \exp\left\{ \lim_{a \to 1^+} \left[ -y(a - 1)\,\Xi(ay) + \int_{y}^{ay} \Xi(t)\, dt \right] \right\} = 1$$


and

$$A_a := \frac{\partial A}{\partial a} = -A\, y^2 (a - 1)\, \Xi'(ay). \tag{2.2}$$

Also, $\lim_{a \to \infty} A = 0^+$, because the concavity of $\Xi$ (Stroock, 1993) implies $\lim_{a \to \infty} \Xi'(ay) < \infty$ and therefore $\lim_{a \to \infty} (\partial/\partial a) \log A = -\infty$, which implies that $\log A \searrow -\infty$, and therefore $A$ approaches 0 from above as $a$ tends to infinity.

The proof of the lemma will take three steps. The first step is to show that, for any $y$,

$$\lim_{a \to 1^+} d^* = \lim_{a \to 1^+} \frac{1 - A_a}{-A_a} = \lim_{a \to 1^+} \frac{1}{A\, y^2 (a - 1)\, \Xi'(ay)} = +\infty, \tag{2.3}$$

where $A_a = \partial A/\partial a$. The second step is to observe that $\lim_{a \to \infty} d^* = +\infty$. The final step of the proof involves

$$\frac{\partial d^*}{\partial a} = \frac{1 - A - A\, y^2 (a - 1)^2\, \Xi'(ay)}{(1 - A)^2}. \tag{2.4}$$

From (2.3) we conclude that $\lim_{a \to 1^+} \partial d^*/\partial a = -\infty$. Therefore, for every $y$, there must exist an $a^*(y) \in (1, \infty)$ such that $(\partial d^*/\partial a)(a^*(y), y) = 0$, with $(\partial d^*/\partial a)(a, y) > 0$ for all $a \in (a^*(y), \infty)$. Indeed, differentiating (2.4) with respect to $a$ we see that

$$\frac{\partial^2 d^*}{\partial a^2} = \frac{A\, y^2 (a - 1)}{(1 - A)^2}\left[ y^2 (a - 1)^2\, \Xi'(ay)^2 - \Xi'(ay) - y(a - 1)\, \Xi''(ay) \right] - \frac{2A\, y^2 (a - 1)\, \Xi'(ay)}{1 - A}\, \frac{\partial d^*}{\partial a}. \tag{2.5}$$

The concavity of $\Xi$ together with (2.5) shows us that $\partial d^*/\partial a = 0 \Longrightarrow \partial^2 d^*/\partial a^2 > 0$, implying that each critical point of $d^*$ is a minimum, and therefore there is a unique minimum and $d^*$ is quasiconvex. □

Using the resulting expression for $d^*$ as a function of $a$, we can rewrite the lower bound $L$ and the expression $B$ above as functions solely of $a$ and $y$:

$$\hat L(a, y) := L(a, d^*(a, y), y), \qquad \hat B(a, y) := \exp\{-I_a(y\, d^*(a, y))\}.$$

Using this terminology, we observe that:

Lemma 2.2. For every $y$ there exists a unique $\hat a(y) \in (1, a^*(y))$ such that $A(\hat a(y), y) + \hat B(\hat a(y), y) = 1$ and, for all $a \in (1, \hat a(y))$, $A(a, y) + \hat B(a, y) < 1$.

Proof. Notice that

$$\hat B_a := \frac{\partial \hat B}{\partial a} = \hat B\, y\left\{ y(d^* - a)\, \Xi'(ay) - \left[ \Xi(d^* y) - \Xi(ay) \right] \frac{\partial d^*}{\partial a} \right\}. \tag{2.6}$$

By the monotonicity of $\Xi$ and Lemma 2.1 we can see that $\hat B_a \geq 0$ and therefore

$$\lim_{a \to \infty} \hat B > 0. \tag{2.7}$$

Using (2.2) and (2.6), after some algebra we arrive at

$$\frac{\partial}{\partial a}\left(A + \hat B\right) = y^2\, \Xi'(ay)\, (d^* - a)\left[ A + \hat B\left( 1 - \frac{\partial d^*}{\partial a}\, \frac{\Xi(d^* y) - \Xi(ay)}{y\, (d^* - a)\, \Xi'(ay)} \right) - 1 \right]. \tag{2.8}$$

Since the term outside the brackets on the right-hand side of (2.8) is always nonnegative, we conclude, using Lemma 2.1, that the following statements are true:

(1) For $\bar a \in (1, a^*(y))$: $(\partial/\partial a)\big|_{\bar a}(A + \hat B) = 0 \Longrightarrow A(\bar a) + \hat B(\bar a) < 1$.
(2) For $\bar a \in (a^*(y), \infty)$: $(\partial/\partial a)\big|_{\bar a}(A + \hat B) = 0 \Longrightarrow A(\bar a) + \hat B(\bar a) > 1$.
(3) For $\bar a \in (1, a^*(y))$: $A(\bar a) + \hat B(\bar a) = 1 \Longrightarrow (\partial/\partial a)\big|_{\bar a}(A + \hat B) > 0$.
(4) For $\bar a \in (a^*(y), \infty)$: $A(\bar a) + \hat B(\bar a) = 1 \Longrightarrow (\partial/\partial a)\big|_{\bar a}(A + \hat B) < 0$.


Also, using (2.3), we see that

$$\lim_{a \to 1^+} \hat B = \exp\left\{ \lim_{d^* \to \infty} d^* y \left[ \Xi(y)\left( 1 - \frac{1}{d^*} \right) - \frac{1}{d^* y} \int_{y}^{d^* y} \Xi(t)\, dt \right] \right\} = \exp\left\{ \lim_{d^* \to \infty} d^* y \left[ \Xi(y) - \Xi(d^* y) \right] \right\} = 0^+,$$

whether $\lim_{t \to \infty} \Xi(t) < \infty$ or $\lim_{t \to \infty} \Xi(t) = \infty$. This deduction rules out a maximum of $A + \hat B$ before $a^*$, because, by statement (1) above, any critical point of $A + \hat B$ before $a^*$ leads to $A + \hat B < 1$ and therefore must be a minimum. Furthermore, the combination of statements (2) and (4) above implies that, if $-1 + A + \hat B$ lacks a zero below $a^*$, then it cannot have a minimum below $a^*$, because when $A + \hat B < 1$ for all $a < a^*$, the slope of $A + \hat B$ will be negative for all $a < a^*$. Therefore, one of the following two statements must hold:

(i) either $A + \hat B > 1$ for all $y$ and $a > 1$, or
(ii) there exists a zero $\hat a$ of $-1 + A + \hat B$ in $(1, a^*)$ such that $(\partial/\partial a)\big|_{\hat a}(A + \hat B) > 0$.

It turns out that we can rule out case (i). Specifically, for every $a$ and $y$, let $\bar d(a, y) \geq a$ be such that $A(a, y) + B(a, \bar d(a, y), y) = 1$. Clearly, $\partial B/\partial d = -By\{\Xi(dy) - \Xi(ay)\} \leq 0$. Thus, if there is no $a \in (1, a^*)$ with $A + \hat B < 1$, then for all $a \in (1, a^*)$ and every $y$, $d^*(a, y) < \bar d(a, y)$, which implies that

$$L_a := \frac{\partial L}{\partial a} = \frac{L\, y^2\, \Xi'(ay)}{1 - A - B}\left[ (a - A) - d(1 - A) \right] \leq 0.$$

But this would imply that the maximum lower bound $L$ is achieved on $\bar d$, which leads to $\max L = 0$. This clearly contradicts the statement of Theorem 1.1, which asserts that, for any $y$, there exists a pair of values for $a$ and $d$ making $L$ strictly positive. Thus, we are left with statement (ii) as the only viable possibility, which establishes the desired result. □

Let $G(a, d, y) := B(a, d, y)\,\Xi(dy) - (1 - A(a, y))\,\Xi(ay)$.

Lemma 2.3. For any $a > 1$ and $y$, there exists a unique $\hat d(a, y) \in (a, \infty)$ such that $G(a, \hat d(a, y), y) = 0$. Moreover, $(\partial G/\partial d)(a, \hat d(a, y), y) < 0$.

Proof. From the definition of $G$ we see that

$$G_d := \frac{\partial G}{\partial d} = -By\left\{ \Xi(dy)\left[ \Xi(dy) - \Xi(ay) \right] - \Xi'(dy) \right\}$$

and

$$G_{dd} := \frac{\partial^2 G}{\partial d^2} = -y\left[ \Xi(dy) - \Xi(ay) \right] G_d - B y^2 \left\{ \Xi'(dy)\left[ 2\Xi(dy) - \Xi(ay) \right] - \Xi''(dy) \right\},$$

which implies that $G_d \geq 0 \Longrightarrow G_{dd} < 0$. Observe that $G_d(a, a, y) = By\,\Xi'(ay) > 0$. Also, using (2.7), we can see that for large enough $d$, $G_d(a, d, y) < 0$, because the concavity of $\Xi$ forces $\Xi'$ to remain uniformly bounded. Thus, for any $a > 1$ and $y$, $G(a, \cdot, y)$ has a unique maximum $\bar d(a, y) \in (a, \infty)$. Notice that $G(a, a, y) = A\,\Xi(ay)$, because $B(a, a, y) = 1$. Also, for any $a > 1$ and $y$,

$$\lim_{d \to \infty} B(a, d, y) = \exp\left\{ \lim_{d \to \infty} dy\left[ \left( 1 - \frac{a}{d} \right)\Xi(ay) - \frac{1}{dy} \int_{ay}^{dy} \Xi(t)\, dt \right] \right\} = \exp\left\{ \lim_{d \to \infty} dy\left[ \Xi(ay) - \Xi(dy) \right] \right\} = 0^+,$$

whether $\lim_{t \to \infty} \Xi(t) < \infty$ or $\lim_{t \to \infty} \Xi(t) = \infty$. Therefore, $G$ indeed possesses a zero, $\hat d(a, y)$. Naturally, $\hat d > \bar d$, and therefore $(\partial G/\partial d)(a, \hat d(a, y), y) < 0$. Finally, this zero is unique because for there to be another, there would first have to be a minimum, which is prohibited by the preceding. □
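Lemmas 2.1–2.3 translate into a simple computational recipe. A hedged sketch for the Gaussian test case (ours): substitute $d^*(\cdot, y)$ from (2.1) into $G$ and locate its zero by root-finding; the result is the candidate maximizer whose optimality the theorem below establishes.

```python
import numpy as np
from scipy.optimize import brentq

y = 2.0

def A_(a):    return np.exp(-0.5 * y**2 * (a - 1)**2)    # exp{-I_a(y)}
def B_(a, d): return np.exp(-0.5 * y**2 * (d - a)**2)    # exp{-I_a(dy)}
def d_star(a): return (a - A_(a)) / (1.0 - A_(a))        # (2.1)

def G_on_dstar(a):
    # G(a, d, y) = B Xi(dy) - (1 - A) Xi(ay) with Xi(t) = t, at d = d*(a, y)
    d = d_star(a)
    return B_(a, d) * d * y - (1.0 - A_(a)) * a * y

a_opt = brentq(G_on_dstar, 1.05, 3.0)
print(a_opt, d_star(a_opt))   # roughly 1.46 and 2.33 for y = 2
```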


We are in a position to prove the main theorem of this section:

Theorem 2.4. For every $y$ there exists an $a^\circ(y) \in (1, \hat a(y))$ which attains the unique maximum of $\hat L$.

Proof. We have already seen that, for any $a > 1$ and any $y$, $L_a(a, d^*(a, y), y) = 0$. We can also see that $L_d := \partial L/\partial d = L\,G\,y/(1 - A - B)$, and thus, for any $a > 1$ and any $y$, $L_d(a, \hat d(a, y), y) = 0$. The discussion at the end of the proof of Lemma 2.2 guarantees, for any $y$, the existence of an intersection, $a^\circ \in (1, \hat a(y))$, between the two curves $d^*(\cdot, y)$ and $\hat d(\cdot, y)$. The only question that remains is the uniqueness of this intersection, and consequently of the maximum for $L$. Observe that, for any $y$,

$$\frac{d^2}{da^2}\, L(a, d^*(a), y) = L_{aa} + 2L_{ad}\,\frac{\partial d^*}{\partial a} + L_{dd}\left( \frac{\partial d^*}{\partial a} \right)^2 + L_d\, \frac{\partial^2 d^*}{\partial a^2} = -\frac{L\, y^2\, \Xi'(ay)(1 - A)}{1 - A - B}\, \frac{\partial d^*}{\partial a} + \frac{L\, y\, G_d}{1 - A - B}\left( \frac{\partial d^*}{\partial a} \right)^2, \tag{2.9}$$

which is generically negative, where we have used $G = 0$ and the conclusion of Lemma 2.3 to arrive at the second line. Thus we see that every intersection of $d^*(\cdot, y)$ and $\hat d(\cdot, y)$ is a local maximum of $L(\cdot, d^*(\cdot), y)$ as a function of the single variable $a$, and so it must be unique. □

3. Conclusions and future steps

We have shown a way to construct a lower bound to complement the Chernoff bound that is applicable generically, avoiding the restrictions on the moment generating function that characterized earlier inequalities of a similar nature. We were able to represent the bound as the solution of a two-dimensional nonlinear optimization problem, and proved the existence and uniqueness of the solution.

At this point it is natural to investigate the asymptotic properties (in the spirit of Borovkov (1996)) of the new lower bound, including its asymptotic gap from the Chernoff bound. It is reasonable to expect that one can classify moment generating functions relative to the resulting asymptotic gaps. While there are obvious examples with no gap (e.g. Gaussian) and some with a gap (Theodosopoulos, 1999), a complete classification is still out of reach.

Extensions to the multivariate case are another natural next step, along the lines of Barndorff-Nielsen and Klüppelberg (1999), and Iyengar and Mazumdar (1998) for the saddlepoint approximation. Furthermore, one may inquire whether the new lower bound, as well as the Chernoff bound, can be extended to situations where the moment generating function is not precisely known (Bagdasarov and Ostrovskii, 1996) or does not exist at all. The former case will be dealt with in a follow-up paper. More generally, the fact that, as shown in this paper, bilateral tail bounds are achievable with reference only to $\Xi(\cdot)$ lends support to the latter possibility. It is also plausible to replace the exponential function with other functions, more appropriate for different distributions. An investigation of this question remains open at the moment.

References

Bagdasarov, D.R., Ostrovskii, E.I., 1996. Reversion of Chebyshev's inequality. Theory Probab. Appl. 40, 737–742.
Bahadur, R., Rao, R.R., 1960. On deviations of the sample mean. Ann. Math. Statist. 31, 1015–1027.
Barndorff-Nielsen, O.E., Klüppelberg, C., 1999. Tail exactness of multivariate saddlepoint approximations. Scand. J. Statist. 26, 253–264.
Borovkov, A.A., 1996. Unimprovable exponential bounds for distributions of sums of a random number of random variables. Theory Probab. Appl. 40, 230–237.
Chernoff, H., 1952. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Statist. 23, 493–507.
Daniels, H.E., 1954. Saddlepoint approximations in statistics. Ann. Math. Statist. 25, 631–650.
Daniels, H.E., 1980. Exact saddlepoint approximations. Biometrika 67, 59–63.
Iyengar, S., Mazumdar, M., 1998. A saddle point approximation for certain multivariate tail probabilities. SIAM J. Sci. Comput. 19, 1234–1244.
Jensen, J.L., 1995. Saddlepoint Approximations. Oxford University Press, Oxford.
Stroock, D.W., 1993. Probability Theory. Cambridge University Press, Cambridge.
Theodosopoulos, T.V., 1999. Some remarks on the optimal level of randomization in global optimization. In: Pardalos, P., et al. (Eds.), Randomization Methods in Algorithm Design, AMS DIMACS Series, vol. 43.