Statistics & Probability Letters 49 (2000) 305–311

Non-uniform bounds for geometric approximation

M.J. Phillips, Graham V. Weinberg*,1

Department of Mathematics and Statistics, University of Melbourne, Parkville Vic. 3052, Australia

Received August 1999; received in revised form November 1999

* Corresponding author. E-mail address: [email protected] (G.V. Weinberg).
1 Research supported by an Australian Postgraduate Research Award.
Abstract

Crucial to the Stein–Chen method for distributional approximation is the estimation of differences of the solution to a Stein equation associated with the distributions being compared, and the usual approach has been to obtain uniform bounds on these differences. The purpose of this paper is to demonstrate, in the geometric case, that improvement can be obtained by taking non-uniform bounds. The results are illustrated with an application to Polya distribution convergence.
Keywords: Geometric approximation; Non-uniform bounds; Immigration–birth–death process; Stein–Chen method; Total variation distance; Polya distribution
1. Introduction

Two recent independent studies of geometric approximation via Stein's method have yielded useful results in applications. In Pekoz (1996) the focus is on measuring the error in the geometric approximation of a random variable counting the number of failures before the first success in a sequence of dependent Bernoulli trials. This was applied to Markov hitting time and sequence pattern applications. In Brown and Phillips (1999) the negative binomial approximation of a sum of indicator random variables is considered, and this is used to provide a bound on the rate of convergence of the Polya distribution. In the same spirit as that paper we will consider the geometric approximation of a sum of indicator random variables, and will present a new bound.

The main point of this work is to demonstrate an important idea that can result in improved bounds when using Stein's method. The approach of previous work has been to use analysis to obtain uniform bounds on the difference of the solution to the Stein equation associated with the approximations being measured. Both Pekoz (1996) and Brown and Phillips (1999) follow this method, parallelling the argument of Barbour et al. (1992), who present the Poisson analogue. Recent work by Xia (1999) introduced a probabilistic method to
obtain bounds on these differences of the Stein equation in the Poisson setting; by probabilistic we mean utilising explicitly the underlying stochastic process associated with the generator, a method first studied by Barbour (1988). It turns out that by adapting this probabilistic method developed in Xia (1999), useful non-uniform bounds on these functional differences can be extracted, and consequently improved bounds are possible. The argument in the case of geometric approximation is rather simple and directly demonstrates the improvements possible.

We begin by introducing the Stein–Chen method for geometric approximation, adapting the probabilistic method in Barbour (1988). The generator

Ah(j) := q(j+1)[h(j+1) − h(j)] + j[h(j−1) − h(j)]    (1)
is that of a one-dimensional immigration–birth–death process, with immigration rate q, birth rate q per particle already present and unit per capita death rate. A stochastic process with this generator has stationary distribution that of a geometric random variable with parameter p = 1 − q; we will denote this by Geo(p). The appropriate Stein equation is

f(j) − Geo(p)(f) = Ah(j)    (2)

for a suitable test function f, and we use the notation Geo(p)(f) := \sum_{j=0}^{\infty} f(j)q^j p and Geo(p){A} := \sum_{j \in A} q^j p. Phillips (1996) demonstrates that the function

h(j) = −\int_0^{\infty} E[f(Z_j(t)) − Geo(p)(f)]\,dt    (3)

is the well-defined solution to the Stein equation (2), where the process Z_j has generator (1), and Z_j(0) = j. Thus, if W is any integer-valued random variable, and we choose f(j) = I[j ∈ A],

|P(W ∈ A) − Geo(p){A}| = |E Ah(W)|.    (4)
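As an illustrative aside (not part of the original argument), the claim that Geo(p) with p = 1 − q is stationary for the generator (1) — equivalently, that \sum_j q^j p (Ah)(j) = 0 for every bounded h — is easy to confirm numerically. The following Python sketch does so; the truncation point, the value of q and the random test functions are arbitrary illustrative choices.

```python
import random

def generator_against_geometric(q, trunc=2000, trials=20, seed=0):
    """Illustrative check (not from the paper): evaluate sum_j q^j p * (Ah)(j)
    for random bounded h, where A is the immigration-birth-death generator (1).
    The sum should vanish because Geo(p), p = 1 - q, is stationary."""
    p = 1.0 - q
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        h = [rng.uniform(-1.0, 1.0) for _ in range(trunc + 2)]   # bounded test function
        total = 0.0
        for j in range(trunc):
            up = q * (j + 1) * (h[j + 1] - h[j])                 # immigration plus births
            down = j * (h[j - 1] - h[j]) if j > 0 else 0.0       # deaths
            total += q**j * p * (up + down)
        worst = max(worst, abs(total))
    return worst

print(generator_against_geometric(0.3))   # essentially zero, up to truncation and rounding
print(generator_against_geometric(0.8))
```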
Recall that for two probability measures μ and ν on the same probability space, the total variation metric is defined to be d_{TV}(μ, ν) := sup_A (μ(A) − ν(A)), where the supremum is over all measurable sets. Thus bounds on the right-hand side of (4) will provide bounds on the suitability of the geometric approximation of W under the total variation metric. The approach now is to write generator (1) in terms of a second-order difference Δ²h(f; n) := h(n+2) − 2h(n+1) + h(n), and to bound this difference.

2. Bounding the second-order difference

In Phillips (1996) it is shown that sup |Δ²h(f; n)| ≤ 1, where the supremum is over all n ∈ N and all test functions f : N → [0, 1]. We will construct the following non-uniform bound:

Lemma 1. For h the solution (3) to the Stein equation (2), and f : N → [0, 1],

|Δ²h(f; n)| ≤ \frac{1}{n+1}.    (5)
Proof. The proof relies upon the following embedding representation, adapted from Xia (1999). Let τ_{m,k} := inf{t ≥ 0 : Z_m(t) = k}. Then Eτ_{m,k} < ∞ and

h(k) = h(m) + E\int_0^{τ_{m,k}} [f(Z_m(t)) − Geo(p)(f)]\,dt.    (6)
The proof of (6) is identical to the corresponding result in Xia (1999) because it only depends upon the Markovian structure of the process Z. The key step is to write

h_u(m) := −\int_0^{u} E[f(Z_m(t)) − Geo(p)(f)]\,dt
        = −E\int_0^{u ∧ τ_{m,k}} [f(Z_m(t)) − Geo(p)(f)]\,dt − \int_0^{\infty} \int_0^{u − s ∧ u} E[f(Z_k(t)) − Geo(p)(f)]\,dt\,dF(s),    (7)
where F is the cumulative distribution function of τ_{m,k}. Applying the dominated convergence theorem to (7) completes the proof of (6).

Applying (6) to Δ²h(f; n) we obtain

Δ²h(f; n) = E\int_0^{τ_{n,n+2}} [f(Z_n(t)) − Geo(p)(f)]\,dt − 2E\int_0^{τ_{n,n+1}} [f(Z_n(t)) − Geo(p)(f)]\,dt
          = E\int_0^{τ_{n+1,n+2}} [f(Z_{n+1}(t)) − Geo(p)(f)]\,dt − E\int_0^{τ_{n,n+1}} [f(Z_n(t)) − Geo(p)(f)]\,dt
          = (d_{n+1}(f) − d_n(f)) − (d_{n+1} − d_n)\,Geo(p)(f),    (8)

where we have used the strong Markov property, and d_m(g) := E\int_0^{τ_{m,m+1}} g(Z_m(t))\,dt and d_m := d_m(1) = Eτ_{m,m+1}.

We now proceed to establish a recurrence relation for d_{n+1}(g). Let σ be the first jump time of Z_{n+1}, so that σ = τ_{n+1,n} ∧ τ_{n+1,n+2}. Due to the Markovian structure of Z_{n+1} it follows that the first jump time is also the minimum of three independent exponential variables, corresponding to whether the first event is due to an immigration, a birth of a new particle from one of the original n+1 parents, or a death of one of these initial parents. Applying this, together with conditioning on σ and using the strong Markov property, we obtain

d_{n+1}(g) = g(n+1)Eσ + E\Big[\int_σ^{τ_{n+1,n+2}} g(Z_{n+1}(t))\,dt \,\Big|\, σ = τ_{n+1,n+2}\Big] P(σ = τ_{n+1,n+2}) + E\Big[\int_σ^{τ_{n+1,n+2}} g(Z_{n+1}(t))\,dt \,\Big|\, σ = τ_{n+1,n}\Big] P(σ = τ_{n+1,n})
           = \frac{g(n+1)}{q + (n+1)(q+1)} + \frac{n+1}{q + (n+1)(q+1)}\, E\int_0^{τ_{n,n+2}} g(Z_n(t))\,dt
           = \frac{g(n+1)}{q + (n+1)(q+1)} + \frac{n+1}{q + (n+1)(q+1)}\, [d_n(g) + d_{n+1}(g)],
from which we can deduce that

d_{n+1}(g) = \frac{g(n+1)}{q(n+2)} + \frac{n+1}{q(n+2)}\, d_n(g)    (9)

with initial condition d_0(g) = g(0)q^{−1}. By solving the recurrence relation (9) we obtain the solution

d_n(g) = \sum_{j=0}^{n} \frac{g(j)}{q^{\,n+1−j}(n+1)}.    (10)
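As a small sanity check (not part of the argument), iterating (9) numerically from the initial condition d_0(g) = g(0)/q reproduces the closed form (10); in the Python sketch below the choices of q and of the bounded function g are arbitrary.

```python
def d_closed_form(n, g, q):
    """Closed form (10): d_n(g) = sum_{j=0}^{n} g(j) / (q^(n+1-j) * (n+1))."""
    return sum(g(j) / (q ** (n + 1 - j) * (n + 1)) for j in range(n + 1))

def d_by_recurrence(n, g, q):
    """Iterate recurrence (9) starting from d_0(g) = g(0)/q."""
    d = g(0) / q
    for k in range(n):                      # produces d_1(g), ..., d_n(g)
        d = g(k + 1) / (q * (k + 2)) + (k + 1) * d / (q * (k + 2))
    return d

q = 0.6
g = lambda j: 1.0 / (1.0 + j)               # an arbitrary bounded test function
for n in range(8):
    assert abs(d_closed_form(n, g, q) - d_by_recurrence(n, g, q)) < 1e-12
```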
Observe that by choosing g = 1 and applying (10),

d_{n+1} − d_n = \frac{1}{(n+2)q^{n+2}} \sum_{j=0}^{n+1} q^j − \frac{1}{(n+1)q^{n+1}} \sum_{j=0}^{n} q^j = \frac{1}{(n+1)(n+2)q^{n+2}} \Big[\, n+1 − \sum_{j=1}^{n+1} q^j \Big],    (11)

from which we deduce that d_{n+1} − d_n ≥ 0. Furthermore, after some routine manipulations it can be shown that

d_{n+1}(f) − d_n(f) = \frac{n+1−q(n+2)}{(n+1)(n+2)q^{n+2}} \sum_{j=0}^{n} f(j)q^j + \frac{f(n+1)}{q(n+2)}.    (12)
Observe that since f is non-negative,

Geo(p)(f) = \sum_{j=0}^{\infty} f(j)q^j p ≥ \sum_{j=0}^{n+1} f(j)q^j p,    (13)

and so applying (11)–(13) to (8), together with the fact that the d_n are increasing, yields

Δ²h(f; n) ≤ \Big( \frac{−q + p\sum_{j=1}^{n+1} q^j}{(n+1)(n+2)q^{n+2}} \Big) \sum_{j=0}^{n} f(j)q^j + \frac{f(n+1)}{q(n+1)(n+2)} \Big[ q(n+1) + p\sum_{j=1}^{n+1} q^j \Big] ≤ \frac{1}{n+1}.    (14)

Furthermore, note that Δ²h(1−f; n) = −Δ²h(f; n), and so

Δ²h(f; n) ≥ −\frac{1}{n+1}.    (15)

Combining (14) and (15) completes the proof.
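The representation (8), the closed form (10) and the bound of Lemma 1 can also be checked numerically. The Python sketch below (an illustration, not part of the proof) computes Δ²h(f; n) through (8) and (10) for randomly generated test functions f : N → [0, 1] with finite support, and verifies that the non-uniform bound (5) holds; the values of q, the support size and the number of trials are arbitrary choices.

```python
import random

def d_n(n, f, q):
    """Formula (10): d_n(f) = sum_{j=0}^{n} f(j) / (q^(n+1-j) * (n+1))."""
    return sum(f(j) / (q ** (n + 1 - j) * (n + 1)) for j in range(n + 1))

def second_difference(n, f, q, support):
    """Delta^2 h(f; n) via representation (8); `support` is where f vanishes."""
    p = 1.0 - q
    geo_f = sum(f(j) * q**j * p for j in range(support))   # Geo(p)(f), exact for finitely supported f
    one = lambda j: 1.0
    return (d_n(n + 1, f, q) - d_n(n, f, q)) \
        - (d_n(n + 1, one, q) - d_n(n, one, q)) * geo_f

rng = random.Random(1)
for q in (0.3, 0.5, 0.7):
    for n in range(10):
        for _ in range(100):
            vals = [rng.random() for _ in range(50)]        # f : N -> [0, 1], zero beyond 50
            f = lambda j, v=vals: v[j] if j < len(v) else 0.0
            delta2 = second_difference(n, f, q, support=len(vals))
            assert abs(delta2) <= 1.0 / (n + 1) + 1e-7      # the non-uniform bound (5)
```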
3. The main result

In this section, we apply bound (5) to the generator component in the Stein equation to derive an improved bound on the geometric approximation of a sum of indicator random variables. The following is the main result:
Theorem 1. Suppose W = \sum_{j=1}^{m} X_j is a sum of indicator random variables with π_j = EX_j, and that Z =_d Geo(p) is a geometric random variable independent of W, with the same mean as W. Furthermore suppose W_j^* =_d (W − X_j) | {X_j = 1}. Then

d_{TV}(L(W), Geo(p)) ≤ (2−p) \sum_{j=1}^{m} π_j E\Big[ \frac{|W + Z − W_j^*|}{W_j^* + 1} \Big]    (16)
                     ≤ (2−p) \sum_{j=1}^{m} π_j E|W + Z − W_j^*|.    (17)
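Before turning to the proof, here is a small exact sanity check of (16) in the simplest conceivable setting: independent indicators X_j ~ Bernoulli(π) with the natural coupling W_j^* = W − X_j, so that W + Z − W_j^* = X_j + Z ≥ 0. This example is purely illustrative (it is not from the paper, and the geometric approximation of a binomial sum is of course poor, so the bound is far from sharp); the point is only that the right-hand side of (16) dominates the exact total variation distance. The parameter values and the geometric truncation point are arbitrary.

```python
from math import comb

def bernoulli_sum_vs_geometric(m, pi, z_trunc=600):
    """Illustrative check of (16): W = X_1 + ... + X_m with independent
    X_j ~ Bernoulli(pi), coupled naturally via W_j* = W - X_j, so that
    W + Z - W_j* = X_j + Z >= 0."""
    mean_w = m * pi
    p = 1.0 / (1.0 + mean_w)            # Geo(p) has mean q/p = EW
    q = 1.0 - p

    geo = lambda k: q**k * p
    binom = [comb(m, k) * pi**k * (1 - pi)**(m - k) for k in range(m + 1)]

    # exact total variation distance (the geometric mass beyond m enters as a tail term)
    tv = 0.5 * (sum(abs(binom[k] - geo(k)) for k in range(m + 1)) + q**(m + 1))

    # E[(X_j + Z) / (W_j* + 1)]: X_j ~ Bernoulli(pi), W_j* ~ Bin(m-1, pi), Z ~ Geo(p), all independent
    star = [comb(m - 1, k) * pi**k * (1 - pi)**(m - 1 - k) for k in range(m)]
    expectation = sum(px * star[w] * geo(z) * (x + z) / (w + 1)
                      for x, px in ((0, 1 - pi), (1, pi))
                      for w in range(m)
                      for z in range(z_trunc))

    bound = (2 - p) * m * pi * expectation   # right-hand side of (16)
    return tv, bound

tv, bound = bernoulli_sum_vs_geometric(m=20, pi=0.1)
print(tv, bound)
assert tv <= bound                           # Theorem 1 in this toy case
```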
Proof. The proof essentially follows that in Brown and Phillips (1999), except that bound (5) is used instead of the supremum of second-order differences. Observe that by adding and subtracting the term EW[h(W + Z + 1) − h(W + Z)] inside the expectation, and recalling that Z has the same mean as W (so that EW = EZ = q/p), we obtain after some simplifications

E[Ah(W)] = qE\big\{ W[h(W+1) − h(W)] − EW\,[h(W + Z̃ + 1) − h(W + Z̃)] \big\} + E\big\{ EW\,[h(W + Z + 1) − h(W + Z)] − W[h(W) − h(W−1)] \big\},    (18)

where Z̃ =_d Z + 1. By a simple conditioning argument it can be shown that for a function k,

E[X_j k(W)] = EX_j\, E[k(W) | X_j = 1],    (19)

and so applying (19) to (18), together with \sum_{j=1}^{m} π_j = EW, results in

E[Ah(W)] = q \sum_{j=1}^{m} π_j E\big\{ [h(W_j^*+2) − h(W_j^*+1)] − [h(W+Z+2) − h(W+Z+1)] \big\} + \sum_{j=1}^{m} π_j E\big\{ [h(W+Z+1) − h(W+Z)] − [h(W_j^*+1) − h(W_j^*)] \big\}.    (20)
The result follows by writing the differences of the function h in (20) as a telescoping sum of second-order differences h(n+2) − 2h(n+1) + h(n) and repeatedly applying the smoothness estimate (5).

The second bound (17) is the corresponding result from Brown and Phillips (1999), and clearly the first bound (16) is an improvement. In applications the hope is that the denominator in (16) introduces lower-order terms into the bounds. Note that under some conditions it may be possible that

E\Big[ \frac{|W + Z − W_j^*|}{W_j^* + 1} \Big] ≤ E|W + Z − W_j^*|\; E\Big[ \frac{1}{W_j^* + 1} \Big].    (21)

One would expect (21) to hold in applications where W + Z − W_j^* and the reciprocal of W_j^* are negatively correlated, or when they are independent. In these cases the improvement could be quite substantial, and the term involving the mean of the reciprocal of W_j^* + 1 can under certain circumstances be calculated using the following general result.

Lemma 2. If W = \sum_{j=1}^{m} X_j is a sum of exchangeable indicators, and W^* has the common distribution of W_j^*, j = 1, …, m, then

E\Big[ \frac{1}{W^* + 1} \Big] = \frac{1 − P(W = 0)}{EW}.    (22)
Proof. The distribution of W^* can be written directly in terms of that for W as

P(W^* = k) = \frac{(k+1)\,P(W = k+1)}{EW}

(by applying Bayes' rule, for example), for which the probability generating function is G^*(s) = G'(s)/EW, where G(s) is the p.g.f. of W. The expectation is then

E\Big[ \frac{1}{W^* + 1} \Big] = \int_0^1 G^*(s)\,ds = \frac{G(1) − G(0)}{EW} = \frac{1 − P(W = 0)}{EW}.
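For instance, when the X_j are independent (hence exchangeable) Bernoulli(π) indicators, W^* has the Bin(m−1, π) distribution and (22) can be confirmed directly; a short Python sketch (an illustration, with arbitrary parameter values):

```python
from math import comb

def check_lemma_2(m, pi):
    """Illustrative check of (22) for W = X_1 + ... + X_m with independent
    X_j ~ Bernoulli(pi); here W* = (W - X_j) | {X_j = 1} ~ Bin(m-1, pi)."""
    star = [comb(m - 1, k) * pi**k * (1 - pi)**(m - 1 - k) for k in range(m)]
    lhs = sum(star[k] / (k + 1) for k in range(m))        # E[1 / (W* + 1)]
    p0 = (1 - pi)**m                                      # P(W = 0)
    rhs = (1 - p0) / (m * pi)                             # (1 - P(W = 0)) / EW
    return lhs, rhs

print(check_lemma_2(12, 0.25))    # the two values agree
```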
4. Application

The results described by Theorem 1 will hold for any realisations of the random variables having the required distributions, and to apply them we want to find a coupling of such random variables for which the bound is minimised. Simplification is possible if this can be constructed such that the expression within the absolute value sign in Eq. (17) has constant sign; when this sign is always positive the summation term reduces to just the difference of variances

\sum_{j=1}^{m} π_j E|W + Z − W_j^*| = \frac{1−p}{p^2} − Var\,W.    (23)
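As a quick check of (23), again in the illustrative independent-Bernoulli coupling used after Theorem 1 (where W + Z − W_j^* = X_j + Z is always non-negative, so the absolute values can be dropped):

```python
def check_eq_23(m, pi):
    """Illustrative check of (23) in the independent-Bernoulli coupling of the
    earlier sketch, where W + Z - W_j* = X_j + Z >= 0."""
    mean_w = m * pi                      # EW = EZ = q/p
    p = 1.0 / (1.0 + mean_w)
    var_w = m * pi * (1 - pi)
    lhs = m * pi * (pi + mean_w)         # sum_j pi_j E[X_j + Z]
    rhs = (1 - p) / p**2 - var_w
    return lhs, rhs

print(check_eq_23(15, 0.2))              # the two sides coincide
```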
We illustrate by demonstrating an explicit coupling to obtain a bound on the rate of convergence of the Polya distribution to the geometric. The Polya distribution has applications to models of epidemics and in genetics, as well as in other fields, and is typically introduced in terms of an urn scheme. Here we use an alternative construction of the distribution from which a natural coupling for this problem also arises. Consider the random assignment of m balls into d compartments such that all partitions have equal probability, and let W be the number of balls in the first (say) r compartments. The distribution of W is given by

P(W = k) = \binom{r+k−1}{k} \binom{d−r+m−k−1}{m−k} \Big/ \binom{d+m−1}{m},    (24)
and we denote this form of the Polya distribution by W =_d Pya(d, m, r); see also Feller (1968) for discussion of models generating the distribution. In particular, as d, m → ∞ such that the mean m/d tends to a constant, Pya(d, m, 1) converges to a geometric distribution with parameter d/(d + m). As described in Brown and Phillips (1999), W can be written as a sum of exchangeable indicators, with the common distribution of the W_j^*, as defined for Theorem 1, being Pya(d + 1, m − 1, r + 1).

For the coupling, assume d > 1 and m > 1 are fixed. Start with two counters Z and W, both initially zero, an array of d + m − 1 locations and a single urn containing d + m elements: m − 1 white balls, one black ball and d dividers. Draw one element at random and, if it is not the black ball, place it in the first location. If it is any ball, increment Z by one. Repeat for subsequent locations until the first divider is selected, choosing at each draw from all the d + m elements, both in the urn and in the array. No change is made to the array if an already placed ball is chosen, but Z is still incremented. Consider the first divider as the first element of the second stage and continue drawing, now only from the elements not already chosen in this stage. The counter W is incremented for each ball selected. Elements drawn from the urn, other than the black ball, are put in the next location, but there is no re-location of balls already placed in stage one which are re-selected. Continue until the second divider is placed.

From the construction it can be seen that Z is a geometric random variable with success probability d/(d + m). The process of placing the balls and dividers realises a random partition of the m − 1 white balls into d + 1 groups; from the previous discussion, the number W^* placed up to the second divider in the array
will thus have a Pya(d + 1, m − 1, 2) distribution, as required. In the second stage a partition is also effected of all m balls into the d groups remaining, the number in the first of these being recorded by W, which therefore has distribution Pya(d, m, 1). Clearly Z and W are independent.

In this scheme Z is greater than or equal to the number of balls placed up to the first divider, and W is greater than or equal to the number placed between the first and the second, the excess W + Z − W^* being the number of times the black ball is drawn or a white ball is re-selected. So the coupling of W + Z with W^* is of the required form for the application of Theorem 1 and Eq. (23). It is clear that the number of 'excess' balls drawn should increase with the number placed in the array, and a relatively simple argument, conditioning on the location of the first divider in the array, shows that P(W + Z − W^* ≥ x | W^* = y) is increasing in y for any x. From this it follows that W + Z − W^* and the reciprocal of W^* are negatively correlated, and Eq. (21) applies. For the distribution in this particular problem,

EW = \frac{m}{d},    Var\,W = \frac{m(d+m)(d−1)}{d^2(d+1)}.
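The pmf (24) and the moment formulas above are straightforward to confirm numerically, and one can also watch the total variation distance to Geo(d/(d+m)) shrink as d and m grow together. The Python sketch below is an illustration with arbitrary parameter choices.

```python
from math import comb

def polya_pmf(d, m, r):
    """pmf of Pya(d, m, r) as in (24): balls in the first r of d compartments."""
    denom = comb(d + m - 1, m)
    return [comb(r + k - 1, k) * comb(d - r + m - k - 1, m - k) / denom
            for k in range(m + 1)]

def check_polya(d, m):
    """Illustrative check: normalisation, mean m/d, the variance formula above,
    and the total variation distance to the limiting Geo(d/(d+m))."""
    pmf = polya_pmf(d, m, 1)
    mean = sum(k * pk for k, pk in enumerate(pmf))
    var = sum(k * k * pk for k, pk in enumerate(pmf)) - mean**2
    assert abs(sum(pmf) - 1) < 1e-12
    assert abs(mean - m / d) < 1e-9
    assert abs(var - m * (d + m) * (d - 1) / (d**2 * (d + 1))) < 1e-9

    p = d / (d + m)
    q = 1 - p
    tv = 0.5 * (sum(abs(pmf[k] - q**k * p) for k in range(m + 1)) + q**(m + 1))
    return tv

for d, m in [(10, 10), (50, 50), (200, 200)]:
    print(d, m, check_polya(d, m))   # the distance shrinks as d and m grow together
```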
The resulting bound, applying (16), (21)–(23) and noting that Pya(d, m, 1){0} = (d − 1)/(d + m − 1), is

\frac{m}{(d+m)(d+m−1)} ≤ d_{TV}\Big( Pya(d, m, 1), Geo\Big(\frac{d}{d+m}\Big) \Big) ≤ \frac{2m(d+2m)}{d(d+1)(d+m−1)},

where the lower bound is evaluated as |Pya(d, m, 1){0} − Geo(d/(d+m)){0}|.

5. Concluding remarks

The main conclusion from this work is that the usual approach in applications of Stein's method, where uniform bounds on the second-order difference are used, may in fact be unnecessarily removing useful terms from the final bounds. As the application shows, an improvement by a factor of the order of the mean may be possible. We observe that the method presented here can be extended not only to negative binomial approximations but also possibly to a wider class of probability approximations, where the underlying reference distribution can be characterised in terms of birth and death parameters. Work is currently in progress to extend the above arguments to the case of the negative binomial, and to this wider class of random variable approximations.

References

Barbour, A.D., 1988. Stein's method and Poisson process convergence. J. Appl. Probab. 25A, 175–184.
Barbour, A.D., Holst, L., Janson, S., 1992. Poisson Approximation. Oxford University Press, Oxford.
Brown, T.C., Phillips, M.J., 1999. Negative binomial approximation with Stein's method. Methodol. Comput. Appl. Probab., to appear.
Feller, W., 1968. An Introduction to Probability Theory and its Applications, Vol. I. Wiley, New York.
Pekoz, E.A., 1996. Stein's method for geometric approximation. J. Appl. Probab. 33, 707–713.
Phillips, M.J., 1996. Ph.D. Thesis, Department of Statistics, University of Melbourne, unpublished.
Xia, A., 1999. A probabilistic proof of Stein's factors. J. Appl. Probab. 36, 287–290.