Statistics & Probability Letters 45 (1999) 141 – 147
Confidence intervals from simulations based on 4-independent random variables

Paul Kabaila ∗

Department of Statistical Science, La Trobe University, Bundoora, Victoria 3083, Australia

Received February 1998; received in revised form January 1999
Abstract

Using an algorithm of Joffe (Ann. Probab. 2 (1974) 161–162) we generate a finite sequence of 4-independent random variables. Using this sequence we find a confidence interval with coverage probability exceeding a value close to 1 − α for θ = E{g(U)}, where U = (U_1, …, U_s) with U_1, …, U_s independent and identically U(0,1) distributed random variables and g(u) ∈ {0, 1} for all u ∈ R^s. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: k-independent random variables; Confidence interval
1. Introduction

Our aim is to calculate the probability θ = P{T ∈ A}, where the statistic T is a continuous random p-vector and A ⊂ R^p. In the context of continuous data, many probabilities of statistical importance can be put in this form: P-values, Type I and II errors for hypothesis tests and coverage probabilities for confidence limits, just to name a few. Suppose that the calculation of θ is analytically intractable, so that we seek a numerical approximation to θ. Also suppose that T is such a complicated function of the data that its probability distribution (for given true parameter values) is not readily calculable. For example, when T is a signed root likelihood ratio statistic with Bartlett correction this will usually be the case. We suppose, however, that T can readily be expressed in the form T = a(U), where a : R^s → R^p and U = (U_1, …, U_s) with U_1, …, U_s independent and identically distributed U(0,1) random variables. For example, suppose that T is a statistic based on s independent and identically distributed continuous random variables, each with distribution function F(·), where F^{−1}(u) is defined for 0 < u < 1. Since F(·) is the distribution function of F^{−1}(U_1) for U_1 a U(0,1) distributed random variable, T can readily be expressed in the form T = a(U). To discuss the numerical approximation of θ it is convenient to express θ as follows. Define g(·) = χ_A(a(·)), where χ_A(·)
∗ Visiting: Department of Statistics, University of Oxford, UK. E-mail address: [email protected] (P. Kabaila).
0167-7152/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0167-7152(99)00053-X
is the indicator function of the set A, i.e. χ_A(u) = 1 for u ∈ A and χ_A(u) = 0 for u ∉ A. Thus

θ = E{g(U)} = ∫_0^1 ··· ∫_0^1 g(u) du.
The analysis in the paper will be based on the quasi-Monte Carlo approximation of θ by

λ = (1/m) Σ_{i=1}^m g(y(i)),
where y(1), …, y(m) is a set of points in the unit cube in R^s. We suppose that m and y(1), …, y(m) have been chosen judiciously under the conflicting aims of (a) making λ a good approximation to θ and (b) making m as small as possible. A detailed discussion of how this choice is made is a topic outside the scope of the paper. However, we make the following observations about that choice. The fact that g(·) is discontinuous implies that to guarantee a given level of accuracy of approximation of θ by λ, m is necessarily very large by comparison with the case that g(·) is smooth in some sense; see e.g. Haber (1970). For the particular case that {u: u ∈ R^s, g(u) = 1} is a convex set, Theorem 1 of Schmidt (1975) provides some guidance as to how large m needs to be chosen to guarantee a given level of accuracy of approximation of θ by λ. Note that the approximator λ is very similar to the approximator obtained using what Fang and Wang (1994) call the “indicator function method” applied to the above multiple integral expression for θ.

We suppose that the calculation of an approximation to λ is subject to the following constraint.

Constraint 1. The computational burden of evaluating g(u) for a single value of u is such that it is only practical to evaluate g(u) for far fewer than m values of u.

In statistical practice, values of s exceeding 30 and functions a(·) which are computationally burdensome to evaluate, arising for example from bootstrapping, are not uncommon. Thus it is not uncommon to encounter Constraint 1 in statistical practice. Constraint 1 rules out the direct use of the approximator λ. We therefore look to Monte Carlo simulation to provide an estimate of λ which is then used as an estimate of θ. Since we are using simulation, our purpose is not only to find an estimate of θ but also to find a confidence interval for θ with coverage probability exceeding a value close to 1 − α (where α has been specified beforehand, commonly α = 0.05).
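As a toy illustration of the indicator-function approximator (my own example, not from the paper): take s = 2, let g be the indicator of the quarter disc, so that θ = π/4, and use a regular midpoint lattice as the point set y(1), …, y(m).

```python
import math

def quasi_mc_theta(g, points):
    """Indicator-function approximator: the fraction of the point set
    y(1), ..., y(m) at which g equals 1."""
    return sum(g(y) for y in points) / len(points)

# Toy example (an assumption for illustration): s = 2 and
# g(u) = 1 on the quarter disc u1^2 + u2^2 <= 1, so theta = pi/4.
lattice = [((i + 0.5) / 200, (j + 0.5) / 200)
           for i in range(200) for j in range(200)]
g = lambda u: 1 if u[0] ** 2 + u[1] ** 2 <= 1.0 else 0
lam = quasi_mc_theta(g, lattice)  # lambda, approximating theta = pi/4
```

For this smooth-boundary toy problem a 200 × 200 lattice already gives λ within about 0.01 of θ; the point of Constraint 1 below is that in realistic problems each evaluation of g is far too expensive for m this large.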
One method of simulation which, under Assumption 1 described below, fulfils that purpose is the following. Suppose that I_1, …, I_n are independent and identically distributed discrete random variables, each uniformly distributed over {1, …, m}. Throughout the paper we will suppose that n > 4. We form

W_1 = Σ_{i=1}^n g(y(I_i))     (1)
and estimate λ by λ̃ = W_1/n. Note that W_1 has a Binomial(n, λ) distribution and there are standard procedures for finding a confidence interval for λ with coverage probability ≥ 1 − α. It is straightforward to show that such a confidence interval for λ will also be a confidence interval for θ with coverage probability exceeding a value close to 1 − α when the following assumption holds true.

Assumption 1. n ≫ 1 and |θ − λ| ≪ standard deviation of λ̃, i.e. |θ − λ| ≪ √(λ(1 − λ)/n).

Now observations of truly random variables may be made using electronic noise, radioactive decay or other truly random sources; see e.g. Maddocks et al. (1972) and references therein. We make a sharp distinction between truly random and pseudorandom variables. For a given value of the seed, the latter are calculated using a deterministic formula. We suppose that the observation of truly random variables is subject to the following constraint.
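This standard simulation can be sketched as follows (the function name and toy g are my own; the normal approximation to the Binomial(n, λ) interval stands in for the "standard procedures" mentioned above):

```python
import random
from statistics import NormalDist

def simulate_ci(g, points, n, alpha, rng):
    """Draw I_1,...,I_n uniformly from {1,...,m}, form W_1 as in
    Eq. (1), estimate lambda by W_1/n, and attach a normal-approximation
    binomial confidence interval with coverage close to 1 - alpha."""
    m = len(points)
    w1 = sum(g(points[rng.randrange(m)]) for _ in range(n))
    lam_tilde = w1 / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * (lam_tilde * (1 - lam_tilde) / n) ** 0.5
    return lam_tilde, (lam_tilde - half, lam_tilde + half)

rng = random.Random(0)                      # seeded for reproducibility
points = [i / 1000 for i in range(1000)]    # a toy point set, m = 1000
g = lambda u: 1 if u < 0.3 else 0           # toy g, so lambda = 0.3
est, (lo, hi) = simulate_ci(g, points, n=10000, alpha=0.05, rng=rng)
```

Note that this sketch uses a pseudorandom generator for convenience; the point of Constraint 2 below is precisely that the I_i are meant to be truly random, which is what makes n draws impractical.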
Constraint 2. The making of observations of truly random variables has a cost associated with it, so that it is practical to make observations of I_1, …, I_4 only.

In computational practice this is not an uncommon constraint. Constraint 2 makes the method of simulation described above inapplicable, since we will typically want the summation (1) that defines W_1 to involve more than just a handful of terms. We will come back to the confidence interval based on λ̃ later, when we use it as a yardstick for the new confidence interval based on the new method of simulation.

In Section 2 we introduce a new method of simulation which is based on a sequence of random variables K_1, …, K_n. These random variables are identically uniformly distributed over {1, …, m} and are 4-independent, i.e. any set of 4 of these random variables are independent. The basis of their generation is an algorithm of Joffe (1974). The new simulation is seeded by (I_1 − 1, …, I_4 − 1). We then form

W = Σ_{i=1}^n g(y(K_i))     (2)
and estimate λ by λ̂ = W/n. As a consequence of the 4-independence of the K_i’s, the following assumption is satisfied by W.

Assumption 2. E{W^k} = E{X^k} for k = 1, …, 4, where X ∼ Binomial(n, λ).

In Section 3 we describe a confidence interval for λ based on λ̂ which has coverage probability ≥ 1 − α. As proved in Appendix B, under Assumption 1 this is also a confidence interval for θ with coverage probability exceeding a value close to 1 − α.

In Section 4 we compare a confidence interval for θ based on λ̃ (based, in turn, on W_1 given by Eq. (1)) with the confidence interval for θ based on λ̂ (calculated using the new method of simulation) but with n set equal to n₀. We show that for large n and n₀ these confidence intervals will have almost identical widths provided that n₀/n takes a certain value which depends on α. This value is approximately 2 for α in the range [0.02, 0.1]. In other words, for n and n₀ large and α in this range, the confidence interval using the new method of simulation with n set to twice the value of n for the confidence interval based on λ̃ will lead to approximately equally “accurate” confidence intervals. By adopting the former confidence interval we, in effect, avoid the generation of n − 4 observations of the I_i’s, but at the cost of requiring the evaluation of g(u) for about twice as many values of u as the latter.

We illustrate this comparison with an example. Suppose that m = 10^30 and that evaluating g(u) for 10^5 values of u takes 1 h of computer time. Clearly, Constraint 1 applies in this case. Two procedures which lead to confidence intervals of approximately equal width are the following:
(a) Make observations of I_1, …, I_4 and calculate the confidence interval based on λ̂, for n = 2 × 10^5. This requires the calculation of g(u) for 2 × 10^5 values of u, taking 2 h of computer time.
(b) Make observations of I_1, …, I_n where n = 10^5 and calculate the confidence interval based on λ̃, for n = 10^5.
This requires the calculation of g(u) for 10^5 values of u, taking 1 h of computer time.

The first of these procedures requires us to wait 2 h instead of 1 h for the result. However, this procedure saves us the burden of making observations of approximately 10^5 independent truly random variables, each uniformly distributed on {1, …, 10^30}. For many computer users, access to observations of truly random variables made using electronic noise, radioactive decay or some other truly random source is so limited as to make this burden unacceptably large.

2. The new method of simulation

The new method of simulation uses the random variables X_1, X_2, …, X_p defined according to the following algorithm of Joffe (1974). Suppose that p is a prime and define X_1, X_2, X_3, X_4 to be independent random
variables, each uniformly distributed on {0, 1, …, p − 1}. Then let

X_l = X_1 ⊕ lX_2 ⊕ l²X_3 ⊕ l³X_4   for l = 5, …, p,
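This construction can be sketched directly (function names are mine; ⊕ is addition modulo p, as the text explains, so the whole expression reduces to ordinary modular arithmetic; in practice the seed (X_1, …, X_4) would come from a truly random source):

```python
from itertools import product

def joffe_sequence(p, seed):
    """Joffe's (1974) construction: extend a seed (X1, X2, X3, X4),
    uniform on {0,...,p-1}, to X_1,...,X_p via
    X_l = X1 + l*X2 + l^2*X3 + l^3*X4 (mod p) for l = 5,...,p."""
    x1, x2, x3, x4 = seed
    xs = [x1, x2, x3, x4]
    for l in range(5, p + 1):
        xs.append((x1 + l * x2 + l ** 2 * x3 + l ** 3 * x4) % p)
    return xs

def four_wise_uniform(p, positions):
    """Brute-force check of 4-independence for a small prime p: over
    all p**4 seeds, the 4-tuple of X_l at the given (1-based) positions
    must hit every value in {0,...,p-1}^4 exactly once."""
    counts = {}
    for seed in product(range(p), repeat=4):
        xs = joffe_sequence(p, seed)
        t = tuple(xs[i - 1] for i in positions)
        counts[t] = counts.get(t, 0) + 1
    return len(counts) == p ** 4 and set(counts.values()) == {1}
```

For the paper's setting, K_i = X_i + 1 then indexes the point set {1, …, p}. The exhaustive check for, say, p = 11 confirms the 4-wise uniformity that Joffe's theorem guarantees.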
where ⊕ denotes addition modulo p. According to the theorem of Joffe (1974), the random variables X_1, X_2, …, X_p are identically distributed and are 4-independent. We view the truly random vector (X_1, X_2, X_3, X_4) as being the seed for Joffe’s algorithm. Note that the analysis of pseudorandom number generators under seeding by truly random variables is a standard approach in the cryptography literature; see e.g. Lagarias (1993) and references therein.

We also suppose that m and the prime p have been chosen such that m = p. We define K_i = X_i + 1 for i = 1, …, n (n < p) and observe that K_1, …, K_n are identically distributed 4-independent random variables. It is then straightforward to show that Assumption 2 is satisfied by W, which is defined by Eq. (2).

Finally, we compare the generation of observations of I_1, …, I_n with the generation of observations of K_1, …, K_n from the point of view of the burdens they place on the generator of observations of truly random variables. Note that (X_1, …, X_4) and (I_1 − 1, …, I_4 − 1) have identical distributions. We may thus seed the new method of simulation by setting (X_1, …, X_4) equal to (I_1 − 1, …, I_4 − 1).

3. The new confidence interval based on the new method of simulation

We now describe the new method of finding a confidence interval for λ, based on the new method of simulation described in Section 2, with coverage probability ≥ 1 − α. Define

a(ψ, δ) = δψ + (1 − δ)/4 = 1/4 − δ(λ − 1/2)²,     (3)

where δ is a parameter satisfying δ ∈ [0, 1] and ψ = λ(1 − λ). Since λ ∈ [0, 1], ψ ∈ [0, 1/4]. Define ψ_1 = λ_1(1 − λ_1). As shown in Appendix A, the following is a confidence interval for λ with coverage probability ≥ 1 − α:

H = {λ_1 : n(λ̂ − λ_1)² ≤ c*(0)·a(ψ_1, δ*)},

where c*(0) and δ* are given by Eqs. (A.3) and (A.4) of this appendix, respectively. Remember, λ̂ is calculated using the new method of simulation. Now H = {λ_1 : a_1λ_1² − a_2λ_1 + a_3 ≤ 0}, where
a_1 = 1 + δ*c*(0)/n,
a_2 = 2λ̂ + δ*c*(0)/n,
a_3 = λ̂² + (c*(0)/4n)·(δ* − 1).

Thus,

H = [ (a_2 − √(a_2² − 4a_1a_3))/(2a_1),  (a_2 + √(a_2² − 4a_1a_3))/(2a_1) ].
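Numerically, H is obtained from λ̂ once c*(0) and δ* (Eqs. (A.3) and (A.4) of Appendix A) are in hand. A sketch (function names are mine; the illustrative values use the n = 2 × 10^5 from the example in Section 1 with an arbitrary λ̂):

```python
def c_star_0(n, alpha):
    # Eq. (A.3): c*(0) = 1 + (2(1-alpha)/alpha)^{1/2} (1 - 1/n)^{1/2}
    return 1 + (2 * (1 - alpha) / alpha) ** 0.5 * (1 - 1 / n) ** 0.5

def delta_star(n, alpha):
    # Eq. (A.4): delta* = 1 - [2^{-1/2} ((1-alpha)/alpha)^{-1/2}
    #                          (n(n-1))^{1/2} + n - 1]^{-1}
    return 1 - 1 / (2 ** -0.5 * ((1 - alpha) / alpha) ** -0.5
                    * (n * (n - 1)) ** 0.5 + n - 1)

def interval_H(lam_hat, n, alpha):
    """Endpoints of H: roots of a1*x**2 - a2*x + a3 = 0."""
    c, d = c_star_0(n, alpha), delta_star(n, alpha)
    a1 = 1 + d * c / n
    a2 = 2 * lam_hat + d * c / n
    a3 = lam_hat ** 2 + (c / (4 * n)) * (d - 1)
    disc = (a2 ** 2 - 4 * a1 * a3) ** 0.5
    return (a2 - disc) / (2 * a1), (a2 + disc) / (2 * a1)

lo, hi = interval_H(lam_hat=0.05, n=200000, alpha=0.05)
```

For n this large, δ* is extremely close to 1, and the resulting interval has length close to the asymptotic value 2√(c*(0)·λ̂(1 − λ̂)/n) discussed in Section 4.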
As proved in Appendix B, under Assumption 1 this confidence interval is also a confidence interval for θ with coverage probability exceeding a value close to 1 − α.

4. A comparison of the new confidence interval with one based on λ̃

In this section we restrict the range of possible values of λ to [ε, 1 − ε], where ε is a small positive number. The confidence interval for λ based on λ̃ (based, in turn, on W_1 given by Eq. (1)) and described by Bickel
and Doksum (1977, pp. 160–161) is

G = {λ_1 : n(λ̃ − λ_1)² ≤ c·ψ_1},

where ψ_1 = λ_1(1 − λ_1) and P{R ≤ c} = 1 − α for R ∼ χ_1². For n large, this confidence interval for λ has coverage probability exceeding a value close to 1 − α. We compare this confidence interval with the new confidence interval H for λ defined in the previous section but with n set equal to n₀. For ease of comparison we suppose that n and n₀ are large. For large n, G is approximately centred at λ̃ with length 2√(cψ̃/n), where ψ̃ = λ̃(1 − λ̃). Also, for large n₀, δ* ≈ 1 and c*(0) ≈ c*, where

c* = 1 + (2(1 − α)/α)^{1/2}.

Thus, for large n₀, H is approximately centred at λ̂ with length 2√(c*ψ̂/n₀), where ψ̂ = λ̂(1 − λ̂). Hence, for large n and n₀ (since ψ̂ ≈ ψ̃),

length of H / length of G ≈ √( c*n / (cn₀) ).

This ratio is equal to 1 when n₀/n = 2.01, 1.87 and 1.94 for α = 0.02, 0.05 and 0.1, respectively.

Appendix A

In this appendix we prove that the new confidence interval based on the new method of simulation includes λ with probability ≥ 1 − α. We will need the following basic result, which is a straightforward consequence of the result stated in Exercise 10.1.2 of Whittle (1970).

Theorem A.1. For any random variable Z satisfying E{Z} ≤ 0, P{Z ≤ 0} ≥ [E{Z}]²/E{Z²}.

Define Y = n(n⁻¹W − λ)² − c·a(ψ, δ), where W is given by Eq. (2) and a(ψ, δ) is given by Eq. (3). Also define the interval C(ψ, δ) = (ψ/a(ψ, δ), ∞). Throughout this appendix it is implicit that n > 4, δ ∈ [0, 1] and ψ ∈ [0, 1/4]. Note that c > 0 and E{Y} < 0 for all ψ, δ and all c ∈ C(ψ, δ). Using the well-known variance and excess of a Binomial(n, λ) random variable (see e.g. Abramowitz and Stegun (1972, pp. 928–929)), we find that for W satisfying Assumption 2,

E{Y²}/[E{Y}]² = 1 + [(2 − 6/n)ψ² + ψ/n]·(ψ − c·a(ψ, δ))⁻².

Define l(c, ψ, δ) to be 1 divided by the right-hand side of this equality. Theorem A.1 has the following corollary.

Corollary A.1. For all ψ, δ, c ∈ C(ψ, δ) and all W satisfying Assumption 2, P{Y ≤ 0} ≥ l(c, ψ, δ).

Define c(η, δ) to be the solution for c ∈ C(ψ, δ) of l(c, ψ, δ) = 1 − α, where η = ψ⁻¹.
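Returning briefly to the Section 4 comparison: the threshold values of n₀/n equal c*/c, and can be checked numerically (a sketch; the χ²₁ quantile c is obtained by squaring the corresponding standard normal quantile):

```python
from statistics import NormalDist

def width_ratio_threshold(alpha):
    """n0/n at which H and G have equal asymptotic length, i.e. c*/c
    with c the (1 - alpha) quantile of chi-squared on 1 d.f. and
    c* = 1 + (2(1 - alpha)/alpha)**0.5."""
    c = NormalDist().inv_cdf(1 - alpha / 2) ** 2   # chi^2_1 quantile
    c_star = 1 + (2 * (1 - alpha) / alpha) ** 0.5
    return c_star / c

ratios = {a: round(width_ratio_threshold(a), 2) for a in (0.02, 0.05, 0.1)}
```

This reproduces the values 2.01, 1.87 and 1.94 quoted in Section 4.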
Since it is implicit that ψ ∈ [0, 1/4], it is also implicit that η ∈ [4, ∞), so that sup_η denotes the supremum over η ∈ [4, ∞). It is straightforward to show that

c(η, δ) = [1 + ((1 − α)/α)^{1/2}·(2 − 6/n + η/n)^{1/2}] / [δ + (η/4)(1 − δ)].     (A.1)

Now define

c*(δ) = sup_η c(η, δ).     (A.2)
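As a numerical sanity check (my own, not in the paper), c(η, δ) from (A.1) can be evaluated on a grid: for δ = 0 the supremum (A.2) over η ∈ [4, ∞) is attained at η = 4, while for δ = 1 the function is increasing in η.

```python
def c_eta_delta(eta, delta, n, alpha):
    # Eq. (A.1)
    num = 1 + ((1 - alpha) / alpha) ** 0.5 * (2 - 6 / n + eta / n) ** 0.5
    return num / (delta + (eta / 4) * (1 - delta))

n, alpha = 100, 0.05                       # illustrative values
grid = [4 + 0.5 * k for k in range(400)]   # eta in [4, 203.5]
vals0 = [c_eta_delta(e, 0.0, n, alpha) for e in grid]
vals1 = [c_eta_delta(e, 1.0, n, alpha) for e in grid]
```

At η = 4 and δ = 0 the value agrees with the closed form c*(0) of Eq. (A.3) below.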
Lemma A.1. c*(δ) is a nondecreasing function of δ.

Proof. It is clear from (A.1) that c(η, δ) is a nondecreasing function of δ for each η. The remainder of the proof is by contradiction. Suppose that sup_η c(η, δ_1) > sup_η c(η, δ_2) for some δ_1 < δ_2, where δ_1, δ_2 ∈ [0, 1]. Thus we can find η̃ ∈ [4, ∞) such that c(η̃, δ_1) > sup_η c(η, δ_2). But this implies that c(η̃, δ_1) > c(η̃, δ_2). This contradicts the fact that c(η, δ) is a nondecreasing function of δ for each η.

The next result provides further information about the function c*(δ).

Theorem A.2. (a) If argsup_η c(η, δ) = 4 then c*(δ) = c*(0), where

c*(0) = 1 + (2(1 − α)/α)^{1/2}·(1 − 1/n)^{1/2}.     (A.3)
(b) If argsup_η c(η, δ) > 4 then c*(δ) > c*(0).

Proof. (a) If argsup_η c(η, δ) = 4 then it follows from (A.1) that c*(δ) = c(4, δ) = right-hand side of Eq. (A.3). Note also that c(η, 0) is a decreasing function of η, so that argsup_η c(η, 0) = 4. (b) By Lemma A.1, c*(δ) ≥ c*(0) for all δ. The remainder of the proof is by contradiction. Suppose that for some δ, argsup_η c(η, δ) = η̄ > 4 and that c*(δ) = c*(0). Then

(1 + t(η̄))/(1 + t(4)) = δ + (η̄/4)(1 − δ),

where

t(η) = ((1 − α)/α)^{1/2}·(2 − 6/n + η/n)^{1/2}.
But t(η) is an increasing function of η. Thus δ > 1, which contradicts the (implicit) assumption that δ ∈ [0, 1].

Now define δ* to be the largest δ such that argsup_η c(η, δ) = 4. It follows from Lemma A.1 and Theorem A.2 that c*(δ) = c*(0) for all δ ∈ [0, δ*] and that c*(δ) > c*(0) for all δ > δ*. To find δ* we solve the following for δ:

∂c(η, δ)/∂η |_{η=4} = 0.

It is straightforward to show that

δ* = 1 − [2^{−1/2}·((1 − α)/α)^{−1/2}·(n(n − 1))^{1/2} + n − 1]^{−1}.     (A.4)
For each fixed α, δ* is an increasing function of n.

Define ψ_1 = λ_1(1 − λ_1). The results presented in this appendix imply that for W satisfying Assumption 2, {λ_1 : n(λ̂ − λ_1)² ≤ c*(δ)·a(ψ_1, δ)} is, for each δ, a confidence interval for λ with coverage probability ≥ 1 − α. Also, {λ_1 : n(λ̂ − λ_1)² ≤ c*(δ_1)·a(ψ_1, δ_1)} ⊂ {λ_1 : n(λ̂ − λ_1)² ≤ c*(δ_2)·a(ψ_1, δ_2)} for all δ_1 > δ_2 satisfying δ_1, δ_2 ∈ [0, δ*]. This motivates the use of the confidence interval {λ_1 : n(λ̂ − λ_1)² ≤ c*(0)·a(ψ_1, δ*)}.
Appendix B

In this appendix we prove that under Assumption 1, the confidence interval H for λ is also a confidence interval for θ with coverage probability exceeding a value close to 1 − α. Now

P{θ ∈ H} = P{n(λ̂ − θ)² ≤ c*(0)·a(θ(1 − θ), δ*)} = P{Y(θ) ≤ 0},

where Y(θ) = n(λ̂ − θ)² − c*(0)·a(θ(1 − θ), δ*). By Theorem A.1, if E{Y(θ)} < 0 then

P{Y(θ) ≤ 0} ≥ [E{Y(θ)}]²/E{Y²(θ)}.

Now

Y(θ) = Y* + 2n(λ̂ − λ)(λ − θ) + n(λ − θ)² + c*(0)δ*[λ(1 − λ) − θ(1 − θ)],

where Y* = n(λ̂ − λ)² − c*(0)·a(λ(1 − λ), δ*). Note that by the construction of H, E{Y*} < 0 and

[E{Y*}]²/E{(Y*)²} = 1 − α.

Straightforward calculations show that for n ≫ 1 and |θ − λ| ≪ √(λ(1 − λ)/n), E{Y(θ)} < 0 and

[E{Y(θ)}]²/E{Y²(θ)} ≈ [E{Y*}]²/E{(Y*)²}.

References

Abramowitz, M., Stegun, I.A. (Eds.), 1972. Handbook of Mathematical Functions. Dover, New York.
Bickel, P.J., Doksum, K.A., 1977. Mathematical Statistics. Holden-Day, San Francisco.
Fang, K.-T., Wang, Y., 1994. Number-theoretic Methods in Statistics. Chapman & Hall, London.
Haber, S., 1970. Numerical evaluation of multiple integrals. SIAM Rev. 12, 481–526.
Joffe, A., 1974. On a set of almost deterministic k-independent random variables. Ann. Probab. 2, 161–162.
Lagarias, J.C., 1993. Pseudorandom numbers. Statist. Sci. 8, 31–39.
Maddocks, R.S., Matthews, S., Walker, E.W., Vincent, C.H., 1972. A compact and accurate generator for truly random binary digits. J. Phys. E 5, 542–544.
Schmidt, W.M., 1975. Irregularities of distribution IX. Acta Arithmetica 27, 385–396.
Whittle, P., 1970. Probability. Wiley, London.