Asymptotic theory for order sampling


Journal of Statistical Planning and Inference 62 (1997) 135-158


Bengt Rosén*, Statistics Sweden, S-115 81 Stockholm, Sweden

Received 7 June 1996

Abstract

Sampling with varying probabilities, notably πps (= probability π proportional to size) sampling, is one of many vehicles for utilization of auxiliary information. We introduce and study a novel general class of varying probabilities sampling schemes, called order sampling schemes. The main results concern asymptotic distributions of linear statistics. Even if the results lie on the theoretical side, they lay ground for applications of practical sampling interest. Rosén (1997), which is a follow-up to the present paper, shows that order sampling yields interesting contributions to the problem of finding simple and good πps schemes. © 1997 Elsevier Science B.V.

AMS classification: Primary 62D05; 60F05

Keywords: Order sampling; Successive sampling; Sequential Poisson sampling; Linear statistics; Asymptotic results

1. Introduction

1.1. Outline of the paper

We study a novel general class of sampling schemes, called order sampling schemes, which are executed as follows. With the units in the population are associated independent random variables, called ordering variables. A without replacement sample of predetermined size n is generated by first realizing the ordering variables, and then letting the units with the n smallest ordering variable values constitute the sample. Although simple to describe and simple to execute, order sampling has an intricate theory. Standard results for linear estimators cannot be applied, since it is impossible to exhibit manageable exact expressions even for first order inclusion probabilities,

* Tel.: 46-8783 4490; e-mail: [email protected].


and matters become only worse for second order quantities. To circumvent these obstacles we follow an asymptotic considerations approach, which leads to approximation results. The instrumental tool is a general limit theorem, which is formulated and proved in Section 5. Using this theorem, asymptotic results for linear statistics under order sampling are formulated in Section 3. In particular, estimation issues are treated in Section 3.3. The proofs are deferred to Section 6. By varying the ordering variable distributions, a wide class of varying probabilities sampling schemes is obtained. At least two special cases have been treated in the literature, successive sampling and sequential Poisson sampling (SPS). These schemes are specified in Section 2, and they are treated further in Section 4. Asymptotic results for successive sampling have been studied by i.a. Rosén (1972) and Hájek (1981). Ohlsson (1990, 1995b) presents sequential Poisson sampling, a scheme for πps (inclusion probabilities π proportional to size) sampling, which has been used in the Swedish Consumer Price Index survey system since 1990. The study reported here was initiated by discussions with Dr. Ohlsson on SPS problems. The results in this paper lie on the theoretical side, but they lay ground for applications of practical sampling interest. Rosén (1997), which is a follow-up to the present paper, shows that the notion of order sampling yields interesting contributions to the problem of finding good and simple πps sampling schemes. Other application possibilities emanate from the fact that "empirical" order samples may be encountered in practice, not as a result of an imposed sampling scheme but because "nature" selected the sample in this way.

1.2. Basic assumptions and notations

U = (1, 2, ..., N) denotes a finite population. A variable on it, z = (z_1, z_2, ..., z_N), associates a numerical value with each unit in U. A sampling frame, in the guise of a list with records that correspond one-to-one with the units in the population, is available, and we let U refer to frame as well as population. We confine to without replacement (wor) probability sampling. Then the outcome of a sampling scheme is specified by the sample inclusion indicators I_1, I_2, ..., I_N (I_i = 1 if unit i is sampled and = 0 otherwise). P, E, V and V̂ denote probability, expectation, variance and variance estimator. First and second order inclusion probabilities are denoted π_i = P(I_i = 1) and π_ij = P(I_i = I_j = 1), i, j = 1, 2, ..., N. We disregard non-response, i.e. we assume that sampled and observed units agree. Arithmetic operations on population variables should be interpreted component-wise, e.g. for x = (x_1, x_2, ..., x_N) and y = (y_1, y_2, ..., y_N) we have x·y = (x_1·y_1, ..., x_N·y_N). A linear statistic is of the form (1.1), where the weights w = (w_1, w_2, ..., w_N) are presumed to be known for all units in U, while the study variable values y = (y_1, y_2, ..., y_N) are known only for observed units.

$$L(y; w) = \sum_{i \in \text{sample}} y_i \cdot w_i = \sum_{i=1}^{N} y_i \cdot w_i \cdot I_i. \qquad (1.1)$$


A linear statistic of special interest is the Horvitz-Thompson (HT) estimator τ̂(y)_HT, which is obtained by setting w_i = 1/π_i in (1.1). It yields unbiased estimation of the population total τ(y) = y_1 + y_2 + ⋯ + y_N. The literature provides formulas, in terms of first and second order inclusion probabilities, for the variance of τ̂(y)_HT as well as for estimators of this variance (provided that π_ij > 0 for i, j = 1, 2, ..., N).

Remark 1.1. The particular linear statistic sample sum is denoted by S(y), or by S(n; y) if we want to emphasize the sample size;

$$S(y) = S(n; y) = \sum_{i \in \text{sample}} y_i = \sum_{i=1}^{N} y_i \cdot I_i. \qquad (1.2)$$

The following straightforward relation

$$L(y; w) = S(y \cdot w) \qquad (1.3)$$

shows that any linear statistic can be viewed as a sample sum. In particular, τ̂(y)_HT = S(y/π), where π = (π_1, π_2, ..., π_N). Therefore, for notational simplicity many of the results in the sequel will be formulated in terms of the "standardized" linear statistic, the sample sum, and (1.3) provides a means for easy transformation of the results to general linear statistics.
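To make the weighting concrete, here is a minimal sketch (in Python; the data values are hypothetical, not from the paper) of a linear statistic computed as a weighted sample sum, with the HT special case w_i = 1/π_i:

```python
import numpy as np

def linear_statistic(y_sample, w_sample):
    """L(y; w) = sum of y_i * w_i over the sampled units, cf. (1.1)."""
    return float(np.sum(np.asarray(y_sample) * np.asarray(w_sample)))

# Hypothetical observed units: study values y_i and inclusion probabilities pi_i.
y_obs = np.array([3.0, 7.5, 1.2])
pi_obs = np.array([0.2, 0.5, 0.1])

# Horvitz-Thompson estimator of the population total: w_i = 1/pi_i, i.e. S(y/pi).
tau_ht = linear_statistic(y_obs, 1.0 / pi_obs)
print(tau_ht)  # 3/0.2 + 7.5/0.5 + 1.2/0.1 = 42.0
```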

2. Order sampling

Definition 2.1. To each unit i in U = (1, 2, ..., N) there is associated a probability distribution F_i on [0, ∞) with density f_i. Order sampling from U with sample size n, n ≤ N, and order distributions F = (F_1, F_2, ..., F_N), denoted OS(n; F), is carried out as follows. Independent random variables Q_1, Q_2, ..., Q_N, called ordering or ranking variables, with distributions F_1, F_2, ..., F_N are realized. The units with the n smallest Q-values constitute the sample.

Since the ordering distributions are presumed to be continuous, the realized Q-values contain (with probability 1) no ties. Hence, an OS(n; F) sample has the fixed size n. If all F_i are equal, OS(n; F) is simple random sampling (SRS), and all population units have the same inclusion probability. However, as soon as all F_i are not equal, OS(n; F) has non-equal inclusion probabilities. By varying the ordering distributions, OS(n; F) generates a quite wide class of varying probabilities sampling schemes. Two particular cases are specified below.

Definition 2.2. (a) Sequential Poisson sampling is order sampling with uniform ordering distributions. For θ = (θ_1, θ_2, ..., θ_N), θ_i > 0, SPS(n; θ) stands for OS(n; F) with F_i = the uniform distribution on [0, 1/θ_i], i = 1, 2, ..., N.


(b) Successive sampling is order sampling with exponential ordering distributions. For θ = (θ_1, θ_2, ..., θ_N), θ_i > 0, SUC(n; θ) stands for OS(n; F) with

$$F_i(t) = 1 - e^{-\theta_i t}, \quad f_i(t) = \theta_i\, e^{-\theta_i t}, \quad 0 \le t < \infty, \; i = 1, 2, \dots, N. \qquad (2.1)$$

Remark 2.1. An account of SPS is given in Ohlsson (1995b). See also Ohlsson (1995a).

Remark 2.2. The Definition 2.2(b) of successive sampling is not the usual one, which runs as follows. The sample is generated by n successive wor draws from the population, with each new draw independent of the previous drawing history. As long as unit i remains in the population it is, at each draw, selected among remaining units with probability proportional to θ_i. By this definition it is readily deduced that formula (2.2) below holds for the probability of selecting the ordered (by successive draws) sample i_1, i_2, ..., i_n (of different units). Using the notation τ(θ) = θ_1 + θ_2 + ⋯ + θ_N we have

$$p(i_1, i_2, \dots, i_n) = \frac{\theta_{i_1}}{\tau(\theta)} \cdot \frac{\theta_{i_2}}{\tau(\theta) - \theta_{i_1}} \cdots \frac{\theta_{i_n}}{\tau(\theta) - (\theta_{i_1} + \theta_{i_2} + \cdots + \theta_{i_{n-1}})}. \qquad (2.2)$$

We shall show that Definition 2.2(b) also leads to (2.2), and we start with some preparations. For λ > 0, Exp(λ) stands for the exponential distribution with density λ·exp(−λt), t ≥ 0. The following assertions (i)-(iii) state well-known properties of independent Exp(λ_X) and Exp(λ_Y) random variables X and Y: (i) min(X, Y) is Exp(λ_X + λ_Y). (ii) P(X < Y) = λ_X/(λ_X + λ_Y). (iii) For a random variable T which is independent of X and Y, the conditional distribution of X − T and Y − T given that (X > T and Y > T) is the same as the unconditional distribution of X and Y ("exponential variables have no memories").

Now let the sample be drawn according to Definition 2.2(b), and be ordered by increasing Q-values. We confine the verification of (2.2) to the case n = 2, which suffices to indicate the general principles. For different units j and k, let p(j, k) = P(Q_j is the smallest and Q_k the next smallest among the Q-values). Relation (2.3) holds for order sampling in general;

$$p(j, k) = P(Q_j < \min\{Q_i;\, i \ne j\}) \cdot P(Q_k < \min\{Q_m;\, m \ne j, k\} \mid Q_i > Q_j,\, i \ne j). \qquad (2.3)$$

Under (2.1), Q_j is Exp(θ_j) and, by (i), min{Q_i; i ≠ j} is Exp(τ(θ) − θ_j). Since Q_j and min{Q_i; i ≠ j} are independent, (ii) yields that the left-hand factor in (2.3) is θ_j/τ(θ). To compute the right-hand factor, first note that (iii) (in "many-dimensional" version) with T = Q_j yields that the factor equals the corresponding unconditional probability. To compute this probability, argue as before; Q_k and min{Q_m; m ≠ j, k} are independent, Q_k is Exp(θ_k) and, by (i), min{Q_i; i ≠ j, k} is Exp(τ(θ) − θ_j − θ_k). Hence, by (ii), P(Q_k < min{Q_i; i ≠ j, k}) = θ_k/(τ(θ) − θ_j). Thereby relation (2.2) is obtained for n = 2. The general case is readily obtained by induction. (A smoother, but less


elementary, justification can be given by relying on well-known results for waiting times in independent Poisson processes.)
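As a quick numerical sanity check of this derivation (an illustrative sketch, not part of the paper), one can simulate Definition 2.2(b) and compare the empirical frequency of an ordered pair (j, k) with formula (2.2) for n = 2:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
theta = np.array([0.5, 0.3, 0.2])  # hypothetical, normalized
j, k, trials = 0, 1, 100_000

hits = 0
for _ in range(trials):
    q = rng.exponential(scale=1.0 / theta)  # ordering variables Q_i ~ Exp(theta_i)
    order = np.argsort(q)
    if order[0] == j and order[1] == k:
        hits += 1

empirical = hits / trials
exact = (theta[j] / theta.sum()) * (theta[k] / (theta.sum() - theta[j]))  # formula (2.2)
print(empirical, exact)  # both should be close to 0.5 * (0.3/0.5) = 0.30
```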

3. Asymptotic distributions of linear statistics under order sampling

This section encompasses limit results for linear statistics under general order sampling. The practical sampling interest in such results stems from the fact that they lay ground for approximations which can be employed in practical "finite" situations. We highlight this aspect by formulating approximation results as well as stringent limit results. The latter provide justification of the former. Moreover, the conditions for the validity of a limit result shed light on the question of when a corresponding approximation can be expected to be "good enough" in a practical situation. In the following approximation results, we use the somewhat sweeping phrase "under general conditions" to signify that a corresponding stringent limit theorem holds under quite general conditions. N(μ, σ²) denotes the normal distribution with mean μ and variance σ², and ⇒ denotes convergence in distribution/law. In Sections 3.1 and 3.2 we confine to the standardized linear statistic, the sample sum. In Section 3.3, which treats estimation issues, we illustrate the statement in Remark 1.1 that results about sample sums are readily transformed to results for general linear statistics.

3.1. The main result in approximation version

We start by formulating an approximation version of the main limit result, Theorem 3.1.

Approximation Result 3.1. Consider OS(n; F) sampling from U = (1, 2, ..., N), and let z = (z_1, z_2, ..., z_N) be a variable on U. S(n; z) denotes the corresponding sample sum. Then, with μ and σ as specified below, (3.1) holds under general conditions;

Law[S(n; z)] is well approximated by N(μ, σ²). (3.1)

Specification of parameters:

$$\xi \text{ solves the equation (in } t\text{):} \quad \sum_{i=1}^{N} F_i(t) = n, \qquad (3.2)$$

$$\mu = \sum_{i=1}^{N} z_i\, F_i(\xi), \qquad (3.3)$$

$$\phi = \sum_{i=1}^{N} z_i\, f_i(\xi) \Big/ \sum_{i=1}^{N} f_i(\xi), \qquad (3.4)$$

$$\sigma^2 = \sum_{i=1}^{N} (z_i - \phi)^2\, F_i(\xi)\,(1 - F_i(\xi)). \qquad (3.5)$$


Remark 3.1. A rephrasing of (3.2) is

$$\sum_{i=1}^{N} F_i(\xi) = n. \qquad (3.6)$$

Formula (3.4) tacitly presumes that the sum of distribution functions in (3.2) has positive derivative Σ_i f_i(t) at t = ξ, which should be viewed as an assumption in the approximation result. Moreover, as is readily checked, under this assumption the solution ξ to (3.2) is unique.
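In a concrete situation ξ, μ, φ and σ² are easy to compute numerically. The sketch below (Python with SciPy; the population, sample size and study variable are illustrative assumptions) does this for exponential ordering distributions, i.e. successive sampling, where F_i(t) = 1 − exp(−θ_i t):

```python
import numpy as np
from scipy.optimize import brentq

theta = np.array([0.1, 0.3, 0.2, 0.25, 0.15])  # hypothetical, normalized
z = np.array([10.0, 4.0, 6.0, 5.0, 8.0])       # hypothetical study variable
n = 2

F = lambda t: 1.0 - np.exp(-theta * t)    # F_i(t), cf. (2.1)
f = lambda t: theta * np.exp(-theta * t)  # f_i(t)

# xi solves sum_i F_i(t) = n, cf. (3.2); the bracket [1e-9, 1e6] is a crude assumption.
xi = brentq(lambda t: F(t).sum() - n, 1e-9, 1e6)

mu = np.sum(z * F(xi))                                 # (3.3)
phi = np.sum(z * f(xi)) / np.sum(f(xi))                # (3.4)
sigma2 = np.sum((z - phi) ** 2 * F(xi) * (1 - F(xi)))  # (3.5)
print(xi, mu, phi, sigma2)
```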

3.2. The main limit result

Theorem 3.1. For k = 1, 2, ..., an OS(n_k; F_k) sample is drawn from U_k = (1, 2, ..., N_k). f_ki denotes the density of F_ki. z_k = (z_k1, z_k2, ..., z_kN_k) is a variable on U_k, and S_k(n_k; z_k) denotes the corresponding sample sum. The parameters ξ_k, μ_k, φ_k and σ_k are defined in accordance with (3.2)-(3.5). Then, if conditions (B1)-(B5) below are met, we have

$$\mathrm{Law}[(S_k(n_k; z_k) - \mu_k)/\sigma_k] \Rightarrow N(0, 1), \quad \text{as } k \to \infty. \qquad (3.7)$$

(B1) n_k → ∞ as k → ∞ (which in particular implies that N_k → ∞ as k → ∞).

(B2) max_i |z_ki − φ_k|/σ_k → 0, as k → ∞.

(B3) 0 < lim inf_{k→∞} (ξ_k/n_k) Σ_{i=1}^{N_k} f_ki(ξ_k) ≤ lim sup_{k→∞} (ξ_k/n_k) Σ_{i=1}^{N_k} f_ki(ξ_k) < ∞.

(B4) For some δ > 0, some C < ∞ and some function w(Δ) which tends to 0 as Δ → 0, the following inequalities hold for 1 − δ ≤ t, s ≤ 1 + δ;

$$|f_{ki}(t\,\xi_k) - f_{ki}(s\,\xi_k)| \le C\, w(|t - s|)\, f_{ki}(\xi_k), \quad i = 1, 2, \dots, N_k, \; k = 1, 2, 3, \dots. \qquad (3.8)$$

(B5) For some ρ > 0, we have

$$F_{ki}(\xi_k)\,(1 - F_{ki}(\xi_k)) \ge \rho\, \xi_k\, f_{ki}(\xi_k), \quad i = 1, 2, \dots, N_k, \; k = 1, 2, 3, \dots. \qquad (3.9)$$

The proof of Theorem 3.1 is given in Section 6.

Remark 3.2. A loose formulation of condition (B2) is that no z_ki-value deviates "extremely" from the majority of z_ki-values. The condition is somewhat involved, though, to the effect that it depends on the ordering distributions (via φ_k and σ_k) as well as the variable values z_k. The result (3.11) formulates simpler sufficient conditions for (B2), which separate conditions on variable values and order distributions. The following well-known Noether condition (N) relates to the spread of the variable values. Let z̄_k and ω_k² denote "ordinary" population mean and population variance, respectively;

$$\bar z_k = \frac{1}{N_k} \sum_{j=1}^{N_k} z_{kj} \quad \text{and} \quad \omega_k^2 = \frac{1}{N_k} \sum_{j=1}^{N_k} (z_{kj} - \bar z_k)^2.$$

$$\text{(N)} \quad \max_i |z_{ki} - \bar z_k| \big/ (\omega_k \sqrt{n_k}) \to 0, \quad \text{as } k \to \infty. \qquad (3.10)$$


Moreover, introduce the following condition;

(B6) For some δ > 0, F_ki(ξ_k)·(1 − F_ki(ξ_k)) ≥ δ, i = 1, 2, ..., N_k, k = 1, 2, 3, ....

Then;

(B6) + (N) ⇒ (B2). (3.11)

Verification. For any λ_1, λ_2 ∈ [min_i z_ki, max_i z_ki] we have max_i |z_ki − λ_1| ≤ 2 max_i |z_ki − λ_2|. Since φ_k and z̄_k both lie in the interval [min_i z_ki, max_i z_ki] (cf. (3.4) and (3.10)), we have

$$\max_i |z_{ki} - \phi_k| \le 2\, \max_i |z_{ki} - \bar z_k|. \qquad (3.12)$$

Presume that (B6) is satisfied. By (3.5), (B6) and the fact that a sum of squared deviations is minimal around the mean, we have

$$\sigma_k^2 = \sum_{i=1}^{N_k} (z_{ki} - \phi_k)^2\, F_{ki}(\xi_k)\,(1 - F_{ki}(\xi_k)) \ge \delta \sum_{i=1}^{N_k} (z_{ki} - \phi_k)^2 \ge \delta \sum_{i=1}^{N_k} (z_{ki} - \bar z_k)^2 = \delta\, N_k\, \omega_k^2. \qquad (3.13)$$

Combination of (N), (3.12) and (3.13) now yields (B2), i.e. (3.11) is verified.

3.3. Estimation of population characteristics

The central inference problem in finite population sampling contexts concerns estimation of a population total. The following result, based on the results in Sections 3.1 and 3.2, concerns this estimation problem for observations from an order sample.

Approximation Result 3.2. Consider OS(n; F) sampling from U = (1, 2, ..., N), and let y = (y_1, y_2, ..., y_N) be a variable on U. Introduce the following order sampling estimator for the population total τ(y) = y_1 + y_2 + ⋯ + y_N;

$$\hat\tau(y)_{OS} = \sum_{i=1}^{N} \frac{y_i}{F_i(\xi)}\, I_i. \qquad (3.14)$$

Then (a)-(d) below hold under general conditions.

(a) With χ² as stated below, we have

Law[τ̂(y)_OS] is well approximated by N(τ(y), χ²). (3.15)

$$\chi^2 = \frac{N}{N-1} \sum_{i=1}^{N} \left( \frac{y_i}{F_i(\xi)} - \gamma \right)^2 F_i(\xi)\,(1 - F_i(\xi)), \qquad (3.16)$$

where

$$\gamma = \sum_{i=1}^{N} \frac{y_i}{F_i(\xi)}\, f_i(\xi) \Big/ \sum_{i=1}^{N} f_i(\xi). \qquad (3.17)$$


(b) In particular, τ̂(y)_OS yields consistent estimation of τ(y).

(c) In particular, χ² yields an asymptotically correct value for V[τ̂(y)_OS].

(d) A consistent variance estimator for τ̂(y)_OS is provided by

$$\hat\chi^2 = \frac{n}{n-1} \sum_{i=1}^{N} \left( \frac{y_i}{F_i(\xi)} - \hat\gamma \right)^2 (1 - F_i(\xi))\, I_i, \qquad (3.18)$$

where

$$\hat\gamma = \sum_{i=1}^{N} \frac{y_i}{F_i(\xi)} \cdot \frac{f_i(\xi)}{F_i(\xi)}\, I_i \Big/ \sum_{i=1}^{N} \frac{f_i(\xi)}{F_i(\xi)}\, I_i. \qquad (3.19)$$
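A minimal sketch (Python; continuing the illustrative setup above, so all inputs are assumptions) of the point estimator (3.14) and the variance estimator (3.18)-(3.19), with F_i(ξ) and f_i(ξ) computed beforehand as in the earlier sketch:

```python
import numpy as np

def os_estimates(y, F_xi, f_xi, sample_idx):
    """tau-hat(y)_OS of (3.14) and the variance estimator of (3.18)-(3.19).

    y, F_xi, f_xi: arrays over the whole population (values of F_i(xi), f_i(xi));
    sample_idx: indices of the n sampled units.
    """
    I = np.zeros(y.size, dtype=bool)
    I[sample_idx] = True
    n = int(I.sum())

    tau_hat = float(np.sum(y[I] / F_xi[I]))                  # (3.14)

    w = f_xi[I] / F_xi[I]
    gamma_hat = np.sum((y[I] / F_xi[I]) * w) / np.sum(w)     # (3.19)
    chi2_hat = (n / (n - 1)) * float(np.sum(
        (y[I] / F_xi[I] - gamma_hat) ** 2 * (1 - F_xi[I])))  # (3.18)
    return tau_hat, chi2_hat
```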

Justification. Part (a) is in essence obtained by setting z_i = y_i/F_i(ξ) in Approximation Result 3.1 and recalling (1.3). Note that μ in (3.3) then becomes τ(y). There is a slight discrepancy, though, between (3.5) and (3.16), relating to the "correction factor" N/(N − 1) in (3.16), which is inserted on the following, somewhat ad hoc, ground. As (3.16) stands, it yields the proper SRS variance in the case when all F_i are equal, and order sampling is nothing but SRS. Note that the factor N/(N − 1) tends to 1 as N increases to infinity, and recall condition (B1). The factor n/(n − 1) in (3.18) has an analogous ground. We now turn to the (b) and (d) parts. By Theorem 3.1 and (a),

$$\hat\tau(y)_{OS} = \tau(y) + \text{random term of order } \sqrt{\chi^2}. \qquad (3.20)$$

For simplicity we presume that the following conditions, which reflect "typical" orders of magnitude, are met. For some δ > 0, ρ > 0 and C < ∞;

$$|y_i| \le C < \infty, \quad i = 1, 2, \dots, N, \quad \text{and} \quad |\tau(y)| \ge \rho\, N, \qquad (3.21)$$

$$0 < \delta \le F_i(\xi) \le 1 - \delta, \quad i = 1, 2, \dots, N, \qquad (3.22)$$

$$0 < \delta \le \xi\, f_i(\xi) \le C, \quad i = 1, 2, \dots, N. \qquad (3.23)$$

As is readily checked, under (3.21) and (3.22), (3.20) can be written as follows, which together with (B1) justifies (b);

$$\hat\tau(y)_{OS} = \tau(y)\,(1 + \text{random term of order } 1/\sqrt{N}). \qquad (3.24)$$

By estimates of the type (3.20) we have

Numerator in (3.19) = numerator in (3.17)·(1 + random term of order 1/√N), (3.25)

Denominator in (3.19) = denominator in (3.17)·(1 + random term of order 1/√N). (3.26)

(3.25) and (3.26) yield

$$\hat\gamma = \gamma\,(1 + \text{random term of order } 1/\sqrt{N}). \qquad (3.27)$$


In the same vein, estimates of type (3.20) together with (3.27) yield

$$\hat\chi^2 = \sum_{i=1}^{N} \left( \frac{y_i}{F_i(\xi)} - \gamma \right)^2 F_i(\xi)\,(1 - F_i(\xi)) \cdot (1 + \text{random term of order } 1/\sqrt{N}). \qquad (3.28)$$

Thereby also (d) is justified. The claim in (c) follows along the following lines. Express (3.15) as: Law[(τ̂(y)_OS − τ(y))/χ] is well approximated by (in sequence version, converges in law to) the standard normal distribution N(0, 1). Combine this with the general result that convergence in law entails convergence of variances to that of the limit distribution, provided the sequence is second order uniformly integrable (which holds e.g. if 4th moments are uniformly bounded). This type of condition is viewed to be covered by "under general conditions".

Remark 3.3. As just stated, τ̂(y)_OS yields consistent (loosely, "asymptotically unbiased") estimation of τ(y) under general conditions. Combining this with the fact that the HT-estimator τ̂(y)_HT = Σ y_i/π_i · I_i is the unique unbiased linear estimator of τ(y), comparison with (3.14) leads to the conjecture that formula (3.29) below yields a good approximation of OS(n; F) inclusion probabilities under general conditions. Note, though, that (3.29) is only a conjecture; we have no stringent limit theorem justification of it.

$$\pi_i \approx F_i(\xi), \quad i = 1, 2, \dots, N. \qquad (3.29)$$
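The conjecture (3.29) is easy to probe numerically. A sketch (Python with SciPy; illustrative data as before) estimating the π_i by repeated order sampling and comparing with F_i(ξ), here for successive sampling:

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(seed=3)
theta = np.array([0.1, 0.3, 0.2, 0.25, 0.15])  # hypothetical, normalized
n, reps = 2, 100_000

counts = np.zeros(theta.size)
for _ in range(reps):
    q = rng.exponential(scale=1.0 / theta)  # SUC(n; theta) ordering variables
    counts[np.argsort(q)[:n]] += 1
pi_empirical = counts / reps

xi = brentq(lambda t: np.sum(1 - np.exp(-theta * t)) - n, 1e-9, 1e6)
pi_conjectured = 1 - np.exp(-theta * xi)  # F_i(xi), cf. (3.29)
print(np.round(pi_empirical, 3))
print(np.round(pi_conjectured, 3))
```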

4. Asymptotic results for sequential Poisson and successive sampling

In this section we specialize the results in Sections 3.1 and 3.2 to the particular order sampling schemes sequential Poisson and successive sampling, as specified in Definition 2.2. The parameter θ = (θ_1, θ_2, ..., θ_N) is said to be normalized if θ_1 + θ_2 + ⋯ + θ_N = 1.

4.1. Sequential Poisson sampling

Since Ohlsson (1995b) presents a special study of SPS, we are brief on this topic. We confine to specializing Approximation Result 3.1.

Approximation Result 4.1. Consider SPS(n; θ) sampling from U = (1, 2, ..., N), with normalized θ. Assume that n·θ_i < 1, i = 1, 2, ..., N. Let z be a variable on U and S(n; z)


the corresponding sample sum. Then (3.1) holds with the following μ and σ:

$$\mu = n \sum_{i=1}^{N} z_i\, \theta_i, \qquad (4.1)$$

$$\sigma^2 = \sum_{i=1}^{N} \left( z_i - \sum_{j=1}^{N} z_j\, \theta_j \right)^2 n\theta_i\,(1 - n\theta_i). \qquad (4.2)$$

Derivation from Approximation Result 3.1. By Definition 2.2(a), SPS(n; θ) is OS(n; F) with F_i(t) = min(t·θ_i, 1) and f_i(t) = θ_i on t ∈ [0, 1/θ_i], = 0 outside [0, 1/θ_i], i = 1, 2, ..., N. On the interval 0 < t < 1/max_i θ_i, Eq. (3.2) takes the form t·(θ_1 + θ_2 + ⋯ + θ_N) = n, which for normalized θ yields ξ = n. Now it is readily checked that (3.3) and (3.5) specialize to (4.1) and (4.2).

4.2. Successive sampling

Here we specialize Approximation Result 3.1 as well as limit Theorem 3.1.

Approximation Result 4.2. Consider SUC(n; θ) sampling from U = (1, 2, ..., N), with

normalized θ. Let z be a variable on U and S(n; z) the corresponding sample sum. Then (3.1) holds with μ and σ as stated below;

$$\xi \text{ is the solution to the equation (in } t\text{):} \quad \sum_{i=1}^{N} (1 - e^{-\theta_i t}) = n, \qquad (4.3)$$

$$\mu = \sum_{i=1}^{N} z_i\,(1 - e^{-\theta_i \xi}), \qquad (4.4)$$

$$\phi = \frac{\sum_{i=1}^{N} z_i\, \theta_i\, e^{-\theta_i \xi}}{\sum_{j=1}^{N} \theta_j\, e^{-\theta_j \xi}}, \qquad (4.5)$$

$$\sigma^2 = \sum_{i=1}^{N} \left( z_i - \frac{\sum_{j=1}^{N} z_j\, \theta_j\, e^{-\theta_j \xi}}{\sum_{j=1}^{N} \theta_j\, e^{-\theta_j \xi}} \right)^2 (1 - e^{-\theta_i \xi})\, e^{-\theta_i \xi}. \qquad (4.6)$$

Derivation from Approximation Result 3.1. We leave to the reader to check that (3.2)-(3.5) specialize to (4.3)-(4.6) when F_i(t) and f_i(t) are as in (2.1).

Theorem 4.1. For k = 1, 2, 3, ..., consider SUC(n_k; θ_k) sampling from U_k = (1, 2, ..., N_k), with normalized θ_k. Let z_k be a variable on U_k and S_k(n_k; z_k) the corresponding sample sum. Let ξ_k, μ_k, φ_k and σ_k be in accordance with (4.3)-(4.6). Then (3.7) holds, if conditions (C1)-(C4) below are satisfied.

(C1) n_k → ∞ as k → ∞ (which in particular implies that N_k → ∞, as k → ∞).

(C2) max_i |z_ki − φ_k|/σ_k → 0, as k → ∞.

(C3) For some L < ∞, θ_ki ≤ L/n_k, i = 1, 2, ..., N_k, k = 1, 2, 3, ....

(C4) For some ρ > 0, γ > 1 and δ > 0, the following holds (# = number of units in);

#{i: θ_ki ≥ ρ/N_k} ≥ max(γ·n_k, δ·N_k), k = 1, 2, 3, ....


Remark 4.1. Theorem 4.1 improves on results in Rosén (1972) and Hájek (1981) by providing more general conditions for asymptotic normality.

Remark 4.2. Here we comment on condition (C4), which is a bit involved. In loose words it states the following. When θ_k is normalized, the average value of θ_ki is 1/N_k. Say that θ_ki is "small" if θ_ki < ρ/N_k for a "small" ρ (ρ > 0), otherwise "non-small". (C4) states that the number of non-small θ_ki should exceed n_k (γ > 1), and should constitute a non-negligible (δ > 0) proportion of all θ_ki. In (4.8) below we present a sufficient condition for (C4) in terms of the simpler conditions (C5) and (C6), which separate the restrictions on n_k and θ_ki.

(C5) For some ρ > 0, θ_ki ≥ ρ/N_k, i = 1, 2, ..., N_k, k = 1, 2, 3, ....

(C6) For some λ < 1, n_k ≤ λ·N_k, k = 1, 2, 3, ....

The following implications hold;

(C4) ⇒ (C6), (4.7)

(C5) + (C6) ⇒ (C4). (4.8)

Verification of (4.7) and (4.8). Presume that (C4) holds. Then N_k ≥ #{i: θ_ki ≥ ρ/N_k} ≥ max(γ·n_k, δ·N_k) ≥ γ·n_k. Hence (C6) holds with λ = 1/γ, and (4.7) is justified. Conversely, presume that (C5) and (C6) hold. Then #{i: θ_ki ≥ ρ/N_k} = (by (C5)) N_k ≥ (by (C6)) max(λ⁻¹·n_k, N_k). Hence (C4) holds.

Proof of Theorem 4.1. We show that (C1)-(C4) imply (B1)-(B5) in Theorem 3.1, which then justifies Theorem 4.1. Presume that (C1)-(C4) are in force. First note that (C1) and (C2) are just repetitions of (B1) and (B2). In the verification of (B3)-(B5) the following two estimates (4.9) and (4.10) will be crucial. Their proofs are postponed for a while.

$$n_k \le \xi_k \le n_k \big/ [\rho\,\delta\,(1 - \gamma^{-1})], \quad \text{with } \rho, \delta \text{ and } \gamma \text{ as in (C4)}, \; k = 1, 2, 3, \dots. \qquad (4.9)$$

$$\text{For some } C < \infty, \quad \xi_k\,\theta_{ki} \le C, \quad i = 1, 2, \dots, N_k, \; k = 1, 2, 3, \dots. \qquad (4.10)$$

By (2.1) we have

$$\frac{\xi_k}{n_k} \sum_{i=1}^{N_k} f_{ki}(\xi_k) = \frac{\xi_k}{n_k} \sum_{i=1}^{N_k} \theta_{ki}\, e^{-\theta_{ki} \xi_k}. \qquad (4.11)$$

(B3) now follows from (4.11) and the two estimates below. Recall that θ_k is normalized.

$$\frac{\xi_k}{n_k} \sum_{i=1}^{N_k} \theta_{ki}\, e^{-\theta_{ki} \xi_k} \ge \frac{\xi_k}{n_k}\, e^{-C} \sum_{i=1}^{N_k} \theta_{ki} \ge e^{-C} \quad \text{(by (4.10) and (4.9))}, \; k = 1, 2, 3, \dots, \qquad (4.12)$$

$$\frac{\xi_k}{n_k} \sum_{i=1}^{N_k} \theta_{ki}\, e^{-\theta_{ki} \xi_k} \le \frac{\xi_k}{n_k} \sum_{i=1}^{N_k} \theta_{ki} \le [\rho\,\delta\,(1 - \gamma^{-1})]^{-1} \quad \text{(by (4.9))}, \; k = 1, 2, 3, \dots. \qquad (4.13)$$


Next we verify (B4). We have

$$|f_{ki}(t\,\xi_k) - f_{ki}(s\,\xi_k)| = f_{ki}(\xi_k)\, \big|e^{-\theta_{ki}(t-1)\xi_k} - e^{-\theta_{ki}(s-1)\xi_k}\big|. \qquad (4.14)$$

Employ the general inequality |e^{−x} − e^{−y}| ≤ |x − y|·e^{max(|x|,|y|)}, −∞ < x, y < ∞, to obtain

$$|f_{ki}(t\,\xi_k) - f_{ki}(s\,\xi_k)| \le f_{ki}(\xi_k)\, \theta_{ki}\, \xi_k\, e^{\theta_{ki} \xi_k \max(|t-1|,\,|s-1|)}\, |t - s|, \quad k = 1, 2, 3, \dots. \qquad (4.15)$$

(B4) now follows readily from (4.15) and (4.10). To verify (B5), employ the inequality 1 − e^{−x} ≥ x·(1 − e^{−x_0})/x_0, 0 ≤ x ≤ x_0, with x_0 = C according to (4.10);

$$F_{ki}(\xi_k)\,(1 - F_{ki}(\xi_k)) = (1 - e^{-\theta_{ki}\xi_k})\, e^{-\theta_{ki}\xi_k} \ge \theta_{ki}\,\xi_k\,[(1 - e^{-C})/C]\, e^{-\theta_{ki}\xi_k} = [(1 - e^{-C})/C]\,\xi_k\, f_{ki}(\xi_k), \quad k = 1, 2, 3, \dots. \qquad (4.16)$$

Now (4.16) yields that (B5) is satisfied. It remains to show that (4.9) and (4.10) hold. Set

$$\varphi_k(t) = \sum_{i=1}^{N_k} F_{ki}(t) = \sum_{i=1}^{N_k} (1 - e^{-\theta_{ki} t}), \quad 0 \le t < \infty. \qquad (4.17)$$

We estimate φ_k from above by using the inequality 1 − e^{−x} ≤ x, 0 ≤ x < ∞;

$$\varphi_k(t) \le \sum_{i=1}^{N_k} \theta_{ki}\, t = t =: \varphi_k^{(u)}(t), \quad 0 \le t < \infty \quad \text{(since } \theta_k \text{ is normalized)}. \qquad (4.18)$$
Next we estimate ~Pk from below. Set ak = # {i: Ok~ >I p / N k } . By (4.17) and (C4) we have ~pk(t) >>-ak'(1 -- e -°t/N~) =:q~(ke)(t), 0 ~< t < ~ .

(4.19)

Since ~p(k~)(t)~< ~pk(t) <~ ~ptku)(t), and ~k is the solution to ~pk(t) = nk (see (3.2)), ~k lies between the solutions to qCku)(t) = nk and ~p(k~)(t)= rig. This yields (4.20) below, where also the inequality In (1 - x ) - 1 << x/(1 - x), 0 ~< x < 1, is used. nk ~ ~k ~ N__kk.In (1 - nk/ak)- 1 ~ Nk

p ~<

p.6.(1 -7-~) '

p

nk 1 ak (1 -- nk/ak)

k = 1, 2, 3,

"'"

.

Hence, (4.9) and thereby T h e o r e m 4.1 is proved.

(by (C4)).

(4.20)

B. Roskn/Journal of Statistical Planning and Inference 62 (1997) 135 158

147

5. A general limit result for passage variables

5.1. Some notation and terminology W e start by settling some n o t a t i o n and terminology to be used in this section. Let [to, tl ], to < tl, be a specified interval, and z(t) a function on [to, t~ ]. M(z) denotes the supremum of ]z(t)] on [to, tl]. The modulus of variation for z on [to, tl], w(A; z), is w(A; z) = sup{lz(s) -- z(t) l:ls - t[ <<.A, to <~ s, t <~ t l }.

(5.1)

A sequence {Zk(t)}~= ~ of functions on [to, t~] is equicontinuous if there is a function w(A), A > 0, with w ( A ) - * O as A --,0, such that w(A; Zk) <~ w(A), 0 < A <<,t~ -- to, k =: 1, 2, 3 . . . . . A sequence {Qk}~°= ~ of r a n d o m variables is bounded in probability, if SUpk P([ Qkl >~ M) ~ O, as M ~ ~ . ~ denotes convergence in probability. A sequence {Zk }k~ 1 of stochastic process on [to, t l] satisfies condition ( T ) if; (T)

o~ 1 with Ak ~ 0 , W(Ak; Z k ) ~ O , as k --" oc. F o r any sequence {A k}k=

(5.2)

Condition (T) is closely related to tightness properties of {Law(Zk)}~= ~, a topic which is treated in depth in Billingsley (1968). Below we state a sufficient criterion for condition (T). Even if it differs in formulation from T h e o r e m 12.3 in Billingsley (l 968), the p r o o f of that t h e o r e m contains all vital ingredients for verification of the criterion. Criterion for Condition (T). Sufficient for a sequence Z1, Z 2 , Z 3 , ... of separable stochastic processes to satisfy condition (T) is that for some C < m, and for each fixed A >0 limE(Zk(t+A)--Zk(t))

4 <~ C A 2,

to <<.t,

t + A <~ t,.

(5.3)

k ~

5.2. A limit theorem f o r passage variables An interval [to, t l ] is specified. F o r k = 1,2,3, ... , V , and Uk are stochastic processes on [to, tl], Yk and Xk are ordinary functions on [to, t~], and e,k is a positive real number. The stochastic processes Yk and Xk are defined by Yk(t) -= yk(t) + ek" Vk(t),

to ~ t <~ tl,

(5.4)

Xk(t) = Xk(t) + ek" Uk(t),

to <~ t <~ tl.

(5.5)

F o r a real rk, the level Zk passage times for Yk and Yk, denoted t* and T * (often referred to as inverses) are defined by t* = inf{t: yk(t) >~ Z,},

Tk* = inf{t: Yk(t) >~ Zk}, with the convention inf0 = tl. (5.6)

B. Roskn/Journal of Statistical Plannin9 and Inference 62 (1997) 135-158

148

Our interest concerns the asymptotic behavior (as k -o oo) of the random variable X k ( T * ) = the value o f X k when Yk passes through the level "~k"

(5.7)

Theorem 5.1. For k = 1, 2, 3, ..., let Yk, Xk, '~k, Yk, Xk, ~k, t~ and T * be as specified above. Provided that conditions (A1)-(A8) below are fulfilled, we have Xk(T*)--Xk(t~) ,Sk

(A1) (A2) (A3) (A4) (A5) tinuous (A6) (A7) (A8)

(Uk(t,)_x!(t*) -- \

.

/

~

V(t,))po,

ask~.

(5.8)

"

ek--*O as k ~ o o . limk.o~ min(zk -- yk(tO), yk(tl) -- Zk) > O. Yk has derivative Y'k on [to, tl], and limk-~ 0o infto <., <~tl y'k(t) > O. {Y'k)k~ l is equicontinuous on [to, tl]. Xk has derivative X'k on [to, tl ], and {X'k}k~=1 is uniformly bounded and equzconon [to, tl ]. For some tk ~ [to, tl ], k = 1, 2, 3, ..., { Vk(tk)}~= 1 is bounded in probability. { Vk }if= 1 satisfies condition (T). {Uk}ff=l satisfies condition ( T ).

Remark 5.1. When (A2)-(A4) are met, yk(t) is continuous and strictly increasing on [to, tl], and it crosses the level Zk on [to, tl]. Then t* can be defined in the more transparent way as the unique solution to the equation (in t) yk(t) = Zk. In particular; yk(t*) = Zk. Remark 5.2. Iftg' = t* for k = 1, 2, 3, ..., where t* is an interior point in [to, tl], then condition (A3) implies that (A2) is satisfied. Remark 5.3. A main aspect on the result (5.8) is that it tells that the two terms have the same asymptotic behavior. As a consequence, a limit result for one of the terms holds also for the other one. Usually, the term within brackets is simplest to study, and it provides a mean for studying the other term. This use of(5.8) is illustrated in Section 6. The crucial ingredient in the proof of Theorem 5.1 is stated in the following Lemma 5.1, where we continue to use notation and assumptions as in Theorem 5.1. However, the situation in Lemma 5.1 is more special. It concerns just one k-situation, and we omit index k. Moreover, also V and U are presumed to be ordinary ( = non-random) functions. Hence, in Lemma 5.1 Y, X, y, x, V and U are real-valued, bounded functions on an interval [to, tl], to < tl, which are related in accordance with (5.4) and (5.5), e and ~ being real numbers, e > 0. The level ~ passage times for y and Y, t* and T*, are defined in accordance with (5.6). Furthermore, we assume that the

B. Roskn/Journal of Statistical Planning and Inference 62 (I997) 135-158

149

following counterparts to (A2), (A3) and (A5) are in force:

y(to) < z < y(tl).

(5.9)

The functions y and x have continuous derivatives y' and x' on [to, t~]. (5.10) For s o m e m > 0 ,

y'(t)>~m>0, to~t~
(5.11)

The proof of the following Lemma 5.1 is deferred until the next subsection.

Lemma 5.1. Let notation be as stated above. Assume that (5.9)-(5.11) are in force. Define c by (5.12), and presume that ε > 0 is small enough for (5.13) and (5.14) to hold;

$$c = 2\, M(V)/m, \qquad (5.12)$$

$$w(\varepsilon c;\, y') \le m/3, \qquad (5.13)$$

$$\varepsilon \le \min(\tau - y(t_0),\, y(t_1) - \tau) \big/ (2\, M(V)). \qquad (5.14)$$

For R(ε) as defined implicitly below, (5.16) then holds;

$$X(T^*) = x(t^*) + \varepsilon \left( U(t^*) - \frac{x'(t^*)}{y'(t^*)}\, V(t^*) \right) + \varepsilon\, R(\varepsilon), \qquad (5.15)$$

$$|R(\varepsilon)| \le \frac{c\, M(x')}{m}\, w(\varepsilon c; y') + c\, w(\varepsilon c; x') + \frac{2\, M(x')}{m}\, w(\varepsilon c; V) + w(\varepsilon c; U). \qquad (5.16)$$

Remark 5.4. If we add to the other conditions in Lemma 5.1 that U and V are continuous on [t_0, t_1], the bound in (5.16) tells that R(ε) → 0 as ε → 0. Hence, the remainder term ε·R(ε) in (5.15) becomes negligible as ε tends to 0.

Proof of Theorem 5.1. Set, in line with (5.15),

$$R_k(\varepsilon_k) = \frac{X_k(T_k^*) - x_k(t_k^*)}{\varepsilon_k} - \left( U_k(t_k^*) - \frac{x_k'(t_k^*)}{y_k'(t_k^*)}\, V_k(t_k^*) \right), \quad k = 1, 2, 3, \dots. \qquad (5.17)$$

(A2), (A3) and (A5) imply that there are ρ > 0, m > 0, M < ∞ and k_0 < ∞ such that

$$y_k(t_0) + \rho < \tau_k < y_k(t_1) - \rho, \quad y_k'(t) \ge m > 0, \; t_0 \le t \le t_1, \quad M(x_k') \le M, \quad k \ge k_0. \qquad (5.18)$$

Fix a δ > 0. From (A6) and (A7) we conclude that there is a B_δ < ∞ and a k_1(δ) < ∞ such that

$$P(M(V_k) \le B_\delta) \ge 1 - \delta, \quad k \ge k_1(\delta). \qquad (5.19)$$

Set, in accordance with (5.12);

$$c_\delta = 2\, B_\delta / m. \qquad (5.20)$$


By virtue of (A1) and (A4) we have for some k_2(δ) < ∞;

$$w(\varepsilon_k c_\delta;\, y_k') \le \tfrac{1}{3} m, \quad k \ge k_2(\delta). \qquad (5.21)$$

By (A1) and the left inequality in (5.18), the following bound holds on the set {M(V_k) ≤ B_δ} for some k_3(δ) < ∞;

$$\varepsilon_k \le \min(\tau_k - y_k(t_0),\, y_k(t_1) - \tau_k) \big/ (2\, B_\delta), \quad k \ge k_3(\delta). \qquad (5.22)$$

From (5.21) and (5.22) it is seen that Lemma 5.1 applies on the set {M(V_k) ≤ B_δ} for k sufficiently large. Hence, by (5.16),

$$|R_k(\varepsilon_k)| \le \frac{c_\delta\, M}{m}\, w(\varepsilon_k c_\delta;\, y_k') + c_\delta\, w(\varepsilon_k c_\delta;\, x_k') + \frac{2M}{m}\, w(\varepsilon_k c_\delta;\, V_k) + w(\varepsilon_k c_\delta;\, U_k)$$

holds on {M(V_k) ≤ B_δ}, for k ≥ max(k_0, k_1(δ), k_2(δ), k_3(δ)). (5.23)

By (A1), (A4) and (A5), the first two terms in (5.23) tend to 0 as k → ∞. By (A7), (A8) and the definition of condition (T), the last two terms tend to 0 on the set {M(V_k) ≤ B_δ} as k → ∞. By (5.19), P(M(V_k) ≤ B_δ) ≥ 1 − δ. Hence, for fixed γ > 0, lim sup_{k→∞} P(|R_k(ε_k)| ≥ γ) ≤ δ. Since γ and δ are arbitrary, the claim in Theorem 5.1 follows. □

5.3. Proof of Lemma 5.1

(5.24) introduces a notation for the remainder term in a one-step Taylor expansion, and (5.25) presents an estimate of it, which follows from the mean value theorem. As before, w denotes modulus of variation. Define ρ(t, s; z) by

$$z(t) = z(s) + (t - s)\, z'(s) + \rho(t, s; z). \qquad (5.24)$$

Then

$$|\rho(t, s; z)| \le |t - s| \cdot w(|t - s|;\, z'). \qquad (5.25)$$

Next we prepare the proof of Lemma 5.1 with two auxiliary lemmas.

Lemma 5.2. Let notation and assumptions be as in Lemma 5.1. Then

$$|T^* - t^*| \le \varepsilon\, c. \qquad (5.26)$$


Proof. By (5.10), (5.11) and (5.9), y is continuous, strictly increasing, and crosses the level τ on the interval [t_0, t_1]. Hence, t* is the unique solution to the equation y(t) = τ. In particular;

$$y(t^*) = \tau. \qquad (5.27)$$

By (5.4), (5.10) and (5.24) we have, provided that t_0 ≤ t*, t* + h ≤ t_1;

$$Y(t^* + h) = y(t^* + h) + \varepsilon\, V(t^* + h) = y(t^*) + h\, y'(t^*) + \rho(t^* + h, t^*; y) + \varepsilon\, V(t^* + h) \qquad (5.28)$$

{by Taylor expansion of y(t* + h)}. From (5.28), (5.27), (5.11) and (5.25) we get, for h > 0, Y(t* + h) ≥ τ + h·m − h·w(h; y') − ε·|V(t* + h)|. Now, provided that 0 < h ≤ ε·c, this estimate can, by (5.13), be continued to

$$Y(t^* + h) \ge \tau + h\,m - h\,m/3 - \varepsilon\, M(V) > \tau + h\,m/2 - \varepsilon\, M(V). \qquad (5.29)$$

By setting h = ε·c in (5.29) and recalling (5.12), it is seen that Y(t* + ε·c) > τ, provided that t* + ε·c ≤ t_1. It may happen, though, that t* + ε·c > t_1. If so, note that Y(t_1) = y(t_1) + ε·V(t_1) and (5.14) imply that Y(t_1) > τ. All in all this implies that Y(min(t* + ε·c, t_1)) > τ. By similar reasoning it is shown that Y(max(t* − ε·c, t_0)) < τ. Hence, Y has a level passage on the interval [max(t* − ε·c, t_0), min(t* + ε·c, t_1)], which is included in t* ± ε·c. Thereby (5.26) is proved. □

Lemma 5.3. Let notation and assumptions be as in Lemma 5.1, and define Q(ε) by (5.30). Then (5.31) holds.

$$T^* = t^* - \frac{\varepsilon}{y'(t^*)}\, V(t^*) + \varepsilon\, Q(\varepsilon). \qquad (5.30)$$

$$|Q(\varepsilon)| \le [c\, w(\varepsilon c;\, y') + 2\, w(\varepsilon c;\, V)]/m. \qquad (5.31)$$

Proof. Since Y need be neither continuous nor increasing on [t_0, t_1], Y(T*) and τ are not as simply related as y(t*) and τ. An analog to (5.27) is stated in (5.32), where w(0+; z) denotes the limit of w(Δ; z) as Δ → 0, which also can be interpreted as the maximal jump of z on [t_0, t_1];

$$|Y(T^*) - \tau| \le w(0+;\, Y) = \varepsilon\, w(0+;\, V) \quad \{\text{by (5.4) and the fact that } y \text{ is continuous}\}. \qquad (5.32)$$

By (5.4) and (5.24) we have

$$Y(T^*) = y(T^*) + \varepsilon\, V(T^*) = y(t^*) + (T^* - t^*)\, y'(t^*) + \rho(T^*, t^*; y) + \varepsilon\, V(t^*) + \varepsilon\, [V(T^*) - V(t^*)] \quad \{\text{by Taylor expansion of } y(T^*)\}. \qquad (5.33)$$


Rearrangement in (5.33), use of (5.27) and comparison with (5.30) yields

$$\varepsilon\, Q(\varepsilon) = \big[(Y(T^*) - \tau) - \rho(T^*, t^*; y) - \varepsilon\, (V(T^*) - V(t^*))\big] \big/ y'(t^*). \qquad (5.34)$$

By applying (5.32), (5.25) and (5.11) in (5.34), we get

$$|Q(\varepsilon)| \le \big[\varepsilon\, w(0+;\, V) + |T^* - t^*|\, w(|T^* - t^*|;\, y') + \varepsilon\, w(|T^* - t^*|;\, V)\big] \big/ (\varepsilon\, m). \qquad (5.35)$$

By using (5.26) and w(0+; z) ≤ w(Δ; z) in (5.35), the bound (5.31) is obtained. □

Proof of Lemma 5.1. By (5.5) and (5.24),

$$X(T^*) = x(T^*) + \varepsilon\, U(T^*) = x(t^*) + (T^* - t^*)\, x'(t^*) + \rho(T^*, t^*; x) + \varepsilon\, U(t^*) + \varepsilon\, (U(T^*) - U(t^*)) \quad \{\text{Taylor expansion of } x(T^*)\}. \qquad (5.36)$$

Insertion of T* − t* according to (5.30) into (5.36) and comparison with (5.15) yields

$$R(\varepsilon) = x'(t^*)\, Q(\varepsilon) + \rho(T^*, t^*; x)/\varepsilon + (U(T^*) - U(t^*)). \qquad (5.37)$$

By employing (5.25) in (5.37) we get

$$|R(\varepsilon)| \le |x'(t^*)|\,|Q(\varepsilon)| + |T^* - t^*|\, w(|T^* - t^*|;\, x')/\varepsilon + w(|T^* - t^*|;\, U). \qquad (5.38)$$

Now, application of (5.31) and (5.26) in (5.38) leads to (5.16). □

6. Proof of Theorem 3.1

6.1. The proof

The instrumental tool in the proof will be Theorem 5.1, and we start with some preparations for its application. For the ordering variables Q_1, Q_2, ..., Q_N in Definition 2.1, introduce the following stochastic processes H_i(s) on 0 ≤ s < ∞, where H_i "counts when Q_i occurs" (1_A denotes the indicator of the set A);

$$H_i(s) = 1_{\{Q_i \le s\}}, \quad 0 \le s < \infty, \; i = 1, 2, \dots, N. \qquad (6.1)$$

H_i(s) is a 0-1 random variable with expected value F_i(s), which yields

$$E[H_i(s)] = F_i(s) \quad \text{and} \quad \mathrm{Var}[H_i(s)] = F_i(s)\,(1 - F_i(s)), \quad 0 \le s < \infty. \qquad (6.2)$$


Define the stochastic processes J(t) and L(t; z) as follows, where z = (z_1, z_2, ..., z_N) are given real numbers and ξ > 0 is arbitrary (but fixed);

$$J(t) = \sum_{i=1}^{N} H_i(t\,\xi) \quad \text{and} \quad L(t; z) = \sum_{i=1}^{N} z_i\, H_i(t\,\xi), \quad 0 \le t < \infty. \qquad (6.3)$$

Let T* be the time when J(t) hits the level n, and A(n; z) the value of L(t; z) at the hitting time;

$$T^* = \inf\{t: J(t) = n\} \quad \text{and} \quad A(n; z) = L(T^*; z). \qquad (6.4)$$

The following relation (6.5) should be evident upon some thought. It provides the link between order sampling and passage variables, which allows application of Theorem 5.1.

The sample sum S(n; z) under OS(n; F) has the same distribution as A(n; z). (6.5)

We now adhere to the sequence version in Theorem 3.1. For k = 1, 2, 3, ..., let J_k(t), L_k(t; z_k), T_k* and A_k(n_k; z_k) be in accordance with what is said above, and let ξ_k, μ_k, φ_k and σ_k be in accordance with (3.2)-(3.5). By virtue of (6.5) we have

Law[(S_k(n_k; z_k) − μ_k)/σ_k] agrees with Law[(A_k(n_k; z_k) − μ_k)/σ_k], k = 1, 2, 3, .... (6.6)

As will be shown, Theorem 5.1 yields that (A_k(n_k; z_k) − μ_k)/σ_k is asymptotically N(0,1)-distributed under the conditions in Theorem 3.1. Having shown that, (6.6) tells that the same holds for (S_k(n_k; z_k) − μ_k)/σ_k, and Theorem 3.1 is proved. When employing Theorem 5.1, the entities Y_k, X_k, y_k, x_k, V_k, U_k, ε_k and τ_k are chosen as stated in (6.7)-(6.12). The interval under consideration is set to [t_0, t_1] = [1 − δ, 1 + δ], where δ comes from (B4). Superscript c stands for centering at expectation. When checking (6.7)-(6.10), recall (6.2).

$$Y_k(t) = \frac{1}{n_k} \sum_{i=1}^{N_k} H_{ki}(t\,\xi_k) = y_k(t) + \frac{1}{\sqrt{n_k}}\, V_k(t), \quad t_0 \le t \le t_1, \qquad (6.7)$$

$$y_k(t) = \frac{1}{n_k} \sum_{i=1}^{N_k} F_{ki}(t\,\xi_k), \quad V_k(t) = \frac{1}{\sqrt{n_k}} \sum_{i=1}^{N_k} H_{ki}(t\,\xi_k)^c, \quad t_0 \le t \le t_1, \qquad (6.8)$$

$$X_k(t) = \frac{1}{\sqrt{n_k}} \sum_{i=1}^{N_k} \frac{z_{ki} - \phi_k}{\sigma_k}\, H_{ki}(t\,\xi_k) = x_k(t) + \frac{1}{\sqrt{n_k}}\, U_k(t), \quad t_0 \le t \le t_1, \qquad (6.9)$$

$$x_k(t) = \frac{1}{\sqrt{n_k}} \sum_{i=1}^{N_k} \frac{z_{ki} - \phi_k}{\sigma_k}\, F_{ki}(t\,\xi_k), \quad U_k(t) = \sum_{i=1}^{N_k} \frac{z_{ki} - \phi_k}{\sigma_k}\, H_{ki}(t\,\xi_k)^c, \quad t_0 \le t \le t_1, \qquad (6.10)$$

$$\varepsilon_k = 1/\sqrt{n_k}, \quad k = 1, 2, 3, \dots, \qquad (6.11)$$

$$\tau_k = 1, \quad k = 1, 2, 3, \dots. \qquad (6.12)$$

From (6.12), (6.8) and (3.6) it follows that t_k* in (5.6) takes the value t_k* = 1.


The proof of Theorem 3.1 comprises two main parts, which are stated as separate lemmas below. Together the two lemmas yield that (A_k(n_k; z_k) − μ_k)/σ_k is asymptotically N(0,1)-distributed under (B1)-(B5), which is the claim in Theorem 3.1. In addition, Section 6.2 contains two lemmas with auxiliary results, which will be used in the sequel.

Lemma 6.1. Under conditions (B1)-(B5) in Theorem 3.1, we have

$$\frac{A_k(n_k; z_k) - \mu_k}{\sigma_k} - U_k(1) \stackrel{P}{\to} 0, \quad \text{as } k \to \infty. \qquad (6.13)$$

Lemma 6.2. Under condition (B2) in Theorem 3.1, we have

$$\mathrm{Law}[U_k(1)] \Rightarrow N(0, 1), \quad \text{as } k \to \infty. \qquad (6.14)$$

We start by proving Lemma 6.2, which is the simplest.

Proof of Lemma 6.2. Since Q_k1, Q_k2, ..., Q_kN_k are independent, so are the processes H_ki(s), i = 1, 2, ..., N_k. Hence, cf. (6.10),

$$U_k(1) = \sum_{i=1}^{N_k} \frac{z_{ki} - \phi_k}{\sigma_k}\, H_{ki}(\xi_k)^c \qquad (6.15)$$

is a sum of independent random variables with mean 0. By using (6.2) and (3.5) it is readily checked that the variance of U_k(1) is 1. To verify (6.14) we apply Lyapunov's condition (in 4th moment version) for the central limit theorem for independent variables;

$$\sum_{i=1}^{N_k} E\left( \frac{z_{ki} - \phi_k}{\sigma_k}\, H_{ki}(\xi_k)^c \right)^4 \le \sum_{i=1}^{N_k} \frac{(z_{ki} - \phi_k)^4}{\sigma_k^4}\, F_{ki}(\xi_k)\,(1 - F_{ki}(\xi_k)) \quad \text{(by Lemma 6.3)}$$

$$\le \frac{\max_i (z_{ki} - \phi_k)^2}{\sigma_k^4} \sum_{i=1}^{N_k} (z_{ki} - \phi_k)^2\, F_{ki}(\xi_k)\,(1 - F_{ki}(\xi_k)) = \left( \frac{\max_i |z_{ki} - \phi_k|}{\sigma_k} \right)^2 \quad \text{(by (3.5))}. \qquad (6.16)$$

By (B2) the last term in (6.16) tends to 0 as k → ∞, and Lemma 6.2 is proved. □

Proof of Lemma 6.1. First we derive expressions for the two terms in (5.8). By (6.3) and (6.4), it is readily checked that

$$X_k(T_k^*) = \frac{L_k(T_k^*; z_k) - \phi_k\, J_k(T_k^*)}{\sqrt{n_k}\, \sigma_k} = \frac{A_k(n_k; z_k) - \phi_k\, n_k}{\sqrt{n_k}\, \sigma_k}. \qquad (6.17)$$


Furthermore, (6.10), (3.3) and (3.4) yield

$$x_k(t_k^*) = x_k(1) = \frac{\mu_k - \phi_k\, n_k}{\sqrt{n_k}\, \sigma_k}. \qquad (6.18)$$

From (6.17) and (6.18) we get

$$\frac{X_k(T_k^*) - x_k(t_k^*)}{\varepsilon_k} = \frac{A_k(n_k; z_k) - \mu_k}{\sigma_k}. \qquad (6.19)$$

The formulas for y_k and x_k in (6.8) and (6.10), and the assumption that the F_ki have densities, imply that y_k and x_k are differentiable on [t_0, t_1], and

$$y_k'(t) = \frac{\xi_k}{n_k} \sum_{i=1}^{N_k} f_{ki}(t\,\xi_k), \quad t_0 \le t \le t_1, \qquad (6.20)$$

$$x_k'(t) = \frac{\xi_k}{\sqrt{n_k}} \sum_{i=1}^{N_k} \frac{z_{ki} - \phi_k}{\sigma_k}\, f_{ki}(t\,\xi_k), \quad t_0 \le t \le t_1. \qquad (6.21)$$

From (6.21) and (3.4) it follows that x_k'(t_k*) = x_k'(1) = 0. Hence, the term to the right in (5.8) "collapses" to U_k(1). This together with (6.19) yields that, if Theorem 5.1 applies in the present situation, it leads to (6.13). It remains, though, to show that Theorem 5.1 in fact does apply, which we do by showing that conditions (B1)-(B5) imply (A1)-(A8). (A1) follows from (6.11) and (B1). (A3) is the left part of (B3), cf. (6.20). By Remark 5.2, (A2) follows from (A3), t_k* = 1, k = 1, 2, ..., and the choice of the interval [t_0, t_1]. To verify (A4) we use (6.20);

$$|y_k'(t) - y_k'(s)| \le \frac{\xi_k}{n_k} \sum_{i=1}^{N_k} |f_{ki}(t\,\xi_k) - f_{ki}(s\,\xi_k)| \le C\, w(|t - s|)\, \frac{\xi_k}{n_k} \sum_{i=1}^{N_k} f_{ki}(\xi_k) \quad \text{(by (B4))}. \qquad (6.22)$$

Formula (6.22), the right-hand part of (B3) and the properties of w yield that (A4) is satisfied. When considering (A5), we first show that {x_k'(1)}_{k=1}^∞ is bounded, and we start from (6.21);

$$|x_k'(1)| = \frac{\xi_k}{\sqrt{n_k}} \left| \sum_{i=1}^{N_k} \frac{z_{ki} - \phi_k}{\sigma_k}\, f_{ki}(\xi_k) \right| \le \sqrt{ \sum_{i=1}^{N_k} \frac{(z_{ki} - \phi_k)^2}{\sigma_k^2}\, F_{ki}(\xi_k)\,(1 - F_{ki}(\xi_k)) } \cdot \sqrt{ \frac{\xi_k^2}{n_k} \sum_{i=1}^{N_k} \frac{f_{ki}(\xi_k)^2}{F_{ki}(\xi_k)\,(1 - F_{ki}(\xi_k))} } \quad \text{(by Schwarz' inequality)}. \qquad (6.23)$$


By (3.5), the left square root in (6.23) equals 1. By using (B5) in the right-hand square root, we arrive at the following estimate, which together with (B3) shows that {x_k'(1)}_{k=1}^∞ is bounded;

$$|x_k'(1)| \le \sqrt{ \frac{\xi_k}{\rho\, n_k} \sum_{i=1}^{N_k} f_{ki}(\xi_k) }. \qquad (6.24)$$

Next we show that {x_k'(t)}_{k=1}^∞ is equicontinuous on [t_0, t_1], and we start from (6.21);

$$|x_k'(t) - x_k'(s)| = \frac{\xi_k}{\sqrt{n_k}} \left| \sum_{i=1}^{N_k} \frac{z_{ki} - \phi_k}{\sigma_k}\, \big(f_{ki}(t\,\xi_k) - f_{ki}(s\,\xi_k)\big) \right| \le \text{(by Schwarz' inequality and (3.5))}$$

$$\le \sqrt{ \frac{\xi_k^2}{n_k} \sum_{i=1}^{N_k} \frac{\big(f_{ki}(t\,\xi_k) - f_{ki}(s\,\xi_k)\big)^2}{F_{ki}(\xi_k)\,(1 - F_{ki}(\xi_k))} } \le \sqrt{ \frac{C^2\, w(|t - s|)^2\, \xi_k}{\rho\, n_k} \sum_{i=1}^{N_k} f_{ki}(\xi_k) } \quad \text{(by (B4) and (B5))}. \qquad (6.25)$$

By (B3), (6.25) tells that {x_k'(t)}_{k=1}^∞ is equicontinuous on [t_0, t_1]. Since {x_k'(1)}_{k=1}^∞ is bounded, (A5) is verified. The following estimate implies (A6) (recall (6.8), (6.2) and (3.6)):

$$\mathrm{Var}[V_k(1)] = \frac{1}{n_k} \sum_{i=1}^{N_k} \mathrm{Var}[H_{ki}(\xi_k)] = \frac{1}{n_k} \sum_{i=1}^{N_k} F_{ki}(\xi_k)\,(1 - F_{ki}(\xi_k)) \le \frac{1}{n_k} \sum_{i=1}^{N_k} F_{ki}(\xi_k) = 1. \qquad (6.26)$$

To show that (A8) is met, we employ condition (5.3), and we start from (6.10);

$$\limsup_{k\to\infty} E[U_k(t) - U_k(s)]^4 = \limsup_{k\to\infty} E\left[ \sum_{i=1}^{N_k} \frac{z_{ki} - \phi_k}{\sigma_k}\, \big(H_{ki}(t\,\xi_k)^c - H_{ki}(s\,\xi_k)^c\big) \right]^4$$

$$\le 6\, \limsup_{k\to\infty} \sum_{i=1}^{N_k} \left( \frac{z_{ki} - \phi_k}{\sigma_k} \right)^4 E\big[\{H_{ki}(t\,\xi_k) - H_{ki}(s\,\xi_k)\}^c\big]^4 + 6\, \limsup_{k\to\infty} \left( \sum_{i=1}^{N_k} \left( \frac{z_{ki} - \phi_k}{\sigma_k} \right)^2 \mathrm{Var}[H_{ki}(t\,\xi_k) - H_{ki}(s\,\xi_k)] \right)^2 \quad \text{(by Lemma 6.4)}. \qquad (6.27)$$

Here we make an interlude to derive some auxiliary results. Under (B4) there is a C' < ∞ such that

$$f_{ki}(t\,\xi_k) \le C'\, f_{ki}(\xi_k), \quad t \in [t_0, t_1], \; i = 1, 2, \dots, N_k, \; k = 1, 2, 3, \dots. \qquad (6.28)$$

157

Under (B4) and (B5) there is a C" < oc, such that I Fki( t" ~k)

--

FRi(S" ~k)l ~ C " ' l t

- s I" Fki(~k)'(1 -- Fki(~k)),

i=1,2

t, S ~ [to, tl ],

k=1,2,3,....

.... ,Nk,

(6.29)

Check of (6.28): f k i ( t ' ~ k ) <~fki(~k) + l f k i ( t ' ~ k ) - - fki(~k)l ~ fk~(~k)'(l + C ' w ( I t

-- l l )

by(B4)

~< c' Ai(~).

Check of (6.29): Recall that Fk~ is differentiable with continuous derivative~i. By the mean value theorem I Fki(t" ~k) -- Fki(S" ~k) l = [t -- S I" ~k'fki(~C), where ~ lies between t'~k and S ' ~ k . By applying first (6.28) and then (3.9), (6.29) is obtained. We now return to (6.27). Note that Hki(t" ~k) -- Hki(S" ~k) is a 0- 1 random variable with mean value Fki(t" ~,) -- Fki(S'~.k). By Lemma 6.3 and (6.29), (6.27) can be continued as follows: <<. 6 " C " ' [ t -

s I" nm ~maxl

.

k ~ oo \

Gk

Zki -- O k i= 1

+ 6'(C") 2'(t - s)2- lim

. Fk,(~k)'(1 -- Fk~(~.k)

Zki -- &k

k~oo

i

\

O'k

"Fki(~k)'(1 -- Fki(~k)

O'k



(6.30)

/

By (3.5) the sums in (6.30) equal 1. From this and (B2) it is seen that the first term in (6.30) equals 0, and the second equals 6·(C'')²·(t − s)². Now condition (5.3) tells that (A8) is satisfied. Verification of (A7) is similar but simpler, and some details are omitted. We start from (6.8);

$$\limsup_{k\to\infty} E[V_k(t) - V_k(s)]^4 = \limsup_{k\to\infty} E\left[ \frac{1}{\sqrt{n_k}} \sum_{i=1}^{N_k} \{H_{ki}(t\,\xi_k) - H_{ki}(s\,\xi_k)\}^c \right]^4$$

$$\le 6\, \limsup_{k\to\infty} \frac{1}{n_k^2} \sum_{i=1}^{N_k} E\big[\{H_{ki}(t\,\xi_k) - H_{ki}(s\,\xi_k)\}^c\big]^4 + 6\, \limsup_{k\to\infty} \left( \frac{1}{n_k} \sum_{i=1}^{N_k} \mathrm{Var}[H_{ki}(t\,\xi_k) - H_{ki}(s\,\xi_k)] \right)^2 \quad \text{(by Lemma 6.4)}$$

$$\le 6\, C''\, |t - s|\, \limsup_{k\to\infty} \frac{1}{n_k^2} \sum_{i=1}^{N_k} F_{ki}(\xi_k) + 6\, (C'')^2\, (t - s)^2\, \limsup_{k\to\infty} \left( \frac{1}{n_k} \sum_{i=1}^{N_k} F_{ki}(\xi_k) \right)^2. \qquad (6.31)$$


Now estimate (6.31) together with (3.6) yields that (5.3) is satisfied, and (A7) is verified. Hence Lemma 6.1, and thereby also Theorem 3.1, are proved. □

6.2. Two auxiliary results

The following results are certainly known, but they are proved for the sake of completeness.

Lemma 6.3. For a 0-1 random variable Z with P(Z = 1) = p, we have

$$E(Z^c)^4 = E(Z - p)^4 \le p\,(1 - p). \qquad (6.32)$$

Proof. By binomial expansion of (Z − p)⁴ and use of EZ = EZ² = EZ³ = EZ⁴ = p, it follows that E(Z^c)⁴ = p·(1 − p)·(3p² − 3p + 1) ≤ p·(1 − p). □

Lemma 6.4. Let Z_1, Z_2, ..., Z_N be independent random variables. Then

$$E\left( \sum_{i=1}^{N} Z_i^c \right)^4 \le 6 \left[ \sum_{i=1}^{N} E(Z_i^c)^4 + \left( \sum_{i=1}^{N} \mathrm{Var}(Z_i) \right)^2 \right]. \qquad (6.33)$$

Proof. By employing binomial expansion of (Σ_{i=1}^N Z_i^c)⁴ = (Σ_{i=1}^{N−1} Z_i^c + Z_N^c)⁴ and taking expectation we get E(Σ_{i=1}^N Z_i^c)⁴ = E(Σ_{i=1}^{N−1} Z_i^c)⁴ + 6·Var(Z_N)·Σ_{i=1}^{N−1} Var(Z_i) + E(Z_N^c)⁴. By induction in N, this identity readily yields (6.33). □

References

Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.
Hájek, J. (1981). Sampling from a Finite Population. Marcel Dekker, New York.
Ohlsson, E. (1990). Sequential Poisson sampling from a business register and its application to the Swedish consumer price index. Statistics Sweden R&D Report 1990:6.
Ohlsson, E. (1995a). Coordination of samples using permanent random numbers. In: Business Survey Methods. Wiley, New York.
Ohlsson, E. (1995b). Sequential Poisson sampling. Research Report, Institute of Actuarial Mathematics and Mathematical Statistics, Stockholm University.
Rosén, B. (1972). Asymptotic theory for successive sampling with varying probabilities without replacement, I and II. Ann. Math. Statist. 43, 373-397 and 748-776.
Rosén, B. (1997). On sampling with probability proportional to size. J. Statist. Plann. Inference, this issue.