Motoo's combinatorial central limit theorem for serial rank statistics




Journal of Statistical Planning and Inference 91 (2000) 427–440

www.elsevier.com/locate/jspi

David M. Mason^{a,1}, Tatyana S. Turova^{b,∗,2}

^a Department of Mathematical Sciences, 501 Ewing Hall, University of Delaware, Newark, DE 19716, USA
^b Department of Mathematical Statistics, University of Lund, Box 118, Lund S-221 00, Sweden

Abstract

We shall use the Stein method to prove a general Motoo-type combinatorial central limit theorem for serial rank statistics. Our basic approach will be the underlying graph structure of the ranks of lag 1. In the process we shall obtain minimal conditions for the asymptotic normality of Wald–Wolfowitz-type serial rank statistics. © 2000 Elsevier Science B.V. All rights reserved.

MSC: primary 62G20; secondary 62G10

Keywords: Random graphs; Ranks; Serial rank statistics

1. Introduction and statement of main result

Consider the nonparametric testing problem: H0: X1, ..., Xn are independent versus H1: X1, ..., Xn are first order dependent (such as ARMA(1,1)), where X1, ..., Xn are observations of a time series at times i = 1, ..., n and under H0, X1, ..., Xn are assumed also to have a common continuous distribution function. Let R(i), for i = 1, ..., n, denote the rank of Xi among X1, ..., Xn. The classical statistics for testing H0 versus H1 are often based on the pairs of ranks (lag 1 ranks), given by

(R(1), R(0)), (R(2), R(1)), ..., (R(n), R(n − 1)),    (1.1)

∗ Corresponding author. E-mail address: [email protected] (T.S. Turova).
1 Supported by NSF Grant 9803344.
2 On leave from the Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Russia.



where R(0) ≡ R(n). For instance, the Wald–Wolfowitz (1943) statistic for testing H0 versus H1 is defined to be

W_n := Σ_{i=1}^n a_n(R(i)) a_n(R(i − 1)),    (1.2)

where a_n(i), i = 1, ..., n, is a triangular array of constants, typically satisfying the condition that

Σ_{i=1}^n a_n(i) = 0.

This statistic can be written in an alternate way which we shall soon see will be very useful in establishing its asymptotic normality. Let A = (A_1, ..., A_n) denote the vector of anti-ranks of (R(1), ..., R(n)), i.e. A is the inverse permutation of (R(1), ..., R(n)) defined by

R(A_j) = j,    j = 1, ..., n.

Clearly under H0, for any permutation (π(1), ..., π(n)) ∈ P, where P denotes the set of all permutations of (1, ..., n),

P{(A_1, ..., A_n) = (π(1), ..., π(n))} = 1/n!.

Now for any n ≥ 3, define the random permutation of (1, ..., n)

(f(1), ..., f(n)) = (R(A_1 − 1), ..., R(A_n − 1)).    (1.3)

With this notation we can rewrite the Wald–Wolfowitz statistic as

W_n := Σ_{i=1}^n a_n(i) a_n(f(i)).
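To make these definitions concrete, the following short Python sketch (ours, not from the paper) computes the ranks R(i), the anti-ranks A_j, the lag 1 permutation f of (1.3), and the Wald–Wolfowitz statistic both in the form (1.2) and in the rewritten form; the centered score choice a_n(i) = i − (n + 1)/2 is only an illustration. It also checks, in passing, that the arcs (i, f(i)) form a single cycle, anticipating Proposition 1.1 below.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
x = rng.standard_normal(n)                      # observations X_1, ..., X_n

# R(i): rank of X_i among X_1, ..., X_n (1-based).
order = np.argsort(x)                           # 0-based positions sorted by value
R = {}
for rank, pos in enumerate(order, start=1):
    R[int(pos) + 1] = rank

# Anti-ranks: A_j is the index i with R(i) = j, i.e. R(A_j) = j.
A = {R[i]: i for i in range(1, n + 1)}

# Lag 1 permutation f of (1.3): f(j) = R(A_j - 1), with R(0) read as R(n).
f = {j: R[A[j] - 1] if A[j] > 1 else R[n] for j in range(1, n + 1)}

# Centered scores a_n(i) (an illustrative choice; they sum to 0).
a = {i: i - (n + 1) / 2 for i in range(1, n + 1)}

# Wald-Wolfowitz statistic, definition (1.2), with R(0) := R(n).
W_def = sum(a[R[i]] * a[R[i - 1] if i > 1 else R[n]] for i in range(1, n + 1))

# Rewritten form W_n = sum_i a_n(i) a_n(f(i)).
W_alt = sum(a[i] * a[f[i]] for i in range(1, n + 1))
assert abs(W_def - W_alt) < 1e-12

# The arcs (i, f(i)) form a single Hamiltonian cycle on {1, ..., n} (Proposition 1.1).
seen, v = set(), 1
while v not in seen:
    seen.add(v)
    v = f[v]
assert len(seen) == n
print(round(W_def, 4))
```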

The published proofs of central limit theorems for Wald–Wolfowitz statistics of the form (1.2) are generally quite long and technical. The proof of the original Wald–Wolfowitz central limit theorem is based on the method of moments. Hallin et al. (1985) use the projection technique to establish a result for a generalized version of W_n. The proof which Hallin and Vermandele (1996) provided for their central limit theorem, though based on a simple approach developed by Lombard (1986), is very lengthy. The proofs given in Haeusler et al. (2000) for their somewhat more general results are much shorter. However, they rely upon a sophisticated weighted approximation to a serial rank process by a Brownian bridge. The methods used by Hallin et al. (1985) and Hallin and Vermandele (1996) have the advantage that they provide representations of the statistics that are very useful in establishing local powers and optimality results via Le Cam techniques. Our intention here is to provide a short and easy proof of a central limit theorem under minimal conditions for a general class of serial rank statistics which includes the Wald–Wolfowitz statistic. To accomplish this we shall use the Stein (1972) method, making use of the underlying graph structure of the pairs given in (1.1). Let us now introduce the notions from graph theory we need for our approach. A directed graph consists of a set of vertices V = {1, ..., n} and the set of ordered pairs


{(i_k, j_k): k = 1, ..., N}, N ≥ 1, where i_k, j_k ∈ V and i_k ≠ j_k for each k = 1, ..., N. The pair (i_k, j_k) represents the arc from the vertex i_k to the vertex j_k. Thus, the positive integer N denotes the number of arcs in the directed graph. We say that the vertex i is connected to the vertex j if there is a path of arcs from i to j. For any n ≥ 3, let H_n be the set of all directed graphs for which N = n and

{i_1, ..., i_n} = {j_1, ..., j_n} = {1, ..., n}.    (1.4)

Hence, a directed graph H ∈ H_n is defined uniquely by the set of its arcs, which enables us to use the following notation for the elements of H_n:

H = {(i_k, j_k): k = 1, ..., n}.    (1.5)

It follows from (1.4) that any directed graph H ∈ H_n has the property that for any vertex i there exist exactly one incoming arc (j, i) for some j ≠ i and exactly one outgoing arc (i, l) for some l ≠ i. Further, assume the arcs of any H ∈ H_n form a unique cycle, i.e. the set in (1.5) can be rewritten as

H = {(l_1, l_2), (l_2, l_3), ..., (l_n, l_1)},    (1.6)

where {l_1, l_2, ..., l_n} = {1, ..., n} due to the definition (1.4). The elements of the set H_n are called Hamiltonian cycles. The following result is a special case of Proposition 2.1 of Haeusler et al. (2000) and will be crucial for the proof of our main result.

Proposition 1.1. For any n ≥ 2,

{{(1, f(1)), ..., (n, f(n))}: A ∈ P} = H_n.    (1.7)

This says that (f(1), ..., f(n)) cannot take on all possible permutations in P, but only those (n − 1)! permutations π ∈ P for which {(1, π(1)), ..., (n, π(n))} ∈ H_n.

We shall consider the following generalization of the Wald–Wolfowitz type statistics (cf. Hallin et al., 1985). For each n ≥ 3 let (a_n(i, j))_{1≤i,j≤n} be an n × n matrix of real numbers such that

Σ_{i=1}^n a_n(i, j) = 0,  j = 1, ..., n;    Σ_{j=1}^n a_n(i, j) = 0,  i = 1, ..., n.    (1.8)

Introduce the class of statistics

W_n = Σ_{i=1}^n a_n(i, f(i)).    (1.9)

Writing

H = {(1, f(1)), ..., (n, f(n))}


as a randomly selected H ∈ H_n, chosen with probability 1/(n − 1)!, we see that

W_n = W_n(H) = Σ_{(i,j)∈H} a_n(i, j).    (1.10)

Define further N_1 := {(i, j) ∈ {1, ..., n}^2: i ≠ j}. Straightforward but lengthy calculations based on Proposition 1.1 show that for n ≥ 2

EW_n = − Σ_{i=1}^n a_n(i, i) / (n − 1) =: − s_1(n)/(n − 1)    (1.11)

and for n ≥ 3

Var W_n = Σ_{(i,j)∈N_1} a_n^2(i, j) / (n − 2) + s_1(n)^2 / ((n − 1)^2 (n − 2))
          − 2 s_2(n) / ((n − 1)(n − 2)) − Σ_{(i,j)∈N_1} a_n(i, j) a_n(j, i) / ((n − 1)(n − 2)),    (1.12)

where

s_2(n) = Σ_{i=1}^n a_n^2(i, i).
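The moment formulas (1.11) and (1.12) can be checked numerically for small n by enumerating all (n − 1)! Hamiltonian cycles; the sketch below (ours, not part of the paper) does this for a randomly generated score matrix that is double-centered so that condition (1.8) holds.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 6
a = rng.standard_normal((n, n))
a -= a.mean(axis=0, keepdims=True)       # make every column sum to 0  (condition (1.8))
a -= a.mean(axis=1, keepdims=True)       # ... and every row sum to 0 as well

def W(cycle):
    # cycle = (l_1, ..., l_n); its arcs are (l_1, l_2), ..., (l_n, l_1); W(H) sums a over them.
    return sum(a[cycle[k] - 1, cycle[(k + 1) % n] - 1] for k in range(n))

# Each Hamiltonian cycle on {1, ..., n} is obtained exactly once by fixing l_1 = 1.
values = [W((1,) + p) for p in itertools.permutations(range(2, n + 1))]
mean_exact, var_exact = np.mean(values), np.var(values)

s1 = np.trace(a)                          # s_1(n) = sum_i a_n(i, i)
s2 = np.trace(a ** 2)                     # s_2(n) = sum_i a_n(i, i)^2 (elementwise square)
off = [(i, j) for i in range(n) for j in range(n) if i != j]
mean_formula = -s1 / (n - 1)              # (1.11)
var_formula = (sum(a[i, j] ** 2 for i, j in off) / (n - 2)            # (1.12)
               + s1 ** 2 / ((n - 1) ** 2 * (n - 2))
               - 2 * s2 / ((n - 1) * (n - 2))
               - sum(a[i, j] * a[j, i] for i, j in off) / ((n - 1) * (n - 2)))

assert np.isclose(mean_exact, mean_formula) and np.isclose(var_exact, var_formula)
```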

We shall assume from now on that for some 0 < M < ∞ and all n ≥ 3,

n^{−1} Σ_{i=1}^n Σ_{j=1}^n a_n^2(i, j) ≤ M    (1.13)

and

Var W_n = 1.    (1.14)

We shall prove the following central limit theorem for W_n. It can be shown after some lengthy analysis that our theorem contains as special cases the central limit theorems for serial rank statistics of lag 1 established in Wald and Wolfowitz (1943), Hallin et al. (1985), Hallin and Vermandele (1996), and Haeusler et al. (2000).

Theorem 1.1. In addition to (1.13) and (1.14), assume the Lindeberg condition that for all ε > 0,

lim_{n→∞} n^{−1} Σ_{(i,j): |a_n(i,j)| > ε} a_n^2(i, j) = 0;    (1.15)

then as n → ∞

W_n − EW_n →_d Z,    (1.16)

where Z is a standard normal random variable.

Theorem 1.1 should be compared to the Motoo (1957) combinatorial central limit theorem for the class of rank statistics

T_n := Σ_{i=1}^n a_n(i, π(i)),


where (π(1), ..., π(n)) ∈ P is chosen with probability 1/n!, which we state below in the form given in Schneller (1988). For this reason we consider our Theorem 1.1 as Motoo's combinatorial central limit theorem for serial rank statistics.

Theorem (Schneller, 1988). Assume that Var T_n = 1 and the Lindeberg condition (1.15) holds; then as n → ∞, T_n →_d Z, where Z is a standard normal random variable.

Hájek (1961) has shown that the Lindeberg condition in Motoo's theorem is necessary. Therefore we suspect it is also necessary in our Theorem 1.1.

Our proof of Theorem 1.1 will parallel closely the short proof given by Schneller (1988) for Motoo's theorem, using the Stein method coupled with a clever combinatorial construction due to Bolthausen (1984). We shall substitute Bolthausen's trick with one of our own motivated by the Hamiltonian cycle structure of the lag 1 ranks. At present, our approach depends crucially on this Hamiltonian cycle structure, which breaks down for ranks of lag r, r ≥ 2, when r is not relatively prime to the sample size n. Refer to Haeusler et al. (2000) for further details. On the other hand, the methods of Hallin et al. (1985), Hallin and Vermandele (1996) and Haeusler et al. (2000) allow one to prove central limit theorems for Wald–Wolfowitz type statistics based on ranks of any lag r. Moreover, using the projection technique, Hallin et al. (1985) can handle multivariate analogs of statistics of the form (1.9) which are functions of ranks of an arbitrary number of lags. However, their results when specialized to the case of lag 1 serial rank statistics do not imply Theorem 1.1.

2. Proof of Theorem 1.1

To establish Theorem 1.1, we shall follow the basic outline of the proof of Motoo's combinatorial central limit theorem given by Schneller (1988), using the Stein method. As in Schneller (1988) it is enough to show that as n → ∞,

Eh(W_n − EW_n) → Φ(h),

for all fixed continuous functions h which can be extended continuously to R ∪ {∞, −∞}, where Φ(h) denotes Eh(Z). In the following, for notational simplicity we shall from now on drop the subscript n on W_n, a_n, H_n etc. Define the function

g(x) = φ(x)^{−1} ∫_{−∞}^{x} (h(y) − Φ(h)) φ(y) dy,

where φ denotes the standard normal density. This function satisfies

g'(x) − x g(x) = h(x) − Φ(h)    for all x ∈ R,    (2.1)

and

lim_{|x|→∞} |g'(x)| = 0 and g' is uniformly continuous.    (2.2)

(Refer to Schneller, 1988.)
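The identity (2.1) is standard for Stein's method; as a quick check (not part of the original argument), differentiating the definition of g and using φ'(y) = −yφ(y) gives

```latex
% Differentiating g(x) = \varphi(x)^{-1}\int_{-\infty}^{x}(h(y)-\Phi(h))\varphi(y)\,dy
% and using \varphi'(x) = -x\,\varphi(x):
\begin{aligned}
g'(x) &= -\frac{\varphi'(x)}{\varphi(x)^{2}}\int_{-\infty}^{x}(h(y)-\Phi(h))\varphi(y)\,dy
         + \varphi(x)^{-1}\bigl(h(x)-\Phi(h)\bigr)\varphi(x)\\
      &= x\,\varphi(x)^{-1}\int_{-\infty}^{x}(h(y)-\Phi(h))\varphi(y)\,dy + h(x)-\Phi(h)
       = x\,g(x) + h(x) - \Phi(h),
\end{aligned}
```

and rearranging yields (2.1).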

2.1. Randomization

Let N := {1, ..., n}, and for any 0 < k < n set

N_k := {(i_0, ..., i_k) ∈ N^{k+1}: i_j ≠ i_l for any j ≠ l}.    (2.3)

Let I = (I_0, I_1, I_2) be a random vector uniformly distributed on N_2. Given I = i = (i_0, i_1, i_2) ∈ N_2, define a random vector J = (J_0, J_1, J_2) in N_2 as follows. Let J_1 be uniformly distributed on N \ {i_0}. Then we have the following cases:
(i) If J_1 = j_1 = i_1, then J = i.
(ii) If J_1 = j_1 = i_2, then J_0 = i_1 and J_2 is uniformly distributed on N \ {i_0, i_1, i_2}.
(iii) If J_1 = j_1 ∉ {i_0, i_1, i_2}, then J_0 is uniformly distributed on N \ {i_0, i_1, j_1}. In which case, if J_0 = i_2 then J_2 is uniformly distributed on N \ {i_0, i_1, i_2, j_1}, whereas if J_0 ∉ {i_0, i_1, i_2, j_1}, then the component J_2 is uniformly distributed on N \ {i_1, i_2, j_1, j_0}.

Notice that it follows from our construction that

(I_0, I_1, J_1) ∈ N_2 unless I = J.    (2.4)
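The following Python sketch (ours, not from the paper) spells out this randomization: it draws I uniformly on N_2 and then J according to cases (i)–(iii), and checks property (2.4) by simulation.

```python
import random

def sample_I_J(n, rng=random):
    """Draw (I, J) as in the randomization: I uniform on N_2, then J by cases (i)-(iii)."""
    N = list(range(1, n + 1))
    i0, i1, i2 = rng.sample(N, 3)                        # I uniformly distributed on N_2
    j1 = rng.choice([v for v in N if v != i0])           # J_1 uniform on N \ {i0}
    if j1 == i1:                                         # case (i): J = I
        j0, j2 = i0, i2
    elif j1 == i2:                                       # case (ii)
        j0 = i1
        j2 = rng.choice([v for v in N if v not in (i0, i1, i2)])
    else:                                                # case (iii): j1 outside {i0, i1, i2}
        j0 = rng.choice([v for v in N if v not in (i0, i1, j1)])
        if j0 == i2:
            j2 = rng.choice([v for v in N if v not in (i0, i1, i2, j1)])
        else:
            j2 = rng.choice([v for v in N if v not in (i1, i2, j1, j0)])
    return (i0, i1, i2), (j0, j1, j2)

# Property (2.4): (I_0, I_1, J_1) has pairwise distinct components unless I = J.
for _ in range(10000):
    I, J = sample_I_J(7)
    assert I == J or len({I[0], I[1], J[1]}) == 3
```

The same sampler can also be used to estimate the conditional probabilities P{J = j | I = i} and compare them with (2.23) below.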

For each (i, j) such that P{I = i, J = j} > 0, define a new vector V(i, j) = (V_0, ..., V_k) ∈ N_k, 2 ≤ k ≤ 5, with {V_0, ..., V_k} = {i_0, i_1, i_2, j_0, j_1, j_2}, which preserves the order of the components of the vectors I and J. Formally, this means

V(i, j) = i, if i = j;
V(i, j) = (j_0, j_1, i_0, i_1, i_2), if {i_0, i_1, i_2} ∩ {j_0, j_1, j_2} = {i_0};
V(i, j) = (i_0, i_1, i_2, j_2), if {i_0, i_1, i_2} ∩ {j_0, j_1, j_2} = {j_0, j_1};
V(i, j) = (i_0, i_1, i_2, j_1, j_2), if {i_0, i_1, i_2} ∩ {j_0, j_1, j_2} = {j_0}.

Define also for any k ≥ 1 a function s_{(i_1, j_1)}(·): N_k → N_k to be a transposition of the components whose values are i_1 and j_1. This means that for any k ≥ 1 and v ∈ N_k, s_{(i_1, j_1)}(v) = v, unless i_1 ≠ j_1 and v_m = i_1 and v_l = j_1 for some m ≠ l, 0 ≤ m, l ≤ k, in which case s_{(i_1, j_1)}(v) = (s_0, ..., s_k), where s_m = j_1, s_l = i_1, and s_p = v_p otherwise.

For any H ∈ H and i ∈ {1, ..., n} let

f_H(i) = j and f_H^{−1}(j) = i if (i, j) ∈ H.    (2.5)

Then we define recursively f_H^k(i) = f_H^{k−1}(f_H^1(i)) for all k > 1 and i ∈ {1, ..., n}.
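In code, a Hamiltonian cycle H is conveniently represented by its successor map f_H; the small illustrative sketch below (not from the paper) builds f_H from the arc set and computes the iterates f_H^k.

```python
def successor_map(H):
    """Successor map f_H of a Hamiltonian cycle H given as a set of arcs {(i, j), ...}."""
    return {i: j for (i, j) in H}

def iterate(f, i, k):
    """The k-fold iterate f_H^k(i)."""
    for _ in range(k):
        i = f[i]
    return i

# Example: the cycle 1 -> 3 -> 2 -> 4 -> 1 on {1, 2, 3, 4}.
H = {(1, 3), (3, 2), (2, 4), (4, 1)}
f_H = successor_map(H)
assert [iterate(f_H, 1, k) for k in (1, 2, 3, 4)] == [3, 2, 4, 1]   # f_H^n is the identity
```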


Further, for any 2 ≤ r ≤ 4 and v = (v_0, ..., v_r) ∈ N_r we shall define a transformation

G(v, ·): H_n → H_n    (2.6)

as follows. Denote

Γ(v) = {(v_0, v_1), ..., (v_{r−1}, v_r)}.

Then
(I) if H ∩ Γ(v) = Γ(v), set G(v, H) = H;
(II) otherwise, for any l ∈ N define, for any 1 ≤ r < n − 1,

M(l) = M(l, H, v) := min{m ≥ 1: f_H^m(l) ∉ {v_1, ..., v_r}}    (2.7)

and proceed with the following iterative steps.

Step A: Define the set of arcs

G_{r+1}(v, H) := Γ(v) ∪ {(v_r, v'_{r+1})},

where

v'_{r+1} := f_H^{M(v_0)}(v_0).    (2.8)

Step B: Assume that we have already constructed the set

G_{r+k}(v, H) := Γ(v) ∪ {(v_r, v'_{r+1}), ..., (v'_{r+k−1}, v'_{r+k})}

for 1 ≤ k ≤ n − r − 1. Then let

v'_{r+k+1} := f_H^{M(v'_{r+k})}(v'_{r+k})    (2.9)

and define

G_{r+k+1}(v, H) := G_{r+k}(v, H) ∪ {(v'_{r+k}, v'_{r+k+1})}.

Step C: Finally, set

G(v, H) := G_n(v, H).

Also, for i = (i_0, i_1, i_2) and j = (j_0, j_1, j_2) such that (i_0, i_1, i_2, j_0, j_1, j_2) ∈ N_5, define

G(i, j, ·): H_n → H_n    (2.10)

as follows. For any l ∈ N define

M_1(l) = M_1(l, H, i, j) := min{m ≥ 1: f_H^m(l) ∉ {i_1, i_2, j_1, j_2}}

and

M_0(l) = M_0(l, H, i, j) := min{m ≥ 1: f_H^m(l) ∉ {i_1, i_2, j_0, j_1, j_2}}

and proceed with the following iterative steps.


Step 1: Define the set of arcs

G_{r+1}(i, j, H) := Γ(i) ∪ {(i_r, i'_{r+1})}    (here r = 2, since i has three components),

where

i'_{r+1} := f_H^{M_1(i_0)}(i_0).

Step 2: Assume that we have already constructed the set

G_{r+k}(i, j, H) := Γ(i) ∪ {(i_r, i'_{r+1}), ..., (i'_{r+k−1}, i'_{r+k})}

for 1 ≤ k ≤ n − r − 1. Then if i'_{r+k} ≠ j_0, let

i'_{r+k+1} := f_H^{M_1(i'_{r+k})}(i'_{r+k})    (2.11)

and define

G_{r+k+1}(i, j, H) := G_{r+k}(i, j, H) ∪ {(i'_{r+k}, i'_{r+k+1})}.

Step 3: Assume that we have reached with this procedure the point j_0, i.e. for some 1 ≤ k ≤ n − r − 2 we have constructed

G_{r+k}(i, j, H) := Γ(i) ∪ {(i_r, i'_{r+1}), ..., (i'_{r+k−1}, i'_{r+k})}

with i'_{r+k} = j_0. Then set

G_{r+k+2}(i, j, H) := G_{r+k}(i, j, H) ∪ Γ(j).

Step 4: Unless k = n − r − 2, proceed as in Step 2 but with M_0(i'_{r+k}) in the formula (2.11) instead of M_1(i'_{r+k}).

Step 5: Finally, set

G(i, j, H) := G_n(i, j, H).

We shall list some obvious properties of the transformation G which we will use later on. Firstly, it follows immediately from our construction that for any v ∈ N_k, 2 ≤ k ≤ 4,

G(v, H) ∈ H(v) := {H ∈ H: Γ(v) ⊂ H}    (2.12)

and for any i = (i_0, i_1, i_2), j = (j_0, j_1, j_2) such that (i_0, i_1, i_2, j_0, j_1, j_2) ∈ N_5,

G(i, j, H) ∈ H(i, j) := {H ∈ H_n: Γ(i) ⊂ H and Γ(j) ⊂ H}    (2.13)

and

G(i, j, H) = G(j, i, H),    (2.14)

where clearly

|H(v)| = (n − (k + 1))! and |H(i, j)| = (n − 5)!,    (2.15)

with |A| denoting the cardinality of a set A.
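The counting identity |H(v)| = (n − (k + 1))! in (2.15) can be verified by brute force for small n; the sketch below (ours, not from the paper) enumerates all Hamiltonian cycles on {1, ..., n} and counts those containing a fixed path Γ(v).

```python
import itertools
from math import factorial

def hamiltonian_cycles(n):
    """All Hamiltonian cycles on {1, ..., n}, each as a frozenset of arcs; there are (n-1)!."""
    for p in itertools.permutations(range(2, n + 1)):
        seq = (1,) + p
        yield frozenset((seq[k], seq[(k + 1) % n]) for k in range(n))

n = 6
v = (2, 5, 1)                                    # a path v_0 -> v_1 -> v_2, so k = 2 here
gamma_v = {(v[m], v[m + 1]) for m in range(len(v) - 1)}
count = sum(1 for H in hamiltonian_cycles(n) if gamma_v <= H)
k = len(v) - 1
assert count == factorial(n - (k + 1))           # |H(v)| = (n - (k + 1))!
print(count)
```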


Then after some straightforward calculations we get, for any fixed v ∈ N_k, 2 ≤ k ≤ 4, and any fixed H_0 ∈ H(v), that

#{H ∈ H: G(v, H) = H_0} = |H| / |H(v)|    (2.16)

and, correspondingly, for any fixed i = (i_0, i_1, i_2), j = (j_0, j_1, j_2) with (i_0, i_1, i_2, j_0, j_1, j_2) ∈ N_5 and H_0 ∈ H(i, j),

#{H ∈ H: G(i, j, H) = H_0} = |H| / |H(i, j)|.    (2.17)

Let H be a randomly chosen Hamiltonian cycle uniformly distributed on H and independent of the random vectors I and J defined above. Next for any value of (I, J) we define

G((I, J), H) = G(I, J, H) if {I_0, I_1, I_2} ∩ {J_0, J_1, J_2} = ∅, and G((I, J), H) = G(V(I, J), H) otherwise,    (2.18)

and

G_1((I, J), H) = G((I_0, J_1, I_2), (J_0, I_1, J_2), H) if {I_0, I_1, I_2} ∩ {J_0, J_1, J_2} = ∅, and G_1((I, J), H) = G(s_{(I_1, J_1)}(V(I, J)), H) otherwise.    (2.19)

Further let H_1 = G((I, J), H) and H_2 = G_1((I, J), H), and set W = W(H), W_1 = W(H_1) and W_2 = W(H_2). Also let

W̃ = W − EW,  W̃_1 = W_1 − EW_1,  W̃_2 = W_2 − EW_2.

Lemma 2.1. We have

W =_d W_1 =_d W_2,    (2.20)

and for any measurable function g,

EWg(W) = nEa(I_0, I_1)g(W_1) = nEa(I_0, J_1)g(W_2).    (2.21)

Proof. Let H_0 ∈ H be fixed arbitrarily. Consider the probability

P{G((I, J), H) = H_0}
  = Σ_{(i,j): Γ(i)⊂H_0, Γ(j)⊂H_0} P{G((I, J), H) = H_0 | (I, J) = (i, j)} P{J = j | I = i} P{I = i}
  = Σ_{(i,j)∈N_5: Γ(i)⊂H_0, Γ(j)⊂H_0} P{G(i, j, H) = H_0} P{J = j | I = i} P{I = i}
    + Σ_{k=3}^{5} Σ_{(i,j): V(i,j)∈N_{k−1}, Γ(V(i,j))⊂H_0} P{G(V(i, j), H) = H_0} P{J = j | I = i} P{I = i}.    (2.22)


Notice that by the construction of (I, J) the probability P{J = j | I = i}, when positive, depends only on the cardinality of the set {i_0, i_1, i_2} ∪ {j_0, j_1, j_2} and n. Therefore, we can denote p_k := P{J = j | I = i} whenever |{i_0, i_1, i_2} ∪ {j_0, j_1, j_2}| = k and P{J = j | I = i} > 0, and after trivial calculations get

p_3 = 1/(n − 1),  p_4 = 1/((n − 3)(n − 1)),  p_5 = p_6 = 1/((n − 4)(n − 3)(n − 1)).    (2.23)

Clearly, for any i ∈ N_2,

P{I = i} = 1/(n(n − 1)(n − 2)).    (2.24)

Using the fact that H is uniformly distributed on H and is independent of (I, J), and taking into account (2.17) together with (2.15), we get for any (i, j) ∈ N_5 such that Γ(i) ⊂ H_0 and Γ(j) ⊂ H_0,

P{G(i, j, H) = H_0} = 1/(n − 5)!.    (2.25)

Analogously, taking into account (2.16) and (2.14), for any (i, j) such that V(i, j) ∈ N_{k−1} and Γ(V(i, j)) ⊂ H_0,

P{G(V(i, j), H) = H_0} = 1/(n − k)!.    (2.26)

Substituting now (2.25), (2.26), and (2.24) into (2.22), we obtain

P{G((I, J), H) = H_0} = (1/(n(n − 1)(n − 2))) ( Σ_{(i,j)∈N_5: Γ(i)⊂H_0, Γ(j)⊂H_0} p_6/(n − 5)! + Σ_{k=3}^{5} Σ_{(i,j): V(i,j)∈N_{k−1}, Γ(V(i,j))⊂H_0} p_k/(n − k)! ),    (2.27)

where it is understood that the sums are only taken for pairs (i, j) such that P{J = j | I = i} > 0. Making use of (2.23), along with simple calculations of the possible choices for (i, j), we get from (2.27) that

P{G((I, J), H) = H_0} = 1/(n − 1)! = P{H = H_0}.    (2.28)

This proves

H =_d G((I, J), H),    (2.29)

which in turn implies

W =_d W_1.    (2.30)

Now write

I' = (I_0, J_1, I_2) and J' = (J_0, I_1, J_2).


It is routine to show that V(I, J) =_d V(I', J'), from which it readily follows that W =_d W_2, which when combined with (2.30) proves (2.20).

To establish (2.21) we observe that for any h ∈ H, given H_1 = h, the conditional distribution of (I_0, I_1) is concentrated on the edges of the graph h (see properties (2.12) and (2.13)). Furthermore, conditioned on H_1 = h, the random variable (I_0, I_1) is uniformly distributed on h. To see this, notice that when H_1 = h and (i_0, i_1) ∈ h we have (i_0, i_1) = (i_0, f_h(i_0)) by definition (2.5), and then

P{(I_0, I_1) = (i_0, i_1) | H_1 = h} = P{(I_0, f_{H_1}(I_0)) = (i_0, f_h(i_0)) | H_1 = h} = P{I_0 = i_0 | H_1 = h} = 1/n.

Therefore, we get

nE[a(I_0, I_1)g(W_1)] = nE[g(W(H_1)) E{a(I_0, I_1) | H_1}] = nE[ g(W(H_1)) (1/n) Σ_{(i,j)∈H_1} a(i, j) ] = E[W(H_1) g(W(H_1))].    (2.31)

Then taking into account the just established equalities (2.20), we immediately prove the first equality in (2.21). The second one is proved analogously. This completes the proof of the lemma.

We shall also use the fact that the random variables W_2 − W_1 and W are independent, since W_2 − W_1 is a function of (I, J) only, and W = W(H), where H is independent of (I, J). From now on write ā(·, ·) = a(·, ·) − n^{−1}EW. We see by (2.20) and (2.21) that

E[W̃ g(W̃)] = E[W̃_2 g(W̃_2)] = nE[a(I_0, J_1) g(W̃_2)] − EW Eg(W̃_2)
  = nE[ā(I_0, J_1) g(W̃_1)] + nE[ā(I_0, J_1)(W̃_2 − W̃_1) g'(W̃)]
    + nE[ ā(I_0, J_1)(W̃_2 − W̃_1) ∫_0^1 (g'(W̃ + (W̃_1 − W̃) + t(W̃_2 − W̃_1)) − g'(W̃)) dt ].


Notice that (I_0, J_1) is independent of W_1, therefore

nE[ā(I_0, J_1) g(W̃_1)] = 0.

Also, by (2.20) and (1.14) and the independence of W and W_1 − W_2, and the independence of (I_0, J_1) and W_1,

nE[ā(I_0, J_1)(W̃_2 − W̃_1) g'(W̃)] = nE[ā(I_0, J_1) W̃_2] Eg'(W̃) = Var(W) Eg'(W̃) = Eg'(W̃).

Thus, by (2.1),

|Eh(W̃) − Φ(h)| ≤ nE[ |ā(I_0, J_1)(W̃_2 − W̃_1)| | ∫_0^1 (g'(W̃ + (W̃_1 − W̃) + t(W̃_2 − W̃_1)) − g'(W̃)) dt | ].

As in Schneller (1988), we now fix ε > 0 arbitrarily. By (2.2) we can find a δ > 0 and 0 ≤ K < ∞ such that |g'(x)| ≤ K for all x ∈ R and |g'(x) − g'(y)| ≤ ε whenever |x − y| ≤ δ. Following exactly as in Schneller, we get then that the last term is

≤ 2KnE|ā(I_0, J_1)(W̃_1 − W̃_2)| 1{|W̃_1 − W̃| + |W̃_1 − W̃_2| > δ}
  + εnE|ā(I_0, J_1)(W̃_1 − W̃_2)| 1{|W̃_1 − W̃| + |W̃_1 − W̃_2| ≤ δ} =: A_1 + A_2.    (2.32)

Notice that

|W̃_1 − W̃| ≤ Σ_{(i,j)∈H_1ΔH} |ā(i, j)| and |W̃_1 − W̃_2| ≤ Σ_{(i,j)∈H_1ΔH_2} |ā(i, j)|,

where AΔB denotes the symmetric difference between two sets A and B. Obviously, we have

|H_1ΔH| ≤ 16 and |H_1ΔH_2| ≤ 8    (2.33)

due to definitions (2.18) and (2.19). Setting C = H_1ΔH_2 and D = H_1ΔH ∪ H_1ΔH_2, and choosing η = δ/24, we see then that

A_1 ≤ 2KnE Σ_{(α,β)∈C} Σ_{(γ,λ)∈D} |ā(I_0, J_1) ā(α, β)| 1{|ā(γ, λ)| > η},

which in turn is

≤ 2KnE Σ_{(α,β)∈C} Σ_{(γ,λ)∈D} η^2 1{|ā(γ, λ)| > η}
  + 2KnE Σ_{(α,β)∈C} Σ_{(γ,λ)∈D} [ η |ā(I_0, J_1)| 1{|ā(I_0, J_1)| > η} 1{|ā(γ, λ)| > η} ]
  + 2KnE Σ_{(α,β)∈C} Σ_{(γ,λ)∈D} [ η |ā(α, β)| 1{|ā(α, β)| > η} 1{|ā(γ, λ)| > η} ]
  + 2KnE Σ_{(α,β)∈C} Σ_{(γ,λ)∈D} [ |ā(I_0, J_1)| 1{|ā(I_0, J_1)| > η} |ā(α, β)| 1{|ā(α, β)| > η} ].

Taking into account (2.33) and using the Chebyshev and Cauchy–Schwarz inequalities, we derive from the last bound that

A_1 ≤ 8 × 8 × 16 K (n − 1)^{−1} Σ_{(i,j): |a(i,j)| > η} a^2(i, j).    (2.34)

The Lindeberg condition implies that the right-hand side of inequality (2.34) converges to 0 as n → ∞. Finally, observe that by (1.13), (2.33) and the Cauchy–Schwarz inequality, for all n ≥ 3,

A_2 ≤ 8εnE ā^2(I_0, J_1) ≤ 8ε Σ_{i=1}^n Σ_{j=1}^n a^2(i, j) / (n − 1) ≤ 16εM.

Since ε > 0 can be selected arbitrarily small, this combined with (2.34) and (2.32) completes the proof of Theorem 1.1.

Acknowledgements

We thank two referees for their comments that were helpful for improving our presentation.

References

Bolthausen, E., 1984. An estimate of the remainder in a combinatorial central limit theorem. Z. Wahrsch. Verw. Gebiete 66, 379–386.
Haeusler, E., Mason, D.M., Turova, T.S., 2000. A study of serial ranks via random graphs. Bernoulli 6 (3), 541–570.
Hájek, J., 1961. Some extensions of the Wald–Wolfowitz–Noether theorem. Ann. Math. Statist. 32, 506–523.
Hallin, M., Ingenbleek, J.-Fr., Puri, M.L., 1985. Linear serial rank tests for randomness against ARMA alternatives. Ann. Statist. 13, 34–71.
Hallin, M., Vermandele, C., 1996. An elementary proof of asymptotic normality for serial rank statistics. In: Brunner, E., Denker, M. (Eds.), Festschrift in Honor of Madan L. Puri on the Occasion of his 65th Birthday. Research Developments in Probability and Statistics. VSP, Utrecht, pp. 163–191.
Lombard, F., 1986. An elementary proof of asymptotic normality for linear rank statistics. South African Statist. J. 20, 29–35.
Motoo, M., 1957. On Hoeffding's combinatorial central limit theorem. Ann. Inst. Statist. Math. 8, 145–154.
Schneller, W., 1988. A short proof of Motoo's combinatorial central limit theorem using Stein's method. Z. Wahrsch. Verw. Gebiete 78, 249–252.
Stein, C., 1972. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 2, pp. 583–602.
Wald, A., Wolfowitz, J., 1943. An exact test for randomness in the nonparametric case based on serial correlation. Ann. Math. Statist. 14, 378–388.