Journal of Statistical Planning and Inference 136 (2006) 1 – 32 www.elsevier.com/locate/jspi
Asymptotic linearity of serial and nonserial multivariate signed rank statistics

Marc Hallin*,1, Davy Paindaveine1

Département de Mathématique, I.S.R.O., and E.C.A.R.E.S., Université Libre de Bruxelles, B-1050 Bruxelles, Belgium

Received 5 November 2003; accepted 27 May 2004. Available online 25 July 2004.
Abstract

Asymptotic linearity plays a key role in estimation and testing in the presence of nuisance parameters. This property is established, in the very general context of a multivariate general linear model with elliptical VARMA errors, for the serial and nonserial multivariate rank statistics considered in Hallin and Paindaveine (Ann. Statist. 30 (2002a) 1103; Bernoulli 8 (2002b) 787; Ann. Statist. 32 (2004), to appear) and Oja and Paindaveine (J. Statist. Plann. Inference (2004), to appear). These statistics, which are multivariate versions of classical signed rank statistics, involve (i) multivariate signs based either on (pseudo-)Mahalanobis residuals, or on a modified version (absolute interdirections) of Randles's interdirections, and (ii) a concept of ranks based either on (pseudo-)Mahalanobis distances or on lift-interdirections.
© 2004 Elsevier B.V. All rights reserved.

MSC: 62E20; 62G10; 62H10; 62M10

Keywords: Asymptotic linearity; Interdirections; Lift-interdirections; Linear models; Multivariate ranks; Multivariate signs; VARMA models
∗ Corresponding author. Tel.: +32-2650-5886; fax: +32-2650-5899.
E-mail address: [email protected] (M. Hallin).
1 Research supported by a P.A.I. contract of the Belgian federal Government, and an A.R.C. contract of the Communauté française de Belgique.
0378-3758/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2004.05.013
1. Introduction

1.1. Rank-based inference for multivariate observations

Whereas the classical univariate theory of rank-based inference (rank tests and R-estimation) presents a fairly complete and coherent body of methods applicable to a variety of models, ranging from simple location and scale problems to general linear and time series models, the corresponding multivariate theory is much less systematic and elaborate. The reason for this relative underdevelopment certainly lies in the difficulty of defining an adequate multivariate concept of ranks. Indeed, except for the theory of componentwise ranks (see Puri and Sen, 1971 for a systematic account), which suffers from a severe lack of affine invariance, the results in the area are rather piecemeal, scattered, and incomplete. Recently, however, the subject has received renewed attention. Hettmansperger et al. (1997), Möttönen and Oja (1995), Möttönen et al. (1997, 1998), Oja (1999), Ollila et al. (2004), and Visuri et al. (2003) propose estimation and testing methods based either on spatial signs and ranks, or on an affine-equivariant concept of signs and ranks related to the well-known Oja (1983) median. Randles (1989), Peters and Randles (1990), Randles and Peters (1990), Jan and Randles (1994), and Randles and Um (1998) propose affine-invariant multivariate signed rank procedures based on Randles's (1989) concept of interdirections (a multivariate sign concept) and the ranks of Mahalanobis distances. Their procedures require elliptically symmetric errors, while Oja's are valid under the more general assumption of central symmetry. Another approach to multivariate ranks can be based on concepts of (preferably affine-invariant) data depth (see Liu, 1990, Liu et al., 1999, or Zuo and Serfling, 2000 for a discussion). Invariance, in this strand of literature, is mainly considered in connection with robustness (as opposed to efficiency).
Moreover, all these methods are restricted to location and regression models with independent observations. Inspired by Le Cam's asymptotic theory of statistical experiments, a different point of view is taken in a series of papers by Hallin and Paindaveine (2002a-c, 2004a) and Oja and Paindaveine (2004), where, based on the same concepts of multivariate signs and ranks as above, locally asymptotically optimal procedures are developed for a broader class of models, including multivariate time series ones. All these results, however, only address those testing problems for which exact residuals can be computed under the null hypothesis: essentially, thus, null hypotheses of the form $\mathbf{\theta} = \mathbf{\theta}_0$, under which the parameter of interest $\mathbf{\theta}$ is completely specified. In practice, null hypotheses of interest seldom are of that type, and usually consist in imposing some limited number of constraints under which $\mathbf{\theta}$ still remains partially unspecified. The univariate literature on ranks then usually proposes tests based on the so-called aligned ranks, computed from estimated residuals. The key result in the study of the asymptotic behavior of these aligned ranks is an asymptotic linearity property of the test statistics under consideration: see Jurečková (1969), van Eeden (1972), Heiler and Willers (1988), Koul (1992), Hallin and Puri (1994), and many others for univariate rank and signed rank results of this type. Despite its purely theoretical nature, such a result is thus of considerable importance for applications. The purpose of this paper is to derive an asymptotic linearity property in the multivariate case, for the serial and nonserial statistics proposed in Hallin and Paindaveine (2002a-c,
2004a) and Oja and Paindaveine (2004). Asymptotic linearity only makes sense in the context of a given model, which should therefore be as general as possible. The multivariate (multiresponse) linear model with VARMA errors we are considering here encompasses all situations in which rank-based procedures have been considered so far. The resulting multivariate aligned rank tests are constructed in a companion paper (Hallin and Paindaveine, 2004b).

The paper is organized as follows. In Section 2, we define the various concepts of multivariate signs (pseudo-Mahalanobis signs or absolute interdirections) and ranks (the pseudo-Mahalanobis and the lift-interdirection ones) to be used in the sequel, and provide the corresponding asymptotic representation results. Section 3 deals with the linear model with VARMA errors, its uniform local asymptotic normality, and the regularity assumptions under which this property holds. In Section 4, we establish the asymptotic linearity of the statistics under study, which is the main result of this paper. Technical proofs are concentrated in the Appendix.
2. Multivariate ranks, multivariate signs, and rank-based statistics

2.1. Serial and nonserial statistics

Let $\mathbf{Z} := (\mathbf{Z}_1, \ldots, \mathbf{Z}_n)$ be an $n$-tuple of i.i.d. $k$-variate random vectors. Denoting by $\Sigma$ a symmetric positive definite $k \times k$ matrix (the scatter matrix), and by $f : \mathbb{R}_0^+ \to \mathbb{R}^+$ a nonnegative function (the radial density) such that $f > 0$ a.e. and $\int_0^\infty r^{k-1} f(r)\,\mathrm{d}r < \infty$, we assume throughout that $\mathbf{Z}$ has an elliptical density. More precisely, we make the following assumption.

Assumption (A1). $\mathbf{Z}$ has an elliptical density, of the form $\prod_{t=1}^n \underline{f}(\mathbf{z}_t; \Sigma, f)$, $(\mathbf{z}_1, \ldots, \mathbf{z}_n) \in \mathbb{R}^{nk}$, where
$$\underline{f}(\mathbf{z}; \Sigma, f) := c_{k,f} (\det \Sigma)^{-1/2} f(\|\mathbf{z}\|_\Sigma), \quad \mathbf{z} \in \mathbb{R}^k. \tag{2.1}$$
As usual, $\|\mathbf{z}\|_\Sigma := (\mathbf{z}' \Sigma^{-1} \mathbf{z})^{1/2}$ denotes the norm of $\mathbf{z}$ in the metric associated with $\Sigma$. The constant $c_{k,f}$ is the normalization factor $(\omega_k \mu_{k-1;f})^{-1}$, where $\omega_k$ stands for the $(k-1)$-dimensional Lebesgue measure of the unit sphere $\mathcal{S}^{k-1} \subset \mathbb{R}^k$, and $\mu_{l;f} := \int_0^\infty r^l f(r)\,\mathrm{d}r$. Here and in the sequel, we write $\Sigma^{-1/2}$ for the unique upper-triangular $k \times k$ array with positive diagonal elements satisfying $\Sigma^{-1} = (\Sigma^{-1/2})' \Sigma^{-1/2}$. Each vector $\mathbf{Z}_t$ decomposes into $\mathbf{Z}_t = d_t(\Sigma)\, \Sigma^{1/2} \mathbf{U}_t(\Sigma)$, where $d_t(\Sigma) := \|\mathbf{Z}_t\|_\Sigma$ and $\mathbf{U}_t(\Sigma) := \Sigma^{-1/2} \mathbf{Z}_t / d_t(\Sigma)$. Note that $\mathbf{U}_1(\Sigma), \ldots, \mathbf{U}_n(\Sigma)$ are i.i.d., and uniformly distributed over $\mathcal{S}^{k-1}$, hence generalizing the traditional concept of signs: we henceforth call them multivariate signs. Similarly, $d_1(\Sigma), \ldots, d_n(\Sigma)$ are i.i.d. with probability density function
$$\tilde{f}_k(r) := (\mu_{k-1;f})^{-1} r^{k-1} f(r)\, I_{[r>0]}, \quad r \in \mathbb{R}, \tag{2.2}$$
where $I_A$ stands for the indicator function of the set $A$. Denote by $\tilde{F}_k$ the corresponding distribution function.
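As an illustration, the decomposition $\mathbf{Z}_t = d_t(\Sigma)\Sigma^{1/2}\mathbf{U}_t(\Sigma)$ is easily computed numerically. The sketch below is ours, not the paper's (the function name `sphericize` is an assumption); it follows the upper-triangular root convention of the text via a Cholesky factorization.

```python
import numpy as np

def sphericize(Z, Sigma):
    """Decompose each row Z_t of Z as d_t(Sigma) * Sigma^{1/2} U_t(Sigma).

    Sigma^{-1/2} is taken upper triangular with positive diagonal,
    satisfying Sigma^{-1} = (Sigma^{-1/2})' Sigma^{-1/2}, as in the text.
    """
    # cholesky returns lower-triangular L with L L' = Sigma^{-1};
    # its transpose is the upper-triangular root: (L')' L' = Sigma^{-1}.
    L = np.linalg.cholesky(np.linalg.inv(Sigma))
    Sigma_inv_sqrt = L.T
    Zs = Z @ Sigma_inv_sqrt.T           # rows: Sigma^{-1/2} Z_t
    d = np.linalg.norm(Zs, axis=1)      # Mahalanobis distances d_t(Sigma)
    U = Zs / d[:, None]                 # multivariate signs, on S^{k-1}
    return d, U
```

Under Assumption (A1), the rows of `U` are uniform over the unit sphere and independent of the distances `d`.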
The uniform local asymptotic normality (ULAN) property of the multiresponse linear model with elliptical VARMA errors and the structure of the corresponding central sequence (see Section A.1) imply that all the relevant information (in this very general framework) about the serial component of the model is contained in generalized cross-covariance matrices of the form
$$\mathbf{\Gamma}^{(n)}_{i;\Sigma,K} := (n-i)^{-1} (\Sigma^{-1/2})' \left[ \sum_{t=i+1}^{n} K_1(d_t(\Sigma))\, K_2(d_{t-i}(\Sigma))\, \mathbf{U}_t(\Sigma) \mathbf{U}'_{t-i}(\Sigma) \right] (\Sigma^{1/2})', \quad i = 1, \ldots, n-1, \tag{2.3}$$
where $K_1$ and $K_2$ are adequate real-valued score functions. The intuitive interpretation of such matrices is clear: letting indeed $K_1(d) = K_2(d) = d$, $\mathbf{\Gamma}^{(n)}_{i;\Sigma,K}$ reduces to $(n-i)^{-1} \Sigma^{-1} \sum_t \mathbf{Z}_t \mathbf{Z}'_{t-i} = \Sigma^{-1} \mathbf{C}^{(n)}_i$, where $\mathbf{C}^{(n)}_i$ (taking into account the fact that centering of the $\mathbf{Z}_t$'s is not required) is the traditional lag-$i$ cross-covariance matrix. The functions $K_1$ and $K_2$ thus weight the observations according to their distances from the origin, bringing some flexibility into the assessment of serial cross-dependencies, a flexibility that allows for improving either robustness or efficiency. For the trend part of the model, this information is contained in nonserial statistics of the form
$$\mathbf{\Delta}^{(n)}_{i;\Sigma,K} := (n-i)^{-1} (\Sigma^{-1/2})' \sum_{t=i+1}^{n} K_0(d_t(\Sigma))\, \mathbf{U}_t(\Sigma)\, (\mathbf{x}^{K(n)}_{t-i})', \quad i = 0, \ldots, n-1, \tag{2.4}$$
where $K_0$ again is an adequate score function, whereas the $\mathbf{x}^{K(n)}_{t-i}$ are nonrandom weights related to the regression constants in the model. The intuitive interpretation of those nonserial statistics is very much the same as that of the serial ones (2.3). For $K_0(d) = d$, they are directly related to linear regression coefficients; here again, the function $K_0$ allows for weighting the observations according to their distances to the origin.
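A minimal numerical sketch of the generalized cross-covariance matrices (2.3) follows; the function name and NumPy implementation are ours, and identity scores are used as defaults so that the result reduces to $\Sigma^{-1}\mathbf{C}^{(n)}_i$, as in the interpretation above.

```python
import numpy as np

def cross_cov_stat(Z, Sigma, i, K1=lambda d: d, K2=lambda d: d):
    """Generalized lag-i cross-covariance matrix (2.3) for given scatter
    Sigma and score functions K1, K2 (identity scores by default)."""
    n, k = Z.shape
    L = np.linalg.cholesky(np.linalg.inv(Sigma))
    S_m = L.T                          # Sigma^{-1/2}, upper triangular
    S_p = np.linalg.inv(S_m)           # Sigma^{1/2}
    Zs = Z @ S_m.T
    d = np.linalg.norm(Zs, axis=1)
    U = Zs / d[:, None]
    acc = sum(K1(d[t]) * K2(d[t - i]) * np.outer(U[t], U[t - i])
              for t in range(i, n))
    return S_m.T @ (acc / (n - i)) @ S_p.T
```

With the default identity scores, the output coincides (exactly, not just asymptotically) with $\Sigma^{-1}\mathbf{C}^{(n)}_i$, the sphericized lag-$i$ cross-covariance.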
2.2. Pseudo-Mahalanobis signs and ranks

Both the serial statistics in (2.3) and the nonserial ones in (2.4) are measurable with respect to (a) the distances $d_t(\Sigma)$ between the sphericized vectors $\Sigma^{-1/2}\mathbf{Z}_t$ and the origin in $\mathbb{R}^k$ which, under the assumptions made, are i.i.d. over the positive real line, so that their ranks have the same distribution-freeness and maximal invariance properties as the ranks of the absolute values of any symmetrically distributed univariate $n$-tuple, and (b) the multivariate signs $\mathbf{U}_t(\Sigma) := \Sigma^{-1/2}\mathbf{Z}_t/d_t(\Sigma)$ which, under the same conditions, are uniformly distributed over the unit sphere. These quantities however both involve the (generally unknown) scatter matrix $\Sigma$. If finite second-order moments exist, a "natural" root-$n$ consistent candidate for estimating $\Sigma$ is the empirical covariance matrix $n^{-1}\sum_{t=1}^n \mathbf{Z}_t\mathbf{Z}'_t$. The robustness properties of empirical
covariances however are rather poor, and finite second-order moments need not exist. More generally, we thus assume the following.

Assumption (B1). A sequence $\hat{\Sigma}^{(n)}$ of estimators of $\Sigma$ exists, such that
(i) $\sqrt{n}(\hat{\Sigma}^{(n)} - a\Sigma) = O_P(1)$ as $n \to \infty$ for some positive real $a$, and
(ii) $\hat{\Sigma}^{(n)}$ is invariant under permutations and reflections (with respect to the origin in $\mathbb{R}^k$) of the vectors $\mathbf{Z}_t$.

Assumption (B1) will be sufficient for the asymptotic linearity result in Section 4. However, the affine-equivariance of the proposed nonparametric versions of (2.3) and (2.4) also requires the following equivariance assumption on $\hat{\Sigma}^{(n)}$.

Assumption (B2). The estimator $\hat{\Sigma} := \hat{\Sigma}^{(n)}$ is quasi-affine-equivariant, in the sense that, for all $k \times k$ full-rank matrices $\mathbf{M}$, $\hat{\Sigma}(\mathbf{M}) = d\, \mathbf{M} \hat{\Sigma} \mathbf{M}'$, where $\hat{\Sigma}(\mathbf{M})$ stands for the statistic $\hat{\Sigma}$ computed from the $n$-tuple $(\mathbf{M}\mathbf{Z}_1, \ldots, \mathbf{M}\mathbf{Z}_n)$, and $d$ denotes some positive scalar that may depend on $\mathbf{M}$ and on the sample $(\mathbf{Z}_t,\ t = 1, \ldots, n)$, but not on $t$.

Under Assumption (B2), the pseudo-Mahalanobis distances $d_t(\hat{\Sigma}) := \|\mathbf{Z}_t\|_{\hat{\Sigma}} = (\mathbf{Z}'_t \hat{\Sigma}^{-1} \mathbf{Z}_t)^{1/2}$, $t = 1, \ldots, n$, are quasi-affine-invariant, in the sense that $\|\mathbf{M}\mathbf{Z}_t\|_{\hat{\Sigma}(\mathbf{M})} = d^{-1/2}\|\mathbf{Z}_t\|_{\hat{\Sigma}}$. The word "quasi" stresses that the equivariance/invariance properties of $\hat{\Sigma}$ and $d_t(\hat{\Sigma})$ hold up to some scalar factor that does not depend on $t$. This factor, however, disappears when considering the ranks $\hat{R}_t$, $t = 1, \ldots, n$, of the $d_t(\hat{\Sigma})$'s. Therefore, these ranks are strictly affine-invariant (that is, affine-invariant in the usual sense): call them the pseudo-Mahalanobis ranks. The corresponding multivariate signs $\mathbf{W}_t := \mathbf{U}_t(\hat{\Sigma})$ will be referred to as pseudo-Mahalanobis signs. The terminology Mahalanobis signs and Mahalanobis ranks will be used in case $\hat{\Sigma}$ is the classical covariance matrix.

Denoting by $\hat{\Sigma}^{-1/2}(\mathbf{M})$ the statistic $\hat{\Sigma}^{-1/2}$ computed from the $n$-tuple $(\mathbf{M}\mathbf{Z}_1, \ldots, \mathbf{M}\mathbf{Z}_n)$, $\hat{\Sigma}^{-1/2}$ under Assumption (B2) enjoys the equivariance property
$$\hat{\Sigma}^{-1/2}(\mathbf{M}) = d^{-1/2}\, \mathbf{O}\, \hat{\Sigma}^{-1/2} \mathbf{M}^{-1}, \tag{2.5}$$
where $\mathbf{O}$ is some $k \times k$ orthogonal matrix (for a proof, see Randles, 2000, p. 1267).

For each $\Sigma$ and $n$, the group of continuous monotone radial transformations $\mathcal{G}^{(n)} = \{G^{(n)}_g\}$, acting on $(\mathbb{R}^k)^n$ and characterized by
$$G^{(n)}_g(\mathbf{Z}_1, \ldots, \mathbf{Z}_n) := (g(d_1(\Sigma))\, \Sigma^{1/2} \mathbf{U}_1(\Sigma), \ldots, g(d_n(\Sigma))\, \Sigma^{1/2} \mathbf{U}_n(\Sigma)), \tag{2.6}$$
where $g : \mathbb{R}^+ \to \mathbb{R}^+$ is a continuous monotone increasing function such that $g(0) = 0$ and $\lim_{r\to\infty} g(r) = \infty$, is a generating group for the family of elliptical densities $\{\prod_{t=1}^n \underline{f}(\cdot\,; \Sigma, f)\}$. Denote by $R^{(n)}_t(\Sigma)$ the rank of the distance $d_t(\Sigma) = \|\mathbf{Z}_t\|_\Sigma$ among $d_1(\Sigma), \ldots, d_n(\Sigma)$: the vector of multivariate signed ranks $(\mathbf{U}_1(\Sigma), \ldots, \mathbf{U}_n(\Sigma), R^{(n)}_1(\Sigma), \ldots, R^{(n)}_n(\Sigma))$ constitutes a maximal invariant for the corresponding group $\mathcal{G}^{(n)}$ of radial
transformations. These genuine ranks cannot be computed from $\mathbf{Z}_1, \ldots, \mathbf{Z}_n$, since $\Sigma$ is unknown. However, they can be consistently recovered by considering the pseudo-Mahalanobis ranks $\hat{R}^{(n)}_t$, as shown by the following result (see Peters and Randles, 1990 for a proof).

Lemma 2.1. Assume that Assumptions (A1) and (B1) hold. Then, for all $t$, $(\hat{R}^{(n)}_t - R^{(n)}_t(\Sigma))$ is $o_P(n)$ as $n \to \infty$.

The pseudo-Mahalanobis signs $\mathbf{W}^{(n)}_t := \mathbf{U}^{(n)}_t(\hat{\Sigma})$ are obviously invariant under $\mathcal{G}^{(n)}$, irrespective of the true value of $\Sigma$. They also are affine-equivariant in the following sense: if $\mathbf{W}^{(n)}_t(\mathbf{M})$ denotes a sign computed from $(\mathbf{M}\mathbf{Z}_1, \ldots, \mathbf{M}\mathbf{Z}_n)$, then $\mathbf{W}^{(n)}_t(\mathbf{M}) = \mathbf{O}\mathbf{W}^{(n)}_t$, where $\mathbf{O}$ is the orthogonal matrix involved in (2.5). Finally, the following consistency result is proved in Hallin and Paindaveine (2004a).

Lemma 2.2. Assume that Assumptions (A1) and (B1) hold. Then, for all $t$, $\mathbf{W}^{(n)}_t - \mathbf{U}^{(n)}_t(\Sigma)$ is $O_P(n^{-1/2})$ as $n \to \infty$.

For $k = 1$, pseudo-Mahalanobis ranks and pseudo-Mahalanobis signs clearly reduce to the ranks of absolute values and traditional signs, respectively.
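A sketch of Mahalanobis signs and ranks, using the empirical covariance as the root-$n$ consistent scatter estimate mentioned above (an assumption of finite second moments; a robust estimator satisfying (B1)-(B2) could be substituted). The function name is ours. The test below also illustrates the strict affine invariance of the ranks.

```python
import numpy as np

def pseudo_mahalanobis(Z):
    """Mahalanobis signs W_t and ranks of the distances d_t(Sigma_hat),
    with Sigma_hat the (uncentered) empirical covariance matrix."""
    n, k = Z.shape
    Sigma_hat = Z.T @ Z / n                 # centering is not required
    L = np.linalg.cholesky(np.linalg.inv(Sigma_hat))
    Zs = Z @ L                              # rows: Sigma_hat^{-1/2} Z_t
    d = np.linalg.norm(Zs, axis=1)
    W = Zs / d[:, None]                     # Mahalanobis signs
    ranks = d.argsort().argsort() + 1       # Mahalanobis ranks
    return W, ranks
```

The ranks are affine-invariant: since the empirical covariance is fully affine-equivariant, the distances $d_t(\hat{\Sigma})$ computed from $\mathbf{M}\mathbf{Z}_t$ equal those computed from $\mathbf{Z}_t$.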
2.3. Hyperplane-based signs and ranks

Pseudo-Mahalanobis signs and ranks were entirely based on an estimation of the underlying scatter matrix. A completely different approach can be based on counts of hyperplanes, and leads to a modification of Randles's interdirections (namely, the absolute interdirections) for the multivariate signs, and to Oja and Paindaveine's (2004) concept of lift-interdirection ranks for the ranks.

Writing $Q := \{t_1, t_2, \ldots, t_{k-1}\}$ ($1 \le t_1 < t_2 < \cdots < t_{k-1} \le n$) for an arbitrary ordered set of indices of size $(k-1)$, let $\mathbf{Z}_Q := (\mathbf{Z}_{t_1}, \ldots, \mathbf{Z}_{t_{k-1}})$. Denote by $\mathbf{e}_Q$ the vector whose components are the cofactors of the last column in the array $(\mathbf{Z}_Q \,\vdots\, \mathbf{z})$. This vector $\mathbf{e}_Q$ is orthogonal to the hyperplane spanned by the $k-1$ columns of $\mathbf{Z}_Q$. We say that $\mathbf{z}_0 \in \mathbb{R}^k$ lies above, on, or below the hyperplane with equation $\mathbf{e}'_Q \mathbf{z} = 0$ iff $\mathrm{sign}(\mathbf{e}'_Q \mathbf{z}_0) > 0$, $= 0$, or $< 0$ (where $\mathrm{sign}(x) := I_{[x>0]} - I_{[x<0]}$); note that the ordering in $Q$ determines what is meant by "above", as opposed to "below". A hyperplane-based empirical angular distance between two vectors $\mathbf{v}, \mathbf{w}$ in $\mathbb{R}^k$ then can be defined as
$$c(\mathbf{v}, \mathbf{w}) := \frac{1}{2} \sum_Q \left\{1 - \mathrm{sign}(\mathbf{e}'_Q \mathbf{v})\, \mathrm{sign}(\mathbf{e}'_Q \mathbf{w})\right\}.$$
The statistics $q^{(n)}_{st} := c(\mathbf{Z}_s, \mathbf{Z}_t)$ are the so-called Randles interdirections (see Randles, 1989); $q^{(n)}_{st}$ is, up to a small-sample correction, the number of hyperplanes in $\mathbb{R}^k$ passing
through the origin and $(k-1)$ out of the $(n-2)$ points $\mathbf{Z}_1, \ldots, \mathbf{Z}_{s-1}, \mathbf{Z}_{s+1}, \ldots, \mathbf{Z}_{t-1}, \mathbf{Z}_{t+1}, \ldots, \mathbf{Z}_n$ that separate $\mathbf{Z}_s$ and $\mathbf{Z}_t$. Interdirections provide affine-invariant estimations of the Euclidean angles between the sphericized vectors $\Sigma^{-1/2}\mathbf{Z}_t$; that is, they estimate the scalar products between the corresponding spatial signs $\mathbf{U}_t(\Sigma)$ defined in Section 2.1. More precisely, one can show the following (see Hallin and Paindaveine, 2002a for a proof based on U-statistics).

Lemma 2.3. Assume that Assumption (A1) holds. Then, $\binom{n}{k-1}^{-1} c^{(n)}(\mathbf{v}, \mathbf{w})$ is a consistent estimator for
$$\pi^{-1} \arccos\left( \frac{(\Sigma^{-1/2}\mathbf{v})'(\Sigma^{-1/2}\mathbf{w})}{\|\Sigma^{-1/2}\mathbf{v}\|\, \|\Sigma^{-1/2}\mathbf{w}\|} \right).$$
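The hyperplane count $c(\mathbf{v}, \mathbf{w})$ can be sketched directly from its definition; the implementation below is ours (it finds each normal $\mathbf{e}_Q$ via an SVD rather than cofactors, which only changes $\mathbf{e}_Q$ by a scalar and so leaves the sign products unaffected). It is an $O(\binom{n}{k-1})$ enumeration, intended for illustration, not efficiency.

```python
import itertools
import numpy as np

def interdirection(Z, v, w):
    """Count hyperplanes through the origin and k-1 data points of Z
    that separate v and w (Randles's interdirection c(v, w), up to the
    small-sample conventions mentioned in the text)."""
    n, k = Z.shape
    count = 0.0
    for Q in itertools.combinations(range(n), k - 1):
        # last right-singular vector: orthogonal to the k-1 chosen points
        e = np.linalg.svd(Z[list(Q)])[2][-1]
        count += 0.5 * (1 - np.sign(e @ v) * np.sign(e @ w))
    return count
```

For instance, with two data points in the plane, a vector and its reflection through the origin are separated by every hyperplane not containing them, while two vectors on the same ray are separated by none.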
Lemma 2.3 implies that Randles's interdirections allow for an estimation of the cosines $\mathbf{U}'_s(\Sigma)\mathbf{U}_t(\Sigma)$. These cosines (the signs $\mathbf{U}_t(\Sigma)$ themselves are not required) are sufficient in some important particular cases (such as one-way analysis of variance), since the parametric versions of locally asymptotically optimal test statistics involve the $\mathbf{U}_t(\Sigma)$'s only through their mutual cosines. In such cases, Randles's interdirections can be used with the same success as in Hallin and Paindaveine (2002a, b), or Randles and Um (1998). For more sophisticated testing problems however, such as the problem of testing the adequacy of a VARMA model (see Hallin and Paindaveine, 2004a), locally asymptotically optimal parametric procedures involve the $\mathbf{U}_t(\Sigma)$'s through quantities of the form $\mathbf{U}'_s(\Sigma)\mathbf{N}\mathbf{U}_t(\Sigma)$, where $\mathbf{N}$ is some symmetric positive definite matrix (which often depends on the shape matrix $\Sigma$, and therefore has to be estimated). In such cases, Randles's interdirections are not sufficient anymore, as they cannot estimate the scalar products $(\mathbf{N}^{1/2}\mathbf{U}_s(\Sigma))'(\mathbf{N}^{1/2}\mathbf{U}_t(\Sigma))$. We therefore introduce the following concept of absolute interdirections.

Denoting by $\{\mathbf{e}_1, \ldots, \mathbf{e}_k\}$ the canonical basis of $\mathbb{R}^k$, consider the interdirection $c^{(n)}_{t;\ell} := c(\hat{\Sigma}^{1/2}\mathbf{e}_\ell, \mathbf{Z}_t)$ associated with the pair $(\hat{\Sigma}^{1/2}\mathbf{e}_\ell, \mathbf{Z}_t)$ in the sample $(\mathbf{Z}_1, \ldots, \mathbf{Z}_n)$, and let $\mathbf{V}^{(n)}_t := (\cos(\pi p^{(n)}_{t;1}), \ldots, \cos(\pi p^{(n)}_{t;k}))'$, where $p^{(n)}_{t;\ell} := \binom{n}{k-1}^{-1} c^{(n)}_{t;\ell}$. Call $\mathbf{V}^{(n)}_t$ the absolute interdirection associated with the residual $\mathbf{Z}_t$. Absolute interdirections enjoy the following consistency and equivariance properties.
Lemma 2.4. Assume that Assumptions (A1) and (B1) hold. Then,
(i) $\binom{n}{k-1}^{-1} c(\hat{\Sigma}^{1/2}\mathbf{v}, \mathbf{w}) = \binom{n}{k-1}^{-1} c(\Sigma^{1/2}\mathbf{v}, \mathbf{w}) + o_{L^1}(1)$, as $n \to \infty$, for all $\mathbf{v}, \mathbf{w} \in \mathbb{R}^k$, and
(ii) $\mathbf{V}^{(n)}_t = \mathbf{U}^{(n)}_t(\Sigma) + o_P(1)$, as $n \to \infty$.
Assume moreover that Assumption (B2) holds. Then, denoting by $\mathbf{V}^{(n)}_t(\mathbf{M})$ the statistic $\mathbf{V}^{(n)}_t$ computed from the $n$-tuple $(\mathbf{M}\mathbf{Z}_1, \ldots, \mathbf{M}\mathbf{Z}_n)$, where $\mathbf{M}$ is a $k \times k$ full-rank matrix,
(iii) $\mathbf{V}^{(n)}_t(\mathbf{M}) = \mathbf{O}\mathbf{U}^{(n)}_t(\Sigma) + o_P(1)$ as $n \to \infty$ (so that $\mathbf{V}^{(n)}_t(\mathbf{M}) = \mathbf{O}\mathbf{V}^{(n)}_t + o_P(1)$ as $n \to \infty$), where $\mathbf{O}$ is the orthogonal matrix involved in the equivariance relation (2.5).
Proof. (i) First note that
$$|c(\mathbf{u}, \mathbf{w}) - c(\mathbf{v}, \mathbf{w})| = \frac{1}{2} \left| \sum_Q [\mathrm{sign}(\mathbf{e}'_Q \mathbf{u}) - \mathrm{sign}(\mathbf{e}'_Q \mathbf{v})]\, \mathrm{sign}(\mathbf{e}'_Q \mathbf{w}) \right| \le \frac{1}{2} \sum_Q |\mathrm{sign}(\mathbf{e}'_Q \mathbf{u}) - \mathrm{sign}(\mathbf{e}'_Q \mathbf{v})| = \frac{1}{2} \sum_Q (1 - \mathrm{sign}(\mathbf{e}'_Q \mathbf{u})\, \mathrm{sign}(\mathbf{e}'_Q \mathbf{v})).$$
This and the invariance of $\hat{\Sigma}$ under permutations of the $\mathbf{Z}_t$'s (see Assumption (B1)) imply that
$$\mathrm{E}\left[ \binom{n}{k-1}^{-1} |c(\hat{\Sigma}^{1/2}\mathbf{v}, \mathbf{w}) - c(\Sigma^{1/2}\mathbf{v}, \mathbf{w})| \right] \le \mathrm{E}\left[ \frac{1}{2} \left( 1 - \mathrm{sign}(\mathbf{e}'_{Q_0} \hat{\Sigma}^{1/2}\mathbf{v})\, \mathrm{sign}(\mathbf{e}'_{Q_0} (a\Sigma)^{1/2}\mathbf{v}) \right) \right], \tag{2.7}$$
where $Q_0 = (1, 2, \ldots, k-1)$; note that $c(\cdot, \mathbf{w})$ is invariant under multiplication of its first argument by a positive scalar, so that $c(\Sigma^{1/2}\mathbf{v}, \mathbf{w}) = c((a\Sigma)^{1/2}\mathbf{v}, \mathbf{w})$. Now, for any $(k-1)$-tuple $(\mathbf{z}_1, \ldots, \mathbf{z}_{k-1})$ such that the distance $\delta(\mathbf{z}_1, \ldots, \mathbf{z}_{k-1})$ between $(a\Sigma)^{1/2}\mathbf{v}$ and the hyperplane $\pi_{Q_0}$ with equation $\mathbf{e}'_{Q_0}\mathbf{z} = 0$ is strictly positive (with the scalar $a > 0$ of Assumption (B1)), we have that
$$\mathrm{E}\left[ \frac{1}{2} \left( 1 - \mathrm{sign}(\mathbf{e}'_{Q_0} \hat{\Sigma}^{1/2}\mathbf{v})\, \mathrm{sign}(\mathbf{e}'_{Q_0} (a\Sigma)^{1/2}\mathbf{v}) \right) \Big|\, \mathbf{Z}_1 = \mathbf{z}_1, \ldots, \mathbf{Z}_{k-1} = \mathbf{z}_{k-1} \right]$$
$$= \mathrm{P}\left[ \pi_{Q_0} \text{ separates } \hat{\Sigma}^{1/2}\mathbf{v} \text{ and } (a\Sigma)^{1/2}\mathbf{v} \,\Big|\, \mathbf{Z}_1 = \mathbf{z}_1, \ldots, \mathbf{Z}_{k-1} = \mathbf{z}_{k-1} \right]$$
$$\le \mathrm{P}\left[ \|\hat{\Sigma}^{1/2}\mathbf{v} - (a\Sigma)^{1/2}\mathbf{v}\| > \delta(\mathbf{z}_1, \ldots, \mathbf{z}_{k-1})/2 \,\Big|\, \mathbf{Z}_1 = \mathbf{z}_1, \ldots, \mathbf{Z}_{k-1} = \mathbf{z}_{k-1} \right],$$
which is $o(1)$ as $n \to \infty$, since $\|\hat{\Sigma}^{1/2}\mathbf{v} - (a\Sigma)^{1/2}\mathbf{v}\| = o_P(1)$ as $n \to \infty$ (see Assumption (B1)). The absolute continuity of the common distribution of the $\mathbf{Z}_t$'s with respect to the Lebesgue measure implies that this distance is strictly positive a.s. (in the joint distribution of $\mathbf{Z}_1, \ldots, \mathbf{Z}_{k-1}$). The Lebesgue dominated convergence theorem then yields the desired result that (2.7) is $o(1)$ as $n \to \infty$.
(ii) From the mean value theorem,
$$\mathrm{E}\left[ |\mathbf{e}'_\ell (\mathbf{V}^{(n)}_t - \mathbf{U}_t(\Sigma))| \,\big|\, \mathbf{Z}_t \right] \le \pi\, \mathrm{E}\left[ |p^{(n)}_{t;\ell} - \pi^{-1}\arccos(\mathbf{e}'_\ell \mathbf{U}_t(\Sigma))| \,\big|\, \mathbf{Z}_t \right]$$
$$\le \pi\, \mathrm{E}\left[ \binom{n}{k-1}^{-1} |c(\hat{\Sigma}^{1/2}\mathbf{e}_\ell, \mathbf{Z}_t) - c(\Sigma^{1/2}\mathbf{e}_\ell, \mathbf{Z}_t)| \,\Big|\, \mathbf{Z}_t \right] + \pi\, \mathrm{E}\left[ \left| \binom{n}{k-1}^{-1} c(\Sigma^{1/2}\mathbf{e}_\ell, \mathbf{Z}_t) - \pi^{-1}\arccos(\mathbf{e}'_\ell \mathbf{U}^{(n)}_t(\Sigma)) \right| \,\Big|\, \mathbf{Z}_t \right].$$
The result then follows from (i) and from Lemma 2.3.
(iii) Denote by $c(\mathbf{v}, \mathbf{w}) = c(\mathbf{v}, \mathbf{w}; \mathbf{Z})$ (resp., by $c(\mathbf{v}, \mathbf{w}; \mathbf{M}\mathbf{Z})$) the interdirection associated with $(\mathbf{v}, \mathbf{w})$ in the sample $\mathbf{Z}_1, \ldots, \mathbf{Z}_n$ (resp., in the sample $\mathbf{M}\mathbf{Z}_1, \ldots, \mathbf{M}\mathbf{Z}_n$). Then
$$\binom{n}{k-1}^{-1} c\big((\hat{\Sigma}(\mathbf{M}))^{1/2}\mathbf{e}_\ell, \mathbf{M}\mathbf{Z}_t; \mathbf{M}\mathbf{Z}\big) = \binom{n}{k-1}^{-1} c\big(d^{1/2}\mathbf{M}\hat{\Sigma}^{1/2}\mathbf{O}'\mathbf{e}_\ell, \mathbf{M}\mathbf{Z}_t; \mathbf{M}\mathbf{Z}\big) = \binom{n}{k-1}^{-1} c\big(\hat{\Sigma}^{1/2}\mathbf{O}'\mathbf{e}_\ell, \mathbf{Z}_t; \mathbf{Z}\big),$$
so that, working as in (ii), we obtain
$$\mathbf{e}'_\ell \mathbf{V}^{(n)}_t(\mathbf{M}) = \cos(\pi p^{(n)}_{t;\ell}(\mathbf{M})) = (\mathbf{O}'\mathbf{e}_\ell)' \mathbf{U}^{(n)}_t(\Sigma) + o_P(1), \quad \text{as } n \to \infty. \qquad \square$$
Lemma 2.4 shows that absolute interdirections allow for "reconstructing" any function of the standardized residuals $\mathbf{U}_t$, and in particular quantities of the form $\mathbf{U}'_s \mathbf{N} \mathbf{U}_t$. In case $\Sigma \mapsto \mathbf{N}(\Sigma)$ is continuous, and provided that $\mathbf{N}(a\Sigma) = \mathbf{N}(\Sigma)$ for any $a \in \mathbb{R}_0^+$, the estimator $\hat{\Sigma}^{(n)}$ can be plugged in without affecting asymptotic results. Note that, unlike the (pseudo-)Mahalanobis signs $\mathbf{W}_t := \hat{\Sigma}^{-1/2}\mathbf{Z}_t / \|\hat{\Sigma}^{-1/2}\mathbf{Z}_t\|$, absolute interdirections are only asymptotically affine-equivariant, in the sense that they are asymptotically equivalent to affine-equivariant random vectors (without being affine-equivariant in the usual sense).

We now consider hyperplane-based ranks. Write $P := \{t_1, t_2, \ldots, t_k\}$ ($1 \le t_1 < t_2 < \cdots < t_k \le n$) for an arbitrary ordered $k$-tuple of integers in $\{1, \ldots, n\}$. Denote by $(d_{0P}, \mathbf{d}'_P)'$ the vector whose components are the cofactors of the last column in the array
$$\begin{pmatrix} 1 & 1 & \cdots & 1 & 1 \\ \mathbf{Z}_{t_1} & \mathbf{Z}_{t_2} & \cdots & \mathbf{Z}_{t_k} & \mathbf{z} \end{pmatrix}.$$
The vector $\mathbf{d}_P$ is orthogonal to the hyperplane going through $\mathbf{Z}_{t_1}, \ldots, \mathbf{Z}_{t_k}$, which has equation $d_{0P} + \mathbf{d}'_P \mathbf{z} = 0$. Again, the sign of $d_{0P} + \mathbf{d}'_P \mathbf{z}$ allows to determine on which side of that hyperplane the point $\mathbf{z}$ lies. A hyperplane-based empirical distance between any vector $\mathbf{v}$ and the origin in $\mathbb{R}^k$ then can be defined as
$$l^{(n)}(\mathbf{v}) := \frac{1}{2} \sum_P \left\{ 1 - \mathrm{sign}(d_{0P} + \mathbf{d}'_P \mathbf{v})\, \mathrm{sign}(d_{0P} - \mathbf{d}'_P \mathbf{v}) \right\},$$
i.e., as the number of hyperplanes in $\mathbb{R}^k$ passing through $k$ out of the $n$ points $\mathbf{Z}_1, \ldots, \mathbf{Z}_n$ that separate $\mathbf{v}$ and its reflection $-\mathbf{v}$. This concept of distance from the origin, introduced by Oja and Paindaveine (2004), however suffers from a lack of symmetry, and they rather recommend using the symmetrized distances
$$\tilde{l}^{(n)}(\mathbf{v}) := \frac{1}{2} \sum_{\mathbf{s}} \sum_P \left\{ 1 - \mathrm{sign}(d_{0P}(\mathbf{s}) + \mathbf{d}'_P(\mathbf{s})\mathbf{v})\, \mathrm{sign}(d_{0P}(\mathbf{s}) - \mathbf{d}'_P(\mathbf{s})\mathbf{v}) \right\},$$
where, for some $P = (t_1, \ldots, t_k)$ and some $\mathbf{s} \in \{-1, 1\}^k$ ($\{-1, 1\}^k$ denotes the set of all $k$-vectors with entries $1$ or $-1$), $(d_{0P}(\mathbf{s}), \mathbf{d}'_P(\mathbf{s}))'$ stands for the vector of cofactors associated
with the last column in the array
$$\begin{pmatrix} 1 & 1 & \cdots & 1 & 1 \\ s_1 \mathbf{Z}_{t_1} & s_2 \mathbf{Z}_{t_2} & \cdots & s_k \mathbf{Z}_{t_k} & \mathbf{z} \end{pmatrix}.$$
The resulting (symmetrized) lift-interdirections $\tilde{l}^{(n)}_t := \tilde{l}^{(n)}(\mathbf{Z}_t)$, $t = 1, \ldots, n$, are invariant under reflections (with respect to the origin in $\mathbb{R}^k$) of the $\mathbf{Z}_t$'s. As shown by the following result (Oja and Paindaveine, 2004), their ranks $\tilde{R}^{(n)}_t$ are asymptotically equivalent to the ranks of the genuine distances $d_t(\Sigma)$.

Lemma 2.5. Assume that Assumption (A1) holds. Then, for all $t$, $(\tilde{R}^{(n)}_t - R^{(n)}_t(\Sigma))$ is $o_P(n)$ as $n \to \infty$.

This asymptotic equivalence result between the true ranks and the ranks of (symmetrized) lift-interdirections (along with the invariance of the latter under affine transformations, permutations, and reflections of the observations) allows for building multivariate affine-invariant signed-rank procedures based on interdirections and the ranks of lift-interdirections for a broad class of location and serial problems (see Oja and Paindaveine, 2004). As mentioned in the introduction, extensions of univariate signs and ranks could also be based on affine-invariant concepts of data depth. Incidentally, note that the above concept of hyperplane-based ranks, in the elliptic setup under consideration, is closely related to the so-called majority depth first introduced by Singh (1991): more precisely, $1 - \mathrm{E}[l^{(n)}(\mathbf{v})]$ coincides with the majority depth of $\mathbf{v}$ (see Zuo and Serfling, 2000).

2.4. Serial and nonserial multivariate signed rank statistics

Several rank-based versions of the serial and nonserial statistics (2.3) and (2.4) will be considered in the sequel, each of them based on the combination of a concept of multivariate signs (either Mahalanobis signs, pseudo-Mahalanobis signs, or absolute interdirections) with a concept of multivariate ranks (Mahalanobis, pseudo-Mahalanobis, or lift-interdirection ranks). The versions based on Mahalanobis or pseudo-Mahalanobis signs and ranks are, in the serial case,
$$\tilde{\mathbf{\Gamma}}^{(n)}_{i;J} := (n-i)^{-1} (\hat{\Sigma}^{-1/2})' \left[ \sum_{t=i+1}^{n} J_1\!\left(\frac{\hat{R}^{(n)}_t}{n+1}\right) J_2\!\left(\frac{\hat{R}^{(n)}_{t-i}}{n+1}\right) \mathbf{W}^{(n)}_t \mathbf{W}^{(n)\prime}_{t-i} \right] (\hat{\Sigma}^{1/2})' \tag{2.8}$$
and, in the nonserial case,
$$\tilde{\mathbf{\Delta}}^{(n)}_{i;J} := (n-i)^{-1} (\hat{\Sigma}^{-1/2})' \sum_{t=i+1}^{n} J_0\!\left(\frac{\hat{R}^{(n)}_t}{n+1}\right) \mathbf{W}^{(n)}_t (\mathbf{x}^{K(n)}_{t-i})'. \tag{2.9}$$
These versions will serve as reference versions, in the sense that, in order to avoid unnecessary additional notation, asymptotic linearity will be stated formally for (2.8) and (2.9) only (part (i) of Proposition 4.1), then extended (part (ii) of the same proposition) to the other versions (based on the other concepts of signs and ranks).
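A sketch of the serial signed-rank statistic (2.8), combining the Mahalanobis signs and ranks with user-supplied score functions on $]0,1[$; the empirical covariance is used as scatter estimate (our choice of illustration, not the paper's prescription), and the function name is ours.

```python
import numpy as np

def serial_signed_rank_stat(Z, i, J1, J2):
    """Serial signed-rank statistic (2.8) with Mahalanobis signs W_t
    and Mahalanobis ranks R_t (empirical covariance as scatter)."""
    n, k = Z.shape
    Sigma_hat = Z.T @ Z / n
    L = np.linalg.cholesky(np.linalg.inv(Sigma_hat))
    S_m = L.T                                   # Sigma_hat^{-1/2}
    Zs = Z @ S_m.T
    d = np.linalg.norm(Zs, axis=1)
    W = Zs / d[:, None]                          # signs
    R = d.argsort().argsort() + 1.0              # ranks of the distances
    acc = sum(J1(R[t] / (n + 1)) * J2(R[t - i] / (n + 1))
              * np.outer(W[t], W[t - i]) for t in range(i, n))
    return S_m.T @ (acc / (n - i)) @ np.linalg.inv(S_m).T
```

Score choices $J_1, J_2$ are discussed after (2.9) and in Section 3 (Assumption (C)); any square-integrable scores on the open unit interval may be plugged in.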
Note that, contrary to the score functions $K_0$, $K_1$, and $K_2$ appearing in (2.3) and (2.4), the score functions $J_0$, $J_1$, and $J_2$ in (2.8) and (2.9) are defined over the open unit interval $]0, 1[$. The asymptotic representation results in Proposition 2.1 induce a relation between $J$- and $K$-scores: typically, the $J$-score rank-based statistics (2.8) and (2.9) are asymptotically equivalent, under radial density $f$, with the $K$-score statistics (2.3) and (2.4), respectively, provided that $K_\ell = J_\ell \circ \tilde{F}_k$, $\ell = 0, 1, 2$. These asymptotic representation results however require some technical assumptions on the score functions $J_0$, $J_1$, and $J_2$. More precisely, we will assume the following.

Assumption (C). The score functions $J_\ell :\, ]0, 1[\, \to \mathbb{R}$, $\ell = 0, 1, 2$, are continuous differences of two monotone increasing functions, and satisfy $\int_0^1 [J_\ell(u)]^2\,\mathrm{d}u < \infty$ ($\ell = 0, 1, 2$).
We can now state the asymptotic representation results for $\tilde{\mathbf{\Gamma}}^{(n)}_{i;J}$ and $\tilde{\mathbf{\Delta}}^{(n)}_{i;J}$. Letting
$$\mathbf{\Gamma}^{(n)}_{i;J;\Sigma,f} := (n-i)^{-1} (\Sigma^{-1/2})' \left[ \sum_{t=i+1}^{n} J_1(\tilde{F}_k(d_t(\Sigma)))\, J_2(\tilde{F}_k(d_{t-i}(\Sigma)))\, \mathbf{U}^{(n)}_t(\Sigma) \mathbf{U}^{(n)\prime}_{t-i}(\Sigma) \right] (\Sigma^{1/2})' \tag{2.10}$$
and
$$\mathbf{\Delta}^{(n)}_{i;J;\Sigma,f} := (n-i)^{-1} (\Sigma^{-1/2})' \sum_{t=i+1}^{n} J_0(\tilde{F}_k(d_t(\Sigma)))\, \mathbf{U}^{(n)}_t(\Sigma) (\mathbf{x}^{K(n)}_{t-i})', \tag{2.11}$$
we have the following; see the Appendix for the proof.

Proposition 2.1. Assume that Assumptions (A1), (B1), and (C) hold. Then,
(i) $\mathrm{vec}\,(\tilde{\mathbf{\Gamma}}^{(n)}_{i;J} - \mathbf{\Gamma}^{(n)}_{i;J;\Sigma,f})$ and $\mathrm{vec}\,(\tilde{\mathbf{\Delta}}^{(n)}_{i;J} - \mathbf{\Delta}^{(n)}_{i;J;\Sigma,f})$ are $o_P(n^{-1/2})$ for all $i$, as $n \to \infty$, and
(ii) the same result still holds if the pseudo-Mahalanobis signs $\mathbf{W}^{(n)}_t$ in $\tilde{\mathbf{\Gamma}}^{(n)}_{i;J}$ and $\tilde{\mathbf{\Delta}}^{(n)}_{i;J}$ are replaced by the corresponding absolute interdirections, and/or if the pseudo-Mahalanobis ranks $\hat{R}^{(n)}_t$ are replaced by the lift-interdirection ranks $\tilde{R}^{(n)}_t$.

Proposition 2.1 allows for deriving the null (hence, via Le Cam's third Lemma, the nonnull) asymptotic distributions of $\tilde{\mathbf{\Gamma}}^{(n)}_{i;J}$ and $\tilde{\mathbf{\Delta}}^{(n)}_{i;J}$, since it shows the asymptotic equivalence of these statistics with (2.10) and (2.11), respectively, whose asymptotic distributions are easily derived (see Lemma A.1). Most importantly, it also allows for determining the (null and nonnull) asymptotic distributions of the test statistics for testing linear hypotheses on the parameter $\mathbf{\theta}$ in the model described in Section 3.1, hence their local powers and AREs, since these test statistics are quadratic forms in $\tilde{\mathbf{\Gamma}}^{(n)}_{i;J}$ and $\tilde{\mathbf{\Delta}}^{(n)}_{i;J}$. Last but not least, it also allows for an optimal choice of the score functions $J_\ell$, $\ell = 0, 1, 2$ (see the comments after Lemma A.1).
3. The linear model with VARMA error terms

3.1. The model

Asymptotic linearity properties characterize the impact of a "small" perturbation of the underlying parameters on the asymptotic behavior of the statistics under study. Such properties thus are inherently related to some underlying model. The model considered throughout this paper is the very general multivariate linear model with VARMA error terms
$$\mathbf{Y}^{(n)} = \mathbf{X}^{(n)} \mathbf{\beta} + \mathbf{U}^{(n)}, \tag{3.1}$$
where
$$\mathbf{X}^{(n)} := \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,m} \\ \vdots & \vdots & & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,m} \end{pmatrix} =: \begin{pmatrix} \mathbf{x}'_1 \\ \vdots \\ \mathbf{x}'_n \end{pmatrix} \quad \text{and} \quad \mathbf{\beta} := \begin{pmatrix} \beta_{1,1} & \beta_{1,2} & \cdots & \beta_{1,k} \\ \vdots & \vdots & & \vdots \\ \beta_{m,1} & \beta_{m,2} & \cdots & \beta_{m,k} \end{pmatrix}$$
denote an $n \times m$ matrix of constants (the design matrix) and the $m \times k$ regression parameter, respectively. In the special case where the disturbances $\mathbf{U}_t$ are uncorrelated, this model contains the classical one-sample, two-sample, and ANOVA (or $m$-sample) models, as well as more general regression and analysis of variance/covariance models. The $m$-sample model (with location parameters $\mathbf{\mu}_1, \ldots, \mathbf{\mu}_m$ and sample sizes $n_1, \ldots, n_m$), for instance, is obtained with
$$\mathbf{X}^{(n)} := \begin{pmatrix} \mathbf{1}_{n_1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \ddots & & \vdots \\ \vdots & & \ddots & \mathbf{0} \\ \mathbf{0} & \cdots & \mathbf{0} & \mathbf{1}_{n_m} \end{pmatrix} \quad \text{and} \quad \mathbf{\beta} := \begin{pmatrix} \mathbf{\mu}'_1 \\ \vdots \\ \mathbf{\mu}'_m \end{pmatrix},$$
where $\mathbf{1}_n := (1, \ldots, 1)' \in \mathbb{R}^n$. Now, instead of the traditional assumption that the error term
$$\mathbf{U}^{(n)} := \begin{pmatrix} U_{1,1} & U_{1,2} & \cdots & U_{1,k} \\ \vdots & \vdots & & \vdots \\ U_{n,1} & U_{n,2} & \cdots & U_{n,k} \end{pmatrix} =: \begin{pmatrix} \mathbf{U}'_1 \\ \vdots \\ \mathbf{U}'_n \end{pmatrix}$$
is white noise, we rather assume $\mathbf{U}_t$, $t = 1, \ldots, n$, to be a finite realization (of length $n$) of a solution of the multivariate linear stochastic difference equation (a VARMA$(p_0, q_0)$ model)
$$\mathbf{A}(L)\mathbf{U}_t = \mathbf{B}(L)\mathbf{\varepsilon}_t, \quad t \in \mathbb{Z}, \tag{3.2}$$
where $\mathbf{A}(L) := \mathbf{I}_k - \sum_{i=1}^{p_0} \mathbf{A}_i L^i$ and $\mathbf{B}(L) := \mathbf{I}_k + \sum_{i=1}^{q_0} \mathbf{B}_i L^i$ for some $(p_0 + q_0)$-tuple of $k \times k$ real matrices $(\mathbf{A}_1, \ldots, \mathbf{A}_{p_0}, \mathbf{B}_1, \ldots, \mathbf{B}_{q_0})$, $\{\mathbf{\varepsilon}_t \mid t \in \mathbb{Z}\}$ is a $k$-dimensional white-noise
process, and $L$ stands for the lag operator. Under this model, the observation
$$\mathbf{Y}^{(n)} := \begin{pmatrix} Y_{1,1} & Y_{1,2} & \cdots & Y_{1,k} \\ \vdots & \vdots & & \vdots \\ Y_{n,1} & Y_{n,2} & \cdots & Y_{n,k} \end{pmatrix} =: \begin{pmatrix} \mathbf{Y}'_1 \\ \vdots \\ \mathbf{Y}'_n \end{pmatrix}$$
is the realization of a $k$-variate VARMA process $\{\mathbf{Y}_t,\ t \in \mathbb{Z}\}$ with trend $\mathbf{\beta}'\mathbf{x}_t$.

Of course, asymptotic linearity requires some regularity assumptions. These assumptions deal with the asymptotic behavior of the design matrices $\mathbf{X}^{(n)}$, the coefficients and the (elliptical) innovation density of the VARMA model (3.2), and the score functions involved in the statistics under study. For convenient reference, all these assumptions are listed here. Let us begin with some structural conditions on the trend part of the model. The following assumptions are standard in this context (see Garel and Hallin, 1995).
Assumption (D1). Let $\mathbf{C}^{(n)}_i := (n-i)^{-1} \sum_{t=i+1}^{n} \mathbf{x}^{(n)}_t \mathbf{x}^{(n)\prime}_{t-i}$, $i = 0, 1, \ldots, n-1$, and denote by $\mathbf{D}^{(n)}$ the diagonal matrix with elements $(\mathbf{C}^{(n)}_0)_{11}, \ldots, (\mathbf{C}^{(n)}_0)_{mm}$.
(i) $(\mathbf{C}^{(n)}_0)_{jj} > 0$ for all $j$.
(ii) Let $\mathbf{R}^{(n)}_i := (\mathbf{D}^{(n)})^{-1/2} \mathbf{C}^{(n)}_i (\mathbf{D}^{(n)})^{-1/2}$. The limits $\lim_{n\to\infty} \mathbf{R}^{(n)}_i =: \mathbf{R}_i$ exist for all $i$; $\mathbf{R}_0$ is positive definite, and therefore can be factorized into $\mathbf{R}_0 = (\mathbf{K}\mathbf{K}')^{-1}$ for some full-rank $m \times m$ matrix $\mathbf{K}$. Letting $\mathbf{K}^{(n)} := (\mathbf{D}^{(n)})^{-1/2}\mathbf{K}$, note that $\mathbf{K}^{(n)}$ is also of full rank.
(iii) The classical Noether conditions hold: the $(\mathbf{x}^{(n)}_t)_j$, $t = 1, \ldots, n$, are not all equal, and, letting $\bar{x}^{(n)}_j := n^{-1} \sum_{t=1}^n (\mathbf{x}^{(n)}_t)_j$,
$$\lim_{n\to\infty} \frac{\max_{1 \le t \le n} \big((\mathbf{x}^{(n)}_t)_j - \bar{x}^{(n)}_j\big)^2}{\sum_{t=1}^n \big((\mathbf{x}^{(n)}_t)_j - \bar{x}^{(n)}_j\big)^2} = 0, \quad j = 1, \ldots, m.$$
Note that the Noether conditions also imply that
$$\lim_{n\to\infty} \frac{\max_{1 \le t \le n} (\mathbf{x}^{(n)}_t)_j^2}{\sum_{t=1}^n (\mathbf{x}^{(n)}_t)_j^2} = 0, \quad j = 1, \ldots, m. \tag{3.3}$$
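As a concrete illustration (our own sketch, with hypothetical function names), the $m$-sample design matrix of Section 3.1 can be built explicitly and its Noether ratio of (D1)(iii) evaluated; for balanced samples of size $n_j$ the ratio equals $1/(2n_j)$, vanishing as the samples grow.

```python
import numpy as np

def m_sample_design(sizes):
    """Design matrix X of the m-sample model: column j is the
    indicator 1_{n_j} of sample j (sizes = (n_1, ..., n_m))."""
    X = np.zeros((sum(sizes), len(sizes)))
    row = 0
    for j, nj in enumerate(sizes):
        X[row:row + nj, j] = 1.0
        row += nj
    return X

def noether_ratio(X):
    """Per column j: max_t (x_tj - xbar_j)^2 / sum_t (x_tj - xbar_j)^2;
    the Noether condition of (D1)(iii) requires this to vanish as n grows."""
    Xc = X - X.mean(axis=0)
    return (Xc ** 2).max(axis=0) / (Xc ** 2).sum(axis=0)
```

For a two-sample design with $n_1 = n_2 = 10$, each centered column has entries $\pm 1/2$, so the ratio is $0.25 / 5 = 0.05$.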
For the serial part of the model, we essentially require the VARMA model (3.2) to be causal and invertible. The assumptions on the difference operators are actually the same as in Hallin and Paindaveine (2004a), where the problem of testing the adequacy of a specified VARMA model is considered.

Assumption (D2). All solutions of $\det(\mathbf{I}_k - \sum_{i=1}^{p_0} \mathbf{A}_i z^i) = 0$ and $\det(\mathbf{I}_k + \sum_{i=1}^{q_0} \mathbf{B}_i z^i) = 0$ (with $|\mathbf{A}_{p_0}| \ne 0 \ne |\mathbf{B}_{q_0}|$) lie outside the unit ball in $\mathbb{C}$. Moreover, the greatest common left divisor of $\mathbf{I}_k - \sum_{i=1}^{p_0} \mathbf{A}_i z^i$ and $\mathbf{I}_k + \sum_{i=1}^{q_0} \mathbf{B}_i z^i$ is the identity matrix $\mathbf{I}_k$.

Under Assumption (D2), $\{\mathbf{\varepsilon}_t\}$ is $\{\mathbf{U}_t\}$'s (hence also $\{\mathbf{Y}_t\}$'s) innovation process. The set of assumptions (A) deals with the density of this innovation. For local asymptotic normality, the assumption of elliptical symmetry (Assumption (A1)) is to be reinforced into
M. Hallin, D. Paindaveine / Journal of Statistical Planning and Inference 136 (2006) 1 – 32
Assumption (A1′). Same as Assumption (A1), but with μ_{k+1,f} < ∞.

Moreover, f^{1/2} is also required to satisfy a quadratic mean differentiability property:

Assumption (A2). Denoting by L²(ℝ_0^+, μ_{k−1}) the space of measurable functions h : ℝ_0^+ → ℝ satisfying ∫_0^∞ [h(r)]² r^{k−1} dr < ∞, the square root f^{1/2} of the radial density f is in the subspace W^{1,2}(ℝ_0^+, μ_{k−1}) of L²(ℝ_0^+, μ_{k−1}) containing all functions admitting a weak derivative that also belongs to L²(ℝ_0^+, μ_{k−1}).

Assumption (A2) is strictly equivalent to the assumption that f^{1/2} is differentiable in quadratic mean (see Hallin and Paindaveine, 2002a). Denoting by (f^{1/2})′ the weak derivative of f^{1/2} in L²(ℝ_0^+, μ_{k−1}), let φ_f := −2[(f^{1/2})′/f^{1/2}]. Under (A2), the radial Fisher information I_{k,f} := ∫_0^∞ [φ_f(r)]² r^{k−1} f(r) dr is finite. In the pure location or purely serial problems considered in Hallin and Paindaveine (2002a, b, 2004a), this was sufficient for LAN. However, as pointed out by Garel and Hallin (1995), LAN, in this model where serial and nonserial features are mixed, requires the stronger assumption:

Assumption (A3). ∫_0^∞ [φ_f(r)]⁴ r^{k−1} f(r) dr < ∞.

Examples of radial densities f satisfying (A1)–(A3) are f(r) = exp(−r²/2) and f(r) := (1 + r²/ν)^{−(k+ν)/2}, yielding the k-variate multinormal distribution and the k-variate Student distribution with ν degrees of freedom, respectively. Note however that Assumption (A1′) requires ν > 2.

Finally, the score functions yielding locally and asymptotically optimal procedures are of the form J_0 = J_1 := φ_{f_*} ∘ F̃_{*k}^{−1} and J_2 := F̃_{*k}^{−1}, for some radial density f_* (with obvious notation φ_{f_*} and F̃_{*k}). Assumption (C) then takes the form of an assumption on f_*:

Assumption (C′). The radial density f_* is such that φ_{f_*} is the continuous difference of two monotone increasing functions, μ_{k+1;f_*} < ∞, and ∫_0^∞ [φ_{f_*}(r)]² r^{k−1} f_*(r) dr < ∞.

3.2. Uniform local asymptotic normality

Under the assumptions made, the model described in Section 3.1 is uniformly locally asymptotically normal (ULAN; see the Appendix). Letting A_i := 0 for p_0 < i ≤ p_1 and B_i := 0 for q_0 < i ≤ q_1, denote by θ := ((vec β)′, (vec A_1)′, ..., (vec A_{p_1})′, (vec B_1)′, ..., (vec B_{q_1})′)′ the vector of parameters indexing the model. The orders p_1 ≥ p_0 and q_1 ≥ q_0 are taken into account in order to allow for testing against higher-order VARMA dependencies (namely, testing VARMA(p_0, q_0) against VARMA(p_1, q_1)). The hypothesis under which the observation has been generated by model (3.1)–(3.2), with parameter value θ, scatter matrix Σ, and radial density f, will be denoted by H^{(n)}(θ, Σ, f).
The sequences of local alternatives to be considered for this property are associated with sequences of models of the form

Y^{(n)} = X^{(n)} β^{(n)} + U^{(n)},  A^{(n)}(L) U_t^{(n)} = B^{(n)}(L) ε_t,  t ∈ ℤ,  (3.4)

where β^{(n)} = β + n^{−1/2} K^{(n)} t^{(n)}, A^{(n)}(L) := I_k − Σ_{i=1}^{p_1} (A_i + n^{−1/2} γ_i^{(n)}) L^i and B^{(n)}(L) := I_k + Σ_{i=1}^{q_1} (B_i + n^{−1/2} δ_i^{(n)}) L^i, and where the sequence

τ^{(n)} := ((vec t^{(n)})′, (vec γ_1^{(n)})′, ..., (vec γ_{p_1}^{(n)})′, (vec δ_1^{(n)})′, ..., (vec δ_{q_1}^{(n)})′)′ ∈ ℝ^K,  K := km + k²(p_1 + q_1),

is bounded as n → ∞: sup_n (τ^{(n)})′ τ^{(n)} < ∞. The perturbed parameter is thus

θ^{(n)} := θ + ν^{(n)} τ^{(n)} := θ + n^{−1/2} [K^{(n)} ⊗ I_k, 0; 0, I_{k²(p_1+q_1)}] τ^{(n)}.

The corresponding sequence of local alternatives will be denoted by H^{(n)}(θ + ν^{(n)} τ^{(n)}, Σ, f).

Denote by G_u(θ), u ∈ ℕ, the Green's matrices associated with the autoregressive difference operator A(L) = I_k − Σ_{i=1}^{p_0} A_i L^i. These matrices can be defined recursively by A(L) G_u = G_u − Σ_{i=1}^{min(p_0,u)} A_i G_{u−i} = δ_{u0} I_k, where δ_{u0} = 1 if u = 0, and δ_{u0} = 0 otherwise. Assumption (D2) also allows for defining the G_u's by means of

Σ_{u=0}^{+∞} G_u z^u := (I_k − Σ_{i=1}^{p_0} A_i z^i)^{−1},  z ∈ ℂ, |z| < 1.  (3.5)

Similarly, we denote by H_u(θ), u ∈ ℕ, the Green's matrices associated with the moving average difference operator B(L). Clearly, all these Green's matrices are continuous functions of θ. When no confusion is possible, we will not stress their dependence on θ.

The residuals (Z_1^{(n)}(θ), ..., Z_n^{(n)}(θ)) associated with a value θ of the parameter then can be computed from the initial values ε_{−q_0+1}, ..., ε_0, Y_{−p_0+1}^{(n)}, ..., Y_0^{(n)} and the observed series (Y_1^{(n)}, ..., Y_n^{(n)}) via the recursion

Z_t^{(n)}(θ) = Σ_{i=0}^{t−1} Σ_{j=0}^{p_0} H_i A_j (Y_{t−i−j}^{(n)} − β′ x_{t−i−j}^{(n)}) + (H_{t+q_0−1} ⋯ H_t) [I_k, 0, ..., 0; B_1, I_k, ..., 0; ⋮, ⋮, ⋱, ⋮; B_{q_0−1}, B_{q_0−2}, ..., I_k] (ε_{−q_0+1}′, ..., ε_0′)′.  (3.6)

Assumption (D2) ensures that neither the (generally unobserved) values (ε_{−q_0+1}, ..., ε_0) of the innovation, nor the initial values (Y_{−p_0+1}^{(n)}, ..., Y_0^{(n)}), have an influence on asymptotic results; they all safely can be put to zero in the sequel.

In order to avoid overloading the presentation of the main result, the statement of the ULAN result is postponed to the Appendix. The structure of the central sequence (see the Appendix)
reveals that all the asymptotically relevant information, in this elliptical context, is contained in the generalized cross-covariance matrices Γ_{i;Σ,f}^{(n)}(θ) and the nonserial statistics Δ_{i;Σ,f}^{(n)}(θ) (see (2.3) and (2.4)) computed from the residuals Z_t^{(n)}(θ), with the score functions K_0(d) = K_1(d) = φ_f(d) and K_2(d) = d.

4. Asymptotic linearity

We now can state and prove the main result of this paper. Define

h_j = h_j(θ) := H_j(θ) − Σ_{i=1}^{min(p_0,j)} H_{j−i}(θ) A_i(θ),  j = 0, 1, 2, ...,
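To make these objects concrete, here is a small numerical sketch (our own illustration, with hypothetical scalar VARMA(1,1) coefficients) of the Green's matrices G_u and H_u defined by the recursions above, and of the resulting h_j's:

```python
import numpy as np

def green(coefs, k, n_terms):
    """Green's matrices of the operator I_k + sum_i coefs[i-1] L^i, i.e. the
    coefficients of its inverse: G_0 = I_k, G_u = -sum_i coefs[i-1] @ G_{u-i}."""
    G = [np.eye(k)]
    for u in range(1, n_terms):
        acc = np.zeros((k, k))
        for i, C in enumerate(coefs, start=1):
            if i <= u:
                acc -= C @ G[u - i]
        G.append(acc)
    return G

# Hypothetical coefficients, k = 1 (scalars written as 1x1 matrices):
A1, B1 = np.array([[0.5]]), np.array([[0.3]])
G = green([-A1], k=1, n_terms=6)   # A(L) = I - A1 L, so G_u = A1^u
H = green([B1], k=1, n_terms=6)    # B(L) = I + B1 L, so H_u = (-B1)^u
# h_j = H_j - H_{j-1} A_1 here, since p_0 = 1:
h = [H[j] - (H[j - 1] @ A1 if j >= 1 else 0) for j in range(6)]
print(G[3], H[3], h[1])
```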
a_i(τ; θ) := Σ_{j=1}^{min(p_1,i)} Σ_{l=0}^{i−j} Σ_{k=0}^{min(q_0,i−j−l)} (G_{i−j−l−k}(θ) B_k(θ) ⊗ H_l(θ)′) vec γ_j  (4.1)

and

b_i(τ; θ) := Σ_{j=1}^{min(q_1,i)} (I_k ⊗ H_{i−j}(θ)) vec δ_j.  (4.2)
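The constants C_k(J; f) and D_k(J; f) defined next are plain one-dimensional integrals and are easy to evaluate numerically. The sketch below (ours) does so for the bivariate Gaussian radial density, for which F̃_2^{−1}(u) = (−2 ln(1 − u))^{1/2} and φ_f(r) = r, so that with the optimal scores both constants reduce to E[d²] = 2:

```python
import math

# Bivariate Gaussian radial density: d^2 is chi-square(2), so F2(r) = 1 - exp(-r^2/2).
F_inv = lambda u: math.sqrt(-2.0 * math.log(1.0 - u))   # quantile F2^{-1}
phi_f = lambda r: r                                     # phi_f(r) = r at the Gaussian

def integral(g, n=200000):
    """Midpoint rule on ]0,1[ (the integrands have only a mild log singularity)."""
    return sum(g((i + 0.5) / n) for i in range(n)) / n

J1 = lambda u: phi_f(F_inv(u))   # optimal score J1 = phi_f o F2^{-1}
J2 = F_inv                       # optimal score J2 = F2^{-1}

C2 = integral(lambda u: J1(u) * phi_f(F_inv(u)))   # C_2(J1; f)
D2 = integral(lambda u: J2(u) * F_inv(u))          # D_2(J2; f)
print(C2, D2)   # both close to 2 = E[d^2] for k = 2
```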
Let further C_k(J; f) := ∫_0^1 J(u) φ_f ∘ F̃_k^{−1}(u) du and D_k(J; f) := ∫_0^1 J(u) F̃_k^{−1}(u) du. These quantities are covariance measures between the score functions J used in the rank-based statistics Γ_{i;J}^{(n)}(θ) and Δ_{i;J}^{(n)}(θ) and the score functions F̃_k^{−1} and φ_f ∘ F̃_k^{−1} characterizing the optimal procedures associated with radial density f (see the comments after Lemma A.1 for a more precise statement).

Most problems of practical relevance about the parameter of interest θ in the model described in Section 3.1 involve null hypotheses under which the value of θ remains incompletely specified. In multivariate signed rank procedures, aligned signs and ranks thus have to be substituted for the genuine ones, which cannot be computed from the observations. Handling this alignment device requires the asymptotic linearity property that is established in Proposition 4.1 below; we refer to Hallin and Paindaveine (2004b) for the development of the resulting aligned signed rank tests.

Proposition 4.1. Assume that Assumptions (A1′), (A2), (A3), (B1), (C) (or (C′)), (D1), and (D2) hold. Then,

(i)

(n − i)^{1/2} {vec Δ_{i;J}^{(n)}(θ + ν^{(n)} τ^{(n)}) − vec Δ_{i;J}^{(n)}(θ)} + (1/k) C_k(J_0; f)(I_m ⊗ Σ^{−1}) Σ_{j=0}^{∞} [(K′ R_{|i−j|} K) ⊗ h_j] (vec t^{(n)}) = o_P(1),  (4.3)
and

(n − i)^{1/2} {vec Γ_{i;J}^{(n)}(θ + ν^{(n)} τ^{(n)}) − vec Γ_{i;J}^{(n)}(θ)} + (1/k²) D_k(J_2; f) C_k(J_1; f)(Σ ⊗ Σ^{−1})[a_i(τ^{(n)}; θ) + b_i(τ^{(n)}; θ)] = o_P(1),  (4.4)

as n → ∞, under H^{(n)}(θ, Σ, f), and

(ii) the same result still holds if the pseudo-Mahalanobis signs W_t^{(n)} in Γ_{i;J}^{(n)} and Δ_{i;J}^{(n)} are replaced by the corresponding absolute interdirections, and/or if the pseudo-Mahalanobis ranks R_t^{(n)} are replaced by the lift-interdirection ranks R̃_t^{(n)}.

The proof of Proposition 4.1 relies on a series of lemmas. In the remainder of this section, we will write Z_t^0 and Z_t^n for Z_t^{(n)}(θ) and Z_t^{(n)}(θ + ν^{(n)} τ^{(n)}), respectively. Accordingly, let d_t^0 := ‖Σ^{−1/2} Z_t^0‖, U_t^0 := Σ^{−1/2} Z_t^0 / d_t^0, d_t^n := ‖Σ^{−1/2} Z_t^n‖, and U_t^n := Σ^{−1/2} Z_t^n / d_t^n. We will need the following two preliminary results, the proofs of which are postponed to the appendix.

Lemma 4.1. Under H^{(n)}(θ, Σ, f),
(i) max_{1≤t≤n} ‖Z_t^n − Z_t^0‖ = o_P(1) as n → ∞;
(ii) max_{1≤t≤n} |d_t^n − d_t^0| = o_P(1) as n → ∞;
(iii) denoting by I_A the indicator function of the set A, max_{1≤t≤n} (‖U_t^n − U_t^0‖ I_{[d_t^0 > ε]}) = o_P(1) as n → ∞, for all ε > 0; moreover, ‖U_t^n − U_t^0‖ = o_P(1) as n → ∞, for all t.

Lemma 4.2. Under H^{(n)}(θ, Σ, f) and for sufficiently large n, {Z_t^n, t ∈ ℤ} is an absolutely regular process, with mixing rates β^{(n)}(j), j ∈ ℕ, satisfying β^{(n)}(j) ≤ β(j), where β(j) is exponentially decreasing (to zero) as j → ∞.

We now may turn to the proof of Proposition 4.1.

Proof of Proposition 4.1. More precisely, we restrict ourselves to the proof of the asymptotic linearity result (4.4) for statistics of the serial type. One can check that the nonserial case (4.3) follows along the same lines, and is actually simpler.

Consider the following truncation of the score functions J_ℓ, ℓ = 1, 2. For all m ∈ ℕ_0, define

J_ℓ^{(m)}(u) :=
  0,  if u ≤ 1/m,
  m J_ℓ(2/m)(u − 1/m),  if 1/m < u ≤ 2/m,
  J_ℓ(u),  if 2/m < u ≤ 1 − 2/m,
  m J_ℓ(1 − 2/m)(1 − 1/m − u),  if 1 − 2/m < u ≤ 1 − 1/m,
  0,  if u > 1 − 1/m.
Since J_ℓ is continuous (see Assumption (C)), the function J_ℓ^{(m)} is also continuous on ]0, 1[. Since J_ℓ^{(m)} is compactly supported in ]0, 1[, it is moreover bounded, for each m. Clearly, it safely can be assumed that J_ℓ is a monotone increasing function (rather than
the difference of two monotone increasing functions), so that (at least for m sufficiently large) |J_ℓ^{(m)}| is bounded by |J_ℓ|, uniformly in m and u; i.e., there exists some M such that |J_ℓ^{(m)}(u)| ≤ |J_ℓ(u)| for all u ∈ ]0, 1[ and all m ≥ M.

We have to prove that, under H^{(n)}(θ, Σ, f), as n → ∞,

(n − i)^{1/2} vec(Γ_{i;J}^{(n)}(θ + ν^{(n)} τ^{(n)}) − Γ_{i;J}^{(n)}(θ)) + (1/k²) D_k(J_2; f) C_k(J_1; f)(Σ ⊗ Σ^{−1})[a_i(τ^{(n)}; θ) + b_i(τ^{(n)}; θ)]  (4.5)

is o_P(1). Proposition 2.1 shows that (n − i)^{1/2} vec(Γ_{i;J}^{(n)}(θ)) − (n − i)^{1/2} vec(Γ̃_{i;J;Σ,f}^{(n)}(θ)) is o_P(1), as n → ∞, under the same sequence of hypotheses. Similarly,

(n − i)^{1/2} vec(Γ_{i;J}^{(n)}(θ + ν^{(n)} τ^{(n)})) − (n − i)^{1/2} vec(Γ̃_{i;J;Σ,f}^{(n)}(θ + ν^{(n)} τ^{(n)}))  (4.6)

is o_P(1) as n → ∞, under H^{(n)}(θ + ν^{(n)} τ^{(n)}, Σ, f). It follows from contiguity that (4.6) is also o_P(1) under H^{(n)}(θ, Σ, f), as n → ∞. Consequently, (4.5) is asymptotically equivalent, under H^{(n)}(θ, Σ, f), to

(n − i)^{1/2} vec(Γ̃_{i;J;Σ,f}^{(n)}(θ + ν^{(n)} τ^{(n)})) − (n − i)^{1/2} vec(Γ̃_{i;J;Σ,f}^{(n)}(θ)) + (1/k²) D_k(J_2; f) C_k(J_1; f)(Σ ⊗ Σ^{−1})[a_i(τ^{(n)}; θ) + b_i(τ^{(n)}; θ)].  (4.7)
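The truncation J_ℓ^{(m)} introduced above is elementary enough to code directly; the sketch below (our own illustration) implements it and checks continuity at the knots:

```python
def truncate_score(J, m):
    """The truncated score J^(m): zero near 0 and 1, linearly interpolated on
    ]1/m, 2/m] and ]1-2/m, 1-1/m], and equal to J on ]2/m, 1-2/m]."""
    def Jm(u):
        if u <= 1.0 / m or u > 1.0 - 1.0 / m:
            return 0.0
        if u <= 2.0 / m:
            return m * J(2.0 / m) * (u - 1.0 / m)
        if u <= 1.0 - 2.0 / m:
            return J(u)
        return m * J(1.0 - 2.0 / m) * (1.0 - 1.0 / m - u)
    return Jm

Jm = truncate_score(lambda u: u, m=10)   # truncating the identity score J(u) = u
print(Jm(0.05), Jm(0.15), Jm(0.5), Jm(0.95))   # 0.0, ~0.1, 0.5, 0.0
```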
Using the fact that vec(A_1 B A_2) = (A_2′ ⊗ A_1) vec B, (4.7) can be written as (Σ^{1/2} ⊗ Σ^{−1/2}) C^{(n)}, where

C^{(n)} := (n − i)^{−1/2} vec[Σ_{t=i+1}^{n} J_1(F̃_k(d_t^n)) J_2(F̃_k(d_{t−i}^n)) U_t^n U_{t−i}^{n′}] − (n − i)^{−1/2} vec[Σ_{t=i+1}^{n} J_1(F̃_k(d_t^0)) J_2(F̃_k(d_{t−i}^0)) U_t^0 U_{t−i}^{0′}] + (1/k²) D_k(J_2; f) C_k(J_1; f)(Σ^{1/2} ⊗ Σ^{−1/2})[a_i(τ^{(n)}; θ) + b_i(τ^{(n)}; θ)].  (4.8)

Clearly, it is sufficient to show that C^{(n)} = o_P(1), under H^{(n)}(θ, Σ, f), as n → ∞. Now, decompose C^{(n)} into C^{(n)} = D_1^{(n;m)} + D_2^{(n;m)} − R_1^{(n;m)} + R_2^{(n;m)} + R_3^{(n;m)}, where, denoting by E_0 the expectation under H^{(n)}(θ, Σ, f),

D_1^{(n;m)} := (n − i)^{−1/2} vec[Σ_{t=i+1}^{n} J_1^{(m)}(F̃_k(d_t^n)) J_2^{(m)}(F̃_k(d_{t−i}^n)) U_t^n U_{t−i}^{n′}] − (n − i)^{−1/2} vec[Σ_{t=i+1}^{n} J_1^{(m)}(F̃_k(d_t^0)) J_2^{(m)}(F̃_k(d_{t−i}^0)) U_t^0 U_{t−i}^{0′}] − (n − i)^{−1/2} E_0 vec[Σ_{t=i+1}^{n} J_1^{(m)}(F̃_k(d_t^n)) J_2^{(m)}(F̃_k(d_{t−i}^n)) U_t^n U_{t−i}^{n′}],
D_2^{(n;m)} := (n − i)^{−1/2} E_0 vec[Σ_{t=i+1}^{n} J_1^{(m)}(F̃_k(d_t^n)) J_2^{(m)}(F̃_k(d_{t−i}^n)) U_t^n U_{t−i}^{n′}] + (1/k²) D_k(J_2^{(m)}; f) C_k(J_1^{(m)}; f)(Σ^{1/2} ⊗ Σ^{−1/2})[a_i(τ^{(n)}; θ) + b_i(τ^{(n)}; θ)],

R_1^{(n;m)} := (n − i)^{−1/2} vec[Σ_{t=i+1}^{n} [J_1(F̃_k(d_t^0)) J_2(F̃_k(d_{t−i}^0)) − J_1^{(m)}(F̃_k(d_t^0)) J_2^{(m)}(F̃_k(d_{t−i}^0))] U_t^0 U_{t−i}^{0′}],

R_2^{(n;m)} := (n − i)^{−1/2} vec[Σ_{t=i+1}^{n} [J_1(F̃_k(d_t^n)) J_2(F̃_k(d_{t−i}^n)) − J_1^{(m)}(F̃_k(d_t^n)) J_2^{(m)}(F̃_k(d_{t−i}^n))] U_t^n U_{t−i}^{n′}],

and

R_3^{(n;m)} := (1/k²) [D_k(J_2; f) C_k(J_1; f) − D_k(J_2^{(m)}; f) C_k(J_1^{(m)}; f)](Σ^{1/2} ⊗ Σ^{−1/2})[a_i(τ^{(n)}; θ) + b_i(τ^{(n)}; θ)].
We prove that C^{(n)} = o_P(1), under H^{(n)}(θ, Σ, f), as n → ∞ (thus completing the proof of (4.4)) by establishing that D_1^{(n;m)} and D_2^{(n;m)} are o_P(1) under H^{(n)}(θ, Σ, f), as n → ∞, for fixed m, and that R_1^{(n;m)}, R_2^{(n;m)}, and R_3^{(n;m)} are o_P(1) under the same sequence of hypotheses, as m → ∞, uniformly in n. For the sake of convenience, these three results are treated as separate lemmas (Lemmas 4.3 and 4.4, and Lemma 4.5, respectively).

Decompose D_1^{(n;m)} into D_{1,1}^{(n;m)} + D_{1,2}^{(n;m)} − E_0[D_{1,1}^{(n;m)}], where

D_{1,1}^{(n;m)} := (n − i)^{−1/2} vec[Σ_{t=i+1}^{n} (J_1^{(m)}(F̃_k(d_t^n)) U_t^n − J_1^{(m)}(F̃_k(d_t^0)) U_t^0) J_2^{(m)}(F̃_k(d_{t−i}^n)) U_{t−i}^{n′}]

and

D_{1,2}^{(n;m)} := (n − i)^{−1/2} vec[Σ_{t=i+1}^{n} J_1^{(m)}(F̃_k(d_t^0)) U_t^0 (J_2^{(m)}(F̃_k(d_{t−i}^n)) U_{t−i}^n − J_2^{(m)}(F̃_k(d_{t−i}^0)) U_{t−i}^0)′]

(taking into account the independence between Z_t^0 and Z_{t−i}^n under H^{(n)}(θ, Σ, f)). We then have the following.
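Two elementary Kronecker–vec identities are used repeatedly in this proof: vec(A_1 B A_2) = (A_2′ ⊗ A_1) vec B (behind (4.8)), and (vec(uv′))′ vec(xy′) = (u′x)(v′y) (in the proof of Lemma 4.3 below). Both are easily checked numerically, with vec denoting column-stacking:

```python
import numpy as np

rng = np.random.default_rng(0)
vec = lambda M: M.flatten(order="F")   # column-stacking vec operator

# vec(A1 B A2) = (A2' ⊗ A1) vec B:
A1, B, A2 = rng.normal(size=(3, 4, 4))
assert np.allclose(vec(A1 @ B @ A2), np.kron(A2.T, A1) @ vec(B))

# (vec(u v'))' vec(x y') = (u'x)(v'y):
u, v, x, y = rng.normal(size=(4, 4))
assert np.isclose(vec(np.outer(u, v)) @ vec(np.outer(x, y)), (u @ x) * (v @ y))
print("both identities hold")
```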
Lemma 4.3. For any fixed m,
(i) E_0[‖D_{1,1}^{(n;m)} − E_0[D_{1,1}^{(n;m)}]‖²] = o(1), as n → ∞;
(ii) E_0[‖D_{1,2}^{(n;m)}‖²] = o(1), as n → ∞;
(iii) D_1^{(n;m)} = o_P(1), as n → ∞, under H^{(n)}(θ, Σ, f).

Lemma 4.4. For any fixed m, D_2^{(n;m)} = o(1), as n → ∞.

Lemma 4.5. (i) Under H^{(n)}(θ, Σ, f), R_1^{(n;m)} is o_P(1), as m → ∞, uniformly in n.
(ii) Under H^{(n)}(θ, Σ, f), R_2^{(n;m)} is o_P(1), as m → ∞, uniformly in n (for n sufficiently large).
(iii) R_3^{(n;m)} is o(1), as m → ∞, uniformly in n.

Proof of Lemma 4.3. Let us begin with the second part of Lemma 4.3.

Part (ii). Since (vec(uv′))′ vec(xy′) = tr[(uv′)′ xy′] = (u′x)(v′y) for any k-vectors u, v, x, y, we obtain

E_0[(D_{1,2}^{(n;m)})′ (D_{1,2}^{(n;m)})] = (n − i)^{−1} Σ_{s,t=i+1}^{n} E_0[J_1^{(m)}(F̃_k(d_s^0)) U_s^{0′} J_1^{(m)}(F̃_k(d_t^0)) U_t^0 (J_2^{(m)}(F̃_k(d_{s−i}^n)) U_{s−i}^n − J_2^{(m)}(F̃_k(d_{s−i}^0)) U_{s−i}^0)′ (J_2^{(m)}(F̃_k(d_{t−i}^n)) U_{t−i}^n − J_2^{(m)}(F̃_k(d_{t−i}^0)) U_{t−i}^0)].

Due to the independence, for s ≠ t, between Z_{max(s,t)}^0 and (Z_{min(s,t)}^0, Z_{s−i}^0, Z_{t−i}^0, Z_{s−i}^n, Z_{t−i}^n) (note that, under H^{(n)}(θ, Σ, f), {Z_t^0, t ∈ ℤ} is the innovation process of {Z_t^n, t ∈ ℤ}), this is equal to

(n − i)^{−1} Σ_{t=i+1}^{n} E_0[(J_1^{(m)}(F̃_k(d_t^0)))² ‖J_2^{(m)}(F̃_k(d_{t−i}^n)) U_{t−i}^n − J_2^{(m)}(F̃_k(d_{t−i}^0)) U_{t−i}^0‖²].

Since J_1^{(m)} is bounded, it is sufficient to show that

E_0[‖J_2^{(m)}(F̃_k(d_{t−i}^n)) U_{t−i}^n − J_2^{(m)}(F̃_k(d_{t−i}^0)) U_{t−i}^0‖²] = o(1),  as n → ∞,  (4.9)

uniformly in t. Now, with η > 0 such that F̃_k(η) < 1/m, we have J_2^{(m)}(F̃_k(d_{t−i}^0)) I_{[d_{t−i}^0 ≤ η]} = 0 (note that F̃_k is a continuous, strictly monotone increasing function that maps ℝ_0^+ onto ]0, 1[).
This yields

‖J_2^{(m)}(F̃_k(d_{t−i}^n)) U_{t−i}^n − J_2^{(m)}(F̃_k(d_{t−i}^0)) U_{t−i}^0‖ ≤ |J_2^{(m)}(F̃_k(d_{t−i}^n)) − J_2^{(m)}(F̃_k(d_{t−i}^0))| ‖U_{t−i}^n‖ + |J_2^{(m)}(F̃_k(d_{t−i}^0))| ‖U_{t−i}^n − U_{t−i}^0‖ ≤ |J_2^{(m)}(F̃_k(d_{t−i}^n)) − J_2^{(m)}(F̃_k(d_{t−i}^0))| + |J_2^{(m)}(F̃_k(d_{t−i}^0))| ‖U_{t−i}^n − U_{t−i}^0‖ I_{[d_{t−i}^0 > η]},

so that

‖J_2^{(m)}(F̃_k(d_{t−i}^n)) U_{t−i}^n − J_2^{(m)}(F̃_k(d_{t−i}^0)) U_{t−i}^0‖² ≤ C |J_2^{(m)}(F̃_k(d_{t−i}^n)) − J_2^{(m)}(F̃_k(d_{t−i}^0))|² + C ‖U_{t−i}^n − U_{t−i}^0‖² I_{[d_{t−i}^0 > η]}

for some constant C. Lemma 4.1(ii) and the continuity of J_2^{(m)} ∘ F̃_k imply that J_2^{(m)}(F̃_k(d_{t−i}^n)) − J_2^{(m)}(F̃_k(d_{t−i}^0)) = o_P(1) as n → ∞, under H^{(n)}(θ, Σ, f). Since J_2^{(m)} is bounded, this convergence to zero also holds in quadratic mean. Similarly, using Lemma 4.1(iii) and the boundedness of ‖U_{t−i}^0‖ and ‖U_{t−i}^n‖, we obtain that ‖U_{t−i}^n − U_{t−i}^0‖ I_{[d_{t−i}^0 > η]} is o(1) in quadratic
mean, as n → ∞, under H^{(n)}(θ, Σ, f). The convergence in (4.9) follows.

Part (i). Letting T_{t;i} := vec[(J_1^{(m)}(F̃_k(d_t^n)) U_t^n − J_1^{(m)}(F̃_k(d_t^0)) U_t^0) J_2^{(m)}(F̃_k(d_{t−i}^n)) U_{t−i}^{n′}], we have

E_0[‖D_{1,1}^{(n;m)} − E_0[D_{1,1}^{(n;m)}]‖²] = E_0[(D_{1,1}^{(n;m)} − E_0[D_{1,1}^{(n;m)}])′ (D_{1,1}^{(n;m)} − E_0[D_{1,1}^{(n;m)}])] = tr[Var_0[D_{1,1}^{(n;m)}]] = (n − i)^{−1} tr[Var_0[Σ_{t=i+1}^{n} T_{t;i}]] = tr[Var_0[T_{t;i}]] + 2 Σ_{j=1}^{n−i−1} ((n − j − i)/(n − i)) tr[Cov_0[T_{t;i}, T_{t−j;i}]].  (4.10)

First note that tr[Var_0[T_{t;i}]] = E_0[(T_{t;i} − E_0[T_{t;i}])′ (T_{t;i} − E_0[T_{t;i}])] ≤ E_0[‖T_{t;i}‖²], where, using again (vec(uv′))′ vec(xy′) = (u′x)(v′y) and the boundedness of J_2^{(m)},

E_0[‖T_{t;i}‖²] = E_0[‖J_1^{(m)}(F̃_k(d_t^n)) U_t^n − J_1^{(m)}(F̃_k(d_t^0)) U_t^0‖² (J_2^{(m)}(F̃_k(d_{t−i}^n)))²] ≤ C E_0[‖J_1^{(m)}(F̃_k(d_t^n)) U_t^n − J_1^{(m)}(F̃_k(d_t^0)) U_t^0‖²],

which—compare with (4.9)—is o(1), as n → ∞, uniformly in t. On the other hand, the absolute regularity of {Z_t^n, t ∈ ℤ} (Lemma 4.2) and the fact that {Z_t^0, t ∈ ℤ} is (under H^{(n)}(θ, Σ, f)) the innovation process of {Z_t^n, t ∈ ℤ} imply that the process {(Z_t^n, Z_t^0), t ∈ ℤ} is also absolutely regular, with the same mixing rates as {Z_t^n, t ∈ ℤ}. Using Lemma 1 of Yoshihara (1976) (with p := k, k := 2, δ := 1, and h(x_1, x_2) := x_1′ x_2),
we obtain

|tr[Cov_0[T_{t;i}, T_{t−j;i}]]| = |E_0[T_{t;i}′ T_{t−j;i}] − E_0[T_{t;i}]′ E_0[T_{t−j;i}]| ≤ 4 E_0[‖T_{t;i}‖²] (β^{(n)}(j))^{1/2} ≤ 4 E_0[‖T_{t;i}‖²] (β(j))^{1/2},

where the sequence (β(j)) is as in Lemma 4.2. Consequently,

|Σ_{j=1}^{n−i−1} ((n − j − i)/(n − i)) tr[Cov_0[T_{t;i}, T_{t−j;i}]]| ≤ Σ_{j=1}^{∞} |tr[Cov_0[T_{t;i}, T_{t−j;i}]]| ≤ 4 E_0[‖T_{t;i}‖²] Σ_{j=1}^{∞} (β(j))^{1/2} ≤ C E_0[‖T_{t;i}‖²],

since the series converges (due to the exponential decrease of the β(j)'s; see Lemma 4.2 again). This entails that both terms in (4.10) are bounded by (a constant multiple of) E_0[‖T_{t;i}‖²], a quantity which, as we showed above, is o(1) as n → ∞. The result follows.

Part (iii) trivially follows from Parts (i) and (ii), and from the fact that convergence in quadratic mean implies convergence in probability. □

Proof of Lemma 4.4. Let B_1^{(n;m)} := (n − i)^{−1/2} vec[Σ_{t=i+1}^{n} J_1^{(m)}(F̃_k(d_t^0)) J_2^{(m)}(F̃_k(d_{t−i}^0)) U_t^0 U_{t−i}^{0′}]. Proceeding as in Lemma A.1, one can show that
B_1^{(n;m)} →^L N_{k²}(0, (1/k²) E[(J_1^{(m)}(U))²] E[(J_2^{(m)}(U))²] I_{k²}),  (4.11)

as n → ∞, under H^{(n)}(θ, Σ, f). Under the sequence of local alternatives H^{(n)}(θ + ν^{(n)} τ^{(n)}, Σ, f), as n → ∞,

B_1^{(n;m)} − (1/k²) C_k(J_1^{(m)}; f) D_k(J_2^{(m)}; f)(Σ^{1/2} ⊗ Σ^{−1/2})[a_i(τ^{(n)}; θ) + b_i(τ^{(n)}; θ)] →^L N_{k²}(0, (1/k²) E[(J_1^{(m)}(U))²] E[(J_2^{(m)}(U))²] I_{k²}).

Defining B_2^{(n;m)} := (n − i)^{−1/2} vec[Σ_{t=i+1}^{n} J_1^{(m)}(F̃_k(d_t^n)) J_2^{(m)}(F̃_k(d_{t−i}^n)) U_t^n U_{t−i}^{n′}], it follows from uniform local asymptotic normality that

B_2^{(n;m)} + (1/k²) C_k(J_1^{(m)}; f) D_k(J_2^{(m)}; f)(Σ^{1/2} ⊗ Σ^{−1/2})[a_i(τ^{(n)}; θ) + b_i(τ^{(n)}; θ)] →^L N_{k²}(0, (1/k²) E[(J_1^{(m)}(U))²] E[(J_2^{(m)}(U))²] I_{k²}),  (4.12)

as n → ∞, under H^{(n)}(θ, Σ, f).
Now, Lemma 4.3(iii) yields that D_1^{(n;m)} = B_2^{(n;m)} − B_1^{(n;m)} − E_0[B_2^{(n;m)}] = o_P(1), as n → ∞, under H^{(n)}(θ, Σ, f). Using this and (4.11), we obtain that

B_2^{(n;m)} − E_0[B_2^{(n;m)}] →^L N_{k²}(0, (1/k²) E[(J_1^{(m)}(U))²] E[(J_2^{(m)}(U))²] I_{k²}),

as n → ∞, under H^{(n)}(θ, Σ, f). Comparing with (4.12), it follows that

D_2^{(n;m)} = E_0[B_2^{(n;m)}] + (1/k²) C_k(J_1^{(m)}; f) D_k(J_2^{(m)}; f)(Σ^{1/2} ⊗ Σ^{−1/2})[a_i(τ^{(n)}; θ) + b_i(τ^{(n)}; θ)]

is o(1), as n → ∞, as was to be proved. □
We now complete the proof of (4.4) by proving Lemma 4.5.

Proof of Lemma 4.5. (i) In view of the independence between the d_t^0's and the U_t^0's under H^{(n)}(θ, Σ, f), we obtain

E_0[‖R_1^{(n;m)}‖²] = (n − i)^{−1} Σ_{s,t=i+1}^{n} E_0[[J_1(F̃_k(d_s^0)) J_2(F̃_k(d_{s−i}^0)) − J_1^{(m)}(F̃_k(d_s^0)) J_2^{(m)}(F̃_k(d_{s−i}^0))] [J_1(F̃_k(d_t^0)) J_2(F̃_k(d_{t−i}^0)) − J_1^{(m)}(F̃_k(d_t^0)) J_2^{(m)}(F̃_k(d_{t−i}^0))]] × E_0[(vec(U_s^0 U_{s−i}^{0′}))′ vec(U_t^0 U_{t−i}^{0′})]

= (n − i)^{−1} Σ_{t=i+1}^{n} E_0[[J_1(F̃_k(d_t^0)) J_2(F̃_k(d_{t−i}^0)) − J_1^{(m)}(F̃_k(d_t^0)) J_2^{(m)}(F̃_k(d_{t−i}^0))]²]

= ∫_0^1 ∫_0^1 [J_1(u) J_2(v) − J_1^{(m)}(u) J_2^{(m)}(v)]² du dv.  (4.13)

Now, J_1^{(m)}(u) J_2^{(m)}(v) converges to J_1(u) J_2(v), for all (u, v) ∈ ]0, 1[ × ]0, 1[. Also, since |J_ℓ^{(m)}(u)| ≤ |J_ℓ(u)|, ℓ = 1, 2, for all m ≥ M, the integrand in (4.13) is bounded (uniformly in m) by 4 |J_1(u)|² |J_2(v)|², which is integrable on ]0, 1[ × ]0, 1[ (see Assumption (C)). Consequently, the Lebesgue dominated convergence theorem yields that E_0[‖R_1^{(n;m)}‖²] = o(1), as m → ∞. This convergence is of course uniform in n, since E_0[‖R_1^{(n;m)}‖²] does not depend on n.
(ii) The claim in (ii) is the same as in (i), except that d_t^n and U_t^n replace d_t^0 and U_t^0, respectively. Accordingly, it holds under H^{(n)}(θ + ν^{(n)} τ^{(n)}, Σ, f). That it also holds under H^{(n)}(θ, Σ, f) follows from Lemma 3.5 in Jurečková (1969).

(iii) Note that

|D_k(J_2^{(m)}; f) − D_k(J_2; f)|² = (∫_0^1 (J_2^{(m)}(u) − J_2(u)) F̃_k^{−1}(u) du)² ≤ (μ_{k+1;f}/μ_{k−1;f}) ∫_0^1 |J_2^{(m)}(u) − J_2(u)|² du.

Again, |J_2^{(m)}(u) − J_2(u)|² ≤ 4 |J_2(u)|², with ∫_0^1 |J_2(u)|² du < ∞. Consequently, the pointwise convergence of (J_2^{(m)}) to J_2 implies that D_k(J_2^{(m)}; f) − D_k(J_2; f) = o(1) as m → ∞. We similarly obtain that C_k(J_1^{(m)}; f) − C_k(J_1; f) = o(1), as m → ∞. Using the fact that the sequence (τ^{(n)}) is bounded (and the definitions of a_i(τ^{(n)}; θ) and b_i(τ^{(n)}; θ) in (4.1) and (4.2)), this implies that, for some real constant C,

‖R_3^{(n;m)}‖ ≤ (1/k²) |D_k(J_2^{(m)}; f) C_k(J_1^{(m)}; f) − D_k(J_2; f) C_k(J_1; f)| ‖(Σ^{1/2} ⊗ Σ^{−1/2})[a_i(τ^{(n)}; θ) + b_i(τ^{(n)}; θ)]‖ ≤ C |D_k(J_2^{(m)}; f) C_k(J_1^{(m)}; f) − D_k(J_2; f) C_k(J_1; f)|,

which is o(1), as m → ∞, uniformly in n. □
Appendix A.

A.1. ULAN

Associated with any k-dimensional linear difference operator of the form C(L) := Σ_{i=0}^{∞} C_i L^i (letting C_i = 0 for i > s, this includes, of course, the operators with finite order s), define, for any integers u and v, the k²u × k²v matrices

C_{u,v}^{(l)} := [C_0 ⊗ I_k, 0, ..., 0; C_1 ⊗ I_k, C_0 ⊗ I_k, ..., 0; ⋮, ⋮, ⋱, ⋮; C_{v−1} ⊗ I_k, C_{v−2} ⊗ I_k, ..., C_0 ⊗ I_k; ⋮, ⋮, , ⋮; C_{u−1} ⊗ I_k, C_{u−2} ⊗ I_k, ..., C_{u−v} ⊗ I_k]  (A.1)
and

C_{u,v}^{(r)} := [I_k ⊗ C_0, 0, ..., 0; I_k ⊗ C_1, I_k ⊗ C_0, ..., 0; ⋮, ⋮, ⋱, ⋮; I_k ⊗ C_{v−1}, I_k ⊗ C_{v−2}, ..., I_k ⊗ C_0; ⋮, ⋮, , ⋮; I_k ⊗ C_{u−1}, I_k ⊗ C_{u−2}, ..., I_k ⊗ C_{u−v}],  (A.2)

respectively; write C_u^{(l)} for C_{u,u}^{(l)} and C_u^{(r)} for C_{u,u}^{(r)}. With this notation, note that G_u^{(l)}, G_u^{(r)}, H_u^{(l)}, and H_u^{(r)} are the inverses of A_u^{(l)}, A_u^{(r)}, B_u^{(l)}, and B_u^{(r)}, respectively. Denoting by C_{u,v}^{′(l)} and C_{u,v}^{′(r)} the matrices associated with the transposed operator C′(L) := Σ_{i=0}^{∞} C_i′ L^i, we also have G_u^{′(l)} = (A_u^{′(l)})^{−1}, H_u^{′(l)} = (B_u^{′(l)})^{−1}, etc. We will use the notation C̄_{u,v}^{(l)}, C̄_{u,v}^{(r)}, C̄_u^{(l)}, etc., when the identity matrices involved in (A.1) and (A.2) are m-dimensional rather than k-dimensional.

Let λ := max(p_1 − p_0, q_1 − q_0) and λ_0 := λ + p_0 + q_0, and define the k²λ_0 × k²(p_1 + q_1) matrix
M_θ := [G_{λ_0,p_1}^{(l)} ⋮ H_{λ_0,q_1}^{(l)}];  (A.3)

under Assumption (D2), M_θ is of full rank.

Consider the operator D(L) := I_k + Σ_{i=1}^{p_0+q_0} D_i L^i (just as M_θ, D(L) and most quantities defined below depend on θ; for simplicity, however, we are dropping this reference to θ), where, putting G_{−1} = G_{−2} = ⋯ = G_{−p_0+1} = 0 = H_{−1} = H_{−2} = ⋯ = H_{−q_0+1},

(D_1; ⋮; D_{p_0+q_0}) := − [G_{q_0}, G_{q_0−1}, ..., G_{−p_0+1}; G_{q_0+1}, G_{q_0}, ..., G_{−p_0+2}; ⋮; G_{p_0+q_0−1}, G_{p_0+q_0−2}, ..., G_0; H_{p_0}, H_{p_0−1}, ..., H_{−q_0+1}; H_{p_0+1}, H_{p_0}, ..., H_{−q_0+2}; ⋮; H_{p_0+q_0−1}, H_{p_0+q_0−2}, ..., H_0]^{−1} (G_{q_0+1}; ⋮; G_{p_0+q_0}; H_{p_0+1}; ⋮; H_{p_0+q_0}).

Note that D(L) G_t = 0 for t = q_0 + 1, ..., p_0 + q_0, and D(L) H_t = 0 for t = p_0 + 1, ..., p_0 + q_0.

Let {ψ_t^{(1)}, ..., ψ_t^{(p_0+q_0)}} be a set of k × k matrices forming a fundamental system of solutions of the homogeneous linear difference equation associated with D(L) (such a system can be obtained, for instance, from the Green's matrices of the operator D(L);
see Hallin, 1986). Define

Ψ̄_m(θ) := [ψ_{λ+1}^{(1)}, ..., ψ_{λ+1}^{(p_0+q_0)}; ψ_{λ+2}^{(1)}, ..., ψ_{λ+2}^{(p_0+q_0)}; ⋮; ψ_m^{(1)}, ..., ψ_m^{(p_0+q_0)}] ⊗ I_k  (m > λ),

P_θ := [I_{k²λ}, 0; 0, Ψ̄_{n−1} C̄^{−1}]  and  Q_θ^{(n)} := H_{n−1}^{(r)}(θ) B_{n−1}^{(l)}(θ),  (A.4)

where C̄ is the Casorati matrix Ψ̄_{λ_0}.
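As a side illustration (ours, not the paper's), the block-Toeplitz matrices of (A.1) and the inverse relation G_u^{(l)} = (A_u^{(l)})^{−1} can be verified numerically on a hypothetical scalar AR(1) operator:

```python
import numpy as np

def left_block_toeplitz(C, k, u, v):
    """C^{(l)}_{u,v} of (A.1): the k^2 u x k^2 v block-Toeplitz matrix whose
    block (a, b) is C_{a-b} ⊗ I_k for a >= b, and zero otherwise."""
    Z = np.zeros((k * k * u, k * k * v))
    for a in range(u):
        for b in range(min(a + 1, v)):
            if a - b < len(C):
                Z[a*k*k:(a+1)*k*k, b*k*k:(b+1)*k*k] = np.kron(C[a - b], np.eye(k))
    return Z

k = 1
A = [np.eye(k), np.array([[-0.5]])]             # coefficients of A(L) = I - 0.5 L
G = [np.array([[0.5 ** u]]) for u in range(4)]  # its Green's matrices G_u = 0.5^u
Au = left_block_toeplitz(A, k, 4, 4)
Gu = left_block_toeplitz(G, k, 4, 4)
print(np.allclose(Au @ Gu, np.eye(4)))   # G_u^{(l)} inverts A_u^{(l)}
```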
(n)
(n)
SI ;,f () :=(n1/2 (vec 0;,f ()) , . . . , (n − i)1/2 (vec i;,f ()) , . . . , (n)
(vec n−1;,f ()) ) , (n)
(n) (n)
n1/2 TI ;,f () := L SI ;,f () and (n) (n) JI ;, := lim L (Kn ⊗ −1 ) L ,
(A.5)
n→+∞
(n) ¯ n(r) ()A ¯ (r) (), and where K ˜ denotes the m × m ˜ matrix whose m × m where L := H n,1 , ˜ is K R|i−j | K (we write K instead block in position (i, j ) (i = 1, . . . , , j = 1, . . . , ) (n) of K, ). Similarly, for the serial part and the i;,f () matrices associated with the score functions K1 = f and K2 : d → d, let (n)
(n)
SI I ;,f () :=((n − 1)1/2 (vec 1;,f ()) , . . . , (n − i)1/2 (n)
(n)
×(vec i;,f ()) , . . . , (vec n−1;,f ()) ) , (n)
(n) (n)
n 1/2 TI I ;,f () := Q SI I ;,f () and (n)
(n)
JI I ;, := limn→+∞ Q [In−1 ⊗ ( ⊗ −1 )]Q
(A.6)
(convergence in (A.5) and (A.6) follows from the exponential decrease, as u → ∞, of the Green’s matrices Gu and Hu ). The following ULAN reinforcement of Garel and Hallin’s (1995) Proposition 3.1 then follows along the same steps as in Section 3 of Hallin and Paindaveine (2004a). Proposition A.1 (ULAN). Assume that Assumptions (A1 ), (A2), (A3), (D1), and (D2) (n) hold. Let n be such that n − = O(n−1/2 ). Then, the logarithm L + (n) (n) / ;,f of the n
n
likelihood ratio associated with the sequence of local alternatives H(n) (n + (n) (n) , , f ) with respect to H(n) (n , , f ) is such that (n)
(n)
L + (n) (n) / ;,f (Y(n) ) = ( (n) ) ,f (n ) − n n
1 2
( (n) ) ,f () (n) + oP (1)
M. Hallin, D. Paindaveine / Journal of Statistical Planning and Inference 136 (2006) 1 – 32
as n → ∞, under H(n) (n , , f ), with the central sequence
(n)
(n) TI ;,f (n ) I ;,f (n ) Ikm 0 (n) 1/2 := n ,f (n ) := (n) (n) 0 M n P n I I ;,f (n ) TI I ;,f (n ) and the information matrix
I ;,f () ,f () := 0
0
27
(A.7)
I I ;,f ()
,
where I ;,f () := (1/k)Ik,f JI ;, and I I ;,f () := [k+1;f Ik,f /k 2 k−1;f ]N, , (n) with N, := M P JI I ;, P M . Moreover, ,f (n ), still under H(n) (n , , f ), is asymptotically NK (0, ,f ()). Le Cam’s third lemma then yields, for the serial and nonserial statistics (2.10) and (2.11), the following asymptotic normality result under local alternatives. Lemma A.1. Assume that Assumption (C) and the assumptions of Proposition A.1 hold, ˜ the and let (n) → = ((vec ) , (vec ) , (vec ) ) as n → ∞. Then, for all integers , , vector (n)
(n)
0;J ;,f ()) , . . . , (n − + 1)1/2 (vec −1;J ;,f ()) , (n1/2 (vec
(n) (n) ˜ 1/2 (vec 1;J ;,f ()) , . . . , (n − ) ;J (n − 1)1/2 (vec ˜ ;,f ()) )
is asymptotically normal as n → ∞, with mean 0 under H(n) (, , f ) and mean
(n) −1 1 k Ck (J0 ; f )(Im ⊗ )[lim n→∞ (K,n ⊗ Ik ) L ] (vec ) 1 k2
˜ (+1)
Ck (J1 ; f )Dk (J2 ; f ) [I˜ ⊗ ( ⊗ −1 )] Q
P M ((vec ) , (vec ) )
(A.8)
under H(n) ( + (n) (n) , , f ), and covariance matrix
1 −1 2 0 k E[J0 (U )] (K ⊗ ) 1 E[J12 (U )] E[J22 (U )] [I˜ ⊗ ( ⊗ −1 )] 0 k2 under both. Proof. The proof follows along the same argument as in Lemma 4.1 in Garel and Hallin (1995). Note that
∞
j =0 (K
R
|j | K) ⊗ hj
.. . ∞ lim (K,n ⊗ Ik ) L = (K R|i−j | K) ⊗ hj j =0 n→∞ . .. ∞ j =0 (K R|−j −1| K) ⊗ hj (n)
(A.9)
28
M. Hallin, D. Paindaveine / Journal of Statistical Planning and Inference 136 (2006) 1 – 32
and that
a1 ( ; ) + b1 ( ; )
˜ vec . = Q(+1) .. P M vec a˜( ; ) + b˜( ; )
(A.10)
(see Section 4 for the definitions of hj , aj , and bj ); (A.9) and (A.10) allow for a direct comparison between Lemma A.1 and the corresponding univariate result (Proposition 4.3) in Hallin and Puri (1994). Test statistics for linear hypotheses on the parameter of the model defined in Section 3.1 (n) (n) typically are quadratic forms in i;J and i;J . Their distributions under local alternatives
are noncentral chi-square, with noncentrality parameters that are quadratic forms in (A.8). As usual, the larger this noncentrality parameter, the higher the local powers. Consequently, Lemma A.1 provides an optimal (at radial density f) choice of the score functions J , = 0, 1, 2 as functions of f. The optimal score functions will be those maximizing the shift in (A.8), or, equivalently, maximizing the “covariances” Ck (J ; f ), =0, 1, and Dk (J2 ; f ): J0 = J1 = f ◦ F˜k−1 and J2 = F˜k−1 (see Hallin and Paindaveine, 2004b) for details. A.2. Proofs of Proposition 2.1 and Lemmas 4.1 and 4.2
Proof of Proposition 2.1. (i) The result for the serial part is established in Proposition 2 of Hallin and Paindaveine (2004a), where, however, Tyler's (1987) estimator of scatter is used for Σ; one can easily check that the same proof holds for any estimate satisfying Assumption (B1). The proof for the trend part follows along similar lines, and is left to the reader.

(ii) A closer look at the proof of (i) (see Proposition 2 of Hallin and Paindaveine, 2004a) shows that it only requires that the estimated ranks R̂_t^{(n)} be

(a) invariant under permutations and under reflections (with respect to the origin in ℝ^k) of the residuals, and
(b) asymptotically equivalent to the "true" ranks, meaning that, for all t,

R̂_t^{(n)}/(n + 1) = R_t^{(n)}(θ)/(n + 1) + o_P(1)  as n → ∞.

Similarly, all estimators Ŵ_t^{(n)} (for the signs) that

(c) satisfy Ŵ_t^{(n)}(s_1 Z_1, ..., s_n Z_n) = s_t Ŵ_t^{(n)}(Z_1, ..., Z_n) for all (s_1, ..., s_n) ∈ {−1, 1}^n, and
(d) are asymptotically equivalent to the "true" signs, meaning that, for all t,

Ŵ_t^{(n)} = U_t^{(n)}(θ) + o_P(1)  as n → ∞,

successfully can be substituted for the pseudo-Mahalanobis signs in the proof of (i). This yields the desired result since, from Section 2.3, it is clear that lift-interdirection ranks and absolute interdirections do satisfy (a)–(d). □
Proof of Lemma 4.1. (i) Writing H_i^n for H_i(θ + ν^{(n)} τ^{(n)}), we may write, in view of (3.6),

Z_t^n − Z_t^0 = Σ_{i=0}^{t−1} H_i^n A^{(n)}(L)[Y_{t−i}^{(n)} − (β + n^{−1/2} K^{(n)} t^{(n)})′ x_{t−i}^{(n)}] − Σ_{i=0}^{t−1} H_i A(L)[Y_{t−i}^{(n)} − β′ x_{t−i}^{(n)}]

= n^{−1/2} [n^{1/2} Σ_{i=0}^{t−1} (H_i^n − H_i) A(L) − Σ_{i=0}^{t−1} H_i^n γ^{(n)}(L)](Y_{t−i}^{(n)} − β′ x_{t−i}^{(n)}) − n^{−1/2} Σ_{i=0}^{t−1} H_i^n A^{(n)}(L)(t^{(n)})′ K^{(n)′} x_{t−i}^{(n)},  (A.11)

where γ^{(n)}(L) := Σ_{i=1}^{p_1} γ_i^{(n)} L^i. Using the fact that n^{1/2} Σ_{i=0}^{∞} ‖H_i^n − H_i‖ is bounded as n → ∞ (see Lemma 4.3 in Garel and Hallin, 1995), it can easily be checked that the sums of the norms of the matrix coefficients of [n^{1/2} Σ_{i=0}^{t−1} (H_i^n − H_i) A(L) − Σ_{i=0}^{t−1} H_i^n γ^{(n)}(L)] are uniformly bounded (for n sufficiently large). Consequently,

e_t^{(n)} := [n^{1/2} Σ_{i=0}^{t−1} (H_i^n − H_i) A(L) − Σ_{i=0}^{t−1} H_i^n γ^{(n)}(L)](Y_{t−i}^{(n)} − β′ x_{t−i}^{(n)})

is a stationary process with finite variance. Therefore, max_{1≤t≤n} ‖e_t^{(n)}‖ is o_P(n^{1/2}). For the nonrandom term in (A.11), using the same type of arguments as above, it is easily seen that

max_{1≤t≤n} ‖n^{−1/2} Σ_{i=0}^{t−1} H_i^n A^{(n)}(L)(t^{(n)})′ K^{(n)′} x_{t−i}^{(n)}‖ ≤ C n^{−1/2} max_{1≤t≤n} ‖K^{(n)′} x_{t−i}^{(n)}‖.

Now, note that

‖K^{(n)′} x_{t−i}^{(n)}‖ = ‖K′ (D^{(n)})^{−1/2} x_{t−i}^{(n)}‖ ≤ ‖K‖ [x_{t−i}^{(n)′} (D^{(n)})^{−1} x_{t−i}^{(n)}]^{1/2} < n^{1/2} ‖K‖ [Σ_{j=1}^{m} (x_{t−i}^{(n)})_j² / Σ_{t=1}^{n} (x_t^{(n)})_j²]^{1/2},

which, in view of (3.3), is o(n^{1/2}) as n → ∞, uniformly in t. The result follows.

(ii) This trivially results from (i) and from the chain of inequalities (for all t = 1, ..., n)

|d_t^n − d_t^0| ≤ ‖Σ^{−1/2}(Z_t^n − Z_t^0)‖ ≤ ‖Σ^{−1/2}‖ ‖Z_t^n − Z_t^0‖ ≤ ‖Σ^{−1/2}‖ max_{1≤t≤n} ‖Z_t^n − Z_t^0‖.

(iii) Working along the same lines as in the proof of Lemma 2 of Hallin and Paindaveine (2004a), we obtain that ‖U_t^n − U_t^0‖ I_{[d_t^0 > ε]} ≤ (2/ε) ‖Σ^{−1/2}‖ ‖Z_t^n − Z_t^0‖, which, in view of
(i), yields the first statement. To establish the second one, note that

P[‖U_t^n − U_t^0‖ > δ] ≤ P[‖U_t^n − U_t^0‖ I_{[d_t^0 > ε]} > δ] + P[d_t^0 ≤ ε].

Since the second term can be made as small as desired by choosing a suitable ε, the result follows from the first part of (iii). □

Proof of Lemma 4.2. Letting H^{(n)}(L) := Σ_{i=0}^{∞} H_i^n L^i, we have, from Eq. (3.6), Z_t^n = H^{(n)}(L) A^{(n)}(L)[Y_t^{(n)} − (β + n^{−1/2} K^{(n)} t^{(n)})′ x_t^{(n)}] and Z_t^0 = H(L) A(L)[Y_t^{(n)} − β′ x_t^{(n)}] = ε_t, so that

Z_t^n = H^{(n)}(L) A^{(n)}(L) G(L) B(L) ε_t − n^{−1/2} H^{(n)}(L) A^{(n)}(L)(t^{(n)})′ K^{(n)′} x_t^{(n)},

where the ε_t's are i.i.d. with the probability density function f̲ given in (2.1). Consequently, the process {Z̃_t^n := Z_t^n − E_0[Z_t^n], t ∈ ℤ} (here, and in the sequel, expectation E_0 is taken under H^{(n)}(θ, Σ, f)) satisfies the infinite-order linear difference equation

Z̃_t^n = H^{(n)}(L) A^{(n)}(L) G(L) B(L) ε_t =: Σ_{j=0}^{∞} E_j^{(n)} ε_{t−j}.

Let Φ_l^{(n)} := Σ_{j=l}^{∞} ‖E_j^{(n)}‖. It follows from Theorem 2.1 in Pham and Tran (1985) that, if

(i) ∫ |f̲(x + δ) − f̲(x)| dx ≤ K ‖δ‖,
(ii) ∫ ‖x‖^ρ f̲(x) dx < ∞ for some ρ > 0,
(iii) Σ_{j=0}^{∞} ‖E_j^{(n)}‖ < ∞ and Σ_{j=0}^{∞} E_j^{(n)} z^j ≠ 0 for all |z| ≤ 1, and
(iv) Σ_{l=1}^{∞} (Φ_l^{(n)})^{ρ/(1+ρ)} < ∞,

then {Z̃_t^n, t ∈ ℤ} is absolutely regular, with mixing rates β^{(n)}(j) ≤ K Σ_{l=j}^{∞} (Φ_l^{(n)})^{ρ/(1+ρ)}.

We check that Conditions (i)–(iv) hold here. Denoting by ‖·‖_2 the L²-norm and by Df̲^{1/2} the quadratic mean gradient of f̲^{1/2}, we have

∫ |f̲(x + δ) − f̲(x)| dx ≤ ‖f̲^{1/2}(· + δ) − f̲^{1/2}(·)‖_2 ‖f̲^{1/2}(· + δ) + f̲^{1/2}(·)‖_2 ≤ 2 ‖f̲^{1/2}(· + δ) − f̲^{1/2}(·)‖_2 ≤ 2 ‖f̲^{1/2}(· + δ) − f̲^{1/2}(·) − δ′ Df̲^{1/2}(·)‖_2 + 2 ‖δ′ Df̲^{1/2}(·)‖_2 ≤ ‖δ‖ [((1/k) I_{k,f} ‖Σ^{−1}‖)^{1/2} + 2 ‖Df̲^{1/2}‖_2],

where we used Lemma 2.2(i) in Garel and Hallin (1995) to bound the first term. Since the quadratic mean gradient is in L²(ℝ^k), Condition (i) is satisfied. Of course, Assumption (A1′) implies that Condition (ii) is satisfied with ρ = 2. It follows from Assumption (D2) that (‖E_j^{(n)}‖) is exponentially decreasing to zero in j (for fixed n), so that the first part of Condition (iii) clearly holds (note that the second part of Condition (iii) directly follows from Assumption (D2)). It is then a simple exercise to
check that the sequence $(\rho_l^{(n)})$ is also exponentially decreasing to zero in l (still for fixed n). Consequently, Condition (iv) is satisfied, and Pham and Tran's (1985) Theorem 2.1 applies.
As above, the exponential decrease in l of the $\rho_l^{(n)}$'s implies the exponential decrease in j of the mixing rates $\beta^{(n)}(j)$ of the associated absolutely regular process. The uniformity in n of the exponential decrease of $\beta^{(n)}(j)$ is obtained, as in the univariate case, by showing (as in Kreiss, 1987, Lemma 6.1) that the above bounds on the norms $\|E_j^{(n)}\|$ hold uniformly in n (for sufficiently large n).
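Conditions (iii) and (iv), and the geometric decay of the resulting mixing-rate bound, are easy to visualise numerically once $\|E_j^{(n)}\|$ decays exponentially, as Assumption (D2) guarantees. The sketch below uses hypothetical constants C and r for a bound of the form $\|E_j^{(n)}\| \le C r^j$, with $\delta = 2$ as in the verification of Condition (ii); it is an illustration of the argument, not the paper's actual bounds.

```python
# Hypothetical geometric bound ||E_j|| <= C * r**j standing in for Assumption (D2)
C, r, delta = 1.0, 0.5, 2.0
n = 200  # truncation point for the (in principle infinite) sums

E_norm = [C * r ** j for j in range(n)]

def rho(l):
    # rho_l = sum_{j >= l} ||E_j||, truncated at n terms
    return sum(E_norm[l:])

# Condition (iv): sum_l rho_l^{delta/(1+delta)} must converge;
# here the summands themselves are geometric, so the sum is finite.
cond_iv = sum(rho(l) ** (delta / (1 + delta)) for l in range(1, n))

def beta_bound(j, K=1.0):
    # Pham-Tran mixing-rate bound: beta(j) <= K * sum_{l >= j} rho_l^{delta/(1+delta)}
    return K * sum(rho(l) ** (delta / (1 + delta)) for l in range(j, n))

# The bound decays geometrically in j, at rate r^{delta/(1+delta)}:
ratios = [beta_bound(j + 1) / beta_bound(j) for j in range(1, 20)]
```

With r = 0.5 and δ = 2, each ratio is approximately $0.5^{2/3} \approx 0.63$: the mixing rates $\beta^{(n)}(j)$ inherit the exponential decay of the coefficient norms, exactly as claimed at the end of the proof.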
References

Brockwell, P.J., Davis, R.A., 1987. Time Series: Theory and Methods. Springer, New York.
Garel, B., Hallin, M., 1995. Local asymptotic normality of multivariate ARMA processes with a linear trend. Ann. Inst. Statist. Math. 47, 551–579.
Hallin, M., 1986. Non-stationary q-dependent processes and time-varying moving-average models: invertibility properties and the forecasting problem. Adv. Appl. Prob. 18, 170–210.
Hallin, M., Paindaveine, D., 2002a. Optimal tests for multivariate location based on interdirections and pseudo-Mahalanobis ranks. Ann. Statist. 30, 1103–1133.
Hallin, M., Paindaveine, D., 2002b. Optimal procedures based on interdirections and pseudo-Mahalanobis ranks for testing multivariate elliptic white noise against ARMA dependence. Bernoulli 8, 787–815.
Hallin, M., Paindaveine, D., 2002c. Multivariate signed ranks: Randles' interdirections or Tyler's angles? In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods. Birkhäuser, Basel, pp. 271–282.
Hallin, M., Paindaveine, D., 2004a. Rank-based optimal tests of the adequacy of an elliptic VARMA model. Ann. Statist., to appear.
Hallin, M., Paindaveine, D., 2004b. Affine invariant aligned rank tests for the multivariate general linear model with ARMA errors. J. Multivariate Anal., to appear.
Hallin, M., Puri, M.L., 1994. Aligned rank tests for linear models with autocorrelated error terms. J. Multivariate Anal. 50, 175–237.
Heiler, S., Willers, R., 1988. Asymptotic normality of R-estimates in the linear model. Statistics 19, 173–184.
Hettmansperger, T.P., Möttönen, J., Oja, H., 1997. Affine invariant multivariate one-sample signed-rank tests. J. Amer. Statist. Assoc. 92, 1591–1600.
Hettmansperger, T.P., Möttönen, J., Oja, H., 1998. The geometry of the affine invariant multivariate sign and rank methods. J. Nonparam. Statist. 11, 271–285.
Jan, S.-L., Randles, R.H., 1994. A multivariate signed-sum test for the one-sample location problem. J. Nonparam. Statist. 4, 49–63.
Jurečková, J., 1969. Asymptotic linearity of a rank statistic in regression parameter. Ann. Math. Statist. 40, 1889–1900.
Koul, H.L., 1992. Weighted Empiricals and Linear Models. IMS Lecture Notes–Monograph Series. IMS, Hayward, CA.
Kreiss, J.-P., 1987. On adaptive estimation in stationary ARMA processes. Ann. Statist. 15, 112–133.
Le Cam, L., 1986. Asymptotic Methods in Statistical Decision Theory. Springer, New York.
Liu, R.Y., 1990. On a notion of data depth based on random simplices. Ann. Statist. 18, 405–414.
Liu, R.Y., Parelius, J.M., Singh, K., 1999. Multivariate analysis by data depth: descriptive statistics, graphics, and inference. Ann. Statist. 27, 783–858.
Möttönen, J., Oja, H., 1995. Multivariate spatial sign and rank methods. J. Nonparam. Statist. 5, 201–213.
Möttönen, J., Oja, H., Tienari, J., 1997. On the efficiency of multivariate spatial sign and rank methods. Ann. Statist. 25, 542–552.
Möttönen, J., Hettmansperger, T.P., Oja, H., Tienari, J., 1998. On the efficiency of the multivariate affine invariant rank methods. J. Multivariate Anal. 66, 118–132.
Oja, H., 1983. Descriptive statistics for multivariate distributions. Statist. Probab. Lett. 1, 327–332.
Oja, H., 1999. Affine invariant multivariate sign and rank tests and corresponding estimates: a review. Scand. J. Statist. 26, 319–343.
Oja, H., Paindaveine, D., 2004. Optimal testing procedures based on hyperplanes. J. Statist. Plann. Inference, to appear.
Ollila, E., Hettmansperger, T.P., Oja, H., 2004. Affine equivariant multivariate sign methods. Preprint, University of Jyväskylä.
Peters, D., Randles, R.H., 1990. A multivariate signed-rank test for the one-sample location problem. J. Amer. Statist. Assoc. 85, 552–557.
Pham, D.T., Tran, L.T., 1985. Some mixing properties of time series models. Stochastic Process. Appl. 19, 297–303.
Puri, M.L., Sen, P.K., 1971. Nonparametric Methods in Multivariate Analysis. Wiley, New York.
Randles, R.H., 1989. A distribution-free multivariate sign test based on interdirections. J. Amer. Statist. Assoc. 84, 1045–1050.
Randles, R.H., 2000. A simpler affine-invariant, multivariate, distribution-free sign test. J. Amer. Statist. Assoc. 95, 1263–1268.
Randles, R.H., Peters, D., 1990. Multivariate rank tests for the two-sample location problem. Comm. Statist. Theory Methods 19, 4225–4238.
Randles, R.H., Um, Y., 1998. Nonparametric tests for the multivariate multi-sample location problem. Statist. Sinica 8, 801–812.
Singh, K., 1991. A notion of majority depth. Unpublished manuscript.
Tyler, D.E., 1987. A distribution-free M-estimator of multivariate scatter. Ann. Statist. 15, 234–251.
van Eeden, C., 1972. An analogue, for signed rank statistics, of Jurečková's asymptotic linearity theorem for rank statistics. Ann. Math. Statist. 43, 791–802.
Visuri, S., Ollila, E., Koivunen, V., Möttönen, J., Oja, H., 2003. Affine equivariant multivariate rank methods. J. Statist. Plann. Inference 114, 161–185.
Yoshihara, K.I., 1976. Limiting behaviour of U-statistics for stationary absolutely regular processes. Z. Wahrsch. Verw. Gebiete 35, 237–252.
Zuo, Y., Serfling, R., 2000. General notions of statistical depth function. Ann. Statist. 28, 461–482.