A note on asymptotic testing theory for nonhomogeneous observations


Stochastic Processes and their Applications 28 (1988) 267-273 North-Holland


L. FAHRMEIR
Institute of Statistics, University of Regensburg, Universitätsstr. 31, D-8400 Regensburg, FR Germany

Received 8 April 1987
Revised 9 December 1987

This note shows, for ergodic and nonergodic models, how previous results on the limit distributions of the likelihood ratio, score and Wald statistics can be extended under full matrix normalization. Compared to $n^{1/2}$ or diagonal norming this allows, just as in asymptotic estimation theory, for more heterogeneity of the data. As a key tool the Cholesky square root is used instead of the common symmetric square root.

test statistics • limit distributions • full matrix normalization

1. Introduction

Asymptotic properties of the maximum likelihood (ML) estimator for not identically distributed, possibly dependent observations have been established under full matrix normalization, in particular normalization by a full square root of the information matrix $F_n$, by various authors, e.g. Sweeting (1980, 1983), Ibragimov and Hasminskii (1981), Jeganathan (1982), Kaufmann (1987, Theorem 3). Compared to classical $n^{1/2}$-norming or normalization by diagonal matrices, e.g. Weiss (1973), Basawa and Scott (1983), no additional convergence conditions, such as $F_n/n \to F$ or asymptotic relations between diagonal and off-diagonal elements of $F_n$, are imposed. This is of importance in applications to special models, since it allows for more heterogeneity of the data and for convergence rates different from $n^{-1/2}$; see e.g. Fahrmeir and Kaufmann (1985, 1986, 1987), in the context of generalized linear models and categorical time series, and the example in Section 3.

Asymptotic properties of the common test statistics, i.e. the likelihood ratio, Wald and score statistics, have been studied by several authors, e.g. Weiss (1975), Dzhaparidge (1977) for ergodic models, Basawa and Koul (1979, 1983), Basawa and Scott (1983) for nonergodic models, all assuming diagonal norming. It seems that this restriction has been made primarily because full matrix normalization with common symmetric square roots of $F_n$ 'involves certain complexities', as Basawa and Koul (1979) point out. This note shows, generalizing more special results in Fahrmeir (1987), that these problems can be solved easily on the usual lines if the Cholesky square root is used. Existing results on the limit distributions are extended under full matrix normalization and under essentially the same conditions which assure asymptotic properties of the MLE, thus providing a unified set of assumptions for asymptotic likelihood inference. Asymptotic optimality is not considered here, but it is to be expected that existing results may be treated in an analogous way.

0304-4149/88/$3.50 © 1988, Elsevier Science Publishers B.V. (North-Holland)

2. Notations and regularity assumptions

Let $\{\Omega_n, \mathcal{A}_n, P_n(\theta);\ n \ge 0\}$ be a family of statistical experiments corresponding to a stochastic process $y = \{y_n;\ n \ge 0\}$, where $n$ is a discrete or continuous parameter, and $\theta$ is an unknown parameter vector in an open subset $\Theta$ of $\mathbb{R}^p$. To avoid additional technicalities we assume, somewhat more restrictively than necessary,

(R)
(i) The probability measures $\{P_n(\theta);\ \theta \in \Theta\}$, $n$ fixed, are mutually absolutely continuous, and log likelihoods $l_n(\theta)$ exist for all $n$ with respect to some dominating measure.
(ii) Second-order partial derivatives of $l_n(\theta)$ exist and are continuous. The score function $s_n(\theta) = \partial l_n(\theta)/\partial\theta$ has expectation zero under $P_n(\theta)$ and is square integrable.

Hence, the observed information matrix $H_n(\theta) = -\partial^2 l_n(\theta)/\partial\theta\,\partial\theta'$ and the expected information matrix $F_n(\theta) = \operatorname{cov}_\theta s_n(\theta)$ exist. In the sequel, $\lambda_{\min} A$ denotes the minimal eigenvalue of a positive definite matrix $A$, and $A^{1/2}$ ($A^{T/2}$) is a left (the corresponding right) square root of the positive definite matrix $A$, i.e.

$$A^{1/2} A^{T/2} = A; \qquad A^{-1/2} = (A^{1/2})^{-1}, \qquad A^{-T/2} = (A^{T/2})^{-1}.$$

Note that left (right) square roots are unique up to an orthogonal transformation from the right (from the left). Unique continuous 'versions' of the square root are the Cholesky square root, henceforth abbreviated by CSR, and the well known symmetric square root. The left CSR is defined as the lower triangular left square root with positive diagonal entries.
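As a small numerical illustration of these two 'versions' (a sketch only, not part of the original note; the matrix `A`, the rotation angle and all helper names are assumptions chosen for illustration), the following Python snippet computes the left CSR and the symmetric square root of a $2 \times 2$ positive definite matrix. It also checks that multiplying a left square root from the right by an orthogonal matrix gives another left square root.

```python
import math

def cholesky_left(a):
    """Left Cholesky square root: lower triangular L, positive diagonal, L L' = a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low

def symmetric_sqrt_2x2(a):
    """Symmetric square root of a 2x2 positive definite matrix (closed form)."""
    s = math.sqrt(a[0][0] * a[1][1] - a[0][1] * a[1][0])  # sqrt(det a)
    t = math.sqrt(a[0][0] + a[1][1] + 2.0 * s)            # sqrt(trace a + 2 sqrt(det a))
    return [[(a[0][0] + s) / t, a[0][1] / t],
            [a[1][0] / t, (a[1][1] + s) / t]]

def matmul(x, y):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

def transpose(x):
    return [list(col) for col in zip(*x)]

def max_abs_diff(x, y):
    return max(abs(p - q) for rx, ry in zip(x, y) for p, q in zip(rx, ry))

A = [[4.0, 2.0], [2.0, 3.0]]          # hypothetical positive definite matrix
L = cholesky_left(A)                  # left CSR: A = L L'
S = symmetric_sqrt_2x2(A)             # symmetric root: A = S S

# Any orthogonal Q yields another left square root L Q, since (LQ)(LQ)' = L L'.
c, s = math.cos(0.3), math.sin(0.3)
Q = [[c, -s], [s, c]]
LQ = matmul(L, Q)

print(max_abs_diff(matmul(L, transpose(L)), A))    # close to 0
print(max_abs_diff(matmul(S, S), A))               # close to 0
print(max_abs_diff(matmul(LQ, transpose(LQ)), A))  # close to 0
```

All three products reproduce `A` up to rounding, but only `L` is lower triangular with positive diagonal and only `S` is symmetric; `LQ` is neither, which is why a fixed version has to be singled out.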

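3. Limit distributions of test statistics

Beyond the regularity conditions (R), we assume:

(D) Divergence of information: $\lambda_{\min} F_n(\theta) \to \infty$.

(S) Smoothness: For any $\delta > 0$,

$$\sup_{\tilde\theta \in N_n(\delta)} \left\| F_n^{-1/2}(\theta)\, H_n(\tilde\theta)\, F_n^{-T/2}(\theta) - V(\theta) \right\| \xrightarrow{p} 0,$$

with $V(\theta)$ random and a.s. positive definite, and neighborhoods $N_n(\delta) = \{\tilde\theta : \|F_n^{T/2}(\theta)(\tilde\theta - \theta)\| < \delta\}$.

(N) (Mixed) asymptotic normality of the normed score function:

$$\left( F_n^{-1/2}(\theta)\, s_n(\theta),\ F_n^{-1/2}(\theta)\, H_n(\theta)\, F_n^{-T/2}(\theta) \right) \xrightarrow{d} \left( V^{1/2}(\theta) Z,\ V(\theta) \right),$$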

with $Z \sim N(0, I)$, independent of $V(\theta)$.

Remarks. (i) The divergence condition (D) is used, at least implicitly, by all authors. Condition (S) is a continuity condition on the asymptotic relation between $F_n(\theta)$ and $H_n(\theta)$ within the neighborhoods $N_n(\delta)$. It implies

$$F_n^{-1/2}(\theta)\, H_n(\theta)\, F_n^{-T/2}(\theta) \xrightarrow{d} V(\theta).$$

If $V(\theta)$ is truly random, the process is called non-ergodic (e.g. Basawa and Scott, 1983). If $V(\theta)$ is a.s. constant, the process is called ergodic, and usually $V(\theta) = I$. Two approaches exist to establish (N), which reduces to

$$F_n^{-1/2}(\theta)\, s_n(\theta) \xrightarrow{d} N(0, I)$$

in the ergodic case:

(a) Use of central limit theorems (e.g. Hall and Heyde, 1980, Corollary 3.1) in combination with the Cramér-Wold device, thereby replacing (N) by a Lindeberg or Ljapunoff condition (see e.g. Kaufmann, 1987).

(b) Direct verification using characteristic functions, together with the additional requirement that (S) holds under $P_n(\theta_n)$, $\theta_n = \theta + F_n^{-T/2} h$ (similarly to Basawa and Scott, 1983, using $\{\operatorname{diag} F_n\}$ as normalizing sequence; Sweeting, 1980; Hall and Heyde, 1980, Chapter 6, for the scalar case; Kaufmann, 1983).

(ii) Compared to diagonal norming, additional problems arise concerning different versions of the square root of $F_n(\theta)$: In the non-ergodic case, (S) and (N) may hold for a certain version, but may not hold for others. In the ergodic case $V(\theta) = I$, and (S) and (N) hold for any version $F_n^{1/2}(\theta)$, if they hold at all.

Under (R), (D), (S) and (N), a local MLE $\hat\theta_n$ exists asymptotically. It is consistent and asymptotically normal,

$$H_n^{T/2}(\theta)\,(\hat\theta_n - \theta) \xrightarrow{d} N(0, I), \tag{3.1}$$

where $H_n^{T/2}(\theta)$ is chosen such that $H_n^{T/2}(\theta)\, F_n^{-T/2}(\theta)$ is a continuous right square root of $F_n^{-1/2}(\theta)\, H_n(\theta)\, F_n^{-T/2}(\theta)$. Furthermore

$$H_n^{-1/2}(\theta)\, s_n(\theta) \xrightarrow{d} N(0, I) \tag{3.2}$$

for the same square root version. If (S) and (N) hold for the CSR, then $H_n^{T/2}$ can be chosen as the right CSR of $H_n$, since $H_n^{T/2} F_n^{-T/2}$ is the right CSR of $F_n^{-1/2} H_n F_n^{-T/2}$. This is always possible in the ergodic case, if (N) and (S) hold at all, compare Remark (ii) above. This property of the CSR will be used in the derivation of the limit distributions of test statistics.
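The CSR property just stated can be checked numerically. The sketch below is an illustration under assumed inputs: `F` and `H` are arbitrary positive definite matrices standing in for $F_n$ and $H_n$, and the helper names are hypothetical. It forms $M = F^{-1/2} H F^{-T/2}$ with Cholesky roots and compares the right CSR of $M$ with $H^{T/2} F^{-T/2}$.

```python
import math

def cholesky_left(a):
    """Left CSR: lower triangular L with positive diagonal and L L' = a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low

def inv_lower(low):
    """Inverse of a lower triangular matrix by forward substitution."""
    n = len(low)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0 / low[j][j]
        for i in range(j + 1, n):
            acc = sum(low[i][k] * inv[k][j] for k in range(j, i))
            inv[i][j] = -acc / low[i][i]
    return inv

def matmul(x, y):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

def transpose(x):
    return [list(col) for col in zip(*x)]

def max_abs_diff(x, y):
    return max(abs(p - q) for rx, ry in zip(x, y) for p, q in zip(rx, ry))

# Hypothetical positive definite matrices playing the roles of F_n and H_n.
F = [[4.0, 2.0, 1.0], [2.0, 3.0, 0.5], [1.0, 0.5, 2.0]]
H = [[5.0, 1.0, 2.0], [1.0, 4.0, 1.0], [2.0, 1.0, 6.0]]

F_inv_half = inv_lower(cholesky_left(F))   # F^{-1/2}, lower triangular
H_half = cholesky_left(H)                  # H^{1/2}, lower triangular

# M = F^{-1/2} H F^{-T/2}
M = matmul(matmul(F_inv_half, H), transpose(F_inv_half))

right_csr_of_M = transpose(cholesky_left(M))                  # right CSR of M
candidate = matmul(transpose(H_half), transpose(F_inv_half))  # H^{T/2} F^{-T/2}

gap = max_abs_diff(right_csr_of_M, candidate)
print(gap)  # close to 0: the two upper triangular matrices agree
```

The agreement follows because products and inverses of lower triangular matrices with positive diagonal are again of that form, so $F^{-1/2} H^{1/2}$ is itself the left CSR of $M$.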


We consider the problem of testing the value of a subvector of length $r$, say $\theta_2$, of $\theta = (\theta_1', \theta_2')'$:

$$H_0: \theta_2 = \theta_{02} \quad \text{against} \quad H_1: \theta_2 \ne \theta_{02}. \tag{3.3}$$

The more general linear hypothesis $C\theta = \xi$ can be reduced to (3.3) by an appropriate reparametrization. In the sequel, $\hat\theta_n = (\hat\theta_{n1}', \hat\theta_{n2}')'$ denotes the unrestricted MLE, whereas $\tilde\theta_n = (\tilde\theta_{n1}', \theta_{02}')'$ is the MLE under the restraint of the null hypothesis. Partitions of the score function and of information matrices are to be understood analogously. Three common test statistics are the likelihood ratio statistic

$$\lambda_n = -2\{l_n(\tilde\theta_n) - l_n(\hat\theta_n)\},$$

the Wald statistic

$$w_n = (\hat\theta_{n2} - \theta_{02})'\, A_{n22}(\hat\theta_n)\, (\hat\theta_{n2} - \theta_{02}),$$

and the score statistic

$$r_n = s_{n2}'(\tilde\theta_n)\, A_{n22}^{-1}(\tilde\theta_n)\, s_{n2}(\tilde\theta_n),$$

where

$$A_{n22}(\theta) = H_{n22}(\theta) - H_{n21}(\theta)\, H_{n11}^{-1}(\theta)\, H_{n12}(\theta)$$

is the inverse of the second diagonal block in the partition of $H_n^{-1}(\theta)$. Results on the limit distributions of the test statistics under $H_0$, as well as under sequences of local alternatives

$$\theta_n = \theta_0 + F_n^{-T/2}(\theta_0)\, h, \qquad \theta_0 = (\theta_{01}', \theta_{02}')', \qquad h = (h_1', h_2')', \tag{3.4}$$

where $F_n^{T/2}$ is the right CSR of $F_n$, are given in the following

Theorem. If (R), (D) are fulfilled, and (N), (S) hold for the CSR, then

(i) $\lambda_n \xrightarrow{d} \chi^2(r)$ under $H_0$,

(ii) $\lambda_n \xrightarrow{d} m\chi^2(r; \delta^2)$ under $\{P(\theta_n)\}$, where $m\chi^2$ is a mixture of noncentral $\chi^2$ distributions with a random non-centrality parameter

$$\delta^2 = h_2' C h_2, \qquad C = V_{22}(\theta) - V_{21}(\theta)\, V_{11}^{-1}(\theta)\, V_{12}(\theta).$$

(iii) If, additionally, (S) holds with $H_n(\theta)$ replaced by $F_n(\theta)$, then $w_n$ and $r_n$ are asymptotically equivalent to $\lambda_n$ and have the same limit distributions.

Remarks. (i) The reference to a special square root, i.e. the CSR or slight modifications, seems to be an indispensable requirement, see the remark after the proof of statement (i).

(ii) The non-null limit distribution $m\chi^2(r, \delta^2)$ stands for the distribution of

$$(Z_2 + C^{T/2} h_2)'\,(Z_2 + C^{T/2} h_2),$$

where $C^{T/2}$ is the right CSR of $C$, and $Z_2$ is $r$-dimensional standard normal, independent of $C$.

(iii) In the ergodic case $V(\theta) = I$, so that (N) and (D), and therefore the results on the null limit distributions, do not depend on a special version of the square root. Statement (ii) simplifies to $\lambda_n \xrightarrow{d} \chi^2(r, \delta^2)$ under $\{P(\theta_n)\}$, with

$$\delta^2 = (\theta_{n2} - \theta_{02})'\, B_{n22}(\theta)\, (\theta_{n2} - \theta_{02}), \qquad B_{n22}(\theta) = F_{n22}(\theta) - F_{n21}(\theta)\, F_{n11}^{-1}(\theta)\, F_{n12}(\theta).$$

This result on the non-null limit distribution, which follows on using the partitioned form of $F_n^{T/2}(\theta)$, will generally not be valid if other versions of the square root, e.g. the symmetric square root, are chosen; see Fahrmeir (1987) for a counterexample.

(iv) A more natural formulation of an alternative sequence $\theta_n$ seems to be to hold $\theta_1$ fixed at the true (but unknown) value, and define $\theta_{n2} = \theta_{02} + B_{n22}^{-T/2}(\theta)\, h_2$. However, neither the LAMN property nor the contiguity of $P(\theta)$ and $P(\theta_n)$, which are essential for the proof of (ii), can be established without introducing further requirements on the asymptotic relationship between diagonal and off-diagonal elements of the Fisher information.

Example. Fahrmeir and Kaufmann (1987) discuss statistical analysis of autoregressive quantal response models for nonstationary categorical time series. For simplicity we consider a binary autoregressive logit model of order 1: $\{y_n\}$ is a two state Markov chain with transition probabilities

$$P(y_n = 1 \mid y_{n-1}, x_n) = [1 + \exp(\alpha y_{n-1} + \beta' x_n)]^{-1},$$

where $\{x_n\}$ is a sequence of fixed regressor vectors. The parameter $\theta = (\alpha, \beta')'$ is to be estimated by maximum likelihood. If $\{x_n\}$ is bounded, i.e. $|x_n| < c$ for all $n$, and if

$$\lambda_{\min} \sum_{i=1}^{n} x_i x_i' \to \infty, \tag{3.5}$$

then it can be shown (see Kaufmann, 1987, for detailed proofs in the general case) that (D), (R), (N) and (S) hold with $V(\theta) = I$; in particular the model is ergodic. The divergence condition (3.5) is well known from classical linear regression. No additional requirements on the asymptotic relationship between diagonal and off-diagonal elements of $\sum x_i x_i'$, which are introduced if diagonal norming is used, are needed.

Proofs of the limit results (i), (ii), (iii) proceed on the usual lines; however, essential use is made of the (partitioned) CSR $A^{1/2}$ of a p.d. matrix $A$, and of its inverse $A^{-1/2}$:

$$A^{1/2} = \begin{pmatrix} U^{1/2} & 0 \\ B & L^{1/2} \end{pmatrix}, \qquad A^{-1/2} = \begin{pmatrix} U^{-1/2} & 0 \\ -L^{-1/2} B\, U^{-1/2} & L^{-1/2} \end{pmatrix}, \tag{3.6}$$

where $U^{1/2}$ is the left CSR of $A_{11}$, $L^{1/2}$ the left CSR of $A_{22} - A_{21} A_{11}^{-1} A_{12}$, and $B = A_{21} U^{-T/2}$.
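The partitioned form (3.6) is easy to confirm numerically. The following sketch is an illustration only: the matrix `A`, the partition sizes, the displacement vector `d` and the helper names are assumptions. It assembles $A^{1/2}$ and $A^{-1/2}$ from the blocks and compares them with a direct Cholesky factorization. As a by-product it checks the identity behind Remark (iii): if $h = A^{T/2} d$, then $h_2' h_2 = d_2' (A_{22} - A_{21} A_{11}^{-1} A_{12}) d_2$.

```python
import math

def cholesky_left(a):
    """Left CSR: lower triangular L with positive diagonal and L L' = a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low

def inv_lower(low):
    """Inverse of a lower triangular matrix by forward substitution."""
    n = len(low)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0 / low[j][j]
        for i in range(j + 1, n):
            inv[i][j] = -sum(low[i][k] * inv[k][j] for k in range(j, i)) / low[i][i]
    return inv

def matmul(x, y):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

def transpose(x):
    return [list(col) for col in zip(*x)]

def block(a, rows, cols):
    return [[a[i][j] for j in cols] for i in rows]

def assemble(top_left, bottom_left, bottom_right):
    """Build the block lower triangular matrix [[TL, 0], [BL, BR]]."""
    r = len(bottom_right)
    top = [row + [0.0] * r for row in top_left]
    return top + [bl + br for bl, br in zip(bottom_left, bottom_right)]

def max_abs_diff(x, y):
    return max(abs(p - q) for rx, ry in zip(x, y) for p, q in zip(rx, ry))

A = [[4.0, 2.0, 1.0], [2.0, 3.0, 0.5], [1.0, 0.5, 2.0]]  # hypothetical p.d. matrix
i1, i2 = [0], [1, 2]                                     # block sizes 1 and r = 2
A11, A21, A22 = block(A, i1, i1), block(A, i2, i1), block(A, i2, i2)

U_half = cholesky_left(A11)
U_inv_half = inv_lower(U_half)
B = matmul(A21, transpose(U_inv_half))                   # B = A21 U^{-T/2}
BBt = matmul(B, transpose(B))                            # equals A21 A11^{-1} A12
schur = [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(A22, BBt)]
L_half = cholesky_left(schur)
L_inv_half = inv_lower(L_half)
lower_left = [[-v for v in row] for row in matmul(matmul(L_inv_half, B), U_inv_half)]

A_half = cholesky_left(A)
err_root = max_abs_diff(assemble(U_half, B, L_half), A_half)
err_inv = max_abs_diff(assemble(U_inv_half, lower_left, L_inv_half), inv_lower(A_half))

# Non-centrality identity: h = A^{T/2} d implies h2'h2 = d2' schur d2.
d = [0.3, -0.2, 0.5]
h = [sum(A_half[k][i] * d[k] for k in range(3)) for i in range(3)]
lhs = sum(v * v for v in h[1:])
rhs = sum(d[1 + i] * schur[i][j] * d[1 + j] for i in range(2) for j in range(2))
print(err_root, err_inv, abs(lhs - rhs))  # all close to 0
```

The last check works because the right CSR is upper triangular, so the second block of $h$ depends on $d_2$ only.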


By the usual decomposition and Taylor expansion of $\lambda_n$, using in particular (S) and consistency of $\hat\theta_n$, we arrive at

$$\lambda_n = s_n' H_n^{-1} s_n - s_{n1}' H_{n11}^{-1} s_{n1} + o_p(1) = s_n' H_n^{-T/2} P_n H_n^{-1/2} s_n + o_p(1), \tag{3.7}$$

where

$$P_n = I - H_n^{T/2} \begin{pmatrix} H_{n11}^{-1} & 0 \\ 0 & 0 \end{pmatrix} H_n^{1/2}$$

(for notational simplicity the argument $\theta$ has been dropped). Since, by assumption, (N) and (S) hold for the CSR, $H_n^{1/2}$ can be chosen as the CSR. Using (3.6), it is easily verified that

$$P_n = \begin{pmatrix} 0 & 0 \\ 0 & I_r \end{pmatrix}.$$

Well known theorems on quadratic forms, asymptotic normality (3.2) and the continuity theorem provide (i).

Remark. The proof crucially rests on the fact that $P_n$ is, or at least converges to, an idempotent matrix. This is true for the CSR of $H_n$, or some slight modifications. However, this is generally not true for the symmetric square root of $H_n$.

To prove part (ii) by the usual contiguity arguments, the definition of the LAMN property (Jeganathan, 1982; Basawa and Scott, 1983) has to be extended to allow for nonsymmetric square roots. Using (N) and (S), the LAMN condition is seen to hold with $F_n^{-T/2}$ as norming matrices. Contiguity of $P(\theta)$ and $\{P(\theta_n)\}$, as well as

$$H_n^{-1/2} s_n \xrightarrow{d} Z + V^{T/2} h \quad \text{under } P(\theta_n),$$

follow by the common arguments. Together with (3.7) this gives (ii).

Part (iii) is obtained by showing $\lambda_n = w_n + o_p(1) = r_n + o_p(1)$, using Taylor expansions, a number of manipulations justified by the assumptions, in particular (S), and the properties of the CSR.
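The role of the CSR in the proof of (i) can be illustrated on a toy case. The sketch below rests on assumptions: it reads $P_n$ as $I - H_n^{T/2}\,\mathrm{diag}(H_{n11}^{-1}, 0)\,H_n^{1/2}$, and it uses a hypothetical $2 \times 2$ matrix `H` and score vector `s` with both blocks of size one. With the CSR this matrix is the fixed selector $\mathrm{diag}(0, 1)$, so the leading term of (3.7) is just the squared last component of $H_n^{-1/2} s_n$. With the symmetric root the quadratic form has the same value, but the matrix in the middle depends on `H`.

```python
import math

def matmul(x, y):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

def transpose(x):
    return [list(col) for col in zip(*x)]

def inv2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def cholesky_left_2x2(h):
    l11 = math.sqrt(h[0][0])
    l21 = h[1][0] / l11
    return [[l11, 0.0], [l21, math.sqrt(h[1][1] - l21 * l21)]]

def symmetric_sqrt_2x2(h):
    s = math.sqrt(h[0][0] * h[1][1] - h[0][1] * h[1][0])
    t = math.sqrt(h[0][0] + h[1][1] + 2.0 * s)
    return [[(h[0][0] + s) / t, h[0][1] / t], [h[1][0] / t, (h[1][1] + s) / t]]

def p_matrix(root, h11):
    """P = I - root' D root with D = diag(1/h11, 0), for a left square root of H."""
    d = [[1.0 / h11, 0.0], [0.0, 0.0]]
    m = matmul(matmul(transpose(root), d), root)
    return [[(1.0 if i == j else 0.0) - m[i][j] for j in range(2)] for i in range(2)]

def quad_form(root, p, score):
    """z' P z with z = root^{-1} score."""
    z = matmul(inv2(root), [[score[0]], [score[1]]])
    pz = matmul(p, z)
    return z[0][0] * pz[0][0] + z[1][0] * pz[1][0]

def max_abs_diff(x, y):
    return max(abs(p - q) for rx, ry in zip(x, y) for p, q in zip(rx, ry))

H = [[5.0, 2.0], [2.0, 3.0]]   # hypothetical observed information
s = [1.0, -0.5]                # hypothetical score vector

H_inv = inv2(H)
# Leading term of (3.7): s' H^{-1} s - s1' H11^{-1} s1
target = sum(s[i] * H_inv[i][j] * s[j] for i in range(2) for j in range(2)) - s[0] ** 2 / H[0][0]

chol, sym = cholesky_left_2x2(H), symmetric_sqrt_2x2(H)
P_chol, P_sym = p_matrix(chol, H[0][0]), p_matrix(sym, H[0][0])
q_chol, q_sym = quad_form(chol, P_chol, s), quad_form(sym, P_sym, s)

print(P_chol)   # approximately [[0, 0], [0, 1]]
print(P_sym)    # has nonzero off-diagonal entries that depend on H
print(target, q_chol, q_sym)
```

The point of the comparison is that only the Cholesky version yields a matrix that is free of `H`, which is what lets the limit argument pass through unchanged when $V(\theta)$ is random.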

Acknowledgement

I thank Heinz Kaufmann and a referee for their helpful comments.

References

[1] I.V. Basawa and H.L. Koul, Asymptotic tests of composite hypotheses for nonergodic type stochastic processes, Stoch. Proc. Appl. 9 (1979) 291-305.
[2] I.V. Basawa and H.L. Koul, Asymptotically minimax tests of composite hypotheses for nonergodic type processes, Stoch. Proc. Appl. 14 (1983) 41-54.
[3] I.V. Basawa and D.J. Scott, Asymptotic optimal inference for nonergodic models, Springer, Lecture Notes in Statistics, New York (1983).
[4] K.O. Dzhaparidge, Tests of composite hypotheses for random variables and stochastic processes, Theor. Prob. Appl. 22 (1977) 104-118.
[5] L. Fahrmeir, Asymptotic testing theory for generalized linear models, Statistics 18 (1987) 65-76.
[6] L. Fahrmeir and H. Kaufmann, Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models, Ann. Statist. 13 (1985) 342-368.
[7] L. Fahrmeir and H. Kaufmann, Asymptotic inference in discrete response models, Statistische Hefte 27 (1986) 179-205.
[8] L. Fahrmeir and H. Kaufmann, Regression models for nonstationary categorical time series, J. Time Series Analysis 8 (1987) 147-160.
[9] P. Hall and C.C. Heyde, Martingale limit theory and its application, Academic Press, New York (1980).
[10] I.A. Ibragimov and R.Z. Has'minskii, Statistical estimation, asymptotic theory, Springer, Berlin, Heidelberg, New York (1981).
[11] P. Jeganathan, On the asymptotic theory of estimation when the limit of the log-likelihood ratios is mixed normal, Sankhyā Ser. A 44 (1982) 173-212.
[12] H. Kaufmann, Mehrdimensionale Maximum Likelihood Schätzung bei stochastischen Prozessen: Asymptotische Theorie, Dissertation, Universität Regensburg (1983).
[13] H. Kaufmann, Regression models for nonstationary categorical time series: asymptotic estimation theory, Ann. Statist. 15 (1987) 79-98.
[14] T.J. Sweeting, Uniform asymptotic normality of the maximum likelihood estimator, Ann. Statist. 8 (1980) 1375-1381.
[15] T.J. Sweeting, On estimator efficiency in stochastic processes, Stoch. Proc. Appl. 15 (1983) 93-98.
[16] L. Weiss, Asymptotic properties of maximum likelihood estimators in some nonstandard cases II, JASA 68 (1973) 428-430.
[17] L. Weiss, The asymptotic distribution of the likelihood ratio in some nonstandard cases, JASA 70 (1975) 204-208.