Journal of Econometrics 31 (1986) 341-361. North-Holland
MODIFIED LAGRANGE MULTIPLIER TESTS FOR PROBLEMS WITH ONE-SIDED ALTERNATIVES
Alan J. ROGERS*
Indiana University, Bloomington, IN 47405, USA
Received January 1985, final version received February 1986

A modified Lagrange multiplier test statistic is proposed which takes explicit account of the one-sided nature of the alternative in problems where the null hypothesis specifies that the true value of the parameter vector lies on the boundary of the parameter space. Computation of this statistic requires only the constrained maximum likelihood estimator. Conditions for the consistency of tests based on this statistic are examined and it is shown that the distribution of the statistic is not affected if nuisance parameters are allowed to lie on the boundary of the parameter space.
1. Introduction

The problem of testing a hypothesis which specifies that the true value of a parameter vector lies on the boundary of the parameter space has been addressed by Gourieroux, Holly and Monfort (hereafter GHM) in two recent papers, GHM (1980, 1982). In the second of these they consider the case of a linear regression model with inequality constraints on the coefficients (or on linear functions of the coefficients), and in GHM (1980) they consider the more general case of inequality constraints on the parameters of models of fairly general form. This work extends results obtained by Bartholomew (1961), Chacko (1963), Kudo (1963), Nuesch (1966) and Perlman (1969) for the analogous problem in normal multivariate analysis, and the large sample results of Chernoff (1954), Moran (1971) and Chant (1974). In these cases the null hypothesis specifies that the inequality constraints are satisfied as equalities, so the problem is a multiple-parameter version of the classical problem of testing a simple null hypothesis against a one-sided alternative.

Aside from the interest this question has from the point of view of statistical theory, there are applications in econometrics where it arises naturally. For instance, inequality constraints on regression coefficients have attracted considerable attention, although mainly from the point of view of estimation [see, e.g., Thompson (1982)]. Another example is a random coefficient model where the hypothesis of interest is that some or all of the coefficients have zero variance, and the alternative is that one or more of the variances in question are positive [see, e.g., Nicholls and Pagan (1983)]. GHM also consider a similar problem for an error components model [see also Miller (1977)] and a market possibly in disequilibrium, where equilibrium is characterised by a zero parameter value and disequilibrium by a positive value of the parameter [see Quandt (1978)].

For this class of problems the main result is that the Wald (W) and likelihood ratio (LR) statistics do not have a limiting $\chi^2$ distribution under the null hypothesis, but instead are distributed as a mixture of $\chi^2$ distributions. GHM (1980, 1982) also propose a 'Kuhn-Tucker' (KT) test statistic which is asymptotically equivalent to the W and LR statistics. In addition to these three statistics, use can be made of the Lagrange multiplier (LM) (or efficient score) statistic, and this has the usual limiting $\chi^2$ distribution under the null hypothesis.¹ The objection to the use of a conventional test based on the LM statistic is that it takes no account of the one-sided nature of the alternative, and so can be expected to have low power compared to procedures which incorporate this prior information (such as tests based on the W, LR, or KT statistics). [In the case where only one parameter is under test the usual LM test can easily be modified, and in fact King and Hillier (1985) show that such 'one-sided' LM tests of some important hypotheses on the error covariance matrix of a linear regression model are locally most powerful invariant.] However, the LM statistic is often easier to compute than the W, LR or KT statistics since the (inequality constrained) maximum likelihood estimator is not required. In addition, the usual LM test is easy to implement once the statistic has been computed, while the mixing parameters of the limiting distribution of the W, LR and KT statistics will generally be difficult to estimate if the number of parameters under test is large. The estimation of these mixing parameters is required if these statistics are to be used for conventional types of test procedures.

*I am very much indebted to Richard Quandt, Gregory Chow, Whitney Newey and two referees for helpful comments on earlier versions of this paper. Any remaining errors are entirely my responsibility.
0304-4076/86/$3.50 ©1986, Elsevier Science Publishers B.V. (North-Holland)

¹ The LM statistic has this limiting distribution under the regularity conditions given in GHM (1980) (contrary to their claim). The critical conditions are that the normalised score vector be asymptotically normal with zero mean vector and non-singular covariance matrix and that the true value of the nuisance parameter vector lie in the interior of its parameter space. See Tanaka (1983), Kiefer (1982) and Schmidt and Lin (1984) for examples where these conditions are not satisfied. In any event, if these conditions fail to hold, the W, LR and KT statistics will not generally have limiting distributions of the type described in GHM (1980) and in section 3 of this paper.

The purpose of this paper is to propose and examine a statistic which takes account of the one-sided nature of the alternative, but which (like the LM statistic) is relatively easy to compute in most applications. In the next section we describe the nature of the problem in more formal terms and establish some notation. In section 3 we see how the statistic is constructed as a
function of the elements of the score vector, where this vector is evaluated under the null hypothesis. In particular, the statistic depends on the signs of certain linear functions of the elements of this score vector. We also show that the statistic is asymptotically equivalent (under the null hypothesis) to the W, LR and KT statistics, and we note that any of these four statistics can be used as the basis for an easily implemented, although somewhat conservative, test procedure. In section 4 we show that stronger conditions are required for the consistency of a test based on the statistic introduced in section 3 than are required to ensure the consistency of tests based on the W, LR, KT or LM statistics. However, the conditions for the consistency of this test exactly parallel the conditions under which LM tests are consistent in problems with two-sided alternatives. Another rather interesting feature of the statistic is that its limiting distribution is the same regardless of whether or not nuisance parameters lie on the boundary of the parameter space, and we discuss this point briefly in section 5.
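As a small numerical illustration of the mixture result described in this introduction, consider the simplest case $q = 1$, for which (as noted in section 5) the W statistic has a (1/2, 1/2) mixture of a point mass at zero and a $\chi^2_1$ distribution under the null hypothesis. The sketch below is not taken from the paper: the function names are hypothetical, and it relies only on the identity $\Pr(\chi^2_1 > c) = 2\Pr(N(0,1) > c^{1/2})$. It compares the size-$s$ critical value of the mixture with the conventional $\chi^2_1$ critical value.

```python
from statistics import NormalDist


def one_sided_critical_value(s):
    """Size-s critical value c of the (1/2, 1/2) mixture of chi2_0 and chi2_1.

    Solves 0.5 * P(chi2_1 > c) = s via P(chi2_1 > c) = 2 * P(N(0,1) > sqrt(c)).
    Requires 0 < s < 1/2 because the mixture puts mass 1/2 at zero.
    """
    if not 0.0 < s < 0.5:
        raise ValueError("s must lie in (0, 1/2)")
    return NormalDist().inv_cdf(1.0 - s) ** 2


def two_sided_critical_value(s):
    """Conventional size-s critical value of the chi2_1 distribution."""
    if not 0.0 < s < 1.0:
        raise ValueError("s must lie in (0, 1)")
    return NormalDist().inv_cdf(1.0 - s / 2.0) ** 2


if __name__ == "__main__":
    for size in (0.10, 0.05, 0.01):
        print(size,
              round(one_sided_critical_value(size), 3),
              round(two_sided_critical_value(size), 3))
```

The one-sided critical value is always the smaller of the two, which is the source of the power gain from using the prior information about the sign of the parameter.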
2. Set-up

The framework we use is closely related to those of GHM (1980) and Holly (1982). The log-likelihood for a sample of size $n$ is $L_n(\alpha)$, where $\alpha$ is a parameter vector contained in $A$, a compact subset of $R^k$. We partition $\alpha$ as $\alpha' = (\theta' : \gamma')$ and denote the true value of $\alpha'$ by $\alpha_0' = (\theta_0' : \gamma_0')$. The maintained hypothesis is

$$\gamma_0 \in \operatorname{int}\Gamma, \qquad \theta_0 \in \Theta = \{\theta \in R^q : 0 \le \theta_i \le m_i < \infty,\ i = 1, \ldots, q\},$$

where $\Gamma$ is a compact subset of $R^{k-q}$ and $\theta_i$ is the $i$th element of $\theta$. The null hypothesis is $\theta_0 = 0$ and the alternative is $\theta_{0i} > 0$ for at least one of the $\theta_{0i}$. So, if $q = 1$, the problem is the classical one of testing a hypothesis defined by an equality against a one-sided alternative. Here $\gamma$ is a $(k - q) \times 1$ vector of nuisance parameters, and the interiority assumption on $\gamma_0$ is needed to ensure that the W, LR, KT and LM statistics have the appropriate limiting distributions under the null hypothesis. This requirement is somewhat more restrictive than is desirable since one will often want to allow the possibility that $\gamma_0$ lies on the boundary of $\Gamma$. For example, one may want to test the hypothesis that each of a proper subset of the coefficients in a random coefficient model has zero variance without assuming that the variances of the coefficients not under test all be positive. We return to this point in section 5.²

² Chant (1974) points out that this possibility ought to be taken into account, but he derives his results on the basis of an explicit interiority assumption on the nuisance parameters.
The results of this paper depend on the satisfaction of certain regularity conditions, and sufficient conditions are listed in the appendix. These are essentially the same as conditions imposed in the more usual maximum likelihood theory. The most important requirement is that the normalised score vector, when evaluated at the true parameter point, be asymptotically normal with zero mean vector. (This condition is A.4 of the appendix.) In the i.i.d. case, for example, assumptions such as those given in White (1982) are sufficient for the results of this paper to hold. [See also White and Domowitz (1984) for a discussion of some of the issues involved in a treatment of non-independent observations.] Here we note that the derivations of the results given below depend only on the satisfaction of regularity conditions over $A$, whereas the approach of GHM (1980) depends on the satisfaction of conditions over a set, $A^*$, of which $A$ is a proper subset [and in particular $(0 : \gamma_0')' \in \operatorname{int} A^*$]. This point is of some importance because $L_n(\alpha)$ need not be well defined for some or all $\alpha \notin A$.

We will use the following notation. Let $\hat\alpha_n' = (\hat\theta_n' : \hat\gamma_n')$ be the maximum likelihood estimator of $\alpha_0$ [so that $\hat\alpha_n$ maximises $L_n(\alpha)$ over $A$] and let $\tilde\alpha_n' = (0 : \tilde\gamma_n')$ maximise $L_n(\alpha)$ over $A$ subject to the restriction $\theta = 0$. We define
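$$d_n(\alpha) = n^{-1/2}\,\partial L_n(\alpha)/\partial\alpha, \qquad F_n(\alpha) = -n^{-1}\,\partial^2 L_n(\alpha)/\partial\alpha\,\partial\alpha',$$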
and for notational ease we let $d_n$, $\hat d_n$, $\tilde d_n$ and $F_n$, $\hat F_n$, $\tilde F_n$ denote $d_n(\alpha)$ and $F_n(\alpha)$ evaluated at $\alpha_0$, $\hat\alpha_n$, $\tilde\alpha_n$, respectively. We let $F(\alpha; \alpha_0)$ denote the almost sure limit (under $\alpha_0$) of $F_n(\alpha)$, and for convenience we set $F(\alpha_0; \alpha_0) = F$.

3. The test statistic

Under the null hypothesis the W and LR statistics have the same limiting distribution as

$$K_n = (\tilde d_n - \hat d_n)'\,\tilde F_n^{-1}\,(\tilde d_n - \hat d_n),$$

which is the Kuhn-Tucker (KT) statistic proposed by GHM (1980). One can think of $K_n$ as a modified LM statistic obtained by using the vector $\hat d_n$ as a correction applied to $\tilde d_n$. The LM statistic is $M_n = \tilde d_n'\tilde F_n^{-1}\tilde d_n$. In the present case $M_n$ has the usual limiting $\chi^2$ distribution with $q$ degrees of freedom, but unlike $K_n$, its use as a test statistic implicitly treats the alternative hypothesis as two-sided. As GHM (1980) show, the limiting distribution
of $K_n$ is a mixture of $\chi^2$ distributions, and the mixing parameters of this limiting distribution are functions of the elements of the matrix $F$. For our purposes it is convenient to consider this distribution along the following lines. There are $2^q$ subsets of $\{1, \ldots, q\}$ and we denote these by $I_j^*$ ($j = 1, \ldots, 2^q$). We associate the following cone with a given one of the $I_j^*$:

$$C_j = \{x \in R^q : x = R_j y,\ y \in R^q_{++}\}, \quad\text{where}\quad R_j = \bar W[\operatorname{diag}(e_j)] - [I_q - \operatorname{diag}(e_j)],$$

$\bar W = [(I_q : 0)W(I_q : 0)']^{-1}$, $W = F^{-1}$, and $e_j$ is a $q \times 1$ vector with its $i$th element equal to one if $i \in I_j^*$ and equal to zero if $i \notin I_j^*$; $I_q$ is the $q \times q$ identity matrix. Therefore, for a given set $I_j^*$, the cone $C_j$ is obtained by first determining the matrix $R_j$ and then finding those points, $x$, in $R^q$ which satisfy $R_j^{-1}x \gg 0$.³ The $2^q$ cones $C_j$ partition $R^q$.⁴ Now let $W^{\theta\gamma} = (I_q : 0)W(0 : I_{k-q})'$ [a $q \times (k - q)$ matrix] and define the $q \times k$ matrix

$$Z = (I_q : \bar W W^{\theta\gamma}),$$

and associate with $I_j^*$ the symmetric $k \times k$ matrix

$$B_j = \begin{pmatrix} \bar W - \big(([I_q - \operatorname{diag}(e_j)] : 0)\,W\,([I_q - \operatorname{diag}(e_j)] : 0)'\big)^{+} & 0 \\ 0 & 0 \end{pmatrix},$$

where $(\cdot)^+$ denotes the Moore-Penrose inverse. A proof of the following result is given in the appendix.

Proposition 1. Under the null hypothesis ($\theta_0 = 0$), if $Zd_n \in C_j$ for all $n$, then $K_n - d_n'WB_jWd_n \xrightarrow{p} 0$.

³ Throughout we use the following conventions for vector inequalities. If $x_i$ and $y_i$ are the $i$th elements of two vectors, $x$ and $y$, of the same dimension, then $x \gg y$ means $x_i > y_i$ for all $i$, $x > y$ means $x_i \ge y_i$ for all $i$ and with the inequality strict for at least one of the $i$, and $x \ge y$ means $x_i \ge y_i$ for all $i$.

⁴ ... $x < 0$ and $\operatorname{sign}[\det(R_j)] = \operatorname{sign}[(-1)^{q-p_j}]$ where $p_j$ is the number of elements in the set $I_j^*$, and we have made use of a result due to Samelson, Thrall and Wesler (1958).
This says that, conditional on $d_n$ satisfying $Zd_n \in C_j$ for a given cone, $C_j$, and all $n$, $K_n$ is asymptotically equivalent to a quadratic form in $d_n$, where the matrix of this form depends on the cone, $C_j$, in question. This result is essentially the same as one obtained by GHM (1980) in the course of their derivation of the limiting distribution of the W, LR and KT statistics. The form of the cones $C_j$ can be motivated by the fact that, with arbitrarily high probability for all sufficiently large $n$, $Zd_n \in C_j$ if and only if $n^{1/2}[I_q - \operatorname{diag}(e_j)]\hat\theta_n = 0$ and $n^{1/2}[\operatorname{diag}(e_j)]\hat\theta_n \gg 0$. It is easy to verify that the unconditional limiting distribution of $d_n'WB_jWd_n$ is central $\chi^2$ with degrees of freedom equal to the rank of $B_j$, and that the rank of $B_j$ is equal to the number of elements in the set $I_j^*$ (i.e., the number of elements of $e_j$ which are equal to one). That the limiting distribution of $K_n$ is a mixture of $q + 1$ $\chi^2$ distributions follows from the asymptotic independence of the events $d_n'WB_jWd_n \le c$ and $Zd_n \in C_j$, and this in turn is a consequence of the asymptotic normality of $d_n$. The distributions comprising the mixture are a '$\chi^2$' distribution with zero degrees of freedom which assigns unit mass to the origin and $\chi^2$ distributions with $1, \ldots, q$ degrees of freedom, and the mixing parameters associated with each of these distributions depend, as we see below, on the quantities $\lim_{n\to\infty}\Pr(Zd_n \in C_j)$, for each $j = 1, \ldots, 2^q$.

For our purposes Proposition 1 has an important implication. We define a random variable in the following way: set

$$Q_n = d_n'WB_jWd_n \quad\text{if}\quad Zd_n \in C_j,$$

so that $Q_n$ is set equal to the quadratic form in $d_n$ which we have associated with the cone in which $Zd_n$ lies.⁵ $Q_n$ is a function of unobservables, but it is possible to construct a test statistic by replacing the quantities in $Q_n$ with suitable estimates. Let $\tilde W_n = \tilde F_n^{-1}$ and define $\tilde{\bar W}_n$, $\tilde W_n^{\theta\gamma}$, $\tilde Z_n$ and $\tilde R_{jn}$, $\tilde B_{jn}$ as matrices constructed from the elements of $\tilde W_n$ in the same way as $\bar W$, $W^{\theta\gamma}$, $Z$, and the $R_j$, $B_j$ are constructed from the elements of $W$.⁶ Similarly, we define the cones $\tilde C_{jn} = \{x \in R^q : x = \tilde R_{jn}y,\ y \in R^q_{++}\}$. Whenever $\tilde W_n$ is symmetric positive definite these cones partition $R^q$. We now define $S_n$ analogously to $Q_n$ as

$$S_n = \tilde d_n'\tilde W_n\tilde B_{jn}\tilde W_n\tilde d_n \quad\text{if}\quad \tilde Z_n\tilde d_n \in \tilde C_{jn}.$$

$S_n$ is the statistic we are primarily concerned with in this paper, and we prove the following result in the appendix.

⁵ Here we ignore the possibility that $Zd_n$ lies on a boundary of one of the $C_j$, since such events occur with probability approaching zero as $n \to \infty$.

⁶ Note that condition A.3 of the appendix ensures that, under the null hypothesis, $\tilde W_n (= \tilde F_n^{-1})$ exists with arbitrarily high probability for all $n$ sufficiently large.
Proposition 2. Under the null hypothesis ($\theta_0 = 0$), $S_n - Q_n \xrightarrow{p} 0$.
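The construction of $S_n$ can be sketched numerically as follows. This is an illustrative sketch rather than the paper's own algorithm, and it rests on two stated assumptions. First, the inputs are taken to be the $q$-vector `z` (standing for the constrained-estimate value of $Zd_n$) and the $q \times q$ positive definite matrix `v` (standing for the corresponding estimate of $ZFZ'$, which equals the inverse of the $\theta$ block of $F^{-1}$). Second, the quadratic form is evaluated through the reduced expression $z_I'V_{II}^{-1}z_I$, where $I$ is the index set of the cone containing `z`; this follows from the definition of $B_j$ by a partitioned-inverse argument and is our restatement, not a formula given in the text. The names `solve` and `modified_lm` are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def modified_lm(z, v):
    """Return (S, index_set) for a score vector z with covariance matrix v.

    z lies in the cone of index set I when z = R y with y >> 0, where the
    columns of R are the columns of v for i in I and minus unit vectors
    for i not in I.
    """
    q = len(z)
    for p in range(q + 1):
        for act in combinations(range(q), p):
            rest = [i for i in range(q) if i not in act]
            # y_I = V_II^{-1} z_I (empty when the index set is empty)
            y = solve([[v[i][j] for j in act] for i in act],
                      [z[i] for i in act]) if act else []
            # remaining elements of y: V_cI y_I - z_c
            lam = [sum(v[i][j] * yj for j, yj in zip(act, y)) - z[i]
                   for i in rest]
            if all(t > 0 for t in y) and all(t > 0 for t in lam):
                return sum(z[i] * yi for i, yi in zip(act, y)), act
    raise ValueError("z lies on the boundary of a cone")


if __name__ == "__main__":
    v = [[1.0, 0.5], [0.5, 1.0]]
    for z in ([1.0, 0.2], [1.0, 1.0], [-1.0, -2.0]):
        print(z, modified_lm(z, v))
```

The search over all $2^q$ index sets mirrors the description in the text; it is only practical for small $q$.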
We therefore have a statistic which is asymptotically equivalent to $K_n$, and not surprisingly, this equivalence can also be shown to hold under a sequence of local alternatives of the form $\theta_{0n} = n^{-1/2}\phi$, for any $\phi > 0$. The computation of $S_n$ is straightforward once $\tilde\alpha_n$ has been obtained. As is the case for the LM statistic, all that is needed is the elements of the matrix $\tilde F_n$ and the score vector $\tilde d_n$: one uses the elements of $\tilde F_n$ to calculate the parameters of the $2^q$ cones $\tilde C_{jn}$, then one calculates $\tilde Z_n\tilde d_n$, determines in which of the $\tilde C_{jn}$ it lies and computes the associated matrix $\tilde B_{jn}$ and the quadratic form $\tilde d_n'\tilde W_n\tilde B_{jn}\tilde W_n\tilde d_n$.

To illustrate a little of what is involved here we briefly consider two examples. The first example is similar to the problem examined by GHM (1982). Suppose that $y_n = X_n\beta + u_n$, where $y_n$ is an $n \times 1$ response vector, $X_n$ an $n \times (k - 1)$ matrix of explanatory variables which are distributed independently of the elements of $u_n$, an $n \times 1$ disturbance vector which is distributed as $N(0, \sigma^2 I_n)$. The vector $\beta$ is subject to $q$ inequality constraints of the form $(I_q : 0)\beta \ge 0$, and the null hypothesis is $(I_q : 0)\beta = 0$. Here we have, as usual,

$$\tilde d_n = \begin{pmatrix} X_n'(y_n - X_n\tilde\beta_n)/n^{1/2}\tilde\sigma_n^2 \\ 0 \end{pmatrix}, \qquad \tilde F_n = \begin{pmatrix} X_n'X_n/n\tilde\sigma_n^2 & 0 \\ 0 & 1/2\tilde\sigma_n^4 \end{pmatrix},$$

where $\tilde\beta_n$ minimises $(y_n - X_n\beta)'(y_n - X_n\beta)$ subject to $(I_q : 0)\beta = 0$ and $\tilde\sigma_n^2 = (y_n - X_n\tilde\beta_n)'(y_n - X_n\tilde\beta_n)/n$. In addition, the block of $\tilde W_n$ corresponding to $\beta$ is $((X_n'X_n)/n\tilde\sigma_n^2)^{-1}$, $\tilde W_n^{\theta\gamma} = 0$, and so $\tilde Z_n$ is equal to the $q \times k$ matrix $(I_q : 0)$, for all $n$.

The second example is a simple random coefficient model. Suppose that
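$$y_t = x_t'\beta_t + u_t, \qquad \beta_t = \beta + \varepsilon_t, \qquad t = 1, \ldots, n,$$

where $\varepsilon_t$, $x_t$ are $q \times 1$, the vectors $(x_t' : \varepsilon_t' : u_t)$ are i.i.d., and $u_t$ is independent of the elements of $(x_t' : \varepsilon_t')$. We assume that $\varepsilon_t \sim N(0, \Omega)$, $u_t \sim N(0, \sigma^2)$, so that, conditional on $x_t$, $y_t \sim N(x_t'\beta, v_t)$ with $v_t = \sigma^2 + x_t'\Omega x_t$. Now suppose that the hypothesis of interest is $\varepsilon_t = 0$ with probability one. Probably the most natural and convenient way to proceed is to let $\omega$ be the ($q \times 1$) vector consisting of the diagonal elements of $\Omega$ and to formulate the null hypothesis as $\omega_0 = 0$, with the alternative being $\omega_0 > 0$. (Note that here $k = 2q + 1$.) If we now define

$$\tilde s_t = \begin{pmatrix} \tilde s_{\omega t} \\ \tilde s_{\beta t} \\ \tilde s_{\sigma t} \end{pmatrix} = \begin{pmatrix} [(y_t - x_t'\tilde\beta_n)^2 - \tilde\sigma_n^2]\,b_t/2\tilde\sigma_n^4 \\ (y_t - x_t'\tilde\beta_n)\,x_t/\tilde\sigma_n^2 \\ (y_t - x_t'\tilde\beta_n)^2/2\tilde\sigma_n^4 - 1/2\tilde\sigma_n^2 \end{pmatrix},$$

where $b_t$ is the $q \times 1$ vector consisting of the diagonal elements of $x_tx_t'$, and

$$\tilde\beta_n = \Big(\sum_t x_tx_t'\Big)^{-1}\Big(\sum_t x_ty_t\Big), \qquad \tilde\sigma_n^2 = n^{-1}\sum_t (y_t - x_t'\tilde\beta_n)^2,$$

then

$$\tilde d_n = n^{-1/2}\sum_t (\tilde s_{\omega t}' : \tilde s_{\beta t}' : \tilde s_{\sigma t})' = n^{-1/2}\Big(\sum_t \tilde s_{\omega t}' : 0 : 0\Big)',$$

and $F$ may be conveniently estimated by

$$\tilde F_n = n^{-1}\sum_t \tilde s_t\tilde s_t'.$$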
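For the scalar case $q = 1$ of this random coefficient example, the quantities entering the statistic can be computed as in the following sketch. It is a hedged illustration, not the paper's code: the function name `random_coefficient_lm` and the simulated data are hypothetical, the per-observation scores are those of the normal log-likelihood with variance $\sigma^2 + \omega x_t^2$ evaluated at $\omega = 0$, and the information matrix is estimated by the outer product of those scores, which is assumed here to be the estimator intended in the text. With $q = 1$ the statistic reduces to $z^2/V$ when the efficient score $z$ is positive and to zero otherwise.

```python
import math
import random


def random_coefficient_lm(y, x):
    """One-sided LM quantities for y_t = x_t * beta_t + u_t with scalar x_t."""
    n = len(y)
    # Constrained (null) estimates: OLS slope and residual variance.
    beta = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    e = [b - a * beta for a, b in zip(x, y)]
    sig2 = sum(r * r for r in e) / n
    # Per-observation scores for (omega, beta, sigma^2) under the null.
    s = [((r * r - sig2) * a * a / (2 * sig2 ** 2),
          r * a / sig2,
          (r * r - sig2) / (2 * sig2 ** 2)) for a, r in zip(x, e)]
    d = [sum(row[i] for row in s) / math.sqrt(n) for i in range(3)]
    f = [[sum(row[i] * row[j] for row in s) / n for j in range(3)]
         for i in range(3)]
    # g = F_gg^{-1} F_gw, where g indexes the nuisance parameters (beta, sigma^2).
    det = f[1][1] * f[2][2] - f[1][2] * f[2][1]
    g1 = (f[2][2] * f[0][1] - f[1][2] * f[0][2]) / det
    g2 = (f[1][1] * f[0][2] - f[2][1] * f[0][1]) / det
    v = f[0][0] - f[0][1] * g1 - f[0][2] * g2   # variance of the efficient score
    z = d[0] - g1 * d[1] - g2 * d[2]            # efficient score for omega
    stat = z * z / v if z > 0 else 0.0
    return d, z, v, stat


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(0.5, 2.0) for _ in range(200)]
    ys = [a * 1.0 + rng.gauss(0.0, 1.0) for a in xs]   # data generated under the null
    print(random_coefficient_lm(ys, xs))
```

Because the score elements for $\beta$ and $\sigma^2$ vanish at the constrained estimates, `z` coincides with the first element of `d` up to rounding error.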
$\tilde Z_n$ and the $\tilde B_{jn}$, $\tilde C_{jn}$ may then be obtained in the way indicated above. Note that in this case $W^{\theta\gamma} \ne 0$. In both of the examples just outlined the calculation of $S_n$ is rather more straightforward than that of $K_n$, since the computation of $\hat\alpha_n$ is not required. The ideas here are similar to those which arise in respect of the Lagrange multiplier statistic in other contexts [see, e.g., Breusch and Pagan (1980)].

We now turn to the question of how $S_n$ may be used as the basis for a test procedure. Let $C_p$ be the union of the $q!/[(q - p)!\,p!]$ cones, $C_j$, which are such that their associated sets $I_j^*$ have exactly $p$ elements, and let $\tilde C_{pn}$ be the union of the corresponding $\tilde C_{jn}$. Now, under the null hypothesis,

$$\Pr(S_n \le c) \to \sum_{p=0}^{q} a_pF_p(c),$$

where $F_p(\cdot)$ is the distribution function of the $\chi^2_p$ distribution and

$$a_p = \lim_{n\to\infty}\Pr(Zd_n \in C_p) = \Pr(x \in C_p),$$

and $x$ is distributed as $N(0, ZFZ')$. This is so because

$$\Pr(S_n \le c) = \sum_{p=0}^{q}\Pr(S_n \le c,\ \tilde Z_n\tilde d_n \in \tilde C_{pn}) \to \sum_{p=0}^{q}F_p(c)\Pr(x \in C_p).$$
The mixing parameters, $a_p$, may be estimated by integrating the $N(0, \tilde Z_n\tilde F_n\tilde Z_n')$ density over each of the sets $\tilde C_{pn}$. We denote the estimates so obtained by $\tilde a_{pn}$. If a $\tilde c_n$ is found which satisfies $\sum_{p=0}^{q}\tilde a_{pn}F_p(\tilde c_n) = 1 - s$, then a test procedure which accepts the null hypothesis if $S_n \le \tilde c_n$ and rejects otherwise will have asymptotic size equal to $s$.⁷ There is no difference between the principles involved here and those involved in the determination of the appropriate critical regions for test statistics used in similar one-sided contexts, since the issue there is also one of determining the parameters of mixtures of $\chi^2$ distributions. [See, e.g., GHM (1980, 1982), Farebrother (1984), Wolak (1985) and their references. Also, Moran (1984) discusses methods for approximating integrals of the sort which define the $a_p$.] The difficulty with the test procedure just outlined lies in the estimation of the weights, $a_p$, since if $q$ is at all large (greater than two) this is likely to be quite burdensome.

A partial solution to this computational difficulty is available in the form of a test procedure which relies on the limiting conditional distributions of $S_n$, rather than on its limiting unconditional distribution.⁸ Let $c_{p,s^*}$ denote the $s^*$-level critical value of the $\chi^2_p$ distribution for $p = 1, \ldots, q$ [so that $F_1(c_{1,s^*}) = \cdots = F_q(c_{q,s^*}) = 1 - s^*$]. Suppose that the null hypothesis is accepted if $\tilde Z_n\tilde d_n \in \tilde C_{pn}$ for one of $p = 1, \ldots, q$ and $S_n \le c_{p,s^*}$ for the corresponding $p$, is rejected if $S_n > c_{p,s^*}$ for this $p$, and is accepted whenever $\tilde Z_n\tilde d_n \in \tilde C_{0n}$ (i.e., whenever $\tilde Z_n\tilde d_n \ll 0$). Then, from the independence result which ensures that $S_n$ is asymptotically distributed as a mixture of $\chi^2$ it follows that the asymptotic size of the test procedure just described is $\sum_{p=1}^{q}s^*a_p = s^*(1 - a_0)$. Therefore, once $\tilde a_{0n}$ is computed, a test of asymptotic size equal to $s$ is obtained by setting $s^*$ equal to $s/(1 - \tilde a_{0n})$.⁹ The advantage of this test over the one described above is due to the fact that it will be easier to compute an estimate of $a_0$ than estimates of each of the $a_p$, $p = 0, \ldots, q$. [In fact, the idea of Moran (1984) is directly applicable here.]

Nevertheless, the computational effort involved may still be considerable, and accordingly we now consider three alternative, easily implementable test procedures.¹⁰ The first of these is the same as the second procedure outlined above except that $s^*$ is set equal to $s$ (the desired asymptotic size) and whenever $\tilde Z_n\tilde d_n \in \tilde C_{0n}$ a randomizing rule is adopted such that the null hypothesis is accepted with probability $1 - s^*$ and is rejected with probability $s^*$. This test clearly has the correct asymptotic size, but has little intuitive appeal. Next, one can proceed as just outlined but drop the randomizing feature, and accept the null hypothesis whenever $\tilde Z_n\tilde d_n \in \tilde C_{0n}$. The asymptotic size of this test is $s^*(1 - a_0)$, and since $0 < a_0 < \tfrac12$ for $q \ge 2$, we know that this asymptotic size lies between $s^*/2$ and $s$. Such a procedure poses no computational problems and may be a good compromise in practice. Finally, the following finite induced test is available. We let $(\tilde Z_n\tilde d_n)_i$ denote the $i$th element of $\tilde Z_n\tilde d_n$ and $(\tilde Z_n\tilde F_n\tilde Z_n')_{ii}$ the $i$th diagonal element of $\tilde Z_n\tilde F_n\tilde Z_n'$, and consider accepting the null hypothesis if $(\tilde Z_n\tilde d_n)_i/[(\tilde Z_n\tilde F_n\tilde Z_n')_{ii}]^{1/2}$ does not exceed the upper $s/q$ level critical value of the standard normal distribution for each $i = 1, \ldots, q$, and rejecting it otherwise. It follows from the Bonferroni inequality that this test has asymptotic size no greater than $s$, but, as is usually the case with this type of test, the difficulty is that its asymptotic size can be considerably less than $s$.¹¹

We conclude this section by noting that tests based on the W, LR and KT statistics are available as counterparts of each of the five tests described above. For example, the W statistic for the present problem is $n\hat\theta_n'\hat{\bar W}_n\hat\theta_n$, and a procedure which accepts the null hypothesis if this statistic does not exceed $c_{p,s}$ and $n^{1/2}\hat\theta_n$ has exactly $p$ positive elements, and accepts whenever $n^{1/2}\hat\theta_n = 0$, is asymptotically equivalent (under the null hypothesis) to the fourth procedure based on $S_n$ given above. This result follows from the asymptotic equivalence of $\tilde Z_n\tilde d_n \in \tilde C_{jn}$ on the one hand and $n^{1/2}[\operatorname{diag}(e_j)]\hat\theta_n \gg 0$, $n^{1/2}[I_q - \operatorname{diag}(e_j)]\hat\theta_n = 0$ on the other.

⁷ Note that this test cannot have the correct asymptotic size if $s$ is large because the asymptotic probability of a type I error cannot exceed $1 - a_0$. Since $a_0 \le \tfrac12$, this feature raises no difficulty for conventional choices of $s$. (This is likely to be especially so if $q$ is large, since, for example, $a_0$ is close to $1/2^q$ if $F$ is close to diagonal.) In the case where $q = 1$ and $s < \tfrac12$ the test is the same as that which rejects the null if $\tilde Z_n\tilde d_n/(\tilde Z_n\tilde F_n\tilde Z_n')^{1/2}$ exceeds the $s$-level critical value of the standard normal distribution. But when $s > \tfrac12$ (and $q = 1$) the latter test can reject the null even if $\tilde Z_n\tilde d_n < 0$, whereas the test based on $S_n$ accepts in this case. This is so because $S_n$ is asymptotically equivalent to the W (or LR) statistic: the W statistic uses only information contained in $n^{1/2}\hat\theta_n$, and when all that is known is $n^{1/2}\hat\theta_n = 0$ there is little alternative to accepting the null hypothesis. That is, tests based on $S_n$ (or on the W, LR or KT statistics) make no use of the information contained in $\tilde Z_n\tilde d_n$ when $\tilde Z_n\tilde d_n < 0$, and this information is useful if $s > 1 - a_0$. When $q \ge 2$ and $s > 1 - a_0$ the only way to use this information sensibly seems to be to use a finite induced test of the sort discussed at the end of this section.

⁸ This approach was suggested to me by Whitney Newey.

⁹ Such an $s^*$ exists, of course, only if $s < 1 - \tilde a_{0n}$.

¹⁰ Of the two test procedures outlined above, the first might be expected to be more powerful than the second. Some insight into this can be gained by considering the acceptance regions of the tests in simple cases, such as that for which $q = 2$.

¹¹ Alternatively, one could use this finite induced test statistic along with a critical value, $\delta_n$, chosen to satisfy $\Pr(x \le \delta_n\iota) = 1 - s$, where $x$ is distributed as $N(0, \tilde Z_n\tilde F_n\tilde Z_n')$ and $\iota$ is a $q \times 1$ vector of ones. This test always has the correct asymptotic size, but $\delta_n$ will usually be difficult to compute. Note that $\delta_n < 0$ whenever $s > 1 - \tilde a_{0n}$.
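For $q = 2$ the mixing weights are available in closed form, which makes the first two procedures of this section easy to sketch. The code below is an illustration under stated assumptions, not the paper's method: it assumes $q = 2$, takes `rho` to be the correlation implied by the $2 \times 2$ matrix $ZFZ'$, and uses the standard bivariate normal orthant probability $1/4 + \arcsin(\rho)/2\pi$ to obtain $a_0$ and $a_2$ (the correlation of $(ZFZ')^{-1}x$ is $-\rho$, which gives $a_2$). All function names are hypothetical.

```python
import math


def chi2_sf(x, p):
    """Upper tail P(chi2_p > x) for integer p >= 1 and x >= 0."""
    half = x / 2.0
    tail = math.erfc(math.sqrt(half)) if p % 2 else math.exp(-half)
    k = 1 if p % 2 else 2
    while k < p:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) exp(-x/2) / Gamma(k/2 + 1)
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail


def weights_q2(rho):
    """Mixing weights (a0, a1, a2) for q = 2 when Z F Z' has correlation rho."""
    a0 = 0.25 + math.asin(rho) / (2.0 * math.pi)   # mass of the negative orthant
    a2 = 0.25 - math.asin(rho) / (2.0 * math.pi)
    return a0, 0.5, a2


def mixture_tail(c, weights):
    """P(S > c) under the mixture for c > 0 (the chi2_0 part contributes zero)."""
    return sum(a * chi2_sf(c, p) for p, a in enumerate(weights) if p > 0)


def mixture_critical_value(s, weights, upper=100.0):
    """Bisection for c solving mixture_tail(c) = s; requires s < 1 - a0."""
    if s >= 1.0 - weights[0]:
        raise ValueError("no positive critical value when s >= 1 - a0")
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mixture_tail(mid, weights) > s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def conditional_level(s, a0):
    """Level s* = s / (1 - a0) used with the chi2_p critical values."""
    if s >= 1.0 - a0:
        raise ValueError("s* exists only if s < 1 - a0")
    return s / (1.0 - a0)


if __name__ == "__main__":
    w = weights_q2(0.3)
    print(w, mixture_critical_value(0.05, w), conditional_level(0.05, w[0]))
```

For larger $q$ no comparably simple expressions are available, which is the computational difficulty that motivates the simpler procedures described in the text.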
4. Consistency

It is not difficult to show that the W and LR statistics provide bases for consistent tests of the hypothesis examined above when the conditions given in the appendix are satisfied, but this is not so of tests based on the KT or LM statistics, or of tests based on $S_n$. To avoid unenlightening complications we assume in the following that $\tilde F_n$ converges almost surely to a symmetric positive definite matrix under any alternative, $\alpha_0' = (\theta_0' : \gamma_0')$, $\theta_0 > 0$, and we denote these limiting matrices by $F(\alpha_0)$. Then, we have the familiar condition that the LM test is consistent (in the sense that $\Pr(M_n \le c) \to 0$ under any alternative for any fixed $c$) if and only if the almost sure limit of $n^{-1/2}\tilde d_n$ under any alternative is not equal to the null vector. This condition is also necessary and sufficient for the consistency of a test based on $K_n$, since $n^{-1/2}\hat d_n \to 0$ a.s., whence $n^{-1}K_n$ and $n^{-1}M_n$ have the same almost sure limit. We will denote the almost sure limit of the vector $n^{-1/2}\tilde d_n$ under $\alpha_0$ by $\mu(\alpha_0)$.

However, the condition $\mu(\alpha_0) \ne 0$ under all alternatives is not sufficient to ensure the consistency of a test based on $S_n$. To see this we recall that $S_n = 0$ if $\tilde Z_n\tilde d_n \in \tilde C_{0n}$, and $\tilde C_{0n}$ is just the negative orthant of $R^q$. Now, $n^{-1/2}\tilde Z_n\tilde d_n$ converges almost surely to $\mu_\theta(\alpha_0) \equiv (I_q : 0)\mu(\alpha_0)$, and it follows that $\Pr(\tilde Z_n\tilde d_n \in \tilde C_{0n}) \to 1$ under any alternative, $\alpha_0$, which is such that $\mu_\theta(\alpha_0) \in C_0$. Therefore, each of the test procedures based on $S_n$ which we discussed in the last section (with the exception of the one using a randomizing rule) will actually have zero power asymptotically against such alternatives. (This is also true of the finite induced test based on the elements of $\tilde Z_n\tilde d_n$.) In addition, these procedures will have less than unit asymptotic power against any alternative, $\alpha_0$, which is such that $\mu_\theta(\alpha_0)$ lies on the boundary of $C_0$, since then $\Pr(\tilde Z_n\tilde d_n \in \tilde C_{0n})$ has a non-zero limit, in general. The next result gives conditions under which $S_n$ provides the basis for a consistent test.

Proposition 3. Under the alternative $\alpha_0' = (\theta_0' : \gamma_0')$, $\theta_0 > 0$, $\Pr(S_n > c) \to 1$ (for any constant, $c$) if and only if $\mu_\theta(\alpha_0) \notin \bar C_0$, where $\bar C_0$ is the closure of $C_0$.

We have just seen that the necessity part is true, and a proof of sufficiency is given in the appendix. We can expect that the condition $\mu_\theta(\alpha_0) \notin \bar C_0$ will be satisfied in well-behaved cases, and this is certainly true of the two examples examined in the last section. The following result is proved in the appendix.
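Proposition 4. If $B_\epsilon$ is a ball in $R^q$ of radius $\epsilon$ centered on $\theta = 0$, then there exists an $\epsilon > 0$ such that $\mu_\theta(\alpha_0) \notin \bar C_0$ for all $\alpha_0' = (\theta_0' : \gamma_0')$ which are such that $\theta_0 \in B_\epsilon \cap \Theta$ and $\theta_0 > 0$.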
Together with Proposition 3 this implies that a test based on $S_n$ will have unit power asymptotically against all alternatives which are sufficiently close to the null. It is instructive to compare the problem under study with one for which the alternative is two-sided, say a test of $\theta_0 = 0$ against the alternative $\theta_0 \ne 0$. Then, in the notation employed above, the condition $\mu_\theta(\alpha_0) \ne 0$ (under all alternatives) is necessary and sufficient for the consistency of the Lagrange multiplier test based on $M_n$, and this condition is satisfied for all alternatives sufficiently close to the null. Moreover, $\mu_\theta(\alpha_0) \ne 0$ is the only implication for $\mu_\theta(\alpha_0)$ that can be obtained by considering all alternatives close to the null. In the case of a one-sided alternative the condition $\mu_\theta(\alpha_0) \notin \bar C_0$ (under all alternatives) is necessary and sufficient for the consistency of tests based on $S_n$, it is satisfied for all alternatives sufficiently close to the null, and it is the strongest implication that a consideration of such alternatives yields. This idea, along with the fact that $S_n$ (like $M_n$) is computed using only the constrained maximum likelihood estimator, $\tilde\alpha_n$, makes it natural to think of $S_n$ as the counterpart, for one-sided testing problems, of the LM statistic in two-sided contexts.

Finally, we note that the ideas underlying Proposition 4 provide a useful motivation for the form of the cones $C_j$ (and hence for the sets $C_p$). If we consider only alternatives close to the null, then, as we have just seen, $\bar C_0$ is the subset of $R^q$ in which $\mu_\theta(\alpha_0)$ cannot lie, and it is also readily verified that the set $C_q$ (i.e., $C_p$ with $p = q$) contains $\mu_\theta(\alpha_0)$ only if $\theta_0 \in \operatorname{int}\Theta$. More generally, the cone $C_j$ contains $\mu_\theta(\alpha_0)$ only if $e_{ji} = 1$ when $\theta_{0i} > 0$ and $e_{ji} = 0$ when $\theta_{0i} = 0$, for $i = 1, \ldots, q$, where $e_{ji}$ and $\theta_{0i}$ are the $i$th elements of $e_j$ and $\theta_0$, respectively.
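The consistency condition of Proposition 3 can be illustrated with a deliberately simplified sketch. The assumptions are ours and are not part of the paper: $ZFZ'$ is taken to be the identity matrix, so that the cones are the orthants of $R^q$ and the statistic reduces to the sum of squared positive elements of the score vector, and sampling noise is ignored by replacing the score vector with $n^{1/2}\mu_\theta$. The function names are hypothetical.

```python
def s_stat_identity(z):
    """S when Z F Z' is the identity: sum of squared positive elements of z."""
    return sum(v * v for v in z if v > 0)


def lm_stat_identity(z):
    """Conventional LM statistic in the same setting: sum of all squares."""
    return sum(v * v for v in z)


def in_closed_negative_orthant(mu):
    """True when mu lies in the closure of C_0 (no element is positive)."""
    return all(m <= 0 for m in mu)


if __name__ == "__main__":
    for mu in [(-1.0, -0.5), (-1.0, 0.5)]:
        for n in (10, 1000):
            z = [n ** 0.5 * m for m in mu]   # noise-free stand-in for the score
            print(mu, n, in_closed_negative_orthant(mu),
                  s_stat_identity(z), lm_stat_identity(z))
```

In the first case the limit vector lies in the negative orthant, so the modified statistic stays at zero as $n$ grows while the conventional LM statistic diverges; in the second case both diverge.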
5. Nuisance parameters

Here we consider relaxing the assumption made in section 2 that $\gamma_0$ lies in the interior of $\Gamma$. Once we allow the possibility that $\gamma_0$ lies on the boundary of $\Gamma$, Proposition 1 no longer holds in general, but the limiting behavior of the statistic $S_n$ is unaffected because nothing in the proof of Proposition 2 depends on this interiority assumption on $\gamma_0$. That the limiting distributions of the KT and LM statistics will be so affected can be seen by constructing simple examples. However, it is clear from the proof of Proposition 2 that so far as the LM statistic is concerned one can overcome this problem by using the statistic

$$M_n^* = \tilde d_n'\tilde W_n\begin{pmatrix}\tilde{\bar W}_n & 0 \\ 0 & 0\end{pmatrix}\tilde W_n\tilde d_n$$

rather than $M_n$ (as defined in section 3): the null limiting distribution of $M_n^*$ is $\chi^2$ with $q$ degrees of freedom regardless of the position of $\gamma_0$ in $\Gamma$, whereas $M_n$ has the same limiting distribution as this only if $\gamma_0 \in \operatorname{int}\Gamma$. Similar sorts of adjustments which make the limiting distributions of the W, LR or KT statistics invariant with respect to the position of $\gamma_0$ in $\Gamma$ do not seem to be available. For example, suppose that $\tilde F_n^{-1}$ in the formula for $K_n$ is replaced by

$$\tilde W_n\begin{pmatrix}\tilde{\bar W}_n & 0 \\ 0 & 0\end{pmatrix}\tilde W_n.$$

Then the arguments used in the proof of Proposition 2 can be applied to show that the statistic so obtained is asymptotically equivalent to the W statistic, $n\hat\theta_n'\hat{\bar W}_n\hat\theta_n$. And it is a simple exercise to show that in the case where $k = 2$, $q = 1$, with $0 \le \gamma \le \bar\gamma$ (say), if $\gamma_0 = 0$, then the W statistic will not generally have the $(\tfrac12, \tfrac12)$ mixture of $\chi^2$ distributions that it has if $0 < \gamma_0 < \bar\gamma$. This is so because, if $\gamma_0 = 0$, $\Pr(n^{1/2}\hat\theta_n > 0)$ has a limit which is greater than $\tfrac12$ if the off-diagonal elements of $F$ are negative.

In the present context it is perhaps worth emphasizing that (as usual in much of asymptotic theory) regularity conditions must be satisfied on the boundary of $\Gamma$, as well as on the boundary of $\Theta$. If, for instance, in the random coefficient example outlined in section 3, the null hypothesis were to specify that only some of the coefficients are non-random, with the elements of $\omega$ corresponding to the remaining coefficients constrained to be non-negative, the log-likelihood is well behaved if some or all of these elements are equal to zero, provided that $\sigma^2$ is positive. (In fact, $\sigma^2$ must be bounded away from zero in this case, and in the original example of section 3, if the parameter space, $A$, is to be compact and the log-likelihood well behaved over it.)

6. Conclusion

We have considered a modified LM statistic which depends on the signs as well as the magnitudes of the elements of the score vector evaluated under the null hypothesis. Its intuitive justification rests on the idea that the null hypothesis should be rejected only if the elements of this score vector indicate that the likelihood can be significantly increased by moving from the null in the direction of allowable alternatives. From a practical viewpoint the choice of test procedure will depend to a large extent on considerations of computational ease.
It seems likely that either Sn or the Wald statistic will be chosen in applications where it is desired to take the one-sided nature of the alternative into account, since each of these statistics requires the computation of just one set of parameter estimates. A test based on the Wald statistic is not subject to the sort of inconsistency possibility that we discussed in section 4, but its limiting distribution is not invariant with respect to the position of the true nuisance parameter vector,
and no simple remedy seems available for this last feature (short of ignoring, where possible, any restrictions on this vector in estimation).
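As an illustration of the $(\tfrac12, \tfrac12)$ mixture of $\chi^2$ distributions referred to in section 5 for the case $q = 1$, a p-value under that mixture might be computed along the following lines. This is only a sketch: the function names are hypothetical, and it assumes that under the null the statistic is exactly zero with probability one half and $\chi^2$ with one degree of freedom otherwise.

```python
import math


def chi2_1_sf(x):
    """Upper tail P(X > x) for X ~ chi-square with 1 degree of freedom.

    Uses the identity P(Z^2 > x) = erfc(sqrt(x / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(x / 2.0))


def mixture_pvalue(stat):
    """P-value under the assumed (1/2, 1/2) mixture of chi2(0) and chi2(1).

    chi2(0) is a point mass at zero, so it contributes nothing to the
    upper tail for any stat > 0.
    """
    if stat <= 0.0:
        return 1.0
    return 0.5 * chi2_1_sf(stat)


if __name__ == "__main__":
    # 2.705543 is the upper 10% point of chi2(1); under the mixture it
    # serves as the 5% critical value.
    print(round(mixture_pvalue(2.705543), 4))
```

Under this mixture the 5% critical value is the usual 10% point of $\chi^2(1)$, which is the familiar gain from taking the one-sided alternative into account.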
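Appendix

A.1. Regularity conditions

The following conditions are assumed to hold regardless of the position of $\alpha_0$ in $A$.

A.1. For all $n$, $L_n(\alpha)$ is continuous in $\alpha$, $n^{-1}L_n(\alpha) \to L(\alpha; \alpha_0)$ a.s. uniformly on $A$, $L(\alpha_0; \alpha_0) > L(\alpha; \alpha_0)$, all $\alpha \ne \alpha_0$, and there exists an $\bar\alpha' = (0 : \bar\gamma')$ such that $L(\bar\alpha; \alpha_0) > L(\alpha^*; \alpha_0)$ for any $\alpha^{*\prime}\ (\ne \bar\alpha') = (0 : \gamma^{*\prime})$.

A.2. For all $n$, $\partial L_n(\alpha)/\partial\alpha$ is continuous in $\alpha$, $n^{-1}\,\partial L_n(\alpha)/\partial\alpha \to d(\alpha; \alpha_0) = \partial L(\alpha; \alpha_0)/\partial\alpha$ a.s. uniformly on $A$, and the $i$th element of $d(\alpha; \alpha_0)$ is zero whenever the $i$th elements of $\alpha$ and $\alpha_0$ are equal.

A.3. For all $n$ and all $\alpha$ in $A$, $\partial^2 L_n(\alpha)/\partial\alpha\,\partial\alpha'$ is continuous in $\alpha$, $n^{-1}\,\partial^2 L_n(\alpha)/\partial\alpha\,\partial\alpha' \to -F(\alpha; \alpha_0) = \partial d(\alpha; \alpha_0)/\partial\alpha'$ a.s. uniformly on $A$, and $F(\alpha_0; \alpha_0)$ is symmetric positive definite.

A.4. $d_n(\alpha_0) \xrightarrow{d} N(0, F(\alpha_0; \alpha_0))$.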
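A.2. Proof of Proposition 1

Using the notation introduced in section 2, conditions A.1-A.3 ensure that

$$\hat d_n - (d_n - W^{-1}\hat a_n) \xrightarrow{p} 0, \tag{1}$$

where $\hat a_n = n^{1/2}(\hat\alpha_n - \alpha_0)$ and $W = F^{-1}$. For a given subset, $I_j$, of $\{1, \ldots, q\}$ we let $H_j$ and $H_{jc}$ denote the matrices obtained by deleting rows of zeros from the $q \times k$ matrices $[\operatorname{diag}(e_j) : 0]$ and $[I_q - \operatorname{diag}(e_j) : 0]$, respectively. We will show that whenever $Zd_n \in C_j$ for all $n$,

$$H_j\hat d_n \xrightarrow{p} 0 \quad\text{and}\quad H_{jc}\hat a_n \xrightarrow{p} 0. \tag{2}$$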
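Without loss of generality we consider that set $I_j^*$ for which the first $p$ elements of $e_j$ are equal to one and the remaining $q - p$ elements are equal to zero. It will be useful to partition $W$, and the inverse of its leading $q \times q$ block, accordingly as

$$W = \begin{pmatrix} W_{11} & W_{12} & W_{13} \\ W_{21} & W_{22} & W_{23} \\ W_{31} & W_{32} & W_{33} \end{pmatrix} \begin{matrix} p \\ q-p \\ k-q \end{matrix}, \qquad \begin{pmatrix} W^{11} & W^{12} \\ W^{21} & W^{22} \end{pmatrix} = \begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix}^{-1} \begin{matrix} p \\ q-p \end{matrix},$$

where the submatrices have the indicated dimensions (column blocks are ordered in the same way as row blocks). In the case of interest here, $H_j = (I_p : 0 : 0)$ and $H_{jc} = (0 : I_{q-p} : 0)$. And the condition $Zd_n \in C_j$ is equivalent to

$$\begin{pmatrix} W^{11} & 0 \\ W^{21} & -I_{q-p} \end{pmatrix}^{-1} Zd_n \gg 0,$$

and after some manipulation this reduces to

$$\begin{pmatrix} Q_jW \\ -M_j \end{pmatrix} d_n \gg 0,$$

where $Q_j = (I_p : -W_{12}W_{22}^{-1} : 0)$ and $M_j = (W_{22}^{-1}W_{21} : I_{q-p} : W_{22}^{-1}W_{23})$, of order $p \times k$ and $(q - p) \times k$, respectively. Premultiply (1) by $Q_jW$ to obtain

$$Q_jW\hat d_n - (Q_jWd_n - Q_j\hat a_n) \xrightarrow{p} 0,$$

and from the definition of $Q_j$ this is equivalent to

$$(W_{11} - W_{12}W_{22}^{-1}W_{21})H_j\hat d_n - Q_jWd_n + H_j\hat a_n - W_{12}W_{22}^{-1}H_{jc}\hat a_n \xrightarrow{p} 0,$$

since $\hat\gamma_n \in \operatorname{int}\Gamma$ with arbitrarily high probability for all $n$ sufficiently large. We premultiply by $(H_j\hat d_n)'$ to obtain

$$(H_j\hat d_n)'(W_{11} - W_{12}W_{22}^{-1}W_{21})H_j\hat d_n - (H_j\hat d_n)'Q_jWd_n + (H_j\hat d_n)'H_j\hat a_n - (H_j\hat d_n)'W_{12}W_{22}^{-1}H_{jc}\hat a_n \xrightarrow{p} 0. \tag{3}$$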
The first term in (3) is non-negative for all $n$, the second term is non-negative (since $H_j\hat d_n \le 0$, $Q_jWd_n \gg 0$, all $n$), the third term is equal to zero (by complementary slackness) and so

$$(H_j\hat d_n)'W_{12}W_{22}^{-1}H_{jc}\hat a_n \ge 0,$$

with probability approaching one as $n \to \infty$. Now premultiply (1) by $(H_{jc}\hat a_n)'M_j$ to obtain

$$(H_{jc}\hat a_n)'M_j\hat d_n - (H_{jc}\hat a_n)'M_jd_n + (H_{jc}\hat a_n)'M_jW^{-1}\hat a_n \xrightarrow{p} 0,$$

or

$$(H_{jc}\hat a_n)'M_j\hat d_n - (H_{jc}\hat a_n)'M_jd_n + (H_{jc}\hat a_n)'W_{22}^{-1}(H_{jc}\hat a_n) \xrightarrow{p} 0, \tag{4}$$

since $(W_{21} : W_{22} : W_{23})W^{-1} = (0 : I_{q-p} : 0)$ and so

$$M_jW^{-1} = (W_{22}^{-1}W_{21} : I_{q-p} : W_{22}^{-1}W_{23})W^{-1} = (0 : W_{22}^{-1} : 0).$$
The second term in (4) is non-negative (for all $n$) because $-M_jd_n \gg 0$ (from $Zd_n \in C_j$) and $H_{jc}\hat a_n \ge 0$, and the third term is non-negative for all $n$. With arbitrarily high probability for all $n$ sufficiently large the first term in (4) is equal to

$$(H_{jc}\hat a_n)'W_{22}^{-1}W_{21}(H_j\hat d_n),$$

in view of $(H_{jc}\hat a_n)'(H_{jc}\hat d_n) = 0$ and $\gamma_0 \in \operatorname{int}\Gamma$, and so we require that (with probability approaching one as $n \to \infty$) $(H_{jc}\hat a_n)'W_{22}^{-1}W_{21}(H_j\hat d_n)$ be non-positive. In order that this be consistent with the implication obtained from (3) we must have

$$(H_{jc}\hat a_n)'W_{22}^{-1}W_{21}(H_j\hat d_n) \xrightarrow{p} 0,$$

and this along with (3) and (4) yields the result (2). We can now use the same sort of arguments that lead to (1) to obtain

$$\tilde d_n - (d_n - W^{-1}\tilde a_n) \xrightarrow{p} 0, \tag{5}$$

where $\tilde a_n = n^{1/2}(\tilde\alpha_n - \alpha_0)$, and combining this with (1) yields

$$W(\tilde d_n - \hat d_n) - (\hat a_n - \tilde a_n) \xrightarrow{p} 0. \tag{6}$$
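To see that Proposition 1 holds we first note that, from (6),

$$Q_jW(\tilde d_n - \hat d_n) - Q_j(\hat a_n - \tilde a_n) \xrightarrow{p} 0,$$

or

$$Q_jW(\tilde d_n - \hat d_n) - Q_j\hat a_n \xrightarrow{p} 0,$$

using the definition of $Q_j$ and $\tilde a_n' = n^{1/2}(0 : \tilde\gamma_n' - \gamma_0')$. And, from (1) and (2),

$$Q_jWd_n - Q_j\hat a_n \xrightarrow{p} 0,$$

and so

$$Q_jW(\tilde d_n - \hat d_n) - Q_jWd_n \xrightarrow{p} 0. \tag{7}$$

From (6),

$$M_j(\tilde d_n - \hat d_n) - M_jW^{-1}(\hat a_n - \tilde a_n) \xrightarrow{p} 0,$$

and

$$M_j(\tilde d_n - \hat d_n) \xrightarrow{p} 0, \tag{8}$$

since $M_jW^{-1} = (0 : W_{22}^{-1} : 0)$, $H_{jc}\tilde a_n = 0$ and, from (2), $H_{jc}\hat a_n \xrightarrow{p} 0$. Letting $H_{k-q} = (0 : 0 : I_{k-q})$, $H_{k-q}\hat d_n = 0$ and $H_{k-q}\tilde d_n = 0$ with arbitrarily high probability for all $n$ sufficiently large (because $\gamma_0 \in \operatorname{int}\Gamma$) and so, from (7) and (8),

$$\begin{pmatrix} Q_jW \\ M_j \\ H_{k-q} \end{pmatrix}(\tilde d_n - \hat d_n) - \begin{pmatrix} Q_j \\ 0 \\ 0 \end{pmatrix}Wd_n \xrightarrow{p} 0. \tag{9}$$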
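Now recall that the Kuhn-Tucker statistic is

$$K_n = (\tilde d_n - \hat d_n)'\tilde F_n^{-1}(\tilde d_n - \hat d_n),$$

and since $\tilde F_n \to F$ a.s. and $F^{-1} = W$, this is asymptotically equivalent to $d_n'WB_jWB_jWd_n$ given $Zd_n \in C_j$, because some algebra reveals that

$$\begin{pmatrix} Q_jW \\ M_j \\ H_{k-q} \end{pmatrix}^{-1}\begin{pmatrix} Q_j \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} W^{11} & W^{12} & 0 \\ W^{21} & W^{22} - W_{22}^{-1} & 0 \\ 0 & 0 & 0 \end{pmatrix}, \tag{10}$$

where $W^{11}$, $W^{12}$, $W^{21}$, $W^{22}$ denote the blocks of the inverse of the leading $q \times q$ block of $W$. This is just the matrix $B_j$ associated with the $C_j$ of interest here. This completes the proof of Proposition 1.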
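The algebra behind (10) can be checked numerically. The following is a minimal sketch in exact rational arithmetic for $k = 3$, $q = 2$, $p = 1$, so that every block is a scalar. The matrix `W` is an arbitrary illustrative choice and the helper names are hypothetical, not from the paper. $B_j$ is taken to be the matrix whose upper-left $q \times q$ part is the inverse of the leading $q \times q$ block of $W$ with $W_{22}^{-1}$ subtracted from its $(2,2)$ block, and which is zero elsewhere; this is our reading of (10). The sketch also checks $B_jWB_j = B_j$, which makes $d_n'WB_jWB_jWd_n$ and $d_n'WB_jWd_n$ coincide.

```python
from fractions import Fraction as Fr


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def inverse(A):
    """Gauss-Jordan inverse in exact rational arithmetic."""
    n = len(A)
    M = [[Fr(x) for x in row] + [Fr(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot_row = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot_row] = M[pivot_row], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


# Illustrative symmetric positive definite W (an assumption, not from the paper).
W = [[Fr(4), Fr(1), Fr(1)],
     [Fr(1), Fr(3), Fr(1)],
     [Fr(1), Fr(1), Fr(2)]]
W12, W21, W22, W23 = W[0][1], W[1][0], W[1][1], W[1][2]

Qj = [[Fr(1), -W12 / W22, Fr(0)]]      # (I_p : -W12 W22^{-1} : 0)
Mj = [[W21 / W22, Fr(1), W23 / W22]]   # (W22^{-1} W21 : I_{q-p} : W22^{-1} W23)
Hkq = [[Fr(0), Fr(0), Fr(1)]]          # (0 : 0 : I_{k-q})
zero_row = [[Fr(0)] * 3]

# Left-hand side of (10): stack Q_j W, M_j, H_{k-q}, invert, multiply.
stacked = matmul(Qj, W) + Mj + Hkq
lhs = matmul(inverse(stacked), Qj + zero_row + zero_row)

# Right-hand side of (10), built from the inverse of the leading 2 x 2 block.
V_inv = inverse([row[:2] for row in W[:2]])
Bj = [[V_inv[0][0], V_inv[0][1], Fr(0)],
      [V_inv[1][0], V_inv[1][1] - 1 / W22, Fr(0)],
      [Fr(0), Fr(0), Fr(0)]]

identity_10_holds = (lhs == Bj)
Bj_W_Bj_equals_Bj = (matmul(matmul(Bj, W), Bj) == Bj)
print(identity_10_holds, Bj_W_Bj_equals_Bj)
```

Exact fractions are used rather than floating point so that both comparisons are true equalities and no tolerance has to be chosen.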
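A.3. Proof of Proposition 2

Consider the cone $C_j$ used in the proof of Proposition 1 and its sample analog, $\tilde C_{jn}$. $\tilde Z_n\tilde d_n \in \tilde C_{jn}$ is equivalent to $(\tilde W_n\tilde Q_{jn}' : -\tilde M_{jn}')'\tilde d_n \gg 0$, but this is asymptotically equivalent to $(WQ_j' : -M_j')'d_n \gg 0$, since $\tilde d_n - d_n + W^{-1}\tilde a_n \xrightarrow{p} 0$, and

$$(\tilde W_n\tilde Q_{jn}' : -\tilde M_{jn}')'W^{-1}\tilde a_n \xrightarrow{p} 0,$$

because $\tilde Q_{jn} \to Q_j$ a.s., $\tilde M_{jn} \to M_j$ a.s. and $Q_j\tilde a_n = 0$, $M_jW^{-1}\tilde a_n = 0$. Therefore, $\tilde Z_n\tilde d_n \in \tilde C_{jn}$ and $Zd_n \in C_j$ are asymptotically equivalent. We now need only show that

$$\tilde d_n'\tilde W_n\tilde B_{jn}\tilde W_n\tilde d_n - d_n'WB_jWd_n \xrightarrow{p} 0.$$

We have $B_jW\tilde d_n - B_jWd_n + B_j\tilde a_n \xrightarrow{p} 0$, and the result follows from the fact that $B_j\tilde a_n = 0$ [because $(I_q : 0)\tilde a_n = 0$] and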
A.4. Proof of Proposition 3

Under the assumption made in section 4 that $F(\alpha_0)$ (the almost sure limit of $\tilde F_n$ under the alternative $\alpha_0$) is symmetric positive definite, the $2^q$ cones $\tilde C_{jn}$ partition $R^q$ with arbitrarily high probability for all $n$ sufficiently large. Let $\bar W$ denote the almost sure limit of $\tilde W_n$ $(= \tilde F_n^{-1})$ under a given alternative $\alpha_0' = (\theta_0' : \gamma_0')$, $\theta_0 > 0$, and let $\bar\mu$, $\mu_0$ denote the almost sure limits of $n^{-1/2}\tilde d_n$ and $n^{-1/2}\tilde Z_n\tilde d_n$, respectively, under this alternative. If we let $\bar C_j$, $j = 1, \ldots, 2^q$, denote the (open) cones associated with $\bar W$ (in the same way that the $C_j$ are associated with $W$), then $\mu_0$ must lie in one of these $\bar C_j$ or on a boundary between two or more of them. We now consider a particular one of these cones ($\bar C_j$) and we may suppose that it is such that $\tilde Z_n\tilde d_n \in \tilde C_{jn}$ is asymptotically equivalent to

$$\begin{pmatrix} \bar Q_j\bar W \\ -\bar M_j \end{pmatrix}\tilde d_n \gg 0,$$

where $\bar Q_j$ and $\bar M_j$ are formed from $\bar W$ in the same way as $Q_j$ and $M_j$ are formed from $W$ in the proof of Proposition 1. [$\bar Q_j$ is $p \times k$, $\bar M_j$ is $(q - p) \times k$.] Clearly, $\lim_{n \to \infty}\Pr(\tilde Z_n\tilde d_n \in \tilde C_{jn})$ is non-zero if and only if $\bar C_j$ is such that

$$\begin{pmatrix} \bar Q_j\bar W \\ -\bar M_j \end{pmatrix}\bar\mu \ge 0.$$

Under the alternative in question, $\bar C_j$ is of no relevance so far as the limiting behavior of $S_n$ is concerned unless it has this property. On the assumption that $\bar C_j$ has this property, we have, of course, $\bar Q_j\bar W\bar\mu \ge 0$. Now, if $\bar Q_j\bar W\bar\mu = 0$, then the first $p$ elements of $\bar\mu$ must be equal to zero. [This follows from the definition of $\bar Q_j$ and the fact that $(0 : 0 : I_{k-q})\bar\mu = 0$.] This and $-\bar M_j\bar\mu \ge 0$ implies, from the definition of $\bar M_j$, that $\bar\mu \le 0$, or $\mu_0 \in \bar C_0$, which is ruled out in the statement of the proposition. Therefore, it must be the case that $\bar Q_j\bar W\bar\mu > 0$. Further, if we denote the almost sure limit of $n^{-1/2}\tilde d_n$ conditional on $\tilde Z_n\tilde d_n \in \tilde C_{jn}$ (for all $n$) by $\bar\mu_j$, it is clear that we must also have $\bar Q_j\bar W\bar\mu_j > 0$. And, conditional on $\tilde Z_n\tilde d_n \in \tilde C_{jn}$ (all $n$),

$$S_n - \tilde d_n'\bar W\bar B_j\bar W\tilde d_n \xrightarrow{p} 0,$$

where $\bar B_j$ is formed from the elements of $\bar W$ in the same way as $B_j$ is formed from $W$. So the almost sure limit of $n^{-1}S_n$ conditional on $\tilde Z_n\tilde d_n \in \tilde C_{jn}$ is

$$\bar\mu_j'\bar W\bar B_j\bar W\bar\mu_j = \bar\mu_j'\bar W\bar B_j\bar W\bar B_j\bar W\bar\mu_j,$$

and from eq. (10) in the proof of Proposition 1 it is clear that $\bar B_j\bar W\bar\mu_j \ne 0$. The result follows readily from these considerations.
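A.5. Proof of Proposition 4

Consider an alternative, $\alpha_0' = (\theta_0' : \gamma_0')$, $\theta_0 > 0$. Let $\bar\gamma$ be the almost sure limit of $\tilde\gamma_n$ under this alternative, and set $\bar\alpha' = (0 : \bar\gamma')$. If $\theta_0$ is sufficiently close to $\theta = 0$, $\bar\gamma$ lies in a compact convex neighborhood of $\gamma_0$ (since $\gamma_0 \in \operatorname{int}\Gamma$). Then, for each $i = 1, \ldots, k$, there exists an $\alpha^{i} \in A$ on the line segment connecting $\bar\alpha$ and $\alpha_0$ such that

$$d(\bar\alpha; \alpha_0)_i = d(\alpha_0; \alpha_0)_i - F(\alpha^{i}; \alpha_0)_i(\bar\alpha - \alpha_0),$$

where $d(\cdot)_i$ denotes the $i$th element of $d(\cdot)$ and $F(\cdot)_i$ denotes the $i$th row of $F(\cdot)$. For all $\bar\alpha$ sufficiently close to $\alpha_0$, the $k \times k$ matrix with $i$th row equal to $F(\alpha^{i}; \alpha_0)_i$ is positive definite, and if we denote this matrix by $F(\alpha^*; \alpha_0)$, we have, in view of $d(\alpha_0; \alpha_0) = 0$,

$$(\bar\alpha - \alpha_0)'d(\bar\alpha; \alpha_0) = -(\bar\alpha - \alpha_0)'F(\alpha^*; \alpha_0)(\bar\alpha - \alpha_0) < 0,$$

and since $(0 : I_{k-q})\,d(\bar\alpha; \alpha_0) = 0$, we have

$$\theta_0'\mu_0(\alpha_0) > 0,$$

recalling that $\mu_0(\alpha_0) = (I_q : 0)\,d(\bar\alpha; \alpha_0)$. Since $\bar\alpha$ can be made arbitrarily close to $\alpha_0$ by making $\theta_0$ sufficiently close to $\theta = 0$, the result follows from the fact that $\theta_0 > 0$, whence at least one element of $\mu_0(\alpha_0)$ must be positive. (If $\Gamma$ is convex the result also holds if $\gamma_0$ lies on the boundary of $\Gamma$.)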
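The sign step in this proof can be written out explicitly. The display below is a sketch that assumes the notation $\bar\alpha' = (0 : \bar\gamma')$, $\alpha_0' = (\theta_0' : \gamma_0')$ and $\mu_0(\alpha_0) = (I_q : 0)\,d(\bar\alpha; \alpha_0)$, so that $\bar\alpha' - \alpha_0' = (-\theta_0' : \bar\gamma' - \gamma_0')$; the bar notation is our reading of the source.

```latex
\begin{aligned}
(\bar\alpha - \alpha_0)'\, d(\bar\alpha; \alpha_0)
  &= -\theta_0'\,(I_q : 0)\, d(\bar\alpha; \alpha_0)
     + (\bar\gamma - \gamma_0)'\,(0 : I_{k-q})\, d(\bar\alpha; \alpha_0) \\
  &= -\theta_0'\,\mu_0(\alpha_0) + 0 \\
  &< 0 .
\end{aligned}
```

Hence $\theta_0'\mu_0(\alpha_0) > 0$, and because every element of $\theta_0$ is non-negative, the inner product can be positive only if some element of $\mu_0(\alpha_0)$ is positive.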
References

Bartholomew, D.J., 1961, A test of homogeneity of means under restricted alternatives, Journal of the Royal Statistical Society B 23, 239-281.
Breusch, T.S. and A.R. Pagan, 1980, The Lagrange multiplier test and its applications to model specification in econometrics, Review of Economic Studies 47, 239-253.
Chacko, V.J., 1963, Testing homogeneity against ordered alternatives, Annals of Mathematical Statistics 34, 945-956.
Chant, D., 1974, On asymptotic tests of composite hypotheses, Biometrika 61, 291-298.
Chernoff, H., 1954, On the distribution of the likelihood ratio, Annals of Mathematical Statistics 25, 573-578.
Farebrother, R.W., 1984, Testing linear inequality constraints in the standard linear model, Mimeo. (Department of Econometrics and Social Statistics, University of Manchester, Manchester).
Gourieroux, C., A. Holly and A. Monfort, 1980, Kuhn-Tucker, likelihood ratio and Wald tests for nonlinear models with inequality constraints on the parameters, Discussion paper 770 (Department of Economics, Harvard University, Cambridge, MA).
Gourieroux, C., A. Holly and A. Monfort, 1982, Likelihood ratio test, Wald test and Kuhn-Tucker test in linear models with inequality constraints on the regression parameters, Econometrica 50, 63-80.
Holly, A., 1982, A remark on Hausman's specification test, Econometrica 50, 749-760.
Kiefer, N., 1982, A remark on the parametrization of a model for heterogeneity, Working paper 278 (Department of Economics, Cornell University, Ithaca, NY).
King, M.L. and G.H. Hillier, 1985, Locally best invariant tests of the error covariance matrix of the linear regression model, Journal of the Royal Statistical Society B 47, 98-102.
Kudo, A., 1963, A multivariate analogue of the one-sided test, Biometrika 50, 403-418.
Moran, P.A.P., 1971, Maximum likelihood estimation in non-standard conditions, Proceedings of the Cambridge Philosophical Society 70, 441-450.
Moran, P.A.P., 1984, The Monte Carlo evaluation of orthant probabilities for multivariate normal distributions, Australian Journal of Statistics 26, 39-44.
Miller, J.J., 1977, Asymptotic properties of maximum likelihood estimates in the mixed model of the analysis of variance, Annals of Statistics 5, 746-762.
Nicholls, D.F. and A.R. Pagan, 1983, Varying coefficient regression, Working paper in economics and econometrics 092 (Department of Economics and Research School of Social Sciences, Australian National University, Canberra).
Nuesch, P.E., 1966, On the problem of testing location in multivariate problems for restricted alternatives, Annals of Mathematical Statistics 37, 113-119.
Perlman, M.D., 1969, One-sided testing problems in multivariate analysis, Annals of Mathematical Statistics 40, 549-567.
Quandt, R.E., 1978, Tests of the equilibrium versus disequilibrium hypothesis, International Economic Review 19, 435-452.
Samelson, H., R.M. Thrall and O. Wesler, 1958, A partition theorem for Euclidean n-space, Proceedings of the American Mathematical Society 9, 805-807.
Schmidt, P. and T.F. Lin, 1984, Simple tests of alternative specifications in stochastic frontier models, Journal of Econometrics 21, 349-362.
Tanaka, K., 1983, Non-normality of the Lagrange multiplier statistic for testing the constancy of regression coefficients, Econometrica 51, 1577-1582.
Thompson, M., 1982, Some results on the statistical properties of an inequality constrained least squares estimator in a linear model with two regressors, Journal of Econometrics 19, 215-232.
White, H., 1982, Maximum likelihood estimation of misspecified models, Econometrica 50, 1-25.
White, H. and I. Domowitz, 1984, Non-linear regression with dependent observations, Econometrica 52, 143-162.
Wolak, F.A., Testing inequality constraints in linear econometric models, Mimeo. (Department of Economics, Harvard University, Cambridge, MA).