Journal of Econometrics 42 (1989) 351-369. North-Holland

ASYMPTOTIC RELATIVE EFFICIENCY OF THE CLASSICAL TEST STATISTICS UNDER MISSPECIFICATION

Pentti SAIKKONEN*

University of Helsinki, SF-00100 Helsinki, Finland

Received April 1987, final version received January 1989
Asymptotic properties of the three classical test statistics - the likelihood ratio, Lagrange multiplier (efficient score), and Wald statistics - are studied in the case of two types of misspecification. The first misspecification is that the alternative used to construct the test is different from the true nonnull model. In the second case the alternative hypothesis is correct, but some of the nuisance parameters satisfy restrictions which are not taken into account in the test. Pitman’s local alternatives and the concept of asymptotic relative efficiency are used to derive the asymptotic distributions of the test statistics and to examine the effects of the misspecifications. The general results of the paper are illustrated by an example on the dynamic specification of a regression model.
1. Introduction

The three classical test statistics - the likelihood ratio (LR), Lagrange multiplier (LM), and Wald (W) statistics - are nowadays part of the standard methodology in econometrics. They are often used to check the adequacy of a specified statistical model or to specify a restricted version of a general maintained model. As in Mizon (1977) one may speak about tests of misspecification and tests of specification. The W and LR tests are frequently used as tests of specification, whereas the LM tests are more like tests of misspecification. In both of these testing situations several unknown elements are usually involved in the models or hypotheses to be tested. This implies that one may mistakenly base a test on an incorrect or inoptimal model. Information about the power properties of such misspecified tests is therefore of importance.

The purpose of this paper is to give a general account of the asymptotic properties of the LR, LM, and W tests under two types of misspecification. In the first case the alternative used to construct the test deviates from the true nonnull model. For instance, in an ordinary regression model one may test for autocorrelated errors while the true alternative is an omitted regressor. The

*I wish to thank Timo Teräsvirta and the referees of the paper for useful comments. Financial support from the Emil Aaltonen Foundation and Yrjö Jahnsson Foundation is gratefully acknowledged.
0304-4076/89/$3.50 © 1989, Elsevier Science Publishers B.V. (North-Holland)
second type of misspecification is the one in which the alternative hypothesis does conform to the true model but some of the nuisance parameters are redundant. This means that they satisfy restrictions which are not taken into account in the test, so that the alternative is in fact overspecified. In a regression context this may happen if one correctly tests for an omitted regressor which, however, has erroneously been replaced by another variable.

The statistical tools used in the paper are Pitman's local alternatives and the concept of asymptotic relative efficiency (ARE) [see, e.g., Kendall and Stuart (1979, ch. 25)]. Following Davidson and MacKinnon (1987) we shall first give the asymptotic distributions of the various test statistics under an appropriate sequence of local alternatives. When based on the same model assumptions the three classical test statistics are always asymptotically equivalent. When the relevant asymptotic distributions are available the concept of ARE will be used to investigate the efficiency of the misspecified tests with respect to the corresponding optimal tests. In the first case the optimal tests are based on the true alternative, and in the second one they take account of the restrictions on the nuisance parameters.

There are some recent papers also considering asymptotic properties of test statistics when neither the null nor the alternative hypothesis is true. Reference can be made to Burguete et al. (1982), Newey (1985), Tauchen (1985), and Davidson and MacKinnon (1985, 1987). From the point of view of this paper the work of the last authors is particularly relevant. Using geometrical arguments Davidson and MacKinnon (1987) compare the asymptotic local power of various misspecified classical tests in a very general setting. Their method of comparison also yields the ARE provided the tests have the same number of degrees of freedom.
In the present paper the use of the ARE will be extended to cases where the tests have different numbers of degrees of freedom. Furthermore, we shall give bounds for the ARE that are independent of the direction of the local alternative.

The plan of the paper is as follows. Section 2 contains the formulation of the problem and preliminary considerations. Asymptotic properties of the test statistics in the two different cases are discussed in sections 3 and 4. Section 5 presents an application of the general theory, and section 6 summarizes the main conclusions. Some mathematical derivations are given in an appendix.
2. Formulation of the problem
2.1. An illustration

We shall start with an example of the dynamic specification of a regression model. Assume for simplicity that LM tests are applied. In practice the first LM test below might be replaced by a conventional F test. The two other LM
tests can be found in Godfrey (1978) and they have computational advantages over the corresponding LR and W tests. Consider two regression models, the first of which is dynamic with independent errors, i.e.,
y_t = ψ_1 y_{t-1} + ... + ψ_r y_{t-r} + x_t'γ + ε_t,   ε_t ~ NID(0, σ²),   t = 1, ..., T,   (1)

where x_t (m × 1) is the vector of regressors and γ (m × 1) is the corresponding coefficient vector. Assume that the processes {x_t} and {ε_t} are independent and that the roots of the polynomial 1 - ψ_1 z - ... - ψ_r z^r lie outside the unit circle. For convenience we also assume that {x_t} is a stationary and ergodic process with finite second-order moments. The second model is static with autocorrelated errors and can be written as

y_t = x_t'γ + u_t,   u_t = φ_1 u_{t-1} + ... + φ_s u_{t-s} + ε_t,   t = 1, ..., T,   (2)
where x_t, γ, and ε_t are as in (1) and the roots of the polynomial 1 - φ_1 z - ... - φ_s z^s lie outside the unit circle.

Suppose the investigator chooses (1) to be his maintained model and wishes to test the null hypothesis ψ = [ψ_1 ... ψ_r]' = 0 against the alternative that at least one ψ_j ≠ 0, j = 1, ..., r. Call the corresponding LM statistic LM_1. Assume, however, that (2) is the data-generating process (DGP) with at least one φ_j ≠ 0, j = 1, ..., s. Let LM_2 be the appropriate LM statistic for testing the null hypothesis φ = [φ_1 ... φ_s]' = 0. It now becomes important to investigate the power properties of LM_1 relative to LM_2. Because LM_2 is an optimal test statistic based on the true alternative, this power comparison opens up a way of assessing the cost of misspecification.

To illustrate the second issue of the paper we introduce the model

y_t = ψ_1 y_{t-1} + ... + ψ_r y_{t-r} + x_t'γ + u_t,   u_t = φ_1 u_{t-1} + ... + φ_s u_{t-s} + ε_t,   t = 1, ..., T,   (3)
where the notation is as before. Consider testing the null hypothesis φ = 0 against φ ≠ 0 within (3). Let LM_3 be the corresponding LM statistic and retain the assumption that the DGP is (2) with at least one φ_j ≠ 0, j = 1, ..., s. In this case the only problem is that the alternative is overspecified as it contains the coefficients ψ_1, ..., ψ_r. The optimal LM statistic is LM_2 as before. The cost of the overspecification can now be assessed by comparing the power of LM_3 with that of LM_2.
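Statistics of the LM_2 type are commonly computed from an auxiliary regression in the familiar T·R² form associated with the tests in Godfrey (1978). The following sketch is one standard way of computing such a statistic, not the paper's exact algebra; the DGP parameters below are arbitrary illustrative choices.

```python
import numpy as np

def lm_autocorr(y, X, s):
    """LM-type test of phi = 0 in the static model (2), in the auxiliary-
    regression (T * R^2) form: regress the OLS residuals on the regressors
    and s lagged residuals; under the null the statistic is asymptotically
    chi-square with s degrees of freedom."""
    T = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta                                  # residuals of the null model
    # lagged residuals u_{t-1}, ..., u_{t-s} (zero-padded for t <= s)
    lags = np.column_stack([np.concatenate([np.zeros(j), u[:-j]])
                            for j in range(1, s + 1)])
    Z = np.column_stack([X, lags])                    # auxiliary regressors
    delta, *_ = np.linalg.lstsq(Z, u, rcond=None)
    e = u - Z @ delta
    r2 = 1.0 - (e @ e) / (u @ u)
    return T * r2

# toy DGP following (2): one regressor, AR(1) errors with phi = 0.8
rng = np.random.default_rng(0)
T = 500
x = rng.standard_normal(T)
eps = rng.standard_normal(T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.8 * u[t - 1] + eps[t]
X = np.column_stack([np.ones(T), x])
stat = lm_autocorr(x + u, X, s=1)                     # y_t = x_t + u_t
```

With error autocorrelation this strong the statistic should far exceed the 5% critical value 3.84 of the χ²_1 distribution.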
2.2. General case

Now we shall extend the above discussion to a general maximum likelihood (ML) framework. Let γ, ψ, and φ be parameter vectors with dimensions m, r, and s, respectively, while T denotes the number of observations as before. In place of (1) we have a statistical model identified by the log-likelihood function L_1T(γ, ψ). Generalizing the earlier hypotheses slightly we here consider the null hypothesis

H_1:  ψ = ψ_*   in   L_1T(γ, ψ),
and the alternative H_1A: ψ ≠ ψ_*. The null model is thus characterized by the log-likelihood function L_1T(γ, ψ_*) = L_0T(γ). Let L_2T(γ, φ) be the log-likelihood function of the true model and assume that there is a parameter value φ_* such that L_2T(γ, φ_*) = L_0T(γ). For the true model we thus introduce the null hypothesis

H_2:  φ = φ_*   in   L_2T(γ, φ),

and the alternative H_2A: φ ≠ φ_*. The definitions show that both H_1 and H_2 yield the same null model. In the next section the classical test statistics derived for H_1 from L_1T(γ, ψ) will be investigated and compared with the corresponding optimal statistics derived for H_2 from L_2T(γ, φ).

Despite the property L_1T(γ, ψ_*) = L_2T(γ, φ_*) the two likelihood functions may be quite different. It is, however, assumed that both of them can be obtained from the log-likelihood function L_T(γ, ψ, φ) in the sense that L_1T(γ, ψ) = L_T(γ, ψ, φ_*) and L_2T(γ, φ) = L_T(γ, ψ_*, φ). One may regard L_T(γ, ψ, φ) as the log-likelihood function of a combined model containing all the parameters γ, ψ, and φ. The combined model need not be a sensible model in view of the phenomenon to be modelled, although that may well be the case. If L_T(γ, ψ, φ) does not represent a sensible model it can be interpreted as a purely auxiliary function which is used to derive the results of the paper. The log-likelihood functions L_1T(γ, ψ) and L_2T(γ, φ), however, are always assumed to represent sensible statistical models. In the example of the preceding section, (3) is the combined model.

Typically ψ_* and φ_* are null vectors implying that some variables do not belong to the model. If r = s, one has an idea about the dimension of the parameter vector to be possibly added to the null model, although the form of the alternative may be uncertain. On the other hand, one may also consider the case where r > s and the parameter vector ψ contains the components of φ as well as some additional components. Then the model with log-likelihood function L_1T(γ, ψ) is overspecified rather than misspecified and some parameters appear twice in the log-likelihood function L_T(γ, ψ, φ), giving rise to a
singular information matrix. However, instead of removing the redundant parameters it will be convenient to use the log-likelihood function L_T(γ, ψ, φ) in its overparameterized form. One can also consider the case where L_1T(γ, ψ) represents an underspecified model. Then r < s and the parameter vector φ contains the components of ψ as well as some additional components.

So far we have discussed the problem of testing a null model against a misspecified alternative.¹ Now take the overspecification issue and suppose we have a model with log-likelihood function L_1T(γ, ψ) = L_T(γ, ψ, φ_*). We want to test this model specification against an enlarged model with log-likelihood function L_T(γ, ψ, φ). The null hypothesis is thus

H_3:  φ = φ_*   in   L_T(γ, ψ, φ).

The corresponding alternative is H_3A: φ ≠ φ_*. The DGP remains the same as before and has log-likelihood function L_2T(γ, φ) = L_T(γ, ψ_*, φ). In this case we thus use an overspecified alternative which does not make use of the restriction ψ = ψ_* (typically ψ_* = 0). An optimal test which correctly takes account of this restriction can be obtained from L_2T(γ, φ).

Next we shall discuss the assumptions needed for obtaining the results of the paper. For brevity, set θ = [γ' ψ' φ']'. We shall assume that, with some minor exceptions and additions to be discussed below, the log-likelihood function L_T(θ) satisfies the regularity conditions stated in the appendix of Holly (1982). These conditions imply that the usual asymptotic results on ML estimation hold and that the Taylor series expansions to be used in the paper are valid. In order to obtain the asymptotic nonnull distributions of our test statistics we consider Pitman's local alternatives

φ_δ = φ_* + T^{-1/2} δ,   (4)
where δ ≠ 0 unless otherwise stated. For simplicity the dependence of φ_δ on the sample size T is suppressed from the notation. One can interpret φ_δ as the true parameter value of φ. The true value of γ is denoted by γ_0, whereas θ_0 = [γ_0' ψ_*' φ_δ']' is the true value of θ. Furthermore, let θ_* = [γ_0' ψ_*' φ_*']'. If L_T(θ) satisfies the regularity conditions in Holly (1982) then the matrix

J(θ) = -plim_T (T^{-1} ∂²L_T(θ)/∂θ ∂θ')   (5)
is positive definite at θ = θ_*. However, since J(θ) is essentially the information matrix of θ it is convenient to relax this assumption and only assume that the analogues of J(θ_*) obtained from L_1T(γ, ψ) and L_2T(γ, φ) are positive

¹For simplicity we shall usually speak about misspecified alternatives although in some situations terms like overspecified or inoptimal might be more appropriate than misspecified.
definite. Instead of assumption A7 in Holly (1982) we shall thus require that the matrices

[ J_γ    J_γψ ]          [ J_γ    J_γφ ]
[ J_ψγ   J_ψ  ]   and    [ J_φγ   J_φ  ]

are positive definite. Here J_γ(θ_*) = -plim_T (T^{-1} ∂²L_T(θ_*)/∂γ ∂γ'), J_γψ(θ_*) = -plim_T (T^{-1} ∂²L_T(θ_*)/∂γ ∂ψ'), etc. In what follows the argument θ_* will be omitted so that, e.g., J_γ(θ_*) = J_γ.

3. Misspecifying the DGP
3.1. Test statistics

Let γ̃ be the ML estimator of γ derived under H_1 so that γ̃ maximizes L_0T(γ) = L_T(γ, ψ_*, φ_*). Furthermore, let τ̂ = [γ̂' ψ̂']' be the ML estimator of τ = [γ' ψ']' obtained from L_1T(τ) = L_T(τ, φ_*). It will be convenient to express the test statistics by using the parameter vector θ and the log-likelihood function L_T(θ). Therefore we introduce the ML estimators θ̃ = [γ̃' ψ_*' φ_*']' and θ̂ = [τ̂' φ_*']', of which the former is constrained by both H_1 and H_2 and the latter by only H_2. The LR statistic for H_1 can now be defined by

LR_1 = 2(L_T(θ̂) - L_T(θ̃)).
To define the LM and W statistics set

J_ψψ·γ(θ) = J_ψ(θ) - J_ψγ(θ) J_γ^{-1}(θ) J_γψ(θ)

and g_a(θ) = ∂L_T(θ)/∂a, a = ψ, γ, φ, θ. Since ∂L_1T(γ, ψ)/∂ψ = g_ψ(γ, ψ, φ_*), the LM statistic for H_1 can be written as

LM_1 = T^{-1} g_ψ(θ̃)' J_ψψ·γ^{-1}(θ̃) g_ψ(θ̃),

while the W statistic is

W_1 = T (ψ̂ - ψ_*)' J_ψψ·γ(θ̂) (ψ̂ - ψ_*).
Below we shall use the symbol ≈ to indicate that two random variables are asymptotically equal in the sense that their difference converges to zero in probability. By ~ we mean that a random variable asymptotically follows a stated distribution, and χ²_ν(λ) stands for the noncentral chi-square distribution with ν degrees of freedom and noncentrality parameter λ [χ²_ν(0) = χ²_ν]. Now consider the asymptotic properties of statistics LR_1, LM_1, and W_1 under the sequence of local alternatives (4). From Davidson and MacKinnon
(1987) one obtains

LR_1 ≈ LM_1 ≈ W_1 ~ χ²_r(λ_1(δ)),   (6)

where λ_1(δ) = δ' J_φψ·γ J_ψψ·γ^{-1} J_ψφ·γ δ and J_ψφ·γ = J_ψφ - J_ψγ J_γ^{-1} J_γφ = J_φψ·γ'. A proof of (6) can be based on conventional Taylor series expansions and the asymptotic normality of the score vector g_θ(θ_*); see the appendix.

The above asymptotic distribution might be used to investigate the power of the misspecified test statistics LR_1, LM_1, and W_1 against H_2A, which conforms to the DGP. However, because of the artificial assumption of local alternatives it may be more illuminating to compare these statistics with the corresponding optimal test statistics LR_2, LM_2, and W_2 derived for H_2 from the true log-likelihood function L_2T(γ, φ). Explicit expressions of statistics LR_2, LM_2, and W_2 can be obtained by modifying the definitions of LR_1, LM_1, and W_1 in an obvious way. As in (6) we now have

LR_2 ≈ LM_2 ≈ W_2 ~ χ²_s(λ_2(δ)),   (7)

where λ_2(δ) = δ' J_φφ·γ δ with J_φφ·γ = J_φ - J_φγ J_γ^{-1} J_γφ. Since LR_2, LM_2, and W_2 are based on the true model, (7) is a well-known result for the classical test statistics [see, e.g., Cox and Hinkley (1974, ch. 9)].

Using (6) and (7) one can compare the asymptotic local power of the test statistics. Another method of comparison can be based on the concept of ARE. Following the recent work of Rothe (1981), the ARE's of this paper will be derived without the frequently used assumption that the asymptotic chi-square distributions of the test statistics have the same number of degrees of freedom, i.e., that r = s. However, a special case has to be singled out. If the rank of the r × s matrix J_ψφ·γ is less than s, it is possible that J_ψφ·γ δ = 0 so that λ_1(δ) = 0 (δ ≠ 0). This implies that the asymptotic power function of the misspecified tests is flat in the direction of the null space of J_ψφ·γ, the power being equal to the size of the test. According to the terminology of Davidson and MacKinnon (1987) these directions define the implicit null hypothesis of the tests. On the other hand, since λ_2(δ) > 0 for all δ ≠ 0, the optimal test statistics LR_2, LM_2, and W_2 have reasonable power properties even in the direction of the null space of J_ψφ·γ. Thus the cases rank(J_ψφ·γ) = s and rank(J_ψφ·γ) < s will be dealt with separately.

3.2. ARE when rank(J_ψφ·γ) = s

Let C_i stand for any one of the asymptotically equivalent test statistics LR_i, LM_i, or W_i (i = 1, 2). We wish to derive the ARE of C_1 with respect to C_2. Let α (0 < α < 1) be the nominal significance level used in both C_1 and C_2. We
shall be interested in the numbers of observations required for C_1 and C_2 to achieve a given power against the local alternatives (4). Denote this power by β and assume that α < β < 1. When the dependence of the test statistics on the sample size is needed we shall write C_i = C_i(T). Let T_iβ(δ) = T_i be a sample size for which

P{C_i(T_i) ≥ a_{iT_i}} ≥ β   (8)
and
P{C_i(T_i - 1) ≥ a_{i,T_i-1}} < β,   (9)
where a_{iT} is the exact critical value of C_i, i.e., P{C_i(T) ≥ a_{iT}} = α when δ = 0. Rothe (1981) calls T_iβ(δ) a Pitman efficiency function for β. It is a sample size for which the exact power of C_i against (4) is at least β while for T_iβ(δ) - 1 it is less than β. It is easy to verify that, given α < β < 1, T_iβ(δ) must tend to infinity at the same rate as T for (8) and (9) to hold (see the appendix).

For a fixed δ we wish to measure the ARE of C_1 with respect to C_2 by using the limit of T_2β(δ)/T_1β(δ) [see Kendall and Stuart (1979, pp. 276-278) and Rothe (1981)]. To make this more precise we have to show that T_2β(δ)/T_1β(δ) really converges and that the limit, e_12(δ), is independent of the choice of T_1β(δ) and T_2β(δ). These results are established in the appendix where it is also shown that
e_12(δ) = [λ_1(δ)/λ_2(δ)] d(s, α, β)/d(r, α, β).   (10)

In (10), d(k, α, β) is the (unique) noncentrality parameter such that the 1 - β fractile of the χ²_k(d(k, α, β)) distribution and the 1 - α fractile of the χ²_k distribution coincide. Values of d(k, α, β) are tabulated in Hayman et al. (1962), Harter and Owen (1970), and Pearson and Hartley (1972). Although e_12(δ) generally depends on α and β, a convenient feature is that it depends on J(θ) only at θ = θ_*, which specifies the null model.

To interpret e_12(δ) assume that δ is fixed and denote T^{-1/2} g_θ(θ_*) = g_0 = [g_γ' g_ψ' g_φ']'. The covariance matrix of g_0, calculated assuming that θ = θ_*, converges to J as T → ∞. Suppose first that the parameter vector γ does not appear in the model. From the definitions of λ_1(δ) and λ_2(δ) it then follows that ρ²(δ) = λ_1(δ)/λ_2(δ) is the limit of the squared theoretical multiple correlation coefficient between δ'g_φ and g_ψ [see, e.g., Anderson (1958, pp. 30-32)]. Thus, the better δ'g_φ can be linearly predicted by g_ψ the larger is ρ²(δ) and consequently e_12(δ). On the other hand, a large value of ρ²(δ) means that ψ is a good substitute for the true parameter vector φ so that the
Table 1
Values of d(s, α, β)/d(r, α, β) for α = 0.05.

               s = 1                             s = 2
β \ r      2      3      4      12          3      4      5      12
0.25     0.730  0.609  0.535  0.324       0.834  0.733  0.663  0.444
0.50     0.775  0.667  0.598  0.388       0.869  0.772  0.709  0.501
0.70     0.801  0.702  0.637  0.430       0.876  0.795  0.737  0.537
0.90     0.830  0.741  0.682  0.481       0.893  0.821  0.768  0.580
loss from using it is not large. An extreme case occurs when g_φ is a linear function of g_ψ because then ρ²(δ) = 1. If the parameter vector γ appears in the model, a similar interpretation emerges after eliminating the linear effects of g_γ from the consideration. Hence, one first predicts δ'g_φ linearly by g_γ and computes the prediction error. The limit of the squared multiple correlation coefficient between this prediction error and g_ψ then equals ρ²(δ). A sample analogue of ρ²(δ) can be obtained from an auxiliary regression model as in Davidson and MacKinnon (1987).

Next consider the ratio d(s, α, β)/d(r, α, β) in (10). When α and β are fixed, d(s, α, β)/d(r, α, β) only depends on r and s and represents their contribution to e_12(δ). Since rank(J_ψφ·γ) = s by assumption, r ≥ s. As shown in Das Gupta and Perlman (1974), the power of a test statistic with a noncentral chi-square distribution decreases as the degrees of freedom increase while the noncentrality parameter remains unchanged. Therefore, when α and β are fixed, d(k, α, β) is an increasing function of k so that d(s, α, β) ≤ d(r, α, β). Increasing the dimension of ψ may thus decrease e_12(δ). This happens when the additional parameters are not important enough to increase ρ²(δ) accordingly.

Davidson and MacKinnon (1987) use ρ²(δ) to investigate the power properties of misspecified classical tests. The authors give an illuminating geometrical description of their approach and point out that it yields the ARE if the tests have the same number of degrees of freedom. However, generally this is not the case. Davidson and MacKinnon have no equivalent to the ratio d(s, α, β)/d(r, α, β) although they clearly notice its general implications. To illustrate the impact of d(s, α, β)/d(r, α, β) on e_12(δ) some numerical values of that ratio with α = 0.05 are given in table 1. When ρ²(δ) = 1 the figures in the table are actually ARE's.
This happens, for instance, when the model is overspecified in the sense that ψ is of the form ψ = [φ' ψ_2']'. Then obviously g_φ = [I_s 0] g_ψ and consequently ρ²(δ) = 1. The figures in table 1 increase with β, although not very rapidly. When s ≤ 2 and r > s + 1 the loss of efficiency due to the overspecification of the model exceeds 20% in most cases. A practical situation where such an overspecification may occur is that of testing
the hypothesis φ_1 = ... = φ_s = 0 in (2) or (3). If there is no prior information about an appropriate value of s one may wish to guard against several different types of autocorrelation and may consequently overspecify s.

Although e_12(δ) may be of interest as such, measures which do not depend on δ might sometimes be attractive. Following Wieand (1976) and Rothe (1981) we thus define the lower and the upper ARE as

e_12^- = inf_δ e_12(δ)   and   e_12^+ = sup_δ e_12(δ),
respectively. If e_12^- = e_12^+, we define the ARE as e_12 = e_12^- = e_12^+. This equality does not generally hold unless r = s = 1. If r = s, then the above measures of efficiency are independent of α and β, conforming to the classical situation [see Hannan (1956)]. However, if r > s, they should be examined for a number of chosen values of α and β.

To derive explicit expressions for e_12^- and e_12^+ let ρ_1² ≤ ... ≤ ρ_s² be the roots of the determinantal equation

|J_φψ·γ J_ψψ·γ^{-1} J_ψφ·γ - ρ² J_φφ·γ| = 0.

It is well known [see, e.g., Rao (1965, pp. 493-495)] that 0 ≤ ρ_1² ≤ ρ_s² ≤ 1 and that ρ_1² > 0 when rank(J_ψφ·γ) = s. From the definition of e_12(δ) and a result in Rao (1965, p. 59) we can now conclude that

e_12^- = ρ_1² d(s, α, β)/d(r, α, β)   and   e_12^+ = ρ_s² d(s, α, β)/d(r, α, β).
It is seen that e_12^- = ρ_1² and e_12^+ = ρ_s² if r = s. Furthermore, e_12 = ρ_1² = ρ²(δ) if r = s = 1. As noted above, d(s, α, β) ≤ d(r, α, β) so that e_12^+ ≤ 1. Hence, if ARE is used as a criterion, LR_1, LM_1, and W_1 can never be superior to LR_2, LM_2, and W_2.

From the above it follows that ρ_1² ≤ ρ²(δ) ≤ ρ_s². With this in mind, consider the quantities ρ_1, ..., ρ_s. They are the limits of the conditional canonical correlations of the normalized score vectors g_ψ and g_φ after the linear effects of g_γ have been eliminated. Their squares can thus be regarded as measures of similarity between the parameter vectors ψ and φ after eliminating the effects of the parameter vector γ. If ψ and φ describe similar features of the data then ρ_1², ..., ρ_s² are all close to one and, provided that r = s, the difference between C_1 and C_2 is (asymptotically and locally) not large. An extreme case occurs when ρ_i² = 1 for all i = 1, ..., s. Then, if also r = s, statistics C_1 and C_2 are asymptotically equivalent. For LM statistics this is the case in particular when LM_1 = LM_2, which may happen even if H_1 and H_2 are two quite different hypotheses. This phenomenon is sometimes called an invariance property of the LM test and it has recently been considered in a number of
papers [see, e.g., Godfrey (1981) and Poskitt and Tremayne (1981, 1982, 1984)]. On the other hand, if ψ and φ describe very different features of the data, then at least ρ_1² is considerably below unity and there are directions in the parameter space against which the misspecified test statistics are rather inefficient. The above discussion also makes it clear that when r = s the lower ARE and upper ARE do not change when the roles of the log-likelihood functions L_1T(γ, ψ) and L_2T(γ, φ) are changed.

In the above comparisons misspecified tests are compared with optimal tests based on the true model. Davidson and MacKinnon (1987) consider the more general case where both tests may be misspecified. To keep the exposition simple we have chosen not to adopt that approach. However, the above results can readily be extended to the case of two misspecified classical tests. Let C_1* be another misspecified test statistic corresponding to C_1 and r* the number of degrees of freedom in C_1*. Analogously to e_12(δ) we can obtain e*_12(δ), the ARE of C_1* with respect to the optimal test statistic C_2. From the interpretation of the ARE in terms of relative sample sizes it is fairly obvious that the ARE of C_1 with respect to C_1* for a fixed δ equals e_12(δ)/e*_12(δ). A rigorous proof of this result can be obtained from the appendix by replacing statistic C_1 by C_1* in the derivation of e_12(δ). The corresponding lower and upper ARE can now be defined in an obvious way. The interpretations based on the multiple correlation coefficient and canonical correlations can also be modified to cover this extension.
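The quantities d(k, α, β) entering (10) were tabulated in the references cited above; nowadays they are equally easy to obtain by root-finding on the power function of the noncentral chi-square distribution. A minimal sketch (assuming scipy is available):

```python
from scipy.stats import chi2, ncx2
from scipy.optimize import brentq

def d(k, alpha, beta):
    """d(k, alpha, beta): the noncentrality at which a level-alpha chi-square
    test with k degrees of freedom has power beta, i.e. the 1 - beta fractile
    of chi2_k(d) coincides with the 1 - alpha fractile of chi2_k."""
    crit = chi2.ppf(1.0 - alpha, k)                   # 1 - alpha fractile of chi2_k
    power = lambda lam: ncx2.sf(crit, k, lam) - beta  # power minus target, increasing in lam
    return brentq(power, 1e-8, 100.0)                 # bracket wide enough for small k

# the ratio entering (10), here for s = 1, r = 2, alpha = 0.05, beta = 0.50
ratio = d(1, 0.05, 0.50) / d(2, 0.05, 0.50)
```

For these values d(1, 0.05, 0.50) ≈ 3.84 and the ratio is approximately 0.775, in agreement with the first column of table 1.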
3.3. ARE when rank(J_ψφ·γ) < s
Assume now that rank(J_ψφ·γ) < s. From the appendix it can be seen that the derivation of (10) is valid even then for any δ satisfying J_ψφ·γ δ ≠ 0. This implies that the above definition of e_12^+ is also valid unless J_ψφ·γ is a null matrix. Now suppose that δ belongs to the null space of J_ψφ·γ. As pointed out in section 3.1 the asymptotic local power of C_1 is then not greater than the size of the test, so that C_2 must be superior to C_1. Indeed, it is shown in the appendix that when J_ψφ·γ δ = 0 the sample size T_1β(δ) must necessarily tend to infinity at a faster rate than T_2β(δ). This implies that T_2β(δ)/T_1β(δ) must converge to zero so that we can define e_12(δ) = 0. On the other hand, when δ belongs to the null space of J_ψφ·γ we have λ_1(δ) = 0 and the earlier definitions yield e_12(δ) = 0 and e_12^- = 0. We can thus conclude that the various ARE concepts given in the previous subsection are well defined for any δ regardless of the rank of the matrix J_ψφ·γ.

Now, suppose that r < s. Since d(s, α, β) > d(r, α, β) it follows that e_12 may exceed one. In fact, e_12^+ > 1 if ρ_s² > d(r, α, β)/d(s, α, β). An intuitive interpretation of this is that if the low-dimensional parameter vector ψ can capture some essential features of φ sufficiently well, there exist directions in the parameter space against which C_1 is more powerful than C_2. However,
since e_12^- = 0 there also exist directions which cannot be captured by ψ. Against such directions the power of C_1 is very low. This makes applying C_1 difficult unless prior information about the relevant directions is available.

The above discussion applies when L_1T(γ, ψ) represents an underspecified model so that φ = [ψ' φ_2']'. Then g_ψ = [I_r 0] g_φ and, consequently, ρ_1² = ... = ρ_{s-r}² = 0 and ρ_{s-r+1}² = ... = ρ_s² = 1. Therefore e_12^- = 0 and e_12^+ = d(s, α, β)/d(r, α, β) > 1. If δ belongs to the null space of J_ψφ·γ, then e_12(δ) = e_12^- = 0. On the other hand, if δ belongs to the orthogonal complement of the null space of J_ψφ·γ, we have e_12(δ) = e_12^+ > 1. In the latter case C_1 is (asymptotically and locally) more powerful than C_2. Numerical values of the ARE of C_1 relative to C_2 can then be obtained from table 1 by interchanging the roles of r and s.

In general we may write δ = δ_1 + δ_2, where δ_1 belongs to the null space of J_ψφ·γ and δ_2 to its orthogonal complement. If δ_1 and δ_2 are both nonzero vectors, then 0 < λ_1(δ)/λ_2(δ) < 1. The figures in table 1 with the roles of r and s interchanged then show how small λ_1(δ)/λ_2(δ) may be for C_1 to be asymptotically at least as powerful as C_2. Now, suppose there is prior information saying that δ_2 ≠ 0 if δ ≠ 0 and that the contribution of δ_1 to ||δ|| is likely to be small. It may then be worthwhile to deliberately underspecify the null hypothesis. Das Gupta and Perlman (1974) have dealt with this topic in the case of Hotelling's T² test. Another possible example is testing for autocorrelated errors in regression models. Sometimes one may have good reasons to believe that the possible autocorrelation mainly appears at some specific lags, like one and four quarters for quarterly time series data. In such cases it may be worthwhile to use only these lags in the test even though autocorrelation at some other lags cannot totally be ruled out in advance.

4. Overspecifying the DGP
In this section we are concerned with the null hypothesis H_3 introduced in section 2. Since L_2T(γ, φ) is the log-likelihood function of the DGP, the optimal test statistics are LR_2, LM_2, and W_2 discussed in the previous section. The actually applied test statistics LR_3, LM_3, and W_3 are based on the log-likelihood function L_T(θ) and can be defined in the usual way. Since we now consider an overspecification rather than a misspecification of the DGP the conventional asymptotic theory applies. Recall that τ = [γ' ψ']' and define J_φφ·τ in the same way as J_φφ·γ. Under the sequence of local alternatives (4) we thus have

LR_3 ≈ LM_3 ≈ W_3 ~ χ²_s(λ_3(δ)),   (11)

where λ_3(δ) = δ' J_φφ·τ δ [cf. (7)]. Using (7) and (11) we can study the effects of overspecification.
Let C_3 be any one of statistics LR_3, LM_3, or W_3 and e_32(δ) the ARE of C_3 relative to C_2 for a fixed δ. Since both statistics have the same number of degrees of freedom we have e_32(δ) = λ_3(δ)/λ_2(δ). Using the definition of J_φφ·τ and a recursion formula for partial covariances [see Anderson (1958, pp. 33-34)] shows that λ_3(δ) = δ'(J_φφ·γ - J_φψ·γ J_ψψ·γ^{-1} J_ψφ·γ)δ = λ_2(δ) - λ_1(δ). Thus,

e_32(δ) = 1 - λ_1(δ)/λ_2(δ),

and following the derivations of the previous section we can obtain the lower and the upper ARE as

e_32^- = inf_δ e_32(δ) = 1 - ρ_s²   and   e_32^+ = sup_δ e_32(δ) = 1 - ρ_1²,
respectively. The expressions of e_32^- and e_32^+ indicate that the behaviour of the ARE is opposite to that in the previous section. If ψ and φ describe different features of the data, then ρ_1², ..., ρ_s² are all close to zero and the loss of efficiency in C_3 is not substantial. An intuitive interpretation of this is that conditional on γ the parameter vectors ψ and φ are nearly 'orthogonal' so that the consequences of overspecifying the model by ψ are not serious. On the other hand, if ψ and φ describe similar features of the data, then at least ρ_s² is close to one and C_3 may be rather inefficient as compared with the optimal test statistic C_2. Instead of being nearly 'orthogonal', ψ and φ are thus almost 'linearly dependent', so that including both in the model may lead to an inefficient test procedure. An extreme case occurs when ρ_1², ..., ρ_s² are all equal to one because then e_32^- = e_32^+ = e_32 = 0. In the case of LM statistics this particularly happens when the LM test is invariant to H_1 and H_2.

To demonstrate practical consequences of this theory suppose that the investigator wants to check a model specification by first applying C_1. Assume that C_1 does have a good relative power, i.e., ρ_1² is large. The value of C_1 may then turn out to be significant and one may proceed by using a model with log-likelihood function L_1T(γ, ψ). If this specification is in turn tested with C_3, the test statistic has a low relative power because 1 - ρ_1² is small. It may thus well happen that test statistic C_3 cannot reject the null hypothesis and, as a result, an incorrect model may be adopted. One might therefore argue that in addition to C_1 and C_3 it would be useful to apply C_2 as well as a test for the (true) null hypothesis ψ = ψ_* in L_T(γ, ψ, φ). Studying the outcomes of all these four tests might be helpful in the specification of the model. However, how that information should best be used is not the topic here.
We just want to point out that, if there is more than a single serious alternative to the null model, one should be careful in using sequential hypothesis testing procedures like the one based only on C, and C,.
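The bounds of this section depend only on the canonical correlations ρ₁ ≥ ... ≥ ρ_s between the two sets of (conditional) scores. As a hypothetical numerical illustration (the covariance matrices below are invented, not taken from the paper), the canonical correlations and the implied bounds 1 − ρ₁² and 1 − ρ_s² can be computed with NumPy:

```python
import numpy as np

def are_bounds(S11, S22, S12):
    """Canonical correlations between two score blocks with covariance
    matrices S11, S22 and cross-covariance S12, and the implied ARE
    bounds: the lower bound 1 - rho_1^2 and the upper bound 1 - rho_s^2."""
    # Whiten each block with the inverse Cholesky factor; the singular
    # values of the whitened cross-covariance are the canonical correlations.
    W1 = np.linalg.inv(np.linalg.cholesky(S11))
    W2 = np.linalg.inv(np.linalg.cholesky(S22))
    rho = np.linalg.svd(W1 @ S12 @ W2.T, compute_uv=False)
    return 1.0 - rho[0] ** 2, 1.0 - rho[-1] ** 2

# Invented example: one strongly and one weakly correlated direction.
S11 = np.eye(2)
S22 = np.eye(2)
S12 = np.diag([0.9, 0.1])
lower, upper = are_bounds(S11, S22, S12)
print(lower, upper)  # 0.19 (= 1 - 0.9^2) and 0.99 (= 1 - 0.1^2)
```

When all the canonical correlations approach one, both bounds collapse to zero, which is the extreme case discussed above.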
5. An application

To illustrate the theoretical results of the paper we shall consider the LM statistics introduced in section 2.1. More examples can be found in Luukkonen et al. (1988), where several LM-type linearity tests of univariate time series models are investigated. That paper also contains a simulation study which shows that ARE calculations can give useful information about the finite-sample properties of the test statistics. Consider model (3) and, for convenience, assume that Ex_t = 0. Since the impact of the difference between r and s has already been illustrated, suppose that r = s. The information matrix of σ² and the other parameters of the model is block-diagonal, so that we can assume that σ² is known. Hence,
L_T(θ) = −(2σ²)^{-1} Σ_{t=1}^{T} ε_t²,   (12)

where ε_t is interpreted as a function of the coefficients in (3). Set w_t = x_t'γ + ε_t. The normalized score vectors, evaluated under the null model, are

g_γ = σ^{-2} T^{-1/2} Σ_{t=1}^{T} x_t ε_t,

g_φ = σ^{-2} T^{-1/2} Σ_{t=1}^{T} z_t ε_t,    z_t = [w_{t-1} ... w_{t-s}]',

g_ψ = σ^{-2} T^{-1/2} Σ_{t=1}^{T} η_t ε_t,    η_t = [ε_{t-1} ... ε_{t-r}]'.

Assume that the matrix J = Egg' is nonsingular. Set Γ_x = Ex_t x_t', Γ_z = Ez_t z_t', and Γ_{zx} = Ez_t x_t'. By the stationarity of the model these matrices do not depend on t. From (12) one readily obtains J_{γγ} = σ^{-2}Γ_x, J_{φγ} = σ^{-2}Γ_{zx}, and J_{ψφ} = J_{ψψ} = I_s. It follows that J_{φφ·γ} = σ^{-2}Γ_{z·x}, where Γ_{z·x} = Γ_z − Γ_{zx}Γ_x^{-1}Γ_{zx}', and J_{ψψ·γ} = I_s, so that

e₁₂(δ) = σ² δ'Γ_{z·x}^{-1}δ / δ'δ.   (13)
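As a numerical sanity check of (13) in the scalar case r = s = 1 (a sketch with invented parameter values), note that Γ_{z·x} = γ'Γ_xγ + σ² − γ'Γ_x(1)'Γ_x^{-1}Γ_x(1)γ there, and the expression σ²δ'Γ_{z·x}^{-1}δ/δ'δ agrees with formula (14) of the next paragraph:

```python
# Invented values: m = 1 regressor, coefficient c = gamma,
# variance g = Gamma_x, lag-one autocovariance g1 = Gamma_x(1).
c, g, g1, sigma2 = 0.8, 2.0, 1.2, 1.0

# Partial variance of z_t = w_{t-1} given x_t, i.e., the scalar Gamma_{z.x}.
gamma_zx = c**2 * g + sigma2 - (c * g1) ** 2 / g
e12_direct = sigma2 / gamma_zx          # (13) with r = s = 1

# The same ARE via (14): R^2 and nu^2 as defined in the text.
R2 = c**2 * g / (sigma2 + c**2 * g)     # squared multiple correlation
nu2 = g1**2 / g**2                      # squared lag-one autocorrelation
e12_formula = (1 - R2) / (1 - nu2 * R2)

print(abs(e12_direct - e12_formula) < 1e-9)  # True
```

The agreement is exact algebraically; the numerical check merely guards against transcription slips.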
Table 2
Values of e₁₂ in (14) for some choices of R² and |ν|.

R²      |ν|     e₁₂
0.90    0.95    0.53
0.90    0.80    0.24
0.90    0.60    0.15
0.70    0.95    0.81
0.70    0.80    0.54
0.70    0.60    0.40
0.70    0.30    0.32
0.50    0.95    0.91
0.50    0.80    0.74
0.50    0.60    0.61
0.50    0.30    0.52
0.30    0.95    0.96
0.30    0.80    0.87
0.30    0.60    0.78
To investigate (13) note first that Γ_{z·x} is the partial covariance matrix of z_t conditional on x_t. If γ = 0, then z_t = η_t and (13) attains its maximum value one, as it should, because (1) and (2) are identical when γ = 0. A general conclusion might be that the smaller the absolute value of the multiple correlation coefficient between y_t and x_t, the larger the efficiency of the misspecified statistic. On the other hand, if δ, γ, and σ² are fixed, the value of (13) is minimized when Γ_{zx} = 0. If Γ_x(j) = Ex_t x_{t-j}', this is equivalent to Γ_x(j)γ = 0 for all j = 1, ..., s, so that a sufficient condition is that x_t is serially uncorrelated. Strong autocorrelation in x_t may thus be expected to be favourable to the efficiency of the misspecified statistic. Consider the case r = s = 1, so that (13) is independent of δ. Let R² = γ'Γ_xγ/(σ² + γ'Γ_xγ) be the squared multiple correlation coefficient between y_t and x_t in the null model. A straightforward calculation shows that the ARE of LM₁ relative to LM₂ is

e₁₂ = (1 − R²)/(1 − ν²R²),   (14)

where ν² = γ'Γ_x(1)'Γ_x^{-1}Γ_x(1)γ / γ'Γ_xγ. By the definition 0 ≤ ν₁² ≤ ν² ≤ ν_m² ≤ 1, where ν₁ (ν_m) is the smallest (largest) canonical correlation coefficient between x_t and x_{t+1}. If R² = 0, then e₁₂ = 1, as already noted. If R² > 0 is fixed and ν² is chosen sufficiently close to one, the value of (14) is also close to one. On the other hand, when ν² < 1 is fixed and R² approaches one, the value of (14) tends to zero. Furthermore, e₁₂ ≥ 1 − R², with equality if and only if ν² = 0, i.e., Γ_x(1)γ = 0. In general we can say that (14) can range between very low and very high values. Table 2 gives examples for some choices of R² and |ν|. Note that if m = 1, ν² is independent of γ and equals the squared autocorrelation coefficient of x_t at lag one. An implication of the very low values in table 2 is that in general LM₁ is an unreliable overall test for the two hypotheses. From the relatively high values one can conclude that a significant value of LM₁ may well be caused by serially correlated errors. Now suppose that on the basis of a significant value of LM₁ or otherwise one has adopted model (1). One may want to proceed by testing the adequacy of (1) against (3). The alternative is overspecified as it contains the lagged
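The entries of table 2 follow directly from (14); a short script (a sketch; the grid here is rectangular and thus slightly larger than the table) reproduces them:

```python
# e12 = (1 - R^2) / (1 - nu^2 * R^2), formula (14) of the text.
def e12(R2, nu):
    return (1 - R2) / (1 - nu**2 * R2)

for R2 in (0.90, 0.70, 0.50, 0.30):
    for nu in (0.95, 0.80, 0.60, 0.30):
        print(f"R^2 = {R2:.2f}  |nu| = {nu:.2f}  e12 = {e12(R2, nu):.2f}")
```

For instance, e12(0.90, 0.95) rounds to 0.53 and e12(0.30, 0.95) to 0.96, matching the corresponding entries of the table.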
dependent variables. To illustrate the effects of this overspecification, suppose that r = s = 1 and compare LM₃, the statistic actually applied, with the optimal test statistic LM₂. From section 4 we obtain e₃₂ = 1 − e₁₂, so that the ARE of LM₃ is low (high) when that of LM₁ is high (low). From table 2 it can be seen that in a number of cases the overspecification can reduce the asymptotic efficiency of the test by as much as 50%. Usually LM₃ is computed only when the estimate of ψ is significantly different from zero. However, a significant estimate of ψ is likely to occur when e₁₂ is large or, equivalently, when e₃₂ is small. Hence, if one first estimates (1) and obtains a significant estimate of ψ, the chances of finding the DGP by only testing for serial correlation in the disturbances may not be very good.

6. Conclusions

Using Pitman's local alternatives and the concept of ARE we have given a general account of the performance of the classical likelihood-based test statistics in situations where the alternative hypothesis may be either a misspecification or an overspecification of the DGP. Bounds of the ARE independent of the direction of the local alternatives are also provided. All the ARE results are derived without the frequently used assumption that the tests have the same number of degrees of freedom. In fact, the expression of the ARE and its bounds are shown to depend on two ratios. The first one describes the effect of the difference between the degrees of freedom. The other ratio can conveniently be interpreted by using a multiple correlation coefficient and certain canonical correlations related to the model specifications under consideration. The results of the paper are particularly relevant when the classical tests are used in the specification of an econometric model. Very detailed information about the DGP is then rarely available, and applying a test with a misspecified or overspecified alternative may be the rule rather than the exception.
The example on the dynamic specification of a regression model demonstrates that, relative to an optimal test, the ARE of both a misspecified and a merely overspecified test can range between very low and very high values. This being the case, one should be careful in applying a series of tests sequentially. The choice of the hypotheses may have a substantial influence on the power properties of such a test procedure and thus on the conclusions. Finally, the results of this paper can be applied in designing informative simulation experiments. For instance, in order to make power comparisons between a misspecified test and an optimal test in small samples, it is of interest to find alternatives against which the misspecified test is likely to have low or high power. ARE calculations may help to find such alternatives more easily than by just performing simulation experiments and gaining information by trial and error. Luukkonen et al. (1988) contains several examples of this.
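For the example of section 5, one way to design such experiments is to invert (14): given a target ARE e₁₂ and a value of ν, the required squared multiple correlation is R² = (1 − e₁₂)/(1 − e₁₂ν²). A minimal sketch (the target values below are invented):

```python
def r2_for_target_are(e12_target, nu):
    """Invert (14): the R^2 at which the misspecified test attains ARE
    e12_target when the canonical correlation parameter is nu."""
    return (1 - e12_target) / (1 - e12_target * nu**2)

# Invented design targets: alternatives where the misspecified test
# should be weak (ARE 0.2) and strong (ARE 0.9), at a fixed |nu| = 0.8.
for target in (0.2, 0.9):
    print(f"target ARE {target}: R^2 = {r2_for_target_are(target, 0.8):.3f}")
```

Simulating power at the resulting alternatives then gives the low- and high-power comparisons directly, rather than by trial and error.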
Appendix

The asymptotic equivalence of the classical test statistics is established in Davidson and MacKinnon (1987), so that giving the proofs for the LM statistics will suffice. We shall make use of the facts that θ̃ is a strongly consistent estimator of θ₀ and that θ₀ − θ* = O(T^{-1/2}) [see Holly (1982)]. Suppose that rank(J_{φψ·γ}) = s. We shall first derive the asymptotic distribution of LM₁(T₁β(δ)) when (4) holds. For notational convenience write T₁ = T₁β(δ) and note that we now have T₁ observations in the test statistic and T in (4). When the score vectors g_γ(θ), g_φ(θ), ... are based on T₁ observations they are denoted by g_γ^{(1)}(θ), g_φ^{(1)}(θ), .... For brevity we also introduce
J_{γγ}^{(1)}(θ) = −T₁^{-1} ∂²L_{T₁}(θ)/∂γ∂γ',

and define J_{γφ}^{(1)}, J_{γψ}^{(1)}, ... analogously. Using (4) and the assumed regularity conditions on L_T(θ) we obtain the Taylor series expansion

T₁^{-1/2} g_φ^{(1)}(θ̃) = T₁^{-1/2} g_φ^{(1)}(θ₀) − J_{φγ}^{(1)}(θ₁*) T₁^{1/2}(γ̃ − γ₀) + (T₁/T)^{1/2} J_{φψ}^{(1)}(θ₁*) δ,   (15)

where ‖θ₁* − θ₀‖ ≤ ‖θ̃ − θ₀‖. Similarly, because g_γ^{(1)}(θ̃) = 0,

0 = T₁^{-1/2} g_γ^{(1)}(θ₀) − J_{γγ}^{(1)}(θ₃*) T₁^{1/2}(γ̃ − γ₀) + (T₁/T)^{1/2} J_{γψ}^{(1)}(θ₂*) δ,   (16)

so that

T₁^{1/2}(γ̃ − γ₀) = J_{γγ}^{(1)}(θ₃*)^{-1} [T₁^{-1/2} g_γ^{(1)}(θ₀) + (T₁/T)^{1/2} J_{γψ}^{(1)}(θ₂*) δ].   (17)

Here the vectors θ₂* and θ₃* satisfy ‖θ₂* − θ₀‖ ≤ ‖θ̃ − θ₀‖ and ‖θ₃* − θ̃‖ ≤ ‖θ₀ − θ̃‖, respectively. Since θ̃ is a strongly consistent estimator of θ₀ and θ₀ − θ* = O(T^{-1/2}), the vectors θᵢ*, i = 1, 2, 3, converge to θ* almost surely. By the assumed regularity conditions the matrix J_{γγ}^{(1)}(θ₃*), for instance, then converges to J_{γγ} almost surely [see the appendix of Holly (1982)]. Hence, when T₁ is large enough, J_{γγ}^{(1)}(θ₃*) is nonsingular, so that combining (15), (16), and (17) yields

T₁^{-1/2} g_φ^{(1)}(θ̃) = T₁^{-1/2} g_φ^{(1)}(θ₀) − J_{φγ}^{(1)}(θ₁*) J_{γγ}^{(1)}(θ₃*)^{-1} T₁^{-1/2} g_γ^{(1)}(θ₀) + (T₁/T)^{1/2} J̄_{φψ}^{(1)} δ,   (18)

where

J̄_{φψ}^{(1)} = J_{φψ}^{(1)}(θ₁*) − J_{φγ}^{(1)}(θ₁*) J_{γγ}^{(1)}(θ₃*)^{-1} J_{γψ}^{(1)}(θ₂*).
The assumed regularity conditions include T^{-1/2}g(θ₀) → N(0, J).² This can be used to show that the asymptotic distribution of the sum of the first two terms on the right-hand side of (18) is N(0, J_{φφ·γ}). Since J̄_{φψ}^{(1)}δ converges to J_{φψ·γ}δ ≠ 0 almost surely, it is clear from (18) that T₁/T must be bounded both from above and away from zero for α < β < 1 to hold. Furthermore, the asymptotic distribution of the variable on the right-hand side of (18) is N(e₁(δ)^{1/2}J_{φψ·γ}δ, J_{φφ·γ}), where e₁(δ)^{1/2} = lim_T (T₁β(δ)/T)^{1/2}. It will be shown below that this limit really exists. We can thus conclude that
LM₁(T₁β(δ)) ~ χ_r²(e₁(δ)λ₁(δ))   (19)

asymptotically.
Denote by G_r(x; λ) the distribution function of a variable having a χ_r²(λ) distribution. Below we will write e₁ = e₁(δ), λ₁ = λ₁(δ), and d₁ = d(r, α, β), since δ, α, and β are regarded as fixed. Let a_r be the asymptotic critical value of LM₁, so that lim_T P{LM₁(T) ≥ a_r} = α when δ = 0, and denote by a_{rT} the corresponding exact critical value for sample size T. The distribution function of LM₁(T₁) converges to G_r(x; e₁λ₁) uniformly in x [see Rao (1965, p. 100)]. Therefore lim_T a_{rT₁} = a_r and P{LM₁(T₁) ≥ a_{rT₁}} = 1 − G_r(a_r; e₁λ₁) + o(1). Using (8) and the definition of d₁ thus yields

β ≤ P{LM₁(T₁β(δ)) ≥ a_{rT₁}} = 1 − G_r(G_r^{-1}(1 − α; 0); e₁λ₁) + o(1)
  = 1 − G_r(G_r^{-1}(1 − β; d₁); e₁λ₁) + o(1).   (20)

On the other hand, replacing T₁β(δ) in the second term of (20) by T₁β(δ) − 1 and using (9) gives β ≥ 1 − G_r(G_r^{-1}(1 − β; d₁); e₁λ₁) + o(1). Hence G_r^{-1}(1 − β; e₁λ₁) = G_r^{-1}(1 − β; d₁), implying that e₁(δ) = d(r, α, β)/λ₁(δ). The above reasoning can also be used to show that T₁β(δ)/T really converges. Since T₁β(δ)/T is bounded, we can define lim sup_T T₁β(δ)/T = ē₁(δ) and lim inf_T T₁β(δ)/T = e̲₁(δ), say. Analogues of (19) can then be obtained with e₁(δ) replaced by ē₁(δ) and e̲₁(δ). Furthermore, the above arguments can be repeated to show that both ē₁(δ) and e̲₁(δ) are equal to d(r, α, β)/λ₁(δ). This shows that e₁(δ) is well defined. Now consider LM₂. In the same way as above one can show that e₂(δ) = lim_T(T₂β(δ)/T) = d(s, α, β)/λ₂(δ). The expressions of e₁(δ) and e₂(δ) can be derived for any sequences T_iβ(δ) satisfying (8) and (9). Independently of the choice of T₁β(δ) and T₂β(δ) we therefore have lim_T T₂β(δ)/T₁β(δ) = e₂(δ)/e₁(δ), which proves (10).
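The power expression in (20) is easy to evaluate numerically; the following is a Monte Carlo sketch for r = 1 (the noncentrality values are invented, and 3.8415 = G₁^{-1}(0.95; 0) is the familiar 5% critical value), using the fact that a noncentral χ₁²(λ) variable can be simulated as (Z + √λ)² with Z standard normal:

```python
import numpy as np

rng = np.random.default_rng(12345)

def power_chi2_1(lam, crit=3.8415, n=400_000):
    """Monte Carlo estimate of 1 - G_1(crit; lam): the probability that a
    noncentral chi-square(1, lam) variable, simulated as (Z + sqrt(lam))^2
    with Z standard normal, exceeds the critical value crit."""
    z = rng.standard_normal(n)
    return np.mean((z + np.sqrt(lam)) ** 2 > crit)

# Invented noncentralities: an ARE of 0.5 halves lam and lowers power.
print(power_chi2_1(10.0))  # close to 0.885
print(power_chi2_1(5.0))   # close to 0.609
```

This mirrors how e₁λ₁ enters (19) and (20): scaling the noncentrality by the ARE translates directly into a loss of asymptotic power.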
appears with J positive definite, form [see, e.g., Silvey (1959)].
However,
it is not unreason-
Finally, suppose that rank(J_{φψ·γ}) < s and that δ belongs to the null space of J_{φψ·γ}. Since J̄_{φψ}^{(1)}δ now converges to a null vector almost surely, it immediately follows from (18) that a necessary condition for LM₁(T₁β(δ)) to asymptotically achieve the power β > α is that T₁β(δ) tends to infinity faster than T. Since the results on LM₂(T₂β(δ)) are also valid when J_{φψ·γ}δ = 0, it follows that T₂β(δ)/T₁β(δ) must converge to zero.

References

Anderson, T.W., 1958, An introduction to multivariate statistical analysis (Wiley, New York, NY).
Burguete, J.F., A.R. Gallant, and G. Souza, 1982, On unification of the asymptotic theory of nonlinear econometric models, Econometric Reviews 1, 151-190.
Cox, D.R. and D.V. Hinkley, 1974, Theoretical statistics (Chapman and Hall, London).
Das Gupta, S. and M.D. Perlman, 1974, Power of the noncentral F-test: Effect of additional variate on Hotelling's T²-test, Journal of the American Statistical Association 69, 174-180.
Davidson, R. and J.G. MacKinnon, 1985, The interpretation of test statistics, Canadian Journal of Economics 18, 38-57.
Davidson, R. and J.G. MacKinnon, 1987, Implicit alternatives and the local power of test statistics, Econometrica 55, 1305-1329.
Godfrey, L.G., 1978, Testing against general autoregressive and moving average error models when the regressors include lagged dependent variables, Econometrica 46, 1293-1301.
Godfrey, L.G., 1981, On the invariance of the Lagrange multiplier test with respect to certain changes in the alternative hypothesis, Econometrica 49, 1443-1455.
Hannan, E.J., 1956, The asymptotic powers of certain tests based on multiple correlations, Journal of the Royal Statistical Society B 18, 227-233.
Harter, H.L. and D.B. Owen, 1970, Selected tables in mathematical statistics, sponsored by the Institute of Mathematical Statistics (Markham, Chicago, IL).
Hayman, G.E., Z. Govindarajulu, and F.C. Leone, 1962, Unpublished report of Case Institute of Technology.
Holly, A., 1982, A remark on Hausman's specification test, Econometrica 50, 749-759.
Kendall, M.G. and A. Stuart, 1979, The advanced theory of statistics, Vol. 2, 4th ed. (Griffin, London).
Luukkonen, R., P. Saikkonen, and T. Teräsvirta, 1988, Testing linearity in univariate time series models, Scandinavian Journal of Statistics 15, 161-175.
Mizon, G.E., 1977, Inferential procedures in nonlinear models: An application in a UK industrial cross section study of factor substitution and returns to scale, Econometrica 45, 1221-1242.
Newey, W.K., 1985, Maximum likelihood specification testing and conditional moment tests, Econometrica 53, 1047-1070.
Pearson, E.S. and H.O. Hartley, eds., 1972, Biometrika tables for statisticians, Vol. II (The Syndics of the Cambridge University Press, Cambridge).
Poskitt, D.S. and A.R. Tremayne, 1981, An approach to testing linear time series models, Annals of Statistics 9, 974-986.
Poskitt, D.S. and A.R. Tremayne, 1982, Diagnostic tests for multiple time series models, Annals of Statistics 10, 114-120.
Poskitt, D.S. and A.R. Tremayne, 1984, Testing misspecification in vector time series models with exogenous variables, Journal of the Royal Statistical Society B 46, 304-315.
Rao, C.R., 1965, Linear statistical inference and its applications (Wiley, New York, NY).
Rothe, G., 1981, Some properties of the asymptotic relative Pitman efficiency, Annals of Statistics 9, 663-669.
Silvey, S.D., 1959, The Lagrangian multiplier test, Annals of Mathematical Statistics 30, 389-407.
Tauchen, G.E., 1985, Diagnostic testing and evaluation of maximum likelihood models, Journal of Econometrics 30, 415-443.
Wieand, H.S., 1976, A condition under which the Pitman and Bahadur approaches to efficiency coincide, Annals of Statistics 4, 1003-1011.