Journal of Econometrics 44 (1990) 41-66. North-Holland

A UNIFIED APPROACH TO ESTIMATION AND ORTHOGONALITY TESTS IN LINEAR SINGLE-EQUATION ECONOMETRIC MODELS*

M. Hashem PESARAN
Trinity College, Cambridge, United Kingdom
University of California, Los Angeles, CA 90024-1477, USA

Richard J. SMITH
University of Manchester, Manchester, United Kingdom
Maximum-likelihood estimation is considered for a generalisation of the model of Anderson and Rubin (1949) in which the exogenous variables in the structural equation may not be included in the reduced-form equations. Classical and specification tests are derived for orthogonality hypotheses. A necessary and sufficient condition for their equivalence is presented. The classical tests are compared using Bahadur's asymptotic relative efficiency criterion. It is shown that a generalisation of the Durbin-Wu-Hausman $T_2$ statistic is asymptotically Bahadur-efficient.
*The authors are grateful to Les Godfrey, Alberto Holly, Peter Phillips, and anonymous referees for their helpful comments on an earlier version of this paper.

0304-4076/90/$3.50 © 1990, Elsevier Science Publishers B.V. (North-Holland)

1. Introduction

Following the pioneering work of Reiersol (1941, 1945) and Geary (1949), the method of instrumental variables has played an important role in the estimation of stochastic relations in circumstances where the 'orthogonality condition' is violated; that is, when some or all of the explanatory variables are correlated with the disturbance. The application of the instrumental-variables (IV) technique is, however, subject to two important considerations: (i) whether the chosen instruments are 'admissible' in the sense of being uncorrelated with the equation disturbance, and (ii) whether the available admissible set of instruments is used 'efficiently' if there are more instruments than the minimum number required for consistent estimation of the unknown parameters. Both of these issues have been extensively researched in the literature. The problem of the efficient use of instruments has been considered by Durbin (1954), Sargan (1958, 1959), and Hansen (1982); and the issue of instrument admissibility has been examined, primarily in the context of simultaneous-equations systems, by a number of authors, notably Wu (1973, 1974), Revankar and Hartley (1973), Revankar (1978), Hausman (1978), Hwang (1980), Richard (1980, 1984), Engle (1982), Holly (1982a, b), Smith (1983b), and Lubrano, Pierse, and Richard (1986).

In this paper we present a unified treatment of the dual problems of instrument admissibility and instrument efficiency and provide an asymptotic power comparison of the various exogeneity tests proposed in the literature, using Bahadur's (1960, 1967) criteria of asymptotic relative efficiency. Bahadur's approach is particularly useful here as it allows for a relatively simple method of comparing test statistics that are asymptotically indistinguishable by the more familiar Pitman local power criterion.

Our unification is based on a likelihood framework that differs in two respects from the conventional formulation of Anderson and Rubin (1949) exploited extensively in the literature. Firstly, it is more general as it permits the endogenous variables in the reduced-form model to be determined by different sets of exogenous variables, a situation that frequently arises in rational-expectations models [cf. Turkington (1985) and Pesaran (1986)]. Secondly, our choice of the factorisation of the likelihood function and the associated parameterisation allows important simplifications in the derivation of the maximum-likelihood (ML) estimators and in the generation and comparison of the various classical and Wu-Hausman specification tests proposed in the literature for testing the independence of some or all of the stochastic regressors and the equation disturbance.

The plan of the paper is as follows: Section 2 sets out the general likelihood framework and describes how it differs from the conventional framework of Anderson and Rubin (1949). Section 3 derives the ML estimators and provides a generalisation of the K-class estimator.
Section 4 deals with the issue of instrument admissibility and derives classical tests for the independence of a subset of stochastic regressors and the equation disturbance. A comparison of classical and specification test statistics is provided in section 5, where the conditions under which these two types of tests are asymptotically equivalent are discussed. Section 6 gives an analysis of the asymptotic power characteristics of the various classical exogeneity test statistics using Bahadur's approximate slope approach.

2. Formulation of the problem: A likelihood framework

Consider the linear stochastic relation

$$y_t = \alpha' x_t + \beta' z_t + u_t = \gamma' w_t + u_t, \qquad t = 1, \ldots, n, \tag{2.1}$$

where $\alpha$ and $\beta$ are $k$- and $r$-vectors of fixed unknown structural parameters; or in matrix notation,

$$y = X\alpha + Z\beta + u = W\gamma + u. \tag{2.1'}$$

The vector of stochastic variables $x_t$ is generated according to the relation

$$x_t = B' h_t + v_t, \qquad t = 1, \ldots, n, \tag{2.2}$$

or

$$X = HB + V. \tag{2.2'}$$
Eq. (2.2) generalises the more familiar Anderson and Rubin (1949) limited-information simultaneous-equations model in allowing $z_t$ and $h_t$ to be $r$- and $s$-vectors of possibly distinct elements. In addition to the usual limited-information simultaneous-equations system where $z_t \subset h_t$, model (2.1)-(2.2) arises in other circumstances, notably in rational-expectations (RE) models. For example, the RE model

$$y_t = \alpha' x_t^e + \beta' z_t + \varepsilon_t, \tag{2.3}$$

where $x_t$ is determined as in (2.2) and $x_t^e = B'h_t = x_t - v_t$, can be written in the form of (2.1) with $u_t = \varepsilon_t - \alpha' v_t$ [see, inter alia, Pagan (1984) and Pesaran (1987, ch. 7)]. In the RE contexts in which (2.2) and (2.3) represent behavioural equations for economic agents with heterogeneous information, $z_t$ need not be contained in $h_t$. Another case of model (2.1) and (2.2) in which $z_t$ is not fully included in $h_t$ occurs when $y_t$ is assumed to Granger 'noncause' $x_t$; that is, when $z_t$ includes lagged values of $y_t$, whereas $h_t$ does not.

The analysis that follows is based on the following assumptions:

Assumption 1. Conditional on $z_t$, $h_t$, and $\Omega_{t-1}$, the $(k + 1) \times 1$ disturbance vector $\xi_t = (u_t, v_t')'$ is distributed as $N(0, \Sigma_{\xi\xi})$, where $\Sigma_{\xi\xi}$ is a nonsingular matrix and $\Omega_{t-1}$ is the information set on $\{y_s, x_s, z_s, h_s\}$ prior to period $t$.

Assumption 2. The parameters $\gamma$, $B$, and $\Sigma_{\xi\xi}$ are identified and $B$ has full column rank.

Assumption 3. Observation matrices $Z$ and $H$ are each of full column rank and the moment matrices $\Sigma_{zz} = E(Z'Z/n)$ and $\Sigma_{hh} = E(H'H/n)$ are positive definite.

Assumption 1 can be relaxed in a number of ways to allow for serial correlation and conditional heteroskedasticity in the $\{\xi_t\}$ process. The normality assumption is also not essential for the asymptotic theory developed below. Under Assumption 1, we have the following:
$$u_t \mid z_t, h_t, \Omega_{t-1} \sim N(0, \sigma^2), \tag{2.4a}$$

$$x_t \mid u_t, z_t, h_t, \Omega_{t-1} \sim N(B'h_t + \delta u_t, \psi), \tag{2.4b}$$

where $\Sigma_{\xi\xi}$ is partitioned conformably with $\xi_t = (u_t, v_t')'$, $\delta = \sigma^{-2}\Sigma_{vu}$, and $\psi = \Sigma_{vv} - \sigma^{-2}\Sigma_{vu}\Sigma_{uv}$.¹ This factorisation has been exploited by Richard (1984) and Lubrano et al. (1986) and independently in Pesaran (1984). Let $\theta = (\gamma', \delta', \sigma^2, (\operatorname{vec} B)', (\operatorname{vec} \psi)')'$ be the unknown parameter vector of model (2.4) and denote its log-likelihood function for the $n$ observations on $y_t$, $x_t$, $z_t$, $h_t$, and $\Omega_{t-1}$ by $l(\theta)$. Then under Assumptions 1-3 we have

$$l(\theta) = \sum_{t=1}^{n} f(\xi_t \mid z_t, h_t, \Omega_{t-1}),$$

where $f(\xi_t \mid z_t, h_t, \Omega_{t-1})$ is the conditional log-density function of $\xi_t$.² Now for the system (2.1)-(2.2) in matrix notation, we obtain

$$l(\theta) \propto -\frac{n}{2}\log\sigma^2 - \frac{1}{2}(y - W\gamma)'(y - W\gamma)/\sigma^2 - \frac{n}{2}\log|\psi| - \frac{1}{2}\operatorname{tr}\{\psi^{-1}(X - HB - u\delta')'(X - HB - u\delta')\}.$$

¹The more conventional framework of Anderson and Rubin (1949) exploited by many authors [see, inter alia, Holly and Sargan (1982) and Smith (1983b)] would replace (2.4) by $y_t \mid x_t, z_t, h_t, \Omega_{t-1} \sim N(\gamma'w_t + \eta'v_t, \omega^2)$, $x_t \mid z_t, h_t, \Omega_{t-1} \sim N(B'h_t, \Sigma_{vv})$, where $\eta = \Sigma_{vv}^{-1}\Sigma_{vu}$ and $\omega^2 = \sigma^2 - \Sigma_{uv}\Sigma_{vv}^{-1}\Sigma_{vu}$. However, as will be evident from later sections, formulation (2.4) allows substantial simplifications in the derivations of the ML estimators and the generation of orthogonality tests.

²Here we have followed the convention and set the likelihood for $\{\xi_s, s \le 0\}$ equal to unity. See Hall and Heyde (1980, p. 157).
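The matrix form of the log-likelihood above can be evaluated numerically. The following is a minimal sketch, assuming NumPy arrays for the data and parameters; the function name `log_likelihood` and its argument order are illustrative choices, not part of the paper.

```python
import numpy as np

def log_likelihood(gamma, delta, sigma2, B, psi, y, X, Z, H):
    """Log-likelihood of the factorised model (2.4), up to an additive constant.

    Shapes assumed here: y (n,), X (n, k), Z (n, r), H (n, s),
    gamma (k + r,), delta (k,), B (s, k), psi (k, k).
    """
    n = y.shape[0]
    W = np.hstack([X, Z])                    # W = (X, Z)
    u = y - W @ gamma                        # structural disturbance
    E = X - H @ B - np.outer(u, delta)       # X - HB - u delta'
    _, logdet_psi = np.linalg.slogdet(psi)   # log|psi|
    quad = np.trace(np.linalg.solve(psi, E.T @ E))
    return (-0.5 * n * np.log(sigma2)
            - 0.5 * (u @ u) / sigma2
            - 0.5 * n * logdet_psi
            - 0.5 * quad)
```

Such a function could be handed to a generic numerical optimiser, although the first-order conditions of section 3 give a more direct route to the estimator.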
3. Maximum-likelihood and other estimators

Let the ML estimator of $\theta$ be $\hat\theta$; then the first-order conditions $\partial l(\hat\theta)/\partial\theta = 0$ yield

$$W'\hat u = \hat\sigma^2 W'(X - H\hat B - \hat u\hat\delta')\hat\psi^{-1}\hat\delta, \tag{3.1}$$

$$(X - H\hat B)'\hat u = \hat\delta(\hat u'\hat u), \tag{3.2}$$

$$\hat\sigma^2 = \hat u'\hat u/n, \tag{3.3}$$

$$\hat B = (H'H)^{-1}H'(X - \hat u\hat\delta'), \tag{3.4}$$

$$n\hat\psi = (X - H\hat B - \hat u\hat\delta')'(X - H\hat B - \hat u\hat\delta'), \tag{3.5}$$

where $\hat u = y - W\hat\gamma$. Combining (3.2) and (3.4) yields

$$n\hat\psi = X'(X - H\hat B - \hat u\hat\delta'). \tag{3.6}$$

Now using this result together with the equations in (3.1) corresponding to $X$ we have

$$n\hat\sigma^2\hat\delta = X'\hat u. \tag{3.7}$$

This result, as will be seen in section 4, forms the basis of most of the orthogonality tests proposed in the literature. Eqs. (3.7) and (3.2) can also be used to obtain

$$\hat B'H'\hat u = 0, \tag{3.8}$$

which forms the basis of the various IV estimators for $\gamma$ proposed in the literature. For example, when $Z$ is in $H$, Sargan's (1958) generalised IV estimator results by replacing $\hat B$ in (3.8) with the moment estimator for $B = \Sigma_{hh}^{-1}\Sigma_{hx}$, namely $\bar B = (H'H)^{-1}H'X$. In the general case when $Z$ may not be in $H$, the ML estimator of $\gamma$ satisfies the relation

$$W'(\hat Q_h + \hat\kappa^2 S_h)\hat u = 0, \tag{3.9}$$
where

$$\hat Q_h = P_h - \hat\rho^2 I_n, \tag{3.10a}$$

$$P_h = H(H'H)^{-1}H' = I_n - M_h, \tag{3.10b}$$

$$\hat\rho^2 = \hat u'P_h\hat u/\hat u'\hat u, \tag{3.10c}$$

$$S_h = M_h - M_hX(X'M_hX)^{-1}X'M_h, \tag{3.10d}$$

$$\hat\lambda^2 = \hat u'S_h\hat u/\hat u'\hat u, \tag{3.10e}$$

$$\hat\kappa^2 = (1 - \hat\rho^2)/\hat\lambda^2. \tag{3.10f}$$

See appendix A for a derivation. Eq. (3.9) is a generalised K-class estimator generating equation for $\gamma$, which may be written as

$$\hat\gamma = (W'\hat R_hW)^{-1}W'\hat R_hy, \tag{3.11}$$

where

$$\hat R_h = \hat Q_h + \hat\kappa^2 S_h. \tag{3.12}$$
Because of the dependence of $\hat R_h$ on $\hat\gamma$, the computation of the ML estimators needs to be done iteratively. However, an asymptotically fully efficient estimator for $\gamma$ is provided by the following two-step procedure.

Step 1. Compute Sargan's (1958) generalised IV estimator $\bar\gamma = (W'P_hW)^{-1}W'P_hy$, the IV residuals $\bar u = y - W\bar\gamma$, and $\bar\rho^2$, $\bar\lambda^2$, and $\bar\kappa^2$, by replacing $\hat u$ in (3.10) with $\bar u$.

Step 2. Compute $\bar R_h$ via (3.12) using the estimates from step 1, and hence the two-step estimator,³

$$\hat\gamma_{2S} = (W'\bar R_hW)^{-1}W'\bar R_hy.$$

³Notice that $W'(\hat R_h - \bar R_h)W/n = o_p(1)$, $W'(\hat R_h - \bar R_h)y/n^{1/2} = o_p(1)$ as $\hat\rho^2 - \bar\rho^2 = o_p(n^{-1/2})$ and $\hat\lambda^2 - \bar\lambda^2 = o_p(1)$. Step 2 may be further simplified by using the asymptotically equivalent form of $\bar R_h$, namely $P_h + \bar\lambda^{-2}S_h$, thus allowing the computation of the two-step estimator to be carried out on most econometric packages by means of auxiliary regressions.
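The two-step procedure can be sketched in a few lines of NumPy. This is an illustrative implementation under the definitions in (3.10)-(3.12); the names `two_step_estimator` and `proj` are hypothetical, and the explicit $n \times n$ matrices are used for readability rather than efficiency.

```python
import numpy as np

def proj(A):
    """Orthogonal projection matrix onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

def two_step_estimator(y, X, Z, H):
    """Return (step-1 generalised IV estimate, two-step estimate) of gamma."""
    n = y.shape[0]
    W = np.hstack([X, Z])
    P_h = proj(H)
    M_h = np.eye(n) - P_h
    MX = M_h @ X
    # S_h = M_h - M_h X (X' M_h X)^{-1} X' M_h, as in (3.10d)
    S_h = M_h - MX @ np.linalg.solve(X.T @ MX, MX.T)

    # Step 1: Sargan's generalised IV estimator and its residuals
    g_iv = np.linalg.solve(W.T @ P_h @ W, W.T @ P_h @ y)
    u = y - W @ g_iv
    rho2 = (u @ P_h @ u) / (u @ u)       # (3.10c) evaluated at the IV residuals
    lam2 = (u @ S_h @ u) / (u @ u)       # (3.10e) evaluated at the IV residuals
    kappa2 = (1.0 - rho2) / lam2         # (3.10f)

    # Step 2: K-class weighting matrix (3.12) and the estimator (3.11)
    R = P_h - rho2 * np.eye(n) + kappa2 * S_h
    g_2s = np.linalg.solve(W.T @ R @ W, W.T @ R @ y)
    return g_iv, g_2s
```

When $Z$ lies in the column space of $H$, $S_hW = 0$ and the second step only changes the estimate through the small term $\bar\rho^2 I_n$; the $S_h$ term matters when $Z$ is not contained in $H$.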
To consider the relationship between $\hat\gamma$ and other estimators in the literature it is useful to examine the concentrated log-likelihood function

$$l(\gamma) \propto -\frac{n}{2}\{\log|X'M_hX/n| - \log(1 - \rho^2) + \log(\sigma^2\lambda^2)\},$$

derived as (A.5) in appendix A. In the standard case when $Z$ is in $H$, we have

$$n\sigma^2\lambda^2 = u'S_hu = y'S_hy,$$

for all values of $\gamma$, and thus $l(\gamma)$ will be a monotonically decreasing function in $\rho^2$. Minimising $\rho^2$ in terms of $\gamma$ gives precisely the standard limited-information maximum-likelihood (LIML) estimator derived by Anderson and Rubin (1949). Moreover, since $\operatorname{plim}(n^{1/2}\hat\rho^2) = 0$, $\hat\rho^2$ may be set to zero, which yields Sargan's generalised IV estimator. Durbin's (1954) estimator maximises the angle between $(y - W\gamma)$ and the space spanned by the columns of $H$, which is equivalent to minimising $\rho^2 = u'P_hu/u'u$ with respect to $\gamma$. When $Z$ is in $H$, Durbin's estimator is identical to the LIML estimator. In the general case where $Z$ may not be in $H$, instead of minimising $\rho^2$ the relevant criterion for the derivation of the ML estimator of $\gamma$ is to maximise the ratio $(1 - \rho^2)/\sigma^2\lambda^2$ or equivalently $u'M_hu/(u'u)(u'S_hu)$ in terms of $\gamma$.

4. Orthogonality tests

The issue of instrument admissibility and the associated problem of testing for lack of correlation between regressors and the disturbances have attracted a good deal of attention in the literature. These tests are generally referred to as orthogonality tests, although in the simultaneous-equations systems' context they are also known as exogeneity tests. The precise connection between these two types of orthogonality tests will be explored in section 5. In its general form the orthogonality hypothesis concerns the $k_2$-subvector $\delta_2$ of $\delta = (\delta_1', \delta_2')'$ defined in (2.4), namely,

$$H_0\colon\ \delta_2 = \operatorname{plim}(X_2'u/u'u) = 0, \tag{4.1}$$

to be tested against $H_1$: $\delta_2 \ne 0$, where $X = (X_1, X_2)$ is partitioned conformably with $\delta$. In the context of the RE model (2.2)-(2.3) this hypothesis allows for the replacement of RE variables $x_{t2}^e$ by their realization without a loss of efficiency.
4.1. Wald (W) test statistics

The Wald statistic is based on the estimator of $\delta_2$ given by (3.7) and examines the significance of $\hat\delta_2$. Partitioning $V$ and $B$ conformably with $X = (X_1, X_2)$, since $V_2 = X_2 - HB_2$, using (3.8) we can write

$$\hat\delta_2 = \hat V_2'\hat u/\hat u'\hat u. \tag{4.2}$$

Consider

$$n^{-1/2}\hat V_2'\hat u = n^{-1/2}V_2'(I_n - W(W'R_hW)^{-1}W'R_h)u + o_p(1). \tag{4.3}$$

As shown in appendix B, under $H_0$, $n^{-1/2}\hat V_2'\hat u$ has a limiting normal distribution with mean zero and variance matrix given from (B.5) in moment form as

$$\operatorname{var}(n^{-1/2}\hat V_2'\hat u) = \sigma^2\operatorname{plim}[V_2'(I_n + W(W'R_hW)^{-1}W')V_2/n]. \tag{4.4}$$

Substituting estimators and sample values where appropriate in (4.4) we may state:

Proposition 4.1. A Wald statistic for $H_0$: $\delta_2 = 0$ is given by

$$W_{2S} = n\hat u'\hat F_{v_2}[I_n - W(W'\hat R_hW + W'\hat F_{v_2}W)^{-1}W']\hat F_{v_2}\hat u/\hat u'\hat u, \tag{4.5}$$

where $\hat F_{v_2} = \hat V_2(\hat V_2'\hat V_2)^{-1}\hat V_2'$. Under $H_0$: $\delta_2 = 0$, $W_{2S}$ has a limiting $\chi^2$ distribution with $k_2$ degrees of freedom. The statistic $W_{2S}$ may be simply computed as $n$ times the difference in uncentred $R^2$ from the LS regression of $\hat u$ on $\hat V_2$ and the double-length IV regression of

[...]

where $0_n$ is a zero vector of order $n$.⁴

Consider the more familiar LIML case where $Z$ is included in $H$. Now $\hat Q_h$ replaces $\hat R_h$ in (4.5). In this case it is further possible to define an alternative Wald statistic by replacing $\hat Q_h$, $\hat V_2$, and $\hat u$ in (4.5) with $P_h$, $M_hX_2$, and $y - W\bar\gamma$, respectively. Defining the projection matrix $P_{h.x_2} = P_h + M_hX_2(X_2'M_hX_2)^{-1}X_2'M_h$ and the generalised IV estimator under $H_0$ as $(W'P_{h.x_2}W)^{-1}W'P_{h.x_2}y$, these substitutions result in:

⁴The uncentred $R^2$ of the IV regression of $y$ on $X$ in the metric $P$ is defined by $R^2 = (y'y - e'Pe)/y'y$ where $e = y - X(X'PX)^{-1}X'Py$.
Corollary 4.2. When $Z$ is included in $H$, the Wald statistic based on IV estimation under $H_1$ for the test of $H_0$: $\delta_2 = 0$ can be computed as the Wald statistic (denoted by $W_{GIV}$) for the test of $d_2 = 0$ in the IV regression

$$y = Wc + (M_hX_2)d_2 + u, \tag{4.6}$$

in the metric $P_{h.x_2}$. Setting $d_2 = 0$, the IV regression yields the generalised IV (GIV) estimator for $\gamma$ under $H_0$, whereas when $M_hX_2$ is included, the GIV estimator under $H_1$, namely $\bar\gamma$, will result.

The artificial IV regression in this corollary is in fact the same as that suggested by Newey (1985). The corollary also represents a simple generalisation of the expanded regression test of $\delta = 0$ discussed in Hausman (1978) and Nakamura and Nakamura (1981). When testing for the orthogonality of the whole set $X$, the relevant metric becomes $P_{h.x}$ and since $Z$ is assumed to be in $H$, the results of the IV and LS estimation of (4.6) will coincide. Furthermore, in this case the $F$ statistic for the joint test of the significance of $M_hX$ in (4.6) will be numerically equal to Wu's (1973) $T_2$ statistic [cf. Smith (1983a)]. Seen from this perspective, the tests in Proposition 4.1 and in Corollary 4.2 may also be regarded as an extension of Wu's 'exogeneity test' [cf. Smith (1985)]. In fact, the test of $\delta_2 = 0$ based on (4.6) is identical to the Wald version of Holly's (1982b) extended regression [see Smith (1983b)]. But unlike in Holly's procedure, it is unnecessary to compute the special matrix $M_hX(X'M_hX)^{-1}$.

It is also worth noting that one may choose to estimate $\sigma^2$ in the IV regression (4.6) by either $\bar u'\bar u/n$ or $\hat v'\hat v/n$, where $\hat v = y - W\bar\gamma - M_hX_2\hat d_2$ and $\hat d_2 = (X_2'M_hX_2)^{-1}X_2'M_h(y - W\bar\gamma)$, to give valid Wald statistics. Clearly the statistic using the latter estimator, denoted $W_{IV}$, gives a computationally more attractive statistic than $W_{GIV}$.
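The expanded-regression Wald test of Corollary 4.2 can be sketched as follows for the case where $Z$ is contained in $H$. This is an illustration only: the function name `wald_expanded_regression` is hypothetical, $\sigma^2$ is estimated from the residuals of the expanded regression (the $W_{IV}$ variant described above), and `X1` may have zero columns when all of $X$ is tested.

```python
import numpy as np

def _proj(A):
    """Orthogonal projection matrix onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

def wald_expanded_regression(y, X1, X2, Z, H):
    """Wald test of d_2 = 0 in y = W c + (M_h X_2) d_2 + u, metric P_{h.x2}.

    Assumes the columns of Z lie in the column space of H.
    Returns (statistic, degrees of freedom k_2).
    """
    n = y.shape[0]
    W = np.hstack([X1, X2, Z])
    P_h = _proj(H)
    V2 = X2 - P_h @ X2                 # M_h X_2
    P = P_h + _proj(V2)                # P_{h.x2}: projection on (H, X_2)
    G = np.hstack([W, V2])             # regressors of the expanded regression
    GPG = G.T @ P @ G
    coef = np.linalg.solve(GPG, G.T @ P @ y)
    resid = y - G @ coef
    s2 = resid @ resid / n             # sigma^2 from the expanded-regression residuals
    k2 = X2.shape[1]
    d2 = coef[-k2:]
    cov_d2 = s2 * np.linalg.inv(GPG)[-k2:, -k2:]
    stat = float(d2 @ np.linalg.solve(cov_d2, d2))
    return stat, k2
```

Under $H_0$ the statistic would be compared with a $\chi^2$ critical value with $k_2$ degrees of freedom.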
4.2. Lagrange multiplier (LM) statistic

From (3.2) the score vector with respect to $\delta$ is $\psi^{-1}(X - HB - u\delta')'u$. Denoting estimation under $H_0$: $\delta_2 = 0$ by tilde, it is easily seen that the relevant score vector for the LM statistic is given by

$$\tilde\psi_{22}^{-1}\tilde V_2'\tilde u, \tag{4.7}$$

where $\tilde\psi_{22} = \tilde V_2'\tilde V_2/n$ and $\tilde V_2 = M_hX_2$.⁵

To calculate the asymptotic variance matrix for $n^{-1/2}\tilde\psi_{22}^{-1}\tilde V_2'\tilde u$ from standard ML theory we merely use the inverse of the asymptotic variance matrix given in (4.4). After cancellation of $\psi_{22} = \Sigma_{v_2v_2}$ under $H_0$ the relevant matrix will be estimated by

[...] (4.8)

where $\tilde\sigma^2 = \tilde u'\tilde u/n$, $\tilde R_{h.x_2} = \tilde Q_{h.x_2} + [(1 - \tilde\rho^2)/\tilde\lambda^2]S_h$, and $\tilde Q_{h.x_2} = P_{h.x_2} - \tilde\rho^2I_n$.⁶ Thus, combining (4.7) and (4.8) we may state:

Proposition 4.3. An LM statistic for $H_0$: $\delta_2 = 0$ is given by

$$LM_{2S} = n\tilde u'\tilde V_2[\tilde V_2'\{I_n - W(W'\tilde R_{h.x_2}W)^{-1}W'\}\tilde V_2]^{-1}\tilde V_2'\tilde u/\tilde u'\tilde u. \tag{4.9}$$

Under $H_0$: $\delta_2 = 0$, $LM_{2S}$ has a limiting $\chi^2$ distribution with $k_2$ degrees of freedom.

A simple regression version of $LM_{2S}$ of (4.9) is given by:

Corollary 4.4. An LM statistic for $H_0$: $\delta_2 = 0$ may be computed as the LM test for $d_2 = 0$ in the IV regression

$$y = Wc + (M_hX_2)d_2 + u, \tag{4.10}$$

with respect to the metric $\tilde R_{h.x_2}$ or $R^*_{h.x_2} = P_{h.x_2} + (1/\tilde\lambda^2)S_h$.

Proof. See appendix C.1.⁷

When $Z$ is in $H$, the relevant metric for regression (4.10) is $\tilde Q_{h.x_2} = P_{h.x_2} - \tilde\rho^2I_n$. The consequent LM statistic is a natural generalisation of the LM versions of statistics discussed by Wu (1973), Hausman (1978), and Nakamura and Nakamura (1981) for the full-set $X$ regressor case. From the orthogonality of $\tilde W = (\tilde X_1, X_2, Z)$ and $\tilde u$, where $\tilde X_1 = H(\tilde B_1 - \tilde B_2\tilde L) + X_2\tilde L$ (from footnote 5), the exact LM statistic is given as $n$ times the uncentred $R^2$ from the LS regression of $\tilde u$ on $(W, M_hX_2)$ [cf. Engle (1982) and Smith (1983b)]. Using the GIV metric under $H_0$, $P_{h.x_2}$, in (4.10) gives the LM version of Corollary 4.2 due to Newey (1985), which is also the LM version of Holly's (1982) statistic [see Smith (1983b)].

⁵For ML estimation under $H_0$, the model (2.1')-(2.2') is rewritten as $y = W\gamma + u$, $X_1 = H(B_1 - B_2L) + X_2L + E_1$, treating $Z$, $H$, and $X_2$ as exogenous, where $L = \psi_{22}^{-1}\psi_{21}$. The equations corresponding to (3.9) and (3.10) are $W'(\tilde Q_{h.x_2} + \tilde\kappa^2S_h)\tilde u = 0$, $\tilde Q_{h.x_2} = P_{h.x_2} - \tilde\rho^2I_n$, $\tilde\rho^2 = \tilde u'P_{h.x_2}\tilde u/\tilde u'\tilde u$, $\tilde\kappa^2 = (1 - \tilde\rho^2)/\tilde\lambda^2$, $\tilde\lambda^2 = \tilde u'S_h\tilde u/\tilde u'\tilde u$.

⁶Under $H_0$: $\delta_2 = 0$, $\tilde R_h = P_h - \tilde\rho^2I_n + [(1 - \tilde\rho^2)/\tilde\lambda^2]S_h$ and $\tilde F_{v_2} = P_{h.x_2} - P_h$. Thus $\tilde R_h + \tilde F_{v_2} = \tilde R_{h.x_2}$.

⁷Note that the IV regression setting $d_2 = 0$ returns $\tilde\gamma = (W'\tilde R_{h.x_2}W)^{-1}W'\tilde R_{h.x_2}y$ or an asymptotically equivalent estimator if $R^*_{h.x_2}$ is used. If $\tilde R_{h.x_2}$ or $R^*_{h.x_2}$ are evaluated at $\hat\rho^2$ and $\hat\lambda^2$ a similar result occurs. Also the IV regression with $d_2 \ne 0$ returns an asymptotically equivalent estimator to $\hat\gamma = (W'\hat R_hW)^{-1}W'\hat R_hy$. This result is obtained because $\tilde R_{h.x_2}\tilde V_2(\tilde V_2'\tilde R_{h.x_2}\tilde V_2)^{-1}\tilde V_2'\tilde R_{h.x_2} = (1 - \tilde\rho^2)\tilde F_{v_2}$ from appendix B.2 and $\tilde R_{h.x_2} - \tilde F_{v_2} = \tilde R_h$ from footnote 6. In fact the Wald statistic for $d_2 = 0$ derived from such an IV regression will differ from those in section 4.1 by an asymptotically negligible, $o_p(1)$, term.
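As an illustration of the regression form of the LM test, the sketch below uses the GIV metric $P_{h.x_2}$ under $H_0$ (the variant attributed above to Newey (1985)) rather than full ML estimation under $H_0$, and assumes $Z$ is contained in $H$. The function name `lm_expanded_regression` is hypothetical, and the score-form expression is the standard LM statistic for an IV regression, used here as an assumption about how the test would be computed in practice.

```python
import numpy as np

def _proj(A):
    """Orthogonal projection matrix onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

def lm_expanded_regression(y, X1, X2, Z, H):
    """LM test of d_2 = 0 in y = W c + (M_h X_2) d_2 + u, metric P_{h.x2}.

    Assumes the columns of Z lie in the column space of H.
    Returns (statistic, degrees of freedom k_2).
    """
    n = y.shape[0]
    W = np.hstack([X1, X2, Z])
    P_h = _proj(H)
    V2 = X2 - P_h @ X2                 # M_h X_2
    P = P_h + _proj(V2)                # P_{h.x2}: X_2 treated as a valid instrument
    # Restricted (H_0) generalised IV estimate and residuals
    g0 = np.linalg.solve(W.T @ P @ W, W.T @ P @ y)
    u0 = y - W @ g0
    # Score form: n * u0' P G (G' P G)^{-1} G' P u0 / u0' u0
    G = np.hstack([W, V2])
    score = G.T @ P @ u0
    stat = n * float(score @ np.linalg.solve(G.T @ P @ G, score)) / float(u0 @ u0)
    return stat, X2.shape[1]
```

Because the quadratic form is a projection of the restricted residuals, the statistic is $n$ times an uncentred $R^2$ and therefore cannot exceed $n$.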
4.3. Likelihood ratio (LR) statistic

For completeness, we present the LR statistic for $H_0$ [see also Lubrano et al. (1986) and Richard (1984)]. The likelihood value under $H_0$ is

[...] (4.11)

see footnote 5. Together with that under $H_1$ given by $l(\hat\gamma)$ from section 3, the LR statistic may be stated in:

Proposition 4.5. The LR statistic for the test of $H_0$: $\delta_2 = 0$ is given by

$$LR_{2S} = n\log\left[\frac{1 - \hat\rho^2}{1 - \tilde\rho^2}\right] + n\log\left[\frac{\tilde\sigma^2\tilde\lambda^2}{\hat\sigma^2\hat\lambda^2}\right], \tag{4.12}$$

which under $H_0$: $\delta_2 = 0$ has a limiting $\chi^2$ distribution with $k_2$ degrees of freedom.⁸

⁸When $Z$ is in $H$, we have $\hat\lambda^2 = y'S_hy/n\hat\sigma^2$ and $\tilde\lambda^2 = y'S_hy/n\tilde\sigma^2$ and $LR_{2S}$ reduces to $n\log[(1 - \hat\rho^2)/(1 - \tilde\rho^2)]$, namely Hwang's (1980) LR statistic. See also Smith (1984).

5. Comparisons between classical and specification test statistics

For the purposes of comparison of the classical test (CT) statistics for $H_0$: $\delta_2 = 0$ and specification test (ST) statistics we assume that there exist prior exclusion restrictions on $X$ in (2.1'):

$$y = XS\alpha_* + Z\beta + u = W_*\gamma_* + u, \tag{5.1}$$

where $S = \operatorname{diag}(S_1, S_2)$ is a $(k, k_*)$ selection matrix where $S_1$ and $S_2$ are $(k_1, k_{1*})$ and $(k_2, k_{2*})$ matrices, respectively, and $W_* = (XS, Z)$, $\gamma_* = (\alpha_*', \beta')'$ (the analysis of the previous sections is fundamentally unchanged except we now maintain $\alpha = S\alpha_*$). Such prior restrictions might arise in practice if some potential instruments do not appear in (2.1) [see Hausman and Taylor (1981)]. Given that $\gamma_*$ is the parameter vector of interest, Hausman's (1978) ST statistic may be expressed as

$$ST = n(\hat\gamma_* - \tilde\gamma_*)'\{\operatorname{var}[n^{1/2}(\hat\gamma_* - \tilde\gamma_*)]\}^{-}(\hat\gamma_* - \tilde\gamma_*), \tag{5.2}$$

where $\hat\gamma_* = (W_*'\hat R_hW_*)^{-1}W_*'\hat R_hy$ and $\tilde\gamma_* = (W_*'\tilde R_{h.x_2}W_*)^{-1}W_*'\tilde R_{h.x_2}y$. Maintaining $\operatorname{plim}\hat\gamma_* = \gamma_*$, the implicit null and alternative hypotheses of ST are $H_0^*$: $\operatorname{plim}\tilde\gamma_* = \gamma_*$ and $H_1^*$: $\operatorname{plim}\tilde\gamma_* \ne \gamma_*$, respectively. [See Hausman and Taylor (1981), Holly (1982), and Ruud (1984).] Consequently, $H_0 \subseteq H_0^*$ and $H_1^* \subseteq H_1$. Re-expressing $H_0^*$: $\operatorname{plim}(W_*'\tilde R_{h.x_2}u/n) = 0$ immediately reveals that ST examines the validity of $\tilde R_{h.x_2}W_*$ as instrumental variables in (5.1). To compare $H_0$ with $H_0^*$, the precise form of $H_0^*$ is straightforwardly calculated as

[...] (5.3)

after some cancellations. Note that the statements $\tilde\gamma_* \to_p \gamma_*$ and $n^{1/2}\tilde\rho^2 \to_p 0$ are logically equivalent, allowing us to drop terms involving $\tilde\rho^2$, which gives $\operatorname{plim}(Z'\tilde R_{h.x_2}u/n) = 0$. The expression (5.3) allows us to relate the independence hypothesis $H_0$ and the instrument validity hypothesis $H_0^*$:

Proposition 5.1. In the context of (2.1')-(2.2'), the hypotheses $H_0$: $\delta_2 = 0$ and $H_0^*$: $\operatorname{plim}\tilde\gamma_* = \gamma_*$ are equivalent if and only if $S_2 = I_{k_2}$ or, equivalently, if $X_2$ is included fully in (2.1').

Proof. See appendix C.2.

Note that no requirement is made regarding $S_1$, that is, which columns of $X_1$ are included in (2.1'). Now an asymptotic equivalence in a local power sense occurs between CT and ST statistics if and only if $H_0$ and $H_0^*$ (and thus $H_1$ and $H_1^*$) coincide. [See Hausman and Taylor (1982, proposition 2.7).]
Corollary 5.2. The CT statistics for $H_0$ of section 4 and the ST statistic for $H_0^*$ based on $n^{1/2}(\hat\gamma_* - \tilde\gamma_*)$ are asymptotically equivalent in a local power sense if and only if there are no a priori restrictions on the coefficients of $X_2$ in (2.1').

Noting $\operatorname{var}[n^{1/2}(\hat\gamma_* - \gamma_*)] = \sigma^2[\operatorname{plim}(W_*'R_hW_*/n)]^{-1}$ and $\operatorname{var}[n^{1/2}(\tilde\gamma_* - \gamma_*)] = \sigma^2[\operatorname{plim}(W_*'R_{h.x_2}W_*/n)]^{-1}$, in order to preserve the positive semidefiniteness of $\{\cdot\}$ in (5.2) common estimators of $\sigma^2$, $\rho^2$ (or zero), and $\lambda^2$ should be used. Thus Wald and LM versions of ST use $\hat\sigma^2$, $\hat\rho^2$, $\hat\lambda^2$ and $\tilde\sigma^2$, $\tilde\rho^2$, $\tilde\lambda^2$, respectively, giving, for example, $R_{h.x_2} - R_h = P_{h.x_2} - P_h \ge 0$. A simple version of ST follows:

Proposition 5.3. A simple version of ST for $H_0^*$ is

$$ST = \tilde u'R_hW_*[W_*'(R_h - R_hW_*(W_*'R_{h.x_2}W_*)^{-1}W_*'R_h)W_*]^{-}W_*'R_h\tilde u/\sigma^2. \tag{5.4}$$

A Wald version, ST(W), uses $\hat\sigma^2$, $\hat\rho^2$, and $\hat\lambda^2$, whereas a Lagrange multiplier version, ST(LM), uses $\tilde\sigma^2$, $\tilde\rho^2$, and $\tilde\lambda^2$. Both these versions may be computed appropriately as tests for the exclusion of $(P_{h.x_2} - P_h)W_*$ in the IV regression

$$y = W_*c_* + (P_{h.x_2} - P_h)W_*d_* + v, \tag{5.5}$$

employing the metric $R_{h.x_2}$.

Proof. See appendix C.3.

Note that $(P_{h.x_2} - P_h)W_*$ may be computed as the predictor of $W_*$ in the LS regression of $W_*$ on $M_hX_2$.⁹ Moreover, it is easily seen that the regressions in (5.5) excluding and including $(P_{h.x_2} - P_h)W_*$ provide $H_0$-efficient (LM and Wald versions) and $H_1$-efficient (Wald version only) estimators of $\gamma_*$, respectively. Comparing (4.10) and (5.5), we also have:

Corollary 5.4. ST(LM) and ST(W) are algebraically identical to the LM statistic of (4.10) and the Wald statistic given in footnote 7, using $R_h$ and $R_{h.x_2}$ defined in Proposition 5.3, if and only if $X_2$ is fully included in (2.1').

Proof. See appendix C.4.

⁹Using $(P_{h.x_2} - P_h)W_*$, the predictor of $W_*$ in the LS regression of $W_*$ on $M_hX_2$, gives identical statistics.
The above results generalise the discussion in Newey (1985) and Smith (1983b). When $Z$ is in $H$, and generalised IV estimation is used, that is, $\rho^2 = 0$, regression (5.5) reduces to that suggested by Newey (1985).

6. Asymptotic relative efficiency of orthogonality tests by Bahadur's approach

As is already stated in Corollary 5.2, when there are no a priori restrictions on the coefficients of $X_2$ in (2.1') all the various CT statistics and the ST statistics proposed for testing the orthogonality hypothesis, $H_0$, have the same asymptotic power function under local alternatives. As a result the familiar Pitman local asymptotic theory cannot be used to distinguish between these tests. An alternative asymptotic procedure would be to adopt Bahadur's (1960, 1967) criterion of asymptotic relative efficiency, where for a fixed alternative hypothesis the test procedures are compared according to the rate at which their size tends to zero as the sample size increases. In this framework, one test is said to be (asymptotically) Bahadur-efficient relative to another one if its 'approximate slope' is greater. Under fairly general conditions Bahadur shows that the approximate slope of a test statistic, say $T$, which under the null $H_0$ is asymptotically distributed as a $\chi^2$ variate, is given by $\xi_T = \operatorname{plim}(n^{-1}T \mid H_1)$, where $H_1$ stands for the alternative hypothesis.

The calculation of the approximate slope of the CT statistic for $H_0$: $\delta_2 = 0$ is a straightforward, albeit tedious, matter. The approximate slope of the Wald statistic, $W_{2S}$, of (4.5) in the general case is given by¹⁰
where (6.la) @ = ii22- tit_
+B’B,,[~Z,‘-(X,,+r)-‘]B,,B,
$$\Omega_z = \operatorname{plim}(X'M_zX/n) = \Sigma_{vv} + B'(\Sigma_{hh} - \Sigma_{hz}\Sigma_{zz}^{-1}\Sigma_{zh})B.$$

The above result simplifies considerably in the simultaneous-equations case where $Z$ is included in $H$. In this case we have $\Gamma = 0$, and

[...] (6.2)

¹⁰The details of the derivation of the various approximate slopes reported in this section are given in appendix D.
A further simplification also results when the orthogonality hypothesis of interest involves all the regressors in $X$. In this case $\Omega_{h.x_2}$ defined in (6.1e) needs to be replaced by $\Omega_{h.x} = \operatorname{plim}(X'M_{h.x}X/n) = 0$, and (6.2) becomes

$$\xi_{W_{2S}} = \sigma^2\delta'(\Omega_h^{-1} - \Omega_z^{-1})\delta, \tag{6.3}$$

where $\Omega_h = \Sigma_{vv} = \operatorname{plim}(X'M_hX/n)$. But since

$$\Omega_z - \Omega_h = B'\{\Sigma_{hh} - \Sigma_{hz}\Sigma_{zz}^{-1}\Sigma_{zh}\}B,$$

then $\xi_{W_{2S}} \ge 0$. The strict inequality $\xi_{W_{2S}} > 0$ holds in situations where $B \ne 0$ and $\delta \ne 0$, and columns of $H$ cannot be expressed as exact linear combinations of the columns of $Z$. This last condition also ensures the consistency of the Wald test of $H_0$: $\delta = 0$ against $H_1$: $\delta \ne 0$ for all $B \ne 0$.

It is now easy to show that the approximate slope of the $W_{GIV}$ statistic defined in Corollary 4.2 is also given by (6.2). The approximate slope of the $W_{IV}$ statistic in Corollary 4.2, which is based on the IV regression (4.6), is slightly different from (6.1) and is given by

[...] (6.4)

where the expression $\xi_{W_{2S}}$ is the same as that given by (6.2). The difference between the approximate slopes of $W_{2S}$ and $W_{IV}$ arises because of the different estimates of $\sigma^2$ that are used in their construction. In $W_{2S}$, $\sigma^2$ is estimated by $\hat u'\hat u/n$, while in $W_{IV}$, $\sigma^2$ is estimated by $\hat v'\hat v/n$. (See section 4.1.) It is now easily seen that when $Z$ is in $H$, $\xi_{W_{IV}} > \xi_{W_{GIV}} = \xi_{W_{2S}}$ and the $W_{IV}$ statistic based on the IV regression (4.6) is asymptotically more efficient than the other forms of the Wald statistic for the test of $\delta_2 = 0$. This is a fortunate result as $W_{IV}$ is also easier to compute by means of standard computer packages than the other Wald statistics.

An explicit expression for the approximate slope of the LM statistic given by (4.9) does not seem to be possible. But, as shown in appendix D, when $Z$ is in $H$, the expression for the approximate slope of the LM version of the test of $d_2 = 0$ in (4.10) is given by $\xi_{LM} = \xi_{W_{2S}}/(1 - [\ldots])$, where
[...]

and $\xi_{W_{2S}}$ is already defined by (6.2). The expression for $\xi_{LM}$ simplifies considerably when the null hypothesis under consideration is $H_0$: $\delta = 0$. In this case we have

$$\xi_{LM} = \frac{\sigma^2\delta'(\Omega_h^{-1} - \Omega_z^{-1})\delta}{1 - \sigma^2\delta'\Omega_z^{-1}\delta}. \tag{6.6}$$

Finally, using (4.12) for the approximate slope of the LR statistic, we have

[...]

where $\sigma_0^2$, $\rho_0^2$, and $\lambda_0^2$ stand for the probability limits of $\tilde\sigma^2$, $\tilde\rho^2$, and $\tilde\lambda^2$ under $H_1$. In general, derivation of an explicit expression for these probability limits does not seem to be possible, but as shown in appendix D, when $Z$ is in $H$, we have $\xi_{LR} = -\log(1 - \rho_0^2)$. The expression for $\rho_0^2$ in general depends on $\delta_1$ and $\delta_2$ in a highly nonlinear manner. But when the null hypothesis is $H_0$: $\delta = 0$, $\xi_{LR}$ becomes

[...] (6.7)

A comparison of the approximate slopes for the various CT statistics given by (6.3), (6.4), (6.6), and (6.7) allows us to state:¹¹

Proposition 6.1. When $Z$ is included in $H$, the approximate slopes of the CT statistics for the test of $\delta = 0$ satisfy the following inequalities:

[...]

Proof. See appendix C.5.

The results of this section also suggest that the $W_{IV}$ statistic of Corollary 4.2, which provides a natural extension of the Wu-Hausman statistic $T_2$, is asymptotically Bahadur-efficient, besides being relatively easy to compute on standard econometric software packages.

¹¹Notice that for $H_0$: $\delta = 0$, $\xi_{W_{IV}}$ in (6.4) simplifies to $\xi_{W_{IV}} = \sigma^2\delta'(\Omega_h^{-1} - \Omega_z^{-1})\delta/\{1 - \sigma^2\delta'\Omega_h^{-1}\delta\}$.
7. Concluding remarks

The likelihood model employed generalises that of Anderson and Rubin (1949) to include the case in which the endogenous variables in the reduced-form model may be determined by a different set of exogenous variables to that in the structural equation. The ML estimator for the structural form coefficients is derived and shown to have the familiar generalised K-class structure. There is some efficacy in considering ML-based procedures in that they dominate others according to second-order efficiency arguments. A simple two-step estimator is also proposed which is asymptotically first-order efficient. In the usual simultaneous-equations context of Anderson and Rubin (1949) an algebraic equivalence is obtained between the LIML and Durbin's (1954) IV estimators, providing a likelihood-based rationale for Durbin's approach to the problem of the efficient use of surplus instruments.

In the general framework of the paper, classical tests (Lagrange Multiplier, Likelihood Ratio, and Wald) are derived for the independence of a subset of structural stochastic regressors and disturbance. Specification tests of the Hausman and Taylor (1981) type are also obtained and compared with the classical tests. An asymptotic equivalence between the tests occurs if and only if the ST regressors are fully included in the structural equation. Simple expanded regression versions of these test statistics are presented which are thereby natural generalisations of those discussed by, inter alia, Hausman (1978), Engle (1982), and Newey (1985). Bahadur's asymptotic relative efficiency criterion is used to compare the various classical tests which are indistinguishable by the Pitman local power criterion. In particular, the generalisation of the Durbin-Wu-Hausman $T_2$ statistic is shown to be asymptotically Bahadur-efficient.

Although the analysis is performed under the assumption of normality, this assumption is not crucial to either the asymptotic results for the ML estimator or those for the regression forms of the test statistics. The framework adopted here naturally allows for the introduction of serial correlation and conditional heteroskedasticity in the disturbances. This will be the subject of a further paper.

Appendix A

Derivation of the generalised K-class estimator generating eq. (3.9) and the concentrated log-likelihood function
Substitution
of (3.4) in (3.8) yields
x’P,,Ti = #2x%,
(A.1 >
where P,, and /’ are defined in (3.10). Using (A.l) in (3.6) and denoting the
moment-matrix
estimator
of D = Z,, - Z,,,E;jz,,
by d, we have
$$\hat{\Omega} = \tilde{\Omega} - \hat{\sigma}^2(1-\hat{\rho}^2)\hat{\delta}\hat{\delta}', \tag{A.2}$$

where $\tilde{\Omega} = X'M_hX/n$ and $M_h = I_n - P_h$.
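Now using the normal eq. (3.1) corresponding to $Z$, after substitution from (3.4), we have

$$Z'\hat{u} = \hat{\sigma}^2 Z'M_h(X - \hat{u}\hat{\delta}')\hat{\Omega}^{-1}\hat{\delta}. \tag{A.3}$$

This result can be further simplified by noting from (3.7) and (A.1) that

$$\hat{\delta} = X'M_h\hat{u}/\{n\hat{\sigma}^2(1-\hat{\rho}^2)\},$$

and using (A.2),

$$\hat{\Omega}^{-1} = \tilde{\Omega}^{-1} + \frac{\hat{\sigma}^2(1-\hat{\rho}^2)\,\tilde{\Omega}^{-1}\hat{\delta}\hat{\delta}'\tilde{\Omega}^{-1}}{1 - \hat{\sigma}^2(1-\hat{\rho}^2)\,\hat{\delta}'\tilde{\Omega}^{-1}\hat{\delta}}.$$

After some routine algebra we now have

$$\hat{\sigma}^2(1-\hat{\rho}^2)\,\hat{\delta}'\tilde{\Omega}^{-1}\hat{\delta} = 1 - \{\hat{\lambda}^2/(1-\hat{\rho}^2)\}, \qquad \{\hat{\lambda}^2/(1-\hat{\rho}^2)\}\,\hat{\Omega}^{-1}\hat{\delta} = \tilde{\Omega}^{-1}\hat{\delta},$$

$$\hat{\sigma}^2(1-\hat{\rho}^2)\,\hat{\delta}'\hat{\Omega}^{-1}\hat{\delta} = \{(1-\hat{\rho}^2)/\hat{\lambda}^2\} - 1.$$

Thus,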
eq. (A.3) gives Z’&ii
+ (1 - 6’) Z’S&@
= 0,
(A.4)
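where the quantities appearing in (A.4), including $S_h$, $\hat{\lambda}^2$, and $\hat{\rho}^2$, are defined in (3.10). But, since $S_hX = 0$, combining (A.1) and (A.4), we have (3.9) in the text.

To derive the concentrated log-likelihood function we first note that

$$\ell(\cdot) \propto -\tfrac{1}{2}\log(\hat{\sigma}^2|\hat{\Omega}|).$$

Now, with $\tilde{\Omega} = X'M_hX/n$, (A.2) implies $|\hat{\Omega}| = |\tilde{\Omega}|\{1 - \hat{\sigma}^2(1-\hat{\rho}^2)\hat{\delta}'\tilde{\Omega}^{-1}\hat{\delta}\} = |\tilde{\Omega}|\,\hat{\lambda}^2/(1-\hat{\rho}^2)$. Hence

$$\ell(\cdot) \propto -\tfrac{1}{2}\log|\tilde{\Omega}| + \tfrac{1}{2}\log(1-\hat{\rho}^2) - \tfrac{1}{2}\log(\hat{\sigma}^2\hat{\lambda}^2). \tag{A.5}$$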
Appendix B
The asymptotic distribution of $n^{1/2}(\hat{V}_2'\hat{u}/n - \sigma^2\delta_2)$

Define $E_2$ by

$$V_2 = u\delta_2' + E_2, \tag{B.1}$$

so that $u$ and $E_2$ are thus uncorrelated, and note that

$$\hat{R}_h = P_h + \lambda^2 S_h + o_p(n^{-1/2}), \tag{B.2}$$

where we have defined $\lambda^2 = \operatorname{plim}\hat{\lambda}^2 = [1 + \sigma^2\delta'\Omega^{-1}\delta]^{-1}$. Using (4.3) and (B.2) it is straightforward to show that

$$\hat{V}_2'\hat{u}/n = \sigma^2\delta_2 + o_p(1). \tag{B.3}$$

From (B.1) and (B.3) we consider the asymptotic distribution of
$$n^{1/2}(\hat{V}_2'\hat{u}/n - \sigma^2\delta_2) = n^{1/2}\big[(V_2'u/n - \sigma^2\delta_2) - (V_2'W/n)(W'R_hW/n)^{-1}W'R_hu/n\big] + o_p(1).$$

Noting that the odd-order moments of $V_2$ and $u$ are zero consequent on the joint normality assumption, we may show that $n^{1/2}(V_2'u/n - \sigma^2\delta_2)$ and $W'R_hu/n^{1/2}$ are asymptotically uncorrelated using (B.1) and (B.2). However, again from (B.1), $n^{1/2}(V_2'u/n - \sigma^2\delta_2)$ has asymptotic variance $\sigma^2\Sigma_{22} + \sigma^4\delta_2\delta_2'$, where $\Sigma_{22} = n^{-1}E(V_2'V_2)$.
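The asymptotic variance of $n^{1/2}(V_2'u/n - \sigma^2\delta_2)$ can be checked row by row. The following worked step is a sketch under stated assumptions: the rows $(u_i, V_{2i}')$ are i.i.d. jointly normal with mean zero, $\operatorname{var}(u_i) = \sigma^2$ and $\operatorname{var}(V_{2i}) = \Sigma_{22}$, and the decomposition reads $V_{2i} = \delta_2 u_i + E_{2i}$ with $E_{2i}$ independent of $u_i$. The row-wise notation and the symbol $\Sigma_{22}$ are introduced here for illustration.

```latex
\begin{aligned}
V_{2i}u_i - \sigma^2\delta_2 &= (u_i^2 - \sigma^2)\,\delta_2 + E_{2i}u_i, \\
\operatorname{var}\big[(u_i^2 - \sigma^2)\,\delta_2\big] &= 2\sigma^4\,\delta_2\delta_2', \\
\operatorname{var}\big[E_{2i}u_i\big] &= \sigma^2\big(\Sigma_{22} - \sigma^2\delta_2\delta_2'\big), \\
\operatorname{cov}\big[(u_i^2 - \sigma^2)\,\delta_2,\; E_{2i}u_i\big] &= 0, \\
\operatorname{var}\big[V_{2i}u_i - \sigma^2\delta_2\big] &= \sigma^2\Sigma_{22} + \sigma^4\delta_2\delta_2'.
\end{aligned}
```

The second line uses $\operatorname{var}(u_i^2) = 2\sigma^4$ for a normal variate, and the fourth uses the independence of $E_{2i}$ and $u_i$ together with $E(E_{2i}) = 0$. Under $\delta_2 = 0$ the variance reduces to $\sigma^2\Sigma_{22}$.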
n1j2( V2’l;/n - u2S2)
(
[ V
03.4)
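where

$$\operatorname{var}[n^{-1/2}W'R_hu] = \sigma^2\operatorname{plim}(W'R_hW/n).$$

Note that

$$\operatorname{var}[n^{1/2}(\hat{\gamma} - \gamma)] = \sigma^2[\operatorname{plim}(W'R_hW/n)]^{-1}.$$

See, for example, Turkington (1985). Also

$$\operatorname{plim}(V_2'W/n) = (\Sigma_{21}, \Sigma_{22}, 0). \tag{B.6}$$

Therefore, under $H_0$: $\delta_2 = 0$, we have

$$n^{-1/2}\hat{V}_2'\hat{u} \xrightarrow{d} N\big(0,\ \sigma^2\{\Sigma_{22} + (\Sigma_{21}, \Sigma_{22}, 0)[\operatorname{plim}(W'R_hW/n)]^{-1}(\Sigma_{21}, \Sigma_{22}, 0)'\}\big). \tag{B.7}$$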
Appendix C

C.1. Proof of Corollary 4.4
and
under
local
alternatives
to H,,
p’*= ~,(n-‘/~),
rlz,.Kz = o,(n-‘/2). C.2.
Proof of Proposition
Clearly
H, implies
5.1
H,* or H, L H,*. As
is full column
we require
rank,
S, = Ix for H,* to imply H, or H,* G H,.
W’l?T,,xlii = 0,
I?h,x, -
C.3. Proof of Proposition 5.3
To obtain (5.4) from (5.2) we note the invariance of (5.2), as formulated in the proposition, to the choice of g-inverse. Hence A -‘B-A-’ is a g-inverse for ABA if A is nonsingular, which gives the form (5.4) for A = (W;R,W,)-’ and B defined as [ .I. Now Rll. .,m. \ - PII) = 4,x, - R, which justifies the IV regression form (5.5) for statistic (5.4). C.4.
Proof of Corollary 5.4
Necessity follows immediately from Proposition 5.1. For sufficiency, firstly note that ST of (5.4) is invariant to the choice of g-inverse [Rao and Mitra (1971, lemma 2.2.2)]. Secondly,
R/l,JP,LV2 - pJw=
An
appropriate
W(W’R,7,
g-inverse
(Ph.,,- P,)W=
V2[(~;V2)-1~;w],
is the block corresponding bordered by zeroes which
YzW)-1W’)~2]~‘,
to X,, [ fi(l,l will reproduce LM
and W, C.5.
Proof of Proposition 6.1

Let $x = (1-\rho_2^2)/(1-\rho_1^2)$, where $\rho_1^2 = \sigma^2\delta'\Omega_1^{-1}\delta$ and $\rho_2^2 = \sigma^2\delta'\Omega_2^{-1}\delta$. Since $1 \ge \rho_1^2 \ge \rho_2^2 \ge 0$, then
x-l x-12logx2X
which upon substituting desired result.
2/Q-Fl*20,
for x and using (6.3), (6.4) (6.6)
and (6.7) yields the
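A proof of this kind orders the approximate slopes through elementary bounds on the logarithm. The numerical check below is a sketch under an assumption: it takes the relevant chain to be the standard one, $x - 1 \ge \log x \ge (x-1)/x \ge 0$ for $x \ge 1$, with $x$ formed from two squared correlations $1 > \rho_1^2 \ge \rho_2^2 \ge 0$. The function names and the grid are hypothetical and serve only as illustration.

```python
import math


def slope_ratio(rho1_sq, rho2_sq):
    """x = (1 - rho2^2) / (1 - rho1^2), which is >= 1 when rho1^2 >= rho2^2."""
    return (1.0 - rho2_sq) / (1.0 - rho1_sq)


def log_bounds(x):
    """Return the three quantities x - 1, log x and (x - 1)/x for x >= 1."""
    if x < 1.0:
        raise ValueError("x must be at least 1")
    return x - 1.0, math.log(x), (x - 1.0) / x


def is_ordered(upper, middle, lower, tol=1e-12):
    """Check upper >= middle >= lower >= 0 up to a small floating-point tolerance."""
    return upper + tol >= middle and middle + tol >= lower and lower + tol >= 0.0


# Grid of squared correlations with 1 > rho1^2 >= rho2^2 >= 0.
grid = [(r1 / 20.0, r2 / 20.0) for r1 in range(20) for r2 in range(r1 + 1)]
all_ordered = all(is_ordered(*log_bounds(slope_ratio(r1, r2))) for r1, r2 in grid)
print("inequality chain holds on the grid:", all_ordered)
```

Equality throughout occurs only at $x = 1$, that is, when the two squared correlations coincide.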
Appendix D

Derivation of approximate slopes

D.1. Wald statistics
Using results written as n -‘w,,
in section 4.1, the Wald statistic
=
for testing
H,:
6, = 0 can be
ir’v;[e;2/v2 + @w( W’R,W)-1wP2] -lv$,ti%. (D.1)
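Under $H_1$: $\delta_2 \neq 0$, as $n \to \infty$ we have $\operatorname{plim}(n^{-1}\hat{u}'\hat{u}) = \sigma^2$, $\operatorname{plim}(n^{-1}\hat{V}_2'\hat{u}) = \sigma^2\delta_2$, and $\operatorname{plim}(n^{-1}\hat{V}_2'\hat{V}_2) = \Sigma_{22}$. $\operatorname{plim}(n^{-1}V_2'W)$ and $\operatorname{plim}(n^{-1}W'R_hW)$ are given in (B.6) and (B.5), respectively. Using these results in (D.1) and employing Rao (1973, example 2.9) to invert the probability limit of the matrix (normalised by $n^{-1}$) inside the square brackets in (D.1), we obtain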
where @ is defined in (6.lb) To obtain the approximate that, under H,, plim( K’ii’ii) plim(, To obtain
in the text. slope for W,, defined
in Corollary
4.2, first note
= u2,
- %+,,,a) _ = u%;Z,f,$,.
plim( n-‘ii’$,ii),
note that
and since P,,, ,., = P,, + F,j2,then
(D.2)
CD.31
where
a,, , z is defined swLs = o-‘(plim(
by (6.le) in the paper.
Hence
~YF~,~ii/n) - plim( ii’p$/n))
or
To derive the approximate (iYi2/YB)W,,, then t WW= u *t W,,/plim(
slope
ii’;/,
).
of
WI,
in Corollary
4.2, since
WI, =
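But

$$\bar{u} = y - W\bar{\gamma} - M_hX_2\bar{d}_2 = W(\gamma - \bar{\gamma}) - \hat{V}_2\bar{d}_2 + u.$$

Therefore, noting that, under $H_1$, $\bar{\gamma}$ and $\hat{V}_2$ are consistent estimators of $\gamma$ and $V_2$, respectively, we have

$$\operatorname{plim}(\bar{u}'\bar{u}/n) = \sigma^2 - 2\sigma^2 d_2'\delta_2 + d_2'\Sigma_{22}d_2, \tag{D.5}$$

where $d_2 = \operatorname{plim}(\bar{d}_2 \mid H_1)$. Now using (6.5), it immediately follows that $d_2 = \sigma^2\Sigma_{22}^{-1}\delta_2$. Substituting this result in (D.5) yields

$$\operatorname{plim}(\bar{u}'\bar{u}/n) = \sigma^2(1 - \sigma^2\delta_2'\Sigma_{22}^{-1}\delta_2),$$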
and therefore
where tw,,is given by (D.4). D.2.
The LM statistic
The LM statistic (G’C/ti’ti) W,,. which
that we are concerned with here is given by LM = is the LM version of the statistic for testing d, = 0 in (4.6). The expression for li is defined by ti = y - WY, where T= (W’P,,. ,,W)~‘W’P,,, \->y.Writing ti = W(y - -+) + u, we have plim( z.i'ti/n ) = u 2 - 2fJ2(KO)17
+ 77’qJJ,
where 77= plim( 3 - y)
=
(
plim
(
n~‘W’P,,,,~W))-l(plim(n’W’P~,.~u)].
(D-7)
But using (D.2) and (D.3) in the above, we obtain
where
Hence,
5/M= EW’J(~ - 4).
(D.8)

D.3. The LR statistic
The LR statistic for the test of 6, = 0 is given by (4.12). Noting H,, plim( b2) = 0, plim( i2) = 1 - a2S’G’i ‘6, then
that under
0.9)
where $\sigma_1^2$, $\lambda_1^2$, and $\rho_1^2$ are the probability limits of $\tilde{\sigma}^2$, $\tilde{\lambda}^2$, and $\tilde{\rho}^2$ under $H_1$, respectively. When $Z$ is in $H$, it follows from footnote 8 that (D.9) reduces to $t_{LR} = -\log(1-\rho_1^2)$. Unfortunately, the derivation of an explicit expression for $\rho_1^2$ does not seem to be possible. However, in the simple case where the null hypothesis of interest is $H_0$: $\delta = 0$ and $Z$ is included in $H$, using the result

$$1 - \tilde{\rho}^2 = \tilde{u}'M_h\tilde{u}/\tilde{u}'\tilde{u},$$
where
ii are the LS residuals plim( n -‘ii’M,,, $) plim( K’ii’ii)
from (2.17, and noting
that
= a*(1 - a%?;‘S),
= a2 - 2a2(6’,0)q
+ q’Z,,q,
and q = plim( 7 we have l-p;‘,=(l and hence ,$,.R = log{ (1 - a%%y’8)/(1
- a%‘q’S)}.
References

Anderson, T.W. and H. Rubin, 1949, Estimation of the parameters of a single equation in a complete system of stochastic equations, Annals of Mathematical Statistics 20, 46-63.
Bahadur, R.R., 1960, Stochastic comparison of tests, Annals of Mathematical Statistics 31, 276-295.
Bahadur, R.R., 1967, Rates of convergence of estimates and test statistics, Annals of Mathematical Statistics 38, 303-324.
Durbin, J., 1954, Errors in variables, Review of the International Statistical Institute 22, 23-32.
Engle, R.F., 1982, A general approach to Lagrange multiplier model diagnostics, Journal of Econometrics 20, 83-104.
Geary, R.C., 1949, Determination of linear relations between systematic parts of variables with errors of observations, the variances of which are unknown, Econometrica 17, 30-59.
Hall, P. and C.C. Heyde, 1980, Martingale limit theory and its application (Academic Press, New York, NY).
Hansen, L.P., 1982, Large sample properties of generalised methods of moments estimators, Econometrica 50, 1029-1054.
Hausman, J.A., 1978, Specification tests in econometrics, Econometrica 46, 1251-1271.
Hausman, J.A. and W.E. Taylor, 1981, A generalised specification test, Economics Letters 8, 239-245.
Hausman, J.A. and W.E. Taylor, 1982, Comparing specification tests and classical tests, Mimeo. (M.I.T., Cambridge, MA).
Holly, A., 1982a, A remark on Hausman's specification test, Econometrica 50, 749-759.
Holly, A., 1982b, A simple procedure for testing whether a subset of endogenous variables is independent of the disturbance term in a structural equation, Cahiers de recherches economiques no. 8209 (Universite de Lausanne, Lausanne).
Holly, A. and J.D. Sargan, 1982, Testing for exogeneity within a limited information framework, Cahiers de recherches economiques no. 8204 (Universite de Lausanne, Lausanne).
Hwang, H.-S., 1980, Tests of independence between a subset of stochastic regressors and disturbance, International Economic Review 21, 749-760.
Lubrano, M., R.G. Pierse, and J.-F. Richard, 1986, Stability of a U.K. money demand equation: A Bayesian approach to testing exogeneity, Review of Economic Studies LIII, 603-634.
Nakamura, A. and M. Nakamura, 1981, On the relationships between several specification tests presented by Durbin, Wu and Hausman, Econometrica 49, 1583-1588.
Newey, W.K., 1985, Generalised method of moments specification testing, Journal of Econometrics 29, 229-256.
Pagan, A.R., 1984, Econometric issues in the analysis of regressions with generated regressors, International Economic Review 25, 221-247.
Pesaran, M.H., 1984, A general likelihood approach to the instrumental variable estimation and tests of misspecifications, presented at the Australasian meeting of the Econometric Society, 1984, Sydney.
Pesaran, M.H., 1986, Two-step, instrumental variable and maximum likelihood estimation of multivariate rational expectations models, Mimeo. (Trinity College, Cambridge), presented at the European meeting of the Econometric Society, 1986, Budapest.
Pesaran, M.H., 1987, The limits to rational expectations (Basil Blackwell, Oxford).
Rao, C.R., 1973, Linear statistical inference and its applications (Wiley, New York, NY).
Rao, C.R. and S.K. Mitra, 1971, Generalised inverse of matrices and its applications (Wiley, New York, NY).
Reiersol, O., 1941, Confluence analysis by means of lag moments and other methods of confluence analysis, Econometrica 9, 1-24.
Reiersol, O., 1945, Confluence analysis by means of instrumental sets of variables, Arkiv for Mathematik, Astronomi och Fysik 32, 1-119.
Revankar, N.S., 1978, Asymptotic relative efficiency analysis of certain tests of independence in structural systems, International Economic Review 19, 165-179.
Revankar, N.S. and M.J. Hartley, 1973, An independence test and conditional unbiased predictions in the context of simultaneous equations systems, International Economic Review 14, 625-631.
Richard, J.-F., 1980, Models with several regimes and changes in exogeneity, Review of Economic Studies XLVII, 1-20.
Richard, J.-F., 1984, Classical and Bayesian inference in incomplete simultaneous equation models, Ch. 4 in: D.F. Hendry and K.F. Wallis, eds., Econometrics and quantitative economics (Basil Blackwell, Oxford).
Ruud, P.A., 1984, Tests of specification in econometrics, Econometric Reviews 3, 211-242.
Sargan, J.D., 1958, The estimation of economic relationships using instrumental variables, Econometrica 26, 393-415.
Sargan, J.D., 1959, The estimation of relationships with autocorrelated residuals by the use of instrumental variables, Journal of Royal Statistical Society B 21, 91-105.
Smith, R.J., 1983a, On the classical nature of the Wu-Hausman statistics for the independence of stochastic regressors and disturbance, Economics Letters 11, 357-364.
Smith, R.J., 1983b, Limited information classical tests for the independence of stochastic variables and disturbance of a single linear stochastic simultaneous equation, Discussion paper no. ES142 (University of Manchester, Manchester), presented at the European meeting of the Econometric Society, 1983, Pisa.
Smith, R.J., 1984, A note on likelihood ratio tests for the independence between a subset of stochastic regressors and disturbance, International Economic Review 25, 263-269.
Smith, R.J., 1985, Wald tests for the independence of stochastic variables and disturbance of a single linear stochastic simultaneous equation, Economics Letters 17, 87-90.
Turkington, D.A., 1985, A note on two-stage least squares, three-stage least squares and maximum likelihood estimation in an expectations model, International Economic Review 26, 507-510.
Wu, D.-M., 1973, Alternative tests of independence between stochastic regressors and disturbance, Econometrica 41, 733-750.
Wu, D.-M., 1974, Alternative tests of independence between stochastic regressors and disturbance: Finite sample results, Econometrica 42, 529-546.