Journal of Econometrics 17 (1981) 67-82. North-Holland Publishing Company

ON THE EFFICIENCY OF THE COCHRANE-ORCUTT ESTIMATOR*

William E. TAYLOR

Bell Telephone Laboratories, Murray Hill, NJ 07974, USA

Received July 1980, final version received June 1981
Using analytic approximations, we reconcile some radically contradictory evidence and resolve an interesting paradox that occurs in a simple linear model with autocorrelated disturbances. In general, the behavior of conventional coefficient estimators is quite sensitive to the specification of the exogenous variables, or, equivalently, to whether the marginal efficiency or the conditional efficiency of the coefficient estimators is being compared.
*This work grew out of stimulating discussions, now several years old, with J. Rohlfs. I am indebted to D. Preston, R. Radner, and J.G. Ramage for helpful comments on previous drafts,

0165-7410/81/0000-0000/$02.75 © 1981 North-Holland

1. Introduction

Consider a simple linear model with serially correlated disturbances. Conventional coefficient estimators for this problem can be organized as numerical or statistical approximations to solutions of the likelihood equations [see Hendry (1976)]. Such procedures (i) begin with a consistent estimate of the disturbance serial correlation coefficient, (ii) use the estimate in place of the true parameter in the likelihood equations for the slope coefficients, and (iii) perhaps iterate until some numerical standard of convergence is achieved. A large number of asymptotically equivalent estimators fit this description. In a classic Monte Carlo study, Rao and Griliches (1969) showed that several such procedures, including the Cochrane-Orcutt (CO) estimator, produce significant gains in efficiency relative to least squares (LS), even for small samples and small degrees of serial correlation. In a related analytic approximation, Malinvaud (1970, p. 526) showed the LS estimator to be seriously inefficient relative to the Cramér-Rao bound for reasonable values of the parameters. There thus appear to be substantial efficiency gains from correcting LS estimators for serial correlation, and it is probably fair to say that this has become the accepted view of the subject.

Several recent studies, however, purport to vindicate LS in this context. For a single trended exogenous variable, Maeshiro (1976, 1979) calculates exact variances which indicate that LS has smaller variance than CO throughout the economically relevant portion of the parameter space. Moreover, as the autocorrelation coefficient of the exogenous variable ($\lambda$) approaches that of the disturbances ($\rho$), the variance of the CO estimator increases without bound, and, not surprisingly, its variance substantially exceeds that of LS in neighborhoods of these regions. From this, Maeshiro questions the practice of serial correlation corrections in general (1976) and, in particular, those which omit the initial observation (1979). Two Monte Carlo studies more or less echo the latter conclusion. Park and Mitchell (1980) emphatically recommend that CO (using $T-1$ observations) be avoided, even in preference to LS. Spitzer (1979) finds an evident gain in efficiency from using all $T$ transformed observations.

These studies raise a number of difficulties. In the first place, they flatly contradict Rao and Griliches' (1969) conclusion that the CO estimator (based upon an estimated $\rho$ and $T-1$ observations) has smaller mean square error than LS for each value of $|\rho| > 0.3$, averaged over all $\lambda$. Secondly, the CO estimator is identical to the generalized least squares (GLS) estimator when $\rho$ is known, except that it omits the first observation. Since the GLS estimator dominates LS (when $\rho$ is known) for all $\rho$ and $\lambda$, it is something of a paradox that the omission of a single observation can make such a difference, even asymptotically.

This paper reconciles the above contradictions and provides a rigorous explanation for the paradoxical behavior relative to the initial observation. We note at the outset that the statistical characteristics of all estimators in this model are quite sensitive to the specification of the process which generates the exogenous variables.
Depending upon the specification chosen, we show that for a priori reasonable values of $\rho$ and $\lambda$: (i) the data sometimes contain less and sometimes more information about the regression coefficients than the variability of the exogenous variables would indicate; (ii) the proportion of information in the sample carried by a single observation can be very large, even asymptotically; and (iii) the discrepancies in the numerical results above can be explained by differences in the specification of the exogenous variables, or, equivalently, by whether the conditional or the marginal variance of the estimators is being compared.

2. The model

Consider the following bivariate linear regression model:

$$Y_t = \alpha + \beta X_t + u_t, \qquad t = 1, \ldots, T,$$
$$u_t = \rho u_{t-1} + \varepsilon_t, \qquad t = \ldots, -1, 0, 1, \ldots, \qquad (2.1)$$
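in which the disturbances follow a stationary autoregressive process having mean 0 and constant variance $\sigma_u^2$. The $\varepsilon_t$ are independent and identically distributed with mean 0 and variance $\sigma_\varepsilon^2$. Since $|\rho|$ is assumed less than 1, the disturbance covariance matrix is given by

$$\operatorname{cov}(u_t, u_\tau) = \rho^{|t-\tau|}\sigma_\varepsilon^2/(1-\rho^2), \qquad t \neq \tau,$$
$$\operatorname{var}(u_t) = \sigma_u^2 = \sigma_\varepsilon^2/(1-\rho^2), \qquad t = \tau, \qquad t, \tau = \ldots, -1, 0, 1, \ldots$$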
Throughout this paper, we shall assume that $\rho$ is known and that $\alpha$ is either known or (equivalently) equal to 0. This is done solely for analytical convenience; as long as the $X_t$ process is strictly exogenous, the portion of the information matrix relevant to $\beta$ has the same qualitative features regardless of whether $(\alpha, \rho)$ are treated as known or unknown.

Numerical studies of the behavior of competing estimators for $\beta$ have used a variety of specifications for the exogenous $X_t$ process. Prominent among these are (i) a stationary autoregressive process [Rao and Griliches (1969), Spitzer (1979), Maeshiro (1979)]; (ii) a non-stochastic autoregressive process [Maeshiro (1976, 1979)]; and (iii) various forms of positively trended series, both stochastic and non-stochastic [Park and Mitchell (1980), Maeshiro (1976, 1979)]. In order to identify consequences of these different specifications, we posit a model of the $X_t$ process which includes the first two of the above as special cases. Let

$$X_t = \lambda X_{t-1} + v_t, \qquad t = 2, \ldots, T,$$
$$X_1 = \xi_1 + v_1. \qquad (2.2)$$
The innovations $v_t$ are i.i.d. with 0 mean and variance $\sigma_v^2$ $(t = 2, \ldots, T)$; $\operatorname{var}(X_1) = \sigma_1^2$, $E(X_1) = \xi_1$, and $E(v_t \mid X_{t-1}) = 0$.

The main results we wish to compare are those of Maeshiro (1976, 1979) and Park and Mitchell (1980) on the one hand, and Rao and Griliches (1969) on the other. Since the latter is confined to stable processes ($|\lambda| < 1$), we shall maintain this assumption here; however, recent Monte Carlo evidence has treated $\lambda > 1$, and we consider this case separately in section 6. If $v_t$ is normally distributed (or if we restrict ourselves to information contained in the first two moments of the data), special cases of the model in eq. (2.2) are observationally equivalent to the models simulated in the literature:

(i) For $\xi_1 = 0$ and $\sigma_1^2 = \sigma_v^2/(1-\lambda^2)$, eq. (2.2) becomes a covariance stationary autoregressive model,

$$X_t = \lambda X_{t-1} + v_t, \qquad t = \ldots, -1, 0, 1, \ldots, \qquad (A)$$

where $E(X_t) = 0$ and $\operatorname{var}(X_t) = \sigma_X^2 = \sigma_v^2/(1-\lambda^2)$. This is the model simulated by Rao and Griliches (1969) and Spitzer (1979).

(ii) For $\sigma_v^2 = 0$, we obtain the non-stochastic autoregressive model used by Maeshiro (1976, 1979),

$$X_t = \lambda X_{t-1}, \qquad t = 2, \ldots, T, \qquad X_1 = \xi_1. \qquad (B)$$

The principal distinction between models (A) and (B) is that they imply different orders in probability of $\sum_{t=1}^T X_t^2$ as $T \to \infty$.

Proposition 2.1. The sequence in $T$, $\sum_{t=1}^T X_t^2$, is $O(T)$ in model (A) and $O(1)$ in model (B).

Proof. In (A), $E(X_t) = 0$, so

$$\operatorname{plim}_{T\to\infty} \frac{1}{T}\sum_{t=1}^T X_t^2 = \sigma_X^2 = \sigma_v^2/(1-\lambda^2).$$

In model (B),

$$\lim_{T\to\infty} \sum_{t=1}^T X_t^2 = \xi_1^2/(1-\lambda^2),$$

a constant.
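As a quick numerical illustration of Proposition 2.1, the sketch below simulates model (A) and evaluates model (B). It assumes NumPy and normal innovations, and the function names (`simulate_model_a`, `model_b`, `sum_sq_model_b`) are illustrative rather than taken from the paper. In (A), $\sum X_t^2/T$ settles near $\sigma_X^2$, while in (B) the sum itself converges to $\xi_1^2/(1-\lambda^2)$.

```python
import numpy as np


def simulate_model_a(lam, sigma_v, T, rng):
    """Model (A): covariance-stationary AR(1), X_t = lam * X_{t-1} + v_t."""
    sigma_x = sigma_v / np.sqrt(1.0 - lam**2)
    x = np.empty(T)
    x[0] = rng.normal(0.0, sigma_x)  # draw X_1 from the stationary distribution
    for t in range(1, T):
        x[t] = lam * x[t - 1] + rng.normal(0.0, sigma_v)
    return x


def model_b(lam, xi1, T):
    """Model (B): non-stochastic X_t = lam**(t-1) * xi1, t = 1, ..., T."""
    return xi1 * lam ** np.arange(T)


def sum_sq_model_b(lam, xi1, T):
    """Closed form of sum_{t=1}^T X_t^2 in model (B)."""
    return xi1**2 * (1.0 - lam ** (2 * T)) / (1.0 - lam**2)


rng = np.random.default_rng(0)
lam, sigma_v, xi1 = 0.5, 1.0, 2.0
for T in (100, 10_000):
    xa = simulate_model_a(lam, sigma_v, T, rng)
    xb = model_b(lam, xi1, T)
    # (A): the sum grows like T, so divide by T; (B): the sum itself stays bounded
    print(T, np.sum(xa**2) / T, np.sum(xb**2))
```

With these values $\sigma_X^2 = 1/0.75 \approx 1.33$ and $\xi_1^2/(1-\lambda^2) \approx 5.33$.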
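Conditional on the $X_t$'s, the likelihood function for $\beta$ is easily derived. Taking $\rho$ as known, a non-singular transformation reduces eq. (2.1) to

$$Y_t^* = X_t^*\beta + u_t^*, \qquad t = 1, \ldots, T, \qquad (2.3)$$

where

$$Y_1^* = \sqrt{1-\rho^2}\,Y_1, \qquad X_1^* = \sqrt{1-\rho^2}\,X_1, \qquad u_1^* = \sqrt{1-\rho^2}\,u_1,$$

and

$$Y_t^* = Y_t - \rho Y_{t-1}, \qquad X_t^* = X_t - \rho X_{t-1}, \qquad u_t^* = u_t - \rho u_{t-1}, \qquad t = 2, \ldots, T.$$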
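The following is a minimal sketch of transformation (2.3). It assumes $\alpha = 0$ and a known $\rho$ with $|\rho| < 1$. Least squares on all $T$ transformed observations gives the GLS estimator, and dropping the first transformed observation gives the CO estimator with $\rho$ known. The helper names are hypothetical. `fisher_info` simply evaluates the sum of squared transformed regressors that appears in eq. (2.4), divided by $\sigma_\varepsilon^2$.

```python
import numpy as np


def gls_transform(z, rho):
    """Apply transformation (2.3) to a series z_1, ..., z_T (rho known, |rho| < 1)."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    out[0] = np.sqrt(1.0 - rho**2) * z[0]  # first observation is rescaled
    out[1:] = z[1:] - rho * z[:-1]         # quasi-differences for t = 2, ..., T
    return out


def gls_beta(y, x, rho, drop_first=False):
    """Slope from LS on transformed data (alpha = 0).

    drop_first=False gives GLS; drop_first=True omits the first transformed
    observation, which is the CO estimator when rho is known.
    """
    ys, xs = gls_transform(y, rho), gls_transform(x, rho)
    if drop_first:
        ys, xs = ys[1:], xs[1:]
    return np.dot(xs, ys) / np.dot(xs, xs)


def fisher_info(x, rho, sigma_eps2):
    """Information for beta: sum of squared transformed X divided by sigma_eps^2."""
    xs = gls_transform(x, rho)
    return np.dot(xs, xs) / sigma_eps2


x = np.array([1.0, 2.0, 3.0])
print(gls_transform(x, 0.6), gls_beta(1.5 * x, x, 0.6), fisher_info(x, 0.6, 2.0))
```

On noise-free data $Y_t = 1.5 X_t$ both variants return the slope 1.5 exactly, because the transformation is linear.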
By construction, the random variables $(u_1^*, u_2^*, \ldots, u_T^*) = (u_1^*, \varepsilon_2, \ldots, \varepsilon_T)$ are i.i.d. with 0 mean and variance $\sigma_\varepsilon^2$. Assuming the $\varepsilon_t$ to be normally distributed, the joint density of the observations $(Y_1^*, \ldots, Y_T^*)$ conditional upon $(X_1, \ldots, X_T)$ is proportional to

$$\exp\left\{-\frac{1}{2\sigma_\varepsilon^2}\left[(1-\rho^2)(Y_1 - X_1\beta)^2 + \sum_{t=2}^T (Y_t^* - X_t^*\beta)^2\right]\right\},$$

since the Jacobian of the transformation from $(u_1^*, \varepsilon_2, \ldots, \varepsilon_T)$ to $(Y_1^*, \ldots, Y_T^*)$ is one. The maximum likelihood estimator for $\beta$ is a solution to

$$\left[(1-\rho^2)X_1^2 + \sum_{t=2}^T X_t^{*2}\right]\beta = (1-\rho^2)Y_1X_1 + \sum_{t=2}^T Y_t^*X_t^*,$$

and when $\rho$ is known, the $1 \times 1$ Fisher information matrix for $\beta$ is simply

$$\left[(1-\rho^2)X_1^2 + \sum_{t=2}^T X_t^{*2}\right]\Big/\sigma_\varepsilon^2, \qquad (2.4)$$
whose inverse is the Cramér-Rao bound.

Two technical details are important to note for the results to follow. First, the initial observation $X_1$ has been singled out in eq. (2.4). This has led to slightly misleading statements in the literature concerning the relative importance of the first observation. The transformation above that diagonalizes the disturbance covariance matrix is not unique; indeed, any observation could be chosen to play the role that the initial observation plays here. Second, recall that the results in Park and Mitchell (1980) and Maeshiro (1976, 1979) for non-stochastic $X_t$'s are interpretable as conditional variances of the various estimators for $\beta$; i.e., the results are conditional on the particular realization $X_t$ $(t = 1, \ldots, T)$ that was used. To compare results for different stochastic $X_t$ processes, and to identify characteristics of the process (e.g., trend) that affect the conditional variances, the unconditional variance is the appropriate measure. It is the conditional variance of the estimator averaged with respect to the distribution of the $X_t$ process; i.e.,
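$$\operatorname{var}(\hat\beta) = E_X\left[\operatorname{var}(\hat\beta \mid X_1, \ldots, X_T)\right].$$

(The other term of the usual variance decomposition, $\operatorname{var}_X[E(\hat\beta \mid X_1, \ldots, X_T)]$, vanishes because each estimator considered here is conditionally unbiased for $\beta$.) In any case, it is the parameter that the Monte Carlo results of Rao and Griliches (1969) and Spitzer (1979) are estimating.

3. Information in the sample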
It is sometimes asserted [e.g., Maeshiro (1979)] that the transformation leading to eq. (2.3) reduces the variability of the exogenous variable and the precision of the associated estimator for $\beta$. For large $T$ or small $\xi_1$,

$$\operatorname{var}(X_t^*) \approx \operatorname{var}(X_t)\,(1 - 2\rho\lambda + \rho^2), \qquad t > 1,$$

and the approximation is exact in model (A). Thus the transformation increases the variance of the exogenous variable whenever ($\rho > 0$ and $\rho > 2\lambda$) or ($\rho < 0$ and $\rho < 2\lambda$), and reduces it otherwise. This fact is reflected in the approximate Cramér-Rao bound for $\beta$ developed in the appendix,

$$\operatorname{var}(\hat\beta_{GLS}) \approx \sigma_\varepsilon^2 \Big/ \left[(1-2\rho\lambda+\rho^2)(T-1)\sigma_X^2 + (1-\rho^2)\sigma_1^2 + \xi_1^2(1-\rho\lambda)^2/(1-\lambda^2)\right].$$

For large $T$, the change in efficiency caused by $\rho \neq 0$ is proportional to $(1 - 2\rho\lambda + \rho^2)$, so the transformation increases the precision of our estimate of $\beta$ whenever it increases the variance of the explanatory variables above. In model (B), the Cramér-Rao bound is exactly (apart from terms of order $\lambda^{2T}$)

$$\operatorname{var}(\hat\beta_{GLS}) = \frac{\sigma_\varepsilon^2(1-\lambda^2)}{(1-\rho\lambda)^2\,\xi_1^2}.$$

Here, $\rho \neq 0$ increases the Cramér-Rao bound for $\beta$ whenever $\rho$ and $\lambda$ have the same sign and decreases it otherwise. Thus the effect of the GLS transformation on the amount of information in the sample for estimating $\beta$ is doubly ambiguous: it depends on $(\rho, \lambda)$ differently depending upon whether model (A) or (B) generated the exogenous variable.
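The sign conditions above can be checked directly. The sketch assumes NumPy and uses illustrative helper names. `transform_variance_factor` evaluates $1 - 2\rho\lambda + \rho^2$, whose excess over one is $\rho(\rho - 2\lambda)$. `cr_bound_model_b` compares the large-$T$ bound for model (B) with the finite-$T$ value obtained from eq. (2.4) with $X_t = \lambda^{t-1}\xi_1$.

```python
import numpy as np


def transform_variance_factor(rho, lam):
    """Approximate ratio var(X_t*) / var(X_t) = 1 - 2*rho*lam + rho**2 for t > 1."""
    return 1.0 - 2.0 * rho * lam + rho**2


def cr_bound_model_b(rho, lam, xi1, sigma_eps2, T=None):
    """Cramer-Rao bound for beta in model (B).

    T=None returns the large-T form sigma_eps^2 (1 - lam^2) / ((1 - rho*lam)^2 xi1^2);
    otherwise the finite-T bound is computed from (2.4) with X_t = lam**(t-1) * xi1.
    """
    if T is None:
        return sigma_eps2 * (1.0 - lam**2) / ((1.0 - rho * lam) ** 2 * xi1**2)
    x = xi1 * lam ** np.arange(T)
    info = (1.0 - rho**2) * x[0] ** 2 + np.sum((x[1:] - rho * x[:-1]) ** 2)
    return sigma_eps2 / info


# The factor exceeds one exactly when rho * (rho - 2 * lam) > 0.
for rho, lam in [(0.9, 0.3), (0.5, 0.5), (-0.6, 0.2)]:
    print(rho, lam, transform_variance_factor(rho, lam))
```

For the three pairs shown, the factors are 1.27, 0.75 and 1.6, in line with the sign rule.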
4. Asymptotic asymmetry of the initial observation

A measure of the influence of the initial observation is provided by the efficiency of the CO estimator relative to the Cramér-Rao bound. Using the unconditional variances and the approximations from the appendix,

$$\frac{\operatorname{var}(\hat\beta_{CO})}{\operatorname{var}(\hat\beta_{GLS})} \approx \frac{(1-2\rho\lambda+\rho^2)(T-1)\sigma_X^2 + (1-\rho^2)\sigma_1^2 + \xi_1^2(1-\rho\lambda)^2/(1-\lambda^2)}{(1-2\rho\lambda+\rho^2)(T-1)\sigma_X^2 + \xi_1^2(\rho-\lambda)^2/(1-\lambda^2)},$$

which clearly goes to 1 as $T \to \infty$, provided $\sigma_v^2 \neq 0$. The effect of omitting the first observation is then asymptotically irrelevant, as intuition and Johnston (1972, p. 261) suggest. As $\sigma_v^2$ approaches 0 (with $\sigma_1^2 = 0$, as in model (B)), however,

$$\frac{\operatorname{var}(\hat\beta_{CO})}{\operatorname{var}(\hat\beta_{GLS})} \approx \frac{(1-\rho\lambda)^2}{(\rho-\lambda)^2} = 1 + \frac{(1-\rho^2)(1-\lambda^2)}{(\rho-\lambda)^2},$$
which increases without bound as $\rho \to \lambda$. Thus there exists an extreme asymmetry in the value of the first observation which persists as $T \to \infty$, but it is peculiar to model (B). Using the exact expression for the Cramér-Rao bound for $\beta$ in model (B),

$$\operatorname{var}(\hat\beta_{GLS}) = \sigma_\varepsilon^2 \Big/ \xi_1^2\left[(1-\rho^2) + (\lambda-\rho)^2\,\frac{1-\lambda^{2(T-1)}}{1-\lambda^2}\right],$$

the relative amount of information in the first and subsequent observations when $\rho$ is close to $\lambda$ is obvious. Indeed, this explains the infinite variance of the CO estimator at $\rho = \lambda$ in Maeshiro (1976, table 1). When model (B) generates the $X_t$ process and $\rho = \lambda$, the first observation is the only relevant observation for estimating $\beta$ in the following sense:

Proposition 4.1. In model (B) when $\rho = \lambda$, a minimal sufficient statistic for $\beta$ is $Y_1/X_1$. It is unbiased but inconsistent for $\beta$, and is the maximum likelihood and GLS estimator for $\beta$.

Proof. The results follow from the observation that $v_t = 0$ $(t = 2, \ldots, T)$ in model (B), so that $X_t^* = 0$ $(t = 2, \ldots, T)$ when $\rho = \lambda$. The inconsistency follows from the fact that $\operatorname{var}(\hat\beta_{GLS}) = \operatorname{var}(Y_1/X_1) = \sigma_u^2/X_1^2$, which does not decrease with $T$.

The CO estimator, which omits $(X_1, Y_1)$ altogether, thus effectively discards all of the relevant information in the sample for estimating $\beta$ in model (B) when $\rho = \lambda$. This does not occur in model (A), where $X_t^* = v_t$ when $\rho = \lambda$. The pathological importance of the initial observation is thus an artifact of the peculiar process generating the $X_t$ in model (B). As long as some randomness persists in the exogenous variable, any single observation will become negligible asymptotically.

5. The relative efficiency of least squares
Notwithstanding the omission of the first observation, the Monte Carlo results of Rao and Griliches (1969) demonstrate the superiority of the CO estimator (based upon an estimated $\rho$ and $T-1$ observations) over least squares. In light of these findings, how do we explain Maeshiro's diametrically opposite conclusion based on exact relative efficiencies? The answer again lies in the specification of the exogenous variable.
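A rough Monte Carlo sketch of the comparison at issue follows. Its simplifying assumptions differ from Rao and Griliches' design: $\rho$ is treated as known rather than estimated, $\alpha = 0$, innovations are standard normal, and $X_t$ follows model (A). It is only meant to show the direction of the effect. The parameter values, replication count and function names are arbitrary choices.

```python
import numpy as np


def quasi_diff(z, rho):
    """CO transformation: z_t - rho * z_{t-1} for t = 2, ..., T."""
    return z[1:] - rho * z[:-1]


def ar1_stationary(coef, innov_sd, T, rng):
    """Zero-mean stationary AR(1) path of length T."""
    z = np.empty(T)
    z[0] = rng.normal(0.0, innov_sd / np.sqrt(1.0 - coef**2))
    for t in range(1, T):
        z[t] = coef * z[t - 1] + rng.normal(0.0, innov_sd)
    return z


def mc_ls_vs_co(rho, lam, T=50, reps=2000, beta=1.0, seed=0):
    """Monte Carlo variances of LS and CO (rho known, alpha = 0) under model (A)."""
    rng = np.random.default_rng(seed)
    b_ls, b_co = np.empty(reps), np.empty(reps)
    for r in range(reps):
        x = ar1_stationary(lam, 1.0, T, rng)  # exogenous regressor, model (A)
        u = ar1_stationary(rho, 1.0, T, rng)  # AR(1) disturbances, sigma_eps = 1
        y = beta * x + u
        b_ls[r] = np.dot(x, y) / np.dot(x, x)
        xs, ys = quasi_diff(x, rho), quasi_diff(y, rho)
        b_co[r] = np.dot(xs, ys) / np.dot(xs, xs)
    return b_ls.var(), b_co.var()


var_ls, var_co = mc_ls_vs_co(rho=0.8, lam=0.5)
print(var_ls / var_co)  # eq. (5.5A) evaluates to about 5.4 at rho = 0.8, lam = 0.5
```

Because the regressor is redrawn in every replication, the simulated variances estimate the unconditional (marginal) variances discussed in section 2.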
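First, how inefficient is LS relative to the Cramér-Rao bound, assuming $\rho$ to be known? Conventional wisdom [Johnston (1972), Malinvaud (1970)] is based on the approximation

$$\frac{\operatorname{var}(\hat\beta_{LS})}{\operatorname{var}(\hat\beta_{GLS})} \approx \frac{(1+\rho\lambda)(1-2\rho\lambda+\rho^2)}{(1-\rho\lambda)(1-\rho^2)} \qquad (5.1)$$

for an $X_t$ process generated by model (A) and for large $T$. From a plot of this expression in Malinvaud (1970, p. 526), LS appears extremely inefficient for $\rho$ near 1 but probably adequate for $\rho$ up to 0.5. From the appendix, we derive the analogous expression for $X_t$ generated in eq. (2.2):

$$\frac{\operatorname{var}(\hat\beta_{LS})}{\operatorname{var}(\hat\beta_{GLS})} \approx \frac{1+\rho\lambda}{1-\rho\lambda}\cdot\frac{(1-2\rho\lambda+\rho^2)(T-1)\sigma_X^2 + (1-\rho^2)\sigma_1^2 + \xi_1^2(1-\rho\lambda)^2/(1-\lambda^2)}{(1-\rho^2)T\sigma_X^2 + (1-\rho^2)\xi_1^2/(1-\lambda^2)}. \qquad (5.2)$$

This is asymptotically equivalent to eq. (5.1) provided $\sigma_v^2 > 0$; in model (B), however,

$$\frac{\operatorname{var}(\hat\beta_{LS})_B}{\operatorname{var}(\hat\beta_{GLS})_B} \approx \frac{1-\rho^2\lambda^2}{1-\rho^2} = 1 + \frac{\rho^2(1-\lambda^2)}{1-\rho^2}. \qquad (5.3)$$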
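For $|\lambda| < 1$, the expression in eq. (5.3) is smaller than that in eq. (5.1), since the ratio of (5.3) to (5.1) is $(1-\rho\lambda)^2/(1-2\rho\lambda+\rho^2) \le 1$. Relative to the Cramér-Rao bound, LS is therefore more efficient when $X_t$ is generated by (B) than by (A). Although Malinvaud's approximation (5.1) is valid for model (A), it is inappropriate for exogenous variables from model (B).

To resolve the dispute about the relative efficiency of LS and the CO estimator, we compare the approximate unconditional variances from the appendix. For known $\rho$, we obtain

$$\frac{\operatorname{var}(\hat\beta_{LS})}{\operatorname{var}(\hat\beta_{CO})} \approx \frac{1+\rho\lambda}{1-\rho\lambda}\cdot\frac{(1-2\rho\lambda+\rho^2)(T-1)\sigma_X^2 + \xi_1^2(\rho-\lambda)^2/(1-\lambda^2)}{(1-\rho^2)T\sigma_X^2 + \xi_1^2(1-\rho^2)/(1-\lambda^2)}, \qquad (5.4)$$

which reduces to

$$\frac{\operatorname{var}(\hat\beta_{LS})}{\operatorname{var}(\hat\beta_{CO})} \approx \frac{1+\rho\lambda}{1-\rho\lambda}\cdot\frac{1-2\rho\lambda+\rho^2}{1-\rho^2} \quad \text{in (A)}, \qquad (5.5A)$$

$$\frac{\operatorname{var}(\hat\beta_{LS})}{\operatorname{var}(\hat\beta_{CO})} \approx \frac{1+\rho\lambda}{1-\rho\lambda}\cdot\frac{(\rho-\lambda)^2}{1-\rho^2} \quad \text{in (B)}. \qquad (5.5B)$$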
The expressions (5.5) demonstrate the source of the difference between the Monte Carlo results of Rao and Griliches (1969) in model (A) and the exact numerical calculations of Maeshiro (1976) in model (B). Eqs. (5.4) and (5.5A) are both asymptotically equal to the expression in eq. (5.2). This reflects the fact that the CO estimator is (asymptotically) fully efficient in model (A), and in eq. (2.2) whenever $\sigma_v^2 > 0$. In (B), however, the asymmetric importance of the first observation gives a very different picture of the relative efficiency of CO and LS. Specifically, the ratio of the relative efficiencies in models (A) and (B) is

$$\frac{1-2\rho\lambda+\rho^2}{(\rho-\lambda)^2},$$

which is always greater than one and increases without bound as $\rho \to \lambda$.
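The approximations (5.5A) and (5.5B), and the ratio between them, are easy to tabulate. The sketch below is plain Python with illustrative function names. It evaluates the formulas as written and involves no simulation.

```python
def ls_over_co_model_a(rho, lam):
    """Approximation (5.5A) to var(LS) / var(CO) when X_t follows model (A)."""
    return (1 + rho * lam) / (1 - rho * lam) * (1 - 2 * rho * lam + rho**2) / (1 - rho**2)


def ls_over_co_model_b(rho, lam):
    """Approximation (5.5B) to var(LS) / var(CO) when X_t follows model (B)."""
    return (1 + rho * lam) / (1 - rho * lam) * (rho - lam) ** 2 / (1 - rho**2)


def ratio_of_relative_efficiencies(rho, lam):
    """(5.5A) divided by (5.5B): (1 - 2 rho lam + rho^2) / (rho - lam)^2."""
    return (1 - 2 * rho * lam + rho**2) / (rho - lam) ** 2


lam = 0.8
for rho in (0.2, 0.5, 0.7, 0.79):
    # The last column blows up as rho approaches lam.
    print(rho, ls_over_co_model_a(rho, lam), ls_over_co_model_b(rho, lam),
          ratio_of_relative_efficiencies(rho, lam))
```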
Closer examination of eqs. (5.5A) and (5.5B) shows that the optimal choice between CO and LS generally depends upon whether model (A) or (B) generated the $X_t$. Since

$$\frac{(1+\rho\lambda)(1-2\rho\lambda+\rho^2)}{(1-\rho\lambda)(1-\rho^2)} \ge 1,$$

Proposition 5.1. In model (A), the CO estimator has smaller approximate variance than the LS estimator for all $|\rho|, |\lambda| < 1$.

On the other hand:

Proposition 5.2. In model (B), the CO estimator has larger approximate variance than the LS estimator for all $\lambda \in [0, 1)$ whenever $0 \le \rho < 1/\sqrt{2} \approx 0.707$.

Proof. From eq. (5.5B),

$$\frac{(1+\rho\lambda)(\rho-\lambda)^2}{(1-\rho\lambda)(1-\rho^2)} < 1$$

whenever $2\rho^2 - \rho\lambda - 1 < 0$, because the denominator minus the numerator equals $(1-\lambda^2)(1 + \rho\lambda - 2\rho^2)$. For $\lambda \ge 0$ and $0 \le \rho < 1/\sqrt{2}$, we have $2\rho^2 - \rho\lambda - 1 \le 2\rho^2 - 1 < 0$.
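Propositions 5.1 and 5.2 can be checked on a grid. The sketch below assumes NumPy and uses illustrative names. It evaluates (5.5A) and (5.5B) for $\rho \in [-0.99, 0.99]$ and $\lambda \in [0, 0.99]$. It also confirms that the proof's condition $2\rho^2 - \rho\lambda - 1 < 0$ matches the sign of (5.5B) minus one, away from the boundary where the two are equal. The check of Proposition 5.2 is restricted to $\rho \ge 0$ because the inequality can fail for negative $\rho$, for instance at $\rho = -0.7$, $\lambda = 0.9$.

```python
import numpy as np


def ratio_a(rho, lam):
    """Approximation (5.5A) to var(LS) / var(CO), model (A)."""
    return (1 + rho * lam) / (1 - rho * lam) * (1 - 2 * rho * lam + rho**2) / (1 - rho**2)


def ratio_b(rho, lam):
    """Approximation (5.5B) to var(LS) / var(CO), model (B)."""
    return (1 + rho * lam) / (1 - rho * lam) * (rho - lam) ** 2 / (1 - rho**2)


rho = np.linspace(-0.99, 0.99, 199)[:, None]  # rows: rho
lam = np.linspace(0.0, 0.99, 100)[None, :]    # columns: lam in [0, 1)
ra, rb = ratio_a(rho, lam), ratio_b(rho, lam)

# Proposition 5.1: in model (A) the ratio never falls below 1 (up to rounding).
prop_5_1 = bool(np.all(ra >= 1 - 1e-12))

# Proposition 5.2: in model (B) the ratio is below 1 for 0 <= rho < 1/sqrt(2).
in_range = (rho >= 0) & (rho < 1 / np.sqrt(2)) & (lam >= 0)
prop_5_2 = bool(np.all(rb[in_range] < 1))

# The proof's condition, compared only away from its zero set.
cond_val = 2 * rho**2 - rho * lam - 1
away = np.abs(cond_val) > 1e-9
proof_ok = bool(np.all((rb < 1)[away] == (cond_val < 0)[away]))

print(prop_5_1, prop_5_2, proof_ok)
print(ratio_b(-0.7, 0.9))  # about 1.14, so LS does not beat CO at this negative rho
```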
This difference is illustrated in fig. 1, where for $T = 20$ and $\lambda = 0.8$, we plot $\operatorname{var}(\hat\beta_{LS})/\operatorname{var}(\hat\beta_{CO})$ against $\rho \in (0, 1)$. The upper curve derives from the approximation in eq. (5.5A); the second, from the approximation (5.5B). To exhibit the quality of the latter approximation, the third curve in fig. 1 is the exact relative efficiency for model (B), as calculated in Maeshiro (1976, table 1). Note that the vertical axis is in logs and that the approximation (5.5B) and the exact calculation for (B) coincide at 0 for $\rho = 0.8$.

[Fig. 1]
6. Trended data

Recent studies [Park and Mitchell (1980) and Maeshiro (1979)] suggest that the CO estimator performs particularly poorly relative to LS for a trended exogenous variable. Since Rao and Griliches (and Spitzer) confine their studies to stable $X_t$ processes, there is some speculation [particularly Maeshiro (1979, p. 259)] that it is the trend in the $X_t$'s that explains the difference in the relative efficiencies. To examine this, let $X_t$ continue to be generated by eq. (2.2) but with $\lambda > 1$.
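The $X_t$ process is now unstable (taking $\operatorname{var}(X_1) = \sigma_v^2$):

$$E(X_t) = \lambda^{t-1}\xi_1 \quad\text{and}\quad \operatorname{var}(X_t) = \sigma_v^2\sum_{j=0}^{t-1}\lambda^{2j},$$

both of which increase without bound as $t \to \infty$. In violation of the usual asymptotic assumptions, $\sum_{t=1}^T X_t^2$ is no longer $O(T)$ but $o(\lambda^{(2+\epsilon)T})$ for any $\epsilon > 0$; however, the usual asymptotic properties of all estimators considered continue to hold. Of course, the relative magnitude of terms of $O(T)$ and $o(\lambda^{(2+\epsilon)T})$ depends on $T$ and on how close $\lambda$ is to 1; asymptotic approximations for $\lambda > 1$ may be misleading for moderate $T$ and $\lambda \approx 1$. In the unstable case, the $\sum X_t^2$ series is dominated by the last term, in the following sense:

$$\lim_{T\to\infty} \frac{E\left(\sum_{t=1}^T X_t^2\right)}{E(X_T^2)} = \frac{\lambda^2}{\lambda^2-1}.$$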
Since $\hat\beta_{GLS}$ and $\hat\beta_{CO}$ treat the final observation identically, there is no asymptotic asymmetry in the value of the initial observation in this case. Modifying results in the appendix,

$$\frac{\operatorname{var}(\hat\beta_{GLS})}{\operatorname{var}(\hat\beta_{CO})} \approx \frac{(\rho-\lambda)^2[\xi_1^2(\lambda^2-1)+\sigma_v^2\lambda^2] - (\lambda^2-1)^2E(X_1^2)/\lambda^{2(T-1)}}{(\rho-\lambda)^2[\xi_1^2(\lambda^2-1)+\sigma_v^2\lambda^2] - (\lambda^2-1)^2E(X_1^2)\,\rho^2/\lambda^{2(T-1)}},$$

so that the initial observation is negligible in both models (A) and (B). Its importance increases as $\rho \to \lambda$, but $(\rho - \lambda)$ is bounded away from zero by the assumptions $\lambda > 1$ and $|\rho| < 1$. The efficiency of LS relative to CO is thus the same as that relative to GLS in this approximation. From the appendix, assuming $|\rho/\lambda| < 1$,
$$\frac{\operatorname{var}(\hat\beta_{LS})}{\operatorname{var}(\hat\beta_{CO})} \approx \frac{(\lambda^2+\rho\lambda)(\rho-\lambda)^2}{(\lambda^2-\rho\lambda)(1-\rho^2)\lambda^2},$$

and it is straightforward to check that this ratio exceeds one for $|\rho| < 1$, $|\lambda| > 1$, and $|\rho\lambda| < \lambda^2$. Here, in both models (A) and (B), the CO estimator based on $(T-1)$ observations is asymptotically more efficient than LS. Thus one cannot conclude that trend per se is particularly inimical to the CO estimator, or that the lack of trend in Rao and Griliches' $X_t$ process explains the difference between their results and those of Park and Mitchell or Maeshiro.
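The section 6 ratio simplifies algebraically. Since $(\lambda^2+\rho\lambda)/(\lambda^2-\rho\lambda) = (\lambda+\rho)/(\lambda-\rho)$, multiplying by $(\rho-\lambda)^2$ leaves $(\lambda^2-\rho^2)/[(1-\rho^2)\lambda^2]$. This exceeds one exactly when $\rho \neq 0$ and $\lambda^2 > 1$. The sketch below, in plain Python with illustrative names, checks the two forms against each other for a few explosive values of $\lambda$.

```python
def ls_over_co_unstable(rho, lam):
    """Section 6 approximation to var(LS) / var(CO) for |lam| > 1, |rho| < 1."""
    return (lam**2 + rho * lam) * (rho - lam) ** 2 / ((lam**2 - rho * lam) * (1 - rho**2) * lam**2)


def simplified(rho, lam):
    """Algebraically equivalent form (lam^2 - rho^2) / ((1 - rho^2) lam^2)."""
    return (lam**2 - rho**2) / ((1 - rho**2) * lam**2)


for lam in (1.01, 1.1, 2.0):
    # Ratios above one favor CO; they shrink toward 1 as lam grows.
    print(lam, [round(ls_over_co_unstable(rho, lam), 4) for rho in (-0.9, 0.3, 0.9)])
```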
7. Summary

In general, we have shown that differences in the specification of the $X_t$ process reconcile differences in the relative efficiency of LS and CO obtained in the literature. In particular, for stochastic processes of the form $X_t = \lambda X_{t-1} + v_t$, our approximations support the results of Rao and Griliches, provided $\operatorname{var}(v_t) = \sigma_v^2$ does not equal zero. When $\sigma_v^2 = 0$, the CO estimator is inefficient relative to LS over much of the parameter space (Proposition 5.2), which supports the results of Park and Mitchell and Maeshiro. When $|\lambda| > 1$, the first observation is asymptotically negligible for any $\sigma_v^2$ (given $\rho \neq \lambda$); for such series, CO is efficient relative to LS.

Alternatively, the difference in the specifications of Rao and Griliches (1969), Park and Mitchell (1980) and Maeshiro (1976, 1979) is equivalent to the use of the marginal variance of the slope estimator in the former calculations and the conditional variance in the latter two. Thus Park and Mitchell's (and Maeshiro's) results unambiguously apply to non-stochastic trended variables and to stochastic processes whose realizations happen to be trended. On the other hand, Rao and Griliches' results pertain to a stochastic exogenous process and represent the expected conditional efficiency of the various slope estimators, where the expectation is taken with respect to the distribution of the $X_t$'s.

In summary, these approximations suggest that the conventional view of serial correlation corrections outlined in the Introduction is correct, in the following sense. The CO estimator is asymptotically efficient and preferable to LS unless the realization of the $X_t$ process is strongly trended. For economic variables of the typical spectral shape [see, e.g., Granger (1966)], the expected conditional efficiency of the CO estimator is greater than that of LS. Of course, if $\rho$ is known (estimated), the GLS estimator (asymptotically) dominates LS and CO for every $X_t$ process and is only slightly more burdensome to calculate.
For practical purposes, then, one should always treat the first observation correctly. What we have shown is that the case in which the first observation is asymptotically important requires an unusual $X_t$ process; for typical economic variables, the asymmetry disappears in expectation.
Appendix

It will be convenient at the outset to approximate $E(1/\sum X_t^2)$ in terms of the population moments of the $X_t$ process. In calculating the accuracy of the approximation, it will simplify matters to assume the $X_t$ are normally distributed.

Lemma 1. $(1/\sigma_v^2)\sum X_t^2$ is distributed as non-central $\chi^2$ with $T-1$ degrees of freedom and non-centrality parameter $d_\lambda = \sum_t \lambda^{2(t-1)}\xi_1^2/\sigma_v^2$.

Lemma 2. For $r = 2, 3, \ldots$, the $r$th central moment of $\sum X_t^2$ is $O(T^{[r/2]})$, where $[r/2]$ is the largest integer less than or equal to $r/2$.

Proof. Let $m_T = \sum X_t^2$. The cumulants of $m_T$ are given by

$$\kappa_r = 2^{r-1}(r-1)!\,(T-1+r\,d_\lambda)\,\sigma_v^{2r}, \qquad r = 1, 2, \ldots$$

[see Johnson and Kotz (1970, p. 134)], so that $\kappa_r$ is $O(T)$. From the relationship between central moments and cumulants [outlined in Kendall and Stuart (1969)], it follows that for $r > 1$ the $r$th central moment $\mu_r(m_T)$ is a sum of cross-products of powers of cumulants, and the largest sum of exponents in any term is $[r/2]$. Q.E.D.

Expanding $1/\sum X_t^2$ around $1/E(\sum X_t^2)$ in a Taylor series, we write

$$\frac{1}{\sum X_t^2} = \frac{1}{\mu_1} - \frac{\sum X_t^2 - \mu_1}{\mu_1^2} + \frac{(\sum X_t^2 - \mu_1)^2}{\mu_1^3} - \cdots, \qquad \mu_1 = E\left(\sum X_t^2\right).$$

Taking expectations of both sides and treating the first term as the approximation, the remainder becomes

$$\sum_{r=2}^{\infty} (-1)^r\,\frac{\mu_r(m_T)}{\mu_1^{r+1}},$$

so that the first term in the remainder is $O(T^{-2})$ and the rest are all $O(T^{-3})$. Thus

$$E\left(\frac{1}{\sum X_t^2}\right) = \frac{1}{\sigma_v^2[(T-1)+d_\lambda]} + O(T^{-2}),$$

since $E(\sum X_t^2) = \sigma_v^2[(T-1)+d_\lambda]$. This leads directly to approximations for the variance of the various estimators for $\beta$ in terms of the moments of the $X_t$ process. For the GLS estimator,
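$$\operatorname{var}(\hat\beta_{GLS}) = \sigma_\varepsilon^2 \Big/ \left[(1-\rho^2)X_1^2 + \sum_{t=2}^T (X_t - \rho X_{t-1})^2\right]$$

from eq. (2.4). Expanding the denominator, we get

$$(1-2\rho\lambda+\rho^2)\sum_{t=1}^T X_t^2 - \rho^2X_1^2 + (2\rho\lambda-\rho^2)X_T^2 - 2\rho\sum_{t=2}^T X_{t-1}v_t.$$

Thus, for $|\lambda| < 1$,

$$\operatorname{var}(\hat\beta_{GLS}) = \sigma_\varepsilon^2 \Big/ \left[(1-2\rho\lambda+\rho^2)(T-1)\sigma_X^2 + (1-\rho^2)\sigma_1^2 + \frac{\xi_1^2(1-\rho\lambda)^2}{1-\lambda^2}\right] + \begin{cases} O(T^{-3/2}), & \text{model (A)},\\ O(\lambda^{2T}), & \text{model (B)},\end{cases}$$

assuming $\operatorname{plim}(1/T)\sum X_t^2$ is a constant. If $\sigma_v^2 = 0$, the remainder is $O(\lambda^{2T})$.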
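The expansion of the GLS denominator is an algebraic identity once $v_t$ is defined as $X_t - \lambda X_{t-1}$, so it can be verified on any series. The sketch assumes NumPy, and the function names are illustrative. For the CO case, where the $(1-\rho^2)X_1^2$ term is dropped, the same algebra changes $-\rho^2 X_1^2$ to $-X_1^2$; the `include_first` flag covers both cases.

```python
import numpy as np


def gls_denominator(x, rho, include_first=True):
    """Sum inside eq. (2.4) (information times sigma_eps^2); include_first=False is CO."""
    total = np.sum((x[1:] - rho * x[:-1]) ** 2)
    if include_first:
        total += (1 - rho**2) * x[0] ** 2
    return total


def expanded_denominator(x, rho, lam, include_first=True):
    """Expanded form, with v_t defined as X_t - lam * X_{t-1}."""
    v = x[1:] - lam * x[:-1]
    first_coef = -rho**2 if include_first else -1.0
    return ((1 - 2 * rho * lam + rho**2) * np.sum(x**2)
            + first_coef * x[0] ** 2
            + (2 * rho * lam - rho**2) * x[-1] ** 2
            - 2 * rho * np.sum(x[:-1] * v))


rng = np.random.default_rng(1)
x = rng.normal(size=25)  # any series works: the identity is purely algebraic
print(gls_denominator(x, 0.6), expanded_denominator(x, 0.6, 0.3))
```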
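The CO estimator is identical to the GLS estimator with the first observation omitted. By the argument above,

$$\operatorname{var}(\hat\beta_{CO}) = \sigma_\varepsilon^2 \Big/ \left[(1-2\rho\lambda+\rho^2)(T-1)\sigma_X^2 + \frac{\xi_1^2(\rho-\lambda)^2}{1-\lambda^2}\right] + \begin{cases} O(T^{-3/2}), & \text{model (A)},\\ O(\lambda^{2T}), & \text{model (B)}.\end{cases}$$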
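For ordinary least squares, the exact conditional variance is given in Johnston (1972, p. 247):

$$\operatorname{var}(\hat\beta_{LS}) = \frac{\sigma_\varepsilon^2}{(1-\rho^2)\sum_{t=1}^T X_t^2}\left[1 + 2\sum_{i=1}^{T-1}\rho^i\,\frac{\sum_{t=1}^{T-i}X_tX_{t+i}}{\sum_{t=1}^T X_t^2}\right].$$

Now for fixed $i$, $\sum_{t=1}^{T-i}X_tX_{t+i}/\sum_{t=1}^T X_t^2$ converges in probability to $\lambda^i$, and for $i = T - \tau$ $(\tau \ge 1)$, the remainder

$$\frac{\sum_{t=1}^{T-i}X_tX_{t+i}}{\sum_{t=1}^T X_t^2} - \lambda^i$$

is $O(T^{-1})$. Thus

$$\sum_{i=1}^{T-1}\rho^i\,\frac{\sum_{t=1}^{T-i}X_tX_{t+i}}{\sum_{t=1}^T X_t^2} = \sum_{i=1}^{T-1}\rho^i\lambda^i + O(\rho^T/T),$$

and the term in brackets can be written as

$$1 + 2\sum_{i=1}^{T-1}\rho^i\lambda^i + O(\rho^T/T) = \frac{1+\rho\lambda}{1-\rho\lambda} + O(\rho^T/T),$$

so that

$$\operatorname{var}(\hat\beta_{LS}) = \frac{\sigma_\varepsilon^2}{(1-\rho^2)T\sigma_X^2 + \xi_1^2(1-\rho^2)/(1-\lambda^2)}\left(\frac{1+\rho\lambda}{1-\rho\lambda}\right) + O(\rho^T\lambda^T/T^2).$$

For $|\lambda| > 1$, $X_T^2$ is of the same order as $\sum X_t^2$ and obviously must be retained in the approximations. Thus

$$\operatorname{var}(\hat\beta_{GLS}) = \sigma_\varepsilon^2 \Big/ \left[(1-2\rho\lambda+\rho^2)\sum_{t=1}^T X_t^2 + (2\rho\lambda-\rho^2)X_T^2\right] + O(\lambda^{-4T}),$$

from which it follows that

$$\operatorname{var}(\hat\beta_{GLS}) = \frac{\sigma_\varepsilon^2(\lambda^2-1)}{(\rho-\lambda)^2\lambda^{2(T-1)}}\left[\xi_1^2 + \sigma_v^2\,\frac{\lambda^2}{\lambda^2-1}\right]^{-1} + O(\lambda^{-4T}),$$

using the fact that, to order $\lambda^{2T}$, $E(\sum X_t^2) = [\lambda^2/(\lambda^2-1)]\,E(X_T^2)$. The identical approximation holds for $\operatorname{var}(\hat\beta_{CO})$, since the conditional variances differ only in their treatment of the negligible first observation.

For LS when $|\lambda| > 1$, the term in brackets is $(\lambda^2+\rho\lambda)/(\lambda^2-\rho\lambda)$ to order $\lambda^{-2T}$, assuming $|\rho| < 1$. The conditional variance therefore simplifies to $\sigma_\varepsilon^2(\lambda^2+\rho\lambda)/[(\lambda^2-\rho\lambda)(1-\rho^2)\sum X_t^2]$, and

$$\operatorname{var}(\hat\beta_{LS}) = \frac{\lambda^2+\rho\lambda}{\lambda^2-\rho\lambda}\cdot\frac{\sigma_\varepsilon^2(\lambda^2-1)}{(1-\rho^2)\lambda^{2T}}\left[\xi_1^2 + \sigma_v^2\,\frac{\lambda^2}{\lambda^2-1}\right]^{-1} + O(\lambda^{-4T}).$$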
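Johnston's exact conditional variance can be checked against the direct quadratic form $x'\Omega x/(x'x)^2$, with $\Omega_{ts} = \rho^{|t-s|}\sigma_\varepsilon^2/(1-\rho^2)$. For a long model (B) path, the bracketed term should also approach $(1+\rho\lambda)/(1-\rho\lambda)$. The sketch below assumes NumPy and uses illustrative names.

```python
import numpy as np


def ls_var_johnston(x, rho, sigma_eps2):
    """Exact conditional variance of LS (alpha = 0) under AR(1) disturbances."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    sxx = np.dot(x, x)
    # 1 + 2 * sum_i rho^i * sum_t X_t X_{t+i} / sum_t X_t^2
    bracket = 1.0 + 2.0 * sum(rho**i * np.dot(x[:-i], x[i:]) / sxx for i in range(1, T))
    return sigma_eps2 / ((1 - rho**2) * sxx) * bracket


def ls_var_quadratic_form(x, rho, sigma_eps2):
    """Same quantity computed directly as x' Omega x / (x'x)^2."""
    x = np.asarray(x, dtype=float)
    idx = np.arange(len(x))
    omega = sigma_eps2 / (1 - rho**2) * rho ** np.abs(idx[:, None] - idx[None, :])
    return x @ omega @ x / np.dot(x, x) ** 2


xb = 2.0 * 0.6 ** np.arange(200)  # model (B) path with lam = 0.6, xi1 = 2
bracket = ls_var_johnston(xb, 0.5, 1.0) * (1 - 0.25) * np.dot(xb, xb)
print(bracket, (1 + 0.3) / (1 - 0.3))
```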
References

Granger, C.W.J., 1966, The typical spectral shape of an economic variable, Econometrica 34, 150-161.
Hendry, D.F., 1976, The structure of simultaneous equations estimators, Journal of Econometrics 4, 51-88.
Johnson, N.L. and S. Kotz, 1970, Continuous univariate distributions, Vol. 2 (Houghton-Mifflin, Boston, MA).
Johnston, J., 1972, Econometric methods, 2nd ed. (McGraw-Hill, New York).
Kendall, M. and A. Stuart, 1969, The advanced theory of statistics, Vol. 1, 3rd ed. (Hafner, New York).
Maeshiro, A., 1976, Autoregressive transformation, trended independent variables, and autocorrelated disturbance terms, Review of Economics and Statistics 58, 497-500.
Maeshiro, A., 1979, On the retention of the first observations in serial correlation adjustments of regression models, International Economic Review 20, 259-265.
Malinvaud, E., 1970, Statistical methods of econometrics, 2nd ed. (North-Holland, Amsterdam).
Park, R.E. and B.M. Mitchell, 1980, Estimating the autocorrelated error model with trended data, Journal of Econometrics 13, 185-201.
Rao, P. and Z. Griliches, 1969, Small sample properties of several two-stage regression methods in the context of autocorrelated errors, Journal of the American Statistical Association 64, 251-272.
Spitzer, J.J., 1979, Small-sample properties of nonlinear least squares and maximum likelihood estimators in the context of autocorrelated errors, Journal of the American Statistical Association 74, 41-47.