Journal of Econometrics 12 (1980) 49-58. © North-Holland Publishing Company

ON RESTRICTED ESTIMATION IN LINEAR MODELS

Warren T. DENT
The University of Iowa, Iowa City, IA 52242, USA
Efficient estimation of models under linear restrictions on parameters has received little attention. We present a unified approach which takes account of both full rank and less than full rank design and constraint matrices. The procedure is numerically fast, accurate, and space efficient. An example is included.
1. Introduction

The increasing demand for restricted estimation in applied economics, coupled with greater use of many-parameter systems, necessitates a reexamination of traditional estimation techniques with respect to precision in computation. The treatment below presents a scheme for efficient numerical estimation and testing of stochastic linear systems in which subsets of parameters may be constrained by linear restrictions. The work builds on the principles embodied in unconstrained estimation, as developed by Golub and Styan (1973) and Dent (1976). The basic numerical technique relied upon is the Householder transformation [Golub (1965)], known to be numerically stable and accuracy preserving.

The only previous research on efficient linearly constrained estimation in econometrics appears to be that of Ruble (1968), Golub and Styan (1973) and Gerig and Gallant (1975). The procedure presented here uses decompositions now standardized by Golub and Styan, but provides a more general, unified theme directed specifically to constrained estimation and tests of constraint validity applicable in econometric single equation models or systems. Accuracy is preserved together with efficient use of time. More intuitive discussions of theoretical problems are found in Searle (1971).
2. Notation

The model examined may be presented as

$$y = X\beta + \varepsilon, \tag{2.1}$$

$$R'\beta = r, \tag{2.2}$$
where $y$ contains $n$ observations on a stochastic variable, $X$ is the design matrix consisting of $n$ observations on $k$ non-stochastic variables, $\varepsilon$ is a set of $n$ unobserved disturbances drawn from a multivariate population with zero mean and scalar covariance matrix, $R'$ is a $J \times k$ matrix of constraint coefficients and $r$ is a $J \times 1$ vector of constraint constants. In the format (2.1) this model is interpreted as a single equation model. We note, however, that estimation of a system of seemingly unrelated regressions [Zellner (1962)] may also be characterized by this format after estimation of a variance matrix for structural disturbances and when a transformation has been applied to the structural $(y, X)$ data set to reduce this estimated variance matrix to classical scalar form. Constrained systems of the latter form are found, for example, in the increasing application of translogarithmic factor demand theory [Christensen, Jorgenson and Lau (1975)].

When the matrix $R'$ in (2.2) has full row rank and $X$ has full column rank (implying $J \le k$) the restricted maximum likelihood estimator of $\beta$ is well known [Theil (1971, §1.8)] to be

$$\beta^* = \hat\beta + (X'X)^{-1}R\,[R'(X'X)^{-1}R]^{-1}(r - R'\hat\beta), \tag{2.3}$$

where $\hat\beta$ is the unrestricted maximum likelihood estimator

$$\hat\beta = (X'X)^{-1}X'y. \tag{2.4}$$
For some time it has been felt that use of formula (2.3) as it stands is inefficient, owing to the size of the matrix inversions involved. Appropriate reductions in dimensionality can be achieved by solving (2.2) for $J$ of the $\beta$ coefficients in terms of the remaining $k - J$, substituting in (2.1) and estimating a system of dimension $k - J$. This procedure has been used exhaustively by Ruble (1968, §IV.B) and in part by Golub and Styan (1973, §5). It is incorporated in the widely used ESP (Econometric Software Package) computerized routines. We show, using a suitable transformation, that this two step procedure to reduce dimensionality may be avoided, and that direct substitution in formula (2.3) (or its analog for non-full rank cases) yields efficient computation, and directly produces the array of necessary information for coefficient estimates, estimated variances and test statistics for constraint hypotheses.
3. Constraint manipulation

While in many economic applications the rank of the constraining matrix $R'$ will be known, this may not be always true. Further, even when $R'$ is known to have less than full row rank, one may wish to preserve its structure for economic reasons, or its size may prohibit efficient manual replacement by a full rank version.¹ We, therefore, permit $R'$ to have less than full rank. If $R$ has rank $s \le J$, then at least one subset of $s$ rows of $R'$ has rank $s$. We denote this set by $\Pi_1'R'$ where $\Pi_1$ is a $J \times s$ matrix containing $s$ columns of the $J$ order identity matrix. If $s = J$, then $\Pi_1 = I_J$. Determination of $s$ and $\Pi_1$ is critical, and is developed later. Given $\Pi_1$, the constraint (2.2) is replaced by its equivalent full rank version

$$\Pi_1'R'\beta = \Pi_1'r. \tag{3.1}$$

If, in fact, $s < J$, a further check is needed on the data, and that concerns consistency of the $J$ equations in $R'\beta = r$. If the third equation is equal to twice the first minus the second, then the third element of $r$ should be twice the first element minus the second element. This check for consistency is also developed later; for the moment, it is assumed to have been satisfied. We further assume that $J \le k$. If this is not true initially, an equivalent full rank system (rank $\le k$) can be found by a process similar to that just described.
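The consistency requirement just described can be illustrated with a small sketch. It is not the Householder-based check developed later in the paper: it simply compares the rank of the constraint matrix with the rank of the matrix augmented by the constraint constants, using NumPy's SVD-based `matrix_rank`. The function name `constraints_consistent` and the numerical values are hypothetical, chosen so that the third equation is twice the first minus the second.

```python
import numpy as np

def constraints_consistent(Rt, r, tol=1e-10):
    """Return (s, consistent) for the constraint system Rt @ beta = r.

    s is the rank of Rt; the system is consistent when appending r as an
    extra column does not raise the rank.
    """
    Rt = np.asarray(Rt, dtype=float)
    r = np.asarray(r, dtype=float).reshape(-1, 1)
    s = np.linalg.matrix_rank(Rt, tol=tol)
    s_aug = np.linalg.matrix_rank(np.hstack([Rt, r]), tol=tol)
    return int(s), bool(s_aug == s)

# Hypothetical constraints: row 3 = 2 * row 1 - row 2, so s = 2 < J = 3.
Rt = np.array([[1.0, 0.0, 1.0],
               [0.0, 1.0, 1.0],
               [2.0, -1.0, 1.0]])
r_ok = np.array([3.0, 1.0, 5.0])   # 2 * 3 - 1 = 5, consistent
r_bad = np.array([3.0, 1.0, 4.0])  # third constant violates the dependency

print(constraints_consistent(Rt, r_ok))
print(constraints_consistent(Rt, r_bad))
```

A rank comparison of this kind is convenient for illustration, but it costs a singular value decomposition; the procedure in the text obtains the same information as a by-product of the Householder reductions.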
4. Full rank design matrix

When $X$ has full column rank $k$, there exists an orthogonal $n \times n$ matrix $P$, the product of $k$ Householder transformations, such that

$$P'(X, y) = \begin{pmatrix} W & l_1 \\ 0 & l_2 \end{pmatrix}, \tag{4.1}$$

where $W$ is $k \times k$ upper triangular, $l_1$ has $k$, and $l_2$ has $(n-k)$ elements. The necessary unrestricted maximum likelihood statistics are then found as

$$\hat\beta = W^{-1}l_1, \tag{4.2}$$

with $SS_e$ = error sum of squares $= l_2'l_2$, and estimated covariance matrix of $\hat\beta$ equal to $l_2'l_2\,W^{-1}W'^{-1}/(n-k)$.

¹ For example, estimation of factor share relations derived from translogarithmic cost functions [Christensen, Jorgenson and Lau (1973)] involves parameter restrictions for linear homogeneity and symmetry. With $q$ factors, system estimation involves $\tfrac{1}{2}(q^2 + 3q + 6)$ linear constraints on $q(q + 2)$ parameters. The rank of the constraint matrix is $\tfrac{1}{2}(q^2 + q + 4)$, which is less than or equal to the number of parameters whenever $q$ is at least 2.

The heart of our procedure for restricted estimation lies in the following transformation and decomposition. A similar reduction to that above [cf. Golub and Styan (1973, eq. (1.29))] yields an orthogonal $k \times k$ matrix $Q$, the product of $s$ Householder transformations, such that

$$Q'(W'^{-1}R\Pi,\; l_1) = \begin{pmatrix} U & V & w_1 \\ 0 & 0 & w_2 \end{pmatrix}, \tag{4.3}$$
where $U$ is $s \times s$ upper triangular, $V$ is $s \times (J-s)$, $w_1$ and $w_2$ have $s$ and $(k-s)$ elements, respectively, $\Pi$ is a $J \times J$ permutation matrix and $s$ is the revealed rank of $R$. The transformation $W'^{-1}R$ is easily effected as the solution for $A$ in $W'A = R$, which is facilitated by the upper triangular nature of $W$. When $\Pi$ and $Q$ are partitioned as $(\Pi_1, \Pi_2)$ and $(Q_1, Q_2)$, respectively, where $\Pi_1$ and $Q_1$ have $s$ columns, $\Pi_2$ has $(J-s)$ and $Q_2$ has $(k-s)$ columns, $\Pi_1$ may be employed as that set of permutation columns which defines a set of $s$ linearly independent rows of $R'$, and as used in relation (3.1). From (4.3) the columns of $R$ (rows of $R'$) satisfy

$$W'^{-1}R\Pi_1 = Q_1U, \qquad W'^{-1}R\Pi_2 = Q_1V = Q_1U(U^{-1}V), \tag{4.4}$$

so that $R\Pi_2 = R\Pi_1(U^{-1}V)$. For consistency of the constraint set (2.2) when $s < J$, the elements of $r$ must therefore satisfy

$$r'\Pi_2 = r'\Pi_1(U^{-1}V). \tag{4.5}$$
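If $R$ has full rank $s = J$, then in (4.3) $\Pi = I_J$, and $V$ does not exist. In this case, and when $r = 0$, there is no need to check for constraint consistency. When (3.1) is the appropriate constraint, the restricted maximum likelihood estimator (2.3) is given by

$$\beta^* = \hat\beta + (X'X)^{-1}R\Pi_1[\Pi_1'R'(X'X)^{-1}R\Pi_1]^{-1}[\Pi_1'r - \Pi_1'R'\hat\beta]. \tag{4.6}$$

Since $X'X = W'W$ from (4.1), substitution of (4.2) and (4.4) gives

$$\begin{aligned}
\beta^* &= \hat\beta + W^{-1}W'^{-1}R\Pi_1[\Pi_1'R'W^{-1}W'^{-1}R\Pi_1]^{-1}[\Pi_1'r - \Pi_1'R'W^{-1}l_1] \\
&= \hat\beta + W^{-1}Q_1U[U'Q_1'Q_1U]^{-1}[\Pi_1'r - U'Q_1'l_1] \\
&= \hat\beta + W^{-1}Q_1UU^{-1}U'^{-1}[\Pi_1'r - U'w_1] \\
&= \hat\beta + W^{-1}Q_1[U'^{-1}\Pi_1'r - w_1].
\end{aligned} \tag{4.7}$$

Since the term $Q_1U'^{-1}\Pi_1'r$ is non-stochastic, the variance matrix of $\beta^*$ is given by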
$$\begin{aligned}
V(\beta^*) &= V[W^{-1}(l_1 - Q_1w_1)] \\
&= V[W^{-1}(l_1 - Q_1Q_1'l_1)] \\
&= V[W^{-1}Q_2Q_2'l_1] \\
&= \sigma^2W^{-1}Q_2Q_2'W'^{-1},
\end{aligned} \tag{4.8}$$
since $l_1$ in (4.1) constitutes a set of independent normal variates with common variance $\sigma^2$, say. In conjunction with (4.7), this formula shows that the algorithm requires temporary storage of the $k \times k$ matrix $W^{-1}Q$. In general, this should not create serious difficulties, and the transformation can be effected at the stage (4.3). We note that in general this variance matrix is singular and, further, that it is possible for some or all of the diagonal elements to be zero. When $w_1$ or $U'^{-1}\Pi_1'r - w_1$ is null in (4.7) the matrix $W^{-1}Q_2Q_2'W'^{-1}$ in (4.8) is replaced by $W^{-1}W'^{-1}$. The algebraic formula for the above variance, corresponding to the notation of (4.6), is

$$\sigma^2\left[(X'X)^{-1} - (X'X)^{-1}R\Pi_1\left(\Pi_1'R'(X'X)^{-1}R\Pi_1\right)^{-1}\Pi_1'R'(X'X)^{-1}\right],$$
which reduces to (4.8) on substitution. The usual estimate of the error variance under restrictions is $SS_r/(n - k + s)$ where $SS_r$ is the sum of squares of the residuals $y - X\beta^*$, found from (4.2) and the formula [see Theil (1971, §3.7)]

$$\begin{aligned}
SS_r - SS_e &= (\Pi_1'r - \Pi_1'R'\hat\beta)'[\Pi_1'R'(X'X)^{-1}R\Pi_1]^{-1}(\Pi_1'r - \Pi_1'R'\hat\beta) \\
&= (\Pi_1'r - U'Q_1'l_1)'U^{-1}U'^{-1}(\Pi_1'r - U'Q_1'l_1) \\
&= (U'^{-1}\Pi_1'r - w_1)'(U'^{-1}\Pi_1'r - w_1),
\end{aligned} \tag{4.9}$$

corresponding to components of (4.7). The value of the $F_{s,n-k}$ test statistic for evaluating the restriction hypothesis (3.1) is given by $(n-k)(SS_r - SS_e)/(s\,SS_e)$.
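The full rank computations of this section can be sketched in a few lines. The sketch assumes that both $X$ and $R$ have full column rank, so that $s = J$ and $\Pi = I_J$; it uses NumPy's `qr` (an unpivoted LAPACK Householder factorization) in place of a rank-revealing reduction, and the general solver `np.linalg.solve` where a dedicated triangular solver would exploit the structure of $W$ and $U$. The function name `restricted_ls` and the simulated data are hypothetical.

```python
import numpy as np

def restricted_ls(X, y, R, r):
    """Restricted least squares for R' beta = r via two QR factorizations.

    Assumes X (n x k) and R (k x J) both have full column rank.
    """
    n, k = X.shape
    J = R.shape[1]
    # (4.1): P'(X, y) = [[W, l1], [0, l2]]
    P, fac = np.linalg.qr(X, mode="complete")
    W = fac[:k, :]
    l = P.T @ y
    l1, l2 = l[:k], l[k:]
    beta_hat = np.linalg.solve(W, l1)              # (4.2)
    ss_e = l2 @ l2                                 # unrestricted error SS
    # A = W'^{-1} R, obtained as the solution of W'A = R
    A = np.linalg.solve(W.T, R)
    # (4.3) with Pi = I: Q'A = [[U], [0]]
    Q, Ufac = np.linalg.qr(A, mode="complete")
    U = Ufac[:J, :]
    Q1, Q2 = Q[:, :J], Q[:, J:]
    w1 = Q1.T @ l1
    d = np.linalg.solve(U.T, r) - w1               # U'^{-1} r - w1
    beta_star = beta_hat + np.linalg.solve(W, Q1 @ d)   # (4.7)
    ss_diff = d @ d                                # (4.9)
    F = (n - k) * ss_diff / (J * ss_e)
    C = np.linalg.solve(W, Q2)                     # W^{-1} Q2
    cov_unscaled = C @ C.T                         # (4.8) without sigma^2
    return beta_star, ss_e, ss_diff, F, cov_unscaled

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=20)
R = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, -1.0]])        # columns of R are the constraints
r = np.array([6.0, 0.0])           # b1 + b2 + b3 = 6 and b2 - b3 = 0

beta_star, ss_e, ss_diff, F, cov = restricted_ls(X, y, R, r)
print(beta_star, F)
```

On data of this size the result can be compared against direct evaluation of (2.3); the two agree to rounding error, while the factorized route never forms $(X'X)^{-1}$ explicitly.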
5. Non-full rank design matrix

The decomposition (4.1) may be generalized as

$$P'(XT, y) = \begin{pmatrix} W & S & l_1 \\ 0 & 0 & l_2 \end{pmatrix}, \tag{5.1}$$

where $P$ ($n \times n$) is now the revealed product of $r$ Householder transformations, $W$ is $r \times r$ non-singular upper triangular, $S$ is $r \times (k-r)$, $T$ is a $k \times k$ permutation matrix, and $l_1$ and $l_2$ have $r$ and $(n-r)$ elements, respectively. The
rank of $X$ is $r$, presumed to be strictly less than $k$. In this instance, no unique maximum likelihood estimators exist, and there is no interest in any specific member of the class of solutions $\hat\beta^-$ and $\beta^{*-}$. Estimated standard errors of such solutions are also of no interest, although the sums of squares of residuals $SS_e$ and $SS_r$ are. Indeed, these statistics take on unique numerical values with

$$SS_e = l_2'l_2. \tag{5.2}$$

An elegant set of linearly estimable functions of $\beta$ may be found as

$$\gamma = (I_r, W^{-1}S)T'\beta, \tag{5.3}$$

as implicitly suggested in Golub and Styan (1973, §5) and explicitly demonstrated in Dent (1973). The invariant BLU estimates of these functions are given by $W^{-1}l_1$. Linear dependence in the columns of $X$ is shown by

$$XT\begin{pmatrix} W^{-1}S \\ -I_{k-r} \end{pmatrix} = 0.$$
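As shown by Golub and Styan (1973, §5), the decomposition (5.1) permits an easy-to-compute $g_{123}$ inverse for $X$,

$$X^- = T\begin{pmatrix} W^{-1} & 0 \\ 0 & 0 \end{pmatrix}P', \tag{5.4}$$

and a corresponding basic solution

$$\hat\beta^- = X^-y = T_1W^{-1}l_1, \tag{5.5}$$

where $T_1$ contains the first $r$ columns of $T$. Further,

$$(X'X)^- = X^-X'^- = T_1W^{-1}W'^{-1}T_1',$$

so that the difference in error sum of squares is

$$\begin{aligned}
SS_r - SS_e &= (\Pi_1'r - \Pi_1'R'\hat\beta^-)'[\Pi_1'R'(X'X)^-R\Pi_1]^-(\Pi_1'r - \Pi_1'R'\hat\beta^-) \\
&= (\Pi_1'r - \Pi_1'R'T_1W^{-1}l_1)'[\Pi_1'R'T_1W^{-1}W'^{-1}T_1'R\Pi_1]^-(\Pi_1'r - \Pi_1'R'T_1W^{-1}l_1).
\end{aligned}$$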
When $r \ge s$ the generalized inverse in this expression is a full inverse and the difference is best computed using two decompositions paralleling (4.3), viz.:

$$Q^{*\prime}R\Pi = \begin{pmatrix} U^* & V \\ 0 & 0 \end{pmatrix}, \tag{5.6a}$$

where $Q^*$ is $k \times k$ orthogonal, the product of $s$ Householder transformations, $U^*$ is $s \times s$ non-singular upper triangular, $V$ is $s \times (J-s)$, and $\Pi$ is a $J \times J$ permutation matrix. The rank of $R$ is determined in this decomposition so that results (4.4)-(4.5) hold with $Q$ replaced by $Q^*$ and $U$ by $U^*$. The second decomposition is given by

$$Q'(W'^{-1}T_1'R\Pi_1,\; l_1) = \begin{pmatrix} U & w_1 \\ 0 & w_2 \end{pmatrix}, \tag{5.6b}$$
where $Q$ is orthogonal $r \times r$, the product of $s$ Householder transformations, $U$ is $s \times s$ non-singular upper triangular, and $w_1$ and $w_2$ have $s$ and $(r-s)$ elements, respectively. The size of $W'^{-1}T_1'R\Pi_1$ is $r \times s$ and for general Householder reductions the column dimension must not exceed the row dimension. If any column of this matrix is null, then the corresponding constraint is not estimable. It is shown later that $s$, the rank of $R$, cannot exceed $r$. With simple manipulation, $SS_r - SS_e$ is given by the sum of squares

$$(U'^{-1}\Pi_1'r - w_1)'(U'^{-1}\Pi_1'r - w_1), \tag{5.7}$$
which is equivalent to (4.9). Strictly speaking, when $r < k$ one requires that the constraints (3.1) be linearly estimable. This may be interpreted as a requirement that the $s$ individual relations may be determined as linear combinations of the components of $\gamma$ in (5.3). Specifically, does there exist an $s \times r$ matrix $B'$ such that

$$\Pi_1'R' = B'(I_r, W^{-1}S)T'?$$

By rewriting this as

$$\Pi_1'R'(T_1, T_2) = B'(I_r, W^{-1}S), \tag{5.8}$$

$B'$ may be chosen as
$B' = \Pi_1'R'T_1$, and the constraints are estimable only so long as $\Pi_1'R'T_2$ and $\Pi_1'R'T_1W^{-1}S$ are numerically equal, that is

$$\Pi_1'R'T\begin{pmatrix} W^{-1}S \\ -I_{k-r} \end{pmatrix} = 0. \tag{5.9}$$

In (5.8) the left-hand matrix has rank $s$, and the term $(I_r, W^{-1}S)T'$ has rank $r$. Estimability thus precludes the possibility that $s > r$, since no $B$ could be found to guarantee this. Given estimability, the value of the $F_{s,n-r}$ statistic for testing the hypothesized constraints (3.1) is found as $(n-r)(SS_r - SS_e)/(s\,SS_e)$.
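The estimability condition (5.9) lends itself to a short numerical sketch. The version below assumes $T = I$, that is, that the first $r$ columns of $X$ are already linearly independent, because NumPy's `qr` does not pivot; a rank-revealing reduction would supply $T$ in general. The function name `estimable`, the argument `rank_x` (the rank of $X$, taken as known) and the design matrix are hypothetical.

```python
import numpy as np

def estimable(X, Rt, rank_x, tol=1e-10):
    """Check condition (5.9) for the rows of Rt, assuming T = I.

    Returns (flag, N), where the columns of N = [W^{-1}S; -I] satisfy
    X @ N = 0 and flag is True when Rt @ N is numerically null.
    """
    k = X.shape[1]
    fac = np.linalg.qr(X, mode="r")        # upper trapezoidal factor of X
    W = fac[:rank_x, :rank_x]
    S = fac[:rank_x, rank_x:]
    N = np.vstack([np.linalg.solve(W, S), -np.eye(k - rank_x)])
    return bool(np.all(np.abs(Rt @ N) < tol)), N

# Hypothetical one way layout: overall mean plus three group effects.
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1]], dtype=float)

# A contrast and a group mean are estimable; a single effect is not.
ok, N = estimable(X, np.array([[0.0, 1.0, -1.0, 0.0],
                               [1.0, 1.0, 0.0, 0.0]]), rank_x=3)
bad, _ = estimable(X, np.array([[0.0, 1.0, 0.0, 0.0]]), rank_x=3)
print(ok, bad)
```

The check amounts to requiring that every constraint row be orthogonal to the null space of $X$, which is why a lone group effect fails while a contrast between two effects passes.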
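6. An example

Searle (1971, pp. 165-166) presents a one way analysis of variance model $y_{ij} = \mu + \alpha_i + e_{ij}$ for which $n = 6$, $k = 4$, and

$$X = \begin{pmatrix} 1&1&0&0 \\ 1&1&0&0 \\ 1&1&0&0 \\ 1&0&1&0 \\ 1&0&1&0 \\ 1&0&0&1 \end{pmatrix}, \qquad y = \begin{pmatrix} 101 \\ 105 \\ 94 \\ 84 \\ 88 \\ 32 \end{pmatrix}.$$

Consider estimation of parameters in this model subject to the constraints $\alpha_1 - \alpha_2 = 13$, $\mu + \alpha_1 = 99$, and $\mu + \alpha_2 = 86$, for which

$$R' = \begin{pmatrix} 0&1&-1&0 \\ 1&1&0&0 \\ 1&0&1&0 \end{pmatrix}, \qquad r = \begin{pmatrix} 13 \\ 99 \\ 86 \end{pmatrix}.$$

A Householder decomposition on $(X, y)$ yields a representation as in (5.1) with $r = 3$, $T = I_4$, and

$$W^{-1}S = \begin{pmatrix} 1 \\ -1 \\ -1 \end{pmatrix}.$$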
From (5.2) $SS_e$ is calculated at 70.0 and the set of linearly estimable functions (5.3) becomes $\mu + \alpha_3$, $\alpha_1 - \alpha_3$, $\alpha_2 - \alpha_3$, with BLU estimates 32, 68, and 54, respectively. Column dependency is revealed as the first column less the sum of the other three is null. For the decompositions (5.6), the rank of $R$ is found as $s = 2$, $\Pi = I_3$, and

$$U^{*-1}V = \begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$

At this point, a check may be made as to whether the constraints are estimable. The matrix product on the left-hand side of eq. (5.9) becomes

$$\begin{pmatrix} 0&1&-1&0 \\ 1&1&0&0 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \\ -1 \\ -1 \end{pmatrix},$$

which is null, as desired. Consistency of the constraints is examined via formation of components in eq. (4.5). This is seen to readily hold given $U^{*-1}V$ above, since $86 = -13 + 99$. The remaining computation concerns the sum of squares in eq. (5.7). Here

$$U'^{-1}\Pi_1'r = \frac{1}{\sqrt{5}}\begin{pmatrix} -13\sqrt{6} \\ 469 \end{pmatrix}, \qquad w_1 = \frac{1}{\sqrt{5}}\begin{pmatrix} -14\sqrt{6} \\ 472 \end{pmatrix},$$
whence $SS_r - SS_e = 3.0$. The $F_{2,3}$ test statistic has value 0.06, so that the constraints appear plausible.
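The figures quoted in this example can be reproduced with a short sketch of the computations in (5.1), (5.2), (5.6b) and (5.7). Two assumptions should be flagged. First, the data matrix and observation vector are not legible in every copy of this example; the values used below are assumed to be those of Searle's one way layout (groups of three, two and one observations), chosen because they reproduce the sums of squares and BLU estimates reported in the text. Second, NumPy's unpivoted `qr` stands in for the rank-revealing Householder reduction, which is adequate here only because $T = I_4$ and $\Pi = I_3$; the ranks `rank_x = 3` and `s = 2` are supplied rather than detected.

```python
import numpy as np

# Assumed data: one way layout with group sizes 3, 2 and 1.
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1]], dtype=float)
y = np.array([101, 105, 94, 84, 88, 32], dtype=float)

# Constraints a1 - a2 = 13, mu + a1 = 99, mu + a2 = 86 (rows of R').
Rt = np.array([[0, 1, -1, 0],
               [1, 1, 0, 0],
               [1, 0, 1, 0]], dtype=float)
r_vec = np.array([13, 99, 86], dtype=float)

n, k = X.shape
rank_x, s = 3, 2                       # supplied, not detected

# (5.1) with T = I: P'(X, y) = [[W, S, l1], [0, 0, l2]]
P, fac = np.linalg.qr(X, mode="complete")
W = fac[:rank_x, :rank_x]
l = P.T @ y
l1, l2 = l[:rank_x], l[rank_x:]
ss_e = l2 @ l2                         # (5.2)
gamma_hat = np.linalg.solve(W, l1)     # BLU estimates of (5.3)

# (5.6b): A = W'^{-1} T1' R Pi1 (3 x 2), then Q'A = [[U], [0]]
A = np.linalg.solve(W.T, Rt[:s, :rank_x].T)
Q, Ufac = np.linalg.qr(A, mode="complete")
U = Ufac[:s, :]
w1 = Q[:, :s].T @ l1
d = np.linalg.solve(U.T, r_vec[:s]) - w1
ss_diff = d @ d                        # (5.7)
F = (n - rank_x) * ss_diff / (s * ss_e)

print(np.round(gamma_hat, 6))
print(f"SS_e = {ss_e:.2f}, SS_r - SS_e = {ss_diff:.2f}, F = {F:.2f}")
```

Under these assumptions the sketch returns the BLU estimates 32, 68 and 54, an error sum of squares of 70, a difference of 3, and an $F$ value of $9/140 \approx 0.06$, in agreement with the text. The signs of individual elements of $U$ and $w_1$ depend on the factorization's sign convention, but the sum of squares does not.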
7. Conclusion

The schema presented for estimation subject to linear equality constraints pivots on two basic sets of Householder decompositions, viz., eqs. (5.1) and (5.6). Full rank equivalents (which are special cases) are found in (4.1) and (4.3). The Householder reduction technique is known to be fast, numerically accurate and stable. In this respect, our analysis competes with that of Gerig and Gallant (1975), who use two Singular Value Decompositions to achieve similar solutions. For non-full rank cases, their approach involves use of the unique Moore-Penrose $g_{1234}$ inverse, whereas our analysis employs weaker $g_{123}$ inverses. Further, we have derived an elegant set of linearly estimable functions for non-full rank design matrices, and have outlined checks of constraint consistency.
References

Christensen, L.R., D.W. Jorgenson and L.J. Lau, 1975, Transcendental logarithmic utility functions, American Economic Review 65, 367-383.
Dent, Warren T., 1973, Information in less than full rank regression models, Working Paper Series no. 73-12 (College of Business Administration, University of Iowa, Iowa City, IA).
Dent, Warren T., 1976, Information and computation in simultaneous equations estimation, Journal of Econometrics 4, 89-95.
Gerig, T.M. and A.R. Gallant, 1975, Computing methods for linear models subject to linear parametric constraints, Journal of Statistical Computation and Simulation 4, 283-296.
Golub, Gene H., 1965, Numerical methods for solving linear least squares problems, Numerische Mathematik 7, 206-216.
Golub, Gene H. and G.P.H. Styan, 1973, Numerical computations for univariate linear models, Journal of Statistical Computation and Simulation 2, 253-274.
Ruble, W.L., 1968, Improving the computation of simultaneous stochastic linear equations estimates, Ph.D. thesis (Department of Statistics, Michigan State University, East Lansing, MI).
Searle, S.R., 1971, Linear models (Wiley, New York).
Theil, Henri, 1971, Principles of econometrics (Wiley, New York).
Zellner, Arnold, 1962, An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias, Journal of the American Statistical Association 57, 348-368.