Journal of Econometrics 10 (1979) 43-55. © North-Holland Publishing Company
ESTIMATION WITH AGGREGATED DATA
R. W. FAREBROTHER
University of Manchester, Manchester M13 9PL, UK
Received June 1975.
This paper is concerned with the problem of estimating the parameters of the standard linear model from grouped data, or more generally from aggregated data. A number of alternative solutions are suggested and compared.
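1. General theory
1.1. The problem
The problem considered in this paper is that of estimating the parameters of the model
$$\bar{y} = \bar{X}\beta + \bar{\varepsilon}, \qquad E\bar{\varepsilon} = 0, \qquad E\bar{\varepsilon}\bar{\varepsilon}' = \sigma^2 I_n, \tag{1}$$
when we only have data on the model obtained by premultiplying (1) by the $m \times n$ matrix $A$ of rank $m^*$,
$$y = X\beta + \varepsilon, \qquad E\varepsilon = 0, \qquad E\varepsilon\varepsilon' = \sigma^2 AA', \tag{2}$$
where $\bar{y}$ is an $n \times 1$ matrix of observations on the dependent variable, $\bar{X}$ is an $n \times k$ matrix of observations on the independent variables, $\beta$ is a $k \times 1$ matrix of unknown parameters, $\bar{\varepsilon}$ is an $n \times 1$ matrix of disturbances and $\sigma^2$ is an unknown positive scalar; $y = A\bar{y}$, $X = A\bar{X}$ and $\varepsilon = A\bar{\varepsilon}$; $\bar{X}$ is assumed to have full column rank.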
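As a minimal sketch of how model (2) arises from model (1), the following forms aggregated data from micro data for a small aggregation matrix. The matrix `A`, the data `y_bar` and the helper names are hypothetical illustrations and are not taken from the paper.

```python
# Hypothetical example: n = 6 micro observations aggregated into m = 3 group sums.

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*P)]

# Each row of A sums one group of micro observations (group sizes 1, 2 and 3).
A = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]

y_bar = [[2], [3], [5], [7], [11], [13]]  # n x 1 micro data on the dependent variable
y = matmul(A, y_bar)                      # aggregated data, y = A y_bar
W = matmul(A, transpose(A))               # AA', the disturbance covariance up to sigma^2

print(y)  # [[2], [8], [31]]
print(W)  # [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
```

Even with uncorrelated, homoscedastic micro disturbances, the aggregated disturbances have the non-scalar covariance $\sigma^2 AA'$, here with variances proportional to the group sizes.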
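1.2. Known error variance
Suppose that $W^* = A^*A^{*\prime}$ is known where $A^*$ is an $m^* \times n$ submatrix of $A$ of rank $m^*$. Then, without loss of generality, we can suppose that $A^*$ lies in the first $m^*$ rows of $A$ and that
$$A = \begin{bmatrix} I \\ P \end{bmatrix} A^*,$$
where $P$ is an $(m-m^*) \times m^*$ matrix. Premultiplying (2) by the non-singular matrix
$$\begin{bmatrix} (I+P'P)^{-1} & 0 \\ 0 & I \end{bmatrix}\begin{bmatrix} I & P' \\ -P & I \end{bmatrix}$$
we obtain
$$\begin{bmatrix} y^* \\ 0 \end{bmatrix} = \begin{bmatrix} X^* \\ 0 \end{bmatrix}\beta + \begin{bmatrix} \varepsilon^* \\ 0 \end{bmatrix}, \tag{3}$$
where $y^* = A^*\bar{y}$, $X^* = A^*\bar{X}$ and $\varepsilon^* = A^*\bar{\varepsilon}$ and where $X^*$ is of full column rank. Deleting the last $m - m^*$ rows of (3) which are trivial we obtain
$$y^* = X^*\beta + \varepsilon^*, \qquad E\varepsilon^* = 0, \qquad E\varepsilon^*\varepsilon^{*\prime} = \sigma^2 W^*. \tag{4}$$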
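A model of the form (4) with known $W^*$ can be estimated by generalised least squares. The sketch below does this in exact rational arithmetic for a hypothetical data set and also checks that replacing $A^*$ by $QA^*$, with $Q$ non-singular, leaves the estimates unchanged. All names and numbers (`gls`, `solve`, `A_star`, `Q`, the data) are assumptions made for illustration only.

```python
from fractions import Fraction

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def solve(M, B):
    """Solve M Z = B exactly by Gauss-Jordan elimination (M square, non-singular)."""
    n = len(M)
    aug = [[Fraction(v) for v in M[i] + B[i]] for i in range(n)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def gls(X, y, W):
    """Return (beta, sigma2_hat) for y = X beta + e with var(e) = sigma^2 W."""
    Xt = transpose(X)
    XtWiX = matmul(Xt, solve(W, X))       # X' W^{-1} X
    XtWiy = matmul(Xt, solve(W, y))       # X' W^{-1} y
    beta = solve(XtWiX, XtWiy)
    fitted = matmul(X, beta)
    resid = [[yi[0] - fi[0]] for yi, fi in zip(y, fitted)]
    quad = matmul(transpose(resid), solve(W, resid))[0][0]
    return beta, quad / (len(X) - len(X[0]))

# Hypothetical micro data: intercept and one regressor, n = 6.
X_bar = [[1, x] for x in range(6)]
y_bar = [[1], [3], [6], [6], [10], [11]]
A_star = [[1, 0, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]

X_star, y_star = matmul(A_star, X_bar), matmul(A_star, y_bar)
W_star = matmul(A_star, transpose(A_star))
beta, s2 = gls(X_star, y_star, W_star)

# Any other full row rank choice Q A* gives the same estimates.
Q = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]
beta_q, s2_q = gls(matmul(Q, X_star), matmul(Q, y_star),
                   matmul(matmul(Q, W_star), transpose(Q)))
print(beta == beta_q, s2 == s2_q)
```

Exact fractions are used here only so that the invariance can be checked by equality; a floating point linear algebra library would be the usual choice in practice.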
The arbitrary choice of a full row rank submatrix of $A$ is innocuous as any other $m^* \times n$ full row rank submatrix of $A$ takes the form $QA^*$, where $Q$ is non-singular, and model (2) again reduces to model (4). $W^*$ is known so we can evaluate the formulas for the best linear unbiased estimator (BLUE) of $\beta$ given $y^*$ and $X^*$,
$$\beta_A = (X^{*\prime}W^{*-1}X^*)^{-1}X^{*\prime}W^{*-1}y^*, \tag{5}$$
$$\operatorname{var}\beta_A = \sigma^2(X^{*\prime}W^{*-1}X^*)^{-1}, \tag{6}$$
and $\sigma^2$ is estimated unbiasedly by
$$\hat{\sigma}^2 = \hat{\varepsilon}^{*\prime}W^{*-1}\hat{\varepsilon}^*/(m^* - k), \tag{7}$$
where $\hat{\varepsilon}^* = y^* - X^*\beta_A$.
$\beta_A$ is generally less efficient¹ than $\bar{\beta} = (\bar{X}'\bar{X})^{-1}\bar{X}'\bar{y}$, the ordinary least squares estimator of $\beta$, since $\bar{\beta}$ is the BLUE of $\beta$ in model (1) after it has been premultiplied by a non-singular $n \times n$ matrix whose first $m^*$ rows are $A^*$, whilst $\beta_A$ is obtained by deleting the last $n - m^*$ observations. This is an important result as (4) may be transformed to take the form of (1) so we can deduce that the application of a sequence of aggregation matrices gives rise to progressively less efficient estimators.
¹$\operatorname{var}\beta_A = \operatorname{var}\bar{\beta}$ iff $\bar{X}'\bar{X} = \bar{X}'A^{*\prime}(A^*A^{*\prime})^{-1}A^*\bar{X}$, iff $\bar{X}'[I_n - A^{*\prime}(A^*A^{*\prime})^{-1}A^*]\bar{X} = 0$, iff $\bar{X} = A^{*\prime}C$ for some $m^* \times k$ matrix $C$. Thus $\beta_A$ is as efficient as $\bar{\beta}$ iff $\bar{X}$ is linearly dependent on $A^{*\prime}$.
1.3. Unknown error variance
Let $Z'X$ and $Z'y$ be known matrices and let $Z'X$ be non-singular then the instrumental variable estimator
$$b = (Z'X)^{-1}Z'y \tag{8}$$
is an unbiased estimator of $\beta$. However, if $W = AA'$ is not known, we cannot determine its variance
$$\operatorname{var} b = \sigma^2(Z'X)^{-1}Z'WZ(X'Z)^{-1}, \tag{9}$$
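except in certain special cases. For $Z^{*\prime}WZ^* = \sum_i\sum_j w_{ij}Z^{*\prime}_{i.}Z^*_{j.}$ does not involve $w_{ij}$ iff $Z^*_{i.} = 0$ or $Z^*_{j.} = 0$, where $Z^* = Z(X'Z)^{-1}$, and $Z^*_{i.} = 0$ iff $Z_{i.} = 0$. Thus suppose that $A$ may be partitioned as
$$A = \begin{bmatrix} A_{1.} \\ A_{2.} \\ \vdots \\ A_{h^*.} \end{bmatrix},$$
where $A_{i.}$ is an $m_i \times n$ matrix of full row rank, and that $W_{ii}$ is known for $i = 1,2,\ldots,h \le h^*$, where $W_{ij} = A_{i.}A_{j.}'$. Then, assuming that $X_{i.}$ has full column rank, we have a set of $h$ irreconcilable unbiased estimators² of $\beta$,
$$b^{(i)} = (X_{i.}'W_{ii}^{-1}X_{i.})^{-1}X_{i.}'W_{ii}^{-1}y_i, \qquad i = 1,2,\ldots,h, \tag{10}$$
$$\operatorname{var} b^{(i)} = \sigma^2(X_{i.}'W_{ii}^{-1}X_{i.})^{-1}, \qquad i = 1,2,\ldots,h, \tag{11}$$
and $h$ irreconcilable unbiased estimators of $\sigma^2$,
$$s^2_{(i)} = e_i'W_{ii}^{-1}e_i/(m_i - k), \qquad i = 1,2,\ldots,h, \tag{12}$$
where $e_i = y_i - X_{i.}b^{(i)}$, and where $y_i = A_{i.}\bar{y}$, $X_{i.} = A_{i.}\bar{X}$, $\varepsilon_i = A_{i.}\bar{\varepsilon}$.
To resolve this situation when³ $h^* = h = k$, Haitovsky has suggested the unbiased estimator
$$\beta_H = (X_d'V^{-1}X)^{-1}(X_d'V^{-1}y), \tag{13}$$
where $\bar{X}_d = \operatorname{diag}\{\bar{X}_{.1}, \bar{X}_{.2}, \ldots, \bar{X}_{.k}\}$, and $A_d = \operatorname{diag}\{A_{1.}, A_{2.}, \ldots, A_{k.}\}$, so $A_dA_d' = \operatorname{diag}\{W_{11}, W_{22}, \ldots, W_{kk}\} = V$ and $A_d\bar{X}_d = \operatorname{diag}\{X_{11}, X_{22}, \ldots, X_{kk}\} = X_d$. Unfortunately $\beta_H$ is a type (8) estimator and generally has a variance which is unknown,
$$\operatorname{var}\beta_H = \sigma^2(X_d'V^{-1}X)^{-1}X_d'V^{-1}WV^{-1}X_d(X'V^{-1}X_d)^{-1}. \tag{14}$$
²$b^{(i)}$ is, of course, the BLUE of $\beta$ given $y_i$ and $X_{i.}$ in the model $y_i = X_{i.}\beta + \varepsilon_i$, $E\varepsilon_i = 0$, $E\varepsilon_i\varepsilon_i' = \sigma^2W_{ii}$.
³The assumption $h^* = h$ is merely for notational convenience.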
Haitovsky did not note this fact for reasons which we will discuss below.⁴ An alternative solution which does not suffer from this defect may be obtained by shortcircuiting Haitovsky's (1973, pp. 30-35) full argument. Let $c$ be a vector of $k$ constants such that $X_{j.}$ is of full column rank for $j = c_1, c_2, \ldots, c_k$, then
$$b^*_{(c)} = \{b_1^{(c_1)}\; b_2^{(c_2)}\; \ldots\; b_k^{(c_k)}\} \tag{15}$$
is an unbiased estimator of $\beta$ with known variance elements
$$\operatorname{var} b_i^{(j)} = \sigma^2(X_{j.}'W_{jj}^{-1}X_{j.})^{ii}, \qquad j = c_i, \tag{16}$$
where the superscript $ii$ denotes the $ii$th element of the inverse. These estimates are, however, only obtained after extensive computation.
⁴Haitovsky does not state the formulas in this form. However, $X_d'V^{-1}X = \bar{X}_d'A_d'(A_dA_d')^{-1}A\bar{X} = \bar{X}_d'H\bar{X}$, where $H = \operatorname{col}\{H_1\, H_2 \ldots H_k\}$ and $H_i = A_{i.}'(A_{i.}A_{i.}')^{-1}A_{i.}$. Similarly, $X_d'V^{-1}y = \bar{X}_d'H\bar{y}$. Further $X_d'V^{-1}AA'V^{-1}X_d = \bar{X}_d'HH'\bar{X}_d = \bar{X}_d'H^*\bar{X}_d$, where $H^*_{ij} = H_iH_j = A_{i.}'W_{ii}^{-1}W_{ij}W_{jj}^{-1}A_{j.}$. Thus (13) is indeed the Haitovsky (p. 34) estimator if our matrix $A_{i.}$ represents his matrix $G_i$, but we disagree with his formula for the variance.
2. Grouped data
2.1. Definitions
Let $F_{i.}$ be an $m_i \times n$ matrix each of whose columns contains one unit element and $m_i - 1$ zeros then we shall refer to it as a simple aggregation matrix. Simple aggregation matrices have the following properties:
$$1_{m_i}'F_{i.} = 1_n', \qquad F_{i.}1_n = f_i, \qquad F_{i.}F_{i.}' = \operatorname{diag}(f_i), \qquad F_{i.}F_{j.}' = N_{ij},$$
where $1_p$ is a $p \times 1$ matrix of ones and where $f_i(j)$, the $j$th element of $f_i$, records the number of unit elements in the $j$th row of $F_{i.}$ and the $gh$th element of $N_{ij}$ records the number of unit elements common to the $g$th row of $F_{i.}$ and the $h$th row of $F_{j.}$.
Let $F_{i.}$ be a simple aggregation matrix of full row rank then $G_{i.} = (F_{i.}F_{i.}')^{-1}F_{i.}$ is the corresponding simple grouping matrix.
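The properties of simple aggregation and grouping matrices listed above can be checked on a small case. The sketch below builds two aggregation matrices from hypothetical group labels; the labels, the data and the function name `aggregation_matrix` are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction

def aggregation_matrix(labels, m):
    """Build the m x n simple aggregation matrix from group labels 0..m-1."""
    return [[1 if g == row else 0 for g in labels] for row in range(m)]

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

# Two hypothetical classifications of the same n = 6 units.
labels_1 = [0, 0, 1, 1, 1, 2]   # three groups
labels_2 = [0, 1, 0, 1, 1, 1]   # two groups

F1 = aggregation_matrix(labels_1, 3)
F2 = aggregation_matrix(labels_2, 2)
ones_n = [[1] for _ in range(6)]

f1 = matmul(F1, ones_n)            # group frequencies, F1 1_n = f_1
N11 = matmul(F1, transpose(F1))    # diag(f_1)
N12 = matmul(F1, transpose(F2))    # joint frequencies of the two classifications

# Because F1 F1' is diagonal, G1 = (F1 F1')^{-1} F1 just divides each row of F1
# by its group frequency, so G1 turns micro data into group means.
G1 = [[Fraction(v, f1[i][0]) for v in F1[i]] for i in range(3)]
y_bar = [[4], [6], [1], [2], [3], [9]]
group_means = matmul(G1, y_bar)

print(f1)           # [[2], [3], [1]]
print(N12)          # [[1, 1], [1, 2], [0, 1]]
print(group_means)  # group means 5, 2 and 9, held as Fractions
```

The row sums of `N12` reproduce `f1` and its column sums reproduce the frequencies of the second classification, which is the marginal information that remains when the joint frequencies are not published.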
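2.2. Preliminaries
Suppose that $F$ may be represented as $k$ stacked simple aggregation matrices
$$F = \begin{bmatrix} F_{1.} \\ F_{2.} \\ \vdots \\ F_{k.} \end{bmatrix}.$$
Then
$$FF' = N = \begin{bmatrix} N_{11} & N_{12} & \cdots & N_{1k} \\ N_{21} & N_{22} & \cdots & N_{2k} \\ \vdots & \vdots & & \vdots \\ N_{k1} & N_{k2} & \cdots & N_{kk} \end{bmatrix},$$
where the $m_i \times m_j$ matrix $N_{ij} = F_{i.}F_{j.}'$ records the joint frequencies of the $i$th and $j$th simple aggregation matrices.
Suppose that $F_{i.}$ is of full row rank for $i = 1,2,\ldots,k$, then the corresponding matrix of stacked simple grouping matrices is given by
$$G = \begin{bmatrix} G_{1.} \\ \vdots \\ G_{k.} \end{bmatrix} = M^{-1}F,$$
where $M = \operatorname{diag}\{N_{11}, N_{22}, \ldots, N_{kk}\}$. Clearly $GG' = M^{-1}NM^{-1}$.
2.3. Haitovsky's method
Setting $A = G$ in (13) we have
$$\beta_H = (X_d'MX)^{-1}X_d'My, \tag{17}$$
or
$$\beta_H = (X_d^{+\prime}X^+)^{-1}X_d^{+\prime}y^+, \tag{18}$$
where $y = G\bar{y}$, $X = G\bar{X}$, $X_d = \operatorname{diag}\{X_{11}, X_{22}, \ldots, X_{kk}\}$, and where $X^+ = F_d'X$, $y^+ = F_d'y$, $X_d^+ = F_d'X_d$, $F_d = \operatorname{diag}\{F_{1.}, F_{2.}, \ldots, F_{k.}\}$.
Eq. (18) is the simple form used by Haitovsky (pp. 30-32). The $i$th block of $X_d^{+\prime}y^+$ is $X_{ii}^{+\prime}y_i^+$ so the variance of $\beta_H$ involves terms of the form
$$EX_{ii}^{+\prime}\varepsilon_i^+\varepsilon_j^{+\prime}X_{jj}^+,$$
where $y_i^+ = F_{i.}'y_i$, $X_{i.}^+ = F_{i.}'X_{i.}$, $\varepsilon_i^+ = F_{i.}'\varepsilon_i$ (with $X_{ii}^+$ the $i$th column of $X_{i.}^+$) and where $y_i = G_{i.}\bar{y}$, $X_{i.} = G_{i.}\bar{X}$, $\varepsilon_i = G_{i.}\bar{\varepsilon}$. Now
$$EX_{ii}^{+\prime}\varepsilon_i^+\varepsilon_j^{+\prime}X_{jj}^+ = \sigma^2X_{ii}^{+\prime}X_{jj}^+,$$
since $X_{i.}^+ = H_i\bar{X}$ and $\varepsilon_i^+ = H_i\bar{\varepsilon}$, where $H_i = F_{i.}'(F_{i.}F_{i.}')^{-1}F_{i.}$ and $H_iH_i = H_i$, so there is no need to adopt Haitovsky's assumption that $E\varepsilon_i^+\varepsilon_j^{+\prime} = \sigma^2I_n$ as the conventional assumption $E\bar{\varepsilon}\bar{\varepsilon}' = \sigma^2I_n$ suffices.
Haitovsky's major error was to treat $X_{i.}^+$ as if it were known whereas $X_{i.}^+ = F_{i.}'X_{i.}$ and only $X_{i.}$ is known. This error probably derives from the valid assumption [Haitovsky (1973, p. 6)] that a single simple aggregation matrix may be written as
$$F_{i.} = \operatorname{diag}\{1'_{f_i(1)}\; 1'_{f_i(2)}\; \ldots\; 1'_{f_i(m_i)}\}$$
without loss of generality. If, however, this assumption is made for each $F_{i.}$ then a set of $N_{ij}$s result which may be correctly reconstructed from their row and column totals by the 'north west corner rule',
$$\tilde{N}_{ij} = \operatorname{diag}\{1'_{f_i(1)}\; 1'_{f_i(2)}\; \ldots\; 1'_{f_i(m_i)}\} \cdot \operatorname{diag}\{1_{f_j(1)}\; 1_{f_j(2)}\; \ldots\; 1_{f_j(m_j)}\}.$$
2.4. A further suggestion
Noting that
$$N_{ij}1_{m_j} = F_{i.}F_{j.}'1_{m_j} = F_{i.}1_n = f_i$$
and
$$1_{m_i}'N_{ij} = f_j',$$
so that if the elements were randomly allocated to cells one would expect
$$\bar{N}_{ij} = (1/n)f_if_j'.$$
It therefore seems reasonable to use
$$\bar{N} = \begin{bmatrix} N_{11} & \bar{N}_{12} & \cdots & \bar{N}_{1k} \\ \bar{N}_{21} & N_{22} & \cdots & \bar{N}_{2k} \\ \vdots & \vdots & & \vdots \\ \bar{N}_{k1} & \bar{N}_{k2} & \cdots & N_{kk} \end{bmatrix}$$
as a surrogate for $N$.
To obtain a full row rank submatrix of $G$ we must delete at least one row from all but one of its submatrices; this generally suffices to make $\bar{N}^*$ non-singular⁵ so that we can evaluate formulas (5), (6) and (7). However the estimator is no longer independent of the choice of the full row rank submatrix of $G$ so that different investigators making different choices of $G^*$ would obtain different estimates of $\beta$. This impasse may be resolved by noting that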
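$$A^{*\prime}(A^*A^{*\prime})^{-1}A^* = A'(AA')^+A,$$
where $(AA')^+$ denotes the Moore-Penrose generalised inverse of $AA'$. For then eqs. (5), (6) and (7) become
$$\beta_A = [X'W^+X]^{-1}X'W^+y, \tag{19}$$
$$\operatorname{var}\beta_A = \sigma^2[X'W^+X]^{-1}, \tag{20}$$
and
$$\hat{\sigma}^2 = \hat{\varepsilon}'W^+\hat{\varepsilon}/(m^* - k), \tag{21}$$
where $\hat{\varepsilon} = y - X\beta_A$. And different investigators can agree on a single approximation $\bar{W}^+$ to $W^+$.
2.5. Illustrative example⁶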
To illustrate the theory we consider Houthakker's model
$$P_i = \beta_0 + \beta_1Y_i + \beta_2S_i + \varepsilon_i, \qquad i = 1,2,\ldots,1218,$$
where $P_i$ is the net purchases of automobiles by the $i$th household, $Y_i$ is its income and $S_i$ is the value of its automobile stock at the beginning of the year. The $\varepsilon_i$s are disturbances which are assumed to be mutually uncorrelated and identically normally distributed with zero means. The original observations were cross-classified by income into seven groups and by stock into eight groups, the intercept classification being trivial. The 56 mean values for each variable and their corresponding frequencies are listed in tables A.1-A.4 of Haitovsky (pp. 77-80). Performing generalised least squares on the 56 observations we obtain the first row of table 1.
Suppose now that the complete cross-classification is not available, but only the marginal means and the joint frequencies. Then deleting the last row of the income grouping matrix and the intercept grouping matrix we obtain the Houthakker estimates given in the second row of table 1. If, further, the joint frequencies are not known we can use either the income classification or the stock classification and eq. (10) to estimate the regression. These results are given in the third and fourth rows of table 1.
⁵$\bar{N}_{ij}1_{m_j} = (1/n)f_if_j'1_{m_j} = f_i$ so there are $k-1$ linearly independent linear restrictions on $\bar{N}$ of the form $\bar{N}_{.1}1_{m_1} - \bar{N}_{.j}1_{m_j} = 0$.
⁶I am indebted to Jillian Taylor for performing the calculations.
Table 1
Summary of the simple regressions and Haitovsky's regression.ᵃ
Model                           Intercept     Y-coefficient   S-coefficient   $\hat{\sigma}^2$
Complete cross-classification   17.98512      0.72916         -0.17236        4273.067
                                (5.85771)     (0.12567)       (0.03370)
Houthakker                      18.07354      0.72637         -0.17186        4285.348
                                (5.86896)     (0.12594)       (0.03384)
Y-table                         10.86600      0.55054         0.03815         9027.315
                                (34.39248)    (0.84097)       (0.97711)
S-table                         73.74625      -0.65330        -0.09312        1348.399
                                (30.80949)    (0.76224)       (0.04720)
Haitovsky                       18.03350      0.72713         -0.17178        4335.491
                                [6.60652]     [0.10335]       [0.02820]
ᵃWith the exception of the Haitovsky estimates, the numbers in parentheses are the standard errors of the estimated parameters above them. The numbers below the Haitovsky estimates are the north west corner rule approximations to the standard errors.
Before attempting other estimates in this situation it is interesting to examine the standard deviations of the estimates we have already obtained. As will be seen from table 2 the standard deviations of the Houthakker estimates are larger than those of the corresponding complete cross-classification estimates and the standard deviations of the single-classification estimates are larger still. These are to be compared with the standard deviations implicit in table 3.2 of Haitovsky (p. 18) which do not follow this ordering. Indeed our table 2 shows that the standard errors of the estimates obtained directly from the original data are also stated incorrectly. This error has previously been noted by Johnston (1972, p. 236).
Table 2
The standard deviations of the simple regressions.ᵃ
Model                           Intercept    Y-coefficient   S-coefficient
Original                        (0.10080)    (0.001929)      (0.000507)
Complete cross-classification   0.089610     0.0019225       0.00051554
Houthakker                      0.089654     0.0019238       0.00051692
Y-table                         0.361980     0.0088512       0.01028407
S-table                         0.839026     0.0207578       0.00128538
ᵃThe factor $\sigma$ is understood throughout the table. The first row has been computed from Haitovsky's table 3.2.
The remaining estimates are based on approximations to the unknown joint frequency table which is given in the body of table 3a. Our suggested approximation to this table, given in table 3b, reconstructs the expected joint frequencies from the marginal frequencies on the assumption that elements are randomly allocated to cells. Haitovsky implicitly⁷ uses the north west corner rule which constructs table 3c. The Haitovsky estimates, obtained by applying eq. (13) to the data in deviations from means, and the NWCR approximations to their standard errors⁸ are given in the last line of table 1. It is apparent from table 4 that the NWCR approximations to the standard deviations of the slope coefficients are gross underestimates.
Table 3a
Observed frequency table (rows: stock groups; columns: income groups).
                                                 Total
  27    26    50    28    38    12    14          195
  18    19    55    56    37    18     7          210
  10    17    58    60    46    24    12          227
  11     7    36    29    33    18    17          151
   6     7    23    26    34    15     7          118
   6     3    21    19    37    17    15          118
   4     8    22    25    26    20    14          119
   4     2    12    16    22    10    14           80
  86    89   277   259   273   134   100         1218
Table 3b
Expected frequency table (rows: stock groups; columns: income groups).
                                                 Total
 13.8  14.3  44.4  41.5  43.7  21.5  16.0         195
 14.8  15.3  47.8  44.7  47.1  23.1  17.2         210
 16.0  16.6  51.6  48.3  50.9  25.0  18.6         227
 10.7  11.0  34.3  32.1  33.8  16.6  12.4         151
  8.3   8.6  26.8  25.1  26.5  13.0   9.7         118
  8.3   8.6  26.8  25.1  26.5  13.0   9.7         118
  8.4   8.7  27.1  25.3  26.7  13.1   9.8         119
  5.7   5.8  18.2  17.0  17.9   8.8   6.6          80
 86    89   277   259   273   134   100          1218
Table 3c
North west corner rule frequency table (rows: stock groups; columns: income groups).
                                                 Total
  86    89    20     0     0     0     0          195
   0     0   210     0     0     0     0          210
   0     0    47   180     0     0     0          227
   0     0     0    79    72     0     0          151
   0     0     0     0   118     0     0          118
   0     0     0     0    83    35     0          118
   0     0     0     0     0    99    20          119
   0     0     0     0     0     0    80           80
  86    89   277   259   273   134   100         1218
If we delete the same observations as before the first method of section 2.4 produces the estimates of tables 5 and 6. The generalised inverse method, on the other hand, produces the estimates of tables 7 and 8.
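Both approximations to the joint frequency table, the expected frequencies of table 3b and the north west corner rule of table 3c, can be computed from the marginal totals alone. The following is a minimal sketch using the marginal totals printed in tables 3a-3c; the function names are our own, and the orientation (stock groups as rows, income groups as columns) is an assumption that only affects the layout.

```python
stock_totals = [195, 210, 227, 151, 118, 118, 119, 80]   # eight stock groups
income_totals = [86, 89, 277, 259, 273, 134, 100]        # seven income groups

def expected_table(rows, cols):
    """Expected joint frequencies under random allocation: f_i f_j' / n."""
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]

def north_west_corner(rows, cols):
    """Fill cells from the top-left corner, exhausting one margin at a time."""
    rows, cols = list(rows), list(cols)
    table = [[0] * len(cols) for _ in rows]
    i = j = 0
    while i < len(rows) and j < len(cols):
        cell = min(rows[i], cols[j])
        table[i][j] = cell
        rows[i] -= cell
        cols[j] -= cell
        if rows[i] == 0:
            i += 1          # this row total is used up, move down
        else:
            j += 1          # otherwise the column total is used up, move right
    return table

expected = expected_table(stock_totals, income_totals)
nwcr = north_west_corner(stock_totals, income_totals)

print([round(v, 1) for v in expected[0]])
print(nwcr[0])  # [86, 89, 20, 0, 0, 0, 0]
```

The north west corner table concentrates the households in a band of cells around the diagonal, whereas the expected table spreads them over every cell in proportion to the margins. Individual rounded entries from this sketch may differ from the printed table 3b in the last digit.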
Table 4
The estimated standard deviations of Haitovsky's estimator.ᵃ
Frequency matrix   Intercept    Y-coefficient   S-coefficient
True               0.089661     0.0019240       0.00051702
Expected           0.086284     0.0020160       0.00054025
NWCR               0.100335     0.0015696       0.00042830
ᵃThe factor $\sigma$ is understood throughout the table.
Table 5
The 'deletion' estimates and their estimated standard errors.ᵃ
Frequency matrix   Intercept    Y-coefficient   S-coefficient   $\hat{\sigma}^2$
True               18.07354     0.72637         -0.17186        4285.348
                   (5.86896)    (0.12594)       (0.03384)
Expected           18.50071     0.71338         -0.16976        4330.861
                   [5.65916]    [0.13202]       [0.03548]
NWCR               12.50580     0.83512         -0.16158        9682.899
                   [8.42355]    [0.12632]       [0.03478]
ᵃThe numbers in parentheses are the estimated standard errors of the estimated parameters above them.
⁷Our suggestion that Haitovsky uses the north west corner rule is confirmed by his table A.8 (pp. 86-87).
⁸The approximate standard error of the intercept may be obtained from Haitovsky's formulas (pp. 31-32) by using his data (p. 36) and $\sum x_1x_2 = 971898$. $\sigma^2$ is estimated biasedly by $\hat{\varepsilon}_H'M^{-1}\hat{\varepsilon}_H/(\rho(G) - k)$ where $\hat{\varepsilon}_H = y - X\beta_H$ (p. 32).
Table 6
The true and estimated standard deviations of the 'deletion' estimates.ᵃ
Frequency matrix   Intercept     Y-coefficient   S-coefficient
True               0.089654      0.0019238       0.00051692
Expected           0.089936      0.0019339       0.00051778
                   [0.085993]    [0.0020062]     [0.00053907]
NWCR               0.185066      0.0032171       0.00085452
                   [0.085604]    [0.0012837]     [0.00035344]
ᵃThe factor $\sigma$ is assumed throughout the table. The numbers in parentheses are the estimates of the true values above them.
Table 7
The 'generalised inverse' estimates and their estimated standard errors.ᵃ
Frequency matrix   Intercept    Y-coefficient   S-coefficient   $\hat{\sigma}^2$
True               18.07458     0.72634         -0.17186        4285.622
                   (5.86906)    (0.12594)       (0.03384)
Expected           18.50171     0.71335         -0.16976        4331.097
                   [5.65922]    [0.13202]       [0.03548]
NWCR               12.50545     0.83511         -0.16157        9684.573
                   [8.42433]    [0.12633]       [0.03478]
ᵃThe numbers in parentheses are the estimated standard errors of the estimated parameters above them.
Table 8
The true and estimated standard deviations of the 'generalised inverse' estimates.ᵃ
Frequency matrix   Intercept     Y-coefficient   S-coefficient
True               0.089652      0.0019238       0.00051692
Expected           0.089935      0.0019339       0.00051778
                   [0.085992]    [0.0020061]     [0.00053907]
NWCR               0.185062      0.0032170       0.00085451
                   [0.085604]    [0.0012837]     [0.00035345]
ᵃThe factor $\sigma$ is assumed throughout the table. The numbers in parentheses are the estimates of the true values above them.
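The near agreement between the 'True' rows of the 'deletion' tables and the 'generalised inverse' tables is what the theory would lead one to expect. As a sketch of why, suppose as in section 1.2 that $A = BA^*$ with $B = [I \; P']'$ of full column rank and $W^* = A^*A^{*\prime}$ non-singular. The symbol $B$ is our shorthand, not the paper's notation, and $\bar{X}$ denotes the unaggregated regressor matrix of model (1).

```latex
\begin{aligned}
AA' &= B W^{*} B', \\
(AA')^{+} &= B (B'B)^{-1} W^{*-1} (B'B)^{-1} B', \\
A'(AA')^{+}A &= A^{*\prime} B'B (B'B)^{-1} W^{*-1} (B'B)^{-1} B'B A^{*}
              = A^{*\prime} W^{*-1} A^{*}, \\
X'W^{+}X &= \bar{X}' A'(AA')^{+} A \bar{X} = X^{*\prime} W^{*-1} X^{*}, \qquad
X'W^{+}y = X^{*\prime} W^{*-1} y^{*}.
\end{aligned}
```

The second line can be verified against the four Penrose conditions, using the fact that $(AA')(AA')^+ = B(B'B)^{-1}B'$ is symmetric and idempotent. With the true $W$ the two methods therefore coincide in exact arithmetic, so the small differences between the corresponding 'True' rows are presumably numerical; with an approximation to $W$ the two methods need not coincide.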
3. Conclusion
Despite the fact that the conventional $\chi^2$ test statistic is 103.724, it is clear that the 'expected' frequency table is a close approximation to the observed frequency table. We therefore suggest the use of eqs. (19), (20) and (21) with this approximation. The results then obtained are still only second best to those obtained from eqs. (5), (6) and (7) which would be available if the compilers of aggregate data series were to oblige practitioners with the joint frequencies of their tabulation.
References
Haitovsky, Y., 1973, Regression estimation from grouped observations, Griffin's Statistical Monographs and Courses no. 33 (Charles Griffin, London). The section of the monograph which concerns us in this paper is based on an earlier paper by Haitovsky published in: 1966, Journal of the American Statistical Association 61, 720-728.
Johnston, J., 1972, Econometric methods, 2nd ed. (McGraw-Hill, New York).