Economics Letters 83 (2004) 129–135
www.elsevier.com/locate/econbase
Rectangular regression for an errors-in-variables model

Hang Keun Ryu
Department of Economics, Chung Ang University, Seoul 156-756, South Korea

Received 19 March 2003; received in revised form 6 October 2003; accepted 10 October 2003
Abstract

For a two-variable relationship, the least squares estimator will be biased if the independent variable is measured with error. By minimizing the combined sum of squared distances, Min Σ_{i=1}^n [Δx_i² + Δy_i²], it can be shown that the rectangular regression estimators are consistent under certain conditions.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Rectangular regression; Orthogonal regression; Errors-in-variables model
JEL classification: C13; C20
1. Introduction

When the observed data y and x contain measurement errors, the least squares estimator will be inconsistent. Goldberger (1972) notes that the classical errors-in-variables model is underidentified and that it presents no interesting problems of estimation and testing. In the Friedman model, if the intercept term is assumed equal to zero (the proportionality assumption for permanent consumption and permanent income), then this assumption leads to identification of the other parameters of the model; see Friedman (1957) for details. Without this assumption, the parameters of the model are underidentified. In the errors-in-variables set-up, the parameters can be underidentified if there are more parameters than derived equations; see Goldberger (1972) for more identification examples. In this paper, a rectangular regression model is introduced and the rectangular estimators are shown to be consistent under certain conditions. The parameter identification problem and a consistent parameter estimation method are discussed below. The performance of this method is compared with those of the ordinary least squares method and the orthogonal regression method. Malinvaud (1980) minimized the
sum of squared distances perpendicular to the fitted line and called this the orthogonal regression. Malinvaud applied the orthogonal regression method to linear models with errors in variables, but the covariances of the errors are assumed to be known a priori.
2. The rectangular regression model

Consider a model where the true values y* and x* satisfy the relationship

    y* = a + b x*                                                   (1)
Although the true values of x* and y* are not observed, data are collected by random sampling at observation points i = 1, 2, ..., n:

    y_i = y*_i + u_i,   with u_i ~ N[0, σ_u²]                       (2)
    x_i = x*_i + v_i,   with v_i ~ N[0, σ_v²]                       (3)
where we assume u and v are uncorrelated. Suppose we draw a line

    y = a + b x                                                     (4)
fitted to pass through the sample points (x_i, y_i) as closely as possible. The distances from the point (x_i, y_i) to the line along the y direction and the x direction are

    Δy_i = y_i − a − b x_i   and   Δx_i = x_i − (y_i − a)/b         (5)
By minimizing the combination of the sums of squared distances, we obtain the rectangular regression:

    Min_{a,b} Σ_{i=1}^n [Δx_i² + Δy_i²]
      = Min_{a,b} Σ_{i=1}^n [ (y_i − a − b x_i)² + ((y_i − a)/b − x_i)² ]
      = Min_{a,b} (1 + 1/b²) Σ_{i=1}^n (y_i − a − b x_i)²           (6)

Differentiating (6) partially with respect to a and setting the derivative equal to zero leads to

    ã_n = ȳ − b x̄
and the substitution of ã_n for a in (6) produces

    Min_b (1 + 1/b²) Σ_{i=1}^n [(y_i − ȳ) − b(x_i − x̄)]²           (7)
The minimizing values are denoted b̃_n and ã_n = ȳ − b̃_n x̄. Rewrite (7) using the sample moments:

    Min_b (1 + 1/b²)(M22 + b² M11 − 2b M12)                         (8)

where

    M11 = Σ_{i=1}^n (x_i − x̄)²,  M12 = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ),  and  M22 = Σ_{i=1}^n (y_i − ȳ)²    (9)
Differentiating with respect to b and setting the derivative equal to zero produces

    M22 − b̃_n M12 = b̃_n⁴ M11 − b̃_n³ M12                           (10)

If M11 = M22, b̃_n becomes one, and the parameter estimate does not depend on the specific values of M11 and M12. Now let us establish consistency for b̃_n. With M11/n → σ_{x*}² + σ_v², M22/n → b² σ_{x*}² + σ_u², and M12/n → b σ_{x*}² as n → ∞, (10) can be rewritten as

    b² σ_{x*}² + σ_u² − b̃_n b σ_{x*}² − b̃_n⁴ (σ_{x*}² + σ_v²) + b̃_n³ b σ_{x*}² → 0   as n → ∞    (11)
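The first-order condition (10) is a quartic in b, so the estimator is easy to compute numerically. The following is a minimal sketch, not taken from the paper: the function name is illustrative, and the real root is selected by evaluating the objective (8).

```python
import numpy as np

def rectangular_fit(x, y):
    """Rectangular regression: minimize the sum of Dx^2 + Dy^2.

    Solves the first-order condition (10),
        M22 - b*M12 = b^4*M11 - b^3*M12,
    and picks the real root that minimizes the objective (8).
    """
    xc, yc = x - x.mean(), y - y.mean()
    M11, M12, M22 = (xc**2).sum(), (xc * yc).sum(), (yc**2).sum()
    # Quartic in b: M11*b^4 - M12*b^3 + 0*b^2 + M12*b - M22 = 0
    roots = np.roots([M11, -M12, 0.0, M12, -M22])
    real = roots[np.abs(roots.imag) < 1e-6].real
    real = real[real != 0.0]          # objective undefined at b = 0
    obj = lambda b: (1.0 + 1.0 / b**2) * (M22 + b**2 * M11 - 2.0 * b * M12)
    b_hat = min(real, key=obj)
    a_hat = y.mean() - b_hat * x.mean()
    return a_hat, b_hat
```

On noiseless data the quartic factors exactly and the true slope is recovered.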
Dividing the above equation by σ_{x*}², factoring b² − b̃_n b + b̃_n³ b − b̃_n⁴ = (b + b̃_n³)(b − b̃_n), and then dividing through by (b + b̃_n³) produces

    (b − b̃_n) + (σ_u² − b̃_n⁴ σ_v²) / [(b + b̃_n³) σ_{x*}²] → 0   as n → ∞    (12)

    b̃_n − (σ_u² − b̃_n⁴ σ_v²) / [(b + b̃_n³) σ_{x*}²] → b         as n → ∞    (13)
If σ_u² ≠ b̃_n⁴ σ_v², then b̃_n will not be a consistent estimator. However, if σ_{x*}² ≫ σ_u², σ_v², then b̃_n will be a good estimator for b.
Theorem. Suppose (1/n) Σ_{i=1}^n u_i v_i → 0, (1/n) Σ_{i=1}^n x_i u_i → 0, and (1/n) Σ_{i=1}^n x_i v_i → σ_v² as n → ∞, and σ_u² = b⁴ σ_v². Then the parameters estimated by the rectangular regression converge to the true parameters: ã_n → a and b̃_n → b as n → ∞.

Proof. Rewrite (12) using σ_u² = b⁴ σ_v², so that σ_u² − b̃_n⁴ σ_v² = (b⁴ − b̃_n⁴) σ_v² = (b − b̃_n)(b³ + b² b̃_n + b b̃_n² + b̃_n³) σ_v²:

    (b − b̃_n) [ 1 + (b³ + b² b̃_n + b b̃_n² + b̃_n³) σ_v² / ((b + b̃_n³) σ_{x*}²) ] → 0   as n → ∞    (14)

If b̃_n has the same sign as b, then the bracketed term is nonzero and b̃_n → b as n → ∞. □
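The theorem can be checked by simulation: if the error scales are chosen so that σ_u² = b⁴ σ_v², the root of the quartic (10) should approach the true slope as n grows. A hedged sketch, not from the paper; the simulation design (uniform x*, seed, sample size) is illustrative:

```python
import numpy as np

# One large sample with sigma_u^2 = b^4 * sigma_v^2; the rectangular
# slope (root of Eq. (10)) should be close to the true b.
rng = np.random.default_rng(42)
n, a, b = 200_000, 1.0, 2.0
sigma_v = 0.05
sigma_u = b**2 * sigma_v                 # so sigma_u^2 = b^4 * sigma_v^2
x_star = rng.uniform(0.0, 1.0, n)
x = x_star + rng.normal(0.0, sigma_v, n)          # Eq. (3)
y = a + b * x_star + rng.normal(0.0, sigma_u, n)  # Eq. (2)

xc, yc = x - x.mean(), y - y.mean()
M11, M12, M22 = (xc**2).sum(), (xc * yc).sum(), (yc**2).sum()
roots = np.roots([M11, -M12, 0.0, M12, -M22])     # quartic from Eq. (10)
real = roots[np.abs(roots.imag) < 1e-6].real
obj = lambda t: (1 + 1 / t**2) * (M22 + t**2 * M11 - 2 * t * M12)
b_tilde = min((r for r in real if r != 0.0), key=obj)
```

With these scales the limiting moment equation is satisfied exactly at b = 2, so `b_tilde` differs from 2 only by sampling noise.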
Regarding the parameter identification problem for the rectangular method, σ_u², σ_v², and σ_{x*}² can be determined from the following three relationships once b is determined by Eq. (8):

    σ_y² = b² σ_{x*}² + σ_u²
    σ_x² = σ_{x*}² + σ_v²                                           (15)
    σ_xy = b σ_{x*}²

Zellner (1971) also considered minimization of a combined sum of squared errors along the x axis and squared errors along the y axis. Rewrite equation (5.74) of Zellner (1971) to fit our notation and consider only the terms inside the large bracket:
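Given an estimate of b, the system (15) can be inverted for the three variances using the corresponding sample moments. A small sketch (the function name is illustrative, not from the paper):

```python
import numpy as np

def identify_variances(x, y, b):
    """Recover sigma_{x*}^2, sigma_u^2, sigma_v^2 from the three
    moment relationships in Eq. (15), given an estimate of b."""
    s_x2 = np.var(x)                      # -> sigma_{x*}^2 + sigma_v^2
    s_y2 = np.var(y)                      # -> b^2 sigma_{x*}^2 + sigma_u^2
    s_xy = np.cov(x, y, bias=True)[0, 1]  # -> b sigma_{x*}^2
    s_xstar2 = s_xy / b
    s_v2 = s_x2 - s_xstar2
    s_u2 = s_y2 - b**2 * s_xstar2
    return s_xstar2, s_u2, s_v2
```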
    Min_{a,b} { [ (b²/φ) / (1 + b²/φ) ]² Σ_{i=1}^n [x_i − (y_i − a)/b]²
              + [ 1 / (1 + b²/φ) ]² Σ_{i=1}^n [y_i − a − b x_i]² }          (16)
The weighting factors of (16) are different from those of (6). Zellner (1971) solves a maximum likelihood problem, and the variance ratio φ = σ_u²/σ_v² is used to weight the squared errors along the x axis and the squared errors along the y axis. Even if φ = 1, the weighting factors are different, because Zellner (1971) uses x_i − x*_i in his (5.73) for the departure along the x axis, whereas this paper uses x_i − x̂_i in (6).
3. Application

To compare the performance of the rectangular regression with those of the OLS and the orthogonal regression, let us review the orthogonal regression model. The orthogonal regression has some intuitive appeal in that it establishes the nearest distance from each observation point to the fitted line.
3.1. Review of the orthogonal regression model

The sample observations are fitted on the line

    y = a + b x                                                     (17)

and the orthogonal distance from the point (x_i, y_i) to the fitted line is given as

    (b x_i − y_i + a) / √(b² + 1)                                   (18)
See Protter and Morrey (1970) for the orthogonal distance from a point to a line. Malinvaud (1980) derives the slope of the regression by minimizing the sum of squared orthogonal deviations:

    b_ORTHO = [ M22 − M11 + √((M11 − M22)² + 4 M12²) ] / (2 M12)    (19)

and the intercept is found from

    a_ORTHO = ȳ − b_ORTHO x̄                                        (20)
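The closed form (19)–(20) is straightforward to code. A minimal sketch (the function name is illustrative, not from the paper):

```python
import numpy as np

def orthogonal_fit(x, y):
    """Orthogonal (total least squares) fit from Eqs. (19)-(20)."""
    xc, yc = x - x.mean(), y - y.mean()
    M11, M12, M22 = (xc**2).sum(), (xc * yc).sum(), (yc**2).sum()
    # Slope from Eq. (19)
    b = (M22 - M11 + np.sqrt((M11 - M22)**2 + 4.0 * M12**2)) / (2.0 * M12)
    # Intercept from Eq. (20)
    a = y.mean() - b * x.mean()
    return a, b
```

When M11 = M22 the slope reduces to |M12|/M12 = ±1, which is the equivalence with the rectangular method discussed below.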
3.2. Data generation

Assume that the unobservable x* and y* satisfy the relation

    y* = a + b x*                                                   (21)

One hundred equally spaced x* values are generated on a uniform grid: x*_1 = 0.01, x*_2 = 0.02, ..., x*_100 = 1. The corresponding y*_i values are y*_1 = a + 0.01b, y*_2 = a + 0.02b, ..., y*_100 = a + b. Now add measurement errors u_i and v_i to x*_i and y*_i to generate the x_i and y_i samples. For i = 1, 2, ..., 100,

    x_i = 0.01 i + u_i                                              (22)
    y_i = a + 0.01 b i + v_i                                        (23)
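The sampling scheme in (22)–(23) can be sketched as follows. This is an illustrative helper, not the author's code; the function name and the use of a seeded generator are assumptions, and the grid x*_i = i/n reduces to 0.01·i for n = 100.

```python
import numpy as np

def generate_sample(a=1.0, b=2.0, sigma_u=0.1, sigma_v=0.05, n=100,
                    rng=None):
    """One Monte Carlo sample following Eqs. (22)-(23): an equally
    spaced grid x*_i = i/n contaminated by normal measurement errors.
    Note that in (22)-(23) the u errors enter x and the v errors enter y."""
    rng = np.random.default_rng() if rng is None else rng
    i = np.arange(1, n + 1)
    x = i / n + rng.normal(0.0, sigma_u, n)          # Eq. (22)
    y = a + b * i / n + rng.normal(0.0, sigma_v, n)  # Eq. (23)
    return x, y
```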
where u_i ~ N[0, (0.1)²] and v_i ~ N[0, (0.05)²]. These errors are not small in magnitude, as the domain of x* is between 0 and 1 and the range of y* is between a and a + b. Let us choose the values a = 1 and b = 2. For the generated data, the model parameters are estimated by OLS, orthogonal regression, and rectangular regression. The process above is repeated 100 times, and the mean values of the estimated parameters are reported in Table 1. To approximately satisfy the required condition for consistency, σ_u² = b⁴ σ_v², rescale the observations to fit −0.5 ≤ (x̃_i, ỹ_i) ≤ 0.5. As a rough approximation, M11 can then be taken to be more or less the same as M22, so the slope b will be near one (if M11 = M22, b = 1). If M11 ≈ M22 and b ≈ 1, then σ_u² ≈ σ_v² with M11/n → σ_{x*}² + σ_v² and M22/n → b² σ_{x*}² + σ_u² as n → ∞, and therefore σ_u² ≈ b⁴ σ_v². Afterwards, b̃_n can be transformed back to match the original scale.

It is interesting to note that the rectangular method produces a result identical to that of the orthogonal method when M11 = M22. If we rescale y such that M11 = M22, then both (10) and (19) produce

    b̃_n = b_ORTHO = 1                                              (24)

Table 1
Performance of OLS, orthogonal regression, and rectangular regression

    Model                  Mean value   Departure   S.E.      Minimum   Maximum
    a_OLS                  1.0304       0.0304      0.02725   0.9393    1.0954
    a_ORTHO                0.98157      0.01843     0.02849   0.8866    1.0431
    ã_n (rectangular)      1.0007       0.0007      0.02732   0.9082    1.0605
    ã_n (M11 = M22)        1.0003       0.0003      0.02754   0.9074    1.0625
    b_OLS                  1.9412       0.0588      0.04469   1.8428    2.0391
    b_ORTHO                2.0379       0.0379      0.04722   1.9468    2.1506
    b̃_n (rectangular)      2.0000       0.0000      0.04440   1.9114    2.0957
    b̃_n (M11 = M22)        2.0009       0.0009      0.04517   1.9088    2.1065

The true parameters are a = 1 and b = 2, with σ_u² = 1 and σ_v² = 0.5. Regressions are repeated 100 times and the mean values of the estimators are reported. For M11 = M22, the orthogonal method is equivalent to the rectangular method.
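The Monte Carlo design above can be sketched in a self-contained way. This is an illustrative reimplementation, not the author's code: the rescaling step (making M11 ≈ M22) is omitted, the function names and seed are assumptions, and the resulting numbers will therefore differ from Table 1.

```python
import numpy as np

def fit_all(x, y):
    """OLS, orthogonal (Eq. 19), and rectangular (Eq. 10) slopes."""
    xc, yc = x - x.mean(), y - y.mean()
    M11, M12, M22 = (xc**2).sum(), (xc * yc).sum(), (yc**2).sum()
    b_ols = M12 / M11
    b_orth = (M22 - M11 + np.sqrt((M11 - M22)**2 + 4 * M12**2)) / (2 * M12)
    roots = np.roots([M11, -M12, 0.0, M12, -M22])   # quartic from Eq. (10)
    real = roots[np.abs(roots.imag) < 1e-6].real
    obj = lambda b: (1 + 1 / b**2) * (M22 + b**2 * M11 - 2 * b * M12)
    b_rect = min((r for r in real if r != 0.0), key=obj)
    return b_ols, b_orth, b_rect

def monte_carlo(reps=100, a=1.0, b=2.0, n=100, seed=0):
    """Repeat the design of Eqs. (22)-(23) and average the slopes."""
    rng = np.random.default_rng(seed)
    i = np.arange(1, n + 1)
    out = np.empty((reps, 3))
    for r in range(reps):
        x = i / n + rng.normal(0.0, 0.1, n)           # Eq. (22)
        y = a + b * i / n + rng.normal(0.0, 0.05, n)  # Eq. (23)
        out[r] = fit_all(x, y)
    return out.mean(axis=0)   # mean b_OLS, b_ORTHO, b_RECT
```

Without the rescaling step, the OLS slope is visibly attenuated toward zero while the orthogonal slope stays much closer to the true value, which is the qualitative pattern of Table 1.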
The performance of the least squares method, the orthogonal regression, and the rectangular regression is compared in Table 1. As expected, the slope of the least squares method is biased downward and the intercept term is biased upward from the true values a = 1 and b = 2. For the orthogonal regression, the estimated slope (2.0379) is quite close to the true value 2, with a standard error (S.E.) of 0.0472. In comparison, the rectangular regression produced very good results: the slope is 2.0000 and the intercept is 1.0007, with S.E.s of 0.0444 and 0.0273, respectively. When the y variable is rescaled to fit M11 = M22 and the estimated parameters are transformed back to the original scale, both the orthogonal method and the rectangular method produce good results, with b̃_n = b_ORTHO = 2.0009 and ã_n = a_ORTHO = 1.0003. Although the true model is linear, a linear transformation of the data changes the parameter estimates: after the transformation, the previous minimum distance is no longer a minimum for the orthogonal regression. Similarly, for the rectangular regression, changing the scale of the y axis changes the weight of (Δy)² in Min Σ_{i=1}^n [Δx_i² + Δy_i²].

4. Summary and concluding remarks

For an errors-in-variables model, the rectangular regression is introduced and the consistency of the estimated parameters is established under certain conditions. Using a numerical experiment, it is shown that the rectangular regression method is easy to apply and that its estimated parameters are accurate. Its performance is compared with those of the least squares method and the orthogonal regression method.
Extension of the rectangular regression method to two or more independent variables is immediate. Numerical optimization over two or more parameters will take more time, but the method is straightforward. Extension of this method to a nonlinear regression function poses no more difficulty as long as the nonlinear function does not include too many parameters.

Acknowledgements

This research was supported by the Chung-Ang University Research Grants in 2003. I would like to thank Professor Arnold Zellner for his helpful comments. I also wish to thank an anonymous referee for many useful suggestions.
References

Friedman, M., 1957. A Theory of the Consumption Function. Princeton University Press, Princeton, NJ.
Goldberger, A., 1972. Structural equation methods in the social sciences. Econometrica 40 (6), 979–1001.
Malinvaud, E., 1980. Statistical Methods of Econometrics. North-Holland, New York, NY.
Protter, M., Morrey, C., 1970. College Calculus with Analytic Geometry. Addison-Wesley, Reading, MA.
Zellner, A., 1971. An Introduction to Bayesian Inference in Econometrics. Wiley, New York, NY.