Economics Letters 85 (2004) 85 – 91 www.elsevier.com/locate/econbase
Exponential specifications and measurement error
Susanne M. Schennach *
Department of Economics, University of Chicago, 1126 E. 59th St., Chicago, IL 60637, USA
Received 5 January 2004; accepted 9 March 2004
Available online 4 July 2004
Abstract
While it is well known that the errors-in-variables model with a linear specification can be identified using instrumental variables (IV), we show that, in general, its nonlinear counterpart is not, by providing a class of counterexamples based on exponential specifications.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Errors-in-variables model; Exponential specification; Instrumental variable; Identifiability
JEL classification: C30
1. Introduction
Estimators based on instrumental variables (IV) have long been used to correct for the presence of measurement error in linear regression models. Unfortunately, as first pointed out by Amemiya (1985), standard IV techniques break down when the specification is nonlinear, because the measurement error can no longer be treated as an additively separable disturbance, so that it is not possible to find an instrument that is correlated with the regressor without also being correlated with the disturbance. This problem has prompted a long search for a solution. Hausman et al. (1991) have fully handled the case where $g(x^*,\theta)$ is a polynomial in $x^*$, both establishing identification and providing a root-$n$ consistent estimator. Wang and Hsiao (2003) have achieved a similar feat for the class of specifications where $g(x^*,\theta)$ is absolutely integrable¹ with respect to $x^*$. A general framework for the consistent estimation of general nonlinear models not limited to polynomial and absolutely integrable specifications has been provided by Newey (2001), under the assumption that the model is identified. So far, however, the prerequisite general framework for establishing identification of the errors-in-variables model with instruments has remained elusive.

In this letter, we show that there exists a class of nonlinear specifications where the parameters cannot be identified in the presence of measurement error using the estimating equations originally proposed by Hausman et al. (1991) and subsequently used by Newey (2001) and Wang and Hsiao (2003), thus explaining why the search for a general proof of identification has remained unsuccessful. Our counterexamples are far from being contrived, as they include the practically relevant case of exponential specifications.

* Tel.: +1-773-702-8199; fax: +1-773-702-8490. E-mail address: [email protected] (S.M. Schennach).
¹ Their proof of identification also requires that the dimension of $\theta$ be no larger than one plus the dimension of $x^*$.
0165-1765/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.econlet.2004.03.023
2. Results
The nonlinear errors-in-variables model with instruments, as previously used by Hausman et al. (1991), Newey (2001) and Wang and Hsiao (2003), is defined as follows. Let $y$, $x$, $x^*$, $z$, $\Delta y$, $\Delta x$, $u$ be scalar random variables and consider the following model expressing the relationship between the dependent variable $y$ and a regressor $x^*$:

$$
\begin{aligned}
y &= g(x^*,\theta) + \Delta y, & E[\Delta y \mid z, u] &= 0,\\
x &= x^* + \Delta x, & E[\Delta x \mid z, u, \Delta y] &= 0,\\
x^* &= z + u, & & u \text{ independent from } z,
\end{aligned}
\tag{1}
$$
where $g(x^*,\theta)$ is a known function depending on a parameter vector $\theta$ to be determined. The variables $x^*$ and $x$ denote, respectively, the true value of the regressor and its mismeasured counterpart, while $z$ is an instrument for the true regressor $x^*$. The variables $x$, $y$, $z$ are observable, while the variables $x^*$, $\Delta x$, $\Delta y$, $u$ are not. Note that while $g(x^*,\theta)$ is parametrically specified, no parametric distributional assumptions regarding the unobservable variables are made.

The instrumental equation $x^* = z + u$ could be written in a slightly more general form $x^* = X(w,\alpha) + u$, where $X(w,\alpha)$ is a general nonlinear function of a vector of instruments $w$ and an unknown parameter $\alpha$ which could be consistently estimated by a nonlinear regression of $x$ on $w$. In addition, it is possible to consider all random variables to be vector-valued. For conciseness, we do not consider these simple extensions here.

Following the existing literature on nonlinear errors-in-variables models with instruments (Hausman et al., 1991; Newey, 2001; Wang and Hsiao, 2003), we investigate the identifiability of $\theta$ in model (1) using the following estimating equations

$$E[y \mid z] = \int g(z+u,\theta)\, f(u)\,du, \tag{2}$$

$$E[xy \mid z] = \int (z+u)\, g(z+u,\theta)\, f(u)\,du, \tag{3}$$
where $f(u)$ denotes the density of $u$ and where the integrals extend over the whole real line. Establishing identification would require the determination of $\theta$ from the fully observed quantities $E[y \mid z]$ and $E[xy \mid z]$. The heuristic argument put forward by Newey (2001) suggesting that these equations may enable identification is the fact that the model is characterized by two unknown functions, $g(x^*,\theta)$ and $f(u)$, while two functional equations are available. Moreover, in the special case of a polynomial $g(x^*,\theta)$, it is known (Hausman et al., 1991) that knowledge of the conditional expectations $E[y \mid z]$ and $E[xy \mid z]$ is sufficient to identify $g(x^*,\theta)$. However, the following counterexample shows that a general identification result cannot be obtained.

Theorem 1. If (i) $g(x^*,\theta) = \theta \exp(\lambda x^*)$ for some known $\lambda \in \mathbb{R}\setminus\{0\}$, (ii) $u$ has a density $f(u)$ that is bounded away from zero on some interval $I$ (not reduced to a point) and (iii) $E[\exp(\lambda u)]$ and $E[u \exp(\lambda u)]$ exist, then $\theta$ in model (1) is not identified by Eqs. (2) and (3).

Proof. For exponential specifications, Eq. (2) becomes

\begin{align}
E[y \mid z] &= \theta \int \exp(\lambda(z+u))\, f(u)\,du \tag{4}\\
&= \theta \exp(\lambda z) \int \exp(\lambda u)\, f(u)\,du \tag{5}
\end{align}

(where the integration range is implicitly taken to be the support of $f(u)$), while Eq. (3) becomes

\begin{align}
E[xy \mid z] &= \theta \int (z+u) \exp(\lambda(z+u))\, f(u)\,du \tag{6}\\
&= z\theta \int \exp(\lambda(z+u))\, f(u)\,du + \theta \exp(\lambda z) \int u \exp(\lambda u)\, f(u)\,du \tag{7}\\
&= z\,E[y \mid z] + \theta \exp(\lambda z) \int u \exp(\lambda u)\, f(u)\,du. \tag{8}
\end{align}
Hence, $\theta$ and $f(u)$ are a solution to the following system of equations

\begin{align}
E[y \mid z] &= \theta \exp(\lambda z) \int \exp(\lambda u)\, f(u)\,du \tag{9}\\
E[(x - z)y \mid z] &= \theta \exp(\lambda z) \int u \exp(\lambda u)\, f(u)\,du, \tag{10}
\end{align}

where the left-hand sides are observable, while the right-hand sides are unobservable. We now show that Eqs. (9) and (10) admit another solution $\tilde\theta$ and $\tilde f(u)$, thus establishing lack of identification.
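As an illustrative check (not part of the original letter), the following Python sketch simulates model (1) with an exponential specification and compares sample analogues of the left-hand sides of Eqs. (9) and (10) with their right-hand sides. The parameter values, the normal distributions chosen for $u$, $\Delta x$ and $\Delta y$, and the function names are all assumptions made for illustration; the normal case is convenient because $E[\exp(\lambda u)]$ and $E[u \exp(\lambda u)]$ then have simple closed forms.

```python
import numpy as np

def simulate_moments(z, theta=2.0, lam=0.5, sigma_u=1.0, n=400_000, seed=0):
    """Sample analogues of E[y|z] and E[(x - z)y|z] at a fixed instrument value z.

    All distributions are hypothetical choices: u, dx and dy are drawn as
    independent normals, which satisfies the conditions of model (1).
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sigma_u, n)    # x* = z + u, with u independent of z
    dx = rng.normal(0.0, 1.0, n)       # measurement error in x
    dy = rng.normal(0.0, 1.0, n)       # disturbance in y
    x_star = z + u
    y = theta * np.exp(lam * x_star) + dy
    x = x_star + dx
    return y.mean(), ((x - z) * y).mean()

def closed_form_moments(z, theta=2.0, lam=0.5, sigma_u=1.0):
    """Right-hand sides of Eqs. (9) and (10) when u ~ N(0, sigma_u^2)."""
    m0 = np.exp(0.5 * lam**2 * sigma_u**2)   # E[exp(lam * u)]
    m1 = lam * sigma_u**2 * m0               # E[u * exp(lam * u)]
    scale = theta * np.exp(lam * z)
    return scale * m0, scale * m1

for z in (-1.0, 0.0, 1.0):
    sim = simulate_moments(z)
    exact = closed_form_moments(z)
    print(f"z={z:+.1f}  Eq.(9): {sim[0]:.3f} vs {exact[0]:.3f}  "
          f"Eq.(10): {sim[1]:.3f} vs {exact[1]:.3f}")
```

Both observable moments depend on the unknowns only through the products $\theta \int \exp(\lambda u) f(u)\,du$ and $\theta \int u \exp(\lambda u) f(u)\,du$, which is what leaves room for a second solution.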
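Consider some $\tilde\theta$ lying in a neighborhood of $\theta$ and let $\tilde f(u) = (\theta/\tilde\theta)\, f(u) + (1 - (\theta/\tilde\theta))\, p(u)$, where $p(u)$ satisfies

\begin{align}
\int p(u)\,du &= 1 \tag{11}\\
\int p(u) \exp(\lambda u)\,du &= 0 \tag{12}\\
\int p(u)\, u \exp(\lambda u)\,du &= 0. \tag{13}
\end{align}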
The function $p(u)$ can be constructed as follows: select three bounded functions $p_j(u)$ for $j = 1,\ldots,3$ with disjoint supports, each contained in $I$. Then, set $p(u) = \sum_{j=1}^{3} c_j p_j(u)$ with the $c_j$ chosen so that Eqs. (11) through (13) hold. It is always possible to select the $p_j(u)$ so that the system of linear equations defining the $c_j$ is nonsingular, since the functions $1$, $\exp(\lambda u)$ and $u \exp(\lambda u)$ for $\lambda \neq 0$ are linearly independent on any nondegenerate interval. Note that, by construction, $\tilde f(u)$ integrates to one and is positive for $\tilde\theta$ sufficiently close to $\theta$, since the $p_j(u)$ are bounded and supported inside the set $I$ over which $f(u)$ is bounded away from zero. Hence, $\tilde f(u)$ is a valid density. Now, if $\theta$ and $f(u)$ satisfy Eqs. (9) and (10), then $\tilde\theta$ and $\tilde f(u)$ also satisfy them:

\begin{align}
E[y \mid z] &= \tilde\theta \exp(\lambda z) \int \exp(\lambda u) \left( (\theta/\tilde\theta)\, f(u) + (1 - (\theta/\tilde\theta))\, p(u) \right) du \tag{14}\\
&= \theta \exp(\lambda z) \int \exp(\lambda u)\, f(u)\,du + (\tilde\theta - \theta) \exp(\lambda z) \int \exp(\lambda u)\, p(u)\,du \tag{15}\\
&= \theta \exp(\lambda z) \int \exp(\lambda u)\, f(u)\,du \tag{16}\\
E[(x - z)y \mid z] &= \tilde\theta \exp(\lambda z) \int u \exp(\lambda u) \left( (\theta/\tilde\theta)\, f(u) + (1 - (\theta/\tilde\theta))\, p(u) \right) du \tag{17}\\
&= \theta \exp(\lambda z) \int u \exp(\lambda u)\, f(u)\,du + (\tilde\theta - \theta) \exp(\lambda z) \int u \exp(\lambda u)\, p(u)\,du \tag{18}\\
&= \theta \exp(\lambda z) \int u \exp(\lambda u)\, f(u)\,du. \tag{19}
\end{align}
Hence, Eqs. (2) and (3) are not sufficient to distinguish $\theta$ from another value $\tilde\theta$ in some neighborhood of $\theta$, so that $\theta$ is not identified. □
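The construction used in the proof can be reproduced numerically. The Python sketch below is a minimal illustration under assumed values, not the author's code: it takes $f$ to be the standard normal density, $I = [-1, 1]$, indicator functions on three disjoint subintervals of $I$ as the $p_j$, and hypothetical values of $\theta$, $\tilde\theta$ and $\lambda$. It solves the linear system implied by Eqs. (11) through (13) for the $c_j$, builds $\tilde f$, and checks that $(\tilde\theta, \tilde f)$ reproduces the right-hand sides of Eqs. (9) and (10) up to the common factor $\exp(\lambda z)$. Integrals are approximated by a midpoint rule on a grid whose cell edges coincide with the breakpoints of the $p_j$.

```python
import numpy as np

# Hypothetical values chosen for illustration only.
theta, theta_alt, lam = 2.0, 2.01, 0.5
edges = np.array([-1.0, -1.0 / 3.0, 1.0 / 3.0, 1.0])  # three disjoint pieces of I = [-1, 1]

# Midpoint grid on [-10, 10]; the cell width 1/300 puts the breakpoints on cell edges.
h = 1.0 / 300.0
u = -10.0 + h * (np.arange(6000) + 0.5)

def integrate(values):
    """Midpoint-rule approximation of the integral over [-10, 10]."""
    return float(np.sum(values) * h)

f = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)  # assumed density of u (standard normal)

# p_j: indicator functions of the three subintervals (bounded, disjoint supports in I).
pieces = [((u >= a) & (u < b)).astype(float) for a, b in zip(edges[:-1], edges[1:])]

# Rows correspond to Eqs. (11), (12), (13): weights 1, exp(lam u), u exp(lam u).
weights = [np.ones_like(u), np.exp(lam * u), u * np.exp(lam * u)]
M = np.array([[integrate(w * pj) for pj in pieces] for w in weights])
c = np.linalg.solve(M, np.array([1.0, 0.0, 0.0]))
p = sum(cj * pj for cj, pj in zip(c, pieces))

# Alternative density from the proof.
mix = theta / theta_alt
f_alt = mix * f + (1.0 - mix) * p

# Right-hand sides of Eqs. (9) and (10), omitting the common factor exp(lam z).
rhs9 = theta * integrate(np.exp(lam * u) * f)
rhs9_alt = theta_alt * integrate(np.exp(lam * u) * f_alt)
rhs10 = theta * integrate(u * np.exp(lam * u) * f)
rhs10_alt = theta_alt * integrate(u * np.exp(lam * u) * f_alt)

print("c_j:", c)
print("min of f_alt:", f_alt.min(), " integral of f_alt:", integrate(f_alt))
print("Eq. (9):", rhs9, "vs", rhs9_alt)
print("Eq. (10):", rhs10, "vs", rhs10_alt)
```

Because the $c_j$ can be large in magnitude, $\tilde\theta$ has to sit close to $\theta$ for $\tilde f$ to remain positive on $I$, which is the "sufficiently close" requirement in the proof.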
Although, for the sake of conciseness, we do not do so explicitly, this result can be straightforwardly extended to show that one cannot fully identify specifications of the form

$$g(x^*,\theta) = \sum_{j=1}^{J} \sum_{k=0}^{K_j} \theta_{jk}\, (x^*)^k \exp(\lambda_j x^*), \tag{20}$$

where $\theta$ is a vector of all the parameters $\theta_{jk}$ and where the $\lambda_j$ are given constants, at least one of which is nonzero, while $J$ and $K_j$ are some nonnegative integers.² It is also possible to allow for discrete distributions, at the expense of notational complications. When the $\lambda_j$ are allowed to be complex numbers, it can even be shown that linear combinations of trigonometric functions are not identified either.³ However, our results do not readily extend to infinite linear combinations of the above functions. This is an important qualification, for otherwise, our results would imply that the very large class of functions that can be written as infinite linear combinations of trigonometric functions via the Fourier transform would not be identified.

What is the importance of such a result? While the "polynomial-exponential" case is admittedly specific, such specifications have been among the most common in studies which have sought to estimate nonlinear specifications in the presence of measurement error (Hausman et al., 1995; Newey, 2001; Schennach, 2004). Hausman et al. (1995) and Schennach (2004) employ repeated measurements, which, unlike instruments, are sufficient for identification, regardless of the specification.⁴ Interestingly, Newey (2001) performs various Monte Carlo simulations using a specification of the form (20) to illustrate that a series approximation to $f(u)$ might be used to avoid distributional assumptions, but, coincidentally, the series was truncated before reaching the number of free parameters needed to enable the construction of the counterexample establishing nonidentification. The loss of identifiability could therefore not be observed in those simulations.⁵

The underlying reason for the prevalence of exponential specifications in errors-in-variables models is that they are a natural choice whenever a regressor is assumed to have multiplicative, rather than additive, measurement error. For instance, a linear specification $y = \theta X^* + e$ where $X^*$ has a multiplicative error can be rewritten as $y = \theta \exp(x^*) + e$, where $x^* = \ln X^*$ now has an additive error.

In light of the positive identification results obtained for polynomial specifications (Hausman et al., 1991) and for absolutely integrable functions (Wang and Hsiao, 2003), we expect that the lack of identification of the errors-in-variables model for polynomial-exponential combinations is the exception
² The main change in the proof consists of augmenting the system of Eqs. (11)–(13) with equations of the form $\int p(u)\, u^k\,du = \int f(u)\, u^k\,du$ to allow for some $\lambda_j$ to be zero and with equations of the form $\int p(u)\, u^k \exp(\lambda u)\,du = 0$ to allow for polynomial prefactors to the exponential. Correspondingly, the number of parameters $c_j$ needs to be increased.
³ Since $\cos(x^*) = (\exp(ix^*) + \exp(-ix^*))/2$ and $\sin(x^*) = (\exp(ix^*) - \exp(-ix^*))/(2i)$, this extension is straightforward.
⁴ This conclusion follows from the fact that two error-contaminated measurements of the same underlying variable enable the identification of the distribution of that variable (see Schennach, 2004 for instance).
⁵ In a private communication with the author, W. Newey indicated that, after increasing the number of terms in the series, his simulations indeed exhibited a behavior consistent with loss of identification.
rather than the rule, however. Indeed, our results rely on the fact that exponentials are the only functions that are shape-invariant⁶ under convolution by any density $f(u)$:

$$\int \exp(z - u)\, f(u)\,du = \exp(z) \int \exp(-u)\, f(u)\,du = \exp(z) \times \text{constant}. \tag{21}$$

However, in order to ever find a general solution to the estimation of errors-in-variables models with instruments, it is extremely important to be aware of such counterexamples and the intuition they provide.
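The shape-invariance property in Eq. (21) can be illustrated numerically. The Python sketch below is only an illustration under assumed choices (a standard normal density for $f$, a quadratic as the comparison function, and hypothetical helper names): it convolves an exponential and a quadratic with $f$ and reports the ratio of the convolved function to the original at several values of $z$. For the exponential the ratio should not depend on $z$, whereas for the polynomial it should.

```python
import numpy as np

# Midpoint grid on [-12, 12]; the standard normal density is an assumed choice of f.
h = 0.001
u = -12.0 + h * (np.arange(24000) + 0.5)
f = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def convolve_at(g, z):
    """Midpoint-rule approximation of the convolution integral of g(z - u) f(u) du."""
    return float(np.sum(g(z - u) * f) * h)

def quadratic(x):
    """Comparison specification: a simple polynomial."""
    return x**2

z_values = np.array([0.5, 1.0, 2.0])

# Ratio (g * f)(z) / g(z): constant in z only if g keeps its shape under convolution.
exp_ratios = np.array([convolve_at(np.exp, z) / np.exp(z) for z in z_values])
poly_ratios = np.array([convolve_at(quadratic, z) / quadratic(z) for z in z_values])

print("exponential ratios:", exp_ratios)
print("quadratic ratios:  ", poly_ratios)
```

With a standard normal $f$, the exponential ratio equals $\int \exp(-u) f(u)\,du = \exp(1/2)$ at every $z$, while the quadratic is mapped to $z^2 + 1$, a different polynomial, so its ratio varies with $z$.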
3. Conclusion
While the estimating equations employed by Hausman et al. (1991), Newey (2001) and Wang and Hsiao (2003) do enable the identification of the nonlinear errors-in-variables model using instrumental variables in various specific cases, we show that, in general, these equations do not enable identification, by providing a class of counterexamples based on exponential functions as well as trigonometric functions (since the latter can be written in terms of exponentials with imaginary arguments). This result, in conjunction with the polynomial result of Hausman et al. (1991), indicates that the set of all specifications that enable (or prevent) identification has a very complex topology. It is conceivable that a function could be very well approximated by a polynomial of finite order as well as by a finite linear combination of trigonometric functions over an interval including most of the data points. Yet one representation would suggest identifiability, while the other would not. Of course, the solution to the dilemma lies in the fact that neither the polynomial nor the trigonometric cases readily extend to the case of infinite linear combinations. A complete treatment of nonlinear errors-in-variables models using instruments will thus likely prove to be a challenging endeavour.

Acknowledgements
This work is made possible through financial support from the National Science Foundation via grant SES-0214068.
References
Amemiya, Y., 1985. Instrumental variable estimator for the nonlinear errors-in-variables model. Journal of Econometrics 28, 273–289.
Hausman, J., Ichimura, H., Newey, W., Powell, J., 1991. Measurement errors in polynomial regression models. Journal of Econometrics 50, 271–295.
⁶ Sines and cosines are invariant under convolution with a symmetric density. Polynomials, however, are mapped to a different polynomial under convolution by a density.
Hausman, J., Newey, W., Powell, J., 1995. Nonlinear errors in variables. Estimation of some Engel curves. Journal of Econometrics 65, 205–233.
Newey, W., 2001. Flexible simulated moment estimation of nonlinear errors-in-variables models. Review of Economics and Statistics 83, 616–627.
Schennach, S.M., 2004. Estimation of nonlinear models with measurement error. Econometrica 72, 33–75.
Wang, L., Hsiao, C., 2003. Identification and estimation of semiparametric nonlinear errors-in-variables models. Working Paper, University of Southern California.