Minimum distance estimation of the errors-in-variables model using linear cumulant equations

Timothy Erickson (Bureau of Labor Statistics, United States)
Colin Huan Jiang (University of Chicago, United States)
Toni M. Whited (University of Rochester and NBER, United States)*

* Corresponding author: Simon School of Business, University of Rochester, Rochester, NY 14627, United States. Tel.: +1 585 275 3916. E-mail: [email protected] (T.M. Whited).

JEL classification: C15; C26; E22; G31
Keywords: Errors-in-variables; Higher cumulants; Investment; Leverage

Abstract

We consider a multiple mismeasured regressor errors-in-variables model. We develop closed-form minimum distance estimators from any number of estimating equations, which are linear in the third and higher cumulants of the observable variables. Using the cumulant estimators alters qualitative inference relative to ordinary least squares in two applications related to investment and leverage regressions. The estimators perform well in Monte Carlos calibrated to resemble the data from our applications. Although the cumulant estimators are asymptotically equivalent to the moment estimators from Erickson and Whited (2002), the finite-sample performance of the cumulant estimators exceeds that of the moment estimators.
1. Introduction

In 2010, the Journal of Finance, Journal of Financial Economics, and Review of Financial Studies contained 114 articles with at least some content that can be classified as empirical corporate finance. Strikingly, 106 of these studies use proxies for unobservable variables in their regression analyses. However, only 30 acknowledge the possibility of the resulting measurement error, and only about half of these 30 attempt to remedy the likely coefficient biases. This inattention to the errors-in-variables problem is serious, given that measurement error can bias regression coefficients. It is, however, understandable. The use of proxies arises naturally in corporate finance because most of the theory is cast in terms of abstract variables, such as private benefits, investment opportunities, or asset tangibility. As discussed in Erickson and Whited (2000), accountants simply do not record these sorts of variables, even approximately. The resulting measurement error is likely to be serious because of wide conceptual gaps between the proxies and the true variables. Confounding this problem is a paucity of feasible remedies. Finding instruments for mismeasured regressors is difficult, especially given the evidence in
Erickson and Whited (2012) that using lagged mismeasured regressors as instruments can lead to misleading inferences under plausible assumptions. Moreover, extra identifying information, such as multiple measurements, is nearly nonexistent in corporate finance.

In this paper, we address this widespread problem by providing a new method for obtaining consistent estimates of slope coefficients in the presence of measurement error. We consider estimation of an errors-in-variables model with multiple mismeasured and multiple perfectly measured regressors. We develop convenient two-step minimum distance estimators having simple closed-form expressions. The underlying estimating equations are linear in the third- and higher-order cumulants of the joint distribution of the observable variables, with every equation having the same coefficient vector as the regression model being estimated.

Our work builds upon the framework in Erickson and Whited (2002), which uses estimating equations that express high-order residual moments as complicated nonlinear functions of slopes and nuisance parameters. Instead, our estimating equations are the linear high-order cumulant equations from Geary (1942). Our estimators extend Geary (1942) in two ways. First, we use modern minimum distance estimation results to exploit the overidentifying information obtained by using any number of cumulant equations, instead of following the approach in Geary (1942), which involves solving only exactly identified systems of cumulant equations to obtain slope estimates. This overidentifying
information is important because, as shown in Pakes (1982), exactly identified cumulant estimators can have large variances. Second, we provide the correction to the asymptotic variance of the estimators, which is required because the estimators are based on high-order cumulants of the fitted residuals obtained by partialling out perfectly measured regressors.

Because these estimators do not require any information beyond that contained in the observed regressors, they are practical to implement. However, because all third- and higher-order cumulants equal zero for normal distributions, our estimators require nonnormality of the mismeasured regressors. Although normality assumptions are common in econometrics, explicit departures from normality outside the errors-in-variables literature can be found in, for example, Bhargava (1987), Richardson and Smith (1993), or Martin (2013). In addition, in practical applications, many regressors are nonnormally distributed, so that, as also demonstrated in Erickson and Whited (2000), estimators that exploit nonnormality can be useful.

Cumulants are polynomial functions of moments, and the cumulant estimators we present use exactly the same moments as in Erickson and Whited (2002). We show that our cumulant estimators and the corresponding moment estimators in Erickson and Whited (2002) have the same asymptotic variance. However, the cumulant estimators have a closed-form solution, which stems from the linearity of the estimating equations. This feature eliminates starting-value selection from the data analyst's task, which is important, given the sensitivity of the moment estimators to starting values documented in Erickson and Whited (2012). Finally, we find in Monte Carlo simulations that the cumulant estimators can outperform the moment estimators.

The literature on identifying and estimating errors-in-variables models using nonnormality of regressors starts with the conjecture in Neyman (1937) that such an approach might be possible. Reiersøl (1941) gives the earliest estimator, which is a third-order moment estimator of the one-regressor classical errors-in-variables model. In the first comprehensive paper, Geary (1942) shows how multivariate versions of the classical errors-in-variables model can be estimated using cumulants of any order greater than two. A long series of econometric contributions to the errors-in-variables literature has developed estimators that exploit nonnormality or high-order moments. A partial list includes Madansky (1959), Spiegelman (1979), Pal (1980), Pakes (1982), Van Montfort et al. (1987, 1989), Cragg (1997), Lewbel (1997), Dagenais and Dagenais (1997), Erickson and Whited (2000, 2002), Lewbel (2012), and Schennach and Hu (2013).1 Interestingly, apart from Geary (1942), the only other work that explicitly uses cumulants instead of moments is Pakes (1982), who also considers only exactly identified estimators.

The rest of the paper proceeds as follows. Section 2 derives the estimators. Section 3 describes our data. Section 4 contains several Monte Carlo simulations. Our estimators can be applied to any errors-in-variables application that satisfies the estimators' assumptions. Nonetheless, given the prevalence of untreated measurement error in corporate finance, Section 5 presents the results from two corporate-finance applications. Section 6 concludes. The Appendix contains the proofs.

2. Theory

This section presents the model assumptions and the linear estimating equations. It then develops the estimators and provides examples.
1 For a detailed literature review, see Erickson and Whited (2002).
2.1. The model

Let (yi, xi, zi), i = 1, ..., n, be a sequence of observable vectors, where xi ≡ (xi1, ..., xiJ) and zi ≡ (1, zi1, ..., ziM). Let (ui, εi, χi) be a sequence of unobservable vectors, where χi ≡ (χi1, ..., χiJ) and εi ≡ (εi1, ..., εiJ). We consider a multiple-regressor version of the classical errors-in-variables model, where (yi, xi, zi) is related to (ui, εi, χi) and unknown parameters α ≡ (α0, α1, ..., αM)′ and β ≡ (β1, ..., βJ)′ according to

yi = zi α + χi β + ui    (1)
xi = χi + εi.    (2)

Eq. (1) is a linear regression model containing J regressors χi that are imperfectly measured by xi according to (2), and M perfectly measured regressors, zi. The assumption of unit slopes and zero-valued intercepts in (2) is required to identify the parameters in (1). The assumption of no intercept in (2) is implausible but simplifies our analysis, and its violation only biases estimates of the intercept element in α. We assume the variables in (1) and (2) satisfy the following assumptions.

Assumption 1. (i) (ui, εi, χi, zi), i = 1, ..., n, is an i.i.d. sequence; (ii) ui and the elements of εi, χi, and zi have finite moments of every order; (iii) (ui, εi) is independent of (χi, zi), and the individual elements in (ui, εi) are independent of each other; (iv) E(ui) = 0 and E(εi) = 0; (v) E[(χi, zi)′(χi, zi)] is positive definite.

Before deriving the estimators, we partial out the perfectly measured variables and rewrite the model in terms of population residuals. The 1 × J residual from the population linear regression of xi on zi is xi − zi µx, where

µx ≡ [E(zi′ zi)]^{−1} E(zi′ xi).    (3)

The corresponding 1 × J residual from the population linear regression of χi on zi is

ηi ≡ χi − zi µx,    (4)

where µx appears because (2) and the independence of εi and zi imply µx = [E(zi′ zi)]^{−1} E[zi′(χi + εi)] = [E(zi′ zi)]^{−1} E(zi′ χi). Note that subtracting zi µx from both sides of (2) gives

xi − zi µx = ηi + εi.    (5)

Similarly, the residual from the population linear regression of yi on zi is yi − zi µy, where µy ≡ [E(zi′ zi)]^{−1} E(zi′ yi). Eq. (1) and the independence of ui and zi imply

µy = [E(zi′ zi)]^{−1} E[zi′(zi α + χi β + ui)] = α + µx β.    (6)

Therefore, subtracting zi µy from both sides of (1) gives

yi − zi µy = ηi β + ui.    (7)

We can now consider a two-step plug-in approach to estimation, where the first step is to substitute the least squares estimates

µ̂x ≡ (∑_{i=1}^n zi′ zi)^{−1} ∑_{i=1}^n zi′ xi  and  µ̂y ≡ (∑_{i=1}^n zi′ zi)^{−1} ∑_{i=1}^n zi′ yi

into (5) and (7), and the second step is to estimate β using sample cumulants of yi − zi µ̂y and xi − zi µ̂x. Estimates of α are then recovered via (6). Erickson and Whited (2002) use the same general approach, except that they minimize a weighted distance between the sample moments of these variables and nonlinear functions of β and other parameters. Instead, here we minimize the distance between sample cumulants and linear combinations of other sample cumulants, where every linear combination has β as the coefficient vector and no other parameters are involved.
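To make the first step concrete, the following sketch (our illustration, not code from the paper; the function and variable names are ours) computes the least squares partialling coefficients µ̂y, µ̂x and the residuals yi − zi µ̂y and xi − zi µ̂x.

```python
import numpy as np

def partial_out(y, x, z):
    """First step of the plug-in approach: regress y and each column of x
    on z (which includes a constant) and return the fitted residuals.

    y : (n,) array, dependent variable
    x : (n, J) array, mismeasured proxies
    z : (n, M+1) array, perfectly measured regressors including a column of ones
    """
    # mu_hat = (sum z_i'z_i)^{-1} sum z_i' [y, x], computed via least squares
    mu_hat, *_ = np.linalg.lstsq(z, np.column_stack([y, x]), rcond=None)
    resid = np.column_stack([y, x]) - z @ mu_hat
    y_dot, x_dot = resid[:, 0], resid[:, 1:]   # y_i - z_i mu_y_hat, x_i - z_i mu_x_hat
    return y_dot, x_dot, mu_hat
```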
2.2. Geary's results

To begin the derivation of our estimators, we review the results in Geary (1942) on the relations between β and the product cumulants of yi − zi µy and xi − zi µx in the model given by (5) and (7). These cumulants are sums of products and powers of the moments of yi − zi µy and xi − zi µx, where each term of the sum has a constant coefficient. For example, if xi is scalar, then the cumulant of order 1 in xi − zi µx and order 3 in yi − zi µy is given by

E[(yi − zi µy)^3 (xi − zi µx)] − 3 E[(yi − zi µy)(xi − zi µx)] E[(yi − zi µy)^2].

An explicit expression for any cumulant of a distribution as a function of the moments of the distribution is given in Chapter 2 of McCullagh (1987), as is the companion expression giving any desired moment as a function of the cumulants. The set of all cumulants of any specified order P depends on all moments up to order P, and this map is invertible: any Pth order moment is a function of cumulants of order P and less. Although cumulants depend on moments through often quite lengthy expressions, the relationships between the cumulants themselves are often simple. This simplicity can be seen in the following result. Define ζi ≡ ∑_{j=1}^J ηij βj and let κ(s0, s1, ..., sJ) be the cumulant of order s0 in ζi and sj in ηij. Geary (1942) proves that

κ(s0 + 1, s1, ..., sJ) = β1 κ(s0, s1 + 1, ..., sJ) + ··· + βJ κ(s0, s1, ..., sJ + 1)    (8)

holds for all vectors (s0, s1, ..., sJ) of nonnegative integers. Also, letting K(s0, s1, ..., sJ) be the cumulant of order s0 in yi − zi µy and sj in xij − zi µxj, Geary (1942) shows that

K(s0, s1, ..., sJ) = κ(s0, s1, ..., sJ)    (9)

holds if at least two elements of (s0, s1, ..., sJ) are positive. (See his Eqs. (5) and (13).) Therefore, for any (s0, s1, ..., sJ) containing two or more positive elements, we have the following relationship between cumulants, which can be easily estimated from regression residuals:

K(s0 + 1, s1, ..., sJ) = β1 K(s0, s1 + 1, ..., sJ) + ··· + βJ K(s0, s1, ..., sJ + 1).    (10)
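As a quick numerical illustration (ours, not from the paper) of why (9)–(10) are useful, the third-order relation K(2, 1) = β1 K(1, 2) can be checked by simulation in the one-regressor case: for zero-mean residuals these cumulants are simply E[(yi − zi µy)^2(xi − zi µx)] and E[(yi − zi µy)(xi − zi µx)^2], and the relation survives the addition of independent noise to both variables.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta1 = 1_000_000, 0.5

eta = rng.gamma(shape=2.0, scale=1.0, size=n) - 2.0   # skewed, zero-mean true regressor residual
u   = rng.normal(size=n)                              # regression error
eps = rng.normal(size=n)                              # measurement error

y_dot = beta1 * eta + u        # plays the role of y_i - z_i mu_y in (7)
x_dot = eta + eps              # plays the role of x_i - z_i mu_x in (5)

K21 = np.mean(y_dot**2 * x_dot)   # third-order cumulant K(2,1) of zero-mean variables
K12 = np.mean(y_dot * x_dot**2)   # K(1,2)
print(K21, beta1 * K12)           # approximately equal, despite noise in both variables
```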
There are infinitely many equations of the form (10), one for each admissible vector (s0, s1, ..., sJ). Let

Ky = Kx β    (11)

denote a system of M equations of the form (10). If M = J and det Kx ≠ 0, then it is possible to solve for β. Geary (1942) provides examples of specific systems of equations given by (11) for cumulants up to order four. Our estimators extend Geary's result to systems where M > J, by efficiently combining the information in the high-order cumulants by minimum distance estimation.

2.3. Estimates of β from overidentifying cumulant equations

We consider estimators for β of the following type, in which M ≥ J, K̂y and K̂x are consistent estimates of Ky and Kx, and Ŵ is a symmetric positive definite matrix:

β̂ ≡ argmin_{b∈ℜ^J} (K̂y − K̂x b)′ Ŵ (K̂y − K̂x b).    (12)

Because K̂y − K̂x b is linear in b, (12) has the solution

β̂ = (K̂x′ Ŵ K̂x)^{−1} K̂x′ Ŵ K̂y,    (13)

whenever K̂x has full column rank.
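Given estimates K̂y and K̂x and a weight matrix, (13) is a one-line computation. The sketch below is our illustration (the names are ours, not the paper's code): with Ŵ equal to the identity it delivers an inefficient first-round estimate, and re-solving with the optimal weight described later in this section gives the efficient estimator.

```python
import numpy as np

def md_beta(Ky_hat, Kx_hat, W=None):
    """Closed-form minimum distance estimate (13):
    beta_hat = (Kx' W Kx)^{-1} Kx' W Ky, for Ky_hat (M,), Kx_hat (M, J)."""
    M = Kx_hat.shape[0]
    if W is None:
        W = np.eye(M)                      # first round: identity weight
    KxW = Kx_hat.T @ W
    return np.linalg.solve(KxW @ Kx_hat, KxW @ Ky_hat)
```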
The following assumption ensures β is identified and that β̂ exists with probability one.

Assumption 2. (i) β ∈ interior(Θ), where Θ is compact; (ii) every element of β is nonzero; (iii) E[(ηi c)^3] ≠ 0 for every vector of constants c = (c1, ..., cJ)′ having at least one nonzero element; (iv) all third-order cumulant versions of (10) are included in (11).
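In practice, part (iii) can be loosely screened before estimation by checking whether the partialled variables display nonnegligible skewness. The sketch below is our own rough diagnostic, not a procedure from the paper; it only gauges whether the third moments of the residuals are far from zero under a naive i.i.d. approximation.

```python
import numpy as np

def skewness_screen(y_dot, x_dot):
    """Report standardized third moments of the partialled variables, together
    with naive i.i.d. t-ratios for the raw third moments; x_dot is (n, J)."""
    n = len(y_dot)
    out = {}
    series = [("y_dot", y_dot)] + [(f"x_dot{j}", x_dot[:, j]) for j in range(x_dot.shape[1])]
    for name, v in series:
        m3 = np.mean(v**3)
        se = np.std(v**3, ddof=1) / np.sqrt(n)        # naive s.e. of the sample third moment
        out[name] = (m3 / np.std(v, ddof=1)**3, m3 / se)
    return out   # (standardized skewness, t-ratio of the third moment)
```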
Reiersøl (1950) shows that, given nonzero β, nonnormality of the unobserved regressor suffices to identify the single-regressor version of our model. Since third-order moments vanish for normal distributions, Assumption 2 exploits the result in Reiersøl (1950) by requiring that every element of ηi be skewed. Our assumption on ηi is similar to that given by Kapteyn and Wansbeek (1983) and Bekker (1986), who show that β is identified if there is no linear combination of the unobserved true regressors that is normally distributed. In practice the constraint of a finite sample size makes it desirable that the regressors' nonnormality include skewness, as identification and estimation relying exclusively on subtle features like kurtosis would require extremely large data sets. We do not recommend our estimators to anyone who cannot confidently assume that the third moments of each element of ηi are nonzero. Our assumption that β contains no zeros ensures identification if one allows for the possibility that elements of ηi are independent of each other. Some zeros in β are permissible if such independence is assumed away, but a precise statement of the various identifying cases would be complex. The requirement of interiority in part (i) is not necessary for the consistency of β̂, but is useful for obtaining its asymptotic distribution. As shown in Erickson and Whited (2002), condition (iv) combines with (ii)–(iii) to ensure Kx has full rank.

To derive the distribution for β̂ we start with the distribution for the partialling coefficients µ̂ ≡ vec(µ̂y, µ̂x) because, unlike OLS and many IV estimators, the use of high-order moment information implies that the variance of β̂ is larger than it would be if the true partialling coefficients µ ≡ vec(µy, µx) were known. To account for this larger variance, we use the influence function for µ̂. Denoted ψµi, it is defined as follows.

Definition 1. Let Ri(s) ≡ vec[zi′(yi − zi sy), zi′(xi − zi sx)] and Q ≡ I_{J+1} ⊗ E(zi′ zi); then ψµi ≡ Q^{−1} Ri(µ).

As shown in Erickson and Whited (2002), if Assumption 1 holds, then E(ψµi) = 0, avar(µ̂) = E(ψµi ψµi′) < ∞, and √n(µ̂ − µ) = n^{−1/2} ∑_{i=1}^n ψµi + op(1).

We obtain estimates (K̂y, K̂x) by plugging sample moments of yi − zi µ̂y and xi − zi µ̂x into the expressions giving (Ky, Kx) as functions of the population moments of yi − zi µy and xi − zi µx. The distribution of the high-order ''sample'' (as distinguished from population) moments of the µ-known residuals differs from that for yi − zi µ̂y and xi − zi µ̂x; we use the influence function to make the necessary adjustment. Let gi(µ) be a vector of elements of the form (yi − zi µy)^{r0} ∏_{j=1}^J (xij − zi µxj)^{rj} such that E[gi(µ)] contains all the moments necessary to completely determine every element of (Ky, Kx). Let ḡ(µ) ≡ n^{−1} ∑_{i=1}^n gi(µ).

Lemma 1. Let G(m) ≡ E[∂gi(m)/∂m′]. If Assumption 1 holds, then (i) ḡ(µ̂) →p E[gi(µ)]; and (ii) √n(ḡ(µ̂) − E[gi(µ)]) →d N(0, Ω), where

Ω ≡ var(gi(µ) − E[gi(µ)] + G(µ)ψµi).

''Partialling'' is not innocuous in the context of high-order moment-based estimation because the elements of G(µ) corresponding to moments of order three or greater are generally nonzero. For example, if gi(µ) contains (xij − zi µxj)^3, then G(µ) contains E[3(xij − zi µxj)^2 (−zi)].
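As an aside, the influence functions of Definition 1 are straightforward to compute. The following sketch (ours, with illustrative names; it is not the authors' code) evaluates ψ̂µi = Q̄^{−1} Ri(µ̂) for every observation, which is the ingredient needed for the adjusted covariance estimate in (15) below.

```python
import numpy as np

def partialling_influence(y, x, z, mu_hat):
    """psi_mu_i = Q_bar^{-1} R_i(mu_hat), with
    R_i(mu) = vec[ z_i'(y_i - z_i mu_y), z_i'(x_i - z_i mu_x) ] and
    Q_bar   = I_{J+1} (kron) n^{-1} sum_i z_i' z_i.
    Returns an (n, (J+1)*(M+1)) array, one influence function per observation."""
    n = z.shape[0]
    resid = np.column_stack([y, x]) - z @ mu_hat         # columns: y-residual, then x-residuals
    J1 = resid.shape[1]                                  # J + 1
    # stack R_i over i: each length-(M+1) block is z_i scaled by one residual
    R = np.einsum('ik,ij->ikj', resid, z).reshape(n, -1)
    Q_bar = np.kron(np.eye(J1), z.T @ z / n)             # block-diagonal, one block per residual
    return np.linalg.solve(Q_bar, R.T).T
```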
Turning to the distribution of (K̂y, K̂x), recall that each mth order cumulant in (Ky, Kx) is a continuously differentiable function of moments of the form E[(yi − zi µy)^{r0} ∏_{j=1}^J (xij − zi µxj)^{rj}] up to order m. Let ξ ≡ E[gi(µ)] be the vector of moments that completely determine every element of (Ky, Kx), which we then write as (Ky(ξ), Kx(ξ)). Let ξ̂ ≡ ḡ(µ̂) and (K̂y, K̂x) ≡ (Ky(ξ̂), Kx(ξ̂)). The next proposition establishes the consistency and asymptotic normality of the estimators (K̂y, K̂x).

Proposition 1. Let D(s) = ∂vec[Ky(s), Kx′(s)]/∂s′, and let D ≡ D(ξ). If Assumption 1 holds then: (i) (K̂y, K̂x) converges in probability to (Ky, Kx); (ii) the limiting distribution for

√n [vec(K̂y, K̂x′) − vec(Ky, Kx′)]    (14)

is normal with mean zero and covariance matrix DΩD′; (iii) the limiting distribution for √n(K̂y − K̂x β) is normal with mean zero and covariance matrix

Σ = [IM, −IM ⊗ β′] DΩD′ [IM, −IM ⊗ β′]′.

Proposition 1(ii) should be useful for devising tests of the validity of Assumption 2(ii)–(iii) in specific applications, while part (iii) of Proposition 1 is used in the proof of Proposition 2.

Proposition 2. For the model given by (1) and (2), if Assumptions 1 and 2 hold, Ŵ →p W, and W is positive definite, then (i) β̂ converges in probability to β; (ii) √n(β̂ − β) converges in distribution to a normal distribution with zero mean and covariance matrix

avar(β̂) ≡ (Kx′ W Kx)^{−1} Kx′ W Σ W Kx (Kx′ W Kx)^{−1}.

The optimal estimator of the type given by (12) is obtained by choosing Ŵ so that its limit, W, equals Σ^{−1}, in which case avar(β̂) = (Kx′ Σ^{−1} Kx)^{−1}. The next result is useful for obtaining an optimal Ŵ.

Proposition 3. Let Ḡ(s) ≡ n^{−1} ∑_{i=1}^n ∂gi(s)/∂s′, Q̄ ≡ I_{J+1} ⊗ n^{−1} ∑_{i=1}^n zi′ zi, ψ̂µi ≡ Q̄^{−1} Ri(µ̂), D̂ = D(ξ̂), and

Ω̂ ≡ n^{−1} ∑_{i=1}^n [gi(µ̂) − ḡ(µ̂) + Ḡ(µ̂)ψ̂µi] [gi(µ̂) − ḡ(µ̂) + Ḡ(µ̂)ψ̂µi]′.    (15)

If Assumption 1 holds, and β̃ is a consistent estimator of β, then

Σ̂ = [IM, −IM ⊗ β̃′] D̂ Ω̂ D̂′ [IM, −IM ⊗ β̃′]′    (16)

converges in probability to Σ.

In our empirical application, we do a first-round estimation where we evaluate (16) with the inconsistent OLS estimator obtained by regressing yi − zi µ̂y on xi − zi µ̂x. The inverse of the resulting matrix yields a consistent but inefficient estimator that we use as β̃ in a second round. Additional rounds provide no asymptotic improvement, but our Monte Carlo simulations indicate that iterating additional rounds until convergence improves finite-sample performance. In contrast, using a continuous updating estimator, as in Hansen et al. (1996), results in worse finite-sample performance because the resulting estimator does not have a closed-form solution.

2.4. Examples

We now describe several examples of (11) that can be used to construct estimators. To simplify notation we define ẏi ≡ yi − zi µy and ẋi ≡ xi − zi µx.

First, we consider the case of one mismeasured regressor, so that β in (1) contains only one element. A natural starting place for this case is the original example in Geary (1942), which is the third-order cumulant estimator given by

K(2, 1) = β1 K(1, 2).    (17)

Because third-order cumulants equal third-order moments when the random variables in question have a zero mean (as is the case with ẋ and ẏ), we can substitute

K(2, 1) = E(ẏi² ẋ1i)    (18)
K(1, 2) = E(ẏi ẋ1i²)    (19)

into (17) to obtain

β1 = E(ẏi² ẋ1i) / E(ẏi ẋ1i²).    (20)

A sufficient condition for the rank condition in Assumption 2 to be satisfied is that E(η1i³) ≠ 0 and β1 ≠ 0 (e.g. Erickson and Whited, 2002, 2012).

Our second example is a fourth-order cumulant estimator based on the third-order equation in (17) and the two possible fourth-order cumulant equations from (10):

K(2, 1) = β1 K(1, 2)    (21)
K(3, 1) = β1 K(2, 2)    (22)
K(2, 2) = β1 K(1, 3).    (23)

Fourth-order cumulants do not equal fourth-order moments, but are instead functions of fourth-order moments. Therefore, to construct an estimator, we substitute into (21)–(23) the following identities that relate moments to cumulants:

K(3, 1) ≡ E(ẏi³ ẋ1i) − 3E(ẏi ẋ1i) E(ẏi²)    (24)
K(2, 2) ≡ E(ẏi² ẋ1i²) − E(ẏi²) E(ẋ1i²) − 2[E(ẏi ẋ1i)]²    (25)
K(1, 3) ≡ E(ẏi ẋ1i³) − 3E(ẏi ẋ1i) E(ẋ1i²).    (26)

The third example considers the case in which χi contains two elements, so that the regression contains two mismeasured regressors. From (10), the third-order cumulant equations are

K(1, 1, 1) = β1 K(0, 2, 1) + β2 K(0, 1, 2)    (27)
K(2, 0, 1) = β1 K(1, 1, 1) + β2 K(1, 0, 2)    (28)
K(2, 1, 0) = β1 K(1, 2, 0) + β2 K(1, 1, 1).    (29)

An examination of these three examples reveals that the estimators in Dagenais and Dagenais (1997) for normal-error models are similar to the third- and fourth-order examples of the estimators we present here. Specifically, (21)–(23) and (28)–(29) can be derived using the instrumental-variable construction method in Dagenais and Dagenais (1997). However, (27) cannot be similarly derived because those authors allow for arbitrary dependence between the normal errors, whereas our independence assumption allows us to exploit product moments between mismeasured regressors.
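To fix ideas, the following sketch (our illustration, not the authors' code) assembles the fourth-order system (21)–(23) for one mismeasured regressor from the partialled residuals, using (18)–(19) and (24)–(26), and then feeds it to the closed-form solver sketched earlier; partial_out and md_beta are the hypothetical helper functions from those earlier sketches.

```python
import numpy as np

def cumulant_system_C4(y_dot, x_dot):
    """Stack the estimating equations (21)-(23) for one mismeasured regressor:
    rows are [K(2,1); K(3,1); K(2,2)] = beta1 * [K(1,2); K(2,2); K(1,3)],
    with the cumulants computed from sample moments via (18)-(19) and (24)-(26)."""
    E = lambda v: np.mean(v)
    m21, m12 = E(y_dot**2 * x_dot), E(y_dot * x_dot**2)        # (18), (19)
    m11, m20, m02 = E(y_dot * x_dot), E(y_dot**2), E(x_dot**2)
    K21, K12 = m21, m12                                        # 3rd-order cumulants = moments
    K31 = E(y_dot**3 * x_dot) - 3 * m11 * m20                  # (24)
    K22 = E(y_dot**2 * x_dot**2) - m20 * m02 - 2 * m11**2      # (25)
    K13 = E(y_dot * x_dot**3) - 3 * m11 * m02                  # (26)
    Ky_hat = np.array([K21, K31, K22])
    Kx_hat = np.array([[K12], [K22], [K13]])
    return Ky_hat, Kx_hat

# usage sketch:
# y_dot, x_dot, _ = partial_out(y, x, z)
# Ky_hat, Kx_hat = cumulant_system_C4(y_dot, x_dot.ravel())
# beta1_hat = md_beta(Ky_hat, Kx_hat)      # first round with identity weight
```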
2.5. Asymptotic equivalence of moment and cumulant estimators

The cumulant estimators based on (11) have a closed-form solution and are thus more convenient than the moment estimators from Erickson and Whited (2002). We now show that these two estimators have identical asymptotic variances. The moment estimators are based on equations

ξ = c(θ),    (30)

expressing moments ξ of (ẏi, ẋi) as functions of θ = (β′, σ′)′, where σ consists of moments of (ui, εi, ηi). The optimal GMM estimator based on these equations has the asymptotic variance

avar(θ̂) = (C′ Ω^{−1} C)^{−1},    (31)

where Ω and C are defined by

√n (ξ̂ − ξ0) →d N(0, Ω)    (32)
C = ∂c(θ)/∂θ′ |_{θ=θ0}.    (33)

Proposition 4. Suppose Assumptions 1 and 2 hold and that ξ determines a matrix (Ky, Kx) that identifies β via (11). If (i) ξ consists of all moments from order 2 to P, for P ≥ 3, and (ii) the system (11) contains all possible instances of (10) obtainable from cumulants of orders 2 to P, then the optimal minimum distance estimator based on (11) has an asymptotic variance equal to the asymptotic variance of the β component of θ̂.

2.6. Estimating other quantities

It is possible to estimate several interesting quantities besides β. First, one can estimate the coefficient vector α, which can be recovered by the identity (6). The standard errors for α can then be computed by stacking the influence functions for the different components of (6), taking their outer product, and using the delta method. (See Erickson and Whited (2002) for details.) Second, one can estimate the coefficients of determination for (1) and (2), which we denote as ρ² and τj², j = 1, ..., J, and which can be written as

ρ² = [µy′ var(zi) µy + β′ var(ηi) β] / [µy′ var(zi) µy + β′ var(ηi) β + E(ui²)]    (34)

τj² = [µxj′ var(zi) µxj + E(ηij²)] / [µxj′ var(zi) µxj + E(ηij²) + E(εij²)],  j = 1, ..., J.    (35)

The quantities given by (35) are indices of measurement quality that range between 0 and 1, with 0 indicating a worthless proxy and 1 indicating a perfect proxy. The estimation of ρ² and τj² proceeds differently from the way it does in Erickson and Whited (2002) because the high-order moment estimators in Erickson and Whited (2002) automatically deliver consistent estimates of var(ηi), E(εij²), and E(ui²). Such is not the case with the high-order cumulant estimators, which only directly deliver consistent estimates of β. However, one can use the moment equations in Erickson and Whited (2002) to solve for var(ηi), E(εij²), and E(ui²) in terms of β and the observable moments var(ẋi), E(ẏi²), and cov(ẏi, ẋi). As in the case of α, the standard errors for ρ² and τj² can be computed by stacking the influence functions for the different components of (34) and (35), taking their outer product, and using the delta method.

2.7. Cluster sampling and panel data

We now extend the estimators to data that are not i.i.d. We consider the case in which the sample consists of K groups (clusters) of nk observations each (n = n1 + ··· + nK), such that observations are independent across groups but dependent within groups, K → ∞, and nk fixed for each k. For example, in corporate finance panel data, a group might consist of all of the observations for a single firm. We order observations by groups and use double-index notation so that u ≡ {u_{1,1}, ..., u_{n1,1} | ... | u_{1,K}, ..., u_{nK,K}}, and so on for all variables in (1) and (2). Under these assumptions, the calculation of the cumulants, K̂x and K̂y, proceeds exactly as it would under i.i.d. sampling, but the calculation of the covariance matrix, Ω̂, is different.

To simplify notation, we define p_{j,k} ≡ g_{j,k}(µ) − E[g_{j,k}(µ)] + G(µ)ψ_{µ j,k} and p̂_{j,k} ≡ g_{j,k}(µ̂) − ḡ(µ̂) + Ḡ(µ̂)ψ̂_{µ j,k}. Under cluster sampling, if we let p̄_k = ∑_{j=1}^{nk} p_{j,k}, then we can define Ω as

Ω = lim_{n→∞} n^{−1} ∑_{k=1}^K E(p̄_k p̄_k′).    (36)

See, for example, Arellano (2003). Note that E(p_i p_j′) = 0 only if i and j belong to different clusters. Define p̃_k ≡ ∑_{j=1}^{nk} p̂_{j,k}. A consistent estimate of Ω is therefore

Ω̂ = n^{−1} ∑_{k=1}^K p̃_k p̃_k′.    (37)
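For completeness, here is a minimal sketch (ours, with hypothetical names) of the cluster-robust covariance (37): given the per-observation terms p̂_{j,k}, that is, the summands of (15), one simply sums them within clusters before forming outer products.

```python
import numpy as np

def clustered_omega(p_hat, cluster_ids):
    """Omega_hat = n^{-1} sum_k p_tilde_k p_tilde_k', with p_tilde_k the
    within-cluster sum of the rows of p_hat, as in eq. (37).

    p_hat       : (n, d) array of g_i(mu_hat) - g_bar(mu_hat) + G_bar(mu_hat) psi_hat_i
    cluster_ids : length-n array of cluster (e.g. firm) identifiers
    """
    n = p_hat.shape[0]
    ids = np.asarray(cluster_ids)
    omega = np.zeros((p_hat.shape[1], p_hat.shape[1]))
    for k in np.unique(ids):
        p_tilde = p_hat[ids == k].sum(axis=0)
        omega += np.outer(p_tilde, p_tilde)
    return omega / n
```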
3. Data

The sample for our investment and leverage applications and for the calibration of our Monte Carlos is from the 2012 Compustat Industrial Files and runs from 1970 to 2011. We exclude financial firms (SIC codes 6000 to 6999), regulated firms (SIC codes 4900 to 4999), and firms that do not have a CRSP share code of 10 or 11. We delete firms with less than $2 million in real total assets, which we deflate by the Producer Price Index, with a base year of 1982. We then require non-missing data for the variables used to construct our regression variables. We deal with outliers in two ways. First, several of our variables, such as leverage and the fraction of tangible assets, must lie in the [0, 1] interval. We therefore truncate these variables at the interval endpoints. We then trim the sample to eliminate the top and bottom 0.5% of our other regression variables. We are left with 121,733 firm-year observations, with between 2145 and 3837 firms per year.

To calculate the variables for our investment regressions, we follow Erickson and Whited (2000). Investment is Compustat item CAPX. Cash flow is the sum of items IB and DP. Both investment and cash flow are deflated by the gross beginning-of-period capital stock, PPEGT. The numerator of Tobin's q is DLTT plus DLC plus PRCC_F times CSHO minus AC. The denominator is PPEGT.

For our leverage regressions, we use standard definitions from, for example, Rajan and Zingales (1995). We define net book leverage as (DLTT + DLC − CHE) divided by book assets (AT). As in Rajan and Zingales (1995), we use net leverage; results from using gross leverage are largely similar. Profitability is operating profit (OIBDP) divided by AT; tangibility is net property, plant, and equipment (PPENT) over total assets; and size is the natural logarithm of real net sales (SALE). Finally, the numerator of the market-to-book ratio is AT plus PRCC_F times CSHO minus CEQ minus TXDB, and the denominator is AT.

Table 1 presents the summary statistics. Panels A and B summarize the variables for our investment and leverage regressions, respectively.
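As a rough illustration of this variable construction (not the authors' code; the column names follow the Compustat items above, while the frame layout, the lagging convention, and the deflation and trimming steps are simplified assumptions), one could proceed as follows in pandas.

```python
import numpy as np
import pandas as pd

def build_regression_variables(df: pd.DataFrame) -> pd.DataFrame:
    """Construct the investment and leverage regression variables from the
    Compustat items named in the text. df is assumed to hold one row per
    firm-year, sorted by firm and year, with a hypothetical firm identifier
    column 'gvkey' and columns CAPX, IB, DP, PPEGT, DLTT, DLC, PRCC_F, CSHO,
    AC, AT, CHE, OIBDP, PPENT, SALE, CEQ, TXDB."""
    out = pd.DataFrame(index=df.index)
    # investment regression: deflate by the gross beginning-of-period capital stock
    k_lag = df.groupby('gvkey')['PPEGT'].shift(1)
    out['investment'] = df['CAPX'] / k_lag
    out['cash_flow'] = (df['IB'] + df['DP']) / k_lag
    # exact timing of the q denominator follows Erickson and Whited (2000); simplified here
    out['tobin_q'] = (df['DLTT'] + df['DLC'] + df['PRCC_F'] * df['CSHO'] - df['AC']) / df['PPEGT']
    # leverage regression; [0, 1] variables truncated at the interval endpoints
    out['net_leverage'] = ((df['DLTT'] + df['DLC'] - df['CHE']) / df['AT']).clip(0, 1)
    out['market_to_book'] = (df['AT'] + df['PRCC_F'] * df['CSHO'] - df['CEQ'] - df['TXDB']) / df['AT']
    out['tangibility'] = (df['PPENT'] / df['AT']).clip(0, 1)
    out['log_sales'] = np.log(df['SALE'])   # sales should be deflated to real terms as in the text
    out['profitability'] = df['OIBDP'] / df['AT']
    # the 0.5% trimming of the remaining variables is omitted from this sketch
    return out
```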
Table 1
Summary statistics.

Panel A: Investment regression variables
                           Mean    Variance   Third std. moment   Fourth std. moment   Fifth std. moment   Serial correlation
Investment/capital stock   0.169   0.035      3.852               26.072               203.035             0.500
Tobin's q                  2.589   35.844     6.076               57.175               633.429             0.734
Cash flow/capital stock    0.181   0.258      −1.071              22.077               −65.815             0.469

Panel B: Leverage regression variables
                           Mean    Variance   Third std. moment   Fourth std. moment   Fifth std. moment   Serial correlation
Book leverage              0.125   0.089      −0.360              3.549                −2.924              0.786
Market-to-book             1.551   1.149      3.629               26.251               249.808             0.625
Tangibility                0.557   0.081      0.148               1.812                0.400               0.936
Log sales                  5.833   2.978      0.264               2.929                2.245               0.913
Operating income/assets    0.138   0.023      −1.359              12.427               −66.264             0.634

The nth standardized moment is the nth moment divided by the standard deviation to the nth power. Serial correlation denotes a first-order autoregressive coefficient, calculated via the technique in Han and Phillips (2010).
Two important features stand out. First, both Tobin's q and the market-to-book ratio are highly skewed, with the former more highly skewed than the latter. This feature is important because the identifying conditions in Assumption 2 require nonnormality of the mismeasured regressors. Second, our proxy for asset tangibility is also positively skewed, though not to the same degree as either Tobin's q or the market-to-book ratio.

4. Monte Carlo simulations

Before presenting our data analysis, we consider two Monte Carlo simulations to assess the finite-sample performance of our estimators. First, we consider one mismeasured regressor, with the data-generating process (DGP) calibrated to resemble our investment regressions. Second, we consider two mismeasured regressors, with the DGP calibrated to resemble our leverage regressions.

4.1. Calibration and specification

Both of our simulated data sets consist of a panel of length 20 and width 3000. These dimensions are roughly the average time-series and cross-sectional dimensions of our real data set. We create our simulated samples as follows. First, we choose values for three key parameters: β, α, and τ². For the investment design with one mismeasured regressor, β = 0.025, α = 0.01, and τ² = 0.45. For the leverage design with two mismeasured regressors, β1 = −0.05, β2 = 1, α1 = 0.05, α2 = −0.05, τ1² = 0.45, and τ2² = 0.25. These settings approximately equal the average estimates from our data analysis, and they imply values for ρ² that are approximately equal to those from our data analysis.

Next, we generate time-zero i.i.d. cross-sections for the variables (χi0, zi0, ui0, εi0), in which each variable has a zero-mean, unit-variance gamma distribution with shape parameters described below. We then generate the entire panel (χit, zit, uit, εit) by updating AR(1) processes, given by

χit = δχ + φχ χi,t−1 + vχit    (38)
zit = δz + φz zi,t−1 + vzit    (39)
uit = φu ui,t−1 + vuit    (40)
εit = φε εi,t−1 + vεit.    (41)

Here, φj, j = (χ, z, u, ε), are the autocorrelation coefficients of the AR(1) processes governing (χit, zit, uit, εit), and (vχit, vzit, vuit, vεit) are the i.i.d. innovations to these processes, again with zero-mean, unit-variance gamma distributions. When χit and zit contain more than one element, vχit and vεit are vectors, and φχ and φz are matrices with the autocorrelation coefficients on the diagonal and zeros on the off-diagonal. We update the processes for 25 periods and keep the last 20, which removes the effects of initial conditions.

We set the parameters δχ, δz so that the means of the simulated vectors (χit, zit) equal the means of (xit, zit) in our data. We set φu = φε = 0. From the autocorrelation estimates in Table 1, we set φχ = 0.78 and φz = 0.48 for the design based on our investment regression. For the leverage regression design, we set φχ = (0.63, 0.94) and φz = (0.99, 0.63). We then set the covariances between the regressors by multiplying the vector (χit, zit) by an eigenvalue decomposition of the covariance matrix of (χit, zit). To calculate the covariance matrix of (χit, zit), we first set it equal to the estimate of the covariance matrix of (xit, zit) from our data, and we then replace the elements corresponding to the variance of χit with our data estimates of the variance of xit times τ². With the variables (χit, zit, uit, εit), we then construct the observable variables (xit, yit) from (1) and (2). We also set the intercept in (1) to match the mean of yit in our data, and we set the variance of uit so that the simulated and actual variances of yit are equal.

Finally, we choose the shape parameters of the gamma distributions for (vχit, vzit, vuit, vεit) and (χi0, zi0, ui0, εi0) so that our simulated data vector (xit, yit, zit) has higher moments approximately equal to those from our real data. This simulated method of moments exercise proceeds as follows. For the investment design, we choose four shape parameters to minimize the equal-weighted distance between five data moments and the corresponding simulated moments. These moments are the skewness of the three observable variables (xit, yit, zit), as well as the two higher-order cross moments E(x²it yit) and E(xit y²it). For the leverage design, we choose seven shape parameters to match the skewness of the five observable variables (yit, x1it, x2it, z1it, z2it), in addition to four higher-order cross moments: E(x²1it yit), E(x1it y²it), E(x²2it yit), and E(x2it y²it). In both cases, matching cross moments is useful for identifying the skewness of the unobservable χit.

For the investment design, the innovations (vχit, vzit, vuit, vεit) and initial variables (χi0, zi0, ui0, εi0) have shape parameters of 0.007, 2.08, 0.32, and 0.09, respectively. For the leverage design, we use shape parameters of 0.004, 0.02, 3.55, 1.86, 8.3, 0.26, and 25.3. Because we generate the variables via AR(1) processes, the skewness of these gamma distributions far exceeds the skewness of our simulated variables. Also, in order to simulate actual investment and leverage regressions, we do a within transformation on the data, which further reduces skewness and kurtosis. We report the actual moments of our untransformed, simulated variables with our simulation results.
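The following sketch shows the basic structure of the simulation for the one-mismeasured-regressor case. It is a deliberately simplified version of this design (our own, with illustrative shape parameters and defaults, and without the covariance, intercept, and variance matching steps described above).

```python
import numpy as np

def simulate_panel(n_firms=3000, t_keep=20, t_burn=5, beta=0.025, alpha=0.01,
                   phi_chi=0.78, phi_z=0.48, shape_chi=0.32, shape_z=2.08, seed=0):
    """Generate (y, x, z) panels from (1)-(2) with AR(1) latent processes (38)-(41)
    driven by standardized gamma innovations; a simplified sketch of Section 4.1."""
    rng = np.random.default_rng(seed)
    # zero-mean, unit-variance gamma draws
    gam = lambda shape, size: (rng.gamma(shape, 1.0, size) - shape) / np.sqrt(shape)
    chi, z = gam(shape_chi, n_firms), gam(shape_z, n_firms)   # time-zero cross-sections
    chis, zs = [], []
    for t in range(t_keep + t_burn):
        chi = phi_chi * chi + gam(shape_chi, n_firms)         # (38), intercept omitted here
        z = phi_z * z + gam(shape_z, n_firms)                 # (39)
        chis.append(chi.copy()); zs.append(z.copy())
    chi = np.array(chis[t_burn:]); z = np.array(zs[t_burn:])  # drop burn-in periods
    u = rng.normal(size=chi.shape)                            # phi_u = 0 in the design
    eps = gam(1.0, chi.shape)                                 # phi_eps = 0; skewed measurement error
    y = alpha * z + beta * chi + u                            # (1) with a single z and no intercept
    x = chi + eps                                             # (2)
    return y, x, z
```

In an actual replication one would also apply the within transformation and re-scale the simulated series to match the data moments, as described in the text.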
Table 2 Monte Carlo performance of high-order cumulant and moment estimators: one mismeasured regressor. Panel A
OLS
Mean bias (βˆ1 ) MAD(βˆ1 ) P(| βˆ1 − β1 |≤ 0.2β1 ) P(tβ1 )
M3
M4
M5
C4
C5
0.000 0.022 1.000 0.051
−0.001
0.003 0.020 1.000 0.158
0.000 0.022 1.000 0.051
−0.002
−0.002
0.021 1.000 0.131
0.022 1.000 0.314
4.174 4.174 0.000
−0.006
0.006 0.191 0.600 0.069
−0.021
−0.006
0.190 0.600 0.051
0.182 0.620 0.079
0.190 0.600 0.051
0.011 0.183 0.619 0.063
0.011 0.191 0.596 0.109
−0.554
−0.006
−0.023
−0.073
−0.006
−0.008
−0.008
0.554 0.000
0.067 0.983 0.069
0.069 0.981 0.184
0.093 0.934 0.382
0.067 0.983 0.069
0.067 0.983 0.073
0.067 0.984 0.077
−0.663 0.663 0.000 –
Mean bias (αˆ1 ) MAD(αˆ1 ) P(| αˆ1 − α1 |≤ 0.2α1 ) P(tα )
–
0.022 1.000 0.118
C3
Mean bias (ρˆ 2 ) MAD(ρˆ 2 ) P(| ρˆ 2 − ρ 2 |≤ 0.2ρ 2 ) P(tρ )
–
Mean bias (τˆ 2 ) MAD(τˆ 2 ) P(| τˆ 2 − τ 2 |≤ 0.2τ 2 ) P(tτ )
– – – –
−0.008
−0.016
−0.064
−0.008
−0.005
−0.005
0.059 0.993 0.061
0.060 0.993 0.142
0.081 0.964 0.334
0.059 0.993 0.061
0.058 0.994 0.061
0.060 0.993 0.070
Nominal 5% Sargan test rejection rate
–
–
0.058
0.222
0.065
0.263
Panel B: Summary statistics of the simulated data
                     Mean    Variance   Third std. moment   Fourth std. moment   Fifth std. moment
Investment (yit)     0.169   0.035      3.671               27.296               291.563
Observable q (xit)   2.589   35.747     5.866               60.362               868.104
Cash flow (zit)      0.181   0.258      −0.935              4.893                −12.255
Mn and Cn denote the high-order moment and cumulant estimators based on moments up to order n, respectively. MAD indicates Mean Absolute Deviation. P(t ) is the actual size of a two-sided t-test with a nominal significance level of 5%. β1 and α1 are the coefficients on the mismeasured and perfectly measured regressors. ρ 2 is the population R2 of the regression equation (1), and τ 2 is the population R2 of the measurement equation (2). Indicated third through fifth moments are scaled by the standard deviation raised to the corresponding power.
4.2. One mismeasured regressor

Table 2 reports the results from a Monte Carlo that is based on our regression of investment on Tobin's q and cash flow. The model accordingly contains one mismeasured and one perfectly measured regressor. For each of the parameters β1, α1, ρ², and τ², we report the mean bias and the mean absolute deviation (MAD) of the estimates from their true value. We express both bias and MAD as a fraction of the true parameter. We also report the probability that the estimate is within 20% of the true value (probability concentration), as well as the actual size of a nominal 5% two-sided t-test that a parameter equals its true value.

Panel A contains results for OLS, the high-order moment estimators from Erickson and Whited (2002), and the high-order cumulant estimators. We report results for estimators that use cumulants or moments of up to order five, where a moment or cumulant estimator of order n is denoted Mn or Cn, respectively. For the overidentified moment estimators, we use both an exactly identified third-order moment estimator and the OLS estimate as alternative starting values, and we then choose the result corresponding to the lower minimum distance objective function. In practice, using a much larger grid of starting values is preferable, but this procedure is too time-consuming for a Monte Carlo.

The first column of Panel A shows the bias induced by measurement error. The R² and β1 are biased downward, and α1 is biased sharply upward because of the positive correlation between the mismeasured and perfectly measured regressors. In contrast, for both the moment and the cumulant estimators, the biases for slope coefficients are near zero, the MADs are low, and the probability concentrations are all 1 for the coefficient on the mismeasured regressor. For the slope coefficients, the performance of the cumulant and moment estimators is nearly identical. For ρ² and τ², the cumulant estimators outperform the moment estimators slightly, especially in terms of bias, but this outperformance is not statistically significant.

In terms of hypothesis tests, for all parameters, the third-order moment and cumulant estimators produce tests whose actual sizes are close to the nominal levels of 5%. However, for the fourth- and fifth-order moment and cumulant estimators, the tests that β1 and α1 equal their true values are somewhat oversized, with the fifth-order estimators performing worse than the fourth-order estimators. For ρ² and τ², the fourth- and fifth-order cumulant estimators produce correctly sized tests, while the fourth- and fifth-order moment estimators produce oversized tests. The bottom row of Panel A presents the actual size of a nominal 5% test of the model overidentifying restrictions from Sargan (1958). For both the moment and cumulant estimators, the test from the fourth-order estimators is approximately correctly sized, while the test from the fifth-order estimators is oversized.

In sum, the moment and cumulant estimators are asymptotically equivalent and have nearly identical finite-sample performance. However, the computational ease afforded by the closed-form solution of the cumulant estimators makes them a better choice for practical applications. Panel B reports the moments of the simulated data, where the third- and higher-order moments are scaled by the standard deviation raised to the corresponding power. Note that all of these moments are quite close to the moments of the actual data in Table 1.

4.3. Two mismeasured regressors

As seen in Table 3, when we move to the case of two mismeasured regressors, the performance of the cumulant estimators exceeds that of the moment estimators, with the coefficient biases for the cumulant estimators significantly below the biases for the moment estimators. The structure of Table 3 is identical to that of Table 2. We present estimators based on moments and cumulants up to order five. As in the case of one mismeasured regressor, we use both the OLS estimator and one of the available exactly identified third-order moment estimators as starting values for the moment estimators, all of which are overidentified when there are two mismeasured regressors.
Table 3 Monte Carlo performance of high-order cumulant and moment estimators: two mismeasured regressors. Panel A
OLS
Mean bias (βˆ1 ) MAD(βˆ1 ) P(| βˆ1 − β1 |≤ 0.2β1 ) P(tβ1 )
M3
M4
M5
0.131 0.316 0.840 0.154
0.037 0.187 0.897 0.169
0.004 0.181 0.655 0.051
0.008 0.073 0.970 0.118
−0.016
0.142 0.000
0.190 0.439 0.595 0.062
−0.895
−0.200
−0.139
−0.056
−0.005
−0.014
−0.019
0.895 0.000
0.450 0.586 0.060
0.309 0.863 0.151
0.163 0.935 0.144
0.185 0.644 0.050
0.052 0.994 0.093
0.048 0.993 0.190
−0.312
−0.087
−0.059
−0.024
−0.002
−0.005
−0.005
0.312 0.000
0.212 0.814 0.330
0.168 0.871 0.193
0.107 0.945 0.122
0.098 0.900 0.264
0.060 0.992 0.074
0.058 0.993 0.072
-2.112 2.112 0.000 –
−0.645
−0.454
−0.171
−0.016
−0.041
−0.036
1.493 0.187 0.566
1.090 0.406 0.258
0.621 0.450 0.191
0.635 0.207 0.529
0.263 0.466 0.149
0.255 0.477 0.143
−0.695
−0.093
−0.104
−0.115
0.180 0.861 0.191
0.141 0.904 0.373
0.003 0.108 0.868 0.003
0.000 0.060 0.990 0.105
−0.010
0.695 0.000
−0.142
–
Mean bias (βˆ2 ) MAD(βˆ2 ) P(| βˆ2 − β2 |≤ 0.2β2 ) P(tβ2 ) Mean bias (αˆ1 ) MAD(αˆ1 ) P(| αˆ1 − α1 |≤ 0.2α1 ) P(tα1 ) Mean bias (αˆ2 ) MAD(αˆ2 ) P(| αˆ2 − α2 |≤ 0.2α2 ) P(tα2 ) Mean bias (ρˆ 2 ) MAD(ρˆ 2 ) P(| ρˆ 2 − ρ 2 |≤ 0.2ρ 2 ) P(tρ )
–
–
C3
C4
C5
0.088 0.927 0.328
–
0.231 0.788 0.060
Mean bias (τˆ1 ) 2 MAD(τˆ1 ) 2 P(| τˆ1 − τ12 |≤ 0.2τ12 ) P(tτ1 )
– – – –
0.599 0.870 0.790 0.063
0.078 0.283 0.836 0.197
−0.087
−0.020
−0.004
0.210 0.683 0.406
0.144 0.840 0.035
0.093 0.913 0.070
0.033 0.113 0.847 0.113
Mean bias (τˆ2 ) 2 MAD(τˆ2 ) 2 P(| τˆ2 − τ22 |≤ 0.2τ22 ) P(tτ2 )
– – – –
−0.349 1.093 0.604 0.066
0.648 1.151 0.852 0.105
0.185 0.464 0.921 0.143
0.070 0.189 0.661 0.033
0.018 0.094 0.966 0.058
0.034 0.081 0.960 0.087
Nominal 5% Sargan test rejection rate
–
0.126
0.151
0.167
0.049
0.086
0.184
2
2
Panel B Summary statistics
Mean
Variance
Net leverage (yit ) Market-to-book (x1it ) Tangibility (x2it ) Log sales (z1it ) Operating profit (z2it )
0.125 1.551 0.557 5.833 0.138
0.081 1.142 0.077 0.823 0.022
Third standardized moment
−0.287 3.795 0.633 0.227 −0.876
Fourth standardized moment 6.807 123.631 4.458 3.376 4.350
0.060 0.991 0.202
Fifth standardized moment
−22.160 3262.758 10.851 2.541 −11.457
Mn and Cn denote the high-order moment and cumulant estimators based on moments up to order n, respectively. MAD indicates Mean Absolute Deviation. P(t ) is the actual size of a two-sided t-test with a nominal significance level of 5%. βi , i = 1, 2 and αi , i = 1, 2 are the coefficients on the mismeasured and perfectly measured regressors. ρ 2 is the population R2 of the regression equation (1), and τi2 , i = 1, 2 are the population R2 s of the measurement equation (2). Indicated third through fifth moments are scaled by the standard deviation raised to the corresponding power.
Panel A of Table 3 shows that for both the cumulant and moment estimators, the most poorly estimated parameter is α2, the coefficient on the simulated variable corresponding to the ratio of profits to assets. Here, the fifth-order cumulant estimator has the highest probability concentrations and the lowest MAD, and the third-order cumulant estimator has the lowest bias. However, at 0.477, the probability concentration for the fifth-order cumulant estimator is not extremely high. Nonetheless, the moment estimators perform worse, especially the third-order moment estimator, which exhibits substantial bias and a probability concentration of 0.187. Both the cumulant and the moment estimators do a better job with the other three slope coefficients, but once again, the cumulant estimators outperform, with the fourth-order estimator having the highest probability concentrations for β1 and β2, and the fifth-order estimator having the highest probability concentration for α1. This result suggests that using cumulants beyond the fourth-order cumulants in Dagenais and Dagenais (1997) can improve accuracy and efficiency in finite samples. As in the case of one mismeasured regressor, the cumulant estimators of ρ² and the two τ²s outperform the moment estimators.

For the cumulant estimators, the performance of the t-tests and the tests of the overidentifying restrictions is similar to the case of one mismeasured regressor, with the lower-order estimators having approximately correctly sized tests, and the higher-order estimators having slightly oversized tests. The performance of the
t-tests for the moment estimators is poor for all parameters and all estimators, with nearly all tests oversized.

To close this section, we investigate the source of the superior finite-sample performance of the cumulant estimators. One likely candidate is the choice of starting values for the moment estimators. To explore this possibility, we rerun the Monte Carlo of the moment estimators, except that we use the true coefficient value as a starting value. Although we find a marked increase in performance, which is, of course, not available in actual practice, the moment estimators continue to exhibit higher MADs and lower probability concentrations than the cumulant estimators. We conclude that in the case of the moment estimators, difficulties in obtaining global minima contribute to much of their comparatively poor performance.

5. Data analysis

In this section we consider two corporate-finance applications of regressions with mismeasured regressors. The first application is the regression of investment on Tobin's q and cash flow from Fazzari et al. (1988), which is an example of a one-mismeasured-regressor model, in which Tobin's q proxies for true unobservable investment opportunities in capital. The cash flow coefficient in this regression is widely used as a vehicle to detect
the presence of financial constraints, with a higher coefficient thought to indicate more severe constraints. Measurement error in Tobin's q can bias the cash flow coefficient because Tobin's q and cash flow are highly positively correlated. The second application is a widely used leverage regression from Rajan and Zingales (1995), in which leverage is regressed on the market-to-book ratio, the ratio of capital to total assets, the log of sales, and the ratio of operating profit to total assets. This regression contains two mismeasured regressors: the market-to-book ratio proxies for true unobservable investment opportunities in all assets, and the ratio of capital to total assets proxies for the concept of asset tangibility. We treat the other two regressors as perfectly measured. Because this regression contains multiple mismeasured regressors, the individual coefficients on the mismeasured regressors need not be biased downward (e.g. Klepper and Leamer, 1984). As in the case of the cash flow regression, the correlations among the regressors can lead to biased coefficients on the perfectly measured regressors.

5.1. Investment regressions

Table 4 presents the results from our investment regression, where all variables have undergone a within transformation. We report results from using OLS, the third- through fifth-order cumulant estimators (Panel A), and the third- through fifth-order moment estimators (Panel B).

Table 4
Regressions of investment on Tobin's q and cash flow.

Panel A: cumulant estimators
                            OLS               Third             Fourth            Fifth
Tobin's q                   0.009* (0.000)    0.037* (0.003)    0.038* (0.002)    0.031* (0.001)
Cash flow/capital stock     0.091* (0.002)    −0.014 (0.011)    −0.018 (0.010)    0.007 (0.006)
ρ²                          0.106             0.237* (0.006)    0.243* (0.007)    0.211* (0.006)
τ²                          –                 0.314* (0.017)    0.308* (0.015)    0.352* (0.012)
Sargan test                 –                 –                 8.237             38.057
p-value                     –                 –                 0.016             0.000

Panel B: moment estimators
                            Third             Fourth            Fifth
Tobin's q                   0.036* (0.003)    0.039* (0.003)    0.032* (0.001)
Cash flow/capital stock     −0.009 (0.011)    −0.020 (0.012)    0.008 (0.007)
ρ²                          0.239* (0.012)    0.244* (0.013)    0.213* (0.008)
τ²                          0.323* (0.020)    0.301* (0.016)    0.350* (0.014)
Sargan test                 –                 8.379             39.663
p-value                     –                 0.000             0.000

ρ² is an estimate of the R² of the regression. τ² ∈ (0, 1) is an index of measurement quality for the proxy for Tobin's q. ''Sargan test'' refers to the test of the model overidentifying restrictions from Sargan (1958). Standard errors are in parentheses under the parameter estimates. An asterisk indicates significance at the 5% level.

OLS produces a small coefficient on q, and a much larger and statistically significant coefficient on cash flow. In contrast, the results from the moment and cumulant estimators are sharply different from the OLS results but nearly identical to each other. We find a much larger coefficient on q, which stems from the attenuation bias in the OLS estimate. More importantly, the coefficients on cash flow are all smaller in absolute value than their OLS counterparts, and none are statistically significant. In addition, the cumulant and moment estimators deliver higher estimates of the regression R² than does OLS, and we estimate the measurement quality of Tobin's q to be quite low, approximately 45%. These results thus confirm those in Erickson and Whited (2000, 2012). Finally, the Sargan tests of the overidentifying restrictions reject. To explore the source of this rejection, we use the cumulant estimators on each year of data separately, as in Erickson and Whited (2000), and we compute the overidentification test separately each year. We find that the test rejects in only 7% and 14% of the years for the fourth- and fifth-cumulant estimators, respectively. We also find that the test of parameter constancy from Erickson and Whited (2000) rejects. Thus, one likely source of the Sargan test rejection is the assumption that the slope coefficients are constant over time.

5.2. Leverage regressions

Table 5 presents our leverage regression, again with Panels A and B containing results from cumulant and moment estimators, respectively.

Table 5
Leverage regressions.

Panel A: cumulant estimators
                            OLS                Third              Fourth             Fifth
Market-to-book ratio        −0.015* (0.000)    −0.115* (0.027)    −0.040* (0.005)    −0.032* (0.004)
Asset tangibility           0.196* (0.000)     1.262* (0.304)     1.218* (0.041)     1.206* (0.038)
Log sales                   0.038* (0.000)     0.049* (0.009)     0.056* (0.004)     0.057* (0.004)
Operating income/assets     −0.243* (0.000)    0.100* (0.048)     −0.039* (0.017)    −0.056* (0.016)
ρ²                          0.072              0.229* (0.070)     0.202* (0.014)     0.198* (0.013)
τ1²                         –                  0.176* (0.032)     0.385* (0.047)     0.473* (0.064)
τ2²                         –                  0.177* (0.036)     0.184* (0.010)     0.186* (0.010)
Sargan test                 –                  24.719             138.639            245.669
p-value                     –                  0.000              0.000              0.000

Panel B: moment estimators
                            Third              Fourth             Fifth
Market-to-book ratio        −0.108* (0.033)    −0.048* (0.005)    −0.047* (0.005)
Asset tangibility           1.656* (0.259)     1.323* (0.044)     1.368* (0.042)
Log sales                   0.057* (0.005)     0.057* (0.004)     0.058* (0.004)
Operating income/assets     0.149* (0.016)     −0.009 (0.012)     −0.004 (0.013)
ρ²                          0.275* (0.030)     0.240* (0.009)     0.253* (0.009)
τ1²                         0.179* (0.043)     0.398* (0.040)     0.379* (0.041)
τ2²                         0.144* (0.019)     0.197* (0.009)     0.201* (0.009)
Sargan test                 21.516             129.474            226.986
p-value                     0.000              0.000              0.000

ρ² is an estimate of the R² of the regression. τi² ∈ (0, 1), i = 1, 2, are indices of measurement quality for the two proxy variables. ''Sargan test'' refers to the test of the model overidentifying restrictions from Sargan (1958). Standard errors are in parentheses under the parameter estimates. An asterisk indicates significance at the 5% level.

Three results stand out in Panel A. First, all but one of the coefficient estimates from the cumulant estimators have the same sign as the corresponding estimate from OLS. This result is unexpected because measurement error in more than one regressor typically results in large coefficient biases (e.g. Klepper and Leamer, 1984; Wansbeek and Meijer, 2000), and because we find low proxy quality (i.e., low estimates of τ²) for both the market-to-book ratio and the tangibility proxy. Low proxy quality is to be expected in corporate finance, where measurement error typically stems from large conceptual gaps between empirical proxies and the underlying true variables. Second, the Sargan tests of the overidentifying restrictions reject strongly, and here the Sargan tests from using the cumulant estimators on the individual years of data also reject between 55% and 93% of the
time. Thus, unlike the investment regressions, a typical leverage regression is much less likely to be well specified. Third, the cumulant estimators produce a coefficient on the tangibility proxy of approximately 1.2, which is six times the size of the OLS estimate. This result is important because it supports the intuition in, for example, Rampini and Viswanathan (2013) that asset pledgeability is a first-order determinant of leverage.

The results from using the moment estimators in Panel B are similar to the results from Panel A. Again we find large coefficients on the tangibility proxy, and again the test of the overidentifying restrictions of the model rejects strongly. However, the moment estimators are substantially more difficult to compute in the case of two mismeasured regressors because the difficulty of finding global rather than local minima is more acute in this case.

6. Conclusion

This paper develops estimators for the classical errors-in-variables model from equations that are linear in the higher-order cumulants of the observable variables, where each equation has the same coefficient vector as the regression model being estimated. Like the high-order moment estimators in Erickson and Whited (2002), these estimators do not require additional information such as an instrument or repeated measurements. Indeed, the moment and cumulant estimators are asymptotically equivalent, but the cumulant estimators are an advance beyond the moment estimators.

Notably, the cumulant estimators have closed-form solutions. Although exactly identified high-order moment estimators have closed-form solutions, overidentified high-order moment estimators do not. Thus, for the moment estimators, exploiting overidentification requires numerical minimization of an objective function and starting values for this minimization. Some starting value choices can lead to local rather than global optima, which can degrade finite-sample performance. Despite this issue, Erickson and Whited (2002, 2012) show that exploiting overidentification also has important benefits because it typically results in sharp increases in finite-sample performance, as long as one computes global optima. In contrast, the closed-form solutions of the cumulant estimators always correspond to global optima, so it is possible to exploit overidentification without computational difficulties. For this reason, the cumulant estimators outperform the moment estimators in our Monte Carlo simulations, especially for the case of two mismeasured regressors.

Nonetheless, both the moment and the cumulant estimators require that the data analyst choose among estimators based on different orders of moments or cumulants. Future research could examine data-driven methods, such as cross-validation, for choosing among these different estimators.

Acknowledgments

We thank Hao Zou for research assistance and two anonymous referees, Alok Bhargava, Bob Chirinko, Harry DeAngelo, Tom George, Arthur Korteweg, Gregor Matvos, Michael Roberts, Huntley Schaller, Malcolm Wardlaw, and the participants of Finance 534 at the University of Rochester for comments on earlier drafts. All of the analysis, views, and conclusions expressed in this paper are those of the authors and do not necessarily reflect the views or policies of the Bureau of Labor Statistics.
Appendix

This appendix contains the proofs of Propositions 3 and 4. The proofs of Propositions 1 and 2 are standard and thus omitted. The proof of Lemma 1 is in Erickson and Whited (2002).

Proof of Proposition 3. Proposition 3 of Erickson and Whited (2002) establishes $\hat{\Omega} \xrightarrow{p} \Omega$. The Slutsky theorem implies $\hat{D} \xrightarrow{p} D$, and then the asserted result follows.

Proof of Proposition 4. Our proof has two steps. First, we show that the asymptotic variance of the β component of θ̂ is the same as that for the β component of a minimum distance estimator based on an equation system containing (11) as a proper subset. Second, we show that the β component of the latter estimator is identical to the estimator based on (11) alone.

Part 1: Assume that ξ consists of all moments of (ẏᵢ, ẋᵢ) from order 2 to order P. Erickson and Whited (2002) show that σ will then consist of all moments of (uᵢ, εᵢ, ηᵢ) from orders 2 to P that are not identically zero because of our independence assumptions. Let K consist of all cumulants of (ẏᵢ, ẋᵢ) from order 2 to order P. Denote the invertible map from ξ to K as:²
\[ K = h(\xi). \tag{A.1} \]
² For ease of exposition, the notation here differs from the notation in the text, where we used K = K(ξ), with both K and ξ denoting true values. Here, K and ξ are generic possible values, with K₀ and ξ₀ denoting true values.
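As an illustration of the moment-to-cumulant map h, the following is a minimal Python sketch that evaluates joint cumulants of orders 2 through 4 from the corresponding moments of demeaned data. The formulas used are standard facts about cumulants of mean-zero variables; the function name and data layout are ours rather than the paper's.

```python
import numpy as np

def joint_cumulants_2_to_4(data: np.ndarray) -> dict:
    """Joint cumulants of orders 2-4 of the columns of `data` (n x d),
    computed from moments of the demeaned observations.  For mean-zero
    variables, second- and third-order cumulants equal the corresponding
    moments; fourth-order cumulants subtract products of covariances."""
    z = data - data.mean(axis=0)
    d = data.shape[1]

    def m(*cols):
        # raw moment of the demeaned data for the requested column indices
        return np.mean(np.prod(z[:, list(cols)], axis=1))

    cum = {}
    for i in range(d):
        for j in range(d):
            cum[(i, j)] = m(i, j)                     # kappa_2 = E[z_i z_j]
            for k in range(d):
                cum[(i, j, k)] = m(i, j, k)           # kappa_3 = E[z_i z_j z_k]
                for l in range(d):
                    cum[(i, j, k, l)] = (m(i, j, k, l)
                                         - m(i, j) * m(k, l)
                                         - m(i, k) * m(j, l)
                                         - m(i, l) * m(j, k))   # kappa_4
    return cum

# Example: for Gaussian data every cumulant above order 2 is (near) zero.
rng = np.random.default_rng(1)
c = joint_cumulants_2_to_4(rng.normal(size=(100_000, 2)))
print(round(c[(0, 0, 0)], 3), round(c[(0, 0, 0, 0)], 3))
```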
Define γ = (β′, κ′)′, where κ consists of the nonzero cumulants of (uᵢ, εᵢ, ηᵢ) from orders 2 to P. The invertible map between κ and σ implies an invertible map from γ to θ, which we write as
\[ \theta = a(\gamma). \tag{A.2} \]
Now apply h to both sides of (30) to obtain
\[ K = h(c(\theta)). \tag{A.3} \]
Substitute (A.2) into the right-hand side of (A.3) to get
\[ K = h(c(a(\gamma))). \tag{A.4} \]
Define the partition
\[ K = \begin{pmatrix} K_y \\ K_2 \end{pmatrix}, \tag{A.5} \]
where K_y is from the specification of (11) that includes all possible equations (10) that are derivable from K. Let r be the map such that, given β,
\[ r\begin{pmatrix} K_y \\ K_2 \end{pmatrix} = \begin{pmatrix} K_y - K_x\beta \\ K_2 \end{pmatrix}. \tag{A.6} \]
A simple recursion argument shows that r has an inverse when all elements of β are nonzero. Applying r to both sides of (A.4) gives
\[ r(K) = r(h(c(a(\gamma)))). \tag{A.7} \]
The identity $r(\hat{K}) = r(h(\hat{\xi}))$, the delta method, the chain rule, and (32) imply
\[ \sqrt{n}\,\bigl[r(\hat{K}) - r(h(c(a(\gamma_0))))\bigr] \xrightarrow{d} N\bigl(0,\; RH\Omega H'R'\bigr), \tag{A.8} \]
where R and H are the Jacobians of r and h, respectively. All Jacobians are defined analogously to (33) using the relevant true values. The true values are related via ξ₀ = c(θ₀), θ₀ = a(γ₀), K₀ = h(ξ₀), and θ₀ = (β₀, σ₀). Note that the true values, θ₀ = (β₀, σ₀), and the functions c, h, a, and r, together with their respective derivatives, are non-random when their arguments are given.
The optimal MD estimator of γ based on (A.7) is
\[ \hat{\gamma} = \operatorname*{argmin}_{\gamma}\; \bigl[r(\hat{K}) - r(h(c(a(\gamma))))\bigr]'\,\bigl(RH\Omega H'R'\bigr)^{-1}\,\bigl[r(\hat{K}) - r(h(c(a(\gamma))))\bigr]. \tag{A.9} \]
The associated asymptotic covariance matrix is
\[ \operatorname{avar}(\hat{\gamma}) = \Bigl[(RHCA)'\bigl(RH\Omega H'R'\bigr)^{-1}(RHCA)\Bigr]^{-1} \tag{A.10} \]
\[ \phantom{\operatorname{avar}(\hat{\gamma})} = A^{-1}\bigl(C'\Omega^{-1}C\bigr)^{-1}\bigl(A^{-1}\bigr)' \tag{A.11} \]
\[ \phantom{\operatorname{avar}(\hat{\gamma})} = A^{-1}\operatorname{avar}(\hat{\theta})\bigl(A^{-1}\bigr)', \tag{A.12} \]
where A is the Jacobian of a, and the third line comes from (31). Because (A.2) in partitioned form is
\[ \begin{pmatrix} \beta \\ \sigma \end{pmatrix} = \begin{pmatrix} \beta \\ a_2(\beta, \kappa) \end{pmatrix}, \tag{A.13} \]
the matrix A⁻¹ will be a lower triangular block matrix with an upper diagonal block equal to the identity. The upper diagonal block of avar(γ̂) therefore equals the upper diagonal block of avar(θ̂), establishing that the asymptotic covariance of the β component of γ̂ equals the asymptotic covariance of the β component of θ̂.
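The algebra behind (A.10)–(A.12) is easy to check numerically. The sketch below, under assumed (randomly generated) Jacobians R, H, A, C and a positive definite Ω, verifies that the minimum distance covariance of γ̂ reduces to A⁻¹ avar(θ̂) (A⁻¹)′ when R and H are invertible; the dimensions and variable names are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)
q, p = 12, 5      # q = dim(K) = dim(xi), p = dim(theta) = dim(gamma), q > p

# Assumed Jacobians: R, H, A square (invertible with probability one); C of full column rank.
R = rng.normal(size=(q, q)) + 3 * np.eye(q)
H = rng.normal(size=(q, q)) + 3 * np.eye(q)
A = rng.normal(size=(p, p)) + 3 * np.eye(p)
C = rng.normal(size=(q, p))

# Positive definite Omega (asymptotic covariance of the sample moments).
M = rng.normal(size=(q, q))
Omega = M @ M.T + q * np.eye(q)

G = R @ H @ C @ A                              # chain-rule Jacobian of the stacked equations (A.7)
W = np.linalg.inv(R @ H @ Omega @ H.T @ R.T)   # optimal MD weight matrix in (A.9)

avar_gamma = np.linalg.inv(G.T @ W @ G)                        # (A.10)
avar_theta = np.linalg.inv(C.T @ np.linalg.inv(Omega) @ C)     # from (31)
A_inv = np.linalg.inv(A)
rhs = A_inv @ avar_theta @ A_inv.T                             # (A.11)-(A.12)

print(np.allclose(avar_gamma, rhs))   # True: the two expressions coincide
```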
Part 2: In partitioned form, (A.7) satisfies
\[ \begin{pmatrix} K_y - K_x\beta \\ K_2 \end{pmatrix} = \begin{pmatrix} 0 \\ f(\beta, \kappa) \end{pmatrix} \tag{A.14} \]
for equations K₂ = f(β, κ) described below. By Ahn and Schmidt (1995), the optimal estimator of β based only on K_y − K_xβ will equal the β component of γ̂ if the dimension of κ equals the dimension of K₂, and if, given β, the equations K₂ = f(β, κ) can be solved for κ. To show that the Ahn–Schmidt conditions hold, we partition K₂ = f(β, κ) into four types of equations. Using the notation in (8)–(10), Type 1 equations have (s₀, s₁, …, s_J) = (0, s₁, …, s_J) with two or more positive elements s₁, …, s_J, and take the form
\[ K(0, s_1, \ldots, s_J) = \kappa(0, s_1, \ldots, s_J). \tag{A.15} \]
Each such equation introduces the parameter κ(0, s₁, …, s_J). Type 2 equations have (s₀, s₁, …, s_J) = (s₀, 0, …, 0) for s₀ ≥ 1, and take the form
\[ K(s_0 + 1, 0, \ldots, 0) = \beta_1 K(s_0, 1, 0, \ldots, 0) + \cdots + \beta_J K(s_0, 0, \ldots, 0, 1) + \kappa_u(s_0 + 1), \tag{A.16} \]
where κ_u(s) is the cumulant of order s of the distribution for uᵢ. Each such equation introduces, and can be used to solve for, the parameter κ_u(s₀ + 1). Types 3 and 4 come in pairs, one for each distinct vector (s₀, s₁, …, s_J) of the form (0, …, 0, s_j, 0, …, 0) for s_j > 2 and j ≥ 1. The Type 3 equation is
\[ K(1, 0, \ldots, 0, s_j - 1, 0, \ldots, 0) = \beta_1 K(0, 1, 0, \ldots, 0, s_j - 1, 0, \ldots, 0) + \cdots + \beta_j \kappa(0, \ldots, 0, s_j, 0, \ldots, 0) + \cdots + \beta_J K(0, \ldots, 0, s_j - 1, 0, \ldots, 0, 1), \tag{A.17} \]
and introduces the single parameter κ(0, …, 0, s_j, 0, …, 0). Given β, the solution for this parameter follows directly from (A.17). The paired Type 4 equation is
\[ K(0, \ldots, 0, s_j, 0, \ldots, 0) = \kappa(0, \ldots, 0, s_j, 0, \ldots, 0) + \kappa_{\varepsilon_j}(s_j), \tag{A.18} \]
which introduces the parameter κ_{ε_j}(s_j), the cumulant of order s_j of the distribution for ε_j. Given the solution of the Type 3 equation, the Type 4 equation can be solved for κ_{ε_j}(s_j). The number of equations of Types 1 through 4 thus equals the number of parameters that these equations introduce into (A.14), and these equations can be solved for those parameters. The Ahn–Schmidt result therefore applies, completing our proof.
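To make the solving step in the Type 2 equations concrete, the following is a minimal sketch of rearranging (A.16) for κ_u(s₀ + 1), given β and a set of observable joint cumulants. The dictionary layout, function name, and numerical values are illustrative assumptions, not the paper's implementation.

```python
from typing import Dict, Tuple

def solve_type2_for_kappa_u(K: Dict[Tuple[int, ...], float],
                            beta: Tuple[float, ...],
                            s0: int) -> float:
    """Rearrange the Type 2 equation (A.16) to recover kappa_u(s0 + 1),
    given the slope vector beta and observable joint cumulants K indexed
    by the integer tuples (s0, s1, ..., sJ)."""
    J = len(beta)
    lhs_index = (s0 + 1,) + (0,) * J                 # index of K(s0 + 1, 0, ..., 0)
    total = K[lhs_index]
    for j in range(J):
        rhs_index = (s0,) + tuple(1 if m == j else 0 for m in range(J))
        total -= beta[j] * K[rhs_index]              # subtract beta_j * K(s0, e_j)
    return total

# Toy usage with J = 2 mismeasured regressors and hypothetical cumulant values.
K_example = {(3, 0, 0): 2.0, (2, 1, 0): 0.8, (2, 0, 1): 0.5}
print(solve_type2_for_kappa_u(K_example, beta=(0.6, 0.3), s0=2))   # 2.0 - 0.6*0.8 - 0.3*0.5
```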
References

Ahn, S.C., Schmidt, P., 1995. A separability result for GMM estimation, with applications to GLS prediction and conditional moment tests. Econom. Rev. 14, 19–34.
Arellano, M., 2003. Panel Data Econometrics. Oxford University Press, Oxford, UK.
Bekker, P.A., 1986. Comment on identification in the linear errors in variables model. Econometrica 54, 215–217.
Bhargava, A., 1987. Wald tests and systems of stochastic equations. Internat. Econom. Rev. 28, 789–808.
Cragg, J.G., 1997. Using higher moments to estimate the simple errors-in-variables model. RAND J. Econom. 28, 71–91.
Dagenais, M.G., Dagenais, D.L., 1997. Higher moment estimators for linear regression models with errors in the variables. J. Econometrics 76, 193–221.
Erickson, T., Whited, T.M., 2000. Measurement error and the relationship between investment and q. J. Polit. Econom. 108, 1027–1057.
Erickson, T., Whited, T.M., 2002. Two-step GMM estimation of the errors-in-variables model using high-order moments. Econometric Theory 18, 776–799.
Erickson, T., Whited, T.M., 2012. Treating measurement error in Tobin's q. Rev. Financ. Stud. 25, 1286–1329.
Fazzari, S.M., Hubbard, R.G., Petersen, B.C., 1988. Financing constraints and corporate investment. Brookings Papers on Economic Activity 1988, 141–206.
Geary, R.C., 1942. Inherent relations between random variables. Proc. R. Ir. Acad. A 47, 63–76.
Han, C., Phillips, P.C.B., 2010. GMM estimation for dynamic panels with fixed effects and strong instruments at unity. Econometric Theory 26, 119–151.
Hansen, L.P., Heaton, J., Yaron, A., 1996. Finite-sample properties of some alternative GMM estimators. J. Bus. Econom. Statist. 14, 262–280.
Kapteyn, A., Wansbeek, T., 1983. Identification in the linear errors in variables model. Econometrica 51, 1847–1849.
Klepper, S., Leamer, E.E., 1984. Consistent sets of estimates for regressions with errors in all variables. Econometrica 52, 163–183.
Lewbel, A., 1997. Constructing instruments for regressions with measurement error when no additional data are available, with an application to patents and R&D. Econometrica 65, 1201–1214.
Lewbel, A., 2012. Using heteroscedasticity to identify and estimate mismeasured and endogenous regressor models. J. Bus. Econom. Statist. 30, 67–80.
Madansky, A., 1959. The fitting of straight lines when both variables are subject to error. J. Amer. Statist. Assoc. 54, 173–205.
Martin, I.W.R., 2013. Consumption-based asset pricing with higher cumulants. Rev. Econom. Stud. 80, 745–773.
McCullagh, P., 1987. Tensor Methods in Statistics. Chapman and Hall, New York.
Neyman, J., 1937. Remarks on a paper by E.C. Rhodes. J. R. Stat. Soc. 100, 50–57.
Pakes, A., 1982. On the asymptotic bias of Wald-type estimators of a straight line when both variables are subject to error. Internat. Econom. Rev. 23, 491–497.
Pal, M., 1980. Consistent moment estimators of regression coefficients in the presence of errors-in-variables. J. Econometrics 14, 349–364.
Rajan, R.G., Zingales, L., 1995. What do we know about capital structure? Some evidence from international data. J. Finance 50, 1421–1460.
Rampini, A.A., Viswanathan, S., 2013. Collateral and capital structure. J. Finan. Econom. 109, 466–492.
Reiersøl, O., 1941. Confluence analysis by means of lag moments and other methods of confluence analysis. Econometrica 9, 1–24.
Reiersøl, O., 1950. Identifiability of a linear relation between variables which are subject to error. Econometrica 18, 375–389.
Richardson, M., Smith, T., 1993. A test for multivariate normality in stock returns. J. Bus. 66, 295–321.
Sargan, J.D., 1958. The estimation of economic relationships using instrumental variables. Econometrica 26, 393–415.
Schennach, S., Hu, Y., 2013. Nonparametric identification and semiparametric estimation of classical measurement error models without side information. J. Amer. Statist. Assoc. 108, 177–186.
Spiegelman, C., 1979. On estimating the slope of a straight line when both variables are subject to error. Ann. Statist. 7, 201–206.
Van Montfort, K., Mooijaart, A., de Leeuw, J., 1987. Regression with errors in variables: estimators based on third order moments. Stat. Neerl. 41, 223–238.
Van Montfort, K., Mooijaart, A., de Leeuw, J., 1989. Estimation of regression coefficients with the help of characteristic functions. J. Econometrics 41, 267–278.
Wansbeek, T.J., Meijer, E., 2000. Measurement Error and Latent Variables in Econometrics. Elsevier, Amsterdam.