Statistical Methodology 8 (2011) 291–303
Asymptotic variance–covariance matrices for the linear structural model

Jonathan Gillard
Cardiff School of Mathematics, Cardiff University, Senghennydd Road, CF24 4AG, United Kingdom
Article history: Received 28 June 2010; received in revised form 1 November 2010; accepted 28 December 2010.

Keywords: Errors in variables regression; Measurement error; Linear structural model; Method of moments; Variance–covariance matrix
Abstract

In recent years there has been renewed interest in errors in variables regression models, where there are errors in the predictor variables as well as in the dependent variable. Despite recent advances, the theory of the simple linear errors in variables model does not yet match the well known methodology of simple linear regression. This paper fills one of these gaps by presenting results for the straight line errors in variables model that enable a practitioner to estimate not only the parameters of the model but also the approximate variances of these estimates. Attention is therefore focused on the variances of the estimators and the extraction of estimates. The presentation adopts a method of moments approach, but connections are made with the method of least squares and the maximum likelihood approach.
1. Introduction

A comprehensive review of errors in variables regression is given in [8], which radically enlarges Chapter 28 of [20]. That approach centred on the method of maximum likelihood, under the assumption that all the random variables in the model are normally distributed. The books [6,13] include the linear errors in variables model in more wide ranging accounts. Dunn [12] adopted the method of moments approach in describing solutions to the problem. The advantage of this approach is that no assumptions are necessary about the nature of the random variables in the model, other than that the moments exist. It is this approach that is enlarged upon in this paper.

None of the studies [8,20,6,13] gave complete details about the precision of the parameter estimates, and this is the main topic of this paper. Much previous work, including [3,22,24], has concentrated on the slope and intercept estimators. [18] contained the complete asymptotic variance–covariance matrices of the estimators, including those that estimate the variance
E-mail address:
[email protected]. 1572-3127/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.stamet.2010.12.002
components of the model. They adopted the maximum likelihood approach, however, and their work is restricted by the assumption that the random variables in the model are normally distributed. In this paper we give asymptotic results for the variances of all the parameters, but since we adopt the method of moments approach the results are not restricted to normally distributed random variables. A comparison with the work of Hood et al. shows that additional terms have to be included to allow for non-normally distributed random variables, but that in many cases the adjustments that have to be made are slight.

Other work has investigated asymptotic variances for method of moments estimators, albeit for different errors in variables regression models. Akritas and Bershady [2] offer only expressions for the variances of their derived estimators of the intercept and slope, and for the covariance between them. The present work is more complete, containing explicitly stated and simplified variance–covariance matrices for all the parameters in the model, and for different slope estimators. This allows comparisons of the variance–covariance matrices for the different slope estimators. This paper also provides simplifications of Akritas and Bershady's formulae. Cheng and Riu [7] discuss fitting errors in variables models when the measurement errors are heteroscedastic. However, Cheng and Riu develop large-sample inference via MM estimators, which coincide with maximum likelihood estimators for the so-called normal structural model considered later in this paper. It is thus difficult to see how their estimators could be applied directly to an errors in variables model containing non-normally distributed random variables. Indeed, Patriota et al. [23] note that Cheng and Riu's approach is particularly affected by perturbations in the distribution of the independent variable, especially for small sample sizes. Patriota et al. use the method of moments to discuss the limiting distribution of the maximum likelihood and method of moments estimators for the parameters of the so-called heteroscedastic structural errors in variables model where equation error is present. In the context of the models discussed in this paper, equation error is present when the true variables (the latent, unobserved variables without measurement error) do not display a perfect linear relationship. As stated by Cheng and Riu, ``the no-equation-error model is frequently found in natural sciences, such as physics and chemistry'', and so this paper will focus on this model.

This paper looks at the errors in variables model using a unified method of moments framework. The complete asymptotic variance–covariance matrices for estimators of all the parameters are presented in closed form. The adjustments needed to cope with models containing non-normal random variables are explicitly defined. It is the intention of this paper to present the variance–covariance matrices in a form accessible to a practitioner. Simulation studies show that these asymptotic variance–covariance matrices are reliable even for relatively small data sets, and guidance is given for the use of these matrices.

Section 2 discusses the assumptions made in the errors in variables model, and outlines the linear structural model (which is an example of an errors in variables model). Section 3 deals with the method of moments estimators that can be derived for the parameters of the linear structural model.
The inherent problem of identifiability is discussed, alongside the problem of obtaining admissible estimates. Section 4 uses the delta method to obtain expressions for the variances and covariances of the estimators discussed in Section 3. Section 5 contains a simulation study carried out to confirm the accuracy of the expressions derived for the variance–covariance matrices in Section 4. Section 6 concludes the paper.

2. The model and notation

Suppose two variables (ξ, η) are linearly related:
η = α + βξ.

In the errors in variables model neither variable can be measured directly. They are latent variables, and the measurements (x, y) that are made differ from the latent (ξ, η) by additional random components, often called measurement errors. The measurements x and y are assumed to be related to the true values ξ and η by the equations

x = ξ + δ
y = η + ε = α + βξ + ε.
In this paper these errors, δ and ε, are assumed to be uncorrelated with each other and with the latent variable ξ. A random sample {(xi, yi), i = 1, . . . , n} of paired measurements is available, from which the parameters of the model are estimated.

In some applications it is assumed that the latent values ξi, associated with the measurements xi, are a sample from a random variable with mean µ and variance σ². This is known as the structural model. In the functional model, in contrast, it is assumed that the values ξi (i = 1, . . . , n) are fixed, although unobservable, quantities. When the method of moments approach is taken, the distinction between the structural and functional models is not important for the estimation of the parameters of the model. The distinction needs to be made only if the values ξi themselves are to be estimated, and this problem will not be discussed in this paper.

All that is needed in the method of moments approach are assumptions about the moments of the random variables δ and ε, and of the latent variable ξ. In this paper the following assumptions are made about these variables:

E[δ] = E[ε] = 0
E[ξ] = µ,  Var[ξ] = σ²
Var[δ] = σδ²,  Var[ε] = σε²
Cov[δ, ε] = Cov[δ, ξ] = Cov[ξ, ε] = 0.

It is also assumed that the third and fourth moments of all the random variables exist:
µδ3 = E[δ³],  µδ4 = E[δ⁴]
µε3 = E[ε³],  µε4 = E[ε⁴]
µξ3 = E[(ξ − µ)³],  µξ4 = E[(ξ − µ)⁴].
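To fix ideas, the structural model is easy to simulate. The following sketch is illustrative only and is not taken from the paper: the choice of normal distributions and all parameter values are assumptions made purely for the example.

import numpy as np

rng = np.random.default_rng(0)
n = 200
mu, sigma2 = 1.0, 4.0            # E[xi] and Var[xi]
alpha, beta = 0.0, 1.0           # intercept and slope
sigma_d2, sigma_e2 = 1.0, 1.0    # Var[delta] and Var[epsilon]

xi = rng.normal(mu, np.sqrt(sigma2), n)                        # latent values
x = xi + rng.normal(0.0, np.sqrt(sigma_d2), n)                 # x = xi + delta
y = alpha + beta * xi + rng.normal(0.0, np.sqrt(sigma_e2), n)  # y = alpha + beta*xi + epsilon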
3. Estimation of parameters

The method of moments equations based on the first and second moments have been stated by many previous authors, for example [12,15], but are repeated here for reference. A tilde placed over a symbol denotes a method of moments estimator. In these expressions x̄ and ȳ are the sample means of x and y respectively, sxx and syy are the sample variances, and sxy is the sample covariance.

x̄ = µ̃  (1)
ȳ = α̃ + β̃µ̃  (2)
sxx = σ̃² + σ̃δ²  (3)
syy = β̃²σ̃² + σ̃ε²  (4)
sxy = β̃σ̃².  (5)
One of the main problems in fitting an errors in variables model using the method of moments is the identifiability of the parameters. It can be seen from Eqs. (1)–(5) that a unique solution cannot be found for the parameters, since there are five equations but six unknown parameters (µ, σ², α, β, σδ² and σε²). One way to proceed, and the one adopted in this paper, is to assume that there is some prior knowledge, usually concerning the variances in the model, that enables the parameter space to be restricted so that unique estimators can be found. It has also been suggested that equations derived from third and fourth moments can be used, but Gillard [14] found that there are limitations to the practical value of these equations, essentially because the data have to be very skewed or very kurtotic for the estimating equations to be reliable. Given that the approach in this paper (and of other papers using the method of moments approach) is to assume prior knowledge, allowing a restriction of the parameter space to obtain solutions from Eqs. (1)–(5), it seems rational to consider the possibility of using Bayesian methods for this problem.
Zellner [25] shows that simple assumptions about prior distributions can lead to relatively straightforward derivations of posterior distributions, although the methodology is more complicated than that advocated here. Additionally, there is often the need for numerical integration to compute the integrals that arise from the Bayesian approach, and it is thus difficult to obtain closed form expressions analogous to those derived in this paper. As stated by Gustafson [16], ``In some problems the integral involved is tractable so that the likelihood function is readily evaluated. In other problems however, the integral will not have a closed form.''

Kendall and Stuart [20] showed that where the assumption is made that all the random variables δ, ε and ξ are normally distributed, the method of maximum likelihood gives the same equations as those above, with the same problem of lack of identifiability. The assumption of other distributions leads to a likelihood that is much more complicated. Gillard [14] investigated other assumptions for the distribution of ξ, though still with normally distributed errors δ and ε, and found that for practical purposes the method is difficult to deal with under these alternative assumptions. This problem has also been noted by other authors such as Patriota et al. [23], who state ``the likelihood function . . . is very complicated to deal with in the sense of finding its global maximum. An iterative procedure is needed. Problems regarding iterative procedures for obtaining the maximum likelihood estimates, mainly for small sample sizes, are well known.''

Despite this, some authors have made progress in fitting structural errors in variables models when the distribution of ξ is non-normal. For example, Carroll et al. [5] use a flexible Bayesian model in which ξ follows a distribution that may be modelled by a mixture of normal distributions. Similarly, Kelly [19] extends this to cope with heteroscedastic measurement errors and censoring. Despite both papers offering explicit forms of the likelihood functions that they use, a Gibbs sampler is needed to proceed with these functions. An EM algorithm could also be used. There are issues, however: expectations in the E-step may not have closed form expressions, and additional effort may be needed to compute standard errors (see, for example, [16]).

The estimators of the slope β using the first and second moments alone, with various restrictions on the parameter space, are tabulated below.

Case 1. Error variance σδ² known:  β̃₁ = sxy/(sxx − σδ²)
Case 2. Error variance σε² known:  β̃₂ = (syy − σε²)/sxy
Case 3. Reliability ratio κ = σ²/(σ² + σδ²) known:  β̃₃ = sxy/(κsxx)
Case 4. Ratio of the error variances λ = σε²/σδ² known:  β̃₄ = [(syy − λsxx) + √((syy − λsxx)² + 4λ(sxy)²)]/(2sxy)
Once a slope estimator β̃ has been obtained, its value may be substituted into Eqs. (6)–(10) in order to estimate the remaining parameters that have not been assumed known.

µ̃ = x̄  (6)
α̃ = ȳ − β̃x̄  (7)
σ̃² = sxy/β̃  (8)
σ̃δ² = sxx − σ̃²  (9)
σ̃ε² = syy − β̃²σ̃².  (10)
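The estimators are straightforward to compute. The following sketch transcribes the four slope estimators of the table together with Eqs. (6)–(10); the function name is hypothetical, and the code is an illustration under the notation of this section rather than the author's own software.

import numpy as np

def mom_fit(x, y, case, known):
    # Method of moments fit of the linear structural model.  `case`
    # selects the identifying assumption of the table above (1: sigma_d2
    # known, 2: sigma_e2 known, 3: kappa known, 4: lambda known) and
    # `known` is the corresponding known value.
    xbar, ybar = x.mean(), y.mean()
    sxx, syy = x.var(), y.var()                  # sample variances
    sxy = ((x - xbar) * (y - ybar)).mean()       # sample covariance
    if case == 1:
        beta = sxy / (sxx - known)               # beta_1
    elif case == 2:
        beta = (syy - known) / sxy               # beta_2
    elif case == 3:
        beta = sxy / (known * sxx)               # beta_3
    else:
        d = syy - known * sxx                    # beta_4, known = lambda
        beta = (d + np.sqrt(d**2 + 4.0 * known * sxy**2)) / (2.0 * sxy)
    mu = xbar                                    # Eq. (6)
    alpha = ybar - beta * xbar                   # Eq. (7)
    sigma2 = sxy / beta                          # Eq. (8)
    sigma_d2 = sxx - sigma2                      # Eq. (9)
    sigma_e2 = syy - beta**2 * sigma2            # Eq. (10)
    return mu, alpha, beta, sigma2, sigma_d2, sigma_e2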
Depending on the sample values obtained for sxx, syy and sxy, it is possible for either of Eqs. (9) and/or (10) to give a negative estimate of a variance. Admissibility conditions that ensure this cannot happen were discussed in detail in [20,18,12]. However, Gillard [14] showed that these conditions can be reduced more conveniently to the requirement that the estimate of the slope should lie between the slope of a simple linear regression of y on x, sxy/sxx, and that of x on y, syy/sxy. For the purposes of this paper we shall call this range the admissible range. This observation leads to a link between the method of
moments estimators and the method of least squares. It has long been known that the estimator in Case 4 of the above table can be derived from a calculation based on the distances of the observed points (xi, yi) from the fitted line. Adcock [1] showed that if these distances are taken at right angles to the line, and the sum of squares of the distances is minimized, the solution for Case 4 with λ = 1 is obtained. Kummel [21] showed that if the distances are taken at an angle other than a right angle, the line with minimum sum of squares of distances is the solution for Case 4 with λ ≠ 1. It can further be shown (see [14], for example) that the solution for Case 4 always gives a slope in the admissible range. Thus, however an estimate of the slope is obtained, if it lies in the admissible range it corresponds to a slope that could have been obtained from the solution for Case 4. Such a slope corresponds to the minimization of a sum of squares of distances to the line, though not necessarily distances taken at right angles to the line.

Case 3, where the reliability ratio is assumed known, is included for completeness. The reciprocal of the reliability ratio (i.e. 1/κ) is the multiplicative factor needed to remove the bias in the ordinary least squares (y on x) slope estimator. Dunn [12] has a full description of the relevance of the reliability ratio, and of how it may be estimated when there are replicates. The reliability ratio, as the proportion of the variance of x due to the latent ξ, is a natural idea.

Dudewicz and Mishra [11] give general results showing that noncentral sample moments are unbiased and consistent estimators of the corresponding noncentral population moments. Central sample moments are slightly biased but consistent. Work by Gillard [14] shows that the bias present in the method of moments estimators outlined in this paper is of order 1/n (based on the delta method). Furthermore, simulation results show that this bias, at least for the method of moments estimators of the slope discussed here, becomes negligible for sample sizes larger than 50. Smaller sample sizes are likely to be acceptable in some conditions, but n > 50 is believed to be 'safe'.

4. Asymptotic variance–covariance matrices

This section details the asymptotic variance–covariance matrices of all of the estimators derived in Section 3. In order to maintain brevity of presentation, full derivations of each of the results are not included within this section. Example derivations are included in [14], and additional details are contained in the Appendix. It is this section that contains the main results of the paper. The variance–covariance matrices were derived using the delta method. Cramer [9], and subsequently other authors such as Bowman and Shenton [4], detailed this approximate method (also called the method of statistical differentials) for obtaining expressions for variances and covariances of functions of sample moments.

4.1. Description of the asymptotic variance–covariance matrices

The complete asymptotic variance–covariance matrices for the different slope estimators under the varying assumptions are included in the following pages. For ease of presentation, each matrix is expressed as the sum of three components, A, B and C. This presentation has the advantage of making the matrices simpler for a practitioner to use.

The matrix A alone is needed if the assumptions are made that ξ, δ and ε all have zero third moments and zero excess kurtosis (see [10] for definitions of skewness and kurtosis).
These assumptions would be valid if all three of these variables are normally distributed, as in the Normal structural model. The matrix B gives the additional terms that are necessary if the latent variable ξ has a non-zero third moment and a non-zero excess kurtosis. It can be seen in what follows that in most cases the B matrices are sparse, needing adjustment only in the terms for Var[σ̃²] and Cov[µ̃, σ̃²]; the exception is the case where the reliability ratio is assumed known (β̃₃). The C matrix contains the additional terms that are needed if the third moments and excess kurtosis of the error terms δ and ε are non-zero. It is likely that these C matrices will prove of less value to practitioners than the A and B matrices. It is quite possible that a practitioner
would not wish to assume that the distribution of the latent variable ξ is Normal, or even that its third and fourth moments behave like those of a Normal distribution. Indeed, the necessity for this assumption in the likelihood approach may well have been one of the obstacles to a more widespread use of errors in variables methodology. The assumption of Normal-like distributions for the error terms, however, is more likely to be acceptable. Thus in many applications the C matrix may be ignored.

In summary, the practitioner wishing to compute an asymptotic variance–covariance matrix for a fitted linear structural model has four options (a small code sketch follows this list). Calculate the asymptotic variance–covariance matrix as
1. A, if the Normal structural model is assumed;
2. A + B, if the latent variable ξ is assumed not to follow a Normal distribution, but the errors δ and ε are assumed to follow independent Normal distributions;
3. A + C, if the latent variable ξ is assumed to follow a Normal distribution, but the errors δ and ε are assumed not to be normally distributed;
4. A + B + C, if all the random variables in the model, namely ξ, δ and ε, are assumed not to be normally distributed.

As a check on the method employed, the A matrices were compared with those given in [17,18], where a different, likelihood based approach was used in deriving the asymptotic variance–covariance matrices. In all cases exact agreement with the A matrices was found, although much simplification of the algebra has been possible.
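In code the four options amount to a simple selection. The sketch below assumes that A, B and C have already been evaluated as numpy arrays from the formulae of Section 4.2; the helper name is hypothetical.

import numpy as np

def vcov_matrix(A, B, C, xi_normal, errors_normal):
    # Assemble the asymptotic variance-covariance matrix from its
    # components according to options 1-4 above.
    total = A.copy()
    if not xi_normal:
        total = total + B      # non-normal latent variable
    if not errors_normal:
        total = total + C      # non-normal errors
    return total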
4.2. The matrices

This section contains the variance–covariance matrices for each of the slope estimators outlined earlier. The results are stated first, followed by a brief discussion. For brevity, the notation U = σ² + σδ², V = β²σδ² + σε², e₁ = µδ4 − 3σδ⁴, e₂ = µε4 − 3σε⁴, e₃ = βλµδ3 + µε3 and |Σ| = β²σ²σδ² + σ²σε² + σδ²σε² will be used. U is the variance of x; e₁ and e₂ are the excess kurtoses of δ and ε respectively; |Σ| is the determinant of the variance–covariance matrix of x and y. Note that some of these newly introduced quantities may be estimated straightforwardly using the method of moments estimating Eqs. (1)–(5): U may be estimated by sxx, and |Σ| by sxx·syy − (sxy)². It has already been shown that σδ² and σε² may be estimated by Eqs. (9) and (10) respectively. The higher moments of the error terms, µδ3, µδ4, µε3 and µε4, may be estimated by appealing to higher order sample moments, as detailed in [14]. In what follows each matrix is symmetric; only the upper triangle is written out, with '·' denoting an entry determined by symmetry, and with rows and columns ordered as stated for each case.
Error variance σδ² known. Since σδ² is assumed known, the variance–covariance matrix for µ̃, α̃, β̃, σ̃² and σ̃ε² is required; rows and columns are ordered (µ̃, α̃, β̃, σ̃², σ̃ε²).

A₁ = (1/n) ×
[ U,  −βσδ²,  0,  0,  0 ]
[ ·,  µ²(|Σ| + 2β²σδ⁴)/σ⁴ + V,  −µ(|Σ| + 2β²σδ⁴)/σ⁴,  2µβσδ²U/σ²,  2µβσδ²V/σ² ]
[ ·,  ·,  (|Σ| + 2β²σδ⁴)/σ⁴,  −2βσδ²U/σ²,  −2βσδ²V/σ² ]
[ ·,  ·,  ·,  2U²,  2β²σδ⁴ ]
[ ·,  ·,  ·,  ·,  2V² ]

B₁ = (1/n) ×
[ 0,  0,  0,  µξ3,  0 ]
[ ·,  0,  0,  0,  0 ]
[ ·,  ·,  0,  0,  0 ]
[ ·,  ·,  ·,  µξ4 − 3σ⁴,  0 ]
[ ·,  ·,  ·,  ·,  0 ]

C₁ = (1/n) ×
[ 0,  µβµδ3/σ²,  −βµδ3/σ²,  µδ3,  β²µδ3 ]
[ ·,  µ²β²e₁/σ⁴ − 2µβ²µδ3/σ²,  β²µδ3/σ² − µβ²e₁/σ⁴,  µβe₁/σ² − βµδ3,  µε3 − β³µδ3 + µβ³e₁/σ² ]
[ ·,  ·,  β²e₁/σ⁴,  −βe₁/σ²,  −β³e₁/σ² ]
[ ·,  ·,  ·,  e₁,  β²e₁ ]
[ ·,  ·,  ·,  ·,  β⁴e₁ + e₂ ]
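For numerical work the A₁ matrix is easily evaluated. The following sketch is a direct transcription of the matrix above; the function name is hypothetical, and the parameters may equally be replaced by their estimates from Eqs. (6)–(10).

import numpy as np

def A1_matrix(n, mu, sigma2, beta, sigma_d2, sigma_e2):
    # A1 of the text: asymptotic variance-covariance matrix of
    # (mu~, alpha~, beta~, sigma2~, sigma_e2~) when sigma_delta^2 is
    # known and all variables are normally distributed.
    U = sigma2 + sigma_d2
    V = beta**2 * sigma_d2 + sigma_e2
    det = beta**2 * sigma2 * sigma_d2 + sigma2 * sigma_e2 + sigma_d2 * sigma_e2
    w = (det + 2.0 * beta**2 * sigma_d2**2) / sigma2**2   # n Var[beta~]
    A = np.array([
        [U, -beta * sigma_d2, 0.0, 0.0, 0.0],
        [0.0, mu**2 * w + V, -mu * w,
         2.0 * mu * beta * sigma_d2 * U / sigma2,
         2.0 * mu * beta * sigma_d2 * V / sigma2],
        [0.0, 0.0, w,
         -2.0 * beta * sigma_d2 * U / sigma2,
         -2.0 * beta * sigma_d2 * V / sigma2],
        [0.0, 0.0, 0.0, 2.0 * U**2, 2.0 * beta**2 * sigma_d2**2],
        [0.0, 0.0, 0.0, 0.0, 2.0 * V**2],
    ])
    A = A + np.triu(A, 1).T    # fill the lower triangle by symmetry
    return A / n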
Error variance σε² known. The variance–covariance matrix for µ̃, α̃, β̃, σ̃² and σ̃δ² is required; rows and columns are ordered (µ̃, α̃, β̃, σ̃², σ̃δ²). For brevity, the notation W = β²|Σ| + 2σε⁴ is introduced.

A₂ = (1/n) ×
[ U,  −βσδ²,  0,  0,  0 ]
[ ·,  µ²W/(β²σ⁴) + V,  −µW/(β²σ⁴),  2µ(σε²V + β⁴σ²σδ²)/(β³σ²),  −2µσε²V/(β³σ²) ]
[ ·,  ·,  W/(β²σ⁴),  −2(σε²V + β⁴σ²σδ²)/(β³σ²),  2σε²V/(β³σ²) ]
[ ·,  ·,  ·,  2(β⁴U² + V² − 2β⁴σδ⁴)/β⁴,  −2σε²(V + β²σδ²)/β⁴ ]
[ ·,  ·,  ·,  ·,  2V²/β⁴ ]

B₂ = (1/n) ×
[ 0,  0,  0,  µξ3,  0 ]
[ ·,  0,  0,  0,  0 ]
[ ·,  ·,  0,  0,  0 ]
[ ·,  ·,  ·,  µξ4 − 3σ⁴,  0 ]
[ ·,  ·,  ·,  ·,  0 ]

C₂ = (1/n) ×
[ 0,  0,  0,  0,  µδ3 ]
[ ·,  µ²e₂/(β²σ⁴) − 2µµε3/(βσ²),  µε3/(βσ²) − µe₂/(β²σ⁴),  µe₂/(β³σ²) − µε3/β²,  µε3/β² − βµδ3 − µe₂/(β³σ²) ]
[ ·,  ·,  e₂/(β²σ⁴),  −e₂/(β³σ²),  e₂/(β³σ²) ]
[ ·,  ·,  ·,  e₂/β⁴,  −e₂/β⁴ ]
[ ·,  ·,  ·,  ·,  e₁ + e₂/β⁴ ]
Reliability ratio κ = σ²/(σ² + σδ²) known. The variance–covariance matrix for µ̃, α̃, β̃, σ̃² and σ̃ε² is required; rows and columns are ordered (µ̃, α̃, β̃, σ̃², σ̃ε²). For brevity, the notation ϖ = 1 − κ is introduced.

A₃ = (1/n) ×
[ U,  −βσδ²,  0,  0,  0 ]
[ ·,  µ²|Σ|/σ⁴ + V,  −µ|Σ|/σ⁴,  0,  2µβϖ|Σ|/σ² ]
[ ·,  ·,  |Σ|/σ⁴,  0,  −2βϖ|Σ|/σ² ]
[ ·,  ·,  ·,  2σ⁴,  −2β²κσ²σδ² ]
[ ·,  ·,  ·,  ·,  4β²ϖ|Σ| + 2σε⁴ ]

B₃ = (1/n) ×
[ 0,  −µβϖµξ3/σ²,  βϖµξ3/σ²,  κµξ3,  −β²ϖµξ3 ]
[ ·,  µ²β²ϖ²(µξ4 − 3σ⁴)/σ⁴,  −µβ²ϖ²(µξ4 − 3σ⁴)/σ⁴,  −µβκϖ(µξ4 − 3σ⁴)/σ²,  µβ³ϖ²(µξ4 − 3σ⁴)/σ² ]
[ ·,  ·,  β²ϖ²(µξ4 − 3σ⁴)/σ⁴,  βκϖ(µξ4 − 3σ⁴)/σ²,  −β³ϖ²(µξ4 − 3σ⁴)/σ² ]
[ ·,  ·,  ·,  κ²(µξ4 − 3σ⁴),  −β²κϖ(µξ4 − 3σ⁴) ]
[ ·,  ·,  ·,  ·,  β⁴ϖ²(µξ4 − 3σ⁴) ]

C₃ = (1/n) ×
[ 0,  µβκµδ3/σ²,  −βκµδ3/σ²,  κµδ3,  β²κµδ3 ]
[ ·,  µ²β²κ²e₁/σ⁴ − 2µβ²κµδ3/σ²,  β²κµδ3/σ² − µβ²κ²e₁/σ⁴,  µβκ²e₁/σ² − βκµδ3,  µε3 − β³κµδ3 + µβ³κ²e₁/σ² ]
[ ·,  ·,  β²κ²e₁/σ⁴,  −βκ²e₁/σ²,  −β³κ²e₁/σ² ]
[ ·,  ·,  ·,  κ²e₁,  β²κ²e₁ ]
[ ·,  ·,  ·,  ·,  β⁴κ²e₁ + e₂ ]

Ratio of the error variances λ = σε²/σδ² known. The variance–covariance matrix for µ̃, α̃, β̃, σ̃² and σ̃δ² is required; rows and columns are ordered (µ̃, α̃, β̃, σ̃², σ̃δ²).

A₄ = (1/n) ×
[ U,  −βσδ²,  0,  0,  0 ]
[ ·,  µ²|Σ|/σ⁴ + V,  −µ|Σ|/σ⁴,  2µβ|Σ|/((β² + λ)σ²),  0 ]
[ ·,  ·,  |Σ|/σ⁴,  −2β|Σ|/((β² + λ)σ²),  0 ]
[ ·,  ·,  ·,  2σ⁴ + 4|Σ|/(β² + λ),  −2σδ²σε²/(β² + λ) ]
[ ·,  ·,  ·,  ·,  2σδ⁴ ]

B₄ = (1/n) ×
[ 0,  0,  0,  µξ3,  0 ]
[ ·,  0,  0,  0,  0 ]
[ ·,  ·,  0,  0,  0 ]
[ ·,  ·,  ·,  µξ4 − 3σ⁴,  0 ]
[ ·,  ·,  ·,  ·,  0 ]

C₄ = (1/n) ×
[ 0,  µλβµδ3/((β² + λ)σ²),  −λβµδ3/((β² + λ)σ²),  λµδ3/(β² + λ),  β²µδ3/(β² + λ) ]
[ ·,  µ²β²(λ²e₁ + e₂)/((β² + λ)²σ⁴) − 2µβe₃/((β² + λ)σ²),  βe₃/((β² + λ)σ²) − µβ²(λ²e₁ + e₂)/((β² + λ)²σ⁴),  µβ(λ²e₁ + e₂)/((β² + λ)²σ²) − e₃/(β² + λ),  (µε3 − β³µδ3)/(β² + λ) + µβ(λβ²e₁ − e₂)/((β² + λ)²σ²) ]
[ ·,  ·,  β²(λ²e₁ + e₂)/((β² + λ)²σ⁴),  −β(λ²e₁ + e₂)/((β² + λ)²σ²),  β(e₂ − λβ²e₁)/((β² + λ)²σ²) ]
[ ·,  ·,  ·,  (λ²e₁ + e₂)/(β² + λ)²,  (λβ²e₁ − e₂)/(β² + λ)² ]
[ ·,  ·,  ·,  ·,  (β⁴e₁ + e₂)/(β² + λ)² ]
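Reading off the third diagonal elements of A₁–A₄ gives the leading-order slope variances compared in the simulation study of Section 5. A small sketch, transcribed from the matrices above (the helper name is hypothetical):

def n_var_slope(beta, sigma2, sigma_d2, sigma_e2):
    # n*Var[beta~] for Cases 1-4 under the normal structural model,
    # read off the third diagonal elements of A1-A4.
    det = (beta**2 * sigma2 * sigma_d2 + sigma2 * sigma_e2
           + sigma_d2 * sigma_e2)                # |Sigma|
    return {1: (det + 2.0 * beta**2 * sigma_d2**2) / sigma2**2,
            2: (beta**2 * det + 2.0 * sigma_e2**2) / (beta**2 * sigma2**2),
            3: det / sigma2**2,
            4: det / sigma2**2}

Note that under the normal structural model the slope variances for the κ-known and λ-known cases coincide at leading order; they differ once the B correction is applied, since only the κ-known case receives a B adjustment to Var[β̃]. For the parameter settings used in Section 5 (β = 1, σ² = 4, σδ² = σε² = 1) the function returns 0.6875, 0.6875, 0.5625 and 0.5625.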
4.3. Description and patterns

There are some common patterns which run through the variance–covariance matrices. For each of the A matrices, for example,

Var[µ̃] = U/n,  (11)
Var[α̃] = µ²Var[β̃] + V/n,  (12)

and

Cov[µ̃, α̃] = −βσδ²/n.  (13)
µ̃ is also uncorrelated with β̃ and with the variance estimators. The variances of the slope estimators are different in each case. These are likely to be of greatest interest to the practitioner; they are easily read off the matrices A, B and C in each case as the third diagonal element. Patterns between the rows and columns of the A matrices were also reported in [17].

As can be seen, the matrix B, reflecting skewness and kurtosis in the distribution of ξ, is generally sparse, although the B₃ matrix is more complicated. The case of the reliability ratio known is the only one where an adjustment from the B matrix has to be made to Var[β̃]. In this case, and in all the others described in this paper, the B matrices make no adjustment to Eqs. (11)–(13), since the corresponding elements of the matrices are all zero. For the cases other than the reliability ratio κ known, there are corrections only to Var[σ̃²] and Cov[µ̃, σ̃²]. No adjustments are made from the B matrices to the variances and covariances of the estimators of σδ² and σε².

The C matrices share a common pattern. Entries involving only β̃, σ̃², σ̃δ² and σ̃ε² are affected by the kurtosis of δ and ε but not by their skewness; Var[σ̃δ²] and Var[σ̃ε²] in particular depend on the excess kurtosis of the errors. Covariances involving µ̃ depend on the skewness of δ but not on kurtosis, whilst entries involving α̃ pick up both skewness and kurtosis terms. C₄ has a much more complicated structure than the other C matrices, since the composite term e₃ mixes the skewnesses of the two error distributions; but, as discussed earlier, the C matrix is likely to be of least interest to practitioners, and knowledge about both error variances (as is needed to specify λ) is less likely to be available than knowledge of just one.

5. The simulation study

This paper is completed with a simulation study, with the aim of offering practical guidance as to the minimum sample size needed to use the asymptotic variance–covariance forms presented in this paper. Detailed simulations of the sampling distributions of the estimators (6)–(10), using each of the slope estimators discussed in this paper, were included in [17,18] and are thus not repeated here. Additional simulations (for example, simulations for non-normally distributed measurement errors) are described in [14] and are available from the author; they are not included here for brevity. This section will concentrate on the accuracy of the expressions derived for the variance–covariance matrices. The variance of the slope β is a key component of many entries of the variance–covariance matrices, and so the simulations here will concentrate on estimating this variance.

Fig. 1 compares theoretical and sample variances of the different slope estimators for a linear errors in variables model, with ξ uniform on the interval (1 − 2√3, 1 + 2√3) and with normally distributed errors following the error laws described in Section 2, under varying sample sizes. The σ² used was therefore 4. The other parameter settings used were α = 0, β = 1, σδ² = 1 and σε² = 1. These settings give a value of λ = 1, and a reliability ratio of κ = 0.8, a value that imparts a significant degree of uncertainty in the measurement of the latent variable ξ.
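This set-up is straightforward to replicate. A minimal sketch for the Case 1 estimator (illustrative only; the replication count is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(1)

def beta1_hat(n):
    # One replicate of the simulation set-up above, returning the
    # Case 1 slope estimate (sigma_delta^2 = 1 assumed known).
    xi = rng.uniform(1.0 - 2.0 * np.sqrt(3.0), 1.0 + 2.0 * np.sqrt(3.0), n)
    x = xi + rng.normal(size=n)          # sigma_d2 = 1
    y = xi + rng.normal(size=n)          # alpha = 0, beta = 1, sigma_e2 = 1
    sxy = ((x - x.mean()) * (y - y.mean())).mean()
    return sxy / (x.var() - 1.0)

n = 100
betas = np.array([beta1_hat(n) for _ in range(20000)])
print(n * betas.var())   # should be close to n Var[beta~1] = 0.6875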
The values of the theoretical variances for this particular construction of the structural model were nVar[β̃₁] = 0.6875, nVar[β̃₂] = 0.6875, nVar[β̃₃] = 0.5145 and nVar[β̃₄] = 0.5625. These theoretical variances differ slightly amongst the cases; they could only be made equal by having different parameters for each case, making comparisons more confusing. As ξ is considered to be a random variable that follows a uniform distribution, the corrections given by the B matrix must be made.
300
J. Gillard / Statistical Methodology 8 (2011) 291–303
Fig. 1. Variance of the sampling distribution of β̃ and theoretical asymptotic variance of β̃, under varying sample sizes, for each slope estimator: (a) β̃₁, (b) β̃₂, (c) β̃₃, (d) β̃₄.
By using the matrices of the previous section, a deduction of 0.048 from the normal-structural-model variance of β̃₃ must be carried out; this yields nVar[β̃₃] = 0.5145. β̃₃ has the smallest theoretical variance, with β̃₄ close to this value. The theoretical expressions for the variance of β̃ tend to give larger values than the sample variances, although the difference diminishes as n grows larger. The estimated theoretical variances are indistinguishable from the true theoretical variances, and so have been omitted. It can be seen in Fig. 1 that there is close agreement between the theoretical and sample variances of the slope estimators. The author suggests that for sample sizes of 50 or more the disagreement is so small as to be of no practical significance. Examples using these results are included in [15,14], as well as [17].

6. Conclusion

In this paper a clear methodology for the fitting of a straight line in an errors in variables model has been presented. The method of moments approach for the estimation of the parameters, together with the corresponding delta method for the asymptotic variance–covariance matrices, provides a transparent methodology for the errors in variables problem. Very few papers present variance–covariance matrices in the manner of this paper, and under certain parametric assumptions the simplified matrices presented here reduce to the expressions previously presented in the scientific literature, namely [3,22,24,18]. Simplification of previously given formulae has also been provided. A minimum sample size of 50 is recommended for the asymptotic variance–covariance matrices to be used with a high degree of precision.

Acknowledgements

The author would like to thank Terence Iles for his help in constructing and revising the paper. The author also acknowledges the helpful comments of two anonymous referees and an associate editor, which led to an improved paper.
Appendix. The delta method

For a function f of p sample moments, x₁, . . . , xp,

Var[f(x₁, . . . , xp)] ≈ ∇ᵀV∇,

where

∇ᵀ = (∂f/∂x₁, . . . , ∂f/∂xp)

is the vector of derivatives with each sample moment substituted by its expected value (as given by Eqs. (1)–(5)), and

V = [ Var[x₁]        · · ·   Cov[x₁, xp] ]
    [    ⋮             ⋱          ⋮      ]
    [ Cov[x₁, xp]    · · ·   Var[xp]     ]

is the p × p matrix containing the variances of, and covariances between, the sample moments. Covariances between functions of sample moments can be derived in a similar manner. The function f is assumed to be continuous in a neighbourhood of the p sample moments, with continuous first and second order derivatives there. A full proof of this result is contained in [9], and is thus not replicated here.

In practice, in order to apply the delta method to create the asymptotic variance–covariance matrices for the linear structural model, one needs to compute the variances of, and covariances between, each of the sample moments that occur in Section 3. An example derivation is included in the following subsection.

A.1. Derivation of Cov[sxx, sxy]

This subsection shows the derivation of Cov[sxx, sxy] for the linear structural model of Section 2.

E[sxx sxy] = (1/n²) E[ Σᵢ (xi − x̄)² × Σᵢ (xi − x̄)(yi − ȳ) ].
It can be shown that (xi − x̄) = (ξi − ξ̄) + (δi − δ̄) and (yi − ȳ) = β(ξi − ξ̄) + (εi − ε̄). Substituting these into the above summation, and multiplying out, leads to

E[sxx sxy] ≈ (1/n²) [ n(βµξ4 + βσ²σδ² + 2βσ²σδ²) + n(n − 1)(βσ⁴ + βσ²σδ²) ].

Hence,

Cov[sxx, sxy] = E[sxx sxy] − E[sxx]E[sxy] ≈ [β(µξ4 − σ⁴) + 2βσ²σδ²]/n.
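This approximation is easily checked by simulation. A minimal sketch, assuming normally distributed ξ and errors and arbitrary example parameter values:

import numpy as np

rng = np.random.default_rng(2)
n, beta, sigma2, sigma_d2 = 100, 1.0, 4.0, 1.0

sxx_vals, sxy_vals = [], []
for _ in range(20000):
    xi = rng.normal(0.0, np.sqrt(sigma2), n)
    x = xi + rng.normal(0.0, np.sqrt(sigma_d2), n)
    y = beta * xi + rng.normal(0.0, 1.0, n)
    sxx_vals.append(x.var())
    sxy_vals.append(((x - x.mean()) * (y - y.mean())).mean())

mu_xi4 = 3.0 * sigma2**2               # fourth central moment of normal xi
theory = (beta * (mu_xi4 - sigma2**2) + 2.0 * beta * sigma2 * sigma_d2) / n
print(np.cov(sxx_vals, sxy_vals)[0, 1], theory)   # the two should be close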
All the variances and covariances of sample moments needed to compute the asymptotic variance–covariance matrices for the linear structural model outlined in Section 2 are included in [14]. A MAPLE programme for manipulating algebraic expressions within the variance–covariance matrices is available from the author. Further use of the delta method allows computation of general formulae of the variances and covariances of the estimated parameters. The following subsection contains an example.
A.2. Derivation of Var[α̃]

A formula such as the one derived in this subsection may be derived for each element of the variance–covariance matrix for the linear structural model.

Var[α̃] = Var[ȳ − β̃x̄]
≈ (∂α̃/∂ȳ)² Var[ȳ] + (∂α̃/∂β̃)² Var[β̃] + (∂α̃/∂x̄)² Var[x̄]
  + 2(∂α̃/∂ȳ)(∂α̃/∂β̃) Cov[ȳ, β̃] + 2(∂α̃/∂x̄)(∂α̃/∂β̃) Cov[x̄, β̃] + 2(∂α̃/∂x̄)(∂α̃/∂ȳ) Cov[x̄, ȳ]
≈ µ²Var[β̃] + (β²σδ² + σε²)/n + 2µ(β Cov[x̄, β̃] − Cov[ȳ, β̃]).
For each β̃ given in Section 3, Cov[x̄, β̃] and Cov[ȳ, β̃] may be derived by further applications of the delta method. Again, the complete list of general formulae needed to populate each element of the asymptotic variance–covariance matrices is included in [14]. The formula derived above is a generalization of that derived by [18], who solely considered asymptotic variance–covariance matrices for the normal structural model (as given in Section 2). Indeed, if the normal structural model applies, then since β̃ is a function only of second-order (or higher) moments, β̃ is statistically independent of the first-order sample moments. As a result Cov[x̄, β̃] = Cov[ȳ, β̃] = 0 and the formula derived above collapses to that suggested by Hood et al. [18].

References

[1] R.J. Adcock, A problem in least squares, The Analyst 5 (2) (1878) 53–54.
[2] M.G. Akritas, M.A. Bershady, Linear regression for astronomical data with measurement errors and intrinsic scatter, Astrophysical Journal 470 (1996) 706–714.
[3] H. Bolfarine, L.K. Cordani, Estimation of a structural linear regression model with a known reliability ratio, Annals of the Institute of Statistical Mathematics 45 (3) (1993) 531–540.
[4] K.O. Bowman, L.R. Shenton, Method of moments, in: Encyclopedia of Statistical Sciences, vol. 5, John Wiley & Sons, Canada, 1985.
[5] R.J. Carroll, K. Roeder, L. Wasserman, Flexible measurement error models, Biometrics 55 (1) (1999) 44–54.
[6] R.J. Carroll, D. Ruppert, L.A. Stefanski, Measurement Error in Nonlinear Models, Chapman & Hall, London, 1995.
[7] C.L. Cheng, J. Riu, On estimating linear relationships when both variables are subject to heteroscedastic measurement errors, Technometrics 48 (4) (2006) 511–519.
[8] C.L. Cheng, J.W. Van Ness, Statistical Regression with Measurement Error, in: Kendall's Library of Statistics, vol. 6, Arnold, London, 1999.
[9] H. Cramer, Mathematical Methods of Statistics, in: Princeton Mathematical Series, vol. 9, Princeton University Press, Princeton, NJ, 1946.
[10] M.H. DeGroot, Probability and Statistics, Addison-Wesley, Menlo Park, CA, 1989.
[11] E.J. Dudewicz, S.N. Mishra, Modern Mathematical Statistics, John Wiley & Sons, New York, 1988.
[12] G. Dunn, Statistical Evaluation of Measurement Errors, Arnold, London, 2004.
[13] W.A. Fuller, Measurement Error Models, in: Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, New York, 1987.
[14] J.W. Gillard, Errors in variables regression: what is the appropriate model?, Ph.D. Thesis, Cardiff University, 2008.
[15] J.W. Gillard, T.C. Iles, Methods of fitting straight lines where both variables are subject to measurement error, Current Clinical Pharmacology 4 (3) (2009) 164–171.
[16] P. Gustafson, Measurement Error and Misclassification in Statistics and Epidemiology: Impacts and Bayesian Adjustments, Chapman & Hall, London, 2004.
[17] K. Hood, Some statistical aspects of method comparison studies, Ph.D. Thesis, Cardiff University, 1998.
[18] K. Hood, A.B.J. Nix, T.C. Iles, Asymptotic information and variance–covariance matrices for the linear structural model, Journal of the Royal Statistical Society, Series D 48 (4) (1999) 477–493.
[19] B.C. Kelly, Some aspects of measurement error in linear regression of astronomical data, The Astrophysical Journal 665 (2007) 1489–1506.
[20] M.G. Kendall, A. Stuart, The Advanced Theory of Statistics, Volume Two, Charles Griffin and Co. Ltd., London, 1973.
[21] C.H. Kummel, Reduction of observed equations which contain more than one observed quantity, The Analyst 6 (1879) 97–105.
[22] M. Patefield, Fitting non-linear structural relationships using SAS procedure NLMIXED, Journal of the Royal Statistical Society, Series D 51 (3) (2002) 355–366.
[23] A.G. Patriota, H. Bolfarine, M. de Castro, A heteroscedastic structural errors-in-variables model with equation error, Statistical Methodology 6 (2009) 408–423.
[24] M.Y. Wong, Likelihood estimation of a simple linear regression model when both variables have error, Biometrika 76 (1) (1989) 141–148.
[25] A. Zellner, An Introduction to Bayesian Inference in Econometrics, in: Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, New York, 1971.