An expectation–maximization scheme for measurement error models

Statistics and Probability Letters 120 (2017) 61–68

Anindya Bhadra
Department of Statistics, Purdue University, 250 N. University Street, West Lafayette, IN 47907-2066, United States

Article history: Received 28 October 2015; Received in revised form 7 September 2016; Accepted 8 September 2016; Available online 21 September 2016.

Keywords: Expectation–maximization; Maximum likelihood; Measurement error; Splines

Abstract. We treat the problem of maximum likelihood estimation in measurement error models. Direct maximization of the analytically intractable likelihood in such models is difficult. We derive an efficient expectation–maximization scheme for truncated polynomial spline models of degree one. Simulation results confirm the effectiveness of the proposed method.

1. Introduction

There is a considerable literature on nonparametric regression problems where the covariates are measured with error. A semi-parametric framework is described by Carroll et al. (1999), who use the SIMEX method of Cook and Stefanski (1994). Alternatively, a fully Bayesian framework is described by Berry et al. (2002), who find marked performance improvements in simulations over the SIMEX-based method of Carroll et al. (1999) and the iterated conditional modes (ICM) approach of Besag (1986). The Bayesian approach of Berry et al. (2002) relies on a Metropolis–Hastings algorithm, which can severely restrict its applicability due to poor mixing and the need to carefully tune the proposal distribution. Bhadra and Carroll (2016) show that for the special case of degree one truncated polynomial and B-spline models, it is possible to design a Gibbs sampler, thereby eliminating the problems associated with the Metropolis–Hastings step in Berry et al. (2002).

Despite this prior work on semi-parametric and Bayesian estimation procedures, a maximum likelihood estimation (mle) technique for these models has not been studied carefully. This is in part due to the intractability of the likelihood, which has no closed form and does not lend itself to easy maximization. Using recent results of Bhadra and Carroll (2016), we show that for the case of degree one truncated polynomial splines it is possible to formulate an efficient expectation–maximization (E–M) algorithm (Dempster et al., 1977) that permits finding the mle in these models.

2. The likelihood for measurement error models

Following Carroll et al. (1999), consider the measurement error model:

Y_i = g(X_i) + ϵ_i,   (1)
W_{ij} = X_i + U_{ij},   j = 1, ..., m_i,  i = 1, ..., n,   (2)


where n is the sample size and m_i > 1 is the number of replicates for sample i, i = 1, ..., n. Let (Y_i, W_{i1}, ..., W_{im_i}) for i = 1, ..., n be the observed data and let X_i be latent Gaussian random variables with mean µ_x and variance σ_x². Let ϵ_i be independent and identically distributed normal random variables with mean 0 and variance σ_ϵ²; let U_{ij} be independent and identically distributed normal random variables with mean 0 and variance σ_u²; and let U_{ij} and ϵ_i be uncorrelated. Let g(·) denote the true, unknown mean function, which is assumed to be smooth. A useful technique for modeling g(·) to a desired degree of smoothness is via penalized truncated polynomial spline (P-spline) bases (Eilers and Marx, 1996). Thus, consider the following hierarchical model, for k = 1, ..., K:

Y_i = β_0 + β_1 X_i + Σ_{k=1}^K B_k(X_i) θ_k + ε_i,   ε_i ~ Normal(0, σ_ε²),   (3)
W_{ij} = X_i + U_{ij},   U_{ij} ~ Normal(0, σ_u²),   (4)
X_i ~ Normal(µ_x, σ_x²).   (5)

Our goal is to find the mle of the parameters Ψ = (β_0, β_1, θ_1, ..., θ_K, σ_ε², σ_u², µ_x, σ_x²). Define M = Σ_{i=1}^n m_i, W_{i•} = Σ_{j=1}^{m_i} W_{ij} and W̄_{i•} = W_{i•}/m_i. If X_i were observed, this would be a problem of finding the mle in a model with a normal likelihood, which is simple. In an E–M scheme, the difficulty lies in computing the conditional expectation of the complete data log likelihood given the observed data. We show that this has an analytically tractable form for degree one truncated polynomial splines, for which B_k(X_i) = (X_i − ω_k)_+, where ω_1, ..., ω_K are the known knot points. Denote by Θ the (K + 2)-dimensional parameter vector (β_0, β_1, θ_1, ..., θ_K)^T and define the n × (K + 2) matrix Z as

Z = [ 1   X_1   B_1(X_1)   ···   B_K(X_1)
      ⋮    ⋮       ⋮        ⋱       ⋮
      1   X_n   B_1(X_n)   ···   B_K(X_n) ].
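To fix ideas, here is a minimal Python/NumPy sketch (an illustration added for this edited version, not code from the paper; the function names are hypothetical) of the degree one truncated polynomial basis B_k(x) = (x − ω_k)_+ and the resulting design matrix Z.

```python
import numpy as np

def truncated_linear_basis(x, knots):
    """Degree one truncated polynomial basis: B_k(x) = (x - omega_k)_+."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    return np.maximum(x[:, None] - knots[None, :], 0.0)

def design_matrix(x, knots):
    """Build the n x (K + 2) matrix Z = [1, x, B_1(x), ..., B_K(x)]."""
    x = np.asarray(x, dtype=float)
    B = truncated_linear_basis(x, knots)
    return np.column_stack([np.ones_like(x), x, B])

# Small usage example with arbitrary values.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
knots = np.array([-1.0, 0.0, 1.0])
print(design_matrix(x, knots).shape)  # (5, 5), since K + 2 = 5 here
```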

The log likelihood of the complete data in Eqs. (3)–(5) is given by

L = −(2σ_ε²)^{-1} (Y − ZΘ)^T (Y − ZΘ) − (n/2) log(σ_ε²)
    − (2σ_u²)^{-1} Σ_{i=1}^n Σ_{j=1}^{m_i} (W_{ij} − X_i)² − (M/2) log(σ_u²)
    − (2σ_x²)^{-1} Σ_{i=1}^n (X_i − µ_x)² − (n/2) log(σ_x²) + constant,

where "constant" is a collection of all terms that are independent of the data as well as the model parameters. We need to impute the missing data in the complete data log likelihood. Expanding the log likelihood, except for an irrelevant constant,

L = −(2σ_ε²)^{-1} (Y^T Y − Θ^T Z^T Y − Y^T ZΘ + Θ^T Z^T ZΘ) − (n/2) log(σ_ε²)
    − (2σ_u²)^{-1} Σ_{i=1}^n Σ_{j=1}^{m_i} (W_{ij} − X_i)² − (M/2) log(σ_u²)
    − (2σ_x²)^{-1} Σ_{i=1}^n (X_i − µ_x)² − (n/2) log(σ_x²).

We have Y^T Y = Σ_{i=1}^n Y_i². We also have

Θ^T Z^T Y = Y^T ZΘ = β_0 Σ_{i=1}^n Y_i + β_1 Σ_{i=1}^n X_i Y_i + Σ_{i=1}^n Σ_{k=1}^K θ_k B_k(X_i) Y_i,

and

Θ^T Z^T ZΘ = Σ_{i=1}^n ( β_0 + β_1 X_i + Σ_{k=1}^K θ_k B_k(X_i) )².

Thus, the log likelihood is

L = −(2σ_ε²)^{-1} Σ_{i=1}^n { Y_i² − 2β_0 Y_i − 2β_1 X_i Y_i − 2Y_i Σ_{k=1}^K θ_k B_k(X_i) + ( β_0 + β_1 X_i + Σ_{k=1}^K θ_k B_k(X_i) )² }
    − (n/2) log(σ_ε²) − (2σ_u²)^{-1} Σ_{i=1}^n Σ_{j=1}^{m_i} (W_{ij} − X_i)² − (M/2) log(σ_u²)
    − (2σ_x²)^{-1} Σ_{i=1}^n (X_i − µ_x)² − (n/2) log(σ_x²).


Expanding further, we get

L = −(2σ_ε²)^{-1} Σ_{i=1}^n { Y_i² − 2β_0 Y_i − 2β_1 X_i Y_i − 2Y_i Σ_{k=1}^K θ_k B_k(X_i) + β_0² + β_1² X_i² + Σ_{k=1}^K θ_k² {B_k(X_i)}²
        + 2β_0 β_1 X_i + 2β_0 Σ_{k=1}^K θ_k B_k(X_i) + 2β_1 Σ_{k=1}^K θ_k X_i B_k(X_i) + 2 Σ_{k=1}^K Σ_{j=1}^{k−1} θ_k θ_j B_k(X_i) B_j(X_i) }
    − (n/2) log(σ_ε²) − (2σ_u²)^{-1} Σ_{i=1}^n { Σ_{j=1}^{m_i} W_{ij}² + m_i X_i² − 2W_{i•} X_i } − (M/2) log(σ_u²)
    − (2σ_x²)^{-1} { Σ_{i=1}^n X_i² + nµ_x² − 2µ_x Σ_{i=1}^n X_i } − (n/2) log(σ_x²),   (6)

as the complete data log likelihood.
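The compact matrix form of L given earlier is convenient for a direct numerical check of the expansion in Eq. (6). The following Python sketch (added for illustration and not part of the paper; it assumes a balanced design with m_i = m replicates stored as an n × m array, and all names are hypothetical) evaluates the complete data log likelihood up to the additive constant.

```python
import numpy as np

def complete_data_loglik(Y, W, X, beta0, beta1, theta, knots,
                         sig2_eps, sig2_u, mu_x, sig2_x):
    """Complete data log likelihood of Eqs. (3)-(5), up to an additive constant.

    Y, X: (n,) arrays; W: (n, m) array of replicates (balanced design assumed).
    """
    B = np.maximum(X[:, None] - np.asarray(knots)[None, :], 0.0)  # B_k(X_i)
    Z = np.column_stack([np.ones_like(X), X, B])                  # n x (K + 2)
    Theta = np.concatenate([[beta0, beta1], np.asarray(theta)])
    resid = Y - Z @ Theta
    n, M = len(Y), W.size
    return (-0.5 / sig2_eps * resid @ resid - 0.5 * n * np.log(sig2_eps)
            - 0.5 / sig2_u * np.sum((W - X[:, None]) ** 2) - 0.5 * M * np.log(sig2_u)
            - 0.5 / sig2_x * np.sum((X - mu_x) ** 2) - 0.5 * n * np.log(sig2_x))

# Tiny usage example with arbitrary values (K = 2 knots).
rng = np.random.default_rng(0)
X = rng.normal(size=4)
Y = rng.normal(size=4)
W = X[:, None] + rng.normal(scale=0.5, size=(4, 2))
print(complete_data_loglik(Y, W, X, 0.0, 1.0, [0.2, -0.1],
                           [-0.5, 0.5], 0.1, 0.25, 0.0, 1.0))
```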

Clearly, in order to compute the observed data log likelihood, one needs to integrate out the latent X_i from the complete data log likelihood, an exercise that proves to be difficult. This renders a direct maximization of the observed data log likelihood untenable. However, in the next section we show that it is possible to impute the latent X_i in the complete data log likelihood, making an E–M scheme for finding the mle feasible.

3. The proposed E–M algorithm

Adopt the convention ω_0 = −∞ and ω_{K+1} = ∞. Our derivation of the E–M algorithm is based on the following result, proved by Bhadra and Carroll (2016), who showed that for degree one truncated polynomial splines the latent X_i given the observed data is distributed as a mixture of doubly truncated normal random variables, truncated at the knot points.

Proposition 1 (Bhadra and Carroll, 2016). Define

ζ_{1iL} = { (σ_ε²)^{-1} (β_1 + Σ_{ℓ=1}^L θ_ℓ)² + m_i/σ_u² + 1/σ_x² }^{-1};

ζ_{2iL} = −(2σ_ε²)^{-1} { −2β_1 Y_i − 2Y_i Σ_{ℓ=1}^L θ_ℓ + 2 (β_0 − Σ_{ℓ=1}^L θ_ℓ ω_ℓ)(β_1 + Σ_{ℓ=1}^L θ_ℓ) } + (W_{i•}/σ_u² + µ_x/σ_x²);

ζ_{3iL} = −(2σ_ε²)^{-1} { 2Y_i Σ_{ℓ=1}^L θ_ℓ ω_ℓ + (β_0 − Σ_{ℓ=1}^L θ_ℓ ω_ℓ)² }.

Further define

b_{iL} = (ω_{L+1} − ζ_{1iL} ζ_{2iL}) / √ζ_{1iL};   a_{iL} = (ω_L − ζ_{1iL} ζ_{2iL}) / √ζ_{1iL};

D_i = [ Σ_{L=0}^K {Φ(b_{iL}) − Φ(a_{iL})} exp{ (1/2) log(ζ_{1iL}) + ζ_{3iL} + ζ_{1iL} (ζ_{2iL})²/2 } ]^{-1};

p_{iL} = D_i {Φ(b_{iL}) − Φ(a_{iL})} exp{ (1/2) log(ζ_{1iL}) + ζ_{3iL} + ζ_{1iL} (ζ_{2iL})²/2 }
       = D_i √ζ_{1iL} {Φ(b_{iL}) − Φ(a_{iL})} exp{ ζ_{3iL} + ζ_{1iL} (ζ_{2iL})²/2 },

where Φ(·) denotes the standard normal distribution function. Then the complete conditional distribution of X_i is a mixture of truncated normal random variables on the intervals [ω_L, ω_{L+1}] for L = 0, ..., K, with means µ_{iL} = ζ_{1iL} ζ_{2iL}, variances σ_{iL}² = ζ_{1iL} and mixing probabilities p_{iL}.

The intuitive explanation for this result is that if a degree one spline is used, then the only terms that enter the log likelihood are linear and quadratic in the latent X_i; no higher order terms in X_i appear. Proposition 1 then follows, essentially via a "completing the square" argument. Using the above result, Bhadra and Carroll (2016) went on to develop a Gibbs sampler to perform Bayesian analysis for the model specified in Eqs. (3)–(5). However, it is possible to use the same result to develop an E–M algorithm to find the mle as well. Since the complete conditional distribution of X_i is available from Proposition 1, the E-step reduces to imputing the required conditional expectations in the complete data log likelihood of Eq. (6). Since the complete data likelihood is just a normal likelihood, maximizing it is considerably easier than maximizing the observed data log likelihood, which has no closed form. We detail the E and M steps in Algorithm 1. Steps (E.1)–(E.2) and (M.1)–(M.2) are repeated until convergence. A detailed derivation is provided in the Appendix.
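To make Proposition 1 concrete, the following Python sketch (an added illustration, not code from the paper; function and variable names are hypothetical) computes the mixture quantities p_{iL}, µ_{iL} and σ_{iL}² for a single observation i, as needed in step (E.1) of Algorithm 1 below. The linear coefficient ζ_{2iL} is coded in an algebraically simplified but equivalent form.

```python
import numpy as np
from scipy.stats import norm

def mixture_params_single(y_i, w_sum_i, m_i, beta0, beta1, theta, knots,
                          sig2_eps, sig2_u, mu_x, sig2_x):
    """p_{iL}, mu_{iL}, sigma_{iL}^2 of Proposition 1 for one observation.

    w_sum_i is W_{i.} = sum_j W_{ij}; knots = (omega_1, ..., omega_K).
    Returns arrays of length K + 1, indexed by L = 0, ..., K.
    """
    theta, knots = np.asarray(theta, float), np.asarray(knots, float)
    edges = np.concatenate([[-np.inf], knots, [np.inf]])        # omega_0, ..., omega_{K+1}
    cum_theta = np.concatenate([[0.0], np.cumsum(theta)])       # sum_{l <= L} theta_l
    cum_theta_omega = np.concatenate([[0.0], np.cumsum(theta * knots)])
    slope = beta1 + cum_theta                    # beta_1 + sum theta_l
    intercept = beta0 - cum_theta_omega          # beta_0 - sum theta_l omega_l
    zeta1 = 1.0 / (slope**2 / sig2_eps + m_i / sig2_u + 1.0 / sig2_x)
    zeta2 = slope * (y_i - intercept) / sig2_eps + w_sum_i / sig2_u + mu_x / sig2_x
    zeta3 = -(2.0 * y_i * cum_theta_omega + intercept**2) / (2.0 * sig2_eps)
    mu, sd = zeta1 * zeta2, np.sqrt(zeta1)
    a = (edges[:-1] - mu) / sd                   # a_{iL}
    b = (edges[1:] - mu) / sd                    # b_{iL}
    log_w = (np.log(norm.cdf(b) - norm.cdf(a) + 1e-300)
             + 0.5 * np.log(zeta1) + zeta3 + zeta1 * zeta2**2 / 2.0)
    w = np.exp(log_w - log_w.max())              # stabilize before normalizing by D_i
    return w / w.sum(), mu, zeta1                # p_{iL}, mu_{iL}, sigma_{iL}^2

# Usage with arbitrary parameter values and K = 2 knots.
p, mu, var = mixture_params_single(0.5, 0.2, 2, 0.0, 1.0, [0.1, -0.2],
                                   [-0.5, 0.5], 0.1, 0.3, 0.0, 1.0)
```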


Algorithm 1: E–M algorithm for measurement error models

Data: {Y_i, W_{ij}}, for j = 1, ..., m_i and i = 1, ..., n.
Parameters: Knot points ω_1, ..., ω_K; number of iterations T; initial parameters Ψ̂^(0).
Result: Maximum likelihood estimate Ψ̂ of Ψ = (β_0, β_1, θ_1, ..., θ_K, σ_ε², σ_u², µ_x, σ_x²).

// E–M iteration
for t = 0 to T − 1 do
    // (t + 1)th E-step
    (E.1) Compute p_{iL}, µ_{iL}, σ_{iL}² for L = 0, ..., K by Proposition 1 with Ψ = Ψ̂^(t).
    (E.2) For i = 1, ..., n and j, k = 1, ..., K, set
        E[X_i] = Σ_{L=0}^K p_{iL} µ_{iL},
        E[X_i²] = Σ_{L=0}^K p_{iL} (µ_{iL}² + σ_{iL}²),
        E[B_k(X_i)] = Σ_{L=k}^K p_{iL} µ_{iL} − ω_k Σ_{L=k}^K p_{iL},
        E[X_i B_k(X_i)] = Σ_{L=k}^K p_{iL} (µ_{iL}² + σ_{iL}²) − ω_k Σ_{L=k}^K p_{iL} µ_{iL},
        E[{B_k(X_i)}²] = Σ_{L=k}^K p_{iL} (µ_{iL}² + σ_{iL}²) − 2ω_k Σ_{L=k}^K p_{iL} µ_{iL} + ω_k² Σ_{L=k}^K p_{iL},
        E[B_k(X_i) B_j(X_i)] = Σ_{L=max{k,j}}^K p_{iL} (µ_{iL}² + σ_{iL}²) − (ω_k + ω_j) Σ_{L=max{k,j}}^K p_{iL} µ_{iL} + ω_k ω_j Σ_{L=max{k,j}}^K p_{iL}.
    // (t + 1)th M-step
    (M.1) Set
        β̂_0^(t+1) = Σ_{i=1}^n { Y_i − β̂_1^(t) E[X_i] − Σ_{k=1}^K θ̂_k^(t) E[B_k(X_i)] } / n,
        β̂_1^(t+1) = Σ_{i=1}^n { Y_i E[X_i] − β̂_0^(t+1) E[X_i] − Σ_{k=1}^K θ̂_k^(t) E[X_i B_k(X_i)] } / Σ_{i=1}^n E[X_i²],
        θ̂_k^(t+1) = Σ_{i=1}^n { Y_i E[B_k(X_i)] − β̂_0^(t+1) E[B_k(X_i)] − β̂_1^(t+1) E[X_i B_k(X_i)] − Σ_{j=1}^{k−1} θ̂_j^(t+1) E[B_j(X_i) B_k(X_i)] } / Σ_{i=1}^n E[{B_k(X_i)}²], for all k,
        {σ̂_ε²}^(t+1) = A_{σ_ε²}^(t+1) / n, where A_{σ_ε²}^(t+1) is as defined in Appendix A.2,
        {σ̂_u²}^(t+1) = A_{σ_u²}^(t+1) / M, where A_{σ_u²}^(t+1) = Σ_{i=1}^n { Σ_{j=1}^{m_i} W_{ij}² + m_i E[X_i²] − 2W_{i•} E[X_i] },
        {σ̂_x²}^(t+1) = A_{σ_x²}^(t+1) / n, where A_{σ_x²}^(t+1) = Σ_{i=1}^n { E[X_i²] + {µ̂_x^(t)}² − 2µ̂_x^(t) E[X_i] },
        µ̂_x^(t+1) = Σ_{i=1}^n E[X_i] / n.
    (M.2) Set Ψ̂ = Ψ̂^(t+1) = {β̂_0, β̂_1, θ̂_1, ..., θ̂_K, σ̂_ε², σ̂_u², µ̂_x, σ̂_x²}^(t+1).
end
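The moment formulas in step (E.2) vectorize conveniently across observations. Below is a Python sketch (my own illustration, not from the paper; the array and function names are hypothetical) that computes the required conditional expectations from per-observation arrays of the mixture quantities produced in step (E.1); its outputs are the quantities consumed by step (M.1).

```python
import numpy as np

def e_step_moments(p, mu, var, knots):
    """Conditional moments of step (E.2).

    p, mu, var: (n, K + 1) arrays with columns indexed by L = 0, ..., K;
    knots: (K,) array. Returns the six expectation arrays of step (E.2).
    """
    knots = np.asarray(knots, float)
    K = len(knots)
    second = p * (mu**2 + var)                   # p_{iL} (mu_{iL}^2 + sigma_{iL}^2)
    EX = (p * mu).sum(axis=1)
    EX2 = second.sum(axis=1)

    def tail(a):
        """Column k-1 holds the suffix sum over L = k, ..., K."""
        c = np.cumsum(a[:, ::-1], axis=1)[:, ::-1]
        return c[:, 1:]

    S0, S1, S2 = tail(p), tail(p * mu), tail(second)
    EB = S1 - knots * S0                                   # E[B_k(X_i)]
    EXB = S2 - knots * S1                                  # E[X_i B_k(X_i)]
    EB2 = S2 - 2 * knots * S1 + knots**2 * S0              # E[{B_k(X_i)}^2]
    # E[B_k(X_i) B_j(X_i)], shape (n, K, K), via the max{k, j} suffix sums.
    idx = np.maximum(np.arange(K)[:, None], np.arange(K)[None, :])
    EBB = (S2[:, idx] - (knots[:, None] + knots[None, :]) * S1[:, idx]
           + np.outer(knots, knots) * S0[:, idx])
    return EX, EX2, EB, EXB, EB2, EBB
```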

4. Numerical examples

Our simulation example is similar to one of the examples of Bhadra and Carroll (2016). Let the true, nonlinear mean function be

g(x) = 3 sin(πx/2) / [1 + 2x² {I(x > 0) + 1}],

where I denotes the indicator function. We generate X_i from Normal(µ_x, σ_x²). To simulate data according to Eqs. (1)–(2), we use n = 500, m_i = 2 for all i, µ_x = 0, σ_x² = 1, σ_ϵ² = 0.09 and σ_u² = 0.33. Table 1 shows the true parameter values and the maximum likelihood estimates found by the proposed E–M algorithm (Algorithm 1). We use 25 knots and place them evenly on the sample quantiles of W_{i•}.

Table 1
Parameter estimates by the E–M algorithm.

Parameter    True value    mle
µ_x          0             −0.01
σ_x²         1             1.04
σ_u²         0.33          0.28
σ_ϵ²         0.09          0.11

We now provide some guidelines for choosing the initial values in the E–M algorithm. We use µ̂_x^(0) = W̄_{••} = Σ_{i=1}^n W_{i•} / Σ_{i=1}^n m_i. Since {W_{ij}} can be thought of as observations from a random one-factor ANOVA with n levels and a total of m_• = Σ_{i=1}^n m_i observations, quite straightforward initial estimates of σ_u² and σ_x² can be obtained. If we define

MSE = Σ_{i=1}^n Σ_{j=1}^{m_i} (W_{ij} − W̄_{i•})² / (m_• − n),
MSTR = Σ_{i=1}^n m_i (W̄_{i•} − W̄_{••})² / (n − 1),

then E(MSE) = σ_u² and E(MSTR) = c σ_x² + σ_u², where c = (m_• − Σ_{i=1}^n m_i²/m_•) / (n − 1). Thus, the initial estimates {σ̂_u²}^(0) and {σ̂_x²}^(0) may be taken as

{σ̂_u²}^(0) = MSE,   {σ̂_x²}^(0) = max{(MSTR − MSE)/c, α},

where α > 0 is a small constant, such as 0.01, that ensures a positive initial value for {σ̂_x²}^(0).
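For reference, the following Python sketch (added for illustration; it is not the author's MATLAB implementation, and the variable names are only indicative) simulates data according to Eqs. (1)–(2) with the g(x) above and computes the moment-based starting values just described, with α = 0.01 as suggested.

```python
import numpy as np

rng = np.random.default_rng(1)

def g(x):
    """True mean function of Section 4."""
    return 3.0 * np.sin(np.pi * x / 2.0) / (1.0 + 2.0 * x**2 * ((x > 0) + 1.0))

# Simulate data as in Section 4: n = 500 subjects, m_i = 2 replicates each.
n, m = 500, 2
mu_x, sig2_x, sig2_eps, sig2_u = 0.0, 1.0, 0.09, 0.33
X = rng.normal(mu_x, np.sqrt(sig2_x), size=n)
Y = g(X) + rng.normal(0.0, np.sqrt(sig2_eps), size=n)
W = X[:, None] + rng.normal(0.0, np.sqrt(sig2_u), size=(n, m))

# Method of moments starting values from the one-factor ANOVA decomposition.
m_i = np.full(n, m)
m_tot = m_i.sum()
Wbar_i = W.mean(axis=1)                       # group means
Wbar = W.sum() / m_tot                        # grand mean
MSE = np.sum((W - Wbar_i[:, None])**2) / (m_tot - n)
MSTR = np.sum(m_i * (Wbar_i - Wbar)**2) / (n - 1)
c = (m_tot - np.sum(m_i**2) / m_tot) / (n - 1)
alpha = 0.01
sig2_u0 = MSE
sig2_x0 = max((MSTR - MSE) / c, alpha)
mu_x0 = Wbar
print(round(sig2_u0, 3), round(sig2_x0, 3), round(mu_x0, 3))
```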


Our choice of initial values can be interpreted as simple method of moments estimators of the parameters; similar guidelines were used by Berry et al. (2002). Moreover, we can obtain initial estimates of the β's, θ's and σ_ε² by fitting the regression of Eq. (3) by ordinary least squares, substituting W̄_{i•} for the unobserved X_i. In our simulations, with these choices of initial values, the E–M algorithm converges to the mle within 100 iterations. We also experimented with slightly different starting values, obtained by adding mean zero Gaussian noise with variance 0.1 to the method of moments starting values listed above, and the convergence of the algorithm appears to be stable. The total computer time required by our MATLAB implementation is around 2 min on a Macintosh with a 3.1 GHz Intel processor and 16 GB of RAM.

5. Discussion

We proposed an E–M algorithm for computing the maximum likelihood estimate in a measurement error model. Our results complement the semi-parametric technique of Carroll et al. (1999) and the Bayesian techniques of Berry et al. (2002) and Bhadra and Carroll (2016). We used a truncated polynomial spline of degree 1 as our choice of basis function. Using the results in Proposition 2 of Bhadra and Carroll (2016), it can easily be seen that a similar E–M scheme remains feasible if degree one B-spline bases are used instead. Its implementation is left as future work.

Acknowledgments

The author thanks Raymond J. Carroll for providing feedback on the manuscript. The manuscript was considerably improved by the review comments. This material is based upon work supported by the National Science Foundation under Grant No. DMS-1613063.

Appendix. Derivation of the E–M algorithm

A.1. E-step

Note from Eq. (6) that, in the (t + 1)th E-step, it is only necessary to impute the following terms: E(X_i | Y, W, Ψ = Ψ̂^(t)), E(X_i² | Y, W, Ψ = Ψ̂^(t)), E[B_k(X_i) | Y, W, Ψ = Ψ̂^(t)], E[{B_k(X_i)}² | Y, W, Ψ = Ψ̂^(t)], E[X_i B_k(X_i) | Y, W, Ψ = Ψ̂^(t)] and E[B_k(X_i) B_j(X_i) | Y, W, Ψ = Ψ̂^(t)], where Ψ̂^(0) is an initial guess for the parameter values. Without risking ambiguity, we drop the conditioning variables in what follows and let π(x_i) denote the density of (X_i | Y, W, Ψ = Ψ̂^(t)). Now using Proposition 1,

E[X_i] = Σ_{L=0}^K p_{iL} µ_{iL},   E[X_i²] = Σ_{L=0}^K p_{iL} (µ_{iL}² + σ_{iL}²).

For E[B_k(X_i)], note that

E[B_k(X_i)] = E[(X_i − ω_k) I(X_i > ω_k)]
            = ∫_{ω_k}^∞ (x_i − ω_k) π(x_i) dx_i
            = ∫_{ω_k}^∞ x_i π(x_i) dx_i − ω_k ∫_{ω_k}^∞ π(x_i) dx_i
            = Σ_{L=k}^K p_{iL} µ_{iL} − ω_k Σ_{L=k}^K p_{iL}.


Similarly, for E[X_i B_k(X_i)], note that

E[X_i B_k(X_i)] = E[X_i (X_i − ω_k) I(X_i > ω_k)]
                = ∫_{ω_k}^∞ x_i (x_i − ω_k) π(x_i) dx_i
                = ∫_{ω_k}^∞ x_i² π(x_i) dx_i − ω_k ∫_{ω_k}^∞ x_i π(x_i) dx_i
                = Σ_{L=k}^K p_{iL} (µ_{iL}² + σ_{iL}²) − ω_k Σ_{L=k}^K p_{iL} µ_{iL}.

For E[{B_k(X_i)}²], note that

E[{B_k(X_i)}²] = E[(X_i − ω_k)² I(X_i > ω_k)]
               = ∫_{ω_k}^∞ (x_i − ω_k)² π(x_i) dx_i
               = ∫_{ω_k}^∞ x_i² π(x_i) dx_i − 2ω_k ∫_{ω_k}^∞ x_i π(x_i) dx_i + ω_k² ∫_{ω_k}^∞ π(x_i) dx_i
               = Σ_{L=k}^K p_{iL} (µ_{iL}² + σ_{iL}²) − 2ω_k Σ_{L=k}^K p_{iL} µ_{iL} + ω_k² Σ_{L=k}^K p_{iL}.

Finally, for E[B_k(X_i) B_j(X_i)], note that

E[B_k(X_i) B_j(X_i)] = E[(X_i − ω_k)(X_i − ω_j) I(X_i > max{ω_k, ω_j})]
    = ∫_{max{ω_k, ω_j}}^∞ (x_i − ω_k)(x_i − ω_j) π(x_i) dx_i
    = ∫_{max{ω_k, ω_j}}^∞ x_i² π(x_i) dx_i − (ω_k + ω_j) ∫_{max{ω_k, ω_j}}^∞ x_i π(x_i) dx_i + ω_k ω_j ∫_{max{ω_k, ω_j}}^∞ π(x_i) dx_i
    = Σ_{L=max{k,j}}^K p_{iL} (µ_{iL}² + σ_{iL}²) − (ω_k + ω_j) Σ_{L=max{k,j}}^K p_{iL} µ_{iL} + ω_k ω_j Σ_{L=max{k,j}}^K p_{iL}.

Substituting these terms in Eq. (6) completes the E-step.

A.2. M-step

Suppose the latent component of the complete data log likelihood has been imputed according to the conditional expectations computed in the E-step. In what follows, f(ν) denotes the terms containing the parameter ν in the complete data log likelihood of Eq. (6). Then, we perform a conditional maximization of one parameter given the rest (i.e., a cyclic coordinate descent) for the (t + 1)th M-step as follows:

1. f(β_0) = −(2σ_ε²)^{-1} Σ_{i=1}^n { −2β_0 Y_i + β_0² + 2β_0 β_1 E[X_i] + 2β_0 Σ_{k=1}^K θ_k E[B_k(X_i)] }.

   Equating f′(β_0) = 0 we get

   β̂_0^(t+1) = Σ_{i=1}^n { Y_i − β̂_1^(t) E[X_i] − Σ_{k=1}^K θ̂_k^(t) E[B_k(X_i)] } / n.

2. f(β_1) = −(2σ_ε²)^{-1} Σ_{i=1}^n { −2β_1 Y_i E[X_i] + β_1² E[X_i²] + 2β_0 β_1 E[X_i] + 2β_1 Σ_{k=1}^K θ_k E[X_i B_k(X_i)] }.

   Equating f′(β_1) = 0 we get

   β̂_1^(t+1) = Σ_{i=1}^n { Y_i E[X_i] − β̂_0^(t+1) E[X_i] − Σ_{k=1}^K θ̂_k^(t) E[X_i B_k(X_i)] } / Σ_{i=1}^n E[X_i²].


3. For k = 1, ..., K,

   f(θ_k) = −(2σ_ε²)^{-1} Σ_{i=1}^n { −2θ_k Y_i E[B_k(X_i)] + θ_k² E[{B_k(X_i)}²] + 2β_1 θ_k E[X_i B_k(X_i)] + 2β_0 θ_k E[B_k(X_i)] + 2θ_k Σ_{j=1}^{k−1} θ_j E[B_j(X_i) B_k(X_i)] }.

   Equating f′(θ_k) = 0 we get

   θ̂_k^(t+1) = Σ_{i=1}^n { Y_i E[B_k(X_i)] − β̂_0^(t+1) E[B_k(X_i)] − β̂_1^(t+1) E[X_i B_k(X_i)] − Σ_{j=1}^{k−1} θ̂_j^(t+1) E[B_j(X_i) B_k(X_i)] } / Σ_{i=1}^n E[{B_k(X_i)}²].

4. Let

   A_{σ_ε²}^(t+1) = Σ_{i=1}^n { Y_i² − 2β_0 Y_i − 2β_1 Y_i E[X_i] − 2Y_i Σ_{k=1}^K θ_k E[B_k(X_i)] + β_0² + β_1² E[X_i²] + Σ_{k=1}^K θ_k² E[{B_k(X_i)}²] + 2β_0 β_1 E[X_i] + 2β_0 Σ_{k=1}^K θ_k E[B_k(X_i)] + 2β_1 Σ_{k=1}^K θ_k E[X_i B_k(X_i)] + 2 Σ_{k=1}^K Σ_{j=1}^{k−1} θ_k θ_j E[B_k(X_i) B_j(X_i)] },

   where we use β_0 = β̂_0^(t+1), β_1 = β̂_1^(t+1) and θ_k = θ̂_k^(t+1). Then

   f(σ_ε²) = −A_{σ_ε²}^(t+1) / (2σ_ε²) − (n/2) log(σ_ε²),

   and using f′(σ_ε²) = 0 we have

   {σ̂_ε²}^(t+1) = A_{σ_ε²}^(t+1) / n.

5. Let

   A_{σ_u²}^(t+1) = Σ_{i=1}^n { Σ_{j=1}^{m_i} W_{ij}² + m_i E[X_i²] − 2W_{i•} E[X_i] }.

   Then

   f(σ_u²) = −A_{σ_u²}^(t+1) / (2σ_u²) − (M/2) log(σ_u²),

   and using f′(σ_u²) = 0 we have

   {σ̂_u²}^(t+1) = A_{σ_u²}^(t+1) / M.

6. Let

   A_{σ_x²}^(t+1) = Σ_{i=1}^n { E[X_i²] + {µ̂_x^(t)}² − 2µ̂_x^(t) E[X_i] }.

   Then

   f(σ_x²) = −A_{σ_x²}^(t+1) / (2σ_x²) − (n/2) log(σ_x²),

   and using f′(σ_x²) = 0 we have

   {σ̂_x²}^(t+1) = A_{σ_x²}^(t+1) / n.


7. Finally, we have

   µ̂_x^(t+1) = Σ_{i=1}^n E[X_i] / n.

The E and M steps are repeated until the convergence of the algorithm.
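For completeness, here is a Python sketch of the closed-form M-step updates derived above (an added illustration, not the author's implementation; it assumes a balanced design with W stored as an n × m array and consumes the expectation arrays produced by the E-step sketch given after Algorithm 1, with hypothetical names throughout).

```python
import numpy as np

def m_step(Y, W, EX, EX2, EB, EXB, EB2, EBB, beta1_prev, theta_prev, mu_x_prev):
    """One M-step (items 1-7 of Appendix A.2), given the E-step expectations.

    EX, EX2: (n,); EB, EXB, EB2: (n, K); EBB: (n, K, K); W: (n, m) replicates.
    The *_prev arguments are the previous iterates used in items 1, 2 and 6.
    """
    n, K = EB.shape
    M = W.size
    theta_prev = np.asarray(theta_prev, float)
    # Item 1: beta_0.
    beta0 = np.mean(Y - beta1_prev * EX - EB @ theta_prev)
    # Item 2: beta_1.
    beta1 = np.sum(Y * EX - beta0 * EX - EXB @ theta_prev) / np.sum(EX2)
    # Item 3: theta_k, updated sequentially in k (cyclic coordinate descent).
    theta = np.zeros(K)
    for k in range(K):
        cross = EBB[:, :k, k] @ theta[:k] if k > 0 else 0.0
        num = np.sum(Y * EB[:, k] - beta0 * EB[:, k] - beta1 * EXB[:, k] - cross)
        theta[k] = num / np.sum(EB2[:, k])
    # Item 4: sigma_eps^2 via A_{sigma_eps^2}. The spline terms
    # sum_k theta_k^2 E[B_k^2] + 2 sum_{j<k} theta_k theta_j E[B_k B_j]
    # are computed in one go from the full cross-expectation array,
    # whose diagonal equals EB2.
    quad = np.einsum('ikj,k,j->i', EBB, theta, theta)
    A_eps = np.sum(Y**2 - 2 * beta0 * Y - 2 * beta1 * Y * EX - 2 * Y * (EB @ theta)
                   + beta0**2 + beta1**2 * EX2 + 2 * beta0 * beta1 * EX
                   + 2 * beta0 * (EB @ theta) + 2 * beta1 * (EXB @ theta) + quad)
    sig2_eps = A_eps / n
    # Item 5: sigma_u^2.
    A_u = np.sum(W**2) + np.sum(W.shape[1] * EX2 - 2 * W.sum(axis=1) * EX)
    sig2_u = A_u / M
    # Item 6: sigma_x^2 (uses the previous mu_x iterate, as above).
    sig2_x = np.sum(EX2 + mu_x_prev**2 - 2 * mu_x_prev * EX) / n
    # Item 7: mu_x.
    mu_x = np.mean(EX)
    return beta0, beta1, theta, sig2_eps, sig2_u, sig2_x, mu_x
```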

References

Berry, S.M., Carroll, R.J., Ruppert, D., 2002. Bayesian smoothing and regression splines for measurement error problems. J. Amer. Statist. Assoc. 97, 160–169.
Besag, J., 1986. On the statistical analysis of dirty pictures. J. R. Stat. Soc. Ser. B Stat. Methodol. 48, 259–302.
Bhadra, A., Carroll, R.J., 2016. Exact sampling of the unobserved covariates in Bayesian spline models for measurement error problems. Stat. Comput. 26, 827–840.
Carroll, R.J., Maca, J.D., Ruppert, D., 1999. Nonparametric regression in the presence of measurement error. Biometrika 86, 541–554.
Cook, J.R., Stefanski, L.A., 1994. Simulation-extrapolation estimation in parametric measurement error models. J. Amer. Statist. Assoc. 89, 1314–1328.
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Stat. Methodol. 39, 1–38.
Eilers, P.H.C., Marx, B.D., 1996. Flexible smoothing with B-splines and penalties. Statist. Sci. 11, 89–121.