Case-deletion diagnostics for spatial linear mixed models

F. De Bastiani (a,*), M.A. Uribe-Opazo (b), M. Galea (c), A.H.M.A. Cysneiros (a)

a Universidade Federal de Pernambuco, Avenida Professor Moraes Rego 1235, Recife 50740-540, Pernambuco, Brazil
b Universidade Estadual do Oeste do Paraná, Rua Universitária 2069, Cascavel 85819-110, Paraná, Brazil
c Pontificia Universidad Católica de Chile, Avenida Vicuña Mackenna 4860, Macul, Santiago 782-0436, Chile
Abstract

We present global diagnostic techniques to assess the influence of observations on spatial linear mixed models. We review concepts of Cook's distance based on the likelihood and on the Q-function in the framework of geostatistical models. The main novelty in spatial statistics is that we obtain more detail with which to evaluate the sensitivity of the model by splitting the information related to the covariance matrix, identifying whether an influential observation affects the error variance or whether it also affects the parameters that determine the spatial dependence structure, i.e., whether it changes the geostatistical model selected. A simulation study shows the behaviour of this methodology and an application to real data illustrates the methodology developed.

Keywords: Geostatistical; Global influence; Maximum likelihood; Spatial variability.
* Corresponding author.
Email addresses: [email protected] (F. De Bastiani), [email protected] (M.A. Uribe-Opazo), [email protected] (M. Galea), [email protected] (A.H.M.A. Cysneiros)
1. Introduction

Geostatistics refers to the sub-branch of spatial statistics in which the data consist of a finite sample of measured values relating to an underlying spatially continuous phenomenon, Diggle and Ribeiro Jr. (2007). Some proposals have discussed Gaussian spatial linear models to study the structure of dependence in spatially referenced data. For more details about estimation, inference methods and applications of these models, see De Bastiani et al. (2017). Most diagnostic measures were originally developed under linear regression models. Cook's distance (Cook, 1977) is one of the most widely used diagnostic tools for detecting influential individual observations (or subsets of observations) in linear regression. Cook and Weisberg (1982) gave a comprehensive account of a variety of methods for the study of influence on linear regression. Chatterjee and Hadi (1988) treated linear regression diagnostics as a tool for applying linear regression models to real-life data. Haslett and Dillane (2004) extended previous work on deletion diagnostics for estimates of variance components obtained by restricted maximum likelihood estimation for the linear mixed model. Davison and Tsai (1992) reviewed various diagnostics for generalized linear models and extended them to more general models, such as models for censored and grouped data (i.e., nonlinear). Wei (1998) presented a comprehensive introduction to exponential family nonlinear models, paying attention to regression diagnostics and influence analysis. Some proposals have discussed diagnostic tools for spatial linear models. In geostatistics, an atypical observation can cause changes in environmental
and geological patterns. Christensen et al. (1992a) measured the effect of the observations on prediction using diagnostics based on case-deletion. Christensen et al. (1993) proposed diagnostics to detect influential observations on the estimation of the covariance function. Zewotir and Galpin (2005) provided routine diagnostic tools for linear mixed models, which are computationally inexpensive. Militino et al. (2006) proposed influence measures for multivariate spatial linear models. Uribe-Opazo et al. (2012) presented local influence for Gaussian spatial linear models. De Gruttola et al. (1987) described measures of influence and leverage for a generalized least squares (GLS) estimator of the regression coefficients in a class of multivariate linear models for repeated measurements. Waternaux et al. (1989) suggested several practical procedures to detect outliers for the repeated-measurements model based on the global influence approach. For longitudinal data, Preisser and Qaqish (1996) developed Cook's distance for generalized estimating equations. Christensen et al. (1992b), Banerjee (1998), Demidenko and Stukel (2005) and Pan et al. (2014) considered case-deletion and subject-deletion diagnostics for linear mixed models. Assumpção et al. (2014) presented the generalized leverage to evaluate the influence of an observation on its own predicted value, a different approach for spatial linear models. De Bastiani et al. (2015b) presented a brief introduction to global diagnostics in Gaussian spatial linear models with multiple repetitions, and De Bastiani et al. (2017) considered local influence diagnostics for this class of models.

In this work we propose the study of global influence measures for quantifying the effects of perturbing spatial linear mixed models with repeated measures, extending and complementing the work presented by De Bastiani et al. (2015b) and De Bastiani et al. (2017), respectively. The main novelty in spatial statistics is that we obtain more detail with which to evaluate the sensitivity of the model by splitting the information related to the covariance matrix, identifying whether an influential observation affects the error variance or whether it also affects the parameters that determine the spatial dependence structure, i.e., whether it changes the geostatistical model selected.

The paper unfolds as follows. Section 2 presents the Gaussian spatial linear mixed model (GSLM). Section 3 presents a brief description of the maximum likelihood estimation procedure and the information matrix. Section 4 reviews concepts of global influence based on the likelihood and on the Q-function. Section 5 presents a simulation study. Section 6 contains an application with real data to illustrate the methodology developed in this paper. Finally, Section 7 contains some concluding remarks. Calculations are presented in the appendices.

2. The Gaussian Spatial Linear Mixed Model

For a spatial process, the basic object we consider is a stochastic process {Y_i(s), s ∈ S ⊂ R²} (two-dimensional Euclidean space), usually though not necessarily in R²; for example, i = 1 is one single realization of the process. We also assume that the variance of Y_i(s) exists for all s ∈ S. The process Y_i is said to be Gaussian if, for any k ≥ 1 and locations s_1, ..., s_k, the vector (Y_i(s_1), ..., Y_i(s_k)) has a multivariate normal distribution. Let Y = Y(s) = vec(Y_1(s), ..., Y_r(s)) be an nr × 1 Gaussian random vector of r independent stochastic processes with n elements each, depending
on the sites s_i ∈ S ⊂ R² for i = 1, ..., n, where s = (s_1, ..., s_n)ᵀ. The "vec" operator transforms a matrix into a vector by stacking the columns of the matrix, one underneath the other. It is assumed that the i-th stochastic process Y_i(s) = vec(Y_i(s_1), ..., Y_i(s_n)) represents an n × 1 vector, for i = 1, ..., r, that can be expressed as a linear mixed model given by Y_i(s) = µ_i(s) + b_i(s) + τ_i(s), where the deterministic term µ_i(s) is an n × 1 vector, the mean of the process, while b_i(s) and τ_i(s) are independent and together form the stochastic error, i.e., b_i(s) + τ_i(s) = ε_i(s), where b_i ∼ N(0, G) and τ_i ∼ N(0, D_i), and we consider Σ_i = G + D_i. More details are presented in Appendix A. The error ε_i(s) is an n × 1 vector of a stationary process with mean equal to zero, E[ε_i(s)] = 0, and covariance matrix Σ_i. For a linear model µ_i(s) = X(s)β, where β = (β_1, ..., β_p)ᵀ is a p × 1 vector of unknown parameters and X = X(s) = [x_{j1}(s) ... x_{jp}(s)] is an n × p matrix of p explanatory variables, for j = 1, ..., n, i.e., the design matrix X is the same for all r repetitions. As in Smith (2001), the GSLM for the i-th independent stochastic process, assuming a homogeneous process, can be written in matrix form as

$$ Y_i(s) = X\beta + \varepsilon_i(s). \tag{1} $$

The matrix Σ_i = Σ = [C(s_u, s_v)] is the n × n covariance matrix of Y_i(s) for the i-th repetition, for i = 1, ..., r. It is non-singular, symmetric and positive definite, and for a stationary and isotropic process the elements C(s_u, s_v) depend on the Euclidean distance d_{uv} = ||s_u − s_v|| between the sites s_u and s_v. The covariance matrix Σ has a structure that depends on parameters φ = (φ_1, ..., φ_q)ᵀ, as given in Equation (2) (Uribe-Opazo et al., 2012):

$$ \Sigma = \phi_1 I_n + \phi_2 R, \tag{2} $$
where φ_1 ≥ 0 is the parameter known as the nugget effect; φ_2 ≥ 0 is known as the sill; R = R(φ_3, φ_4) = [(r_{uv})] or R = R(φ_3) = [(r_{uv})] is an n × n symmetric matrix, which is a function of φ_3 > 0 and sometimes also of φ_4 > 0, with diagonal elements r_{uu} = 1 (u = 1, ..., n); r_{uv} = φ_2^{-1} C(s_u, s_v) for φ_2 ≠ 0, and r_{uv} = 0 for φ_2 = 0, u ≠ v = 1, ..., n, where r_{uv} depends on the Euclidean distance d_{uv} = ||s_u − s_v|| between the sites s_u and s_v; φ_3 is a function of the model range (a); φ_4 is known as the smoothness parameter; and I_n is an n × n identity matrix. A number of different covariance structures are available, and the question is which fits best, for example, the exponential, Gaussian or Matérn model. To select between them, beyond the maximum value of the log-likelihood, De Bastiani et al. (2015a) proposed cross validation and the trace of the asymptotic covariance matrix of an estimated mean (Kano et al., 1993).

We have that Y ∼ N_{nr}(1_r ⊗ Xβ, I_r ⊗ Σ), with probability density function (pdf) given by

$$ f(\mathbf{Y},\theta) = \prod_{i=1}^{r} f(Y_i,\theta) = \prod_{i=1}^{r} (2\pi)^{-n/2}\,|\Sigma|^{-1/2} \exp\!\left\{ -\tfrac{1}{2}(Y_i - X\beta)^\top \Sigma^{-1}(Y_i - X\beta) \right\}, $$

where ⊗ denotes the Kronecker product and δ_i = (Y_i − Xβ)ᵀΣ^{-1}(Y_i − Xβ) is the Mahalanobis distance. The log-likelihood of the GSLM for the r independent repetitions is given by L(θ) = log f(Y, θ) = Σ_{i=1}^{r} L_i(θ), with

$$ L_i(\theta) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(Y_i - X\beta)^\top \Sigma^{-1}(Y_i - X\beta). $$
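To make the covariance structure in Equation (2) and the log-likelihood above concrete, the sketch below (ours, not code from the paper) builds Σ = φ_1 I_n + φ_2 R with an exponential correlation r_{uv} = exp(−d_{uv}/φ_3) and evaluates L(θ) for r independent repetitions; the exponential choice for R and all function and variable names are illustrative assumptions.

```python
import numpy as np

def exponential_corr(coords, phi3):
    """Correlation matrix R with r_uv = exp(-d_uv / phi3), r_uu = 1 (illustrative choice)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.exp(-d / phi3)

def gslm_loglik(Y, X, beta, phi1, phi2, phi3, coords):
    """Log-likelihood of the GSLM for r independent repetitions.

    Y: (r, n) responses, X: (n, p) design matrix shared by all repetitions,
    Sigma = phi1 * I + phi2 * R as in Equation (2).
    """
    r, n = Y.shape
    Sigma = phi1 * np.eye(n) + phi2 * exponential_corr(coords, phi3)
    _, logdet = np.linalg.slogdet(Sigma)
    Sinv = np.linalg.inv(Sigma)
    resid = Y - X @ beta                                   # epsilon_i = Y_i - X beta, one row per repetition
    mahal = np.einsum('ij,jk,ik->i', resid, Sinv, resid)   # Mahalanobis distance delta_i for each repetition
    return -0.5 * r * n * np.log(2 * np.pi) - 0.5 * r * logdet - 0.5 * mahal.sum()
```

Other members of the family mentioned above (Gaussian, Matérn) would only change the line that builds R.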
Consider η = (η_1, η_2, η_3)ᵀ = (φ_2, φ_3, φ_4)ᵀ, so R = R(η_2, η_3) and Σ = φ_1 I + η_1 R, thus θ = (βᵀ, φ_1, ηᵀ)ᵀ. The Q-function, Q(θ|θ*) = E[L(θ)|Y, θ = θ*], for the model presented in (1) is of the form

$$ Q(\theta|\theta^*) = -\frac{nr}{2}\log\phi_1 - \frac{r}{2\phi_1}\operatorname{Tr}(\Omega^*) - \frac{1}{2\phi_1}\sum_{i=1}^{r} r_i^{*\top} r_i^* - \frac{nr}{2}\log\eta_1 - \frac{r}{2}\log|R| - \frac{1}{2\eta_1}\sum_{i=1}^{r} \operatorname{Tr}\!\left[ R^{-1}\left(\Omega^* + b_i^* b_i^{*\top}\right)\right], $$

where

$$ r_i^* = Y_i - X\beta - b_i^*, \qquad b_i^* = \mathrm{E}(b_i|Y_i,\theta^*) = \eta_1^* R^* \Sigma^{*-1}(Y_i - X\beta^*), \qquad \Omega^* = \operatorname{Cov}(b_i|Y_i,\theta^*) = \eta_1^* R^* - \eta_1^{*2} R^* \Sigma^{*-1} R^*, $$

and the symbol * means that the objects are functions of θ = θ*.

3. Maximum likelihood estimation and asymptotic standard error estimation

The unknown parameters to be estimated are the β's and φ's. Here we consider that Y_1, Y_2, ..., Y_r are independent and identically distributed random vectors, with Y_i ∼ N_n(Xβ, Σ), for i = 1, ..., r and fixed n. The score functions are given by

$$ U(\beta) = \frac{\partial L(\theta)}{\partial\beta} = X^\top\Sigma^{-1}\sum_{i=1}^{r}\varepsilon_i, \qquad U(\phi) = \frac{\partial L(\theta)}{\partial\phi} = -\frac{r}{2}\,\frac{\partial\operatorname{vec}^\top(\Sigma)}{\partial\phi}\operatorname{vec}(\Sigma^{-1}) + \frac{1}{2}\sum_{i=1}^{r}\frac{\partial\operatorname{vec}^\top(\Sigma)}{\partial\phi}\operatorname{vec}(\Sigma^{-1}\varepsilon_i\varepsilon_i^\top\Sigma^{-1}), $$
where ε_i = Y_i − Xβ. From the solution of the score equation U(β) = 0, the maximum likelihood estimator of β is given by β̂ = (XᵀΣ^{-1}X)^{-1}XᵀΣ^{-1}Ȳ, where Ȳ = (1/r) Σ_{i=1}^{r} Y_i. More details are presented in Smith (2001) and De Bastiani et al. (2017).

Asymptotic standard errors, approximately for r large, can be calculated by inverting either the observed information matrix or the expected information matrix (Boos and Stefanski, 2013). The expected information matrix, F(θ), is given by (see Waller and Gotway (2004); De Bastiani et al. (2017))

$$ F(\theta) = F = \begin{pmatrix} F_{\beta\beta} & 0 \\ 0 & F_{\phi\phi} \end{pmatrix}, $$

where F_{ββ} = r XᵀΣ^{-1}X and

$$ F_{\phi\phi} = \frac{r}{2}\,\frac{\partial\operatorname{vec}^\top(\Sigma)}{\partial\phi}\,(\Sigma^{-1}\otimes\Sigma^{-1})\,\frac{\partial\operatorname{vec}(\Sigma)}{\partial\phi^\top}. $$

The first- and second-order derivatives of the scale matrix Σ with respect to φ_q, for some covariance functions, are presented in Uribe-Opazo et al. (2012).
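As a small illustration of the estimator and of the F_{ββ} block above, the following sketch (our own, with illustrative names) computes β̂ = (XᵀΣ^{-1}X)^{-1}XᵀΣ^{-1}Ȳ and the asymptotic standard errors obtained from F_{ββ}^{-1}, assuming Σ has already been evaluated at the current covariance-parameter estimates.

```python
import numpy as np

def beta_hat_and_se(Y, X, Sigma):
    """GLS-type ML estimator of beta and asymptotic standard errors (illustrative sketch).

    beta_hat = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} Ybar, with Ybar the mean over the
    r repetitions, and Var(beta_hat) approximated by F_bb^{-1}, F_bb = r X' Sigma^{-1} X.
    """
    r = Y.shape[0]
    Ybar = Y.mean(axis=0)
    Sinv = np.linalg.inv(Sigma)
    XtSinv = X.T @ Sinv
    A = XtSinv @ X                                  # X' Sigma^{-1} X
    beta_hat = np.linalg.solve(A, XtSinv @ Ybar)
    se = np.sqrt(np.diag(np.linalg.inv(r * A)))     # from the expected information block F_bb
    return beta_hat, se
```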
4. Diagnostic Techniques

Detecting influential observations is an important step in the analysis of a data set. There are different approaches to assess the influence of perturbations in a data set and in the model, given the estimated parameters. Cook (1977) gave a starting point in the development of case-deletion diagnostics for all sorts of statistical models. Case-deletion is an example of a global influence analysis, that is, the effect of an observation is assessed by completely removing it. Cook (1986) presented the local influence approach, in which a weight ω_i is given to each case and the effect on the parameter estimation is measured by perturbing around these weights. Choosing weights equal to zero or one corresponds to the global case-deletion approach. In general, perturbation measures do not depend on the data directly, but rather on its structure via the model.

There are some papers in the literature about influence diagnostics for spatial linear models. Diamond and Armstrong (1984) and Warnes (1986) observed the sensitivity of predictions to perturbations in the covariance function. Christensen et al. (1992a) discussed case-deletion diagnostics for detecting observations that are influential for prediction based on universal kriging, while Christensen et al. (1993) considered a diagnostic method to evaluate the sensitivity of the covariance function when the parameters were estimated via restricted maximum likelihood. More recently, Cerioli and Riani (1999) and Militino et al. (2006) showed that case-deletion diagnostics do suffer from masking and suggested robust procedures based on subsets of the data. Filzmoser et al. (2014) proposed the Mahalanobis distance to identify multivariate outliers.

Case-deletion is a diagnostic technique that evaluates the impact, on the parameter estimates given the model, of eliminating one or more observations from the data set. These kinds of diagnostic techniques have been discussed by Cook and Weisberg (1982), Chatterjee and Hadi (1988) and Pan et al. (2014), among others. We can eliminate a subject, each observation or an entire location by using the technique proposed by Cook (1977), known as Cook's distance, whose typical measure is defined by

$$ D_{i\theta} = (\hat\theta - \hat\theta_{[i]})^\top M (\hat\theta - \hat\theta_{[i]}), $$

where M is an appropriately chosen positive definite matrix, for instance the inverse of the asymptotic covariance matrix, and θ̂_[i] is the estimate of θ obtained with the same principles used to obtain θ̂, but without refitting the model with the i-th observation.
4.1. Likelihood-based diagnostics at subject level

In its case-deletion form, Cook's distance eliminates a subject to evaluate its influence on the analysis. For the GSLM it is given by

$$ D_{i\theta} = (\hat\theta - \hat\theta_{[i]})^\top [-\ddot L_{[i]}(\hat\theta)] (\hat\theta - \hat\theta_{[i]}), $$

for i = 1, ..., r, where L_[i](θ) is the log-likelihood with the i-th subject deleted and θ̂_[i] is the maximum likelihood estimate under L_[i](θ). Here M = [−L̈_[i](θ̂)], because [−L̈_[i](θ̂)]^{-1} is an estimator of the asymptotic covariance matrix and, as mentioned above, M is commonly the inverse of this estimator. Since the computation of [L̈_[i](θ̂)] may become cumbersome when r is too large, we use L̈(θ̂) to replace L̈_[i](θ̂). In general, the asymptotic covariance matrix [−L̈(θ̂)]^{-1} is not block-diagonal, so to decompose Cook's statistic into different components of interest an alternative is to use the expectation of [−L̈(θ̂)]^{-1}, E[−L̈(θ̂)]^{-1}, which is the inverse of the expected information matrix. For the GSLM, [−L̈(θ̂)]^{-1} = E[−L̈(θ̂)]^{-1} = F(θ̂)^{-1}, thus both give the same measure, given by

$$ D_{i\theta} = (\hat\theta - \hat\theta_{[i]})^\top F(\hat\theta) (\hat\theta - \hat\theta_{[i]}). $$

To calculate θ̂_[i], we may use a one-step Newton–Raphson approximation at θ̂, θ̂_[i] = θ̂ + [−L̈_[i](θ̂)]^{-1} L̇_[i](θ̂), where the dots over the functions denote derivatives with respect to θ, and

$$ L_{[i]}(\theta) = -\frac{n(r-1)}{2}\log(2\pi) - \frac{r-1}{2}\log|\Sigma| - \frac{1}{2}\sum_{k=1,\,k\neq i}^{r}(Y_k - X\beta)^\top\Sigma^{-1}(Y_k - X\beta). $$

Once again, to overcome some computational difficulty, we use L̈(θ̂) to replace L̈_[i](θ̂), and F(θ̂)^{-1} to replace L̈(θ̂). The approximated Cook's distance becomes

$$ D^1_{i\theta} = [\dot L_{[i]}(\hat\theta)]^\top F(\hat\theta)^{-1}[\dot L_{[i]}(\hat\theta)] = [U_{[i]}(\hat\theta)]^\top F(\hat\theta)^{-1}[U_{[i]}(\hat\theta)], $$

for i = 1, ..., r, where U_[i](θ̂) = (U_[i](β̂), U_[i](φ̂))ᵀ is the score function without the i-th subject, evaluated at θ̂ (given below). We can decompose D^1_{iθ} = D^1_{iβ} + D^1_{iφ}, which means that the diagnostic measure for θ is the sum of the diagnostic measures for the fixed effects β and the variance components φ in terms of D^1_{iθ}, for i = 1, ..., r, where D^1_{iβ} = [U_[i](β̂)]ᵀ F_{ββ}^{-1} [U_[i](β̂)] and D^1_{iφ} = [U_[i](φ̂)]ᵀ F_{φφ}^{-1} [U_[i](φ̂)]. Also,

$$ U_{[i]}(\hat\beta) = -X^\top\hat\Sigma^{-1}\hat\varepsilon_i \qquad \text{and} \qquad U_{[i]}(\hat\phi_j) = \frac{1}{2}\operatorname{Tr}\!\left(\hat\Sigma^{-1}\frac{\partial\hat\Sigma}{\partial\phi_j}\right) - \frac{1}{2}\,\hat\varepsilon_i^\top\,\hat\Sigma^{-1}\frac{\partial\hat\Sigma}{\partial\phi_j}\hat\Sigma^{-1}\,\hat\varepsilon_i, $$

or, in vector notation,

$$ U_{[i]}(\hat\phi) = \frac{1}{2}\frac{\partial\operatorname{vec}^\top(\hat\Sigma)}{\partial\phi}\operatorname{vec}(\hat\Sigma^{-1}) - \frac{1}{2}\frac{\partial\operatorname{vec}^\top(\hat\Sigma)}{\partial\phi}\operatorname{vec}(\hat\Sigma^{-1}\hat\varepsilon_i\hat\varepsilon_i^\top\hat\Sigma^{-1}), $$

where ε̂_i = Y_i − Xβ̂.
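For the fixed-effect part, D^1_{iβ} above reduces to a quadratic form in the per-subject residuals. The sketch below (ours; the function name and the restriction to the β-component are assumptions for illustration) computes D^1_{iβ} = U_[i](β̂)ᵀ F_{ββ}^{-1} U_[i](β̂) for every subject; the φ-component is analogous once the derivatives ∂Σ̂/∂φ_j are available.

```python
import numpy as np

def cook_beta_subject(Y, X, beta_hat, Sigma_hat):
    """Approximate likelihood-based Cook's distances D^1_{i,beta} for each subject i.

    Uses U_[i](beta_hat) = -X' Sigma^{-1} eps_i and F_bb = r X' Sigma^{-1} X.
    """
    r = Y.shape[0]
    Sinv = np.linalg.inv(Sigma_hat)
    Fbb_inv = np.linalg.inv(r * (X.T @ Sinv @ X))
    D = np.empty(r)
    for i in range(r):
        eps_i = Y[i] - X @ beta_hat
        U_i = -X.T @ (Sinv @ eps_i)        # score without the i-th subject, evaluated at theta_hat
        D[i] = U_i @ Fbb_inv @ U_i
    return D
```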
4.2. Q-function-based diagnostics at subject level

Pan et al. (2014) also proposed a case-deletion approach to identify influential subjects and influential observations in linear mixed models, based on the Q-function, the conditional expectation of the logarithm of the joint likelihood between responses and random effects. Consider η = (η_1, η_2, η_3)ᵀ = (φ_2, φ_3, φ_4)ᵀ, so R = R(η_2, η_3) and Σ = φ_1 I + η_1 R, thus θ = (βᵀ, φ_1, ηᵀ)ᵀ. The Q-function, Q(θ|θ*) = E[L(θ)|Y, θ = θ*], for the model presented in (1) is of the form

$$ Q(\theta|\theta^*) = -\frac{nr}{2}\log\phi_1 - \frac{r}{2\phi_1}\operatorname{Tr}(\Omega^*) - \frac{1}{2\phi_1}\sum_{i=1}^{r} r_i^{*\top} r_i^* - \frac{nr}{2}\log\eta_1 - \frac{r}{2}\log|R| - \frac{1}{2\eta_1}\sum_{i=1}^{r}\operatorname{Tr}\!\left[R^{-1}\left(\Omega^* + b_i^* b_i^{*\top}\right)\right], $$

where r_i* = Y_i − Xβ − b_i*, b_i* = E(b_i|Y_i, θ*) = η_1* R* Σ*^{-1}(Y_i − Xβ*), Ω* = Cov(b_i|Y_i, θ*) = η_1* R* − η_1*² R* Σ*^{-1} R*, and the * symbol means that the objects are functions of θ = θ*.

To calculate the case-deletion estimate θ̂_[i] of θ, Zhu and Lee (2001) proposed the use of the following one-step approximation based on the Q-function:

$$ \hat\theta_{[i]} = \hat\theta + [-\ddot Q(\hat\theta|\hat\theta)]^{-1}\dot Q_{[i]}(\hat\theta|\hat\theta), $$

where Q_[i](θ̂|θ̂) = Q_[i](θ|θ̂)|_{θ=θ̂} is the Q-function formed without the i-th subject but evaluated at the maximum likelihood estimate θ̂, Q̇_[i](θ̂|θ̂) = [∂Q_[i](θ|θ̂)/∂θ]_{θ=θ̂} and Q̈_[i](θ̂|θ̂) = [∂²Q_[i](θ|θ̂)/∂θ∂θᵀ]_{θ=θ̂}. Pan et al. (2014) proved a theorem implying that, when replacing the log-likelihood with the Q-function in the EM algorithm, the one-step approximation to the maximum likelihood estimate θ̂_[i] maintains the same accuracy, of order O_p(r^{-2}). The Q-function-based Cook's statistic from Zhu and Lee (2001), presented in Pan et al. (2014), is given by

$$ QD_{i\theta} = [\dot Q_{[i]}(\hat\theta|\hat\theta)]^\top[-\ddot Q(\hat\theta|\hat\theta)]^{-1}[\dot Q_{[i]}(\hat\theta|\hat\theta)]. \tag{3} $$

The Q-function-based Cook's statistic can be written as the sum of the Q-function-based Cook's statistics for the fixed effects β and the variance components φ, QD_{iθ} = QD_{iβ} + QD_{iφ}. Then, Pan et al. (2014) proposed the use of −E[Q̈(θ̂|θ̂)] in (3) instead of −Q̈_[i](θ̂|θ̂), allowing the variance components to be decomposed as in Equation (4). The modified Q-function-based Cook's statistic is of the form

$$ QD^1_{i\theta} = [\dot Q_{[i]}(\hat\theta|\hat\theta)]^\top\{-\mathrm{E}[\ddot Q(\hat\theta|\hat\theta)]\}^{-1}[\dot Q_{[i]}(\hat\theta|\hat\theta)], $$
where Q̇_[i](θ|θ*) has elements

$$ \dot Q_{[i]}(\theta|\theta^*) = \begin{pmatrix} \dot Q_{[i]\beta} \\ \dot Q_{[i]\phi_1} \\ \dot Q_{[i]\eta_1} \\ \dot Q_{[i]\eta_2} \\ \dot Q_{[i]\eta_3} \end{pmatrix} = \begin{pmatrix} -\dfrac{1}{\phi_1} X^\top r_i^* \\[1ex] \dfrac{n}{2\phi_1} - \dfrac{1}{2\phi_1^2}\operatorname{Tr}(\Omega^*) - \dfrac{1}{2\phi_1^2}\, r_i^{*\top} r_i^* \\[1ex] \dfrac{n}{2\eta_1} - \dfrac{1}{2\eta_1^2}\operatorname{Tr}\!\left[R^{-1}\left(\Omega^* + b_i^* b_i^{*\top}\right)\right] \\[1ex] \dfrac{1}{2}\operatorname{Tr}\!\left(R^{-1}\dfrac{\partial R}{\partial\eta_2}\right) + \dfrac{1}{2\eta_1}\operatorname{Tr}\!\left[\dfrac{\partial R^{-1}}{\partial\eta_2}\left(\Omega^* + b_i^* b_i^{*\top}\right)\right] \\[1ex] \dfrac{1}{2}\operatorname{Tr}\!\left(R^{-1}\dfrac{\partial R}{\partial\eta_3}\right) + \dfrac{1}{2\eta_1}\operatorname{Tr}\!\left[\dfrac{\partial R^{-1}}{\partial\eta_3}\left(\Omega^* + b_i^* b_i^{*\top}\right)\right] \end{pmatrix}, $$

and E[−Q̈(θ|θ*)] evaluated at θ = θ* = θ̂ for the spatial linear mixed model has the elements presented in Appendix A.

Because the matrix E[Q̈(θ̂|θ̂)] is always block-diagonal with respect to the parameters β, η and φ_1, the modified Q-function-based Cook's statistic QD^1_{iθ} can thus be decomposed into three components,

$$ QD^1_{i\theta} = QD^1_{i\beta} + QD^1_{i\eta} + QD^1_{i\phi_1}, \tag{4} $$

where QD^1_{iβ}, QD^1_{iη} and QD^1_{iφ1} are the modified Q-function-based Cook's statistics corresponding to the fixed effects β, the between-subject covariance components η and the within-subject covariance components φ_1, respectively.
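Because the expected matrix in (4) is block-diagonal in β, η and φ_1, each component of QD^1_{iθ} is a separate quadratic form. The sketch below (our own illustration; the dictionary interface and names are assumptions) assembles the three components from the corresponding blocks of Q̇_[i](θ̂|θ̂) and of −E[Q̈(θ̂|θ̂)].

```python
import numpy as np

def modified_q_cook(score_blocks, info_blocks):
    """Decompose the modified Q-function Cook's statistic QD^1 into its three parts.

    score_blocks: dict with keys 'beta', 'eta', 'phi1' holding the blocks of
        Qdot_[i](theta_hat | theta_hat) for one deleted subject;
    info_blocks: dict with the matching diagonal blocks of -E[Qddot(theta_hat | theta_hat)].
    Returns the components and their sum, QD^1 = QD^1_beta + QD^1_eta + QD^1_phi1.
    """
    parts = {}
    for key in ('beta', 'eta', 'phi1'):
        s = np.atleast_1d(score_blocks[key]).astype(float)
        M = np.atleast_2d(info_blocks[key]).astype(float)
        parts[key] = float(s @ np.linalg.solve(M, s))     # quadratic form s' M^{-1} s
    parts['total'] = parts['beta'] + parts['eta'] + parts['phi1']
    return parts
```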
4.3. Likelihood-based diagnostics at observation level

For the model in our study we have two levels of responses, namely subjects and repeated measures/observations. Intuitively, an influential subject may or may not contain influential observations. Also,

$$ D^1_{[ij]\theta} = [\dot L_{[ij]}(\hat\theta)]^\top F(\hat\theta)^{-1}[\dot L_{[ij]}(\hat\theta)] = [U_{[ij]}(\hat\theta)]^\top F(\hat\theta)^{-1}[U_{[ij]}(\hat\theta)], $$

for i = 1, ..., r and j = 1, ..., n, where [ij] designates the j-th observation of the i-th subject to be deleted from the data set, U_[ij](θ̂) = ∂L_[ij](θ)/∂θ|_{θ=θ̂} and

$$ L_{[ij]}(\hat\theta) = -\frac{nr-1}{2}\log(2\pi) - \frac{r-1}{2}\log|\Sigma| - \frac{1}{2}\log|\Sigma_{[j]}| - \frac{1}{2}\sum_{k=1,\,k\neq i}^{r}(Y_k - X\beta)^\top\Sigma^{-1}(Y_k - X\beta) - \frac{1}{2}(Y_{i[j]} - X_{[j]}\beta)^\top\Sigma_{[j]}^{-1}(Y_{i[j]} - X_{[j]}\beta). $$

Thus,

$$ U_{[ij]}(\hat\beta) = \frac{\partial L_{[ij]}(\hat\theta)}{\partial\beta} = -X^\top\Sigma^{-1}(Y_i - X\hat\beta) + X_{[j]}^\top\Sigma_{[j]}^{-1}(Y_{i[j]} - X_{[j]}\hat\beta), $$

$$ U_{[ij]}(\hat\phi_l) = \frac{\partial L_{[ij]}(\hat\theta)}{\partial\phi_l} = \frac{1}{2}\operatorname{Tr}\!\left(\Sigma^{-1}\frac{\partial\hat\Sigma}{\partial\phi_l}\right) - \frac{1}{2}\operatorname{Tr}\!\left(\Sigma_{[j]}^{-1}\frac{\partial\hat\Sigma_{[j]}}{\partial\phi_l}\right) + \frac{1}{2}(Y_i - X\hat\beta)^\top\frac{\partial\hat\Sigma^{-1}}{\partial\phi_l}(Y_i - X\hat\beta) - \frac{1}{2}(Y_{i[j]} - X_{[j]}\hat\beta)^\top\frac{\partial\hat\Sigma^{-1}_{[j]}}{\partial\phi_l}(Y_{i[j]} - X_{[j]}\hat\beta), $$

for l = 1, ..., 4, or, in vector notation,

$$ U_{[ij]}(\hat\phi) = \frac{1}{2}\frac{\partial\operatorname{vec}^\top(\Sigma)}{\partial\phi}\operatorname{vec}(\Sigma^{-1}) - \frac{1}{2}\frac{\partial\operatorname{vec}^\top(\Sigma)}{\partial\phi}\operatorname{vec}\!\left(\Sigma^{-1}(Y_i - X\hat\beta)(Y_i - X\hat\beta)^\top\Sigma^{-1}\right) - \frac{1}{2}\frac{\partial\operatorname{vec}^\top(\Sigma_{[j]})}{\partial\phi}\operatorname{vec}(\Sigma_{[j]}^{-1}) + \frac{1}{2}\frac{\partial\operatorname{vec}^\top(\Sigma_{[j]})}{\partial\phi}\operatorname{vec}\!\left(\Sigma_{[j]}^{-1}(Y_{i[j]} - X_{[j]}\hat\beta)(Y_{i[j]} - X_{[j]}\hat\beta)^\top\Sigma_{[j]}^{-1}\right), $$

where X_[j] is the (n − 1) × p matrix in which the j-th row of X is deleted, Y_{i[j]} is Y_i without the j-th observation, and Σ̂_[j] is the corresponding (n − 1) × (n − 1) matrix.
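The deleted-observation quantities above only require dropping a row (and, for Σ̂, the matching column). A minimal sketch, with assumed names, is:

```python
import numpy as np

def delete_observation(X, Y_i, Sigma, j):
    """Drop the j-th observation of a subject: returns X_[j], Y_i[j] and Sigma_[j].

    X_[j] removes the j-th row of X; Sigma_[j] removes the j-th row and column
    of Sigma, giving the (n-1) x (n-1) covariance of the remaining observations.
    """
    keep = np.arange(X.shape[0]) != j
    X_j = X[keep, :]
    Y_ij = Y_i[keep]
    Sigma_j = Sigma[np.ix_(keep, keep)]
    return X_j, Y_ij, Sigma_j
```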
4.4. Q-function-based diagnostics at observation level

The modified Q-function-based Cook's statistic is of the form

$$ QD^1_{[ij]\theta} = [\dot Q_{[ij]}(\hat\theta|\hat\theta)]^\top\{-\mathrm{E}[\ddot Q(\hat\theta|\hat\theta)]\}^{-1}[\dot Q_{[ij]}(\hat\theta|\hat\theta)], $$

where Q̇_[ij](θ̂|θ̂) has the elements

$$ \dot Q_{[ij]\beta}\big|_{\theta=\theta^*=\hat\theta} = -\frac{1}{\hat\phi_1} X^\top\hat r_i + \frac{1}{\hat\phi_1} X_{[j]}^\top\hat r_{i[j]}, $$
$$ \dot Q_{[ij]\phi_1}\big|_{\theta=\theta^*=\hat\theta} = -\frac{\operatorname{Tr}(\hat\Omega)}{2\hat\phi_1^2} - \frac{1}{2\hat\phi_1^2}\left[\hat r_i^\top\hat r_i - \hat r_{i[j]}^\top\hat r_{i[j]}\right] + \frac{1}{2\hat\phi_1^2}\operatorname{Tr}(\hat\Omega_{[j]}) + \frac{1}{2\hat\phi_1}, $$
$$ \dot Q_{[ij]\eta_1}\big|_{\theta=\theta^*=\hat\theta} = -\frac{1}{2\hat\eta_1^2}\operatorname{Tr}\!\left[\hat R^{-1}\left(\hat\Omega + \hat b_i\hat b_i^\top\right)\right] + \frac{1}{2\hat\eta_1^2}\operatorname{Tr}\!\left[\hat R_{[j]}^{-1}\left(\hat\Omega_{[j]} + \hat b_{i[j]}\hat b_{i[j]}^\top\right)\right] + \frac{1}{2\hat\eta_1}, $$
$$ \dot Q_{[ij]\eta_2}\big|_{\theta=\theta^*=\hat\theta} = \frac{1}{2}\operatorname{Tr}\!\left(\hat R^{-1}\frac{\partial\hat R}{\partial\eta_2}\right) - \frac{1}{2}\operatorname{Tr}\!\left(\hat R_{[j]}^{-1}\frac{\partial\hat R_{[j]}}{\partial\eta_2}\right) + \frac{1}{2\hat\eta_1}\operatorname{Tr}\!\left[\frac{\partial\hat R^{-1}}{\partial\eta_2}\left(\hat\Omega + \hat b_i\hat b_i^\top\right)\right] - \frac{1}{2\hat\eta_1}\operatorname{Tr}\!\left[\frac{\partial\hat R_{[j]}^{-1}}{\partial\eta_2}\left(\hat\Omega_{[j]} + \hat b_{i[j]}\hat b_{i[j]}^\top\right)\right], $$
$$ \dot Q_{[ij]\eta_3}\big|_{\theta=\theta^*=\hat\theta} = \frac{1}{2}\operatorname{Tr}\!\left(\hat R^{-1}\frac{\partial\hat R}{\partial\eta_3}\right) - \frac{1}{2}\operatorname{Tr}\!\left(\hat R_{[j]}^{-1}\frac{\partial\hat R_{[j]}}{\partial\eta_3}\right) + \frac{1}{2\hat\eta_1}\operatorname{Tr}\!\left[\frac{\partial\hat R^{-1}}{\partial\eta_3}\left(\hat\Omega + \hat b_i\hat b_i^\top\right)\right] - \frac{1}{2\hat\eta_1}\operatorname{Tr}\!\left[\frac{\partial\hat R_{[j]}^{-1}}{\partial\eta_3}\left(\hat\Omega_{[j]} + \hat b_{i[j]}\hat b_{i[j]}^\top\right)\right], $$
where X_[j] is the (n − 1) × p matrix in which the j-th row of X is deleted, Y_{i[j]} is Y_i without the j-th observation, r̂_{i[j]} = Y_{i[j]} − X_[j]β̂ − b̂_{i[j]}, b̂_{i[j]} = η̂_1 R̂_[j] Σ̂_[j]^{-1}(Y_{i[j]} − X_[j]β̂), and R̂_[j], Σ̂_[j] and Ω̂_[j] are (n − 1) × (n − 1) matrices. Details are given in Appendix A.

4.5. Cutoff value for influential cases

Cook and Weisberg (1982) indicated that Cook's distance can be compared with a χ² distribution with an appropriate number of degrees of freedom for calibration. Zhu et al. (2007) suggested the use of (p + q)/r for models with missing data as a rough cutoff value for calibrating the Q-function-based Cook's statistics QD, where (p + q) is the dimension of the parameter vector θ and r is the number of repetitions. We also suggest using the mean of QD plus twice its standard deviation, Q̄D + 2 sd(QD), as a cutoff value.

5. Simulation study

In order to analyze the performance of the methodology proposed in Sections 4.2 and 4.4 for detecting influential points, simulation studies were carried out considering a regular grid with a distance of 1 unit between points. We generated a Gaussian spatial linear model with an exponential geostatistical structure for the covariance matrix, Y_{ij} ∼ N(0, 3), i = 1, ..., 10 and j = 1, ..., n, i.e., a Gaussian spatial linear model with 10 repetitions of n observations each, with parameters φ_1 = 1, φ_2 = 2 and φ_3 = 0.3. After generating the model, we first perturbed an observation to create an outlier by Y_{11} ← Y_{11} + δ, where δ = 0, 2, 4, 8 and 16. Secondly, we perturbed a subject to create an outlier by Y_{1j} ← Y_{1j} + δ, where δ = 0, 2, 4, 8 and 16. With this perturbation scheme we take into account the variability of the data for the model without covariates. For each scenario we generated 500 replicates and incremented a counter by 1 if the methodology identified the generated influential observation, and by 0 otherwise.
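A minimal sketch of this simulation scheme is given below; it is our own illustration, with assumed function names, and the detection rule in `detected` (largest statistic that also exceeds the (p + q)/r cutoff of Section 4.5) is only one plausible reading of how the counter could be updated.

```python
import numpy as np

rng = np.random.default_rng(2018)

def simulate_gslm(n_side, r=10, phi1=1.0, phi2=2.0, phi3=0.3):
    """Simulate r repetitions of the zero-mean GSLM on a regular grid with unit spacing."""
    xs, ys = np.meshgrid(np.arange(n_side), np.arange(n_side))
    coords = np.column_stack([xs.ravel(), ys.ravel()])
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    Sigma = phi1 * np.eye(len(coords)) + phi2 * np.exp(-d / phi3)   # exponential structure
    L = np.linalg.cholesky(Sigma)
    Y = (L @ rng.standard_normal((len(coords), r))).T               # shape (r, n)
    return coords, Y

def perturb(Y, delta, subject_level=False):
    """Add delta to the first observation (Y_11) or to the whole first subject (Y_1j)."""
    Yp = Y.copy()
    if subject_level:
        Yp[0, :] += delta
    else:
        Yp[0, 0] += delta
    return Yp

def detected(QD, target, p_plus_q, r):
    """One plausible detection rule (assumption): the perturbed case has the largest
    statistic and exceeds the rough (p + q) / r cutoff."""
    return (np.argmax(QD) == target) and (QD[target] > p_plus_q / r)
```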
Table 1 summarizes the results of the simulation study for the observation-level scenario. For this perturbation scheme, the number of times the perturbed observation was correctly identified increases as the value of δ increases. For both n = 49 and n = 100 and for all values of δ, the measures #QD¹_{ijη} and #QD¹_{ij} show similar results, i.e., the influential observations detected by #QD¹_{ij} mainly affect the parameters that define the spatial covariance structure. For n = 100, these two measures show an improvement compared with n = 49, with values around 93% of correct identification. Clearly, for #QD¹_{ijβ}, this number decreases as n increases, i.e., the mean of the process is less influenced by an influential observation when the number of observations in each repetition increases.
Table 1: Percentage of number of times that the correct influential observation was identified considering 500 replicates for the data.

            δ     #QD¹_{ijβ}   #QD¹_{ijφ1}   #QD¹_{ijη}   #QD¹_{ij}
n = 49      0        0.0           0.0           0.4          0.2
            2        0.4           3.8          20.6         20.0
            4        0.8          20.0          51.4         51.2
            8       41.8           7.4          76.6         76.4
           16       63.8          18.8          86.8         86.8
n = 100     0        0.0           0.0           0.0          0.0
            2        0.0           0.0          10.4          8.2
            4        0.0           1.4          41.0         36.0
            8        1.8          21.4          74.6         73.2
           16        2.8          13.8          92.8         93.0
Table 2 shows the results of the simulation study for the subject-level scenario. As at the observation level, the measures #QD¹_{iη} and #QD¹_{i} show similar results, for both n = 49 and n = 100 and for all values of δ. However, for the subject level these values are higher than for the observation level, with percentages very close to 100%. For #QD¹_{iβ}, the percentage of times that the correct influential subject was identified reaches 100% for n = 100 and δ = 16. In general, #QD¹_{iφ1} shows the lowest values for the number of times that the influential subject is correctly identified. Table 2 presents higher percentages than Table 1, i.e., it indicates that an influential subject is identified more often than an influential observation.
Table 2: Percentage of number of times that the correct influential subject was identified considering 500 replicates for the data.

            δ     #QD¹_{iβ}   #QD¹_{iφ1}   #QD¹_{iη}   #QD¹_{i}
n = 49      0        5.6          4.0          7.0         4.4
            2       91.2         11.4         60.0        59.4
            4       93.8         19.4         87.4        87.4
            8       93.8         22.8         93.0        93.0
           16       93.8         25.4         93.8        93.8
n = 100     0        5.2          2.8          6.8         6.8
            2       98.4         25.4         14.0        10.0
            4       95.4         59.0         40.0        38.6
            8       99.6         26.8         80.6        80.4
           16      100.0         40.2         99.6        99.6
6. Productivity data from the year 1998 to 2002

The data set was first analyzed by De Bastiani et al. (2015b) and De Bastiani et al. (2017) and consists of 253 observations in each year from 1998 to 2002. The data consist of soybean productivity (t ha⁻¹) and four chemical contents considered as explanatory variables, collected on a 7.20 m × 7.20 m grid in an experimental area of 1.33 ha. The chemical contents of the soil are phosphorus (P) [mg dm⁻³], potassium (K) [cmolc dm⁻³], calcium (Ca) [cmolc dm⁻³] and magnesium (Mg) [cmolc dm⁻³]. The observations were taken at the same sites for each repetition. It is natural to think that a spatio-temporal model would fit better; however, De Bastiani et al. (2017) showed that the Gaussian spatial linear mixed model described in Section 2 is appropriate for this data set when compared to separable spatio-temporal models. The parameter estimates and the respective asymptotic standard errors (se), in parentheses, are presented in Table 3.

Table 3: Parameter estimates considering the Gaussian covariance function, with asymptotic standard errors (in parentheses).

             Intercept      P           K          Ca          Mg         nugget      sill       f(range)
              β̂_0          β̂_1         β̂_2        β̂_3         β̂_4         φ̂_1        φ̂_2        φ̂_3
Estimate      2.4013      -0.0017      0.3270     0.0118     -0.0592      0.1927     0.0579      0.0407
(se)         (0.1020)     (0.0109)    (0.1746)   (0.0392)    (0.0515)    (0.0178)   (0.0219)    (0.0098)
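As a worked instance of the rough calibration of Section 4.5 (our own computation, not a value reported by the authors), with the p = 5 fixed effects and q = 3 covariance parameters of Table 3 and r = 5 yearly repetitions, the subject-level cutoff would be

$$ \frac{p+q}{r} = \frac{5+3}{5} = 1.6 . $$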
6.1. Global influence

First, we evaluate the individual contribution of each repetition to the estimation process. Each repetition corresponds to a year. Figure 1 shows Cook's distance computed with the one-step approximation. It shows that subject #5, which corresponds to the year 2002/2003, has more influence on both β and φ.
Figure 1: Cook's distance using the one-step approximation for eliminating the i-th year: (a) D¹_{iβ}, (b) D¹_{iφ} and (c) D¹_{i}.
Figure 2 shows Cook's distance based on the Q-function for case-deletion and also highlights year #5; however, this subject is not indicated as influential for the η parameters, according to Figure 2(c). De Bastiani et al. (2017) considered local influence diagnostics, and the results are very similar to the results presented in Figures 1 and 2, detecting that year #5 is potentially influential in the mean and in the variance component parameters. However, the Q-function-based measure allows us to understand in which part of the variance components year #5 is influential, i.e., in the residual error, not in the spatial structure. This is confirmed by the analysis of the data without subject #5, which still selects the Gaussian geostatistical structure to model the spatial dependence in the data.
Figure 2: Q-function-based distance using the one-step approximation for eliminating the i-th year: (a) QD¹_{iβ}, (b) QD¹_{iφ1}, (c) QD¹_{iη} and (d) QD¹_{i}.
Figures 3 and 4 show Cook's distance based on the likelihood and on the Q-function, respectively, at the observation level when removing each observation from the data set. Both figures highlight observations from the year 2002/2003, the latest one. Observation #14 is from the year 1998 and corresponds to a value of 1.19 t ha⁻¹, which is the minimum value in that year. Observation #1200 has the value 0.98 t ha⁻¹, below the mean and below the values at this location for the other years, i.e., 2.05, 2.48, 3.62 and 2.65 t ha⁻¹.
Figure 3: Cook's distance using the one-step approximation for eliminating the j-th observation from the i-th year: (a) D¹_{ijβ}, (b) D¹_{ijφ1} and (c) D¹_{ij}.

Figure 4: Q-function-based distance using the one-step approximation for eliminating the j-th observation from the i-th year: (a) QD¹_{ijβ}, (b) QD¹_{ijφ1}, (c) QD¹_{ijη} and (d) QD¹_{ij}.
As we expected after comparing the percentages in Tables 2 and 1, influential observations are not as clearly identified as influential subjects. The methodology presented in this paper allows us to identify that year #5 is potentially influential for both the β's and the η's. However, after analyzing the data with the potentially influential subject removed, De Bastiani et al. (2017) observed that the most recent year affected the statistical inference on the significance of the fixed-effect parameters and did not affect the choice of the geostatistical model, which remains the Gaussian structure.

7. Conclusions

We proposed global influence measures based on the likelihood and on the Q-function to assess the stability of the model in the framework of mixed models, complementing the work presented by De Bastiani et al. (2015b) and De Bastiani et al. (2017). We obtained explicit expressions for the matrices necessary to implement these diagnostic techniques, considered at the subject and observation levels. We evaluated the methodology with a simulation study and illustrated it with the analysis of a real data set.

We observed that the measures #QD¹_{ijη} and #QD¹_{ij} show similar results in the simulation study, indicating that the influential observations detected by #QD¹_{ij} mainly affect the parameters that define the spatial covariance structure. In other words, if we had only considered #QD¹_{ij}, we would have been detecting subjects and/or observations that are influential in the spatial covariance structure. This could have been noticed by considering the use of the expected information matrix and the expected second derivative of the Q-function, as proposed by Pan et al. (2014).
Other approaches would be to consider distributions with heavier tails than the normal distribution, or robust methods; i.e., analyses based on non-normal distributions, such as the multivariate t distribution or the family of elliptical distributions, are alternatives.

Acknowledgements

We acknowledge the partial financial support from Fundação Araucária of Paraná State, Capes, CNPq and FACEPE, Brazil, and project FONDECYT 1150325, Chile. We thank Pontificia Universidad Católica de Chile for funding Fernanda De Bastiani with a post-doctoral fellowship.

References

Assumpção, R. A. B., Uribe-Opazo, M., Galea, M., 2014. Analysis of local influence in geostatistics using Student's t distribution. Journal of Applied Statistics 41, 2323–2341.

Banerjee, M., 1998. Cook's distance in linear longitudinal models. Communications in Statistics: Theory and Methods 27, 2973–2983.

Boos, D. D., Stefanski, L. A., 2013. Essential Statistical Inference: Theory and Methods. Springer Texts in Statistics. Springer.

Cerioli, A., Riani, M., 1999. The ordering of spatial data and the detection of multiple outliers. Journal of Computational and Graphical Statistics 8, 239–258.

Chatterjee, S., Hadi, A. S., 1988. Sensitivity Analysis in Linear Regression. John Wiley and Sons, New York.

Christensen, R., Johnson, W., Pearson, L., 1992a. Prediction diagnostics for spatial linear models. Biometrika 79, 583–591.

Christensen, R., Johnson, W., Pearson, L., 1993. Covariance function diagnostics for spatial linear models. Mathematical Geology 25, 145–160.

Christensen, R., Pearson, L. M., Johnson, W., 1992b. Case-deletion diagnostics for mixed models. Technometrics 34, 38–45.

Cook, R. D., 1977. Detection of influential observations in linear regression. Technometrics 19 (1), 15–18.

Cook, R. D., 1986. Assessment of local influence. Journal of the Royal Statistical Society, Series B 48, 133–169.

Cook, R. D., Weisberg, S., 1982. Residuals and Influence in Regression. Chapman and Hall, New York.

Davison, A. C., Tsai, C. L., 1992. Regression model diagnostics. International Statistical Review 60, 337–353.

De Bastiani, F., Cysneiros, A. H. M. A., Uribe-Opazo, M. A., Galea, M., 2015a. Influence diagnostics in elliptical spatial linear models. TEST 24 (2), 322–340.

De Bastiani, F., Galea, M., Cysneiros, A., Uribe-Opazo, M., 2017. Gaussian spatial linear models with repetitions: an application to soybean productivity. Spatial Statistics 21 (Part A), 319–335.

De Bastiani, F., Uribe-Opazo, M., Cysneiros, A., Galea, M., 2015b. Global influence diagnostics in Gaussian spatial linear model with multiple repetitions. Procedia Environmental Sciences 26, 74–77.

De Gruttola, V., Ware, J. H., Louis, T. A., 1987. Influence analysis of generalized least squares estimators. Journal of the American Statistical Association 82 (399), 911–917.

Demidenko, E., Stukel, T. A., 2005. Influence analysis for linear mixed-effects models. Statistics in Medicine 24, 893–909.

Diamond, P., Armstrong, M., 1984. Robustness of variograms and conditioning of kriging matrices. Mathematical Geology 16, 809–822.

Diggle, P. J., Ribeiro Jr., P. J., 2007. Model-based Geostatistics. Springer.

Filzmoser, P., Ruiz-Gazen, A., Thomas-Agnan, C., 2014. Identification of local multivariate outliers. Statistical Papers 55, 29–47.

Haslett, J., Dillane, D., 2004. Application of 'delete = replace' to deletion diagnostics for variance component estimation in the linear mixed model. Journal of the Royal Statistical Society, Series B 66, 131–143.

Kano, Y., Berkane, M., Bentler, P. M., 1993. Statistical inference based on pseudo-maximum likelihood estimators in elliptical populations. Journal of the American Statistical Association 88 (421), 135–143.

Militino, A., Palacios, M., Ugarte, M., 2006. Outliers detection in multivariate spatial linear models. Journal of Statistical Planning and Inference 136, 125–146.

Pan, J., Fei, Y., Foster, P., 2014. Case-deletion diagnostics for linear mixed models. Technometrics 56 (3), 269–281.

Preisser, J. S., Qaqish, B. F., 1996. Deletion diagnostics for generalised estimating equations. Biometrika 83, 551–562.

Smith, R. L., 2001. Environmental Statistics. Notes for the Conference Board of the Mathematical Sciences (CBMS) course, University of Washington, June 25–29, 2001.

Uribe-Opazo, M., Borssoi, J., Galea, M., 2012. Influence diagnostics in Gaussian spatial linear models. Journal of Applied Statistics 39 (3), 615–630.

Waller, L., Gotway, C., 2004. Applied Spatial Statistics for Public Health Data. Wiley-Interscience, Hoboken, NJ.

Warnes, J., 1986. Sensitivity analysis for universal kriging. Mathematical Geology 18, 653–676.

Waternaux, C., Laird, N., Ware, J., 1989. Methods for analysis of longitudinal data: blood-lead concentrations and cognitive development. Journal of the American Statistical Association 84 (405), 33–41.

Wei, B.-C., 1998. Exponential Family Nonlinear Models. Springer.

Zewotir, T., Galpin, J. S., 2005. Influence diagnostics for linear mixed models. Journal of Data Science 3, 153–177.

Zhu, H., Ibrahim, J., Lee, S., Zhang, H., 2007. Perturbation selection and influence measures in local influence analysis. Annals of Statistics 35, 2565–2588.

Zhu, H., Lee, S., 2001. Local influence for incomplete-data models. Journal of the Royal Statistical Society, Series B 63, 111–126.
Appendix A. The Q-function and derivatives for the Gaussian spatial linear mixed model

a) Linear mixed model

$$ Y_i = X_i\beta + Z_i b_i + \tau_i, \qquad i = 1,\ldots,r, $$

where b_i ∼ N(0, G) independently of τ_i ∼ N(0, D_i).

b) Joint distribution

$$ \begin{pmatrix} b_i \\ Y_i \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ X_i\beta \end{pmatrix}, \begin{pmatrix} G & G Z_i^\top \\ Z_i G & \Sigma_i \end{pmatrix} \right), $$

where Σ_i = Z_i G Z_iᵀ + D_i, i = 1, ..., r, since

$$ \operatorname{Cov}(b_i, Y_i) = \operatorname{Cov}(b_i, X_i\beta + Z_i b_i + \tau_i) = \operatorname{Var}(b_i) Z_i^\top = G Z_i^\top. $$

c) Conditional distribution

The conditional distribution is given by

$$ b_i \mid Y_i \sim N\!\left( G Z_i^\top\Sigma_i^{-1}(Y_i - X_i\beta),\; \Omega_i \right), \qquad i = 1,\ldots,r, $$

where Ω_i = G − G Z_iᵀ Σ_i^{-1} Z_i G.
d) Complete log-likelihood

The complete log-likelihood is given by

$$ L_c(\theta) = \sum_{i=1}^{r} L_{ic}(\theta), $$

where θ is the vector of unknown parameters and

$$ L_{ic}(\theta) = -\frac{1}{2}\log|D_i| - \frac{1}{2}(Y_i - X_i\beta - Z_i b_i)^\top D_i^{-1}(Y_i - X_i\beta - Z_i b_i) - \frac{1}{2}\log|G| - \frac{1}{2}\, b_i^\top G^{-1} b_i. $$

e) Q-function

The Q-function is given by

$$ Q(\theta|\theta^*) = \sum_{i=1}^{r} \mathrm{E}\!\left(L_{ic}(\theta)\,|\,\mathbf{Y},\theta^*\right) = \sum_{i=1}^{r} Q_i(\theta|\theta^*), $$

$$ Q_i(\theta|\theta^*) = -\frac{1}{2}\log|D_i| - \frac{1}{2}\operatorname{Tr}\!\left(D_i^{-1} Z_i\Omega_i^* Z_i^\top\right) - \frac{1}{2}\, r_i^{*\top} D_i^{-1} r_i^* - \frac{1}{2}\log|G| - \frac{1}{2}\operatorname{Tr}\!\left[G^{-1}\left(\Omega_i^* + b_i^* b_i^{*\top}\right)\right], $$

where θ* is the vector of parameter estimates of θ from the previous iteration of the algorithm. Also,

$$ r_i^* = Y_i - X_i\beta - Z_i b_i^*, \qquad b_i^* = \mathrm{E}(b_i|Y_i,\theta^*) = G^* Z_i^\top\Sigma_i^{*-1}(Y_i - X_i\beta^*), \qquad \Omega_i^* = \operatorname{Cov}(b_i|Y_i,\theta^*) = G^* - G^* Z_i^\top\Sigma_i^{*-1} Z_i G^*. $$
f) Q-function for the spatial linear mixed model

We consider X_i = X, Z_i = I, D_i = φ_1 I, η = (η_1, η_2, η_3)ᵀ = (φ_2, φ_3, φ_4)ᵀ, so R = R(η_2, η_3), G = G(η) = η_1 R and Σ_i = Σ = φ_1 I + η_1 R, with θ = (βᵀ, φ_1, ηᵀ)ᵀ. This results in the particular case of the Q-function for the spatial linear mixed model given by Q(θ|θ*) = Σ_{i=1}^{r} Q_i(θ|θ*), with

$$ Q_i(\theta|\theta^*) = -\frac{n}{2}\log\phi_1 - \frac{1}{2\phi_1}\operatorname{Tr}(\Omega^*) - \frac{1}{2\phi_1}\, r_i^{*\top} r_i^* - \frac{n}{2}\log\eta_1 - \frac{1}{2}\log|R| - \frac{1}{2\eta_1}\operatorname{Tr}\!\left[R^{-1}\left(\Omega^* + b_i^* b_i^{*\top}\right)\right], $$

where r_i* = Y_i − Xβ − b_i*, b_i* = E(b_i|Y_i, θ*) = η_1* R* Σ*^{-1}(Y_i − Xβ*) and Ω_i* = Ω* = Cov(b_i|Y_i, θ*) = η_1* R* − η_1*² R* Σ*^{-1} R*. Thus,

$$ Q(\theta|\theta^*) = -\frac{nr}{2}\log\phi_1 - \frac{r}{2\phi_1}\operatorname{Tr}(\Omega^*) - \frac{1}{2\phi_1}\sum_{i=1}^{r} r_i^{*\top} r_i^* - \frac{nr}{2}\log\eta_1 - \frac{r}{2}\log|R| - \frac{1}{2\eta_1}\sum_{i=1}^{r}\operatorname{Tr}\!\left[R^{-1}\left(\Omega^* + b_i^* b_i^{*\top}\right)\right]. \tag{A.1} $$
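The conditional expectations entering (A.1) are simple matrix products once Σ = φ_1 I + η_1 R is formed. A minimal sketch of these E-step quantities (ours, with illustrative names) is:

```python
import numpy as np

def e_step_spatial(Y, X, beta, phi1, eta1, R):
    """E-step quantities of the spatial linear mixed model at theta* (illustrative sketch).

    b*_i = eta1 R Sigma^{-1} (Y_i - X beta) and Omega* = eta1 R - eta1^2 R Sigma^{-1} R,
    with Sigma = phi1 I + eta1 R, as used in the Q-function (A.1).
    """
    n = R.shape[0]
    Sigma = phi1 * np.eye(n) + eta1 * R
    Sinv = np.linalg.inv(Sigma)
    resid = Y - X @ beta                      # rows are Y_i - X beta
    b_star = (eta1 * R @ Sinv @ resid.T).T    # one row per repetition
    Omega_star = eta1 * R - eta1**2 * R @ Sinv @ R
    r_star = resid - b_star                   # r*_i = Y_i - X beta - b*_i
    return b_star, Omega_star, r_star
```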
g) First-order derivative of the Q-function for the spatial linear mixed model

Based on (A.1), the first-order derivative of Q(θ|θ*) with respect to θ = (βᵀ, φ_1, η_1, η_2, η_3)ᵀ is given by Q̇(θ|θ*) = Σ_{i=1}^{r} Q̇_i(θ|θ*), with elements

$$ \dot Q_i(\theta|\theta^*) = \begin{pmatrix} \dot Q_{i\beta} \\ \dot Q_{i\phi_1} \\ \dot Q_{i\eta_1} \\ \dot Q_{i\eta_2} \\ \dot Q_{i\eta_3} \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\phi_1} X^\top r_i^* \\[1ex] -\dfrac{n}{2\phi_1} + \dfrac{1}{2\phi_1^2}\operatorname{Tr}(\Omega^*) + \dfrac{1}{2\phi_1^2}\, r_i^{*\top} r_i^* \\[1ex] -\dfrac{n}{2\eta_1} + \dfrac{1}{2\eta_1^2}\operatorname{Tr}\!\left[R^{-1}\left(\Omega^* + b_i^* b_i^{*\top}\right)\right] \\[1ex] -\dfrac{1}{2}\operatorname{Tr}\!\left(R^{-1}\dfrac{\partial R}{\partial\eta_2}\right) - \dfrac{1}{2\eta_1}\operatorname{Tr}\!\left[\dfrac{\partial R^{-1}}{\partial\eta_2}\left(\Omega^* + b_i^* b_i^{*\top}\right)\right] \\[1ex] -\dfrac{1}{2}\operatorname{Tr}\!\left(R^{-1}\dfrac{\partial R}{\partial\eta_3}\right) - \dfrac{1}{2\eta_1}\operatorname{Tr}\!\left[\dfrac{\partial R^{-1}}{\partial\eta_3}\left(\Omega^* + b_i^* b_i^{*\top}\right)\right] \end{pmatrix}, $$

where

$$ \frac{\partial R^{-1}}{\partial\eta_2} = \frac{\partial R^{-1}}{\partial\phi_3} = -R^{-1}\frac{\partial R}{\partial\phi_3}R^{-1}, \qquad \frac{\partial R^{-1}}{\partial\eta_3} = \frac{\partial R^{-1}}{\partial\phi_4} = -R^{-1}\frac{\partial R}{\partial\phi_4}R^{-1}. $$

Thus,

$$ \dot Q(\theta|\theta^*) = \begin{pmatrix} \dot Q_{\beta} \\ \dot Q_{\phi_1} \\ \dot Q_{\eta_1} \\ \dot Q_{\eta_2} \\ \dot Q_{\eta_3} \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\phi_1}\displaystyle\sum_{i=1}^{r} X^\top r_i^* \\[1ex] -\dfrac{nr}{2\phi_1} + \dfrac{r}{2\phi_1^2}\operatorname{Tr}(\Omega^*) + \dfrac{1}{2\phi_1^2}\displaystyle\sum_{i=1}^{r} r_i^{*\top} r_i^* \\[1ex] -\dfrac{nr}{2\eta_1} + \dfrac{1}{2\eta_1^2}\displaystyle\sum_{i=1}^{r}\operatorname{Tr}\!\left[R^{-1}\left(\Omega^* + b_i^* b_i^{*\top}\right)\right] \\[1ex] -\dfrac{r}{2}\operatorname{Tr}\!\left(R^{-1}\dfrac{\partial R}{\partial\eta_2}\right) - \dfrac{1}{2\eta_1}\displaystyle\sum_{i=1}^{r}\operatorname{Tr}\!\left[\dfrac{\partial R^{-1}}{\partial\eta_2}\left(\Omega^* + b_i^* b_i^{*\top}\right)\right] \\[1ex] -\dfrac{r}{2}\operatorname{Tr}\!\left(R^{-1}\dfrac{\partial R}{\partial\eta_3}\right) - \dfrac{1}{2\eta_1}\displaystyle\sum_{i=1}^{r}\operatorname{Tr}\!\left[\dfrac{\partial R^{-1}}{\partial\eta_3}\left(\Omega^* + b_i^* b_i^{*\top}\right)\right] \end{pmatrix}. \tag{A.2} $$

Letting θ = θ* = θ̂, where θ̂ is the maximum likelihood estimate of θ, we have Q̇(θ̂|θ̂) = [Q̇(θ|θ*)]_{θ=θ*=θ̂} = 0, which leads to the following system:

$$ \sum_{i=1}^{r} X^\top\hat r_i = 0, $$
$$ -\frac{nr}{2\hat\phi_1} + \frac{r}{2\hat\phi_1^2}\operatorname{Tr}(\hat\Omega) + \frac{1}{2\hat\phi_1^2}\sum_{i=1}^{r}\hat r_i^\top\hat r_i = 0, $$
$$ -\frac{nr}{2\hat\eta_1} + \frac{1}{2\hat\eta_1^2}\sum_{i=1}^{r}\operatorname{Tr}\!\left[\hat R^{-1}\left(\hat\Omega + \hat b_i\hat b_i^\top\right)\right] = 0, $$
$$ -\frac{r}{2}\operatorname{Tr}\!\left(\hat R^{-1}\frac{\partial\hat R}{\partial\eta_2}\right) - \frac{1}{2\hat\eta_1}\sum_{i=1}^{r}\operatorname{Tr}\!\left[\frac{\partial\hat R^{-1}}{\partial\eta_2}\left(\hat\Omega + \hat b_i\hat b_i^\top\right)\right] = 0, $$
$$ -\frac{r}{2}\operatorname{Tr}\!\left(\hat R^{-1}\frac{\partial\hat R}{\partial\eta_3}\right) - \frac{1}{2\hat\eta_1}\sum_{i=1}^{r}\operatorname{Tr}\!\left[\frac{\partial\hat R^{-1}}{\partial\eta_3}\left(\hat\Omega + \hat b_i\hat b_i^\top\right)\right] = 0. \tag{A.3} $$
h) Second-order derivative of the Q-function for the spatial linear mixed model

The second-order derivative of the Q-function is given by

$$ \ddot Q_{i\theta\theta} = \frac{\partial^2 Q_i(\theta|\theta^*)}{\partial\theta\,\partial\theta^\top}, $$

with elements

$$ \ddot Q_{i\beta\beta} = -\frac{1}{\phi_1} X^\top X, \qquad \ddot Q_{i\beta\phi_1} = -\frac{1}{\phi_1^2} X^\top r_i^*, \qquad \ddot Q_{i\beta\eta_1} = \ddot Q_{i\beta\eta_2} = \ddot Q_{i\beta\eta_3} = 0, $$
$$ \ddot Q_{i\phi_1\phi_1} = \frac{n}{2\phi_1^2} - \frac{1}{\phi_1^3}\operatorname{Tr}(\Omega^*) - \frac{1}{\phi_1^3}\, r_i^{*\top} r_i^*, \qquad \ddot Q_{i\phi_1\eta_1} = \ddot Q_{i\phi_1\eta_2} = \ddot Q_{i\phi_1\eta_3} = 0, $$
$$ \ddot Q_{i\eta_1\eta_1} = \frac{n}{2\eta_1^2} - \frac{1}{\eta_1^3}\operatorname{Tr}\!\left[R^{-1}\left(\Omega^* + b_i^* b_i^{*\top}\right)\right], $$
$$ \ddot Q_{i\eta_1\eta_2} = \frac{1}{2\eta_1^2}\operatorname{Tr}\!\left[\frac{\partial R^{-1}}{\partial\eta_2}\left(\Omega^* + b_i^* b_i^{*\top}\right)\right], \qquad \ddot Q_{i\eta_1\eta_3} = \frac{1}{2\eta_1^2}\operatorname{Tr}\!\left[\frac{\partial R^{-1}}{\partial\eta_3}\left(\Omega^* + b_i^* b_i^{*\top}\right)\right], $$
$$ \ddot Q_{i\eta_2\eta_2} = -\frac{1}{2}\operatorname{Tr}\!\left(\frac{\partial R^{-1}}{\partial\eta_2}\frac{\partial R}{\partial\eta_2}\right) - \frac{1}{2}\operatorname{Tr}\!\left(R^{-1}\frac{\partial^2 R}{\partial\eta_2^2}\right) - \frac{1}{2\eta_1}\operatorname{Tr}\!\left[\frac{\partial^2 R^{-1}}{\partial\eta_2\,\partial\eta_2}\left(\Omega^* + b_i^* b_i^{*\top}\right)\right], $$
$$ \ddot Q_{i\eta_2\eta_3} = -\frac{1}{2}\operatorname{Tr}\!\left(\frac{\partial R^{-1}}{\partial\eta_3}\frac{\partial R}{\partial\eta_2}\right) - \frac{1}{2}\operatorname{Tr}\!\left(R^{-1}\frac{\partial^2 R}{\partial\eta_2\,\partial\eta_3}\right) - \frac{1}{2\eta_1}\operatorname{Tr}\!\left[\frac{\partial^2 R^{-1}}{\partial\eta_2\,\partial\eta_3}\left(\Omega^* + b_i^* b_i^{*\top}\right)\right], $$
$$ \ddot Q_{i\eta_3\eta_3} = -\frac{1}{2}\operatorname{Tr}\!\left(\frac{\partial R^{-1}}{\partial\eta_3}\frac{\partial R}{\partial\eta_3}\right) - \frac{1}{2}\operatorname{Tr}\!\left(R^{-1}\frac{\partial^2 R}{\partial\eta_3^2}\right) - \frac{1}{2\eta_1}\operatorname{Tr}\!\left[\frac{\partial^2 R^{-1}}{\partial\eta_3\,\partial\eta_3}\left(\Omega^* + b_i^* b_i^{*\top}\right)\right], $$

where

$$ \frac{\partial^2 R^{-1}}{\partial\eta_j\,\partial\eta_k} = -2\,\frac{\partial R^{-1}}{\partial\eta_k}\frac{\partial R}{\partial\eta_j} R^{-1} - R^{-1}\frac{\partial^2 R}{\partial\eta_j\,\partial\eta_k} R^{-1}. $$

More details are given in Uribe-Opazo et al. (2012).
E[−Q(θ|θ )] = −
r X
34
i=1
E[Qi (θ|θ ∗ )],
ˆ for the spatial linear mixed ¨ evaluating at θ = θ ∗ = θ the expectation of [−Q] model has the elements r > ¨ ββ ]| E[−Q = X X θ =θ ∗ =θˆ ˆ φ1 ¨ βφ1 ]| E[−Q = 0 θ =θ ∗ =θˆ ¨ βη1 ]| E[−Q = 0 θ =θ ∗ =θˆ ¨ βη2 ]| E[−Q = 0 θ =θ ∗ =θˆ ¨ βη3 ]| E[−Q = 0 θ =θ ∗ =θˆ 2r ˆ + nr ¨ φ1 φ1 ]| Tr(Ω) E[−Q = 3 θ =θ ∗ =θˆ ˆ φ1 2φˆ21 ¨ φ1 η1 ]| E[−Q = 0 θ =θ ∗ =θˆ ¨ φ1 η2 ]| E[−Q = 0 θ =θ ∗ =θˆ ¨ φ1 η3 ]| E[−Q = 0 θ =θ ∗ =θˆ −1 r nr ˆ + r Tr Σ ˆ −1 Ω ˆ R ˆ ¨ η1 η1 ]| E[−Q = − 2 + 3 Tr R ∗ ˆ θ =θ =θ 2ˆ η1 ηˆ1 η ˆ 1 ! ! 2ˆ ˆ −1 ∂ R ˆ r r ∂ R ∂ R ˆ −1 ¨ η2 η2 ]| E[−Q = Tr + Tr R θ =θ ∗ =θˆ 2 ∂η2 ∂η2 2 ∂η2 η2 " # ˆ −1 −1 r ∂ 2R ˆ + ηˆ2 R ˆˆ ˆ + Tr Ω 1 Σ R 2ˆ η1 ∂η2 η2 " # −1 ˆ −1 ∂R r ˆ + ηˆ2 R ˆˆ ˆ ¨ η1 η2 ]| E[−Q = − 2 Tr Ω 1 Σ R θ =θ ∗ =θˆ 2ˆ η1 ∂η2 " # ˆ −1 −1 r ∂ R ˆ + ηˆ2 R ˆˆ ˆ ¨ η1 η3 ]| Ω E[−Q = − 2 Tr 1 Σ R θ =θ ∗ =θˆ 2ˆ η1 ∂η3 ! ! 2ˆ ˆ −1 ∂ R ˆ r ∂ R r ∂ R ˆ −1 ¨ η2 η3 ]| E[−Q = Tr + Tr R θ =θ ∗ =θˆ 2 ∂η3 ∂η2 2 ∂η2 η3 " # ˆ −1 −1 ∂ 2R r ˆ + ηˆ2 R ˆˆ ˆ + Tr Ω 1 Σ R 2ˆ η1 ∂η2 η3 ! ! 2ˆ ˆ −1 ∂ R ˆ r ∂ R r ∂ R ˆ −1 ¨ η3 η3 ]| + Tr R E[−Q = Tr θ =θ ∗ =θˆ 2 ∂η3 ∂η3 2 ∂η3 η3 # " ˆ −1 −1 r ∂ 2R ˆ + ηˆ2 R ˆˆ ˆ + Ω . Tr 1 Σ R 2ˆ η1 ∂η3 η3 35
j) Q-function and first derivatives for global influence For the case deletion global influence study, r n(r − 1) (r − 1) 1 X ∗> ∗ ∗ Q[i] (θ|θ ) = − log φ1 − Tr(Ω ) − r r 2 2φ1 2φ1 k=1 k k ∗
k6=i
n(r − 1) (r − 1) − log η1 − log |R| 2r 2 1 X − Tr[R−1 (Ω∗ + b∗k b∗> k )]. 2η1 k=1
(A.4)
k6=i
Similarly to (A.2), the first-order derivative of Q˙ [i] (θ|θ ∗ ) given in (A.5) and by noting (A.3) we obtain ∗ ˙ Q[i] (θ|θ ) =
1 − X> r∗i φ1 1 1 n ∗ ∗ − 2 Tr(Ω ) − 2 r∗> i ri + 2φ1 2φ1 2φ1 −1 n 1 − 2 Tr R Ω∗ + b∗i b∗> i 2η1 2η 1 −1 1 1 ∂R −1 ∂R ∗ ∗ ∗> Tr R + Tr Ω + bi bi 2 ∂η2 2η1 ∂η2 1 ∂R−1 1 −1 ∂R ∗ ∗ ∗> Tr R + Tr Ω + bi bi 2 ∂η3 2η1 ∂η3
For the observation-level,
.
r (nr − 1) (r − 1) 1 X ∗> ∗ Q[ij] (θ|θ ∗ ) = − log1 − Tr(Ω∗ ) − r r 2 2φ1 2φ1 k=1 k k k6=i
1 ∗> ∗ (nr − 1) (r − 1) − ri[j] ri[j] − log η1 − log |R| 2φ1 2 2 r 1 1 X ∗ − log |R[j] | − Tr[R−1 (Ω∗ + b∗> k bk )] 2 2η1 k=1 k6=i
1 ∗ ∗> ∗ Tr[R−1 − [j] (Ω[j] + bi[j] bi[j] )], 2η1 36
(A.5)
where Q˙ [ij] (θ|θ ∗ ) has elements Q˙ [ij]β =
r 1 1 X > ∗ X ri + X> r∗ φ1 k=1 φ1 [j] i[j] k6=i
Q˙ [ij]φ1
(nr − 1) (r − 1) 1 + = − Tr(Ω∗ ) + 2 Tr(Ω∗[j] ) 2 2φ1 2φ1 2φ1 r 1 ∗> ∗ 1 X ∗> ∗ + 2 r r + r r 2φ1 k=1 i[j] i[j] 2φ1 i i k6=i
r (nr − 1) 1 X −1 ∗ ∗ ˙ Q[ij]η1 = − + 2 Tr R (Ω + b∗> k bk ) 2η1 2η1 k=1
Q˙ [ij]η2
Q˙ [ij]η3
k6=i i h 1 −1 ∗ ∗ ) b + 2 Tr R[j] (Ω[j] + b∗> i[j] i[j] 2η1 (r − 1) 1 −1 ∂R[j] −1 ∂R = − Tr R − Tr R[j] 2 ∂η 2 ∂η2 2 r −1 X ∂R 1 ∗ Tr − (Ω∗ + b∗> k bk ) 2η1 k=1 ∂η2 k6=i" # ∂R−1 1 [j] ∗ Tr (Ω∗ + b∗> − i[j] bi[j] ) 2η1 ∂η2 (r − 1) 1 −1 ∂R[j] −1 ∂R = − Tr R − Tr R[j] 2 ∂η3 2 ∂η3 r ∂R−1 ∗ 1 X ∗ (Ω + b∗> − Tr k bk ) 2η1 k=1 ∂η3 k6=i" # ∂R−1 1 [j] ∗ − Tr (Ω∗ + b∗> i[j] bi[j] ) . 2η1 ∂η3
Evaluating at θ = θ* = θ̂ and noting (A.3), we obtain Q̇_[ij](θ̂|θ̂) with elements

$$ \dot Q_{[ij]\beta}\big|_{\theta=\theta^*=\hat\theta} = -\frac{1}{\hat\phi_1} X^\top\hat r_i + \frac{1}{\hat\phi_1} X_{[j]}^\top\hat r_{i[j]}, $$
$$ \dot Q_{[ij]\phi_1}\big|_{\theta=\theta^*=\hat\theta} = -\frac{\operatorname{Tr}(\hat\Omega)}{2\hat\phi_1^2} - \frac{1}{2\hat\phi_1^2}\left[\hat r_i^\top\hat r_i - \hat r_{i[j]}^\top\hat r_{i[j]}\right] + \frac{1}{2\hat\phi_1^2}\operatorname{Tr}(\hat\Omega_{[j]}) + \frac{1}{2\hat\phi_1}, $$
$$ \dot Q_{[ij]\eta_1}\big|_{\theta=\theta^*=\hat\theta} = -\frac{1}{2\hat\eta_1^2}\operatorname{Tr}\!\left[\hat R^{-1}\left(\hat\Omega + \hat b_i\hat b_i^\top\right)\right] + \frac{1}{2\hat\eta_1^2}\operatorname{Tr}\!\left[\hat R_{[j]}^{-1}\left(\hat\Omega_{[j]} + \hat b_{i[j]}\hat b_{i[j]}^\top\right)\right] + \frac{1}{2\hat\eta_1}, $$
$$ \dot Q_{[ij]\eta_2}\big|_{\theta=\theta^*=\hat\theta} = \frac{1}{2}\operatorname{Tr}\!\left(\hat R^{-1}\frac{\partial\hat R}{\partial\eta_2}\right) - \frac{1}{2}\operatorname{Tr}\!\left(\hat R_{[j]}^{-1}\frac{\partial\hat R_{[j]}}{\partial\eta_2}\right) + \frac{1}{2\hat\eta_1}\operatorname{Tr}\!\left[\frac{\partial\hat R^{-1}}{\partial\eta_2}\left(\hat\Omega + \hat b_i\hat b_i^\top\right)\right] - \frac{1}{2\hat\eta_1}\operatorname{Tr}\!\left[\frac{\partial\hat R_{[j]}^{-1}}{\partial\eta_2}\left(\hat\Omega_{[j]} + \hat b_{i[j]}\hat b_{i[j]}^\top\right)\right], $$
$$ \dot Q_{[ij]\eta_3}\big|_{\theta=\theta^*=\hat\theta} = \frac{1}{2}\operatorname{Tr}\!\left(\hat R^{-1}\frac{\partial\hat R}{\partial\eta_3}\right) - \frac{1}{2}\operatorname{Tr}\!\left(\hat R_{[j]}^{-1}\frac{\partial\hat R_{[j]}}{\partial\eta_3}\right) + \frac{1}{2\hat\eta_1}\operatorname{Tr}\!\left[\frac{\partial\hat R^{-1}}{\partial\eta_3}\left(\hat\Omega + \hat b_i\hat b_i^\top\right)\right] - \frac{1}{2\hat\eta_1}\operatorname{Tr}\!\left[\frac{\partial\hat R_{[j]}^{-1}}{\partial\eta_3}\left(\hat\Omega_{[j]} + \hat b_{i[j]}\hat b_{i[j]}^\top\right)\right], $$

where X_[j] is the (n − 1) × p matrix in which the j-th row of X is deleted, Y_{i[j]} is Y_i without the j-th observation, r̂_{i[j]} = Y_{i[j]} − X_[j]β̂ − b̂_{i[j]}, b̂_{i[j]} = η̂_1 R̂_[j] Σ̂_[j]^{-1}(Y_{i[j]} − X_[j]β̂), and R̂_[j], Σ̂_[j] and Ω̂_[j] are (n − 1) × (n − 1) matrices.