Deletion, replacement and mean-shift for diagnostics in linear mixed models


Computational Statistics and Data Analysis 56 (2012) 202–208


Lei Shi (a), Gemai Chen (b,a,*)

(a) School of Statistics and Mathematics, Yunnan University of Finance and Economics, Kunming, Yunnan 650221, The People's Republic of China
(b) Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, T2N 1N4, Canada

* Corresponding author at: Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, T2N 1N4, Canada. E-mail addresses: [email protected] (L. Shi), [email protected] (G. Chen).

Article history: Received 1 March 2010; Received in revised form 12 April 2011; Accepted 4 July 2011; Available online 18 July 2011.

Keywords: Case deletion; Replacement; Mean-shift model; Diagnostics; Influential observations

Abstract

Deletion, replacement and the mean-shift model are three approaches frequently used to detect influential observations and outliers. For the general linear model with a known covariance matrix, it is known that these three approaches lead to the same update formulae for the estimates of the regression coefficients. However, if the covariance matrix is indexed by unknown parameters that also need to be estimated, the situation is unclear. In this paper, we show for a common subclass of linear mixed models that the three approaches are no longer equivalent. For maximum likelihood estimation, replacement is equivalent to the mean-shift model but neither is equivalent to case deletion. For restricted maximum likelihood estimation, the mean-shift model is equivalent to case deletion but neither is equivalent to replacement. We also demonstrate with real data that misuse of replacement and the mean-shift model in place of case deletion can lead to incorrect results.

1. Introduction

Influential observations and outliers need to be identified or detected during the diagnostic stage of building a statistical model. Deletion, replacement and the mean-shift model are three commonly used approaches for doing so. In a series of papers (Haslett and Hayes, 1998; Hayes and Haslett, 1999; Haslett, 1999), Haslett and his co-authors established that the deletion approach and the replacement approach lead to the same update formulae for the estimates of the regression coefficients in linear models with known covariance matrices. This equivalence allows one to use the replacement approach to greatly simplify the derivation of some model diagnostic measures and to speed up their computation, because with replacement the model structure is not changed and only some observations are replaced with predictions. Aimed originally at detecting outliers, the mean-shift model contains dummy variables that correspond to the cases of possible outliers, and the estimation of the regression coefficients in this model turns out to be equivalent to subset deletion (Cook and Weisberg, 1982; Chatterjee and Hadi, 1988; Martin, 1992; Wei and Fung, 1999). There is therefore a three-way equivalence. We note that the deletion and mean-shift model approaches define quantities that are interesting in their own right as influence diagnostics, while the replacement approach is used only for computational convenience.

For linear models with unknown covariance matrices, whose regression coefficients and covariance parameters need to be estimated simultaneously, the current influence/outlier detection methodology is much less developed.




Despite warnings (Hodges, 1998, p. 507; Atkinson, 1998, p. 521; Haslett and Dillane, 2004, p. 142), the most common practice is to treat the full-data estimates of the unknown parameters in the covariance matrix as if they were known and to proceed as in the known-covariance case; only a few papers, such as Schabenberger (2004) and Shi and Chen (2008a,b), have taken the warnings seriously. It is true that with unknown covariance matrices it is technically more difficult to develop a model diagnostic measure that is also simple to compute. More importantly, it seems that researchers have not fully appreciated the need to study the influence on the regression parameter estimates and the influence on the covariance matrix estimates equally seriously and simultaneously. With the increased conceptual and computational complexity that unknown covariance matrices bring, one wonders whether the advantages of the replacement approach are still available to help develop new diagnostic measures. Haslett and Dillane (2004) applied the 'delete = replace' idea to deletion diagnostics for variance component estimation in linear mixed models, but they did not justify that deletion is equivalent to replacement in their setting.

The goal of this paper is to examine the equivalence or non-equivalence among the deletion, replacement and mean-shift model approaches under linear mixed models. The rest of the paper is organized as follows. In Section 2 we introduce the linear mixed model and its parameter estimation. In Section 3 we derive update formulae for the estimates of the model parameters using the deletion, replacement and mean-shift model approaches and discuss their relationships. In Section 4 we illustrate with a real data set that the replacement and mean-shift model approaches give different results from those of the deletion approach.

2. Estimation in linear mixed models

Consider the linear mixed model

$$Y = X\beta + \epsilon, \qquad (2.1)$$

where Y = (y1, ..., yn)^T is an n × 1 response vector, X is an n × p design matrix of rank p, β is an unknown p × 1 fixed parameter vector, and ϵ is an n × 1 normally distributed vector with E(ϵ) = 0 and cov(ϵ) = Σ_{j=0}^{r} Z^j τ_j ≜ V(τ), where τ = (τ0, τ1, ..., τr)^T is the random parameter vector, constrained so that V(τ) is positive definite, and Z^j, j = 0, ..., r, are known n × n matrices. We assume Z^0 = I_n, so that a random error term is contained in ϵ, where I_n is the n × n identity matrix.
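To make the covariance structure of (2.1) concrete, the following minimal sketch (in Python; illustrative only, not code from the paper) builds V(τ) = Σ_j Z^j τ_j for a one-way random-effects layout, the simplest special case. The group sizes and τ values are made up for illustration.

```python
import numpy as np

groups = np.repeat(np.arange(4), 5)          # 4 groups of 5 observations, n = 20
n = groups.size
A = (groups[:, None] == np.arange(4)[None, :]).astype(float)   # n x 4 group indicator

Z = [np.eye(n), A @ A.T]                     # Z^0 = I_n, Z^1 has a block of ones per group
tau = np.array([1.0, 0.5])                   # (tau_0, tau_1); chosen so V(tau) is positive definite

V = sum(t * Zj for t, Zj in zip(tau, Z))     # V(tau) = Z^0 tau_0 + Z^1 tau_1, as in (2.1)
assert np.all(np.linalg.eigvalsh(V) > 0)     # check positive definiteness
```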

Model (2.1) includes a wide range of models as special cases, such as variance components models (Searle et al., 1992), hierarchical models (Bryk and Raudenbush, 1992), multilevel models (Goldstein, 1995), random-coefficients models (Longford, 1993) and some longitudinal data models (Diggle et al., 2002). When the random parameter τ is known, the generalized least squares estimator of β is

$$\hat\beta(\tau) = (X^T V^{-1}(\tau)X)^{-1} X^T V^{-1}(\tau)\, Y. \qquad (2.2)$$

When τ is unknown, there are two commonly used estimators of τ. The first is the maximum likelihood estimator (MLE), obtained by maximizing the main part of the log-likelihood function

$$l(\tau \mid Y) = -\log|V(\tau)| - (Y - X\beta)^T V^{-1}(\tau)(Y - X\beta). \qquad (2.3)$$

Differentiating l(τ | Y) with respect to τ_j, replacing β by β̂(τ) and setting the result to zero, we obtain the following system of equations that the MLE of τ must satisfy:

$$\mathrm{tr}(V^{-1}(\tau)Z^j) = Y^T Q(\tau) Z^j Q(\tau) Y, \quad j = 0, \ldots, r, \qquad (2.4)$$

where Q(τ) = V^{-1}(τ) − V^{-1}(τ)X(X^T V^{-1}(τ)X)^{-1}X^T V^{-1}(τ) is idempotent with respect to V(τ) because Q(τ)V(τ)Q(τ) = Q(τ) (Hayes and Haslett, 1999; Martin, 1992), and τ and β are estimated iteratively from (2.2) and (2.4). Since tr(V^{-1}(τ)Z^j) = tr(V^{-1}(τ)V(τ)V^{-1}(τ)Z^j) = Σ_{i=0}^{r} tr(V^{-1}(τ)Z^i V^{-1}(τ)Z^j) τ_i, (2.4) can be written as

$$\sum_{i=0}^{r} \mathrm{tr}(V^{-1}(\tau)Z^i V^{-1}(\tau)Z^j)\,\tau_i = Y^T Q(\tau) Z^j Q(\tau) Y, \quad j = 0, \ldots, r. \qquad (2.5)$$

The second estimator of τ is the restricted maximum likelihood estimator (REMLE), which takes into account the loss of degrees of freedom from estimating β (Harville, 1977). Following Goldstein (1989), the REMLE is found by maximizing

$$l_r(\tau \mid Y) = -\log|V(\tau)| - \log|X^T V^{-1}(\tau)X| - (Y - X\beta)^T V^{-1}(\tau)(Y - X\beta), \qquad (2.6)$$

which leads to the following system of equations that the REMLE of τ must satisfy:

$$\mathrm{tr}(Q(\tau)Z^j) = Y^T Q(\tau) Z^j Q(\tau) Y, \quad j = 0, \ldots, r. \qquad (2.7)$$

Christensen (1996, Section 12.6) showed that this system of equations is equivalent to equating the sum of squares Y^T Q(τ)Z^j Q(τ)Y to its expected value E(Y^T Q(τ)Z^j Q(τ)Y). Since tr(Q(τ)Z^j) = tr(Q(τ)V(τ)Q(τ)Z^j) = Σ_{i=0}^{r} tr(Q(τ)Z^i Q(τ)Z^j) τ_i, j = 0, ..., r, (2.7) becomes

$$\sum_{i=0}^{r} \mathrm{tr}(Q(\tau)Z^i Q(\tau)Z^j)\,\tau_i = Y^T Q(\tau) Z^j Q(\tau) Y, \quad j = 0, \ldots, r. \qquad (2.8)$$
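The authors report using their own Matlab code for all computations; as a rough indication of how (2.2), (2.5) and (2.8) can be iterated in practice, here is a self-contained Python sketch. The function name, defaults and stopping rule are ours and are illustrative only.

```python
import numpy as np

def fit_lmm(Y, X, Z, tau0, reml=False, tol=1e-10, max_iter=200):
    """Estimate (beta, tau) for V(tau) = sum_j Z[j]*tau[j] by iterating the GLS step
    (2.2) with the linear system (2.5) (MLE) or (2.8) (REMLE).  No safeguard is
    included to keep V(tau) positive definite during the iterations."""
    tau = np.asarray(tau0, dtype=float)
    r1 = len(Z)                                    # r + 1 variance parameters
    for _ in range(max_iter):
        V = sum(t * Zj for t, Zj in zip(tau, Z))
        Vinv = np.linalg.inv(V)
        XtVinv = X.T @ Vinv
        Q = Vinv - Vinv @ X @ np.linalg.solve(XtVinv @ X, XtVinv)
        W = Q if reml else Vinv                    # trace weight: Q for REMLE, V^{-1} for MLE
        T = np.array([[np.trace(W @ Z[i] @ W @ Z[j]) for j in range(r1)]
                      for i in range(r1)])
        s = np.array([Y @ Q @ Z[j] @ Q @ Y for j in range(r1)])
        tau_new = np.linalg.solve(T, s)            # one pass of T(tau) tau = s(tau)
        converged = np.max(np.abs(tau_new - tau)) < tol
        tau = tau_new
        if converged:
            break
    V = sum(t * Zj for t, Zj in zip(tau, Z))
    Vinv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)   # GLS step (2.2)
    return beta, tau
```

With Y and X simulated for the layout of the earlier sketch, fit_lmm(Y, X, Z, tau0=[1.0, 1.0]) would return approximate MLEs and fit_lmm(..., reml=True) approximate REMLEs; each pass is a fixed-point step, so convergence is not guaranteed in general.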


Both (2.5) and (2.8) can be written in the compact form T(τ)τ = s(τ), where T(τ) is an (r + 1) × (r + 1) matrix and s(τ) is an (r + 1)-vector. For example, for the REMLE the elements of T(τ) and s(τ) are t_{ij}(τ) = tr(Q(τ)Z^i Q(τ)Z^j) and s_j(τ) = Y^T Q(τ)Z^j Q(τ)Y, respectively (Searle et al., 1992).

3. Deletion, replacement and mean-shift model

3.1. Deletion

We consider general subset deletion. Let a = {i_1, ..., i_m} (m < n) denote the indexes of a subset of the observations and D_a = (d_{i_1}, ..., d_{i_m}), where d_{i_k} is an n × 1 vector with the i_k-th element equal to 1 and the rest equal to zero, k = 1, ..., m. We use the subscript [a] to denote the corresponding result when the observations indexed by a are removed from the data. For example, β̂_[a] is the estimator of β after removing the observations indexed by a, and Q_[a] is obtained in the same way as the matrix Q except that the observations indexed by a are removed. When τ is known, an update formula for the estimator β̂(τ) of β is (Haslett, 1999; Shi and Chen, 2008a)

$$\hat\beta_{[a]}(\tau) = \hat\beta(\tau) - (X^T V^{-1}(\tau)X)^{-1} X^T V^{-1}(\tau) D_a (D_a^T Q(\tau) D_a)^{-1} D_a^T Q(\tau) Y. \qquad (3.1)$$

When τ is unknown, from (2.5) the MLE of τ with the observations indexed by a removed satisfies the system of equations

$$\sum_{i=0}^{r} \mathrm{tr}(V_{[a]}^{-1}(\tau) Z_{[a]}^i V_{[a]}^{-1}(\tau) Z_{[a]}^j)\,\tau_i = Y_{[a]}^T Q_{[a]}(\tau) Z_{[a]}^j Q_{[a]}(\tau) Y_{[a]}, \quad j = 0, \ldots, r, \qquad (3.2)$$

and the corresponding REMLE of τ satisfies

$$\sum_{i=0}^{r} \mathrm{tr}(Q_{[a]}(\tau) Z_{[a]}^i Q_{[a]}(\tau) Z_{[a]}^j)\,\tau_i = Y_{[a]}^T Q_{[a]}(\tau) Z_{[a]}^j Q_{[a]}(\tau) Y_{[a]}, \quad j = 0, \ldots, r. \qquad (3.3)$$

Let I_{\{a\}} be the result of removing the rows (but not the columns) of the n × n identity matrix that are indexed by a. Using property (3.4) in Shi and Chen (2008a), we can prove (see Appendix A) that

$$I_{\{a\}}^T V_{[a]}^{-1}(\tau) I_{\{a\}} = V^{-1}(\tau) - V^{-1}(\tau) D_a (D_a^T V^{-1}(\tau) D_a)^{-1} D_a^T V^{-1}(\tau) \triangleq N_a(\tau),$$
$$I_{\{a\}}^T Q_{[a]}(\tau) I_{\{a\}} = Q(\tau) - Q(\tau) D_a (D_a^T Q(\tau) D_a)^{-1} D_a^T Q(\tau) \triangleq H_a(\tau). \qquad (3.4)$$

It can be directly verified that both N_a(τ) and H_a(τ) are idempotent with respect to V(τ), namely, N_a(τ)V(τ)N_a(τ) = N_a(τ) and H_a(τ)V(τ)H_a(τ) = H_a(τ). By the above identities and the facts that Z_{[a]}^j = I_{\{a\}} Z^j I_{\{a\}}^T and tr(AB) = tr(BA), (3.2) reduces to

$$\sum_{i=0}^{r} \mathrm{tr}(N_a(\tau) Z^i N_a(\tau) Z^j)\,\tau_i = Y^T H_a(\tau) Z^j H_a(\tau) Y, \quad j = 0, \ldots, r, \qquad (3.5)$$

and (3.3) becomes

$$\sum_{i=0}^{r} \mathrm{tr}(H_a(\tau) Z^i H_a(\tau) Z^j)\,\tau_i = Y^T H_a(\tau) Z^j H_a(\tau) Y, \quad j = 0, \ldots, r. \qquad (3.6)$$
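The practical point of (3.4)-(3.6) is that the deletion estimating equations can be assembled from full-data matrices only, with no refitting on the reduced data. The following Python sketch (illustrative helper names, not the authors' code) does exactly that; the assertions check the idempotence property stated above.

```python
import numpy as np

def Na_Ha(V, X, a):
    """N_a(tau) and H_a(tau) of (3.4), computed from full-data quantities only."""
    n = V.shape[0]
    Da = np.eye(n)[:, a]                                   # D_a: unit vectors for the indices in a
    Vinv = np.linalg.inv(V)
    XtVinv = X.T @ Vinv
    Q = Vinv - Vinv @ X @ np.linalg.solve(XtVinv @ X, XtVinv)
    Na = Vinv - Vinv @ Da @ np.linalg.solve(Da.T @ Vinv @ Da, Da.T @ Vinv)
    Ha = Q - Q @ Da @ np.linalg.solve(Da.T @ Q @ Da, Da.T @ Q)
    assert np.allclose(Na @ V @ Na, Na) and np.allclose(Ha @ V @ Ha, Ha)  # idempotence w.r.t. V
    return Na, Ha

def deletion_equations(Y, X, Z, V, a, reml=False):
    """Coefficient matrix and right-hand side of (3.5) (MLE) or (3.6) (REMLE)."""
    Na, Ha = Na_Ha(V, X, a)
    W = Ha if reml else Na            # trace weight; the right-hand side uses H_a in both cases
    r1 = len(Z)
    T_a = np.array([[np.trace(W @ Z[i] @ W @ Z[j]) for j in range(r1)]
                    for i in range(r1)])
    s_a = np.array([Y @ Ha @ Z[j] @ Ha @ Y for j in range(r1)])
    return T_a, s_a
```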

Both (3.5) and (3.6) can be written in the form T_a(τ)τ = s_a(τ), as in the full-data case, and at convergence we have T_a(τ̂_[a])τ̂_[a] = s_a(τ̂_[a]), where τ̂_[a] is either the deletion MLE or the deletion REMLE of τ. When the τ in (3.1) is replaced with this τ̂_[a], the resulting β̂_[a](τ̂_[a]) reflects both the direct effect of deleting the observations indexed by a and the indirect effect, on the estimate of β, of estimating the parameter τ. We note that the introduction of N_a(τ) and H_a(τ) simplifies the computation of τ̂_[a].

3.2. Replacement

Let (Y, X, V(τ)) denote a generic linear mixed model with covariance matrix V(τ), let Y_a denote the observations indexed by a, and let X_a denote the rows of X indexed by a. When τ is known, the deletion approach starts by fitting the model (Y_[a], X_[a], V_[a](τ)) to obtain a deletion estimator β̂_[a](τ) of β. Using this fitted model, one can predict Y_a by

$$\tilde Y_a(\tau) = X_a \hat\beta_{[a]}(\tau) + V_{a[a]}(\tau) V_{[a][a]}^{-1}(\tau)\,(Y_{[a]} - X_{[a]} \hat\beta_{[a]}(\tau)),$$

where V_{a[a]}(τ) = cov(Y_a, Y_[a]) and V_{[a][a]}(τ) = cov(Y_[a], Y_[a]). Replacing Y_a in Y with Ỹ_a(τ) to form a new observation vector Y_ã(τ) = (Ỹ_a^T(τ), Y_[a]^T)^T, it can be shown that Y_ã(τ) = (I_n − D̄_a(τ)Q(τ))Y (Haslett and Dillane, 2004, p. 135), where D̄_a(τ) = D_a(D_a^T Q(τ)D_a)^{-1}D_a^T. Fitting the model (Y_ã(τ), X, V(τ)) gives another estimator β̂_ã(τ) of β. Haslett (1999) showed that β̂_[a](τ) = β̂_ã(τ), the so-called 'delete = replace' identity.
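The identity is easy to check numerically. The sketch below (simulated data, illustrative names) computes β̂_[a](τ) from the update formula (3.1) and the replacement estimate β̂_ã(τ) from the closed form of Y_ã(τ), for a fixed, known τ.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
groups = np.repeat(np.arange(4), 5)
A = (groups[:, None] == np.arange(4)[None, :]).astype(float)
V = 1.0 * np.eye(n) + 0.5 * A @ A.T                      # a fixed, known V(tau)
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.multivariate_normal(np.zeros(n), V)

a = [0, 7, 15]                                           # indices to delete / replace
Da = np.eye(n)[:, a]
Vinv = np.linalg.inv(V)
XtVinv = X.T @ Vinv
beta_full = np.linalg.solve(XtVinv @ X, XtVinv @ Y)
Q = Vinv - Vinv @ X @ np.linalg.solve(XtVinv @ X, XtVinv)

# Deletion update formula (3.1)
beta_del = beta_full - np.linalg.solve(XtVinv @ X,
            XtVinv @ Da @ np.linalg.solve(Da.T @ Q @ Da, Da.T @ Q @ Y))

# Replacement: Y_a~(tau) = (I_n - D_a (D_a^T Q D_a)^{-1} D_a^T Q) Y, then refit GLS
Y_rep = Y - Da @ np.linalg.solve(Da.T @ Q @ Da, Da.T @ Q @ Y)
beta_rep = np.linalg.solve(XtVinv @ X, XtVinv @ Y_rep)

assert np.allclose(beta_del, beta_rep)                   # 'delete = replace' for known tau
```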


When τ is unknown and needs to be estimated simultaneously with β, a natural way to apply the replacement approach is to obtain Y_ã(τ) for a given τ and then estimate τ through

$$\sum_{i=0}^{r} \mathrm{tr}(V^{-1}(\tau) Z^i V^{-1}(\tau) Z^j)\,\tau_i = Y_{\tilde a}^T(\tau) Q(\tau) Z^j Q(\tau) Y_{\tilde a}(\tau) = Y^T H_a(\tau) Z^j H_a(\tau) Y, \quad j = 0, \ldots, r, \qquad (3.7)$$

if the MLE is wanted, and through

$$\sum_{i=0}^{r} \mathrm{tr}(Q(\tau) Z^i Q(\tau) Z^j)\,\tau_i = Y^T H_a(\tau) Z^j H_a(\tau) Y, \quad j = 0, \ldots, r, \qquad (3.8)$$

if the REMLE is desired. We see that (3.7) is not equivalent to (3.5) for the MLE and that (3.8) is not equivalent to (3.6) for the REMLE. Together, this shows that the case deletion approach is not equivalent to the replacement approach under linear mixed models.

3.3. Mean-shift model

To test whether the observations indexed by a are outliers, one can expand (2.1) to

$$Y = X\beta + D_a\phi + \epsilon, \qquad (3.9)$$

where φ is an m × 1 parameter vector, and conduct a test of H0: φ = 0 versus H1: φ ≠ 0. Let X* = (X, D_a) and write model (3.9) as (Y, X*, V(τ)). When τ is known, it is known (Martin, 1992; Wei and Fung, 1999) that the estimator of β in model (3.9) is the same as β̂_[a](τ) in Section 3.1 and that diagnostic measures based on the mean-shift model are equivalent to those based on case deletion. Note that the mean-shift model approach keeps the observations Y and the covariance structure V(τ) unchanged and changes the design matrix X into X*. Therefore, under model (3.9) the quantity that changes when estimating τ by the MLE or the REMLE is the Q(τ) in Eqs. (2.5) and (2.8), respectively. Define

$$Q^{*}(\tau) = V^{-1}(\tau) - V^{-1}(\tau) X^{*} (X^{*T} V^{-1}(\tau) X^{*})^{-1} X^{*T} V^{-1}(\tau). \qquad (3.10)$$
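A quick numerical check of the identity Q*(τ) = H_a(τ), which is proved in Appendix B, can be run with simulated inputs; the sketch below is illustrative only and uses made-up dimensions and values.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 15
g = np.repeat(np.arange(3), 5)                          # three blocks of five observations
V = np.eye(n) + 0.5 * (g[:, None] == g[None, :])        # a valid V(tau) for illustration
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
a = [2, 9]
Da = np.eye(n)[:, a]

Vinv = np.linalg.inv(V)
def resid_proj(M):                                      # Q-type matrix for a design matrix M
    MtVinv = M.T @ Vinv
    return Vinv - Vinv @ M @ np.linalg.solve(MtVinv @ M, MtVinv)

Q = resid_proj(X)
Ha = Q - Q @ Da @ np.linalg.solve(Da.T @ Q @ Da, Da.T @ Q)      # (3.4)
Qstar = resid_proj(np.column_stack([X, Da]))                    # (3.10) with X* = (X, D_a)
assert np.allclose(Qstar, Ha)
```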

We show in Appendix B that Q*(τ) = H_a(τ). The MLE of τ under (3.9) satisfies

$$\sum_{i=0}^{r} \mathrm{tr}(V^{-1}(\tau) Z^i V^{-1}(\tau) Z^j)\,\tau_i = Y^T Q^{*}(\tau) Z^j Q^{*}(\tau) Y, \quad j = 0, \ldots, r, \qquad (3.11)$$

which are different from (3.5) under case deletion. On the other hand, the REMLE of τ under (3.9) satisfies

$$\sum_{i=0}^{r} \mathrm{tr}(Q^{*}(\tau) Z^i Q^{*}(\tau) Z^j)\,\tau_i = Y^T Q^{*}(\tau) Z^j Q^{*}(\tau) Y, \quad j = 0, \ldots, r, \qquad (3.12)$$

which are the same as (3.6). Therefore, the case deletion approach and the mean-shift model approach are not equivalent under MLE and are equivalent under REMLE. On the other hand, the replacement approach and the mean-shift model approach are equivalent under MLE and are not equivalent under REMLE.

4. Example

We use a real data set to demonstrate the differences among the deletion, replacement and mean-shift model approaches in diagnostic analysis of linear mixed models.

Example: JSP data. The data consist of 728 pupils in 48 primary (elementary) schools in Inner London, collected as part of the 'Junior School Project' (JSP) and analysed by Goldstein (1995) using a multilevel model. Two measurement occasions are considered: the first when the pupils were in their fourth year of schooling (8 years old), and the second three years later, in their final year of primary school (11 years old). We have the scores from mathematics tests administered on these two occasions together with information on the social background of the pupils and their gender. Goldstein (1995) suggested a two-level model for these data, given by

$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \beta_3 x_{3ij} + e_{0i}^{(2)} + e_{1i}^{(2)} x_{1ij} + e_{ij}^{(1)}, \qquad (4.1)$$

where y_{ij} is the 11-year-old score for the j-th pupil in the i-th school, x_{1ij} is the 8-year-old score standardized by the sample mean and sample standard deviation (to avoid ill-conditioning of some of the relevant matrices), x_{2ij} is gender (1 for boy, 0 for girl), x_{3ij} is social class (1 for non-manual, 0 for manual), e_{ij}^{(1)}, e_{0i}^{(2)} and e_{1i}^{(2)} are the level 1 and level 2 random errors with variances τ0, τ1, τ2 and covariance cov(e_{0i}^{(2)}, e_{1i}^{(2)}) = τ3, and the level 1 and level 2 random errors are independent. The covariance matrix is V(τ) = diag(V_1(τ), ..., V_{48}(τ)) with

$$V_i(\tau) = I_{n_i}\tau_0 + 1_{n_i} 1_{n_i}^T \tau_1 + x_{1i} x_{1i}^T \tau_2 + (1_{n_i} x_{1i}^T + x_{1i} 1_{n_i}^T)\tau_3,$$

where n_i is the number of students in the i-th school and 1_{n_i} is an n_i × 1 vector of ones, i = 1, ..., 48.
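As a small illustration of how model (4.1) fits the general form (2.1), the following sketch (illustrative only; the pupil scores and τ values are made up) builds the four known matrices for one school and assembles V_i(τ).

```python
import numpy as np

def school_blocks(x1i):
    """Z_i^0, ..., Z_i^3 for one school, given its vector x1i of standardized 8-year scores."""
    ni = len(x1i)
    ones = np.ones((ni, 1))
    x = np.asarray(x1i, dtype=float).reshape(-1, 1)
    return [np.eye(ni),                       # level-1 error variance tau_0
            ones @ ones.T,                    # random-intercept variance tau_1
            x @ x.T,                          # random-slope variance tau_2
            ones @ x.T + x @ ones.T]          # intercept-slope covariance tau_3

def school_cov(x1i, tau):
    return sum(t * Zj for t, Zj in zip(tau, school_blocks(x1i)))

# Example: one school with 4 pupils; the tau values are made up for illustration.
Vi = school_cov([-0.3, 0.1, 0.8, 1.2], tau=[0.41, 0.09, 0.01, -0.02])
assert np.allclose(Vi, Vi.T)                  # symmetric by construction
```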

Table 1
MLE and REMLE of the parameters in (4.1) based on deletion, replacement and mean-shift model when school 48 is deleted from the JSP data.

                        MLE                                              REMLE
          Full data   Deletion   Replace   Mean-shift      Full data   Deletion   Replace   Mean-shift
β̂0          0.1358     0.1476    0.1476     0.1476           0.1356     0.1471    0.1469     0.1471
β̂1          0.6653     0.6731    0.6733     0.6733           0.6659     0.6733    0.6735     0.6733
β̂2         −0.0340    −0.0298   −0.0294    −0.0294          −0.0339    −0.0298   −0.0296    −0.0298
β̂3         −0.1633    −0.1663   −0.1667    −0.1667          −0.1631    −0.1657   −0.1657    −0.1657
AC(β̂)          –        0.1955    0.2030     0.2030             –        0.1777    0.1828     0.1777
τ̂0          0.4121     0.4104    0.3815     0.3815           0.4133     0.4115    0.3824     0.4115
τ̂1          0.0858     0.0842    0.0831     0.0831           0.0883     0.0868    0.0856     0.0868
τ̂2          0.0102     0.0105    0.0104     0.0104           0.0114     0.0188    0.0188     0.0188
τ̂3         −0.0211    −0.0245   −0.0244    −0.0244          −0.0218    −0.0252   −0.0251    −0.0252
AC(τ̂)          –        0.2007    2.2034     2.2034             –        0.1774    2.1658     0.1774

School IDs run from 1 to 50, but numbers 10 and 43 are missing, so there are 48 schools in the data. The fixed parameter vector is β = (β0, β1, β2, β3)^T and the random parameter vector is τ = (τ0, τ1, τ2, τ3)^T.

We use Cook's distance to measure the change in the estimates of β and τ when some observations are deleted. For example, for the case deletion approach the actual change for τ̂_[a] is defined by

$$AC(\hat\tau) = (\hat\tau - \hat\tau_{[a]})^T M^{-1} (\hat\tau - \hat\tau_{[a]}), \qquad (4.2)$$

where M is a positive definite matrix and τ̂ is the full-data estimator of τ obtained by MLE or REMLE. A popular choice for M^{-1}, and the one we use in our example, is the observed information matrix of τ, with elements m*_{ij} = (1/2) tr(V^{-1}(τ̂)Z^j V^{-1}(τ̂)Z^i) for MLE and m*_{ij} = (1/2) tr(Q(τ̂)Z^j Q(τ̂)Z^i) for REMLE (Harville, 1977, p. 326).
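The next sketch shows how this actual change could be computed in code; it is a rough illustration under the information-matrix choice of M^{-1} described above, with helper names of our own rather than the authors' Matlab code. Here tau_alt would be the deletion (or replacement, or mean-shift) estimate being compared with the full-data estimate tau_full.

```python
import numpy as np

def actual_change_tau(tau_full, tau_alt, X, Z, reml=False):
    """AC(tau_hat) of (4.2) with M^{-1} the observed information matrix of tau,
    evaluated at the full-data estimate tau_full."""
    tau_full = np.asarray(tau_full, dtype=float)
    V = sum(t * Zj for t, Zj in zip(tau_full, Z))        # V(tau_hat)
    Vinv = np.linalg.inv(V)
    XtVinv = X.T @ Vinv
    Q = Vinv - Vinv @ X @ np.linalg.solve(XtVinv @ X, XtVinv)
    W = Q if reml else Vinv                              # Q for REMLE, V^{-1} for MLE
    r1 = len(Z)
    Minv = 0.5 * np.array([[np.trace(W @ Z[j] @ W @ Z[i]) for j in range(r1)]
                           for i in range(r1)])          # m*_{ij} = (1/2) tr(W Z^j W Z^i)
    d = tau_full - np.asarray(tau_alt, dtype=float)
    return float(d @ Minv @ d)
```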

The actual change AC(β̂) for the estimate of β is defined similarly, and the actual changes AC(β̂) and AC(τ̂) under the replacement approach and the mean-shift model approach are also defined similarly. All of the computations in this paper are done with Matlab code written by us, based on the iterative equations derived above; all estimates are obtained iteratively with a stopping precision of 10^{-10}.

First we consider MLE. The left panel of Table 1 gives the estimation results using the three approaches when school 48 is deleted from the JSP data. The three approaches give very close estimates of βi and τi, i = 0, 1, 2, 3; however, the Cook distance 2.2034 of τ̂ under the replacement and mean-shift approaches is more than 10 times the Cook distance 0.2007 under case deletion. The reason for this large discrepancy is that, with the replacement and mean-shift approaches, the change in τ̂0 relative to the full-data approximate standard error 0.0229 is much larger (|0.4121 − 0.3815|/0.0229 = 1.3362) than with case deletion (|0.4121 − 0.4104|/0.0229 = 0.0742). For a more comprehensive picture, Fig. 1 shows the index plots of AC(τ̂) over the 48 schools in the study (the missing schools 10 and 43 are plotted with height zero). Under the standard cut-off value of 1, the case deletion approach flags no influential schools, while the replacement and mean-shift model approaches pick up schools 13, 31, 42, 47 and 48 as influential.

Second, we consider REMLE. The right panel of Table 1 gives the estimation results using the three approaches when school 48 is deleted from the JSP data. Here the case deletion approach and the mean-shift model approach agree, but both differ from the replacement approach. The large Cook distance 2.1658 of τ̂ for the replacement approach arises for the same reason as in the MLE case. Fig. 2 displays the index plots of AC(τ̂) in the REMLE case: the replacement approach identifies schools 13, 31, 42, 47 and 48 as influential, while the deletion and mean-shift model approaches perform identically and find no influential schools.

5. Discussion

In this paper we have studied the equivalence/non-equivalence among case deletion, replacement and the mean-shift model under linear mixed models whose covariance matrices are indexed by unknown parameters, using either maximum likelihood or restricted maximum likelihood estimation. We have shown that under maximum likelihood estimation, replacement and the mean-shift model are equivalent but neither is equivalent to case deletion, while under restricted maximum likelihood estimation, case deletion and the mean-shift model are equivalent but neither is equivalent to replacement. Therefore, case deletion and replacement are not equivalent under either estimation method.

We note that the linear mixed models considered in this paper do not include most spatial models or models with AR(p) errors. In this larger class of linear mixed models, it may not be true that mean-shift is equivalent to replacement for MLE, or that mean-shift is equivalent to case deletion for REMLE. Our guess is that in this larger class, case deletion, mean-shift and replacement may not be equivalent at all.
An open problem we shall investigate in a separate paper is: Is it possible to ‘‘add something’’ to regain the equivalence among case deletion, replacement and mean-shift model?


Fig. 1. Index plots of AC(τ̂) using case deletion (o), replacement ( ) and mean-shift model (x) when the MLE method is used.

Fig. 2. Index plots of AC(τ̂) using case deletion (o), replacement ( ) and mean-shift model (x) when the REMLE method is used.

Acknowledgements

We thank the Editor, the Associate Editor and, in particular, two reviewers for their helpful comments and suggestions, which have greatly improved the presentation of the paper. The first author's research was supported by the National Natural Science Foundation of China, and the second author's research was supported by the Natural Sciences and Engineering Research Council of Canada.

Appendix A

For ease of presentation, we drop the dependence on τ in the following proofs.

A.1. Proof of Eq. (3.4)

From property (3) of (3.4) given in Shi and Chen (2008a), we see that I_{\{a\}}^T V_{[a]}^{-1} I_{\{a\}} = N_a, which is the first equation of (3.4) in this paper. Furthermore, we have I_{\{a\}}^T Q_{[a]} I_{\{a\}} = N_a − N_a X (X^T N_a X)^{-1} X^T N_a. It can be verified that

$$(X^T N_a X)^{-1} = (X^T V^{-1} X)^{-1} + (X^T V^{-1} X)^{-1} X^T V^{-1} \bar D_a V^{-1} X (X^T V^{-1} X)^{-1}.$$

Substituting this equation and N_a = V^{-1} − V^{-1} D_a (D_a^T V^{-1} D_a)^{-1} D_a^T V^{-1} into N_a − N_a X (X^T N_a X)^{-1} X^T N_a, we obtain the second equation of (3.4) after some simple algebra.


Appendix B. Proof of Q*(τ) = H_a(τ)

By definition, Q* = V^{-1} − P*, where P* = V^{-1} X* (X*^T V^{-1} X*)^{-1} X*^T V^{-1} and P = V^{-1} X (X^T V^{-1} X)^{-1} X^T V^{-1}. Using the inverse formula for a block matrix, we have

$$P^{*} = V^{-1} X^{*} (X^{*T} V^{-1} X^{*})^{-1} X^{*T} V^{-1}
       = (V^{-1}X,\; V^{-1}D_a)\begin{pmatrix} (X^T V^{-1} X)^{-1} X^T V^{-1} C_a^T \\ (D_a^T Q D_a)^{-1} D_a^T Q \end{pmatrix}
       = P C_a^T + V^{-1} D_a (D_a^T Q D_a)^{-1} D_a^T Q,$$

where C_a = I − Q D_a (D_a^T Q D_a)^{-1} D_a^T. Thus

$$Q^{*} = V^{-1} - P C_a^T - V^{-1} D_a (D_a^T Q D_a)^{-1} D_a^T Q
       = V^{-1} - P + P D_a (D_a^T Q D_a)^{-1} D_a^T Q - V^{-1} D_a (D_a^T Q D_a)^{-1} D_a^T Q
       = Q - Q D_a (D_a^T Q D_a)^{-1} D_a^T Q = H_a,$$

which finishes the proof.

References

Atkinson, A., 1998. Discussion on some algebra and geometry for hierarchical models, applied to diagnostics. Journal of the Royal Statistical Society: Series B 60, 521–523.
Bryk, A.S., Raudenbush, S.W., 1992. Hierarchical Linear Models: Applications and Data Analysis Methods. Advanced Quantitative Techniques in the Social Sciences, vol. 1. Sage, Newbury Park.
Chatterjee, S., Hadi, A.S., 1988. Sensitivity Analysis in Linear Regression. John Wiley, New York.
Christensen, R., 1996. Plane Answers to Complex Questions. Springer, Berlin.
Cook, R.D., Weisberg, S., 1982. Residuals and Influence in Regression. Chapman and Hall, New York.
Diggle, P., Liang, K.Y., Zeger, S.L., 2002. Analysis of Longitudinal Data, second ed. Oxford University Press, Oxford.
Goldstein, H., 1989. Restricted unbiased iterative generalized least-squares estimation. Biometrika 76, 622–623.
Goldstein, H., 1995. Multilevel Statistical Models, second ed. Halsted Press, New York.
Harville, D.A., 1977. Maximum likelihood approaches to variance component estimation and related problems. Journal of the American Statistical Association 72, 320–338.
Haslett, J., 1999. A simple derivation of deletion diagnostic results for the general linear model with correlated errors. Journal of the Royal Statistical Society: Series B 61, 603–609.
Haslett, J., Dillane, D., 2004. Application of 'delete = replace' to deletion diagnostics for variance component estimation in the linear mixed model. Journal of the Royal Statistical Society: Series B 66, 131–141.
Haslett, J., Hayes, K., 1998. Residuals for the linear model with general covariance structure. Journal of the Royal Statistical Society: Series B 60, 201–215.
Hayes, K., Haslett, J., 1999. Simplifying general least squares. The American Statistician 53, 376–381.
Hodges, J.S., 1998. Some algebra and geometry for hierarchical models, applied to diagnostics. Journal of the Royal Statistical Society: Series B 60, 497–536.
Longford, N.T., 1993. Random Coefficients Models. Oxford University Press, Oxford.
Martin, R.J., 1992. Leverage, influence and residuals in regression models when observations are correlated. Communications in Statistics—Theory and Methods 21, 1183–1212.
Schabenberger, O., 2004. Mixed model influence diagnostics. SUGI-29, Paper 189.
Searle, S.R., Casella, G., McCulloch, C.E., 1992. Variance Components. Wiley, New York.
Shi, L., Chen, G., 2008a. Case deletion diagnostics in multilevel models. Journal of Multivariate Analysis 99, 1860–1877.
Shi, L., Chen, G., 2008b. Detection of outliers in multilevel models. Journal of Statistical Planning and Inference 138, 3189–3199.
Wei, W.H., Fung, W.K., 1999. The mean-shift outlier model in general weighted regression and its applications. Computational Statistics and Data Analysis 30, 429–441.