Computational Statistics and Data Analysis 53 (2009) 3787–3794
Influence analysis with homogeneous linear restrictions

Klaus L.P. Vasconcellos, L.M. Zea Fernandez

Departamento de Estatística, CCEN, Universidade Federal de Pernambuco, Cidade Universitária, 50740-540, Recife, PE, Brazil
Article history: Received 10 July 2008; received in revised form 28 March 2009; accepted 31 March 2009; available online 5 April 2009.
Abstract

We discuss the problem of influence analysis for the particular situation in which the observations must satisfy a set of homogeneous linear restrictions. We present the proof of a theoretical result showing that, for this situation, we can define a system of orthogonal influence directions, as in the unrestricted problem. As an illustration, we discuss a linear regression model where some of the explanatory variables are in the form of proportions, giving also an example with a numerical data set. Our example shows that the influence measure discussed here can be quite effective in identifying relevant perturbations. © 2009 Elsevier B.V. All rights reserved.
1. Introduction

The formulation of a specific model to describe approximately the behaviour of data is never complete without diagnostic tests. These tests help us decide whether a given chosen model is adequate to represent the behaviour of our data. An important part of the study of diagnostics is influence analysis, which is a set of tools designed to assess robustness to small perturbations in the data or in the selected model. Cook (1986) introduced a widely used likelihood-based method to measure what is called the local influence of a given perturbation. The measure introduced by Cook is constructed using the basic geometric idea of curvature of the log-likelihood surface. This curvature can be represented by a quadratic form in the space of perturbations, and the most influential direction, defined as the direction of maximum curvature, is then given by the eigenvector associated with the largest eigenvalue of the quadratic form.

The measure of influence originally proposed by Cook has been extensively studied in the literature from both theoretical and practical points of view. Paula (1993) discusses local influence in restricted regression models. Gu and Fung (1998) assess local influence for canonical correlation analysis. Shi and Wang (1999) consider local influence for ridge regression. Poon and Poon (1999) present a normalization of Cook's measure which provides a good idea of different levels of influence. Critchley et al. (2001) introduce another approach to influence analysis by dealing with a perturbation as a change in a weighted empirical distribution. Zhu and Lee (2001) study local influence for incomplete data. Zhu et al. (2003) consider local influence for a semiparametric linear model estimation that uses a penalized Gaussian likelihood. Pan and Bai (2003) study local influence for some growth curve models. Zhu and Lee (2003) consider local influence for generalized linear mixed models. Ortega et al. (2003) discuss influence in generalized log-gamma regression models. Critchley and Marriott (2004) extend the methodology of Cook by considering the available information about perturbations that can be obtained from the data itself. Lee and Liang (2004) provide influence analysis for nonlinear mixed-effects models. Huang et al. (2007) propose measures of influence in linear discriminant analysis. Lu and Song (2006) provide local influence analysis of multivariate probit latent variable models. Zhu et al. (2007) develop a rigorous differential-geometrical framework for local influence when the first derivative of the objective function is not zero at the specified point. Espinheira et al. (2008) present local influence for beta regression models. Zhu et al. (2008) develop influence measures based upon empirical likelihood.
In this paper, we consider the problem of local influence for the particular case where the perturbations must satisfy a set of homogeneous linear restrictions. We provide a proof of a theoretical result showing that we can define a system of orthogonal influence directions based on the eigenvectors of a given matrix. We also present an example, that of a simple linear regression model where some of the explanatory variables appear in the form of proportions. A numerical example with a given data set is provided and shows that the method proposed here can be quite effective in assessing the influence of different cases.

The paper is organized as follows. In Section 2 we present the concept of local influence as introduced by Cook (1986) and the normalization proposed in Poon and Poon (1999). We also discuss local influence for the particular situation where the perturbations must satisfy a set of homogeneous linear restrictions. In Section 3 we give a practical example, that of a simple linear regression model for which some of the explanatory variables are in the form of proportions. In Section 4, a numerical example is discussed. Section 5 concludes the paper, and the proof of a key theoretical result is presented in the Appendix.

2. Local influence with restrictions

2.1. Local influence

Let $\ell(\theta)$ be a certain log-likelihood, where $\theta$ is a $p$ dimensional vector of unknown parameters. Let $\ell(\theta|\omega)$ be the log-likelihood corresponding to a given perturbation $\omega \in \Omega$, with $\Omega$ being an $r$ dimensional perturbation space. We assume there exists $\omega_0 \in \Omega$ such that $\ell(\theta|\omega_0) = \ell(\theta)$ for all $\theta$. Let $\hat{\theta}$ be the maximum likelihood estimator of $\theta$ corresponding to the log-likelihood $\ell(\theta)$. Cook (1986) suggests studying the influence of the perturbation $\omega$ in the direction of a unit vector $d$ using the normal curvature $C_d$, which can be written as
$$C_d = -2\, d^T \Delta^T \ddot{L}^{-1} \Delta d,$$
where the matrices $\Delta$ and $\ddot{L}$ are defined as
$$\Delta = \left.\frac{\partial^2 \ell(\theta|\omega)}{\partial\theta\,\partial\omega^T}\right|_{\theta=\hat\theta,\ \omega=\omega_0} \qquad\text{and}\qquad \ddot{L} = \left.\frac{\partial^2 \ell(\theta|\omega)}{\partial\theta\,\partial\theta^T}\right|_{\theta=\hat\theta,\ \omega=\omega_0}.$$
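For readers who wish to experiment with these quantities, the computation reduces to a single symmetric eigendecomposition. The following is a minimal numerical sketch (ours, not code from the paper), assuming $\Delta$ and $\ddot{L}$ are available as numpy arrays:

```python
import numpy as np

def local_influence(Delta, Lddot):
    """Sketch of Cook's (1986) local influence computation.

    Delta : (p, r) array of second derivatives of the perturbed
            log-likelihood with respect to theta and omega.
    Lddot : (p, p) Hessian of the log-likelihood at theta-hat.
    Returns B = -Delta' Lddot^{-1} Delta, its eigenvalues (ordered,
    proportional to normal curvatures) and eigenvectors (directions).
    """
    # B is symmetric nonnegative definite when Lddot is negative definite.
    B = -Delta.T @ np.linalg.solve(Lddot, Delta)
    B = (B + B.T) / 2.0                     # guard against round-off asymmetry
    eigval, eigvec = np.linalg.eigh(B)      # ascending order
    order = np.argsort(eigval)[::-1]        # largest curvature first
    return B, eigval[order], eigvec[:, order]

def normal_curvature(d, B):
    """Normal curvature C_d = 2 d' B d for a unit direction d."""
    d = d / np.linalg.norm(d)
    return 2.0 * d @ B @ d
```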
Cook (1986) observes that valuable information about influence is provided by the direction $d_{\max}$ maximizing $C_d$. This direction, as we know, is that of the eigenvector associated with the largest eigenvalue of $-\Delta^T \ddot{L}^{-1} \Delta$, the eigenvalue being the maximum value of the curvature. If the absolute value of the $i$-th component of $d_{\max}$ is large, then we should pay attention to the $i$-th perturbation. Furthermore, since $\Delta^T \ddot{L}^{-1} \Delta$ is symmetric, the direction orthogonal to $d_{\max}$ maximizing $C_d$ is that of a unit eigenvector, say $d_2$, associated with the second largest eigenvalue. Similarly, the direction orthogonal to the space spanned by $d_{\max}$ and $d_2$ maximizing $C_d$ is that of a unit eigenvector associated with the third largest eigenvalue, and so on. Hence, we have an orthogonal system of influence directions, defined by the unit eigenvectors associated with the nonzero eigenvalues of $\Delta^T \ddot{L}^{-1} \Delta$. Usually, we have $p < r$ and $\Delta$ with full row rank, yielding an orthogonal system of $p$ influence directions.

Poon and Poon (1999) introduce the conformal normal curvature $B_d$ in the direction of the unit vector $d$ at $\omega = \omega_0$, which they define as
$$B_d = -\frac{d^T \Delta^T \ddot{L}^{-1} \Delta d}{\sqrt{\operatorname{tr}\left[\left(\Delta^T \ddot{L}^{-1} \Delta\right)^2\right]}}.$$
The main practical advantage of $B_d$ over $C_d$ is that $0 \le |B_d| \le 1$. This quantity, therefore, can be seen as a normalized version of $C_d$. Let $G$ be the matrix of the above quadratic form, that is,
$$G = -\frac{\Delta^T \ddot{L}^{-1} \Delta}{\sqrt{\operatorname{tr}\left[\left(\Delta^T \ddot{L}^{-1} \Delta\right)^2\right]}}.$$
If curvature were the same for all $r$ eigenvalues of $G$, then each should be equal to $1/\sqrt{r}$. Following Cook (1986) and Poon and Poon (1999), we define a unit eigenvector as $q$-influential if the corresponding eigenvalue is greater than $q/\sqrt{r}$. Also, let $d_1, \ldots, d_\nu$ be the $q$-influential unit eigenvectors, $d_i = (d_{i1}, \ldots, d_{ir})$, associated with the $q$-influential eigenvalues of $G$, given by $\mu_1 > \cdots > \mu_\nu > q/\sqrt{r}$. Following Poon and Poon (1999), we consider the aggregated contribution $m[q]_t$ of the $t$-th component, $t = 1, \ldots, r$, of the $q$-influential directions,
$$m[q]_t = \sqrt{\sum_{i=1}^{\nu} \mu_i\, d_{it}^2}.$$
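The conformal normalization and the aggregated contributions can be computed along the same lines. A sketch follows; the helper name and interface are our own choices, not from the paper:

```python
import numpy as np

def conformal_influence(B, q=1.0):
    """Sketch of Poon and Poon's (1999) conformal normalization.

    B : (r, r) symmetric array, B = -Delta' Lddot^{-1} Delta.
    Returns G, the q-influential eigenvalues/eigenvectors, the
    aggregated contributions m[q]_t and the benchmark m-bar[q].
    """
    G = B / np.sqrt(np.trace(B @ B))         # conformal curvature matrix
    mu, V = np.linalg.eigh(G)
    order = np.argsort(mu)[::-1]
    mu, V = mu[order], V[:, order]
    r = B.shape[0]
    keep = mu > q / np.sqrt(r)               # q-influential eigenvalues
    # aggregated contribution of component t over q-influential directions;
    # component t is flagged when m_q[t] > sqrt(2) * benchmark (see below)
    m_q = np.sqrt((V[:, keep] ** 2) @ mu[keep])
    benchmark = np.sqrt(mu[keep].sum() / r)  # m-bar[q]
    return G, mu[keep], V[:, keep], m_q, benchmark
```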
If all components have the same contribution, each must be equal to
$$\bar{m}[q] = \sqrt{\frac{1}{r}\sum_{i=1}^{\nu} \mu_i}.$$
The most influential components can be identified by comparing $m[q]_t$ with $\bar{m}[q]$. We can consider the total contribution $m_t$ of the $t$-th component, which is
$$m_t = m[0]_t = \sqrt{\sum_{i=1}^{r} \mu_i\, d_{it}^2},$$
where $\mu_1, \ldots, \mu_r$ are all the eigenvalues of $G$. If the total contributions are the same for all components, each total contribution will be
$$\bar{m} = \bar{m}[0] = \sqrt{\frac{1}{r}\sum_{i=1}^{r} \mu_i}.$$
Let $D$ be the $r \times r$ matrix having the eigenvectors of $G$ as rows and let $M$ be the $r \times r$ diagonal matrix of eigenvalues of $G$. Since $G = D^T M D$, it is clear that $m_t^2$ is the $t$-th element in the diagonal of $G$, that is, the conformal normal curvature $B_t$ in the direction of the $t$-th canonical vector. Therefore, if the conformal normal curvatures are the same for all canonical vectors, then each must be equal to $\bar{m}^2$. Following Poon and Poon (1999), if the conformal normal curvature in the $t$-th canonical direction is greater than twice this value, $2\bar{m}^2$, we define the corresponding component as influential at total level. Similarly, the $t$-th component of perturbation will be influential at the $q$ level if $m[q]_t > \sqrt{2}\,\bar{m}[q]$.

2.2. The restricted problem

We now consider the case where the perturbation space must satisfy a certain number of homogeneous linear restrictions. This can be the case if we study models for proportions. Suppose we have a set of $n$ multivariate observations of dimension $k$, given by $X_1, \ldots, X_n$, where each $X_i = (x_{i1}, \ldots, x_{ik})$ is a vector of proportions, thus satisfying $x_{i1} + \cdots + x_{ik} = 1$. We would now like to see the effect of a perturbation in this $i$-th observation. The perturbed version of this observation is a vector $(x_{i1} + \omega_{i1}, \ldots, x_{ik} + \omega_{ik})$, which we require to also be a proportion vector. Since we already have $x_{i1} + \cdots + x_{ik} = 1$, we must then impose $\omega_{i1} + \cdots + \omega_{ik} = 0$; that is, the perturbations must satisfy a homogeneous linear restriction.

For the general case, let $\omega$ be an $r$ dimensional perturbation and suppose $\omega$ must satisfy a set of homogeneous linear restrictions, such that $C\omega = 0$, where $C$ is any $m \times r$ matrix, $m < r$, with full rank $m$. Then, using Lagrange multipliers, we obtain the direction which maximizes $C_d$ and also satisfies the homogeneous linear restrictions as that of the eigenvector associated with the largest eigenvalue of $-Q\Delta^T \ddot{L}^{-1} \Delta$, where $Q = I_r - C^T(CC^T)^{-1}C$, with $I_r$ being the $r \times r$ identity matrix. Also, as is shown in the Appendix, if the nonzero eigenvalues of $Q\Delta^T \ddot{L}^{-1} \Delta$ are all distinct, then the associated eigenvectors are mutually orthogonal and the set of unit eigenvectors will define the system of orthogonal influence directions satisfying the restrictions. If, for example, the joint distribution of the observations admits a probability density, then, clearly, the eigenvalues of the random matrix $Q\Delta^T \ddot{L}^{-1} \Delta$ will be almost surely distinct. In this case, we will then have a set of mutually orthogonal eigenvectors defining the system of influence directions.

For example, consider again the study of proportions, for which we have a sample of $n$ multivariate observations of dimension $k > 2$, given by $X_1, \ldots, X_n$, where each $X_i = (x_{i1}, \ldots, x_{ik})$ satisfies $x_{i1} + \cdots + x_{ik} = 1$. Then, for each $i = 1, \ldots, n$ we impose $\omega_{i1} + \cdots + \omega_{ik} = 0$. The dimension of the perturbation space will then be $r = nk$. The $r$ dimensional vector $\omega$ must satisfy $C\omega = 0$, with $C = I_n \otimes u_k^T$, where $I_n$ represents the $n \times n$ identity matrix, $u_k$ is the $k$ dimensional column vector having all elements equal to one and $\otimes$ denotes the Kronecker product. Then, $CC^T = kI_n$ and $Q = I_r - k^{-1}(I_n \otimes U_k)$,
where $U_k$ is the $k \times k$ square matrix having all elements equal to one. Equivalently, we regress each column of $-\Delta^T \ddot{L}^{-1} \Delta$ on $C^T$ and take the errors of the regressions; we then work with the $nk \times nk$ matrix whose columns are the errors of these regressions. Simply enough, each regression amounts to treating the $nk$ dimensional vector as observations from $n$ groups, each group with $k$ observations, the fitted value for each group being the sample mean of its $k$ observations.
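As an illustration of this "simultaneous regressions" reading of $Q$, the following sketch (our own, for the proportions case) checks that applying $Q$ is the same as centering each group of $k$ perturbation components around its group mean:

```python
import numpy as np

def restriction_projector(n, k):
    """Q = I_{nk} - (1/k)(I_n kron U_k), U_k the k x k matrix of ones."""
    U = np.ones((k, k))
    return np.eye(n * k) - np.kron(np.eye(n), U) / k

# Applying Q subtracts the mean of each group of k components,
# leaving exactly the errors of the group-mean regressions.
n, k = 3, 4
Q = restriction_projector(n, k)
v = np.random.default_rng(0).normal(size=n * k)
centered = v - np.repeat(v.reshape(n, k).mean(axis=1), k)
assert np.allclose(Q @ v, centered)   # both computations agree
```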
In the general case described above, for which the vector $\omega$ must satisfy $C\omega = 0$, we observe that the eigenvalues of $-Q\Delta^T \ddot{L}^{-1} \Delta$ correspond to the normal curvatures in the directions of the unit eigenvectors of this matrix. We then propose, for this case, the use of the normalized matrix
$$G^* = -\frac{Q\Delta^T \ddot{L}^{-1} \Delta}{\sqrt{\operatorname{tr}\left[\left(Q\Delta^T \ddot{L}^{-1} \Delta\right)^2\right]}}.$$
Since a valid perturbation here must belong to the kernel of $C$, a subspace of dimension $r - m$, it makes sense to define a unit eigenvector of $G^*$ as $q$-influential if its corresponding eigenvalue is greater than $q/\sqrt{r - m}$.
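A sketch of the whole restricted analysis, again with hypothetical helper names, might look as follows. It exploits the fact, shown in the Appendix, that $QBQ$ is symmetric and shares its nonzero eigenvalues (and the corresponding eigenvectors) with $QB = Q^2B$:

```python
import numpy as np

def restricted_influence(B, C, q=1.0):
    """Sketch of the restricted local influence analysis of Section 2.2.

    B : (r, r) symmetric array, B = -Delta' Lddot^{-1} Delta.
    C : (m, r) full-row-rank restriction matrix with C omega = 0.
    Returns G*, its q-influential eigenvalues and eigenvectors.
    """
    r, m = B.shape[0], C.shape[0]
    Q = np.eye(r) - C.T @ np.linalg.solve(C @ C.T, C)  # projector onto ker(C)
    QB = Q @ B
    G_star = QB / np.sqrt(np.trace(QB @ QB))
    # G* itself is not symmetric in general; the symmetric matrix
    # Q G* Q has the same nonzero eigenpairs, so eigh applies.
    mu, V = np.linalg.eigh(Q @ G_star @ Q)
    order = np.argsort(mu)[::-1]
    mu, V = mu[order], V[:, order]
    keep = mu > q / np.sqrt(r - m)     # threshold on the restricted subspace
    return G_star, mu[keep], V[:, keep]
```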
3. An example: a regression model with proportions

We now consider a simple example with ordinary least squares where some of the explanatory variables are in the form of proportions. Assume we are given a set of independent observations $y_1, \ldots, y_n$ such that
$$y_i = \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \cdots + \beta_p x_{ip} + \varepsilon_i,$$
with $k > 1$ and $n > p \ge k$, where $x_{i1}, \ldots, x_{ik}$ are nonnegative explanatory variables satisfying $x_{i1} + \cdots + x_{ik} = 1$ for each $i = 1, \ldots, n$. We suppose also that the $\varepsilon_i$'s have common distribution $N(0, \sigma^2)$. Also, $\beta_1, \ldots, \beta_p$ are $p$ unknown parameters corresponding to the explanatory variables. Let $Y^T = (y_1, \ldots, y_n)$ and $\beta^T = (\beta_1, \ldots, \beta_p)$. Also, let $X$ be the $n \times p$ matrix of the $x_{ij}$'s and suppose the columns of $X$ are linearly independent. Then, it is well known that the maximum likelihood estimators of $\beta$ and $\sigma^2$ are, respectively, $\hat\beta = (X^TX)^{-1}X^TY$ and $\hat\sigma^2 = n^{-1}\sum_{i=1}^{n} e_i^2$, where $e_i = y_i - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip}$ is the fitting error corresponding to the $i$-th observation.

We present, in the next section, a numerical example of influence analysis, as described in Section 2, for this regression model and the particular perturbation scheme with $x_{i1}, \ldots, x_{ik}$ being replaced by $x_{i1} + \omega_{i1}, \ldots, x_{ik} + \omega_{ik}$, where we impose $\omega_{i1} + \cdots + \omega_{ik} = 0$.
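For concreteness, a short sketch of this maximum likelihood fit (our own illustration, not code from the paper):

```python
import numpy as np

# Assuming X (n x p, first k columns proportions summing to one) and y given.
def fit_mle(X, y):
    n = X.shape[0]
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y
    e = y - X @ beta_hat                           # fitting errors
    sigma2_hat = (e ** 2).sum() / n                # ML (not unbiased) variance
    return beta_hat, sigma2_hat, e
```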
Let the perturbation vector $\tilde\omega$ be
$$\tilde\omega = (\omega_{11}, \ldots, \omega_{1k}, \omega_{21}, \ldots, \omega_{2k}, \ldots, \omega_{n1}, \ldots, \omega_{nk}).$$
The dimension of the perturbation space is, therefore, $nk$, and the matrix $Q$ of Section 2 is given by $Q = I_{nk} - k^{-1}(I_n \otimes U_k)$. The disturbed log-likelihood becomes
$$\ell(\beta_1, \ldots, \beta_p, \sigma^2 | \tilde\omega) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} \eta_i^2,$$
where $\eta_i = y_i - \beta_1(x_{i1} + \omega_{i1}) - \cdots - \beta_k(x_{ik} + \omega_{ik}) - \cdots - \beta_p x_{ip}$. It is a well-known fact that the $\ddot{L}$ matrix, as defined in Section 2, is in this case given in partitioned form as
$$\ddot{L} = -\frac{1}{2\hat\sigma^2}\begin{pmatrix} X^TX & O_{p\times 1} \\ O_{1\times p} & n/\hat\sigma^2 \end{pmatrix},$$
where $O_{r\times c}$ here represents an $r \times c$ zero matrix. It is also not difficult to obtain the matrix $\Delta$ defined in Section 2. Let $X$ be written in partitioned form as $X = [X_1\ X_2]$, where $X_1$ stands for the first $k$ columns; write $\beta$, correspondingly, as $\beta^T = (\beta_1^T\ \beta_2^T)$. Define also $J = [I_k\ O_{k\times(p-k)}]$ and consider the error vector $e = y - X\hat\beta$. Define now $X^* = [X_1\ O_{n\times(p-k)}]$. After simple calculations, we obtain, in partitioned form,
$$\Delta^T = -\left[\ \frac{X^*\otimes\hat\beta_1 - e\otimes J}{2\hat\sigma^2}\qquad \frac{e\otimes\hat\beta_1}{\hat\sigma^4}\ \right],$$
and the symmetric nonnegative definite matrix $-\Delta^T \ddot{L}^{-1} \Delta$ that is needed for curvature analysis becomes
$$-\Delta^T \ddot{L}^{-1} \Delta = \frac{\left(X^*\otimes\hat\beta_1 - e\otimes J\right)(X^TX)^{-1}\left(X^*\otimes\hat\beta_1 - e\otimes J\right)^T}{2\hat\sigma^2} + \frac{2}{n\hat\sigma^4}\left(e\otimes\hat\beta_1\right)\left(e\otimes\hat\beta_1\right)^T.$$
It is important to observe that $\Delta$ has $p - k$ rows of zeros, corresponding to the $p - k$ explanatory variables that are not perturbed. Therefore, the rank of $\Delta^T \ddot{L}^{-1} \Delta$ (and, consequently, the rank of $G^*$), the number of nonzero eigenvalues, is $k + 1$.
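Putting the pieces of this section together, a sketch of the curvature matrix computation might read as follows; the function name is hypothetical and the formulas are those displayed above:

```python
import numpy as np

def influence_matrix_proportions(X, y, k):
    """Sketch of -Delta' Lddot^{-1} Delta for the regression with proportions.

    X : (n, p) design matrix whose first k columns are proportions; y : (n,).
    Follows the partitioned formulas of this section.
    """
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta
    s2 = (e ** 2).sum() / n                        # sigma-hat squared
    beta1 = beta[:k]
    J = np.hstack([np.eye(k), np.zeros((k, p - k))])
    Xstar = np.hstack([X[:, :k], np.zeros((n, p - k))])
    A = np.kron(Xstar, beta1.reshape(-1, 1)) - np.kron(e.reshape(-1, 1), J)
    b = np.kron(e, beta1)                          # e (x) beta1, length n*k
    B = A @ np.linalg.solve(X.T @ X, A.T) / (2 * s2) \
        + 2 * np.outer(b, b) / (n * s2 ** 2)
    return B                                       # (nk, nk), rank k + 1
```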
4. A numerical example

We conclude our work with a numerical study of a data set. This numerical study was carried out with the Ox programming language (Doornik, 2001). Graphics were produced with the R programming environment (Venables and Smith, 2008). The data set considered here is Data 21 in Appendix D of Aitchison (2003). This data set consists of permeabilities of 21 different fibreboards with different fibre compositions, made with three different bonding pressures; permeability is defined as a measure of how easily a fluid can run through the material. The fibre compositions are defined as the proportions of four different ingredients constituting the fibreboard: short fibres, medium fibres, long fibres and binder. The regression equation is
$$y_i = \beta_1 x_{i1} + \cdots + \beta_6 x_{i6} + \varepsilon_i,$$
where $y_i$ measures the permeability of fibreboard $i$, $x_{i1}, \ldots, x_{i4}$ correspond to the proportions of the ingredients for this fibreboard, $x_{i1} + \cdots + x_{i4} = 1$, while $x_{i5}$ and $x_{i6}$ are two dummy variables defining the intensity of the bonding pressure:
small pressure when $x_{i5} = x_{i6} = 0$, medium pressure when $x_{i5} = 1, x_{i6} = 0$, and high pressure when $x_{i5} = 0, x_{i6} = 1$. Thus we have, with the definitions of the last section, $n = 21$, $p = 6$ and $k = 4$. The dimension of the perturbation space $\Omega$ is, therefore, $r = 21 \times 4 = 84$, but the perturbations must lie in a subspace of dimension $84 - 21 = 63$.

We fit here the classical model which considers the observations as independent normal random variables with a common unknown variance. The OLS analysis shows all t-statistics as significant at the 5% level. Besides, the F test for first order autocorrelation gives a p-value of 0.697; from this, we accept the hypothesis that the observations are independent. The $R^2$ for this model is 0.997, a very good fit. We have also tried a competing model, with the regression equation $y_i = \beta_1 x_{i1} + \cdots + \beta_5 x_{i5} + \varepsilon_i$, where, again, $x_{i1}, \ldots, x_{i4}$ are the proportions of the ingredients and $x_{i5}$ is the value of the bonding pressure (three different values), but AIC and BIC for both models give preference to the first model, with six explanatory variables.

We consider now the five (four plus one) nonzero eigenvalues of $G^*$, where $G^*$ is the matrix defined in Section 2 with $Q = I_{84} - (I_{21} \otimes U_4)/4$. These eigenvalues are 0.799, 0.543, 0.182, 0.151 and 0.102. Since $1/\sqrt{63}$ is approximately 0.126, we have one corresponding eigenvector which is not influential, two eigenvectors which are 2-influential (in fact, they are 3-influential) and four eigenvectors which are 1-influential.

Figs. 1 and 2 plot the contribution of each perturbation, as defined in Section 2, for the 2-influential and 1-influential eigenvectors, respectively. The dashed lines in the graphics represent, respectively, the benchmark limits $\sqrt{2}\,\bar{m}[2]$ and $\sqrt{2}\,\bar{m}[1]$, as discussed at the end of Section 2. From these graphs, we observe two main features. First, the number of perturbation components with contributions exceeding the benchmark is reasonably large. Also, the contributions of some components are not very different from the benchmark, some slightly smaller and some slightly bigger. This shows that the benchmark should be seen as a guideline for influence, rather than a rigorous standard to be adopted.

Fig. 1. Contributions of perturbations for 2-influential eigenvectors.

Comparing Figs. 1 and 2, we observe that the contributions of perturbation components for $q = 1$ and $q = 2$ display similar behavior. In particular, the contributions from the perturbation components labeled 49, 52, 77 and 80 are the largest both for $q = 1$ and $q = 2$. We observe also that perturbation components 49 and 52 correspond to perturbations of the $x_{13,1}$ and $x_{13,4}$ elements of the $X$ matrix of explanatory variables, while perturbation components 77 and 80 affect $x_{20,1}$ and $x_{20,4}$.

Fig. 3 plots the Studentized residuals for the regression, the Studentized residual $t_i$ being defined as
$$t_i = \frac{e_i}{\tilde\sigma_{\mathrm{OLS}}\sqrt{1 - h_i}},$$
where $e_i$ is the $i$-th component of the error vector, $\tilde\sigma_{\mathrm{OLS}}^2 = (n - p)^{-1}\sum_{i=1}^{n} e_i^2$ is the OLS estimator of the variance of the errors and $h_i$ is the $i$-th element in the diagonal of the orthogonal projection matrix $H = X(X^TX)^{-1}X^T$. From this figure, we can see that cases 13 and 20 are those producing the largest Studentized residuals, with opposite signs (negative for case 13 and positive for case 20).

It is interesting to consider a more detailed analysis of the most influential observations. As we have seen, it was detected, from the eigenvector analysis, that the most influential components in cases 13 and 20 are the first and fourth components.
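A brief sketch of this residual computation (ours), assuming X and y hold the fibreboard design matrix and permeabilities:

```python
import numpy as np

def studentized_residuals(X, y):
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix
    e = y - H @ y                                # OLS residuals
    s_ols = np.sqrt((e ** 2).sum() / (n - p))    # sigma-tilde_OLS
    return e / (s_ols * np.sqrt(1 - np.diag(H)))
```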
Fig. 2. Contributions of perturbations for 1-influential eigenvectors.
Fig. 3. Studentized residuals for the regression.
As an exercise, we focus on case 13 and consider perturbations of magnitude 0.01 in all six possible pairs of components (noting that, since the sum of proportions must equal one, we must perturb at least two components). Then, we compare the residual sum of squares for each perturbed model with the original sum of squares of 6649.4. This is done for all twelve perturbed models that were considered (two possible perturbations for each pair). Table 1 shows the results obtained. For example, the first result in the first row is the residual sum of squares when $x_{13,1}$ and $x_{13,2}$ are perturbed from 0.18 to 0.17 and from 0.48 to 0.49, respectively. The result is a decrease in the residual sum of squares, from the original value of 6649.4 to 6617.7. Now, if $x_{13,1}$ and $x_{13,2}$ are perturbed from 0.18 to 0.19 and from 0.48 to 0.47, respectively, then the residual sum of squares increases from the original value of 6649.4 to 6680.0. Table 1 clearly shows that the largest variations take place when the chosen pair of variables to be perturbed is $x_{13,1}$ and $x_{13,4}$, which is exactly the pair that was detected in our original influence analysis. A sketch of this perturb-and-refit exercise appears after Table 1.
Table 1
Residual sums of squares for pairs of 0.01 perturbations in case 13.

Perturbed pair           Decreased RSS    Increased RSS
x13,1 and x13,2          6617.7           6680.0
x13,1 and x13,3          6439.8           6867.3
x13,1 and x13,4          6413.5           6893.2
x13,2 and x13,3          6468.4           6833.5
x13,2 and x13,4          6441.7           6859.2
x13,3 and x13,4          6620.7           6673.3
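The sketch below (our own reconstruction, assuming X and y hold the fibreboard data, with case 13 in row index 12 when counting from zero) runs the twelve refits; which member of each mirrored pair decreases the RSS depends on the data:

```python
import numpy as np
from itertools import combinations

def rss(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return ((y - X @ beta) ** 2).sum()

def perturbation_table(X, y, case=12, eps=0.01):
    rows = []
    for a, b in combinations(range(4), 2):   # six pairs of proportions
        out = []
        for sign in (-1.0, +1.0):            # a perturbation and its mirror
            Xp = X.copy()
            Xp[case, a] += sign * eps
            Xp[case, b] -= sign * eps        # keep the proportions summing to 1
            out.append(rss(Xp, y))
        rows.append(((a + 1, b + 1), out[0], out[1]))
    return rows
```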
5. Conclusions

In this work, we discuss the problem of influence analysis for the particular case where the perturbations are subject to a set of homogeneous linear restrictions. We present the proof of a theoretical result showing that we can define an orthogonal system of influence directions for this problem. This orthogonal system is defined by the eigenvectors associated with the nonzero eigenvalues of a certain matrix. This matrix is obtained by projecting each column of the original matrix (the matrix that would be used if there were no restrictions) onto the subspace defined by the restrictions. Although the resulting matrix is not necessarily symmetric, its nonzero eigenvalues are real and positive and the associated eigenvectors are mutually orthogonal. This set of eigenvectors, as we have shown, can work as a system of orthogonal influence directions.

We have discussed a particular example, that of a simple linear regression model. This regression model has proportions as explanatory variables and we considered perturbations of those variables. Perturbations must therefore satisfy a set of homogeneous linear restrictions in this particular problem. We have provided a numerical example, based on a data set, which shows that the theoretical results obtained here seem to work in practice. In this numerical example, influential cases are mainly those for which the explanatory proportion vectors are near each other and, at the same time, produce very different values of the explained variable (and, therefore, relatively large errors). We have stressed that our curvature analysis captures exactly the most influential proportions, those that will produce the largest deviations as a consequence of perturbations.

Acknowledgements

We gratefully acknowledge the financial support from the Brazilian governmental institution CNPq, as well as two anonymous referees whose comments made possible a much better version of this work.

Appendix

We provide a proof of a key result used in this paper, concerning maximization in $r$ dimensions of a quadratic form $S(d) = d^TAd$, where $A$ is an $r \times r$ nonnegative-definite matrix with rank $\rho > 0$, the maximization being subject to the restrictions $d^Td = 1$ and $Cd = 0$, where $C$ is a $k \times r$ matrix, $k < r$, with full rank $k$. Using Lagrange multipliers, it is not hard to verify that the maximum value of the quadratic form $S(d) = d^TAd$, subject to the restrictions $d^Td = 1$ and $Cd = 0$, is attained at a unit eigenvector associated with the largest eigenvalue of $QA$, where $Q = I_r - C^T(CC^T)^{-1}C$, with $I_r$ being the $r \times r$ identity matrix. Observe that the eigenvalues of $QA$ are real because $A$ is symmetric and $Q$ is symmetric and idempotent; the eigenvalues of $QA = Q^2A$ are the eigenvalues of $QAQ = Q^TAQ$. This optimization problem is discussed in Golub (1973) and Golub and van Loan (1996); an extension of it is also analysed in Gander et al. (1989).

Let $d_1$ be the optimal direction for this maximization problem and suppose now we want the optimal direction in the restriction set that is orthogonal to $d_1$. Very simply, all we need to do is add $d_1^T$ to the rows of $C$, replacing $C$ by the matrix $D$ defined in partitioned form by $D^T = [C^T\ |\ d_1]$. Then, we would like to maximize $S(d)$ subject to $d^Td = 1$ and $Dd = 0$. From the above result, we need to find a unit eigenvector associated with the largest eigenvalue of $\tilde{Q}A$, where $\tilde{Q} = I_r - D^T(DD^T)^{-1}D$. We show below how the eigenquantities of $\tilde{Q}A$ are related to those of $QA$.
Theorem 1. Using the previous notation, suppose also that $QA$ has distinct nonzero eigenvalues. Let $\lambda_1 > \lambda_2 > \cdots > \lambda_\rho$ be the nonzero eigenvalues of $QA$ with respective associated unit eigenvectors $d_1, d_2, \ldots, d_\rho$. Then, $d_1, d_2, \ldots, d_\rho$ are mutually orthogonal, $\tilde{Q}Ad_1 = 0$ and $\lambda_2 > \cdots > \lambda_\rho$ are the nonzero eigenvalues of $\tilde{Q}A$ with respective associated unit eigenvectors $d_2, \ldots, d_\rho$.

Proof. Let $\lambda_s$ and $\lambda_t$ be two distinct eigenvalues of $QA$ with respective associated unit eigenvectors $d_s$ and $d_t$. Then, we have $QAd_s = \lambda_s d_s$ and $QAd_t = \lambda_t d_t$. From the first equality, we get $d_t^TQAd_s = \lambda_s d_t^Td_s$. It is easy to see that $Qd_t = d_t$; hence $d_t^TAd_s = \lambda_s d_t^Td_s$. By the same arguments, $d_s^TAd_t = \lambda_t d_s^Td_t$. Therefore, $\lambda_s d_t^Td_s = \lambda_t d_s^Td_t$. Since $\lambda_s \ne \lambda_t$, we must have $d_s^Td_t = 0$. We have also shown that $d_s^TAd_t = 0$.

Also, a simple calculation shows that $D^T(DD^T)^{-1}D = C^T(CC^T)^{-1}C + d_1d_1^T$. Then, $\tilde{Q} = I_r - D^T(DD^T)^{-1}D = Q - d_1d_1^T$. Therefore, $\tilde{Q}Ad_1 = QAd_1 - d_1d_1^TAd_1 = \lambda_1 d_1 - (d_1^TAd_1)d_1$. But $\lambda_1 = d_1^TAd_1$. Therefore, $\tilde{Q}Ad_1 = 0$.
Now, for $s = 2, \ldots, \rho$ we have $\tilde{Q}Ad_s = QAd_s - d_1d_1^TAd_s = \lambda_s d_s - (d_1^TAd_s)d_1 = \lambda_s d_s - 0 = \lambda_s d_s$. Therefore, $\lambda_2, \ldots, \lambda_\rho$ are eigenvalues of $\tilde{Q}A$ with respective associated unit eigenvectors $d_2, \ldots, d_\rho$.

It remains to show that $\lambda_2, \ldots, \lambda_\rho$ are the only nonzero eigenvalues of $\tilde{Q}A$. Let $R(X)$ be the space spanned by the columns of a given matrix $X$. We show that $R(\tilde{Q}A) \subset R(QA)$. For a given $r$ dimensional vector $v$, we have $\tilde{Q}Av = QAv - d_1d_1^TAv = QAv - (d_1^TAv)d_1$. Since $QAv$ and $d_1$ belong to $R(QA)$, this linear combination is also in $R(QA)$. Hence, $\mathrm{rank}(\tilde{Q}A) \le \mathrm{rank}(QA) = \rho$. Now, to prove that we cannot have equality, we show the above inclusion is proper. Consider a vector $v$ and a scalar $\alpha$ such that $\tilde{Q}Av = \alpha d_1$. Hence, $QAv - (d_1^TAv)d_1 = \alpha d_1$. Pre-multiplying by $d_1^T$, we get $d_1^TQAv - (d_1^TAv) = \alpha$. However, $Qd_1 = d_1$. Therefore, $\alpha = (d_1^TAv) - (d_1^TAv) = 0$. That means $d_1 \notin R(\tilde{Q}A)$. But $d_1 \in R(QA)$. Therefore, $\mathrm{rank}(\tilde{Q}A) < \mathrm{rank}(QA) = \rho$ and $\lambda_2, \ldots, \lambda_\rho$ are the only nonzero eigenvalues of $\tilde{Q}A$.
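A quick numerical check of the theorem (our own, with randomly generated $A$ and $C$) may be reassuring:

```python
import numpy as np

rng = np.random.default_rng(1)
r, k = 8, 3
M = rng.normal(size=(r, r))
A = M @ M.T                                   # nonnegative definite
C = rng.normal(size=(k, r))                   # full row rank (a.s.)
Q = np.eye(r) - C.T @ np.linalg.solve(C @ C.T, C)

lam, V = np.linalg.eigh(Q @ A @ Q)            # same nonzero spectrum as QA
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]
d1 = V[:, 0]
D = np.vstack([C, d1])
Qt = np.eye(r) - D.T @ np.linalg.solve(D @ D.T, D)

# Q-tilde A annihilates d1 and keeps the remaining eigenpairs of QA.
assert np.allclose(Qt @ A @ d1, 0.0)
assert np.allclose(Qt @ A @ V[:, 1], lam[1] * V[:, 1])
```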
It follows from the above theorem that the optimal direction in the restriction set that is orthogonal to $d_1$ is simply that of a unit eigenvector associated with the second largest eigenvalue of $QA$, $d_2$. Also, if we replace $Q$ by $\tilde{Q}$ in this theorem, we conclude that the optimal direction in the restriction set that is orthogonal to both $d_1$ and $d_2$ is simply that of a unit eigenvector associated with the third largest eigenvalue of $QA$, $d_3$, and so on. That is, to study maximal directions for a quadratic form defined by a nonnegative-definite matrix subject to a set of homogeneous linear restrictions, all we have to do is replace the original matrix $A$ by the matrix $QA$ and then consider the orthogonal system of eigenvectors associated with nonzero eigenvalues, as we normally do for the unrestricted problem.

This result can be better understood if we think of orthogonal projections. The set corresponding to the linear homogeneous restrictions is the orthogonal complement of the space spanned by the rows of $C$, that is, $R(C^T)^\perp$. Then $Q$ corresponds to the matrix that projects a vector orthogonally onto this space of the homogeneous linear restrictions. Therefore, we can think of projecting the columns of the original matrix $A$ (or rows, since $A$ is symmetric) onto this space of the homogeneous linear restrictions and then working with this projected matrix. We can think, if we want, of simultaneous regressions: we regress each column of $A$ on $C^T$ and take the matrix having the errors of the regressions as columns. The eigenvalues and eigenvectors of this matrix of errors will give the quantities we are interested in.

References

Aitchison, J., 2003. The Statistical Analysis of Compositional Data. Blackburn Press, New Jersey.
Cook, R.D., 1986. Assessment of local influence (with discussion). Journal of the Royal Statistical Society B 48, 133–169.
Critchley, F., Atkinson, R.A., Lu, G., Biazi, E., 2001. Influence analysis based on the case sensitivity function. Journal of the Royal Statistical Society B 63, 307–323.
Critchley, F., Marriott, P., 2004. Data-informed influence analysis. Biometrika 91, 125–140.
Doornik, J.A., 2001. Ox: An Object-Oriented Matrix Programming Language, 4th ed. Timberlake Consultants, London. http://www.doornik.com.
Espinheira, P.L., Ferrari, S.L.P., Cribari-Neto, F., 2008. Influence diagnostics in beta regression. Computational Statistics and Data Analysis 52, 4417–4431.
Gander, W., Golub, G.H., von Matt, U., 1989. A constrained eigenvalue problem. Linear Algebra and Its Applications 114/115, 815–839.
Golub, G.H., 1973. Some modified matrix eigenvalue problems. SIAM Review 15, 318–334.
Golub, G.H., van Loan, C.F., 1996. Matrix Computations, 3rd ed. The Johns Hopkins University Press, Baltimore.
Gu, H., Fung, W.K., 1998. Assessing local influence in canonical correlation analysis. Annals of the Institute of Statistical Mathematics 50, 755–772.
Huang, Y., Kao, T.-L., Wang, T.-H., 2007. Influence functions and local influence in linear discriminant analysis. Computational Statistics and Data Analysis 51, 3844–3861.
Lee, S.Y., Liang, X., 2004. Influence analysis of nonlinear mixed-effects models. Computational Statistics and Data Analysis 45, 321–342.
Lu, B., Song, X.-Y., 2006. Local influence analysis of multivariate probit latent variable models. Journal of Multivariate Analysis 97, 1783–1798.
Ortega, E.M.M., Bolfarine, H., Paula, G.A., 2003. Influence diagnostics in generalized log-gamma regression models. Computational Statistics and Data Analysis 42, 165–186.
Pan, J., Bai, P., 2003. Local influence analysis in the growth curve model with Rao's simple covariance structure. Journal of Applied Statistics 30, 771–782.
Paula, G.A., 1993. Assessing local influence in restricted regression models. Computational Statistics and Data Analysis 16, 63–79.
Poon, W.Y., Poon, Y.S., 1999. Conformal normal curvature and assessment of local influence. Journal of the Royal Statistical Society B 61, 51–61.
Shi, L., Wang, X., 1999. Local influence in ridge regression. Computational Statistics and Data Analysis 31, 341–353.
Venables, W.N., Smith, D.M., and the R Development Core Team, 2008. An Introduction to R. http://cran.r-project.org/doc/manuals/R-intro.pdf.
Zhu, H., Ibrahim, J.G., Lee, S., Zhang, H., 2007. Perturbation selection and influence measures in local influence analysis. The Annals of Statistics 35, 2565–2588.
Zhu, H., Ibrahim, J.G., Tang, N., Zhang, H., 2008. Diagnostic measures for empirical likelihood of general estimating equations. Biometrika 95, 489–507.
Zhu, H., Lee, S., 2001. Local influence for incomplete data models. Journal of the Royal Statistical Society B 63, 111–126.
Zhu, H., Lee, S., 2003. Local influence for generalized linear mixed models. Canadian Journal of Statistics 31, 293–309.
Zhu, Z.-Y., He, X., Fung, W.-K., 2003. Local influence analysis for penalized Gaussian likelihood estimators in partially linear models. Scandinavian Journal of Statistics 30, 767–780.