Statistics & Probability Letters 11 (1991) 495-501
North-Holland
June 1991

Testing for the redundancy of variables in principal components analysis

James R. Schott
Department of Statistics, University of Central Florida, Orlando, FL 32816, USA

Received November 1989
Revised August 1990
Abstract: When all of the important principal components have zero coefficients on the same original variables, then those variables are redundant and may be eliminated. Tyler (1981) derived a statistic suitable for testing such a hypothesis. An asymptotic expansion for the mean of this statistic is obtained and used to calculate a Bartlett adjustment factor. The performances of the unadjusted and adjusted statistics are investigated in a simulation.

Keywords: Asymptotic expansion, Bartlett adjustment factor, dimensionality reduction.
1. Introduction

Many studies, particularly in their initial stages, involve the collection of a large number of variables on different subjects or objects. Principal components analysis is a technique useful in reducing the dimensionality of the resulting data set without the loss of much information. Let the $p \times 1$ vector $x$ represent the vector of observed values of the variables mentioned above for one randomly selected subject. If its covariance matrix $\Sigma$ has latent roots $\lambda_1 \geq \cdots \geq \lambda_p$ and corresponding latent vectors $\gamma_1, \ldots, \gamma_p$, then the $i$th principal component of $x$ is given by $v_i = \gamma_i' x$, has variance $\lambda_i$, and is uncorrelated with the other $p - 1$ principal components. Consequently, if for some $k$ each of $\gamma_1, \ldots, \gamma_k$ has zero coefficients on the last $q$ original variables, those variables contribute nothing to the important principal components and are redundant. With $e_i$ denoting the $i$th column of the $p \times p$ identity matrix and $h = p - q + 1$, this situation can be expressed as the hypothesis $H_0(k, q)$: the vectors $e_h, \ldots, e_p$ lie in the subspace generated by $\gamma_{k+1}, \ldots, \gamma_p$.

This research was supported in part by an In-House Research Award from the Division of Sponsored Research, University of Central Florida.

0167-7152/91/$03.50 © 1991 - Elsevier Science Publishers B.V. (North-Holland)

Let $l_1 > l_2 > \cdots > l_p$ be the latent roots of a sample covariance matrix with $n$ degrees of freedom, and $\hat\gamma_1, \ldots, \hat\gamma_p$ the corresponding latent vectors. Put $E = (e_h, \ldots, e_p)$, $\hat\Gamma_2 = (\hat\gamma_{k+1}, \ldots, \hat\gamma_p)$, and
$$D_j = \mathrm{diag}\{\, l_j l_{k+1}/(l_{k+1} - l_j)^2, \ldots, l_j l_p/(l_p - l_j)^2 \,\}.$$
Then Tyler's test rejects $H_0(k, q)$ if $r = \mathrm{rank}(\hat\Gamma_2 \hat\Gamma_2' E) < q$, or if $r = q$ and
$$T_{k,q} = n \sum_{j=1}^{k} \mathrm{tr}\big( E' \hat\gamma_j \hat\gamma_j' E \, ( E' \hat\Gamma_2 D_j \hat\Gamma_2' E )^{-1} \big)$$
exceeds the $1 - \alpha$ quantile of the chi-squared distribution with $kq$ degrees of freedom. This is an asymptotic $\alpha$-level test for $H_0(k, q)$ since, under $H_0(k, q)$, as $n \to \infty$, $r \to q$ in probability and $T_{k,q} \to \chi^2_{kq}$ in distribution. The purpose of this paper is to investigate this null approximating distribution and to improve it through the use of a Bartlett adjustment factor.
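As a numerical illustration (not part of the original paper), Tyler's statistic above can be computed directly from a sample covariance matrix. The sketch below follows the notation of this section; the function name `tyler_statistic` and the use of NumPy are my own.

```python
import numpy as np

def tyler_statistic(S, n, k, q):
    """Compute Tyler's statistic T_{k,q} for H0(k, q) from a p x p sample
    covariance matrix S with n degrees of freedom.

    Returns (T, df, r), where df = k*q and r = rank(G2 G2' E); the
    chi-squared comparison applies only when r equals q.
    """
    p = S.shape[0]
    # latent roots l_1 > ... > l_p and latent vectors, in descending order
    roots, vecs = np.linalg.eigh(S)
    order = np.argsort(roots)[::-1]
    l, G = roots[order], vecs[:, order]
    E = np.eye(p)[:, p - q:]       # e_h, ..., e_p: the last q variables
    G2 = G[:, k:]                  # sample latent vectors k+1, ..., p
    r = np.linalg.matrix_rank(G2 @ G2.T @ E)
    T = 0.0
    for j in range(k):
        # D_j = diag{ l_j l_m / (l_m - l_j)^2 : m = k+1, ..., p }
        Dj = np.diag(l[j] * l[k:] / (l[k:] - l[j]) ** 2)
        M = E.T @ G2 @ Dj @ G2.T @ E          # q x q matrix
        gj = G[:, [j]]
        T += np.trace(E.T @ gj @ gj.T @ E @ np.linalg.inv(M))
    return n * T, k * q, r
```

When $r = q$, the returned statistic would be compared with the $1-\alpha$ quantile of the chi-squared distribution with $kq$ degrees of freedom.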
2. The Bartlett adjustment factor

The basic idea behind Bartlett adjustment factors is a rather simple one. If a test statistic $T$ has a mean which can be expressed as
$$E(T) = a\{1 + c/n + O(n^{-3/2})\}, \qquad (1)$$
then the mean of the adjusted statistic $T^* = (1 - c/n)T$ can be expressed as
$$E(T^*) = a + O(n^{-3/2});$$
that is, the mean of $T^*$ comes closer to the mean, $a$, of the asymptotic null distribution than does the mean of $T$. Some general results concerning the effect of this adjustment on other moments, and its connection to the normalizing constant for the conditional distribution of a maximum likelihood estimator, can be found in Lawley (1956b) and Barndorff-Nielsen and Cox (1979, 1984). Although these results may very well extend to a more general setting, the papers above have only shown their application to the adjustment of a likelihood ratio statistic.

We will obtain the expected value of $T_{k,q}$ in the form of (1) by first getting an asymptotic expression for $T_{k,q}$ in terms of the elements of the matrix $\Delta = \Gamma' S \Gamma - \Lambda$. Here $S$ is the sample covariance matrix, $\Gamma = (\gamma_1, \ldots, \gamma_p)$, and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$. We will assume throughout this paper that we are sampling from a normal distribution; if this is not the case it may not be advisable to use a sample covariance matrix-based principal components analysis in the first place (Devlin, Gnanadesikan and Kettenring, 1981). Our main result is given in the following theorem.

Theorem.
Let $\Gamma_{12}$ be the $q \times (p - k)$ matrix containing the last $q$ rows and last $p - k$ columns of $\Gamma$, let
$$\Lambda_j = \mathrm{diag}\{\lambda_j\lambda_{k+1}/(\lambda_{k+1} - \lambda_j)^2, \ldots, \lambda_j\lambda_p/(\lambda_p - \lambda_j)^2\}$$
and $\Omega_j = \Gamma_{12}\Lambda_j\Gamma_{12}'$. Define $\phi^j_{uv}$ to be the $(u, v)$th element of the $(p - k) \times (p - k)$ matrix $\Phi_j = \Gamma_{12}'\Omega_j^{-1}\Gamma_{12}$. Then the mean of $T_{k,q}$ can be expressed as
$$E(T_{k,q}) = kq\{1 + c_{k,q}/n + O(n^{-3/2})\},$$
where $c_{k,q}$ is an explicit function of the latent roots and of the elements $\phi^j_{uv}$, given by sums, over $j = 1, \ldots, k$ and $u, v = k + 1, \ldots, p$ (and, in one term, over pairs $i \neq j$), of rational functions of $\lambda_1, \ldots, \lambda_p$.
Proof. If $u_1, \ldots, u_p$ are the latent vectors of $\Gamma' S \Gamma$ and $U_2 = (u_{k+1}, \ldots, u_p)$, then, if $H_0(k, q)$ is true, $T_{k,q}$ can be written as
$$T_{k,q} = n \sum_{j=1}^{k} \mathrm{tr}\big( E'\Gamma u_j u_j' \Gamma' E \, (E'\Gamma U_2 D_j U_2' \Gamma' E)^{-1} \big) = n \sum_{j=1}^{k} \sum_{l=k+1}^{p} \sum_{m=k+1}^{p} \sum_{\alpha=1}^{q} \sum_{\beta=1}^{q} \gamma_{h+\alpha-1,l}\, \gamma_{h+\beta-1,m}\, u_{lj}\, u_{mj}\, z_{\alpha\beta}(j), \qquad (2)$$
where $z_{\alpha\beta}(j)$ is the $(\alpha, \beta)$th element of $(E'\Gamma U_2 D_j U_2' \Gamma' E)^{-1}$ and, as before, $h = p - q + 1$. For our purposes we will need an expression for $T_{k,q}$ which is accurate up through fourth-order terms in the elements of $\Delta$. Since the asymptotic expression (Sugiura, 1976) for $u_{lj}$ $(l \neq j)$ involves first- and higher-order terms, we will need to specify $z_{\alpha\beta}(j)$ up through second-order terms. Let $D_j^* = D_j - \Lambda_j$, and let $U_{22}$ be the $(p - k) \times (p - k)$ matrix containing the last $p - k$ rows of $U_2$. If we put $U_{22}^* = U_{22} - I_{p-k}$ we find that, under $H_0(k, q)$,
$$(E'\Gamma U_2 D_j U_2' \Gamma' E)^{-1} = \{\Omega_j + \Gamma_{12}(D_j^* + \Lambda_j U_{22}^{*\prime} + U_{22}^* \Lambda_j + U_{22}^* \Lambda_j U_{22}^{*\prime} + D_j^* U_{22}^{*\prime} + U_{22}^* D_j^* + U_{22}^* D_j^* U_{22}^{*\prime})\Gamma_{12}'\}^{-1}$$
$$= \Omega_j^{-1} - \Omega_j^{-1}\Gamma_{12}(D_j^* + \Lambda_j U_{22}^{*\prime} + U_{22}^* \Lambda_j + U_{22}^* \Lambda_j U_{22}^{*\prime} + D_j^* U_{22}^{*\prime} + U_{22}^* D_j^*)\Gamma_{12}'\Omega_j^{-1} + \Omega_j^{-1}\Gamma_{12}(D_j^* + \Lambda_j U_{22}^{*\prime} + U_{22}^* \Lambda_j)\Gamma_{12}'\Omega_j^{-1}\Gamma_{12}(D_j^* + \Lambda_j U_{22}^{*\prime} + U_{22}^* \Lambda_j)\Gamma_{12}'\Omega_j^{-1},$$
where this last equality is correct up through second-order terms. If we denote the $(\alpha, \beta)$th element of $\Omega_j^{-1}$ by $\omega^j_{\alpha\beta}$ and the $i$th diagonal elements of $\Lambda_j$ and $D_j^*$ by $\delta^j_i$ and $d^{j*}_i$, respectively, it then follows that, to the same accuracy, $z_{\alpha\beta}(j)$ equals $\omega^j_{\alpha\beta}$ plus correction terms, linear and quadratic in the $d^{j*}_u$ and in the elements of $U_{22}^*$, whose coefficients involve the $\gamma_{h+\alpha-1,u}$ and the $\omega^j_{\alpha\beta}$. Using the asymptotic expansions of Lawley (1956a) and Sugiura (1976) for $l_i$ and $u_{ij}$, respectively, we obtain an expression for $z_{\alpha\beta}(j)$ in terms of the elements of $\Delta$ accurate up through second-order terms. This is then used in (2) to get an expansion for $T_{k,q}$ accurate up through fourth-order terms. Since $n(\Delta + \Lambda) \sim W_p(\Lambda, n)$, the various moments in the elements of $\Delta$, needed to compute the mean of this expansion, are obtained using the moments of the Wishart distribution. After some simplification the asymptotic expansion for $E(T_{k,q})$ can be obtained in the form given above.

If $q = p - k$, the hypothesis $H_0(k, q)$ states that the subspace generated by $\gamma_1, \ldots, \gamma_k$ is identical to that generated by $e_1, \ldots, e_k$. In this special case the expression for $c_{k,q}$ simplifies further, since $\Phi_j = \Lambda_j^{-1}$. It follows from Schott (1989) that this simplified version can be expressed as
$$c_{k,p-k} = 3 + \tfrac{1}{2}(k - 1) + \{k(p - k)\}^{-1}\Big[\sum_{j=1}^{k}\sum_{u=k+1}^{p} \cdots \Big],$$
where the bracketed term is a sum of rational functions of the latent roots over $j = 1, \ldots, k$ and $u = k + 1, \ldots, p$ and, in one term, over pairs $i \neq j$.
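Before turning to the simulation study, the mechanism behind the adjustment in (1) can be illustrated with a small synthetic experiment (my own sketch, not from the paper): a statistic $T$ is constructed so that its mean is exactly $a(1 + c/n)$, and the Bartlett-adjusted version $(1 - c/n)T$ is seen to have a mean much closer to $a$. The values of $a$, $c$ and $n$ below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
a, c, n = 8.0, 10.3, 50                 # arbitrary illustrative values
# T is built so that E(T) = a*(1 + c/n) exactly, mimicking expansion (1).
T = (1 + c / n) * rng.chisquare(df=a, size=200_000)
T_star = (1 - c / n) * T                # Bartlett-adjusted statistic

bias_unadjusted = abs(T.mean() - a)     # roughly a*c/n  = 1.65
bias_adjusted = abs(T_star.mean() - a)  # roughly a*c^2/n^2 = 0.34
```

The residual bias of the adjusted statistic is of order $c^2/n^2$ rather than $c/n$, which is the sense in which $T^*$ "comes closer" to the asymptotic mean.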
3. A simulation study

The performances of the adjusted and unadjusted statistics were compared in a simulation. The actual probabilities of type I error were estimated for the unadjusted statistic $T_{k,q}$, the adjusted statistic $T^*_{k,q} = (1 - \hat{c}_{k,q}/n)T_{k,q}$, and a third statistic $T^{**}_{k,q} = (1 - c^{**}_{k,q}/n)T_{k,q}$. This last statistic was adjusted using $c^{**}_{k,q} = k + q + 1$, which approximates $c_{k,q}$ when $\lambda_k$ is large relative to $\lambda_{k+1}$, since $c_{k,q} \to c^{**}_{k,q}$ as $\lambda_k/\lambda_{k+1} \to \infty$. In computing the quantity $\hat{c}_{k,q}$, sample estimates were used in place of the parameters. The nominal significance level used was 0.05 and each estimate of the type I error probability was computed from 1000 simulations. For simplicity, attention was restricted to covariance matrices $\Sigma$ which are diagonal and have $\lambda_{k+1} = \cdots = \lambda_p = 1$. Some of the results are presented in Table 1. We find that, in general, the adjusted and unadjusted statistics yield inflated type I error probabilities. The adjustment can result in a dramatic improvement, particularly if $k$ and $q$ are not both small.
Table 1
Simulated probabilities of type I error when the nominal significance level is 0.05. Within each column group, the three entries give the unadjusted statistic T_{k,q}, the simpler adjusted statistic T**_{k,q}, and the adjusted statistic T*_{k,q}, in that order.

(p, k, q) = (10, 2, 4)

 n    (λ₁, λ₂) = (10, 1.5)      (10, 2)                 (10, 5)                 (25, 10)
      T      T**    T*         T      T**    T*        T      T**    T*        T      T**    T*
 20   0.615  0.446  0.119      0.541  0.348  0.079     0.321  0.130  0.058     0.271  0.103  0.083
 25   0.594  0.451  0.129      0.483  0.324  0.085     0.231  0.087  0.044     0.223  0.102  0.086
 30   0.559  0.430  0.143      0.402  0.269  0.075     0.200  0.108  0.074     0.184  0.083  0.071
 50   0.447  0.375  0.115      0.268  0.196  0.066     0.117  0.070  0.054     0.120  0.068  0.065
 75   0.376  0.330  0.102      0.183  0.142  0.058     0.119  0.080  0.068     0.097  0.064  0.060

(p, k, q) = (10, 2, 1)

 n    (λ₁, λ₂) = (10, 1.5)      (10, 2)                 (10, 5)                 (25, 10)
      T      T**    T*         T      T**    T*        T      T**    T*        T      T**    T*
 20   0.155  0.112  0.064      0.125  0.087  0.055     0.102  0.062  0.065     0.101  0.062  0.062
 25   0.156  0.130  0.078      0.136  0.107  0.080     0.073  0.044  0.043     0.079  0.049  0.050
 30   0.132  0.104  0.064      0.085  0.061  0.051     0.087  0.063  0.063     0.077  0.059  0.059
 50   0.121  0.097  0.059      0.073  0.059  0.047     0.051  0.036  0.038     0.073  0.061  0.063
 75   0.106  0.096  0.052      0.073  0.066  0.055     0.059  0.051  0.052     0.064  0.056  0.056

(p, k, q) = (15, 2, 1)

 n    (λ₁, λ₂) = (15, 1.5)      (15, 2)                 (15, 10)                (35, 15)
      T      T**    T*         T      T**    T*        T      T**    T*        T      T**    T*
 25   0.078  0.064  0.058      0.076  0.075  0.052     0.091  0.064  0.059     0.099  0.066  0.061
 30   0.077  0.051  0.062      0.072  0.078  0.056     0.092  0.065  0.051     0.094  0.073  0.064
 35   0.059  0.045  0.038      0.057  0.066  0.043     0.087  0.066  0.050     0.071  0.053  0.054
 50   0.063  0.052  0.049      0.058  0.066  0.055     0.103  0.091  0.061     0.064  0.053  0.059
 75   0.051  0.047  0.053      0.050  0.050  0.045     0.074  0.068  0.037     0.056  0.048  0.047

(p, k, q) = (10, 4, 5)

 n    (λ₁, …, λ₄) = (10, 10, 5, 1.5)   (10, 10, 5, 2)          (10, 10, 5, 5)          (25, 10, 10, 10)
      T      T**    T*               T      T**    T*        T      T**    T*        T      T**    T*
 20   0.949  0.732  0.445            0.917  0.564  0.251     0.800  0.292  0.065     0.680  0.166  0.059
 25   0.913  0.522  0.222            0.813  0.522  0.222     0.633  0.206  0.051     0.564  0.154  0.095
 30   0.880  0.660  0.389            0.744  0.453  0.191     0.508  0.189  0.077     0.479  0.143  0.090
 50   0.705  0.563  0.333            0.468  0.276  0.106     0.298  0.133  0.083     0.239  0.098  0.073
 75   0.576  0.482  0.234            0.307  0.196  0.081     0.217  0.099  0.069     0.158  0.065  0.056

(p, k, q) = (10, 4, 2)

 n    (λ₁, …, λ₄) = (10, 10, 5, 1.5)   (10, 10, 5, 2)          (10, 10, 5, 5)          (25, 10, 10, 10)
      T      T**    T*               T      T**    T*        T      T**    T*        T      T**    T*
 20   0.521  0.301  0.083            0.456  0.266  0.075     0.298  0.128  0.070     0.286  0.100  0.080
 25   0.468  0.305  0.096            0.383  0.224  0.077     0.257  0.111  0.063     0.205  0.085  0.069
 30   0.411  0.280  0.089            0.303  0.186  0.057     0.224  0.106  0.072     0.188  0.089  0.073
 50   0.289  0.219  0.092            0.198  0.137  0.062     0.111  0.061  0.046     0.113  0.066  0.060
 75   0.221  0.176  0.067            0.148  0.100  0.054     0.102  0.066  0.054     0.094  0.064  0.060

(p, k, q) = (15, 4, 2)

 n    (λ₁, …, λ₄) = (15, 15, 10, 1.5)  (15, 15, 10, 2)         (15, 15, 10, 10)        (30, 15, 15, 15)
      T      T**    T*               T      T**    T*        T      T**    T*        T      T**    T*
 25   0.310  0.166  0.040            0.287  0.137  0.042     0.207  0.087  0.075     0.197  0.071  0.069
 30   0.313  0.191  0.046            0.267  0.129  0.037     0.180  0.081  0.078     0.153  0.058  0.055
 35   0.277  0.179  0.055            0.237  0.149  0.055     0.125  0.064  0.059     0.135  0.066  0.064
 50   0.207  0.150  0.038            0.186  0.121  0.046     0.109  0.063  0.063     0.114  0.051  0.051
 75   0.172  0.133  0.033            0.111  0.079  0.043     0.095  0.069  0.069     0.097  0.063  0.064
Regardless of the sample size, $T^*_{k,q}$ produces probabilities that fall, with few exceptions, between 0.05 and 0.10 when the $k$th and $(k+1)$th roots are well separated; from our limited simulations, $\lambda_k/\lambda_{k+1} > 3$ seems to be sufficient. If $k + q$ is not too large, say less than or equal to $\tfrac{1}{2}p$, $T^*_{k,q}$ generally continues to perform well for small samples and ratios $\lambda_k/\lambda_{k+1}$ as small as 1.5. As $k + q$ increases, however, increasingly large sample sizes are required for small values of $\lambda_k/\lambda_{k+1}$. The computationally simpler adjusted statistic $T^{**}_{k,q}$ does not perform nearly as well as $T^*_{k,q}$ unless both $k$ and $q$ are small or the sample size is quite large.
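One cell of a simulation study of this kind can be sketched as follows (illustrative code of my own, not the author's program; it reuses the construction of $T_{k,q}$ from Section 1, takes a diagonal $\Sigma$ so that $H_0(2, 2)$ holds, uses 9.488, the 0.95 quantile of $\chi^2_4$, as the critical value, and reduces the replication count for speed).

```python
import numpy as np

def reject_rate(lam, n, k, q, reps=300, crit=9.488, seed=2):
    """Monte Carlo estimate of the type I error probability of the
    unadjusted statistic T_{k,q} under a diagonal covariance matrix
    with latent roots lam; crit is the chi-squared critical value
    for kq degrees of freedom (9.488 for kq = 4)."""
    rng = np.random.default_rng(seed)
    p = len(lam)
    rejections = 0
    for _ in range(reps):
        X = rng.normal(size=(n, p)) * np.sqrt(np.asarray(lam))
        roots, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
        l, G = roots[::-1], vecs[:, ::-1]       # descending order
        E, G2 = np.eye(p)[:, p - q:], G[:, k:]
        T = 0.0
        for j in range(k):
            Dj = np.diag(l[j] * l[k:] / (l[k:] - l[j]) ** 2)
            M = E.T @ G2 @ Dj @ G2.T @ E
            gj = G[:, [j]]
            T += np.trace(E.T @ gj @ gj.T @ E @ np.linalg.inv(M))
        rejections += (n * T) > crit
    return rejections / reps
```

With the population roots well separated and moderate $n$, the estimated rejection rate should sit somewhat above the nominal 0.05, in line with the inflation reported in Table 1.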
4. An example

To illustrate the hypothesis testing problem discussed in this paper, we use the data of Jackson and Morris (1957) from a study of the quality of pictures produced by a photographic process. The data were generated as follows. A film strip was given a graded series of exposures to white light and was processed. Optical densities of the resultant steps were then measured through red, green and blue filters having narrow transmission bands. In the study three steps were included: one at a high density level, one at an average density level and one at a low density level. As a result, there are nine measurements, three colors at each level. The sample covariance matrix, based on 108 degrees of freedom, can be found in the paper referenced above. The latent roots of this matrix are tabulated in Table 2(a), while the first two principal component vectors are given in Table 2(b). Since the first two principal components account for 75% of the total sample variance for the nine variables, we will concentrate on these two components. We find that the sample coefficients corresponding to the three low density variables have small values in both principal components. This naturally leads to the question of whether these three variables can be eliminated. Consequently, we consider the
Table 2
Data on the quality of pictures produced by a photographic process

(a) Latent roots of the sample covariance matrix

878.5, 196.1, 128.6, 103.4, 81.3, 37.8, 7.0, 5.7, 3.5

(b) Principal component vectors

              Average density           High density             Low density
Component   red     green   blue     red     green   blue     red     green   blue
    1       0.305   0.653   0.483    0.261   0.324   0.271    0.002   0.006   0.014
    2      -0.486  -0.151   0.588   -0.491  -0.038   0.373    0.057   0.054   0.088
hypothesis $H_0(2, 3)$. Although the second and third sample roots are not that far apart, it appears from our simulations that the adjusted statistic should perform adequately. For example, in testing $H_0(2, 4)$ when $p = 10$ and $n = 75$, the estimated significance level was 0.102 for $\lambda_2 = 1.5$ and 0.058 for $\lambda_2 = 2$. Routine calculation yields the unadjusted statistic $T_{2,3} = 3.45$. When compared to the chi-squared distribution with 6 degrees of freedom, we see that $H_0(2, 3)$ is not rejected at any reasonable significance level. To compute the adjusted statistics, we note that $\hat{c}_{2,3} = 10.3$ and $c^{**}_{2,3} = 6$. These then yield the adjusted statistics $T^*_{2,3} = 3.12$ and $T^{**}_{2,3} = 3.26$, so that for this particular example the adjusted statistics lead to the same conclusion as the unadjusted statistic.
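The arithmetic of this example can be verified directly (a sketch; all numbers are taken from the text above, and 12.59 is the 0.95 quantile of the chi-squared distribution with 6 degrees of freedom):

```python
# Numbers from the example: T_{2,3} = 3.45, n = 108 degrees of freedom,
# estimated c_{2,3} = 10.3, and c**_{2,3} = k + q + 1 = 6.
n, T = 108, 3.45
c_hat, c_star2 = 10.3, 2 + 3 + 1
T_adj = (1 - c_hat / n) * T         # T*_{2,3}, approximately 3.12
T_adj2 = (1 - c_star2 / n) * T      # T**_{2,3}, approximately 3.26
# All three statistics fall far below 12.59, so H0(2, 3) is not
# rejected at the 5% level with or without the adjustment.
```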
Acknowledgement

This paper has benefited from some helpful comments and suggestions from the referee.
References

Barndorff-Nielsen, O. and D.R. Cox (1979), Edgeworth and saddlepoint approximations with statistical applications (with discussion), J. Roy. Statist. Soc. Ser. B 41, 279-312.
Barndorff-Nielsen, O. and D.R. Cox (1984), Bartlett adjustments to the likelihood ratio statistic and the distribution of the maximum likelihood estimator, J. Roy. Statist. Soc. Ser. B 46, 483-495.
Devlin, S.J., R. Gnanadesikan and J.R. Kettenring (1981), Robust estimation of dispersion matrices and principal components, J. Amer. Statist. Assoc. 76, 354-362.
Jackson, J.E. and R.H. Morris (1957), An application of multivariate quality control to photographic processing, J. Amer. Statist. Assoc. 52, 186-199.
Lawley, D.N. (1956a), Tests of significance for the latent roots of covariance and correlation matrices, Biometrika 43, 128-136.
Lawley, D.N. (1956b), A general method for approximating to the distribution of the likelihood ratio criteria, Biometrika 43, 295-303.
Schott, J.R. (1989), An adjustment for a test concerning a principal component subspace, Statist. Probab. Lett. 7, 425-430.
Sugiura, N. (1976), Asymptotic expansions of the distributions of the latent roots and latent vector of the Wishart and multivariate F matrices, J. Multivariate Anal. 6, 500-525.
Tyler, D.E. (1981), Asymptotic inference for eigenvectors, Ann. Statist. 9, 725-736.