P. R. Krishnaiah and L. N. Kanal, eds., Handbook of Statistics, Vol. 2
©North-Holland Publishing Company (1982) 821-833
On the Selection of Variables Under Regression Models Using Krishnaiah's Finite Intersection Tests*

James L. Schmidhammer

1. Introduction
The finite intersection test procedure under the univariate regression model was first considered by Krishnaiah in 1960 in an unpublished report, which was subsequently issued as a technical report in 1963 and later published in Krishnaiah (1965a). The problem of testing the hypotheses that the regression coefficients are zero, as well as the problem of testing the hypotheses that contrasts on means, in the ANOVA setup, are equal to zero, are special cases of this procedure. Finite intersection tests under general multivariate regression models were proposed by Krishnaiah (1965b). In this chapter, we discuss some applications of Krishnaiah's finite intersection tests for selection of variables under univariate and multivariate regression models. Section 2 gives some background material on the multivariate F distribution, which is the distribution most commonly used in conjunction with the finite intersection test. Section 3 describes the application of the finite intersection test procedure to the univariate linear regression problem, while Section 4 discusses the extension to the multivariate linear regression case. Finally, Sections 5 and 6 illustrate the use of the finite intersection test with univariate and multivariate examples respectively.
2. The multivariate F distribution

Let x_1, ..., x_n be distributed independently and identically as multivariate normal random vectors with mean vector μ and covariance matrix Σ, with x_i' = (x_i1, ..., x_ip) for i = 1, ..., n, μ' = (μ_1, ..., μ_p), and Σ = (σ_ij). Also, let z_j = (1/σ_jj) Σ_{i=1}^n x_ij² for j = 1, ..., p. Then the joint distribution of z_1, ..., z_p is a central or noncentral multivariate chi-square distribution with n degrees of freedom (central

*This work is sponsored by the Air Force Office of Scientific Research under Contract F49629-82K-001. Reproduction, in whole or in part, is permitted for any purpose of the United States Government.
if μ = 0, noncentral if μ ≠ 0). Also, let z_0 be distributed independently of z' = (z_1, ..., z_p) as a central chi-square random variable with m degrees of freedom, with E(z_0) = m. In addition, let F_i = (z_i/n)/(z_0/m). Then the joint distribution of F_1, ..., F_p is a multivariate F distribution with n and m degrees of freedom, with P as the correlation matrix of the accompanying multivariate normal distribution, where P = (ρ_ij) and ρ_ij = σ_ij/(σ_ii σ_jj)^(1/2). This distribution was introduced by Krishnaiah (1963, 1965a). When n = 1, the multivariate F distribution is known as the multivariate t² distribution. For an exact expression for the density of the central multivariate F distribution when Σ is nonsingular, see Krishnaiah (1964). Exact percentage points of the bivariate F distribution were given in Schuurmann, Krishnaiah and Chattopadhyay (1975), whereas approximate percentage points of the multivariate F distribution were constructed by Krishnaiah and Schuurmann and reproduced in Krishnaiah (1980). Percentage points in the equicorrelated case (ρ_ij = ρ for i ≠ j) were given in Krishnaiah and Armitage (1970), and in Krishnaiah and Armitage (1965) when n = 1. In general, exact percentage points of the multivariate F distribution are difficult to obtain when Σ is arbitrary. However, bounds on these percentage points can be obtained using several probability inequalities. Upper and lower bounds on the critical values can be computed using Poincaré's formula,
1 − Σ_{i=1}^p P[F_i > F_α] ≤ P[F_1 ≤ F_α, ..., F_p ≤ F_α]
    ≤ 1 − Σ_{i=1}^p P[F_i > F_α] + Σ_{i<j} P[F_i > F_α ∩ F_j > F_α].   (2.1)

The left-hand side can be used to obtain an upper bound on F_α, while the right-hand side can be used to obtain a lower bound. To describe other probability inequalities useful for obtaining bounds on the critical values, let x = (x_1, ..., x_r)' be distributed as a multivariate normal random vector with mean vector 0 and covariance matrix σ²P, where P = (ρ_ij) is the correlation matrix of the x_i's, and let y = (y_1, ..., y_r)' be distributed as a multivariate normal random vector with mean vector 0 and covariance matrix σ²I_r. Also, let s²/σ² ~ χ²_m, independent of x and y. Then Sidak (1967) showed that
P[|x_1| ≤ c s, ..., |x_r| ≤ c s] ≥ P[|y_1| ≤ c s, ..., |y_r| ≤ c s],   (2.2)

and, the y_i's being independent,

P[|y_1| ≤ c s, ..., |y_r| ≤ c s] ≥ Π_{i=1}^r P[|y_i| ≤ c s].   (2.3)
Inequality (2.2) is referred to here as Sidak's upper bound, and inequality (2.3) is referred to as the product upper bound. For additional discussion, see Krishnaiah (1979).
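The bounds just described can be checked numerically. The sketch below is an illustration added here (not from the chapter): it simulates the multivariate F construction of this section for arbitrary choices of the dimension, degrees of freedom, equicorrelation, and a trial critical value, and verifies that the two sides of Poincaré's formula (2.1) bracket the joint probability.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_df, m_df = 3, 1, 20     # dimension and the two degrees of freedom
rho = 0.5                    # equicorrelated case: rho_ij = rho for i != j
P = np.full((p, p), rho) + (1 - rho) * np.eye(p)

# Simulate the construction of Section 2: correlated chi-squares z_i
# divided by a common independent chi-square z_0.
reps = 200_000
L = np.linalg.cholesky(P)
x = rng.standard_normal((reps, n_df, p)) @ L.T   # N(0, P) rows
z = (x ** 2).sum(axis=1)                         # correlated chi-square(n)
z0 = rng.chisquare(m_df, size=reps)              # independent chi-square(m)
F = (z / n_df) / (z0 / m_df)[:, None]            # multivariate F variates

c = 5.0                                          # trial critical value F_alpha
exceed = F > c
joint = np.mean(~exceed.any(axis=1))             # P[F_1 <= c, ..., F_p <= c]
marg = exceed.mean(axis=0)                       # P[F_i > c]
pairs = sum(np.mean(exceed[:, i] & exceed[:, j])
            for i in range(p) for j in range(i + 1, p))
lower = 1 - marg.sum()                           # left side of (2.1)
upper = 1 - marg.sum() + pairs                   # right side of (2.1)
assert lower <= joint <= upper
```

Because the Bonferroni-type inequalities hold for any probability measure, they hold exactly for the empirical distribution as well, so the final assertion is not a matter of simulation luck.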
3. The finite intersection test — A simultaneous procedure in the univariate case

Consider the usual univariate linear model given by

y = Xβ + ε

where y is an n-vector of independently distributed normal random variables, X = [1, X_1] with 1 being an n-vector of all 1's, and X_1 an n × q matrix of known constants representing the n observations on the independent variables x_1, ..., x_q; β' = (β_0, β_1, ..., β_q) is a (q + 1)-vector of unknown parameters, and ε ~ N_n(0, σ²I_n). The least squares estimate of β is β̂ = (X'X)⁻¹X'y, with covariance matrix σ²(X'X)⁻¹ = σ²(w_ij). The usual error sum of squares is S² = y'[I_n − X(X'X)⁻¹X']y.

The problem of selection of variables can be formulated within the framework of testing of hypotheses. For example, the variable x_i may be declared to be important or unimportant according as β_i ≠ 0 or β_i = 0. In order to test the hypothesis H: β = 0 using the finite intersection test procedure, we partition the overall hypothesis H into a finite intersection of subhypotheses H_i: β_i = 0 for i = 0, 1, ..., q. Then we accept or reject H_i according as F_i ≤ F_iα or F_i > F_iα, where

F_i = β̂_i² / [w_ii S²/(n − q − 1)]

and P[∩_{i=0}^q {F_i ≤ F_iα} | H] = 1 − α. In this paper we will only consider the case where F_iα = F_α for i = 0, 1, ..., q. Now, the joint distribution of F_0, F_1, ..., F_q is a multivariate ((q + 1)-variate) F distribution with 1 and n − q − 1 degrees of freedom. Simultaneous confidence intervals associated with this procedure are given by

β̂_i − √(F_α w_ii S²/(n − q − 1)) ≤ β_i ≤ β̂_i + √(F_α w_ii S²/(n − q − 1))
for i = 0, ..., q. For comparison, if we use the usual overall F test, we reject H if F > F* where

F = [β̂'(X'X)β̂/(q + 1)] / [S²/(n − q − 1)]

and P[F ≤ F* | H] = 1 − α, with F ~ F_{q+1, n−q−1}. The associated simultaneous
confidence intervals are given by
β̂_i − √((q + 1)F* w_ii S²/(n − q − 1)) ≤ β_i ≤ β̂_i + √((q + 1)F* w_ii S²/(n − q − 1)).

Now, since F_α ≤ (q + 1)F* (see Krishnaiah, 1969), the lengths of the confidence intervals associated with the finite intersection test are never longer than the lengths of the corresponding confidence intervals associated with the overall F test.

In the procedure described above, H = ∩_{i=0}^q H_i, where H_i: β_i = 0. Thus, a test is performed on the importance of every independent variable simultaneously, including the intercept. However, it is usually the case that the test H_0: β_0 = 0 is of no interest, and it is often the case that only a subset of all possible independent variables is to be examined for importance. With this in mind, consider r hypotheses of the form H_i: c_i'β = 0 for i = 1, ..., r, with H* = ∩_{i=1}^r H_i. In the above context, c_i' = (0, ..., 0, 1, 0, ..., 0), i.e., c_i selects the particular β_i of interest for testing, although the procedure described below works for arbitrary c_i. Using the finite intersection test procedure, we reject H_i if F_i > F_α, where
F_i = (c_i'β̂)² / [c_i'(X'X)⁻¹c_i S²/(n − q − 1)]

and P[∩_{i=1}^r {F_i ≤ F_α} | H*] = 1 − α, with the joint distribution of F_1, ..., F_r being a multivariate F distribution with 1 and (n − q − 1) degrees of freedom. In this case simultaneous confidence intervals are given by

c_i'β̂ − √(F_α c_i'(X'X)⁻¹c_i S²/(n − q − 1)) ≤ c_i'β ≤ c_i'β̂ + √(F_α c_i'(X'X)⁻¹c_i S²/(n − q − 1)).
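As a concrete sketch of the procedure just described, the statistics F_i and the simultaneous intervals can be computed as follows. This is an illustration added here with simulated data; the critical value F_alpha = 9.03 is simply a placeholder of the kind supplied by the bounds of Section 2, and the function name is ours, not the chapter's.

```python
import numpy as np

def finite_intersection_test(X1, y, F_alpha):
    """Per-coefficient statistics F_i = bhat_i^2 / (w_ii S^2/(n-q-1))
    and the associated simultaneous confidence intervals."""
    n, q = X1.shape
    X = np.column_stack([np.ones(n), X1])    # X = [1, X1]
    XtX_inv = np.linalg.inv(X.T @ X)         # (X'X)^{-1} = (w_ij)
    bhat = XtX_inv @ X.T @ y                 # least squares estimate
    resid = y - X @ bhat
    s2 = resid @ resid / (n - q - 1)         # S^2 / (n - q - 1)
    w = np.diag(XtX_inv)
    F = bhat ** 2 / (w * s2)
    half = np.sqrt(F_alpha * w * s2)         # half-length of each interval
    return bhat, F, np.column_stack([bhat - half, bhat + half])

# toy usage: 33 observations on q = 9 predictors, only the first relevant
rng = np.random.default_rng(1)
X1 = rng.standard_normal((33, 9))
y = 1.0 + 0.9 * X1[:, 0] + rng.standard_normal(33)
bhat, F, ci = finite_intersection_test(X1, y, F_alpha=9.03)
important = np.where(F[1:] > 9.03)[0] + 1    # reject H_i: beta_i = 0
```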
Table 1
Relative efficiency of overall F test to the finite intersection test (α = 0.05, ν = 10)

r \ ρ    0.1     0.5     0.9
 1      1.00    1.00    1.00
 2      0.83    0.80    0.71
 3      0.72    0.68    0.57
 4      0.64    0.60    0.48
 5      0.58    0.54    0.42
 6      0.53    0.49    0.37
 7      0.49    0.43    0.33
 8      0.45    0.42    0.30
 9      0.43    0.39    0.28
10      0.40    0.36    0.26
Table 2
Relative efficiency of overall F test to the finite intersection test (α = 0.05, ν = 30)

r \ ρ    0.1     0.5     0.9
 1      1.00    1.00    1.00
 2      0.83    0.81    0.73
 3      0.72    0.70    0.60
 4      0.65    0.62    0.51
 5      0.59    0.56    0.45
 6      0.54    0.51    0.40
 7      0.50    0.47    0.37
 8      0.47    0.44    0.34
 9      0.44    0.41    0.31
10      0.42    0.39    0.29
If instead the usual overall F test is used, then we reject H if F > F* where

F = [β̂'C'(C(X'X)⁻¹C')⁻¹Cβ̂/r] / [S²/(n − q − 1)]

and P[F ≤ F* | H*] = 1 − α with F ~ F_{r, n−q−1} and C' = [c_1, ..., c_r]. Furthermore, simultaneous confidence intervals are given by
c_i'β̂ − √(rF* c_i'(X'X)⁻¹c_i S²/(n − q − 1)) ≤ c_i'β ≤ c_i'β̂ + √(rF* c_i'(X'X)⁻¹c_i S²/(n − q − 1)).
Again, the lengths of the confidence intervals associated with the finite intersection test are shorter than the lengths of the corresponding confidence intervals associated with the usual overall F test.

Table 3
Relative efficiency of overall F test to the finite intersection test (α = 0.01, ν = 10)

r \ ρ    0.1     0.5     0.9
 1      1.00    1.00    1.00
 2      0.84    0.82    0.76
 3      0.73    0.71    0.62
 4      0.66    0.63    0.53
 5      0.59    0.57    0.47
 6      0.55    0.52    0.42
 7      0.51    0.47    0.38
 8      0.47    0.44    0.34
 9      0.44    0.41    0.32
10      0.42    0.39    0.30
Table 4
Relative efficiency of overall F test to the finite intersection test (α = 0.01, ν = 30)

r \ ρ    0.1     0.5     0.9
 1      1.00    1.00    1.00
 2      0.85    0.84    0.79
 3      0.75    0.74    0.66
 4      0.68    0.66    0.58
 5      0.62    0.60    0.52
 6      0.57    0.55    0.47
 7      0.53    0.51    0.43
 8      0.50    0.48    0.40
 9      0.47    0.45    0.37
10      0.44    0.42    0.35
A comparison of the lengths of the confidence intervals associated with the finite intersection test with the corresponding lengths of the confidence intervals associated with the overall F test is given in Tables 1-4. These tables give values of R² = F_α/(rF*) for α (Type I error rate) = 0.01, 0.05 and ν (error degrees of freedom) = 10, 30. For similar tables when using the finite intersection test in a one-way ANOVA setup, see Cox et al. (1980). For discussions regarding the confidence intervals associated with the overall F test, the reader is referred to Roy and Bose (1953) and Scheffé (1959).
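The tabulated ratio can be approximated in closed form by replacing the exact multivariate F percentage point F_α with the product upper bound of Section 2, which needs only ordinary F quantiles. The sketch below is an illustration added here, assuming scipy is available; because the product bound ignores the correlation, it tracks only the small-ρ columns of Tables 1-4, and only approximately.

```python
from scipy.stats import f

def relative_efficiency(r, nu, alpha=0.05):
    """Approximate R^2 = F_alpha / (r F*), with F_alpha replaced by
    its product upper bound, the (1 - alpha)^(1/r) quantile of F_{1,nu}."""
    F_alpha = f.ppf((1 - alpha) ** (1 / r), 1, nu)  # product upper bound
    F_star = f.ppf(1 - alpha, r, nu)                # overall F critical value
    return F_alpha / (r * F_star)

# r = 1 gives 1.00 exactly, as in the first row of every table, and the
# value for r = 10, nu = 10 falls near the 0.40 of Table 1's 0.1 column.
print(relative_efficiency(1, 10), relative_efficiency(10, 10))
```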
4. The finite intersection test — A simultaneous procedure in the multivariate case

Analogous to the univariate linear model, the multivariate linear model is given by

Y = XB + E

where Y is an n × p matrix of n observations on p variables y_1, ..., y_p whose rows are independently distributed as a p-variate normal distribution, with covariance matrix Σ, and E(Y) = XB. Furthermore, X is as described in the previous section, while B = [β_0, β_1, ..., β_q]' is a (q + 1) × p matrix of unknown regression parameters, and E is an n × p matrix whose rows are independently and identically distributed as a p-variate normal distribution with mean vector 0 and covariance matrix Σ. The problem of selection of variables under the multivariate regression model can again be formulated within the framework of simultaneous test procedures as in the univariate case. The problem of testing H: B = 0 is equivalent to the problem of testing H_i: c_i'B = 0' for i = 0, 1, ..., q simultaneously, where c_i' = [c_0i, c_1i, ..., c_qi] for i = 0, 1, ..., q, with

c_hi = 0 if h ≠ i,   c_hi = 1 if h = i,
i.e., H = ∩_{i=0}^q H_i. Note that with c_i as described above, c_i'B = β_i' for i = 0, 1, ..., q. We declare that the variable x_i is important or unimportant for prediction of Y' = (y_1, ..., y_p) according as H_i is rejected or accepted.

In order to develop the finite intersection test procedure in this case the following notation is needed. Let Σ_k denote the k × k top left-hand corner of Σ = (σ_ij) and σ_{k+1}² = |Σ_{k+1}|/|Σ_k| for k = 0, 1, ..., p − 1, with |Σ_0| = 1. Also, let

γ_k = Σ_k⁻¹ σ_(k),   σ_(k)' = [σ_{1,k+1}, ..., σ_{k,k+1}],

with γ_0 = 0. Finally, let Y = [y_1, ..., y_p], Y_j = [y_1, ..., y_j] for j = 1, ..., p, B = [θ_1, ..., θ_p], and B_j = [θ_1, ..., θ_j] for j = 1, ..., p. Now, when Y_j is fixed, the elements of y_{j+1} are distributed normally with common conditional variance σ_{j+1}² and with conditional means
Xη_{j+1} + Y_j γ_j,   (4.1)

where η_{j+1} = θ_{j+1} − B_j γ_j. Each of the hypotheses H_0, H_1, ..., H_q can be expressed as

H_i = ∩_{j=1}^p H_ij   for i = 0, 1, ..., q,
where H_ij: c_i'η_j = 0 and c_i is as described previously. Thus the problem of testing H, which is equivalent to the problem of testing H_0, H_1, ..., H_q simultaneously, is also equivalent to the problem of testing H_ij (i = 0, 1, ..., q; j = 1, ..., p) simultaneously. Noting that the model in (4.1) is just a univariate linear regression model, let c_i'η̂_j denote the best linear unbiased estimate (BLUE) of c_i'η_j, and let S_j² denote the usual error sum of squares. With the finite intersection test procedure, the test statistic for testing H_ij against a two-sided alternative is

F_ij = (c_i'η̂_j)² / [D_ij S_j²/(n − j − q)]   (4.2)

for i = 0, 1, ..., q and j = 1, ..., p. In (4.2), D_ij S_j²/(n − j − q) is the sample estimate of the conditional variance of c_i'η̂_j. We reject H_ij if F_ij > F_α, where
P[F_ij ≤ F_α; i = 0, 1, ..., q; j = 1, ..., p | H] = Π_{j=1}^p P[F_ij ≤ F_α; i = 0, 1, ..., q | H] = 1 − α.
If any H_ij is rejected, then we declare the ith variable x_i to be important.
When H is true, the joint distribution of F_0j, F_1j, ..., F_qj, for any given j = 1, ..., p, is a (q + 1)-variate F distribution with 1 and n − j − q degrees of freedom. The associated 100(1 − α)% simultaneous confidence intervals are given by

c_i'η̂_j − √(F_α D_ij S_j²/(n − j − q)) ≤ c_i'η_j ≤ c_i'η̂_j + √(F_α D_ij S_j²/(n − j − q)).
Several comparisons have been made between the lengths of the confidence intervals for the finite intersection test in the multivariate case and the lengths of confidence intervals derived from other procedures. It is known (see Krishnaiah, 1965b) that the finite intersection test yields shorter confidence intervals than the step-down procedure of J. Roy (1958). Also, Mudholkar and Subbaiah (1979) made some comparisons of the finite intersection test with the step-down procedure and Roy's largest root test. Additional comparisons of interest are to be found in Cox et al. (1980).

Several remarks are in order at this time. First, the critical value F_α has been chosen to be the same for all hypotheses H_ij. This was done out of convenience but is certainly not necessary. Second, the hypothesis H_ij was chosen such that H_i = ∩_{j=1}^p H_ij and H = ∩_{i=0}^q H_i, with H: B = 0. However, the overall hypothesis H need not be H: B = 0. We can just as easily consider any set of hypotheses H_1, ..., H_r where H_i: c_i'B = 0' for i = 1, ..., r, with c_i being chosen as desired and H = ∩_{i=1}^r H_i. In the context of selection of variables, however, the c_i's are to be chosen so as to select out the particular independent variables of interest (see discussion in previous section).
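The device underlying (4.1) and (4.2) — running the multivariate regression as p successive univariate regressions, the j-th regressing y_j on X together with the earlier responses y_1, ..., y_{j-1} — can be sketched as follows. This is an illustration added here with simulated data; the critical value 10.0 stands in for F_α, and the function name is ours.

```python
import numpy as np

def stepdown_fit_stats(X1, Y):
    """F_{ij}-type statistics for the q slope coefficients on the original
    predictors in each conditional regression (intercept and conditioning
    coefficients are computed but not tested)."""
    n, q = X1.shape
    p = Y.shape[1]
    F = np.zeros((q, p))
    for j in range(p):
        # design matrix: intercept, X1, and the first j responses held fixed
        Z = np.column_stack([np.ones(n), X1, Y[:, :j]])
        k = Z.shape[1]                           # error df is n - k
        G = np.linalg.inv(Z.T @ Z)
        coef = G @ Z.T @ Y[:, j]
        resid = Y[:, j] - Z @ coef
        s2 = resid @ resid / (n - k)             # conditional error variance
        F[:, j] = coef[1:q + 1] ** 2 / (np.diag(G)[1:q + 1] * s2)
    return F

# toy usage: 37 observations, q = 5 predictors, p = 3 responses, with
# only the fourth predictor (index 3) driving the responses
rng = np.random.default_rng(2)
X1 = rng.standard_normal((37, 5))
Y = X1[:, [3]] * np.array([1.5, 1.0, 0.5]) + rng.standard_normal((37, 3))
F = stepdown_fit_stats(X1, Y)
important = np.where((F > 10.0).any(axis=1))[0]  # declare x_i important
```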
5. A univariate example

In order to illustrate the use of the finite intersection test in univariate regression, a data set appearing as Exercise E, p. 230 of Draper and Smith (1966) is analyzed.¹ The data consist of 33 years of observations on corn yield, preseason precipitation, and various monthly readings on temperature and precipitation for the state of Iowa. Corn yield (YIELD) is used as the dependent variable, while the remaining nine variables are used as independent variables. These independent variables are year number (YEAR), preseason precipitation (PSP), May temperature (MAY T), June rain (JUN R), June temperature (JUN T), July rain (JUL R), July temperature (JUL T), August rain (AUG R), and August temperature (AUG T). The data were input into a FORTRAN

¹The author is grateful to John Wiley & Sons for giving permission to use these data for illustrative purposes.
Table 5
Finite intersection test — univariate example

Error degrees of freedom: 23
Sample estimate of error variance: 60.8
Overall α-level: 0.05

Critical values:
  Poincaré's lower bound       9.0332
  Sidák's upper bound          9.1699
  Product upper bound          9.2969
  Overall F (9F_0.95(9,23))   20.8809

Simultaneous confidence intervals:

 i     β̂_i        F_i       Poincaré's      Sidák's         Product         Overall
                            lower bound     upper bound     upper bound     F test
 1    0.8769   21.96651    (0.31, 1.44)    (0.31, 1.44)    (0.31, 1.45)    (0.02, 1.73)
 2    0.7865    3.35373    (-0.50, 2.08)   (-0.51, 2.09)   (-0.52, 2.10)   (-1.18, 2.75)
 3   -0.4549    1.13878    (-1.74, 0.83)   (-1.75, 0.84)   (-1.75, 0.84)   (-2.40, 1.49)
 4   -0.7644    0.52760    (-3.93, 2.40)   (-3.95, 2.42)   (-3.97, 2.44)   (-5.57, 4.04)
 5    0.4797    0.70392    (-1.24, 2.20)   (-1.25, 2.21)   (-1.26, 2.22)   (-2.13, 3.09)
 6    2.5828    3.53278    (-1.55, 6.71)   (-1.58, 6.74)   (-1.61, 6.77)   (-3.70, 8.86)
 7    0.0609    0.00723    (-2.09, 2.21)   (-2.11, 2.23)   (-2.12, 2.25)   (-3.21, 3.34)
 8    0.4017    0.15199    (-2.70, 3.50)   (-2.72, 3.52)   (-2.74, 3.54)   (-4.31, 5.11)
 9   -0.6639    0.88899    (-2.67, 1.34)   (-2.69, 1.36)   (-2.70, 1.37)   (-3.71, 2.39)
computer program written for use on the DEC-10 computer at the University of Pittsburgh. The results appear in Table 5. The model for the data is

y = Xβ + ε

where β' = [β_0, β_1'], β_1' = [β_1, ..., β_q], and the overall hypothesis tested is H: β_1 = 0, against the alternative A: β_1 ≠ 0. Thus, we test the hypothesis that none of the independent variables are related to the dependent variable, but do not test that the intercept is zero.

In Table 5, note that simultaneous confidence intervals are constructed using Poincaré's lower bound, Sidák's upper bound, and the product upper bound on the critical values for the finite intersection test, and also using the critical value associated with the overall F test. The confidence intervals associated with the overall F test are at least 50% wider than the corresponding confidence intervals using the finite intersection test. However, the confidence intervals constructed using the product upper bound are only 1.4% wider than those constructed using Poincaré's lower bound, while the confidence intervals constructed using Sidák's upper bound are only 0.75% wider than those using Poincaré's lower bound, indicating that a fairly precise estimate of the true critical value is available, at least in this case, using only some probability inequalities.

As for the results of the analysis, it is interesting that the only variable related to corn crop yield is year number, reflecting the well-known fact that grain production in the United States has been steadily increasing for the past fifty
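The percentage comparisons quoted here follow directly from the four critical values in Table 5, since the half-length of each interval scales as the square root of the critical value. A quick check (added for illustration):

```python
import math

# critical values reported in Table 5
poincare, sidak, product, overall = 9.0332, 9.1699, 9.2969, 20.8809

sidak_excess = math.sqrt(sidak / poincare) - 1      # about 0.75% wider
product_excess = math.sqrt(product / poincare) - 1  # about 1.4% wider
overall_excess = math.sqrt(overall / poincare) - 1  # about 52% wider
print(f"{sidak_excess:.4f} {product_excess:.4f} {overall_excess:.2f}")
```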
years, primarily as a result of technological innovations, overwhelming any variation that might exist due to temperature and precipitation.

6. A multivariate example

As an example of the application of the finite intersection test in multivariate linear regression, a data set appearing as Table 4.7.1, p. 314 of Timm (1975) is

Table 6*
SAT  PPVT  RPMT    N    S   NS   NA   SS
 49   48     8     1    2    6   12   16
 49   76    13     5   14   14   30   27
 11   40    13     0   10   21   16   16
  9   52     9     0    2    5   17    8
 69   63    15     2    7   11   26   17
 35   82    14     2   15   21   34   25
  6   71    21     0    1   20   23   18
  8   68     8     0    0   10   19   14
 49   74    11     0    0    7   16   13
  8   70    15     3    2   21   26   25
 47   70    15     8   16   15   35   24
  6   61    11     5    4    7   15   14
 14   54    12     1   12   13   27   21
 30   55    13     2    1   12   20   17
  4   54    10     3   12   20   26   22
 24   40    14     0    2    5   14    8
 19   66    13     7   12   21   35   27
 45   54    10     0    6    6   14   16
 22   64    14    12    8   19   27   26
 16   47    16     3    9   15   18   10
 32   48    16     0    7    9   14   18
 37   52    14     4    6   20   26   26
 47   74    19     4    9   14   23   23
  5   57    12     0    2    4   11    8
  6   57    10     0    1   16   15   17
 60   80    11     3    8   18   28   21
 58   78    13     1   18   19   34   23
  6   70    16     2   11    9   23   11
 16   47    14     0   10    7   12    8
 45   94    19     8   10   28   32   32
  9   63    11     2   12    5   25   14
 69   76    16     7   11   18   29   21
 35   59    11     2    5   10   23   24
 19   55     8     0    1   14   19   12
 58   74    14     1    0   10   18   18
 58   71    17     6    4   23   31   26
 79   54    14     0    6    6   15   14

*Reproduced from p. 314, Timm (1975), with permission.
Table 7
Finite intersection test — multivariate example — first dependent variable (RPMT)

Error degrees of freedom: 31
Sample estimate of error variance: 8.65
Overall α-level: 0.0169524

Critical values:
  Poincaré's lower bound    9.8828
  Sidák's upper bound      10.0195
  Product upper bound      10.0586

Simultaneous confidence intervals:

 i     β̂_i       F_i       Poincaré's      Sidák's         Product
                           lower bound     upper bound     upper bound
 1    0.2110   0.82773    (-0.52, 0.94)   (-0.52, 0.95)   (-0.52, 0.95)
 2    0.0646   0.24418    (-0.35, 0.48)   (-0.35, 0.48)   (-0.35, 0.48)
 3    0.2136   2.85731    (-0.18, 0.61)   (-0.19, 0.61)   (-0.19, 0.61)
 4   -0.0373   0.06725    (-0.49, 0.42)   (-0.49, 0.42)   (-0.49, 0.42)
 5   -0.0521   0.11646    (-0.53, 0.43)   (-0.54, 0.43)   (-0.54, 0.43)
used. These data are reproduced in Table 6. The three dependent variables are scores on a student achievement test (SAT), the Peabody Picture Vocabulary Test (PPVT), and the Raven Progressive Matrices Test (RPMT). The independent variables consisted of the sum of the number of items answered correctly out of 20 on a learning proficiency test on two exposures to five types of paired-associate learning tasks. These five tasks are named (N), still (S), named still (NS), named action (NA), and sentence still (SS). The same FORTRAN program used for the analysis of the previous section was used for this analysis, since when using the finite intersection test, a multivariate linear regression can be expressed as several independent univariate linear regressions. The results appear in Tables 7-9. The model for these data is

Y = XB + E   (6.1)

where B' = [β_0, B_1'], B_1' = [β_1, ..., β_q], and the overall hypothesis tested is H: B_1 = 0 against the alternative A: B_1 ≠ 0. Again, a test on the intercept is not performed.

As in the previous univariate example, Tables 7-9 display simultaneous confidence intervals constructed using the three bounds on the critical values. For these data the use of the product upper bound results in confidence intervals only 0.9% wider than the confidence intervals using Poincaré's lower bound, while the use of Sidák's upper bound produces confidence intervals only 0.7% wider than the confidence intervals using Poincaré's lower bound. Again, very satisfactory estimates of the true critical values have been obtained using probability inequalities. Note that in each of Tables 7-9 the Type I error rate is given as α* = 0.0169524. This yields an experimentwise error rate of α = 0.05, since (1 − α*)³ = 1 − α, there
Table 8
Finite intersection test — multivariate example — second dependent variable (PPVT)

Error degrees of freedom: 30
Sample estimate of conditional error variance: 86.49
Overall α-level: 0.0169524

Critical values:
  Poincaré's lower bound    9.9414
  Sidák's upper bound      10.0781
  Product upper bound      10.1172

Simultaneous confidence intervals:

 i     η̂_i        F_i       Poincaré's      Sidák's         Product
                            lower bound     upper bound     upper bound
 1   -0.2486    0.11201    (-2.59, 2.09)   (-2.61, 2.11)   (-2.61, 2.11)
 2   -0.7725    3.47015    (-2.08, 0.54)   (-2.09, 0.54)   (-2.09, 0.55)
 3   -0.4684    1.25916    (-1.78, 0.85)   (-1.79, 0.86)   (-1.80, 0.86)
 4    1.5001   10.85130    (0.06, 2.94)    (0.05, 2.95)    (0.05, 2.95)
 5    0.3655    0.57045    (-1.16, 1.89)   (-1.17, 1.90)   (-1.17, 1.90)
being 3 dependent variables. Also recall that Tables 8 and 9 display statistics on conditional means, variances, and regression coefficients, the results of Table 8 being conditioned on holding the first dependent variable (RPMT) fixed, and the results of Table 9 being conditioned on holding both the first and second dependent variables (RPMT and PPVT) fixed. The results of Table 8 show that the overall hypothesis H is rejected, since the hypothesis H_42: η_42 = 0 is rejected. Thus, the independent variable named action (NA) is probably the only variable of importance in (6.1), and the other independent variables (N, S, NS, SS) can be regarded as unimportant.
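The per-variable level α* used in Tables 7-9 can be recovered directly from the experimentwise relation (1 − α*)³ = 1 − α; a one-line check, added here for illustration:

```python
alpha, p = 0.05, 3                        # experimentwise level, number of responses
alpha_star = 1 - (1 - alpha) ** (1 / p)   # per-variable Type I error rate
print(round(alpha_star, 7))               # 0.0169524, as reported in Tables 7-9
```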
Table 9
Finite intersection test — multivariate example — third dependent variable (SAT)

Error degrees of freedom: 29
Sample estimate of conditional error variance: 435.38
Overall α-level: 0.0169524

Critical values:
  Poincaré's lower bound    9.9805
  Sidák's upper bound      10.1172
  Product upper bound      10.1563

Simultaneous confidence intervals:

 i     η̂_i       F_i       Poincaré's      Sidák's         Product
                           lower bound     upper bound     upper bound
 1   -0.8567   0.26320    (-6.13, 4.42)   (-6.17, 4.45)   (-6.18, 4.46)
 2    0.1871   0.03624    (-2.92, 3.29)   (-2.94, 3.31)   (-2.94, 3.32)
 3    1.8858   3.89072    (-1.13, 4.91)   (-1.16, 4.93)   (-1.16, 4.93)
 4   -0.1162   0.00950    (-3.88, 3.65)   (-3.91, 3.68)   (-3.92, 3.68)
 5    2.1723   3.92810    (-1.29, 5.63)   (-1.31, 5.66)   (-1.32, 5.67)
References

[1] Cox, C. M., Krishnaiah, P. R., Lee, J. C., Reising, J. and Schuurmann, F. J. (1980). A study on finite intersection tests for multiple comparisons of means. In: P. R. Krishnaiah, ed., Multivariate Analysis, Vol. V. North-Holland, Amsterdam.
[2] Draper, N. R. and Smith, H. (1966). Applied Regression Analysis. Wiley, New York.
[3] Krishnaiah, P. R. (1963). Simultaneous tests and the efficiency of generalized incomplete block designs. Tech. Rept. ARL 63-174. Wright-Patterson Air Force Base, OH.
[4] Krishnaiah, P. R. (1964). Multiple comparison tests in multivariate case. Tech. Rept. ARL 64-124. Wright-Patterson Air Force Base, OH.
[5] Krishnaiah, P. R. and Armitage, J. V. (1965). Probability integrals of the multivariate F distribution, with tables and applications. Tech. Rept. ARL 65-236. Wright-Patterson Air Force Base, OH.
[6] Krishnaiah, P. R. (1965a). On the simultaneous ANOVA and MANOVA tests. Ann. Inst. Statist. Math. 17, 35-53.
[7] Krishnaiah, P. R. (1965b). Multiple comparison tests in multi-response experiments. Sankhyā, Ser. A 27, 65-72.
[8] Krishnaiah, P. R. (1969). Simultaneous test procedures under general MANOVA models. In: P. R. Krishnaiah, ed., Multivariate Analysis, Vol. II. Academic Press, New York.
[9] Krishnaiah, P. R. and Armitage, J. V. (1970). On a multivariate F distribution. In: R. C. Bose et al., eds., Essays in Probability and Statistics. Univ. of North Carolina Press, Chapel Hill, NC.
[10] Krishnaiah, P. R. (1979). Some developments on simultaneous test procedures. In: P. R. Krishnaiah, ed., Developments in Statistics, Vol. 2. Academic Press, New York.
[11] Krishnaiah, P. R. (1980). Computations of some multivariate distributions. In: P. R. Krishnaiah, ed., Handbook of Statistics, Vol. 1: Analysis of Variance. North-Holland, Amsterdam.
[12] Mudholkar, G. S. and Subbaiah, P. (1979). MANOVA multiple comparisons associated with finite intersection tests. In: P. R. Krishnaiah, ed., Multivariate Analysis, Vol. V. North-Holland, Amsterdam.
[13] Roy, S. N. and Bose, R. C. (1953). Simultaneous confidence interval estimation. Ann. Math. Statist. 24, 513-536.
[14] Roy, J. (1958). Step-down procedure in multivariate analysis. Ann. Math. Statist. 29, 1177-1187.
[15] Scheffé, H. (1953). A method for judging all contrasts in the analysis of variance. Biometrika 40, 87-104.
[16] Scheffé, H. (1959). The Analysis of Variance. Wiley, New York.
[17] Schuurmann, F. J., Krishnaiah, P. R. and Chattopadhyay, A. K. (1975). Tables for a multivariate F distribution. Sankhyā, Ser. B 37, 308-331.
[18] Sidak, Z. (1967). Rectangular confidence regions for the means of multivariate normal distributions. J. Amer. Statist. Assoc. 62, 626-633.
[19] Timm, N. (1975). Multivariate Analysis with Applications in Education and Psychology. Brooks/Cole, Monterey, CA.