P. R. Krishnaiah and L. N. Kanal, eds., Handbook of Statistics, Vol. 2
© North-Holland Publishing Company (1982) 821-833

38
On the Selection of Variables Under Regression Models Using Krishnaiah's Finite Intersection Tests*

James L. Schmidhammer

1. Introduction

The finite intersection test procedure under the univariate regression model was first considered by Krishnaiah in 1960 in an unpublished report which was subsequently issued as a technical report in 1963 and later published in Krishnaiah (1965a). The problem of testing the hypotheses that the regression coefficients are zero, as well as the problem of testing the hypotheses that contrasts on means, in the ANOVA setup, are equal to zero, are special cases of this procedure. Finite intersection tests under general multivariate regression models were proposed by Krishnaiah (1965b). In this chapter, we discuss some applications of Krishnaiah's finite intersection tests for selection of variables under univariate and multivariate regression models. Section 2 gives some background material on the multivariate F distribution, which is the distribution most commonly used in conjunction with the finite intersection test. Section 3 describes the application of the finite intersection test procedure to the univariate linear regression problem, while Section 4 discusses the extension to the multivariate linear regression case. Finally, Sections 5 and 6 illustrate the use of the finite intersection test with univariate and multivariate examples respectively.

2. The multivariate F distribution

Let $x_1, \ldots, x_n$ be distributed independently and identically as multivariate normal random vectors with mean vector $\mu$ and covariance matrix $\Sigma$, with $x_i' = (x_{i1}, \ldots, x_{ip})$ for $i = 1, \ldots, n$, $\mu' = (\mu_1, \ldots, \mu_p)$, and $\Sigma = (\sigma_{ij})$. Also, let $z_j = (1/\sigma_{jj})\sum_{i=1}^{n} x_{ij}^2$ for $j = 1, \ldots, p$. Then the joint distribution of $z_1, \ldots, z_p$ is a central or noncentral multivariate chi-square distribution with $n$ degrees of freedom (central

*This work is sponsored by the Air Force Office of Scientific Research under Contract F49629-82K-001. Reproduction, in whole or in part, is permitted for any purpose of the United States Government.


if $\mu = 0$, noncentral if $\mu \neq 0$). Also, let $z_0$ be distributed independently of $z' = (z_1, \ldots, z_p)$ as a central chi-square random variable with $m$ degrees of freedom, with $E(z_0) = m$. In addition, let $F_i = (z_i/n)/(z_0/m)$. Then the joint distribution of $F_1, \ldots, F_p$ is a multivariate F distribution with $n$ and $m$ degrees of freedom, with $P$ as the correlation matrix of the accompanying multivariate normal distribution, where $P = (\rho_{ij})$ and $\rho_{ij} = \sigma_{ij}/(\sigma_{ii}\sigma_{jj})^{1/2}$. This distribution was introduced by Krishnaiah (1963, 1965a). When $n = 1$, the multivariate F distribution is known as the multivariate $t^2$ distribution. For an exact expression for the density of the central multivariate F distribution when $\Sigma$ is nonsingular, see Krishnaiah (1964). Exact percentage points of the bivariate F distribution were given in Schuurmann, Krishnaiah and Chattopadhyay (1975), whereas approximate percentage points of the multivariate F distribution were constructed by Krishnaiah and Schuurmann and reproduced in Krishnaiah (1980). Percentage points in the equicorrelated case ($\rho_{ij} = \rho$ for $i \neq j$) were given in Krishnaiah and Armitage (1970), and in Krishnaiah and Armitage (1965) when $n = 1$. In general, exact percentage points of the multivariate F distribution are difficult to obtain when $\Sigma$ is arbitrary. However, bounds on these percentage points can be obtained using several probability inequalities. Upper and lower bounds on the critical values can be computed using Poincaré's formula,

$$1 - \sum_{i=1}^{p} P[F_i > F_\alpha] \leq P[F_1 \leq F_\alpha, \ldots, F_p \leq F_\alpha] \leq 1 - \sum_{i=1}^{p} P[F_i > F_\alpha] + \sum_{i<j} P[F_i > F_\alpha \cap F_j > F_\alpha]. \qquad (2.1)$$

The left-hand side can be used to obtain an upper bound on $F_\alpha$, while the right-hand side can be used to obtain a lower bound. To describe other probability inequalities useful for obtaining bounds on the critical values, let $x = (x_1, \ldots, x_r)'$ be distributed as a multivariate normal random vector with mean vector $0$ and covariance matrix $\sigma^2 P$, where $P = (\rho_{ij})$ is the correlation matrix of the $x_i$'s, and let $y = (y_1, \ldots, y_r)'$ be distributed as a multivariate normal random vector with mean vector $0$ and covariance matrix $\sigma^2 I_r$. Also, let $s^2/\sigma^2$ be distributed as a chi-square random variable, independent of $x$ and $y$. Then Sidák (1967) showed that

$$P[\,|x_i| \leq c_i s;\ i = 1, \ldots, r\,] \geq P[\,|y_i| \leq c_i s;\ i = 1, \ldots, r\,], \qquad (2.2)$$

$$P[\,|x_i| \leq c_i s;\ i = 1, \ldots, r\,] \geq \prod_{i=1}^{r} P[\,|y_i| \leq c_i s\,]. \qquad (2.3)$$


Inequality (2.2) is referred to here as Sidák's upper bound, and inequality (2.3) is referred to as the product upper bound. For additional discussion, see Krishnaiah (1979).
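Because Poincaré's formula (2.1) holds for any probability measure, including the empirical measure of a simulated sample, the two bounds can be checked directly by Monte Carlo. The sketch below (all parameter values are illustrative choices, not taken from the text) draws an equicorrelated multivariate F sample by building correlated chi-square numerators over a common chi-square denominator, then verifies that the joint probability lies between the two sides of (2.1).

```python
import random

random.seed(1)

def draw_multivariate_f(r, n, m, rho):
    """One draw (F_1, ..., F_r) from an equicorrelated multivariate F:
    chi-square numerators built from normals sharing a common factor
    (inducing correlation rho), over one chi-square denominator with
    m degrees of freedom."""
    z = [0.0] * r
    for _ in range(n):
        w = random.gauss(0.0, 1.0)  # shared factor
        for i in range(r):
            x = rho ** 0.5 * w + (1.0 - rho) ** 0.5 * random.gauss(0.0, 1.0)
            z[i] += x * x
    z0 = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(m))
    return [(zi / n) / (z0 / m) for zi in z]

# illustrative parameters only
r, n, m, rho, c = 4, 1, 20, 0.5, 6.0
reps = 20000
sample = [draw_multivariate_f(r, n, m, rho) for _ in range(reps)]

joint = sum(all(f <= c for f in fs) for fs in sample) / reps
marg = [sum(fs[i] > c for fs in sample) / reps for i in range(r)]
pairs = sum(sum(fs[i] > c and fs[j] > c for fs in sample) / reps
            for i in range(r) for j in range(i + 1, r))

lower = 1.0 - sum(marg)          # left-hand side of (2.1)
upper = 1.0 - sum(marg) + pairs  # right-hand side of (2.1)
print(lower, joint, upper)       # lower <= joint <= upper is expected
```

Setting the lower bound in (2.1) equal to $1 - \alpha$ and solving for the critical value yields a conservative (upper) bound on $F_\alpha$, which is how the bounds are used in the examples of Sections 5 and 6.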

3. The finite intersection test -- A simultaneous procedure in the univariate case

Consider the usual univariate linear model given by
$$y = X\beta + e$$
where $y$ is an $n$-vector of independently distributed normal random variables, $X = [1, X_1]$ with $1$ being an $n$-vector of all 1's and $X_1$ an $n \times q$ matrix of known constants representing the $n$ observations on the independent variables $x_1, \ldots, x_q$, $\beta' = (\beta_0, \beta_1, \ldots, \beta_q)$ is a $(q+1)$-vector of unknown parameters, and $e \sim N_n(0, \sigma^2 I_n)$. The least squares estimate of $\beta$ is $\hat\beta = (X'X)^{-1}X'y$, with covariance matrix $\sigma^2(X'X)^{-1} = \sigma^2(w_{ij})$. The usual error sum of squares is $S^2 = y'[I_n - X(X'X)^{-1}X']y$.

The problem of selection of variables can be formulated within the framework of testing of hypotheses. For example, the variable $x_i$ may be declared to be important or unimportant according as $\beta_i \neq 0$ or $\beta_i = 0$. In order to test the hypothesis $H: \beta = 0$ using the finite intersection test procedure, we partition the overall hypothesis $H$ into a finite intersection of subhypotheses $H_i: \beta_i = 0$ for $i = 0, 1, \ldots, q$. Then we accept or reject $H_i$ according as $F_i \leq F_{i\alpha}$ or $F_i > F_{i\alpha}$, where

$$F_i = \frac{\hat\beta_i^2}{w_{ii}S^2/(n-q-1)}$$

and $P[\bigcap_{i=0}^{q}\{F_i \leq F_{i\alpha}\} \mid H] = 1 - \alpha$. In this paper we will only consider the case where $F_{i\alpha} = F_\alpha$ for $i = 0, 1, \ldots, q$. Now, the joint distribution of $F_0, F_1, \ldots, F_q$ is a multivariate ($(q+1)$-variate) F distribution with $1$ and $n-q-1$ degrees of freedom. Simultaneous confidence intervals associated with this procedure are given by
$$\hat\beta_i - \sqrt{F_\alpha w_{ii} S^2/(n-q-1)} \leq \beta_i \leq \hat\beta_i + \sqrt{F_\alpha w_{ii} S^2/(n-q-1)}$$

for $i = 0, \ldots, q$. For comparison, if we use the usual overall F test, we reject $H$ if $F > F^*$ where
$$F = \frac{\hat\beta'(X'X)\hat\beta/(q+1)}{S^2/(n-q-1)}$$
and $P[F \leq F^* \mid H] = 1 - \alpha$, with $F \sim F_{q+1,\,n-q-1}$. The associated simultaneous


confidence intervals are given by

$$\hat\beta_i - \sqrt{(q+1)F^* w_{ii} S^2/(n-q-1)} \leq \beta_i \leq \hat\beta_i + \sqrt{(q+1)F^* w_{ii} S^2/(n-q-1)}.$$

Now, since $F_\alpha \leq (q+1)F^*$ (see Krishnaiah, 1969), the lengths of the confidence intervals associated with the finite intersection test are never longer than the lengths of the corresponding confidence intervals associated with the overall F test.

In the procedure described above, $H = \bigcap_{i=0}^{q} H_i$, where $H_i: \beta_i = 0$. Thus, a test is performed on the importance of every independent variable simultaneously, including the intercept. However, it is usually the case that the test $H_0: \beta_0 = 0$ is of no interest, and it is often the case that only a subset of all possible independent variables is to be examined for importance. With this in mind, consider $r$ hypotheses of the form $H_i: c_i'\beta = 0$ for $i = 1, \ldots, r$, with $H^* = \bigcap_{i=1}^{r} H_i$. In the above context, $c_i' = (0, \ldots, 0, 1, 0, \ldots, 0)$, i.e., $c_i$ selects the particular $\beta_i$ of interest for testing, although the procedure described below works for arbitrary $c_i$. Using the finite intersection test procedure, we reject $H_i$ if $F_i > F_\alpha$, where

$$F_i = \frac{(c_i'\hat\beta)^2}{c_i'(X'X)^{-1}c_i\, S^2/(n-q-1)}$$

and $P[\bigcap_{i=1}^{r}\{F_i \leq F_\alpha\} \mid H^*] = 1 - \alpha$, with the joint distribution of $F_1, \ldots, F_r$ being a multivariate F distribution with $1$ and $(n-q-1)$ degrees of freedom. In this case simultaneous confidence intervals are given by

$$c_i'\hat\beta - \sqrt{F_\alpha c_i'(X'X)^{-1}c_i\, S^2/(n-q-1)} \leq c_i'\beta \leq c_i'\hat\beta + \sqrt{F_\alpha c_i'(X'X)^{-1}c_i\, S^2/(n-q-1)}.$$
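As a concrete sketch of these computations, the pure-Python fragment below fits $\hat\beta = (X'X)^{-1}X'y$ on a small synthetic data set, forms $S^2$ and the statistics $F_i$ for the coordinate vectors $c_i$ (so that $c_i'(X'X)^{-1}c_i = w_{ii}$), and builds the simultaneous intervals. The data and the critical value `F_alpha = 9.0` are placeholders; in practice $F_\alpha$ must come from multivariate F tables or from the bounds of Section 2.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inverse(A):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    k = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(A)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        d = M[col][col]
        M[col] = [v / d for v in M[col]]
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[k:] for row in M]

# synthetic data, q = 2 regressors plus an intercept (placeholder values)
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3.1, 4.0, 6.2, 6.8, 9.1, 9.9, 12.2, 12.8]
n, q = len(y), 2
X = [[1.0, a, b] for a, b in zip(x1, x2)]

XtX_inv = inverse(matmul(transpose(X), X))  # gives the matrix (w_ij)
beta = [row[0] for row in matmul(XtX_inv, matmul(transpose(X), [[v] for v in y]))]
resid = [yi - sum(b * xij for b, xij in zip(beta, xi)) for yi, xi in zip(y, X)]
S2 = sum(e * e for e in resid)  # error sum of squares
s2 = S2 / (n - q - 1)

F_alpha = 9.0  # placeholder critical value for the (q+1)-variate F
F = [beta[i] ** 2 / (XtX_inv[i][i] * s2) for i in range(q + 1)]
ci = [(beta[i] - (F_alpha * XtX_inv[i][i] * s2) ** 0.5,
       beta[i] + (F_alpha * XtX_inv[i][i] * s2) ** 0.5) for i in range(q + 1)]
```

Each $H_i$ is then rejected exactly when `F[i] > F_alpha`, i.e. when the corresponding interval in `ci` excludes zero.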

Table 1
Relative efficiency of overall F test to the finite intersection test (α = 0.05, ν = 10)

 r \ ρ    0.1     0.5     0.9
   1      1.00    1.00    1.00
   2      0.83    0.80    0.71
   3      0.72    0.68    0.57
   4      0.64    0.60    0.48
   5      0.58    0.54    0.42
   6      0.53    0.49    0.37
   7      0.49    0.43    0.33
   8      0.45    0.42    0.30
   9      0.43    0.39    0.28
  10      0.40    0.36    0.26


Table 2
Relative efficiency of overall F test to the finite intersection test (α = 0.05, ν = 30)

 r \ ρ    0.1     0.5     0.9
   1      1.00    1.00    1.00
   2      0.83    0.81    0.73
   3      0.72    0.70    0.60
   4      0.65    0.62    0.51
   5      0.59    0.56    0.45
   6      0.54    0.51    0.40
   7      0.50    0.47    0.37
   8      0.47    0.44    0.34
   9      0.44    0.41    0.31
  10      0.42    0.39    0.29

If instead the usual overall F test is used, then we reject $H$ if $F > F^*$ where
$$F = \frac{(C\hat\beta)'[C(X'X)^{-1}C']^{-1}(C\hat\beta)/r}{S^2/(n-q-1)}$$
and $P[F \leq F^* \mid H^*] = 1 - \alpha$, with $F \sim F_{r,\,n-q-1}$ and $C' = [c_1, \ldots, c_r]$. Furthermore, simultaneous confidence intervals are given by
$$c_i'\hat\beta - \sqrt{rF^* c_i'(X'X)^{-1}c_i\, S^2/(n-q-1)} \leq c_i'\beta \leq c_i'\hat\beta + \sqrt{rF^* c_i'(X'X)^{-1}c_i\, S^2/(n-q-1)}.$$

Again, the lengths of the confidence intervals associated with the finite intersection test are shorter than the lengths of the corresponding confidence intervals associated with the usual overall F test.

Table 3
Relative efficiency of overall F test to the finite intersection test (α = 0.01, ν = 10)

 r \ ρ    0.1     0.5     0.9
   1      1.00    1.00    1.00
   2      0.84    0.82    0.76
   3      0.73    0.71    0.62
   4      0.66    0.63    0.53
   5      0.59    0.57    0.47
   6      0.55    0.52    0.42
   7      0.51    0.47    0.38
   8      0.47    0.44    0.34
   9      0.44    0.41    0.32
  10      0.42    0.39    0.30


Table 4
Relative efficiency of overall F test to the finite intersection test (α = 0.01, ν = 30)

 r \ ρ    0.1     0.5     0.9
   1      1.00    1.00    1.00
   2      0.85    0.84    0.79
   3      0.75    0.74    0.66
   4      0.68    0.66    0.50
   5      0.62    0.60    0.52
   6      0.57    0.55    0.47
   7      0.53    0.51    0.43
   8      0.50    0.48    0.40
   9      0.47    0.45    0.37
  10      0.44    0.42    0.35

A comparison of the lengths of the confidence intervals associated with the finite intersection test with the corresponding lengths of the confidence intervals associated with the overall F test is given in Tables 1-4. These tables give values of $R^2 = F_\alpha/(rF^*)$ for $\alpha$ (Type I error rate) = 0.01, 0.05 and $\nu$ (error degrees of freedom) = 10, 30. For similar tables when using the finite intersection test in a one-way ANOVA setup, see Cox et al. (1980). For discussions regarding the confidence intervals associated with the overall F test, the reader is referred to Roy and Bose (1953) and Scheffé (1959).

4. The finite intersection test -- A simultaneous procedure in the multivariate case

Analogous to the univariate linear model, the multivariate linear model is given by
$$Y = XB + E$$
where $Y$ is an $n \times p$ matrix of $n$ observations on $p$ variables $y_1, \ldots, y_p$ whose rows are independently distributed as a $p$-variate normal distribution with covariance matrix $\Sigma$, and $E(Y) = XB$. Furthermore, $X$ is as described in the previous section, while $B = [\beta_0, \beta_1, \ldots, \beta_q]'$ is a $(q+1) \times p$ matrix of unknown regression parameters, and $E$ is an $n \times p$ matrix whose rows are independently and identically distributed as a $p$-variate normal distribution with mean vector $0$ and covariance matrix $\Sigma$.

The problem of selection of variables under the multivariate regression model can again be formulated within the framework of simultaneous test procedures as in the univariate case. The problem of testing $H: B = 0$ is equivalent to the problem of testing $H_i: c_i'B = 0'$ for $i = 0, 1, \ldots, q$ simultaneously, where $c_i' = [c_{0i}, c_{1i}, \ldots, c_{qi}]$ for $i = 0, 1, \ldots, q$, with
$$c_{hi} = \begin{cases} 0 & \text{if } h \neq i, \\ 1 & \text{if } h = i, \end{cases}$$


i.e., $H = \bigcap_{i=0}^{q} H_i$. Note that with $c_i$ as described above, $c_i'B = \beta_i'$ for $i = 0, 1, \ldots, q$. We declare that the variable $x_i$ is important or unimportant for prediction of $Y' = (y_1, \ldots, y_p)$ according as $H_i$ is rejected or accepted.

In order to develop the finite intersection test procedure in this case the following notation is needed. Let $\Sigma_k$ denote the $k \times k$ top left-hand corner of $\Sigma = (\sigma_{ij})$ and $\sigma_{k+1}^2 = |\Sigma_{k+1}|/|\Sigma_k|$ for $k = 0, 1, \ldots, p-1$, with $|\Sigma_0| = 1$. Also, let

$$\gamma_k = \Sigma_k^{-1} \begin{bmatrix} \sigma_{1,k+1} \\ \vdots \\ \sigma_{k,k+1} \end{bmatrix}$$
with $\gamma_0 = 0$. Finally, let $Y = [y_1, \ldots, y_p]$, $Y_j = [y_1, \ldots, y_j]$ for $j = 1, \ldots, p$, $B = [\theta_1, \ldots, \theta_p]$, and $B_j = [\theta_1, \ldots, \theta_j]$ for $j = 1, \ldots, p$. Now, when $Y_j$ is fixed, the elements of $y_{j+1}$ are distributed normally with common conditional variance $\sigma_{j+1}^2$ and with conditional means

$$E(y_{j+1} \mid Y_j) = X\eta_{j+1} + Y_j\gamma_j \qquad (4.1)$$

where $\eta_{j+1} = \theta_{j+1} - B_j\gamma_j$. Each of the hypotheses $H_0, H_1, \ldots, H_q$ can be expressed as
$$H_i = \bigcap_{j=1}^{p} H_{ij} \qquad \text{for } i = 0, 1, \ldots, q,$$

where $H_{ij}: c_i'\eta_j = 0$ and $c_i$ is as described previously. Thus the problem of testing $H$, which is equivalent to the problem of testing $H_0, H_1, \ldots, H_q$ simultaneously, is also equivalent to the problem of testing $H_{ij}$ ($i = 0, 1, \ldots, q$; $j = 1, \ldots, p$) simultaneously. Noting that the model in (4.1) is just a univariate linear regression model, let $c_i'\hat\eta_j$ denote the best linear unbiased estimate (BLUE) of $c_i'\eta_j$, and let $S_j^2$ denote the usual error sum of squares. With the finite intersection test procedure, the test statistic for testing $H_{ij}$ against a two-sided alternative is
$$F_{ij} = \frac{(c_i'\hat\eta_j)^2}{D_{ij}S_j^2/(n-j-q)} \qquad (4.2)$$
for $i = 0, 1, \ldots, q$ and $j = 1, \ldots, p$. In (4.2), $D_{ij}S_j^2/(n-j-q)$ is the sample estimate of the conditional variance of $c_i'\hat\eta_j$. We reject $H_{ij}$ if $F_{ij} > F_\alpha$, where

$$P[F_{ij} \leq F_\alpha;\ i = 0, 1, \ldots, q;\ j = 1, \ldots, p \mid H] = \prod_{j=1}^{p} P[F_{ij} \leq F_\alpha;\ i = 0, 1, \ldots, q \mid H] = 1 - \alpha.$$

If any $H_{ij}$ is rejected, then we declare the $i$th variable $x_i$ to be important.


When $H$ is true, the joint distribution of $F_{0j}, F_{1j}, \ldots, F_{qj}$, for any given $j = 1, \ldots, p$, is a $(q+1)$-variate F distribution with $1$ and $n-j-q$ degrees of freedom. The associated $100(1-\alpha)\%$ simultaneous confidence intervals are given by

$$c_i'\hat\eta_j - \sqrt{F_\alpha D_{ij} S_j^2/(n-j-q)} \leq c_i'\eta_j \leq c_i'\hat\eta_j + \sqrt{F_\alpha D_{ij} S_j^2/(n-j-q)}.$$

Several comparisons have been made between the lengths of the confidence intervals for the finite intersection test in the multivariate case and the lengths of confidence intervals derived from other procedures. It is known (see Krishnaiah, 1965b) that the finite intersection test yields shorter confidence intervals than the step-down procedure of J. Roy (1958). Also, Mudholkar and Subbaiah (1979) made some comparisons of the finite intersection test with the step-down procedure and Roy's largest root test. Additional comparisons of interest are to be found in Cox et al. (1980).

Several remarks are in order at this time. First, the critical value $F_\alpha$ has been chosen to be the same for all hypotheses $H_{ij}$. This was done out of convenience but is certainly not necessary. Second, the hypotheses $H_{ij}$ were chosen such that $H_i = \bigcap_{j=1}^{p} H_{ij}$ and $H = \bigcap_{i=0}^{q} H_i$, with $H: B = 0$. However, the overall hypothesis $H$ need not be $H: B = 0$. We can just as easily consider any set of hypotheses $H_1, \ldots, H_r$ where $H_i: c_i'B = 0'$ for $i = 1, \ldots, r$, with $c_i$ being chosen as desired and $H = \bigcap_{i=1}^{r} H_i$. In the context of selection of variables, however, the $c_i$'s are to be chosen so as to select out the particular independent variables of interest (see discussion in the previous section).
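The conditional variances $\sigma_{k+1}^2 = |\Sigma_{k+1}|/|\Sigma_k|$ defined above can be sanity-checked on a small numeric case: for $p = 2$ the determinant ratio must agree with the familiar residual-variance formula $\sigma_{22} - \sigma_{12}^2/\sigma_{11}$ for $y_2$ given $y_1$. The covariance matrix below is illustrative only.

```python
# illustrative 2x2 covariance matrix Sigma = (sigma_ij); not from the chapter
s11, s12, s22 = 4.0, 2.0, 3.0

det_sigma0 = 1.0                    # |Sigma_0| = 1 by convention
det_sigma1 = s11                    # |Sigma_1|
det_sigma2 = s11 * s22 - s12 * s12  # |Sigma_2|

var1 = det_sigma1 / det_sigma0      # sigma_1^2 = |Sigma_1| / |Sigma_0|
var2 = det_sigma2 / det_sigma1      # sigma_2^2 = |Sigma_2| / |Sigma_1|

gamma1 = s12 / s11                  # gamma_1: regression coefficient of y2 on y1
print(var2, s22 - s12 * gamma1)     # 2.0 2.0
```

The agreement of the two quantities is exactly the factorization that lets the multivariate model be fitted as a sequence of conditional univariate regressions, as exploited in Section 6.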

5. A univariate example

In order to illustrate the use of the finite intersection test in univariate regression, a data set appearing as Exercise E, p. 230 of Draper and Smith (1966) is analyzed.¹ The data consist of 33 years of observations on corn yield, preseason precipitation, and various monthly readings on temperature and precipitation for the state of Iowa. Corn yield (YIELD) is used as the dependent variable, while the remaining nine variables are used as independent variables. These independent variables are year number (YEAR), preseason precipitation (PSP), May temperature (MAY T), June rain (JUN R), June temperature (JUN T), July rain (JUL R), July temperature (JUL T), August rain (AUG R), and August temperature (AUG T). The data were input into a FORTRAN

¹The author is grateful to John Wiley & Sons for giving permission to use these data for illustrative purposes.


Table 5
Finite intersection test -- Univariate example

Error degrees of freedom             23
Sample estimate of error variance    60.8
Overall α-level                      0.05

Critical values
  Poincaré's lower bound      9.0332
  Sidák's upper bound         9.1699
  Product upper bound         9.2969
  Overall F (9F.95(9, 23))   20.8809

Simultaneous confidence intervals

 i   β̂_i      F_i        Poincaré's       Sidák's          Product          Overall
                          lower bound      upper bound      upper bound      F test
 1    0.8769  21.96651   (0.31, 1.44)     (0.31, 1.44)     (0.31, 1.45)     (0.02, 1.73)
 2    0.7865   3.35373   (-0.50, 2.00)    (-0.51, 2.09)    (-0.52, 2.10)    (-1.18, 2.75)
 3    0.4549   1.13878   (-1.74, 0.83)    (-1.75, 0.84)    (-1.75, 0.84)    (-2.40, 1.49)
 4   -0.7644   0.52760   (-3.93, 2.40)    (-3.95, 2.42)    (-3.97, 2.44)    (-5.57, 4.04)
 5    0.4997   0.70392   (-1.24, 2.20)    (-1.25, 2.21)    (-1.26, 2.22)    (-2.13, 3.09)
 6    2.5828   3.53278   (-1.55, 6.71)    (-1.58, 6.74)    (-1.61, 6.77)    (-3.70, 8.86)
 7    0.0609   0.00723   (-2.09, 2.21)    (-2.11, 2.23)    (-2.12, 2.25)    (-3.21, 3.34)
 8    0.4017   0.15199   (-2.70, 3.50)    (-2.72, 3.52)    (-2.74, 3.54)    (-4.31, 5.11)
 9   -0.6639   0.88899   (-2.67, 1.34)    (-2.69, 1.36)    (-2.70, 1.37)    (-3.71, 2.39)

computer program written for use on the DEC-10 computer at the University of Pittsburgh. The results appear in Table 5. The model for the data is
$$y = X\beta + e$$
where $\beta' = [\beta_0, \beta_1']$, $\beta_1' = [\beta_1, \ldots, \beta_q]$, and the overall hypothesis tested is $H: \beta_1 = 0$ against the alternative $A: \beta_1 \neq 0$. Thus, we test the hypothesis that none of the independent variables are related to the dependent variable, but do not test that the intercept is zero.

In Table 5, note that simultaneous confidence intervals are constructed using Poincaré's lower bound, Sidák's upper bound, and the product upper bound on the critical values for the finite intersection test, and also using the critical value associated with the overall F test. The confidence intervals associated with the overall F test are at least 50% wider than the corresponding confidence intervals using the finite intersection test. However, the confidence intervals constructed using the product upper bound are only 1.4% wider than those constructed using Poincaré's lower bound, while the confidence intervals constructed using Sidák's upper bound are only 0.75% wider than those using Poincaré's lower bound, indicating that a fairly precise estimate of the true critical value is available, at least in this case, using only some probability inequalities.

As for the results of the analysis, it is interesting that the only variable related to corn crop yield is year number, reflecting the well known fact that grain production in the United States has been steadily increasing for the past fifty


years, primarily as a result of technological innovations, overwhelming any variation that might exist due to temperature and precipitation.

6. A multivariate example

As an example of the application of the finite intersection test in multivariate linear regression, a data set appearing as Table 4.7.1, p. 314 of Timm (1975) is used.

Table 6*

SAT  PPVT  RPMT   N   S  NS  NA  SS
 49    48     8   1   2   6  12  16
 49    76    13   5  14  14  30  27
 11    40    13   0  10  21  16  16
  9    52     9   0   2   5  17   8
 69    63    15   2   7  11  26  17
 35    82    14   2  15  21  34  25
  6    71    21   0   1  20  23  18
  8    68     8   0   0  10  19  14
 49    74    11   0   0   7  16  13
  8    70    15   3   2  21  26  25
 47    70    15   8  16  15  35  24
  6    61    11   5   4   7  15  14
 14    54    12   1  12  13  27  21
 30    55    13   2   1  12  20  17
  4    54    10   3  12  20  26  22
 24    40    14   0   2   5  14   8
 19    66    13   7  12  21  35  27
 45    54    10   0   6   6  14  16
 22    64    14  12   8  19  27  26
 16    47    16   3   9  15  18  10
 32    48    16   0   7   9  14  18
 37    52    14   4   6  20  26  26
 47    74    19   4   9  14  23  23
  5    57    12   0   2   4  11   8
  6    57    10   0   1  16  15  17
 60    80    11   3   8  18  28  21
 58    78    13   1  18  19  34  23
  6    70    16   2  11   9  23  11
 16    47    14   0  10   7  12   8
 45    94    19   8  10  28  32  32
  9    63    11   2  12   5  25  14
 69    76    16   7  11  18  29  21
 35    59    11   2   5  10  23  24
 19    55     8   0   1  14  19  12
 58    74    14   1   0  10  18  18
 58    71    17   6   4  23  31  26
 79    54    14   0   6   6  15  14

*Reproduced from p. 314, Timm (1975), with permission.


Table 7
Finite intersection test -- Multivariate example -- first dependent variable (RPMT)

Error degrees of freedom             31
Sample estimate of error variance    8.65
Overall α-level                      0.0169524

Critical values
  Poincaré's lower bound     9.8828
  Sidák's upper bound       10.0195
  Product upper bound       10.0586

Simultaneous confidence intervals

 i   β̂_i      F_i       Poincaré's       Sidák's          Product
                         lower bound      upper bound      upper bound
 1    0.2110  0.82773   (-0.52, 0.94)    (-0.52, 0.95)    (-0.52, 0.95)
 2    0.0646  0.24418   (-0.35, 0.48)    (-0.35, 0.45)    (-0.35, 0.45)
 3    0.2136  2.85731   (-0.18, 0.61)    (-0.19, 0.61)    (-0.19, 0.61)
 4   -0.0373  0.06725   (-0.49, 0.42)    (-0.49, 0.42)    (-0.49, 0.42)
 5   -0.0521  0.11646   (-0.53, 0.43)    (-0.54, 0.43)    (-0.54, 0.43)

These data are reproduced in Table 6. The three dependent variables are scores on a student achievement test (SAT), the Peabody Picture Vocabulary Test (PPVT), and the Raven Progressive Matrices Test (RPMT). The independent variables consisted of the sum of the number of items answered correctly out of 20 on a learning proficiency test on two exposures to five types of paired-associate learning proficiency tasks. These five tasks are named (N), skill (S), named skill (NS), named action (NA), and sentence skill (SS). The same FORTRAN program used for the analysis of the previous section was used for this analysis, since when using the finite intersection test, a multivariate linear regression can be expressed as several independent univariate linear regressions. The results appear in Tables 7-9. The model for these data is
$$Y = XB + E \qquad (6.1)$$
where $B' = [\beta_0, B_1']$, $B_1' = [\beta_1, \ldots, \beta_q]$, and the overall hypothesis tested is $H: B_1 = 0$ against the alternative $A: B_1 \neq 0$. Again, a test on the intercept is not performed.

As in the previous univariate example, Tables 7-9 display simultaneous confidence intervals constructed using the three bounds on the critical values. For these data the use of the product upper bound results in confidence intervals only 0.9% wider than the confidence intervals using Poincaré's lower bound, while the use of Sidák's upper bound produces confidence intervals only 0.7% wider than the confidence intervals using Poincaré's lower bound. Again, very satisfactory estimates of the true critical values have been obtained using probability inequalities.

Note that in each of Tables 7-9 the Type I error rate is given as $\alpha^* = 0.0169524$. This yields an experimentwise error rate of $\alpha = 0.05$, since $(1 - \alpha^*)^3 = 1 - \alpha$, there


Table 8
Finite intersection test -- Multivariate example -- second dependent variable (PPVT)

Error degrees of freedom                         30
Sample estimate of conditional error variance    86.49
Overall α-level                                  0.0169524

Critical values
  Poincaré's lower bound     9.9414
  Sidák's upper bound       10.0781
  Product upper bound       10.1172

Simultaneous confidence intervals

 i   η̂_i       F_i        Poincaré's       Sidák's          Product
                           lower bound      upper bound      upper bound
 1   -0.2486   0.11201    (-2.59, 2.09)    (-2.61, 2.11)    (-2.61, 2.11)
 2   -0.7725   3.47015    (-2.08, 0.54)    (-2.09, 0.54)    (-2.09, 0.55)
 3   -0.4684   1.25916    (-1.78, 0.85)    (-1.79, 0.86)    (-1.80, 0.86)
 4    1.5001  10.85130    (0.06, 2.94)     (0.05, 2.95)     (0.05, 2.95)
 5    0.3655   0.57045    (-1.16, 1.89)    (-1.17, 1.90)    (-1.17, 1.90)

being 3 dependent variables. Also recall that Tables 8 and 9 display statistics on conditional means, variances, and regression coefficients, the results of Table 8 being conditioned on holding the first dependent variable (RPMT) fixed, and the results of Table 9 being conditioned on holding both the first and second dependent variables (RPMT and PPVT) fixed. The results of Table 8 show that the overall hypothesis $H$ is rejected, since the hypothesis $H_{42}: \eta_{42} = 0$ is rejected. Thus, the independent variable named action (NA) is probably the only variable of importance in (6.1), and the other independent variables (N, S, NS, SS) can be regarded as unimportant.
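The per-variable level $\alpha^* = 0.0169524$ used in Tables 7-9 follows from solving $(1 - \alpha^*)^3 = 1 - \alpha$ with $\alpha = 0.05$; more generally, with $p$ dependent variables, $\alpha^* = 1 - (1 - \alpha)^{1/p}$. A one-line check:

```python
alpha, p = 0.05, 3  # overall level and number of dependent variables
alpha_star = 1.0 - (1.0 - alpha) ** (1.0 / p)
print(round(alpha_star, 7))  # 0.0169524, matching Tables 7-9
```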

Table 9
Finite intersection test -- Multivariate example -- third dependent variable (SAT)

Error degrees of freedom                         29
Sample estimate of conditional error variance    435.38
Overall α-level                                  0.0169524

Critical values
  Poincaré's lower bound     9.9805
  Sidák's upper bound       10.1172
  Product upper bound       10.1563

Simultaneous confidence intervals

 i   η̂_i      F_i       Poincaré's       Sidák's          Product
                         lower bound      upper bound      upper bound
 1   -0.8567  0.26320   (-6.13, 4.42)    (-6.17, 4.45)    (-6.18, 4.46)
 2    0.1871  0.03624   (-2.92, 3.29)    (-2.94, 3.31)    (-2.94, 3.32)
 3    1.8858  3.89072   (-4.91, 1.13)    (-4.93, 1.16)    (-4.93, 1.16)
 4   -0.1162  0.00950   (-3.88, 3.65)    (-3.91, 3.68)    (-3.92, 3.68)
 5    2.1723  3.92810   (-1.29, 5.63)    (-1.31, 5.66)    (-1.32, 5.67)


References

[1] Cox, C. M., Krishnaiah, P. R., Lee, J. C., Reising, J. and Schuurmann, F. J. (1980). A study on finite intersection tests for multiple comparisons of means. In: P. R. Krishnaiah, ed., Multivariate Analysis, Vol. V. North-Holland, Amsterdam.
[2] Draper, N. R. and Smith, H. (1966). Applied Regression Analysis. Wiley, New York.
[3] Krishnaiah, P. R. (1963). Simultaneous tests and the efficiency of generalized incomplete block designs. Tech. Rept. ARL 63-174. Wright-Patterson Air Force Base, OH.
[4] Krishnaiah, P. R. (1964). Multiple comparison tests in multivariate case. Tech. Rept. ARL 64-124. Wright-Patterson Air Force Base, OH.
[5] Krishnaiah, P. R. and Armitage, J. V. (1965). Probability integrals of the multivariate F distribution, with tables and applications. Tech. Rept. ARL 65-236. Wright-Patterson Air Force Base, OH.
[6] Krishnaiah, P. R. (1965a). On the simultaneous ANOVA and MANOVA tests. Ann. Inst. Statist. Math. 17, 35-53.
[7] Krishnaiah, P. R. (1965b). Multiple comparison tests in multi-response experiments. Sankhyā, Ser. A 27, 65-72.
[8] Krishnaiah, P. R. (1969). Simultaneous test procedures under general MANOVA models. In: P. R. Krishnaiah, ed., Multivariate Analysis, Vol. II. Academic Press, New York.
[9] Krishnaiah, P. R. and Armitage, J. V. (1970). On a multivariate F distribution. In: R. C. Bose et al., eds., Essays in Probability and Statistics. Univ. of North Carolina Press, Chapel Hill, NC.
[10] Krishnaiah, P. R. (1979). Some developments on simultaneous test procedures. In: P. R. Krishnaiah, ed., Developments in Statistics, Vol. 2. Academic Press, New York.
[11] Krishnaiah, P. R. (1980). Computations of some multivariate distributions. In: P. R. Krishnaiah, ed., Handbook of Statistics, Vol. 1: Analysis of Variance. North-Holland, Amsterdam.
[12] Mudholkar, G. S. and Subbaiah, P. (1979). MANOVA multiple comparisons associated with finite intersection tests. In: P. R. Krishnaiah, ed., Multivariate Analysis, Vol. V. North-Holland, Amsterdam.
[13] Roy, S. N. and Bose, R. C. (1953). Simultaneous confidence interval estimation. Ann. Math. Statist. 24, 513-536.
[14] Roy, J. (1958). Step-down procedure in multivariate analysis. Ann. Math. Statist. 29, 1177-1187.
[15] Scheffé, H. (1953). A method for judging all contrasts in the analysis of variance. Biometrika 40, 87-104.
[16] Scheffé, H. (1959). The Analysis of Variance. Wiley, New York.
[17] Schuurmann, F. J., Krishnaiah, P. R. and Chattopadhyay, A. K. (1975). Tables for a multivariate F distribution. Sankhyā, Ser. B 37, 308-331.
[18] Sidák, Z. (1967). Rectangular confidence regions for the means of multivariate normal distributions. J. Amer. Statist. Assoc. 62, 626-633.
[19] Timm, N. (1975). Multivariate Analysis with Applications in Education and Psychology. Brooks/Cole, Monterey, CA.