Testing dimensionality in the multivariate analysis of variance

Statistics & Probability North-Holland Letters December 12 (1991) 445-463 1991 Testing dimensionality in the multivariate analysis of variance T...

Download PDF

1MB Sizes 12 Downloads 88 Views

Report

PDF Reader
Full Text

Statistics & Probability North-Holland

Letters

December

12 (1991) 445-463

1991

Testing dimensionality in the multivariate analysis of variance T. W. Anderson Department

of Statistics,

Stanford

University,

Stanford,

CA 94305, USA

Yasuo Amemiya Department

of Statistics,

Iowa State University, Ames, IA 50011, USA

Abstract: In the multivariate one-way classification with fixed or random effects the between-group effects may be restricted to a lower dimensional space. The problem of testing the dimension of the effect space is treated. For the balanced random effect model the asymptotic null distribution of the likelihood ratio statistic is discussed; the asymptotic distribution is not chi-squared. For the unbalanced fixed or random effect model we suggest the use of a test statistic of the same form. The test statistic is shown to have the same asymptotic null distribution as that for the balanced random effect model. The result is extended to the fixed or random effect models with covariates. The use of the test is illustrated in an example from animal breeding. Some optimality properties of the tests and open problems are discussed as well. AMS

1980 Subject Classification:

Keywords:

Multivariate

variance

Primary

62815;

components,

Secondary

test for rank,

62F05. random

effect, fixed effect, asymptotic

distribution.

1. Introduction In the univariate analysis of variance an important question is whether there are real effects. For example, in the one-way classification the investigator may ask if the means of all classes are equal. The same question can be asked in the multivariate analysis of variance. However, new questions occur in the latter case. If effects exist, they may exist only in some coordinates or only in some linear combinations of coordinates. In geometric terminology, the effects may be evident in a linear space of dimension lower than the dimension of the full observational space. The dimensionality of the effect space impacts on further analysis of the data. For instance, if the effects are considered to be one-dimensional, the class means can be ordered, and a ‘best’ may be defined in terms of this ranking. This paper has to do with testing a hypothesis of a specified dimension for the effects. We treat the case of the one-way classification for ease of exposition. Both fixed and random effects are considered.

2. One-way classification A standard

model for the multivariate x a,=ya+Uai,

This paper

with fixed effects

is prepared

0167-7152/91/$03.50

under

j=l,..., National

0 1991 - Elsevier

one-way

analysis

k,,

)...)

a=1

Science Foundation Science Publishers

Grant

of variance

is

(1)

n, DMS 89-04851

B.V. All rights reserved

at Stanford

University. 445

STATISTICS & PROBABILITY

Volume 12, Number 6

December 1991

LETTERS

where cc,! is a p-component observation vector, the unobservable y, is nonstochastic, and the unobservable u,,‘s.are independently distributed according to N(0, !P). Then the x,,‘s are independent, and x,, is distributed according to N(y,, e). Let

The effect sum of squares H=

(or ‘between’

sum of squares)

is

i k,&_-x)(x,-2)’ Cl=1 (2)

= i k,[(y,-y)+(u,-u)][(v,-y)+(u,-u)]’, IX=1 where 7 = (l/C,k,)C,k,y,, ‘within’ sum of squares) G=

U, = (l/k,)Eju,,

and

si = (l/C,k,)C,k,i,.

i ; (x~~-X~)(Y~,-X,)‘= a=1 j=l

i

5

a=1

j=l

i

sum of squares

(uu,-Uu)(uu,-Zu)‘.

Tests of dimensionality are based on these two matrices. The two matrices H and G are independent. The matrix W( !P, n - 1, s2,) with noncentrality matrix a,,=

The error

(or

is (3)

H has the noncentral

Wishart

distribution

(4)

k,(y,-V)(y,-7)‘.

a=1

The matrix G has the central Wishart distribution W(9, Cz=,k, - n). The likelihood ratio criterion (based on the Wishart distributions) for testing effects H:

D,=O

or

y,=

...

the null hypothesis

of no

=y,

is given by

(5) The exponent C”,,lk, - 1 is the sum of the degrees of freedom of H and G. The null hypothesis if the LRC is too small. (See Anderson (1984, Chapter 8) for example.) Define d, >d,> as the ordered

‘..

>d P

roots of

$H-d;G

(6)

=O,

where M = n - 1 and N = E.“,=,k, - n are the numbers of degrees of freedom terms of these roots the likelihood ratio criterion is the square root of (LRC)2 = 11$ (1 + $d,)

446

is rejected

++?

of H and G, respectively.

In

Volume

12, Number

STATISTICS

6

& PROBABILITY

LE’ITERS

December

This fact shows that the likelihood ratio criterion is invariant with respect where C is nonsingular. Equivalently, the test leads to rejection of H if -zlogLRC=(M+N)L~l

that is, that there exist exactly p - m for all CX,or equivalently that

-v?...>Y,-u)

is of rank m. The hypothesis H:

rank(L?,)

can be stated

= m

-ZlogLRC=(M+N),=$+t The null hypothesis

as

that rank (s2,) > m. The likelihood

against the alternative (Anderson, 1951b)

is rejected

3. One-way classification The random

x,, + CxaJ,

log(l+$d,)

is too large. Now consider the hypothesis that the effects are m-dimensional, linearly independent p x 1 vectors a such that a’y, is a constant (n

to transformations

1991

ratio criterion

for this null hypothesis

is given by

log(l+$d;).

(7)

if (7) is too large.

with random effects - balanced case

effect model is x a,=p+u,+u,,,

The Q’S and u,, ‘s N(0, 9). Hence, the has the distribution In this and the next Wishart distribution

k,,

j=l,...,

cu=l,...,

n.

are independent. The distribution of marginal distribution of xaj is N(p, 8 W( \k, Ct=,k, - n). If the k, are not section we consider the balanced case IV(!P + k8, n - l), and

u, is + 9). equal, where

(8) N(0, 8) and the distribution of u,, is Again H and G are independent, and G H may not have a Wishart distribution. k, =. . . . = k, = k, say. Then H has the

8;G=+,

(9)

&&H=Wk@.

(10)

Define s, >, 622 as the ordered

. . . 2 a,>, 0

roots of

I(k+k63-H’~

=O.

(11)

Then d,, . . . , d, estimate 6,, . . . , S,, respectively. However, note that 6, >, 1, whereas Define the diagonal matrix D = diag(d,, . . . , dp) and the square matrix R so $H= whereM=n-1

R’DR,

$G=

d, can be less than

1.

R’R,

and N=n(k-1). 447

Volume 12, Number 6

STATISTICS & PROBABILITY

It is possible to estimate (l/M)H - (l/N)G since

k8 + !P by (l/M)H

and

LETTERS

December 1991

\k by (l/N)G.

An unbiased

estimator

of k8

is

However

-&Iis positive

;G=R'(D-I,)R only if d, >, 1, i = 1,. . . , p. This fact suggests

semidefinite

that

(12) where the diagonal matrix D+ consists of the roots dj > 1 and p* is the number of dj > 1, would be a better estimator of k8. In fact, (12) is the maximum likelihood estimator of k8. See Amemiya (1985) and Anderson, Anderson and Olkin (1986). Now consider the null hypothesis that the effects are 0, that is, that V, = 0 with probability 1. We can express the null hypothesis as H: The likelihood

8=0

or

ratio criterion

a,=

a..

=$=l.

is given by

if p * =o. Note that roots d, that are less than 1 give no evidence that 8 # 0. Thus some of the smaller roots may be ignored. Next consider the null hypothesis that the effects are of dimension not greater than m. The hypothesis is H: against

rank(@),
the alternative

that rank(@) P*

i=v+, \ (Anderson,

or

a,+,=

...

> m. The likelihood

ratio criterion

is given by

d”(M+NyfN (M~~+N)~+~

1,

1948, Anderson,

=Sp=l

Anderson

and Olkin,

’

‘*

‘m’

P

*
03)

1986; Schott and Saw, 1984).

4. Asymptotic theory for the balanced case of random effects In general the exact distribution of -2 log LRC defined in (13) is too complicated to find explicitly. Instead we look to asymptotic theory. One asymptotic theory obtains when n is fixed and k -+ co. An unbiased estimator of 8 is 448

December 1991

STATISTICS & PROBABILITY LETI-ERS

Volume 12, Number 6

1 1 kzH-;G. i

(14)

1

[(l/(kM)]H has the distribution W[(l/M)(@ + q/k), M]. As k + 03, [l/(kM)]H 3 W[(l/M)@, M] and [l/(kN)]G 3 0. Hence, (14) is not a consistent estimator of 8 as k -+ co. Instead we turn to the asymptotic theory as n + co. Throughout this section we assume rank(@) = m. To describe the asymptotic theory we define A = diag( 6,, . _,a,) and r( p x p) so However,

r’(‘P+

l+\kr=

k@)r=A,

Ip.

Since rank( 63) = m, S,, , = . . . = 6, = 1. Define, G* =

H * = T’Hr,

further,

mr.

The H* and G* have the distributions W(A, M) and N = n(k - 1). Note that d, > . . . > d, are the roots of AH*-d$G*

W(Z,,, N),

respectively,

where

M = n - 1 and

=Q.

Asn-+co, &He

sA=diag(&

,...,

S,,l,...,

l),

where there are p - m l’s, and

Since the roots of a polynomial i=l

d, 5 a,, Now we consider fi(B

,...,

equation

j

functions

of the coefficients,

P.

the nontrivial

H*-A

are continuous

= U,,

asymptotic fi(&

distribution G*-I,

)

of H * and G *. Define

=v,.

Then

where the symmetric matrices ZJ and V are normally and independently distributed. The (functionally independent) elements of U and V are independently distributed with mean 0. The diagonal element u,; has variance 2S,‘, and the off-diagonal element ulj has variance S&3,, i fj. The diagonal element u,; has variance 2/(k - l), and the off-diagonal element u,, has variance l/(k - 1). The limiting distribution of h(d,-l), i=m+l,..., p, can be obtained from the distribution of U and V. (See Anderson (1951a), Anderson (1989a), and Amemiya (1990) for the general theory.) p. Then h,+, > ... > h, are the p - m smallest roots of Let h, = &(d, - l), i=m+l,..., (AH*-(l+&h~;G’l=O, that is, of

(16) 449

STATISTICS & PROBABILITY

Volume 12, Number 6

Let A, = diag(&,

December 1991

LETTERS

. . . , a,,,),

where U,,, and V,i, are m x m. Then (16) can be written

as

(17) where

08)

Factoring

out l/h

from the last p - m columns,

(17) is equivalent

to

(19)

As n + co, (15) holds. For a moment assume that U,, V,, U, and V are nonstochastic matrices and that (15) holds as the usual limit. Then the coefficients of the pth degree polynomial (19) in h converge as n + cc to the coefficients of the ( p - m)th degree polynomial in h given by

IA, -I,

1. I&,-

V,,-hl,_,

I =O,

(20)

where U,, and V,, are the (p - m) x (p - m) lower right-hand corners of U and V. We refer to lemmas on the convergence of the roots of a sequence of polynomials whose coefficients converge by Bai, Krishnaiah and Liang (1986), Anderson (1989a) and Amemiya (1990). Such a lemma implies that h m+,, . . . , h,, treated as sequences of real numbers, converge to the ordered characteristic roots of ZJ,, - V,,. Thus, by Rubin’s theorem given in Anderson (1951a) and proved in Anderson (1963), the set of random variables h m+ 1, . . . , h, converges in distribution to the set of ordered characteristic roots of a random matrix (U,, - V,,). If we re-normalize hi = fi(d, - l), i = m + 1,. . . , p, we obtain

&g&w%? where b,, i=m+l,...,

and the (functionally 450

i=m+l

p, are the characteristic

independent)

elements

,...,

(21)

P,

roots of

of A are independent

and normally

distributed

with means 0.

Volume

12, Number

STATISTICS

6

Each diagonal element statistic in (13) as y,,=

-2

has variance

log(LRC)

=

& PROBABILITY

1, and each off-diagonal

5 i=mfl

December

LETTERS

element

has variance

1991

i. We can write the test

f(d,),

(22)

where

f(d)= [-

M log d+ (M+N)

and I( .) is the indicator

function.

f(d,)%,zl(b,>l),

log [$&$j]I(d>

The Taylor

expansion

i=m+l,...,

(23)

11, argument

and (21) imply

that

p.

Hence Y,$

c

6;.

6, z 0

For convenience

we relabel 1

24’Q See Anderson Y=

b,,,+ ,, . . . , b,, as b,, . . . , by, where q = p - m. The density

ePz~“~/‘n(b,-b,), ,r( ii)

...

>b,.

I
(1984, Theorem c

b,>

of b,, . . . , b4 is

13.3.5). We write

b,2.

(25)

b, > 0

The conclusion of this derivation is summarized discussed further in later sections. Theorem 1. For the balanced random

Y,>

Y

in the following

effect model when rank(@)

theorem.

The result of the theorem

= m,

(26)

asn+oo,

where Y, and Y are defined in (22) and (25), respectively. If q = 2, the cumulative

will be

distribution

Pr{ Y
of the random

- + e-‘/2 i&+(1-$)E],

0

variable

Y is (Anderson, Y>O.

1989b)

(27)

This is obtained by integration of the density (24) of b,, b, over the two regions {b, G 0, 0 G b, Q fi} and { 6, d b,, b: + bi <_y }. The cumulative distribution (27) and the density (0 , 3 the integration becomes more complicated. Hence, we turn to simulations (Amemiya, Anderson and Lewis, 1990). The results are based on 3 X 106/q replications of drawing from the distributions of A, q = 2,. . . ,25. The replications were grouped into 300 batches (each consisting of 10000/q replications). The means of the quantiles of the 300 batches yielded estimates of the quantiles of the distributions of Y. Some of these are given in Table 1. A more complete list is available in Amemiya, Anderson and Lewis (1990). From the variances of the 300 means estimates of the standard deviations of the estimated quantiles were computed. Some of them are given in Table 2. 451

Volume

12, Number

6

STATISTICS

& PROBABILITY

LETTERS

December

1991

8 I

I

I

I

0

5

10

15

Y

Fig. 1. Cumulative

For the xz-variable

Fisher

(1941, Section

distribution

function

20), suggested

of Y for q = 2

that

is approximately a standard normal variate. It seemed reasonable that the square root transform of Y would yield approximate normality; in keeping with the approximation for x’, we subtract the square root of the expected value of Y minus i_ We suggest, therefore, that 1.239[fi

- &C&I+

1) - +]

Fig. 2. Density

452

of Y for q = 2.

Volume

12, Number

Table 1 Some quantiles

6

STATISTICS

of the distributions

& PROBABILITY

LETTERS

December

of Y

probability

4 2

3

6

9

17

25

0.500 0.750 0.900 0.950 0.975 0.990

0.77171 2.15349 4.04566 5.48451 6.92288 8.82110 (exact)

2.261 4.284 6.729 8.500 10.22 12.43

9.764 13.54 17.53 20.20 22.69 25.88

21.76 27.23 32.71 36.29 39.55 43.77

75.79 85.67 95.07 101.1 106.6 113.3

161.6 175.9 189.5 197.8 205.5 215.0

Table 2 Estimated probability

0.500 0.750 0.900 0.950 0.975 0.990

standard

deviations

1991

of the estimated

quantiles

of Y

4 3

6

9

17

25

0.00292 0.00492 0.00741 0.00994 0.01527 0.02236

0.00982 0.01221 0.01681 0.02385 0.02994 0.04541

0.01622 0.01978 0.02860 0.03778 0.04979 0.07814

0.04340 0.05059 0.06698 0.08154 0.1019 0.1651

0.06906 0.08560 0.1170 0.1499 0.1805 0.2754

is approximately normal with mean 0 and variance 1. The number distributions for CJ= 10,. . . ,25; over that range the approximation

5. One-way classification

1.239 is determined is quite accurate.

from the empirical

with random effects - unbalanced case

The model considered is (8) where the k,‘s are not necessarily all equal. In this case (H, G) defined in (2) and (3) is not a sufficient statistic and the maximum likelihood estimators of 8 and 9 do not have explicit expressions. Nevertheless, we can make inferences about the model based on H and G. Note that b(l/N)G = 9 and cf$H=f+&‘~

k,(v,-ti)(v,-5)’ a=1

=?l’+k*@, where ik,-+ a=1

Thus d,, . . , d, estimate IP+k*@-891

C”,,k: a

the ordered

1

. a

i

roots of

=O. 453

Volume 12, Number

6

STATISTICS

& PROBABILITY

LETTERS

December

1991

The effect covariance matrix k * 8 can still be estimated by (12). For testing H: rank(S) < m in this model we propose the use of the test statistic Y, defined in (22) using H in (2) and G in (3). We point out that Y, can also be used for the fixed effect model treated in Section 1. The asymptotic null distribution of Y, when rank(@) = m was given in Section 4 for the balanced random effect model. In the next two sections we derive the asymptotic null distribution of Y, for the unbalanced fixed and random effect models.

6. Asymptotic theory for the unblanced case of fixed effects We shall show that the result (26) in Theorem derivation than the one given here is possible, for the random effect model. Throughout this where Sz,, is defined in (4). Assume further: (a) n-‘C”,=,k, -+ c as n + co. (b) @” = (l/M)D, + e0 as n -+ co, where (The proof is simpler if rank(9,) = m for useful in Section 7.) Define the square matrix T,, so that q@,,T,‘=A.=diag(X,,,,

1 holds for the fixed effect model (1) as n + cc. A simpler but we give a derivation that will also be useful in Section 7 section we assume model (1) with rank(O,) < m for all n,

rank(@) = m. n >,p, but the proof

%,Zr...,%,p),

assuming

only

rank(S2,)

< m will be

T,!PT,’ = Ip ,

where X,, 2 ... >Xnr>Xn,r+l = ... = X,, = 0 are the roots of 1S, - X9 1 = 0 and r = rank(S2,). that r depends on n, but r G m for all n. Define the transformed model x*=Tx n 0.l

aJ =

T,y,

+

T,u,,

=

Y,*

+

u:,,

where the dependence of xz,, y,* and uz, on n is suppressed. Then the u,,* ‘s are independently according to N(0, Z,,), the last p - m components of y,* do not depend on LX,and

where v* = E;,,k,y,*/C”,=,k, X!P 1 = 0. Define H,,* = T,HT,‘, Then

and

X,, 2 . . . > h,,

and G,* - W(I,,

H,,*=MA,+L,+L:,+F,,,

- ti*)(y:

- y*)‘,

a=1

F,=

c

k&i,*-ii*)@,*-ii*)‘,

a=1

i*=&-1k.C.

454

of

I@, -

N), where N = C”,_lk, - n. We write (28)

where k,(ii,*

> XO,m+l = . . . = X0, = 0 are the roots

distributed

G,* = T,GT,‘.

H,* and G,* are independent

L, = i

Note

Volume

12, Number

6

STATISTICS

& PROBABILITY

We use the fact that ⅈ, . . . , \lk,ii,* are independent, independent of uz, - U,*, j = 1,. . . , k,, a = 1,. . . , n. Let (p,, column Pl=

LE7TERS

each distributed according Pi) be an n X n orthogonal

December

1991

to N(0, Z,,), and matrix with first

@G-&(/L-&$.

Define

Then

5 k,ii,*ii,*’

I;,=

-

a=1

MA,, =

k

k,y,*y,*’

-

a=1

L,= Because diag(A,,, x ,11,. . ., matrix

i

k,u*Ii*‘=S’S-s,s;=S;S,,

a=1

i

k,*,*‘=

(29)

B;&,

a=1

2 k,ii,*y;’ a=1

-

i k,ii*y*‘= a=1

S;B,.

(30)

(pl, P,‘) is orthogonal, s,, . . . , s, are independent and s, - N(0, I,). We can write A,, = 0) and B2 = (B2,., 0), where A,, is an r X r diagonal matrix with positive diagonal elements x nr, and B2r is (n - 1) x r. Then B,:B,,. = MA,,,. There exists an (n - 1) x (n - 1) orthogonal

Q =

(

&4,n,,,‘/‘,

Q2,)

= (Q,rr

Q2r)r

(31)

where Qz,. is (n - 1) X (n - 1 - r). Define

where SIT isrXp,S,‘f

is(n-1-r)Xp,and

L, = S,*‘B,* = (mS,:‘n’,/,2,

B2* is(n-l)~p.Then O),

where 0 is p x ( p - r). We can alternatively

(32) partition

S,* as

455

Volume 12, Number 6

STATISTICS

& PROBABILITY

LETTERS

December 1991

where S,*, is m X p and $2

. (n - 1 - m) x p. Note that the top r x p submatrix IS write F], = S,* IS,* = S,*,lS,*, + S2zS2:. Then

+

h

~(S,~S,*,

For all n the elements

By the multivariate

h

- ml,)+

~[S2~S2~--(M-m)l,].

of S,* are independent

SzS&

- mI,)

central

$2

Sc;S&-

N(0, 1) random

(33) variables.

As n + cc,

-5 0.

(34)

limit theorem (M-m)Zp]

as n -+ co,

4

W,

(35)

where the functionally independent elements of W are independent and normally and variance 2 for diagonal elements and mean 0 and variance 1 for off-diagonal ( CY,j) element of S,*, (n - 1) x p. Then, for i, j = 1,. . . , p, the (i, j)th element

(S,: ‘LI”~ nr

2

A’ /3

0

is /KS,, + \/h,is,,.where

of S,*, is S,:. We can

nr

distributed with mean 0 elements. Let s,, be the of

* lr

(36)

0 X,; = 0 for

i > r. Since

A,, +X0;,

the limiting

distribution

of the (i, i)th

element of (36) is N(0, 4X,,) and the limiting distribution of the (i, j)th element is N(0, X,, + ha,), i #j. Since X0; = 0, for i > m, the (p - m) X (p - m) lower right-hand corner of this matrix is 0. Note that fl+ 1 as n -+ co. Thus u, 4 u,,

(37)

where the functionally independent elements of U, are independent and are normally distributed with means 0. The variance of the ith diagonal element is 2 + 4X,,, and the variance of the (i, j)th off-diagonal corner of U, has the element is 1 + h,, + ha,, i #j. In particular, the ( p - m) x ( p - m) lower right-hand corner of U in (15). Assumption (a) implies that same distribution as U22, the lower right-hand

where the functionally independent elements of V, are distributed independently, u,, - N[O, 2/(c - l)], as that of V in (15) with k replaced by and v,, = vj, - N[O, l/(c - l)], i #j. This is the same distribution c. Recall that H,,* and G,* are independent, and that the d,‘s are the roots of I(l/M)H,,* - d(l/N)G,,* 1 = 0. Thus the derivation of the limiting distribution of h, = fi( di - l), i = m + 1,. . . , p, given in Section 4 applies here with A in (16) replaced by A,, + Zp,A, in (18) replaced by A,, = diag(X,,, . . . , A,,) + I,, and A, in (20) replaced by A,, = diag(X,,, ...,h,,)+ Zp.The proof goes through because A,, = diag(A,,, 456

0) + diag(A,,,

0).

(39)

Volume

12, Number

STATISTICS

6

& PROBABILITY

LETTERS

December

1991

Since assumption (a) implies c-l (MM+NN)n + c1

(40)

(21) follows. Thus we have derived the following theorem. Theorem 2. The conclusion of Theorem and (b). EI

1 holds for the unbalanced fixed

effect model under assumptions

(a)

Hence the statistic Y, and the asymptotic cut-off points described in Section 4 can be used for the fixed effect unbalanced model to test H: rank(S2,) = m or H: rank(52,) < m.

7. Asymptotic theory for the unbalanced case of random effects We treat the random effect model (8), but relax the normality of u,. Let (u,, . . . , v,,) and (u,,, . . . , unk,) be independent, where the u,, ‘s are independently distributed according to N(0, 3). Assume also that bv, = 0 and EQU~ = 8 has rank m. (The Q’S are not necessarily independently or identically distributed.) Assume (a) and as n + 00,

(b’>

where rank(8,)

= m.

Because Var( u,) = 8 is of rank m, it follows that P{rank(@,,) < m for all n } = 1 (even for discrete Q’S). Thus, through a conditional argument, we can appeal to the derivation for the fixed effect model in Section 6. Consider the conditional distribution of the statistics when q, . . , u,, are given such that rank(&) < m. Conditionally the model is the fixed effect model in Section 6 with y, = p + u, and r = rank(S2,) Q m. The conditional distribution of U, defined by (33) is the distribution as specified in Section 6. The conditional distribution of S,* does not depend on the conditioning (that is, on or,. . . , u,). Thus, unconditionally, the s a,‘~ are independent N(0, 1) random variables and (34) and (35) hold. The terms in (35) and (36) are unconditionally independent. Under (b’), A,, 3 X0,, i = 1,. . . , p, where A,, 2 . . . 2 X,, > XO,m+l = . . . = h,, = 0. Hence (37) holds unconditionally. Under the random effect model of this section, U, and V, are independent, and (38) holds, Thus we can use the derivation in Section 4 for the limiting distribution of h, = &(d, - l), i = m + 1, _. . , p, with the modifications described in Section 6. Here (39) holds in probability, and so in distribution. Thus Rubin’s theorem applies, and the h,, i = m + 1,. . . , p, have the same limiting distribution as that in Section 6. Note that (40) still holds. Therefore we have proved the following theorem. Theorem 3. The conclusion of Theorem and (b’). q

1 holds for the unbalanced

random effect model under assumptions

(a)

Hence the statistic Y, is useful for the unbalanced random effect model as well. We assumed that the q’s have a common covariance matrix 8 of rank m. This assumption can be dropped in Theorem 3 if we add the condition P{rank(B,,)

(41)

in assumption (b’). 457

Volume 12, Number 6

STATISTICS & PROBABILITY

LETTERS

December 1991

As an extension we can consider the model with some components of v, fixed and others random. For such a model the conclusion of Theorem 1 still holds if (b’) and (41) hold.

8. A more general model The one-way classification models (1) and (8) can be extended explanatory variables. A model for such a case is j=l,...,

x a,=B~a,+v,+u,,,

k,,

a=1

,...,

n,

to the case where there are some

(42)

where z,,, t x 1, is a given fixed vector and B is a p X t matrix of (unknown) parameters. If the Q’S are treated as fixed, (42) is the fixed effect one-way classification model with covariate zaj. If the Q’S are assumed random, (42) is the multivariate regression model with nested error structure. We consider testing the dimensionaiity of the between-group effect space of the v,‘s. Throughout this section assume that the u a,‘~ are independently distributed according to N(0, q). For the case of random Q’S we assume that the Q’S and the u,~‘s are independent. Let

,

i

k,xn,

LX=1

i kxxp, a=1

Z,=

(Z, f’),

where lkm= (1, 1,

_. _, l)‘,

X=ZB’+FV+

k, x 1. The model can be written U=Z,

+ u.

(43)

Define H = Xr(Pz, - Pz) X,

G = X’PzoX,

where P,= Y(Y’Y)-‘Y and p y = I - P, for any matrix Y. The matrices H and G are the between-group and within-group sum-of-squares matrices after adjusting for the covariate z,,. It follows that for both the fixed effect model and the random effect model with unspecified distributed of v, the matrices H and G 458

Volume

12, Number

STATISTICS

6

& PROBABILITY

December

LETTERS

1991

are independent and G - W( q, N ), where N = C”, =, k, - rank( Z,,) = rank P’,,. Let M = rank( Z,) rank(Z) = rank( Pz,, - Pz). Note that M is positive for fixed t and sufficiently large n because rank(Z) < t. We investigate properties of the statistic Y, computed using H, G, M, and N of this section in (6) and (22). For the fixed effect model with fixed Q’S, H - W( ‘k, M, s2,*), where Q,T = V’F’(Pz,,-

Pz)FV=

V’F’(P,-

&)FV=

V’F’&FV.

(44)

Hence the distribution of H and G are the same as that given in Section 2 with 52,, replaced by 52:. Thus the statistic Y, can be used to test H: rank( a,*) < m against the alternative that rank(ti,*) > m. The hypothesis H is equivalent to the condition that rank p,FV < m. Given actual values of z,, and k, this condition can be simplified. Intuitively speaking, this condition says that after adjusting for the covariate at most m. In practice the model often 2a, the group differences are restricted to a space of dimension includes an intercept term. Suppose the first element of zu, is 1. Let Z=

(IX%‘,>Zz>.

Then s2,* = (P,FV)‘Pp,z,P,FV is a quadratic form in v, - U. For such an intercept model a sufficient condition for the hypothesis H is that there exist p - m linearly independent p x 1 vectors a such that a’u, is constant for (Y= 1,. . . , n. Because of the exact same distributional form of H and G, we can apply the asymptotic theory of Section 6. Assume that t does not increase with n, and that rank(S2,*) < m for all n and t. Assume (a) and (b” )

$?#,c

-r,

as n + cc,

where rank(

r, ) =

m.

Note that as n + cc, M + cc, M/n -+ 1, N + 00, and N/n + c - 1. Thus (40) holds. Hence, by the result of Section 6, Y, 3 Y, as n + cc. For the random effect model we assume the zb_‘sin (42) are random vectors with bu, = 0 and Fv~v~ = 8. Then &&H=q+k**Q, where k**

=

$tr(F’PF)

Consider using the statistic the asymptotic distribution

(45) Y, to test H: rank(@) < m against the alternative that rank(@) > m. To derive assume that t does not increase with n and rank 8 = m. Also assume (a) and

as n + cc,

(b”’ )

where rank( I’,) = m and a,* is defined

in (44).

We indicate that a proof similar to that in Section 7 (which used the proof in Section 6) can be used to show the result (26) in Theorem 1 for the random model (42) with (b”’ ). Note that P{rank(Q,*) < m for all n } = 1. Define T, so that T,O,*T,‘=

MA,, = M diag{ A, ,,...,

where h,, 2 . . . > X,,. > A, r+l = . . . =A,,, H,,* = T, HT,' ,

G,* = T,GT,,‘,

h,,P},

T,qT,’ = Ip,

= 0, and r = rank(Q,*). V * = VT,‘,

(46) Let

U* = UT,‘. 459

Volume

12, Number

STATISTICS

6

& PROBABILITY

LETTERS

Since P is an idempotent matrix of rank M, there exists an (Cz=,k,) and K ‘K = I,,,. The elements of the M X p matrix

December

x M

1991

matrix K such that P = KK’

S,=K’U* are independently distributed, each according to N(0, 1). Using (46) we can write H,,*=MA,+L,+L:,+F,,

(47)

where L, = S2/B2,

F, = S-& ,

B2 = K’FV*.

The expression (46) is the same as that given by (28) (29) and (30). Because Y = rank(G?,*) G m, B, = (Bzr, 0), and Bi$.&, = MA,,., where Bzr is M X r and Anr = diag(X.,, . . . , A,,.) is nonsingular. Thus we can define an M X M orthogonal matrix Q as in (31). Hence the derivations in Sections 6 and 7 are valid here with SzT, B:, and S,*, being (M - r) X p, M X p and (M - m) X p, respectively, and with (40) still holding. Therefore Y, 3 Y, as n + cc. The following theorem summarizes the results of this section. Theorem 4. The conclusion of Theorem 1 holds for the general fixed effect model (42) under assumptions and (b”) and for the general random effect model (42) under assumptions (a) and (b”’ ). q

(a)

If u, in model (42) contains both fixed and random components, then the conclusion of Theorem 1 still holds provided (b “’ ) holds and P{rank(D,*)

By defining H, G, M, and N as given in this section, the statistic Y, and the cut-off points described in Section 4 can be used for the general model (42) with either fixed or random effects.

9. An example To illustrate the use of the test, we present an example from animal breeding. Dahm, Melton and Fuller (1983) analyzed a data set consisting of four measurements taken on 372 steer calves sired by 51 bulls. The four measurements are: 1. twice the weight (kg) of closely trimmed, nearly boneless meat from the right-hand side of the carcass of an animal; 2. 100 times the average daily gain in weight (kg) over the 252-day feeding period; 3. weaning weight (kg) of an animal at the standardized age at weaning of 200 days; 4. one-tenth the weight (kg) of total digestible nutrients consumed by an animal during the 252-day feeding period. The sire effect is treated as random. The model contains the fixed effects of intercept and classifications due to six different years and nine breeds. The model used is (42) where x,, is a 4 x 1 vector of four measurements on the jth calf sired by the ath sire, z,, consists of O’s and l’s corresponding to the fixed effects classifications, u, is the random sire effect, n = 51, and X:=,/C, = 372. Because the sire effect is nested in the breed effect, rank(Z) = 14, rank(Z,) = 56, M = rank(Z,) - rank(Z) = 42, and N = 372 rank(Z,,) = 316. Dahm, Melton and Fuller (1983) present two matrices that they call the among-sire and 460

STATISTICS & PROBABILITY

Volume 12, Number 6 within

defined

sire sample covariance in Section 8. Thus

matrices.

LETTERS

We treat their matrices

&H=

399.50 164.10 287.59 .148.85

164.10 126.00 113.80 103.04

287.59 113.80 863.25 193.04

148.85 103.04 193.04 ’ 147.33 1

&G=

190.26 66.89 157.20 64.79

66.89 93.50 5.90 64.18

157.20 5.90 606 .oo 116.50

64.79 64.18 116.50 135.59

as our &ZH and &G,

December 1991

where H and G are

1 .

(42) &H and &G are unbiased estimates of !P = \k + k * * 8 and 9, respectively, where in (45). The matrix 8 is a multiple of the genotypic covariance matrix. It is of interest to see whether the genotypic variation of four variables is explained by a smaller number of underlying genetic variables. The roots of (6) computed for this data set are d, = 2.1418, d, = 1.6729, d, = 0.7988, and d, = 0.6521. Because two roots are less than one, our test accepts the null hypothesis that rank 8 < 2. For the null hypothesis that rank 8 < 1, the test statistic Y, in (22) is Under

model

k * * is defined

Y, = -42

log 1.6729 + 358 log[(42.1.6729

+ 316)/358]

Comparing this value to the simulated quantiles in Table 1 achieved significance level is larger than 0.1. Hence, under strong evidence showing that rank 8 should be 2 rather than variables may be explained by only one underlying variable. As an illustration, we consider testing the null hypothesis equality of @ and !P. The test statistic is Y, = -42

log 2.1418 + 358 log[(42.2.1418

= 5.5902.

with q = p - m = 4 - 1 = 3, we find that the the given assumption, we do not find very 1. Thus, the genotypic variability of the four that

+ 316)/358]

m = 0. This is the one-sided

test of the

+ 5.5902 = 18.6056

Comparing this value to the cut-off point with q = 4 - 0 = 4 given in Amemiya, Anderson and Lewis (1990) we obtain the observed significance level of less than 0.01, and conclude that r is a positive semidefinite matrix not equal to 0, and that there is significant genetic (between-sire) variability.

10. Some optimality results Optimality results for the tests in this paper are rather limited. We present some of the known results. First consider the fixed effect model treated in Section 2. For testing 52, = 0, the likelihood ratio statistic (5) is a special case of the likelihood ratio statistic for the general linear hypothesis. Thus, as described in Section 8.10 of Anderson (1984) the test using (5) is admissible and unbiased, and its power increases monotonically in each 9, where the 77,‘s are the roots of

In Sections 5 and 8 we suggested the use of the statistic Y, for the fixed effect models testing Jz, = 0 (or s2,* = 0 for model (42)) Y, can be written as

(1) and (42). For

461

Volume 12, Number 6

STATISTICS & PROBABILITY

where I, > I, > . . . >/,

aretherootsof

g(l)=[-Mlogl-N

LETTERS

December 1991

lH-I(H+G)l=O,and

log(l-f)+N

log N

+MlogM-(M+N)log(M+N)]1(1~~).

M+N

Note that g(l) is continuous, nondecreasing, and convex in [0, 1). Thus, by Corollary 8.10.2 of Anderson (1984, p. 361) the test using K is admissible for testing D,, = 0 (or a,* = 0 for model (42)). For the random effect model (8) we consider the balanced case, where k, = k, a = 1, 2,. . . , n. Then both H and G have Wishart distributions. Thus Theorem 2 of Anderson and Das Gupta (1964) applies to the statistic Y, in (22). The function f(d) defined in (23) is nondecreasing on (0, 00). Hence the condition of Theorem 2 of Anderson and Das Gupta holds. Thus for any c > 0, Pr( Y, > c) is a function of the roots 6,‘s of (11) and is nondecreasing in each 6,.

11. Open problems There are many problems yet to be solved in this area. As stated in Section 10, many optimality properties of the tests are unknown. For the fixed effect model optimality properties of the tests for rank(O,) < m (m > 0) have not been investigated. For the random effect model optimality properties for the unbalanced case have not been discussed. To compare the test statistic Y, given in (22) to some other alternatives, some optimality properties need to be derived. Another open problem concerns the asymptotic distribution of the random variable Y defined in (25) as q + co. Evidence from our simulation suggests that Y is asymptotically normally distributed, but the standard methods of proving asymptotic normality are not effective here. In fact, we do not have a mathematical derivation of the variance of the asymptotic distribution. For q = 2 the exact distribution of Y was obtained by integration over two regions as in Figure 1. In each region an expression can be given for the integral. For q = 3 the region C,,, , & is partitioned into many more regions. Exact integrals can be obtained for some regions. It would be possible to use numerical integration in the other regions. In our asymptotics we let n + co. We can consider alternative asymptotics where n is fixed but the dimension p --, co or q + co. It is an open question whether Y, converges to a normal random variable under such an assumption. Another area of research is evaluating the error made by approximating the distribution of the criterion for finite n by the asymptotic distribution. We have considered testing the null hypothesis rank(@) < m and rank(&?,,) < m, where m is specified. An investigator may be interested in the multiple decision problem (selection of model) of determining the rank as any integer between 0 and p.

Appendix We present the asymptotic distribution of all p roots di of (6) for the fixed effect model (1). (The result holds also for the fixed effect model (42) by the argument used in Section 8.) Let assumptions (a) and (b) hold. Let r,, . . . , r, be positive integers such that rl + . . . + r, = p. Suppose Xnj=+ang? hOi=+OOg, l;=Jt;(d;-+n,g-l), 462

i=r,

+ . . . +rs_t

i=r,+...+r,_,+l,...,

+ l)...)

r, + . . . +rs, rl+...+r,,

g= l)...) g=l,...,

K, K,

Volume

12, Number

6

STATISTICS

& PROBABILITY

LETTERS

December

1991

in where ~#+,r> &,02> . . . > &,OK.For our model GnnK= +aOK= 0. Then [,, . _ . , 1, have a limiting distribution and the marginal limiting distribution which the sets (I, ,..., r,,) ,..., (r,,+ _.. +rK_,+l ,..., lP) are independent of the characteristics roots of A, = UR- (GoR + l)V,, of Ir,+ ~~~++,+l~-..& “‘+rX is the distribution where UR and V, are the g th diagonal blocks of U, and Va, respectively. The functionally independent elements of A, are mdependent, the diagonal elements of A, are N(0, 7,) random variables, and the off-diagonal elements ar;e N(0, iv,) random variables, where 9, = 2( c#& + 2c&,, + c)/( c - 1). The density of the limiting distribution of Z,,+ .__+rR_,+ 1, . . . , l,, + ,., +ra is r,+

L,(C - 1,) 2r,/*q~(r~+‘)/4n:n=,r(

ii>

exp

- 2

i

We assumed that the multiplicity structure unreasonable for the random effect model.

+rw

“’

1 77,:r,+ “’

c +r,_,+1

1,‘. I

of Ani’s stays the same for all n. Such an assumption

is

References Amemiya, Y. (1985) What should be done when an estimated between-group covariance matrix is not nonnegative definite? Amer. Statist. 39, 112-17. Amemiya, Y. (1990), A note on the limiting distribution of certain characteristic roots, Statist. Probab. Lett. 9,465-470. Amemiya, Y. and T.W. Anderson (1989), Percentage points for a test of rank in multivariate components of variance, Tech. Rept. No. 35, Econometric Workshop, Stanford Univ. (Stanford, CA). Amemiya, Y., T.W. Anderson and P.A.W. Lewis (1990) Percentage points for a test of rank in multivariate components of variance, Biometrika 17, 637-641. Anderson, B.M., T.W. Anderson and I. Olkin (1986), Maximum likelihood estimators and likelihood ratio criteria in multivariate components of variance, Ann. Statist. 14, 405417. Anderson, T.W. (1948), Analysis of multivariate variance, unpublished. Anderson, T.W. (1951a), The asymptotic distribution of certain characteristic roots and vectors, in: J. Neyman, ed., Proc. of the Second Berkeley Symp. on Math. Statist. and Probab. (Univ. of California Press, Berkeley, CA) pp. 103-130. [Reprinted in: G.P.H. Styan, ed., The Collected Papers of T. W. Anderson, 1943-1985 (Wiley, New York, 1990) pp. 187-214.1 Anderson, T.W. (1951b), Estimating linear restrictions on regression coefficients for multivariate normal distributions, Ann. Math. Statist. 22, 327-351. [Correction in: Ann. Statist. 8, 1400. Reprinted in: G.P.H. Styan, ed., The Collected Papers of T. W. Anderson, 1943-1985 (Wiley, New York, 1990) pp. 217-241.1 Anderson, T.W. (1963), Asymptotic theory for principal component analysis, Ann. Math. Statist. 34, 122-148. [Re-

printed in: G.P.H. Styan, ed., The Collected Papers of T. W. Anderson, 1943-1985 (Wiley, New York, 1990) pp. 613639.1 Anderson, T.W. (1984) An Introduction to Multiuariate Statistical Analysis (Wiley, New York, 2nd ed.). Anderson, T.W. (1989a), The asymptotic distribution of characteristic roots and vectors in multivariate components of variance, in: L.J. Gleser, M.D. Perlman, S.J. Press and A.R. Sampson, eds., Contributions to Probability and Statistics: Essays in Honor of Ingram Olkin (Springer, New York) pp. 177-196. Anderson, T.W. (1989b), The asymptotic distribution of the likelihood ratio criterion for testing rank in multivariate components of variance, J. M&variate Anal. 30, 72-79. Anderson, T.W. and S. Das Gupta (1964), A monotonicity property of the power functions of some tests of the equality of two covariance matrices, Ann. Math. Stafist. 35, 10591063. [Reprinted in: G.P.H. Styan, ed., The Collected Papers of T. W. Anderson, 1943-1985 (John Wiley, New York, 1990) pp. 689-675.1 Bai, Z.D., P.R. Krishnaiah and W.Q. Liang (1986), On asymptotic joint distribution of the eigenvalues of the noncentral Manova matrix for nonnormal populations, Sankhyii 48, 153-162. Dahm, P.F., B.E. Melton and W.A. Fuller (1983) Generalized least squares estimation of a genotypic covariance matrix, Biometrics 39, 587-597. Fisher, R.A. (1941). Statistical Methods for Research Workers (Stechert & Co., New York, 8th ed.). Schott, J.R. and J.G. Saw (1984) A multivariate one-way classification model with random effects, J. Multivariate Anal. 15. I-12.

463

Testing dimensionality in the multivariate analysis of variance

Testing dimensionality in the multivariate analysis of variance

Recommend Documents