Statistics & Probability Letters 35 (1997) 1-7, Elsevier
Minimal sufficient statistics for a general class of mixed models
André I. Khuri
Department of Statistics, University of Florida, Gainesville, FL 32611-8545, USA
Received June 1996; revised September 1996
Abstract
Khuri and Ghosh (1990) used a certain technique to obtain minimal sufficient statistics for the unbalanced random two-fold nested model. The present article provides an extension of this technique to a general class of unbalanced models with fixed and random effects. © 1997 Elsevier Science B.V.
Keywords: Fixed effects; Neyman factorization theorem; Unbalanced model; Variance components
1. Introduction
Minimal sufficient statistics associated with a given data set provide the maximum possible reduction of the data without any loss of information. While such statistics are readily available for balanced linear models (see Graybill, 1976, Ch. 15), they remain largely unknown, in general, when the associated data are unbalanced. Minimal sufficient statistics were established for certain particular unbalanced models associated mainly with balanced incomplete block designs or partially balanced incomplete block designs, as in Weeks and Graybill (1962), Hultquist and Graybill (1965), Kapadia and Weeks (1963, 1984), Afonja (1972), and Kapadia et al. (1988). More recently, Khuri and Ghosh (1990) showed that the cell means and the error sum of squares are minimal sufficient statistics for an unbalanced random two-fold nested model. The present paper extends this result to a general class of unbalanced mixed models, a description of which is given in the next section.
0167-7152/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0167-7152(96)00210-6

2. Notation and definitions
A general mixed model can be expressed as
$$ y_\theta = \sum_{i=0}^{\nu} \gamma_{\theta_i} + \varepsilon_\theta, \tag{2.1} $$
where $\theta = (k_1, k_2, \ldots, k_s)$ is a complete set of subscripts that identify a typical response $y$, $\theta_i$ is a set of subscripts for the $i$th effect ($i = 0, 1, \ldots, \nu$), and $\varepsilon_\theta$ is a random experimental error. Note that for $i = 0$, $\theta_i$ is the empty set and the corresponding $\gamma_{\theta_0}$ is the grand mean, usually denoted by $\mu$. For $i = 1, 2, \ldots, \nu$, $\theta_i$ consists
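of subscripts that belong to the set $\tau = (k_1, k_2, \ldots, k_{s-1})$. Thus $\theta$ contains one subscript, namely $k_s$, in addition to the subscripts in $\tau$. In general, the data set associated with model (2.1) is considered to be unbalanced with no missing cells, i.e., $k_i \ge 1$ for $i = 1, 2, \ldots, s$. Without loss of generality, let us suppose that for $i = 0, 1, \ldots, p-1$ ($1 \le p \le \nu$) the effects $\gamma_{\theta_i}$ are fixed, and that for $i = p, p+1, \ldots, \nu$ they are random, independently and normally distributed with zero means and variances $\sigma_i^2$, and independent of the errors, which are independently distributed as $N(0, \sigma_\varepsilon^2)$. Let $T$ denote the set of all values of $\tau$, and let $y_\tau$ be the vector of the $n_\tau$ observations in the $\tau$th cell. Then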
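$$ y_\tau = \Bigl( \sum_{i=0}^{\nu} \gamma_{\theta_i} \Bigr) 1_{n_\tau} + \varepsilon_\tau, \tag{2.2} $$
where $1_{n_\tau}$ is a vector of ones of order $n_\tau \times 1$ and $\varepsilon_\tau$ is a vector of random experimental errors associated with $y_\tau$. The mean vector and variance-covariance matrix of $y_\tau$ in (2.2) are given by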
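$$ E(y_\tau) = \mu_\tau 1_{n_\tau}, \tag{2.3} $$
$$ \mathrm{Var}(y_\tau) = \Bigl( \sum_{i=p}^{\nu} \sigma_i^2 \Bigr) J_{n_\tau} + \sigma_\varepsilon^2 I_{n_\tau}, \tag{2.4} $$
where $\mu_\tau = \sum_{i=0}^{p-1} \gamma_{\theta_i}$, and $J_{n_\tau}$ is the matrix of ones of order $n_\tau \times n_\tau$. Furthermore, for $\tau, \tau' \in T$, $\tau \ne \tau'$,
$$ \mathrm{Cov}(y_\tau, y_{\tau'}) = \Bigl[ \sum_{\theta_i \subset \omega(\tau,\tau')} \sigma_i^2 \Bigr] J_{n_\tau, n_{\tau'}}, \tag{2.5} $$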
where $J_{n_\tau, n_{\tau'}}$ is the matrix of ones of order $n_\tau \times n_{\tau'}$, and $\omega(\tau, \tau')$ denotes the set of subscripts that are common to both $\tau$ and $\tau'$. In (2.5) the summation extends over all $i$ such that $\theta_i \subset \omega(\tau, \tau')$, where $\theta_i$ is associated with $\sigma_i^2$. Note that the covariance matrix in (2.5) is equal to zero if $\omega(\tau, \tau')$ is an empty set, or if it does not contain $\theta_i$ for any $\sigma_i^2$. Let $y$ be a column vector of order $n \times 1$ obtained by a vertical concatenation of the $y_\tau$'s for all $\tau \in T$, where $n = \sum_{\tau \in T} n_\tau$. Using formulas (2.3)-(2.5), the mean vector and variance-covariance matrix of $y$ are of the form
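$$ E(y) = \{ \mu_\tau 1_{n_\tau} \}_{\tau \in T}, \tag{2.6} $$
$$ \mathrm{Var}(y) = \{ g_{\tau,\tau'}(\sigma^2) J_{n_\tau, n_{\tau'}} \}_{\tau,\tau' \in T} + \sigma_\varepsilon^2 I_n, \tag{2.7} $$
where $\sigma^2 = (\sigma_p^2, \sigma_{p+1}^2, \ldots, \sigma_\nu^2)'$ and $g_{\tau,\tau'}(\sigma^2)$ is defined as
$$ g_{\tau,\tau'}(\sigma^2) = \begin{cases} \sum_{i=p}^{\nu} \sigma_i^2, & \tau = \tau', \\ \sum_{\theta_i \subset \omega(\tau,\tau')} \sigma_i^2, & \tau \ne \tau'. \end{cases} \tag{2.8} $$
The right-hand side of (2.6) represents a column vector of order $n \times 1$ obtained by a vertical concatenation of the $\mu_\tau 1_{n_\tau}$ for all $\tau \in T$. The first term on the right-hand side of (2.7) represents a partitioned matrix of order $n \times n$ consisting of the blocks $g_{\tau,\tau'}(\sigma^2) J_{n_\tau, n_{\tau'}}$ for $\tau, \tau' \in T$.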
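As an illustration of (2.7) and (2.8), the following minimal Python sketch assembles $\mathrm{Var}(y)$ for a small layout. It is not part of the original derivation: the function names `g` and `var_y`, the example layout, and the variance values are hypothetical, and $\omega(\tau, \tau')$ is interpreted here as the set of subscript positions at which $\tau$ and $\tau'$ take the same value.

```python
def g(tau, tau_p, effects):
    """g_{tau,tau'}(sigma^2) of (2.8).

    effects: list of (positions, variance) pairs for the random effects,
    where positions are the indices within tau of the subscripts in theta_i.
    """
    # omega(tau, tau'): positions at which the two cells share a subscript value
    # (an interpretation assumed for this sketch)
    common = {pos for pos, (u, v) in enumerate(zip(tau, tau_p)) if u == v}
    return sum(var for positions, var in effects if set(positions) <= common)


def var_y(cells, effects, sigma2_eps):
    """Var(y) of (2.7) as a nested list; cells is a list of (tau, n_tau)."""
    labels = [tau for tau, n_tau in cells for _ in range(n_tau)]
    n = len(labels)
    return [[g(labels[r], labels[c], effects) + (sigma2_eps if r == c else 0.0)
             for c in range(n)] for r in range(n)]


# Hypothetical random two-fold nested layout with tau = (i, j)
cells = [((1, 1), 2), ((1, 2), 1), ((2, 1), 2)]
effects = [((0,), 4.0),      # random effect indexed by i
           ((0, 1), 2.0)]    # random effect indexed by (i, j)
V = var_y(cells, effects, sigma2_eps=1.0)
```

Within a cell the off-diagonal entries equal the sum of all variance components, cells sharing only the first subscript share only the first component, and cells with no common effect are uncorrelated.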
3. The main result
In order to derive minimal sufficient statistics for model (2.1), an expression for the associated likelihood function, that is, the density function of $y$, $f(y, \kappa)$, is needed, where $\kappa$ denotes the vector of all the model's parameters. This function is given by
$$ f(y, \kappa) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\Bigl[ -\tfrac{1}{2} (y - \eta)' \Sigma^{-1} (y - \eta) \Bigr], \tag{3.1} $$
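where $\eta = E(y)$ and $\Sigma = \mathrm{Var}(y)$. Using (2.7), $\Sigma$ can be written as
$$ \Sigma = \sigma_\varepsilon^2 (I_n + \Gamma), \tag{3.2} $$
where $\Gamma$ is the partitioned matrix
$$ \Gamma = \{ \tilde g_{\tau,\tau'}(\sigma^2, \sigma_\varepsilon^2) J_{n_\tau, n_{\tau'}} \}_{\tau,\tau' \in T} \tag{3.3} $$
and
$$ \tilde g_{\tau,\tau'}(\sigma^2, \sigma_\varepsilon^2) = \frac{1}{\sigma_\varepsilon^2}\, g_{\tau,\tau'}(\sigma^2), \qquad \tau, \tau' \in T. $$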
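Consider now the matrices
$$ G = \{ \tilde g_{\tau,\tau'}(\sigma^2, \sigma_\varepsilon^2) \}_{\tau,\tau' \in T}, \tag{3.4} $$
$$ A = \mathrm{Diag}(n_\tau)_{\tau \in T}. \tag{3.5} $$
These matrices are of order $c \times c$, where $c$ denotes the number of elements of $T$, and $A$ is a diagonal matrix having $n_\tau$, $\tau \in T$, along its diagonal. By applying a result in Searle (1982, Exercise 29, p. 153) to the matrix $I_n + \Gamma$ in (3.2), we obtain
$$ (I_n + \Gamma)^{-1} = I_n - \{ h_{\tau,\tau'}(\sigma^2, \sigma_\varepsilon^2) J_{n_\tau, n_{\tau'}} \}_{\tau,\tau' \in T}, \tag{3.6} $$
where for $\tau, \tau' \in T$, the $h_{\tau,\tau'}(\sigma^2, \sigma_\varepsilon^2)$ are the elements of the $c \times c$ matrix
$$ H = (GA + I_c)^{-1} G. $$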
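The inverse in (3.6) can be checked in exact arithmetic on a small case. The sketch below is only an illustration under assumed inputs: the cell sizes and the matrix `G` are made up, and the helpers `solve`, `expand`, and `matmul` are hypothetical names written for this check, not routines taken from the paper or from Searle (1982).

```python
from fractions import Fraction


def solve(M, B):
    """Return M^{-1} B by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(M)
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def expand(C, sizes):
    """Partitioned matrix {C[t][t'] * J_{n_t, n_t'}}: repeat each entry over its block."""
    idx = [t for t, n_t in enumerate(sizes) for _ in range(n_t)]
    return [[C[r][s] for s in idx] for r in idx]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


sizes = [2, 1, 3]                                   # n_tau for c = 3 cells (made up)
G = [[Fraction(3), Fraction(1), Fraction(0)],
     [Fraction(1), Fraction(3), Fraction(0)],
     [Fraction(0), Fraction(0), Fraction(2)]]       # made-up scaled covariances
c, n = len(sizes), sum(sizes)

# H = (G A + I_c)^{-1} G with A = Diag(n_tau)
GA_plus_I = [[G[i][j] * sizes[j] + (1 if i == j else 0) for j in range(c)]
             for i in range(c)]
H = solve(GA_plus_I, G)

I_n = [[Fraction(int(r == s)) for s in range(n)] for r in range(n)]
Gamma, HJ = expand(G, sizes), expand(H, sizes)
lhs = [[I_n[r][s] + Gamma[r][s] for s in range(n)] for r in range(n)]
rhs = [[I_n[r][s] - HJ[r][s] for s in range(n)] for r in range(n)]
identity_ok = matmul(lhs, rhs) == I_n               # (I_n + Gamma)(I_n - {h J}) = I_n
```

The practical point of (3.6) is that only a $c \times c$ system has to be solved, however large the cell sizes $n_\tau$ are.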
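From (3.2) and (3.6) it follows that
$$ \Sigma^{-1} = \frac{1}{\sigma_\varepsilon^2} \Bigl[ I_n - \{ h_{\tau,\tau'} J_{n_\tau, n_{\tau'}} \}_{\tau,\tau' \in T} \Bigr], \tag{3.7} $$
where, for simplicity, $h_{\tau,\tau'}(\sigma^2, \sigma_\varepsilon^2)$ is written as $h_{\tau,\tau'}$. From (3.1) and (3.7) we then have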
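$$ f(y, \kappa) \propto \exp\Bigl\{ -\frac{1}{2\sigma_\varepsilon^2} \Bigl[ (y - \eta)'(y - \eta) - \sum_{\tau,\tau' \in T} h_{\tau,\tau'} (y_\tau - \mu_\tau 1_{n_\tau})' J_{n_\tau, n_{\tau'}} (y_{\tau'} - \mu_{\tau'} 1_{n_{\tau'}}) \Bigr] \Bigr\}. \tag{3.8} $$
Note that $y_\tau' 1_{n_\tau} = n_\tau \bar y_\tau$, where $\bar y_\tau$ is the average of the observations in the $\tau$th cell, $\tau \in T$. Also, from (2.6), we have
$$ (y - \eta)'(y - \eta) = \sum_{\tau \in T} (y_\tau - \mu_\tau 1_{n_\tau})'(y_\tau - \mu_\tau 1_{n_\tau}). $$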
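Writing $y_\tau - \mu_\tau 1_{n_\tau}$ as $y_\tau - \bar y_\tau 1_{n_\tau} + (\bar y_\tau - \mu_\tau) 1_{n_\tau}$, it is easy to show that
$$ (y - \eta)'(y - \eta) = \sum_{\tau \in T} (y_\tau - \bar y_\tau 1_{n_\tau})'(y_\tau - \bar y_\tau 1_{n_\tau}) + \sum_{\tau \in T} n_\tau (\bar y_\tau - \mu_\tau)^2 = SS_E + \sum_{\tau \in T} n_\tau (\bar y_\tau - \mu_\tau)^2, \tag{3.9} $$
where
$$ SS_E = \sum_{\tau \in T} (y_\tau - \bar y_\tau 1_{n_\tau})'(y_\tau - \bar y_\tau 1_{n_\tau}) $$
is the residual sum of squares for model (2.1). Furthermore,
$$ h_{\tau,\tau'} (y_\tau - \mu_\tau 1_{n_\tau})' J_{n_\tau, n_{\tau'}} (y_{\tau'} - \mu_{\tau'} 1_{n_{\tau'}}) = n_\tau n_{\tau'} h_{\tau,\tau'} (\bar y_\tau - \mu_\tau)(\bar y_{\tau'} - \mu_{\tau'}). \tag{3.10} $$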
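The decomposition (3.9) and the identity behind (3.10) are easy to confirm on a toy data set. The following Python sketch uses two made-up cells and made-up values of $\mu_\tau$; the cell labels `"a"` and `"b"` and all variable names are hypothetical.

```python
def cell_mean(y):
    return sum(y) / len(y)


cells = {"a": [1.0, 3.0], "b": [2.0, 4.0, 9.0]}   # made-up observations y_tau
mu = {"a": 0.5, "b": 2.0}                          # made-up cell expectations mu_tau

# Left-hand side of (3.9): (y - eta)'(y - eta)
lhs = sum((v - mu[t]) ** 2 for t, y in cells.items() for v in y)

# Right-hand side of (3.9): SS_E + sum of n_tau * (ybar_tau - mu_tau)^2
ss_e = sum((v - cell_mean(y)) ** 2 for y in cells.values() for v in y)
between = sum(len(y) * (cell_mean(y) - mu[t]) ** 2 for t, y in cells.items())

# Bilinear form in (3.10) without the factor h: (y_a - mu_a 1)' J (y_b - mu_b 1)
bilinear = sum(v - mu["a"] for v in cells["a"]) * sum(v - mu["b"] for v in cells["b"])
closed_form = (len(cells["a"]) * len(cells["b"])
               * (cell_mean(cells["a"]) - mu["a"])
               * (cell_mean(cells["b"]) - mu["b"]))
```

Because $J_{n_\tau, n_{\tau'}} = 1_{n_\tau} 1_{n_{\tau'}}'$, the bilinear form is just the product of the two sums of deviations, which is why only the cell means survive in (3.10).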
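From (3.8)-(3.10) we finally get the expression
$$ f(y, \kappa) \propto \exp\Bigl\{ -\frac{1}{2\sigma_\varepsilon^2} \Bigl[ SS_E + \sum_{\tau \in T} n_\tau (\bar y_\tau - \mu_\tau)^2 - \sum_{\tau,\tau' \in T} n_\tau n_{\tau'} h_{\tau,\tau'} (\bar y_\tau - \mu_\tau)(\bar y_{\tau'} - \mu_{\tau'}) \Bigr] \Bigr\}. \tag{3.11} $$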
Using the Neyman factorization theorem, it is clear from (3.11) that $SS_E$ and the cell means, $\bar y_\tau$ for $\tau \in T$, are sufficient statistics. Furthermore, formula (3.11) is of the same form as formula (9) in Khuri and Ghosh (1990), hence these statistics are also minimal. We therefore have the following result:
Theorem 1. Under the model given in (2.1) and the assumptions stated in Section 2, $SS_E$ and $\bar y_\tau$ for $\tau \in T$ are minimal sufficient for $\eta, \sigma_p^2, \sigma_{p+1}^2, \ldots, \sigma_\nu^2, \sigma_\varepsilon^2$, where $\eta$ contains the fixed effects of the model.

Remark 1. As was noted in Khuri and Ghosh (1990), the minimal sufficient statistics in Theorem 1 are not necessarily complete. To see this, we note that if $\mu_\tau = \mu$ for all $\tau \in T$, then $E(\bar y_\tau) = E(\bar y_{\tau'}) = \mu$ for $\tau \ne \tau'$, which shows that completeness cannot be achieved in this case.

Remark 2. The results of Theorem 1 are more general than those given in Khuri (1990, Appendix C), even though the techniques used are similar. In Khuri (1990), the model considered is unbalanced only with respect to $k_s$, that is, the last stage of its associated design. Furthermore, all the effects in the model are random. These restrictions are shown to be unnecessary in the present article. Imbalance can occur in any stage of the design, and the model may contain fixed as well as random effects.
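The sufficiency part of Theorem 1 can be illustrated numerically: the quadratic form $(y - \eta)'\Sigma^{-1}(y - \eta)$ computed from the raw data should coincide with the value obtained from $SS_E$ and the cell means alone through (3.11). The Python sketch below does this in exact arithmetic for two made-up, correlated cells. The data, the matrix `g`, and the helper names `solve` and `expand` are assumptions introduced only for this check.

```python
from fractions import Fraction as F


def solve(M, B):
    """Return M^{-1} B by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(M)
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def expand(C, sizes):
    """Partitioned matrix {C[t][t'] * J_{n_t, n_t'}}."""
    idx = [t for t, n_t in enumerate(sizes) for _ in range(n_t)]
    return [[C[r][s] for s in idx] for r in idx]


cells = [[F(1), F(3)], [F(2), F(4), F(9)]]       # made-up y_tau for two cells
mu = [F(1, 2), F(2)]                              # made-up mu_tau
sigma2_eps = F(2)
g = [[F(3), F(1)], [F(1), F(3)]]                  # made-up g_{tau,tau'}(sigma^2)

sizes = [len(y) for y in cells]
n, c = sum(sizes), len(sizes)

# Direct evaluation: (y - eta)' Sigma^{-1} (y - eta), with Sigma from (2.7)
Sigma = expand(g, sizes)
for r in range(n):
    Sigma[r][r] += sigma2_eps
d = [v - mu[t] for t, y in enumerate(cells) for v in y]
x = solve(Sigma, [[v] for v in d])
q_direct = sum(dv * xv[0] for dv, xv in zip(d, x))

# Evaluation from the statistics only, following (3.11)
means = [sum(y) / len(y) for y in cells]
ss_e = sum((v - m) ** 2 for y, m in zip(cells, means) for v in y)
G = [[g[i][j] / sigma2_eps for j in range(c)] for i in range(c)]
H = solve([[G[i][j] * sizes[j] + (1 if i == j else 0) for j in range(c)]
           for i in range(c)], G)
dev = [m - u for m, u in zip(means, mu)]
bracket = (ss_e
           + sum(sizes[i] * dev[i] ** 2 for i in range(c))
           - sum(sizes[i] * sizes[j] * H[i][j] * dev[i] * dev[j]
                 for i in range(c) for j in range(c)))
q_stats = bracket / sigma2_eps
```

Any two data sets with the same cell means and the same $SS_E$ therefore give the same likelihood for every parameter value, which is the content of the factorization argument.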
4. Examples
The results of the previous sections will be illustrated by considering some specific examples of unbalanced mixed models.
Example 4.1. A two-fold nested model,
$$ y_{ijk} = \mu + \alpha_i + \beta_{ij} + \varepsilon_{ijk}, \tag{4.1} $$
where $\alpha_i$ is a fixed unknown parameter, $\beta_{ij}$ and $\varepsilon_{ijk}$ are mutually independent with the $\beta_{ij}$'s i.i.d. $N(0, \sigma_{\beta(\alpha)}^2)$, and the $\varepsilon_{ijk}$'s i.i.d. $N(0, \sigma_\varepsilon^2)$, $i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b_i$; $k = 1, 2, \ldots, n_{ij}$.
In this case, $\tau = (i, j)$ and $\mu_\tau$ in formula (2.3) is equal to $\mu + \alpha_i$. The number of elements of the set $T$, consisting of all values of $\tau$, is $c = \sum_{i=1}^{a} b_i$. Furthermore, $\bar y_\tau = \bar y_{ij.} = \frac{1}{n_{ij}} \sum_{k=1}^{n_{ij}} y_{ijk}$ ($i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b_i$) and $SS_E = \sum_{i=1}^{a} \sum_{j=1}^{b_i} \sum_{k=1}^{n_{ij}} (y_{ijk} - \bar y_{ij.})^2$ are minimal sufficient statistics. It can be seen that $\bar y_{ij.}$ has the normal distribution with mean $\mu + \alpha_i$ and variance $\sigma_{\beta(\alpha)}^2 + (\sigma_\varepsilon^2 / n_{ij})$, and $SS_E$ is distributed as $\sigma_\varepsilon^2 \chi_{n-c}^2$, where $n = \sum_{i=1}^{a} \sum_{j=1}^{b_i} n_{ij}$ and $\chi_{n-c}^2$ denotes the chi-squared distribution with $n - c$ degrees of freedom. An analysis of variance (ANOVA) table based on Type I sums of squares for the effects in model (4.1) is of the form (see, for example, Milliken and Johnson, 1984, p. 420) given in Table 1. In Table 1, A and B(A) denote the nesting and nested factors, respectively, and
$$ n_{i.} = \sum_{j=1}^{b_i} n_{ij}, \qquad \bar y_{i..} = \frac{1}{n_{i.}} \sum_{j=1}^{b_i} n_{ij} \bar y_{ij.}, \qquad \bar y_{...} = \frac{1}{n} \sum_{i=1}^{a} n_{i.} \bar y_{i..}. $$
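The quantities $n_{i.}$, $\bar y_{i..}$ and $\bar y_{...}$ depend on the data only through the cell sizes and cell means. The Python sketch below computes them for a made-up unbalanced layout and then forms the usual Type I sums of squares for a two-fold nested classification, $\sum_i n_{i.} (\bar y_{i..} - \bar y_{...})^2$ for A and $\sum_i \sum_j n_{ij} (\bar y_{ij.} - \bar y_{i..})^2$ for B(A). These two formulas are the standard textbook expressions and are assumed here; the data and variable names are hypothetical.

```python
# Made-up unbalanced data: cells[(i, j)] holds the observations y_ijk
cells = {(1, 1): [3.0, 5.0], (1, 2): [6.0],
         (2, 1): [1.0, 2.0, 3.0], (2, 2): [8.0, 10.0]}

# Minimal sufficient statistics: cell means and SS_E (cell sizes are known constants)
n_ij = {t: len(y) for t, y in cells.items()}
ybar_ij = {t: sum(y) / len(y) for t, y in cells.items()}
ss_e = sum((v - ybar_ij[t]) ** 2 for t, y in cells.items() for v in y)

# Marginal quantities built from n_ij and ybar_ij only
levels = sorted({i for i, _ in cells})
n_i = {i: sum(n_ij[t] for t in cells if t[0] == i) for i in levels}
ybar_i = {i: sum(n_ij[t] * ybar_ij[t] for t in cells if t[0] == i) / n_i[i]
          for i in levels}
n = sum(n_i.values())
ybar = sum(n_i[i] * ybar_i[i] for i in levels) / n

# Type I sums of squares (standard formulas, assumed for this sketch)
ss_a = sum(n_i[i] * (ybar_i[i] - ybar) ** 2 for i in levels)
ss_b_a = sum(n_ij[t] * (ybar_ij[t] - ybar_i[t[0]]) ** 2 for t in cells)

# Total corrected sum of squares from the raw data, for comparison
ss_total = sum((v - ybar) ** 2 for y in cells.values() for v in y)
```

The three sums of squares add up to the total corrected sum of squares, and the first two never touch the raw observations once the cell means are available.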
We note that the sums of squares in the ANOVA table are functions of the minimal sufficient statistics. It should be noted that for model (4.1), the matrices $G$ and $A$ in formulas (3.4) and (3.5), respectively, are of the form
$$ G = \lambda I_c, \qquad A = \mathrm{Diag}(n_{11}, n_{12}, \ldots, n_{a b_a}), $$
where $\lambda = \sigma_{\beta(\alpha)}^2 / \sigma_\varepsilon^2$. The formula for $G$ is true because, by formula (2.8), $g_{\tau,\tau'}(\sigma^2) = \sigma_{\beta(\alpha)}^2$ for $\tau = \tau'$ and is equal to zero otherwise. Hence, the matrix $H$ (see formula (3.6)) is given by
$$ H = \mathrm{Diag}\Bigl( \frac{\lambda}{\lambda n_{11} + 1}, \frac{\lambda}{\lambda n_{12} + 1}, \ldots, \frac{\lambda}{\lambda n_{a b_a} + 1} \Bigr). $$
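Table 1
Source    Degrees of freedom    Sum of squares
A         $a - 1$               $\sum_{i=1}^{a} n_{i.} (\bar y_{i..} - \bar y_{...})^2$
B(A)      $c - a$               $\sum_{i=1}^{a} \sum_{j=1}^{b_i} n_{ij} (\bar y_{ij.} - \bar y_{i..})^2$
Error     $n - c$               $SS_E$

Formula (3.11) can then be expressed as
$$ f(y, \kappa) \propto \exp\Bigl\{ -\frac{1}{2\sigma_\varepsilon^2} \Bigl[ SS_E + \sum_{i=1}^{a} \sum_{j=1}^{b_i} n_{ij} (\bar y_{ij.} - \mu - \alpha_i)^2 - \sum_{i=1}^{a} \sum_{j=1}^{b_i} \frac{\lambda n_{ij}^2}{\lambda n_{ij} + 1} (\bar y_{ij.} - \mu - \alpha_i)^2 \Bigr] \Bigr\}, $$
or equivalently as
$$ f(y, \kappa) \propto \exp\Bigl\{ -\frac{1}{2\sigma_\varepsilon^2} \Bigl[ SS_E + \sum_{i=1}^{a} \sum_{j=1}^{b_i} \frac{n_{ij}}{\lambda n_{ij} + 1} (\bar y_{ij.} - \mu - \alpha_i)^2 \Bigr] \Bigr\}. \tag{4.2} $$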
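The simplification leading to (4.2) rests on the identity $n_{ij} - \lambda n_{ij}^2 / (\lambda n_{ij} + 1) = n_{ij} / (\lambda n_{ij} + 1)$. A short exact-arithmetic check in Python is sketched below; the function names `weight_311` and `weight_42` and the chosen value of $\lambda$ are hypothetical.

```python
from fractions import Fraction


def weight_311(n, lam):
    """Coefficient of (ybar_ij. - mu - alpha_i)^2 obtained from (3.11)
    when H = Diag(lam / (lam * n_ij + 1))."""
    h = lam / (lam * n + 1)
    return n - n * n * h


def weight_42(n, lam):
    """Coefficient of the same term as it appears in (4.2)."""
    return n / (lam * n + 1)


lam = Fraction(3, 2)   # made-up ratio sigma^2_{beta(alpha)} / sigma^2_eps
checks = [weight_311(Fraction(n), lam) == weight_42(Fraction(n), lam)
          for n in range(1, 8)]
```

The weight $n_{ij} / (\lambda n_{ij} + 1)$ equals $\sigma_\varepsilon^2$ divided by the variance of $\bar y_{ij.}$, so (4.2) is the joint density kernel of independent normal cell means combined with $SS_E$.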
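In particular, if $b_i = b$ and $n_{ij} = m$ for all $i, j$, i.e., the design is completely balanced, then formula (4.2) can be written as
$$ f(y, \kappa) \propto \exp\Bigl\{ -\frac{1}{2\sigma_\varepsilon^2} \Bigl[ SS_E + \frac{m}{\lambda m + 1} \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar y_{ij.} - \mu - \alpha_i)^2 \Bigr] \Bigr\} $$
$$ = \exp\Bigl\{ -\frac{1}{2\sigma_\varepsilon^2} \Bigl[ SS_E + \frac{m}{\lambda m + 1} \sum_{i=1}^{a} \sum_{j=1}^{b} \bar y_{ij.}^2 - \frac{2bm}{\lambda m + 1} \sum_{i=1}^{a} (\mu + \alpha_i) \bar y_{i..} + \frac{bm}{\lambda m + 1} \sum_{i=1}^{a} (\mu + \alpha_i)^2 \Bigr] \Bigr\}. $$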
This implies that $\bar y_{i..}$ ($i = 1, 2, \ldots, a$), $\sum_{i=1}^{a} \sum_{j=1}^{b} \bar y_{ij.}^2$, and $SS_E = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{m} (y_{ijk} - \bar y_{ij.})^2$ are minimal sufficient statistics. Equivalently, $\bar y_{i..}$ ($i = 1, 2, \ldots, a$), $SS_{B(A)} = m \sum_{i=1}^{a} \sum_{j=1}^{b} \bar y_{ij.}^2 - mb \sum_{i=1}^{a} \bar y_{i..}^2$, and $SS_E$ are also minimal sufficient. This agrees with an already known fact concerning minimal sufficiency for the balanced mixed two-fold nested model.

Example 4.2. A model with crossed and nested effects. Consider the model
$$ y_{ijk\ell} = \mu + \alpha_i + \beta_{ij} + \gamma_{ik} + (\beta\gamma)_{ijk} + \varepsilon_{ijk\ell}, \tag{4.3} $$
where $\alpha_i$ and $\beta_{ij}$ are fixed unknown parameters, and $\gamma_{ik}$, $(\beta\gamma)_{ijk}$, and $\varepsilon_{ijk\ell}$ are distributed independently as $N(0, \sigma_{\gamma(\alpha)}^2)$, $N(0, \sigma_{\beta\gamma(\alpha)}^2)$, and $N(0, \sigma_\varepsilon^2)$, respectively, $i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b_i$; $k = 1, 2, \ldots, c_i$; $\ell = 1, 2, \ldots, n_{ijk}$. This model is suited for an experiment involving three factors, namely A, B, and C, where B and C are crossed, but are both nested within A. For this model, $\tau = (i, j, k)$, $\mu_\tau = \mu + \alpha_i + \beta_{ij}$, and the number of elements of the set $T$ is $c = \sum_{i=1}^{a} b_i c_i$. A set of minimal sufficient statistics consists of $\bar y_\tau = \bar y_{ijk.} = \frac{1}{n_{ijk}} \sum_{\ell=1}^{n_{ijk}} y_{ijk\ell}$ ($i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b_i$; $k = 1, 2, \ldots, c_i$) and $SS_E = \sum_{i=1}^{a} \sum_{j=1}^{b_i} \sum_{k=1}^{c_i} \sum_{\ell=1}^{n_{ijk}} (y_{ijk\ell} - \bar y_{ijk.})^2$. In this case, $\bar y_{ijk.}$ has the normal distribution with mean $\mu + \alpha_i + \beta_{ij}$ and variance $\sigma_{\gamma(\alpha)}^2 + \sigma_{\beta\gamma(\alpha)}^2 + \sigma_\varepsilon^2 / n_{ijk}$, and $SS_E$ is distributed as $\sigma_\varepsilon^2 \chi_{n-c}^2$, where $n = \sum_{i=1}^{a} \sum_{j=1}^{b_i} \sum_{k=1}^{c_i} n_{ijk}$.
One ANOVA table for model (4.3) can be obtained on the basis of Type I sums of squares for the model's effects (see Table 2). All the sums of squares in this table are functions of the aforementioned minimal sufficient statistics.

Table 2
Source    Degrees of freedom                      Sum of squares
A         $a - 1$                                 $R(\alpha \mid \mu)$
B(A)      $\sum_{i=1}^{a} (b_i - 1)$              $R(\beta \mid \mu, \alpha)$
C(A)      $\sum_{i=1}^{a} (c_i - 1)$              $R(\gamma \mid \mu, \alpha, \beta)$
B*C(A)    $\sum_{i=1}^{a} (b_i - 1)(c_i - 1)$     $R(\beta\gamma \mid \mu, \alpha, \beta, \gamma)$
Error     $n - c$                                 $SS_E$
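Unlike Example 4.1, the cell means in model (4.3) are correlated, because cells with the same $(i, k)$ share the random effect $\gamma_{ik}$. The Python sketch below builds the matrix of $g_{\tau,\tau'}(\sigma^2)$ values from (2.8) for a small made-up layout and also tallies the degrees of freedom of Table 2. The layout, the variance values, and the names `g`, `b`, `c_levels` are hypothetical, and $\omega(\tau, \tau')$ is again interpreted as the set of subscript positions where the two cells agree.

```python
def g(tau, tau_p, effects):
    """g_{tau,tau'}(sigma^2) of (2.8) for random effects given as (positions, variance)."""
    common = {pos for pos, (u, v) in enumerate(zip(tau, tau_p)) if u == v}
    return sum(var for positions, var in effects if set(positions) <= common)


# Made-up layout: a = 2 levels of A, b_i levels of B and c_i levels of C within A
b = {1: 2, 2: 1}
c_levels = {1: 2, 2: 3}
T = [(i, j, k) for i in b
     for j in range(1, b[i] + 1)
     for k in range(1, c_levels[i] + 1)]

# Random effects of model (4.3); tau = (i, j, k) has positions 0, 1, 2
effects = [((0, 2), 1.5),       # gamma_ik, variance sigma^2_{gamma(alpha)} (made up)
           ((0, 1, 2), 0.5)]    # (beta gamma)_ijk, variance sigma^2_{beta gamma(alpha)}
g_matrix = [[g(t, s, effects) for s in T] for t in T]

# Degrees of freedom from Table 2
a, c = len(b), len(T)
df = {"A": a - 1,
      "B(A)": sum(b[i] - 1 for i in b),
      "C(A)": sum(c_levels[i] - 1 for i in b),
      "B*C(A)": sum((b[i] - 1) * (c_levels[i] - 1) for i in b)}
```

Since this matrix is no longer diagonal, $H = (GA + I_c)^{-1} G$ does not reduce to a closed form as simple as the one in Example 4.1, but Theorem 1 applies unchanged.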
Acknowledgements
The author wishes to thank the referee for helpful suggestions.
References
Afonja, B. (1972), Minimal sufficient statistics for variance components for a general class of designs, Biometrika 59, 295-302.
Graybill, F.A. (1976), Theory and Application of the Linear Model (Duxbury, North Scituate, MA).
Hultquist, R.A. and F.A. Graybill (1965), Minimal sufficient statistics for the two-way classification mixed model design, J. Amer. Statist. Assoc. 60, 182-192.
Kapadia, C.H. and D.L. Weeks (1963), Variance components in two-way classification models with interaction, Biometrika 50, 327-334.
Kapadia, C.H. and D.L. Weeks (1984), Minimal sufficient statistics for the group divisible partially balanced incomplete block design (GD-PBIBD) with interaction under an Eisenhart Model II, Metrika 31, 127-144.
Kapadia, C.H., A.H. Kvanli and K.R. Lee (1988), Minimal sufficient statistics for incomplete block designs with interaction under an Eisenhart Model III, J. Statist. Plann. Inference 19, 317-324.
Khuri, A.I. (1990), Exact tests for random models with unequal cell frequencies in the last stage, J. Statist. Plann. Inference 24, 177-193.
Khuri, A.I. and M. Ghosh (1990), Minimal sufficient statistics for the unbalanced two-fold nested model, Statist. Probab. Lett. 10, 351-353.
Milliken, G.A. and D.E. Johnson (1984), Analysis of Messy Data, Vol. 1 (Van Nostrand Reinhold, New York).
Searle, S.R. (1982), Matrix Algebra Useful for Statistics (Wiley, New York).
Weeks, D.L. and F.A. Graybill (1962), A minimal sufficient statistic for a general class of designs, Sankhyā Ser. A 24, 339-354.