Higher order C(α) tests with applications to mixture models


Journal of Statistical Planning and Inference 113 (2003) 179 – 187

www.elsevier.com/locate/jspi

Chandranath Pal

Department of Statistics, University of Kalyani, West Bengal 741 235, India

Received 24 August 2000; received in revised form 15 October 2001; accepted 19 October 2001

Abstract

A class of second-order C(α) tests is proposed for testing composite hypotheses when no optimal test exists among the (first-order) C(α) tests as defined by Neyman (Probability and Statistics, The Harald Cramér Volume, Almqvist and Wiksell, Uppsala, Sweden, 1959, p. 213). The form of an optimal test of the new subclass is also presented. The procedure is seen to be easily extendable to C(α) tests of third or higher orders. Two examples of univariate normal mixtures are considered where the procedure is successfully applied. For the corresponding multivariate models, union–intersection tests of Roy (Ann. Math. Statist. 24 (1953) 220) are derived by combining the above optimal tests of the univariate problems. © 2002 Elsevier Science B.V. All rights reserved.

MSC: primary 62H10

Keywords: C(α) tests; Mixture models; Union–intersection principle

1. Introduction

Neyman (1959) (henceforth Neyman) developed a class of asymptotic tests, called the class C(α) of tests, in connection with optimal testing problems in the presence of nuisance parameters when 'exact optimum tests cannot be found'. A typical member of this class is formed under certain regularity conditions similar to those of Cramér (1946, p. 500), an assumption on the existence of locally root-n consistent estimators of the nuisance parameters under the null hypothesis, and the theory of regression. For a detailed account of the definitions, notation, terminology, theory and procedure of C(α) tests, we refer to Neyman. We do, however, simplify some of the notation for convenience by making obvious modifications.

E-mail address: [email protected] (C. Pal).
0378-3758/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0378-3758(01)00312-3


In some cases, however, optimum tests in this class do not exist. We illustrate this in Section 2 by an example of a mixture of normal distributions. It is, therefore, necessary to extend the concept of C(α) tests to second or higher order classes where optimal tests do exist and can be found. Section 3 deals with such a development. For another example, also on a mixture of normals, we see that no optimal test exists even in the class of second-order tests, so that we need to go to the third-order tests. In Section 4, we consider mixtures of the corresponding multivariate distributions and construct union–intersection (∪–∩) tests of Roy (1953) for the hypotheses of no mixture. For the first example the ∪–∩ test coincides with a modified version of the kurtosis test considered by Mardia (1970, 1974), whereas for the second example the test is equivalent to the skewness test; both are used for testing normality.

2. Non-existence of an optimal C(α) test

Let $X$ be a random observable (with support $\mathcal{X}$) having a distribution which depends on a parameter $\xi \in \Xi \subset \mathbb{R}$ and a vector of nuisance parameters $\theta \in \Theta \subset \mathbb{R}^s$. Suppose the logarithmic derivatives $\varphi_\xi(x;\theta)$ and $\varphi_j(x;\theta)$ of the density $p(x|\xi,\theta)$, for $j = 1, 2, \ldots, s$, all evaluated under the null hypothesis $H_0: \xi = \xi_0$, are Cramér functions, as defined by Neyman. To complete the notation, let $\mathcal{F}_m^{+}$, for any positive integer $m$, designate the class of sequences of alternatives $\{\xi_n\}$ such that $\xi_n > \xi_0$ for all $n$ and $(\xi_n - \xi_0)^m \sqrt{n} = O(1)$ as $n \to \infty$. Neyman has shown (Theorem 3) that the critical region given by

\[ Z_n^* = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} f^*(x_i; \hat\theta_n) > \tau(\alpha), \]

where

\[ f^*(x;\theta) = \frac{1}{\sigma_0(\theta)} \Big[ \varphi_\xi(x;\theta) - \sum_{j=1}^{s} a_j^0 \varphi_j(x;\theta) \Big], \]

the $a_j^0$'s are the associated partial regression coefficients and $\hat\theta_n = (\hat\theta_{1n}, \hat\theta_{2n}, \ldots, \hat\theta_{sn})$ is any root-n consistent estimator of $\theta$ under $H_0$, is an optimal C(α) test with reference to the family of sequences $\mathcal{F}_1^{+}$, provided $\sigma_0^2(\theta)$, the minimum variance under $H_0$ of the random quantity $\varphi_\xi(X;\theta) - \sum_{j=1}^{s} a_j \varphi_j(X;\theta)$, is positive. However, in some practical situations it may happen that $\sigma_0^2(\theta) = 0$. This occurs when there exists a linear relationship among $\varphi_\xi(X;\theta)$ and the $\varphi_j(X;\theta)$'s with probability one. In that case no optimal C(α) test of Neyman exists. The following example illustrates such a situation.

Example 2.1. Let $p(x|\mu,\sigma^2)$ be the density of a $\tfrac12$-mixture of $N(\mu,\sigma^2)$ and $N(-\mu,\sigma^2)$ distributions, so that

\[ p(x|\mu,\sigma^2) = \frac{1}{2\sigma} \Big[ \phi\Big(\frac{x-\mu}{\sigma}\Big) + \phi\Big(\frac{x+\mu}{\sigma}\Big) \Big], \tag{2.1} \]


where $\phi(\cdot)$ is the standard normal density with the corresponding c.d.f. $\Phi(\cdot)$, $\mu \ge 0$, $\sigma > 0$, and both these parameters are unknown. Such a model is known as a mirror-image model or a mirrored mixture (Yamamoto and Shinozaki, 2000) and is non-regular in the sense that the logarithmic derivative of $p(x|\mu,\sigma^2)$ with respect to $\mu$ at $\mu = 0$ vanishes identically in $x$ and $\sigma^2$ (SenGupta and Pal, 1993). However, for this example the optimal C(α) test for the hypothesis of no mixture, with $\sigma^2$ as the nuisance parameter, needs to be modified. Because of the above non-regularity, we make a unified reparametrization by the transformation $(\mu,\sigma^2) \to (\xi,\theta)$, where $\xi = \mu^2$ and $\theta = \sigma^2$, and consider the problem of testing $H_0: \xi = 0$ against $H_1: \xi > 0$, treating $\theta$ as the single nuisance parameter. One can then express the density $p$ as

\[ p(x|\xi,\theta) = \frac{1}{\sqrt{2\pi\theta}} \exp\Big( -\frac{x^2+\xi}{2\theta} \Big) \sum_{r=0}^{\infty} \frac{\xi^r x^{2r}}{\theta^{2r} (2r)!}. \]
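The agreement between (2.1) and the series form, and the resulting degeneracy of the two logarithmic derivatives at ξ = 0, can be checked numerically. The following is a minimal sketch and not part of the original derivation: it assumes NumPy, truncates the series at an arbitrary `terms=30`, approximates the derivatives by finite differences, and all function names are illustrative.

```python
import numpy as np
from math import factorial


def normal_pdf(z):
    """Standard normal density."""
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)


def mixture_density(x, mu, sigma):
    """Mirrored mixture (2.1): equal weights on N(mu, sigma^2) and N(-mu, sigma^2)."""
    x = np.asarray(x, dtype=float)
    return (normal_pdf((x - mu) / sigma) + normal_pdf((x + mu) / sigma)) / (2.0 * sigma)


def series_density(x, xi, theta, terms=30):
    """Series form in (xi, theta) = (mu^2, sigma^2); truncation level is an assumption."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for r in range(terms):
        total += xi**r * x**(2 * r) / (theta**(2 * r) * factorial(2 * r))
    return np.exp(-(x**2 + xi) / (2.0 * theta)) / np.sqrt(2.0 * np.pi * theta) * total


def scores_at_null(x, theta, h=1e-6):
    """Finite-difference logarithmic derivatives w.r.t. xi and theta at xi = 0."""
    log_p = lambda xi, th: np.log(series_density(x, xi, th))
    d_xi = (log_p(h, theta) - log_p(0.0, theta)) / h            # forward difference, xi >= 0
    d_theta = (log_p(0.0, theta + h) - log_p(0.0, theta - h)) / (2.0 * h)
    return d_xi, d_theta


x = np.linspace(-3.0, 3.0, 13)
d_xi, d_theta = scores_at_null(x, theta=1.69)
print(np.max(np.abs(d_xi - d_theta)))  # small: the two scores coincide numerically
```

Up to discretization error the two finite-difference scores coincide at every x, which is the linear dependence that makes the residual variance vanish.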

It then follows that $\varphi_\xi(X;\theta) = \varphi_\theta(X;\theta) = (X^2 - \theta)/2\theta^2$. Hence $\sigma_0^2(\theta) = 0$.

3. Second and higher order C(α) tests

While generalizations of the C(α) approach in different directions have been tried by many authors (see e.g., Buehler and Puri, 1966; Moran, 1970; Hall and Mathiason, 1990), no work seems to exist on the development of higher order C(α) tests with particular applications to mixtures. In what follows we make such an attempt.

Let $f(x;\theta)$ be an arbitrary Cramér function with $E f(X;\theta) = \eta(\xi;\theta)$. If $\eta(0;\theta) = 0$, we can use Taylor expansion to get $\eta(\xi;\theta) = \xi\,\eta'(0;\theta) + o(\xi)$ (cf. (56) of Neyman), where $\eta'(0;\theta)$ is the derivative of $\eta(\xi;\theta)$ w.r.t. $\xi$ at $\xi = 0$ and is equal to the covariance of $f(X;\theta)$ and $\varphi_\xi(X;\theta)$, evaluated under $\xi = 0$. In order to construct the class of second-order C(α) tests for the null hypothesis $H_0: \xi = 0$ against the alternative $H_1: \xi > 0$, in the presence of $\theta$ as the nuisance parameter, we take into account all normed Cramér functions $g(x;\theta)$ satisfying the condition of the existence of the integral

\[ \int_{\mathcal{X}} g(x;\theta)\, p(x|\xi,\theta)\, dx \]

along with its twice differentiability w.r.t. $\xi$ at $\xi = 0$ under the sign of integration. We then take

\[ f(x;\theta) = \frac{1}{\sigma(g;\theta)} \Big[ g(x;\theta) - a^0 \varphi_\xi(x;\theta) - \sum_{j=1}^{s} b_j^0 \varphi_j(x;\theta) \Big], \]

where $a^0$ and the $b_j^0$'s are the partial regression coefficients of $g$ on $\varphi_\xi$ and the $\varphi_j$'s, respectively, and $\sigma^2(g;\theta)$ is the variance, assumed to be positive, of

\[ g(X;\theta) - a^0 \varphi_\xi(X;\theta) - \sum_{j=1}^{s} b_j^0 \varphi_j(X;\theta) \]


under $H_0$. Clearly, one would then have $\eta'(0;\theta) = 0$ in addition to $\eta(0;\theta) = 0$. Taylor expansion then yields

\[ \eta(\xi;\theta) = \frac{\xi^2}{2}\, \eta''(0;\theta) + o(\xi^2), \]

where

\[ \eta''(0;\theta) = \frac{\partial^2}{\partial \xi^2} \eta(\xi;\theta) \Big|_{\xi=0} = \mathrm{Cov}_{H_0}\big( f(X;\theta),\ \varphi_\xi^2 + \varphi_{\xi\xi} \big), \tag{3.1} \]

$\varphi_{\xi\xi}$ being the second logarithmic derivative of $p(X|\xi,\theta)$ w.r.t. $\xi$ evaluated at $\xi = 0$. The class of second-order C(α) tests is based on statistics of the form

\[ Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} f(X_i; \hat\theta_n), \]

where $\hat\theta_n$ is a sequence of locally root-n consistent estimators of $\theta$. Let us now define

\[ f^*(x;\theta) = \frac{1}{\sigma_0^*(\theta)} \Big[ g^*(x;\theta) - a^{0*} \varphi_\xi(x;\theta) - \sum_{j=1}^{s} b_j^{0*} \varphi_j(x;\theta) \Big], \]

where $g^*(x;\theta) = \varphi_\xi^2(x;\theta) + \varphi_{\xi\xi}(x;\theta)$, $a^{0*}$ and the $b_j^{0*}$'s are, as before, the associated partial regression coefficients and $\sigma_0^{*2}(\theta)$ is the variance of the residual under consideration. We then have the following.

Theorem 3.1. If the estimates $\hat\theta_{jn}$ of the parameters $\theta_j$, for $j = 1, 2, \ldots, s$, are locally root-n consistent (without being strongly consistent), then the test of the hypothesis $H_0$ based on the sequence of critical regions $\{\omega_n^0\}$, where $\omega_n^0$ is given by

\[ \omega_n^0 = \Big\{ (X_1, X_2, \ldots, X_n) : Z_n^* = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} f^*(X_i; \hat\theta_n) > \tau(\alpha) \Big\}, \]

is an optimal test of the class of second-order C(α) tests with reference to the sequences of alternatives $\{\xi_n\} \in \mathcal{F}_2^{+}$.

Proof. If $\xi_n^2 \sqrt{n}$ remains bounded as $n \to \infty$, an argument similar to Theorem 2 of Neyman establishes that the asymptotic distribution of $Z_n^*$ is normal with mean $\eta''(0;\theta)(\xi_n^2/2)\sqrt{n}$ and variance unity. In view of (3.1), the rest of the proof follows along the lines of Theorem 3 of Neyman.

Illustration. Let us now apply Theorem 3.1 to the example considered in Section 2. Note that, under the new parametrization, $\varphi_{\xi\xi} = -X^4/6\theta^4$. Recall that $\varphi_\xi = \varphi_\theta = (X^2 - \theta)/2\theta^2$. Hence $\varphi_{\xi\xi} + \varphi_\xi^2 = X^4/12\theta^4 - X^2/2\theta^3 + 1/4\theta^2$. Again it can be verified that $\mathrm{Cov}_{H_0}(\varphi_{\xi\xi} + \varphi_\xi^2, \varphi_\xi) = 0$, so that the associated regression coefficient $a^{0*}$ is zero. Consequently, $b^{0*} = 0$. It therefore remains to evaluate $\sigma_0^*(\theta)$. After a little algebra it turns out that $\sigma_0^{*2}(\theta) = 1/6\theta^4$. Also, the maximum likelihood estimator of $\theta$ under $H_0$ is $\hat\theta_n = (1/n)\sum_{i=1}^{n} X_i^2$, which is also root-n consistent. Replacing this estimator in


$Z_n^*$ of the above theorem, it follows, after some simplification, that an optimal test is based on the critical region

\[ \omega_n^0 = \Big\{ (X_1, X_2, \ldots, X_n) : Z_n^* = \sqrt{6n} \Big( \frac{m_4}{12 m_2^2} - \frac{1}{4} \Big) > \tau(\alpha) \Big\}, \]

where $m_r = (1/n)\sum_{i=1}^{n} X_i^r$ is the rth raw moment about the origin. An expression for the asymptotic power of the above optimal test against any sequence of alternatives $\{\xi_n\} \in \mathcal{F}_2^{+}$ and any $\theta$ is given by

\[ 1 - \Phi\Big( \tau(\alpha) - \frac{\sqrt{n}\, \xi_n^2}{2\sqrt{6}\, \theta^2} \Big). \]

Remark 3.1. It is clear from the above discussion that the class C(α) of asymptotic tests, as developed by Neyman, can be termed the class of first-order C(α) tests. When an optimal test in the said class fails to exist, it may happen that there is one in the class of second-order tests, which is essentially a subclass of the former. Optimality, however, holds with reference to a different sequence of alternatives of a higher order. Also, as the following example shows, an optimal test may fail to exist even in the class of second-order C(α) tests. It is then envisaged that higher order C(α) tests can be analogously defined under more stringent conditions and that an optimal test in such a class can be constructed by an appeal to the above procedure.

Example 3.1. Consider now the mixture density

\[ p(x|\xi,\theta) = \frac{\pi}{\theta_2^{1/2}}\, \phi\Big( \frac{x-\theta_1}{\theta_2^{1/2}} + \xi \Big) + \frac{1-\pi}{\theta_2^{1/2}}\, \phi\Big( \frac{x-\theta_1}{\theta_2^{1/2}} \Big), \tag{3.2} \]

where $\pi$ ($0 < \pi < 1$) is known, $\xi$ ($-\infty < \xi < \infty$) is the unknown parameter of interest and $\theta = (\theta_1, \theta_2)$ is the vector of nuisance parameters with $-\infty < \theta_1 < \infty$ and $0 < \theta_2 < \infty$. Observe that (3.2) is a reparametrized version of the mixture density

\[ p(x|\mu_1,\mu_2,\sigma^2) = \frac{\pi}{\sigma}\, \phi\Big( \frac{x-\mu_1}{\sigma} \Big) + \frac{1-\pi}{\sigma}\, \phi\Big( \frac{x-\mu_2}{\sigma} \Big) \tag{3.3} \]

with $\theta_1 = \mu_2$, $\theta_2 = \sigma^2$ and $\xi = (\mu_2 - \mu_1)/\sigma$. Suppose one wishes to test the null hypothesis $H_0: \xi = 0$ against the two-sided alternative $H_1: \xi \ne 0$ based on a sample of fixed size $n$. The above density is clearly not a member of the exponential family. So one can attempt to construct a locally most powerful similar unbiased (LMPSU) test for this problem along the lines of Spjøtvoll (1968). But, though not obvious, it turns out that no such LMPSU test exists. Again, the problem can be reduced by invoking location-scale invariance, since $\theta_1$ and $\theta_2^{1/2}$ are the location and the scale parameters, respectively. Still, one can verify after some heavy algebra that a unique locally most powerful invariant unbiased (LMPIU) test does


not exist; the form of such a test depends on the particular maximal invariant chosen. It is then natural to explore the possibility of applying the theory of Neyman. One can verify in this case that

\[ \varphi_\xi = -\pi Y, \qquad \varphi_1 = Y/\theta_2^{1/2} \qquad \text{and} \qquad \varphi_2 = \frac{1}{2\theta_2}(Y^2 - 1), \]

where $Y \equiv Y(X;\theta) = (X - \theta_1)/\theta_2^{1/2}$. Clearly the partial regression coefficient of $\varphi_\xi$ on $\varphi_1$ is $-\pi\theta_2^{1/2}$, whereas that of $\varphi_\xi$ on $\varphi_2$ is 0. Hence the residual variance is again 0 and no optimal test in the class of first-order tests exists. As regards the optimal second-order test, the key function is $\varphi_{\xi\xi} + \varphi_\xi^2 = \pi(Y^2 - 1)$, which is to be regressed on $-\pi Y$, $Y/\theta_2^{1/2}$ and $(1/2\theta_2)(Y^2 - 1)$. The residual variance is again 0 and no optimal second-order C(α) test exists.

The failure of optimal tests of the first two orders to exist naturally leads us to consider third-order tests. To this end, we make the obvious additional assumptions regarding the choice of Cramér functions in this case. The key function here, analogous to $g^*$ of the second-order class, is

\[ g^*(X;\theta) = \varphi_{\xi\xi\xi} + 3\varphi_\xi \varphi_{\xi\xi} + \varphi_\xi^3, \]

$\varphi_{\xi\xi\xi}$ being the third logarithmic derivative of $p(x|\xi,\theta)$ w.r.t. $\xi$ evaluated at $\xi = 0$. After some simplification we get $g^* = \pi(3Y - Y^3)$. To construct the function $f^*$, we first find the multiple regression equation of $g^*$ on $\varphi_\xi$ and $\varphi_{\xi\xi} + \varphi_\xi^2$ and then take the residual. Since all the partial regression coefficients turn out to be 0 here, the required residual is $g^*$ itself. By the same reasoning the residual of $g^*$ after taking multiple regression on $\varphi_1$ and $\varphi_2$ is again $g^*$. Also $\mathrm{Var}_{H_0}(g^*) = 6\pi^2$. Hence, using the MLEs $\hat\theta_{1n} = \bar{X}$ and $\hat\theta_{2n} = (1/n)\sum_{i=1}^{n}(X_i - \bar{X})^2$ of $\theta_1$ and $\theta_2$, respectively, under $H_0$, an optimal third-order C(α) test for $H_0$ against $H_1$ is given by the critical region

\[ \omega_n^{0*} = \Big\{ (X_1, X_2, \ldots, X_n) : \sqrt{n/6}\, |g_1| > \tau(\alpha/2) \Big\}, \]

where $g_1$ is the usual measure of skewness of the data. Observe that this coincides with the so-called skewness test, which has been used for a long time for testing normality (see e.g., D'Agostino and Pearson, 1973, 1974; Mendell et al., 1993).

Remark 3.2. When $\pi = \tfrac12$, the mixture distribution is symmetric for any value of $\xi$ and, consequently, the above skewness test will fail to detect departure from normality. In this case, we suggest that one should use both skewness and kurtosis tests along the lines of Mardia (1974).

4. Testing in multivariate models

Consider the multivariate models corresponding to (2.1) and (3.3). These are, respectively,

\[ p(x|\mu,\Sigma) = \tfrac12 \{ N_k(x|\mu,\Sigma) + N_k(x|-\mu,\Sigma) \} \tag{4.1} \]


and

\[ p(x|\mu_1,\mu_2,\Lambda) = \pi N_k(x|\mu_1,\Lambda) + (1-\pi) N_k(x|\mu_2,\Lambda), \tag{4.2} \]

where, in (4.1), $\mu$ is an unknown k-component vector and $\Sigma$ is an unknown positive definite symmetric matrix of order $k$. Similarly, in (4.2), $\mu_1$ and $\mu_2$ are unknown vectors of $k$ components and $\Lambda$ is an unknown positive definite symmetric matrix. The mixing proportion $\pi \in (0,1)$ in (4.2) is assumed to be known. In both (4.1) and (4.2), $N_k(\cdot|\cdot,\cdot)$ is the density function of a k-variate normal distribution with appropriate parameters.

For testing the hypothesis of no mixture in each of the models, we can conveniently combine the optimal tests of the univariate problems of Section 3 by the union–intersection (∪–∩) principle of Roy (1953). For this we need the following lemma, which establishes a Cramér–Wold type result for mixtures of normal distributions.

Lemma 4.1. $X \sim \pi N_k(\mu_1,\Sigma_1) + (1-\pi) N_k(\mu_2,\Sigma_2)$ iff $l'X \sim \pi N(l'\mu_1, l'\Sigma_1 l) + (1-\pi) N(l'\mu_2, l'\Sigma_2 l)$ for every $l \ne 0$.

Proof. Follows from standard arguments through characteristic functions.

We are now in a position to construct ∪–∩ tests for 'no mixture' in each of the above models. We consider (4.1); the treatment for the other will follow analogously. Observe that

\[ H_0: \mu = 0 \ \equiv\ \bigcap_{l \ne 0} \{ H_{0l}: l'\mu = 0 \}. \]
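Each component hypothesis is a univariate problem for the projected sample. As a rough sketch of that reduction (assuming NumPy; the function names, the simulated data and the direction `l` are all hypothetical), the two univariate statistics of Section 3 can be evaluated on the projections $l'X_i$ as follows.

```python
import numpy as np


def second_order_stat(z):
    """sqrt(6n) * (m4 / (12 m2^2) - 1/4), with raw moments about the origin."""
    z = np.asarray(z, dtype=float)
    n = z.size
    m2 = np.mean(z**2)
    m4 = np.mean(z**4)
    return np.sqrt(6.0 * n) * (m4 / (12.0 * m2**2) - 0.25)


def third_order_stat(z):
    """sqrt(n/6) * |g1|, where g1 = m3 / m2^(3/2) uses central moments."""
    z = np.asarray(z, dtype=float)
    n = z.size
    d = z - z.mean()
    g1 = np.mean(d**3) / np.mean(d**2) ** 1.5
    return np.sqrt(n / 6.0) * abs(g1)


def project(X, l):
    """Projected sample l'X_i for an (n, k) data matrix X with rows X_i'."""
    return np.asarray(X, dtype=float) @ np.asarray(l, dtype=float)


rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))      # illustrative data generated under 'no mixture'
l = np.array([1.0, -2.0, 0.5])         # an arbitrary nonzero direction
z = project(X, l)

print(second_order_stat(z), third_order_stat(z))
```

Both statistics are unchanged when `l` is multiplied by a nonzero constant, so only the direction of `l` matters when the univariate tests are combined over all $l \ne 0$.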

Write $Z_i = l'X_i$ for each $i = 1, 2, \ldots, n$. Using Lemma 4.1 and noting that the distribution of $m_4(Z)/m_2^2(Z)$ does not depend on $l$ under $H_{0l}$, it follows that the ∪–∩ test rejects $H_0$ if

\[ \sup_{l \ne 0} \sum_{i=1}^{n} \Big( \frac{l'X_i X_i' l}{l'Al} \Big)^2 > c, \]

where $c = c(n)$ generically denotes a constant such that the overall size of the test is $\alpha$, and $A = \sum_{i=1}^{n} X_i X_i'$, which is positive definite with probability 1 provided $n \ge k$ (this is a consequence of Theorem 2.1 of Eaton and Perlman, 1973). But

\[ \sup_{l \ne 0} \sum_{i=1}^{n} \Big( \frac{l'X_i X_i' l}{l'Al} \Big)^2 \le \sum_{i=1}^{n} \sup_{l \ne 0} \Big( \frac{l'X_i X_i' l}{l'Al} \Big)^2 = \sum_{i=1}^{n} \Big( \sup_{l \ne 0} \frac{l'X_i X_i' l}{l'Al} \Big)^2 = \sum_{i=1}^{n} (X_i' A^{-1} X_i)^2 \]


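\[ = \sum_{i=1}^{n} \inf_{l \ne 0} \Big( \frac{l'X_i X_i' l}{l'Al} \Big)^2 \le \inf_{l \ne 0} \sum_{i=1}^{n} \Big( \frac{l'X_i X_i' l}{l'Al} \Big)^2 \le \sum_{i=1}^{n} \inf_{l \ne 0} \Big( \frac{l'X_i X_i' l}{l'Al} \Big)^2. \]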

This implies that

\[ \sup_{l \ne 0} \sum_{i=1}^{n} \Big( \frac{l'X_i X_i' l}{l'Al} \Big)^2 = \sum_{i=1}^{n} (X_i' A^{-1} X_i)^2. \]
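The per-observation step used above, $\sup_{l \ne 0} l'X_i X_i' l / l'Al = X_i' A^{-1} X_i$, is the generalized Rayleigh quotient (Cauchy–Schwarz) identity, with the supremum attained at $l$ proportional to $A^{-1}X_i$. A small numerical check of that single step is sketched below; it assumes NumPy, uses simulated data, and the helper names are illustrative only.

```python
import numpy as np


def ratio(l, x, A):
    """l'x x'l / (l'Al) for a nonzero direction l."""
    return float((l @ x) ** 2 / (l @ A @ l))


def sup_ratio(x, A):
    """Closed form of the supremum over l != 0, namely x'A^{-1}x."""
    return float(x @ np.linalg.solve(A, x))


rng = np.random.default_rng(1)
n, k = 50, 3
X = rng.standard_normal((n, k))        # illustrative sample; rows are X_i'
A = X.T @ X                            # A = sum_i X_i X_i'

x = X[0]
l_star = np.linalg.solve(A, x)         # maximizing direction A^{-1} x
best_random = max(ratio(rng.standard_normal(k), x, A) for _ in range(2000))
print(ratio(l_star, x, A), sup_ratio(x, A), best_random)

# Quadratic forms X_i'A^{-1}X_i for every observation, and the sum of their squares,
# i.e. the right-hand side of the displayed bound.
h = np.einsum("ij,ij->i", X, np.linalg.solve(A, X.T).T)
bound = float(np.sum(h ** 2))
```

Randomly drawn directions never exceed the closed-form value, while `l_star` attains it. The quadratic forms in `h` sum to k, since they are the diagonal of the projection matrix $X A^{-1} X'$.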

Hence the above test reduces to: Reject $H_0$ if

\[ \frac{1}{n} \sum_{i=1}^{n} (X_i' S^{-1} X_i)^2 > c, \tag{4.3} \]

where $S = A/n$. It may be instructive to compare the test statistic given by (4.3) with expression (3.12) for multivariate kurtosis of Mardia (1970). The cut-off point $c$ can be found asymptotically using the limiting distribution (3.20) of Mardia (1970), with $b_{2,p}$ playing the role of the test statistic of (4.3) and $p$ being replaced by $k$.

Proceeding in a similar fashion with model (4.2), one sees that the ∪–∩ test for the null hypothesis $H_0$: 'no mixture' is given by: Reject $H_0$ if

\[ \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \{ (X_i - \bar{X})' S^{*-1} (X_j - \bar{X}) \}^3 > c, \tag{4.4} \]

where, as before, $c$ satisfies the overall size condition and $S^* = (1/n)\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})'$. That $S^*$ is also positive definite with probability 1 follows from, e.g., Johnson and Wichern (1996, pp. 110–111) (see also Theorem 3.4 of Eaton and Perlman, 1973). Recall that the test (4.4) is based on the measure of multivariate skewness $b_{1,p}$ as defined by Mardia (1974) (see also Mardia, 1970). An asymptotic expression for $c$ can now be found from (2.26) of Mardia (1970). An algorithm for calculating $b_{1,p}$ can be found in Mardia and Zemroch (1975).

Note that ∪–∩ tests, in general, are not unique: the form of such a test depends on the choice of tests for the corresponding univariate problem and also on the way in which they are combined. For our examples above, the ∪–∩ tests are quite appropriate because they are obtained from the 'optimal' tests for the univariate problems; the justification for the way they are combined lies in Lemma 4.1. In the light of the above, we have the following.

Theorem 4.1. The tests (4.3) and (4.4) are the appropriate ∪–∩ tests for the hypothesis of 'no mixture' corresponding to the models (4.1) and (4.2), respectively.

Remark 4.1. (i) As in the univariate case, the test (4.4) should be used along with the kurtosis test of Mardia (1974) when $\pi = \tfrac12$.


(ii) The above ∪–∩ tests are derived from the corresponding univariate tests, which are optimal in the sense of having the maximum asymptotic local power among tests of certain classes. No such optimality property, however, is known for these multivariate tests. Nevertheless, (4.3) and (4.4) turn out to be objective procedures for testing multivariate normality if one restricts attention to multivariate models of the forms (4.1) and (4.2). This may provide some insight into the relevance of using skewness or kurtosis tests as satisfactory procedures for testing multivariate normality.

Acknowledgements

I would like to thank Professor A. SenGupta of the Indian Statistical Institute, Calcutta, for some helpful discussions. Thanks are also due to two Referees for their constructive comments, which have led to this improved presentation of the paper over its earlier version.

References

Buehler, W.J., Puri, P.S., 1966. On optimal asymptotic tests of composite hypotheses with several constraints. Z. Wahrsch. Verw. Gebiete 5, 71–88.

Cramér, H., 1946. Mathematical Methods of Statistics. Princeton University Press, Princeton.

D'Agostino, R.B., Pearson, E.S., 1973. Tests for departure from normality. Empirical results for the distributions of $b_2$ and $\sqrt{b_1}$. Biometrika 60, 613–622.

D'Agostino, R.B., Pearson, E.S., 1974. Correction to "Tests for departure from normality. Empirical results for the distributions of $b_2$ and $\sqrt{b_1}$". Biometrika 61, 647.

Eaton, M.L., Perlman, M.D., 1973. The non-singularity of generalized sample covariance matrices. Ann. Statist. 1, 710–717.

Johnson, R.A., Wichern, D.W., 1996. Applied Multivariate Statistical Analysis, 3rd Edition. Prentice-Hall of India Private Limited, New Delhi, India.

Hall, W.J., Mathiason, D.J., 1990. On large-sample estimation and testing in parametric models. Int. Statist. Rev. 58, 77–97.

Mardia, K.V., 1970. Measures of multivariate skewness and kurtosis with applications. Biometrika 57, 519–530.

Mardia, K.V., 1974. Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies. Sankhyā Ser. B 36, 115–128.

Mardia, K.V., Zemroch, P.J., 1975. Measures of multivariate skewness and kurtosis. Appl. Statist. 24, 262–265.

Mendell, N.R., Finch, S.J., Thode Jr., H.C., 1993. Where is the likelihood ratio test powerful for detecting two component normal mixtures? Biometrics 49, 907–915.

Moran, P.A.P., 1970. On asymptotically optimal tests of composite hypotheses. Biometrika 57, 47–55.

Neyman, J., 1959. Optimal asymptotic tests of composite statistical hypotheses. In: Grenander, U. (Ed.), Probability and Statistics, The Harald Cramér Volume. Almqvist and Wiksell, Uppsala, Sweden, pp. 213–234.

Roy, S.N., 1953. On a heuristic method of test construction and its use in multivariate analysis. Ann. Math. Statist. 24, 220–238.

SenGupta, A., Pal, C., 1993. Optimal tests in some applied mixture models with non-regularity problems. In: Basu, S.K., Sinha, B.K. (Eds.), Probability and Statistics. Narosa Publishing House, New Delhi, pp. 151–164.

Spjøtvoll, E., 1968. Most powerful tests for some non-exponential families. Ann. Math. Statist. 39, 772–784.

Yamamoto, W., Shinozaki, N., 2000. On uniqueness of two principal points for univariate location mixtures. Statist. Probab. Lett. 46, 33–42.