On the determination of the number of components in a mixture



Statistics & Probability Letters 38 (1998) 295-298

A. Polymenis, D.M. Titterington

Department of Statistics, University of Glasgow, Glasgow G12 8QQ, Scotland, UK

Received November 1997

Abstract

A modification is proposed to a method of Windham and Cutler (1992) for determining the number of components in a mixture. An information-based eigenvalue is computed that, in theory, becomes zero as soon as too many mixture components are included in the model. In a simulation exercise, the method appears to outperform the basic method of Windham and Cutler (1992), and to be equivalent to the bootstrap likelihood ratio method for large sample sizes. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Information matrices; Mixture distributions; Parametric bootstrap

1. Introduction

The problem of determining the number of components in a mixture of distributions has been of interest for many years, and has resisted completely satisfactory solution. Chapter 5 of Titterington et al. (1985) summarises the state of play at that time and updated reviews are presented in Titterington (1990, 1996). Further recent ideas are described by Lindsay (1995) and, in a Bayesian context, by Robert (1996) and Richardson and Green (1997). Much of the non-Bayesian effort has concentrated on the fact that straightforward use of the Generalized Likelihood Ratio Test is thwarted by the non-regular nature of the mixture problem. A somewhat different approach is that of Windham and Cutler (1992), and it is on a development of their approach that we briefly report in this note.

In Section 2 we outline the rationale behind Windham and Cutler's method, along with our modification; in Section 3 we describe the bootstrap likelihood ratio approach; and in Section 4 we report numerical evidence that speaks in support of the revised approach as compared to Windham and Cutler's method, and highlights the equivalence of the revised approach with the bootstrap likelihood ratio approach for reasonably large sample sizes.

2. Description of the methodology

Suppose $L(\theta)$ denotes the log-likelihood associated with data from a sample of size $n$ from a mixture distribution, and that $I = I(\theta)$ denotes the corresponding (expected) Fisher information matrix. Let $I_c = I_c(\theta)$ denote the information matrix associated with the corresponding complete data, which would be given by the available data from the mixture along with the observations' indicators of membership of the various mixture components. Windham and Cutler (1992) concentrate on the magnitude of the smallest eigenvalue of the information ratio matrix, $I_c^{-1} I$. Suppose that we denote this eigenvalue by $m_k = m_k(\theta)$, assuming that we are fitting a mixture with $k$ components. Then Windham and Cutler's (1992) 'rule' is to choose the maximizer over $k$ of $m_k(\hat{\theta}_k)$, where $\hat{\theta}_k$ denotes the maximum likelihood estimator of the parameters associated with a $k$-component model. The motivation is that, heuristically, a large value of the smallest eigenvalue suggests a good clustering of the data, whereas a small value does not. Windham and Cutler (1992) carry out a simulation study using bivariate Normal mixtures that measures the effectiveness of their method and compares it favourably with the use of Akaike's AIC and the partition coefficient of Bezdek (1981).

Our modification of the method is motivated by the remark, made towards the end of their paper, that as soon as a mixture with too many components is fitted the Fisher information matrix $I$ will be close to singular, with the result that the corresponding $m_k$ will be close to zero. Our idea is therefore to detect the smallest value of $k$ for which $m_k$ is 'close to zero', and select $k - 1$. Table 3 of Windham and Cutler (1992), which evaluates theoretical values of $m_k$ using numerical integration, provides considerable hope that our method will be useful in practice, provided we can quantify at what point an observed value of $m_k$ is close to zero. In general, there seems no obvious way to develop implementable theory on this point, and we adopt a Monte-Carlo approach in the following computational scheme:
1. for 2~
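The behaviour of the information ratio, and the difference between the two selection rules, can be illustrated with a minimal sketch. It is not the computation used in this note: it assumes a univariate two-component normal mixture in which only the mixing weight $p$ is unknown, so that the information ratio reduces to the scalar $I(p)/I_c(p)$ with $I(p) = \int (f_1 - f_2)^2 / f \, dx$ and $I_c(p) = 1/\{p(1-p)\}$. The function names, the cut-off dictionary and the fallback used when no $m_k$ is small are all hypothetical; in particular the cut-offs stand in for whatever the Monte-Carlo calibration would supply.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def information_ratio(p, mean1, mean2, sd=1.0, n_steps=4000):
    """Scalar information ratio I(p) / I_c(p) when only the weight p is unknown.

    Illustrative special case: means and standard deviation are treated as known.
    """
    lo = min(mean1, mean2) - 10.0 * sd
    hi = max(mean1, mean2) + 10.0 * sd
    h = (hi - lo) / n_steps
    observed_info = 0.0
    for i in range(n_steps + 1):
        x = lo + i * h
        f1 = normal_pdf(x, mean1, sd)
        f2 = normal_pdf(x, mean2, sd)
        f = p * f1 + (1.0 - p) * f2
        integrand = (f1 - f2) ** 2 / f if f > 0.0 else 0.0
        weight = 0.5 if i in (0, n_steps) else 1.0  # trapezoidal rule
        observed_info += weight * integrand * h
    complete_info = 1.0 / (p * (1.0 - p))
    return observed_info / complete_info


def windham_cutler_rule(m):
    """Choose the k that maximizes m_k; m maps k to the fitted eigenvalue."""
    return max(m, key=m.get)


def modified_rule(m, cutoff):
    """Find the smallest k with m_k at or below its cut-off and select k - 1.

    The cut-offs are hypothetical inputs. Returning the largest fitted k when
    no m_k is small is an assumption of this sketch.
    """
    for k in sorted(m):
        if m[k] <= cutoff[k]:
            return k - 1
    return max(m)


# Well-separated components give a ratio near one, coincident ones give zero.
ratio_separated = information_ratio(0.5, 0.0, 8.0)
ratio_coincident = information_ratio(0.5, 0.0, 0.0)

# Hypothetical eigenvalues for k = 2, 3, 4: the two rules can disagree.
m_values = {2: 0.30, 3: 0.25, 4: 0.01}
k_wc = windham_cutler_rule(m_values)
k_mod = modified_rule(m_values, {2: 0.05, 3: 0.05, 4: 0.05})
```

With these made-up eigenvalues the maximizer rule picks $k = 2$, whereas the modified rule notices that $m_4$ has collapsed and picks $k = 3$.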
3. The bootstrap likelihood ratio approach

The idea of bootstrapping the likelihood ratio statistic has already been discussed in the literature (e.g. McLachlan, 1987; Feng and McCulloch, 1996). Denoting by $T_{k+1}(\hat{\theta}) = 2[L(\hat{\theta}_{k+1}) - L(\hat{\theta}_k)]$ the likelihood ratio statistic for testing between $k$ and $k + 1$ components, we use this idea here in a procedure similar to the one used in the previous section, in the following way:
1. for l~
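A generic parametric-bootstrap test of this kind can be sketched as follows. This is not a transcription of the procedure above: the Monte-Carlo p-value formula, the significance level, the sequential stopping rule and all function names are assumptions of the sketch. Fitting the mixtures is delegated to a hypothetical user-supplied callable, `simulate_null_statistic`, which should draw a sample from the fitted $k$-component model, refit with $k$ and $k + 1$ components, and return the resulting likelihood ratio statistic.

```python
import random


def lr_statistic(loglik_k, loglik_k_plus_1):
    """Likelihood ratio statistic 2[L(theta_{k+1}) - L(theta_k)]."""
    return 2.0 * (loglik_k_plus_1 - loglik_k)


def bootstrap_p_value(observed, simulate_null_statistic, n_boot=99, seed=0):
    """Monte-Carlo p-value of the observed statistic under the k-component null.

    simulate_null_statistic(rng) is a hypothetical callable returning one
    bootstrap replicate of the statistic.
    """
    rng = random.Random(seed)
    replicates = [simulate_null_statistic(rng) for _ in range(n_boot)]
    exceed = sum(1 for t in replicates if t >= observed)
    return (exceed + 1) / (n_boot + 1)


def select_k(p_values, alpha=0.05, k_min=1):
    """Sequential choice: p_values[i] tests k_min + i against k_min + i + 1.

    Returns the smallest k whose test is not rejected at level alpha, or the
    largest k considered if every test rejects (an assumption of this sketch).
    """
    for i, p in enumerate(p_values):
        if p > alpha:
            return k_min + i
    return k_min + len(p_values)


# Usage with made-up log-likelihoods and p-values.
t_obs = lr_statistic(-210.0, -200.0)
chosen = select_k([0.01, 0.02, 0.60])
```

Using `n_boot = 99` makes the smallest attainable p-value 0.01, which is a common convention for Monte-Carlo tests rather than something prescribed here.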

Table 1
Number of occasions on which the method of Windham and Cutler selects particular numbers of mixture components

            Number of components selected
$\sigma$      2      3      4      5
1.5          55     42      3      0
1.33         48     47      5      0
1.0          38     62      0      0
0.67         25     75      0      0

Table 2
Number of occasions on which the modified method selects particular numbers of mixture components

            Number of components selected
$\sigma$      2      3      4      5
1.5          45     54      1      0
1.33         29     67      4      0
1.0           3     94      3      0
0.67          0     94      6      0

4. Numerical results

Our numerical experiments were based on the example used by Windham and Cutler (1992). Samples of size $n = 100$ were drawn from equally weighted mixtures of 3 bivariate circular normal distributions. The means were 4 units apart, forming an equilateral triangle. The parameter $\theta$ consisted of the means and the mixing weights; the standard deviation $\sigma$ associated with the circular component densities was assumed known, and experiments were carried out for $\sigma$ = 1.5, 1.33, 1.0 and 0.67. For each case, 100 replications were carried out.

The results are summarised in Tables 1-3. Table 1 shows the results of applying Windham and Cutler's (1992) rule and Table 2 summarises our own rule. The modified method seems to show a distinct improvement. Note that the results in Table 1 are somewhat different from those in Table 1 of Windham and Cutler (1992). Our version of their results is included because they correspond to the same datasets as those used in Table 2. Table 3 gives the results when the bootstrap likelihood ratio rule is applied. For the sample size used in this example our method gives similar results to the bootstrap likelihood ratio for well-separated components.

We also performed simulations, using the same example as before and taking $n = 300$. In this case, our method detected the correct number of components on 97 occasions for $\sigma$ = 1.50 and 98 occasions for $\sigma$ = 1.33. It seems, thus, that for this example our rule performs as well as the bootstrap likelihood ratio.
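The simulation design described in this section can be reproduced along the following lines. This is a minimal sketch rather than the code used for the experiments: the placement of the triangle in the plane, the function names and the seeding are assumptions, and only the standard library is used.

```python
import math
import random


def triangle_means(side=4.0):
    """Vertices of an equilateral triangle with the given side length.

    The orientation and origin are arbitrary choices made for this sketch.
    """
    return [(0.0, 0.0), (side, 0.0), (side / 2.0, side * math.sqrt(3.0) / 2.0)]


def sample_mixture(n, sigma, seed=None):
    """Draw n points from an equally weighted mixture of 3 circular normals."""
    rng = random.Random(seed)
    means = triangle_means()
    data = []
    for _ in range(n):
        mean_x, mean_y = rng.choice(means)  # equal mixing weights of 1/3
        data.append((rng.gauss(mean_x, sigma), rng.gauss(mean_y, sigma)))
    return data


# One dataset per value of sigma considered in the experiments.
datasets = {sigma: sample_mixture(100, sigma, seed=1)
            for sigma in (1.5, 1.33, 1.0, 0.67)}
```

Repeating the call with 100 different seeds for each value of $\sigma$ would give replicated datasets of the kind summarised in the tables.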

5. Discussion

Our results reinforce our belief that the modified method should be more reliable than Windham and Cutler's (1992) procedure of simply identifying $k$ as the maximizer of $m_k$. Clearly there are arbitrary elements of the Monte-Carlo testing procedure, and it would be nice to be able to derive some asymptotic theory that would remove the need for Monte-Carlo work. On the other hand, the bootstrap likelihood ratio has some obvious advantages over our method: it takes into account the single-component densities as well, and it performs better for small sample sizes. However, if we 'know' that the data come from a true mixture (as considered by Windham and Cutler), and if the sample size in hand is not small, then we can be confident that our method will perform as well as the bootstrap likelihood ratio.

Table 3
Number of occasions on which the bootstrap likelihood ratio method selects particular numbers of mixture components

            Number of components selected
$\sigma$      1      2      3      4      5
1.50          0      0     95      5      0
1.33          0      0     96      4      0
1.00          0      0     99      0      1
0.67          0      0     95      5      0
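The comparison between the three rules can be condensed into correct-selection rates. The short sketch below simply transcribes, from Tables 1-3, the number of replications out of 100 in which each rule selected the true value $k = 3$ at $n = 100$; the variable and function names are illustrative only.

```python
# Number of replications (out of 100) selecting k = 3, for
# sigma = 1.5, 1.33, 1.0 and 0.67, transcribed from Tables 1-3.
SIGMAS = [1.5, 1.33, 1.0, 0.67]
CORRECT_COUNTS = {
    "Windham and Cutler (Table 1)": [42, 47, 62, 75],
    "modified method (Table 2)": [54, 67, 94, 94],
    "bootstrap likelihood ratio (Table 3)": [95, 96, 99, 95],
}
REPLICATIONS = 100


def correct_rates(counts, replications=REPLICATIONS):
    """Proportion of replications in which the true k was selected."""
    return [c / replications for c in counts]


def mean_rate(counts, replications=REPLICATIONS):
    """Correct-selection rate averaged over the four values of sigma."""
    return sum(counts) / (replications * len(counts))


for method, counts in CORRECT_COUNTS.items():
    per_sigma = ", ".join(
        f"sigma={s}: {r:.2f}" for s, r in zip(SIGMAS, correct_rates(counts))
    )
    print(f"{method}: {per_sigma}; mean {mean_rate(counts):.4f}")
```

The averages are only a rough summary, since the four values of $\sigma$ represent very different degrees of component separation.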

References

Bezdek, J.C., 1981. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York.
Feng, Z.D., McCulloch, C.E., 1996. Using bootstrap likelihood ratios in finite mixture models. J. Roy. Statist. Soc. B 58, 609-617.
Lindsay, B.G., 1995. Mixture Models: Theory, Geometry and Applications. IMS Lecture Notes - Monograph Series, Hayward, CA.
McLachlan, G.J., 1987. On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. Applied Statistics 36, 318-324.
Richardson, S., Green, P.G., 1997. On Bayesian analysis of mixtures with an unknown number of parameters, with discussion. J. Roy. Statist. Soc. B 59, 731-792.
Robert, C., 1996. Mixture of distributions: inference and estimation. In: Gilks, W., Richardson, S., Spiegelhalter, D.J. (Eds.), Practical Markov Chain Monte Carlo. Chapman & Hall, London, pp. 441-464.
Titterington, D.M., 1990. Some recent results in the analysis of mixture distributions. Statistics 21, 619-641.
Titterington, D.M., 1996. Mixture distributions (update). In: Banks, D., Johnson, N.L., Kotz, S., Read, C.B. (Eds.), Encyclopedia of Statistical Sciences Update, Volume 1. Wiley, New York, pp. 399-407.
Titterington, D.M., Smith, A.F.M., Makov, U.E., 1985. Statistical Analysis of Finite Mixture Distributions. Wiley, Chichester.
Windham, M.P., Cutler, A., 1992. Information ratios for validating mixture analyses. J. Am. Statist. Assoc. 87, 1188-1192.