Test for homogeneity of several populations by stochastic complexity


Journal of Statistical Planning and Inference 53 (1996) 133-151, Elsevier

Guoqi Qian, R.P. Gupta *, George Gabor

Department of Mathematics, Statistics and Computing Sciences, Dalhousie University, Halifax, N.S. B3H 3J5, Canada

Received 15 September 1994; revised 17 May 1995

Abstract

We use the concepts of stochastic complexity, description length, model selection and nonparametric histogram density to develop a new method for testing homogeneity of several populations. The test procedure depends only on the data and the smoothing parameters, which are shown to be readily chosen according to the criterion of minimum stochastic complexity. The procedure is shown to have asymptotic power 1. Two examples are given to illustrate its practical use. Justifications of the test procedure and comparison with related methods are also presented through a simulation study.

AMS Classifications: 62G10; 62G07

Keywords: Stochastic complexity; Test of homogeneity; Histogram density estimation; Minimum description length; Model selection

1. Introduction

One of the basic problems in statistical inquiry is the two-sample problem of testing the equality of two distributions and, more generally, the k-sample problem of testing the homogeneity of the distributions of several populations (k > 2). A typical example, commonly referred to as the one-way layout problem, is the comparison of several treatments with a control, where the hypothesis of no treatment effect is tested against the alternative of at least one effect. In the parametric setting, when normality of the populations is assumed, an appropriate test for the equality of the means of two populations is based on Student's t. However, when approximate normality is suspected but not fully trusted, one may replace the t-test by its permutation analogue, which can again be approximated by a t-test. For the case of homogeneity of means of more than two populations, the appropriate F-test is used, which is based on the assumption of normality and a common variance of the populations; the latter is tested by some more or less robust test such as the classic Bartlett's test. For the case where the assumption of a common variance cannot be maintained, the so-called generalized Behrens-Fisher problem, other tests have been proposed. For a review, see Lehmann (1986).

To achieve robustness against the violation of some of the assumptions of the parametric tests one may consider nonparametric alternatives. Usually a distribution-free statistic, which is based on the ranks of the observations and satisfies some invariance principles, is constructed to test homogeneity. The two most familiar such statistics are the two-sample Wilcoxon test and the Kruskal-Wallis test. The theory of these and related rank tests can be found in Hájek and Šidák (1967), Lehmann (1975), Randles and Wolfe (1979), Hettmansperger (1984), and others.

All the tests cited above require that the different populations have the same distributional shape, differing only in the location or the scale parameter, which sometimes can be explained by an additive or multiplicative treatment effect or both. But seldom are these claims statistically tested. Moreover, while such tests are sensitive to a location or scale difference, they may not detect differences of other types. The most commonly employed Smirnov test (see, for example, Conover, 1971) is consistent against all types of differences that may exist among the k populations. In this paper we will argue that the principles of stochastic complexity and minimum description length (MDL) have important roles to play in testing the homogeneity of the k populations against any type of difference among them.

Suppose we are given a set of data consisting of k independent random samples: $X_{11}, X_{12}, \dots, X_{1n_1}$ of size $n_1$, $X_{21}, X_{22}, \dots, X_{2n_2}$ of size $n_2$, ..., and $X_{k1}, X_{k2}, \dots, X_{kn_k}$ of size $n_k$, $k \ge 2$, and all the observations are independent. Let $F_1(x), F_2(x), \dots, F_k(x)$ represent, respectively, their unknown population distribution functions and $f_1(x), f_2(x), \dots, f_k(x)$ their corresponding density functions. We are interested in testing whether these k distributions are identical against the alternative that some kind of difference exists among them.

Our test procedure works as follows. First, an idealized code length, the stochastic complexity, based upon the class of histogram density estimators with equal-width bins is computed for each independent random sample; when minimized, it gives the optimal number of bins with the associated density estimator and the proper measurement of the information contained in each sample (Hall and Hannan, 1988; Rissanen et al., 1992). Second, the same kind of stochastic complexity is computed for the pooled sample; when minimized, it gives the estimator of the associated mixed density. Finally, a comparison is made between the stochastic complexity of the pooled sample and the sum of the stochastic complexities of all the samples: if the former is smaller, the hypothesis of homogeneity of the k distributions is accepted, and the hypothesis is rejected otherwise.

The novelty of our approach lies in using the principle of minimum description length and stochastic complexity instead of the classic methods which employ the empirical distribution. A major drawback of the commonly used classic tests is that they may be applied only to samples of equal sizes. This is because tables for the case of unequal sample sizes are unavailable and must be obtained individually in each case. From a practical standpoint, however, the required calculations could even overtax the capacity of a computer. Our proposed method removes this difficulty, because (a) it does not require knowledge of the distribution of the test statistic, and (b) the procedure is justified for all continuous distributions and all sample sizes. Furthermore, with this new method one does not need to choose the level of significance of the test, since it becomes defined automatically.

* Corresponding author. Tel.: 902-494-2572; Fax: 902-494-5130.

0378-3758/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved. SSDI 0378-3758(95)00130-1

2. The test procedure

Let $(X_{11}, \dots, X_{1n_1})$, $(X_{21}, \dots, X_{2n_2})$, ..., and $(X_{k1}, \dots, X_{kn_k})$ (abbreviated as $X_1, X_2, \dots, X_k$) be k independent random samples with sizes $n_1, n_2, \dots, n_k$, $\sum_{i=1}^{k} n_i = n$, and unknown population density functions $f_1(x), f_2(x), \dots, f_k(x)$ respectively. The problem is that of testing the hypothesis

$$H_0: f_1 = f_2 = \cdots = f_k \tag{1a}$$

against

$$H_a: \text{at least two of them are not equal.} \tag{1b}$$

We begin the analysis by first establishing the information contained in each of these k samples. If the densities $f_1, f_2, \dots, f_k$ are known, the Shannon entropy (if it exists)

$$-\sum_{j=1}^{n_i} \int f_i(X_{ij}) \log f_i(X_{ij})\, dX_{ij} = -n_i \int f_i \log f_i, \qquad i = 1, \dots, k,$$

will give us the optimal mean code length for each sample. (In this paper all logarithms are with base 2.) In this sense,

$$-\sum_{i=1}^{k} \sum_{j=1}^{n_i} \int f_i(X_{ij}) \log f_i(X_{ij})\, dX_{ij} = -\sum_{i=1}^{k} n_i \int f_i \log f_i$$

gives us a measurement of the information contained in the k samples. Suppose now that we, by mistake, ignore the differences that may exist among the k density functions and encode the k samples of the data as if they were from a single information source. Then the mean code length is

$$-\sum_{i=1}^{k} \sum_{j=1}^{n_i} \int f_i(X_{ij}) \log f_{\mathrm{mix}}(X_{ij})\, dX_{ij} = -n \int f_{\mathrm{mix}} \log f_{\mathrm{mix}},$$

where $f_{\mathrm{mix}} = \sum_{i=1}^{k} (n_i/n) f_i$ is a mixture density of $f_1, \dots, f_k$.
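As a small numerical illustration of the two mean code lengths just defined, the sketch below uses discrete distributions in place of the densities (an assumption made only to keep the arithmetic exact; the paper works with continuous densities). The function names and the two example distributions are hypothetical. With unequal distributions, coding every sample with the mixture costs more bits on average than coding each sample with its own distribution; with equal distributions the two costs coincide.

```python
import math

def entropy_bits(p):
    """Shannon entropy (base 2) of a discrete distribution p."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def cross_entropy_bits(p, q):
    """Mean code length per symbol when data from p are coded with q."""
    return -sum(a * math.log2(b) for a, b in zip(p, q) if a > 0)

def mean_code_lengths(dists, sizes):
    """Return (separate, pooled) mean code lengths in bits.

    separate: sum_i n_i * H(f_i)          (each sample coded with its own f_i)
    pooled:   sum_i n_i * H(f_i, f_mix)   (every sample coded with the mixture)
    """
    n = sum(sizes)
    cells = range(len(dists[0]))
    mix = [sum(ni / n * p[j] for p, ni in zip(dists, sizes)) for j in cells]
    separate = sum(ni * entropy_bits(p) for p, ni in zip(dists, sizes))
    pooled = sum(ni * cross_entropy_bits(p, mix) for p, ni in zip(dists, sizes))
    return separate, pooled

# Hypothetical three-cell distributions and sample sizes.
f1 = [0.5, 0.25, 0.25]
f2 = [0.25, 0.25, 0.5]
separate, pooled = mean_code_lengths([f1, f2], [15, 12])
same, same_pooled = mean_code_lengths([f1, f1], [15, 12])
print(separate, pooled)    # pooled is the larger of the two
print(same, same_pooled)   # equal up to rounding
```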


The inequality

$$-\sum_{i=1}^{k} n_i \int f_i \log f_i \le -n \int f_{\mathrm{mix}} \log f_{\mathrm{mix}},$$

which holds due to the convexity of $x \log x$, i.e.

$$-\sum_{i=1}^{k} \sum_{j=1}^{n_i} \int f_i(X_{ij}) \log f_i(X_{ij})\, dX_{ij} \le -\sum_{i=1}^{k} \sum_{j=1}^{n_i} \int f_i(X_{ij}) \log f_{\mathrm{mix}}(X_{ij})\, dX_{ij}, \tag{2}$$

where equality holds if and only if all the densities $f_1, \dots, f_k$ are equal (except on a set of measure zero), suggests that if the data are encoded in two distinct ways, each sample separately as well as a pooled sample, and the resulting code length for the latter is found larger than that of the former, then the conclusion that the null hypothesis $H_0$ is violated may be warranted. Indeed, this makes sense because, following the arguments of Shannon (1948) and Rissanen (1989), the optimal mean code length per symbol is a bound which can only rarely be beaten by any other code length per symbol; refer to the noiseless coding theorem due to Shannon (1948) and Propositions 1 and 2 of Dawid (1992).

The principle of MDL and the notion of stochastic complexity (Rissanen, 1989) show the way to estimate the optimal encoding of the data. Suppose the unknown densities $f_i$ belong to a parametric or a nonparametric model class $\mathcal{M}$. To achieve an optimal encoding of a given sample, say $X_i$, we need to select a density $\hat f_i$ in $\mathcal{M}$ based on which the resulting length of the code for $X_i$, $-\log \prod_{j=1}^{n_i} \hat f_i(X_{ij})$, is as short as possible while at the same time $L(\hat f_i)$, the code length for encoding $\hat f_i$ itself, is not too long. In other words, we select a density $\hat f_i$ for $X_i$ so that the resulting two-part code length achieves the following:

$$\min_{f_i \in \mathcal{M}} \{-\log f_i(X_i) + L(f_i)\}, \qquad i = 1, 2, \dots, k. \tag{3}$$

Similarly, if we combine the k samples together and encode the pooled sample, the resulting optimal code length will be

$$\min_{f_1, \dots, f_k \in \mathcal{M}} \Big\{-\log \prod_{i=1}^{k} f_{\mathrm{mix}}(X_i) + L(f_{\mathrm{mix}})\Big\}. \tag{4}$$

There are some difficulties in performing the minimizations (3) and (4). To overcome these, we apply the so-called stochastic complexity-based nonparametric histogram density estimator and compute the associated minimum description length of the data. Suppose the data of each sample $X_i$ fall in the interval $[s_i, t_i]$, and the data of the combined k samples fall in the interval $[s, t]$, where $s = \min\{s_i, 1 \le i \le k\}$ and $t = \max\{t_i, 1 \le i \le k\}$. Let $\mathcal{M}_1$ be the class of histogram densities with equal-width bins, on which we shall demonstrate the minimizations (3) and (4). If we partition $[s_i, t_i]$ into $m_i$ congruent subintervals $C_{ij}$ for each sample, for $1 \le j \le m_i$ and $1 \le i \le k$, our histogram density estimator $f_i(x)$ will take the value $(m_i/r_i) p_{ij}$ when $x \in C_{ij}$, where $r_i = t_i - s_i$ is the range, $p_{ij} \ge 0$ and $\sum_{j=1}^{m_i} p_{ij} = 1$, $i = 1, \dots, k$. As in Hall and Hannan (1988) and Rissanen et al. (1992), we assume the uniform prior $\pi(p_i) = (m_i - 1)!$ on the simplex defined by $p_i = (p_{i1}, \dots, p_{im_i})$ and evaluate the marginal likelihood of the sample $X_i$:

$$l_i(X_i; s_i, r_i, m_i) = \int \prod_{j=1}^{n_i} f_i(X_{ij})\, \pi(p_i)\, dp_i = \Big(\frac{m_i}{r_i}\Big)^{n_i} (m_i - 1)! \int \prod_{j=1}^{m_i} p_{ij}^{n_{ij}}\, dp_i = \Big(\frac{m_i}{r_i}\Big)^{n_i} \frac{(m_i - 1)! \prod_{j=1}^{m_i} n_{ij}!}{(n_i + m_i - 1)!}, \tag{5}$$

where $n_{ij}$ denotes the number of data points in sample $X_i$ that fall in the subinterval $C_{ij}$. Then the stochastic complexity, i.e. the abstract shortest code length for $X_i$ relative to the set of all histograms with fixed $s_i$, $r_i$ and $m_i$, is given by

$$I(X_i \mid s_i, r_i, m_i) = -\log l_i(X_i; s_i, r_i, m_i) = n_i \log \frac{r_i}{m_i} + \log \binom{n_i}{n_{i1}, \dots, n_{im_i}} + \log \binom{n_i + m_i - 1}{n_i}, \qquad i = 1, \dots, k, \tag{6}$$

where

$$\binom{n_i}{n_{i1}, \dots, n_{im_i}} = \frac{n_i!}{n_{i1}! \cdots n_{im_i}!} \quad \text{and} \quad \binom{n_i + m_i - 1}{n_i} = \frac{(n_i + m_i - 1)!}{n_i!\,(m_i - 1)!}.$$
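Formula (6) can be evaluated directly from the bin counts. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, factorials are computed through `math.lgamma` to avoid overflow, and an observation equal to the right endpoint $s_i + r_i$ is assigned to the last bin (an assumption about how bin boundaries are handled).

```python
import math

def log2_factorial(k):
    """log2(k!) computed via the log-gamma function."""
    return math.lgamma(k + 1) / math.log(2)

def bin_counts(x, s, r, m):
    """Counts of x in m equal-width bins on [s, s + r]."""
    counts = [0] * m
    for v in x:
        j = min(int((v - s) / r * m), m - 1)  # right endpoint goes to the last bin
        counts[j] += 1
    return counts

def stochastic_complexity(x, s, r, m):
    """Code length (6) in bits for sample x with fixed s, r (> 0) and m."""
    n = len(x)
    counts = bin_counts(x, s, r, m)
    log_multinomial = log2_factorial(n) - sum(log2_factorial(c) for c in counts)
    log_binomial = (log2_factorial(n + m - 1) - log2_factorial(n)
                    - log2_factorial(m - 1))
    return n * math.log2(r / m) + log_multinomial + log_binomial

# Two points on [0, 1] with two bins: marginal likelihood (5) is 2/3,
# so the code length is -log2(2/3) = log2(1.5).
print(stochastic_complexity([0.0, 1.0], 0.0, 1.0, 2))
```

With a single bin the last two terms vanish and the code length reduces to $n_i \log r_i$, the cost of coding under a uniform density on the range.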

By the same argument, the stochastic complexity for the pooled sample $X = (X_1, X_2, \dots, X_k)$ relative to the set of all histograms, given $s$, $r$ ($= t - s$) and $m$ equal-width bins, is

$$I(X \mid s, r, m) = n \log \frac{r}{m} + \log \binom{n}{n_{\cdot 1}, \dots, n_{\cdot m}} + \log \binom{n + m - 1}{n}. \tag{7}$$

Here, $n_{\cdot i}$ denotes the number of data points in $X$ that fall in the ith bin of the partitioned interval $[s, t]$. Note that if $m_i > n_i$ (or $m > n$), there will always be some subintervals containing no observation. To describe the employed model we would have to spend some code length on encoding these unnecessary subintervals. This is hardly reasonable. Therefore we restrict the class of histograms such that

$$1 \le m_i \le n_i \quad (i = 1, \dots, k) \qquad \text{and} \qquad 1 \le m \le n.$$
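Under this restriction the number of bins can be found by exhaustive search over $1 \le m \le n$. The sketch below minimizes only the data term (6) and deliberately omits the code length of the parameters discussed next, so the bin count it returns need not agree with the values reported in Section 3. The function names are hypothetical; the data are the first sample of the second example in Section 3.

```python
import math

def sc_bits(x, m):
    """Formula (6) in bits, taking s = min(x) and r = max(x) - min(x)."""
    n, s = len(x), min(x)
    r = max(x) - s
    counts = [0] * m
    for v in x:
        counts[min(int((v - s) / r * m), m - 1)] += 1

    def lf(k):
        return math.lgamma(k + 1) / math.log(2)  # log2(k!)

    return (n * math.log2(r / m)
            + lf(n) - sum(lf(c) for c in counts)    # log multinomial coefficient
            + lf(n + m - 1) - lf(n) - lf(m - 1))    # log binomial coefficient

def best_bin_count(x):
    """Search 1 <= m <= n for the m that minimizes sc_bits (data term only)."""
    return min(range(1, len(x) + 1), key=lambda m: sc_bits(x, m))

x1 = [7.362, 8.876, 5.219, 10.506, 12.590, 9.552, 10.203, 11.144,
      27.296, 3.105, 8.995, 4.955, 4.065, 10.822, 11.097]
m_hat = best_bin_count(x1)
print(m_hat, sc_bits(x1, m_hat))
```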


For the minimizations (3) and (4) we still need the code lengths required to encode the parameter sets $\{s_i, r_i, m_i, i = 1, \dots, k\}$ and $\{s, r, m\}$, which will be combined, respectively, with (6) and (7) to provide us the data-based two-part code lengths corresponding to (3) and (4). Since the optimal $m_i$ and $m$ usually depend on the sample size, the code lengths needed to encode the parameters could be quite comparable to the stochastic complexities (6) and (7), especially for small and medium sized samples, which would reduce the importance of the stochastic complexity in describing the random structure of the data. However, we can avoid such an unpleasant situation by truncating the number of decimal digits kept in the parameters, and encode instead the resulting $\{[s_i/10^d], [r_i/10^d], [m_i/10^d], i = 1, \dots, k\}$ and $\{[s/10^d], [r/10^d], [m/10^d]\}$ as well as the optimized precision $d$, where $[y]$ denotes the nearest integer to $y$. (In the following we shall use $\tilde y$ to denote $[y/10^d]$.) This means that the difference between each parameter and the values within its neighborhood of width $10^d$ is ignored.

There is a natural restriction for the precision $d$, namely, that it ranges from 'minus the largest number of effective digits after the decimal point of the observations' to 'one less than the largest number of digits before the decimal point of the observations'. For example, if the measurements of a given sample are all rounded to three decimals and the largest absolute value of the sample is 347.635, then $d$ will be restricted to the interval $[-3, 2]$.

Rissanen (1989, Ch. 2) demonstrates that for a set of integers $\{\theta_1, \theta_2, \dots, \theta_b\}$ a prefix code can be found with about

$$L_1(\theta_1, \theta_2, \dots, \theta_b) = \log 2.865 + \log^*(\theta) + \log \frac{(\theta + b)!}{\theta!\,(b - 1)!} + \log \frac{(b + 1)!}{b_+!\,(b - b_+)!} \tag{8}$$

bits. Here $\theta = \sum_i |\theta_i|$, $b_+$ is the number of nonnegative items in $\{\theta_1, \dots, \theta_b\}$, and $\log^*(n) = \log n + \log \log n + \cdots$, where the sum includes all the positive iterates.
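Two ingredients of this paragraph are easy to make concrete: the iterated logarithm $\log^*$ and the admissible range of the precision $d$. The sketch below is illustrative only. The function names are hypothetical, observations are passed as decimal strings so that the recorded number of decimals is preserved, and "effective digits" is read as the recorded decimals (trailing zeros included), which is an assumption.

```python
import math
from decimal import Decimal

def log_star(n):
    """log*(n) = log n + log log n + ..., summing only the positive iterates (base 2)."""
    total, term = 0.0, math.log2(n)
    while term > 0:
        total += term
        term = math.log2(term)
    return total

def precision_range(observations):
    """Admissible range [d_min, d_max] of d for observations given as decimal strings."""
    after = before = 0
    for text in observations:
        _, digits, exponent = Decimal(text).as_tuple()
        after = max(after, -exponent if exponent < 0 else 0)   # digits after the point
        before = max(before, max(len(digits) + exponent, 1))   # digits before the point
    return -after, before - 1

print(log_star(16))                                       # 4 + 2 + 1
print(precision_range(["347.635", "12.000", "-0.500"]))   # the example in the text
```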
Now we are in a position to obtain a data-based expression of the idealized code length for a sample, namely,

$$\min_{m_i, d} \{I(X_i \mid s_i, r_i, m_i) + L_1(\tilde s_i, \tilde r_i, \tilde m_i) + |\log 10^d|\} \tag{9}$$

for the sample $X_i$, $i = 1, \dots, k$, and

$$\min_{m, d} \{I(X \mid s, r, m) + L_1(\tilde s, \tilde r, \tilde m) + |\log 10^d|\} \tag{10}$$

for the pooled sample $X$. Therefore, in the case of unequal densities $f_1, f_2, \dots, f_k$, the idealized code length for encoding $X_1, \dots, X_k$ is

$$\min_{m_1, \dots, m_k, d} \Big\{\sum_{i=1}^{k} I(X_i \mid s_i, r_i, m_i) + L_1(\tilde s_1, \tilde r_1, \tilde m_1, \dots, \tilde s_k, \tilde r_k, \tilde m_k) + |\log 10^d|\Big\}$$

$$= \min_{m_1, \dots, m_k, d} \Big\{\sum_{i=1}^{k} n_i \log \frac{r_i}{m_i} + \sum_{i=1}^{k} \log \binom{n_i}{n_{i1}, \dots, n_{im_i}} + \sum_{i=1}^{k} \log \binom{n_i + m_i - 1}{n_i} + \log 2.865 + \log^*(\theta_k) + \log \frac{(\theta_k + 3k)!}{\theta_k!\,(3k - 1)!} + \log \frac{(3k + 1)!}{(3k)_+!\,(3k - (3k)_+)!} + |\log 10^d|\Big\}, \tag{11}$$

with $\theta_k = \sum_{i=1}^{k} (|\tilde s_i| + \tilde r_i + \tilde m_i)$,

where $(3k)_+$ indicates the number of nonnegative values in $\{\tilde s_1, \tilde r_1, \tilde m_1, \tilde s_2, \tilde r_2, \tilde m_2, \dots, \tilde s_k, \tilde r_k, \tilde m_k\}$. Note that in (11) we use the shorter $L_1(\tilde s_1, \tilde r_1, \tilde m_1, \dots, \tilde s_k, \tilde r_k, \tilde m_k)$ instead of the longer $\sum_{i=1}^{k} L_1(\tilde s_i, \tilde r_i, \tilde m_i)$.

From the above discussion we could conclude that, if the alternative hypothesis $H_a$ is true, the code length in (11) should be less than that in (10). This is because the encoding procedure corresponding to (10) is based on the wrong model stated in $H_0$. If, however, the null hypothesis is true, then (2) implies that both encoding procedures corresponding to (10) and (11) should give virtually the same code length. The expression (10), on the other hand, would more likely result in a smaller code length, because in (11) one needs to encode more parameters. Clearly then, (10) and (11) can be used as test criteria to test $H_0$ against $H_a$, in which the code lengths of the parameters play the role of determining the size of the test. Moreover, it enables us to go further and detect which of the k densities are different and which of them are identical by trying to beat the code length (11) by a more precise modeling of the data.

The large sample asymptotic behavior of the test procedure has been discussed in Qian (1994, Ch. 5) in a more general case. We list below the results related to the current paper, the proof of which is presented in Appendix A.

Theorem 1. Let $X_1, \dots, X_k$ be k independent random samples of sizes $n_1, \dots, n_k$, $\sum_{i=1}^{k} n_i = n$, and unknown density functions $f_1(x), \dots, f_k(x)$ defined on $[s, t]$ respectively. Suppose that the following conditions are satisfied:

(a) $0 < c_{i1} \le f_i \le c_{i2} < \infty$ for $i = 1, \dots, k$, where $c_{i1}, c_{i2}$ are constants.

(b) $f_i$ is absolutely continuous with a.s. derivative $f_i'$ such that $|f_i'| \le c_{i3}$, where $c_{i3}$ is a constant, $i = 1, \dots, k$.

(c) The numbers of subintervals $m$ and $m_i$ satisfy

$$n^{\gamma_1} \le m \le n^{\gamma_2}, \qquad n_i^{\gamma_{i1}} \le m_i \le n_i^{\gamma_{i2}},$$

where $\gamma_1, \gamma_2, \gamma_{i1}, \gamma_{i2}$ are constants satisfying $0 < \gamma_1 < \gamma_2 < 1$, $0 < \gamma_{i1} < \gamma_{i2} < 1$, for $i = 1, \dots, k$.

Then the following statements hold for any prescribed precision $d$:

(i)

$$\min_{m_i} \{I(X_i \mid s_i, r_i, m_i) + L_1(\tilde s_i, \tilde r_i, \tilde m_i) + |\log 10^d|\} + \log f_i^{n_i}(X_i) = A_i\, n_i^{1/3} (\log n_i)^{2/3}$$

almost surely for $i = 1, \dots, k$, where $f_i^{n_i}(X_i) = \prod_{j=1}^{n_i} f_i(X_{ij})$ and $A_i$ is a positive constant depending on $f_i$.

(ii) If we write $\hat m_i = \arg\min_{m_i} \{I(X_i \mid s_i, r_i, m_i) + L_1(\tilde s_i, \tilde r_i, \tilde m_i) + |\log 10^d|\}$, then

$$\hat m_i = B_i (n_i / \log n_i)^{1/3}, \quad \text{almost surely}$$

for $i = 1, \dots, k$, where $B_i$ is a positive constant depending on $f_i$.

(iii) If the alternative hypothesis $H_a$ is true, there exists a constant $\eta < 0$ such that

$$\Delta \stackrel{\mathrm{def}}{=} \frac{1}{n} \Big[\min_{m_1, \dots, m_k} \Big\{\sum_{i=1}^{k} I(X_i \mid s_i, r_i, m_i) + L_1(\tilde s_1, \tilde r_1, \tilde m_1, \dots, \tilde s_k, \tilde r_k, \tilde m_k) + |\log 10^d|\Big\} - \min_{m} \{I(X \mid s, r, m) + L_1(\tilde s, \tilde r, \tilde m) + |\log 10^d|\}\Big] < \eta, \quad \text{almost surely,}$$

as $n_1 \to \infty, \dots, n_k \to \infty$ such that $(n_1/n) > c_1 > 0, \dots, (n_k/n) > c_k > 0$ for any set of prescribed constants $c_1, \dots, c_k$.

(iv) If the null hypothesis $H_0$ is true, then

$$\Delta \to 0, \quad \text{almost surely,}$$

as $n_1 \to \infty, \dots, n_k \to \infty$.

In the case of a small or medium size sample, the size of the test procedure depends on the selection of the truncation precision $d$. Notice that if we do not truncate the parameters, the code lengths used to describe them will be comparable to the stochastic complexities (6) and (7), and the level of the test (the type-I error) will be quite small, while the type-II error is likely to be large. On the other hand, if we truncate the parameters too heavily, we will ignore too much information provided by the suggested model, resulting in a large type-I error, even though the type-II error will be well controlled. According to experience, using the precisions which minimize (10) and (11) will usually truncate the parameters too heavily and result in a large type-I error. An exact formula for determining the optimal precision $d$ is not available, but some heuristic grasp of how the power of the test varies with the precision $d$ can be obtained from the simulation study in Section 4. If the Fisher information is used to find the stochastic complexity of a set of data, there is no need to choose the optimal precision to truncate the parameters; see Rissanen (1994) for the development.

3. Two examples

The first example uses the 'PRO Football Scores' data of Lock (1992). In order to get an idea of how the criterion works, we compare only the pointspread (abbreviated as pts.; the oddsmaker's points to handicap the favored team) data in the third week, the eighth week and the fourteenth week to assess the presence of a time shift in the scores.

Scores of the third week:
7.5 4.5 10.5 2.0 3.5 7.0 10.0 2.5 6.5 8.5 2.5 4.0 7.5 1.5 3.5 4.0 9.5 2.0 5.5 9.0 3.0 9.0 3.5 5.5 9.0 7.0 2.0 14.0 2.0 14.0 3.5 9.0 2.0 3.0 3.0 1.5 3.5 7.5 6.0 8.0 3.0 4.0

Scores of the eighth week:
7.0 6.5 4.0 4.0 6.5 2.0 3.5 2.0 2.5 2.0 0.0 7.0 2.0 2.5 4.0 6.0 3.0 4.0 6.0 8.5 6.5 5.5 2.5 2.5 9.0 3.5 6.0 13.0 0.0 5.5 7.0 12.0 12.5 5.5 1.0 4.0 4.0 13.0

Scores of the fourteenth week:
12.0 8.0 6.5 1.5 1.0 12.0 6.5 3.0 6.0 3.0 9.0 1.5 9.5 10.0 5.0 0.0 13.5 4.5 5.5 3.5 13.0 7.5 5.0 2.0 4.0 4.0 3.0 3.0 6.5 8.0 5.5 9.0 9.5 11.0 5.0 7.0 8.5 5.0 6.5

with sample sizes $n_1 = 42$, $n_2 = 38$, $n_3 = 39$ respectively. Under the null hypothesis of no time shift in the pointspread, the idealized code length (10) for the pooled sample with $m \le 119$ is 363.27, and the corresponding optimal $m = 119$ and $d = 1$ (one less than the largest number of digits before the decimal point in the observations). Under the alternative hypothesis of some time effects in the pointspread, the idealized code length (11) for the three independent samples is 445.27, achieved at $m_1 = 5$, $m_2 = 2$, $m_3 = 1$ and $d = 1$. Because the idealized code length for the pooled sample is considerably smaller than that of the three independent samples, we conclude that there is no evidence of a time effect in the pointspread, which concurs with the conclusion of the classic Student's t test for the mean difference between each pair of the three samples. Figs. 1 and 2 show how the idealized code lengths of each individual sample and the pooled sample change with the number of bins employed in the corresponding histogram densities and with the precision $d$ used to truncate the parameters.

In the second example, we generated two independent samples with sizes $n_1 = 15$ and $n_2 = 12$, respectively, from Gamma(4,3) and Uniform(4,18) distributions. The two samples are as follows:

X1 = 7.362 8.876 5.219 10.506 12.590 9.552 10.203 11.144 27.296 3.105 8.995 4.955 4.065 10.822 11.097

X2 = 6.645 6.246 7.589 4.563 11.131 4.371 6.743 16.647 15.412 6.202 15.134 6.951

Under the null hypothesis $H_0$ that there is no difference between the distributions which generated the two samples, the idealized code length (10) is 120.78 with $m = 3$ and $d = \log_{10} 17$, while under the alternative hypothesis $H_a$ that there is a difference between the two distributions, the idealized code length (11) is 115.00 with $m_1 = 3$, $m_2 = 7$ and $d = \log_{10} 25$. The difference is clearly indicated by (10) and (11), but neither the classic Student's t test, which gives the p-value = 0.7026, nor the Smirnov test, which is not significant at $\alpha = 0.2$, would indicate that difference. It is also interesting to note that (10) is always minimized at $m = 3$ when $d$ is chosen from $0, \log_{10} 2, \log_{10} 3, \dots, \log_{10} 27$, while (11) is always minimized at $m_1 = 3$ and $m_2 = 7$. Fig. 3 illustrates the relationship among (10), (11) and $d$.

Fig. 1. Relationship between code length and number of equal-width bins. (Panels: The 3rd Week, The 8th Week, The 14th Week, The Pooled Sample; horizontal axes $m_1$, $m_2$, $m_3$, $m$.)

Fig. 2. Relationship among code length, number of equal-width bins and precisions. (Panels: The 3rd Week, The 8th Week, The 14th Week, The Pooled Sample.)

Fig. 3. Relationship between code length and precision. (Horizontal axis: $10^d$.)
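The decision rule applied in both examples is a direct comparison of the two minimized code lengths. As a short worked illustration, the sketch below applies it to the code lengths reported above; the function name and the dictionary labels are hypothetical, and the numbers are taken from the text rather than recomputed.

```python
def homogeneity_decision(pooled_bits, separate_bits):
    """Accept H0 when the pooled code length (10) is smaller than the
    code length (11) of the separately modelled samples; reject otherwise."""
    return "accept H0" if pooled_bits < separate_bits else "reject H0"

# (code length (10), code length (11)) as reported in Section 3.
examples = {
    "football pointspreads": (363.27, 445.27),
    "Gamma(4,3) vs Uniform(4,18)": (120.78, 115.00),
}
decisions = {name: homogeneity_decision(*pair) for name, pair in examples.items()}
for name, decision in decisions.items():
    print(name, "->", decision)
```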

4. Simulation studies

In this last section we assess the finite sample performance of the proposed test procedure by a simulation study, in which we compare our method with the two sample t-test and the Smirnov test for equal and unequal sample sizes. The comparisons are in terms of the power of the test based on 1000 repetitions. The results are summarized in Tables 1 and 2. Instead of using the optimal precision we choose some different but reasonable precisions to truncate the parameters. It is found that there usually exists a precision $d$ which makes both type-I and type-II errors reasonably small.

Note that with the stochastic complexity test, the type-I error depends on the distributions of the samples, the sample sizes and the precision $d$, which is very different from the traditional tests where the type-I error is controlled in advance. Although one cannot have an exact formula for calculating the type-I error, one can obtain an estimate of it by simulation. For instance, from row 1 of Table 1 we know that the power of testing N(10,9) and N(10,49) by the stochastic complexity test is 0.547 if $d = 0.2$ and $n_1 = n_2 = 15$. The corresponding values in rows 2 and 3 of Table 1 are 0.022 and 0.079 respectively, which are for testing N(10,49) against N(10,49), and N(10,9) against N(10,9). This suggests that the empirical size of testing the homogeneity of N(10,9) and N(10,49) at $d = 0.2$ and $n_1 = n_2 = 15$ is about 0.079.

The tables illustrate the following findings:

(1) When the samples are generated from normal distributions, the three tests are all efficient if the difference of the populations is the result of a mean shift. Both the Smirnov test and the stochastic complexity test are efficient when the difference is the result of a change in the variance, but the latter is better.

(2) When the data are generated from uniform distributions, the stochastic complexity test is quite efficient, and also the best of the three methods.

(3) When the data are generated from lognormal distributions, the Smirnov test is the best and the other two tests are inefficient.

(4) When the data are from exponential distributions, all three methods are efficient, but the two sample t-test is the best.

(5) When the data are from logistic distributions, both the Smirnov test and the stochastic complexity test are quite efficient at indicating a difference in the shape of the distributions, with the latter method superior in performance.

(6) When the data are from gamma distributions, both the Smirnov test and the stochastic complexity test are efficient with comparable power.

(7) When the two samples are from different families of distributions, the stochastic complexity test is quite efficient and performs best.
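The way such power and size figures are estimated can be sketched as a small Monte Carlo harness: draw the two samples repeatedly, apply a test, and record the rejection rate. The code below is an illustration under stated assumptions rather than the authors' simulation code. All function names are hypothetical; the example test is the two-sample Smirnov test with the asymptotic critical value $\sqrt{-\tfrac{1}{2}\ln(\alpha/2)}\,\sqrt{(n_1+n_2)/(n_1 n_2)}$, which is only an approximation for samples of size 15; and the second parameter of N(·,·) in the tables is read as a variance.

```python
import math
import random

def smirnov_statistic(x, y):
    """Largest distance between the two empirical distribution functions."""
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d

def smirnov_rejects(x, y, alpha=0.05):
    """Two-sided Smirnov test using the asymptotic critical value."""
    n1, n2 = len(x), len(y)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n1 + n2) / (n1 * n2))
    return smirnov_statistic(x, y) > critical

def rejection_rate(test, draw_x, draw_y, n1, n2, reps=1000, seed=0):
    """Fraction of repetitions in which `test` rejects homogeneity."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [draw_x(rng) for _ in range(n1)]
        y = [draw_y(rng) for _ in range(n2)]
        hits += test(x, y)
    return hits / reps

# N(13,4) against N(10,4) estimates power; N(10,4) against itself estimates size.
power = rejection_rate(smirnov_rejects, lambda g: g.gauss(13, 2), lambda g: g.gauss(10, 2), 15, 15)
size = rejection_rate(smirnov_rejects, lambda g: g.gauss(10, 2), lambda g: g.gauss(10, 2), 15, 15)
print(power, size)
```

Any other test with the same `test(x, y)` signature, including an implementation of the stochastic complexity comparison, can be passed to `rejection_rate` in place of `smirnov_rejects`.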

Table 1. Comparison of the power of the test in two sample case (sizes n1 = n2 = 15 and based on 1000 simulations)

                               Two sample t-test, α Smirnov test, α      Stochastic complexity test, d
Distributions                  0.1    0.05   0.01   0.184  0.076  0.026  0      0.1    0.2    0.3    0.5    0.7    1.0
N(10,9) and N(10,49)           0.107  0.052  0.011  0.487  0.229  0.093  0.410  0.482  0.547  0.603  0.783  0.875  0.970
N(10,49) and N(10,49)          0.104  0.048  0.013  0.173  0.077  0.023  0.005  0.012  0.022  0.032  0.086  0.173  0.371
N(10,9) and N(10,9)            0.094  0.058  0.012  0.184  0.078  0.023  0.033  0.045  0.079  0.099  0.220  0.347  0.447
N(12,4) and N(10,4)            0.824  0.717  0.464  0.837  0.684  0.503  0.279  0.333  0.406  0.454  0.639  0.748  0.813
N(13,4) and N(10,4)            0.988  0.977  0.894  0.988  0.952  0.877  0.655  0.710  0.758  0.815  0.891  0.941  0.965
N(13,4) and N(13,4)            0.095  0.044  0.008  0.185  0.084  0.025  0.030  0.047  0.057  0.079  0.185  0.326  0.444
N(10,4) and N(10,4)            0.108  0.055  0.007  0.200  0.083  0.023  0.033  0.045  0.073  0.100  0.235  0.384  0.427
N(10,16) and N(10,16)          0.104  0.049  0.011  0.201  0.084  0.031  0.039  0.056  0.074  0.104  0.219  0.362  0.611
N(2,16) and N(2,16)            0.090  0.048  0.015  0.171  0.078  0.030  0.017  0.029  0.033  0.048  0.113  0.205  0.513
N(10,0.64) and N(10,0.64)      0.099  0.049  0.011  0.186  0.076  0.020  0.053  0.073  0.097  0.133  0.214  0.396  0.626
N(5,0.64) and N(5,0.09)        0.095  0.053  0.008  0.560  0.311  0.152  0.873  0.890  0.902  0.923  0.986  0.988  1.0
N(5,0.09) and N(5,0.09)        0.111  0.061  0.007  0.189  0.077  0.026  0.137  0.159  0.231  0.331  0.687  0.695  0.964
N(25,1) and N(25,1)            0.112  0.056  0.009  0.201  0.084  0.032  0.003  0.005  0.012  0.018  0.060  0.131  0.428
Unif(0,1) and Unif(0,1)        0.094  0.049  0.011  0.191  0.087  0.026  0.103  0.102  0.203  0.251  0.844  0.848  0.851
Unif(1,4) and Unif(1,4)        0.100  0.052  0.016  0.182  0.079  0.029  0.037  0.039  0.060  0.055  0.203  0.470  0.835
Unif(-2,3) and Unif(-2,3)      0.087  0.037  0.007  0.181  0.068  0.022  0.011  0.018  0.021  0.020  0.145  0.240  0.851
Unif(1,4) and Unif(-2,3)       1.0    0.998  0.965  0.998  0.984  0.937  0.999  0.999  0.999  1.0    1.0    1.0    1.0
Unif(2,8) and Unif(3,7)        0.100  0.053  0.009  0.311  0.124  0.046  0.244  0.305  0.425  0.398  0.807  0.793  1.0
Unif(2,8) and Unif(2,8)        0.086  0.049  0.007  0.173  0.071  0.023  0.007  0.011  0.015  0.020  0.079  0.115  0.465
Unif(3,7) and Unif(3,7)        0.107  0.061  0.010  0.206  0.080  0.030  0.007  0.011  0.017  0.028  0.097  0.096  0.865
Unif(5,14) and Unif(3,10)      0.957  0.921  0.762  0.924  0.814  0.640  0.909  0.948  0.970  0.974  0.990  1.0    0.997
Unif(5,14) and Unif(5,14)      0.103  0.051  0.008  0.179  0.070  0.020  0.001  0.002  0.004  0.007  0.033  0.085  0.090
LogN(1,1) and LogN(0.5,√2)     0.111  0.050  0.005  0.410  0.244  0.129  0.112  0.129  0.163  0.193  0.292  0.399  0.559
LogN(0.5,√2) and LogN(0.5,√2)  0.070  0.026  0.003  0.201  0.082  0.038  0.108  0.130  0.158  0.177  0.233  0.308  0.418
LogN(1,1) and LogN(1,1)        0.088  0.035  0.004  0.165  0.070  0.023  0.046  0.062  0.082  0.112  0.200  0.291  0.457
LogN(1,1) and LogN(0,√3)       0.221  0.151  0.043  0.737  0.574  0.379  0.210  0.250  0.294  0.338  0.436  0.521  0.647
LogN(0,√3) and LogN(0,√3)      0.055  0.017  0.000  0.169  0.077  0.021  0.182  0.202  0.225  0.238  0.296  0.352  0.439
LogN(7,2) and LogN(1,4) a      0.491  0.307  0.080  1.0    1.0    0.995  0.510  0.518  0.536  0.548  0.568  0.600  0.628
Exp(1) and Exp(0.2)            0.989  0.939  0.632  0.982  0.946  0.848  0.917  0.940  0.958  0.961  0.982  0.992  0.994
Exp(0.2) and Exp(0.5)          0.719  0.545  0.193  0.716  0.525  0.359  0.416  0.462  0.509  0.544  0.661  0.769  0.841
Exp(0.2) and Exp(0.6)          0.862  0.721  0.336  0.854  0.705  0.526  0.599  0.646  0.683  0.729  0.824  0.886  0.923

Table 1 (Continued)

                               Two sample t-test, α Smirnov test, α      Stochastic complexity test, d
Distributions                  0.1    0.05   0.01   0.184  0.076  0.026  0      0.1    0.2    0.3    0.5    0.7    1.0
Exp(0.2) and Exp(0.2)          0.100  0.049  0.007  0.181  0.077  0.021  0.017  0.026  0.039  0.053  0.129  0.216  0.370
Exp(0.2) and Exp(0.6)          0.862  0.721  0.336  0.854  0.705  0.526  0.599  0.646  0.683  0.729  0.824  0.886  0.923
Exp(0.2) and Exp(0.2)          0.100  0.049  0.007  0.181  0.077  0.021  0.017  0.026  0.039  0.053  0.129  0.216  0.370
Exp(0.2) and Exp(0.7)          0.919  0.823  0.449  0.898  0.798  0.651  0.728  0.765  0.804  0.829  0.889  0.938  0.960
Exp(0.6) and Exp(0.6)          0.089  0.043  0.005  0.185  0.088  0.026  0.089  0.108  0.133  0.164  0.338  0.411  0.682
Exp(0.7) and Exp(0.7)          0.095  0.039  0.006  0.179  0.082  0.017  0.110  0.134  0.162  0.192  0.342  0.428  0.745
Logis(2,2) and Logis(2,2)      0.093  0.046  0.007  0.190  0.075  0.023  0.026  0.038  0.057  0.084  0.175  0.298  0.584
Logis(2,3) and Logis(2,4)      0.096  0.047  0.008  0.209  0.092  0.034  0.014  0.021  0.031  0.048  0.093  0.173  0.345
Logis(2,3) and Logis(2,3)      0.112  0.056  0.007  0.177  0.079  0.025  0.008  0.010  0.020  0.031  0.078  0.155  0.338
Logis(2,5) and Logis(2,5)      0.107  0.052  0.013  0.180  0.076  0.033  0.002  0.003  0.004  0.011  0.032  0.060  0.151
Logis(2,3) and Logis(2,5)      0.079  0.034  0.010  0.266  0.118  0.046  0.032  0.049  0.064  0.087  0.157  0.257  0.425
Logis(2,2) and Logis(2,7)      0.088  0.047  0.006  0.714  0.409  0.192  0.517  0.580  0.643  0.710  0.817  0.879  0.938
Logis(2,7) and Logis(2,7)      0.104  0.054  0.006  0.189  0.075  0.024  0.002  0.003  0.004  0.008  0.024  0.044  0.116
Logis(2,3) and Logis(2,7)      0.100  0.045  0.011  0.460  0.224  0.084  0.132  0.169  0.210  0.267  0.387  0.518  0.697
Logis(2,3) and Logis(2,8)      0.113  0.055  0.010  0.573  0.300  0.135  0.238  0.270  0.327  0.379  0.489  0.623  0.765
Logis(2,5) and Logis(2,8)      0.113  0.058  0.010  0.266  0.108  0.049  0.013  0.019  0.028  0.039  0.092  0.156  0.278
Logis(2,8) and Logis(2,8)      0.103  0.047  0.008  0.186  0.070  0.022  0.002  0.003  0.004  0.004  0.009  0.024  0.078
Logis(2,4) and Logis(2,7)      0.092  0.042  0.005  0.307  0.136  0.042  0.033  0.043  0.052  0.070  0.138  0.247  0.419
Gamma(4,2) and Gamma(2,3)      0.392  0.271  0.102  0.566  0.370  0.210  0.089  0.113  0.153  0.196  0.328  0.471  0.712
Gamma(2,3) and Gamma(2,3)      0.099  0.050  0.007  0.180  0.080  0.029  0.030  0.040  0.055  0.066  0.154  0.250  0.457
Gamma(4,2) and Gamma(4,2)      0.096  0.047  0.010  0.167  0.069  0.024  0.019  0.037  0.059  0.080  0.166  0.245  0.538
Gamma(2,4) and Gamma(4,2)      0.108  0.057  0.018  0.260  0.115  0.044  0.048  0.071  0.091  0.121  0.244  0.364  0.639
Gamma(5,2) and Gamma(2,5)      0.116  0.050  0.012  0.318  0.150  0.062  0.069  0.097  0.131  0.171  0.308  0.422  0.644
Gamma(2,5) and Gamma(2,5)      0.100  0.050  0.011  0.187  0.085  0.025  0.012  0.020  0.029  0.036  0.079  0.137  0.289
Gamma(5,3) and Gamma(3,5)      0.101  0.049  0.006  0.226  0.119  0.041  0.012  0.018  0.031  0.046  0.106  0.187  0.305
Gamma(5,1) and Gamma(1,5)      0.132  0.088  0.040  0.587  0.368  0.200  0.384  0.434  0.488  0.547  0.715  0.825  0.917
Gamma(6,2) and Gamma(2,6)      0.109  0.056  0.014  0.370  0.172  0.084  0.085  0.104  0.155  0.187  0.327  0.471  0.624
Gamma(7,2) and Gamma(2,7)      0.114  0.067  0.005  0.390  0.216  0.092  0.092  0.136  0.169  0.224  0.366  0.519  0.635
Gamma(7,3) and Gamma(3,7)      0.110  0.062  0.019  0.292  0.155  0.067  0.028  0.039  0.054  0.073  0.145  0.239  0.447
Gamma(8,2) and Gamma(2,8)      0.094  0.041  0.009  0.455  0.242  0.111  0.119  0.157  0.206  0.259  0.427  0.566  0.707
Gamma(8,3) and Gamma(3,8)      0.114  0.063  0.013  0.315  0.158  0.065  0.025  0.038  0.060  0.087  0.156  0.258  0.437
Gamma(9,2) and Gamma(2,9)      0.117  0.067  0.019  0.509  0.286  0.131  0.158  0.209  0.246  0.310  0.463  0.609  0.745

Table 1 (Continued)

| Distributions | t-test α=0.1 | t-test α=0.05 | t-test α=0.01 | Smirnov α=0.184 | Smirnov α=0.076 | Smirnov α=0.026 | SC d=0 | SC d=0.1 | SC d=0.2 | SC d=0.3 | SC d=0.5 | SC d=0.7 | SC d=1.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gamma(2,6) and Gamma(2,6) | 0.108 | 0.054 | 0.007 | 0.202 | 0.086 | 0.038 | 0.004 | 0.007 | 0.012 | 0.020 | 0.058 | 0.121 | 0.260 |
| Gamma(7,2) and Gamma(7,2) | 0.067 | 0.030 | 0.009 | 0.138 | 0.044 | 0.016 | 0.012 | 0.018 | 0.025 | 0.035 | 0.097 | 0.175 | 0.315 |
| Gamma(2,7) and Gamma(2,7) | 0.098 | 0.039 | 0.005 | 0.185 | 0.074 | 0.020 | 0.004 | 0.005 | 0.009 | 0.015 | 0.049 | 0.101 | 0.222 |
| Gamma(7,3) and Gamma(7,3) | 0.118 | 0.063 | 0.010 | 0.199 | 0.092 | 0.029 | 0.002 | 0.006 | 0.014 | 0.026 | 0.072 | 0.132 | 0.281 |
| Gamma(3,7) and Gamma(3,7) | 0.102 | 0.054 | 0.006 | 0.178 | 0.072 | 0.022 | 0.004 | 0.006 | 0.012 | 0.015 | 0.036 | 0.075 | 0.170 |
| Gamma(8,2) and Gamma(8,2) | 0.081 | 0.048 | 0.010 | 0.183 | 0.083 | 0.026 | 0.008 | 0.022 | 0.026 | 0.041 | 0.092 | 0.168 | 0.299 |
| Gamma(2,8) and Gamma(2,8) | 0.095 | 0.046 | 0.007 | 0.181 | 0.083 | 0.028 | 0.006 | 0.009 | 0.009 | 0.013 | 0.036 | 0.080 | 0.207 |
| Gamma(8,3) and Gamma(8,3) | 0.115 | 0.057 | 0.008 | 0.178 | 0.074 | 0.024 | 0.002 | 0.004 | 0.007 | 0.008 | 0.036 | 0.087 | 0.231 |
| Gamma(3,8) and Gamma(3,8) | 0.100 | 0.049 | 0.014 | 0.190 | 0.073 | 0.030 | 0.002 | 0.006 | 0.009 | 0.011 | 0.022 | 0.051 | 0.150 |
| Gamma(9,2) and Gamma(9,2) | 0.113 | 0.065 | 0.015 | 0.192 | 0.087 | 0.032 | 0.008 | 0.008 | 0.014 | 0.020 | 0.068 | 0.151 | 0.295 |
| Gamma(2,9) and Gamma(2,9) | 0.097 | 0.047 | 0.010 | 0.188 | 0.078 | 0.033 | 0.001 | 0.002 | 0.003 | 0.008 | 0.024 | 0.066 | 0.158 |
| N(2,16) and Logis(2,7) | 0.118 | 0.060 | 0.004 | 0.629 | 0.367 | 0.166 | 0.409 | 0.490 | 0.562 | 0.635 | 0.748 | 0.845 | 0.939 |
| N(5,5) and Exp(0.2) | 0.125 | 0.075 | 0.027 | 0.562 | 0.335 | 0.173 | 0.359 | 0.421 | 0.487 | 0.544 | 0.758 | 0.841 | 0.916 |
| N(2,49π²/3) and Logis(2,7) | 0.101 | 0.056 | 0.005 | 0.202 | 0.087 | 0.024 | 0.000 | 0.001 | 0.003 | 0.004 | 0.012 | 0.030 | 0.087 |
| N(5,36) and Exp(0.2) | 0.090 | 0.048 | 0.009 | 0.323 | 0.156 | 0.065 | 0.197 | 0.240 | 0.287 | 0.348 | 0.537 | 0.692 | 0.849 |

a The power equals 0.802, 0.875 and 0.925 for d = 3.0, 4.0 and 5.0, respectively.

Table 2. Comparison of the power of the tests in the two-sample case (sample sizes n1 = 15, n2 = 20; based on 1000 simulations)

Columns 2-4: two-sample t-test at level α; columns 5-8: Smirnov test at level α; columns 9-15: stochastic complexity test at the given d.

| Distributions | t-test α=0.1 | t-test α=0.05 | t-test α=0.01 | Smirnov α=0.2 | Smirnov α=0.1 | Smirnov α=0.05 | Smirnov α=0.01 | SC d=0 | SC d=0.1 | SC d=0.2 | SC d=0.3 | SC d=0.5 | SC d=0.7 | SC d=1.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Unif(-2,3) and Unif(1,4) | 1.0 | 0.997 | 0.973 | 0.998 | 0.995 | 0.984 | 0.917 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| Unif(3,7) and Unif(1,4) | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.999 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| Unif(2,8) and Unif(3,7) | 0.103 | 0.051 | 0.011 | 0.362 | 0.239 | 0.125 | 0.029 | 0.505 | 0.593 | 0.684 | 0.666 | 0.914 | 0.897 | 1.0 |
| Unif(2,8) and Unif(4,7) | 0.232 | 0.146 | 0.050 | 0.655 | 0.555 | 0.329 | 0.149 | 0.966 | 0.974 | 0.974 | 0.990 | 0.997 | 0.997 | 1.0 |
| Unif(2,8) and Unif(1,9) | 0.102 | 0.054 | 0.012 | 0.274 | 0.139 | 0.083 | 0.016 | 0.111 | 0.156 | 0.235 | 0.295 | 0.405 | 0.811 | 0.858 |
| Unif(1,9) and Unif(1,9) | 0.108 | 0.045 | 0.015 | 0.224 | 0.117 | 0.068 | 0.020 | 0.009 | 0.017 | 0.024 | 0.027 | 0.077 | 0.318 | 0.290 |
| Unif(4,7) and Unif(4,7) | 0.102 | 0.045 | 0.011 | 0.200 | 0.104 | 0.056 | 0.011 | 0.021 | 0.026 | 0.039 | 0.047 | 0.104 | 0.225 | 0.863 |
| Unif(-2,3) and Unif(-3,4) | 0.079 | 0.040 | 0.007 | 0.287 | 0.155 | 0.084 | 0.014 | 0.093 | 0.140 | 0.165 | 0.182 | 0.454 | 0.885 | 1.0 |
| Unif(-3,4) and Unif(-3,4) | 0.094 | 0.048 | 0.012 | 0.203 | 0.105 | 0.057 | 0.013 | 0.006 | 0.009 | 0.012 | 0.011 | 0.033 | 0.159 | 0.295 |
| Unif(-3,4) and Unif(-4,0) | 0.998 | 0.997 | 0.974 | 0.996 | 0.992 | 0.961 | 0.849 | 0.991 | 0.994 | 0.995 | 0.997 | 1.0 | 1.0 | 1.0 |
| Unif(-4,0) and Unif(-4,0) | 0.103 | 0.054 | 0.011 | 0.200 | 0.088 | 0.053 | 0.011 | 0.005 | 0.007 | 0.006 | 0.014 | 0.037 | 0.037 | 0.865 |
| LogN(1,1) and LogN(0.5, √[illegible]) | 0.092 | 0.035 | 0.004 | 0.470 | 0.317 | 0.237 | 0.079 | 0.109 | 0.134 | 0.177 | 0.211 | 0.310 | 0.403 | 0.546 |
| Exp(0.2) and Exp(0.6) | 0.873 | 0.747 | 0.350 | 0.887 | 0.797 | 0.703 | 0.475 | 0.635 | 0.680 | 0.729 | 0.761 | 0.837 | 0.892 | 0.925 |
| Exp(0.3) and Exp(0.8) | 0.857 | 0.759 | 0.432 | 0.808 | 0.698 | 0.598 | 0.371 | 0.585 | 0.634 | 0.685 | 0.723 | 0.809 | 0.885 | 0.916 |
| Exp(0.5) and Exp(0.2) | 0.771 | 0.602 | 0.238 | 0.780 | 0.647 | 0.548 | 0.284 | 0.403 | 0.454 | 0.522 | 0.569 | 0.673 | 0.768 | 0.845 |
| Logis(2,2) and Logis(2,7) | 0.085 | 0.044 | 0.012 | 0.774 | 0.543 | 0.359 | 0.097 | 0.559 | 0.616 | 0.678 | 0.733 | 0.833 | 0.910 | 0.955 |
| Logis(2,4) and Logis(2,7) | 0.061 | 0.030 | 0.008 | 0.322 | 0.169 | 0.099 | 0.018 | 0.028 | 0.042 | 0.053 | 0.073 | 0.144 | 0.233 | 0.390 |
| Logis(2,8) and Logis(2,3) | 0.094 | 0.040 | 0.008 | 0.623 | 0.430 | 0.264 | 0.090 | 0.303 | 0.342 | 0.388 | 0.434 | 0.555 | 0.669 | 0.794 |
| N(10,9) and N(10,49) | 0.097 | 0.046 | 0.009 | 0.553 | 0.350 | 0.221 | 0.047 | 0.478 | 0.540 | 0.620 | 0.679 | 0.828 | 0.912 | 0.976 |
| N(13,4) and N(10,4) | 0.998 | 0.992 | 0.941 | 0.998 | 0.986 | 0.980 | 0.881 | 0.756 | 0.796 | 0.852 | 0.887 | 0.937 | 0.964 | 0.982 |
| N(5,0.64) and N(5,0.09) | 0.098 | 0.049 | 0.011 | 0.674 | 0.482 | 0.285 | 0.092 | 0.926 | 0.926 | 0.934 | 0.956 | 0.994 | 0.994 | 0.999 |
| N(-8,49) and N(-6,16) | 0.325 | 0.204 | 0.074 | 0.542 | 0.383 | 0.263 | 0.110 | 0.101 | 0.129 | 0.171 | 0.220 | 0.341 | 0.473 | 0.625 |
| Gamma(7,2) and Gamma(2,7) | 0.089 | 0.054 | 0.013 | 0.458 | 0.274 | 0.192 | 0.066 | 0.134 | 0.171 | 0.218 | 0.269 | 0.433 | 0.591 | 0.723 |
| Gamma(7,2) and Gamma(8,3) | 0.993 | 0.978 | 0.903 | 0.993 | 0.972 | 0.953 | 0.843 | 0.575 | 0.636 | 0.677 | 0.736 | 0.827 | 0.898 | 0.954 |
| Gamma(4,3) and Unif(6,18) | 0.120 | 0.066 | 0.017 | 0.323 | 0.179 | 0.100 | 0.020 | 0.391 | 0.449 | 0.536 | 0.597 | 0.752 | 0.885 | 0.922 |
| N(3,16) and Logis(2,7) | 0.089 | 0.045 | 0.004 | 0.727 | 0.499 | 0.325 | 0.090 | 0.475 | 0.545 | 0.617 | 0.674 | 0.803 | 0.872 | 0.948 |
| N(5,25) and Exp(0.25) | 0.202 | 0.121 | 0.040 | 0.514 | 0.358 | 0.263 | 0.102 | 0.363 | 0.422 | 0.489 | 0.555 | 0.743 | 0.834 | 0.924 |
| Gamma(5,2) and N(11,81) | 0.099 | 0.049 | 0.012 | 0.528 | 0.315 | 0.200 | 0.060 | 0.290 | 0.346 | 0.414 | 0.461 | 0.649 | 0.802 | 0.900 |
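The power figures reported for the two-sample comparison (n1 = 15, n2 = 20, 1000 simulations) are Monte Carlo rejection frequencies. The following is a minimal sketch of how such a frequency could be estimated for the Smirnov test only. It is not the authors' code: the function names are hypothetical, and the critical value uses the large-sample approximation c(α)·sqrt((n1+n2)/(n1·n2)) with c(α) = sqrt(−½ ln(α/2)), which is an assumption here and is somewhat conservative for samples this small.

```python
import math
import random


def smirnov_statistic(x, y):
    """Two-sample Smirnov statistic D = sup_t |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n1 and j < n2:
        t = min(xs[i], ys[j])
        # Advance both empirical CDFs past the current point t.
        while i < n1 and xs[i] <= t:
            i += 1
        while j < n2 and ys[j] <= t:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    return d


def smirnov_rejects(x, y, alpha):
    """Reject H0 using the asymptotic critical value (an assumption)."""
    n1, n2 = len(x), len(y)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return smirnov_statistic(x, y) > c_alpha * math.sqrt((n1 + n2) / (n1 * n2))


def estimate_power(draw_x, draw_y, n1=15, n2=20, alpha=0.05, n_sim=1000, seed=0):
    """Fraction of simulated sample pairs for which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x = [draw_x(rng) for _ in range(n1)]
        y = [draw_y(rng) for _ in range(n2)]
        rejections += smirnov_rejects(x, y, alpha)
    return rejections / n_sim


# Unif(-2,3) against Unif(1,4): an alternative; Unif(1,9) against itself: the null.
power_alt = estimate_power(lambda r: r.uniform(-2, 3), lambda r: r.uniform(1, 4))
size_null = estimate_power(lambda r: r.uniform(1, 9), lambda r: r.uniform(1, 9))
print(power_alt, size_null)
```

Because of the approximate critical value, the printed frequencies should be read as indicative and need not match the tabulated entries.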


The simulation study suggests that the stochastic complexity test is a promising method, and that it can be expected to improve as better ways of estimating the unknown densities are employed.

Acknowledgements

The authors wish to thank the editor for the valuable comments and suggestions.

Appendix A. Proof of Theorem 1

Parts (i) and (ii) are established in (ii) and (iv) of Theorem 2.4 of Rissanen, Speed and Yu (1992) and also in (b) and (d) of Theorem 5.3.4 of Qian (1994). It remains to prove (iii) and (iv). Denote

$$C(X_1,\ldots,X_k) = \min_{m_1,\ldots,m_k}\Big\{\sum_{i=1}^{k} I(X_i \mid s_i, r_i, m_i) + L_1(\hat s_1,\hat r_1,\hat m_1,\ldots,\hat s_k,\hat r_k,\hat m_k) + |\log 10^{d}|\Big\}$$

and

$$C(X) = \min_{m}\big\{ I(X \mid s, r, m) + L_1(\hat s,\hat r,\hat m) + |\log 10^{d}| \big\}.$$

It is easy to show that

$$L_1(\hat s_1,\hat r_1,\hat m_1,\ldots,\hat s_k,\hat r_k,\hat m_k) = o\Big(\sum_{i=1}^{k} m_i\Big) \quad\text{and}\quad L_1(\hat s,\hat r,\hat m) = o(m).$$

From (i) we obtain

$$C(X_1,\ldots,X_k) + \sum_{i=1}^{k}\log f_i^{n_i}(X_i) = O\Big(\sum_{i=1}^{k} n_i^{1/3}(\log n_i)^{2/3}\Big) \quad\text{a.s.} \tag{A.1}$$

and

$$C(X) + \log f_{\mathrm{mix}}^{n}(X) = O\big(n^{1/3}(\log n)^{2/3}\big) \quad\text{a.s.} \tag{A.2}$$

It is now sufficient to prove that there exists a constant $\eta < 0$ such that

$$\frac{1}{n}\log f_{\mathrm{mix}}^{n}(X) - \frac{1}{n}\sum_{i=1}^{k}\log f_i^{n_i}(X_i) < \eta \quad\text{a.s.} \tag{A.3}$$

as $n_1 \to \infty, \ldots, n_k \to \infty$ satisfying $(n_1/n) > \varepsilon_1 > 0, \ldots, (n_k/n) > \varepsilon_k > 0$ for any prescribed constants $\varepsilon_1,\ldots,\varepsilon_k$, if the alternative hypothesis $H_a$ is true; and

$$\frac{1}{n}\sum_{i=1}^{k}\log f_i^{n_i}(X_i) - \frac{1}{n}\log f_{\mathrm{mix}}^{n}(X) \to 0 \quad\text{a.s.} \tag{A.4}$$

as $n_1 \to \infty, \ldots, n_k \to \infty$ if the null hypothesis $H_0$ is true.

Because

$$\sum_{i=1}^{k}\log f_i^{n_i}(X_i) = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\log f_i(X_{ij}), \qquad \log f_{\mathrm{mix}}^{n}(X) = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\log\Big(\sum_{l=1}^{k}\frac{n_l}{n} f_l(X_{ij})\Big),$$

and the $f_i$'s are bounded density functions from (a), it follows from the strong law of large numbers for i.i.d. random variables that

$$\frac{1}{n}\sum_{i=1}^{k}\log f_i^{n_i}(X_i) - \sum_{i=1}^{k}\frac{n_i}{n}\int f_i\log f_i \to 0 \quad\text{a.s.} \tag{A.5}$$

and

$$\frac{1}{n}\log f_{\mathrm{mix}}^{n}(X) - \int f_{\mathrm{mix}}\log f_{\mathrm{mix}} \to 0 \quad\text{a.s.} \tag{A.6}$$

as $n_1 \to \infty, \ldots, n_k \to \infty$. By the convexity of $x\log x$,

$$\int f_{\mathrm{mix}}\log f_{\mathrm{mix}} \le \sum_{i=1}^{k}\frac{n_i}{n}\int f_i\log f_i \tag{A.7}$$

for any group of samples of sizes $n_1,\ldots,n_k$ satisfying $\sum_{i=1}^{k} n_i = n$, where the equality holds if and only if all the densities $f_1,\ldots,f_k$ are equal (except in a set of measure zero). Therefore (A.4) is established by using (A.5) and (A.6). Also, for any $\varepsilon_1 > 0, \ldots, \varepsilon_k > 0$, if $(n_1/n) > \varepsilon_1, \ldots, (n_k/n) > \varepsilon_k$, and if at least two of $f_1,\ldots,f_k$ are not equal almost surely, there exists a constant $\eta < 0$, depending on $\varepsilon_1,\ldots,\varepsilon_k$, such that

$$\int f_{\mathrm{mix}}\log f_{\mathrm{mix}} - \sum_{i=1}^{k}\frac{n_i}{n}\int f_i\log f_i < \eta \tag{A.8}$$

for any set of integers $\{n_i\}$ satisfying $\sum_{i=1}^{k} n_i = n$. Hence (A.3) follows from (A.5) and (A.6). Finally, (iii) and (iv) hold by (A.1)-(A.4).
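The convexity step of the proof can be checked numerically on a small case. The sketch below is only an illustration under stated assumptions: the densities are hypothetical histogram (piecewise-constant) densities on four equal-width bins of [0, 1], the mixture is taken as the sample-size-weighted average of the individual densities, and the function names are invented for this example. It computes the gap between the integral of f_mix log f_mix and the weighted sum of the integrals of f_i log f_i, which should be negative when the densities differ and zero when they coincide.

```python
import math


def integral_f_log_f(heights, width):
    """Integral of f log f for a histogram density with equal-width bins."""
    return sum(h * math.log(h) * width for h in heights if h > 0)


def mixture_density(densities, weights):
    """Bin heights of the weighted mixture sum_i w_i f_i."""
    n_bins = len(densities[0])
    return [sum(w * f[b] for w, f in zip(weights, densities)) for b in range(n_bins)]


def entropy_gap(densities, sizes, width):
    """Left side minus right side of the convexity inequality."""
    n = sum(sizes)
    weights = [n_i / n for n_i in sizes]
    f_mix = mixture_density(densities, weights)
    lhs = integral_f_log_f(f_mix, width)
    rhs = sum(w * integral_f_log_f(f, width) for w, f in zip(weights, densities))
    return lhs - rhs


# Hypothetical histogram densities on [0, 1]; each integrates to 1.
width = 0.25
f1 = [2.0, 1.0, 0.5, 0.5]
f2 = [0.5, 0.5, 1.0, 2.0]

gap_alternative = entropy_gap([f1, f2], [15, 20], width)  # densities differ
gap_null = entropy_gap([f1, f1], [15, 20], width)         # densities equal
print(gap_alternative, gap_null)
```

In this toy setting the first gap plays the role of the strictly negative constant in (A.8), and the second illustrates the equality case that yields (A.4).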

References

Conover, W.J. (1971). Practical Nonparametric Statistics. Wiley, New York.
Dawid, A.P. (1992). Prequential analysis, stochastic complexity and Bayesian inference. In: J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith, Eds., Bayesian Statistics 4. Oxford Univ. Press, Oxford, 109-125 (with discussion).
Hájek, J. and Z. Šidák (1967). Theory of Rank Tests. Academic Press, New York.
Hall, P. and E.J. Hannan (1988). On stochastic complexity and nonparametric density estimation. Biometrika 75, 705-714.
Hettmansperger, T.P. (1984). Statistical Inference Based on Ranks. Wiley, New York.
Lehmann, E.L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, San Francisco.
Lehmann, E.L. (1986). Testing Statistical Hypotheses. 2nd ed. Wiley, New York.
Lock, R. (1992). PRO Football Scores, StatLib.
Qian, G. (1994). Statistical Modeling by Stochastic Complexity, unpublished Ph.D. dissertation, Dalhousie University, Dept. of Mathematics, Statistics and Computing Science, Halifax, Canada.
Randles, R.H. and D.A. Wolfe (1979). Introduction to the Theory of Nonparametric Statistics. Wiley, New York.
Rissanen, J. (1989). Stochastic Complexity in Statistical Inquiry. World Scientific, Singapore.
Rissanen, J. (1994). Fisher information and stochastic complexity. IEEE Trans. Inform. Theory (to appear).
Rissanen, J., T.P. Speed and B. Yu (1992). Density estimation by stochastic complexity. IEEE Trans. Inform. Theory 38, 315-323.
Shannon, C.E. (1948). A mathematical theory of communication. Bell Syst. Tech. J. 47, 143-157.