Computational Statistics & Data Analysis, Elsevier
Computational Statistics & Data Analysis 20 (1995) 409-419
A distribution free test for the two sample problem for general alternatives

Friedrich Schmid,* Mark Trede

Seminar für Wirtschafts- und Sozialstatistik, Universität zu Köln, Albertus-Magnus-Platz, 50923 Köln, Germany
Received October 1993; revised July 1994
Abstract. A new distribution free test for the two sample problem is presented. The test statistic is derived from a descriptive measure of interdistributional inequality and thus has an intuitive pictorial interpretation. Some quantiles of the exact distribution of the new test statistic under $H_0$ are computed for balanced samples of up to fourteen observations from each distribution. The asymptotic distribution of the test statistic under $H_0$ is that of the integral of the absolute value of a Brownian bridge. Using Monte Carlo simulations, we found that the distribution of the test statistic is already reasonably well approximated by the asymptotic distribution for rather small sample sizes. Further, we compare the power of the new test against general alternatives with that of other well-known two sample tests, namely the Cramer-von Mises test, the Kolmogorov-Smirnov test, and the Wilcoxon-Mann-Whitney test. It turns out that the new test performs similarly to the Cramer-von Mises test. Both tests clearly dominate the Kolmogorov-Smirnov test.

Keywords: Cramer-von Mises test; Kolmogorov-Smirnov test; Power of distribution free tests; P-P-Plot; Wilcoxon-Mann-Whitney test
1. Introduction

This paper presents a new distribution free test procedure for the two sample problem (i.e. $H_0: F = G$ versus $H_1: F \ne G$). It can be considered an alternative to well-known two sample tests such as the Kolmogorov-Smirnov or the Cramer-von Mises test. The new test is designed to have high power against general alternatives, i.e., alternatives other than location and scale. A possible field of application is testing for equality of income distributions.
* Corresponding author.

0167-9473/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved. SSDI 0167-9473(94)00045-X
F. Schmid, M. Trede / Computational Statistics & Data Analysis 20 (1995) 409-419
The structure of the paper is as follows. In the next section we motivate the test statistic, which is based on a descriptive measure of inequality between two distribution functions F and G. In Section 3 exact quantiles of the small sample distribution of the test statistic are given for the balanced case ($n = m \le 14$) and the asymptotic distribution is derived. For medium-sized samples ($n = m = 15, 25, 50$) we compute the distribution of the test statistic by means of Monte Carlo simulation and compare it with the asymptotic distribution. Section 4 is devoted to power studies. In particular, we compare the new test to the Kolmogorov-Smirnov, the Wilcoxon-Mann-Whitney and the Cramer-von Mises tests in terms of power against general alternatives within parametric families of income distributions. It turns out that the properties of the new test are similar to those of the Cramer-von Mises test.
2. Motivation of the test statistic

Let F, G denote two continuous distribution functions on the real line. F and G may represent the income or earnings distributions of individuals with different socio-economic characteristics. For instance, we might compare the earnings distribution of blue-collar workers with that of white-collar workers. Consider the probability plots (P-P-plots) $\mathbb{R} \to [0,1]^2$,

$$x \mapsto \begin{pmatrix} F(x) \\ G(x) \end{pmatrix} \quad\text{and}\quad x \mapsto \begin{pmatrix} G(x) \\ F(x) \end{pmatrix},$$

or, written as functions $[0,1] \to [0,1]$ rather than as curves,

$$p \mapsto F(G^{-1}(p)) \quad\text{and}\quad p \mapsto G(F^{-1}(p))$$

(see Fig. 1 for an example). The graphs of the two curves coincide with the diagonal of the unit square if and only if F = G. Therefore, the area between the two curves can be taken as a descriptive measure of inequality between F and G. This area equals
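$$A(F,G) = \int_0^1 \left| F(G^{-1}(p)) - G(F^{-1}(p)) \right| dp = \int_{-\infty}^{+\infty} |F(x) - G(x)| \, dF(x) + \int_{-\infty}^{+\infty} |F(x) - G(x)| \, dG(x).$$

(For strictly increasing F and G the two curves are mirror images of each other in the diagonal and, at every p, lie on opposite sides of it. The area therefore splits into the area between $p \mapsto G(F^{-1}(p))$ and the diagonal, which becomes the first integral after substituting $p = F(x)$, and the area between $p \mapsto F(G^{-1}(p))$ and the diagonal, which becomes the second integral after substituting $p = G(x)$.)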
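As a numerical sanity check of the two expressions for $A(F,G)$, the following Python sketch (assuming NumPy and SciPy are available) evaluates both the area between the P-P curves and the sum of the two integrals over x for the pair $F(x) = 1 - e^{-x}$, $G(x) = 1 - (1 + 1.8x)^{-1}$ used in Fig. 1. The helper names are illustrative only. The integration range is split at the positive crossing point of F and G so that the kink of $|F - G|$ never sits inside a quadrature interval.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

# Pair of distributions from Fig. 1 (support x >= 0).
def F(x):
    return 1.0 - np.exp(-x)

def G(x):
    return 1.0 - 1.0 / (1.0 + 1.8 * x)

def f(x):  # density of F
    return np.exp(-x)

def g(x):  # density of G
    return 1.8 / (1.0 + 1.8 * x) ** 2

def F_inv(p):
    return -np.log1p(-p)

def G_inv(p):
    return p / (1.8 * (1.0 - p))

# Positive crossing point x* of F and G (root of exp(x) = 1 + 1.8x).
x_star = brentq(lambda x: F(x) - G(x), 0.5, 2.0)

def integrand_x(x):
    # |F - G| integrated against dF + dG, i.e. against the density f + g
    return abs(F(x) - G(x)) * (f(x) + g(x))

A_x = (quad(integrand_x, 0.0, x_star, limit=200)[0]
       + quad(integrand_x, x_star, np.inf, limit=200)[0])

def integrand_p(p):
    # Vertical distance between the two P-P curves; both tend to 1 as p -> 1.
    if p >= 1.0:
        return 0.0
    return abs(F(G_inv(p)) - G(F_inv(p)))

# The curves meet on the diagonal at p* = F(x*); pass it as a break point.
A_p = quad(integrand_p, 0.0, 1.0, points=[F(x_star)], limit=200)[0]

print(f"A(F, G) from the x-integrals: {A_x:.6f}")
print(f"A(F, G) from the P-P area:    {A_p:.6f}")
```

The two printed values should agree to quadrature accuracy; only the ordering of the integration variable differs.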
[Figure: P-P-plot on the unit square; horizontal axis F(x), G(x), from 0 to 1.]
Fig. 1. P-P-plot of $F(x) = 1 - e^{-x}$ and $G(x) = 1 - (1 + 1.8x)^{-1}$.
Obviously,

(i) $A(F,G) \ge 0$, (ii) $A(F,G) = A(G,F)$, (iii) $A(F,G) = 0 \iff F = G$.

Note that $A(F,G)$ is not a metric on the space of distribution functions since it does not satisfy the triangle inequality. A discussion of further properties of $A(F,G)$ and an empirical application to the measurement of interdistributional income inequality can be found in Schmid (1993). It is now simple to derive a test statistic from A. Let $X_1, \dots, X_n$ and $Y_1, \dots, Y_m$ denote independent random samples from F and G, and let $\hat F_n$ and $\hat G_m$ denote the corresponding empirical distribution functions. Define the test statistic
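$$T_{A,n,m} = \left(\frac{nm}{n+m}\right)^{1/2} \int_{-\infty}^{+\infty} \left| \hat F_n(x) - \hat G_m(x) \right| \, d\!\left(\frac{n \hat F_n(x) + m \hat G_m(x)}{n+m}\right).$$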
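Because the pooled empirical distribution function puts mass $1/(n+m)$ on each pooled observation, the integral is simply the average of $|\hat F_n - \hat G_m|$ over the pooled sample. The Python sketch below (assuming NumPy and continuous, tie-free data; the function name `t_a_statistic` is our own) evaluates the definition in exactly this way.

```python
import numpy as np

def t_a_statistic(x, y):
    """T_{A,n,m}: scaled L1 distance between the two empirical CDFs,
    integrated against the pooled empirical CDF (assumes no ties)."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    n, m = len(x), len(y)
    z = np.concatenate([x, y])  # pooled ECDF puts mass 1/(n+m) on each point
    f_n = np.searchsorted(x, z, side="right") / n  # F_n at the pooled points
    g_m = np.searchsorted(y, z, side="right") / m  # G_m at the pooled points
    scale = np.sqrt(n * m / (n + m))
    return scale * np.mean(np.abs(f_n - g_m))

print(t_a_statistic([1.0, 2.0], [3.0, 4.0]))  # completely separated samples
print(t_a_statistic([1.0, 3.0], [2.0, 4.0]))  # interleaved samples
```

For two completely separated samples of size two the statistic equals 0.5, the largest value listed for n = m = 2 in Table 1; interleaving the samples reduces it to 0.25.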
Note that this statistic is very similar to the Cramer-von Mises statistic

$$T_{CM,n,m} = \frac{nm}{n+m} \int_{-\infty}^{+\infty} \left( \hat F_n(x) - \hat G_m(x) \right)^2 d\!\left(\frac{n \hat F_n(x) + m \hat G_m(x)}{n+m}\right),$$
as considered by Anderson (1962). $T_{A,n,m}$ is in fact just the $L_1$-version of the Cramer-von Mises statistic. Various modifications and extensions of $T_{A,n,m}$ are possible in principle; e.g., one might be tempted to introduce a weight function $\psi(\hat F_n(x), \hat G_m(x))$ in order to give different weights to the differences $|\hat F_n(x) - \hat G_m(x)|$, thus arriving at an $L_1$-version of the Anderson-Darling statistic. Details of these modifications, however, are not pursued in this paper.

$T_{A,n,m}$ may be expressed in terms of ranks. Let $Z_{(1)} \le Z_{(2)} \le \dots \le Z_{(n+m)}$ denote the ordered pooled sample, let $X_{(1)} \le \dots \le X_{(n)}$ and $Y_{(1)} \le \dots \le Y_{(m)}$ denote the ordered samples, and let

$R_X(i)$ := rank of $X_{(i)}$ in $Z_{(1)} \le \dots \le Z_{(n+m)}$, $\quad R_Y(j)$ := rank of $Y_{(j)}$ in $Z_{(1)} \le \dots \le Z_{(n+m)}$.

It can be shown that
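$$T_{A,n,m} = \left(\frac{nm}{n+m}\right)^{1/2} \frac{1}{n+m} \left\{ \sum_{i=1}^{n} \left| \frac{R_X(i)}{m} - \frac{i}{n} - \frac{i}{m} \right| + \sum_{j=1}^{m} \left| \frac{R_Y(j)}{n} - \frac{j}{n} - \frac{j}{m} \right| \right\}.$$

For the balanced case n = m this reduces to

$$T_{A,n,n} = \left(\frac{n}{2}\right)^{1/2} \frac{1}{2n^2} \left\{ \sum_{i=1}^{n} \left| R_X(i) - 2i \right| + \sum_{j=1}^{n} \left| R_Y(j) - 2j \right| \right\}.$$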
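The rank representation can be checked numerically against the definition. The Python sketch below (assuming NumPy and tie-free data; all function names are hypothetical) computes $T_{A,n,m}$ directly from the empirical distribution functions, from the general rank formula, and, for n = m, from the balanced-case formula.

```python
import numpy as np

def t_a_direct(x, y):
    """T_{A,n,m} from the empirical CDFs, averaged over the pooled sample."""
    x, y = np.sort(x), np.sort(y)
    n, m = len(x), len(y)
    z = np.concatenate([x, y])
    diff = np.searchsorted(x, z, side="right") / n - np.searchsorted(y, z, side="right") / m
    return np.sqrt(n * m / (n + m)) * np.mean(np.abs(diff))

def pooled_ranks(x, y):
    """Sorted pooled ranks R_X(1) < ... < R_X(n) and R_Y(1) < ... < R_Y(m) (no ties)."""
    n = len(x)
    pooled = np.concatenate([x, y])
    ranks = np.empty(len(pooled), dtype=int)
    ranks[np.argsort(pooled)] = np.arange(1, len(pooled) + 1)
    return np.sort(ranks[:n]), np.sort(ranks[n:])

def t_a_from_ranks(r_x, r_y):
    """General rank form: at X_(i) the ECDF difference is i/n - (R_X(i) - i)/m."""
    n, m = len(r_x), len(r_y)
    i = np.arange(1, n + 1)
    j = np.arange(1, m + 1)
    s = np.abs(r_x / m - i / n - i / m).sum() + np.abs(r_y / n - j / n - j / m).sum()
    return np.sqrt(n * m / (n + m)) * s / (n + m)

def t_a_balanced(r_x, r_y):
    """Balanced case n = m: only integer rank differences are needed."""
    n = len(r_x)
    i = np.arange(1, n + 1)
    s = np.abs(r_x - 2 * i).sum() + np.abs(r_y - 2 * i).sum()
    return np.sqrt(n / 2) * s / (2 * n ** 2)

rng = np.random.default_rng(0)
x, y = rng.normal(size=7), rng.exponential(size=5)
print(t_a_direct(x, y), t_a_from_ranks(*pooled_ranks(x, y)))
u, v = rng.normal(size=6), rng.normal(size=6)
print(t_a_direct(u, v), t_a_balanced(*pooled_ranks(u, v)))
```

Each printed pair should coincide up to rounding, since every pooled point is either some $X_{(i)}$ or some $Y_{(j)}$.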
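Since under $H_0$ every assignment of the pooled ranks $1, \dots, n+m$ to the two samples occurs with equal probability, and $T_{A,n,m}$ depends on the data only through these ranks, the test statistic is distribution free under $H_0$.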
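Because only the set of pooled ranks occupied by the X-sample matters, the exact null distribution in the balanced case can be obtained by brute force for small n: each of the $\binom{2n}{n}$ rank sets is equally likely. The Python sketch below (standard library only; the function name is our own) enumerates them, groups them by the integer rank sum $S = \sum_i |R_X(i) - 2i| + \sum_j |R_Y(j) - 2j|$, and returns values c with upper-tail probabilities $P(T_{A,n,n} \ge c)$. Since $\binom{28}{14}$ is about $4 \cdot 10^7$, this plain enumeration is only practical for the smaller sample sizes.

```python
from collections import Counter
from itertools import combinations
from math import comb, sqrt

def exact_null_distribution(n):
    """Exact null distribution of T_{A,n,n}, as (c, P(T >= c)) pairs, largest c first."""
    counts = Counter()
    all_ranks = range(1, 2 * n + 1)
    for r_x in combinations(all_ranks, n):  # combinations are produced in sorted order
        x_set = set(r_x)
        r_y = [r for r in all_ranks if r not in x_set]  # remaining ranks, also sorted
        s = sum(abs(r - 2 * i) for i, r in enumerate(r_x, start=1))
        s += sum(abs(r - 2 * j) for j, r in enumerate(r_y, start=1))
        counts[s] += 1
    total = comb(2 * n, n)
    scale = sqrt(n / 2) / (2 * n ** 2)
    table, tail = [], 0
    for s in sorted(counts, reverse=True):  # accumulate the upper tail
        tail += counts[s]
        table.append((scale * s, tail / total))
    return table

for n in (2, 3, 4):
    for c, p in exact_null_distribution(n)[:3]:
        print(f"n = {n}: c = {c:.6f}, P(T >= c) = {p:.6f}")
```

For n = 2, 3, 4 the largest values and their tail probabilities should match the first entries of Table 1 (for example 0.500000 with probability 0.333333 when n = 2).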
3. Finite sample and asymptotic distribution of the test statistic $T_{A,n,m}$ under $H_0$

The asymptotic distribution of the test statistic $T_{A,n,m}$ can be derived in the same way as that of the Cramer-von Mises statistic $T_{CM,n,m}$. Both are continuous functionals of the empirical process

$$\left(\frac{nm}{n+m}\right)^{1/2} \left( \hat F_n(x) - \hat G_m(x) \right),$$

which, under $H_0$ and in the time scale $t = F(x)$, converges in distribution to a Brownian bridge $U(t)$ (see Shorack and Wellner, 1986; Durbin, 1973). For the Cramer-von Mises statistic we have (see Durbin, 1973), under $H_0$, as $n, m \to \infty$ and $n/m \to \lambda$ for some $\lambda > 0$,

$$T_{CM,n,m} \xrightarrow{d} \int_0^1 U(t)^2 \, dt,$$

and in the same way we may conclude that under $H_0$

$$T_{A,n,m} \xrightarrow{d} \int_0^1 |U(t)| \, dt.$$
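The limiting distribution can be approximated by simulating Brownian bridges on a grid. A worked check is available for its mean: since $U(t) \sim N(0, t(1-t))$, we have $E|U(t)| = \sqrt{2/\pi}\,\sqrt{t(1-t)}$, and with $\int_0^1 \sqrt{t(1-t)}\,dt = \pi/8$ this gives $E \int_0^1 |U(t)|\,dt = \sqrt{2\pi}/8 \approx 0.3133$. The Python sketch below (assuming NumPy; grid size, number of paths, seed and function name are arbitrary choices of ours) should reproduce this value up to Monte Carlo error, and its 0.95-quantile should lie close to the asymptotic value 0.5821 reported in Table 2.

```python
import numpy as np

def simulate_l1_bridge(n_paths=4000, n_steps=400, seed=1):
    """Draws of the integral of |U(t)| over [0, 1], U a Brownian bridge, via a Riemann sum."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    # Brownian motion W at t = dt, 2 dt, ..., 1 (one path per row)
    w = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps)), axis=1)
    t = np.arange(1, n_steps + 1) * dt
    bridge = w - t * w[:, -1:]            # U(t) = W(t) - t W(1): exact bridge at grid points
    return np.abs(bridge).mean(axis=1)    # (1/n_steps) * sum of |U(k dt)| approximates the integral

draws = simulate_l1_bridge()
exact_mean = np.sqrt(2.0 * np.pi) / 8.0
print(f"simulated mean {draws.mean():.4f} vs exact mean {exact_mean:.4f}")
print(f"simulated 0.95-quantile {np.quantile(draws, 0.95):.4f}")
```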
Table 1. Exact quantiles of the distribution of $T_{A,n,m}$ under $H_0$ for the balanced case with n = m = 2, ..., 14. Each entry gives a value c followed in parentheses by $P(T_{A,n,m} \ge c)$.

n = m = 2: 0.500000 (0.333333)
n = m = 3: 0.612372 (0.100000); 0.476290 (0.200000)
n = m = 4: 0.707107 (0.028571); 0.618718 (0.057143); 0.530330 (0.114286)
n = m = 5: 0.790570 (0.007937); 0.727324 (0.015873); 0.664078 (0.031746); 0.600833 (0.055556); 0.537587 (0.111111)
n = m = 6: 0.769800 (0.008658); 0.721688 (0.015152); 0.625463 (0.045455); 0.577350 (0.069264); 0.529238 (0.101732)
n = m = 7: 0.782694 (0.006993); 0.744513 (0.011072); 0.629973 (0.040210); 0.591793 (0.056527); 0.553613 (0.079837); 0.515432 (0.110722)
n = m = 8: 0.781250 (0.007304); 0.750000 (0.010723); 0.625000 (0.040404); 0.593750 (0.054701); 0.531250 (0.093085); 0.500000 (0.119503)
n = m = 9: 0.772579 (0.008021); 0.746390 (0.011024); 0.615445 (0.043233); 0.589256 (0.054957); 0.536877 (0.086548); 0.510688 (0.107569)
n = m = 10: 0.760263 (0.009277); 0.737902 (0.012070); 0.603738 (0.047241); 0.581378 (0.057838); 0.536656 (0.084836); 0.514295 (0.102026)
n = m = 11: 0.765584 (0.008701); 0.746203 (0.010895); 0.610529 (0.043673); 0.591147 (0.052082); 0.533002 (0.086092); 0.513620 (0.101025)
n = m = 12: 0.765466 (0.008693); 0.748455 (0.010561); 0.595362 (0.049513); 0.578352 (0.057619); 0.527321 (0.089028); 0.510310 (0.102342)
n = m = 13: 0.761836 (0.009050); 0.746750 (0.010734); 0.595891 (0.048745); 0.580805 (0.055758); 0.520462 (0.093202); 0.505376 (0.105384)
n = m = 14: 0.755929 (0.009657); 0.742430 (0.011226); 0.593944 (0.049170); 0.580445 (0.055443); 0.512952 (0.098194); 0.499453 (0.109535)
The distribution of $\int_0^1 |U(t)|\,dt$ has been studied by Johnson and Killeen (1983), Rice (1982) and Shepp (1982), and a table of quantiles is given in Johnson and Killeen (1983). For large samples the test procedure is now standard: $H_0: F = G$ is rejected in favour of $H_1: F \ne G$ if $T_{A,n,m} > c_{1-\alpha}$, where $\alpha$ is the significance level and $c_{1-\alpha}$ is the $(1-\alpha)$-quantile of the distribution of $\int_0^1 |U(t)|\,dt$. The test is consistent against any alternative $F \ne G$, as follows immediately from the definition of the test statistic $T_{A,n,m}$: when $F \ne G$, the normalized statistic $((n+m)/(nm))^{1/2}\,T_{A,n,m}$ converges to a strictly positive limit, so $T_{A,n,m}$ exceeds any fixed critical value with probability tending to one. However, some Monte Carlo simulations suggested that the test is not unbiased.

For small samples (n = m = 2, ..., 14) we have computed some relevant quantiles of the exact distribution of $T_{A,n,m}$ under $H_0$. They are shown in Table 1. Since $T_{A,n,m}$ has a discrete distribution, a prescribed significance level cannot in general be attained exactly unless randomization is used.

To determine the distribution of $T_{A,n,m}$ for medium sample sizes we carried out Monte Carlo simulations setting n = m = 15, 25, 50. Fig. 2 shows a histogram of N = 10000 simulated values of $T_{A,25,25}$ and the density of $\int_0^1 |U(t)|\,dt$. Table 2 shows quantiles of $T_{A,n,m}$ based on N = 10000 simulations for n = m = 15, 25, 50. Besides, the last column reports the asymptotic quantiles (taken from Johnson and Killeen, 1983). The two bottom rows show the mean and the standard deviation of the simulated $T_{A,n,m}$ and of the asymptotic distribution. Tables 1 and 2 and Fig. 2 indicate fairly quick convergence of the distribution of $T_{A,n,m}$ to the distribution of $\int_0^1 |U(t)|\,dt$ under $H_0$. The asymptotic quantiles are reasonably accurate even for sample sizes as small as n = m = 15. (We have not looked at the unbalanced case $n \ne m$.)

[Figure: histogram of simulated values of $T_{A,25,25}$ with the asymptotic density superimposed; horizontal axis $T_{A,25,25}$, from 0.0 to 1.0.]
Fig. 2. Histogram and density for n = m = 25.

Table 2. Simulated quantiles of $T_{A,n,m}$ for the balanced case with n = m = 15, 25, 50; the column ∞ gives the asymptotic values.

Quantile | n = m = 15 | n = m = 25 | n = m = 50 | ∞
x0.05 | 0.1521 | 0.1556 | 0.1520 | 0.1531
x0.10 | 0.1765 | 0.1725 | 0.1720 | 0.1721
x0.50 | 0.2860 | 0.2857 | 0.2820 | 0.2818
x0.90 | 0.5051 | 0.5006 | 0.4960 | 0.4993
x0.95 | 0.5903 | 0.5855 | 0.5820 | 0.5821
x0.99 | 0.7364 | 0.7439 | 0.7460 | 0.7518
mean (μ) | 0.3177 | 0.3150 | 0.3127 | 0.3133
standard deviation (σ) | 0.1358 | 0.1364 | 0.1348 | 0.1360
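Since the null distribution depends only on how the pooled ranks are split between the two samples, simulated tables such as Table 2 can be reproduced by drawing random rank splits instead of samples from any particular F. The Python sketch below (assuming NumPy; function names, seed and the observed value `t_obs` are hypothetical choices of ours) does this for n = m = 25 with the balanced rank formula, applies the large-sample rejection rule with the asymptotic 0.95-quantile 0.5821 from Table 2, and shows how the simulated values also yield a Monte Carlo p-value.

```python
import numpy as np

def simulate_null_t_a(n, n_rep=10000, seed=2):
    """Simulated null distribution of T_{A,n,n} from uniformly random splits of the ranks 1..2n."""
    rng = np.random.default_rng(seed)
    perms = np.argsort(rng.random((n_rep, 2 * n)), axis=1) + 1  # one random permutation of 1..2n per row
    r_x = np.sort(perms[:, :n], axis=1)  # ranks assigned to the X-sample, sorted
    r_y = np.sort(perms[:, n:], axis=1)  # remaining ranks belong to the Y-sample
    i = np.arange(1, n + 1)
    s = np.abs(r_x - 2 * i).sum(axis=1) + np.abs(r_y - 2 * i).sum(axis=1)
    return np.sqrt(n / 2) * s / (2 * n ** 2)

def monte_carlo_p_value(t_obs, null_draws):
    """Fraction of simulated null values at least as large as the observed statistic."""
    return float(np.mean(null_draws >= t_obs))

null_25 = simulate_null_t_a(25)
for q in (0.05, 0.5, 0.95, 0.99):
    print(f"x_{q}: {np.quantile(null_25, q):.4f}")
print(f"mean {null_25.mean():.4f}, sd {null_25.std(ddof=1):.4f}")

C_ASYMPTOTIC_95 = 0.5821  # asymptotic 0.95-quantile (Table 2, asymptotic column)
t_obs = 0.62              # hypothetical observed value of T_{A,25,25}
print("reject H0 at the 5% level (asymptotic rule):", t_obs > C_ASYMPTOTIC_95)
print("Monte Carlo p-value:", monte_carlo_p_value(t_obs, null_25))
```

Simulating rank splits directly avoids generating and sorting continuous samples, which is legitimate only because the statistic is distribution free under $H_0$.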
4. Power of the test for general alternatives

The power of tests for the two sample problem is usually investigated with respect to location and scale alternatives. As we are interested in testing for equality of income distributions, we determine the power of the new test with respect to general alternatives within families of skewed distributions and compare it with the power of the Cramer-von Mises test, the Kolmogorov-Smirnov test and the Wilcoxon-Mann-Whitney test. The families of distributions considered are:

- Pareto distributions, i.e., $F_{PA}(x) = 1 - (1/x)^{\alpha}$, $x > 1$,
- Log-normal distributions, i.e., $F_{LN}(x) = \Phi(\log x / \sigma)$, $x > 0$, where $\Phi$ denotes the standard normal distribution function,
- Singh-Maddala distributions, i.e., $F_{SM}(x) = 1 - (1 + x^b)^{-c}$, $x > 0$.

The parameters of interest are $\alpha$, $\sigma$, $b$ and $c$, respectively. Parameters of scale have been set to unity throughout. Since the power can only be computed for specific alternatives, we varied one of the parameters of interest and determined the power for each value by Monte Carlo simulation (with N = 10 000 replications). The power functions shown in Figs. 3-6 are cubic spline interpolations of some ten points in each case. They refer to a sample size of n = m = 25.

[Figure: power functions of the compared tests; horizontal axis: parameter α of the Pareto distribution, from 1.0 to 3.0.]
Fig. 3. Power functions for the Pareto alternatives.

[Figure: power functions of the compared tests; horizontal axis: parameter σ of the log-normal distribution, from 1.0 to 4.2.]
Fig. 4. Power functions for the log-normal alternatives.

[Figure: power functions of the compared tests; horizontal axis: parameter b of the Singh-Maddala distribution, from 1.4 to 4.2.]
Fig. 5. Power functions for the Singh-Maddala alternatives.

[Figure: power functions of the compared tests; horizontal axis: parameter c of the Singh-Maddala distribution, from 2.0 to 5.6.]
Fig. 6. Power functions for the Singh-Maddala alternatives.

The close relationship between the new test and the Cramer-von Mises test is a salient feature of the figures: their power functions are almost identical. The Wilcoxon-Mann-Whitney test is as powerful as the latter two tests for Pareto alternatives and for Singh-Maddala alternatives concerning parameter c. It is clearly less powerful for Singh-Maddala alternatives concerning parameter b. Within the family of log-normal distributions (with $\mu = 0$, $\sigma > 0$) it has minimal power because
$$P(X < Y) = \int_0^{\infty} F_X(x) \, dF_Y(x) = \frac{1}{2}$$
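for all $\sigma > 0$, since $\log X$ and $\log Y$ are independent normal variables with mean zero, so that $\log Y - \log X$ is symmetric about zero. It is well known that the Wilcoxon-Mann-Whitney test is not consistent for alternatives with $P(X < Y) = 1/2$ (see Gibbons and Chakraborti, 1992). Notice that for the alternatives under consideration the Kolmogorov-Smirnov test is dominated by both the new test and the Cramer-von Mises test.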
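The power study can be sketched in the same spirit. The Python code below (assuming NumPy) draws from the three families by inverting the distribution functions listed above, with scale parameters equal to one, and estimates the rejection rate of the new test at n = m = 25 using the asymptotic 5% critical value 0.5821 from Table 2. The text does not state from which reference distribution the first sample is drawn; purely as a hypothetical design, the sketch compares Pareto($\alpha = 1$) with Pareto($\alpha$), and it uses far fewer replications than the N = 10 000 of the study, so its numbers are only illustrative. It also checks by simulation that $P(X < Y) \approx 1/2$ for two log-normal samples with $\mu = 0$, the reason given above for the low power of the Wilcoxon-Mann-Whitney test.

```python
import numpy as np

# Inverse-CDF samplers (scale parameters fixed at 1).
def sample_pareto(alpha, size, rng):
    # F(x) = 1 - x^(-alpha), x > 1  =>  X = V^(-1/alpha), V uniform on (0, 1]
    return (1.0 - rng.random(size)) ** (-1.0 / alpha)

def sample_lognormal(sigma, size, rng):
    # F(x) = Phi(log(x) / sigma)  =>  X = exp(sigma * Z), Z standard normal
    return np.exp(sigma * rng.standard_normal(size))

def sample_singh_maddala(b, c, size, rng):
    # F(x) = 1 - (1 + x^b)^(-c)  =>  X = ((1 - U)^(-1/c) - 1)^(1/b)
    u = rng.random(size)
    return ((1.0 - u) ** (-1.0 / c) - 1.0) ** (1.0 / b)

def t_a(x, y):
    """T_{A,n,m} from the empirical CDFs, averaged over the pooled sample (no ties)."""
    x, y = np.sort(x), np.sort(y)
    n, m = len(x), len(y)
    z = np.concatenate([x, y])
    diff = np.searchsorted(x, z, side="right") / n - np.searchsorted(y, z, side="right") / m
    return np.sqrt(n * m / (n + m)) * np.mean(np.abs(diff))

def estimate_power(draw_x, draw_y, n=25, n_rep=2000, crit=0.5821, seed=3):
    """Share of replications with T_{A,n,n} > crit (asymptotic 5% critical value)."""
    rng = np.random.default_rng(seed)
    rejections = sum(t_a(draw_x(n, rng), draw_y(n, rng)) > crit for _ in range(n_rep))
    return rejections / n_rep

# Hypothetical design: reference sample Pareto(alpha = 1) against Pareto(alpha).
ref = lambda k, r: sample_pareto(1.0, k, r)
size_pa = estimate_power(ref, lambda k, r: sample_pareto(1.0, k, r))
power_pa = estimate_power(ref, lambda k, r: sample_pareto(3.0, k, r))
print(f"rejection rate under H0: {size_pa:.3f}, power against alpha = 3: {power_pa:.3f}")

# Sampler sanity checks: Pareto(1) has median 2, Singh-Maddala(b = 2, c = 1) has median 1.
rng = np.random.default_rng(4)
pa = sample_pareto(1.0, 200_000, rng)
sm = sample_singh_maddala(2.0, 1.0, 200_000, rng)
print(f"medians: Pareto {np.median(pa):.3f}, Singh-Maddala {np.median(sm):.3f}")

# Log-normal with mu = 0: P(X < Y) = 1/2 whatever the two sigmas are.
x_ln, y_ln = sample_lognormal(1.0, 200_000, rng), sample_lognormal(2.0, 200_000, rng)
print(f"estimated P(X < Y) for sigma = 1 vs 2: {np.mean(x_ln < y_ln):.4f}")
```

Inverse-CDF sampling is used only because all three distribution functions invert in closed form; any other exact sampler would serve equally well.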
5. Conclusion

We have presented a new distribution free test procedure for the two sample problem which is based on a descriptive measure of inequality between distribution functions and thus has an intuitive and pictorial appeal. For balanced samples of sizes n = m = 2, ..., 14 we present exact quantiles of the test statistic. For n = m = 15, 25, 50 the distribution of the test statistic is approximated by Monte Carlo simulation.
There is evidence that, for balanced samples, the finite sample distribution converges fairly quickly to the asymptotic distribution. In terms of power the test is very similar to the Cramer-von Mises test; we have not yet found instances where one of the two tests significantly dominates the other. It is an interesting finding that the Kolmogorov-Smirnov test is always inferior to the new test and the Cramer-von Mises test for the general alternatives under investigation, suggesting that the widespread popularity of the Kolmogorov-Smirnov test in textbooks and in empirical applications is unjustified.
References

Anderson, T.W., On the distribution of the two sample Cramer-von Mises criterion, Ann. Math. Statist. 33 (1962) 1148-1159.
Durbin, J., Distribution Theory for Tests Based on the Sample Distribution Function, Regional Conference Series in Applied Mathematics (Society for Industrial and Applied Mathematics, Philadelphia, PA, 1973).
Gibbons, J.D. and S. Chakraborti, Nonparametric Statistical Inference (Dekker, New York, 3rd ed., 1992).
Johnson, B.McK. and T. Killeen, An explicit formula for the C.D.F. of the L1 norm of the Brownian bridge, Ann. Probab. 11 (1983) 807-808.
Randles, R.H. and D.A. Wolfe, Introduction to the Theory of Nonparametric Statistics (Wiley, New York, 1979).
Rice, S.O., The integral of the absolute value of the pinned Wiener process: calculation of its probability density by numerical integration, Ann. Probab. 10 (1982) 240-243.
Schmid, F., Measuring interdistributional inequality, mimeographed (1993).
Shepp, L.A., On the integral of the absolute value of the pinned Wiener process, Ann. Probab. 10 (1982) 234-239.
Shorack, G.R. and J.A. Wellner, Empirical Processes with Applications to Statistics (Wiley, New York, 1986).