Journal of Statistical Planning and Inference 41 (1994) 61-72
North-Holland
Further comments on Matveychuk and Petunin's generalized Bernoulli model, and nonparametric tests of homogeneity

Norman L. Johnson
Department of Statistics, University of North Carolina, Chapel Hill, NC 27599-3260, USA

Samuel Kotz
Department of Management Science and Statistics, University of Maryland, College Park, MD 20742, USA

Received 19 February 1993
Matveychuk and Petunin (Ukrainian Math. J. 42 (1990) 518-528; Ukrainian Math. J. 43 (1991) 779-785) and Johnson and Kotz (Statistical Papers 32 (1991) 1-17) have studied a generalized Bernoulli model defined in terms of placement statistics from two random samples. In this paper we will comment on a test of homogeneity for two populations, introduced by Matveychuk and Petunin, and suggest a possible symmetrical test criterion. Illustrative tables, relevant to the latter, are presented.

AMS Subject Classification: 62G10.

Key words: Placement statistics; tests of homogeneity.
1. Introduction
We will, for convenience, use the abbreviation MP for Matveychuk and Petunin. Also, for the convenience of readers, we recapitulate the MP (1990/1991) model. Let $X=(X_1,\ldots,X_n)$, $Y=(Y_1,\ldots,Y_m)$ be random samples of sizes $n$, $m$ from absolutely continuous distributions with common cumulative distribution functions (CDFs) $F_X(x)$, $F_Y(y)$, respectively. Denote the order statistics of $X$ by $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$. We are concerned, firstly, with the distributions of the number, $T$, of $Y$'s falling in $J_{i,q} = (X_{(i)}, X_{(i+q)})$, and the relative frequency, $T/m$.
Correspondence to: N.L. Johnson, Department of Statistics, University of North Carolina, Chapel Hill, NC 27599-3260, USA.
0378-3758/94/$07.00 © 1994 Elsevier Science B.V. All rights reserved
SSDI 0378-3758(93)E0088-X
When $F_X(x) = F_Y(x)$, the distribution of $T$ is

$$\Pr[T=t] = \binom{m}{t}\frac{q(q+1)\cdots(q+t-1)\,(n+1-q)\cdots(n+m-t-q)}{(n+1)(n+2)\cdots(n+m)} = \binom{m}{t}\frac{q^{[t]}\,(n+1-q)^{[m-t]}}{(n+1)^{[m]}} \qquad (t=0,1,\ldots,m), \tag{1}$$

where $a^{[b]} = a(a+1)\cdots(a+b-1)$ (see equation (1) of MP (1991); equation (19) of Johnson and Kotz (1991)).

2. Matveychuk and Petunin's tests of homogeneity
MP (1991) suggest using $T$ as a test criterion for the hypothesis $H_0$: $F_X(x) = F_Y(x)$, using the data described in Section 1. Their analysis is based on the limiting distribution of the statistic $T$. They propose a critical region of the form

$$|T - E[T \mid H_0]| > t_\beta\,\sigma(T \mid H_0), \tag{2}$$

where $t_\beta$ is chosen to make the (approximate) level of significance of the test equal to $2\beta$. For example, if approximate normality of $T$ is appropriate then $t_\beta$ would be defined by $\Phi(-t_\beta)=\beta$.
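As an illustration of how equation (1) and a region of the form (2) might be evaluated numerically, the following is a minimal sketch rather than MP's own procedure. The function names `rising`, `null_pmf` and `critical_region` are hypothetical, and the normal quantile for $t_\beta$ is the approximation mentioned in the text, not an exact calibration. Note that under $H_0$ the distribution in (1) depends on $q$ but not on $i$.

```python
from fractions import Fraction
from math import comb, sqrt
from statistics import NormalDist


def rising(a, b):
    """Ascending factorial a^[b] = a(a+1)...(a+b-1); equals 1 when b = 0."""
    out = 1
    for k in range(b):
        out *= a + k
    return out


def null_pmf(n, m, q):
    """Exact Pr[T = t] for t = 0..m under H0, following equation (1)."""
    denom = rising(n + 1, m)
    return [
        Fraction(comb(m, t) * rising(q, t) * rising(n + 1 - q, m - t), denom)
        for t in range(m + 1)
    ]


def critical_region(n, m, q, beta):
    """Values of t inside a region of form (2), and its exact null probability.

    Assumption: t_beta is taken from the standard normal distribution,
    i.e. Phi(-t_beta) = beta.
    """
    pmf = null_pmf(n, m, q)
    mean = sum(t * p for t, p in enumerate(pmf))
    var = sum((t - mean) ** 2 * p for t, p in enumerate(pmf))
    t_beta = NormalDist().inv_cdf(1 - beta)
    cutoff = t_beta * sqrt(var)
    region = [t for t in range(m + 1) if abs(t - mean) > cutoff]
    return region, float(sum(pmf[t] for t in region))
```

Comparing the exact null probability returned by `critical_region` with the nominal level $2\beta$ gives a rough idea of how well the normal approximation works for given $n$, $m$ and $q$.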
Katzenbeisser (1985) obtained a formula for the distribution of $T$ when the interval $J_{i,q}$ is just $(-\infty, X_{(\cdot)})$, bounded above by a single $X$-order statistic, and applied it to derive the exact distribution of the test criterion under the null hypothesis and under Lehmann alternatives $F_Y(x)=[F_X(x)]^{\theta}$ with $\theta \ne 1$. He extended the analysis to shift alternatives (Katzenbeisser, 1986). We (Johnson and Kotz, 1991) obtained formulas for the moments of $P = \Pr[Y \in J_{i,q}]$ under Lehmann alternatives. From these it is straightforward to calculate the expected value and variance of $T$ from

$$E[T] = mE[P], \qquad \mathrm{Var}(T) = mE[P]\{1-E[P]\} + m(m-1)\mathrm{Var}(P) \tag{3}$$
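A small numerical sketch of equation (3) may help. The function names below are hypothetical. The null moments of $P$ rely on the standard fact, assumed here and not stated in the text, that under $H_0$ the coverage $P$ of $(X_{(i)}, X_{(i+q)})$ has a Beta$(q, n+1-q)$ distribution.

```python
from fractions import Fraction


def moments_of_T(m, mean_p, var_p):
    """E[T] and Var(T) from the mean and variance of P, as in equation (3)."""
    mean_t = m * mean_p
    var_t = m * mean_p * (1 - mean_p) + m * (m - 1) * var_p
    return mean_t, var_t


def null_moments_of_P(n, q):
    """Mean and variance of P under H0, assuming P ~ Beta(q, n + 1 - q)."""
    mean_p = Fraction(q, n + 1)
    var_p = Fraction(q * (n + 1 - q), (n + 1) ** 2 * (n + 2))
    return mean_p, var_p


# Example: n = 20, q = 13, m = 10 Y-observations.
mean_t, var_t = moments_of_T(10, *null_moments_of_P(20, 13))
```

For these inputs the result agrees with the mean and variance obtained directly from the distribution in equation (1), which is a useful consistency check on (3).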
(see equations (1), (5.2) and (5.4) of Johnson and Kotz, 1991). Numerical values of $E[T]$ and $\mathrm{Var}(T)$ can be obtained for n = 68 from Table II of Johnson and Kotz (1991) (noting that $\mathrm{Cov}(I(A_j), I(A_k)) = \mathrm{Var}(P)$). Further values (of $E[P]$ and $\sqrt{\mathrm{Var}(P)} = \mathrm{SD}(P)$), for $n=20$ and 30, and selected values of $i$ and $q$, are given in Table 1 of the present paper. MP (1991) point out that when using their test criterion (6) it is desirable to choose 'appropriate values' of $i$ and $q$. Power considerations will clearly be an important factor in such a choice (though not the only one; for example, the possibility of inaccurate values in the tails of the distribution might come into play). MP do not enter into details of achievement of an appropriate choice. Some indication of power might be obtained from the table using normal approximations, but we are not yet in a position to make an extensive investigation.

Table 1
Expected value (E) and standard deviation (SD) of P

n = 20
 (i, q)         θ = 0.7     0.9     1.0     1.1     1.3     1.5
 (3, 13)   E     0.5766  0.6109  0.6191  0.6227  0.6204  0.6093
           SD    0.1026  0.1027  0.1035  0.1050  0.1096  0.1157
 (3, 15)   E     0.6482  0.6987  0.7143  0.7251  0.7358  0.7365
           SD    0.0998  0.0970  0.0963  0.0965  0.0992  0.1045
 (5, 9)    E     0.3905  0.4205  0.4286  0.4331  0.4338  0.4266
           SD    0.0992  0.1038  0.1055  0.1071  0.1104  0.1138
 (5, 11)   E     0.4649  0.5094  0.5238  0.5341  0.5450  0.5462
           SD    0.1010  0.1050  0.1065  0.1080  0.1116  0.1159
 (5, 13)   E     0.5365  0.5971  0.6191  0.6365  0.6605  0.6734
           SD    0.1009  0.1030  0.1035  0.1041  0.1059  0.1092
 (7, 7)    E     0.2921  0.3230  0.3333  0.3408  0.3490  0.3500
           SD    0.0908  0.0979  0.1005  0.1026  0.1061  0.1090
 (7, 9)    E     0.3665  0.4119  0.4286  0.4419  0.4602  0.4696
           SD    0.0956  0.1029  0.1055  0.1078  0.1118  0.1156

n = 30
 (i, q)         θ = 0.7     0.9     1.0     1.1     1.3     1.5
 (5, 17)   E     0.5115  0.5418  0.5484  0.5507  0.5467  0.5321
           SD    0.0856  0.0870  0.0880  0.0893  0.0929  0.0973
 (5, 19)   E     0.5611  0.6018  0.6129  0.6194  0.6221  0.6149
           SD    0.0850  0.0855  0.0861  0.0872  0.0907  0.0956
 (5, 21)   E     0.6096  0.6612  0.6774  0.6888  0.7003  0.7012
           SD    0.0838  0.0828  0.0826  0.0830  0.0855  0.0899
 (7, 17)   E     0.4863  0.5332  0.5484  0.5593  0.5708  0.5720
           SD    0.0840  0.0869  0.0880  0.0891  0.0921  0.0960
 (9, 13)   E     0.3682  0.4066  0.4194  0.4286  0.4386  0.4398
           SD    0.0801  0.0853  0.0872  0.0889  0.0920  0.0950
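Entries of the kind shown in Table 1 can be checked with a short calculation. The sketch below is not the formulas of Johnson and Kotz (1991), which are not reproduced in this paper. It assumes that under the Lehmann alternative $F_Y = F_X^{\theta}$ we can write $P = U_{(i+q)}^{\theta} - U_{(i)}^{\theta}$, where $U_{(k)}$ are order statistics of $n$ uniform variables, and it uses the standard power and product moments of uniform order statistics. All function names are hypothetical.

```python
from math import exp, lgamma, sqrt


def _gamma_ratio(x, y):
    """Gamma(x) / Gamma(y), computed on the log scale for stability."""
    return exp(lgamma(x) - lgamma(y))


def power_moment(n, k, a):
    """E[U_(k)^a] for the k-th order statistic of n uniform variables."""
    return _gamma_ratio(n + 1, n + 1 + a) * _gamma_ratio(k + a, k)


def product_moment(n, r, s, a, b):
    """E[U_(r)^a * U_(s)^b] for r < s."""
    return (_gamma_ratio(n + 1, n + 1 + a + b)
            * _gamma_ratio(r + a, r)
            * _gamma_ratio(s + a + b, s + a))


def lehmann_moments_of_P(n, i, q, theta):
    """Mean and standard deviation of P = U_(i+q)^theta - U_(i)^theta."""
    r, s = i, i + q
    mean = power_moment(n, s, theta) - power_moment(n, r, theta)
    second = (power_moment(n, s, 2 * theta)
              - 2 * product_moment(n, r, s, theta, theta)
              + power_moment(n, r, 2 * theta))
    return mean, sqrt(second - mean ** 2)


# Example: the (i, q) = (7, 7), n = 20 row of Table 1 at theta = 1.5.
mean_p, sd_p = lehmann_moments_of_P(20, 7, 7, 1.5)
```

At $\theta = 1$ the sketch reduces to the Beta$(q, n+1-q)$ moments, so that $E[P] = q/(n+1)$, which is the pattern visible in the $\theta = 1.0$ column of Table 1.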
3. Possible symmetric tests of homogeneity

The tests of Matveychuk-Petunin and of Katzenbeisser both use test statistics which are asymmetric with respect to the two populations being compared. They use the number, $T$, of Y values in an interval ($J_{i,q}$) defined by the X-order statistics. A different value for the test criterion would be obtained if the number, $T'$ say, of X values in an interval, $J'_{i',q'}$ say, defined by Y-order statistics were employed, even if $n=m$, $i=i'$ and $q=q'$. It is, therefore, worthwhile to consider the possibility of using test statistics based on values of both $T$, equal to the number of Y-values in $J_{i,q} \equiv (X_{(i)}, X_{(i+q)})$, and $T'$, equal to the number of X-values in $(Y_{(i')}, Y_{(i'+q')})$. Complete symmetry would be obtainable only if $n = m$ (and would be attained if $i = i'$, $q = q'$). In Table 2 we present values of $\Pr[T=t, T'=t']$ for some such symmetrical cases, based on the null hypothesis $F_X(x) = F_Y(x)$.

The calculations in Table 2 required detailed consideration of six different cases arising from different orderings of the values of $X_{(i)}$, $X_{(i+q)}$, $Y_{(i')}$ and $Y_{(i'+q')}$. Figure 1 sets out the analysis in diagrammatic form. Each of the rows (1)-(6) in Figure 1 corresponds to one of the six possible orderings of $X_{(i)}$, $X_{(i+q)}$, $Y_{(i')}$ and $Y_{(i'+q')}$. Row (1), for example, corresponds to

$$Y_{(i')} < Y_{(i'+q')} < X_{(i)} < X_{(i+q)}.$$
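The count of six orderings can be confirmed by direct enumeration: of the 24 permutations of the four order statistics, only those keeping $X_{(i)}$ before $X_{(i+q)}$ and $Y_{(i')}$ before $Y_{(i'+q')}$ are admissible. The following is a minimal sketch; the label strings and the function name are illustrative only, and the order in which orderings are produced is not claimed to match the row numbering of Figure 1.

```python
from itertools import permutations

LABELS = ("X(i)", "X(i+q)", "Y(i')", "Y(i'+q')")


def admissible_orderings():
    """All orderings of the four order statistics that respect
    X(i) < X(i+q) and Y(i') < Y(i'+q')."""
    result = []
    for perm in permutations(LABELS):
        x_ok = perm.index("X(i)") < perm.index("X(i+q)")
        y_ok = perm.index("Y(i')") < perm.index("Y(i'+q')")
        if x_ok and y_ok:
            result.append(perm)
    return result


for ordering in admissible_orderings():
    print(" < ".join(ordering))
```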
In each row the expressions above (below) the horizontal line represent the numbers of X- (Y-) values in the intervals defined by the four order statistics. In row (1), for example, $a$ denotes the number of X values less than $Y_{(i')}$, while (of course) there are $(i'-1)$ Y-values less than $Y_{(i')}$. In each row, $a$ and $g$ can take any integer values, subject to the requirement that none of the expressions represent negative numbers. Thus, in row (1) we must have $0 \ldots \max(i',\, i'+q'-T) \ldots$

Note also that (i) in all cases there are $(q-1)$ X's in $(X_{(i)}, X_{(i+q)})$; $(q'-1)$ Y's in $(Y_{(i')}, Y_{(i'+q')})$; $t$ Y's in $(X_{(i)}, X_{(i+q)})$; $t'$ X's in $(Y_{(i')}, Y_{(i'+q')})$; (ii) the Y-values in $(X_{(i)}, X_{(i+q)})$ are always $Y_{(g+1)}, \ldots, Y_{(g+T)}$, so there are $g$ Y's less than $X_{(i)}$, $m-g-T$ Y's greater than $X_{(i+q)}$; (iii) ties can be ignored, since the X's and Y's are mutually independent continuous random variables.

Table 2
Values of Pr[T = t, T' = t']
(each row lists, for the given t, the pairs t': Pr[T = t, T' = t'])

n = m = 10, i = i' = 2; q = q' = 7
 t
 0    0: 0.0001   1: 0.0002   8: 0.0002   9: 0.0006   10: 0.0005
 1    1: 0.0006   2: 0.0002   3: 0.0002   4: 0.0002   5: 0.0002   6: 0.0002   7: 0.0002   8: 0.0010   9: 0.0032   10: 0.0021
 2    2: 0.0009   3: 0.0010   4: 0.0010   5: 0.0011   6: 0.0012   7: 0.0013   8: 0.0040   9: 0.0091   10: 0.0053
 3    3: 0.0013   4: 0.0024   5: 0.0031   6: 0.0038   7: 0.0044   8: 0.0108   9: 0.0182   10: 0.0091
 4    4: 0.0042   5: 0.0064   6: 0.0088   7: 0.0114   8: 0.0224   9: 0.0273   10: 0.0114
 5    5: 0.0111   6: 0.0170   7: 0.0239   8: 0.0373   9: 0.0300   10: 0.0100
 6    6: 0.0286   7: 0.0436   8: 0.0468   9: 0.0200   10: 0.0050
 7    7: 0.0709   8: 0.0300
 8    8: 0.0100

n = m = 15, i = i' = 2; q = q' = 12
 t
 0    0-1: all 0.0000
 1    1-15: all 0.0000
 2    2-13: all 0.0000   14: 0.0001   15: 0.0001
 3    3-12: all 0.0000   13: 0.0001   14: 0.0004   15: 0.0004
 4    4-7: all 0.0000   8: 0.0001   9: 0.0001   10: 0.0001   11: 0.0001   12: 0.0001   13: 0.0004   14: 0.0013   15: 0.0011
 5    5: 0.0000   6: 0.0001   7: 0.0001   8: 0.0001   9: 0.0002   10: 0.0002   11: 0.0003   12: 0.0004   13: 0.0011   14: 0.0032   15: 0.0024
 6    6: 0.0001   7: 0.0002   8: 0.0003   9: 0.0004   10: 0.0006   11: 0.0008   12: 0.0010   13: 0.0028   14: 0.0067   15: 0.0045
 7    7: 0.0003   8: 0.0006   9: 0.0009   10: 0.0013   11: 0.0019   12: 0.0025   13: 0.0064   14: 0.0123   15: 0.0072
 8    8: 0.0010   9: 0.0017   10: 0.0027   11: 0.0041   12: 0.0057   13: 0.0129   14: 0.0195   15: 0.0097
 9    9: 0.0031   10: 0.0051   11: 0.0080   12: 0.0117   13: 0.0228   14: 0.0260   15: 0.0108
 10   10: 0.0089   11: 0.0145   12: 0.0223   13: 0.0351   14: 0.0273   15: 0.0091
 11   11: 0.0247   12: 0.0392   13: 0.0429   14: 0.0182   15: 0.0045
 12   12: 0.0641   13: 0.0273
 13   13: 0.0091

Notes: (1) Pr[T=t, T'=t'] = Pr[T=t', T'=t]. (2) All omitted values are zero. (3) 0.0000 denotes > zero but less than 0.00005.

Saran and Rani (1991) develop a general method of applying Dwass' (1967) technique for deriving joint distributions of rank order statistics, which reduces these problems to analysis of simple binary random walks. However, we found technical difficulties, comparable to those of the method we used, in applying this method to our present problem. We have
$$\Pr[T=t,\,T'=t'] = \binom{n+m}{n}^{-1}\sum_{r=1}^{6}\sum_{a}\sum_{g} M_{agr},$$

where, for example,

$$M_{ag1} = \binom{a+i'-1}{i'-1}\binom{t'+q'-1}{q'-1}\binom{i-a-t'-1+g-i'-q'}{g-i'-q'}\binom{q-1+t}{t}\binom{n-i-q+m-g-t}{m-g-t},$$

with the corresponding products for rows (2)-(6), and where $\binom{B}{A}=0$ if $A>B$ or $A<0$ or $B<0$.

Fig. 1. Distribution of $(T, T')$. [Diagram with six rows, (1)-(6), one for each ordering of $X_{(i)}$, $X_{(i+q)}$, $Y_{(i')}$ and $Y_{(i'+q')}$; each row shows the numbers of X-values (above the line) and Y-values (below the line) in the intervals between these order statistics, in terms of $a$, $g$, $T$ and $T'$.]
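For small samples the joint null distribution of $(T, T')$ can also be obtained without the six-case analysis, by listing every arrangement of $n$ X's and $m$ Y's along the line (all $\binom{n+m}{n}$ arrangements being equally likely under $H_0$). The sketch below takes that brute-force route; it is a check on the tabulated values and not the authors' method. The function name `joint_pmf` and the argument names `i2`, `q2` (standing for $i'$, $q'$) are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb


def joint_pmf(n, m, i, q, i2, q2):
    """Exact Pr[T = t, T' = t'] under H0 by enumerating arrangements.

    Requires i + q <= n and i2 + q2 <= m. Feasible only for small n + m.
    """
    counts = Counter()
    total_slots = n + m
    for x_slots in combinations(range(total_slots), n):
        occupied = set(x_slots)
        y_slots = [s for s in range(total_slots) if s not in occupied]
        # Slots strictly between X(i) and X(i+q) hold q - 1 X's;
        # the remaining slots in that stretch are Y's.
        t = x_slots[i + q - 1] - x_slots[i - 1] - q
        # Same argument with the roles of X and Y exchanged.
        t2 = y_slots[i2 + q2 - 1] - y_slots[i2 - 1] - q2
        counts[(t, t2)] += 1
    n_arrangements = comb(total_slots, n)
    return {key: Fraction(c, n_arrangements) for key, c in counts.items()}


# Example: the first panel of Table 2 (n = m = 10, i = i' = 2, q = q' = 7).
pmf = joint_pmf(10, 10, 2, 7, 2, 7)
print(round(float(pmf[(8, 8)]), 4))
```

The enumeration grows as $\binom{n+m}{n}$, so this approach is only practical for sample sizes of roughly the first panel of Table 2; for larger $n$ and $m$ a counting formula of the kind described in the text is needed.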
Table 3
Values of Pr[|T - T'| = w]

n = m = 15
        i=i'=2    i=i'=2    i=i'=4    i=i'=4    i=i'=6    i=i'=6
  w     q=q'=12   q=q'=8    q=q'=8    q=q'=6    q=q'=4    q=q'=2
  0     0.1115    0.0876    0.0938    0.1019    0.1295    0.2101
  1     0.1775    0.1495    0.1639    0.1790    0.2233    0.3007
  2     0.1544    0.1398    0.1484    0.1577    0.1792    0.1881
  3     0.1421    0.1317    0.1298    0.1326    0.1339    0.1195
  4     0.1262    0.1205    0.1114    0.1095    0.0996    0.0768
  5     0.1032    0.1050    0.0946    0.0901    0.0763    0.0485
  6     0.0763    0.0864    0.0787    0.0733    0.0588    0.0289
  7     0.0507    0.0665    0.0631    0.0574    0.0429    0.0155
  8     0.0303    0.0475    0.0474    0.0421    0.0282    0.0073
  9     0.0161    0.0312    0.0325    0.0278    0.0161    0.0030
 10     0.0075    0.0185    0.0198    0.0161    0.0077    0.0011
 11     0.0030    0.0096    0.0103    0.0079    0.0031    0.0003
 12     0.0010    0.0042    0.0045    0.0032    0.0010    0.0001
 13     0.0003    0.0015    0.0015    0.0010    0.0003    0.0000
 14     0.0000    0.0004    0.0004    0.0002    0.0000    0.0000
 15     0.0000    0.0000    0.0000    0.0000    0.0000    0.0000

n = m = 30
        i=i'=2    i=i'=6
  w     q=q'=27   q=q'=14
  0     0.1007    0.0600
  1     0.1608    0.1104
  2     0.1427    0.1066
  3     0.1345    0.1011
  4     0.1223    0.0944
  5     0.1032    0.0868
  6     0.0804    0.0785
  7     0.0584    0.0700
  8     0.0394    0.0613
  9     0.0251    0.0526
 10     0.0151    0.0442
 11     0.0086    0.0361
 12     0.0047    0.0286
 13     0.0024    0.0220
 14     0.0012    0.0163
 15     0.0005    0.0116
 16     0.0002    0.0079
 17     0.0001    0.0051
 18               0.0031
 19               0.0018
 20               0.0010
 21               0.0005
 22               0.0002
 23               0.0001
 24-30
Table 4
Values of Pr[T + T' = v]
(entries are listed as v: Pr[T + T' = v])

n = m = 10 (i = i' = 2; q = q' = 7)
  0: 0.0001    1: 0.0005    2: 0.0006    3: 0.0004    4: 0.0013    5: 0.0023
  6: 0.0041    7: 0.0074    8: 0.0135    9: 0.0262   10: 0.0528   11: 0.1006
 12: 0.1684   13: 0.2346   14: 0.2473   15: 0.1200   16: 0.0200   17-20: -

n = m = 10 (i = i' = 3; q = q' = 5)
  0: 0.0014    1: 0.0049    2: 0.0082    3: 0.0092    4: 0.0119    5: 0.0214
  6: 0.0400    7: 0.0749    8: 0.1372    9: 0.2118   10: 0.2447   11: 0.1574
 12: 0.0619   13: 0.0136   14: 0.0015   15-20: -

n = m = 20 [values of i = i' and q = q' not legible], first panel
 0-18: all 0.0000
 19: 0.0001   20: 0.0002   21: 0.0003   22: 0.0005   23: 0.0010   24: 0.0019
 25: 0.0036   26: 0.0068   27: 0.0129   28: 0.0240   29: 0.0427   30: 0.0723
 31: 0.1146   32: 0.1666   33: 0.2132   34: 0.2172   35: 0.1047   36: 0.0174
 37-40: -

n = m = 20 [values of i = i' and q = q' not legible], second panel
 0-13: all zero
 14: 0.0001   15: 0.0001   16: 0.0002   17: 0.0004   18: 0.0007   19: 0.0013
 20: 0.0023   21: 0.0040   22: 0.0073   23: 0.0131   24: 0.0235   25: 0.0409
 26: 0.0682   27: 0.1066   28: 0.1527   29: 0.1938   30: 0.2009   31: 0.1240
 32: 0.0479   33: 0.0105   34: 0.0012   35-40: -

n = m = 30 (i = i' = 4; q = q' = 23)
 ≤28: 0.0000 (Total: 0.0001(1))
 29: 0.0001   30: 0.0001   31: 0.0002   32: 0.0004   33: 0.0007   34: 0.0012
 35: 0.0020   36: 0.0034   37: 0.0059   38: 0.0102   39: 0.0174   40: 0.0289
 41: 0.0465   42: 0.0718   43: 0.1048   44: 0.1424   45: 0.1743   46: 0.1796
 47: 0.1226   48: 0.0602   49: 0.0212   50: 0.0052   51: 0.0009   52: 0.0001
 53-60: -

n = m = 30 (i = i' = 6; q = q' = 19)
 ≤19: 0.0000 (Total: 0.0001(2))
 20: 0.0001   21: 0.0001   22: 0.0002   23: 0.0003   24: 0.0006   25: 0.0009
 26: 0.0015   27: 0.0024   28: 0.0038   29: 0.0063   30: 0.0102   31: 0.0167
 32: 0.0271   33: 0.0430   34: 0.0659   35: 0.0959   36: 0.1302   37: 0.1601
 38: 0.1684   39: 0.1282   40: 0.0774   41: 0.0381   42: 0.0154   43: 0.0051
 44: 0.0014   45: 0.0003   46: 0.0001   47: 0.0000   48: 0.0000
 ≥49: all zero

Note: 0.0000 denotes 'zero'.
Fig. 2. Distribution of $|T - T'|$: (a) n=m=20; i=i'=4; q=q'=13; (b) n=m=20; i=i'=6; q=q'=9; (c) n=m=30; i=i'=2; q=q'=27. [Three bar charts of probability against t.]
$M_{agr}$ is the number of different orderings of X- and Y-values giving the configuration in row ($r$). In row (1), for example, there can be $\binom{a+i'-1}{i'-1}$ different orderings of X's and Y's in the first interval; $\binom{t'+q'-1}{q'-1}$ in the second interval, and so on. $M_{agr}$ is the product of these quantities.

Intuitively, the statistic $|T - T'|$ should be useful as a test criterion for difference in variability, while $(T + T')$ should be appropriate for a test for shift alternatives. Distributions of these, and, of course, other functions of $T$ and $T'$ can be derived directly from the joint distribution of $T$ and $T'$. As examples, we give a few distributions of $|T - T'|$ and $(T + T')$ in Tables 3 and 4, respectively. Figures 2(a)-(c) are diagrammatic representations of some $|T - T'|$ distributions. They are of generally similar form, although the values of $\Pr[|T - T'| = t]$ for $t = 0$ and 1 are noticeably higher in Figure 2(c) than they are in Figures 2(a) and 2(b). Further tables are available from the second author.
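The step from a joint distribution to the distribution of a function of $(T, T')$ is a simple regrouping of probabilities. As a hedged illustration, the sketch below uses the Table 2 entries for $n=m=10$, $i=i'=2$, $q=q'=7$ with $t + t' \ge 14$ (upper triangle only, completed by the symmetry in note (1) of that table) and recovers the upper tail of $T + T'$. The names `UPPER`, `full_joint` and `distribution_of` are hypothetical.

```python
from collections import defaultdict

# Table 2 entries (t <= t') with t + t' >= 14, n = m = 10, i = i' = 2, q = q' = 7.
UPPER = {
    (4, 10): 0.0114, (5, 9): 0.0300, (5, 10): 0.0100,
    (6, 8): 0.0468, (6, 9): 0.0200, (6, 10): 0.0050,
    (7, 7): 0.0709, (7, 8): 0.0300, (8, 8): 0.0100,
}


def full_joint(upper):
    """Fill in the lower triangle using Pr[T=t, T'=t'] = Pr[T=t', T'=t]."""
    joint = {}
    for (t, t2), p in upper.items():
        joint[(t, t2)] = p
        joint[(t2, t)] = p
    return joint


def distribution_of(joint, func):
    """Distribution of func(T, T') obtained by summing joint probabilities."""
    out = defaultdict(float)
    for (t, t2), p in joint.items():
        out[func(t, t2)] += p
    return dict(out)


sums = distribution_of(full_joint(UPPER), lambda t, t2: t + t2)
for v in (14, 15, 16):
    print(v, round(sums[v], 4))
```

Because only part of the joint table is entered here, only the values for $v = 14, 15, 16$ are complete; with the full joint distribution the same `distribution_of` call with `lambda t, t2: abs(t - t2)` would give the distribution of $|T - T'|$.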
Acknowledgement

We are grateful to Dr. Qi-Wen Wang for assistance in the numerical calculations.
References

Dwass, M. (1967). Simple random walk and rank order statistics. Ann. Math. Statist. 38, 1042-1053.
Johnson, N.L. and Kotz, S. (1991). Some generalizations of Bernoulli and Polya-Eggenberger contagion models. Statist. Papers 32, 1-17.
Katzenbeisser, W. (1985). The distribution of the two-sample location exceedance test statistics under Lehmann alternatives. Statist. Hefte (now Papers) 26, 131-138.
Katzenbeisser, W. (1986). The exact power of two-sample location tests based on exceedances against shift alternatives. Math. Op. Forsch. (Statistics) 20, 47-54.
Matveychuk, S.A. and Y.T. Petunin (1990/1991). A generalization of the Bernoulli model arising in order statistics. I/II. Ukr. Math. J. 42, 518-528/43, 779-785.
Saran, J. and S. Rani (1991). On joint distributions of order statistics for equal sample sizes. Statistics 22, 299-312.