Statistics & Probability Letters 31 (1996) 107-112
A modified runs test for symmetry
Reza Modarres *,1, Joseph L. Gastwirth 2
Department of Statistics, The George Washington University, Washington DC, 20052, USA
Received June 1995; revised December 1995

Abstract
We present a modification of a recently proposed test of symmetry about a known center. The new test uses Wilcoxon scores to weight the runs and has a limiting normal distribution under the null hypothesis. A Monte Carlo study shows that it is more powerful than both the runs test and the Wilcoxon signed-rank test for the symmetry problem against alternatives in the lambda family.
Keywords: Runs test; Test of symmetry; Lambda family; Asymmetric alternative; Power
1. Introduction
Let $x_1, \dots, x_n$ be a sequence of independent and identically distributed random variables with density $f$, distribution function $F$ and known center $\mu$. We consider the problem of testing

$$H_0: F(\mu - x) = 1 - F(\mu + x) \quad \text{against} \quad H_a: F(\mu - x) \neq 1 - F(\mu + x).$$
Thus, we test whether the density $f$ is symmetric about the median, i.e. $f(\mu - x) = f(\mu + x)$ for all $x$, or is skewed. There is a large literature (Hollander and Wolfe, 1973; Lehmann, 1975; Randles and Wolfe, 1979) concerning the case when $\mu$ is known, and a test based on the theory of runs was recently proposed by Cohen and Menjoge (1988) and McWilliams (1990). Cohen and Menjoge (1988) and Henze (1993) proved that the test is consistent against asymmetric alternatives, and McWilliams (1990) showed that the test is more powerful against alternatives in the lambda family than existing procedures based on the Wilcoxon test, a Cramér-von Mises type test and a test based on the empirical distribution function.

* Corresponding author. E-mail: [email protected].
1 Research supported in part by The George Washington University Facilitating Fund.
2 Research supported in part by a grant from the NSF.
0167-7152/96/$12.00 © 1996 Elsevier Science B.V. All rights reserved
PII S0167-7152(96)00020-X
In the next section, we present a modification of the one-sample runs test for symmetry and prove its asymptotic normality under the null hypothesis. The procedure uses Wilcoxon scores, or their modifications by Gastwirth (1965), to weight the runs. In Section 3, we present a Monte Carlo study which shows that the modified test is more powerful than the ordinary runs test and its competitors studied by McWilliams (1990) for a wide range of alternative asymmetric distributions.
2. A modified runs statistic
To define the test, assume $\mu$ is known; without loss of generality we can take it to be zero. Let $x_{(1)}, \dots, x_{(n)}$ denote the sample values ordered from smallest to largest according to absolute value, with their signs retained; i.e. fold the observations around zero. Let $S_1, \dots, S_n$ denote indicator variables designating the signs of the $x_{(k)}$ values,

$$S_k = \begin{cases} 1, & \text{if } x_{(k)} \geq 0; \\ 0, & \text{otherwise.} \end{cases}$$
The runs statistic counts the number of runs in the $S_k$ sequence. That is,

$$R = \sum_{k=2}^{n} I_k, \qquad \text{where} \quad I_k = \begin{cases} 0, & \text{if } S_k = S_{k-1}; \\ 1, & \text{otherwise.} \end{cases}$$
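The definition above can be sketched in a few lines of Python. This is only an illustration: the function names, the NumPy dependency and the toy sample are assumptions made here, not part of the original study.

```python
import numpy as np

def sign_change_indicators(x, center=0.0):
    """Return I_2, ..., I_n for a sample folded about a known center."""
    d = np.asarray(x, dtype=float) - center
    order = np.argsort(np.abs(d), kind="stable")  # order by absolute value
    s = (d[order] >= 0).astype(int)               # S_k = 1 if x_(k) >= 0
    return (s[1:] != s[:-1]).astype(int)          # I_k = 1 if S_k != S_{k-1}

def runs_statistic(x, center=0.0):
    """R = sum of I_k over k = 2, ..., n."""
    return int(sign_change_indicators(x, center).sum())

# Hypothetical toy sample, assumed to have known center 0.
sample = [0.3, -1.2, 0.5, 2.0, -0.1, -0.7]
print(runs_statistic(sample))
```

Ordered by absolute value, the toy sample has signs (-, +, +, -, -, +), so the indicators are (1, 0, 1, 0, 1) and R = 3.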
Under the alternative hypothesis, large negative or positive values tend to form clusters (runs), resulting in relatively few runs. Thus, one rejects $H_0$ when $R$ is small. The null distribution of $R$, given by McWilliams (1990), is binomial with parameters $n - 1$ and $\tfrac{1}{2}$. Under the null hypothesis the probabilities $P(I_k = 1)$ are constant, whereas under the alternative of asymmetry $P(I_k = 1)$, $k = 2, \dots, n$, depend on $k$. This suggests that a test based on the relative position, $k$, of the runs should be more powerful than $R$. For skewed alternatives, the runs should occur in the tails, and thus we propose to modify the above test by giving more weight to those runs. Our tests are defined as

$$M_p = \sum_{k=np+2}^{n} \phi(k) I_k, \qquad \text{where} \quad \phi(k) = \begin{cases} k - np, & \text{if } k > np; \\ 0, & \text{otherwise,} \end{cases}$$

and $p$ is a trimming proportion. If $p = 0$, then the $\phi(k)$ are the Wilcoxon scores. Otherwise, they are percentile modified test scores (Gastwirth, 1965). The $R$ test proposed by Cohen and Menjoge (1988) and McWilliams (1990) corresponds to the choice of $p = 0$ and $\phi(k) = 1$. Thus, $R$ just counts the total number of runs and applies the sign test to $(I_2, I_3, \dots, I_n)$. $M_p$, on the other hand, gives a weight to the relative position of each run.
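A minimal Python sketch of the weighted statistic follows. The function name and the toy sample are hypothetical, and rounding $np$ to the nearest integer is an assumption made here; the text does not say how a non-integer $np$ is handled.

```python
import numpy as np

def modified_runs_statistic(x, p=0.0, center=0.0):
    """M_p = sum over k = np+2, ..., n of (k - np) * I_k."""
    d = np.asarray(x, dtype=float) - center
    order = np.argsort(np.abs(d), kind="stable")   # fold about the center
    s = (d[order] >= 0).astype(int)                # signs S_1, ..., S_n
    ind = (s[1:] != s[:-1]).astype(int)            # I_2, ..., I_n
    n = d.size
    m = int(round(n * p))                          # assumption: np rounded to an integer
    k = np.arange(2, n + 1)
    weights = np.where(k >= m + 2, k - m, 0)       # phi(k), restricted to the summation range
    return int(np.sum(weights * ind))

# Hypothetical toy sample, assumed to have known center 0.
sample = [0.3, -1.2, 0.5, 2.0, -0.1, -0.7]
m0 = modified_runs_statistic(sample, p=0.0)
m_half = modified_runs_statistic(sample, p=0.5)
print(m0, m_half)
```

For this sample the indicators are (1, 0, 1, 0, 1) at k = 2, ..., 6, so $M_0 = 2 + 4 + 6 = 12$; with $p = 0.5$ only k = 5, 6 contribute, with weights 2 and 3, giving $M_{0.5} = 3$.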
It is not difficult to compute the exact null distribution of $M_p$ for small samples, as it is completely determined by the indicator variables $I_k$. Thus, the sample space can be considered as a set of $2^{n-1}$ $(n-1)$-tuples $(I_2, I_3, \dots, I_n)$, where each $I_k$ is either 0 or 1. Under the null hypothesis, all $(n-1)$-tuples are equally likely.
Therefore, under $H_0$, $P(M_p = m) = \eta(m)/2^{n-1}$, where $\eta(m)$ is the number of ways to assign a 0 or 1 to each $\phi(k)$ so that $M_p = m$. Furthermore, the asymptotic distribution of $M_p$ is given by the following lemma, whose proof appears in Appendix A.

Lemma 1. Under the null hypothesis of symmetry, $M_p$ has an asymptotic normal distribution with mean and variance

$$\mu(M_p) = \tfrac{1}{4}\,(n(1-p) - 1)(n(1-p) + 2),$$

$$\sigma^2(M_p) = \tfrac{1}{24}\,(n(1-p) - 1)(2n^2(1-p)^2 + 5n(1-p) + 6).$$
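For small $n$, the exact null distribution can be obtained by brute-force enumeration and compared with the moments stated in Lemma 1. The sketch below is illustrative only: the function names are hypothetical, and $np$ is passed as an integer `m` (an assumption made here).

```python
from fractions import Fraction
from itertools import product

def null_distribution(n, m=0):
    """Exact H0 distribution of M_p by enumeration; m = n*p, assumed integer."""
    weights = [k - m if k >= m + 2 else 0 for k in range(2, n + 1)]
    counts = {}
    for tup in product((0, 1), repeat=n - 1):        # all 2^(n-1) tuples (I_2, ..., I_n)
        value = sum(w * i for w, i in zip(weights, tup))
        counts[value] = counts.get(value, 0) + 1
    total = 2 ** (n - 1)
    return {v: Fraction(c, total) for v, c in counts.items()}

def lemma_moments(n, m=0):
    """Mean and variance from Lemma 1, with nq = n(1 - p) = n - m."""
    nq = n - m
    mean = Fraction((nq - 1) * (nq + 2), 4)
    var = Fraction((nq - 1) * (2 * nq**2 + 5 * nq + 6), 24)
    return mean, var

dist = null_distribution(8, m=2)                     # n = 8, p = 0.25
mean = sum(v * pr for v, pr in dist.items())
var = sum((v - mean) ** 2 * pr for v, pr in dist.items())
print(mean, var, lemma_moments(8, m=2))
```

For $n = 8$ and $np = 2$, both the enumeration and the closed forms give mean 10 and variance 45/2. Enumeration grows as $2^{n-1}$, so this approach is only practical for small samples.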
3. Monte Carlo simulation
This section reports a Monte Carlo study comparing the new family of tests $M_p$ to $R$ and the Wilcoxon signed-rank test. The study is based on 10000 replications at $\alpha = 0.05$ for sample sizes of $n = 20$, 30, 50 and 100. Following McWilliams (1990), nine distributions, including the normal, are selected from the generalized lambda family. Uniform random numbers, $u_i$, are generated and then transformed to generalized lambda variates,

$$x_i = \lambda_1 + \frac{u_i^{\lambda_3} - (1 - u_i)^{\lambda_4}}{\lambda_2}, \qquad 0 < u_i < 1.$$
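The transformation above can be sketched in Python as follows. This is not the authors' SAS-IML program: the function name, the seed, the choice $\lambda_1 = 0$ and the use of the population median (the quantile function at 0.5) for centering are all assumptions made for illustration.

```python
import numpy as np

def generalized_lambda_quantile(u, lam1, lam2, lam3, lam4):
    """Generalized lambda transform of uniform variates u in (0, 1)."""
    u = np.asarray(u, dtype=float)
    return lam1 + (u**lam3 - (1.0 - u)**lam4) / lam2

rng = np.random.default_rng(0)            # arbitrary seed, for reproducibility only
u = rng.uniform(size=20)

# Parameters of Alternative 1 in Table 1; lam1 = 0 is an assumption.
params = (0.0, 1.0, 1.4, 0.25)
x = generalized_lambda_quantile(u, *params)

# Assumption: center at the population median, i.e. the transform evaluated at u = 0.5.
median = generalized_lambda_quantile(0.5, *params)
centered = x - median
print(centered[:5])
```

With the null-case parameters ($\lambda_3 = \lambda_4$) the transform at $u = 0.5$ equals $\lambda_1$, so the distribution is symmetric about $\lambda_1$ and no further centering is needed.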
Finally, the observations are centered at the median. The program is written in SAS-IML and is run on an IBM 4381. The two other procedures considered are $R$ and $W$, the Wilcoxon signed-rank test based on the sum of the positive signed ranks of $x_1, \dots, x_n$ (Hettmansperger, 1984; Gibbons and Chakraborti, 1992). $W$ and $R$ are randomized to have exact size 0.05, and the asymptotic distribution of $M_p$ is used in the study.

Table 1 shows the number of rejections under the null and the alternative hypotheses for $p = 0.0$, 0.10, 0.20 and 0.25, and Fig. 1 shows the density functions used in the Monte Carlo study. We should note that a 95% confidence interval for the true number of rejections under $H_0$ with $\alpha = 0.05$ and 10000 replications is (457, 542).

Table 1 shows that the modified tests are powerful procedures for detecting asymmetry. The tests $M_p$, $p = 0$ to 0.25, clearly dominate $W$ and $R$ in all cases considered. For example, consider $n = 30$ and cases 1-4 and 6-8. Table 1 shows that $M_{0.25}$ is more likely to detect asymmetry than $W$ and $R$ by factors of (6.1, 5.9, 2.4, 2.9, 1.6, 5.5, 4.7) and (1.9, 1.6, 1.6, 1.6, 1.4, 1.5, 1.5), respectively. Alternative 5 is the most difficult to detect, as its skewness is $s = 0.8$ and its kurtosis is $k = 11.4$ (see Fig. 1). Even so, for that alternative $M_p$ still performs better than $W$ and $R$. The results of Table 1 indicate that the choice of $p = 0.2$ yields a noticeable improvement for all $n$ and alternatives considered. For larger ($\geq 50$) sample sizes and a skewed alternative, the choice of $p = 0.25$ appears to be preferable.

Our procedures have improved power for testing symmetry about a known center against asymmetry in the tails of the distribution. For distributions whose asymmetry is focused in regions close to the median, the $M_p$ family will not perform well because observations in that region are not given much weight.
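The 95% interval quoted above for the number of rejections under $H_0$ is consistent with a normal approximation to a binomial count with 10000 trials and success probability 0.05. The use of this approximation is an assumption made here; the text does not state how the interval was obtained.

$$10000(0.05) \pm 1.96\sqrt{10000(0.05)(0.95)} = 500 \pm 1.96(21.79) \approx 500 \pm 42.7,$$

that is, roughly 457.3 to 542.7, which agrees with the quoted interval (457, 542) up to rounding.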
As noted by McWilliams (1990) and Henze (1993), a distribution may be asymmetric with respect to an assumed median under a location shift alternative. While Mp is more powerful than R in these situations, they are both locally less powerful than W. After this paper was submitted, we became aware of a related article by Tajuddin (1994) who proposed a conditional test based on the Wilcoxon two-sample test. This test yields
Fig. 1. Density functions for the lambda family. Panels, labelled by skewness and kurtosis (S, K): Null: (0, 3); Alternative 1: (0.5, 2.2); Alternative 2: (1.5, 5.8); Alternative 3: (0.9, 4.2); Alternative 4: (1.5, 7.5); Alternative 5: (0.8, 11.4); Alternative 6: (2, 21.2); Alternative 7: (3.16, 23.8); Alternative 8: (3.88, 40.7).
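Table 1
Number of rejections at $\alpha = 0.05$ and 10000 replications for the lambda family

| Distribution | $n$ | $W$ | $R$ | $M_0$ | $M_{0.1}$ | $M_{0.2}$ | $M_{0.25}$ |
|---|---|---|---|---|---|---|---|
| Null: $\lambda_2 = 0.197454$, $\lambda_3 = 0.134915$, $\lambda_4 = 0.134915$ | 20 | 495 | 531 | 482 | 503 | 539 | 459 |
| | 30 | 479 | 517 | 510 | 530 | 531 | 533 |
| | 50 | 510 | 493 | 510 | 516 | 509 | 483 |
| | 100 | 532 | 492 | 491 | 486 | 496 | 497 |
| Alternative 1: $\lambda_2 = 1.0$, $\lambda_3 = 1.4$, $\lambda_4 = 0.25$ | 20 | 798 | 2105 | 3139 | 3303 | 3580 | 3528 |
| | 30 | 957 | 3072 | 5023 | 5310 | 5556 | 5831 |
| | 50 | 1236 | 4861 | 7620 | 7914 | 8183 | 8458 |
| | 100 | 2107 | 7791 | 9738 | 9804 | 9862 | 9902 |
| Alternative 2: $\lambda_2 = 1.0$, $\lambda_3 = 0.00007$, $\lambda_4 = 0.1$ | 20 | 859 | 3101 | 4626 | 4842 | 5174 | 5110 |
| | 30 | 1287 | 4604 | 6915 | 7197 | 7393 | 7614 |
| | 50 | 1932 | 6749 | 9067 | 9230 | 9380 | 9495 |
| | 100 | 3625 | 9273 | 9973 | 9983 | 9987 | 9991 |
| Alternative 3: $\lambda_2 = 0.04306$, $\lambda_3 = 0.025213$, $\lambda_4 = 0.094029$ | 20 | 585 | 933 | 1116 | 1170 | 1284 | 1211 |
| | 30 | 697 | 1074 | 1509 | 1596 | 1653 | 1719 |
| | 50 | 848 | 1474 | 2201 | 2309 | 2410 | 2512 |
| | 100 | 1379 | 2111 | 3475 | 3665 | 3898 | 4139 |
| Alternative 4: $\lambda_2 = -1.0$, $\lambda_3 = -0.0075$, $\lambda_4 = -0.03$ | 20 | 725 | 1177 | 1455 | 1511 | 1669 | 1578 |
| | 30 | 812 | 1412 | 2038 | 2136 | 2231 | 2335 |
| | 50 | 1214 | 2023 | 3075 | 3264 | 3448 | 3564 |
| | 100 | 1603 | 3144 | 5160 | 5373 | 5606 | 5875 |
| Alternative 5: $\lambda_2 = -0.351663$, $\lambda_3 = -0.13$, $\lambda_4 = -0.16$ | 20 | 497 | 523 | 532 | 557 | 590 | 524 |
| | 30 | 453 | 513 | 544 | 554 | 553 | 554 |
| | 50 | 482 | 553 | 598 | 596 | 607 | 603 |
| | 100 | 508 | 580 | 627 | 655 | 669 | 682 |
| Alternative 6: $\lambda_2 = -1.0$, $\lambda_3 = -0.1$, $\lambda_4 = -0.18$ | 20 | 512 | 699 | 774 | 802 | 878 | 793 |
| | 30 | 583 | 700 | 849 | 906 | 930 | 958 |
| | 50 | 639 | 877 | 1122 | 1171 | 1215 | 1252 |
| | 100 | 897 | 1053 | 1535 | 1599 | 1657 | 1761 |
| Alternative 7: $\lambda_2 = -1.0$, $\lambda_3 = -0.001$, $\lambda_4 = -0.13$ | 20 | 1254 | 3718 | 5410 | 5610 | 5977 | 5878 |
| | 30 | 1492 | 5362 | 7712 | 7918 | 8106 | 8299 |
| | 50 | 2438 | 7671 | 9485 | 9577 | 9661 | 9718 |
| | 100 | 4109 | 9651 | 9990 | 9996 | 9998 | 9999 |
| Alternative 8: $\lambda_2 = -1.0$, $\lambda_3 = -0.00001$, $\lambda_4 = -0.17$ | 20 | 1294 | 4066 | 5853 | 6062 | 6427 | 6318 |
| | 30 | 1823 | 5825 | 8145 | 8355 | 8499 | 8646 |
| | 50 | 2714 | 8094 | 9670 | 9731 | 9798 | 9849 |
| | 100 | 4559 | 9803 | 9999 | 10000 | 10000 | 10000 |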
power roughly comparable to that of McWilliams. Our procedure remains the most powerful for the $\lambda$-family. Finally, as Lehmann (1986, p. 326) notes, the problem of testing symmetry when the center is unknown is more difficult; he gives references for asymptotically distribution-free tests for this hypothesis.
Acknowledgements
We are grateful to a referee for helpful suggestions and comments.
Appendix A. Outline of the proof of Lemma 1
Under $H_0$, $I_k$, $k = 2, \dots, n$, are independent and identically distributed binomial random variables with parameters 1 and $\tfrac{1}{2}$. Thus,

$$\mu(M_p) = \frac{1}{2} \sum_{k=np+2}^{n} (k - np) = \frac{1}{4}\,(n(1-p) - 1)(n(1-p) + 2),$$

and

$$\sigma^2(M_p) = \frac{1}{4} \sum_{k=np+2}^{n} (k - np)^2 = \frac{1}{24}\,(n(1-p) - 1)(2n^2(1-p)^2 + 5n(1-p) + 6).$$

Asymptotic normality follows by verifying the conditions of the Liapounov central limit theorem (Gnedenko, 1962; Chung, 1968). Note that

$$\frac{\sum_{k=np+2}^{n} E|t_k|^3}{\left(\sum_{k=np+2}^{n} \sigma^2(t_k)\right)^{3/2}} = \frac{\frac{1}{32}\,(nq + 2)(nq - 1)(n^2q^2 + nq + 2)}{\left(\frac{1}{24}\,(nq - 1)(2n^2q^2 + 5nq + 6)\right)^{3/2}} \sim (nq)^{-1/2},$$

where $q = 1 - p$ and $t_k = \phi(k)(I_k - \tfrac{1}{2})$. For fixed $p$, the above tends to zero as $n$ tends to infinity.
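The closed forms in this appendix can be checked with the substitution $j = k - np$ and the standard power-sum formulas. As a sketch of the intermediate steps, writing $m = nq$ (a shorthand introduced here for brevity):

$$\sum_{k=np+2}^{n} (k - np) = \sum_{j=2}^{m} j = \frac{m(m+1)}{2} - 1 = \frac{(m-1)(m+2)}{2},$$

$$\sum_{j=2}^{m} j^2 = \frac{m(m+1)(2m+1)}{6} - 1 = \frac{(m-1)(2m^2 + 5m + 6)}{6},$$

$$\sum_{j=2}^{m} j^3 = \frac{m^2(m+1)^2}{4} - 1 = \frac{(m+2)(m-1)(m^2 + m + 2)}{4}.$$

Since $E(I_k) = \tfrac{1}{2}$, $\sigma^2(I_k) = \tfrac{1}{4}$ and $E|I_k - \tfrac{1}{2}|^3 = \tfrac{1}{8}$, multiplying the three sums by $\tfrac{1}{2}$, $\tfrac{1}{4}$ and $\tfrac{1}{8}$ gives the mean, the variance and the numerator of the Liapounov ratio, respectively. The numerator grows like $m^4$ and the denominator like $m^{9/2}$, which gives the stated order $(nq)^{-1/2}$.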
References
Chung, K.L. (1968), A Course in Probability Theory (Harcourt, Brace and World, New York).
Cohen, J.P. and S.S. Menjoge (1988), One-sample runs test of symmetry, J. Statist. Plann. Inference 18, 93-100.
Gastwirth, J.L. (1965), Percentile modification of two sample rank tests, J. Amer. Statist. Assoc. 60, 1127-1141.
Gibbons, J.D. and S. Chakraborti (1992), Nonparametric Statistical Inference (Marcel Dekker, New York).
Gnedenko, B.V. (1962), The Theory of Probability (Chelsea, New York).
Henze, N. (1993), On the consistency of a test for symmetry based on a runs statistic, Nonparametric Statist. 3, 195-199.
Hettmansperger, T. (1984), Statistical Inference based on Ranks (Wiley, New York).
Hollander, M. and D.A. Wolfe (1973), Nonparametric Statistical Methods (Wiley, New York).
Lehmann, E.L. (1975), Nonparametrics (McGraw-Hill, New York).
Lehmann, E.L. (1986), Testing Statistical Hypotheses (Wiley, New York, 2nd ed.).
McWilliams, T.P. (1990), A distribution-free test for symmetry based on a runs statistic, J. Amer. Statist. Assoc. 85, 1130-1133.
Randles, R.H. and D.A. Wolfe (1979), Introduction to the Theory of Nonparametric Statistics (Wiley, New York).
Tajuddin, I.H. (1994), Distribution-free test for symmetry based on Wilcoxon two-sample test, J. Appl. Statist. 21 (5).