A modified runs test for symmetry


Statistics & Probability Letters 31 (1996) 107-112

Reza Modarres *,1, Joseph L. Gastwirth 2

Department of Statistics, The George Washington University, Washington DC, 20052, USA

Received June 1995; revised December 1995

Abstract

We present a modification of a recently proposed test of symmetry about a known center. The new test uses Wilcoxon scores to weight the runs and has a limiting normal distribution under the null hypothesis. A Monte Carlo study shows that it is more powerful than both the runs test and the Wilcoxon signed-rank test for the symmetry problem against alternatives in the lambda family.

Keywords: Runs test; Test of symmetry; Lambda family; Asymmetric alternative; Power

1. Introduction

Let $x_1, \ldots, x_n$ be a sequence of independent and identically distributed random variables with density $f$, distribution function $F$ and known center $\mu$. We consider the problem of testing

$$H_0: F(\mu - x) = 1 - F(\mu + x) \quad \text{against} \quad H_a: F(\mu - x) \neq 1 - F(\mu + x).$$

Thus, we test whether the density $f$ is symmetric about the median, i.e. $f(\mu - x) = f(\mu + x)$ for all $x$, or is skewed. There is a large literature (Hollander and Wolfe, 1973; Lehmann, 1975; Randles and Wolfe, 1979) concerning the case when $\mu$ is known, and a test based on the theory of runs was recently proposed by Cohen and Menjoge (1988) and McWilliams (1990). Cohen and Menjoge (1988) and Henze (1993) proved that the test is consistent against asymmetric alternatives, and McWilliams (1990) showed that the test is more powerful against alternatives in the lambda family than existing procedures based on the Wilcoxon test, a Cramér-von Mises type test and a test based on the empirical distribution function.

* Corresponding author. E-mail: [email protected].
1 Research supported in part by The George Washington University Facilitating Fund.
2 Research supported in part by a grant from the NSF.

0167-7152/96/$12.00 (c) 1996 Elsevier Science B.V. All rights reserved. PII S0167-7152(96)00020-X


In the next section, we present a modification of the one-sample runs test for symmetry and prove its asymptotic normality under the null hypothesis. The procedure uses Wilcoxon scores or their modifications by Gastwirth (1965) to weigh the runs. In Section 3, we present a Monte Carlo study which shows that the modified test is more powerful than the ordinary runs test and its competitors studied by McWilliams (1990) for a wide range of alternative asymmetric distributions.

2. A modified runs statistic

To define the test, assume $\mu$ is known; without loss of generality we can take it to be zero. Let $x_{(1)}, \ldots, x_{(n)}$ denote the sample values ordered from smallest to largest according to absolute value, with their signs retained; i.e. fold the observations around zero. Let $S_1, \ldots, S_n$ denote indicator variables designating the signs of the $x_{(k)}$ values,

$$S_k = \begin{cases} 1, & \text{if } x_{(k)} \geq 0; \\ 0, & \text{otherwise.} \end{cases}$$

The runs statistic counts the number of runs in the $S_k$ sequence. That is,

$$R = \sum_{k=2}^{n} I_k,$$

where

$$I_k = \begin{cases} 0, & \text{if } S_k = S_{k-1}; \\ 1, & \text{otherwise.} \end{cases}$$
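The definition of R above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' SAS-IML program: the function names `sign_indicators` and `runs_statistic` and the example data are assumptions introduced here.

```python
import numpy as np

def sign_indicators(x, center=0.0):
    """Return S_1..S_n: sign indicators of the data ordered by absolute value."""
    z = np.asarray(x, dtype=float) - center          # fold around the known center
    order = np.argsort(np.abs(z), kind="stable")     # order by |x|, keeping signs
    return (z[order] >= 0).astype(int)               # S_k = 1 if x_(k) >= 0, else 0

def runs_statistic(x, center=0.0):
    """R = sum over k = 2..n of I_k, where I_k = 1 when S_k differs from S_{k-1}."""
    s = sign_indicators(x, center)
    i_k = (s[1:] != s[:-1]).astype(int)              # I_2, ..., I_n
    return int(i_k.sum())

# Hypothetical data: the signs ordered by |x| are +, -, +, -, +, +
sample = [0.3, -0.5, 1.2, -2.0, 2.5, 3.1]
print(runs_statistic(sample))
```

For the hypothetical sample the sign sequence changes four times, so the sketch returns R = 4.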

Under the alternative hypothesis, large negative or positive values tend to form clusters (runs), resulting in relatively few runs. Thus, one rejects $H_0$ when R is small. The null distribution of R, given by McWilliams (1990), is binomial with parameters $n - 1$ and $1/2$. Under the null hypothesis, the probabilities $P(I_k = 1)$ are constant, whereas under the alternative of asymmetry, $P(I_k = 1)$, $k = 2, \ldots, n$, are no longer independent of $k$. This suggests that a test based on the relative position, $k$, of the runs should be more powerful than R. For skewed alternatives, the runs should occur in the tails, and thus we propose to modify the above test by giving more weight to those runs. Our tests are defined as

$$M_p = \sum_{k=np+2}^{n} \phi(k) I_k,$$

where

$$\phi(k) = \begin{cases} k - np, & \text{if } k > np; \\ 0, & \text{otherwise,} \end{cases}$$

and $p$ is a trimming proportion. If $p = 0$, then $\phi(k)$ are the Wilcoxon scores. Otherwise, they are percentile modified test scores (Gastwirth, 1965). The R test proposed by Cohen and Menjoge (1988) and McWilliams (1990) corresponds to the choice of $p = 0$ and $\phi(k) = 1$. Thus, R just counts the total number of runs and applies the sign test to $(I_2, I_3, \ldots, I_n)$. $M_p$, on the other hand, gives a weight to the relative position of each run.
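The weighted statistic can be sketched as follows. The function name `modified_runs_statistic` is hypothetical, and rounding $np$ to the nearest integer is an assumption made here; the text does not say how a non-integer $np$ is handled.

```python
import numpy as np

def modified_runs_statistic(x, p=0.0, center=0.0):
    """M_p = sum over k = np+2..n of phi(k) * I_k, with phi(k) = k - np for k > np."""
    z = np.asarray(x, dtype=float) - center
    n = z.size
    order = np.argsort(np.abs(z), kind="stable")     # order by absolute value
    s = (z[order] >= 0).astype(int)                  # sign indicators S_1..S_n
    i_k = (s[1:] != s[:-1]).astype(int)              # I_2..I_n
    k = np.arange(2, n + 1)                          # positions k = 2..n
    trim = int(round(n * p))                         # assumption: np rounded to an integer
    phi = np.where(k > trim, k - trim, 0)            # Wilcoxon-type scores
    keep = k >= trim + 2                             # the sum starts at k = np + 2
    return int(np.sum(phi[keep] * i_k[keep]))

# Hypothetical data with I_2..I_6 = 1, 1, 1, 1, 0
sample = [0.3, -0.5, 1.2, -2.0, 2.5, 3.1]
print(modified_runs_statistic(sample, p=0.0))   # weights 2, 3, 4, 5, 6
print(modified_runs_statistic(sample, p=0.5))   # only k = 5, 6 contribute
```

With $p = 0$ the sketch gives 2 + 3 + 4 + 5 = 14 for this sample; with $p = 0.5$ only the sign change at $k = 5$ (weight 2) counts, so it gives 2.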

It is not difficult to compute the exact null distribution of $M_p$ for small samples as it is completely determined by the indicator variables $I_k$. Thus, the sample space can be considered as a set of $2^{n-1}$ $(n-1)$-tuples $(I_2, I_3, \ldots, I_n)$, where each $I_k$ is either 0 or 1. Under the null hypothesis, all $(n-1)$-tuples are equally likely.
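The enumeration described above can be sketched directly for small $n$. The helper `exact_null_distribution` is a hypothetical name, and it takes the integer `trim` (standing for $np$) as its argument so that no rounding convention has to be assumed.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def exact_null_distribution(n, trim=0):
    """Exact null distribution of M_p by enumerating all 2^(n-1) tuples (I_2..I_n).

    `trim` plays the role of the integer np in the definition of M_p.
    """
    weights = [k - trim for k in range(trim + 2, n + 1)]   # phi(k), k = np+2..n
    counts = Counter()
    for bits in product((0, 1), repeat=n - 1):             # bits[0] is I_2
        tail = bits[trim:]                                 # I_k for k >= np + 2
        m = sum(w * b for w, b in zip(weights, tail))
        counts[m] += 1
    total = 2 ** (n - 1)
    return {m: Fraction(c, total) for m, c in sorted(counts.items())}

dist = exact_null_distribution(4)
for value, prob in dist.items():
    print(value, prob)
```

The cost grows as $2^{n-1}$, so this brute-force version is only practical for small samples; for larger $n$ the normal approximation is the natural alternative.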


Therefore, under $H_0$, $P(M_p = m) = \eta(m)/2^{n-1}$, where $\eta(m)$ is the number of ways to assign a 0 or 1 to each $\phi(k)$ so that $M_p = m$. Furthermore, the asymptotic distribution of $M_p$ is given by the following lemma, whose proof appears in Appendix A.

Lemma 1. Under the null hypothesis of symmetry, $M_p$ has an asymptotic normal distribution with mean and variance

$$\mu(M_p) = \frac{1}{4}\,(n(1-p) - 1)(n(1-p) + 2),$$

$$\sigma^2(M_p) = \frac{1}{24}\,(n(1-p) - 1)(2n^2(1-p)^2 + 5n(1-p) + 6).$$
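A sketch of how the lemma might be used for an approximate test follows. The function names are hypothetical, and the one-sided lower-tail rejection rule is an assumption made by analogy with the rule "reject when R is small"; the text does not state the rejection region for $M_p$ explicitly.

```python
import math

def null_moments(n, p=0.0):
    """Mean and variance of M_p under H0, as given in Lemma 1."""
    m = n * (1.0 - p)
    mean = (m - 1.0) * (m + 2.0) / 4.0
    var = (m - 1.0) * (2.0 * m * m + 5.0 * m + 6.0) / 24.0
    return mean, var

def lower_tail_p_value(m_obs, n, p=0.0):
    """Normal-approximation p-value, assuming small M_p is evidence against H0."""
    mean, var = null_moments(n, p)
    z = (m_obs - mean) / math.sqrt(var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF at z

mean, var = null_moments(4)
print(mean, var)                       # for n = 4, p = 0: sum(k)/2 and sum(k^2)/4, k = 2..4
print(lower_tail_p_value(120.0, 30))   # hypothetical observed value for n = 30
```

For $n = 4$ and $p = 0$ the formulas give mean $(2+3+4)/2 = 4.5$ and variance $(4+9+16)/4 = 7.25$, which agrees with a direct calculation from the weights.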

3. Monte Carlo simulation

This section reports a Monte Carlo study comparing the new family of tests $M_p$ to R and the Wilcoxon signed-rank test. This study is based on 10000 replications at $\alpha = 0.05$ for sample sizes of $n = 20$, 30, 50 and 100. Following McWilliams (1990), nine distributions, including the normal, are selected from the generalized lambda family. Uniform random numbers, $u_i$, are generated and are then transformed to generalized lambda variates,

$$x_i = \lambda_1 + \frac{u_i^{\lambda_3} - (1 - u_i)^{\lambda_4}}{\lambda_2}, \qquad 0 < u_i < 1.$$
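The transformation above can be sketched with NumPy. The function names are hypothetical, and two assumptions are made: $\lambda_1 = 0$ by default (Table 1 lists only $\lambda_2$, $\lambda_3$, $\lambda_4$), and "centered at the median" is read as subtracting the population median, i.e. the quantile function evaluated at $u = 0.5$.

```python
import numpy as np

def gld_quantile(u, lam1, lam2, lam3, lam4):
    """Generalized lambda quantile: lam1 + (u^lam3 - (1 - u)^lam4) / lam2."""
    u = np.asarray(u, dtype=float)
    return lam1 + (u ** lam3 - (1.0 - u) ** lam4) / lam2

def sample_centered_gld(n, lam2, lam3, lam4, rng, lam1=0.0):
    """Draw n variates and subtract the population median (assumed centering)."""
    u = rng.uniform(size=n)                      # uniform random numbers u_i
    x = gld_quantile(u, lam1, lam2, lam3, lam4)
    return x - gld_quantile(0.5, lam1, lam2, lam3, lam4)

rng = np.random.default_rng(0)                   # arbitrary seed for reproducibility
null_params = (0.197454, 0.134915, 0.134915)     # the null case listed in Table 1
x = sample_centered_gld(30, *null_params, rng=rng)
print(x.shape)
```

When $\lambda_3 = \lambda_4$ the quantile function satisfies $Q(u) = -Q(1-u)$ about $\lambda_1$, which is why the null case is symmetric.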

Finally, the observations are centered at the median. The program is written in SAS-IML and is run on an IBM 4381. The two other procedures considered are R and W, the Wilcoxon signed-rank test based on the sum of the positive signed ranks of $x_1, \ldots, x_n$ (Hettmansperger, 1984; Gibbons and Chakraborti, 1992). W and R are randomized to have exact size 0.05, and the asymptotic distribution of $M_p$ is used in the study.

Table 1 shows the number of rejections under the null and the alternative hypotheses for $p = 0.0$, 0.10, 0.20 and 0.25, and Fig. 1 shows the density functions used in the Monte Carlo study. We should note that a 95% confidence interval for the true number of rejections under $H_0$ with $\alpha = 0.05$ and 10000 replications is (457, 542).

Table 1 shows that the modified tests are powerful procedures for detecting asymmetry. The tests $M_p$, $p = 0$ to 0.25, clearly dominate W and R in all cases considered. For example, consider $n = 30$ and cases 1-4 and 6-8. Table 1 shows that $M_{0.25}$ is more likely to detect asymmetry than W by factors of (6.1, 5.9, 2.4, 2.9, 1.6, 5.5, 4.7) and than R by factors of (1.9, 1.6, 1.6, 1.6, 1.4, 1.5, 1.5), respectively. Alternative 5 is the most difficult to detect, as its skewness is $s = 0.8$ and its kurtosis is $k = 11.4$ (see Fig. 1). Even so, for that alternative $M_p$ still performs better than W and R.

The results of Table 1 indicate that the choice of $p = 0.2$ yields a noticeable improvement for all $n$ and alternatives considered. For larger ($n \geq 50$) sample sizes and a skewed alternative, the choice of $p = 0.25$ appears to be preferable.

Our procedures have improved power for testing symmetry about a known center against asymmetry in the tails of the distribution. For distributions whose asymmetry is focused in regions close to the median, the $M_p$ family will not perform well because observations in that region are not given much weight.
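The 95% interval for the number of rejections under $H_0$ quoted above can be checked with a short calculation. The sketch assumes the usual normal approximation to a binomial count with 10000 trials and success probability 0.05; the function name is hypothetical.

```python
import math

def rejection_count_interval(replications, alpha, z=1.96):
    """Normal-approximation interval for a Binomial(replications, alpha) count."""
    expected = replications * alpha
    sd = math.sqrt(replications * alpha * (1.0 - alpha))
    return expected - z * sd, expected + z * sd

low, high = rejection_count_interval(10_000, 0.05)
print(low, high)
```

The expected count is 500 with standard deviation $\sqrt{475} \approx 21.8$, so the limits are roughly 457.3 and 542.7, in line with the interval (457, 542) stated in the text.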
As noted by McWilliams (1990) and Henze (1993), a distribution may be asymmetric with respect to an assumed median under a location shift alternative. While Mp is more powerful than R in these situations, they are both locally less powerful than W. After this paper was submitted, we became aware of a related article by Tajuddin (1994) who proposed a conditional test based on the Wilcoxon two-sample test. This test yields


[Figure 1: nine density plots, one per distribution, each labelled with its skewness and kurtosis (S, K).]

Null: (S, K) = (0, 3)
Alternative 1: (S, K) = (0.5, 2.2)
Alternative 2: (S, K) = (1.5, 5.8)
Alternative 3: (S, K) = (0.9, 4.2)
Alternative 4: (S, K) = (1.5, 7.5)
Alternative 5: (S, K) = (0.8, 11.4)
Alternative 6: (S, K) = (2, 21.2)
Alternative 7: (S, K) = (3.16, 23.8)
Alternative 8: (S, K) = (3.88, 40.7)

Fig. 1. Density functions for the lambda family.


Table 1
Number of rejections at α = 0.05 and 10000 replications for the lambda family

Distribution        n       W       R      M0    M0.1    M0.2   M0.25

Null: λ2 = 0.197454, λ3 = 0.134915, λ4 = 0.134915
                   20     495     531     482     503     539     459
                   30     479     517     510     530     531     533
                   50     510     493     510     516     509     483
                  100     532     492     491     486     496     497

Alternative 1: λ2 = 1.0, λ3 = 1.4, λ4 = 0.25
                   20     798    2105    3139    3303    3580    3528
                   30     957    3072    5023    5310    5556    5831
                   50    1236    4861    7620    7914    8183    8458
                  100    2107    7791    9738    9804    9862    9902

Alternative 2: λ2 = 1.0, λ3 = 0.00007, λ4 = 0.1
                   20     859    3101    4626    4842    5174    5110
                   30    1287    4604    6915    7197    7393    7614
                   50    1932    6749    9067    9230    9380    9495
                  100    3625    9273    9973    9983    9987    9991

Alternative 3: λ2 = 0.04306, λ3 = 0.025213, λ4 = 0.094029
                   20     585     933    1116    1170    1284    1211
                   30     697    1074    1509    1596    1653    1719
                   50     848    1474    2201    2309    2410    2512
                  100    1379    2111    3475    3665    3898    4139

Alternative 4: λ2 = -1.0, λ3 = -0.0075, λ4 = -0.03
                   20     725    1177    1455    1511    1669    1578
                   30     812    1412    2038    2136    2231    2335
                   50    1214    2023    3075    3264    3448    3564
                  100    1603    3144    5160    5373    5606    5875

Alternative 5: λ2 = -0.351663, λ3 = -0.13, λ4 = -0.16
                   20     497     523     532     557     590     524
                   30     453     513     544     554     553     554
                   50     482     553     598     596     607     603
                  100     508     580     627     655     669     682

Alternative 6: λ2 = -1.0, λ3 = -0.1, λ4 = -0.18
                   20     512     699     774     802     878     793
                   30     583     700     849     906     930     958
                   50     639     877    1122    1171    1215    1252
                  100     897    1053    1535    1599    1657    1761

Alternative 7: λ2 = -1.0, λ3 = -0.001, λ4 = -0.13
                   20    1254    3718    5410    5610    5977    5878
                   30    1492    5362    7712    7918    8106    8299
                   50    2438    7671    9485    9577    9661    9718
                  100    4109    9651    9990    9996    9998    9999

Alternative 8: λ2 = -1.0, λ3 = -0.00001, λ4 = -0.17
                   20    1294    4066    5853    6062    6427    6318
                   30    1823    5825    8145    8355    8499    8646
                   50    2714    8094    9670    9731    9798    9849
                  100    4559    9803    9999   10000   10000   10000

power roughly comparable to that of McWilliams. Our procedure remains the most powerful for the λ-family. Finally, as Lehmann (1986, p. 326) notes, the problem of testing symmetry when the center is unknown is more difficult; he gives references for asymptotically distribution-free tests for this hypothesis.

Acknowledgements

We are grateful to a referee for helpful suggestions and comments.


Appendix A. Outline of the proof of Lemma 1

Under $H_0$, $I_k$, $k = 2, \ldots, n$, are independent and identically distributed binomial random variables with parameters 1 and $1/2$. Thus,

$$\mu(M_p) = \frac{1}{2} \sum_{k=np+2}^{n} (k - np) = \frac{1}{4}\,(n(1-p) - 1)(n(1-p) + 2),$$

and

$$\sigma^2(M_p) = \frac{1}{4} \sum_{k=np+2}^{n} (k - np)^2 = \frac{1}{24}\,(n(1-p) - 1)(2n^2(1-p)^2 + 5n(1-p) + 6).$$

Asymptotic normality follows by verifying the conditions of the Liapounov central limit theorem (Gnedenko, 1962; Chung, 1968). Note that

$$\frac{\sum_{k=np+2}^{n} E|t_k|^3}{\left(\sum_{k=np+2}^{n} \sigma^2(t_k)\right)^{3/2}} = \frac{\frac{1}{32}\,(nq + 2)(nq - 1)(n^2q^2 + nq + 2)}{\left(\frac{1}{24}\,(nq - 1)(2n^2q^2 + 5nq + 6)\right)^{3/2}} \sim (nq)^{-1/2},$$

where $q = 1 - p$ and $t_k = \phi(k)(I_k - \frac{1}{2})$. For fixed $p$, the above tends to zero as $n$ tends to infinity.
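The decay of the Liapounov ratio can be illustrated numerically. This sketch assumes the centered variables $t_k = \phi(k)(I_k - 1/2)$, so that $E|t_k|^3 = \phi(k)^3/8$ and $\sigma^2(t_k) = \phi(k)^2/4$; the function name and the rounding of $np$ are assumptions introduced here.

```python
def liapounov_ratio(n, p=0.0):
    """Sum of E|t_k|^3 divided by (sum of Var(t_k))^(3/2), for k = np+2..n."""
    trim = int(round(n * p))                       # assumption: np rounded to an integer
    weights = range(2, n - trim + 1)               # phi(k) = k - np takes values 2..n(1-p)
    third = sum(w ** 3 for w in weights) / 8.0     # E|t_k|^3 = phi(k)^3 / 8
    var = sum(w ** 2 for w in weights) / 4.0       # Var(t_k) = phi(k)^2 / 4
    return third / var ** 1.5

for n in (25, 100, 400, 1600):
    print(n, liapounov_ratio(n))
```

Quadrupling $n$ should roughly halve the ratio, which is the $(nq)^{-1/2}$ rate stated in the appendix.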

References

Chung, K.L. (1968), A Course in Probability Theory (Harcourt, Brace and World, New York).
Cohen, J.P. and S.S. Menjoge (1988), One-sample runs test of symmetry, J. Statist. Plann. Inference 18, 93-100.
Gastwirth, J.L. (1965), Percentile modification of two sample rank tests, J. Amer. Statist. Assoc. 60, 1127-1141.
Gibbons, J.D. and S. Chakraborti (1992), Nonparametric Statistical Inference (Marcel Dekker, New York).
Gnedenko, B.V. (1962), The Theory of Probability (Chelsea, New York).
Henze, N. (1993), On the consistency of a test for symmetry based on a runs statistics, Nonparametric Statist. 3, 195-199.
Hettmansperger, T. (1984), Statistical Inference based on Ranks (Wiley, New York).
Hollander, M. and D.A. Wolfe (1973), Nonparametric Statistical Methods (Wiley, New York).
Lehmann, E.L. (1975), Nonparametrics (McGraw-Hill, New York).
Lehmann, E.L. (1986), Testing Statistical Hypotheses (Wiley, New York, 2nd ed.).
McWilliams, T.P. (1990), A distribution-free test for symmetry based on a runs statistics, J. Amer. Statist. Assoc. 85, 1130-1133.
Randles, R.H. and D.A. Wolfe (1979), Introduction to the Theory of Nonparametric Statistics (Wiley, New York).
Tajuddin, I.H. (1994), Distribution-free test for symmetry based on Wilcoxon two-sample test, J. Appl. Statist. 21 (5).