Mathl. Comput. Modelling Vol. 18, No. 12, pp. 49-55, 1993
Copyright © 1994 Elsevier Science Ltd
Printed in Great Britain. All rights reserved
0895-7177/93 $6.00 + 0.00
Consistency of Rank Tests against Some General Alternatives

A. I. KATSEVICH
Mathematics Department, Kansas State University
Manhattan, KS 66506-2602, U.S.A.
[email protected]

A. G. RAMM
Mathematics Department, Kansas State University
Manhattan, KS 66506-2602, U.S.A.
[email protected]

(Received and accepted October 1993)
Abstract. The proof of consistency of rank tests against some general alternatives is given. The alternatives considered include an arbitrary, not necessarily monotone, trend in location or change points (change surfaces) under the infill asymptotics. One-dimensional and multi-dimensional cases are studied. A numerical experiment illustrating the use of the above results for image analysis (edge detection) is presented.

Keywords. Rank test, Consistency, Change points, Trend.
1. INTRODUCTION

In this paper, we study tests of randomness based on the modification of the Wald-Wolfowitz statistic [1] (one-dimensional case) and on Geary-type statistic [2,3] (multi-dimensional case) and prove their consistency against two general alternatives.

Consider the one-dimensional case. Let $x_k$, $k = 1, \dots, N$, be a random sequence. The null hypothesis $H_0$ is that all $x_k$ are independent, identically distributed (iid) random variables (rv's) with common distribution function (df) $F(x)$. Fix $m \ge 2$ and define the first alternative hypothesis $H_m$:
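$$H_m : F_k(x) = G_\ell(x), \quad k \in K_\ell, \quad \ell = 1, \dots, m, \tag{1a}$$

where

$$G_\ell(x) \ge G_{\ell+1}(x), \ x \in \mathbb{R}; \qquad \exists\, x_\ell : G_\ell(x_\ell) > G_{\ell+1}(x_\ell), \quad \ell = 1, \dots, m-1, \tag{1b}$$

$$\bigcup_{\ell=1}^{m} K_\ell = \{1, \dots, N\}, \qquad K_i \cap K_j = \emptyset \ \text{ for } i \ne j. \tag{1c}$$

Here $F_k$ is the df of the $k$th term of the sequence $x_k$, and $G_\ell$ are continuously differentiable df's. The hypothesis $H_m$ means that the initial sequence is a union of samples (possibly multi-connected) of $m$ rv's $\xi_\ell$ with df's $G_\ell$ such that $\xi_\ell$ is stochastically smaller than $\xi_{\ell+1}$, $\ell = 1, \dots, m-1$. Thus, the initial sequence need not be stochastically monotone under $H_m$. The second alternative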
hypothesis $H'$ which we consider consists in the presence of an arbitrary "continuous" trend in location:

$$H' : F_k(x) = F(x - \theta_k), \quad k = 1, \dots, N, \tag{2a}$$

where we assume that

$$\exists\, \phi(t) \in C[0,1], \ \phi(t) \not\equiv \text{const} : \quad \theta_k = \phi\!\left(\frac{k}{N}\right). \tag{2b}$$
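To make the two alternatives concrete, the following sketch generates one sequence under a special case of $H_m$ and one under $H'$. It is an illustration only: the function names are hypothetical, the noise is taken to be standard normal, $H_m$ is specialized to pure location shifts (so that $G_1 \ge G_2$ holds), and the trend is assumed to be sampled as $\theta_k = \phi(k/N)$.

```python
import math
import random

def sample_h_m(n, labels, shifts, rng):
    """Location-shift instance of H_m: x_k = shifts[labels[k]] + N(0,1) noise.
    labels[k] says which set K_l the index k belongs to (assumed encoding)."""
    return [shifts[labels[k]] + rng.gauss(0.0, 1.0) for k in range(n)]

def sample_h_trend(n, phi, rng):
    """Instance of H': x_k = phi(k/N) + N(0,1) noise, k = 1..N (assumed sampling)."""
    return [phi(k / n) + rng.gauss(0.0, 1.0) for k in range(1, n + 1)]

rng = random.Random(0)
n = 12
# K_1 = {1,2,3} U {8,...,12}, K_2 = {4,...,7}: K_1 is not an interval,
# so the sequence is not stochastically monotone.
labels = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
x_m = sample_h_m(n, labels, shifts=[0.0, 2.0], rng=rng)
# A non-monotone continuous trend on [0, 1].
x_t = sample_h_trend(n, lambda t: math.sin(2 * math.pi * t), rng)
```

The shift values and the sine trend are arbitrary choices; any ordered family of df's and any non-constant continuous $\phi$ would serve equally well.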
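A similar multi-dimensional problem deals with the data $x_{\hat k}$, $\hat k = (k_1, \dots, k_d)$, $1 \le k_i \le \beta_i N$, $0 < \beta_i < \infty$, $1 \le i \le d$, where $\beta_i$ are arbitrary fixed integers. In this case, $x_{\hat k}$ are the values of a rv defined on a lattice in $\mathbb{R}^d$. Define

$$B_N := \{\hat k : \hat k \in \mathbb{Z}^d, \ 1 \le k_i \le \beta_i N, \ 1 \le i \le d\}, \qquad T := \{\hat t : \hat t \in \mathbb{R}^d, \ 0 \le t_i \le \beta_i, \ 1 \le i \le d\}.$$

The multi-dimensional analogs of the hypotheses $H_0$, $H_m$, and $H'$ are

$$\hat H_0 : F_{\hat k}(x) = F(x), \quad \forall\, \hat k \in B_N; \tag{3}$$

$$\hat H_m : F_{\hat k}(x) = G_\ell(x), \quad \hat k \in \hat K_\ell, \quad \ell = 1, \dots, m, \tag{4a}$$

where the functions $G_\ell$ satisfy (1b),

$$\bigcup_{\ell=1}^{m} \hat K_\ell = B_N, \qquad \hat K_i \cap \hat K_j = \emptyset \ \text{ for } i \ne j; \tag{4b}$$

and

$$\hat H' : F_{\hat k}(x) = F(x - \theta_{\hat k}), \quad \hat k \in B_N, \tag{5a}$$

where we assume that

$$\exists\, \phi(\hat t) \in C(T), \ \phi(\hat t) \not\equiv \text{const} : \quad \theta_{\hat k} = \phi\!\left(\frac{\hat k}{N}\right). \tag{5b}$$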
The literature on one-dimensional tests of randomness is very large [4-14], but we could not find any proofs of consistency against the general alternatives we consider. The most frequently considered alternatives are one change-point [6], monotone trend and serial correlation [12]. Different rank tests and different results for the case of multiple change points can be found in [13,14]. There are fewer publications on the theory of multi-dimensional tests of randomness (tests for spatial autocorrelation), most notably [3], but we did not find any results as general as ours.

In Section 2, the statistics are described, basic results are stated and tests of randomness are formulated. In Section 3, some ideas of the proofs are sketched. In Section 4, some applications of the tests are outlined and results of a numerical experiment are reported. Detailed proofs of our results are given in [15,16].
2. TESTS OF RANDOMNESS

1. Consider first the one-dimensional case. Define the statistic [15]

$$\nu_N := (N-1)^{-1} \sum_{k=1}^{N-1} (R_{k+1} - R_k)^2 N^{-2}. \tag{6}$$

Here $R_k$ is the rank of $x_k$, i.e., $R_k := \#\{x_j : x_j < x_k\} + 1$. Note that $\nu_N$ has the form of the Durbin-Watson statistic [17] with observations replaced by their ranks. To construct the test of randomness based on $\nu_N$, we need to study its asymptotic properties as $N \to \infty$. Using the assumptions (1b,c), define the numbers of interior and boundary points

$$p_\ell^{(i)} := \#\{k : k,\, k+1 \in K_\ell\}, \ \ell = 1, \dots, m, \qquad p^{(b)} := \#\{k : k \in K_i, \ k+1 \in K_j, \ i \ne j\}.$$

In addition to (1a-c) let us make quite natural assumptions:

$$\frac{p_\ell^{(i)}}{N} \to \alpha_\ell > 0, \ \ell = 1, \dots, m; \qquad \sum_{\ell=1}^{m} \alpha_\ell = 1; \qquad p^{(b)} \le \text{const} \quad \text{as } N \to \infty. \tag{1d}$$
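The statistic (6) can be computed directly from its definition. The sketch below is a minimal pure-Python illustration; the function names `ranks` and `nu_stat` are hypothetical and not taken from the paper.

```python
from itertools import permutations

def ranks(x):
    """R_k = #{x_j : x_j < x_k} + 1, as in the definition following (6)."""
    return [sum(1 for xj in x if xj < xk) + 1 for xk in x]

def nu_stat(x):
    """Statistic (6): mean squared difference of adjacent ranks, scaled by N^-2."""
    n = len(x)
    r = ranks(x)
    s = sum((r[k + 1] - r[k]) ** 2 for k in range(n - 1))
    return s / ((n - 1) * n ** 2)

# A monotone sequence has all adjacent rank differences equal to 1,
# so the statistic equals 1/N^2.
print(nu_stat([0.1, 0.5, 0.7, 2.0]))  # 0.0625

# Under H_0 all rank permutations are equally likely, so for small N the
# mean can be obtained by exhaustive enumeration.
n = 5
vals = [nu_stat(p) for p in permutations(range(1, n + 1))]
mean_h0 = sum(vals) / len(vals)
print(mean_h0)  # close to (1 + 1/5)/6 = 0.2
```

Small values of the statistic indicate that neighboring observations have similar ranks, which is why the test rejects $H_0$ for small $\nu_N$. The quadratic-time rank computation is chosen for transparency, not speed.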
Denote by $\xrightarrow[N\to\infty]{\text{m.s.}}$ the convergence in mean square and by $\xrightarrow[N\to\infty]{P}$ the convergence in probability. Our basic result is:

THEOREM 1. If $H_0$ holds, then $\nu_N \xrightarrow[N\to\infty]{\text{m.s.}} \frac{1}{6}$. If $H_m$ (the assumptions (1a-d)) holds, then $\nu_N \xrightarrow[N\to\infty]{\text{m.s.}} E_m$, where $E_m < \frac{1}{6}$. If $H'$ holds and the function $\phi(t)$ in (2b) is independent of $N$, then $\nu_N \xrightarrow[N\to\infty]{P} E'$, where $E' < \frac{1}{6}$.

Let us formulate the test of randomness. Fix $\varepsilon$, $0 < \varepsilon < 1$, the probability of rejecting $H_0$ when $H_0$ holds (type I error or false alarm error) and find the threshold $A_0$ from the equation $P\{\nu_N < A_0 \mid H_0\} = \varepsilon$. This equation defines $A_0$ as a function of $N$ and $\varepsilon$. Then, for the given sequence $\{x_k\}_{k=1}^{N}$, compute the statistic $\nu_N$ by formula (6). If $\nu_N \ge A_0$, $H_0$ is accepted, i.e., the sequence is assumed to be random. If $\nu_N < A_0$, $H_0$ is rejected. Clearly, the probabilities of the second type errors (against $H_m$ or $H'$) are given by $P\{\nu_N \ge A_0 \mid H_m\}$ and $P\{\nu_N \ge A_0 \mid H'\}$.

THEOREM 2. Fix $\varepsilon$, $0 < \varepsilon < 1$, and find $A_0$ from the equation $P\{\nu_N < A_0 \mid H_0\} = \varepsilon$. Then $P\{\nu_N \ge A_0 \mid H_m\} = O(N^{-1})$ and $P\{\nu_N \ge A_0 \mid H'\} \to 0$ as $N \to \infty$.

Theorem 2 shows the consistency of the proposed test.

2. Let us discuss the multi-dimensional case. Fix multi-index $\hat k \in B_N$ and define the set of lattice points neighboring to it by

$$L(\hat k) := \Bigl\{\hat\ell : \hat\ell \in B_N, \ \hat\ell \ne \hat k, \ \max_{1 \le i \le d} |\ell_i - k_i| = 1\Bigr\}.$$

If $\hat k$ is a strictly inner point of $B_N$, the number of points in $L(\hat k)$ is $3^d - 1$. Let $R_{\hat k}$ be the rank of $x_{\hat k}$, $\hat k \in B_N$, $R_{\hat k} := \#\{x_{\hat j} : \hat j \in B_N, \ x_{\hat j} < x_{\hat k}\} + 1$. Define [15]

$$\hat\nu_N := M_N^{-1} \sum_{\hat k \in B_N} \sum_{\hat\ell \in L(\hat k)} (R_{\hat k} - R_{\hat\ell})^2 N^{-2d}, \tag{7}$$

where $M_N$ is the number of terms in the double sum (7), $M_N = (3^d - 1)\beta_1 \cdots \beta_d N^d (1 + O(1/N))$ as $N \to \infty$. Theorems 1 and 2 and the test of randomness remain the same in the multi-dimensional case with $\nu_N$ replaced by $\hat\nu_N$.

Now a few words about the numerical implementation of the above test are in order. One sees that the most important part is the calculation of the threshold $A_0$ from the equation $P\{\nu_N < A_0 \mid H_0\} = \varepsilon$. The Wald-Wolfowitz serial statistic of the ranks, $R := \sum_{k=1}^{N} R_k R_{k+1}$ with $R_{N+1} := R_1$, is asymptotically normal as $N \to \infty$ if $H_0$ holds. One has

$$\sum_{k=1}^{N-1} (R_{k+1} - R_k)^2 = N(N+1)\frac{2N+1}{3} - 2R - (R_1 - R_N)^2.$$

Thus, $\nu_N = \frac{2}{3} - 2R/(N^2(N-1)) + o(1)$ and $\nu_N$ is also asymptotically normal. For small values of $N$ one can use the Monte-Carlo method. One models a sufficiently large number of permutations of $\{1, \dots, N\}$, corresponding to the equiprobable distributions of ranks within the initial sequence, and computes $\nu_N$ for each permutation. Select $A_0$ so that $\nu_N < A_0$ in $100\varepsilon\%$ cases. In the multi-dimensional case, the asymptotic distribution of $\hat\nu_N$ is also normal [3], and for small $N$ one can also use the Monte-Carlo method.
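The Monte-Carlo procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the number of simulated permutations and the seed are all hypothetical choices.

```python
import random

def nu_from_ranks(r):
    """Statistic (6) computed from a list of ranks."""
    n = len(r)
    return sum((r[k + 1] - r[k]) ** 2 for k in range(n - 1)) / ((n - 1) * n ** 2)

def mc_threshold(n, eps, n_perm=2000, seed=0):
    """Estimate A_0 with P{nu_N < A_0 | H_0} about eps from random permutations."""
    rng = random.Random(seed)
    r = list(range(1, n + 1))
    vals = []
    for _ in range(n_perm):
        rng.shuffle(r)  # under H_0 every permutation of the ranks is equally likely
        vals.append(nu_from_ranks(r))
    vals.sort()
    # At most int(eps * n_perm) simulated values lie strictly below this element.
    return vals[int(eps * n_perm)]

def reject_h0(x, a0):
    """Return True when nu_N < A_0, i.e. when H_0 is rejected."""
    r = [sum(1 for xj in x if xj < xk) + 1 for xk in x]
    return nu_from_ranks(r) < a0

a0 = mc_threshold(30, eps=0.05)
# A strictly increasing sequence gives nu_N = 1/N^2, far below the threshold.
print(reject_h0(list(range(30)), a0))
```

Because the threshold depends only on $N$ and $\varepsilon$, it can be computed once and reused for every sequence of the same length.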
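3. IDEAS OF THE PROOFS

1. First, we prove that $\bar\nu_N = (1 + N^{-1})/6$ and $\operatorname{Var}(\nu_N) = (36N)^{-1} + O(N^{-2})$ as $N \to \infty$ if $H_0$ is true (see also [1]). Here the bar stands for the mean value. Then we assume $H_m$ and let $m = 2$. In this case, we prove that $\bar\nu_N \to E_2$, $\operatorname{Var}(\nu_N) = O(1/N)$ as $N \to \infty$ and

$$E_2 = [\ldots], \tag{8}$$

where

$$f(r) := \alpha_1 \frac{g_1(G^{-1}(r))}{g(G^{-1}(r))}, \qquad G(x) := \alpha_1 G_1(x) + \alpha_2 G_2(x), \qquad g := G', \quad g_1 := G_1',$$

and $\alpha_j$ are defined in (1d). Here $G^{-1}$ is the inverse function of $G(x)$. Clearly, $f(r)$ is defined on a set of regular values of $G(x)$. Let $E' := \{r : 0 \le r \le 1, \ G(x) = r, \ g(x) = 0\}$. By Sard's theorem [18], $\operatorname{meas} E' = 0$. Let $E := [0,1] \setminus E'$. Consider an arbitrary open set $E'_\varepsilon$ such that $E' \subset E'_\varepsilon$, $\operatorname{meas} E'_\varepsilon \le \varepsilon$. Let $E_\varepsilon := [0,1] \setminus E'_\varepsilon$, $R_\varepsilon := G^{-1}(E_\varepsilon)$. Then, by taking $x = G^{-1}(r)$, we get

$$[\ldots] = \alpha_1 \lim_{\varepsilon \to 0} \int_{R_\varepsilon} g_1(x)\, G(x)\, dx = \alpha_1 I, \qquad I := \int_{\mathbb{R}} g_1 G\, dx. \tag{9}$$

The last equation follows since $\sup_{x \in \mathbb{R} \setminus R_\varepsilon} g_1(x) \to 0$ as $\varepsilon \to 0$. Furthermore,

$$I = \alpha_1 \int_{\mathbb{R}} G_1 g_1\, dx + \alpha_2 \int_{\mathbb{R}} G_2 g_1\, dx < \frac{\alpha_1}{2} + \alpha_2 \int_{\mathbb{R}} G_1 g_1\, dx = \frac{\alpha_1}{2} + \frac{\alpha_2}{2} = \frac{1}{2}. \tag{10}$$

From (8) it follows that the inequality $E_2 < \frac{1}{6}$ is equivalent to the following one:

$$z^2 - \alpha_1 z + \frac{\alpha_1}{4} > \frac{\alpha_1 (1 - \alpha_1)}{4}. \tag{11}$$

From (9) and (10) one gets $z = \alpha_1 I < \alpha_1/2$, which implies (11).

Next, we consider the case $m > 2$ and prove that $\bar\nu_N \to E_m$ and $\operatorname{Var}(\nu_N) = O(1/N)$ as $N \to \infty$. Denote $E_m := E_m(\alpha_1, \dots, \alpha_m; G_1, \dots, G_m)$. Let us prove the inequality $E_m < \frac{1}{6}$.

LEMMA. The following inequality holds:

$$E_m(\alpha_1, \dots, \alpha_m; G_1, \dots, G_m) < E_{m-1}(\alpha_1, \dots, \alpha_{m-2}, \alpha_{m-1} + \alpha_m; G_1, \dots, G_{m-1}). \tag{12}$$

Applying (12) repeatedly, one obtains the estimate:

$$E_m(\alpha_1, \dots, \alpha_m; G_1, \dots, G_m) < E_2(\alpha_1, \alpha_2 + \dots + \alpha_m; G_1, G_2) = E_2 < \frac{1}{6}.$$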
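As a cross-check of the case $m = 2$ treated above, the following heuristic computation may be helpful. It is not the authors' proof: it assumes that the normalized rank $R_k/N$ behaves like $G(x_k)$ for large $N$ and that the boundedly many boundary pairs are negligible by (1d). The symbols $\mu_\ell$ and $g_2 := G_2'$ are introduced here for illustration only.

```latex
% Heuristic sketch; \mu_\ell and g_2 are notation introduced here, not in the paper.
\begin{align*}
\mu_\ell &:= \int_{\mathbb{R}} G\, g_\ell\, dx, \qquad \mu_1 = I, \\
\alpha_1 \mu_1 + \alpha_2 \mu_2 &= \int_{\mathbb{R}} G\, g\, dx = \tfrac12, \qquad
\alpha_1 \int_{\mathbb{R}} G^2 g_1\, dx + \alpha_2 \int_{\mathbb{R}} G^2 g_2\, dx
  = \int_{\mathbb{R}} G^2 g\, dx = \tfrac13, \\
\lim_{N \to \infty} \bar\nu_N
  &\approx \sum_{\ell=1}^{2} \alpha_\ell \cdot 2\Bigl( \int_{\mathbb{R}} G^2 g_\ell\, dx - \mu_\ell^2 \Bigr)
  = \tfrac23 - \frac{2 z^2}{\alpha_1} - \frac{2 (\tfrac12 - z)^2}{1 - \alpha_1},
  \qquad z := \alpha_1 I.
\end{align*}
% The right-hand side is smaller than 1/6 if and only if
\begin{align*}
\frac{z^2}{\alpha_1} + \frac{(\tfrac12 - z)^2}{1 - \alpha_1} > \frac14
\iff z^2 - \alpha_1 z + \frac{\alpha_1}{4} > \frac{\alpha_1 (1 - \alpha_1)}{4}
\iff \Bigl( z - \frac{\alpha_1}{2} \Bigr)^2 > 0 .
\end{align*}
```

Under this heuristic the middle inequality has the same form as (11), and the final form shows why the strict bound $I < \frac12$ from (10), that is $z < \alpha_1/2$, is exactly what is needed.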
2. Now assume $H'$. Fix $m \ge 2$ and divide the interval $[0,1]$ into $m$ equal subintervals. Consider a piecewise-constant trend approximating $\phi(t)$. Let $\nu_N^{(m)}$ be the statistic (6) calculated for such trend. Using the above results, we prove three results [16]:

(i1) $\nu_N^{(m)} \to E_m$ as $N \to \infty$;

(i2) $\forall \varepsilon > 0 \ \exists\, m_\varepsilon$ such that $P\{|\nu_N^{(m_\varepsilon)} - \nu_N| > \varepsilon\} \to 0$ as $N \to \infty$, where $\nu_N$ is the statistic calculated in the case of the trend (2b); and

(i3) $\lim_{N \to \infty} \bigl| P\{\nu_N^{(m_\varepsilon)} < a\} - P\{\nu_N < a\} \bigr| = 0$, for any $a \notin [E_{m_\varepsilon} - \varepsilon, \ E_{m_\varepsilon} + \varepsilon]$.

These three results imply $\nu_N \xrightarrow[N\to\infty]{P} E_\infty$, where $E_\infty := \lim_{m \to \infty} E_m$. The existence of this limit and the following formula are proved: $E_\infty = 2\bigl([\ldots] - I(\phi)\bigr)$, where $I(\phi)$ is a functional which is Gateaux differentiable in $C[0,1]$. Thus, to show $E_\infty < \frac{1}{6}$ we need to prove that $I(\phi) > [\ldots]$. The proof of this fact is based on two assertions: (i) $[\ldots]$; (ii) $\left. \dfrac{\partial I([\ldots])}{\partial a} \right|_{a=0} > 0$, for any $y$, $a < y < b$. Theorem 1 is proved.
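3. Now the proof of Theorem 2 presents no difficulties:

$$P\{\nu_N \ge A_0 \mid H_m\} = P\{\nu_N - E_m \ge A_0 - E_m \mid H_m\} \le P\{|\nu_N - E_m| \ge A_0 - E_m \mid H_m\} \le (A_0 - E_m)^{-2} \operatorname{Var}(\nu_N) = O(N^{-1}) \quad \text{as } N \to \infty.$$

Moreover, $A_0 \to \frac{1}{6}$ as $N \to \infty$ and $E_m < \frac{1}{6}$, so $A_0 - E_m \ge c > 0$ as $N \to \infty$. The second part of Theorem 2 is proved similarly.

4. In the multi-dimensional case, the argument is similar. Note that the number of boundary points $\hat p^{(b)} := \#\{\hat k : \hat k \in \hat K_\ell, \ L(\hat k) \cap \hat K_\ell \ne L(\hat k)\}$ under the hypothesis $\hat H_m$ is assumed to satisfy $\hat p^{(b)} \le \text{const}\, N^{d-1}$ as $N \to \infty$ (compare with (1d)). We prove that $\hat\nu_N \xrightarrow[N\to\infty]{\text{m.s.}} \frac{1}{6}$, $\operatorname{Var}(\hat\nu_N) = O(N^{-d})$ if $\hat H_0$ holds; $\hat\nu_N \xrightarrow[N\to\infty]{\text{m.s.}} \hat E_m$, $\operatorname{Var}(\hat\nu_N) = O(N^{-1})$ if $\hat H_m$ holds; and $\hat\nu_N \xrightarrow[N\to\infty]{P} \hat E_\infty$ if $\hat H'$ holds. The inequalities $\hat E_m < \frac{1}{6}$ and $\hat E_\infty < \frac{1}{6}$ are proved analogously to the one-dimensional case.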
4. APPLICATIONS

There are many possible applications of the proposed test of randomness, especially of its multi-dimensional version: image processing, edge detection, remote sensing, spatial and temporal-spatial analysis of data arising in the earth sciences, ecology, soil science, hydrology, etc. For example, let us describe an algorithm for edge detection based on the proposed test. In image processing, the data are intensities of the grey level specified at each pixel, i.e., at the nodes of the two-dimensional square grid (image). An edge (discontinuity of a signal) can be defined as follows: the grey level is relatively consistent in each of the two adjacent extensive regions, and changes abruptly as the border between the two regions is crossed [19,20].

Consider an $N \times N$ window $B_N$ sliding over the image. For each position of the window one wants to make a decision: whether $\Gamma \cap B_N = \emptyset$ or not, where $\Gamma$ is an edge. If $\Gamma \cap B_N \ne \emptyset$, then $\Gamma$ divides $B_N$ into two sets $K_1$ and $K_2$, such that the values of the grey level in one set are stochastically larger than those in the other set, hence the hypothesis $H_2$ (or, more generally, $H_m$) takes place. If $\Gamma \cap B_N = \emptyset$, then the grey level is approximately constant inside $B_N$ and the hypothesis $H_0$ takes place. Thus, the choice between "$\Gamma \cap B_N = \emptyset$" ($H_0$) and "$\Gamma \cap B_N \ne \emptyset$" ($H_m$) can be made using the test presented in Section 2. If the hypothesis $H_m$ is accepted, the center of the current window is marked as an edge point. Repeating this process for each position of the window, one finds all edge points.

The numerical results of application of the above algorithm are illustrated by the following example. Figure 1 represents a synthetic image of square and circle step edges with the jump magnitude $D = 1.5$ specified at a square $101 \times 101$ grid. The image is corrupted by noise with the uniform distribution, zero mean and standard deviation $\sigma = 0.75$. The window size has been chosen $N = 7$, the probability of false alarm has been $\varepsilon = 0.01$. Figure 2 represents the image of detected edges of Figure 1.
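The sliding-window algorithm described above can be sketched in a few lines. The code below is a hedged illustration rather than the authors' program: all function names are hypothetical, the window statistic is (7) with $d = 2$ and $\beta_1 = \beta_2 = 1$, the threshold is obtained by a small Monte-Carlo run, and the demo uses a small $20 \times 20$ image with a single vertical step instead of the $101 \times 101$ image of Figure 1.

```python
import math
import random
from bisect import bisect_left

def nu_hat(window):
    """Statistic (7) for a square n x n window (d = 2, 8-neighbourhood)."""
    n = len(window)
    flat = sorted(v for row in window for v in row)
    # rank = #{values strictly smaller} + 1
    rank = [[bisect_left(flat, v) + 1 for v in row] for row in window]
    total, terms = 0, 0
    for i in range(n):
        for j in range(n):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    a, b = i + di, j + dj
                    if (di or dj) and 0 <= a < n and 0 <= b < n:
                        total += (rank[i][j] - rank[a][b]) ** 2
                        terms += 1  # terms plays the role of M_N
    return total / (terms * n ** 4)  # N^(-2d) with d = 2

def mc_threshold(n, eps, n_perm=500, seed=0):
    """Monte-Carlo estimate of A_0 for an n x n window under H_0."""
    rng = random.Random(seed)
    vals = list(range(n * n))
    sims = []
    for _ in range(n_perm):
        rng.shuffle(vals)
        sims.append(nu_hat([vals[r * n:(r + 1) * n] for r in range(n)]))
    sims.sort()
    return sims[int(eps * n_perm)]

def detect_edges(image, n, a0):
    """Mark window centers where H_0 is rejected (nu_hat < A_0)."""
    h, w, half = len(image), len(image[0]), n // 2
    edges = [[False] * w for _ in range(h)]
    for i in range(half, h - half):
        for j in range(half, w - half):
            win = [row[j - half:j + half + 1]
                   for row in image[i - half:i + half + 1]]
            edges[i][j] = nu_hat(win) < a0
    return edges

# Demo: vertical step of height D = 1.5 at column 10, uniform noise with
# zero mean and standard deviation 0.75 (half-width 0.75 * sqrt(3)).
rng = random.Random(1)
size, n, half_width = 20, 7, 0.75 * math.sqrt(3)
image = [[(1.5 if c >= size // 2 else 0.0) + rng.uniform(-half_width, half_width)
          for c in range(size)] for _ in range(size)]
a0 = mc_threshold(n, eps=0.01)
edges = detect_edges(image, n, a0)
```

One caveat of this sketch: in a perfectly constant region all values tie, every rank equals 1 and the statistic is 0, so such a window would be flagged. The theory above assumes continuous df's, under which ties occur with probability zero.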
Figure 1. A 101 x 101 synthetic image of square and circle step edges corrupted by noise.

Figure 2. Detected step edges of Figure 1.
REFERENCES

1. A. Wald and J. Wolfowitz, An exact test for randomness in the nonparametric case based on serial correlation, Annals of Mathematical Statistics 14, 378-388, (1943).
2. R.C. Geary, The contiguity ratio and statistical mapping, The Incorporated Statistician 5, 115-145, (1954).
3. A.D. Cliff and J.K. Ord, Spatial Processes: Models and Applications, Pion, London, (1981).
4. R.J. Aiyar, Asymptotic efficiency of rank tests of randomness against autocorrelation, Annals of the Institute of Statistical Mathematics 33, 255-262, (1981).
5. R.J. Aiyar, C.L. Gouillier and W. Albers, Asymptotic relative efficiencies of rank tests for trend alternatives, Journal of the American Statistical Association 74, 226-231, (1979).
6. M. Csörgő and L. Horváth, Nonparametric methods for changepoint problems, In Handbook of Statistics, Vol. 7. Quality Control and Reliability, (Edited by P.R. Krishnaiah and P.K. Sen), pp. 403-426, North-Holland, Amsterdam, (1988).
7. J. Hájek and Z. Šidák, Theory of Rank Tests, Academic Press, New York, (1967).
8. M. Kendall and J.K. Ord, Time Series, 3rd ed., Edward Arnold, UK, (1990).
9. P.R. Krishnaiah and B.Q. Miao, Review about estimation of change points, In Handbook of Statistics, Vol. 7. Quality Control and Reliability, (Edited by P.R. Krishnaiah and P.K. Sen), pp. 375-402, North-Holland, Amsterdam, (1988).
10. M. Kendall and A. Stuart, The Advanced Theory of Statistics, Vol. 2, 4th ed., Charles Griffin, London, (1979).
11. G.E. Noether, Asymptotic properties of the Wald-Wolfowitz test of randomness, Annals of Mathematical Statistics 21, 231-246, (1950).
12. G.K. Bhattacharyya, Tests of randomness against trend or serial correlation, In Handbook of Statistics, Vol. 4. Nonparametric Methods, (Edited by P.R. Krishnaiah and P.K. Sen), pp. 89-111, North-Holland, Amsterdam, (1984).
13. F. Lombard, Rank tests for changepoint problems, Biometrika 74, 615-624, (1987).
14. B.E. Brodsky and B.S. Darkhovsky, Nonparametric Methods in Change-Point Problems, Kluwer, Dordrecht, Netherlands, (1993).
15. A.I. Katsevich and A.G. Ramm, Consistency of rank test against multi-sample alternative: fixed and random design models, (submitted for publication).
16. A.I. Katsevich and A.G. Ramm, Consistency of rank test against trend in location, (submitted for publication).
17. J. Durbin and G.S. Watson, Testing for serial correlation in least squares regression I, Biometrika 37, 409-428, (1950).
18. A. Sard, The measure of critical values of differentiable maps, Bull. Amer. Math. Soc. 48, 883-890, (1942).
19. W.K. Pratt, Digital Image Processing, 2nd ed., Wiley, New York, (1991).
20. A. Rosenfeld and A.C. Kak, Digital Picture Processing, Vol. 2, 2nd ed., Academic Press, New York, (1982).