Journal of Statistical Planning and Inference 141 (2011) 2645–2655
Some partially sequential nonparametric tests for detecting linear trend
Amitava Mukherjee a,*, Uttam Bandyopadhyay b
a Department of Mathematics and System Analysis, Aalto University, Otakaari 1M, Room-Y335, 00076-Aalto, Finland
b Department of Statistics, University of Calcutta, Kolkata, India
Article info
Article history: Received 17 April 2010; received in revised form 9 February 2011; accepted 11 February 2011; available online 18 February 2011
Keywords: Partial sequential sampling; Sequential rank; Wilcoxon score; Distribution free; Expected sample size; Asymptotic power; Trend

Abstract
In the present study, we develop two nonparametric partially sequential tests for detecting the possible presence of a linear trend in an incoming series of observations. We assume that a sample of fixed size is available a priori from some unknown univariate continuous population and that there is no sign of trend among these historical observations. Our proposed tests can be viewed as sequential-type tests for monitoring structural changes. We use partial sequential sampling schemes based on usual ranks as well as on sequential ranks. We provide a detailed discussion of the asymptotic properties of the proposed tests. We compare the two tests under various situations and present some numerical results based on simulation studies. The proposed tests can be important for profit making in a volatile market through margin trading. We illustrate the mechanism with a detailed analysis of stock price data. © 2011 Elsevier B.V. All rights reserved.
1. Introduction

Let $X_1, X_2, \ldots, X_m, X_{m+1}, \ldots$ be a sequence of random variables (r.v.'s) having cumulative distribution functions (d.f.'s) $F_1(x), F_2(x), \ldots, F_m(x), F_{m+1}(x), \ldots$, respectively. We assume that the $F_i$'s are absolutely continuous for all $i = 1, 2, \ldots, m, m+1, \ldots$, but that the functional form of the $F_i$'s is not known. Suppose that, at the beginning of statistical monitoring, the r.v.'s $X_1, X_2, \ldots, X_m$ have already been observed. Let us further suppose that prior investigation reveals the following identity:

$$F_1(x) = F_2(x) = \cdots = F_m(x) = F_X(x). \tag{1.1}$$

The relation (1.1) indicates that a historical sample of size $m$ is observed from a certain unknown univariate absolutely continuous d.f. $F_X$. Such a historical sample is often available to practitioners, and therefore these assumptions are simple and plausible for many real-life problems. Our purpose is to monitor the incoming series of observations, namely $X_{m+1}, X_{m+2}, \ldots$, sequentially and to verify whether these also come from the same $F_X$ as in (1.1). That is, we want to monitor whether the following identity holds as well:

$$F_X(x) = F_{m+1}(x) = F_{m+2}(x) = \cdots \tag{1.2}$$
We also wish to detect whether there is an increasing trend of linear pattern among $X_{m+1}, X_{m+2}, \ldots$, in case the relation (1.2) does not hold. As a consequence of such a linear trend, if any, we can expect a gradual right shift in the location parameter among $F_{m+1}(x), F_{m+2}(x), \ldots$.

* Corresponding author. E-mail addresses: [email protected] (A. Mukherjee), [email protected] (U. Bandyopadhyay).
0378-3758/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2011.02.017
A. Mukherjee, U. Bandyopadhyay / Journal of Statistical Planning and Inference 141 (2011) 2645–2655
For $k = 1, 2, \ldots$, writing

$$Y_k = X_{m+k} \quad \text{and} \quad F_{m+k}(x) = F_X(x - k\delta),$$

for some $\delta > 0$, we can put forward the entire monitoring problem in the form of a classical two-sample testing problem. Note that the problem boils down to testing the null hypothesis

$$H_0 : F_{m+k}(x) = F_X(x), \quad k = 1, 2, \ldots, \tag{1.3}$$

against the alternative

$$H_1 : F_{m+k}(x) = F_X(x - k\delta) \ (\delta > 0), \quad k = 1, 2, \ldots \tag{1.4}$$
Different research problems related to tests against trend-in-location alternatives were addressed earlier by a host of researchers. In the middle of the last century, Brown and Mood (1951) as well as Cox and Stuart (1955) highlighted such problems. Testing against trend in location is also discussed briefly in the book by Hájek et al. (1999). The importance of this problem is well known among present-day researchers. Records tests for trend in location are given by Diersen and Trenkler (1996). A trend-in-location problem is also considered by Haldeman et al. (1993) in order to characterize the microbiology within a section of rock from the deep subsurface. Aiyer et al. (1979) discuss the asymptotic relative efficiency of rank tests for trend alternatives. The present work is intended to bridge the gap between trend-in-location problems and partially sequential procedures.

Partial sequential tests for equality of two distributions were introduced by Wolfe (1977) and by Orban and Wolfe (1980). They used some usual rank procedures. Various properties of such partial sequential procedures were discussed by Orban and Wolfe (1978, 1982a, 1982b), Randles and Wolfe (1979), Chatterjee and Bandyopadhyay (1984), Bandyopadhyay and Mukherjee (2007), Bandyopadhyay et al. (2007, 2008a, 2008b), Mukherjee (2009, 2010), Mukherjee (to appear) and Mukherjee and Purkait (2011). Chu et al. (1996) emphasized the need for rigorous investigation of partially sequential methodologies in connection with structural monitoring problems in the econometric context. Bandyopadhyay and Mukherjee (2007) addressed the problem of detecting an unknown change point using partially sequential sampling schemes. That is also a typical structural monitoring problem with reference to a nonparametric location problem. They considered the prior availability of a training sample or historical data from the control set-up.
Note that this paper has virtually nothing in common with traditional change-point problems. Nevertheless, it is worthwhile to mention that we use some results from change-point detection problems, especially those of Bandyopadhyay and Mukherjee (2007), for deriving the new asymptotic results and theorems presented in this paper. In the present work, we develop two tests for trend in location, one based on sequential ranks and the other on usual ranks. Bandyopadhyay and Mukherjee (2007) and Bandyopadhyay et al. (2008a) also considered both these approaches for comparative study, but in different testing problems. We also provide a detailed comparison between the two competing procedures and discuss various situations in which these tests for detecting trend in location are appropriate. For a better understanding of the proposed tests, and of partially sequential methods as a whole, interested readers may also look at Bandyopadhyay et al. (2008a). That work, along with the following references, gives a good idea of how partially sequential tests can be viewed as a tool for statistical monitoring of an incoming series of observations in different contexts. Bandyopadhyay et al. (2008a) developed two tests, based on usual rank as well as sequential rank procedures, for monitoring structural changes with the aim of controlling the type I error rate. They propose nonparametric tests for fluctuation monitoring which, in the simplest situation, is basically a two-sample location problem. They illustrate their procedures with data on the daily change of an Indian stock price index. Bandyopadhyay et al. (2007) introduced partially sequential tests for the homogeneity of a finite number of populations against ordered alternatives in monitoring arsenic contamination in ground water.
Mukherjee (2010) also discussed a stochastic approach to controlling the type I error rate in nonparametric monitoring of structural changes by modifying partially sequential procedures. Mukherjee (2009) introduced a two-phase procedure for structural monitoring using similar partially sequential sampling schemes. However, partially sequential procedures are so far not available for testing the possible presence of gradual changes (or a trend) in location instead of a single or multiple change point(s).

The importance of fluctuation monitoring, or monitoring of structural changes, is well known in the present era. It is useful in different branches of science and commerce, including engineering and market research. Sequential methodologies, or rather time-sequential procedures, may be very effective in solving such monitoring problems. Chu et al. (1996) and Horváth et al. (2004), among others, noted that such problems fall under two distinct categories: one-shot testing, that is, studying changes or structural breakdown within a fixed historical sample, and on-line sequential monitoring when data arrive steadily. Along with Chu et al. (1996) and Horváth et al. (2004), we further refer interested readers to Aue et al. (2006), Basseville and Nikiforov (1993), Brodsky and Darkhovsky (1993, 2000), Csörgő and Horváth (1997), Hušková et al. (2009) and Hušková and Chochola (2010). In fact, a large body of literature is available in this area. For sequential monitoring with training samples or historical data, we refer in addition to Joseph and Pantelis (2002), Jinqiang et al. (2007), Xiong et al. (2007) and Chochola (2008). For tests based on parameter fluctuation one may see Krämer et al. (1988) and Ploberger et al. (1989) along with Chu et al. (1996).
1.1. Scope of applications

The procedures discussed in this paper have a wide scope of application. They are particularly applicable to detecting structural changes, especially in typical market research or stock price research problems.

1.1.1. In market research experiment

Bandyopadhyay and Mukherjee (2007) outlined how inferential problems in market survey experiments can be attacked using partial sequential sampling schemes. The proposed techniques are also useful in testing the null hypothesis of equality of the mean level of consumer satisfaction for some item, measured on a continuous scale. Suppose we already have data, collected at some time point, on consumer satisfaction with an item. It is well known that, in a competitive environment, a company generally works continuously to improve the quality of its brand. This may result in an increasing mean level of consumer satisfaction as time progresses. Such an assertion can be verified using the proposed procedures.

1.1.2. In typical stock price analysis

Consider typical web-based margin trading of stocks, mentioned in Bandyopadhyay et al. (2008a), where both the purchase and the sell-off take place on the same trading day. The popularity of such trading is ever increasing among small-scale investors, mostly among those who economically belong to the middle class. It is likely to gain more momentum in the volatile market conditions that followed the recent economic recession. The key advantage of margin trading is the quick rolling of money: investors do not have to block capital for a long time. In the long run, predictive models often fail in stock markets all over the world, and abnormal volatility plays a major role. In contrast, volatility often makes margin trading more beneficial. For any company's share, let $l_t$ and $h_t$ respectively denote the lowest and the highest price attained on the $t$-th day. The interval $[l_t, h_t]$ may then be referred to as the price band of the $t$-th day. Generally, rational investors try to guess the lower bound of the price band and place a purchase order at that price. Similarly, they place a sell order at a price that they predict to be the upper bound of the price band on that day. Hence, for that stock, a simple indicator of the profit prospect on the $t$-th day may be taken to be

$$P_t = (h_t - l_t)/l_t.$$

Obviously, this figure is hypothetical at the time of making an investment on the $t$-th day. However, at that point investors can look at the previous figures $P_{t-1}, P_{t-2}, \ldots$ to make some inference. Substantial gain may be made through trading if the $P_t$ values exceed 0.2 for some stock. Further, it may be seen that there exists $m$ such that $P_{t-i}$, $i = 1, 2, \ldots, m$, are independent even though they are essentially time-sequential data. We see this using ordinary run tests on the stock prices of various companies during volatile market conditions.

Suppose that, at some point of time, we want to build an investment portfolio for margin trading of a particular company's share. We develop a model-free guideline for consultants. Let us identify $X_{m+1-i} = P_{t-i}$, $i = 1, 2, \ldots, m$, available at the beginning of monitoring, as the first sample observations. These values, though time sequential in nature, are found to be independent in the short run, which can be checked using run tests for independence. Thus we may assume that they form a random sample from an unknown continuous distribution function $F$. We search for a trend in location with positive slope among $P_t, P_{t+1}, \ldots$ by defining $Y_k = P_{t+k-1}$, $k = 1, 2, \ldots$. The presence of an increasing trend in the location of the $P_t$'s signifies that the corresponding company's share scrips become more favorable for margin trading. The problem may be looked upon as a two-sample testing problem in the light of our introduction.
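As a rough illustration of the profit indicator described above, the following Python sketch computes $P_t = (h_t - l_t)/l_t$ from daily low and high prices and splits the resulting series into the historical sample and the incoming observations. The function names `profit_index` and `split_samples` and the price figures are hypothetical, chosen only for illustration; they do not come from the paper.

```python
def profit_index(lows, highs):
    """Return P_t = (h_t - l_t) / l_t for each trading day."""
    if len(lows) != len(highs):
        raise ValueError("lows and highs must have the same length")
    return [(h - l) / l for l, h in zip(lows, highs)]


def split_samples(p, m):
    """Split a chronological P-series into (X-sample, Y-sequence).

    The first m values play the role of X_1, ..., X_m (historical data);
    the remaining values play the role of Y_1, Y_2, ... (monitored data).
    """
    return p[:m], p[m:]


if __name__ == "__main__":
    # Hypothetical daily low/high prices for five trading days.
    lows = [100.0, 50.0, 80.0, 120.0, 90.0]
    highs = [105.0, 60.0, 82.0, 126.0, 99.0]
    p = profit_index(lows, highs)
    x, y = split_samples(p, m=3)
    print("historical X:", x)
    print("incoming   Y:", y)
```

In this indexing the list `x` is in chronological order, so its last element corresponds to $P_{t-1}$ and the first element of `y` to $P_t$.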
Obviously, the delay time to decision is an important issue, and we should try to detect the presence of a trend early. Therefore, we need to use sequential methodologies. Moreover, we need to decide how long we can wait while drawing second sample observations. We address all these issues in the present context.

The rest of this work is organized as follows. Section 2 describes the proposed test procedures. In Section 3 the asymptotic null distributions of the proposed test statistics are derived. Some asymptotic properties of the proposed tests are discussed in Section 4. Section 5 presents the results of simulation studies. Section 6 contains a brief report of a data study and Section 7 concludes. Some technical details are given in the Appendix.

2. Description of the proposed test procedures

Let $\mathbf{X}_m = (X_1, X_2, \ldots, X_m)$ be an initial random sample of fixed size $m$ corresponding to the random variable $X$. Let $Y_1, Y_2, \ldots, Y_n, \ldots$ be sequentially observed $Y$-variables. Here an appropriate nonparametric model relevant to the discussion of the previous section is

$$X \sim F \quad \text{and} \quad Y_k \sim F(x - \delta k), \quad \text{for } k = 1, 2, \ldots, \tag{2.1}$$

where $F$ is an unknown continuous distribution function (d.f.) and $\delta$ is the slope of the possible linear trend in location. Obviously $\delta > 0$ or $\delta < 0$ according as there is an increasing or a decreasing trend in the $Y$-observations. Our object in the present investigation is
to test for an increasing trend as in (1.3) and (1.4). That can be expressed in a further simplified form as

$$H_0 : \delta = 0 \quad \text{against} \quad H_1 : \delta > 0. \tag{2.2}$$
We, as in Orban and Wolfe (1980), use a partial sequential procedure based on an inverse sampling scheme for drawing $Y$-observations. For the Wilcoxon score, defining $F_m(\cdot)$ as the empirical d.f. based on $\mathbf{X}_m$ and taking $r$ as a prefixed positive number, such a scheme may be described by the stopping variable

$$M = \min\left\{ n \ge n_0 : \sum_{k=1}^{n} k F_m(Y_k) \ge \frac{(n+1) r}{4} \right\}, \tag{2.3}$$

where $n_0 \ge 1$ is the desired minimum number of second sample observations to be used. Thus, corresponding to a $Y$-observation drawn at the $k$-th stage, the score is $(i-1)k/m$ if the observation belongs to the $i$-th sample block of $\mathbf{X}_m$, $i = 1, 2, \ldots, m$. Here the $i$-th sample block of $\mathbf{X}_m$ is the random interval $(X_{(i-1)}, X_{(i)})$ between consecutive ordered $X$-values, with $X_{(0)} = -\infty$, and the score of the block $[X_{(m)}, \infty)$ is $k$.

Recently, under the partial sequential framework, Bandyopadhyay and Mukherjee (2007) introduced a sequential rank based stopping rule for testing the identity of two unknown univariate continuous d.f.'s against a one-sided shift in location occurring at an unknown time point. Unlike the usual rank based procedure, the sequential rank based procedure makes it possible to update the comparison groups at each stage of the sampling process. In such a procedure, the number of sample blocks varies from draw to draw, while for the usual rank based procedure described by $M$ the number of sample blocks is fixed. Here we may also use such a sequential rank based stopping variable, with the necessary modification, as follows:

$$N = \min\left\{ n \ge n_0 : \sum_{k=1}^{n} k H_{m+k-1}(Y_k) \ge \frac{(n+1) r}{4} \right\}, \tag{2.4}$$

where $r$ and $n_0$ are as before and $H_{m+k-1}(\cdot)$ is the empirical d.f. based on $\mathbf{X}_m$ and $(Y_1, Y_2, \ldots, Y_{k-1})$, $k \ge 1$. The justification for using a common value of $r$ is addressed in Remark 4.1. Here, corresponding to an observation at the $k$-th stage, the score is $(i-1)k/(m+k-1)$, $i = 1, 2, \ldots, (m+k)$. Now, it is easy to see that, if $G_k(x)$ is the empirical d.f. based on $Y_1, Y_2, \ldots, Y_{k-1}$,

$$H_{m+k-1}(x) = w_k F_m(x) + (1 - w_k) G_k(x)$$

for any $k \ge 1$, where $w_k = m/(m+k-1)$. Further, it is known that usual ranks and sequential ranks are distribution free under $H_0$.
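The two stopping rules (2.3) and (2.4) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names `usual_rank_stop` and `sequential_rank_stop` are hypothetical, the empirical d.f. is taken as the proportion of reference values less than or equal to the new observation, and `None` is returned when the boundary is not crossed within the supplied observations.

```python
from bisect import bisect_right, insort


def usual_rank_stop(x, ys, r, n0=1):
    """Stopping variable M of (2.3), based on usual ranks.

    Stops at the first n >= n0 with sum_{k<=n} k * F_m(Y_k) >= (n + 1) * r / 4,
    where F_m is the empirical d.f. of the fixed historical sample x.
    """
    xs = sorted(x)
    m = len(xs)
    s = 0.0
    for n, y in enumerate(ys, start=1):
        s += n * bisect_right(xs, y) / m           # k * F_m(Y_k)
        if n >= n0 and s >= (n + 1) * r / 4:
            return n
    return None  # boundary not crossed with the data supplied


def sequential_rank_stop(x, ys, r, n0=1):
    """Stopping variable N of (2.4), based on sequential ranks.

    H_{m+k-1} is the empirical d.f. of x pooled with Y_1, ..., Y_{k-1},
    so the reference set grows by one value after every draw.
    """
    pooled = sorted(x)
    s = 0.0
    for n, y in enumerate(ys, start=1):
        s += n * bisect_right(pooled, y) / len(pooled)   # k * H_{m+k-1}(Y_k)
        if n >= n0 and s >= (n + 1) * r / 4:
            return n
        insort(pooled, y)                                # update comparison group
    return None


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]          # hypothetical historical sample, m = 4
    ys = [5.0, 6.0, 7.0, 8.0]         # hypothetical incoming observations
    print(usual_rank_stop(x, ys, r=4), sequential_rank_stop(x, ys, r=4))
```

Keeping the pooled reference values in a sorted list lets each sequential rank be found by binary search, although `insort` still costs linear time per insertion; for long monitoring runs a balanced tree or similar structure might be preferable.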
Also, for any $k$, we have, under $H_0$,

$$E[H_{m+k-1}(Y)] = E[F_m(Y)] = \frac{1}{2} \tag{2.5}$$

and, under any $\delta > 0$,

$$E[H_{m+k-1}(Y)] > \frac{1}{2}. \tag{2.6}$$

Hence, from (2.4), it is expected that, for a given $r$, $N$ under the alternative $H_1 : \delta > 0$ is smaller than $N$ under $H_0$. Thus, a lower tail test based on $N$ would be an appropriate choice. That means $H_0$ is rejected iff

$$N < N_\alpha, \tag{2.7}$$
where $N_\alpha$ is the lower $100\alpha\%$ point of the null distribution of $N$. The development for a test based on $M$ is similar; here we use $M_\alpha$ in place of $N_\alpha$.

Remark 2.1. In practice, we do not need to sample more than $N_\alpha$ (or $M_\alpha$) observations from the $Y$-population to make a valid decision regarding the acceptance or rejection of $H_0$. For the $N$-test we reject $H_0$ as soon as the partial sum, say $S_n = \sum_{k=1}^{n} k H_{m+k-1}(Y_k)$, reaches or exceeds $(n+1)r/4$ for some $n < N_\alpha$. On the other hand, if $S_{N_\alpha} < (N_\alpha + 1) r/4$, we can say that the observed $N$ will exceed $N_\alpha$ and accept $H_0$ straightaway, without further sampling. A similar development is possible for the $M$-test as well. Thus such a procedure may be looked upon as a truncated sequential-type procedure.

Remark 2.2. From Remark 2.1, we can conclude that the maximum delay time in decision will be $N_\alpha$ for the $N$-test and $M_\alpha$ for the $M$-test. Similarly, $\alpha$ may be referred to as the overall false alarm rate throughout the monitoring. Therefore, we can draw an analogy between the traditional testing problem and sequential monitoring of an incoming series.

3. Asymptotic null distributions

This section is devoted to deriving the limiting null distributions of $N$ and $M$. To obtain them we make one of the traditional assumptions: for each $m$, there exists a positive number $r$ depending on $m$ such that, as $m \to \infty$,

$$r \to \infty \quad \text{but} \quad \frac{r}{m} \to \lambda \in (0, \infty). \tag{3.1}$$
Let us write

$$S_n = \sum_{k=1}^{n} k H_{m+k-1}(Y_k) = \sum_{k=1}^{n} w_k\, k F_m(Y_k) + \sum_{k=1}^{n} (1 - w_k)\, k G_k(Y_k). \tag{3.2}$$

Now, to obtain the null distribution of $N$, we use the following representation:

$$\frac{N - r}{2\sqrt{r}} = \frac{2}{\sqrt{r}(N+1)}\left(S_N - \frac{r(N+1)}{4}\right) - \frac{2}{\sqrt{r}(N+1)}\left(S_N - \frac{N(N+1)}{4}\right). \tag{3.3}$$
Result 3.1. $\dfrac{2}{\sqrt{r}(N+1)}\left(S_N - \dfrac{r(N+1)}{4}\right)$ converges almost surely to 0 as $m \to \infty$.

Proof. From (2.4), we get, as $m \to \infty$,

$$0 \le \frac{2}{\sqrt{r}(N+1)}\left(S_N - \frac{r(N+1)}{4}\right) < \frac{2N}{N+1} \cdot \frac{1}{\sqrt{r}}\, H_{m+N-1}(Y_N) \le \frac{2}{\sqrt{r}}\, H_{m+N-1}(Y_N),$$

which tends to 0 with probability 1. Hence the result is proved. □

From the above result and (3.3), we see that

$$\frac{r - N}{2\sqrt{r}} \sim \frac{2}{\sqrt{r}(N+1)}\left(S_N - \frac{N(N+1)}{4}\right), \tag{3.4}$$

where $X_n \sim Y_n$ means that the two random variables have the same asymptotic distribution.

Result 3.2. Let $\nu$ be the largest integer contained in $r$. Then, as $m \to \infty$, the difference

$$\frac{2}{\sqrt{r}(N+1)}\left(S_N - \frac{N(N+1)}{4}\right) - \frac{2}{\sqrt{\nu}(\nu+1)}\left(S_\nu - \frac{\nu(\nu+1)}{4}\right)$$

converges in probability to 0.

The result is an extension of Result 4.2 of Bandyopadhyay and Mukherjee (2007). The proof follows using similar techniques and hence is omitted.

Note. Results 3.1 and 3.2 will also hold if $N$ is replaced by $M$.

Theorem 3.1. Under $H_0$, as $m \to \infty$, $\dfrac{2}{\sqrt{r}(N+1)}\left(S_N - \dfrac{N(N+1)}{4}\right)$ converges in distribution to a r.v. having a normal distribution with mean 0 and variance $\sigma^2(\lambda)$, where

$$\sigma^2(\lambda) = \frac{[\lambda - \ln(1+\lambda)]^2}{3\lambda^3} + \frac{1}{9}.$$
The proof is given in the Appendix. Using (3.4) and the above theorem, the asymptotic null distribution of $N$, for large $m$, can be approximated by a normal distribution with mean $r$ and variance $4r\sigma^2(r/m)$. Hence, the test given by (2.7) can be approximated by: Reject $H_0$ approximately at the level $\alpha$ iff

$$N < r - 2\tau_\alpha \sqrt{r}\, \sigma\!\left(\frac{r}{m}\right),$$

where $\tau_\alpha$ is the upper $\alpha$-percentile point of a standard normal distribution.

Theorem 3.2. Under $H_0$, as $m \to \infty$, $\dfrac{1}{2\sqrt{r}}(M - r)$ converges in distribution to a r.v. having a normal distribution with mean 0 and variance $\sigma^{*2}(\lambda)$, where

$$\sigma^{*2}(\lambda) = \frac{1}{12}\left(\lambda + \frac{4}{3}\right).$$

The proof is almost the same as that of Theorem 3.1. Thus the $M$-test can be approximated by: Reject $H_0$ approximately at the level $\alpha$ iff

$$M < r - 2\tau_\alpha \sqrt{r}\, \sigma^{*}\!\left(\frac{r}{m}\right).$$

Note that as $\lambda \to 0$, both $\sigma^2(\lambda)$ and $\sigma^{*2}(\lambda)$ tend to $\frac{1}{9}$. That is, as $\lambda \to 0$, the limiting distributions of $M$ and $N$ are the same.

4. Asymptotic performance of the tests

4.1. Consistency

We can rewrite $S_n$ as

$$S_n = \sum_{k=1}^{n} k w_k \frac{1}{m} \sum_{i=1}^{m} \chi(X_i - Y_k) + \sum_{k=2}^{n} k (1 - w_k) \frac{1}{k-1} \sum_{j=1}^{k-1} \chi(Y_j - Y_k),$$
where $\chi(u) = 1$ or $0$ according as $u \le 0$ or $u > 0$. Now if we make the transformation

$$Y_j = X_j' + \delta j, \quad j = 1, 2, \ldots,$$

where $X_j' \sim F$, $j = 1, 2, \ldots$, independently, and are independent of the $X_i$'s, we get that the random variable $S_n$ has the same distribution as that of

$$S_n' = \sum_{k=1}^{n} k w_k \frac{1}{m} \sum_{i=1}^{m} \chi(X_i - X_k' - \delta k) + \sum_{k=2}^{n} k (1 - w_k) \frac{1}{k-1} \sum_{j=1}^{k-1} \chi\big(X_j' - X_k' - \delta(k - j)\big),$$

which, as $k \ge j \ge 1$ and for $\delta \ge 0$,

$$\ge \sum_{k=1}^{n} k w_k \frac{1}{m} \sum_{i=1}^{m} \chi(X_i - X_k' - \delta) + \sum_{k=2}^{n} k (1 - w_k) \frac{1}{k-1} \sum_{j=1}^{k-1} \chi(X_j' - X_k')$$

$$\Rightarrow \liminf_{n} \frac{2 S_n}{n(n+1)} \ge p \lim \frac{2}{n(n+1)} \sum_{k=1}^{n} k w_k + \lim \frac{1}{n(n+1)} \sum_{k=1}^{n} k (1 - w_k),$$

where $p = \int F(x + \delta)\, dF(x)$. Since $p > \frac{1}{2}$ for $\delta > 0$, with probability 1,

$$\liminf_{m} \frac{S_n}{n(n+1)} > \frac{1}{4}.$$

Hence, using (A.1) we get, under any $F$ and $\delta > 0$, with probability 1,

$$\liminf_{m} \frac{r}{N} > 1,$$

which implies that the lower tail test based on $N$ is consistent for testing $H_0$ against any fixed $\delta > 0$. By the same argument, the lower tail test based on $M$ is consistent for testing $H_0$ against any fixed $\delta > 0$.

4.2. Asymptotic power

As both the $N$ and $M$ tests are consistent, we are unable to compare these tests asymptotically under a fixed $(F, \delta)$, $\delta > 0$. One possible way of comparing them is to derive their asymptotic powers under a sequence of local alternatives. Let us define such a sequence precisely. Let $\{H_m\}$ be a sequence of local alternative hypotheses such that

$$Y_i \sim G_i(x) = F(x - \delta_m i) \quad \text{for } i \ge 1,$$

where $F$ has the density $f(x) = F'(x)$ at all real $x$, and $\delta_m > 0$ for each $m$ with $\delta_m \to 0$ but $r^{3/2} \delta_m \to \delta\ (> 0)$ as $m \to \infty$. Now we consider the following theorems.

Theorem 4.1. Under $\{H_m\}$, as $m \to \infty$, $\dfrac{1}{2\sqrt{r}}(N - r)$ converges in distribution to a r.v. having a normal distribution with mean $-\mu$, depending on $\delta$, and variance $\sigma^2(\lambda)$, where

$$\mu = \delta \left[ \frac{1}{6} + \frac{1}{4\lambda} - \frac{1}{2\lambda^2} + \frac{\ln(1+\lambda)}{2\lambda^3} \right].$$

Proof. It follows from a simple combination of Results A.3 and A.5 together with the technique of the proof of Theorem 3.1. □

Theorem 4.2. Under $\{H_m\}$, as $m \to \infty$, $\dfrac{1}{2\sqrt{r}}(M - r)$ converges in distribution to a r.v. having a normal distribution with mean $-\mu^{*}$, depending on $\delta$, and variance $\sigma^{*2}(\lambda)$, where
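$$\mu^{*} = \delta/3.$$

From Theorem 4.1, the asymptotic power of the test based on $N$ is given by

$$P(\delta) = \lim_{m \to \infty} \Pr\left\{ \frac{1}{2\sqrt{r}}(N - r) < \frac{N_\alpha - r}{2\sqrt{r}} \,\Big|\, H_m \right\} = \Phi\big(-\tau_\alpha + \delta \zeta(\lambda)\big),$$

where $\Phi(\cdot)$ denotes the d.f. of a standard normal r.v. and

$$\zeta(\lambda) = \frac{\dfrac{1}{6} + \dfrac{1}{4\lambda} - \dfrac{1}{2\lambda^2} + \dfrac{\ln(1+\lambda)}{2\lambda^3}}{\sigma(\lambda)}. \tag{4.1}$$

Similarly, using Theorem 4.2, the asymptotic power of the test based on $M$ is given by

$$P^{*}(\delta) = \lim_{m \to \infty} \Pr\left\{ \frac{1}{2\sqrt{r}}(M - r) < \frac{M_\alpha - r}{2\sqrt{r}} \,\Big|\, H_m \right\} = \Phi\big(-\tau_\alpha + \delta \zeta^{*}(\lambda)\big),$$

where

$$\zeta^{*}(\lambda) = \sqrt{\frac{4}{3\lambda + 4}}. \tag{4.2}$$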
Thus the $N$-test performs better than the $M$-test whenever

$$\zeta(\lambda) > \zeta^{*}(\lambda). \tag{4.3}$$
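The comparison in (4.3) can be explored numerically. The Python sketch below evaluates $\zeta(\lambda)$ and $\zeta^{*}(\lambda)$ and locates the point where the two curves cross by bisection. It assumes the following reading of the formulas in Theorems 3.1, 3.2, 4.1 and 4.2: $\sigma^2(\lambda) = [\lambda - \ln(1+\lambda)]^2/(3\lambda^3) + 1/9$, $\sigma^{*2}(\lambda) = (\lambda + 4/3)/12$, $\zeta(\lambda) = [1/6 + 1/(4\lambda) - 1/(2\lambda^2) + \ln(1+\lambda)/(2\lambda^3)]/\sigma(\lambda)$ and $\zeta^{*}(\lambda) = (1/3)/\sigma^{*}(\lambda)$. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def sigma2_seq(lam):
    """Asymptotic variance sigma^2(lambda) for the N-test (as read from Theorem 3.1)."""
    return (lam - math.log1p(lam)) ** 2 / (3 * lam ** 3) + 1 / 9


def sigma2_usual(lam):
    """Asymptotic variance sigma*^2(lambda) for the M-test (as read from Theorem 3.2)."""
    return (lam + 4 / 3) / 12


def zeta_seq(lam):
    """Efficacy zeta(lambda) of the N-test, as read from (4.1)."""
    mu_coef = 1 / 6 + 1 / (4 * lam) - 1 / (2 * lam ** 2) + math.log1p(lam) / (2 * lam ** 3)
    return mu_coef / math.sqrt(sigma2_seq(lam))


def zeta_usual(lam):
    """Efficacy zeta*(lambda) of the M-test: mu*/(delta * sigma*) = (1/3)/sigma*."""
    return (1 / 3) / math.sqrt(sigma2_usual(lam))


def asymptotic_power(delta, zeta, alpha=0.05):
    """Phi(-tau_alpha + delta * zeta), with tau_alpha the upper alpha point of N(0, 1)."""
    tau = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(-tau + delta * zeta)


def crossover(lo=2.0, hi=3.0, tol=1e-8):
    """Bisection for the lambda in [lo, hi] at which zeta(lambda) = zeta*(lambda)."""
    diff = lambda lam: zeta_seq(lam) - zeta_usual(lam)
    if diff(lo) * diff(hi) > 0:
        raise ValueError("no sign change on the given interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if diff(lo) * diff(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for lam in (0.5, 1.0, 2.0, 3.0):
        print(lam, round(zeta_seq(lam), 4), round(zeta_usual(lam), 4))
    print("crossover near lambda =", round(crossover(), 3))
```

Under this reading, `zeta_seq` is below `zeta_usual` at $\lambda = 2$ and above it at $\lambda = 3$, so the sign change bracketed by the bisection lies between those two values.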
Now it is easy to see that, whatever the value of $\delta$, both tests have equal power as $\lambda \downarrow 0$. For any $\lambda > 0$, computing $\zeta(\lambda)$ and $\zeta^{*}(\lambda)$, we see that the relation (4.3) holds when $\lambda$ exceeds 2.33. However, we shall see from the simulation study in the next section that, under a fixed alternative, the $N$-test returns higher power for much smaller $\lambda$.

Remark 4.1. From various results of the previous sections, it is clear that, under both $H_0$ and $H_m$, $N \sim M$ in probability. Hence $r$ may be interpreted as the common asymptotic value of $M$ and $N$ under both $H_0$ and $H_m$. This justifies the common choice of $r$ for both (2.3) and (2.4).

Remark 4.2. The proposed test procedures based on (2.3) or (2.4) depend on $r$ and $n_0$, which have to be chosen in advance. In particular, the choice of $r$ strongly influences the quality of the test. This feature is not a drawback of the proposed tests; rather, it is an advantage. If samples are obtained through costly trials, one can set $r$ depending on the available monetary resources. Note that $N_\alpha$ (or $M_\alpha$) is a monotone function of $r$. Therefore, given a specified amount of money, the statistician can compute the maximum possible value of $N_\alpha$ or $M_\alpha$ for a given $\alpha$ and evaluate $r$. On the other hand, when cost is not an important factor, the statistician needs to assess the maximum acceptable delay time in decision. In stock price monitoring, it may be one week, one month, one quarter, etc. Again, as $N_\alpha$ (or $M_\alpha$) is the respective maximum delay time in decision, once this has been assessed we can evaluate $r$ easily. Finally, $n_0$ is the desired minimum delay time in decision and can be taken as any positive integer less than $N_\alpha$ (or $M_\alpha$). As a rule of thumb, one can take $n_0$ to be an integer around one-third or one-fourth of $N_\alpha$ (or $M_\alpha$).

5. Simulation studies

In general, the exact distributions of the sequential rank statistic ($N$) and of the usual rank statistic ($M$) will be complicated even under $H_0$. So it will not be worthwhile to attempt to derive them explicitly.
However, using Monte Carlo simulation, we may easily study the exact behavior of the proposed procedures. We consider both $N$ and $M$ and carry out simulation studies to determine the corresponding cut-off points at a prefixed level of significance. Here we fix the level at 0.05 and obtain the average numbers of second sample observations required to carry out the tests, together with the corresponding standard deviations. Throughout the present investigation, we use the R software for the simulation study. We generate the data from a standard normal distribution, as the null distributions of both $M$ and $N$ do not depend on the parent population. Moreover, we use 50 000 replicates of the Monte Carlo experiment. Our observations, for some typical choices of $m$ and $\lambda$, are presented in Tables 1 and 2.

From Tables 1 and 2 we see that, for any given $m$ and $\lambda$, the $N$-test requires almost the same expected second sample size as the $M$-test when the null hypothesis is true. This is to be expected because asymptotically $N/r \sim M/r$. But there is a marked difference in the variability of these variables: for any given $m$ and $\lambda$, the variance of $N$ is much lower than that of $M$. This feature also confirms our findings in the asymptotic studies. Our numerical findings indicate that $N/r$ has a slightly faster rate of convergence to unity.

Now, for the same combinations of $m$ and $\lambda$ used in Tables 1 and 2, we calculate the powers of the two tests. These are presented in Table 3. Here we generate the first sample observations from the standard normal population, and the $k$-th second sample observation is generated from a normal population with mean $\delta k$ and unit variance. Moreover, we compute the powers based on the randomized test of size 0.05 to facilitate the comparison. A power study with $Y$-observations taken from other distributions gives more or less similar findings, but to save space we do not include those results here.
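A Monte Carlo study of this kind can be sketched as follows. The paper reports using R; the Python code below is only an illustrative re-implementation under stated assumptions: `n0 = 1`, far fewer replicates than the 50 000 used in the paper, an empirical d.f. defined as the proportion of reference values not exceeding the new observation, and a simple order-statistic convention for the lower 5% point (the paper's exact convention and its randomization step are not reproduced). All function names are hypothetical.

```python
import random
from bisect import bisect_right, insort
from statistics import mean, stdev


def stopping_times(m, r, rng, delta=0.0, n0=1, max_n=100_000):
    """Draw one (M, N) pair from a common data set.

    X_1..X_m are N(0, 1); Y_k is N(delta * k, 1), so delta = 0 gives H0.
    """
    x = sorted(rng.gauss(0.0, 1.0) for _ in range(m))
    pooled = list(x)                  # X-sample plus the Y's seen so far
    s_usual = s_seq = 0.0
    big_m = big_n = None
    n = 0
    while (big_m is None or big_n is None) and n < max_n:
        n += 1
        y = rng.gauss(delta * n, 1.0)
        bound = (n + 1) * r / 4
        if big_m is None:
            s_usual += n * bisect_right(x, y) / m                # k * F_m(Y_k)
            if n >= n0 and s_usual >= bound:
                big_m = n
        if big_n is None:
            s_seq += n * bisect_right(pooled, y) / len(pooled)   # k * H_{m+k-1}(Y_k)
            if n >= n0 and s_seq >= bound:
                big_n = n
        insort(pooled, y)
    # max_n is only a safeguard against a boundary that is never crossed.
    return big_m or max_n, big_n or max_n


def simulate(m, lam, reps=2000, seed=1, delta=0.0):
    """Return lists of simulated M and N values for r = lam * m."""
    rng = random.Random(seed)
    draws = [stopping_times(m, lam * m, rng, delta) for _ in range(reps)]
    return [d[0] for d in draws], [d[1] for d in draws]


def lower_point(values, alpha=0.05):
    """Empirical lower 100*alpha% point (one simple convention among several)."""
    return sorted(values)[int(alpha * len(values))]


if __name__ == "__main__":
    ms, ns = simulate(m=10, lam=1.0)
    for name, v in (("M", ms), ("N", ns)):
        print(name, round(mean(v), 2), round(stdev(v), 2), lower_point(v))
```

Passing a positive `delta` to `simulate` generates data under the linear-trend alternative, so the same sketch could be extended to estimate power once a cut-off has been fixed.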
Table 1
Simulation results on null distributions of M.

m     Statistic   λ = 0.5   λ = 1.0   λ = 1.5   λ = 2.0
10    Cut off     3         7         11        15
      E(M)        5.76      10.92     16.11     21.24
      S.D.(M)     2.01      3.21      4.37      5.40
25    Cut off     9         19        30        40
      E(M)        13.24     25.91     38.55     51.21
      S.D.(M)     2.88      4.64      6.31      7.87
50    Cut off     20        41        63        85
      E(M)        25.72     50.90     76.11     101.20
      S.D.(M)     3.98      6.35      8.64      10.77
75    Cut off     31        64        98        131
      E(M)        38.20     75.90     113.49    151.17
      S.D.(M)     4.85      7.78      10.50     13.16
Table 2
Simulation results on null distributions of N.

m     Statistic   λ = 0.5   λ = 1.0   λ = 1.5   λ = 2.0
10    Cut off     4         7         12        16
      E(N)        5.62      10.57     15.54     20.56
      S.D.(N)     1.56      2.16      2.64      3.05
25    Cut off     10        20        32        43
      E(N)        13.07     25.53     38.02     50.54
      S.D.(N)     2.37      3.35      4.11      4.78
50    Cut off     20        43        66        90
      E(N)        25.55     50.49     75.55     100.57
      S.D.(N)     3.34      4.73      5.78      6.67
75    Cut off     32        66        102       137
      E(N)        38.02     75.57     113.02    150.53
      S.D.(N)     4.07      5.79      7.09      8.17
Table 3
Comparison of power and expected second sample size (δ = 0.01). Expected sample sizes are given in parentheses.

                          M-test                                      N-test
m     Quantity            λ = 0.5   λ = 1.0   λ = 1.5   λ = 2.0      λ = 0.5   λ = 1.0   λ = 1.5   λ = 2.0
10    Power               x         0.058     0.066     0.077        x         0.061     0.070     0.083
      Exp. sample size    x         (10.45)   (15.11)   (19.68)      x         (10.26)   (14.93)   (19.54)
25    Power               0.064     0.103     0.157     0.246        0.066     0.112     0.187     0.303
      Exp. sample size    (12.61)   (23.71)   (34.03)   (43.75)      (12.56)   (23.83)   (34.66)   (45.08)
50    Power               0.113     0.313     0.643     0.890        0.113     0.340     0.735     0.962
      Exp. sample size    (23.57)   (43.67)   (61.71)   (78.43)      (23.72)   (44.61)   (64.01)   (82.37)
75    Power               0.206     0.717     0.985     1.000        0.220     0.787     0.996     1.000
      Exp. sample size    (33.88)   (61.69)   (86.27)   (108.78)     (34.22)   (63.49)   (90.28)   (115.38)

x: Power will be very close to the level and the comparison will not be meaningful, as no substantial shift will occur even at the expected value of the stopping variable.
From Table 3, we clearly see that, for any given $m$, the power of the proposed sequential rank based procedure ($N$-test) is noticeably higher than that of the usual rank based procedure ($M$-test) whenever $\lambda$ is large. When $\lambda$ is close to 0.5, the power and expected second sample size of the two tests are almost alike. However, for moderately large $m$, say 50, and a higher value of $\lambda$, say 1, the $N$-test consistently gains power. The higher the degree of the trend function, the greater the gain. We see that if, instead of the local alternatives used in our asymptotic studies, we use a fixed alternative, the $N$-test is better over a much wider range of $\lambda$. Our findings from the power studies also indicate that both the $N$ and the $M$ tests are consistent with respect to a fixed alternative.
6. A case study

We illustrate our proposed tests for the possible presence of trend using stock price data of a private sector bank in India. The data are collected from the National Stock Exchange of India. There were 62 trading days in the last quarter of the financial year 2003–2004, namely January 2004–March 2004. Thus we observe 62 figures, $P_1, \ldots, P_{62}$. We see using run tests that these 62 figures are essentially independent. Note that the arithmetic mean and median of these 62 figures are, respectively, 0.04151 and 0.03495. The observed number of runs about the arithmetic mean is 24, against an expected number of 28.7742 runs. Similarly, there are 28 runs about the median against an expected 32.0000 runs. The p-values of the two tests are, respectively, 0.1715 and 0.3056. So we can safely assume that the observations are random.

We plan to check whether there is a linear trend in the $P_t$ values, $t = 63, 64, \ldots$, for the remaining three quarters of the year 2004. For this we need to set $\lambda$ not less than 3. The choice $\lambda = 3$ is expected to cover the remaining three quarters, as there are more or less the same number of trading days in each quarter. We note that for $c_k = k$ and $\lambda = 3$, the $N$-test performs better. We decide to monitor at least throughout the second quarter and so we take $n_0 = 63$. We observe, as in (2.4), $\sum_{k=1}^{n} k H_{m+k-1}(Y_k)$ and compare it with $46.5(n+1)$ for $n \ge 63$. Note that here the distribution of $N$ is asymptotically normal with mean 186 and variance 106.58. Thus the asymptotic cut-off point at the 5% level is 169.02. Hence, in the light of Remark 2.1, we need to observe a maximum of 169 observations to reach a decision.
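The figures quoted in this case study can be checked with a short Python sketch. Two assumptions are made that the paper does not state explicitly: the runs test is taken in its standard large-sample (normal approximation, two-sided) form, and the 62 values are assumed to split 31/31 about the median with no ties. The asymptotic cut-off uses the normal approximation with mean $r$ and variance $4r\sigma^2(\lambda)$, where $\sigma^2(\lambda) = [\lambda - \ln(1+\lambda)]^2/(3\lambda^3) + 1/9$. Function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def runs_test(runs, n1, n2):
    """Large-sample two-sided runs test: returns (expected runs, p-value).

    n1 and n2 are the numbers of observations above and below the cut point.
    """
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    return expected, 2 * STD_NORMAL.cdf(-abs(z))


def sigma2(lam):
    """Asymptotic variance factor for the N-test (as read from Theorem 3.1)."""
    return (lam - math.log1p(lam)) ** 2 / (3 * lam ** 3) + 1 / 9


def asymptotic_cutoff(m, lam, alpha=0.05):
    """Return (r, approximate variance of N, approximate lower alpha cut-off)."""
    r = lam * m
    variance = 4 * r * sigma2(lam)
    cutoff = r - STD_NORMAL.inv_cdf(1 - alpha) * math.sqrt(variance)
    return r, variance, cutoff


if __name__ == "__main__":
    # Assumed 31/31 split about the median for the 62 historical values.
    expected, p_value = runs_test(runs=28, n1=31, n2=31)
    print("runs about median:", expected, round(p_value, 4))

    r, variance, cutoff = asymptotic_cutoff(m=62, lam=3)
    print("r =", r, "variance =", round(variance, 2), "cut-off =", round(cutoff, 2))
    print("boundary per unit of (n + 1):", r / 4)
```

Under these assumptions the sketch gives 32 expected runs about the median with a p-value close to the quoted 0.3056, and a variance and cut-off close to the quoted 106.58 and 169.02. The runs test about the mean is not reproduced because the split of observations above and below the mean is not reported.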
We see that
$$\sum_{k=1}^{169} k H_{m+k-1}(Y_k) = 5803.346$$
falls much below $46.5(169+1) = 7905$. Also, we observe no intermediate boundary crossing. So the observed N will definitely be much greater than 170, and we conclude that there is no reason to doubt the null hypothesis that there is no linear trend in the profit index of the bank.

7. Concluding remarks

Sequential-type testing procedures are often used in problems related to clinical trials, mainly because they are cost efficient. Here, in the sequential rank based procedure, we use a rank adaptation technique, and in this sense it is a typical adaptive procedure. Adaptive techniques often report smaller power, though they have other advantages such as early decision, which in turn minimizes both the experimental and the ethical costs. Interestingly, we see here that an adaptive technique for testing trend in location can even improve the power of the test. In fact, the ratio $\zeta(\lambda)/\zeta^{*}(\lambda)$ may be looked upon as the ARE of the N-test relative to the M-test. There are many real life situations where we find such a typical problem.

Throughout the present investigation, we compare the two tests in terms of power, keeping the expected total sample size fixed under the null hypothesis, i.e. choosing a common r. But such a comparison may also be made from another angle. Suppose we start with the same number of first sample observations as in the usual rank based procedure, using the same level and the same power under any fixed alternative for both tests. Then the proposed N-test is expected to require fewer second sample observations than the usual rank based test in a certain region of $\lambda$. As a result, we may register a gain in terms of cost. Moreover, the variance of the N test statistic is much lower than that based on M, which is again a desirable property of a test procedure. Finally, the authors would like to extend the present work to non-linear trends and to rank statistics more general than the Wilcoxon score. We leave these for future research.
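For readers who wish to experiment, the sequential-rank monitoring rule applied in Section 6 can be sketched in a few lines. This is a minimal sketch under assumptions that are not spelled out in this part of the paper: $H_{m+k-1}$ is taken to be the empirical distribution function of the m historical observations together with the first $k-1$ new ones (consistent with the weights $w_k = m/(m+k-1)$ used in the Appendix); the boundary is taken as $r(n+1)/4$ with $r = m\lambda$, which gives $46.5(n+1)$ for $m = 62$ and $\lambda = 3$; and monitoring is assumed to stop at the first $n \ge n_0$ at which the statistic strictly exceeds the boundary. The function name and its arguments are hypothetical.

```python
from bisect import bisect_right, insort


def sequential_rank_monitor(history, stream, lam, n0, max_n):
    """Return (N, statistic) at the first boundary crossing, or
    (None, statistic) if no crossing occurs within max_n new observations."""
    m = len(history)
    r = m * lam                      # assumed: r = m * lambda
    pooled = sorted(history)         # X_1..X_m, later extended by Y_1..Y_{k-1}
    statistic = 0.0
    for k, y in enumerate(stream, start=1):
        if k > max_n:
            break
        # H_{m+k-1}(y): fraction of the m + k - 1 earlier values that are <= y
        h = bisect_right(pooled, y) / (m + k - 1)
        statistic += k * h
        # assumed stopping rule: strict crossing of r(k + 1)/4, checked from n0
        if k >= n0 and statistic > r * (k + 1) / 4.0:
            return k, statistic
        insort(pooled, y)
    return None, statistic


history = list(range(10))                  # toy historical sample, m = 10
rising = [100 + k for k in range(1, 21)]   # every new value exceeds the past
falling = [-k for k in range(1, 21)]       # every new value is below the past
print(sequential_rank_monitor(history, rising, lam=1, n0=1, max_n=20))   # (6, 21.0)
print(sequential_rank_monitor(history, falling, lam=1, n0=1, max_n=20))  # (None, 0.0)
```

In the case study, `max_n` would play the role of the 169 observations derived from the asymptotic cut-off point, and a return value of `None` would correspond to the situation reported there, where no boundary crossing is observed.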
Acknowledgements

The authors would like to thank the Executive Editor and two referees for various suggestions and comments which led to considerable improvement in presentation. The first author is grateful to the Council of Scientific and Industrial Research, India for providing necessary financial support under the CSIR sanction nos. F.No 9/28(594)/2003-EMR-I and 9/28(697)/2008-EMR-I for his stay in Calcutta University, India during the major part of the work. The first author would also like to thank his former colleagues at Umeå University, Sweden for their kind support.

Appendix A

Result A.1. If $w_k = m/(m+k-1)$, $k = 1, 2, \ldots$, then, as $m \to \infty$, the limiting values of
$$\frac{1}{mn}\sum_{k=1}^{n} k w_k \quad \text{and} \quad \frac{1}{r n^2}\sum_{k=1}^{n} k^2 (1 - w_k)$$
are, respectively,
$$1 - \frac{\ln(1+\lambda)}{\lambda} \quad \text{and} \quad \frac{1}{3} - \frac{1}{2\lambda} + \frac{1}{\lambda^2} - \frac{\ln(1+\lambda)}{\lambda^3}.$$
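The two limits in Result A.1 can be checked numerically by evaluating the sums for a large m. The sketch below assumes $n = r = \lambda m$ (rounded to an integer), which is how the ratio $n/m \to \lambda$ is interpreted here; the function names are illustrative and not taken from the paper.

```python
import math


def normalized_sums(m, lam):
    """Finite-m values of the two normalized sums in Result A.1."""
    n = int(round(lam * m))   # assumption of this sketch: n = r = lam * m
    r = n
    s1 = 0.0
    s2 = 0.0
    for k in range(1, n + 1):
        w = m / (m + k - 1)
        s1 += k * w
        s2 += k * k * (1.0 - w)
    return s1 / (m * n), s2 / (r * n * n)


def limiting_values(lam):
    """Closed-form limits stated in Result A.1."""
    log_term = math.log1p(lam)
    first = 1.0 - log_term / lam
    second = 1.0 / 3.0 - 1.0 / (2.0 * lam) + 1.0 / lam**2 - log_term / lam**3
    return first, second


for lam in (0.5, 1.0, 3.0):
    finite = normalized_sums(20000, lam)
    limit = limiting_values(lam)
    print(lam, finite, limit)
```

For m = 20000 the finite sums and the closed forms should agree to about three decimal places, the discrepancy being of the order 1/m expected from a Riemann-sum approximation.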
The proof of Result A.1 follows from the elementary Riemann theory of integration and hence is omitted. We now introduce some simple results that can be proved by a lengthy but straightforward measure-theoretic approach.

Result A.2. As $m \to \infty$, $N, M \to \infty$ almost surely.

Result A.3. Let
$$S_n^{*} = \sum_{k=1}^{n} w_k\, k\, F_m(Y_k) + \sum_{k=1}^{n} (1 - w_k)\, k\, F(Y_k).$$
Then there exists $0 < L < \infty$ such that for any $m\ (\ge 1)$,
$$n^{-2} E_{H_0}(S_n - S_n^{*})^2 < L,$$
where $E_{H_0}$ denotes the expectation under $H_0$. Further, if $\{H_m\}$ is a sequence of local alternatives, then for any $m \ge 1$ and for some $0 < L < \infty$,
$$n^{-2} E_{H_m}(S_n - S_n^{*})^2 < L.$$

Note A.1. Under both $H_0$ and $\{H_m\}$, $n^{-3/2}(S_n - S_n^{*})$ converges in probability to 0 as $m \to \infty$.

Result A.4. Under $H_0$, almost surely, $N/r$ tends to 1 as $m \to \infty$.
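Proof. From Result A.3, we see that
$$\sum_{n=1}^{\infty} E_{H_0}\, n^{-3} (S_n - S_n^{*})^2 < \infty.$$
Now, using Kolmogorov's strong law of large numbers (see Gnedenko, 1963, p. 244), we have, as $m \to \infty$,
$$\frac{S_n - S_n^{*}}{n\sqrt{n}} \to 0$$
almost surely. But, under $H_0$, as $m \to \infty$,
$$\frac{S_n^{*}}{n^2} \to \frac{1}{4}$$
almost surely, and hence
$$\frac{S_n}{n^2} \to \frac{1}{4}$$
almost surely. Now, using Result A.2, the required result follows from the representation
$$\frac{S_{N-1}}{N^2} \le \frac{1}{4}\,\frac{r}{N} < \frac{S_N}{N(N+1)} \le \frac{S_N}{N^2}. \qquad \square$$

Note A.2. Result A.4 is also valid under a sequence of local alternatives $\{H_m\}$.

Result A.5. Suppose $\delta^{*} = \delta \int f^2(x)\,dx < \infty$. Then we have
$$\lim_{m \to \infty} E_{H_m}\, \frac{1}{n\sqrt{n}} \left( S_n - \frac{n(n+1)}{4} \right) = \delta^{*} \left( \frac{1}{6} + \frac{1}{4\lambda} - \frac{1}{2\lambda^2} + \frac{\ln(1+\lambda)}{2\lambda^3} \right).$$

Proof. It can be easily checked that
$$E_{H_m}(S_n) = \sum_{k=1}^{n} k w_k \int F(y + \delta_m k)\,dF(y) + \sum_{k=1}^{n} k(1 - w_k)\, \frac{1}{k-1} \sum_{j=1}^{k-1} \int F(y + \delta_m (k-j))\,dF(y), \qquad \text{(A.1)}$$
which, by Result A.3 under $H_m$ and by the dominated convergence theorem, can be made asymptotically equivalent to
$$\frac{n(n+1)}{4} + \delta_m \left[ \frac{n(n+1)(2n+1)}{6} - \frac{1}{2} \sum_{k=1}^{n} k^2 (1 - w_k) \right] \int f^2(y)\,dy.$$
Hence, the required result follows from the elementary Riemann theory of integration. $\square$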
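The passage from (A.1) to the approximating expression in the proof of Result A.5 can be made explicit with a first-order expansion. The following is only a sketch: it assumes that F has density f, that the expansion in the shift holds uniformly over the shifts involved, and that remainder terms are negligible; it also treats $r n^2$ and $n^3$ as asymptotically equivalent when Result A.1 is applied.

```latex
\begin{align*}
\int F(y + a)\,dF(y) &\approx \frac{1}{2} + a \int f^2(y)\,dy
  && \text{(small shift } a\text{)}, \\
\frac{1}{k-1} \sum_{j=1}^{k-1} (k - j) &= \frac{k}{2}, \\
E_{H_m}(S_n) &\approx \frac{1}{2} \sum_{k=1}^{n} k
  + \delta_m \int f^2(y)\,dy
    \left[ \sum_{k=1}^{n} k^2 w_k + \frac{1}{2} \sum_{k=1}^{n} k^2 (1 - w_k) \right] \\
 &= \frac{n(n+1)}{4}
  + \delta_m \left[ \frac{n(n+1)(2n+1)}{6}
  - \frac{1}{2} \sum_{k=1}^{n} k^2 (1 - w_k) \right] \int f^2(y)\,dy, \\
\frac{1}{n^3} \left[ \frac{n(n+1)(2n+1)}{6}
  - \frac{1}{2} \sum_{k=1}^{n} k^2 (1 - w_k) \right]
 &\to \frac{1}{3} - \frac{1}{2} \left( \frac{1}{3} - \frac{1}{2\lambda}
  + \frac{1}{\lambda^2} - \frac{\ln(1+\lambda)}{\lambda^3} \right)
  = \frac{1}{6} + \frac{1}{4\lambda} - \frac{1}{2\lambda^2}
  + \frac{\ln(1+\lambda)}{2\lambda^3}.
\end{align*}
```

The last line reproduces the constant in Result A.5, using the second limit of Result A.1 for the sum involving $k^2(1 - w_k)$.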
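Proof of Theorem 3.1. Writing
$$T_{mk} = k\, n^{-3/2} \left[ w_k \left( F_m(Y_k) - \frac{1}{2} \right) + (1 - w_k) \left( F(Y_k) - \frac{1}{2} \right) \right],$$
we have
$$n^{-3/2} \left( S_n^{*} - \frac{n(n+1)}{4} \right) = \sum_{k=1}^{n} T_{mk}.$$
Note that, given $\mathbf{X}_m$, the $T_{mk}$'s are independent with mean and variance
$$\mu_k(\mathbf{X}_m) = m^{-1} n^{-3/2}\, k w_k \sum_{i=1}^{m} \left( \frac{1}{2} - F(X_i) \right),$$
$$\sigma_k^2(\mathbf{X}_m) = \frac{k^2}{n^3}\, w_k^2\, E\left[ \left( F_m(Y_k) - \frac{1}{2} \right)^2 \Big|\, \mathbf{X}_m \right] + \frac{k^2}{12 n^3} (1 - w_k)^2 + \frac{2k^2}{n^3}\, w_k (1 - w_k)\, E\left[ \left( F_m(Y_k) - \frac{1}{2} \right) \left( F(Y_k) - \frac{1}{2} \right) \Big|\, \mathbf{X}_m \right].$$
Write
$$a_m = \sum_{k=1}^{n} \mu_k(\mathbf{X}_m) = \left( \frac{1}{mn} \sum_{k=1}^{n} k w_k \right) \left( \frac{1}{\sqrt{r}} \sum_{i=1}^{m} \left( \frac{1}{2} - F(X_i) \right) \right),$$
$$v_m^2 = \sum_{k=1}^{n} \sigma_k^2(\mathbf{X}_m) = \frac{1}{n^3} \sum_{k=1}^{n} k^2 w_k^2\, E\left[ \left( F_m(Y_k) - \frac{1}{2} \right)^2 \Big|\, \mathbf{X}_m \right] + \frac{1}{12 n^3} \sum_{k=1}^{n} k^2 (1 - w_k)^2 + \frac{2}{n^3} \sum_{k=1}^{n} k^2 w_k (1 - w_k)\, E\left[ \left( F_m(Y_k) - \frac{1}{2} \right) \left( F(Y_k) - \frac{1}{2} \right) \Big|\, \mathbf{X}_m \right].$$
Then, as in Hájek et al. (1999, p. 241), it can be easily checked that, given $\mathbf{X}_m$, the asymptotic distribution of $\sum_{k=1}^{n} T_{mk}$ is normal with mean $a_m$ and variance $v_m^2$. Further, as $m \to \infty$, $a_m$ converges in distribution to a r.v. having a normal distribution with mean 0 and variance $(1/(3\lambda))(1 - \ln(1+\lambda)/\lambda)^2$ by the CLT, and $v_m^2$ converges in probability to $1/9$ by the WLLN. Hence, as in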
Hájek et al. (1999, p. 242), we conclude that the unconditional asymptotic distribution of $\sum_{k=1}^{n} T_{mk}$ is normal with mean 0 and variance $(1/(3\lambda))(1 - \ln(1+\lambda)/\lambda)^2 + 1/9$. This along with Result 3.2 completes the proof. $\square$

References

Aiyer, R.J., Guilliter, C.L., Albers, W., 1979. Asymptotic relative efficiencies of rank tests for trend alternatives. Journal of American Statistical Association 74, 226–231.
Aue, A., Horváth, L., Hušková, M., Kokoszka, P., 2006. Change-point monitoring in linear models. Econometrics Journal 9, 373–403.
Bandyopadhyay, U., Mukherjee, A., 2007. Nonparametric partial sequential test for location shift at an unknown time point. Sequential Analysis 26, 99–113.
Bandyopadhyay, U., Mukherjee, A., Purkait, B., 2007. Nonparametric partial sequential tests for patterned alternatives in multi-sample problems. Sequential Analysis 26, 443–466.
Bandyopadhyay, U., Mukherjee, A., Biswas, A., 2008a. Controlling type-I error rate in monitoring structural changes using partially sequential procedures. Communications in Statistics: Simulation and Computation 37, 466–485.
Bandyopadhyay, U., Mukherjee, A., Purkait, B., 2008b. Simultaneous tests for patterned recognition using nonparametric partially sequential procedure. Statistical Methodology 5, 535–551.
Basseville, M., Nikiforov, I.V., 1993. Detection of Abrupt Changes: Theory and Applications. Prentice-Hall, Upper Saddle River, NJ, USA.
Brodsky, B.E., Darkhovsky, B.S., 1993. Nonparametric Methods in Change-Point Problems. Kluwer, Dordrecht.
Brodsky, B.E., Darkhovsky, B.S., 2000. Non-parametric Statistical Diagnosis. Kluwer, Dordrecht.
Brown, G.W., Mood, A.M., 1951. On median tests for linear hypothesis. In: Proceedings of 2nd Berkeley Symposium. Mathematical Statistics and Probability, vol. 2, pp. 159–166.
Chatterjee, S.K., Bandyopadhyay, U., 1984. Inverse sampling based on general scores for nonparametric two-sample problems. Calcutta Statistical Association Bulletin 33, 35–58.
Chochola, O., 2008. Sequential monitoring for change in scale. In: WDS’08 Proceedings of Contributed Papers Part I, pp. 74–79.
Chu, C.-S.J., Stinchcombe, M., White, H., 1996. Monitoring structural change. Econometrica 64, 1045–1065.
Csörgő, M., Horváth, L., 1997. Limit Theorems in Change-Point Analysis. Wiley, Chichester.
Cox, D.R., Stuart, A., 1955. Some quick sign tests for trend in location and dispersion. Biometrika 42, 80–95.
Diersen, J., Trenkler, G., 1996. Records tests for trend in location. Statistics 28, 1–12.
Gnedenko, B.V., 1963. The Theory of Probability. Chelsea Publishing Company, New York.
Hájek, J., Šidák, Z., Sen, P.K., 1999. Theory of Rank Tests. Academic Press, New York.
Haldeman, D.L., Amy, P.S., Ringelberg, D., White, D.C., 1993. Characterization of the microbiology within a 21 m³ section of rock from the deep subsurface. Microbial Ecology 26, 145–159.
Horváth, L., Hušková, M., Kokoszka, P., Steinebach, J., 2004. Monitoring changes in linear models. Journal of Statistical Planning and Inference 126, 225–251.
Hušková, M., Praskova, Z., Steinebach, J., 2009. Delay time in monitoring jump changes in linear models. ISWM-2009 Invited Paper No.16, Troyes, France.
Hušková, M., Chochola, O., 2010. Simple sequential procedures for change in distribution. Nonparametrics and Robustness in Modern Statistical Inference and Time Series Analysis: A Festschrift in honor of Professor Jana Jurečková; Institute of Mathematical Statistics 7, 95–104.
Jinqiang, G., Chuansong, W., Jiakun, H., 2007. Real-time monitoring of abnormal conditions based on Fuzzy Kohonen clustering network in gas metal arc welding. Frontiers of Materials Science in China 1, 134–139.
Joseph, B.K., Pantelis, K.V., 2002. Hybrid methods for calculating optimal few-stage sequential strategies: data monitoring for a clinical trial. Statistics and Computing 12, 147–152.
Krämer, W., Ploberger, W., Alt, R., 1988. Testing for structural change in dynamic models. Econometrica 56, 1355–1370.
Mukherjee, A., 2009. Some rank-based two-phase procedures in sequential monitoring of exchange rate. Sequential Analysis 28, 137–162.
Mukherjee, A., 2010. Semi-sequential one-shot monitoring of small disorders with controlled type-I error rate. Communications in Statistics: Theory and Methods 39, 2829–2847.
Mukherjee, A., Purkait, B., 2011. Simultaneous semi-sequential testing of dual alternatives for pattern recognition. Journal of Applied Statistics 38, 399–419.
Mukherjee, A., to appear. A near-nonparametric partially sequential test for monitoring phase-II location under pairwise dependence between two phases. Sequential Analysis 30.
Orban, J., Wolfe, D.A., 1978. Optimality criteria for the selection of partially sequential indicator set. Biometrika 65, 357–362.
Orban, J., Wolfe, D.A., 1980. Distribution free partially sequential placement procedure. Communications in Statistics: Theory and Methods 9, 883–902.
Orban, J., Wolfe, D.A., 1982a. A class of distribution free two-sample tests based on placements. Journal of American Statistical Association 77, 666–672.
Orban, J., Wolfe, D.A., 1982b. Properties of a distribution-free two-stage two-sample median test. Statistica Neerlandica 36, 15–22.
Ploberger, W., Krämer, W., Kontrus, K., 1989. A new test for structural stability in the linear regression model. Journal of Econometrics 40, 307–318.
Randles, H.R., Wolfe, D.A., 1979. Introduction to the Theory of Nonparametric Statistics. John Wiley, New York.
Wolfe, D.A., 1977. On a class of partially sequential two sample test procedure. Journal of American Statistical Association 72, 202–205.
Xiong, X., Tan, M., Boyett, J., 2007. A sequential procedure for monitoring clinical trials against historical controls. Statistics in Medicine 26, 1497–1511.