Computational Statistics and Data Analysis xx (xxxx) xxx–xxx
A Gini-based unit root test
Amit Shelef*
Department of Industrial Engineering and Management, SCE—Shamoon College of Engineering, Basel Street, Beer-Sheva 84100, Israel
Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, P.O.B. 653, Beer-Sheva 8410501, Israel
highlights
• A Gini-based unit root test is developed.
• The test relies on the semi-parametric Gini regression.
• Includes an in-depth numerical comparison of the Gini-based test and existing tests.
• Simulations indicate the superiority of the Gini-based test in some design settings.
article info

Article history:
Received 23 January 2014
Received in revised form 12 August 2014
Accepted 16 August 2014
Available online xxxx

Keywords:
Time series analysis
Unit root tests
Gini regression
Bootstrap

abstract

A Gini-based statistical test for a unit root is suggested. This test is based on the well-known Dickey–Fuller test, where the ordinary least squares (OLS) regression is replaced by the semi-parametric Gini regression in modeling the AR process. A residual-based bootstrap is used to find critical values. The Gini methodology is a rank-based methodology that takes into account both the variate values and the ranks. Therefore, it provides robust estimators that are rank-based, while avoiding loss of information. Furthermore, the Gini methodology relies on first-order moment assumptions, which validates its use for a wide range of distributions. Simulation results validate the Gini-based test and indicate its superiority in some design settings in comparison to other available procedures. The Gini-based test opens the door for further developments such as a Gini-based cointegration test.

© 2014 Published by Elsevier B.V.
1. Introduction
In most of the literature dealing with time series analysis, underlying dependencies of the time series are modeled based on variance and covariance as measures of variability and association, respectively. This research develops a unit root test that is based on the Gini Mean Difference (hereafter GMD) as an alternative index of variability. The GMD index shares many properties of the variance, but the former can be more appropriate for distributions that depart from normality or symmetry. This measure is less sensitive to extreme observations than the variance because it takes into account both the values of the random variable and its ranks. In addition, the GMD is defined solely under first-order moment assumptions. To clarify the notations, we distinguish between population parameters and estimators by using upper-case letters in the population version and lower-case letters in the sample version.

1.1. Autoregressive unit root tests
* Correspondence to: Department of Industrial Engineering and Management, SCE—Shamoon College of Engineering, Basel Street, Beer-Sheva 84100, Israel. Tel.: +972 54 777 6321. E-mail addresses: [email protected], [email protected].
http://dx.doi.org/10.1016/j.csda.2014.08.012
0167-9473/© 2014 Published by Elsevier B.V.
We refer to a first-order univariate autoregression, denoted by AR(1), which satisfies

Yt = φ0 + φ1 Yt−1 + εt,   (1)
where φ0 is the constant of the model, φ1 is the parameter of the model and εt is an independent and identically distributed (i.i.d.) innovation process. If φ1 = 1, then Yt is nonstationary. Testing for stationarity by detecting a unit root is an important task in the analysis and modeling of time series. Dickey and Fuller (1979) developed a procedure to test for the presence of a unit root (hereafter referred to as the DF test). The main objective of the DF test is to determine whether H0: φ1 = 1 or, alternatively, H1: φ1 < 1. The distribution of the appropriate t-statistic, based on applying the OLS estimator for the nonstationarity parameter, is nonstandard and cannot be analytically evaluated. Dickey and Fuller (1979) used the Monte Carlo simulation method to tabulate the percentiles of the DF t-statistic distribution based on εt ∼ i.i.d. N(0, σε²) innovations. The DF t-statistic is

DF_t-stat = (φ̂1^OLS − 1) / SD̂(φ̂1^OLS),   (2)

where φ̂1^OLS = cov(Yt, Yt−1)/cov(Yt−1, Yt−1) is the OLS estimator for φ1, SD̂(φ̂1^OLS) = [σ̂ε² (Σ_{t=1}^n Y²_{t−1})^{−1}]^{0.5} and σ̂ε² is the least squares estimator.

Extensive attempts to improve this test and to find alternative or superior tests followed. Leybourne (1995) suggested using the maximum of DF t-statistics based on applying the OLS regressions twice, looking both forward and backward at the series. The critical values for the test are obtained in a manner similar to that performed in the original DF test discussed above using Monte Carlo simulation. Elliott et al. (1996) and Ng and Perron (2001) suggested a class of unit root tests that are based on generalized least squares detrending of the series and then applying the DF test on the detrended data. An important development in unit root tests includes the use of resampling methods for calculating critical values; see, for example, a survey of such methods in Palm et al. (2008). A first residual-based bootstrap version of the DF test was proposed by Ferretti and Romo (1996). Later, Moreno and Romo (2000) used a bootstrap procedure based on the LAD estimator, and more recently, Moreno and Romo (2012) suggested a family of unit root bootstrap tests for the infinite variance case. Another important development appears in Müller and Elliott (2003), who revealed that the initial condition (Y0) has a non-negligible influence on the finite sample performances of unit root tests. In general, it is difficult to rule out, a priori, the existence of small or large values of Y0. Elliott and Müller (2006) suggest statistics whose power is less sensitive to the size of Y0. Harvey and Leybourne (2005) and Harvey et al. (2009) recommend a union of test rejection decision rules to improve the performances of the statistical procedure. Hallin et al. (2011) propose a class of distribution-free rank-based tests for the null hypothesis of a unit root that takes into account several initial values for the series. They use three test statistics that are based on a choice of a reference density function, which need not be the unknown actual density of the innovations. The first test statistic is based on the Gaussian reference density and is defined as

T_vdW^(n) = (1/√n) Σ_{t=1}^n (t/(n+1) − 1/2) Φ^{−1}(R_t/(n+1)),

where R_t are the ranks of the increments ∆Yt = Yt − Yt−1 and Φ denotes the standard normal distribution. This statistic is also known as the normal or van der Waerden score. The second test statistic is based on the double-exponential distribution (Laplace or sign test scores),

T_L^(n) = (1/√n) Σ_{t=1}^n (t/(n+1) − 1/2) sign(R_t/(n+1) − 1/2).

The third test statistic is based on the logistic distribution (Wilcoxon scores),

T_W^(n) = (π/√(3n)) Σ_{t=1}^n (t/(n+1) − 1/2) [1 − (n+1−R_t)/R_t] / [1 + (n+1−R_t)/R_t].

These three test statistics are denoted here as HAW-vdW, HAW-Laplace and HAW-Wilcoxon, respectively. Asymptotic results and simulated quantiles for these statistics for several sample sizes are given in Hallin et al. (2011). The results for finite samples indicate that for a broad range of non-zero initial values and for a variety of heavy-tailed innovation densities, the suggested rank-based tests outperform a variety of unit root tests. Rank-based tests, which intrinsically involve some loss of information of the real values, are expected to perform well under heavy-tailed distributions. In this paper, we propose a Gini-based unit root test that relies on both the real values and the ranks, while avoiding loss of information.
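As a rough illustration (ours, not the paper's), the three rank-based statistics can be computed directly from the ranks of the increments. The sketch below assumes no ties among the increments and uses the algebraically equivalent form 2R_t/(n+1) − 1 for the Wilcoxon score; the function name is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def haw_statistics(y):
    """Rank-based unit root statistics of Hallin et al. (2011):
    van der Waerden, Laplace (sign) and Wilcoxon scores, computed
    from the ranks R_t of the increments dY_t = Y_t - Y_{t-1}."""
    d = np.diff(np.asarray(y, dtype=float))
    n = len(d)
    r = np.argsort(np.argsort(d)) + 1.0          # ranks R_t (assumes no ties)
    w = np.arange(1, n + 1) / (n + 1) - 0.5      # weights t/(n+1) - 1/2
    inv_phi = np.array([NormalDist().inv_cdf(u) for u in r / (n + 1)])
    t_vdw = (w * inv_phi).sum() / np.sqrt(n)
    t_lap = (w * np.sign(r / (n + 1) - 0.5)).sum() / np.sqrt(n)
    t_wil = np.pi / np.sqrt(3 * n) * (w * (2.0 * r / (n + 1) - 1.0)).sum()
    return t_vdw, t_lap, t_wil

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))  # a random walk, i.e., a unit root series
print(haw_statistics(y))
```

Each statistic weights a score of the rank R_t by the same centered time index t/(n+1) − 1/2; only the reference density behind the score changes.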
1.2. The Gini methodology
The GMD is an alternative index of variability that is used in this research instead of the variance. The most prevalent presentation of the GMD index is the expected absolute difference between two independent and identically distributed (i.i.d.) variables X1 and X2 (Gini, 1914). Formally, the GMD of X is defined as

GX = E|X1 − X2|.   (3)

Alternatively, the GMD can be expressed as a special case of a covariance, i.e., four times the covariance of X, a random variable, and FX(X), its cumulative distribution function (Lerman and Yitzhaki, 1984). Formally,

GX = 4COV(X, FX(X)),   (4)
where FX (X ) is the cumulative distribution function of X . The GMD index shares many properties of the variance, but the former can be more informative for distributions that depart from normality or symmetry. Both measures are based on weighted averages of the distances between each pair of i.i.d. variables. The fundamental difference is the method used to measure the distance. The GMD distance function is referred to as the ‘‘city block’’ distance, which allows one to move only in the vertical and horizontal directions. The variance distance function is Euclidean, which allows one to move in any desired direction.
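To make Eqs. (3) and (4) concrete: on a sample without ties, the pairwise U-statistic version of Eq. (3) and four times the sample covariance between the values and their ranks divided by n coincide exactly. A minimal numerical sketch (an illustration only, not part of the paper; function names are ours):

```python
import numpy as np

def gmd_pairs(x):
    """Eq. (3): sample GMD as the mean absolute difference over all
    ordered pairs (the U-statistic with denominator n(n-1))."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.abs(x[:, None] - x[None, :]).sum() / (n * (n - 1))

def gmd_cov(x):
    """Eq. (4): sample GMD as four times the covariance between X and
    its empirical distribution function (ranks divided by n)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = np.argsort(np.argsort(x)) + 1.0  # ranks 1..n (assumes no ties)
    return 4.0 * np.cov(x, r / n, ddof=1)[0, 1]

rng = np.random.default_rng(0)
x = rng.normal(size=200)
print(gmd_pairs(x), gmd_cov(x))  # the two forms coincide when there are no ties
```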
As mentioned, the GMD takes into account both the values of the random variable and its ranks, and hence, it is less sensitive to extreme observations than the variance. In addition, because Gini requires only first-order moment assumptions (Stuart and Ord, 1987), the GMD-based method is valid for a wide range of distributions and might be more appropriate for heavy-tailed distributions than variance-based methods. For the properties of the Gini covariance, see Schechtman and Yitzhaki (1987, 1999), Serfling and Xiao (2007), Yitzhaki (2003) and Yitzhaki and Schechtman (2013).

In this research we use the semi-parametric Gini regression method for AR(1) modeling. The semi-parametric Gini regression is a GMD-based regression method that relies on substituting the variance-based expressions in the OLS regression by the equivalent GMD terms (Olkin and Yitzhaki, 1992). Because it is based on imitation of the OLS formulas, this method enables the replication of some of its concepts. This type of Gini regression coefficient can be referred to as "covariance-based" because it is based on the properties of the covariance. It is semi-parametric because it does not require a specification of the functional form of the model. The regression coefficient can be expressed as a weighted average of slopes defined between each pair of adjacent observations of the explanatory variable (Yitzhaki, 1996).

Let (Y, X) be a continuous bivariate random variable with expected values µY and µX and finite variances σY² and σX², respectively (finite variances are needed only for the OLS). Let the regression model be Y = E(Y|X) + ε. The model is approximated by the following linear model, which needs to be estimated: Y = α + βX + ε. The OLS simple regression coefficient is

βO = COV(Y, X)/COV(X, X).

Replacing each covariance by the corresponding Gini covariance, we get that the simple Gini regression coefficient is

βN = COV(Y, F(X))/COV(X, F(X)),   (5)

where FX(X) is the cumulative distribution function of X and N indicates that we are dealing with the semi-parametric version of the GMD regression coefficient (Olkin and Yitzhaki, 1992). Note that in the numerator we have a covariance (between Y and FX(X)), while the denominator is the GMD of the explanatory variable (X) divided by 4. It is worth mentioning that Eq. (6) can also be interpreted as an instrumental variable OLS estimator, where the instrumental variable is F(X). The estimator of the simple Gini regression coefficient can be expressed as

b^N = cov(Y, R(X))/cov(X, R(X))   (6)
    = [Σ_{i=1}^n (Yi − Ȳ)(R(Xi) − R̄(X))] / [Σ_{i=1}^n (Xi − X̄)(R(Xi) − R̄(X))],   (7)

where the population covariance is replaced by the sample covariance, and F(X) is replaced by the rank of X divided by n, i.e., R(X) = (1/n) Σ_{i=1}^n I(Xi ≤ X) (based on a sample of size n) (Olkin and Yitzhaki, 1992). In addition, R(Xi) = (1/n) Σ_{j=1}^n I(Xj ≤ Xi), R̄(X) = Σ_{i=1}^n R(Xi)/n, X̄ = Σ_{i=1}^n Xi/n and Ȳ = Σ_{i=1}^n Yi/n. The estimator of β^N is a ratio of two U-statistics. Therefore, it is a consistent estimator for β^N and for large samples, its distribution converges to the normal distribution (Hoeffding, 1948).

To estimate the standard deviation of the Gini regression coefficient, the classical jackknife estimation procedure (delete-1 jackknife) is used, as detailed in Yitzhaki and Schechtman (2013) and following Efron (1982). Let

b^N = b^N((Y1, X1), (Y2, X2), . . . , (Yn, Xn))   (8)

be the estimator of the Gini regression coefficient based on the bivariate sample of size n, and let

b^N_(i) = b^N((Y1, X1), . . . , (Yi−1, Xi−1), (Yi+1, Xi+1), . . . , (Yn, Xn))

be the estimator of the Gini regression coefficient computed from a bivariate sample of size (n − 1) after deleting the ith pair of observations (Yi, Xi) (i = 1, 2, . . . , n) from the original bivariate sample. Then let b^N_(·) = (1/n) Σ_{i=1}^n b^N_(i). The estimated standard deviation of the estimator is given by

SD̂(b^N) = [((n − 1)/n) Σ_{i=1}^n (b^N_(i) − b^N_(·))²]^{0.5}.   (9)
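The estimator of Eq. (7) and the jackknife standard deviation of Eq. (9) can be sketched as follows (an illustration under our reading; function names are ours and a no-ties sample is assumed):

```python
import numpy as np

def gini_slope(y, x):
    """Semi-parametric Gini regression coefficient, Eq. (7)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    r = np.argsort(np.argsort(x)) + 1.0  # ranks R(X_i), assuming no ties
    rc = r - r.mean()
    return ((y - y.mean()) * rc).sum() / ((x - x.mean()) * rc).sum()

def gini_slope_sd(y, x):
    """Delete-1 jackknife SD of the Gini slope, Eq. (9)."""
    n = len(x)
    idx = np.arange(n)
    b_i = np.array([gini_slope(y[idx != i], x[idx != i]) for i in range(n)])
    return np.sqrt((n - 1) / n * ((b_i - b_i.mean()) ** 2).sum())

rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 2.0 * x + rng.standard_t(df=3, size=300)  # heavy-tailed noise
print(gini_slope(y, x), gini_slope_sd(y, x))  # slope estimate should be near 2
```

Only the ranks of the explanatory variable enter the slope, which is what limits the influence of extreme X values while keeping the variate values of Y and X in play.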
The structure of the paper is as follows: Section 2 presents a Gini-based framework for modeling AR(1). Section 3 develops the Gini-based unit root test. Section 4 presents the finite sample performances of the suggested test and a detailed comparison to various existing unit root tests. Section 5 presents and discusses the conclusions.

2. A Gini-based framework for modeling AR(1)

In this research we apply the semi-parametric Gini regression for AR(1) modeling. Using this method leads to two possible (Gini) regression coefficients, looking forward and backward at the series (to be defined below). In this case, beyond the applicability of the procedure under merely the first moment of the process, it offers additional information by comparing the resulting two coefficients. As such, it enables us to examine whether looking forward and backward at the series provides similar results.
Generally, in the existing methodology two AR(1) regression coefficients are available, looking backward and forward at the series (Box and Jenkins, 1976). Using OLS, they are φ1^{OLS1} = COV(Yt, Yt−1)/COV(Yt−1, Yt−1) and φ1^{OLS2} = COV(Yt−1, Yt)/COV(Yt, Yt), respectively. Clearly, under homoscedasticity the covariance does not change with time, i.e., COV(Yt, Yt) = COV(Yt−1, Yt−1); therefore, φ1^{OLS1} = φ1^{OLS2}. Following the semi-parametric Gini regression approach (Olkin and Yitzhaki, 1992), each covariance is replaced by the corresponding Gini covariance and the first Gini regression coefficient is

φ1^{G1} = COV(Yt, F(Yt−1))/COV(Yt−1, F(Yt−1))   (10)

(Carcea and Serfling, 2012). The second Gini regression coefficient is

φ1^{G2} = COV(Yt−1, F(Yt))/COV(Yt, F(Yt)),   (11)

which is not necessarily equal to the first. Only if the distribution of (Yt, Yt−1) is exchangeable up to a linear transformation will φ1^{G1} = φ1^{G2} (e.g., if εt are multivariate normally distributed random variables, then φ1^{G1} = φ1^{G2}). However, the two Gini regression coefficients are not necessarily equal. To estimate the Gini regression coefficients (Eqs. (10) and (11)), we imitate the OLS estimator by replacing each sample covariance with a sample Gini covariance. The estimators using the Gini method are

φ̂1^{G1} = cov(Yt, R(Yt−1))/cov(Yt−1, R(Yt−1))   (12)

and

φ̂1^{G2} = cov(Yt−1, R(Yt))/cov(Yt, R(Yt)),   (13)
where the bivariate observations used in the numerator of Eq. (12) are the members of S1 = {(Y2, Y1), (Y3, Y2), . . . , (YT, YT−1)}; and for Eq. (13), S2 = {(Y1, Y2), (Y2, Y3), . . . , (YT−1, YT)}. For the denominators, in Eq. (12) Y1, . . . , YT−1 are taken into account, and in Eq. (13) Y2, . . . , YT are taken into account. An alternative estimator of φ1^{G1} based on L-comoments is given in Carcea and Serfling (2012). As the sample length increases, differences between the two alternative estimators become negligible. The backward and forward Gini regression coefficients (φ̂1^{G1} and φ̂1^{G2}) are not necessarily equal. A comparison between them is informative in the context of examining whether looking forward and backward at the series gives similar results or whether time is reversible. This issue is treated thoroughly in Shelef and Schechtman (2011) and in Shelef (2013). It is possible to develop a test statistic for a unit root that combines the information from both the backward and forward Gini regression coefficients. Intuitively, as such a combined test is based on the additional measure (φ̂1^{G2}), it is expected to result in superior performances. However, φ̂1^{G2} is the Gini regression coefficient when looking forward in time, and therefore, it is less informative in the context of estimating the coefficient of the AR(1) model, which is generated backwards.
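The two estimators in Eqs. (12) and (13) can be sketched as follows (illustrative code, ours; for a Gaussian AR(1), whose joint distribution is exchangeable, the backward and forward estimates should be close):

```python
import numpy as np

def _cov(a, b):
    """Sample covariance (divisor n; the ratio is divisor-invariant)."""
    return ((a - a.mean()) * (b - b.mean())).mean()

def _rank(v):
    return np.argsort(np.argsort(v)) + 1.0  # ranks, assuming no ties

def gini_ar1_coefs(y):
    """Backward (Eq. (12)) and forward (Eq. (13)) Gini AR(1) coefficients."""
    y = np.asarray(y, dtype=float)
    yt, ylag = y[1:], y[:-1]
    phi_g1 = _cov(yt, _rank(ylag)) / _cov(ylag, _rank(ylag))  # Eq. (12)
    phi_g2 = _cov(ylag, _rank(yt)) / _cov(yt, _rank(yt))      # Eq. (13)
    return phi_g1, phi_g2

rng = np.random.default_rng(2)
T, phi = 2000, 0.5
y = np.empty(T)
y[0] = 0.0
for t in range(1, T):
    y[t] = phi * y[t - 1] + rng.normal()
print(gini_ar1_coefs(y))  # both estimates should be near 0.5
```

Comparing the two outputs on real data is exactly the forward/backward comparison discussed above: a large gap between them is evidence against exchangeability.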
3. A Gini-based stationarity test
A weak/covariance stationary process is defined as a process in which the mean, variance and autocovariances are unaffected by a change of the time origin. Using the Gini covariance, this definition can be extended to define a process in which the mean, GMD and Gini autocovariances are time-independent (unaffected by a change of the time origin). The following conditions should hold for all t, t − s:
1. a constant mean, i.e., E(Yt) = E(Yt−s) = µ;
2. a constant GMD for all Yt, i.e., G(Yt) = G(Yt−s) = COV(Yt, F(Yt)) = COV(Yt−s, F(Yt−s)) = GY.
As mentioned earlier, the GCOV is not necessarily symmetric. Therefore, each autocovariance is "translated" to two GCOVs, which are not necessarily equal:
3. first direction: GCOV(Yt, Yt−s) = COV(Yt, F(Yt−s)) = COV(Yt−j, F(Yt−j−s)) = γG1(s);
4. second direction: GCOV(Yt−s, Yt) = COV(Yt−s, F(Yt)) = COV(Yt−j−s, F(Yt−j)) = γG2(s),   (14)
where µ, GY and all γG1 (s) and γG2 (s) are time-independent constants. Under exchangeability, the GCOV in the second direction can be omitted so that the definition using Gini will be similar in structure to that based on the covariance. In this section, we propose the Gini-based unit root test. The test is similar to the residual-based bootstrap test proposed by Ferretti and Romo (1996), but we replace the OLS regression with the Gini regression when estimating the AR(1) coefficient. The critical values for the test are found specifically for each user’s sample using a bootstrap procedure. The
bootstrap relies on resampling the unrestricted residuals, that is, without imposing the null hypothesis (following the results in Paparoditis and Politis, 2005). As mentioned, the initial condition can substantially influence the power properties of unit root tests in practice. Consequently, we examine the sensitivity to initial values, following Elliott and Müller (2006) and other recent papers (see, for example, Harvey et al., 2009; Hallin et al., 2011), taking into account several initial values. The suggested Gini methodology is a rank-based estimation that takes into account both the variate values and the ranks. Therefore, it combines the robustness of rank-based estimation with the sensitivity to the variate values, avoiding the loss of information that occurs in purely rank-based tests. Consider the following AR(1) model Yt = φ0 +φ1 Yt −1 +εt , where εt (t = 1, 2, . . .) is a sequence of i.i.d. random variables with E (εt ) = 0 and φ0 = 0 (the process has no drift). We use asymmetric distributions and heavy-tailed distributions for the distributions of εt (the normal distribution is used as a benchmark).
3.1. The proposed test statistic

The Gini-based DF (GDF) test statistic according to the Gini regression method is

GDF = (φ̂1^{G1} − 1) / SD̂(φ̂1^{G1}),   (15)

where φ̂1^{G1} is the first Gini regression estimator, as detailed in Eq. (12). The standard deviation is obtained using the jackknife, deleting one pair of (Yt, Yt−1) each time, based on Eq. (9). We examined several other optional test statistics that rely on Gini regression. One optional test statistic was based on replacing φ̂1^{G1} with φ̂1^{G2} in the above equation. Another option was the maximum between the first and second test statistics (based on φ̂1^{G1} and φ̂1^{G2}), which resembles the test statistic suggested by Leybourne (1995). However, these alternative test statistics achieve relatively inferior finite sample performances for most design settings. Therefore, they are not presented here (although results are available in Shelef, 2013).

3.2. Finding the critical values using bootstrap

The bootstrap procedure for finding the critical values for the Gini-based test statistic specified above follows Li and Maddala's (1996) guidelines for bootstrapping unit root tests and is similar to the procedure described in Ferretti and Romo (1996). The procedure addresses whether the sample comes from a stationary process based on critical values that were calculated specifically for the sample (without knowing in advance whether the sample comes from a stationary process and without assuming a specific distribution of innovations). In the first step, we use the original sample of T observations Y1, Y2, . . . , YT to calculate the residuals from the Gini regression, denoted as e_t^{G1}. Next, a sample of T values is drawn with replacement from the T calculated residuals. These values are the bootstrapped residuals, which are denoted as e_1^{G1*}, . . . , e_T^{G1*}. The new bootstrapped series is then calculated under H0: φ1 = 1, so that

Y_t^{G1*} = Y_{t−1}^{G1*} + e_t^{G1*},   (16)

where Y_0^{G1*} is the median of the original sample. The Gini-based bootstrapped test statistic is

GDF* = (φ̂1^{G1*} − 1) / SD(φ̂1^{G1*}),   (17)

where φ̂1^{G1*} is the Gini regression estimator for the AR(1) model (Eq. (12)), calculated based on the bootstrapped sample, i.e., φ̂1^{G1*} = cov(Y*_t, R(Y*_{t−1}))/cov(Y*_{t−1}, R(Y*_{t−1})). Finally, the sampling distribution of GDF* is employed to imitate the distribution of the test statistic under H0: φ1 = 1 and to obtain critical values. The above procedure of generating starred innovations is repeated M times and produces M values of GDF*. Based on these values, we denote the percentile w*_α as the largest w such that the proportion of the obtained M values of GDF* that is smaller than or equal to w*_α is at most (α · 100%). The decision rule at the significance level α is to

reject H0 if GDF < w*_α.   (18)
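The steps above can be sketched end-to-end as follows (a minimal illustration under our reading of Eqs. (15)–(18), not the paper's code; the intercept is ignored since φ0 = 0 in the paper's setting, no ties are assumed, and M is kept small only for speed):

```python
import numpy as np

def _phi_from_pairs(p):
    """Backward Gini AR(1) coefficient, Eq. (12), from (Y_{t-1}, Y_t) pairs."""
    ylag, yt = p[:, 0], p[:, 1]
    r = np.argsort(np.argsort(ylag)) + 1.0
    rc = r - r.mean()
    return ((yt - yt.mean()) * rc).sum() / ((ylag - ylag.mean()) * rc).sum()

def _gini_phi(y):
    return _phi_from_pairs(np.column_stack([y[:-1], y[1:]]))

def _jackknife_sd(y):
    """Delete-1 jackknife SD, Eq. (9), deleting one (Y_t, Y_{t-1}) pair at a time."""
    pairs = np.column_stack([y[:-1], y[1:]])
    n = len(pairs)
    est = np.array([_phi_from_pairs(np.delete(pairs, i, axis=0)) for i in range(n)])
    return np.sqrt((n - 1) / n * ((est - est.mean()) ** 2).sum())

def gdf_test(y, M=200, alpha=0.05, rng=None):
    """Residual-based bootstrap GDF test; returns (GDF, w*_alpha, reject)."""
    if rng is None:
        rng = np.random.default_rng()
    y = np.asarray(y, dtype=float)
    phi = _gini_phi(y)
    gdf = (phi - 1.0) / _jackknife_sd(y)           # Eq. (15)
    resid = y[1:] - phi * y[:-1]                   # unrestricted residuals
    boot = []
    for _ in range(M):
        e = rng.choice(resid, size=len(resid), replace=True)
        ystar = np.median(y) + np.cumsum(e)        # Eq. (16): random walk, Y_0* = median
        boot.append((_gini_phi(ystar) - 1.0) / _jackknife_sd(ystar))  # Eq. (17)
    w_alpha = np.quantile(boot, alpha)             # bootstrap critical value w*_alpha
    return gdf, w_alpha, gdf < w_alpha             # Eq. (18): reject H0 if GDF < w*_alpha

rng = np.random.default_rng(42)
y = np.cumsum(rng.normal(size=100))  # a series with a unit root
print(gdf_test(y, M=100, rng=rng))
```

Because the bootstrapped series are built as random walks regardless of the data, the M bootstrap statistics approximate the null distribution of GDF, which is what makes the α-quantile a valid critical value.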
4. Results

This section studies, using simulations (implemented in the R language), the finite sample performances of the Gini-based stationarity test and of various existing unit root tests: (1) the DF test (critical values are taken from Hamilton, 1994, Table 17.1, Case 2), (2) the ERS−PT test (Elliott et al., 1996), (3) the NP−MZα^GLS, NP−MSB and NP−MZt^GLS tests with p = 0 and c̄ = −7 (Ng and Perron, 2001), (4) the EM−Q̂µ(10, 1) and EM−Q̂µ(10, 3.8) tests (Elliott and Müller, 2006) and (5) the three rank-based tests developed in Hallin et al. (2011) (denoted here as HAW-vdW, HAW-Laplace and HAW-Wilcoxon, critical values were taken from Hallin
Table 1
Empirical levels.

                  Innovations distribution
                  N(0, 1)   Laplace   Cauchy   Skew-normal
T = 50
DF                0.044     0.047     0.083    0.043
ERS−PT            0.040     0.049     0.022    0.043
NP−MZα            0.052     0.056     0.033    0.049
NP−MSB            0.042     0.051     0.048    0.047
NP−MZt            0.052     0.053     0.026    0.046
EM−Qµ(10, 1)      0.019     0.023     0.016    0.021
EM−Qµ(10, 3.8)    0.021     0.024     0.052    0.022
HAW-vdW           0.052     0.045     0.050    0.055
HAW-Laplace       0.046     0.048     0.055    0.057
HAW-Wilcoxon      0.048     0.047     0.056    0.058
GDF               0.059     0.053     0.021    0.058
T = 100
DF                0.044     0.041     0.078    0.047
ERS−PT            0.041     0.049     0.029    0.050
NP−MZα            0.055     0.056     0.036    0.055
NP−MSB            0.055     0.051     0.045    0.059
NP−MZt            0.052     0.053     0.033    0.050
EM−Qµ(10, 1)      0.036     0.039     0.023    0.048
EM−Qµ(10, 3.8)    0.035     0.031     0.056    0.038
HAW-vdW           0.054     0.038     0.054    0.047
HAW-Laplace      0.048     0.048     0.054    0.045
HAW-Wilcoxon      0.052     0.037     0.059    0.053
GDF               0.061     0.054     0.028    0.057
et al. (2011, Table 2)). Note that comparisons between the different tests are done with the same sequences of the i.i.d. noise and include empirical levels and powers of all test statistics. The numbers of observations in the series (T ) are 50 and 100.
4.1. The empirical levels
The simulation steps for each value of T in this scenario are as follows:
1. Create an original data set:
   • Generate a sample ε1, . . . , εT according to the desired distribution of the innovations: εt ∼ Normal(0, 1), Cauchy, Laplace (with location 0 and scale 1/√2, for which σε = 1) or Skew-normal (with shape parameter −10, mean 0, variance 1).
   • Build the series according to a unit root AR(1) model, i.e., Yt = Yt−1 + εt, where Y0 = 0, t = 1, . . . , T. Notice that in the finite sample simulations in Hallin et al. (2011) the resulting series includes Y0 as its first value.
   • Calculate the GDF test statistic (Eq. (15)) and the various existing test statistics (as detailed above).
   • Calculate the Gini-based residuals from the estimated model, e1, . . . , eT.
2. The resampling procedure:
   • Sample with replacement from the residuals e1, . . . , eT to obtain a bootstrap sample of new innovations e*1, . . . , e*T and construct the bootstrapped series so that Y_t^{G1*} = Y_{t−1}^{G1*} + e_t^{G1*} (under H0: φ1 = 1), as described in Eq. (16).
   • Calculate the statistic GDF* (Eq. (17)).
3. Repeat step 2 (the resampling procedure) M = 500 times and obtain M values of GDF*(m) (where m = 1, 2, . . . , M).
4. Perform the tests: for the Gini-based test, reject H0 if GDF < w*_α. For the existing test statistics under study, each test is performed according to its reference.
5. Repeat the above four steps R = 1000 times and calculate the proportion of the rejections of H0.
The detailed empirical levels of the examined test statistics are given in Table 1 for the nominal (theoretical) level α = 5%. It can be seen that the empirical levels of the Gini-based test statistic are relatively close to the nominal level (except for the case of Cauchy distributed innovations, for which the test statistic is conservative and rejects too infrequently). The same applies to the ERS−PT, NP−MZα^GLS, NP−MSB, NP−MZt^GLS and EM−Q̂µ(10, 1) test statistics. The DF test seems to over-reject in the case of the Cauchy distributed innovations. The empirical levels of the test statistics suggested in Hallin et al. (2011) are relatively close to the nominal level.
4.2. The empirical powers
Each series is created using the following model: Yt = φ1 Yt−1 + εt, where φ1 = 0.975, 0.95 and Y0 = aσε/√(1 − φ1²) with a = 0, 3, 6 (the different values of Y0 follow Elliott and Müller, 2006). For Cauchy, we use σε = 3 in the definition of Y0 (as in Hallin et al., 2011).
Table 2
Empirical powers for φ1 = 0.975.

                           T = 50                      T = 100
                  a = 0    a = 3    a = 6     a = 0    a = 3    a = 6
N(0, 1)
DF                0.054    0.043    0.074     0.072    0.106    0.248
ERS−PT            0.084    0.006    0.000     0.137    0.001    0.000
NP−MZα            0.100    0.006    0.000     0.163    0.001    0.000
NP−MSB            0.092    0.008    0.000     0.147    0.001    0.000
NP−MZt            0.089    0.006    0.000     0.153    0.001    0.000
EM−Qµ(10, 1)      0.043    0.005    0.000     0.104    0.003    0.000
EM−Qµ(10, 3.8)    0.034    0.010    0.000     0.081    0.017    0.001
HAW-vdW           0.030    0.070    0.111     0.023    0.075    0.306
HAW-Laplace       0.036    0.060    0.108     0.041    0.073    0.205
HAW-Wilcoxon      0.034    0.066    0.115     0.027    0.070    0.290
GDF               0.065    0.109    0.262     0.074    0.185    0.532
Laplace
DF                0.054    0.043    0.074     0.072    0.106    0.248
ERS−PT            0.089    0.012    0.000     0.146    0.000    0.000
NP−MZα            0.108    0.012    0.000     0.173    0.002    0.000
NP−MSB            0.098    0.013    0.000     0.149    0.001    0.000
NP−MZt            0.096    0.012    0.000     0.160    0.002    0.000
EM−Qµ(10, 1)      0.045    0.003    0.000     0.108    0.003    0.000
EM−Qµ(10, 3.8)    0.031    0.007    0.000     0.081    0.031    0.000
HAW-vdW           0.045    0.057    0.137     0.012    0.101    0.366
HAW-Laplace       0.049    0.073    0.167     0.014    0.143    0.433
HAW-Wilcoxon      0.042    0.067    0.158     0.013    0.123    0.420
GDF               0.070    0.115    0.252     0.083    0.223    0.547
Cauchy
DF                0.050    0.046    0.055     0.039    0.033    0.076
ERS−PT            0.034    0.024    0.013     0.077    0.050    0.030
NP−MZα            0.041    0.036    0.020     0.089    0.057    0.036
NP−MSB            0.058    0.042    0.029     0.099    0.063    0.040
NP−MZt            0.033    0.027    0.012     0.081    0.049    0.031
EM−Qµ(10, 1)      0.019    0.011    0.006     0.049    0.023    0.016
EM−Qµ(10, 3.8)    0.034    0.022    0.020     0.039    0.023    0.042
HAW-vdW           0.186    0.228    0.319     0.346    0.417    0.565
HAW-Laplace       0.249    0.307    0.394     0.429    0.535    0.635
HAW-Wilcoxon      0.229    0.282    0.367     0.413    0.498    0.647
GDF               0.095    0.109    0.132     0.212    0.266    0.329
Skew-normal
DF                0.059    0.051    0.071     0.056    0.080    0.204
ERS−PT            0.089    0.010    0.000     0.129    0.002    0.000
NP−MZα            0.127    0.013    0.000     0.160    0.001    0.000
NP−MSB            0.111    0.012    0.000     0.138    0.001    0.000
NP−MZt            0.116    0.011    0.000     0.151    0.002    0.000
EM−Qµ(10, 1)      0.051    0.002    0.000     0.098    0.000    0.000
EM−Qµ(10, 3.8)    0.040    0.019    0.000     0.057    0.023    0.000
HAW-vdW           0.039    0.055    0.128     0.014    0.095    0.375
HAW-Laplace       0.041    0.057    0.094     0.032    0.071    0.208
HAW-Wilcoxon      0.040    0.055    0.112     0.021    0.083    0.338
GDF               0.066    0.115    0.261     0.059    0.186    0.548
To examine the empirical power of the Gini-based test statistic, we use the bootstrap-based method as suggested in Section 4.1 for examining the empirical levels. The bootstrap resampling procedure is used because it is designed to reflect the null hypothesis, even though we do not know in advance whether the data at hand originate from a hypothesis that is close to or far from the null hypothesis. In other words, by using this procedure we estimate the distribution of the test statistic under H0, even when the original sample was drawn from a population that fails to satisfy H0. In our case, the bootstrapped series follow the unit root AR(1) model (Eq. (16)). Therefore, the critical values obtained by the bootstrap resampling procedure mimic as much as possible the true critical values under H0. In this approach, when the data set comes from H1, the proportion of rejections serves as an approximation of the power of the suggested bootstrap procedure (as defined in Davidson and MacKinnon, 2006). As such, we are able to estimate the rejection probabilities of the suggested test statistic under H1, when the procedure is used by a practitioner. To create data sets under H1, the only change from the empirical level simulation is that φ1 < 1 (more specifically, φ1 = 0.975, 0.95). As φ1 gets larger and closer to 1, it approaches the null hypothesis, and therefore, the power is expected to decrease (for a given sample size T).

The detailed empirical powers of the examined test statistics are given in Tables 2 and 3 for φ1 = 0.975 and 0.95, respectively, for the nominal (theoretical) level α = 5%. The statistics that have the highest empirical powers in each scenario are printed in bold. The results indicate that, as expected, the empirical powers of the Gini-based test statistic increase with the sample size and as the alternative departs from H0 (as φ1 decreases). Furthermore, the empirical powers of the Gini-based test statistic, the DF test statistic and the test statistics suggested by Hallin et al. (2011) (HAW-vdW, HAW-Laplace and HAW-Wilcoxon) increase as a increases (as Y0 departs from 0), unlike the empirical powers of ERS−PT, NP−MZα^GLS, NP−MSB
8
A. Shelef / Computational Statistics and Data Analysis xx (xxxx) xxx–xxx Table 3 Empirical powers for φ1 = 0.95. T = 50
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
T = 100
a=0
a=3
a=6
a=0
a=3
a=6
N (0, 1)
DF ERS − PT NP − MZα NP − MSB NP − MZt EM − Q µ (10, 1) EM − Q µ (10, 3.8) HAW-vdW HAW-Laplace HAW-Wilcoxon GDF
0.069 0.149 0.171 0.149 0.157 0.074 0.047 0.020 0.027 0.018 0.055
0.090 0.003 0.006 0.008 0.005 0.002 0.023 0.070 0.063 0.070 0.185
0.224 0.000 0.000 0.000 0.000 0.000 0.000 0.244 0.184 0.250 0.512
0.121 0.272 0.314 0.269 0.299 0.213 0.129 0.002 0.010 0.003 0.100
0.198 0.002 0.003 0.002 0.002 0.007 0.104 0.062 0.055 0.065 0.304
0.549 0.000 0.000 0.000 0.000 0.000 0.020 0.375 0.251 0.375 0.789
Laplace
DF ERS − PT NP − MZα NP − MSB NP − MZt EM − Q µ (10, 1) EM − Q µ (10, 3.8) HAW-vdW HAW-Laplace HAW-Wilcoxon GDF
0.069 0.122 0.145 0.122 0.141 0.060 0.042 0.023 0.033 0.018 0.076
0.085 0.001 0.004 0.007 0.003 0.000 0.014 0.087 0.113 0.100 0.195
0.241 0.000 0.000 0.000 0.000 0.000 0.002 0.340 0.375 0.387 0.547
0.109 0.283 0.314 0.275 0.304 0.198 0.117 0.006 0.018 0.007 0.097
0.213 0.002 0.004 0.002 0.004 0.007 0.099 0.073 0.130 0.090 0.327
0.587 0.000 0.000 0.000 0.000 0.000 0.032 0.531 0.512 0.591 0.825
Cauchy
DF ERS − PT NP − MZα NP − MSB NP − MZt EM − Q µ (10, 1) EM − Q µ (10, 3.8) HAW-vdW HAW-Laplace HAW-Wilcoxon GDF
0.041 0.074 0.076 0.089 0.074 0.032 0.023 0.248 0.317 0.285 0.180
0.046 0.031 0.043 0.057 0.032 0.019 0.018 0.297 0.384 0.360 0.241
0.078 0.011 0.020 0.034 0.016 0.005 0.018 0.436 0.527 0.500 0.348
0.056 0.172 0.212 0.207 0.191 0.119 0.054 0.389 0.483 0.460 0.390
0.083 0.102 0.121 0.122 0.115 0.074 0.076 0.443 0.507 0.511 0.461
0.132 0.067 0.072 0.061 0.065 0.036 0.083 0.574 0.627 0.638 0.486
Skew-normal
DF ERS − PT NP − MZα NP − MSB NP − MZt EM − Q µ (10, 1) EM − Q µ (10, 3.8) HAW-vdW HAW-Laplace HAW-Wilcoxon GDF
0.073 0.137 0.152 0.130 0.147 0.071 0.043 0.019 0.034 0.018 0.071
0.094 0.000 0.001 0.001 0.001 0.001 0.017 0.082 0.080 0.084 0.180
0.205 0.000 0.000 0.000 0.000 0.000 0.001 0.318 0.202 0.298 0.551
0.117 0.293 0.318 0.272 0.313 0.205 0.122 0.002 0.018 0.002 0.114
0.207 0.002 0.003 0.002 0.003 0.004 0.101 0.060 0.067 0.064 0.315
0.558 0.000 0.000 0.000 0.000 0.000 0.021 0.495 0.265 0.452 0.806
and NP − MZtGLS , EM − Qˆ µ (10, 1) and EM − Qˆ µ (10, 3.8), which decrease as a increases. In addition, the Gini-based test statistic seems to keep a relatively steady empirical power for all the distributions examined in this study, except the Cauchy distribution. For example, when T = 100 and φ1 = 0.95, the empirical powers for all distributions except the Cauchy are around 10%, 31% and 80% for a = 0, 3 and 6, respectively, while for the Cauchy distribution the empirical powers are 39%, 46.1% and 48.6% for a = 0, 3 and 6, respectively. The results indicate that when Y0 = 0, the GDF gives competitive and, in some cases, higher powers relative to the results of the HAW-vdW, HAW-Laplace and HAW-Wilcoxon statistics. This occurs in almost all cases other than the Cauchy distribution and mainly for T = 100. Still, when Y0 = 0, the Gini-based test statistic does not achieve higher powers compared to the results for the NP − MZαGLS , NP − MSB and NP − MZtGLS tests, especially for T = 100. Furthermore, for the Cauchy distribution, HAW-vdW, HAW-Laplace and HAW-Wilcoxon are clearly superior to the GDF (by up to 32% for T = 100, φ1 = 0.975 and a = 6), but this is not the case for other distributions. When Y0 ̸= 0, the GDF test statistic clearly outperforms its competitors and its power increases as Y0 increases for Normal, Laplace and Skew-normal innovations. When the innovations are normally distributed, the GDF seems to be from 3% to 26% more powerful (for T = 50, φ1 = 0.975 and a = 3 and for T = 50, φ1 = 0.95 and a = 6, respectively). With Laplace innovations, the improvement is from 4% to 23% (for T = 50, φ1 = 0.975 and a = 3 and for T = 100, φ1 = 0.95 and a = 6, respectively) and with Skew-normal innovations it is from 5% to 25% (for T = 50, φ1 = 0.975 and a = 3 and for T = 100, φ1 = 0.95 and a = 6, respectively).
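The GDF comparisons above rest on the semi-parametric Gini regression slope for the AR(1) model. As a minimal sketch of that estimator (following the covariance-with-ranks form of Gini regression in Olkin and Yitzhaki, 1992; the function name is illustrative and ties in the lagged values are assumed away):

```python
import numpy as np

def gini_ar1_slope(y):
    """Gini regression slope of y_t on y_{t-1}:
    cov(y_t, rank(y_{t-1})) / cov(y_{t-1}, rank(y_{t-1}))."""
    x, z = y[:-1], y[1:]            # regressor (lag) and response
    r = np.argsort(np.argsort(x)) + 1  # ranks of the lagged values
    num = np.cov(z, r)[0, 1]
    den = np.cov(x, r)[0, 1]
    return num / den
```

Because the regressor enters only through its ranks while the response enters through its values, the slope is exactly recovered for any exact linear recursion, e.g. y_t = 0.5·y_{t−1} + 1 yields a slope of 0.5.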
5. Discussion and conclusions

This paper develops a Gini-based unit root test that relies on the semi-parametric Gini regression to estimate the coefficient of the AR(1) model and on the residual bootstrap to find critical values for the test. The performance of the Gini-based stationarity test is examined for Normal, Laplace, Skew-normal and Cauchy distributions, which represent a variety of symmetric and asymmetric distributions, both with and without heavy tails. When comparing the Gini-based test to other existing tests, the Gini-based test statistic does not have an advantage when Y0 = 0 (compared to Ng and Perron, 2001) or when the innovations are drawn from the Cauchy distribution (compared to Hallin et al., 2011). For Y0 ≠ 0, however, the Gini-based test has much larger power in most design settings. A possible explanation is that the Gini-based autoregression method takes into account both the ranks and the variate values of the observations, whereas rank-based methods discard the variate values. Thus, the tests suggested by Hallin et al. (2011), which rely only on the ranks, have larger power for the very heavy-tailed Cauchy distribution. On the other hand, the Gini-based test statistic, which takes the variate values into account as well as the ranks, has larger power for distributions that are less heavy-tailed, including the Normal and Laplace (symmetric) and Skew-normal (asymmetric) distributions.

Future research should broaden the Gini-based framework to test for cointegration of nonstationary time series, based on the Gini regression coefficients and on the bootstrap to find critical values.

Acknowledgments

The author thanks Edna Schechtman, Shlomo Yitzhaki and Robert Serfling for their helpful comments. The author would also like to thank the anonymous reviewers for their thorough review and highly appreciates their comments and suggestions, which significantly contributed to improving the quality of the paper.
The author also thanks Ramon van den Akker for sharing his Matlab code.

References

Box, G.E.P., Jenkins, G.M., 1976. Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco.
Carcea, M., Serfling, R., 2012. A Gini autocovariance function for heavy tailed time series modeling. Available at http://www.utdallas.edu/~serfling/papers/Gini_autocov_fcn_Feb_2013.pdf.
Davidson, R., MacKinnon, J.G., 2006. The power of bootstrap and asymptotic tests. J. Econometrics 133 (2), 421–441.
Dickey, D.A., Fuller, W.A., 1979. Distribution of the estimators for autoregressive time series with a unit root. J. Amer. Statist. Assoc. 74 (366a), 427–431.
Efron, B., 1982. The Jackknife, the Bootstrap, and Other Resampling Plans, Vol. 38. Society for Industrial and Applied Mathematics, Philadelphia.
Elliott, G., Müller, U.K., 2006. Minimizing the impact of the initial condition on testing for unit roots. J. Econometrics 135 (1–2), 285–310.
Elliott, G., Rothenberg, T.J., Stock, J.H., 1996. Efficient tests for an autoregressive unit root. Econometrica 64 (4), 813–836. http://dx.doi.org/10.2307/2171846.
Ferretti, N., Romo, J., 1996. Unit root bootstrap tests for AR(1) models. Biometrika 83 (4), 849–860.
Gini, C., 1914. On the measurement of concentration and variability of characters. Metron 63 (2005), 3–38.
Hallin, M., van den Akker, R., Werker, B.J.M., 2011. A class of simple distribution-free rank-based unit root tests. J. Econometrics 163 (2), 200–214.
Hamilton, J.D., 1994. Time Series Analysis. Princeton University Press, New Jersey.
Harvey, D.I., Leybourne, S.J., 2005. On testing for unit roots and the initial observation. Econom. J. 8 (1), 97.
Harvey, D.I., Leybourne, S.J., Taylor, A.M.R., 2009. Unit root testing in practice: dealing with uncertainty over the trend and initial condition. Econometric Theory 25 (3), 587–636.
Hoeffding, W., 1948. A class of statistics with asymptotically normal distribution. Ann. Math. Stat. 19 (3), 293–325.
Lerman, R.I., Yitzhaki, S., 1984. A note on the calculation and interpretation of the Gini index. Econom. Lett. 15 (3), 363–368.
Leybourne, S.J., 1995. Testing for unit roots using forward and reverse Dickey–Fuller regressions. Oxford Bull. Econ. Stat. 57 (4), 559–571.
Li, H., Maddala, G.S., 1996. Bootstrapping time series models. Econometric Rev. 15 (2), 115–158.
Moreno, M., Romo, J., 2000. Bootstrap tests for unit roots based on LAD estimation. J. Statist. Plann. Inference 83 (2), 347–367.
Moreno, M., Romo, J., 2012. Unit root bootstrap tests under infinite variance. J. Time Ser. Anal. 33 (1), 32–47.
Müller, U.K., Elliott, G., 2003. Tests for unit roots and the initial condition. Econometrica 71 (4), 1269.
Ng, S., Perron, P., 2001. Lag length selection and the construction of unit root tests with good size and power. Econometrica 69 (6), 1519–1554.
Olkin, I., Yitzhaki, S., 1992. Gini regression analysis. Internat. Statist. Rev. 60, 185–196.
Palm, F.C., Smeekes, S., Urbain, J.P., 2008. Bootstrap unit root tests: comparison and extensions. J. Time Ser. Anal. 29 (2), 371–401.
Paparoditis, E., Politis, D.N., 2005. Bootstrapping unit root tests for autoregressive time series. J. Amer. Statist. Assoc. 100, 545–553.
Schechtman, E., Yitzhaki, S., 1987. A measure of association based on Gini mean difference. Comm. Statist. Theory Methods 16 (1), 207–231. http://dx.doi.org/10.1080/03610928708829359.
Schechtman, E., Yitzhaki, S., 1999. On the proper bounds of the Gini correlation. Econom. Lett. 63 (2), 133–138. http://dx.doi.org/10.1016/S0165-1765(99)00033-6.
Serfling, R., Xiao, P., 2007. A contribution to multivariate L-moments: L-comoment matrices. J. Multivariate Anal. 98 (9), 1765–1781.
Shelef, A., 2013. Statistical Analyses Based on Gini for Time Series Data (Ph.D. dissertation). Ben-Gurion University of the Negev.
Shelef, A., Schechtman, E., 2011. A Gini-based methodology for identifying and analyzing time series with non-normal innovations. Preprint.
Stuart, A., Ord, J.K., 1987. Kendall's Advanced Theory of Statistics, Vol. 1, fifth ed. Oxford University Press, New York.
Yitzhaki, S., 1996. On using linear regressions in welfare economics. J. Bus. Econom. Statist. 14 (4), 478–486.
Yitzhaki, S., 2003. Gini's mean difference: a superior measure of variability for non-normal distributions. Metron 61 (2), 285–316.
Yitzhaki, S., Schechtman, E., 2013. The Gini Methodology. Springer, New York.