Mean squared error properties of the kernel-based multi-stage median predictor for time series

Statistics & Probability Letters 56 (2002) 51–56

Mean squared error properties of the kernel-based multi-stage median predictor for time series

Jan G. De Gooijer a,*, Ali Gannoun b, Dawit Zerom c

a Department of Economic Statistics, University of Amsterdam, Roetersstraat 11, 1018 WB Amsterdam, The Netherlands
b Laboratoire de Probabilités et Statistique, Université Montpellier II, Place Eugène Bataillon, 34095 Montpellier Cédex 5, France
c Tinbergen Institute and Department of Economic Statistics, University of Amsterdam, Roetersstraat 11, 1018 WB Amsterdam, The Netherlands

Received February 2000; received in revised form September 2000

* Corresponding author. Tel.: +31-20-525-4244; fax: +31-525-4349. E-mail addresses: [email protected] (J.G. De Gooijer), [email protected] (A. Gannoun), [email protected] (D. Zerom).

Abstract

We propose a kernel-based multi-stage conditional median predictor for $\alpha$-mixing time series of Markovian structure. Mean squared error properties of single-stage and multi-stage conditional medians are derived and discussed. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: $\alpha$-mixing; Conditional median; Kernel; Markovian; Mean squared error; Multi-stage predictor; Single-stage predictor; Time series

1. Introduction

The vast majority of prediction methods used in nonparametric estimation are based on the conditional mean. Recently, interest has focussed on nonparametric estimation of other aspects of the conditional distribution function as a measure of uncertainty; see, e.g., Berlinet et al. (2001), Gannoun (1990), Jones and Hall (1990), Samanta (1989), Matzner-Løber et al. (1998), and De Gooijer and Zerom (2000). In particular, the conditional median has found increasing use as a general tool for making predictions, especially in the presence of outliers or when the conditional distribution is heavy-tailed and/or asymmetric.

However, one typical problem of nonparametric prediction methods is that, when making more than one-step-ahead predictions, not all the information contained in the past is used. This is particularly the case for time series of the Markovian type, so a loss in prediction accuracy is likely to occur. This paper discusses the important problem of making multi-step-ahead predictions using the conditional median in a Markovian structure. To be specific, let $\{W_t;\ t \ge 1\}$ be a strictly stationary real-valued Markovian



process of order $m$, i.e. $\mathcal{L}(W_t \mid W_1, \ldots, W_{t-1}) = \mathcal{L}(W_t \mid W_{t-m}, \ldots, W_{t-1})$, where $\mathcal{L}$ denotes the law. Using the set of observations $W_1, \ldots, W_N$, we are interested in making predictions of $W_{N+H}$ ($1 \le H \le N - m$), where $H$ denotes the prediction horizon. From $\{W_t\}$, let us construct the associated process $U_t = (X_t, Y_t^{(1)}, Y_t^{(2)}, \ldots, Y_t^{(H-1)}, Z_t)$, where

$$X_t = (W_t, \ldots, W_{t+m-1}), \quad Y_t^{(1)} = X_{t+1}, \quad Y_t^{(2)} = X_{t+2}, \quad \ldots, \quad Y_t^{(H-1)} = X_{t+(H-1)}$$

and

$$Z_t = W_{t+H+m-1} \qquad (t = 1, \ldots, n),$$

with $n = N - H - m + 1$. Assume that the process $\{U_t\}$ is a sequence of strictly stationary random vectors with the same distribution as a vector $U$ defined on a probability space $(\Omega, \mathcal{F}, P)$. Note that $U$ possesses the Markov property $\mathcal{L}(Z \mid Y^{(H-1)}, Y^{(H-2)}, \ldots, Y^{(1)}, X) = \mathcal{L}(Z \mid Y^{(H-1)})$.

The "best" predictor of $W_{N+H}$ in the least absolute sense (or the $L_1$-approximation) is the conditional median of $Z_t$ given $X_t$. More generally, the conditional median is defined as the root of the equation $F(z|x) = 1/2$, where $F(\cdot|x)$ is the conditional distribution function of $Z$ given $X = x$. The conditional median will be denoted by $\theta(x)$. The well-known Nadaraya–Watson (N–W) estimator of $F(z|x)$ from $n$ realizations $(X_t, Z_t)$ of $(X, Z)$, ignoring the information in the variables $Y_t^{(i)}$ ($i = 1, \ldots, H-1$), is given by

$$\hat F_n(z|x) = \frac{\sum_{t=1}^{n} K((x - X_t)/h_{n,1})\, 1_{\{Z_t \le z\}}}{\sum_{t=1}^{n} K((x - X_t)/h_{n,1})}, \qquad (1)$$

where $K(\cdot)$ is a nonnegative density function (kernel), $h_{n,1}$ is a smoothing parameter called the bandwidth, and $1_A$ denotes the indicator function of the set $A$. It is natural to estimate $\theta(x)$ by the root of the equation $\hat F_n(z|x) = 1/2$. In the sequel, we shall denote this estimator by $\hat\theta_n(x)$.

An estimator of the conditional median which utilizes the information above can be constructed by exploiting the Markovian structure of $\{U_t\}$ in the following way:

$$E(1_{\{Z \le z\}} \mid X) = E\Bigl( E\bigl( \cdots E\bigl( E(1_{\{Z \le z\}} \mid Y^{(H-1)}) \bigm| Y^{(H-2)} \bigr) \cdots \bigm| Y^{(1)} \bigr) \Bigm| X \Bigr).$$

For simplicity of presentation we assume in the rest of the paper that $H = 2$ and $m = 1$. This implies $U_t = (X_t, Y_t, Z_t)$, where $Y_t = Y_t^{(1)}$. Define $g(z,y) = E(1_{\{Z \le z\}} \mid Y = y)$. Then the regular N–W estimator $\hat g_n(z,y)$, say, of $g(z,y)$ is defined by

$$\hat g_n(z,y) = \frac{\sum_{j=1}^{n} K((y - Y_j)/h_{n,2})\, 1_{\{Z_j \le z\}}}{\sum_{j=1}^{n} K((y - Y_j)/h_{n,2})},$$

where $h_{n,2}$ is another bandwidth. Hence, an estimator of the conditional median which utilizes the information contained in $Y_t$ can be based on the following estimator of the conditional distribution function:

$$\tilde F_n(z|x) = \frac{\sum_{t=1}^{n} K((x - X_t)/h_{n,1})\, \hat g_n(z, Y_t)}{\sum_{t=1}^{n} K((x - X_t)/h_{n,1})}.$$

A two-stage nonparametric estimator of $\theta(x)$, the so-called two-stage kernel smoother, is defined as the root of the equation $\tilde F_n(z|x) = 1/2$. We denote this estimator by $\tilde\theta_n(x)$.
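To make the two estimators concrete, the following minimal NumPy sketch (ours, not the authors') implements $\hat\theta_n(x)$ and $\tilde\theta_n(x)$ for the case $H = 2$, $m = 1$, assuming a Gaussian kernel, a simple grid search for the root of $F_n(z|x) = 1/2$, and a simulated AR(1) series; the function names, bandwidth values, and evaluation point are illustrative assumptions.

```python
import numpy as np

def gauss_kernel(u):
    """Gaussian kernel; bounded, even, with finite second moment, cf. (A.2)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def f_hat(z, x, X, Z, h1):
    """Single-stage N-W estimate of F(z|x), as in (1)."""
    w = gauss_kernel((x - X) / h1)
    return np.sum(w * (Z <= z)) / np.sum(w)

def g_hat(z, y, Y, Z, h2):
    """First-stage N-W estimate of g(z, y) = E(1{Z <= z} | Y = y)."""
    w = gauss_kernel((y - Y) / h2)
    return np.sum(w * (Z <= z)) / np.sum(w)

def f_tilde(z, x, X, Y, Z, h1, h2):
    """Two-stage estimate of F(z|x): smooth the g_hat(z, Y_t) against X_t."""
    w = gauss_kernel((x - X) / h1)
    g = np.array([g_hat(z, y_t, Y, Z, h2) for y_t in Y])
    return np.sum(w * g) / np.sum(w)

def median_root(F, Z, grid_size=100):
    """Crude root of F(z) = 1/2 via a grid of candidate z values."""
    zs = np.linspace(Z.min(), Z.max(), grid_size)
    vals = np.array([F(z) for z in zs])
    return zs[np.argmin(np.abs(vals - 0.5))]

# Illustration with H = 2, m = 1: X_t = W_t, Y_t = W_{t+1}, Z_t = W_{t+2}.
rng = np.random.default_rng(0)
W = np.zeros(300)
for t in range(1, 300):        # a stationary AR(1); such series are strongly mixing
    W[t] = 0.6 * W[t - 1] + rng.standard_normal()
X, Y, Z = W[:-2], W[1:-1], W[2:]
x0, h1, h2 = 0.5, 0.4, 0.2     # h2 < h1, in the spirit of h_{n,2} = o(h_{n,1})

theta_hat = median_root(lambda z: f_hat(z, x0, X, Z, h1), Z)
theta_tilde = median_root(lambda z: f_tilde(z, x0, X, Y, Z, h1, h2), Z)
print(theta_hat, theta_tilde)  # single-stage vs. two-stage conditional median at x0
```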


One way of assessing the performance of $\hat\theta_n(x)$ and $\tilde\theta_n(x)$ as estimates of $\theta(x)$ is to derive their mean squared errors (MSEs). The purpose of this paper is to obtain the leading terms in the asymptotic expansion of the MSE for both estimators, ignoring boundary effects in the computation of $\hat\theta_n(x)$ and $\tilde\theta_n(x)$. It will be shown that the asymptotic MSE of the two-stage predictor $\tilde\theta_n(x)$ is smaller than the asymptotic MSE of the single-stage predictor $\hat\theta_n(x)$. This improvement is due to a lower asymptotic variance of $\tilde\theta_n(x)$, whereas the asymptotic bias of $\hat\theta_n(x)$ and $\tilde\theta_n(x)$ remains the same. The results will be derived under stationarity and strong mixing (or $\alpha$-mixing) conditions.

2. Basic assumptions and notations

Throughout the paper, the following notations are used. Let $p_X(x)$ and $p_Y(y)$ be the marginal densities of $X$ and $Y$, respectively. Also, let $p_{X,Y}(x,y)$ and $p_{Y,Z}(y,z)$ be the joint densities of $(X,Y)$ and $(Y,Z)$, respectively, and let $p_{Z|X}(z|x) = p_{X,Z}(x,z)/p_X(x)$ be the conditional density function of $Z$ given $X = x$. Furthermore, the conditional means and variances are defined as follows:

$$v(z,y) = \mathrm{Var}(1_{\{Z \le z\}} \mid Y = y), \qquad w(z,x) = E(v(z,Y) \mid X = x),$$
$$u(z,x) = \mathrm{Var}(g(z,Y) \mid X = x), \qquad \sigma^2(z,x) = \mathrm{Var}(1_{\{Z \le z\}} \mid X = x).$$

We also introduce the shorthand notation $G^{(i,j)}(u,v) = \partial^{i+j} G(u,v)/\partial u^i \partial v^j$. The asymptotic results on the MSEs of the estimators $\hat\theta_n(x)$ and $\tilde\theta_n(x)$ will be derived under a set of assumptions, gathered below for ease of reference.

(A.1) Let $U_t = (X_t, Y_t, Z_t)$. The sequence $\{U_t;\ t \ge 1\}$ is strongly mixing ($\alpha$-mixing); see Rosenblatt (1956) or Roussas and Ioannides (1987).

(A.2) The kernel $K$ is bounded and even, $|u|K(u) \to 0$ as $|u| \to \infty$, and $\int_{-\infty}^{\infty} u^2 K(u)\,du < \infty$.

(A.3) The sequence $(h_{n,1})_{n \ge 1}$ is such that $n h^3_{n,1} \to \infty$ as $n \to \infty$, with the mixing coefficient $\alpha$ satisfying the following requirements: (i) there exists a sequence $(m_n)$ of real numbers increasing to infinity such that $\exists A < \infty$, $\forall n \ge 3$: $1 \le m_n \le n/2$, $n[\alpha(m_n)]^{2m_n/3n} \le A$, and $n h^2_{n,1}/(m_n \ln n) \to \infty$; (ii) there exists a sequence of real numbers $(c_n)$ such that $1 \le c_n \le n$, $c_n \to \infty$, and $c_n h_{n,1} \to 0$, where $(1/h^{2\gamma}_{n,1}) \sum_{j=c_n}^{\infty} \alpha^{\gamma}(j) \to 0$ with $\gamma \in [0,1)$.

(A.4) The functions $F(z|\cdot)$ and $g(z,\cdot)$ are twice continuously differentiable.

(A.5) Let $T$ be equal to either $X$ or $Y$. (i) The marginal density $p_T(\cdot)$ is bounded below by a constant $\beta > 0$; its derivative $p^{(i)}_T(\cdot)$ exists and is bounded and integrable for $0 \le i \le 2$. (ii) $p^{(i,j)}_{T,Z}(\cdot,\cdot)$ exists and is bounded and integrable for $0 \le i + j \le 2$. (iii) Let $s_j = (t_j, z_j)$. For all $j > 1$, the density $f_{1,j}$ of the pair $(s_1, s_j)$ exists and satisfies $|f_{1,j}(s_1, s_j) - f(s_1)f(s_j)| \le c$, where $f(s) = p_{T,Z}(t,z)$.

(A.6) $\theta(x)$ exists and is unique.

Some comments on the above assumptions are in order. Assumption (A.1) is more realistic than the assumption that $\{U_t\}$ is independent; Chen (1996) uses the latter assumption to derive the asymptotic MSE of the kernel-based two-stage nonparametric estimator of the conditional mean in the case $H = 2$, $m = 1$. In fact, the strong mixing condition introduced here is weaker than many other mixing modes and dependence conditions, for example $m$-dependence, $\rho$-mixing, absolute regularity, and $\phi$-mixing. ARMA time series models are strong mixing with $\alpha(k) = O(e^{-sk})$, for some $s > 0$, under weak assumptions; for an account of this information see, e.g., Pham and Tran (1985). Assumption (A.2) is classical in kernel nonparametric estimation; the Gaussian density satisfies this assumption, which is usually employed to evaluate the bias of an estimator. Assumption (A.3)(i) is used to prove convergence in probability of the estimator $\hat p_{Z|X}(z|x)$ to $p_{Z|X}(z|x)$; see Section 4. Assumption (A.3)(ii) is necessary to evaluate the variance of the estimators.


In the iid case, this assumption reduces to $n h^2_{n,1}/\ln n \to \infty$ as $n \to \infty$. If the process $\{U_t\}$ is geometrically mixing, i.e. $\alpha(k) \le c_0 \rho^k$ with $c_0 > 0$ and $\rho \in [0,1)$, one can choose $m_n = c \ln n$ with $c > -\ln \rho$; see Boente and Fraiman (1995). In that case Assumption (A.3)(ii) can be reduced to $\sum_{n=1}^{\infty} (n^{1/2} h_{n,1})^{-\delta} < \infty$, where $\delta > 2$; see Roussas (1991). Assumption (A.4) is useful for determining the asymptotic bias of the estimators. Assumption (A.5) will be used in many steps to prove the convergence of $\hat p_{Z|X}(z|x)$ to $p_{Z|X}(z|x)$. Finally, Assumption (A.6) is introduced in order to obtain consistency of the estimates of the conditional median.

3. The MSE of $\hat\theta_n(x)$ and $\tilde\theta_n(x)$

Theorem 1. Assume that Assumptions (A.1)–(A.6) are satisfied. Define the constants $R(K) = \int K^2(u)\,du$ and $\mu_2(K) = \int u^2 K(u)\,du$. If $x \in \{u \mid p_X(u) > 0\}$ and $p_{Z|X}(\theta(x)|x) \ne 0$, the following results hold:

(i) $\displaystyle \mathrm{MSE}\{\hat\theta_n(x)\} \approx \frac{1}{p^2_{Z|X}(\theta(x)|x)} \left[ \frac{h^4_{n,1}}{4}\, B^2(\theta(x),x) + \frac{1}{n h_{n,1}}\, \frac{R(K)}{4 p_X(x)} \right],$

where the asymptotic bias function is given by

$$B(\theta(x),x) = \left[ F^{(0,2)}(\theta(x)|x) + \frac{2 p^{(1)}_X(x)\, F^{(0,1)}(\theta(x)|x)}{p_X(x)} \right] \mu_2(K).$$

Furthermore, if $B(\theta(x),x) \ne 0$, it follows that the asymptotically optimal value $h^*_n$, say, of $h_{n,1}$ is given by

(ii) $\displaystyle h^*_n = \left( \frac{R(K)}{4 B^2(\theta(x),x)\, p_X(x)} \right)^{1/5} n^{-1/5}$

and the corresponding best possible MSE is

(iii) $\displaystyle \mathrm{MSE}^*\{\hat\theta_n(x)\} \approx \frac{n^{-4/5}}{p^2_{Z|X}(\theta(x)|x)}\, \frac{5}{4}\, \bigl(B^2(\theta(x),x)\bigr)^{1/5} \left( \frac{R(K)}{4 p_X(x)} \right)^{4/5}.$

Theorem 2. Assume that Assumptions (A.1)–(A.6) are satisfied. If $x \in \{u \mid p_X(u) > 0\}$, $p_{Z|X}(\theta(x)|x) \ne 0$, and $h_{n,2} = o(h_{n,1})$, the following results hold:

(i) $\displaystyle \mathrm{MSE}\{\tilde\theta_n(x)\} \approx \frac{1}{p^2_{Z|X}(\theta(x)|x)} \left[ \frac{h^4_{n,1}}{4}\, B^2(\theta(x),x) + \frac{1}{n h_{n,1}}\, \frac{u(\theta(x),x)\, R(K)}{p_X(x)} \right].$

Furthermore, if $B(\theta(x),x) \ne 0$, it follows that the asymptotically optimal value $h^{**}_n$, say, of $h_{n,1}$ is given by

(ii) $\displaystyle h^{**}_n = \left( \frac{u(\theta(x),x)\, R(K)}{B^2(\theta(x),x)\, p_X(x)} \right)^{1/5} n^{-1/5}$

and the corresponding best possible MSE is

(iii) $\displaystyle \mathrm{MSE}^*\{\tilde\theta_n(x)\} \approx \frac{n^{-4/5}}{p^2_{Z|X}(\theta(x)|x)}\, \frac{5}{4}\, \bigl(B^2(\theta(x),x)\bigr)^{1/5} \left( \frac{u(\theta(x),x)\, R(K)}{p_X(x)} \right)^{4/5}.$

Corollary. Under the assumptions of Theorems 1 and 2, the ratio of the asymptotic best possible MSEs of $\hat\theta_n(x)$ and $\tilde\theta_n(x)$ is given by

$$r(\theta(x),x) = \left( 1 + \frac{w(\theta(x),x)}{u(\theta(x),x)} \right)^{4/5} \ge 1.$$
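The theorems state the optimal bandwidth and the best possible MSE without the intermediate minimization. As a worked step (our reconstruction, using shorthand constants $C_1$, $C_2$ that do not appear in the paper), parts (ii) and (iii) of Theorem 1 follow from (i) as shown below; Theorem 2 is obtained identically with $C_2 = u(\theta(x),x)\,R(K)/p_X(x)$.

```latex
% Write the bracketed terms of Theorem 1(i) as C_1 h^4 + C_2/(n h), with
% C_1 = B^2(\theta(x),x)/4 and C_2 = R(K)/(4 p_X(x)). Setting the
% h-derivative to zero,
\[
  \frac{d}{dh}\Bigl( C_1 h^4 + \frac{C_2}{n h} \Bigr)
  = 4 C_1 h^3 - \frac{C_2}{n h^2} = 0
  \;\Longrightarrow\;
  h^5 = \frac{C_2}{4 C_1 n}
  \;\Longrightarrow\;
  h^*_n = \Bigl( \frac{R(K)}{4 B^2(\theta(x),x)\, p_X(x)} \Bigr)^{1/5} n^{-1/5},
\]
% which is (ii); substituting h^*_n back in gives
\[
  C_1 (h^*_n)^4 + \frac{C_2}{n h^*_n}
  = 5 \Bigl( \frac{C_1 C_2^4}{4^4} \Bigr)^{1/5} n^{-4/5}
  = \frac{5}{4}\,\bigl(B^2(\theta(x),x)\bigr)^{1/5}
    \Bigl( \frac{R(K)}{4 p_X(x)} \Bigr)^{4/5} n^{-4/5},
\]
% and dividing by p^2_{Z|X}(\theta(x)|x) yields (iii).
```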


Remark 1. Note that the asymptotic MSE of $\tilde\theta_n(x)$ is smaller than the asymptotic MSE of $\hat\theta_n(x)$ by a factor which depends on the ratio of the conditional variances $w(\theta(x),x)$ and $u(\theta(x),x)$. Further, we see from Theorem 2 and the Corollary that the asymptotic results are insensitive to the choice of the bandwidth $h_{n,2}$, provided $n h^2_{n,2} \to \infty$ and $h_{n,2} = o(h_{n,1})$.

Remark 2. The Corollary can also be proved for the case $H > 2$.

4. Sketch of the proofs

The first part of this section relates to the proofs of both Theorems 1 and 2; only the proof of Theorem 2 will be given in more explicit form. Let $F_n(z|x)$ denote an estimator of $F(z|x)$, i.e. $F_n(z|x)$ can be either $\hat F_n(z|x)$ or $\tilde F_n(z|x)$. Similarly, let $\theta_n(x)$ denote an estimator of $\theta(x)$, i.e. $\theta_n(x)$ can be either $\hat\theta_n(x)$ or $\tilde\theta_n(x)$. A Taylor expansion of $F(\theta(x)|x)$ about $\theta_n(x)$ and various approximations (see, e.g., Lemma D in Serfling, 1980, p. 97) give

$$F(\theta(x)|x) = 1/2 = F_n(\theta_n(x)|x) = F_n(\theta(x)|x) + (\theta_n(x) - \theta(x))\, \hat p_{Z|X}(\theta^*|x), \qquad (2)$$

where $\theta^*$ is some random point between $\theta(x)$ and $\theta_n(x)$, and where $\hat p_{Z|X}(\cdot|x)$ is an estimator of the conditional density function, whose estimation can be regarded as a nonparametric regression problem. To see this, note that, as $h_{n,1} \to 0$,

$$E\left[ \frac{1}{h_{n,1}} K\!\left( \frac{Z - z}{h_{n,1}} \right) \,\Big|\, X = x \right] \to p_{Z|X}(z|X = x). \qquad (3)$$

The left-hand side of (3) can be regarded as the regression of $(1/h_{n,1}) K((Z_t - z)/h_{n,1})$ on $X_t$; see Fan and Gijbels (1996). Now, by Assumptions (A.1)–(A.3) and (A.5), one can prove that

$$\hat p_{Z|X}(\theta^*|x) = p_{Z|X}(\theta(x)|x) + o_p(1). \qquad (4)$$
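As a concrete illustration of this regression view of (3) (a sketch under our own naming, not code from the paper), $\hat p_{Z|X}(z|x)$ can be computed by N–W smoothing of the kernel-transformed responses:

```python
import numpy as np

def gauss_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def p_hat_z_given_x(z, x, X, Z, h1):
    """N-W regression of the transformed responses (1/h1) K((Z_t - z)/h1)
    on X_t; by (3), its conditional mean tends to p_{Z|X}(z|x) as h1 -> 0."""
    responses = gauss_kernel((Z - z) / h1) / h1   # transformed "Y values"
    weights = gauss_kernel((x - X) / h1)          # N-W weights in the X direction
    return np.sum(weights * responses) / np.sum(weights)
```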

Given the above results, Theorem 1 follows directly from Theorems 5.1 and 5.2 of Berlinet et al. (2001).

Sketch of a proof of Theorem 2. We combine Roussas (1991, Theorem 2.3) and Chen (1996, Theorem 1), replacing $Z$ by $1_{\{Z \le z\}}$. Using Assumptions (A.1), (A.2), (A.3)(ii), (A.4), (A.5), and Davydov's inequality, we obtain the asymptotic normality of $\tilde F_n(z|x)$:

$$\sqrt{n h_{n,1}}\, \{\tilde F_n(z|x) - F(z|x)\} \xrightarrow{D} N(B(z,x), s(z,x)), \qquad (5)$$

where the asymptotic bias function $B(z,x)$ is given by

$$B(z,x) = \left[ F^{(0,2)}(z|x) + \frac{2 p^{(1)}_X(x)\, F^{(0,1)}(z|x)}{p_X(x)} \right] \int u^2 K(u)\,du \qquad (6)$$

and where the asymptotic variance $s(z,x)$ is given by

$$s(z,x) = \frac{u(z,x)}{p_X(x)} \int K^2(u)\,du. \qquad (7)$$

We replace $z$ in (5)–(7) by $\theta(x)$ and use (2) and (4) to obtain the asymptotic normality of $\tilde\theta_n(x)$. Now notice that the asymptotic MSE of $\tilde\theta_n(x)$ can be obtained as the sum of its squared asymptotic bias and its asymptotic variance, i.e.

$$\mathrm{MSE}\{\tilde\theta_n(x)\} \approx \frac{1}{p^2_{Z|X}(\theta(x)|x)} \left[ \left( \frac{h^2_{n,1}}{2}\, B(\theta(x),x) \right)^2 + \frac{s(\theta(x),x)}{n h_{n,1}} \right]. \qquad (8)$$
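To make the transfer from (5) to (8) explicit (our restatement of the step, consistent with (2) and (4)):

```latex
% Rearranging (2) and inserting (4):
\[
  \tilde\theta_n(x) - \theta(x)
  = -\,\frac{\tilde F_n(\theta(x)|x) - 1/2}{\hat p_{Z|X}(\theta^*|x)}
  = -\,\frac{\tilde F_n(\theta(x)|x) - F(\theta(x)|x)}
            {p_{Z|X}(\theta(x)|x)}\,\bigl(1 + o_p(1)\bigr),
\]
% so the asymptotic bias and variance of the median estimator are those of
% \tilde F_n(\theta(x)|x), scaled by 1/p_{Z|X} and 1/p^2_{Z|X} respectively,
% which is the source of the leading factor in (8).
```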


Hence the asymptotically optimal value $h^{**}_n$ of $h_{n,1}$ is given by

$$h^{**}_n = \arg\min_{h_{n,1}} \mathrm{MSE}\{\tilde\theta_n(x)\}.$$

To prove (iii), one has to replace $h_{n,1}$ by $h^{**}_n$ in expression (8).

Proof of the Corollary. By the Markov property and the law of total variance (conditioning on $Y$), it follows that

$$w(\theta(x),x) + u(\theta(x),x) = \sigma^2(\theta(x),x) = F(\theta(x)|x)\bigl(1 - F(\theta(x)|x)\bigr) = 1/4.$$

Taking the ratio of the best possible MSEs, $\mathrm{MSE}^*\{\hat\theta_n(x)\}$ and $\mathrm{MSE}^*\{\tilde\theta_n(x)\}$, from parts (iii) of Theorems 1 and 2 then gives

$$r(\theta(x),x) = \left( \frac{1/4}{u(\theta(x),x)} \right)^{4/5} = \left( 1 + \frac{w(\theta(x),x)}{u(\theta(x),x)} \right)^{4/5},$$

which is the desired result.

References

Berlinet, A., Gannoun, A., Matzner-Løber, E., 2001. Asymptotic normality of convergent estimates of conditional quantiles. Statistics 35, 139–169.
Boente, G., Fraiman, R., 1995. Asymptotic distribution of data-driven smoothers in density and regression estimation under dependence. Can. J. Statist. 23, 383–397.
Chen, R., 1996. A nonparametric multi-step prediction estimator in Markovian structures. Statist. Sinica 6, 603–615.
De Gooijer, J.G., Zerom, D., 2000. Kernel-based multistep-ahead predictions of the US short-term interest rate. J. Forecasting 19, 335–353.
Fan, J., Gijbels, I., 1996. Local Polynomial Modelling and its Applications. Monographs on Statistics and Applied Probability, Vol. 66. Chapman & Hall, London.
Gannoun, A., 1990. Estimation non paramétrique de la médiane conditionnelle: médianogramme et méthode du noyau. Publications de l'Institut de Statistique de l'Université de Paris XXXXV, pp. 11–12.
Jones, M.C., Hall, P., 1990. Mean squared error properties of kernel estimates of regression quantiles. Statist. Probab. Lett. 10, 283–289.
Matzner-Løber, E., Gannoun, A., De Gooijer, J.G., 1998. Nonparametric forecasting: a comparison of three kernel-based methods. Comm. Statist. Theory Methods 27, 1593–1617.
Pham, T.D., Tran, L.T., 1985. Some strong mixing properties of time series models. Stochastic Process. Appl. 19, 297–303.
Rosenblatt, M., 1956. A central limit theorem and a strong mixing condition. Proc. Natl. Acad. Sci. USA 42, 43–47.
Roussas, G.G., 1991. Recursive estimation of the transition distribution function of a Markov process: asymptotic normality. Statist. Probab. Lett. 11, 435–447.
Roussas, G.G., Ioannides, D.A., 1987. Moment inequalities for mixing sequences of random variables. Stochastic Process. Appl. 36, 107–116.
Samanta, M., 1989. Non-parametric estimation of conditional quantiles. Statist. Probab. Lett. 7, 407–412.
Serfling, R.J., 1980. Approximation Theorems of Mathematical Statistics. Wiley, New York.