Journal of Statistical Planning and Inference 117 (2003) 199 – 205 www.elsevier.com/locate/jspi
Nonparametric M-estimation with long-memory errors
Jan Beran (a,*), Sucharita Ghosh (b), Philipp Sibbertsen (c)
(a) Department of Mathematics and Statistics, University of Konstanz, P.O. Box D 149, Konstanz 78457, Germany
(b) Landscape Department WSL, Switzerland
(c) Department of Statistics, University of Dortmund, Dortmund, Germany
Received 3 November 2000; received in revised form 1 April 2002; accepted 29 July 2002
Abstract
We investigate the behavior of nonparametric kernel M-estimators in the presence of long-memory errors. The optimal bandwidth and a central limit theorem are obtained. It turns out that in the Gaussian case all kernel M-estimators have the same limiting normal distribution. The motivation behind this study is illustrated with an example. © 2002 Elsevier B.V. All rights reserved.
1. Introduction
Consider the regression model
$$y_i = g(t_i) + G(Z_i), \qquad i = 1, 2, \ldots, n, \qquad t_i = \frac{i}{n}, \tag{1}$$
where $g(t)$ is an unknown regression function in $C^3[0,1]$ and the error process $\varepsilon_i = G(Z_i)$ is generated by a transformation of a zero-mean stationary Gaussian process $Z_i$, $i \in \mathbb{N}$, with long memory. Here $G: \mathbb{R} \to \mathbb{R}$ is a measurable function and, if $Z$ denotes a standard normal variable, $E[G(Z)] = 0$. This paper deals with the nonparametric M-estimation of the regression function $g(t)$ in this model.
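To make model (1) concrete, the following sketch simulates data from it. It is an illustration under assumptions that are not part of the paper: the Gaussian process $Z_i$ is approximated by a truncated moving-average representation of fractional ARIMA(0, d, 0) noise, rescaled to unit variance, and the function names, the truncation length `n_lags` and the seed are hypothetical choices. The user is responsible for supplying a transformation `G` with $E[G(Z)] = 0$.

```python
import math
import random


def arfima_ma_weights(d, n_lags):
    """First n_lags moving-average weights a_j of (1 - B)^(-d).

    Uses the recursion a_0 = 1, a_j = a_{j-1} * (j - 1 + d) / j.
    """
    weights = [1.0]
    for j in range(1, n_lags):
        weights.append(weights[-1] * (j - 1 + d) / j)
    return weights


def simulate_long_memory_gaussian(n, d, n_lags=2000, seed=0):
    """Approximate long-memory Gaussian series Z_1..Z_n with 0 < d < 1/2.

    Assumption: the infinite moving average is truncated at n_lags terms,
    so the long-range correlations are only approximated.
    """
    rng = random.Random(seed)
    a = arfima_ma_weights(d, n_lags)
    # Rescale so that the truncated process has exactly unit variance.
    scale = math.sqrt(sum(w * w for w in a))
    eta = [rng.gauss(0.0, 1.0) for _ in range(n + n_lags - 1)]
    z = []
    for i in range(n):
        # The most recent innovation gets weight a_0, older ones a_1, a_2, ...
        s = sum(a[j] * eta[i + n_lags - 1 - j] for j in range(n_lags))
        z.append(s / scale)
    return z


def simulate_model(n, d, g, G, n_lags=2000, seed=0):
    """Return (t, y) with t_i = i/n and y_i = g(t_i) + G(Z_i), as in model (1)."""
    z = simulate_long_memory_gaussian(n, d, n_lags=n_lags, seed=seed)
    t = [(i + 1) / n for i in range(n)]
    y = [g(ti) + G(zi) for ti, zi in zip(t, z)]
    return t, y


if __name__ == "__main__":
    # Hypothetical example: smooth trend plus Gaussian long-memory errors.
    t, y = simulate_model(100, 0.3, lambda s: math.sin(2 * math.pi * s),
                          lambda z: 0.5 * z, n_lags=500, seed=42)
    print(len(t), len(y))
```

The exact simulation of a long-memory Gaussian process (for example via the full covariance matrix) would avoid the truncation error; the moving-average form is used here only because it is short and needs no external libraries.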
* Corresponding author.
0378-3758/02/$ - see front matter © 2002 Elsevier B.V. All rights reserved. PII: S0378-3758(02)00391-9
Fig. 1. Hourly wind speed maxima in Zurich in 1999, estimated mean (broken line) and median (solid line) curves fitted to the logarithm of the original data, and the periodograms (in log–log coordinates) of the residuals.
This work was motivated by recent storm events that caused disastrous environmental damage in Switzerland and in the neighboring countries. This resulted in an increased need for suitable models for storms, in particular for wind speed. As part of this research, one basic problem is to estimate time-dependent location parameters of the marginal wind-speed distributions. The estimator should not be affected by outliers, in this case unusually large observations (extremely strong winds or storms), which in spite of their short duration tend to affect the usual (mean) location estimators. This in turn calls for the development of M-estimation procedures for time series, in particular time series which may have long memory. Fig. 1 shows a wind speed series (in 0.1 m/s) measured at a climate station near Zurich
(source: SMA). This series consists of hourly maxima during the year 1998–1999. The higher wind speeds during the winters can be seen, as well as the storm Lothar that occurred in December 1999. For the purposes of illustration, the time-dependent mean and approximate median functions were estimated by kernel smoothing of the log-transformed data. For this, we took a kernel that is proportional to the normal density function (truncated on $[-1, 1]$) with an arbitrarily chosen bandwidth equal to 0.05. The estimated mean and median functions are shown along with the periodograms of the corresponding residual estimates in log–log coordinates. The presence of long memory in the residuals is evident, as is the fact that the median estimate is less affected by the high wind speeds due to Lothar. The changes in the probability distribution function over time can be seen in particular by noticing the shifts in the position of the median curve in relation to the mean. Also, the mean curve is seen to 'explode' towards the end of the series, which is the effect of Lothar, whereas the median curve is less affected by these sudden jumps in the observations.

A standard procedure for kernel estimators is to use the Nadaraya–Watson estimator $\hat{g}(t)$ of $g(t)$, obtained by minimizing the function
$$H(\theta) = \frac{1}{n b_n} \sum_{i=1}^{n} K\left(\frac{t_i - t}{b_n}\right) (y_i - \theta)^2,$$
where $b_n > 0$ is a bandwidth and $K(x)$ is a suitably defined kernel. In the presence of long memory, Nadaraya–Watson type estimators were considered, among others, by Beran (1999), Csörgő and Mielniczuk (1995) and Hall and Hart (1990). Ghosh (2001) considers kernel estimation for replicated time series data including long-memory models. Since Nadaraya–Watson estimators can be considered as local least-squares estimators, and least-squares estimators are not robust, the Nadaraya–Watson type estimators are also not robust. In this paper we consider the so-called M-smoothers $\hat{g}$, defined as the solution $\theta$ of the equation
$$\frac{1}{n b_n} \sum_{i=1}^{n} K\left(\frac{t_i - t}{b_n}\right) \psi(y_i - \theta) = 0, \tag{2}$$
where $\psi: \mathbb{R} \to \mathbb{R}$ is a measurable function which can be chosen in various ways, such as the classical Huber $\psi$-function $\psi(x) = \min(c, \max(x, -c))$. Beran (1991) considers M-estimation of a (constant) location parameter for Gaussian long-memory processes. Robinson (1984) and Härdle (1989) also considered robust kernel M-estimators, but in the case of independent errors. Boente and Fraiman (1989) obtained strong consistency of kernel M-estimators for mixing processes. Sibbertsen (1999) considers robust parametric regression for long-memory models. The aim of this paper is to study the properties of robust nonparametric M-estimators for long-memory processes. References to long-memory processes include Beran (1992, 1994), Cox (1984), Granger and Joyeux (1980), Hosking (1981) and Mandelbrot and van Ness (1968).
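The estimating equation (2) can be solved numerically at a single point $t$. The sketch below is one possible way to do this, not the procedure used in the paper: it assumes that $\psi$ is nondecreasing with $\psi(0) = 0$, so that the left-hand side of (2) is nonincreasing in $\theta$ and bisection applies. The function names, the tuning constant `c=1.345` and the tolerance are hypothetical choices; the truncated Gaussian kernel follows the example described for Fig. 1. With $\psi(x) = x$ the same routine returns the Nadaraya–Watson estimate.

```python
import math


def huber_psi(x, c=1.345):
    """Huber psi-function psi(x) = min(c, max(x, -c)); c is an assumed tuning constant."""
    return min(c, max(x, -c))


def truncated_gaussian_kernel(u):
    """Kernel proportional to the normal density, truncated on [-1, 1]."""
    return math.exp(-0.5 * u * u) if abs(u) <= 1.0 else 0.0


def m_smoother(t0, t, y, bandwidth, psi=huber_psi,
               kernel=truncated_gaussian_kernel, tol=1e-10, max_iter=200):
    """Solve sum_i K((t_i - t0)/b) * psi(y_i - theta) = 0 for theta by bisection.

    The factor 1/(n b) in Eq. (2) does not change the root and is omitted.
    Assumes psi is nondecreasing with psi(0) = 0.
    """
    weighted = [(kernel((ti - t0) / bandwidth), yi) for ti, yi in zip(t, y)]
    weighted = [(w, yi) for w, yi in weighted if w > 0.0]
    if not weighted:
        raise ValueError("no observations inside the kernel window")

    def score(theta):
        return sum(w * psi(yi - theta) for w, yi in weighted)

    # score(min y) >= 0 and score(max y) <= 0, so a root lies in between.
    lo = min(yi for _, yi in weighted)
    hi = max(yi for _, yi in weighted)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid  # theta too small: residuals are mostly positive
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical data: constant level 1 with a single large outlier.
    t = [(i + 1) / 20 for i in range(20)]
    y = [1.0] * 20
    y[9] = 100.0
    print("Nadaraya-Watson:", m_smoother(0.5, t, y, 0.25, psi=lambda x: x))
    print("Huber M-smoother:", m_smoother(0.5, t, y, 0.25))
```

In this toy example the Huber version stays close to the bulk of the data, while the local mean is pulled towards the outlier, which mirrors the behavior of the mean and median curves around the storm Lothar described above.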
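2. Asymptotic results
Define
$$I(g'') = \int_{\Delta}^{1-\Delta} [g''(t)]^2 \, dt, \qquad 0 < \Delta < \tfrac{1}{2},$$
$$I(K) = \int_{-1}^{1} x^2 K(x) \, dx,$$
$$V(d) = \frac{1}{(E\psi')^2} \int_0^1 \int_0^1 K(x) K(y) |x - y|^{2d-1} \, dx \, dy.$$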
Assumptions.
• $\gamma_Z(k) = \mathrm{cov}(Z_i, Z_{i+k}) \sim C_Z |k|^{2d-1}$ for $d \in (0, 1/2)$ and $k \to \infty$.
• $K$ is a positive symmetric kernel on $[-1, 1]$ with $\int_{-1}^{1} K(x) \, dx = 1$.
• $b_n > 0$ is a sequence of bandwidths such that $b_n \to 0$ and $n b_n^3 \to \infty$ as $n \to \infty$.
• $\psi$ is differentiable almost everywhere with respect to the Lebesgue measure.
• $E(\psi(\varepsilon_i)) = 0$; $E(\psi^2(\varepsilon_i)) < \infty$.
• $E(\psi'(\varepsilon_i)) \neq 0$.
• There exists a measurable function $M_2$ such that $|(d^2/dt^2)\psi(t)| < M_2(t)$; $E[M_2(t)] < k_2 < \infty$.
• $h_\delta(y) = \sup_{x \le \delta} |\psi(y + x) - \psi(y)| \le c$ almost everywhere for some $\delta > 0$ and $0 < c < \infty$.
• For $\delta$ tending to zero, $h_\delta(y)$ tends to zero almost everywhere.
• $\psi(G(Z))$ has Hermite rank $m$ ($m \ge 1$) and $c_m$ denotes the $m$th Hermite coefficient.

Remarks.
• The condition $n b_n^3 \to \infty$ is needed in order that the dominating remainder term in the bias is of order $o(b_n^2)$, instead of $o(b_n^2) + O((n b_n)^{-1})$.
• The condition $|(d^2/dt^2)\psi(t)| < M_2(t)$; $E[M_2(t)] < k_2 < \infty$ is used in the Taylor expansion of Eq. (2).
• Under the assumptions above, Eq. (2) can have multiple solutions. Therefore, only the existence of a suitably chosen consistent sequence of solutions can be proved. For asymptotic uniqueness, the additional assumption $E[\psi(\varepsilon + u)] \neq 0$ for any $u \neq 0$ is needed.

Theorem 1. Under the above assumptions and if $d \in (0, 0.5)$, we have
1. Uniform consistency: There exists a sequence of solutions $\hat{g}_n$ such that $\sup_{t \in [\Delta, 1-\Delta]} |\hat{g}_n(t) - g(t)| \to 0$ as $n \to \infty$ in probability.
2. Bias: $E[\hat{g}(t) - g(t)] = \tfrac{1}{2} b_n^2 g''(t) I(K) + o(b_n^2)$ uniformly in $\Delta < t < 1 - \Delta$.
3. Variance: $(n b_n)^{1-2d} \, \mathrm{var}(\hat{g}(t)) = c_m^2 C_Z^m V(d)$ uniformly in $\Delta < t < 1 - \Delta$ if $\tfrac{1}{2}(1 - 1/m) < d < \tfrac{1}{2}$.
4. IMSE: If $\tfrac{1}{2}(1 - 1/m) < d < \tfrac{1}{2}$, the integrated mean square error in $[\Delta, 1 - \Delta]$ is given by
$$\int_{\Delta}^{1-\Delta} E\{[\hat{g}(t) - g(t)]^2\} \, dt = b_n^4 \frac{I(g'') I^2(K)}{4} + (n b_n)^{2d-1} V(d) + o(\max(b_n^4, (n b_n)^{2d-1})).$$
5. Optimal bandwidth: If $\tfrac{1}{2}(1 - 1/m) < d < \tfrac{1}{2}$, the bandwidth that minimizes the asymptotic IMSE is given by $b_{opt} = C_{opt} \, n^{(2d-1)/(5-2d)}$, where $C_{opt} = C_{opt}(\Delta) = [(1 - 2d) V(d) / \{I(g'') I^2(K)\}]^{1/(5-2d)}$.
6. Asymptotic distribution: For every fixed $t$, and $\tfrac{1}{2}(1 - 1/m) < d < \tfrac{1}{2}$, $(nb)^{1/2-d} c_m^{-1} C_Z^{-m/2} V^{-1/2} [\hat{g}(t) - g(t)]$ converges in distribution to a nondegenerate process $Z_m$. If $K(u) = \tfrac{1}{2} 1\{u \in [-1, 1]\}$, then $Z_m$ is a Hermite process (see Rosenblatt, 1984; Taqqu, 1975) of order $m$ at time 1.

If $m = 1$ and $G(x) = x$, then $c_m = E[\psi'(\varepsilon)]$ and
$$V(d) = \frac{1}{c_m^2} \int_0^1 \int_0^1 K(x) K(y) |x - y|^{2d-1} \, dx \, dy,$$
so that we have the following corollary.
Corollary 1. If $G(x) = x$, then for all functions $\psi$ satisfying the conditions of Theorem 1 and such that $\psi(G(Z))$ has Hermite rank 1, $(nb)^{1/2-d} C_Z^{-1/2} [\hat{g}(t) - g(t)]$ converges to a normal distribution with zero mean and variance equal to $\int_0^1 \int_0^1 K(x) K(y) |x - y|^{2d-1} \, dx \, dy$.

Thus the above corollary implies that when $m = 1$ and the error process $\varepsilon_i$ is Gaussian, the limiting distribution of $(nb)^{1/2-d} c_m^{-1} C_Z^{-m/2} V^{-1/2} [\hat{g}(t) - g(t)]$ is the same Gaussian distribution for all M-estimators.
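The bandwidth formula in part 5 of Theorem 1 can be checked against the leading terms of the IMSE in part 4. The following sketch evaluates both; it takes $V(d)$, $I(g'')$ and $I(K)$ as given numbers (in practice these are unknown and would have to be estimated, which is not addressed here), and the function names and the numerical values in the example are hypothetical.

```python
def asymptotic_imse(b, n, d, V, I_g2, I_K):
    """Leading terms of the IMSE: b^4 I(g'') I(K)^2 / 4 + (n b)^(2d-1) V(d)."""
    return b ** 4 * I_g2 * I_K ** 2 / 4.0 + (n * b) ** (2.0 * d - 1.0) * V


def optimal_bandwidth(n, d, V, I_g2, I_K):
    """b_opt = C_opt * n^((2d-1)/(5-2d)) with
    C_opt = [(1 - 2d) V(d) / (I(g'') I(K)^2)]^(1/(5-2d))."""
    if not 0.0 < d < 0.5:
        raise ValueError("d must lie in (0, 1/2)")
    c_opt = ((1.0 - 2.0 * d) * V / (I_g2 * I_K ** 2)) ** (1.0 / (5.0 - 2.0 * d))
    return c_opt * n ** ((2.0 * d - 1.0) / (5.0 - 2.0 * d))


if __name__ == "__main__":
    # Hypothetical constants; I_K = 1/3 corresponds to the uniform kernel on [-1, 1].
    n, d, V, I_g2, I_K = 1000, 0.3, 1.5, 400.0, 1.0 / 3.0
    b_opt = optimal_bandwidth(n, d, V, I_g2, I_K)
    print("b_opt:", b_opt)
    print("IMSE at b_opt:", asymptotic_imse(b_opt, n, d, V, I_g2, I_K))
```

Setting the derivative of the two leading terms with respect to $b$ to zero gives $b^{5-2d} = (1 - 2d) V(d) n^{2d-1} / \{I(g'') I^2(K)\}$, which is the expression implemented in `optimal_bandwidth`. Note that the rate $n^{(2d-1)/(5-2d)}$ is slower than the familiar $n^{-1/5}$ obtained for $d = 0$, so stronger long memory calls for a larger bandwidth.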
Acknowledgements Felix Forster (Department of Natural Hazards, WSL, Switzerland) kindly prepared the wind speed data (source: SMA, Zurich) used in this paper. His help is gratefully acknowledged. We would also like to thank a referee for constructive remarks that helped to improve the quality of the paper.
Appendix
Proof of Theorem 1. (1) We start with the estimating equation
$$\frac{1}{n b_n} \sum_{i=1}^{n} \psi(y_i - \hat{g}(t)) K\left(\frac{t_i - t}{b_n}\right) = 0.$$
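Consider $U_n(\theta, t) = (n b_n)^{-1} \sum_{i=1}^{n} \psi(y_i - \theta) K\{(t_i - t)/b_n\}$ as a random field with space-time index $(\theta, t)$. Then $U_n$ converges in probability to $U(\theta, t) = E[\psi(g(t) - \theta + \varepsilon)]$. In particular, $U(g(t), t) = 0$ and there exists a sequence $\hat{\theta}_n$ such that $U_n(\hat{\theta}_n, t) = 0$ and $\hat{\theta}_n \to_p \theta$. Also, for all $k \in \mathbb{N}$ and $(\theta_1, \ldots, \theta_k, t_1, \ldots, t_k) \in \mathbb{R}^{2k}$, $[U_n(\theta_1, t_1), \ldots, U_n(\theta_k, t_k)]$ converges in probability, and hence also in distribution, to $[U(\theta_1, t_1), \ldots, U(\theta_k, t_k)]$. Moreover, the sequence of processes $U_n$ is tight, so that $U_n(\theta, t)$ converges to the process $U(\theta, t)$ in the supremum norm (see e.g. Billingsley, 1968). Uniform consistency then follows from the properties of $g$ and $\psi$.

(2) As for the bias, using Taylor expansion and the assumptions stated above, we have
$$0 = \sum_{i=1}^{n} \psi(y_i - \hat{g}(t)) K\left(\frac{t_i - t}{b_n}\right)$$
$$= \sum_{i=1}^{n} \psi(y_i - g(t)) K\left(\frac{t_i - t}{b_n}\right) - (\hat{g}(t) - g(t)) \sum_{i=1}^{n} \psi'(y_i - g(t)) K\left(\frac{t_i - t}{b_n}\right) + \frac{1}{2} (\hat{g}(t) - g(t))^2 \sum_{i=1}^{n} \psi''(y_i - \tilde{g}(t)) K\left(\frac{t_i - t}{b_n}\right),$$
where $|\tilde{g}(t) - g(t)| \le |\hat{g}(t) - g(t)|$. This leads to
$$\hat{g}(t) - g(t) = \left[\sum_{i=1}^{n} \psi'(y_i - g(t)) K\left(\frac{t_i - t}{b_n}\right)\right]^{-1} \sum_{i=1}^{n} \psi(y_i - g(t)) K\left(\frac{t_i - t}{b_n}\right) + R_n$$
$$= \frac{1}{n b_n E[\psi'(y_i - g(t))]} \sum_{i=1}^{n} \psi(y_i - g(t)) K\left(\frac{t_i - t}{b_n}\right) + R_n^*,$$
where $E(R_n^*) = o(b_n^2)$. The result now follows by noting that $y_i - g(t) = g(t_i) - g(t) + \varepsilon_i$, where the expected value of $\varepsilon_i$ is zero and
$$g(t_i) - g(t) + \varepsilon_i = g'(t)(t_i - t) + g''(t) \frac{(t_i - t)^2}{2} + g'''(t^*) \frac{(t_i - t)^3}{6} + \varepsilon_i, \qquad |t^* - t| \le |t_i - t|,$$
so that
$$\psi(y_i - g(t)) = \psi(\varepsilon_i) + \psi'(\varepsilon_i) \left[ g'(t)(t_i - t) + g''(t) \frac{(t_i - t)^2}{2} + g'''(t^*) \frac{(t_i - t)^3}{6} \right] + \frac{\psi''(\tilde{\varepsilon}_i)}{2} \left[ g'(t)(t_i - t) + g''(t) \frac{(t_i - t)^2}{2} + g'''(t^*) \frac{(t_i - t)^3}{6} \right]^2,$$
with $|\tilde{\varepsilon}_i - \varepsilon_i| \le |g(t_i) - g(t)|$. The final result follows from the properties of $\psi$ and $K$.

(3) The expression for the variance follows from a Taylor expansion of the estimating equation around $g$ and by noting that 1. The function $\psi$ can be expanded via a Hermite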
expansion of order $m$; 2. $\mathrm{cov}(H_p(Z_i), H_q(Z_j)) = 0$ if $p \neq q$, and equal to $[\mathrm{cov}(Z_i, Z_j)]^p$ otherwise; and 3. $[\mathrm{cov}(Z_i, Z_j)]^p \sim C_Z^p |i - j|^{p(2d-1)}$ when $|i - j| \to \infty$.

(4) and (5) The IMSE and the formula for the optimal bandwidth follow from (2) and (3) by standard arguments.

(6) The expansion above implies that $\hat{g}(t) - g(t)$ is asymptotically equivalent to a constant times $\sum_i \psi(\varepsilon_i) K\{(t_i - t)/b\}$. If $K(u) = \tfrac{1}{2} 1\{u \in [-1, 1]\}$, then we have a sample mean of the random variables $\psi(\varepsilon_i)$ ($n(t - b) \le i \le n(t + b)$). The limiting distribution of sums of this form is derived in Taqqu (1979). A similar result can be obtained for other kernels.

References
Beran, J., 1991. M estimators of location for Gaussian and related processes with slowly decaying serial correlations. J. Amer. Statist. Assoc. 86, 704–707.
Beran, J., 1992. Statistical methods for data with long-range dependence. Statist. Sci. 7, 404–427.
Beran, J., 1994. Statistics for Long-memory Processes. Chapman & Hall, New York.
Beran, J., 1999. SEMIFAR models - a semiparametric framework for modelling trends, long range dependence and nonstationarity. Discussion Paper of the Center of Finance and Econometrics, No. 99/16, June, 1999.
Billingsley, P., 1968. Convergence of Probability Measures. Wiley, New York.
Boente, G., Fraiman, R., 1989. Robust nonparametric regression estimation for dependent observations. Ann. Statist. 17, 1242–1256.
Cox, D.R., 1984. Long-range dependence: a review. In: David, H.A., David, H.T. (Eds.), Statistics: An Appraisal, Proceedings of the 50th Anniversary Conference. Iowa State University Press, Ames, IA, pp. 55–74.
Csörgő, S., Mielniczuk, J., 1995. Nonparametric regression under long-range dependent normal errors. Ann. Statist. 23, 1000–1014.
Ghosh, S., 2001. Nonparametric trend estimation in replicated time series. J. Statist. Plann. Infer. 97, 263–274.
Granger, C., Joyeux, R., 1980. An introduction to long-range time series models and fractional differencing. J. Time Ser. Anal. 1, 15–30.
Hall, P., Hart, J.D., 1990. Convergence rates in density estimation for data from infinite-order moving average processes. Probab. Theory Related Fields 87, 253–274.
Härdle, W., 1989. Asymptotic maximal deviation of M-smoothers. J. Multivariate Anal. 29, 163–179.
Hosking, J.R.M., 1981. Fractional differencing. Biometrika 68, 165–176.
Mandelbrot, B.B., van Ness, J.W., 1968. Fractional Brownian motions, fractional noises and applications. SIAM Rev. 10 (4), 422–437.
Robinson, P.M., 1984. Robust nonparametric regression. In: Franke, J., Härdle, W., Martin, D. (Eds.), Robust and Nonlinear Time Series Analysis, Lecture Notes in Statistics, Vol. 26. Springer, Berlin, pp. 247–255.
Rosenblatt, M., 1984. Stochastic processes with short-range and long-range dependence. In: David, H.A., David, H.T. (Eds.), Statistics: An Appraisal, Proceedings of the 50th Anniversary Conference. The Iowa State University Press, Ames, IA, pp. 509–520.
Sibbertsen, P., 1999. Robuste Parameterschätzung im linearen Regressionsmodell bei Fehlertermen mit langem Gedächtnis. Verlag für Wissenschaft und Forschung, Berlin (in German).
Taqqu, M.S., 1975. Weak convergence to fractional Brownian motion and to the Rosenblatt process. Probab. Theory Related Fields 31, 287–302.
Taqqu, M.S., 1979. Convergence of integrated processes of arbitrary Hermite rank. Z. Wahrsch. verw. Geb. 50, 53–83.