Local linear estimation for regression models with locally stationary long memory errors


Lihong Wang
Department of Mathematics, Nanjing University, Nanjing, 210093, China

Article history: Received 2 May 2015; Accepted 30 December 2015.

Abstract

In this paper we consider the local linear regression estimation for the nonparametric regression models with locally stationary long memory errors. The asymptotic behaviors of the regression estimators are established. It is shown that there is a multiple bandwidth dichotomy for the asymptotic distribution of the estimators of the regression function and its derivatives. The finite sample performance of the estimator is discussed through simulation studies. © 2016 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.

AMS 2000 subject classifications: primary 62M10; secondary 62G08.
Keywords: Asymptotic behavior; Local linear regression estimation; Locally stationary long memory process

1. Introduction In recent years the long range dependence is often seen in many scientific investigations. In particular, locally stationary long memory (LSLM) processes are becoming an important tool for analyzing non-stationary long range dependent time series. Data examples with time-varying long memory parameter can be found, for instance, in geophysics, oceanography, meteorology, economics, telecommunication engineering and medicine, see, e.g. Beran (2009), Beran, Sherman, Taqqu, and Willinger (1995), Falconer and Fernandez (2007), Granger and Hyung (2004), Lavielle and Ludena (2000), Palma (2010), Ray and Tsay (2002), Roueff and von Sachs (2011) and Whitcher and Jensen (2000), among others. For more details about long memory time series in general, see, e.g. Beran (1994), Beran, Feng, Ghosh, and Kulik (2013), Dobrushin and Major (1979), Doukhan, Oppenheim, and Taqqu (2003), Giraitis, Koul, and Surgailis (2012), Guégan (2005), Palma (2007) and the references therein. For the LSLM processes, Beran (2009) proposed a maximum likelihood type method to estimate the time-varying long memory parameter d(u). Roueff and von Sachs (2011) investigated the asymptotic behaviors of a local log-regression wavelet estimator of d(u). Wang (2015) explored the properties of the GPH-type estimator and the Local Whittle estimator for a LSLM process characterized by a singularity at the origin of the time varying generalized spectral density. Palma (2010) studied the estimation of the mean of LSLM processes. The mean estimation is actually the special case of the nonparametric regression estimation, which is a fundamental problem in statistics. Nonparametric regression models allow empirical investigation of the data without imposing any parametric structure a priori. In this article we will study the nonparametric estimation for the regression models having the LSLM errors with a general class of time-varying long memory parameters. Proceeding a bit more precisely, we consider the random process (Yt , Xt ) ∈ R × R p , p ≥ 1, t = 1, 2, . . . , and the following regression model: Yt = g (Xt ) + Et ,

t = 1, 2, . . . ,      (1.1)


where the regression function g(·) has continuous second partial derivatives, and the disturbances are Et = σt(Xt)εt, where {εt, t = 1, 2, . . .} is a LSLM process with zero mean and finite variance σε². We assume that {εt, t = 1, 2, . . .} is independent of {Xt, t = 1, 2, . . .} and that the first and second moments of σt(Xt) exist. Conditional heteroscedasticity is thus permitted, and we do not assume that the σt²(·) are constant over t. We assume the availability of the data {Yt, Xt, t = 1, . . . , n}, where n is the sample size.
As discussed in Masry and Mielniczuk (1999) and Wang and Cai (2010), the asymptotic properties of the regression estimator do not depend on whether or not the explanatory process Xt is strongly correlated. Therefore, we assume that {Xt, t = 1, 2, . . .} is a sequence of independent variables; however, the probability densities of {Xt}, ft(·), are allowed to vary across t. With the aim of covering a more general setting of regression models and establishing the asymptotic properties of the estimator under relatively broad conditions, we allow for conditional heteroscedasticity as well as non-identically distributed observations.
We shall use a local linear regression estimator based on the observations to estimate the function g(x) and its derivatives, owing to the superiority of local linear fitting in function estimation. The aim of this paper is to study the asymptotic properties of the estimators for LSLM models. We establish conditions that ensure the consistency of the local linear estimator, and provide uniform convergence rates and the limiting distributions under different bandwidths. For stationary long memory processes, the nonparametric regression estimation problem has been investigated extensively, see, e.g. Ciuperca (2011), Kulik and Wichelhaus (2011) and Masry and Mielniczuk (1999), among others.
Let ∇g(·) be the first partial derivative vector of the regression function g(·). The local linear estimators of g(x) and ∇g(x), x ∈ R^p, are defined as

$$\begin{pmatrix} \hat g(x) \\ \widehat{\nabla g}(x) \end{pmatrix} = (X^{\tau} D X)^{-1} X^{\tau} D y, \qquad (1.2)$$

where

$$X = \begin{pmatrix} 1 & (X_1 - x)^{\tau} \\ \vdots & \vdots \\ 1 & (X_n - x)^{\tau} \end{pmatrix}, \qquad y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix},$$

and D = diag(Kh(X1 − x), . . . , Kh(Xn − x)), where X^τ denotes the transpose of X, h is the bandwidth, and K is a kernel with Kh(x) = K(x/h)/h^p, where x/h = (x1/h, . . . , xp/h)^τ.
In the following section, we establish the asymptotic properties of the local linear regression estimators ĝ(x) and ∇̂g(x) in the locally stationary long memory setup. We see that, as in the stationary long memory case (see Masry & Mielniczuk, 1999), there is a multiple bandwidth dichotomy for the asymptotic distribution of the estimators of g and its derivatives. Section 3 illustrates the estimation method for LSLM data with simulation studies. The proofs of the theorems are in Section 4.

2. Asymptotic properties of the estimator

Throughout this paper, we assume the following regularity conditions:
Assumption (A).
(A1) {εt, t = 1, 2, . . . , n} is a zero mean locally stationary process with the time-varying covariance function satisfying

$$\gamma(s, t) = \operatorname{cov}(\varepsilon_s, \varepsilon_t) \sim G\!\left(\frac{s}{n}, \frac{t}{n}\right) |s - t|^{d(s/n) + d(t/n) - 1}$$

for large |s − t| > 0, where 0 < d(u) ≤ d0 < 1/2 for u ∈ [0, 1] and G is a continuous function over [0, 1] × [0, 1] with G(u, u) > 0 for all u ∈ [0, 1].
(A2) The function d(·) reaches its maximum value, d0, at u0 with second derivative d′′(u0) < 0, and has a continuous third derivative.
(A3) The marginal density functions of Xt, ft(·), are continuous at x uniformly over t, sup_t ft(x) < ∞, and there exists a function f̄(x) such that lim_{n→∞} n^{-1} Σ_{t=1}^{n} ft(x) = f̄(x).
(A4) The functions σt²(·) are continuous at x uniformly over t, sup_t σt²(x) < ∞, and there exist functions ς̄(x) and ω̄(x) such that lim_{n→∞} n^{-1} Σ_{t=1}^{n} σt²(x)ft(x) = ς̄(x) and

$$\lim_{n\to\infty} \frac{\sum_{s,t=1;\, s\neq t}^{n} \sigma_s(x)\sigma_t(x) f_s(x) f_t(x)\, \gamma(s, t)}{\sum_{s,t=1;\, s\neq t}^{n} \gamma(s, t)} = \bar\omega(x).$$


(A5) The regression function g(·) is twice differentiable with bounded and integrable derivatives.
(A6) K(u) is bounded with compact support and satisfies ∫K(u)du = 1 and, for every x, ∫u^τ H(x)u K(u)du < ∞ and ∫uu^τ H(x)u K(u)du < ∞, where H(x) is the Hessian matrix of g.

Remark 2.1. Conditions on ft(x) and γ(s, t) similar to Assumptions (A3) and (A4) are also used in Robinson (2011). Although non-stationarity of Xt and εt is allowed, these homogeneity restrictions are needed in order to obtain a useful asymptotic theory.
One typical example satisfying Assumption (A1) is the linear LSLM process with the infinite moving average representation

$$\varepsilon_t = \sum_{j=0}^{\infty} \phi_j\!\left(\frac{t}{n}\right) \zeta_{t-j}, \qquad t = 1, 2, \ldots, \qquad (2.1)$$

where {ζj} are independent and identically distributed (i.i.d.) random variables with zero mean and variance one, and the coefficients φj(u) satisfy Σ_{j=0}^{∞} φj²(u) < ∞ for all u ∈ [0, 1]. The model defined by (2.1) generalizes the usual Wold expansion for a linear stationary process by allowing the coefficients of the infinite moving average expansion to vary smoothly over time. A particular case of (2.1) is the generalized version of the fractional noise process with

$$\phi_j(u) = \frac{\Gamma(j + d(u))}{\Gamma(j + 1)\,\Gamma(d(u))} \sim \frac{1}{\Gamma(d(u))}\, j^{\,d(u)-1},$$

where Γ(·) is the Gamma function and d(·) is a smoothly time-varying long memory coefficient. Moreover, by Assumptions (A1)–(A2) and Theorem 3.3 of Palma (2010),

$$n^{1-2d_0}(\log n)^{d_0 + \frac12}\, E\bar\varepsilon_n^{\,2} \sim n^{-1-2d_0}(\log n)^{d_0 + \frac12} \sum_{s,t=1;\, s\neq t}^{n} \gamma(s, t) \longrightarrow V(u_0)$$

as n → ∞, where ε̄n ≡ n^{-1} Σ_{t=1}^{n} εt and

$$V(u_0) = \begin{cases} \dfrac{2^{2d_0}\sqrt{\pi}\, G(u_0, u_0)\,\Gamma(d_0)}{(-d''(u_0))^{d_0 + \frac12}}, & \text{if } u_0 \in (0, 1), \\[2ex] \dfrac{2^{2d_0 - 1}\sqrt{\pi}\, G(u_0, u_0)\,\Gamma(d_0)}{(-d''(u_0))^{d_0 + \frac12}}, & \text{if } u_0 = 0, 1. \end{cases} \qquad (2.2)$$

In addition, for the linear LSLM model (2.1), it is shown in Theorem 3.4 of Palma (2010) that n^{1/2−d_0}(log n)^{d_0/2+1/4} ε̄n converges weakly to the random variable (V(u_0))^{1/2} Z, where Z is the standard normal random variable. It is worth mentioning that the normalization factor in Theorem 3.4 of Palma (2010) should be n^{1/2−d_0}(log n)^{d_0/2+1/4}.
Denote

$$U_0 = \frac{1}{n}\sum_{t=1}^{n} K_h(X_t - x), \qquad U_1 = \frac{1}{n}\sum_{t=1}^{n} \left(\frac{X_t - x}{h}\right) K_h(X_t - x), \qquad U_2 = \frac{1}{n}\sum_{t=1}^{n} \left(\frac{X_t - x}{h}\right)\left(\frac{X_t - x}{h}\right)^{\tau} K_h(X_t - x),$$

and

$$V_0 = \frac{1}{n}\sum_{t=1}^{n} K_h(X_t - x)\, Y_t, \qquad V_1 = \frac{1}{n}\sum_{t=1}^{n} \left(\frac{X_t - x}{h}\right) K_h(X_t - x)\, Y_t.$$

Put

$$U_n = \begin{pmatrix} U_0 & U_1^{\tau} \\ U_1 & U_2 \end{pmatrix}, \qquad V_n = \begin{pmatrix} V_0 \\ V_1 \end{pmatrix},$$

then (1.2) can be expressed as



$$\begin{pmatrix} \hat g(x) \\ h\,\widehat{\nabla g}(x) \end{pmatrix} = U_n^{-1} V_n. \qquad (2.3)$$
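The estimator in (1.2) (equivalently (2.3)) is simply a weighted least squares fit and is straightforward to compute. The following sketch is not part of the original paper; the Gaussian kernel and all function names are our own illustrative choices. It shows one way to evaluate ĝ(x) and ∇̂g(x) at a single point:

```python
import numpy as np

def gaussian_kernel(u):
    # Product Gaussian kernel K(u) for u of shape (n, p).
    return np.exp(-0.5 * np.sum(u**2, axis=1)) / (2 * np.pi) ** (u.shape[1] / 2)

def local_linear(x, X, Y, h):
    """Local linear estimate of (g(x), grad g(x)) as in (1.2)/(2.3).

    x : point of estimation, shape (p,)
    X : design matrix, shape (n, p)
    Y : responses, shape (n,)
    h : bandwidth
    """
    n, p = X.shape
    Z = X - x                                  # rows are (X_t - x)^tau
    W = gaussian_kernel(Z / h) / h**p          # weights K_h(X_t - x)
    Xd = np.hstack([np.ones((n, 1)), Z])       # design matrix of (1.2)
    XtW = Xd.T * W                             # X^tau D
    # Solve the weighted least squares normal equations (X^tau D X) beta = X^tau D y.
    beta = np.linalg.solve(XtW @ Xd, XtW @ Y)
    return beta[0], beta[1:]                   # (ghat(x), estimated gradient of g at x)
```

Solving the (p + 1) × (p + 1) normal equations directly, as above, mirrors the closed form (1.2); in practice one would loop this over a grid of evaluation points x.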

Let

$$\mu_1 = \int u K(u)\,du, \qquad \mu_2 = \int u u^{\tau} K(u)\,du, \qquad \nu_0 = \int K^2(u)\,du, \qquad \nu_1 = \int u K^2(u)\,du, \qquad \nu_2 = \int u u^{\tau} K^2(u)\,du,$$


and

$$U = \begin{pmatrix} 1 & \mu_1^{\tau} \\ \mu_1 & \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \nu_0 & \nu_1^{\tau} \\ \nu_1 & \nu_2 \end{pmatrix}.$$

We assume that the matrix U is invertible and Σ is positive-definite. We first show the consistency of the local linear regression estimators ĝ(x) and ∇̂g(x).

Theorem 2.1. Assume that h = h_n → 0, nh^{p+2} → ∞ and n^{1−2d_0}h² → ∞ as n → ∞. Then, under Assumption (A), we have, for each x ∈ R^p, ĝ(x) −→ g(x) in probability, and

∇̂g(x) −→ ∇g(x) in probability.
Next we study the asymptotic distribution of the local linear regression estimator (ĝ(x), (∇̂g(x))^τ)^τ. As we will see, under the locally stationary long memory assumption, the limiting distribution and the convergence rate of ĝ(x) and ∇̂g(x) depend on the amount of smoothing h, as well as on the relative strength of dependence in εt.

Theorem 2.2. Assume that Assumption (A) holds, h = h_n → 0, n^{1−2d_0}h² → ∞ and the sequence {nh^{p+4}} is bounded as n → ∞, the process {εt} satisfies (2.1) with Eζ_j^8 < ∞, and there exists a positive constant C such that |φ_j(u)| ≤ Cj^{d_0−1} for all u ∈ [0, 1] and j ≥ 1.
(1) If n^{1−2d_0}(log n)^{d_0+1/2}(nh^p)^{−1} = o(1) as n → ∞, then for each x ∈ R^p,

$$n^{1/2-d_0}(\log n)^{d_0/2+1/4} \begin{pmatrix} \hat g(x) - g(x) \\ h(\widehat{\nabla g}(x) - \nabla g(x)) \end{pmatrix} \stackrel{D}{\longrightarrow} \frac{U^{-1}}{\bar f(x)}\, (\bar\omega(x) V(u_0))^{1/2}\, \mu Z,$$

D

where $\stackrel{D}{\longrightarrow}$ denotes convergence in distribution, μ = (1, μ_1^τ)^τ, Z is the standard normal random variable, and V(u_0) is defined in (2.2).
(2) If n^{2d_0−1}(log n)^{−d_0−1/2} nh^p = o(1) as n → ∞, then for each x ∈ R^p,

$$(nh^p)^{1/2}\left[ \begin{pmatrix} \hat g(x) - g(x) \\ h(\widehat{\nabla g}(x) - \nabla g(x)) \end{pmatrix} - \frac{h^2}{2}\, U^{-1}\mu(x) \right] \stackrel{D}{\longrightarrow} \frac{U^{-1}}{\bar f(x)}\, \eta,$$

where η is a (p + 1)-dimensional normal random vector with mean 0 and variance σε² ς̄(x)Σ, and μ(x) = (μ_3(x), μ_4^τ(x))^τ, where μ_3(x) = ∫u^τ H(x)u K(u)du and μ_4(x) = ∫uu^τ H(x)u K(u)du.
(3) If there exists a constant κ such that n^{2d_0−1}(log n)^{−d_0−1/2} nh^p → κ as n → ∞, then for each x ∈ R^p,

$$(nh^p)^{1/2}\left[ \begin{pmatrix} \hat g(x) - g(x) \\ h(\widehat{\nabla g}(x) - \nabla g(x)) \end{pmatrix} - \frac{h^2}{2}\, U^{-1}\mu(x) \right] \stackrel{D}{\longrightarrow} \frac{U^{-1}}{\bar f(x)}\, \eta^{*},$$

where η* is a (p + 1)-dimensional normal random vector with mean 0 and variance σε² ς̄(x)Σ + κ ω̄(x)V(u_0)μμ^τ.

Remark 2.2. From Theorem 2.2 we see that, if nh^{p+4} = o(1), the estimator (ĝ(x), (∇̂g(x))^τ)^τ is asymptotically unbiased. In general, there is a trade-off between the dependence conditions and the bandwidth in the asymptotic properties of nonparametric estimators. In Theorem 2.2(1), under the LSLM and large-bandwidth condition, the estimators of g(·) and ∇g(·) are unbiased and the limiting distributions are scaled distributions of a normal random variable. Moreover, the required norming sequence depends on the maximum value of the time-varying long memory coefficient function, but it does not depend on the amount of smoothing. However, Theorem 2.2(2) shows that in the case of small bandwidths the asymptotic distributions of the estimators are similar to those in the independent or weakly dependent stationary cases. Furthermore, the borderline case between large and small bandwidths treated in Theorem 2.2(3) results in an asymptotic convolution of the limiting distributions of the above two cases. As evidenced by Theorem 2.2, although the asymptotic variance of the local linear estimators for LSLM processes is more complex than its stationary long memory counterpart, central limit theorems similar to those established for the stationary case still hold for this kind of non-stationary process.

Remark 2.3. If we assume that the long memory parameter is stable over time, i.e. d(u) ≡ d_0, and the probability density ft(·) does not vary across t, then the results of Theorem 2.2 are seen to agree with those in Masry and Mielniczuk (1999). In fact, Theorem 2.2(1) is the same as Theorem 3(a) of Masry and Mielniczuk (1999) with α = 1 − 2d_0, L(n) = (log n)^{−d_0−1/2} and r = 1, where α, L(n) and r are the notations used in Masry and Mielniczuk (1999). Similarly, Theorem 2.2(2) and (3) agree with Theorem 4(a) and Theorem 5(a) of Masry and Mielniczuk (1999), respectively.


Table 1
MEAN and SD of the MSE of the estimators for Model (3.1).

Sample size    n = 200    n = 500    n = 2000    n = 4000
MEAN           0.1451     0.1432     0.1190      0.1018
SD             0.1086     0.0803     0.0471      0.0254

Remark 2.4. Note that, in practice, if one uses the bandwidth h_n ∼ n^{−a}, a > 0, then the assumptions in Theorems 2.1 and 2.2 will be satisfied as long as 1/(p + 4) ≤ a < min(1/2 − d_0, 1/(p + 2)) for 0 < d_0 < 1/2 − 1/(p + 4). Furthermore, for the first case of Theorem 2.2 to hold, we need 1/(p + 4) ≤ a < min(1/2 − d_0, 1/(p + 2), 2d_0/p). Conversely, if we choose max(1/(p + 4), 2d_0/p) ≤ a < min(1/2 − d_0, 1/(p + 2)), Theorem 2.2(2) will hold. For example, if d_0 = 0.125 and p = 1, then 0.2 ≤ a < 0.25 implies the first case of Theorem 2.2. For Theorem 2.2(2) to hold, we can choose 0.25 ≤ a < 1/3. Moreover, we need h_n = n^{−0.25}(log n)^{0.625} for the third case of Theorem 2.2 to hold.
Finally, in many applications it is important to obtain uniform consistency of a nonparametric estimator rather than just pointwise consistency. Therefore, we next establish uniform consistency of the local linear estimator (ĝ(x), (∇̂g(x))^τ)^τ. Let f̄_n(x) = n^{-1} Σ_{t=1}^{n} f_t(x) and χ = {x : f̄_n(x) ≥ M > 0}, where M is a positive constant.

Theorem 2.3. Suppose that f̄_n(x) is uniformly bounded and Lipschitz continuous uniformly over n. In addition to Assumption (A), we assume that, for k, l = 0, 1, . . . , p, the functions K*_{kl}(x) = x_k x_l K(x), where x_0 = 1 and x = (x_1, . . . , x_p)^τ, have Fourier transforms Φ_{kl}(r) = (2π)^p ∫ e^{ir^τ x} K*_{kl}(x)dx that satisfy ∫∥Φ(r)∥dr < ∞, where Φ(r) is the (p + 1) × (p + 1) matrix with elements Φ_{kl}(r). Then, we have

$$\sup_{x\in\chi} |\hat g(x) - g(x)| = O_P\big((nh^{2p})^{-1/2}\big) + O_P\big(n^{d_0-1/2}h^{-p}\big) + O_P(h^2),$$

and

$$\sup_{x\in\chi} \|\widehat{\nabla g}(x) - \nabla g(x)\| = O_P\big((nh^{2p})^{-1/2}h^{-1}\big) + O_P\big(n^{d_0-1/2}h^{-p-1}\big) + O_P(h)$$

provided that the right-hand sides of both expressions are o_P(1).

3. Simulation studies

In this section we give numerical examples to show the efficiency of the local linear estimation discussed in this paper. We consider illustrative examples built on a LSLM process {εt} defined by (2.1) with quadratic time-varying long memory parameter given by d(u) = (2 + u − 2u²)/17 for u ∈ [0, 1], which is similar to the example in Palma (2010). The coefficients of the quadratic polynomial ensure that 0 < d(u) < 1/2 and that the function d(u) has a maximum value d_0 = 0.125 reached at u_0 = 0.25.
First, let {Xt} be an i.i.d. univariate standard normal random process which is independent of {εt}, let σt(Xt) ≡ 1, and set

Yt = 3Xt log(|Xt|) + εt.      (3.1)
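The paper does not describe its simulation code, so the following is only a sketch of one way to generate data from model (3.1). It simulates the LSLM errors by truncating the moving average representation (2.1), using the generalized fractional noise coefficients φj(u) = Γ(j + d(u))/(Γ(j + 1)Γ(d(u))) with the quadratic d(u) above; the truncation length, the Gaussian innovations, the function names, and the availability of scipy are all our own assumptions.

```python
import numpy as np
from scipy.special import gammaln

def d_fun(u):
    # Time-varying long memory parameter of the simulation study.
    return (2.0 + u - 2.0 * u**2) / 17.0

def lslm_errors(n, trunc=1000, rng=None):
    """Generate eps_1, ..., eps_n by truncating expansion (2.1) with
    phi_j(u) = Gamma(j + d(u)) / (Gamma(j + 1) Gamma(d(u))) and i.i.d. N(0, 1) innovations."""
    rng = np.random.default_rng(rng)
    zeta = rng.standard_normal(n + trunc)        # zeta_{1-trunc}, ..., zeta_n
    eps = np.empty(n)
    j = np.arange(trunc + 1)
    for t in range(1, n + 1):
        d = d_fun(t / n)
        log_phi = gammaln(j + d) - gammaln(j + 1) - gammaln(d)
        phi = np.exp(log_phi)                    # phi_0(u), ..., phi_trunc(u)
        # zeta_{t-j} for j = 0, ..., trunc sits at array index (t + trunc - 1) - j.
        eps[t - 1] = phi @ zeta[t + trunc - 1 - j]
    return eps

def simulate_model_31(n, rng=None):
    rng = np.random.default_rng(rng)
    X = rng.standard_normal(n)                   # i.i.d. standard normal covariate
    eps = lslm_errors(n, rng=rng)
    Y = 3.0 * X * np.log(np.abs(X)) + eps        # model (3.1)
    return X, Y
```

The O(n · trunc) loop is adequate for the sample sizes considered here; a longer truncation gives a better approximation of the long memory behavior at the cost of computation time.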

Based on the simulated data, we estimate the regression function using the local linear estimator in (1.2) with a standard Gaussian kernel. A data-driven choice of the bandwidth in this context would be desirable; however, in view of the lack of theoretical results on this issue, we choose the bandwidth based on the conditions for h in Theorems 2.1 and 2.2 of the previous section. That is, we choose h = n^{−0.22}, h = n^{−0.3} and h = n^{−0.25}(log n)^{0.625}, corresponding to the three cases of Theorem 2.2, respectively. The computations are performed using R software.
For the first case, the empirical mean (MEAN) and empirical standard deviation (SD) of the MSE of the estimators based on 500 replications with the bandwidth h = n^{−0.22} for four sample sizes n = 200, n = 500, n = 2000 and n = 4000 are reported in Table 1. The MSE is defined as MSE = (1/n) Σ_{t=1}^{n} |ĝ(Xt) − g(Xt)|².
The estimated mean regression curves for two different sample sizes n = 500 and 2000 are depicted in Fig. 1. Figs. 2 and 3 show the histograms and the estimated densities of the estimation errors of g(·) at a fixed point x with n = 500 and n = 2000, respectively. The sampling distributions shown in Figs. 2 and 3 appear to be approximately normal.
In order to compare the results of the three cases in Theorem 2.2, we use different bandwidths to estimate the regression function with the sample size n = 2000. That is, we set h = n^{−0.3} and h = n^{−0.25}(log n)^{0.625}, corresponding to the second and third cases of Theorem 2.2, respectively. The estimated densities of the estimation errors are shown in Fig. 4. From Figs. 3 and 4, we see that the sampling distributions in the three cases are all approximately normal but with different asymptotic variances. This is consistent with the theoretical results in Theorem 2.2.
Next, we use a heteroscedastic error. Let σt²(Xt) = 1 + Xt² and

Yt = 3Xt log(|Xt|) + σt(Xt)εt.      (3.2)

For the following simulations, we choose the bandwidth h = n^{−0.22}. Table 2 shows the computation results based on 500 replications for four sample sizes n = 200, n = 500, n = 2000 and n = 4000. The estimated mean regression curves for


Fig. 1. The estimated mean regression curve (blue) with the actual regression curve g (x) = 3x log(|x|) for Model (3.1) with sample size n = 500 (left) and n = 2000 (right). The scatter plot shows the observations (Xi , Yi ). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


Fig. 2. Histograms of the estimation error at a fixed point with sample size n = 500 (left) and n = 2000 (right).


Fig. 3. The estimated densities of the estimation errors at a fixed point with bandwidth h = n−0.22 and sample size n = 500 (left) and n = 2000 (right).

Fig. 4. The estimated densities of the estimation errors at a fixed point with bandwidth h = n−0.3 (left) and h = n−0.25 (log n)0.625 (right).

Table 2
MEAN and SD of the MSE of the estimators for Model (3.2).

Sample size    n = 200    n = 500    n = 2000    n = 4000
MEAN           0.6558     0.6340     0.5745      0.4609
SD             0.9055     0.5502     0.3873      0.2587

two different sample sizes n = 500 and 2000 are plotted in Fig. 5. We see that, when there is a heteroscedastic error, the performance of the estimators is not as good as that with a constant variance.
To assess the performance of the estimators in the multivariate case, we finally consider the following model:

Yt = 3Xt log(|Xt|) + sin(Xt−1) + εt.      (3.3)
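For the bivariate model (3.3), the same machinery applies with p = 2. The sketch below is again our own illustration, reusing local_linear and lslm_errors from the earlier sketches; it builds the lagged design (Xt, Xt−1), generates Yt, and evaluates the MSE of the fit at the observed design points, as in the MSE definition used for Tables 1–3, with the bandwidth h left as an argument.

```python
import numpy as np

def mse_model_33(n, h, rng=None):
    """One replication for model (3.3): Y_t = 3 X_t log|X_t| + sin(X_{t-1}) + eps_t,
    fitted by the bivariate local linear estimator at the observed design points."""
    rng = np.random.default_rng(rng)
    X = rng.standard_normal(n + 1)               # X_0, X_1, ..., X_n
    eps = lslm_errors(n, rng=rng)                # LSLM errors from the earlier sketch
    g = lambda x1, x0: 3.0 * x1 * np.log(np.abs(x1)) + np.sin(x0)
    Y = g(X[1:], X[:-1]) + eps
    design = np.column_stack([X[1:], X[:-1]])    # rows (X_t, X_{t-1}), t = 1, ..., n
    ghat = np.array([local_linear(design[t], design, Y, h)[0] for t in range(n)])
    return np.mean((ghat - g(design[:, 0], design[:, 1])) ** 2)
```

Averaging the returned MSE over independent replications gives the empirical MEAN and SD of the kind reported in the tables; the brute-force loop over all design points is O(n²) and is meant only as an illustration.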


Fig. 5. The estimated mean regression curve (blue) with the actual regression curve g (x) = 3x log(|x|) for Model (3.2) with sample size n = 500 (left) and n = 2000 (right). The scatter plot shows the observations (Xi , Yi ). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


Fig. 6. The coplot of the estimated mean regression curves for Model (3.3) with sample size n = 2000.

Table 3
MEAN and SD of the MSE of the estimators for Model (3.3).

Sample size    n = 200    n = 500    n = 2000    n = 4000
MEAN           0.1965     0.1835     0.1542      0.1183
SD             0.1899     0.1127     0.0497      0.0466

We again set n = 200, 500, 2000 and 4000, and compute the empirical MEAN and SD based on 500 replications. The estimation results are collected in Table 3. A coplot of the estimated mean regression curves of Yt against Xt given Xt−1 with sample size n = 2000 is plotted in Fig. 6.
Inspection of Figs. 1–6 and Tables 1–3 reveals that the local linear estimator performs well in medium and large samples. The local stationarity of the data does not seem to affect the properties of the local linear estimator. Overall, the simulation results agree well with the consistency and asymptotic normality theory of the previous section. The estimated curves fit the true curve more closely and become more stable as the sample size increases.

4. Proofs

4.1. Proof of Theorem 2.1

Proof. We first show that

$$U_n \to \bar f(x) U \qquad (4.1)$$

in the sense that each element converges in probability. It suffices to show that, for any c = (c_0, c_1^{\tau})^{\tau} ∈ R^{p+1}, c^{\tau}U_n → f̄(x) c^{\tau}U in probability.


Notice that c^{τ}U_n is the (p + 1)-vector whose kth element is given by

$$(c^{\tau} U_n)_k = \frac{1}{nh^p}\sum_{t=1}^{n} \left(\frac{X_t - x}{h}\right)_k K_c\!\left(\frac{X_t - x}{h}\right), \qquad k = 0, 1, \ldots, p,$$

where K_c(u) = (c_0 + c_1^{τ}u)K(u) and ((X_t − x)/h)_0 = 1. First, we have, as n → ∞,

$$E\big((c^{\tau} U_n)_k\big) = \frac{1}{n}\sum_{t=1}^{n} \int u_k K_c(u) f_t(x + hu)\,du \longrightarrow \bar f(x)(c^{\tau} U)_k, \qquad (4.2)$$

where u_k is the kth element of the vector u. Write W_{k,t} for

$$\frac{1}{h^p}\left(\frac{X_t - x}{h}\right)_k K_c\!\left(\frac{X_t - x}{h}\right) - E\left[\frac{1}{h^p}\left(\frac{X_t - x}{h}\right)_k K_c\!\left(\frac{X_t - x}{h}\right)\right].$$

By Assumptions (A3), (A6) and Lebesgue's dominated convergence theorem, we have

$$\operatorname{Var}\big((c^{\tau} U_n)_k\big) = \frac{1}{n^2}\sum_{t=1}^{n} E|W_{k,t}|^2 \le \frac{1}{nh^p}\,\frac{1}{n}\sum_{t=1}^{n} \int u_k^2 K_c^2(u) f_t(x + hu)\,du \sim \frac{1}{nh^p}\,\bar f(x) \int u_k^2 K_c^2(u)\,du = O\big((nh^p)^{-1}\big).$$

(4.3)

Thus it follows from (4.2) and (4.3) that Un → f¯ (x)U

(4.4)

in probability. From (1.1) and (2.3), we get

$$\begin{pmatrix} \hat g(x) - g(x) \\ h(\widehat{\nabla g}(x) - \nabla g(x)) \end{pmatrix} = U_n^{-1} V_n^{*} + U_n^{-1} R_n,$$

where V_n^{*} = (V_0^{*}, (V_1^{*})^{τ})^{τ} with

$$V_0^{*} = \frac{1}{n}\sum_{t=1}^{n} K_h(X_t - x)\sigma_t(X_t)\varepsilon_t, \qquad V_1^{*} = \frac{1}{n}\sum_{t=1}^{n} \left(\frac{X_t - x}{h}\right) K_h(X_t - x)\sigma_t(X_t)\varepsilon_t,$$

and R_n = (R_0, R_1^{τ})^{τ} with

$$R_0 = \frac{1}{n}\sum_{t=1}^{n} K_h(X_t - x)\big(g(X_t) - g(x) - \nabla g(x)^{\tau}(X_t - x)\big), \qquad R_1 = \frac{1}{n}\sum_{t=1}^{n} \left(\frac{X_t - x}{h}\right) K_h(X_t - x)\big(g(X_t) - g(x) - \nabla g(x)^{\tau}(X_t - x)\big).$$

Using the second-order Taylor expansion for g(·) and along similar lines to the proof of (4.1), we have that

$$R_n - \frac{1}{2} h^2 \bar f(x)\mu(x) = o_P(h^2),$$





gˆ (x) − g (x)



g (x) − ∇ g (x)) h( ∇

= Un−1 Vn∗ +

h2 2



uτ H (x)uK (u)du, µ4 (x) =



uuτ H (x)uK (u)du.

U −1 µ(x) + oP (h2 ).

(4.6)

Therefore, by (4.1) and (4.6), it suffices to show that h−1 Vn∗ = oP (1). The independence of {εt } and {Xt } implies that E (Vn∗ ) = 0, and by Assumptions (A1)–(A5), Var (h−1 V0∗ ) =

1

n2 h2 t =1

+ =

n 

2

E εt2 E (σt (Xt )Kh (Xt − x))2

n  s−1 

n2 h2 s=1 t =1 n 1

1

nhp+2 n t =1

+

2

σε

2

n  s−1 

n2 h2 s=1 t =1

E (εs εt )E (σs (Xs )Kh (Xs − x))E (σt (Xt )Kh (Xt − x))



σt2 (x + hu)ft (x + hu)K 2 (u)du

γ (s, t )



σs (x + hu)fs (x + hu)K (u)du



σt (x + hu)ft (x + hu)K (u)du.

L. Wang / Journal of the Korean Statistical Society (

)



9

Then Var (h−1 V0∗ ) ∼

ν0 σε2 ς¯ (x) nhp+2

n  s−1 C ω( ¯ x) 

+

n2 h2

|s − t |d(s/n)+d(t /n)−1

s=1 t =1

∼ Cn−1 h−p−2 + Ch−2 n2d0 −1

1





0

u

|u − v|2d0 −1 dudv

0

= O(n−1 h−p−2 ) + O(n2d0 −1 h−2 ),

(4.7)

and, similarly, Var ((h−1 V1∗ )k ) = O(n−1 h−p−2 ) + O(n2d0 −1 h−2 ). Then we can easily show that h

−1

(4.8)

Vn = oP (1), and this concludes the proof of Theorem 2.1. ∗



4.2. Proof of Theorem 2.2 Proof. Similarly to (4.6), we have that gˆ (x) − g (x)



 −

g (x) − ∇ g (x)) h(∇

h2 2



U −1 µ(x)

= Un−1 Vn∗ + oP (1).

(4.9)

For any c = (c0 , c1τ )τ , c τ Vn∗ =

n 1 

nhp t =1

εt σt (Xt )Kc



Xt − x



h

. 2

Proof of part (1). To prove the first part, by the Slutsky theorem and the fact that n1/2−d0 (log n)d0 /2+1/4 h2 U −1 µ(x) = o(1), it suffices to show that D n1/2−d0 (log n)d0 /2+1/4 c τ Vn∗ −→ c τ µ(ω( ¯ x)V (u0 ))1/2 Z .

Let Zt = h−p σt (Xt )Kc c τ Vn∗ =

n 1

n t =1

(4.10)

 Xt −x 

. We obtain

h

n 1

εt EZt +

n t =1

εt (Zt − EZt ).

(4.11)

Next we show that n 

εt (Zt − EZt ) = oP (n1/2+d0 (log n)−d0 /2−1/4 ).

(4.12)

t =1

Notice that

 E

n 

2 εt (Zt − EZt )



t =1

n 

E εt2 EZt2 ≤ nh−p σε2 (ς¯ (x) + o(1))c τ Σ c = o(n2d0 +1 (log n)−d0 −1/2 ),

t =1

which implies (4.12). Moreover, it is easy to see that, as n → ∞, E (Zt ) ∼ σt (x)ft (x)c τ µ. Let Sn =

n

t =1 d0 −1

|φj (u)| ≤ Cj

(4.13)

εt σt (x)ft (x). Then Sn = j=−∞ cj ζj , where cj = t =max{1,j} φt −j for all u ∈ [0, 1] and j ≥ 1, Assumptions (A3) and (A4) imply that n

n

t  n

σt (x)ft (x). Let σn2 = Var (Sn ). Since

|cj | ≤ C ′ nd0 with some positive constant C ′ , and

σn2 ∼ ω( ¯ x)

n  s,t =1;s̸=t

γ (s, t ) ∼ ω( ¯ x)V (u0 )n2d0 +1 (log n)−d0 −1/2 .

(4.14)

10

L. Wang / Journal of the Korean Statistical Society (

)



Therefore, the same arguments as in the proof of Theorem 3.4 of Palma (2010) yield that 1 D n1/2−d0 (log n)d0 /2+1/4 Sn −→ (ω( ¯ x)V (u0 ))1/2 Z . n

(4.15)

Combining (4.11)–(4.15), we show (4.10) and this completes the proof of part (1) of Theorem 2.2. Proof of part (2). For any c = (c0 , c1τ )τ , let Pn = (nhp )−1/2

n 

εt σt (Xt )Kc



Xt − x



h

t =1

.

Again we have E (Pn )2 =

n 1 

nhp t =1

+

  2 Xt − x σε2 E σt (Xt )Kc h

n 

1



nhp s,t =1;s̸=t

γ (s, t )E σs (Xs )Kc



Xs − x h

    Xt − x E σt (Xt )Kc h

∼ σε2 ς¯ (x)c τ 6c + ω( ¯ x)V (u0 )n2d0 hp (log n)−d0 −1/2 c τ µµτ c ∼ σε2 ς¯ (x)c τ 6c .

Let σ 2 (x) = σε2 ς¯ (x)c τ 6c. Then, to prove part (2), by the Slutsky theorem and the weak convergence of Un , it suffices to show that D Sn∗ := σ −1 (x)Pn −→ Z ∗ ,

(4.16)



where Z is a standard normal random variable. Following along the similar lines of the proof of Theorem 2.1 in Csörgő and Mielniczuk (1999), (4.16) will follow if we show that

µn := E (Sn∗ |ε1 , . . . , εn ) −→ 0,

(4.17)

σ˜ n2 := Var (Sn∗ |ε1 , . . . , εn ) −→ 1

(4.18)

and

together with Ln (ϵ) =

n  hp 

n

t =1

{u:hp/2 |Yt (u)|≥ϵ n1/2 }

Yt2 (u)ft (u)du −→ 0

in probability, for each ϵ > 0, where Yt (u) = Vt (u) − statement (4.17). Using the fact that o(1), we have, as n → ∞,

µn = (n−1 hp )1/2 σ −1 (x)

n 

n

εt

x ). First we consider the Vt (u)ft (u)du, Vt (u) = σ (x1)hp εt σt (u)Kc ( u− h 1

t =1





(4.19)

σt (x + hv )Kc (v )ft (x + hv )dv

t =1

∼ (n−1 hp )1/2 σ −1 (x)c τ µ

n 

εt σt (x)ft (x)(1 + oP (1))

t =1

= OP (nd0 hp/2 (log n)−d0 /2−1/4 ) = oP (1). Similarly,

σ˜ n2 ∼ ∼ ∼

1

n 

nσ 2 (x) t =1 n c τ 6c 

nσ 2 (x) t =1 1

εt2

σt2 (x + hv )Kc2 (v )ft (x + hv )dv

εt2 σt2 (x)ft (x)(1 + oP (1))

n 1

σε2 ς¯ (x) n



t =1

d0

1

εt σt (x)ft (x) = OP (n 2 +d0 (log n)− 2 − 4 ) (see (4.14)) and n2d0 −1 (log n)−d0 −1/2 nhp =

εt2 σt2 (x)ft (x)(1 + oP (1)).

L. Wang / Journal of the Korean Statistical Society (

Let atj = φj (t /n). Note that E (εt4 ) ≤ C (

 Var

n 1

n t =1

 εt2 σt2 (x)ft (x)

=

∞

j =0



n t =1

n 1 

n2 t = 1



n t =1

j =0

11

j4d0 −4 < ∞. Then we have,



2 σε σ (x)ft (x) 2

2 t

2

σt4 (x)ft2 (x)E (εt4 ) +

n 1



atj1 atj2 asj3 asj4 σt2 (x)ft (x)σs2 (x)fs (x)E ζt −j1 ζt −j2 ζs−j3 ζs−j4

× σt2 (x)ft (x)σs2 (x)fs (x) + 

)

∞



n2 s,t =1 j ,...,j =0 1 4 n 1

a4tj E ζj4 ≤ C σε4 + C

j =0

∞ 

n 1 



=

a2tj )2 +

∞

n 

∞ 

n2 s,t =1;s>t j ,j =0 1 2 1

n 

∞ 

n2 s,t =1;s̸=t j ,j =0 1 2

atj1 as,j1 +s−t atj2 as,j2 +s−t

a2tj1 a2sj2 σt2 (x)ft (x)σs2 (x)fs (x)

2 σε σ (x)ft (x) 2

.

2 t

That is,

 Var

n 1

n t =1

 ε σ (x)ft (x) 2 2 t t

= O(n−1 ) + ≤ O(n−1 ) +

2 n2

n 

γ 2 (s, t )σt2 (x)ft (x)σs2 (x)fs (x)

s,t =1;s>t

n s −1 C 

n2 s=1 t =1

≤ O(n−1 ) + Cn4d0 −2



|s − t |2d(s/n)+2d(t /n)−2

1

0

u



|u − v|4d0 −2 dudv 0

≤ O(n−1 ) + O(n4d0 −2 ). This arrives at, as n → ∞, n 1

n t =1

εt2 σt2 (x)ft (x) −→ σε2 ς¯ (x)

(4.20)

in probability, and hence (4.18) follows. Finally, to prove (4.19), setting Itϵ (u) = I



              εt Kc u − x σt (u) + εt Kc u − x σt (u)ft (u)du ≥ ϵ ,    (nhp )1/2 σ (x)  h h 1

we obtain Ln (ϵ) ≤ 2L∗n (ϵ) + 2Wn∗ , where L∗n (ϵ) =

1

n 

nσ 2 (x) t =1

εt2



Itϵ (x + hv )Kc2 (v )σt2 (x + hv )ft (x + hv )dv ,

and Wn∗ =



hp

n 

nσ 2 (x) t =1 hp

n 

εt2



Kc (v )σt (x + hv )ft (x + hv )dv

2

εt2 σt2 (x)ft2 (x)(c τ µ)2 (1 + oP (1))

nσ 2 (x) t =1 = hp OP (1) = oP (1).

The above magnitude OP (1) is due to (4.20). For positive integer N (nhp )1/2 σ (x)/(K ∗ σ ∗ (x) + σ ∗ (x)f ∗ (x)) ≥ N and introduce h(εt , ϵ N ) = εt2 I (|εt | ≥ ϵ N ) ,

= Nn increasing with n, choose n such that

12

L. Wang / Journal of the Korean Statistical Society (

)



where K ∗ = supu |K (u)| < ∞, f ∗ (x) = supt ft (x) < ∞ and σ ∗ (x) = supt σt2 (x) < ∞, it is now easy to see that for n large enough, Ln (ϵ) ≤

n 2c τ Σ c σ ∗ (x)f ∗ (x) 1 

σ 2 (x)

n t =1

h(εt , ϵ N ) + 2Wn∗ .

Note that

(E (h(εt , ϵ N )))2 ≤ E (εt4 )EI (|εt | ≥ ϵ N ) ≤ C

σε2

ϵ2N 2

−→ 0,

and

 Var

n 1

n t =1

 h(εt , ϵ N )

1



n2

n

n  

E εt4 I (εt ≥ ϵ N ) −→ 0



t =1

as n → ∞ and N → ∞. Whence Ln (ϵ) −→ 0 in probability as n → ∞ for any ϵ > 0. Now (4.16) follows immediately and hence the second result of Theorem 2.2 is proved. Proof of part (3). The proof is analogous to the proof of part (2) and the proof of Theorem 3 in Csörgő and Mielniczuk (1999). Hence we omit the details here.  4.3. Proof of Theorem 2.3 Proof. From (2.3), we have that gˆ (x) − g (x)





g (x) − ∇ g (x)) h(∇

= Un−1 (x)Wn (x),

where Wn (x) =

W0 (x) , W1 (x)





τ W0 (x) = 1n t =1 Kh (Xt − x)(Yt − g (x) − ∇ g (x) (Xt − x)), W1 (x) = Then the theorem will follow if we show that

n

1 n

n

t =1

 Xt −x  h

Kh (Xt − x)(Yt − g (x) − ∇ g (x)τ (Xt − x)).

sup ∥Wn (x) − E (Wn (x))∥ = OP ((nh2p )−1/2 ) + OP (nd0 −1/2 h−p ),

(4.21)

sup ∥Un−1 (x)∥ = OP (1),

(4.22)

sup ∥E (Wn (x))∥ = O(h2 ),

(4.23)

x∈χ

x∈χ

and x∈R p

where ∥A∥ = (trace(Aτ A))1/2 for a matrix A. Note that, for some finite positive constant C , sup ∥Wn (x) − E (Wn (x))∥ ≤ C sup ∥Vn (x) − E (Vn (x))∥ x∈χ

x∈χ

+ C (sup |g (x)| + h sup ∥∇ g (x)∥) sup ∥Un (x) − E (Un (x))∥. x∈χ

x∈χ

(4.24)

x∈χ

If we can show that sup ∥Un (x) − E (Un (x))∥ = OP ((nh2p )−1/2 ),

(4.25)

sup ∥Vn (x) − E (Vn (x))∥ = OP ((nh2p )−1/2 ) + OP (nd0 −1/2 h−p ),

(4.26)

x∈χ

and x∈χ

then (4.21) follows from (4.24)–(4.26) and the fact that supx∈χ |g (x)| < ∞ and supx∈χ ∥∇ g (x)∥ < ∞. Now we proceed to prove (4.26). The proof of (4.25) is quite similar and is therefore omitted. Observe that V1 (x) =

n 1

n t =1



Xt − x h



Kh (Xt − x)(g (Xt ) + εt σt (Xt )) := I1 (x) + I2 (x).

L. Wang / Journal of the Korean Statistical Society (

)



13

It in turn suffices to show that sup ∥I1 (x) − E (I1 (x))∥ = OP ((nh2p )−1/2 ),

(4.27)

sup ∥I2 (x) − E (I2 (x))∥ = OP ((nh2p )−1/2 ) + OP (nd0 −1/2 h−p ).

(4.28)

x∈χ

and x∈χ

Note that, for k = 0, 1, . . . , p, sup |(I1 (x))k − E ((I1 (x))k )|

x∈R p

     n   1  τ τ   g (Xt ) e−ir (Xt −x)/h Φk0 (r )dr − E g (Xt ) e−ir (Xt −x)/h Φk0 (r )dr  = sup  p   p nh x∈R t =1    n  τ τ τ  1   {g (Xt )e−ir Xt /h − E [g (Xt )e−ir Xt /h ]} sup |eir x/h | |Φk0 (r )|dr . ≤  p  x∈R p  nh t =1     n  1 −ir τ Xt −ir τ Xt  {g (Xt )e − E [g (Xt )e ]} |Φk0 (rh)|dr . ≤    n t =1

(4.29)

We have

 Var

n 1

n t =1

 τ

g (Xt ) cos(r Xt )

≤ C (sup |g (x)|)2 n−1 . x∈χ

The same inequality holds with cos(·) replaced by sin(·). Hence, n 1

n t =1

{g (Xt )e−ir

τX

t

− E [g (Xt )e−ir

τX

t

]} = OP (n−1/2 ).

This, together with the fact that |Φk0 (rh)|dr = O(h−p ), implies (4.27). (4.28) follows by using the similar arguments as in (4.29), (4.7) and (4.8). Since ∥Un−1 (x)∥ ≤ 1/|λmin (Un (x))|, where λmin (·) denotes the smallest eigenvalue of a matrix, to prove (4.22), it suffices to show that there exists a finite positive constant C such that



lim P ( inf |λmin (Un (x))| ≥ C ) = 1.

n→∞

x∈χ

Note that, for any c = (c0 , c1τ )τ ∈ R p+1 ,

 τ ¯ sup |E ((c Un (x))k ) − fn (x)(c U )k | ≤ sup |uk Kc (u)| |f¯n (x + hu) − f¯n (x)|du x∈R p x∈R p  ≤ C |uk Kc (u)| ∥hu∥du −→ 0 τ

as n → ∞. This, together with (4.25), implies that supx∈χ ∥Un (x) − f¯n (x)U ∥ = oP (1). Thus we have sup |λmin (Un (x)) − λmin (f¯n (x)U )| = oP (1). x∈χ

Note that χ = {x : f¯n (x) ≥ M > 0}, where M is a positive constant. Then (4.30) means that lim P ( inf |λmin (Un (x))| ≥ sup |λmin (f¯n (x)U )| ≥ M |λmin (U )|) = 1.

n→∞

x∈χ

x∈χ

Finally, using the second-order Taylor expansion of g (·) gives that, for some |ν| < 1,

    uk K (u)f¯n (x + hu)(hu)τ H (x + ν hu)(hu)du   x∈R p 2      h2 ≤ sup sup |f¯n (x)| sup |H (x)|  ∥u∥3 |K (u)|du p p 2 n

sup |E ((Wn (x))k )| = sup

x∈R p

1

x∈R

x∈R

= O(h ). 2

This completes the proof of Theorem 2.3.



(4.30)

14

L. Wang / Journal of the Korean Statistical Society (

)



Acknowledgments

The author sincerely wishes to thank two referees for their queries and many insightful remarks and suggestions which have led to improving the presentation of the results. This work has been supported by the National Natural Science Foundation of China (NSFC) (11171147) and the Cultivation Fund of the Key Scientific and Technical Innovation Project, Ministry of Education of China (708044).

References

Beran, J. (1994). Statistics for long-memory processes. New York: Chapman and Hall.
Beran, J. (2009). On parameter estimation for locally stationary long-memory processes. Journal of Statistical Planning and Inference, 139, 900–915.
Beran, J., Feng, Y., Ghosh, S., & Kulik, R. (2013). Long-memory processes. Probabilistic properties and statistical methods. Heidelberg: Springer.
Beran, J., Sherman, R., Taqqu, M. S., & Willinger, W. (1995). Long-range dependence in variable-bit-rate video traffic. IEEE Transactions on Communication, 43(234), 1566–1579.
Ciuperca, G. (2011). Asymptotic behaviour of the LS estimator in a nonlinear model with long memory. Journal of the Korean Statistical Society, 40(2), 193–203.
Csörgő, S., & Mielniczuk, J. (1999). Random-design regression under long-range dependent errors. Bernoulli, 5(2), 209–224.
Dobrushin, R. L., & Major, P. (1979). Non-central limit theorems for non-linear functions of Gaussian fields. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 50, 27–52.
Doukhan, P., Oppenheim, G., & Taqqu, M. S. (2003). Theory and applications of long-range dependence. Boston: Birkhäuser.
Falconer, K., & Fernandez, C. (2007). Inference on fractal processes using multiresolution approximation. Biometrika, 94(2), 313–334.
Giraitis, L., Koul, H. L., & Surgailis, D. (2012). Large sample inference for long memory processes. London: Imperial College Press.
Granger, C. W., & Hyung, N. (2004). Occasional structural breaks and long memory with an application to the S&P 500 absolute stock returns. Journal of Empirical Finance, 11, 399–421.
Guégan, D. (2005). How can we define the concept of long memory? An econometric survey. Econometric Reviews, 24, 113–149.
Kulik, R., & Wichelhaus, C. (2011). Nonparametric conditional variance and error density estimation in regression models with dependent errors and predictors. Electronic Journal of Statistics, 5, 856–898.
Lavielle, M., & Ludena, C. (2000). The multiple change-points problem for the spectral distribution. Bernoulli, 6(5), 845–869.
Masry, E., & Mielniczuk, J. (1999). Local linear regression estimation for time series with long-range dependence. Stochastic Processes and their Applications, 82, 173–193.
Palma, W. (2007). Long-memory time series. Theory and methods. Wiley Series in Probability and Statistics. Hoboken: John Wiley and Sons.
Palma, W. (2010). On the sample mean of locally stationary long-memory processes. Journal of Statistical Planning and Inference, 140, 3764–3774.
Ray, B. K., & Tsay, R. S. (2002). Bayesian methods for change-point detection in long-range dependent processes. Journal of Time Series Analysis, 23(6), 687–705.
Robinson, P. M. (2011). Asymptotic theory for nonparametric regression with spatial data. Journal of Econometrics, 165, 5–19.
Roueff, F., & von Sachs, R. (2011). Locally stationary long memory estimation. Stochastic Processes and their Applications, 121, 813–844.
Wang, L. (2015). Time varying long memory parameter estimation for locally stationary long memory processes. Working paper.
Wang, L., & Cai, H. (2010). Asymptotic properties of nonparametric regression for long memory random fields. Journal of Statistical Planning and Inference, 140, 837–850.
Whitcher, B., & Jensen, M. J. (2000). Wavelet estimation of a local long memory parameter. Exploration Geophysics, 31, 94–103.