Statistical inference for partially linear stochastic models with heteroscedastic errors


Computational Statistics and Data Analysis 66 (2013) 150–160

Contents lists available at SciVerse ScienceDirect

Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda

Statistical inference for partially linear stochastic models with heteroscedastic errors Xiaoguang Wang ∗ , Dawei Lu, Lixin Song School of Mathematical Sciences, Dalian University of Technology, Dalian, 116023, PR China

Article info

Article history: Received 27 May 2012; received in revised form 27 March 2013; accepted 7 April 2013; available online 12 April 2013.

Keywords: Partially linear model; Time series; Heteroscedasticity; Kernel; Simultaneous confidence bands

Abstract

Partially linear models extend linear models by allowing one covariate to enter nonparametrically, which strikes a good balance between flexibility and parsimony. The partially linear stochastic model with heteroscedastic errors is considered, where the nonparametric part can act as a trend. Estimators of the parametric component, the nonparametric component and the volatility function are proposed. Furthermore, simultaneous confidence bands for the nonparametric part and the volatility function are constructed, and their coverage probabilities are shown to be asymptotically correct. With the confidence bands, hypothesis testing problems in this model can be addressed effectively from a global view. The finite sample performance of the proposed method is assessed by Monte Carlo simulation studies, and demonstrated by the analyses of a non-stationary Australian annual temperature anomaly series and non-homoscedastic daily air quality measurements in New York, where the simultaneous confidence bands provide more comprehensive information about the nonparametric and volatility functions. © 2013 Elsevier B.V. All rights reserved.

1. Introduction

Various semiparametric models have been developed for data analysis, and one popular specification is the partially linear model proposed by Engle et al. (1986), which has gained considerable attention over the last three decades. This compromise is more flexible than the standard linear model and affords greater precision than a purely nonparametric one. Estimation methods have been developed extensively in the literature; see Heckman (1986), Rice (1986), Chen (1988), Speckman (1988), Robinson (1988) and Chen and Shiau (1991), among many others. These papers use kernel, spline, series and local linear smoothing methods, and also study the asymptotic properties of the estimators. For a survey of partially linear models, see Härdle et al. (2000). For semiparametric modeling in the stochastic case, we refer to Gao (2007).

One important issue in partially linear regression is heteroscedasticity. In regression analysis, heteroscedasticity refers to the situation in which the variance of the dependent variable is not constant. You and Chen (2005) tested for heteroscedasticity in partially linear models under the fixed design assumption. Ma et al. (2006) provided an efficient semiparametric estimator for heteroscedastic partially linear models. Lu (2009) studied empirical likelihood inference for heteroscedastic partially linear models. The current literature is mainly confined to testing heteroscedasticity in the partially linear model under the fixed design assumption. This motivates testing heteroscedasticity in the partially linear model based on stochastic processes.



Corresponding author. Tel.: +86 411 84708351 8302; fax: +86 411 84708354. E-mail address: [email protected] (X. Wang).

0167-9473/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.csda.2013.04.004


Consider the partially linear model with time series

\[ Y_i = X_i^T\beta + \mu(T_i) + \sigma(T_i)\varepsilon_i, \quad i = 1, \dots, n, \tag{1.1} \]

where the $Y_i$ are responses, $X_i = (X_{i1}, X_{i2}, \dots, X_{ip})^T$ and $T_i$ are stochastic explanatory variables, $\beta \in \mathbb{R}^p$ is an unknown parameter, and the functions $\mu(\cdot)$ and $\sigma^2(\cdot)$ are the unknown nonparametric component function and the conditional variance function, respectively. The error terms $\{\varepsilon_i\}$, $i = 1, \dots, n$, are independent and identically distributed (i.i.d.) random noise. $X_i$, $T_i$ and $\varepsilon_i$ are independent of each other. For identifiability, let $(\beta, \mu)$ satisfy

\[ E\{Y_i - X_i^T\beta - \mu(T_i)\}^2 = \min_{\beta', \mu'} E\{Y_i - X_i^T\beta' - \mu'(T_i)\}^2. \]

We assume that $X_i$ does not contain an intercept. In this paper, we only focus on the case in which the heteroscedasticity is driven by the nonparametric covariate $T_i$. Our main goal is to estimate the parametric part and to conduct inference by constructing asymptotic simultaneous confidence bands (SCBs) for $\mu(\cdot)$ and $\sigma^2(\cdot)$, since SCBs play a valuable role in addressing the model validation problem.

In this paper, we consider SCBs in heteroscedastic partially linear stochastic models. To construct an asymptotic SCB for $\mu(\cdot)$, we need functions $l_n(\cdot)$ and $u_n(\cdot)$ that satisfy

\[ \lim_{n\to\infty} P\{l_n(t) \le \mu(t) \le u_n(t) \text{ for all } t \in T\} = 1 - \alpha, \tag{1.2} \]

where $t \in T = [a_1, a_2]$ and $\alpha \in (0, 1)$. However, it is difficult to establish an exact theory of the confidence band for $\mu(\cdot)$ as above. We shall consider a variant of (1.2), namely,

\[ \lim_{n\to\infty} P\{l_n(t) \le \mu(t) \le u_n(t) \text{ for all } t \in T_n\} = 1 - \alpha. \tag{1.3} \]

Here, $T_n$ denotes a discrete subset of $T$ that becomes denser as $n \to \infty$; that is, we deal with the points $t \in T_n$ instead of all $t \in T$ as in (1.2). A similar framework is adopted in Knafl et al. (1985). The asymptotic SCB for $\sigma^2(\cdot)$ can be discussed in the same way.

The rest of the paper is organized as follows. We introduce the dependence structure on $(X_i, T_i, Y_i, \varepsilon_i)$ and present the estimators of $\beta$, $\mu(\cdot)$ and $\sigma^2(\cdot)$ in Section 2. Theoretical results, including the asymptotic coverage probabilities for $\mu(\cdot)$ and $\sigma^2(\cdot)$, are given in Section 3. Following these theorems, asymptotic simultaneous confidence bands for the nonparametric and heteroscedastic parts are constructed. In Section 4, we perform a simulation study of the simultaneous confidence bands. In Section 5, the Australian annual temperature anomaly series and daily air quality measurements in New York are analyzed by the proposed method. We use the nonparametric part to capture the nonlinear trend over the years for the first data set, and the second data set is clearly heteroscedastic. A conclusion and discussion are given in Section 6. All proofs are provided in the Appendix.

2. Estimation

In this section, we set up the dependence structure and the estimation procedure needed for the theoretical properties. We first introduce some notation. Let $\lceil a \rceil$ denote the smallest integer not less than $a \in \mathbb{R}$. For two real sequences $\{a_n\}$ and $\{b_n\}$, write $a_n \asymp b_n$ if there are constants $0 < A < B < \infty$ such that $A \le a_n/b_n \le B$ for all large $n$. For $T \subset \mathbb{R}$, define $C^p(T) = \{g(\cdot) : \sup_{t\in T} |g^{(k)}(t)| < +\infty,\ k = 0, 1, \dots, p\}$, where $g^{(k)}(t)$ is the $k$th order derivative of $g(t)$, $k \ge 0$. Define $A^{\epsilon} = \bigcup_{y\in A} \{x : |x - y| \le \epsilon\}$ to be the $\epsilon$-neighborhood of any interval $A$, where $\epsilon > 0$.

A random variable $X$ is said to be in $L^p$, $p > 0$, if $\|X\|_p := \{E(|X|^p)\}^{1/p} < +\infty$. Let $P_k M = E(M \mid F_k) - E(M \mid F_{k-1})$, where $M \in L^1$, $k \in \mathbb{Z}$. In (1.1), let $T_i = G(F_i)$, where $F_i = (\cdots, \eta_{i-1}, \eta_i)$, the $\eta_i$ are i.i.d., $i \in \mathbb{Z}$, and $G$ is a measurable function. Hence $T_i$ is a stationary process. Assume that $\{X_i\}_{i=1,\dots,n}$ is a bounded stationary process. Assume also that $\varepsilon_i$ is independent of $F_i$, $i \in \mathbb{Z}$, and $\eta_i$ is independent of $\varepsilon_j$, $j \le i - 2$. Then the structure of $T_i$ is fully described.

Let $F_T$ and $F_\varepsilon$ be the distribution functions of $T_1$ and $\varepsilon_1$, respectively. Define $f_T = F_T'$ and $f_\varepsilon = F_\varepsilon'$, which are the densities of $T_1$ and $\varepsilon_1$, respectively. Let $F_T(t \mid F_i) = P(T_{i+1} \le t \mid F_i)$, $i \in \mathbb{Z}$, be the conditional distribution function of $T_{i+1}$ given $F_i$ and $f_T(t \mid F_i) = \partial F_T(t \mid F_i)/\partial t$ be the conditional density function. Define $\theta_i = \sup_{t\in\mathbb{R}} \|P_0 f_T(t \mid F_i)\| + \sup_{t\in\mathbb{R}} \|P_0 f_T'(t \mid F_i)\|$. For $n \in \mathbb{N}$, define

\[ \Xi_n = n\Theta_{2n}^2 + \sum_{k=n}^{\infty} (\Theta_{k+n} - \Theta_k)^2, \quad \text{where } \Theta_n = \sum_{i=1}^{n} \theta_i. \]

Our target is to obtain the SCBs for $\mu(\cdot)$ and $\sigma^2(\cdot)$. Without loss of generality, let $E(\varepsilon_1) = 0$ and $E(\varepsilon_1^2) = 1$. The estimators must be given first. For a fixed $\beta$, the model (1.1) can be regarded as a nonparametric model, hence we can define an estimator of $\mu(\cdot)$ as

\[ \mu_n(\cdot) = \sum_{i=1}^{n} w_{ni}(\cdot)(Y_i - X_i^T\beta), \tag{2.1} \]


where the $w_{ni}(\cdot)$ are positive weight functions. Once the form of $w_{ni}(\cdot)$ and an estimator of $\beta$ are specified, $\mu_n(\cdot)$ becomes a feasible estimator of $\mu(\cdot)$. Applying the least squares criterion and the estimator $\mu_n(\cdot)$, we obtain the semiparametric least squares estimator of $\beta$ in (1.1):

\[ \hat\beta = (\tilde X^T \tilde X)^{-1} \tilde X^T \tilde Y, \tag{2.2} \]

where $\tilde X = (\tilde X_1, \tilde X_2, \dots, \tilde X_n)^T$, $\tilde X_i = X_i - \sum_{r=1}^{n} w_{nr}(T_i) X_r$, $\tilde Y = (\tilde Y_1, \tilde Y_2, \dots, \tilde Y_n)^T$ and $\tilde Y_i = Y_i - \sum_{r=1}^{n} w_{nr}(T_i) Y_r$. In theory, a more efficient weighted least squares estimator for $\beta$ could be chosen in the case of heteroscedastic errors. For ease of computation, we prefer the simpler one in practice.

A kernel function $K(\cdot)$ is used to construct the weight functions $w_{ni}(t)$. Let $w_{ni}(t) = K_{b_n}(T_i - t)/(n b_n \hat f_T(t))$, where

\[ \hat f_T(t) = \frac{1}{n b_n} \sum_{i=1}^{n} K_{b_n}(T_i - t). \]

Here and hereafter $K_{b_n}(u) = K(u/b_n)$. Let $\psi_K = \int_{\mathbb{R}} u^2 K(u)\,du/2$ and $\varphi_K = \int_{\mathbb{R}} K^2(u)\,du$. The bandwidth satisfies $b_n \to 0$ and $n b_n \to \infty$ as $n \to \infty$. Furthermore, we update the estimator of $\mu(t)$. Substituting $\hat\beta$ and $w_{ni}(t)$ into (2.1), we have

\[ \hat\mu_{b_n}(t) = \frac{1}{n b_n \hat f_T(t)} \sum_{i=1}^{n} K_{b_n}(T_i - t)(Y_i - X_i^T\hat\beta). \tag{2.3} \]
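The estimators (2.2) and (2.3) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the Epanechnikov kernel is an assumed choice of $K$, and the full $n \times n$ weight matrix is formed explicitly, which is only practical for moderate $n$. The weights are computed as $K_{b_n}(T_i - t)/\sum_r K_{b_n}(T_r - t)$, which is algebraically the same as $K_{b_n}(T_i - t)/(n b_n \hat f_T(t))$.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2)_+ (an assumed choice of K)."""
    return 0.75 * np.clip(1.0 - np.asarray(u) ** 2, 0.0, None)


def kernel_weights(t_eval, T, b):
    """Rows hold w_ni(t) = K_b(T_i - t) / sum_r K_b(T_r - t) for each t in t_eval.

    A row is NaN when no observation lies within one bandwidth of t.
    """
    t_eval = np.atleast_1d(np.asarray(t_eval, dtype=float))
    K = epanechnikov((T[None, :] - t_eval[:, None]) / b)
    with np.errstate(invalid="ignore", divide="ignore"):
        return K / K.sum(axis=1, keepdims=True)


def fit_beta(X, Y, T, b):
    """Semiparametric least squares estimator of beta, as in (2.2)."""
    W = kernel_weights(T, T, b)      # n x n smoother matrix
    X_tilde = X - W @ X              # X_i - sum_r w_nr(T_i) X_r
    Y_tilde = Y - W @ Y              # Y_i - sum_r w_nr(T_i) Y_r
    beta_hat, *_ = np.linalg.lstsq(X_tilde, Y_tilde, rcond=None)
    return beta_hat


def mu_hat(t_eval, X, Y, T, beta_hat, b):
    """Kernel estimator of mu at the points t_eval, as in (2.3)."""
    return kernel_weights(t_eval, T, b) @ (Y - X @ beta_hat)
```

Here `X` is an `(n, p)` array and `Y`, `T` are length-`n` arrays. Using `np.linalg.lstsq` instead of forming $(\tilde X^T \tilde X)^{-1}$ explicitly gives the same estimate when $\tilde X$ has full column rank and is numerically more stable.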

Next, we deal with the variance function estimation. According to (1.1) and $E(\varepsilon_1^2) = 1$, we have $E\{(Y_i - X_i^T\beta - \mu(T_i))^2 \mid T_i\} = \sigma^2(T_i)$. We choose the natural residual-based estimator for the variance function

\[ \hat\sigma_{h_n}^2(t) = \frac{1}{n h_n \tilde f_T(t)} \sum_{i=1}^{n} K_{h_n}(T_i - t)\left\{Y_i - X_i^T\hat\beta - \hat\mu_{b_n}(T_i)\right\}^2, \tag{2.4} \]

where $\tilde f_T(t) = 1/(n h_n) \sum_{i=1}^{n} K_{h_n}(T_i - t)$ and $h_n$ is the bandwidth. In the next section, we give the asymptotic properties and construct SCBs based on the estimators (2.2)–(2.4).
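The residual-based estimator (2.4) is a kernel-weighted average of squared residuals, which can be sketched as follows. The sketch assumes that the residuals $Y_i - X_i^T\hat\beta - \hat\mu_{b_n}(T_i)$ have already been computed by the caller; the function names and the Epanechnikov kernel are illustrative assumptions rather than part of the paper.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2)_+ (an assumed choice of K)."""
    return 0.75 * np.clip(1.0 - np.asarray(u) ** 2, 0.0, None)


def nw_smooth(t_eval, T, values, bandwidth):
    """Kernel-weighted average of `values` around each point of t_eval.

    Returns NaN where no observation lies within one bandwidth of t.
    """
    t_eval = np.atleast_1d(np.asarray(t_eval, dtype=float))
    K = epanechnikov((T[None, :] - t_eval[:, None]) / bandwidth)
    with np.errstate(invalid="ignore", divide="ignore"):
        return (K @ values) / K.sum(axis=1)


def sigma2_hat(t_eval, residuals, T, h):
    """Variance function estimator (2.4): smooth the squared residuals in T."""
    return nw_smooth(t_eval, T, np.asarray(residuals) ** 2, h)
```

Because the weights are nonnegative and the smoothed values are squares, the estimate is nonnegative wherever it is defined.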

3. Main results

In this section, SCBs for $\mu(\cdot)$ and $\sigma^2(\cdot)$ will be provided. Before presenting the main asymptotic results, we need the following regularity conditions.

(A1) $\inf_{t\in T} f_T(t) > 0$, $\inf_{t\in T} \sigma(t) > 0$, $\varepsilon_1 \in L^6$, and $f_T, \mu, \sigma \in C^4(T^{\epsilon})$ for some $\epsilon > 0$.

(A2) The kernel $K(\cdot)$ has a bounded support with a bounded derivative, and satisfies $\int_{\mathbb{R}} u^2 K(u)\,du \ne 0$.

(A3) $\sqrt{n}(\hat\beta - \beta) = O_P(1)$.

(A4) (a) $n b_n^9 + \dfrac{1}{n b_n} + \Xi_n\left(\dfrac{b_n^3}{n} + \dfrac{1}{n^2}\right) \to 0$.

(b) $n b_n^9 \ln n + \dfrac{(\ln n)^3}{n b_n^3} + \Xi_n\left(\dfrac{b_n^3 \ln n}{n} + \dfrac{(\ln n)^2}{n^2 b_n^{4/3}}\right) \to 0$.

(c) $n h_n^9 \ln n + \dfrac{\ln n}{n h_n^4} + \Xi_n\left(\dfrac{h_n^3 \ln n}{n} + \dfrac{(\ln n)^2}{n^2 h_n^{4/3}}\right) \to 0$.

Conditions (A1)–(A2) are commonly used smoothness conditions. Some strict stationarity and finite moment conditions should be assumed for $X_i$ and $T_i$ in this paper. However, by Härdle et al. (2000), $\hat\beta$ is asymptotically normal, hence we can work directly with condition (A3), $\sqrt{n}(\hat\beta - \beta) = O_P(1)$, and omit the corresponding conditions. Condition (A3) simply fixes the convergence rate of $\hat\beta$; in fact, any other estimation method that attains the same rate can replace the estimator in (2.2). Condition (A4) is commonly used in the coverage probability literature, such as Zhao and Wu (2008). Before constructing SCBs, we fix $t$ and give a theorem on the point-wise confidence interval.

Theorem 3.1. Let $t \in \mathbb{R}$ be fixed. Under conditions (A1)–(A4)(a), we have



\[ \frac{\sqrt{n b_n \hat f_T(t)}}{\sigma(t)\sqrt{\varphi_K}} \left\{\hat\mu_{b_n}(t) - \mu(t) - b_n^2 \psi_K \rho_\mu(t)\right\} \xrightarrow{d} N(0, 1), \tag{3.1} \]

as $n \to \infty$, where $\rho_\mu(t) = \mu''(t) + 2\mu'(t) f_T'(t)/f_T(t)$.

Since the concrete form of $\hat\beta$ is not needed in the proof, we only require that the estimator in (2.2) satisfies the condition in the theorem. Theorem 3.1 provides a central limit theorem for $\hat\mu(\cdot)$, which can be used to construct the point-wise confidence interval of $\mu(\cdot)$. In other words, it is the primary step for constructing SCBs. The following two theorems are needed to obtain the SCBs for $\mu(\cdot)$ and $\sigma^2(\cdot)$.
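As a worked illustration of how Theorem 3.1 yields a point-wise interval, one can invert the normal limit in (3.1) directly. The symbol $q_{1-\alpha/2}$ for the standard normal quantile is notation introduced here for illustration, and in practice $\sigma(t)$ and $\rho_\mu(t)$ would have to be replaced by estimates, which is an additional assumption beyond the statement of the theorem.

```latex
% Let q_{1-\alpha/2} be the (1-\alpha/2)-quantile of N(0,1). By (3.1),
\[
  \lim_{n\to\infty} P\left\{
    \left| \frac{\sqrt{n b_n \hat f_T(t)}}{\sigma(t)\sqrt{\varphi_K}}
      \left( \hat\mu_{b_n}(t) - \mu(t) - b_n^2 \psi_K \rho_\mu(t) \right)
    \right| \le q_{1-\alpha/2} \right\} = 1 - \alpha ,
\]
% so an asymptotic 100(1-\alpha)% point-wise interval for \mu(t) is
\[
  \hat\mu_{b_n}(t) - b_n^2 \psi_K \rho_\mu(t)
  \;\pm\; q_{1-\alpha/2}\,
  \frac{\sigma(t)\sqrt{\varphi_K}}{\sqrt{n b_n \hat f_T(t)}} .
\]
```

This interval is valid only at the single fixed point $t$; covering $\mu(t)$ at many points simultaneously requires a wider band.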


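Theorem 3.2. Let $T = [a_1, a_2]$ and let the kernel $K$ have support $[-k_0, k_0]$. Let $T_n = \{t_j = a_1 + 2k_0 b_n j,\ j = 0, 1, \dots, m_n - 1\}$ and $m_n = \lceil (a_2 - a_1)/(2k_0 b_n) \rceil$. Under conditions (A1)–(A4)(b), for every $z \in \mathbb{R}$,

\[ \lim_{n\to\infty} P\left\{ \frac{\sqrt{n b_n}}{\sqrt{\varphi_K}} \sup_{t\in T_n} \frac{[\hat f_T(t)]^{1/2}}{\sigma(t)} \left|\hat\mu_{b_n}(t) - \mu(t) - b_n^2 \psi_K \rho_\mu(t)\right| \le B_{m_n}(z) \right\} = e^{-2e^{-z}}, \tag{3.2} \]

where

\[ B_{m_n}(z) = \sqrt{2\ln m_n} - \frac{1}{\sqrt{2\ln m_n}} \left\{ \frac{1}{2}\ln\ln m_n + \ln(2\sqrt{\pi}) \right\} + \frac{z}{\sqrt{2\ln m_n}}, \quad m_n \ge 2. \]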
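The constant $B_{m_n}(z)$ and the value of $z$ that gives a limiting coverage of $1 - \alpha$ in (3.2) are simple to compute. The following sketch uses hypothetical function names; the second function just solves $e^{-2e^{-z}} = 1 - \alpha$ for $z$.

```python
import math


def band_constant(m, z):
    """B_m(z) from Theorem 3.2; requires m >= 2 so that ln(m) > 0."""
    if m < 2:
        raise ValueError("m must be at least 2")
    s = math.sqrt(2.0 * math.log(m))
    centering = 0.5 * math.log(math.log(m)) + math.log(2.0 * math.sqrt(math.pi))
    return s - centering / s + z / s


def z_for_level(alpha):
    """Solve exp(-2 * exp(-z)) = 1 - alpha for z, with 0 < alpha < 1."""
    return -math.log(-0.5 * math.log(1.0 - alpha))
```

For instance, `band_constant(m, z_for_level(0.05))` gives the multiplier for an asymptotic 95% band over `m` grid points. Since $B_m(z)$ is linear in $z$ with slope $1/\sqrt{2\ln m}$, the band width is not very sensitive to the level when $m$ is large.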

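Theorem 3.3. Let $h_n \asymp b_n$, $\tilde T_n = \{\tilde t_j = a_1 + 2k_0 h_n j,\ j = 0, 1, \dots, \tilde m_n - 1\}$ and $\tilde m_n = \lceil (a_2 - a_1)/(2k_0 h_n) \rceil$. Under conditions (A1)–(A4)(c), for every $z \in \mathbb{R}$,

\[ \lim_{n\to\infty} P\left\{ \frac{\sqrt{n h_n}}{\sqrt{\varphi_K \nu_\varepsilon}} \sup_{t\in\tilde T_n} \frac{\sqrt{\tilde f_T(t)}}{\hat\sigma_{h_n}^2(t)} \left|\hat\sigma_{h_n}^2(t) - \sigma^2(t) - h_n^2 \psi_K \rho_\sigma(t)\right| \le B_{\tilde m_n}(z) \right\} = e^{-2e^{-z}}, \]

where $\nu_\varepsilon = E(\varepsilon_0^4) - 1 > 0$ and $\rho_\sigma(t) = 2\sigma'(t)^2 + 2\sigma(t)\sigma''(t) + 4\sigma(t)\sigma'(t) f_T'(t)/f_T(t)$.

According to Theorems 3.2 and 3.3, define SCBs for $\mu(t)$ and $\sigma^2(t)$ as follows:

\[ \hat\mu_{b_n}(t) - b_n^2 \psi_K \rho_\mu(t) \pm \frac{\sqrt{\varphi_K}\,\hat\sigma_{h_n}(t)}{\sqrt{n b_n \hat f_T(t)}}\, B_{m_n}(z_\alpha) \tag{3.3} \]

and

\[ \hat\sigma_{h_n}^2(t) - h_n^2 \psi_K \rho_\sigma(t) \pm \frac{\sqrt{\varphi_K \nu_\varepsilon}\,\hat\sigma_{h_n}^2(t)}{\sqrt{n h_n \tilde f_T(t)}}\, B_{\tilde m_n}(z_\alpha), \tag{3.4} \]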

where $z_\alpha = -\log\log[(1 - \alpha)^{-0.5}]$. Note that estimators of $\rho_\mu(t)$, $\rho_\sigma(t)$ and $\nu_\varepsilon$ must be provided before constructing the SCBs. However, $\rho_\mu(t)$ and $\rho_\sigma(t)$ cannot be estimated easily since they involve the unknown functions $\mu''$, $\mu'$ and $f_T'$. For convenience, we adopt a simple jackknife-type bias correction procedure to solve this problem. Let $\hat\mu_{b_n}^*(t) = 2\hat\mu_{b_n}(t) - \hat\mu_{\sqrt{2} b_n}(t)$ and replace $\hat\mu_{b_n}(t)$ by $\hat\mu_{b_n}^*(t)$ in the SCBs. This treatment is equivalent to applying the 4th-order kernel $K^*(u) = 2K(u) - K(u/\sqrt{2})/\sqrt{2}$ in the estimation. It is easy to check that $\psi_{K^*} = 0$. Hence, in (3.3), $\rho_\mu(t)$ need not be estimated. Similarly, the estimator of $\rho_\sigma(t)$ in (3.4) can be omitted. We will provide the estimator of $\nu_\varepsilon$ in the simulation. In the next section, we examine the performance of the proposed bands by applying them to a simulated model.

Bandwidth selection is always an issue in kernel smoothing. A larger bandwidth may gain on the variance side but lose on the bias side, while a smaller bandwidth may gain on the bias side but lose on the variance side. An appropriate bandwidth is imperative for a good estimator. There are many criteria for bandwidth selection; see Fan and Gijbels (1996). We select the bandwidth to be $2 t_{sd}\, n^{-1/5}$, where $t_{sd}$ is the standard deviation of $T_i$.

4. Simulation

In this section, we build SCBs for a simulated model. SCBs can be used in hypothesis tests to judge whether $\mu(\cdot)$ and $\sigma^2(\cdot)$ are of certain forms. For example, whether $\mu(\cdot)$ and $\sigma^2(\cdot)$ are linear can be checked by SCBs. The kernel function involved in the estimation is taken to be the Epanechnikov kernel $K(t) = \frac{3}{4}(1 - t^2)_+$. To construct the SCBs introduced in (3.3) and (3.4), we consider data from the model

\[ Y_i = 0.75 X_i + \sin\{2(T_i - 0.5)\} + \sqrt{0.5\exp(0.6 - T_i^2)} \cdot \varepsilon_i, \quad i = 1, \dots, n, \]

where $n = 500$, $\varepsilon_i \sim N(0, 1)$, $X_i$ is produced by $X_i = 0.5X_{i-1} + 0.7W_i$, $T_i$ is produced by $T_i = 0.7T_{i-1} + 0.4W_i$, and $W_i \sim N(0, 0.3)$. We conduct 500 simulations to compute the confidence bands. To speed up the computation and improve the finite-sample accuracy, we make some adjustments in the simulation. We choose $B_{m_n}^*(z) := \{q \mid P(|U| \le q) = (1 - \alpha)^{1/m_n}\}$ instead of $B_{m_n}(z)$, where $U$ follows the standard normal distribution, because the convergence to the extreme value distribution is slow. It is easy to check that 99%–100% of the $T_i$'s lie in the interval $[-1.2298, 1.2573]$, so we take $T = [-1.2298, 1.2573]$. Let $T_m$ be a set containing $m$ points evenly spaced over $T$. We simulate SCBs based on the points in $T_m$. The optimal bandwidth is chosen to be $b_n = 0.28$. Then $m$ is fixed by the condition in Theorem 3.2, considering the balance between the computing speed and the SCB performance.

Fig. 1. Left: the 95% SCB for µ(·); the upper and lower dotted lines are the SCB for µ(·), the middle dotted line is the estimator of µ(·) and the solid line is the true value of µ(·). Right: the 95% SCB for σ 2 (·); the upper and lower dotted lines are the SCB for σ 2 (·), the middle dotted line is the estimator of σ 2 (·) and the solid line is the true value of σ 2 (·).

Let $h_n = b_n$, which satisfies the condition of Theorem 3.3. In order to apply Theorem 3.3, we need to provide an estimator of $\nu_\varepsilon$. Define

\[ \hat\nu_\varepsilon = \frac{\sum_{i=1}^{n} \hat\varepsilon_i^4\, I(T_i \in T)}{\sum_{i=1}^{n} I(T_i \in T)} - 1, \]

where $\hat\varepsilon_i = \{Y_i - X_i^T\hat\beta - \hat\mu_{b_n}^*(T_i)\}/\hat\sigma_{h_n}^*(T_i)$, $i = 1, 2, \dots, n$, $\hat\sigma_{h_n}^*$ is defined similarly to $\hat\mu_{b_n}^*$, and $I(\cdot)$ is an indicator function.

The simulation addresses two questions: what the SCBs for $\mu(\cdot)$ and $\sigma^2(\cdot)$ look like, and whether $\mu(\cdot)$ and $\sigma^2(\cdot)$ are linear. The curve in the left graph of Fig. 1 has a nonlinear shape. This supports the alternative hypothesis that the regression function $\mu(\cdot)$ is not linear, because neither the SCB nor the estimator of $\mu(\cdot)$ is linear in shape. Similarly, the nonlinear curve in the right graph of Fig. 1 indicates that the volatility function $\sigma^2(\cdot)$ is also not linear.

5. Real data analysis

Example 1 (Partially Linear Time Series Error Models). The partially linear time series model is applied to the Australian annual temperature anomaly series from 1911 to 2011. The motivation is to make a comparison with the results of Gao and Hawthorne (2006), while the application also includes some of the recent high temperature anomalies such as 1998. The temperature series is available from the R package "DAAG". Australian regional temperature, the annual Southern Oscillation Index (SOI), CO2 concentrations and average sunspot counts are included in the dataset. SOI is the difference in the barometric pressure at sea level between Tahiti and Darwin. Consider a partially linear model for trend detection and heteroscedasticity testing of the form

\[ Y_t = X_t^T\beta + \mu(t/T) + \sigma(t/T)\varepsilon_t, \quad t = 1, \dots, T, \]

where $Y_t$ is the mean temperature series of interest, $X_t$ is a vector of three explanatory variables, namely SOI, CO2 and sunspots, and $t$ is the time index in years. Recently, Gao and Hawthorne (2006) considered some estimation and testing problems for the trend function of the global temperature series data, and showed that a nonlinear trend looks feasible for the temperature series.

Fig. 2 shows the annual mean temperature series from 1911 to 2011. The trend in Fig. 2 appears to be distinctly nonlinear, hence the response is not a stationary process. Before 1960 it seems flat, but after 1960 it begins to increase. Similar plots suggest that the other two covariates are linearly related to the response. Fig. 3 shows the Southern Oscillation Index (SOI) from 1911 to 2011; the SOI component is included because it is a significant factor, as warranted by the interannual fluctuations of the temperature series. The SOI series in Fig. 3 is almost stationary. Two other factors, CO2 concentrations and average sunspot counts, are also included as covariates. The parametric estimates for SOI, CO2 and sunspots are −0.013657, −0.0001 and 0.0004.


Fig. 2. The solid line is the average Australian regional temperature series from 1911 to 2011.

Fig. 3. The solid line is the Southern Oscillation Index (SOI) from 1911 to 2011.

Fig. 4 shows the SCBs for both the nonparametric and variance functions. From the curves, the trend over the years is apparently nonlinear, and the error terms seem homoscedastic, so that a constant variance function is enough.

Example 2 (Environmetric Model). This example considers daily air quality measurements in New York, May–September 1973, obtained from the New York State Department of Conservation (ozone data) and the National Weather Service (meteorological data); see Chambers et al. (1983). The dataset contains 153 daily observations, from May 1 to September 30, 1973, of which only 111 have no missing data. The detection of the time trend is a first step in assessing the atmospheric environment quality. Consider a partially linear model of the form

\[ Y_i = X_i^T\beta + \mu(T_i) + \sigma(T_i)\varepsilon_i, \quad i = 1, \dots, 111, \]

where the response $Y_i$ is the mean ozone in parts per billion from 1300 to 1500 h at Roosevelt Island. The linear covariate $X_i$ is a vector of two explanatory variables: Solar.R, the solar radiation in Langleys in the frequency band 4000–7700 Ångströms from 0800 to 1200 h at Central Park, and Temp, the maximum daily temperature in degrees Fahrenheit at LaGuardia Airport. We focus on the case in which the heteroscedasticity is caused by the wind speed. The nonlinear part $T_i$ is Wind, the average wind speed in miles per hour at 0700 and 1000 h at LaGuardia Airport. We standardize the response $Y_i$ and the wind speed $T_i$.
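The preprocessing described above (dropping days with missing values, then standardizing the response and the wind speed) can be sketched as follows. The function names are hypothetical, missing values are assumed to be coded as NaN, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not say which version is used. The last helper applies the rule-of-thumb bandwidth $2 t_{sd}\, n^{-1/5}$ stated in Section 3; whether that rule was used for this example is not stated in the text.

```python
import numpy as np


def complete_cases(*columns):
    """Keep only the rows in which every column is observed (not NaN)."""
    stacked = np.column_stack([np.asarray(c, dtype=float) for c in columns])
    keep = ~np.isnan(stacked).any(axis=1)
    return [stacked[keep, j] for j in range(stacked.shape[1])]


def standardize(x):
    """Center to mean zero and scale to unit sample standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


def rule_of_thumb_bandwidth(t):
    """Bandwidth 2 * sd(T) * n^(-1/5), with sd taken as the sample sd."""
    t = np.asarray(t, dtype=float)
    return 2.0 * t.std(ddof=1) * t.size ** (-0.2)
```

After standardization the sample standard deviation of the wind speed is one, so the rule-of-thumb bandwidth depends on the data only through the number of complete cases.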


Fig. 4. Left: the 95% SCB for µ(·); the upper and lower dotted lines are the SCB for µ(·), and the middle solid line is the estimator of µ(·). Right: the 95% SCB for σ 2 (·); the upper and lower dotted lines are the SCB for σ 2 (·), and the middle solid line is the estimator of σ 2 (·).

Fig. 5. The solid line is the mean ozone series.

Fig. 5 shows the response series, that is, the standardized mean ozone values. The pattern indicates that this series is clearly heteroscedastic, since the variances of the left and right parts are smaller while the middle part is more volatile. Hence we should consider a heteroscedastic model for the data analysis. The parametric estimates for Solar.R and Temp are 0.0037 and 0.0013. Fig. 6 shows the SCBs for both the nonparametric and volatility functions. From the curves, the relationship between the ozone and the wind speed is nonlinear, and the volatility function does not appear to be constant.

6. Discussion

In this paper, we have introduced simultaneous confidence bands for partially linear time series with heteroscedastic errors, based on the theory of coverage probabilities. The proposed procedure is easy to implement. Moreover, the proposed method can be used to reveal the potential shapes of the nonparametric and conditional variance functions. As illustrated in our numerical studies and real data analyses, the proposed procedure performs well with moderate sample sizes. With additional effort, the procedure can be modified slightly to deal with some hypothesis testing problems. It is of interest to study other, more flexible semiparametric models with an unknown heteroscedastic function.


Fig. 6. Left: the 95% SCB for µ(·); the upper and lower dotted lines are the SCB for µ(·), and the middle solid line is the estimator of µ(·). Right: the 95% SCB for σ 2 (·); the upper and lower dotted lines are the SCB for σ 2 (·), and the middle solid line is the estimator of σ 2 (·).

Acknowledgments We are grateful to the Editor, an Associate Editor and two referees for reading the paper very carefully and their thoughtful and constructive comments. The research work of Xiaoguang Wang is supported by the National Natural Sciences Foundation of China grant 11101063 and 61173103, and the Fundamental Research Funds for the Central Universities in China (DUT12LK29). The research work of Dawei Lu is supported by the National Natural Sciences Foundation of China grant 11101061, and the Fundamental Research Funds for the Central Universities in China (grant no. DUT12LK16). The research work of Lixin Song is supported by the NSFC grant 61175041, the Specialized Research Fund for the Doctoral Program of Higher Education grant 20100041110036 and the Mathematics+X Project (DUT10JS06).

Appendix

In this section we give the proofs of Theorems 3.1–3.3. Throughout the proofs, without loss of generality, assume that the kernel $K$ has bounded support $[-1, 1]$ and $\beta$ is a one-dimensional constant.

Proof of Theorem 3.1. Decompose (3.1) into two parts as follows:

\[
\begin{aligned}
\frac{\sqrt{n b_n \hat f_T(t)}}{\sigma(t)\sqrt{\varphi_K}} \{\hat\mu_{b_n}(t) - \mu(t) - b_n^2 \psi_K \rho_\mu(t)\}
&= \frac{\sqrt{n b_n \hat f_T(t)}}{\sigma(t)\sqrt{\varphi_K}} \left[ \frac{1}{n b_n \hat f_T(t)} \sum_{i=1}^{n} K_{b_n}(T_i - t)\{\mu(T_i) + \sigma(T_i)\varepsilon_i\} - \mu(t) - b_n^2 \psi_K \rho_\mu(t) \right] \\
&\quad + \frac{\sqrt{n b_n \hat f_T(t)}}{\sigma(t)\sqrt{\varphi_K}} \cdot \frac{1}{n b_n \hat f_T(t)} \sum_{i=1}^{n} K_{b_n}(T_i - t)(X_i\beta - X_i\hat\beta).
\end{aligned}
\]

For the first item on the right-hand side of the above equation, we get the following result by Zhao and Wu (2008):

\[ \frac{\sqrt{n b_n \hat f_T(t)}}{\sigma(t)\sqrt{\varphi_K}} \left[ \frac{1}{n b_n \hat f_T(t)} \sum_{i=1}^{n} K_{b_n}(T_i - t)\{\mu(T_i) + \sigma(T_i)\varepsilon_i\} - \mu(t) - b_n^2 \psi_K \rho_\mu(t) \right] \xrightarrow{d} N(0, 1). \]

Since

\[ \hat f_T(t) \le \sup_{t\in T} \hat f_T(t) \le \sup_{t\in T} |\hat f_T(t) - f_T(t)| + \sup_{t\in T} f_T(t) = O_P(1), \]

and $X_i$ is bounded, say $|X_i| \le M$, we can deal with the second item as follows:

\[
\begin{aligned}
\left| \frac{\sqrt{n b_n \hat f_T(t)}}{\sigma(t)\sqrt{\varphi_K}} \cdot \frac{1}{n b_n \hat f_T(t)} \sum_{i=1}^{n} K_{b_n}(T_i - t)(X_i\beta - X_i\hat\beta) \right|
&\le \frac{\sqrt{n b_n \hat f_T(t)}}{\sigma(t)\sqrt{\varphi_K}} \cdot \frac{M}{n b_n \hat f_T(t)} \sum_{i=1}^{n} K_{b_n}(T_i - t)|\hat\beta - \beta| \\
&= \frac{M\sqrt{n b_n \hat f_T(t)}}{\sigma(t)\sqrt{\varphi_K}} |\hat\beta - \beta| \cdot \frac{1}{n b_n \hat f_T(t)} \sum_{i=1}^{n} K_{b_n}(T_i - t) \\
&= \frac{M\sqrt{b_n \hat f_T(t)}}{\sigma(t)\sqrt{\varphi_K}} |\sqrt{n}(\hat\beta - \beta)|.
\end{aligned}
\]

Noting that $\hat f_T(t) \le O_P(1)$, $\sqrt{n}(\hat\beta - \beta) = O_P(1)$ and $b_n \to 0$, we have

\[ \left| \frac{\sqrt{n b_n \hat f_T(t)}}{\sigma(t)\sqrt{\varphi_K}} \cdot \frac{1}{n b_n \hat f_T(t)} \sum_{i=1}^{n} K_{b_n}(T_i - t)(X_i\beta - X_i\hat\beta) \right| \xrightarrow{P} 0. \]

Therefore, Theorem 3.1 follows from Slutsky's theorem.

Proof of Theorem 3.2. We have

\[
\begin{aligned}
&\sup_{t\in T_n} \frac{\sqrt{2 n b_n \ln m_n}}{\sigma(t)\sqrt{\varphi_K}} \sqrt{\hat f_T(t)}\, |\hat\mu_{b_n}(t) - \mu(t) - b_n^2 \psi_K \rho_\mu(t)| - 2\ln m_n + \frac{1}{2}\ln\ln m_n + \ln(2\sqrt{\pi}) \\
&\quad \le \sup_{t\in T_n} \frac{\sqrt{2 n b_n \ln m_n \hat f_T(t)}}{\sigma(t)\sqrt{\varphi_K}} \left| \frac{1}{n b_n \hat f_T(t)} \sum_{i=1}^{n} K_{b_n}(T_i - t)\{\mu(T_i) + \sigma(T_i)\varepsilon_i\} - \mu(t) - b_n^2 \psi_K \rho_\mu(t) \right| \\
&\qquad - 2\ln m_n + \frac{1}{2}\ln\ln m_n + \ln(2\sqrt{\pi}) \\
&\qquad + \sup_{t\in T_n} \frac{\sqrt{2 n b_n \ln m_n \hat f_T(t)}}{\sigma(t)\sqrt{\varphi_K}} \left| \frac{1}{n b_n \hat f_T(t)} \sum_{i=1}^{n} K_{b_n}(T_i - t)(X_i\beta - X_i\hat\beta) \right|
\end{aligned}
\]

and

\[
\begin{aligned}
&\sup_{t\in T_n} \frac{\sqrt{2 n b_n \ln m_n}}{\sigma(t)\sqrt{\varphi_K}} \sqrt{\hat f_T(t)}\, |\hat\mu_{b_n}(t) - \mu(t) - b_n^2 \psi_K \rho_\mu(t)| - 2\ln m_n + \frac{1}{2}\ln\ln m_n + \ln(2\sqrt{\pi}) \\
&\quad \ge \sup_{t\in T_n} \frac{\sqrt{2 n b_n \ln m_n \hat f_T(t)}}{\sigma(t)\sqrt{\varphi_K}} \left| \frac{1}{n b_n \hat f_T(t)} \sum_{i=1}^{n} K_{b_n}(T_i - t)\{\mu(T_i) + \sigma(T_i)\varepsilon_i\} - \mu(t) - b_n^2 \psi_K \rho_\mu(t) \right| \\
&\qquad - 2\ln m_n + \frac{1}{2}\ln\ln m_n + \ln(2\sqrt{\pi}) \\
&\qquad - \sup_{t\in T_n} \frac{\sqrt{2 n b_n \ln m_n \hat f_T(t)}}{\sigma(t)\sqrt{\varphi_K}} \left| \frac{1}{n b_n \hat f_T(t)} \sum_{i=1}^{n} K_{b_n}(T_i - t)(X_i\beta - X_i\hat\beta) \right|.
\end{aligned}
\]

The two inequalities differ only in the sign of the last item. For the third item on the right-hand side of the inequalities, based on $\hat f_T(t) \le O_P(1)$, we have

\[
\begin{aligned}
&\sup_{t\in T_n} \frac{\sqrt{2 n b_n \ln m_n \hat f_T(t)}}{\sigma(t)\sqrt{\varphi_K}} \left| \frac{1}{n b_n \hat f_T(t)} \sum_{i=1}^{n} K_{b_n}(T_i - t)(X_i\beta - X_i\hat\beta) \right| \\
&\quad \le \frac{\sup_{t\in T_n} \sqrt{2 n b_n \ln m_n \hat f_T(t)}\, M}{\inf_{t\in T_n} \sigma(t)\sqrt{\varphi_K}} \sup_{t\in T_n} \frac{\sum_{i=1}^{n} K_{b_n}(T_i - t)|\hat\beta - \beta|}{\sum_{i=1}^{n} K_{b_n}(T_i - t)} \\
&\quad = \frac{O_P(\sqrt{2 b_n \ln m_n})}{\inf_{t\in T} \sigma(t)\sqrt{\varphi_K}} |\sqrt{n}(\hat\beta - \beta)| \xrightarrow{P} 0.
\end{aligned}
\tag{A.1}
\]

By Theorem 2 in Zhao and Wu (2008),

\[
\lim_{n\to\infty} P\left\{ \sup_{t\in T_n} \frac{\sqrt{2 n b_n \ln m_n \hat f_T(t)}}{\sigma(t)\sqrt{\varphi_K}} \left| \frac{\sum_{i=1}^{n} K_{b_n}(T_i - t)\{\mu(T_i) + \sigma(T_i)\varepsilon_i\}}{n b_n \hat f_T(t)} - \mu(t) - b_n^2 \psi_K \rho_\mu(t) \right| - 2\ln m_n + \frac{1}{2}\ln\ln m_n + \ln(2\sqrt{\pi}) \le z \right\} = e^{-2e^{-z}}.
\tag{A.2}
\]

According to (A.1), (A.2) and Slutsky's theorem, we finish the proof of Theorem 3.2.

Proof of Theorem 3.3. Let $K^*(u) = 2K(u) - K(u/\sqrt{2})/\sqrt{2}$. Note that

\[ |\hat\sigma_{h_n}^2(t) - \sigma^2(t) - h_n^2 \psi_K \rho_\sigma(t)| = \left| \frac{1}{n h_n \tilde f_T(t)} \sum_{i=1}^{n} K_{h_n}(T_i - t)\{Y_i - X_i\hat\beta - \hat\mu_{b_n}^*(T_i)\}^2 - \sigma^2(t) - h_n^2 \psi_K \rho_\sigma(t) \right|, \]

where $\{Y_i - X_i\hat\beta - \hat\mu_{b_n}^*(T_i)\}^2$ can be split into four items as follows:

\[
\begin{aligned}
\{Y_i - X_i\hat\beta - \hat\mu_{b_n}^*(T_i)\}^2
&\le \left[ \mu(T_i) + \sigma(T_i)\varepsilon_i - \frac{\sum_{j=1}^{n} K_{b_n}^*(T_i - T_j)\{\mu(T_j) + \sigma(T_j)\varepsilon_j\}}{n b_n \hat f_T(T_i)} \right]^2 \\
&\quad + 2\left| Y_i - X_i\beta - \frac{\sum_{j=1}^{n} K_{b_n}^*(T_i - T_j)(Y_j - X_j\beta)}{n b_n \hat f_T(T_i)} \right| \left| X_i(\hat\beta - \beta) \right| \\
&\quad + 2\left| Y_i - X_i\beta - \frac{\sum_{j=1}^{n} K_{b_n}^*(T_i - T_j)(Y_j - X_j\beta)}{n b_n \hat f_T(T_i)} \right| \left| \frac{\sum_{j=1}^{n} K_{b_n}^*(T_i - T_j) X_j(\hat\beta - \beta)}{n b_n \hat f_T(T_i)} \right| \\
&\quad + \left[ \left| X_i(\hat\beta - \beta) \right| + \left| \frac{\sum_{j=1}^{n} K_{b_n}^*(T_i - T_j) X_j(\hat\beta - \beta)}{n b_n \hat f_T(T_i)} \right| \right]^2 \\
&=: I_{1i} + I_{2i} + I_{3i} + I_{4i}.
\end{aligned}
\tag{A.3}
\]

Similarly, we have

\[ \{Y_i - X_i\hat\beta - \hat\mu_{b_n}^*(T_i)\}^2 \ge I_{1i} - I_{2i} - I_{3i} - I_{4i}. \tag{A.4} \]

In view of the similarity of (A.3) and (A.4), we only give a detailed proof of the limit distribution based on (A.3). For ease of presentation, let $C_n = \sqrt{2 \tilde f_T(t)\, n h_n \ln \tilde m_n} / \{\sqrt{\varphi_K \nu_\varepsilon}\, \hat\sigma_{h_n}^2(t)\, n h_n \tilde f_T(t)\}$. We have

\[
\begin{aligned}
\sup_{t\in\tilde T_n} C_n \left| \sum_{i=1}^{n} K_{h_n}(T_i - t) I_{2i} \right|
&\le \sup_{t\in\tilde T_n} 2 M C_n \cdot |\hat\beta - \beta| \cdot \sum_{i=1}^{n} K_{h_n}(T_i - t) \left| \mu(T_i) - \frac{\sum_{j=1}^{n} K_{b_n}^*(T_i - T_j)(Y_j - X_j\beta)}{n b_n \hat f_T(T_i)} \right| \\
&\quad + \sup_{t\in\tilde T_n} 2 M C_n \cdot |\hat\beta - \beta| \cdot \sum_{i=1}^{n} \sigma(T_i)|\varepsilon_i| K_{h_n}(T_i - t) \\
&=: B_{21} + B_{22}.
\end{aligned}
\]

Let $\tilde w_n(t) = f_T(t)/\tilde f_T(t)$ and $\varsigma_n(t) = \sigma^2(t)/\hat\sigma_{h_n}^2(t)$, and note that $\sup_{t\in T} \tilde w_n(t) \xrightarrow{P} 1$ and $\sup_{t\in T} \varsigma_n(t) \xrightarrow{P} 1$. By (7.25)

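in Zhao and Wu (2008), $B_{22} \xrightarrow{P} 0$. Let

\[ W_n^*(t) = \frac{\sum_{i=1}^{n} \sigma(T_i)\varepsilon_i K_{b_n}^*(T_i - t)}{n b_n f_T(t)} \]

and $\Delta_n = r_n + q_n(b_n^2 + \chi_n(6))$, where $r_n = \sqrt{b_n \ln n/n} + b_n^4 + \Xi_n^{1/2} b_n/n$, $q_n = \sqrt{\ln n/(n b_n)} + b_n^2 + \Xi_n^{1/2}/n$, and $\chi_n(q) = \sqrt{\ln n/(n b_n)} + n^{-q/4} b_n^{-q/4-1} (\ln n)^{q/4-1/2}$. It is easy to check that $\Delta_n \sqrt{h_n \ln \tilde m_n} \xrightarrow{P} 0$ and $\chi_n(6) \sqrt{h_n \ln \tilde m_n} \xrightarrow{P} 0$. By (7.23) in Zhao and Wu (2008), $B_{21} \xrightarrow{P} 0$. Hence,

\[ \sup_{t\in\tilde T_n} C_n \left| \sum_{i=1}^{n} K_{h_n}(T_i - t) I_{2i} \right| \xrightarrow{P} 0. \]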

Let $w_n(t) = f_T(t)/\hat f_T(t)$. It is easy to see that $\sup_{t\in T} w_n(t) \xrightarrow{P} 1$. Noting that

\[ \sup_{t\in T} \sum_{j=1}^{n} |K_{b_n}^*(T_j - t)| \le \sup_{t\in T} \sum_{j=1}^{n} |2K_{b_n}(T_j - t)| + \sup_{t\in T} \sum_{j=1}^{n} |K_{\sqrt{2} b_n}(T_j - t)/\sqrt{2}| = O_P(n b_n), \]

we have

\[
\begin{aligned}
\sup_{t\in\tilde T_n} C_n \left| \sum_{i=1}^{n} K_{h_n}(T_i - t) I_{3i} \right|
&\le \sup_{t\in\tilde T_n} 2 M C_n \cdot |\hat\beta - \beta| \cdot \frac{\sup_{t\in T} \sum_{j=1}^{n} |K_{b_n}^*(T_j - t)|}{n b_n \inf_{t\in T} f_T(t)} \sup_{t\in T} |w_n(t)| \\
&\quad \cdot \sum_{i=1}^{n} \left| Y_i - X_i\beta - \frac{\sum_{j=1}^{n} K_{b_n}^*(T_i - T_j)(Y_j - X_j\beta)}{n b_n \hat f_T(T_i)} \right| K_{h_n}(T_i - t) \\
&\xrightarrow{P} 0.
\end{aligned}
\]

Similarly,

\[ \sup_{t\in\tilde T_n} C_n \left| \sum_{i=1}^{n} K_{h_n}(T_i - t) I_{4i} \right| \xrightarrow{P} 0. \]

Hence,

\[ \lim_{n\to\infty} P\left\{ \sup_{t\in\tilde T_n} C_n \left| \sum_{i=1}^{n} K_{h_n}(T_i - t) I_{1i} \right| - 2\ln \tilde m_n + \frac{1}{2}\ln\ln \tilde m_n + \ln(2\sqrt{\pi}) \le z \right\} = e^{-2e^{-z}}. \]

By Slutsky's theorem, we complete the proof of Theorem 3.3.



References

Chambers, J.M., Cleveland, W.S., Kleiner, B., Tukey, P.A., 1983. Graphical Methods for Data Analysis. Wadsworth, Belmont, CA.
Chen, H., 1988. Convergence rates for parametric components in a partially linear model. Annals of Statistics 16, 136–146.
Chen, H., Shiau, J., 1991. A two-stage spline smoothing method for partially linear models. Journal of Statistical Planning and Inference 27, 187–201.
Engle, R.F., Granger, C.W.J., Rice, J., Weiss, A., 1986. Semiparametric estimates of the relation between weather and electricity sales. Journal of the American Statistical Association 81, 310–320.
Fan, J., Gijbels, I., 1996. Local Polynomial Modelling and its Applications. Chapman and Hall, London.
Gao, J., 2007. Nonlinear Time Series: Semiparametric and Nonparametric Methods. Chapman & Hall/CRC, London.
Gao, J., Hawthorne, K., 2006. Semiparametric estimation and testing of the trend of temperature series. Econometrics Journal 9, 333–356.
Härdle, W., Liang, H., Gao, J.T., 2000. Partially Linear Models. Springer, Berlin.
Heckman, N., 1986. Spline smoothing in a partly linear model. Journal of the Royal Statistical Society: Series B 48, 244–248.
Knafl, G., Sacks, J., Ylvisaker, D., 1985. Confidence bands for regression functions. Journal of the American Statistical Association 80, 683–691.
Lu, X., 2009. Empirical likelihood for heteroscedastic partially linear models. Journal of Multivariate Analysis 100 (3), 387–396.
Ma, Y., Chiou, J., Wang, N., 2006. Efficient semiparametric estimator for heteroscedastic partially linear models. Biometrika 93 (1), 75–84.
Rice, J., 1986. Convergence rates for partially splined models. Statistics & Probability Letters 4, 203–206.
Robinson, P.M., 1988. Root-n-consistent semiparametric regression. Econometrica 56, 931–954.
Speckman, P., 1988. Kernel smoothing in partial linear models. Journal of the Royal Statistical Society: Series B 50, 413–436.
You, J.H., Chen, G.M., 2005. Testing heteroscedasticity in partially linear regression models. Statistics & Probability Letters 73, 61–70.
Zhao, Z., Wu, W.B., 2008. Confidence bands in nonparametric time series regression. Annals of Statistics 36, 1854–1878.