Functional coefficient regression models with time trend

Journal of Econometrics 170 (2012) 15–31
Zhongwen Liang a, Qi Li b,∗

a Department of Economics, University at Albany, SUNY, Albany, NY 12222, USA
b Department of Economics, Texas A&M University, College Station, TX 77843-4228, USA

Article history: Received 9 May 2009; received in revised form 16 August 2011; accepted 22 August 2011; available online 9 April 2012.

JEL classification: C14; C22

Keywords: Varying coefficient model; Time trend; Partially linear model; Specification tests

Abstract

We consider the problem of estimating a varying coefficient regression model when the regressors include a time trend. We show that the commonly used local constant kernel estimation method leads to an inconsistent estimation result, while a local polynomial estimator yields a consistent estimation result. We establish the asymptotic normality of the proposed estimator. We also provide an asymptotic analysis of the data-driven (least squares cross-validation) method of selecting the smoothing parameters. In addition, we consider a partially linear time trend model and establish the asymptotic distribution of our proposed estimator. Two test statistics are proposed to test the null hypotheses of linear and of partially linear time trend models. Simulations are reported to examine the finite sample performance of the proposed estimators and the test statistics.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Trends are an important topic in economics, for example trends in productivity growth and in economic growth. As Krugman (1995) commented: ''Productivity growth is the single most important factor affecting our economic well being. . . '', and as noted in Andrews and McDermott (1995), ''it is clear that most macroeconomic variables and many financial variables exhibit trends''. Most of the existing research adopts models with a linear trend or a stochastic trend with a constant drift. When a constant drift is included in a stochastic trend model, the result is a model in which the time trend variable has a constant coefficient while the disturbance has a variance that grows over time. In this paper we take an alternative approach to modeling trend behavior. In our model the time trend has a stochastic coefficient, and we also consider a partially linear specification in which the coefficient of the time trend variable is a constant while the coefficients of the other variables vary with a stationary covariate. Specifically, we consider a varying coefficient model of the form Yt = X1t⊤ β1(Zt) + t β2(Zt) + ut, where

✩ We would like to thank a co-editor, an associate editor and two anonymous referees for their insightful comments that greatly improved the paper. Li's research is partially supported by SSHRC of Canada, and by the National Science Foundation of China (Project #: 70773005).
∗ Corresponding author. Tel.: +1 979 845 9954; fax: +1 979 847 8757. E-mail address: [email protected] (Q. Li).

0304-4076/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2011.08.009

βj(z), j = 1, 2, is a smooth function of z. A functional coefficient time trend model provides much flexibility and may help to identify the sources of changes. There is a literature that views the trend as a deterministic trend with (multiple) breaks, e.g., see Perron (1989) and Zivot and Andrews (1992). The deterministic trend with multiple break points is closely related to our model. Our model is also related to a regime switching model with a time trend and an observable state variable Zt, such that Yt = X1t⊤ β1 + t β2 + ut if Zt < c, and Yt = X1t⊤ α1 + t α2 + ut if Zt ≥ c, where c is a constant and the state variable Zt determines the regimes. A regime switching model allows for discrete jumps in the β coefficient, but within each regime the coefficient stays constant. In our framework the coefficient β(·) is allowed to vary smoothly with respect to a relevant stationary covariate. In addition, our model contains the traditional linear time trend model, or a more general partially linear time trend model, as special cases. There are other related works on nonlinear time trend models. Andrews and McDermott (1995) studied nonlinear parametric econometric models with deterministically trending variables. Robinson (1989) was the first to study a nonparametric time-varying coefficient model, in which he modeled the regression coefficients as unspecified smooth functions of time. Cai (2007) extended Robinson's model to the case of serially correlated disturbance terms and applied the method to US stock market data to estimate a varying coefficient CAPM, finding that the β-coefficient was indeed time varying and exhibited interesting patterns. In this paper we follow a more traditional approach


by introducing the time trend as a regressor whose coefficient can change smoothly with some relevant stationary covariate. We find that the asymptotic results are somewhat different from those of conventional nonparametric regression models with stationary (I(0)) or unit-root non-stationary (I(1)) regressors. For example, we show that, surprisingly, the popular local constant kernel estimation method leads to an inconsistent estimation result, while a local polynomial method can be used to consistently estimate the model. Our approach extends the existing functional coefficient regression literature a step further to include a time trend as a regressor.

Recently, varying coefficient models have attracted much attention among econometricians and statisticians. One attractive feature of this class of models is its ability to capture nonlinearity in the data and its capacity to ameliorate the so-called ''curse of dimensionality''. Another advantage is its interpretability: it provides simple low dimensional curves (often univariate curves) that describe the marginal effects of the explanatory variables on the dependent variable. Cai et al. (2000), Fan and Zhang (2000), Zhang et al. (2002), Cai (2007), Cai et al. (2009) and Xiao (2009) are among the recent works using this model to deal with stationary and nonstationary time series data.1

The paper is organized as follows. In Section 2, we consider a varying coefficient model with a time trend and some stationary regressors. The asymptotic theory of a local polynomial estimator is derived. We also provide an asymptotic analysis of the least squares cross-validation-selected smoothing parameter, and study a partially linear varying coefficient model. In Section 3, we propose two test statistics for testing a linear and a partially linear varying coefficient model. We report simulation results in Section 4 to examine the finite sample performance of our proposed estimators and test statistics. The proofs are relegated to three appendices, with the supplementary Appendix C available upon request.
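To fix ideas, a draw from a model of this form is easy to simulate. The coefficient functions β1 and β2 below are illustrative choices of ours, not specifications used in the paper.

```python
import numpy as np

# Illustrative smooth coefficient functions (our choices, not from the paper):
def beta1(z):
    return 1.0 + 0.5 * np.sin(np.pi * z)   # coefficient on the stationary regressor

def beta2(z):
    return 0.2 * np.exp(-z ** 2)           # coefficient on the time trend

rng = np.random.default_rng(0)
n = 400
Z = rng.uniform(-1.0, 1.0, n)              # stationary covariate driving the coefficients
X1 = rng.normal(size=n)                    # stationary regressor (d - 1 = 1 here)
u = rng.normal(scale=0.5, size=n)          # disturbance with E(u_t | X1_t, Z_t) = 0
t = np.arange(1.0, n + 1)                  # deterministic time trend

Y = X1 * beta1(Z) + t * beta2(Z) + u       # Y_t = X1_t' b1(Z_t) + t b2(Z_t) + u_t
```

Because t grows without bound, the trend component eventually dominates the stationary component, which is why the coefficient of the trend can be estimated at a faster rate below.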

2. A time trend varying coefficient model

2.1. Local polynomial estimation method and asymptotic results

We consider the following time trend varying coefficient model:

Yt = Xt⊤ β(Zt) + ut = X1t⊤ β1(Zt) + t β2(Zt) + ut,   (2.1)

where Zt is a scalar, X1t is a (d−1)×1 vector of stationary regressors (d ≥ 2), and β1(·) is a (d−1)×1 vector-valued smooth coefficient function associated with X1t.

We will use a qth-order (q ≥ 1) local polynomial method to estimate model (2.1). Let B(z) = (β1(z)⊤, β2(z), β2^{(1)}(z), ..., β2^{(q)}(z))⊤ with β2^{(j)}(z) = d^j β2(z)/dz^j, j = 1, ..., q, and define X̃t⊤ = (X1t⊤, t, t(Zt − z), (1/2!) t(Zt − z)^2, ..., (1/q!) t(Zt − z)^q). Then the qth-order local polynomial estimator B̂(z) = (β̂1(z)⊤, β̂2(z), β̂2^{(1)}(z), ..., β̂2^{(q)}(z))⊤ is given by

B̂(z) = argmin_B Σ_{t=1}^n [Yt − X̃t⊤ B]^2 Kh,ztz,

where Kh,ztz = h^{−1} K((Zt − z)/h), K(·) is the kernel function and h is the smoothing parameter.

Note that we use a local constant approximation for β1(·) and a local polynomial approximation for β2(·). Of course, we could also use a local polynomial approximation for β1(·). However, this will introduce (d−1)q extra parameters to be estimated (the derivatives of the (d−1)×1 vector-valued function β1(·)). Since the nonparametric kernel method only uses local data (data close to z) when estimating β(z), an estimation method with too many parameters may severely limit the usefulness of the nonparametric estimation method in practice.

It is easy to show that B̂(z) has the following closed-form expression:

B̂(z) = [Σ_{t=1}^n X̃t X̃t⊤ Kh,ztz]^{−1} Σ_{t=1}^n X̃t (X1t⊤ β1(Zt) + t β2(Zt) + ut) Kh,ztz
     = B(z) + [Σ_{t=1}^n X̃t X̃t⊤ Kh,ztz]^{−1} Σ_{t=1}^n X̃t [X1t⊤ (β1(Zt) − β1(z)) + t(β2(Zt) − Σ_{i=0}^q (1/i!) β2^{(i)}(z)(Zt − z)^i)] Kh,ztz
       + [Σ_{t=1}^n X̃t X̃t⊤ Kh,ztz]^{−1} Σ_{t=1}^n X̃t ut Kh,ztz.   (2.2)

The first (d−1) elements of B̂(z) estimate β1(z), and the remaining (q+1) elements estimate β2(z) and its derivatives up to order q.

Let Dn be the (d+q)×(d+q) diagonal matrix Dn = Diag(1, ..., 1, n, nh, ..., nh^q), and define Sn(z) = Dn^{−1} [n^{−1} Σ_{t=1}^n X̃t X̃t⊤ Kh,ztz] Dn^{−1}, L2n(z) = n^{−1} Σ_{t=1}^n Dn^{−1} X̃t ut Kh,ztz and L1n(z) = n^{−1} Σ_{t=1}^n Dn^{−1} X̃t [X1t⊤ (β1(Zt) − β1(z)) + t(β2(Zt) − Σ_{i=0}^q (β2^{(i)}(z)/i!)(Zt − z)^i)] Kh,ztz. Then it is easy to show that

Dn [B̂(z) − B(z)] = Sn(z)^{−1} (L1n(z) + L2n(z)).

Below we list some regularity conditions. We assume that {(X1t⊤, Zt, ut)}_{t=−∞}^{+∞} is a strictly stationary process. We use C^l(D) to denote the space of functions that have a continuous lth derivative on D, where D is the support of Zt.

Assumption 2.1. The coefficient functions satisfy β1(·) ∈ C^3 and β2(·) ∈ C^{q+3}, where q ≥ 1 is a positive integer; f(·) ∈ C^2 and σ^2(x1, ·) ∈ C^2 for all x1 in the support of X1t, where f(·) is the density function of Zt and σ^2(x1, z) = E(ut^2 | X1t = x1, Zt = z).

Assumption 2.2. a. The kernel function K(·) is a bounded and symmetric density function with a compact support SK. Also, K(·) satisfies the Lipschitz condition, that is, |K(u) − K(v)| ≤ C|u − v| for all u, v ∈ SK, where C is a positive constant.
b. |g(u, v | x0, x1; l)| ≤ M1 < ∞ for all l ≥ 1, where g(u, v | x0, x1; l) is the conditional density function of (Z0, Zl) given (X10, X1l), and f(u|x) ≤ M2 < ∞, where f(u|x) is the conditional density function of Zt given X1t = x.
c. The process {(X1t⊤, Zt, ut)} is α-mixing with Σ_{j=1}^∞ j^γ [α(j)]^{1−2/δ} < ∞ for some δ > 2 and γ > 1 − 2/δ. Also, E‖X1t‖^{2δ} < ∞.
d. E[u0^2 + ul^2 | Z0 = z, X10 = x0; Zl = z′, X1l = x1] ≤ M3 < ∞ for all l ≥ 1, x0, x1 ∈ R^{d−1}, and z, z′ in a neighborhood of z0.
e. There exists δ* > δ, where δ is given in condition 2.2c, such that E[|ut|^{δ*} | Zt = z, X1t = x] ≤ M4 < ∞ for all x ∈ R^{d−1} and z in a neighborhood of z0, and α(n) = O(n^{−θ*}), where θ* ≥ δδ*/{2(δ* − δ)}. Also, E‖X1t‖^{2δ*} < ∞, and n^{−(δ/4−1/2)} h^{−(δ/4+1/2−δ/δ*)} = O(1).
f. As n → ∞, h → 0 and nh → ∞. Further, there exists a sequence of positive integers sn such that sn → ∞, sn = o((nh)^{1/2}) and (n/h)^{1/2} α(sn) → 0 as n → ∞.

1 For a variety of economic applications of varying coefficient models, see Mamuneas et al. (2006), Stengos and Zacharias (2006), among others.
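As a concrete illustration of the estimator defined above, the weighted least squares problem in (2.2) can be solved at a single point z as follows. This is a sketch under our own illustrative choices (Epanechnikov kernel, simulated design), not the authors' implementation.

```python
import math
import numpy as np

def local_poly_fit(z, Y, X1, Z, h, q=1):
    """Sketch of the q-th order local polynomial estimator B_hat(z).

    Columns of the X_tilde matrix: X1_t (local constant part), then
    t, t(Z_t - z), ..., t(Z_t - z)^q / q!  (polynomial part for the trend).
    Returns (beta1_hat(z), beta2_hat(z)).
    """
    n = len(Y)
    t = np.arange(1.0, n + 1)
    # Epanechnikov kernel weights K_h((Z_t - z)/h) / h:
    w = 0.75 * np.maximum(1.0 - ((Z - z) / h) ** 2, 0.0) / h
    X1m = np.atleast_2d(X1).reshape(n, -1)
    cols = [X1m] + [(t * (Z - z) ** i / math.factorial(i)).reshape(n, 1)
                    for i in range(q + 1)]
    Xt = np.hstack(cols)                        # the X_tilde matrix
    sw = np.sqrt(w)
    B, *_ = np.linalg.lstsq(Xt * sw[:, None], Y * sw, rcond=None)
    d1 = X1m.shape[1]
    return B[:d1], B[d1]                        # beta1_hat(z), beta2_hat(z)

# Check on a design with constant coefficients (beta1 = 1, beta2 = 0.1):
rng = np.random.default_rng(1)
n = 500
Z = rng.uniform(-1.0, 1.0, n)
X1 = rng.normal(size=n)
Y = X1 * 1.0 + np.arange(1.0, n + 1) * 0.1      # no noise, for a clean check
b1, b2 = local_poly_fit(0.0, Y, X1, Z, h=0.3, q=1)
```

With constant coefficients and no noise the weighted regression fits exactly, so b1 ≈ 1 and b2 ≈ 0.1 up to rounding; evaluating the routine on a grid of z values traces out β̂1(·) and β̂2(·).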


Theorem 2.1. Under Assumptions 2.1 and 2.2, we have:

(i) If q is an odd positive integer and nh^{q+1} → 0 as n → ∞, then
β̂1(z) − β1(z) = Op(h^2 + nh^{q+1} + (nh)^{−1/2}) and
β̂2(z) − β2(z) = Op(h^2/n + h^{q+1} + (n^3 h)^{−1/2}).

(ii) If q is an even positive integer and nh^{q+2} → 0 as n → ∞, then
β̂1(z) − β1(z) = Op(h^2 + nh^{q+2} + (nh)^{−1/2}) and
β̂2(z) − β2(z) = Op(h^2/n + h^{q+2} + (n^3 h)^{−1/2}).

From Theorem 2.1 we see that β̂1(z) − β1(z) has the familiar bias term h^2 and variance term (nh)^{−1}, and the additional term nh^{q+1} (or nh^{q+2}) comes from the bias term associated with the time trend regressor t, which behaves like a factor n.2 Also, we see that β̂2(z) − β2(z) = Op(n^{−1}(β̂1(z) − β1(z))). This result is similar to that of a parametric time trend model, where the coefficient of the time trend variable can be estimated much more accurately than the coefficients of stationary variables.

To derive the asymptotic distribution of β̂(·), we make the following additional assumption.

Assumption 2.3. (a) As n → ∞, h → 0, nh → ∞ and nh^{2q/3+1/3} → ∞; in addition we require that (i) if 1 ≤ q ≤ 3, nh^{q+1} → 0; and (ii) if 4 ≤ q ≤ 9, nh^{2q/3+7/3} → 0 and nh^7 → 0.
(b) If q = 2, h → 0, nh^3 → ∞ and nh^4 → 0 as n → ∞.
(c) If q = 2, h → 0, nh → ∞ and nh^3 → c as n → ∞, where c is a positive constant.

Remark 2.1. The conditions in Assumptions 2.1 and 2.2 are quite standard and are similar to those used in Cai et al. (2000). Assumption 2.3 rules out the case of a local constant estimator (q = 0).

We introduce some notation and give some definitions. Let S(z) be a (q+2)×(q+2) symmetric matrix with its elements defined (in block form) by S_{1,1}(z) = E[X1t X1t⊤ | z] f(z), S_{1,2}(z) = (1/2) E[X1t | z] f(z), S_{1,i}(z) = (μ_{i−2}/(2((i−2)!))) E[X1t | z] f(z) for 3 ≤ i ≤ q+2, S_{2,2}(z) = (1/3) f(z), and S_{i,j}(z) = (μ_{i+j−4}/(3((i−2)!(j−2)!))) f(z) for 2 ≤ i ≤ q+2, i ≤ j ≤ q+2, where μ_l = ∫ v^l K(v) dv.

Let η1(z) = E(X1t | z) f(z), η2(z) = E(X1t X1t⊤ | z) f(z), D1n(z) = η2^{(1)}(z) β1^{(1)}(z) + (1/2) η2(z) β1^{(2)}(z), and define

B1n(z) = (h^2 μ2/2) D1n(z) + (nh^{q+1} μ_{q+1}/(2(q+1)!)) η1(z) β2^{(q+1)}(z)   if q is odd,
B1n(z) = (h^2 μ2/2) D1n(z) + (nh^{q+2} μ_{q+2}/2) [(1/(q+1)!) η1^{(1)}(z) β2^{(q+1)}(z) + (1/(q+2)!) η1(z) β2^{(q+2)}(z)]   if q is even.

For 2 ≤ j ≤ q+2, define D2n(z) = (1/(q+1)!) β2^{(q+1)}(z) f^{(1)}(z) + (1/(q+2)!) β2^{(q+2)}(z) f(z), and

Bjn(z) = (h μ_{j−1}/(2((j−2)!))) η1(z) β1^{(1)}(z) + (nh^{q+2} μ_{q+j}/3) D2n(z)   if q and j are odd;
Bjn(z) = (h^2/(2((j−2)!))) [μ_j η1^{(1)}(z) β1^{(1)}(z) + (1/2) μ_j η1(z) β1^{(2)}(z)] + (nh^{q+1} μ_{q+j−1}/(3(q+1)!)) β2^{(q+1)}(z) f(z)   if q is odd and j is even;
Bjn(z) = (h μ_{j−1}/(2((j−2)!))) η1(z) β1^{(1)}(z) + (nh^{q+1} μ_{q+j−1}/(3(q+1)!)) β2^{(q+1)}(z) f(z)   if q is even and j is odd;
Bjn(z) = (h^2/(2((j−2)!))) [μ_j η1^{(1)}(z) β1^{(1)}(z) + (1/2) μ_j η1(z) β1^{(2)}(z)] + (nh^{q+2} μ_{q+j}/3) D2n(z)   if q and j are even.

Let Bn(z) = (B1n(z)⊤, B2n(z), ..., B_{q+2,n}(z))⊤,

Σa = [ ν0 E[X1t X1t⊤ σ^2(X1t, z) | z] f(z),   (1/2) E[X1t σ^2(X1t, z) | z] f(z) a⊤ ;
       (1/2) a E[X1t⊤ σ^2(X1t, z) | z] f(z),   Ωa ],

where a⊤ = (ν0, ν1, ν2/(2!), ν3/(3!), ..., νq/(q!)), Ωa is a (q+1)×(q+1) symmetric matrix with its (i,j)th element given by E[ut^2 | z] f(z) ν_{i+j−2}/(3((i−1)!(j−1)!)), i, j = 1, ..., q+1, and ν_l = ∫ v^l K^2(v) dv. Here and hereafter we use the notation E[At | z] ≡ E[At | Zt = z]. Finally, define

Σb = [ (ν_{2(q+1)}/(3((q+1)!)^2)) (β2^{(q+1)}(z))^2 η2(z),   (1/(4((q+1)!))) (β2^{(q+1)}(z))^2 η1(z) b⊤ ;
       (1/(4((q+1)!))) (β2^{(q+1)}(z))^2 b (η1(z))⊤,   Ωb ],

where b⊤ = (ν_{2q+2}, ν_{2q+3}, ν_{2q+4}/(2!), ν_{2q+5}/(3!), ..., ν_{3q+2}/(q!)), and Ωb is a (q+1)×(q+1) symmetric matrix with its (i,j)th element given by ν_{2q+i+j} f(z) [β2^{(q+1)}(z)]^2/(5[(i−1)!(j−1)!][(q+1)!]^2), i, j = 1, ..., q+1.

As can be seen from the above expressions, Σa and Σb come from different sources: Σa is related to the asymptotic variance of the term associated with ut, while Σb is related to β(Zt) − β(z). With the above assumptions and notation we have the following theorem.

Theorem 2.2. Under Assumptions 2.1 and 2.2, let z be any interior point of D.

(a) Under Assumption 2.3(a),
√(nh) [Dn(B̂(z) − B(z)) − Sn(z)^{−1} Bn(z)] →d N(0, S(z)^{−1} Σa S(z)^{−1}).

(b) Under Assumption 2.3(b),
√(1/(nh^5)) [Dn(B̂(z) − B(z)) − Sn(z)^{−1} Bn(z)] →d N(0, S(z)^{−1} Σb S(z)^{−1}).

2 This is because Σ_{t=1}^n t = O(n^2), so that t acts like a factor of n.


(c) Under Assumption 2.3(c),
√(nh) [Dn(B̂(z) − B(z)) − Sn(z)^{−1} Bn(z)] →d N(0, S(z)^{−1} (Σa + c^2 Σb) S(z)^{−1}).

Remark 2.2. Under Assumption 2.3(a) the asymptotic variance of β̂(z) comes from the term associated with ut. If q = 1, it requires nh → ∞ and nh^2 → 0. Thus, the admissible range for h is quite narrow. The higher the value of q, the wider the range of admissible h. When q = 2 and under Assumption 2.3(b), both the leading estimation bias and variance come from the term associated with β(Zt) − β(z), and the term associated with ut is asymptotically negligible. For q = 2 under Assumption 2.3(c), the leading variance comes from both ut and β(Zt) − β(z), as can be seen from the asymptotic variance expression.

We show in Appendix A that Sn(z) = S(z) + op(1) = Op(1). Also, we know that (for the estimation bias) B1n(z) = O(h^2 + nh^{q+1}) and B2n(z) = O(h^2 + nh^{q+1}). However, if q is even they reduce to B1n(z) = O(h^2 + nh^{q+2}) and B2n(z) = O(h^2 + nh^{q+2}), because μ_j = 0 for odd integers j. Hence, from Theorem 2.2 one can infer the rate of convergence of β̂(z) − β(z). For example, if q is even and Assumption 2.3(c) holds (q = 2), then Theorem 2.2 implies that β̂1(z) − β1(z) = Op(h^2 + nh^4 + (nh)^{−1/2}) and β̂2(z) − β2(z) = Op(h^2/n + h^4 + (n^3 h)^{−1/2}), which are of course in agreement with Theorem 2.1.

2.2. The inconsistency of the local constant estimator

Up to now we have only considered the qth-order local polynomial estimation method with q ≥ 1; we now show that the local constant method (q = 0) leads to an inconsistent estimation result. For expositional simplicity we only consider the case X1t = 1. Therefore, we consider the following simple time trend varying coefficient model:

Yt = Xt⊤ β(Zt) + ut = β1(Zt) + t β2(Zt) + ut,   E(ut | X1t, Zt) = 0,   (2.3)

where Xt⊤ = (1, t), Zt is a scalar, and β(·) is a 2×1 vector-valued smooth coefficient function. The local constant estimator of β(z) = (β1(z), β2(z))⊤ is given by

(β̂1(z), β̂2(z))⊤ = [Σ_{t=1}^n (1, t; t, t^2) Kh,tz]^{−1} Σ_{t=1}^n (1, t)⊤ Yt Kh,tz.

In the supplementary Appendix C we prove the following result.

Lemma 2.3. Under Assumptions 2.1 and 2.2, but with q ≥ 1 replaced by q = 0, we have

Var(β̂1(z)) = a0(nh) + b0(nh)^{−1} + (s.o.),

where a0 = (8/15)(β2^{(1)}(z))^2 ν2/f(z) and b0 = 16 ν0 σ^2(z)/f(z) are positive constants, σ^2(z) = E(ut^2 | Zt = z) and νj = ∫ K(v)^2 v^j dv.

The proof of Lemma 2.3 is given in the supplementary Appendix C. From Lemma 2.3 we know that the source of the inconsistency is a0(nh). Also, from the expression of a0 it can be seen that it is the term associated with β2(Zt), the coefficient of the time trend variable, that causes the inconsistency of β̂1(z). Since nh → ∞ as n → ∞, the local constant method cannot consistently estimate β1(·). However, we would like to mention that, although the local constant method cannot consistently estimate β1(z), it still consistently estimates β2(z), the coefficient of the time trend, because it can be shown that Var(β̂2(z)) = O(n^{−1} Var(β̂1(z))) = O(h + (n^2 h)^{−1}) = o(1), where β̂2(z) is the local constant estimator of β2(z).

2.3. Smoothing parameter selection: a data-driven method

Bandwidth selection is of fundamental importance in nonparametric/semiparametric estimation. The least squares cross-validation method is one of the most widely used approaches for choosing optimal bandwidths. By minimizing a sample estimate of the mean integrated squared error (MISE), one obtains bandwidths that are optimal in the sense of minimizing the asymptotic MISE. Let ĥcv denote the value of h that minimizes the objective function

CV(h) = n^{−1} Σ_{t=1}^n [Yt − X1t⊤ β̂_{1,−t}(Zt) − t β̂_{2,−t}(Zt)]^2 M(Zt),

where β̂_{1,−t}(·) and β̂_{2,−t}(·) are leave-one-out estimates of β1(·) and β2(·), respectively, and M(·) is a weight function that trims out observations near the boundary of the support of Zt.

To study the asymptotic behavior of the cross-validation-selected smoothing parameter, we first make some additional assumptions.

Assumption 2.4. {(X1t, Zt, ut)}_{t=1}^n are i.i.d., and E(ut^2 | z) and E(ut^4 | z) are both finite for all z ∈ M, the support of M(Zt).

Assumption 2.5. h ∈ Hn = {hn | hn ∈ [0, ηn], nhn ≥ tn}, where ηn is a positive sequence that goes to zero slower than the inverse of any polynomial in n, and tn is a sequence that diverges to +∞.

Remark 2.3. We only prove the i.i.d. data case here to save space. The mixing case can be proved similarly as in Xia and Li (2002), but it requires a very lengthy proof.

In the supplementary Appendix C we show that

if q is even, CV(h) = B1 h^4 + B2 n^2 h^{2q+4} + B3/(nh) + (s.o.);
if q is odd, CV(h) = B4 h^4 + B5 n^2 h^{2q+2} + B6/(nh) + (s.o.),   (2.4)

where the Bj (j = 1, ..., 6) are constants and (s.o.) denotes smaller-order terms. The proof of (2.4) is given in the supplementary Appendix C, which is available from the authors upon request. From (2.4) we immediately obtain the following theorem.

Theorem 2.4. Under Assumptions 2.1, 2.2, 2.4 and 2.5, we have:
If q is even, (i) ĥcv = c1 n^{−3/(2q+5)} + (s.o.) if q ≤ 4; and (ii) ĥcv = c2 n^{−1/5} + (s.o.) if q ≥ 6.
If q is odd, (iii) ĥcv = c3 n^{−3/(2q+3)} + (s.o.) if q ≤ 5; and (iv) ĥcv = c4 n^{−1/5} + (s.o.) if q ≥ 7,
where c1 = (B3/((2q+4)B2))^{1/(2q+5)}, c2 = (B3/(4B1))^{1/5}, c3 = (B6/((2q+2)B5))^{1/(2q+3)} and c4 = (B6/(4B4))^{1/5}.

Below we present the asymptotic distribution result for the local polynomial estimator with the cross-validation-selected smoothing parameter.
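Before stating that result, note that the criterion CV(h) can be computed by brute force over a bandwidth grid. The sketch below fixes q = 1 and uses an Epanechnikov kernel and a quantile-based trimming function M(·); all of these choices are ours, for illustration only.

```python
import numpy as np

def cv_bandwidth(Y, X1, Z, h_grid):
    """Leave-one-out least squares cross-validation CV(h), q = 1 (sketch)."""
    n = len(Y)
    tt = np.arange(1.0, n + 1)
    lo, hi = np.quantile(Z, [0.05, 0.95])        # trimming weight M(Z_t)
    scores = []
    for h in h_grid:
        sse = 0.0
        for t in range(n):
            if not lo <= Z[t] <= hi:
                continue                          # trimmed out by M(.)
            keep = np.arange(n) != t              # leave observation t out
            w = 0.75 * np.maximum(1 - ((Z[keep] - Z[t]) / h) ** 2, 0) / h
            Xt = np.column_stack([X1[keep], tt[keep],
                                  tt[keep] * (Z[keep] - Z[t])])
            sw = np.sqrt(w)
            B, *_ = np.linalg.lstsq(Xt * sw[:, None], Y[keep] * sw, rcond=None)
            sse += (Y[t] - X1[t] * B[0] - tt[t] * B[1]) ** 2
        scores.append(sse / n)
    return h_grid[int(np.argmin(scores))]

# Demo on simulated data (illustrative coefficient functions):
rng = np.random.default_rng(2)
n = 150
Z = rng.uniform(-1.0, 1.0, n)
X1 = rng.normal(size=n)
Y = X1 * (1 + 0.5 * Z) + np.arange(1.0, n + 1) * 0.05 * (1 + Z ** 2) \
    + rng.normal(scale=0.5, size=n)
h_cv = cv_bandwidth(Y, X1, Z, np.array([0.15, 0.3, 0.6, 1.2]))
```

For q = 1, Theorem 2.4(iii) predicts ĥcv ∝ n^{−3/5}, so on larger samples the selected bandwidth should drift toward the smaller grid points.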

Theorem 2.5. Denote D̃n = Diag(1, ..., 1, n, nĥcv, ..., nĥcv^q), S̃n(z) = D̃n^{−1} [n^{−1} Σ_{t=1}^n X̃t X̃t⊤ K_{ĥcv,ztz}] D̃n^{−1}, and B̃(z) = [Σ_{t=1}^n X̃t X̃t⊤ K_{ĥcv,ztz}]^{−1} Σ_{t=1}^n X̃t Yt K_{ĥcv,ztz}. Then under Assumptions 2.1, 2.2, 2.4 and 2.5, we have:

(a) If 1 ≤ q ≤ 9 and q ≠ 2,
√(nĥcv) [D̃n(B̃(z) − B(z)) − S̃n(z)^{−1} Bn(z)] →d N(0, S(z)^{−1} Σa S(z)^{−1}).

(b) If q = 2,
√(nĥcv) [D̃n(B̃(z) − B(z)) − S̃n(z)^{−1} Bn(z)] →d N(0, S(z)^{−1} (Σa + c^2 Σb) S(z)^{−1}).

Our proof strategy is similar to the approach of Li and Li (2010): we combine an asymptotic stochastic equicontinuity argument with Theorem 2.2 to prove Theorem 2.5. The sufficient condition we use is a corollary in Bhattacharya and Waymire (2009) or Theorem 3.1 in Li and Li (2010).

2.4. A partially linear time trend model

In this section we consider a partially linear time trend model, obtained by imposing the restriction β2(·) ≡ β20 in model (2.1), where β20 is a constant. In this case we have

Yt = X1t⊤ β1(Zt) + t β20 + ut,   E(ut | X1t, Zt) = 0.   (2.5)

Similar to Robinson (1988) and Zhang et al. (2002), we propose a profile least squares approach to estimate β20. First we treat β20 as if it were known and re-write (2.5) as Yt − t β20 = X1t⊤ β1(Zt) + ut; then we estimate β1(Zt) by the local constant estimator

β̃1(Zt) = [Σ_s X1s X1s⊤ Kh,ts]^{−1} Σ_s X1s [Ys − s β20] Kh,ts ≡ A2t − A1t β20,   (2.6)

where A1t = [Σ_s X1s X1s⊤ Kh,ts]^{−1} Σ_s X1s s Kh,ts, A2t = [Σ_s X1s X1s⊤ Kh,ts]^{−1} Σ_s X1s Ys Kh,ts, and Kh,ts = h^{−1} K((Zt − Zs)/h). Note that β̃1(Zt) defined in (2.6) is infeasible, as it depends on the unknown parameter β20. We will derive a feasible estimator of β1(Zt) below.

Remark 2.4. From Section 2.2 we know that the local constant method does not lead to consistent estimation of β1(z) for the semiparametric model considered in Section 2.1. However, in this section we consider a different model, a partially linear time trend model, for which the local constant method can lead to a consistent estimation result, as we show below.

Replacing β1(Zt) by β̃1(Zt) in (2.5) and re-arranging terms, we obtain

Yt − X1t⊤ A2t = (t − X1t⊤ A1t) β20 + ϵt,   (2.7)

where ϵt ≡ Yt − X1t⊤ A2t − (t − X1t⊤ A1t) β20. Applying the OLS method to the above model leads to

β̂20 = [Σ_t (t − X1t⊤ A1t)^2]^{−1} Σ_t (t − X1t⊤ A1t)(Yt − X1t⊤ A2t).   (2.8)

Assumption 2.6. (i) Define m1t = m1(Zt) = E(X1t | Zt) and m2t = m2(Zt) = E(X1t X1t⊤ | Zt). Both m1(z) and m2(z) are three times continuously differentiable and have finite fourth moments; E(ut^2 | X1t, Zt) = E(ut^2).
(ii) E[m2t^{−1} m1t/ft] is finite, where ft = f(Zt).
(iii) h → 0, nh → ∞ and nh^4 → 0 as n → ∞.

The next theorem gives the asymptotic distribution of β̂20.

Theorem 2.6. Under Assumptions 2.1, 2.2, 2.4 and 2.6, we have

n^{3/2}(β̂20 − β20) →d N(0, σu^2 V),

where σu^2 = E(ut^2), V = M1^{−1}(M1 + M2)M1^{−1}, M1 = (1/3) − (1/4)E[m1t⊤ m2t^{−1} m1t], and M2 = (1/4){E(m1t⊤ m2t^{−1}/ft) E(m2t) E(m2t^{−1} m1t/ft) − 2E(m1t⊤ m2t^{−1}/ft) E(m1t ft^{−1}) + E(m1t⊤ m2t^{−1} m1t)}.

Assumption 2.6(ii) imposes restrictions on the density function f(·): if f(·) is bounded below by a positive constant and m2t^{−1} m1t is a bounded function on the (bounded) support of Zt, then (ii) holds trivially. If ft is not bounded away from zero, (ii) may be violated; in this case one may drop Assumption 2.6(ii) and replace (Yt − X1t⊤ A2t) by (Yt − X1t⊤ A2t) 1b in the definition of β̂20 in (2.8), where 1b = 1(f̂t ≥ b) is an indicator function, b = bn > 0 and b → 0 as n → ∞; (iii) then needs to be modified to nh^4/b → 0 as n → ∞. It can be shown that, with the above modifications, the conclusion of Theorem 2.6 remains valid without Assumption 2.6(ii).

With β̂20 replacing β20 in (2.6), we obtain a feasible estimator of β1(z) given by

β̂1(z) = [Σ_s X1s X1s⊤ Kh,sz]^{−1} Σ_s X1s [Ys − s β̂20] Kh,sz,   (2.9)

where Kh,sz = h^{−1} K((Zs − z)/h). With β̂20 − β20 = Op(n^{−3/2}), it is easy to show that the asymptotic distribution of β̂1(z) − β1(z) is the same as in the case when β20 is known:

√(nh) [β̂1(z) − β1(z) − h^2 B(z)] →d N(0, m(z)^{−1} Ω(z) m(z)^{−1}),   (2.10)

where B(z) = (μ2/2)[β1^{(2)}(z) + 2 m(z)^{−1} m^{(1)}(z) β1^{(1)}(z)], Ω(z) = ν0 E[X1t X1t⊤ σ^2(X1t, z) | Zt = z]/f(z), σ^2(x1, z) = E(ut^2 | X1t = x1, Zt = z), μ2 = ∫ K(v) v^2 dv and ν0 = ∫ K(v)^2 dv (see Li et al. (2002)).

3. Model specification testing

In this section we propose two test statistics for testing some restricted null models. In Section 3.1 we consider a joint test for β1(·) and β2(·) both being constants, i.e., we test a linear model against a semiparametric varying coefficient model, while in Section 3.2 we test the null of a partially linear varying coefficient model, i.e., we test whether β2(·) is a constant, leaving β1(·) unrestricted.

3.1. Testing constancy of β(·)

The null hypothesis considered in this section is H0a: P[β(Zt) = β0] = 1, where β0 is a d×1 vector of constant parameters. The alternative hypothesis is the negation of H0a, i.e., H1a: P[β(Zt) = β] < 1 for all β ∈ M ⊂ R^d, where M is a compact subset of R^d. Our test statistic is based on

Ina = (1/n^4) Σ_{t=1}^n Σ_{s≠t} Xt⊤ Xs ût ûs Kh,ts,

where ût is the least squares residual obtained from estimating the null (linear) model and Kh,ts = h^{−1} K((Zt − Zs)/h). We make the following assumptions.

Assumption 3.1. {ut}_{t=1}^n is an i.i.d. process, ut has zero mean and E(ut^4) is finite. {X1t, Zt}_{t=1}^n is a strictly stationary β-mixing process with β(τ) = O(ρ^τ) for some 0 < ρ < 1.


3.2. Testing constancy of β2 (·)

Assumption 3.2. As n → ∞, h → 0 and nh → ∞. Remark 3.1. The i.i.d. assumption for ut is made to simplify the proof of the following Theorem 3.1. It can be relaxed to be a weakly dependent process (e.g., Fan and Li (1999)) with a more tedious proof. The asymptotic null distribution of Ina is given in the next theorem. Theorem 3.1. Under Assumptions 2.1, 2.2 a, 3.1 and 3.2, and under H0a we have



 d =n / Vˆ 0 → N (0, 1), n n where Vˆ 0a = (4h/n6 ) t =1 s>t (Xt⊤ Xs )2 uˆ 2t uˆ 2s Kh2,ts .

def Tna

hIna

Inb =

The next theorem shows that the Tna test is a consistent test. Theorem 3.2. Under Assumptions 2.1, 2.2 a, 3.1 and 3.2, and under H1a ,

(i) If P [β2 (Zt ) = β02 ] < 1, then we have P [Tna > cn ] → 1 for any sequence cn = o(n3 h1/2 ). (ii) If P [β2 (Zt ) = β02 ] = 1 and P [β1 (Zt ) = β01 ] < 1, then P [Tna > cn ] → 1 for any cn = o(nh1/2 ). The proofs of Theorems 3.1 and 3.2 are given in Appendix B. Theorem 3.2 states that the test statistic Tna is consistent, i.e., the probability of rejecting a false null hypothesis converges to one as the sample size goes to infinity, no matter whether the false null hypothesis comes from β2 (·) ̸= β02 , or from β1 (·) ̸= β01 . It also shows that the test statistic Tna diverges to ∞ at a faster rate if the coefficient of the time trend variable is non-constant than the case that the coefficient of the stationary variable is non-constant. This is what one should expect because the time trend variable asymptotically dominates the stationary variable X1t . Our simulation results show that the Tna test based on the asymptotic standard normal critical values are severely undersized. In order to better approximate the finite sample null distribution of the test statistic Tna , we recommend the use of the following bootstrap method. A bootstrap procedure Step(i). Let uˆ t denote the least squares residual, we generate the bootstrap error by u∗t = auˆ t with probability r, and u∗t = buˆ t with



In this section we consider the null hypothesis of testing H0b : P [β2 (Zt ) = β20 ] = 1, where β20 is a constant, against H1b : P [β2 (Zt ) = β20 ] < 1. Under either H0b or H1b , no restriction is imposed on β1 (·). Thus, under H0b , the regression model is the partially linear time trend model considered in Section 2.4, ⊤ i.e., Yt = X1t β1 (Zt ) + t β20 + ut . Let βˆ 20 and βˆ 1t = βˆ 1 (Zt ) denote the estimators for β20 and β1 (Zt ) proposed in Section 2.4. Then we ⊤ˆ estimate ut by uˆ t = Yt − X1t β1t − t βˆ 20 . Our test statistic for H0b is based on



probability √ 1 − r,√where a = (1 − 5)/2, b = (1 + 5)/2 and r = (1 + 5)/(2 5). Then we generate Yt∗ = αˆ 0 + t αˆ 1 + u∗t , where αˆ 0 and αˆ 1 are the least squares estimators of α0 and α1 based on the linear (null) model. Call {1, t , Zt , Yt∗ }nt=1 the bootstrap sample.

Step(ii). compute the bootstrap test statistic Tna∗ using the bootstrap sample, i.e., Tna∗ is obtained by the same way as Tna except that uˆ t is replaced by uˆ ∗t = Yt∗ − αˆ 0∗ − t αˆ 1∗ , where αˆ 0∗ and αˆ 1∗ are the least squares estimators of α0 and α1 using the bootstrap sample {1, t , Zt , Yt∗ }nt=1 . Step(iii). repeat the above steps (i) and (ii) a large number of times, say B times, then use the empirical distribution of these B simulated bootstrap test statistics to obtain the critical values for the test statistic. The next theorem shows that the above bootstrap method works. Theorem 3.3. Under Assumption 2.1, 2.2 a, 3.1 and 3.2, we have sup |P ∗ (Tna∗ ≤ x) − Φ (x)| = op (1), x∈R

where Φ (·) is the distribution function of a standard normal variable, and P ∗ (·) = P (·|{Xt , Yt }nt=1 ).

1



n( n − 1 )

t

uˆ t uˆ s Kh,ts .

(3.1)

s̸=t

Assumption 3.3. (i) (X1t , Zt ) is a strictly stationary β -mixing process with mixing coefficient β(τ ) = ρ τ for some 0 < ρ < 1. (ii) nh → ∞ and nh9/2 → 0 as n → ∞. The asymptotic distribution of Inb is given in the next theorem. Theorem 3.4. Under Assumptions 2.1, 2.2 a, 3.1 and 3.3, we have

(i) Under H_0b,

T_nb ≡ n√h I_nb / √(V̂_b) →_d N(0, 1),

where V̂_b is a consistent estimator of the asymptotic variance of n√h I_nb; its definition is given in Appendix B (see the proof of Theorem 3.4).

(ii) Under H_1b, the test statistic T_nb diverges to +∞ at the rate nh^{1/2}.

We also recommend using a bootstrap procedure to better approximate the null distribution of T_nb. Specifically, we generate the two-point wild bootstrap error u*_t from û_t = Y_t − X_{1t}^⊤ β̂_1t − t β̂_20, and then generate Y*_t = X_{1t}^⊤ β̂_1t + t β̂_20 + u*_t. Using the bootstrap sample {X_{1t}, Z_t, t, Y*_t}_{t=1}^n, we estimate the partially linear varying coefficient model to obtain β̂*_20, β̂*_1t, and û*_t = Y*_t − X_{1t}^⊤ β̂*_1t − t β̂*_20. Finally, we obtain T*_nb (the same as T_nb except that û_t is replaced by û*_t). This process is repeated a large number of times to obtain the empirical distribution of T*_nb. Similar to the proof of Theorem 3.4, one can justify the asymptotic validity of the above bootstrap method.
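The two-point wild bootstrap loop described above can be sketched as follows. The refitting and statistic-computation steps are abstracted into the callables `fit_null` and `test_stat`, which are our placeholders, not the paper's notation.

```python
import numpy as np

# Two-point weights used in the text: E[w] = 0, E[w^2] = 1.
A = (1 - np.sqrt(5)) / 2
B_PT = (1 + np.sqrt(5)) / 2
R = (1 + np.sqrt(5)) / (2 * np.sqrt(5))

def wild_bootstrap_crit(fitted, u_hat, fit_null, test_stat, B=399,
                        level=0.05, rng=None):
    """Bootstrap critical value for a specification test (a sketch).

    fitted   : fitted values from the null model
    u_hat    : residuals from the null model
    fit_null : callable y* -> (fitted*, residuals*), refits the null model
    test_stat: callable residuals -> scalar test statistic
    The callables stand in for the estimation steps described in the text.
    """
    rng = np.random.default_rng(rng)
    stats = np.empty(B)
    for j in range(B):
        w = np.where(rng.random(u_hat.size) < R, A, B_PT)  # two-point weights
        y_star = fitted + w * u_hat                        # bootstrap sample
        _, u_star = fit_null(y_star)                       # refit under the null
        stats[j] = test_stat(u_star)
    return np.quantile(stats, 1 - level)                   # upper critical value
```

Because the weights satisfy E[w] = 0 and E[w²] = 1, the bootstrap errors reproduce the conditional mean and variance of the residuals, which is what drives the validity results in Theorems 3.3 and 3.4.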

3.3. Smoothing parameter selection for the tests

In this section we discuss smoothing parameter selection for the test statistics proposed in Sections 3.1 and 3.2. The method we adopt is least squares cross validation. Gao and Gijbels (2008) proposed a method of selecting the smoothing parameter based on an Edgeworth expansion of the power function. We will prove the validity of the cross validation smoothing parameter selection method in our testing problems and compare it with Gao and Gijbels' (2008) Edgeworth expansion method via simulations. It can be shown that all the test statistics presented in Sections 3.1 and 3.2 have the same asymptotic distributions when h is replaced by the data-driven least squares cross validation bandwidth. To save space we only prove this for the I_na statistic.

Theorem 3.5. Under Assumptions 2.1, 2.2a, 3.1 and 3.2, and under H_0a we have





n√(ĥ_cv) Ĩ_na / √(Ṽ_0a) →_d N(0, 1),


Table 1. n^{2/3} × quantiles of ASE(β̂_1(·)).

                    q = 2                                              q = 6
                    Quantiles (ĥ_CV)        Quantiles (h_ad-hoc)       Quantiles (ĥ_CV)         Quantiles (h_ad-hoc)
n      n^{2/3}      0.1     0.5     0.9     0.1     0.5     0.9        0.1     0.5     0.9      0.1     0.5     0.9
50     13.6         1.78    4.78    13.5    3.87    9.61    28.0       0.743   2.75    7.26     1.35    4.03    10.3
100    21.5         2.01    4.75    12.1    4.38    8.81    21.8       0.778   2.76    6.58     1.57    3.85    8.60
200    34.2         2.55    5.12    10.1    4.93    8.58    15.6       0.773   2.50    5.90     1.86    3.92    7.14
400    54.3         2.76    5.11    9.03    5.17    8.43    13.6       0.742   2.32    5.78     1.98    3.87    6.66

Table 2. n^{8/3} × quantiles of ASE(β̂_2(·)).

                    q = 2                                              q = 6
                    Quantiles (ĥ_CV)        Quantiles (h_ad-hoc)       Quantiles (ĥ_CV)         Quantiles (h_ad-hoc)
n      n^{8/3}      0.1     0.5     0.9     0.1     0.5     0.9        0.1     0.5     0.9      0.1     0.5     0.9
50     3.39×10^4    9.91    25.9    84.6    18.3    44.8    194.5      7.20    19.5    74.7     11.8    29.2    126.3
100    2.15×10^5    11.6    24.9    71.0    18.9    38.8    133.1      6.52    16.8    57.4     11.9    26.5    102.1
200    1.37×10^6    12.5    23.2    49.9    20.6    35.6    87.8       6.38    15.0    45.8     12.5    23.5    72.2
400    8.69×10^6    12.6    22.3    40.8    20.3    32.5    60.3       5.27    13.3    33.4     11.9    21.1    48.5

where Ĩ_na = (1/n^4) Σ_{t=1}^n Σ_{s≠t} X_t^⊤ X_s û_t û_s K_{ĥ_cv,ts} and Ṽ_0a = (4ĥ_cv/n^6) Σ_{t=1}^n Σ_{s>t} (X_t^⊤ X_s)² û_t² û_s² K²_{ĥ_cv,ts}.

Note that one cannot use the local constant least squares cross validation, as it leads to inconsistent estimation results. We need to use a qth-order local polynomial (with q ≥ 1) in computing the cross validation function. In the simulations reported in the next section, we choose q = 2 and q = 6.

4. Monte Carlo simulations

4.1. Estimation of β_1(·) and β_2(·)

In this section we conduct simulations to examine the finite sample performance of our proposed estimator β̂(·). We consider the following data generating process (DGP): Y_t = β_1(Z_t) + t β_2(Z_t) + u_t, where β_1(z) = 1 + z, β_2(z) = sin(πz), Z_t = v_{t−1} + v_t, v_t is i.i.d. uniform [0, 1], and u_t is i.i.d. N(0, 1). The sample sizes are n = 50, 100, 200 and 400. We compute the average squared error of β̂_j(·) (j = 1, 2) as follows: for each replication we compute ASE_j = n^{-1} Σ_{t=1}^n [β̂_j(Z_t) − β_j(Z_t)]². We then obtain the 10th, 50th (median) and 90th percentiles of ASE_j over the 1000 replications. We use the standard normal kernel, and the smoothing parameter is selected either by an ad hoc method, h_ad-hoc = z_sd n^{-1/α}, where z_sd is the sample standard deviation of {Z_t}_{t=1}^n, or by ĥ_CV, the least squares cross validation (LS-CV) selected smoothing parameter. Since an even order polynomial has smaller bias, we recommend using an even value of q to reduce estimation bias; we use q = 2 and q = 6 in the simulations. For q = 2, the leading term of MSE(β̂_1(·)) is of the order O(h^4 + n²h^8 + (nh)^{-1}), and it is easy to see that h ∼ n^{-1/3} is the optimal rate for minimizing MSE(β̂_1). Hence, for h_ad-hoc we choose α = 3 so that h_ad-hoc = z_sd n^{-1/3}. With h ∼ n^{-1/3} we have MSE(β̂_1(·)) = O(n^{-2/3}) and MSE(β̂_2(·)) = O(n^{-8/3}). Therefore, we report the quantiles of n^{2/3} ASE(β̂_1(·)) and n^{8/3} ASE(β̂_2(·)) in Tables 1 and 2.

From Table 1 we first observe that the ASE using ĥ_CV is about half of that using h_ad-hoc. This is as expected because ĥ_CV is the asymptotically efficient bandwidth that minimizes the asymptotic estimation mean squared error. Also, we observe that when n is large, n^{2/3} ASE(β̂_1(·)) tends to stabilize as n increases. This confirms that the rate of convergence of MSE(β̂_1(·)) is indeed n^{-2/3}. Finally, comparing q = 2 with q = 6, we clearly see that a higher order local polynomial approximation (q = 6) gives smaller estimated MSEs than a lower order one (q = 2), due to smaller bias and variance in the former case. This is consistent with our theory.

Table 2 reports the estimated ASE for β̂_2(·). We observe that ĥ_CV leads to smaller estimation ASE than h_ad-hoc; that n^{8/3} ASE(β̂_2(·)) tends to stabilize as n increases, confirming the rate of convergence MSE(β̂_2(·)) = O(n^{-8/3}); and that a higher order local polynomial approximation (q = 6) again gives smaller estimated MSEs than a lower order one (q = 2).

Table 3. n^3 × quantiles of MSE(β̂_20) (h = z_sd n^{-1/3}).

                     Quantiles                                         Mean
n      n^3           0.1      0.25     0.5      0.75     0.9
50     1.25×10^5     0.201    1.45     6.42     18.5     37.3          14.2
100    10^6          0.152    1.23     6.09     19.1     37.1          13.7
200    8×10^6        0.235    1.49     6.63     16.9     36.9          13.6
400    6.4×10^7      0.193    1.08     5.25     15.9     34.3          12.2

4.2. Estimation of β_20

In this section we estimate the partially linear time trend model with DGP given by Y_t = β_1(Z_t) + t β_20 + u_t, where β_1(z) = 1 + 0.5 sin(πz), β_20 = 0.2 and u_t is i.i.d. N(0, σ_u²) with σ_u² = 1. We

compute β̂_20 using (2.8). Sample sizes are n = 50, 100, 200, 400, and the number of replications is 1000. When computing β̂_20 we choose h = z_sd n^{-1/3}, which satisfies the under-smoothing Assumption 2.6(iii). Theorem 2.6 implies that MSE(β̂_20) = O(n^{-3}). Therefore, we report quantiles, as well as the average value, of n³ × MSE(β̂_20) in Table 3. The results reported in Table 3 clearly show that n³ × MSE(β̂_20) ∼ a constant, which verifies numerically the theoretical prediction of Theorem 2.6 that MSE(β̂_20) = O(n^{-3}).

4.3. Test statistics T_na and T_nb

We first examine the finite sample performance of T_na, the test statistic for testing the linear model H_0a. Our null DGP is given by


DGP_0a: Y_t = β_10 + t β_20 + u_t, where β_10 = 1 and β_20 = 0.2. The alternative DGPs are given by DGP_1a: Y_t = β_1(Z_t) + t β_20 + u_t, and DGP_2a: Y_t = β_10 + t β_2(Z_t) + u_t, where (β_10, β_20) = (1, 0.2), β_1(z) = 1 + 0.5 sin(πz) and β_2(z) = 0.2 + 0.02 sin(πz). For the null model, the coefficients are constants. For the alternative model DGP_1a, β_1(z) differs from the constant β_10 by a sine function with maximum deviation ±0.5 (50% of the value of β_10 = 1). For DGP_2a, β_2(z) also deviates from β_20 by a sine function; however, the maximum deviation is ±0.02, which is 10% of the value of β_20 = 0.2. We choose a small deviation (from the null model) for DGP_2a because our theory predicts that our test should be more powerful against DGP_2a than against DGP_1a. We will see that the simulation results indeed confirm our theoretical analysis of Section 3.

The sample sizes used are 50, 100 and 200, the number of replications is 1000, and within each replication, 400 bootstrap statistics are computed to give the 1%, 5%, 10% and 20% critical values of the empirical distribution of the bootstrapped statistic. We use the least squares cross validation method to select the smoothing parameter in computing the T_na statistic. The results are reported in Tables 4–6.

Table 4. Estimated size for T_na (DGP_0a).

          q = 2                                q = 6
n         1%      5%      10%     20%         1%      5%      10%     20%
50        0.011   0.052   0.124   0.235       0.010   0.049   0.112   0.237
100       0.018   0.065   0.125   0.226       0.017   0.054   0.108   0.202
200       0.016   0.063   0.124   0.221       0.016   0.052   0.107   0.204

Table 5. Estimated power for T_na (DGP_1a).

          q = 2                                q = 6
n         1%      5%      10%     20%         1%      5%      10%     20%
50        0.243   0.498   0.613   0.764       0.243   0.482   0.588   0.746
100       0.583   0.790   0.871   0.945       0.565   0.777   0.866   0.940
200       0.912   0.982   0.990   0.994       0.919   0.979   0.989   0.992

Table 6. Estimated power for T_na (DGP_2a).

          q = 2                                q = 6
n         1%      5%      10%     20%         1%      5%      10%     20%
50        0.421   0.696   0.810   0.902       0.400   0.670   0.794   0.888
100       1       1       1       1           1       1       1       1

Table 8. Estimated size and power for T_nb (h = z_sd n^{-1/4}).

          Size (DGP_0b)                        Power (DGP_1b)
n         1%      5%      10%     20%         1%      5%      10%     20%
50        0.004   0.031   0.068   0.170       0.284   0.510   0.676   0.776
100       0.004   0.034   0.078   0.173       0.514   0.778   0.906   0.972
200       0.008   0.041   0.081   0.186       0.648   0.924   0.984   1.00

From Table 4 we observe that the estimated sizes are quite close to their nominal sizes, suggesting that the bootstrap method works quite well in this context. From Tables 5 and 6 we see that the T_na test is quite powerful in detecting non-constancy of the coefficients, and the power increases quite rapidly as the sample size increases. Also, as suggested by our theoretical analysis, when the coefficient of the time trend variable is non-constant (DGP_2a), the T_na test is more powerful than when the coefficient of a stationary variable is non-constant (DGP_1a).

Next, we report simulation results using the bandwidth selected based on Gao and Gijbels (2008), the Edgeworth expansion based method. The estimated sizes and powers against DGP_0a, DGP_1a and DGP_2a are reported in Table 7. Comparing the simulation results obtained using the least squares cross validation method (Tables 4–6) with those of Table 7, we see that the two approaches to selecting the smoothing parameter give similar estimated sizes and powers.

Finally, we consider the test statistic T_nb for testing the null hypothesis of a partially linear varying coefficient model, H_0b. The null DGP is generated via DGP_0b: Y_t = 1 + 0.5 sin(πZ_t) + 0.2t + u_t, while the alternative DGP is given by DGP_1b: Y_t = 1 + 0.5 sin(πZ_t) + (0.2 + 0.1 sin(πZ_t))t + u_t. The simulation results are reported in Table 8. From Table 8 we observe that the T_nb test is slightly undersized, but the estimated sizes improve as the sample size increases. Also, as expected, when the null hypothesis is false, the power increases as the sample size n increases.
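The three designs used for the T_na size and power study can be simulated directly from their definitions. A minimal sketch (the function name and interface are ours; parameter values follow the text, with Z_t = v_{t−1} + v_t and v_t i.i.d. uniform [0, 1] as in Section 4.1):

```python
import numpy as np

def make_dgp(n, kind="DGP0a", rng=None):
    """Simulate the designs used for the T_na test (a sketch).

    DGP0a: constant coefficients (the null model)
    DGP1a: varying intercept  beta_1(z) = 1 + 0.5 sin(pi z)
    DGP2a: varying trend coefficient  beta_2(z) = 0.2 + 0.02 sin(pi z)
    """
    rng = np.random.default_rng(rng)
    v = rng.uniform(0.0, 1.0, n + 1)
    z = v[:-1] + v[1:]                 # Z_t = v_{t-1} + v_t
    t = np.arange(1, n + 1)
    u = rng.standard_normal(n)         # u_t i.i.d. N(0, 1)
    if kind == "DGP0a":
        y = 1.0 + 0.2 * t + u
    elif kind == "DGP1a":
        y = (1.0 + 0.5 * np.sin(np.pi * z)) + 0.2 * t + u
    else:                              # DGP2a
        y = 1.0 + (0.2 + 0.02 * np.sin(np.pi * z)) * t + u
    return y, z, t
```

Note that even under DGP_0a the covariate Z_t is still generated, since the test statistic and its bootstrap version are computed from {1, t, Z_t, Y_t}.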

Appendix A

We know that

D_n[B̂(z) − B(z)] = S_n^{-1}(z)[L_1n(z) + L_2n(z)].   (A.1)

In Lemmas A.1, A.2 and A.4 below we show that

S_n(z)^{-1} = S(z)^{-1} + O_p(h + (nh)^{-1/2}),   (A.2)

√(nh) L_2n(z) →_d N(0, Σ_a),   (A.3)

(1/√(nh^{2q+1})) [L_1n(z) − E(L_1n(z))] →_d N(0, Σ_b),   (A.4)

E(L_1n(z)) = B_n(z) + O(n^{-1}h + h³ + h^{q+1} + nh^{q+3}).   (A.5)
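To fix ideas before the proofs, the estimator B̂(z) behind the decomposition (A.1) solves a kernel-weighted least squares problem at each point z: a local constant approximation for β_1 and a qth-order local polynomial for the trend coefficient β_2. A minimal sketch under these assumptions (standard normal kernel; the function name and interface are ours):

```python
import numpy as np
from math import factorial

def local_poly_fit(y, x1, t, z_obs, z, h, q=2):
    """Kernel-weighted least squares at the point z for the model
    Y_t = X_{1t}' b1(Z_t) + t * b2(Z_t) + u_t.

    Regressors: X_{1t} (local constant in b1) and t*(Z_t - z)^i / i!,
    i = 0, ..., q (local polynomial in b2); weights K((Z_t - z)/h)."""
    u = (z_obs - z) / h
    w = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # standard normal kernel
    poly = np.column_stack([t * (z_obs - z)**i / factorial(i)
                            for i in range(q + 1)])
    X = np.column_stack([x1, poly])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    d1 = x1.shape[1]
    return coef[:d1], coef[d1]   # (b1_hat(z), b2_hat(z))
```

With a noiseless constant-coefficient design the fit is exact, which gives a quick sanity check on the construction.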

Proof of Theorem 2.1. Note that the first d elements of D_n[B̂(z) − B(z)] are ((β̂_1(z) − β_1(z))^⊤, n(β̂_2(z) − β_2(z)))^⊤. Hence, the first (d − 1) elements of the right-hand side of (A.1) give the order of β̂_1(z) − β_1(z). From the definition of B_n(z), we know that, for j = 1, 2, B_jn(z) = O(h² + nh^{q+1}) if q is odd, and B_jn(z) = O(h² + nh^{q+2}) if q is even. Combining this result with (A.2)–(A.5), we obtain that β̂_1(z) − β_1(z) = O_p(h² + nh^{q+1} + (nh)^{-1/2}) if q is odd, and β̂_1(z) − β_1(z) = O_p(h² + nh^{q+2} + (nh)^{-1/2}) if q is even. Noting that D_n has a factor n for β̂_2(z) − β_2(z), we have β̂_2(z) − β_2(z) = n^{-1} O_p(β̂_1(z) − β_1(z)). □

Proof of Theorem 2.2. For case (a), from (A.2), (A.3) and (A.5) we immediately have



√(nh) [D_n(B̂(z) − B(z)) − S_n(z)^{-1} B_n(z)]

Table 7. Estimated size and power for T_na using the Edgeworth expansion bandwidth.

       Size (DGP_0a)                      Power (DGP_1a)                     Power (DGP_2a)
n      1%     5%     10%    20%           1%     5%     10%    20%           1%     5%     10%    20%
50     0.017  0.064  0.112  0.249         0.232  0.436  0.584  0.740         0.401  0.668  0.794  0.890
100    0.016  0.059  0.109  0.232         0.545  0.765  0.868  0.941         1      1      1      1
200    0.018  0.065  0.112  0.226         0.907  0.976  0.986  0.994         1      1      1      1


= S_n(z)^{-1} √(nh) (L_1n(z) + L_2n(z) − B_n(z)) →_d S(z)^{-1} N(0, Σ_a) = N(0, S(z)^{-1} Σ_a S(z)^{-1}).

For case (b), first note that when q = 2, (A.3) and (nh^5)^{-1} = o(nh) (under Assumption 2.3(b)) together imply that (nh^5)^{-1/2} L_2n(z) = o_p(1). Hence, from (A.2)–(A.4) we obtain

(1/√(nh^5)) [D_n(B̂(z) − B(z)) − S_n(z)^{-1} B_n(z)] = S_n(z)^{-1} (1/√(nh^5)) (L_1n(z) + L_2n(z) − B_n(z)) →_d S(z)^{-1} N(0, Σ_b) = N(0, S(z)^{-1} Σ_b S(z)^{-1}).

For case (c), nh³ → c, or (nh)(nh^5)^{-1} → c²; thus, we can replace √(nh) by c(nh^5)^{-1/2}. It is easy to see that Cov(L_1n(z), L_2n(z)) = 0, i.e., L_1n(z) and L_2n(z) are asymptotically uncorrelated. Hence, when q = 2, from (A.2)–(A.4) we obtain

√(nh) [D_n(B̂(z) − B(z)) − S_n(z)^{-1} B_n(z)] = S_n(z)^{-1} √(nh) (L_1n(z) + L_2n(z) − B_n(z)) →_d S(z)^{-1} N(0, Σ_a + c² Σ_b) = N(0, S(z)^{-1}(Σ_a + c² Σ_b) S(z)^{-1}),

where we have used the fact that √(nh) = c(nh^5)^{-1/2}. This completes the proof of Theorem 2.2. □

Lemma A.1. Under Assumption 2.2, and assuming that h → 0 and nh → ∞ as n → ∞, we have S_n(z) = S(z) + O_p(h + (nh)^{-1/2}).

Proof. We prove Lemma A.1 by showing that E[S_n(z)] = S(z) + O(h) and Var[S_n(z)] = O((nh)^{-1}). Note that S_n(z) is a symmetric matrix; we write S_n(z) = {F_{i,j}(z)}_{1≤i,j≤q+2}, where F_{1,1}(z) = n^{-1} Σ_{t=1}^n X_{1t} X_{1t}^⊤ K_{h,z_t z}, F_{1,i}(z) = (n((i−2)!))^{-1} Σ_{t=1}^n (t/n) X_{1t} ((Z_t − z)/h)^{i−2} K_{h,z_t z} for i = 2, …, q+2, and F_{i,j}(z) = (n((i−2)!(j−2)!))^{-1} Σ_{t=1}^n (t²/n²) ((Z_t − z)/h)^{i+j−4} K_{h,z_t z} for i = 2, …, q+2 and i ≤ j ≤ q+2.

We first evaluate E[S_n(z)]. Recall that η_2(z) = E(X_{1t} X_{1t}^⊤|z) f(z); we have

E[F_{1,1}(z)] = E{E[X_{1t} X_{1t}^⊤|Z_t] K_{h,z_t z}} = η_2(z) + O(h²).

For 2 ≤ i ≤ q+2, note that μ_j = ∫ v^j K(v) dv and η_1(z) = E(X_{1t}|z) f(z); we have

E[F_{1,i}(z)] = (1/((i−2)!)) n^{-2} Σ_{t=1}^n t E[E(X_{1t}|Z_t)((Z_t − z)/h)^{i−2} K_{h,z_t z}] = μ_{i−2} η_1(z)/(2((i−2)!)) + O(n^{-1} + h²),

where the O(n^{-1}) term comes from n^{-2} Σ_{t=1}^n t = (1/2) + O(n^{-1}). For 2 ≤ i ≤ q+2, i ≤ j ≤ q+2,

E[F_{i,j}(z)] = (1/((i−2)!(j−2)!)) n^{-3} Σ_{t=1}^n t² E[((Z_t − z)/h)^{i+j−4} K_{h,z_t z}] = μ_{i+j−4} f(z)/(3((i−2)!(j−2)!)) + O(n^{-1} + h).

Next, we evaluate Var[S_n(z)]. Let F^{(i,j)}_{1,1}(z) denote the (i,j)th element of F_{1,1}(z); we show below that Var[F^{(i,j)}_{1,1}(z)] = O((nh)^{-1}). Note that F^{(i,j)}_{1,1}(z) = n^{-1} Σ_{t=1}^n W^{(i,j)}_{1,t}, where W_{1,t} = X_{1t} X_{1t}^⊤ K_{h,z_t z} and W^{(i,j)}_{1,t} (i, j = 1, …, d − 1) is the (i,j)th entry of the matrix W_{1,t}. Note that Var[W^{(i,j)}_{1,t}] = E[(X_{1t} X_{1t}^⊤)²_{(i,j)} K²_{h,z_t z}] − (E[(X_{1t} X_{1t}^⊤)_{(i,j)} K_{h,z_t z}])². It is easy to show that E[X_{1t} X_{1t}^⊤ K_{h,z_t z}] = E[X_{1t} X_{1t}^⊤|z] f(z) + O(h²) and E[(X_{1t} X_{1t}^⊤)_{(i,j)} K²_{h,z_t z}] = h^{-1} ν_0 E[(X_{1t} X_{1t}^⊤)²_{(i,j)}|z] f(z) + O(h). Hence, Var[W^{(i,j)}_{1,t}] = O(h^{-1}). Let d_n → ∞ be a sequence of positive integers such that d_n h → 0, and define J_1 = Σ_{l=1}^{d_n−1} |Cov(W^{(i,j)}_{1,1}, W^{(i,j)}_{1,l+1})| and J_2 = Σ_{l=d_n}^{n−1} |Cov(W^{(i,j)}_{1,1}, W^{(i,j)}_{1,l+1})|. For all l ≥ 1, using Assumptions 2.2a, 2.2b and 2.2e, we have |Cov(W^{(i,j)}_{1,1}, W^{(i,j)}_{1,l+1})| ≤ C|E[(X_{11} X_{11}^⊤)_{(i,j)} K_{h,z_1 z} (X_{1l} X_{1l}^⊤)_{(i,j)} K_{h,z_l z}]| ≤ C. Hence, J_1 ≤ C d_n = o(h^{-1}), by the choice of d_n. Note that here and in the following, C > 0 is a generic constant, i.e., it can take different values at different places. From Davydov's inequality (see Hall and Heyde (1980), Corollary A.2), we obtain for all l ≥ 1,

|Cov(W^{(i,j)}_{1,1}, W^{(i,j)}_{1,l+1})| ≤ C [α(l)]^{1−2/δ} [E|W^{(i,j)}_{1,1}|^δ]^{1/δ} [E|W^{(i,j)}_{1,l+1}|^δ]^{1/δ}.   (A.6)

By Assumptions 2.2b and 2.2f, we have

E[|W^{(i,j)}_{1,t}|^δ] ≤ C E[|(X_{1t} X_{1t}^⊤)_{(i,j)}|^δ K^δ_{h,z_t z}] ≤ C h^{1−δ} E[|(X_{1t} X_{1t}^⊤)_{(i,j)}|^δ] ≤ C h^{1−δ}.   (A.7)

Thus, from (A.6) and (A.7), we have

J_2 ≤ C h^{2/δ−2} Σ_{l=d_n}^∞ [α(l)]^{1−2/δ} ≤ C h^{2/δ−2} d_n^{−γ} Σ_{l=d_n}^∞ l^γ [α(l)]^{1−2/δ} = o(h^{-1}),

by choosing d_n such that h^{1−2/δ} d_n = O(1); the requirement that d_n h → 0 is also satisfied. Therefore, Var[F^{(i,j)}_{1,1}(z)] = O((nh)^{-1}). Similarly, one can show that Var[F_{i,j}(z)] = O((nh)^{-1}) for all 1 ≤ i, j ≤ q+2. □



Lemma A.2. (i) √(nh) L_2n(z) →_d N(0, Σ_a) under case (a) or (c); (ii) (nh^{2q+1})^{-1/2} L_2n(z) = o_p(1) under case (b).

Proof. Let X̆_t = D_n^{-1} X̃_t = (X_{1t}^⊤, t/n, (t/n)((Z_t − z)/h), …, (1/q!)(t/n)((Z_t − z)/h)^q)^⊤. Then

√(nh) L_2n(z) = (1/√n) Σ_{t=1}^n D_n^{-1} X̃_t u_t √h K_{h,z_t z} ≡ (1/√n) Σ_{t=1}^n v_t,

where v_t = X̆_t u_t √h K_{h,z_t z} ≡ (v_{1,t}, v_{2,t}, …, v_{q+2,t})^⊤. Since {(X_{1t}^⊤, Z_t, u_t)}, t = 1, …, n, is an α-mixing process satisfying Assumption 2.2 with E(u_t|X_{1t}, Z_t) = 0, we have E(v_t) = 0.


For any λ = (λ_1^⊤, λ_2, λ_3, …, λ_{q+2})^⊤ ≠ 0, let ṽ_t = λ^⊤ v_{t+1}, t = 0, …, n−1; then √(nh) λ^⊤ L_2n(z) = (1/√n) Σ_{t=0}^{n−1} ṽ_t = (1/√n) Σ_{t=1}^n λ^⊤ v_t, and

(1/√n) Σ_{t=1}^n λ^⊤ v_t = (1/√n) Σ_{t=1}^n [λ_1^⊤ X_{1t} + Σ_{i=0}^q λ_{i+2} (1/i!)(t/n)((Z_t − z)/h)^i] u_t √h K_{h,z_t z}.

We have

Var[ṽ_{t−1}] = Var[λ^⊤ v_t] = λ^⊤ E[v_t v_t^⊤] λ = λ^⊤ E[X̆_t X̆_t^⊤ u_t² h K²_{h,z_t z}] λ
= λ^⊤ [ ν_0 E[X_{1t} X_{1t}^⊤ σ²(X_{1t}, z)|z] f(z) + O(h) ,   (t/n) E[X_{1t} σ²(X_{1t}, z)|z] f(z) a^⊤ + O(th/n) ;
         (t/n) a E[X_{1t}^⊤ σ²(X_{1t}, z)|z] f(z) + O(th/n) ,   (t²/n²) E[σ²(X_{1t}, z)|z] f(z) Π + O(t²h/n²) ] λ,   (Box I)

where a = (ν_0, ν_1, ν_2/2!, …, ν_q/q!)^⊤ and Π = (π_ij)_{(q+1)×(q+1)} with (i,j)th element π_ij = ν_{i+j−2}/((i−1)!(j−1)!) for all 1 ≤ i, j ≤ q+1. Similar to the proof of Lemma A.3, one can show that Σ_{l=1}^{n−1} |Cov(ṽ_0, ṽ_l)| = o(1). Hence,

Var[(1/√n) Σ_{t=0}^{n−1} ṽ_t] = (1/n) Σ_{t=0}^{n−1} Var[ṽ_t] + 2 Σ_{l=1}^{n−1} (1 − l/n) Cov(ṽ_0, ṽ_l)
→ λ^⊤ [ ν_0 E[X_{1t} X_{1t}^⊤ σ²(X_{1t}, z)|z] f(z) ,   (1/2) E[X_{1t} σ²(X_{1t}, z)|z] f(z) a^⊤ ;
         (1/2) a E[X_{1t}^⊤ σ²(X_{1t}, z)|z] f(z) ,   (1/3) E[σ²(X_{1t}, z)|z] f(z) Π ] λ ≡ λ^⊤ Σ_a λ.   (A.8)

The rest of the proof of normality is the same as the proof of Theorem 2 in Cai et al. (2000). Specifically, by employing the Bernstein blocking scheme and the Cramér–Wold device, one can establish the asymptotic normality of √(nh) L_2n(z). Therefore,

√(nh) L_2n(z) = (1/√n) Σ_{t=1}^n D_n^{-1} X̃_t u_t √h K_{h,z_t z} ≡ (1/√n) Σ_{t=1}^n v_t →_d N(0, Σ_a).   (A.9)

Eq. (A.9) also implies that, under case (b),

(1/√(nh^{2q+1})) L_2n(z) = O_p((nh^{q+1})^{-1}) = o_p(1).

This completes the proof of Lemma A.2. □

Lemma A.3. Under Assumptions 2.1 and 2.2, and assuming that h → 0 and nh → ∞ as n → ∞, we have (i) Var[(1/√n) Σ_{t=1}^n v_{1,t}] → ν_0 E[X_{1t} X_{1t}^⊤ σ²(X_{1t}, z)|z] f(z), and (ii) Var[(1/√n) Σ_{t=1}^n v_{i,t}] → (ν_{2(i−2)}/(3((i−2)!)²)) E[σ²(X_{1t}, z)|z] f(z), for 2 ≤ i ≤ q+2.

Proof. (i) First we have

Var(v_{1,t}) = E[E(X_{1t} X_{1t}^⊤ σ²(X_t, Z_t)|Z_t) h K²_{h,z_t z}] = ν_0 E[X_{1t} X_{1t}^⊤ σ²(X_{1t}, z)|z] f(z) + O(h),

where ν_j = ∫ v^j K²(v) dv. By strict stationarity, Var[(1/√n) Σ_{t=1}^n v_{1,t}] = Var(v_{1,1}) + 2 Σ_{l=1}^{n−1} (1 − l/n) Cov(v_{1,1}, v_{1,l+1}). Next, we prove Σ_{l=1}^{n−1} |Cov(v_{1,1}, v_{1,l+1})| = o(1). Let d_n → ∞ be a sequence of positive integers such that d_n h → 0 as n → ∞, and define P_1 = Σ_{l=1}^{d_n−1} |Cov(v_{1,1}, v_{1,l+1})| and P_2 = Σ_{l=d_n}^{n−1} |Cov(v_{1,1}, v_{1,l+1})|. Since K(·) has a compact support, and by Assumption 2.2, we have, for all l ≥ 1, |Cov(v_{1,1}, v_{1,l+1})| = |E[X_{1,1} X_{1,l+1}^⊤ u_1 u_{l+1} h K_{h,z_1 z} K_{h,z_{l+1} z}]| ≤ Ch. Hence, P_1 ≤ C d_n h = o(1), by the choice of d_n. Using Davydov's inequality (see Hall and Heyde (1980), Corollary A.2), we obtain for all l ≥ 1,

|Cov(v_{1,1}, v_{1,l+1})| ≤ C [α(l)]^{1−2/δ} [E|v_{1,1}|^δ]^{1/δ} [E|v_{1,l+1}|^δ]^{1/δ}.   (A.12)

By Assumption 2.2, we have

E[|v_{1,t}|^δ] ≤ C E[|X_{1t}|^δ |u_t|^δ h^{δ/2} K^δ_{h,z_t z}] ≤ Ch E[|X_{1t}|^δ] ≤ Ch.   (A.13)

From (A.12) and (A.13), we have

P_2 ≤ C h^{2/δ} Σ_{l=d_n}^∞ [α(l)]^{1−2/δ} ≤ C h^{2/δ} d_n^{−γ} Σ_{l=d_n}^∞ l^γ [α(l)]^{1−2/δ} = o(1),

by choosing d_n such that h^{2/δ} d_n^{−γ} = O(h); the requirement that d_n h → 0 is satisfied. Thus, Σ_{l=1}^{n−1} |Cov(v_{1,1}, v_{1,l+1})| = o(1). Hence,

Var[(1/√n) Σ_{t=1}^n v_{1,t}] → ν_0 E[X_{1t} X_{1t}^⊤ σ²(X_{1t}, z)|z] f(z).   (A.10)

(ii) For 2 ≤ i ≤ q+2, we have Var(v_{i,t}) = (1/((i−2)!)²)(t²/n²) ν_{2(i−2)} E[σ²(X_{1t}, z)|z] f(z) + O(t²h/n²). Also,

Var[(1/√n) Σ_{t=1}^n v_{i,t}] = (1/n) Σ_{t=1}^n Var(v_{i,t}) + 2 Σ_{l=1}^{n−1} (1 − l/n) Cov(v_{i,1}, v_{i,l+1}).

Similarly to the proof of part (i), we have Σ_{l=1}^{n−1} |Cov(v_{i,1}, v_{i,l+1})| = o(1). Hence,

Var[(1/√n) Σ_{t=1}^n v_{i,t}] = (1/n) Σ_{t=1}^n (t²/n²)(ν_{2(i−2)}/((i−2)!)²) E[σ²(X_{1t}, z)|z] f(z) + o(1) → (ν_{2(i−2)}/(3((i−2)!)²)) E[σ²(X_{1t}, z)|z] f(z).

This completes the proof of Lemma A.3. □

Lemma A.4. (i) E(L_1n(z)) = B_n(z) + O_p(n^{-1}h + h³ + h^{q+1} + nh^{q+3}); (ii) (1/√(nh^{2q+1})) [L_1n(z) − E(L_1n(z))] →_d N(0, Σ_b).

Proof. First we have

L_1n(z) = (1/n) Σ_{t=1}^n D_n^{-1} X̃_t [X_{1t}^⊤(β_1(Z_t) − β_1(z)) + t(β_2(Z_t) − Σ_{i=0}^q (1/i!) β_2^{(i)}(z)(Z_t − z)^i)] K_{h,z_t z} = G_1n(z) + G_2n(z),


where for j = 1, 2, G_jn(z) = (G_{1,j}(z), G_{2,j}(z), …, G_{q+2,j}(z))^⊤ with

G_{1,1}(z) = (1/n) Σ_{t=1}^n X_{1t} X_{1t}^⊤ (β_1(Z_t) − β_1(z)) K_{h,z_t z},
G_{1,2}(z) = (1/n) Σ_{t=1}^n t X_{1t} (β_2(Z_t) − Σ_{i=0}^q (1/i!) β_2^{(i)}(z)(Z_t − z)^i) K_{h,z_t z},

and for 2 ≤ j ≤ q+2,

G_{j,1}(z) = (1/n) Σ_{t=1}^n (t/n)(1/(j−2)!)((Z_t − z)/h)^{j−2} X_{1t}^⊤ (β_1(Z_t) − β_1(z)) K_{h,z_t z},
G_{j,2}(z) = (1/n) Σ_{t=1}^n (t²/n)(1/(j−2)!)((Z_t − z)/h)^{j−2} (β_2(Z_t) − Σ_{i=0}^q (1/i!) β_2^{(i)}(z)(Z_t − z)^i) K_{h,z_t z}.

Recall that B_n(z) = (B_1n(z), B_2n(z), …, B_{q+2,n}(z))^⊤, where

B_1n(z) = h² μ_2 [η_2^{(1)}(z) β_1^{(1)}(z) + (1/2) η_2(z) β_1^{(2)}(z)] + (nh^{q+1}/2) [(μ_{q+1}/(q+1)!) η_1(z) β_2^{(q+1)}(z) + (h μ_{q+2}/(q+1)!) η_1^{(1)}(z) β_2^{(q+1)}(z) + (h μ_{q+2}/(q+2)!) η_1(z) β_2^{(q+2)}(z)],   (A.14)

and for 2 ≤ j ≤ q+2,

B_jn(z) = (h/(2((j−2)!))) [μ_{j−1} η_1(z) β_1^{(1)}(z) + h μ_j η_1^{(1)}(z) β_1^{(1)}(z) + (h/2) μ_j η_1(z) β_1^{(2)}(z)] + (nh^{q+1}/(3((j−2)!))) [(μ_{q+j−1}/(q+1)!) β_2^{(q+1)}(z) f(z) + (h μ_{q+j}/(q+1)!) β_2^{(q+1)}(z) f^{(1)}(z) + (h μ_{q+j}/(q+2)!) β_2^{(q+2)}(z) f(z)].   (A.15)

We first consider G_{1,1}(z) and G_{1,2}(z). Let ξ(Z_t) = E(X_t X_t^⊤|Z_t); then

E[G_{1,1}(z)] = E[ξ(Z_t)(β_1(Z_t) − β_1(z)) K_{h,z_t z}] = h² μ_2 [η_2^{(1)}(z) β_1^{(1)}(z) + (1/2) η_2(z) β_1^{(2)}(z)] + O(h⁴),

where η_2(z) = E(X_{1t} X_{1t}^⊤|z) f(z). Also,

E[G_{1,2}(z)] = n^{-1} Σ_{t=1}^n t E[E(X_{1t}|Z_t)(β_2(Z_t) − Σ_{i=0}^q (1/i!) β_2^{(i)}(z)(Z_t − z)^i) K_{h,z_t z}]
= (nh^{q+1}/2) [(μ_{q+1}/(q+1)!) η_1(z) β_2^{(q+1)}(z) + (h μ_{q+2}/(q+1)!) η_1^{(1)}(z) β_2^{(q+1)}(z) + (h μ_{q+2}/(q+2)!) η_1(z) β_2^{(q+2)}(z)] + O(h^{q+1} + nh^{q+3}),

where η_1(z) = E(X_{1t}|z) f(z). Next, we compute the variances of G_{1,1}(z) and G_{1,2}(z). It is easy to show that Var[G^{(i,j)}_{1,1}(z)] = O(n^{-1}h). Let M_t = (t/n) X_{1t} (β_2(Z_t) − Σ_{i=0}^q (1/i!) β_2^{(i)}(z)(Z_t − z)^i) K_{h,z_t z}; then G_{1,2}(z) = Σ_{t=1}^n M_t. Hence, Var[G_{1,2}(z)] = Σ_{t=1}^n Var[M_t] + 2 Σ_{l=1}^{n−1} (n − l) Cov(M_1, M_{l+1}). We have Var[M_t] = (t²/n²){E[W_t W_t^⊤] − E[W_t] E[W_t^⊤]}, where W_t = X_{1t} (β_2(Z_t) − Σ_{i=0}^q (1/i!) β_2^{(i)}(z)(Z_t − z)^i) K_{h,z_t z}. Standard derivations lead to E[W_t W_t^⊤] = (h^{2q+1} ν_{2(q+1)}/((q+1)!)²)(β_2^{(q+1)}(z))² E(X_t X_t^⊤|z) f(z) + O(h^{2q+2}), and E[X_{1t}(β_2(Z_t) − Σ_{i=0}^q (1/i!) β_2^{(i)}(z)(Z_t − z)^i) K_{h,z_t z}] = O(h^{q+1}). Then, by using similar arguments as in the proof of Lemma A.1, we have

Σ_{l=1}^{n−1} (n − l) Cov(M_1, M_{l+1}) ≤ n Σ_{l=1}^{n−1} |Cov(M_1, M_{l+1})| = o(nh^{2q+1}).

Summarizing the above, we have shown that Var[G_{1,2}(z)] = O(nh^{2q+1}). Next, for 2 ≤ j ≤ q+2,

E[G_{j,1}(z)] = (1/(2((j−2)!))) [h μ_{j−1} η_1(z) β_1^{(1)}(z) + h² μ_j η_1^{(1)}(z) β_1^{(1)}(z) + (h²/2) μ_j η_1(z) β_1^{(2)}(z)] + O(n^{-1}h + h³),

and

E[G_{j,2}(z)] = (nh^{q+1}/(3((j−2)!))) [(μ_{q+j−1}/(q+1)!) β_2^{(q+1)}(z) f(z) + (h μ_{q+j}/(q+1)!) β_2^{(q+1)}(z) f^{(1)}(z) + (h μ_{q+j}/(q+2)!) β_2^{(q+2)}(z) f(z)] + O(h^{q+1} + nh^{q+3}).

Similar to the above calculations, it is straightforward to show that Var[G_{j,1}(z)] = O(n^{-1}h) and Var[G_{j,2}(z)] = O(nh^{2q+1}). Then we have just proved that

E(L_1n(z)) = B_n(z) + O_p(n^{-1}h + h³ + h^{q+1} + nh^{q+3}),   (A.16)

which in turn gives us

(1/√(nh^{2q+1})) [E(L_1n(z)) − B_n(z)] = O((n³h^{2q−1})^{-1/2} + (nh^{2q−5})^{-1/2} + (h/n)^{1/2} + (nh^5)^{1/2}) = o(1) under case (b) or (c);   (A.17)

√(nh) [E(L_1n(z)) − B_n(z)] = O((h³/n)^{1/2} + (nh^7)^{1/2} + (nh^{2q+3})^{1/2} + (n³h^{2q+7})^{1/2}) = o(1) under case (a).   (A.18)

Next, we consider L_1n(z) − E(L_1n(z)) under case (b) or (c). We have

(1/√(nh^{2q+1})) [L_1n(z) − E(L_1n(z))] = (1/√(nh^{2q+1})) [G_1n(z) − E(G_1n(z))] + (1/√(nh^{2q+1})) [G_2n(z) − E(G_2n(z))].

From the previous proof, we have G_1n(z) − E(G_1n(z)) = O_p((n^{-1}h)^{1/2}). Then (1/√(nh^{2q+1})) [G_1n(z) − E(G_1n(z))] = O_p((nh^q)^{-1}) = o_p(1), under case (b) or (c). We know that




(1/√(nh^{2q+1})) [G_2n(z) − E(G_2n(z))]
= (1/√(nh^{2q+1})) (1/n) Σ_{t=1}^n {D_n^{-1} X̃_t t(β_2(Z_t) − Σ_{i=0}^q (1/i!) β_2^{(i)}(z)(Z_t − z)^i) K_{h,z_t z} − E[D_n^{-1} X̃_t t(β_2(Z_t) − Σ_{i=0}^q (1/i!) β_2^{(i)}(z)(Z_t − z)^i) K_{h,z_t z}]}
≡ (1/√n) Σ_{t=1}^n V_nt,   (A.19)

where the definition of V_nt should be apparent. Obviously, E(V_nt) = 0. Let

G̃_1t(z) = (t/(n√(h^{2q+1}))) X_{1t} (β_2(Z_t) − Σ_{i=0}^q (1/i!) β_2^{(i)}(z)(Z_t − z)^i) K_{h,z_t z},

and G̃_jt(z) = (t²/(n²√(h^{2q+1})(j−2)!)) ((Z_t − z)/h)^{j−2} (β_2(Z_t) − Σ_{i=0}^q (1/i!) β_2^{(i)}(z)(Z_t − z)^i) K_{h,z_t z}, 2 ≤ j ≤ q+2. Then V_nt = (G̃_1t(z) − E[G̃_1t(z)], G̃_2t(z) − E[G̃_2t(z)], …, G̃_{q+2,t}(z) − E[G̃_{q+2,t}(z)])^⊤, and

Var[G̃_1t(z)] = (t²/n²)(ν_{2(q+1)}/((q+1)!)²)(β_2^{(q+1)}(z))² E(X_{1t} X_{1t}^⊤|z) f(z) + O(t²h/n²),

Var[G̃_jt(z)] = (t⁴/n⁴)(ν_{2(q+j−1)}/(((j−2)!)²((q+1)!)²))(β_2^{(q+1)}(z))² f(z) + O(t⁴h/n⁴), for 2 ≤ j ≤ q+2.

Similar to earlier arguments, Σ_{l=1}^{n−1} (n − l) Cov(G̃_11(z), G̃_{1,l+1}(z)) = o(n), and Σ_{l=1}^{n−1} (n − l) Cov(G̃_j1(z), G̃_{j,l+1}(z)) = o(n) for 2 ≤ j ≤ q+2. Hence,

Var[(1/√n) Σ_{t=1}^n (G̃_1t(z) − E[G̃_1t(z)])] = (1/n) Σ_{t=1}^n Var[G̃_1t(z)] + (2/n) Σ_{l=1}^{n−1} (n − l) Cov(G̃_11(z), G̃_{1,l+1}(z))
= (1/n) Σ_{t=1}^n (t²/n²)(ν_{2(q+1)}/((q+1)!)²)(β_2^{(q+1)}(z))² E(X_{1t} X_{1t}^⊤|z) f(z) + o(1)
→ (ν_{2(q+1)}/(3((q+1)!)²))(β_2^{(q+1)}(z))² E(X_{1t} X_{1t}^⊤|z) f(z),

and

Var[(1/√n) Σ_{t=1}^n (G̃_jt(z) − E[G̃_jt(z)])] = (1/n) Σ_{t=1}^n (t⁴/n⁴)(ν_{2(q+j−1)}/(((j−2)!)²((q+1)!)²))(β_2^{(q+1)}(z))² f(z) + o(1) → (ν_{2(q+j−1)}/(5((j−2)!)²((q+1)!)²))(β_2^{(q+1)}(z))² f(z).

Also, for 2 ≤ j ≤ q+2, Cov[G̃_1t(z), G̃_jt(z)] = (t³/n³)(ν_{2q+j}/((j−2)!((q+1)!)²))(β_2^{(q+1)}(z))² E(X_{1t}|z) f(z) + O(t³h/n³), and for 2 ≤ i ≤ j ≤ q+2, Cov[G̃_it(z), G̃_jt(z)] = (t⁴/n⁴)(ν_{2q+i+j−2}/((i−2)!(j−2)!((q+1)!)²))(β_2^{(q+1)}(z))² f(z) + O(t⁴h/n⁴), with Σ_{l=1}^{n−1} (n − l) Cov[G̃_i1(z), G̃_{j,l+1}(z)] = o(n). Recall that

Σ_b = [ (ν_{2(q+1)}/(3((q+1)!)²))(β_2^{(q+1)}(z))² η_2(z) ,   (1/(4((q+1)!)²))(β_2^{(q+1)}(z))² η_1(z) b^⊤ ;
        (1/(4((q+1)!)²))(β_2^{(q+1)}(z))² b (η_1(z))^⊤ ,   Ω_b ],

where b^⊤ = (ν_{2q+2}, ν_{2q+3}, ν_{2q+4}/(2!), ν_{2q+5}/(3!), …, ν_{3q+2}/(q!)) and Ω_b is defined above Theorem 2.2. Using the Bernstein blocking scheme and the Cramér–Wold device as in Lemma A.2, one can show that, for case (b) or (c), (1/√n) Σ_{t=1}^n V_nt →_d N(0, Σ_b). Hence,

(1/√(nh^{2q+1})) [L_1n(z) − E(L_1n(z))] = (1/√n) Σ_{t=1}^n V_nt + o_p(1) →_d N(0, Σ_b).   (A.20)

From (A.20) we immediately have, for case (a),

√(nh) [L_1n(z) − E(L_1n(z))] = √(nh) O_p((n^{-1}h)^{1/2} + (nh^{2q+1})^{1/2}) = O_p(h + nh^{q+1}) = o_p(1).   (A.21)

Eqs. (A.20) and (A.21) together complete the proof of Lemma A.4. □

Lemma A.5. Let {X_t^{(n)} : 0 ≤ t ≤ 1} be stochastic processes on (Ω, F, P) which have a.s. continuous sample paths. If there are positive numbers α, β, M such that E|X_t^{(n)} − X_s^{(n)}|^α ≤ M|t − s|^{1+β} for all s, t, n, then for each ε > 0,

lim_{δ→0} lim sup_n P( sup_{|s−t|≤δ} |X_t^{(n)} − X_s^{(n)}| > ε ) = 0.

Proof. This is a corollary of Bhattacharya and Waymire (2009, p. 98). □

Proof of Theorem 2.4. First we consider the even q case. From (2.4) we know that when q is even, the leading term of CV(h) is B_1 h⁴ + B_2 n² h^{2q+4} + B_3 (nh)^{-1}; the first two terms allow h → 0 arbitrarily fast, while the third term is the variance term, which does not allow h → 0 too fast. Therefore, one of the first two terms must balance the third term. Also, for a given h ∼ n^{-1/α} for some α > 0, the second term decreases as q increases. Therefore, for small values of q the first term dominates the second, while for large values of q the second dominates the first. It is easy to show that when q = 5 the first and second terms have the same order. But since q is even (so q ≠ 5), we have:

(i) If q ≤ 4, the second term dominates the first. From the first order condition (FOC) (2q+4) B_2 n² h^{2q+3} − B_3/(nh²) = 0 we obtain that h_0 = (B_3/((2q+4)B_2))^{1/(2q+5)} n^{-3/(2q+5)} minimizes the leading term of CV(h).

(ii) If q ≥ 6, the first term dominates the second. From the FOC 4B_1 h³ − B_3 (nh²)^{-1} = 0 we get h_0 = (B_3/(4B_1))^{1/5} n^{-1/5}.

The proof for the odd q case is similar and is thus omitted. □

Proof of Theorem 2.5. Let D̃_n = Diag(1, …, 1, n, nĥ_cv, …, nĥ_cv^q), S̃_n(z) = D̃_n^{-1} [(1/n) Σ_{t=1}^n X̃_t X̃_t^⊤ K_{ĥ_cv,z_t z}] D̃_n^{-1}, and

L̃_1n(z) = (1/n) Σ_{t=1}^n D̃_n^{-1} X̃_t [X_{1t}^⊤(β_1(Z_t) − β_1(z)) + t(β_2(Z_t) − Σ_{i=0}^q (1/i!) β_2^{(i)}(z)(Z_t − z)^i)] K_{ĥ_cv,z_t z},


and $\tilde{L}_{2n}(z) = \frac{1}{n}\sum_{t=1}^{n}\tilde{D}_n^{-1}\tilde{X}_tu_tK_{\hat{h}_{cv},z_tz}$. Then it is easy to see that
\[
\tilde{D}_n[\tilde{B}(z) - B(z)] = \tilde{S}_n(z)^{-1}\bigl(\tilde{L}_{1n}(z) + \tilde{L}_{2n}(z)\bigr).
\]
Without loss of generality, we assume $q = 2$ throughout this proof; the proofs for other $q$ values are similar.

First, we show that $\tilde{S}_n(z) \stackrel{p}{\to} S(z)$. By Theorem 2.4, we have $\hat{h}_{cv} = c_1n^{-1/3} + (s.o.)$. Let $h_0 = c_1n^{-1/3}$; then $\hat{h}_{cv}/h_0 \stackrel{p}{\to} 1$. Let $\mathcal{B} = [\underline{b}, \bar{b}]$, where $0 < \underline{b} < c_1 < \bar{b} < \infty$. Denote $\hat{b}_{cv} = n^{1/3}\hat{h}_{cv}$. Then $\hat{b}_{cv} \stackrel{p}{\to} c_1$.

Note that $\tilde{S}_n(z) = \{\tilde{F}_{i,j}(z)\}_{1\le i,j\le q+2}$, where $\tilde{F}_{1,1}(z) = n^{-1}\sum_{t=1}^{n}X_{1t}X_{1t}^{\top}K_{\hat{h}_{cv},z_tz}$, $\tilde{F}_{1,i}(z) = (n((i-2)!))^{-1}\sum_{t=1}^{n}X_{1t}\frac{t}{n}\bigl(\frac{Z_t-z}{\hat{h}_{cv}}\bigr)^{i-2}K_{\hat{h}_{cv},z_tz}$ for $i = 2, \ldots, q+2$, and $\tilde{F}_{i,j}(z) = (n((i-2)!(j-2)!))^{-1}\sum_{t=1}^{n}\frac{t^2}{n^2}\bigl(\frac{Z_t-z}{\hat{h}_{cv}}\bigr)^{i+j-4}K_{\hat{h}_{cv},z_tz}$ for $i = 2, \ldots, q+2$ and $i \le j \le q+2$.

Let $\tilde{S}_n(z,b) = \{\tilde{F}_{i,j}(z,b)\}_{1\le i,j\le q+2}$, where $\tilde{F}_{1,1}(z,b) = n^{-1}\sum_{t=1}^{n}\bigl[X_{1t}X_{1t}^{\top}K_{bn^{-1/3},z_tz} - E[X_{1t}X_{1t}^{\top}K_{bn^{-1/3},z_tz}]\bigr]$, $\tilde{F}_{1,i}(z,b) = (n((i-2)!))^{-1}\sum_{t=1}^{n}\frac{t}{n}\bigl[X_{1t}\bigl(\frac{Z_t-z}{bn^{-1/3}}\bigr)^{i-2}K_{bn^{-1/3},z_tz} - E[X_{1t}\bigl(\frac{Z_t-z}{bn^{-1/3}}\bigr)^{i-2}K_{bn^{-1/3},z_tz}]\bigr]$ for $i = 2, \ldots, q+2$, and $\tilde{F}_{i,j}(z,b) = (n((i-2)!(j-2)!))^{-1}\sum_{t=1}^{n}\frac{t^2}{n^2}\bigl[\bigl(\frac{Z_t-z}{bn^{-1/3}}\bigr)^{i+j-4}K_{bn^{-1/3},z_tz} - E[\bigl(\frac{Z_t-z}{bn^{-1/3}}\bigr)^{i+j-4}K_{bn^{-1/3},z_tz}]\bigr]$ for $i = 2, \ldots, q+2$ and $i \le j \le q+2$.

For each component of $\tilde{F}_{1,1}(z,b)$, using (A.10) in Li and Li (2010) and Davydov's inequality, we have that there exists a constant $D_1 > 0$ such that for any $b_1, b_2 \in \mathcal{B}$, $E|\tilde{F}_{1,1}^{(i,j)}(z,b_1) - \tilde{F}_{1,1}^{(i,j)}(z,b_2)|^2 \le D_1|b_1-b_2|^2$. Similarly, there exist $D_{2i}, D_{3ij} > 0$ such that $E|\tilde{F}_{1,i}(z,b_1) - \tilde{F}_{1,i}(z,b_2)|^2 \le D_{2i}|b_1-b_2|^2$ for $i = 2, \ldots, q+2$, and $E|\tilde{F}_{i,j}(z,b_1) - \tilde{F}_{i,j}(z,b_2)|^2 \le D_{3ij}|b_1-b_2|^2$ for $i = 2, \ldots, q+2$ and $i \le j \le q+2$. Hence, by Lemma A.5, we have that $\tilde{F}_{1,1}^{(i,j)}(z,\hat{b}_{cv}) = \tilde{F}_{1,1}^{(i,j)}(z,c_1) + o_p(1) = n^{-1}\sum_{t=1}^{n}\bigl[(X_{1t}X_{1t}^{\top})_{(i,j)}K_{c_1n^{-1/3},z_tz} - E[(X_{1t}X_{1t}^{\top})_{(i,j)}K_{c_1n^{-1/3},z_tz}]\bigr] + o_p(1)$.

Let $\bar{S}_n(z,b) = \{\bar{F}_{i,j}(z,b)\}_{1\le i,j\le q+2}$, where $\bar{F}_{1,1}(z,b) = n^{-1}\sum_{t=1}^{n}E[X_{1t}X_{1t}^{\top}K_{bn^{-1/3},z_tz}]$, $\bar{F}_{1,i}(z,b) = (n((i-2)!))^{-1}\sum_{t=1}^{n}\frac{t}{n}E[X_{1t}\bigl(\frac{Z_t-z}{bn^{-1/3}}\bigr)^{i-2}K_{bn^{-1/3},z_tz}]$ for $i = 2, \ldots, q+2$, and $\bar{F}_{i,j}(z,b) = (n((i-2)!(j-2)!))^{-1}\sum_{t=1}^{n}\frac{t^2}{n^2}E[\bigl(\frac{Z_t-z}{bn^{-1/3}}\bigr)^{i+j-4}K_{bn^{-1/3},z_tz}]$ for $i = 2, \ldots, q+2$ and $i \le j \le q+2$.

Similar to the derivation of (A.9) in Li and Li (2010), we can show that there exists $M_4 > 0$ such that $|\bar{F}_{1,1}^{(i,j)}(z,b_1) - \bar{F}_{1,1}^{(i,j)}(z,b_2)| \le M_4|b_1-b_2|$. Thus, for any $\varepsilon > 0$, $P(|\bar{F}_{1,1}^{(i,j)}(z,\hat{b}_{cv}) - \bar{F}_{1,1}^{(i,j)}(z,c_1)| > \varepsilon) = P(|\bar{F}_{1,1}^{(i,j)}(z,\hat{b}_{cv}) - \bar{F}_{1,1}^{(i,j)}(z,c_1)| > \varepsilon, \hat{b}_{cv} \in \mathcal{B}) + P(|\bar{F}_{1,1}^{(i,j)}(z,\hat{b}_{cv}) - \bar{F}_{1,1}^{(i,j)}(z,c_1)| > \varepsilon, \hat{b}_{cv} \notin \mathcal{B}) \le (M_4/\varepsilon)E[|\hat{b}_{cv} - c_1|] + P(\hat{b}_{cv} \notin \mathcal{B}) \to 0$, by Markov's inequality and $\hat{b}_{cv} \stackrel{p}{\to} c_1$. So $\bar{F}_{1,1}^{(i,j)}(z,\hat{b}_{cv}) = \bar{F}_{1,1}^{(i,j)}(z,c_1) + o_p(1)$.

Therefore, $\tilde{F}_{1,1}^{(i,j)}(z) = n^{-1}\sum_{t=1}^{n}(X_{1t}X_{1t}^{\top})_{(i,j)}K_{\hat{h}_{cv},z_tz} = \tilde{F}_{1,1}^{(i,j)}(z,\hat{b}_{cv}) + \bar{F}_{1,1}^{(i,j)}(z,\hat{b}_{cv}) = n^{-1}\sum_{t=1}^{n}(X_{1t}X_{1t}^{\top})_{(i,j)}K_{c_1n^{-1/3},z_tz} + o_p(1) = \eta_2^{(i,j)}(z) + o_p(1)$, by Lemma A.1. Similarly, we get the corresponding results for $\tilde{F}_{1,i}(z)$ for $i = 2, \ldots, q+2$, and $\tilde{F}_{i,j}(z)$ for $i = 2, \ldots, q+2$ and $i \le j \le q+2$. Thus, $\tilde{S}_n(z) \stackrel{p}{\to} S(z)$.

Next, we consider $\tilde{L}_{1n}(z)$. We have
\[
\sqrt{n\hat{h}_{cv}}\,\tilde{L}_{1n}(z) = \sqrt{n\hat{h}_{cv}}\,\frac{1}{n}\sum_{t=1}^{n}\tilde{D}_n^{-1}\tilde{X}_t\Bigl[X_{1t}^{\top}(\beta_1(Z_t)-\beta_1(z)) + t\Bigl(\beta_2(Z_t)-\sum_{i=0}^{q}\frac{1}{i!}\beta_2^{(i)}(z)(Z_t-z)^{i}\Bigr)\Bigr]K_{\hat{h}_{cv},z_tz} = \tilde{G}_{1n}(z) + \tilde{G}_{2n}(z),
\]
where, for $i = 1, 2$, $\tilde{G}_{in}(z) = (\tilde{G}_{1,i}(z), \tilde{G}_{2,i}(z), \ldots, \tilde{G}_{q+2,i}(z))^{\top}$ with $\tilde{G}_{1,1}(z) = \sqrt{n\hat{h}_{cv}}\,\frac{1}{n}\sum_{t=1}^{n}X_{1t}X_{1t}^{\top}(\beta_1(Z_t)-\beta_1(z))K_{\hat{h}_{cv},z_tz}$, $\tilde{G}_{1,2}(z) = \sqrt{n\hat{h}_{cv}}\,\frac{1}{n}\sum_{t=1}^{n}X_{1t}\frac{t}{n}\bigl(\beta_2(Z_t)-\sum_{i=0}^{q}\frac{1}{i!}\beta_2^{(i)}(z)(Z_t-z)^{i}\bigr)K_{\hat{h}_{cv},z_tz}$, and, for $2 \le j \le q+2$,
\[
\tilde{G}_{j,1}(z) = \sqrt{n\hat{h}_{cv}}\,\frac{1}{n}\sum_{t=1}^{n}\frac{t}{n}\frac{1}{(j-2)!}\Bigl(\frac{Z_t-z}{\hat{h}_{cv}}\Bigr)^{j-2}X_{1t}^{\top}(\beta_1(Z_t)-\beta_1(z))K_{\hat{h}_{cv},z_tz},
\]
\[
\tilde{G}_{j,2}(z) = \sqrt{n\hat{h}_{cv}}\,\frac{1}{n}\sum_{t=1}^{n}\frac{t^2}{n^2}\frac{1}{(j-2)!}\Bigl(\frac{Z_t-z}{\hat{h}_{cv}}\Bigr)^{j-2}\Bigl(\beta_2(Z_t)-\sum_{i=0}^{q}\frac{1}{i!}\beta_2^{(i)}(z)(Z_t-z)^{i}\Bigr)K_{\hat{h}_{cv},z_tz}.
\]

Let $\tilde{G}_{in}(z,b) = (\tilde{G}_{1,i}(z,b), \tilde{G}_{2,i}(z,b), \ldots, \tilde{G}_{q+2,i}(z,b))^{\top}$, for $i = 1, 2$, with
\[
\tilde{G}_{1,1}(z,b) = \sqrt{nbn^{-1/3}}\,\frac{1}{n}\sum_{t=1}^{n}\Bigl\{X_{1t}X_{1t}^{\top}(\beta_1(Z_t)-\beta_1(z))K_{bn^{-1/3},z_tz} - E\bigl[X_{1t}X_{1t}^{\top}(\beta_1(Z_t)-\beta_1(z))K_{bn^{-1/3},z_tz}\bigr]\Bigr\},
\]
\[
\tilde{G}_{j,1}(z,b) = \sqrt{nbn^{-1/3}}\,\frac{1}{n}\sum_{t=1}^{n}\frac{t}{n}\frac{1}{(j-2)!}\Bigl\{\Bigl(\frac{Z_t-z}{bn^{-1/3}}\Bigr)^{j-2}X_{1t}^{\top}(\beta_1(Z_t)-\beta_1(z))K_{bn^{-1/3},z_tz} - E\Bigl[\Bigl(\frac{Z_t-z}{bn^{-1/3}}\Bigr)^{j-2}X_{1t}^{\top}(\beta_1(Z_t)-\beta_1(z))K_{bn^{-1/3},z_tz}\Bigr]\Bigr\}
\]
for $2 \le j \le q+2$,
\[
\tilde{G}_{1,2}(z,b) = \sqrt{nbn^{-1/3}}\,\frac{1}{n}\sum_{t=1}^{n}\frac{t}{n}\Bigl\{X_{1t}\Bigl(\beta_2(Z_t)-\sum_{i=0}^{q}\frac{1}{i!}\beta_2^{(i)}(z)(Z_t-z)^{i}\Bigr)K_{bn^{-1/3},z_tz} - E\Bigl[X_{1t}\Bigl(\beta_2(Z_t)-\sum_{i=0}^{q}\frac{1}{i!}\beta_2^{(i)}(z)(Z_t-z)^{i}\Bigr)K_{bn^{-1/3},z_tz}\Bigr]\Bigr\},
\]
and


\[
\tilde{G}_{j,2}(z,b) = \sqrt{nbn^{-1/3}}\,\frac{1}{n}\sum_{t=1}^{n}\frac{t^2}{n^2}\frac{1}{(j-2)!}\Bigl\{\Bigl(\frac{Z_t-z}{bn^{-1/3}}\Bigr)^{j-2}\Bigl(\beta_2(Z_t)-\sum_{i=0}^{q}\frac{1}{i!}\beta_2^{(i)}(z)(Z_t-z)^{i}\Bigr)K_{bn^{-1/3},z_tz} - E\Bigl[\Bigl(\frac{Z_t-z}{bn^{-1/3}}\Bigr)^{j-2}\Bigl(\beta_2(Z_t)-\sum_{i=0}^{q}\frac{1}{i!}\beta_2^{(i)}(z)(Z_t-z)^{i}\Bigr)K_{bn^{-1/3},z_tz}\Bigr]\Bigr\}
\]
for $2 \le j \le q+2$.

Using (A.10) in Li and Li (2010) and Davydov's inequality, we have that there exists $D_{j,i} > 0$ such that $E|\tilde{G}_{j,i}(z,b_1) - \tilde{G}_{j,i}(z,b_2)|^2 \le D_{j,i}|b_1-b_2|^2$, for $2 \le j \le q+2$, $i = 1, 2$. For each component of $\tilde{G}_{1,1}(z,b)$ and $\tilde{G}_{1,2}(z,b)$, we also have that there exist $D_{1i}, D_{2i} > 0$ such that $E|\tilde{G}_{1,1}^{(i)}(z,b_1) - \tilde{G}_{1,1}^{(i)}(z,b_2)|^2 \le D_{1i}|b_1-b_2|^2$ and $E|\tilde{G}_{1,2}^{(i)}(z,b_1) - \tilde{G}_{1,2}^{(i)}(z,b_2)|^2 \le D_{2i}|b_1-b_2|^2$. By Lemma A.5, we have $\tilde{G}_{1n}(z,\hat{b}_{cv}) = \tilde{G}_{1n}(z,c_1) + o_p(1)$ and $\tilde{G}_{2n}(z,\hat{b}_{cv}) = \tilde{G}_{2n}(z,c_1) + o_p(1)$, since $\hat{b}_{cv} \stackrel{p}{\to} c_1$. From the proof of Lemma A.4, we have $\tilde{G}_{1n}(z,c_1) = o_p(1)$ and $\tilde{G}_{2n}(z,c_1) \stackrel{d}{\to} N(0, \Sigma_b)$.

Let $\bar{G}_{in}(z,b) = (\bar{G}_{1,i}(z,b), \bar{G}_{2,i}(z,b), \ldots, \bar{G}_{q+2,i}(z,b))^{\top}$, for $i = 1, 2$, where $\bar{G}_{1,1}(z,b) = \sqrt{nbn^{-1/3}}\,\frac{1}{n}\sum_{t=1}^{n}E[X_{1t}X_{1t}^{\top}(\beta_1(Z_t)-\beta_1(z))K_{bn^{-1/3},z_tz}]$, $\bar{G}_{j,1}(z,b) = \sqrt{nbn^{-1/3}}\,\frac{1}{n}\sum_{t=1}^{n}\frac{t}{n}\frac{1}{(j-2)!}E[\bigl(\frac{Z_t-z}{bn^{-1/3}}\bigr)^{j-2}X_{1t}^{\top}(\beta_1(Z_t)-\beta_1(z))K_{bn^{-1/3},z_tz}]$ for $2 \le j \le q+2$, $\bar{G}_{1,2}(z,b) = \sqrt{nbn^{-1/3}}\,\frac{1}{n}\sum_{t=1}^{n}\frac{t}{n}E[X_{1t}\bigl(\beta_2(Z_t)-\sum_{i=0}^{q}\frac{1}{i!}\beta_2^{(i)}(z)(Z_t-z)^{i}\bigr)K_{bn^{-1/3},z_tz}]$, and $\bar{G}_{j,2}(z,b) = \sqrt{nbn^{-1/3}}\,\frac{1}{n}\sum_{t=1}^{n}\frac{t^2}{n^2}\frac{1}{(j-2)!}E[\bigl(\frac{Z_t-z}{bn^{-1/3}}\bigr)^{j-2}\bigl(\beta_2(Z_t)-\sum_{i=0}^{q}\frac{1}{i!}\beta_2^{(i)}(z)(Z_t-z)^{i}\bigr)K_{bn^{-1/3},z_tz}]$ for $2 \le j \le q+2$.

It is easy to see that $\sup_{b\in\mathcal{B}}\|\bar{G}_{1n}(z,b) + \bar{G}_{2n}(z,b) - \sqrt{nbn^{-1/3}}\,B_n(z)\| = o_p(1)$ from (A.17). Hence, $\bar{G}_{1n}(z,\hat{b}_{cv}) + \bar{G}_{2n}(z,\hat{b}_{cv}) - \sqrt{n\hat{h}_{cv}}\,B_n(z) = o_p(1)$.

For $\tilde{L}_{2n}(z)$, we have
\[
\sqrt{n\hat{h}_{cv}}\,\tilde{L}_{2n}(z) = \sqrt{n\hat{h}_{cv}}\,\frac{1}{n}\sum_{t=1}^{n}\tilde{D}_n^{-1}\tilde{X}_tu_tK_{\hat{h}_{cv},z_tz} \stackrel{def}{=} \tilde{V}_n(z),
\]
where $\tilde{V}_n(z) = (\tilde{V}_1(z), \ldots, \tilde{V}_{q+2}(z))^{\top}$, $\tilde{V}_1(z) = \frac{1}{\sqrt{n}}\sum_{t=1}^{n}X_{1t}u_t\sqrt{\hat{h}_{cv}}K_{\hat{h}_{cv},z_tz}$, and $\tilde{V}_j(z) = \frac{1}{\sqrt{n}}\sum_{t=1}^{n}\frac{t}{n}\frac{1}{(j-2)!}\bigl(\frac{Z_t-z}{\hat{h}_{cv}}\bigr)^{j-2}u_t\sqrt{\hat{h}_{cv}}K_{\hat{h}_{cv},z_tz}$ for $j = 2, \ldots, q+2$. Let $\tilde{V}_n(z,b) = (\tilde{V}_1(z,b), \ldots, \tilde{V}_{q+2}(z,b))^{\top}$, where $\tilde{V}_1(z,b) = \frac{1}{\sqrt{n}}\sum_{t=1}^{n}X_{1t}u_t\sqrt{bn^{-1/3}}K_{bn^{-1/3},z_tz}$ and $\tilde{V}_j(z,b) = \frac{1}{\sqrt{n}}\sum_{t=1}^{n}\frac{t}{n}\frac{1}{(j-2)!}\bigl(\frac{Z_t-z}{bn^{-1/3}}\bigr)^{j-2}u_t\sqrt{bn^{-1/3}}K_{bn^{-1/3},z_tz}$ for $j = 2, \ldots, q+2$.

Using (A.10) in Li and Li (2010) and Davydov's inequality, we know that there exists $M_j > 0$ such that $E|\tilde{V}_j(z,b_1) - \tilde{V}_j(z,b_2)|^2 \le M_j|b_1-b_2|^2$, for $2 \le j \le q+2$. For each component of $\tilde{V}_1(z,b)$, we also know that there exist $M_{1i} > 0$ such that $E|\tilde{V}_1^{(i)}(z,b_1) - \tilde{V}_1^{(i)}(z,b_2)|^2 \le M_{1i}|b_1-b_2|^2$. By Lemma A.5, we have $\tilde{V}_j(z,\hat{b}_{cv}) = \tilde{V}_j(z,c_1) + o_p(1)$. Therefore, we have that
\[
\sqrt{n\hat{h}_{cv}}\,\bigl[\tilde{D}_n(\tilde{B}(z) - B(z)) - \tilde{S}_n(z)^{-1}B_n(z)\bigr] = \tilde{S}_n(z)^{-1}\bigl[\tilde{G}_{1n}(z,\hat{b}_{cv}) + \tilde{G}_{2n}(z,\hat{b}_{cv}) + \tilde{V}_n(z,\hat{b}_{cv}) + \bigl(\bar{G}_{1n}(z,\hat{b}_{cv}) + \bar{G}_{2n}(z,\hat{b}_{cv}) - \sqrt{n\hat{h}_{cv}}\,B_n(z)\bigr)\bigr]
\]
\[
= \tilde{S}_n(z)^{-1}\bigl[\tilde{G}_{1n}(z,c_1) + \tilde{G}_{2n}(z,c_1) + \tilde{V}_n(z,c_1) + o_p(1)\bigr] = \tilde{S}_n(z)^{-1}\bigl[\tilde{G}_{2n}(z,c_1) + \tilde{V}_n(z,c_1) + o_p(1)\bigr] \stackrel{d}{\to} N\bigl(0,\ S(z)^{-1}(\Sigma_a + c^2\Sigma_b)S(z)^{-1}\bigr),
\]
by Slutsky's lemma and Theorem 2.2. Note that here $h = c_1n^{-1/3}$, so that $nh^3 \to c_1^3$. This completes the proof of Theorem 2.5. $\Box$

Proof of Theorem 2.6. We write $A_n = B_n + (s.o.)$ to mean that $B_n$ is the leading term of $A_n$; $(s.o.)$ denotes terms that are of smaller (probability) order than $B_n$. Also, $A_t = B_t + (s.o.)$ means that $\sum_t A_t = \sum_t B_t + (s.o.)$. To shorten the proof we will assume that $(X_{1t}, Z_t, u_t)$ is an i.i.d. process; this simplifies the proof tremendously.

Let $\beta_{1s}$ denote $\beta_1(Z_s)$. Substituting $Y_s = X_{1s}^{\top}\beta_{1s} + s\beta_{20} + u_s$ into $A_{2t}$ leads to
\[
A_{2t} = A_{1t}\beta_{20} + A_{3t} + A_{4t}, \qquad \text{(A.22)}
\]
where $A_{3t} = \bigl[\sum_s X_{1s}X_{1s}^{\top}K_{h,ts}\bigr]^{-1}\sum_s X_{1s}X_{1s}^{\top}\beta_{1s}K_{h,ts}$ and $A_{4t} = \bigl[\sum_s X_{1s}X_{1s}^{\top}K_{h,ts}\bigr]^{-1}\sum_s X_{1s}u_sK_{h,ts}$.

Combining (2.5) and (A.22) we obtain
\[
Y_t - X_{1t}^{\top}A_{2t} = (t - X_{1t}^{\top}A_{1t})\beta_{20} + X_{1t}^{\top}\beta_{1t} - X_{1t}^{\top}A_{3t} + u_t - X_{1t}^{\top}A_{4t}.
\]
Substituting the above result into (2.8) gives
\[
\hat{\beta}_{20} - \beta_{20} = \Bigl[n^{-3}\sum_t(t - X_{1t}^{\top}A_{1t})^2\Bigr]^{-1}\,n^{-3}\sum_t(t - X_{1t}^{\top}A_{1t})\bigl[u_t + (X_{1t}^{\top}\beta_{1t} - X_{1t}^{\top}A_{3t}) - X_{1t}^{\top}A_{4t}\bigr]. \qquad \text{(A.23)}
\]
Let $B_{jn}$ ($j = 1, \ldots, 4$) be defined as in Lemma A.6 below. By Lemmas A.6 and A.7, we obtain that
\[
n^{3/2}(\hat{\beta}_{20} - \beta_{20}) = B_{1n}^{-1}[B_{2n} + B_{3n} - B_{4n}] = M_1^{-1}n^{-1/2}\sum_t\bigl[(t/n - X_{1t}^{\top}c_{1t}) + D_t\bigr]u_t + o_p(1) \stackrel{d}{\to} N(0, \sigma_u^2V).
\]
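As a side illustration of the estimator analyzed in this proof, the two-step profile procedure (a local constant first stage producing $A_{1t}$ and $A_{2t}$, then the least squares step (2.8) regressing $Y_t - X_{1t}^{\top}A_{2t}$ on $t - X_{1t}^{\top}A_{1t}$) can be sketched numerically. This is a minimal simulation sketch, not the authors' code; the data-generating functions, Gaussian kernel, and bandwidth choice are our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
h = 1.5 * n ** (-1 / 5)          # illustrative bandwidth choice

# Simulated partially linear trend model: Y_t = X1t' b1(Z_t) + t*b20 + u_t
Z = rng.uniform(0, 1, n)
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
t = np.arange(1, n + 1)
beta1 = np.column_stack([np.sin(2 * np.pi * Z), Z ** 2])
beta20 = 0.5
Y = (X1 * beta1).sum(axis=1) + t * beta20 + rng.normal(size=n)

def K(v):                         # Gaussian kernel
    return np.exp(-0.5 * v ** 2) / np.sqrt(2 * np.pi)

# First stage: local constant fits A1t (regress t on X1) and A2t (regress Y on X1)
A1 = np.empty((n, X1.shape[1]))
A2 = np.empty((n, X1.shape[1]))
for i in range(n):
    w = K((Z - Z[i]) / h) / h
    S = (X1 * w[:, None]).T @ X1
    A1[i] = np.linalg.solve(S, (X1 * (w * t)[:, None]).sum(axis=0))
    A2[i] = np.linalg.solve(S, (X1 * (w * Y)[:, None]).sum(axis=0))

# Second stage: OLS of Y_t - X1t'A2t on t - X1t'A1t, as in (2.8)/(A.23)
resid_t = t - (X1 * A1).sum(axis=1)
resid_Y = Y - (X1 * A2).sum(axis=1)
beta20_hat = (resid_t * resid_Y).sum() / (resid_t ** 2).sum()
print(beta20_hat)
```

With the trend coefficient held constant at 0.5, the second-stage estimate should settle near that value, reflecting the fast $n^{3/2}$ rate indicated by Theorem 2.6.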


The two equalities above follow from Lemma A.6, and the last convergence result follows from Lemma A.7 and the Lyapunov central limit theorem, with $V = M_1^{-1}(M_1 + M_2)M_1^{-1}$. $\Box$

Lemma A.6. Under the same conditions as given in Theorem 2.6, we have
(i) $B_{1n} \stackrel{def}{=} n^{-3}\sum_t(t - X_{1t}^{\top}A_{1t})^2 = M_1 + o_p(1)$, where $M_1$ is defined in Theorem 2.6;
(ii) $B_{2n} \stackrel{def}{=} n^{-3/2}\sum_t(t - X_{1t}^{\top}A_{1t})u_t = n^{-1/2}\sum_t(t/n - X_{1t}^{\top}c_{1t})u_t + (s.o.)$;
(iii) $B_{3n} \stackrel{def}{=} n^{-3/2}\sum_t(t - X_{1t}^{\top}A_{1t})X_{1t}^{\top}(\beta_{1t} - A_{3t}) = o_p(1)$;
(iv) $B_{4n} \stackrel{def}{=} n^{-3/2}\sum_t(t - X_{1t}^{\top}A_{1t})X_{1t}^{\top}A_{4t} = n^{-1/2}\sum_t D_tu_t + o_p(1)$.

The proof of Lemma A.6 is given in the supplementary Appendix C.

Lemma A.7. Define $A_n = n^{-1/2}\sum_t[(t/n - X_{1t}^{\top}c_{1t}) + D_t]u_t$. Then $E(A_n^2) = \sigma_u^2(M_1 + M_2) + o(1)$, where $M_1$ and $M_2$ are defined in Theorem 2.6.

The proof of Lemma A.7 is given in the supplementary Appendix C.

Appendix B

Proof of Theorem 3.1. We will only provide a proof for the case that $(X_t, Z_t, u_t)$ is an i.i.d. process, to save space. The mixing case can be proved by following similar arguments as in Fan and Li (1999). Under $H_0^a$, $\hat{u}_t = Y_t - X_t^{\top}\hat{\beta}_0 = u_t + X_t^{\top}(\beta_0 - \hat{\beta}_0) = u_t + X_{1t}^{\top}(\beta_{10} - \hat{\beta}_{10}) + t(\beta_{20} - \hat{\beta}_{20})$, and it is well known that $\hat{\beta}_{10} - \beta_{10} = O_p(n^{-1/2})$ and $\hat{\beta}_{20} - \beta_{20} = O_p(n^{-3/2})$. We have, under $H_0^a$,
\[
I_n^a = I_{1n} - 2(\hat{\beta}_0 - \beta_0)^{\top}I_{2n} + (\hat{\beta}_0 - \beta_0)^{\top}I_{3n}(\hat{\beta}_0 - \beta_0), \qquad \text{(B.1)}
\]
where $I_{1n} = 2n^{-4}\sum_t\sum_{s>t}X_t^{\top}X_su_tu_sK_{h,ts}$, $I_{2n} = n^{-4}\sum_t\sum_{s\ne t}(X_t^{\top}X_s)X_tu_sK_{h,ts}$ and $I_{3n} = n^{-4}\sum_t\sum_{s\ne t}X_t(X_t^{\top}X_s)X_s^{\top}K_{h,ts}$.

We first consider $I_{1n}$. $E(I_{1n}) = 0$, and it is easy to show that $\mathrm{Var}(I_{1n}) = (n^2h)^{-1}V_0 + (s.o.)$, where $V_0 = (2/9)\nu_0E\{[E(\sigma^2(X_t, Z_t)|Z_t)]^2f(Z_t)\}$. Also, it is straightforward to show that $\hat{V}_0 = 2hn^{-6}\sum_t\sum_{s\ne t}(X_t^{\top}X_s)^2\hat{u}_t^2\hat{u}_s^2K_{h,ts}^2$ is a consistent estimator of $V_0$. Hence, define $T_{1n} = n\sqrt{h}\,I_{1n}/\sqrt{\hat{V}_0}$. We will show that $T_{1n}$ has an asymptotic standard normal distribution by using de Jong's (1987) central limit theorem.

Define $U_n = \sum_t\sum_{s>t}U_{n,ts}$, where $U_{n,ts} = [2/(n(n-1))]H_{n,ts}$ and $H_{n,ts} = n^{-2}X_t^{\top}X_su_tu_sK_{h,ts}$. We apply de Jong's (1987) CLT for generalized quadratic forms to derive the asymptotic distribution of $U_n$. By Proposition 3.2 of de Jong (1987) we know that $U_n/S_n \to N(0,1)$ in distribution if $G_I$, $G_{II}$ and $G_{IV}$ are all $o(S_n^4)$, where $S_n^2 = E[U_n^2]$, $G_I = \sum_t\sum_{s>t}E[U_{n,ts}^4]$, $G_{II} = \sum_{l>s>t}[E(U_{n,ts}^2U_{n,tl}^2) + E(U_{n,st}^2U_{n,sl}^2) + E(U_{n,lt}^2U_{n,ls}^2)]$, and $G_{IV} = \sum_t\sum_{s>t}\sum_j\sum_{l>j}E(U_{n,tj}U_{n,sl}U_{n,tl}U_{n,sj})$. We will use the notation $A_n = O_e(a_n)$ to denote an exact order of $a_n$; i.e., it means that $A_n = O(a_n)$ but $A_n \ne o(a_n)$. Then straightforward calculations show that $S_n^2 = O_e(n^{-2}h^{-1})$, $G_I = O(n^{-6}h^{-3})$, $G_{II} = O(n^{-5}h^{-2})$ and $G_{IV} = O(n^{-4}h^{-1})$. By noting that $S_n^4 = O_e(n^{-4}h^{-2})$, we have $G_I/S_n^4 = O((n^2h)^{-1})$, $G_{II}/S_n^4 = O(n^{-1})$, and $G_{IV}/S_n^4 = O(h)$. Therefore, by de Jong's central limit theorem for degenerate U-statistics, we know that $U_n/S_n \stackrel{d}{\to} N(0,1)$. Finally, it is easy to check that $n\sqrt{h}\,I_{1n}/\sqrt{\hat{V}_0} = U_n/S_n + o_p(1)$. Hence, we have, under $H_0^a$,
\[
n\sqrt{h}\,I_{1n}/\sqrt{\hat{V}_0} \stackrel{d}{\to} N(0,1). \qquad \text{(B.2)}
\]

Next, we consider $I_{2n}$. Write $I_{2n}^{\top} = (I_{2n,1}^{\top}, I_{2n,2})$, where $I_{2n,1} = n^{-4}\sum_t\sum_{s\ne t}(X_t^{\top}X_s)X_{1t}u_sK_{h,ts}$ and $I_{2n,2} = n^{-4}\sum_t\sum_{s\ne t}(X_t^{\top}X_s)tu_sK_{h,ts}$. Straightforward calculations show that $E[\|I_{2n,1}\|^2] = O(n^{-1})$ and $E[I_{2n,2}^2] = O(n)$. Hence, $I_{2n,1} = O_p(n^{-1/2})$ and $I_{2n,2} = O_p(n^{1/2})$. Therefore,
\[
I_{2n}^{\top}(\hat{\beta}_0 - \beta_0) = I_{2n,1}^{\top}(\hat{\beta}_{0,1} - \beta_{0,1}) + I_{2n,2}(\hat{\beta}_{0,2} - \beta_{0,2}) = O_p(n^{-1/2})O_p(n^{-1/2}) + O_p(n^{1/2})O_p(n^{-3/2}) = O_p(n^{-1}). \qquad \text{(B.3)}
\]

Finally, note that $I_{3n}$ is a $d \times d$ matrix; we partition it into $I_{3n} = \begin{pmatrix} I_{3n,11} & I_{3n,12} \\ I_{3n,21} & I_{3n,22} \end{pmatrix}$, where $I_{3n,11}$ is of dimension $(d-1)\times(d-1)$, $I_{3n,12}$ is $(d-1)\times 1$, $I_{3n,21}$ is $1\times(d-1)$ and $I_{3n,22}$ is a scalar. Then it is easy to show that $E[\|I_{3n,11}\|] = O(1)$, $E[\|I_{3n,12}\|] = O(n)$ and $E[I_{3n,22}] = O(n^2)$. Hence, $I_{3n,11} = O_p(1)$, $I_{3n,12} = O_p(n)$ and $I_{3n,22} = O_p(n^2)$. Therefore, we have
\[
(\hat{\beta}_0 - \beta_0)^{\top}I_{3n}(\hat{\beta}_0 - \beta_0) = \bigl(O_p(n^{-1/2}),\ O_p(n^{-3/2})\bigr)\begin{pmatrix} O_p(1) & O_p(n) \\ O_p(n) & O_p(n^2) \end{pmatrix}\begin{pmatrix} O_p(n^{-1/2}) \\ O_p(n^{-3/2}) \end{pmatrix} = O_p(n^{-1}). \qquad \text{(B.4)}
\]

Combining (B.2)–(B.4) and the fact that $\hat{V}_0 = V_0 + o_p(1)$, we have shown that
\[
T_n^a = n\sqrt{h}\,I_n^a/\sqrt{\hat{V}_0} = n\sqrt{h}\,I_{1n}/\sqrt{V_0} + o_p(1) \stackrel{d}{\to} N(0,1)
\]
under $H_0^a$. $\Box$

Proof of Theorem 3.2. We consider the case that $X_{1t} = 1$ to simplify the proof; the result remains the same if $X_{1t}$ is a vector of stationary variables. Write $\beta(Z_t) = \bar{\beta} + \beta(Z_t) - \bar{\beta} \equiv \bar{\beta} + \eta_t$, where $\bar{\beta} = E[\beta(Z_t)]$ and $\eta_t = \beta(Z_t) - \bar{\beta}$ has zero mean and finite variance. We have $\hat{\beta}_0 = [\sum_t X_tX_t^{\top}]^{-1}\sum_t X_tY_t = \bar{\beta} + \beta_\eta + [\sum_t X_tX_t^{\top}]^{-1}\sum_t X_tu_t$, where $\beta_\eta = [\sum_t X_tX_t^{\top}]^{-1}\sum_t X_tX_t^{\top}\eta_t$. It is well known that $[\sum_t X_tX_t^{\top}]^{-1}\sum_t X_tu_t = (O_p(n^{-1/2}), O_p(n^{-3/2}))^{\top}$. Hence, we know that
\[
\beta(Z_t) - \hat{\beta}_0 = \beta(Z_t) - \bar{\beta} - \beta_\eta + O_p(n^{-1/2}) = \eta_t - \beta_\eta + O_p(n^{-1/2}). \qquad \text{(B.5)}
\]
Below we analyze the order of $\beta_\eta$:
\[
\beta_\eta = \Bigl[\sum_t\begin{pmatrix} 1 & t \\ t & t^2 \end{pmatrix}\Bigr]^{-1}\sum_t\begin{pmatrix} \eta_{1t} + t\eta_{2t} \\ t\eta_{1t} + t^2\eta_{2t} \end{pmatrix} = \frac{12}{n^3}\begin{pmatrix} n^2/3 & -n/2 \\ -n/2 & 1 \end{pmatrix}\sum_t\begin{pmatrix} \eta_{1t} + t\eta_{2t} \\ t\eta_{1t} + t^2\eta_{2t} \end{pmatrix} = \frac{12}{n^3}\sum_t\begin{pmatrix} (1/6)nt(2n-3t)\eta_{2t} \\ (1/2)t(2t-n)\eta_{2t} \end{pmatrix} + (s.o.) = \begin{pmatrix} O_e(n^{1/2}) \\ O_p(n^{-1/2}) \end{pmatrix}, \qquad \text{(B.6)}
\]
where the last equality follows because $\eta_t$ has zero mean. Summarizing the above, we have shown that if $P[\beta_2(Z_t) = \beta_{02}] < 1$, then
\[
\begin{pmatrix} \hat{\beta}_{01} - \beta_1(Z_t) \\ \hat{\beta}_{02} - \beta_2(Z_t) \end{pmatrix} = \begin{pmatrix} O_e(n^{1/2}) + O_p(n^{-1/2}) \\ \eta_{2t} + o_p(1) \end{pmatrix} = \begin{pmatrix} O_e(n^{1/2}) + o_p(n^{1/2}) \\ \eta_{2t} + o_p(1) \end{pmatrix}. \qquad \text{(B.7)}
\]
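The divergence orders in (B.6)–(B.7) are easy to see in simulation: when $\beta_2(\cdot)$ truly varies with $Z_t$, the least squares intercept of the misspecified constant-coefficient model drifts at the $\sqrt{n}$ rate, while the fitted trend coefficient settles near the mean $E[\beta_2(Z_t)]$. The following is our own hedged toy sketch, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
for n in [200, 800, 3200]:
    Z = rng.uniform(0, 1, n)
    t = np.arange(1, n + 1)
    X = np.column_stack([np.ones(n), t])     # misspecified: constant coefficients
    # True model: Y_t = 1 + t * beta2(Z_t) + u_t with beta2(z) = z, E[beta2] = 0.5
    Y = 1.0 + t * Z + rng.normal(size=n)
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    # b[0] (intercept) is typically O(n^{1/2}) away from 1; b[1] -> 0.5
    print(n, round(b[0], 2), round(b[1], 4))
```

The printed intercepts wander increasingly far from the true value 1 as $n$ grows, while the trend slope concentrates around 0.5, mirroring the two components of (B.6).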

Case (i). P [β2 (Zt ) = β02 ] < 1. Define β¯ = E [β(Zt )] and ηt = β(Zt ) − β¯ . The test statistic is


\[
I_n^a = \frac{1}{n^4}\sum_t\sum_{s\ne t}X_t^{\top}X_su_tu_sK_{h,ts} - \frac{2}{n^4}\sum_t\sum_{s\ne t}(X_t^{\top}X_s)X_t^{\top}(\hat{\beta}_0 - \beta(Z_t))u_sK_{h,ts} + \frac{1}{n^4}\sum_t\sum_{s\ne t}(X_t^{\top}X_s)(\hat{\beta}_0 - \beta(Z_t))^{\top}X_tX_s^{\top}(\hat{\beta}_0 - \beta_s)K_{h,ts} = I_{1n} + G_{2n} + G_{3n},
\]
where the definitions of $G_{2n}$ and $G_{3n}$ should be apparent, and $I_{1n}$ is the same as defined in (B.1). Hence, we know that $I_{1n} = O_p((nh^{1/2})^{-1}) = o_p(1)$.

Next, we consider $G_{3n}$. It is easy to show that the leading term of $G_{3n}$ is obtained by replacing $X_t^{\top}X_s = X_{1t}^{\top}X_{1s} + ts$ by $ts$ and replacing $(\hat{\beta}_0 - \beta(Z_t))^{\top}X_t$ by $\eta_{2t}t$; we use $G_{3n,0}$ to denote it. Then
\[
E[G_{3n,0}] = \frac{1}{n^4}\sum_t\sum_{s\ne t}t^2s^2E[\eta_{2t}\eta_{2s}K_{h,ts}] + (s.o.) = \frac{n^2}{9}E[f(Z_t)\eta_2^2(Z_t)] + (s.o.) = O_e(n^2).
\]
Similarly, one can show that the leading term of $G_{2n}$ is $G_{2n,0} = 2n^{-4}\sum_t\sum_{s\ne t}t^2s\,\eta_{2t}u_sK_{h,ts} = O_e(n^{1/2})$. Hence, the leading term of $I_n^a$ is given by $G_{3n}$. By noting that $\hat{V}_0 = O_p(1)$, we have that
\[
T_n^a = n\sqrt{h}\,I_n^a/\sqrt{\hat{V}_0} = n\sqrt{h}\,O_e(n^2) \to \infty \quad \text{at the rate of } n^3\sqrt{h}.
\]

Case (ii). $P[\beta_2(Z_t) = \beta_{02}] = 1$ and $P[\beta_1(Z_t) = \beta_{01}] < 1$. In this case $\eta_{2t} = \beta_2(Z_t) - \beta_{02} = 0$ and (B.6) becomes
\[
\beta_\eta = \frac{12}{n^3}\sum_t\begin{pmatrix} (n/6)(2n-3t)\eta_{1t} \\ (1/2)(2t-n)\eta_{1t} \end{pmatrix} = \begin{pmatrix} O_p(n^{-1/2}) \\ O_p(n^{-3/2}) \end{pmatrix}. \qquad \text{(B.8)}
\]
Hence, $\hat{\beta}_{10} - \beta_1(Z_t) = \eta_{1t} + O_p(n^{-1/2})$ and $\hat{\beta}_{20} - \beta_2(Z_t) = O_p(n^{-3/2})$ (since $\eta_{2t} = 0$). We have
\[
E[G_{3n,0}] = \frac{1}{n^4}\sum_t\sum_{s\ne t}(ts)E[\eta_{1t}X_{1t}^{\top}X_{1s}\eta_{1s}K_{h,ts}] = O_e(1).
\]
Similarly, one can show that $G_{2n} = o_p(1)$ and that $\hat{V}_0 = O_p(1)$. Hence, we have shown that
\[
T_n^a = n\sqrt{h}\,I_n^a/\sqrt{\hat{V}_0} = n\sqrt{h}\,O_e(1) \to \infty \quad \text{at the rate of } n\sqrt{h}. \quad \Box
\]

Proof of Theorem 3.4. (i) In Lemma B.1 we show that, under $H_0^b$,
\[
I_n^b = \frac{1}{n(n-1)}\sum_t\sum_{s\ne t}u_tu_s\bigl[(1 + G_{2,ts})K_{h,ts} + G_{1,ts}\bar{K}_{h,ts}\bigr] + (s.o.),
\]
where $G_{1,ts}$ and $G_{2,ts}$ are defined in Lemma B.1. Then by the Fan and Li (1999) central limit theorem for degenerate U-statistics (with $\beta$-mixing data), we have that, under $H_0^b$,
\[
n\sqrt{h}\,I_n^b \stackrel{d}{\to} N(0, V_b), \quad \text{where } V_b = \lim_{n\to\infty}\frac{2h}{n^2}\sum_t\sum_{s\ne t}E\bigl\{u_t^2u_s^2\bigl[(1 + G_{2,ts})K_{h,ts} + G_{1,ts}\bar{K}_{h,ts}\bigr]^2\bigr\}.
\]
It is easy to show that $V_b$ is a positive constant. A consistent estimator of $V_b$ is given by $\hat{V}_b = \frac{2h}{n^2}\sum_t\sum_{s\ne t}\hat{u}_t^2\hat{u}_s^2[(1 + \hat{G}_{2,ts})K_{h,ts} + \hat{G}_{1,ts}\bar{K}_{h,ts}]^2$, where $\hat{G}_{j,ts}$ ($j = 1, 2$) is obtained from $G_{j,ts}$ with $\eta_{1t}$ and $\eta_{2t}$ replaced by $\hat{\eta}_{1t} = n^{-1}\sum_j X_{1j}K_{h,tj}$ and $\hat{\eta}_{2t} = n^{-1}\sum_j X_{1j}X_{1j}^{\top}K_{h,tj}$.

(ii) Under $H_1^b$, let $\bar{\beta}_{20}$ denote the probability limit of $\hat{\beta}_{20}$. Then $\hat{u}_t = Y_t - X_{1t}^{\top}\hat{\beta}_{1t} - t\hat{\beta}_{20} = u_t + X_{1t}^{\top}(\beta_{1t} - \hat{\beta}_{1t}) + t(\beta_{2t} - \hat{\beta}_{20}) \sim t(\beta_{2t} - \bar{\beta}_{20})$. Hence, the leading term of $I_n^b$ is
\[
n^{-2}\sum_t\sum_{s\ne t}ts(\beta_{2t} - \bar{\beta}_{20})(\beta_{2s} - \bar{\beta}_{20})K_{h,ts} \sim n^{-2}\sum_t\sum_{s\ne t}tsE[(\beta_{2t} - \bar{\beta}_{20})(\beta_{2s} - \bar{\beta}_{20})K_{h,ts}] \sim n^{-2}c_0\sum_t\sum_{s\ne t}ts = (c_0/4)n^2 + o(n^2),
\]
where $c_0 = E[f(Z_t)(\beta_{2t} - \bar{\beta}_{20})^2] > 0$ is a positive constant.

We write $\hat{T}_n^b = n\sqrt{h}\,I_n^b/\sqrt{\hat{V}_b}$ and $\tilde{T}_n^b = n\sqrt{h}\,I_n^b/\sqrt{\tilde{V}_b}$ to distinguish the test statistics using different variance estimators. Again, since under $H_1^b$, $\hat{u}_t \sim t(\beta_{2t} - \bar{\beta}_{20})$, we have $\hat{V}_b \sim (h/n^2)\sum_t\sum_{s\ne t}t^2s^2(\beta_{2t} - \bar{\beta}_{20})^2(\beta_{2s} - \bar{\beta}_{20})^2K_{h,ts}^2 \sim (h/n^2)\sum_t\sum_{s\ne t}t^2s^2E[(\beta_{2t} - \bar{\beta}_{20})^2(\beta_{2s} - \bar{\beta}_{20})^2K_{h,ts}^2] \sim hn^{-2}n^6O(h^{-1}) = O(n^4)$. Hence, $\hat{T}_n^b \sim nh^{1/2}O_e(n^2)/\sqrt{O_e(n^4)} = O_e(nh^{1/2})$, which diverges to $+\infty$ at the rate of $nh^{1/2}$. $\Box$

Lemma B.1.
\[
I_n^b = \frac{1}{n(n-1)}\sum_t\sum_{s\ne t}u_tu_s\bigl[(1 + G_{2,ts})K_{h,ts} + G_{1,ts}\bar{K}_{h,ts}\bigr] + (s.o.), \qquad \text{(B.9)}
\]
where $G_{1,ts} = (1/2)X_{1t}^{\top}[\eta_{2t}^{-1}\eta_{1t}\eta_{1t}^{\top}\eta_{2t}^{-1} + \eta_{2s}^{-1}\eta_{1s}\eta_{1s}^{\top}\eta_{2s}^{-1}]X_{1s}$ and $G_{2,ts} = \eta_{1t}^{\top}\eta_{2t}^{-1}X_{1s} + \eta_{1s}^{\top}\eta_{2s}^{-1}X_{1t}$.

The proof of Lemma B.1 is given in the supplementary Appendix C.

Proof of Theorem 3.5. From Eq. (B.1), we have that, under $H_0^a$,
\[
\tilde{I}_n^a = \tilde{I}_{1n} - 2(\hat{\beta}_0 - \beta_0)^{\top}\tilde{I}_{2n} + (\hat{\beta}_0 - \beta_0)^{\top}\tilde{I}_{3n}(\hat{\beta}_0 - \beta_0), \qquad \text{(B.10)}
\]
where $\tilde{I}_{1n} = \frac{2}{n^4}\sum_t\sum_{s>t}X_t^{\top}X_su_tu_sK_{\hat{h}_{cv},ts}$, $\tilde{I}_{2n} = \frac{1}{n^4}\sum_t\sum_{s\ne t}(X_t^{\top}X_s)X_tu_sK_{\hat{h}_{cv},ts}$ and $\tilde{I}_{3n} = \frac{1}{n^4}\sum_t\sum_{s\ne t}X_t(X_t^{\top}X_s)X_s^{\top}K_{\hat{h}_{cv},ts}$. From Theorem 2.4 we know that $\hat{h}_{cv} = b_0n^{-\delta} + (s.o.)$ and $\hat{b}_{cv} = n^{\delta}\hat{h}_{cv}$, where $0 < \delta < 1$. Then $\hat{b}_{cv} \stackrel{p}{\to} b_0$. Let $\mathcal{B} = [\underline{b}, \bar{b}]$, where $0 < \underline{b} < b_0 < \bar{b} < \infty$.

Denote $\tilde{I}_{1n}(b) = n\sqrt{bn^{-\delta}}\,\frac{2}{n^4}\sum_t\sum_{s>t}X_t^{\top}X_su_tu_sK_{bn^{-\delta},ts}$, $\tilde{I}_{2n}(b) = n\sqrt{bn^{-\delta}}\,\frac{1}{n^4}\sum_t\sum_{s\ne t}(X_t^{\top}X_s)X_tu_sK_{bn^{-\delta},ts}$ and $\tilde{I}_{3n}(b) = n\sqrt{bn^{-\delta}}\,\frac{1}{n^4}\sum_t\sum_{s\ne t}X_t(X_t^{\top}X_s)X_s^{\top}K_{bn^{-\delta},ts}$. Then $n\sqrt{\hat{h}_{cv}}\,\tilde{I}_{1n} = \tilde{I}_{1n}(\hat{b}_{cv})$, $n\sqrt{\hat{h}_{cv}}\,\tilde{I}_{2n} = \tilde{I}_{2n}(\hat{b}_{cv})$ and $n\sqrt{\hat{h}_{cv}}\,\tilde{I}_{3n} = \tilde{I}_{3n}(\hat{b}_{cv})$. Using similar arguments as those used in deriving (A.12) in Li and Li (2010), one can show that for any $b_1, b_2 \in \mathcal{B}$, there exists $M > 0$ such that $E|\tilde{I}_{1n}(b_1) - \tilde{I}_{1n}(b_2)|^2 \le M|b_1 - b_2|^2$. By Lemma A.5, we have $\tilde{I}_{1n}(\hat{b}_{cv}) = \tilde{I}_{1n}(b_0) + o_p(1)$.

Next, we consider $\tilde{I}_{2n}(b)$. Write $\tilde{I}_{2n}(b)^{\top} = (\tilde{I}_{2n,1}(b)^{\top}, \tilde{I}_{2n,2}(b))$, where $\tilde{I}_{2n,1}(b) = n\sqrt{bn^{-\delta}}\,n^{-4}\sum_t\sum_{s\ne t}(X_t^{\top}X_s)X_{1t}u_sK_{bn^{-\delta},ts}$ and $\tilde{I}_{2n,2}(b) = n\sqrt{bn^{-\delta}}\,n^{-4}\sum_t\sum_{s\ne t}(X_t^{\top}X_s)tu_sK_{bn^{-\delta},ts}$. By direct calculations, we have that $\sup_{b\in\mathcal{B}}\|\tilde{I}_{2n,1}(b)\| = O_p(n^{(1-\delta)/2})$ and $\sup_{b\in\mathcal{B}}|\tilde{I}_{2n,2}(b)| = O_p(n^{(3-\delta)/2})$. Thus,
\[
\tilde{I}_{2n}(b)^{\top}(\hat{\beta}_0 - \beta_0) = \tilde{I}_{2n,1}(b)^{\top}(\hat{\beta}_{0,1} - \beta_{0,1}) + \tilde{I}_{2n,2}(b)(\hat{\beta}_{0,2} - \beta_{0,2}) = O_p(n^{(1-\delta)/2})O_p(n^{-1/2}) + O_p(n^{(3-\delta)/2})O_p(n^{-3/2}) = O_p(n^{-\delta/2}) = o_p(1).
\]


For ˜I3n(b), it is a d × d matrix, and we partition it into

˜I3n (b) =

˜I3n,11 (b) ˜I3n,21 (b)

˜I3n,12 (b) ˜I3n,22 (b) . By directly calculating the moment,

we have that supb∈B ∥˜I3n,11 (b)∥ = Op (n1−δ/2 ), supb∈B ∥˜I3n,12 (b)∥ =

Op (n2−δ/2 ) and supb∈B ∥˜I3n,22 (b)∥ = Op (n3−δ/2 ). Therefore, we have

  (βˆ 0 − β0 )⊤ ˜I3n (b)(βˆ 0 − β0 ) = Op (n−1/2 ), Op (n−3/2 )   Op (n1−δ/2 ), Op (n2−δ/2 ) × Op (n2−δ/2 ), Op (n3−δ/2 )   Op (n−1/2 ) × Op (n−3/2 ) = Op (n−1 ) = op (1). n n 2 Let V˜ 0a (b) = (4bn−δ n−6 ) t =1 s>t [(Xt⊤ Xs )2 uˆ 2t uˆ 2s Kbn −δ ,ts −  n n a ⊤ 2 2 2 2 −δ − 6 ¯ E [(Xt Xs ) uˆ t uˆ s Kbn−δ ,ts ]] and V0 (b) = (4bn n ) t =1 s>t 2 E [(Xt⊤ Xs )2 uˆ 2t uˆ 2s Kbn −δ ,ts ].

Similarly, using (A.12) in Li and Li (2010), we can show that there exist M1 , M2 > 0 such that E |V˜ 0a (b1 ) − V˜ 0a (b2 )|2 ≤ M1 |b1 − b2 |2 and |V¯ 0a (b1 ) − V¯ 0a (b2 )| ≤ M2 |b1 − b2 |. p

By Lemma A.5 and bˆ c v → b0 , we have that V˜ 0a (bˆ c v ) = V˜ 0a (b0 ) +

op (1) and V¯ 0a (bˆ c v ) = V¯ 0a (b0 ) + op (1). Also, it is easy to see that   (4b0 n−δ n−6 ) nt=1 ns>t (Xt⊤ Xs )2 uˆ 2t uˆ 2s Kb2 n−δ ,ts = V0 + op (1). Thus, 0

˜ = ˜ (bˆ c v ) + ¯ (bˆ c v ) = ˜ (b0 ) + ¯ (b0 ) + op (1) n  n  = (4b0 n−δ n−6 ) (Xt⊤ Xs )2 uˆ 2t uˆ 2s Kb2 n−δ ,ts + op (1)

V0a

V0a

V0a

V0a

V0a

t =1 s>t

0

= V0 + op (1). p

This means V˜ 0a → V0 . Therefore, we have that under H0a





T˜na = n hˆ c v ˜Ina / V˜ 0

  d = n b0 n−δ ˜I1n (b0 )/ V˜ 0 + op (1) → N (0, 1), by Slutsky’s Lemma and Theorem 3.1.
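To make the construction above concrete, the studentized statistic $T_n^a = n\sqrt{h}\,I_{1n}/\sqrt{\hat{V}_0}$ of Theorem 3.1 can be computed directly from the residuals of the null linear-trend model. The following is a hedged numerical sketch, not the authors' code; the simulated design, Gaussian kernel, and the ad hoc bandwidth $h = n^{-1/5}$ are our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
h = n ** (-1 / 5)

# Null model H0a: Y_t = X1t' b10 + t*b20 + u_t (constant coefficients)
Z = rng.uniform(0, 1, n)
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
t = np.arange(1, n + 1)
X = np.column_stack([X1, t])                  # X_t = (X1t', t)'
Y = X @ np.array([1.0, 2.0, 0.5]) + rng.normal(size=n)

beta0_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
u_hat = Y - X @ beta0_hat                     # parametric residuals

# Kernel matrix K_h,ts with the diagonal removed (sum over s != t)
Kmat = np.exp(-0.5 * ((Z[:, None] - Z[None, :]) / h) ** 2) / (np.sqrt(2 * np.pi) * h)
np.fill_diagonal(Kmat, 0.0)
G = X @ X.T                                   # matrix of X_t' X_s

# I_1n = 2 n^{-4} sum_{s>t} X_t'X_s u_t u_s K_h,ts  (computed via full double sum)
I1n = (np.outer(u_hat, u_hat) * G * Kmat).sum() / n ** 4
# V0_hat = 2h n^{-6} sum_{s != t} (X_t'X_s)^2 u_t^2 u_s^2 K_h,ts^2
V0 = 2 * h * ((G ** 2) * np.outer(u_hat ** 2, u_hat ** 2) * Kmat ** 2).sum() / n ** 6
Tna = n * np.sqrt(h) * I1n / np.sqrt(V0)      # asymptotically N(0,1) under H0a
print(Tna)
```

Under $H_0^a$ the statistic behaves approximately like a standard normal draw; under the alternatives of Theorems 3.2 and 3.4 it diverges at the rates derived above, and the cross-validated-bandwidth version of Theorem 3.5 replaces $h$ by $\hat{h}_{cv}$.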


References

Andrews, D.W.K., McDermott, C.J., 1995. Nonlinear econometric models with deterministically trending variables. Review of Economic Studies 62, 343–360.
Bhattacharya, R.N., Waymire, E.C., 2009. Stochastic Processes with Applications. SIAM, Philadelphia.
Cai, Z., 2007. Trending time-varying coefficient time series models with serially correlated errors. Journal of Econometrics 136, 163–188.
Cai, Z., Fan, J., Yao, Q., 2000. Functional-coefficient regression models for nonlinear time series. Journal of the American Statistical Association 95, 941–956.
Cai, Z., Li, Q., Park, J.Y., 2009. Functional-coefficient models for nonstationary time series data. Journal of Econometrics 148, 101–113.
de Jong, P., 1987. A central limit theorem for generalized quadratic forms. Probability Theory and Related Fields 75, 261–277.
Fan, Y., Li, Q., 1999. Central limit theorem for degenerate U-statistics of absolutely regular processes with applications to model specification tests. Journal of Nonparametric Statistics 10, 245–271.
Fan, J., Zhang, W., 2000. Simultaneous confidence bands and hypothesis testing in varying-coefficient models. Scandinavian Journal of Statistics 27, 715–731.
Gao, J., Gijbels, I., 2008. Bandwidth selection in nonparametric kernel testing. Journal of the American Statistical Association 103, 1584–1594.
Hall, P., Heyde, C.C., 1980. Martingale Limit Theory and Its Application. Academic Press, New York.
Krugman, P., 1995. The Age of Diminished Expectations. MIT Press, Cambridge.
Li, Q., Huang, C.J., Li, D., Fu, T.-T., 2002. Semiparametric smooth coefficient models. Journal of Business & Economic Statistics 20, 412–422.
Li, D., Li, Q., 2010. Nonparametric/semiparametric estimation and testing of econometric models with data dependent smoothing parameters. Journal of Econometrics 157, 179–190.
Mamuneas, T.P., Savvides, A., Stengos, T., 2006. Economic development and the return to human capital: a smooth coefficient semiparametric approach. Journal of Applied Econometrics 21, 111–132.
Perron, P., 1989. The great crash, the oil price shock, and the unit root hypothesis. Econometrica 57, 1361–1401.
Robinson, P.M., 1988. Root-N-consistent semiparametric regression. Econometrica 56, 931–954.
Robinson, P.M., 1989. Nonparametric estimation of time-varying parameters. In: Hackl, P. (Ed.), Statistical Analysis and Forecasting of Economic Structural Change. Springer, Berlin, pp. 164–253.
Stengos, T., Zacharias, E., 2006. Intertemporal pricing and price discrimination: a semiparametric hedonic analysis of the personal computer market. Journal of Applied Econometrics 21, 371–386.
Xia, Y.C., Li, W.K., 2002. Asymptotic behavior of bandwidth selected by the cross-validation method for local polynomial fitting. Journal of Multivariate Analysis 83, 265–287.
Xiao, Z., 2009. Functional-coefficient cointegration models. Journal of Econometrics 152, 81–92.
Zhang, W., Lee, S.Y., Song, X., 2002. Local polynomial fitting in semiparametric coefficient models. Journal of Multivariate Analysis 82, 166–188.
Zivot, E., Andrews, D.W.K., 1992. Further evidence on the great crash, the oil price shock and the unit root hypothesis. Journal of Business and Economic Statistics 10, 251–270.