Statistics and Probability Letters 80 (2010) 1420–1430
Empirical likelihood for the smoothed LAD estimator in infinite variance autoregressive models

Jinyu Li a,*, Wei Liang b, Shuyuan He c, Xianbin Wu d

a School of Sciences, China University of Mining and Technology, Xuzhou 221116, China
b School of Mathematical Sciences, Peking University, Beijing 100871, China
c School of Mathematical Sciences, Capital Normal University, Beijing 100048, China
d Zhejiang Wanli University, Ningbo 315101, China

Article history: Received 11 August 2008; received in revised form 4 March 2010; accepted 11 May 2010; available online 1 June 2010.
MSC: primary 62G15; secondary 62M10.
Abstract

This paper proposes an empirical likelihood method to estimate the parameters of infinite variance autoregressive (IVAR) models and to construct confidence regions for the parameters. Simulation studies suggest that in the small-sample case, the empirical likelihood confidence regions may be more accurate than the confidence regions constructed by the normal approximation based on the self-weighted LAD estimator proposed by Ling (2005).
Keywords: LAD estimation; Infinite variance; Empirical likelihood; Kernel function
1. Introduction

The empirical likelihood was introduced by Owen (1988, 1990) for the mean vector of independent and identically distributed (i.i.d.) observations. The empirical likelihood method produces confidence regions whose shape and orientation are determined entirely by the data, and it is Bartlett correctable. For these reasons, the empirical likelihood has found many applications, for example to smooth functions of means, nonparametric density estimation, regression models and quantiles. In the setting of autoregressive models, Monti (1997) considered the empirical likelihood in the frequency domain, and Chuang and Chan (2002) developed the empirical likelihood for unstable autoregressive models whose innovations form a martingale difference sequence with finite variance. Chan and Ling (2006) applied the empirical likelihood method to generalized autoregressive conditional heteroskedasticity (GARCH) models.

Consider the stationary autoregressive (AR(p)) time series {yt} generated by

yt = ϕ0 + ϕ1 yt−1 + ϕ2 yt−2 + · · · + ϕp yt−p + εt,   (1.1)

where {εt} is a sequence of independent and identically distributed (i.i.d.) errors and φ = (ϕ0, ϕ1, . . . , ϕp)^τ is an unknown parameter vector with true value φ0. When E(εt²) is finite, many methods are available for statistical inference, and the empirical likelihood method can be used; see Chuang and Chan (2002). When E(εt²) is infinite, model (1.1) is called the
infinite variance autoregressive (IVAR) model. Such heavy-tailed models have been studied in many papers (see, e.g., Davis et al. (1992) and the references therein). Recently, Ling (2005) proposed a self-weighted LAD estimator for IVAR models and showed that this estimator is asymptotically normal.

In this paper, we propose an empirical likelihood method to estimate the parameters of IVAR models and to construct confidence regions for the parameters. We derive an empirical likelihood ratio statistic by smoothing the estimating equations of the self-weighted LAD estimator proposed by Ling (2005). We show that the smoothed empirical likelihood ratio statistic has an asymptotic standard chi-squared distribution, so it can be used to construct confidence regions for the parameters. Furthermore, we propose a smoothed empirical likelihood (SEL) estimator, which satisfies the smoothed estimating equations of the self-weighted LAD estimator. We show that our estimator is consistent, asymptotically normal, and (first-order) asymptotically equivalent to the self-weighted LAD estimator. Simulation studies suggest that in the small-sample case, the empirical likelihood confidence regions may be more accurate than the confidence regions constructed by the normal approximation based on the self-weighted LAD estimator.

The paper is organized as follows. Section 2 defines the confidence region and the SEL estimator in IVAR models and discusses their asymptotic properties. Section 3 reports some simulation results. The proofs of the theorems appear in Section 4.

2. Methodology and main results

Let {y1, y2, . . . , yn} be a random sample from model (1.1). The condition for strict stationarity and ergodicity of model (1.1) is as follows (Brockwell and Davis, 1991).

(A1) The characteristic polynomial 1 − ϕ1 z − · · · − ϕp z^p has all roots outside the unit circle, and E|εt|^δ < ∞ for some δ > 0.

The objective function of the self-weighted least absolute deviation (LAD) estimator is defined by
Ψn(φ) = Σ_{t=p+1}^{n} wt |yt − φ^τ Yt|,   (2.1)
where Yt = (1, yt−1, . . . , yt−p)^τ, wt = g(yt−1, . . . , yt−p), and g(x1, . . . , xp) is a given positive measurable function on R^p. Let
φ̂_LAD = arg min_{φ∈Θ} Ψn(φ),   (2.2)
where Θ is the parameter space; φ̂_LAD is called the self-weighted LAD estimator of the true value φ0 in Θ (Ling, 2005). By definition, φ̂_LAD satisfies the estimating equation

(1/(n − p)) Σ_{t=p+1}^{n} wt Yt sgn(yt − φ^τ Yt) = 0,   (2.3)
where sgn(x) = −1, 0, 1 for x < 0, x = 0 and x > 0, respectively. Put

mt0(φ) = wt Yt sgn(yt − φ^τ Yt),   p + 1 ≤ t ≤ n.   (2.4)
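The self-weighted LAD fit of (2.1)-(2.4) is easy to compute numerically. The following is a minimal sketch, assuming Python with numpy/scipy; the weight function g below is only an illustrative placeholder (the concrete choice used in our simulations appears in (3.3) of Section 3), and the optimizer is a generic derivative-free routine rather than any particular algorithm of Ling (2005).

```python
# Sketch of the self-weighted LAD objective (2.1) and estimating function (2.3);
# numpy/scipy assumed, and g() is only an illustrative positive weight function.
import numpy as np
from scipy.optimize import minimize

def lagged_design(y, p):
    """Rows Y[t] = (1, y[t-1], ..., y[t-p]) and responses y[t], for t = p, ..., n-1."""
    n = len(y)
    Y = np.column_stack([np.ones(n - p)] + [y[p - j:n - j] for j in range(1, p + 1)])
    return Y, y[p:]

def g(lags):
    # placeholder positive weight w_t = g(y_{t-1}, ..., y_{t-p}); see (3.3) for the choice used later
    return 1.0 / (1.0 + np.sum(np.abs(lags), axis=1)) ** 3

def lad_objective(phi, y, p):
    Y, resp = lagged_design(y, p)
    w = g(Y[:, 1:])
    return np.sum(w * np.abs(resp - Y @ phi))                    # Psi_n(phi) of (2.1)

def estimating_function(phi, y, p):
    Y, resp = lagged_design(y, p)
    w = g(Y[:, 1:])
    return (w[:, None] * Y * np.sign(resp - Y @ phi)[:, None]).mean(axis=0)   # left side of (2.3)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, n = 1, 200
    y = np.zeros(n)
    for t in range(1, n):                                        # AR(1) with infinite-variance t_2 errors
        y[t] = 0.5 * y[t - 1] + rng.standard_t(df=2)
    phi_lad = minimize(lad_objective, x0=np.zeros(p + 1), args=(y, p), method="Nelder-Mead").x
    print(phi_lad, estimating_function(phi_lad, y, p))
```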
Note that mt0(φ) is not differentiable at any point φ for which yt = φ^τ Yt for some t. This causes some problems for our subsequent asymptotic analysis. To overcome this problem, we replace mt0 with a smooth function. For this purpose, we use an rth-order kernel K (Silverman, 1986), defined by

∫_{−∞}^{+∞} x^j K(x) dx = 1 if j = 0,  0 if 1 ≤ j ≤ r − 1,  κ if j = r,   (2.5)
where r ≥ 2 is an integer and κ ≠ 0 is a constant.
Define Gh(x) = ∫_{−x/h}^{x/h} K(u) du for h > 0. Then a smoothed version of mt0 is defined by

mth(φ) = wt Yt Gh(yt − φ^τ Yt),   p + 1 ≤ t ≤ n.   (2.6)
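For concreteness, a minimal sketch of this smoothing step follows (Python/numpy assumed). It uses the Epanechnikov kernel of Section 3, a valid second-order kernel; Gh is its integral over [−x/h, x/h] in closed form, and w, Y, resp denote the weights, lagged design rows and responses built as in the previous sketch.

```python
# Sketch of (2.5)-(2.6): a second-order kernel K, the smoothed sign G_h, and the
# smoothed estimating function m_{th}(phi). Inputs w, Y, resp are as in the
# previous sketch; h > 0 is a user-chosen bandwidth.
import numpy as np

SQRT5 = np.sqrt(5.0)

def K(x):
    """Epanechnikov kernel on [-sqrt(5), sqrt(5)] (the kernel used in Section 3)."""
    x = np.asarray(x, dtype=float)
    return 3.0 / (4.0 * SQRT5) * (1.0 - x ** 2 / 5.0) * (np.abs(x) <= SQRT5)

def G_h(x, h):
    """G_h(x) = integral of K over [-x/h, x/h]; a smooth surrogate for sgn(x)."""
    u = np.clip(np.asarray(x, dtype=float) / h, -SQRT5, SQRT5)
    half = 3.0 / (4.0 * SQRT5) * (u - u ** 3 / 15.0)   # antiderivative of K from 0 to u
    return 2.0 * half                                   # symmetry: integral over [-u, u]

def m_th(phi, w, Y, resp, h):
    """Rows are m_{th}(phi) = w_t Y_t G_h(y_t - phi' Y_t), eq. (2.6)."""
    return w[:, None] * Y * G_h(resp - Y @ phi, h)[:, None]
```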
Let P = (p_{p+1}, . . . , p_n)^τ be a vector of nonnegative numbers adding to unity. Then the smoothed empirical log-likelihood ratio is defined by

lh(φ) = −2 max{ Σ_{t=p+1}^{n} log((n − p)pt) : pt ≥ 0, Σ_{t=p+1}^{n} pt = 1, Σ_{t=p+1}^{n} pt mth(φ) = 0 }.   (2.7)
Given φ, using standard Lagrange multiplier arguments, the optimal value of P is derived to be

pt(φ) = 1 / [(n − p)(1 + λ(φ)^τ mth(φ))],   p + 1 ≤ t ≤ n,   (2.8)
where λ(φ) is a (p + 1)-dimensional vector of Lagrange multipliers satisfying

(1/(n − p)) Σ_{t=p+1}^{n} mth(φ) / (1 + λ(φ)^τ mth(φ)) = 0.   (2.9)
This gives the smoothed empirical log-likelihood ratio statistic

lh(φ) = 2 Σ_{t=p+1}^{n} log(1 + λ(φ)^τ mth(φ)).   (2.10)
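Computationally, λ(φ) in (2.9) is the maximizer of the concave function λ ↦ Σ_{t} log(1 + λ^τ mth(φ)), so lh(φ) can be evaluated by a Newton iteration. A minimal sketch follows (Python/numpy); replacing log by a quadratic continuation below a small threshold ("log-star") is a standard numerical device for empirical likelihood, not part of the method itself.

```python
# Sketch of the inner Lagrange step (2.8)-(2.10): given the matrix m whose rows are
# m_{th}(phi), find lambda solving (2.9) and return l_h(phi) of (2.10).
import numpy as np

def _log_star(z, eps):
    # log(z) for z >= eps; quadratic continuation below eps keeps the search unconstrained
    return np.where(z >= eps,
                    np.log(np.maximum(z, eps)),
                    np.log(eps) - 1.5 + 2.0 * z / eps - 0.5 * (z / eps) ** 2)

def el_log_ratio(m, max_iter=50, tol=1e-10):
    """m: (N, d) array of estimating-function values; returns (l_h, lambda)."""
    N, d = m.shape
    eps = 1.0 / N
    lam = np.zeros(d)
    for _ in range(max_iter):
        z = 1.0 + m @ lam
        d1 = np.where(z >= eps, 1.0 / np.maximum(z, eps), 2.0 / eps - z / eps ** 2)
        d2 = np.where(z >= eps, -1.0 / np.maximum(z, eps) ** 2, -1.0 / eps ** 2)
        grad = m.T @ d1                       # gradient of sum(log*(1 + lam' m_t))
        hess = (m * d2[:, None]).T @ m        # Hessian (negative definite)
        step = np.linalg.solve(hess, -grad)   # Newton step for the concave maximization
        lam = lam + step
        if np.linalg.norm(step) < tol:
            break
    return 2.0 * np.sum(_log_star(1.0 + m @ lam, eps)), lam

# Example: l_h(phi) = el_log_ratio(m_th(phi, w, Y, resp, h))[0]
```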
The smoothed empirical likelihood (SEL) estimator of φ0 is defined as

φ̂_SEL = arg min_{φ∈Θ} lh(φ).   (2.11)
By Theorem 2 below, φ̂_SEL satisfies

pt(φ̂_SEL) = 1/(n − p),   λ(φ̂_SEL) = 0,   (1/(n − p)) Σ_{t=p+1}^{n} mth(φ̂_SEL) = 0.   (2.12)
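A sketch of the outer minimization (2.11), under the same assumptions as the previous sketches (el_log_ratio and m_th as defined above, phi_init any reasonable starting value such as the self-weighted LAD fit):

```python
# Sketch of (2.11): minimize l_h(phi) over phi with a derivative-free optimizer.
import numpy as np
from scipy.optimize import minimize

def sel_estimate(phi_init, w, Y, resp, h):
    obj = lambda phi: el_log_ratio(m_th(phi, w, Y, resp, h))[0]
    res = minimize(obj, x0=np.asarray(phi_init, dtype=float), method="Nelder-Mead")
    return res.x   # phi_SEL; at the minimizer, (2.12) should hold up to numerical error
```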
We need the following assumptions for our first result.

(A2) The errors εt have zero median, the (r − 1)th derivative of the density f(x) exists in a neighborhood of zero and is continuous at zero, and f(0) > 0.
(A3) K(x) is bounded and compactly supported.
(A4) h satisfies nh^{2r} → 0 as n → ∞.

Theorem 1. Under assumptions (A1)–(A4), if E(wt² ‖Yt‖²) < ∞, then as n → ∞, we have

lh(φ0) →_d χ²_{p+1}.   (2.13)
If cα is chosen such that P(χ²_{p+1} ≤ cα) = α, then Theorem 1 implies that the asymptotic coverage probability of the SEL confidence region Ihc = {φ : lh(φ) ≤ cα} is α, i.e. P(φ0 ∈ Ihc) = P(lh(φ0) ≤ cα) = α + o(1) as n → ∞.

We need the following extra assumptions for our second result.

(A5) h = 1/n^γ with 1/(2r) < γ < 1/3.
(A6) The second derivative of K(x) exists and is bounded on its support.
(A7) The derivative of f(x) exists on R and is bounded.

Theorem 2. Let dn = 1/n^α with max{1/3, 3γ/2} < α < 1/2. Under assumptions (A1)–(A7), if E[(wt + wt²)(wt ‖Yt‖³ + ‖Yt‖⁴)] < ∞ and E(wt ‖Yt‖³)^{1+a} < ∞ for some 2γ/(1 − γ) < a < 1, then as n → ∞, with probability 1, lh(φ) attains its minimum value at some point φ̂_SEL in the interior of the ball ‖φ − φ0‖ ≤ dn, and

√n(φ̂_SEL − φ̂_LAD) = op(1).   (2.14)
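The SEL confidence region Ihc of Theorem 1 can be evaluated pointwise: a value φ is retained whenever lh(φ) does not exceed the χ²_{p+1} quantile cα. A minimal sketch, reusing the earlier helpers (our own illustrative code, not part of the paper):

```python
# Sketch of the confidence-region check of Theorem 1: keep phi iff l_h(phi) <= c_alpha.
from scipy.stats import chi2

def in_sel_region(phi, w, Y, resp, h, alpha=0.95):
    c_alpha = chi2.ppf(alpha, df=Y.shape[1])      # degrees of freedom = p + 1 (columns of Y)
    lh, _ = el_log_ratio(m_th(phi, w, Y, resp, h))
    return lh <= c_alpha
```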
By Theorem 2 and the result of Ling (2005), the asymptotic distribution of the SEL estimator can be derived directly.

Corollary 1. Under the conditions of Theorem 2, as n → ∞, we have

√n(φ̂_SEL − φ0) →_d N(0, (1/(4f²(0))) Σ⁻¹ Ω Σ⁻¹),   (2.15)
where Σ = E(wt Yt Yt^τ) and Ω = E(wt² Yt Yt^τ).

3. Simulation results

We consider the AR(1) model yt = ϕ0 + ϕ1 yt−1 + εt, t = 1, 2, . . . , n, with three different distributions for the errors εt: (i) Student's t distribution with 2 degrees of freedom, (ii) the Cauchy distribution, and (iii) the standard normal distribution. We use ϕ0 = 0 and ϕ1 = −0.5, 0.5, 0.8. The sample sizes are n = 20, 35, 50, 100, and 5000 replications are conducted in each case (see Tables 1–3). We smooth the estimating equation using the second-order (r = 2) kernel

K(x) = (3/(4√5)) (1 − x²/5) I(|x| ≤ √5),

which is the so-called Bartlett or Epanechnikov kernel.
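A quick check shows that this kernel satisfies (2.5) with r = 2: by symmetry ∫ x K(x) dx = 0, while ∫_{−√5}^{√5} K(x) dx = (3/(4√5)) [x − x³/15]_{−√5}^{√5} = 1 and ∫_{−√5}^{√5} x² K(x) dx = (3/(4√5)) [x³/3 − x⁵/25]_{−√5}^{√5} = 1, so κ = 1 ≠ 0.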
Table 1
The coverage probability of confidence regions when εt ∼ t2.

α = 0.90, ϕ1 = −0.50
n     EL(1.0)  EL(0.75)  EL(0.50)  EL(0.25)  EL(0)   NA(0.25)  NA(0.20)
20    0.863    0.868     0.854     0.844     0.865   0.950     0.964
35    0.886    0.885     0.877     0.886     0.885   0.949     0.968
50    0.893    0.889     0.891     0.887     0.893   0.949     0.964
100   0.897    0.896     0.894     0.897     0.896   0.944     0.966

α = 0.90, ϕ1 = 0.50
n     EL(1.0)  EL(0.75)  EL(0.50)  EL(0.25)  EL(0)   NA(0.25)  NA(0.20)
20    0.847    0.848     0.836     0.826     0.854   0.943     0.961
35    0.884    0.885     0.879     0.877     0.879   0.942     0.961
50    0.883    0.886     0.892     0.882     0.889   0.940     0.960
100   0.896    0.894     0.903     0.891     0.896   0.940     0.965

α = 0.90, ϕ1 = 0.80
n     EL(1.0)  EL(0.75)  EL(0.50)  EL(0.25)  EL(0)   NA(0.25)  NA(0.20)
20    0.786    0.783     0.776     0.761     0.784   0.929     0.947
35    0.868    0.864     0.854     0.853     0.868   0.924     0.960
50    0.883    0.881     0.879     0.891     0.885   0.933     0.960
100   0.893    0.892     0.893     0.896     0.893   0.938     0.959

α = 0.95, ϕ1 = −0.50
n     EL(1.0)  EL(0.75)  EL(0.50)  EL(0.25)  EL(0)   NA(0.25)  NA(0.20)
20    0.923    0.919     0.911     0.897     0.923   0.975     0.985
35    0.942    0.942     0.933     0.936     0.936   0.977     0.986
50    0.945    0.943     0.940     0.942     0.943   0.976     0.986
100   0.949    0.948     0.950     0.945     0.952   0.974     0.985

α = 0.95, ϕ1 = 0.50
n     EL(1.0)  EL(0.75)  EL(0.50)  EL(0.25)  EL(0)   NA(0.25)  NA(0.20)
20    0.905    0.904     0.894     0.879     0.906   0.967     0.978
35    0.938    0.932     0.927     0.929     0.933   0.972     0.983
50    0.941    0.942     0.936     0.938     0.942   0.972     0.982
100   0.944    0.946     0.947     0.950     0.949   0.972     0.983

α = 0.95, ϕ1 = 0.80
n     EL(1.0)  EL(0.75)  EL(0.50)  EL(0.25)  EL(0)   NA(0.25)  NA(0.20)
20    0.843    0.841     0.836     0.818     0.843   0.956     0.967
35    0.919    0.918     0.908     0.907     0.919   0.936     0.981
50    0.936    0.934     0.932     0.936     0.938   0.967     0.978
100   0.948    0.945     0.943     0.950     0.944   0.973     0.980
The coverage probabilities of the confidence regions Ihc based on the smoothed empirical likelihood with bandwidth h = 1/n^γ are denoted by EL(γ), where γ = 1.0, 0.75, 0.5, 0.25. To evaluate the effect of smoothing the estimating equations, we also consider the coverage probabilities of confidence regions based on the unsmoothed empirical likelihood (i.e. the case h = 0), denoted by EL(0). As another benchmark in the simulation experiments, we consider the confidence regions based on the asymptotic normal distribution of the self-weighted LAD estimator proposed by Ling (2005). These confidence regions are defined as

ILAD = {φ : n(φ̂_LAD − φ)^τ Λ̂⁻¹ (φ̂_LAD − φ) ≤ cα},   (3.1)
where φ̂_LAD is the self-weighted LAD estimator of φ0 and cα is the α-quantile of the χ²_2 distribution. Here Λ̂ is defined by

Λ̂ = (1/(4f̂²(0))) Σ̂⁻¹ Ω̂ Σ̂⁻¹,   (3.2)
where Σ̂, Ω̂ and f̂(0) are estimators of Σ, Ω and f(0), respectively, obtained from

Σ̂ = (1/(n − p)) Σ_{t=p+1}^{n} wt Yt Yt^τ,   Ω̂ = (1/(n − p)) Σ_{t=p+1}^{n} wt² Yt Yt^τ,

f̂(0) = (1/(σ̂_w bn (n − p))) Σ_{t=p+1}^{n} wt K̃((yt − φ̂_LAD^τ Yt)/bn),

K̃(x) = exp(−x)/(1 + exp(−x))²,   bn = 1/n^ν,   σ̂_w = (1/(n − p)) Σ_{t=p+1}^{n} wt.
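A minimal sketch of these plug-in quantities, assuming Python/numpy and the arrays w, Y, resp from the earlier sketches, with phi_lad the self-weighted LAD fit and b_n a user-chosen bandwidth:

```python
# Sketch of (3.2): Sigma_hat, Omega_hat, f_hat(0) and Lambda_hat for the
# normal-approximation benchmark NA(nu).
import numpy as np

def logistic_kernel(x):
    # K~(x) = exp(-x) / (1 + exp(-x))^2, written in a symmetric, overflow-safe form
    e = np.exp(-np.abs(x))
    return e / (1.0 + e) ** 2

def na_covariance(phi_lad, w, Y, resp, b_n):
    N = len(resp)                                                 # N = n - p
    Sigma = (w[:, None, None] * Y[:, :, None] * Y[:, None, :]).mean(axis=0)
    Omega = ((w ** 2)[:, None, None] * Y[:, :, None] * Y[:, None, :]).mean(axis=0)
    sigma_w = w.mean()
    resid = resp - Y @ phi_lad
    f0 = np.sum(w * logistic_kernel(resid / b_n)) / (sigma_w * b_n * N)
    Sigma_inv = np.linalg.inv(Sigma)
    return Sigma_inv @ Omega @ Sigma_inv / (4.0 * f0 ** 2)        # Lambda_hat of (3.2)

# I_LAD of (3.1): keep phi iff
#   n * (phi_lad - phi) @ np.linalg.inv(Lambda_hat) @ (phi_lad - phi) <= chi2.ppf(alpha, p + 1)
```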
The coverage probabilities of the confidence regions ILAD based on the bandwidth bn = 1/n^ν are denoted by NA(ν), with ν = 0.25 and 0.20.
Table 2
The coverage probability of confidence regions when εt ∼ Cauchy.

α = 0.90, ϕ1 = −0.50
n     EL(1.0)  EL(0.75)  EL(0.50)  EL(0.25)  EL(0)   NA(0.25)  NA(0.20)
20    0.840    0.849     0.834     0.832     0.844   0.895     0.921
35    0.876    0.868     0.869     0.865     0.878   0.912     0.941
50    0.885    0.885     0.887     0.878     0.880   0.923     0.949
100   0.896    0.897     0.893     0.898     0.889   0.936     0.952

α = 0.90, ϕ1 = 0.50
n     EL(1.0)  EL(0.75)  EL(0.50)  EL(0.25)  EL(0)   NA(0.25)  NA(0.20)
20    0.832    0.831     0.817     0.809     0.829   0.896     0.920
35    0.870    0.870     0.867     0.863     0.874   0.920     0.935
50    0.873    0.885     0.877     0.871     0.876   0.912     0.939
100   0.893    0.899     0.895     0.893     0.891   0.932     0.953

α = 0.90, ϕ1 = 0.80
n     EL(1.0)  EL(0.75)  EL(0.50)  EL(0.25)  EL(0)   NA(0.25)  NA(0.20)
20    0.760    0.756     0.755     0.730     0.765   0.893     0.911
35    0.842    0.846     0.839     0.828     0.851   0.908     0.933
50    0.872    0.873     0.871     0.865     0.880   0.918     0.939
100   0.894    0.895     0.897     0.893     0.897   0.933     0.952

α = 0.95, ϕ1 = −0.50
n     EL(1.0)  EL(0.75)  EL(0.50)  EL(0.25)  EL(0)   NA(0.25)  NA(0.20)
20    0.903    0.898     0.898     0.888     0.904   0.932     0.945
35    0.925    0.935     0.927     0.922     0.930   0.946     0.959
50    0.938    0.936     0.933     0.929     0.936   0.957     0.965
100   0.945    0.945     0.944     0.944     0.948   0.966     0.947

α = 0.95, ϕ1 = 0.50
n     EL(1.0)  EL(0.75)  EL(0.50)  EL(0.25)  EL(0)   NA(0.25)  NA(0.20)
20    0.877    0.886     0.873     0.865     0.885   0.937     0.941
35    0.925    0.922     0.920     0.913     0.925   0.948     0.959
50    0.932    0.932     0.925     0.927     0.930   0.950     0.965
100   0.946    0.941     0.940     0.944     0.946   0.959     0.971

α = 0.95, ϕ1 = 0.80
n     EL(1.0)  EL(0.75)  EL(0.50)  EL(0.25)  EL(0)   NA(0.25)  NA(0.20)
20    0.820    0.814     0.813     0.794     0.825   0.926     0.935
35    0.903    0.900     0.896     0.889     0.900   0.946     0.958
50    0.928    0.926     0.922     0.920     0.928   0.959     0.966
100   0.946    0.948     0.945     0.948     0.944   0.966     0.978
There are many possible choices of the weights wt. Following Ling (2005), we use the following weight, which is analogous to the influence function of Huber (1977):

wt = 1 if at = 0,   wt = C³/at³ if at ≠ 0,   (3.3)
where at = Σ_{i=1}^{p} |yt−i| I(|yt−i| ≥ C) and C is the 0.95 quantile of the data {y1, y2, . . . , yn}.

The simulation results can be summarized as follows. The coverage probabilities of NA(ν) are larger than the nominal levels and are very sensitive to the choice of the bandwidth bn. In contrast, the coverage probabilities of EL(γ) are much closer to the nominal levels and are less sensitive to the choice of the bandwidth h. The unsmoothed coverage probabilities EL(0) are similar to those of EL(γ) for γ = 1.0, 0.75, 0.50, 0.25. It is also interesting to note that the coverage probabilities of all empirical likelihood confidence regions increase to the nominal levels as the sample size n increases.

4. Proof of main results

The following notation will be used in the proofs. Let

Zn(φ) = max_{p+1≤i≤n} ‖mih(φ)‖,
Qn0(φ) = (1/(n − p)) Σ_{i=p+1}^{n} mi0(φ),
Qnh(φ) = (1/(n − p)) Σ_{i=p+1}^{n} mih(φ),
S(φ) = (1/(n − p)) Σ_{i=p+1}^{n} mih(φ) mih(φ)^τ.
To prove Theorem 1, we first prove the following lemma.
Table 3
The coverage probability of confidence regions when εt ∼ N(0, 1).

α = 0.90, ϕ1 = −0.50
n     EL(1.0)  EL(0.75)  EL(0.50)  EL(0.25)  EL(0)   NA(0.25)  NA(0.20)
20    0.878    0.873     0.871     0.858     0.885   0.971     0.985
35    0.891    0.894     0.889     0.884     0.893   0.960     0.978
50    0.896    0.897     0.892     0.893     0.894   0.960     0.971
100   0.897    0.894     0.893     0.899     0.900   0.941     0.961

α = 0.90, ϕ1 = 0.50
n     EL(1.0)  EL(0.75)  EL(0.50)  EL(0.25)  EL(0)   NA(0.25)  NA(0.20)
20    0.854    0.855     0.847     0.830     0.864   0.867     0.982
35    0.882    0.881     0.880     0.875     0.891   0.955     0.974
50    0.887    0.888     0.889     0.888     0.888   0.951     0.969
100   0.893    0.893     0.887     0.890     0.894   0.938     0.963

α = 0.90, ϕ1 = 0.80
n     EL(1.0)  EL(0.75)  EL(0.50)  EL(0.25)  EL(0)   NA(0.25)  NA(0.20)
20    0.791    0.791     0.776     0.765     0.801   0.934     0.978
35    0.865    0.866     0.865     0.853     0.867   0.940     0.969
50    0.879    0.876     0.882     0.877     0.882   0.939     0.965
100   0.895    0.892     0.891     0.892     0.893   0.935     0.956

α = 0.95, ϕ1 = −0.50
n     EL(1.0)  EL(0.75)  EL(0.50)  EL(0.25)  EL(0)   NA(0.25)  NA(0.20)
20    0.930    0.922     0.920     0.913     0.931   0.988     0.994
35    0.942    0.942     0.943     0.934     0.944   0.983     0.991
50    0.946    0.948     0.947     0.945     0.945   0.981     0.989
100   0.950    0.948     0.947     0.947     0.949   0.973     0.984

α = 0.95, ϕ1 = 0.50
n     EL(1.0)  EL(0.75)  EL(0.50)  EL(0.25)  EL(0)   NA(0.25)  NA(0.20)
20    0.913    0.902     0.901     0.886     0.911   0.987     0.994
35    0.934    0.934     0.936     0.930     0.934   0.982     0.990
50    0.943    0.947     0.942     0.937     0.943   0.979     0.989
100   0.949    0.948     0.945     0.942     0.947   0.972     0.984

α = 0.95, ϕ1 = 0.80
n     EL(1.0)  EL(0.75)  EL(0.50)  EL(0.25)  EL(0)   NA(0.25)  NA(0.20)
20    0.845    0.848     0.840     0.817     0.862   0.960     0.992
35    0.918    0.919     0.910     0.906     0.926   0.973     0.988
50    0.934    0.936     0.935     0.928     0.936   0.972     0.987
100   0.947    0.948     0.946     0.945     0.948   0.972     0.982
Lemma 1. Under the conditions of Theorem 1, as n → ∞, we have
(i) Zn(φ0) = op(n^{1/2}),
(ii) √n Qnh(φ0) →_d N(0, Ω),
(iii) S(φ0) = Ω + op(1),
(iv) λ(φ0) = Op(n^{−1/2}).

Proof of Lemma 1. For part (i), since εi = yi − φ0^τ Yi and |Gh(εi)| ≤ M, for all ε > 0 we have

P[Zn(φ0) > (n − p)^{1/2} ε] ≤ (n − p) P[‖mth(φ0)‖ > (n − p)^{1/2} ε]
  ≤ (n − p) P[‖wt Yt‖ > M⁻¹(n − p)^{1/2} ε]
  = (n − p) P[‖wt Yt‖² > M⁻²(n − p)ε²]
  ≤ M² ε⁻² E{‖wt Yt‖² I[‖wt Yt‖² > M⁻²(n − p)ε²]} → 0.
Thus, part (i) follows. For part (ii), we may write

√(n − p) Qnh(φ0) = (1/√(n − p)) Σ_{i=p+1}^{n} wi Yi [Gh(εi) − E(Gh(εi))] + (1/√(n − p)) Σ_{i=p+1}^{n} wi Yi E(Gh(εi))
  = (1/√(n − p)) Σ_{i=p+1}^{n} Xni + ((1/(n − p)) Σ_{i=p+1}^{n} wi Yi) √(n − p) O(h^r),   (4.1)

where Xni = wi Yi [Gh(εi) − E(Gh(εi))]. The second term of (4.1) is op(1) by ergodicity. Let Fi be the σ-field generated by {εt, t ≤ i}. Then, for each n ≥ p + 1, {Xni, Fi, p + 1 ≤ i ≤ n} is a stationary sequence of martingale differences. By
Theorem 3.2 in Hall and Heyde (1980), we have

(1/√(n − p)) Σ_{i=p+1}^{n} Xni →_d N(0, Ω).   (4.2)
Thus, part (ii) follows. For part (iii), we may write

S(φ0) = (1/(n − p)) Σ_{i=p+1}^{n} wi² Yi Yi^τ [Gh²(εi) − E(Gh²(εi))] + (1/(n − p)) Σ_{i=p+1}^{n} wi² Yi Yi^τ E(Gh²(εi))
  = (1/(n − p)) Σ_{i=p+1}^{n} Xi ani + ((1/(n − p)) Σ_{i=p+1}^{n} Xi)[1 + O(h)],   (4.3)

where Xi = wi² Yi Yi^τ and ani = Gh²(εi) − E(Gh²(εi)). The second term of (4.3) is Ω + op(1) by ergodicity. For the first term, we suppose without loss of generality that Yi is yi−1. Using E(ani)² = O(h) and the Markov inequality, for all ε > 0 we have

P( |(1/(n − p)) Σ_{i=p+1}^{n} Xi ani| > ε ) ≤ ε⁻¹ (1/(n − p)) Σ_{i=p+1}^{n} E(|Xi|) E(|ani|)
  ≤ ε⁻¹ E(|Xi|) {E(ani)²}^{1/2} → 0.

Thus, part (iii) follows. Similar to Owen (1990), part (iv) follows from parts (i)–(iii). This completes the proof.
Proof of Theorem 1. By Lemma 1 and arguments similar to Owen (1990), we have lh(φ0) = (n − p) Qnh(φ0)^τ S(φ0)⁻¹ Qnh(φ0) + op(1). The result follows from the Slutsky theorem.
To prove Theorem 2, we introduce the following lemmas.

Lemma 2. Under assumptions (A1)–(A4), if E(wt² ‖Yt‖²) < ∞, then as n → ∞, Qnh(φ0) = O(√(log n / n)) a.s.
Proof of Lemma 2. We may write

Qnh(φ0) = (1/(n − p)) Σ_{i=p+1}^{n} wi Yi [Gh(εi) − E(Gh(εi))] + (1/(n − p)) Σ_{i=p+1}^{n} wi Yi E(Gh(εi))
  = (1/(n − p)) Σ_{i=p+1}^{n} Zi bni + ((1/(n − p)) Σ_{i=p+1}^{n} Zi) O(h^r),   (4.4)
where Zi = wi Yi and bni = Gh(εi) − E(Gh(εi)). The second term of (4.4) is O(√(log n / n)) a.s. by ergodicity. Turning to the first term, we suppose without loss of generality that Yi is yi−1. Let Zi′ = Zi I(|Zi|² ≤ n/log n) and Zi″ = Zi − Zi′; then |Zi″| ≤ |Zi″|² √(log n / n). Thus, we have

(1/√(n log n)) |Σ_{i=p+1}^{n} Zi″ bni| ≤ (M/√(n log n)) √(log n / n) Σ_{i=p+1}^{n} |Zi″|²
  ≤ (M/(n − p)) Σ_{i=p+1}^{n} Zi² I(Zi² > i/log i)
  = (M/(n − p)) Σ_{i=p+1}^{n} Zi² I(Zi² > N) + (M/(n − p)) Σ_{i=p+1}^{n} Zi² I(i/log i < Zi² ≤ N).

The first term in the last expression converges to M E[Zi² I(Zi² > N)] a.s. by ergodicity, and the second term clearly converges to 0. Then

lim sup_{n→∞} (1/√(n log n)) |Σ_{i=p+1}^{n} Zi″ bni| ≤ lim_{N→∞} M E[Zi² I(Zi² > N)] = 0 a.s.   (4.5)
Therefore, it suffices to show that for some A > 0,

P{ |Σ_{i=p+1}^{n} Zi′ bni| > A√(n log n), i.o. } = 0.   (4.6)
Note that for each n ≥ p + 1, {Zi′ bni, Fi, p + 1 ≤ i ≤ n} is a sequence of martingale differences with |Zi′ bni| ≤ M√(n/log n). For some C0 > 0, by ergodicity, we have

Vn² = Σ_{i=p+1}^{n} E{(Zi′ bni)² | Fi−1} = Σ_{i=p+1}^{n} Zi′² E(bni)² ≤ M² Σ_{i=p+1}^{n} Zi² ≤ M²(E(Zi²) + C0) n   a.s.
Set y = M²(E(Zi²) + C0)n. By Theorem 1.2A in de la Peña (1999), for all A > 0, we have

P{ |Σ_{i=p+1}^{n} Zi′ bni| > A√(n log n) } = P{ |Σ_{i=p+1}^{n} Zi′ bni| > A√(n log n), Vn² < y for some n }
  ≤ 2 exp{ −A² n log n / [2(y + M√(n/log n) A√(n log n))] }
  = 2 exp{ −A² log n / [2M²(E(Zi²) + C0) + 2MA] }.

Hence, (4.6) follows from the Borel–Cantelli lemma by choosing A such that A² > 2M²(E(Zi²) + C0) + 2MA. This completes the proof.

Lemma 3. Under assumptions (A1)–(A5), if E(wt ‖Yt‖²)^{1+a} < ∞ for some 2γ/(1 − γ) < a < 1, then as n → ∞,
∂Qnh(φ0)/∂φ^τ = −2f(0)Σ + o(1)   a.s.

Proof of Lemma 3. We may write

∂Qnh(φ0)/∂φ^τ = −(1/((n − p)h)) Σ_{i=p+1}^{n} wi Yi Yi^τ [K(εi/h) + K(−εi/h)]
  = −(1/((n − p)h)) Σ_{i=p+1}^{n} Ti cni − ((1/(n − p)) Σ_{i=p+1}^{n} Ti) (1/h) E[K(εi/h) + K(−εi/h)]
  = −(1/((n − p)h)) Σ_{i=p+1}^{n} Ti cni − ((1/(n − p)) Σ_{i=p+1}^{n} Ti) [2f(0) + o(h^{r−1})],   (4.7)
where Ti = wi Yi Yi^τ and cni = K(εi/h) + K(−εi/h) − E[K(εi/h) + K(−εi/h)]. The second term of (4.7) converges to −2f(0)Σ a.s. by ergodicity. It suffices to show that the first term is o(1) a.s. Without loss of generality we suppose that Yi is yi−1. Let Ti′ = Ti I(|Ti| ≤ n^{γ/a}) and Ti″ = Ti − Ti′; then |Ti″|/h ≤ |Ti″|^{1+a}. Similar to the proof of (4.5), we have
|(1/((n − p)h)) Σ_{i=p+1}^{n} Ti″ cni| ≤ (M/(n − p)) Σ_{i=p+1}^{n} |Ti″|^{1+a} → 0   a.s.   (4.8)

Therefore, it suffices to show that (nh)⁻¹ Σ_{i=p+1}^{n} Ti′ cni = o(1) a.s. Note that for each n ≥ p + 1, {Ti′ cni, Fi, p + 1 ≤ i ≤ n} is a sequence of martingale differences with |Ti′ cni| ≤ M n^{γ/a}, and

Vn² = Σ_{i=p+1}^{n} E{(Ti′ cni)² | Fi−1} = Σ_{i=p+1}^{n} Ti′² E(cni)² ≤ n (n^{γ/a})² (f(0)Ch + O(h³))   a.s.,
where C > 0 is a constant. Set y = n(n^{γ/a})²(f(0)Ch + O(h³)). By Theorem 1.2A in de la Peña (1999), for all ε > 0, we have

P{ |Σ_{i=p+1}^{n} Ti′ cni| > (nh)ε } = P{ |Σ_{i=p+1}^{n} Ti′ cni| > (nh)ε, Vn² < y for some n }
  ≤ 2 exp{ −(nh)²ε² / [2(y + M n^{γ/a} (nh)ε)] }
  = 2 exp{ −n^{1−γ−2γ/a} ε² / [2(f(0)C + O(h²) + M n^{−γ/a} ε)] }.

The result follows from the Borel–Cantelli lemma.
Lemma 4. Under assumptions (A1)–(A4), if E(wi² ‖Yi‖²) < ∞, then as n → ∞, Qnh(φ0) − Qn0(φ0) = op(n^{−1/2}).

Proof of Lemma 4. We may write

Qnh(φ0) − Qn0(φ0) = (1/(n − p)) Σ_{i=p+1}^{n} Zi eni + ((1/(n − p)) Σ_{i=p+1}^{n} wi Yi) E(Gh(εi) − sgn(εi))
  = (1/(n − p)) Σ_{i=p+1}^{n} Zi eni + ((1/(n − p)) Σ_{i=p+1}^{n} wi Yi) O(h^r),   (4.9)
where Zi = wi Yi and eni = Gh(εi) − sgn(εi) − E(Gh(εi) − sgn(εi)). Since the second term of (4.9) is op(n^{−1/2}) by ergodicity, it suffices to show that the first term is also op(n^{−1/2}). Note that for each n ≥ p + 1, {Zi eni, Fi, p + 1 ≤ i ≤ n} is a stationary sequence of martingale differences with E(eni)² = O(h). The result follows from the Chebyshev inequality.

Proof of Theorem 2. For φ ∈ {φ : ‖φ − φ0‖ ≤ dn}, by Taylor expansion,

Qnh(φ) = Qnh(φ0) + (∂Qnh(φ0)/∂φ^τ)(φ − φ0) + (1/2) Σ_{j,k=0}^{p} (∂²Qnh(φ0)/∂ϕj∂ϕk)(ϕj − ϕj0)(ϕk − ϕk0)
  + (1/6) Σ_{j,k,l=0}^{p} (∂³Qnh(φ*)/∂ϕj∂ϕk∂ϕl)(ϕj − ϕj0)(ϕk − ϕk0)(ϕl − ϕl0),
where φ* lies between φ0 and φ. Denote yt−0 = 1 and Kth = −K″((yt − φ*^τ Yt)/h) − K″(−(yt − φ*^τ Yt)/h). Then the final term on the right-hand side can be written as

(1/6) Σ_{j,k,l=0}^{p} (ϕj − ϕj0)(ϕk − ϕk0)(ϕl − ϕl0) (1/h³) ((1/(n − p)) Σ_{t=p+1}^{n} wt Yt yt−j yt−k yt−l Kth),

which is o(δn) a.s., where δn = ‖φ − φ0‖, because |Kth| ≤ M and dn²/h³ = 1/n^{2α−3γ} → 0. The third term can be written as

(1/2) Σ_{j,k=0}^{p} (ϕj − ϕj0)(ϕk − ϕk0) (1/h) { (1/((n − p)h)) Σ_{t=p+1}^{n} wt Yt yt−j yt−k [K′(εt/h) − K′(−εt/h)] },

which is also o(δn) a.s., because dn/h = 1/n^{α−γ} → 0 and

(1/((n − p)h)) Σ_{t=p+1}^{n} wt Yt yt−j yt−k [K′(εt/h) − K′(−εt/h)] = o(1)   a.s.

by a proof similar to that of Lemma 3. Therefore,

Qnh(φ) = Qnh(φ0) + (∂Qnh(φ0)/∂φ^τ)(φ − φ0) + o(δn)   a.s.,   (4.10)
uniformly over φ ∈ {φ : ‖φ − φ0‖ ≤ dn}. Write φ = φ0 + u dn for φ ∈ {φ : ‖φ − φ0‖ = dn}, where ‖u‖ = 1. We now give a lower bound for lh(φ) on the surface of the ball. Similar to Owen (1990), by Lemmas 2 and 3, we have

lh(φ) = (n − p) Qnh(φ)^τ S(φ)⁻¹ Qnh(φ) + o(n^{4/3−3α})   a.s.
  = (n − p) [O(√(log n/n)) + (−2f(0))Σ u dn + o(dn)]^τ Ω⁻¹ [O(√(log n/n)) + (−2f(0))Σ u dn + o(dn)] + o(n^{4/3−3α})   a.s.
  ≥ (c − ε) n^{1−2α}   a.s.,

where c − ε > 0 and c is the smallest eigenvalue of 4f²(0) Σ^τ Ω⁻¹ Σ. Similarly,

lh(φ0) = (n − p) Qnh(φ0)^τ S(φ0)⁻¹ Qnh(φ0) + o(1) = O(log n)   a.s.

Since lh(φ) is a continuous function of φ on the ball ‖φ − φ0‖ ≤ dn, lh(φ) attains its minimum value at some point φ̂_SEL in the interior of this ball, and φ̂_SEL satisfies ∂lh(φ̂_SEL)/∂φ = 0. It follows that Qnh(φ̂_SEL) = 0. By (4.10), expanding Qnh(φ̂_SEL) at φ0 gives
0 = Qnh(φ0) + (∂Qnh(φ0)/∂φ^τ)(φ̂_SEL − φ0) + op(δn),   (4.11)
where δn = ‖φ̂_SEL − φ0‖. By Lemma 3, we have

φ̂_SEL − φ0 = (1/(2f(0))) Σ⁻¹ Qnh(φ0) + op(δn).
From this and Qnh(φ0) = Op(n^{−1/2}), we know that δn = Op(n^{−1/2}). Now we have

√n(φ̂_SEL − φ0) = (1/(2f(0))) Σ⁻¹ √n Qnh(φ0) + op(1).   (4.12)
On the other hand, let

Ln(u) = Σ_{t=p+1}^{n} wt (|εt − n^{−1/2} u^τ Yt| − |εt|),

where u ∈ R^{p+1}. Then √n(φ̂_LAD − φ0) is the minimizer of Ln(u) on R^{p+1}. From (A.3) in Ling (2005), we have

Ln(u) = −u^τ √n Qn0(φ0) + f(0) u^τ Σ u + Rn(u),
where Rn(u) = op(1) for each u ∈ R^{p+1}. By the Basic Corollary in Hjort and Pollard (1993), we have

√n(φ̂_LAD − φ0) = (1/(2f(0))) Σ⁻¹ √n Qn0(φ0) + op(1).   (4.13)
Combining (4.12) with (4.13), we have

√n(φ̂_SEL − φ̂_LAD) = (1/(2f(0))) Σ⁻¹ √n [Qnh(φ0) − Qn0(φ0)] + op(1).

The result follows from Lemma 4.
Acknowledgements

The authors are grateful to the referees for their helpful comments and suggestions. The third author's research was supported by NSFC (10731010).

References

Brockwell, P.J., Davis, R.A., 1991. Time Series: Theory and Methods, 2nd ed. Springer-Verlag, New York.
Chan, N.H., Ling, S., 2006. Empirical likelihood for GARCH models. Econometric Theory 22, 403–428.
Chuang, C.S., Chan, N.H., 2002. Empirical likelihood for autoregressive models with applications to unstable time series. Statist. Sinica 12, 387–407.
Davis, R.A., Knight, K., Liu, J., 1992. M-estimation for autoregressions with infinite variance. Stochastic Process. Appl. 40, 145–180.
de la Peña, V.H., 1999. A general class of exponential inequalities for martingales and ratios. Ann. Probab. 27, 1537–1564.
Hall, P., Heyde, C.C., 1980. Martingale Limit Theory and Its Application. Academic Press, New York.
Hjort, N.L., Pollard, D., 1993. Asymptotics for minimisers of convex processes. Available at: http://www.stat.yale.edu/~pollard/papers/convex.pdf.
Huber, P.J., 1977. Robust Statistical Procedures. Society for Industrial and Applied Mathematics, Philadelphia.
Ling, S., 2005. Self-weighted least absolute deviation estimation for infinite variance autoregressive models. J. R. Stat. Soc. Ser. B 67, 381–393.
Monti, A.C., 1997. Empirical likelihood confidence regions in time series models. Biometrika 84, 395–405.
Owen, A.B., 1988. Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75, 237–249.
Owen, A.B., 1990. Empirical likelihood ratio confidence regions. Ann. Statist. 18, 90–120.
Silverman, B.W., 1986. Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.