Statistics and Probability Letters 82 (2012) 949–958
On tail index estimation using a sample with missing observations
Ivana Ilić
University of Nish, Medical Faculty, Bulevar Dr. Zorana Djindjica 81, 18000 Nis, Serbia
Article history: Received 29 May 2011; received in revised form 16 January 2012; accepted 16 January 2012; available online 23 January 2012.
Keywords: Missing observations; Extremal dependence; Near epoch dependence; Parameter estimation; Tail indices
Abstract
For a sequence of heavy-tailed, dependent and heterogeneous random variables with missing observations, the estimation of the tail index is considered. Under a minimal but verifiable assumption of "extremal dependence" we prove the consistency of a geometric-type estimator (Brito and Freitas, 2003). We extend the results of Mladenović and Piterbarg (2008) and prove the consistency and the asymptotic normality of the Hill estimator. Illustrative examples are provided. © 2012 Elsevier B.V. All rights reserved.
1. Introduction

Missing data is a common problem in statistical analysis. The presence of missing values in a dataset can seriously affect the accuracy of statistical research. Missing data may occur in survey research if the data collection was not done properly, or when some data for a respondent are unknown because of a refusal to provide the response or a failure to collect it. Various fields of research, such as economics, biometrics, marketing, psychology, social sciences and medicine, have begun to investigate different strategies for dealing with missing data (Graham, 2009). Recently, various papers have appeared which investigate practical missing data issues using real-life examples: Kline and Santos (2010), Mladenović and Petrovic (2010), Wang and Luo (2011), Koopman et al. (2007) and Prevosti and Chemisquy (2009).

Suppose $\{X_t\} = \{X_t : 1 \le t \le n\}$ is a sequence of rv's with the same distribution function $F$ with a "heavy" tail satisfying
\[
1 - F(x) = x^{-\alpha} L(x), \quad x > 0, \tag{1.1}
\]
where $\alpha > 0$ denotes the index of regular variation and $L$ is slowly varying at infinity, that is
\[
\lim_{t \to \infty} \frac{L(tx)}{L(t)} = 1 \quad \text{for } x > 0. \tag{1.2}
\]
E-mail address: [email protected]. doi:10.1016/j.spl.2012.01.014

These distributions often occur in a wide variety of domains such as insurance, business, finance, industry, telecommunications, traffic, economics, sociology and geology. The tail thickness has been rigorously studied, and different estimators of $\alpha$ have been suggested and studied in the case of complete samples. Probably the most popular is the Hill estimator of the parameter $1/\alpha$, defined as follows (Hill, 1975):
\[
H_{k,n} = \frac{1}{k} \sum_{t=1}^{k} \ln X_{(t)} - \ln X_{(k+1)}, \tag{1.3}
\]
where $X_{(1)} > X_{(2)} > \cdots > X_{(n)}$ is a sequence of order statistics and $k = k_n$ is a sequence of positive integers satisfying $1 \le k_n < n$, $\lim_{n\to\infty} k_n = \infty$ and $\lim_{n\to\infty} k_n/n = 0$. The Hill estimator has been well studied when the $X_t$ are i.i.d. (see Davis and Resnick (1984), Haeusler and Teugels (1985) and Dekkers et al. (1989)). Other estimators of $\alpha$ in i.i.d. settings were also studied in the case of complete samples; see Dekkers et al. (1989), Bacro and Brito (1995) and Csörgő and Viharos (1998). See also Hsing (1991), Ling and Peng (2004) and Hill (2010) and the extensive list of citations therein for the estimation of the tail index using a full sample of possibly dependent data. The results of Hsing (1991) were extended to an incomplete sample of possibly dependent variables; see Mladenović and Piterbarg (2008).

Brito and Freitas (2003) have suggested a geometric-type estimator of $\alpha$, $R(k_n)$, which is related to the least squares estimators $R_1(k_n)$ and $R_3(k_n)$ proposed in Schultze and Steinebach (1996). $R(k_n)$ is defined by:
\[
R(k_n) = \frac{\displaystyle \sum_{t=1}^{k_n} \ln^2(n/t) - \frac{1}{k_n}\left(\sum_{t=1}^{k_n} \ln(n/t)\right)^2}{\displaystyle \sum_{t=1}^{k_n} \ln^2 X_{(t)} - \frac{1}{k_n}\left(\sum_{t=1}^{k_n} \ln X_{(t)}\right)^2}. \tag{1.4}
\]
This estimator is, in some situations, more "robust" against deviations of the slowly varying function $L$ from a constant (see Csörgő and Viharos (1998)). Using a model similar to that of Mladenović and Piterbarg (2008), who established the consistency of the Hill estimator based on an incomplete sample of possibly dependent rv's (the authors basically exploit Hsing (1991, Theorem 2.2)), we prove the consistency and the asymptotic normality of the Hill estimator and the consistency of the geometric-type estimator for processes whose extremes are NED (near epoch dependent) on some arbitrary mixing functional. This property has substantial practical advantages because it only requires computation of a conditional expectation and is easy to verify. It also characterizes a massive array of stochastic processes. We do not require stationarity in general, and our results cover, for example, processes satisfying Hsing's mixing condition, nonlinear distributed lag processes, strong mixing GARCH, explosive GARCH, and much more. See Hill (2010), who expedites the theory by analyzing processes whose extremal support is NED and proves that a broad class of processes has this property.

2. Preliminaries: a sample with missing observations

If condition (1.1) is satisfied one can easily prove that (see Leadbetter et al. (1983, Theorems 1.5.1 and 1.7.3)):
\[
\lim_{x \to \infty} \frac{1 - F(x)}{1 - F(x - 0)} = 1, \tag{2.1}
\]
\[
1 - F\left(F^{-1}\left(1 - \frac{1}{t}\right)\right) \sim \frac{1}{t} \quad \text{as } t \to \infty, \tag{2.2}
\]
where $F^{-1}(y) := \inf\{x : F(x) \ge y\}$ is the left continuous inverse of the function $F$. Since any inference about $\alpha$ should be made with the tail portion of the empirical distribution of the sample, without loss of generality we may assume that $F$ has support on $(0, \infty)$.

Now, assume that only observations at certain points are available. Denote the observed random variables among $\{X_1, \ldots, X_n\}$ by $\tilde X_1, \ldots, \tilde X_{S_n}$. Here the random variable $S_n$ represents the number of observed rv's among the first $n$ terms of the sequence $\{X_t\}$. An incomplete sample may be obtained, for example, if every term of $\{X_t\}$ is observed with probability $p$, independently of the other terms; in this case $S_n$ is a binomial random variable. We shall, however, assume that the observed random variables are determined by a general point process, and only conditions on $S_n$ will be imposed. This model was considered in Mladenović and Piterbarg (2008), where the consistency of Hill's estimator was proved.

Assumption A. The sequence $X_1, X_2, \ldots$ does not depend on $S_n$ and
\[
\frac{S_n}{n} \xrightarrow{p} c_0 > 0 \quad \text{as } n \to +\infty.
\]
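Suppose $\beta_n$ is a sequence of real numbers such that $\lim_{n\to\infty} \beta_n = \infty$ and $\lim_{n\to\infty} \beta_n/n = 0$. Let
\[
M_n = \left[\frac{S_n}{\beta_n}\right] \quad \text{and} \quad B_n = \begin{cases} 0, & S_n = 0, \\ M_n/S_n, & S_n \ge 1, \end{cases}
\]
where $[\cdot]$ denotes the floor function, that is, the greatest integer not exceeding its argument. We are interested in the estimation of $\alpha$ using some portion of the sample. Let $X_{(1)} \ge X_{(2)} \ge \cdots \ge X_{(S_n)}$ be the order statistics defined by the $S_n$ observed variables. Let us denote
\[
x_+ = \max(x, 0), \quad x_- = \max(-x, 0). \tag{2.3}
\]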
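The Hill estimator is given by
\[
H_{S_n} = I\{S_n \ge \beta_n\} \left( \frac{1}{M_n} \sum_{t=1}^{M_n} \ln X_{(t)} - \ln X_{(M_n+1)} \right)
\]
and the geometric-type estimator by
\[
R(S_n) = I\{S_n \ge \beta_n\} \, \frac{\displaystyle \sum_{t=1}^{M_n} \ln^2(S_n/t) - \frac{1}{M_n}\left(\sum_{t=1}^{M_n} \ln(S_n/t)\right)^2}{\displaystyle \sum_{t=1}^{M_n} \ln^2 X_{(t)} - \frac{1}{M_n}\left(\sum_{t=1}^{M_n} \ln X_{(t)}\right)^2}. \tag{2.4}
\]
Let us also define the next two quantities:
\[
\tilde H_{S_n} = I\{S_n \ge \beta_n\} \left( \frac{1}{M_n} \sum_{t=1}^{M_n} \ln X_{(t)} - \ln F^{-1}(1 - B_n) \right)
\]
and
\[
H^+_{S_n} = I\{S_n \ge \beta_n\} \, \frac{1}{M_n} \sum_{t=1}^{S_n} \left(\ln \tilde X_t - \ln F^{-1}(1 - B_n)\right)_+ .
\]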
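The three quantities defined above can be compared on simulated data. The following is only a minimal sketch under stated assumptions: the sample is exact Pareto (so that $F^{-1}$ is known in closed form), every term is observed independently with probability `p`, and $\beta_n = \sqrt{n}$. The function names `hill_with_missing` and `demo`, and all parameter values, are hypothetical choices made for illustration and are not taken from the paper.

```python
import math
import random


def hill_with_missing(x, observed, beta_n, quantile):
    """Return (H, H_tilde, H_plus) from the observed part of the sample x.

    x        : full sample X_1..X_n of positive values
    observed : booleans, True where X_t was observed
    beta_n   : value of the sequence beta_n (assumed > 1 here)
    quantile : callable p -> F^{-1}(p); known only because we simulate
    """
    obs = [xi for xi, keep in zip(x, observed) if keep]
    s_n = len(obs)
    if s_n < beta_n:                       # indicator I{S_n >= beta_n} equals 0
        return 0.0, 0.0, 0.0
    m_n = int(s_n // beta_n)               # M_n = [S_n / beta_n]
    b_n = m_n / s_n                        # B_n = M_n / S_n
    order = sorted(obs, reverse=True)      # X_(1) >= X_(2) >= ...
    log_q = math.log(quantile(1.0 - b_n))  # ln F^{-1}(1 - B_n)
    mean_top = sum(math.log(v) for v in order[:m_n]) / m_n
    h = mean_top - math.log(order[m_n])    # order[m_n] is X_(M_n + 1)
    h_tilde = mean_top - log_q
    h_plus = sum(max(math.log(v) - log_q, 0.0) for v in obs) / m_n
    return h, h_tilde, h_plus


def demo(n=20000, alpha=2.0, p=0.7, seed=0):
    """Pareto(alpha) sample, each term observed independently with probability p."""
    rng = random.Random(seed)
    x = [rng.paretovariate(alpha) for _ in range(n)]
    observed = [rng.random() < p for _ in range(n)]
    quantile = lambda q: (1.0 - q) ** (-1.0 / alpha)  # inverse of 1 - x^(-alpha)
    return hill_with_missing(x, observed, math.sqrt(n), quantile)
```

Calling `demo()` returns the triple for one simulated path; under these assumptions all three values should lie near $1/\alpha = 0.5$. Only the first of the three is computable from data alone, since the other two require the unknown quantile $F^{-1}(1 - B_n)$.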
According to Mladenović and Piterbarg (2008), all three variables $H_{S_n}$, $\tilde H_{S_n}$ and $H^+_{S_n}$ have the same asymptotic behavior in distribution. In order to demonstrate the asymptotic properties of $H_{S_n}$ and $R(S_n)$ we appeal to the concepts in Hill (2010, Theorem 1). He imposes new tail dependence properties on $\{I(X_t > b_{k_n} e^u)\}$ (where $b_{k_n} := F^{-1}(1 - k_n/n)$) that cover and generalize Hsing's mixing condition (see Hsing (1991)). Let $\Im_t := \sigma(\epsilon_\tau : \tau \le t)$ be a $\sigma$-field induced by some $\alpha$-mixing base $\epsilon_t$. Let $q_n$ be an arbitrary sequence of integers satisfying $1 \le q_n < n$ and $q_n \to \infty$. For example, we may assume $\epsilon_t = I(X_t > b_{k_n} e^u)$ and impose a mixing condition on $\epsilon_t$ as in Hsing (1991).

$L_p$-Extremal-Near Epoch Dependence ($L_p$-E-NED) property. $\{X_t\}$ is $L_p$-E-NED on $\{\Im_t\}$, $p > 0$, with size $\lambda > 0$ if
\[
\left\| I(X_t > b_{k_n} e^u) - P\left(X_t > b_{k_n} e^u \mid \Im_{t-q_n}^{t+q_n}\right) \right\|_p \le f_{nt}(u) \times \psi_{q_n},
\]
where $\psi_{q_n} = o(q_n^{-\lambda})$ and $f_{nt} : \mathbb{R}_+ \to \mathbb{R}_+$ is Lebesgue measurable, $\sup_{1 \le t \le n} \sup_{u \ge 0} f_{nt}(u) = O((k_n/n)^{1/p})$. See Hill (2010, p. 1402).
Assumption B. $\{X_t\}$ is $L_2$-E-NED on $\{\Im_t\}$, with coefficients $\psi_{q_n}$ of size $1/2$ and constants $f_{nt}(u)$, where $f_{nt} : \mathbb{R}_+ \to \mathbb{R}_+$ is Lebesgue measurable and $\sup_{1 \le t \le n} \int_0^\infty f_{nt}(u)\,du = O((k_n/n)^{1/2})$.

E-NED covers NED and $\alpha$-mixing, and covers nonlinear AR-GARCH and stochastic volatility (Hill, 2010, 2011).

Remark 1. Note that, if the sequence $\{X_t\}$ is $\alpha$-mixing, then so is $\{\tilde X_t\}$ because
\[
\sup_{s \in \mathbb{Z}} \; \sup_{A \in F_{-\infty}^{s},\, B \in F_{s+d}^{+\infty}} |P(A \cap B) - P(A)P(B)| \le \sup_{t \in \mathbb{Z}} \; \sup_{A \in \Im_{-\infty}^{t},\, B \in \Im_{t+d}^{+\infty}} |P(A \cap B) - P(A)P(B)|,
\]
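where $F_s := \sigma(\tilde\epsilon_\tau : \tau \le s)$ and $\tilde\epsilon_s$ is some functional of $\tilde X_s$.

3. Main results

Let
\[
Y_{nt} = \left(\ln \tilde X_t - \ln F^{-1}(1 - B_n)\right)_+ \quad \text{and} \quad I_{nt} = I\left\{\ln \tilde X_t - \ln F^{-1}(1 - \rho B_n) > \varepsilon\right\}
\]
for $\varepsilon \in \mathbb{R}$ and $\rho \in J$, where $J$ is some neighborhood of 1. $I\{\cdot\}$ denotes the indicator function.

Lemma 1. If Assumption A holds, then for any integer $k > 0$:
\[
\lim_{n \to \infty} \beta_n E\left(\ln \tilde X_t - \ln F^{-1}(1 - B_n)\right)_+^k = \frac{k!}{\alpha^k}.
\]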
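The limit in Lemma 1 can be checked numerically in the simplest case. The sketch below is an illustration under simplifying assumptions that are not part of the lemma: $F$ is exact Pareto, $F(x) = 1 - x^{-\alpha}$ for $x \ge 1$, and the random level $B_n$ is replaced by the deterministic value $1/\beta_n$. The function names are hypothetical.

```python
import math
import random


def scaled_tail_moment(alpha, k, beta_n, n=200_000, seed=0):
    """Monte Carlo estimate of beta_n * E[(ln X - ln q)_+^k], where
    X ~ Pareto(alpha) with F(x) = 1 - x^(-alpha), x >= 1, and
    q = F^{-1}(1 - 1/beta_n) = beta_n^(1/alpha).
    """
    rng = random.Random(seed)
    log_q = math.log(beta_n) / alpha
    total = 0.0
    for _ in range(n):
        excess = math.log(rng.paretovariate(alpha)) - log_q
        if excess > 0.0:                 # positive part (.)_+
            total += excess ** k
    return beta_n * total / n


def lemma1_limit(alpha, k):
    """The limit k! / alpha^k stated in Lemma 1."""
    return math.factorial(k) / alpha ** k
```

For instance, `scaled_tail_moment(2.0, 1, 20.0)` can be compared with `lemma1_limit(2.0, 1)`; in the exact Pareto case the identity holds for every $\beta_n > 1$, so any discrepancy is Monte Carlo error only.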
Remark 2. Note that if $B_n = 0$, then $\ln F^{-1}(1 - B_n) = +\infty$ and according to (2.3) the random variable $Y_{nt}$ takes the value 0.

Theorem 1. Assume that $F$ satisfies (1.1). Suppose that Assumptions A and B hold. Then all three quantities $H_{S_n}$, $H^+_{S_n}$ and $\tilde H_{S_n}$ converge to $\alpha^{-1}$ in probability.
Lemma 2. Consider the sequence of random variables
\[
t(M_n) := \frac{1}{M_n} \sum_{t=1}^{M_n} \ln^2(S_n/t) - \left( \frac{1}{M_n} \sum_{t=1}^{M_n} \ln(S_n/t) \right)^2,
\]
where $M_n$ is defined in the previous section. Then $E(t(M_n)) \to 1$ as $n \to \infty$.

Theorem 2. Assume that $F$ satisfies (1.1). Suppose that both Assumptions A and B hold. Then $R(S_n)$ converges to $\alpha^{-1}$ in probability.

Assumption C. There is a positive measurable function $g$ on $(0, \infty)$ such that for any $\lambda > 0$: $L(\lambda x)/L(x) - 1 = O(g(x))$ as $x \to \infty$. Also, there are $D > 0$, $z_0 < \infty$ and $\tau \le 0$ such that $g(\lambda z)/g(z) \le D\lambda^\tau$ for some $\lambda \ge 1$ and $z \ge z_0$. We require $k_n$, $b_{k_n}$ and $g$ to satisfy $k_n^{1/2} g(b_{k_n}) \to 0$.

Theorem 3. Suppose that Assumptions A–C hold. Then:
\[
M_n^{1/2} (H_{S_n} - \alpha^{-1}) / \sigma_{M_n} \xrightarrow{d} N(0, 1),
\]
where $\sigma^2_{M_n} = E\left(M_n^{1/2}(H_{S_n} - \alpha^{-1})\right)^2 = O(1)$. Also,
\[
\sigma^2_{M_n} - E\left( I\{S_n \ge \beta_n\} \frac{1}{M_n^{1/2}} \sum_{t=1}^{S_n} \left\{ Y_t - E(Y_t) - \alpha^{-1}\left(Y_t^{\zeta} - E(Y_t^{\zeta})\right) \right\} \right)^2 \to 0,
\]
where $Y_t^{\zeta} = I\left\{\ln \tilde X_t - \ln F^{-1}(1 - B_n) > \zeta/\sqrt{M_n}\right\}$ and $\zeta \in \mathbb{R}$.
Example 1. One model which is often used in applications is the GARCH(1, 1) process. Under very general conditions on the noise sequence $(Z_t)$, the GARCH(1, 1) process has a Pareto-like marginal distribution and represents a very attractive tool for modeling the heavier-than-normal tails of financial data. It is defined by $X_t = \sigma_t Z_t$, with $\sigma_t$ specified as follows:
\[
\sigma_t^2 = \alpha_0 + \beta_1 \sigma_{t-1}^2 + \alpha_1 X_{t-1}^2 = \alpha_0 + \sigma_{t-1}^2 (\beta_1 + \alpha_1 Z_{t-1}^2), \quad t \in \mathbb{Z},
\]
where the parameters $\alpha_0$, $\alpha_1$ and $\beta_1$ are nonnegative. The GARCH(1, 1) process is $L_2$-NED if
\[
\beta_1^2 + 2\alpha_1\beta_1 + 3\alpha_1^2 < 1
\]
when $Z_t$ is i.i.d. normal (see Davidson (2004)). Also, Hill (2005, Lemma 7) shows that processes with regularly varying tails which are $L_2$-NED also satisfy the $L_2$-E-NED property. Moreover, since GARCH(1, 1) processes are strongly mixing with a geometric rate (provided $Z$ has a density and $E|Z|^\varepsilon < \infty$ for some $\varepsilon > 0$), they are automatically E-NED on themselves.

Example 2. Another model which is also used in applications is a nonlinear autoregressive (AR) model. For the nonlinear models $y_t = f(y_{t-1}) + \varepsilon_t$ it is possible to show $L_2$-NED for any sequence of rv's that satisfies the model for $t = 1, \ldots, n$ and has an arbitrary starting value $y_0$ such that $E(y_0)^2 < \infty$. Such results can be derived in the case where
\[
|f(x) - f(y)| \le L|x - y|,
\]
where $f(\cdot)$ is Borel measurable and $L < 1$ (see Tjostheim (1990)).
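As an illustration of Example 1, the recursion for $\sigma_t^2$ and the $L_2$-NED condition for Gaussian noise can be sketched in code. This is a minimal sketch, not the author's procedure: the function names, the starting value, the burn-in length and the Bernoulli thinning used to produce missing observations are all assumptions made for the example.

```python
import math
import random


def garch11_l2_ned_condition(alpha1, beta1):
    """Check beta1^2 + 2*alpha1*beta1 + 3*alpha1^2 < 1 (Gaussian noise case)."""
    return beta1 ** 2 + 2.0 * alpha1 * beta1 + 3.0 * alpha1 ** 2 < 1.0


def simulate_garch11(n, alpha0, alpha1, beta1, seed=0, burn_in=500):
    """Simulate X_t = sigma_t * Z_t with
    sigma_t^2 = alpha0 + beta1 * sigma_{t-1}^2 + alpha1 * X_{t-1}^2
    and Z_t i.i.d. standard normal. The start value and burn-in length
    are illustrative choices, not taken from the paper.
    """
    rng = random.Random(seed)
    sigma2 = alpha0                      # arbitrary positive starting value
    x = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
    path = []
    for t in range(n + burn_in):
        sigma2 = alpha0 + beta1 * sigma2 + alpha1 * x * x
        x = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        if t >= burn_in:                 # discard the burn-in segment
            path.append(x)
    return path


def thin(path, p, seed=1):
    """Keep each term independently with probability p (binomial S_n)."""
    rng = random.Random(seed)
    return [v for v in path if rng.random() < p]
```

For example, `garch11_l2_ned_condition(0.1, 0.8)` holds, while `garch11_l2_ned_condition(0.3, 0.7)` does not. Since the estimators above are defined for positive variables, a tail-index estimate would be computed from the absolute values of the thinned path.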
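4. Proofs

Proof of Lemma 1. Note that the following equalities hold:
\[
\begin{aligned}
E\left(\ln \tilde X_t - \ln F^{-1}(1 - B_n)\right)_+^k
&= \int_0^\infty P\left\{ \left(\ln \tilde X_t - \ln F^{-1}(1 - B_n)\right)_+^k > u \right\} du \\
&= \int_0^\infty P\left\{ \left(\ln \frac{\tilde X_t}{F^{-1}(1 - B_n)}\right)_+^k > u \right\} du \\
&= \int_0^\infty P\left\{ \frac{\tilde X_t}{F^{-1}(1 - B_n)} > e^{u^{1/k}} \right\} du.
\end{aligned}
\]
Since
\[
\left\{ e^{u^{1/k}} < \frac{\tilde X_t}{F^{-1}(1 - B_n)} \right\} \subset \left\{ e^{u^{1/k}} < \frac{\tilde X_t}{F^{-1}(1 - 1/\beta_n)} \right\},
\]
we obtain that
\[
\begin{aligned}
E\left(\ln \tilde X_t - \ln F^{-1}(1 - B_n)\right)_+^k
&\le \int_0^\infty P\left\{ \frac{\tilde X_t}{F^{-1}(1 - 1/\beta_n)} > e^{u^{1/k}} \right\} du \\
&= \int_0^\infty P\left\{ \tilde X_t > e^{u^{1/k}} F^{-1}(1 - 1/\beta_n) \right\} du \\
&= \int_0^\infty \left( 1 - F\left(e^{u^{1/k}} F^{-1}(1 - 1/\beta_n)\right) \right) du \\
&= \left\{1 - F\left(F^{-1}(1 - 1/\beta_n)\right)\right\} \int_0^\infty \frac{1 - F\left(e^{u^{1/k}} F^{-1}(1 - 1/\beta_n)\right)}{1 - F\left(F^{-1}(1 - 1/\beta_n)\right)} \, du \\
&\sim \frac{1}{\beta_n} \int_0^\infty e^{-\alpha u^{1/k}} du = \frac{1}{\beta_n} \frac{k!}{\alpha^k}.
\end{aligned}
\]
Let us denote $J_\varepsilon = [n(c_0 - \varepsilon), n]$ and $k_n = [n\beta_n^{-1}(c_0 - \varepsilon)]$, where $0 < \varepsilon < c_0$. We have that
\[
\{S_n \ge n(c_0 - \varepsilon)\} \subset \{S_n \ge k_n \beta_n\} \subset \left\{ B_n > \frac{k_n}{k_n + 1} \frac{1}{\beta_n} \right\},
\]
and
\[
\begin{aligned}
\{S_n \in J_\varepsilon\}
&\subset \left\{ B_n > \frac{k_n}{k_n + 1} \frac{1}{\beta_n} \right\} \\
&= \left\{ F^{-1}(1 - B_n) \le F^{-1}\left(1 - \frac{k_n}{k_n + 1} \frac{1}{\beta_n}\right) \right\} \\
&= \left\{ \frac{1}{F^{-1}(1 - B_n)} \ge \frac{1}{F^{-1}\left(1 - \frac{k_n}{k_n + 1} \frac{1}{\beta_n}\right)} \right\}.
\end{aligned}
\]
Consequently,
\[
\left\{ \frac{\tilde X_t}{F^{-1}(1 - B_n)} > e^{u^{1/k}},\; S_n \in J_\varepsilon \right\} \supset \left\{ \frac{\tilde X_t}{F^{-1}\left(1 - \frac{k_n}{k_n + 1} \frac{1}{\beta_n}\right)} > e^{u^{1/k}},\; S_n \in J_\varepsilon \right\}.
\]
Finally we have
\[
\begin{aligned}
E\left(\ln \tilde X_t - \ln F^{-1}(1 - B_n)\right)_+^k
&\ge P\{S_n \in J_\varepsilon\} \int_0^\infty P\left\{ \frac{\tilde X_t}{F^{-1}\left(1 - \frac{k_n}{k_n + 1} \frac{1}{\beta_n}\right)} > e^{u^{1/k}} \right\} du \\
&\sim P\{S_n \in J_\varepsilon\} \, \frac{k_n}{k_n + 1} \frac{1}{\beta_n} \frac{k!}{\alpha^k} \\
&\sim \frac{k_n}{k_n + 1} \frac{1}{\beta_n} \frac{k!}{\alpha^k} \\
&\sim \frac{1}{\beta_n} \frac{k!}{\alpha^k}, \quad n \to \infty.
\end{aligned}
\]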
Before we give the proof of Theorem 1 we introduce the definition of Extremal-Mixingale (E-MIXL) processes.

$L_p$-Extremal-Mixingale ($L_p$-E-MIXL) property. $\{X_t, \Im_t\}$ forms an $L_p$-extremal mixingale array, $p > 0$, with size $\lambda > 0$ if
\[
\left\| P(X_t > b_{k_n} e^u) - P(X_t > b_{k_n} e^u \mid \Im_{t-q_n}) \right\|_p \le e_{nt}(u) \times \varphi_{q_n},
\]
\[
\left\| I(X_t > b_{k_n} e^u) - P(X_t > b_{k_n} e^u \mid \Im_{t+q_n}) \right\|_p \le e_{nt}(u) \times \varphi_{q_n + 1},
\]
where $\varphi_{q_n} = o(q_n^{-\lambda})$ and $e_{nt} : \mathbb{R}_+ \to \mathbb{R}_+$ is Lebesgue measurable, $\sup_{1 \le t \le n} \sup_{u \ge 0} e_{nt}(u) = O((k_n/n)^{1/p})$.
Proof of Theorem 1. By an argument identical to that of Theorem 17.5 in Davidson (1994), it is easy to show that the $L_2$-E-NED assumption ensures the $L_2$-E-MIXL assumption, i.e. in our case $\{X_t\}$ is $L_2$-E-MIXL on $\{\Im_t\}$ with coefficients $\psi_{q_n}$ of size $1/2$ and constants $e_{nt}(u)$, where $e_{nt} : \mathbb{R}_+ \to \mathbb{R}_+$ is Lebesgue measurable and $\sup_{1 \le t \le n} \int_0^\infty e_{nt}(u)\,du = O((k_n/n)^{1/2})$. By arguments similar to those in Hill (2010, Lemma B.1) and by Remark 1, $\{(Y_{nt} - E(Y_{nt})), F_t\}$ and $\{(I_{nt} - E(I_{nt})), F_t\}$, for all $\rho$ in an arbitrary neighborhood of 1, form $L_2$-E-MIXL arrays with size $1/2$ and some constants $\{\tilde e^*_{nt}, e_{nt}(u)\}$. Then, following the proof of Hill (2010, Lemma 1) and using Davidson (1994, Cor. 20.16), we have that
\[
I\{S_n \ge \beta_n\} \frac{1}{M_n} \sum_{t=1}^{S_n} \left(Y_{nt} - E(Y_{nt})\right) \xrightarrow{p} 0
\]
and
\[
I\{S_n \ge \beta_n\} \frac{1}{M_n} \sum_{t=1}^{S_n} \left(I_{nt} - E(I_{nt})\right) \xrightarrow{p} 0
\]
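for all $\varepsilon \in \mathbb{R}$ and $\rho \in J$, where $J$ is some neighborhood of 1. Finally, the conclusion of the theorem is a simple consequence of Theorem 1 from Mladenović and Piterbarg (2008).

Proof of Lemma 2. Notice that
\[
t(M_n) = \frac{1}{M_n} \sum_{t=1}^{M_n} \ln^2(t) - \left( \frac{1}{M_n} \sum_{t=1}^{M_n} \ln(t) \right)^2.
\]
Since
\[
\begin{aligned}
E\left( \frac{1}{M_n} \sum_{t=1}^{M_n} \ln^2(t) \right)
&= E\left( E\left( \frac{1}{M_n} \sum_{t=1}^{M_n} \ln^2(t) \,\Big|\, M_n = m_n \right) \right) \\
&= \sum_{m_n} E\left( \frac{1}{M_n} \sum_{t=1}^{M_n} \ln^2(t) \,\Big|\, M_n = m_n \right) P\{M_n = m_n\} \\
&= \sum_{m_n} \frac{1}{m_n} \sum_{t=1}^{m_n} \ln^2(t) \, P\{M_n = m_n\} \\
&\le \sum_{m_n} \frac{1}{m_n} \int_1^{m_n + 1} \ln^2(t)\,dt \, P\{M_n = m_n\}
\end{aligned}
\]
and
\[
\begin{aligned}
E\left( \frac{1}{M_n} \sum_{t=1}^{M_n} \ln(t) \right)
&= \sum_{m_n} E\left( \frac{1}{M_n} \sum_{t=1}^{M_n} \ln(t) \,\Big|\, M_n = m_n \right) P\{M_n = m_n\} \\
&= \sum_{m_n} \frac{1}{m_n} \sum_{t=1}^{m_n} \ln(t) \, P\{M_n = m_n\} \\
&= \sum_{m_n} \frac{1}{m_n} \sum_{t=2}^{m_n} \ln(t) \, P\{M_n = m_n\} \\
&\ge \sum_{m_n} \frac{1}{m_n} \int_1^{m_n} \ln(t)\,dt \, P\{M_n = m_n\},
\end{aligned}
\]
we have, by a simple calculation, that
\[
\begin{aligned}
E(t(M_n)) &\le \sum_{m_n} \left[ \frac{1}{m_n} \int_1^{m_n + 1} \ln^2(t)\,dt - \frac{1}{m_n^2} \left( \int_1^{m_n} \ln(t)\,dt \right)^2 \right] P\{M_n = m_n\} \\
&\sim \sum_{m_n} P\{M_n = m_n\} = 1,
\end{aligned}
\]
as $n \to \infty$. See Brito and Freitas (2003, Lemma 2).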
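The convergence in Lemma 2 is easy to observe numerically for a fixed (non-random) value of $M_n$. The sketch below evaluates $t(m)$ directly; the function name `t_stat` and its optional argument `s` (standing in for $S_n$) are hypothetical. It also illustrates the first step of the proof: replacing $\ln(S_n/t)$ by $\ln(t)$ only shifts and reflects the values, so the expression is unchanged.

```python
import math


def t_stat(m, s=None):
    """Evaluate t(m) = mean(ln^2(s/t)) - (mean(ln(s/t)))^2 over t = 1..m.

    If s is omitted, s = m is used; the value does not depend on s because
    ln(s/t) = ln(s) - ln(t) differs from -ln(t) only by a constant.
    """
    if s is None:
        s = m
    logs = [math.log(s / t) for t in range(1, m + 1)]
    mean = sum(logs) / m
    mean_sq = sum(v * v for v in logs) / m
    return mean_sq - mean * mean
```

Evaluating `t_stat(m)` for increasing `m` shows the values approaching 1, while for small `m` the value is noticeably below 1.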
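Notice that $R(S_n)$ can be written in the form:
\[
R(S_n) = I\{S_n \ge \beta_n\} \, \frac{t(M_n)}{\displaystyle \frac{1}{M_n} \sum_{t=1}^{M_n} \ln^2 X_{(t)} - \frac{1}{M_n^2} \left( \sum_{t=1}^{M_n} \ln X_{(t)} \right)^2}. \tag{4.1}
\]
According to Lemma 2, in order to prove the consistency of $R(S_n)$ we need to show that $N_{S_n}$ converges to $1/\alpha^2$ in probability, where $N_{S_n}$ is defined as follows:
\[
\begin{aligned}
N_{S_n} &:= \frac{1}{M_n} \sum_{t=1}^{M_n} \ln^2 X_{(t)} - \frac{1}{M_n^2} \left( \sum_{t=1}^{M_n} \ln X_{(t)} \right)^2 \\
&= \frac{1}{M_n} \sum_{t=1}^{M_n} \left(\ln X_{(t)} - \ln F^{-1}(1 - B_n)\right)^2 - \frac{1}{M_n^2} \left( \sum_{t=1}^{M_n} \left(\ln X_{(t)} - \ln F^{-1}(1 - B_n)\right) \right)^2.
\end{aligned}
\]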
Proof of Theorem 2. Denote:
\[
N^+_{S_n} := \frac{1}{M_n} \sum_{t=1}^{S_n} \left(\ln X_{(t)} - \ln F^{-1}(1 - B_n)\right)_+^2 - \frac{1}{M_n^2} \left( \sum_{t=1}^{S_n} \left(\ln X_{(t)} - \ln F^{-1}(1 - B_n)\right)_+ \right)^2.
\]
Using Theorem 1 and the conditions of the theorem, we conclude that $(H^+_{S_n})^2$ converges in probability to $1/\alpha^2$. According to Lemma 1 we obtain
\[
\beta_n E\left(\ln X_{(t)} - \ln F^{-1}(1 - B_n)\right)_+^2 \to 2/\alpha^2
\]
as $n \to \infty$. By arguments similar to those in Theorem 1, using Davidson (1994, Cor. 20.16), we conclude that
\[
\frac{1}{M_n} \sum_{t=1}^{S_n} \left(\ln X_{(t)} - \ln F^{-1}(1 - B_n)\right)_+^2 \xrightarrow{P} 2/\alpha^2.
\]
Thus,
\[
N^+_{S_n} = \frac{1}{M_n} \sum_{t=1}^{S_n} \left(\ln X_{(t)} - \ln F^{-1}(1 - B_n)\right)_+^2 - (H^+_{S_n})^2 \xrightarrow{P} 1/\alpha^2.
\]
In order to complete the proof we need to demonstrate that $N^+_{S_n} - N_{S_n} \xrightarrow{P} 0$. Note that
\[
N^+_{S_n} - N_{S_n} = A_{S_n} + B_{S_n} + (\tilde H_{S_n})^2 - (H^+_{S_n})^2,
\]
where
\[
A_{S_n} = -\frac{1}{M_n} \sum_{t=1}^{M_n} \left(\ln X_{(t)} - \ln F^{-1}(1 - B_n)\right)_-^2
\]
and
\[
B_{S_n} = \frac{1}{M_n} \sum_{t=M_n+1}^{S_n} \left(\ln X_{(t)} - \ln F^{-1}(1 - B_n)\right)_+^2.
\]
According to Theorem 1 and the conditions of Theorem 2, we obtain $(\tilde H_{S_n})^2 - (H^+_{S_n})^2 \xrightarrow{p} 0$. It remains only to show that $A_{S_n} \xrightarrow{P} 0$ and $B_{S_n} \xrightarrow{P} 0$.
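First, let us show that for all $\rho \in J$, where $J$ is some neighborhood of 1, the following holds:
\[
\ln X_{([\rho M_n])} - \ln F^{-1}(1 - \rho B_n) \xrightarrow{P} 0, \tag{4.2}
\]
as $n \to \infty$. For this it is enough to show that for all $\rho \in J$ and $\varepsilon > 0$:
\[
P\{\ln X_{([\rho M_n])} - \ln F^{-1}(1 - \rho B_n) > +\varepsilon\} \to 0 \tag{4.3}
\]
and
\[
P\{\ln X_{([\rho M_n])} - \ln F^{-1}(1 - \rho B_n) < -\varepsilon\} \to 0 \tag{4.4}
\]
as $n \to \infty$. To prove (4.3) we can write
\[
\begin{aligned}
P\{\ln X_{([\rho M_n])} - \ln F^{-1}(1 - \rho B_n) > +\varepsilon\}
&= P\left\{ \sum_{t=1}^{S_n} I_{nt} \ge [\rho M_n] \right\} \\
&= P\left\{ \frac{1}{M_n} \sum_{t=1}^{S_n} \left(I_{nt} - E(I_{nt})\right) \ge \frac{1}{M_n} \left( [\rho M_n] - \sum_{t=1}^{S_n} E(I_{nt}) \right) \right\}.
\end{aligned} \tag{4.5}
\]
We have that $[\rho M_n]/M_n \xrightarrow{P} \rho$ as $n \to \infty$. By arguments similar to those in the proof of Lemma 1, one can easily prove that $E(I_{nt}) \sim \frac{1}{\beta_n} \rho e^{-\alpha\varepsilon}$ as $n \to \infty$. Consequently we get $\sum_{t=1}^{S_n} E(I_{nt}) \sim (S_n/\beta_n) \rho e^{-\alpha\varepsilon}$ as $n \to \infty$ and
\[
\frac{1}{M_n} \left( [\rho M_n] - \sum_{t=1}^{S_n} E(I_{nt}) \right) \xrightarrow{P} \rho(1 - e^{-\alpha\varepsilon}) > 0.
\]
Using relations (4.5) and (4.6) we obtain (4.3). The proof of relation (4.4) is similar.

One can easily see that
\[
|A_{S_n}| \le \left(\ln X_{(M_n)} - \ln F^{-1}(1 - B_n)\right)_-^2
\]
and for some $\varepsilon > 0$ we have
\[
P\{|A_{S_n}| > \varepsilon\} \le P\left\{ \left(\ln X_{(M_n)} - \ln F^{-1}(1 - B_n)\right)_-^2 > \varepsilon \right\}.
\]
Now if we use (4.2) with $\rho = 1$, we get that $A_{S_n} \xrightarrow{P} 0$.

Next we prove that $B_{S_n} \xrightarrow{P} 0$. For some $\delta \in \mathbb{R}_+$ such that $(1 - \delta, 1 + \delta) \subset J$, we may write $B_{S_n} = C_{S_n} + D_{S_n}$, where
\[
C_{S_n} = \frac{1}{M_n} \sum_{t=M_n+1}^{[(1+\delta)M_n]} \left(\ln X_{(t)} - \ln F^{-1}(1 - B_n)\right)_+^2
\]
and
\[
D_{S_n} = \frac{1}{M_n} \sum_{t=[(1+\delta)M_n]+1}^{S_n} \left(\ln X_{(t)} - \ln F^{-1}(1 - B_n)\right)_+^2.
\]
We can prove that both terms $C_{S_n}$ and $D_{S_n}$ converge in probability to 0. From the implication
\[
D_{S_n} > 0 \;\Rightarrow\; \ln X_{([(1+\delta)M_n])} - \ln F^{-1}(1 - B_n) > 0
\]
it follows that
\[
P\{D_{S_n} > 0\} \le P\{\ln X_{([(1+\delta)M_n])} - \ln F^{-1}(1 - B_n) > 0\}.
\]
Furthermore we have
\[
\begin{aligned}
&P\{\ln X_{([(1+\delta)M_n])} - \ln F^{-1}(1 - B_n) > 0\} \\
&\quad = P\{\ln X_{([(1+\delta)M_n])} - \ln F^{-1}(1 - (1+\delta)B_n) > \ln F^{-1}(1 - B_n) - \ln F^{-1}(1 - (1+\delta)B_n)\}.
\end{aligned} \tag{4.6}
\]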
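By (4.2) we conclude that
\[
\ln X_{([(1+\delta)M_n])} - \ln F^{-1}(1 - (1+\delta)B_n) \xrightarrow{P} 0.
\]
Also, $F^{-1}$ is regularly varying at $\infty$ with the parameter $1/\alpha$, and therefore:
\[
\lim_{n \to \infty} \ln \frac{F^{-1}(1 - B_n)}{F^{-1}(1 - (1+\delta)B_n)} = \frac{1}{\alpha} \ln(1 + \delta) > 0.
\]
Thus, $P\{D_{S_n} > 0\} \to 0$ as $n \to \infty$. Considering the term $C_{S_n}$, we may state
\[
\begin{aligned}
C_{S_n} &\le \frac{1}{M_n} \left([(1+\delta)M_n] - M_n\right) \left(\ln X_{(M_n+1)} - \ln F^{-1}(1 - B_n)\right)_+^2 \\
&\le \delta \left(\ln X_{(M_n+1)} - \ln F^{-1}(1 - B_n)\right)_+^2.
\end{aligned}
\]
So for some $\varepsilon > 0$, we have
\[
\begin{aligned}
P\{|C_{S_n}| > \varepsilon\} &\le P\left\{ \left(\ln X_{(M_n+1)} - \ln F^{-1}(1 - B_n)\right)_+^2 > \delta^{-1}\varepsilon \right\} \\
&\le P\left\{ \left(\ln X_{(M_n)} - \ln F^{-1}(1 - B_n)\right)_+ > \sqrt{\delta^{-1}\varepsilon} \right\}.
\end{aligned}
\]
Using (4.2) again with $\rho = 1$ we obtain that $C_{S_n} \xrightarrow{P} 0$.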
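Proof of Theorem 3. Invoking the Cramér–Wold device, following the track of Hill (2010, Lemma 3 and the proof of Theorem 2), and using the fact that, under the maintained assumptions,
\[
\ln X_{([\rho M_n])} - \ln F^{-1}(1 - \rho B_n) \xrightarrow{P} 0
\]
(see the proof of Theorem 2), we conclude that
\[
M_n^{1/2} \left( H^+_{S_n} - E H^+_{S_n} - \alpha^{-1}\left(\tilde S_n^{(\zeta)} - E \tilde S_n^{(\zeta)}\right) \right) / \sigma^*_{M_n} \xrightarrow{d} N(0, 1),
\]
where
\[
\sigma^*_{M_n} = E\left( I\{S_n \ge \beta_n\} \frac{1}{M_n^{1/2}} \left[ \sum_{t=1}^{S_n} \left\{ Y_t - E(Y_t) - \alpha^{-1}\left(Y_t^{\zeta} - E(Y_t^{\zeta})\right) \right\} \right] \right)^2.
\]
The next conclusion follows directly from Hsing (1991, Theorem 2.4):
\[
M_n^{1/2} (H_{S_n} - \alpha^{-1}) / \sigma^*_{M_n} \xrightarrow{d} N(0, 1).
\]
Since $\sigma^2_{M_n} = E\left(M_n^{1/2}(H_{S_n} - \alpha^{-1})\right)^2$, it immediately follows that $|\sigma^*_{M_n} - \sigma_{M_n}| \xrightarrow{p} 0$.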
Remark 3. Assumption D, which appears in Hill's Theorem 2 and enforces a non-degeneracy property associated with tail arrays, seems not to be needed since the above ratio is self-standardized.

Remark 4. Hsing (1991) proves the asymptotic normality of the Hill estimator for strong mixing processes under remarkably general conditions. Theorem 3 shows that a similar result holds for incomplete sequences of dependent variables under simplified conditions that have substantial practical advantages.

Acknowledgments

The author wishes to thank the referee and the Associate Editor for their helpful criticism and useful remarks.

References

Bacro, J.N., Brito, M., 1995. Weak limiting behaviour of a simple tail Pareto-index estimator. J. Statist. Plann. Inference 45, 7–19.
Brito, M., Freitas, A.C.M., 2003. Limiting behaviour of a geometric-type estimator for tail indices. Insurance Math. Econom. 33, 211–226.
Csörgő, S., Viharos, L., 1998. Estimating the tail index. In: Asymptotic Methods in Probability and Statistics. North-Holland, Amsterdam, pp. 833–881.
Davis, R.A., Resnick, S.T., 1984. Tail estimates motivated by extreme value theory. Ann. Statist. 12, 1467–1487.
Dekkers, A.L.M., Einmahl, J.H.J., de Haan, L., 1989. A moment estimator for the index of an extreme-value distribution. Ann. Statist. 17, 1833–1855.
Davidson, J., 1994. Stochastic Limit Theory. Oxford University Press.
Davidson, J., 2004. Moment and memory properties of linear conditional heteroscedasticity models, and a new model. J. Bus. Econom. Statist. 22, 16–29.
Graham, J.W., 2009. Missing data analysis: making it work in the real world. Annu. Rev. Psychol. 60, 549–576.
Haeusler, E., Teugels, J.L., 1985. On asymptotic normality of Hill's estimator for the exponent of regular variation. Ann. Statist. 13, 743–756.
Hill, B.M., 1975. A simple general approach to inference about the tail of a distribution. Ann. Statist. 3, 1163–1174.
Hill, J.B., 2005. On tail index estimation using dependent, heterogeneous data. Working paper.
Hill, J.B., 2010. On tail index estimation for dependent, heterogeneous data. Econometric Theory 26, 1398–1436.
Hill, J.B., 2011. Tail and nontail memory with the applications to extreme value and robust statistics. Econometric Theory 27, 844–884.
Hsing, T., 1991. On tail index estimation using dependent data. Ann. Statist. 19, 1547–1569.
Kline, P., Santos, A., 2010. Sensitivity to missing data assumptions: theory and an evaluation of the US wage structure. In: 2010 Seoul Summer Economics Conference.
Koopman, L., van der Heiden, Geert J.M.G., Grobbee, Diederick E., Rovers, Maroeska M., 2007. Comparison of methods of handling missing data in individual patient data meta-analyses: an empirical example on antibiotics in children with acute otitis media. Am. J. Epidemiol. 167 (5).
Leadbetter, M.R., Lindgren, G., Rootzen, H., 1983. Extremes and Related Properties of Random Sequences and Processes. Springer-Verlag New York, Inc., 175 Fifth Ave. 336 pp.
Ling, S., Peng, L., 2004. Hill's estimator for the tail index of an ARMA model. J. Statist. Plann. Inference 123, 279–293.
Mladenović, Z., Petrovic, P., 2010. Cagan's paradox and money demand in hyperinflation: revisited at daily frequency. J. Int. Money and Finance (29), 1369–1384.
Mladenović, P., Piterbarg, V., 2008. On estimation of the exponent of regular variation using a sample with missing observations. Statist. Probab. Lett. 78, 327–335.
Prevosti, F.J., Chemisquy, M.A., 2009. The impact of missing data on real morphological phylogenies: influence of the number and distribution of missing entries. Cladistics (26), 326–339.
Schultze, J., Steinebach, J., 1996. On least squares estimates of an exponential tail coefficient. Statist. Decisions 14, 353–372.
Tjostheim, D., 1990. Non-linear time series and Markov chains. Adv. Appl. Probab. 22, 587–611.
Wang, Q., Luo, R., 2011. Semi-empirical pseudo likelihood for estimating equations in the presence of missing responses. J. Statist. Plann. Inference. http://dx.doi.org/10.1016/j.jspi.2011.02.009.