Journal of Statistical Planning and Inference 134 (2005) 409 – 433 www.elsevier.com/locate/jspi
Asymptotically best linear unbiased tail estimators under a second-order regular variation condition

M. Ivette Gomes^a,∗, Fernanda Figueiredo^b, Sandra Mendonça^c

^a CEAUL and DEIO (FCUL), Universidade de Lisboa, Edificio C2, Campo Grande, Lisboa 1749-016, Portugal
^b CEAUL and Faculdade de Economia da Universidade do Porto, Portugal
^c CEAUL and Departamento de Matemática, Universidade da Madeira, Portugal
Received 15 November 2003; accepted 23 April 2004 Available online 5 August 2004
Abstract

For regularly varying tails, estimation of the index of regular variation, or tail index, $\gamma$, is often performed through the classical Hill estimator, a statistic strongly dependent on the number $k$ of top order statistics used, and with a high asymptotic bias as $k$ increases, unless the underlying model is a strict Pareto model. First, on the basis of the asymptotic structure of Hill's estimator for different $k$-values, we propose "asymptotically best linear (BL) unbiased" estimators of the tail index. A similar derivation on the basis of the log-excesses and of the scaled log-spacings is performed, and the adequate weights for the largest log-observations are provided. The asymptotic behaviour of those estimators is derived, and they are compared with other alternative estimators, both asymptotically and for finite samples. As an overall conclusion we may say that even asymptotically equivalent estimators may exhibit very diversified finite sample properties.
© 2004 Elsevier B.V. All rights reserved.

MSC: primary 62G32; 62E20; secondary 65C05

Keywords: Statistics of extremes; Semi-parametric tail estimation; BLUE estimation
Research partially supported by FCT/POCTI/FEDER.
∗ Corresponding author. Tel.: +351-21-7500041; fax: +351-21-7500081.
E-mail address:
[email protected] (M.I. Gomes). 0378-3758/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2004.04.013
1. Introduction and BLUE

In the general theory of Statistics, whenever we ask whether the combination of information may improve the performance of an estimator, we are led to think of Best Linear Unbiased Estimators (BLUE), i.e., of unbiased linear combinations of an adequate set of statistics, with minimum variance among the class of such linear combinations. The basic theorem underlying this theory is due to Aitken (1935): if $X$ is a vector of observations with mean value $EX = A\beta$ depending linearly on the unknown vector of parameters $\beta$, with a known coefficient matrix $A$, and with covariance matrix $\sigma^2\Sigma$, known up to a scale factor $\sigma^2$, the least-squares estimator of $\beta$ is the vector $\beta^*$ which minimizes the quadratic form $(X - A\beta)'\Sigma^{-1}(X - A\beta)$. Such a vector is thus the vector of solutions of the "normal equations", $A'\Sigma^{-1}A\beta^* = A'\Sigma^{-1}X$. This solution is explicitly given by
$$\beta^* = (A'\Sigma^{-1}A)^{-1}A'\Sigma^{-1}X,$$
and its variance matrix is $\mathrm{Var}(\beta^*) = \sigma^2(A'\Sigma^{-1}A)^{-1}$.

We are here interested in the estimation of a positive tail index $\gamma$, the shape parameter in the Extreme Value (EV) model
$$EV_\gamma(x) = \begin{cases}\exp\left(-(1+\gamma x)^{-1/\gamma}\right), & 1+\gamma x > 0, & \text{if } \gamma \neq 0,\\ \exp(-\exp(-x)), & x \in \mathbb{R}, & \text{if } \gamma = 0,\end{cases}$$
a model which appears as the only possible non-degenerate limiting law of the maximum value, linearly normalized, of a random sample of size $n$, as $n \to \infty$. Given a vector of $m$ statistics directly related to the tail index $\gamma$, let us say $T \equiv (T_{ik},\ i = k-m+1, \ldots, k)$,
$$1 \leq m \leq k,$$
where $k$ is intermediate, i.e.,
$$k = k_n \to \infty, \quad k/n \to 0, \quad \text{as } n \to \infty, \eqno(1.1)$$
let us assume that, asymptotically, the covariance matrix of $T$ is well approximated by $\sigma^2\Sigma$, i.e., it is known up to the scale factor $\sigma^2$, and that its mean value is asymptotically well approximated by $\gamma\, s + \phi(n,k)\, b$. It is thus sensible to investigate the following question: is it possible to find a linear combination of this set of statistics, "asymptotically unbiased and of minimum variance"? Such a linear combination will be called an "asymptotically best linear (BL) unbiased" estimator based on $T$, and will be denoted $BL_T$. Our goal is then to find a vector $a = (a_1, a_2, \ldots, a_m)'$ such that $a'\Sigma a$ is minimum, subject to the conditions $a's = 1$ and $a'b = 0$. The solution of such a problem is easily obtained if we consider the function
$$H(a; \lambda, \mu) = a'\Sigma a - \lambda(a's - 1) - \mu\, a'b$$
and solve the equations
$$2\Sigma a - \lambda s - \mu b = 0, \quad a's = 1, \quad a'b = 0. \eqno(1.2)$$
From the first equation in (1.2) we get $2\Sigma a = P\theta$, with $P = [s\ b]$ and $\theta = (\lambda, \mu)'$. Hence,
$$a = \tfrac12\,\Sigma^{-1}P\theta, \eqno(1.3)$$
$P'a = \tfrac12(P'\Sigma^{-1}P)\theta$, and, if there exists $(P'\Sigma^{-1}P)^{-1}$, $\theta = 2(P'\Sigma^{-1}P)^{-1}P'a$. But from the last two equations in (1.2) we get $a'P = [1\ 0]$, i.e., $P'a = (1, 0)'$, and consequently,
$$\theta = 2(P'\Sigma^{-1}P)^{-1}\binom{1}{0}. \eqno(1.4)$$
If we incorporate (1.4) in (1.3), we get
$$a = \Sigma^{-1}P(P'\Sigma^{-1}P)^{-1}\binom{1}{0} = \frac{1}{\Delta}\,\Sigma^{-1}\left(s\,b' - b\,s'\right)\Sigma^{-1}b, \eqno(1.5)$$
where $\Delta = |P'\Sigma^{-1}P|$. Since we have denoted $T$ the vector of the $m$ statistics we use to construct the estimator, we get
$$BL_T(k; m) := a'T, \quad a \text{ given in (1.5)}. \eqno(1.6)$$
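The constrained minimization above is easy to check numerically. The sketch below (ours, not the paper's; all function names are illustrative) solves the stationarity and constraint equations (1.2) as one linear KKT system by Gaussian elimination, and then the resulting weights can be verified to satisfy $a's = 1$ and $a'b = 0$ for a toy choice of $\Sigma$, $s$ and $b$:

```python
# Minimal sketch: solve the KKT system of (1.2),
#   [2*Sigma  -s  -b] [a]      [0]
#   [s'        0   0] [lambda]=[1]
#   [b'        0   0] [mu]     [0]
# using plain Gaussian elimination (no external libraries).

def solve(A, y):
    """Gaussian elimination with partial pivoting; A is n x n, y has length n."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def bl_weights(Sigma, s, b):
    """Weights a of the BL combination: minimize a'Sigma a s.t. a's=1, a'b=0."""
    m = len(s)
    A = [[2 * Sigma[i][j] for j in range(m)] + [-s[i], -b[i]] for i in range(m)]
    A.append(s + [0.0, 0.0])
    A.append(b + [0.0, 0.0])
    y = [0.0] * m + [1.0, 0.0]
    return solve(A, y)[:m]   # drop the two Lagrange multipliers

# toy example: Sigma = I, s = 1 (unit vector), b arbitrary
m = 4
Sigma = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
s = [1.0] * m
b = [(i + 1) / m for i in range(m)]
a = bl_weights(Sigma, s, b)
```

With $\Sigma = I$ the closed form (1.5) reduces to $a = ((b'b)s - (s'b)b)/\Delta$, and the numerical solution agrees with it.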
With exact results we could derive
$$\mathrm{Var}(BL_T(k; m)) = \sigma^2\,\frac{b'\Sigma^{-1}b}{\Delta}.$$
We assume that we are working in a context of heavy-tailed models, i.e., with $U(t) = F^{\leftarrow}(1 - 1/t)$, $t \geq 1$, $F^{\leftarrow}$ the generalized inverse of the underlying model $F$, one of the following equivalent conditions holds true for all $x > 0$:
$$\lim_{t\to\infty}\frac{1 - F(tx)}{1 - F(t)} = x^{-1/\gamma} \iff \lim_{t\to\infty}\frac{U(tx)}{U(t)} = x^{\gamma}, \eqno(1.7)$$
with $\gamma$ the above-mentioned tail index. For heavy tails, the classical tail index estimator is then Hill's estimator (Hill, 1975):
$$H(k) := \frac1k\sum_{i=1}^{k}\{\ln X_{n-i+1:n} - \ln X_{n-k:n}\}, \eqno(1.8)$$
where $X_{i:n}$, $1 \leq i \leq n$, are the ascending order statistics (o.s.) associated to the random sample $(X_1, X_2, \ldots, X_n)$. Hill's estimator is consistent under the first-order framework in (1.7)
and for intermediate $k$, i.e., levels $k$ such that (1.1) holds. Moreover, it is unbiased if the underlying model is a strict Pareto model, i.e., $F(x) = 1 - x^{-1/\gamma}$, $x \geq 1$, with $\gamma > 0$. Aban and Meerschaert (2004) have shown that for this strict Pareto model, the Hill estimator in (1.8) is indeed the BLUE (and also the UMVUE) estimator of the tail index $\gamma$, based on the $k+1$ largest o.s. of the log-sample.

To achieve asymptotic normality for the Hill estimator in (1.8), under the general first-order semi-parametric framework in (1.7), we need to know the rate of convergence in (1.7), i.e., we need to assume an additional second-order condition. We shall here assume that
$$\lim_{t\to\infty}\frac{\ln U(tx) - \ln U(t) - \gamma\ln x}{A(t)} = \frac{x^{\rho} - 1}{\rho} \eqno(1.9)$$
for all $x > 0$, where $A$ is a suitably chosen function of constant sign near infinity (positive or negative), and $\rho \leq 0$ is the second-order parameter. The limit function in (1.9), whenever non-null, is necessarily of this given form, and $|A| \in RV_{\rho}$ (Geluk and de Haan, 1987). The notation $RV_{\alpha}$ stands for the class of regularly varying functions at infinity with index of regular variation equal to $\alpha$, i.e., positive measurable functions $g$ such that $\lim_{t\to\infty} g(tx)/g(t) = x^{\alpha}$, for all $x > 0$. We shall further assume throughout the paper that $\rho < 0$.

Remark 1.1. For the strict Pareto model, the numerator of the right hand-side of (1.9) is null, i.e., $\ln U(tx) - \ln U(t) - \gamma\ln x \equiv 0$.

Remark 1.2. For Hall's class of Pareto-type models (Hall and Welsh, 1985), with tail function
$$1 - F(x) = C x^{-1/\gamma}\left(1 + D x^{\rho/\gamma} + o(x^{\rho/\gamma})\right) \quad \text{as } x \to \infty, \eqno(1.10)$$
$C > 0$, $D \in \mathbb{R}$, $\rho < 0$, (1.9) holds and we may choose $A(t) = c\,t^{\rho}$, $\rho < 0$, with $c$ a constant determined by $C$, $D$, $\gamma$ and $\rho$.

Under the validity of (1.9), and for intermediate $k$, the following asymptotic distributional representation
$$H(k) \stackrel{d}{=} \gamma + \frac{\gamma}{\sqrt k}\,P_k^{H} + \frac{A(n/k)}{1-\rho}(1 + o_p(1)) \eqno(1.11)$$
holds, where $P_k^{H}$ is asymptotically a standard normal random variable (r.v.) (de Haan and Peng, 1998). The BL estimator in this paper differs from the one in Aban and Meerschaert (2004) because, whereas in the strict Pareto model the Hill estimator is unbiased, under a general semi-parametric framework the Hill estimator has a non-null asymptotic bias whenever $\sqrt k\,A(n/k) \to \lambda \neq 0$, as $n \to \infty$. Our objective here is to reduce (or even to remove) the main dominant component of the asymptotic bias of the Hill estimator in (1.8). Related work may be found in Beirlant et al. (1999), Feuerverger and Hall (1999), and Gomes and Martins (2002a,b), among others.

In Section 2 of this paper we shall consider "asymptotically best linear combinations" of Hill's estimators in (1.6), both under a misspecification $\rho = -1$—a value that
corresponds to many commonly used heavy-tailed models, like the Fréchet model, $F(x) = \exp(-x^{-1/\gamma})$, $x \geq 0$ ($\gamma > 0$)—and for a general $\rho$, to be estimated from our sample under an adequate methodology. For a general $\rho$, we have obtained a computationally expensive estimator of the tail index $\gamma$. And since the computation time increased even further for "asymptotically best linear combinations" of log-excesses, we have decided to consider in Section 3 the same kind of derivation, but now based on the scaled log-spacings. Such a derivation led us to much simpler linear combinations, with almost the same exact behaviour and obviously equivalent asymptotic properties, which enable us to provide easily the weights of the BL estimators based on the largest $k+1$ o.s. of the log-sample. In Section 4, using Monte Carlo techniques, we exhibit the finite sample behaviour of the "asymptotically best linear unbiased" estimators, and compare them with the Hill estimator and with the "asymptotically unbiased" estimator with smallest asymptotic variance among the ones considered in Gomes and Martins (2002b). Finally, in Section 5, an overall conclusion is drawn and an application to the Log Exchange Rates of Euro against UK Pound is provided. The main point of the paper is that, as asymptotically equivalent estimators may exhibit very diversified finite sample properties, it is always sensible to work, in practice, with a few estimators of the tail index. This enables us to choose the more appropriate estimate of $\gamma$, the primary parameter of extreme or even rare events.
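For concreteness, the Hill estimator in (1.8) takes only a few lines of code. The sketch below is ours, not the paper's; it draws a strict Pareto sample — the case in which Hill is unbiased — via the quantile transformation $X = V^{-\gamma}$, $V$ uniform on $(0,1)$:

```python
import math
import random

def hill(sample, k):
    """Hill estimator (1.8): mean of the k top log-excesses.
    `sample` is the raw data; k must satisfy 1 <= k < len(sample)."""
    x = sorted(sample)                # ascending order statistics
    top = x[-k:]                      # X_{n-k+1:n}, ..., X_{n:n}
    threshold = x[-k - 1]             # X_{n-k:n}
    return sum(math.log(t) - math.log(threshold) for t in top) / k

# strict Pareto model F(x) = 1 - x^{-1/gamma}: X = V^{-gamma}, V uniform(0,1)
random.seed(1)
gamma = 0.5
data = [random.random() ** (-gamma) for _ in range(5000)]
```

For this sample the estimate at, say, $k = 500$ fluctuates around the true value $\gamma = 0.5$ with standard deviation of about $\gamma/\sqrt k \approx 0.02$.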
2. "Asymptotically unbiased" linear combinations of Hill's estimators

Let us consider Hill's estimators computed at different intermediate levels $k-m+1, k-m+2, \ldots, k$, i.e., let us think of the vector $H \equiv (H(k-m+1), H(k-m+2), \ldots, H(k))$. We are thus working with the top $k+1$ o.s., down to $X_{n-k:n}$. We know that, asymptotically, the covariance matrix of $H$ is well approximated by $\sigma^2\Sigma = \gamma^2[\sigma_{i,j}]$, with
$$\sigma_{i,j} = \sigma_{j,i} = \frac{1}{k-m+j}, \quad 1 \leq i \leq j \leq m.$$
On the other side, its mean value is asymptotically well approximated by
$$\gamma\,\mathbf{1} + b_{\rho}\,A(n/k)\left(\left(\frac{k-m+i}{k}\right)^{-\rho}\right)_{i=1,\ldots,m} =: \gamma\,\mathbf{1} + b_{\rho}\,A(n/k)\,b, \eqno(2.1)$$
where $\rho$ is the second-order parameter formally defined in (1.9), and related to the rate of convergence of normalized maximum values towards a non-degenerate limit law. Here, and taking into account (1.11), we have $b_{\rho} = 1/(1-\rho)$.
We get straightforwardly:

Proposition 2.1. The inverse matrix, $\Sigma^{-1}$, of $\Sigma = [\sigma_{i,j}]$, $\sigma_{i,j} = \sigma_{j,i} = 1/(k-m+j)$, $1 \leq i \leq j \leq m$, has entries $\sigma^{i,j}$, $1 \leq i, j \leq m$, given by
$$\sigma^{i,i} = \begin{cases}(k-m+1)(k-m+2) & \text{if } i = 1,\\ 2(k-m+i)^2 & \text{if } i = 2, 3, \ldots, m-1,\\ k^2 & \text{if } i = m,\end{cases}$$
$$\sigma^{i-1,i} = \sigma^{i,i-1} = -(k-m+i-1)(k-m+i), \quad i = 2, 3, \ldots, m,$$
and
$$\sigma^{i,j} = 0, \quad |i - j| > 1.$$
2.1. Misspecification of $\rho$ ($\rho = -1$)

Proposition 2.2. Under a misspecification of $\rho$ at $-1$, and assuming (2.1),
$$P'\Sigma^{-1}P = \begin{pmatrix}k & k\\ k & P_{mk}\end{pmatrix},$$
where
$$3k^2 P_{mk} = 3k^3 + 3(m-1)(k-m+1)^2 + 3(m-1)^2(k-m+1) + (m-1)^3 - (m-1).$$
Also
$$\Delta = |P'\Sigma^{-1}P| = k(P_{mk} - k) \quad\text{and}\quad b'\Sigma^{-1}b = P_{mk}.$$
If we put $m = \theta k$, $0 < \theta \leq 1$,
$$\frac{k\,b'\Sigma^{-1}b}{\Delta} \xrightarrow[k\to\infty]{} \frac{\theta^3 - 3\theta^2 + 3\theta + 3}{\theta(\theta^2 - 3\theta + 3)},$$
which is a decreasing function of $\theta$, converging towards 4 as $\theta \to 1$.
It is thus sensible to put $m = k$, and we then obtain
$$\Sigma^{-1}P = \begin{pmatrix}0 & 0 & \cdots & 0 & k\\[0.5ex] -\dfrac2k & -\dfrac4k & \cdots & -\dfrac{2(k-1)}{k} & 2k-1\end{pmatrix}', \qquad (P'\Sigma^{-1}P)^{-1} = \frac{3}{k^2-1}\begin{pmatrix}\dfrac{4k^2-1}{3k} & -k\\[0.5ex] -k & k\end{pmatrix},$$
and consequently
$$a' = \frac{3}{k^2-1}\left(2,\ 4,\ 6,\ \ldots,\ 2(k-1),\ -\frac{(k-1)(2k-1)}{3}\right).$$
We may thus state the following result.

Proposition 2.3. If we consider $m = k$, we get
$$P_{kk} = \frac{4k^2-1}{3k}, \qquad \Delta = \frac{k^2-1}{3}, \qquad \frac{b'\Sigma^{-1}b}{\Delta} = \frac1k\,\frac{4k^2-1}{k^2-1}.$$
The weights $a_i$, $i = 1, 2, \ldots, k$, in $BL_H^{(-1)}(k) = \sum_{i=1}^{k} a_i H(i)$, are given by
$$a_i = \frac{6i}{k^2-1}, \quad i = 1, 2, \ldots, k-1, \qquad a_k = -\frac{2k-1}{k+1}.$$
We are therefore interested in the estimator
$$BL_H^{(-1)}(k) := \frac{6}{k^2-1}\sum_{i=1}^{k-1} i\,H(i) - \frac{2k-1}{k+1}\,H(k). \eqno(2.2)$$
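A direct implementation of (2.2) — our illustrative sketch, not code from the paper — computes $H(i)$ for $i = 1, \ldots, k$ and applies the weights $a_i = 6i/(k^2-1)$, $i < k$, and $a_k = -(2k-1)/(k+1)$, which sum exactly to one:

```python
import math
import random

def hill_path(logx_desc, k):
    """H(i), i = 1..k, from the descending log-order statistics."""
    H, s = [], 0.0
    for i in range(1, k + 1):
        s += logx_desc[i - 1]            # running sum of the i top log-o.s.
        H.append(s / i - logx_desc[i])   # H(i) = mean of top i - ln X_{n-i:n}
    return H

def bl_minus1(sample, k):
    """BL_H^(-1)(k) of (2.2): weighted combination of Hill estimators."""
    logx = sorted(math.log(v) for v in sample)[::-1]
    H = hill_path(logx, k)
    return (6.0 / (k * k - 1)) * sum(i * H[i - 1] for i in range(1, k)) \
        - (2.0 * k - 1) / (k + 1) * H[k - 1]

# strict Pareto sample (gamma = 1): no bias to correct, BL stays consistent
random.seed(3)
data = [random.random() ** (-1.0) for _ in range(5000)]
```

At $k = 400$ the estimate is close to $\gamma = 1$, with roughly twice Hill's standard deviation, in line with Remark 2.1 below.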
Since we have misspecified $\rho$, we no longer have a null asymptotic bias whenever $\sqrt k\,A(n/k) \to \lambda$, finite and non-null, unless $\rho = -1$. We may state the following result:

Theorem 2.1. Under the first-order framework in (1.7), and for intermediate $k$, the linear combination in (2.2) is consistent for the estimation of the tail index $\gamma$. If we further assume a second-order framework, i.e., if we assume that (1.9) holds, the estimator is asymptotically normal. More specifically, the asymptotic distributional representation
$$BL_H^{(-1)}(k) \stackrel{d}{=} \gamma + \frac{2\gamma}{\sqrt k}\,P_k^{BL_H^{(-1)}} + \frac{2(1+\rho)}{(1-\rho)(2-\rho)}\,A(n/k)(1 + o_p(1)) \eqno(2.3)$$
holds true, where $P_k^{BL_H^{(-1)}}$ is an asymptotically standard normal r.v.

Proof. The proof comes straightforwardly from Hill's asymptotic distributional representation in (1.11), together with the results in Proposition 2.3. The summand $2\gamma P_k^{BL_H^{(-1)}}/\sqrt k$ in (2.3), related to the asymptotic variance, comes from the fact that
$$\frac{k\,b'\Sigma^{-1}b}{\Delta} \xrightarrow[k\to\infty]{} 4$$
and the bias term $2(1+\rho)A(n/k)/((1-\rho)(2-\rho))$ in (2.3) comes from the fact that
$$a'b = \frac{6}{k^2-1}\left(k^{\rho}\sum_{i=1}^{k-1} i^{1-\rho} - \frac{(k-1)(2k-1)}{6}\right)$$
converges towards $2(1+\rho)/(2-\rho)$, as $k \to \infty$. $\square$
Remark 2.1. We are able to get a null bias for $\rho = -1$, but at the expense of an increase in the variance, which is 4 times the asymptotic variance of Hill's estimator—the old trade-off between variance and bias, and a price we have to pay for the second-order bias correction.

Remark 2.2. The asymptotic behaviour of this new estimator is the same as that of the ML-estimator
$$ML^{(-1)}(k) = H(k) - \left(\frac1k\sum_{i=1}^{k} i\,U_i\right)\frac{\sum_{i=1}^{k}(2i-k-1)U_i}{\sum_{i=1}^{k} i(2i-k-1)U_i} \eqno(2.4)$$
(Gomes and Martins, 2002a), based on the scaled log-spacings
$$U_i := i\,[\ln X_{n-i+1:n} - \ln X_{n-i:n}], \quad 1 \leq i \leq k. \eqno(2.5)$$
But their finite sample behaviour is different, as we shall see in Section 4.

2.2. "Asymptotically unbiased" linear combination of Hill's estimators for a general $\rho$

The equivalent to Proposition 2.3 is:

Proposition 2.4. For a general $\rho$, and whenever we consider $m = k > 2$ levels, we have
$$P'\Sigma^{-1}P = \begin{pmatrix}k & k\\ k & P_{kk}\end{pmatrix}, \quad\text{with}\quad k^{-2\rho}P_{kk} = k^{2(1-\rho)} - 2\sum_{i=1}^{k-1} i^{1-\rho}\left((i+1)^{1-\rho} - i^{1-\rho}\right).$$
We have $\Delta = |P'\Sigma^{-1}P| = k(P_{kk} - k)$, and $b'\Sigma^{-1}b = P_{kk}$ (as in Proposition 2.2). The weights $a_i^H = a_i^H(\rho)$, $i = 1, 2, \ldots, k$, in
$$BL_H^{(\rho)}(k) = \sum_{i=1}^{k} a_i^H(\rho)\,H(i),$$
are given by
$$a_i^H(\rho) = \frac{k^{2\rho}}{k(P_{kk}-k)}\left(-i(i-1)S_{i-1} + 2i^2 S_i - i(i+1)S_{i+1}\right), \quad 1 \leq i \leq k-1, \eqno(2.6)$$
$$a_k^H(\rho) = \frac{k^{2\rho}}{k(P_{kk}-k)}\left(-k(k-1)S_{k-1} + k^2 S_k\right), \eqno(2.7)$$
where
$$S_i = \sum_{j=1}^{k-1} j\left(j^{-\rho} - i^{-\rho}\right)\left(2j^{1-\rho} - (j-1)^{1-\rho} - (j+1)^{1-\rho}\right) + k\left(k^{-\rho} - i^{-\rho}\right)\left(k^{1-\rho} - (k-1)^{1-\rho}\right), \quad 1 \leq i \leq k.$$
Moreover,
$$\lim_{k\to\infty}\frac{k\,b'\Sigma^{-1}b}{\Delta} = \lim_{k\to\infty}\frac{P_{kk}}{P_{kk}-k} = \left(\frac{1-\rho}{\rho}\right)^2. \eqno(2.8)$$
Proof. Note that, for any $\alpha \geq 1$ and as $k \to \infty$, $\sum_{i=1}^{k} i^{\alpha} = k^{\alpha+1}/(\alpha+1) + k^{\alpha}/2 + o(k^{\alpha})$. Consequently $P_{kk} = \left(k(1-\rho)^2/(1-2\rho)\right)(1+o(1))$, and (2.8) follows. $\square$

From the previous results we may thus derive the following:

Theorem 2.2. If the second-order condition (1.9) holds and if $k = k_n$ is a sequence of intermediate positive integers, i.e., (1.1) holds, then, with
$$BL_H^{(\rho)}(k) := \sum_{i=1}^{k} a_i^H(\rho)\,H(i), \eqno(2.9)$$
$a_i^H(\rho)$, $1 \leq i \leq k-1$, and $a_k^H(\rho)$ given in (2.6) and (2.7), respectively,
$$BL_H^{(\rho)}(k) \stackrel{d}{=} \gamma + \frac{\gamma(1-\rho)}{|\rho|\sqrt k}\,P_k^{BL_H^{(\rho)}} + o_p(A(n/k)),$$
with $P_k^{BL_H^{(\rho)}}$ asymptotically standard normal. Consequently, if $\sqrt k\,A(n/k) \to \lambda$, finite, non-necessarily null, then
$$\sqrt k\left(BL_H^{(\rho)}(k) - \gamma\right) \xrightarrow[n\to\infty]{d} \mathrm{Normal}\left(0,\ \frac{\gamma^2(1-\rho)^2}{\rho^2}\right). \eqno(2.10)$$
Remark 2.3. We think it sensible to conjecture that (2.10) remains true whenever we replace $\rho$ in (2.9) by any consistent estimator $\hat\rho$ of $\rho$, such that $\hat\rho - \rho = o_p(1)$ independently of the level $k$ on which we intend to base the estimation of the tail index $\gamma$, i.e., a level $k$ such that $\sqrt k\,A(n/k) \to \lambda$, finite, non-necessarily null, as $n \to \infty$. However, we have been unable to establish this conclusion, namely because of the intricate expressions of the weights $a_i^H(\rho)$, $1 \leq i \leq k$, in (2.6) and (2.7). We thus postpone the use of the estimation of the second-order parameter $\rho$ to the next section, where we shall consider, without loss of generality and for the sake of simplicity, linear combinations of scaled log-spacings.
3. Linear combinations of top log-observations as linear combinations of scaled log-spacings

We first state the following:

Proposition 3.1. A linear combination of the $k+1$ largest log-observations,
$$L_n(k) := \sum_{i=1}^{k+1} b_i \ln X_{n-i+1:n}, \eqno(3.1)$$
is scale invariant if and only if $\sum_{i=1}^{k+1} b_i = 0$.

Proof. Indeed, if we change our data from $X$ to $X/C$, $C > 0$, the estimator in (3.1) changes to $\sum_{i=1}^{k+1} b_i \ln(X_{n-i+1:n}/C) = L_n(k) - \ln C\,\sum_{i=1}^{k+1} b_i$, which equals $L_n(k)$ if and only if $C = 1$ (there is no change) or $\sum_{i=1}^{k+1} b_i = 0$. $\square$

Proposition 3.2. Any linear combination of Hill's estimators in (1.8), up to $k$, as well as any linear combination of the scaled log-spacings in (2.5), may be written as a scale invariant linear combination of the $k+1$ top o.s. of the log-observations. Conversely, any scale invariant linear combination (3.1) may be written as a linear combination of Hill's estimators, as well as a linear combination of the scaled log-spacings,
$$L_n(k) := \sum_{i=1}^{k+1} b_i \ln X_{n-i+1:n} = \sum_{i=1}^{k} a_i H(i) = \sum_{i=1}^{k} c_i U_i,$$
where, for $1 \leq i \leq k$,
$$b_i = \sum_{j=i}^{k}\frac{a_j}{j} - a_{i-1} \quad (a_0 = 0,\ b_{k+1} = -a_k) \eqno(3.2)$$
and
$$b_i = i c_i - (i-1)c_{i-1} \quad (b_{k+1} = -k c_k) \iff c_i = \frac1i\sum_{j=1}^{i} b_j. \eqno(3.3)$$
Proof. For a linear combination of Hill's estimators we get
$$\sum_{i=1}^{k} a_i H(i) = \sum_{i=1}^{k}\frac{a_i}{i}\sum_{j=1}^{i}\{\ln X_{n-j+1:n} - \ln X_{n-i:n}\} = \sum_{i=1}^{k}\frac{a_i}{i}\left(\sum_{j=1}^{i}\ln X_{n-j+1:n} - i\ln X_{n-i:n}\right)$$
$$= \sum_{j=1}^{k}\left(\sum_{i=j}^{k}\frac{a_i}{i}\right)\ln X_{n-j+1:n} - \sum_{i=1}^{k} a_i\ln X_{n-i:n} = \sum_{j=1}^{k}\left(\sum_{i=j}^{k}\frac{a_i}{i}\right)\ln X_{n-j+1:n} - \sum_{j=2}^{k+1} a_{j-1}\ln X_{n-j+1:n},$$
with $a_0 = 0$, and hence (3.2), with $\sum_{i=1}^{k+1} b_i = 0$. On the other side,
$$\sum_{i=1}^{k} c_i U_i = \sum_{i=1}^{k} i c_i\{\ln X_{n-i+1:n} - \ln X_{n-i:n}\} = \sum_{i=1}^{k} i c_i\ln X_{n-i+1:n} - \sum_{i=2}^{k+1}(i-1)c_{i-1}\ln X_{n-i+1:n} \quad (c_0 = 0)$$
$$= \sum_{i=1}^{k}\left(i c_i - (i-1)c_{i-1}\right)\ln X_{n-i+1:n} - k c_k\ln X_{n-k:n},$$
and (3.3) follows, again with $\sum_{i=1}^{k+1} b_i = 0$. $\square$
Since the linear combination obtained in Section 2 for a general $\rho$ is a bit intricate, and things become still worse if we directly approach linear combinations of log-excesses, we are now going to investigate linear combinations of scaled log-spacings, in order to obtain the weights of the largest $k+1$ log-observations.

3.1. Linear combinations of scaled log-spacings

Let us think of the vector of scaled log-spacings $U \equiv (U_i = i[\ln X_{n-i+1:n} - \ln X_{n-i:n}],\ 1 \leq i \leq k)$. For heavy tails, and whenever $k = k_n \to \infty$ and $k = k_n = o(n)$, as $n \to \infty$, i.e., whenever (1.1) holds, the scaled log-spacings $U_i$, $1 \leq i \leq k$, are approximately independent
and exponential, with mean value vector
$$\gamma\,\mathbf{1} + A(n/k)\left(\left(\frac ik\right)^{-\rho}\right)_{i=1,\ldots,k} =: \gamma\,\mathbf{1} + A(n/k)\,b, \eqno(3.4)$$
as $n \to \infty$ (Beirlant et al., 1999; Draisma, 2000, pp. 43–59), where $A(\cdot)$ and $\rho < 0$ are the function and the parameter related to the second-order behaviour of $F$ in (1.9). Note that, in a similar way, Feuerverger and Hall (1999) consider the scaled log-spacings in (2.5) as approximately independent and exponential with mean value $\gamma\exp(D(n/i)^{\rho})$, $1 \leq i \leq k$, assuming thus to be working in Hall's class of Pareto-type models in (1.10), where $A(t) = C t^{\rho}$, $D = C/\gamma$. We may thus consider that, approximately, $\Sigma = I$, the identity matrix, and "asymptotically best linear unbiased" combinations of the scaled log-spacings are for sure easier to derive. We may state the following:

Proposition 3.3. Assuming (3.4), as well as $\Sigma = I$, we have
$$P'\Sigma^{-1}P = \begin{pmatrix}k & \sum_{j=1}^{k}\left(\frac jk\right)^{-\rho}\\[0.5ex] \sum_{j=1}^{k}\left(\frac jk\right)^{-\rho} & \sum_{j=1}^{k}\left(\frac jk\right)^{-2\rho}\end{pmatrix}$$
and consequently
$$\Delta = |P'\Sigma^{-1}P| = k\sum_{j=1}^{k}\left(\frac jk\right)^{-2\rho} - \left(\sum_{j=1}^{k}\left(\frac jk\right)^{-\rho}\right)^2.$$
The weights $a_i^U = a_i^U(\rho)$, $i = 1, 2, \ldots, k$, are given by
$$a_i^U(\rho) = \frac{1}{\Delta}\left(\sum_{j=1}^{k}\left(\frac jk\right)^{-2\rho} - \left(\frac ik\right)^{-\rho}\sum_{j=1}^{k}\left(\frac jk\right)^{-\rho}\right), \quad 1 \leq i \leq k.$$
Moreover,
$$\lim_{k\to\infty}\frac{k\,b'\Sigma^{-1}b}{\Delta} = \lim_{k\to\infty}\frac{\frac1k\sum_{i=1}^{k}\left(\frac ik\right)^{-2\rho}}{\frac1k\sum_{i=1}^{k}\left(\frac ik\right)^{-2\rho} - \left(\frac1k\sum_{i=1}^{k}\left(\frac ik\right)^{-\rho}\right)^2} = \left(\frac{1-\rho}{\rho}\right)^2.$$
We therefore obtain the BL r.v.
$$BL_U^{(\rho)}(k) := \sum_{i=1}^{k} a_i^U(\rho)\,U_i = \frac{\frac1k\sum_{i=1}^{k}\left(\frac ik\right)^{-2\rho}\,\frac1k\sum_{i=1}^{k}U_i - \frac1k\sum_{i=1}^{k}\left(\frac ik\right)^{-\rho}\,\frac1k\sum_{i=1}^{k}\left(\frac ik\right)^{-\rho}U_i}{\frac1k\sum_{i=1}^{k}\left(\frac ik\right)^{-2\rho} - \left(\frac1k\sum_{i=1}^{k}\left(\frac ik\right)^{-\rho}\right)^2}, \eqno(3.5)$$
based on the scaled log-spacings in (2.5).

Remark 3.1. As, for $\eta > -1$,
$$\lim_{k\to\infty}\frac1k\sum_{i=1}^{k}\left(\frac ik\right)^{\eta} = \int_0^1 x^{\eta}\,\mathrm{d}x = \frac{1}{\eta+1},$$
the denominator in (3.5) converges towards
$$\frac{1}{1-2\rho} - \left(\frac{1}{1-\rho}\right)^2 = \frac{\rho^2}{(1-\rho)^2(1-2\rho)},$$
and consequently the r.v. in (3.5) is asymptotically equivalent to
$$\widetilde{BL}{}_U^{(\rho)}(k) := \frac{(1-\rho)(1-2\rho)}{\rho^2}\,\frac1k\sum_{i=1}^{k}\left(\frac{1-\rho}{1-2\rho} - \left(\frac ik\right)^{-\rho}\right)U_i. \eqno(3.6)$$
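Formula (3.6) is simple enough to implement directly. The following sketch (ours, not the paper's code) computes the scaled log-spacings (2.5) and the combination (3.6) for a known $\rho$; for a strict Pareto sample there is no second-order bias to remove, so the estimator simply behaves as a consistent, higher-variance competitor of Hill's:

```python
import math
import random

def scaled_log_spacings(sample, k):
    """U_i = i (ln X_{n-i+1:n} - ln X_{n-i:n}), 1 <= i <= k  -- cf. (2.5)."""
    logx = sorted(math.log(v) for v in sample)[::-1]   # descending log o.s.
    return [(i + 1) * (logx[i] - logx[i + 1]) for i in range(k)]

def bl_tilde(sample, k, rho):
    """Asymptotically unbiased combination (3.6), for a known rho < 0."""
    U = scaled_log_spacings(sample, k)
    c = (1 - rho) * (1 - 2 * rho) / rho ** 2
    s = sum(((1 - rho) / (1 - 2 * rho) - ((i + 1) / k) ** (-rho)) * U[i]
            for i in range(k))
    return c * s / k

# strict Pareto data, gamma = 1
random.seed(7)
data = [random.random() ** (-1.0) for _ in range(5000)]
```

With $\rho = -1$ and $k = 1000$ the asymptotic standard deviation is $\gamma(1-\rho)/(|\rho|\sqrt k) \approx 0.063$, about twice that of Hill at the same level.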
We may state the following general result:

Theorem 3.1. The limiting normal behaviour (2.10), in Theorem 2.2, holds true for $BL_U^{(\rho)}(k)$ in (3.5), or equivalently, for $\widetilde{BL}{}_U^{(\rho)}(k)$ in (3.6). Moreover, the same limiting normal behaviour remains true if we consider the tail index estimators $BL_U^{(\hat\rho)}(k)$, or $\widetilde{BL}{}_U^{(\hat\rho)}(k)$, for any second-order parameter estimator $\hat\rho$ such that $\hat\rho - \rho = o_p(1)$ independently of $k$, the level on which we are going to base the estimation of the tail index $\gamma$.

Proof. The first part of the theorem results from the fact that $\sum_{i=1}^{k} i^{\alpha-1}/k^{\alpha} = 1/\alpha + O(1/k)$, and that, as proven in Gomes and Martins (2002a), we get for $\alpha \geq 1$,
$$\frac{\alpha}{k}\sum_{i=1}^{k}\left(\frac ik\right)^{\alpha-1}U_i \stackrel{d}{=} \gamma + \frac{\gamma\,\alpha}{\sqrt{(2\alpha-1)k}}\,Z_k^{(\alpha)} + \frac{\alpha}{\alpha-\rho}\,A(n/k)(1 + o_p(1)),$$
where $Z_k^{(\alpha)} = \sqrt{(2\alpha-1)k}\left(\frac1k\sum_{i=1}^{k}\left(\frac ik\right)^{\alpha-1}E_i - \frac1\alpha\right)$ is asymptotically standard normal, $\{E_i\}$ being a sequence of independent, standard exponential r.v.'s. The asymptotic variance comes from the random term
$$\frac{\gamma(1-\rho)}{\rho^2\sqrt k}\left((1-\rho)Z_k^{(1)} - \sqrt{1-2\rho}\,Z_k^{(1-\rho)}\right)$$
and from the fact that the asymptotic covariance between $Z_k^{(1)}$ and $Z_k^{(1-\rho)}$ is $\sqrt{1-2\rho}/(1-\rho)$. The second part of the theorem results essentially from the distributional representation
$$\frac1k\sum_{i=1}^{k}\left(\frac ik\right)^{\alpha-1}\ln\left(\frac ik\right)U_i \stackrel{d}{=} -\frac{\gamma}{\alpha^2} + \frac{\sqrt2\,\gamma}{(2\alpha-1)^{3/2}\sqrt k}\,W_k^{(\alpha)} - \frac{A(n/k)}{(\alpha-\rho)^2}(1 + o_p(1)),$$
where $W_k^{(\alpha)} = (2\alpha-1)\sqrt{k(2\alpha-1)/2}\left(\frac1k\sum_{i=1}^{k}\left(\frac ik\right)^{\alpha-1}\ln\left(\frac ik\right)E_i + \frac{1}{\alpha^2}\right)$ is asymptotically standard normal. The derivative of either $BL_U^{(\rho)}$ or $\widetilde{BL}{}_U^{(\rho)}$ with respect to $\rho$ then provides a r.v. that may be written as $\xi_k(\rho) = O_p(1/\sqrt k) + O_p(A(n/k))$. Since the use of the $\delta$-method enables us to write
$$BL_U^{(\hat\rho)}(k) \stackrel{d}{=} BL_U^{(\rho)}(k) + (\hat\rho - \rho)\,\xi_k(\rho)(1 + o_p(1)),$$
with $\xi_k(\rho) = O_p(1/\sqrt k) + O_p(A(n/k))$, we may write $\sqrt k\left(BL_U^{(\hat\rho)}(k) - \gamma\right) \stackrel{d}{=} \sqrt k\left(BL_U^{(\rho)}(k) - \gamma\right) + o_p(1)$, whenever $\sqrt k\,A(n/k) \xrightarrow[n\to\infty]{} \lambda$, finite, non-necessarily null. $\square$
Remark 3.2. For $\hat\rho$ we may choose, such as in Gomes and Martins (2002b), one of the estimators in Fraga Alves et al. (2003). More specifically, we shall consider here particular members of the class of estimators
$$\hat\rho^{(\tau)} := \min\left(0,\ \frac{3\left(T_n^{(\tau)}(k_1) - 1\right)}{T_n^{(\tau)}(k_1) - 3}\right), \qquad k_1 = \min\left(n-1,\ \left[\frac{2n}{\ln\ln n}\right]\right), \eqno(3.7)$$
where
$$T_n^{(\tau)}(k) := \begin{cases}\dfrac{\left(M_n^{(1)}(k)\right)^{\tau} - \left(M_n^{(2)}(k)/2\right)^{\tau/2}}{\left(M_n^{(2)}(k)/2\right)^{\tau/2} - \left(M_n^{(3)}(k)/6\right)^{\tau/3}} & \text{if } \tau > 0,\\[3ex] \dfrac{\ln\left(M_n^{(1)}(k)\right) - \frac12\ln\left(M_n^{(2)}(k)/2\right)}{\frac12\ln\left(M_n^{(2)}(k)/2\right) - \frac13\ln\left(M_n^{(3)}(k)/6\right)} & \text{if } \tau = 0,\end{cases}$$
with
$$M_n^{(j)}(k) := \frac1k\sum_{i=1}^{k}\left(\ln\frac{X_{n-i+1:n}}{X_{n-k:n}}\right)^{j}, \quad j = 1, 2, 3.$$
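The class (3.7) can be sketched in code as follows (our implementation, covering both the $\tau > 0$ and $\tau = 0$ branches; the finite-sample behaviour illustrated here is not a statement from the paper):

```python
import math
import random

def rho_hat(sample, tau=0):
    """Second-order parameter estimator (3.7) (Fraga Alves et al., 2003 class)."""
    n = len(sample)
    k1 = min(n - 1, int(2 * n / math.log(math.log(n))))
    logx = sorted(math.log(v) for v in sample)[::-1]
    # M_n^(j)(k1), j = 1, 2, 3: moments of the log-excesses over X_{n-k1:n}
    M = [sum((logx[i] - logx[k1]) ** j for i in range(k1)) / k1 for j in (1, 2, 3)]
    if tau > 0:
        T = (M[0] ** tau - (M[1] / 2) ** (tau / 2)) \
            / ((M[1] / 2) ** (tau / 2) - (M[2] / 6) ** (tau / 3))
    else:   # tau = 0: the log-version of the statistic
        T = (math.log(M[0]) - 0.5 * math.log(M[1] / 2)) \
            / (0.5 * math.log(M[1] / 2) - math.log(M[2] / 6) / 3.0)
    return min(0.0, 3.0 * (T - 1.0) / (T - 3.0))

# Burr(gamma=1, rho=-0.5) sample via the quantile of 1-F:
# x = (p^rho - 1)^(-gamma/rho), p uniform(0,1)
random.seed(11)
data = [(random.random() ** -0.5 - 1.0) ** 2.0 for _ in range(5000)]
```

For this Burr parent the true second-order parameter is $\rho = -0.5$, and the estimate $\hat\rho_0$ is negative and of that order of magnitude.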
We advise practitioners not to choose blindly the value of $\tau$ in (3.7). It is sensible to draw a few sample paths of $\hat\rho_n^{(\tau)}(k)$ in (3.7), as functions of $k$, selecting the value of $\tau$ which provides higher stability for large $k$, by means of any stability criterion. Anyway, in all the Monte Carlo simulations we have considered the level $k_1$ in (3.7) and the $\rho$-estimators
$$\hat\rho_0 := \min\left(0,\ \frac{3\left(T_n^{(0)}(k_1) - 1\right)}{T_n^{(0)}(k_1) - 3}\right), \quad\text{advisable for } -1 \leq \rho < 0,$$
and
$$\hat\rho_1 := \min\left(0,\ \frac{3\left(T_n^{(1)}(k_1) - 1\right)}{T_n^{(1)}(k_1) - 3}\right), \quad\text{advisable for } \rho < -1.$$
Remark 3.3. With any of the estimators $\hat\rho = \hat\rho^{(\tau)}$ in (3.7), we have $\hat\rho - \rho = O_p\left(1/(\sqrt{k_1}\,A(n/k_1))\right) = o_p(1)$ for any level $k$ such that $\sqrt k\,A(n/k) \to \lambda$, finite. Indeed, we may even have $\sqrt k\,A(n/k) \to \infty$, getting the same limiting result in (2.10), provided that $\hat\rho - \rho = o_p\left(1/(\sqrt k\,A(n/k))\right)$, and this happens for every $k$ of smaller order than $n/\ln\ln n$.

Remark 3.4. The results in the second part of Theorem 3.1 are now equivalent to the ones obtained in Gomes and Martins (2002b) for the "Maximum Likelihood" estimator, based on an external estimation of $\rho$, and given by
$$ML^{(\hat\rho)}(k) := \frac1k\sum_{i=1}^{k}U_i - \left(\frac1k\sum_{i=1}^{k} i^{-\hat\rho}\,U_i\right)\times\frac{\sum_{i=1}^{k}U_i\sum_{i=1}^{k} i^{-\hat\rho} - k\sum_{i=1}^{k} i^{-\hat\rho}\,U_i}{\sum_{i=1}^{k} i^{-\hat\rho}\,U_i\sum_{i=1}^{k} i^{-\hat\rho} - k\sum_{i=1}^{k} i^{-2\hat\rho}\,U_i}, \eqno(3.8)$$
also based on the scaled log-spacings $U_i$ in (2.5).

Remark 3.5. At a first sight it may appear that we are working in Csörgő et al.'s (1985) class of kernel estimators of $\gamma$, defined as
$$\hat\gamma_{K_n}(k) := \frac{\frac1k\sum_{i=1}^{n} K\left(\frac ik\right)U_i}{\frac1k\sum_{i=1}^{n} K\left(\frac ik\right)}, \qquad \int_0^\infty K(u)\,\mathrm{d}u = 1, \quad \int_0^\infty u^{-1/2}K(u)\,\mathrm{d}u < \infty, \quad K \geq 0.$$
Indeed, $\widetilde{BL}{}_U^{(\rho)}(k)$ in (3.6) has a "pseudo-kernel"
$$K(u) := \frac{(1-\rho)(1-2\rho)}{\rho^2}\left(\frac{1-\rho}{1-2\rho} - u^{-\rho}\right), \quad 0 \leq u \leq 1.$$
Note however that although $\int_0^1 K(u)\,\mathrm{d}u = 1$ and $\int_0^1 u^{-1/2}K(u)\,\mathrm{d}u < \infty$, we do not have $K(u) \geq 0$, as needed in Csörgő et al. (1985). We are not also in the more general class of estimators in Drees (1998), which includes this type of "asymptotically unbiased" tail index estimators, now because our functional of the quantile function $Q_n(t) = X_{n-[k_n t]:n}$, $t \in$
Fig. 1. Simulated sample paths of the BL and ML-estimators for a Fréchet(1) parent.
$[0, 1]$, depends on $\hat\rho$, which uses a larger number of top o.s. than the number $k$ used for the tail index estimation. In Drees' class of functionals the minimal asymptotic variance is given by $\gamma^2((1-\rho)/\rho)^2$, and this is also the value we get here—see formula (2.10).

As expected, the exact behaviour of "asymptotically best linear unbiased" combinations based on the Hill estimators, on the log-excesses or on the scaled log-spacings does not differ significantly, as shown in Fig. 1, where we picture, for a Fréchet underlying parent and for a sample size $n = 1000$, the sample paths of $BL_H^{(-1)}$, $BL_U^{(-1)}$ and $BL_V^{(-1)}$, where $V$ denotes the vector of the log-excesses $V \equiv (V_{ik} := \ln X_{n-i+1:n} - \ln X_{n-k:n},\ 1 \leq i \leq k)$. Those sample paths almost overlap. We also picture sample paths of $BL_H^{(\hat\rho_i)}$, $BL_U^{(\hat\rho_i)}$ and $BL_V^{(\hat\rho_i)}$, $i = 0, 1$, again almost overlapping. Only for very small values of $k$, as well as, but not so significantly, for large values of $k$, do significant differences (of a size larger than 0.01) appear, and these have no special influence on the final properties of the estimators. This justifies the use, in practice as well as in the simulations performed, of $BL_U$ instead of either $BL_H$ or $BL_V$. The sample paths of $ML^{(-1)}$ and $ML^{(\hat\rho_i)}$, $i = 0, 1$, are also pictured, showing a significantly different behaviour between the BL and the ML statistics, in spite of their asymptotic equivalence. These estimators $BL_H^{(\bullet)}(k)$, $BL_U^{(\bullet)}(k)$ and $BL_V^{(\bullet)}(k)$, with $\rho$ either misspecified at $-1$ or estimated through $\hat\rho$, are all linear combinations of the $k+1$ top log-observations, with weights made explicit in the sequel.

3.2. Linear combinations of top log-observations

From the results derived in Section 3.1, more specifically from the weights of the scaled log-spacings in (3.6), we may easily derive the weights of the top log-observations enabling an "asymptotically unbiased" estimation of a positive tail index $\gamma$, under the second-order regular variation condition in (1.9):
Theorem 3.2. The "asymptotically best linear unbiased" combination of the $k+1$ top log-observations for a positive tail index $\gamma$, under the second-order regular variation condition in (1.9), is given by
$$BL^{(\hat\rho)}(k) := \sum_{i=1}^{k+1} a_i^L(\hat\rho)\ln X_{n-i+1:n}, \eqno(3.9)$$
with
$$a_i^L(\hat\rho) = \frac{(1-\hat\rho)(1-2\hat\rho)}{k\,\hat\rho^2}\left(\frac{1-\hat\rho}{1-2\hat\rho} - k^{\hat\rho}\left(i^{1-\hat\rho} - (i-1)^{1-\hat\rho}\right)\right)$$
for $1 \leq i \leq k$, and
$$a_{k+1}^L(\hat\rho) = -\frac{1-\hat\rho}{\hat\rho}.$$
The results in Theorem 3.1 hold true for the estimator $BL^{(\hat\rho)}(k)$ in (3.9).
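The weights of Theorem 3.2 can be generated and checked numerically: scale invariance ($\sum_i a_i^L = 0$, Proposition 3.1) holds exactly for every finite $k$, not only in the limit. A sketch (ours, not the paper's code):

```python
def log_obs_weights(k, rho):
    """Weights a_i^L(rho), i = 1..k+1, of Theorem 3.2; the last entry is the
    weight of ln X_{n-k:n}."""
    c = (1 - rho) * (1 - 2 * rho) / (k * rho ** 2)
    a = [c * ((1 - rho) / (1 - 2 * rho)
              - k ** rho * (i ** (1 - rho) - (i - 1) ** (1 - rho)))
         for i in range(1, k + 1)]
    a.append(-(1 - rho) / rho)   # positive, since rho < 0
    return a

w = log_obs_weights(100, -1.0)
```

At $\rho = -1$ the weight of $\ln X_{n-k:n}$ is $-(1-\rho)/\rho = 2$, and the remaining weights decrease with $i$ from a positive to a negative value, as described in Remark 3.6.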
Remark 3.6. For an "asymptotically BL unbiased" estimator based on the top $k+1$ largest observations, the largest weight is the one of $\ln X_{n-k:n}$, equal to the positive value $-(1-\rho)/\rho$, which goes to $+\infty$ as $\rho \to 0$, approaching 1 as $\rho \to -\infty$. The weights of the $k$ largest log-observations, $\ln X_{n-i+1:n}$, $1 \leq i \leq k$, are decreasing with $i$, from a positive value at $i = 1$ till a negative value at $i = k$ (both going to zero, as $k \to \infty$, at a rate of the order of $1/k$). Indeed,
$$k\,a_1^L(\rho) \xrightarrow[k\to\infty]{} \left(\frac{1-\rho}{\rho}\right)^2 \quad\text{and}\quad k\,a_k^L(\rho) \xrightarrow[k\to\infty]{} \frac{2(1-\rho)^2}{\rho}.$$
Note also that for the Hill estimator at the level $k$, the weight of $\ln X_{n-k:n}$ is negative and equal to $-1$, and the weights of the other $k$ top log-observations are all positive and equal to $1/k$.

4. The finite sample behaviour of the estimators

To enhance the fact that, despite their asymptotic equivalence, $BL^{(-1)}$ in (2.2) and $ML^{(-1)}$ in (2.4), here denoted BL and ML, respectively, have a different behaviour as the underlying model changes, we present Figs. 2 and 3. In Fig. 2 we present the relative efficiencies of BL|ML at their optimal levels. Such a measure is given by
$$REFF_{BL|ML} = \sqrt{\frac{MSE[ML_0]}{MSE[BL_0]}},$$
with both estimators considered at their optimal levels, i.e., the levels where the mean squared error is minimum: for any estimator $G$ of the tail index $\gamma$, we denote $G_0 = G(k_0^G(n))$, with $k_0^G(n) := \arg\min_k MSE[G(k)]$. Notice that high relative efficiencies correspond to better performances of the BL-estimator relatively to the ML-estimator, and the other way
Fig. 2. Relative efficiencies of BL|ML for different parents.
Fig. 3. Simulated distributional behaviour of the estimators under study for a Burr(1, −0.25) model (n = 1000).
round for low relative efficiencies. Notice also that in Fig. 2 (right) we have pictured, in a different scale, the three central REFF measures presented in Fig. 2 (left), in a dark shade. The reason for this will be given below. Note that this figure must be carefully interpreted, so that it is not misleading—the estimators ML and BL are compared at different (optimal) $k$-values. Note however that this is what we try to achieve in practice when the adaptive choice of $k$ is based on the minimum estimated mean squared error (Draisma et al., 1999; Danielsson et al., 2001; Gomes and Oliveira, 2001, among others). Some general comments:

1. As $\rho$ approaches 0, the BL mean squared error exhibits two minima for large $n$—look at Fig. 3, where we picture, for samples of size $n = 1000$, the mean value and the mean squared error of the BL and the ML estimators, together with the Hill estimator, for a Burr(1, −0.25) parent, with d.f. $F_{\gamma,\rho}(x) = 1 - (1 + x^{-\rho/\gamma})^{1/\rho}$, $x \geq 0$, $\gamma > 0$, $\rho < 0$. For such a value of $\rho$, the global minimum is always achieved at the largest $k$-value, and the comparison has been done for both minimum values of MSE. The REFF measure associated to the global minimum (the second one) is pictured in Fig. 2 (left), in a grey shade and with the subscript 2, and the one associated to the first minimum is pictured in the same figure (left and right), with the subscript 1.

2. For a Burr(1, −0.5) model, the "two minima" are indistinguishable for the sample sizes considered (see Fig. 5, for instance), and we are thus working with the unique minimum
Fig. 4. Simulated distributional behaviour of the estimators under study for n = 1000 in a Fréchet(1) parent (ρ = −1).
value really available (assumed to be the "second one"). This is the reason why we picture the corresponding relative efficiency also only in Fig. 2 (left), and in a grey shade.
3. For a Burr(1, −1) model, the ML estimator reveals no bias, and that is the reason for the small relative efficiency of BL|ML, pictured also only in Fig. 2 (left), in a grey shade.
4. The results we think it sensible to consider, at the present state of the art, are the ones in Fig. 2 (right), also pictured in the central part of Fig. 2 (left), all in a dark shade. For those parents, the relative efficiency will ultimately attain the value one, although it is still a long way from one for the Fréchet model even for a sample size n = 20,000. As mentioned before, we think that the other three REFF measures, the ones in grey, are related to specific peculiarities of the estimators, which deserve further study.
5. The main point seems to be the following: asymptotically equivalent estimators may reveal quite distinct finite sample behaviour, and work differently in practice. It is therefore sensible to work with a few of them, instead of only one. The Hill estimator may be considered the best one, among the estimators under study, if we use a small (up to moderate) number of top o.s., because the Hill estimator then exhibits the smallest MSE. But the mean squared error of Hill's estimator, as a function of k, has a very sharp minimum if |ρ| < 1 (see Figs. 3 and 5). Then, a small change in k may imply a drastic change in the estimate and in the associated mean squared
Fig. 5. Simulated distributional behaviour of the estimators under study for a Burr(1, −0.5) model (n = 1000). (Panels: E, MSE and √k·SD, as functions of k, for the H, BL(−1), ML(−1), BL(ρ̂0) and ML(ρ̂0) estimators.)
error. Both the BL(−1) and the ML(−1) estimators improve on the Hill estimator when we look at the "stability" of the sample paths over a wide region of k-values (around the target value γ), even when they are not competitive regarding minimum mean squared error (see also Section 5).
In Figs. 4–6 we present the patterns of mean values (E), mean squared errors (MSE) and of √(k Var) =: √k·SD (with Var denoting variance, and SD standard deviation) of the Hill (H), the BL ≡ BL(−1) and the ML ≡ ML(−1) estimators for the Fréchet(1), the Burr(1, −0.5) and the Burr(1, −2) parents, respectively. The simulation has been based on 1000 runs. In these figures we also picture the mean value and mean squared error of the estimators BL(ρ̂i) and ML(ρ̂i), for the adequate value of i, either 0 or 1.
We also present in Tables 1–3 the simulated optimal sample fractions, bias and mean squared errors of the different estimators under investigation. Denoting generically by G(k) any estimator of γ, we write OSF_G := k0^G(n)/n, k0^G(n) := arg min_k MSE[G(k)], E0^G := E[G(k0^G(n))] and MSE0^G := MSE[G(k0^G(n))]. The Monte Carlo simulation herewith implemented is a multi-sample simulation of size 1000 × 10 for n = 100, 200, 500 and 1000, and of size 1000 × 5 for n = 2000 and 5000. Standard errors are not presented, but are available from the authors. The smallest bias and mean squared errors are double-underlined, and the second smallest are single-underlined.
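In this notation, the optimal-level quantities can be approximated by brute-force simulation. The following is a minimal Python/NumPy sketch, not the paper's implementation, for the Hill estimator under a Fréchet(1) parent; the sample size, k-grid and number of runs are our illustrative choices, far smaller than the 1000 × 10 design described above.

```python
import numpy as np

def hill_path(x, ks):
    """Hill estimates H(k) = mean_{i<=k} log X_{n-i+1:n} - log X_{n-k:n},
    evaluated on a grid of k-values."""
    lx = np.log(np.sort(x))
    return np.array([(lx[-k:] - lx[-k - 1]).mean() for k in ks])

rng = np.random.default_rng(0)
n, gamma, runs = 1000, 1.0, 200              # Frechet(1): gamma = 1, rho = -1
ks = np.arange(5, n - 5, 5)
# Frechet(1) samples by inversion: F(x) = exp(-1/x)  =>  x = -1/log(U)
est = np.array([hill_path(-1.0 / np.log(rng.uniform(size=n)), ks)
                for _ in range(runs)])
mse = ((est - gamma) ** 2).mean(axis=0)      # simulated MSE[G(k)] on the grid
j0 = int(mse.argmin())
k0 = int(ks[j0])                              # k0 = arg min_k MSE[G(k)]
osf = k0 / n                                  # simulated OSF_G
e0, mse0 = est[:, j0].mean(), mse[j0]         # simulated E0_G and MSE0_G
```

Running the same loop for each estimator and parent would give simulated analogues of the OSF, bias and MSE entries in Tables 1–3.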
Fig. 6. Simulated distributional behaviour of the estimators under study for a Burr(1, −2) model (n = 1000). (Panels: E, MSE and √k·SD, as functions of k, for the H, BL(−1), ML(−1), BL(ρ̂1) and ML(ρ̂1) estimators.)
We now advance some extra comments:
6. For not very large values of n (say n ≤ 1000) there exists only a slight improvement, or even no improvement at all, in terms of smaller minimum mean squared error, when we use ρ̂ instead of a misspecification of ρ at −1, unless |ρ| is reasonably large (see Fig. 6, for a parent with ρ = −2). Then the improvement is really significant.
7. As ρ approaches 0 (see Fig. 5, for a parent with ρ = −0.5), there is a significant difference between the mean value patterns of the estimators BL(−1) and BL(ρ̂0). Indeed, the BL(ρ̂0) estimator exhibits sample paths highly stable around the target value γ, but with a reasonably high volatility. Such a volatility gives rise to "similar" mean squared errors, as functions of k, both when we misspecify and when we estimate ρ.
8. In general, we may say that, whenever ρ ≠ −1, the replacement of ρ = −1 by ρ̂ enables us to achieve, with the BL-estimates, sample paths with a reasonably high volatility, but around the target value γ for a wider region of k-values. Indeed, the sample paths of the BL-estimators are much more stable (around the target value γ) than those of the equivalent ML-estimators. The trouble with the BL-estimators is related to the fact that the variance of √k(BL(k) − γ) is, for finite n, an increasing function of k, contrary to what happens with the
Table 1
Simulated optimal sample fractions (OSF) of the tail index estimators

n       H        BL(−1)   ML(−1)   BL(ρ̂0)   ML(ρ̂0)   BL(ρ̂1)   ML(ρ̂1)

Student (ν = 4)
200     0.0405   0.1990   0.1280   0.1180    0.2965    0.1470    0.1450
500     0.0278   0.2024   0.0968   0.1222    0.2096    0.2354    0.0846
1000    0.0195   0.2070   0.0697   0.1181    0.1594    0.3534    0.0577
2000    0.0141   0.2139   0.0419   0.1112    0.1230    0.3697    0.0390
5000    0.0095   0.2126   0.0415   0.1012    0.0853    0.3721    0.0241

Student (ν = 2)
200     0.0850   0.2115   0.3850   0.1310    0.3855    0.2005    0.2640
500     0.0672   0.1818   0.3536   0.1410    0.4252    0.2468    0.1908
1000    0.0563   0.1678   0.3214   0.1339    0.4439    0.2767    0.1607
2000    0.0458   0.1571   0.2829   0.1143    0.4633    0.3176    0.1139
5000    0.0315   0.1371   0.2465   0.0947    0.4762    0.3194    0.0894

Student (ν = 1)
200     0.1665   0.2500   0.3780   0.1665    0.3685    0.2445    0.3740
500     0.1448   0.2174   0.3356   0.1950    0.2978    0.2560    0.3752
1000    0.1339   0.1985   0.2816   0.1901    0.2638    0.2175    0.2922
2000    0.1191   0.1815   0.2386   0.1737    0.2199    0.2560    0.3311
5000    0.0971   0.1553   0.1919   0.1496    0.1876    0.2409    0.3055
Table 2
Simulated bias of the tail index estimators

n       H        BL(−1)    ML(−1)    BL(ρ̂0)    ML(ρ̂0)    BL(ρ̂1)    ML(ρ̂1)

Student (ν = 4)
200     0.0572   −0.0541   0.0754    −0.0594    0.0105     −0.0018    0.0760
500     0.0536   −0.0436   0.0658    −0.0434    0.0349     0.0295     0.0780
1000    0.0546   −0.0045   0.0628    −0.0081    0.0349     0.0172     0.0570
2000    0.0856   −0.0022   0.0284    0.0072     0.0540     0.0252     0.0760
5000    0.0259   −0.0072   0.0405    −0.0118    0.0220     0.0027     0.0314

Student (ν = 2)
200     0.1305   0.0498    0.0397    −0.0848    −0.0083    0.0653     0.0986
500     0.0078   −0.0869   −0.0191   −0.1176    −0.0748    −0.0567    −0.0127
1000    0.0155   −0.0545   0.0013    −0.0812    −0.0290    −0.0075    0.0120
2000    0.0154   −0.0180   0.0105    −0.0432    −0.0127    −0.0061    0.0219
5000    0.0111   −0.0252   −0.0006   −0.0413    −0.0100    −0.0047    0.0107

Student (ν = 1)
200     0.0089   −0.2525   −0.1782   −0.2567    −0.2137    −0.2756    −0.0979
500     0.0171   −0.1250   −0.1128   −0.1866    −0.1178    −0.0341    −0.0278
1000    0.0334   −0.0413   −0.0576   −0.0289    −0.0455    −0.0257    0.0074
2000    0.0278   −0.0253   −0.0502   −0.0325    −0.0406    −0.0081    0.0187
5000    0.0206   −0.0402   −0.0419   −0.0413    −0.0443    0.0031     0.0228
Table 3
Simulated mean squared errors of the tail index estimators

n       H        BL(−1)   ML(−1)   BL(ρ̂0)   ML(ρ̂0)   BL(ρ̂1)   ML(ρ̂1)

Student (ν = 4)
200     0.0206   0.0179   0.0084   0.0516    0.0084    0.0209    0.0196
500     0.0109   0.0071   0.0055   0.0150    0.0049    0.0103    0.0111
1000    0.0073   0.0035   0.0037   0.0070    0.0032    0.0055    0.0073
2000    0.0047   0.0018   0.0022   0.0034    0.0023    0.0031    0.0048
5000    0.0028   0.0007   0.0022   0.0014    0.0014    0.0012    0.0028

Student (ν = 2)
200     0.0238   0.0370   0.0084   0.1650    0.0129    0.0365    0.0177
500     0.0115   0.0156   0.0038   0.0304    0.0039    0.0100    0.0092
1000    0.0069   0.0083   0.0022   0.0154    0.0017    0.0050    0.0057
2000    0.0041   0.0044   0.0013   0.0082    0.0007    0.0025    0.0035
5000    0.0022   0.0020   0.0007   0.0039    0.0003    0.0010    0.0019

Student (ν = 1)
200     0.0366   0.1025   0.0396   1.2807    0.0809    0.2114    0.0378
500     0.0164   0.0437   0.0217   0.0555    0.0293    0.0249    0.0131
1000    0.0093   0.0241   0.0141   0.0274    0.0169    0.0126    0.0080
2000    0.0054   0.0135   0.0091   0.0151    0.0105    0.0057    0.0036
5000    0.0025   0.0062   0.0046   0.0066    0.0050    0.0023    0.0016
ML-estimators. This gives rise to a generally better performance of the ML-estimators, comparatively to the equivalent BL-estimators, when both are considered at their optimal levels.
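The stability issue discussed in comments 5–8 is easy to visualise by plotting a single Hill trajectory k ↦ H(k) for a simulated Burr sample. A small sketch, assuming the Burr d.f. of comment 1 (the inversion formula follows from that d.f.; the sample size and seed are our illustrative choices):

```python
import numpy as np

def rburr(n, gamma, rho, rng):
    """Burr(gamma, rho) sample by inversion of
    F(x) = 1 - (1 + x**(-rho/gamma))**(1/rho):  x = (u**rho - 1)**(-gamma/rho)."""
    u = rng.uniform(size=n)
    return (u ** rho - 1.0) ** (-gamma / rho)

def hill(x):
    """Full Hill sample path H(k), k = 1, ..., n-1."""
    lx = np.log(np.sort(x))[::-1]            # descending log order statistics
    k = np.arange(1, lx.size)
    return np.cumsum(lx[:-1]) / k - lx[1:]   # mean of top-k logs minus threshold

rng = np.random.default_rng(1)
x = rburr(1000, gamma=1.0, rho=-0.5, rng=rng)
h = hill(x)                                  # h[k-1] = H(k)
# For |rho| < 1 the path tends to drift away from gamma as k grows,
# which is the sharp-minimum / instability effect discussed above.
```

Overlaying such paths for several estimators is the picture behind the "stability of the sample paths" criterion.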
5. An overall conclusion and an application
Heavy-tailed data are quite common in telecommunications traffic and in financial time series. When analysing such a type of data one never knows how much the underlying model differs from a strict Pareto model, and this is the unique situation where the Hill estimator is "perfect". Apart from what has been said for specific situations, there is no general clear pattern as to whether the Hill, the BL or the ML estimator behaves better; everything depends on the specificity of the underlying heavy-tailed model and on the practitioner's objectives. If we want to use (or have only access to) a very small number of top o.s., the Hill estimator is perhaps the most adequate one among the tail index estimators herewith considered. If our aim is to achieve sample paths around the target value γ for a wide region of k-values, we should perhaps choose the BL estimator, although at the expense of a higher volatility. On the other hand, if we are more interested in the minimization of the mean squared error, the ML estimator is usually preferable to the BL estimator. Note that, regarding minimum mean squared error, the ML(ρ̂1) estimator even surpasses the Hill estimator in a region where it has been difficult to find competitors to the Hill estimator, i.e., the region of ρ-values smaller than −1 (see Fig. 6).
Fig. 7. Estimates of the second-order parameter ρ (left) and of the tail index γ (right) for the daily log-returns of the Euro-UK Pound.
After taking a decision on the estimate of ρ, and assuming that |ρ| ≤ 1, a situation which seems to appear often in practice, we should simultaneously picture the sample paths of a few tail index estimators with different specificities, let us say the Hill, the ML(ρ̂0) and the BL(ρ̂0) estimators. On the basis of those sample paths we may then get, in a more appropriate way, a sensible estimate of the tail index γ, as we shall see later on, in the application provided.
We shall herewith consider an illustration of the performance of the above mentioned estimators, through the analysis of the Euro-UK Pound daily exchange rates from January 4, 1999 till December 15, 2003. Working with the n+ = 593 positive log-returns, we present in Fig. 7 the sample paths of the ρ̂τ estimates in (3.7) (left), as functions of k, for τ = 0 and τ = 1, together with the sample paths of the classical Hill estimator, H, of BL(ρ̂0) and of ML(ρ̂0) (right).
Note that the sample paths of the ρ-estimates associated with τ = 0 and τ = 1 lead us to choose, on the basis of any stability criterion for large k, the estimate associated with τ = 0. From previous experience with this type of estimates, we conclude that the underlying ρ-value is larger than or equal to −1, and the consideration of τ = 0 is then advisable. The estimate of ρ is in this case ρ̂0 = −0.7.
Regarding the tail index estimation, note that whereas the Hill estimator is unbiased for the estimation of the tail index γ when the underlying model is a strict Pareto model, it exhibits a relevant bias when we have only Pareto-like tails, as happens here, and as may be seen from Fig. 7 (right). The other estimators, which are "asymptotically unbiased", reveal a smaller bias, and enable us to take a decision upon the estimate of γ to be used, with the help of any stability criterion or any heuristic procedure, like the "largest run method" described in the sequel.
Let us consider a set of tail index estimates γ̂i(k), 1 ≤ k < n, i ∈ I, based on the observed sample of size n. Consider those estimates with a small number r of decimal figures, and denote them γ̂i|r(k).
1. For any value i ∈ I and for any possible value a in the domain of γ̂i|r(k), consider the largest run associated with a, i.e., compute Ri(a), the maximum number of consecutive k-values such that γ̂i|r(k) = a.
2. Compute next ai^M := arg max_a Ri(a).
3. Consider as a data-driven estimate of the tail index γ the value γ̂^M = a_{i0}^M, with i0 := arg max_i Ri(ai^M).
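The three steps above can be sketched in a few lines of Python; this is our own minimal rendering of the "largest run method" (function names are ours), with r decimal figures as in the description:

```python
import numpy as np

def largest_run(path, r=1):
    """Steps 1-2 for one estimator: round the sample path gamma_hat(k) to r
    decimal figures and return (a_M, R(a_M)), the value with the longest run
    of consecutive k-values and that run's length."""
    vals = np.round(path, r)
    best_a, best_len = vals[0], 1
    cur_len = 1
    for j in range(1, len(vals)):
        cur_len = cur_len + 1 if vals[j] == vals[j - 1] else 1
        if cur_len > best_len:
            best_a, best_len = vals[j], cur_len
    return float(best_a), best_len

def largest_run_estimate(paths, r=1):
    """Step 3: among the candidate estimators, pick the one whose rounded
    path exhibits the overall longest run, and return its run value."""
    runs = [largest_run(p, r) for p in paths]
    a0, _ = max(runs, key=lambda ar: ar[1])
    return a0
```

Applied with r = 1 to the sample paths in Fig. 7, this procedure singles out ML(ρ̂0), whose rounded path holds the value 0.3 for 507 consecutive k-values.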
Here, if we consider the tail index estimates with one decimal figure, the largest run is achieved by the sample path of the estimator ML(ρ̂0) in (3.8). Such a largest run has a size equal to 507 (21 ≤ k ≤ 527), and is associated with ML(ρ̂0) = 0.3. For the BL-estimate, BL(ρ̂0) in (3.9), and with one decimal figure, we would also get a tail index estimate equal to 0.3, but with a run of size 243 (78 ≤ k ≤ 320). With this same criterion, the Hill estimator would provide an estimate equal to 0.4, with a run of size 103 (73 ≤ k ≤ 175). According to the previous heuristic procedure we would thus be led to the choice of the ML(ρ̂0) estimator, computed at any level from k = 21 till k = 527, all levels providing the same estimate γ̂ = 0.3.

References
Aban, I., Meerschaert, M., 2004. Generalized least squares estimators for the thickness of heavy tails. J. Statist. Plann. Inference 119 (2), 341–352.
Aitken, A.C., 1935. On least squares and linear combinations of observations. Proc. Roy. Soc. Edinburgh 55, 42–48.
Beirlant, J., Dierckx, G., Goegebeur, Y., Matthys, G., 1999. Tail index estimation and an exponential regression model. Extremes 2 (2), 177–200.
Csörgő, S., Deheuvels, P., Mason, D., 1985. Kernel estimates of the tail index of a distribution. Ann. Statist. 13, 1050–1077.
Danielsson, J., de Haan, L., Peng, L., de Vries, C.G., 2001. Using a bootstrap method to choose the sample fraction in tail index estimation. J. Multivariate Anal. 76, 226–248.
Draisma, G., 2000. Parametric and Semi-parametric Methods in Extreme Value Theory. Tinbergen Institute Research Series 239.
Draisma, G., de Haan, L., Peng, L., Pereira, T.T., 1999. A bootstrap-based method to achieve optimality in estimating the extreme value index. Extremes 2 (4), 367–404.
Drees, H., 1998. A general class of estimators of the extreme value index. J. Statist. Plann. Inference 98, 95–112.
Feuerverger, A., Hall, P., 1999. Estimating a tail exponent by modelling departure from a Pareto distribution. Ann. Statist. 27, 760–781.
Fraga Alves, M.I., Gomes, M.I., de Haan, L., 2003. A new class of semi-parametric estimators of the second-order parameter. Portugal. Math. 60 (1), 193–213.
Geluk, J., de Haan, L., 1987. Regular Variation, Extensions and Tauberian Theorems. CWI Tract 40. Center for Mathematics and Computer Science, Amsterdam, The Netherlands.
Gomes, M.I., Martins, M.J., 2002a. Bias reduction and explicit efficient estimation of the tail index. Notas e Comunicações C.E.A.U.L. 5/2002. J. Statist. Plann. Inference 124 (2), 361–378.
Gomes, M.I., Martins, M.J., 2002b. Asymptotically unbiased estimators of the tail index based on external estimation of the second-order parameter. Extremes 5 (1), 5–31.
Gomes, M.I., Oliveira, O., 2001. The bootstrap methodology in statistical extremes—choice of the optimal sample fraction. Extremes 4 (4), 331–358 (2002).
de Haan, L., Peng, L., 1998. Comparison of tail index estimators. Statist. Neerlandica 52, 60–70.
Hill, B.M., 1975. A simple general approach to inference about the tail of a distribution. Ann. Statist. 3, 1163–1174.