ARTICLE IN PRESS Journal of Statistical Planning and Inference 140 (2010) 1513–1518
Journal of Statistical Planning and Inference journal homepage: www.elsevier.com/locate/jspi
A matching prior for extreme quantile estimation of the generalized Pareto distribution

Kwok-Wah Ho
Department of Statistics, Chinese University of Hong Kong, Shatin, Hong Kong
Article history: Received 16 March 2009; received in revised form 7 December 2009; accepted 7 December 2009; available online 16 December 2009.

Keywords: Quantile estimation; Generalized Pareto distribution; Peaks-over-threshold model; Risk management; Probability matching prior

Abstract

Extreme quantile estimation plays an important role in risk management and environmental statistics, among other applications. A popular method is the peaks-over-threshold (POT) model, which approximates the distribution of excesses over a high threshold by a generalized Pareto distribution (GPD). Motivated by a practical financial risk management problem, we look for a prior for Bayesian estimation of the GPD parameters that results in better quantile estimation. Specifically, we propose a noninformative matching prior for the parameters of a GPD so that a specific quantile of the Bayesian predictive distribution matches the true quantile in the sense of Datta et al. (2000).

© 2009 Elsevier B.V. All rights reserved.
0378-3758/$ - see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2009.12.012

1. Introduction

This short note is motivated by the practical problem of estimating extremely high quantiles in financial risk management; similar applications arise in hydrology (e.g. Smith, 1991), reliability (e.g. Ditlevsen, 1994) and insurance (e.g. Embrechts et al., 1997). The Basel II accord requires banks to hold sufficient regulatory capital for the market risk, credit risk and operational risk to which they are exposed. Among these three risk categories, operational risk is the most challenging to manage. Sophisticated banks are allowed to model and predict their operational loss over a coming period and to use the 99.9% value at risk (VaR) as a basis for calculating the required capital. The 99.9% VaR (see Jorion, 1997, for more details) is simply the 99.9% quantile of the operational loss distribution. Estimating such a high quantile is extremely difficult, but it is exactly the problem that operational risk management faces (see e.g. McNeil et al., 2005).

Extreme value theory (EVT) provides a plausible solution to the above problem by fitting a parametric model to the upper tail of the loss data. A commonly used method is Pickands' peaks-over-threshold (POT) model. Assume that $(X_1, X_2, \ldots, X_m)$ is a set of independent observations from an unknown distribution function $F$, and let $(X_{1:m}, X_{2:m}, \ldots, X_{m:m})$ denote the sample in ascending order. Choose a high threshold $u$ and let $n < m$ be the number of exceedances of $u$. Define $Y_i = X_{m-i+1:m} - u$ for $i = 1, \ldots, n$. Pickands (1975) showed that the distribution of $Y_1, Y_2, \ldots, Y_n$ as $u \to \infty$ can be approximated by a generalized Pareto distribution (GPD) with scale parameter $\sigma > 0$ and shape parameter $\xi$. The density function is

$$
f(y \mid \xi, \sigma) =
\begin{cases}
\sigma^{-1}\,(1 + y\xi/\sigma)^{-(1+\xi)/\xi}, & \xi \neq 0, \\
\sigma^{-1}\exp(-y/\sigma), & \xi = 0.
\end{cases}
\tag{1}
$$
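To make the density in Eq. (1) concrete, the following is a minimal sketch in plain Python. The function names `gpd_pdf`, `gpd_sf` and `gpd_quantile` are illustrative choices rather than anything defined in the paper; the survival and quantile functions are obtained by integrating and inverting (1).

```python
import math

def gpd_pdf(y, xi, sigma):
    """Density (1) of GPD(xi, sigma); returns 0 outside the support."""
    if y < 0.0:
        return 0.0
    if xi == 0.0:
        return math.exp(-y / sigma) / sigma
    base = 1.0 + xi * y / sigma
    if base <= 0.0:  # beyond the upper endpoint -sigma/xi when xi < 0
        return 0.0
    return base ** (-(1.0 + xi) / xi) / sigma

def gpd_sf(y, xi, sigma):
    """Survival function P(Y > y), obtained by integrating (1)."""
    if y <= 0.0:
        return 1.0
    if xi == 0.0:
        return math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0.0:
        return 0.0
    return base ** (-1.0 / xi)

def gpd_quantile(p, xi, sigma):
    """Level-p quantile for 0 < p < 1, i.e. the y with gpd_sf(y) = 1 - p."""
    if xi == 0.0:
        return -sigma * math.log1p(-p)
    return sigma / xi * ((1.0 - p) ** (-xi) - 1.0)

# Illustrative parameter values only (not taken from the paper).
q99 = gpd_quantile(0.99, xi=0.5, sigma=2.0)
print(q99, gpd_sf(q99, xi=0.5, sigma=2.0))
```

The heavy-tailed case $\xi > 0$ considered in the rest of the note corresponds to the branch where `base` is always positive.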
The shape parameter $\xi$ determines the tail shape of the GPD. The support is $y > 0$ for $\xi \ge 0$ and $0 \le y \le -\sigma/\xi$ for $\xi < 0$. Financial loss data are commonly found to have heavy upper tails, which correspond to $\xi > 0$. Thus we only consider the GPD with $\xi > 0$ in this paper.

Estimation of the GPD parameters is usually done via frequentist methods such as maximum likelihood estimation (MLE) (see e.g. Davison and Smith, 1990), probability weighted moments (PWM) by Hosking and Wallis (1987) and the elemental percentile method (EPM) by Castillo and Hadi (1997). With the estimated parameters, quantiles can be estimated from the predictive distribution for a future observation $X_{m+1}$. For example, Smith (1987) proposed first obtaining the MLE $(\hat\xi, \hat\sigma)$ for the GPD model from the $n$ observations exceeding the threshold $u$ and then estimating $P(X > u)$ by $n/m$. An estimate of the $100(1-\alpha)\%$ quantile $q_\alpha$ for $X_{m+1}$ can be obtained by

$$
\hat q_\alpha = q_{\alpha^*}(\hat\xi, \hat\sigma) + u,
\tag{2}
$$

where $\alpha^* = (m/n)\alpha$ and $q_{\alpha^*}(\hat\xi, \hat\sigma)$ is the $100(1-\alpha^*)\%$ quantile of the GPD$(\hat\xi, \hat\sigma)$ distribution. In the context of operational risk management, Moscadelli (2004) applied the MLE approach in a detailed EVT analysis.

While the frequentist methods produce predictive distributions using estimated parameters, Bayesian methods have the merit of incorporating parameter uncertainty directly into the predictive distribution, which leads to more stable results. However, a common problem for any Bayesian method is the choice of prior distribution. In the context of estimating GPD parameters, Arnold and Press (1989) explored the use of informative priors for the Pareto distribution. Behrens et al. (2004) proposed prior elicitation following Coles and Powell (1996). de Zea Bermudez and Amaral Turkman (2003) proposed the use of vague priors. More recently, Castellanos and Cabras (2007) proposed using the Jeffreys prior as a default procedure when there is no prior information. For the specific problem of quantile estimation, however, there is no theoretical justification for using any of the above priors.

In this article, we focus on constructing a noninformative prior specifically for the purpose of estimating high quantiles of the GPD$(\xi, \sigma)$ distribution with $\xi > 0$. The technique we use is the matching prior for prediction introduced by Datta et al. (2000). These priors asymptotically match the coverage probability of a Bayesian credible set for a future observation with its frequentist counterpart. This choice of prior therefore provides frequentist validity, and matching for prediction is exactly what our purpose requires. The study of matching priors has a long history. Interested readers are referred to Welch and Peers (1963), Tibshirani (1989), Ghosh and Mukerjee (1992), Mukerjee and Dey (1993), Sweeting (1995), Datta and Ghosh (1995), Mukerjee and Ghosh (1997), Ghosal (1999) and Datta and Mukerjee (2004), among others.
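As a small illustration of the plug-in estimate in Eq. (2) above, the sketch below takes already-fitted values $(\hat\xi, \hat\sigma)$ as inputs and applies the rescaling $\alpha^* = (m/n)\alpha$. The function name `pot_quantile` and all numerical inputs are hypothetical; the fitting step itself (MLE, PWM or EPM) is not shown.

```python
def pot_quantile(alpha, u, n_exceed, m_total, xi_hat, sigma_hat):
    """Plug-in estimate (2) of the 100(1-alpha)% quantile of X_{m+1}, for xi_hat > 0."""
    # Tail level inside the excess distribution: alpha* = (m/n) * alpha.
    alpha_star = alpha * m_total / n_exceed
    if not 0.0 < alpha_star < 1.0:
        raise ValueError("need alpha < n/m so that the quantile lies above u")
    # 100(1 - alpha*)% quantile of GPD(xi_hat, sigma_hat), then shift by u.
    excess_quantile = sigma_hat / xi_hat * (alpha_star ** (-xi_hat) - 1.0)
    return u + excess_quantile

# Hypothetical inputs: 1000 losses, 100 of them above the threshold u = 10,
# and assumed fitted values xi_hat = 0.5, sigma_hat = 2.
estimate = pot_quantile(alpha=0.001, u=10.0, n_exceed=100, m_total=1000,
                        xi_hat=0.5, sigma_hat=2.0)
print(estimate)
```

With 10% of the data above the threshold, a 99.9% quantile of the loss distribution corresponds to the 99% quantile of the fitted GPD, which is the situation the note returns to when discussing the admissible range of $\alpha$.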
2. A matching prior for GPD quantile estimation

We look for a prior $\pi(\xi, \sigma)$ such that the $100(1-\alpha)\%$ quantile of the Bayesian predictive distribution matches the true $100(1-\alpha)\%$ quantile in the sense of Datta et al. (2000) (see also Chapter 6 of Datta and Mukerjee, 2004, for more details). In that paper, the authors derived checkable conditions for such priors and showed that, while the Jeffreys prior is the only possible matching prior of this kind when the parameter is scalar, this is not necessarily the case for problems with higher-dimensional parameter vectors. In addition, as remarked in Datta and Mukerjee (2004), such priors generally depend on the value of $\alpha$. For the operational risk management problem discussed in the Introduction, a specific $\alpha$ is usually predetermined for either regulatory or internal control purposes. Therefore, the dependence of the prior on $\alpha$ is not a serious restriction in those applications. Surprisingly, we find that such a prior exists only if $\alpha$ is small enough, and those cases are particularly relevant in risk management problems.

Let $Y = \{Y_1, \ldots, Y_n\}$ be a random sample from GPD$(\xi, \sigma)$ with $\xi, \sigma > 0$. Given a prior $\pi(\xi, \sigma)$ and $\alpha \in (0, 1)$, we denote by $Q(\pi, \alpha, Y)$ the $100(1-\alpha)\%$ quantile of the Bayesian predictive distribution of a future observation $Y_{n+1}$. We say that $\pi(\xi, \sigma)$ is a matching prior for $100(1-\alpha)\%$ prediction quantile estimation if

$$
P_{\xi,\sigma}\{Y_{n+1} > Q(\pi, \alpha, Y)\} = \alpha + o(n^{-1}) \quad \text{for all } \xi, \sigma > 0.
\tag{3}
$$
Theorem 1. For $0 < \alpha < e^{-2}$,

$$
\tilde\pi(\xi, \sigma) \propto \sigma^{-1}\,\xi^{2}\,(1+\xi)^{-2}
\left[ -\xi \ln(\alpha) - \frac{(1+2\xi)}{(1+\xi)}\,(1-\alpha^{\xi}) \right]^{-1},
\quad \sigma, \xi > 0
\tag{4}
$$

is a matching prior for $100(1-\alpha)\%$ prediction quantile estimation.

Proof. See Appendix.

Remark 1. The constraint $0 < \alpha < e^{-2}$ (that is, $\alpha$ below roughly 0.135) on $\tilde\pi(\xi, \sigma)$ is not very restrictive in practical problems. For instance, in calculating VaR for operational risk, Basel II requires a 99.9% confidence level. If 10% of the loss data exceed the threshold in the POT model (see DuMouchel, 1983, for a suggestion of using the upper 10% of the data), we have to predict the 99% quantile of the GPD. This corresponds to $\alpha = 0.01$, which satisfies the above constraint.
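The prior in Theorem 1 is easy to evaluate numerically. The sketch below is one possible implementation of the unnormalised density in (4), with the bracketed term written as $-\xi\ln(\alpha) - \frac{1+2\xi}{1+\xi}(1-\alpha^{\xi})$; the names `bracket` and `matching_prior` are illustrative, and `math.expm1` is used only to reduce cancellation for small $\xi$. The last lines redo the arithmetic of Remark 1.

```python
import math

ALPHA_MAX = math.exp(-2.0)  # upper limit on alpha in Theorem 1

def bracket(xi, alpha):
    """Bracketed term of (4): -xi*ln(alpha) - (1+2*xi)/(1+xi) * (1 - alpha**xi)."""
    log_alpha = math.log(alpha)
    one_minus_alpha_xi = -math.expm1(xi * log_alpha)  # 1 - alpha**xi
    return -xi * log_alpha - (1.0 + 2.0 * xi) / (1.0 + xi) * one_minus_alpha_xi

def matching_prior(xi, sigma, alpha):
    """Unnormalised matching prior (4) for xi, sigma > 0 and 0 < alpha < exp(-2)."""
    if not 0.0 < alpha < ALPHA_MAX:
        raise ValueError("Theorem 1 requires 0 < alpha < exp(-2)")
    return xi ** 2 / (sigma * (1.0 + xi) ** 2 * bracket(xi, alpha))

# Remark 1: a 99.9% VaR with 10% of the data above the threshold.
alpha_gpd = 0.001 / 0.10
print(alpha_gpd, alpha_gpd < ALPHA_MAX)
for xi in (0.1, 0.5, 1.0, 2.0):
    print(xi, matching_prior(xi, sigma=1.0, alpha=alpha_gpd))
```

For values of $\alpha$ above $e^{-2}$ the bracketed term becomes negative for small $\xi$, which is why the sketch refuses such inputs instead of returning a negative "density".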
[Figure 1: two panels, (a) and (b), each plotted against $\alpha$ on the horizontal axis, from 0 to 0.12.]

Fig. 1. Absolute value of the $n^{-1}$ term in Eq. (6) as a percentage of $\alpha$ when $n = 50$. (a) $\xi = 0.2$, (b) $\xi = 1$.
Remark 2. For two different priors, the one with the smaller absolute value of the $n^{-1}$ term in (6) is closer to the correct frequentist coverage (Datta et al., 2000). For the default prior suggested in Castellanos and Cabras (2007), we plot that term divided by $\alpha$ (to show the percentage error in approximating $\alpha$) across different values of $\alpha$ when $n = 50$, for $\xi = 0.2$ and $1$, in Fig. 1. The percentage errors are substantial, especially for small values of $\alpha$. For example, for $\alpha = 0.01$, the percentage error amounts to 8.2% and 11.5% for $\xi = 0.2$ and $1$, respectively.

While the posterior density is not available in closed form, the following theorem guarantees that it is a proper density function.

Theorem 2. Under the condition of Theorem 1, the posterior density corresponding to the matching prior $\tilde\pi(\xi, \sigma)$ is a proper density function given two or more observations.

Proof. See Appendix.

3. Concluding remarks

In this note, we proposed a matching prior for GPD$(\xi, \sigma)$ with $\xi > 0$, specifically designed for extreme quantile estimation. The proposed prior provides frequentist validity for prediction, which is particularly relevant in risk management problems. Although we limit our discussion to financial risk management applications, the result is also applicable to other quantile estimation problems.

Appendix A

A.1. Proof of Theorem 1

As discussed in Datta and Mukerjee (2004), derivations can be significantly simplified if the density function is reparametrized such that the Fisher information matrix is diagonal. For the GPD with $\xi > 0$, the orthogonal reparametrization leads to the following density function:

$$
f(y \mid \xi, \theta) = \frac{1+\xi}{\theta}\left(1 + \frac{y\,\xi(1+\xi)}{\theta}\right)^{-(1+\xi)/\xi}
\quad \text{for } y, \theta, \xi > 0
\tag{5}
$$

by the transformation $\sigma = \theta/(1+\xi)$. With the orthogonal parametrization, the result of equation (6.2.11) in Datta and Mukerjee (2004) simplifies to

$$
P_{\theta,\xi}\{Y_{n+1} > Q(\pi, \alpha, Y)\} = \alpha
- \frac{1}{n\,\pi(\xi, \theta)}\left[ D_\theta\{I^{\theta\theta}\, m_\theta(\theta, \xi, \alpha)\, \pi(\xi, \theta)\}
+ D_\xi\{I^{\xi\xi}\, m_\xi(\theta, \xi, \alpha)\, \pi(\xi, \theta)\} \right] + o(n^{-1}).
\tag{6}
$$
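Here $D_\theta = \partial/\partial\theta$ and $D_\xi = \partial/\partial\xi$. We therefore seek a prior distribution $\pi(\xi, \theta)$ that satisfies, for a given $\alpha$,

$$
D_\theta\{I^{\theta\theta}\, m_\theta(\theta, \xi, \alpha)\, \pi(\xi, \theta)\}
+ D_\xi\{I^{\xi\xi}\, m_\xi(\theta, \xi, \alpha)\, \pi(\xi, \theta)\} = 0,
\tag{7}
$$

where

$$
(I^{\theta\theta})^{-1} = I_{\theta\theta} = E_{\theta,\xi}\left[\left(\frac{\partial \ln f}{\partial \theta}\right)^{2}\right],
\qquad
(I^{\xi\xi})^{-1} = I_{\xi\xi} = E_{\theta,\xi}\left[\left(\frac{\partial \ln f}{\partial \xi}\right)^{2}\right],
$$

$$
m_\theta(\theta, \xi, \alpha) = \int_{q(\theta,\xi,\alpha)}^{\infty} \frac{\partial f(u; \theta, \xi)}{\partial \theta}\, du,
\qquad
m_\xi(\theta, \xi, \alpha) = \int_{q(\theta,\xi,\alpha)}^{\infty} \frac{\partial f(u; \theta, \xi)}{\partial \xi}\, du,
$$

and $q(\theta, \xi, \alpha)$ is defined according to equation (6.2.6) in Datta and Mukerjee (2004):

$$
\int_{q(\theta,\xi,\alpha)}^{\infty} f(u; \theta, \xi)\, du = \alpha.
$$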
Based on this equation, $q(\theta, \xi, \alpha)$ can be evaluated explicitly as

$$
q(\theta, \xi, \alpha) = \frac{\theta}{\xi(\xi+1)}\left[\alpha^{-\xi} - 1\right].
$$
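Through direct calculation, we get

$$
m_\theta(\theta, \xi, \alpha) = \frac{\alpha\,(1-\alpha^{\xi})}{\theta\,\xi},
\tag{8}
$$

$$
m_\xi(\theta, \xi, \alpha) = -\frac{\alpha\left[(1+2\xi)(1-\alpha^{\xi}) + \xi \ln(\alpha)(1+\xi)\right]}{\xi^{2}(1+\xi)}.
\tag{9}
$$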
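The closed forms for $q$, $m_\theta$ and $m_\xi$ can be checked numerically. The sketch below assumes that differentiation and integration may be interchanged, so that $m_\theta$ and $m_\xi$ equal the partial derivatives of the survival function $\int_q^\infty f\,du$ with $q$ held fixed, and it takes $m_\xi$ with a leading minus sign, $m_\xi = -\alpha[(1+2\xi)(1-\alpha^\xi) + \xi\ln(\alpha)(1+\xi)]/\{\xi^2(1+\xi)\}$. Function names and the test point are illustrative.

```python
import math

def survival(y, theta, xi):
    """P(Y > y) under the orthogonal parametrization (5), sigma = theta/(1+xi)."""
    return (1.0 + y * xi * (1.0 + xi) / theta) ** (-1.0 / xi)

def q(theta, xi, alpha):
    """Upper-alpha point: survival(q, theta, xi) = alpha."""
    return theta / (xi * (xi + 1.0)) * (alpha ** (-xi) - 1.0)

def m_theta(theta, xi, alpha):
    """Closed form (8)."""
    return alpha * (1.0 - alpha ** xi) / (theta * xi)

def m_xi(theta, xi, alpha):
    """Closed form (9), with the leading minus sign assumed in the lead-in."""
    num = (1.0 + 2.0 * xi) * (1.0 - alpha ** xi) + xi * math.log(alpha) * (1.0 + xi)
    return -alpha * num / (xi ** 2 * (1.0 + xi))

def central_diff(fn, x, rel_step=1e-6):
    """Symmetric finite-difference approximation to fn'(x)."""
    h = rel_step * x
    return (fn(x + h) - fn(x - h)) / (2.0 * h)

theta, xi, alpha = 1.0, 1.0, 0.01  # illustrative test point
q0 = q(theta, xi, alpha)           # held fixed while differentiating
fd_theta = central_diff(lambda t: survival(q0, t, xi), theta)
fd_xi = central_diff(lambda x: survival(q0, theta, x), xi)

print("tail probability at q:", survival(q0, theta, xi))
print("m_theta: finite difference", fd_theta, "closed form", m_theta(theta, xi, alpha))
print("m_xi:    finite difference", fd_xi, "closed form", m_xi(theta, xi, alpha))
```

Agreement of the two columns at a few test points is only a sanity check, not a proof, but it does pin down the signs of both quantities.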
The information matrix of the original GPD parameters $(\sigma, \xi)$ is given in Smith (1984) and again in Davison and Smith (1990), in the form of the covariance matrix for the parameters. For the parameters $(\theta, \xi)$ under the orthogonal parametrization, a simple change of variables in their results gives

$$
I^{\theta\theta} = \theta^{2}(1+2\xi),
\tag{10}
$$

$$
I^{\xi\xi} = (1+\xi)^{2}.
\tag{11}
$$
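Putting (8)–(11) into (7), the matching prior should satisfy

$$
D_\theta\left\{ \theta(1+2\xi)\,\frac{\alpha(1-\alpha^{\xi})}{\xi}\,\pi(\xi, \theta) \right\}
+ D_\xi\left\{ -\frac{\alpha(1+\xi)^{2}}{\xi^{2}}\left[ \frac{1+2\xi}{1+\xi}(1-\alpha^{\xi}) + \xi\ln(\alpha) \right] \pi(\xi, \theta) \right\} = 0.
\tag{12}
$$

It is then clear that a solution is of the following form:

$$
\pi(\xi, \theta) \propto \theta^{-1}\,\xi^{2}\,(1+\xi)^{-2}
\left[ -\frac{1+2\xi}{1+\xi}(1-\alpha^{\xi}) - \xi\ln(\alpha) \right]^{-1}.
\tag{13}
$$

Indeed, with this choice the first braced expression in (12) does not depend on $\theta$, and the second reduces to $\alpha/\theta$, which does not depend on $\xi$.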
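The matching condition can also be checked numerically by evaluating the $n^{-1}$ term of (6) with finite differences. The sketch below does this for the prior (13) and, for comparison, for the Jeffreys prior in the $(\theta, \xi)$ parametrization, taken here as $(I^{\theta\theta} I^{\xi\xi})^{-1/2} = 1/\{\theta(1+\xi)\sqrt{1+2\xi}\}$ from (10) and (11). That form, the minus sign in front of the $n^{-1}$ term and in $m_\xi$, and all function names are assumptions of this sketch.

```python
import math

def g(xi, alpha):
    """Bracket in (13): -(1+2*xi)/(1+xi) * (1 - alpha**xi) - xi*ln(alpha)."""
    return -(1.0 + 2.0 * xi) / (1.0 + xi) * (1.0 - alpha ** xi) - xi * math.log(alpha)

def m_theta(theta, xi, alpha):  # Eq. (8)
    return alpha * (1.0 - alpha ** xi) / (theta * xi)

def m_xi(theta, xi, alpha):     # Eq. (9), leading minus sign assumed
    num = (1.0 + 2.0 * xi) * (1.0 - alpha ** xi) + xi * math.log(alpha) * (1.0 + xi)
    return -alpha * num / (xi ** 2 * (1.0 + xi))

def matching_prior(theta, xi, alpha):  # Eq. (13), unnormalised
    return xi ** 2 / (theta * (1.0 + xi) ** 2 * g(xi, alpha))

def jeffreys_prior(theta, xi, alpha):  # (I^{theta theta} I^{xi xi})^(-1/2); alpha unused
    return 1.0 / (theta * (1.0 + xi) * math.sqrt(1.0 + 2.0 * xi))

def n_inv_term(prior, theta, xi, alpha, n, rel_step=1e-5):
    """n^{-1} term of (6), with D_theta and D_xi replaced by central differences."""
    def a(t):  # I^{theta theta} * m_theta * prior, as a function of theta
        return t ** 2 * (1.0 + 2.0 * xi) * m_theta(t, xi, alpha) * prior(t, xi, alpha)
    def b(x):  # I^{xi xi} * m_xi * prior, as a function of xi
        return (1.0 + x) ** 2 * m_xi(theta, x, alpha) * prior(theta, x, alpha)
    ht, hx = rel_step * theta, rel_step * xi
    d_theta = (a(theta + ht) - a(theta - ht)) / (2.0 * ht)
    d_xi = (b(xi + hx) - b(xi - hx)) / (2.0 * hx)
    return -(d_theta + d_xi) / (n * prior(theta, xi, alpha))

alpha, n = 0.01, 50
for xi in (0.2, 1.0):
    for name, prior in (("matching", matching_prior), ("Jeffreys", jeffreys_prior)):
        ratio = abs(n_inv_term(prior, 1.0, xi, alpha, n)) / alpha
        print(f"xi={xi}: {name} prior, |n^-1 term| / alpha = {ratio:.4f}")
```

Under these assumptions the ratio for the prior (13) vanishes up to rounding error, while the Jeffreys-prior ratios come out at roughly 0.08 for $\xi = 0.2$ and 0.11 for $\xi = 1$, in line with the percentages quoted in Remark 2.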
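Transforming (13) back to the original parametrization, we then have the matching prior $\tilde\pi(\xi, \sigma)$ of the form (4). The condition $0 < \alpha < e^{-2}$ is needed to guarantee the positivity of $\tilde\pi(\xi, \sigma)$. Firstly, it can be checked that for any $\sigma > 0$,

$$
\lim_{\xi \to 0} \tilde\pi(\xi, \sigma) = \frac{2}{\sigma \ln\alpha\,[\ln\alpha + 2]}.
$$

The limit is positive when $0 < \alpha < e^{-2}$. Next, to show that $\tilde\pi(\xi, \sigma)$ is positive for $\xi, \sigma > 0$, it suffices to show the positivity of

$$
g(\xi) = -\frac{1+2\xi}{1+\xi}(1-\alpha^{\xi}) - \xi\ln(\alpha) \quad \text{for } \xi > 0.
$$

Note that

$$
g(\xi) = -\xi\ln\alpha\left[1 - \frac{(1+2\xi)}{1+\xi}\cdot\frac{(1-\alpha^{\xi})}{-\xi\ln\alpha}\right]
= -\xi\ln\alpha\left[1 - \frac{(1+2\xi)(1-\alpha^{\xi})}{-\ln\alpha\;\xi(1+\xi)}\right].
$$

Let $h_1(\xi) = (1+2\xi)(1-\alpha^{\xi})$; we have $h_1(0) = 0$ and

$$
h_1'(\xi) = 2(1-\alpha^{\xi}) - \alpha^{\xi}\ln\alpha\,(1+2\xi) > 0.
$$

Let $h_2(\xi) = -\ln\alpha\;\xi(1+\xi)$; we have $h_2(0) = 0$ and

$$
h_2'(\xi) = -\ln\alpha - 2\xi\ln\alpha > 0.
$$

Note that, since $1 - \alpha^{\xi} < -\xi\ln\alpha$,

$$
h_1'(\xi) = 2(1-\alpha^{\xi}) - \alpha^{\xi}\ln\alpha\,(1+2\xi) < -2\xi\ln\alpha - \alpha^{\xi}\ln\alpha\,(1+2\xi).
$$

For $0 < \alpha < e^{-2}$,

$$
\alpha^{\xi}(1+2\xi) = \frac{1+2\xi}{(\alpha^{-1})^{\xi}}
= \frac{1+2\xi}{1 + \ln(\alpha^{-1})\,\xi + \frac{(\ln(\alpha^{-1}))^{2}}{2}\,\xi^{2} + \cdots} < 1.
$$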
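Thus

$$
h_1'(\xi) < -2\xi\ln\alpha - \ln\alpha = h_2'(\xi).
$$

Now $h_1(0) = h_2(0)$, and we have shown that the slope of $h_1$ is always smaller than that of $h_2$ for $\xi > 0$. So $h_1(\xi)/h_2(\xi) < 1$ for $\xi > 0$ and thus

$$
g(\xi) = -\xi\ln\alpha\left[1 - \frac{(1+2\xi)(1-\alpha^{\xi})}{-\ln\alpha\;\xi(1+\xi)}\right]
= -\xi\ln\alpha\left[1 - \frac{h_1(\xi)}{h_2(\xi)}\right] > 0.
$$

This completes the proof.

A.2. Proof of Theorem 2

Suppose that we obtain two observations $y_1, y_2$ from GPD$(\xi, \sigma)$, $\xi, \sigma > 0$. The posterior density is

$$
\pi(\xi, \sigma \mid y_1, y_2) = \frac{1}{C}\;\xi^{2}(1+\xi)^{-2}
\left[ -\xi\ln(\alpha) - \frac{(1+2\xi)}{(1+\xi)}(1-\alpha^{\xi}) \right]^{-1}
\sigma^{-3}\left[\left(1 + \frac{y_1\xi}{\sigma}\right)\left(1 + \frac{y_2\xi}{\sigma}\right)\right]^{-(\xi+1)/\xi},
$$

where

$$
C = \int_0^{\infty} \xi(1+\xi)^{-2}
\left[ -\ln(\alpha) - \frac{(1+2\xi)(1-\alpha^{\xi})}{\xi(1+\xi)} \right]^{-1}
\left( \int_0^{\infty} \sigma^{-3}\left[\left(1 + \frac{y_1\xi}{\sigma}\right)\left(1 + \frac{y_2\xi}{\sigma}\right)\right]^{-(\xi+1)/\xi} d\sigma \right) d\xi.
$$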
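We have to show that $C$ is finite. We first consider the inner integral. Since $y_1 + y_2 \ge 2\sqrt{y_1 y_2}$,

$$
I = \int_0^{\infty} \sigma^{-3}\left[\left(1 + \frac{y_1\xi}{\sigma}\right)\left(1 + \frac{y_2\xi}{\sigma}\right)\right]^{-(\xi+1)/\xi} d\sigma
\le \int_0^{\infty} \sigma^{-3}\left[\left(1 + \frac{\sqrt{y_1 y_2}\,\xi}{\sigma}\right)^{2}\right]^{-(\xi+1)/\xi} d\sigma
= \int_0^{\infty} \sigma^{-3}\left(1 + \frac{\xi^{*}\sigma^{-1}}{\sigma^{*}}\right)^{-(1+1/\xi^{*})} d\sigma,
$$

where $\xi^{*} = 1/(1+2/\xi)$ and $\sigma^{*} = 1/\{(2+\xi)\sqrt{y_1 y_2}\}$. Thus, substituting $z = \sigma^{-1}$,

$$
I \le \sigma^{*}\int_0^{\infty} z\,(\sigma^{*})^{-1}\left(1 + \frac{\xi^{*} z}{\sigma^{*}}\right)^{-(1+1/\xi^{*})} dz.
$$

Notice that $\xi^{*} \in (0, 1)$, so the above integral, which is the mean of a GPD$(\xi^{*}, \sigma^{*})$ distribution, exists and equals $\sigma^{*}/(1-\xi^{*})$. Hence

$$
I \le \frac{(\sigma^{*})^{2}}{1-\xi^{*}} = \frac{2+\xi}{2(2+\xi)^{2}\,y_1 y_2} = \frac{1}{2 y_1 y_2 (2+\xi)}.
$$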
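The bound on the inner integral can be sanity-checked by crude numerical quadrature. The sketch below uses the substitution $z = 1/\sigma$, which turns $I$ into $\int_0^\infty z\,[(1+y_1\xi z)(1+y_2\xi z)]^{-(\xi+1)/\xi}\,dz$, then maps $z = t/(1-t)$ onto $(0,1)$ and applies a midpoint rule. The function names, the grid size and the test values are illustrative assumptions, and the rule is only adequate for moderate $\xi$ (roughly $\xi \le 2$), where the transformed integrand stays bounded.

```python
def inner_integral(y1, y2, xi, n_grid=20000):
    """Midpoint-rule approximation of I after z = 1/sigma and z = t/(1 - t)."""
    power = -(xi + 1.0) / xi
    total = 0.0
    for k in range(n_grid):
        t = (k + 0.5) / n_grid
        z = t / (1.0 - t)
        integrand = z * ((1.0 + y1 * xi * z) * (1.0 + y2 * xi * z)) ** power
        total += integrand / (1.0 - t) ** 2  # Jacobian dz/dt
    return total / n_grid

def upper_bound(y1, y2, xi):
    """Right-hand side of the bound: 1 / (2 * y1 * y2 * (2 + xi))."""
    return 1.0 / (2.0 * y1 * y2 * (2.0 + xi))

# Illustrative observations; equality is expected when y1 == y2.
for y1, y2, xi in ((1.0, 1.0, 1.0), (1.0, 4.0, 1.0), (0.5, 3.0, 0.5)):
    print(y1, y2, xi, inner_integral(y1, y2, xi), upper_bound(y1, y2, xi))
```

When $y_1 = y_2$ the first inequality holds with equality, so the two printed numbers should agree up to quadrature error; otherwise the integral should fall strictly below the bound.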
Thus,

$$
C \le \frac{1}{2 y_1 y_2}\int_0^{\infty} \frac{\xi}{(1+\xi)^{2}(2+\xi)}
\left[ -\ln(\alpha) - \frac{(1+2\xi)(1-\alpha^{\xi})}{\xi(1+\xi)} \right]^{-1} d\xi.
$$
It can be checked that, under the condition $0 < \alpha < e^{-2}$, the above integrand as a function of $\xi$ is positive, has limit $1/\{\ln(\alpha)[\ln(\alpha) + 2]\}$ at $\xi = 0$, and decreases to zero at the rate $\xi^{-2}$ as $\xi \to \infty$. Thus we have $C < \infty$. Therefore, the posterior density is proper for two observations, and hence it is also proper for samples with more than two observations. The proof is completed.

References

Arnold, B., Press, S.J., 1989. Bayesian estimation and prediction for Pareto data. J. Amer. Statist. Assoc. 84, 1079–1084.
Behrens, C., Lopes, H.F., Gamerman, D., 2004. Bayesian analysis of extreme events with threshold estimation. Statist. Modelling 4, 227–244.
Castellanos, M.E., Cabras, S., 2007. A default Bayesian procedure for the generalized Pareto distribution. J. Statist. Plann. Inference 137, 473–483.
Castillo, E., Hadi, A.S., 1997. Fitting the generalized Pareto distribution to data. J. Amer. Statist. Assoc. 92, 1609–1620.
Coles, S.G., Powell, E.A., 1996. Bayesian methods in extreme value modelling: a review and new developments. Internat. Statist. Rev. 64, 119–136.
Datta, G.S., Ghosh, J.K., 1995. On priors providing frequentist validity for Bayesian inference. Biometrika 82, 37–45.
Datta, G.S., Mukerjee, R., Ghosh, M., Sweeting, T.J., 2000. Bayesian prediction with approximate frequentist validity. Ann. Statist. 28, 1414–1426.
Datta, G.S., Mukerjee, R., 2004. Probability Matching Priors: Higher Order Asymptotics. Springer, New York.
Davison, A., Smith, R., 1990. Models for exceedances over high thresholds. J. Roy. Statist. Soc. B 52, 393–442.
de Zea Bermudez, P., Amaral Turkman, M.A., 2003. Bayesian approach to parameter estimation of the generalized Pareto distribution. Test 12, 259–277.
Ditlevsen, O., 1994. Distribution arbitrariness in structural reliability. In: Schuller, G.I., Shinozuka, M., Yao, J.T.P. (Eds.), Structural Safety and Reliability. Balkema, Rotterdam.
DuMouchel, W.H., 1983. Estimating the stable index α in order to measure tail thickness: a critique. Ann. Statist. 11, 1019–1031.
Embrechts, P., Klüppelberg, C., Mikosch, T., 1997. Modelling Extreme Events. Springer, Berlin.
Ghosal, S., 1999. Probability matching priors for non-regular cases. Biometrika 86, 956–964.
Ghosh, J.K., Mukerjee, R., 1992. Non-informative priors. In: Bernardo, J.M., Berger, J.O., Dawid, A.P., Smith, A.F.M. (Eds.), Bayesian Statistics, vol. 4. Clarendon Press, Oxford.
Hosking, J., Wallis, J., 1987. Parameter and quantile estimation for the generalized Pareto distribution. Technometrics 29, 339–349.
Jorion, P., 1997. Value at Risk: The New Benchmark for Controlling Market Risk. McGraw-Hill, Chicago.
McNeil, A.J., Frey, R., Embrechts, P., 2005. Quantitative Risk Management. Princeton University Press, Princeton, NJ.
Moscadelli, M., 2004. The modelling of operational risk: experience with the analysis of the data, collected by the Basel Committee. Banca d'Italia, Temi di discussione del Servizio Studi, no. 517, July 2004.
Mukerjee, R., Dey, D.K., 1993. Frequentist validity of posterior quantiles in the presence of a nuisance parameter: higher order asymptotics. Biometrika 80, 499–505.
Mukerjee, R., Ghosh, M., 1997. Second-order probability matching priors. Biometrika 84, 970–975.
Pickands, J., 1975. Statistical inference using extreme ordered statistics. Ann. Statist. 3, 119–131.
Smith, J., 1991. Estimating the upper tail of flood frequency distributions. Water Resour. Res. 23, 1657–1666.
Smith, R., 1984. Threshold methods for sample extremes. In: de Oliveira, J.T. (Ed.), Statistical Extremes and Applications. Reidel, Dordrecht.
Smith, R., 1987. Estimating tails of probability distributions. Ann. Statist. 15, 1174–1207.
Sweeting, T.J., 1995. A framework for Bayesian and likelihood approximations in statistics. Biometrika 82, 1–23.
Tibshirani, R.J., 1989. Noninformative priors for one parameter of many. Biometrika 76, 604–608.
Welch, B., Peers, H.W., 1963. On formulae for confidence points based on integrals of weighted likelihoods. J. Roy. Statist. Soc. B 25, 318–329.