Insurance: Mathematics and Economics 43 (2008) 386–393
Skewed bivariate models and nonparametric estimation for the CTE risk measure

Catalina Bolancé a, Montserrat Guillén a, Elena Pelican b, Raluca Vernic b,∗

a Department of Econometrics, RFA-IREA, University of Barcelona, Diagonal 690, 08034 Barcelona, Spain
b Faculty of Mathematics and Computer Science, Ovidius University of Constanta, 124 Mamaia Blvd, 900527 Constanta, Romania
Article history: Received September 2007; received in revised form July 2008; accepted 30 July 2008.
In this paper, we illustrate the use of the Conditional Tail Expectation (CTE) risk measure on a set of bivariate real data consisting of two types of auto insurance claim costs. Several continuous bivariate distributions (normal, lognormal, skew-normal and the alternative log-skew-normal) are fitted to the data. In addition, a bivariate nonparametric transformed kernel estimation is presented. CTE formulas are given for all these models, and numerical results on the real data are discussed and compared. © 2008 Elsevier B.V. All rights reserved.
Keywords: Conditional tail expectation Bivariate distributions Kernel estimation
1. Introduction

In order for a financial institution to function properly, a certain safety level must be maintained. This safety level is attained by holding risk capital, i.e. capital that can be determined using an appropriate risk measure. A risk measure is a mapping from the random variables representing the risks to the real line. Its purpose is to summarize in a single value the degree of risk or uncertainty associated with the random variable. When heavy tails occur in a risk management problem, a risk measure providing information above a given threshold is recommended. The Conditional Tail Expectation is such a risk measure, defined for a loss random variable X as

CTE_X(x_q) = E[X | X > x_q].
It can be interpreted as the mean of the worst losses, given that the loss exceeds a particular value x_q. Here, x_q is the q-th quantile of the distribution of X, that is, Pr[X < x_q] ≤ q ≤ Pr[X ≤ x_q]. Unlike some commonly used risk measures (e.g. the quantile, also called Value-at-Risk, VaR), the CTE risk measure is coherent in the sense of Artzner et al. (1999). Therefore, CTE has been suggested as a possible ingredient when computing premiums for policies with deductibles or for reinsurance treaties, and it can be considered a weighted premium calculation principle (see e.g. Furman and Zitikis (2008)). CTE also allows for a natural allocation of the total risk capital of a financial institution to its constituents (Dhaene et al., 2008a,b).
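As a concrete illustration of these definitions, the empirical quantile (VaR) and CTE can be computed directly from a loss sample. The following sketch is purely illustrative: the lognormal toy sample and its parameters are assumptions, not the paper's data.

```python
import numpy as np

def var_cte(losses, q):
    """Empirical q-th quantile (VaR) and CTE of a loss sample."""
    losses = np.asarray(losses, dtype=float)
    xq = np.quantile(losses, q)        # empirical quantile x_q
    tail = losses[losses > xq]         # losses strictly above x_q
    return float(xq), float(tail.mean())

# toy heavy-tailed sample: the CTE sits well above the VaR
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)
xq, cte = var_cte(sample, 0.95)
```

For a heavy right tail the gap between `cte` and `xq` is large, which is exactly why a tail measure is preferred over the plain quantile.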
∗ Corresponding author.
E-mail addresses: [email protected] (C. Bolancé), [email protected] (M. Guillén), [email protected] (E. Pelican), [email protected] (R. Vernic).
0167-6687/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.insmatheco.2008.07.005
Several authors have studied particular cases of CTE when the loss random vector follows a certain multivariate distribution; see e.g. Panjer (2002) for the multivariate normal, Valdez and Chernih (2003) for the multivariate elliptical, Vernic (2006) for the multivariate skew-normal (for details on this last distribution see also Azzalini (2005)), Furman and Landsman (2005) for a multivariate gamma distribution, Landsman and Valdez (2005) for exponential dispersion models, Chiragiev and Landsman (2007) for a multivariate Pareto distribution and, more recently, Dhaene et al. (2008a) for elliptical distributions. Unfortunately, empirical illustrations are still scarce. This is why the main purpose of this paper is to numerically illustrate the evaluation of CTE for a set of bivariate positive claims data from motor insurance (property damage and medical expenses costs). In addition, we propose a nonparametric kernel estimation approach and discuss the results. In the following we assume that a policy can generate losses of two different kinds, hence its global loss is modeled by a random vector denoted X = (X1, X2)′, with Xj the loss of type j, j = 1, 2. The total loss of the policy is then S = X1 + X2 and we denote its CTE risk measure by K, i.e.

K(s_q) = CTE_S(s_q) = E[S | S > s_q].
This measure can be decomposed as K(s_q) = K1(s_q) + K2(s_q), where

Kj(s_q) = E[Xj | S > s_q]

reflects the contribution of the risk Xj to the total risk measure. When analyzing such dependent risks, K and Kj are among the most popular CTE measures, having useful interpretations in insurance and finance. They are also related to the capital allocation problem; for details on this problem see e.g. Denault (2001), Panjer (2002), Dhaene et al. (2003), Wang (2002), Buch and Dorfleitner (2008) and the references therein.
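The additivity of this decomposition is easy to verify by Monte Carlo: averaging X1, X2 and S over the same tail event {S > s_q} reproduces K = K1 + K2 exactly. The bivariate lognormal sample below is an illustrative assumption, not the paper's data.

```python
import numpy as np

# hypothetical dependent positive losses (bivariate lognormal; illustrative parameters)
rng = np.random.default_rng(1)
n = 200_000
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)
x1, x2 = np.exp(z[:, 0]), np.exp(z[:, 1])
s = x1 + x2                          # total loss S = X1 + X2

sq = float(np.quantile(s, 0.95))     # quantile s_q of the total loss
tail = s > sq
K  = s[tail].mean()                  # K(s_q)  = E[S  | S > s_q]
K1 = x1[tail].mean()                 # K1(s_q) = E[X1 | S > s_q]
K2 = x2[tail].mean()                 # K2(s_q) = E[X2 | S > s_q]
```

The three conditional means use the same tail indicator, so K = K1 + K2 holds up to floating-point rounding; this is the allocation property used throughout the paper.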
In our particular case, the data set consists of a sample of claims that include two types of losses: property damage, mainly resulting from third party liability, and medical expenses that are not covered by the Public Health system. The total cost per claim is then the sum of the two. First, we fit the bivariate skew-normal and normal distributions to the data. Second, given that real data on claim amounts are usually positive and right skewed, we propose the bivariate lognormal and log-skew-normal distributions as alternatives. Finally, we also obtain nonparametric estimates of the joint distribution function using kernel density estimation methods. In Section 2 we recall all the bivariate distributions involved in our study, presenting some results related to the calculation of the CTEs. In Section 3 we show how to obtain kernel and transformation kernel estimates of the CTE risk measure. In Section 4 we present the data set that is used to illustrate the methodology, and we discuss the conclusions in a final section. In the following, we denote a random variable by a capital letter and its q-th quantile by the corresponding lower case letter with subscript q. We also denote a 2 × 1 column vector by a bold-face letter and its elements by the corresponding italic with a subscript indicating the number of the element, i.e. x = (x1, x2)′. We let e = (1, 1)′, 0 = (0, 0)′, and if F is a distribution function, then F̄ = 1 − F is its tail function.
2. Parametric bivariate distributions

In this section we present the continuous bivariate distributions that we will fit to our data, together with the corresponding CTE risk measure formulas. We considered several distributions in order to find the best fit and to see the differences in CTE behavior. We start with the bivariate skew-normal distribution and present the classical bivariate normal distribution as a particular case of it. Next we introduce the bivariate log-skew-normal distribution, with the bivariate lognormal as a particular case. Since the general bivariate skew-normal and log-skew-normal distributions involve complex computations, we also discuss some numerical issues encountered when working with real data.

We will denote by ϕ the standard normal N(0, 1) density and by Φ its cumulative distribution function (cdf). We use N(μ, σ²) to denote the univariate normal distribution with parameters μ ∈ R, σ > 0, and N2(μ, Σ) for the bivariate normal distribution with parameters μ = (μ1, μ2)′ ∈ R² and

Σ = (σ11 σ12; σ21 σ22),

a 2 × 2 covariance matrix (i.e. symmetric and positive definite). The density of this last distribution will be denoted by ϕ2(·; μ, Σ).

2.1. Bivariate skew-normal distribution

Introduced by Azzalini (1985) as a natural extension of the univariate normal distribution to accommodate asymmetry, the univariate skew-normal distribution was later extended to the multivariate case (see e.g. Azzalini and Dalla Valle (1996)). The general n-variate skew-normal distribution can be developed in several ways. One method starts with independent and identically distributed standard normal random variables W1, W2, ..., Wn, U. Then the distribution of W = (W1, W2, ..., Wn)′ given that λ0 + λ1′W > U, where λ0 ∈ R and λ1 ∈ R^n, is an n-variate skew-normal distribution. This formulation involves a linear transformation of a hidden truncation. Gradually, more and more complex forms were developed (see e.g. Arnold and Beaver (2002) and Gupta et al. (2004)).

2.1.1. A general form

In the following, we consider the form studied by Arnold and Beaver (2002). We say that the random vector Y = (Y1, Y2)′ follows a bivariate skew-normal distribution SN2(μ, Σ, δ, γ) with parameters δ ∈ R, μ, γ ∈ R², and Σ a 2 × 2 covariance matrix, if its density is given by

f_Y(x) = (1/Φ(δ)) ϕ2(x; μ, Σ) Φ( (δ + γ′Σ⁻¹(x − μ)) / √(1 − γ′Σ⁻¹γ) ),  x ∈ R².   (1)

Here μ and Σ are, respectively, location and scale parameters. For this distribution, the moment generating function is given by

M_Y(t) = exp( t′μ + t′Σt/2 ) Φ(δ + γ′t) / Φ(δ).   (2)

It was proved that the marginal distributions of Y are univariate skew-normal, Yj ∼ SN1(μj, σjj, δ, γj) for j = 1, 2, and also that S = Y1 + Y2 ∼ SN1(μS, σS², δ, γS) with

μS = e′μ = μ1 + μ2,  σS² = e′Σe = Σ_{j,k=1}^{2} σjk,  γS = e′γ = γ1 + γ2.

Hence, the density function of S is

f_S(x) = (1/Φ(δ)) ϕ(x; μS, σS²) Φ( (δ + γS σS⁻²(x − μS)) / √(1 − γS² σS⁻²) ),  x ∈ R.

Because there is no closed formula for it, the cdf of S must be calculated by numerical integration using mathematical software. From Vernic (2006), we have explicit formulas for the involved CTEs,

Kj(s_q) = E[Yj | S > s_q]
 = μj + (1/(Φ(δ) F̄_S(s_q))) [ ((σj1 + σj2)/σS) ϕ((s_q − μS)/σS) Φ( (δσS² + γS(s_q − μS)) / (σS √(σS² − γS²)) ) + γj ϕ(δ) Φ̄( (s_q − μS + δγS) / √(σS² − γS²) ) ],  j = 1, 2.   (3)

Here, of course, σS > 0.

Consider now (x1i, x2i)_{i=1,...,n} to be a bivariate data sample. In order to fit the above skew-normal distribution (1) to the data, we first need to estimate the parameters involved, i.e. δ, μ, γ and Σ. We suggest the following approach: use the method of moments to obtain a first estimate of the parameters, then take these as initial values for an algorithm that maximizes the log-likelihood function. In our case, we used a Variable Neighborhood Search (VNS) algorithm (see e.g. Mladenovic and Hansen (1997)).

2.1.2. Particular cases of bivariate skew-normal distributions

Azzalini's form. Azzalini and Dalla Valle (1996) defined a first, simpler form of the bivariate skew-normal distribution that can be obtained from (1) by taking δ = 0 and μ = 0. In fact, the parameter δ was introduced later to confer more flexibility on the hidden truncation model leading to the skew-normal distribution; see e.g. Arnold and Beaver (2002). The parameters of this form can be estimated by maximum likelihood, using already existing software in R. For details see Azzalini and Capitanio (1999).

Bivariate normal distribution. The bivariate normal distribution N2(μ, Σ) results from the bivariate skew-normal distribution (1) by taking δ = 0 and γ = 0. In this particular case, the sum S = Y1 + Y2 follows a univariate normal distribution N(μS, σS²).
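Since the cdf of S has no closed form, quantities such as F̄_S(s_q) and E[S | S > s_q] can be obtained by one-dimensional numerical integration of the univariate skew-normal density of S given above. The sketch below does this with a simple trapezoidal rule; the parameter values are illustrative assumptions (not fitted values), chosen so that γS² < σS².

```python
import numpy as np
from math import erf, sqrt

# standard normal pdf and (vectorised) cdf
phi = lambda z: np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
Phi = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0))))

def f_S(x, mu_S, sig2_S, delta, gam_S):
    """Density of S = Y1 + Y2 ~ SN1(mu_S, sig2_S, delta, gam_S)."""
    sig = np.sqrt(sig2_S)
    arg = (delta + gam_S * (x - mu_S) / sig2_S) / np.sqrt(1.0 - gam_S**2 / sig2_S)
    return phi((x - mu_S) / sig) / sig * Phi(arg) / Phi(delta)

def trapz(y, x):
    # composite trapezoidal rule
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# illustrative parameters (assumed, not fitted); requires gam_S^2 < sig2_S
mu_S, sig2_S, delta, gam_S = 1.0, 4.0, 0.5, 1.0
x = np.linspace(mu_S - 12.0 * np.sqrt(sig2_S), mu_S + 12.0 * np.sqrt(sig2_S), 20001)
dens = f_S(x, mu_S, sig2_S, delta, gam_S)
total = trapz(dens, x)                      # sanity check: should integrate to ~1

sq = 3.0                                    # an arbitrary threshold s_q
tail = x > sq
Fbar = trapz(dens[tail], x[tail])           # numerical survival function at s_q
K = trapz(x[tail] * dens[tail], x[tail]) / Fbar   # CTE = E[S | S > s_q]
```

The check `total ≈ 1` is a useful guard: the argument of Φ inside the density is scaled exactly so that the skewing factor integrates out to Φ(δ).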
Hence, the cdf of S is easy to obtain by means of Φ, the standard normal cdf. As in Panjer (2002), the CTE formulas (3) reduce to

Kj(s_q) = μj + ((σj1 + σj2)/σS) · ϕ((s_q − μS)/σS) / Φ̄((s_q − μS)/σS),  j = 1, 2.

For a data sample (x1i, x2i)_{i=1,...,n}, the parameters of the bivariate normal distribution can easily be estimated by maximum likelihood. For j = 1, 2, we have

μ̂j = x̄j = (1/n) Σ_{i=1}^{n} xji,
σ̂jj = s_j² = (1/n) Σ_{i=1}^{n} (xji − x̄j)²,

and similarly

σ̂12 = (1/n) Σ_{i=1}^{n} (x1i − x̄1)(x2i − x̄2).

2.2. Bivariate log-skew-normal distribution

This distribution is defined in the same way the lognormal distribution is obtained from the normal one. In the statistical literature, a univariate form of the log-skew-normal distribution was already used by Azzalini et al. (2003) to model family income data. Without going into details, they also mention a potential application of the multivariate version, allowing for the consideration of the joint distribution of income.

2.2.1. General form

We say that the random vector X = (X1, X2)′ follows a bivariate log-skew-normal distribution with parameters δ ∈ R, μ, γ ∈ R², and Σ a 2 × 2 covariance matrix, if the transformed random vector Y = (ln X1, ln X2)′ follows a bivariate skew-normal distribution SN2(μ, Σ, δ, γ). We write X ∼ LSN2(μ, Σ, δ, γ) and notice that the density of X satisfies

f_X(x) = (1/(x1 x2)) f_Y(ln x),  x1, x2 > 0.   (4)

For this log-skew-normal distribution, because there are no closed formulas, the density function and cdf of S = X1 + X2 must be calculated using mathematical software from the following integrals:

f_S(x) = ∫_0^x f_X(x − y, y) dy,  x > 0,

F_S(s) = ∫_0^s f_S(x) dx = ∫_0^s ∫_0^x f_X(x − y, y) dy dx = ∫_0^s ∫_0^1 x f_X(x(1 − t), xt) dt dx,

where we considered the domain of X, and for the last integral we changed the variable y = xt. This last integral is easier to calculate numerically since the integration limits do not depend on the integration variables. For similar reasons, for the CTE formulas we needed the integrals

K(s) = E[S | S > s] = (1/F̄_S(s)) ∫_s^∞ x f_S(x) dx = (1/F̄_S(s)) ∫_s^∞ ∫_0^1 x² f_X(x(1 − t), xt) dt dx,

K1(s) = E[X1 | S > s] = (1/F̄_S(s)) ∫_s^∞ E[X1 | S = x] f_S(x) dx = (1/F̄_S(s)) ∫_s^∞ ∫_0^x y f_X(y, x − y) dy dx = (1/F̄_S(s)) ∫_s^∞ ∫_0^1 x² t f_X(xt, x(1 − t)) dt dx,

and similarly

K2(s) = E[X2 | S > s] = (1/F̄_S(s)) ∫_s^∞ ∫_0^1 x² t f_X(x(1 − t), xt) dt dx.

Notice that, in order to fit the log-skew-normal distribution to the bivariate data (x1i, x2i)_{i=1,...,n}, it suffices to transform the initial data sample into its log-form (ln x1i, ln x2i)_{i=1,...,n}, and then proceed as for the bivariate skew-normal distribution (1).

2.2.2. Particular case: Bivariate lognormal distribution

The bivariate lognormal distribution LN2(μ, Σ) results from the more general bivariate log-skew-normal distribution by taking δ = 0 and γ = 0 in (4) and (1). The corresponding density and cdf of S, and also the CTE measures K, K1, K2, have no closed forms, so they lead to integrals similar to those for the bivariate log-skew-normal distribution. In order to estimate the parameters of the lognormal distribution, we proceed as for the bivariate log-skew-normal distribution, i.e. we transform the initial data sample into its log-form and then estimate the parameters of the bivariate normal distribution N2(μ, Σ).

2.2.3. Numerical issues

The general forms of the bivariate skew-normal and log-skew-normal distributions raised important numerical problems when evaluating the double integrals involved in the cdf of S and in Kj (see the formulas above). Even mathematical software like Mathematica failed to solve them, so we chose to write our own Matlab procedures. Based on Simpson's composite rule for double integrals (see e.g. Burden and Faires (2001)), our procedures allow for a certain control of the accuracy of the results. But then the evaluation of a single Kj value could take a few minutes. This does not seem much if we want only a few Kj values, but if we want many such values (hundreds, say), the computation process might take quite long. Even if these numerical issues could discourage one from using the general skew-normal or log-skew-normal distributions, we did not discard them for such reasons, considering that real data could be better modelled by these distributions than by simpler ones (e.g. the normal or lognormal distributions).

3. Kernel methods
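The change of variable y = xt above makes the double integrals straightforward to evaluate on a rectangular grid. In the spirit of the Simpson-based procedures described above, the following Python sketch computes F_S(s) for the bivariate lognormal particular case (δ = γ = 0); the parameter values are illustrative assumptions, not the fitted values of Section 4.

```python
import numpy as np

def phi2(y1, y2, mu, Sigma):
    # bivariate normal density, evaluated elementwise
    d1, d2 = y1 - mu[0], y2 - mu[1]
    det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] ** 2
    q = (Sigma[1][1] * d1**2 - 2.0 * Sigma[0][1] * d1 * d2 + Sigma[0][0] * d2**2) / det
    return np.exp(-0.5 * q) / (2.0 * np.pi * np.sqrt(det))

def f_X(x1, x2, mu, Sigma):
    # bivariate lognormal density, i.e. (4) with delta = 0, gamma = 0;
    # zero outside the positive quadrant
    out = np.zeros_like(x1, dtype=float)
    m = (x1 > 0.0) & (x2 > 0.0)
    out[m] = phi2(np.log(x1[m]), np.log(x2[m]), mu, Sigma) / (x1[m] * x2[m])
    return out

def simpson_nodes(a, b, n):
    # composite Simpson nodes and weights; n must be odd
    x = np.linspace(a, b, n)
    w = np.ones(n)
    w[1:-1:2], w[2:-1:2] = 4.0, 2.0
    return x, w * (b - a) / (n - 1) / 3.0

def F_S(s, mu, Sigma, nx=401, nt=201):
    # F_S(s) = int_0^s int_0^1 x f_X(x(1-t), xt) dt dx, Simpson in both dimensions
    x, wx = simpson_nodes(0.0, s, nx)
    t, wt = simpson_nodes(0.0, 1.0, nt)
    X, T = np.meshgrid(x, t, indexing="ij")
    G = X * f_X(X * (1.0 - T), X * T, mu, Sigma)
    return float(wx @ G @ wt)

# illustrative parameters (assumptions, not the fitted values of Section 4)
mu, Sigma = [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]]
Fs5 = F_S(5.0, mu, Sigma)   # P(S <= 5) for S = X1 + X2
```

A quick Monte Carlo simulation of exp of a bivariate normal gives an independent check of the quadrature; the tail integrals for K, K1, K2 follow the same pattern with the outer integration over (s, ∞).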
3.1. Kernel estimation

For a random sample of n independent and identically distributed observations x1, ..., xn of a random variable X, the kernel density estimator is

f̂(x) = (1/(nh)) Σ_{i=1}^{n} k((x − xi)/h),   (5)
where h is the bandwidth and k (·) is the kernel function. The bandwidth parameter is used to control the amount of smoothing in the estimation so that the greater h, the smoother the estimated density curve. The kernel function is usually a symmetric density
with zero mean; in our illustration a Gaussian kernel is used (see Wand and Jones (1995)).

In the multivariate case, a simple generalization of (5) is obtained by means of product kernels (see Scott (1992), pp. 150–155). More specifically, in the bivariate case, let us consider a random sample of n independent and identically distributed bivariate data (x1i, x2i)_{i=1,...,n} of the random vector X = (X1, X2)′. Then the kernel estimator of the bivariate density function can be expressed as

f̂(x1, x2) = (1/(n h1 h2)) Σ_{i=1}^{n} k((x1 − x1i)/h1, (x2 − x2i)/h2),   (6)

where h1 and h2 are bandwidths that, as in the univariate situation, are used to control the degree of smoothing. The function k(x1, x2) = k(x1) k(x2) is the product kernel.

We will now propose a kernel estimation of Kj, j = 1, 2. We first present the kernel estimator of the conditional density function f(xj | S > s_q) = (1 − F_S(s_q))⁻¹ ∫_{s_q}^{+∞} f(xj, s) ds as

f̂(xj | S > s_q) = [ (1/(n hj)) Σ_{i=1}^{n} k((xj − xji)/hj) (1 − K((s_q − si)/h)) ] / [ 1 − (1/n) Σ_{i=1}^{n} K((s_q − si)/h) ],   (7)

where si = x1i + x2i, i = 1, ..., n, and K(·) is the cdf of the kernel density function (in our case the cdf of the standard normal distribution). Hence, from the previous conditional density, the kernel estimator of E[Xj | S > s_q] is

Ê[Xj | S > s_q] = ∫_0^{+∞} xj f̂(xj | S > s_q) dxj = [ Σ_{i=1}^{n} xji (1 − K((s_q − si)/h)) ] / [ n − Σ_{i=1}^{n} K((s_q − si)/h) ].   (8)

3.2. Transformation kernel estimation

Classical kernel density estimation does not perform well when the true density is asymmetric. For instance, when one is interested in the density of the claim cost variable, the presence of many small claims produces a concentration of mass near the low values of the domain, and the presence of some very large claims causes positive skewness. When there is scarce information in the right tail of the domain, classical kernel density estimation does not capture the tail behavior properly, and as a consequence functionals derived from the density estimate, such as the CTE defined in (8), are poorly approximated.

Many authors have worked with heavy-tailed distributions and have adapted kernel estimation methods to this context. Wand et al. (1991), Clements et al. (2003), Bolancé et al. (2003), Buch-Larsen et al. (2005) and Bolancé et al. (2008) have proposed different transformation kernel density estimation methods, based on parametric families. Let T(·) be an increasing and monotonous transformation function. If the true density is right skewed, then the chosen transformation T(·) must be a concave function. The transformation kernel estimation (TKE) method consists in transforming the original data, yji = T(xji), so that the new transformed data can be assumed to have been generated by a symmetric random variable, and hence the true density of the transformed variable can easily be approximated by the classical kernel estimation method. Using a change of variable, once the kernel estimate is obtained for the transformed variable, the estimate in the original scale is also obtained. Here, the estimation of CTE using a transformation kernel estimation approach is

Ê[Xj | S > s_q] = [ Σ_{i=1}^{n} (1 − K((T(s_q) − T(si))/h)) ∫_0^{+∞} xj (1/hj) k((T(xj) − T(xji))/hj) T′(xj) dxj ] / [ n − Σ_{i=1}^{n} K((T(s_q) − T(si))/h) ].   (9)

Bolancé et al. (2003) proposed selecting the transformation function from a transformation family that is based on a generalization of the original Wand et al. (1991) power family,

T_{λ1,λ2}(x) = (x + λ1)^{λ2} sign(λ2)  if λ2 ≠ 0,  and  T_{λ1,λ2}(x) = ln(x + λ1)  if λ2 = 0,   (10)

with λ1 ≥ −min_{i=1,...,n}(xi) and λ2 ≤ 1 for right-skewed data. This parametric family of transformation functions is called the shifted power transformation family. Its main advantage is that it has a simple expression and works particularly well for transformation kernel estimation of asymmetric distributions. In order to estimate the optimal parameters of the shifted power transformation function, we use the algorithm described by Bolancé et al. (2003).

The difficulty when evaluating the expression in (9) is the calculation of the integral

∫_0^{+∞} xj (1/hj) k((T(xj) − T(xji))/hj) T′(xj) dxj.   (11)

In order to calculate (11) we used the change of variable yj = T(xj), so that the integral becomes

(1/hj) ∫_{−∞}^{+∞} ( yj^{1/λ2} − λ1 ) k((yj − yji)/hj) dyj.   (12)

Then a new change of variable, zj = (yj − yji)/hj, is performed in (12), so that

(12) = ∫_{−∞}^{+∞} ( (hj zj + yji)^{1/λ2} − λ1 ) k(zj) dzj = ∫_{−∞}^{+∞} (hj zj + yji)^{1/λ2} k(zj) dzj − λ1.   (13)

Moreover, using Taylor's approximation

(hj zj + yji)^{1/λ2} = yji^{1/λ2} + (hj zj/λ2) yji^{1/λ2 − 1} + (hj² zj²/2) (1/λ2)(1/λ2 − 1) yji^{1/λ2 − 2} + O(zj³),

if k(·) is the Gaussian kernel (zero mean, unit variance), the integral in (13) evaluates to

(13) ≈ yji^{1/λ2} − yji^{1/λ2} ( hj²/(2λ2 yji²) − hj²/(2λ2² yji²) ) − λ1.
Finally, the CTE can be approximated by

Ê[Xj | S > s_q] ≈ [ Σ_{i=1}^{n} (1 − K((T(s_q) − T(si))/h)) ( yji^{1/λ2} − yji^{1/λ2} ( hj²/(2λ2 yji²) − hj²/(2λ2² yji²) ) − λ1 ) ] / [ n − Σ_{i=1}^{n} K((T(s_q) − T(si))/h) ].   (14)

4. Data and results

The claims we considered refer to motor insurance of a major insurer in Spain in the year 2000. The data correspond to a random sample of all claims with both costs in property damage and in medical expenses. Bodily injury is universally covered by the National Health System, but some medical expenses (such as technical aids, drugs or chiropractors) may have to be paid by the insurer. No compensation for pain and suffering or loss of income is included. The claims included in our sample are all claims that had been settled. Although claims for compensation with bodily injury may take a long time to settle, the data were collected later on, in 2002, so that there had been enough time for the claimant to include most costs, according to the company's opinion. Medical expenses may contain medical costs related to a third person injured in the accident. The sample size is n = 518, and for each claim i we observe X1, the cost of property damage, and X2, the cost of medical expenses, in 1000 pesetas. The main empirical characteristics are:

       Mean        Std. Dev.   Skewness   Kurtosis   Min   Max
X1     1827.6004   6867.8166   15.7430    301.2149   13    137936
X2     283.9208    863.1695    8.0836     83.1602    1     11855

and the covariance is 4312140.2637. For the log-data we have the means ȳ1 = 6.4436, ȳ2 = 4.3755 and the variances s1² = 1.7854, s2² = 2.3116.

Fig. 1. Plot of X1 vs X2.

In Fig. 1 we show the plot of X1 vs X2. In Table 1 we present the estimated values of the parameters of the bivariate distributions described in Section 2, together with the corresponding log-likelihood and AIC values. We recall that the AIC (Akaike's information criterion, see e.g. Akaike (1974)) is defined as AIC = 2(s − ln L), where s is the number of estimated parameters and L is the likelihood function. The preferred model is the one with the lowest AIC value. If we study a total of m models and calculate AICi, i = 1, ..., m (namely, for every model), they can be rescaled as Δi = AICi − min_{j=1,...,m} AICj. Then models having Δi ≤ 2 have substantial support, those with 4 ≤ Δi ≤ 7 have considerably less support, and those with Δi > 10 have essentially no support.

Table 1. Parameters, log-likelihood and Akaike's values for the fitted distributions

                  Normal          Lognormal    General skew-normal   Log-skew-normal
δ̂                 0               0            −2.44                 −6.91
μ̂1                1827.60         6.44         −53347.36             −34.44
μ̂2                283.92          4.37         −4330.52              −34.66
γ̂1                0               0            19920.14              5.80
γ̂2                0               0            1665.98               5.54
σ̂11               47075848.65     1.78         407664559.59          34.78
σ̂12               4312140.26      1.20         34461606.54           32.70
σ̂22               743623.22       2.31         3266549.03            32.39
Log-likelihood    −9351.14        −7329.23     −9122.81              −7323.50
AIC               18712.29        14668.46     18261.62              14663.00
Δi                4049.29         5.46         3598.62               0

To obtain the kernel estimate of (8) we used a Gaussian kernel. The smoothing parameter was obtained taking into account that, for estimating distribution functions, Azzalini (1981) and Reiss (1981) recommended h = c n^{−1/3}, where c is a constant. We calculated c so that the kernel estimate of the distribution function is close to the empirical distribution function.

In Figs. 2–4 we show the results for K1, K2 and K. When looking at the different estimation methods, we see that the curve corresponding to the normal approach underestimates the conditional tail expectation (vertical axis) for a given total cost (horizontal axis) when compared to all the other methods. Up to a certain point, the same thing happens with the lognormal and the
Fig. 2. CTE for property damage (several estimation methods).
Fig. 3. CTE for medical expenses (several estimation methods).
Fig. 4. CTE for total claim cost (several estimation methods).
skew-normal curves. A good compromise is the log-skew-normal, because (i) it provides analytical expressions, although these may not be easy to evaluate numerically;
Fig. 5. CTE for total cost as a function of cdf (several estimation methods).
Fig. 6. Bootstrap confidence intervals for K .
(ii) it provides CTE results that are close to those of the empirical and kernel methods for medium costs, and (iii) it does not have an asymptotic bound, so that the CTE keeps increasing as a function of the cost quantile. Note that the same total cost level on the x-axis does not necessarily correspond to the same tail probability under each distributional assumption; fixed total cost levels on the horizontal axis may correspond to different quantiles in different distributions. Additionally, in Fig. 5 we also plot the conditional tail expectation estimates of the total cost (sum of property damage and medical expenses) as a function of the cumulative probability for each distributional assumption. In all the figures, the classical kernel estimation approach clearly appears as a smoothed version of the empirical distribution and therefore avoids the jumps that are typical of the latter. The transformation kernel curve looks even better because, unlike the empirical curve, it is not constant from the last observation onwards and smoothly increases as a function of the cost quantile. In order to assess the precision of the nonparametric approaches introduced here, a bootstrap method was used in the application to approximate the 95% confidence interval. We performed the calculations for 1000 replicated samples, each of size 518, so that the number of individuals in every replicate was exactly
the same as in the original sample, although repetitions could occur, because bootstrap samples are drawn with replacement. Then, for every sample replicate, an estimate was obtained at every point where the CTE is evaluated. Finally, the estimation results obtained for all the replicates were analyzed together, and the 2.5% and 97.5% percentiles were calculated to obtain the bootstrap confidence interval. The major drawback is that the classical kernel estimate degenerates in the extremes, i.e. from 35 000 onwards the confidence intervals have no upper bound. This is due to the lack of observations in the tail. However, the confidence intervals obtained using the bootstrap method for the transformation kernel estimate are bounded. In Fig. 6, the bootstrap confidence interval is plotted for the transformation kernel estimation method, using the same transformation parameters for all the replicates. We conclude that the transformation kernel approach both improves the point estimate of the CTE and provides the possibility of calculating bootstrap confidence intervals in the tail.

5. Conclusions

In this paper we fitted several parametric bivariate distributions and used transformation kernel estimation to analyze a real data set from auto insurance, and then numerically illustrated the application of the CTE risk measures for all of them.
As expected, the kernel estimation approach provides a smoothed version of the empirical cdf and CTEs, because the kernel function is continuous and infinitely differentiable, so that these properties are transmitted to the final estimate. One could argue that an asymmetric kernel function might be applied in our context. We have preferred to keep a standard symmetric kernel function and to use transformation kernel estimation, because this method is suitable for heavy-tailed data. Moreover, the theory of kernel estimation using asymmetric kernels is developed only for density estimation and not for other functionals, while transformation kernel estimation can be applied directly. Using an asymmetric kernel merely guarantees that the boundary bias near zero is removed (see Chen (2000)), but this problem disappears when transformation kernel estimation is used. Among the bivariate distributions, the one giving the worst fit (i.e. the bivariate normal) clearly underestimates the empirical CTEs. The bivariate lognormal and log-skew-normal, though offering the best fits to the data among the parametric alternatives, are not as close to the empirical values as the nonparametric approximation. We mention that we also tried to fit a bivariate Pareto distribution of the first kind and a bivariate Weibull to our data, but the results were so poor compared with the other distributions that we decided not to include them. In conclusion, we recommend avoiding the use of distributions that underestimate the empirical CTEs and might therefore cause substantial undervaluation of losses to the company. Notice that this is the case for the bivariate normal distribution, which is still used in practice. The bivariate lognormal and skew-normal are also not satisfactory enough. On the other hand, the transformation kernel approach provides a better smooth fit to the real data and a compromise CTE result.
Acknowledgments

The authors acknowledge the support of the Spanish Ministry of Education and Science, FEDER grant 2007-63298/ECON.

References

Akaike, H., 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723.
Arnold, B.C., Beaver, R.J., 2002. Skewed multivariate models related to hidden truncation and/or selective reporting. Sociedad de Estadistica e Investigacion Operativa Test 11, 7–54.
Artzner, P., Delbaen, F., Eber, J.M., Heath, D., 1999. Coherent measures of risk. Mathematical Finance 9, 203–228.
Azzalini, A., 1981. A note on the estimation of a distribution function and quantiles by kernel method. Biometrika 68, 226–228.
Azzalini, A., 1985. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 12, 171–178.
Azzalini, A., 2005. The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics 32, 159–188.
Azzalini, A., Capitanio, A., 1999. Statistical applications of the multivariate skew-normal distribution. Journal of the Royal Statistical Society, Series B 61, 579–602.
Azzalini, A., Dal Cappello, T., Kotz, S., 2003. Log-skew-normal and log-skew-t distributions as models for family income data. Journal of Income Distribution 11, 12–20.
Azzalini, A., Dalla Valle, A., 1996. The multivariate skew-normal distribution. Biometrika 83, 715–726.
Bolancé, C., Guillén, M., Nielsen, J.P., 2003. Kernel density estimation of actuarial loss functions. Insurance: Mathematics and Economics 32, 19–36.
Bolancé, C., Guillén, M., Nielsen, J.P., 2008. Inverse Beta transformation in kernel density estimation. Statistics & Probability Letters 78, 1757–1764.
Burden, R.L., Faires, J.D., 2001. Numerical Analysis, 7th edition. Brooks/Cole, Pacific Grove.
Buch, A., Dorfleitner, G., 2008. Coherent risk measures, coherent capital allocations and the gradient allocation principle. Insurance: Mathematics and Economics 42, 235–242.
Buch-Larsen, T., Guillén, M., Nielsen, J.P., Bolancé, C., 2005. Kernel density estimation for heavy-tailed distributions using the Champernowne transformation. Statistics 39, 503–518.
Clements, A.E., Hurn, A.S., Lindsay, K.A., 2003. Möbius-like mappings and their use in kernel density estimation. Journal of the American Statistical Association 98, 993–1000.
Chen, S.X., 2000. Probability density function estimation using Gamma kernels. Annals of the Institute of Statistical Mathematics 52, 471–480.
Chiragiev, A., Landsman, Z., 2007. Multivariate Pareto portfolios: TCE-based capital allocation and divided differences. Scandinavian Actuarial Journal 4, 261–280.
Denault, M., 2001. Coherent allocation of risk capital. Working Paper. Ecole des H.E.C., Montreal.
Dhaene, J., Goovaerts, M.J., Kaas, R., 2003. Economic capital allocation derived from risk measures. North American Actuarial Journal 7, 44–59.
Dhaene, J., Henrard, L., Landsman, Z., Vandendorpe, A., Vanduffel, S., 2008a. Some results on the CTE-based capital allocation rule. Insurance: Mathematics and Economics 42, 855–863.
Dhaene, J., Laeven, R.J.A., Vanduffel, S., Darkiewicz, G., Goovaerts, M.J., 2008b. Can a coherent risk measure be too subadditive? Journal of Risk & Insurance 75, 365–386.
Furman, E., Landsman, Z., 2005. Risk capital decomposition for a multivariate dependent gamma portfolio. Insurance: Mathematics and Economics 37, 635–649.
Furman, E., Zitikis, R., 2008. Weighted premium calculation principles. Insurance: Mathematics and Economics 42, 459–465.
Gupta, A.K., Gonzalez-Farias, G., Dominguez-Molina, J.A., 2004. A multivariate skew normal distribution. Journal of Multivariate Analysis 89, 181–190.
Landsman, Z., Valdez, E.A., 2005. Tail conditional expectation for exponential dispersion models. Astin Bulletin 35, 189–209.
Mladenovic, N., Hansen, P., 1997. Variable neighborhood search. Computers and Operations Research 24, 1097–1100.
Panjer, H.H., 2002. Measurement of risk, solvency requirements and allocation of capital within financial conglomerates. In: 27th International Congress of Actuaries. Cancun.
Reiss, R.D., 1981. Nonparametric estimation of smooth distribution functions. Scandinavian Journal of Statistics 8, 116–119.
Scott, D.W., 1992. Multivariate Density Estimation. Theory, Practice and Visualization. John Wiley & Sons, Inc.
Valdez, E.A., Chernih, A., 2003. Wang's capital allocation formula for elliptically contoured distributions. Insurance: Mathematics and Economics 33, 517–532.
Vernic, R., 2006. Multivariate skew-normal distributions with applications in insurance. Insurance: Mathematics and Economics 38, 413–426.
Wand, P., Jones, M.C., 1995. Kernel Smoothing. Chapman & Hall.
Wand, P., Marron, J.S., Ruppert, D., 1991. Transformations in density estimation. Journal of the American Statistical Association 86, 343–361.
Wang, S., 2002. A set of new methods and tools for enterprise risk capital management and portfolio optimization. Working Paper. SCOR reinsurance company. http://www.casact.com/pubs/forum/02sforum/02sf043.pdf.