Insurance: Mathematics and Economics 12 (1993) 113-126
North-Holland
Moving weighted average graduation using kernel estimation

John Gavin, Steven Haberman and Richard Verrall
The City University, London, UK

Received December 1992
Revised February 1993
Abstract: This paper explores the link between moving weighted average graduation and kernel estimation. A new kernel estimator is studied and an optimal smoothing kernel derived. This gives results which are very similar to Spencer's 21-term formula, if the appropriate bandwidth is chosen. The advantages of the kernel approach for graduation are discussed and the methods are applied to the English Life Tables Number 14.
Keywords: Graduation; Kernel estimation; Moving weighted averages; Optimal smoothing kernel.
Correspondence to: John Gavin, School of Mathematics, University of Bath, Bath BA2 7AY, UK.
0167-6687/93/$06.00 © 1993 - Elsevier Science Publishers B.V. All rights reserved

1. Introduction

1.1. Graduation

One of the principal applications of graduation is building a survival model, which is usually presented in the form of a mortality table. The crude rates, $\mathring{q}_x$, on which the model is based, can be seen as a sample from a larger population of lives and thus they contain some random fluctuations. If we believed that the true rates, $q_x$, were independent, then the crude rates would be our final estimate of the true underlying mortality rates. However, a common, prior opinion about the form of the true rates is that each true rate of mortality is closely related to its neighbours. This relationship is expressed by the belief that the true rates progress smoothly from one age to the next. So the next step is to graduate the crude rates in order to produce smooth estimates, $\hat{q}_x$, of the true rates. This is done by systematically revising the crude rates, in order to remove any random fluctuations. Although this paper refers only to mortality rates, the results can be extended to a multiple decrement table, the transition intensities in a multiple state model or to the force of mortality.

1.2. Kernel graduation

Kernel methods were first used for the purpose of graduation by Copas and Haberman (1983). Kernel methods were developed for estimating statistical density functions and graduation requires the estimation of two such density functions. If we denote the event of death by E = d, whose probability of occurrence depends on some continuous variable X, then

$$\Pr(E = d \mid X = x) = q_x. \qquad (1)$$
J. Gavin et al. / Moving weighted average graduation
Suppose an estimate of $q_x$ is required, over a range of values of x. Clearly, $\Pr(E = d \mid X = x)$ does not define a density function over x. However, we may apply Bayes' theorem to obtain

$$\Pr(E = d \mid X = x) = \frac{\Pr(X = x \mid E = d)\,\Pr(E = d)}{\Pr(X = x)}. \qquad (2)$$
Now we can apply density estimation methods to both $\Pr(X = x \mid E = d)$ and $\Pr(X = x)$. The application of kernel estimators to these two densities leads to the estimate of $q_x$, called $\hat{q}_x^{2}$ (q-two), which was studied by Copas and Haberman (1983). Ramlau-Hansen (1983) also noticed the connection between moving weighted average and kernel graduation and studied some of the optimality properties. In this paper, we consider a slightly different estimate from $\hat{q}_x^{2}$, which can be regarded as a generalisation of moving weighted average graduation. It allows a reassessment of moving weighted averages (MWA) and also overcomes some of the shortcomings of MWA.

1.3. The data

The estimates, $\hat{q}_x$, are based on crude data for a set of ages, $C = \{x_1, x_2, \ldots, x_n\}$. For each age, $x_i$, we are given a measure of exposure, $e_i$, and the number of deaths, $d_i$, where $i = 1, 2, \ldots, n$. The crude estimate of the true mortality rate, $q_{x_i}$, at the ith age is denoted by $\mathring{q}_i$, where

$$\mathring{q}_i = d_i / e_i. \qquad (3)$$

Note that a more exact notation might be $\mathring{q}_{x_i}$, but this involves a redundant level of suffixes. Let the total exposure be denoted by t, where

$$t = \sum_{i=1}^{n} e_i, \qquad (4)$$

and the total number of deaths by u, where

$$u = \sum_{i=1}^{n} d_i. \qquad (5)$$
The age of a life, which is regarded as a random variable, is denoted by X and its realised value by x. Note that x does not have to be one of the crude ages in the set C. For mortality, the random variable E denotes the event of the life being alive (a) or dead (d). In general the subscript i, where i = 1, 2,. . . , n, denotes the ages at which data has been collected and the subscript j, where j = 1, 2,. . . , t, is used to denote the lives observed.
2. Kernel estimation
2.1. A kernel density estimator
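If $x_1, x_2, \ldots, x_n$ are some observed values of the random variable, X, then a non-parametric estimator, $\hat{f}$, of the density of X is

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right), \qquad (6)$$

where

$$\int_{-\infty}^{\infty} K(x)\,dx = 1.$$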
K(x) is called a kernel function and $\hat{f}$ is called a kernel estimator. If K(x) is differentiable then $\hat{f}$ is differentiable. Kernel functions which have been used include the normal (N) and Laplace (L) kernels, defined as

$$K_N(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}}, \quad \text{where } -\infty < x < +\infty, \qquad (7)$$

$$K_L(x) = \tfrac{1}{2} e^{-|x|}, \quad \text{where } -\infty < x < +\infty. \qquad (8)$$
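The estimator (6) with the kernels (7) and (8) can be sketched in a few lines of Python. This is only an illustration: the function names, the `integrate` helper and the observed ages are assumptions made for the example and do not come from the paper.

```python
import math


def normal_kernel(x):
    """Normal kernel K_N of equation (7)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def laplace_kernel(x):
    """Laplace kernel K_L of equation (8)."""
    return 0.5 * math.exp(-abs(x))


def kernel_density(x, observations, h, kernel=normal_kernel):
    """Kernel estimator of equation (6), evaluated at x with bandwidth h."""
    n = len(observations)
    return sum(kernel((x - xi) / h) for xi in observations) / (n * h)


def integrate(f, lower, upper, steps=20000):
    """Midpoint-rule approximation to the integral of f over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(f(lower + (k + 0.5) * width) for k in range(steps))


# Hypothetical observed ages, invented purely for illustration.
observed_ages = [62.0, 65.0, 65.5, 71.0, 80.0]

for h in (0.5, 2.0, 8.0):
    density_at_65 = kernel_density(65.0, observed_ages, h)
    print(f"h = {h}: estimated density at age 65 = {density_at_65:.4f}")
```

Running the loop with several bandwidths gives a feel for how the choice of h changes the estimate at a single point.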
The amount of smoothing is governed by the choice of bandwidth or smoothing parameter, h. If h is large there is a lot of smoothing, while if h is small, the estimate will just have points of density at each observation. The bandwidth has a similar role to the smoothing parameter in Whittaker-Henderson graduation. A more complete discussion of kernel estimators and their properties can be found in Copas and Haberman (1983) or Silverman (1986).

2.2. Kernel density estimators for graduation

We will now use (6) to derive the kernel estimator which was used by Copas and Haberman (1983) and go on to derive a related estimator which is closely connected to MWA. As was described in the introduction, Bayes' theorem can first be applied to $q_x$. Thus,

$$q_x = \Pr(\text{death} \mid \text{aged } x) = \Pr(E = d \mid X = x) = \frac{\Pr(X = x \mid E = d)\,\Pr(E = d)}{\Pr(X = x)}. \qquad (9)$$
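Next we can apply kernel estimation to estimate $\Pr(X = x \mid E = d)$ and $\Pr(X = x)$. A kernel estimator of $\Pr(X = x)$ is

$$\Pr(X = x) = \frac{1}{th} \sum_{j=1}^{t} K\left(\frac{x - x_j}{h}\right), \qquad (10)$$

where K(x) is a kernel function. Now (10) can be rewritten as follows:

$$\Pr(X = x) = \frac{1}{th} \sum_{i=1}^{n} \sum_{j=1}^{e_i} K\left(\frac{x - x_{i,j}}{h}\right), \qquad (11)$$

where $x_{i,j}$ denotes the age of the jth life out of the group of lives aged $x_i$. Of course, $K((1/h)(x - x_{i,j}))$ is constant for $j = 1, 2, \ldots, e_i$ and so

$$\sum_{j=1}^{e_i} K\left(\frac{x - x_{i,j}}{h}\right) = e_i K\left(\frac{x - x_i}{h}\right). \qquad (12)$$

Thus (11) can be written as

$$\Pr(X = x) = \frac{1}{th} \sum_{i=1}^{n} e_i K\left(\frac{x - x_i}{h}\right). \qquad (13)$$

Similarly, a kernel density estimator for $\Pr(X = x \mid E = d)$ is

$$\Pr(X = x \mid E = d) = \frac{1}{uh} \sum_{i=1}^{n} d_i K\left(\frac{x - x_i}{h}\right). \qquad (14)$$

A simple estimator for $\Pr(E = d)$ is u/t.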
116
J. Gavin et al. / Moving weighted average graduation
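On substituting (13) and (14) into (9), we get the kernel graduation estimator $\hat{q}_x^{2}$, where

$$\hat{q}_x^{2} = \frac{\dfrac{1}{uh} \sum_{i=1}^{n} d_i K\left(\dfrac{x - x_i}{h}\right) \cdot \dfrac{u}{t}}{\dfrac{1}{th} \sum_{i=1}^{n} e_i K\left(\dfrac{x - x_i}{h}\right)}. \qquad (15)$$

This then reduces to

$$\hat{q}_x^{2} = \frac{\sum_{i=1}^{n} d_i K\left(\frac{x - x_i}{h}\right)}{\sum_{i=1}^{n} e_i K\left(\frac{x - x_i}{h}\right)}. \qquad (16)$$

Note that the same kernels and bandwidths are used in the numerator and denominator of $\hat{q}_x^{2}$. Also, $\hat{q}_x^{2}$ will lie in the range [0, 1] if the kernel is non-negative. The $\hat{q}_x^{2}$ estimator has been used to graduate mortality data by Copas and Haberman (1983) and more recently by Bloomfield and Haberman (1987).

We now derive a different kernel estimator that is closely related to MWA. To do this we rewrite $\hat{q}_x^{2}$ in (16), using $d_i = e_i \mathring{q}_i$ from (3), as follows:

$$\hat{q}_x^{2} = \frac{\sum_{i=1}^{n} e_i \mathring{q}_i K\left(\frac{x - x_i}{h}\right)}{\sum_{i=1}^{n} e_i K\left(\frac{x - x_i}{h}\right)} \qquad (17)$$

$$\phantom{\hat{q}_x^{2}} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{e_i} \mathring{q}_i K\left(\frac{x - x_i}{h}\right)}{\sum_{i=1}^{n} \sum_{j=1}^{e_i} K\left(\frac{x - x_i}{h}\right)}. \qquad (18)$$

This rearrangement of $\hat{q}_x^{2}$ makes it clear that there is a contribution in the numerator and denominator for each life aged $x_i$. However, this contribution is the same for each life aged $x_i$. Instead of contributing this for each life, we will now count it just once. This gives a new estimator which is denoted by $\hat{q}_x^{1}$ (q-one), where

$$\hat{q}_x^{1} = \frac{\sum_{i=1}^{n} (d_i / e_i) K\left(\frac{x - x_i}{h}\right)}{\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)} \qquad (19)$$

$$\phantom{\hat{q}_x^{1}} = \frac{\sum_{i=1}^{n} \mathring{q}_i K\left(\frac{x - x_i}{h}\right)}{\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)}. \qquad (20)$$

Note that, as with $\hat{q}_x^{2}$, it is assumed that the kernels and bandwidths in the numerator and denominator are the same and the estimator lies in the range [0, 1] if the kernel is non-negative.

It is clear from this derivation that some information is lost in going from $\hat{q}_x^{2}$ to $\hat{q}_x^{1}$. The $\hat{q}_x^{1}$ estimator needs only the ratio of the number of deaths to the amount of exposure at each age and ignores the actual numbers for exposure and death. For example, if the crude rate was 0.001, say, this could have arisen from 1 death out of an exposure of 1,000 life-years or 100 deaths out of an exposure of 100,000 life-years but $\hat{q}_x^{1}$ would treat both cases as identical. Since $\hat{q}_x^{2}$ makes explicit use of the number of deaths and the amount of exposure at each age, we might expect it to be the better estimator. However, with any estimator there is an implicit assumption that there is sufficient information to perform the graduation. In other words, we have sufficient confidence in the crude rates to allow either estimator to be used.

An alternative approach to $\hat{q}_x^{1}$ is to repeat the derivation of $\hat{q}_x^{2}$ but instead of having $e_i$ lives and $d_i$ deaths at each age $x_i$, we have one life and $\mathring{q}_i$ deaths at age $x_i$, for $i = 1, 2, \ldots, n$.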
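The difference between the two estimators can be illustrated with a short Python sketch. It assumes the forms of $\hat{q}_x^{2}$ in (16) and $\hat{q}_x^{1}$ in (20) with a normal kernel; the function names and the exposure and death counts are hypothetical, chosen only to show that rescaling the data at one age changes q-two but leaves q-one unchanged.

```python
import math


def normal_kernel(x):
    """Normal kernel of equation (7)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def q_two(x, ages, deaths, exposures, h, kernel=normal_kernel):
    """q-two, as in (16): kernel-weighted deaths over kernel-weighted exposure."""
    weights = [kernel((x - xi) / h) for xi in ages]
    numerator = sum(w * d for w, d in zip(weights, deaths))
    denominator = sum(w * e for w, e in zip(weights, exposures))
    return numerator / denominator


def q_one(x, ages, deaths, exposures, h, kernel=normal_kernel):
    """q-one, as in (20): kernel-weighted average of the crude rates d_i / e_i."""
    weights = [kernel((x - xi) / h) for xi in ages]
    crude = [d / e for d, e in zip(deaths, exposures)]
    return sum(w * c for w, c in zip(weights, crude)) / sum(weights)


# Hypothetical data, invented for illustration only.
ages = [60, 61, 62, 63, 64]
exposures = [1000.0, 1200.0, 900.0, 1100.0, 1000.0]
deaths = [10.0, 15.0, 9.0, 16.0, 17.0]

# Same crude rate at age 62, but based on 100 times as much data.
big_exposures = exposures[:2] + [exposures[2] * 100] + exposures[3:]
big_deaths = deaths[:2] + [deaths[2] * 100] + deaths[3:]

print("q-one:", q_one(62, ages, deaths, exposures, 1.0),
      q_one(62, ages, big_deaths, big_exposures, 1.0))
print("q-two:", q_two(62, ages, deaths, exposures, 1.0),
      q_two(62, ages, big_deaths, big_exposures, 1.0))
```

In this sketch q-two is pulled towards the crude rate at age 62 once that age carries more exposure, while q-one is unaffected.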
3. Generalised moving weighted averages

Moving weighted averages can be subdivided into two classes, namely summation and adjusted average formulae.
3.1. Summation formulae

Summation formulae are a particular type of graduation formed by taking an unweighted average of consecutive crude mortality rates. For example, with an average of five consecutive crude rates the graduated rate, $\hat{q}_x$, is

$$\hat{q}_x = (\mathring{q}_{x-2} + \mathring{q}_{x-1} + \mathring{q}_x + \mathring{q}_{x+1} + \mathring{q}_{x+2})/5. \qquad (21)$$

A shorthand notation can be used here by defining the operator [v] (summation-v) to be

$$[v]\mathring{q}_x = \mathring{q}_{x-(v-1)/2} + \mathring{q}_{x-(v-3)/2} + \cdots + \mathring{q}_{x+(v-3)/2} + \mathring{q}_{x+(v-1)/2}, \qquad (22)$$

so

$$\frac{[5]\mathring{q}_x}{5} = \frac{\mathring{q}_{x-2} + \mathring{q}_{x-1} + \mathring{q}_x + \mathring{q}_{x+1} + \mathring{q}_{x+2}}{5} \qquad (23)$$

is a shorthand for a moving weighted average of five. In general we have $\hat{q}_x = [v]\mathring{q}_x / v$ for a moving weighted average of v. The number of crude rates, v, used to calculate one graduated rate is called the range of the summation formula or of the MWA. In the context of graduation, one of the most popular and successful moving weighted average formulae is Spencer's 21-term formula, which is

$$\hat{q}_x = \frac{[5][5][7]}{350} \{[1] + [3] + [5] - [7]\}\,\mathring{q}_x. \qquad (24)$$
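The summation operators can be treated as lists of coefficients, so that applying one operator after another amounts to convolving the lists. The sketch below expands Spencer's 21-term formula (24) into its 21 individual weights in this way. The helper names (`convolve`, `summation`, `centred_sum`, `moving_weighted_average`) are illustrative assumptions, not notation from the paper.

```python
def convolve(a, b):
    """Coefficients obtained by applying operator a and then operator b."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def summation(v):
    """Coefficients of the summation operator [v] in (22): v ones."""
    return [1] * v


def centred_sum(terms, width):
    """Add signed summation operators, aligned on a common centre."""
    total = [0] * width
    for sign, v in terms:
        offset = (width - v) // 2
        for k in range(v):
            total[offset + k] += sign
    return total


# {[1] + [3] + [5] - [7]} written as a single 7-term operator.
bracket = centred_sum([(1, 1), (1, 3), (1, 5), (-1, 7)], 7)

# [5][5][7]{...}: integer coefficients, to be divided by 350 as in (24).
spencer_integer = convolve(
    convolve(convolve(summation(5), summation(5)), summation(7)), bracket
)
spencer_weights = [c / 350 for c in spencer_integer]


def moving_weighted_average(crude, weights):
    """Apply a symmetric MWA; values are returned only where the full range fits."""
    m = len(weights) // 2
    return [
        sum(w * crude[i + s] for s, w in zip(range(-m, m + 1), weights))
        for i in range(m, len(crude) - m)
    ]


print(spencer_integer)
```

The list printed at the end holds the integer numerators of the weights; ten values are lost at each end of any table graduated with `moving_weighted_average`, which is the limitation the kernel approach addresses.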
3.2. Adjusted average formulae

Summation formulae can be generalised to adjusted average formulae, which are a weighted average of consecutive crude mortality rates. They take the form

$$\hat{q}_x = \sum_{s=-m}^{m} a_s \mathring{q}_{x+s}, \quad \text{where} \quad \sum_{s=-m}^{m} a_s = 1. \qquad (25)$$

In this case, the range of the moving weighted average is 2m + 1. Any summation formula can be expressed as an adjusted average formula but not conversely. The most successful formulae have values for $a_s$ which are symmetric, so $a_s = a_{-s}$. When considering the optimality of the co-efficients, $a_s$, it is useful to express the crude rate, which is a random variable, as

$$\mathring{q}_x = q_x + r_x, \qquad (26)$$

where $q_x$ is the true rate and $r_x$ is the residual error. From (25) and (26), we have

$$\hat{q}_x = \sum_{s=-m}^{m} a_s q_{x+s} + \sum_{s=-m}^{m} a_s r_{x+s}. \qquad (27)$$

The usual approach is to minimise a function of

$$\sum_{s=-m}^{m} a_s r_{x+s} \qquad (28)$$

and one method for doing this is outlined in Section 4.1.
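3.3. A generalised MWA

It can be seen that the kernel estimator $\hat{q}_x^{1}$, defined in (20), is a generalisation of moving weighted averages. The definition of the bandwidth is subsumed within that of the kernel. Thus, when the graduation is performed using the kernels defined in Section 3.4 the kernel estimates, using $\hat{q}_x^{1}$, are

$$\hat{q}_x^{1} = \left[\sum_{i=1}^{n} \mathring{q}_i K(x - x_i)\right] \Big/ \left[\sum_{i=1}^{n} K(x - x_i)\right]. \qquad (29)$$

If we define the kernel to be

$$K(x) = \begin{cases} a_s & \text{for } x \in [s - \tfrac{1}{2}, s + \tfrac{1}{2}), \text{ where } s = -m, -m+1, \ldots, m, \\ 0 & \text{for } x < -(m + \tfrac{1}{2}) \text{ and } x \ge m + \tfrac{1}{2}, \end{cases} \qquad (30)$$

then K(x) is a kernel which gives an estimator equivalent to the MWA. Note that $\sum_{s=-m}^{m} a_s = 1$ so that

$$\int_{-\infty}^{+\infty} K(x)\,dx = 1. \qquad (31)$$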
MWA does not produce estimates at the ends of the table and for the purposes of showing that it can be considered as a kernel estimator, it is not necessary to go into further details. However, it can be seen that the kernel estimator provides a mechanism for producing graduated rates, even at the ends of the table. In addition, the kernel estimator gives estimates of the mortality rate at any point between two of the observed rates. In order to illustrate the principle behind the $\hat{q}_x^{1}$ estimator, we construct a kernel which produces the same estimates as a given MWA at exact ages and which also produces estimates at the ends of the table or at intermediate ages.

3.4. Example 1: A simple example of the q-one estimator

Suppose we are given the crude values {3, 5, 6, 4, 7, 7, 9} corresponding to ages {20, 21, 22, 23, 24, 25, 26} respectively. This example uses integer values for the crude rates for convenience. Firstly, consider graduating these values using a moving weighted average with a range of 5 and weights $a_s = 1/5$ for s = -2, -1, 0, 1, 2. This produces graduated rates of {5, 5 4/5, 6 3/5} for ages {22, 23, 24} respectively. Now consider using a continuous kernel function $K_M$ (M for moving weighted average), defined by

$$K_M(x) = \begin{cases} \tfrac{1}{5} & \text{if } |x| < 2\tfrac{1}{2}, \\ 0 & \text{otherwise.} \end{cases} \qquad (32)$$

Note that $\int_{-\infty}^{\infty} K_M(x)\,dx = 1$, which explains the choice of 1/5 in $K_M(x)$. Note also that in the centre of the table $\hat{q}_x^{1}$ will involve the sum of 5 terms, which will produce a MWA with a range of 5. For example, the graduated rate at age 22, using (20), is

$$\frac{\tfrac{1}{5} \cdot 3 + \tfrac{1}{5} \cdot 5 + \tfrac{1}{5} \cdot 6 + \tfrac{1}{5} \cdot 4 + \tfrac{1}{5} \cdot 7}{\tfrac{1}{5} + \tfrac{1}{5} + \tfrac{1}{5} + \tfrac{1}{5} + \tfrac{1}{5}} = \tfrac{1}{5} \cdot 3 + \tfrac{1}{5} \cdot 5 + \tfrac{1}{5} \cdot 6 + \tfrac{1}{5} \cdot 4 + \tfrac{1}{5} \cdot 7 = 5. \qquad (33)$$

The complete graduation is shown in Table 1. Notice that the kernel estimator also produces values at the end of the table. In fact, it can be extended to values beyond the ends of the table, as shown in Table 2. Ages below 18 and above 28 produce graduated values of 0. Also, graduated values can be calculated for non-integral ages. Note that $K_M(x)$ is not the only kernel which reproduces the figures in Tables 1 and 2. We could also use (34) to produce exactly identical results at integral ages. For non-integral ages $K_M(x)$ is probably superior because it has more terms.

Table 1. A simple moving weighted average of five terms using the q-one estimator.

Age   Calculation                                                                Graduated value
20    ((1/5)3 + (1/5)5 + (1/5)6) / (1/5 + 1/5 + 1/5)                             4 2/3
21    ((1/5)3 + (1/5)5 + (1/5)6 + (1/5)4) / (1/5 + 1/5 + 1/5 + 1/5)              4 1/2
22    ((1/5)3 + (1/5)5 + (1/5)6 + (1/5)4 + (1/5)7) / (1/5 + 1/5 + 1/5 + 1/5 + 1/5)   5
23    ((1/5)5 + (1/5)6 + (1/5)4 + (1/5)7 + (1/5)7) / (1/5 + 1/5 + 1/5 + 1/5 + 1/5)   5 4/5
24    ((1/5)6 + (1/5)4 + (1/5)7 + (1/5)7 + (1/5)9) / (1/5 + 1/5 + 1/5 + 1/5 + 1/5)   6 3/5
25    ((1/5)4 + (1/5)7 + (1/5)7 + (1/5)9) / (1/5 + 1/5 + 1/5 + 1/5)              6 3/4
26    ((1/5)7 + (1/5)7 + (1/5)9) / (1/5 + 1/5 + 1/5)                             7 2/3

Of course, those kernels that correspond directly to MWA are only a very limited part of the complete range of kernels that could be used. Also there is no need to restrict our attention to these or to consider only $\hat{q}_x^{1}$. Having shown that MWA corresponds to certain forms of kernel estimation, we next consider which kernel is best to use in practice. Note that we could define the kernel in (32) as

$$K_M(x) = \begin{cases} 1 & \text{if } |x| \le \tfrac{1}{2}, \\ 0 & \text{otherwise,} \end{cases} \qquad (35)$$

and vary the range by choosing an appropriate bandwidth, h. In some ways, this latter approach seems more satisfactory but we have used the different approach given above as it seems easier to follow. Thus in this simple example, we have seen a kernel estimator, $\hat{q}_x^{1}$, being used to reproduce a simple moving weighted average. In the following sections, it is unlikely that this 'naive' kernel would be practical.
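The arithmetic of Example 1 can be checked with a small Python sketch. It assumes the kernel $K_M$ of (32) and the q-one estimator in the form (29), and uses exact fractions so that the results can be compared directly with Tables 1 and 2. The function names are illustrative only.

```python
from fractions import Fraction


def k_m(x):
    """Kernel K_M of equation (32): 1/5 on |x| < 2 1/2, zero elsewhere."""
    return Fraction(1, 5) if abs(x) < Fraction(5, 2) else Fraction(0)


def q_one(x, ages, crude, kernel=k_m):
    """q-one in the form (29); None when no crude age receives any weight."""
    weights = [kernel(x - xi) for xi in ages]
    total = sum(weights)
    if total == 0:
        # No crude age lies within the support of the kernel,
        # so the denominator of (29) is zero.
        return None
    return sum(w * c for w, c in zip(weights, crude)) / total


ages = [20, 21, 22, 23, 24, 25, 26]
crude = [3, 5, 6, 4, 7, 7, 9]

for age in range(18, 29):
    print(age, q_one(age, ages, crude))

# A non-integral age is handled in exactly the same way.
print(Fraction(45, 2), q_one(Fraction(45, 2), ages, crude))
```

The same function covers the centre of the table, both ends, ages just beyond the ends and intermediate ages, which is the point of the example.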
Table 2. Off-end-of-table estimates for a simple moving weighted average of five terms using the q-one estimator.

Age   Calculation                            Graduated value
18    ((1/5)3) / (1/5)                       3
19    ((1/5)3 + (1/5)5) / (1/5 + 1/5)        4
27    ((1/5)7 + (1/5)9) / (1/5 + 1/5)        8
28    ((1/5)9) / (1/5)                       9

4. Designing a kernel

4.1. The discrete case

An essential feature of any graduation is that the graduated rates should be smooth, in some sense. With moving weighted averages, one approach is to choose weights that give the smoothest graduations, all other things being equal. Some work exists in the literature on designing such an optimal smoothing MWA. This work usually assumes that the following constraints are applied to the co-efficients of the MWA, namely

$$\sum_{s=-m}^{m} a_s = 1 \qquad (36)$$

and

$$\sum_{s=-m}^{m} s^2 a_s = 0, \qquad (37)$$
so that the true rates in the interval covered by the range of the moving weighted average formula follow a simple pattern such as a cubic, in this case. Then it is easy to show that
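$$\sum_{s=-m}^{m} a_s q_{x+s} = q_x. \qquad (38)$$

From (25) and (26), we have

$$V(\hat{q}_x) = V(q_x) + \sum_{s=-m}^{m} a_s^2 V(r_{x+s}) = 0 + \sum_{s=-m}^{m} a_s^2 V(r_{x+s}). \qquad (39)$$

If we now assume that the $r_x$ are uncorrelated and homoscedastic, with variance $\sigma^2$, it follows that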
$$V(\hat{q}_x) = \sigma^2 \sum_{s=-m}^{m} a_s^2. \qquad (40)$$

The assumption that the residual errors, $r_x$, are uncorrelated is questionable but note that we are not assuming that they are independent. It may be possible to reduce the correlation between policies by adjusting the given exposures to allow for the presence of duplicate policies. Investigations into the distribution of duplicate policies were carried out by the C.M.I. (1957, 1986). C.M.I. (1957) suggests that the exposure and numbers of deaths at each age should be divided by a 'variance ratio', varying with age, which will have the effect of reducing the exposure and the number of deaths, to reflect the removal of duplicate policies. The 'variance ratios' were derived from a single investigation. Renshaw (1992) has provided an alternative approach, which effectively calculates 'variance ratios' based on the actual crude data being graduated.

The second assumption is that the residual errors, $r_x$, are equivariant. In practice, the variance of the residual errors is more likely to move inversely with the exposure. So the assumption of equivariant residual errors is less likely to be valid than the assumption of uncorrelated residual errors. London (1981) suggests one way around this problem by assuming that $V(r_{x+s})/V(r_x)$ is proportional to $e_x/e_{x+s}$. In a recent paper, Ramsay (1993) adopts a similar but more general approach to allow for correlation and unequal variances among the crude rates. In this paper we will derive results assuming independence and homoscedasticity, leaving these adjustments for future work.

To justify the process of graduation, we would expect the graduated rates, $\hat{q}_x$, to be smoother than the corresponding crude rates, $\mathring{q}_x$. Thus we would expect $R_0$ to be less than one, where

$$R_0 = V(\hat{q}_x)/V(\mathring{q}_x). \qquad (41)$$

From (26)

$$V(\mathring{q}_x) = V(q_x) + V(r_x) = \sigma^2 \qquad (42)$$

and so we get

$$R_0 = \sum_{s=-m}^{m} a_s^2. \qquad (43)$$
We could minimise $R_0$, subject to the constraints (36) and (37). However, this has been found to result in poor graduations. Better results have been obtained by generalising (41) to

$$R_z = V(\Delta^z \hat{q}_x)/V(\Delta^z \mathring{q}_x), \qquad (44)$$

where the operator $\Delta^n$ is defined as

$$\Delta^n \mathring{q}_x = \sum_{i=0}^{n} \binom{n}{i} (-1)^{n-i} \mathring{q}_{x+i}, \qquad (45)$$

and then minimising $R_z$ for some value of z. The variable z is usually chosen to be 2, 3 or 4. Straightforward, but awkward calculations yield weights that satisfy

$$a_s = [(m+1)^2 - s^2][(m+2)^2 - s^2] \cdots [(m+z)^2 - s^2][l + p(m^2 - s^2)], \qquad (46)$$

where $s = -m, -m+1, \ldots, m$ and 2m + 1 is the range of the moving weighted average. The variables l and p can be determined from the constraint equations, (36) and (37). Practical experience suggests that z = 2, 3 or 4 produce acceptable graduations. For example, if z = 3, the weights turn out to be

$$a_s = \frac{315[(m+1)^2 - s^2][(m+2)^2 - s^2][(m+3)^2 - s^2][3m^2 + 12m - 4 - 11s^2]}{8(m+2)[(m+2)^2 - 1][4(m+2)^2 - 1][4(m+2)^2 - 9][4(m+2)^2 - 25]}, \qquad (47)$$

for $s = -m, -m+1, \ldots, m$. Further details of the derivation of these weights can be found in London (1985, p. 45) and in Benjamin and Pollard (1980, Section 13.16). The latter refer to these weights as the optimal smoothing co-efficients.

4.2. The continuous case

The kernel equivalent of the optimal smoothing MWA weights is of the form

$$K(x) = \begin{cases} [h^2 - x^2][(h+1)^2 - x^2] \cdots [(h+z)^2 - x^2][l + p((h-1)^2 - x^2)] & \text{if } |x| \le h, \\ 0 & \text{otherwise,} \end{cases} \qquad (48)$$

where z + 1 is the degree of differencing applied to the graduated rates, h is the bandwidth, l and p are unknown variables and K is subject to the following constraint equations:

$$\int_{-h}^{h} K(x)\,dx = 1 \qquad (49)$$

and

$$\int_{-h}^{h} x^2 K(x)\,dx = 0. \qquad (50)$$
This kernel will be called the optimal smoothing kernel, and denoted by $K_O$. There are some interesting points to note about this kernel. Firstly, it would be undesirable to choose z = 0 since the kernel is then discontinuous at $\pm h$. The estimates based on this kernel would then have undesirable discontinuities, so we would not choose to minimise $R_0$. Reassuringly Silverman (1986) reaches the same conclusion in Section (3.62), but does not go on to consider minimising $R_z$, where $z \ge 1$. Secondly, the kernel is unimodal and symmetric but can take negative values. Because this kernel can take negative values, the graduated rates may not always lie in the range [0, 1]. Values outside this range are clearly unacceptable and will be ignored. We now consider taking third differences, that is we will choose a kernel that will minimise $R_3$ where

$$R_3 = V(\Delta^3 \hat{q}_x)/V(\Delta^3 \mathring{q}_x). \qquad (51)$$
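For comparison with the continuous case, the discrete weights in (47) and the ratio $R_z$ in (44) can be computed directly. The Python sketch below assumes uncorrelated residuals with equal variance, so that $R_z$ reduces to the sum of squared z-th differences of the weights divided by $\binom{2z}{z}$. The function names are hypothetical, and Spencer's weights are written out as the integer numerators over 350 implied by (24).

```python
from math import comb


def optimal_weights(m):
    """Weights a_s of equation (47), the z = 3 case, for s = -m, ..., m."""
    k = m + 2
    denominator = (8 * k * (k ** 2 - 1) * (4 * k ** 2 - 1)
                   * (4 * k ** 2 - 9) * (4 * k ** 2 - 25))
    weights = []
    for s in range(-m, m + 1):
        numerator = (315 * ((m + 1) ** 2 - s ** 2) * ((m + 2) ** 2 - s ** 2)
                     * ((m + 3) ** 2 - s ** 2)
                     * (3 * m ** 2 + 12 * m - 4 - 11 * s ** 2))
        weights.append(numerator / denominator)
    return weights


def difference(coeffs, z):
    """Coefficients of the z-th difference of a MWA with the given weights."""
    for _ in range(z):
        padded = [0.0] + list(coeffs) + [0.0]
        coeffs = [padded[i + 1] - padded[i] for i in range(len(padded) - 1)]
    return coeffs


def smoothing_ratio(weights, z):
    """R_z of (44), assuming uncorrelated residuals with equal variance."""
    return sum(c * c for c in difference(weights, z)) / comb(2 * z, z)


# Spencer's 21-term weights: integer numerators divided by 350.
half = [60, 57, 47, 33, 18, 6, -2, -5, -5, -3, -1]
spencer = [c / 350 for c in half[:0:-1] + half]

for m in (9, 10):
    a = optimal_weights(m)
    print(f"m = {m}: sum = {sum(a):.6f}, R_3 = {smoothing_ratio(a, 3):.6f}")
print(f"Spencer: R_3 = {smoothing_ratio(spencer, 3):.6f}")
```

Printing $R_3$ for the 19-term and 21-term optimal weights alongside Spencer's formula gives a numerical feel for how close the three sets of weights are.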
Fig. 1. The optimal smoothing kernel with three degrees of differencing. (Plot of K(x) against x value, for x from -30 to 30.)
The bandwidth, h, is chosen to be m + 1 so that the kernel tends to zero in a continuous fashion, as $x \to \pm h$. Hence, if we define $K_O(x)$ to be zero outside the interval [-h, +h] then the kernel is continuous. So the equation for $K_O(x)$ becomes

$$K_O(x) = \begin{cases} [h^2 - x^2][(h+1)^2 - x^2][(h+2)^2 - x^2][l + p((h-1)^2 - x^2)] & \text{if } |x| \le h, \\ 0 & \text{otherwise.} \end{cases} \qquad (52)$$

Substituting (52) into (49) and (50) and simplifying produces two linear equations in two unknown variables, l and p. Solving for l and p in terms of h and substituting the values back into (52) is again straightforward but tedious. It results in the following kernel:

$$K_O(x) = \begin{cases} 315[h^2 - x^2][(h+1)^2 - x^2][(h+2)^2 - x^2][Ah^2 - 11Bx^2]/(64h^5 C) & \text{if } -h \le x \le h, \\ 0 & \text{otherwise,} \end{cases} \qquad (53)$$

where A, B and C are defined as follows:

$$A = 6h^4 + 66h^3 + 253h^2 + 297h + 99, \qquad (54)$$

$$B = 2h^4 + 18h^3 + 57h^2 + 63h + 21, \qquad (55)$$

$$C = 16h^8 + 288h^7 + 2{,}172h^6 + 8{,}640h^5 + 19{,}677h^4 + 26{,}334h^3 + 20{,}328h^2 + 8{,}316h + 1{,}386. \qquad (56)$$
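A quick numerical check of (53) against the constraints (49) and (50) can be sketched in Python as follows. The function names and the midpoint-rule `integrate` helper are assumptions made for this illustration; any quadrature routine would serve equally well.

```python
def optimal_kernel(x, h):
    """Optimal smoothing kernel K_O of equation (53); zero outside [-h, h]."""
    if abs(x) > h:
        return 0.0
    a = 6 * h ** 4 + 66 * h ** 3 + 253 * h ** 2 + 297 * h + 99
    b = 2 * h ** 4 + 18 * h ** 3 + 57 * h ** 2 + 63 * h + 21
    c = (16 * h ** 8 + 288 * h ** 7 + 2172 * h ** 6 + 8640 * h ** 5
         + 19677 * h ** 4 + 26334 * h ** 3 + 20328 * h ** 2 + 8316 * h + 1386)
    product = (h ** 2 - x ** 2) * ((h + 1) ** 2 - x ** 2) * ((h + 2) ** 2 - x ** 2)
    return 315.0 * product * (a * h ** 2 - 11.0 * b * x ** 2) / (64.0 * h ** 5 * c)


def integrate(f, lower, upper, steps=20000):
    """Midpoint-rule approximation to the integral of f over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(f(lower + (k + 0.5) * width) for k in range(steps))


for h in (1.0, 10.0):
    total = integrate(lambda x: optimal_kernel(x, h), -h, h)
    second_moment = integrate(lambda x: x * x * optimal_kernel(x, h), -h, h)
    print(f"h = {h}: integral = {total:.6f}, second moment = {second_moment:.6f}")
```

The first printed figure corresponds to constraint (49) and the second to constraint (50).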
Attempts to factorise these polynomials were unsuccessful. It should be noted that for the purposes of graduation, we do not need to retain the whole of the kernel. Since the kernel appears in both the numerator and denominator of $\hat{q}_x^{1}$, in (20), some cancellation takes place. Effectively, we can ignore the $64h^5 C$. The kernel defined in (53) is illustrated in Figure 1 for various bandwidths.

4.3. Example 2: An optimal smoothing kernel and Spencer's 21-term formula

It was noted by Benjamin and Pollard (1980, Section 13.75), that the optimal smoothing co-efficients with a range of 19 produce very similar results to Spencer's 21-term formula, defined in (24). We would expect to obtain similar results using $\hat{q}_x^{1}$ and an optimal smoothing kernel. A range of 19 equates to m = 9 and h = 10 in (47) and (53) respectively. We now apply the optimal smoothing kernel to the data in Benjamin and Pollard (1980, Table 13.2). As was mentioned in Section 3, MWA cannot produce graduated rates at the ends of the tables but this is not a problem for the kernel method. Table 3 shows the crude mortality rates, the graduated rates using Spencer's 21-term formula and $\hat{q}_x^{1}$ with the optimal smoothing kernel. It can be seen that the graduated rates are in very close agreement. Thus using an optimal smoothing kernel with h = 10 corresponds approximately to Spencer's 21-term formula.

Table 3. Spencer's 21-term formula and the optimal smoothing kernel estimates, using data from Benjamin and Pollard (1980, Table 13.2).

Age   Crude rate   Spencer's 21-term   q-one rate
35    0.00095      -                   0.00302
36    0.00638      -                   0.00304
37    0.00191      -                   0.00313
38    0.00419      -                   0.00335
39    0.00278      -                   0.00373
40    0.00381      -                   0.00430
41    0.00384      -                   0.00505
42    0.00665      -                   0.00592
43    0.00477      -                   0.00680
44    0.01015      -                   0.00758
45    0.01093      0.00817             0.00818
46    0.00860      0.00859             0.00860
47    0.00779      0.00890             0.00890
48    0.00862      0.00919             0.00921
49    0.00951      0.00959             0.00963
50    0.00866      0.01021             0.01021
51    0.01109      0.01109             0.01106
52    0.01291      0.01220             0.01218
53    0.01274      0.01347             0.01346
54    0.01579      0.01490             0.01488
55    0.01873      0.01638             0.01641
56    0.01482      0.01790             0.01795
57    0.02002      0.01939             0.01944
58    0.02011      0.02098             0.02095
59    0.02827      0.02268             0.02266
60    0.01722      0.02457             0.02456
61    0.02942      -                   0.02667
62    0.03374      -                   0.02925
63    0.02735      -                   0.03238
64    0.02731      -                   0.03585
65    0.04323      -                   0.03935
66    0.04926      -                   0.04267
67    0.04114      -                   0.04574
68    0.04099      -                   0.04851
69    0.05835      -                   0.05105
70    0.06103      -                   0.05347

4.4. Example 3: English Life Tables No. 14

In this section the $\hat{q}_x^{1}$ estimator, defined in (20), along with the optimal smoothing kernel, defined in (53), is used to graduate the English Life Tables Number 14 (1987), both males and females. For the kernel graduation, a bandwidth of 10 has been chosen as this corresponds approximately to Spencer's 21-term formula, as was shown in Example 2 in Section 4.3. Figure 2 (see footnote 1) shows the crude, published and graduated rates for ages 5-99, for both males and females. The graphs are based on initial rates.

The $\hat{q}_x^{1}$ estimator tends to underestimate the true mortality curve at high ages, where the true curve is concave and to overestimate at young ages, where the true curve is convex. Copas and Haberman (1983) reached the same conclusion for the $\hat{q}_x^{2}$ estimator. A possible reason for this is that our data does not cover the entire age range but the kernel density estimates are attempting to do so. This lack of information is most noticeable when estimating the density of the number of deaths at high ages. One way of addressing this problem is to transform the data, by making it linear, before graduating. As mortality data is roughly exponential, an obvious choice is to graduate the log of the crude rates before reversing the transformation to produce the final graduated rates. But this approach may mean using a different transformation for other types of data and difficulties arise with ages for which there are no deaths.
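The kernel graduation used in Sections 4.3 and 4.4, and the log transformation just described, can be sketched in Python. Since the English Life Tables data are not reproduced in this paper, the sketch uses the crude rates from Table 3 as a stand-in. It assumes the kernel (53) with h = 10 and the q-one estimator in the form (29); the function names are hypothetical, and the log variant is only an illustration of the idea, not the authors' implementation.

```python
import math

AGES = list(range(35, 71))
CRUDE = [
    0.00095, 0.00638, 0.00191, 0.00419, 0.00278, 0.00381, 0.00384, 0.00665,
    0.00477, 0.01015, 0.01093, 0.00860, 0.00779, 0.00862, 0.00951, 0.00866,
    0.01109, 0.01291, 0.01274, 0.01579, 0.01873, 0.01482, 0.02002, 0.02011,
    0.02827, 0.01722, 0.02942, 0.03374, 0.02735, 0.02731, 0.04323, 0.04926,
    0.04114, 0.04099, 0.05835, 0.06103,
]


def optimal_kernel(x, h):
    """Optimal smoothing kernel of equation (53); zero outside [-h, h]."""
    if abs(x) > h:
        return 0.0
    a = 6 * h ** 4 + 66 * h ** 3 + 253 * h ** 2 + 297 * h + 99
    b = 2 * h ** 4 + 18 * h ** 3 + 57 * h ** 2 + 63 * h + 21
    c = (16 * h ** 8 + 288 * h ** 7 + 2172 * h ** 6 + 8640 * h ** 5
         + 19677 * h ** 4 + 26334 * h ** 3 + 20328 * h ** 2 + 8316 * h + 1386)
    product = (h ** 2 - x ** 2) * ((h + 1) ** 2 - x ** 2) * ((h + 2) ** 2 - x ** 2)
    return 315.0 * product * (a * h ** 2 - 11.0 * b * x ** 2) / (64.0 * h ** 5 * c)


def q_one(x, ages, values, h=10.0):
    """q-one in the form (29), with the bandwidth held inside the kernel."""
    weights = [optimal_kernel(x - xi, h) for xi in ages]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def q_one_log(x, ages, crude, h=10.0):
    """Graduate log crude rates, then reverse the transformation.

    Requires every crude rate to be positive (no ages without deaths).
    """
    logs = [math.log(q) for q in crude]
    return math.exp(q_one(x, ages, logs, h))


for age in (35, 45, 60, 70):
    print(age, round(q_one(age, AGES, CRUDE), 5), round(q_one_log(age, AGES, CRUDE), 5))
```

The direct estimates can be compared with the q-one column of Table 3. The log variant fails as soon as a crude rate is zero, which is the difficulty noted above for ages with no deaths.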
In any case, results using the $\hat{q}_x^{1}$ estimator did not show a significant improvement. Hoem and Linnemann (1988) provide a rigorous mathematical approach for dealing with the ends of the table, in the context of moving weighted averages, and it seems plausible that their theory could be incorporated within kernel graduation. Likewise, Silverman (1986) provides some alternative approaches for dealing with bounded domains.

Footnote 1: Small variations in these computer drawn curves are due to rounding errors and to the particular interpolation procedure used in the graphics computer software rather than to the graduation methods themselves.

Fig. 2. English Life Tables Number 14; males and females: Ages 5 to 99. (Plot of crude rates, published rates and graduated rates against age, with the rates on a logarithmic scale.)
Fig. 3. English Life Tables Number 14; males and females: Ages 5 to 30. (Plot of crude rates, published rates and graduated rates against age.)
Figure 3 shows the same graph but with the ages restricted to 5-30. It shows that the optimal smoothing kernel, with a bandwidth of 10, produces a slightly smoother graduation than the published graduation, over this age range. However, we are not suggesting that this particular choice of estimator, kernel and bandwidth is the best possible combination. The main result is that by approaching the problem using kernel estimation instead of moving weighted averages, we have produced graduated values at ages 5-15 and 89-99. This is not possible with Spencer’s 21-term formula. Secondly, graduated values can be produced at non-integral ages in the age range 5-99. Thirdly, the graduated rates can be extrapolated below age 5 and above age 99. However, the figures are not very meaningful for a truncated kernel such as the optimal smoothing kernel. Finally the optimal smoothing kernel gives a theoretical basis for selecting a superior kernel, all other things being equal.
5. Conclusions
This paper has explored the link between moving weighted average graduation and kernel estimation and has introduced a new kernel estimator for the rate of mortality, called $\hat{q}_x^{1}$. It has been shown that $\hat{q}_x^{1}$ is a generalisation of moving weighted averages. A kernel was defined which minimises the variance of the third differences of the graduated rates relative to those of the crude rates. This kernel is the continuous version of what Benjamin and Pollard (1980) called the optimal smoothing co-efficients, which are used in the context of MWA. It is demonstrated to be approximately equivalent to Spencer's 21-term formula, if a bandwidth of 10 is used. Kernels provide a mechanism for obtaining graduated rates at the ends of the table and also give graduated rates at all ages in between, not just those for which data is available. In the final section of the paper, the English Life Tables Number 14 were graduated using this estimator, kernel and bandwidth. In conclusion, it is the authors' belief that this class of estimators can be very useful for graduations and that the kernel approach is the best available approach to moving weighted average graduation.
Acknowledgement

During our work with this project, we have benefited from discussions with Professor Bernard Silverman.
References

Benjamin, B. and J.H. Pollard (1980). The Analysis of Mortality and Other Actuarial Statistics. Heinemann, London.
Bloomfield, D.S.F. and S. Haberman (1987). Graduation: Some experiments with kernel methods. Journal of the Institute of Actuaries 114, 339-369.
Continuous Mortality Investigation Committee (1986). An investigation into the distribution policies per life assured in the cause of death investigation data. Continuous Mortality Investigation Reports, Vol. 8.
Copas, J.B. and S. Haberman (1983). Non-parametric graduation using kernel methods. Journal of the Institute of Actuaries 110, 135-156.
Hoem, J.M. and P. Linnemann (1988). The tails in moving average graduation. Scandinavian Actuarial Journal, 193-229.
Joint Mortality Investigation Committee (1957). Continuous investigation into the mortality of assured lives: Memorandum on a special inquiry into the distribution of duplicate policies. Journal of the Institute of Actuaries 83, 34-36.
London, R.L. (1981). In defence of minimum-$R_0$ linear compound graduation and a simple modification for its improvement. Actuarial Research Clearing House, 1981.2, 75-78.
London, D. (1985). Graduation - The Revision of Estimates. ACTEX Publications, Winsted and Abington, CT.
Office of Population Censuses and Surveys (1987). English Life Tables Number 14, Series DS Number 7, 1-22. Her Majesty's Stationery Office, London.
Ramlau-Hansen, H. (1983). The choice of a kernel function in the graduation of counting process intensities. Scandinavian Actuarial Journal, 165-182.
Ramsay, C.M. (1993). Minimum variance moving-weighted-average graduation. Transactions of the Society of Actuaries XLIII, forthcoming.
Renshaw, A.E. (1992). Joint modelling for actuarial graduation and duplicate policies. Journal of the Institute of Actuaries 119, 69-85.
Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.