Computers Math. Applic. Vol. 25, No. 7, pp. 89-107, 1997
Copyright © 1997 Elsevier Science Ltd
Printed in Great Britain. All rights reserved
0898-1221/97 $17.00 + 0.00
PII: S0895-7177(97)00051-4
Pergamon
Analyses of Cohort Mortality Incorporating Observed and Unobserved Risk Factors

K. G. MANTON*
Center for Demographic Studies, Duke University
2117 Campus Drive, Box 90408, Durham, NC 27706-0408, U.S.A.

G. LOWRIMORE
Center for Demographic Studies, Duke University
2117 Campus Drive, Durham, NC 27706, U.S.A.

A. YASHIN
Odense University, Institute of Community Health
J.B. Winslows Vej 17, DK-5000 Odense, Denmark

H. D. TOLLEY
Department of Statistics, Brigham Young University
Rm. 226 TMCB, Provo, UT 84602, U.S.A.

(Received May 1996; accepted July 1996)
Abstract: Interventions to prevent disease and increase life expectancy are most effectively developed from data on pathways to disease and death. Unfortunately, most national data sets separate end-state information (i.e., cause-specific mortality) from pathway data describing how specific diseases result from environmental and behavioral processes. Thus, a coherent empirical picture of routes to death from a diversity of causes requires a data combining and modelling strategy that, of necessity, incorporates theory and prior-knowledge-based assumptions together with sensitivity analyses to assess the stability of conclusions. In this paper, a general data combining statistical strategy is presented and illustrated for smoking behavior and lung cancer mortality. Specifically, National Health Interview Survey data on smoking is combined with U.S. vital statistics data for 1950 to 1987 to analyze the joint distribution of total and lung cancer mortality. Parameters were estimated for mortality, smoking cessation processes, and for individual risk heterogeneity for nine U.S. white male and female cohorts aged 30 to 70 in 1950 and followed until 1987.

Keywords: Gompertz hazard, Weibull hazard, Dubey distribution, Cohort mortality, Smoking cessation.
*Author to whom all correspondence should be addressed. This research was supported by NIA Grant AG01159, AG07025, and NIH/NIA Grant PO1 AG08791-01. The authors thank anonymous referees for valuable comments.

1. INTRODUCTION

Mortality data is useful in making national estimates of the effects of health interventions for preventing disease and increasing life expectancy. In particular, U.S. mortality data has the advantages that the number of deaths is large, data on all causes of death are reported, data on individual deaths is available back to 1950, and all ages, population groups, and geographic areas are represented. However, the effects of health interventions can be better assessed when information on the pathways to disease, and then to death, is used. Unfortunately, extant national mortality data, though providing information on end states (e.g., cause-specific mortality), do not provide pathway information describing how particular diseases result from environmental and behavioral processes, e.g., mortality data lacks measures of risk factors, and health events, before death.

Hazard functions estimated assuming population risk homogeneity do not unbiasedly describe the age dependence of the hazard for an individual in a population which is risk heterogeneous. Epidemiological studies show individual risks vary due to differences in smoking, nutrition, physical activity, and environmental and workplace exposures. Studies of monozygotic twins show mortality selection operates strongly from ages 35 to 85 (e.g., [1,2]). Epidemiological study populations, however, are generally not nationally representative and may not have enough follow-up to describe cohort differences. Cross-sectional health surveys, though nationally representative, may confound cohort, period, and age effects [3]. Thus, a coherent empirical picture of routes to death from a diversity of causes requires a data combining and modeling strategy that, of necessity, incorporates theory and prior knowledge based assumptions. In this paper we present a general statistical data combining strategy and illustrate it in the context of smoking behavior and lung cancer mortality.

Technically, the information limitations of different data sets can be dealt with by combining them in a “global” likelihood so parameters can be estimated from data on the joint distribution of individual processes parameterized from biological models and on the population distribution of risks. Thus, limitations of one data set are compensated for by using a global likelihood to “borrow” information from other data sets. To construct the global likelihood [4,5], the age dependence of cohort mortality is assumed related to the distribution of individual risks, measured on observed and unobserved risk factors, mortality selection, and the age dependence of hazards for individuals. By controlling for observed and unobserved heterogeneity of individual risks in a cohort, one can estimate individual transition rates unbiased by mortality selection. Thus, hazard function parameters can be partly adjusted for risk heterogeneity using national health survey data and partly adjusted for the theoretical, or indirectly observed, distribution of individual risks net of directly observed risk factors. We applied this model to survey data on smoking and to U.S. white male and female total and lung cancer mortality for cohorts aged 30 to 70 in 1950 followed until 1987. We assess the fit of the model, and the stability of parameter estimates, to see how much information is gained by combining data sets.
2. THE MODEL

As suggested by Little and Schluchter [4], to integrate data describing a common health process, it is necessary first to form a global likelihood function. To do this, we first specify the survival distribution for a cohort, or

$$S(x, \xi, z) = \Pr\{X > x \mid \xi, z\} = e^{-H^{*}(x, \xi, z)}, \tag{1}$$

where $X$ is the age at death, $\xi$ is the vector of parameters for the hazard function $\nu$, $z$ is a vector of risk factors, $H^{*}(x, \xi, z) = \int_{0}^{x} \nu(u, \xi, z)\, du$ is the cumulative hazard to $x$, and $\nu(u, \xi, z)$ the instantaneous hazard. Parameters can be estimated assuming that we have complete information on that cohort using a mixed Poisson likelihood of the form

(2)

where $d_{xz}$ = number of deaths at age $x$ and risk level $z$, $N_x$ is a mid-year population estimate, $H^{*}$ is the cumulative hazard, and $f(z, t, x)$ is the distribution function for individual risk factors which mixes the Poisson hazard rates for individuals age $x$ at time $t$ with risk factor values $z$ [5]. In (2), we assume the risk factors $z$ define discrete groups. The factorization of (2) into a hazard,
conditional on risk factor data, and a marginal risk factor distribution is an assumption that will need to be evaluated in respecifying (2) to correspond to available data. For example, if the risk factors $z$ are measured at multiple time points, then the distribution function $f(z, t, x)$ could be a multivariate stochastic process with the Poisson distribution of events estimated conditionally on the values of the outcomes of that process. If some risk factors are unmeasured, then the cumulative hazard $H^{*}$ has to be integrated across the distribution of unobserved risk factors, an integration which is feasible if the distribution of the effects of the unobserved risk factors on individual hazards can be specified from theory. If we have a marginal distribution (e.g., smoking proportion) from a cross-sectional survey, then $f$ is known for a point in time. If, however, mortality alters the distribution $f$ over time, those dynamics must be included in $H^{*}$. Thus, the available data will determine the requisite constraints in the distribution function so that the parameters to be estimated are mathematically identifiable.

A Poisson likelihood is used to describe the conditional distribution of health outcomes because
national populations are not “closed” [6]. Consequently, uncertainty in annual mid-year population estimates (due to mortality fluctuations in the first half of the year) makes the variance of rate estimates at least Poisson, i.e., variability in the first half of the year is negative binomial; in the second half, binomial [7]. The convolution of the two distributions produces at least Poisson variation. Additional variation may be added depending on how $\nu$ is specified to change over age, or after conditioning on known fixed risk factors.

Equation (2) assumes all risk factors are observed. In fact, in mortality data for partial cohorts, many risk factors are either unobserved, or only parameters of their distribution are observed. To deal with partially observed, or unobserved, risk factors, (2) has to be reparameterized. To account for the partially observed data on smoking in our example, we constructed $M$ mutually exclusive risk groups. If individual transitions from the alive to the dead state, or from one risk factor state to another, are roughly constant over short intervals, the survival of a risk heterogeneous cohort is the weighted average of the hazard for each of the $M$ groups [8]. In (2), the risks for $M$ groups can be represented by the average risk. If this average can be calculated, then the effect of the distribution of smoking can be transferred from $f$ to the integrated hazard $H^{*}$. If the hazard for the $m$th group is $\nu_m(x, \xi)$, this average is

$$\bar{\nu}(y + t, \xi) = \sum_{m=1}^{M} \pi_m(t, \xi)\, \nu_m(y + t, \xi), \tag{3}$$

where $\pi_m(t, \xi)$, the conditional probability a person age $y + t$ is in group $m$, is estimated from a national health survey, i.e., for observations beginning at $y$, and ending at $t$,

$$\dot{\pi}_m(t, \xi) = -\left(\nu_m(y + t, \xi) - \bar{\nu}(y + t, \xi)\right) \pi_m(t, \xi), \qquad m = 1, \ldots, M, \tag{4}$$

where $\pi_m(0, \xi) = q_m$ is the initial proportion in $m$ at age $y$ estimated from the survey. Changes in $q_m$ (where $\sum_{m=1}^{M} q_m = 1$) are due both to the selection effects of the hazard $\nu_m$ on the distribution of risks in each state, and on relative risks between states.

To solve (3) and (4), we must specify how the average risk in group $m$ changes as a function of the age dependence of mortality for the $j$th cause. This requires specifying an age dependent parametric form for $\nu$. One function often used to describe the age dependence of adult mortality [9] is the Gompertz, i.e.,

$$\nu(x, \xi_G) = \beta \exp(\alpha x), \tag{5}$$

where $\xi_G = (\alpha, \beta)$ are shape ($\alpha$) and scale ($\beta$) parameters, $\alpha$ describes annual percent increases in mortality [10], and $\beta$ describes initial ($x = 0$) mortality rates. For some diseases (e.g., cancer [11]), or possibly total mortality at late ages [12,13], a Weibull better describes age dependence, i.e.,

$$\nu(x, \xi_W) = \phi x^{a-1}. \tag{6}$$
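As a rough illustration of how (1), (5), and (6) fit together, the following Python sketch evaluates the two hazards, their closed-form cumulative hazards, and the implied survival probability. The symbol names `phi` and `a` for the Weibull scale and shape reflect one reading of the garbled equation (6), and every numeric parameter value used with these functions is hypothetical rather than an estimate from the paper.

```python
import math


def gompertz_hazard(x, alpha, beta):
    """Gompertz hazard as in (5): beta * exp(alpha * x)."""
    return beta * math.exp(alpha * x)


def weibull_hazard(x, phi, a):
    """Weibull hazard as read from (6): phi * x**(a - 1). Symbol names are assumed."""
    return phi * x ** (a - 1)


def gompertz_cumulative(x, alpha, beta):
    """Integral of the Gompertz hazard from 0 to x."""
    return (beta / alpha) * (math.exp(alpha * x) - 1.0)


def weibull_cumulative(x, phi, a):
    """Integral of the Weibull hazard from 0 to x."""
    return phi * x ** a / a


def survival(cumulative_hazard):
    """Survival probability as in (1): exp(-H*)."""
    return math.exp(-cumulative_hazard)


# Hypothetical values: 8% annual increase, initial rate 1e-4, 40 years of follow-up.
print(survival(gompertz_cumulative(40.0, 0.08, 1e-4)))
```

With a Gompertz shape of 0.08, the hazard one year later is larger by the factor exp(0.08), which is the sense in which the shape parameter describes annual percent increases in mortality.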
In using (6) to model carcinogenesis, tumor growth starts after the $a$th genetic error disrupts cell growth [14,15]. Laboratory and clinical studies confirm the validity of the Weibull for solid tumors (e.g., for colon cancer [16] and lung cancer [17]) by identifying specific genetic errors (e.g., in the p53 growth regulation gene [18]) causing neoplastic growth. In (6), the lag between the time when accumulated genetic damage triggers neoplastic growth, and the time when the process reaches a lethal threshold (e.g., a host’s ability to metabolically support the tumor is exceeded) is the “latency” of the process. While latency affects the Weibull shape parameter, it affects only the Gompertz scale parameter. A Gompertz hazard with a latency of $l$ years is

$$\nu(\alpha, \beta^{*}, x, l) = \beta^{*} e^{\alpha(x - l)} = \beta^{*} e^{-\alpha l} e^{\alpha x}, \tag{7}$$

so that $\beta = \beta^{*} e^{-\alpha l}$.

That is, lag effects do not change $\alpha$, but are subsumed in $\beta$, so the Gompertz shape parameter (and form of age dependence) is not altered. This is important when modeling partial cohort data where unobserved risk heterogeneity is due to either genetic factors, or risk factor exposures accumulated before the first age evaluated for a given cohort (i.e., left censoring of risk factor exposures). Thus, for a cohort age 70 in 1950, there may be more individual risk variation due to accumulated exposures than for a cohort age 50 in 1950. Alternatively, for a cohort aged 70 in 1950, individual genetic heterogeneity may be lost before the first observed age due to mortality selection. Modeling heterogeneity helps adjust other transition parameters for changes in the population distribution of risks. Such differences are absorbed in the Gompertz scale parameter, whereas the value of the Weibull shape parameter is sensitive to such effects.

Epidemiological studies show that in modeling mortality to late ages, the heterogeneity of unobserved risks for specific diseases has important effects. The Weibull and Gompertz hazards in (5) and (6) do not appropriately describe the age trajectory of the average risk in a population when the population is heterogeneous. Thus, we generalize (5) and (6) to represent heterogeneity by reparameterizing those functions to reflect a flexible parametric family of distributions and evaluate how results change for different members of the family. The same distribution and hazard is assumed to exist in each of $M$ risk groups for each of $J$ causes. If individuals in group $m$ are subject to a hazard conditional on an unobserved distribution of risk levels $z^{*}$, the hazard is

The Dubey distribution was selected for $z^{*}$ [19]. This produces a composite Gompertz (9) or, for the Weibull, (10).

In both (9) and (10), $\gamma$ is the square of the coefficient of variation (CV) at age 0 of $z^{*}$. Different risk distributions are selected by changing $n$. The mean of the frailty distribution decreases with age, no matter which value of $n$ is chosen. When $n = 2$, $z^{*}$ is inverse Gaussian distributed [20], and the CV decreases with age. When $n = 1.0$, $z^{*}$ is gamma distributed, which has a constant CV. The denominators in (9) and (10) reduce the age rate of increase in mortality after a significant
proportion of the population (i.e., enough to remove many high risk persons) dies. This is consistent with the observation in population studies, that mortality rises more slowly with age than the Gompertz (or the Weibull) above age 80. Deviations from the Gompertz (or Weibull) increase as heterogeneity increases [7]. Either (9) or (10) can be applied to the $M$ risk groups comprising each cohort, each with age dependent hazards with the same shape, but different scale, parameters. Specification of the heterogeneous population form of the Weibull or Gompertz is necessary to integrate the $\nu_m$ in the $M$ groups to solve (3) or (4). The general solution for (4) with a cumulative hazard for each of $J$ causes of death is

$$\pi_m(t, \xi) = \frac{q_m \exp\left\{-\sum_{j} H_{mj}(t, \xi)\right\}}{\sum_{m=1}^{M} q_m \exp\left\{-\sum_{j} H_{mj}(t, \xi)\right\}}. \tag{11}$$
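The distribution of persons in the $M$ groups, $\pi_m(t, \xi)$ from (11) with $\nu_m$ specified as the composite Gompertz in (9) is

$$\pi_m(t, \xi_G) = \frac{q_m \exp\left[-(1/\gamma_G) \ln\left((1 - (\gamma_G \beta_m/\alpha)) + (\gamma_G \beta_m/\alpha) e^{\alpha t}\right)\right]}{\sum_{m=1}^{M} q_m \exp\left[-(1/\gamma_G) \ln\left((1 - (\gamma_G \beta_m/\alpha)) + (\gamma_G \beta_m/\alpha) e^{\alpha t}\right)\right]}. \tag{12}$$

For the composite Weibull in (10), it is

$$\pi_m(t, \xi_W) = \frac{q_m \exp\left[-(1/\gamma_W) \ln\left(1 + (\gamma_W \phi_m t^{a}/a)\right)\right]}{\sum_{m=1}^{M} q_m \exp\left[-(1/\gamma_W) \ln\left(1 + (\gamma_W \phi_m t^{a}/a)\right)\right]}. \tag{13}$$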
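The group proportions for the composite Gompertz with $n = 1$ can be sketched numerically as below. The code uses the identity that the exponential term in (12) equals the gamma-mixed survival $(1 + \gamma H)^{-1/\gamma}$, with $H$ the Gompertz cumulative hazard measured from the index age. The function names are illustrative, and the scale parameters in the usage example are hypothetical; only the 72% smoking prevalence and the CV of 0.240 (so $\gamma = 0.0576$) echo figures quoted in the text.

```python
import math


def gamma_gompertz_survival(t, alpha, beta, gamma):
    """Survival of a gamma-mixed (n = 1) Gompertz group after t years.

    Equals exp[-(1/gamma) * ln(1 - gamma*beta/alpha + (gamma*beta/alpha) * exp(alpha*t))].
    """
    cumulative = (beta / alpha) * (math.exp(alpha * t) - 1.0)
    return (1.0 + gamma * cumulative) ** (-1.0 / gamma)


def group_proportions(t, q, alpha, betas, gamma):
    """Proportion of survivors in each group at time t: initial shares weighted by survival."""
    weights = [qm * gamma_gompertz_survival(t, alpha, bm, gamma)
               for qm, bm in zip(q, betas)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical scales: smokers (first group) carry a 13-fold higher scale than nonsmokers.
shares = group_proportions(37.0, [0.72, 0.28], 0.1, [1.3e-3, 1e-4], 0.0576)
print(shares)
```

Because smokers are removed faster by mortality, their share among survivors falls below the initial 72% even though no one changes group in this sketch.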
In both (12) and (13), $n$ was assumed to be 1.0. Other values of $n$ can be evaluated by changing (9) or (10). If $\xi_W = (\phi_1, \phi_2, \ldots, \phi_M, \gamma_W, a)$ and $\xi_G = (\beta_1, \beta_2, \ldots, \beta_M, \gamma_G, \alpha)$, the hazard for a population with $M$ groups is

where $\nu_{Gm}$ and $\nu_{Wm}$ are hazards for the $m$th group for the composite Gompertz (9) or Weibull, respectively. If individuals move between groups, additional terms $\delta_{lm}(t)$, $m = 1, 2, \ldots, M$, are needed ($\delta_{mm} = -\sum_{l \neq m} \delta_{ml}$ represent persons staying in the $m$th state) [8], or

$$\dot{\pi}_m(t) = \sum_{l=1}^{M} \delta_{lm}(t)\, \pi_l(t) - \pi_m(t)\left(\nu_m(y + t, \xi) - \bar{\nu}(y + t, \xi)\right). \tag{14}$$

When $M = 2$, the equations specialize to

$$\dot{\pi}_1(t) = -\left[\nu_1(y + t, \xi) - \bar{\nu}(y + t, \xi)\right] \pi_1(t) + \delta \pi_2(t),$$
$$\dot{\pi}_2(t) = -\left[\nu_2(y + t, \xi) - \bar{\nu}(y + t, \xi)\right] \pi_2(t) - \delta \pi_2(t). \tag{15}$$
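The two-group system can be explored with a simple forward Euler integration, sketched below. This is only an illustration under stated assumptions: group 1 is taken to be nonsmokers and group 2 smokers, each group has a plain Gompertz hazard (no within-group heterogeneity), the cessation rate `delta` is constant, and `beta_non` and `beta_smoker` are hypothetical hazard levels at the index age.

```python
import math


def simulate_two_groups(q_smoker, alpha, beta_non, beta_smoker, delta, years,
                        steps_per_year=100):
    """Forward Euler integration of the M = 2 equations for the group proportions.

    Returns (pi1, pi2): the nonsmoker and smoker shares among survivors after `years`.
    """
    pi1, pi2 = 1.0 - q_smoker, q_smoker
    h = 1.0 / steps_per_year
    t = 0.0
    for _ in range(years * steps_per_year):
        v1 = beta_non * math.exp(alpha * t)       # nonsmoker hazard at age y + t
        v2 = beta_smoker * math.exp(alpha * t)    # smoker hazard at age y + t
        vbar = pi1 * v1 + pi2 * v2                # average hazard, as in (3)
        d1 = -(v1 - vbar) * pi1 + delta * pi2     # quitters join group 1
        d2 = -(v2 - vbar) * pi2 - delta * pi2     # quitters leave group 2
        pi1 += h * d1
        pi2 += h * d2
        t += h
    return pi1, pi2


print(simulate_two_groups(0.72, 0.1, 1e-4, 1.3e-3, 0.05, 37))
```

The two derivatives sum to zero whenever the shares sum to one, so the proportions stay normalized; setting `delta` to zero recovers the selection-only dynamics of (4).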
Equation (15) completes the reparameterization of (2) to represent both observed and unobserved risk factor distributions so it can be estimated from observed data. Thus, by appropriately specifying $\nu_m$, and integrating it to form $H^{*}$, the marginal risk factor distribution variation $f$ can be integrated with the variation of individual risks. The values in $\xi$, and the distribution $q_m$, are evaluated from (2) by computing $\pi_m$ from (4), for each value of $x$, and $\bar{\nu}(y + u, \xi)$ from (3), with the form of $\nu$ determined (e.g., (12) or (13)) by selecting a value of $n$. $H^{*}(x, \xi)$ is computed for each cohort by Simpson’s numerical integration formula. A Nelder-Mead [21] algorithm for minimizing multivariate functions without derivatives was used because of the complexity of the derivatives. The negative of the log likelihood was the objective function. The solution for each cohort was evaluated independently, though cross-cohort parametric restrictions were imposed.
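The estimation steps described above (Simpson integration of the hazard, then a derivative-free search on the negative log likelihood) can be sketched as follows. This is not the likelihood (2) of the paper, whose exact form is not reproduced here; it is a generic Poisson objective for a single homogeneous Gompertz cohort, and the helper `expected_deaths`, the one-year age intervals, and all numeric values are assumptions made for illustration.

```python
import math


def simpson(f, a, b, n=100):
    """Composite Simpson's rule for the integral of f over [a, b] (n is made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def expected_deaths(exposure, age, alpha, beta):
    """Hypothetical Poisson mean: exposure times the hazard integrated over [age, age + 1]."""
    return exposure * simpson(lambda u: beta * math.exp(alpha * u), age, age + 1.0, n=10)


def neg_log_likelihood(params, deaths, exposure, ages):
    """Negative Poisson log likelihood of observed deaths given Gompertz parameters."""
    alpha, beta = params
    if alpha <= 0.0 or beta <= 0.0:
        return float("inf")  # keep a derivative-free search inside the valid region
    nll = 0.0
    for d, n, x in zip(deaths, exposure, ages):
        mu = expected_deaths(n, x, alpha, beta)
        nll -= d * math.log(mu) - mu - math.lgamma(d + 1.0)
    return nll
```

An objective of this shape could be handed to a Nelder-Mead routine, for example `scipy.optimize.minimize(neg_log_likelihood, x0, args=(deaths, exposure, ages), method="Nelder-Mead")`, if SciPy is available.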
Data

We analyzed U.S. white male and female cohort mortality for 1950 to 1987. We defined two ($M = 2$) risk groups: smokers and nonsmokers; and two ($J = 2$) causes of death: lung cancer and nonlung cancer deaths. National Center for Health Statistics mortality data for white males and females were used to compute the number of lung cancer and nonlung cancer deaths occurring in nine cohorts in 1950 aged 30, 35, . . . , 70 years. The mortality for a specific birth cohort is the average number of deaths taken over five years of age to reduce age heaping and digit preference (i.e., we followed five cohorts age 28 to 32 in 1950 to age 29 to 33 in 1951, etc., and used the average risks for each group). Population estimates were calculated for corresponding sets of ages to calculate the rates. We used midyear population estimates by linearly interpolating decennial census counts adjusted for underenumeration, race misclassification, and age misreporting [22-24]. Bias arises if migrants are different than those originally in the cohort. Bias is small if
(a) migrants are similar to residents, or
(b) migration is small relative to population size.

The U.S. annual migration rate for 1951 to 1960 is 0.15%; for 1961 to 1970, 0.17%; for 1971 to 1980, 0.21%; and 1981 to 1990, 0.31% (Statistical Abstract of the U.S., p. 10, Table 5). Over a 38-year period, 5.4% of the U.S. population were migrants. Since we analyzed U.S. whites, we only considered white migrants. In 1961 to 1970, this was 45.9%; in 1971 to 1980, 20.4%; and in 1981 to 1990, 11.2% of all U.S. migrants. About 70% of migration occurs below age 30, i.e., the first age analyzed. Thus, bias will be small for white male and female cohorts at least age 30 from 1950 to 1987.

The marginal distribution of smoking among white males and females of different ages in 1950 was estimated from the 1978 to 1980 NHIS which contained supplemental questions on smoking [25]. Two groups (i.e., smokers and nonsmokers) represented risk heterogeneity due to smoking. This is an improvement over not using data on smoking in the cohort mortality analyses.
Figure 1. 1950 smoking proportions for U.S. male and female birth cohorts aged 30 to 70 in 1950. (Horizontal axis: birth cohort.)
A “coarse” smoking variable from the independent survey data was used because determining the detailed smoking status of persons in 1950 is difficult and subject to error. By using less detailed (but more reliable) data on smoking, the relative risk (RR) estimated between smokers and nonsmokers is conservative. Nonetheless, a large gain in information is expected, even with “coarse” smoking data.

Harris [25] estimated smoking prevalences for birth cohorts for the year 1950 from 1978 to 1980 NHIS data. A second-degree spline was used to interpolate values for intermediate ages and extrapolate values for the oldest cohort. Figure 1 shows the estimated values for 1950. Sixty-four percent of the 1905 male cohort smoked at age 65. The 1920 cohort smoking prevalence peaked at 72% at age 30 in 1950. The 1950 cohort prevalence of white female smokers is lower than for males. The proportion smoking in the 1905 cohort was 24%. For the 1920 cohort, it was 38%. These values are used in our integrated hazard functions as the $q_m$’s.
3. RESULTS

Gompertz Estimates of Partial Cohort Mortality

We fit the Gompertz to each of nine white male and female cohorts defined in 1950 to all deaths occurring between 1950 and 1987, not using marginal data on smoking, nor adjusting for individual heterogeneity. Cohort specific parameter estimates and measures of fit are in Table 1 (see Appendix). The likelihood ratio $\chi^2$ is 81,590.7 for males and 26,963.1 for females. The male $\alpha$ estimates range between 7.1% and 7.8%, and the female estimates between 7.5% and 8.7%, consistent with other estimates of Gompertz shape parameters for adult human populations [26]. The relative precision (i.e., average percent residual) for white male mortality over 38 years ranged from 8.9% for the 1915 birth cohort, to 3.2% for the 1885 cohort. The weighted average deviation is 6.2% for males and 3.8% for females. Thus, the average error was one part in 16 for males, and one part in 25 for females, i.e., the precision of the Gompertz estimates was poor.

We also computed the sum of absolute, and signed, residuals. The first is the size of the absolute average deviation. The second is bias. Estimator precision can be poor but unbiased. Often statistical estimators trade small amounts of bias for significantly improved precision (e.g., Stein estimators, Ridge regression [27]). For males, the bias is negative, i.e., mortality is over predicted. In females, bias is small and positive for cohorts born after 1895, and negative for cohorts born before 1890. Thus, in addition to large relative and absolute errors, there are systematic biases for the Gompertz model differing by gender.

Estimates of Partial Cohort Mortality Data Using Observed and Unobserved Risk Factor Distributions
To improve the description of cohort mortality, we introduced data on smoking, and modelled individual heterogeneity, in the Gompertz ($n = 1.0$, i.e., $z^{*}$ is assumed gamma distributed). Modeling the effects of smoking on causes of death other than lung cancer reflects many reports showing that the health effects of smoking are broad. In addition to lung cancer, it affects other cancers (e.g., pancreatic and bladder cancer), chronic obstructive pulmonary diseases, heart disease and stroke, peripheral vascular disease, cirrhosis, and atherosclerosis [28]. By affecting microcirculation, smoking has a secondary effect on and an interaction with diabetes, as well as with many degenerative diseases of specific organs, e.g., renal dysfunction. It also has potential general metabolic effects accelerating such diseases as osteoporosis and other physiological functions under hormonal regulation. The only possible positive health effects found by Doll et al. [28] were for select neurological diseases, e.g., Parkinson’s disease. In addition, the effects of smoking on mortality in Doll et al.’s [28] 40 year follow-up were high at late ages. Given their large number, the effects of smoking are greater for all other causes of death than for lung cancer
death. Smoking cessation rates ($\delta_y$) are estimated simultaneously with age changes in smoking and nonsmoking related mortality because a large portion of total mortality is smoking related, i.e., a joint analysis better defines both the overall health risks of smoking and the cohort specific smoking cessation rates ($\delta_y$).

Introducing smoking and unobserved heterogeneity into the model produced the results for white males in Table 2 (see Appendix). The $\chi^2$ is 10,142.7 (i.e., an 87.6% reduction). The relative absolute error declined from 6.35 to 2.27 (a 64.2% reduction). Bias is reduced from -1.93 to -0.028 (a 98.5% reduction). Thus, the model’s precision is greatly improved, with bias almost totally eliminated, by using the marginal distribution of smoking estimated from the NHIS and adjusting for unobserved heterogeneity ($\gamma_y$).

Improvements are evident in estimates for specific cohorts. The first male cohort (age 30 in 1950) has a RR = 13.3 for smokers and $\alpha_{30}$ = 13.9%, i.e., mortality adjusted for smoking increased 13.9% per year from age 30 to 67. This is higher than the $\alpha_{30}$ estimated for mortality without smoking heterogeneity (e.g., 7.5% to 8.5% [26]), because cohorts were stratified on a fixed risk factor (i.e., smoking). By stratifying on smoking, individual times to death are predicted more precisely, i.e., mortality for male smokers in middle age (i.e., ages 30 to 67) is 13.3 times as high as for nonsmokers. The CV at 30 is 0.240, i.e., the standard deviation of individual risks ($z^{*}$) within each smoking group for the first cohort is 0.240 times the mean. The CV is largest for young and old cohorts. This is reasonable because genetic risks have not been reduced by mortality selection in the youngest cohorts [17] and older cohorts are likely to have accumulated more environmental exposures before their index age (i.e., the age at which the mortality experience of that cohort begins to be considered). In general, the CV for smokers, with a higher average risk, is greater. This suggests an interaction of smoking with unobserved risk factors. This is consistent with epidemiological studies of workers exposed to asbestos where, for nonsmokers, lung cancer risk was five to six times higher for exposed persons; for exposed smokers the risk was 70 to 80 times that for nonexposed nonsmokers [29,30]. The excess risk for smokers with asbestos exposure was so high that most smokers rapidly died out of the asbestos exposed population.

$\delta_{30}$ suggests that 5.3% of white male smokers in the 1920 cohort, conditional on surviving to a given age, quit smoking per year. $\delta_y$ declines over the next five cohorts (i.e., from 3.7% to 0.9%). Male cohorts aged 60 and above in 1950 had most of their mortality experience at late ages after many smokers have died (i.e., the prevalence of smokers for ages 60 to 97 will be low and the likelihood of quitting each year for an individual smoker negligible except possibly if the person experiences the symptomatic expression of health effects initiated by past smoking). The $\delta_y$ differ from estimates of changes in smoking prevalence because they are transition rates estimated simultaneously with mortality rates for smokers and nonsmokers in a full information likelihood. Differences in prevalence (cross-sectional) estimates are often made piecewise for broader cohort groups. Little and Schluchter [4] showed piecewise evaluations can produce biased estimates.
Fundamentally, however, it is the improvement of the prediction of cohort mortality by the inclusion of the available, partial survey data on smoking, as well as individual risk heterogeneity, that is the focus of the analysis. Furthermore, the model of smoking effects allows for the proportion of smokers to decline in a way consistent with mortality and smoking cessation as two simultaneous forces of decrement on the 1950 marginal distribution of smoking. Smoking related risk variation declines for male cohorts born after 1895 due to the early mortality of high risk individuals and smoking cessation. The 1915 male cohort shows a RR of 9.9 for smokers with an $\alpha$ = 13.7%. The RR for smokers is high since it is conditioned on both survival and smoking cessation. The $\chi^2$ for the 1915 male cohort is better than for the 1920 cohort, as is the fit for the 1910 cohort, i.e., $\chi^2$ drops 92.7% (from 13,746.0 to 994.0), a significant improvement from the homogeneous population model (Table 1). The average percent error for the 1915 cohort declined from 8.9% to 2.3%, a reduction of 74.2%. Absolute deviations are smaller and there is less bias compared to Table 1, i.e., bias declined from -3.008 to +0.056, by 98.1% in absolute size, with a change in sign.
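The percent reductions quoted for the 1915 cohort can be checked with a small helper. This is a sketch for illustration only: the function name is not from the paper, and it compares absolute sizes so that the change in the sign of the bias does not affect the result.

```python
def percent_reduction(before, after):
    """Percent decline in absolute size from `before` to `after`."""
    return 100.0 * (abs(before) - abs(after)) / abs(before)


# Values quoted in the text for the 1915 male cohort.
print(percent_reduction(13746.0, 994.0))   # chi-square
print(percent_reduction(8.9, 2.3))         # average percent error
print(percent_reduction(-3.008, 0.056))    # bias
```

Each result agrees with the reported 92.7%, 74.2%, and 98.1% to within rounding of the last digit.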
The RR for smokers in the 1910 male cohort after age 40 (in 1950) decreased to 6.9. For the 1905 cohort (age 45 in 1950), it decreased to 5.0; 3.9 for the 1900 cohort; and then 3.2, 2.6, 1.8, and 1.1. The latter two estimates are still higher than the RR of 1.2 estimated by Harris [25] for ages 85 to 94. The RR decline over cohorts is likely due to early smoking related deaths being concentrated among susceptible (e.g., [17]) individuals, who, because of their high RR, will disproportionately die out from the cohort by age 80. Nonetheless, the $\alpha_{65}$ parameter is 9.1% for the 1885 cohort, suggesting smoking has a strong effect on total mortality at late ages. The model’s fit is shown in Figure 2.

Figure 2. Model N = 1.0: total mortality (predicted and observed) with smoking cessation for nine U.S. white male cohorts aged 30 to 70 in 1950.
The model trajectories follow the observed age trajectories closely. To test the sensitivity of the model to the form of the mixing distribution assumed for unobserved heterogeneity, we re-estimated the model, assuming $z^{*}$ is inverse Gaussian distributed ($n = 2$), i.e., a decreasing CV for $z^{*}$. The results are presented in Table 3. This model did not fit as well overall ($\chi^2$ = 12,013.2; +18.4%) as when $n = 1$ (CV is constant). Relative error increased to 2.7% (compared to 2.3%), i.e., +13.0%. The $\alpha_{30}$ is similar (i.e., 13.9%), but the RR for smokers are higher for the 1915 and 1910 cohorts (i.e., 10.9 vs. 9.9 and 7.3 vs. 6.9). $\alpha_y$ is also higher (i.e., starting at age 35 and 40 in 1950). The CV at the initial age examined in the cohort is similar to estimates in Table 2. Smoking cessation rates ($\delta_y$) are similar, except for the 1915 cohort. The absolute error and bias are larger, especially in the three oldest cohorts. Thus, the hypothesis that the relative magnitude of individual risk heterogeneity, conditioned on the cohort distribution of smoking, decreases over age within cohorts can be rejected, i.e., the gamma model, with constant CV, fits better than an inverse Gaussian model with a decreasing CV.

Cohort specific estimates with $n = 0.5$ (i.e., CV increases with age) are in Table 4 (see Appendix). The $\chi^2$ in Table 4 is marginally better (by 1.9%) than for the gamma model with the relative error nearly identical (2.26% vs. 2.27%) and similar bias except for the 1880 cohort. The RR, as for other models, starts high (i.e., 13.3) as does $\alpha_{30}$ (13.9%) and declines with age. $\delta_y$
estimates are similar to the other models. The cohort CVs are similar to those for the model with $n = 1$ except for the oldest cohort. Thus, heterogeneity of risk relative to the mean may tend to modestly increase with age within cohorts.

The three models produce similar estimates for parameters describing individual processes, e.g., the $\beta_2$ for the smokers is similar in each model as is the CV within groups. This may arise from a statistical averaging of the dynamics of mortality selection and temporal changes in risk factors. Robustness of the morbidity-mortality process parameter estimates was confirmed in longitudinal data sets with multiple measures of individual risk factor values where their dynamics were estimated directly and the equilibrium of the component processes examined [31]. Despite the robustness of parameter estimates, certain models (i.e., $n = 2.0$) can be statistically rejected empirically.

We also estimated models of total mortality for females for $n$ = 0.5, 1.0, and 2.0. In Table 5 (see Appendix), we present the best fitting model (with $n = 1$) for females.
The fit to the data ($\chi^2$ declined 54.0%) improved over the homogeneous population model (Table 1) with bias nearly eliminated (-97.1%). The improvement in precision is not as large as for males because proportionately fewer females smoke, and they tend to start at later ages. Female cohort data is fit less well (2.75% relative error) than for males (2.27%). Shape parameter estimates ($\alpha_y$) are about 5% to 18% lower than for males up to the 1885 cohort; then female $\alpha_y$’s are larger because those partial cohorts are observed wholly at ages past menopause (i.e., the 1885 cohort is age 65 in 1950) where female mortality starts to increase more rapidly (but from lower levels) than for males. The RR for female smokers is higher than for males for all but the 1890 cohort, suggesting that while fewer females smoke, smoking has a large effect on individual females. Female smoking cessation rates ($\delta_y$) are higher than for males for the first five cohorts. This is partly because there is less smoking for white females (Figure 1), and a lower RR, producing larger relative cessation rates. For example, white females decline from a peak prevalence of 26.2% in 1960 to 7.5% in 1987, a greater relative, but smaller absolute, decline than for males. The higher RR for older female cohorts is likely due to their lower overall age specific mortality rates. The CV is higher for younger and older cohorts, though female CVs tend to be larger than for males. Figure 3 shows the plots of fits for white female cohorts. Mortality trajectories are fit well by the model. Age specific rates are lower than for males.

In Table 6 (see Appendix), we present the joint lung cancer and nonlung cancer mortality analysis for males. Lung cancer mortality (LCM) is modeled as a Weibull, and all other mortality (OM) as a Gompertz, within cohorts. For both, individual risks are assumed gamma ($n = 1.0$) distributed.
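A minimal sketch of the joint cause structure described above follows, assuming the two causes act additively on the total hazard within a group and that each cause has its own gamma ($n = 1$) heterogeneity. The mixed hazard form $\nu / (1 + \gamma H)$ is derived here as minus the derivative of the log of the gamma-mixed survival $(1 + \gamma H)^{-1/\gamma}$ appearing in (12) and (13); the function names, the argument order, and all parameter values are hypothetical.

```python
import math


def mixed_gompertz_hazard(t, alpha, beta, gamma):
    """Gamma-mixed Gompertz hazard: baseline hazard divided by (1 + gamma * cumulative hazard)."""
    cumulative = (beta / alpha) * (math.exp(alpha * t) - 1.0)
    return beta * math.exp(alpha * t) / (1.0 + gamma * cumulative)


def mixed_weibull_hazard(t, a, phi, gamma):
    """Gamma-mixed Weibull hazard with shape a and scale phi (symbol names assumed)."""
    cumulative = phi * t ** a / a
    return phi * t ** (a - 1) / (1.0 + gamma * cumulative)


def total_hazard(t, lcm_params, om_params):
    """Total hazard as the sum of lung cancer (Weibull) and other-cause (Gompertz) hazards."""
    return mixed_weibull_hazard(t, *lcm_params) + mixed_gompertz_hazard(t, *om_params)


# Hypothetical parameters: (a, phi, gamma) for lung cancer, (alpha, beta, gamma) for other causes.
print(total_hazard(20.0, (6.0, 1e-10, 0.0576), (0.1, 1e-3, 0.0576)))
```

The denominators slow the rise of each cause-specific hazard once enough high risk persons have died, which is the selection effect the text attributes to heterogeneity.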
Modeling lung cancer explicitly improves the information available in the mortality data about the effects of smoking, and smoking cessation, because lung cancer is a cause of death with a strong and well-documented relation to smoking behavior. Thus, its inclusion should improve estimates of the age trajectory of smoking effects, especially in the extreme tail of the cohort survival distribution where parameters such as the CV of risk may be difficult to estimate with precision. In Table 6, the Weibull shape parameters for LCM are 5 to 8 for birth cohorts of 1915 to 1890. Changes in the Weibull for cohorts whose observation begins at different ages are reasonable because lung cancers starting in older persons are less likely to be due to preprogrammed (i.e., inherited) genetic susceptibility, so that more cellular DNA errors have to be environmentally induced. Gompertz parameters for OM are similar to values in Table 2. Latencies of 5 to 30 years were assumed for both total and lung cancer cohort mortality. These latency times were similar to those for other analyses of cohort lung cancer mortality [31]. The shorter latencies for young cohorts arise because, for persons dying at younger ages of cancer, the tumor biology is either histologically more aggressive, or host defenses defective. The rate of growth of tumors interacts with the number of genetic errors required for a tumor to initiate at a given age (minus the latency). A similar result is found for breast cancer mortality with early disease latency half
(about seven years) that for late onset breast cancer (a mean latency of about 14 years). This latency affects the shape parameter of the Weibull used to model lung cancer mortality. It does not affect the shape parameter of the Gompertz. The higher CV for LCM is consistent with the results of Sellers et al. [17] who found that persons with a genetic defect in the cytochrome P450 enzyme system had highly elevated lung cancer risks. The effect on OM represents the effects of smoking on a broad range of diseases with many different cofactors. [...] with smoking [...] are similar to the values in earlier models, as is bias and relative error. For LCM, the RR increases across cohorts. For OM, the RR for smokers starts high and declines for the 1910 and older cohorts.
Figure 3. Model N = 1: total mortality (predicted and observed) with smoking cessation for nine U.S. white female cohorts aged 30 to 70 in 1950.
Estimates for the joint analysis of LCM and OM for female cohorts are in Table 7 (see Appendix). The female data is not fit as well as the male data, though differences are not large for OM. Female LCM is described less well and with more bias, especially in younger cohorts. We find a very high RR for OM for smokers in the 1910 to 1920 cohorts.
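The dependence of the Weibull shape parameter on the assumed latency, noted in the results above, can be illustrated numerically. The sketch below assumes a latency-shifted Weibull hazard, h(t) = β (t − ℓ)^(m−1) for ages t beyond the latency ℓ. This parameterization, the function names, and the numbers are hypothetical and need not match the authors' formulation. It computes the shape that would be inferred from the log-log slope of the hazard at a given age if latency were ignored; analytically that slope is (m − 1) t / (t − ℓ), which exceeds m − 1 whenever ℓ > 0.

```python
import math

def shifted_weibull_hazard(age, beta, m, latency):
    """Weibull hazard whose clock starts `latency` years after age zero."""
    if age <= latency:
        return 0.0
    return beta * (age - latency) ** (m - 1)

def apparent_shape(age, beta, m, latency, step=1e-4):
    """Shape implied by the log-log slope of the hazard at `age`
    when the latency is ignored (central difference in log age)."""
    lo = shifted_weibull_hazard(age * (1.0 - step), beta, m, latency)
    hi = shifted_weibull_hazard(age * (1.0 + step), beta, m, latency)
    slope = (math.log(hi) - math.log(lo)) / (math.log(1.0 + step) - math.log(1.0 - step))
    return slope + 1.0

# Hypothetical example: true shape 6, evaluated at age 60.
for latency in (0.0, 5.0, 20.0, 30.0):
    print(latency, round(apparent_shape(60.0, 1e-9, 6.0, latency), 2))
```

In this toy setting the same hazard curve is compatible with quite different shape values depending on the latency assumed, which is one reason the shape and latency have to be interpreted together.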
4. DISCUSSION
We used a model to represent two sources of risk variability in cohort mortality. The first is cigarette smoking. This type of heterogeneity is discrete. Smoking data came from U.S. health surveys. The second is, conditional on the discrete smoking risk groups, individual variation within cohorts, and by cause of death. Individual variation is described by a Dubey distribution which, when n = 1, is a gamma with constant CV, when n = 2 is an inverse Gaussian (decreasing CV), and when n = 0.5 has an increasing CV. The inverse Gaussian performed least well. The model with n = 0.5 performed slightly better than the model with n = 1.0. All three models assuming heterogeneity performed better than models with risk homogeneity. Thus, there was a large benefit in integrating health risk factor data from national surveys and marginal mortality data, i.e., marginal data sources on risk factors and risk heterogeneity estimates made from models developed from epidemiological and clinical analyses. Parameters describing individual risks estimated from combined data are more robust and precise.
The CV in Tables 6 and 7 are interesting given analyses of genetic determinants of lung cancer
where a codominant inheritance of an autosomal gene accounted for 69% of lung cancer at age 50, 47% at age 60, but only 22% at age 70, i.e., over 20 years of age apparently only 31.9% of the persons with this type of genetic susceptibility survived [17]. This is akin to the genetic determination of breast cancer at early ages, which is related to family history and genetics, while late disease is related to accumulated risk factor effects in a population which has already been partly selected for genetically determined mortality risks. The Gompertz for all other causes shows a higher risk for smokers in general. This is because risks in this group are predominantly determined by circulatory diseases where smoking risks are more rapidly manifest.
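The three heterogeneity cases compared in this section (n = 0.5, 1, and 2) differ in how the CV of risk among survivors changes as a cohort ages. A minimal sketch, assuming the cohort hazard takes the form μ(t) / (1 + n γ² H(t))^(1/n), follows. That form reduces to the familiar gamma and inverse Gaussian results for n = 1 and n = 2, but the paper's exact parameterization may differ. Under this assumption the squared CV among survivors is γ² (1 + n γ² H)^((1 − n)/n), where H is the cumulative individual hazard. The function name and input values are hypothetical.

```python
def survivor_cv(cv0, cum_hazard, n):
    """CV of individual risk among survivors.

    cv0        -- CV of individual risk at the start of follow-up
    cum_hazard -- cumulative hazard of the standard (risk = 1) individual
    n          -- heterogeneity index (assumed: 1 gamma, 2 inverse Gaussian)
    """
    base = 1.0 + n * cv0 ** 2 * cum_hazard
    return cv0 * base ** ((1.0 - n) / (2.0 * n))

for n in (0.5, 1.0, 2.0):
    trajectory = [round(survivor_cv(0.5, h, n), 3) for h in (0.0, 1.0, 2.0, 4.0)]
    print(n, trajectory)
```

Under these assumptions the CV rises with cumulative hazard for n = 0.5, stays constant for n = 1, and falls for n = 2, matching the qualitative description of the three models.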
Smoking cessation parameter estimates (δ_c) showed large reductions at young ages, probably due to the relation of the start of follow-up (1950) with early Surgeon General reports (1963) and the ages at which smoking habits are first formed. Female cessation rates were larger than for males, reflecting a flat prevalence of smoking in female cohorts up to 1920 (female smoking started at later ages) and declines from moderate to relatively much lower levels in a short time span. The moderate peak levels meant relative declines for females were larger than for males, and the shorter period over which declines occurred implied faster cessation rates. δ_c is estimated from a model where there is heterogeneity within each smoking group, so that there is a large adjustment for mortality (being concentrated in high risk persons). Since the female RR tends to be higher than for males, this suggests that their δ_c would be larger due to a greater correction for mortality, i.e., since α_c, γ_c, and RR are correlated, their estimates will not be the same as those derived from changes assumed to be independent. Since γ_c reflects individual risk differences adjusted for smoking risks, and disease latency affects other model parameters, the interrelation of all parameters has to be determined simultaneously to understand the risk mechanisms.
The best model seems to be the one which includes the marginal distribution of smoking, models lung cancer and total mortality separately, one as a Weibull and one as a Gompertz hazard function, and describes individual unobserved risk heterogeneity by a distribution with an increasing CV. This model performed better empirically, although its advantage over a similar model with gamma distributed unobserved heterogeneity is not large. Also important is the fact that the components used to construct this model had a higher level of biological credibility. The Weibull model of carcinogenesis is well established for many solid tumors [16].
The explicit modeling of smoking effects is strongly suggested by the epidemiological literature, which shows that it is a powerful risk factor, not only for lung cancer, but for many other diseases. However, it is also a risk factor for which exposure varied strongly over cohorts. A study of surgical interventions in patients aged 90 to 103 showed very good outcomes over the period 1975 to 1985 because those birth cohorts had a low prevalence of early smoking (80% had never smoked) and thus had little chronic pulmonary disease. The separate, explicit treatment of lung cancer, a disease with a high RR for smoking, helps to better estimate the trajectory of the loss of smokers in the joint likelihood. The fact that the CV was higher for the youngest and oldest cohorts is consistent with epidemiological studies showing genetic heterogeneity tends to be exhausted by age 85 and that environmental or behavioral exposures tend to accumulate with age. The stability, or slight increase, in the CV over age (within cohort) suggests the average risk tended to have the same rate of decline as the standard deviation of individual risks, so that the relative heterogeneity of risks is preserved to late ages. Thus, there is an explicit, biological rationale for each component of this model. To select another model, where some components had less biological credibility, would require strong statistical evidence (i.e., a much better fit for the alternate model) before rejecting the model most consistent with the available prior information. Thus, the procedure is generally useful not only because of its ability to combine the types of data, each with specific limitations, that are often available (e.g., survey and vital statistics data), but also because it provides a logical basis for integrating prior scientific information, and for assessing the weight of that evidence, in choosing a specific model. This is
similar to Bayesian principles of inference except that the prior information here is embedded in a stochastic process model of the events as they occur over time. Thus, using a population model whose parameters are estimated from multiple data sources, we evaluated the mechanisms of cohort smoking risks. Use of the combined data produced fits of cohort mortality which were unbiased and with good precision. The ability to integrate data was due to the use of a global likelihood which is constructed initially to describe the situation as if there were no missing data. The parameters of the global likelihood were then constrained to reflect the portions of data missing in different data sets. This is different than piece-wise combination of parameter estimates made independently from different data sets. A piece-wise combination procedure can produce inconsistent results [4]. The results suggest that demographic analyses of mortality can be improved by using ancillary risk factor data from nationally representative health surveys.
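A toy version of the data-combining idea may help fix ideas. Vital statistics record only total cohort mortality, while a survey supplies smoking prevalence; a model written for the complete data (smoking status and death for every person) links the two. The sketch below is a simplification made for illustration and is not the authors' likelihood: it assumes homogeneous risk within smoking groups, a Gompertz hazard for nonsmokers, a constant relative risk for smokers, a constant cessation rate, and that quitters return immediately to nonsmoker risk. All names and values are hypothetical.

```python
import math

def project_cohort(p0, beta, alpha, rr, delta, years):
    """Yield (year, smoking prevalence among survivors, cohort hazard).

    p0    -- initial smoking prevalence (as a survey might supply)
    beta  -- Gompertz scale for nonsmokers
    alpha -- Gompertz shape for nonsmokers
    rr    -- relative risk of death for current smokers
    delta -- annual cessation rate
    """
    smokers, others = p0, 1.0 - p0
    for year in range(years + 1):
        mu = beta * math.exp(alpha * year)  # nonsmoker hazard in this year
        alive = smokers + others
        cohort_hazard = (smokers * rr * mu + others * mu) / alive
        yield year, smokers / alive, cohort_hazard
        # One-year update: mortality first, then cessation among survivors.
        surviving_smokers = smokers * math.exp(-rr * mu)
        quitters = surviving_smokers * (1.0 - math.exp(-delta))
        others = others * math.exp(-mu) + quitters
        smokers = surviving_smokers - quitters

for year, prevalence, hazard in project_cohort(0.5, 0.005, 0.09, 2.0, 0.03, 30):
    if year % 10 == 0:
        print(year, round(prevalence, 3), round(hazard, 4))
```

In this sketch prevalence among survivors falls faster than cessation alone would imply, because smokers are also removed by their higher mortality. A cessation rate read directly from prevalence trends would therefore need a correction for mortality, which is why the parameters have to be estimated jointly.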
REFERENCES
1. M.E. Marenberg, N. Risch, L.F. Berkman, B. Floderus and U. de Faire, Genetic susceptibility to death from coronary heart disease in a study of twins, New England Journal of Medicine 330, 1041-1046 (1994).
2. S. Marriotti, P. Sansoni, G. Barbesino, P. Caturegli, D. Monti, A. Cossarizza, T. Giacomelli, G. Passeri, U. Fagiolo, A. Pinchera and C. Franceschi, Thyroid and other organ-specific autoantibodies in healthy centenarians, The Lancet 339, 1506-1508 (1992).
3. C. Brown and L. Kessler, Projections of lung cancer mortality in the United States: 1985-2025, Journal of the National Cancer Institute 80, 43-51 (1988).
4. R.T.J. Little and M.D. Schluchter, Maximum likelihood estimation for mixed continuous and categorical data with missing values, Biometrika 72, 497-512 (1985).
5. K.G. Manton, H.D. Tolley, G.R. Lowrimore and A.I. Yashin, Combining multiple sources of data in cohort analyses, Working Paper No. M574, Duke University, Center for Demographic Studies, (1995).
6. D.R. Brillinger, The natural variability of vital rates and associated statistics (with discussion), Biometrics 42, 693-734 (1986).
7. K.G. Manton, E. Stallard and J.W. Vaupel, Alternative models for the heterogeneity of mortality risks among the aged, Journal of the American Statistical Association 81, 635-644 (1986).
8. A.I. Yashin, Dynamics of survival analysis: Conditional Gaussian property versus Cameron-Martin formula, In Statistics and Control of Stochastic Processes: Steklov Seminar, 1984, (Edited by N.V. Krylov, R.S. Lipster and A.A. Novikov), pp. 466-485, Optimization Software, New York, (1985).
9. C.E. Finch, Longevity, Senescence, and the Genome, University of Chicago Press, Chicago, IL, (1990).
10. G.A. Sacher, Life table modification and life prolongation, In Handbook of the Biology of Aging, (Edited by J. Birren and C. Finch), pp. 582-638, Van Nostrand Reinhold, New York, (1977).
11. N.R. Cook, S.A. Fellingham and R. Doll, A mathematical model for the age distribution of cancer in man, International Journal of Cancer 4, 93-112 (1969).
12. B. Rosenberg, G. Kemeny, L. Smith, I. Skurnick and M. Bandurski, The kinetics and thermodynamics of death in multicellular organisms, Mechanisms of Ageing and Development 2, 275-293 (1973).
13. A.C. Economos, Rate of aging, rate of dying, and the mechanisms of mortality, Archives of Gerontological Geriatrics 1, 3-27 (1982).
14. P. Armitage and R. Doll, Stochastic models for carcinogenesis, In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. IV, Biology and Problems of Health (Edited by J. Neyman), pp. 19-38, University of California Press, Berkeley, CA, (1961).
15. P. Armitage and R. Doll, The age distribution of cancer and a multistage theory of carcinogenesis, British Journal of Cancer 8, 12 (1954).
16. E.R. Fearon and B. Vogelstein, A genetic model for colorectal tumorigenesis, Cell 61, 759-767 (1990).
17. T. Sellers, J. Bailey-Wilson, R. Elston, A. Wilson, G. Elston, W. Ooi and H. Rothschild, Evidence for mendelian inheritance in the pathogenesis of lung cancer, Journal of the National Cancer Institute 82, 1272-1279 (1990).
18. M. Hollstein, D. Sidransky, B. Vogelstein and C. Harris, p53 mutations in human cancers, Science 253, 49-53 (1991).
19. S.D. Dubey, Some percentile estimators of Weibull parameters, Technometrics 9, 119-129 (1967).
20. P. Hougaard, Life table methods for heterogeneous populations: Distributions describing the heterogeneity, Biometrika 71, 75-83 (1984).
21. J.A. Nelder and R. Mead, A simplex method for function minimization, Computer Journal 7, 308-313 (1965).
22. A.J. Coale and M. Zelnik, New Estimates of Fertility and Population in the United States, Princeton University Press, Princeton, NJ, (1963).
23. J.S. Passel, J.S. Siegel and J.G. Robinson, Coverage of the national population in the 1980 census, by age, sex and race: Preliminary estimates by demographic analysis, In Current Population Reports, Ser. P-23, No. 155, U.S. Bureau of the Census, Washington, DC, (1982).
24. J.S. Siegel, Estimates of the coverage of the population by sex, race and age in the 1970 census, Demography 11, 1-23 (1974).
25. J.E. Harris, Cigarette smoking among successive birth cohorts of men and women in the United States during 1900-80, Journal of the National Cancer Institute 71, 473-479 (1983).
26. W.H. Wetterstrand, Parametric models for life insurance mortality data: Gompertz's law over time, Transactions of the Society of Actuaries 33, 159-175 (1982).
27. B. Efron and C. Morris, Data analysis using Stein's estimator and its generalizations, Journal of the American Statistical Association 70, 311-319 (1975).
28. R. Doll, R. Peto, K. Wheatley, R. Gray and I. Sutherland, Mortality in relation to smoking: 40 years' observations on male British doctors, British Medical Journal 309, 901-911 (1994).
29. I.J. Selikoff, Disability compensation for asbestos-associated disease in the United States, Report to the U.S. Dept. of Labor Environmental Sciences Laboratory, (Contract No. J-9-M-8-0165), Mount Sinai School of Medicine, New York, (June 1981).
30. J. Peto, H. Seidman and I.J. Selikoff, Mesothelioma incidence among asbestos workers: Implications for models of carcinogenesis and risk assessment calculations, British Journal of Medicine 45, 124-135 (1982).
31. K.G. Manton, E. Stallard, M.A. Woodbury and J.E. Dowd, Time varying covariates in stochastic multidimensional models of mortality and aging, Journal of Gerontology: Biological Sciences 49, B169-B190 (1994).
Gompertz
Risk of
Alodel
X’
2172.1
994.0
718.0
Age in
1950
30
35
40
2.6
1125.l
906.0
822.:3
GO
65
70
10142 7
3.2
1330.9
55
1.1
I.8
39
50
5.0
953.6
1120.7
45
6.9
9.9
13.3
Smokers
Relative
2. Gompertz
11.0
!I.6 9. I
9.6
10 3
11.2
12.5
13.7
13.9
( x 100)
Parameter)
(Shape
n
(Scale
3.56 2.58 3.38 3.50
7.5 7.5 7.7 8.1 8.3
10.0 10.4 9.4 7.7 7.1
X2 1867.0 1421.0 2120.6 2845.7 3699.7
- 0.002 - 0.006 - 0.014
- 0.074 - 0.107 2.85 5.52
8.7 8.5
6.4 8.1
3010.8 6006.7
-0.315 - 0.706
2.07
6.36
:T.x1
L.2d
11.42
9.83
10.22
7 32
1 88 3.20
5.14
1.03
3.22
2.26
.468
2.80
Parameter)
,211
(Scale
(x10”)
Smokers
P2
for nine U.S. white
,526
,265
327
,063
.ooo
.I50
,583
,695
.240
fi,
of Variation
(x
0.0
0.0
0.0
0.9
1.4
2.4
3.1
3.7
5.3
100)
6,
Average
2 27
I.80
1 855 I .78
1.77
1.56
1.58
1.63
2.31
4.19
(%)
Error
Relat,ive
Absolute
(%)
.169
,511
.623
- .232
- .440
‘.
.(I28
.25”
025
051
- ,028
,048
,042
,044
.056
- ,048
(“/o)
(Bias)
Error
Kelat ive
Average
(6,)
- .482
1.546
- ,827
- ,840
- ,104
aged 30 to 70 in 1950 with smoking
Coefficient
male cohorts
3.77
3.21
8.2
26963.1
2.40
8.4
7.4 9.1
3883.8
- 0.360
2107.8
- 0.032
(“lo) 4.94
(XTOO)
(x105)
(x100)
,228
Parameter)
(x105)
Nonsmokers
PI
mortality
ml.93
6.22
(N = 1.0) model
of total
1 265
3.19 0.534
7.4
-2.733
7.6
24.5
29.2
- 1.151
4.81
(x100)
6.75
4.16
5.49
7.3
7.1
5.85
7.4
7.2
6.72
7.4
7.7
34.9
31.7
26.8
23.8
22.7
Cohort
Table
81590.7
2690.6
6314.6
70
0.614
65
0.303
- 1.747
8023.3
60
- 0.989
8113.7
55
0.279
- 1.327
12313.5
50
0.179
- 1.594
12197.7
45
0.114
2.083
11397.1
40
0.094
18.3
8.86
- 3.008
13746.0
35
0.047
(%) 2.053
7.8
(X0)
15.8
6.84
X2
6794.2
30
(x 100)
(Bias)
81
Error
(x105)
1950
Error
Error
Error
a
Age in (Bias)
Error
Relative
Absolute
Error
Average
Relative
Females
Cohort
White
Relative
U.S. mortality.
Absolute
PI
of total
Relative
models
Absolute Average
specific
Average Average
and cohort
Average
Males
1. Gender
Average
White
Table
0
01u
0.003
Il.001
0.001
o.no1
0.000
0.000
0.000
0.000
( x 100)
Error
Absolute
Average
cessation.
0.735
0.357
0.236
0.107
0.096
0.052
0.033
0.022
0.017
0
001
0 000
0.000
~~0.000
- 0.000
- 0.000
- 0.000
- 0.000
- 0.000
(x100)
Errol
Average
-0.329
-0.167
- 0.092
0.007
0.008
0.004
0.002
0.000
0.000
(x100)
Error
(x 100)
Average
Error
Average Absolute
5.0
3.9
3.2
2.6
1.8
1.9
954.0
1120.7
1330.9
1168.7
918.9
2587.9
45
50
55
60
65
70
3.9
3.2
2.6
1.8
1.1
953.3
1120.7
1330.9
1109.9
902.4
680.3
45
50
55
60
65
70
9952.1
5.0
702.1
40
6.6
9.1
980.4
35
10.3
9.1
9.5
9.6
10.3
11.2
12.5
13.6
3.22
6.45
3.98
3.20
1.88
1.03
,488
,255
,212
13.3
2172.1
30
13.9
(x105)
(x:00)
Smokers
Pl
X2
Relative Nonsmokers
1950
1.18
5.98
3.53
3.22
1.88
1.03
2.22
10.76
9.06
10.26
.602
.296
.383
.060
.ooo
.I43
5.15 7.31
.576
,668
.237
fit
of Variation
Coefficient
3.12
2.13
2.80
(x105)
Smokers
L32
0.0
0.0
0.0
0.9
1.4
2.4
3.2
3.9
5.3
(x 100)
6,
2.72
4.21
1.79
1.55
1.77
1.56
1.58
1.65
2.34
4.19
(%)
Error
- 0.000 - 0.000 - 0.000
0.000 0.000 0.000 0.000 0.001 0.001 0.001
- .048 .054 ,042 ,041 ,048
-
.38(1
- 1.715
- .057
-.lOO
- ,028
- 0.000 - 0.005
0.003 0.008
- 0.000
-0.000
- 0.000
- 0.000
Error (x100)
(x100)
(Bias) (W
Average
Average Absolute Error
Error
Relative
Relative
,241 .735 ,597 ,154 .ooo .068 .307 .251 .444
2.31 3.21 5.14 7.31 10.20 10.17 11.65 3.55
fi,
of Variation
Coefficient
2.80
(x105)
Smokers
02
0.0
0.0
0.0
0.9
1.4
2.4
3.1
3.6
5.3
(x100)
6,
-
2.26
1.64
1.78
1.54
1.77
1.56
1.59
1.62
2.29
4.19
(W
Error
-
- 0.000
- 0.000 - 0.000 0.000 - 0.000
0.000 0.000 0.000 0.000 0.001 0.001 0.001 0.003 0.002
- ,048 ,063 ,047 .042 ,048 - ,027
- .006
- ,071
-.018
- ,038
-0.000
- 0.000
- 0.000
- 0.000
Error (x100) (%)
(Bias)
(x100)
Average
Average Absolute Error
Error
Relative Relative
Average
Average Absolute
(N = 0.5) model of total mortality for nine U.S. white male cohorts aged 30 to 70 in 1950 with smoking (δ_c) cessation.
11.5
9.2
9.7
9.6
10.3
11.2
,430
.196
.210
(x105)
Nonsmokers
Pl
Risk of
Model
Age in
Cohort
Table 4. Gompertz
12013.2
7.3
744.4
40
12.6
13.9
10.9
35
(x YOO)
1015.5
2172.2
30
Smokers
13.9
X2
1950
Risk of
Relative
Average
Average Absolute
(N = 2.0) model of total mortality for nine U.S. white male cohorts aged 30 to 70 in 1950 with smoking (δ_c) cessation.
13.3
Model
Age in
Cohort
Table 3. Gompertz
5.4
4.2
3.3
2.2
9.4
12.7
3163.2
1766.7
1426.1
965.5
587.4
45
50
55
60
65
70
12391.4
a.2
537.7
1768.3
40
13.8
787.5
35
12.2
10.7
9.3
9.1
9.7
9.9
10.7
12.3
.54
1.55
4.02
3.66
2.22
1.73
.96
.32
.21
20.0
1389.0
30
13.1
(x105)
Smokers
X2
1950
(ZOO)
Nonsmokers
Risk of
Model
Age in
Pl
Relative
6.84
14.56
8.88
12.11
9.23
9.37
7.90
4.45
4.27
(x105)
Smokers
a
,580
.437
,351
.045
.Oal
,000
.OOO
,388
,424
fi,
of Variation
Coefficient
0.0
0.0
0.0
0.0
2.5
4.1
4.8
4.9
5.7
(x100)
6,
~
2.75
1.40
1.62
1.81
2.27
3.38
2.98
1.82
2.72
4.31
(%)
Error
-
- ,014
.077
-.004
- ,005
,017
,012
,020
,006
- ,031
-.150
(%)
(Bias)
Error
Relative
Relative
Average
Average Absolute
Error
0.002
0.002
0.001
0.001
0.001
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
-0.000
- 0.000
0.000
(x100)
Error (x100)
Average
Absolute
Average
(N = 1.0) model of total mortality for nine U.S. white female cohorts aged 30 to 70 in 1950 with smoking (δ_c) cessation.
Cohort
Table 5. Gompertz
45
45
50
50
55
55
60
60
65
65
70
70
LCM
OM
LCM
OM
LCM
OM
LCM
OM
LCM
OM
LCM
OM
171.1
30
30
25
25
25
25
871.3
23.8
951.9
26.2
1212.9
92.8
1498.0
25
25
1307.6
115.3
1007.0
260.1
757.4
258.3
20
20
20
20
20
20
4 Value in parentheses
mixed Weibull
is not changed.
is taken to the l/(a
1.0
1.9
1.5
1.6
2.4
1.7
2.9
1.6
3.5
1.5
4.4
1.4
6.6
1.5
11.2
1.1
15.0
1.2
- 1) power.
(153.5)
(12.0)
(9.4)
(7.7)
(9.4)
(9.2)
(11.0)
(2.4)
(4.2)4
Smokers
Risk of
Relative
is raised to 01 - 1 power; the relative
the same for both LCM and OM.
scaJe parameter
3 6 is assumed
2 Gompertz
1 Weibull scale parameter
10602.6
40
OM
47.4
926.6
OM
40
LCM
15
15
1053.7
35
OM
2069.9
LCM
35
LCM
5
5
30
58.7
Years
1950
OM
X2
in
Age in
30
Model
Latency
Cohort
LCM
of gamma
risk of smoking
11.0
8.7
9.0
6.3
9.4
5.3
9.5
5.5
10.0
6.6
10.7
7.0
12.3
7.1
13.7
7.6
13.9
10.1
(OM is x100)
Q
Pl
all stages
62.7
631.3
70.3
498.1
46.8
376.8
38.0
448.2
16.8
571.0
11.7
671.1
6.21
717.1
1.58
864.1
.3782
937.71
(x105)
affected
mortality
and Gompertz
a
.ooo
,000
in carcinogenesis.
.528
5.721
1212.8 62.7
,293
3.754
,323
1.594
.OOO
.ooo
.OoO
SKI0
.OOO
108.5
795.4
111.0
635.5
111.7
707.8
58.5
852.7
51.8
972.4
.632
1060.8 40.8
,689
3.841
,313
4.989
fi,
of Variation
17.8
987.6
5.69
1098.8
(x105)
Smokers
Coefficient
0.0
0.0
0.0
1.0
1.5
2.5
3.4
4.1
5.63
(x100)
6,
mortality
(Bias)
Error
,042
2.78 2.27
2.95 1.86
.286%
- .315 - .263
1.83
- .022%
.014
2.62
2.29%
- .534
1.64
3.10%
- ,671 - .026
3.19
- ,757
1.75
-.007
,050
2.00
1.91
- ,075
1.69
3.42
.550 .021
2.99
.037
1.72
1.19
.534
4.20
3.56
,799 - .062
3.09
-
Error
-
Relative
Relative
(%)
Average
Average
(%)
for
Error
.OOO
.018
,000 - .OOL .015
.009 ,152 .QO6 ,279
,272
- .065
.OOO
- ,002
.I40
.005
- ,002 -.OOl
,011
.002 074
.OOl - .OOl ,006
.OOO .007 .049
.ooO .026
.ooO
.OOO
.OQl
.006
.OOO .OOo
.OOl .021
(x100)
Error (x100)
Average
Absolute
Average
and the second to other mortality.
Absolute
mortality
(N = 1.0) model of all other
The first line in each pair refers to lung cancer
Nonsmokers
(6) cessation.
(N = 1.0) model of lung cancer
aged 30 to 70 in 1950 with smoking
Table 6. Combination
nine U.S. white male cohorts
of gamma
mixed Weibull
78.8
1355.4
48.0
769.7
323.0
551.5
442.8
1647.4
297.7
3164.1
5
5
5
5
15
15
15
15
15
15
30
30
35
35
40
40
45
45
50
50
LCM
OM
LCM
OM
LCM
OM
LCM
OM
LCM
OM
20
4 Value in parentheses
12.8
1.0
9.4
2.0
2.2
1.0
3.4
1.0
3.7
1.5
3.3
1.6
8.2
1.5
15.7
1.1
22.7
1.2
is not changed.
is taken to the l/(a
scale parameter
- 1) power.
(11.3)
(21.9)
(52.4)
(17.6)
(1.7)
(7.2)4
12.2
3.5
10.7
4.5
9.3
4.5
9.2
5.7
9.5
8.1
9.1
9.4
10.6
8.5
12.5
8.7
13.2
9.9
(OM is x100)
a
is raised to a - 1 power: the relative risk of smoking
3 6 is assumed the same for both groups.
’ Gompertz
1 Weibull scale parameter
* SCassumed
1437.5
to be zero.
41.4
589.9
12251.0
20
973.7
40.4
1436.2
OM
70
70
LCM
OM
20
20
20
LCM
65
65
LCM
60
OM
OM
60
LCM
56.5
15
OM
20
108.9
1763.1
15
55
55
LCM
X2
Years
1950
Smokers
Risk of
in
Age in
Model
Relative
Latency
Cohort Pl
(x105)
affected
76.5
69.4
120.5
356.3
56.0
194.9
47.3
330.3
39.4
829.3
45.0
1071.5
40.3
1003.9
7.93
743.5
8.14
1017.0
(x105)
Smokers
&
,583
2.770
.439
,000
,351
2.795
.Ooo
4.774
.ooO
.OOO
.OOO
.OOo
,000
,000
,404
5.500
.460
6.559
fi,
of Variation
Coefficient
all stages in carcinogenesis
5.96
69.4
12.8
178.8
25.4
194.9
13.9
330.3
10.6
537.4
13.5
667.7
4.91
683.8
.506
695.7
.3592
815.1’
and Gompertz
0.0’
0.0
0.0
0.0
2.8
5.8
4.9
5.1
5.S3
(x100)
6,
-
2.70%
8.00%
1.41
6.03
1.63
4.75
1.82
4.47
2.28
5.38
3.39
8.36
2.69
10.12
1.82
7.95
2.72
4.04
4.29
6.22
(%)
Error
-
-.003%
1.04%
,078
-.271
-.004
-.561
-.006
-.731
,020
-.317
.022
.432
,023
1.961
.016
1.510
-.035
.726
-.147
1.163
(%)
(Bias)
Error
Average Relative
Average Absolute Relative
for
.202
.002
,202
,002
,137
,002
,080
,002
.076
,003
.041
,003
,015
.003
.012
,001
.012
,001
,032
,000
,010
.OOO
,016
-.OOl
,005
.OOO
.003
,000
,001
,000
,000
,000
.OOO
,000
.OOO
.OOO
(x100)
Error
Error (x 100)
Average
Absolute
Average
and the second to other mortality.
(N = 1.0) model of all other mortality
The first line in each pair refers to lung cancer mortality
Nonsmokers
(6) cessation.
(N = 1.0) model of lung cancer mortality
aged 30 to 70 in 1950 with smoking
Table 7. Combination
nine U.S. white female cohorts