Analyses of cohort mortality incorporating observed and unobserved risk factors


Math. Applic. Vol. 25, No. 7, pp. 89-107, Computers 1997 Copyright@1997 Elsevier Science Ltd Printed in Great Britain. All rights reserved 0898-122...


Computers Math. Applic. Vol. 25, No. 7, pp. 89-107, 1997
Copyright © 1997 Elsevier Science Ltd. Printed in Great Britain. All rights reserved.
0898-1221/97 $17.00 + 0.00
PII: S0895-7177(97)00051-4

Pergamon

Analyses of Cohort Mortality Incorporating Observed and Unobserved Risk Factors

K. G. MANTON*
Center for Demographic Studies, Duke University, 2117 Campus Drive, Box 90408, Durham, NC 27706-0408, U.S.A.

G. LOWRIMORE
Center for Demographic Studies, Duke University, 2117 Campus Drive, Durham, NC 27706, U.S.A.

A. YASHIN
Odense University, Institute of Community Health, J.B. Winslows Vej 17, DK-5000 Odense, Denmark

H. D. TOLLEY
Department of Statistics, Brigham Young University, Rm. 226 TMCB, Provo, UT 84602, U.S.A.

(Received May 1996; accepted July 1996)

Abstract: Interventions to prevent disease and increase life expectancy are most effectively developed from data on pathways to disease and death. Unfortunately, most national data sets separate end-state information (i.e., cause-specific mortality) from pathway data describing how specific diseases result from environmental and behavioral processes. Thus, a coherent empirical picture of routes to death from a diversity of causes requires a data combining and modelling strategy that, of necessity, incorporates theory and prior-knowledge-based assumptions together with sensitivity analyses to assess the stability of conclusions. In this paper, a general data combining statistical strategy is presented and illustrated for smoking behavior and lung cancer mortality. Specifically, National Health Interview Survey data on smoking are combined with U.S. vital statistics data for 1950 to 1987 to analyze the joint distribution of total and lung cancer mortality. Parameters were estimated for mortality, smoking cessation processes, and individual risk heterogeneity for nine U.S. white male and female cohorts aged 30 to 70 in 1950 and followed until 1987.

Keywords: Gompertz hazard, Weibull hazard, Dubey distribution, Cohort mortality, Smoking cessation.

1. INTRODUCTION

Mortality data is useful in making national estimates of the effects of health interventions for preventing disease and increasing life expectancy. In particular, U.S. mortality data has the advantages that the number of deaths is large, data on all causes of death are reported, data on individual deaths are available back to 1950, and all ages, population groups, and geographic areas are represented. However, the effects of health interventions can be better assessed when information on the pathways to disease, and then to death, is used. Unfortunately, extant national mortality data, though providing information on end states (e.g., cause specific mortality), do not provide pathway information describing how particular diseases result from environmental and behavioral processes, e.g., mortality data lack measures of risk factors, and health events, before death.

Hazard functions estimated assuming population risk homogeneity do not unbiasedly describe the age dependence of the hazard for an individual in a population which is risk heterogeneous. Epidemiological studies show individual risks vary due to differences in smoking, nutrition, physical activity, and environmental and workplace exposures. Studies of monozygotic twins show mortality selection operates strongly from ages 35 to 85 (e.g., [1,2]). Epidemiological study populations, however, are generally not nationally representative and may not have enough follow-up to describe cohort differences. Cross-sectional health surveys, though nationally representative, may confound cohort, period, and age effects [3]. Thus, a coherent empirical picture of routes to death from a diversity of causes requires a data combining and modeling strategy that, of necessity, incorporates theory and prior knowledge based assumptions.

In this paper we present a general statistical data combining strategy and illustrate it in the context of smoking behavior and lung cancer mortality. Technically, the information limitations of different data sets can be dealt with by combining them in a "global" likelihood so parameters can be estimated from data on the joint distribution of individual processes parameterized from biological models and on the population distribution of risks. Thus, limitations of one data set are compensated for by using a global likelihood to "borrow" information from other data sets. To construct the global likelihood [4,5], the age dependence of cohort mortality is assumed related to the distribution of individual risks, measured on observed and unobserved risk factors, mortality selection, and the age dependence of hazards for individuals. By controlling for observed and unobserved heterogeneity of individual risks in a cohort, one can estimate individual transition rates unbiased by mortality selection. Thus, hazard function parameters can be partly adjusted for risk heterogeneity using national health survey data and partly adjusted for the theoretical, or indirectly observed, distribution of individual risks net of directly observed risk factors.

We applied this model to survey data on smoking and to U.S. white male and female total and lung cancer mortality for cohorts aged 30 to 70 in 1950 followed until 1987. We assess the fit of the model, and the stability of parameter estimates, to see how much information is gained by combining data sets.

*Author to whom all correspondence should be addressed. This research was supported by NIA Grants AG01159, AG07025, and NIH/NIA Grant PO1 AG08791-01. The authors thank anonymous referees for valuable comments.

2. THE MODEL

As suggested by Little and Schluchter [4], to integrate data describing a common health process, it is necessary first to form a global likelihood function. To do this, we first specify the survival distribution for a cohort or,

$$S(x, \xi, z) = \Pr\{X > x \mid \xi, z\} = e^{-H^*(x, \xi, z)}, \tag{1}$$

where $X$ is the age at death, $\xi$ is the vector of parameters for the hazard function $\nu$, $z$ is a vector of risk factors, $H^*(x, \xi, z) = \int_0^x \nu(u, \xi, z)\,du$ is the cumulative hazard to $x$, and $\nu(u, \xi, z)$ the instantaneous hazard. Parameters can be estimated assuming that we have complete information on that cohort using a mixed Poisson likelihood of the form (2), where $d_{xz}$ = number of deaths at age $x$ and risk level $z$, $N_x$ is a mid-year population estimate, $H^*$ is the cumulative hazard, and $f(x, t, z)$ is the distribution function for individual risk factors which mixes the Poisson hazard rates for individuals age $x$ at time $t$ with risk factor values $z$ [5]. In (2), we assume the risk factors $z$ define discrete groups. The factorization of (2) into a hazard, conditional on risk factor data, and a marginal risk factor distribution is an assumption that will need to be evaluated in respecifying (2) to correspond to available data. For example, if the risk factors $z$ are measured at multiple time points, then the distribution function $f(x, t, z)$ could be a multivariate stochastic process with the Poisson distribution of events estimated conditionally on the values of the outcomes of that process. If some risk factors are unmeasured, then the cumulative hazard $H^*$ has to be integrated across the distribution of unobserved risk factors, an integration which is feasible if the distribution of the effects of the unobserved risk factors on individual hazards can be specified from theory. If we have a marginal distribution (e.g., smoking) proportion from a cross-sectional survey, then $f$ is known for a point in time. If, however, mortality alters the distribution function $f$ over time, those dynamics must be included in the distribution in $H^*$. Thus, the available data will determine the requisite constraints so that the parameters to be estimated are mathematically identifiable.

A Poisson likelihood is used to describe the conditional distribution of health outcomes because national populations are not "closed" [6]. Consequently, uncertainty in annual mid-year population estimates (due to mortality fluctuations in the first half of the year) makes the variance of rate estimates at least Poisson, i.e., variability in the first half of the year is negative binomial; in the second half, binomial [7]. The convolution of the two distributions produces at least Poisson variation. Additional variation may be added depending on how $\nu$ is specified to change over age, or after conditioning on known fixed risk factors.

Equation (2) assumes all risk factors are observed. In fact, in mortality data for partial cohorts, many risk factors are either unobserved, or only parameters of their distribution are observed. To deal with partially observed, or unobserved, risk factors, (2) has to be reparameterized. To account for the partially observed data on smoking in our example, we constructed $M$ mutually exclusive risk groups. If individual transitions from the alive to the dead state, or from one risk factor state to another, are roughly constant over short intervals, the hazard governing the survival of a risk heterogeneous cohort is the weighted average of the hazard for each of the $M$ groups [8]. In (2), the risks for $M$ groups can be represented by the average risk. If this average can be calculated, then the effect of the distribution of smoking can be transferred from $f$ to the integrated hazard $H^*$. If the hazard for the $m$th group is $\nu_m(x, \xi)$, this average is

$$\bar{\nu}(y + t, \xi) = \sum_{m=1}^{M} \pi_m(t, \xi)\, \nu_m(y + t, \xi), \tag{3}$$

where $\pi_m(t, \xi)$, the conditional probability a person age $y + t$ is in group $m$, is estimated from a national health survey, i.e., for observations beginning at $y$, and ending at $t$,

$$\dot{\pi}_m(t, \xi) = -\left(\nu_m(y + t, \xi) - \bar{\nu}(y + t, \xi)\right) \pi_m(t, \xi), \quad m = 1, \ldots, M, \tag{4}$$

where $\pi_m(0, \xi) = q_m$ is the initial proportion in $m$ at age $y$ estimated from the survey. Changes in $q_m$ (where $\sum_{m=1}^{M} q_m = 1$) are due both to the selection effects of the hazard $\nu_m$ on the distribution of risks in each state, and on relative risks between states.

To solve (3) and (4), we must specify how the average risk in group $m$ changes as a function of the age dependence of mortality for the $j$th cause. This requires specifying an age dependent parametric form for $\nu$. One function often used to describe the age dependence of adult mortality [9] is the Gompertz, i.e.,

$$\nu(x, \xi_G) = \beta \exp(\alpha x), \tag{5}$$

where $\xi_G = (\alpha, \beta)$ are shape ($\alpha$) and scale ($\beta$) parameters; $\alpha$ describes annual percent increases in mortality [10], and $\beta$ describes initial ($x = 0$) mortality rates. For some diseases (e.g., cancer [11]), or possibly total mortality at late ages [12,13], a Weibull better describes age dependence, i.e.,

$$\nu(x, \xi_W) = \phi x^{a-1}. \tag{6}$$
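The two hazard forms in (5) and (6) can be made concrete with a short numerical sketch. The function names and the parameter values below are illustrative assumptions, not estimates from this paper, and the Weibull is written as $\phi x^{a-1}$ following the reading of (6) used here. The cumulative hazards are the closed-form integrals from age 0.

```python
import math


def gompertz_hazard(x, alpha, beta):
    """Gompertz hazard, equation (5): beta * exp(alpha * x)."""
    return beta * math.exp(alpha * x)


def gompertz_cumulative_hazard(x, alpha, beta):
    """Integral of (5) from 0 to x: (beta / alpha) * (exp(alpha * x) - 1)."""
    return (beta / alpha) * math.expm1(alpha * x)


def weibull_hazard(x, a, phi):
    """Weibull hazard, equation (6) as read here: phi * x**(a - 1)."""
    return phi * x ** (a - 1.0)


def weibull_cumulative_hazard(x, a, phi):
    """Integral of (6) from 0 to x: phi * x**a / a."""
    return phi * x ** a / a


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    alpha, beta = 0.075, 1.0e-4
    a, phi = 6.0, 1.0e-12
    for age in (40, 60, 80):
        print(age, gompertz_hazard(age, alpha, beta), weibull_hazard(age, a, phi))
    # With a Gompertz, mortality doubles every ln(2) / alpha years.
    print("Gompertz doubling time (years):", math.log(2.0) / alpha)
```

The Gompertz hazard grows by a constant factor $e^{\alpha}$ per year of age, whereas the Weibull hazard grows as a power of age, which is why the two forms diverge most at late ages.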


In using (6) to model carcinogenesis, tumor growth starts after the $a$th genetic error disrupts cell growth [14,15]. Laboratory and clinical studies confirm the validity of the Weibull for solid tumors (e.g., for colon cancer [16] and lung cancer [17]) by identifying specific genetic errors (e.g., in the p53 growth regulation gene [18]) causing neoplastic growth. In (6), the lag between the time when accumulated genetic damage triggers neoplastic growth, and the time when the process reaches a lethal threshold (e.g., a host's ability to metabolically support the tumor is exceeded) is the "latency" of the process. While latency affects the Weibull shape parameter, it affects only the Gompertz hazard scale parameter. A Gompertz with a latency of $l$ years is

$$\nu(\alpha, \beta, x, l) = \beta^* e^{\alpha(x - l)} = \beta^* e^{-\alpha l} e^{\alpha x}, \tag{7}$$

so that

$$\beta = \beta^* e^{-\alpha l}. \tag{8}$$

That is, lag effects do not change $\alpha$, but are subsumed in $\beta$, so the Gompertz shape parameter (and form of age dependence) is not altered. This is important when modeling partial cohort data where unobserved risk heterogeneity is due to either genetic factors, or risk factor exposures accumulated before the first age evaluated for a given cohort (i.e., left censoring of risk factor exposures). Thus, for a cohort age 70 in 1950, there may be more individual risk variation due to accumulated exposures than for a cohort age 50 in 1950. Alternatively, for a cohort aged 70 in 1950, individual genetic heterogeneity may be lost before the first observed age due to mortality selection. Modeling heterogeneity helps adjust other transition parameters for changes in the population distribution of risks. Such differences are absorbed in the Gompertz scale parameter, whereas the value of the Weibull shape parameter is sensitive to such effects. Epidemiological studies show that in modeling mortality to late ages, the heterogeneity of unobserved risks for specific diseases has important effects.

The Weibull and Gompertz hazards in (5) and (6) do not appropriately describe the age trajectory of the average risk in a population when the population is heterogeneous. Thus, we generalize (5) and (6) to represent risk heterogeneity by reparameterizing those functions to reflect a flexible parametric family of distributions and evaluate how results change for different members of the family. The same distribution and hazard is assumed to exist in each of $M$ risk groups for each of $J$ causes. If individuals in group $m$ are subject to a hazard conditional on an unobserved distribution of levels of risk $z^*$, the group hazard is a mixture over that distribution. The Dubey distribution was selected for $z^*$ [19]. This produces a composite Gompertz (9) or, for the Weibull, a composite Weibull (10).

In both (9) and (10), $\gamma$ is the square of the coefficient of variation (CV) at age 0 of $z^*$. Different risk distributions are selected by changing $n$. The mean of the frailty distribution decreases with age, no matter which value of $n$ is chosen. When $n = 2$, $z^*$ is inverse Gaussian distributed [20], and the CV decreases with age. When $n = 1.0$, $z^*$ is gamma distributed, which has a constant CV. The denominators in (9) and (10) reduce the age rate of increase in mortality after a significant proportion of the population (i.e., enough to remove many high risk persons) dies. This is consistent with the observation, in population studies, that mortality rises more slowly with age than the Gompertz (or the Weibull) above age 80. Deviations from the Gompertz (or Weibull) increase as heterogeneity increases [7].

Either (9) or (10) can be applied to the $m$ risk groups comprising each cohort, each with age dependent hazards with the same shape, but different scale, parameters. Specification of the heterogeneous population form of the Weibull or Gompertz is necessary to integrate the $\nu_m$ in the $m$ groups to solve (3) or (4). The general solution for (4) with a cumulative hazard for each of $J$ causes of death is

$$\pi_m(t, \xi) = \frac{q_m \exp\left\{-\sum_j H_{mj}(t, \xi)\right\}}{\sum_{m=1}^{M} q_m \exp\left\{-\sum_j H_{mj}(t, \xi)\right\}}. \tag{11}$$
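Since the displayed forms of the composite hazards (9) and (10) are not legible in this copy, the following is only a sketch under a stated assumption: that the composite hazard takes the familiar frailty-mixture form $\nu / (1 + n\gamma H)^{1/n}$, where $\nu$ is the baseline hazard and $H$ its cumulative hazard. For $n = 1$ this form agrees with the gamma-mixed survival expressions in (12) and (13), and it has a denominator that slows the rise of mortality with age, as the text describes. The function names are hypothetical.

```python
import math


def composite_hazard(nu, H, gamma, n=1.0):
    """Assumed composite hazard: nu / (1 + n * gamma * H) ** (1 / n).

    nu is the baseline hazard at the current age, H its cumulative hazard,
    gamma the squared CV of unobserved risk at the starting age, and n the
    index selecting the mixing distribution (1: gamma, 2: inverse Gaussian).
    """
    return nu / (1.0 + n * gamma * H) ** (1.0 / n)


def composite_cumulative_hazard(H, gamma, n=1.0):
    """Integral of the assumed composite hazard, expressed in terms of H."""
    if abs(n - 1.0) < 1e-12:
        return math.log1p(gamma * H) / gamma
    return ((1.0 + n * gamma * H) ** (1.0 - 1.0 / n) - 1.0) / (gamma * (n - 1.0))


if __name__ == "__main__":
    # Hypothetical Gompertz baseline; gamma = 0.240 ** 2 uses the CV the
    # paper reports for the first male cohort, purely as an illustration.
    alpha, beta, gamma = 0.139, 1.0e-4, 0.240 ** 2
    for t in (0.0, 20.0, 37.0):
        nu = beta * math.exp(alpha * t)
        H = (beta / alpha) * math.expm1(alpha * t)
        print(t, nu, [composite_hazard(nu, H, gamma, n) for n in (0.5, 1.0, 2.0)])
```

Under this assumed form the composite hazard equals the baseline at the starting age and falls increasingly below it as the cumulative hazard grows, with the size of the gap controlled by $\gamma$ and $n$.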

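The distribution of persons in the $m$ groups, $\pi_m(t, \xi)$ from (11) with $\nu_m$ specified as the composite Gompertz in (9) is

$$\pi_m(t, \xi_G) = \frac{q_m \exp\left[-(1/\gamma_G) \ln\left((1 - (\gamma_G \beta_m/\alpha)) + (\gamma_G \beta_m/\alpha) e^{\alpha t}\right)\right]}{\sum_{m=1}^{M} q_m \exp\left[-(1/\gamma_G) \ln\left((1 - (\gamma_G \beta_m/\alpha)) + (\gamma_G \beta_m/\alpha) e^{\alpha t}\right)\right]}. \tag{12}$$

For the composite Weibull in (10), it is

$$\pi_m(t, \xi_W) = \frac{q_m \exp\left[-(1/\gamma_W) \ln\left(1 + (\gamma_W \phi_m t^{a}/a)\right)\right]}{\sum_{m=1}^{M} q_m \exp\left[-(1/\gamma_W) \ln\left(1 + (\gamma_W \phi_m t^{a}/a)\right)\right]}. \tag{13}$$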
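As a minimal sketch of how (12) behaves, the code below evaluates the group proportions for two groups under a gamma-mixed Gompertz. The group order (nonsmokers, smokers), the nonsmoker scale parameter, and the function names are assumptions made for illustration; the 72% initial smoking prevalence, the relative risk of 13.3, $\alpha = 13.9\%$, and CV = 0.240 are figures quoted elsewhere in the paper for the youngest male cohort.

```python
import math


def gamma_gompertz_survival(t, alpha, beta, gamma):
    """Survival term inside (12):
    exp[-(1/gamma) * ln((1 - gamma*beta/alpha) + (gamma*beta/alpha) * e^(alpha*t))].
    """
    k = gamma * beta / alpha
    return math.exp(-math.log((1.0 - k) + k * math.exp(alpha * t)) / gamma)


def group_proportions(t, q, betas, alpha, gamma):
    """Equation (12): proportion of survivors in each group after t years."""
    weights = [qm * gamma_gompertz_survival(t, alpha, bm, gamma)
               for qm, bm in zip(q, betas)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    q = [0.28, 0.72]                     # nonsmokers, smokers at the index age
    beta_nonsmoker = 1.0e-4              # hypothetical scale parameter
    betas = [beta_nonsmoker, 13.3 * beta_nonsmoker]
    alpha, gamma = 0.139, 0.240 ** 2
    for t in (0, 10, 20, 30, 37):
        print(t, group_proportions(t, q, betas, alpha, gamma))
```

Because smokers carry the larger scale parameter, their share among survivors shrinks with follow-up even without any smoking cessation; this is the mortality selection effect that (11) to (13) are meant to capture.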



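In both (12) and (13), $n$ was assumed to be 1.0. Other values of $n$ can be evaluated by changing (9) or (10). If $\xi_W = (\phi_1, \phi_2, \ldots, \phi_m, \gamma, a)$ and $\xi_G = (\beta_1, \beta_2, \ldots, \beta_m, \gamma, \alpha)$, the hazard for a population with $m$ groups is the average, weighted by $\pi_m$, of the group hazards, where $\nu_{Gm}$ and $\nu_{Wm}$ are hazards for the $m$th group for the composite Gompertz (9) or Weibull (10), respectively.

If individuals move between groups, additional terms $\delta_{lm}(t)$, $l, m = 1, 2, \ldots, M$ are needed ($\delta_{mm} = -\sum_{l \neq m} \delta_{ml}$ represent persons staying in the $m$th state) [8], or

$$\dot{\pi}_m(t) = \sum_{l=1}^{M} \delta_{lm}(t)\, \pi_l(t) - \pi_m(t)\left(\nu_m(y + t, \xi) - \bar{\nu}(y + t, \xi)\right). \tag{14}$$

When $M = 2$, the equations specialize to

$$\begin{aligned}
\dot{\pi}_1(t) &= -\left[\nu_1(y + t, \xi) - \bar{\nu}(y + t, \xi)\right] \pi_1(t) + \delta \pi_2(t), \\
\dot{\pi}_2(t) &= -\left[\nu_2(y + t, \xi) - \bar{\nu}(y + t, \xi)\right] \pi_2(t) - \delta \pi_2(t).
\end{aligned} \tag{15}$$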
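The two-group system (15) can be integrated numerically. The sketch below uses a classical fourth-order Runge-Kutta step and, as a simplifying assumption, plain Gompertz group hazards measured from the cohort's index age (no unobserved heterogeneity), with group 2 taken to be smokers. The scale parameters and function names are hypothetical; $\alpha = 13.9\%$, the relative risk of 13.3, and the cessation rate of 5.3% per year are figures quoted in the paper for the youngest male cohort.

```python
import math


def gompertz(beta, alpha):
    """Return a Gompertz hazard as a function of years since the index age."""
    return lambda t: beta * math.exp(alpha * t)


def derivatives(t, pi1, pi2, nu1, nu2, delta):
    """Right-hand side of (15); delta moves persons from group 2 to group 1."""
    v1, v2 = nu1(t), nu2(t)
    vbar = pi1 * v1 + pi2 * v2           # average hazard, as in (3)
    d1 = -(v1 - vbar) * pi1 + delta * pi2
    d2 = -(v2 - vbar) * pi2 - delta * pi2
    return d1, d2


def integrate_proportions(q1, q2, nu1, nu2, delta, t_end, steps=2000):
    """Integrate (15) from 0 to t_end with fixed-step RK4."""
    h = t_end / steps
    t, p1, p2 = 0.0, q1, q2
    for _ in range(steps):
        k1 = derivatives(t, p1, p2, nu1, nu2, delta)
        k2 = derivatives(t + h / 2, p1 + h / 2 * k1[0], p2 + h / 2 * k1[1], nu1, nu2, delta)
        k3 = derivatives(t + h / 2, p1 + h / 2 * k2[0], p2 + h / 2 * k2[1], nu1, nu2, delta)
        k4 = derivatives(t + h, p1 + h * k3[0], p2 + h * k3[1], nu1, nu2, delta)
        p1 += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p2 += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return p1, p2


if __name__ == "__main__":
    alpha = 0.139
    nonsmokers = gompertz(1.0e-4, alpha)          # hypothetical scale
    smokers = gompertz(13.3e-4, alpha)            # relative risk of 13.3
    for delta in (0.0, 0.053):
        print(delta, integrate_proportions(0.28, 0.72, nonsmokers, smokers, delta, 37.0))
```

With $\delta = 0$ the numerical solution should coincide with the closed form (11); a positive $\delta$ lowers the smoking proportion further, which is how mortality and cessation act as two simultaneous decrements on the initial smoking distribution.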

Equation (15) completes the reparameterization of (2) to represent both observed and unobserved risk factor distributions so it can be estimated from observed data. Thus, by appropriately specifying $\nu_m$ and integrating it to form $H^*$, the marginal risk factor distribution variation $f$ can be integrated with the variation of individual risks. The values in $\xi$, and the distribution $q_m$, are evaluated from (2) by computing $\pi_m$ from (4), for each value of $z$, and $\bar{\nu}(y + u, \xi)$ from (3), with the form of $\nu$ determined (e.g., (12) or (13)) by selecting a value of $n$. $H^*(x, \xi)$ is computed for each cohort by Simpson's numerical integration formula. A Nelder-Mead [21] algorithm for minimizing multivariate functions without derivatives was used because of the complexity of the derivatives. The negative of the log likelihood was the objective function. The solution for each cohort was evaluated independently, though cross-cohort parametric restrictions were imposed.
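The numerical steps named above can be sketched as follows: composite Simpson integration of a hazard to obtain a cumulative hazard, and a Poisson negative log likelihood that a derivative-free optimizer such as Nelder-Mead could minimize. This is an illustrative simplification, not the paper's likelihood (2): it assumes a single homogeneous Gompertz hazard, approximates expected deaths as population times the hazard integrated over one year of age, and uses hypothetical data and function names.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def expected_deaths(alpha, beta, population, ages):
    """Population at each age times the hazard integrated over that year."""
    hazard = lambda u: beta * math.exp(alpha * u)
    return [n * simpson(hazard, x, x + 1.0) for n, x in zip(population, ages)]


def poisson_negative_log_likelihood(deaths, expected):
    """Negative Poisson log likelihood, omitting the constant log(d!) term."""
    return -sum(d * math.log(mu) - mu for d, mu in zip(deaths, expected))


if __name__ == "__main__":
    ages = list(range(30, 68))
    population = [100000.0] * len(ages)           # hypothetical mid-year counts
    deaths = expected_deaths(0.075, 1.0e-4, population, ages)  # synthetic data
    for alpha in (0.065, 0.075, 0.085):
        nll = poisson_negative_log_likelihood(
            deaths, expected_deaths(alpha, 1.0e-4, population, ages))
        print(alpha, nll)
```

In this synthetic example the objective is smallest at the value of $\alpha$ used to generate the data, which is the behavior a simplex search over the parameters would exploit.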


Data

We analyzed U.S. white male and female cohort mortality for 1950 to 1987. We defined two ($M = 2$) risk groups: smokers and nonsmokers; and two ($J = 2$) causes of death: lung cancer and nonlung cancer deaths. National Center for Health Statistics mortality data for white males and females were used to compute the number of lung cancer and nonlung cancer deaths occurring in nine cohorts in 1950 aged 30, 35, ..., 70 years. The mortality for a specific birth cohort is the average number of deaths taken over five years of age to reduce age heaping and digit preference (i.e., we followed five cohorts age 28 to 32 in 1950 to age 29 to 33 in 1951, etc., and used the average risks for each group). We used midyear population estimates for corresponding sets of ages to calculate the rates. Population estimates were calculated by linearly interpolating decennial census counts adjusted for underenumeration, race misclassification, and age misreporting [22-24]. Bias arises if migrants are different than those originally in the cohort. Bias is small if

(a) migrants are similar to residents, or
(b) migration is small relative to population size.
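The data preparation described above, linear interpolation between decennial census counts and averaging over five single years of age, can be sketched as follows. The counts, death numbers, and function names are hypothetical, and the sketch ignores the adjustments for underenumeration and misreporting as well as the exact census reference dates.

```python
def interpolate_population(census_counts, year):
    """Linearly interpolate a population count between bracketing census years."""
    years = sorted(census_counts)
    if not years[0] <= year <= years[-1]:
        raise ValueError("year lies outside the range of census years")
    for y0, y1 in zip(years, years[1:]):
        if y0 <= year <= y1:
            w = (year - y0) / (y1 - y0)
            return (1.0 - w) * census_counts[y0] + w * census_counts[y1]
    return float(census_counts[years[0]])  # only reached with a single census


def five_year_average(values_by_age, central_age):
    """Average a quantity over the five single years of age centered on central_age."""
    ages = range(central_age - 2, central_age + 3)
    return sum(values_by_age[a] for a in ages) / 5.0


if __name__ == "__main__":
    # Hypothetical census counts for one age group.
    census = {1950: 1200000, 1960: 1150000, 1970: 1020000}
    for year in (1950, 1955, 1963):
        print(year, interpolate_population(census, year))
    # Hypothetical deaths at ages 28 to 32, averaged for the cohort aged 30.
    deaths = {28: 410, 29: 395, 30: 520, 31: 400, 32: 430}
    print(five_year_average(deaths, 30))
```

The spike at age 30 in the hypothetical deaths illustrates age heaping, which the five-year average smooths out.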

The U.S. annual migration rate for 1951 to 1960 is 0.15%; for 1961 to 1970, 0.17%; for 1971 to 1980, 0.21%; and for 1981 to 1990, 0.31% (Statistical Abstract of the U.S., p. 10, Table 5). Over a 38-year period, 5.4% of the U.S. population were migrants. Since we analyzed U.S. whites, we only considered white migrants. In 1961 to 1970, this was 45.9%; in 1971 to 1980, 20.4%; and in 1981 to 1990, 11.2% of all U.S. migrants. About 70% of migration occurs below age 30, i.e., the first age analyzed. Thus, bias will be small for white male and female cohorts at least age 30 from 1950 to 1987. The marginal distribution of smoking among white males and females of different ages in 1950 was estimated from the 1978 to 1980 NHIS, which contained supplemental questions on smoking [25]. Two groups (i.e., smokers and nonsmokers) represented risk heterogeneity due to smoking. This is an improvement over not using data on smoking in the cohort mortality analyses.

Figure 1. 1950 smoking proportions for U.S. male and female birth cohorts aged 30 to 70 in 1950. (Horizontal axis: birth cohort, 1870 to 1930.)

A "coarse" smoking variable from the independent survey data was used because determining the detailed smoking status of persons in 1950 is difficult and subject to error. By using less detailed (but more reliable) data on smoking, the relative risk (RR) estimated between smokers and nonsmokers is conservative. Nonetheless, a large gain in information is expected, even with "coarse" smoking data.

Harris [25] estimated smoking prevalences for birth cohorts for the year 1950 from 1978 to 1980 NHIS data. A second-degree spline was used to interpolate values for intermediate ages and extrapolate values for the oldest cohort. Figure 1 shows the estimated values for 1950. Sixty-four percent of the 1905 male cohort smoked at age 65. The 1920 cohort smoking prevalence peaked at 72% at age 30 in 1950. The 1950 cohort prevalences of white female smokers are lower than for males. The proportion smoking in the 1905 cohort was 24%. For the 1920 cohort, it was 38%. These values are used in our integrated hazard functions as the $q_m$'s.

3. RESULTS

Gompertz Estimates of Partial Cohort Mortality

We fit the Gompertz to each of nine white male and female cohorts defined in 1950 to all deaths occurring between 1950 and 1987, not using marginal data on smoking, nor adjusting for individual heterogeneity. Cohort specific parameter estimates and measures of fit are in Table 1 (see Appendix). The likelihood ratio $\chi^2$ is 81,590.7 for males and 26,963.1 for females. The male $\alpha_y$ range from 7.1% to 7.8%, and the female from 7.5% to 8.7%, consistent with other estimates of Gompertz shape parameters for adult human populations [26]. The relative precision (i.e., average percent residual) for white male mortality over 38 years ranged from 8.9% for the 1915 birth cohort, to 3.2% for the 1885 cohort. The weighted average deviation is 6.2% for males and 3.8% for females. Thus, the average error was one part in 16 for males, and one part in 25 for females, i.e., the precision of the Gompertz estimates was poor.

We also computed the sum of absolute, and signed, residuals. The first is the size of the absolute average deviation. The second is bias. Estimator precision can be poor but unbiased. Often statistical estimators trade small amounts of bias for significantly improved precision (e.g., Stein estimators, Ridge regression [27]). For males, the bias is negative, i.e., mortality is over predicted. In females, bias is small and positive for cohorts born after 1895, and negative for cohorts born before 1890. Thus, in addition to large relative and absolute errors, there are systematic biases for the Gompertz model differing by gender.

Estimates of Partial Cohort Mortality Using Observed and Unobserved Risk Factor Data Distributions

To improve the description of cohort mortality, we introduced data on smoking, and modelled individual heterogeneity, in the Gompertz ($n = 1.0$, i.e., $z^*$ is assumed gamma distributed). Modeling the effects of smoking on causes of death other than lung cancer reflects many reports showing that the health effects of smoking are broad. In addition to lung cancer, it affects other cancers (e.g., pancreatic and bladder cancer), chronic obstructive pulmonary diseases, heart disease and stroke, peripheral vascular disease, cirrhosis, and atherosclerosis [28]. By affecting microcirculation, smoking has a secondary effect on and an interaction with diabetes, as well as with many degenerative diseases of specific organs, e.g., renal dysfunction. It also has potential general metabolic effects accelerating such diseases as osteoporosis and other physiological functions under hormonal regulation. The only possible positive health effects found by Doll et al. [28] were for select neurological diseases, e.g., Parkinson's disease. In addition, the effects of smoking on mortality in Doll et al.'s [28] 40 year follow-up were high at late ages. Given their large number, the effects of smoking are greater for all other causes of death than for lung cancer death. Smoking cessation rates ($\delta_y$) are estimated simultaneously with age changes in smoking and nonsmoking related mortality because a large portion of total mortality is smoking related, i.e., a joint analysis better defines both the overall health risks of smoking and the cohort specific smoking cessation rates ($\delta_y$).

Introducing smoking and unobserved heterogeneity into the model produced the results for white males in Table 2 (see Appendix). The $\chi^2$ is 10,142.7 (i.e., an 87.6% reduction). The relative absolute error declined from 6.35 to 2.27 (a 64.2% reduction). Bias is reduced from -1.93 to -0.028 (a 98.5% reduction). Thus, the model's precision is greatly improved, with bias almost totally eliminated, by using the marginal distribution of smoking estimated from the NHIS and adjusting for unobserved heterogeneity ($\gamma_y$).

Improvements are evident in estimates for specific cohorts. The first male cohort (age 30 in 1950) has a RR = 13.3 for smokers and $\alpha_{30}$ = 13.9%, i.e., mortality adjusted for smoking increased 13.9% per year from age 30 to 67. This is higher than the $\alpha_{30}$ estimated for mortality without smoking heterogeneity (e.g., 7.5% to 8.5% [26]), because cohorts were stratified on a fixed risk factor (i.e., smoking). By stratifying on smoking, individual times to death are predicted more precisely, i.e., mortality for male smokers in middle age (i.e., ages 30 to 67) is 13.3 times as high as for nonsmokers. The CV at 30 is 0.240, i.e., the standard deviation of individual risks ($z^*$) within each smoking group for the first cohort is 0.240 times the mean. The CV is largest for young and old cohorts. This is reasonable because genetic risks have not been reduced by mortality selection in the youngest cohorts [17] and older cohorts are likely to have accumulated more environmental exposures before their index age (i.e., the age at which the mortality experience of that cohort begins to be considered). In general, the CV for smokers, with a higher average risk, is greater. This suggests an interaction of smoking with unobserved risk factors. This is consistent with epidemiological studies of workers exposed to asbestos where, for nonsmokers, lung cancer risk was five to six times higher for exposed persons; for exposed smokers the risk was 70 to 80 times that for nonexposed nonsmokers [29,30]. The excess risk for smokers with asbestos exposure was so high that most smokers rapidly died out of the asbestos exposed population.

$\delta_{30}$ suggests that 5.3% of white male smokers in the 1920 cohort, conditional on surviving to a given age, quit smoking per year. $\delta_y$ declines over the next five cohorts (i.e., from 3.7% to 0.9%). Male cohorts aged 60 and above in 1950 had most of their mortality experience at late ages after many smokers have died (i.e., the prevalence of smokers for ages 60 to 97 will be low and the likelihood of quitting each year for an individual smoker negligible except possibly if the person experiences the symptomatic expression of health effects initiated by past smoking). $\delta_y$ differ from estimates of changes in smoking prevalence because they are transition rates estimated simultaneously with mortality rates for smokers and nonsmokers in a full information likelihood. Differences in prevalence (cross-sectional) estimates are often made piecewise for broader cohort groups. Little and Schluchter [4] showed piecewise evaluations can produce biased estimates.

Fundamentally, however, it is the improvement of the prediction of cohort mortality by the inclusion of the available, partial survey data on smoking, as well as individual risk heterogeneity, that is the focus of the analysis. Furthermore, the model of smoking effects allows for the proportion of smokers to decline in a way consistent with mortality and smoking cessation as two simultaneous forces of decrement on the 1950 marginal distribution of smoking. Smoking related risk variation declines for male cohorts born after 1895 due to the early mortality of high risk individuals and smoking cessation. The 1915 male cohort shows a RR of 9.9 for smokers with an $\alpha$ = 13.7%. The RR for smokers is high since it is conditioned on both survival and smoking cessation. The $\chi^2$ for the 1915 male cohort is better than for the 1920 cohort, as is the fit for the 1910 cohort, i.e., $\chi^2$ drops 92.7% (from 13,746.0 to 994.0), a significant improvement from the homogeneous population model (Table 1). The average percent error for the 1915 cohort declined from 8.9% to 2.3%, a reduction of 74.2%. Absolute deviations are smaller and there is less bias compared to Table 1, i.e., bias declined from -3.008 to +0.056, by 98.1% in absolute size, with a change in sign.

The RR for smokers in the 1910 male cohort after age 40 (in 1950) decreased to 6.9. For the 1905 cohort (age 45 in 1950), it decreased to 5.0; 3.9 for the 1900 cohort; and then 3.2, 2.6, 1.8, and 1.1. The latter two estimates are still higher than the RR of 1.2 estimated by Harris [25] for ages 85 to 94. The RR decline over cohorts is likely due to early smoking related deaths being concentrated among susceptible (e.g., [17]) individuals, who, because of their high RR, will disproportionately die out from the cohort by age 80. Nonetheless, the $\alpha_{65}$ parameter is 9.1% for the 1885 cohort, suggesting smoking has a strong effect on total mortality at late ages. The model's fit is shown in Figure 2.

Figure 2. Model $n = 1.0$: total mortality (predicted and observed) with smoking cessation for nine U.S. white male cohorts aged 30 to 70 in 1950.

The model trajectories follow the observed age trajectories closely. To test the sensitivity of the model to the form of the mixing distribution assumed for unobserved heterogeneity, we re-estimated the model, assuming $z^*$ is inverse Gaussian distributed ($n = 2$), i.e., a decreasing CV for $z^*$. The results are presented in Table 3. This model did not fit as well overall ($\chi^2$ = 12,013.2; +18.4%) as when $n = 1$ (CV is constant). Relative error increased to 2.7% (compared to 2.3%), i.e., +13.0%. The $\alpha_{30}$ is similar (i.e., 13.9%), but the RR for smokers are higher for the 1915 and 1910 cohorts (i.e., 10.9 vs. 9.9 and 7.3 vs. 6.9). $\alpha_y$ is also higher (i.e., starting at age 35 and 40 in 1950). The CV at the initial age examined in the cohort is similar to estimates in Table 2. Smoking cessation rates ($\delta_y$) are similar, except for the 1915 cohort. The absolute error and bias are larger, especially in the three oldest cohorts. Thus, the hypothesis that the relative magnitude of individual risk heterogeneity, conditioned on the cohort distribution of smoking, decreases over age within cohorts can be rejected, i.e., the gamma model, with constant CV, fits better than an inverse Gaussian model with a decreasing CV.

Cohort specific estimates with $n = 0.5$ (i.e., CV increases with age) are in Table 4 (see Appendix). The $\chi^2$ in Table 4 is marginally better (by 1.9%) than for the gamma model, with the relative error nearly identical (2.26% vs. 2.27%) and similar bias except for the 1880 cohort. The RR, as for other models, starts high (i.e., 13.3) as does $\alpha_{30}$ (13.9%) and declines with age. $\delta_y$ estimates are similar to the other models. The cohort CVs are similar to those for the model with $n = 1$ except for the oldest cohort. Thus, heterogeneity of risk relative to the mean may tend to modestly increase with age within cohorts.

The three models produce similar estimates for parameters describing individual risk factors, e.g., the $\beta_2$ for the smokers is similar in each model, as is the CV within groups. This may arise from a statistical averaging of the dynamics of mortality selection processes, and temporal changes in risk factors. Robustness of the morbidity-mortality process parameter estimates was confirmed in longitudinal data sets with multiple measures of individual risk factor values where their dynamics were estimated directly and the equilibrium of the component processes empirically examined [31]. Despite the robustness of parameter estimates, certain models (i.e., $n = 2.0$) can be statistically rejected.

We also estimated models of total mortality for females for $n$ = 0.5, 1.0, and 2.0. In Table 5 (see Appendix), we present the best fitting model (with $n = 1$) for females.

The fit to the data ($\chi^2$ declined 54.0%) improved over the homogeneous population model (Table 1) with bias nearly eliminated (-97.1%). The improvement in precision is not as large as for males because proportionately fewer females smoke, and they tend to start at later ages. Female cohort data are fit less well (2.75% relative error) than for males (2.27%). Shape parameter estimates ($\alpha_c$) are about 5% to 18% lower than for males up to the 1885 cohort; then female $\alpha_c$'s are larger because those partial cohorts are observed wholly at ages past menopause (i.e., the 1885 cohort is age 65 in 1950) where female mortality starts to increase more rapidly (but from lower levels) than for males. The RR for female smokers is higher than for males for all but the 1890 cohort, suggesting that while fewer females smoke, smoking has a large effect on individual females. Female smoking cessation rates ($\delta_c$) are higher than for males for the first five cohorts. This is partly because there is less smoking for white females (Figure 1) and a lower RR, producing larger relative cessation rates. For example, white females decline from a peak prevalence of 26.2% in 1960 to 7.5% in 1987, a greater relative, but smaller absolute, decline than for males. The higher RR for older female cohorts is likely due to their lower overall age specific mortality rates. The CV is higher for younger and older cohorts, though female CVs tend to be larger than for males. Figure 3 shows the plots of fits for white female cohorts. Mortality trajectories are fit well by the model. Age specific rates are lower than for males. In Table 6 (see Appendix), we present the joint lung cancer and nonlung cancer mortality analysis for males. Lung cancer mortality (LCM) is modeled as a Weibull, and all other mortality (OM) as a Gompertz, within cohorts. For both, individual risks are assumed gamma (n = 1.0) distributed.
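The structure of the joint model just described can be sketched as two cause-specific hazards, each averaged over gamma distributed individual risks. The sketch assumes a Weibull cumulative hazard of the form $\beta t^{\alpha}$, a Gompertz hazard of the form $\beta e^{\alpha t}$, independent gamma frailties (mean 1, variance $\gamma^2$) for the two causes, and a single relative risk applied to everyone in a smoking group. These functional forms, the function names, and all numeric values in the usage note are assumptions for illustration only; latency is omitted.

```python
import math

def weibull(t, beta, alpha):
    """Return (hazard, cumulative hazard) for a Weibull with H(t) = beta * t**alpha."""
    return alpha * beta * t ** (alpha - 1.0), beta * t ** alpha

def gompertz(t, beta, alpha):
    """Return (hazard, cumulative hazard) for a Gompertz with hazard beta*exp(alpha*t)."""
    return beta * math.exp(alpha * t), beta * (math.exp(alpha * t) - 1.0) / alpha

def gamma_mixed_hazard(hazard, cum_hazard, gamma2, rr=1.0):
    """Population hazard when individual risks are gamma distributed
    (mean 1, variance gamma2) and the group carries relative risk rr."""
    return rr * hazard / (1.0 + gamma2 * rr * cum_hazard)

def total_hazard(t, lcm, om, gamma2_lcm, gamma2_om, rr_lcm=1.0, rr_om=1.0):
    """Sum of the gamma-mixed LCM (Weibull) and OM (Gompertz) hazards at time t."""
    h_l, H_l = weibull(t, *lcm)
    h_o, H_o = gompertz(t, *om)
    return (gamma_mixed_hazard(h_l, H_l, gamma2_lcm, rr_lcm)
            + gamma_mixed_hazard(h_o, H_o, gamma2_om, rr_om))
```

For example, `total_hazard(40.0, (1e-12, 6.0), (1e-4, 0.09), 0.3, 0.3, rr_lcm=10.0, rr_om=2.0)` gives the combined hazard for a hypothetical smoking group 40 years into follow-up; setting both relative risks to 1 gives the corresponding nonsmoker hazard.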

Modeling lung cancer explicitly improves the information available in the mortality data about the effects of smoking, and smoking cessation, because lung cancer is a cause of death with a strong and well-documented relation to smoking behavior. Thus, its inclusion should improve estimates of the age trajectory of smoking effects, especially in the extreme tail of the cohort survival distribution where parameters such as the CV of risk may be difficult to estimate with precision. In Table 6, the Weibull shape parameters for LCM are 5 to 8 for birth cohorts of 1915 to 1890. Changes in the Weibull for cohorts whose observation begins at different ages are reasonable because lung cancers starting in older persons are less likely due to preprogrammed (i.e., inherited) genetic susceptibility, so that more cellular DNA errors have to be environmentally induced. Gompertz parameters for OM are similar to values in Table 2. Latencies of 5 to 30 years were assumed for both total and lung cancer cohort mortality. These latency times were similar to those for other analyses of cohort lung cancer mortality [31]. The shorter latencies for young cohorts arise because, for persons dying at younger ages of cancer, the tumor biology is either histologically more aggressive, or host defenses defective. The rate of growth of tumors interacts with the number of genetic errors required for a tumor to initiate at a given age (minus the latency). A similar result is found for breast cancer mortality with early disease latency half

Figure 3. Model N = 1: total mortality (predicted and observed) with smoking cessation for nine U.S. white female cohorts aged 30 to 70 in 1950.

(about seven years) that for late onset breast cancer (a mean latency of about 14 years). This latency affects the shape parameter of the Weibull used to model lung cancer mortality. It does not affect the shape parameter of the Gompertz. The higher CV for LCM is consistent with the results of Sellers et al. [17] who found that persons with a genetic defect in the cytochrome P450 enzyme system had highly elevated lung cancer risks with smoking. The effect on OM represents the effects of smoking on a broad range of diseases with many different cofactors. $\delta_c$ are similar to the values in earlier models as is bias and relative error. For LCM, the RR increases across cohorts. For OM, the RR for smokers starts high and declines for the 1910 and older cohorts.

Estimates for the joint analysis of LCM and OM for female cohorts are in Table 7 (see Appendix). The female data is not fit as well as the male data, though differences are not large for OM. Female LCM is described less well and with more bias, especially in younger cohorts. We find a very high RR for OM for smokers in the 1910 to 1920 cohorts.

4. DISCUSSION

We used a model to represent two sources of risk variability in cohort mortality analyses. The first is cigarette smoking. This type of heterogeneity is discrete. Smoking data came from U.S. health surveys. The second is individual variation within cohorts, and by cause of death. Individual heterogeneity, i.e., marginal heterogeneity, conditional on the discrete smoking risk groups, is described by a Dubey distribution which, when n = 1, is a gamma with constant CV, when n = 2 is an inverse Gaussian (decreasing CV), and when n = 0.5 has increasing CV. The inverse Gaussian performed least well. The model with n = 0.5 performed slightly better than n = 1.0. All three models with heterogeneity and marginal risk factor data performed better than the model assuming risk homogeneity. Thus, there was a large benefit in integrating health risk factor data from national surveys, mortality data sources, and risk heterogeneity estimates made from models developed from epidemiological and clinical data.

Parameters describing individual risks estimated from combined data are more robust and precise.

The CV in Tables 6 and 7 are interesting given analyses of genetic determinants of lung cancer where a codominant inheritance of an autosomal gene accounted for 69% of lung cancer at age 50, 47% at age 60, but only 22% at age 70, i.e., over 20 years of age apparently only 31.9% of the persons with this type of genetic susceptibility survived [17]. This is akin to the genetic determination of breast cancer at early ages which is related to family history and genetics while late disease is related to accumulated risk factor effects in a population which has already been partly selected for genetically determined mortality risks. The Gompertz for all other causes show a higher risk for smokers in general. This is because risks in this group are predominantly determined by circulatory diseases where smoking risks are more rapidly manifest.
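The 31.9% figure quoted above appears to be the ratio of the genetically attributable share of lung cancer at age 70 to that at age 50. The short check below makes that arithmetic explicit; reading the figure as this ratio is an interpretation, and the variable and function names are illustrative.

```python
def share_remaining(later_share, earlier_share):
    """Ratio of the attributable share at the later age to that at the earlier age."""
    return later_share / earlier_share

# Shares of lung cancer attributed to the autosomal gene, by age, as quoted from [17].
attributable = {50: 0.69, 60: 0.47, 70: 0.22}

ratio = share_remaining(attributable[70], attributable[50])
print(f"{100 * ratio:.1f}%")  # prints 31.9%
```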

Smoking cessation parameter estimates ($\delta_c$) showed large reductions at young ages, probably due to the relation of the start of follow-up (1950) with early Surgeon General reports (1963) and the ages at which smoking habits are first formed. Female cessation rates were larger than for males, reflecting a flat prevalence of smoking in female cohorts up to 1920 (female smoking started at later ages) and declines from moderate to relatively much lower levels in a short time span. The moderate peak levels meant relative declines for females were larger than for males and the shorter period over which declines occurred implied faster cessation rates. $\delta_c$ is estimated from a model where there is heterogeneity within each smoking group, so that there is a large adjustment for mortality (being concentrated in high risk persons). Since the female RR tends to be higher than for males, this suggests that their $\delta_c$ would be larger due to a greater correction for mortality, i.e., since $\alpha_c$, $\gamma_c$, and RR are correlated, their estimates will not be the same as those derived from changes assumed to be independent. Since $\gamma_c$ reflects individual risk differences adjusted for smoking risks, and disease latency affects other model parameters, the interrelation of all parameters has to be determined simultaneously to understand the risk mechanisms. The best model seems to be the one which includes the marginal distribution of smoking, models lung cancer and total mortality separately, one as a Weibull and one as a Gompertz hazard function, and describes individual unobserved risk heterogeneity by a distribution with an increasing CV. This model performed better empirically, although its advantage over a similar model with gamma distributed unobserved heterogeneity is not large. Also important is the fact that the components used to construct this model had a higher level of biological credibility. The Weibull model of carcinogenesis is well established for many solid tumors [16].
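The need to adjust cessation estimates for mortality can be seen in a deliberately simplified two-group sketch. Assume smokers quit at a constant rate $\delta$, nonsmokers die at a constant rate $\mu$, smokers die at rate $RR \cdot \mu$, and quitters immediately take on nonsmoker mortality. Then the log of smoking prevalence $p$ declines at rate $\delta + \mu (RR - 1)(1 - p)$, so an unadjusted reading of the prevalence decline overstates $\delta$. The constant rates, the absence of within-group heterogeneity and latency, the function names, and all numeric values are assumptions made only for illustration.

```python
def simulate_prevalence(p0, delta, mu, rr, years, dt=0.001):
    """Euler integration of smokers (s) and nonsmokers (n); returns the prevalence path."""
    s, n = p0, 1.0 - p0
    path = [s / (s + n)]
    for _ in range(int(round(years / dt))):
        ds = -(delta + rr * mu) * s      # smokers leave by quitting or dying
        dn = delta * s - mu * n          # quitters join nonsmokers, who die at rate mu
        s, n = s + ds * dt, n + dn * dt
        path.append(s / (s + n))
    return path

def apparent_cessation_rate(p, delta, mu, rr):
    """Rate of decline of log prevalence: delta + mu*(rr - 1)*(1 - p)."""
    return delta + mu * (rr - 1.0) * (1.0 - p)
```

With hypothetical values `p0=0.5, delta=0.03, mu=0.02, rr=5.0`, `apparent_cessation_rate` returns 0.07, more than twice the assumed cessation rate of 0.03; the gap grows with the relative risk, which is consistent with a larger mortality correction where RR is higher.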
The explicit modeling of smoking effects is strongly suggested by the epidemiological literature which shows that it is a powerful risk factor, not only for lung cancer, but for many other diseases. However, it is also a risk factor for which exposure varied strongly over cohorts. A study of surgical interventions in patients age 90 to 103 showed very good outcomes over the period 1975 to 1985 because those birth cohorts had a low prevalence of early smoking (80% had never smoked) and thus, had little chronic pulmonary disease. The separate, explicit treatment of lung cancer, a disease with a high RR for smoking, helps to better estimate the trajectory of the loss of smokers in the joint likelihood. The fact that the CV was higher for the youngest and oldest cohorts is consistent with epidemiological studies showing genetic heterogeneity tends to be exhausted by age 85 and that environmental or behavioral exposures tend to accumulate with age. The stability, or slight increase, in the CV over age (within cohort) suggests the average risk tended to have the same rate of decline as the standard deviation of individual risks, so that the relative heterogeneity of risks is preserved to late ages. Thus, there is an explicit, biological rationale for each component of this model. To select another model, where some components had less biological credibility, would require strong statistical evidence (i.e., a much better fit for the alternate model) before rejecting the model most consistent with the available prior information. Thus, the procedure not only provides a strategy that is generally useful because of its ability to combine the types of data, each with specific limitations, that are often available (e.g., survey and vital statistics data), but also because it provides a logical basis for integrating prior scientific information and for assessing the weight of that evidence, in choosing a specific model. This is


similar to Bayesian principles of inference except that the prior information here is embedded in a stochastic process model of the events as they occur over time.

Thus, using a population model whose parameters are estimated from multiple data sources, we evaluated the mechanisms of cohort smoking risks. Use of the combined data produced fits of cohort mortality which were unbiased and with good precision. The ability to integrate data was due to the use of a global likelihood which is constructed initially to describe the situation as if there were no missing data. The parameters of the global likelihood were then constrained to reflect the portions of data missing in different data sets. This is different than piece-wise combination of parameter estimates made independently from different data sets. A piece-wise procedure can produce inconsistent results [4]. The results suggest demographic analyses of mortality can be improved by using ancillary risk factor data from nationally representative health surveys.
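The idea of a global likelihood that draws on more than one data source can be sketched with a toy problem. Suppose a survey counts `k` smokers among `m` respondents, and vital statistics report `deaths` over `pop` person-years, with expected deaths `pop * mu0 * (1 - p + p * rr)`. Mortality data alone cannot separate the smoking prevalence `p` from the relative risk `rr`; adding the survey term to the same likelihood identifies both. The binomial and Poisson components, the known baseline rate `mu0`, the grid search, and all numbers below are assumptions for illustration and are far simpler than the likelihood used in the analysis.

```python
import math

def log_binom(k, m, p):
    """Binomial log-likelihood for k smokers among m survey respondents."""
    return (math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
            + k * math.log(p) + (m - k) * math.log(1.0 - p))

def log_poisson(d, expected):
    """Poisson log-likelihood for d observed deaths given the expected count."""
    return d * math.log(expected) - expected - math.lgamma(d + 1)

def global_loglik(p, rr, survey, vital, mu0):
    """Sum of the survey and vital statistics components, sharing p and rr."""
    k, m = survey            # smokers counted, persons surveyed
    deaths, pop = vital      # deaths observed, person-years at risk
    expected = pop * mu0 * (1.0 - p + p * rr)
    return log_binom(k, m, p) + log_poisson(deaths, expected)

def grid_mle(survey, vital, mu0, p_grid, rr_grid):
    """Maximize the global log-likelihood over a grid of (p, rr) values."""
    return max(((p, rr) for p in p_grid for rr in rr_grid),
               key=lambda th: global_loglik(th[0], th[1], survey, vital, mu0))

if __name__ == "__main__":
    p_grid = [i / 100 for i in range(1, 100)]
    rr_grid = [1 + j / 10 for j in range(0, 141)]
    print(grid_mle((300, 1000), (2590, 100000), 0.01, p_grid, rr_grid))
```

In practice a simplex or gradient method would replace the grid; the point of the sketch is only that both data sources enter one objective function rather than being fitted separately and combined afterwards.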

REFERENCES

1. M.E. Marenberg, N. Risch, L.F. Berkman, B. Floderus and U. de Faire, Genetic susceptibility to death from coronary heart disease in a study of twins, New England Journal of Medicine 330, 1041-1046 (1994).
2. S. Marriotti, P. Sansoni, G. Barbesino, P. Caturegli, D. Monti, A. Cossarizza, T. Giacomelli, G. Passeri, U. Fagiolo, A. Pinchera and C. Franceschi, Thyroid and other organ-specific autoantibodies in healthy centenarians, The Lancet 339, 1506-1508 (1992).
3. C. Brown and L. Kessler, Projections of lung cancer mortality in the United States: 1985-2025, Journal of the National Cancer Institute 80, 43-51 (1988).
4. R.T.J. Little and M.D. Schluchter, Maximum likelihood estimation for mixed continuous and categorical data with missing values, Biometrika 72, 497-512 (1985).
5. K.G. Manton, H.D. Tolley, G.R. Lowrimore and A.I. Yashin, Combining multiple sources of data in cohort analyses, Working Paper No. M574, Duke University, Center for Demographic Studies, (1995).
6. D.R. Brillinger, The natural variability of vital rates and associated statistics (with discussion), Biometrics 42, 693-734 (1986).
7. K.G. Manton, E. Stallard and J.W. Vaupel, Alternative models for the heterogeneity of mortality risks among the aged, Journal of the American Statistical Association 81, 635-644 (1986).
8. A.I. Yashin, Dynamics of survival analysis: Conditional Gaussian property versus Cameron-Martin formula, In Statistics and Control of Stochastic Processes: Steklov Seminar, 1984, (Edited by N.V. Krylov, R.S. Lipster and A.A. Novikov), pp. 466-485, Optimization Software, New York, (1985).
9. C.E. Finch, Longevity, Senescence, and the Genome, University of Chicago Press, Chicago, IL, (1990).
10. G.A. Sacher, Life table modification and life prolongation, In Handbook of the Biology of Aging, (Edited by J. Birren and C. Finch), pp. 582-638, Van Nostrand Reinhold, New York, (1977).
11. N.R. Cook, S.A. Fellingham and R. Doll, A mathematical model for the age distribution of cancer in man, International Journal of Cancer 4, 93-112 (1969).
12. B. Rosenberg, G. Kemeny, L. Smith, I. Skurnick and M. Bandurski, The kinetics and thermodynamics of death in multicellular organisms, Mechanisms of Ageing and Development 2, 275-293 (1973).
13. A.C. Economos, Rate of aging, rate of dying, and the mechanisms of mortality, Archives of Gerontological Geriatrics 1, 3-27 (1982).
14. P. Armitage and R. Doll, Stochastic models for carcinogenesis, In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. IV, Biology and Problems of Health (Edited by J. Neyman), pp. 19-38, University of California Press, Berkeley, CA, (1961).
15. P. Armitage and R. Doll, The age distribution of cancer and a multistage theory of carcinogenesis, British Journal of Cancer 8, 12 (1954).
16. E.R. Fearon and B. Vogelstein, A genetic model for colorectal tumorigenesis, Cell 61, 759-767 (1990).
17. T. Sellers, J. Bailey-Wilson, R. Elston, A. Wilson, G. Elston, W. Ooi and H. Rothschild, Evidence for mendelian inheritance in the pathogenesis of lung cancer, Journal of the National Cancer Institute 82, 1272-1279 (1990).
18. M. Hollstein, D. Sidransky, B. Vogelstein and C. Harris, p53 mutations in human cancers, Science 253, 49-53 (1991).
19. S.D. Dubey, Some percentile estimators of Weibull parameters, Technometrics 9, 119-129 (1967).
20. P. Hougaard, Life table methods for heterogeneous populations: Distributions describing the heterogeneity, Biometrika 71, 75-83 (1984).
21. J.A. Nelder and R. Mead, A simplex method for function minimization, Computer Journal 7, 308-313 (1965).
22. A.J. Coale and M. Zelnik, New Estimates of Fertility and Population in the United States, Princeton University Press, Princeton, NJ, (1963).
23. J.S. Passel, J.S. Siegel and J.G. Robinson, Coverage of the national population in the 1980 census, by age, sex and race: Preliminary estimates by demographic analysis, In Current Population Reports, Ser. P-23, No. 155, U.S. Bureau of the Census, Washington, DC, (1982).
24. J.S. Siegel, Estimates of the coverage of the population by sex, race and age in the 1970 census, Demography 11, 1-23 (1974).
25. J.E. Harris, Cigarette smoking among successive birth cohorts of men and women in the United States during 1900-80, Journal of the National Cancer Institute 71, 473-479 (1983).
26. W.H. Wetterstrand, Parametric models for life insurance mortality data: Gompertz's law over time, Transactions of the Society of Actuaries 33, 159-175 (1982).
27. B. Efron and C. Morris, Data analysis using Stein's estimator and its generalizations, Journal of the American Statistical Association 70, 311-319 (1975).
28. R. Doll, R. Peto, K. Wheatley, R. Gray and I. Sutherland, Mortality in relation to smoking: 40 years' observations on male British doctors, British Medical Journal 309, 901-911 (1994).
29. I.J. Selikoff, Disability compensation for asbestos-associated disease in the United States, Report to the U.S. Dept. of Labor Environmental Sciences Laboratory, (Contract No. J-9-M-8-0165), Mount Sinai School of Medicine, New York, (June 1981).
30. J. Peto, H. Seidman and I.J. Selikoff, Mesothelioma incidence among asbestos workers: Implications for models of carcinogenesis and risk assessment calculations, British Journal of Medicine 45, 124-135 (1982).
31. K.G. Manton, E. Stallard, M.A. Woodbury and J.E. Dowd, Time varying covariates in stochastic multidimensional models of mortality and aging, Journal of Gerontology: Biological Sciences 49, B169-B190 (1994).

Gompertz

Risk of

Alodel

X’

2172.1

994.0

718.0

Age in

1950

30

35

40

2.6

1125.l

906.0

822.:3

GO

65

70

10142 7

3.2

1330.9

55

1.1

I.8

39

50

5.0

953.6

1120.7

45

6.9

9.9

13.3

Smokers

Relative

2. Gompertz

11.0

!I.6 9. I

9.6

10 3

11.2

12.5

13.7

13.9

( x 100)

Parameter)

(Shape

n

(Scale

3.56 2.58 3.38 3.50

7.5 7.5 7.7 8.1 8.3

10.0 10.4 9.4 7.7 7.1

X2 1867.0 1421.0 2120.6 2845.7 3699.7

- 0.002 - 0.006 - 0.014

- 0.074 - 0.107 2.85 5.52

8.7 8.5

6.4 8.1

3010.8 6006.7

-0.315 - 0.706

2.07

6.36

:T.x1

L.2d

11.42

9.83

10.22

7 32

1 88 3.20

5.14

1.03

3.22

2.26

.468

2.80

Parameter)

,211

(Scale

(x10”)

Smokers

P2

for nine U.S. white

,526

,265

327

,063

.ooo

.I50

,583

,695

.240

fi,

of Variation

(x

0.0

0.0

0.0

0.9

1.4

2.4

3.1

3.7

5.3

100)

6,

Average

2 27

I.80

1 855 I .78

1.77

1.56

1.58

1.63

2.31

4.19

(%)

Error

Relat,ive

Absolute

(%)

.169

,511

.623

- .232

- .440

‘.

.(I28

.25”

025

051

- ,028

,048

,042

,044

.056

- ,048

(“/o)

(Bias)

Error

Kelat ive

Average

(6,)

- .482

1.546

- ,827

- ,840

- ,104

aged 30 to 70 in 1950 with smoking

Coefficient

male cohorts

3.77

3.21

8.2

26963.1

2.40

8.4

7.4 9.1

3883.8

- 0.360

2107.8

- 0.032

(“lo) 4.94

(XTOO)

(x105)

(x100)

,228

Parameter)

(x105)

Nonsmokers

PI

mortality

ml.93

6.22

(N = 1.0) model

of total

1 265

3.19 0.534

7.4

-2.733

7.6

24.5

29.2

- 1.151

4.81

(x100)

6.75

4.16

5.49

7.3

7.1

5.85

7.4

7.2

6.72

7.4

7.7

34.9

31.7

26.8

23.8

22.7

Cohort

Table

81590.7

2690.6

6314.6

70

0.614

65

0.303

- 1.747

8023.3

60

- 0.989

8113.7

55

0.279

- 1.327

12313.5

50

0.179

- 1.594

12197.7

45

0.114

2.083

11397.1

40

0.094

18.3

8.86

- 3.008

13746.0

35

0.047

(%) 2.053

7.8

(X0)

15.8

6.84

X2

6794.2

30

(x 100)

(Bias)

81

Error

(x105)

1950

Error

Error

Error

a

Age in (Bias)

Error

Relative

Absolute

Error

Average

Relative

Females

Cohort

White

Relative

U.S. mortality.

Absolute

PI

of total

Relative

models

Absolute Average

specific

Average Average

and cohort

Average

Males

1. Gender

Average

White

Table

0

01u

0.003

Il.001

0.001

o.no1

0.000

0.000

0.000

0.000

( x 100)

Error

Absolute

Average

cessation.

0.735

0.357

0.236

0.107

0.096

0.052

0.033

0.022

0.017

0

001

0 000

0.000

~~0.000

- 0.000

- 0.000

- 0.000

- 0.000

- 0.000

(x100)

Errol

Average

-0.329

-0.167

- 0.092

0.007

0.008

0.004

0.002

0.000

0.000

(x100)

Error

(x 100)

Average

Error

Average Absolute

5.0

3.9

3.2

2.6

1.8

1.9

954.0

1120.7

1330.9

1168.7

918.9

2587.9

45

50

55

60

65

70

3.9

3.2

2.6

1.8

1.1

953.3

1120.7

1330.9

1109.9

902.4

680.3

45

50

55

60

65

70

9952.1

5.0

702.1

40

6.6

9.1

980.4

35

10.3

9.1

9.5

9.6

10.3

11.2

12.5

13.6

3.22

6.45

3.98

3.20

1.88

1.03

,488

,255

,212

13.3

2172.1

30

13.9

(x105)

(x:00)

Smokers

Pl

X2

Relative Nonsmokers

1950

1.18

5.98

3.53

3.22

1.88

1.03

2.22

10.76

9.06

10.26

.602

.296

.383

.060

.ooo

.I43

5.15 7.31

.576

,668

.237

fit

of Variation

Coefficient

3.12

2.13

2.80

(x105)

Smokers

L32

0.0

0.0

0.0

0.9

1.4

2.4

3.2

3.9

5.3

(x 100)

6,

2.72

4.21

1.79

1.55

1.77

1.56

1.58

1.65

2.34

4.19

(%)

Error

- 0.000 - 0.000 - 0.000

0.000 0.000 0.000 0.000 0.001 0.001 0.001

- .048 .054 ,042 ,041 ,048

-

.38(1

- 1.715

- .057

-.lOO

- ,028

- 0.000 - 0.005

0.003 0.008

- 0.000

-0.000

- 0.000

- 0.000

Error (x100)

(x100)

(Bias) (W

Average

Average Absolute Error

Error

Relative

Relative

,241 .735 ,597 ,154 .ooo .068 .307 .251 .444

2.31 3.21 5.14 7.31 10.20 10.17 11.65 3.55

fi,

of Variation

Coefficient

2.80

(x105)

Smokers

02

0.0

0.0

0.0

0.9

1.4

2.4

3.1

3.6

5.3

(x100)

6,

-

2.26

1.64

1.78

1.54

1.77

1.56

1.59

1.62

2.29

4.19

(W

Error

-

- 0.000

- 0.000 - 0.000 0.000 - 0.000

0.000 0.000 0.000 0.000 0.001 0.001 0.001 0.003 0.002

- ,048 ,063 ,047 .042 ,048 - ,027

- .006

- ,071

-.018

- ,038

-0.000

- 0.000

- 0.000

- 0.000

Error (x100) (%)

(Bias)

(x100)

Average

Average Absolute Error

Error

Relative Relative

Average

Average Absolute

(N = 0.5) model of total mortality for nine U.S. white male cohorts aged 30 to 70 in 1950 with smoking ($\delta_c$) cessation.

11.5

9.2

9.7

9.6

10.3

11.2

,430

.196

.210

(x105)

Nonsmokers

Pl

Risk of

Model

Age in

Cohort

Table 4. Gompertz

12013.2

7.3

744.4

40

12.6

13.9

10.9

35

(x YOO)

1015.5

2172.2

30

Smokers

13.9

X2

1950

Risk of

Relative

Average

Average Absolute

(N = 2.0) model of total mortality for nine U.S. white male cohorts aged 30 to 70 in 1950 with smoking ($\delta_c$) cessation.

13.3

Model

Age in

Cohort

Table 3. Gompertz

5.4

4.2

3.3

2.2

9.4

12.7

3163.2

1766.7

1426.1

965.5

587.4

45

50

55

60

65

70

12391.4

a.2

537.7

1768.3

40

13.8

787.5

35

12.2

10.7

9.3

9.1

9.7

9.9

10.7

12.3

.54

1.55

4.02

3.66

2.22

1.73

.96

.32

.21

20.0

1389.0

30

13.1

(x105)

Smokers

X2

1950

(ZOO)

Nonsmokers

Risk of

Model

Age in

Pl

Relative

6.84

14.56

8.88

12.11

9.23

9.37

7.90

4.45

4.27

(x105)

Smokers

a

,580

.437

,351

.045

.Oal

,000

.OOO

,388

,424

fi,

of Variation

Coefficient

0.0

0.0

0.0

0.0

2.5

4.1

4.8

4.9

5.7

(x100)

6,

~

2.75

1.40

1.62

1.81

2.27

3.38

2.98

1.82

2.72

4.31

(%)

Error

-

- ,014

.077

-.004

- ,005

,017

,012

,020

,006

- ,031

-.150

(%)

(Bias)

Error

Relative

Relative

Average

Average Absolute

Error

0.002

0.002

0.001

0.001

0.001

0.000

0.000

0.000

0.000

0.000

0.000

0.000

0.000

0.000

0.000

-0.000

- 0.000

0.000

(x100)

Error (x100)

Average

Absolute

Average

(N = 1.0) model of total mortality for nine U.S. white female cohorts aged 30 to 70 in 1950 with smoking ($\delta_c$) cessation.

Cohort

Table 5. Gompertz

45

45

50

50

55

55

60

60

65

65

70

70

LCM

OM

LCM

OM

LCM

OM

LCM

OM

LCM

OM

LCM

OM

171.1

30

30

25

25

25

25

871.3

23.8

951.9

26.2

1212.9

92.8

1498.0

25

25

1307.6

115.3

1007.0

260.1

757.4

258.3

20

20

20

20

20

20

4 Value in parentheses

mixed Weibull

is not changed.

is taken to the l/(a

1.0

1.9

1.5

1.6

2.4

1.7

2.9

1.6

3.5

1.5

4.4

1.4

6.6

1.5

11.2

1.1

15.0

1.2

- 1) power.

(153.5)

(12.0)

(9.4)

(7.7)

(9.4)

(9.2)

(11.0)

(2.4)

(4.2)4

Smokers

Risk of

Relative

is raised to 01 - 1 power; the relative

the same for both LCM and OM.

scaJe parameter

3 6 is assumed

2 Gompertz

1 Weibull scale parameter

10602.6

40

OM

47.4

926.6

OM

40

LCM

15

15

1053.7

35

OM

2069.9

LCM

35

LCM

5

5

30

58.7

Years

1950

OM

X2

in

Age in

30

Model

Latency

Cohort

LCM

of gamma

risk of smoking

11.0

8.7

9.0

6.3

9.4

5.3

9.5

5.5

10.0

6.6

10.7

7.0

12.3

7.1

13.7

7.6

13.9

10.1

(OM is x100)

Q

Pl

all stages

62.7

631.3

70.3

498.1

46.8

376.8

38.0

448.2

16.8

571.0

11.7

671.1

6.21

717.1

1.58

864.1

.3782

937.71

(x105)

affected

mortality

and Gompertz

a

.ooo

,000

in carcinogenesis.

.528

5.721

1212.8 62.7

,293

3.754

,323

1.594

.OOO

.ooo

.OoO

SKI0

.OOO

108.5

795.4

111.0

635.5

111.7

707.8

58.5

852.7

51.8

972.4

.632

1060.8 40.8

,689

3.841

,313

4.989

fi,

of Variation

17.8

987.6

5.69

1098.8

(x105)

Smokers

Coefficient

0.0

0.0

0.0

1.0

1.5

2.5

3.4

4.1

5.63

(x100)

6,

mortality

(Bias)

Error

,042

2.78 2.27

2.95 1.86

.286%

- .315 - .263

1.83

- .022%

.014

2.62

2.29%

- .534

1.64

3.10%

- ,671 - .026

3.19

- ,757

1.75

-.007

,050

2.00

1.91

- ,075

1.69

3.42

.550 .021

2.99

.037

1.72

1.19

.534

4.20

3.56

,799 - .062

3.09

-

Error

-

Relative

Relative

(%)

Average

Average

(%)

for

Error

.OOO

.018

,000 - .OOL .015

.009 ,152 .QO6 ,279

,272

- .065

.OOO

- ,002

.I40

.005

- ,002 -.OOl

,011

.002 074

.OOl - .OOl ,006

.OOO .007 .049

.ooO .026

.ooO

.OOO

.OQl

.006

.OOO .OOo

.OOl .021

(x100)

Error (x100)

Average

Absolute

Average

and the second to other mortality.

Absolute

mortality

(N = 1.0) model of all other

The first line in each pair refers to lung cancer

Nonsmokers

(6) cessation.

(N = 1.0) model of lung cancer

aged 30 to 70 in 1950 with smoking

Table 6. Combination

nine U.S. white male cohorts

of gamma

mixed Weibull

78.8

1355.4

48.0

769.7

323.0

551.5

442.8

1647.4

297.7

3164.1

5

5

5

5

15

15

15

15

15

15

30

30

35

35

40

40

45

45

50

50

LCM

OM

LCM

OM

LCM

OM

LCM

OM

LCM

OM

20

4 Value in parentheses

12.8

1.0

9.4

2.0

2.2

1.0

3.4

1.0

3.7

1.5

3.3

1.6

8.2

1.5

15.7

1.1

22.7

1.2

is not changed.

is taken to the l/(a

scale parameter

- 1) power.

(11.3)

(21.9)

(52.4)

(17.6)

(1.7)

(7.2)4

12.2

3.5

10.7

4.5

9.3

4.5

9.2

5.7

9.5

8.1

9.1

9.4

10.6

8.5

12.5

8.7

13.2

9.9

(OM is x100)

a

is raised to a - 1 power: the relative risk of smoking

3 6 is assumed the same for both groups.

’ Gompertz

1 Weibull scale parameter

* SCassumed

1437.5

to be zero.

41.4

589.9

12251.0

20

973.7

40.4

1436.2

OM

70

70

LCM

OM

20

20

20

LCM

65

65

LCM

60

OM

OM

60

LCM

56.5

15

OM

20

108.9

1763.1

15

55

55

LCM

X2

Years

1950

Smokers

Risk of

in

Age in

Model

Relative

Latency

Cohort Pl

(x105)

affected

76.5

69.4

120.5

356.3

56.0

194.9

47.3

330.3

39.4

829.3

45.0

1071.5

40.3

1003.9

7.93

743.5

8.14

1017.0

(x105)

Smokers

&

,583

2.770

.439

,000

,351

2.795

.Ooo

4.774

.ooO

.OOO

.OOO

.OOo

,000

,000

,404

5.500

.460

6.559

fi,

of Variation

Coefficient

all stages in carcinogenesis

5.96

69.4

12.8

178.8

25.4

194.9

13.9

330.3

10.6

537.4

13.5

667.7

4.91

683.8

.506

695.7

.3592

815.1’

and Gompertz

0.0’

0.0

0.0

0.0

2.8

5.8

4.9

5.1

5.S3

(x100)

6,

-

2.70%

8.00%

1.41

6.03

1.63

4.75

1.82

4.47

2.28

5.38

3.39

8.36

2.69

10.12

1.82

7.95

2.72

4.04

4.29

6.22

(%)

Error

-

-.003%

1.04%

,078

-.271

-.004

-.561

-.006

-.731

,020

-.317

.022

.432

,023

1.961

.016

1.510

-.035

.726

-.147

1.163

(%)

(Bias)

Error

Average Relative

Average Absolute Relative

for

.202

.002

,202

,002

,137

,002

,080

,002

.076

,003

.041

,003

,015

.003

.012

,001

.012

,001

,032

,000

,010

.OOO

,016

-.OOl

,005

.OOO

.003

,000

,001

,000

,000

,000

.OOO

,000

.OOO

.OOO

(x100)

Error

Error (x 100)

Average

Absolute

Average

and the second to other mortality.

(N = 1.0) model of all other mortality

The first line in each pair refers to lung cancer mortality

Nonsmokers

(6) cessation.

(N = 1.0) model of lung cancer mortality

aged 30 to 70 in 1950 with smoking

Table 7. Combination

nine U.S. white female cohorts