Adjusting for confounded variables: Pulmonary function and smoking in a special population


ENVIRONMENTAL RESEARCH 43, 251-266 (1987)

HENRY A. FELDMAN, JOSEPH D. BRAIN, AND MARGARET L. HARBISON

Respiratory Biology Program, Department of Environmental Science and Physiology, Harvard University School of Public Health, Boston, Massachusetts 02115

Received April 26, 1986

Confounded variables present an obstacle to valid inference in many environmental and occupational studies. We describe a series of procedures that we used to address this problem in a study of pulmonary function and smoking. Subjects were drawn from the Multiple Risk Factor Intervention Trial (MRFIT), a prospective study of coronary heart disease. Confounding of smoking, hypertension, and hyperlipidemia was designed into the trial and was beyond the control of our ancillary study. We used statistical techniques to (1) detect and characterize the pattern of confounding, (2) identify important variables affecting pulmonary function, and (3) perform appropriate adjustments for extraneous influences (i.e., other than smoking). Among the techniques we used were factor analysis, stepwise multiple regression, and bootstrap replication. Analysis of the adjusted pulmonary function measurements showed that they were satisfactorily standardized and free of artifact. Moreover, use of the adjusted values sharpened our statistical results concerning smoking, the ultimate object of the study. We contrast the use of external and internal standards and discuss methods for detecting, ruling out, or counteracting confounding. © 1987 Academic Press, Inc.

INTRODUCTION

Studies of environmental and occupational hazards to physiological function are usually complicated by extraneous factors. Even if the study sample is unbiased, the population of interest (smokers, patients, exposed workers) is itself commonly distorted by selection phenomena, with the result that two or more variables are confounded. For example, a factory may have workers of average age and average airway reactivity, but because of a recently instituted preemployment screening program, the older workers may have generally higher reactivity and the younger workers lower reactivity. Age and reactivity will therefore be confounded in a random sample, even though the sample may not be biased with respect to either variable. Any findings involving airway reactivity in this population would be open to question unless some adjustment had been made for age. Yet the usual tools for "normalizing" clinical measurements, for example a prediction formula based on sex, age, and height, would be of questionable applicability because even control subjects are not "normal." How can one correct for the confounding in a peculiar population? How can one adjust for extraneous influences on physiological function and thus isolate the effects of reactivity, smoking, pollution, or other variables of central interest? In this paper we describe a series of statistical techniques that are useful in such settings.

We illustrate their use in our study of pulmonary function and smoking in subjects from the Multiple Risk Factor Intervention Trial (MRFIT). This study is particularly apt as a methodological example because the confounding was deliberate and explicit, yet inescapable because of the study design (described below). The statistical procedures for identifying and combating confounding are therefore transparent in their operation, because the underlying conditions are known in this case.

The MRFIT was a multicenter clinical trial aimed at reducing the incidence of coronary heart disease in men by aggressive treatment of known risk factors (MRFIT, 1982). As an ancillary study, we measured pulmonary function in these men, with the aim of studying the natural history and reversibility of obstructive airways disease. Besides performing pulmonary function tests, we collected data from MRFIT clinical records, including a detailed smoking history. The MRFIT population was interesting physiologically because it included both short-term and long-term ex-smokers, as well as current smokers and nonsmokers. Use of this population was problematic, however, because subjects were chosen for risk of coronary heart disease on the basis of a mathematical combination of smoking history, blood pressure, and serum lipids. Thus nonsmokers had to be hypertensive or hyperlipidemic to be in the study. This confounding was an inherent feature of the MRFIT program and was therefore beyond our control.

In this paper we describe how we (1) delineated the pattern of confounding in these data, (2) identified the important variables affecting pulmonary function tests, and (3) performed appropriate adjustments. In a separate paper we have reported the adjusted measurements and described how smoking influenced pulmonary function (Brain et al., submitted). Both papers describe a cross-sectional study.
A separate issue not dealt with here is bias in the MRFIT population, i.e., whether our findings can be extrapolated to the general population or any other population of interest. The emphasis here is rather on our methodological approach to dealing with the internal structure of the MRFIT population.

METHODS

Population

The MRFIT was conducted at 20 centers in the United States between 1973 and 1982. We studied subjects at the Boston center, located at the Harvard University School of Public Health. The MRFIT was designed to investigate whether mortality from coronary heart disease in men could be reduced through treatment of risk factors. The population of interest consisted of men aged 35-64 years who were judged to be at higher than normal risk of death from coronary heart disease on the basis of three risk factors: high serum cholesterol, high diastolic blood pressure, and cigarette smoking.

Screening. The Boston center screened 14,638 men to find those at high but not extreme risk. Reasons for exclusion at this stage included (1) low risk as measured by a multivariate formula (MRFIT, 1982), (2) evidence of coronary heart disease or very high serum lipids or blood pressure, (3) other signs of poor health, and (4) unlikelihood of remaining in the 6-year study. The final Boston population comprised 652 men. Further details of screening are given elsewhere (MRFIT, 1982; Brain et al., submitted).

Protocol

Subjects were randomly assigned to treatment, 50% to usual care (by their own physicians) and 50% to a program aimed at lowering risk factors. Both groups returned annually for assessment of health and risk factors, at which time we asked each subject to volunteer for our ancillary pulmonary function study. A few men in poor physical or emotional condition were advised by their physicians or MRFIT staff not to perform the pulmonary function tests. A few subjects demurred because of lack of time, and some came at times when we were not available.

Pulmonary Function Tests

Flow-volume data were collected by a microcomputer connected to an Ohio rolling-seal spirometer. Subjects performed three maximal expirations breathing air and three breathing a mixture of 80% helium and 20% oxygen. From the flow-volume curves we measured expired volume in the first second (FEV1), forced vital capacity (FVC), maximal flow at three stages of expiration (V̇max(50%), V̇max(25%), V̇max(12.5%)), and density dependence of maximal flow at the same three lung volumes (ΔV̇max(50%), ΔV̇max(25%), ΔV̇max(12.5%)). Density dependence was defined by

$$\Delta\dot{V}_{\max} = \frac{\dot{V}_{\max}(\mathrm{He}) - \dot{V}_{\max}(\mathrm{air})}{\dot{V}_{\max}(\mathrm{air})}.$$

Technical details of data acquisition and analysis are given elsewhere (Brain et al., submitted).

Data selection. For acceptable data we required at least three curves on air and three on helium-oxygen, with all six FVCs within 5% of the greatest. "Poor" shapes due to coughing, submaximal effort, or inconsistent performance were discarded. Each subject was tested during every annual visit if possible. The data in this report comprise each subject's first successful attempt to produce acceptable flow-volume curves. We obtained acceptable data on at least one occasion from 403 men. To investigate possible bias caused by our method of selecting data, we randomly sampled 30 subjects from each of three groups: (a) acceptable data on the first occasion tested, (b) failure on the first occasion but acceptable data on the second, and (c) failure on the first and second occasions. There was no difference among the three groups in age (P > 0.90), height (P > 0.80), or distribution among smoking categories as defined below (P > 0.30).
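The density-dependence definition and the curve-acceptance rule are simple enough to restate as code. The following is a minimal sketch, not the authors' software: the function names are hypothetical, the numbers are illustrative rather than study data, and "within 5% of the greatest" is interpreted here (an assumption) as every FVC being at least 95% of the largest FVC across both gases. The visual screening of curve shapes is not modeled.

```python
def density_dependence(vmax_heo2: float, vmax_air: float) -> float:
    """Fractional gain in maximal flow on helium-oxygen relative to air at one lung volume."""
    if vmax_air <= 0:
        raise ValueError("maximal flow on air must be positive")
    return (vmax_heo2 - vmax_air) / vmax_air


def curves_acceptable(fvc_air, fvc_heo2, min_curves=3, tolerance=0.05):
    """Hypothetical check of the data-selection rule.

    Requires at least `min_curves` curves per gas and every FVC within
    `tolerance` (a fraction) of the largest FVC across both gases.
    Curve-shape screening (coughs, submaximal effort) is not modeled.
    """
    if len(fvc_air) < min_curves or len(fvc_heo2) < min_curves:
        return False
    all_fvc = list(fvc_air) + list(fvc_heo2)
    largest = max(all_fvc)
    return all(f >= (1.0 - tolerance) * largest for f in all_fvc)


# Illustrative numbers only (not study data)
print(density_dependence(1.45, 1.00))
print(curves_acceptable([4.20, 4.25, 4.18], [4.22, 4.10, 4.24]))  # True
print(curves_acceptable([4.20, 4.25, 3.90], [4.22, 4.10, 4.24]))  # False: 3.90 < 0.95 * 4.25
```

Because ΔV̇max is a ratio, it is dimensionless, which is why it carries no units in the tables.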

Personal Data and Smoking

From the MRFIT records of annual visits we collected age, height, weight, blood pressure, serum lipid levels, smoking data, serum thiocyanate (a metabolite of cigarette smoke) (Benfari et al., 1977), and information on the use of prescribed drugs. We divided the subjects into five smoking categories. Smokers (S) were currently smoking at least one cigarette per day. Pipe and cigar smokers were excluded from the study. Recent ex-smokers (X) had quit since enrolling in the MRFIT (0-6 years before test date). Moderate-term ex-smokers (XX) had quit within 5 years before enrolling in the MRFIT. Long-term ex-smokers (XXX) had quit more than 5 years before the MRFIT. Nonsmokers (N) had never smoked. Because the recent ex-smokers had quit since entering the MRFIT, we could determine how long it was since they had last smoked. For current smokers we recorded the subject's estimate of daily cigarette consumption and obtained from MRFIT records a composite dosage index of current exposure to cigarette smoke.
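The five smoking categories follow mechanically from current consumption and the quit date relative to MRFIT enrollment. A minimal sketch, assuming hypothetical record fields (`cigs_per_day`, `ever_smoked`, `quit_date`, `enroll_date`) that are not the MRFIT field names; it also assumes pipe and cigar smokers were already excluded, and it treats a quit exactly 5 years before enrollment as moderate-term (XX), a boundary the text does not settle. The numeric scores are the 1-5 coding described under Statistical Analysis.

```python
from datetime import date
from typing import Optional

# Numeric smoking-category scores (N=1 ... S=5), as described under Statistical Analysis
SMOKING_SCORE = {"N": 1, "XXX": 2, "XX": 3, "X": 4, "S": 5}


def smoking_category(cigs_per_day: float, ever_smoked: bool,
                     quit_date: Optional[date], enroll_date: date) -> str:
    """Assign S / X / XX / XXX / N from hypothetical record fields.

    Assumes pipe and cigar smokers were already excluded, and treats a quit
    exactly 5 years before enrollment as moderate-term (XX).
    """
    if cigs_per_day >= 1:
        return "S"                      # current smoker, at least 1 cigarette/day
    if not ever_smoked:
        return "N"                      # never smoked
    if quit_date is None:
        raise ValueError("ex-smoker record lacks a quit date")
    if quit_date >= enroll_date:
        return "X"                      # quit after enrolling in the MRFIT
    years_before = (enroll_date - quit_date).days / 365.25
    return "XX" if years_before <= 5 else "XXX"


enrolled = date(1975, 1, 1)  # illustrative enrollment date, not from the study
for args in [(20, True, None), (0, False, None), (0, True, date(1976, 6, 1)),
             (0, True, date(1972, 1, 1)), (0, True, date(1965, 1, 1))]:
    cat = smoking_category(*args, enrolled)
    print(cat, SMOKING_SCORE[cat])
```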

Statistical Analysis

The dependent variables in this study were eight pulmonary function measurements: lung volume (FEV1 and FVC), maximal flow (V̇max) at 50, 25, and 12.5% of vital capacity, and helium sensitivity (ΔV̇max) at the same three volumes. We refer to the dependent variables as y's. The independent variables of primary concern were smoking category, dosage index, cigarettes per day, and years since quitting. The remaining independent variables, extraneous to the study and potentially confounding, were age, height, weight, blood pressure, use of propranolol and diuretics, and serum concentrations of thiocyanate, cholesterol, and triglyceride. For procedures requiring numerical data we scored medication as 0-1 and smoking category as 1-2-3-4-5, indicating N-XXX-XX-X-S, respectively, as defined above. We refer to independent variables as x's.

Our analysis proceeded in four steps, which we list briefly and then describe in more detail:
(1) Factor analysis to map out relationships among independent variables.
(2) Stepwise multiple regression to select appropriate predictor variables for each pulmonary function variable.
(3) Linear adjustment to correct pulmonary function for the selected predictors.
(4) Bootstrap validation to identify artifactual predictors and remove their influence by repeating steps (2)-(3) with random resampling.

Factor analysis is a technique for "explaining" the correlation among many measured variables (x's) by their linear relationship to a small number of underlying factors (Johnson and Wichern, 1982). The communality of a measured variable is defined as the fraction of its variance explained by the factors. If the factors are constructed appropriately, the communality of each measured variable can be attributed largely to one particular factor. Several x's may have the same principal underlying factor, in which case they are regarded as aliases for one another. In further analysis a single representative variable may be chosen for each group. We applied factor analysis to the set of 10 x's listed in Table 1. Smoking dosage, cigarettes per day, and time since quitting were not included because they applied only to the S or X subgroups. The aim of factor analysis was to reduce those 10 correlated variables to a small number of uncorrelated ones that could be used in further analysis. By this method we could separate smoking from other predictors with which it was confounded.
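For standardized variables and orthogonal (uncorrelated) factors, the communality of a variable is the sum of its squared factor loadings, and its "principal underlying factor" is the one with the largest squared loading. A minimal sketch of that bookkeeping follows. The loadings are invented for illustration (they are NOT estimates from the MRFIT data), the function names are hypothetical, and estimating the loadings themselves (extraction plus an orthogonal rotation) is deliberately left out.

```python
def communality(loadings_row):
    """Share of a standardized variable's variance explained by orthogonal factors:
    the sum of its squared loadings."""
    return sum(l * l for l in loadings_row)


def group_by_primary_factor(loadings):
    """Group variables by the factor carrying their largest squared loading."""
    groups = {}
    for name, row in loadings.items():
        primary = max(range(len(row)), key=lambda k: row[k] ** 2)
        groups.setdefault(primary, []).append(name)
    return groups


# Hypothetical rotated loadings on two factors (NOT estimates from the MRFIT data)
loadings = {
    "smoking": [0.85, 0.10],
    "thiocyanate": [0.88, 0.05],
    "height": [0.05, 0.84],
    "weight": [0.15, 0.83],
}
for name, row in loadings.items():
    primary_share = max(l * l for l in row) / communality(row)
    print(f"{name}: communality {communality(row):.3f}, primary share {primary_share:.2f}")
print(group_by_primary_factor(loadings))
```

Variables that land in the same group are the "aliases" of the text; choosing one representative per group is left to the analyst.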

TABLE 1
SUMMARY STATISTICS (n = 403 SUBJECTS)

Pulmonary function variables (dependent)      Mean ± SD(a)
  FEV1 (liters)                               3.16 ± 0.56
  FVC (liters)                                4.23 ± 0.65
  V̇max(50%) (liters/sec)                      3.82 ± 1.47
  V̇max(25%) (liters/sec)                      0.98 ± 0.44
  V̇max(12.5%) (liters/sec)                    0.29 ± 0.13
  ΔV̇max(50%)                                  0.45 ± 0.18
  ΔV̇max(25%)                                  0.20 ± 0.17
  ΔV̇max(12.5%)                                0.07 ± 0.19

Predictor variables (independent)
  Age (years)                                 50.9 ± 5.9
  Height (inches)                             69.4 ± 2.6
  Weight (pounds)                             188 ± 26
  Diastolic blood pressure (mm Hg)            81 ± 8
  Serum cholesterol (mg/dl)                   242 ± 36
  Log10 serum triglyceride (mg/dl)            2.22 ± 0.24
  Log10 serum thiocyanate (mg/dl)             1.88 ± 0.33
  Smoking category
    Current smokers (S)                       133 (33%)
    Recent quitters (X)                       81 (20%)
    Moderate-term quitters (XX)               49 (12%)
    Long-term quitters (XXX)                  62 (15%)
    Never smoked (N)                          78 (19%)
  Diuretic use
    Yes                                       170 (42%)
    No                                        233 (58%)
  Propranolol use
    Yes                                       43 (11%)
    No                                        360 (89%)

a Means ± SD.

Stepwise multiple regression is a procedure for selecting, from a list of independent variables, a small set of "best" predictors for some dependent variable (Draper and Smith, 1980). In contrast to factor analysis, which concerns redundancy among x's, stepwise regression concerns the dependence of y's on x's. We used the backward elimination variant of stepwise regression, in which statistically insignificant predictors are deleted one by one from an initial list of potential predictors. The procedure is halted when further deletions would cause a loss of information, i.e., when every remaining predictor explains a statistically significant portion of the variance in addition to what is explainable by all other remaining predictors. We used P < 0.05 as the criterion of statistical significance.

Linear adjustment formulas were constructed from the final stepwise regression equation. We used each formula to adjust every subject to standard values of each predictor in the equation (except smoking). For standard values we took the sample mean age, height, etc., and no medication.

Bootstrap validation is a computer-intensive technique for examining the impact of random variability on any process of sampling and analysis (Diaconis and Efron, 1983; Efron and Gong, 1983). Our concern was whether some questionable predictors selected by stepwise regression (serum cholesterol as a predictor of V̇max, for example) might be artifactual. Were these genuine effects, or were they statistical flukes that might not arise in another random sample? The bootstrap technique addresses such questions by simulating the acquisition of new random samples, on which the analysis is repeated. The "new" samples are drawn at random from the original data, each point being replaced after it is drawn. Thus a bootstrap sample is the same size as the original data, though it may contain some of the original points several times and others not at all.

We applied the bootstrap technique to the entire sequence of analyses, including stepwise multiple regression and linear adjustment, as follows. From the complete list of 403 subjects we drew 20 bootstrap samples, each consisting of 403 names. On each bootstrap sample we performed stepwise regression, obtaining a list of predictors and a linear adjustment formula for every y. Linear adjustment was then applied to the original set of 403 observations. When we had finished, each subject had 20 complete sets of adjusted pulmonary function measurements. We averaged these to produce a final adjusted set. To examine the workings of the bootstrap we tabulated how many times each predictor was selected. We also analyzed the 20 × 403 adjusted values by Model II (random effects) analysis of variance to compare three sources of random variability: subject-to-subject, bootstrap-to-bootstrap, and residual (Sokal and Rohlf, 1981). Computations were performed with SAS (Statistical Analysis System) at the Health Sciences Computing Facility, Harvard University School of Public Health (SAS Institute, 1982).

RESULTS

Summary Statistics

Summary statistics for the eight pulmonary function variables (y's) and 10 independent variables (x's) are shown in Table 1, with the y's unadjusted. In the following subsections we detail our progress from Table 1 to Table 4, which contains adjusted values of the eight pulmonary function variables. Acceptable flow-volume curves were obtained from 403 subjects. Mean age at testing was 50.9 years. Mean height and weight were 69.4 inches and 188 pounds (176.3 cm, 85.5 kg), respectively. There were 133 current smokers (S), 81 recent quitters (X), 49 moderate-term quitters (XX), 62 long-term quitters (XXX), and 78 nonsmokers (N). Because the distributions of serum triglyceride and thiocyanate were severely skewed, we used logarithms of those two variables in all further analyses to reduce the influence of extreme values.
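The effect of the log transform on a right-skewed variable can be seen by comparing the moment coefficient of skewness before and after taking log10. A minimal sketch; the triglyceride values below are invented to mimic a right-skewed distribution and are NOT study data.

```python
import math


def skewness(values):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed serum triglyceride values in mg/dl (NOT study data)
triglyceride = [50, 60, 70, 80, 90, 100, 120, 150, 250, 600]
log_trig = [math.log10(v) for v in triglyceride]

print(f"raw skewness   {skewness(triglyceride):.2f}")
print(f"log10 skewness {skewness(log_trig):.2f}")
```

The single extreme value (600 mg/dl) dominates the raw third moment; on the log scale its leverage is much smaller, which is the sense in which the transform reduces the influence of extreme values.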

Factor Analysis

Factor analysis identified five factors accounting for 71% of the total variability (Table 2). Each of the 10 x's was associated far more strongly with one particular factor than with any other. Thus factor analysis effectively divided the 10 x's into five groups of two, each group closely correlated but relatively independent of the other groups. Every variable's communality (variance explained by all factors) was attributable almost entirely to its primary factor. The one exception was age, with only 37% of its 60% communality attributable to factor (5). However, no other factor accounted for more than 7% of the variance of age. Interpretation of the factors was clear. Factor (1) was associated with smoking history and serum thiocyanate, i.e., current smoking. Factor (2) reflected body size, while factor (3) represented serum lipids. Factor (4) included diuretic and propranolol use with a positive correlation, presumably because of similar clinical indications. Factor (5) comprised age and diastolic blood pressure with a negative correlation; factor analysis thus revealed a strong pattern of confounding reflecting MRFIT selection rather than natural physiology.

TABLE 2
FACTOR ANALYSIS REDUCED 10 CONFOUNDED VARIABLES TO 5 INDEPENDENT GROUPS

                                                               Variance explained by
Factor   Variance          Associated variable                 Primary       All
         explained (%)                                         factor (%)    factors (%)
(1)      16.6              Smoking category                    73.0          76.5
                           Thiocyanate                         78.5          78.8
(2)      15.1              Height                              70.9          71.9
                           Weight                              70.3          74.3
(3)      14.5              Cholesterol                         68.9          74.7
                           Triglyceride                        64.7          72.6
(4)      14.3              Diuretics                           60.4          64.1
                           Propranolol                         58.0          61.4
(5)      10.5              Age                                 37.0          60.4
                           Diastolic blood pressure            66.0          75.8
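Table 2 can be checked for internal consistency. With standardized variables and orthogonal factors, the total variance explained is the same whether it is summed over factors or averaged over the variables' communalities, so both routes should give about 71% (up to rounding in the published table). The sketch below uses only the values printed in Table 2; the variable names are ours.

```python
# Values transcribed from Table 2: (primary factor %, all factors %)
table2 = {
    "Smoking category": (73.0, 76.5),
    "Thiocyanate": (78.5, 78.8),
    "Height": (70.9, 71.9),
    "Weight": (70.3, 74.3),
    "Cholesterol": (68.9, 74.7),
    "Triglyceride": (64.7, 72.6),
    "Diuretics": (60.4, 64.1),
    "Propranolol": (58.0, 61.4),
    "Age": (37.0, 60.4),
    "Diastolic blood pressure": (66.0, 75.8),
}
factor_variance = [16.6, 15.1, 14.5, 14.3, 10.5]  # % of total variance, factors (1)-(5)

# Total variance explained, summed over factors vs. averaged over communalities
total_by_factor = sum(factor_variance)
total_by_variable = sum(all_f for _, all_f in table2.values()) / len(table2)

# Fraction of each variable's communality carried by its primary factor
primary_share = {name: p / a for name, (p, a) in table2.items()}
weakest = min(primary_share, key=primary_share.get)

print(f"{total_by_factor:.1f} {total_by_variable:.2f}")
print(weakest, f"{primary_share[weakest]:.2f}")
```

Age is the variable whose communality is least concentrated on its primary factor (37.0 of 60.4 points, about 61%), matching the exception noted in the text.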

Stepwise Multiple Regression

Original data. We conducted stepwise multiple regression analysis to select predictors for each pulmonary function variable (y), using as potential predictors the following five representative independent variables, one from each factor group: smoking category, age, height, cholesterol, and diuretic use. The criterion for retaining variables in the regression equation was statistical significance at P < 0.05. Results are summarized in the upper half of Table 3. Smoking influenced FEV1, all flows (V̇max), and helium sensitivity (ΔV̇max) at higher volumes. Age and height predicted all flows and volumes but not ΔV̇max. Serum cholesterol was an additional significant predictor for V̇max(12.5%), and diuretic use for ΔV̇max(12.5%).

Bootstrap samples. To validate the stepwise regression we repeated it on 20 bootstrap samples, each sample consisting of 403 subjects drawn randomly (with replacement) from the original 403. In this way we obtained 20 versions of the predictor list for each y. The lower half of Table 3 shows the frequency of selection of each predictor for each y. Bootstrap selection frequencies agreed with the original regression in most cases. For example, FEV1 was predicted by age, height, and smoking in all 20 replications, whereas cholesterol and diuretic use were chosen infrequently. The bootstrap thus seemed to confirm the original analysis, in which age, height, and smoking were selected as predictors of FEV1. The cases of FVC, V̇max, and ΔV̇max were similar. A graphical summary of bootstrap results is shown in Fig. 1, in which frequency of inclusion in bootstrap replications is compared with significance level (P value) in the original analysis.

TABLE 3
PREDICTORS OF PULMONARY FUNCTION SELECTED BY STEPWISE MULTIPLE REGRESSION ANALYSIS

                                          Predictor
Pulmonary function variable     Smoking   Age   Height   Cholesterol   Diuretics

Original analysis(a)
  FEV1                             +       +      +          -             -
  FVC                              -       +      +          -             -
  V̇max(50%)                        +       +      +          -             -
  V̇max(25%)                        +       +      +          -             -
  V̇max(12.5%)                      +       +      +          +             -
  ΔV̇max(50%)                       +       -      -          -             -
  ΔV̇max(25%)                       +       -      -          -             -
  ΔV̇max(12.5%)                     -       -      -          -             +

Frequency of selection in analysis of 20 bootstrap samples
  FEV1                            20      20     20          3             5
  FVC                              7      20     20          5             8
  V̇max(50%)                       20      20      9          3             4
  V̇max(25%)                       20      20     11          6             2
  V̇max(12.5%)                     20      20     19         10             2
  ΔV̇max(50%)                      16       8      3          2             4
  ΔV̇max(25%)                      18       5      0          2             1
  ΔV̇max(12.5%)                     6       1      2          5            14

a Selected predictors (+) explained a statistically significant amount of the variance of the predicted variable as judged by F test (P < 0.05), after all other included predictors were taken into account. (-) Predictors not selected.
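Backward elimination and the bootstrap tally behind the lower half of Table 3 can be sketched in a few lines of NumPy. This is an illustration, not the authors' SAS procedure, and it makes two assumptions worth flagging: the P < 0.05 retention rule is approximated by a fixed F-to-remove threshold of 3.86 (about the 5% critical value of F with 1 and roughly 400 degrees of freedom), and the demo data are synthetic, with two real predictors and one pure-noise column. The function names are hypothetical.

```python
import numpy as np

# About the 5% critical value of F(1, ~400); a fixed F-to-remove threshold is
# used here as a stand-in for the paper's P < 0.05 retention criterion.
F_TO_REMOVE = 3.86


def rss(X, y, cols):
    """Residual sum of squares of an intercept-plus-`cols` least-squares fit."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def backward_eliminate(X, y, f_to_remove=F_TO_REMOVE):
    """Delete the weakest predictor until every survivor has partial F >= threshold."""
    cols = list(range(X.shape[1]))
    n = len(y)
    while cols:
        rss_full = rss(X, y, cols)
        sigma2 = rss_full / (n - len(cols) - 1)
        # Partial F for dropping each predictor, given all other survivors
        partial_f = {j: (rss(X, y, [k for k in cols if k != j]) - rss_full) / sigma2
                     for j in cols}
        weakest = min(partial_f, key=partial_f.get)
        if partial_f[weakest] >= f_to_remove:
            break
        cols.remove(weakest)
    return cols


def selection_frequency(X, y, n_boot=20, seed=0):
    """How often each predictor survives backward elimination across bootstrap samples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p, dtype=int)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # n subjects drawn with replacement
        for j in backward_eliminate(X[idx], y[idx]):
            counts[j] += 1
    return counts


# Synthetic demo (NOT MRFIT data): columns 0 and 1 matter, column 2 is noise
rng = np.random.default_rng(1)
X = rng.normal(size=(403, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=403)
print("original selection:", backward_eliminate(X, y))
print("bootstrap counts:", selection_frequency(X, y))
```

Strong predictors survive in all 20 replicates, while a borderline or spurious one is selected only some of the time, which is the pattern Fig. 1 relates to the original P values.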

Adjusted Values

Calculation. We did not find objective grounds in the bootstrap selection frequencies for judging whether a variable belonged in the prediction equation. Instead of using the bootstrap to study the list of predictors, we used it to address the variability of the adjusted values themselves. To do this we used the regression equation from each bootstrap sample to adjust all of the original data. We thus produced 20 sets of adjusted pulmonary function variables for each of the 403 subjects.
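The adjustment step can be written as y_adj = y - Σ b_j (x_j - s_j), where b_j is a fitted slope and s_j the standard value of predictor j (the sample mean for continuous predictors; 0, i.e. no medication, for a medication indicator). Smoking stays in the regression so that the other slopes are estimated with it held fixed, but it is not adjusted. The NumPy sketch below follows that recipe with one loudly stated simplification: the predictor list is held fixed across bootstrap replicates, whereas the authors re-ran stepwise selection on each one. Function names and the synthetic data are hypothetical, not MRFIT values.

```python
import numpy as np


def fit_slopes(X, y):
    """Least-squares slopes (intercept fitted but discarded)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]


def linear_adjust(X, y, slopes, standard):
    """Move each subject's y to standard predictor values.

    `standard` maps column index -> standard value; columns not listed
    (here, smoking) are left unadjusted:  y_adj = y - sum_j b_j (x_j - s_j).
    """
    y_adj = np.asarray(y, dtype=float).copy()
    for j, s in standard.items():
        y_adj -= slopes[j] * (X[:, j] - s)
    return y_adj


def bootstrap_adjust(X, y, standard, n_boot=20, seed=0):
    """Average of n_boot adjusted versions of the ORIGINAL observations.

    Each replicate refits the regression on a with-replacement resample of
    subjects and applies that replicate's formula to every original subject.
    The predictor set is fixed here (a simplification of the paper's method).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    versions = np.empty((n_boot, n))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        versions[b] = linear_adjust(X, y, fit_slopes(X[idx], y[idx]), standard)
    return versions.mean(axis=0), versions


# Synthetic demo (NOT MRFIT data): columns are smoking score, age, height
rng = np.random.default_rng(2)
n = 403
smoke = rng.integers(1, 6, size=n).astype(float)
age = rng.normal(51, 6, size=n)
height = rng.normal(69.4, 2.6, size=n)
X = np.column_stack([smoke, age, height])
fev1 = 3.2 - 0.03 * (age - 51) + 0.1 * (height - 69.4) - 0.05 * (smoke - 3) + rng.normal(0, 0.3, n)

standard = {1: age.mean(), 2: height.mean()}  # smoking (column 0) is not adjusted
adjusted, versions = bootstrap_adjust(X, fev1, standard)
print(np.corrcoef(fev1, age)[0, 1], np.corrcoef(adjusted, age)[0, 1])
```

Because the standards are sample means, the adjusted mean equals the raw mean, while the dependence of the adjusted values on age essentially disappears; the `versions` array (20 × 403) is the input a Model II variance-components analysis would use.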

[Fig. 1 plot: frequency of selection in bootstrap replications versus P value in original analysis]

FIG. 1. Relationship between a variable's significance (P value) in stepwise multiple regression analysis and its frequency of selection in random resampling trials (bootstraps). Points represent five independent variables (age, height, smoking, cholesterol, diuretics) considered as predictors of eight pulmonary function indices in the MRFIT population (FEV1, FVC, and V̇max and ΔV̇max at 50, 25, and 12.5% FVC). The conventional critical level, P < 0.05, corresponded to a selection frequency of approximately 50% in 20 trials.

Data were adjusted to standard values of the four factor-representative x's other than smoking. For standard values we used mean age, height, and serum cholesterol and no diuretic use. The average of the 20 adjusted values was taken as the definitive bootstrap-adjusted value for each subject. The data were then ready for comparison of smoking groups, as described in a separate report (Brain et al., submitted).

Summary statistics on the adjusted values are shown in the left half of Table 4. Means and standard deviations were not noticeably different from those of the raw data. We did not expect major changes, because in most cases the sample mean was used as the standard value for adjustment. What we did expect was realignment of various subgroups to compensate for their differing composition with respect to influential variables. An example is shown in Fig. 2. Age and height were the strongest predictors of FEV1. Nonsmokers (N) and long-term ex-smokers (XXX) happened to be the two oldest and two shortest groups. As a consequence FEV1 was adjusted upward by an average of 174 ml in nonsmokers and 162 ml in long-term quitters.

TABLE 4
PULMONARY FUNCTION VARIABLES ADJUSTED FOR AGE, BODY SIZE, AND OTHER INFLUENCES PECULIAR TO MRFIT SAMPLE

                              Mean ± SD(a)                      Components of SD(b)
Variable                      Unadjusted      Adjusted          Total   Subject   Bootstrap   Residual
FEV1 (liters)                 3.16 ± 0.56     3.22 ± 0.47       0.48    0.47      0.04        0.06
FVC (liters)                  4.23 ± 0.65     4.29 ± 0.52       0.53    0.52      0.05        0.07
V̇max(50%) (liters/sec)        3.82 ± 1.47     3.94 ± 1.43       1.45    1.43      0.09        0.18
V̇max(25%) (liters/sec)        0.98 ± 0.44     1.02 ± 0.41       0.42    0.41      0.03        0.05
V̇max(12.5%) (liters/sec)      0.29 ± 0.13     0.30 ± 0.12       0.12    0.12      0.01        0.02
ΔV̇max(50%)                    0.45 ± 0.18     0.46 ± 0.18       0.18    0.18      0.01        0.02
ΔV̇max(25%)                    0.20 ± 0.17     0.21 ± 0.17       0.18    0.17      0.00        0.01
ΔV̇max(12.5%)                  0.07 ± 0.19     0.05 ± 0.19       0.19    0.19      0.01        0.02

a n = 403 subjects. Adjusted value is the subject's average over 20 bootstrap replications of the regression-adjustment procedure.
b n = 403 subjects × 20 bootstrap replications. Random variability attributed to various sources by two-way Model II analysis of variance: $\mathrm{SD}^2 = \mathrm{SD}^2_{\mathrm{subject}} + \mathrm{SD}^2_{\mathrm{bootstrap}} + \mathrm{SD}^2_{\mathrm{residual}}$.

[Fig. 2 plot: group means by smoking history (N, XXX, XX, X, S)]

FIG. 2. The statistical adjustment procedure realigned smoking groups with respect to FEV1, compensating for their differing composition with respect to major predictors of FEV1. Symbols indicate means ± SEM. Dashed lines are for reference to mean predictor value and zero adjustment.

The effect of the adjustments is illustrated in Fig. 3. Anticipating our further analyses, we performed linear regression of FVC on smoking dose among current smokers, using unadjusted FVC in one case and adjusted FVC in the other. The relationship was statistically significant when adjusted values were used (P < 0.05), but not when raw values were used (P > 0.10). Use of adjusted values similarly increased the statistical power when we tested the dependence of pulmonary function variables on cigarettes per day (in current smokers) and years since quitting (in recent ex-smokers).

[Fig. 3 plot: FVC (liters) versus smoking dosage index, unadjusted and adjusted panels]

FIG. 3. Regression of FVC on smoking dosage index among current smokers (n = 91). The relationship was statistically significant when regression was performed on adjusted FVC (P < 0.05), but not when performed on unadjusted FVC (P > 0.10). Dashed lines indicate the 95% confidence band for the position of the "true" line.

Variance components. The 20 × 403 adjusted values were subjected to two-way Model II (random effects) analysis of variance, by which we assessed three sources of random variation: (1) subject-subject differences, reflecting natural heterogeneity in the sample; (2) bootstrap-bootstrap differences, reflecting variation arising from selection of predictors; and (3) residual error, reflecting random variability attributable to measurement error and other unknown, uncontrolled sources. Results are shown in the right half of Table 4. Subject-subject variation accounted for virtually all of the standard deviation in every pulmonary function variable. Residual error was only 10-20% as large as subject-subject variation, and variation among bootstrap replicates was only 5-10% as large. These figures indicate that artifactual selection of predictors, which would be manifested as variability among bootstrap replicates, was relatively minor.

DISCUSSION

Standardization

When extraneous variables intrude on a comparative study, it is natural to divide data analysis into two stages: standardization and comparison. In the standardization phase one tries to produce adjusted or normalized dependent variables (y), freed of all known influences except for the single remaining independent variable (x). Comparison may then proceed, focusing on the influence of x on y. In studies of physiological function, standardization has taken several forms. Simplest is the external standard. For example, Buist et al. (1979) measured

262

FELDMAN, BRAIN, AND HARBISON

FEV1, FVC, and Vmax~50%)in volunteer subjects at a smoking cessation clinic, including some who had quit smoking and some who had not. Predicted values for FEV 1, FVC, and Vmax~50~)were calculated by a formula from previous literature, based on age and sex in a sample of healthy nonsmoking adults (Morris et al., 1971). Measurements were divided by predicted value and the central analy s i s - c o m p a r i s o n of pulmonary function in smokers and q u i t t e r s - - w a s performed entirely on percentages of prediction• Similar examples of external standardization include recent studies of potash and textile workers (Graham et al., 1984; Schachter et al., 1984)• The microprocessors built into modern spirometry equipment are likely to make external standardization irresistibly convenient and therefore more widespread despite the obvious pitfalls, namely, questionable comparability between the study population and the standard population• Investigators should take care at least to note the source of the standardization formula in reporting studies conducted with such equipment• A second approach to standardization involves an internal standard derived from one part of the data, usually the healthiest group or an initial group for reference in a longitudinal study. For example, L a m e t al. (1981), in a study of grain workers and smoking office workers, obtained prediction formulas from a control subgroup who had no history of smoking, dust exposure, or pulmonary disease• Smokers and grain workers were compared with controls after adjustment of pulmonary function values according to the control-based formula. Similarly Beck et al. (1981) surveyed several thousand smokers, nonsmokers, and ex-smokers in three U.S. towns and derived prediction equations for lung function based on age, height, weight, and sex in 1817 healthy nonsmoking white residents. 
Differences between observed and predicted FEV1, Vmax(50%),and Vmaxt25~) were then analyzed for effects of age, smoking, and cumulative exposure. Another example of internal standardization is a recent study of children's lung function and parental smoking (Tashkin et al., 1984)• It is noteworthy that the pulmonary function formulas of Knudson et al. (1976), widely used by others as an external standard, were intended by the authors as only internal reference formulas for a longitudinal study (Knudson et al., 1983)• A third approach to standardization is to derive separate standards for each group to be compared• For example, Bande et al. (1980) fitted FEV~ and FVC data with polynomial regression equations in age, height, and weight, constructing separate curves for nonsmokers, light smokers, and heavy smokers• The curves were compared at a particular age by means of confidence bands around the curves. Finally, one may standardize on the basis of a single internal standard derived from all the data. An example is a recent study of alcohol consumption and pulmonary function (Sparrow et al., 1983)• Subjects were grouped by smoking status and alcohol consumption• Each group was standardized to the age, height, and years of education of the entire sample by analysis of covariance (ANCOVA), also known as parallel-line regression• ANCOVA removes whatever consistent influence might be exerted by age, height, etc., throughout the sample. If the groups to be compared differ in age, height, etc., then ANCOVA is theoretically
more efficient than using a control-group standard in making statistical comparisons. Multiple linear regression analysis (MLR) differs from ANCOVA only in that, after adjustments are made, the remaining variable for comparison is continuous rather than discrete (Draper and Smith, 1980). In both ANCOVA and MLR the distinction between standardization and comparison may be blurred, because both steps are accomplished in one computational operation. Both techniques are analogous to two-way analysis of variance (ANOVA), in which two discrete sources of variation are assessed simultaneously, each "controlled" for the other. Levenstein et al. (1978) used multiple regression to standardize children's pulmonary function measurements in a study of environmental pollution.

In this study we standardized on the full sample as in ANCOVA, removing all common influences of age, height, etc., as a prelude to a comparison across discrete smoking categories. Two additional features made our treatment more thorough than simple ANCOVA. First, the variables of standardization were selected on the basis of multivariate analysis of these data rather than for extrinsic reasons or by reference to previous studies. Stepwise regression was the means of selection, preceded by factor analysis, which provided a trim, interpretable list from which to select. Second, the standardization was confirmed and refined by bootstrap replication, which made the precise choice of predictors moot and left us confident that the principal source of variation in our standardized values was biological rather than artifactual.

Dealing with Confounding

In studies of environmental and occupational health effects, confounding has been detected (or, conversely, shown to be absent) in several ways. The simplest is to confirm by conventional statistical testing that a potential confounding variable does not differ between the groups to be compared. For example, Buist et al. (1979) confirmed that mean age was not significantly different in smokers and quitters. Teculescu et al. (1980), before comparing spirometric results in light, moderate, and heavy smokers, confirmed that their anthropometric data did not differ significantly. In the present study we confirmed that patterns of failure and success in producing flow-volume curves were not associated with age, height, or smoking category. Correlation analysis may be used to quantify confounding relationships rather than merely to prove or disprove their presence. For example, Greaves et al. (1984), in reporting regressions of pulmonary function on age, height, pack-years of smoking, and cumulative occupational exposure to solder fumes, noted that "statistical independence of [independent] variables was confirmed by the simple correlation coefficients." A recent study of respiratory health in morticians pointed to a correlation of r = 0.49 between age and number of bodies embalmed as a possible explanation for the study's failure to detect an effect of cumulative exposure to embalming fluid (Levine et al., 1984). The quantitative approach was carried further by Beck et al. (1981), who analyzed the principal components of their correlation matrix (principal components being, like factors, independent linear combinations of the original variables) and concluded that the correlations were weak enough to ignore.

In statistical literature the interdependence of independent variables is known as collinearity (or, redundantly, multicollinearity). A review of the subject for physiological investigators was recently published by Slinker and Glantz (1985). The monographs of Belsley et al. (1980) and Levenstein (1979) provide criteria for "diagnosis" of collinearity and some remedies. Two of their points are noteworthy. First, pairwise correlation is only the simplest, most easily detected case of the more general phenomenon of collinearity; small pairwise correlations are therefore not sufficient evidence for acquitting one's data of collinearity. Second, collinearity is not necessarily harmful. While certain products of the regression procedure (regression coefficients, adjusted values) may be degraded by large variance stemming from collinear data, other numerical results may nonetheless be precisely determined. The investigator must judge whether the quantities of interest have acceptable precision and, if not, whether collinearity or mere noise is to blame. The diagnostic criteria of Belsley et al. are designed to aid the investigator in making such judgments. In this report we were confronted with a large set of independent variables that were correlated by design. Factor analysis allowed us to reduce the variables to a list of potential predictors that was not only short but also free of collinearity. Smoking behavior in particular constituted a distinct factor, independent of the other four factors (age, body size, serum lipids, medication). The minor contribution of bootstrap-to-bootstrap variation to the variance of our final adjusted values showed that whatever collinearity remained was not harmful, in that it did not inflate variability in the quantities of interest.
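The two checks discussed above, a collinearity diagnosis and a bootstrap test of whether collinearity actually degrades the adjusted values, can be sketched together. This is a hedged illustration on simulated data, not the authors' analysis. `condition_indices` follows the scaling convention usually attributed to Belsley et al. (unit-length columns, intercept included, no centering). `bootstrap_share` is an assumed, simplified version of a bootstrap check: ordinary least squares on a fixed set of covariates stands in for the factor-analysis and stepwise-selection steps, and every subject is referred to the sample means of the covariates. All function names and simulated variables are hypothetical.

```python
import numpy as np

def condition_indices(X):
    """Condition indices in the style of Belsley et al.: scale each column of the
    design matrix (intercept included, no centering) to unit length, then divide
    the largest singular value by each singular value."""
    A = np.column_stack([np.ones(len(X)), X])
    A = A / np.linalg.norm(A, axis=0)
    s = np.linalg.svd(A, compute_uv=False)
    return s.max() / s

def adjusted_values(X, y, X_fit, y_fit):
    """Remove covariate effects estimated on (X_fit, y_fit) from y, referring
    every subject to the covariate means of the full sample X."""
    A = np.column_stack([np.ones(len(X_fit)), X_fit])
    coef, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
    return y - (X - X.mean(axis=0)) @ coef[1:]

def bootstrap_share(X, y, n_boot=200, seed=0):
    """Share of the variance of adjusted values attributable to
    bootstrap-to-bootstrap variation in the fitted coefficients."""
    rng = np.random.default_rng(seed)
    n = len(y)
    reps = np.empty((n_boot, n))
    for b in range(n_boot):
        idx = rng.integers(0, n, n)  # resample subjects with replacement
        reps[b] = adjusted_values(X, y, X[idx], y[idx])
    between_boot = reps.var(axis=0, ddof=1).mean()  # per-subject spread across replicates
    between_subj = reps.mean(axis=0).var(ddof=1)    # spread of subjects' mean adjusted values
    return between_boot / (between_boot + between_subj)

# Simulated, hypothetical data: height deliberately correlated with weight.
rng = np.random.default_rng(1)
n = 300
age = rng.uniform(35, 57, n)
weight = rng.normal(85, 12, n)
height = 120 + 0.6 * weight + rng.normal(0, 4, n)
fev1 = -4.0 - 0.03 * age + 0.05 * height + rng.normal(0, 0.4, n)

X = np.column_stack([age, height, weight])
print("condition indices:", np.round(condition_indices(X), 1))
print(f"bootstrap share of variance: {bootstrap_share(X, fev1):.3f}")
```

On data like these the condition indices may well be large, partly because the uncentered intercept is nearly collinear with height, while the bootstrap share stays small. That combination is the situation described above: collinearity is present, yet it does not inflate variability in the adjusted values, which are the quantities of interest.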
Thus multivariate statistical methods not only identified collinearity in the MRFIT population but also showed us how to sidestep its effect. Not every analysis can benefit from such treatment; in some data the confounding is too tight to unravel. An example is the attempt by Sparrow et al. (1983) to detect an influence of alcohol consumption on pulmonary function. After adjustment for the obligatory control variables (age, height, smoking, and education), no significant effect of alcohol could be demonstrated. The authors gave a reasonable biological interpretation of their negative finding, but statistical reasons may have been just as important. Although Sparrow et al. reported no information on the correlation of independent variables, alcohol and cigarette consumption were very likely strongly correlated.

Conclusion

Prediction equations for pulmonary function show "surprising variability in different reports" (Muiesan et al., 1971). The blame may lie with variations in instrumentation, interpretation, and mathematical formulation, or with the difficulty of unbiased sampling, including genetic, demographic, and environmental factors. Given the long list of published formulas for "normal" physiological function, what should an investigator do? How do the formulas differ? Which, if any, is appropriate for one's own study? For our study we felt none was right. Because of our unique study sample we
had no confidence in the applicability of any adjustment formula from the literature on normal populations. The rationale for the extensive methodological discussion in this paper is that virtually every study of physiological function under adverse occupational, environmental, or behavioral conditions must be suspected of some similar flaw. We have not attempted to add one more set of prediction formulas to the literature but rather to outline a thorough methodological approach to adjusting confounded data. We offer our example to illustrate how such adjustments may be made carefully and effectively. Did statistical adjustments make or break the MRFIT ancillary study? We believe so, for two reasons. First, the confounding in the MRFIT was explicit: our findings would not be credible unless we did our best to delineate the sources of confounding and purge as many as possible. Second, the effects we were trying to detect were not very large compared with the expected sources of variability, so we were bound to pursue any chance to reduce extraneous variability. Our procedures led to discernible adjustments (Fig. 2) and a sharpening of subsequent analyses (Fig. 3), suggesting that statistical adjustment did uncover some effects that otherwise would have been lost in the noise. Such an approach may be essential in other studies of physiological function in human populations exposed to environmental or occupational hazards.

ACKNOWLEDGMENTS

The authors thank Steven Bloom and Tony Pikus for technical assistance, Richard Letz for graphics software, and Robert Benfari and the staff of the Boston MRFIT for their cooperation. The study was supported by the National Institutes of Health (HL-19170).

REFERENCES

Bande, J., Clément, J., and Van de Woestijne, K. P. (1980). The influence of smoking habits and body weight on vital capacity and FEV1 in male Air Force personnel: A longitudinal and cross-sectional analysis. Amer. Rev. Respir. Dis. 122, 781-790.
Beck, G. J., Doyle, C. A., and Schachter, E. N. (1981). Smoking and lung function. Amer. Rev. Respir. Dis. 123, 149-155.
Belsley, D. A., Kuh, E., and Welsch, R. E. (1980). "Regression Diagnostics: Identifying Influential Data and Sources of Collinearity," pp. 85-191. Wiley, New York.
Benfari, R. C., McIntyre, K., Benfari, M. J. E, Baldwin, A., and Ockene, J. (1977). The use of thiocyanate determination for indication of cigarette smoking status. Eval. Q. 1, 629-638.
Brain, J. D., Feldman, H. A., Harbison, M. L., Sneddon, S. L., and Kane, D. Pulmonary function and smoking in a special population: Dose effects in smokers and recovery in ex-smokers. Submitted.
Buist, A. S., Nagy, J. M., and Sexton, G. J. (1979). The effect of smoking cessation on pulmonary function: A 30-month follow-up of two smoking cessation clinics. Amer. Rev. Respir. Dis. 120, 953-957.
Diaconis, P., and Efron, B. (1983). Computer-intensive methods in statistics. Sci. Amer. 248(5), 116-130.
Draper, N. R., and Smith, H. (1980). "Applied Regression Analysis," 2nd ed., pp. 294-379. Wiley, New York.
Efron, B., and Gong, G. (1983). A leisurely look at the bootstrap, the jackknife, and cross-validation. Amer. Statist. 37, 36-48.
Graham, B. L., Dosman, J. A., Cotton, D. J., Weisstock, S. R., Lappi, V. G., and Froh, E (1984). Pulmonary function and respiratory symptoms in potash workers. J. Occup. Med. 26, 209-214.
Greaves, I. A., Wegman, D. H., Smith, T. J., and Spiegelman, D. L. (1984). Respiratory effects of two types of solder flux used in the electronics industry. J. Occup. Med. 26, 81-85.
Johnson, R. A., and Wichern, D. W. (1982). "Applied Multivariate Statistical Analysis," pp. 401-457. Prentice-Hall, Englewood Cliffs, NJ.
Knudson, R. J., Slatin, R. C., Lebowitz, M. D., and Burrows, B. (1976). The maximal expiratory flow-volume curve: Normal standards, variability and effects of age. Amer. Rev. Respir. Dis. 113, 587-600.
Knudson, R. J., Lebowitz, M. D., Holberg, C. J., and Burrows, B. (1983). Changes in the normal maximal expiratory flow-volume curve with growth and aging. Amer. Rev. Respir. Dis. 127, 725-734.
Lam, S., Abboud, R. T., Chan-Yeung, M., and Tan, E (1981). Use of maximal expiratory flow-volume curves with air and helium-oxygen in the detection of ventilatory abnormalities in population surveys. Amer. Rev. Respir. Dis. 123, 234-237.
Levenstein, M. J. (1979). "Alternative Regression Methods in the Presence of Multicollinear Regressors." Doctoral dissertation, Harvard University.
Levenstein, M. J., Bishop, Y. M. M., Ferris, B. G., Jr., and Speizer, F. E. (1978). Six-cities study: Standardization of pulmonary function data. In "Energy and Health" (SIAM-SIMS Conference Series, Vol. 6, N. E. Breslow and A. S. Whittemore, Eds.), pp. 169-187. Society for Industrial and Applied Mathematics, Philadelphia.
Levine, R. J., DalCorso, R. D., Blunden, E B., and Battigelli, M. C. (1984). The effects of occupational exposure on the respiratory health of West Virginia morticians. J. Occup. Med. 26, 91-98.
Morris, J. F., Koski, A., and Johnson, L. C. (1971). Spirometric standards for healthy nonsmoking adults. Amer. Rev. Respir. Dis. 103, 57-67.
MRFIT Research Group (1982). Risk factor changes and mortality results. J. Amer. Med. Assoc. 248, 1465-1477.
Muiesan, G., Sorbini, C. A., and Grassi, V. (1971). Respiratory function in the aged. Bull. Eur. Physiopathol. Respir. 7, 973-1009.
SAS Institute, Inc. (1982). "SAS User's Guide: Statistics." SAS Institute, Cary, NC.
Schachter, E. N., Maunder, L. R., and Beck, G. J. (1984). The pattern of lung function abnormalities in cotton textile workers. Amer. Rev. Respir. Dis. 129, 523-527.
Slinker, B. K., and Glantz, S. A. (1985). Multiple regression for physiological data analysis: The problem of multicollinearity. Amer. J. Physiol. 249, R1-R12.
Sokal, R. R., and Rohlf, F. J. (1981). "Biometry," 2nd ed., pp. 321-371. Freeman, San Francisco, CA.
Sparrow, D., Rosner, B., Cohen, M., and Weiss, S. T. (1983). Alcohol consumption and pulmonary function: A cross-sectional and longitudinal study. Amer. Rev. Respir. Dis. 127, 735-738.
Tashkin, D. P., Clark, V. A., Simmons, M., Reems, C., Coulson, A. H., Bourque, L. B., Sayre, J. W., Detels, R., and Rokaw, S. (1984). The UCLA population studies of chronic obstructive respiratory disease. VII. Relationship between parental smoking and children's lung function. Amer. Rev. Respir. Dis. 129, 891-897.
Teculescu, D. B., Pino, J., and Sadoul, P. (1980). Cigarette smoking and density dependence of maximal expiratory flow in asymptomatic men. Amer. Rev. Respir. Dis. 122, 651-656.