Statistical Analysis of Air Pollution Panel Studies: An Illustration

Statistical Analysis of Air Pollution Panel Studies: An Illustration

Statistical Analysis of Air Pollution Panel Studies: An Illustration HOLLY JANES, PHD, LIANNE SHEPPARD, PHD, AND KRISTEN SHEPHERD, MS PURPOSE: The pa...

422KB Sizes 0 Downloads 58 Views

Statistical Analysis of Air Pollution Panel Studies: An Illustration HOLLY JANES, PHD, LIANNE SHEPPARD, PHD, AND KRISTEN SHEPHERD, MS

PURPOSE: The panel study design is commonly used to evaluate the short-term health effects of air pollution. Standard statistical methods are available for analyzing longitudinal data, but the literature reveals that these methods are poorly understood by practitioners. METHODS: We review standard statistical methods for modeling longitudinal data. Marginal, conditional, and transitional approaches are reviewed and contrasted with respect to their parameter interpretation and methods for accounting for correlation and dealing with missing data. We also discuss techniques for controlling for time-dependent and time-independent confounding and for exploring and summarizing panel study data. Notes on available software are provided. RESULTS: These methods are illustrated by using data from the 1999 to 2002 Seattle Panel Study. CONCLUSIONS: The quality of statistical analyses and presentation of results of panel studies could be improved if the methods we present were followed. Ann Epidemiol 2008;18:792–802. Ó 2008 Elsevier Inc. All rights reserved. KEY WORDS:

Panel Study Design, Air Pollution, Longitudinal Data.

INTRODUCTION The panel study design is a popular tool for studying the short-term effects of air pollution on human health. Subjects are observed repeatedly over time, thus facilitating assessment of the health effects of changes in exposure over time. Standard statistical methods for analyzing longitudinal data can be applied. However, many published panel studies could be improved if practitioners had a better understanding of the statistical issues pertaining to the analysis of longitudinal data and appropriate statistical analysis techniques. This paper illustrates longitudinal data methods using a recent air pollution panel study, and clarifies common points of confusion. The second section describes the Seattle, Washington panel study that is used in the third section to illustrate different approaches to the analysis of longitudinal data. In the fourth section, we demonstrate techniques for controlling for confounding, and in the fifth section we illustrate methods for exploring and summarizing panel study data. Notes on software are included in the Appendix.

THE SEATTLE PANEL STUDY The Seattle Panel Study, conducted between 1999 and 2002, was designed to assess air pollution exposure and From the Division of Public Health Sciences, Fred Hutchinson Cancer Research Center (H.J.); and the Departments of Biostatistics (L.S.) and Occupational and Environmental Health Sciences (L.S., K.S.), University of Washington, Seattle. Address correspondence to: Dr. Holly Janes, 1100 Fairview Ave N, M2C200, Seattle, WA 98109. Tel: (206) 667-6353. Fax (206) 667-4812. E-mail: [email protected]. Received January 9, 2008; accepted June 18, 2008. Ó 2008 Elsevier Inc. All rights reserved. 360 Park Avenue South, New York, NY 10010

evaluate the health effects of particulate matter (PM) and related pollutants among susceptible individuals (1–4). We restrict attention to data on 19 children with asthma, aged 6–13 years, who were recruited from a local allergy and asthma clinic, had physician-diagnosed asthma, and were taking asthma medications. A full description of this data set is published elsewhere (1). Each child participated in one to three 10-day monitoring sessions, with some sessions occurring in winter 2000– 2001 and some in spring 2001. The average number of sessions was 1.7, for a total of 33 subject-sessions. During the session, the children filled out daily questionnaires pertaining to asthma symptoms and daily activities. Exhaled nitric oxide (eNO), a measure of airway inflammation commonly elevated in asthmatics, was collected daily in the homes of the children, using an NO-inert and impermeable Mylar balloon. Children were asked to forego food intake for 1 hour before the measurement, which was taken in the afternoon or early evening. The child’s ‘‘overall well-being’’ was characterized using a binary variable, coded as 1 if the child did not report feeling ‘‘better than average’’ on a given day. We refer to this adverse event as ‘‘feeling worse.’’ Particulate matter less than 2.5 mm in diameter (PM2.5), measured inside the subjects’ homes using single-stage inertial Harvard Impactors, serves as our exposure of interest. Indoor PM concentrations are thought to more closely represent personal exposures than do concentrations at centrally located population monitors. Daily measures of relative humidity and temperature, measured at a centrally located site operated by the Puget Sound Clean Air Agency (Beacon Hill) were also collected. The 19 children in the Seattle Panel Study had an average age of 9.0 years (standard deviation [SD], 2.0 years). The 1047-2797/08/$–see front matter doi:10.1016/j.annepidem.2008.06.004

AEP Vol. 18, No. 10 October 2008: 792–802

Janes et al. ANALYSIS OF PANEL STUDIES

Selected Abbreviations and Acronyms PM Z particulate matter PM2.5 Z particulate matter less than 2.5 mm in diameter eNO Z exhaled nitric oxide BMI Z body mass index GEE Z generalized estimating equations MCAR Z missing completely at random MAR Z missing at random SD Z standard deviation OR Z odds ratio CI Z confidence interval

purposes, the analyses are presented first and the data descriptives are saved for the section on exploring and summarizing longitudinal data. Let Yit denote the binary outcome, overall well-being, for subject i at time t, and let Xit represent indoor PM2.5 at time t. The three types of longitudinal models assume different forms for the mean of Yit. A marginal model specifies a form for P(Yit Z 1jXit), the probability of feeling worse as a function of the indoor PM2.5 level, for example, logit PðYit Z 1 j Xit Þ Z b0 þ bM Xit

average body mass index (BMI) was 19.8 kg/m2 (SD, 3.3 kg/ m2). Time-varying variables are summarized in Table 1 both across subjects and sessions, and within subject-sessions. (See the section on between- and within-subject effects for the motivation for such partitioning.) For all variables, the variability within-subject-session is more than half of the total variability. The total and within-subject-session proportions of times subjects reported feeling worse are also reported. For 10 subject-sessions, there is no variability in overall well-being.

Outcome Variables Longitudinal models can be used to model binary, count, and continuous outcomes. The outcomes must vary over time within individuals to allow for estimation of withinsubject exposure effects. Model Specification and Parameter Interpretation We consider three different classes of longitudinal models: marginal, conditional, and transition models. The Seattle Panel Study is used to illustrate each approach. For didactic TABLE 1. Summary of time-varying variables for 19 study subjects* No. of observations Relative humidity (%) Temperature ( F) Indoor PM2.5 (mg/m3) Overall well-being eNO (ppb)

313 313 296 314 288

Overall mean (SD)

Within-subjects, within-session SDy

79.03 (10.24) 7.79 44.32 (6.27) 3.63 9.08 (5.87) 3.96 0.59z 0.60 (0.30–1.00)x 15.74 (9.98) 9.59

SD Z standard deviation; PM2.5 Z particulate matter less than 2.5 mm in diameter; eNO Z exhaled nitric oxide; ppb Z parts per billion. *A total of 19 subjects were studied for one to three 10-day monitoring sessions each (mean 1.7), for a maximum of 330 observations. y Summarizing the within-subject, within-session component of this variable, which has mean zero by definition; see Section Between- and within-subject effects for motivation. z Proportion ‘‘positive.’’ x Mean and interquartile range of the within-subject, within-session proportions ‘‘positive.’’

(1)

A model for the correlation among outcomes over time is specified separately (see the section on Modeling the correlation structure). The parameters in model (1) are estimated using generalized estimating equations (GEE) (5). The marginal parameter, bM, represents the difference in the logodds of feeling worse between groups of children with a unit difference in indoor PM2.5. In contrast, a conditional model specifies a form for P(Yit Z Xit, i) , the probability of feeling worse for subject i (6). We use a random intercept model, PðYit Z 1 j Xit; i Þ Z b0 þ bi þ bC Xit

LONGITUDINAL DATA MODELING

793

(2)

and assume that bi w N(0,v2), where v2 describes the heterogeneity in baseline overall well-being across children. A more complex conditional model would also allow the slope to be random. The parameters in model (2) are estimated using maximum likelihood estimation, since the distribution for the random effects induces a form for the likelihood of the data. The conditional parameter, bC, represents a child’s expected change in the log-odds of feeling worse due to a unit increase in indoor PM2.5. In this way, the conditional model facilitates making inferences about individuals, rather than groups of individuals. A transition model specifies a form for P(Yit Z 1jXit,Yit1..,Yi1), the probability of feeling worse as a function of past overall well-being (6). We condition on the previous day’s overall well-being, though other previous outcomes could also be included: logit PðYit Z 1 j Xit; Yit1 Þ Z b0 þ bT Xu þ gYit1

(3)

The parameters in model (3) are estimated using GEE (alternatively likelihood methods can sometimes be used for estimation (6)). The transition parameter, BT, can be interpreted as the difference in the log-odds of feeling worse between groups of children that have one unit difference in indoor PM2.5 today, but that had the same overall wellbeing yesterday. For this reason, a transition model for a binary outcome can be thought of as a model for incidence, while the marginal model relates to the prevalence. The parameter g represents the transition effect, or the effect of

794

Janes et al. ANALYSIS OF PANEL STUDIES

AEP Vol. 18, No. 10 October 2008: 792–802

TABLE 2. Association between indoor PM2.5 levels and overall well-being, as estimated by three different types of longitudinal models*

y

Marginal model Random intercept model Transitional modely

OR below median

95% CI

p Value

OR above median

95% CI

p Value

0.16 0.22 0.22

(0.01–2.37) (0.01–6.89) (0.02–2.64)

0.180 0.389 0.233

2.42 5.96 2.53

(0.67–8.80) (1.71–20.73) (1.07–5.95)

0.179 0.005 0.034

PM2.5 Z particulate matter less than 2.5 mm in diameter; OR Z odds ratio; CI Z confidence interval. *Indoor PM2.5 is modeled using a spline with one knot at the median (7.46 mg/m3). Odds ratio relates to a 10 mg/m3 increase in indoor PM2.5. All models include age, body mass index, relative humidity, and temperature. y Estimated by using generalized estimating equations with independent working correlation. Robust standard errors are reported.

yesterday’s overall well-being on today’s overall well-being, holding today’s indoor PM2.5 level constant. For a transition model to be well defined, observations need to be equally spaced. Models (1), (2), and (3) were fit to the Seattle data (Table 2). In these data, the effect of indoor PM2.5 is piecewise linear (see the section on exploring and summarizing longitudinal data), which we model using a linear spline with one knot at the median, 7.46 mg/m3 (7). Age, BMI, relative humidity, and temperature are included in the models to increase precision. With all three approaches, we see little evidence of a PM effect below the median. Above the median, the conditional PM effect is much larger than the transitional or marginal effects. The conditional model estimates a 496% increase in the odds of feeling worse for a given child per 10 mg/m3 increase in indoor PM2.5 above the median (95% CI: 71% to 1973%). Based on the transition model, there is an associated 153% higher odds of feeling worse in a population with 10 mg/m3 higher indoor PM2.5 today (above the median), but the same overall well-being yesterday (95% CI: 7% to 495%). The different results are due to the fact that the coefficients represent different quantities. Non-binary outcomes can be accommodated in longitudinal data models by using different ‘‘link functions.’’ We have used the logit link function, which is appropriate for a binary outcome. The random intercept model for a general link function g is gðEðYit j Xit; iÞÞ Z b0 þ bi þ bC Xit: With a continuous outcome, it is common to let g(w) Z w be the identity function, and with a count outcome, g(w) Z log(w). Table 3 shows the results of fitting marginal, conditional (random intercept), and transitional (conditioning on yesterday’s outcome) models for eNO as a function of indoor PM2.5, using identity link functions. Here, indoor PM2.5 was modeled linearly, and age, gender, BMI, relative humidity, and temperature were again included for precision. Observe that the conditional and marginal parameter estimates are nearly identical, and differ slightly due to differences in estimation procedures. In fact, with an identity link function, the conditional and marginal parameters will always

agree. With a certain parameterization of the transition model, this parameter will also be the same (6). In general, the scientific question of interest should dictate the type of longitudinal model that is used. A marginal model is used to estimate the effect of exposure on population average outcomes. This is useful, for example, to assess the impact of increasing PM on population rates of asthmainduced hospital visits. If individual-level exposure effects are desired, a conditional model is more appropriate. Such a model can be used, for instance, to assess the effect of PM on individuals’ immune system biomarkers, accounting for heterogeneity in baseline levels of the biomarkers. Finally, a transition model would be useful to determine the effect of increasing PM on population rates of asthma symptoms, accounting for the fact that individuals who experience symptoms on one day are more likely than those who did not to have symptoms on the following day. These examples illustrate the technique of choosing a model which corresponds most closely to the scientific question of interest. An aggregated analysis has historically been a popular approach to analyzing panel data. All outcomes on the same day are collapsed into a total, called the panel average (or panel attack rate for binary outcomes) (8). The panel average is then regressed on exposure. This approach is subject to many of the biases associated with ecological studies (9, 10). In addition, inference from a linear model which assumes that outcomes on successive days are independent and have a common variance may be incorrect. Finally, bias may be an issue if subject-specific missing data patterns are not taken into account (11). For these reasons, aggregated analyses have been discouraged since 1979 (8). TABLE 3. Association between indoor PM2.5 levels and eNO, as estimated by three different types of longitudinal models*

Marginal modely Random intercept model Transitional modely

Coefficient

95% CI

p Value

4.15 4.10 3.28

(1.06–7.24) (1.89–6.32) (1.00–5.57)

0.008 !0.001 0.005

PM2.5 Z particulate matter less than 2.5 mm in diameter; eNO Z exhaled nitric oxide; OR Z odds ratio; CI Z confidence interval. *Coefficient relates to a 10 mg/m3 increase in indoor PM2.5. All models include age, body mass index, relative humidity, and temperature. y Estimated by using generalized estimating equations with independent working correlation. Robust standard errors are reported.

AEP Vol. 18, No. 10 October 2008: 792–802

In our illustration, we characterized exposure using today’s indoor PM2.5. However, longitudinal models can incorporate other exposure measures, such as other exposure lags, cumulative exposure, or different exposure components (see the section on between- and within-subject effects). Distributed lag exposure models (6, 12, 13), which allow the health effects of air pollution to extend over time, can also be used. With a distributed lag model, exposure is modeled as a linear combination of exposure lags, say h0 Xit þ..þ hqXitq, where the coefficients, hs, can be constrained to have P a specific functional form. The sum of the coefficients, qsZ0 hs , is interpreted as the net effect of exposure on the outcome. Accounting for Correlation Modeling the correlation structure. In the Seattle Panel Study, outcomes for each child are correlated, as measurements taken on the same child are likely to be more similar than those on different children. All longitudinal studies have correlation due to repeated measures. This is what makes using special longitudinal models necessary. But the Seattle Panel Study has additional correlation structure since observations in the same session are likely to be more similar than those in different sessions. Multiple levels of correlation, because of observation times or measurement error, are common in longitudinal studies. Correlation among outcomes is typically of two types: exchangeable and serial. An exchangeable structure implies that any pair of outcomes has the same correlation, while serial correlation decreases with the time elapsed between the two observations. There may be both serial and exchangeable correlation in an outcome if correlation decreases with time separation, but there is still correlation between observations that are far apart. In longitudinal models, correlation is accounted for by specifying the independent unit of observation, or ‘‘clustering’’ variable. Observations within-cluster are allowed to be correlated, and those between-clusters are assumed to be independent. The three types of longitudinal models deal with the clustering variable in different ways. With a conditional model, the mean model specifies that observations in the same cluster are correlated because they share the random effects. The transitional model for the mean implies that the outcome at time t depends on outcomes at earlier times, and this induces correlation among outcomes in the same cluster. With a marginal model, the correlation structure is specified separately from the mean model using a ‘‘working correlation matrix’’ (6). Common working correlation choices are independence (observations in the same cluster are assumed to be independent), autoregressive, and exchangeable structures. A working correlation model can also be used to add additional correlation structure to a transition model. Exploratory analyses can be used to select the

Janes et al. ANALYSIS OF PANEL STUDIES

795

working correlation (see the section on exploring correlation structure). In the Seattle data, we specify subject-session as the clustering variable for simplicity and consistency with the literature. Hence, the i subscript indexes subject-sessions. This structure assumes that observations within the same subject-session are correlated, while observations in the same session but different subjects, or the same subject but in different sessions, are uncorrelated. An alternative would be to cluster on both subject and session, thereby allowing for correlation between observations across subjects in the same session, and across sessions for the same subject. However, with many subjects having only one session, and others only two or three sessions, the data will not allow us to estimate the within-subject, between-session correlation precisely. Modeling correlation structure in marginal models. With marginal and transition models estimated using GEE, the parameter estimates for the mean model are still valid even if the correlation structure is misspecified if socalled ‘‘robust’’ standard error estimates are used (5, 14, 15). Robust standard errors are reported by most statistical packages, though the number of individuals must be large to ensure their validity. Correct specification of the correlation structure is essential, however, if the variance model is of interest, perhaps for studying variation in within-subject outcomes. Correct specification of the correlation structure has a benefit in terms of gains in efficiency (6). However, if the appropriate exposure lags are not included in the mean model, the parameter estimates will be biased unless a working independence correlation structure is used (16). Therefore, researchers face a dilemma: specify an appropriate working correlation, and risk bias in the parameter estimates, or specify working independence, report robust standard errors, and risk a loss of efficiency. Schildcrout and Heagerty (17) provide guidance in choosing between these strategies, assuming that the effect of one exposure lag is of interest. They suggest that the degree to which other lags are associated with the outcome be determined. If there are other large lag effects, an independence working correlation matrix should be used, since the bias associated with the incorrectly specified mean model will be large. On the other hand, if the other lag effects are small, non-independence working correlation should be used. In the Seattle data, we use a working independence correlation structure and report robust standard errors to guard against bias in the parameter estimates. Missing Data Missing data is a common problem in longitudinal studies. Subjects may have observations missing intermittently, or

796

Janes et al. ANALYSIS OF PANEL STUDIES

AEP Vol. 18, No. 10 October 2008: 792–802

they may drop out at some point. Missing observations are those that are missing unintentionally; measuring subjects at different prespecified time points results in unbalanced, but not missing, data. All of the longitudinal models we have discussed allow researchers to use the available data for each subject; subjects need not be dropped from the analysis if they have missing data. However, each of the models makes certain assumptions about the reasons for missingness. Missingness is frequently partitioned into three categories (18). Data are said to be missing completely at random (MCAR) if missingness is independent of both observed and unobserved data. Observations are missing simply because of random chance. In contrast, if data are missing at random (MAR), missingness depends only on the data that are collected. Finally, if data are informatively missing, the missingness mechanism depends on both observed and unobserved data. Determining the type of missing data at hand is difficult. There are some ad-hoc procedures for using the data to distinguish between MCAR and MAR (see Diggle et al. (6)). However, it is impossible to determine from the observed data whether missingness is informative. In general, the type of missingness is determined by carefully considering the reasons for missing data in the particular study. There is a substantial amount of missing data in the Seattle Panel Study. Out of a total of 330 observations, 68 (21%) are missing data on one or more factors: overall well-being, eNO, indoor PM2.5, or relative humidity. The missingness is intermittent. The sources of missingness, according to our best judgment, are given in Table 4. We assume that data missing because of equipment problems is true measurement error (and not measurements below the limit of detection) and hence MCAR. This is certainly debatable and should be mentioned as a limitation of our analysis. We classify data that are missing when children are not home for a visit, neglect to answer the symptom questions on the questionnaire, or eat within 1 hour of the eNO measurement as

TABLE 4. Missing data in the Seattle Panel Study No. of observations missing

Type of missingness

Variables missing

Reason given Equipment problem

MCAR

5

eNO, indoor PM2.5, and/or relative humidity eNO

MAR

3

Overall well-being

5

All

Meal within 1 hr of measurement Subject left blank on questionnaire Child not home

55

MAR MAR

eNO Z exhaled nitric oxide; PM2.5 Z particulate matter less than 2.5 mm in diameter; MCAR Z missing completely at random; MAR Z missing at random.

MAR. These observations may be missing because children are not feeling well, in which case missingness would depend on the outcome (an observed variable). We assume that the missingness mechanism can be explained entirely by the observed data, and hence is not informative, but again this is debatable. Likelihood-based methods, such as conditional models, are valid so long as data are MAR or MCAR. However, non-likelihood–based analyses, such as marginal and transitional models (fit using GEE), require that the data be MCAR, a stronger assumption. Hence, the type of missingness should be seriously considered when choosing a statistical analysis. A number of methods have been proposed for using GEE with MAR (19–21) and for analyzing data with informative missingness (6, 22–25), but each depend on making additional assumptions, in particular about the missingness mechanism.

CONTROLLING FOR CONFOUNDING Between- and within-subject effects Consider the marginal model for the Seattle eNO data: EðYit j Xit Þ Z b0 þ bM Xit;

(4)

where Yit represents eNO and denotes indoor PM2.5 for subject-session i at time t, and we ignore the other covariates for the time being. This model implies that the effect of increasing PM on eNO is the same, regardless of whether the difference is in the same subject-session, or across subjectsessions. In fact, given the design of the study, the exposure can be partitioned into three components, and the three components may have very different effects on the outcome due to confounding (6, 26). Therefore, we propose modifying model (4) to allow the three exposure components to have unique effects. Using new subscripting notation, let i index subject, j session, and t time. Denote the average of a variable Z by Z. The exposure, Xijt, is partitioned into three components: Xi: the average exposure for subject i (the betweensubject component) Xij  Xi: for subject i, the average exposure in session j minus the overall average exposure (the within-subject, between-session component) Xijt  Xij: for subject i and session j, the exposure on day t minus the average exposure in the session (the withinsubject, within-session component) We rewrite (4) as:   M EðYit jXit Þ Z b0 þ bM bs Xi þ bwsbs Xij  X i   : ð5Þ þ bM wsws Xijt  X ij

AEP Vol. 18, No. 10 October 2008: 792–802

Janes et al. ANALYSIS OF PANEL STUDIES

The parameters are the between-subject; within-subject, between-session; and within-subject, within-session effects of exposure. The between-subject parameter is the effect of increasing indoor PM2.5 when comparing across subjects. This is potentially confounded by time-independent confounders. For example, the region in which the subject lives might influence levels of both eNO and PM. The withinsubject, between-session parameter is the effect of increasing indoor PM2.5 when comparing across sessions for the same subject. Since sessions are often in different seasons, and season is associated with exposure and outcome, this exposure effect is confounded. The within-session, withinsubject effect of exposure is the real parameter of interest; it represents the effect of increasing indoor PM2.5 when comparing across days in the same session for the same subject. The parameter bM in model (4) represents a combination of these three exposure effects. The parameter estimates for model (5) are displayed in Table 5. Age, BMI, relative humidity, and temperature have been included as covariates for increased precision. Note that the within-subject, between-session exposure effect has a very large standard error because the number of sessions per subject is very small. We estimate that the within-subject, within-session effect of a 10 mg/m3 increase in indoor PM2.5 is an associated increase of 3.90 ppb in eNO (95% CI: 0.74 ppb to 7.06 ppb). All three exposure effects are consistent with this value. As expected, the effect of unpartitioned indoor PM2.5 in the marginal model (see Table 3) is a weighted average of the coefficients of the three exposure components considered herein. An additional benefit of partitioning exposure in this way is that the exposure component of interest, here the withinsubject, within-session exposure component, is ‘‘mean balanced,’’ since it has mean zero for all subjects. Schildcrout and Heagerty (17) suggest that mean balanced covariates behave more favorably in terms of the bias/efficiency tradeoff associated with marginal models (see the section on modeling correlation structure in marginal models). If, by design, each subject has only one observation period, exposure should be partitioned into just two TABLE 5. Estimated marginal association between indoor PM2.5 levels and eNO, when exposure is partitioned into between-subject; within-subject, between-session; as well as within-subject, within-session components*

Indoor PM2.5, between-subject Indoor PM2.5, within-subject, between-session Indoor PM2.5, within-subject, within-session

Coefficient

95% CI

p Value

3.82 10.77

(0.59–7.05) (7.79–29.33)

0.021 0.255

(0.74–7.06)

0.016

3.90

*Coefficient relates to a 10 mg/m3 increase in indoor PM2.5 , and the model also includes age, body mass index, relative humidity, and temperature. Independent working correlation was used. Robust standard errors are reported.

797

components: the between-subject and within-subject components. Alternatively, if the exposure is shared and all subjects are observed at the same times (as in a traditional panel study design), partitioning is impossible and unnecessary. Time-Independent Confounding Factors such as the region in which a subject lives may be important time-independent confounders in panel studies. As discussed in the section on between- and within-subject effects, one of the uses of partitioning exposure is to control for such confounding. Partitioning is useful in marginal, conditional, and transition models. A conditional model also controls for differences between subjects with the use of a random effect for each subject (or for each subjectsession). In a transition model, conditioning on previous outcomes controls for a large amount of time-independent and time-dependent confounding, since the exposure effect represents the association between exposure and outcome among observations with the same outcome history. Finally, with any type of model, time-independent confounders can be further controlled by including them as covariates. Time-Dependent Confounding Air pollution is highly influenced by time-varying factors such as season and temperature, many of which are also associated with health outcomes of interest. The most common method of controlling for time-varying confounders is by including them as covariates in the regression model (see, for example, Yu et al. (27)). This approach requires selecting a functional form for the covariates, as well as choosing the appropriate lags to be included. The method of controlling for confounding should be chosen a priori, as model selection bias is incorporated when researchers select the model with the strongest association from a set of candidate models which control for confounders in different ways. Model selection bias can be of the same magnitude as the health effects themselves (28). Partitioning exposure will control for a certain amount of time-dependent confounding. However, confounders which vary within an individual’s period of observation will not be controlled. For example, if the subject-session is long, there may be residual confounding due to season. Further partitioning by season could be employed to reduce bias. The within-subject, within-season exposure effect would be less confounded.

EXPLORING AND SUMMARIZING LONGITUDINAL DATA Summarizing Data and Graphical Displays In any longitudinal study, the most fundamental data summaries describe where the data lie (e.g., in time, across

798

Janes et al. ANALYSIS OF PANEL STUDIES

AEP Vol. 18, No. 10 October 2008: 792–802

individuals, across sessions) (29). Missingness should be described as intermittent or dropout, and tabulated (see Tables 1 and 4 for examples). A plot of the exposure and outcome trends over time is a simple and useful first descriptive (Fig. 1). This is especially informative if both exposure and outcome are continuous and measured over a long time period. Lines should connect observations within the same subject-session. Such a plot displays the within- and between-subject-session trends, and can be used to assess whether exposure and outcome series have peaks at similar times. Fig. 1 displays all eNO and indoor PM2.5 observations as a function of time. Lines connect observations in the same subject-session. Observe the large amount of variability in both exposure and outcome

series over time and within subject-session. As is typical in the air pollution setting, where exposure effects are very small, it is difficult to see an association between exposure and outcome in this figure. The figure can also be misleading because any association observed may be attributable to confounding. To observe the relationship between exposure and outcome apart from confounding, it is useful to plot exposure and outcome residuals from models that include the confounders. Fig. 2 shows a plot of the eNO residuals versus indoor PM2.5 residuals, each from linear models which include age, BMI, relative humidity, and temperature. The trend is highlighted by overlaying a smooth curve. We see a modest, linearly increasing trend in eNO as a function of indoor

20 10

Indoor PM2.5

30

a

11/2000

12/2000

1/2001

2/2001

3/2001

4/2001

5/2001

3/2001

4/2001

5/2001

eNO

20

40

60

b

80

Date

11/2000

12/2000

1/2001

2/2001

Date

FIGURE 1. a, Indoor PM2.5 as a function of time. b, eNO as a function of time. Lines connect observations within the same subject-session.

Janes et al. ANALYSIS OF PANEL STUDIES

40 −20

0

0

20

eN0 Residuals

40 20

eNO Residuals

799

60

60

AEP Vol. 18, No. 10 October 2008: 792–802

−5

0

5

10

15

20

25

−20

0

Indoor PM2.5 Residuals

20

40

60

Lag 1 eNO Residuals

PM2.5 and note that much of the variation in eNO is not explained by the exposure. The drawback of a plot such as Fig. 2 is that it displays the association between overall (unpartitioned) exposure and outcome. In fact, the three exposure components should be modeled separately, as in the section on between- and within-subject effects. It is appropriate to examine three plots: within-subject, within-session outcome residuals versus within-subject, within session exposure residuals; within-subject, between-session outcome residuals versus within-subject, between-session exposure residuals; and between-subject outcome residuals versus between-subject exposure residuals. In the Seattle data these plots look very similar to Fig. 2. To explore the transition model, it is useful to plot outcome residuals against previous lags of outcome residuals. This is shown in Fig. 3 for the Seattle eNO data, where residuals come from linear models including the covariates: indoor PM2.5, age, BMI, relative humidity, and temperature. We see that there is a strong association between eNO today and eNO yesterday, even after accounting for PM and other covariates. With a binary outcome, it is useful to summarize the outcome in a cross-sectional manner as well as withinindividuals (see Table 1). Exploring the association between a binary outcome and a continuous exposure is difficult because plotting the outcome is not very informative. A plot of the outcome residuals as a function of the exposure residuals can be useful. Fig. 4 shows a plot of the residuals from

a logistic model for overall well-being against indoor PM2.5 residuals from a linear model. Both models are adjusted for age, BMI, relative humidity, and temperature. While the residuals cluster into two groups, we can still see, with a smoothed curve overlaid, that the association between exposure and outcome appears to be piece-wise linear. As with a continuous outcome, plots of partitioned exposures and outcomes should also be examined. Outcome residuals can be plotted against previous lags of outcome

-2

-1

0

1

FIGURE 3. eNO residuals plotted against eNO residuals, lagged one day. Residuals come from linear models with covariates: indoor PM2.5, age, BMI, relative humidity, and temperature. A smoothed curve is overlaid.

Overall Well-Being Residuals

FIGURE 2. eNO residuals as a function of indoor PM2.5 residuals. Residuals are generated from linear models with covariates age, BMI, relative humidity, and temperature. A smoothed curve is overlaid.

0

10

20

Indoor PM2.5 Residuals

FIGURE 4. Overall well-being residuals as a function of indoor PM2.5 residuals. Residuals are generated from a logistic model for overall well-being, and a linear model for indoor PM2.5. Each model includes age, BMI, relative humidites and temperature as covariates. A smoothed curve is overlaid.

800

Janes et al. ANALYSIS OF PANEL STUDIES

AEP Vol. 18, No. 10 October 2008: 792–802

V

0

50

100

150

200

250

300

2

4

6

8

Time Lag

FIGURE 5. Sample variogram for eNO residuals, as a function of time separation, in days. A smooth curve is overlaid (solid line). Dotted line represents total sample variance for the eNO residuals. Residuals come from a linear model including the three indoor PM2.5 exposure components, age, BMI, relative humidity, and temperature.

2.0 1.5 1.0 0.5

The dependence among outcomes in the same cluster should be explored to correctly model the correlation structure. Of interest is the correlation that exists after the mean structure has been taken out. With a continuous outcome, the correlation structure can be summarized using a variogram (6, 30). For residuals rij in subject-session i at time j the differences vijk Z 12(rij  rik)2 are plotted against the corresponding time differences uijk Z tij  tik. Such a plot is shown in Fig. 5 for the eNO residuals in the Seattle data. A smooth curve is overlaid to aid viewing of the trend (the y-axis is truncated at 300). A dotted line is also added at the level of the total estimated variance in the eNO residuals. Observe that the differences between observations tend to increase with time lag, implying a serial correlation structure. The variogram increases all the way to the overall variance, indicating that the correlation decays to zero and has no long-term component. At larger time lags, the variogram is even slightly above the overall variance, a consequence of the instability of the estimates at these large lags. We conclude that a there is some evidence of serial correlation. With a binary outcome, correlation is not a useful measure of association. A more meaningful summary of the relationships among binary observations is the odds ratio. Heagerty and Zeger (31) suggest plotting a lorelogram, or the log-odds ratio for pairs of observations in the same cluster as a function of time separation. We show an example in

0.0

Exploring Correlation Structure

Log Odds Ratio

2.5

3.0

residuals to determine the necessity of controlling for outcome history.

2

4

6

8

Time Lag

FIGURE 6. The lorelogram for overall well-being: Log-odds ratios for pairs of observations in the same subject-session, as a function of time separation, in days.

Fig. 6 for the overall well-being outcome. Observe that the log odds ratios for pairs of observations in the same subjectsession remain high as time separation increases. Children who report feeling worse (better) than average on one day are likely to feel worse (better) than average on all other days in the same subject-session. This high degree of ‘‘correlation’’ persists when the data are stratified by various covariates. An exchangeable correlation structure seems a reasonable assumption.

DISCUSSION The panel study design is a powerful tool for assessing the short-term association between air pollution and health outcomes over time, within individuals. We emphasize that the choice of modeldmarginal, conditional, or transitiond should be based on the scientific question of interest; the models differ in important ways with respect to parameter interpretation. Our models for a child’s overall well-being differ in terms of whether they estimate the difference in the log odds of feeling worse between populations with different indoor PM2.5, as in the marginal model, whether the populations contrasted had the same PM2.5 yesterday, as in the transition model, or whether we estimate the effect of changing PM2.5 on a given child’s overall well-being. We also discussed methods of controlling for confounding and advocated partitioning of the exposure to focus on the exposure effect that is least likely to be confounded. Our analysis of eNO partitions the PM2.5 effect into betweensubject; within-subject, between-session; and within-subject, within-session components to control for between-subject confounders and between-session confounders. The marginal, conditional, and transition approaches also differ in how they model correlation and deal with missing data

AEP Vol. 18, No. 10 October 2008: 792–802

Janes et al. ANALYSIS OF PANEL STUDIES

801

TABLE 6. Aspects of marginal, conditional, and transitional models For making inference about: Marginal Conditional Transitional

Population averages Individuals Population averages, conditional on past outcomes

Estimation method

How correlation is modeled

Assumption about missing data

GEE Maximum likelihood GEE*

Working correlation matrix In the mean model Working correlation matrix*

MCAR MAR MCAR

GEE Z generalized estimating equations; MCAR Z missing completely at random; MAR Z missing at random. *Transitional models are typically fit by using GEE with a working correlation matrix. Maximum likelihood methods can also be used (5).

(see Table 6 for a summary). An appreciation of these model attributes and use of data summaries that are tied to the model choice will lead to a better understanding of the link between air pollution and adverse health outcomes and help address the challenges associated with strong confounding. The sample size of the panel study is an important attribute of the design. In the Seattle Panel Study, the small sample size makes it difficult to distinguish between different types of data structures and to fit complex models. The validity of our robust standard errors is also questionable with just 33 subject-sessions. An important future research topic is the derivation of sample size calculations to facilitate design of efficient studies with specified operating characteristics.

6. Diggle PJ, Heagerty P, Liang KY, Zeger SL. The analysis of longitudinal data. 2nd ed. Oxford: Oxford University Press; 2002. 7. Greenland S. Dose-response and trend analysis in epidemiology: alternatives to categorical analysis. Epidemiology. 1995;6:345–347. 8. Korn EL, Whittemore AS. Methods for analyzing panel studies of acute health effects of air pollution. Biometrics. 1979;35:795–802. 9. Sheppard L, Prentice RL, Rossing MA. Design considerations for estimation of exposure effects on disease risk, using aggregate data studies. Stat Med. 1996;15:1849–1858. 10. Sheppard L. Ecologic study design. Encyclopedia of Environmetrics. New York: John Wiley and Sons; 2002 p. 673–705. 11. Dominici F, Sheppard L, Clyde M. Health effects of air pollution: a statistical review. Int Stat Rev.. 2003;71:243–276. 12. Schwartz J. The distributed lag between air pollution and daily deaths. Epidemiology. 2000;11:320–326. 13. Goodman PG, Dockery DW, Clancy L. Cause-specific mortality and the extended effects of particulate pollution and temperature exposure. Environ Health Perspect. 2003;112:179–185.

APPENDIX

14. White H. Maximum likelihood estimation of misspecified models. Econometrics. 1982;50:1–25.

GEE can be fit in Stata using xtgee, in Splus using gee, or in SAS using proc genmod. In Splus, robust standard errors are always reported, while Stata requires the robust option and SAS the covb option to request robust standard errors. Linear random effects models can be fit in Stata using xtgee, in Splus using lme, and in SAS using proc mixed. Logistic random intercept models can be fit in Stata using xtlogit, and general non-linear random effects models can be fit in SAS using proc glimmix.

15. Royall RM. Model robust inference using maximum likelihood estimators. Int Stat Rev. 2005;54:221–226. 16. Pepe MS, Anderson GL. A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data. Communications in Statistics. Part B. Simulation and Computation. 1994;23:939–951. 17. Schildcrout JS, Heagerty P. Regression analysis of longitudinal binary data with time-dependent environmental covariates: bias and efficiency. Biostatistics. 2005;6:633–652. 18. Little RJA, Rubin DB. Statistical analysis with missing data. New York: John Wiley; 1987. 19. Heyting A, Tolboom JTBM, Essers JGA. Statistical handling of dropouts in longitudinal clinical trials. Stat Med. 1992;11:2043–2062.

REFERENCES 1. Koenig JQ, Jansen K, Mar TF, et al. Measurement of offline exhaled nitric oxide in a study of community exposure to air pollution. Environ Health Perspect. 2003;111:1625–1629. 2. Liu LJ, Box M, Kalman D, et al. Exposure assessment of particulate matter for susceptible populations in Seattle. Environ Health Perspect. 2003;111:909–918. 3. Mar TF, Jansen K, Shepherd K, Lumley T, Larson TV, Koenig JQ. Exhaled nitric oxide in children with asthma and short-term PM2.5 exposure in Seattle. Environ Health Perspect. 2005;113:1791–1794. 4. Trenga CA, Sullivan JH, Schildcrout JS, Shepherd KP, Shapiro GG, Liu LJ, et al. Effect of particulate air pollution on lung function in adult and pediatric subjects in a Seattle panel study. Chest. 2006;129:1614–1622. 5. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.

20. Robins JM, Rotnitzky A, Zhou LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. J Am Stat Assoc.. 1995;90:106–121. 21. Scharfstein D, Rotnitzky A, Robins JM. Adjusting for non-ignorable dropout using semiparametric non-response models (with discussion). J Am Stat Assoc. 1999;94:1096–1120. 22. Wu MC, Bailey KR. Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. Biometrics. 1989;45:939–955. 23. Little RJA. Pattern-mixture models for multivariate incomplete data. J Am Stat Assoc. 1993;88:125–134. 24. Diggle PJ, Kenward MG. Informative dropout in longitudinal data analysis (with discussion). Appl Stat. 1994;43:49–73. 25. Wu MC, Carroll RJ. Estimation and comparison of changes in the presence of right censoring by modeling the censoring process. Biometrics. 2005;44:175–188.

802

Janes et al. ANALYSIS OF PANEL STUDIES

26. Palta M, Yao T-J. Analysis of longitudinal data with unmeasured confounders. Biometrics. 1991;47:1355–1369. 27. Yu OC, Sheppard L, Lumley T, Koenig JQ, Shapiro GG. Effects of ambient air pollution on symptoms of asthma in Seattle-area children enrolled in the CAMP study. Environ Health Perspect. 2000;108:1209–1214. 28. Lumley T, Sheppard L. Assessing seasonal confounding and model selection bias in air pollution epidemiology using positive and negative control analysis. Epidemiology. 2000;11:705–717.

AEP Vol. 18, No. 10 October 2008: 792–802

29. Kunzli N, Schindler C. A call for reporting the relevant exposure tem in air pollution case-crossover studies. J Epidemiol Community Health. 2005;59:527–530. 30. Diggle PJ. Time series: A biostatistical introduction. Oxford: Oxford University Press; 1990. 31. Heagerty P, Zeger SL. Lorelogram: a regression approach to exploring dependence in longitudinal categorical responses. J Am Stat Assoc. 1998; 93:150–162.