Statistical considerations when analyzing biomarker data


Clinical Immunology 161 (2015) 31–36


Craig A. Beam ⁎

Division of Epidemiology and Biostatistics, Department of Biomedical Sciences, Western Michigan University Homer Stryker M.D. School of Medicine, 1000 Oakland Dr., Kalamazoo, MI 49009, USA

⁎ Corresponding author. E-mail address: [email protected].

Article history: Received 28 February 2015; received in revised form 12 May 2015; accepted with revision 13 May 2015; available online 23 June 2015.

Keywords: Biomarker; Surrogate; Statistical design and analysis

http://dx.doi.org/10.1016/j.clim.2015.05.019

Abstract

Biomarkers have become, and will continue to become, increasingly important to clinical immunology research. Yet, biomarkers often present new problems and raise new statistical and study design issues to scientists working in clinical immunology. In this paper I discuss statistical considerations related to the important biomarker problems of: 1) The design and analysis of clinical studies which seek to determine whether changes from baseline in a biomarker are associated with changes in a metabolic outcome; 2) The conditions that are required for a biomarker to be considered a “surrogate”; 3) Considerations that arise when analyzing whether or not a predictive biomarker could act as a surrogate endpoint; 4) Biomarker timing relative to the clinical endpoint; 5) The problem of analyzing studies that measure many biomarkers from few subjects; and, 6) The use of statistical models when analyzing biomarker data arising from count data.

© 2015 Elsevier Inc. All rights reserved.

1. Introduction

In this paper I discuss several examples taken from my experience in the design and analysis of autoimmune research in type 1 diabetes (T1D) that highlight what I surmise to be important considerations when analyzing biomarker data in clinical immunologic research in general. This paper has been written so that anyone with a basic understanding of statistics—specifically “power”, “p-value”, “correlation” and “regression”—will be able to read and understand the statistical considerations herein discussed. A glossary is provided to help the reader with new terms.

In Section 2 I discuss statistical considerations related to the design and analysis of clinical studies which seek to determine whether changes from baseline in a biomarker are associated with changes in a metabolic outcome. In Section 3 I discuss conditions required for a biomarker to be considered a “surrogate” as well as considerations that arise when analyzing whether or not a predictive biomarker could act as a surrogate endpoint. In Section 4 I review recent evidence suggesting that the timing of the biomarker might be a very important consideration when designing and analyzing biomarker data in clinical research. In Section 5 I discuss the problem of analyzing studies that measure many biomarkers from few subjects. In Section 6 I discuss the use of statistical models when analyzing biomarker data arising from count data.

2. Analyzing and designing the biomarker-metabolic outcome study

2.1. Introduction


Without doubt, one of the primary interests in modern immunological research is to associate clinical outcome with immunologic mechanism as measured by biomarkers. For example, a current interest is to determine whether the loss of endogenous insulin production that is seen years after diagnosis of T1D, the result of prolonged autoimmune attack on the insulin-producing beta-cells of the pancreas, can be attributed to, or at least associated with, specific T-cell subsets.

In collaboration with several investigators (JDRF funded 1-SRA2014-324-Q-N Autoimmunity Center, M. Peakman (PI), J. Bluestone, A. Long and C. Beam), we sought to design a study that would attempt to answer the above question in T1D by focusing on the CD8 T-cell pathway and immune cell subsets and functional measures. In addition, we decided to use C-peptide as our metabolic outcome as it has been established to be an acceptable “surrogate outcome” for insulin production [1] (and, hence, beta-cell functioning and/or mass). As is also customary, the area under the 2-hour stimulated C-peptide curve, divided by the length of time (120 min), was adjusted for baseline and transformed with the natural logarithm function (“Ln”). The metabolic outcome analyzed was therefore the longitudinal C-peptide change from baseline (specifically, Ln(C-peptide at time t / baseline C-peptide)), to be regressed on changes from baseline in predictors that interrogate the CD8 T-cell pathway via various immune cell subsets and functional measures (B cells, Tregs, Treg function and T-cell differentiation).
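For illustration only, the outcome transformation described above can be written out as follows; the numbers are hypothetical and simply show the form of the calculation, not study data.

```python
import numpy as np

# Hypothetical 2-hour stimulated C-peptide AUC values divided by 120 min (nmol/L).
baseline_cpeptide = 0.70   # illustrative baseline value
cpeptide_at_t = 0.56       # illustrative value at follow-up time t

# Outcome used above: Ln(C-peptide at time t / baseline C-peptide)
outcome = np.log(cpeptide_at_t / baseline_cpeptide)
print(f"Ln change from baseline: {outcome:.3f}")  # negative values indicate loss
```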


2.2. Data correlation

The plan was to measure both C-peptide and biomarkers on each individual subject at four time points: baseline and 3, 6 and 12 months after baseline. An important consideration when analyzing such longitudinal data is the potential correlation among values measured on the same individual across time. This “within subject correlation” violates a basic assumption of regression analysis (that observations are statistically independent) and can therefore lead to erroneous statistical inference arising from incorrect p-values. We therefore used a “random effects” approach to account for the correlated nature of the longitudinal data.

In statistical parlance, a “mixed model” is one in which some of the covariates are considered to be observed at random and so are not something one could predetermine prior to the use of the model in another population or in a future time. A “random effects” model is a specific type of mixed model—one in which the random effect is considered only as a means to group data coming from the same individual (which is the source of correlation in our data). A major advantage of having such a “random effect” is that the correlation of the data coming from the same subject is statistically accounted for with a single parameter—that portion of the variance in the data coming from differences between subjects (the “between subject variance”).
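As a minimal sketch of such a random-effects regression, the following Python/statsmodels code fits a random-intercept model to simulated longitudinal data; the data, column names and effect sizes are hypothetical stand-ins, not the study's data or analysis code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulate hypothetical longitudinal data: 40 subjects seen at 3, 6 and 12 months.
rows = []
for subj in range(40):
    subj_effect = rng.normal(0, 0.10)   # between-subject variance component
    age = rng.uniform(8, 40)
    for month in (3, 6, 12):
        biomarker_change = rng.normal(-0.01 * month, 0.05)
        ln_cpep_change = (-0.035 * month + 0.5 * biomarker_change
                          + subj_effect + rng.normal(0, 0.05))
        rows.append((subj, month, age, biomarker_change, ln_cpep_change))
df = pd.DataFrame(rows, columns=["subject", "month", "age",
                                 "biomarker_change", "ln_cpep_change"])

# A random intercept per subject accounts for within-subject correlation
# with a single between-subject variance parameter.
fit = smf.mixedlm("ln_cpep_change ~ biomarker_change + C(month) + age",
                  data=df, groups=df["subject"]).fit()
print(fit.summary())
```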

2.3. Time and multiple testing

Other considerations in the development of this model and sample size evaluation were how to include “time” as a variable and the statistical issues of multiple testing. Time was, by study design, recorded as “months since baseline” and could assume only a small number of ordered values (3, 6, and 12). Therefore, we decided to consider time an ordinal classification effect; thereby, instead of having a single “slope for time”, we had to estimate constants for each of months 6 and 12 reflecting the additive contribution of time to C-peptide change relative to month 3, as well as estimate any interactions of these “levels of time” with other covariables in the model.

Thus, a downside of using time as a categorical variable is the need to conduct multiple testing of time levels and their interactions with other variables. Such multiple testing, of course, increases the risk of a false positive finding (or “Type I error”). Control of multiple testing-related Type I error can be accomplished either by the “Bonferroni” or the “Protected F” method. Bonferroni is widely used in the literature, but it is very conservative and leads to reduced statistical power for the individual tests. A less conservative approach is to compare individual time-point means only when the interaction of a predictor with time (through the associated “F-test”) is found to be statistically significant (p < 0.05). This “Protected F” method is, however, not without some disagreement among statisticians about how well it actually protects against Type I errors. But, until that debate has been settled, it should be considered as a practical solution to the multiple testing problem in biomarker research.
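To make the two corrections concrete, the short sketch below contrasts a Bonferroni adjustment of several per-month p-values with the “Protected F” idea of examining individual time points only when the overall interaction test is significant; the p-values are placeholders, not study results.

```python
# Hypothetical per-month comparison p-values for one biomarker (months 3, 6, 12).
pvals = {3: 0.030, 6: 0.012, 12: 0.20}
interaction_p = 0.04   # hypothetical F-test p-value for the biomarker-by-time interaction

alpha = 0.05
k = len(pvals)

# Bonferroni: judge each test at alpha / number of tests (conservative).
bonferroni = {m: p < alpha / k for m, p in pvals.items()}

# "Protected F": look at per-month tests only if the interaction test is significant.
protected = {m: (interaction_p < alpha) and (p < alpha) for m, p in pvals.items()}

print("Bonferroni significant:", bonferroni)
print("Protected-F significant:", protected)
```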

2.4. Sample size analysis

We decided to base our sample size analysis on testing the main effect of each biomarker on C-peptide decline after adjusting for the effect of time and the age of the subject. We used data from each of the collaborating labs to estimate the “between subject variance” for each of the biomarkers. When “within-subject” replicates were available from a lab, we used a “Variance Components Analysis” to better estimate the between subject variance. We then used residuals obtained from fitting the longitudinal C-peptide data from Greenbaum [2] to a mixed linear regression model that adjusted for time, age and subject, to estimate the variance in C-peptide to be expected after we control for the effect of time and age of subject. The above two estimates of variance were used to estimate the smallest slope detectable with 80% power for sample sizes ranging from 5 to 83 (the maximum number of subjects determined to be available and having at least three time points of samples). These analyses were conducted in the statistical software package PASS (Hintze, J., 2008, NCSS LLC, Utah).

If changes in a cell population across time largely determine changes in C-peptide across time, then once we adjust for time, the relationship (expressed as a slope) of the C-peptide and the cell population change should be approximately equal to the slope of C-peptide on time itself. The Greenbaum data [2] imply a 3.5% C-peptide AUC loss per month after baseline, so that at 3 months we should see about a 10% loss, at 6 months a 21% loss, and at 12 months a 42% loss. Based on the prior experience of the co-investigators, we also expected to observe a modest change of approximately 10% from baseline in any cell population. Using the mean frequencies of each biomarker from peripheral blood provided by our labs and the mean C-peptide from Greenbaum, we translated these changes to the implied slope value for each of the assays considered and the smallest slope detectable (with 80% power) with our desired sample size of 50 subjects. Because of the limited availability of biospecimens across time from the repository, we wanted to limit the number of subjects to no more than 50. As can be seen from Table 1, this sample size is sufficient for all populations measured except the CD4+ subsets.
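As a rough sketch of the kind of calculation involved (the study itself used PASS), the smallest detectable slope can be approximated from the residual standard deviation of the outcome and the standard deviation of the predictor; the values below are made up for illustration and are not the estimates behind Table 1.

```python
from math import sqrt
from scipy.stats import norm

def smallest_detectable_slope(n, resid_sd, predictor_sd, alpha=0.05, power=0.80):
    """Approximate smallest slope detectable in a simple linear regression,
    using beta = (z_{1-alpha/2} + z_{power}) * resid_sd / (predictor_sd * sqrt(n))."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z * resid_sd / (predictor_sd * sqrt(n))

# Illustrative (made-up) values: residual SD of the Ln C-peptide outcome after
# adjusting for time and age, and between-subject SD of a biomarker change.
for n in (5, 25, 50, 83):
    print(n, round(smallest_detectable_slope(n, resid_sd=0.30, predictor_sd=0.10), 3))
```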

3. Surrogacy and the predictive biomarker

3.1. Surrogacy and correlation

A “surrogate biomarker” is one for which therapeutic changes in the biomarker reflect changes in a clinically meaningful endpoint [3]. However, as stated by Baker [4]: “A perfect correlate does not a surrogate make”. A strong linear relationship, as indicated by high correlation, between the potential surrogate biomarker and clinical outcome is, of course, desirable. Yet, if therapy alters the slope of the linear relationship, then any definition of “clinically meaningful change” is therapy-dependent. This implies that one would have to have different thresholds defining effective biomarker change for each therapy group, which in turn implies that the therapy-specific linear relationship between biomarker and clinical outcome must be known BEFORE conducting the clinical study. Fig. 1 illustrates this problem.

The TrialNet study of co-stimulatory blockade with abatacept [5] found that, in addition to reducing the CD4 central memory population, abatacept also attenuated the rate at which C-peptide was lost. For example, a 0.5 unit reduction in Ln central memory is estimated to be accompanied by a 0.051 (= 0.1019 × 0.5) reduction in Ln C-peptide loss in the placebo group.

Yet, since the slope is not significantly different from 0 in the treated group, no change associated with a 0.5 unit reduction in central memory is expected in the abatacept group based on the statistical model. This is to say that, whereas a 0.5 unit reduction in central memory leads to an important reduction in C-peptide loss in untreated subjects, it is not a threshold one could apply to use central memory as a surrogate marker in subjects treated with abatacept.


Fig. 1. From Orban, Diabetes 2014. Scatterplot showing log-change from baseline in 2 year C-peptide vs. 1 year change in Central Memory (CM) population in the abatacept study. Regression lines show trends within treatment groups. The slope in the abatacept group is significantly (p < 0.05) attenuated toward zero relative to placebo.


3.2. Statistical approaches to designing a study to establish the predictive biomarker as surrogate

Contrary to the abatacept experience, we might yet expect the linear relationship depicted in Fig. 1 to hold for therapies not targeting the CD4 compartment. For example, the AbATE study [6] tested whether treatment with teplizumab would lessen C-peptide loss and suggested that this might have occurred via the reduction of CD8 lymphocytes and related cytokines. The study was powered to detect a 50% reduction in the loss of baseline-adjusted 2-year C-peptide AUC in the treated group compared with the control group. In the intent-to-treat analysis of the primary endpoint, patients treated with teplizumab had a decline in C-peptide at 2 years (mean = −0.28 nmol/L) that was significantly less than that observed in control subjects (mean = −0.46 nmol/L), p = 0.002. This represents a 40% reduction in loss.

Should the slope of the linear relationship found in the abatacept study hold in the AbATE study populations, we can then establish a threshold for the change in CD4 central memory related to a desired reduction in 2-year C-peptide loss as follows (also see Fig. 2): If “P0” is the 2-year C-peptide change, expressed as a proportion of baseline, expected in the control population and if our therapeutic target is a 50% reduction in loss, then the proportion of loss in the treated group must equal P0 + 0.5(1 − P0). The decrease in “y” is then Ln(P0 + 0.5(1 − P0)) − Ln(P0) (see Fig. 2A). Let “xpbo” be the value of 1-year Ln(central memory change) in the placebo subjects and similarly define “xtrt”. By the abatacept linear relationship, it then holds that xtrt = xpbo + [Ln(P0 + 0.5(1 − P0)) − Ln(P0)] / (−0.1019).

Prior to the AbATE study, Herold [7] reported a value of approximately 50% for P0 in controls. The abatacept study reported a 3% reduction from baseline in CD4 central memory in the placebo group at 1 year. Therefore, we estimate that a reduction from baseline in Ln(CD4 central memory) to a mean log value of Ln(1.00 − 0.03) + [Ln(0.5 + 0.5(1 − 0.5)) − Ln(0.5)] / (−0.1019) = −0.0305 − 3.979 = −4.010 in the biomarker is required to establish efficacy if this were used in the AbATE study (Fig. 2B). However, this reduction equates to a (1 − exp(−3.979)) = 98.1% reduction in 1-year CD4 central memory compared to placebo to establish efficacy using our biomarker—a value we suspect is neither plausible nor clinically desirable. Thus, another consideration worthy of mention is the danger of extrapolating beyond the range of the original data when designing new biomarker studies.
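The arithmetic above can be retraced directly; the sketch below simply reproduces the numbers given in the text (P0 = 0.50, a 3% placebo reduction in the biomarker, slope = −0.1019) and is not a general design tool.

```python
from math import log, exp

P0 = 0.50          # expected 2-year C-peptide change, as proportion of baseline, in controls
slope = -0.1019    # slope of Ln C-peptide change on Ln central memory change (placebo arm)
x_pbo = log(1.00 - 0.03)   # 1-year Ln(central memory change) in placebo (3% reduction)

# Target: a 50% reduction in loss, i.e. treated proportion = P0 + 0.5 * (1 - P0) = 0.75
delta_y = log(P0 + 0.5 * (1 - P0)) - log(P0)   # required change on the Ln C-peptide scale
x_trt = x_pbo + delta_y / slope                # implied Ln(central memory change) in treated

print(round(x_trt, 3))                    # about -4.010
print(round(x_trt - x_pbo, 3))            # about -3.979
print(f"{1 - exp(x_trt - x_pbo):.1%}")    # about 98.1% reduction relative to placebo
```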

4. Timing of biomarkers

Studies that attempt to correlate mechanistic outcome with clinical outcome typically use samples taken at the same point in time. However, results from the TrialNet rituximab study in type 1 diabetes [8] suggested that mechanistic samples taken some time before the clinical sample will increase correlation and, hence, increase the predictive information of the mechanistic outcome.

The rituximab study [8] was a randomized, phase 2, double-blind study in which 87 patients between 8 and 40 years of age who had newly diagnosed type 1 diabetes were assigned to receive infusions of rituximab or placebo on days 1, 8, 15, and 22 of the study. The primary clinical outcome variable was the area under the C-peptide curve during the first 2 hours of a mixed-meal tolerance test at 1 year. It is thought that B lymphocytes play a role in many T-lymphocyte-mediated diseases and that selective depletion of B lymphocytes can be achieved with rituximab, an anti-CD20 monoclonal antibody. Therefore, CD19+ B lymphocytes were chosen as a mechanistic outcome measure for this study.

To assess correlation, the authors used the percent of the between-group difference observed in C-peptide that could be explained by the between-group difference in the number of CD19+ cells. This measure was calculated using the change in the t-test statistic value in a repeated-measures model without and then with adjustment for the overall and baseline C-peptide values. Fig. 3 shows that improved prediction of C-peptide change at one year was provided by mechanistic measurement of change in CD19+ B lymphocytes at 3 months and 6 months prior to the change measured in C-peptide. Between the two points considered, the best time for mechanistic sampling in this case would be 6 months prior to sampling for stimulated C-peptide.
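A minimal sketch of this kind of "percent explained" calculation is shown below, assuming a one-row-per-subject data frame with hypothetical column names and using ordinary regression in place of the repeated-measures model actually used in the study; it is one plausible way to operationalize the description above, not the authors' code.

```python
import statsmodels.formula.api as smf

def percent_explained(df, lag_col):
    """Percent reduction in the treatment t-statistic for 1-year C-peptide change
    after adjusting for a CD19+ change measured at an earlier visit.

    df is assumed to have columns 'cpep_change_1yr', 'cpep_baseline', 'treated' (0/1)
    and lag_col; all names are hypothetical."""
    unadj = smf.ols("cpep_change_1yr ~ treated + cpep_baseline", data=df).fit()
    adj = smf.ols(f"cpep_change_1yr ~ treated + cpep_baseline + {lag_col}", data=df).fit()
    t0, t1 = unadj.tvalues["treated"], adj.tvalues["treated"]
    return 100.0 * (1.0 - abs(t1) / abs(t0))

# Example call (data frame not shown):
# print(percent_explained(df, "cd19_change_6mo_prior"))
```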

Fig. 2. Translating a C-peptide therapy target to a biomarker therapy target. A: 1—Prior evidence provides a change from baseline, expressed as proportion of baseline, expected in the placebo group “P0” equal to 0.50; 2—The greatest amount of clinical benefit (reduction in loss) is therefore (1 − P0) = 0.5; 3—The desired effect size, in this case a 50% reduction in loss, is therefore 50% of (1 − P0) = 0.25; 4—which implies then that the target for clinical benefit is P0 + 0.5(1 − P0) = 0.75 in this example. B: 5—These values are then translated to the log scale, under which a linear relationship with the log of biomarker change from baseline (again, proportion of baseline) is assumed; 6—Prior evidence also suggests the log change from baseline in placebo is a 3% reduction, which equates to a value of ln(0.97) = −0.0305; 7—Prior evidence again suggests the slope of this linear relationship = −0.1019; 8—Starting at (6) and knowing (7) we follow the line back to the clinical target (ln(0.75)) to determine that this treatment effect implies a biomarker value of −4.010; 9—which implies then that the treatment effect for the biomarker must be at least a −3.979 reduction in ln(proportion of baseline).


Fig. 3. Mechanistic markers contemporaneously measured with C-peptide are not as predictive as those measured before the C-peptide sample. X-axis is the number of months before the C-peptide sample that the mechanistic sample was taken.

5. Many biomarkers, few subjects

Cellular immunoblotting [9], a technique for studying T-cell responses to islet proteins in T1D patients, was utilized to investigate the T-cell effects of rituximab treatment (unpublished data from Drs. Brooks-Worrell and Palmer). Cellular immunoblotting results were available on 25 rituximab-treated type 1 diabetes patients and 21 type 1 diabetes patients who received placebo infusions. Signal Intensity (SI) was measured at each of 5 time points for 18 distinct “blots” and was available for all subjects. A “blot” refers to a molecular-weight region of a corresponding nitrocellulose plate. SI is equal to the ratio of T-cell activity in the blot to the activity in a control. “T-cell activity” refers to auto-reactivity to human pancreatic islets and is measured via a liquid scintillation counter and expressed as the ratio of counts per minute relative to the counts per minute observed in a control. Therefore, in this experiment, at each time point each subject contributed 18 different SI values to the dataset.

A significant challenge in the analysis of these data is presented by the small number of patients (46) relative to the number of measurements taken on each patient (90 SI values per patient). Moreover, the conventional “Multivariate Normal” modeling approach to this data-analytic problem (which assumes that SI is normally distributed for each blot and that every linear combination of blots is also normally distributed) is contraindicated for two reasons. One reason is that this model requires estimation of not only treatment effects and time interactions, but also the covariance structure of the data. Means must be estimated for each of the blot-time combinations (18 blots × 5 time points), for each blot (18) and for each time point (5), a total of 113 means. Then, with a totally unspecified covariance structure, there would be 4005 covariance terms to estimate in addition to 90 variance terms and 113 means using 4140 data points (46 patients × 90 SI values per patient). That is, there would be more parameters to estimate than data. A reduction of the parameter space by using a “simple random effects” covariance structure could reduce the dimensionality of the parameter space to 90 variance terms, 1 random-effect variance term, and 113 (blots, time points and their interactions) fixed-effect terms, for a total of 204 parameters estimated with 4140 data points—a much more efficient estimation task.

However, another contraindication to even the “random effects” approach mentioned in Section 2.2 arises because the dependent variable, SI, is a ratio and is therefore unlikely to be normally distributed: it is bounded below by zero and most values in the data set tend to be distributed near the lower bound. A standard analytic solution then would be to use the natural logarithm transformation to normality.

6. Modeling longitudinal count data

Yet, it is simple to observe that the log transformation of the SI ratio is actually the difference in the log intensity of the T-cell activities in the blot and the control. Since the intensity measure is essentially a count, the use of the Poisson regression model is immediately suggested as a reasonable alternative to conventional multivariate normal theory approaches. Such an approach confers two advantages: 1) the variances of the Poisson model are determined once the means are determined, thus eliminating the need for the estimation of 90 variance terms and yielding greater degrees of freedom and precision; 2) the Poisson regression model can be analyzed using either Generalized Linear Models or Generalized Estimating Equations approaches carried out in standard statistical programs (e.g. the SAS procedures GLIMMIX and GENMOD). Under the former, analysts can consider various covariance structures as well as the inclusion of random effects along with fixed effects. Under the latter, analysts can use marginal estimation methods to avoid estimation of a specific covariance matrix and rather concentrate on the comparison of treatment and time effects. The results of following this approach are illustrated in Fig. 4 and discussed below.

Specifically, longitudinal values of the stimulation index (SI) were analyzed using random-effects Poisson regression. This is equivalent to a log-linear model of the mean count per minute (CPM) with the log of the control CPM as a covariate. The model included factors for blots, time (month), and either treatment or responder status, and their interactions with blot and time. A working covariance structure with a random effect for the subject-by-time interaction was used to model data correlation. Estimation used the Generalized Linear Model (GENMOD) procedure of the SAS (v9.2, Cary, NC) statistical software system. Wald tests were used for “Type 3 analysis” of factors, which tests the significance of a factor after adjusting for the influence of the other factors in the model. Factors were considered significant at p < 0.05. Model-based estimates of mean change from baseline in SI were derived, along with accompanying standard errors computed using the model-based parameter covariance matrix, using the IML (“Interactive Matrix Language”) procedure of the SAS system. Mean change from baseline was compared between treatment groups or responder groups with Wald tests at each month. A Bonferroni-adjusted significance level of p < 0.007 was used for the comparisons within a month in order to ensure a month-wise error rate of no more than 5%. Ratios of mean change between treatment groups or responder groups were also estimated in a similar fashion using model-based estimates.

As can be observed from the figure, islet cell reactivity was significantly different between treatments at the lower molecular-weight protein regions. Thus, it is clear that the modeling of longitudinal counts was informative in this application.
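A minimal sketch of a longitudinal Poisson regression of this general kind is shown below, written with Python's statsmodels GEE implementation rather than the SAS GENMOD/GLIMMIX procedures named above; the simulated data, column names and the exchangeable working covariance are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Simulate hypothetical blot-level count data: subjects x 18 blots x 5 time points.
rows = []
for subj in range(30):
    treated = int(subj < 15)
    for blot in range(1, 19):
        for month in (0, 3, 6, 9, 12):
            control_cpm = int(rng.integers(200, 600))
            rate = control_cpm * np.exp(0.2 - 0.3 * treated * (blot <= 6))
            rows.append((subj, treated, blot, month, control_cpm, rng.poisson(rate)))
df = pd.DataFrame(rows, columns=["subject", "treated", "blot", "month",
                                 "control_cpm", "cpm"])

# Log-linear Poisson model of mean CPM with log control CPM as a covariate
# (the stimulation index modeled on the log scale); within-subject correlation
# is handled with an exchangeable working covariance via GEE.
model = smf.gee("cpm ~ C(blot) * C(treated) + C(month) + np.log(control_cpm)",
                groups="subject", data=df,
                family=sm.families.Poisson(),
                cov_struct=sm.cov_struct.Exchangeable())
fit = model.fit()
print(fit.summary())
```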

Fig. 4. Analysis of blot data by Poisson regression reveals significant treatment differences in blots in the low molecular weight range. Means and 95% confidence limits are shown (red circle = active treatment group, black square = placebo-treated subjects). Only the upper or lower confidence limits are displayed.


Table 1
Expected and smallest slope detectable in proposed study.

Biomarker                          Expected slope for C-peptide change from baseline      Smallest detectable
                                   10% (3 mos)    20% (6 mos)    40% (12 mos)             slope (n = 50)
CD45RO of Tregs                    0.7            1.3            2.7                      0.7
pSTAT                              1.1            2.2            4.4                      0.7
CD45RA of Tregs                    0.5            1.0            1.9                      0.9
IFNg of CD4+CD25+CD127lo           1.1            2.2            4.4                      4.9
CD4+CD25+CD127lo of CD4+           0.1            0.2            0.4                      4.9
CD4+CD25+CD127lo of CD4+CD45+      0.1            0.2            0.4                      4.9
Helios-IFNg+FOXP3+ of CD4+         0.1            0.2            0.4                      16.3
Transitional B cells               2.5            5.0            10.0                     0.4
Naïve B cells                      1.1            2.2            4.3                      0.6
Tetramer                           4.3            8.6            17.3                     0.4

7. Summary

Assumptions and limitations are required in every statistical analysis, and biomarkers are no exception. The examples above illustrate some key considerations when interpreting statistical analyses of biomarker data. Table 1 suggests that a sample size greater than the desired 50 is required to assess the CD4 T-cell biomarkers considered in this study. That conclusion was based on an assumed linear relationship and also on the assumption that the variability in the CD4 populations will be similar to the variance estimates obtained from the collaborating labs. One could also use these assumptions to determine the sample size needed to detect a minimum slope of interest. But any power analysis is only as good as the assumptions required for it, and so a key consideration when interpreting the results of any power and sample size analysis is the appropriateness of those assumptions. One might consider assessing the “sensitivity” of the power and sample size implications to violations in assumptions and thus gain a feel for the “robustness” of the analysis. Yet the truth of the matter is that not all assumptions can be totally verified and, therefore, a judgment call is required. In this light, assumptions that lead to conservative sample size estimates are preferred to those which lead to smaller sample sizes, since an underpowered study is not desirable.

The study design behind Table 1 is simple in that it only compares a treatment group against a placebo group. Study power and sample size determination is much more formidable, and assumptions perhaps more critical, when one is planning a trial with 3 or more arms, as in a combinatorial study. One approach would be to power the study to compare two specific treatment arms, as was done in this example, and then hope that the combined sample size increases the power for the other comparisons. Another approach uses Monte Carlo simulation based on a data set that is considered to be representative of the study population [10], but this method requires specification of exact values for the parameters to be tested or estimated. Finally, sophisticated software exists to determine power and sample size for more complex designs, but such software also requires that assumptions be made and trusted.
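For the Monte Carlo approach mentioned above, a stripped-down sketch is shown below: it repeatedly simulates a two-arm comparison under assumed parameter values and records how often the treatment effect is detected. All parameter values are placeholders and would, in practice, be taken from a data set considered representative of the study population.

```python
import numpy as np
from scipy import stats

def simulated_power(n_per_arm, effect, sd, n_sims=2000, alpha=0.05, seed=0):
    """Estimate power for a two-arm comparison of mean change by simulation."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, sd, n_per_arm)
        treated = rng.normal(effect, sd, n_per_arm)
        if stats.ttest_ind(treated, control).pvalue < alpha:
            hits += 1
    return hits / n_sims

# Placeholder values: detect a 0.2 difference in mean Ln C-peptide change, SD 0.35.
print(simulated_power(n_per_arm=50, effect=0.2, sd=0.35))
```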

Perhaps the greatest single consideration in the statistical analysis of biomarker data must be the variability of the assay itself. Assay variability can arise from multiple sources, such as samples taken at different times, variation in technology, different analysts, and changes in reagents across time. In addition, and perhaps most underappreciated at the present time, is simple biologic variation itself—an idea that might be best restated by the old saw “You can't step in the same river twice”. The most important impact of assay variability on statistical analysis is that it increases the imprecision of estimates and reduces statistical power for testing hypotheses. Moreover, it also increases the rate of misclassification error in “responder analysis” [11]. It is well known that variation can be reduced through improvement in technology, procedure, and training of analysts, and via the use of the average of repeated measurements to reduce variation in the data analyzed.

Statistically, one can also control for this source of variation by using methods for “repeated measures”. This approach has the advantage that one does not need to reduce the data to means prior to analysis and therefore gains greater total sample size for analysis. But, ultimately, each of the above techniques essentially assumes that a static biological system is being measured. However, there is evidence that this is not the case in autoimmunity. For example, Keller [12] demonstrated circadian patterns of CD19, CD90.2 and CD11b/CD14 in adrenalectomized C57BL/6 mice. Obviously, the time-related dynamics of the autoimmune system need to be better understood so that, if nothing else, we can achieve better control of these sources of variation via statistical modeling.

Given the preceding considerations, it seems clear that if we are to move biomarker science appreciably ahead in clinical immunology we need to engage in a continual learning process rather than one which is focused on simple “reject”/“don't reject” thinking. Therefore, we should also engage more seriously with the Bayesian paradigm [13], which enables research to be conducted in a self-learning and continually updating manner.

Conflict of interest statement

The author(s) declare that there are no conflicts of interest.

References

[1] J.P. Palmer, G.A. Fleming, C.J. Greenbaum, et al., C-peptide is the appropriate outcome measure for type 1 diabetes clinical trials to preserve beta-cell function: report of an ADA workshop, 21–22 October 2001, Diabetes 53 (1) (Jan 2004) 250–264.
[2] C.J. Greenbaum, C.A. Beam, D. Boulware, et al., Fall in C-peptide during first 2 years from diagnosis: evidence of at least two distinct phases from composite Type 1 Diabetes TrialNet data, Diabetes 61 (8) (Aug 2012) 2066–2073.
[3] T.R. Fleming, J.H. Powers, Biomarkers and surrogate endpoints in clinical trials, Stat. Med. 31 (25) (Nov 10 2012) 2973–2984.
[4] S.G. Baker, B.S. Kramer, A perfect correlate does not a surrogate make, BMC Med. Res. Methodol. 3 (16) (Sep 9 2003).
[5] T. Orban, C.A. Beam, P. Xu, et al., Reduction in CD4 central memory T-cell subset in costimulation modulator abatacept-treated patients with recent-onset type 1 diabetes is associated with slower C-peptide decline, Diabetes 63 (10) (Oct 2014) 3449–3457.
[6] K.C. Herold, S.E. Gitelman, M.R. Ehlers, et al., Teplizumab (anti-CD3 mAb) treatment preserves C-peptide responses in patients with new-onset type 1 diabetes in a randomized controlled trial: metabolic and immunologic features at baseline identify a subgroup of responders, Diabetes 62 (11) (Nov 2013) 3766–3774.
[7] K.C. Herold, S.E. Gitelman, U. Masharani, et al., A single course of anti-CD3 monoclonal antibody hOKT3gamma1(Ala-Ala) results in improvement in C-peptide responses and clinical parameters for at least 2 years after onset of type 1 diabetes, Diabetes 54 (6) (Jun 2005) 1763–1769.
[8] M.D. Pescovitz, C.J. Greenbaum, H. Krause-Steinrauf, et al., Rituximab, B-lymphocyte depletion, and preservation of beta-cell function, N. Engl. J. Med. 361 (22) (Nov 26 2009) 2143–2152.
[9] B.M. Brooks-Worrell, J.L. Reichow, A. Goel, H. Ismail, J.P. Palmer, Identification of autoantibody-negative autoimmune type 2 diabetic patients, Diabetes Care 34 (1) (Jan 2011) 168–173.
[10] P. Bonate, Clinical trial simulation in drug development, Pharm. Res. 17 (3) (2000) 252–256.


[11] C.A. Beam, S.E. Gitelman, J.P. Palmer, Type 1 Diabetes TrialNet Study Group, Recommendations for the definition of clinical responder in insulin preservation studies, Diabetes 63 (9) (Sep 2014) 3120–3127.
[12] M. Keller, J. Mazuch, U. Abraham, et al., A circadian clock in macrophages controls inflammatory immune responses, Proc. Natl. Acad. Sci. U. S. A. 106 (50) (Dec 15 2009) 21407–21412.
[13] J.J. Lee, Demystify statistical significance—time to move on from the p value to Bayesian analysis, J. Natl. Cancer Inst. 103 (1) (2011) 2–3.

Glossary

Covariance Structure: The collected variances of, and covariances between, each of the variables measured in a multivariate dataset.

Generalized Estimating Equations (GEE): A method for fitting Generalized Linear Models. GEE methods are robust in the sense that they do not require correct specification of the covariance structure. However, they provide limited information about interactions between main factors in the model.

Generalized Linear Models: A generalization of the conventional regression model to include dependent variables that are not continuous measurements (such as counts or categorical data) and/or not normally distributed. Specification of the covariance structure and distribution of errors is required. In contrast to GEE, this modeling approach provides statistical information about interactions.

Mixed Model: A statistical model in which some of the covariates are considered to be observed at random and so are not something one could predetermine prior to the use of the model in another population or in a future time.

Maximum Likelihood Estimation: A general method of statistical estimation based on the idea of choosing values for parameters which maximize the likelihood of the data.

Multivariate Data: When more than one dependent variable is measured. If two dependent variables are measured, the data are referred to as “bivariate”. These data are distinct from “multivariable data”, in which one dependent and more than one independent variable are measured. Data which contain multiple dependent and multiple independent variables are referred to as “multivariate multivariable”.

Multivariate Normal: When the dependent variables are each normally distributed AND every linear combination of the dependent variables is normally distributed.

Poisson Regression: A regression model for dependent data that are counts of some event. Such models are a specific example of a Generalized Linear Model.

Random Effects: A specific type of mixed model—one in which the random effect is considered only as a means to group data coming from the same individual. A major advantage of having such a “random effect” is that the correlation of the data coming from the same subject is statistically accounted for with a single parameter—that portion of the variance in the data coming from differences between subjects (the “between subject variance”).

Type 3 Analysis: Refers to the analysis of a factor after adjusting for the other factors (independent variables) in the model.

Variance Component Analysis: A statistical method that decomposes the variation in a dataset into sources of variation—e.g., “between subjects” and “within subjects”.

Wald Test: A large-sample test that uses empirically estimated standard errors based on “maximum likelihood estimation”.