Enhancing efficiency and quality of statistical estimation of immunogenicity assay cut points through standardization and automation


JIM-12057; No of Pages 9 Journal of Immunological Methods xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Journal of Immunological Methods journal homepage: www.elsevier.com/locate/jim

Cheng Su a,⁎, Lei Zhou a, Zheng Hu b, Winnie Weng a, Jayanthi Subramani c, Vineet Tadkod b, Kortney Hamilton d, Ami Bautista b, Yu Wu a, Narendra Chirmule d, Zhandong Don Zhong b,⁎⁎

a Global Biostatistical Science, Amgen Inc., One Amgen Center Drive, Thousand Oaks, CA 91320, USA
b Pharmacokinetics & Drug Metabolism, Amgen Inc., One Amgen Center Drive, Thousand Oaks, CA 91320, USA
c Information Systems, Amgen Inc., 1120 Veterans Boulevard, South San Francisco, CA 94080, USA
d Clinical Immunology Medical Sciences, Amgen Inc., One Amgen Center Drive, Thousand Oaks, CA 91320, USA

Article info

Article history: Received 26 February 2015; received in revised form 18 June 2015; accepted 22 June 2015; available online xxxx

Keywords: Anti-drug antibody; Automated statistical analysis; Cut point; Equivalence; Experimental design; Prediction limit

Abstract

Biotherapeutics can elicit immune responses, which can alter the exposure, safety, and efficacy of the therapeutics. A well-designed and robust bioanalytical method is critical for the detection and characterization of relevant anti-drug antibody (ADA) and the success of an immunogenicity study. As a fundamental criterion in immunogenicity testing, assay cut points need to be statistically established with a risk-based approach to reduce subjectivity. This manuscript describes the development of a validated, web-based, multi-tier customized assay statistical tool (CAST) for assessing cut points of ADA assays. The tool provides an intuitive web interface that allows users to import experimental data generated from a standardized experimental design, select the assay factors, run the standardized analysis algorithms, and generate tables, figures, and listings (TFL). It allows bioanalytical scientists to perform complex statistical analysis at the click of a button to produce reliable assay parameters in support of immunogenicity studies.

© 2015 Elsevier B.V. All rights reserved.

Abbreviations: ADA, anti-drug antibody; CAST, customized assay statistical tool; ACP, screening assay cut point; DCP, depletion cut point; ECL, electrochemiluminescence; S/N, signal to noise ratio; NC, negative control; NAb, neutralizing antibody; Log, logarithm; ANOVA, analysis of variance; SW, Shapiro–Wilk.
⁎ Correspondence to: C. Su, 1120 Veterans Boulevard, Mailstop ASF3 2011, South San Francisco, CA 94080, USA.
⁎⁎ Correspondence to: Z.D. Zhong, One Amgen Center Drive, Mailstop 30E-3-B, Thousand Oaks, CA 91320, USA.

1. Introduction

Biotherapeutics can elicit immune responses from animals and humans, inducing anti-drug antibodies (ADA). The presence of ADA can have a significant impact on the pharmacokinetic (PK) profiles, safety, and efficacy of the candidate molecule. Such antibody responses can sometimes result in adverse clinical sequelae ranging from hypersensitivity to neutralization of the biological activity of an endogenous protein. A well-designed, robust immunogenicity assay is the foundation for appropriately evaluating the immunological properties of a biotherapeutic. Much effort has been given to the advancement of bioanalytical methods for the detection and characterization of ADAs.

Several regulatory guidelines and industry recommendations have been published concerning the development and validation of ADA assays, sample testing, and data reporting and interpretation (US Food and Drug Administration, 2009, 2013; Gupta et al., 2007; Mire-Sluis et al., 2004; Committee for Medicinal Product for Human Use, 2007; Shankar et al., 2014). Vigorous statistical analyses have been woven into the immunogenicity assessment process throughout the development cycle of a biotherapeutic, starting from assay optimization by utilizing the design of experiments principle (Chen et al., 2012; Hammond et al., 2008; Zhong et al., 2010); evaluation of method robustness and ruggedness during validation and transfer of the assay between test laboratories (Tatarewicz et al., 2009); monitoring the assay performance during in-study sample testing (Barger et al., 2010); and evaluation of the impact of ADA on the PK profile, pharmacodynamic (PD) effect, and the safety profile of the molecule of interest (Chen et al., 2013; Chirmule et al., 2012; Kelley et al., 2013; Sailstad et al., 2014; Thway et al., 2013). A tiered approach is utilized for ADA testing (Shankar et al., 2008). Initially, a sample is tested for the potential presence of ADA in a screening assay, by applying a screening assay cut point (ACP) that is predetermined through appropriate statistical analysis of drug-naïve samples from the population of interest. The screening assay is expected to detect some false positive samples to maximize the probability of identifying all potential positive samples and reduce the probability of false

http://dx.doi.org/10.1016/j.jim.2015.06.013 0022-1759/© 2015 Elsevier B.V. All rights reserved.

Please cite this article as: Su, C., et al., Enhancing efficiency and quality of statistical estimation of immunogenicity assay cut points through standardization and automation, J. Immunol. Methods (2015), http://dx.doi.org/10.1016/j.jim.2015.06.013



negative results. Following the detection of positive reactivity of a sample during the screening step, a competitive inhibition test is performed by treating a sample with the drug. The resulting reduction of assay response greater than the signal depletion cut point (DCP) would confirm that the positive reactivity is indeed specific to the drug. Additional experiments may be necessary to confirm the specific binding to be that of antibody origin, to determine the antibody levels, to identify the ADA binding epitopes, to measure the affinity of the ADA, and to characterize the ADA isotypes and subclasses. Most importantly, the ability of the ADAs to neutralize the biological activity of the biotherapeutic is further evaluated in a functional biological assay (Gupta et al., 2007). Cut point is a fundamental criterion in immunogenicity studies. Regulatory guidelines and industry white papers recommend a risk-based approach to establishing cut points statistically to reduce subjectivity (US Food and Drug Administration, 2009; Mire-Sluis et al., 2004; Shankar et al., 2008; Buttel et al., 2011). Statistical principles used to estimate cut points are consistent across immunogenicity applications. Proper statistical approach requires careful consideration in experimental design, data distribution, and analysis methods. Furthermore, as the cut points of an ADA assay are greatly dependent upon the biological status of the study population, it is important for the bioanalytical scientists to continuously monitor the cut points as well as other assay parameters for the relevant study populations during in-study sample analysis. In order to enhance the efficiency and quality of statistical analysis in support of immunogenicity studies, we have developed a comprehensive web-based customized assay statistical tool (CAST) to automate and standardize data analysis for determining cut points as well as other assay parameters. 
CAST provides an intuitive web interface for a user to import experimental data generated from standardized experimental designs or study baseline samples, select assay factors, run the standardized analysis algorithms, and generate the tables, figures, and listings (TFL). The tool has been validated with a risk-based approach following the relevant regulatory guidance. It allows users with limited statistical training and experience to perform complex statistical analyses with the click of a button and to generate results within a few minutes for the intended use, which is a valuable functionality for monitoring ADA assay parameters during in-study sample analysis. In this manuscript, we report on the development of the CAST system for cut point estimation. We describe in detail the experimental and statistical considerations for the standardized designs of cut point experiments and the development of the analysis algorithms.

2. Methods

2.1. Software platform

CAST is a web-based, multi-tier statistical analysis system that is built on Ext JS, Java, XML, SAS, and Oracle technologies to support data analysis for immunogenicity applications (Fig. 1). Ext JS is used to create intuitive and flexible user interfaces. The Java application platform is used to process user input and to create and submit SAS jobs to the SAS engine. Data preprocessing, analyses, algorithms, and TFL creation are implemented using SAS V9.3. The user can access CAST data analysis through a web browser.

2.2. Experimental designs

Standardized experimental designs are devised for estimating cut points for ADA detection. The electrochemiluminescence (ECL) bridging assay is used as a model system to test CAST. As examples, two designs having 4 assay factors, also known as assay variables, are shown in Table 1. In the experimental designs, a set of donor samples is randomized into different donor groups, also known as sequences (eg, S1, S2, S3, and S4), which are stratified based on gender and disease population and allocated to different ECL detection plates. The donor groups are generated once and used across all assay runs in a cut point experiment. Specifically, an assay run consists of testing the entire donor set across all assay factors of an experimental design. The assay factors are designated as VAR1, VAR2, etc., with the actual names being assigned through the user interface to provide flexibility in choosing an experimental design. An additional variable is also provided through the user interface for evaluating potential differences between two populations, eg, two disease populations, through equivalence evaluation.

Up to 4 assay factors can be incorporated into an experiment, such as date, analyst, instrument, and ECL detection plate lot. Both full and reduced versions are available for each experimental design. In the full design version, a complete set of donors is tested across all factors in each assay run. In the reduced version, only subsets of the donors are tested for each assay condition, with the complete set of donors tested in each assay run. The tool offers over 20 unique experimental designs, encompassing a number of assay factors, full and reduced versions of each design, and equivalency assessment. A flexible experimental design is also available to accommodate calculation of population-specific cut points during in-study sample analysis.
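The stratified randomization of donors into groups described above can be illustrated with a minimal Python sketch. It is not the CAST procedure: the function name `randomize_donors`, the round-robin dealing rule, the fixed seed, and the example donors are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def randomize_donors(donors, n_groups=4, seed=0):
    """Assign donors to groups S1..Sn, spreading each (gender, disease) stratum.

    `donors` is a list of (donor_id, gender, disease) tuples. The round-robin
    dealing rule is an illustrative assumption, not the CAST algorithm.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for donor_id, gender, disease in donors:
        strata[(gender, disease)].append(donor_id)

    groups = {f"S{i + 1}": [] for i in range(n_groups)}
    names = list(groups)
    offset = 0  # carry the dealing position across strata to even out group sizes
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)  # random order within the stratum
        for donor_id in members:
            groups[names[offset % n_groups]].append(donor_id)
            offset += 1
    return groups


# Hypothetical donor set: 40 donors, two genders, two disease populations.
donors = [(f"D{i:02d}", "Male" if i % 2 else "Female",
           "Healthy" if i < 20 else "Disease") for i in range(40)]
groups = randomize_donors(donors)
```

Each group would then be allocated to one plate; in a reduced design such as the one in Table 1, only some of the groups are tested under a given assay condition, while every group is still covered within each assay run.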

Fig. 1. CAST technology.


Table 1
Full and reduced versions of immunoassay experimental designs for 4 assay factors.

Experimental design (b) | Assay run | VAR1 (a) | VAR2 (a) | VAR3 (a) | VAR4 (a) | Donor group (sequence)
Reduced design (c) | 1 | 1 | 1 | 1 | 1 | S1, S2
Reduced design (c) | 1 | 1 | 1 | 1 | 2 | S3, S4
Reduced design (c) | 2 | 1 | 2 | 2 | 1 | S3, S4
Reduced design (c) | 2 | 1 | 2 | 2 | 2 | S1, S2
Reduced design (c) | 3 | 2 | 2 | 1 | 1 | S1, S2
Reduced design (c) | 3 | 2 | 2 | 1 | 2 | S3, S4
Reduced design (c) | 4 | 2 | 1 | 2 | 1 | S3, S4
Reduced design (c) | 4 | 2 | 1 | 2 | 2 | S1, S2
Full design (c) | 1 | 1 | 1 | 1 | 1 | S1, S2, S3, S4
Full design (c) | 1 | 1 | 1 | 1 | 2 | S1, S2, S3, S4
Full design (c) | 2 | 1 | 2 | 2 | 1 | S1, S2, S3, S4
Full design (c) | 2 | 1 | 2 | 2 | 2 | S1, S2, S3, S4
Full design (c) | 3 | 2 | 2 | 1 | 1 | S1, S2, S3, S4
Full design (c) | 3 | 2 | 2 | 1 | 2 | S1, S2, S3, S4
Full design (c) | 4 | 2 | 1 | 2 | 1 | S1, S2, S3, S4
Full design (c) | 4 | 2 | 1 | 2 | 2 | S1, S2, S3, S4

(a) VAR1, VAR2, VAR3, and VAR4 represent 4 assay factors: Variable 1, Variable 2, Variable 3, and Variable 4, respectively.
(b) A set of donors was randomized into donor groups, also known as sequences (S1 through S4), with one donor group per plate.
(c) In a full design, a complete set of donors was tested in each assay run at each VAR4 level (e.g., plate lot). In the reduced design, only the designated sub-sets of donors were tested in an assay run.

In a cut point experiment during validation, screening and drug inhibition tests are performed concurrently, allowing simultaneous calculation of ACP and DCP by CAST. The ECL response of a sample is recorded as ECL units, and the signal-to-noise ratio (S/N) is calculated by dividing the mean ECL value of each sample by the mean ECL value of the pooled serum negative control (NC). The percent inhibition of the assay response in the inhibition test is calculated by dividing the reduction of the assay S/N value of a drug-treated sample by the signal of the respective untreated sample.

Detection of neutralizing antibody (NAb) can be achieved with either a cell-based bioassay or a non-cell-based binding assay. Due to the complexity of NAb testing, many technologies, assay formats, and detection methods have been exploited for NAb measurement. The NAb assay experimental designs are thus focused on the most impactful factors such as plate-to-plate, analyst-to-analyst, and day-to-day variability of the assay on the cut point (Table 2).

To facilitate automated cut point calculation in CAST, assay data are captured into a validated standardized template (Table 3), which allows users to transfer data from the ECL instruments. Alternatively, the raw data can be extracted from a Laboratory Information Management System for analysis by CAST (not shown).

2.3. Data transformation and outlier removal

Seven power transformations (−2, −1, −0.5, Log, 0.5, original scale, and 2) are utilized in conjunction with outlier removal to achieve data distribution normality, as depicted in Fig. 2. The algorithm starts with removing extreme outliers that are outside the outer fence of [Q25 − 3 ∗ IQR, Q75 + 3 ∗ IQR], where Q25 and Q75 are the first and third quartiles, respectively, and IQR is the inter-quartile range. The remaining dataset is subjected to the first round of transformations,

Table 2
Full and reduced versions of NAb assay experimental designs.

Experimental design | Assay run | Analyst | Date | Plate(s) | Donor group (sequence)
Reduced design | 1 | 1 | ≥2 | ≥2/day | S1, S3, S5, S7, …
Reduced design |   | 2 | ≥2 | ≥2/day | S2, S4, S6, S8, …
Full design | 1 | 1 | ≥2 | ≥2/day | S1, S2, S3, S4, S5, S6, …
Full design | 2 | 2 | ≥2 | ≥2/day | S1, S2, S3, S4, S5, S6, …


and each resulting dataset is evaluated for normality and outliers based on residuals from an analysis of variance (ANOVA) model. Normality is concluded with a Shapiro–Wilk (SW) test p-value greater than 0.05; outliers are identified as those residuals outside the inner fence of [Q25 − 1.5 ∗ IQR, Q75 + 1.5 ∗ IQR]. Log transformation is selected if it achieves normality; otherwise, the transformation with the highest p-value is selected. If none of the transformations achieves normality, the collection of outliers identified in each transformation is removed, and the remaining dataset is subject to a second round of normality evaluation. The final transformation is confirmed in the third normality test using the dataset excluding outliers specific to this transformation. Users are also allowed to specify a data transformation to be applied. In this case, the algorithm starts with a residual normality test and outlier removal, followed by a retest of residual normality to confirm the transformation. The tool stops and generates an error message when normality cannot be achieved with any of the transformations even after outlier removal, or when the resulting total number of outliers exceeds 5% of the entire dataset or the subset as defined by the comparison factor.

2.4. Data modeling using random effect models

Cut point data are modeled using the SAS PROC MIXED procedure to evaluate the equivalence between two populations and the contribution of each assay factor to the assay variability. Additionally, the distributions at residual and donor levels are evaluated for the purpose of validating the model assumptions.

2.4.1. Random effects in a full model

Assay factors and their interactions, as well as donor and plate, are treated as random effects in a full model. The exact model terms are pre-specified through a model library (not shown) for each experimental design as exemplified in Tables 1 and 2.
Two sets of models are specified for each experimental design, depending on whether there is a comparison factor. The comparison factor is used as a nested variable for all other model terms. While one model is used to calculate population-specific cut points for two individual datasets that do not demonstrate equivalence, the other model is applied to the combined dataset to calculate a shared cut point for both populations when an equivalence conclusion is reached.

2.4.2. Model reduction and donor outlier removal

The backward model selection algorithm starts with fitting a full model, followed by removal of non-significant (p ≥ 0.1) model terms one at a time; the model is refitted after removing each model term. The p-value is based on the ANOVA Type I method. When there are multiple non-significant terms in a model, a pre-specified priority ranking (not shown) for each term and a p-value ranking are used to select a non-significant term to be removed. When all the remaining terms in the model are either significant or pre-specified to be included in the model, the iteration stops and the final reduced model is used to calculate the cut point.

In addition to the residual outlier removal described in Section 2.3, the algorithm also allows outlier removal at the donor level to achieve normality. Normality is tested according to the SW method using the estimated solution for donors in the reduced model. When the normality test passes, the analysis continues to cut point calculation. If the normality test fails, outliers as defined by the inner fence are excluded, and the remaining dataset is subject to a second round of donor normality evaluation. If the normality test fails in the second round, or if the total number of outliers at both residual and donor levels exceeds 5% of the original data, the analysis workflow stops, and an error message is generated. In cases where data are split based on the comparison factor, the 5% criterion is assessed within each split dataset.
Termination of analysis for one split dataset automatically triggers the termination of the other.
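The fence-based outlier rules used at the residual level (Section 2.3) and the donor level, together with the 5% stopping criterion, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the CAST implementation: the function names are hypothetical, the example residuals are made up, and the quartile definition (`method="inclusive"`) is an assumption because the source does not state which quartile estimator is used.

```python
import statistics


def fence(values, k):
    """Return (low, high) = (Q25 - k*IQR, Q75 + k*IQR).

    Quartiles come from statistics.quantiles with method="inclusive";
    the quartile definition used by CAST is not stated, so this is an assumption.
    """
    q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    return q25 - k * iqr, q75 + k * iqr


def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) using the fence with multiplier k."""
    low, high = fence(values, k)
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


def exceeds_outlier_cap(n_removed, n_original, cap=0.05):
    """True when the removed outliers exceed the 5% cap that stops the analysis."""
    return n_removed > cap * n_original


# Hypothetical model residuals with one aberrant value.
residuals = [-0.4, -0.3, -0.2, -0.1, 0.0, 0.0, 0.1, 0.2, 0.3, 2.5]
kept, outliers = split_outliers(residuals, k=1.5)  # inner fence
_, extreme = split_outliers(residuals, k=3.0)      # outer fence
stop = exceeds_outlier_cap(len(outliers), len(residuals))
```

In this small example a single outlier out of 10 observations already exceeds the 5% cap, so `stop` is true; with realistic donor numbers the cap tolerates a few removals before the workflow halts.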


Based on the final reduced model, variance of each assay factor is calculated using ANOVA. The resulting ANOVA table is generated as part of the cut point analysis output. Variance of non-significant terms is reported as zero. This analysis is also referred to as variance component analysis.

Table 3
Data entry template used to capture assay data for CAST analysis.

Experiment date | Plate | Buffer | Plate_lot | Instrument | Additional parameter (optional) | Disease | Gender | Donor ID | Untreated S/N | Exclude_Untreated | Reason_Untreated | Treated S/N | Exclude_Treated | Reason_Treated
7/26/2011 | 1 | G | 1111111 | 533577 | NA | Healthy | Male | XYZ11111 | 1.06 | No |  | 1.03 | No |
7/26/2011 | 1 | G | 1111111 | 533577 | NA | Healthy | Male | XYZ11111 | 1.06 | No |  | 1.03 | No |
7/26/2011 | 1 | G | 1111111 | 533577 | NA | Healthy | Male | XYZ11111 | 1.06 | No |  | 1.03 | No |
7/26/2011 | 1 | G | 1111111 | 533577 | NA | Healthy | Male | XYZ11111 | 1.06 | No |  | 1.03 | No |
7/26/2011 | 1 | G | 1111111 | 533577 | NA | Healthy | Male | XYZ11111 | 1.06 | No |  | 1.03 | No |
7/26/2011 | 1 | G | 1111111 | 533577 | NA | Healthy | Male | XYZ11111 | 1.06 | No |  | 1.03 | No |
7/26/2011 | 1 | G | 1111111 | 533577 | NA | Healthy | Male | XYZ11111 | 1.06 | No |  | 1.03 | No |
7/26/2011 | 1 | G | 1111111 | 533577 | NA | Healthy | Male | XYZ11111 | 1.06 | No |  | 1.03 | Yes | %CV
7/26/2011 | 1 | G | 1111111 | 533577 | NA | Healthy | Male | XYZ11111 | 1.06 | No |  | 1.03 | No |
7/26/2011 | 1 | G | 1111111 | 533577 | NA | Healthy | Male | XYZ11111 | 1.06 | No |  | 1.03 | No |
7/26/2011 | 1 | G | 1111111 | 533577 | NA | Healthy | Male | XYZ11111 | 1.06 | No |  | 1.03 | No |
7/26/2011 | 1 | G | 1111111 | 533577 | NA | Healthy | Male | XYZ11111 | 1.06 | No |  | 1.03 | No |
7/26/2011 | 1 | G | 1111111 | 533577 | NA | Healthy | Male | XYZ11111 | 1.06 | No |  | 1.03 | No |
7/26/2011 | 1 | G | 1111111 | 533577 | NA | Healthy | Male | XYZ11111 | 1.06 | No |  | 1.03 | No |
7/26/2011 | 1 | G | 1111111 | 533577 | NA | Healthy | Male | XYZ11111 | 1.06 | Yes | Biological outlier | 1.03 | Yes | Biological outlier
7/26/2011 | 1 | G | 1111111 | 533577 | NA | Healthy | Male | XYZ11111 | 1.06 | No |  | 1.03 | No |

2.4.3. Equivalence evaluation

For assessing population-specific cut points, equivalence is evaluated by constructing a 90% confidence interval of the ratio of the average responses of the two populations, eg, healthy and diseased populations, two reagent lots, or two test laboratories. Equivalence is concluded if the interval is within a predetermined acceptance range (0.8–1.25). A mixed effect model with the populations of interest as the fixed effect and the random effects similar to those specified in Section 2.4.1 is utilized to estimate the average responses and their variances in the transformed scale. When log transformation is applied, the confidence interval of the mean difference between the 2 populations is calculated and back-transformed to the original scale as the confidence interval of the ratio prior to reporting. In the case of inverse or no transformation, the confidence interval is directly constructed on the ratio of the original scale using the following formulas:

$$\mathrm{L90} = \mathrm{Mean} - \mathrm{TINV}(0.95,\ n-1) \times \mathrm{SD}$$

$$\mathrm{U90} = \mathrm{Mean} + \mathrm{TINV}(0.95,\ n-1) \times \mathrm{SD}$$

where ‘Mean’ is the ratio of the average responses between two groups, ‘n’ is the total sample size, and ‘SD’ is the standard deviation of the ratio approximated through error propagation based on the Delta method (Oehlert, 1992). Equivalence evaluation is available only for non-transformed data, Log, and inverse transformation.

2.4.4. Calculation of cut points through prediction limit estimation

The following equation is utilized to calculate screening ACP:

$$\mathrm{U95} = \text{LS-mean} + \mathrm{TINV}(0.95,\ \mathrm{DF}) \times \mathrm{SQRT}\left(\mathrm{Variance}_{\mathrm{total}} + \mathrm{Variance}_{\text{LS-mean}}\right)$$

where ‘U95’ represents the upper 95% prediction limit (a 99% prediction limit is also calculated); ‘LS-mean’ is the least square mean estimate of the baseline of the assay responses, which is the estimated intercept from the SAS PROC MIXED procedure; $\mathrm{Variance}_{\mathrm{total}}$ is the total variance estimate of assay response; $\mathrm{Variance}_{\text{LS-mean}}$ is the variance estimate of the LS-mean; ‘DF’ is the estimated total degrees of freedom using the method developed by Satterthwaite (1946); ‘TINV’ is the inverse of the t-statistic used to calculate the prediction limits; and ‘SQRT’ is an abbreviation for square root. If the reduced model has only a residual term, the following equation is applied:

$$\mathrm{U95} = \mathrm{Mean} + \mathrm{TINV}(0.95,\ n-1) \times \mathrm{SD} \times \mathrm{SQRT}(1 + 1/n)$$

where ‘Mean’ represents the mean of the assay responses, ‘SD’ is the standard deviation of the assay responses, and ‘n’ is the total number of observations used in the analysis. If a transformation is utilized, the estimated baseline and the upper bound are back-calculated prior to reporting. A 99% prediction limit is also calculated using a similar method.

To calculate specificity DCP, the percent ratio of the assay response of a drug-naïve sample treated with the drug and that of the respective untreated sample (%T/U) is calculated as the intermediate value for the analysis, as follows:

$$\%\mathrm{T/U} = \frac{\text{S/N ratio of Treated Sample}}{\text{S/N ratio of Untreated Sample}} \times 100\%$$

$$\%\mathrm{Depletion} = 100\% - \%\mathrm{T/U}$$

DCP is thus calculated using the equations:

$$\mathrm{DCP} = 100\% - \mathrm{L99}\ \text{of}\ \%\mathrm{T/U}$$

$$\mathrm{L99} = \text{LS-mean} - \mathrm{TINV}(0.99,\ \mathrm{DF}) \times \mathrm{SQRT}\left(\mathrm{Variance}_{\mathrm{total}} + \mathrm{Variance}_{\text{LS-mean}}\right)$$

where ‘L99’ is the lower bound of a one-sided 99% prediction limit for %T/U. Prediction limits of 95% and 99.9% are also calculated. If the reduced model has only a residual term, L99 is determined with the following equation:

$$\mathrm{L99} = \mathrm{Mean} - \mathrm{TINV}(0.99,\ n-1) \times \mathrm{SD} \times \mathrm{SQRT}(1 + 1/n)$$
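For the residual-only case, the prediction limit formulas for ACP and DCP can be worked through with a short Python sketch. It is a simplified illustration, not the CAST or SAS implementation: the function name, the S/N and %T/U values, and the choice of a log transformation for the screening data are hypothetical. The t quantile (the TINV term) is passed in as a number because the Python standard library has no inverse t distribution; a statistics package would normally supply it.

```python
import math
import statistics


def prediction_limit(values, t_quantile, upper=True):
    """One-sided prediction limit: Mean +/- t * SD * sqrt(1 + 1/n).

    `t_quantile` plays the role of TINV(p, n-1) and must match the
    sample size and confidence level chosen by the caller.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    half_width = t_quantile * sd * math.sqrt(1 + 1 / n)
    return mean + half_width if upper else mean - half_width


# Hypothetical S/N values for 10 drug-naive donors, analyzed on the log scale.
sn = [0.95, 1.02, 1.06, 0.98, 1.10, 1.01, 0.99, 1.04, 0.97, 1.08]
T95_DF9 = 1.8331  # one-sided 95% t quantile, 9 degrees of freedom
log_u95 = prediction_limit([math.log(x) for x in sn], T95_DF9)
acp = math.exp(log_u95)  # back-calculate the cut point to the S/N scale

# Hypothetical %T/U values for the same donors, no transformation.
pct_tu = [97.0, 95.5, 98.2, 96.1, 94.8, 97.7, 96.4, 95.9, 98.0, 96.9]
T99_DF9 = 2.8214  # one-sided 99% t quantile, 9 degrees of freedom
l99 = prediction_limit(pct_tu, T99_DF9, upper=False)
dcp = 100.0 - l99  # DCP = 100% - L99 of %T/U
```

When the reduced model retains random effects, the SD term is replaced by the square root of the total variance plus the variance of the LS-mean, and the degrees of freedom come from the Satterthwaite approximation, neither of which this sketch attempts to reproduce.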

3. Results

CAST offered an automated solution for calculating ADA assay cut points in various configurations, which included flexible experimental designs, data transformation methods, and comparison of different populations for determination of population-specific cut point. The CAST data analysis tool utilized statistical algorithms to remove outliers, identify data transformation, evaluate equivalence, perform model reduction, and calculate cut points. It was also capable of stopping the analysis and alerting statisticians of unusual data distribution and outlier removal to ensure robust and statistically valid cut point calculation. The analysis options and workflows are outlined in Fig. 3.


Fig. 2. Data transformation and outlier removal module.

The utility of CAST was illustrated in an immunoassay cut point experiment with the experimental design depicted in Table 4. In this experiment, a set of 36 donor samples were randomized into 3 donor groups (ie, 3 plates): S1 (n = 14), S2 (n = 14), and S3 (n = 8). The design consisted of 2 test sites with 3 assay factors each: analyst (VAR 1), instrument (VAR 2), and plate lot (VAR 3). While the assay runs at Site 2 were performed by 2 analysts, there was only one analyst available at Site 1. The design was therefore unbalanced. Assay raw data and analytical parameters were entered into CAST through the user interface shown in Fig. 4. The S/N and %T/U values of the samples were analyzed independently to produce the screening ACP and the specificity DCP, respectively, through the workflow shown in Fig. 3. Log transformation was selected by CAST to achieve normality for the final data analysis, and one outlier was identified and removed during the analysis. Another important CAST functionality was equivalence assessment during cut point calculation. In this example, the two sites were concluded to be equivalent for both screening ACP and specificity DCP (Fig. 5). A shared ACP and DCP for both sites were thus calculated using the combined data from both sites (Fig. 6). Screening ACP at two prediction limits (95% and 99%) and specificity DCP at three prediction limits (95%, 99% and 99.9%) were offered for the bioanalytical scientist to choose for the intended use (Fig. 6). Variance component analysis could provide assessment on the impact of assay factors on assay variability to help understand the assay robustness. The results of the variance component analysis were similar between screening and specificity data in this case study. Three model terms were identified to contribute to the assay variance (Fig. 7). 
While the contribution of individual donor responses to the variance was limited, both the multi-factor interaction and the model residual played a major role in the variability of this assay. The interpretation of such an interaction would require statistical understanding of the confounding factors and limitation of the experimental conduct. In this case the interaction represented assay run within each site. The choice of Log transformation and residual normality was supported by the QQ plot of the model residuals (Fig. 8).

4. Discussion

Cut point is one of the fundamental assay parameters for immunogenicity testing, which includes ADA detection, neutralizing activity

measurement, epitope mapping, and ADA isotyping and subclassing. The biological variability of study subjects as well as the assay background and variability can have a significant impact on the cut points. To derive a relevant and robust cut point, it is therefore imperative to appropriately incorporate multiple sources of assay variability into a cut point experiment. Industry white papers and regulatory guidelines recommend that cut points be statistically established with a risk-based approach to reduce subjectivity. Our automated data analysis tool reduces subjectivity by standardizing the experimental designs and the data analysis algorithm. It also provides cut point results in a short turn-around time, allowing the bioanalytical scientists to monitor the need for a cut point specific to a site, a population, or a study, to ensure detection of clinically relevant ADA, and ultimately patient safety. ECL bridging assay is one of the technologies that are commonly utilized for screening binding ADAs. The assay platform has been shown to be a robust tool for immunogenicity assessment. It serves as an assay model to demonstrate the utility of our automated statistical analysis tool. Several assay factors, such as date, analyst, instrument, the ECL detection plate, and detection read buffer, can contribute to assay background noise, assay variability, and sample responses. As the effect of these assay factors can vary from assay to assay, it is therefore important to incorporate the critical assay factors into the cut point experiment based on the understanding of the analytical method and the available resources. Neutralizing activity assay is a critical component of immunogenicity assessment. Due to the diversity of the assay platforms and methodologies used in measuring neutralizing activities, the NAb assay experimental designs provided by CAST are thus focused on the factors that contribute the most to the variability of the NAb assay. 
These assay factors include analyst, plate lot, and assay date. One important consideration is to standardize experimental designs to allow the automated solution to handle a majority (eg, 80%) of cut point analyses. Standardization not only improves the quality of objective statistical analysis but also promotes the use of proper experimental design in setting up a cut point experiment. To this end, a library of standardized experimental designs has been compiled (not shown), with each design having select assay factors to be incorporated into the cut point experiment. Some flexibility is given to allow the bioanalytical scientists to choose a design and the more relevant assay factors to fit the intended utility of an assay. The library also includes the appropriate


Fig. 3. CAST analysis workflow for estimation of immunoassay cut points.

statistical models for the individual designs, which are pre-specified to enable automated statistical modeling. Sometimes only a subset of the donor samples is tested in a cut point experiment because of resource constraints. In such cases, CAST provides reduced experimental designs that incorporate statistical considerations to reduce potential confounding. Additional experimental designs and statistical models can readily be added to the library. For screening ACP, a nominal false positive rate of 5% is typically adopted to improve the probability for the assay to identify all ADA-positive samples (US Food and Drug Administration, 2009; Mire-Sluis et al., 2004; Committee for Medicinal Product for Human Use, 2007). An alternative false positive rate could be selected depending on the specific requirements of a drug development program, such as the development stage (e.g., clinical vs. non-clinical), the immunogenicity risk profile of the molecule (e.g., monoclonal antibodies vs. proteins having unique

Table 4. Design table of a cut point experiment.
Columns: Site; Assay run; Analyst; Instrument; Plate lot; # of plates; Donor group (sequence).
Site: Site 1; Site 2
Assay run, Analyst, Instrument: 1 1 1 2 1 2 1 2 1 2 2 2 3 3 1 4 3 2
Plate lot: 1 2 1 2 1 2 1 2 1 2 1 2
# of plates: 1 2 2 1 1 2 2 1 1 2 2 1
Donor group (sequence): S1 | S2, S3 | S2, S3 | S1 | S1 | S2, S3 | S2, S3 | S1 | S1 | S2, S3 | S2, S3 | S1

endogenous counterpart), the assay methodology, and the intended use of the assay (e.g., screening vs. specificity) (Buttel et al., 2011). For a screening assay used to support clinical programs, a one-sided 95% prediction limit is recommended and is expected to yield 5% false positive results (US Food and Drug Administration, 2009; Mire-Sluis et al., 2004; Shankar et al., 2008). For non-clinical studies, a one-sided 99% prediction limit with a 1% false positive rate may be adopted in the screening assay, as the purpose of non-clinical immunogenicity evaluation is to assess the impact of ADAs on drug exposure and activity (Ponce et al., 2009; Sauerborn, 2013). As such, the one-sided prediction limits (95% and 99%) are built into the CAST algorithm for estimating ACP.

Following identification of a sample reactive for the presence of ADA in the screening step, a specificity confirmation is necessary to eliminate non-specific binding. This specificity confirmation can be accomplished by inhibiting (depleting) the assay signals by adding excess soluble drug to the sample. A DCP is thus needed to assess the level of inhibition in a sample. To determine DCP, samples that are used for the screening cut point are spiked with excess drug and tested for the reduction in the assay responses. The automated data analysis tool allows for estimation of both ACP and DCP in a single data analysis run.

Several statistical approaches have been reported for ADA assay cut point calculation, including the simple parametric, robust percentile estimation, and prediction interval approaches (Hoffman and Berger, 2011; Jaki et al., 2011; Schlain et al., 2010; Zhang et al., 2013). Unlike the prediction interval approach, the simple parametric and robust percentile methods do not take sampling variability into consideration and are thus not preferred. Such approaches for cut point calculation would result in a false positive rate higher than the targeted false positive rate.
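The effect of ignoring sampling variability can be illustrated with a small simulation. The sketch below is not part of CAST; it assumes normally distributed responses, a hypothetical panel of 20 drug-naïve donors, and no assay factors. It compares the realized false positive rate of a simple parametric limit (mean + 1.645 × SD) with that of a one-sided 95% prediction limit (mean + t × SD × sqrt(1 + 1/n)). The function name and all settings are illustrative assumptions.

```python
import math
import random


def realized_fpr(multiplier, n_donors=20, n_trials=20000, seed=1):
    """Fraction of new drug-naive responses that exceed mean + multiplier * SD."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Simulated cut point experiment: n_donors responses drawn from N(0, 1).
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_donors)]
        mean = sum(sample) / n_donors
        sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n_donors - 1))
        cut_point = mean + multiplier * sd
        # One future negative sample is scored against the estimated cut point.
        if rng.gauss(0.0, 1.0) > cut_point:
            hits += 1
    return hits / n_trials


N = 20
Z_95 = 1.645       # 95th percentile of the standard normal distribution
T_95_DF19 = 1.729  # 95th percentile of Student's t with 19 df (table value)

fpr_simple = realized_fpr(Z_95, n_donors=N)
fpr_prediction = realized_fpr(T_95_DF19 * math.sqrt(1 + 1 / N), n_donors=N)
print(f"simple parametric limit: {fpr_simple:.3f}")
print(f"prediction limit:        {fpr_prediction:.3f}")
```

Under these assumptions the prediction limit should land close to the 5% target, while the simple parametric limit should run above it; the gap narrows as the number of donors grows.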
By comparison, the prediction interval approach takes into consideration the uncertainty associated with estimating the mean and the spread of the population when calculating a cut point. Hence, the prediction interval method is adopted and incorporated into the CAST


Fig. 4. An example of interface of immunoassay cut point analysis.

algorithms; a cut point is thus calculated as a one-sided prediction limit of the assay response of drug-naïve donors. To estimate a cut point that achieves a targeted false positive rate (such as 5%), the method accounts for both the uncertainty of the sample mean and the variability of a future observation, as estimated from the model.
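As an illustration only, the simplest form of a one-sided prediction limit can be sketched as follows. This is not the CAST algorithm: it assumes a single response per donor, normally distributed (for example, log-transformed) data, and no assay factors, so the limit reduces to mean + t(1 - α, n - 1) × SD × sqrt(1 + 1/n). The term 1/n reflects the uncertainty of the sample mean and the term 1 reflects the future observation. The t quantile is computed numerically to keep the example free of third-party dependencies; all function names are hypothetical.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF of Student's t for x >= 0, using Simpson's rule on [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile of Student's t for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def upper_prediction_limit(values, false_positive_rate=0.05):
    """One-sided upper prediction limit for a single future observation."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    t = t_quantile(1 - false_positive_rate, n - 1)
    # sqrt(1 + 1/n): 1 for the future observation, 1/n for the sample mean.
    return mean + t * sd * math.sqrt(1 + 1 / n)


# Hypothetical assay signals from drug-naive donors, analyzed on the log scale.
signals = [102.0, 95.0, 110.0, 99.0, 120.0, 105.0, 98.0, 101.0, 93.0, 108.0]
log_limit = upper_prediction_limit([math.log(s) for s in signals])
print(f"cut point (back-transformed): {math.exp(log_limit):.1f}")
```

Passing `false_positive_rate=0.01` gives the 99% limit. With random effects such as analyst or plate lot in the model, the variance and degrees of freedom would need to come from the fitted variance components instead of a single SD.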

A normal data distribution is a prerequisite for valid cut point estimation with the prediction interval approach. Both residual and donor-level data need to be evaluated for normality. CAST offers a novel statistical algorithm to iteratively identify outliers and an appropriate data transformation. While the algorithm provides the initial route for identifying

Fig. 5. CAST output tables of equivalence evaluation.


Fig. 6. CAST output summary tables of screening ACP and specificity DCP.

and removing outliers, random effect modeling allows assessment of data distribution at the donor level, and identification and removal of outliers to achieve normality. To evaluate the contribution of factors to assay variability, a statistical algorithm is built into the software to iteratively eliminate factors that are not statistically significant. The resulting variance estimates are not only used to calculate prediction limits, but also help the bioanalytical scientists better understand the assay performance, develop a strategy to maintain it, and ultimately improve the assay robustness.

An immunogenicity assay can be implemented across different populations (e.g., healthy vs. diseased populations, testing facilities, and assay reagent lots). It is desirable for an automated solution to be able to assess and compare the cut points of different populations, e.g., different study populations, throughout the development cycle of a biotherapeutic. This is particularly important for assessing the need for a cut point specific to a population or a study, a need that is frequently encountered in immunogenicity studies. The equivalence evaluation of two different populations allows the users to make an informed decision on whether such a specific cut point is warranted, and to implement the cut point quickly to ensure patient safety. CAST offers an equivalence evaluation module to statistically determine whether a shared cut point or two separate cut points should be calculated for testing samples of different populations or at different test sites. The prediction interval algorithm extracts information from variance component analysis and efficiently implements cut point computation.
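To make the idea of variance component analysis concrete, the following is a minimal sketch, assuming a balanced one-way random-effects layout in which donor is the only random factor and each donor is measured the same number of times. The CAST models include further factors (e.g., analyst and plate lot) and are not reproduced here; the function name, data layout, and values are hypothetical. The estimates use the classical ANOVA (method-of-moments) formulas.

```python
import statistics


def variance_components(by_donor):
    """ANOVA estimates of donor and residual variance (balanced design)."""
    groups = list(by_donor.values())
    d = len(groups)     # number of donors
    r = len(groups[0])  # replicates per donor
    if d < 2 or r < 2 or any(len(g) != r for g in groups):
        raise ValueError("need a balanced design with >= 2 donors and replicates")

    grand_mean = statistics.fmean([v for g in groups for v in g])
    donor_means = [statistics.fmean(g) for g in groups]

    # Mean square between donors and mean square within donors.
    ms_between = r * sum((m - grand_mean) ** 2 for m in donor_means) / (d - 1)
    ms_within = sum(
        (v - m) ** 2 for g, m in zip(groups, donor_means) for v in g
    ) / (d * (r - 1))

    # E[MS_between] = residual + r * donor; negative estimates are set to zero.
    donor_var = max(0.0, (ms_between - ms_within) / r)
    return {"donor": donor_var, "residual": ms_within}


# Hypothetical log-scale responses: three donors, two runs each.
example = {"D1": [1.0, 3.0], "D2": [5.0, 7.0], "D3": [9.0, 11.0]}
components = variance_components(example)
total = components["donor"] + components["residual"]
for name, value in components.items():
    print(f"{name}: {value:.2f} ({100 * value / total:.0f}% of total)")
```

Expressing each component as a share of the total shows which factor dominates assay variability. A factor whose estimate is near zero would be a candidate for removal from the model, although this sketch applies no significance test.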

The CAST statistical algorithms have been shown to produce results comparable to manual analysis conducted by expert statisticians. In addition, they are sufficiently flexible to handle over 80% of in-house experimental data. CAST offers an automated data analysis solution that allows users with limited statistical training to perform cut point analysis from their own computer terminals and to generate cut point results within minutes at the click of a button. This is an important feature because cut points can be impacted by the underlying biology of a population or the biotherapeutic in a development program. The tool allows the user to closely monitor potential study- and population-specific cut points that are needed to ensure detection of clinically relevant ADA when the ADA assay is applied to various study populations. Potential site-specific cut points can also be readily assessed using CAST when an ADA assay is performed in various testing laboratories. A valid, relevant cut point is critical for appropriately assessing the immune response of a subject, which ultimately aids in evaluating the impact of the presence of ADA on the pharmacokinetics, pharmacodynamics, efficacy, and safety of a biotherapeutic. The immunogenicity assay validation package has thus become an integral component of biologic licensure. To ensure the consistent statistical analyses that are required to support assay validation, each CAST application, including cut point estimation, is developed and validated according to good software engineering practice and computer-related system validation principles (US Food and Drug Administration, 2002; Jones, 2010). Finally, CAST offers the functionality to store and organize the analysis details for each analysis submission in a secured network, allowing review of the

Fig. 7. CAST output summary tables of variance component analysis.


Fig. 8. CAST output QQ plots for residuals of the transformed data. a. The graph on the left represents the QQ plot for the log-transformed screening data. b. The graph on the right represents the QQ plot for the log-transformed inhibitory depletion data.

statistical results by statisticians and reconstruction of the analysis when necessary.

In summary, a comprehensive web-based statistical analysis tool has been developed to calculate assay cut points for immunogenicity evaluation using the prediction interval approach, which incorporates multiple assay factors into cut point computation with a range of simple to complex experimental designs. Additionally, through variance component analysis, the tool provides statistical data for the bioanalytical scientists to assess the impact of an array of assay factors on the assay performance, and ultimately improve the assay robustness. The tool automates and integrates all necessary data analysis steps. It produces standardized outputs ready to be integrated into the assay validation data packages, which have become a critical component of biological license applications. CAST has been validated to ensure quick and reliable statistical analysis. Many other statistical analysis applications have been developed, encompassing assay sensitivity calculation, assay curve fitting, assay acceptance criteria determination, concordance, and robustness analysis for assay transfer. This tool is being routinely utilized for statistical analysis in support of assay development and validation on various bioanalytical platforms for immunogenicity studies.

Acknowledgment

This work was funded by Amgen Inc. The authors thank Silvia Vegas (Chiltern International, Cary, NC) for her contribution to system development.

References

Barger, T.E., Zhou, L., Hale, M., Moxness, M., Swanson, S.J., Chirmule, N., 2010. Comparing exponentially weighted moving average and run rules in process control of semiquantitative immunogenicity immunoassays. AAPS J. 12 (1), 79.

Buttel, I.C., Chamberlain, P., Chowers, Y., Ehmann, F., Greinacher, A., Jefferis, R., et al., 2011. Taking immunogenicity assessment of therapeutic proteins to the next level. Biologicals 39 (2), 100.

Chen, X.C., Zhou, L., Gupta, S., Civoli, F., 2012. Implementation of design of experiments (DOE) in the development and validation of a cell-based bioassay for the detection of anti-drug neutralizing antibodies in human serum. J. Immunol. Methods 376 (1–2), 32.

Chen, X., Hickling, T., Kraynov, E., Kuang, B., Parng, C., Vicini, P., 2013. A mathematical model of the effect of immunogenicity on therapeutic protein pharmacokinetics. AAPS J. 15 (4), 1141.

Chirmule, N., Jawa, V., Meibohm, B., 2012. Immunogenicity to therapeutic proteins: impact on PK/PD and efficacy. AAPS J. 14 (2), 296.

Committee for Medicinal Product for Human Use, 2007. Guideline on immunogenicity assessment of biotechnology-derived therapeutic proteins. Doc. Ref. EMEA/CHMP/BMWP/14327/2006 (London, 2007).

Gupta, S., Indelicato, S.R., Jethwa, V., Kawabata, T., Kelley, M., Mire-Sluis, A.R., et al., 2007. Recommendations for the design, optimization, and qualification of cell-based assays used for the detection of neutralizing antibody responses elicited to biological therapeutics. J. Immunol. Methods 321 (1–2), 1.

Hammond, O., Reynolds, J., Rubinstein, L.J., Sikkema, D., Marchese, R.D., 2008. Complexities of clinical assay development and optimization prior to first-in-man immunization trials: a description of immunogenicity assay development for the testing of samples from a phase 1 Alzheimer's vaccine trial. J. Immunoass. Immunochem. 29 (4), 332.

Hoffman, D., Berger, M., 2011. Statistical considerations for calculation of immunogenicity screening assay cut points. J. Immunol. Methods 373 (1–2), 200.

Jaki, T., Lawo, J.P., Wolfsegger, M.J., Singer, J., Allacher, P., Horling, F., 2011. A formal comparison of different methods for establishing cut points to distinguish positive and negative samples in immunoassays. J. Pharm. Biomed. Anal. 55 (5), 1148.

Jones, C., 2010. Software Engineering Best Practices: Lessons from Successful Projects in Top Companies. The McGraw-Hill Companies (660 pp.).

Kelley, M., Ahene, A.B., Gorovits, B., Kamerud, J., King, L.E., McIntosh, T., et al., 2013. Theoretical considerations and practical approaches to address the effect of anti-drug antibody (ADA) on quantification of biotherapeutics in circulation. AAPS J. 15 (3), 646.

Mire-Sluis, A.R., Barrett, Y.C., Devanarayan, V., Koren, E., Liu, H., Maia, M., et al., 2004. Recommendations for the design and optimization of immunoassays used in the detection of host antibodies against biotechnology products. J. Immunol. Methods 289 (1–2), 1.

Oehlert, G.W., 1992. A note on the delta method. Am. Stat. 46 (1), 27.

Ponce, R., Abad, L., Amaravadi, L., Gelzleichter, T., Gore, E., Green, J., et al., 2009. Immunogenicity of biologically-derived therapeutics: assessment and interpretation of nonclinical safety studies. Regul. Toxicol. Pharmacol. 54 (2), 164.

Sailstad, J.M., Amaravadi, L., Clements-Egan, A., Gorovits, B., Myler, H.A., Pillutla, R.C., et al., 2014. A white paper: consensus and recommendations of a global harmonization team on assessing the impact of immunogenicity on pharmacokinetic measurements. AAPS J. 16 (3), 488.

Satterthwaite, F.E., 1946. An approximate distribution of estimate of variance components. Biom. Bull. 2, 110.

Sauerborn, M., 2013. Is the tiered immunogenicity testing of biologics the adequate approach in preclinical development? Bioanalysis 5 (7), 743.

Schlain, B., Amaravadi, L., Donley, J., Wickramasekera, A., Bennett, D., Subramanyam, M., 2010. A novel gamma-fitting statistical method for anti-drug antibody assays to establish assay cut points for data with non-normal distribution. J. Immunol. Methods 352 (1–2), 161.

Shankar, G., Devanarayan, V., Amaravadi, L., Barrett, Y.C., Bowsher, R., Finco-Kent, D., et al., 2008. Recommendations for the validation of immunoassays used for detection of host antibodies against biotechnology products. J. Pharm. Biomed. Anal. 48 (5), 1267.

Shankar, G., Arkin, S., Cocea, L., Devanarayan, V., Kirshner, S., Kromminga, A., et al., 2014. Assessment and reporting of the clinical immunogenicity of therapeutic proteins and peptides: harmonized terminology and tactical recommendations. AAPS J. 16 (4), 658.

Tatarewicz, S., Moxness, M., Weeraratne, D., Zhou, L., Hale, M., Swanson, S.J., et al., 2009. A step-wise approach for transfer of immunogenicity assays during clinical drug development. AAPS J. 11 (3), 526.

Thway, T.M., Magana, I., Bautista, A., Jawa, V., Gu, W., Ma, M., 2013. Impact of anti-drug antibodies in preclinical pharmacokinetic assessment. AAPS J. 15 (3), 856.

US Food and Drug Administration, 2002. General Principles of Software Validation; Final Guidance for Industry and FDA Staff.

US Food and Drug Administration, 2009. Guidance for industry: Assay development for immunogenicity testing of therapeutic proteins (draft guidance). Published as a Notice in the Federal Register: 74 Fed. Reg. 63758. December 4, 2009 (Docket No. FDA-2009-D-0539).

US Food and Drug Administration, 2013. Guidance for Industry: Immunogenicity Assessment for Therapeutic Protein Products.

Zhang, L., Zhang, J.J., Kubiak, R.J., Yang, H., 2013. Statistical methods and tool for cut point analysis in immunogenicity assays. J. Immunol. Methods 389 (1–2), 79.

Zhong, Z.D., Dinnogen, S., Hokom, M., Ray, C., Weinreich, D., Swanson, S.J., et al., 2010. Identification and inhibition of drug target interference in immunogenicity assays. J. Immunol. Methods 355 (1–2), 21.
