Estimating analytical variability in two-dimensional data


Analytical Biochemistry 513 (2016) 36-38

Journal homepage: www.elsevier.com/locate/yabio

Ivan L. Budyak a, Kristi L. Griffiths b, William F. Weiss IV a,*

a Biopharmaceutical Research and Development, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, IN 46285, USA
b Global Statistical Sciences, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, IN 46285, USA

Article info

Article history: Received 27 May 2016; received in revised form 19 August 2016; accepted 22 August 2016; available online 24 August 2016

Keywords: Higher order structure; Spectrum; Interval; Noise reduction

Abstract

Throughout the course of drug development there are many instances in which a variability assessment within a set of analytical data is required, which may be challenging for techniques that produce two-dimensional data. This note describes an interval-based approach to variability assessment and demonstrates its applicability for analysis of near-UV circular dichroism (CD) spectra. The approach is generalizable and could be applied to two-dimensional data from other analytical techniques as well. © 2016 Elsevier Inc. All rights reserved.

The need for assessment of variability often arises in conjunction with routine analytical tasks, e.g. to track instrument performance or assess method robustness. While an accurate estimate of variability can be challenging for higher order structure characterization techniques that generate one-dimensional data as an output [1], the problem becomes even more complex for techniques that generate two-dimensional data, e.g. far-/near-UV circular dichroism (CD), intrinsic fluorescence, Fourier transform infrared spectroscopy [2], and differential scanning calorimetry [3], as these data often contain regions of different signal intensity and noise level as well as both broad and narrow bands. For one-dimensional data with normally distributed values a tolerance interval (TI) can be employed to estimate the region of expected variability. The upper and lower bounds of a two-sided TI are calculated as

$$\mathrm{TI} = \bar{y} \pm k s, \tag{1}$$

where $\bar{y}$ is the sample mean; k is the tolerance factor, which depends on the confidence level, the proportion of the population to be covered, and the sample size; and s is the sample standard deviation. A TI covering, with 95% confidence, 99% of the sampled population (95/99 TI) is commonly used [4]. A practical recommendation for the minimum size N of the data set used to determine a TI is N ≥ 6, which aligns with the ICH Q2(R1) recommendation for evaluating precision [5].

Abbreviations: CD, circular dichroism; TI, tolerance interval; VI, variability interval.
* Corresponding author: W.F. Weiss.
http://dx.doi.org/10.1016/j.ab.2016.08.021
0003-2697/© 2016 Elsevier Inc. All rights reserved.

When the 95/99 TI approach is applied to two-dimensional data, several challenges emerge. Consider a set of N two-dimensional data sets, each consisting of n ordered (x, y) points, e.g. seven near-UV CD spectra of an IgG4 (Fig. 1A). If the abscissa (x) has negligible or no variability, the problem reduces to estimating the variability of the ordinate (y). This assumption is expected to hold for analytical data where the abscissa is tightly controlled, such as spectra and thermograms. The standard deviation and, consequently, the 95/99 TI calculated according to Eq. (1) become functions of wavelength (Fig. 1A, also inset), and the resulting 95/99 TI profile is not smooth. This phenomenon has recently been observed for 95/99 TI-based variability estimates applied to CD data [6].

Ensemble biophysical techniques yield bands resulting from transitions between energy levels. The bands are not infinitely sharp due to broadening. Band shapes are approximated by Gaussian or Lorentzian functions [7], and the whole spectrum by a linear combination thereof. As Gaussian and Lorentzian functions are smooth, the theoretical spectrum must be smooth. Furthermore, the standard deviation, a normalized square root of a linear combination of multiples of N smooth functions, must also be smooth. Therefore, it is limited experimental sampling that leads to (i) non-smoothness of the underlying mean and (ii) non-smoothness of the standard deviation. The latter is amplified in the 95/99 TI through multiplication by the tolerance factor k (see Eq. (1)) and becomes the major contributor to the overall non-smoothness of the 95/99 TI profile.

Fig. 1. (A) Near-UV CD spectra of an IgG4 monoclonal antibody, seven independent preparations of the same batch (solid lines) with a 95/99 TI (dotted lines). The inset above the graph shows the half-width of the 95/99 TI. Each spectrum is the mean residue ellipticity (MRE) calculated from the average of three protein sample scans minus the average of three buffer blank scans. All data were collected by one operator during one continuous experimental session using an Aviv 62NT instrument (Aviv Biomedical, Lakewood, NJ) at ambient temperature. (B) The effect of a two-sided moving average filter applied to the half-width of the 95/99 TI; the numbers on the left denote the window size (w). (C) The first-order autocorrelation $\rho_1$ for the residuals as a function of window size w; the point corresponding to w = 11 is shown as an open circle. (D) The autocorrelation $\rho$ for the residuals at w = 11 as a function of correlation order h (see Eq. (4)); the point corresponding to $\rho_1$ is shown as an open circle. (E) Same data as in (A) with a VI at w = 11 (dotted lines). The inset above the graph shows the half-width of the VI.
Axes: MRE [deg*cm2/dmol] versus wavelength [nm] for the spectra; $\rho_1$ versus window w in (C); $\rho$ versus h in (D).

One way to better approximate the variability is to increase the size of the data set N. However, N may be limited by the throughput of the technique and/or analyst or by the size of the historical data set. If the number of measurements is sufficient (N ≥ 6, see above), the variability can be estimated by filtering out the noise in the variance that results from limited sampling. A moving average filter is a simple and robust approach commonly employed for noise reduction [8]. Two-sided averaging applied to the variance yields an average variance at point j, $\hat{s}_j^2$:

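$$
\hat{s}_j^2 =
\begin{cases}
\dfrac{1}{w} \sum\limits_{i=j-\frac{w-1}{2}}^{j+\frac{w-1}{2}} s_i^2, & \dfrac{w-1}{2} < j \le n - \dfrac{w-1}{2} \\[2ex]
\hat{s}_{j+1}^2, & 1 \le j \le \dfrac{w-1}{2} \\[2ex]
\hat{s}_{j-1}^2, & n - \dfrac{w-1}{2} < j \le n
\end{cases}
\tag{2}
$$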

where n is the total number of points, w is the window size, and $s_i^2$ is the variance at point i. Two potential limitations of a moving average filter should be considered: (i) equal weighting of the central and peripheral points, and (ii) the need to treat the (w − 1)/2 end points on each side separately, assigning them the value of the first or the last window that lies fully within the data range. If the noise is approximately constant across w points, smoothing with a moving average is a reasonable approach. Alternatively, one can consider increasing resolution and/or extending the data collection range.

A smoothed standard deviation $\hat{s}_j$, calculated as the square root of $\hat{s}_j^2$ at each point j, can be used to form an interval much like a 95/99 TI. The resulting interval will be referred to as a 'variability interval' (VI) in order to distinguish it from a traditional TI. The VI bounds are defined by substituting the smoothed standard deviation $\hat{s}$ for s in Eq. (1); note that the mean is not subjected to the smoothing procedure in order to avoid information loss. The effect of different averaging window sizes applied to the corresponding variance is demonstrated in Fig. 1B for the upper VI bound. Large smoothing windows distort the intrinsic features of the function, leading to oversmoothing (Fig. 1B, w = 59), while small windows do not fully eliminate the noise (undersmoothing; Fig. 1B, w = 5). The proper window should filter out only random noise while preserving the true variability. This window can be determined based on the randomness of the residuals between the VI and the 95/99 TI. For the corresponding upper bounds the residuals r are

$$r = \mathrm{TI} - \mathrm{VI} = (\bar{y} + k s) - (\bar{y} + k \hat{s}) = k (s - \hat{s}) \tag{3}$$
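The steps in Eqs. (1)-(3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the replicate curves are assumed to share one abscissa grid, the window size w is assumed to be odd, and the tolerance factor k is assumed to be supplied by the caller (for example from published tolerance-factor tables). Plain lists and the standard library are used only to keep the sketch self-contained.

```python
from math import sqrt
from statistics import fmean, variance


def pointwise_mean_and_variance(spectra):
    """Mean and sample variance of the ordinate at each abscissa point.

    spectra: list of N replicate curves, each a list of n ordinate values
    sampled on a common abscissa grid (an assumption of this sketch).
    """
    columns = list(zip(*spectra))
    means = [fmean(col) for col in columns]
    variances = [variance(col) for col in columns]  # N - 1 denominator
    return means, variances


def smooth_variance(variances, w):
    """Two-sided moving average of the variance, following Eq. (2).

    The (w - 1) / 2 points at each end take the value of the first or the
    last window that lies fully inside the data range.
    """
    n = len(variances)
    if w % 2 == 0 or not 1 <= w <= n:
        raise ValueError("w must be odd and satisfy 1 <= w <= n")
    half = (w - 1) // 2
    smoothed = [0.0] * n
    # Interior points: full window centred on j (0-based indexing).
    for j in range(half, n - half):
        smoothed[j] = sum(variances[j - half:j + half + 1]) / w
    # End points: copy the nearest fully supported window.
    for j in range(half):
        smoothed[j] = smoothed[half]
    for j in range(n - half, n):
        smoothed[j] = smoothed[n - half - 1]
    return smoothed


def interval_bounds(means, variances, k):
    """Lower and upper bounds mean -/+ k * sd at each point (Eq. (1)).

    Passing raw variances gives the pointwise TI; passing smoothed
    variances gives the VI. k is caller-supplied, not computed here.
    """
    sds = [sqrt(v) for v in variances]
    lower = [m - k * s for m, s in zip(means, sds)]
    upper = [m + k * s for m, s in zip(means, sds)]
    return lower, upper


def upper_bound_residuals(variances, smoothed, k):
    """Residuals r_j = k * (s_j - s_hat_j) between TI and VI upper bounds (Eq. (3))."""
    return [k * (sqrt(v) - sqrt(vs)) for v, vs in zip(variances, smoothed)]
```

In this sketch the mean is never smoothed, in line with the text: only the variance passes through `smooth_variance`, and the same `interval_bounds` call then produces either the TI or the VI depending on which variance profile it receives.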

The same condition holds for the lower bounds, for which the residuals are equal to $k(\hat{s} - s)$. The randomness of the residuals can be tested using the first-order autocorrelation coefficient $\rho_1$ [9], calculated as

$$\rho_h = \frac{n-1}{n-h-1} \cdot \frac{\sum_{j=h+1}^{n} r_{j-h}\, r_j}{\sum_{j=1}^{n} r_j^2}, \tag{4}$$

where n is the total number of points, h is the correlation order, and $r_j$ is the residual value at point j. The residuals are considered random if $\rho_1$ is zero. For the near-UV CD data set in Fig. 1A, calculating $\rho_1$ as a function of window size w yields the profile shown in Fig. 1C. Finding the smallest w that yields $\rho_1$ reasonably close to zero gives w = 11 here (open circle in Fig. 1C and D). Once the window is chosen, it is important to test whether smoothing introduces significant higher-order autocorrelations (at h > 1) in the residuals, which can be visualized in a $\rho(h)$ plot (Fig. 1D). For w = 11 the absolute value of $\rho$ does not exceed 0.25 at any h between 1 and 20, suggesting no significant higher-order autocorrelations. Therefore, a VI generated by applying a sliding window averaging of w = 11 to the 95/99 TI approximates the observed variability while filtering out the random noise in the variance resulting from limited sampling. The near-UV CD data set with the overlaid VI is shown in Fig. 1E. Constant variance across seven end points resulting from smoothing is consistent with the observed and expected variabilities over the corresponding wavelength ranges (compare Fig. 1A and E).

A universal variability interval approach is proposed to estimate analytical variability in two-dimensional data sets. The approach was successfully applied to near-UV CD data. It is concluded to be simple, robust, and applicable to biophysical characterization data with varying resolutions, noise levels, and spectral features. A VI may potentially be employed for comparison of multiple data sets. For instance, groups of system suitability data may need to be compared in order to track instrument performance and/or assess method robustness. Evaluation of the impact of process change(s) is another common analytical task requiring comparison of two or more groups, each consisting of multiple batches.
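The window-selection step built on Eq. (4) can be illustrated with a short Python sketch. The function names, the mapping from window size to residuals, and in particular the numerical `tolerance` are assumptions made for this illustration: the text only asks for $\rho_1$ to be "reasonably close to zero" and does not state a cutoff.

```python
def autocorrelation(residuals, h=1):
    """Lag-h autocorrelation of the residuals, following Eq. (4)."""
    n = len(residuals)
    if not 1 <= h < n - 1:
        raise ValueError("h must satisfy 1 <= h < n - 1")
    denominator = sum(r * r for r in residuals)
    if denominator == 0.0:
        # Happens, for example, at w = 1, where smoothing changes nothing.
        raise ValueError("all residuals are zero; autocorrelation is undefined")
    # Sum of r_{j-h} * r_j for j = h+1..n (1-based), written with 0-based indices.
    numerator = sum(residuals[j - h] * residuals[j] for j in range(h, n))
    return (n - 1) / (n - h - 1) * numerator / denominator


def select_window(residuals_by_window, tolerance=0.05):
    """Smallest window whose |rho_1| is within `tolerance` of zero, else None.

    residuals_by_window: dict mapping window size w to the residual list
    obtained with that window. The default tolerance is an illustrative
    assumption, not a value taken from the text.
    """
    for w in sorted(residuals_by_window):
        if abs(autocorrelation(residuals_by_window[w], 1)) <= tolerance:
            return w
    return None
```

Once a window is chosen, the higher-order check described above amounts to evaluating `autocorrelation(residuals, h)` for each h of interest (1 to 20 in the example of Fig. 1D) and inspecting the largest absolute value.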

Acknowledgements

The authors thank Michael R. De Felippis, Bryan J. Harmon, and Jeffrey D. Hofer for insightful discussions and critical reading of the manuscript.

References

[1] A. Pekar, M. Sukumar, Quantitation of aggregates in therapeutic proteins using sedimentation velocity analytical ultracentrifugation: practical considerations that affect precision and accuracy, Anal. Biochem. 367 (2007) 225-237.
[2] Y. Jiang, C. Li, X. Nguyen, S. Muzammil, E. Towers, J. Gabrielson, L. Narhi, Qualification of FTIR spectroscopic method for protein secondary structural analysis, J. Pharm. Sci. 100 (2011) 4631-4641.
[3] J. Wen, K. Arthur, L. Chemmalil, S. Muzammil, J. Gabrielson, Y. Jiang, Applications of differential scanning calorimetry for thermal stability analysis of proteins: qualification of DSC, J. Pharm. Sci. 101 (2012) 955-964.
[4] E. Rozet, S. Rudaz, R.D. Marini, E. Ziemons, B. Boulanger, P. Hubert, Models to estimate overall analytical measurements uncertainty: assumptions, comparisons and applications, Anal. Chim. Acta 702 (2011) 160-171.
[5] Validation of Analytical Procedures: Text and Methodology, ICH Harmonised Tripartite Guideline, Q2(R1), 2005.
[6] J.C. Lin, Z.K. Glover, A. Sreedhara, Assessing the utility of circular dichroism and FTIR spectroscopy in monoclonal-antibody comparability studies, J. Pharm. Sci. 104 (2015) 4459-4466.
[7] J.M. Hollas, Modern Spectroscopy, fourth ed., John Wiley & Sons, 2004.
[8] S.W. Smith, The Scientist and Engineer's Guide to Digital Signal Processing, 1st ed., Newnes, 2002.
[9] G. Vivo-Truyols, P.J. Schoenmakers, Automatic selection of optimal Savitzky-Golay smoothing, Anal. Chem. 78 (2006) 4598-4608.