Chapter 7
Essentials of laboratory quality management
7.1 Laboratory quality management
Quality can be defined as the conformity of a service provider with the requirements of users or customers, or in other words, the satisfaction of users' or customers' needs and expectations. In standard of care, while patients and the parties who pay the bills are the ultimate customers, health-care professionals (HCP) are the immediate users of laboratory services. In drug development, while patients and HCP can be customers, drug developers are the ultimate beneficiaries. The principles of total quality management (TQM) or the quality management system (QMS) have become the foundation by which good clinical laboratories are managed and operated [1]. Fig. 7.1, adapted from Ref. [2], depicts the following five components of QMS:
1. Quality planning starts as early as conceptualizing and scoping a test for a specific need, simply by laying out measures that will allow the test to meet users' expectations.
2. Quality laboratory process includes the analytical processes (of which assay validation or verification is a key part) and the general policies, guidelines, and procedures that define how the work will be done. This means that assay validity is just one component of many within TQM; without a fully integrated QMS, the validity of an assay cannot guarantee acceptable performance in operation.
3. Quality control (QC), as detailed later, includes different actions and applications to assure and/or monitor quality.
4. Quality assessment is concerned with measures broader than QC, mainly administrative tools to monitor laboratory performance, such as documentation of the facility's organizational structure, personnel qualifications and competence, recording of the test menu and its utility, tools for patient and specimen identification, definition of turnaround times to meet client expectations, tracking of laboratory error rates, assessment of customer satisfaction, and maintenance of document records. From this perspective, since a retrospective measurement of laboratory performance does not by itself either detect problems in time or improve quality, quality assessment is the proper term for what is traditionally called quality assurance (QA) [1]. In fact, as shown later, QA can be applied to some parameters of what is traditionally called QC.
5. Quality improvement denotes the problem-solving process of identifying the root cause of a problem, finding a remedy, and ensuring a satisfactory outcome. The new cycle of quality planning starts with proactive implementation of the remedy into daily practice to prevent similar errors from happening in the future. Quality improvement and quality planning for a specific issue can together be described as a corrective action–preventive action (CAPA) process.
FIGURE 7.1 The principles of total quality management (TQM) or quality management system (QMS): quality planning (QP), quality laboratory process (QLP), quality control (QC), quality assessment (QA), and quality improvement (QI).
7.2 Quality control
QC will be addressed from two aspects: quality measures and quality indicators. From a compliance perspective, laboratories are responsible for all quality-related aspects, but in the drug industry it is in the best interest of a pharmaceutical sponsor to ensure that its laboratory providers employ the right analytical and quality tools, carry the right certifications and accreditations, and comply with all applicable regulations and guidelines.
7.3 Quality measures
While QA (or assurance of quality) can be the right term, to avoid confusion with the traditional QA terminology we will call these tools quality measures: the tools that a laboratory should proactively employ to avoid errors and to report reliable results. While the bulk of these tools are used in the analytical phase, as detailed later, the tools should also cover the other two phases of laboratory work, preanalytical and postanalytical, as follows:
7.3.1 Preanalytical quality tools
Fig. 7.2 depicts the steps involved in the three phases of laboratory testing: preanalytical, analytical, and postanalytical. For the preanalytical phase, it is good laboratory practice to assume that this phase is mostly handled by busy persons who have very little, if any, laboratory experience and no understanding of the impact of preanalytical variables on laboratory results and, hence, on patient health or trial outcome. Preanalytical quality tools include the following:
• Designing a simple, easy-to-read and easy-to-interpret laboratory order (requisition form) with enough fields to capture all needed demographics, dates, and tests to be ordered.
• Designing a laboratory manual with the type and amount of samples needed and all necessary instructions on sample acquisition, preparation (for example, centrifugation of blood; freezing or fixation and embedding of tissues), storage, and shipping/transportation, plus the names and contact information of all responsible parties, including emergency contacts. It is advisable to include easy-to-follow diagrams and flow charts and to highlight or box critical precautions.
• Establishment of a comprehensive system for sample tracking and reconciliation in real time.
• Proper and sufficient training of the personnel at the clinical sites who will be handling this phase, and ensuring that the sites have the proper equipment and space.
7.3.2 Postanalytical quality tools
This phase spans the steps following the approval of an analytical run by a laboratory technologist, supervisor, or director, which triggers data interfacing from the analytical instrument to the laboratory information management system (LIMS) on automated devices, or manual data entry for techniques such as immunohistochemistry (IHC). Postanalytical quality tools include the following:
• Verification of the assumptions, algorithms, and/or equations used to translate assay readouts into reportable results, considering the mathematical factors that account for any sample processing, extraction, and/or dilution.
• Designing a laboratory report or study data transfer that captures all necessary entries, including demographics, type and quantity of samples, results and result units, and flags for abnormal results if applicable.
FIGURE 7.2 Laboratory test life cycle, from requisition to reporting: clinician or protocol, requisition, sample acquisition, sample preparation, sample processing, transport and storage, sample analysis, data processing, and reporting/data transfer.
• For automated systems (a minimal flagging sketch is given after this list):
  – Establishment of alert messages for samples with results above the assay upper limit of quantification (ULOQ) or below the lower limit of quantification (LLOQ), and validation of the alerts by running experimental samples with expected high and low values.
  – Validation of the integrity of the analytical instrument–LIMS link by performing mock data transfers from experimental sample analyses.
• For manual analysis (or manual scoring), for example, IHC or in situ hybridization (ISH):
  – Preparing and printing forms for manual data entry/recording, with fields to document sample ID, date of staining, date of scoring, bench analyst name, pathologist name and signature, and results.
  – Establishment of SOPs/instructions for transcribing handwritten data to an electronic file, with double-checking by another qualified person.
  – Ensuring laboratory quality assessment (commonly known as quality assurance or QA) review before data transfer to the Clinical Research Organization (CRO) and/or sponsor.
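To illustrate the ULOQ/LLOQ alerting referenced in the list above, here is a minimal sketch in Python; the limits, flag wording, and function name are illustrative assumptions rather than part of any particular LIMS.

```python
# Minimal sketch: flag results that fall outside the assay measuring range.
# Limits and flag labels are illustrative assumptions, not vendor-defined.

def flag_result(value, lloq=2.0, uloq=750.0):
    """Return a flag for a reportable result relative to the assay AMR."""
    if value < lloq:
        return "BELOW LLOQ: report as < LLOQ or repeat per SOP"
    if value > uloq:
        return "ABOVE ULOQ: dilute and repeat, or report as > ULOQ per SOP"
    return "WITHIN AMR"

# Validation-style check with expected low and high experimental samples
for conc in (1.0, 25.0, 500.0, 900.0):
    print(conc, "->", flag_result(conc))
```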
7.3.3 Analytical phase quality control tools
While the majority of preanalytical and postanalytical issues are caused by human error, analytical issues are caused mainly by device errors, especially in automated analyses, including tissue autostaining. As noted earlier, analytical QC tools include proactive measures taken to eliminate or control analytical variables (in other words, to ensure quality) and other measures to monitor quality. QC tools for this phase include the following:
7.3.3.1 Proper selection of analytical methods
As discussed under assay development and validation, the right methodology on the right platform should be developed using the right reagents and adequately validated for its intended objective.
7.3.3.2 Assurance of accessory equipment performance and water quality
Per Clinical Laboratory Improvement Amendments (CLIA) regulations [3], the laboratory must establish and follow procedures for performing maintenance and function checks on each piece of equipment/instrument it uses, including those that are peripherally involved in patient testing, for example, incubators, centrifuges, safety cabinets, autoclaves, and microscopes. A laboratory has to periodically calibrate supplemental equipment and accessories, for example, automatic pipettes, centrifuges, and electronic balances. The laboratory also has to monitor ambient temperature and humidity as well as refrigerator and freezer temperatures. Water quality is classified by several different organizations into different reagent grades depending on physical, chemical, and microbial characteristics. CLIA requires each laboratory to use the appropriate water quality as required for each instrument, kit, or test system. Laboratories should assure water quality and consider parameters such as pH, silicate content, particulate matter, and bacterial and organic content in assessing water quality [3].
7.3.3.3 Proper verification of reagent lots
Reagent lot numbers and expiration dates, supported by the certificate of analysis whenever applicable, should be properly documented and monitored. Different lots of a reagent may behave differently in the measurement process and lead to differences in patient results. One example is that the antibodies, especially those of animal origin, used to formulate different lots of immunoassay reagents may have different binding characteristics [4]. The FDA bioanalytical guidance [5] indicates lot-to-lot checking for variability and comparability for any critical reagents. No definition of critical reagents was provided, but any reagent that contributes to the reaction should be considered critical. Immunoassays are particularly prone to lot-to-lot differences since antibodies, like any biologicals, are more liable to variability during production than small molecules. If not well investigated, a change of reagent lots can impose significant deviations in quantitative and qualitative biomarker results [4,6–8]. Each laboratory has to prepare or procure a new lot of reagents before the expiration or exhaustion of the current lot to allow a lot-to-lot verification.
7.3.3.3.1 Examples of current practice for new lot verification
Clinical laboratory practices for lot-to-lot evaluation vary widely, ranging from testing as few as 3–4 samples to as many as 20–40 samples with each new reagent lot. Patient samples, QC material supplied by the reagent vendors, third-party QC material, and in-house QC material can be used [8]. Statistical approaches also vary between laboratories [4,8,9]. Relying on QC materials for lot-to-lot verification has been discouraged by some researchers due to possible noncommutability between QC materials and matrix-based patient samples [10]. While we do not see noncommutability as a valid reason, having patient samples represented is good practice.
The CLSI EP26-A guideline [11] is intended to provide laboratories with a practical protocol for verifying the consistency of the analytical performance of a test between consecutive reagent lot changes. According to EP26-A, the number of samples required for testing at each target analyte concentration, defined by medical decision point(s), is determined considering the laboratory's within-reagent-lot imprecision and the intrarun imprecision (repeatability) of the test. The new reagent lot is deemed acceptable if the mean difference per target concentration is less than a predefined rejection limit (the critical difference). A medical institution's internal policy employs 20 samples with the acceptance criteria of a slope between 0.90 and 1.10, an intercept <12.5 ng/mL (<50% of the lowest reportable value), R² > 0.95, and <10% mean difference between reagent lots [6]. However, it was found that applying these criteria did not enable the detection of drift due to lot changes in insulin-like growth factor 1 (IGF-1) assay reagents [4,8].
The institutional lot verification protocol with the same acceptance criteria, which had failed to detect drift in IGF-1 [8], was applied at the same institution against CLSI EP26-A for the prospective evaluation of two reagent lots each of thyroid-stimulating hormone (TSH), thyroglobulin (Tg), thyroxine (T4), triiodothyronine (T3), free T3 (fT3), and thyroid peroxidase antibody (TPO Ab). While the number of samples used per the institutional protocol was consistent at 20, the EP26-A guideline required 23 for TSH, 17 for Tg, 33 for T4, 31 for T3, 48 for fT3, and 1 for TPO Ab. Using the final pass/fail decisions, 9 out of the 12 paired verifications (75%) were in agreement between the institutional protocol and EP26-A. The discordant evaluations involved single lots of TSH, Tg, and TPO Ab. One lot each of TSH and Tg was deemed acceptable by the institutional protocol and unacceptable by EP26-A, but the TPO Ab outcome was the opposite. The TPO reagent lot was shown by the institution to have a calibration bias at the upper end of the assay measurement range (AMR). Although the institutional protocol was not robust enough to predict new-lot variability in the earlier evaluation [6], the outcome of the comparative study led the laboratory to decide not to adopt EP26-A. Challenges behind this decision included the amount of effort needed to set up the study per the CLSI guideline, the inability to use the instructions to identify sample number requirements in some cases, and the number of samples needed. The authors also listed the shortcoming that EP26-A was not designed to monitor long-term trends in lot-to-lot performance due to the lack of regression analysis [12].
7.3.3.3.2 Additional adjudication of the guideline and the institutional protocol
It seems unrealistic to use just one sample to verify a new lot of reagents, as determined by the CLSI guideline for the TPO example above. The institutional protocol, on the other hand, did not seem good enough to detect even a significant lot-to-lot shift. Fig. 7.3A and the average percentage of difference in Table 7.1 depict a lot-to-lot verification scenario for an assay with an AMR of 2–750 mg/dL and a lower limit of the reference range (lower reportable limit) of 20 mg/dL [13], that is, mimicking some blood glucose assays. Applying the institution's protocol and acceptance criteria listed earlier, this new lot is acceptable because the slope is 0.9032 (between 0.9 and 1.1), the intercept is 2.2915 (<50% of the lower reportable limit), R² is 0.9946 (>0.95), and the average of the percent differences is −9.5% (<10%). However, the difference plot (Fig. 7.3A′) and Table 7.1 clearly demonstrate a consistent negative bias, with all points shifted between −0.8% and −18.8%, which should make the new lot unacceptable. Fig. 7.3B and B′ show another scenario where all the institutional acceptance criteria were met, with even an excellent (0.0%) average percentage shift, but, as demonstrated by the difference plot and Table 7.1, the percent differences are widely scattered between −23.8% and +20.9%. This situation is even more problematic than mere lot-to-lot unacceptability because it also indicates unacceptable assay imprecision.
FIGURE 7.3 Scenarios demonstrating improper and proper assay reagent lot-to-lot verification. Scatter plots (A) and (B) show examples of common practice where the new lot passes the acceptance criteria, but the difference plots (A′) and (B′) show that the new lots in the two scenarios should not be accepted, due to consistent bias and random variability, respectively. (C) A scenario of good lot-to-lot verification practice: while the correlation plot (C) looks similar to (A) and (B) above, the difference plot (C′) shows a good distribution of the values within ±10%. (D and E) Plots D and D′ and E and E′ are scenarios with good correlation and average percentage of bias from two consecutive lots that are very likely acceptable by common practice; however, the cumulative bias from the two lots can exceed the total allowable error of the assay. (F) Another scenario for new lot verification that is highly likely to be accepted by common practice, but the concentrations of the samples used cover only a small portion of the assay dynamic range and may not represent the assay performance at higher concentrations. Regression fits shown in the panels: (A) y = 0.9032x + 2.2915, R² = 0.9946; (B) y = 0.9724x + 3.4528, R² = 0.9504; (C) y = 1.0156x − 3.8057, R² = 0.9916; (D) y = 0.9601x − 1.7129, R² = 0.9954; (E) y = 0.9537x − 0.6079, R² = 0.9937; (F) y = 0.996x + 1.0636, R² = 0.9037.
TABLE 7.1 Percentage of difference of new lot results from old (in current use) lot results.
Scenario | A | B | C | D | E | F
Average | −9.5 | 0.0 | 0.2 | −4.7 | −5.0 | 0.4
Minimum | −18.8 | −23.8 | −6.9 | −9.2 | −10.2 | −10.1
Maximum | −0.8 | 20.9 | 9.7 | 1.1 | 1.7 | 10.2
7.3.3.3.3 A proposal for lot-to-lot verification
Similar to method comparison, discussed in the previous chapter, with the proper statistical evaluation, the type rather than the number of samples is what matters for a reagent lot-to-lot verification. Also, the cumulative effect of different sequential lots should be monitored carefully. Here, we propose the following protocol for lot verification in quantitative assays, which has been tried on multiple occasions and has worked well.
7.3.3.3.3.1 Study plan
Use 20 samples; if that is infeasible, as few as 10 samples can be used, with predefined results covering as much as possible of the AMR (or reportable range). If neat human samples covering the AMR are not available, contrived samples made by spiking the analyte (which does not need to be a primary reference material) can be used. Samples are analyzed on the two lots of reagents.
7.3.3.3.3.2 Statistical tools
Calculate the percentage of difference of the new-lot result from the old-lot result for each pair of results, and the average of the percent differences. Plot the percentages of difference on the y-axis against the old-lot results on the x-axis. Plot the new-lot results on the y-axis against the old-lot results on the x-axis.
7.3.3.3.3.3 Acceptance criteria
Individual percentages of difference should be within the assay's total error allowable (TEa; ±10% for this assay). The average percent difference should be as close as possible to 0.0% and should not exceed 1/2 of the TEa. No other criteria are critical, but R² ≥ 0.95 is complementary. As depicted by Fig. 7.3C and C′ and Table 7.1, the new lot in this scenario is acceptable. A minimal sketch implementing these calculations is given at the end of this section.
7.3.3.3.3.4 Acceptable but consistently biased lot
For any consistent bias, especially if the average of the percent differences exceeds 1/4 of the TEa, attention should be paid to the next lot evaluation to avoid a cumulative shift. Fig. 7.3D and D′ and Table 7.1 depict a scenario where Lot B was consistently biased from Lot A with an average bias of −4.7% (less than 1/2 the TEa), but, as shown by Fig. 7.3E and E′ and Table 7.1, Lot C had a similar consistent negative bias of close magnitude (−5.0%). Both Lot B and Lot C are acceptable, not only by the institution's or CLSI criteria but also by the proposed criteria, if the Lot B-to-A and Lot C-to-B data are adjudicated separately. However, accepting Lot C would likely introduce a total drift in clinical results of about 10% (the assay TEa). This example emphasizes the importance of longitudinal monitoring of lot-to-lot variation and also confirms the earlier point that the criteria employed by the medical institution laboratory or the EP26-A guideline are not enough to detect an unacceptable reagent lot.
7.3.3.3.3.5 Type rather than number of samples matters
The data depicted by Fig. 7.3F and F′ demonstrate another scenario that could be acceptable to others. However, while the average and individual percentages of difference are acceptable according to our proposal, this is an unacceptable study, not only because of the relatively low R² but mainly because of the limited range of sample concentrations. Please note that the graph axes were intentionally left wide to show the concentration range compared with Fig. 7.3C and C′. Since the number of samples here was 40, compared with 20 in the other scenarios including Fig. 7.3C, this scenario shows that it is the concentration coverage of the samples, rather than their number, that matters.
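The following is a minimal sketch, in Python, of the statistical tools and acceptance criteria proposed above; the paired sample data, the TEa value, and the function name are illustrative assumptions rather than part of the protocol itself.

```python
# Minimal sketch of the proposed lot-to-lot verification calculations.
# old_lot/new_lot are paired results (same samples on both reagent lots);
# tea_pct is the assay total error allowable (here assumed to be 10%).

def verify_new_lot(old_lot, new_lot, tea_pct=10.0):
    pct_diffs = [(n - o) / o * 100.0 for o, n in zip(old_lot, new_lot)]
    avg_diff = sum(pct_diffs) / len(pct_diffs)

    # R^2 of new-lot vs old-lot results (complementary criterion only)
    n = len(old_lot)
    mean_x, mean_y = sum(old_lot) / n, sum(new_lot) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(old_lot, new_lot))
    sxx = sum((x - mean_x) ** 2 for x in old_lot)
    syy = sum((y - mean_y) ** 2 for y in new_lot)
    r_squared = sxy ** 2 / (sxx * syy)

    acceptable = (all(abs(d) <= tea_pct for d in pct_diffs)   # each within TEa
                  and abs(avg_diff) <= tea_pct / 2)           # average within TEa/2
    watch_next_lot = abs(avg_diff) > tea_pct / 4              # consistent-bias warning
    return acceptable, round(avg_diff, 2), round(r_squared, 4), watch_next_lot

# Example with made-up paired results (mg/dL)
old = [25, 80, 150, 260, 400, 520, 700]
new = [24, 78, 148, 255, 392, 505, 690]
print(verify_new_lot(old, new))
```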
7.3.3.4 Calibration of the assay
Calibration of an assay, using a series of solutions containing the target analyte at known concentrations, is the important QC tool that links the analytical raw signal (for example, optical density, fluorescence units, or Ct) with the concentration of the analyte. There is a considerable difference between CLIA and FDA requirements in this regard, and laboratory practices for laboratory-developed tests (LDT) are inconsistent. Qualitative technologies, for example, IHC and ISH, are usually not candidates for calibration.
7.3.3.4.1 Requirements and expectations
7.3.3.4.1.1 Clinical Laboratory Improvement Amendments requirements
CLIA regulations require calibration and calibration verification procedures to substantiate the continued accuracy of the test system throughout the laboratory's reportable range of test results. CLIA [14] defined the following two terms:
• Calibration is the process of testing and adjusting the instrument or test system readout to establish a correlation between the instrument's measurement of the substance being tested and the actual concentration of the substance.
• Calibration verification means testing materials of known concentration in the same manner as patient specimens to assure that the test system is accurately measuring samples throughout the reportable range.
When each run is calibrated, calibration serves as a QA (prospective) tool, as the calibration should be acceptable before any further assessment of other QC tools or patient results. For occasional calibration verification, the exercise can be considered a retrospective quality indicator of system performance, and it will be discussed under "Quality indicators" later in this chapter. CLIA states that the laboratory should follow the test manufacturer's instructions, but if no instructions are provided, the laboratory should calibrate at least every 6 months using at least three levels (low, medium, and high) covering the reportable range. Calibration can be more frequent than once every 6 months in the case of a reagent lot change, after major preventive maintenance, or if QC shows an unusual trend. CLIA waives this requirement for automated chemistry analyzers if the laboratory is running three levels of QC (low, medium, and high) more than once per day of testing using National Institute of Standards and Technology–traceable materials and the QC results meet the acceptance criteria [3].
7.3.3.4.1.2 FDA requirements
The FDA bioanalytical guidance [5] requires a zero (no analyte) calibrator and at least six nonzero calibrators covering the AMR, including the LLOQ, prepared in the sample matrix, to be used for the calibration of every run. For chromatographic methods, observed (back-calculated) values for all nonzero calibrators should be within ±15% of nominal concentrations, except at the LLOQ where the calibrator can be up to ±20%; for LBA, all nonzero calibrators should be within ±20%, except at the LLOQ and ULOQ where the calibrator can be up to ±25% of nominal concentrations. For both technologies, 75% of the calibrators, and a minimum of six nonzero calibrator levels, should meet these criteria in each run.
7.3.3.4.1.3 Laboratory-developed tests
With the flexibility in CLIA requirements for calibration, interlaboratory, or even within-laboratory, practices are inconsistent. For LDT, we recommend including calibrators covering the AMR with each batch of samples. For continuous flow analysis, for example, automated analyzers and liquid chromatography–mass spectrometry (LC–MS), a batch can be an 8-h shift, or 24 h if the analysis is not interrupted. For manual assays, each plate (for example, 48-, 96-, or 384-well) is considered a batch.
7.3.3.4.2 Calibrators
If available, certified reference material should be used; otherwise, pure material, preferably provided with the reagent kit or available from a credible manufacturer, can be used. As shown later, the number and concentrations of calibrators are determined by the type of calibration curve fit.
7.3.3.4.3 Assessment of the calibration curve
CLIA did not determine calibration performance acceptance criteria, and the laboratory medical director is ultimately responsible for defining the quality requirement and the limit for acceptable performance [15].
TABLE 7.2 Calibrator back-calculated results and their percentage of differences from nominal values in the quantitative reverse transcriptase-polymerase chain reaction example detailed in the previous chapter.
Nominal copy no. | Ct | 1–1,000,000 fit: Back-calculated | % Diff. from nominal | 10–1,000,000 fit: Back-calculated | % Diff. from nominal | 100–1,000,000 fit: Back-calculated | % Diff. from nominal
1,000,000 | 20.74 | 1,268,938 | 26.9 | 1,013,330 | 1.3 | 973,301 | −2.7
100,000 | 23.98 | 113,276 | 13.3 | 103,306 | 3.3 | 102,104 | 2.1
10,000 | 27.36 | 9110 | −8.9 | 9542 | −4.6 | 9717 | −2.8
1000 | 30.52 | 863 | −13.7 | 1029 | 2.9 | 1078 | 7.8
100 | 34.01 | 64 | −36.1 | 88 | −12.0 | 95 | −5.0
10 | 36.96 | 7 | −30.0 | 11 | 10.0 | 12 | 20.0
1 | 38.69 | 2 | 100.0 | 3 | 200.0 | 4 | 300.0
R² has been widely used as the sole criterion for calibration curve acceptance. R² alone is not enough; the deviation of the calibrator back-calculated results (usually presented automatically by the autoanalyzer or plate reader) from the nominal values should also be considered. As an example, Table 7.2 shows the back-calculated values of the calibrators using the three curve fits demonstrated by Fig. 6.2D in the previous chapter. Using the curve fit from the 1 to 1,000,000 copy calibration, while R² was greater than 0.99, the percentage of difference of the back-calculated copy numbers was acceptable (within ±20% of nominal values) for only 3 out of the 7 calibrators. Using 10–1,000,000 or 100–1,000,000 copies as calibrators, other than the 1-copy calibrator, the back-calculated results from the other six calibrators were acceptable. This confirms the linearity conclusion made in the previous chapter, where 10–1,000,000 copies was considered the AMR, and proves that R² should not be the only factor used to adjudicate a calibration curve.
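The back-calculation check described above is easy to automate. The following is a minimal sketch assuming a ±20% acceptance band (tightened or widened per the applicable guidance); the data are the 10–1,000,000 copy fit values from Table 7.2.

```python
# Minimal sketch: judge calibrators by back-calculated deviation from nominal,
# not by R^2 alone. The acceptance band is an assumption (here 20%).

def assess_calibrators(nominal, back_calc, limit_pct=20.0):
    report = []
    for nom, obs in zip(nominal, back_calc):
        diff_pct = (obs - nom) / nom * 100.0
        report.append((nom, obs, round(diff_pct, 1), abs(diff_pct) <= limit_pct))
    return report

# 10-1,000,000 copy fit from Table 7.2 (nominal vs back-calculated copies)
nominal = [1_000_000, 100_000, 10_000, 1000, 100, 10, 1]
back_calc = [1_013_330, 103_306, 9542, 1029, 88, 11, 3]

for nom, obs, diff, ok in assess_calibrators(nominal, back_calc):
    print(f"{nom:>9} -> {obs:>9}  {diff:+6.1f}%  {'PASS' if ok else 'FAIL'}")
```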
7.3.3.4.4 1-Point or 2-point calibration
For assays with a nonlinear calibration curve fit, a multipoint calibration is necessary, and the more points, the more accurate the results. For assays with a linear curve fit, however, calibration can be done with only two levels (2-point) or one level (1-point) of calibrator, with concentrations within the assay AMR. A 1-point calibration can be enough if the multipoint calibration curve crosses the x- and y-axes at the origin, but a 2-point calibration is needed if the multipoint regression line shows a tangible y-axis intercept. However, the results may not be as accurate as those from a multipoint calibration, because drift in one or two points within a multipoint calibration curve can be absorbed or mitigated by the accuracy of the remaining points.
Fig. 7.4A and B demonstrates calibration curve data from two assays for the same analyte with an AMR of 2–750 mg/dL (mimicking blood glucose), where the blue solid regression lines with solid blue circles represent the multipoint calibrators, and the red dashed lines with open circles represent the 2-point (25 and 500 mg/dL) and 1-point (500 mg/dL) calibration curves for assay A and assay B, respectively. Table 7.3 lists the back-calculated results and the percent difference of the back-calculated results from the nominal values. Back-calculated results were determined for the seven calibrators using the multipoint calibration curve one time and the 2-point or 1-point curve another time. While the LLOQ of the assay is 2 mg/dL, to eliminate the significant impact of the negative skew of the 2 mg/dL data point in the 2-point calibration on the summary statistics, the percentage of difference for this point was not included. This is justifiable from a clinical perspective since the level is far below the lower limit of the reportable range. This example confirms the following two points mentioned above:
• While a 2-point or 1-point calibration can replace a multipoint calibration in assays with a linear calibration curve, as is commonly practiced, a multipoint calibration is more immune to random error affecting one or two points.
• R² by itself can be misleading: the 2-point and 1-point curves have an R² of 1.0, but their back-calculated results are not as accurate as those from the multipoint calibrators.
FIGURE 7.4 Multipoint, 2-point, and 1-point calibration curves. The figure (with the associated table) shows the feasibility and limitations of using a 2-point or 1-point calibration instead of a multipoint calibration curve. (A) Two-point calibrator vs multipoint calibrator: multipoint fit y = 0.0032x + 0.0331, R² = 0.9961; 2-point fit y = 0.0029x + 0.0594, R² = 1. (B) One-point calibrator vs multipoint calibrator: multipoint fit y = 0.0039x + 0.0002, R² = 0.9967; 1-point fit y = 0.0036x, R² = 1. OD, Optical density.
TABLE 7.3 Calibrator back-calculated results and their percentage of differences from nominal values in the quantitative assays depicted by Fig. 7.4A and B.
Nominal (mg/dL) | First assay, 7-point: Back-calc. (mg/dL) | % Diff. | First assay, 2-point: Back-calc. (mg/dL) | % Diff. | Second assay, 7-point: Back-calc. (mg/dL) | % Diff. | Second assay, 1-point: Back-calc. (mg/dL) | % Diff.
2 | 2.2 | 7.8 | −6.7 | −434.5 | 2.0 | 0.0 | 2.2 | 11.1
25 | 27.2 | 8.6 | 20.9 | −16.4 | 25.6 | 2.4 | 27.8 | 11.1
50 | 49.0 | −1.9 | 45.0 | −9.9 | 51.2 | 2.5 | 55.6 | 11.1
100 | 99.0 | −1.0 | 100.2 | 0.2 | 102.5 | 2.5 | 111.1 | 11.1
250 | 264.7 | 5.9 | 283.0 | 13.2 | 256.4 | 2.5 | 277.8 | 11.1
500 | 514.7 | 2.9 | 558.8 | 11.8 | 459.2 | −8.2 | 497.5 | −0.5
750 | 770.9 | 2.8 | 841.6 | 12.2 | 761.5 | 1.5 | 825.0 | 10.0
Average | | 2.9 | | 1.8 | | 0.5 | | 9.0
Minimum | | −1.9 | | −16.4 | | −8.2 | | −0.5
Maximum | | 8.6 | | 13.2 | | 2.5 | | 11.1
Note: The 2 mg/dL calibrator was excluded from the summary statistics to avoid the impact of its negative skew.
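To make the back-calculation behind Table 7.3 concrete, the following minimal sketch applies the regression fits quoted in Fig. 7.4; the two OD readings are assumptions chosen so that the output reproduces the 100 mg/dL row of the table.

```python
# Minimal sketch: back-calculate concentration from OD with a linear fit,
# conc = (OD - intercept) / slope. Fits are from Fig. 7.4; the OD readings
# are assumptions picked to reproduce the 100 mg/dL row of Table 7.3.

def back_calculate(od, slope, intercept=0.0):
    return (od - intercept) / slope

fits = {
    "assay A, multipoint": (0.0032, 0.0331),
    "assay A, 2-point":    (0.0029, 0.0594),
    "assay B, multipoint": (0.0039, 0.0002),
    "assay B, 1-point":    (0.0036, 0.0),
}

nominal = 100.0
od_reading = {"assay A": 0.350, "assay B": 0.400}   # assumed raw readings

for name, (slope, intercept) in fits.items():
    od = od_reading[name.split(",")[0]]
    conc = back_calculate(od, slope, intercept)
    print(f"{name}: {conc:.1f} mg/dL ({(conc - nominal) / nominal * 100:+.1f}% from nominal)")
```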
7.4 Quality indicators
Quality indicators in this chapter include what is traditionally known as "QC" in addition to other indicators that serve as retrospective tools for assessing a laboratory test's performance. While we promote this concept of differentiating between "control" and "indicator," to avoid confusion we will continue to use QC in this chapter to denote the current common nomenclature.
7.4.1 Quality control
FDA considers QC in an IVD test/device as a material or mechanism, which, when used with or as part of a test system, monitors the analytical performance of that system. It may monitor the entire test system or only one aspect of it [16]. QC is a neat biological sample, reference material, or contrived sample, with known or predetermined quantity of an analyte, that is used to monitor the performance of an assay and, hence, assess the integrity and validity of results from
clinical samples analyzed on the same run. CLIA considers the performance of daily QC activities as an additional instrument function check, including instrument stability and calibration [3].
7.4.1.1 Quality control materials
7.4.1.1.1 Quality control materials for soluble biomarkers and bioanalysis
Per the CLSI guidelines [17], QC materials should have characteristics that enable them to produce information about what is happening with the measurement procedure when performing measurements on the intended patient samples. Sample processing, for example, extraction of a drug or metabolite from blood or serum, or extraction of DNA or RNA from a tissue sample, is considered part of the run, and QC should include this step. The CAP (College of American Pathologists) checklist [18] indicates that the QC plan must monitor the extraction and amplification phases based on the risk assessment performed by the laboratory and the manufacturer's instructions. In assays where sample processing is performed, two types of QC can be used: an entire-run QC that includes sample processing, and an analytical-phase QC that does not need processing, for example, plasmid DNA, in vitro transcribed RNA, or a drug or its metabolite in pure form. For bioanalytical methods, FDA requires the preparation of calibrators and QCs from separate stocks of reference material, and both should be prepared in the same matrix as the study samples to be assayed [5]. Also, CAP [18] indicates that, in general, calibrators should not be used as QC materials. If calibrators are used as controls, then different preparations should be used for these two functions; for example, when using commercial calibrators and controls, the lot number used for calibration should be different from the lot number used for QC, whenever possible.
7.4.1.1.2 Quality control materials for tissue staining
To rule out any nonspecific staining in IHC, an antibody-negative control should be applied to a duplicate section from the same tissue specimen using the same reagents and procedure as the patient test slide, except that the primary antibody is omitted and replaced by an unrelated antibody. The negative control antibody should be of the same isotype as the primary antibody (for monoclonal antibodies) or an unrelated antibody from the same animal species as the primary antibody (for polyclonal primary antibodies). While CAP [19] allows the use of the negative control reagent included in the staining kit (without specifications) or the diluent/buffer solution in which the primary antibody is diluted, inclusion of an isotypic antibody or an antibody from the same animal species may detect more possible interferences. In addition to this internal QC, negative and positive QCs should be used with each batch of staining. As explained above, the two controls should be within the assay dynamic range and as close as possible to the assay cutoff.
For ISH, a probe targeting a constitutively expressed locus colocalized with the gene of interest on the same chromosome is used as an internal control; for example, the chromosome 17 centromere is used as an internal control for HER2 in breast and gastric tumor tissues. The control probe should show 1–2 signals/cell nucleus for a sample to be acceptable. In addition, an external QC sample is encouraged [19,20]. In general, in addition to a QC with a normal gene pattern, it is recommended to use a QC with deletion, amplification, or fusion if gene deletion, amplification, or fusion is the target variance, respectively. While human tumor tissues can be used as QC for IHC and ISH in cancer, to avoid the intratumor heterogeneity issue, which will be discussed later in this book, xenografts (if compatible with the IHC primary antibody) or formalin-fixed paraffin-embedded cell line pellets can be used.
In general, CLSI C24 [17] indicates that a laboratory should obtain enough homogeneous and stable control material to last for at least 1 year, when practical.
7.4.1.2 Frequency
7.4.1.2.1 Quantitative assays
The FDA bioanalytical guidance [5] requires at least three QC levels (low, medium, and high) and at least two replicates per QC level in each analytical run, with the total number of QCs being at least 5% of the number of unknown samples or at least six, whichever is greater, for each run or distinct processing batch within a run. For quantitative tests, CAP [18] indicates that control materials at more than one concentration are to be included in every run, as specified in the manufacturer's instructions (as applicable) and the laboratory procedure. Controls should verify assay performance at relevant analytical and clinical decision points. The CLSI C24 guideline states that QC should be analyzed at least once during each analytical run. In general, we recommend at least two levels within the AMR, at least one of them at or close to the medical decision point, with each batch of clinical samples.
7.4.1.2.2 Qualitative assays
For qualitative tests, CAP [18] expects positive, negative, and sensitivity controls to be included in every run. Ideally, one should use a positive control for each analyte in each run. A sensitivity control is required if the molecular assay is being used to detect low-level target sequences, to show that a low-level target is detectable. The CLSI C24 guideline states that QC should be analyzed at least once during each analytical run. In addition to the internal controls explained earlier, at least two levels of QC (including at least one negative and one positive QC) should be used with each batch of IHC staining, and two QCs (one normal and one with the targeted variance) with each batch of ISH staining [19,20]. In general, we recommend at least one negative and one positive control within the assay dynamic range, at least one of them at or close to the positivity cutoff, with each batch of clinical samples.
In addition to a no-template control (blank), two of the FDA guidelines [21,22] that handle microbial nucleic acid multiplex assays recommend the following four controls:
1. Negative specimen control, which contains nontarget nucleic acid, to rule out nonspecific detection. A human specimen from an individual who tested negative for the pathogens targeted by the assay, or a surrogate negative control, for example, a sample containing human total RNA, can be used daily or with each batch of clinical samples.
2. Positive control for amplification/detection, which can contain purified target nucleic acid near the limit of detection for a qualitative assay. It controls for the integrity of the device and the reaction components when negative results are obtained from the next control.
3. Positive controls (also called processing controls) that encompass the entire procedure and are run side by side with patient specimens. A minimum of one positive control should be run daily or per batch. The positive control is designed to mimic a patient specimen, contains target nucleic acids, and is used to control the entire assay process, including nucleic acid extraction, amplification, and detection.
4. Internal control, a nontarget nucleic acid sequence that is coextracted and coamplified with the target nucleic acid in the same tube. It controls for the integrity of the reagents, equipment functionality, sample integrity, and the presence of inhibitors in the specimen. Housekeeping genes, for example, β-actin or GAPDH, have been commonly used for this purpose. Alternatively, while it can be laborious and challenging, the two FDA guidances allow an internal control that is a packaged nontarget sequence added to each clinical specimen before any preanalytical steps and analyzed simultaneously with the clinical targets.
7.4.1.3 Positioning of quality control within a run
Daily QC activities and function checks are performed prior to patient testing to ensure that an instrument is functioning correctly and is properly calibrated [3]. As indicated by the CLSI C24 guideline, QC can be placed at the beginning of the run (that is, before the patient samples), at the end of the run, or randomly distributed among the patient samples. Placing QC before and after the patient samples can be helpful for detecting within-run drift. The routine placement of QC immediately after the calibrators is cautioned against, as it may falsely underestimate imprecision throughout a run and will not detect a shift that happens during a run. For continuous flow technologies, especially those such as LC–MS and ICP–MS where buildup of analyte(s) in the column or at the injector and carryover are potential problems, we recommend bracketing subject samples with QC for two main reasons:
1. Comparing results from the after-sample QC set to the before-sample QC set enables the detection of drift.
2. If QC is positioned before but not after the clinical samples, the QC passes, and the run is accepted, but the next run's QC fails, which can be a day or more after the previous run, no one can predict when the failure actually happened. Patients' results from the previous run might have been impacted if a drift happened in the middle of that run. At this point, repeat analysis may be impossible due to sample instability, or the results might have already been reported. Having a second set of QC after the samples would detect the shift on the spot.
For demonstration, Fig. 7.5 depicts the following four scenarios, from top to bottom (a minimal drift-check sketch follows scenario d):
a. A successful run with the before-sample and after-sample QC sets passing the check.
b. Successful before-sample QC and passing but drifting after-sample QC, which indicates possible drift in patient results. In such a situation, while the run could be acceptable, patient results around a medical decision point should be investigated and, possibly, reanalyzed. For example, the normal limit of blood glucose is 126 mg/dL; if the drift can be 15%, all samples with results between 126 and 132 mg/dL (within 15% of the 126 mg/dL) should be investigated and, preferably, repeated before reporting the results as abnormal.
FIGURE 7.5 Good positioning of QC within a run can detect within-run drift in patients' results. In the top three scenarios two sets of QC bracket the patients' samples, but in the fourth scenario QC is run before the clinical samples only. Green, acceptable; yellow, biased but could be acceptable; red, unacceptable drift. QC, Quality control.
c. Successful before-sample QC and failed after-sample QC. The run has to be repeated, after troubleshooting if needed.
d. Successful before-sample QC but no after-sample QC performed, and the next day's QC failed. If the failure was due to systematic error or drift, the failure might have happened and/or the drift started in the middle of the previous run (as shown by the color grading). The erroneous results from the previous run could be difficult to fix at this stage, as explained earlier.
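The before/after comparison in these scenarios can be written as a simple check. The following is a minimal sketch assuming a ±10% acceptance band around each QC target and an arbitrary 5% drift threshold; the thresholds, data, and function name are illustrative, not taken from any guideline.

```python
# Minimal sketch: compare before-sample and after-sample QC sets to detect
# within-run drift. Acceptance band and drift threshold are assumptions.

def check_bracketing_qc(targets, before, after, accept_pct=10.0, drift_pct=5.0):
    verdicts = []
    for tgt, b, a in zip(targets, before, after):
        before_ok = abs(b - tgt) / tgt * 100.0 <= accept_pct
        after_ok = abs(a - tgt) / tgt * 100.0 <= accept_pct
        drift = round((a - b) / b * 100.0, 1)
        if not (before_ok and after_ok):
            verdicts.append(("reject run", drift))                       # scenario c
        elif abs(drift) >= drift_pct:
            verdicts.append(("investigate near decision points", drift)) # scenario b
        else:
            verdicts.append(("accept", drift))                           # scenario a
    return verdicts

# Two QC levels with targets of 100 and 200 mg/dL
print(check_bracketing_qc([100, 200], before=[99, 202], after=[106, 214]))
```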
7.4.1.4 Quality control acceptance criteria
7.4.1.4.1 Qualitative assays
For qualitative assays with a predefined cutoff, the positive control should test positive and the negative control should test negative for a run to be accepted. However, to detect any drift, for example, a systematic increase or decrease in IHC stainability, we recommend considering the raw reads, for example, IHC scores. Raw reads can be treated in a way similar to quantitative readouts, but one has to be careful in suggesting an acceptable range, especially for biomarkers with relatively low expression.
7.4.1.4.2 Quantitative assays
For quantitative assays, QC target ranges are typically established for each QC material lot from 20 measurements performed on 20 different days or, if infeasible, from 4 measurements per day over 5 different days. Sometimes 20 measurements may not be enough to establish QC ranges, and cumulative values from up to 6 months of QC in production can be used [17] to revise the initial ranges. Ranges are commonly calculated as ±1, 2, and 3 SD (standard deviations) from the average of the assayed values, but they can also be established as a percentage of the average, for example, ±5%, ±10%, or ±20% of the average. As shown by Fig. 7.6, these different limits are commonly plotted on the y-axis of a graph form called a Levey-Jennings chart, which is used to log daily QC values, automatically or manually, in real time (a minimal sketch of deriving such limits is given after Fig. 7.6).
The FDA bioanalytical guidance [5] defines the acceptance criteria for QCs in chromatographic assays as ≥67% of all QCs and ≥50% of QCs per level being within ±15% of their nominal values, and in LBA as ≥67% of all QCs and ≥50% of QCs per level being within ±20% of their nominal values (a minimal sketch of this run-acceptance check follows this paragraph). For quantitative assays in the clinical laboratory, Westgard's rules have been widely used for decades [1,17,23–28]. CLSI C24 indicates 1-3S and 2-2S as commonly used rejection rules, but the other Westgard rules listed in Table 7.4 are also used to reject a run. Fig. 7.7, redrawn from [24,26], represents a QC decision tree that a laboratory may follow to adjudicate QC results. If QC fails due to a possible random error, a laboratory may repeat the QC just one more time to rule out failure by chance, but it is poor practice to simply keep repeating the controls until they pass, though that is commonly done in many laboratories. Westgard quoted a real response from a regulated laboratory to an out-of-control situation documented in the QC log as "repeated, repeated got lucky" [28]. Also, it would be a waste of time to repeat QC after a systematic (consistent) failure without proper troubleshooting, including but not limited to recalibration.
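The 67%/50% run-acceptance logic described above translates directly into a short check. Below is a minimal sketch assuming a chromatographic assay (±15% band); the data layout and function name are illustrative.

```python
# Minimal sketch: FDA-style bioanalytical QC run acceptance.
# Accept the run if >=67% of all QCs and >=50% of QCs at each level are
# within the tolerance band (15% assumed here for a chromatographic assay).

def accept_run(qc_results, tol_pct=15.0):
    """qc_results: dict of level name -> list of (nominal, measured) pairs."""
    all_flags, per_level_ok = [], True
    for level, pairs in qc_results.items():
        flags = [abs(m - n) / n * 100.0 <= tol_pct for n, m in pairs]
        all_flags.extend(flags)
        if sum(flags) / len(flags) < 0.5:        # <50% passing at this level
            per_level_ok = False
    overall_ok = sum(all_flags) / len(all_flags) >= 2 / 3
    return overall_ok and per_level_ok

run = {
    "low":  [(30, 33), (30, 28)],
    "mid":  [(150, 140), (150, 181)],   # one mid-level QC is out by ~21%
    "high": [(400, 410), (400, 395)],
}
print(accept_run(run))  # True: 5/6 within 15% overall, and >=50% per level
```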
FIGURE 7.6 Examples of QC Levey-Jennings charts for a high-level (A) and a low-level (B) QC. The black line represents the predefined target value of the QC sample; the green, orange, and red lines represent the different limits for QC result judgment, usually plotted as 1, 2, and 3 SD, but they can also be expressed as a percentage of the target, such as 5%, 10%, and 20%. Red circles or ovals encompass runs failed by common practice. QC, Quality control; SD, standard deviation.
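Establishing the target and limits plotted on a Levey-Jennings chart, as described in Section 7.4.1.4.2, amounts to a mean and standard deviation calculation; the following minimal sketch uses 20 made-up baseline values.

```python
# Minimal sketch: derive Levey-Jennings limits for a QC lot from baseline
# measurements (e.g., 20 values collected over 20 days). Data are illustrative.
import statistics

baseline = [148, 152, 150, 151, 149, 153, 147, 150, 152, 148,
            151, 149, 150, 152, 147, 153, 150, 149, 151, 150]

mean = statistics.mean(baseline)
sd = statistics.stdev(baseline)
limits = {f"+/-{k} SD": (round(mean - k * sd, 1), round(mean + k * sd, 1))
          for k in (1, 2, 3)}
print(round(mean, 1), round(sd, 2), limits)
```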
TABLE 7.4 Westgard quality control (QC) rejection (failure) rules for quantitative assays.
Rule | Description if one QC is used in a run | Description if two QCs are used in a run | Possible type of error
1-3S | QC value exceeds the target mean plus 3 SD or the mean minus 3 SD limit | One of the two QCs exceeds the target mean plus 3 SD or the mean minus 3 SD limit | Random or systematic
2-2S | QC value exceeds the target mean plus 2 SD or mean minus 2 SD limit on two consecutive runs (both reads on one side of the mean) | The two QC values exceed the target mean plus 2 SD or mean minus 2 SD limit on the same run (both on one side of the mean) | Systematic
R-4S | QC value exceeds the target mean plus 2 SD on one and the mean minus 2 SD on another of two consecutive runs | One QC value exceeds the target mean plus 2 SD and the second exceeds the mean minus 2 SD on the same run | Random
4-1S | QC value exceeds the target mean plus 1 SD or mean minus 1 SD limit on four consecutive runs (all on one side of the mean) | One QC value exceeds the target mean plus 1 SD or mean minus 1 SD limit on four consecutive runs, or the two QCs exceed the mean plus 1 SD or mean minus 1 SD on two consecutive runs (all on one side of the mean) | Systematic
10x | 10 consecutive QC values fall on one side of the mean regardless of the size of the deviation | 10 consecutive values of one of the two QCs, or 5 consecutive values from each of the two QCs, fall on one side of the mean regardless of the size of the deviation (all on one side of the mean) | Systematic
SD, standard deviation.
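As a companion to Table 7.4, the following is a minimal sketch of a single-QC-level Westgard evaluation over a history of z-scores (deviations from the target mean in SD units); it is a simplified illustration and does not implement the two-level variants described in the table.

```python
# Minimal sketch: evaluate Westgard rules for one QC level from a history of
# z-scores (oldest first, latest run last). Single-level logic only.

def westgard_violations(z):
    """z: QC deviations from the target mean, expressed in SD units."""
    v = []
    if abs(z[-1]) > 3:
        v.append("1-3S")
    if len(z) >= 2:
        a, b = z[-2], z[-1]
        if (a > 2 and b > 2) or (a < -2 and b < -2):
            v.append("2-2S")      # two consecutive runs beyond 2 SD, same side
        if (a > 2 and b < -2) or (a < -2 and b > 2):
            v.append("R-4S")      # range across two runs exceeds 4 SD
    if len(z) >= 4 and (all(x > 1 for x in z[-4:]) or all(x < -1 for x in z[-4:])):
        v.append("4-1S")
    if len(z) >= 10 and (all(x > 0 for x in z[-10:]) or all(x < 0 for x in z[-10:])):
        v.append("10x")
    return v

history = [0.4, -1.1, 2.3, 2.6]        # last two runs both exceed +2 SD
print(westgard_violations(history))    # ['2-2S']
```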
FIGURE 7.7 Common practice QC decision tree. If no QC exceeds 2 SD (1-2S not triggered), the QC is successful and the run is accepted; otherwise the 1-3S, 2-2S, R-4S, 4-1S, and 10x rules are checked in turn, and violating any of them fails the QC and rejects the run. 1-2S, one QC level violates 2 SD (observed value exceeds the target value ±2 SDs); 1-3S, a QC level violates 3 SD; 2-2S, a QC level violates 2 SD in two consecutive runs or each of two QC levels violates 2 SD (on one side of the target value) in one run; R-4S, a QC level violates 2 SD in two consecutive runs or each of two QC levels violates 2 SD (on opposite sides of the target value); 4-1S, a QC violates 1 SD on four consecutive runs (on one side of the target) or two QCs violate 1 SD in two consecutive runs (on one side of the mean); 10x, a QC falls on one side of the target in 10 consecutive runs or two QCs fall on the same side in 5 consecutive runs. QC, Quality control; SD, standard deviation.
FIGURE 7.8 Different scenarios for acceptable and unacceptable QC outcomes. The figure demonstrates seven scenarios (Sc1–7) for two levels of QC with target values of 100 and 200 mg/dL analyzed before and after the clinical samples. Sc1 is good performance. Sc2 and Sc3 show consistent drifts (systematic error), and Sc4–7 show inconsistent (random) errors that dictate either run rejection or investigation. QC, Quality control.
7.4.1.4.3 Revisiting Westgard's rules
To the best of available knowledge, Westgard's QC rules may be the most commonly adopted rules in clinical laboratories for quantitative assays, and they have been recognized by professional associations and regulators. However, it may be time to revisit them, for the following reasons:
1. These rules can be understood as if a laboratory runs one set (of one, two, or more levels) of QC one time per run, which is likely to be before the patient samples to assure system readiness to analyze patient samples. As explained earlier, this approach has major drawbacks. As demonstrated above, we recommend using one set of QCs (two levels can be enough) before and another set after the patient samples to address these drawbacks.
2. As mentioned by Westgard [1], there is no easy way to tell whether a rejection is due to background random error (false rejection) or an additional error that has occurred (true rejection). It is acknowledged that there is a 1 in 20, or 5%, chance that a control measurement from a single QC level will exceed control limits set as the mean ±2 SD. If there are two levels of control in each run, as required by CLIA, then the chance of observing one measurement outside the 2 SD control limits is about double, and it gets worse as the number of control measurements increases to three or four, as the chance of failure increases proportionally. Having two sets of QC in each run will help to differentiate random from systematic error. Fig. 7.8 demonstrates seven scenarios (Sc1–7) for the outcomes from two levels of QC (with target values of 100 and 200 mg/dL) analyzed before and after the clinical samples. Sc1 is good performance, Sc2 and Sc3 show consistent drifts (systematic error), and Sc4–7 show inconsistent (random) errors. Runs from Sc2 and Sc3 should be rejected, and troubleshooting should be conducted before calibration, running QC, and reanalyzing the patient samples. While investigation is important in Sc4–7, these random errors should not trigger automatic
run rejection. Analyzing another QC set could address the issue; if results are corrected, no more action may be needed, but if the error stays, more troubleshooting can be conducted. 3. 4-1S may not be realistic as rule for rejection but can serve as warning, for two reasons: a. Some assays have very tight SD that could be just 2% or 1% of the target mean, and this small deviation may not impact medical decision. At least the rule should not be universalized. b. If one QC is used and a lab applies the rule and reject the fourth run when the fourth 1-S deviation was seen, what about the previous 3 runs where repeat analysis and correcting results can be almost impossible? 4. Ignoring magnitude of deviation in the 10x rule makes it very unrealistic. In fact, if target value of a QC is 100 mg/dL (for example) and 10 consecutive measurements come at 100.1100.5, or even up to 101, the behavior could be considered an ideal. Also, similar to 4-1S, if a lab rejects the 10th run, what will be the situation for the previous 9 runs? 7.4.1.4.4 Difference between Westgard’s rules and the suggested rules in quality control interpretation Table 7.5 summarizes the runs that were rejected (runs outlined on Fig. 7.6A and B) following Westgard’s rules and our adjudication of the runs in the light of the abovementioned suggestions. While other Westgard’s failures can be investigated, out of the 7 runs rejected by Westgard, only two are rejected if our suggestions are applied. The two runs have consistent trends in the two QC levels. Unless random error due to contamination or carryover can be a possibility, we do not see a random error in a QC as reason for automatic rejection of a run.
7.4.1.5 Quality control for multiplex assays Multiplex analysis refers to quantifying multiple biomarkers in a single analysis. The approach is amenable to different technologies including immunoassays, mass spectroscopy, but microarray and next-generation sequencing represent the most extensive panels where thousands of analytes can be analyzed in a single RNA or DNA sample. Regardless of their attraction, multiplex assays face multiple challenges, of which is the QC. The more the analytes in a panel, the higher the probability to have failed analyte QC on a given run. For example, Table 7.6 shows the probabilities of false (by chance) failures of two QC for multiplexes with 5, 10, 15, and 20 biomarkers compared to one biomarker. These are the just by-chance (false) failures, but rates of failures can be much higher in practice. In a study conducted by Ref. [29], 2322 plasma samples were analyzed in duplicates for 15 interleukins and other inflammatory biomarkers split between two multiplexes (6 and 9 analytes each) in a CLIA-certified lab. In addition to calibrators, three recombinant proteins and one plasma were used as QC on each of 66 microplates (for each multiplex), and the QC TABLE 7.5 Interpretation of the Levy-Jennings quality control (QC) charts presented on Fig. 7.6A and B following Westgard’s and our suggested rules. Runs rejected following Westgard’s rules 5
Violated Westgard’s rules
1-3S
2-2S
4-1S
10x
X
8
RE
Runs rejected by our rules
X
No consistent trend (the failed QC is below −3 SD but the other is acceptable and on the positive side of the mean)
X
X
14
X
27
X X
Positive trend The two QC are on the opposite sides of the mean with similar magnitudes
X
17
Our justification for accepting or rejecting Westgard’s decisions
SE
X
11
29
R-4S
Type of errors per Westgard’s rule
X
QC, Quality control; RE, random error; SD, standard deviation; SE, systematic error.
X
The failed QC is below −2 SD, but the other QC was close to +2 SD
X
Investigate the positive trend
X
One QC trend low but the other is well distributed on the two sides of the mean
X
X
Positive trend
TABLE 7.6 Probabilities (%) of quality control (QC) false warning (1-2S) or false rejection (2-2S or 1-3S) in multiplex assays.
Number of biomarkers | 1 | 5 | 10 | 15 | 20
1-2S | 4.55 | 22.75 | 45.50 | 68.25 | 91.00
2-2S | 0.21 | 5.18 | 20.70 | 46.58 | 82.81
1-3S | 0.27 | 1.35 | 2.70 | 4.05 | 5.40
targets were established as the median of each analyte ± 20% of the median. At least one analyte failed each of the QCs in 47–56 (71.2%–90.3%) of the 66 plates for the 6-analyte panel and in 36–44 (54.5%–66.7%) for the 9-analyte panel. The within-duplicate imprecision was unacceptable: the average percentages of samples exceeding CVs of 10%, 20%, and 30% were 35.2%, 18.2%, and 11.6%, respectively, and for a single analyte the percentages exceeding these limits were as high as 78.6%, 60.9%, and 46.3%. The assay imprecision was reflected in the high QC failure rates, but precision is not the focus of this section. The main question here is which QC rule(s) can be followed to accept or reject a plate. Combinations of failures from different QCs were not listed in the article, but one can predict that a significant number of plates should have failed. FDA guidelines [21,22] define a highly multiplexed medical microbiological nucleic acid-based diagnostic device as an assay with the capability to detect ≥20 different organisms/targets in a single reaction that involves testing multiple targets through a common process of specimen preparation, amplification and/or detection, and result interpretation. The guidance indicated that positive controls can be a subset of the larger assay menu and can be rotated through a predefined schedule. Similarly, CAP suggests the rotation of positive controls, in a systematic fashion and at a frequency defined in the laboratory procedure, for large mutation panels, for example, in cystic fibrosis [18]. With the expansion in multiplex and large-panel testing, we suggest assembling a forum of lab professionals and regulators to discuss these QC issues; meanwhile, we recommend the following:
1. Each biomarker used for decision-making should have the right QC as if it were a single-biomarker assay.
2. QC rules are applied to each biomarker independently, and if the QC does not pass the acceptance/rejection criteria, only the biomarkers with failed QC should be repeated. To avoid confusion in data acquisition, repeated results from biomarkers that passed on the initial analysis can be suppressed.
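For reference, Table 7.6 scales the single-analyte false-alarm probabilities across the panel; under an independence assumption, the exact probability that at least one of n analytes flags a rule purely by chance is 1 − (1 − p)^n. A minimal sketch of that calculation in Python (our own illustration, not taken from Ref. [29]) is:

from statistics import NormalDist

# Per-analyte false-alarm probabilities under a Gaussian assumption
# (they match the single-analyte 1-2S and 1-3S values in Table 7.6).
p_12s = 2 * (1 - NormalDist().cdf(2))   # ~0.0455: a QC result outside +/-2 SD
p_13s = 2 * (1 - NormalDist().cdf(3))   # ~0.0027: a QC result outside +/-3 SD

def p_any_false_flag(p_single: float, n_analytes: int) -> float:
    """Probability that at least one of n independent analytes triggers
    the rule purely by chance: 1 - (1 - p)^n."""
    return 1 - (1 - p_single) ** n_analytes

for n in (1, 5, 10, 15, 20):
    print(f"{n:>2} analytes: 1-2S {p_any_false_flag(p_12s, n):6.2%}   "
          f"1-3S {p_any_false_flag(p_13s, n):6.2%}")

Either way, the message is the same: the larger the panel, the more often at least one analyte will fail QC purely by chance.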
7.4.2
Proficiency testing
In proficiency testing, also known as external quality assessment, an external organization (professional organization, governmental agency, or accredited manufacturer of QC materials) sends sets of unknown, commonly simulated, samples [called proficiency testing (PT) samples or challenges] to all laboratories enrolled in a PT program for analysis. By CLIA regulations, PT samples must be treated and tested as if they were patient samples, and the results are submitted to the PT provider for evaluation to determine the quality of each individual laboratory's performance. Government and licensing agencies use PT as an objective tool for certifying labs. Historically, this approach was developed in an effort to protect patients' welfare by improving lab quality after a series of newspaper articles that focused on laboratory quality issues [1]. The PT provider usually sends two to five unknown samples for each analyte two to three times per year. The PT provider evaluates the results from the different laboratories and grades performance based on how close each laboratory's results are to the target value for each analyte (if known, as determined by the PT provider using a reference method) or to the mean value of the peer group (the group of laboratories utilizing the same technology/platform). The closeness of a particular lab's results to the average of the peer group can be calculated as the percent difference (bias), calculated as [(Lab result − Group mean)/(Group mean) × 100], or as the SDI (standard deviation index), calculated as [(Lab result − Group mean)/(Group SD)]. Acceptable results must fall within the accuracy ranges established by a PT provider for a test. The PT provider reports an evaluation summary containing the lab's scores for each test, the peer group summary statistics, the deviation of the lab's result from the peer group mean, and a notation of whether the lab passed or failed on each sample. Other than blood ABO group, D (Rho) type, and compatibility testing, where a score of 100% is required, if a lab achieves ≥80% success (e.g., if 4 or all 5 samples passed) in an event, the lab is considered
successful [30]. In addition to the pass/fail notation, the lab has to assess the numerical deviations to detect any systematic bias from the peer group and conduct the appropriate troubleshooting.
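As an illustration of the calculations described above, a minimal sketch follows (function names and example values are ours, not a PT provider's):

def percent_bias(lab_result: float, group_mean: float) -> float:
    """Percent difference (bias) of a lab result from the peer-group mean."""
    return (lab_result - group_mean) / group_mean * 100

def sdi(lab_result: float, group_mean: float, group_sd: float) -> float:
    """Standard deviation index: distance from the peer-group mean in group SDs."""
    return (lab_result - group_mean) / group_sd

def pt_event_passed(samples_acceptable: list[bool], required_fraction: float = 0.80) -> bool:
    """An event passes if the fraction of acceptable samples meets the
    threshold, e.g., 4 of 5 samples = 80%."""
    return sum(samples_acceptable) / len(samples_acceptable) >= required_fraction

# Example: glucose PT sample of 105 mg/dL against a peer mean of 100 and SD of 4
print(percent_bias(105, 100))                            # 5.0 (%)
print(sdi(105, 100, 4))                                  # 1.25
print(pt_event_passed([True, True, True, False, True]))  # True (4/5 = 80%)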
7.4.3
Calibration verification
As mentioned above, occasional calibration verification (also known as linearity check or AMR verification) using samples with known values covering as much of the AMR as possible is a retrospective quality indicator to assess system performance. Linearity is a fundamental characteristic of good analytic measurement methods, whereby there is a straight-line relationship between measured (also known as observed or assayed) analyte concentrations and “true” (also known as target or nominal) analyte concentrations. In this context, linearity refers to the relationship between final reportable results and not to the relationship between instrument signal output and analyte concentration, which can be nonlinear [31]. CLIA requires calibration verification at least every 6 months, but if a test system is calibrated at least every 6 months using three or more levels of calibrators, including a low, mid, and high value, the requirement for calibration verification is considered met [14]. CAP provides calibration verification and linearity (CVL) surveys that enable participating laboratories to perform calibration verification and assess method linearity for many analytes in chemistry and hematology. For calibration verification, similar to PT discussed earlier, participant results are compared with target values determined by reference methods or by peer group means, and grading is based on the bias from the target value.
7.4.3.1 Samples for calibration verification
CLIA requires a minimum of three levels (low, mid, and high), but five to eight levels is a more standard practice, especially for assays with a wide AMR. CLIA advises that the samples be analyzed like regular patient samples (same steps and, presumably, the same number of replicates), but running samples in duplicate or, preferably, triplicate and calculating the averages of the replicate measurements has been recommended [15,31].
7.4.3.2 Acceptance criteria
There are no acceptance criteria published by CLIA or CAP. Common lab practice applies one or both of the following criteria [15,31] (a brief sketch of both checks follows the list):
1. Bias, the absolute or percent difference of observed values from target values. As common practice, labs accept calibration verification if observed (assayed) results are within target ± TEa. For example, if three samples with glucose concentrations of 100, 200, and 400 mg/dL are used for CVL, results will be acceptable if observed values are within 90–110, 180–220, and 360–440 mg/dL (target value ± 10%, the CLIA TEa for blood glucose). However, to detect improper assay performance, using 1/2, 1/3, or 1/4 of the TEa is recommended when averages of replicate measurements are used. Westgard [15] suggests a bias budget of 1/3 (0.33) of the TEa instead of the whole TEa, because averaging duplicates or triplicates eliminates the impact of imprecision, which represents the remaining 2/3 of the TEa. The more exact approach, as stated by Westgard, is to define the bias budget considering the actual imprecision of the assay and calculate it as [TEa − 2 SD] or [TEa% − 2 CV%].
2. The slope of the regression line obtained by plotting the observed values on the y-axis against the target values on the x-axis; the acceptable limits can be calculated as 1 ± [TEa/100]. Using blood glucose as an example, the acceptable slope is 1 ± 10/100, that is, 0.9–1.1.
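A minimal sketch of the two checks above, using the blood glucose TEa of 10% (the assay CV of 3% is an assumption for illustration):

def bias_budget(tea_pct: float, cv_pct: float) -> float:
    """Bias budget after accounting for imprecision: TEa% - 2 x CV%."""
    return tea_pct - 2 * cv_pct

def slope_limits(tea_pct: float) -> tuple[float, float]:
    """Acceptable slope range for observed-vs-target regression: 1 +/- TEa/100."""
    return 1 - tea_pct / 100, 1 + tea_pct / 100

# Blood glucose example: TEa = 10%; an assay CV of 3% is assumed for illustration.
print(bias_budget(10, 3))    # 4 -> up to 4% bias can be tolerated
print(slope_limits(10))      # (0.9, 1.1)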
7.4.3.3 Challenging these criteria
As was the case for the calibration curve, the two criteria listed above for judging CVL are not enough to ensure assay performance that does not introduce bias into clinical results. Fig. 7.9A and B demonstrates three scenarios that would be acceptable for the blood glucose example mentioned above. As shown, the slopes are within the 0.9–1.1 range, and the percent differences of the observed averages from the target values are within 1, 1/2, or 1/4 of the TEa in the different scenarios. While it is common practice to accept results within target ± TEa, which allows possible bias in clinical results of this magnitude (up to 10%), even a consistent bias at 1/4 of the TEa should not be accepted, at least not without investigation and close monitoring of the assay performance.
FIGURE 7.9 Three scenarios for common practice acceptable blood glucose calibration curves. (A) Observed averages (mg/dL) plotted against target values (mg/dL), with regression lines y = 0.975x, y = 0.95x, and y = 0.9x corresponding to consistent biases of 1/4, 1/2, and 1× the TEa, respectively. (B) Percent differences from the target values plotted against the target values for the same three scenarios. The three scenarios are widely acceptable by common practice relying on the good correlation plots (A), but consistent biases at 1/4, 1/2, and 1× the total allowable error of the assay are demonstrated by the difference plots (B).
Here, we also recommend incorporating consistent bias into the acceptance criteria for CVL. We suggest using two criteria: (1) the percent difference of each individual pair of results should be within ± TEa, preferably within ± 1/2 TEa, and (2) the average of the percent differences should be as close as possible to 0 and should not exceed 1/4 TEa.
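A minimal sketch of these two suggested criteria (thresholds as stated above; the implementation details and example values are ours):

import numpy as np

def cvl_acceptable(observed, target, tea_pct: float = 10.0,
                   pair_fraction: float = 1.0, mean_fraction: float = 0.25):
    """Apply the two suggested checks: (1) each percent difference within
    +/- pair_fraction x TEa, (2) the mean percent difference within
    +/- mean_fraction x TEa."""
    observed = np.asarray(observed, dtype=float)
    target = np.asarray(target, dtype=float)
    pct_diff = (observed - target) / target * 100
    pairs_ok = bool(np.all(np.abs(pct_diff) <= pair_fraction * tea_pct))
    mean_ok = bool(abs(pct_diff.mean()) <= mean_fraction * tea_pct)
    return pairs_ok, mean_ok, round(float(pct_diff.mean()), 2)

# A consistent -5% bias passes the usual +/-TEa check on every sample
# but fails the proposed 1/4-TEa limit on the average difference.
print(cvl_acceptable([95, 190, 380], [100, 200, 400]))  # (True, False, -5.0)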
7.4.4
Delta checks
Delta checks serve as a patient-based QC tool to detect testing problems [32]. While, similar to PT, they can be a long-lead-time retrospective approach, patient data in a lab provide an additional way to monitor lab performance in standard of care. In fact, delta checks can also be an effective “prospective” tool to adjudicate clinical trial data from batch analysis.
7.4.4.1 Individual patient results
Intrabiomarker results comparison for a subject, intrasubject comparison of different related biomarkers, or matching a subject's biomarker results to his/her clinical status can serve as a quality assessment tool.
7.4.4.1.1
Intrabiomarker comparison
Comparing a patient's result to his/her previous results, or inspecting a set of results from a patient in a clinical trial, may help to identify a random error that might have occurred. A larger-than-expected interval change in results may indicate a testing problem associated with either the former or the current specimen and prompt an investigation before results are reported [32]. For example, if a patient, in standard of care or in a longitudinal clinical trial, is on statin treatment and his/her low-density lipoprotein cholesterol values from multiple previous measurements have been ranging between 100 and 120 mg/dL and the new result is 180 mg/dL, this sample should be reanalyzed or, if unavailable, another sample should be taken. As another example, if pharmacodynamic biomarker results from several time points show a decreasing trend after treatment and just one point spikes in the middle of the pattern, that sample has to be reanalyzed.
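A minimal sketch of such an intrabiomarker delta check (the 30% threshold is illustrative; limits are analyte specific and set by each lab):

def delta_check_flag(previous: float, current: float, max_pct_change: float = 30.0) -> bool:
    """Flag a result when the interval change from the previous result exceeds
    the allowed percent change (threshold is illustrative and analyte specific)."""
    pct_change = abs(current - previous) / abs(previous) * 100
    return pct_change > max_pct_change

# LDL cholesterol example from the text: prior results around 100-120 mg/dL, new result 180 mg/dL
print(delta_check_flag(previous=110, current=180))  # True -> reanalyze or recollect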
7.4.4.1.2
Interbiomarker comparison
The correlation of a biomarker result with other biomarkers within a panel from the same sample (time point) can be helpful in distinguishing an outlying result. For example, if a patient has active hepatitis C virus infection as detected by RNA copy number and a significantly high aspartate transaminase but normal alanine aminotransferase (ALT), the sample should be repeated for ALT.
7.4.4.1.3
Matching results with clinical status
While sometimes clinical lab results lead clinicians to make or confirm a diagnosis, or the diagnosis is not revealed to the lab, at other times the clinical picture is obvious and lab testing serves mainly for confirmation, follow-up, or prognosis. In these cases, lab results can be correlated with the clinical presentation, not to bias the lab staff but to inspect for any outlying results. For example, if a jaundiced patient with liver abnormalities on ultrasound has high total and direct serum bilirubin but normal alkaline phosphatase (ALP), the sample should be reanalyzed for ALP.
7.4.4.2 Multiple patients' results
Periodic, for example quarterly to annual, plotting of patients' results for a biomarker, with monthly averages calculated, can be an effective retrospective quality assessment tool for a lab. The approach was reported to be more effective than daily QC or lot-to-lot evaluation in detecting fluctuations of results with different lots of reagents [4,6].
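A minimal sketch of the monthly-average approach using pandas (the data frame and its column names are assumptions for illustration):

import pandas as pd

def monthly_patient_means(results: pd.DataFrame,
                          value_col: str = "result",
                          date_col: str = "collected_on") -> pd.Series:
    """Average all patient results per calendar month; plotting these means over a
    quarter to a year can reveal reagent-lot drift that daily QC may miss."""
    months = pd.to_datetime(results[date_col]).dt.to_period("M")
    return results.groupby(months)[value_col].mean()

# Usage with a hypothetical data frame of de-identified patient results:
# means = monthly_patient_means(df)
# means.plot(marker="o")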
References
[1] Klee GG, Westgard JO. Quality management. In: Burtis CA, Ashwood ER, Bruns DE, editors. Tietz textbook of clinical chemistry and molecular diagnostics. 5th ed. St Louis, MO: Elsevier; 2012. p. 213175.
[2] Westgard JO, Burnett RW, Bowers GN. Quality management science in clinical chemistry: a dynamic framework for continuous improvement of quality. Clin Chem 1990;36(10):1712–16.
[3] CLIA. State operations manual Appendix C—survey procedures and interpretive guidelines for laboratories and laboratory services, https://www.cms.gov/Regulations-and-Guidance/Legislation/CLIA/Downloads/App-C_Survey-Procedures-IGs-for-Labs-Labs-Svcs-Final.pdf; issued October 6, 2015 [accessed 20.09.18].
[4] Liu J, Tan CH, Loh TP, Badrick T. Detecting long-term drift in reagent lots. Clin Chem 2015;61(10):1292–8.
[5] FDA. Bioanalytical method validation guidance for industry, https://www.fda.gov/downloads/drugs/guidances/ucm070107.pdf; released May 2018 [accessed 27.08.18].
[6] Algeciras-Schimnich A, Bruns DE, Boyd JC, Bryant SC, La Fortune KA, Grebe SK. Failure of current laboratory protocols to detect lot-to-lot reagent differences: findings and possible solutions. Clin Chem 2013;59(8):1187–94.
[7] Hayden JA, Schmeling M, Hoofnagle AN. Lot-to-lot variations in a qualitative lateral-flow immunoassay for chronic pain drug monitoring. Clin Chem 2014;60:896–7.
[8] Algeciras-Schimnich A. Tackling reagent lot-to-lot verification in the clinical laboratory. Clin Lab News, July 1, 2014, https://www.aacc.org/publications/cln/articles/2014/july/bench-matters [accessed 01.10.18].
[9] Linnet K. Necessary sample size for method comparison studies based on regression analysis. Clin Chem 1999;45:882–94.
[10] Miller WG, Erek A, Cunningham TD, Oladipo O, Scott MG, Johnson RE. Commutability limitations influence quality control results with different reagent lots. Clin Chem 2011;57:76–83.
[11] CLSI. EP26-A. User evaluation of between-reagent lot variation; approved guideline; issued September 1, 2013.
[12] Katzman BM, Ness KM, Algeciras-Schimnich A. Evaluation of the CLSI EP26-A protocol for detection of reagent lot-to-lot differences. Clin Biochem 2017;50(13–14):768–71.
[13] Roberts WL, McMillin GA, Burtis C, Bruns DE. Reference information for the clinical laboratory. In: Burtis CA, Ashwood ER, Bruns DE, editors. Tietz textbook of clinical chemistry and molecular diagnostics. 5th ed. St Louis, MO: Elsevier; 2012. p. 213175.
[14] CLIA. Brochure #3: calibration and calibration verification, https://www.cms.gov/Regulations-and-Guidance/Legislation/CLIA/Downloads/6065bk.pdf; undated [accessed 01.10.18].
[15] Westgard JO. Basic method validation—calibration verification criteria for acceptable performance, https://www.westgard.com/cal-verification-criteria.htm; 2016 [accessed 01.10.18].
[16] FDA. Overview of IVD regulation, https://www.fda.gov/medicaldevices/deviceregulationandguidance/ivdregulatoryassistance/ucm123682.htm; last updated July 9, 2018 [accessed 21.09.18].
[17] CLSI. C24-A3: statistical quality control for quantitative measurement procedures: principles and definitions. 3rd ed.; 2006.
[18] CAP. Molecular pathology checklist, http://www.cap.org/ShowProperty?nodePath=/UCMCon/Contribution%20Folders/DctmContent/education/OnlineCourseContent/2016/LAP-TLTM/resources/AC-molecular-pathology.pdf; dated July 28, 2015 [accessed 24.09.18].
[19] CAP. Anatomic pathology checklist, http://www.cap.org/ShowProperty?nodePath=/UCMCon/Contribution%20Folders/DctmContent/education/OnlineCourseContent/2016/LAP-TLTMv2/checklists/cl-anp.pdf; dated August 17, 2016 [accessed 01.10.18].
[20] FDA. HER2 FISH pharmDx Kit label, https://www.accessdata.fda.gov/cdrh_docs/pdf10/P100024C.pdf; dated November 30, 2011 [accessed 22.09.18].
[21] FDA. Highly multiplexed microbiological/medical countermeasure in vitro nucleic acid based diagnostic devices—guidance for industry and Food and Drug Administration staff, https://www.fda.gov/downloads/medicaldevices/deviceregulationandguidance/guidancedocuments/ucm327294.pdf; issued August 27, 2014 [accessed 28.09.18].
[22] FDA. Class II special controls guideline: multiplex nucleic acid assay for identification of microorganisms and resistance markers from positive blood cultures—guideline for industry and Food and Drug Administration staff, https://www.fda.gov/downloads/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/UCM448520.pdf; issued May 27, 2015 [accessed 28.09.18].
[23] Westgard JO, Groth T, Aronsson T, Falk H, de Verdier C-H. Performance characteristics of rules for internal quality control: probabilities for false rejection and error detection. Clin Chem 1977;23:1857–67.
[24] Westgard JO, Barry PL, Hunt MR, Groth T. A multi-rule Shewhart chart for quality control in clinical chemistry. Clin Chem 1981;27:493–501.
[25] Westgard JO, Klee GG. Quality management. In: Burtis CA, Ashwood ER, editors. Tietz textbook of clinical chemistry. 2nd ed. Philadelphia, PA: W.B. Saunders; 1994. p. 548–92.
[26] Westgard JO. Basic QC practices—training in statistical quality control for healthcare laboratories. 2nd ed. Westgard QC, Inc.; 2002. p. 78.
[27] Westgard JO. Basic QC practices—QC—the multirule interpretation, https://www.westgard.com/lesson18.htm; 2009 [accessed 25.09.18].
[28] Westgard JO, Westgard SA. Basic QC practices—re-emerging issues in QC today, https://www.westgard.com/re-emerging-qc-issues.htm; 2011 [accessed 25.09.18].
[29] Ellington AA, Kullo IJ, Bailey KR, Klee GG. Measurement and quality control issues in multiplex protein assays: a case study. Clin Chem 2009;55(6):1092–9. Available from: https://doi.org/10.1373/clinchem.2008.120717.
[30] CAP. Proficiency testing manual, http://www.cap.org/ShowProperty?nodePath=/UCMCon/Contribution%20Folders/WebContent/pdf/pt-manual.pdf; 2017 [accessed 01.10.18].
[31] Killeen AA, Long T, Souers R, Styer P, Ventura CB, Klee GG. Verifying performance characteristics of quantitative analytical systems: calibration verification, linearity, and analytical measurement range. Arch Pathol Lab Med 2014;138:1173–81.
[32] Schifman RB, Talbert M, Souers RJ. Delta check practices and outcomes—a Q-probes study involving 49 health care facilities and 6541 delta check alerts. Arch Pathol Lab Med 2017;141:813–23.