Biologicals 44 (2016) 306–318
Process characterization and Design Space definition

Christian Hakemeyer a, *, Nathan McKnight c, Rick St. John c, Steven Meier c, Melody Trexler-Schmidt c, Brian Kelley c, Frank Zettl b, Robert Puskeiler d, Annika Kleinjans b, Fred Lim c, Christine Wurth d

a Pharma Technical Development, Roche Diagnostics GmbH, Sandhofer Str. 116, 68305 Mannheim, Germany
b Pharma Technical Development, Roche Diagnostics GmbH, Nonnenwald 2, 82377 Penzberg, Germany
c Pharma Technical Development, Genentech, South San Francisco, CA 94080, USA
d Pharma Technical Development Biotech Europe, F. Hoffmann-La Roche Ltd, 4070 Basel, Switzerland
Article history: Received 9 June 2016; Accepted 10 June 2016; Available online 25 July 2016

Abstract
Quality by design (QbD) is a global regulatory initiative with the goal of enhancing pharmaceutical development through the proactive design of pharmaceutical manufacturing processes and controls to consistently deliver the intended performance of the product. The principles of pharmaceutical development relevant to QbD are described in the ICH guidance documents (ICH Q8–11). An integrated set of risk assessments and related elements developed at Roche/Genentech was designed to provide an overview of product and process knowledge for the production of a recombinant monoclonal antibody (MAb). This chapter describes the tools used for the characterization and validation of a MAb manufacturing process under the QbD paradigm. These comprise risk assessments for the identification of potential Critical Process Parameters (pCPPs), statistically designed experimental studies, and studies assessing the linkage of the unit operations. The outcome of these studies is the classification of process parameters according to their criticality and the definition of appropriate acceptable ranges of operation. The process and product knowledge gained in these studies can lead to the approval of a Design Space. Additionally, the information gained in these studies is used to define the 'impact' which the manufacturing process can have on the variability of the CQAs, which in turn is used to define the testing and monitoring strategy.
© 2016 International Alliance for Biological Standardization. Published by Elsevier Ltd. All rights reserved.
Keywords: Quality by design (QbD); Process characterization; Process validation; Design space; Monoclonal antibody
1. Introduction

The characterization and validation of a monoclonal antibody production process is built upon a comprehensive science- and risk-based approach. These program elements incorporate process and product understanding developed from process- and product-specific studies as well as platform knowledge gained from similar molecules and processes.

Fig. 1 depicts our approach to implementation of the Quality by Design (QbD) principles using a holistic set of risk assessment tools. The QbD tools described in this chapter are limited to the elements in the dashed blue line. The basis for all the activities described in this chapter is the definition of CQAs as described in Ref. [1]. The
* Corresponding author. E-mail address: [email protected] (C. Hakemeyer).
methodologies used comprise risk assessments for the identification of potential Critical Process Parameters (pCPPs), multivariate studies for process characterization, and subsequent linkage studies to define the CPPs with appropriate ranges. All the information gained in these studies is used to define the 'impact' which the manufacturing process can have on the variability of a CQA. This information is an important prerequisite for the definition of the testing strategy as described further in Ref. [2]. Finally, the process and product knowledge gained in these studies can lead to the approval of a Design Space. The approach described here has been used in several development and licensing stage submissions, has been refined since its incorporation in the A-Mab case study ([3] CMC Biotech Working Group. A-Mab: A Case Study in Bioprocess Development. Emeryville, CA: CASSS; 2009), and fulfills the ICH guidelines Q8, Q9, Q10 and Q11 [4–7] for monoclonal antibody manufacturing processes.
http://dx.doi.org/10.1016/j.biologicals.2016.06.004
Fig. 1. Quality by design risk assessment tools assess the process impact and controls on product quality.
1.1. Sequence of process characterization and process validation (PC/PV) studies

After the process development cycle is completed, risk assessments are conducted to identify pCPPs. In these assessments, the potential impact that a variation of a process parameter can have on a certain CQA is assessed for each unit operation of the manufacturing process. Then unit-operation-specific process characterization and validation studies are conducted using both qualified scale-down models and manufacturing scale equipment. Linkage studies are further used to evaluate the cumulative impact that a variation of process parameters in several unit operations can have on the CQAs in the final Drug Substance (DS) or Drug Product (DP). This leads to a systematic and extensive process understanding used to determine critical process parameters (CPPs) and their acceptable ranges. The outcomes of the process characterization and process validation studies then inform the final Attribute Testing Strategy and post approval lifecycle management plan (PALM plan) [2,8]. An overview of all these activities is shown in Fig. 2.

1.2. Definition of a Design Space

The process knowledge gained using the approach described in this chapter is the basis for defining a Design Space. ICH Q8 defines
the Design Space as the multidimensional combination and interaction of input variables (e.g., material attributes) and process parameters that have been demonstrated to provide assurance of quality. In its QbD filings, Roche/Genentech used a further refined and specific definition: the Design Space includes all the unit operations, the process parameters describing the operation of each of the unit operations, and the raw materials used. The Design Space is limited by the acceptable ranges established for all process parameters, including both CPPs and non-CPPs. The Design Space is described in the Common Technical Document (CTD) structure [9] in sections S.2.2 (Description of the Manufacturing Process and Process Controls), S.2.3 (Raw Materials), and P.3.3 (Description of the Manufacturing Process and Process Controls). As stated in the ICH Q8 definition, the Design Space defines the extent to which a manufacturer can introduce changes into the process without pre-approval from Health Authorities. The Design Space claim defines what is acceptable in terms of potential process parameter target changes [10]. It does not describe the edges of failure for the process, and excursions of process parameters outside the Design Space do not automatically lead to unacceptable product quality. In the Description of the Manufacturing Process and Process Controls for a process with a
Fig. 2. Sequence of process characterization and process validation (PC/PV) studies.
design space in the BLA/MAA, parameter ranges, rather than targets and normal operating ranges, are provided, consistent with the regulatory management of change for these elements. Basically, there are no fundamental differences in the process description in the BLA/MAA whether or not a Design Space is claimed, although process parameter ranges may differ between the two cases. Further insights about the Design Space concept are given in Section 7.1. Managing changes within the Design Space enables a more active management of product quality. It allows mean-centering of CQAs using process parameters to account for raw material variability, facility fit and optimization, as well as yield improvements, without increased risk to product quality.

2. Definition of the manufacturing process

The starting point for process characterization and process validation (PC/PV) is a detailed description of each unit operation. Process development is complete when the unit operations, including their process parameter targets, and raw materials are defined. Drug Substance manufacturing processes for monoclonal antibodies (mAbs) have converged to a standard process design and can be divided into upstream and downstream operations [11]. Upstream unit operations in routine manufacturing begin with the thaw of a vial of a working cell bank (WCB) and proceed through synthesis of the MAb in the production bioreactor. Downstream unit operations start with the removal of the cells and include several filtration and chromatography steps to obtain a highly pure bulk Drug Substance. The manufacturing processes for Roche's first two products developed under the QbD paradigm follow this process description (Fig. 3). A typical Drug Product manufacturing process consists of thawing of the frozen bulk Drug Substance. The thawed bulk may be pooled and/or diluted, then filtered to obtain a sterile solution, filled into vials, capped and inspected.
Vials are labeled and packaged and then stored at the recommended temperature (usually 2–8 °C). A similar but more complicated process flow is used for lyophilized products or liquids filled and assembled into devices (e.g., pre-filled syringes) (Fig. 3).
Typical MAb process unit operations have numerous input variables that may affect process performance and product quality attributes. Generally, there are two types of input variables: process parameters and material inputs. Process parameters are variables that can be directly controlled during manufacturing, such as the pH value of a bioreactor or the column flow rate during a chromatography step. Material inputs consist of raw materials, in-process materials, and components (e.g., resins). Material inputs can also cause process and product variability [12]. For example, variability in the raw materials used in media can affect the cell culture process and thereby the mAb CQAs. Relative to process parameters, material input variability is more complicated to control. Therefore, it is important to understand the impact of raw materials and, if necessary, provide control through appropriate raw material specifications.

To characterize individual unit operations, it is necessary to evaluate how they affect the final CQAs in DS or DP. For example, the cell culture process may generate high molecular weight species, which are then effectively removed by the downstream chromatography steps. The CQAs themselves do not become less critical, but they can be demonstrated to be well controlled by the process. For example, whether the impact of process parameters on the level of high molecular weight species (HMWs) in the cell culture supernatant is critical or not has to be evaluated considering the removal capacity of the downstream unit operations. One means to account for this downstream capability is to define target ranges for the CQAs for each unit operation (see Section 5.1). For the DS purification and DP unit operations, these CQAs can often be directly measured in the pool of the unit operation. For the cell culture process, some degree of small-scale purification is generally needed (e.g., by high throughput Protein A chromatography).
In some cases, it may not be possible to directly measure a CQA or a process output in a way that can provide information about the CQA in the pool of a unit operation. For example, the impact of the cell culture process on DS bulk color can only be assessed after several purification steps. Such cases can greatly complicate the execution of process characterization studies for monoclonal antibody processes because additional small scale purification or sample preparation steps are necessary.
Fig. 3. Flow chart of a standard MAb manufacturing process.
3. Designing and establishing qualified scale-down models

Systematic evaluation of the impact of the process parameters on the CQAs of a MAb requires dozens to hundreds of experiments. Scale-down models (SDMs) of unit operations are used because it is impractical to generate the multivariate data to assess parameter criticality at full commercial manufacturing scale. These models have a proven history in predicting the directionality and magnitude of parameter impacts on process performance and product quality [13,14]. Indeed, these models have historically been used as an important part of the PC/PV packages for many licensed biologicals. They are the fundamental cornerstone for site- and scale-independent process characterization.

Because SDMs are used to determine CPPs, as well as the acceptable ranges for the CPPs and non-CPPs that may potentially comprise the Design Space, it is necessary to demonstrate that the SDM is adequately representative of the full-scale cGMP unit operation. There are two general approaches to characterize unit operations using SDMs: characterization of all relevant parameters across their full presumptive acceptable ranges, and use of partial or worst-case designs. A SDM represents the entire unit operation by appropriate miniaturization from full scale. It is designed to represent the physical and (bio)chemical environment at reduced size to examine effects of process parameters and materials. Examples are bioreactor cultures and chromatography columns. When studied using the "partial" or "worst-case" approach, the SDM may represent a specific sub-set of physical and/or (bio)chemical properties of a unit operation, e.g., shear, surface area-to-volume, or cell lysis. It may use miniaturized equipment, or an apparatus imparting a desired force, liquid shear, or stress. This approach is typically used to test worst-case conditions of a subset of input parameters or materials, often in the Drug Product processing unit operations.

3.1. Scale-down models representing an entire unit operation

These SDMs are commonly used for development and characterization of manufacturing unit operations. To ensure a model is representative of the manufacturing scale unit operation behavior, it must be designed and operated using conditions, materials, and process parameters representative of the manufacturing scale process. Key elements to consider in SDM design are:

- Input variables: raw materials and components, feedstock/cell source, environmental conditions. Feedstock/cell source effects must be considered since they can affect the outcome of a unit operation (e.g., seed train for a bioreactor, content of impurities in a chromatography load).
- Design: equipment limitations, operational procedures, parameter control concepts, on- and off-line analytical instruments.
- Output variables: performance and product quality metrics, sample handling/storage, analytical methods.
- Use of sound scientific and engineering principles for scaling.

The objective of the SDM is to match the full-scale unit operation as closely as possible. It is important to understand and/or control for differences between the SDM and full-scale operations (e.g., materials of construction, use of different assays for manufacturing scale and small scale). A fundamental objective of process characterization is to understand the behavior of the process in response to process parameter and material input variation [15]. Therefore, it is critical that the SDM
qualification demonstrates the model is representative of the large scale process. The practicality of operating manufacturing scale runs under off-target conditions is limited, so comparison of the SDM and manufacturing scale results at target conditions is necessary. The SDM qualification, therefore, can be considered an assessment of risk. When the SDM results are equivalent to manufacturing results at target, it is concluded that there is a high probability (i.e., low risk) that the response of the small scale system to input parameter variation is representative of the response that would be observed at manufacturing scale. If the at-target results are not equivalent, this indicates a higher risk (but not a conclusion) that the model may not be sufficiently representative of variation at manufacturing scale. An observation of non-equivalence or potential non-equivalence, therefore, warrants further evaluation of the SDM results. Such further evaluation is necessarily undertaken on a case-by-case basis.

Formal qualification of the SDM includes a statistical test of equivalence between data sets comprising manufacturing scale results and SDM results, with both scales operated at target. The 90% confidence interval (90% CI) for the difference in means is compared to an equivalence limit, or practically significant difference (PSD), by a two one-sided test (TOST). Because the comparison is two-sided, the use of the 90% CI results in a finding of equivalent or not equivalent with a significance level corresponding to p < 0.05. The approach requires a certain minimum amount of data to yield meaningful results; comparing six at-scale runs to six SDM runs can be considered a lower limit for this purpose. In general, the difference in means and the associated 90% CI are calculated from the SDM and manufacturing scale data sets as independent groups.
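As a minimal sketch, the equivalence check described above (a 90% CI for the difference in means of two independent groups, compared against ±PSD) could be implemented as follows. The function name, data handling, and pooled-variance choice are illustrative assumptions; the paper does not provide an implementation.

```python
# Minimal sketch of the TOST equivalence check described above, assuming two
# independent groups of at-target results; names are illustrative.
import numpy as np
from scipy import stats

def tost_equivalence(sdm, mfg, psd, alpha=0.05):
    """Return the difference in means, its 90% CI (pooled-variance t),
    and whether the CI lies entirely within +/- PSD ('equivalent')."""
    sdm, mfg = np.asarray(sdm, float), np.asarray(mfg, float)
    diff = sdm.mean() - mfg.mean()
    n1, n2 = len(sdm), len(mfg)
    # Pooled variance and standard error of the difference in means
    sp2 = ((n1 - 1) * sdm.var(ddof=1) + (n2 - 1) * mfg.var(ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    # Two one-sided tests at alpha = 0.05 correspond to a 90% CI
    t_crit = stats.t.ppf(1 - alpha, df=n1 + n2 - 2)
    ci = (diff - t_crit * se, diff + t_crit * se)
    equivalent = bool(-psd < ci[0] and ci[1] < psd)
    return diff, ci, equivalent
```

With only six runs per scale (the stated lower limit), the CI width is dominated by the pooled variance, which is why small data sets rarely demonstrate equivalence even when the observed offset is small.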
However, when PC/PV models are conducted using manufacturing scale feedstock, the comparison can use the more powerful matched-pairs analysis. It may not be possible to obtain data from at least six manufacturing scale runs at the time of an initial BLA or MAA filing. In these cases it is difficult to achieve a high level of confidence about the 'true mean' of the manufacturing scale. Using additional pilot scale data or data from earlier clinical manufacturing is one option to overcome this issue.

As illustrated in Fig. 4, adapting the methodology described by Ref. [16], the observed difference between a SDM data set and a manufacturing scale data set can be classified into four different equivalence levels with two main cases:

1.a The difference in means lies within the equivalence limits (or practically significant difference, PSD, described below) and the confidence interval for the difference in means is completely contained in the PSD range. The SDM result is considered to be "equivalent".
1.b The difference in means lies within the equivalence limits but the confidence interval for the difference in means partly overlaps the PSD limits. Equivalence is likely ("equivalent-in-sample-mean-only"), but is not demonstrated.
2.a The difference in means lies outside the equivalence limits and the confidence interval for the difference in means partly overlaps the PSD limits. The SDM result is more likely to be non-equivalent ("failed to be equivalent").
2.b The confidence interval for the difference in means is entirely outside the PSD range. The SDM result is considered to be "not equivalent".

The equivalence testing approach requires the determination of PSDs to enable its implementation. For each chosen output, PSDs that indicate what is considered to be a true difference are effectively the limiting values for an acceptable difference between the
Fig. 4. Possible outcomes of the two one-sided test.
data sets. Differences in product quality metrics within the PSDs are considered not relevant for the purpose of the qualification of the SDM. For most CQAs the relative level of a PSD will be consistent across unit operations of a manufacturing process. For CQAs with significant changes across the process (e.g., removal of host cell protein), PSDs that are customized to be appropriate for the individual unit operations need to be defined. PSDs are relevant for an "engineering" evaluation of the fidelity of the model to large scale, i.e., from a process engineering perspective. This is independent of the magnitude of effect on an attribute that might be relevant to patient safety or efficacy. Therefore, when discussing results that might be "not equivalent", it is relevant to note whether the observed offset is truly meaningful for patient safety or efficacy. An appropriate justification of the PSDs is important and can also be a topic of long debate, since there are few generally established rules or guidance for the definition of PSDs [17]. The starting point for defining a PSD is a range of two standard deviations of the manufacturing scale process for each CQA. However, there are a variety of circumstances where this value is not suitable or appropriate.

An equivalent outcome of the TOST (case 1.a) implies a high likelihood of representative behavior of the SDM relative to full scale. However, in some situations, it will not be possible to reach full equivalence for every process output of a unit operation. In these cases (1.b, 2.a, and 2.b), an individual assessment of the implications of the observed offset in the output is necessary. It is important to reiterate that the objective of model qualification is to demonstrate that the SDM is suitably representative of manufacturing scale behavior for the attribute it is requested to represent or simulate.
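The four TOST outcomes referred to above reduce to a simple decision rule given the difference in means, its 90% CI, and the PSD. The following sketch is illustrative only (function name and labels are assumptions, not taken from the authors' analysis):

```python
# Illustrative mapping of a TOST result onto the four equivalence levels
# described in the text; names and labels are assumptions for this sketch.
def classify_equivalence(diff, ci_low, ci_high, psd):
    """Classify using the difference in means and its 90% CI vs. +/- PSD."""
    ci_inside = -psd <= ci_low and ci_high <= psd   # case 1.a
    ci_outside = ci_high < -psd or ci_low > psd     # case 2.b
    mean_inside = -psd <= diff <= psd               # separates 1.b from 2.a
    if ci_inside:
        return "1.a equivalent"
    if ci_outside:
        return "2.b not equivalent"
    if mean_inside:
        return "1.b equivalent-in-sample-mean-only"
    return "2.a failed to be equivalent"
```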
"Representative behavior" in this context means that the output measured in the SDM changes in the same direction and approximate magnitude in response to process parameter variation as the output at manufacturing scale would change in response to the same process parameter variation. An observed difference that cannot be statistically determined to be within the PSD (i.e., TOST cases 1.b, 2.a and 2.b) indicates a potential risk that the SDM will not have this desired representative behavior. Generally, it is unusual to have sufficient replication of off-target conditions at full scale for a statistically robust confirmation of the similarity in output change at the different scales. Therefore, scientific understanding of an observed offset and its stability over time, in addition to any available off-target full-scale testing, add incrementally to the totality of evidence that an offset does, or does not, imply the model may not have representative
Fig. 5. Scale-down model comparisons for affinity chromatography pools.
behavior. In some cases there might be a clear scientific understanding of the offset and the reasons for it (e.g., sample handling, exposure to light, etc.) which supports that the change in an output remains representative of the manufacturing scale behavior despite the offset observed under at-target conditions. The degree of reliance on data with an offset should be proportional to the degree of confidence in its accuracy, i.e., not an all-or-nothing acceptance or rejection of model-derived information. Generally the quality of the SDM should also be taken into consideration when defining the testing strategy for a CQA (see also Ref. [2]). It is expected that the majority of the CQAs do not show an offset. As an example, for the cell culture process and purification chromatography steps of mAb A, approx. 50 TOSTs were performed and only 5 did not show full equivalence.
3.2. Example of a scale-down model qualification

To illustrate the SDM qualification approach with a practical example, Fig. 5 shows the results of the TOST for the first chromatography step of the mAb A manufacturing process. The figure shows that, with the exception of High Molecular Weight (HMW) forms, DNA and leached protein A, all CQAs show equivalence. This SDM was used for PC/PV studies of the affinity chromatography unit operation, and the offsets of these three non-equivalent CQAs were addressed as follows:

- High Molecular Weight forms (HMWs): For HMWs the confidence interval of the difference in means overlaps the PSD ("equivalent-in-sample-mean-only"). However, the difference in means between manufacturing scale and SDM was small (0.2 area-%), and no relevant influence of the investigated process parameters was detected in the PC/PV studies.
- Leached protein A (LPA): LPA failed to be equivalent. The reason was determined to be the difference in cycle numbers between manufacturing scale and small scale, since leaching is more pronounced during earlier cycles. Since it would be a substantial operational complication to control this variable for the qualification of SDMs, it must be accepted that a formal demonstration of equivalence is not possible for this CQA in this
unit operation. Taking the cycle number into account, the SDM would achieve an equivalent result. Based on this understanding of the source of the observed offset, it is expected that the response to variation in process parameters will be representative of the manufacturing scale operation.
- DNA: Host-cell DNA also failed to be equivalent. However, it is known that genomic DNA is very unstable and strongly adsorbs to surfaces. This causes a large variation of DNA levels in the load of the affinity chromatography column. In addition, even slight variability during process pool handling can cause a significant impact on DNA levels. The variations that were observed between scale-down runs and manufacturing scale runs are therefore caused by DNA instability and not by a scale effect. So for this CQA, a formal demonstration of equivalence is not possible for the affinity chromatography step. However, the SDM was considered suitable to determine the direction and effect magnitude of process parameter variations on DNA removal, although interpretation of results needed to take into account the variation of DNA levels in feedstocks. To compensate for the remaining uncertainty with regard to absolute DNA levels, additional spiking studies were performed for the anion- and cation-exchange chromatography steps. Based on these measures, and the evaluation of DNA level changes at manufacturing scale, it was demonstrated that the process is very robust concerning the removal of DNA (see also Section 6, Process impact).
3.3. Partial scale-down models for "worst-case" studies

Certain unit operations can be sufficiently characterized without modeling the full unit operation. These operations typically involve only physical manipulation of the bulk solution where the process stresses are well understood (e.g., mixing or filtration), parameter optimization is not required, and quality attributes are expected to be unchanged (e.g., most DP unit operations). For these cases, it is sufficient to develop representative small-scale models based on engineering principles and scale-independent parameters (e.g., dimensionless numbers) that reproduce the relevant product stresses encountered at commercial scale. Qualification of partial models is based upon adequately describing and justifying the engineering principles and the selection of materials used in their design. The models are not intended to mimic manufacturing scale processes under target conditions, so a comparison of performance between target data sets from manufacturing scale and the model is not meaningful. Examples include certain types of stirring vessels that investigate shear forces on DP solutions far exceeding actual manufacturing conditions, or light chambers for the assessment of the impact of light on filled DP vials.

4. Identification of potential critical process parameters by risk ranking and filtering

Even with the use of qualified SDMs, it is impractical to experimentally evaluate the impact of all process parameters on the CQAs. Since a typical MAb manufacturing process has more than 500 process parameters across the different unit operations, it is necessary to focus experiments on the process parameters that are reasonably likely to impact CQAs. This is done by risk assessments using a risk ranking and filtering (RRF) methodology to identify potential CPPs (pCPPs). The approach to this assessment has evolved substantially at Roche/Genentech.
For example, the methodology used for the first QbD filing ranked the potential impact a process parameter could have on a CQA both alone and via interaction(s) with other process parameters. In contrast, the methodology used for mAb A was based on an FMEA, ranking
potential severity, occurrence and detectability of variation from target for a process parameter at manufacturing scale. Based on the experience with these two projects, Roche/Genentech has developed a novel risk ranking and filtering procedure for identification of pCPPs and material attributes. Independent of the method used, it is important to note that conducting the risk assessment is a multi-disciplinary effort involving participants from development, manufacturing, analytics and quality assurance. Careful documentation of the risk assessments, including participants and rationales for the scoring, should be performed.

The methodology analyzes each process input based on three assessments:

1) severity of impact on process output (CQAs and Key Performance Indicators (KPIs)), including consideration of interactions, based on estimates using relevant data;
2) certainty of the information used to inform the severity ranking;
3) an additional score based on the potential for a process input to exceed the characterization range (i.e., an excursion), assigned by comparing the planned characterization range against the expected normal operating range of the large scale process (i.e., process input control capability).

Table 1 gives an overview of the definition and scores for each of these three categories. As a first step, it is necessary to define which process outputs (CQAs and KPIs) can potentially be affected by individual unit operations [18]. Additional quality attributes or key performance indicators may also be considered if they are metrics for process consistency (e.g., glycan distribution for an antibody without effector function as part of its mechanism of action).
The ranking of the severity score is based on product specific knowledge gained during the earlier process development phase of the product, but it also relies on platform knowledge generated during the development and characterization of previous MAb manufacturing processes. Risk scores are generated for severity and severity certainty for each process input and are compared against threshold values to determine if the process input should be included in characterization studies (see also Table 2). Threshold values are based on the premise that process inputs should be included in characterization studies if variation across their characterization range has the potential to cause variation in a process output which is detectable, i.e., outside its assay variability. The capability assessment can only indicate that an otherwise low-risk process input should be included in characterization studies: process inputs with a low capability score are also included, whereas a high degree of control capability does not remove a process input from inclusion in characterization studies.

One important aspect of the risk ranking and filtering methodology is that it requires the definition of a 'planned characterization range' for the capability evaluation and a comparison against the normal operating range (NOR) of the process. In this way it can be avoided that process parameters are falsely classified as non-CPPs because of overly tight characterization ranges. Roche/Genentech's experience shows that this can be a key concern of health authorities during the review of a license application. The definition of NORs is not a trivial exercise and, likewise, the selection of planned characterization ranges must be carefully considered. Roche/Genentech has invested a significant amount of time and effort in the definition of NORs for parameter control that are relevant to Roche/Genentech's network of Drug Substance manufacturing facilities.
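The capability comparison above can be sketched as a simple ratio check against the 2.0 threshold given in Table 1. The function name and the representation of ranges as (low, high) tuples are assumptions for illustration:

```python
# Sketch of the capability scoring described above: ratio of the planned
# characterization range width to the NOR width, threshold per Table 1.
def capability_score(char_range, nor):
    """Score 2 (high capability) when the characterization range is at
    least twice as wide as the NOR, else 10 (low capability: the input
    is included in characterization studies)."""
    char_width = char_range[1] - char_range[0]
    nor_width = nor[1] - nor[0]
    return 2 if char_width / nor_width >= 2.0 else 10
```

For example, a bioreactor pH characterized over 6.6–7.4 against a NOR of 6.9–7.1 gives a ratio of 4.0 and hence a high-capability score.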
In general, the characterization ranges should be at least two to three times wider than the expected NORs. This reflects the fact that for process parameters, the controllability influences the criticality assessment of the parameter. However, if the characterization range is set “too wide”, nearly every process parameter can become critical. If the range is set too small (even for
C. Hakemeyer et al. / Biologicals 44 (2016) 306–318
Table 1
Risk scores for the RRF tool used to identify pCPPs.

Severity (descriptions, definitions and scores):
- No impact (score 2): Variation in process input across the proposed characterization range alone, or if affected by an interaction, causes variation in process output which is not expected to be detectable (e.g., no effect or within assay variability).
- Minor impact (score 6 a): Variation in process input across the proposed characterization range alone, or if affected by an interaction, causes variation in process output which is detectable but expected to be within the expected output range.
- Major impact (score 10): Variation in process input across the proposed characterization range alone, or if affected by an interaction, causes variation in process output which is expected to be outside the expected output range.

Certainty (descriptions, definitions and scores):
- High (score 2): Product-specific process development data available.
- Medium (score 4): Platform data available; generally accepted scientific principle.
- Low (score 6): No public, platform or product-specific information (e.g., novel product quality attribute), or data available only as published external literature for a related molecule or relevant mechanism (e.g., for impurity clearance).

Capability (descriptions, definitions and scores):
- High (score 2): Proposed characterization range/Normal Operating Range ≥ 2.0, or capability to precisely control the process input is expected to be high, as determined by a subject matter expert.
- Low (score 10): Proposed characterization range/Normal Operating Range < 2.0.

a May be reduced to 2 if variation in output is not practically significant, as determined by an SME evaluation and documented in the risk ranking report.
Table 2
Risk ranking thresholds.

- Severity score: ≥ 6, considering the highest Severity score for the process input.
- Certainty score: ≥ 12, considering the highest result for the process input.
- Capability score: 10.

A process input is included in studies if any assessment meets its threshold criteria.
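The inclusion logic of Tables 1 and 2 can be sketched as a short filter. This is an illustrative reading, not the actual Roche/Genentech tool: in particular, the interpretation of the Certainty threshold of 12 as the product of the Severity and Certainty scores for a process input is an assumption.

```python
# Sketch of the risk ranking and filtering (RRF) inclusion logic from
# Tables 1 and 2. Score values follow Table 1; treating the Certainty
# threshold (>= 12) as Severity x Certainty is an assumption, since the
# exact combination rule is not spelled out in the tables.

SEVERITY = {"no_impact": 2, "minor_impact": 6, "major_impact": 10}
CERTAINTY = {"high": 2, "medium": 4, "low": 6}
CAPABILITY = {"high": 2, "low": 10}

def include_in_characterization(severity: str, certainty: str, capability: str) -> bool:
    """Return True if the process input is included in characterization studies.

    A process input is included if ANY assessment meets its threshold:
      - Severity score >= 6 (minor or major impact), or
      - Severity x Certainty >= 12 (assumed combination rule), or
      - Capability score == 10 (characterization range < 2x NOR).
    """
    s, c, p = SEVERITY[severity], CERTAINTY[certainty], CAPABILITY[capability]
    return s >= 6 or s * c >= 12 or p == 10

# A severe but well-understood, well-controlled input is still studied:
include_in_characterization("major_impact", "high", "high")   # True
# A low-severity, highly certain, well-controlled input is filtered out:
include_in_characterization("no_impact", "high", "high")      # False
```

Note how the capability assessment can only add inputs: a low-capability score forces inclusion, while a high-capability score never removes an input that severity or certainty has already flagged.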
a well-controlled parameter), there is a certain risk that CPPs may be 'overlooked'. Therefore, the definition of these characterization ranges is of key importance. For BLA/MAA submissions, the justification of characterization ranges has proven to be of vital importance, and the description of these ranges in the process development and/or process validation sections of the dossier explicitly needs to include information about the NORs of the manufacturing process.
5. Characterization of the unit operations

Experimental process characterization studies are conducted with the objective of improving process understanding of each unit operation and defining critical process parameters and their acceptable ranges. Characterization and subsequent validation studies are performed using qualified scale-down models (SDMs). Each unit operation is assessed in the context of raw materials whose variability may be critical. After the risk ranking and filtering, the list of process parameters which must be evaluated may be extensive, and therefore efficient experimental designs are desirable. Generally, characterization studies are statistically designed multivariate experiments (DoEs) that should provide a clear interpretation of the impact of the input process parameters evaluated. Univariate studies may also be conducted to supplement multivariate data, if there is a sufficient degree of assurance that there are no meaningful interactions of process parameters, and to support univariate excursions. Evaluation of raw material variation should also be performed. This may be fulfilled either by including raw material variation as a factor in multivariate or univariate experiments or, because of practical difficulties of this approach (e.g., limited variation in raw materials available during development), by a retrospective or other observational analysis of raw material
lots used during development and/or process characterization studies. Experimental designs may be sequential, starting with lower resolution screening designs, which are followed by higher resolution designs that enroll fewer, high-impact factors, as for example described in Ref. [19]. Alternatively, a large experimental design may be executed in multiple experimental blocks. Multivariate experiment results are then analyzed statistically by multiple linear regression for each process output (i.e., KPI and CQA). It is also important to assess the raw data outputs and regression models of these studies by the use of statistical criteria such as:
- Coefficient of variation of the process data, to assess the overall variability in the data.
- Overall analysis of variance (ANOVA) for the multiple regression models, to compare the full regression model to a model representing the mean (without factor variables). The p-value gives the probability that the given data is observed when no factor has an impact on the response. If this probability is less than 0.05, it can be concluded that there is at least one factor with an impact on the response.
- Coefficient of determination R2 of the multiple regressions, to indicate the proportion of variability explained by the model. Values close to 1 indicate sufficient coverage of variability by the model.
- Coefficient of prediction Q2, to describe the predictive capability of the model. Q2 is based on a cross-validation type method, the prediction error sum of squares (PRESS) [20,21]. Models with sufficient predictive capability have small values of PRESS compared to the total sum of squares; Q2 values close to 1 indicate perfectly predictive models.
- Lack of fit test, to compare the residual model variance to a pure error estimate of variance. If the model variance is substantially larger than the pure error estimate, then the model may not be adequate. A p-value less than 0.05 indicates there might be a lack of fit and the model may not be adequate. The pure error is estimated from replicated observations.
- Actual by predicted plot, which compares the predicted values calculated by means of the model (x-axis) to the observed values (y-axis). In a perfect model, all observed values equal the predicted values.
There may be cases when the statistical criteria indicate 'poor' predictive performance. However, in some situations, this can
actually mean that the studied process parameters simply do not have a meaningful impact on a CQA across the parameter ranges studied, and that residual variation is greater than the variation imparted by any of the studied process parameters. It is important to assess the overall variability of the data and compare it against the variability of multiple manufacturing scale runs. Large variability that cannot be explained by the variation of process parameters may suggest an unidentified source of variability which should be further investigated. If the variability observed in the data of the small scale studies is comparable to the variability observed in the manufacturing scale batches, this indicates a robust process that is not impacted by process parameter variations. Independent of the quality of the statistical model, a low overall output variability can indicate a very robust manufacturing process within the characterized range. The same is true if all measured output values of a CQA product variant are very small (e.g., below 1.0%).

5.1. Critical quality attribute target ranges (CQA-TRs)

The results of the unit operation's PC/PV studies are compared against CQA Target Ranges (CQA-TRs). The CQA-TR is the allowable range for the unit operation's output that ensures acceptable product quality is obtained in the final Drug Product. CQA-TRs are typically derived by narrowing (~5%) the CQA-AC (please refer to Ref. [2]). This CQA-TR strategy was implemented to reduce the risk that the SDM results do not accurately predict manufacturing scale results if the manufacturing scale process were run at the edges of the Design Space. The use of a CQA-TR or similar narrowing of the CQA-AC range for process design helps to ensure that a result within the CQA-AC is consistently achieved.
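As a rough illustration of this narrowing, the derivation of a CQA-TR from a CQA-AC might be sketched as follows. Function names and the exact placement of the 5% narrowing are illustrative assumptions based on the description in this section:

```python
# Sketch of deriving a CQA Target Range (CQA-TR) from a CQA acceptance
# criterion (CQA-AC) by the minimum 5% relative narrowing described in
# the text. Names are illustrative; actual calculations may differ per CQA.

def one_sided_cqa_tr(upper_ac: float, narrowing: float = 0.05) -> float:
    """One-sided limit: reduce the single limit by 5% (relative)."""
    return upper_ac * (1.0 - narrowing)

def two_sided_cqa_tr(lower_ac: float, upper_ac: float, narrowing: float = 0.05):
    """Two-sided range: move each limit inward by 2.5% of the AC width,
    for a total reduction of the full range by 5%."""
    delta = (upper_ac - lower_ac) * narrowing / 2.0
    return (lower_ac + delta, upper_ac - delta)

one_sided_cqa_tr(3.0)          # upper HMW limit of 3.0 area-% -> ~2.85 area-%
two_sided_cqa_tr(90.0, 110.0)  # hypothetical 90-110% range -> ~(90.5, 109.5)
```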
For CQAs that do not change during processing or storage, the CQA-TR is directly derived from the CQA-AC, which is defined based on the acceptable level of the CQA at the Drug Product end of shelf life (please refer to the Control Strategy chapter). In this case, the CQA-TR is calculated as a minimum 5% relative narrowing of a CQA release criterion (i.e., for a one-sided CQA-TR the reduction is 5%; for a two-sided CQA-TR the reduction is made by changing each limit by 2.5%, for a total reduction of the full range by 5%). For CQAs that change from unit operation to unit operation during the process or on stability, a CQA-TR must be defined which is appropriate for the output of the individual unit operation. The CQA-TR must be defined based on data or a relevant rationale, which may include changes expected from subsequent unit operations operated at the target parameter settings. This takes into account the capability of downstream unit operations, for example for the removal of process-related impurities. In some cases, spiking studies may be performed to demonstrate that some chromatography steps have removal capacity beyond what is routinely observed in the process, and the results of these studies can be used to define CQA-TRs of upstream unit operations. If the results of the unit operation meet the CQA-TRs, the investigated process parameter ranges can be considered acceptable parameter ranges for the unit operation. However, in order to claim these ranges as a Design Space, additional confirmatory studies are often conducted, as described in the chapter 'Linkage Studies'.

5.2. Impact ratio assessment use in defining critical process parameters

The results of the unit operation characterization studies are used to define the critical process parameters of the manufacturing process. A quantitative metric, the Impact Ratio, is calculated:
Impact Ratio = Parameter Effect / |Process Mean − CQA-TR|
where the Parameter Effect is the expected change in a CQA result when the parameter is changed from the midpoint of its acceptable range to the limit of its acceptable range. The Process Mean term represents the mean CQA value when the manufacturing scale process is operated at target. The CQA-TR term is the CQA-TR limit (upper or lower) closest to the Process Mean. As described above, the numerator term is generally evaluated using experiments with the qualified model. Conceptually, the numerator represents how much a CQA may vary when a process parameter is moved to the edge of its proposed acceptable range, while the denominator represents how much CQA variability is allowed before the CQA approaches its CQA-TR. A graphical representation of this concept is displayed in Fig. 6. Process parameters are ultimately categorized as high-impact CPPs, low-impact CPPs, or non-CPPs, in order to better characterize their relative criticality. For each parameter, an Impact Ratio is calculated for each CQA. The highest Impact Ratio is compared to the following criteria:
- High-impact CPP: Impact Ratio > 0.33
- Low-impact CPP: 0.10 ≤ Impact Ratio ≤ 0.33
- Non-CPP: Impact Ratio < 0.10
The criticality threshold at an impact ratio of 0.10 was selected to balance identifying parameters with a meaningful effect while avoiding classifying parameters with only minor effects as critical: it would take at least 10 such parameters, all impacting the same CQA and all simultaneously operating at the worst-case limit of their ranges, to cause a failure in that CQA. In practice, this is highly unlikely. Calculation of the Parameter Effect term in the Impact Ratio calculation assumes that process characterization studies have been conducted which permit estimation of a linear statistical model. The linear statistical model is typically a polynomial including terms for an intercept, main effects, two-factor interaction effects, and curvature.
Which terms can be estimated depends on the design of the process characterization experiment. Model coefficients from this statistical model estimate main, interaction and curvature effect sizes for each parameter. In general, model coefficients from the 'full' model, the function containing all the estimable terms, are used as parameter effect size estimates. Coefficients from a 'reduced' model, a function with terms removed to improve the overall predictive power of the model, may also be used. Higher order effects (e.g., 3-factor interactions) are not included, due to a low probability that such effects are practically significant, unless otherwise justified by a known higher order effect. Statistical process models can be based on the following response quantities:
- the absolute value of a CQA response in the output pool of the unit operation,
- the difference (by subtraction) in CQA results between the unit operation input pool and output pool,
- the ratio or log change in CQA results between the unit operation input pool and output pool.
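As an illustration of such a model and of the R2/Q2 criteria described earlier, the following sketch fits a main-effects-plus-interaction polynomial to a small coded DoE data set. The factor names, effect sizes and noise level are all invented for illustration:

```python
import numpy as np

# Sketch: fit y = b0 + b1*x1 + b2*x2 + b12*x1*x2 to coded (+/-1) DoE data
# and compute R-squared and Q-squared (via PRESS). Data are invented.
rng = np.random.default_rng(0)
x1 = np.array([-1, -1, 1, 1, -1, -1, 1, 1, 0, 0], float)  # e.g. coded pH
x2 = np.array([-1, 1, -1, 1, -1, 1, -1, 1, 0, 0], float)  # e.g. coded load
y = 1.0 + 0.30 * x1 + 0.05 * x2 + rng.normal(0, 0.02, x1.size)  # CQA response

X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])  # model matrix
beta, *_ = np.linalg.lstsq(X, y, rcond=None)              # effect estimates
resid = y - X @ beta
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - np.sum(resid ** 2) / ss_tot                    # proportion explained

# Q2 from PRESS: leave-one-out residuals via the hat matrix diagonal.
hat = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
press = np.sum((resid / (1.0 - hat)) ** 2)
q2 = 1.0 - press / ss_tot                                 # predictive capability

print(f"effects: {beta[1:].round(3)}, R2={r2:.3f}, Q2={q2:.3f}")
```

Here both R2 and Q2 come out close to 1, because the simulated effects are large relative to the noise; Q2 is always at most R2, since PRESS inflates each residual by its leverage.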
Fig. 6. Illustration of impact ratio calculation.
The quantity modeled must be appropriate to the data set available (e.g., the availability of input pool results), the magnitude of change across a unit operation, and the expected mechanism of change in CQA output in the unit operation. When interaction and curvature effects are observed in the statistical analysis, these are incorporated into the Parameter Effect term to account for the more complex behavior of the parameter. For parameters studied only under worst-case conditions, the parameter effect (impact ratio numerator) is equal to the difference in quality attribute value between the stressed material and the unstressed or control material. If this difference is within assay precision, then the parameters are not considered CPPs even if the impact ratio is above 0.10. Calculation of the impact ratio denominator is as described previously. It is well recognized that the width of the range over which the process parameters are varied while studying their criticality has a direct influence on the ability to observe CQA impact and, hence, to identify CPPs. According to the definition used by Roche/Genentech, non-CPPs are parameters that do not show a relevant impact on a CQA within their characterized ranges. A short example can help to illustrate the calculation of the impact ratio. A drug substance process may yield a mean result for HMWs of 1.0 area-%. Assuming a CQA-TR for HMWs of 3.0 area-% leads to a denominator of 2.0 area-% in the impact ratio formula. In this scenario, any shift of a process parameter to the limit of its acceptable range that leads to an increase in HMWs of 0.2 area-% or more would cause that parameter to be classified as a CPP. The dilemma here is that it is neither practical nor an efficient prioritization of resources to characterize the edges of failure for all parameters, or to study ranges that are some arbitrary factor wider than the intended operating range, in order to assure that parameters have no potential critical impact.
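The worked HMW example above can be reproduced with a short sketch; the values are taken from the example in the text, and the classification thresholds from earlier in this section:

```python
# Sketch of the Impact Ratio classification, using the HMW example from
# the text: process mean 1.0 area-%, upper CQA-TR 3.0 area-%.

def impact_ratio(parameter_effect: float, process_mean: float, cqa_tr_limit: float) -> float:
    """Expected CQA change at the edge of the acceptable range, relative to
    the distance from the process mean to the nearest CQA-TR limit."""
    return abs(parameter_effect) / abs(process_mean - cqa_tr_limit)

def classify(ratio: float) -> str:
    if ratio > 0.33:
        return "high-impact CPP"
    if ratio >= 0.10:
        return "low-impact CPP"
    return "non-CPP"

r = impact_ratio(0.2, 1.0, 3.0)          # 0.2 / |1.0 - 3.0| = 0.10
classify(r)                              # 'low-impact CPP' -- the CPP boundary
classify(impact_ratio(0.05, 1.0, 3.0))   # 'non-CPP' (ratio 0.025)
```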
This has already been mentioned in the description of the risk ranking and filtering tool above. A clear description of the manufacturing capability in terms of NORs and their relationship to the acceptable ranges is of key importance.

5.3. Worst-case linkage assessment

The strategy for worst-case experimental linkage is to perform the unit operations that influence a CQA in series at their worst-case conditions and assess the cumulative, process-wide impact on the CQA output. By demonstrating acceptable product quality in the final process pools from these linkage experiments, validity of the Design Space across the entire process can be confirmed. Process parameters included in the worst-case linkage studies are identified using the results of the individual unit operation PC/PV studies. All high- and low-impact process parameters are included in the linkage study. The directionality of process parameter settings is based on the worst-case prediction of the regression models. Each CQA that is relevantly influenced by more than one unit operation is assessed in a linkage study experiment. For a CQA which is impacted by a single unit operation (e.g., glycan distribution), worst-case runs of that single unit operation can be considered the "worst-case linkage study" for that CQA. Some unit operations are inherently robust and do not contain process parameters which impact CQAs (e.g., early inoculum train stages, virus filtration). In general, these unit operations are operated under target conditions in a linkage study. However, unit operations and/or process parameters for which clearly identified worst-case conditions could not be established (potentially due to a "flat" response in PC/PV studies) may be included at their theoretical worst-case conditions if there is a strong rationale for assigning the worst-case conditions based upon platform knowledge or established principles.
Examples of these types of parameters are: 1) impurity levels in pools are below the limit of quantitation of the assay and, therefore, parameter effects could not be explicitly determined in the unit operation's characterization study; 2) parameter settings are based on a mechanistic understanding of the unit operation; and/or 3) the parameter was tested at a known worst-case during process characterization (e.g., not in a DoE) and, therefore, no estimate for an individual parameter effect is obtained. For the linkage studies, these parameters are tested at their theoretical or known worst-case for selected quality attributes to ensure the process is challenged appropriately. An in-process pool hold time condition with impact on a CQA may not need to be experimentally included in the linkage study if the results from the hold time-and-temperature conditions are mathematically added to the final Linkage Study result. This approach requires the characterization of the kinetics of CQA changes during in-process pool holds. An example summary of linkage studies for a variety of CQAs is given in Fig. 7, which illustrates some of the concepts described
above. Unit operations with a colored box are included at their experimentally determined (red boxes) or theoretical (yellow boxes) worst-case conditions. Unit operations with an uncolored box are performed at target for the linkage experiment of the given CQA. Additionally, an 'in silico' linkage may be performed with the data from the individual unit operation PC/PV studies by adding up the worst-case outputs or changes for a respective CQA across all unit operations that affect this CQA. Comparable results between this calculation and the experimental linkage study greatly increase confidence in the outcomes of the unit operation PC/PV studies, because they can be interpreted as an external validation of the data. Linkage Study outcomes are compared to the CQA-TR. A Linkage Study result within the CQA-TR is indicative of a robust process, even under worst-case conditions across the entire process, and the tested process parameter acceptable ranges are acceptable without further restriction. Linkage Study results outside the CQA-TR indicate that the cumulative impact of operating unit operations at worst-case may have an adverse impact on product quality and, therefore, the combination of the acceptable process parameter range values may require further restriction. In general, such a restriction can be made in two ways: 1) narrowing the acceptable range of one or more process parameters in one or more unit operations in order to prevent the CQA from exceeding its CQA-TR, or 2) utilizing the understanding of parameter effect magnitudes and interactions to disallow future use of those combinations of process parameter values which produce unacceptable results.
The latter option permits greater operational flexibility in manufacturing, but requires the definition of a relationship between the process parameter values impacting the CQA, as well as clear processes that assure appropriate management of cumulative or sequential process parameter target changes within the PALM plan [8]. The characterized multivariate acceptable ranges, with or without restriction, that deliver product quality within the defined CQA-TR may then define the limits of a Design Space. It is important to note that this establishment of limits is necessary both for parameters that have been identified as CPPs and for non-CPPs. Roche's current concept of Design Space, which has been approved by all major Health Authorities, is closely aligned with the overall process descriptions provided in sections S.2.2 and P.3.3 of the CTD. This creates a clear picture of what can be considered regulatory change relevant,
and it is a step forward in terms of enabling continuous improvements to process parameter targets with reduced regulatory oversight while maintaining a high degree of sustained product quality.

6. Connecting to the testing strategy – definition of process impact scores

Using QbD principles, a Design Space based on qualified models, qualified materials, risk assessments, and characterization and validation studies supplemented with linkage studies was provided for MAb manufacturing processes. Using the knowledge and data derived from establishing a robust Design Space, an equally robust Attribute Testing Strategy (ATS) can be implemented. While the Acceptance Criteria of the CQAs, and the CQA-TRs derived from them, limit the acceptable ranges of the process parameters and thereby the Design Space, the results of the process characterization studies also impact the testing strategy, because the ATS scoring tool requires the definition of a process impact score. The process impact score represents an estimation of the residual risk that a CQA could exceed its acceptable range, as defined by the CQA-TR, when the process is operated within its acceptable ranges. The methodology passes through filters for quality attribute abundance and for having a representative scale-down and process model, and concludes with a score assigned based on how robustly the process controls the quality attribute, as determined by comparing the worst-case linkage result to the CQA-TR. Further details and examples are described in Ref. [2].

7. Lessons learned and future opportunities

7.1. Lessons learned

The development of the Roche/Genentech QbD approach was a long journey with many iterative steps, starting from FDA's QbD pilot program and continuing through the health authority interactions during the first two QbD filings for monoclonal antibodies. The whole approach was new for the company as well as for the health authorities.
The novelty of, and interconnections between, elements of the Testing Strategy introduced complexities whose practical consequences took time to understand. However, the experience reinforced that the first time through is always the hardest. Future projects will profit from the methodologies
Fig. 7. Example summary of linkage studies.
developed for identifying pCPPs and CPPs and for defining acceptable ranges for process parameters and the Design Space. The approach also provides a standardized framework that assures a harmonized approach to process validation and characterization, which is essential for a large organization with multiple late stage projects in the pipeline. This benefits the company through better predictability of planning and resources, and also Health Authorities, who know better what to expect in a BLA/MAA. Because careful and thorough process characterization should be the goal for any newly developed product, the majority of the underlying principles of the QbD approach (e.g., risk assessment, statistically designed multivariate experimentation) have become a baseline expectation. Still, introducing the QbD approach to PC/PV into an organization required an understanding of risk assessments, statistics and experimental design that was new for many members of the organization. However, once this understanding is ingrained, the effort for subsequent projects will be much smaller. The opportunity of the QbD approach lies in the inherent robustness of the process and in the quality of the information gathered to demonstrate this, independent of whether a Design Space is claimed or not. Following the QbD approach strongly connects the process to its output CQAs. This can also be a challenge, because late changes to CQAs and/or their acceptance criteria can significantly impact the outcome of the PC/PV studies. It is important to get the right information at the right time, but also to be able to react flexibly if changes occur late in the program. For example, a list of potential CQAs and preliminary CQA-ACs is needed for risk ranking and filtering for pCPPs and for a rational experimental design and evaluation. Last-minute changes of CQAs/CQA-ACs can have a significant impact on the Design Space and CPPs.
To accommodate the risk of CQAs that are identified only shortly before the filing of a BLA/MAA, it is important to be able to analyze these CQAs in the samples from the already conducted PC/PV studies. Late changes in CQA-ACs can impact the acceptable ranges of process parameters and their classification as CPPs or non-CPPs. Such changes can even occur after submission of the license application, through Health Authority interactions during review. Approval of a Design Space does not obviate the need for change control and a robust Quality System; internal control and governance of changes within the Design Space are still required. However, an approved Design Space is anticipated to reduce the Health Authority notification-related workload and, especially, the implementation time for post-approval changes to process parameter targets within their acceptable ranges. Such changes may be desired to optimize a manufacturing site schedule (e.g., culture and hold time durations) or to increase process yield, robustness or product quality (e.g., changing a parameter to increase yield or robustness that requires a compensatory change in another parameter to keep a product quality attribute "mean-centered"). The Design Space also allows a response when 'process drifts' are observed, which could be caused, for example, by changes in raw materials that cannot be actively controlled, in order to keep the product quality at the intended level. Such changes within an approved Design Space do not require Health Authority notification or pre-approval. For site transfers and for products manufactured at multiple sites, an approved Design Space is anticipated to provide justification for the acceptability of facility-fit changes/differences which require parameter target differences within their acceptable ranges (e.g., inoculation densities, feed addition times, culture and hold durations, bed height, pooling criteria and conductivity targets).
It is recognized that a site transfer will require a regulatory submission, but the degree of risk could be justified as lower if changes/
differences between sites were within the approved Design Space. This lower risk may justify faster Health Authority review and/or obviate the need for a pre-approval inspection. In summary, managing changes within the Design Space enables a more active management of product quality. It allows mean-centering of CQA results using process parameters to account for raw material variability, facility fit and optimization, as well as yield improvements, without increased risk to product quality. It has to be acknowledged that the process description given in a BLA/MAA does not necessarily differ dramatically, although the described process ranges may be different. The description of the Design Space is completed by the description of risk categories for process changes as they are described in the PALM plan (please refer to Ref. [8]). In practice, the Design Space concept has proven to be one of the biggest challenges to the complete application of the QbD concept. As already described, the first dilemma is that it is almost always impractical for manufacturers to create the multivariate data to assess parameter criticality, or to prospectively confirm the Design Space, at full commercial manufacturing scale. Instead, both the identification of CPPs and the setting of the quantitative limits for the Design Space parameters depend on data derived from SDMs and experimental designs, which include some degree of residual uncertainty with regard to the performance of the process at scale. This has been the central question challenging reviewers, and appropriate strategies to manage the remaining risk have to be defined, as described in Ref. [22].

7.2. Further outlook

The worst-case linkage approach described in this publication is very conservative, because it combines worst-case settings over the whole process.
It is recognized that worst-case linkage has theoretical value in demonstrating that a manufacturing process can deliver acceptable product quality even when operated under the absolute worst-case extremes of process parameter targets across all unit operations. These data have value in mitigating the risk to product quality posed by hypothetical post-approval changes to the process within the Design Space. However, the practical value of this knowledge is limited, because combining all process parameter targets at their worst-case settings is not realistic for routine commercial manufacturing; the likelihood of ever running the process under these conditions is remote. Furthermore, this approach may limit the ability to make changes to the normal operating range that could ensure consistent product quality and continuous supply, as described above. For example, a process parameter range might be restricted based on its impact on CQAs in a worst-case linkage study that includes all parameters at their worst-case settings. Then, during the lifecycle of that product, as greater experience at manufacturing scale is gained, a shift is observed for another CQA (e.g., an impurity). In this situation, we would have the process knowledge (i.e., PC/PV study results and accumulated manufacturing knowledge) to correct that drift by moving process targets for a subset of those parameters. Such an action may be unnecessarily constrained if the limits of each parameter are restricted beyond the limits demonstrated in the worst-case unit operation studies on the basis of a process-wide worst-case linkage study. Furthermore, worst-case linkage studies are inherently complex, because the conditions they are designed to test often require additional manipulations that are not part of the process design. For example, the worst-case output of an upstream chromatography step may be a low-mass pool, but the worst-case input to the subsequent step is a high loading density.
Such circumstances would require cycling of a chromatography step and potential
addition of a concentration step, neither of which is normally executed in the process. They may also be unrealistic because the in-process pools would need to be manipulated during the experiment (e.g., concentrated) in a way that is not operationally possible in manufacturing. Therefore, as a component of continual improvement of the PC/PV program strategy, Roche/Genentech aims in the future to define a Design Space without a complete worst-case linkage assessment for every CQA. The process linkage assessment approach proposed for supporting future Design Space claims would comprehensively assess the capability of the overall process using, where appropriate, single unit operation worst-case linkage studies, spiking studies, polishing step removal studies, and model-based linkage calculations to understand process robustness. These process linkage assessment elements are defined as follows: Unit operation worst-case linkage study: This is a linkage study that evaluates a worst-case combination of process parameter setpoints for one unit operation. The resulting pool is processed downstream under representative process conditions consisting of target process parameter setpoints. The unit operation worst-case linkage study is repeated for all relevant product quality attributes. Model-based linkage calculation: As a complement to the experimental unit operation worst-case linkage study, this evaluation uses the worst-case combination of process parameter setpoints for one unit operation and uses process models to predict the outcome of processing downstream under representative process conditions consisting of target process parameter setpoints. Spiking study: In cases where elevated levels of impurities are not observed in upstream pools, spiking studies for process-related impurities may be used to demonstrate process robustness for removal of a particular impurity (e.g., DNA).
Polishing step removal study: this study demonstrates adequate clearance of a CQA by experimentally skipping a unit operation, thereby demonstrating process robustness.

The resulting series of process linkage experiments and model-based evaluations explores the worst-case conditions of each unit operation to evaluate the impact on the complete process. A schematic of process linkage relative to worst-case linkage is shown in Fig. 8; it illustrates that for the process linkage, only one unit operation in the overall process is performed at its worst-case settings, while the other unit operations are run under target conditions.

Fig. 8. Schematic comparison of worst-case linkage and process linkage.

For future BLAs/MAAs, Roche/Genentech proposes to define process parameter acceptable ranges for each unit operation based on the results of the unit operation worst-case study with subsequent downstream processing under target conditions. Process parameter ranges of a unit operation would only be restricted if results from that unit operation at worst-case, with subsequent target linkage, did not meet the CQA target ranges (CQA-TRs) for all CQAs.

The process linkage assessment described above will also inform the Attribute Testing Strategy (ATS) that establishes the control system, as part of the determination of the process impact score, similar to the worst-case linkage study. The process impact score will be determined based on the process linkage assessment, which includes a comprehensive evaluation of worst-case conditions within each unit operation and of overall process robustness. Therefore, use of the process linkage assessment does not change the ability to define a robust and well-thought-out control system that ensures consistent product quality throughout the lifecycle. Additionally, the PALM plan should now require confirmatory testing of combinations of process parameter target limits using scale-down models before verification at scale when multiple CPP targets are changed.

One further step in fully exploring the Design Space concept is to again remove the non-CPPs from the Design Space description. Since the ICH Q8 definition of design space establishes that changes within the design space are not considered changes from a regulatory perspective, a Design Space definition that includes non-CPPs was useful in creating mutual understanding between the MA applicant and HAs that no high-risk changes, such as removal of a unit operation, could occur, and also reduced concerns that some CPPs might not have been captured. In the future, as both industry and HAs become more familiar with the risk assessment strategy, it is hoped that some non-CPPs might be excluded from the design space and fully managed in the Pharmaceutical Quality System (PQS) in order to more fully realize the benefits of the enhanced process understanding created. Ultimately, it is hoped that sufficient confidence is gained that specific non-CPPs may be identified as being outside the design space and potentially free from pre-approval requirements even if their acceptable ranges are expanded. Additionally, the Design Space concept could possibly be extended to more aspects of the process description than just parameter ranges; for example, in future filings the aim could be to allow more flexibility in the management of raw materials, such as allowing the use of chromatography resins from multiple suppliers in order to minimize supply risks.

8. Conclusions

The tools described in this paper were designed to provide a systematic approach to process characterization and validation. The starting point is the risk assessment tool used for the identification of potential CPPs.
The impact of a variation of these process parameters on the manufacturing process is evaluated in extensive multivariate PC/PV studies that are mainly conducted in qualified scale-down models. The qualification of these models is a stringent process based on the statistical comparison of manufacturing-scale and small-scale batches. Linkage studies are performed to assess the cumulative impact that varying process parameters in several unit operations can have on the product quality of the final Drug Substance or Drug Product and are
the basis for claiming a Design Space. The methodologies and tools described in this chapter are an important part of the overall QbD approach. For example, the definition of CQAs is a clear prerequisite before the PC/PV activities can start; conversely, the knowledge gained in these studies is used to define the process impact that is an essential part of the definition of the testing strategy for each CQA. This approach was successfully developed and applied through the process characterization and filing of two monoclonal antibodies. Although considerable effort was necessary to establish all the concepts, risk assessments and methodologies, the framework they provide now greatly facilitates and streamlines the process validation of new products, increasing the consistency of Roche/Genentech validation approaches and of the way the information is provided in market applications. These methodologies are currently fully in use at Roche/Genentech. However, future improvements can be envisioned to address needs in areas such as improved quantification of risk, expansion to devices and combination products, and enhanced filings both for platform products that would benefit from additional efficiencies and for novel therapeutics that present new and greater complexities in establishing appropriate product and process knowledge bases.
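The statistical comparison underlying scale-down model qualification is commonly framed as an equivalence test. The following sketch uses the confidence-interval formulation of the two one-sided tests (TOST) procedure; the data, the equivalence margin, and the use of a normal approximation in place of the t-distribution are simplifying assumptions for illustration only, not the qualification procedure described in this chapter.

```python
from statistics import NormalDist, mean, stdev

def tost_equivalence(large_scale, small_scale, delta, alpha=0.05):
    """Two one-sided tests (TOST) via the confidence-interval formulation:
    the two scales are declared equivalent if the (1 - 2*alpha) confidence
    interval for the difference in means lies entirely within
    [-delta, +delta]. Uses a normal approximation instead of the
    t-distribution for simplicity."""
    diff = mean(large_scale) - mean(small_scale)
    n1, n2 = len(large_scale), len(small_scale)
    se = (stdev(large_scale) ** 2 / n1 + stdev(small_scale) ** 2 / n2) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha)
    lower, upper = diff - z * se, diff + z * se
    return -delta <= lower and upper <= delta

# Illustrative purity data (%) from manufacturing-scale and SDM runs:
large = [98.1, 97.9, 98.3, 98.0, 98.2]
small = [97.8, 98.0, 98.1, 97.9, 98.2]
print(tost_equivalence(large, small, delta=0.5))
```

The choice of the equivalence margin delta is the critical scientific decision in such a comparison (see Wiens [17]); a qualification exercise would justify it from product and process knowledge rather than pick it for convenience.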
Acknowledgements

The authors would like to acknowledge the contributions of many coworkers on both local and global QbD working teams, including Nadja Alt, Oliver Bähner, Christoph Finkler, Reed Harris, Felix Keppert, Lynne Krummen, Paul Motchnick, Ettore Ohage, Silke Werz, and Ron Taticek.
References

[1] Alt N, Zhang TY, Motchnik P, Taticek R, Quarmby V, Schlothauer T, et al. Determination of critical quality attributes for monoclonal antibodies using quality by design principles. Biologicals 2016;44(5):291–305.
[2] Kepert JF, Cromwell M, Engler N, Finkler C, Gellermann G, Gennaro L, et al. Establishing a control system using QbD principles. Biologicals 2016;44(5):319–31.
[3] CMC Biotech Working Group. A-MAb: a case study in bioprocess development. Emeryville, CA: CASSS; 2009.
[4] International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use, ICH Harmonized Tripartite Guideline, Pharmaceutical Development, Q8(R2), August 2009.
[5] International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use, ICH Harmonized Tripartite Guideline, Quality Risk Management, Q9 Step 4, 9 November 2005.
[6] International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use, ICH Harmonized Tripartite Guideline, Pharmaceutical Quality System, Q10 Step 4, 4 June 2008.
[7] International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use, ICH Harmonized Tripartite Guideline, Development and Manufacture of Drug Substances (Chemical Entities and Biotechnological/Biological Entities), Q11 Step 4, May 2012.
[8] Ohage E, Iverson R, Krummen L, Taticek R, Vega M. QbD implementation and Post Approval Lifecycle Management (PALM). Biologicals 2016;44(5):332–40.
[9] International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use, ICH Harmonized Tripartite Guideline, Organisation of the Common Technical Document for the Registration of Pharmaceuticals for Human Use, M4 Step 4, 13 January 2004.
[10] Rathore AS. Roadmap for implementation of quality by design (QbD) for biotechnology products. Trends Biotechnol 2009;27(9):546–53.
[11] Kelley B. Industrialization of mAb production technology: the bioprocessing industry at a crossroads. mAbs 2009;1(5):443–52.
[12] Jose GE, Folque F, Menezes JC, Werz S, Strauss U, Hakemeyer C. Predicting mAb product yields from cultivation media components, using near-infrared and 2D-fluorescence spectroscopies. Biotechnol Prog 2011;27(5):1339–46.
[13] Li F, Hashimura Y, Pendleton R, Harms J, Collins E, Lee B. A systematic approach for scale-down model development and characterization of commercial cell culture processes. Biotechnol Prog 2006;22:696–703.
[14] Rathore AS, Johnson GV, Buckley JJ, Boyle DM, Gustafson ME. Process characterization of the chromatographic steps in the purification process of a recombinant Escherichia coli-expressed protein. Biotechnol Appl Biochem 2003;37(1):51–61.
[15] FDA Guidance for Industry. Process validation: general principles and practices. 2011.
[16] van der Voet H, Perry JN, Amzal B, Paoletti C. A statistical assessment of differences and equivalences between genetically modified and reference plant varieties. BMC Biotechnol 2011;11:15.
[17] Wiens BL. Choosing an equivalence limit for noninferiority or equivalence studies. Control Clin Trials 2002;23(1):2–14.
[18] Martin-Moe S, Lim FJ, Wong RL, Sreedhara A, Sundaram J, Sane SU. A new roadmap for biopharmaceutical drug product development: integrating development, validation, and quality by design. J Pharm Sci 2011;100(8):3031–43.
[19] Horvath B, Mun M, Laird MW. Characterization of a monoclonal antibody cell culture production process using a quality by design approach. Mol Biotechnol 2010;45(3):203–6.
[20] Allen DM. The prediction sum of squares as a criterion for selecting predictor variables. Technical Report #23. University of Kentucky, Department of Statistics; 1971.
[21] Eriksson L, Johansson E, Kettaneh-Wold N, Wikström C, Wold S. Design of experiments. Umetrics Academy; 2008. p. 77–9.
[22] Kelley B, Cromwell M, Jerkins J. Integration of QbD risk assessment tools and overall risk management. Biologicals 2016;44(5):341–51.