PHYSIOLOGICAL STATE ESTIMATION IN RECOMBINANT ESCH. ..
14th World Congress of IFAC
0-7d-OI-2
Copyright © 1999 IFAC 14th Triennial World Congress, Beijing, P.R. China
Physiological State Estimation in Recombinant Escherichia coli Fermentations
M. Feng, A. J. Austin, A. C. Ward*, and J. Glassey+
Dept. Chemical and Process Engineering, *Dept. Agricultural and Environmental Science, University of Newcastle upon Tyne, Newcastle upon Tyne NE1 7RU, UK
+ phone: (+44) 191 2227275; fax: (+44) 191 2225292; e-mail:
[email protected]
Abstract: Implementation of advanced control strategies in bioprocesses is often hindered by the lack of on-line measurements reflecting the physiological state of the culture. Although a number of techniques have been used to estimate key variables from data monitored on-line, these often do not explicitly take into account changes in physiological state, and information on many aspects of physiological state may not be present in on-line data. This paper demonstrates that data obtained from chemical fingerprinting methods, such as Pyrolysis Mass Spectrometry, can be used to identify changes in physiological state during cultivation. This information can be utilised for estimation of the physiological state with on-line data that will enable physiological-state-specific model development for on-line bioprocess control. Copyright © 1999 IFAC
Keywords: Fermentation processes, Neural networks, Physiological models, Estimation, Modelling
ISBN: 0 08 043248 4
1. INTRODUCTION
Recombinant DNA techniques confer on bacteria the possibility of producing novel products of high biotechnological value. Key elements are the ability to grow bacteria to high cell densities and for them to produce the required protein product as a large part of their cellular biomass. As there is considerable potential for the production of therapeutic products of high value, there is considerable industrial interest in the development of such production systems. Nevertheless, the number of recombinant protein products that are on the market or under development is low in comparison with commercial expectations and the current level of investment.
This is partly due to the limited efficiency of bioprocesses caused by incomplete understanding of (i) the integration of recombinant protein overexpression and the physiological state of the cell, (ii) cell physiology and its interaction with the constantly changing physicochemical environment in the reactor and (iii) how to control the process in a way that optimises productivity and product fidelity. There is therefore considerable potential for cost savings.
Traditional bioprocess control is constrained by the availability of reliable on-line measurement of key parameters that accurately reflect the physiological state of the culture and product formation. In practice, a chosen parameter is used to control the growth rate of the culture by modulating the feed rate of a limiting component. The consequences of substrate limitation on cell physiology and product formation, including the effects on and by high-level intracellular protein production, are often unknown. A lack of good process control strategies can lead to over-feeding of substrate and/or starvation, which in turn can result in yield reduction, prevention of product formation and reduced product fidelity. One means of overcoming this limitation would be the implementation of an inferential control scheme where data routinely monitored on-line are used to infer secondary process variables available only with significant time delay. Although this strategy has been applied to chemical processes for some time (Mitchell, et al., 1995), the complex nature of bioprocesses, especially recombinant fermentations, has somewhat limited its applicability in bioprocess control.
A number of techniques have been used to estimate biomass and product concentrations from on-line measurements and these estimates have been used to drive feed rate control in order to maintain the required biomass set point. However, the metabolic burden imposed upon bacteria producing recombinant proteins often leads to significant changes in physiology and potentially to process-model mismatch. Indeed, a number of models, each developed for a particular physiological state (PS), may be required to reduce the problem of mismatch. This introduces the additional problem of detecting the switch between physiological states and thus between process models. Further analytical measurements capable of detecting the changes in physiology are therefore vital in developing such a physiological control scheme.
Pyrolysis mass spectrometry (PyMS) has been shown to be a rapid, cheap and sensitive method for detecting changes in microbial cultures (Kang, et al., 1997), including recombinant protein production (Goodacre, et al., 1994) and metabolite production (Goodacre and Kell, 1993). The huge volume of data produced can be used together with on-line data to estimate bioprocess variables and potentially changes in physiological state. This information can be used directly to control fermentations to follow process-defined optimal profiles.
PyMS is a chemical fingerprinting method and requires additional information for interpretation of complex fingerprints from biomass samples. For example, information on substrate changes (Kang, et al., 1997) or the concentration of stress indicators, such as ppGpp or GroEL, measured with conventional analytical techniques can be correlated with specific PyMS spectral information. Comparison of stress indicator levels inside the cell with PyMS data will allow correlation between stress levels and protein production to be rapidly determined and biological interpretation of bioprocesses to be applied to process optimisation.
2. MATERIALS AND METHODS
2.1 Bacterial Strains
This study makes use of industrially developed strains provided by Zeneca Pharmaceuticals so that methods developed here will be more readily transferable to industrial scale-up procedures than experiments carried out using, for example, β-galactosidase as the recombinant protein. The strains were developed to produce interferon-α (INF), tumour necrosis factor α (TNF), granulocyte colony stimulating factor (GCSF) and ricin A chain. There are differences in the productivity and growth of the organisms producing the proteins: TNF is completely soluble even at very high % total cell protein (TCP); GCSF is extremely insoluble, producing large, highly visible inclusion bodies; ricin A chain is toxic to the cells, which grow poorly. The strains used in the preliminary investigations produce INF at different levels. These can be used to detect different stress levels in strains that are otherwise identical. A detailed description is given elsewhere (Austin, et al., 1998).
2.2 Media and Growth Conditions
Batch and fed-batch fermentations were carried out in a 5 l Bioflo III (New Brunswick Scientific, Edison, NJ) bioreactor. The working volume was 4 l and the initial operating conditions were temperature 37°C, agitation 400 rpm, pH 6.8, and airflow rate 3 l/min. The pH was controlled in some fermentations by the addition of 1 M NaOH and 1 M HCl and foaming was reduced by the addition of antifoam. A single colony was used to start the inoculum of 50 ml that was grown overnight at 37°C and 200 rpm. A 10 ml sample was removed from the inoculum prior to inoculation of the fermenter to obtain a large sample for PyMS analysis of the starting state of the cells. The remainder of the inoculum was added and samples were taken every 30 min for dry weight determination, measurement of glycerol concentration, SDS-PAGE and PyMS analysis. The on-line data (%DOT, pH, rpm, temperature and off-gas composition) was collected every 30 s during the fermentation using a data logging program.
2.3 Off-line Analyses
A detailed description of dry weight, optical density, recombinant protein, GroEL and PyMS analyses is given elsewhere (Austin, et al., 1998).
2.4 Physiological State Detection
A number of techniques were used to detect physiological states. PyMS spectra were analysed using Principal Component Analysis (PCA) and Canonical Variate Analysis (CV) to compare the chemical fingerprints of fermentation samples. Data was analysed by cluster analysis and samples with similar chemical fingerprints were assigned to groups associated with physiological states. Radial basis function (RBF) network models for parent and recombinant fermentations have been developed in order to differentiate between the physiological states detected by PyMS. Input data for these models was prepared by PCA on the PyMS spectra of each fermentation separately, and the principal components explaining 90 % of the variance were used. Alternatively, data features were extracted with an Autoassociative Neural Network (AANN) model (with a topology of 150-35-15-35-150) developed for each individual fermentation, and the outputs of the 15 bottleneck layer neurons were used (Kramer, 1992). For comparison, the full PyMS spectra (150 mass/ion charges) have been included. Additionally, models using on-line data as inputs were developed to predict the three physiological states determined from the PyMS spectra. Data from seven fermentations was used for training, three fermentation sets were used for cross-validation and the model was tested using data from a previously unseen fermentation.
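The PCA-based preparation of model inputs can be illustrated with a short sketch. It is not the authors' implementation: the function name `pca_features`, the use of a singular value decomposition through NumPy and the synthetic spectra are all assumptions made for illustration. Only the 90 % variance criterion and the 150-channel spectra are taken from the text.

```python
import numpy as np

def pca_features(spectra, variance_to_keep=0.90):
    """Project spectra onto the leading principal components.

    spectra: array of shape (n_samples, n_channels), e.g. 150 PyMS channels.
    Returns the component scores and the number of components retained.
    """
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)                       # mean-centre every channel
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)               # variance fraction per component
    cumulative = np.cumsum(explained)
    # smallest number of components whose cumulative variance reaches the target
    k = int(np.searchsorted(cumulative, variance_to_keep)) + 1
    k = min(k, len(s))
    return Xc @ Vt[:k].T, k

# Synthetic stand-in for one fermentation: 30 samples, 150 channels,
# generated from 3 hidden factors plus a little noise (hypothetical data).
rng = np.random.default_rng(0)
spectra = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 150))
spectra += 0.01 * rng.normal(size=(30, 150))

scores, k = pca_features(spectra)
print(scores.shape, k)
```

The retained scores would then serve as inputs to a classifier such as the RBF networks described above; the AANN route replaces this linear projection with the outputs of the bottleneck layer.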
The time points cluster closely and three distinct metabolic phases can be identified (Figure 1).
Fig. 1. Comparison of fermentations AJA007 and AJA008 of MSD462 on 2 g/l glycerol. Time points cluster closely and three metabolic states can be identified.
2.5 Data-based Estimation Models
3.1 Metabolic State Determination
On-line data was re-sampled to obtain data at 5 min intervals. Off-line data (OD, biomass, protein accumulation and GroEL concentration) were interpolated using a piece-wise cubic spline to obtain data at 5 min intervals. Data from seven fermentations was used to develop Partial Least Squares (PLS) and Artificial Neural Network (ANN) models. A further three fermentations were used in cross-validation to determine the optimum model structure. Final models were then tested using data from an unseen fermentation. Two categories of models have been developed. One model using each of the techniques was developed for the whole course of the fermentation to represent a baseline for the comparison. Subsequently, individual models for the three identified physiological states were developed (again using each of the techniques) and their performance was compared to the overall models. Since the training sets would be significantly reduced by only considering data from a particular physiological state, eight fermentations (training and testing as described above) were used to construct the training files and 20 % of each of the files, selected randomly, was retained for cross-validation purposes. The resulting physiological state specific models were then tested using data from an unseen fermentation.
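The alignment of on-line and off-line data on a common 5 min grid can be sketched as below. The text does not state how the on-line data was re-sampled or which spline end conditions were used, so block averaging of ten 30 s readings and a natural cubic spline are assumptions, as are the function names and all numerical values.

```python
from bisect import bisect_right

def block_average(values, block):
    """Average consecutive blocks, e.g. ten 30 s readings give one 5 min value."""
    return [sum(values[i:i + block]) / block
            for i in range(0, len(values) - block + 1, block)]

def natural_cubic_spline(x, y):
    """Return a function evaluating the natural cubic spline through (x, y).

    x must be strictly increasing and contain at least two points.
    """
    n = len(x) - 1
    h = [x[i + 1] - x[i] for i in range(n)]
    m = [0.0] * (n + 1)          # second derivatives; natural ends m[0] = m[n] = 0
    if n >= 2:
        # Tridiagonal system for the interior second derivatives m[1..n-1]
        diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 1)]
        rhs = [6.0 * ((y[k + 2] - y[k + 1]) / h[k + 1] - (y[k + 1] - y[k]) / h[k])
               for k in range(n - 1)]
        for k in range(1, n - 1):                 # forward elimination (Thomas)
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        m[n - 1] = rhs[n - 2] / diag[n - 2]       # back substitution
        for k in range(n - 3, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(t):
        i = min(max(bisect_right(x, t) - 1, 0), n - 1)   # interval containing t
        a, b = x[i + 1] - t, t - x[i]
        return ((m[i] * a**3 + m[i + 1] * b**3) / (6.0 * h[i])
                + (y[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (y[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate

# Hypothetical off-line biomass samples taken every 30 min (times in minutes)
offline_t = [0, 30, 60, 90, 120]
biomass = [0.2, 0.5, 1.1, 2.0, 2.6]
spline = natural_cubic_spline(offline_t, biomass)
grid = list(range(0, 121, 5))                     # common 5 min grid
biomass_5min = [spline(t) for t in grid]

# Hypothetical %DOT readings logged every 30 s, reduced to 5 min values
dot_30s = [80.0 - 0.1 * i for i in range(40)]
dot_5min = block_average(dot_30s, 10)
print(len(biomass_5min), len(dot_5min))
```

The interpolated curve passes through every measured off-line value, so the original samples are preserved on the 5 min grid and only the intermediate points are estimated.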
Table 1 summarises the success rate of classification (in terms of percentage of correct classification) into the physiological state as determined by PyMS clustering. Three types of modelling techniques were used: PLS, FANN and RBF. The structure of the models was determined by cross-validation on one of the triplicate samples from PyMS analysis.
[Figure 1: PCA-CV plot of PyMS samples, labelled by fermentation and sampling time (e.g. 7-0.5h, 7-1h, 7-1.5h, 7-5h), with clusters marked PS1, PS2 and PS3.]
Table 1 Percentage success rate of correct classification of physiological states using three modelling techniques and three different input sets.
Input sets: PC, PyMS, on-line data
PLS: success rate (%) 40, 45, 44; LVs 5, 7, 2
Neural network: 47 % (RBF, structure 34/3); 57 % (RBF, structure 35/3); 76 % (FANN, structure 9-4-1)
3. RESULTS
It was found that the highest success rate was achieved using a neural network model with on-line data as inputs. A significant proportion of the misclassification was found to be due to assigning a lower physiological state once the next stage is well established. These apparent misclassifications could be an indication of dramatic changes in the physiological state of the organism that would affect productivity. These changes in physiological states could be corrected with appropriate feeding strategies and thus bring the physiological states back to the desired condition.
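The success rate and the particular kind of misclassification described above can be computed as in the following sketch. The function names and the two label sequences are hypothetical; the only assumption about the data is that physiological states are numbered 1 to 3 in the order in which they occur.

```python
def success_rate(reference, predicted):
    """Percentage of samples whose predicted state matches the reference state."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(r == p for r, p in zip(reference, predicted))
    return 100.0 * correct / len(reference)

def lower_state_errors(reference, predicted):
    """Sample indices where the model reports an earlier state than the reference."""
    return [i for i, (r, p) in enumerate(zip(reference, predicted)) if p < r]

# Hypothetical labels for ten consecutive samples of one fermentation
reference = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]   # states from PyMS clustering
predicted = [1, 1, 1, 2, 2, 1, 2, 3, 2, 3]   # states predicted from on-line data

print(success_rate(reference, predicted))        # 80.0
print(lower_state_errors(reference, predicted))  # [5, 8]
```

Listing the indices, rather than only counting errors, makes it possible to check whether a reversion to a lower state coincides with a process event such as a change in feeding.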
The capability of PyMS to detect changes in physiological state has been confirmed previously (Austin et al., 1998). The clustering of PyMS spectra using PCA-CV analysis is demonstrated in Figure 1 for pellets from the duplicate fermentations AJA007 and AJA008 of MSD462 on 2 g/l glycerol.
The optimum network topology, as determined by cross-validation, for PCA and AANN feature based models was found to be 15-35-1. The model using full PyMS spectra had a topology of 150-34-1. We have found that RBF models using non-linear features extracted by AANN performed better than RBF models using principal components as inputs. The RBF using the raw data as inputs performed better overall. The improvement in correct classification of the metabolic state achieved by the RBF using raw data over the principal component based RBF was 16 % on the testing set for the recombinant fermentations. The RBF using AANN extracted features performed 6 % better than the RBF using principal components as inputs. For the parent fermentations the RBFs using raw data and features extracted from AANN performed identically and showed an improvement of 17 % over the RBF using the principal components. These results show that the raw PyMS data can be used directly for prediction of metabolic state using RBF networks.
For comparison with overall models, the average RMS error on the same validation set is shown.
3.2 Estimation Using Overall Models
A range of input variables was tested when developing each model and the best combination was found on the basis of the RMS error achieved on the testing set. The structure of the models was determined by cross-validation and is given in Table 2 together with the model performance expressed as the RMS error achieved on the unseen validation data set.
[Figure 2 plot: predicted and raw GroEL values against time (hours) between 2.00 and 6.17 h; plot title "Comparison of predicted and actual values for GroEL production", RMS error = 3.5836.]
Fig. 2. Comparison of predicted and actual values for PLS physiological state model of GroEL production.
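The RMS error used to compare the models is not defined explicitly in the text. The sketch below assumes the usual root mean square of the differences between measured and estimated values; the function name and the example numbers are hypothetical.

```python
import math

def rms_error(actual, predicted):
    """Root mean square of the differences between measured and estimated values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical GroEL measurements and model estimates at four sample times
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 10.0]
print(rms_error(actual, predicted))   # square root of (1 + 0 + 1 + 4) / 4
```

Because the error carries the units of the estimated variable, values for biomass, protein and GroEL are not directly comparable with each other, only between models of the same variable.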
Table 2 Estimation of biomass, protein and GroEL concentrations using PLS, FANN and RBF models over the whole duration of the cultivation. RMS = RMS error achieved on the validation set; Topology = topology of the network (input nodes: hidden layer nodes: output nodes); RBF/NN = number of radial basis functions/number of nearest neighbours; LVs = number of latent variables

Process     FANN                 RBF                  PLS
variable    RMS      Topology    RMS      RBF/NN      RMS      LVs
Protein     1.800    4:15:1      2.053    15/4        1.8122   3
Biomass     0.4367   3:15:1      0.7955   5/4         0.8430   5
GroEL       4.368    4:15:1      4.501    5/3         5.3772   4

The results clearly show that whilst the estimation of biomass and recombinant protein is acceptable, the estimation of GroEL concentration is very poor. This is evident from Figure 2 where the actual and estimated GroEL concentration is plotted for the validation data set.

3.3 Estimation Using Physiological State Specific Models
Fermentation data was clustered on the basis of PyMS spectra into three physiological states as described in the Methods section. Once again a range of input variable combinations was used in model development and the model structure was determined by cross-validation on the testing set. The estimation results for biomass, recombinant protein and GroEL concentrations for PLS, FANN and RBF are given in Table 3.

It is clear that only GroEL estimation was improved by implementing physiological state specific models. This is not surprising as the quality of overall models estimating biomass and recombinant protein has been acceptable (Table 3).

Table 3 Estimation of biomass, protein and GroEL concentrations using PLS, FANN and RBF models over the whole duration of the cultivation. RMS = RMS error achieved on the validation set; Topology = topology of the network (input nodes: hidden layer nodes: output nodes); RBF/NN = number of radial basis functions/number of nearest neighbours; LVs = number of latent variables

Process                FANN                 RBF                  PLS
variable   State       RMS      Topology    RMS      RBF/NN      RMS      LVs
Protein    1           3.7500   4:4:1       3.0340   6/4         2.0085   4
           2           1.3290   6:3:1       1.0486   12/4        1.1173   5
           3           1.4310   6:3:1       2.1410   12/4        1.9798   5
           Avg. RMS    2.1700   -           2.0745   -           1.7019   -
Biomass    1           0.1958   6:3:1       0.1478   15/5        0.1888   5
           2           0.7394   6:6:1       0.5944   12/4        0.3706   6
           3           0.5064   5:6:1       0.5610   12/4        0.7061   4
           Avg. RMS    0.4805   -           0.4344   -           0.4218   -
GroEL      1           4.4390   ?           1.9500   6/4         1.7850   5
           2           ?        ?           1.8070   20/4        4.9062   4
           3           9.1250   4:4:1       3.5380   12/4        4.0596   6
           Avg. RMS    5.8423   -           ?        -           3.5836   -
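The idea of physiological state specific estimation can be illustrated as a two-stage sketch: a classifier first assigns the current sample to a state, and the estimator belonging to that state is then applied. Everything named here is hypothetical. The rule in `classify_state` merely stands in for a trained classifier, the linear coefficients in `STATE_MODELS` are invented, and the input names `time_h` and `co2_pct` are placeholders for on-line measurements. The final line assumes that the "Avg. RMS" rows of Table 3 are arithmetic means of the three per-state values, using the PLS protein entries as read from the table.

```python
def classify_state(sample):
    """Hypothetical stand-in for a trained classifier; returns state 1, 2 or 3."""
    if sample["time_h"] < 2.0:
        return 1
    if sample["time_h"] < 4.5:
        return 2
    return 3

# One estimator per physiological state; the coefficients are made up.
STATE_MODELS = {
    1: lambda s: 0.10 + 0.50 * s["co2_pct"],
    2: lambda s: 0.40 + 0.80 * s["co2_pct"],
    3: lambda s: 1.20 + 0.30 * s["co2_pct"],
}

def estimate(sample):
    """Pick the model for the detected state and return (state, estimate)."""
    state = classify_state(sample)
    return state, STATE_MODELS[state](sample)

def average_rms(per_state_rms):
    """Arithmetic mean of the RMS errors obtained in the individual states."""
    return sum(per_state_rms) / len(per_state_rms)

state, biomass = estimate({"time_h": 3.0, "co2_pct": 1.5})
print(state, biomass)

# PLS protein entries for states 1 to 3 as read from Table 3
print(round(average_rms([2.0085, 1.1173, 1.9798]), 4))   # 1.7019
```

Keeping the classifier and the per-state estimators separate means that a misclassified state selects the wrong estimator, which is why the quality of state detection bounds the benefit of state specific models.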
4. DISCUSSION
PyMS Analysis of the Cultures
The results obtained thus far are very promising. PyMS data has been used to determine the changes in physiological state of the culture. A number of techniques have been used to cluster the PyMS spectra into particular physiological states and it has been shown that on-line data can be used to successfully predict the physiological states in the current fermentation once the model has been developed. The information about physiological state can be directly used to control the process in such a way as to maintain a predefined physiological state profile. Alternatively, it can be used to switch between physiological state specific models that estimate biomass, recombinant protein or GroEL concentrations within a particular physiological state. This physiological control scheme is indicated in Figure 3.
Fig. 3. Inferential control scheme incorporating information about the physiological state of the process
5. FURTHER WORK
Although the results presented here are limited, they do show interesting trends which require further analysis. The proposed control scheme is currently being investigated under laboratory conditions, initially using a PID controller to control the feed of the limiting carbon source. The data thus collected from these fed-batch fermentations will be used to try to identify physiological state changes from PyMS analysis in the cell. This information will be used to improve the control strategy.
6. CONCLUSION
PyMS has been shown to be of great use for monitoring the changes due to growth and other stresses, including the overall metabolic state of the cell during fermentation. ANNs can be used to learn the changes and predict these from on-line data for control of feeding throughout the fermentations.
7. ACKNOWLEDGEMENTS
Thanks to the staff at the fermentation unit of Zeneca Pharmaceuticals, Alderley Edge, UK for their support in starting the work. We also acknowledge the financial support for M. Feng from the Research Committee of the University of Newcastle and ORS Committee.
REFERENCES
Austin, A. J., Piergentili, D., Ward, A. C., Kara, B. and Glassey, J. (1998) Monitoring of stress levels in recombinant Escherichia coli fermentations for development of feeding profiles using Artificial Neural Networks. Proc. CAB7, Osaka, Japan, pp 263-268
Birnbaum, S. and Bailey, J. E. (1991) Plasmid presence changes the relative levels of many host cell proteins and ribosome components in recombinant Escherichia coli. Biotechnol. Bioeng., vol. 37, pp 736-745
Chun, J., Atalan, E., Ward, A. C. and Goodfellow, M. (1993) Artificial neural network analysis of pyrolysis mass spectrometry data in the identification of Streptomyces strains. FEMS Microbiology Letters, vol. 107, pp 321-326
Dedhia, N., Richins, R., Mesina, A. and Chen, W. (1997) Improvement in recombinant protein production in ppGpp deficient Escherichia coli. Biotechnol. Bioeng., vol. 53(4), pp 379-386
Glassey, J., Montague, G. A., Ward, A. C. and Kara, B. V. (1994a) Enhanced supervision of recombinant E. coli fermentations via artificial neural networks. Process Biochemistry, vol. 29, pp 387-398
Glassey, J., Montague, G. A., Ward, A. C. and Kara, B. V. (1994b) Artificial neural network based experimental design procedures for enhancing fermentation development. Biotechnol. Bioeng., vol. 44, pp 397-405
Goodacre, R. and Kell, D. B. (1993) Rapid and quantitative analysis of bioprocesses using pyrolysis mass spectrometry and neural networks: application to indole production. Analytica Chimica Acta, vol. 279, pp 17-26
Goodacre, R., Karim, A., Kaderbhai, M. A. and Kell, D. B. (1994) Rapid and quantitative analysis of recombinant protein expression using pyrolysis mass spectrometry and artificial neural networks: application to mammalian cytochrome b5 in Escherichia coli. J. Biotechnol., vol. 34, pp 185-193
Goodacre, R. and Kell, D. B. (1996) Pyrolysis mass spectrometry and its applications in biotechnology. Curr. Opin. Biotechnol., vol. 7, pp 20-28
Gutteridge, C. S., Vallis, L. and McFie, H. J. H. (1985) Numerical methods classification of microorganisms by pyrolysis mass spectrometry. In: Computer assisted bacterial systematics, pp 369-409 (Eds. M. Goodfellow, D. Jones and F. G. Priest) Academic Press, London
Hendrick, J. P. and Hartl, F-U. (1993) Molecular chaperone functions of heat-shock proteins. Annu. Rev. Biochem., vol. 63, pp 349-384
Kang, S. G., Kenyon, G. W., Lee, K. J. and Ward, A. C. (1997) The analysis of physiological states in Escherichia coli by the combination of PyMS and neural networks. Process Biochemistry, submitted
Kramer, M. A. (1992) Autoassociative neural networks. Computers chem. Engng, vol. 16(4), pp 313-328
Laemmli, U. K. (1970) Cleavage of structural proteins during the assembly of bacteriophage T4. Nature, vol. 227, pp 680-685
Mitchell, A., Willis, M. J., Tham, M. T., Johnson, N. and Skorge, L. M. (1995) Inferential estimation of viscosity index on a lubricant production plant. Proceedings of the American Control Conference, Seattle, Washington, pp 24-27
Puschmann, M., Kramer, W., Duerrschmid, E. and Bayer, K. (1997) Quantification of the signal molecule guanosine-5'-diphosphate-3'-diphosphate and other nucleotides in bacterial cell culture. Biotechnology, in press
Schein, C. H. (1991) Optimising protein folding to the native state in bacteria. Curr. Opin. Biotechnol., vol. 2, pp 746-750
Sorensen, M. A., Jensen, K. F. and Pedersen, S. (1994) High concentrations of ppGpp decrease the RNA chain growth rate; implications for protein synthesis and translational fidelity during amino acid starvation in Escherichia coli. J. Mol. Biol., vol. 236, pp 441-454