Copyright © IFAC Artificial Intelligence in Real-Time Control, Delft, The Netherlands, 1992
A TASK DECOMPOSITION APPROACH TO USING NEURAL NETWORKS FOR THE INTERPRETATION OF BIOPROCESS DATA
G.K. Raju and C.L. Cooney
Biotechnology Process Engineering Center, Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Abstract. During the course of most bioprocess development programs a large amount of process data is generated and stored. However, while these data records contain important information about the process, little or no use is made of this asset. The work described here uses a neural network approach to "learn" to recognize patterns in fermentation data. Neural networks, trained using fermentation data generated from previous runs, are then used to interpret data from a new fermentation. We propose a task decomposition approach to the problem. The approach involves decomposing the problem of bioprocess data interpretation into specific tasks. Separate neural networks are trained to perform each of these tasks, which include fault diagnosis, growth phase determination and metabolic condition evaluation. These trained networks are combined into a multiple neural network hierarchy for the diagnosis of bioprocess data. The methodology is evaluated using experimental data from fed-batch Saccharomyces cerevisiae fermentations. We argue that the task decomposition approach taken here allows each network to develop a task-specific representation and that this, in turn, can lead to network activations and connection weights that are more clearly interpretable. These expert networks can then be pruned to remove nodes that do not contribute significant additional information.
Keywords. Neural network; task decomposition; Saccharomyces cerevisiae; learning; pattern recognition; data interpretation; bioprocess; modular approach; bioprocess development; fed-batch
Introduction
During the course of bioprocess development there is often a need to make inferences about the state of a complex biological system using a very limited number of measurements. However, due to the scarcity of measurements and the complex, nonlinear, and time-varying nature of cell growth and product formation, there is both a severe limitation in our ability to measure system "performance" (often formulated in terms of yield, productivity, product quality, concentration and purity) and a serious lack of models to relate observed parameters to performance. This leads to the need for a large number of experimental runs in order to approach statistically meaningful optimal conditions for product synthesis. However, since the experimental space for process optimization is large, only part of it is investigated because of the considerable time and effort required to do so. As a result, bioprocesses continue to be plagued by variability in performance despite efforts to maintain tight control. Figure 1 summarizes three different approaches to dealing with this problem. We characterize the traditional approach to the bioprocess monitoring problem over the last 20-30 years as a model based engineering approach [3, 16]. While this is an appropriate approach for some simple model systems, the time constraints (which determine the economics of process development) and the effort involved make detailed modeling and sensor development difficult to justify for more complex systems. In these cases, simplified models tend not to reflect the real situation because of the large number of assumptions required before the model takes on a form that is tractable for actual use. To deal with these difficulties, an expert system approach has been developed over the last few years as a solution to some of the problems encountered during bioprocess development. We characterize this second approach as a knowledge based engineering approach. This approach is aided by the large number of "generic rules" associated with process diagnostics and process performance. While it attempts to capture the known expertise about a process, experience has shown that it is limited by the difficulty of getting one or more experts to formalize a consistent set of rules for a process that fundamentally is not yet well understood. In this paper, we propose a third alternative for dealing with the bioprocess data interpretation problem. It is motivated by the fact that a primary asset in most bioprocess development programs is the large amount of process data recorded during the course of the experimental runs. Early in the life of a process, detailed models and specific expertise have not yet been developed; they have to be "learned" by doing more runs and by analyzing the results of previous runs. We propose a neural network engineering approach. The ability of feedforward networks to model arbitrarily complex relationships, together with their ability to learn from examples, motivates us to choose that network architecture to demonstrate our methodology.
Materials and Methods
A fed-batch Saccharomyces cerevisiae fermentation was used as a model system to test the neural network engineering approach. This system is well understood and is one for which models have been developed. In addition, a knowledge based expert system has been developed for this model system [11]. Using a well understood model system for which other techniques have already been developed [11] gives us a basis to evaluate the neural network engineering approach. The details of the experimental work are given by O'Connor et al. [12]. Given a set of measurements and calculated variables, the pattern recognition task is to identify the growth phase of the fermentation, the cells' current metabolic state, and equipment or sensor faults. Figure 2 shows fermentation data from a fed-batch S. cerevisiae fermentation. The data are from a fermentation in which certain metabolic conditions were deliberately inflicted on the culture [11]. Hence the metabolic condition during the fermentation is known.
Results
This network was trained on data from one fermentation in which abnormalities were deliberately inflicted on the culture. The blackbox network achieved a generalization accuracy of 64%. Generalization accuracy is the fraction of correct interpretations made by the trained network on a fermentation that the network had not seen before. This generalization accuracy did not depend on the choice of objective function; that is, the same generalization accuracy was obtained for networks trained using the mean squared error objective function and for those trained using the cross entropy objective function.
The training algorithm involves finding the optimal number of hidden nodes and the optimal set of connection weights for the task. This is done by training networks with different numbers of hidden nodes. The fermentation data are partitioned into three sets: a training set, a cross-validation set and a testing set. The training set is used to actually train the network and determine the weights. The cross-validation set is used to decide when training is complete: an optimal set of connection weights is defined as one that leads to the maximum generalization accuracy (minimum error) on the cross-validation set, and a network is fully trained when it has achieved its highest generalization accuracy on that set. The optimal number of hidden nodes is the one that leads to the maximum generalization accuracy on the cross-validation set among all the fully trained networks. The testing set is used only to report results. Each network was trained using two different objective functions: mean squared error and cross entropy.
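The training procedure above can be sketched as a model-selection loop: for each candidate number of hidden nodes, keep the weight snapshot with the best cross-validation accuracy, then keep the hidden-layer size whose snapshot scores best. This is a minimal sketch under stated assumptions, not the authors' code: `make_net`, `train_epoch` and `predict` are hypothetical hooks for whatever backpropagation implementation is used, and the `(inputs, class_index)` example layout is an assumption.

```python
import copy
from typing import Any, Callable, Sequence, Tuple

# An example is (input vector, index of the known class); this layout is an assumption.
Example = Tuple[Sequence[float], int]


def accuracy(net: Any, predict: Callable[[Any, Sequence[float]], int],
             data: Sequence[Example]) -> float:
    """Generalization accuracy: fraction of examples classified correctly."""
    correct = sum(1 for x, y in data if predict(net, x) == y)
    return correct / len(data)


def select_network(hidden_sizes: Sequence[int],
                   make_net: Callable[[int], Any],
                   train_epoch: Callable[[Any, Sequence[Example]], None],
                   predict: Callable[[Any, Sequence[float]], int],
                   train_set: Sequence[Example],
                   cv_set: Sequence[Example],
                   max_epochs: int) -> Tuple[int, float, Any]:
    """Return (number of hidden nodes, cross-validation accuracy, weight snapshot)."""
    best_hidden, best_acc, best_net = -1, -1.0, None
    for hidden in hidden_sizes:
        net = make_net(hidden)                 # hypothetical: fresh network with `hidden` nodes
        run_acc, run_best = -1.0, None
        for _ in range(max_epochs):
            train_epoch(net, train_set)        # weights are updated from the training set only
            acc = accuracy(net, predict, cv_set)
            if acc > run_acc:                  # remember the weights with the best CV accuracy
                run_acc, run_best = acc, copy.deepcopy(net)
        # Compare the "fully trained" networks of different sizes on the CV set.
        if run_acc > best_acc:
            best_hidden, best_acc, best_net = hidden, run_acc, run_best
    return best_hidden, best_acc, best_net
```

The testing set is touched only once, after selection, for example `accuracy(best_net, predict, test_set)`, which keeps it reserved for reporting results as the text requires.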
Task Decomposition Approach
While it has been shown that a feedforward neural network can approximate any continuous relationship to an arbitrary degree of precision, this assumes that the necessary training data are available and that the network can be trained until a global minimum is reached. This is usually not the case for practical pattern recognition problems: the training data are often incomplete and inconsistent, and error backpropagation, like any other gradient descent method, is a local technique. Hence there is a need to use a-priori knowledge to constrain the number of possible solutions when applying such methods to real-world problems where the training data are finite. There are many ways to introduce this a-priori knowledge. Some of them include the choice of activation functions (e.g., logistic vs. radial basis functions), the choice of objective functions, and constraining the network weights to satisfy certain constraints (e.g., scale and rotation invariance).
Blackbox Approach
Figure 3 illustrates the typical "blackbox approach" to the pattern recognition task we chose. This approach is similar to that taken in recent work using neural networks for pattern recognition in chemical engineering. The network shown in Fig. 3 was trained on experimental data obtained from the S. cerevisiae fermentation with deliberately inflicted abnormalities. Inputs to the neural network are variables that are typically recorded during the course of bioprocess development programs (Table 1).
Type of variable        Variable
Measured variables      Temperature; pH; % O2 in outlet gas; % CO2 in outlet gas; time
Manipulated variables   Agitator speed; air flow rate
Calculated variables    O2 uptake rate; CO2 evolution rate; respiratory quotient
Table 1: Variables used as inputs to the network
Another way to use a-priori knowledge is to decompose the problem into simpler, well-defined tasks. This is one way of constraining the number of possible representations. Modular architectures can be designed to reduce the effect of conflicting training information, referred to as "crosstalk". Crosstalk can be spatial or temporal. Spatial crosstalk occurs when the output units of a network provide conflicting error information to a hidden unit during training; if a separate network were designed for each of these outputs, this crosstalk would be eliminated. Temporal crosstalk occurs when a network is trained to perform different functions at different times: training a system to compute one function may affect its ability to learn a second function. This transfer of training can be positive or negative. The goal in designing suitable modular architectures is to have similar functions learned by the same network (resulting in positive transfer of training) and dissimilar functions learned by different networks (avoiding the detrimental effects of negative transfer of training) [6].
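Spatial crosstalk (conflicting error information from several output units reaching one hidden unit) can be seen directly in the backpropagation error signal of a hidden unit, which is a weighted sum of the error signals of every output it feeds. The sketch below assumes a logistic hidden unit and standard backpropagation; the weights and output error values are made-up illustrative numbers, not data from this study.

```python
from typing import Sequence


def hidden_delta(activation: float, weights_to_outputs: Sequence[float],
                 output_deltas: Sequence[float]) -> float:
    """Backpropagated error signal for one logistic hidden unit.

    The derivative of the logistic function is a * (1 - a); the unit receives
    the weighted sum of the error signals of all outputs it connects to.
    """
    derivative = activation * (1.0 - activation)
    return derivative * sum(w * d for w, d in zip(weights_to_outputs, output_deltas))


# One shared hidden unit feeding two outputs whose error signals conflict:
shared = hidden_delta(0.5, [1.0, 1.0], [0.2, -0.2])
# The same unit placed in two separate networks, one output each:
separate_a = hidden_delta(0.5, [1.0], [0.2])
separate_b = hidden_delta(0.5, [1.0], [-0.2])
print(shared, separate_a, separate_b)
```

In the shared case the two error signals cancel and the hidden unit receives no learning signal at all, whereas in separate networks each hidden unit receives a non-zero signal from its own output.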
The outputs from the single large network can be divided into three sets, each associated with a separate task. The first set involves identifying the phase of the fermentation, which is the only classification required if the fermentation is normal. The second involves detecting an abnormal metabolic condition. The third involves detecting equipment and sensor faults. This task decomposition is based on the rationale that it does not make sense to try to recognize the phase or metabolic condition if there are sensor or equipment faults.
Time is included as an input. In addition to the variable values, their average trend over the last 3 time steps is also used in order to deal with the dynamics involved in the pattern recognition task. There are twenty-one inputs to the network. Sixteen possible categories were chosen as outputs, based on knowledge of the different kinds of states the fermentation could be in (Fig. 3).
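The 21-element input vector (fermentation time, ten variable values and the ten corresponding trends listed in Fig. 3) can be assembled as below. This is a sketch under two assumptions: the variable names are hypothetical identifiers, and the "average trend over the last 3 time steps" is taken to mean the mean of the last three one-step changes, since the exact formula is not given.

```python
from typing import Dict, List, Sequence

# Hypothetical identifiers for the ten variables that have both a value and a trend input.
TRENDED_VARIABLES = ["temperature", "pH", "air_flow", "stirring_rate", "dissolved_oxygen",
                     "CER", "OUR", "RQ", "pct_CO2_exit", "pct_O2_exit"]


def average_trend(series: Sequence[float], window: int = 3) -> float:
    """Mean of the last `window` one-step changes (assumed definition of 'average trend')."""
    if len(series) < window + 1:
        raise ValueError(f"need at least {window + 1} samples, got {len(series)}")
    recent = series[-(window + 1):]
    changes = [b - a for a, b in zip(recent, recent[1:])]
    return sum(changes) / window


def build_inputs(time: float, history: Dict[str, Sequence[float]]) -> List[float]:
    """Assemble one input pattern: time, current values, then average trends."""
    values = [history[name][-1] for name in TRENDED_VARIABLES]
    trends = [average_trend(history[name]) for name in TRENDED_VARIABLES]
    return [time] + values + trends  # 1 + 10 + 10 = 21 inputs
```

With this definition the trend telescopes to (x_t - x_{t-3}) / 3, so it acts as a simple smoothed slope; a different smoothing rule would only change `average_trend`.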
Figure 4 shows a task decomposition approach to the pattern recognition task of monitoring a fermentation. The approach involves breaking down a network that learns to perform many different and contradictory functions at the same time into a set of networks, each of which learns a set of similar functions. The problem of fermentation control is broken into the specific tasks of fault diagnosis, growth phase determination, and metabolic condition evaluation, and a separate neural network is trained for each of these tasks. These trained networks are then combined into a multiple neural network hierarchy for fermentation control. The expert networks are combined with an "observer" network, which is trained to assign the pattern recognition task to the appropriate expert network based on the input pattern.
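The observer/expert hierarchy can be sketched as a two-stage lookup: the observer picks one of three categories, and the matching expert network classifies the pattern within that category. The category keys follow Fig. 4, but the function names and the argmax routing rule are assumptions for illustration; any callable that returns class probabilities can stand in for a trained network.

```python
from typing import Callable, Dict, Sequence, Tuple

# A trained network is modelled as a function from an input vector to class probabilities.
Network = Callable[[Sequence[float]], Dict[str, float]]


def argmax(probs: Dict[str, float]) -> str:
    """Label with the highest probability."""
    return max(probs, key=probs.get)


def interpret(x: Sequence[float], observer: Network,
              experts: Dict[str, Network]) -> Tuple[str, str]:
    """Route an input pattern to the expert chosen by the observer network."""
    category = argmax(observer(x))  # "normal", "abnormal_metabolism" or "equipment_fault"
    label = argmax(experts[category](x))
    return category, label


# Hypothetical stand-ins for trained networks, returning fixed probabilities:
observer = lambda x: {"normal": 0.1, "abnormal_metabolism": 0.2, "equipment_fault": 0.7}
experts = {
    "normal": lambda x: {"lag phase": 0.6, "growth & ethanol production": 0.4},
    "abnormal_metabolism": lambda x: {"oxygen limitation": 1.0},
    "equipment_fault": lambda x: {"DO probe dead": 0.8, "air flow plugged": 0.2},
}
result = interpret([0.0] * 21, observer, experts)
print(result)
```

Hard routing of this kind reflects the rationale that phase and metabolic condition are only worth interpreting once equipment and sensor faults have been ruled out.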
nodes and seven output nodes. Figure 5 does not show the input layer. Filled circles indicate nodes with an activation of 1, while empty circles indicate no activation at all. Figure 6 shows the connection weights between the hidden layer and the output layer. The pattern of activations through the network, together with the connection weights depicted in Fig. 6, indicates that the three hidden nodes have learned a rough binary representation of the inputs to the network. The different combinations of these three higher-order binary features are then all that is required to classify the set of faults. It can be argued that the fault network has "learned" to extract the three higher-order features present in the bioprocess data and then used these features to make the classifications.
Pruning
We expect that modular architectures will develop representations that are more easily interpreted; that is, it will be easier to understand how a modular architecture implements a function than to understand how a single network implements the same function. In the modular approach a different set of hidden units is used to represent information about each task. The functions learned by the modules can be thought of as building blocks to be used in more complex tasks. Modular architectures localize functions and develop more interpretable representations, and hence are easier to debug. An advantage of the modular approach is that each network becomes an expert at a certain subset of the classifications. Since the smaller specialist networks have a smaller, better-defined task, they are more likely to develop interpretable representations of the input-output relationship. Because much expert knowledge is modularized, it is easier to embed domain knowledge into a connectionist system when the system is organized in a modular fashion. Each of the four networks was trained independently using data relevant to the task that network performs. That is, the fault network was trained only with data that contained faults, the phase network was trained with data known to be normal, and the metabolic network was trained using data generated during abnormal metabolic conditions. The observer network is trained with all the data and is used to classify data as indicating normal metabolism, abnormal metabolism, or equipment and sensor faults. The softmax output function was used with cross entropy as the objective function. This combination allowed the network outputs to be interpreted as a-posteriori probabilities [2]. While this combination resulted in the same generalization accuracy as mean squared error, the probabilistic interpretation of the network outputs is convenient and can form a basis for a rigorous interpretation of network outputs.
It also allows for multiple networks to be combined into a multiple network hierarchy.
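The softmax/cross-entropy pairing, and the way posterior probabilities let networks be chained, can be illustrated as follows. The combination rule P(class | x) = P(category | x) * P(class | category, x) is one natural way to join observer and expert outputs, but the exact rule is not spelled out here, so treat it as an assumption. Mean squared error is included for comparison with the other objective function used in the study; averaging over outputs is one common convention.

```python
import math
from typing import Dict, List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    """Softmax outputs: positive and summing to 1, so they can be read as posteriors."""
    m = max(z)                                  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Cross-entropy error for a one-of-N target (only the true class contributes)."""
    return -math.log(probs[target])


def mean_squared_error(outputs: Sequence[float], target: int) -> float:
    """Mean squared error against a one-of-N target vector."""
    return sum((o - (1.0 if k == target else 0.0)) ** 2
               for k, o in enumerate(outputs)) / len(outputs)


def combine(observer: Dict[str, float],
            experts: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """P(class | x) = P(category | x) * P(class | category, x) (assumed rule)."""
    return {label: p_cat * p_label
            for category, p_cat in observer.items()
            for label, p_label in experts[category].items()}


probs = softmax([2.0, 1.0, 0.1])
overall = combine({"normal": 0.5, "equipment_fault": 0.5},
                  {"normal": {"lag phase": 0.2, "growth phase": 0.8},
                   "equipment_fault": {"DO probe dead": 1.0}})
print(probs, cross_entropy(probs, 0), overall)
```

Because every factor is a proper probability distribution, the combined outputs also sum to 1, which is what makes the hierarchy composable.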
Pruning is useful because it allows for the removal of nodes or connections that are not being used to make the classifications. Previous researchers [9, 10] have developed pruning algorithms for feedforward networks. Here we use a statistical approach developed by Boger et al. [1] to prune the input nodes not contributing to the final classification. Each of the three expert networks was pruned to remove inputs that were not contributing to classification. Table 2 depicts the pruning of their input nodes.
Network          Input nodes before pruning   Input nodes after pruning
Phase net        21                           3
Metabolism net   21                           9
Fault net        21                           6
Table 2: Pruning of input nodes
As depicted, there is a significant reduction in the number of inputs required.
Conclusions
This work investigated the use of artificial neural networks as a way to "learn" to recognize patterns in bioprocess data. Using a fed-batch S. cerevisiae fermentation as a model system, the neural network approach has been demonstrated for the interpretation of bioprocess data. It was found that a task decomposition approach to designing neural networks has certain advantages over a single large network approach. Decomposing the pattern recognition task into smaller, specialized subtasks improved the ability to generalize. Analysis of the fault network showed that the modular approach allowed the smaller expert networks to develop representations that are more interpretable. This opens the possibility of trying to understand what these features represent. Network pruning resulted in a significant reduction in the number of input nodes.
Results
The modular approach using task decomposition resulted in a generalization accuracy of 79%, an increase over the 64% obtained using a single large network.
Literature
1. Boger, Z., H. Guterman and M. A. Kramer. Neural Network Reduction: Application of a Statistical Relevance Approach. Submitted to IEEE Transactions on Neural Networks, August (1990).
It was difficult to get interpretable representations from the single large network. Figure 5 summarizes the propagation of activations from the hidden layer to the output layer of the fault network for the seven different classes. The optimal fault network architecture consisted of twenty-one input nodes (described earlier), three hidden
2. Bridle, J. S. Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition. In Neurocomputing: Algorithms, Architectures and Applications, Springer Verlag, (1989).
3. Cooney, C. L., H. Y. Wang and D. I. C. Wang. Computer aided material balancing for prediction of fermentation parameters. Biotechnology and Bioengineering, 19, 55-67, (1977).
4. Geman, S., E. Bienenstock and R. Doursat. Neural Networks and the Bias/Variance Dilemma. Neural Computation, 4, 1-58, (1992).
5. Hoskins, J. C. and D. M. Himmelblau. Artificial Neural Network Models of Knowledge Representation in Chemical Engineering. Computers and Chemical Engineering, 12, No. 9/10, 881-890, (1988).
6. Jacobs, R. A., M. I. Jordan and A. G. Barto. Task decomposition through competition in a modular connectionist architecture: The what and where vision tasks. COINS Technical Report 90-27, University of Massachusetts, Amherst, MA, March (1990).
7. Lippmann, R. P. An introduction to computing with neural nets. IEEE ASSP Magazine, 4, 4-22, (1987).
8. Leonard, J. and M. A. Kramer. Neural Networks and Pattern Recognition Techniques for Fault Tolerant Control. In Proc. American Institute of Chemical Engineers Annual Meeting, San Francisco, November (1989).
9. Le Cun, Y., J. S. Denker and S. A. Solla. Optimal Brain Damage. In Touretzky, D., editor, Neural Information Processing Systems 2, (1990).
10. Mozer, M. C. and P. Smolensky. Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment. In Touretzky, D., editor, Neural Information Processing Systems 1, Denver, Morgan Kaufmann, (1988).
11. O'Connor, G. M. Development of an Intelligent Fermentation Control System. Ph.D. Thesis, Massachusetts Institute of Technology, September (1989).
12. O'Connor, G. M., F. Sanchez-Riera and C. L. Cooney. Design and Evaluation of Control Strategies for High Cell Density Fermentations. Biotechnology and Bioengineering, 39, 293-304, (1992).
13. Raju, G. K. and C. L. Cooney. Using Artificial Neural Networks for bioprocess control. ACS 200th National Meeting, Washington, D.C., August (1989).
14. Raju, G. K. and C. L. Cooney. Using Artificial Neural Networks to aid the interpretation of bioprocess data. IFAC Symposium on Modeling and Control of Biotechnical Processes, Keystone, Colorado, April (1992).
15. Rumelhart, D. E. and J. L. McClelland (Eds). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press, Cambridge, Mass., (1986).
16. Stephanopoulos, G. and Ka-Yiu San. Studies on On-Line Bioreactor Identification. I. Theory. Biotechnology and Bioengineering, 26, 1176-1188, (1984).
17. Venkatasubramanian, V. and K. Chan. A Neural Network Methodology for Process Fault Diagnosis. AIChE Journal, 35, 12, 1993-2002, (1989).
[Figure 1: panels "Model based engineering approach", "Knowledge based engineering approach" and "Neural network engineering approach", each drawing on experimental data. Other labels in the figure: mathematical formulation of bioprocess; validate model with experimental data; knowledge representation; problem solving methodology; a-priori knowledge; modularity; generalizations; rules; correlations; pruning; statistics.]
Fig. 1: Three different approaches to bioprocess data interpretation
[Figure 2: CO2 evolution rate (CER), dissolved oxygen (DO) and respiratory quotient (RQ) plotted against fermentation time.]
Fig. 2: CER, DO and RQ over time with some induced faults
[Figure 3: a single feedforward network. Inputs: fermentation time, temperature, pH, air flow rate, stirring rate, dissolved oxygen, CO2 evolution rate, O2 uptake rate, respiratory quotient, % CO2 in exit gas, % O2 in exit gas, and the trends of temperature, pH, air flow, stirring rate, DO, CER, OUR, RQ, % CO2 and % O2. Outputs: lag phase; growth phase & ethanol production; growth phase & cell mass production; oxygen limitation; nitrogen limitation; nutrient limitation; glucose starvation; growth slowing; ethanol consumption; DO signal failure; DO probe dead; stirrer signal failure; stirrer motor dead; air flow signal failure; air flow plugged; mass spectrometer readings unreliable.]
Fig. 3: "Blackbox" approach to interpreting bioprocess data
[Figure 4: hierarchy of networks. The observer assigns each pattern to one of three categories: normal, abnormal metabolic condition, or equipment fault. Growth phase identification: lag phase; growth phase & ethanol production; growth phase & cell mass production. Abnormal metabolic condition evaluation: oxygen limitation; nitrogen limitation; nutrient limitation; glucose starvation; growth slowing; ethanol consumption. Equipment fault diagnosis: DO signal failure; DO probe dead; RPM signal failure; RPM motor dead; air flow signal failure; air plugged; mass spectrometer readings unreliable.]
Fig. 4: Task decomposition approach
[Figure 5: activation patterns of the hidden and output nodes of the fault network, one panel per fault class; filled circles mark nodes with an activation of 1, empty circles nodes with no activation.]
Fig. 5: Analysis of network activations of the fault network
[Figure 6: connection weights from the hidden layer to output nodes #1 to #7 of the fault network.]
Fig. 6: Connection weights between hidden layer and output layer for fault network