Copyright © IFAC Computer Applications in Biotechnology, Osaka, Japan, 1998
OPTIMIZED FUZZY RULE GENERATION FOR FERMENTATION CONTROL

R. Guthke 1, M. Pfaff 1 and F. Meyer 1

1 Hans Knöll Institute for Natural Product Research, Jena, Germany
2 BioControl Jena GmbH, Jena, Germany
Abstract: An approach towards the automatic generation of fuzzy rules is presented and applied to data from an industrial antibiotic fermentation. The fuzzy rules generated describe the influence of the kinetics of the preculture on the antibiotic yield at the end of the main culture. For feature extraction the data were classified by the fuzzy-C-means algorithm. Both the selection of fuzzy rules and the optimal definition of membership functions for fuzzy values are carried out automatically. This requires criteria in order to assess the quality of the fuzzy rule set. The formulation of these criteria and, consequently, the rule set obtained depend on the purpose of process control. Copyright © 1998 IFAC

Keywords: Fermentation processes; knowledge acquisition; fuzzy expert systems
1. INTRODUCTION
The purpose of this paper is to demonstrate the influence of the objective function for optimized rule generation on the rule set generated.
The limited availability of data mining procedures that support the development of fuzzy expert systems is a bottleneck that hinders the use of knowledge based systems in bioprocess control. Thus, different methods of rule generation based on pattern recognition techniques were developed (Guthke and Ludwig, 1994; Kamimura et al., 1996; Stephanopoulos et al., 1997). One of the fundamental algorithms is the so-called ID3 algorithm that uses decision trees and applies Shannon's entropy to rule selection (Quinlan, 1986). The use of decision trees reduces the robustness of the rule set when the input data are changing. In this paper the ROSA (Rule Oriented Statistical Analysis) method was used, which is not tree-oriented and consequently more robust (Krone and Kiendl, 1994). An alternative approach is to construct a rule base by adjustment of weights assigned to each rule by minimizing the sum of deviations (Pütz and Weber, 1996). In our paper the rule set was fitted to the measured input and output values not by adjustment of weights but by tuning the membership functions of fuzzy values in a given rule set.
2. MATERIALS AND METHODS

For classification the fuzzy-C-means method was applied (DataEngine, MIT GmbH, Aachen, Germany). For rule selection the software WINROSA (MIT GmbH, Aachen, Germany) was used. The membership functions were fitted using the software tool FuzzyOpt (SEI GmbH, Ilmenau, Germany). Data from 10 fermentation runs of an industrial antibiotic fermentation were studied. The influence of the course of the preculture on the antibiotic yield of the main culture was formulated by a fuzzy model. The variable y represents the final yield of the main culture as shown in Table 1. The variables x and t shown in Figure 1 characterize the preculture of the 10 fermentation runs. (For reasons of confidentiality, experimental details cannot be given and only normalized data are shown. The kinetics x(t) may therefore represent variables such as dissolved oxygen, carbon dioxide evolution rate, oxygen uptake rate, pH or any other measured variable of the preculture.)
Table 1: Output variable y: antibiotic yield of the main culture (normalized values)

run number    antibiotic yield y
1             0.965
2             0.964
3             0.952
4             0.920
5             0.958
6             0.991
7             0.804
8             1.000
9             0.815
10            0.846
Fig. 3: Two classes of the kinetics y(t") of the antibiotic concentration in the main culture

For the two rules 'IF x in class 1 THEN y in class 1' and 'IF x in class 2 THEN y in class 2' a high probability was found by contingency table analysis (Guthke, 1992; Guthke and Ludwig, 1994). Based on this result the input-output relation x(t) → y was modelled by fuzzy rules as follows. Due to the small number of 10 fermentation runs used to fit the model, only two linguistic values ('low' and 'high') were formulated for each variable. The crisp definition of these 6 values can be formulated as follows:
IF x
Fig. 1: Kinetics of the preculture variable x for 10 fermentation runs (data normalized)
The fuzzy formulation is given by the membership functions as shown in Figure 4 and eq. (1) for the input variable x:

$$\mu(x, \text{'X is high'}) = 1 - \mu(x, \text{'X is low'}) = \max\bigl(0, \min\bigl(1, (x - X_{crit})/D_X + 0.5\bigr)\bigr) \qquad (1)$$

The membership functions for all 3 variables and 2 fuzzy values for each variable were defined by 6 parameters: X_crit, T_crit, Y_crit, D_X, D_T, D_Y.

3. RESULTS

For feature extraction the fermentations were classified separately with respect to all measured variables using the fuzzy-C-means algorithm (Bezdek, 1981). For the two variables x and y the results of classification are shown in Figures 2 and 3.
Fig. 2: Two classes of preculture kinetics x(t)

Fig. 4: Membership function μ of the fuzzy values 'X is low' and 'X is high' (for X_crit = 0.5 and D_X = 0.1)
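As a minimal sketch, the membership function of eq. (1) can be written as a small Python function. The function names and the treatment of the crisp limit D_X = 0 (a step at X_crit) are assumptions made for illustration; they are not taken from the software tools used in this work.

```python
def mu_high(x, x_crit, d_x):
    """Membership of x in 'X is high': a ramp of width d_x centred at x_crit."""
    if d_x == 0.0:
        # Assumption: for zero fuzziness the ramp degenerates to a step at x_crit.
        return 1.0 if x >= x_crit else 0.0
    return max(0.0, min(1.0, (x - x_crit) / d_x + 0.5))


def mu_low(x, x_crit, d_x):
    """Membership of x in 'X is low', the complement of 'X is high'."""
    return 1.0 - mu_high(x, x_crit, d_x)


# Parameters of Figure 4: X_crit = 0.5 and D_X = 0.1
for x in (0.40, 0.45, 0.50, 0.55, 0.60):
    print(x, round(mu_low(x, 0.5, 0.1), 3), round(mu_high(x, 0.5, 0.1), 3))
```

With these parameters the membership in 'X is high' is 0 up to x = 0.45, 0.5 at x = X_crit and 1 from x = 0.55 onwards.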
The total set of 16 rules formulated by these 6 linguistic values is shown in Table 2.
the fuzzy parameters D_X, D_T and D_Y were fixed as in Figure 6. The optimization of the three other parameters X_crit, T_crit and Y_crit will be discussed. The three-dimensional search will be described in detail with respect to the parameter X_crit, which separates 'X is low' and 'X is high' (see Figure 4).
Table 2: Total rule set

A1    IF X is low THEN Y is low
A2    IF X is low AND T is low THEN Y is low
A3    IF X is low AND T is high THEN Y is low
A4    IF T is low THEN Y is low
A5    IF X is high THEN Y is low
A6    IF X is high AND T is low THEN Y is low
A7    IF T is high THEN Y is low
A8    IF X is high AND T is high THEN Y is low
B1    IF X is high THEN Y is high
B2    IF X is high AND T is high THEN Y is high
B3    IF X is high AND T is low THEN Y is high
B4    IF T is high THEN Y is high
B5    IF X is low THEN Y is high
B6    IF X is low AND T is high THEN Y is high
B7    IF T is low THEN Y is high
B8    IF X is low AND T is low THEN Y is high
Each rule shown in Table 2 has been rated by the relevance index J_i, which evaluates the reliability of the rule i using a statistical test procedure (Krone and Kiendl, 1994): J_i is the normalized distance d_i between the confidence interval of the probability p(C_i) for the conclusion C_i and the confidence interval of the conditional probability p(C_i | P_i) under the premise P_i:

$$J_i = \frac{d_i}{1 - p(C_i)}$$

Fig. 5: Relevance indices J_i of fuzzy rules depending on the parameter X_crit for α = 0.1 and D_X = 0.0
The confidence intervals are calculated using the significance level 1-α. Figure 5 shows the relevance indices J_i for the rules A1, A2, B1 and B2 for α = 0.1, D_X = D_T = D_Y = 0.0, Y_crit = 0.83 and T_crit = 0.59 (optimal values as described below) depending on the parameter X_crit (parameter of the membership functions, see Figure 4). For the other 12 rules of the total rule set shown in Table 2 the relevance indices were negative. These rules with negative J_i were not relevant and are not shown in Figure 5.
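The idea behind the relevance index can be illustrated by the following sketch. It is not the WINROSA implementation: the normal-approximation (Wald) confidence interval, the use of crisp counts instead of membership-weighted frequencies, the normalization by 1 - p(C) and all function names are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(k, n, alpha):
    """Two-sided normal-approximation interval for the proportion k/n, clipped to [0, 1]."""
    p = k / n
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def relevance_index(n_total, n_conclusion, n_premise, n_both, alpha):
    """Assumed form J = d / (1 - p(C)) for a rule 'IF P THEN C'.

    d is the gap between the lower bound of the interval for p(C|P) and the
    upper bound of the interval for p(C); it is negative if the intervals overlap.
    """
    if n_premise == 0 or n_conclusion == n_total:
        raise ValueError("premise never observed or conclusion always true")
    p_c = n_conclusion / n_total
    _, upper_c = wald_interval(n_conclusion, n_total, alpha)
    lower_cp, _ = wald_interval(n_both, n_premise, alpha)
    return (lower_cp - upper_c) / (1.0 - p_c)


# Hypothetical counts: 100 cases, conclusion true in 50, premise true in 40,
# premise and conclusion both true in 38.
print(relevance_index(100, 50, 40, 38, alpha=0.10))
print(relevance_index(100, 50, 40, 38, alpha=0.01))
```

In this sketch a smaller α widens both intervals, so the index shrinks, and a rule whose premise does not change the probability of the conclusion receives a negative index.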
Fig. 6: As Figure 5, but with α = 0.01 and D_X = D_T = 0.1, D_Y = 0.02
The relevance index J_i declines monotonically with decreasing α, i.e. with increasing significance level 1-α. The fuzziness was zero for Figure 5, i.e. crisp linguistic values were used. With increasing fuzziness the shape of the relevance index J_i versus the fuzzy parameters X_crit, T_crit and Y_crit becomes smoother and J_i is smaller, as shown in Figure 6 for D_X = D_T = 0.1, D_Y = 0.02 (i.e. fuzziness of 10 %) and α = 0.01. This implies that maximizing the relevance indices would minimize the fuzziness, i.e. crisp values would be optimal as shown in Figure 5. Therefore, the criterion to optimize the fuzziness should not be the relevance index but the robustness. Robustness means that the rules should be invariant against (small) variations of the input data. However, such an optimization using a formal criterion for the robustness will not be considered here. In our studies
To optimize the entire set of selected rules various criteria can be used depending on the aim of fuzzy modeling:
3.1 Fault detection: Average and maximum relevance index

If n is the number of rules that were found to be significant, then the average relevance index J_av and the maximum relevance index J_max can be determined to quantify the entire set of significant rules:

$$J_{av} = \frac{1}{n} \sum_{i=1}^{n} J_i \rightarrow \max \qquad (2)$$

$$J_{max} = \max_{i=1,\dots,n} J_i \rightarrow \max \qquad (3)$$
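The two criteria (2) and (3) can be sketched in a few lines of Python. It is assumed here that a rule counts as significant when its relevance index is positive; the function names and the example values are hypothetical.

```python
def significant(indices):
    """Keep only the relevance indices of significant rules (assumed: J > 0)."""
    return [j for j in indices if j > 0.0]


def average_relevance(indices):
    """J_av of eq. (2): mean relevance index over the significant rules."""
    sig = significant(indices)
    return sum(sig) / len(sig) if sig else 0.0


def maximum_relevance(indices):
    """J_max of eq. (3): largest relevance index among the significant rules."""
    sig = significant(indices)
    return max(sig) if sig else 0.0


# Hypothetical relevance indices of three rules; the third is not significant.
example = [0.4, 0.2, -0.1]
print(average_relevance(example), maximum_relevance(example))
```

Evaluating both functions on a grid of X_crit values and taking the argument of the largest result corresponds to the one-dimensional search described in the text.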
In the given example the result when maximizing the average relevance index J_av was identical with the result when searching for the maximum of all rules. The optimal fuzzy parameter X_crit = 0.62 as shown in Figure 7 (dotted line, labelled 'J_av') was obtained. Using this value in the membership functions for x, the rule A2 turned out to be the most significant one. This rule should be applied to fault detection: IF X is smaller than 0.57 (= X_crit - D_X/2) during the early phase of the preculture (t < 0.54 = T_crit - D_T/2), then a low antibiotic yield of the main culture is predicted. In this situation, the preculture should be aborted. Therefore, this rule can be applied to preculture supervision.

To optimize X_crit we used the objective function

$$J_{in\_av} = F_{in} \cdot J_{av} \rightarrow \max \qquad (4)$$

as global relevance criterion. In Figure 7 this function is represented as a solid line and reaches its maximum at X_crit = 0.25. The optimized rule set consists of the rules A1 and B2. This rule set can be used for quality control of the preculture: if the rule B2 fires (i.e. x > 0.3 = X_crit + D_X/2 for t > 0.64 = T_crit + D_T/2), then a high antibiotic yield can be expected and the preculture should be used for the inoculation of the main culture.
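The crisp thresholds quoted for the rules A2 and B2 follow from eq. (1): the membership in 'high' is 0 below X_crit - D_X/2 and 1 above X_crit + D_X/2. The following sketch reproduces this arithmetic for D_X = D_T = 0.1 and T_crit = 0.59; the function name is hypothetical.

```python
def saturation_points(crit, d):
    """Return (lower, upper): 'high' has membership 0 below lower and 1 above upper."""
    return crit - d / 2.0, crit + d / 2.0


# Rule A2 (X_crit = 0.62): 'X is low' and 'T is low' hold fully below the lower points.
x_lower, _ = saturation_points(0.62, 0.1)
t_lower, t_upper = saturation_points(0.59, 0.1)
print(round(x_lower, 2), round(t_lower, 2))   # x < 0.57 and t < 0.54

# Rule B2 (X_crit = 0.25): 'X is high' and 'T is high' hold fully above the upper points.
_, x_upper = saturation_points(0.25, 0.1)
print(round(x_upper, 2), round(t_upper, 2))   # x > 0.3 and t > 0.64
```

The time threshold 0.64 of rule B2 therefore corresponds to T_crit + D_T/2, while the threshold 0.54 of rule A2 corresponds to T_crit - D_T/2.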
3.3 Fault diagnosis and optimal control: Grade of output situation coverage
Fault diagnosis and optimal process design (Angelov and Guthke, 1997) by expert systems use backward chaining. This inference strategy requires rules for all output situations (i.e. grade of output situation coverage F_out = 1). Here, we have two output situations: 'Y is low' and 'Y is high'. That means at least two rules are necessary, one rule labelled A and one rule labelled B in Table 2. The minimum relevance index of both rules should be at a maximum, since the relevance index of the most uncertain rule R_i determines the quality of the rule set:

$$J_{out\_min} = \min_{i:\, F_{out}(\{R_i :\, J_i > 0\}) = 1} J_i \rightarrow \max \qquad (5)$$

Fig. 7: Global relevance J_in_av for rules shown in Fig. 6
The dependence of the global relevance criterion J_out_min on the parameter X_crit is shown in Figure 8.
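One possible reading of criterion (5) is sketched below: for each output situation the most relevant significant rule is selected, and the smaller of the two selected indices is returned. This selection strategy, the return value 0 for incomplete output coverage, the function name and the example indices are assumptions for illustration. Rule labels follow Table 2, where 'A' rules conclude 'Y is low' and 'B' rules conclude 'Y is high'.

```python
def output_coverage_relevance(indices):
    """J_out_min for a dict mapping rule labels (e.g. 'A2') to relevance indices."""
    best = {}
    for label, j in indices.items():
        if j > 0.0:  # only significant rules can cover an output situation
            output = label[0]  # 'A' -> 'Y is low', 'B' -> 'Y is high'
            best[output] = max(best.get(output, 0.0), j)
    if set(best) != {"A", "B"}:
        return 0.0  # assumed value when F_out < 1
    return min(best.values())


# Hypothetical relevance indices at one value of X_crit
print(output_coverage_relevance({"A1": 0.30, "A2": 0.45, "B1": 0.10, "B2": 0.37}))
print(output_coverage_relevance({"A2": 0.50, "B2": -0.10}))
```

In the first call the rules A2 and B2 are selected and the weaker of the two determines the result; in the second call no significant rule concludes 'Y is high', so the output situations are not fully covered.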
3.2 Quality control: Grade of input situation coverage
In expert systems two basic strategies of inference can be used: forward and backward chaining. For forward chaining it is important to have relevant rules for all input situations. Here, a total of 4 input situations had to be considered (two fuzzy values for each of the two input variables). The rules A1 and B1 cover all input situations. Both rules are relevant for X_crit = 0.8 (see Figure 5). However, the relevance index of rule A1 is very low (0.02) and the significance level is not high (α = 0.10). For higher significance (α = 0.01) no fuzzy parameter set in the 6-dimensional parameter space was found that generates a rule set which covers all input situations. This is a typical situation when the experimental design is not satisfactory. The grade of input situation coverage F_in was 0.25, 0.5 or 0.75 if one, two or three input situations were covered by the rules. The grade F_in = 0.75, for example, results from the fact that the rules A1 and B2 cover three situations within the interval 0.13 < X_crit < 0.42 as shown in Figure 7.
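The grade of input situation coverage can be reproduced by enumerating the four input situations and the premises of the rules from Table 2. The sketch below also combines F_in with the average relevance index, assuming that the global criterion J_in_av is their product; the data structures, function names and example indices are hypothetical.

```python
from itertools import product

VALUES = ("low", "high")
SITUATIONS = set(product(VALUES, VALUES))  # all (X value, T value) pairs

# Premises of four rules from Table 2; None means the variable is not used.
PREMISES = {
    "A1": ("low", None),
    "A2": ("low", "low"),
    "B1": ("high", None),
    "B2": ("high", "high"),
}


def covered(premise):
    """Input situations matched by a premise (X value, T value)."""
    x_val, t_val = premise
    return {(x, t) for (x, t) in SITUATIONS
            if (x_val is None or x == x_val) and (t_val is None or t == t_val)}


def input_coverage(rule_labels):
    """F_in: fraction of the four input situations covered by the given rules."""
    hit = set()
    for label in rule_labels:
        hit |= covered(PREMISES[label])
    return len(hit) / len(SITUATIONS)


def global_input_relevance(indices):
    """Assumed J_in_av: F_in times the mean relevance index of the rules with J > 0."""
    sig = {label: j for label, j in indices.items() if j > 0.0}
    if not sig:
        return 0.0
    return input_coverage(sig) * sum(sig.values()) / len(sig)


print(input_coverage(["A1", "B2"]))  # 0.75
print(input_coverage(["A1", "B1"]))  # 1.0
```

The rules A1 and B2 leave only the situation 'X is high AND T is low' uncovered, which gives F_in = 0.75 as stated in the text.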
Fig. 8: Global relevance J_out_min according to eq. (5) for rules shown in Fig. 6

There are three local maxima. Two of them have approximately the same global relevance index (J_out_min = 0.37 at X_crit = 0.22 and at X_crit = 0.49). For a lower significance level (α = 0.1) and/or a smaller fuzziness (D_X = D_T = D_Y = 0) there are three local maxima too, but one of them is superior (J_out_min = 0.61 at X_crit = 0.49). Using this criterion (5) for optimizing the other two model parameters, one obtains the optimal fuzzy parameters T_crit = 0.59 and Y_crit = 0.83, which were used for the calculations given above. The multi-modal shape of the objective function J_out_min versus X_crit as shown in Figure 8 illustrates the need to use globally converging algorithms for the optimization of membership functions. Therefore, genetic algorithm based methods are superior to gradient methods (Angelov and Guthke, 1996).
Table 3: Measured (y_i), fitted (y_i^fit) and predicted (y_i^pred) output values

Run i    y_i      y_i^fit    y_i^pred
1        0.965    0.970      0.980
2        0.964    0.968      0.968
3        0.952    0.965      0.971
4        0.920    0.948      0.970
5        0.958    0.946      0.946
6        0.991    0.931      0.914
7        0.804    0.800      0.800
8        1.000    0.970      0.960
9        0.815    0.800      0.546
10       0.846    0.893      0.976

3.4 Process monitoring: Minimum error
A number of commercially available software tools use the best fit of the fuzzy model to the experimental data. Such models can be applied to process monitoring (so-called software sensors). For this fitting the sum of squared deviations (sdq) between the calculated values y_i* and the experimental data y_i is minimized:

$$J_{sdq} = \frac{1}{n} \sum_{i=1}^{n} (y_i^{*} - y_i)^2 \rightarrow \min \qquad (6)$$

Using this criterion a Takagi-Sugeno-type fuzzy model (Takagi and Sugeno, 1985) with the crisp output values y_low and y_high was identified that maps the preculture kinetics x(t) to the antibiotic yield y_i* of the main culture. The fuzzy model that employs the rules A2 and B2 (see Table 2) with the parameters X_crit = 0.79, T_crit = 0.42, y_low = 0.81, y_high = 0.96, D_X = 0.005, D_T = 0.005 was found to provide the best fit to all 10 fermentation runs (n = 10) with J_sdq = 0.0208. The fuzziness was very low. With a greater fuzziness fixed at D_X = D_T = 0.1 we identified the parameters X_crit = 0.76, T_crit = 0.46, y_low = 0.80, y_high = 0.97. The fitted values y_i^fit (= y_i*) are shown in Table 3. These values y_i^fit were calculated using all 10 runs, whereas the values y_i^pred were calculated using 9 runs (n = 9) only. In the latter case the run labelled by i can be used to test the model validity. Table 4 shows that the results (i.e. the parameters identified) were robust in 9 of the 10 cases. Only when run 9 was excluded, the identified parameter y_low = 0.45 was found to differ from the mean value (0.8 ± 0.01) and the predicted value y_9^pred = 0.546 was far off the measured value y_9 = 0.815. In this case only one low productive run (run 7) was taken into account. This was unsatisfactory with respect to an adequate identification, considering that run 10 is not typical. When, however, run 9 and run 10 were both not taken into account, then the predicted value was found to match the measured value very closely (y_9^pred = 0.80) and the identified parameter y_low (= 0.80) was found to be the same as the mean value.

Table 4: Fuzzy model parameters X_crit, T_crit, y_low and y_high identified by the best fit for 9 fermentation runs (excluding the one that is indicated by the index i), which are used for the calculation of the predicted values y_i^pred

Run i    X_crit    T_crit    y_low    y_high
1        0.76      0.47      0.81     0.98
2        0.76      0.46      0.80     0.97
3        0.76      0.47      0.81     0.98
4        0.77      0.39      0.79     0.97
5        0.76      0.46      0.80     0.97
6        0.76      0.48      0.80     0.98
7        0.76      0.46      0.80     0.97
8        0.76      0.44      0.80     0.96
9        0.76      0.31      0.45     0.96
10       0.71      0.53      0.81     0.98
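The leave-one-out validation can be checked directly from the measured and predicted values listed in Table 3. The sketch below computes the prediction error of each run relative to its measured value; taking the error relative to the measured yield and using a 10 % threshold are assumptions made for illustration.

```python
# Measured yields and leave-one-out predictions for runs 1 to 10 (Table 3)
measured = [0.965, 0.964, 0.952, 0.920, 0.958, 0.991, 0.804, 1.000, 0.815, 0.846]
predicted = [0.980, 0.968, 0.971, 0.970, 0.946, 0.914, 0.800, 0.960, 0.546, 0.976]


def relative_errors(y_true, y_pred):
    """Absolute prediction error of each run divided by its measured value."""
    return [abs(p - t) / t for t, p in zip(y_true, y_pred)]


errors = relative_errors(measured, predicted)
outliers = [run for run, e in enumerate(errors, start=1) if e >= 0.10]
print(outliers)  # [9, 10]
```

Only the runs 9 and 10 exceed the threshold, so 8 of the 10 runs are predicted with an error below 10 %.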
Fig. 9: Hypersurface J_sdq according to eq. (6) for y_low = 0.81, y_high = 0.96, D_X = D_T = 0.1 and the 10 fermentation runs shown in Figure 1 and Table 1

The surface in Figure 9 shows the objective function J_sdq in dependence on the model parameters X_crit and T_crit. It is characterized by different flat areas (with J_sdq = 0.045, 0.084 and 0.140) and by a narrow and curved groove which contains the optimum (at X_crit = 0.76, T_crit = 0.46, J_sdq = 0.02).
For this reason, gradient methods for parameter identification worked only when the optimization procedure was started close to the optimum. A genetic algorithm based procedure was found to converge sufficiently, although it proved more time consuming (Angelov and Guthke, 1996).
4. CONCLUSION

The rule A2, or the rules A1 and B2, were found to be adequate to describe the relation between the preculture kinetics x(t) and the final yield of an industrial antibiotic fermentation. The parameters that quantify the membership functions of the fuzzy values were identified using different criteria. By maximizing the average or maximum relevance index, the rule A2 was obtained, which can be applied to preculture supervision. However, this rule covered neither all input situations nor all output situations. The rule set obtained by maximum grade of input situation coverage was able to classify the quality of the preculture with respect to the inoculation of the main culture. In order to use backward chaining in expert systems, the rule set obtained by maximizing the grade of output situation coverage should be applied. All these criteria (2)-(5) link the relevance indices of the significant rules, which were calculated based on a statistical approach (WINROSA). The rules generated by the application of such a statistical approach can be applied to reasoning in expert systems. A quite different approach is necessary for process monitoring. For this task a rule set consisting of A2 and B2 was generated that provides the best fit to the experimental data (least squares method). The fuzzy model based on these two rules predicted the antibiotic yield from the preculture kinetics in 8 of 10 cases with an error smaller than 10 % (greater errors for runs 9 and 10) as shown in Table 3. The same rules proved to be suitable for different objectives in bioprocess control. However, the result of the identification of fuzzy parameters depends on the given objective, as exemplified in this paper for the parameter X_crit (X_crit = 0.62 for criteria 2 and 3, X_crit = 0.25 for criterion 4, X_crit = 0.22 and 0.49 for criterion 5, X_crit = 0.79 for criterion 6). The various criteria investigated for fuzzy rule generation (i.e. rule selection and fuzzy parameter identification) were not suitable for fuzziness identification. Fuzziness must be optimized using a criterion that assesses robustness.

5. REFERENCES

Angelov, P. and Guthke, R. (1996). An Approach to Fuzzy Optimal Control supported by Genetic Algorithms. International Panel Conference on Soft and Intelligent Computing, Budapest, Hungary, October 1996.
Angelov, P. and Guthke, R. (1997). A Genetic-Algorithm-based Approach to Optimization of Bioprocesses Described by Fuzzy Rules. Bioprocess Engineering, 16, 299-303.
Bezdek, J. (1981). Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, New York.
Guthke, R. (1992). Learning of Rules from Fermentation Data. In: Modeling and Control of Biotechnical Processes. Eds.: M.N. Karim and G. Stephanopoulos, Colorado, USA, pp. 403-405.
Guthke, R. and Ludwig, B. (1994). Generation of Rules for Expert Systems by Statistical Methods of Fermentation Data Analysis. Acta Biotechnol., 14, 13-26.
Kamimura, R., Konstantinov, K. and Stephanopoulos, G. (1996). Knowledge based systems, artificial neural networks and pattern recognition: Application to biotechnological processes. Current Opinion in Biotechnol., 7, 231-234.
Krone, A. and Kiendl, H. (1994). Automatic Generation of Positive and Negative Rules for Two-Way Fuzzy Controllers. EUFIT '94, Aachen (Germany), pp. 438-447.
Pütz, A. and Weber, R. (1996). Automatic Adjustment of Weights in Fuzzy Rule Bases. EUFIT '96, Aachen (Germany), pp. 933-938.
Quinlan, J.R. (1986). Induction of decision trees. Machine Learning, 1, 81-106.
Stephanopoulos, G., Locher, G., Duff, M.J., Kamimura, R. and Stephanopoulos, G. (1997). Fermentation Database Mining by Pattern Recognition. Biotechnol. Bioeng., 53, 443-452.
Takagi, T. and Sugeno, M. (1985). Fuzzy identification of systems and its application to modelling and control. IEEE Trans. Systems Man and Cybernetics, 15, 116-132.