
Neurocomputing 20 (1998) 67—82

Application of radial basis function and feedforward artificial neural networks to the Escherichia coli fermentation process

Mark R. Warnes (a,*), Jarmila Glassey (a), Gary A. Montague (a), Bo Kara (b)

(a) Department of Chemical and Process Engineering, University of Newcastle upon Tyne, Newcastle upon Tyne NE1 7RU, UK
(b) Zeneca Pharmaceuticals, Mereside, Alderley Park, Macclesfield, Cheshire SK10 4TG, UK

Received 11 January 1997; accepted 1 April 1998

* Corresponding author.

Abstract

Radial basis function and feedforward neural networks are considered for modelling of the recombinant Escherichia coli fermentation process. The models use industrial on-line data from the process as input variables in order to estimate the concentrations of biomass and recombinant protein, which are normally only available from off-line laboratory analysis. The models' performances are compared by prediction error and by graphical fit, using results obtained from a common testing set of fermentation data. © 1998 Published by Elsevier Science B.V. All rights reserved.

Keywords: Radial basis function network; Feedforward neural network; Bioprocess monitoring; Biomass estimation; Recombinant process modelling

1. Introduction

Current practice in the bioprocess industries is for physicochemical variables to be monitored regularly on-line during a fermentation. However, significant indicators of bioprocess behaviour, such as biomass and recombinant protein concentrations (in the case of a recombinant process), are usually measured off-line in the laboratory, providing delayed and relatively infrequent information. It is therefore very difficult to recognise the early signs of an undesirable fermentation, hindering on-line control actions and ultimately leading to a significant waste of time and resources. This problem has led to the development of a range of "software sensors". These sensors utilise mathematical models (ranging from structured to data-based) and algorithms, together with available on-line information, to estimate key bioprocess parameters.

A wide range of data-based modelling techniques is available to formulate process models, each varying in complexity and ease of development. They range from well-established statistical techniques, such as multiple linear regression (MLR), principal component regression (PCR) and partial least squares (PLS) [19], to non-linear techniques, such as non-linear principal component analysis (NLPCA), the non-linear autoregressive moving average with exogenous input (NARMAX) model [2] and artificial neural networks (ANNs). In recent years, there has been a resurgence of interest in the potential of ANNs as a modelling tool for bioprocesses. Their ability to learn complex non-linear relationships without prior knowledge of the model structure makes them a very attractive alternative to other non-linear modelling techniques. Numerous successful applications have been made to simulated and actual fermentation data [5,7,8,13,18,19,22,23], most of them using a feedforward neural network architecture.

The purpose of this study is to compare the effectiveness and accuracy of radial basis function and feedforward neural networks in bioprocess modelling and estimation. The models are developed and applied to industrial on-line data from recombinant Escherichia coli fermentations, and their suitability is assessed from their performance on a common testing set. The paper is organised as follows. Section 2 describes the two techniques used and the bioprocess to which they are applied. Section 3 presents the results obtained and Section 4 summarises the work.

2. Methods and data

2.1. Feedforward artificial neural networks

A standard feedforward ANN (FANN), shown in Fig. 1, has been used in this study. Although the figure shows an FANN with only one hidden layer, topologies with two hidden layers were also allowed, the selection depending upon the FANN performance on the process data. The non-linear processing function used in the hidden layer neurons was the sigmoid

f(z) = 1 / (1 + e^{-z}),   (1)

where z is the sum of the weighted inputs and the bias term.

Fig. 1. An example of artificial neural network architecture.

Once the topology of the neural network is selected, the weights of the node connections have to be determined. The determination of these weights allows the ANN to learn information about the system to be modelled. There are numerous techniques available to perform this weight selection and adjustment; one of the best known and most thoroughly documented is the back-propagation algorithm [21]. This is a modified gradient descent technique that minimises an objective function (typically, the sum of squared errors of the network over the training data) by redistributing the output error back through the network and appropriately modifying the weights of the node connections. However, this algorithm suffers from slow convergence and can easily become trapped in local minima within the weight space. Features such as the addition of a "momentum" term and adaptive learning rates go some way towards speeding up convergence and avoiding local minima, but exhaustive searches must still be made to guarantee the optimum solution.

This study uses two alternatives to the back-propagation algorithm: the chemotaxis algorithm and the conjugate gradient algorithm. The chemotaxis algorithm is a form of random-walk search in which the weights are adjusted by adding Gaussian-distributed random values to the old weights [1,25]. The new weights are retained if the resulting prediction error is smaller than that obtained with the previous set of weights; this procedure is repeated until there is no further reduction in training error. The conjugate gradient algorithm [8,16] is similar to back-propagation in that it too is a gradient descent method, except that it makes use of second-order information when deriving its next search direction and, ultimately, its changes to the network weights. The result is faster, more direct convergence.
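For readers who prefer a concrete view of the chemotaxis search just described, the following is a minimal sketch in Python/NumPy: a single-hidden-layer sigmoid network (Eq. (1)) whose weights are perturbed with Gaussian noise and kept only when the training error falls. The network size, step size, stopping rule and function names are illustrative assumptions, not values or code taken from the paper.

```python
import numpy as np

def sigmoid(z):
    # Eq. (1): f(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def forward(weights, X):
    # Single-hidden-layer FANN: sigmoid hidden units, linear output node.
    W1, b1, W2, b2 = weights
    hidden = sigmoid(X @ W1 + b1)
    return hidden @ W2 + b2

def chemotaxis_train(X, y, n_hidden=3, step=0.05, max_iter=20000, patience=2000, seed=0):
    """Random-walk (chemotaxis) weight search: perturb all weights with
    Gaussian-distributed random values and retain the new weights only if
    the training error is smaller than before."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n_in = X.shape[1]
    weights = [rng.normal(0.0, 0.5, (n_in, n_hidden)),
               np.zeros(n_hidden),
               rng.normal(0.0, 0.5, (n_hidden, 1)),
               np.zeros(1)]
    best_err = np.mean((forward(weights, X) - y) ** 2)
    stale = 0
    for _ in range(max_iter):
        trial = [w + rng.normal(0.0, step, w.shape) for w in weights]
        err = np.mean((forward(trial, X) - y) ** 2)
        if err < best_err:              # keep the perturbation only on improvement
            weights, best_err, stale = trial, err, 0
        else:
            stale += 1
            if stale > patience:        # stop when the error no longer falls
                break
    return weights, best_err
```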

2.2. Radial basis function networks

RBF networks apply a clustering process to the input data before presentation to the network, and use non-linear activation functions that are locally tuned to cover a region of the input space. The network structure is similar to that of the FANN shown in Fig. 1: it consists of an input layer, a single hidden layer containing the same number of nodes as cluster centres, and an output layer. Only the connections from the hidden to the output layer are weighted, leading to a much faster training rate. This faster training rate means that the neural network models can be developed quickly, and the fixed layer structure of the RBFN (i.e. only one hidden layer) places a convenient restriction on topology, which assists the trial-and-error selection approach. An additional property of RBFNs is that it is relatively easy to introduce on-line calculation of confidence limits for the model estimates, providing a measure of local reliability. For details of the confidence limit calculations see Refs. [11,12].

The main difference between a basic feedforward ANN and an RBFN is the non-linear transfer function used in the hidden layer nodes. Instead of the sigmoid used previously (Eq. (1)), radial basis function units employ a symmetrical Gaussian density function

a_{hk} = \exp(-\|\hat{x}_h - x_k\|^2 / \sigma_h^2),   (2)

where a_{hk} is the activation of the hth unit in the hidden layer given the input vector x_k. The vector \hat{x}_h is the n-dimensional position of the centre of the hth radial unit in the space covered by the input vectors (where n is the number of input variables used). The activation, and hence the output, of the unit depends upon the distance between the given input vector x_k and the unit centre \hat{x}_h: the closer the input lies to the centre, the higher the activation of the unit, i.e. the larger the value passed on to the output layer of the network. The parameter \sigma_h is a local scaling constant, which determines the distance in input space over which the hth unit has influence. The transfer function of the unit is therefore only significantly activated if the input vector presented falls within this distance of the unit centre.

The training procedure for RBFNs decomposes naturally into three distinct stages: (i) locating the centres \hat{x}_h of the radial units; (ii) determining \sigma_h, the local scaling constant or width of each radial unit; and (iii) calculating the network weights for the interconnections between the radial basis layer and the output layer. The training scheme proposed by Moody and Darken [17] is used in this work.

The radial basis unit centres are determined using a technique called k-means clustering [10,15,17]. This algorithm partitions the training data into H subsets or "clusters" in the input space. Each of these clusters has a centre which is directly associated with one of the H radial basis units in the RBFN, and the training data points are assigned to the cluster with the nearest centre. The clustering process seeks to find a local minimum of E_{k-means}, the total of the squared Euclidean distances between the K training points assigned to each cluster and the H cluster centres,

E_{k-means} = \sum_{h=1}^{H} \sum_{k=1}^{K} M_{hk} \|\hat{x}_h - x_k\|^2,   (3)

where M_{hk} is the cluster membership function, an H x K matrix whose columns contain a single "1" in the row corresponding to the cluster to which the training point belongs, and zeros elsewhere.

This work uses a "batch" version of the k-means clustering algorithm [15], in which all the training data are available initially to perform the clustering. The centres of the clusters are initialised by randomly selecting points from the training data set. The training set is then clustered by assigning each point to the nearest unit. When the entire set has been assigned, the average position of the training points within each cluster is calculated and the cluster centre is moved to that point. This process of assigning and averaging is repeated until the cluster centres no longer move, at which point the process is said to have converged. Each of the final cluster centres then becomes the centre of a radial basis unit, \hat{x}_h.
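The batch k-means procedure just described can be summarised in a short sketch: random initialisation from the training points, followed by assign-and-average iterations until the centres stop moving, which locally minimises Eq. (3). This is a generic implementation under those assumptions; the function name and the convergence test are illustrative and not taken from Refs. [15,17].

```python
import numpy as np

def kmeans_batch(X, n_clusters, max_iter=100, seed=0):
    """Batch k-means: centres initialised from randomly chosen training points,
    then alternate (i) assigning each point to its nearest centre and
    (ii) moving each centre to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), n_clusters, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # squared Euclidean distance of every point to every centre
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centres = np.array([X[labels == h].mean(axis=0) if np.any(labels == h)
                                else centres[h] for h in range(n_clusters)])
        if np.allclose(new_centres, centres):
            break                       # converged: the centres no longer move
        centres = new_centres
    return centres, labels
```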

The second stage of the training process involves calculating the parameters \sigma_h of the radial basis units, representing the receptive width over which each unit will be significantly activated. An appropriate width can be determined using a P-nearest-neighbour heuristic,

\sigma_h = \left( \frac{1}{P} \sum_{j=1}^{P} \|\hat{x}_h - \hat{x}_j\|^2 \right)^{1/2},   (4)

where the \hat{x}_j are the P nearest neighbours of \hat{x}_h. A suggested value for P is 2 [17]. The value of P is an important design parameter for RBFNs that needs to be chosen correctly, by cross-validation techniques, in order to yield the optimum model [11,12]. Although it has a direct influence only on the widths of the radial basis units in the input space (causing differences in unit activation for otherwise identical clusterings), the value of P has an indirect effect upon the mapping of the activations on to the correct output space, because the trained connection weights change when the effective inputs (the unit activations) are altered. In one sense, it is similar to presenting a different input training data set to the network.
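The remaining two training stages, the P-nearest-neighbour width heuristic of Eq. (4) and the determination of the hidden-to-output weights by linear least squares, can be sketched as follows. The sketch assumes the centres have already been located (stage (i), e.g. by the k-means sketch above), that P is smaller than the number of centres, and that a bias column is appended to the unit activations of Eq. (2); all function names are illustrative.

```python
import numpy as np

def rbf_widths(centres, P=2):
    """Eq. (4): sigma_h is the root-mean-square distance from centre h
    to its P nearest neighbouring centres."""
    d2 = ((centres[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)                # exclude the centre itself
    nearest = np.sort(d2, axis=1)[:, :P]        # P smallest squared distances
    return np.sqrt(nearest.mean(axis=1))

def rbf_activations(X, centres, widths):
    """Eq. (2): Gaussian activation of each radial unit for each input vector."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / widths[None, :] ** 2)

def rbf_output_weights(X, y, centres, widths):
    """Stage (iii): hidden-to-output weights (plus a bias term) solved in one
    step by linear least squares."""
    A = rbf_activations(X, centres, widths)
    A = np.hstack([A, np.ones((len(X), 1))])    # bias column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w
```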

2.3. Process data

Data from laboratory scale (20 L working volume) recombinant Escherichia coli fermentations were supplied by Zeneca Pharmaceuticals. Twenty-four fed-batch fermentations were performed, and a number of measurements, such as carbon dioxide evolution rate (CER), oxygen uptake rate (OUR), yeast extract feed (F1), glycerol feed (F2), temperature (TP) and batch age (T), were available on-line at 30 s intervals. Biomass and protein concentrations were measured off-line at irregular intervals.

3. Results and discussion

3.1. Model topology

Twelve of the data sets were chosen to form a training set, used to train the different models, i.e. to determine the model parameters for a range of input variable combinations. These sets were then subdivided into three training subsets containing 4, 8 and 12 data sets from the original set, respectively, allowing model training with different amounts of process information so that the influence of this factor upon model performance could be assessed. The remaining twelve sets formed the testing set, unseen by the models during the training process. This set was used to select, by cross-validation, the best performing model for the various combinations of input variables; only the results for the best performing models are reported in this paper. The FANN topologies investigated ranged from a single hidden layer with up to 12 nodes to two hidden layers with up to five nodes in each layer. The RBFN topologies covered a range of nearest-neighbour values (P = 2-4) and numbers of radial basis units from (P + 1) to 35. Once the optimum models were found, the results from the testing stage enabled the different modelling methods to be evaluated and compared. The statistic used to measure performance is the mean sum of squared errors (MSE) over the testing set, defined as

MSE = \frac{1}{n} \sum_{i} (y_i - d_i)^2,   (5)

where y_i is the actual output (biomass or protein concentration), d_i the model-predicted output and n the number of measurements in the testing set.
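For completeness, the testing criterion of Eq. (5) amounts to the following short computation (a generic sketch; the function name is illustrative).

```python
import numpy as np

def testing_mse(y_measured, y_predicted):
    """Eq. (5): mean of the squared differences between measured and
    model-predicted values over the n testing-set measurements."""
    y_measured = np.asarray(y_measured, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    return np.mean((y_measured - y_predicted) ** 2)
```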

Tables 1 and 2 show the best results obtained for the various input variable combinations and training set sizes using the different modelling methods for biomass and recombinant protein concentrations, respectively.

Table 1
Model structure and errors (MSE) for on-line estimation of biomass concentration

Model (input combination)     FANN-CG                  FANN-CHEM                RBFN
                              Sets  Topol     MSE      Sets  Topol     MSE      Sets  P  Units  MSE
1. CER/OUR/F1/F2/TP/T           8   6-3-1      9.1       8   6-2-4-1  12.0        4   4   13     6.4
2. CER/OUR/F1/F2/T             12   5-4-2-1    4.9      12   5-4-3-1   9.7       12   4   27     8.5
3. CER/F1/F2/T                 12   4-3-5-1    7.7      12   4-3-5-1   9.6       12   4   27     9.2
4. CER/OUR/TP/T                 8   4-3-1      9.3       8   4-5-1    22.1       12   4    5    10.0
5. F1/F2/TP/T                   4   4-2-3-1   18.1       4   4-3-4-1  20.7        4   4   32    18.3
6. CER/OUR/T                   12   3-4-5-1    4.8      12   3-2-5-1  11.0       12   4   12     6.2
7. F1/F2/T                     12   3-3-3-1   14.6      12   3-3-4-1  18.4       12   4   34    13.3
8. CER/TP/T                    12   3-3-5-1    9.9      12   3-3-3-1  18.9       12   4    8     8.5
9. CER/T                       12   2-3-5-1    4.0      12   2-4-3-1  10.8       12   4   16     5.67

Note: FANN-CG and FANN-CHEM denote the feedforward network trained via the conjugate gradient and chemotaxis methods, respectively; RBFN denotes the radial basis function network. Input combinations: CER = carbon dioxide evolution rate; OUR = oxygen uptake rate; F1 = yeast extract feed; F2 = glycerol feed; TP = temperature; T = batch age (time). Sets = the training subset used to train the model (4, 8 or 12 data sets, as described in Section 3.1). Topol = best topology in terms of the number of nodes used (e.g. 3-2-1 refers to a network with three inputs, one hidden layer with two nodes and one output). P = the nearest-neighbour constant. Units = the number of radial basis function nodes in the hidden layer. MSE = the total mean squared error over the testing set.


Table 2
Model structure and errors (MSE) for on-line estimation of recombinant protein concentration

Model (input combination)     FANN-CG                  FANN-CHEM                RBFN
                              Sets  Topol     MSE      Sets  Topol     MSE      Sets  P  Units  MSE
1. CER/OUR/F1/F2/TP/T          12   6-2-2-1    2.4      12   6-3-2-1   2.8       12   2   33     1.6
2. CER/OUR/F1/F2/T              4   5-3-3-1    3.2       8   5-2-5-1   2.8        4   4    5     2.9
3. CER/F1/F2/T                  4   4-1-1      2.9       4   4-3-1     2.6        4   4    6     2.7
4. CER/OUR/TP/T                12   4-2-4-1    2.6      12   4-2-3-1   2.6       12   2   34     1.4
5. F1/F2/TP/T                  12   4-4-2-1    1.4      12   4-3-5-1   3.0       12   2   23     1.8
6. CER/OUR/T                    4   3-2-1      2.9       4   3-2-4-1   2.8        4   2   19     2.1
7. F1/F2/T                      8   3-1-1      3.4       8   3-5-1     3.3        8   2    6     2.7
8. CER/TP/T                    12   3-3-3-1    2.5      12   3-2-4-1   2.5       12   2   30     1.3
9. CER/T                        4   2-2-4-1    2.5       4   2-5-1     2.5        4   2   18     2.6

Note: Abbreviations and column headings as in Table 1.

These tables give the best topology found for each of the networks (based on the MSE over the testing set). Several observations can be made about the performance of these models. In the case of biomass estimation, the conjugate gradient trained FANN outperforms the RBFN model for almost every combination of input variables, and the chemotaxis trained FANN for every combination. The lowest MSE was achieved with CER and batch age as inputs and a 2-3-5-1 topology (MSE = 4.0). Interestingly, the same input variable combination gave the lowest MSE among the RBFN models (MSE = 5.67 for 16 radial basis units and a nearest-neighbour constant of 4). Figs. 2 and 3 graphically illustrate the closeness of fit of these two model estimations, respectively, to the actual data from one testing set. Fig. 3 also shows the 95% confidence limits calculated by the RBFN model. In these and all subsequent figures the data are presented as scaled values for reasons of commercial confidentiality.

In the case of recombinant protein estimation, the RBFN outperforms both the conjugate gradient and chemotaxis trained FANNs (with the chemotaxis network again yielding the highest errors) for most of the input variable combinations. The lowest MSE was achieved using CER, temperature and batch age as inputs, 18 radial basis units and a nearest-neighbour constant of 2. Fig. 4 shows the fit of this model estimation to the actual data from one of the testing sets, together with the 95% confidence limits.


Fig. 2. Estimation (-.-) of biomass concentration (-) for test set A using the best performing FANN model trained via the conjugate gradient method.

Fig. 3. Estimation (- - -) of biomass concentration (-) for test set A using the best performing RBFN model, including 95% confidence limits (···).


Fig. 4. Estimation (- - -) of recombinant protein concentration (-) for test set A using the best performing RBFN model, including 95% confidence limits (···).

For comparison, the estimations of the best performing FANN model (trained via the conjugate gradient method, using yeast extract and glycerol feeds, temperature and batch age as inputs with a 4-4-2-1 topology, yielding an MSE of 1.4) are shown in Fig. 5. As can be seen from these figures, the graphical fit is considerably worse than for the biomass estimation (the numerical MSE values are lower because they were calculated on unscaled data, and the difference in the magnitudes of the biomass and recombinant protein measurements is reflected in these values). This can be explained by the relationship between the input variables and recombinant protein production being more complex than that for biomass. Although the profiles presented in Figs. 2-5 look similar, the recombinant protein profiles for the rest of the testing set show great variation. To demonstrate this point, Fig. 6 shows the estimation of the best performing RBFN model over the whole testing set, where batches 1-12 follow on from each other and each peak represents the end of an individual batch.

The effect of topology upon FANN model performance has been investigated [4], but the data are not presented in this paper. It has been shown that a certain complexity of the network is required to capture the process characteristics, but over-parameterisation of the network can lead to over-training and brittleness (loss of the ability to generalise to unseen data).


Fig. 5. Estimation (-.-) of recombinant protein concentration (-) for test set A using the best performing FANN model trained via the conjugate gradient method.

3.2. RBFN topology

The topologies of the best performing RBFNs are, in the majority of cases, quite large compared with their FANN counterparts. A significant proportion of the biomass and recombinant protein models use around 15-30 radial units in the hidden layer. These numbers reflect the number of radial basis functions required to sample the input space of the training data set adequately. The basis functions and their positions are effectively chosen to represent all of the training input measurements; for the training subset containing all 12 data sets available for training, this is a total of 448 measurements, so it is perhaps not surprising that a reasonable number of units is required to cover all these training points. At the same time, the radial units attempt to capture trends and characteristics within the data set. If too few units are used, the widths of the related radial basis functions become too large to ensure that the input space is covered; each function is then activated over a wider area by a greater number of input points, and any local behaviour exhibited within the input space is not represented as effectively as it would be by a more localised radial basis function. Capturing such behaviour requires a greater number of radial units, which is reflected in the larger topologies. The varying number of units required for each input combination is due to the different spaces defined by the input variables employed.

Fig. 7 shows the testing MSE obtained for a progressively increasing hidden layer topology for biomass concentration input model 9 from Table 1. This depicts the topology search over the selected area of interest.


Fig. 6. Estimation (- - -) of recombinant protein concentration (-) for the entire testing set using the best performing RBFN model, including 95% confidence limits (···).

The minimum of the error profile can clearly be seen for the topology with 16 radial basis units. Results not shown here for topologies lying outside the area of interest yield increasing error magnitudes, indicating over-parameterisation and a consequent loss of generalisation quality.

While the RBFN models for protein estimation used mainly two nearest neighbours, the networks for biomass estimation all required the upper limit of four neighbours used in this study. Preliminary results of investigations with an increased number of nearest neighbours (P), presented in Fig. 8, show the error surface for biomass input model 9 (CER and time) over the same range of topologies (3-35 radial units) but with P in the range 2-15. This clearly shows the improvement as the value of P increases; MSE values of approximately 4.2 have been recorded with this systematic search of the wider region, in agreement with the results obtained by the FANN model.

Fig. 8 shows that there is a minimum to be located on the error surface. Because of the natural limitation on the topology of the RBFN (which in turn places a constraint on the range of P), it would be feasible to use a systematic search to find the minimum error, and hence the optimum RBFN configuration, for a particular choice of input model and training information size.
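A systematic search of this kind can be sketched as an exhaustive sweep over the integer grid of radial unit counts and nearest-neighbour constants, retraining the RBFN at each point and retaining the configuration with the lowest testing-set MSE. The sketch below reuses the illustrative helper functions from the Section 2.2 sketches (kmeans_batch, rbf_widths, rbf_activations, rbf_output_weights); the ranges shown match those reported here, but the code itself is an assumption, not the authors' implementation.

```python
import itertools
import numpy as np

def grid_search_rbfn(X_train, y_train, X_test, y_test,
                     unit_range=range(3, 36), p_range=range(2, 16)):
    """Exhaustive search over integer (number of radial units, P) pairs,
    keeping the configuration with the lowest testing-set MSE (Eq. (5))."""
    best_mse, best_config = np.inf, None
    for n_units, P in itertools.product(unit_range, p_range):
        if P >= n_units:                         # P nearest neighbours must exist
            continue
        centres, _ = kmeans_batch(X_train, n_units)
        widths = rbf_widths(centres, P)
        w = rbf_output_weights(X_train, y_train, centres, widths)
        A_test = np.hstack([rbf_activations(X_test, centres, widths),
                            np.ones((len(X_test), 1))])
        mse = np.mean((A_test @ w - y_test) ** 2)
        if mse < best_mse:
            best_mse, best_config = mse, (n_units, P)
    return best_mse, best_config
```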


Fig. 7. Testing MSEs obtained for progressively increasing hidden layer topologies of RBFN, based on biomass concentration input model 9 (CER/T) in Table 1.

The surface for the biomass model is relatively smooth, and it would therefore be reasonable to consider some kind of optimisation scheme to locate the minimum. Since the topologies and P must take integer values, the search space is not infinitely large, which assists the optimisation. For the recombinant protein case, however, the variation in the required P-values seen in the best results (Table 2) indicates that the error surface may not be as smooth as in the biomass case. This would be likely to cause problems for an optimisation scheme, and a systematic search would therefore probably be more useful.

3.3. Training speed

A direct comparison between the training speeds of the FANN and RBFN models is unfortunately not possible, since they were implemented in different programming environments and the experimental runs were performed on different computer platforms. Comparison of the conjugate gradient and chemotaxis trained FANNs (performed under identical conditions) shows that the conjugate gradient method performed best, with faster convergence during training (10-900 iterations were required, depending on the complexity of the model) and a superior convergence ratio: with chemotaxis, 13.2% of the attempted models did not converge to the desired tolerance within the specified maximum number of learning iterations, whereas all the conjugate gradient models converged.

Since the weight determination for an RBFN is a simple one-step procedure, the major influence upon its training speed is the clustering of the input space.


Fig. 8. Error surface for RBFN model 9 (CER/T) estimating biomass concentration for a range of nearest-neighbour values (2 ≤ P ≤ 15).

Moody and Darken [17] have shown that the time required to train an RBFN is significantly less than that for a traditional back-propagation-trained ANN. In these experiments the k-means clustering algorithm was found to converge in under 20 s for the full 448 available training data points, reflecting the speed and efficiency of the technique [15]. The speed of the linear least-squares method for calculating the connection weights is restricted only by the number of weights to be found: for a relatively small RBFN using, for example, 10 radial units, only 11 weights (10 plus one for the bias term) need to be determined, and this is completed in a very short time. The benefits of this fast training procedure to the overall model development time are extremely significant.

3.4. Process information requirements

The variety of training set sizes utilised by the best performing models in Tables 1 and 2 is very similar for the FANN and RBFN models. For the majority of the biomass models, the best performance is obtained using the whole 12 fermentations of the training set. Some of the recombinant protein models also require the entire training set, and for every FANN protein model that used the entire training set, the equivalent RBFN model also used the whole set.


This suggests that it is preferable to make use of as much training information as is available when developing a model. Of course, as with any data-based model, the quality of the training data significantly influences the model performance. It has also been shown [4] that the selection of appropriate input variables has a more significant effect upon model performance than the size of the training set. A clear example of this in the case of the RBFN is the significant performance difference between biomass concentration input models 2 and 3, despite their identical configurations of training sets, P and radial units.

4. Conclusion

This study has compared the performance of conjugate gradient and chemotaxis trained FANN and RBFN models in the estimation of fermentation performance parameters. It has been shown that the conjugate gradient FANN and the RBFN provide predictions of similar quality for both biomass and recombinant protein concentrations, provided their structures (especially the nearest-neighbour constant in the RBFN) are optimised. The chemotaxis trained FANN was the slowest and least accurate of the compared modelling techniques. Issues related to model structure, training speed and process data requirements have been discussed. It has been shown that training a FANN can take a substantial length of time compared with an RBFN; both can, however, produce rich mappings that perform well when presented with new, unseen data. Additionally, RBFNs can readily provide confidence limits for their predictions, which are very desirable in industrial applications. Although techniques exist to provide these limits for FANN models [3], their calculation is more straightforward in the case of the RBFN.

The ability of ANNs to learn process characteristics with little prior knowledge is a desirable trait that eases their implementation and heightens their modelling potential. Combining some user knowledge of the process with this learning ability makes ANNs powerful and flexible tools that are well suited to modelling the fermentation process. Conventional modelling techniques were also applied to the task described in this paper; the comparison is given in Ref. [4]. The results of the ANN-based study have led to the implementation at Zeneca Pharmaceuticals of modelling software (ZENNET) developed by the University of Newcastle. It is now routinely used by bioprocess research personnel to aid process development.

5. Further reading

[6]; [9]; [14]; [20]; [24].

Acknowledgements

The authors gratefully acknowledge the support of the Department of Chemical and Process Engineering, University of Newcastle upon Tyne, Zeneca Pharmaceuticals, and the Biotechnology and Biological Sciences Research Council in the UK. The comments of the reviewers are also gratefully acknowledged.


References

[1] H.J. Bremermann, R.W. Anderson, An alternative to back-propagation: a simple rule for synaptic modification for neural net training and memory, Internal Report, Department of Mathematics, University of California, Berkeley, 1989.
[2] S. Chen, S.A. Billings, Representations of non-linear systems: the NARMAX model, Int. J. Control 49 (1989) 1013-1032.
[3] G. Chryssolouris, M. Lee, A. Ramsey, Confidence interval prediction for neural network models, IEEE Trans. Neural Networks 7 (1996) 229-232.
[4] J. Glassey, Application of artificial neural networks to fermentation development and supervision, Ph.D. Thesis, Department of Chemical and Process Engineering, University of Newcastle upon Tyne, 1994.
[5] J. Glassey, G.A. Montague, A.C. Ward, B.V. Kara, Enhanced supervision of recombinant E. coli fermentations via artificial neural networks, Proc. Biochem. 29 (1994) 387-398.
[6] R. Hecht-Nielsen, Neurocomputing, Addison-Wesley, Reading, MA, 1990.
[7] N.A. Jalel, D. Tsaptsinos, A.A. Mirzai, J.R. Leigh, K. Dixon, Modelling the oxytetracycline fermentation process using multi-layered perceptrons, Proc. IFAC Modelling and Control of Biotechnical Processes, CO, USA, 1992, pp. 415-418.
[8] M.N. Karim, S.L. Rivera, Application of neural networks in bioprocess state estimation, Proc. ACC, 1992, pp. 495-499.
[9] A.H. Kramer, A. Sangiovanni-Vincentelli, Efficient parallel learning algorithms for neural networks, in: D.S. Touretzky (Ed.), Advances in Neural Information Processing Systems, vol. 1, Morgan Kaufmann, Los Altos, CA, 1989, pp. 40-48.
[10] J.A. Leonard, M.A. Kramer, Radial basis function networks for classifying process faults, IEEE Control Systems (1991) 31-38.
[11] J.A. Leonard, M.A. Kramer, L.H. Ungar, Using radial basis functions to approximate a function and its error bounds, IEEE Trans. Neural Networks 3 (1992) 624-627.
[12] J.A. Leonard, M.A. Kramer, L.H. Ungar, A neural network architecture that computes its own reliability, Comput. Chem. Engng. 16 (1992) 819-835.
[13] P. Linko, Y. Zhu, Neural network modelling for real-time variable estimation and prediction in the control of glucoamylase fermentation, Proc. Biochem. 27 (1992) 275-283.
[14] L. Ljung, System Identification: Theory for the User, Prentice-Hall, New Jersey, 1987.
[15] J. MacQueen, Some methods for classification and analysis of multivariate observations, Proc. 5th Berkeley Symp. Math. Stat. and Prob., Berkeley, CA, 1967, pp. 281-297.
[16] S. Makram-Ebeid, J.A. Sirat, J.R. Viala, A rationalized error back-propagation learning algorithm, IJCNN 2 (1989) 373-380.
[17] J. Moody, C.J. Darken, Fast learning in networks of locally-tuned processing units, Neural Comput. 1 (1989) 281-294.
[18] G.K. Raju, C.L. Cooney, Using neural networks for the interpretation of bioprocess data, Proc. IFAC Modeling and Control of Biotechnical Processes, CO, USA, 1992, pp. 425-428.
[19] J.O. Rawlings, Applied Regression Analysis: A Research Tool, Wadsworth & Brooks/Cole Advanced Books, Belmont, CA, 1988.
[20] S.L. Rivera, M.N. Karim, On-line estimation of bioreactors using recurrent neural networks, Proc. IFAC Modeling and Control of Biotechnical Processes, CO, USA, 1992, pp. 159-162.
[21] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature 323 (1986) 533-536.
[22] J. Thibault, V. Van Breusegem, A. Cheruy, On-line prediction of fermentation variables using neural networks, Biotechnol. Bioeng. 36 (1990) 1041-1048.
[23] V. Van Breusegem, J. Thibault, A. Cheruy, Adaptive neural models for on-line prediction in fermentation, Can. J. Chem. Eng. 69 (1991) 481-487.
[24] M.R. Warnes, Modelling techniques for improved supervision of recombinant Escherichia coli fermentations, Ph.D. Thesis, Department of Chemical and Process Engineering, University of Newcastle upon Tyne, 1996.
[25] M.J. Willis, C. Di Massimo, G.A. Montague, M.T. Tham, A.J. Morris, Artificial neural networks in process engineering, IEE Proc.-D 138 (1991) 256-266.


Mark R. Warnes originally studied as a Theoretical Physics undergraduate at the University of Newcastle upon Tyne, UK from 1989 to 1992. After graduation, he remained at Newcastle to embark upon the degree of Doctor of Philosophy with the Department of Chemical & Process Engineering. These postgraduate studies included the consideration of feedforward and radial basis function neural networks as modelling tools for recombinant fermentation processes. Having finished these studies in September 1995 (submitting his thesis in May 1996) Mark currently works in Coventry, UK as the Web Editor for the University of Warwick (http://www.warwick.ac.uk).

Jarmila Glassey graduated from the Faculty of Chemical Technology, Slovak Technical University in Bratislava, Slovak Republic in 1990. After graduation she worked as a research associate in the Department of Chemical and Process Engineering at the University of Newcastle where she was awarded the degree of Doctor of Philosophy in 1995. Jarka became a lecturer in this Department in 1994 and is leading the biochemical engineering modules within the Department. Her research interests are in the area of bioprocess supervision, modelling, optimisation and control using databased modelling techniques.

Gary A. Montague graduated in chemical engineering from the University of Newcastle. After studying at the University of Sheffield for a year, he returned to Newcastle to undertake his Ph.D. research. Following a year’s secondment to ICI, Gary took up his lecturing commitments and is currently Professor of Bioprocess Control in the Department of Chemical and Process Engineering at the University of Newcastle. Gary has undertaken research predominantly in the areas of process monitoring, control and optimisation. He collaborates closely with industry and has worked extensively with many major pharmaceutical companies.

Bo Kara’s background is in microbial physiology and biochemical engineering gained during graduate and postgraduate studies at the University of Manchester Institute of Science and Technology (UMIST). Bo joined Zeneca Pharmaceuticals (then ICI Pharmaceuticals) in 1987 after a brief period working in the brewing industry (BRF International). Current interests are in microbial physiology, expression system development, fermentation process optimisation and scale up. Other interests include process modelling and control, particularly the application of artificial neural networks to bioprocess modelling.