A Case Study in Industrial Bioprocess Advisory System Development


Copyright © IFAC Advanced Control of Chemical Processes, Pisa, Italy, 2000


Dr Jarka Glassey*, Prof Gary Montague and Dr Pankaj Mohan+

* Department of Chemical and Process Engineering, University of Newcastle, Newcastle upon Tyne, NE1 7RU, England.
+ Eli Lilly Speke Operations, Fleming Road, Speke, Liverpool, England.

Abstract: This paper describes the background and motivation for the construction of a fault detection and advisory system for an industrial fermentation process plant. The need to utilise both algorithmic and rule based fault detection methods is discussed. Following this, the implementation strategy based on the use of G2 from Gensym is outlined. The KAT knowledge elicitation method was used to efficiently capture the fault detection rule set. Examples of the method of elicitation, form of the rules and ease of implementation are given. Finally, the integration with multivariate data-based methods for fault detection and process application of the combined system is described. Copyright © 2000 IFAC

Keywords: Knowledge based systems, knowledge acquisition, bioprocess monitoring, fault detection, multivariate data analysis, knowledge management

1. BACKGROUND

Typical supervision issues that arise in traditional chemical process plant operation are compounded in bioprocesses by problems related to the biological aspects of production. As a result, a high level of variability in production is common. Feedback control systems are scarce in these processes, primarily due to the difficulties in measuring the actual state of the bioprocess. Measurements that can be made require some degree of interpretation before they can be used to make plant adjustments. Furthermore, the information required can be hidden in a number of temporal process measurements. Thus human expertise in pattern extraction and interpretation is exploited. 'Human in the loop' control systems can be effective, but by their very nature are not consistent. Operators vary in capability and process experts are not always available twenty-four hours a day, so the control is not always ideal. One means of improving the situation is to exploit artificial intelligence (AI) procedures.

The developments in real-time Knowledge Based Systems (KBS), or Expert Systems (ES), have enabled the application of AI techniques for monitoring and control of bioprocesses. A number of approaches have been aimed at using KBS to increase the level of automatic process supervision (Cooney et al., 1988; Konstantinov et al., 1994). At the supervisory level the ES functions to simultaneously monitor all process information and thus gain insight into the overall progress of the bioprocess. Pharmaceutical companies are now starting to recognise the worth of AI techniques in general and ES in particular. Whilst a number of ES shells are available commercially, G2 from Gensym is finding widespread use, with the first reported industrial application being by Eli Lilly in 1992 (Fowler et al., 1992). Since then many other pharmaceutical companies have purchased and are using G2 as a user-friendly means of coding and implementing expert knowledge, but this is not the only issue to consider. The expert knowledge must be accumulated, checked for consistency and, most importantly, stored and coded in such a way that it is maintainable. Process plants are subject to change and the knowledge base must evolve along with them. Without this the system would soon become redundant.

Considering the knowledge elicitation task, many approaches have been suggested, most of which are based on some variant of repertory grid analysis, card sort, goal decomposition, protocol analysis, forward scenario simulation or structured interviews. However, these techniques have a number of limitations. Some result in a knowledge base that is not well suited to coding in a machine-usable form, others tend to be highly time and resource demanding (lengthy interviews) and yet others are incapable of capturing unusual situations within the domain that may be the most important for the ES to handle.

Recently a new Knowledge Acquisition Technique (KAT), developed by Empiricom, has been shown to significantly reduce the knowledge elicitation time and to result in a complete, correct and consistent knowledge base (Duke, 1992). Fundamentally, the knowledge elicitation procedure is a highly structured and methodical seeking of successive falsifications of the states of belief of the expert about some core belief state. During interviews the knowledge base is structured in the form of exception graphs that capture the expert's decision process. From these graphs production rules can be generated and coded relatively straightforwardly within G2 or indeed any such shell.

However, rules extracted from experts do not provide an all-encompassing solution to bioprocess control problems. Algorithmic methods are more suited to tackling certain forms of problem domain, such as the extremely data rich situations encountered in large scale bioprocess operation. In such cases cognitive overload is a recognised problem. Thus an efficient bioprocess control system must be a marriage of different technologies, with algorithmic procedures complementing rule-based methods. The work described in this paper concentrates upon fault handling (see Kramer and Fjellheim, 1995 for a review of recent progress in this area). Specifically, two aspects of knowledge based fault handling system development that were key to the success and were not commonly adopted practice are described: the knowledge elicitation experience leading to the rule-based system and the added capabilities provided by the algorithmic methods.

2. APPLICATION OUTLINE

Eli Lilly at Liverpool in the UK operate a large scale fermentation facility producing several products using traditional fermentation techniques. As part of their on-going programme for continuous process improvement, several years ago they installed a G2 system for process monitoring purposes. Most recently, a project was initiated to extend this system further by exploiting the knowledge of the engineers and scientists more readily. The aim was to address all stages of the process with the objective of reducing process variability and increasing productivity. The objective was to code knowledge within G2 to rapidly provide assistance to the operators. The seed stage fermentation was seen as one of the most critical steps in the process and hence the project set out to warn the operators if conditions were arising which would lead to a poor seed.

With seed quality identified as the issue, the question arises of how to measure it. No single parameter identifies a seed as being good. At the end of the seed stage, the decision as to the quality is based upon the operating characteristics experienced and some combination of the process measurements, interpreted by the experts. Ultimately, if high performance was attained at the end of the production run, then a good seed was indicated. However, problems arise when a poor performance was attained, since this could be related to problems in the production stage not influenced by the seed. Thus for knowledge based system purposes a good seed was left as an abstract term judged by the expert.

3. KNOWLEDGE ELICITATION

The quality of the final rule-base is obviously highly dependent upon the correctness of the expert knowledge and the capability of the elicitation method to capture this in a complete and accurate manner. An early and critical decision point therefore was the identification of a panel of experts to be interviewed, in which the knowledge base owner, the industrial project leader, played a key role. Fourteen experts made up the panel, three knowledge engineers were available to conduct the elicitation and two calendar months were assigned for this activity.

The KAT process results in a unique exception logic that can be illustrated in a simple form as follows. A good quality seed is normally achieved, unless condition A occurs, in which case the expert believes the quality of the seed will be poor. However, if condition B also occurs then the seed quality will be good. Associated with these conditions, actions and explanations are sought in a defined and highly structured fashion from the expert in order to complete the advisory system. In this simple case the resulting rules would have the form:

    Good seed = true
    if condition A occurs
        conclude Good seed = false
        actions
    end if
    if condition A occurs and condition B occurs
        conclude Good seed = true
        actions
    end if
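The exception logic above can be mimicked in ordinary code. The following minimal Python sketch is an illustration only; it is neither G2 rule syntax nor the authors' implementation, and the names `Rule`, `evaluate_seed`, `condition_a` and `condition_b` are hypothetical. It assumes that rules are held in order from the most general exception to the most specific one, so that a later rule that fires overrides an earlier conclusion.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Facts = Dict[str, bool]


@dataclass
class Rule:
    """One exception rule: if `condition` holds, conclude `good_seed`."""
    name: str
    condition: Callable[[Facts], bool]
    good_seed: bool
    action: str  # advice shown to the operator (placeholder text)


# Rules ordered from the most general exception to the most specific one.
RULES: List[Rule] = [
    Rule("A", lambda f: f["condition_a"], False, "advice for condition A"),
    Rule("A and B", lambda f: f["condition_a"] and f["condition_b"], True,
         "advice for conditions A and B"),
]


def evaluate_seed(facts: Facts) -> Tuple[bool, List[str]]:
    """Return the concluded seed quality and the actions of rules that fired."""
    good_seed = True  # default belief: a good quality seed is normally achieved
    actions: List[str] = []
    for rule in RULES:
        if rule.condition(facts):
            good_seed = rule.good_seed  # a later, more specific rule overrides
            actions.append(rule.action)
    return good_seed, actions
```

For example, `evaluate_seed({"condition_a": True, "condition_b": False})` concludes a poor seed, while setting both conditions to true restores the good-seed conclusion.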

In the application considered the rule-base is more complex than this but the principle remains the same. The interview process was completed within the planned two calendar month schedule, with the longest interview taking four man-days. The majority of the interviews took around one man-day. The exhaustive nature of the interviews limited single elicitation sessions to no longer than half a day. In theory there is no requirement for the knowledge engineer to be experienced in the subject domain. Best practice is to start with the most experienced expert who has the most extensive coverage of the domain and is cooperative in offering all possible scenarios. This acts to focus the subsequent expert interviews more efficiently. In the case of this application, the knowledge engineers had an appreciation of the basic features of the domain. This resulted in a faster, more complete elicitation being achieved. There are several hidden dangers, though, if the knowledge engineer is too experienced in the domain: biasing the knowledge base from preconceived opinions, modifying rather than simply recording the knowledge and subconsciously intimidating the expert during the interview.

4. ALGORITHMIC PROCEDURES

Multivariate data analysis techniques provide a means of compressing high dimensional data onto a lower dimension to extract the key features. They have proved effective in many diverse applications (Nomikos and MacGregor, 1995), with a common theme being the fact that the patterns relating to problems were hidden in a complex data set and distributed throughout it, both among many variables and over time. In such cases univariate Statistical Process Control (SPC) cannot always identify process problems. It is likely that this situation is found with the seed fermentations due to the complex interactions occurring between variables.

Analysis of the quality of the seed is problematic. The data available on-line is of limited accuracy due to inherent characteristics of the seed stage of most bioprocesses (e.g. due to low initial biomass concentrations the off-gas measurements are not as accurate as in subsequent stages). Furthermore, a number of measurements are available only from laboratory analysis (e.g. mycelial volume). However, the major problem is that there is no direct indication of the seed quality at the time of transfer into the production vessel. The quality of the seed is only ascertained retrospectively once final production data is available, and even then only if there are no deviations encountered during the production run. Due to confidentiality constraints it is not possible to identify the precise variables used in the analysis; however, they are typical of the measurements collected during the seed stage of a fermentation process.

In this paper the application of Multiway Principal Component Analysis (MPCA) to pilot plant fermentations is demonstrated. MPCA extracts features present in the data, and the compressed information in the form of the principal components (PCs) can be plotted to assist in identifying process variations. To maximise the efficiency of the feature extraction method it is necessary to:

• Select the process variables containing the most relevant information. Incorporating irrelevant variables reduces the precision of the technique by introducing noise. Given the limited number of variables available, an exhaustive testing of all possible combinations was performed in this project.

• Select the combination(s) of PCs that show the features most effectively. It is not necessarily the PCs that capture the greatest variability in the process measurements that are most descriptive of the features relating to production variability. While it is not necessary to investigate combinations of all the PCs (the lower ones tend to capture the noise), it is desirable to assess a reasonable number of combinations visually. In a practical situation this could be extremely time consuming and the automation of the selection is preferable. This can be achieved by assessing the separation of high and low quality batches achieved. However, in this application it was possible to assess all the combinations of PCs capturing up to 80% variance (normally no more than 5-10) visually since the dimension of the data was low.

Methodology development, including consideration of pilot plant fermentations, forms a separate publication (Cunha et al., 1999) and only the key results are summarised here. Data from 20 seed batches was available and, following the selection of the process variables and their sampling rates, MPCA was applied to this data. Figure 1 shows the plot of PC1 and PC2 that demonstrates the clearest separation between the batches.

Figure 1. Score plot of the first two PCs for 20 pilot stage seed cultivations. Δ seeds carried out in vessels used for production seed cultivations; * seeds carried out specifically for pilot plant experiments.

The right-hand cluster (Δ) represents batches grown in fermenters used for production seed cultivations whilst the left-hand clusters (*) represent batches carried out specifically for the pilot plant experiments and thus run in different vessels. Although the operating procedures are identical, variations in vessel geometry and conditions clearly influence the behaviour. This confirms the long held belief of the influence of the vessel on fermentation performance. However, this distinguishing feature is not influential where production is concerned since the seed fermentations are carried out in the same vessel. Thus MPCA was applied solely to the batches in the right-hand cluster (Δ). Figure 2 illustrates the degree of separation within this cluster. In this figure (o) represents batches that ultimately resulted in low final stage productivity while (+) represents the high productivity batches.

It can be seen that these are predominantly linearly separable, apart from two low productivity batches clustered with the high ones. This is not unexpected since no account was taken of the final stage conditions and it is feasible that the seed was of good quality but problems occurred in the final stage. Since these results appeared to offer a useful indicator of seed quality, the methodology is currently being applied to data from production scale.

5. IMPLEMENTATION IN THE G2 ENVIRONMENT

In the final G2 based intelligent system, rule-based advice sits alongside information provided by data based methods (univariate / multivariate SPC) as illustrated in Figure 3. In this figure three components make up the KBS. The univariate SPC indicates deviations in process variables from standard profiles. The rules elicited from the experts incorporated the univariate SPC procedures as this was standard monitoring policy in the Company. Hence the knowledge elicited was in many cases a direct implementation of univariate SPC and was in standard rule form. Multivariate SPC (MSPC) could not be easily implemented in such a way. Rather than rules, algorithmic methods provided the operational information from the data. Methods are available to interpret the output of the MSPC and come up with likely process problems, but relating the information back directly to process causes rather than combinations of effects can be more challenging, depending upon the particular fault occurring. In some instances this will require the use of a knowledge base which can only be constructed once experience is gained in the performance of MSPC. Thus in the current system MSPC provides indications of problems but the output may require interpretation by operators to determine corrective action. The true power of the MSPC approach will be gained when the KBS interpreting its output is constructed. This is part of an on-going improvement programme.
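As a rough illustration of the multivariate analysis, the sketch below shows one common way of carrying out MPCA on batch data with NumPy: each batch is unfolded into a single row, the columns are autoscaled, and PC scores are obtained from a singular value decomposition. It also ranks pairs of PCs by a simple separation measure between high and low productivity batches, which is one possible way of automating the selection of PC combinations. This is not the authors' implementation; the function names, the batch-wise unfolding, the autoscaling and the separation measure are all assumptions made for illustration.

```python
from itertools import combinations
from typing import List, Tuple

import numpy as np


def mpca_scores(batches: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Batch-wise unfold a (batch, time, variable) array and return PC scores."""
    n_batches, n_times, n_vars = batches.shape
    # Unfold: one row per batch, one column per (time, variable) pair.
    unfolded = batches.reshape(n_batches, n_times * n_vars)
    # Autoscale each column; constant columns are only mean-centred.
    mean = unfolded.mean(axis=0)
    std = unfolded.std(axis=0, ddof=1)
    std[std == 0.0] = 1.0
    scaled = (unfolded - mean) / std
    # PCA via singular value decomposition; the scores are U * S.
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def separation(scores: np.ndarray, is_high: np.ndarray) -> float:
    """Distance between class centroids divided by the pooled score spread."""
    high, low = scores[is_high], scores[~is_high]
    gap = np.linalg.norm(high.mean(axis=0) - low.mean(axis=0))
    spread = np.sqrt(0.5 * (high.var(axis=0).sum() + low.var(axis=0).sum()))
    return float(gap / spread) if spread > 0 else float("inf")


def rank_pc_pairs(
    scores: np.ndarray, is_high: np.ndarray
) -> List[Tuple[Tuple[int, int], float]]:
    """Rank every pair of PCs (zero-based indices) by class separation."""
    pairs = combinations(range(scores.shape[1]), 2)
    ranked = [((i, j), separation(scores[:, [i, j]], is_high)) for i, j in pairs]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

With `scores = mpca_scores(batches, 5)` and a boolean array `is_high` marking the high productivity batches, `rank_pc_pairs(scores, is_high)[0]` gives the PC pair with the largest separation under this measure. A visual check of the corresponding score plot would still be advisable.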

Figure 2. Score plot of the first two PCs for pilot stage seed cultivations of seeds cultivated specifically for pilot plant experiments. + high final productivity; o low final productivity.

With regard to the rule-based information, following elicitation from individual experts, a fused knowledge base made up of the expertise of fourteen experts was constructed by the knowledge engineers and agreed by the top experts and the knowledge base owner. The next stage was to convert the exception graphs into production rules for implementation within G2. Although a software package is available from Empiricom that will convert the graphs directly into G2 rules or objects, in this application the rules were written manually for experience and assessment purposes. This was a relatively straightforward and rapid exercise. Two man-weeks were required to produce a final set of fifty rules. However, a number of critical issues had to be addressed before these rules could be implemented on-line. In order for the operators to accept and use the system, it is vital to display only the necessary information and in a form that is acceptable to them. Therefore a discussion with the operators was arranged and their views captured. It was also essential to construct the system in a way that would require minimum intervention at a later date and would allow easy expansion as the knowledge evolves. The nature of G2 allows this to be achieved with relative ease.
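The conversion of exception graphs into production rules mentioned above can be pictured with a small sketch. The real KAT graph format and the Empiricom conversion package are not described here, so the `ExceptionNode` structure and the `to_rules` function below are hypothetical. The sketch assumes that each node holds a condition that falsifies its parent's belief, and that the premise of a rule is the conjunction of the conditions on the path to that node.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ExceptionNode:
    """Hypothetical node of an exception graph (a tree in this sketch)."""
    condition: str   # condition that falsifies the parent's belief
    conclusion: str  # belief held once this condition occurs
    exceptions: List["ExceptionNode"] = field(default_factory=list)


def to_rules(node: ExceptionNode, path: Tuple[str, ...] = ()) -> List[str]:
    """Flatten a node and its nested exceptions into if/conclude rule strings."""
    premise = path + (node.condition,)
    rules = [f"if {' and '.join(premise)} conclude {node.conclusion}"]
    for child in node.exceptions:
        # More specific exceptions are emitted after the rule they override.
        rules.extend(to_rules(child, premise))
    return rules


# The simple seed example: condition A makes the seed poor unless B also occurs.
graph = ExceptionNode(
    "condition A occurs", "Good seed = false",
    [ExceptionNode("condition B occurs", "Good seed = true")],
)
rules = to_rules(graph)
```

Because the traversal is depth-first, general rules precede their exceptions, which keeps the generated list easy to extend when a new exception is elicited.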

[Figure 3: block diagram. Recoverable labels: Intelligent Alarming; Process Inputs; Process Effects; Operational Advice; Feedback via operator intervention / direct feedback opportunities.]

Figure 3. Overview of information and knowledge base

One problem arising for any company that does not have the track record of Eli Lilly in implementing knowledge-based systems is justifying the capital outlay to build the system. The time of experts is valuable and the costs involved in coding this information can be high; therefore it is essential to build a business case. This requires some assessment of the likely savings, and without experience of application this can be difficult. Eli Lilly have adopted the G2 system widely, having found considerable benefit, so project justification was more straightforward in their case.

6. CONCLUDING COMMENTS

The project described in this paper set out to implement an artificial intelligence based supervisory system making maximum use of the available information. The diverse forms of knowledge necessitated using several different approaches, all integrated into one package.

With regard to knowledge elicitation, a direct comparison of the effectiveness of the KAT method with alternative techniques is almost impossible without using a different technique on each of several different experts in the same area of a given domain. Once an elicitation from an expert has been completed using one technique, it is impossible to re-elicit using an alternative technique without biasing the result. However, the depth of the knowledge and the speed with which it was elicited indicate that the KAT method is a particularly fast and efficient procedure.

Seed data analysis using MSPC appears to provide some very useful information on seed quality which cannot be readily obtained using rule-based or univariate SPC. The use of the MSPC method alongside the expert knowledge has the potential to provide a powerful analysis tool for complex processes.

ACKNOWLEDGEMENTS

The authors are indebted to Eli Lilly Speke Operations for their financial support and permission to publish the results of the work and Gensym for their provision of software. Furthermore the contributions of Peter Duke and Richard Turner of Empiricom are gratefully acknowledged. Thanks also to Claudia Cunha and Anubhav Ranjan for their work in the algorithmic analysis and G2 implementation. The authors are indebted to all those at Eli Lilly who assisted in the construction of the knowledge base.

REFERENCES

Cooney, C.L., O'Connor, G.M. and Sanchez-Riera, F. (1988). An Expert System For Intelligent Supervisory Control of Fermentation Processes. Proc. of 8th Int. Biotech. Symp., Paris, France.

Cunha, C.C.F., Glassey, J., Montague, G.A., Albert, S. and Mohan, P. (1999). Submitted to Biotech. Bioeng.

Duke, P. (1992). 'KAT A Knowledge Acquisition Techniques Methodology manual'.

Fowler, G., Alford, J. and Higgs, R. (1992). Development of real-time expert systems approach for the on-line analysis of fermentation respiration data. Proc. 2nd IFAC Symposium on Modelling and Control of Biotechnical Processes, Keystone, Colorado.

Gregersen, L. and Jørgensen, S.B. (1999). Supervision of fed-batch fermentations. Chem. Eng. J., 75, 69-75.

Ignova, M., Glassey, J., Ward, A.C. and Montague, G.A. (1997). Multivariate Statistical Methods in Bioprocess Fault Detection and Performance Forecasting. Trans. Inst. MC, 19, part 5, 271-279.

Konstantinov, K.B., Golini, F. and Hu, W. (1994). Expert systems in the control of animal cell culture processes: Potentials, functions and perspectives. Cytotechnology, 14, 233-246.

Kramer, M.A. and Fjellheim, R. (1995). Proc. Int. Conf. on Intelligent Systems in Process Engineering, Snowmass, Colorado.

Nomikos, P. and MacGregor, J.F. (1995). Monitoring of batch processes using multiway principal component analysis. AIChE Journal, 40, 1361-1375.