Development of an expert system for selection and synthesis of protein purification processes


Journal of Biotechnology, 11 (1989) 275-298 Elsevier


BIOTEC 00422

J.A. Asenjo, L. Herrera * and B. Byrne
Biochemical Engineering Laboratory, University of Reading, P.O. Box 226, Reading RG6 2AP, U.K.

(Received 12 May 1989; accepted 30 May 1989)

Summary
A 'second generation' of protein purification rules has been developed and implemented in a prototype expert system (ES) using a 'shell'. The work has concentrated on developing a more precise and accurate knowledge base for the selection of optimal large scale protein purification sequences. Expert knowledge was obtained partly from the literature but mainly from industrial experts working on the large scale separation of therapeutic, diagnostic and analytical proteins. The knowledge was expressed in about 65 rules, some of which carry a degree of uncertainty. The downstream process was divided into two distinct subprocesses: a first subprocess called recovery, after which the total protein concentration is 60-70 g l⁻¹, and a second subprocess called purification. A limiting factor in the development of ESs for protein purification at present is the acquisition, clarification, formalization and structuring of expert knowledge in this domain. The main gap in accurate information was found in the knowledge required for the rational selection of high resolution purification operations. An expert system for the selection of optimal protein separation sequences will give the user a number of alternatives chosen on the basis of extensive data back-up on proteins and unit operations. This constitutes a clear case of 'expert amplification' and not of 'expert replacement'.

Key words: Expert system; Selection; Protein purification; Purification process

Correspondence to: J.A. Asenjo, Biochemical Engineering Laboratory, University of Reading, P.O. Box 226, Reading RG6 2AP, U.K.
* Permanent address: Department of Chemical Engineering, University of Chile, Beauchef 861, Santiago, Chile.

Introduction

Developments in the field of artificial intelligence (AI) in the last few years, particularly in building and using expert systems (ESs), have made AI a potentially important tool for computer-based process design and process synthesis. In general, AI is concerned with the development of computer programmes that emulate human reasoning. This requires an understanding of human problem-solving methods, particularly in areas where the amount of knowledge to be manipulated is large and/or there are significant uncertainties. An expert will narrow down the search by recognizing patterns and applying appropriate heuristic rules. Designing an interactive computer programme to do this is the subject of knowledge-based ESs, which have important potential in engineering design and operation (Banares et al., 1985a). The architecture of a complete ES is shown diagrammatically in Fig. 1; such a system can provide advice on a wide range of tasks. It typically consists of:
(a) Knowledge base: a collection of general facts, rules of thumb and models of the behaviour of the problem domain (equivalent to our long-term memory). A number of forms have been used to represent knowledge; the most widely used is the production system model (Brownston et al., 1986), in which knowledge is encoded as antecedent-consequent pairs, i.e. IF-THEN rules, and uncertainty in the knowledge is represented by confidence or certainty factors.
(b) Inference engine: uses the knowledge base and the context to solve the problem; it is equivalent to our problem-solving ability. A number of problem-solving strategies exist in current ESs.
In addition to these two main modules, the system should be provided with a workspace for keeping track of the solution process (called the 'context', equivalent to our short-term memory), a user-friendly interface, an explanation facility and a knowledge acquisition module.
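To make the production-system model concrete, here is a minimal Python sketch, not the shell used in this work: rules are antecedent-consequent pairs held as data (the knowledge base), a dictionary serves as the context or working memory, and a simple forward-chaining loop stands in for the inference engine. The parameter names are illustrative, loosely modelled on the recovery rules given later in this paper, and certainty factors are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """An IF-THEN production rule: every condition must hold for it to fire."""
    conditions: tuple   # IF part: (parameter, value) pairs
    conclusions: tuple  # THEN part: (parameter, value) pairs


def forward_chain(knowledge_base, context):
    """Repeatedly fire rules whose IF part matches the context (working
    memory) until no rule adds a new fact; return the updated context."""
    context = dict(context)  # do not mutate the caller's facts
    changed = True
    while changed:
        changed = False
        for rule in knowledge_base:
            if all(context.get(p) == v for p, v in rule.conditions):
                for p, v in rule.conclusions:
                    if p not in context:  # never overwrite an established fact
                        context[p] = v
                        changed = True
    return context


# Illustrative knowledge base; parameter names are hypothetical.
KB = [
    Rule((("MICROORGANISM", "MAMMALIAN"),), (("PRODUCT", "EXTRACELLULAR"),)),
    Rule((("PRODUCT", "EXTRACELLULAR"),), (("DISRUPTION", "NOT REQUIRED"),)),
    Rule((("PRODUCT", "INTRACELLULAR"),), (("DISRUPTION", "REQUIRED"),)),
]

facts = forward_chain(KB, {"MICROORGANISM": "MAMMALIAN"})
print(facts["DISRUPTION"])  # NOT REQUIRED
```

Adding knowledge here means appending another Rule to KB while forward_chain stays unchanged, which is the separation between knowledge base and control strategy that the next paragraph describes.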
Conventional scientific and engineering computer programmes consist of a set of statements whose order of execution is predetermined. These programmes are rather rigid, and updating them needs considerable effort (the programmer has to locate the appropriate place to update in the predefined sequence). The programmer must at all times ensure complete specification of the problem and uniqueness of the solution. ESs relax this strictness by making a clear distinction between the knowledge base and the control strategy. This partition allows knowledge to be added incrementally without changing the overall programme structure; the programmer does not need to guarantee completeness, and, by attaching confidence factors to the IF-THEN rules, the system can be made to provide a number of alternative solutions ranked by their likelihood, thus relaxing the uniqueness constraint. Knowledge-based ESs deal with difficult, ill-structured problems in complex domains for which no straightforward algorithmic solutions exist. The solution process often involves skilful manipulation of substantial quantities of knowledge in a trial and error fashion, starting out with certain assumptions and hypotheses and revising them when necessary until the solution is reached.

Although many papers have been published on the applications of ESs in engineering, there has not been much activity in the chemical engineering field. The potential of ESs in chemical engineering, however, has been recognized (Banares et al., 1985a). Process synthesis is one area where ESs can be of great help, particularly in the selection of plant equipment and in the design of separation sequences, especially in the field of biotechnology. Rigorous methods such as those used for process synthesis in chemical engineering will not be appropriate for solving the overall process synthesis and operation selection problem in biotechnology, because rigorous information and databases, mathematical correlations, design procedures and constraints are not as readily available as they are in chemical process engineering. AI has been applied to several biotechnology-related topics such as protein structure estimation, planning of experiments in genetics and process modelling ('Fuzzy Models') (Siletti and Stephanopoulos, 1986). Process selection and the rational design of large scale protein separation sequences have been explored by Siletti and Stephanopoulos (1986), by Wacks (1987) and, more recently, by Asenjo (1988).

Development of AI programmes
ES technology has been applied to relatively few systems in chemical engineering. However, there are applications in closely related areas. HEURISTIC DENDRAL is an ES written in INTERLISP that helps the chemist determine the molecular structure of organic compounds (Buchanan and Feigenbaum, 1978). FALCON is an ES for diagnosing faults in process plants (Chester et al., 1984). An example of how AI can be applied to a chemical engineering problem is the expert system CONPHYDE (CONsultant for PHYsical property DEcisions), developed by Banares and Westerberg (Banares et al., 1985b). PICON (Process Intelligent CONtrol) is an ES for monitoring and controlling industrial processes (Moore et al., 1984). HEATEX is an ES for aiding in the construction of networks that minimize energy requirements (Grimes et al., 1982). The REACT program developed by Govind and Powers (1981) generates synthetic routes to industrial chemicals.

Process design in biotechnology
Process design and selection of operations is a complex process in which the design evolves, from a preliminary stage to the final stage, in a trial and error fashion, repeatedly revising and refining the initial assumptions and restrictions: (1) flowsheet generation (qualitative/semiquantitative); (2) quantitative design of units; (3) revision of the flowsheet (1), then (2), and so on until some objective is reached. ESs must have the necessary domain knowledge of the process, such as databases and models for the performance of the important unit operations. An important aspect of process design is the selection of operations and the design of plant equipment. In the initial stages this is largely done using heuristics, i.e. rules of thumb, to arrive at a rapid (and reliable) specification of equipment type, size and possibly cost.

The design of a protein recovery and purification process shares many characteristics with other engineering design activities. Designing a process or an operation requires the satisfaction of a number of constraints (purity, quality, process temperature, desired yield) using what is known about the materials (chemical and biochemical properties, thermodynamics and fluid dynamics of the process material), ending with a sequence of equipment interconnected in a particular order. It is important to stress that the reasoning behind the design process does not rely only upon strict mathematical models. Equations can only provide the information necessary to conclude that a particular piece of equipment is appropriate; whether or not the step is included is left to the designer, in a job based mainly on judgement. Siletti and Stephanopoulos (1986) proposed a prototype design system for protein recovery that used several AI techniques, with facilities for selecting a sequence of separation steps and for allowing the biochemical engineer to create and analyze alternative processes. For the processing equipment it included both algebraic equations and qualitative (heuristic) rules. To determine a separation strategy, the physicochemical properties of the protein and contaminants had to be determined; these were estimated from detailed knowledge of the amino acid sequence of the target protein. Individual purification operations were selected on the basis of the deviations of the isoelectric point, molecular weight and hydrophobicity of the target protein from the average values of those properties in the protein mixture (the contaminants of the product protein). Processes that separated according to the most deviant properties were retained.
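The deviation-based selection attributed above to Siletti and Stephanopoulos (1986) can be sketched in a few lines of Python. The normalisation (absolute deviation divided by the mixture average), the property values and the mapping from property to operation are all assumptions made for illustration, not details taken from their system.

```python
# Hypothetical mapping from the property exploited to a separation operation.
OPERATION_FOR = {
    "isoelectric_point": "ion-exchange chromatography",
    "molecular_weight": "gel filtration",
    "hydrophobicity": "hydrophobic interaction chromatography",
}


def rank_by_deviation(target, mixture_average):
    """Rank operations by the relative deviation of the target protein's
    property from the average of that property in the mixture."""
    deviations = {
        prop: abs(target[prop] - mixture_average[prop]) / abs(mixture_average[prop])
        for prop in OPERATION_FOR
    }
    ranked = sorted(deviations, key=deviations.get, reverse=True)
    return [(OPERATION_FOR[p], round(deviations[p], 2)) for p in ranked]


# Illustrative numbers only.
target = {"isoelectric_point": 4.5, "molecular_weight": 150_000, "hydrophobicity": 0.30}
average = {"isoelectric_point": 6.0, "molecular_weight": 50_000, "hydrophobicity": 0.33}
print(rank_by_deviation(target, average))
# [('gel filtration', 2.0), ('ion-exchange chromatography', 0.25),
#  ('hydrophobic interaction chromatography', 0.09)]
```

With these numbers gel filtration comes first simply because molecular weight deviates most, whatever its efficiency on a large scale; this is the weakness discussed in the next paragraph.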
This philosophy has certain disadvantages. It will usually result in the choice of inappropriate unit operations, because different separations exploit differences in the physicochemical properties of proteins with dramatically different efficiencies (e.g. gel filtration is very inefficient on a large scale whereas ion-exchange is very efficient); moreover, protein physicochemical properties and behaviour in separation operations cannot be accurately predicted from the amino acid sequence. This interesting prototype, which included a process flowsheet, was implemented on a Symbolics 3640 LISP machine and written in ZETALISP.

Wacks (1987) developed our 'first generation' expert system (ES) to help a biochemical engineer select a protein purification sequence. After the designer has supplied some initial data on the characteristics of the product protein and its source, the system is able to generate a generalized recovery process. The system is interactive and offers explanations as to why certain choices are selected. Design of the recovery and purification sequence using the expert system starts with the user answering some basic questions on the microbe and protein product. Typical questions are "What type of microbe is being grown to generate the protein in question?" or "What is the incoming cell concentration?". The system will also ask for physicochemical properties of the target protein that affect separation, such as isoelectric point, surface hydrophobicity, molecular weight, location in the cell, and concentration. Next the system sets up, in abstract form, the overall process flowsheet, consisting of a description of those steps in the separation sequence which must be carried out in order to achieve the yield or purity decided upon by the user. Again the user may be questioned if more data are needed to determine the process selection. One such question might be "Will the product be used for therapeutic purposes?". At this point the user can type "Why?" in response to any question and will receive an explanation such as "If the product is to be used medicinally the protein has to have a specific purity (e.g. 99.98%) and cannot have specific contaminants (e.g. pyrogens below x ppm)". Since the number of doses to be given to the patient and the size of those doses will affect the purity level required, it may be better at this point to specify the exact purity required as well as the level of specific impurities that can be allowed. After the process has been selected it is displayed to the user. The system then decides on the particular unit operations to be used based on specific requirements of the target product, and again the user may be questioned if more information is required to make a more accurate decision. The final decision is made and displayed to the user. Finally, the explanation facility is made available, offering the user an explanation for the use of each unit operation. In this program all of the qualitative knowledge was represented in sets of IF-THEN rules grouped in rule bases. A description of the rule bases and an English description of the rules were given by Wacks (1987). The actual program rules were implemented in OPS5 (Brownston et al., 1986). An example rule from rule base no. 3, in which particular unit operations were chosen, is:

IF   cell disruption is required and organism is yeast or bacteria
THEN use a homogenizer (Manton-Gaulin) (rule 66)

This 'first generation' ES was written in LISP and its rules were implemented in OPS5.
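A 'Why?' facility of this kind can be sketched by storing, alongside each rule, an explanation built from its IF part and returning it when the user asks. The following is an illustrative Python sketch, not the OPS5/LISP code of Wacks (1987); the class, field and fact names are hypothetical, and only the content of rule 66 comes from the text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ExplainedRule:
    number: int
    condition: Callable[[dict], bool]  # IF part, evaluated on the facts
    conclusion: str                    # THEN part
    why: str                           # explanation returned on "Why?"


RULE_66 = ExplainedRule(
    number=66,
    condition=lambda f: (f.get("cell_disruption_required") is True
                         and f.get("organism") in {"yeast", "bacteria"}),
    conclusion="use a homogenizer (Manton-Gaulin)",
    why="Cell disruption is required and the organism is yeast or bacteria (rule 66).",
)


def consult(rules, facts):
    """Return (conclusion, explanation) pairs for every rule that fires."""
    return [(r.conclusion, r.why) for r in rules if r.condition(facts)]


for conclusion, why in consult([RULE_66], {"cell_disruption_required": True, "organism": "yeast"}):
    print(conclusion)
    print("Why?", why)
```

The explanation here simply restates the premises that caused the rule to fire, which is the simplest form of trace-based explanation; a fuller system could also report the chain of earlier rules that established those premises.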
It concentrated on building and developing the structure and main features of a system to help a biochemical engineer select a medium or large scale protein purification sequence, as shown in Fig. 1, including an inference engine, a knowledge acquisition subsystem, an explanation subsystem and a user interface. Although a few experts were interviewed, this study did not concentrate on developing a particularly accurate or advanced knowledge base. The 'second generation' ES described in this paper, called the PROTEIN system, has concentrated on developing accurate and more advanced knowledge, obtained mainly from industrial experts. The aim of the PROTEIN system is to assist in the preliminary selection of operations (flowsheeting) for downstream recovery and purification in the manufacturing processes of therapeutic, diagnostic and analytical proteins which are products of the 'new' biotechnology.

Knowledge acquisition process and use of expert system 'shells'
The major steps involved in the development of an ES are shown in Fig. 2.
(a) Identification. This involves identifying the relevant experts and resources needed as well as the knowledge engineering concepts required. This will usually consist of the translation of expert knowledge into a general formal structure, which will include only a small part of the knowledge.
(b) Formalization. This involves the selection of a knowledge representation scheme and of appropriate tools to build the ES. At this stage the knowledge engineer usually becomes familiar with the domain and performs a few preliminary interviews with the expert.
(c) Implementation. This consists of encoding the knowledge developed in the previous stage. A prototype system is developed at this stage.
(d) Testing and refinement. The prototype system is taken to the expert and tested. Several examples are run and the weaknesses in the knowledge base and the inference mechanism are identified. This final and vital step consists of knowledge acquisition through experimentation.
Currently, the major bottleneck in the development of ESs is the knowledge acquisition process (Banares et al., 1985a), which is mainly carried out in stage c and to some extent in stage d. Today there are well developed ES software systems or 'shells', e.g. Personal Consultant Plus (PC Plus, Texas Instruments) or Expert Systems Environment (ESE, IBM), that help develop an organised knowledge base from the domain knowledge and also provide the inference engine. No shell is truly universal, and as a system grows and develops it may become necessary to develop special purpose inference engines.

Fig. 1. The architecture of a knowledge based expert system (the knowledge base is shaded for emphasis) (Harmon and King, 1985). Legible labels: knowledge base (rules, facts); working memory; inference engine (inference, control); user interface; expert or knowledge engineer; user.

Fig. 2. Development process of a knowledge based expert system (ES) (Banares et al., 1985a).

The knowledge of the PROTEIN system
The knowledge available for the prototype system reported here (our 'second generation' ES) has been expressed in about 65 rules. This 'second generation' ES has been developed using a shell and has concentrated on extensive interviews and consultation, particularly with industrial experts, to develop a much more precise and accurate knowledge base in the field of selection of optimal large scale protein purification sequences (flowsheeting). The use of the rules depends upon the context in which each rule is valid, and some rules carry an associated uncertainty. The downstream process was separated into two distinct subprocesses, which were used to structure the knowledge of the problem. The first subprocess comprises taking the reaction mixture out of the biochemical reactor system (e.g. a bioreactor or a hollow fibre reactor) and processing it until a cell free solution is obtained with a total protein concentration between 60 and 70 g l⁻¹. This subprocess is recovery (Fig. 3); it comprises harvesting, cell disruption, separation of solid debris and precipitation of nucleic acids (if required), and concentration (also called dewatering). The objective of this stage is to recover the product from the production system. The second subprocess comprises taking the 60-70 g l⁻¹ protein solution and purifying the individual protein products to a high purity with a high yield. This subprocess is purification (Fig. 4); it comprises preconditioning or cleaning, high resolution purification and polishing. The objective of the preconditioning operation is to attain a 'gin clear' solution (in the experts' terms). In the high resolution purification stage (usually one or two steps), the aim is to achieve a very high purity. It is possible to achieve this in one step (e.g. affinity chromatography), but it may be more practical or more economical to carry out the high resolution purification in two steps (e.g. two ion-exchange steps; Duffy et al., 1988). If the requested purity is not attained by the high resolution purification steps, a polishing operation is designed to separate small amounts of contaminants or of aggregation or hydrolysis products. The aim of this 'prototype' expert system is to select the proper sequence of operations, i.e. the flowsheet.

Fig. 3. The operations involved in the recovery block of a separation process. Legible labels: production; intracellular product; harvesting; disrupting; debris separation; concentration (water removed); cell free protein at 60-70 g l⁻¹.

Fig. 4. The operations involved in the purification block of a separation process.
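The two-block structure just described can be captured as an ordered data structure from which the rules switch individual operations on or off. The Python sketch below only encodes the operation order given in the text; the constant and function names are hypothetical.

```python
# Operation order in each subprocess, as listed in the text.
RECOVERY = (
    "harvesting",
    "cell disruption",
    "debris separation",
    "nucleic acid precipitation",
    "concentration",
)
PURIFICATION = ("preconditioning", "high resolution purification", "polishing")


def flowsheet(required):
    """Return the required operations in process order: recovery first,
    then purification, whatever order the rules concluded them in."""
    return [op for op in RECOVERY + PURIFICATION if op in required]


print(flowsheet({"high resolution purification", "concentration",
                 "preconditioning", "harvesting"}))
# ['harvesting', 'concentration', 'preconditioning', 'high resolution purification']
```

Keeping the order fixed in data means a rule only has to decide whether an operation is needed, not where it goes, which mirrors the way the knowledge was structured around the two subprocesses.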

Recovery block
(a) Harvesting is always required. The variety of equipment found in industrial practice is not very large (centrifuge, rotary vacuum filter, membrane filtration), and the choice depends upon the microbial source, equipment availability, equipment efficiency and economics. If the product of interest is intracellular, the solid fraction from harvesting is kept; if the product is secreted (extracellular), the liquid fraction is kept. When a mammalian culture is used, the product will be extracellular.
(b) Cell disruption is required when the product is intracellular. The equipment is selected mainly on the basis of the microbial source, as each type of microbe presents a particular resistance to disruption. The disruption technique determines the size of the resulting debris, which influences subsequent operations, and may also damage the activity of the protein product.
(c) Mechanical disruption releases nucleic acids, which need to be precipitated; a nucleic acid precipitation step is therefore necessary. There is only one standard method for nucleic acid precipitation (polyethyleneimine), so the expert system needs only to decide whether or not this operation is required.

(d) Separation of cell debris from the proteins in solution has to be undertaken once the cells have been disrupted. As a result of this step, the product will be in a solution with other proteins but without solids. Disruption and separation are not performed if the product of interest is extracellular.
(e) Concentration is required when the protein concentration of the harvested, disrupted and separated stream is below 60-70 g l⁻¹. A range is given because with some proteins it would be impossible to go much higher than 60 g l⁻¹ without a serious increase in viscosity, which would impose very poor transport characteristics on the system. If a membrane (e.g. ultrafiltration) is used for concentration, the resulting flux determines the highest concentration that can be obtained from the operation. If, at the point where the flux has dropped, the concentration is below 60 g l⁻¹, the proteins can be precipitated (e.g. by ammonium sulphate) to increase the final concentration.
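The 60-70 g l⁻¹ target translates directly into a required volume reduction. Assuming no protein is lost during concentration (an idealisation), the concentration factor is simply the ratio of target to feed concentration. The Python sketch below uses the lower end of the range as its default target; the feed concentration and volume are hypothetical values chosen for illustration.

```python
def volume_reduction(feed_g_per_l, target_g_per_l=60.0):
    """Concentration factor needed to raise total protein from feed_g_per_l
    to target_g_per_l, assuming 100% protein recovery. A factor of 1.0
    means the stream is already concentrated enough."""
    if feed_g_per_l <= 0:
        raise ValueError("feed concentration must be positive")
    return max(target_g_per_l / feed_g_per_l, 1.0)


feed_volume_l = 1000.0          # hypothetical clarified feed volume
factor = volume_reduction(5.0)  # hypothetical feed at 5 g/l total protein
print(factor, round(feed_volume_l / factor, 1))  # 12.0 83.3
```

A twelvefold reduction, from 1000 l to about 83 l, shows why this step so strongly reduces the load on every downstream operation.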

Purification block
(a) Preconditioning is usually necessary as a first step in protein purification, to adsorb contaminants in suspension that might otherwise foul the high resolution purification operation (ion-exchange, affinity or hydrophobic interaction) and hence shorten its life. The aim of this stage is to separate the soluble proteins from other contaminants (particularly material in suspension) to produce a 'gin clear' solution; it constitutes a clean-up step. Usually a relatively inexpensive treatment is used. This step will evidently not give a high purity but must give a very high yield.
(b) High resolution purification. This stage should give a product of up to 99% (usually 95-98%) purity. Typical operations include one or two high resolution ion-exchange chromatography steps or affinity chromatography (Duffy et al., 1988). Although high resolution is the main concern at this stage, an adsorbent that also gives a high yield should be chosen. Chromatofocusing has also been proposed as a very high resolution operation for use at this point on a relatively large scale.
(c) Polishing. After the high resolution stage, a polishing step will probably be necessary to obtain ultra high purity; this will depend mainly on the final use of the protein. If no other physicochemical property can be exploited, gel filtration will be used; it can separate dimers or oligomers of the product (formed by aggregation) or its hydrolysis products (due to the action of proteases) solely on the basis of their different molecular weights. HPLC can also be used for polishing; however, it is an expensive technique for preparative purposes.
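The decisions within the purification block can be sketched as a small function. The purity attainable by high resolution purification is taken as a parameter defaulting to 0.98, the upper end of the typical 95-98% range quoted above; that default, the function name and its arguments are assumptions for illustration rather than rules of the PROTEIN system.

```python
def purification_steps(gin_clear, required_purity, high_res_purity=0.98):
    """Select purification-block operations in process order.

    gin_clear: True if the cell free solution is already 'gin clear'.
    required_purity: final purity requested by the user (as a fraction).
    high_res_purity: purity assumed attainable by the high resolution step(s).
    """
    steps = []
    if not gin_clear:
        steps.append("preconditioning")  # clean-up to protect the columns
    steps.append("high resolution purification")  # e.g. ion-exchange or affinity
    if required_purity > high_res_purity:
        # e.g. gel filtration to remove dimers, oligomers or hydrolysis products
        steps.append("polishing")
    return steps


print(purification_steps(gin_clear=False, required_purity=0.9998))
# ['preconditioning', 'high resolution purification', 'polishing']
```

In practice the attainable purity would depend on the operation chosen, so the single threshold here is a simplification.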

Characterization of product and starting material
The main parameters used to characterize the culturing system and the reaction mixture are: the microbial source (bacteria, yeast, fungus, mammal, unknown, or any combination of these if the downstream process is of general purpose); and, for the product of interest, its cellular location (intracellular, extracellular, unknown), name, titration curve (charge as a function of pH), isoelectric point, surface hydrophobicity, molecular weight, differential product release of microbial source data base, two-phase aqueous separation data base and thermal stability. For most proteins only part of this information will be available, and evidently only what is available can be given to the system. Furthermore, complete information will only be required by the fully developed system; little of it is used by the prototype described in this paper. An important feature of an ES is that it should find a solution even if only partial information is available.
The proposed process consists of a sequence of operations that achieves the stated design objective. There might be several different sequences of operations that accomplish the same objective. In those cases, a semiquantitative degree of performance (given by the 'certainty factor') of each operation is assigned by the expert and carried by the system into the proposed design. Each of the operations described in Fig. 3 (recovery subprocess) requires the determination of particular equipment that could accomplish the purpose of that stage. Which equipment is used depends mainly upon the properties of the starting materials and on process economics. The strategy selected at this point is mainly based on empirical expert knowledge. The following rules are used to select the initial harvesting equipment (H-EQUIPMENT parameter).

Recovery subprocess                                          Certainty factor
IF   MICROORGANISM = FUNGI
THEN H-EQUIPMENT = Microporous membrane system                0.4
     H-EQUIPMENT = Rotary vacuum filter                       0.3
     H-EQUIPMENT = Filter press                               0.3
IF   MICROORGANISM = YEAST
THEN H-EQUIPMENT = Disc centrifuge                            0.6
     H-EQUIPMENT = Microporous membrane system                0.4
IF   MICROORGANISM = BACTERIA
THEN H-EQUIPMENT = Microporous membrane system                (none given)
IF   MICROORGANISM = MAMMALIAN
THEN H-EQUIPMENT = Disc centrifuge                            0.7
     H-EQUIPMENT = Microporous membrane system                0.3
IF   MICROORGANISM = UNDEFINED
THEN H-EQUIPMENT = Disc centrifuge                            0.5
     H-EQUIPMENT = Microporous membrane system                0.5
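Because several harvesting options can be concluded for the same organism, the system reports them as ranked alternatives rather than a single answer. A minimal Python sketch of that behaviour, using the certainty factors tabulated above, follows; the dictionary layout and function name are assumptions, and the bacteria rule is given a factor of 1.0 only because no value is printed for its single conclusion.

```python
# Harvesting rules: MICROORGANISM -> [(H-EQUIPMENT, certainty factor)].
HARVEST_RULES = {
    "FUNGI": [("Microporous membrane system", 0.4),
              ("Rotary vacuum filter", 0.3),
              ("Filter press", 0.3)],
    "YEAST": [("Disc centrifuge", 0.6), ("Microporous membrane system", 0.4)],
    "BACTERIA": [("Microporous membrane system", 1.0)],  # factor assumed; none printed
    "MAMMALIAN": [("Disc centrifuge", 0.7), ("Microporous membrane system", 0.3)],
    "UNDEFINED": [("Disc centrifuge", 0.5), ("Microporous membrane system", 0.5)],
}


def harvesting_options(microorganism):
    """Return H-EQUIPMENT alternatives ranked by decreasing certainty factor.
    Organisms without a rule of their own fall back to the UNDEFINED rule."""
    options = HARVEST_RULES.get(microorganism.upper(), HARVEST_RULES["UNDEFINED"])
    # sorted() is stable, so ties keep the order in which the expert listed them
    return sorted(options, key=lambda opt: opt[1], reverse=True)


for equipment, cf in harvesting_options("yeast"):
    print(f"{equipment}: {cf}")
# Disc centrifuge: 0.6
# Microporous membrane system: 0.4
```

Returning the full ranked list, rather than only the top choice, is what lets the user weigh alternatives such as equipment already available on site.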

Once the harvesting equipment is decided, if the product is extracellular the output stream of the unit should be characterized by its protein concentration, viscosity, density and other parameters. The user will be consulted as to whether these properties have been measured.


Disruption of the cells is required if the product is intracellular, in which case the solids from harvesting are kept. If the product is extracellular, disruption is not required. The product is expected to be extracellular if mammalian cells are used. These facts are reflected by the rules:

IF   MICROORGANISM = MAMMALIAN
THEN PRODUCT IS EXTRACELLULAR
IF   PRODUCT IS INTRACELLULAR
THEN DISRUPTION IS REQUIRED
     DEBRIS SEPARATION IS REQUIRED
     PRECIPITATION OF NUCLEIC ACIDS IS REQUIRED
IF   PRODUCT IS EXTRACELLULAR
THEN DISRUPTION IS NOT REQUIRED
     DEBRIS SEPARATION IS NOT REQUIRED
     PRECIPITATION OF NUCLEIC ACIDS IS NOT REQUIRED

Whenever disruption (D-EQUIPMENT) is necessary, the particular equipment needed will be obtained from the following set of rules:

                                                                   Certainty factor
IF   MICROORGANISM IS IN DPR¹ data base
THEN D-EQUIPMENT = Differential product release
     AND DEBRIS-SIZE = LARGE
     AND PRECIPITATION OF NUCLEIC ACIDS IS NOT REQUIRED           0.6
     D-EQUIPMENT = Mechanical breakage (bead mill or homogenizer)  0.4
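The DPR rule has a consequence worth making explicit: the alternative chosen fixes whether nucleic acid precipitation is still needed, because selective release leaves the nucleic acids in the cell. The Python sketch below is illustrative only; the set standing in for the DPR data base and its placeholder organism name are hypothetical, and the mechanical alternative inherits the nucleic acid conclusion of the intracellular-product rule.

```python
# Hypothetical stand-in for the DPR data base (the entry is a placeholder).
DPR_DATABASE = {"organism_A"}


def dpr_rule(microorganism):
    """Fire the DPR disruption rule: return (certainty factor, conclusions)
    alternatives, or an empty list if the organism is not in the data base
    (the organism-specific disruption rules then apply instead)."""
    if microorganism not in DPR_DATABASE:
        return []
    return [
        (0.6, {"D-EQUIPMENT": "Differential product release",
               "DEBRIS-SIZE": "LARGE",
               # selective release: nucleic acids stay inside the cell
               "NUCLEIC ACID PRECIPITATION": "NOT REQUIRED"}),
        (0.4, {"D-EQUIPMENT": "Mechanical breakage (bead mill or homogenizer)",
               # as concluded by the intracellular-product rule
               "NUCLEIC ACID PRECIPITATION": "REQUIRED"}),
    ]


for cf, conclusions in dpr_rule("organism_A"):
    print(cf, conclusions["D-EQUIPMENT"], "/ nucleic acid precipitation:",
          conclusions["NUCLEIC ACID PRECIPITATION"])
```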

If the cell disruption operation on a bacterial culture releases proteases, these need to be inhibited so that they do not hydrolyse the proteins in the reaction mixture. The rules to select the operation are:

IF   DISRUPTION REQUIRED AND MICROORGANISM = BACTERIA
THEN D-EQUIPMENT = High pressure homogenizer
     DEBRIS-SIZE = SMALL
IF   DISRUPTION REQUIRED AND MICROORGANISM = BACTERIA AND PROTEASES = HIGH
THEN Add a protease inhibitor to the reaction mixture
IF   DISRUPTION REQUIRED AND MICROORGANISM = FUNGI
THEN D-EQUIPMENT = Bead mill
     DEBRIS-SIZE = MEDIUM
IF   DISRUPTION REQUIRED AND MICROORGANISM = YEAST
THEN D-EQUIPMENT = High pressure homogenizer
     DEBRIS-SIZE = SMALL

¹ DPR is a differential product release data base which shows the use of chemical and enzymatic treatment to permeabilize the cell wall and selectively release specific intracellular proteins. The DPR operation is selective, hence the nucleic acids remain in the cell. So, if DPR is used, nucleic acid removal is unnecessary.

The previous rule is a general rule for intracellular yeast products; it can be superseded by very specific rules for known applications, as follows:

                                                                   Certainty factor
IF   DISRUPTION REQUIRED AND MICROORGANISM = YEAST
     AND (PRODUCT = VLP (virus-like particles) OR PRODUCT = HepB vaccine)
THEN D-EQUIPMENT = Bead mill
     DEBRIS-SIZE = MEDIUM
IF   DISRUPTION REQUIRED AND D-EQUIPMENT = Undetermined
THEN D-EQUIPMENT = High pressure homogenizer, DEBRIS-SIZE = SMALL  0.6
     D-EQUIPMENT = Bead mill, DEBRIS-SIZE = MEDIUM                 0.4
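The way a specific rule supersedes the general yeast rule, and the fallback used when no equipment has been determined, can be sketched as an ordered lookup in which more specific rules are tried first. This precedence-by-ordering is an assumption about how such overriding might be implemented, not a description of the shell's conflict-resolution mechanism; the function name is hypothetical, and a factor of 1.0 is assumed for rules whose factor is not printed.

```python
def select_disruption(microorganism, product=None):
    """Return ranked (D-EQUIPMENT, DEBRIS-SIZE, certainty factor) alternatives.
    Specific rules are checked before general ones."""
    # Specific rule for known applications (factor not printed; 1.0 assumed).
    if microorganism == "YEAST" and product in {"VLP", "HepB vaccine"}:
        return [("Bead mill", "MEDIUM", 1.0)]
    # General rules by microorganism (factors not printed; 1.0 assumed).
    general = {
        "BACTERIA": ("High pressure homogenizer", "SMALL"),
        "FUNGI": ("Bead mill", "MEDIUM"),
        "YEAST": ("High pressure homogenizer", "SMALL"),
    }
    if microorganism in general:
        equipment, debris = general[microorganism]
        return [(equipment, debris, 1.0)]
    # D-EQUIPMENT undetermined: default alternatives with their factors.
    return [("High pressure homogenizer", "SMALL", 0.6), ("Bead mill", "MEDIUM", 0.4)]


print(select_disruption("YEAST"))         # general yeast rule
print(select_disruption("YEAST", "VLP"))  # specific rule takes precedence
```

The DEBRIS-SIZE returned here is the design parameter that the separation rules consume next.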

As explained before, the size of the debris material is a property of the disruption equipment and is a design parameter of the next process, separation. For the separation process, another data base will be required that links two-phase aqueous separation (TPAS) to the product that needs to be separated from the debris. In this prototype we have not considered the processing of inclusion bodies from E. coli. Although this is an important area of downstream process design, there are not many satisfactory methods for the large scale separation, denaturation and refolding of the particulate proteins. Recent developments in the use of reversed micelles for protein refolding and of two-phase aqueous systems for the separation of VLPs from yeast appear particularly attractive. The separation unit operation (S-EQUIPMENT) will be selected from the following rules:

DEBRIS-SIZE = S-EQUIPMENT S-EQUIPMENT DEBRIS-SIZE = S-EQUIPMENT DEBRIS-SIZE = S-EQUIPMENT S-EQUIPMENT

LARGE = Disc centrifuge = M i c r o p o r o u s m e m b r a n e system MEDIUM = M i c r o p o r o u s m e m b r a n e system SMALL = Ultrafiltration = M i c r o p o r o u s m e m b r a n e system

0.6 0.4

0.4 0.6

IF PRODUCT IS IN 2 PHASE SEP DATA BASE AND 2 PHASE SEP EFFICIENCY HIGH
THEN S-EQUIPMENT = 2 phase separation

IF DEBRIS-SIZE = Undefined OR S-EQUIPMENT = Undefined
THEN S-EQUIPMENT = Microporous membrane system (CF 0.5); S-EQUIPMENT = Disc centrifuge (CF 0.3); S-EQUIPMENT = Ultrafiltration (CF 0.2)
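The separation rules can be read as a lookup from debris size to a ranked list of candidate operations. The Python sketch below is a hypothetical encoding, not the representation used in the shells mentioned in this paper. It assumes a CF of 1.0 for conclusions printed without a certainty factor (the MEDIUM debris rule and the two-phase rule) and lets every rule whose premises hold contribute a candidate, ranked by CF; all function and variable names are illustrative.

```python
# Hypothetical encoding of the S-EQUIPMENT (separation) rules.
# Each candidate is (operation, certainty factor). A CF of 1.0 is an
# assumption for conclusions that carry no certainty factor in the text.
SEPARATION_BY_DEBRIS_SIZE = {
    "LARGE": [("Disc centrifuge", 0.6), ("Microporous membrane system", 0.4)],
    "MEDIUM": [("Microporous membrane system", 1.0)],
    "SMALL": [("Ultrafiltration", 0.4), ("Microporous membrane system", 0.6)],
}

# Rule for DEBRIS-SIZE = Undefined (or S-EQUIPMENT = Undefined).
UNDEFINED_FALLBACK = [
    ("Microporous membrane system", 0.5),
    ("Disc centrifuge", 0.3),
    ("Ultrafiltration", 0.2),
]


def select_separation(debris_size, in_two_phase_db=False, two_phase_efficiency="LOW"):
    """Return candidate separation operations, highest CF first."""
    candidates = list(SEPARATION_BY_DEBRIS_SIZE.get(debris_size, UNDEFINED_FALLBACK))
    # Two-phase rule: product is in the 2-phase database and the
    # 2-phase separation efficiency is HIGH.
    if in_two_phase_db and two_phase_efficiency == "HIGH":
        candidates.append(("2 phase separation", 1.0))
    return sorted(candidates, key=lambda c: c[1], reverse=True)


print(select_separation("SMALL"))
# [('Microporous membrane system', 0.6), ('Ultrafiltration', 0.4)]
```

Because the two-phase conclusion is simply appended, a product found in the two-phase database is listed ahead of the debris-size candidates under the assumed CF of 1.0; a different CF for that rule would change the ordering.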

The next step is a concentration operation, which is thought to be crucial because it greatly reduces the volume of material to be handled by the subsequent operations. Dewatering (W-EQUIPMENT) will be necessary in order to obtain a liquid phase in which the total protein concentration is in the range 60 to 70 g l⁻¹. The protein concentration in the stream needs to be back-calculated or obtained from the client. The most probable equipment for concentrating the protein is ultrafiltration.

IF PROTEIN CONCENTRATION ≥ 60 g l⁻¹
THEN W-EQUIPMENT IS NOT NECESSARY

IF PROTEIN CONCENTRATION < 60 g l⁻¹
THEN W-EQUIPMENT = Ultrafiltration

A set of rules decides whether or not the ultrafilter can achieve the required concentration, depending on the flux obtained through the membrane. If the flux is too low and the protein concentration is still below 60 g l⁻¹, precipitation with an inorganic salt (e.g. (NH₄)₂SO₄) is used.

IF W-EQUIPMENT = ULTRAFILTRATION AND FLUX = HIGH
THEN Continue with ultrafiltration

IF W-EQUIPMENT = ULTRAFILTRATION AND FLUX = LOW AND PROTEIN CONCENTRATION < 60 g l⁻¹
THEN Use precipitation

IF W-EQUIPMENT = ULTRAFILTRATION AND FLUX = LOW AND PROTEIN CONCENTRATION ≥ 60 g l⁻¹
THEN FINAL CONCENTRATION = ACTUAL CONCENTRATION; go to the next operation
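The dewatering rules reduce to a threshold test on total protein concentration followed, once ultrafiltration is chosen, by a check on membrane flux. A minimal Python sketch, assuming the flux has already been classified as 'HIGH' or 'LOW' (the text does not say how that classification is made); the function names are hypothetical.

```python
TARGET_G_PER_L = 60.0  # lower bound of the 60-70 g/l total protein target


def select_w_equipment(protein_g_per_l):
    """W-EQUIPMENT rules: None means dewatering is not necessary."""
    if protein_g_per_l >= TARGET_G_PER_L:
        return None
    return "Ultrafiltration"


def ultrafiltration_next_step(protein_g_per_l, flux):
    """Decision while ultrafiltering; flux is 'HIGH' or 'LOW'."""
    if flux == "HIGH":
        return "Continue with ultrafiltration"
    if protein_g_per_l < TARGET_G_PER_L:
        # Flux too low and the target concentration has not been reached.
        return "Use precipitation"
    # Flux low but target reached: final concentration = actual concentration.
    return "Go to the next operation"


print(select_w_equipment(20.0))                # Ultrafiltration
print(ultrafiltration_next_step(35.0, "LOW"))  # Use precipitation
```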

Purification subprocess

At this point in the process, a cell free protein solution is obtained. The subsequent operations are designed to meet the final product specifications given by the user. If the solution is already 'gin clear', the next step could be a high resolution purification.

If the solution is not 'gin clear', it will require a preconditioning step designed to clarify it. It should be stressed that the preconditioning might already have taken place in the previous processes (e.g. dewatering). In this sense, operations can overlap, so the conceptual approach to the design of separation processes must be understood as an uncertain sequence of operations. This knowledge can be represented by the following rules:

IF DEWATERED LIQUID = GIN CLEAR
THEN PRECONDITIONING IS NOT REQUIRED

IF DEWATERED LIQUID = NOT GIN CLEAR
THEN PRECONDITIONING IS REQUIRED

Preconditioning is necessary if the solution is not 'gin clear' because of fine suspended material or other contaminants that may foul the high resolution purification columns. Non-specific (ion-exchange) adsorbents such as Whatman DE52, DE32 or CDR (cell debris remover), or aqueous two-phase separation, which can remove lipids, nucleic acids, coloured matter and suspended solids, can be used. In some cases it is possible to use hydrophobic interaction chromatography for this purpose. If the efficiency of all these possible preconditioning operations is low, the proteins can be precipitated with an inorganic salt such as ammonium sulphate. The preconditioning operation (P-EQUIPMENT) will be selected from the following rules:

IF PRECONDITIONING REQUIRED AND TWO PHASE SEPARATION EFFICIENCY HIGH AND ADSORPTION EFFICIENCY HIGH
THEN PRINT 'Because both adsorption and two phase separation are efficient the most suitable equipment will have to be found experimentally'

IF PRECONDITIONING REQUIRED AND ADSORPTION ION-EXCHANGE EFFICIENCY HIGH
THEN P-EQUIPMENT = Adsorption ion exchange

IF PRECONDITIONING REQUIRED AND TWO PHASE SEPARATION EFFICIENCY HIGH AND (ADSORPTION EFFICIENCY MEDIUM OR LOW)
THEN P-EQUIPMENT = Two phase separation

IF PRECONDITIONING REQUIRED AND HYDROPHOBIC INTERACTION EFFICIENCY HIGH AND (ADSORPTION EFFICIENCY MEDIUM OR LOW) AND (TWO PHASE PARTITIONING EFFICIENCY MEDIUM OR LOW)
THEN P-EQUIPMENT = Hydrophobic interaction chromatography

IF PRECONDITIONING REQUIRED AND ADSORPTION EFFICIENCY LOW AND TWO PHASE SEPARATION EFFICIENCY LOW AND HYDROPHOBIC INTERACTION EFFICIENCY LOW
THEN P-EQUIPMENT = Precipitation (ammonium sulphate)

The resulting solution will be 'gin clear', cell free and at a concentration of 60-70 g l⁻¹.

IF P-EQUIPMENT = Precipitation AND HIGH RESOLUTION PURIFICATION STAGE 1 = Hydrophobic interaction
THEN DESALTING NOT NECESSARY

IF P-EQUIPMENT = Precipitation AND HIGH RESOLUTION PURIFICATION STAGE 1 IS NOT Hydrophobic interaction
THEN DESALTING IS NECESSARY

The high resolution purification stage that follows is designed to achieve the expected purity, which would usually be between 95 and 98% for the high resolution step. Purification is usually performed by high resolution ion exchange or by affinity chromatography. Ion exchange is most commonly anion (e.g. DEAE) rather than cation (e.g. CM) exchange. In many instances high resolution ion exchange can be carried out in two steps to obtain the desired purity (Duffy et al., 1988). The decision is partially based upon the feasibility of the affinity chromatography operation, and the expert system uses either information supplied by the user about purification of the protein (usually generated at laboratory scale) or a data base containing known biospecific ligands and other physicochemical information. Regarding cost, affinity chromatography will usually be more expensive than ion exchange (Duffy et al., 1988). The rules for selecting the high resolution purification equipment (HRP-EQUIPMENT) are (CF = certainty factor):

IF BIOSPECIFICITY IS POSSIBLE
THEN HRP-EQUIPMENT = Affinity chromatography (CF 0.3); HRP-EQUIPMENT = High resolution ion exchange chromatography (CF 0.6); HRP-EQUIPMENT = HIC (hydrophobic interaction chromatography) (CF 0.1)

IF PRODUCT = IgG MONOCLONAL
THEN HRP-EQUIPMENT = Protein-A affinity chromatography (CF 0.6); HRP-EQUIPMENT = Ion exchange (1 or 2 steps) (CF 0.4)

IF HRP-EQUIPMENT = UNDEFINED
THEN HRP-EQUIPMENT = High resolution ion exchange (CF 0.8); HRP-EQUIPMENT = Affinity chromatography (CF 0.2)
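The preconditioning, desalting and initial HRP rules can be sketched in the same style. In this hypothetical Python encoding, efficiencies are the strings 'HIGH', 'MEDIUM' or 'LOW'; every rule whose premises hold contributes its conclusion; 'adsorption ion-exchange efficiency' and 'adsorption efficiency' are assumed to be the same quantity, as are 'two phase separation' and 'two phase partitioning' efficiency; and the HRP-EQUIPMENT = UNDEFINED rule is assumed to fire only when neither the biospecificity rule nor the IgG rule has fired. None of these names come from the original system.

```python
LOW_OR_MEDIUM = ("LOW", "MEDIUM")


def select_p_equipment(adsorption, two_phase, hic):
    """Preconditioning rules; call only when preconditioning is required
    (i.e. the dewatered liquid is not gin clear)."""
    fired = []
    if two_phase == "HIGH" and adsorption == "HIGH":
        fired.append("Find the most suitable equipment experimentally")
    if adsorption == "HIGH":
        fired.append("Adsorption ion exchange")
    if two_phase == "HIGH" and adsorption in LOW_OR_MEDIUM:
        fired.append("Two phase separation")
    if hic == "HIGH" and adsorption in LOW_OR_MEDIUM and two_phase in LOW_OR_MEDIUM:
        fired.append("Hydrophobic interaction chromatography")
    if adsorption == two_phase == hic == "LOW":
        fired.append("Precipitation (ammonium sulphate)")
    return fired


def desalting_necessary(p_equipment, hrp_stage1):
    """The desalting rules cover only precipitation; None means 'not covered'."""
    if p_equipment != "Precipitation":
        return None
    return hrp_stage1 != "Hydrophobic interaction"


def hrp_candidates(biospecificity_possible, igg_monoclonal):
    """Initial HRP-EQUIPMENT candidates as (operation, CF), highest CF first."""
    fired = []
    if biospecificity_possible:
        fired += [("Affinity chromatography", 0.3),
                  ("High resolution ion exchange chromatography", 0.6),
                  ("HIC", 0.1)]
    if igg_monoclonal:
        fired += [("Protein-A affinity chromatography", 0.6),
                  ("Ion exchange (1 or 2 steps)", 0.4)]
    if not fired:  # HRP-EQUIPMENT = UNDEFINED
        fired += [("High resolution ion exchange", 0.8),
                  ("Affinity chromatography", 0.2)]
    return sorted(fired, key=lambda c: c[1], reverse=True)
```

Read this way, some combinations select nothing: select_p_equipment("MEDIUM", "MEDIUM", "MEDIUM") returns an empty list, a gap that a complete ES would have to close.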

In a few cases the high resolution purification can be achieved in one step; more commonly two steps are required. High resolution purification is usually carried out by chromatography. Selection of these purification operations is based on the efficiency of different chromatographic techniques in separating the target protein from the contaminating ones. Different techniques exploit different physicochemical properties, and some are much more efficient than others in exploiting these differences. Ion exchange chromatography separates proteins on the basis of their difference in charge. The charge on a protein changes with pH, following its titration curve. Hence, when three or more proteins are present and their relative charges differ markedly at two significantly different pH values, this technique can be used twice, at those pH values, to purify a protein from different protein contaminants. Ion exchange can exploit small differences in charge to give a very high resolution, and hence it is an extremely efficient operation for separating proteins. Affinity chromatography can have a very high specificity for a particular protein or a small group of proteins; hence it can also have a very high resolution. The matrix can be expensive, but it can be reused for long periods. Ligand leakage into the product can be a problem. Hydrophobic interaction chromatography (HIC) has only been proposed as a pretreatment step or as a first high resolution purification (HRP) step. Its resolution is not particularly high, as the distribution of surface hydrophobicity on a protein can be very random. Chromatofocusing, which separates proteins on the basis of their isoelectric points (pI), appears to have an extremely high resolution, but the materials used are extremely expensive for large scale use, particularly those that are disposed of rather than recycled (polybuffer).
Gel filtration for protein fractionation is normally not used as a high resolution operation at large scale because of its low efficiency in exploiting differences in molecular weight. For selection of operations, either information generated at a very small scale in terms of 'efficiency' of separation or, alternatively, information on physicochemical properties (charge-titration curve, bioaffinity, surface hydrophobicity, pI, M.W.) will be used. In the latter case, the deviation of the product protein's value from those of the main contaminants is used or, if no major contaminants are present, its deviation from the mean values (or mass-based distribution of properties) of the contaminants. A factor for the efficiency of the operation in exploiting this difference also has to be included in this evaluation (Asenjo, 1989). It should be pointed out that although our 'first generation' ES took into account the influence of scale on the choice of operations (i.e. equipment availability for a particular scale, e.g. laboratory, bench or industrial), this was not included in this 'second generation' ES, as at this point scale was felt to be a less fundamental issue than others that should be addressed first (protein properties, efficiency of operations). The stage 1 high resolution purification equipment is selected by the following rules (CF = certainty factor):

IF ION EXCHANGE EFFICIENCY = HIGH
THEN HRP-EQUIPMENT STAGE 1 = Ion exchange (CF 0.95)

IF ION EXCHANGE EFFICIENCY = MEDIUM
THEN HRP-EQUIPMENT STAGE 1 = Ion exchange (CF 0.65)

IF ION EXCHANGE EFFICIENCY = LOW
THEN HRP-EQUIPMENT STAGE 1 = Ion exchange (CF 0.35)

IF AFFINITY CHROMATOGRAPHY EFFICIENCY = HIGH
THEN HRP-EQUIPMENT STAGE 1 = Affinity chromatography (CF 0.75)

IF AFFINITY CHROMATOGRAPHY EFFICIENCY = MEDIUM
THEN HRP-EQUIPMENT STAGE 1 = Affinity chromatography (CF 0.45)

IF HYDROPHOBIC INTERACTION CHROM. (HIC) EFFICIENCY = HIGH
THEN HRP-EQUIPMENT STAGE 1 = HIC (CF 0.70)

IF HYDROPHOBIC INTERACTION CHROM. (HIC) EFFICIENCY = MEDIUM
THEN HRP-EQUIPMENT STAGE 1 = HIC (CF 0.40)

IF ISOELECTRIC POINT OF PROTEIN = UNKNOWN
THEN Ask user for isoelectric point of protein (pI)

IF HRP-EQUIPMENT STAGE 1 = ION EXCHANGE AND ISOELECTRIC POINT > 7
THEN PRINT 'As the pI of the protein is greater than 7 a cation exchanger would probably yield better results'

IF HRP-EQUIPMENT STAGE 1 = ION EXCHANGE AND ISOELECTRIC POINT < 7
THEN PRINT 'As the pI of the protein is less than 7 an anion exchanger would probably yield better results'

IF HRP-EQUIPMENT STAGE 1 = ION EXCHANGE AND ISOELECTRIC POINT = 7
THEN PRINT 'As the pI of the protein is 7 both anion and cation exchange are feasible'

IF STAGE 1 PURITY < PREPOLISHING PURITY
THEN STAGE 2 PURIFICATION REQUIRED

IF STAGE 1 PURITY ≥ PREPOLISHING PURITY
THEN STAGE 2 PURIFICATION NOT REQUIRED
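The stage 1 rules combine an efficiency-to-CF table with advice on the type of ion exchanger and a purity test for a second stage. A hypothetical Python sketch follows. Efficiencies are assumed to be supplied per operation as 'HIGH', 'MEDIUM' or 'LOW', and an efficiency with no matching rule (e.g. affinity chromatography at LOW) simply contributes no candidate. The pI advice follows the rules literally and presumably assumes a working pH near 7.

```python
# (operation, efficiency) -> certainty factor, from the stage 1 rules.
STAGE1_CF = {
    ("Ion exchange", "HIGH"): 0.95,
    ("Ion exchange", "MEDIUM"): 0.65,
    ("Ion exchange", "LOW"): 0.35,
    ("Affinity chromatography", "HIGH"): 0.75,
    ("Affinity chromatography", "MEDIUM"): 0.45,
    ("HIC", "HIGH"): 0.70,
    ("HIC", "MEDIUM"): 0.40,
}


def stage1_candidates(efficiencies):
    """efficiencies maps an operation name to 'HIGH', 'MEDIUM' or 'LOW'."""
    fired = [(op, STAGE1_CF[(op, eff)])
             for op, eff in efficiencies.items()
             if (op, eff) in STAGE1_CF]
    return sorted(fired, key=lambda c: c[1], reverse=True)


def exchanger_advice(pI):
    """pI rules for ion exchange; None stands for an unknown pI."""
    if pI is None:
        return "Ask user for isoelectric point of protein (pI)"
    if pI > 7:
        return "cation exchanger"
    if pI < 7:
        return "anion exchanger"
    return "anion or cation exchanger"


def stage2_required(stage1_purity, prepolishing_purity):
    """True when stage 1 falls short of the prepolishing purity target."""
    return stage1_purity < prepolishing_purity


print(stage1_candidates({"Ion exchange": "MEDIUM",
                         "Affinity chromatography": "HIGH",
                         "HIC": "LOW"}))
# [('Affinity chromatography', 0.75), ('Ion exchange', 0.65)]
```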

If a second purification step is required, the buffer needed will usually be different from the one the sample is in after the first step. Gel filtration is commonly used for desalting, as there is no need to concentrate the protein sample, but diafiltration can also be used.


IF STAGE 2 PURIFICATION REQUIRED AND STAGE 2 PURIFICATION BUFFER DIFFERENT FROM STAGE 1 BUFFER
THEN Desalting using gel filtration to change buffer (CF 0.7); Desalting using diafiltration to change buffer (CF 0.3)

IF STAGE 2 PURIFICATION REQUIRED AND STAGE 2 ION EXCHANGE EFFICIENCY = HIGH
THEN HRP-EQUIPMENT STAGE 2 = Ion exchange (CF 0.90)

IF STAGE 2 PURIFICATION REQUIRED AND STAGE 2 ION EXCHANGE EFFICIENCY = MEDIUM
THEN HRP-EQUIPMENT STAGE 2 = Ion exchange (CF 0.60)

IF STAGE 2 PURIFICATION REQUIRED AND STAGE 2 ION EXCHANGE EFFICIENCY = LOW
THEN HRP-EQUIPMENT STAGE 2 = Ion exchange (CF 0.30)

IF STAGE 2 PURIFICATION REQUIRED AND STAGE 2 AFFINITY CHROMATOGRAPHY EFFICIENCY = HIGH
THEN HRP-EQUIPMENT STAGE 2 = Affinity chromatography (CF 0.85)

IF STAGE 2 PURIFICATION REQUIRED AND STAGE 2 AFFINITY CHROMATOGRAPHY EFFICIENCY = MEDIUM
THEN HRP-EQUIPMENT STAGE 2 = Affinity chromatography (CF 0.55)

IF STAGE 2 PURIFICATION REQUIRED AND CHROMATOFOCUSING EFFICIENCY = HIGH
THEN HRP-EQUIPMENT STAGE 2 = Chromatofocusing (CF 0.60)

IF STAGE 2 PURIFICATION REQUIRED AND CHROMATOFOCUSING EFFICIENCY = MEDIUM
THEN HRP-EQUIPMENT STAGE 2 = Chromatofocusing (CF 0.30)
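One way the stage 2 rules might be assembled into a partial flowsheet is sketched below in Python: when the buffers differ, a buffer-exchange step with its two alternatives is placed before the ranked stage 2 operations. Returning all alternatives rather than only the top one reflects the stated aim of giving the user a number of alternatives; the data layout and names are our own assumptions, not the original implementation.

```python
# (operation, efficiency) -> certainty factor, from the stage 2 rules.
STAGE2_CF = {
    ("Ion exchange", "HIGH"): 0.90,
    ("Ion exchange", "MEDIUM"): 0.60,
    ("Ion exchange", "LOW"): 0.30,
    ("Affinity chromatography", "HIGH"): 0.85,
    ("Affinity chromatography", "MEDIUM"): 0.55,
    ("Chromatofocusing", "HIGH"): 0.60,
    ("Chromatofocusing", "MEDIUM"): 0.30,
}
BUFFER_EXCHANGE = [
    ("Desalting using gel filtration", 0.7),
    ("Desalting using diafiltration", 0.3),
]


def stage2_plan(buffer_differs, efficiencies):
    """Return [(step label, ranked alternatives)] for a required stage 2."""
    ranked = sorted(
        ((op, STAGE2_CF[(op, eff)]) for op, eff in efficiencies.items()
         if (op, eff) in STAGE2_CF),
        key=lambda c: c[1],
        reverse=True,
    )
    plan = []
    if buffer_differs:
        # Buffer exchange sits between stage 1 and stage 2.
        plan.append(("Buffer exchange", BUFFER_EXCHANGE))
    plan.append(("HRP stage 2", ranked))
    return plan


for step, options in stage2_plan(True, {"Ion exchange": "LOW", "Chromatofocusing": "HIGH"}):
    print(step, options)
```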

Often in actual practice the purity obtained with the high resolution steps designed in the purification stage does not meet the requirements (e.g. 99.9 or 99.98% in hard cases). The main reason is that dimers or oligomers (aggregation products) or protein fragments (hydrolysed by proteases) may be present; in such cases, one of the few possible final separation or polishing steps would exploit molecular weight. If there is no other difference in properties between the protein and these minor contaminants, then gel filtration chromatography has to be used.

However, if ion exchange had been used in the purification stage with a step elution, then polishing could be performed by gradient ion exchange. In the rare event that both gel filtration and gradient ion exchange have a low efficiency for polishing, HPLC (high performance liquid chromatography) would be used. Its resolution can be extremely high, but it is a very costly operation when used in preparative mode. The polishing operation (POL-EQUIPMENT) will be selected from the following rules (CF = certainty factor):

IF PURITY-ACTUAL ≥ PURITY-REQUIRED
THEN Polishing is not required

IF PURITY-ACTUAL < PURITY-REQUIRED
THEN Polishing is required

IF POLISHING IS REQUIRED AND GRADIENT ION EXCHANGE EFFICIENCY MEDIUM
THEN POL-EQUIPMENT = Gradient ion exchange chromatography (CF 0.65)

IF POLISHING IS REQUIRED AND GRADIENT ION EXCHANGE EFFICIENCY HIGH
THEN POL-EQUIPMENT = Gradient ion exchange chromatography (CF 0.95)

IF POLISHING IS REQUIRED AND GEL FILTRATION EFFICIENCY MEDIUM
THEN POL-EQUIPMENT = Gel filtration (CF 0.60)

IF POLISHING IS REQUIRED AND GEL FILTRATION EFFICIENCY HIGH
THEN POL-EQUIPMENT = Gel filtration (CF 0.90)

IF POLISHING IS REQUIRED AND GRADIENT ION EXCHANGE EFFICIENCY LOW AND GEL FILTRATION EFFICIENCY LOW
THEN POL-EQUIPMENT = High performance liquid chromatography (HPLC)

IF POLISHING IS REQUIRED AND POLISH = UNDETERMINED
THEN POL-EQUIPMENT = Gel filtration
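The polishing rules can be sketched as a Python function over the actual and required purity (in percent) and the two polishing efficiencies. Two assumptions are ours: POLISH = UNDETERMINED is taken to mean that neither efficiency is known, and a CF of 1.0 is used for the two conclusions printed without a certainty factor. The names are hypothetical.

```python
POLISHING_CF = {
    ("Gradient ion exchange chromatography", "HIGH"): 0.95,
    ("Gradient ion exchange chromatography", "MEDIUM"): 0.65,
    ("Gel filtration", "HIGH"): 0.90,
    ("Gel filtration", "MEDIUM"): 0.60,
}


def select_polishing(purity_actual, purity_required, gradient_iex=None, gel_filtration=None):
    """Return (operation, CF) pairs, highest CF first; [] if no polishing is needed.

    gradient_iex and gel_filtration are 'HIGH', 'MEDIUM', 'LOW' or None (unknown).
    """
    if purity_actual >= purity_required:
        return []  # Polishing is not required
    if gradient_iex is None and gel_filtration is None:
        return [("Gel filtration", 1.0)]  # POLISH = UNDETERMINED (assumed CF)
    fired = [(op, POLISHING_CF[(op, eff)])
             for op, eff in (("Gradient ion exchange chromatography", gradient_iex),
                             ("Gel filtration", gel_filtration))
             if (op, eff) in POLISHING_CF]
    if gradient_iex == "LOW" and gel_filtration == "LOW":
        fired.append(("HPLC", 1.0))  # assumed CF
    return sorted(fired, key=lambda c: c[1], reverse=True)


print(select_polishing(98.0, 99.9, gradient_iex="MEDIUM", gel_filtration="HIGH"))
# [('Gel filtration', 0.9), ('Gradient ion exchange chromatography', 0.65)]
```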

Discussion

As modern biotechnology products become more competitive and their use more widespread, production process optimization and economics play an increasingly important role. The aim then is to design a large scale protein purification process with a very high recovery, a virtually pure product and minimum cost. This work on expert systems (ESs) has shown that properly developed ESs can be a helpful tool in solving the knowledge intensive, heuristic based problem of process synthesis and flowsheet design in process biotechnology. Rigorous methods will not be appropriate for solving the overall synthesis problem because, unlike in chemical process engineering, rigorous information and mathematical correlations are not readily available (Prokopakis and Asenjo, 1989). The overall downstream process synthesis problem in biotechnology does not have a strictly combinatorial nature. Only the high resolution purification stage within the purification subprocess, where more than one high resolution purification step and several alternatives in different order combinations can be used, may lend itself to solution as a combinatorial problem. However, as has recently been pointed out and discussed (Asenjo, 1988), rigorous models have an important role in the simulation of individual operations (e.g. for process evaluation and comparison of the performance and cost of individual operations). Some models whose use in process simulation and optimization has been described can be found in the literature. Computer simulations are a useful tool for optimizing separations in chemical process engineering. Examples of useful downstream process simulations and investigations of process conditions are microbial cell breakage and selective product release using enzymes (Hunter and Asenjo, 1986, 1988; Liu et al., 1988) and affinity chromatography and ion-exchange separation of proteins (Chase, 1988; Wang, 1989). A limiting factor in the development of ESs for protein purification at present is the acquisition, clarification, formalization and structuring of the domain of expert knowledge.
Heuristic methods (such as those that have been implemented in ESs) are an important tool in the spectrum of available process synthesis techniques, in addition to rigorous methodologies such as mathematical modelling techniques. ESs are being developed that use expertise mainly from industrial experts on the separation of proteins. They will allow the selection of the most appropriate sequence of operations (i.e. flowsheet) to purify a particular protein from a specific production stream to a specified degree of purity. These expert systems are based on heuristics, rules and less rigorous information acquired from industrial experts. They allow the use of uncertainty factors, which is important because heuristics and expert knowledge carry a degree of uncertainty; the object is to allow the selection of a few possibilities with specific degrees of certainty. In this work on the development of a prototype system for process synthesis and selection (flowsheeting), we found that we could divide the process into two subprocesses. For building the first subprocess, recovery of the protein in a cell free solution at ~60-70 g l⁻¹ total protein, most of the information needed to structure the knowledge could be obtained from experts or was already available. The rules presented here are not complete: some were simplified and specific particular cases were not considered, as the main aim of the exercise was to build a prototype and not a complete ES. For instance, we did not take into account the processing of inclusion bodies in E. coli, which includes separation of the bodies and denaturing and refolding of the protein into its native state. Otherwise, selection of operations in the recovery subprocess was found to be well documented and hence could be well structured. In the second subprocess, purification, structuring the knowledge and obtaining information was more difficult, mainly because much of the required information does not exist. An important achievement in the development of this ES was the structuring of the purification subprocess into three distinct stages: pretreatment or preconditioning, high resolution purification and polishing, some of which may correspond to more than one individual operation. This also left open the possibility of carrying out 'conditioning' operations, such as desalting or buffer exchange, between stages. The main lack of available information was found to be in the selection of high resolution purification operations (usually one or two chromatographic steps) based on the physicochemical properties of the protein and those of the major contaminant proteins. This information is vital for selecting the right operations in the best possible order according to their relative efficiencies (Asenjo, 1989). Five main heuristic rules have been suggested for this purpose (Asenjo, 1989):

Rule 1: Choose separation processes based on different properties.
Rule 2: Separate the most plentiful impurities first.
Rule 3: Use a high resolution step as soon as possible.
Rule 4: Choose processes that exploit the differences in the physicochemical properties in the most efficient way.
Rule 5: Do the most arduous step last.
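Rules 1 and 4 can be illustrated together with a simple greedy ordering: rank candidate high resolution operations by how efficiently they exploit a property difference, and skip any candidate that exploits a property already used. This is a hypothetical reading written for illustration, not the procedure of Asenjo (1989); the operation names and efficiency scores in the example are invented.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    name: str
    exploits: str      # physicochemical property exploited, e.g. "charge"
    efficiency: float  # hypothetical score for exploiting the difference


def order_steps(candidates, max_steps=2):
    """Greedy sketch of Rules 1 and 4: most efficient first, one per property."""
    chosen, used = [], set()
    for op in sorted(candidates, key=lambda o: o.efficiency, reverse=True):
        if op.exploits in used:
            continue  # Rule 1: the next step should exploit a different property
        chosen.append(op.name)
        used.add(op.exploits)
        if len(chosen) == max_steps:
            break
    return chosen


ops = [
    Operation("Anion exchange, pH 8", "charge", 0.9),
    Operation("Cation exchange, pH 5", "charge", 0.7),
    Operation("HIC", "surface hydrophobicity", 0.5),
    Operation("Gel filtration", "molecular weight", 0.2),
]
print(order_steps(ops))  # ['Anion exchange, pH 8', 'HIC']
```

This strict reading skips the second ion-exchange step, whereas the text notes that ion exchange can be used twice at significantly different pH values; treating charge at each pH as a separate property would allow that.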

The present ES clearly reflects the use of these rules. There is a significant lack of available information (both published and confidential) on minor contaminants (such as pyrogens) that are removed in the final polishing stages. At present, little formal, structured knowledge exists on how to separate such minor contaminants, if present, apart from information available for very specific cases where this separation has been achieved after lengthy trial and error procedures. A polishing stage for the separation of aggregated product protein, on the other hand, tends to use gel filtration, as molecular weight is virtually the only physicochemical property difference that can be exploited. One area where some information was available, but which we felt was not sufficient, is the use of pretreatment methods to eliminate impurities and obtain a 'gin clear' solution for the high resolution purification operations. For all the reasons discussed above, the selection of many operations in the purification subprocess is expressed in terms of individual operation efficiencies. In order to predict these, it is necessary to characterize both the product protein and the main contaminating ones in terms of their physicochemical properties. For the contaminating proteins, average values and mass-based distribution curves are probably the most useful information to have. The main sources of proteins in the modern biotechnology industry are few: E. coli (intracellular), mammalian cells (extracellular, with and without calf serum in the media) and yeast (extra- and intracellular). The characterization of both protein products and contaminants has to be carried out in terms of charge and titration curve of major proteins, molecular weight, surface hydrophobicity, pI and available biospecific interactions. This information can be determined case by case for the individual product proteins. Determining the general distribution of physicochemical properties in the five sources mentioned above (E. coli, yeast (2 cases) and mammalian (2 cases)) should be the next step in the development of this 'prototype' ES. This will allow selection of purification operations on a much more rational basis and will have an important impact on the optimization of downstream processes in actual practical cases. For the prototype system developed here several shells are adequate, particularly if they are capable of evaluating the uncertainties associated with the inference process. This second generation of protein purification rules has been implemented in two shells: ESE and PC Plus. Expert knowledge was obtained partially from the literature but mainly from industrial experts working on the large scale separation and purification of therapeutic, diagnostic and analytical proteins. A next stage in the development of expert systems should consider the introduction of more quantitative elements (simple algebraic correlations, design equations and short-cut methods) for the design and evaluation of individual operations and their alternatives, as well as the introduction of basic cost calculations into the procedure for selecting alternative processes. Such a hybrid system, including heuristic rules, more rigorous information and design correlations, is particularly attractive for process biotechnology, as rigorous correlations and detailed information are not readily available.
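The characterization described above suggests summarizing contaminants by mass-weighted averages and distributions, then scoring each property by how far the product lies from them, scaled by an operation efficiency factor (as outlined earlier for the selection of high resolution operations). A minimal Python sketch under our own assumptions, not the formulation of Asenjo (1989): the deviation is expressed in units of the mass-weighted standard deviation so that properties with different units (pI, molecular weight) can be compared, and the efficiency factor is a user-supplied number.

```python
import math


def mass_weighted_stats(values, masses):
    """Mass-weighted mean and standard deviation of a contaminant property."""
    total = sum(masses)
    mean = sum(v * m for v, m in zip(values, masses)) / total
    var = sum(m * (v - mean) ** 2 for v, m in zip(values, masses)) / total
    return mean, math.sqrt(var)


def separation_score(product_value, contaminant_values, contaminant_masses,
                     efficiency_factor):
    """Hypothetical score: normalized deviation of the product x efficiency factor."""
    mean, spread = mass_weighted_stats(contaminant_values, contaminant_masses)
    if spread == 0:
        # All contaminant mass sits at one value: any product difference is
        # treated as infinitely large (a choice made for this sketch).
        return 0.0 if product_value == mean else math.inf
    return abs(product_value - mean) / spread * efficiency_factor


# Invented example: contaminant pI values with relative masses.
pis, masses = [4.8, 5.5, 6.2, 7.0], [40, 30, 20, 10]
print(separation_score(8.1, pis, masses, efficiency_factor=0.9))
```

Computing such a score for each property and operation would give one possible numerical input to the efficiency-based rules; the source does not specify how the deviation and efficiency factor are combined.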
As stated above, process simulation and optimization of individual operations can be carried out and solved rigorously using numerical methods, whereas the overall design of an efficient process, which is a knowledge intensive domain, appears tractable only with an AI approach. Development of an appropriate computer system in which the man-machine interaction takes place in a reactive environment is fundamental, as the computer knowledge base will provide the necessary support in terms of protein property data bases, extensive conceptual modelling and the constraints of protein separation processes. As only a few proteins (a few dozen) are currently of commercial importance, the data bases required will be of quite reasonable size. Furthermore, the parameters that characterize commercially important proteins can be determined in advance, and the data bases are relatively easily maintained, modified and updated. If the properties of the protein and of the microbial source are not found in the support data bases, the client can be asked for them. Usually a user interested in the downstream processing of a particular product will know some of its properties. The properties of the contaminants can be standardized, as the most common production systems are few, as already discussed.

Conclusions

The selection of optimal protein separation sequences is an important problem that will gain in importance as biotechnology products (such as new vaccines and therapeutics) become more widespread and their production more competitive. A limiting factor in the development of ESs for protein purification at present is the acquisition, clarification, formalization and structuring of the domain of expert knowledge. Heuristic methods that use expertise mainly from industrial experts can be implemented in available ES 'shells' and are an important tool in the spectrum of available process synthesis techniques, in addition to rigorous methodologies. Such ESs allow the use of certainty factors, which is important as expert knowledge has a degree of uncertainty. This allows the selection of a few possibilities with specific degrees of certainty. The overall process was satisfactorily divided into two subprocesses with clear objectives: recovery and purification. Selection of operations in the recovery subprocess could be well structured. In the second subprocess (purification) the structuring of the knowledge was more difficult. The main deficiency of available information was found to be in that required for the selection of high resolution purification operations. Information was also considerably lacking on the separation of the minor contaminants that are removed in the final polishing stage. To predict the selection of many operations in the purification subprocess, it is necessary to characterize both the product and the main contaminant proteins in terms of their physicochemical properties. The main sources of proteins (for physicochemical characterization) in modern biotechnology industries are few: E. coli, mammalian cells and yeast. Good characterization of these sources will allow selection of purification operations on a much more rational basis. For selection of optimal protein separation sequences, the ES will give the user a number of process alternatives chosen on the basis of extensive data back-up on protein sources and unit operations, as well as algebraic correlations.
Such a system will clearly constitute 'expert amplification' and not 'expert replacement'.

Acknowledgements

We thank Dr. Ian Patrick from Celltech Ltd. for useful discussions in structuring the expert knowledge base, and Dr. Scott Wheelwright from Chiron Co. and Dr. Armin Ramel from Genentech for useful discussions when part of this work was presented at the 196th ACS National Meeting in Los Angeles, September 1988.

References

Asenjo, J.A. (1988) The Rational Design of Large Scale Protein Separation Sequences. Paper presented at the 196th ACS National Meeting, MBTD division, Los Angeles, CA, 25-30 Sept. 1988.
Asenjo, J.A. (1989) Selection of operations in separation processes. In: Asenjo, J.A. (Ed.), Separation Processes in Biotechnology, Marcel Dekker, New York, in press.
Asenjo, J.A. and Patrick, I. (1989) Large scale purification. In: Harris, E.L.V. and Angal, S. (Eds.), Protein Purification Applications: A Practical Approach, IRL Press, U.K., in press.
Banares-Alcantara, R., Sriram, D., Venkatasubramanian, V., Westerberg, A. and Rychener, M. (1985a) CEP, Sept. 1985, 25-30.
Banares-Alcantara, R., Westerberg, A. and Rychener, M. (1985b) Development of an expert system for physical property predictions. Comp. Chem. Eng. 9, 127.
Brownston, L., Farrell, R., Kant, E. and Martin, N. (1986) Programming Expert Systems in OPS5, Addison Wesley.
Buchanan, B.G. and Feigenbaum, E.A. (1978) DENDRAL and Meta-DENDRAL: Their applications dimensions. Artif. Intell. 11.
Chase, H.A. (1988) The Performance of Affinity Separations. Paper presented at the SCI meeting on Antibodies for Purification, London, March 1988.
Chester, D., Lamb, D. and Dhurjati, P. (1984) Computers in Engineering - Advanced Automation: 1984 and Beyond, Ed. W.A. Gruver, ASME, 345.
Duffy, S.A., Moellering, B.J. and Prior, C.R. (1988) Optimal Large Scale Purification Strategies for the Production of Highly Purified Monoclonal Antibodies for Clinical Application. Paper presented at the 196th ACS National Meeting, MBTD division, Los Angeles, Sept. 1988.
Govind, R. and Powers, G.J. (1981) AIChE J. 27, 429.
Grimes, L., Rychener, M. and Westerberg, A. (1982) Chem. Eng. Commun. 14, 339.
Harmon, P. and King, D. (1985) Expert Systems: Artificial Intelligence in Business, Wiley.
Hunter, J.B. and Asenjo, J.A. (1986) Sep. Recovery Purif. Biotechnol. 314, 9-31.
Hunter, J.B. and Asenjo, J.A. (1988) Biotechnol. Bioeng. 31, 929-943.
Liu, L.C., Prokopakis, G.J. and Asenjo, J.A. (1988) Biotechnol. Bioeng. 32, 1113-1127.
Moore, R.L., Hawkinson, L.B., Knickerbocker, C.G. and Churchman, L.M. (1984) Proc. IEEE Conf. on Applic. of Artif. Intell., Dec. 1984, 569.
Prokopakis, G.J. and Asenjo, J.A. (1989) Synthesis of Downstream Processes. In: Asenjo, J.A. (Ed.), Separation Processes in Biotechnology, Marcel Dekker, New York, in press.
Siletti, C.A. and Stephanopoulos, G. (1986) Computer Aided Design of Protein Recovery Processes. Paper presented at the 192nd ACS National Meeting, Anaheim, CA, September 1986.
Wacks, S. (1987) Design of Protein Separation Sequences and Downstream Processes in Biotechnology. Use of Artificial Intelligence, M.Sc. Thesis, Columbia University, New York.
Wang, L. (1989) Ion-exchange in Purification. In: Asenjo, J.A. (Ed.), Separation Processes in Biotechnology, Marcel Dekker, New York, in press.