Computers chem. Engng Vol. 21, No. 9, pp. 981-996, 1997
© 1997 Elsevier Science Ltd. Printed in Great Britain. All rights reserved.
Steady-state modelling of chemical process systems using genetic programming

Ben McKay 1, Mark Willis 2 and Geoffrey Barton 1

1 Dept. of Chemical Eng., The University of Sydney, NSW 2006, Sydney, Australia
2 Dept. of Chemical and Process Eng., The University of Newcastle upon Tyne, Newcastle upon Tyne, U.K.
(Received 9 August 1995; revised 21 August 1996)

Abstract

Complex processes are often modelled using input-output data from experimental tests. Regression and neural network modelling techniques are commonly used for this purpose. Unfortunately, these methods provide minimal information about the model structure required to accurately represent process characteristics. In this contribution, we propose the use of Genetic Programming (GP) as a method for developing input-output process models from experimental data. GP performs symbolic regression, determining both the structure and the complexity of the model during its evolution. This has the advantage that no a priori modelling assumptions have to be made. Moreover, the technique can discriminate between relevant and irrelevant process inputs, yielding parsimonious model structures that accurately represent process characteristics. Following a tutorial example, the usefulness of the technique is demonstrated by the development of steady-state models for two typical processes, a vacuum distillation column and a chemical reactor system. A statistical analysis procedure is used to aid in the assessment of GP algorithm settings and to guide in the selection of the final model structure. © 1997 Elsevier Science Ltd
Keywords: genetic programming; symbolic regression; process modelling

Introduction
The increasing emphasis on product 'quality', economic process performance and environmental issues in the chemical and allied industries is placing significant demands on existing operational procedures. Enhanced process performance generally requires increased process knowledge, with mathematical models being the most common means of representing this knowledge. While it may be possible to develop a model using a detailed knowledge of the physics and chemistry of a system, there are a number of drawbacks to this approach. Industrial process systems are often extremely complex and non-linear in nature, thus it may take a considerable amount of time and effort to develop a realistic model. Moreover, in many instances simplifying assumptions have to be made in order to provide a tractable solution. A first-principles model will, therefore, often be costly to develop and may be subject to inaccuracies. However, if an accurate process model were available, then many of the benefits of improved
process operability would be achievable. The current trend within the process industries is to use data based modelling techniques to develop accurate, cost-effective input-output process descriptions. The popular techniques may be divided into two categories. The first are based on the use of various statistical techniques and regression analysis, while the second involves the use of artificial neural networks. The most well known and simple to apply statistical techniques assume that any relationships between input and output variables are linear and that the data itself is normally distributed. Unfortunately, industrial systems are normally highly non-linear and the data obtained from such processes generally do not conform to normal distributions. Nevertheless, numerous methods can be used to implement a systematic data analysis methodology and can help to establish the basic characteristics of the process. The interested reader is referred to Weiss and Kulikowski (1991) and Breiman et al. (1984) for an assessment of the strengths and weaknesses of the available methods. However, it should be noted that a
degree of expertise is often required in applying and interpreting such statistically based results. This is one of the reasons why researchers have turned so readily to the use of artificial neural networks. It has been shown mathematically that a neural network is capable of learning any continuous non-linear input-output mapping (Cybenko, 1989). Indeed, applications within the chemical and allied industries indicate that neural networks can adequately represent process system behaviour (e.g. see Qin and McAvoy, 1992; Willis et al., 1992; Willis et al., 1995; McKay et al., 1996). While neural networks can provide an extremely effective black box modelling tool for both steady-state and dynamic systems, the technique has some inherent disadvantages and limitations. Firstly, it is often necessary to fall back on the traditional linear statistical data analysis techniques in order to select appropriate model inputs from the large number of potential inputs that may be available. Next, decisions must be made about the size and topology of the required network, which generally involves heuristics or time-consuming iterative design techniques. Finally, the complex, highly parameterised structure of neural networks leads to the need for relatively large amounts of training and verification data, which may be restrictive in industrial situations if experiments have to be performed to obtain the data. It also results in models with structures that are difficult to analyse or interpret, except perhaps by indirect techniques such as 'what if' scenarios or sensitivity analysis. This contribution describes a novel approach which offers a useful alternative to popular data based modelling methodologies. The technique utilises Genetic Programming (GP) techniques (e.g.
Koza, 1992; Koza, 1994; Kinnear, 1994) to perform:

• process model input selection
• process model structure selection
• parameter identification

The objective of this work was to produce input-output models with relatively simple, transparent structures. As the process is defined by a finite sample of data and there is a lack of information about the internal workings of the system, it is not anticipated that an exact phenomenological model will be obtained. To develop such a mechanistic model, knowledge of the underlying physical processes would be required, and this is not available solely from plant input-output data.
Genetic programming

The performance of an individual organism in its environment determines the likelihood of it passing on its genetic material to future generations. This basic biological principle is known as Darwinian survival of the fittest, and has inspired a class of algorithms known as Genetic Algorithms (GAs). GAs attempt to find the best solution to a problem by mimicking the process of evolution in nature. Thus, a typical algorithm will 'breed' a population of individuals that represent possible solutions to a particular problem.
For an excellent introduction to GAs see Goldberg (1989). Unfortunately, while GAs have proved to be a useful technique for finding solutions in a wide range of problem domains, e.g. function optimisation (Goldberg, 1989), the control of natural gas pipelines (Goldberg, 1989), the prediction of protein conformation (Schulze-Kremer, 1992), process control (Nordvik and Renders, 1991; Fogarty, 1989) as well as in the design of heat exchanger networks (Androulakis and Venkatasubramanian, 1991), they do not appear to be appropriate for symbolic regression problems where the structure and parameters of a model are to be determined simultaneously. This is because GAs generally use fixed length binary strings to code potential solutions to a problem. Clearly this is unsuitable for symbolic regression, where the model structure is allowed to vary during evolution. However, GP is a closely related approach that does lend itself to the implementation of symbolic regression. GP differs from GAs by utilising the following:

• Tree structured, variable length chromosomes (rather than chromosomes of fixed length and structure).
• Chromosomes coded in a problem specific fashion (that can usually be executed in their current form) rather than binary strings.
• Genetic operators that preserve the syntax of the tree structured chromosomes during 'reproduction'.

A number of early results have demonstrated the potential applicability of GP to the field of process systems engineering. For instance, GP has been used to approximate the impulse response of a linear time invariant system (Koza, 1993). However, the example studied here was rather simplistic and the results obtained were less than impressive. In another application, Iba et al. (1993) used GP to perform time series prediction for a chaotic system. However, their algorithm was restricted to the use of polynomial functions, thus unnecessarily limiting the type of solutions that could be obtained.
In a further contribution, Zhang and Muhlenbein (1993) used GP to optimise both the architecture and the connection weights of a feedforward artificial neural network. Their results indicated that given enough resources the methodology could determine minimal complexity networks. While this is an interesting result in itself, it fails to fully exploit the power of GP. Rather than manipulating neural network structures, there is the potential to discover significantly more information about the underlying process characteristics by the direct use of symbolic regression. In summary, most work in the field has been conducted by computer scientists and has concentrated on small, relatively simple problems. By comparison, we use GP to evolve input-output models of complex process systems.
Description of the algorithm The first step in the implementation of GP is the definition of a terminal set. In other words, when developing a mathematical model of a process it is necessary to supply the input variables that are thought
to be related to the output. In addition, the algorithm should have the ability to generate constants, as it will generally be a combination of input values and numeric constants that produces the required regression model. It is also necessary to define a functional set. This is the set of arithmetic operations and mathematical functions that the algorithm may use when constructing a potential solution to a given problem. Typically, the functional set will include arithmetic operators such as addition and multiplication, and common mathematical functions (such as the square root, logarithm or exponential functions). It is important to ensure that each functional has the property of closure. That is, it must be able to accept and return a numeric value when presented with an input. The square root function does not have this property if presented with negative numbers. Thus, in order to ensure closure in this case, the absolute value of the input should be taken before evaluation. The search space of the GP algorithm contains all possible models that can be constructed from elements in the functional and terminal sets. Regardless of the specific application, an algorithm searches the problem space using simple operations involving the copying, recombining and random alteration of mathematical expressions. A flow diagram of a typical GP algorithm is shown in Fig. 1. Initially a population of N random mathematical expressions must be generated, each of which is coded as a tree structure. For a thorough discussion of tree structures and their properties, see Tenenbaum and Augenstein (1981). However, it is sufficient for the purposes of this paper to introduce the relevant terminology by means of a simple example. Consider the problem of predicting the numeric value of an output variable, y, from two input variables a and b. One possible symbolic representation for y in terms of a and b would be,

y = (a - b)/3    (1)
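To make the tree representation concrete, the expression of Eq. (1) can be coded and evaluated as follows. This is an illustrative Python sketch (the authors' implementation was in MATLAB), with the 'closure' protection on sqrt and log described above:

```python
import math

# An expression tree as nested tuples: (functional, child, ...) for branch
# nodes; a string for a variable leaf; a number for a constant leaf.
# This encodes y = (a - b)/3 from Eq. (1).
tree = ('/', ('-', 'a', 'b'), 3.0)

FUNCTIONALS = {
    '+': lambda x, y: x + y,
    '-': lambda x, y: x - y,
    '*': lambda x, y: x * y,
    '/': lambda x, y: x / y,
    # Closure: protect sqrt and log so every call returns a numeric value.
    'sqrt': lambda x: math.sqrt(abs(x)),
    'log': lambda x: math.log(abs(x)) if x != 0 else 0.0,
    'exp': math.exp,
}

def evaluate(node, inputs):
    """Recursively evaluate an expression tree for one data record."""
    if isinstance(node, str):            # variable leaf node
        return inputs[node]
    if isinstance(node, (int, float)):   # constant leaf node
        return float(node)
    op, *args = node                     # branch (interior) node
    return FUNCTIONALS[op](*(evaluate(a, inputs) for a in args))

print(evaluate(tree, {'a': 7.0, 'b': 1.0}))   # (7 - 1)/3 -> prints 2.0
```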
Fig. 2 demonstrates how this expression may be represented as a tree structure. This example will be used later to explain the action of genetic operators. Each element in the tree is called a node. There are several ways of classifying the nodes of a tree, for example interior nodes and leaf nodes. Leaf nodes are used to represent the elements of the terminal set and can be classified as either constant (if associated with a constant value) or variable (if associated with an input variable) nodes. Interior or branch nodes represent elements of the functional set. The degree of a branch node is defined by the number of arguments of the functional it represents. For instance, functionals with a single argument such as {sqrt, exp, log} have degree one, while functionals such as {+, -, /, *} have degree two. Thus, referring to Fig. 2, it can be seen that the tree has three leaf nodes, two being variable nodes a and b, while one is a constant node '3'. There are two functionals in the mathematical expression (- and /), both of degree two. Once the initial population has been formed (each
described by its own tree structure), evolution is guided by a measure of the 'fitness' of each individual in the population.

Assignment of fitness values
Fitness is a numeric value assigned to each member of a population to provide a measure of the appropriateness of a solution to the problem in question. Fitness functions are generally based upon the error between the actual and predicted solutions. However, error-based measures decrease for better solutions. In GP, it is more natural to think of fitness values that are increasing during the evolutionary process. In order to modify the error-based performance index, a scaled inverse transformation is generally used (South et al., 1995). However, for symbolic regression problems, South (1994) and South et al. (1995) proposed the use of the correlation coefficient between the actual and predicted outputs as an alternative to error-based fitness functions. Generally, when compared to algorithms utilising error-based functions, the correlation coefficient produces accurate solutions in fewer generations. Therefore, this performance index was used throughout this work. The absolute value of the correlation coefficient is given by:
r(i) = \left| \frac{\sum_{j=1}^{R} \left( y_p(i,j) - \bar{y}_p(i) \right) \left( y_T(j) - \bar{y}_T \right)}{R \, \sigma_p(i) \, \sigma_T} \right|    (2)

where y_p(i,j) is the predicted value of record j by tree i, y_T(j) is the target value, R is the number of data records, σ_p(i) is the standard deviation of the predicted values for tree i and σ_T is the standard deviation of the target values. The correlation coefficient lies in the range [0, 1] and measures the way in which the predicted values and the target values vary together. Therefore, credit is given to solutions that are close to the correct form. It is well recognised that as the complexity of a model increases, so its ability to generalise (i.e. to fit unseen data as well as the training data) is compromised. Consequently, large tree structures can cause overfitting of the data. In order to prevent the formation of overly large trees, the correlation fitness function was penalised according to the size of a tree. The following sigmoidal penalty function was used,
f(i) = \frac{r(i)}{1 + \exp\left( a_1 (S_L - a_2) \right)}    (3)
where S_L is the size of a tree expressed as the string length, while a_1 and a_2 are parameters that determine the softness of the constraint and where it comes into effect. Once the fitness of each individual has been determined, it is used as the basis for selecting members of the population for reproduction.

Selection strategies
A number of selection methods have been suggested in the literature. These include the elitist strategy,
tournament selection and fitness proportionate selection. With the elitist scheme, the population is sorted into descending order according to individual fitness values.
The fittest M (M ≤ N) individuals then undergo reproduction. Tournament selection involves the random sampling (with replacement) of a fixed number of
Fig. 1. Typical genetic programming algorithm flowsheet: generate an initial population of size N; evaluate model fitness; select individuals by fitness proportionate selection; apply direct reproduction, crossover or mutation; add the new solutions and the (N - M) fittest members of the old population to the new population.
individuals from the parent population to form a subset. The fittest member of this (relatively small) subset is then chosen to reproduce, and the process repeated as required. With fitness proportionate selection, an individual is sampled from the parent population (again with replacement) with a probability proportional to its fitness. Thus, if the i-th individual in the parent population has a fitness f_i, the probability of this individual being chosen is,

Probability(selection) = f_i / Φ, where Φ = \sum_{i=1}^{N} f_i    (4)

It is presently unclear as to which is the optimum selection strategy. In this paper, fitness proportionate selection was used, as it appears to be a favoured technique within the GA literature.

Fig. 2. Representation of a numeric expression using a tree structure.

Application of genetic operators
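Before turning to the operators themselves, the penalised correlation fitness of Eqs (2)-(3) and the fitness proportionate selection of Eq. (4) can be sketched as follows. This is an illustrative Python fragment, not the authors' MATLAB code; the defaults a1 = 0.08 and a2 = 150 are the penalty settings used in this work:

```python
import math
import random

def fitness(predicted, target, size, a1=0.08, a2=150):
    """Absolute correlation between predictions and targets (Eq. 2),
    scaled by the sigmoidal tree-size penalty of Eq. (3)."""
    R = len(target)
    mp = sum(predicted) / R
    mt = sum(target) / R
    cov = sum((p - mp) * (t - mt) for p, t in zip(predicted, target)) / R
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted) / R)
    st = math.sqrt(sum((t - mt) ** 2 for t in target) / R)
    r = abs(cov / (sp * st)) if sp > 0 and st > 0 else 0.0
    return r / (1.0 + math.exp(a1 * (size - a2)))

def select(population, fitnesses):
    """Fitness proportionate ('roulette wheel') selection, Eq. (4)."""
    pick = random.uniform(0.0, sum(fitnesses))
    cumulative = 0.0
    for individual, f in zip(population, fitnesses):
        cumulative += f
        if cumulative >= pick:
            return individual
    return population[-1]
```

Note that a perfectly correlated but very large tree still receives a low fitness, which is what drives the algorithm toward parsimonious structures.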
Once an individual has been selected from the current population, three basic genetic operators (direct reproduction, crossover and mutation) may be applied. The direct reproduction operator directly copies a member from the parent population to the next generation. The genetic operation of crossover takes two members of the population and combines them to create new offspring. The operation of mutation creates a new individual by randomly altering a single member of the population. The choice of each operator is probabilistic, with crossover, mutation or direct reproduction being applied with probabilities P_c, P_m and (1 - P_c - P_m), respectively. Direct reproduction requires no further explanation. However, a more detailed discussion of the crossover and mutation operators is presented below.

Crossover. The role of the crossover operator is to allow the construction of new individuals from existing ones, thus enabling new regions of the solution space to be searched. During crossover two mathematical expressions (i.e. members of the existing population) are randomly chosen using fitness proportionate selection. A randomly selected subtree from one parent is then interchanged with a subtree randomly chosen from a second parent. The two newly created individuals are then used to form the next generation. In GP the crossover operation is non-homologous. In other words, the crossover points in the two expressions are chosen independently, unlike GAs where the position must be identical for both chromosomes. A naive application of traditional crossover would produce meaningless offspring. Thus, it is required that the operation of crossover preserve the syntax of the original mathematical expressions. Fig. 3 shows a valid crossover operation where the two parent expressions are given by:

Parent 1: y = (a - b)/3    (5)

Parent 2: y = (c - b)*(a + c)    (6)

Parent 1 has input variables 'a' and 'b' and a constant '3' while parent 2 has three input variables 'a', 'b' and 'c'. Both expressions attempt to predict the process output, 'y'. If the / from parent 1 and the * from parent 2 are chosen as the crossover points, then the two offspring are given by:

Offspring 1: y = (a - b)/(a + c)    (7)

Offspring 2: y = (c - b)*3    (8)
It will be noted that crossover produces offspring that are different from the two parent expressions. Nevertheless, they are created entirely from the genetic material of their parents. Intuitively, if a parent represents a reasonable solution to a numeric prediction problem, then it might be expected to contain subtrees with information relevant to an acceptable solution. By recombining relevant subtrees, new expressions may be produced that provide an even fitter solution.
Fig. 3. A typical crossover operation.
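A syntax-preserving subtree crossover along these lines might be sketched as follows. This is an illustrative Python fragment (trees are nested lists), and the randomly chosen crossover points will generally differ from those of Eqs (5)-(8):

```python
import copy
import random

def subtrees(node, path=()):
    """Yield (path, subtree) pairs; a path is a tuple of child indices."""
    yield path, node
    if isinstance(node, list):                    # branch node [op, arg, ...]
        for i, child in enumerate(node[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace_at(node, path, new):
    """Return `node` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    node[path[0]] = replace_at(node[path[0]], path[1:], new)
    return node

def crossover(parent1, parent2):
    """Swap independently chosen random subtrees (syntax preserving)."""
    c1, c2 = copy.deepcopy(parent1), copy.deepcopy(parent2)
    p1, s1 = random.choice(list(subtrees(c1)))
    p2, s2 = random.choice(list(subtrees(c2)))
    return (replace_at(c1, p1, copy.deepcopy(s2)),
            replace_at(c2, p2, copy.deepcopy(s1)))

# The parents of Eqs (5) and (6):
parent1 = ['/', ['-', 'a', 'b'], 3.0]              # (a - b)/3
parent2 = ['*', ['-', 'c', 'b'], ['+', 'a', 'c']]  # (c - b)*(a + c)
child1, child2 = crossover(parent1, parent2)
```

Because whole subtrees are exchanged, every offspring remains a syntactically valid expression, and the combined pool of leaf nodes is exactly that of the two parents.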
Mutation. The role of the mutation operator is to maintain population diversity, hence reducing the risk of premature (but poor) convergence. It enables the introduction into the population of elements from the functional and terminal sets that have either been lost through evolution, or did not exist in the initial population. Mutation consists of randomly changing a functional, input or constant in one of the mathematical expressions making up the present population. Fig. 4 shows the flow diagram for our mutation algorithm. Once a member of the population has been selected for mutation, one of the inputs, constants or functionals present in the expression is chosen at random. If an input is selected, there is a user defined probability (P_mvc) that it will be replaced by a constant. Similarly a constant may be chosen and replaced by a randomly selected input, with the same probability (P_mvc). However, if a functional is chosen, then it is necessary to identify the arguments associated with it and choose a replacement functional. Depending upon the degree of the replacement, the number of arguments required to reconstruct the mathematical expression may differ. Table 1 demonstrates the logic for this operation. Thus, if the degrees of the two functionals are the same, the old functional is simply replaced by the new. If the degree of the old is less than that of the new, it is necessary to randomly generate a new argument. However, if the degree of the old is greater than that of the new, an argument must be culled. In order to illustrate these concepts, consider the following expression:

y = (c - b)*(a + c)
(9)
Suppose that the '-' is to be mutated to an 'exp'. The '-' has two arguments, in this case 'c' and 'b' (in general the two arguments could be complex subtrees). In order to proceed with the mutation, one of the arguments must be culled. For example, if the argument b is culled, the new expression becomes:

y = exp(c)*(a + c)
(10)
However, if the ' - ' is to be mutated to a '+', then the new expression would simply be:
y = (c + b)*(a + c)
(11)
Thus, during evolution the direct reproduction, crossover and mutation operators are applied to members of the old population. Fitter individuals from this population have a higher probability of passing on elements of their structure to the next generation, a procedure which should allow the progressive evolution to an acceptable solution.
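The functional-mutation logic described above (including the argument culling and generation rules of Table 1) can be sketched for a single branch node. The terminal set {a, b, c} used when generating a new argument is assumed here for illustration:

```python
import random

UNARY = ['sqrt', 'exp', 'log']   # functionals of degree one
BINARY = ['+', '-', '*', '/']    # functionals of degree two

def mutate_functional(node, terminals=('a', 'b', 'c')):
    """Replace the root functional of a branch node [op, arg, ...],
    applying the degree logic of Table 1."""
    old_args = list(node[1:])
    new_op = random.choice(UNARY + BINARY)
    need = 1 if new_op in UNARY else 2
    if need < len(old_args):        # degree decreased: cull one argument
        old_args = [random.choice(old_args)]
    elif need > len(old_args):      # degree increased: generate an argument
        old_args.append(random.choice(terminals))
    return [new_op] + old_args      # equal degree: directly exchange

# Mutating the '-' node of y = (c - b)*(a + c), Eq. (9):
expr = ['*', ['-', 'c', 'b'], ['+', 'a', 'c']]
expr[1] = mutate_functional(expr[1])
```

Whatever replacement functional is drawn, the result is a branch node with exactly the number of arguments its degree requires, so the expression stays syntactically valid.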
Optimisation of individual expressions

Each member of the population may contain a number of constants. Rather than randomly generating the values of these constants, it would seem logical to optimise each expression to obtain the best possible values. Utilising the 'optimal' constant values for each suggested solution will:

• Ensure the best possible performance from each model structure.
• Provide a mechanism by which to recover disrupted building blocks (i.e. improve the portability of successful subtrees).

These benefits can be demonstrated by the following example. Suppose the objective is to fit a model to the data presented in Table 2, where a and b are inputs and y is the output. Two candidate models may be,

y = 5.6*(a - b)^2
(12)
y = 1/(a - 0.16)
(13)
which have correlation fitness values of 0.5 and 0.1,
Fig. 4. Mutation operator flowsheet.

Table 1. Mutation operator logic

Degree                 new functional = 1     new functional = 2
old functional = 1     directly exchange      generate new argument
old functional = 2     cull one argument      directly exchange
Table 2. Input-output data

  a        b        y
 0.91     1.71     5.00
 0.76     1.16     1.67
 0.26     0.26     1.00
 0.05    -0.35     0.71
 0.74    -0.06     0.56
respectively. If the crossover operator was applied to these two expressions, one possible offspring would be,

y = 1/((a - b) - 0.16)
(14)
which has a correlation fitness value of only 0.2. However, if the value of the constant in this new expression is optimised, the following model is obtained,

y = 1/((a - b) + 1.0)
(15)
which yields a fitness value of 1.0. Thus, without performing an optimisation of the model constants, it is very difficult to successfully transfer building blocks from one mathematical expression to another. This means that models with an inherently good structure could be discarded due to sub-optimal constants. In our algorithm, each time a new mathematical expression is generated, regression constants are determined using the Levenberg-Marquardt method of non-linear least squares optimisation (Edgar and Himmelblau, 1989). Thus, the global search in the structural (i.e. topological) space is combined with local parameter optimisation. It might be anticipated that this procedure would be computationally intensive; however, in our work to date, solution times have been quite acceptable.

Creating the next generation

Once a population of new individuals has been generated, a decision must be made as to which members of the old population should die to make room for the next generation. Several methods may be adopted. The simplest, given M = N new individuals, is to replace the entire old population so that there is a complete turnover for every cycle of the algorithm. However, it may be expedient to retain the fittest members of the old population in order to ensure the survival of structures that perform well. Thus a proportion, P_old, of the original population is preserved, leaving P_gap = (1 - P_old) as a generation gap (the proportion to be replaced with new individuals). Once the new population has been created, the whole process is repeated until the maximum fitness of the population has reached the desired level, or the maximum number of generations has been reached. Typically the total number of generations in GP is low and the convergence to a good approximate solution is surprisingly rapid.
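The constant-optimisation step can be illustrated on the Table 2 example. The sketch below fits the constant c in y = 1/((a - b) + c), substituting a coarse scan plus a simple Gauss-Newton iteration for the Levenberg-Marquardt routine used in the actual algorithm:

```python
# Table 2 data: inputs a, b and output y
A = [0.91, 0.76, 0.26, 0.05, 0.74]
B = [1.71, 1.16, 0.26, -0.35, -0.06]
Y = [5.00, 1.67, 1.00, 0.71, 0.56]

def model(c, a, b):
    return 1.0 / ((a - b) + c)

def sse(c):
    """Sum of squared residuals for a given value of the constant."""
    return sum((model(c, a, b) - y) ** 2 for a, b, y in zip(A, B, Y))

# Coarse scan for a starting point, then Gauss-Newton refinement (standing
# in for the Levenberg-Marquardt step used in the paper's algorithm).
c = min((k / 40.0 for k in range(-79, 80, 2)), key=sse)
for _ in range(50):
    num = den = 0.0
    for a, b, y in zip(A, B, Y):
        r = model(c, a, b) - y            # residual
        d = -1.0 / (((a - b) + c) ** 2)   # d(residual)/dc
        num += d * r
        den += d * d
    c -= num / den                        # Gauss-Newton step
print(round(c, 4))   # the optimised constant, c ≈ 1.0
```

With the optimised constant in place, the offspring structure fits the Table 2 data almost exactly, which is precisely why constant optimisation makes good building blocks portable.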
Implementation aspects

Our algorithm has been implemented in MATLAB running on a P5-90. The environment provides an
excellent development platform, with in-built graphics visualisation tools, optimisation and symbolic mathematical manipulation capabilities, as well as convenient routines for string manipulation. To run the program, a number of parameters must be specified. Nevertheless, during evaluation studies it has been found that the algorithm is extremely robust. In other words, it is remarkably adept at finding a solution to a given problem for a wide range of parameter settings. Throughout this work, we heuristically selected the default parameters shown in Table 3. Using these default settings, typical run times of the order of 30 min were achieved. While this may be regarded as being rather slow, dramatic improvements in speed could be expected if the algorithm was coded in C.
Analysis procedure

GP is probabilistic by nature, and thus for non-trivial symbolic regression problems each time the algorithm is used it is expected that it will arrive at different approximate solutions. In order to capitalise on this inherent variation, it is useful to perform multiple runs and to employ a statistical analysis of the results. In particular, this provides the following useful information:

• The frequency of runs that produce successful results. This allows the adequacy of the program settings to be assessed. If the probability of obtaining a successful result (P_suc) is low, then it may be necessary to adjust some of the parameters of the algorithm. In particular, it may be necessary to increase either the population size, the number of generations or both.
• An analysis of the probability of occurrence of each input variable (P_inp,i, i = 1, ..., n) taken from the original terminal set. This will allow the user to obtain a feel for the relevance of a particular input to the modelling task. This is particularly important in the context of chemical process systems, as the determination of the key inputs that affect an output can be as time consuming as the actual model development.

A run is considered a success if the root mean square (RMS) error on the verification set is below a prespecified error tolerance. This error tolerance must be determined in a heuristic fashion for each application. This type of decision is always made when developing a model. The modeller must decide what is acceptable for each separate application. As a run can either be successful or not, the probability of success can be defined as follows:
Pr(success) = X / J = P_suc    (16)
where X is the number of acceptable runs and J is the total number of runs. The success of each run is a binomial variable, as it can take only one of two values
Table 3. Default parameters used by the genetic programming algorithm

Functional set:                     +, -, /, *, sqrt, square, log, exp
Relative abundance of operators:    Equal
Fitness function:                   Correlation
Selection method:                   Fitness proportionate
Population size:                    N = 25
Number of generations:              G = 60
Total no. of runs:                  J = 50
Mutation probability:               P_m = 0.2
Crossover probability:              P_c = 0.8
Generation gap:                     90%
Constant-input probability:         P_mvc = 0.1
Tree size penalty parameters:       a_1 = 0.08, a_2 = 150
(successful or unsuccessful). Moreover, it should be noted that for any specified number of GP runs (J), confidence bounds may be placed on the probability of success:

Pr(L ≤ P_suc ≤ U) = 1 - α    (17)

where L and U are the lower and upper bounds, α is the confidence coefficient, and P_suc lies within (L, U) with 100(1 - α)% confidence. Based upon the binomial distribution, standard statistical tables can be used to determine the values of L and U for given values of P_suc and J. In a similar fashion, the probability that a given input variable will be present in a successful model may also be described by a binomial distribution. Thus, following the procedures detailed above, it is possible to calculate the bounded probability that a given input is required to obtain a successful model.
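As an alternative to statistical tables, the exact (Clopper-Pearson) bounds L and U of Eq. (17) can be computed by bisection on the binomial distribution. This is a self-contained illustrative sketch, not taken from the paper:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(x, n, alpha=0.05):
    """Exact confidence bounds (L, U) for Eq. (17), found by bisection."""
    def solve(condition):
        lo, hi = 0.0, 1.0
        for _ in range(60):                # bisection on p in [0, 1]
            mid = (lo + hi) / 2.0
            if condition(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0
    # L: the p at which P(X >= x | p) rises through alpha/2
    L = 0.0 if x == 0 else solve(lambda p: 1.0 - binom_cdf(x - 1, n, p) <= alpha / 2)
    # U: the p at which P(X <= x | p) falls through alpha/2
    U = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p) > alpha / 2)
    return L, U

# 34 successful runs out of 50, as in the tutorial example below:
L, U = clopper_pearson(34, 50)
print(f"{L:.2f} {U:.2f}")
```

For 34 successes in 50 runs this yields bounds of roughly 0.53 and 0.80 at the 95% level, consistent with the interval quoted in the tutorial example.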
Selection of the best set of models

Having reduced the number of models to a set that conforms to the user's specification on the verification data, the choice of final model structure can be further refined by performing an F-test. This identifies the structures that do not perform significantly worse than the best model. To do this, it is necessary to check that the error residuals on the verification data of the selected models are of zero mean and can be adequately approximated by a normal distribution. This allows the comparison of the standard deviation of the error residuals between various models. The F-test is then used to determine whether the standard deviation of the residuals of one model is significantly greater than another (i.e. the F-statistic is used to test the null hypothesis that σ_1 = σ_2). Consequently, the final model set represents equally acceptable solutions to the original modelling problem. The choice of which model to use if multiple models are still present is then a matter of preference.
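A sketch of this F-test screening, assuming SciPy is available; the residual vectors here are synthetic stand-ins for verification-set residuals:

```python
import numpy as np
from scipy import stats

def not_significantly_worse(res_candidate, res_best, alpha=0.05):
    """One-sided F-test: is the candidate's residual variance significantly
    greater than the best model's? (Residuals assumed zero-mean, ~normal.)"""
    F = np.var(res_candidate, ddof=1) / np.var(res_best, ddof=1)
    crit = stats.f.ppf(1 - alpha, len(res_candidate) - 1, len(res_best) - 1)
    return bool(F <= crit)

# Synthetic residual vectors standing in for verification-set residuals:
best = np.linspace(-1.0, 1.0, 100)      # residuals of the best model
similar = 1.05 * best                   # comparable spread: retained
worse = 3.0 * best                      # clearly larger spread: rejected
print(not_significantly_worse(similar, best),
      not_significantly_worse(worse, best))   # prints: True False
```

Models that pass the test stay in the final set; models whose residual variance exceeds the critical F ratio are discarded as significantly worse than the best structure.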
Application results

In this section, three examples are used to demonstrate the modelling capabilities of the GP technique. First, a
tutorial example is presented. Subsequently the algorithm is used to develop an inferential estimator for a vacuum distillation column and a non-linear steady-state model suitable for the optimisation of a simulated reaction system.
Tutorial example

The objective here was to use the available data to determine an unknown functional relationship. The data consisted of one hundred samples of five randomly generated input variables (u_1, ..., u_5) within the range [0, 1], and one output variable (y) generated using the following (assumed unknown) relationship:

y = 1000 u_1 exp(-5.0/u_2) + 3 u_3
(! 8 )
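Generating the tutorial data set from (18) is straightforward; the random seed below is arbitrary:

```python
import random
from math import exp

random.seed(42)

def true_process(u):
    # the assumed-unknown relationship of eq. (18)
    return 1000.0 * u[0] * exp(-5.0 / u[1]) + 3.0 * u[2]

# one hundred records of five inputs drawn uniformly from [0, 1];
# u4 and u5 never enter eq. (18), which tests input discrimination
inputs = [[random.random() for _ in range(5)] for _ in range(100)]
outputs = [true_process(u) for u in inputs]
```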
Note that the output is only a function of three of the five input variables. The two remaining inputs do not affect the output and are present merely to test the ability of the algorithm to determine relevant inputs. When our algorithm was applied to this problem, 34 out of the 50 runs were successful. Success was defined as achieving a root mean square (RMS) error of less than 0.2 on the verification data set. Using the binomial distribution, we are thus 95% confident that 53-80% of all runs would meet the required tolerance on the validation data set. This success rate was deemed satisfactory and no further runs were made using different algorithm parameter settings. Next, the population of successful models was analysed to determine the frequency of occurrence of each member of the terminal set. At the 95% confidence level, this revealed that 88-100% of all successful models would contain input variables u1, u2 and u3 while only 0-11% would contain u4 and u5. In other words, the algorithm has determined which variables are relevant to this input-output mapping. Next, the residual error between the actual output and that predicted by each successful model was analysed to ensure that they were of zero mean and normally distributed. An F-test was then used to find those structures that were not significantly worse than the best model (as determined by the minimum RMS error on the verification data set). This reduced the set of successful models to just one, being:

y = 1000u1(3.0007×10⁻³u2 − 1.506×10⁻⁴u2² + 1.885×10⁻⁴u2³) + 2.99u3    (19)
This model structure had an RMS error on the training data set of 0.012 and an RMS error on the verification set of 0.014.
Fig. 5 shows plots of the actual versus predicted output values and the residual error. Note that the residual errors are extremely small and thus an excellent approximation has been obtained. However, it will be noted that the final structure is quite different to (18). It would appear that the exponential term has been approximated with a high-order polynomial. It is conjectured that during model evolution, structures that contain exponentials are more sensitive than polynomial approximations to model inaccuracies. As a result, structures containing exponential functions tend to be eliminated from the population.
Inferring bottom product composition of a vacuum distillation column

The objective of this exercise was to develop a model that could be used to infer the bottom product composition of a binary vacuum distillation column. While composition analysers may be used to measure product quality, investment and maintenance costs often restrict their use. However, a reliable estimator would provide a cheap alternative to direct measurement and allow tighter control of the bottom product composition. The vacuum column considered is equipped with 48 trays, a steam-heated reboiler and a total condenser (see Fig. 6). The feed F (which consists primarily of two components) is split into two product streams: the distillate (D) and the bottoms (B). The column design specifications and operating data are summarised in Table 4. Since the system is essentially binary, compositions can be estimated based on temperature and pressure measurements. However, the necessary sensors are only
installed on trays 12, 27 and 42. As a result, reliable composition estimates can only be obtained for these three trays. In order to use the bottom product composition estimator as an integral part of any control scheme, it must be implemented within the computational constraints of the available Distributed Control System (DCS). Thus, a simple inferential model is required that makes use of the available column measurements. Previous attempts at the development of an inferential estimator based upon this data set have proved troublesome (Musch, 1994). High-order polynomial models were required in order to achieve even a reasonable level of accuracy. Therefore, this example has been included to demonstrate the ability of GP to develop a parsimonious estimator. Input-output data was obtained from a phenomenological model of the column. The model comprises several hundred differential and algebraic equations describing the column dynamics (Gani et al., 1986). To develop an estimator, one hundred and fifty records of steady-state composition estimates from trays 12 (x12), 27 (x27) and 42 (x42) together with the corresponding value of xB were used. A second set of fifty data records was used for model validation. When our GP algorithm was applied, 16 out of the 50 runs were successful. Success was defined as achieving an RMS error of less than 0.02 on the verification data set. Using the binomial distribution, we are thus 95% confident that 20-47% of all runs would meet the required tolerance on the validation data set. This success rate was considered somewhat low. In an attempt to improve the performance of the algorithm, the
Fig. 5. Model fit and residual error, tutorial example.
population size was doubled and another 50 runs were carried out. The number of successful runs increased to 24. Therefore, the proportion of successful runs was increased to 31-60% (at the 95% confidence level), a significant improvement. Based upon the complete set of successful models, we are 95% confident that 54-85% would contain input variable x12, while 89-100% of all models would contain x27 and x42. Therefore, it would appear that the composition on tray 12 is less important than the other two. This result is intuitive, as it would be expected that the compositions near the top of the column would be more weakly related to the bottom product composition.
An F-test was then used to find the structures that were not significantly worse than the best model (as determined by the minimum RMS error on the verification data set). The set of successful models was reduced to five, four of which were identical after simplification. The final two models were:
xB = −3.166 + 2.574(−0.6688 + x27)² / ((−2.32 + x42)²(−2.25 + x12))    (20)

xB = x42(3.43x12 + (27.0576 − 27.07x12)x42 + 1.724x27 − 2.63)    (21)
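The observation that several retained structures were identical after algebraic simplification can be checked numerically: two GP outputs that differ syntactically but not algebraically agree on random inputs to within round-off. The two model functions below are hypothetical examples constructed for illustration, not models from the paper:

```python
import random

random.seed(3)

def m1(x12, x27, x42):
    # hypothetical GP output: syntactically complex
    return (x27 + 1.0) ** 2 - x27 ** 2 - 2.0 * x27 - 1.0 + 0.5 * x42

def m2(x12, x27, x42):
    # hypothetical GP output: the algebraically simplified equivalent
    return 0.5 * x42

points = [tuple(random.random() for _ in range(3)) for _ in range(1000)]
identical = all(abs(m1(*p) - m2(*p)) < 1e-9 for p in points)
```

Agreement on a large random sample is strong evidence (though not proof) that two evolved structures simplify to the same model.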
Fig. 6. Binary vacuum distillation column.

Table 4. Binary vacuum distillation column data.
Number of trays: 48
Column diameter (m): 0.85
Feed tray: 24
Murphree tray efficiency (%): 44
Relative volatility: 1.61
Top composition xD (%): 99.8
Bottom composition xB (%): 2.0
Feed composition xF (%): 80-95
Feed flowrate F (l/h): 50-400
Top pressure (mbar): 30
Both models performed equally well as estimators, yet neither appeared to have captured any significant physico-chemical process knowledge. Thus, (20) was chosen as the estimator because of its predominance in the final set of successful models. This model had an RMS on the training data set of 0.011 and an RMS on the verification data set of 0.015. Fig. 7 shows the actual and predicted bottom composition on the verification data set, and the corresponding residual error. It should be noted that an excellent inference has been obtained.

Modelling of a continuous stirred tank reactor system
The dual reactor system under consideration here is shown in Fig. 8. It consists of two continuous stirred tank reactors (CSTRs) in series. Reactor 1 is fed by a stream rich in reactant A, of concentration CAF1 and flowrate QF1. Within the system, the following exothermic reactions take place: A → B → C. Reactant A is converted to product B, but at high temperatures B undergoes further reaction and is transformed into the undesired by-product C. Each reactor is cooled by means of an external pump-around heat exchanger. The temperature in each reactor (T1 and T2, respectively) may be regulated by manipulating the flowrate of cooling water (Qc1 and Qc2) to the corresponding heat exchanger. The full mechanistic model of the system (and associated constants) is presented in Appendix 1. It is expected that there will be values of T1 and T2 that will maximise the production of B. A simple temperature control scheme could easily be designed to maintain the
reactors at these set-point temperatures. However, the set-points that maximise the production of B may change due to feed disturbances caused, for example, by stock-to-stock variation in raw material. In this situation, an on-line steady-state optimising control scheme could be used to provide set-points to the lower level temperature controllers that will maximise the amount of B produced. Thus, the objective of this study was to use GP to develop a non-linear, steady-state, input-output model that could be used to determine the best temperature set-points for a regulatory temperature control scheme for this system. Steady-state input-output data was obtained from the mechanistic model of the reactor system. Three hundred data points were used for the development of the model, and a further three hundred were used for verification. The data set consisted of values for the two manipulated variables T1 and T2, as well as the disturbance variables CAF1 and QF1 and the required process output CB2. The magnitude of the variation in the feed flow rate was set to a level that would not significantly affect the product composition, a situation which would normally occur if the feed flow was kept under tight regulatory control. Thus, the feed flowrate has only been included as a potential input variable in order to test the algorithm's ability to discriminate between relevant and irrelevant inputs. Before presenting the data to the GP algorithm, all of the variables were scaled as the magnitudes of the variables were considerably different. For this example, all results are presented in terms of the scaled variables (T1, T2, CAF1 and CB2), where the bar denotes a scaled variable. When the GP algorithm was applied to this problem, 66-90% of the runs could be expected to be successful.
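The scaling step described above can be sketched as follows; the paper does not state the exact scheme used, so zero-mean, unit-variance scaling is assumed here as one common choice:

```python
import statistics

def standardise(values):
    # zero-mean, unit-variance scaling of one variable (an assumed scheme;
    # the paper only states that the variables were scaled)
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

# e.g. a column of reactor temperatures (hypothetical values, in K)
T1_scaled = standardise([310.0, 320.0, 330.0, 340.0, 350.0])
```

Scaling each variable to a comparable magnitude prevents inputs measured in large units from dominating the evolved expressions and the fitness calculation.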
Fig. 7. Model fit and residual error, bottom product composition (xB) in a vacuum distillation column.
Fig. 8. Two CSTR reactor system.

Success was defined here as achieving an RMS error of less than 0.06 on the verification data. Based on the set of successful models, 88-100% of successful models will contain T1 and T2, 73-96% will contain CAF1, while QF1 will be present in 23-56% of successful models. So, as expected, it would appear that QF1 is less significant than the other three variables for the accurate prediction of
CB2. Once again, an F-test was performed to find the models that were not significantly worse than the best model. This reduced the set of successful models to just two:
CB2 = 0.369T2 − 1.089T1T2 − 0.725T2² + 0.577T1 − 0.8076T2³ + 0.431T1T2² + 0.431T1²T2 − 0.684T1² + 0.154CAF1 + 0.78    (22)

CB2 = 6.45 + 5.089T1 + 1.546T1² − 4.481 exp(T1) + 1.607T2 − 1.201 exp(T2) − 0.995T1T2 + 0.054T1CAF1 + 0.166CAF1    (23)

The performance of both models was comparable. However, (23) was chosen as it explicitly displayed the exponential temperature dependencies that are present in the mechanistic model of the reactor system. This model had an RMS on the training data set of 0.021 and an RMS on the verification data set of 0.028. Fig. 9 shows the actual and predicted values of CB2 on the verification data, and the corresponding residual error. It may be noted that the residual error is extremely small and thus an excellent approximation of CB2 has been obtained.

It is also worth noting that an inspection of (22) and (23) reveals that there is little or no interaction between CAF1 and the two reactor temperatures in determining CB2. Thus, if there are changes in CAF1 the steady-state optimiser will not adjust the temperature set-points, as they are already at the optimum. The only consequence will be a change in the product quality (CB2). Thus, provided the system is time invariant, a single off-line optimisation would be sufficient to determine the necessary temperature set-points. This process knowledge is not immediately apparent from an examination of the mechanistic model.

Conclusions
The application of genetic programming to the development of steady-state models of chemical process systems has been considered. Three examples were used to highlight the utility of this approach. The results revealed that in each case the GP algorithm could generate an accurate input-output model based solely on observed data. A distinct advantage of this method is that no a priori assumptions have to be made about the actual model form: the structure and complexity of the model evolve as part of the problem solution. Moreover, it would appear that GP has potential in the area of data analysis. In all of the examples presented, our algorithm was able to discriminate between relevant and irrelevant input data, evolving parsimonious system representations. One possible perceived disadvantage of the input-output models obtained using GP is that the identified
Fig. 9. Model fit and residual error, scaled outlet concentration CB2 from the two CSTR reactor system.

structure does not provide detailed phenomenological information about the system being modelled (a characteristic that this approach shares with other black-box modelling techniques, including neural networks). However, in each case the final model form clearly indicates the relative contribution of each input to the output. This allows the sensitivity of the model to changes in various inputs to be judged with relative ease. The preliminary results presented in this paper indicate the potential of GP for developing models of non-linear process systems: the methodology can perform both input recognition and model form recognition. In our application studies we assumed 100, 150 and 300 data records were available for, respectively, the tutorial, distillation and CSTR examples. In any practical example this would correspond to the number of experiments that would have to be performed on the process in order to obtain the data. It would be interesting to study how gracefully the model degrades as the amount of data becomes more scarce. Finally, results to date are based upon modelling steady-state (albeit non-linear) characteristics of chemical processes. If this technique is to achieve widespread application and utility, then it must be capable of modelling dynamic systems. Current work is focusing on the extension of the ideas presented here to the modelling of dynamic systems.
Nomenclature

α  Confidence level
a1  Tree size penalty function parameter one
a2  Tree size penalty function parameter two
fΣ  Sum of the fitness of all members of the population
fi  Fitness of the ith member of the population
G  Number of generations
J  Number of runs
L  Lower probability bound
M  Number of new individuals bred from old population
N  Population size
n  Number of input variables
Pc  Probability of crossover
Pinp  Actual probability of occurrence of input variable
Pm  Probability of mutation
Pmut  Probability that an input variable will be mutated to a constant (and vice versa)
Psuc  Actual probability of a successful run
P̂suc  Estimated probability of a successful run
Prep  Proportion of the old population to be replaced with new individuals
Pold  Proportion of the old population to be preserved
R  Number of data records
r  Correlation coefficient
σp  Standard deviation of predicted values
σt  Standard deviation of target values
SL  Tree size expressed as string length
U  Upper probability bound
ui  Input variable i
X  Number of acceptable runs
Y  Output
Yp  Predicted value of output
yt  Target value of output
Distillation Column

B  Bottoms stream
D  Distillate stream
F  Feed stream
x12  Pressure corrected temperature (composition) on tray 12
x27  Pressure corrected temperature (composition) on tray 27
x42  Pressure corrected temperature (composition) on tray 42
xB  Bottoms composition
xD  Distillate composition
xF  Feed composition
Reactor System

CAF1  Concentration of A in the feed stream to reactor one
CB2  Concentration of B in the effluent stream of reactor two
QC1  Cooling water flow rate to reactor one
QC2  Cooling water flow rate to reactor two
QF1  Feed flow rate to reactor one
T1  Temperature in reactor one
T2  Temperature in reactor two
Acknowledgements
The authors would like to thank Ming Tham, Marie South, Hans Musch, Justin Elsey and Charles Sanderson for providing a forum for the debate and discussion of many of the ideas presented in this paper. References
Androulakis, I.P. and Venkatasubramanian, V. (1991) A genetic algorithm framework for process design and optimisation. Computers and Chemical Engineering 15, 217-228.
Breiman, L., Friedman, J., Olshen, R. and Stone, C., Classification and Regression Trees, Wadsworth (1984).
Cybenko, G. (1989) Approximation by superposition of sigmoidal functions. Math. Control Signal Syst. 2, 303-314.
Edgar, T.F. and Himmelblau, D.M., Optimization of Chemical Processes, McGraw-Hill (1989).
Fogarty, T.C., Adapting the rule base, Proc. of the 28th Conf. on Decision and Control, Tampa, USA, pp. 761-766 (1989).
Gani, R., Ruiz, C.A. and Cameron, I.T. (1986) A generalised model for distillation columns - I. Comp. Chem. Eng. 10, 181-198.
Goldberg, D.E., Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, Massachusetts (1989).
Iba, H., Kurita, T., de Garis, H. and Sato, T., System identification using structured genetic algorithms, Proc. of the 5th Int. Conf. on Genetic Algorithms, Urbana-Champaign, USA, pp. 276-286 (1993).
Kinnear, K.E., Advances in Genetic Programming, MIT Press (1994).
Koza, J.R., Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press (1992).
Koza, J.R., Finding an impulse response function using genetic programming, Proc. of the ACC, San Francisco, USA, pp. 2345-2350 (1993).
Koza, J.R., Genetic Programming II, MIT Press (1994).
McKay, B., Chessari, C.J., Barton, G.W., Agamennoni, O., Romagnoli, J.A. and Watson, F., Supervisory control of an industrial catalytic reformer using an on-line neural network model, submitted to J. Proc. Control (1996).
Musch, H.E., Robust Control of an Industrial High-Purity Distillation Column, Ph.D. Thesis, Measurement and Control Laboratory, The Swiss Federal Institute of Technology (ETH), Diss. No. 10666 (1994).
Nordvik, J.P. and Renders, J.M., Genetic algorithms and their potential for use in process control: a case study, Proc. of the 4th Int. Conf. on Genetic Algorithms, Eds. R.K. Belew and L.B. Booker, pp. 480-486 (1991).
Qin, S.J. and McAvoy, T. (1992) A data based modelling approach and applications. Dycord 92, 321-326.
Schulze-Kremer, S., Genetic algorithms for prediction of protein conformations, IEE Colloquium Digest 119, Statistical, Logical and Connectionist Models for Learning, Pt. 1 (1992).
South, M.C., The Application of Genetic Algorithms to Rule Finding in Data Analysis, Ph.D. Thesis, Dept. of Chemical and Process Eng., The University of Newcastle upon Tyne, UK (1994).
South, M.C., McConnel, S., Tham, M.T. and Willis, M.J., Data analysis via symbolic regression, submitted to the Trans. IChemE (1995).
Tenenbaum, A.M. and Augenstein, M.J., Data Structures Using Pascal, Prentice Hall Inc., Englewood Cliffs, USA (1981).
Weiss, S.M. and Kulikowski, C.A., Computer Systems That Learn, Morgan Kaufmann, San Mateo, USA (1991).
Willis, M.J., Di Massimo, C., Montague, G.A., Tham, M.T. and Morris, A.J. (1992) Artificial neural networks in process estimation and control. Automatica 28, 1181-1187.
Willis, M.J., Montague, G.A. and Peel, C., On the application of artificial neural networks to process control, in Applications of Neural Networks, A. Murray (ed.), pp. 191-219, Kluwer Academic Publishers (1995).
Zhang, B. and Muhlenbein, H., Genetic programming of minimal neural nets using Occam's razor, Proc. of the 5th Int. Conf. on Genetic Algorithms, Urbana-Champaign, USA, pp. 342-349 (1993).
Appendix

Dynamic model for each CSTR

• Material balance for component A

Vi dCAi/dt = −KA exp(−EA/(R Ti)) CAi Vi + QFi(CAFi − CAi)    (A1)

• Material balance for component B

Vi dCBi/dt = KA exp(−EA/(R Ti)) CAi Vi − KB exp(−EB/(R Ti)) CBi Vi + QFi(CBFi − CBi)    (A2)

• Overall material balance

dVi/dt = QFi − Qi = 0    (A3)

• Energy balance

ρ cp Vi dTi/dt = ΔHAB KA exp(−EA/(R Ti)) CAi Vi + ΔHBC KB exp(−EB/(R Ti)) CBi Vi + ρ cp QFi(TFi − Ti) + ρ cp QRi(TRi − Ti)    (A4)
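A component balance of the form (A1) can be integrated to steady state numerically. The sketch below uses the tabulated kinetic parameters at a fixed reactor temperature; the feed flowrate and feed concentration are hypothetical values, as they are not given in the parameter table:

```python
from math import exp

# kinetic parameters from the appendix table
K_A, E_A, R_GAS = 7e11, 90_000.0, 8.314
# reactor volume from the table; QF, CAF and T are assumed for illustration
V, Q_F, C_AF, T = 3.0, 0.01, 1.0, 350.0

def k_of_T(T):
    # Arrhenius rate constant for A -> B, as it appears in eq. (A1)
    return K_A * exp(-E_A / (R_GAS * T))

# forward-Euler integration of eq. (A1) at fixed temperature
C_A, dt = 0.0, 0.1
for _ in range(50_000):
    dCA_dt = -k_of_T(T) * C_A + (Q_F / V) * (C_AF - C_A)
    C_A += dt * dCA_dt

# analytic steady state of eq. (A1) for comparison
C_A_ss = (Q_F / V) * C_AF / (k_of_T(T) + Q_F / V)
```

The integrated concentration converges to the analytic steady state, which is the operating point used when fitting the steady-state input-output models in the paper.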
Dynamic model for each heat exchanger

• Energy balance for recycle stream (external cooling)

ρ cp VRi dTRi/dt = ρ cp QRi(Ti − TRi) + UA(Tci − TRi)    (A5)

• Energy balance for cooling medium

ρ cp Vci dTci/dt = ρ cp Qci(Tc,in − Tci) + UA(TRi − Tci)    (A6)

Steady-state model for the mixer

• Material balance for component A

CA,mix = (α QFi CAi + (1 − α) QFi CAFi)/QFi    (A7)

• Material balance for component B

CB,mix = α QFi CBi/QFi    (A8)

• Overall material balance    (A9)

• Energy balance

Tmix = (α QFi Ti + (1 − α) QFi TFi)/QFi    (A10)

Model parameters

Parameter  Description  Unit  Value
KA  Rate coefficient (A→B)  mol/s  7×10¹¹
EA  Activation energy (A→B)  J/mol  90 000
ΔHAB  Heat of reaction (A→B)  J/mol  80 000
KB  Rate coefficient (B→C)  mol/s  9×10¹¹
EB  Activation energy (B→C)  J/mol  100 000
ΔHBC  Heat of reaction (B→C)  J/mol  40 000
R  Gas constant  J/mol K  8.314
V1  Volume, CSTR 1  m³  3
V2  Volume, CSTR 2  m³  3
U1  Heat transfer coefficient, CSTR 1  W/m²K  3000
U2  Heat transfer coefficient, CSTR 2  W/m²K  3000
A1  Area, heat exchanger (HX) 1  m²  100
A2  Area, heat exchanger (HX) 2  m²  100
VC1  Volume, cooling water in HX 1  m³  0.2
VR1  Volume, process stream in HX 1  m³  0.2
VC2  Volume, cooling water in HX 2  m³  0.2
VR2  Volume, process stream in HX 2  m³  0.2
ρ  Process fluid density  kg/m³  1000
cp  Process fluid heat capacity  J/kgK  4180