Computers
Pergamon PII: SOO98-1354(98)00103-3
them. Engng Vol. 22, Suppl.. pp. S579-S58.5, 1998 0 1998 Elsevier Science Ltd. All rights reserved Printed in Great Britain 0098-1354/98 $19.00 + 0.00
Sequencing of Batch Operations for a Highly Coupled Production Process: Genetic Algorithms Versus Mathematical Programming Thomas LGhl, Christian Schulz and Sebastian Engell Process Control Group Department of Chemical Engineering University of Dortmund D-44221 Dortmund Germany email: {loehl, Schulz, engell) @ast.chemietechnik.uni-dortmund.de FAX: ++49-231-7555129 Abstract: In this contribution the application of a genetic algorithm to a real-world scheduling problem with highly coupled production is presented. Since the schedules are highly constrained, special attention is paid to the handling of constraints at the different levels of the implementation of the algorithm. The quality of the result and its numerical performance is discussed in comparison with a mathematical programming algorithm. 0 1998 Elsevier Science Ltd. All rights reserved.
INTRODUCTION
PROCESS DESCRIPTION
In this contribution we present the application of a genetic algorithm to a real-world scheduling problem from the polymer industries. Genetic algorithms have proved to be efficient in optimisation problems which are dominated by a combinatorial element as in the problem presented here; furthermore, they are not as much affected by nonlinearities and complex objective functions as mathematical programming algorithms. However, it is quite difficult to include complex constraints into the internal representation to ensure the feasibility of the generated schedules. Thus, the major focus of this paper is to show how this modelling problem can be solved. Mathematical models on the other band allow to incorporate almost any constraint in the problem formulation, but the solution algorithms on the other hand suffer from the combinatorial complexity, especially if nonlinearities are included in the model. After giving a description of the scheduling process and pointing out the particular properties of the scheduling problem, the genetic algorithm is described in detail. For comparison, a mathematical model and solution algorithm will be sketched. Finally, we compare both approaches with the help of numerical case studies and discuss their relative merits.
The plant shown in figure 1 is used to produce two different types of polystyrene in several grain fractions. The production process is divided into the main steps preparation of raw material, polymerisation, finishing of the polystyrene suspension in continuously operated production lines, and splitting into the different grain fractions for final storage. The process is of the flowshop type, i. e. all recipes have the same basic structure and differ only in the parameters and in certain steps. In the preparation step, batches of input material are mixed in vessels where they have to reside for a certain period which ranges from half an hour to several hours. Then, the mixtures are pumped into one of several storage vessels the capacity of which is adapted to the batch size in the polymerisation step. There, further additives are added. The storage vessels are grouped according to the types of polymer produced and this assignment cannot be changed during plant operation. One of the inputs to the polymerisation is a mixture of styrene and some additives. The number and the composition of the additives determine the grain size distribution of the product and the type of polystyrene. These parameters are not varied continuously but certain fixed sets of parameters are used to control the grain size distribution. s579
xi80
European Symposium on Computer Aided Process Engineering-8
To start the polymerisation, input material is pumped into one of the reactors. The batch size is fixed in the polymerisation step because the filling level of the reactors influences the grain size distribution. Variable batch sizes would introduce a complexity which currently cannot be handled
always be a certain amount of production to storage and unwanted grain fractions are always produced for which only low prices can be obtained. The second objective thus is to minim& the amount of overproduction.
-mi
Line
J7J-j
1
Line 2
-
I
/-.
i
L_
Preparation
Reaction
Mixing
Finishing
Fig. 1: polymerisation process. appropriately. The polymerisation consists of four phases of almost equal duration. During the polymerisation, certain parameters also influence the grain size distribution. However, this dependence currently is not used actively due to the complexity and lack of precise howledge of the relationships. For safety restrictions, the start of a polymerisation run in one reactor must be delayed for a certain time after the start in any other reactor. The continuously operating finishing lines are coupled with the reactors by two vessels in which the batches are mixed. Each line is assigned to one type of polystyrene. A major problem for production planning results horn the limited influence of the free process parameters on the particle size distributions. A typical distribution is shown in Figure 2. The process parameters affect the distribution among the fractions but all fractions are always produced in significant amounts. Thus the production of a certain grain fraction cannot be related to a certain batch exclusively and hence all batches are coupled.
particle diameter Fig. 2: Particle Size Distribution During the scheduling horizon a number of orders have to be fulfilled. Each order consists of a due date and an amount of some grain fraction. The main objective is to produce the required grain fractions with minimum delay. On the other hand, there will
GENETIC ALGORITHM A key issue to solve an optimisation problem with genetic algorithms efficiently is the representation of the problem within the genetic search. Intelligent coding of the problem on the genetic level should include problem knowledge to reduce the size of the search space, thus excluding infeasible search regions as much as possible. But, not all constraints of real-world problems can be handled by an appropriate coding. These issues are discussed in the constraint handling section. General structureof the algorithm
The basic structure of the genetic search shown in figure 3.
Fig. 3: Genetic algorithm procedure A number of genomes which have been coded appropriately constitute a population. Within the genetic search, genomes are selected from the
European Symposium on Computer Aided Process Engineering-8 population with a probability proportional to their fitness values. Genetic operators such as crossover and mutation are applied on the selected genome and the objective value is calculated. Some or all of the members of the new generation replace the current population depending on the value of the objective function. Coding When coding an optimization problem into a genetic algorithm the criteria of completeness, soundness, non-redundancy and characteristics-preservingness (Kobayashi, 1995) should be satisfied. These criteria reflect the requirement that a solution coded in a genome has exactly one corresponding schedule and vice versa. The polymerisation stage is regarded as the stage where the crucial decisions are made. A closer look at the ordering of batches within this stage reveals that due to the minimal delay needed between the batches, there is a straightforward sequence of operations. Thus a linear genome type with the batches as genes is chosen. Because the number of operations which are necessary to satisfy the customers demand is not known aforehand and cannot be calculated easily due to the nonlinearity of the mixing stage, the length of the genome is maximum number determined by the polymerisations which can be scheduled. The actually scheduled batches are determined by the schedule builder. The batches producing the required intermediates for the scheduled polymerisation are determined backwards by a just in time policy. The mixing stage and the following continuous stage are simulated by solving mass balances. Constraint Handling With this coding, the capacity constraint of the reaction stage can be fulfilled within the evaluation of the genome. In order to cope with the problem of infeasible solutions due to delayed intermediates, equipment breakdown and the capacity constraints in the mixing stage, several possibilities exist. Our approach combines the advantages of the following constraint handling techniques (Michalewicz, 1995): ?? Including penalty functions into the objective function. Some constraints are removed from the representation and their violation is penal&d during the evaluation of the objective function. This method is appropriate if feasible regions within the search space are larger than infeasible regions, otherwise the algorithm would spend a lot of time evaluating infeasible solutions. This method increases the probability to explore better solutions while temporarily allowing to stay in infeasible regions of the search space. Another advantage is that the genetic algorithm itself remains problem-independent as the evaluation function is the only problem-specific part of the algorithm. Consequently, standard methods for genetic search can be used with minor
S581
modifications. The design of the fitness function should incorporate knowledge where it is more likely to find feasible regions. This can be accomplished by the use of a monotonic function, which is proportional to the degree of infeasibility. To avoid overlapping with the fitness functions of bad but feasible solutions and slightly infeasible solutions, an offset is used to separate the feasible from the infeasible production schedules. This offset is obtained by calculating the objective function for an empty plan. In the example presented here, a temporary capacity violation of the mixing stage is admitted. Using decoder algorithms. These algorithms are designed to increase the probability of generating a feasible solution firorn each genome. With the chosen representation, capacity violations in the reaction stage occur only rarely, as we use a decoder algorithm to assign only available resources to operations. In the decoder algorithm the sequence of genes is interpreted as the sequence of polymerisations. The scheduling of the batches and the determination of the starting time of each operation is calculated in order to avoid capacity violations. A detailed description is given in the section on the schedule builder. However, this method is computationally expensive. Repair algorithm. These algorithms locate potentially infeasible sequences of genes and correct these genes by local search procedures. Depending on the violated constraints, two types of repair strategies are real&d. In the case where intermediate products cannot be produced due to capacity limitations within the preparation stage at the time they were needed, the whole schedule is shifted until the required intermediates are available. Obviously, only small disturbances will still generate a feasible schedule. The other case is the detection and repair of sequences of batches which cause the continuous finishing line to run empty, independent of how the flow rate or the starting time of a batch are selected. This method acts directly on the level of the genome representation and is carried out only a few times within an optimisation run. Creating the initial population In order to obtain a good solution rather fast, the population is initialised with an approximation of the batches which are necessary to satisfy the production goal. Two methods are implemented, both of which balance the sum of production and demands at the end of tbe planning horizon and which neglect the mixing stage. ?? The first makes use of the coupled production explicitly. The matrix of grain size fraction is inverted and multiplied with the demand vector at the end of the scheduling interval.
S582
European Symposium on Computer Aided Process Engineering-8
In the second method, the effect of the coupling of products is neglected. It is assumed that the demand is satisfied by the polymerisations which have the maximum of the desired fraction. The result of both methods is a set of polymerisations which are not yet sequenced. When constructing a new member of the initial population, the probability for the choice of a certain polymerisation depends on its proportion in this set. This method ensures diversity within the initial population ??
The crossover operator
The crossover operator is the backbone of the genetic search, as it draws the search direction towards better solutions. This is done by combining good polymerisation sequences of successful genomes to form a new and hopefully better solution. Obviously, the choice of the crossover operator should include knowledge of how the solution of the parents can be improved. The relative order of the scheduled polymerisations influences the ratio of the fractions in the mixing stage. The implemented recombination crossover is based on the general&l order crossover presented in (Bierwirth, 1995) and is designed to inherit good partial sequences of operations. This is done by selecting a sequence of operations within one parent and inserting this sequence into the other, while maintaining the relative order of the batches within the generated offspring. However, a few modifications had to be made. In (Bierwirth, 1993, the type and the number of operations to be scheduled were known a priori, so that an optimal solution is a permutation of the initial jobs. In the problem presented here tbe number of batches is an optimisation variable. Mutation
The mutation operator serves to bring new genetic material into the population. In this case, a polymerisation is randomly chosen at a random position. The schedule builder
Within this problem dependent component of the genetic algorithm, the generated sequence of batches is scheduled with respect to the capacity constraints of the reaction stage. A general problem of this procedure is that only local information on the effect of the starting times is available. A greedy heuristic which starts all operations as soon as possible may even produce an infeasible schedule if the continuous finishing lines run empty later. Therefore a more sophisticated approach is applied. For each gene which represents a polymerisation, the starting time is chosen within the feasible bounds. Several constraints influence the size and the position of this interval: ?? the available capacity of the reaction stage; ?? due to the safety constraints a minimal delay time between the starting of the polymerisation is needed;
the mixer constraint yields an upper and a lower bound of the time interval from the under- and overflow of the mixer If the variation of the feed rate is possible, the actual mass flow is chosen to maintain a fixed level in the mixer for each line. This level is chosen to be the half of the capacity, to ensure maximum flexibility to keep the actual capacity within feasible bounds. If too many polymerisations for one line are scheduled, the size of the interval is reduced to a point. However, it is possible that no feasible time interval could be determined, causing the evaluation of this genome to stop. After all genes are scheduled, the overall objective function of the genome is calculated by a forward simulation. The simulation calculates the produced material during the schedule horizon and detects the degree of the capacity constraint violation of the mixing stage. ??
The objective function
To reflect the production goals, a weighted sum of over- and underproduction is used as the objective function. If a mixer capacity violation occurs, the penalty function is added to this sum. MATHEMATICAL ALGORITHM Due to the limited space and since the main topic here is the genetic algorithm, we can only give a brief description of the mathematical model and the solution algorithm. Modelling issues
The model is based on a discrete representation of time with an a priori determined uniform interval duration, cf. e. g. (Reklaitis, 1995). Modeling techniques with a continuous representation of time as in (Zhang, 1995) or (Schilling, 1996) have shown to decrease the number of intervals and thus the number of continuous variables; the number of binary variables, however, increases in comparison to the discrete model. In this model, only the discretely operated stages for the intermediate production and the polymerisation perform tasks, the continuously operated mixing and finishing stages are either down or up. The up- or down-status (which may depend on time) is not determined by the solution algorithm but can be set externally as a model parameter, e. g. to represent maintenance periods. In the polymerisation stage, the different grain size distributions are produced by different recipies which in turn have to be represented by different tasks, i. e. for each distribution one task and thus one binary variable have to be introduced at each time instance Further binary variables describe the start of operations in the intermediate production stage. Continuous variables represent the masses of the substances in each process stage and the continuous feed in the finishing lines. A major computational difficulty is introduced by the mixing the output of the stage, where polymerisations is mixed and continuously fed into
European Symposium on Computer Aided Process Engineering-8
the finishing lines. In order to determine the mass of each fraction in the feed we have to determine the actual concentration of each fraction at each time instance because the concentration in the mixer and the feed has to be identicaI. This leads to equations of the form !!E_=, f
M,
F,
where m and M denote the mass of each fraction and the overall mass in the mixer, f and F the respective masses in the feed (all variables subject to optimization). These equations are nonlinear and nonconvex and there is no exact method to transform these equations into a set of linear or convex expressions. Thus, the resulting problem is a large Mixed Integer Nonlinear Program (MINLP). Algorithmic issues
When trying to solve MINLPs, several standard techniques exist. One choice is a branch & bound algorithm similar to the ones used when solving MILPs, but for a nonconvex problem, as given here, it is difficult and computationally expensive to derive good bounds. Although it is possible to calculate a lower bound by a convex relaxation, the quality of the bounds is dubious. Furthermore, tbe bounds cannot be calculated directly from the solution of the relaxed MINLP but have to be calculated by solving an additional problem at each branch which increases the computation time. The main reason for not performing a full branch & bound is the solution time of the relaxed MINLP which is far too high for this approach. Other techniques are sequential linear programming (SLIP) or the OA/EWAP used in DICOPT++ (Visvanathan et al., 1990). Both algorithms require the subsequent solution of MILPs and for the given problem size, not even a feasible integer solution can be calculated in reasonable computation times, let alone an optimal solution. For these reasons, a specific solution algorithm was developed. The main goal of this algorithm is to find a feasible solution by performing a depth first search in the decision tree (which represents the sequencing and timing of tasks) by subsequent solutions of relaxed MINLPs. Scheduling algorithm
The tasks of the scheduling algorithm are: ?? to determine the necessary polymerisations ?? to determine their sequence and timing ?? to schedule the operations in the intermediate preparation stage ?? to determine the feed rates for the two finishing lines. In order to fulfil these tasks, the algorithm’s primary goal is to subsequently schedule polymerisations beginning with the start of the planning horizon and to fulfil the other tasks in a secondary stage after each single scheduling decision. Thus, the primary Ucf*2:*,-T
S.583
question is to find a good estimate for the next polymerisation and its timing. This decision is based on the values of the binary variables in the solution of the relaxed MINLP, where they may take any value between 0 and 1: the algorithm searches for the next tune interval where one of the binary variables denoting the start of a polymerisation is larger than zero. At that time instance, it searches for the largest value of the binary variables and schedules the respective polymerisation at this time instance. The next decision, when to start operations producing the intermediates, is also based on the values of the respective binary variables. Here, the search is performed from the beginning of the planning horizon until the time instance is reached at which the next polymerisation will be started. Then, the values of all binary variables up to the current scheduling horizon are fixed to the values determined by the scheduling decisions and the previous solves and a new MINLP is generated and solved. None of the continuous variables (among other the feed rates) are fixed but they are determined by the NLP-solver. There is, however, no guarantee that at each stage of the decision tree only feasible problems are generated. Although the capacity constraints of the reactors are not violated by iteratively scheduling from the beginning of the planning horizon, an overor underflow in the mixing stage may occur, one of the finishing lines may run empty or not enough intermediates can be produced. Thus, backtracking is inevitable. In order to limit the amount of backtracking, the next choice after a backtracking step should be determined by the reason for the unfeasibility. However, the constraint violation reported by a NLP-solver cannot reveal this reason. But one can get a good guess by partially simulating the production plan which directly allows to detect e. g. overflows. When backtracking, one has the following choices: ?? schedule a polymer&ion which produces into another line at the same time instance ?? schedule the last polymerisation earlier or later . ,,unschedule” the last polymerisation, go back in time and perform the above possibilities for the previous polymerisation. At each stage the scheduling for the intermediates has to be recalculated. The search is stopped when the end of the planning horizon is reached The search could easily be continued by searching other branches after the first integer feasible solution has been found until a time limit is reached. For the purpose of this comparison, however, no additional branches were investigated. NUMERICAL RESULTS To investigate the relative merits of the presented algorithms, several test cases were examined. Experiments were run with scheduling horizons of 8, 10, 12 and 14 days resulting in a maximum number
S584
European Symposium on Computer Aided Process Engineering-8
of 36, 48, 59 and 70 batches. For each of these scheduling horizons, two types of problems were formulated. In the first class, both algorithms could make use of all degrees of freedom regarding the timing of the operations and the adjustment of the continuous process variables. Scheduling was always performed under limited availability of the shared intermediate and polymerisation reactors. The genetic algorithm, however, has fewer degrees of freedom concerning the flow rates and the starting times of the polymerisations than the mathematical programming algorithm. Therefore. in a second set of numerical experiments, the degrees of freedom were reduced to a subset by holding the suspension streams constant and fixing the starting times to preset values. In this case, there was no restriction imposed on the availability of the other units. For this problem class, the degrees of freedom reduce to the combinatorial aspect: to choose the number and sequence of polymerisations. When comparing the quality of algorithms, it is not only of interest to measure the relative performance but also to rate them against a global measure. Due to the non-convexity of the problem, the overall optimum cannot be calculated in general (at least not in reasonable computing times). To examine the quality of the solution, satisfiable demands were used since then the optimal value of the objective function equals zero. In the objective function, overproduction was weighted with a factor of 1110, underproduction with a factor of 1.
\variables 6733 8707 10661 1261.5 binary variables 696 902 1088 1274 constraints 7043 9107 11159 13213 Table 2: Problem size for second problem class The genetic algorithm was terminated after a certain time limit was reached. This time limit was determined by the average solution time of the mathematical programming algorithm for each problem class and planning horizon. The solution times varied from half an hour for the smallest problems (second problem class) and 5 hours for the largest problems (on a SUN SPARCStation 20). Results In tables 3 and 4 the numerical results are listed, In the rows, GA denotes the genetic algorithm, MP the mathematical programming algorithm. Since in all cases the underproduction dominated the value of the objective function, we also list the sum of all demands over the planning horizon (denoted by DS); then, the objective value can directly be rated against the delayed orders. Scheduling horizon in days
8
10
12
14
Complexity of the test problems In the following tables the complexity of the test problems is listed. For the genetic algorithm, the size of the search space in the genomes is determined by the maximum number of polymerisations which can be scheduled and the number of alternative choices for each polymerisation. We have chosen 10 different grain size distributions which is similar to the number actually used in the production process. For the mathematical algorithm, the number of binary and continuous variables is listed as well as the number of constraints. These numbers are taken from the initial relaxed NLP, they decrease in the course of the depth first search. The number of binary variables is much smaller in the second class of test cases because many of them are fixed a priori.
Table 3: Results for the first problem class Scheduling horizon in days
8
10
12
14
Table 4: Results for the second problem class Discussion Table 1: Problem size for first problem class
First, it can be stated that the mathematical programming algorithms produced better results than the genetic algorithm for both problem classes. Secondly, the quality of the results of the genetic
European Symposium on Computer Aided Process Engineering-8 algorithm does not seem to depend on the planning horizon and thus the size of the search space. when the feed rate and starting times of the operations are fixed. With all degrees of freedom, the performance of the genetic algorithm becomes worse for longer planning horizons. We attribute this tendency back to the limited degrees of freedom concerning the adjustment of the feed rate. Although the genetic algorithm currently cannot modify both the starting times and the feed rate simultaneously, the starting times cannot explain the bad performance, because they were almost equal to the starting times of the mathematical programming algorithm. The method by which the genetic algorithm calculates the feed rate, however, reduces the number of feasible sequences because it only makes forward adjustments (forward seen from the polymerisation which is currently to be scheduled). Thus, sequences are infeasible in the genetic algorithm which can be “made feasible” by the NLP solver because it adjusts the feed rate over the whole planning horizon. This effect becomes more important the longer the planning horizon is because then more “in reality” feasible sequences of polymerisations are regarded as infeasible than for short horizons. In future research, strategies for adjusting the feed rate backwards in rime will therefore be implemented. As can be seen from the results for identical degrees of freedom, however, the optimal sequences are never found. Fig. 4 depicts the typical behaviour of the solution procress of the genetic algorithm. The algorithms for finding an initial guess are very good in comparison to a purely stochastic initialisation. Furthermore, in all test cases, they generated feasible initial populations. The progress, however, does not show the typical curve shown for genetic algorithms and is rather slow, but the typical curve usually is shown for a random initialisation. Furthermore, we found that the crossover operators generate a high percentage of infeasible solutions which reaches more lhan 90%. Solution quality 50
S58.5
criterion for the genetic algorithm, e. g. to stop if for a certain number of generations no improvement can be observed, the genetic algorithm would terminate after approximately half the computation time of the mathematical program. Furthermore, the genetic algorirhm could be stopped at any time, leaving a feasible and not too bad solution whereas the mathematical programming algorithm had only have calculated a partial plan which might not lead to a complete and feasible schedule. SUMMARY We have presented a genetic algorithm for a real world scheduling problem from the polymer industries. It has been shown that the quality of solution obtained by the genetic algorithm is not so good as the value of the objective function of the mathematical approach. On the other hand, if a feasible and not too bad schedule is desired within a relatively short computation time, the genetic algorithm is the method of choice, esp. when the weknesses which we have pointed out are eliminated. ACKNOWLEDGMENTS This research was partially funded by the Deutsche Forschungsgemeinschaft under grant EN 152/17-l which is greatly acknowledged. REFERENCES Bierwirth, C., 199.5, A Generalized Permutation Approach to Job Shop Scheduling with Genetic Algorithms, OR-Spektrum 17, No. 213, 87-92. Kobayashi, S, Ono, I., and Yamamura, M., 1995, An efficient genetic algorithm for job shop scheduling problems, Proc. of the 6”’ Int. Conf: on genetic algorithms (ICGA ‘95), pp. 506-511. Michalewicz, Z., 1996, Genetic Algorithms + Data Artificial = Evolutionary Programs, Intelligence, Springer, Berlin, 2”d Edition.
Structures
Reklaitis, G. V., 1995, Scheduling approaches for the batch process industries, ISA Transactions 34, 349-358.
45
Schilling, G., and Pantelides, C. C., 1996, A simple continuous-time process scheduling formulation and a novel solution algorithm, Sixth European
t
40 4
-
Solution of mathematical programming
Symposium on Computer Aided Process Engineering ESCAPE-h, Rhodos, Griechenland, 1221-1226.
z ” 325
Visvanathan, J. and Grossmann, I.E., 1990, A combined penalty function and outer approximation for MINLP optimization, Computers & Chemical
.@O815.
‘.
SOlubulbnmo,mdl p,c$ a1g=a
10 -[_I ’ ._‘t____ -.____ -_.__ ._._ _ ---, _ ---------____I_____“___~.~_~. 5b -~__________________-_--____-_______I Ou0
:
,
2000
8000 ioooo 4000 6000 Computatton time I sec.
12000
1
Engineering,
I
x. , 1995, Algorithms for optimal process scheduling using nonlinear models, PhD Thesis, Imperial College, London.
zhang, 14000
Figure 4: Solution progress of the genetic algorithm The genetic algorithm calculates good feasible solutions much faster than the mathematical programming algorithm. With another stopping
14, 769-782.