Highlights

- Process parameter optimization in injection molding is modeled as a Markov Decision Process.
- A decision model combined with a self-prediction model built by a neural network is formulated.
- Optimization strategies are learned by the decision system through reinforcement learning.
- The proposed system is validated on an ultra-high precision lens product in practical production.
A reinforcement learning decision model for online process parameters optimization from offline data in injection molding

Fei Guo(a), Xiaowei Zhou(a), Jiahuan Liu(a), Yun Zhang(b,*), Dequn Li(a), Huamin Zhou(a,*)

(a) State Key Lab of Material Processing and Die & Mold Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
(b) School of Materials Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China.

* Corresponding author. Email: [email protected], [email protected].
Abstract

Injection molding is widely used owing to its ability to form high-precision products. Good dimensional accuracy control depends on appropriate process parameter settings. However, existing optimization methods fail in producing ultra-high precision products due to their narrow process windows. In order to address this problem, an online decision system consisting of a novel reinforcement learning framework and a self-prediction artificial neural network model is developed. This decision system utilizes the knowledge learned from offline data to dynamically optimize the process of ultra-high precision products. Process optimization of an optical lens is dedicated to validating the proposed system. The experimental results show that the proposed system has excellent convergence performance in producing lenses with deviation not exceeding ±5 μm. Comparison with the static optimization method proves that the decision model is more robust and effective in an online production environment. It achieves superior results in continuous production, with a process capability index of 1.720 compared to 0.315 for a fuzzy inference system. There is great potential for utilizing the proposed data-driven decision system in similar manufacturing processes.

Key words: Intelligent manufacturing; Injection molding; Neural network; Reinforcement learning
1. Introduction

Injection molding (IM) is a widely used process in plastic production owing to its capability to produce complex-shaped, good-performance, and high-precision products. The injection molding process consists of four basic phases: filling, packing, cooling, and ejection. This continuous process is controlled by process parameters. Therefore, high-quality products depend on appropriate process parameter settings. Traditionally, the molding process parameters have been determined by a trial-and-error procedure. The engineers adjust the process parameters through repeated trial production according to their own experience until the specifications of product quality are satisfied. However, since the relationship between product quality and the process parameters is ambiguous as a result of the coupling between different control variables [1], there are still challenges in determining the process parameters for high-quality products. Especially for products requiring ultrahigh dimensional accuracy, the process window is always narrow and irregular, so acquiring proper process parameter settings is more difficult compared with those of a general product. Therefore, efficient process parameter optimization methods are necessary to address the problem of producing high-precision products.

With the application of soft computing, recent methods based on evolutionary algorithms, fuzzy systems, expert systems, and artificial neural networks have contributed to defining the process parameters in a more effective way. Research studies on process parameter optimization can be divided into two categories. The first category is static process parameter optimization, which expects to obtain a global optimal result based on a surrogate model. The other category is dynamic optimization, which is based on knowledge or historical cases. Through interactions, the optimized results are achieved gradually. Most related works have examined static process parameter optimization based on modern techniques. This approach consists of three steps: obtaining background data, constructing a surrogate model, and applying an optimization algorithm. The background data are obtained by the design of experiment (DOE). Surrogate models including response surface methodology (RSM), the Kriging method, artificial neural network (ANN), and support vector regression (SVR) are employed to model the relationship between the process parameters and quality indexes. For example, Chen [2] presented a self-organizing map and a backpropagation neural network (SOM-BPNN) model to dynamically predict the product weight with high accuracy. The self-organizing map algorithm was used to extract the dynamic process parameter characteristics as additional network input. Yin [3] established an accurate and rapid warpage prediction model using a backpropagation neural network with the ability to predict the warpage of plastic within an error range of 2%. Manjunath and Krishna [4] developed an ANN model to predict dimensional shrinkage with an error level of less than 10%. Everett [5] developed a subspace artificial neural network to predict molding cooling changes in injection molding. This method provided good predictability of the cavity temperature profile for varying processing conditions and uncertain disturbances. After constructing a surrogate model, an optimization algorithm such as a genetic algorithm (GA), particle swarm optimization (PSO), and simulated annealing (SA) could be applied to find a global optimal result. An ANN coupled with the GA method [6–8] was used to optimize the process parameters in injection molding with the aim of minimizing warpage, sink marks, and dimension errors in plastic products. Spina [9] developed an ANN-PSO approach that showed advantages in the reduction of volumetric shrinkage through optimization. The simulated annealing method [10] was used to discover an optimal set for achieving lower warpage. However, the consistency of the real production environment and the surrogate model built with experimental background data cannot be guaranteed. Although the above static methods demonstrate promising
results based on the surrogate model, the obtained process parameter settings can be invalid in the setup of a real machine.

Research works on the dynamic optimization of process parameters try to avoid the limitations of static methods. Dynamic optimization methods include expert systems, case-based reasoning, and fuzzy inference systems. These systems are based on a knowledge or case database that is used for imitating the human trial-and-error process in optimization tasks. For example, Yang [11] presented a knowledge-based tuning (KBT) method. This method estimates the process window persistently for meeting the product specifications. An integrated intelligent system [12] takes advantage of a fuzzy inference model based on expert knowledge and practical experience for corrections and process optimization. A fuzzy inference system [13] was developed for injection molding where fuzzy reasoning combined with human reasoning was used to obtain the qualified product with fewer trials. A case-based reasoning application [14] was proposed for the production of high-precision drippers. The defects of drippers were avoided by using domain expert knowledge based on the relationship of quality and process parameters.

The performance of the above systems is directly influenced by the knowledge, rule, or case database. However, accurate and adequate domain knowledge, the primary component for inference in optimization, is difficult and complicated to acquire. In addition, extracting knowledge from humans instead of learning from data is inefficient since new data are generated persistently in practical production.
The dynamic optimization procedure is a sequential decision problem that can be modeled as a Markov Decision Process (MDP). The MDP can be solved by reinforcement learning (RL) [15]. Reinforcement learning combined with deep artificial neural networks demonstrates powerful performance in a host of fields such as games, robotics, and natural language processing [16,17]. In the fields of medical research, engineering, and finance, there are also some preliminary applications [18,19] based on reinforcement learning. Since process parameter optimization is a highly dynamic and complex decision-making process in injection molding, a decision system which consists of a novel reinforcement learning framework and a self-prediction artificial neural network model (SP-RL) is developed. Several process parameters including the mold temperature, melt temperature, packing pressure, and packing time are considered as control variables. The dimension precision is the quality index of the variable optimization. Through orthogonal experimental design, a series of simulation experiments are conducted using the Moldflow software. Then, a self-prediction model and a hybrid decision agent modeled by MDP are trained successively based on the background data. The decision model contains a replay buffer and an Actor-Critic model built by two neural networks. A deep reinforcement learning algorithm is utilized for training the decision model that learns strategies to adjust the process parameters without empirical rules or expert knowledge. Finally, the feasibility of the proposed method is confirmed by practical experiments of a lens when optimizing the processing parameters.
2. Decision system scheme

The proposed system is an intelligent decision agent that optimizes the process parameters in injection molding. For this work, an online decision model that combines a quality self-prediction model is developed. Through a design of experiment and a simulation software analysis, background data are obtained. These data are data pairs that consist of the quality index and its corresponding process condition. A self-prediction model is trained by the background data to predict the quality index based on the process condition. Then, the self-prediction model is used as a simulated injection machine and provides a training environment for the decision model by applying a reinforcement learning algorithm. Afterward, the trained decision model is packaged as an application that is deployed in an industrial computer connected to an injection machine for online decisions. The application receives feedback from the engineers and adjusts the process parameters automatically based on knowledge learned from the offline training data. The construction procedure is described below, and a flowchart of the proposed method is shown in Fig. 1.

1) Design of experiment and background data acquisition: to acquire a certain amount of background data that can characterize the forming properties of the part under the selected process parameters and quality indexes in plastic injection molding.

2) Trained self-prediction quality model: to establish a nonlinear ANN regression model that can map the complex relationship between the process parameters and quality indexes by using the background data.

3) Trained RL decision model: to build and train a decision model for learning process parameter adjustment strategies through a reinforcement learning algorithm. Many interactions with the environment are needed to train a reinforcement learning agent, but this is unrealistic in real production considering the cost. Therefore, a preestablished prediction model is used to simulate a real injection machine to deal with the problem of insufficient data. The training procedure of the decision model interacts with this prediction model for rapid response instead of directly interacting with a real environment.

Fig. 1 Flow chart of proposed system.
2.1. Offline background data acquisition

The background data is the basis for training the prediction model. Thus, it is crucial that the obtained data fully reflect the relationship between the quality indexes and process parameters under constraints. Instead of conducting full factorial experiments, experiments of different process conditions are conducted through an orthogonal experimental design. This is widely used in injection molding research and can minimize the number of experiments while ensuring the validity and robustness of the data. Simulation experiments are performed using the Moldflow software. Then, the conditions and results of simulation experiments construct the background data set of the process parameters and quality indexes, which can be expressed as

D = \{(X_1, y_1), (X_2, y_2), \ldots, (X_i, y_i), \ldots, (X_n, y_n)\}    (1)

where X_i is the process parameter vector, y_i is the target vector, and i is the serial number of the experiment.
2.2. Quality self-prediction modeling

A prediction mapping model can be built to predict output corresponding to the input after the background data are obtained. The model in this paper is a multilayer neural network that is used to predict the target values under different process parameter conditions. A network with hidden layers has the ability of nonlinear mapping. Thus, neural networks have been extensively used in engineering applications for prediction and optimization. In plastic injection molding, prediction models based on neural networks have proven effective in predicting quality indexes or classifying defects such as short shots, shrinkage, and warpage. Our study case for the experiments is an optical lens that requires high dimensional accuracy. The lens is aspheric with 46-mm effective diameter and 6.91-mm edge thickness. The edge thickness calculated by three measurement points is selected as the key dimension for production requirements. These values are continuous real numbers, so our prediction model is a multiobjective regression model in which a sample input has more than one target output. In this study, the dimension of input data is relatively low. Hence, a neural network model with two hidden layers is designed. The structure of the neural network can be seen in Fig. 2. The figure shows that the network has one input layer consisting of four neurons standing for four process parameters and an extra bias neuron; two hidden layers with 20 and 15 neurons, respectively; and one output layer having three neurons representing three predicted values of the dimension.
Fig. 2 Structure of self-prediction model.
By iteratively updating the connection weights to minimize the mean square error between the predicted target values and the experimental values, a precise model can be built to predict the edge thickness values under different process parameter conditions. These thickness values are continuous. Thus, the model's output layer is a sigmoid layer, which means that the active function of neurons at the output layer is a sigmoid function. A sigmoid function can map an n-dimensional vector of arbitrary real values to another n-dimensional vector of continuous values in the range (0, 1). This function is expressed as

p(y^{(i)} \mid x^{(i)}; w) = \frac{1}{1 + e^{-w^{T} x^{(i)}}}    (2)
A multilayer neural network is sensitive to input feature scaling, so it is necessary to scale the data to account for the different value ranges of different parameters. Here, each process parameter on the input vector x^{(i)} is scaled to [0, 1], and the transformation is given by

x^{(i)}_{scaled} = \frac{x^{(i)} - x_{lowerbound}}{x_{upperbound} - x_{lowerbound}}    (3)
where x_{upperbound} is the vector of the parameters' upper bound. Similarly, x_{lowerbound} is the vector of the parameters' lower bound. The self-prediction model uses the Square Error loss function for regression, written as

J(w) = \frac{1}{2} \sum_{i=1}^{m} \left\| \hat{y}^{(i)} - y^{(i)} \right\|^{2} + \frac{\lambda}{2} \left\| w \right\|^{2}    (4)

where \hat{y}^{(i)} is the model prediction vector, w represents the weight parameters in the neural network, and \lambda is a regularization term that helps to avoid overfitting by penalizing weights with large
magnitudes. To train the prediction model from background data, a suitable training algorithm is responsible for the model convergence. Hence, an algorithm named the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) [20] is utilized in minimizing the loss function J(w), which approximates the inverse of the Hessian matrix H^{-1} to perform weight parameter updates. L-BFGS performs better on small datasets than the Stochastic Gradient Descent (SGD) algorithm and its variants. The overall procedure for training the prediction model is provided in the Appendix Scheme 1.
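For illustration, the scaling of Eq. (3) and the L-BFGS training of Eq. (4) can be sketched with scikit-learn, whose MLPRegressor offers an lbfgs solver and an L2 penalty (alpha). This is a minimal sketch under stated assumptions rather than the authors' implementation: scikit-learn uses an identity output activation instead of the sigmoid layer described above, and the file names are hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Background data of Eq. (1): X holds the four process parameters per
# experiment, Y the three measured edge-thickness values.
X = np.loadtxt("background_X.csv", delimiter=",")   # shape (25, 4); hypothetical file
Y = np.loadtxt("background_Y.csv", delimiter=",")   # shape (25, 3); hypothetical file

# Min-max scaling of each parameter to [0, 1], as in Eq. (3).
x_lo, x_hi = X.min(axis=0), X.max(axis=0)
X_scaled = (X - x_lo) / (x_hi - x_lo)

# Two hidden layers of 20 and 15 neurons (Fig. 2), L2 regularization as in
# Eq. (4), trained with the L-BFGS solver on the small background data set.
model = MLPRegressor(hidden_layer_sizes=(20, 15), solver="lbfgs",
                     alpha=1e-4, max_iter=500)
model.fit(X_scaled, Y)

y_hat = model.predict(X_scaled)   # predicted edge-thickness values
```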
2.3. Markov decision process modeling

The decision model is a data learning model based on deep reinforcement learning through interacting with the injection environment. The reinforcement learning method is not concerned with the specific form of the input; rather, it focuses on what action should be taken under the current state to achieve the final goal. Fig. 3 illustrates the framework for the proposed decision model to address our problems. The learner and decision-maker is called an agent. The objects it interacts with, comprising everything outside the agent, are collectively named the environment. In this paper, the decision agent is named the Plastic Injection Molding (PIM) agent, and the environment is named the injection environment. The PIM agent continuously interacts with the injection environment, which returns rewards to the agent. Through cumulative rewards, the agent is expected to learn process parameter control in order to meet the production specifications as a final goal. Before deploying the decision model to an online production environment, the agent should be trained to save costs by interacting with our self-prediction model.

Fig. 3 Proposed structure of reinforcement learning decision model.
The MDP framework is introduced here to mathematically describe our decision-making problem. MDP is a classical formalization of sequential decision-making. It is an idealized form of the reinforcement learning problem for which precise theoretical statements can be made. An MDP consists of a tuple of five elements (S, A, P, r, \gamma), where S is a finite set of states, A is a finite set of actions, P is a state transition probability function P(s' \mid s, a), r is a reward function r(s, a), and \gamma is a discount factor where \gamma \in [0, 1], which indicates the present value of future rewards. In this paper, the decision-making problem being studied is also modeled as an MDP. Definitions of the tuple (S, A, P, r, \gamma) related to our task are as follows:

State space S: A state s_t \in S is defined as a vector that combines the current values of the process parameters and thickness. It is described as follows:

s_t = \{p_t, y_t\} \in S    (5)

p_t = \{p_t^1, p_t^2, \ldots, p_t^k, \ldots, p_t^m\}, \quad y_t = \{y_t^1, y_t^2, \ldots, y_t^j, \ldots, y_t^q\}    (6)
where p_t^k is the current value of the kth process parameter, and y_t^j is the current value of the jth quality index. When a mold trial is done at step t, both process parameter settings and the corresponding results are taken as references to adjust the process parameters. A new state s_{t+1} can be observed under the new process parameter conditions. By repeating the procedure and interacting with the injection environment, an optimal process parameter condition that meets the quality requirements can be acquired.

Action space A: An action recommends an adjustment amount of the process parameters based on the current s_t, denoted as follows:

a_t = \{a_t^1, a_t^2, \ldots, a_t^k, \ldots, a_t^m\} \in A    (7)

m process parameters are controlled by the PIM agent within the constraints. Thus, the real process parameters at each time step can be written as

p_t = p_{init} + \sum_{i=0}^{t} a_i    (8)

This means that the current process parameter setting is a sum of the initial values plus all adjustments before time step t. a_t^k is limited to the range [-1, 1] for convenient model training. Thus, the actual adjustment value is the product of a_t^k and a step size for each process parameter. For example, if the kth variable in the vector represents the packing pressure and its preset step size is 20 MPa, then when a_t^k = 0.5, the PIM agent proposes an increase in the packing pressure by 10 MPa before the next mold trial while interacting with the injection machine.
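As a concrete illustration of Eq. (8) and the step-size scaling, one adjustment step can be written in a few lines of Python. The step sizes and bounds below are the small-step values from Table 1; the function name is ours, not from the paper.

```python
import numpy as np

# Bounds and small step sizes from Table 1, ordered as:
# packing pressure [MPa], packing time [s], melt temperature, mold temperature
LOW  = np.array([ 20.0,  5.0, 280.0,  70.0])
HIGH = np.array([100.0, 25.0, 320.0, 110.0])
STEP = np.array([ 10.0,  2.0,   5.0,   5.0])

def apply_action(p_t, a_t):
    """One adjustment step: each action component in [-1, 1] is scaled by
    its parameter's step size, and the result stays inside the constraints."""
    return np.clip(p_t + a_t * STEP, LOW, HIGH)
```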
Transition function P: The transition function maps a given input state s_t and an action a_t to a future state s_{t+1}. It is desirable to predict future states accurately. However, it is impossible for us to predict the future state in a practical manner because the complex relationship between the process parameters and the quality of the product is unknown. This is why we should build a prediction model and employ that model as a transition function approximation. The prediction of the transition function gives an indication of the likeliest next state, which can be rewritten as

p(s_{t+1} \mid s_t, a_t) = \{p_t, \hat{y}_t\}    (9)

\hat{y}_t = M(p_t \mid w)    (10)
Reward R: One time step later, as a consequence of the PIM agent's actions, a new process parameter condition is generated. Then, the injection machine runs on this set of parameters, and the agent receives a numerical reward. The immediate reward the agent receives at any time step is a function of the current states and the control action taken by the agent, given by r(s_t, a_t). The immediate reward is necessary for the agent to judge the goodness or badness of the actions it takes. For our purposes, the agent should obtain an optimal process parameter setting within the shortest time step. Thus, the reward function is given by

r(s_t, a_t) = -\left( \alpha \left\| \Delta_t \right\|^2 + \beta t^2 \right)    (11)

where \Delta_t is a vector that represents the difference between the current values and required values of the quality indexes, and \alpha and \beta are importance weights given to the quality index and time step, respectively.

Discount factor \gamma: The discount factor value ranges from 0 to 1. In particular, when \gamma = 0, the PIM agent only considers the immediate reward to take action. Conversely, when \gamma = 1, the agent will take action by considering all future rewards.
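Taken together, Eqs. (5)-(11) specify a simulated environment in which the self-prediction model M stands in for the injection machine. The sketch below is illustrative only: it reuses the hypothetical apply_action helper from the previous sketch, and the target vector and the weights alpha and beta are assumed values, since the paper does not report them.

```python
import numpy as np

ALPHA, BETA = 1.0, 0.01                            # weights in Eq. (11); assumed
Y_REQUIRED = np.array([6.910, 6.910, 6.910])       # required thickness; illustrative

def env_step(model, p_t, a_t, t):
    """Simulated injection environment built on the self-prediction model M.
    Returns the next state {p_{t+1}, y_hat} and the reward of Eq. (11)."""
    p_next = apply_action(p_t, a_t)                  # Eq. (8), see sketch above
    y_hat = model.predict(p_next.reshape(1, -1))[0]  # Eq. (10): y_hat = M(p | w)
    delta = y_hat - Y_REQUIRED                       # deviation from the target
    reward = -(ALPHA * np.dot(delta, delta) + BETA * t ** 2)   # Eq. (11)
    return np.concatenate([p_next, y_hat]), reward   # state s = {p, y}, Eq. (5)
```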
Using the notations and definitions above, the problem of process parameter optimization can be formally described as follows: through interactions between the agent and the injection environment, the agent is expected to find control strategies that can maximize the cumulative rewards. In MDP, the return G_t is the total discounted future reward from time step t and is written as

G_t = \sum_{i=t}^{T} \gamma^{i-t} r(s_i, a_i)    (12)

An action-value function is the expected return starting from state s, taking action a, and then following a policy \pi:

Q^{\pi}(s_t, a_t) = E_{\pi}[G_t \mid s_t = s, a_t = a]    (13)

In this paper, the policy is a strategy to control the process parameters according to the feedback of the injection environment. A deterministic policy is used and described as a function using a greedy policy:

\pi(s) = \operatorname{argmax}_a Q(s, a)    (14)
There are two basic types of training algorithms in reinforcement learning: policy based (e.g., policy gradient) and value based (e.g., Q-learning). The integration of the two, which is named the Actor-Critic method, is used in our paper to improve the learning efficiency and ensure convergence. The agent architecture consists of two components: an Actor-Network and a Critic-Network. Both the Actor-Network and Critic-Network are neural networks that estimate the policy function as \pi_{\theta}(s) and the action-value function as Q_{\theta'}(s, a), where \theta and \theta' are weight parameters. The Actor receives a state and outputs an action expressed by an m-dimensional vector, which means adjustments of the total m process parameters. Then, the action taken by the Actor is evaluated by the Critic. The evaluation is indicated by the value of Q_{\theta'}(s_t, a_t), i.e., the value of taking a_t under s_t. Fig. 4 shows the interactions between the Actor-Network, Critic-Network, and injection environment at each time step.

Fig. 4 Learning framework of Actor-Critic decision model.
The core of the proposed methodology is the PIM agent. A well-trained agent can optimize the process parameters effectively by using a good strategy in the form of \pi_{\theta}(s). Through some molding trials under random initial conditions, the trained agent can always get an optimal process parameter condition to fit the quality index requirements.

The training process needs to interact with the injection environment. This involves inputting a set of process parameters to the injection machine. This produces a part that can then be measured by specific indicators. This procedure is very time-consuming and costly. Therefore, the self-prediction model is used here as an environment simulator owing to its ability to map from process parameters to thickness values. It is used as a substitute for a real injection machine in training the decision model agent.
To train the PIM agent, a replay buffer [16] module is introduced to accelerate the convergence of the agent training. This is a mechanism that stores the experiences of interaction between the agent and environment at each time step in the form of a transition tuple e_t = (s_t, a_t, r_t, s_{t+1}). This tuple contains the state and action at each time step, the next state, and the reward given by the environment under the action taken. The replay buffer pools over many episodes and is randomly sampled when the neural networks are trained using the mini-batch Gradient Descent method. This is a good way to break the coherence of data in an episode. In practice, our replay buffer only stores the recently experienced tuples, which means that previous transitions are overwritten with recent transitions owing to the limited capacity of the buffer.
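Such a fixed-capacity buffer that overwrites old transitions and supports random mini-batch sampling can be sketched in a few lines; the class below is an illustration, not the authors' code.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of transitions e_t = (s_t, a_t, r_t, s_{t+1});
    the deque silently drops the oldest tuple once capacity is reached."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal coherence of an episode.
        return random.sample(self.buffer, batch_size)
```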
The agent maintains the policy \pi_{\theta}(a \mid s) and the value function Q_{\theta'}(s, a) approximation. In order to cope with the exploration challenge of the reinforcement learning algorithm in continuous action space, noise sampled from a noise process is added to construct an exploration policy for training.
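The paper does not name the noise process; in DDPG [21], which this formulation follows, an Ornstein-Uhlenbeck process is the usual choice, so the sketch below adopts it as an assumption.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck exploration noise (an assumption; the paper only
    states that noise is sampled from a noise process, as in DDPG [21])."""
    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.x = np.full(dim, mu)

    def sample(self):
        # Mean-reverting random walk: temporally correlated, suited to control.
        self.x += self.theta * (self.mu - self.x) + self.sigma * np.random.randn(len(self.x))
        return self.x

# Exploration policy: actor output plus noise, clipped to the action range.
# a_t = np.clip(actor(s_t) + noise.sample(), -1.0, 1.0)
```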
The policy and the value function are updated after every action is taken until an episode is terminated. The update of the Actor-Network is performed using a sampled policy gradient [21], given as follows:

\nabla_{\theta} J_A = \frac{1}{B} \sum_{i} \nabla_{a} Q_{\theta'}(s_i, \pi_{\theta}(s_i)) \nabla_{\theta} \pi_{\theta}(s_i)    (15)

\theta \leftarrow \theta + \eta \nabla_{\theta} J_A    (16)

The target value of the critic network is given by

y_t = r_t + \gamma Q_{\theta'}(s_{t+1}, \pi_{\theta}(s_{t+1}))    (17)

The target is a bootstrapping idea that is widely used in dynamic programming. This involves estimating the current action-value function by estimating its successor state. Thus, the Critic is updated to minimize the square error loss with respect to the target. The gradient-updating formulation is as follows:

J_C = \frac{1}{B} \sum_{i} \left( y_i - Q_{\theta'}(s_i, a_i) \right)^2    (18)

\theta' \leftarrow \theta' - \eta' \nabla_{\theta'} J_C    (19)

Full details of the training procedure are described in the Appendix Scheme 2.
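Eqs. (15)-(19) amount to one mini-batch update of a DDPG-style Actor-Critic [21]. The PyTorch sketch below is one possible rendering, not the authors' code; it bootstraps the target of Eq. (17) from the live networks, since the paper does not mention separate target networks.

```python
import torch
import torch.nn.functional as F

def update_agent(actor, critic, actor_opt, critic_opt, batch, gamma=0.99):
    """One mini-batch update of the Critic (Eqs. 17-19) and Actor (Eqs. 15-16).
    batch holds tensors (s, a, r, s_next) sampled from the replay buffer;
    critic(s, a) and actor(s) are assumed torch.nn.Module instances."""
    s, a, r, s_next = batch

    # Critic target, Eq. (17): y = r + gamma * Q(s', pi(s'))
    with torch.no_grad():
        y = r + gamma * critic(s_next, actor(s_next))

    # Critic loss, Eq. (18), followed by the gradient step of Eq. (19)
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor update, Eqs. (15)-(16): ascend Q(s, pi(s)) by descending its negative
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```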
3. Experimental Validation and Results

In this section, the experiment settings are explained and simulations are performed to acquire background data through an L25 orthogonal experimental group for training the prediction model and decision model. The effectiveness and robustness of the proposed decision system are evaluated by validation experiments on an optical lens.

3.1. Experiment setup
The experimental case utilized in this study is an optical lens, and the critical dimensions are defined in Fig. 5 (a). The diameter of this lens is 54 mm, while the effective diameter is 46 mm. The aspherical vertex curvatures are R = 52 mm and R = 33 mm, respectively, and the edge thickness is 6.91 mm. In this study, the edge thickness is selected as the quality index. The edge thickness is a significant specification in lens assembly. In order to further ensure the forming accuracy of the lens, three measuring points detailed in Fig. 5 (b) are set on the part for edge thickness measurements. The mold of the lens is shown in Fig. 5 (c).

Fig. 5 Experimental case: (a) critical dimensions of the lens, (b) the position of measuring points of the lens and (c) the injection machine, the lens mold and the lens product.
The experimental material used for the lens is a type of polycarbonate (Lexan OQ2720 PC). This was selected for its good comprehensive performance. Fig. 6 shows the rheological and pressure-volume-temperature (PVT) properties of the material. An offline numerical simulation is performed on the Moldflow software for background data acquisition, while online practical production is conducted on a JSW J110ADC-180H electric injection molding machine. The screw diameter is 35 mm, and the clamping force is set to 100 t in the actual experiment. The process parameters to be researched in this work are the mold temperature, melt temperature, packing pressure, and packing time. These parameters are considered to be the process factors that have the greatest impact on the part dimensions according to previous research [22].

Fig. 6 The properties chart of the material Lexan OQ2720 PC: (a) rheological and (b) PVT.
lP
aspheric lens that has special geometric characteristics and high optical requirements. In order to meet the above requirements, a multistage injection speed is adopted in the process of injection molding. Therefore, in order to avoid injection defects and ensure the formability of the lens, the injection
urn a
molding speed was optimized in advance through a short injection experiment. The injection speed curve is shown in Fig. 7. The injection speed is divided into four stages. The speed of the first stage is 10 mm/s to enable the melt to enter the cavity quickly and reduce the melting temperature drop. In the second stage, melt flows through the gate at a lower injection speed of 4 mm/s to avoid appearance
Jo
defects caused by the injection. The third stage is the molding stage of the part body. A high-speed injection of 25 mm/s is adopted to shorten the molding cycle and reduce the melt temperature difference in the mold cavity. In addition, the high-speed injection can improve the surface gloss of the product. Finally, the melt front is located at the end of the product and is injected at a low speed of 20 mm/s to facilitate gas emission.
19
pro of
Journal Pre-proof
Fig. 7 Injection speed setting in the experiment.
The constraints and adjustment step size settings for the process parameters in our experiment are detailed in Table 1. Five experimental levels were designed for each parameter, and specific details are listed in Table 2.

Table 1 Constraints and adjustment step sizes of process parameters.

| Process parameter      | Lower bound | Upper bound | Small step size | Large step size |
|------------------------|-------------|-------------|-----------------|-----------------|
| Packing Pressure [MPa] | 20          | 100         | 10              | 15              |
| Packing Time [s]       | 5           | 25          | 2               | 3.5             |
| Melt Temperature [℃]  | 280         | 320         | 5               | 7.5             |
| Mold Temperature [℃]  | 70          | 110         | 5               | 7.5             |
Table 2 Process variables and their levels for the design of experiment.

| Factor                 | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 |
|------------------------|---------|---------|---------|---------|---------|
| Packing Pressure [MPa] | 20      | 40      | 60      | 80      | 100     |
| Packing Time [s]       | 5       | 10      | 15      | 20      | 25      |
| Melt Temperature [℃]  | 280     | 290     | 300     | 310     | 320     |
| Mold Temperature [℃]  | 70      | 80      | 90      | 100     | 110     |
3.2. Offline simulation and decision model building

An L25 orthogonal experimental design was chosen for the simulation experiments according to the number of variables and levels. The mesh type of the 3D lens model is a tetrahedral unit, and the quantity is 490,753. Twenty-five orthogonal simulation experiments were performed by the Moldflow software on a Windows server. In order to obtain accurate results, all cases were fully analyzed, including filling, cooling, packing, and warpage analysis. The critical dimension values were extracted from the postprocessing result in Moldflow for constituting the background data, which is listed in Appendix Table 3. Fig. 8 presents the warpage results of different process conditions. The values of the edge thickness are not stable within the reasonable range owing to differences in the warpage effect.

A self-prediction model with a 20-15-3 structure is trained with the background data. This artificial neural network is expected to approximate the relationship between the edge thickness value and the process parameters. By updating the connection weights, the square error between the network prediction and the target value in the training data declines gradually and converges to 1.5087E-05 after 217 iterations through the L-BFGS algorithm. R² of the final prediction model is 0.998, which indicates accurate predictions of the background data.

Fig. 8 Illustrative examples of warpage results by different process conditions.
The PIM agent is trained by interactions with the self-prediction model. The Actor and Critic network are nonlinear function approximators. The Actor network structure is empirically set to 256-256-128-64-4, and the Critic network structure is set to 256-256-128-64-1. The neuron numbers of their output layers are different since the Actor outputs adjustments of the process parameters while the Critic outputs the action value. The learning rate of these neural networks is set to 0.001. Fig. 9 shows the accumulated rewards obtained by the decision agent during one episode while training. The reward curve represents the total rewards in an episode obtained by the agent. The curve is smoothed by the moving average. It can be seen that the reward value fluctuates violently most of the time. This is because the model-free reinforcement learning needs to explore the entire state space fully in order to learn proper strategies. Finally, the rewards obtained by the agent converge to a maximum value, which means that our agent has learned how to control the process parameters according to different process parameter conditions.

Fig. 9 Training procedure of PIM agent.

3.3. Validation and convergence analysis
In order to verify the validity of the proposed method, validation experiments are carried out using the different step sizes listed in Table 1. The thickness dimension T of the lens part and its deviation Δ are measured and calculated. The deviation of the edge thickness must be within |Δ| ≤ 5.0 μm for a qualified lens. The dynamic optimization procedures of the PIM agent using different adjustment step sizes are listed in Table 4 and Table 5. The initial process parameters in each table are the same and are randomly generated within the process parameter constraints. The critical dimension thickness is measured, and the deviation is calculated after each trial. The optimizations stop when the edge thickness meets the quality requirement.
Table 4 Convergence validation experiment for PIM agent with small step size.

| Trial No. | Packing pressure [MPa] | Packing time [s] | Melt temperature [℃] | Mold temperature [℃] | Δ1 [μm] | Δ2 [μm] | Δ3 [μm] |
|-----------|------------------------|------------------|------------------------|------------------------|---------|---------|---------|
| 0         | 92                     | 9                | 304                    | 86                     | -32     | -26     | -35     |
| 1         | 94.5                   | 11.0             | 302.2                  | 81.0                   | -25     | -18     | -24     |
| 2         | 96.0                   | 13.0             | 300.7                  | 76.0                   | -18     | -15     | -19     |
| 3         | 96.6                   | 15.0             | 300.2                  | 71.0                   | -13     | -8      | -14     |
| 4         | 97.6                   | 17.0             | 299.8                  | 70.0                   | -9      | -3      | -8      |
| 5         | 100                    | 19.0             | 298.0                  | 71.0                   | -2      | +2      | -3      |
Table 5 Convergence validation experiment for PIM agent with large step size.

| Trial No. | Packing pressure [MPa] | Packing time [s] | Melt temperature [℃] | Mold temperature [℃] | Δ1 [μm] | Δ2 [μm] | Δ3 [μm] |
|-----------|------------------------|------------------|------------------------|------------------------|---------|---------|---------|
| 0         | 92                     | 9                | 304                    | 86                     | -32     | -26     | -35     |
| 1         | 95.7                   | 12.5             | 301.3                  | 78.5                   | -18     | -17     | -22     |
| 2         | 97.4                   | 16.0             | 299.9                  | 71.0                   | -7      | -7      | -11     |
| 3         | 100.0                  | 19.5             | 298.5                  | 70.0                   | 0       | +1      | -3      |
Plots (a) and (b) in Fig. 10 show the convergence results of the proposed method. For comparison, the convergence results of the fuzzy inference system are shown in Fig. 10 (c)–(d). The quality of the lens can be guaranteed to the required dimensional precision through six and four mold trials by the proposed system. When the initial process parameters are the same, the adjustment process for the large step size is two mold trials fewer than the adjustment process for the small step size. On the other hand, the fuzzy system with a small step size can produce a qualified lens through seven trials, but it fails with a large step size. Since the process window of a high-precision product is small, a qualified product may not be obtained when a large step optimization is applied. Thus, successful optimization by a fuzzy system depends on both the step size and the initial process parameters. By contrast, the adjustment amount of a parameter in the proposed system is determined by the product of the action and the step size, which can avoid the oscillation when the adjustment step size is too large. The results illustrate that both step size settings can produce a qualified lens in fewer steps, and further, the optimization converges more quickly and is more stable compared with a fuzzy inference system.

Fig. 10 Convergence results of dynamic optimization: (a) proposed system with small step size, (b) proposed system with large step size, (c) fuzzy system with small step size and (d) fuzzy system with large step size.
The specific adjustment paths of the process parameters in the proposed system are shown in Fig. 11 (a)–(d). The final process parameter settings are similar, but the adjustments with a large step size are more direct than adjustments with a small step size. The adjustment path for a small step size is tortuous near the dimension control bound. This indicates that the adjustment policy of the PIM agent is not always optimal, which causes the agent to detour near the effective quality area. This phenomenon is attributed to the fact that the agent did not fully explore the state space during training, so the agent becomes confused and proposes an inappropriate action when it encounters an unfamiliar state. The validation results demonstrate that the proposed dynamic optimization method is able to optimize the process parameters in fewer mold trials from random initial conditions. The method involves a trial-and-error process just like a human being, and it accumulates the experience and knowledge of process parameter control by using a learning model.

Fig. 11 Adjustment for process parameters: (a) packing pressure, (b) packing time, (c) melt temperature and (d) mold temperature.
3.4. System robustness analysis

The robustness of the proposed method is verified by both offline and online experiments. The optimization results are compared with an ANN coupled with the GA method. The ANN coupled with GA is a common static optimization method used in injection molding. The GA is able to find global optimal solutions given an objective function. The self-prediction model is used for GA optimization. Since this is a multiobjective optimization problem, the NSGA-II algorithm [23] is adopted. The first rows in Table 6 and Table 7 list the results. The max deviation of the lens edge thickness is +1 μm in the offline experiment and increases to +11 μm in the online experiment with the same process parameter settings. The explanation is that the self-prediction model is built with offline simulation data, so the GA optimization based on the offline model achieves high precision in the offline experiment. However, there is a gap between the offline model and the online practical environment, which makes the same settings fail in the online environment.
Table 6 Process optimization of GA and PIM agent for offline environment.

| Method    | Trial No. | Packing pressure [MPa] | Packing time [s] | Melt temperature [℃] | Mold temperature [℃] | Max deviation Δmax [μm] |
|-----------|-----------|------------------------|------------------|------------------------|------------------------|--------------------------|
| GA        | --        | 92.7                   | 24.8             | 286.8                  | 71.4                   | +1                       |
| PIM agent | 0         | 60                     | 15               | 300                    | 90                     | -37                      |
| PIM agent | 1         | 75.0                   | 18.5             | 296.5                  | 82.5                   | -28                      |
| PIM agent | 2         | 90.0                   | 22.0             | 293.6                  | 78.2                   | -12                      |
| PIM agent | 3         | 100.0                  | 25.0             | 291.2                  | 82.5                   | -3                       |
Table 7 Process optimization of GA and PIM agent for online environment.

| Method    | Trial No. | Packing pressure [MPa] | Packing time [s] | Melt temperature [℃] | Mold temperature [℃] | Max deviation Δmax [μm] |
|-----------|-----------|------------------------|------------------|------------------------|------------------------|--------------------------|
| GA        | --        | 92.7                   | 24.8             | 286.8                  | 71.4                   | +11                      |
| PIM agent | 0         | 100.0                  | 25.0             | 291.2                  | 82.5                   | +15                      |
| PIM agent | 1         | 96.4                   | 24.3             | 298.6                  | 75.1                   | +9                       |
| PIM agent | 2         | 92.3                   | 23.1             | 306.1                  | 82.3                   | +2                       |
By contrast, the proposed dynamic method achieves the desired results in both experiments. First, the initial process parameters are set to the middle values of the given constraints in the offline experiment. The optimized parameters are obtained after three mold trials of adjustment, and the max deviation of the edge thickness decreases from 37 to 3 μm. Then, an experiment with this set of process parameters is conducted in the online environment, and the dimension requirements are no longer met. Owing to state changes at this time, the PIM agent will continue to adjust the process parameters, and a new set of required process parameters is obtained in the subsequent two trials. The final optimized results for the GA and PIM agent in the online experiment are shown in Fig. 12.

Fig. 12 Optimized process parameters for online production: (a) lens optimized by GA and (b) lens optimized by PIM agent.
The adjustment details of different process parameters are shown in Fig. 13. As can be seen from the figure, the packing pressure and packing time are positively correlated with the lens edge thickness. The value of the melt temperature increases when a positive thickness deviation occurs, which indicates a negative correlation. The situation is more complicated for the mold temperature. Owing to the coupling between the process parameters, the correlation is phased. The PIM agent can address the optimization problem both offline and online by learning the correlations between the process parameters and the quality indexes from the background data.
Fig. 13 Dynamic optimization from offline to online process.

In summary, the experiment results show that a static optimization such as GA can obtain the optimal settings of the process parameters based on an accurate model. However, accurate modeling of the actual injection molding process is almost impossible. The RL decision system learns the parameter adjustment strategy that should be adopted in different states through offline training. Although the model is not exactly the same as the actual environment, it retains the relationship between different process parameters and the quality index. Therefore, the decision system can transfer knowledge to online environment optimization and ensure that process parameters meeting the production requirements can be obtained.

Injection molding is a repetitive manufacturing process, and noises or disturbances commonly occur in production. Therefore, an appropriate setting of process parameters should not only fall into the process window, but should also be as close as possible to its center. To further investigate the
robustness of the proposed decision system, continuous production is performed for different methods, including the static GA optimization method, the dynamic optimization based on fuzzy inference, and the proposed decision system. Continuous production of 30 cycles is conducted for each method. The first 10 cycles are used for optimization and the last 20 cycles are for stable production. The initial process parameters setting in this experiment is given by the GA optimization method. The process capability index is used to measure the stability of mold cycles. The process capability index is an effective statistical measure which is widely adopted in many manufacturing processes and is expressed as:

C_{pk} = \min\left( \frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma} \right)    (20)

where USL and LSL represent the upper limit and lower limit of the lens thickness, respectively, \mu denotes the average thickness, and \sigma denotes the standard deviation.
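Eq. (20) is straightforward to evaluate from a run of measured thicknesses. The snippet below applies it to the last 20 stable cycles; the limits of 6.91 ± 0.005 mm are our assumption, derived from the ±5 μm tolerance on the 6.91 mm nominal edge thickness.

```python
import numpy as np

def cpk(samples, usl, lsl):
    """Process capability index of Eq. (20) from measured thickness samples."""
    mu, sigma = samples.mean(), samples.std(ddof=1)
    return min((usl - mu) / (3 * sigma), (mu - lsl) / (3 * sigma))

# Nominal edge thickness 6.91 mm with a +/-5 um tolerance (assumed limits).
USL, LSL = 6.915, 6.905
# thickness_last20 = np.array([6.9095, 6.9103, ...])  # last 20 measured cycles
# print(cpk(thickness_last20, USL, LSL))
```

With the mean of 6.9102 and standard deviation of 0.00093 reported in Table 8, these limits reproduce the stated C_pk of 1.720.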
The results of edge thickness with respect to the producing cycles are shown in Fig. 14. As can be seen from the figure, the parameter setting obtained by the GA method cannot produce qualified lenses due to the gap between the offline and online environment. Meanwhile, the static optimization method is unable to dynamically adjust the process parameters. Conversely, for the fuzzy inference system and the proposed decision system, the lens edge thickness meets the requirements after 6 and 2 adjustment cycles, respectively.

Fig. 14 Online mold cycles of different adjustment methods.
The process capability results are shown in Table 8. Since the average thickness of the GA optimization method is not in the range of USL and LSL, the C_pk value is meaningless. The proposed decision system maintains a more stable status in continuous production, and the C_pk value increases from 0.315 to 1.720 compared to the fuzzy inference system. There are substandard products in the last 20 cycles since the optimization by the fuzzy inference system is close to the LSL. The products optimized by the decision system locate in the middle of the limits, so the process condition can sustain disturbances to some extent. Therefore, the proposed decision system is more effective and robust for producing ultra-high precision lenses compared to the traditional process parameter optimization methods.

Table 8 Process capability of different methods.

| Method                     | Average thickness | Standard deviation | Cpk (last 20 cycles) |
|----------------------------|-------------------|--------------------|----------------------|
| GA method                  | 6.8942            | 0.00130            | --                   |
| Fuzzy inference system     | 6.9062            | 0.00127            | 0.315                |
| RL decision system (SP-RL) | 6.9102            | 0.00093            | 1.720                |
4. Conclusion

Process parameter optimization is a very important procedure in the injection molding process and has not been resolved to date. In order to address the optimization problem in ultrahigh-precision products, a decision system based on reinforcement learning integrated with an ANN self-prediction model was developed for dynamic optimization in injection molding. The following conclusions were found from validation experiments performed on a lens part:

1) The trained self-prediction model accurately predicts the lens thickness with an R² of 0.998. The prediction model provides information for training the decision agent, which can learn process parameter optimization without prior rules or knowledge. The proposed system achieves different step size optimization while avoiding oscillation. Qualified lenses are produced by six mold trials with a small step optimization or four mold trials with a large step optimization. The system presents a rapid and stable convergence in process parameter optimization.

2) The correlations between the process parameters and the edge thickness of the lens are learned by the decision agent and can be utilized in both offline and online optimizations. For online production, the max deviation of the edge thickness is +11 μm as optimized by GA, while a max deviation of +2 μm is achieved by this system. Instead of obtaining a fixed set of process parameters by static optimization methods, this method is robust enough to handle both offline and online environments based on knowledge learned from the background data and its ability for dynamic optimization.
Despite the above achievements, the proposed decision model based on reinforcement learning suffers from the challenges that most reinforcement learning methods face. When using neural networks as function approximators, the decision model lacks theoretical convergence guarantees, and the data is used inefficiently. More interpretable approximation methods need to be researched and applied in the decision-making model. Besides, the decision model is trained for a specific task, namely the process optimization of the optical lens in this study. To make the proposed method more stable and general, further research will focus on exploring more reasonable function approximation and transferring knowledge into different injection tasks based on one model by meta-learning or transfer learning.
Acknowledgement

The authors acknowledge financial support from the National Natural Science Foundation of China (Grant No. 51635006, 51675199, 51575207) and the Fundamental Research Funds for the Central Universities (Grant No. 2016YXZD059, 2015ZDTD028).
Appendix

Scheme 1: Parameters Training for the Prediction Model with L-BFGS

Input: Training set D, number of iterations N, regularization term λ

1:  Initialize the model M with random weights and shuffle set D randomly
2:  for e = 1, N do
3:      Sample all data (x^(i), y^(i)) from D
4:      Predict ŷ^(i) of x^(i) through M
5:      Calculate the loss J(w)
6:      Calculate H^{-1} by L-BFGS
7:      Update parameters of M:
8:          w ← w − H^{-1} ∇_w J(w)
9:  end for

Output: The trained self-prediction model M
Scheme 2: Parameters Training for the Agent with Mini-batch Gradient Descent

Input: A differentiable policy parameterization π_θ(s)
Input: A differentiable action-value parameterization Q_θ'(s, a)
Input: Training episodes N, batch size B, replay buffer R
Input: The self-prediction model M, the max episode steps t_max

1:  Initialize policy weights θ and state-value weights θ'
2:  for e = 1, N do
3:      Random initial process setting p_init, and get initial state s_t
4:      Repeat
5:          Output a_t according to π_θ(s) from Actor network
6:          Execute a_t and receive reward r_t, next state s_{t+1} from model M
7:          Store transition e_t in replay buffer R
8:          Sample random B transitions from R
9:          Update Critic and Actor network:
10:             θ ← θ + η ∇_θ J_A,  θ' ← θ' − η' ∇_θ' J_C
11:         t ← t + 1, s_t ← s_{t+1}
12:     Until terminal s_T or t > t_max
13: end for

Output: The trained π_θ(s) and Q_θ'(s, a)
Table 3 Orthogonal simulation experiments and results for L25.

| No. | Packing pressure [MPa] | Packing time [s] | Melt temperature [℃] | Mold temperature [℃] | T1 [mm] | T2 [mm] | T3 [mm] |
|-----|-----|----|-----|-----|-------|-------|-------|
| 1   | 20  | 5  | 280 | 70  | 6.868 | 6.862 | 6.864 |
| 2   | 20  | 10 | 290 | 80  | 6.874 | 6.872 | 6.873 |
| 3   | 20  | 15 | 300 | 90  | 6.875 | 6.872 | 6.874 |
| 4   | 20  | 20 | 310 | 100 | 6.868 | 6.866 | 6.866 |
| 5   | 20  | 25 | 320 | 110 | 6.868 | 6.867 | 6.867 |
| 6   | 40  | 10 | 280 | 90  | 6.871 | 6.869 | 6.872 |
| 7   | 40  | 15 | 290 | 100 | 6.873 | 6.872 | 6.873 |
| 8   | 40  | 20 | 300 | 110 | 6.876 | 6.875 | 6.876 |
| 9   | 40  | 25 | 310 | 70  | 6.877 | 6.878 | 6.879 |
| 10  | 40  | 5  | 320 | 80  | 6.868 | 6.867 | 6.869 |
| 11  | 60  | 15 | 280 | 110 | 6.882 | 6.880 | 6.881 |
| 12  | 60  | 20 | 290 | 70  | 6.891 | 6.887 | 6.889 |
| 13  | 60  | 25 | 300 | 80  | 6.887 | 6.886 | 6.887 |
| 14  | 60  | 5  | 310 | 90  | 6.862 | 6.860 | 6.862 |
| 15  | 60  | 10 | 320 | 100 | 6.883 | 6.880 | 6.882 |
| 16  | 80  | 20 | 280 | 80  | 6.891 | 6.889 | 6.890 |
| 17  | 80  | 25 | 290 | 90  | 6.895 | 6.892 | 6.893 |
| 18  | 80  | 5  | 300 | 100 | 6.866 | 6.865 | 6.871 |
| 19  | 80  | 10 | 310 | 110 | 6.882 | 6.881 | 6.881 |
| 20  | 80  | 15 | 320 | 70  | 6.893 | 6.888 | 6.891 |
| 21  | 100 | 25 | 280 | 100 | 6.908 | 6.904 | 6.905 |
| 22  | 100 | 5  | 290 | 110 | 6.872 | 6.869 | 6.873 |
| 23  | 100 | 10 | 300 | 70  | 6.893 | 6.893 | 6.894 |
| 24  | 100 | 15 | 310 | 80  | 6.904 | 6.902 | 6.902 |
| 25  | 100 | 20 | 320 | 90  | 6.906 | 6.904 | 6.905 |
References

[1] H. Zhou, Computer modeling for injection molding: simulation, optimization, and control, John Wiley & Sons, 2013.
[2] W.C. Chen, P.H. Tai, M.W. Wang, W.J. Deng, C.T. Chen, A neural network-based approach for dynamic quality prediction in a plastic injection molding process, Expert Syst. Appl. 35 (2008) 843–849.
[3] F. Yin, H. Mao, L. Hua, W. Guo, M. Shu, Back propagation neural network modeling for warpage prediction and optimization of plastic products during injection molding, Mater. Des. 32 (2011) 1844–1850.
[4] P.G.C. Manjunath, P. Krishna, Prediction and optimization of dimensional shrinkage variations in injection molded parts using forward and reverse mapping of artificial neural networks, in: Adv. Mater. Res., 2012: pp. 674–678.
[5] S.E. Everett, R. Dubay, A sub-space artificial neural network for mold cooling in injection molding, Expert Syst. Appl. 79 (2017) 358–371.
[6] H. Kurtaran, B. Ozcelik, T. Erzurumlu, Warpage optimization of a bus ceiling lamp base using neural network model and genetic algorithm, J. Mater. Process. Technol. 169 (2005) 314–319.
[7] W. Guo, L. Hua, H. Mao, Minimization of sink mark depth in injection-molded thermoplastic through design of experiments and genetic algorithm, Int. J. Adv. Manuf. Technol. 72 (2014) 365–375.
[8] K.-M. Tsai, H.-J. Luo, An inverse model for injection molding of optical lens using artificial neural network coupled with genetic algorithm, J. Intell. Manuf. 28 (2017) 473–487.
[9] R. Spina, Optimisation of injection moulded parts by using ANN-PSO approach, J. Achiev. Mater. Manuf. Eng. 15 (2006) 146–152.
[10] C. Yen, J.C. Lin, W. Li, M.F. Huang, An abductive neural network approach to the design of runner dimensions for the minimization of warpage in injection mouldings, J. Mater. Process. Technol. 174 (2006) 22–28.
[11] D. Yang, K. Danai, D. Kazmer, A knowledge-based tuning method for injection molding machines, J. Manuf. Sci. Eng. 123 (2001) 682–691.
[12] H. Zhou, P. Zhao, W. Feng, An integrated intelligent system for injection molding process determination, Adv. Polym. Technol. 26 (2007) 191–205.
[13] M.K. Karasu, L. Salum, FIS-SMED: a fuzzy inference system application for plastic injection mold changeover, Int. J. Adv. Manuf. Technol. 94 (2018) 545–559.
[14] M.R. Khosravani, S. Nasiri, K. Weinberg, Application of case-based reasoning in a fault detection system on production of drippers, Appl. Soft Comput. 75 (2019) 227–232.
[15] R.S. Sutton, A.G. Barto, Reinforcement learning: An introduction, MIT Press, 2018.
[16] V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski, et al., Human-level control through deep reinforcement learning, Nature 518 (2015) 529.
[17] D. Silver, A. Huang, C.J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al., Mastering the game of Go with deep neural networks and tree search, Nature 529 (2016) 484.
[18] Z. Zhou, X. Li, R.N. Zare, Optimizing chemical reactions with deep reinforcement learning, ACS Cent. Sci. 3 (2017) 1337–1344.
[19] F. Belletti, D. Haziza, G. Gomes, A.M. Bayen, Expert level control of ramp metering based on multi-task deep reinforcement learning, IEEE Trans. Intell. Transp. Syst. 19 (2018) 1198–1207.
[20] R.H. Byrd, P. Lu, J. Nocedal, C. Zhu, A limited memory algorithm for bound constrained optimization, SIAM J. Sci. Comput. 16 (1995) 1190–1208.
[21] T.P. Lillicrap, J.J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, D. Wierstra, Continuous control with deep reinforcement learning, arXiv preprint arXiv:1509.02971 (2015).
[22] M.-C. Huang, C.-C. Tai, The effective factors in the warpage problem of an injection-molded part with a thin shell feature, J. Mater. Process. Technol. 110 (2001) 1–9.
[23] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6 (2002) 182–197.
Declaration of Interest Statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.