A reinforcement learning decision model for online process parameters optimization from offline data in injection molding

Fei Guo, Xiaowei Zhou, Jiahuan Liu, Yun Zhang, Dequn Li, Huamin Zhou

PII: S1568-4946(19)30609-X
DOI: https://doi.org/10.1016/j.asoc.2019.105828
Reference: ASOC 105828

To appear in: Applied Soft Computing Journal

Received date: 10 April 2019
Revised date: 31 July 2019
Accepted date: 30 September 2019

Please cite this article as: F. Guo, X. Zhou, J. Liu et al., A reinforcement learning decision model for online process parameters optimization from offline data in injection molding, Applied Soft Computing Journal (2019), doi: https://doi.org/10.1016/j.asoc.2019.105828.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier B.V.

Highlights

- Process parameters optimization is modeled as a Markov Decision Process in injection molding.
- A decision model combined with a self-prediction model built by a neural network is formulated.
- Strategies of optimization are learned by the decision system through reinforcement learning.
- The proposed system is validated on an ultra-high precision lens product in practical production.


A reinforcement learning decision model for online process parameters optimization from offline data in injection molding

Fei Guo a, Xiaowei Zhou a, Jiahuan Liu a, Yun Zhang b,*, Dequn Li a, Huamin Zhou a,*

a State Key Lab of Material Processing and Die & Mold Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
b School of Materials Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China.
* Corresponding author. Email: [email protected], [email protected].

Abstract

Injection molding is widely used owing to its ability to form high precision products. Good dimensional accuracy control depends on appropriate process parameter settings. However, existing optimization methods fail in producing ultra-high precision products due to their narrow process windows. In order to address the problem, an online decision system which consists of a novel reinforcement learning framework and a self-prediction artificial neural network model is developed. This decision system utilizes the knowledge learned from offline data to dynamically optimize the process of ultra-high precision products. Process optimization of an optical lens is dedicated to validating the proposed system. The experimental results show that the proposed system has excellent convergence performance in producing lenses with deviation not exceeding ±5 μm. Comparison with the static optimization method proves that the decision model is more robust and effective in an online production environment, and it achieves superior results in continuous production with a process capability index of 1.720 compared to 0.315 for the fuzzy inference system. There is great potential for utilizing the proposed data-driven decision system in similar manufacturing processes.

Key words: Intelligent manufacturing; Injection molding; Neural network; Reinforcement learning

1. Introduction

Injection molding (IM) is a widely used process in plastic production owing to its capability to produce complex-shaped, good-performance, and high-precision products. The injection molding process consists of four basic phases: filling, packing, cooling, and ejection. This continuous process is controlled by process parameters. Therefore, high-quality products depend on appropriate process parameter settings. Traditionally, the molding process parameters have been determined by a trial-and-error procedure. The engineers adjust the process parameters through repeated trial production according to their own experience until the specifications of product quality are satisfied. However, since the relationship between product quality and the process parameters is ambiguous as a result of the coupling between different control variables [1], there are still challenges in determining the process parameters for high-quality products. Especially for products requiring ultrahigh dimensional accuracy, the process window is always narrow and irregular, so acquiring proper process parameter settings is more difficult compared with those of a general product. Therefore, efficient process parameter optimization methods are necessary to address the problem of producing high-precision products.

With the application of soft computing, recent methods based on evolutionary algorithms, fuzzy systems, expert systems, and artificial neural networks have contributed to defining the process parameters in a more effective way. Research studies on process parameter optimization can be divided into two categories. The first category is static process parameter optimization, which expects to obtain a global optimal result based on a surrogate model. The other category is dynamic optimization, which is based on knowledge or historical cases. Through interactions, the optimized results are achieved gradually.

Most related works have examined static process parameter optimization based on modern techniques. This approach consists of three steps: obtaining background data, constructing a surrogate model, and applying an optimization algorithm. The background data are obtained by the design of experiment (DOE). Surrogate models including response surface methodology (RSM), the Kriging method, artificial neural network (ANN), and support vector regression (SVR) are employed to model the relationship between the process parameters and quality indexes. For example, Chen [2] presented a self-organizing map and a backpropagation neural network (SOM-BPNN) model to dynamically predict the product weight with high accuracy. The self-organizing map algorithm was used to extract the dynamic process parameter characteristics as additional network input. Yin [3] established an accurate and rapid warpage prediction model using a backpropagation neural network with the ability to predict the warpage of plastic within an error range of 2%. Manjunath and Krishna [4] developed an ANN model to predict dimensional shrinkage with an error level of less than 10%. Everett [5] developed a subspace artificial neural network to predict molding cooling changes in injection molding. This method provided good predictability of the cavity temperature profile for varying processing conditions and uncertain disturbances. After constructing a surrogate model, an optimization algorithm such as a genetic algorithm (GA), particle swarm optimization (PSO), or simulated annealing (SA) could be applied to find a global optimal result. An ANN coupled with the GA method [6–8] was used to optimize the process parameters in injection molding with the aim of minimizing warpage, sink marks, and dimension errors in plastic products. Spina [9] developed an ANN-PSO approach that showed advantages in the reduction of volumetric shrinkage through optimization. The simulated annealing method [10] was used to discover an optimal set for achieving lower warpage. However, the consistency of the real production environment and the surrogate model built with experimental background data cannot be guaranteed. Although the above static methods demonstrate promising results based on the surrogate model, the obtained process parameter settings can be invalid in the setup of a real machine.

Research works on the dynamic optimization of process parameters try to avoid the limitations of static methods. The dynamic optimization method includes an expert system, case-based reasoning, and fuzzy inference system. These systems are based on a knowledge or case database that is used for imitating the human trial-and-error process in optimization tasks. For example, Yang [11] presented a knowledge-based tuning (KBT) method. This method estimates the process window persistently for meeting the product specifications. An integrated intelligent system [12] takes advantage of the fuzzy inference model based on expert knowledge and practical experience for corrections and process optimization. A fuzzy inference system [13] was developed for injection molding where fuzzy reasoning combined with human reasoning was used to obtain the qualified product with fewer trials. A case-based reasoning application [14] was proposed for the production of high-precision drippers. The defects of drippers were avoided by using domain expert knowledge based on the relationship between quality and process parameters.

The performance of the above systems is directly influenced by the knowledge, rule, or case database. However, accurate and adequate domain knowledge, the primary component for inference in optimization, is difficult and complicated to acquire. In addition, extracting knowledge from humans instead of learning from data is inefficient since new data are generated persistently in practical production.

The dynamic optimization procedure is a sequential decision problem that can be modeled as a Markov Decision Process (MDP). The MDP can be solved by reinforcement learning (RL) [15]. Reinforcement learning combined with deep artificial neural networks demonstrates powerful performance in a host of fields such as games, robotics, and natural language processing [16,17]. In the fields of medical research, engineering, and finance, there are also some preliminary applications [18,19] based on reinforcement learning. Since process parameter optimization is a highly dynamic and complex decision-making process in injection molding, a decision system which consists of a novel reinforcement learning framework and a self-prediction artificial neural network model (SP-RL) is developed. Several process parameters including the mold temperature, melt temperature, packing pressure, and packing time are considered as control variables. The dimension precision is the quality index of the variable optimization. Through orthogonal experimental design, a series of simulation experiments are conducted using the Moldflow software. Then, a self-prediction model and a hybrid decision agent modeled by MDP are trained successively based on the background data. The decision model contains a replay buffer and an Actor-Critic model built by two neural networks. A deep reinforcement learning algorithm is utilized for training the decision model that learns strategies to adjust the process parameters without empirical rules or expert knowledge. Finally, the feasibility of the proposed method is confirmed by practical experiments of a lens when optimizing the processing parameters.

2. Decision system scheme

The proposed system is an intelligent decision agent that optimizes the process parameters in injection molding. For this work, an online decision model that combines a quality self-prediction model is developed. Through design of experiments and simulation software analysis, background data are obtained. These data are data pairs that consist of the quality index and its corresponding process condition. A self-prediction model is trained by the background data to predict the quality index based on the process condition. Then, the self-prediction model is used as a simulated injection machine and provides a training environment for the decision model by applying a reinforcement learning algorithm. Afterward, the trained decision model is packaged as an application that is deployed in an industrial computer connected to an injection machine for online decisions. The application receives feedback from the engineers and adjusts the process parameters automatically based on knowledge learned from the offline training data. The construction procedure is described below, and a flowchart of the proposed method is shown in Fig. 1.

1) Design of experiments and background data acquisition: to acquire a certain amount of background data that can characterize the forming properties of the part under the selected process parameters and quality indexes in plastic injection molding.

2) Trained self-prediction quality model: to establish a nonlinear ANN regression model that can map the complex relationship between the process parameters and quality indexes by using the background data.

3) Trained RL decision model: to build and train a decision model for learning process parameter adjustment strategies through a reinforcement learning algorithm. Many interactions with the environment are needed to train a reinforcement learning agent, but this is unrealistic in real production considering the cost. Therefore, a preestablished prediction model is used to simulate a real injection machine to deal with the problem of insufficient data. The training procedure of the decision model interacts with this prediction model for rapid response instead of directly interacting with a real environment.

Fig. 1 Flow chart of proposed system.

2.1. Offline background data acquisition

The background data is the basis for training the prediction model. Thus, it is crucial that the obtained data fully reflect the relationship between the quality indexes and process parameters under constraints. Instead of conducting full factorial experiments, experiments of different process conditions are conducted through an orthogonal experimental design. This is widely used in injection molding research and can minimize the number of experiments while ensuring the validity and robustness of the data. Simulation experiments are performed using the Moldflow software. Then, the conditions and results of simulation experiments construct the background data set of the process parameters and quality indexes, which can be expressed as

$$D = \{(X_1, y_1), (X_2, y_2), \ldots, (X_i, y_i), \ldots, (X_n, y_n)\} \quad (1)$$

where X_i is the process parameter vector, y_i is the target vector, and i is the serial number of the experiment.
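For illustration only (not the authors' code), the following sketch shows one standard way to build an L25 orthogonal array and the corresponding process parameter matrix; the modular construction and the mapping of array columns to factors are our assumptions, with level values taken from Table 2.

```python
import numpy as np

# Minimal sketch: construct an L25(5^6) orthogonal array by the standard
# modular construction and keep four columns for the four process parameters.
p = 5                                                  # number of levels
runs = [(i, j) for i in range(p) for j in range(p)]    # 25 experimental runs
cols = [[i for i, j in runs], [j for i, j in runs]] + \
       [[(i + k * j) % p for i, j in runs] for k in range(1, p)]
oa = np.array(cols).T[:, :4]                           # 25 x 4 array of level indices

levels = {                                             # factor levels from Table 2
    "packing_pressure_MPa": [20, 40, 60, 80, 100],
    "packing_time_s":       [5, 10, 15, 20, 25],
    "melt_temperature_C":   [280, 290, 300, 310, 320],
    "mold_temperature_C":   [70, 80, 90, 100, 110],
}
X = np.array([[vals[k] for vals, k in zip(levels.values(), run)] for run in oa])
# Each row of X is one simulation condition X_i; running Moldflow on each
# condition yields the thickness triple y_i, giving the data set D of Eq. (1).
```

Each pair of columns of such an array covers all 25 level combinations exactly once, which is what lets 25 runs stand in for the 5^4 = 625 runs of a full factorial design.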

2.2. Quality self-prediction modeling

A prediction mapping model can be built to predict output corresponding to the input after the background data are obtained. The model in this paper is a multilayer neural network that is used to predict the target values under different process parameter conditions. A network with hidden layers has the ability of nonlinear mapping. Thus, neural networks have been extensively used in engineering applications for prediction and optimization. In plastic injection molding, prediction models based on neural networks have proven effective in predicting quality indexes or classifying defects such as short shots, shrinkage, and warpage. Our study case for the experiments is an optical lens that requires high dimensional accuracy. The lens is aspheric with 46-mm effective diameter and 6.91-mm edge thickness. The edge thickness calculated by three measurement points is selected as the key dimension for production requirements. These values are continuous real numbers, so our prediction model is a multiobjective regression model in which a sample input has more than one target output. In this study, the dimension of input data is relatively low. Hence, a neural network model with two hidden layers is designed. The structure of the neural network can be seen in Fig. 2. The figure shows that the network has one input layer consisting of four neurons standing for four process parameters and an extra bias neuron; two hidden layers with 20 and 15 neurons, respectively; and one output layer having three neurons representing three predicted values of the dimension.

Fig. 2 Structure of self-prediction model.

By iteratively updating the connection weights to minimize the mean square error between the predicted target values and the experimental values, a precise model can be built to predict the edge thickness values under different process parameter conditions. These thickness values are continuous. Thus, the model's output layer is a sigmoid layer, which means that the activation function of neurons at the output layer is a sigmoid function. A sigmoid function can map an n-dimensional vector of arbitrary real values to another n-dimensional vector of continuous values in the range (0, 1). This function is expressed as

$$p(y^{(i)} \mid x^{(i)}; w) = \frac{1}{1 + e^{-w^T x^{(i)}}} \quad (2)$$

A multilayer neural network is sensitive to input feature scaling, so it is necessary to scale the data to account for the different value ranges of different parameters. Here, each process parameter on the input vector x^(i) is scaled to [0, 1], and the transformation is given by

$$x^{(i)}_{scaled} = \frac{x^{(i)} - x_{lowerbound}}{x_{upperbound} - x_{lowerbound}} \quad (3)$$

where x_upperbound is the vector of the parameters' upper bound. Similarly, x_lowerbound is the vector of the parameters' lower bound. The self-prediction model uses the Square Error loss function for regression, written as

$$J(w) = \frac{1}{2}\sum_{i=1}^{m} \left\| \hat{y}^{(i)} - y^{(i)} \right\|^2 + \frac{\lambda}{2}\, \|w\|_2^2 \quad (4)$$

where ŷ^(i) is the model prediction vector, w represents the weight parameters in the neural network, and λ is a regularization term that helps to avoid overfitting by penalizing weights with large magnitudes.

To train the prediction model from background data, a suitable training algorithm is responsible for the model convergence. Hence, an algorithm named the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) [20] is utilized in minimizing the loss function J(w), which approximates the inverse of the Hessian matrix H⁻¹ to perform weight parameter updates. L-BFGS performs better on small datasets than the Stochastic Gradient Descent (SGD) algorithm and its variants. The overall procedure for training the prediction model is provided in the Appendix Scheme 1.
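As an illustration only (not the authors' implementation), a comparable self-prediction network can be sketched with scikit-learn, whose MLPRegressor also trains with L-BFGS and an L2 penalty. One caveat: MLPRegressor uses an identity output layer rather than the sigmoid output described above, so in this sketch the thickness targets are simply left in millimeters.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Illustrative sketch of a 4-20-15-3 self-prediction model trained with L-BFGS.
lo = np.array([20.0, 5.0, 280.0, 70.0])     # parameter lower bounds (Table 1)
hi = np.array([100.0, 25.0, 320.0, 110.0])  # parameter upper bounds (Table 1)

def scale(X):
    # Eq. (3): min-max scaling of each process parameter to [0, 1]
    return (np.asarray(X) - lo) / (hi - lo)

model = MLPRegressor(hidden_layer_sizes=(20, 15),  # two hidden layers, as in Fig. 2
                     solver="lbfgs",               # L-BFGS, suited to small data sets
                     alpha=1e-4,                   # L2 regularization term (lambda)
                     max_iter=1000)
# model.fit(scale(X), Y)               # X: 25 x 4 conditions, Y: 25 x 3 thicknesses
# y_hat = model.predict(scale(X_new))  # predicted edge thickness triple [mm]
```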

2.3. Markov decision process modeling

The decision model is a data learning model based on deep reinforcement learning through interacting with the injection environment. The reinforcement learning method is not concerned with the specific form of the input; rather, it focuses on what action should be taken under the current state to achieve the final goal. Fig. 3 illustrates the framework for the proposed decision model to address our problems. The learner and decision-maker is called an agent. The objects it interacts with, comprising everything outside the agent, are collectively named the environment. In this paper, the decision agent is named the Plastic Injection Molding (PIM) agent, and the environment is named the injection environment. The PIM agent continuously interacts with the injection environment, which returns rewards to the agent. Through cumulative rewards, the agent is expected to learn process parameter control in order to meet the production specifications as a final goal. Before deploying the decision model to an online production environment, the agent should be trained to save costs by interacting with our self-prediction model.

Fig. 3 Proposed structure of reinforcement learning decision model.

The MDP framework is introduced here to mathematically describe our decision-making problem. MDP is a classical formalization of sequential decision-making. It is an idealized form of the reinforcement learning problem for which precise theoretical statements can be made. An MDP consists of a tuple of five elements (S, A, P, r, γ), where S is a finite set of states, A is a finite set of actions, P is a state transition probability function P(s′ | s, a), r is a reward function r(s, a), and γ is a discount factor with γ ∈ [0, 1], which indicates the present value of future rewards. In this paper, the decision-making problem being studied is also modeled as an MDP. Definitions of the tuple (S, A, P, r, γ) related to our task are as follows:

State space S: A state s_t ∈ S is defined as a vector that combines the current values of the process parameters and thickness. It is described as follows:

$$s_t = \{p_t, y_t\} \in S \quad (5)$$

$$p_t = \{p_t^1, p_t^2, \ldots, p_t^k, \ldots, p_t^m\}, \quad y_t = \{y_t^1, y_t^2, \ldots, y_t^j, \ldots, y_t^q\} \quad (6)$$

where p_t^k is the current value of the kth process parameter, and y_t^j is the current value of the jth quality index. When a mold trial is done at step t, both the process parameter settings and the corresponding results are taken as references to adjust the process parameters. A new state s_{t+1} can be observed under the new process parameter conditions. By repeating the procedure and interacting with the injection environment, an optimal process parameter condition that meets the quality requirements can be acquired.

Action space A: An action recommends an adjustment amount of the process parameters based on the current s_t, denoted as follows:

$$a_t = \{a_t^1, a_t^2, \ldots, a_t^k, \ldots, a_t^m\} \in A \quad (7)$$

The m process parameters are controlled by the PIM agent within the constraints. Thus, the real process parameters at each time step can be written as

$$p_t = p_{init} + \sum_{i=0}^{t} a_i \quad (8)$$

This means that the current process parameter setting is a sum of the initial values plus all adjustments before time step t. Each a_t^k is limited to the range [-1, 1] for convenient model training. Thus, the actual adjustment value is the product of a_t^k and a step size for each process parameter. For example, suppose the kth variable in the vector represents the packing pressure, and its preset step size is 20 MPa. Then, when a_t^k = 0.5, the PIM agent proposes an increase in the packing pressure by 10 MPa before the next mold trial while interacting with the injection machine.

Jo

a future state st 1 . It is desirable to predict future states accurately. However, it is impossible for us to predict the future state in a practical manner because the complex relationship between the process parameters and the quality of the product is unknown. This is why we should build a prediction model and employ that model as a transition function approximation. The prediction of the transition function gives an indication of the likeliest next state, which can be rewritten as

12

Journal Pre-proof p(st 1 | st , at )  { pt , yˆt }

(9)

yˆt  M ( pt | w)

(10)

Reward R: One time step later, as a consequence of the PIM agent's actions, a new process parameter condition is generated. Then, the injection machine runs on this set of parameters, and the agent receives a numerical reward. The immediate reward the agent receives at any time step is a function of the current state and the control action taken by the agent, given by r(s_t, a_t). The immediate reward is necessary for the agent to judge the goodness or badness of the actions it takes. For our purposes, the agent should obtain an optimal process parameter setting within the shortest time step. Thus, the reward function is given by

$$r(s_t, a_t) = -\left(\alpha\, \delta_t^2 + \beta\, t^2\right) \quad (11)$$

where δ_t is a vector that represents the difference between the current values and required values of the quality indexes, and α and β are importance weights given to the quality index and time step, respectively.

Discount factor γ: The discount factor value ranges from 0 to 1. In particular, when γ = 0, the PIM agent only considers the immediate reward to take action. Conversely, when γ = 1, the agent will take action by considering all future rewards.
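To make the MDP concrete, the following is a minimal environment sketch of our own (not from the paper): it wraps a fitted prediction model M as the transition function, scales actions in [-1, 1] by the per-parameter step sizes, and returns the reward of Eq. (11). The weight values alpha and beta and the termination tolerance (±5 μm = 0.005 mm, from Section 3.3) are assumptions.

```python
import numpy as np

class InjectionEnv:
    """Sketch of the injection environment; model M stands in for the machine."""
    def __init__(self, model, step_sizes, lo, hi, target,
                 alpha=1.0, beta=0.01, tol=0.005):
        self.model = model                      # maps parameters -> thicknesses
        self.step = np.asarray(step_sizes)      # per-parameter step sizes (Table 1)
        self.lo, self.hi = np.asarray(lo), np.asarray(hi)
        self.target, self.alpha, self.beta, self.tol = np.asarray(target), alpha, beta, tol

    def reset(self):
        self.t = 0
        self.p = np.random.uniform(self.lo, self.hi)      # random initial setting
        return self._state()

    def _state(self):
        y = self.model.predict(self.p.reshape(1, -1))[0]  # Eqs. (9)-(10)
        return np.concatenate([self.p, y])                # s_t = {p_t, y_t}, Eq. (5)

    def step(self, action):
        self.t += 1
        # actual adjustment = action in [-1, 1] times step size, then Eq. (8)
        self.p = np.clip(self.p + np.asarray(action) * self.step, self.lo, self.hi)
        s = self._state()
        delta = s[len(self.p):] - self.target             # deviation from target
        reward = -(self.alpha * np.sum(delta ** 2) + self.beta * self.t ** 2)  # Eq. (11)
        done = bool(np.all(np.abs(delta) <= self.tol))    # within +/- tol of target
        return s, reward, done
```

Here any input scaling the prediction model expects is assumed to be handled inside model.predict.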

Using the notations and definitions above, the problem of process parameter optimization can be formally described as follows: through interactions between the agent and the injection environment, the agent is expected to find control strategies that can maximize the cumulative rewards. In MDP, the return G_t is the total discounted future reward from time step t and is written as

$$G_t = \sum_{i=t}^{T} \gamma^{\,i-t}\, r(s_i, a_i) \quad (12)$$

An action-value function is the expected return starting from state s, taking action a, and then following a policy π:

$$Q^{\pi}(s_t, a_t) = \mathbb{E}_{\pi}\left[G_t \mid s_t = s, a_t = a\right] \quad (13)$$

In this paper, the policy is a strategy to control the process parameters according to the feedback of the injection environment. A deterministic policy is used and described as a function π using a greedy policy:

$$\pi(s) = \operatorname{argmax}_a\, Q(s, a) \quad (14)$$
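For concreteness, a short worked check of Eq. (12) with illustrative numbers of our own:

```python
# Discounted return of Eq. (12) from t = 0 for a short episode.
gamma = 0.99
rewards = [-3.2, -1.8, -0.6, -0.1]                       # illustrative r(s_i, a_i)
G0 = sum(gamma ** i * r for i, r in enumerate(rewards))  # approx -5.67
```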

There are two basic types of training algorithms in reinforcement learning: policy based (e.g., policy gradient) and value based (e.g., Q-learning). The integration of the two, which is named the Actor-Critic method, is used in our paper to improve the learning efficiency and ensure convergence. The agent architecture consists of two components: an Actor-Network and a Critic-Network. Both the Actor-Network and Critic-Network are neural networks that estimate the policy function as π_θ(s) and the action-value function as Q_ω′(s, a), where θ and ω′ are weight parameters. The Actor receives a state and outputs an action expressed by an m-dimensional vector, which means adjustments of the total m process parameters. Then, the action taken by the Actor is evaluated by the Critic. The evaluation is indicated by the value Q_ω′(s_t, a_t) of taking a_t under s_t. Fig. 4 shows the interactions between the Actor-Network, Critic-Network, and injection environment at each time step.

Fig. 4 Learning framework of Actor-Critic decision model.

The core of the proposed methodology is the PIM agent. A well-trained agent can optimize the process parameters effectively by using a good strategy in the form of π_θ(s). Through some molding trials under random initial conditions, the trained agent can always get an optimal process parameter condition to fit the quality index requirements.

The training process needs to interact with the injection environment. This involves inputting a set of process parameters to the injection machine. This produces a part that can then be measured by specific indicators. This procedure is very time-consuming and costly. Therefore, the self-prediction model is used here as an environment simulator owing to its ability to map from process parameters to thickness values. It is used as a substitute for a real injection machine in training the decision model agent.

To train the PIM agent, a replay buffer [16] module is introduced to accelerate the convergence of the agent training. This is a mechanism that stores the experiences of interaction between the agent and environment at each time step in the form of a transition tuple e_t = (s_t, a_t, r_t, s_{t+1}). This tuple contains the state and action at each time step, the next state, and the reward given by the environment under the action taken. The replay buffer pools over many episodes and is randomly sampled when the neural networks are trained using the mini-batch Gradient Descent method. This is a good way to break the coherence of data in an episode. In practice, our replay buffer only stores the recently experienced tuples, which means that previous transitions are overwritten with recent transitions owing to the limited capacity of the buffer.

The agent maintains the policy π(a | s) and the value function Q_ω′(s, a) approximation. In order to cope with the exploration challenge of the reinforcement learning algorithm in continuous action space, noise sampled from a noise process is added to construct an exploration policy for training. The policy and the value function are updated after every action is taken until an episode is terminated. The update of the Actor-Network is performed using a sampled policy gradient [21], given as follows:

$$\nabla_\theta J_A = \frac{1}{B}\sum_i \nabla_a Q_{\omega'}(s_i, a)\Big|_{a = \pi_\theta(s_i)}\, \nabla_\theta \pi_\theta(s_i) \quad (15)$$

$$\theta \leftarrow \theta + \eta\, \nabla_\theta J_A \quad (16)$$

The target value of the critic network is given by

$$y_t = r_t + \gamma\, Q_{\omega'}\big(s_{t+1}, \pi_\theta(s_{t+1})\big) \quad (17)$$

The target is a bootstrapping idea that is widely used in dynamic programming. This involves estimating the current action-value function by estimating its successor state. Thus, the Critic is updated to minimize the square error loss with respect to the target. The gradient-updating formulation is as follows:

$$J_C = \frac{1}{B}\sum_i \big(y_i - Q_{\omega'}(s_i, a_i)\big)^2 \quad (18)$$

$$\omega' \leftarrow \omega' - \eta\, \nabla_{\omega'} J_C \quad (19)$$

where B is the batch size and η is the learning rate. Full details of the training procedure are described in the Appendix Scheme 2.
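For illustration only, here is a sketch of our own (PyTorch is an assumed framework) of the replay buffer and one update step following Eqs. (15)-(19). Network widths follow Section 3.2; batch size, buffer capacity, and noise scale are assumptions, and the target networks of the full DDPG algorithm [21] are omitted to match the equations as written.

```python
import random
from collections import deque
import numpy as np
import torch
import torch.nn as nn

def mlp(sizes):
    layers = []
    for i in range(len(sizes) - 2):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.ReLU()]
    return nn.Sequential(*layers, nn.Linear(sizes[-2], sizes[-1]))

state_dim, act_dim = 7, 4   # 4 process parameters + 3 thickness values
actor = nn.Sequential(mlp([state_dim, 256, 256, 128, 64, act_dim]), nn.Tanh())
critic = mlp([state_dim + act_dim, 256, 256, 128, 64, 1])
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)    # lr from Section 3.2
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
buffer = deque(maxlen=100_000)  # old transitions are overwritten when full

def act(state, noise_std=0.1):
    # Exploration policy: deterministic action plus sampled noise, clipped to [-1, 1]
    with torch.no_grad():
        a = actor(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))[0]
    return torch.clamp(a + noise_std * torch.randn(act_dim), -1.0, 1.0).numpy()

def update(batch_size=64, gamma=0.99):
    batch = random.sample(buffer, batch_size)   # random mini-batch breaks coherence
    s, a, r, s2 = (torch.as_tensor(np.array(x), dtype=torch.float32)
                   for x in zip(*batch))
    # Critic: regress toward the bootstrapped target of Eq. (17), Eqs. (18)-(19)
    with torch.no_grad():
        target = r.unsqueeze(1) + gamma * critic(torch.cat([s2, actor(s2)], dim=1))
    critic_loss = ((target - critic(torch.cat([s, a], dim=1))) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor: ascend the sampled policy gradient of Eqs. (15)-(16)
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```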


3. Experiment validation and results

In this section, the experiment settings are explained and simulations are performed to acquire background data through an L25 orthogonal experimental group for training the prediction model and decision model. The effectiveness and robustness of the proposed decision system are evaluated by validation experiments on an optical lens.

3.1. Experiment setup

The experimental case utilized in this study is an optical lens, and the critical dimensions are defined in Fig. 5 (a). The diameter of this lens is 54 mm, while the effective diameter is 46 mm. The aspherical vertex curvatures are R 52 mm and R 33 mm, respectively, and the edge thickness is 6.91 mm. In this study, the edge thickness is selected as the quality index. The edge thickness is a significant specification in lens assembly. In order to further ensure the forming accuracy of the lens, three measuring points detailed in Fig. 5 (b) are set on the part for edge thickness measurements. The mold of the lens is shown in Fig. 5 (c).

Fig. 5 Experimental case: (a) critical dimensions of the lens, (b) the position of measuring points of the lens and (c) the injection machine, the lens mold and the lens product.

The experimental material used for the lens is a type of polycarbonate (Lexan OQ2720 PC). This was selected for its good comprehensive performance. Fig. 6 shows the pressure-volume-temperature (PVT) properties of the material. An offline numerical simulation is performed on the Moldflow software for background data acquisition, while online practical production is conducted on a JSW J110ADC-180H electric injection molding machine. The screw diameter is 35 mm, and the clamping force is set to 100 t in the actual experiment. The process parameters to be researched in this work are the mold temperature, melt temperature, packing pressure, and packing time. These parameters are considered to be the process factors that have the greatest impact on the part dimensions according to previous research [22].

Fig. 6 The properties chart of the material Lexan OQ2720 PC: (a) rheological and (b) PVT.

The optimized injection speed (screw speed) is directly used in the experiment. The part is an aspheric lens that has special geometric characteristics and high optical requirements. In order to meet the above requirements, a multistage injection speed is adopted in the process of injection molding. Therefore, in order to avoid injection defects and ensure the formability of the lens, the injection molding speed was optimized in advance through a short injection experiment. The injection speed curve is shown in Fig. 7. The injection speed is divided into four stages. The speed of the first stage is 10 mm/s to enable the melt to enter the cavity quickly and reduce the melt temperature drop. In the second stage, melt flows through the gate at a lower injection speed of 4 mm/s to avoid appearance defects caused by the injection. The third stage is the molding stage of the part body. A high-speed injection of 25 mm/s is adopted to shorten the molding cycle and reduce the melt temperature difference in the mold cavity. In addition, the high-speed injection can improve the surface gloss of the product. Finally, the melt front is located at the end of the product and is injected at a low speed of 20 mm/s to facilitate gas emission.

Fig. 7 Injection speed setting in the experiment.

The constraints and adjustment step size settings for the process parameters in our experiment are detailed in Table 1. Five experimental levels were designed for each parameter, and specific details are listed in Table 2.

Table 1 Constraints and adjustment step sizes of process parameters.

Process parameter      | Lower bound | Upper bound | Small step size | Large step size
Packing Pressure [MPa] | 20          | 100         | 10              | 15
Packing Time [s]       | 5           | 25          | 2               | 3.5
Melt Temperature [℃]   | 280         | 320         | 5               | 7.5
Mold Temperature [℃]   | 70          | 110         | 5               | 7.5

Table 2 Process variables and their levels for experiment of design.

Factor                 | Level 1 | Level 2 | Level 3 | Level 4 | Level 5
Packing Pressure [MPa] | 20      | 40      | 60      | 80      | 100
Packing Time [s]       | 5       | 10      | 15      | 20      | 25
Melt Temperature [℃]   | 280     | 290     | 300     | 310     | 320
Mold Temperature [℃]   | 70      | 80      | 90      | 100     | 110

3.2. Offline simulation and decision model building

An L25 orthogonal experimental design was chosen for the simulation experiments according to the number of variables and levels. The mesh type of the 3D lens model is a tetrahedral unit, and the quantity is 490,753. Twenty-five orthogonal simulation experiments were performed by the Moldflow software on a Windows server. In order to obtain accurate results, all cases were fully analyzed, including filling, cooling, packing, and warpage analysis. The critical dimension values were extracted from the postprocessing result in Moldflow for constituting the background data, which is listed in Appendix Table 3. Fig. 8 presents the warpage results of different process conditions. The values of the edge thickness are not stable within the reasonable range owing to differences in the warpage effect.

A self-prediction model with a 20-15-3 structure is trained with the background data. This artificial neural network is expected to approximate the relationship between the edge thickness values and the process parameters. By updating the connection weights, the square error between the network prediction and the target value in the training data declines gradually and converges to 1.5087E-05 after 217 iterations through the L-BFGS algorithm. The R² of the final prediction model is 0.998, which indicates accurate predictions of the background data.

Fig. 8 Illustrative examples of warpage results by different process conditions.

The PIM agent is trained by interactions with the self-prediction model. The Actor and Critic networks are nonlinear function approximators. The Actor network structure is empirically set to 256-256-128-64-4, and the Critic network structure is set to 256-256-128-64-1. The neuron numbers of their output layers are different since the Actor outputs adjustments of the process parameters while the Critic outputs the action value. The learning rate of these neural networks is set to 0.001. Fig. 9 shows the accumulated rewards obtained by the decision agent during one episode while training. The reward curve represents the total rewards in an episode obtained by the agent. The curve is smoothed by the moving average. It can be seen that the reward value fluctuates violently most of the time. This is because the model-free reinforcement learning needs to explore the entire state space fully in order to learn proper strategies. Finally, the rewards obtained by the agent converge to a maximum value, which means that our agent has learned how to control the process parameters according to different process parameter conditions.

Fig. 9 Training procedure of PIM agent.

3.3. Validation and convergence analysis

In order to verify the validity of the proposed method, validation experiments are carried out using the different step sizes listed in Table 1. The thickness dimension T of the lens part and its deviation Δ are measured and calculated. The deviation of the edge thickness must be within Δ = ±5.0 μm for a qualified lens. The dynamic optimization procedures of the PIM agent using different adjustment step sizes are listed in Table 4 and Table 5. The initial process parameters in each table are the same and are randomly generated within the process parameter constraints. The critical dimension thickness is measured, and the deviation is calculated after each trial. The optimizations stop when the edge thickness meets the quality requirement.

Table 4 Convergence validation experiment for PIM agent with small step size.

Trial No. | Packing pressure [MPa] | Packing time [s] | Melt temperature [℃] | Mold temperature [℃] | Δ1 [μm] | Δ2 [μm] | Δ3 [μm]
0 | 92    | 9    | 304   | 86   | -32 | -26 | -35
1 | 94.5  | 11.0 | 302.2 | 81.0 | -25 | -18 | -24
2 | 96.0  | 13.0 | 300.7 | 76.0 | -18 | -15 | -19
3 | 96.6  | 15.0 | 300.2 | 71.0 | -13 | -8  | -14
4 | 97.6  | 17.0 | 299.8 | 70.0 | -9  | -3  | -8
5 | 100   | 19.0 | 298.0 | 71.0 | -2  | +2  | -3

Table 5 Convergence validation experiment for PIM agent with large step size.

Trial No. | Packing pressure [MPa] | Packing time [s] | Melt temperature [℃] | Mold temperature [℃] | Δ1 [μm] | Δ2 [μm] | Δ3 [μm]
0 | 92    | 9    | 304   | 86   | -32 | -26 | -35
1 | 95.7  | 12.5 | 301.3 | 78.5 | -18 | -17 | -22
2 | 97.4  | 16.0 | 299.9 | 71.0 | -7  | -7  | -11
3 | 100.0 | 19.5 | 298.5 | 70.0 | 0   | +1  | -3

Plots (a) and (b) in Fig. 10 show the convergence results of the proposed method. For comparison, the convergence results of the fuzzy inference system are shown in Fig. 10 (c)–(d). The quality of the lens can be guaranteed to the required dimensional precision through six and four mold trials by the proposed system. When the initial process parameters are the same, the adjustment process for the large step size is two mold trials fewer than the adjustment process for the small step size. On the other hand, the fuzzy system with a small step size can produce a qualified lens through seven trials, but it fails with a large step size. Since the process window of a high-precision product is small, a qualified product may not be obtained when a large step optimization is applied. Thus, successful optimization by a fuzzy system depends on both the step size and the initial process parameters. By contrast, the adjustment amount of a parameter in the proposed system is determined by the product of the action and the step size, which can avoid the oscillation when the adjustment step size is too large. The results illustrate that both step size settings can produce a qualified lens in fewer steps, and further, the optimization converges more quickly and is more stable compared with a fuzzy inference system.

Fig. 10 Convergence results of dynamic optimization: (a) proposed system with small step size, (b) proposed system with large step size, (c) fuzzy system with small step size and (d) fuzzy system with large step size.

The specific adjustment paths of the process parameters in the proposed system are shown in Fig. 11 (a)–(d). The final process parameter settings are similar, but the adjustments with a large step size are more direct than adjustments with a small step size. The adjustment path for a small step size is tortuous near the dimension control bound. This indicates that the adjustment policy of the PIM agent is not always optimal, which causes the agent to detour near the effective quality area. This phenomenon is attributed to the fact that the agent did not fully explore the state space during training, so the agent becomes confused and proposes an inappropriate action when it encounters an unfamiliar state. The validation results demonstrate that the proposed dynamic optimization method is able to optimize the process parameters in fewer mold trials from random initial conditions. The method involves a trial-and-error process just like a human being, and it accumulates the experience and knowledge of process parameter control by using a learning model.

Fig. 11 Adjustment for process parameters: (a) packing pressure, (b) packing time, (c) melt temperature and (d) mold temperature.

3.4. System robustness analysis

The robustness of the proposed method is verified by both offline and online experiments. The optimization results are compared with an ANN coupled with the GA method. The ANN coupled with GA is a common static optimization method used in injection molding. The GA is able to find global optimal solutions given an objective function. A self-prediction model is used for GA optimization. Since this is a multiobjective optimization problem, the NSGA-II algorithm [23] is adapted. The first rows in Table 6 and Table 7 list the results. The max deviation of the lens edge thickness is +1 μm in the offline experiment and increases to +11 μm in the online experiment with the same process parameter settings. The explanation is that the self-prediction model is built with offline simulation data, so the GA optimization based on the offline model achieves high precision in the offline experiment. However, there is a gap between the offline model and the online practical environment, which makes the same settings fail in the online environment.

Table 6 Process optimization of GA and PIM agent for offline environment.

Method    | Trial No. | Packing pressure [MPa] | Packing time [s] | Melt temperature [℃] | Mold temperature [℃] | Max deviation Δmax [μm]
GA        | --        | 92.7  | 24.8 | 286.8 | 71.4 | +1
PIM agent | 0         | 60    | 15   | 300   | 90   | -37
PIM agent | 1         | 75.0  | 18.5 | 296.5 | 82.5 | -28
PIM agent | 2         | 90.0  | 22.0 | 293.6 | 78.2 | -12
PIM agent | 3         | 100.0 | 25.0 | 291.2 | 82.5 | -3

Table 7 Process optimization of GA and PIM agent for online environment.

Method    | Trial No. | Packing pressure [MPa] | Packing time [s] | Melt temperature [℃] | Mold temperature [℃] | Max deviation Δmax [μm]
GA        | --        | 92.7  | 24.8 | 286.8 | 71.4 | +11
PIM agent | 0         | 100.0 | 25.0 | 291.2 | 82.5 | +15
PIM agent | 1         | 96.4  | 24.3 | 298.6 | 75.1 | +9
PIM agent | 2         | 92.3  | 23.1 | 306.1 | 82.3 | +2

By contrast, the proposed dynamic method achieves the desired results in both experiments. First, the initial process parameters are set to the middle values of the given constraints in the offline experiment. The optimized parameters are obtained after three mold trials of adjustment, and the max deviation of the edge thickness decreases from 37 to 3 μm. Then, an experiment with this set of process parameters is conducted in the online environment, and the dimension requirements are no longer met. Owing to state changes at this time, the PIM agent will continue to adjust the process parameters, and a new set of required process parameters is obtained in the subsequent two trials. The final optimized results for the GA and PIM agent in the online experiment are shown in Fig. 12.

Fig. 12 Optimized process parameters for online production: (a) lens optimized by GA and (b) lens optimized by PIM agent.

The adjustment details of different process parameters are shown in Fig. 13. As can be seen from the figure, the packing pressure and packing time are positively correlated with the lens edge thickness. The value of the melt temperature increases when a positive thickness deviation occurs, which indicates a negative correlation. The situation is more complicated for the mold temperature. Owing to the coupling between the process parameters, the correlation varies across different phases of the adjustment. The PIM agent can address the optimization problem both offline and online by learning the correlations between the process parameters and the quality indexes from the background data.

Fig. 13 Dynamic optimization from offline to online process.

In summary, the experiment results show that a static optimization method such as GA can obtain the optimal settings of the process parameters based on an accurate model. However, accurate modeling of the actual injection molding process is almost impossible. The RL decision system learns the parameter adjustment strategy that should be adopted in different states through offline training. Although the model is not exactly the same as the actual environment, it retains the relationship between different process parameters and the quality index. Therefore, the decision system can transfer knowledge to online environment optimization and ensure that process parameters meeting the production requirements can be obtained.

29

Journal Pre-proof

robustness of the proposed decision system, continuous production is performed for different methods, including the static GA optimization method, the dynamic optimization based on fuzzy inference and the proposed decision system. Continuous production of 30 cycles is conducted for each method. The

pro of

first 10 cycles are used for optimization and the last 20 cycles are for stable production. The initial process parameters setting in this experiment is given by the GA optimization method. The process capability index is used to measure the stability of mold cycles. The process capability index is an effective statistical measure which is widely adopted in many manufacturing processes and is expressed as:

USL     LSL , ) 3 3

(20)

re-

C pk  min(

where USL and LSL represent upper limit and lower limit of the lens thickness respectively, 

lP

denotes the average thickness,  denotes the standard deviation.
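As a quick worked check of Eq. (20) (our own illustration; the limits 6.91 ± 0.005 mm follow from the ±5 μm tolerance of Section 3.3):

```python
# C_pk from the reported statistics of the RL decision system (Table 8):
# mu = 6.9102 mm, sigma = 0.00093 mm, LSL = 6.905 mm, USL = 6.915 mm.
mu, sigma, lsl, usl = 6.9102, 0.00093, 6.905, 6.915
cpk = min((usl - mu) / (3 * sigma), (mu - lsl) / (3 * sigma))
print(round(cpk, 3))  # 1.72, matching the value reported in Table 8
```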

The results of edge thickness with respect to the production cycles are shown in Fig. 14. As can be seen from the figure, the parameter setting obtained by the GA method cannot produce qualified lenses due to the gap between the offline and online environments. Meanwhile, the static optimization method is unable to dynamically adjust the process parameters. Conversely, for the fuzzy inference system and the proposed decision system, the lens edge thickness meets the requirements after 6 and 2 adjustment cycles, respectively.

Fig. 14 Online mold cycles of different adjustment methods.

The process capability results are shown in Table 8. Since the average thickness of the GA optimization method is not in the range between USL and LSL, the C_pk value is meaningless. The proposed decision system maintains a more stable status in continuous production, and the C_pk value increases from 0.315 to 1.720 compared to the fuzzy inference system. There are substandard products in the last 20 cycles since the optimization by the fuzzy inference system is close to the LSL. The products optimized by the decision system are located in the middle of the limits, so the process condition can sustain disturbances to some extent. Therefore, the proposed decision system is more effective and robust for producing ultra-high precision lenses compared to the traditional process parameter optimization methods.

Table 8 Process capability of different methods.

Method                     | Average thickness [mm] | Standard deviation [mm] | Cpk (last 20 cycles)
GA method                  | 6.8942 | 0.00130 | --
Fuzzy inference system     | 6.9062 | 0.00127 | 0.315
RL decision system (SP-RL) | 6.9102 | 0.00093 | 1.720

4. Conclusion

Process parameter optimization is a very important procedure in the injection molding process and has not been resolved to date. In order to address the optimization problem for ultrahigh-precision products, a decision system based on reinforcement learning integrated with an ANN self-prediction model was developed for dynamic optimization in injection molding. The following conclusions were drawn from validation experiments performed on a lens part:

1) The trained self-prediction model accurately predicts the lens thickness with an R² of 0.998. The prediction model provides information for training the decision agent, which can learn process parameter optimization without prior rules or knowledge. The proposed system achieves optimization with different step sizes while avoiding oscillation. Qualified lenses are produced by six mold trials with a small step optimization or four mold trials with a large step optimization. The system presents a rapid and stable convergence in process parameter optimization.

2) The correlations between the process parameters and the edge thickness of the lens are learned by the decision agent and can be utilized in both offline and online optimizations. For online production, the max deviation of the edge thickness is +11 μm as optimized by GA, while a max deviation of +2 μm is achieved by this system. Instead of obtaining a fixed set of process parameters by static optimization methods, this method is robust enough to handle both offline and online environments based on knowledge learned from the background data and its ability for dynamic optimization.

Despite the above achievements, the proposed decision model based on reinforcement learning suffers from the challenges that most reinforcement learning methods face. When using neural networks as function approximators, the decision model lacks theoretical convergence guarantees, and the data is used inefficiently. More interpretable approximation methods need to be researched and applied in the decision-making model. In addition, the decision model is trained for a specific task, namely the process optimization of the optical lens in this study. To make the proposed method more stable and general, further research will focus on exploring more reasonable function approximation and transferring knowledge into different injection tasks based on one model by meta-learning or transfer learning.

Acknowledgement

The authors acknowledge financial support from the National Natural Science Foundation of China (Grant No. 51635006, 51675199, 51575207) and the Fundamental Research Funds for the Central Universities (Grant No. 2016YXZD059, 2015ZDTD028).

Appendix

Scheme 1: Parameters Training for the Prediction Model with L-BFGS

Input: Training set D, number of iterations N, regularization term λ
1: Initialize the model M with random weights and shuffle set D randomly
2: for e = 1, N do
3:     Sample all data (x^(i), y^(i)) from D
4:     Predict ŷ^(i) of x^(i) through M
5:     Calculate the loss J(w)
6:     Calculate H⁻¹ by L-BFGS
7:     Update parameters of M:
8:         w ← w − H⁻¹ ∇_w J(w)
9: end for
Output: The trained self-prediction model M

Scheme 2: Parameters Training for the Agent with Mini-batch Gradient Descent

Input: A differentiable policy parameterization π_θ(s)
Input: A differentiable action-value parameterization Q_ω′(s, a)
Input: Training episodes N, batch size B, replay buffer R
Input: The self-prediction model M, the max episode steps t_max
1: Initialize policy weights θ and state-value weights ω′
2: for e = 1, N do
3:     Random initial process setting p_init, and get initial state s_t
4:     Repeat
5:         Output a_t according to π_θ(s) from Actor network
6:         Execute a_t and receive reward r_t, next state s_{t+1} from model M
7:         Store transition e_t in replay buffer R
8:         Sample random B transitions from R
9:         Update Critic and Actor network:
10:            θ ← θ + η ∇_θ J_A,  ω′ ← ω′ − η ∇_ω′ J_C
11:        t ← t + 1, s_t ← s_{t+1}
12:    Until terminal s_T or t > t_max
13: end for
Output: The trained π_θ(s) and Q_ω′(s, a)
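A compact illustration of the Scheme 2 loop, reusing the hypothetical InjectionEnv, act, update, and buffer pieces sketched in Section 2.3 (episode count, step cap, and batch size are assumptions):

```python
# Illustrative episode loop mirroring Scheme 2 (names from the earlier sketches).
N_EPISODES, T_MAX, BATCH = 500, 50, 64

for episode in range(N_EPISODES):
    s = env.reset()                      # random initial process setting p_init
    for t in range(T_MAX):               # Repeat ... until terminal or t > t_max
        a = act(s)                       # Actor output plus exploration noise
        s2, r, done = env.step(a)        # model M supplies r_t and s_{t+1}
        buffer.append((s, a, r, s2))     # store transition e_t in replay buffer R
        if len(buffer) >= BATCH:
            update(BATCH)                # mini-batch update of Critic and Actor
        s = s2
        if done:                         # terminal state s_T reached
            break
```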

Table 3 Orthogonal simulation experiments and results for L25.

No. | Packing pressure [MPa] | Packing time [s] | Melt temperature [℃] | Mold temperature [℃] | T1 [mm] | T2 [mm] | T3 [mm]
1  | 20  | 5  | 280 | 70  | 6.868 | 6.862 | 6.864
2  | 20  | 10 | 290 | 80  | 6.874 | 6.872 | 6.873
3  | 20  | 15 | 300 | 90  | 6.875 | 6.872 | 6.874
4  | 20  | 20 | 310 | 100 | 6.868 | 6.866 | 6.866
5  | 20  | 25 | 320 | 110 | 6.868 | 6.867 | 6.867
6  | 40  | 10 | 280 | 90  | 6.871 | 6.869 | 6.872
7  | 40  | 15 | 290 | 100 | 6.873 | 6.872 | 6.873
8  | 40  | 20 | 300 | 110 | 6.876 | 6.875 | 6.876
9  | 40  | 25 | 310 | 70  | 6.877 | 6.878 | 6.879
10 | 40  | 5  | 320 | 80  | 6.868 | 6.867 | 6.869
11 | 60  | 15 | 280 | 110 | 6.882 | 6.880 | 6.881
12 | 60  | 20 | 290 | 70  | 6.891 | 6.887 | 6.889
13 | 60  | 25 | 300 | 80  | 6.887 | 6.886 | 6.887
14 | 60  | 5  | 310 | 90  | 6.862 | 6.860 | 6.862
15 | 60  | 10 | 320 | 100 | 6.883 | 6.880 | 6.882
16 | 80  | 20 | 280 | 80  | 6.891 | 6.889 | 6.890
17 | 80  | 25 | 290 | 90  | 6.895 | 6.892 | 6.893
18 | 80  | 5  | 300 | 100 | 6.866 | 6.865 | 6.871
19 | 80  | 10 | 310 | 110 | 6.882 | 6.881 | 6.881
20 | 80  | 15 | 320 | 70  | 6.893 | 6.888 | 6.891
21 | 100 | 25 | 280 | 100 | 6.908 | 6.904 | 6.905
22 | 100 | 5  | 290 | 110 | 6.872 | 6.869 | 6.873
23 | 100 | 10 | 300 | 70  | 6.893 | 6.893 | 6.894
24 | 100 | 15 | 310 | 80  | 6.904 | 6.902 | 6.902
25 | 100 | 20 | 320 | 90  | 6.906 | 6.904 | 6.905

References

[1] H. Zhou, Computer modeling for injection molding: simulation, optimization, and control, John Wiley & Sons, 2013.
[2] W.C. Chen, P.H. Tai, M.W. Wang, W.J. Deng, C.T. Chen, A neural network-based approach for dynamic quality prediction in a plastic injection molding process, Expert Syst. Appl. 35 (2008) 843–849.
[3] F. Yin, H. Mao, L. Hua, W. Guo, M. Shu, Back propagation neural network modeling for warpage prediction and optimization of plastic products during injection molding, Mater. Des. 32 (2011) 1844–1850.
[4] P.G.C. Manjunath, P. Krishna, Prediction and optimization of dimensional shrinkage variations in injection molded parts using forward and reverse mapping of artificial neural networks, in: Adv. Mater. Res., 2012: pp. 674–678.
[5] S.E. Everett, R. Dubay, A sub-space artificial neural network for mold cooling in injection molding, Expert Syst. Appl. 79 (2017) 358–371.
[6] H. Kurtaran, B. Ozcelik, T. Erzurumlu, Warpage optimization of a bus ceiling lamp base using neural network model and genetic algorithm, J. Mater. Process. Technol. 169 (2005) 314–319.
[7] W. Guo, L. Hua, H. Mao, Minimization of sink mark depth in injection-molded thermoplastic through design of experiments and genetic algorithm, Int. J. Adv. Manuf. Technol. 72 (2014) 365–375.
[8] K.-M. Tsai, H.-J. Luo, An inverse model for injection molding of optical lens using artificial neural network coupled with genetic algorithm, J. Intell. Manuf. 28 (2017) 473–487.
[9] R. Spina, Optimisation of injection moulded parts by using ANN-PSO approach, J. Achiev. Mater. Manuf. Eng. 15 (2006) 146–152.
[10] C. Yen, J.C. Lin, W. Li, M.F. Huang, An abductive neural network approach to the design of runner dimensions for the minimization of warpage in injection mouldings, J. Mater. Process. Technol. 174 (2006) 22–28.
[11] D. Yang, K. Danai, D. Kazmer, A knowledge-based tuning method for injection molding machines, J. Manuf. Sci. Eng. 123 (2001) 682–691.
[12] H. Zhou, P. Zhao, W. Feng, An integrated intelligent system for injection molding process determination, Adv. Polym. Technol. J. Polym. Process. Inst. 26 (2007) 191–205.
[13] M.K. Karasu, L. Salum, FIS-SMED: a fuzzy inference system application for plastic injection mold changeover, Int. J. Adv. Manuf. Technol. 94 (2018) 545–559.
[14] M.R. Khosravani, S. Nasiri, K. Weinberg, Application of case-based reasoning in a fault detection system on production of drippers, Appl. Soft Comput. 75 (2019) 227–232.
[15] R.S. Sutton, A.G. Barto, Reinforcement learning: An introduction, MIT Press, 2018.
[16] V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski, et al., Human-level control through deep reinforcement learning, Nature 518 (2015) 529.
[17] D. Silver, A. Huang, C.J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al., Mastering the game of Go with deep neural networks and tree search, Nature 529 (2016) 484.
[18] Z. Zhou, X. Li, R.N. Zare, Optimizing chemical reactions with deep reinforcement learning, ACS Cent. Sci. 3 (2017) 1337–1344.
[19] F. Belletti, D. Haziza, G. Gomes, A.M. Bayen, Expert level control of ramp metering based on multi-task deep reinforcement learning, IEEE Trans. Intell. Transp. Syst. 19 (2018) 1198–1207.
[20] R.H. Byrd, P. Lu, J. Nocedal, C. Zhu, A limited memory algorithm for bound constrained optimization, SIAM J. Sci. Comput. 16 (1995) 1190–1208.
[21] T.P. Lillicrap, J.J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, D. Wierstra, Continuous control with deep reinforcement learning, arXiv preprint arXiv:1509.02971 (2015).
[22] M.-C. Huang, C.-C. Tai, The effective factors in the warpage problem of an injection-molded part with a thin shell feature, J. Mater. Process. Technol. 110 (2001) 1–9.
[23] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6 (2002) 182–197.


Declaration of Interest Statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.