Lead time prediction in a flow-shop environment with analytical and machine learning approaches

Lead time prediction in a flow-shop environment with analytical and machine learning approaches

Proceedings,16th IFAC Symposium on Proceedings,16th IFAC Symposium on Information Control Problems in Manufacturing Proceedings,16th IFAC Symposium on...

432KB Sizes 0 Downloads 19 Views

Proceedings,16th IFAC Symposium on Proceedings,16th IFAC Symposium on Information Control Problems in Manufacturing Proceedings,16th IFAC Symposium on Information Control Problems in Manufacturing Proceedings,16th IFAC Symposium on Bergamo, Italy, JuneProblems 11-13, 2018 Available online at www.sciencedirect.com Information Control in Manufacturing Proceedings,16th IFAC Symposium on Bergamo, Italy, June 11-13, 2018 Information Control Problems in Manufacturing Bergamo, Italy, June 11-13, 2018 Information Control Problems in Manufacturing Bergamo, Italy, June 11-13, 2018 Bergamo, Italy, June 11-13, 2018

ScienceDirect

IFAC PapersOnLine 51-11 (2018) 1029–1034

Lead time prediction in a flow-shop Lead time prediction in a flow-shop Lead time prediction in a flow-shop Lead time with prediction in a and flow-shop environment analytical machine environment with analytical and machine Lead time prediction in a flow-shop environment with analytical and machine environment with analytical and machine learning approaches learning approaches environment with analytical and machine learning learning approaches approaches ∗ learning approaches ∗ ∗ D´ avid Gyulai ∗ Andr´ as Pfeiffer ∗ G´ abor Nick ∗ Viola Gallina ∗∗ D´ avid Gyulai Andr´ as Pfeiffer G´ abor Nick Viola Gallina ∗∗

∗ ∗ ∗∗ ∗ D´ avid GyulaiWilfried asSihn Pfeiffer G´ aobor Nick ∗∗ Viola Gallina ∗∗ aszl´ Monostori ∗ Andr´ ∗ ∗∗ L´ ∗ D´ avid GyulaiWilfried Andr´ asSihn Pfeiffer G´ aobor Nick ∗ Viola Gallina ∗∗ a szl´ Monostori ∗∗ L´ ∗ ∗ ∗ Sihn aszl´ obor Monostori ∗∗ L´ ∗ D´ avid GyulaiWilfried Andr´ a s Pfeiffer G´ a Nick Viola Gallina ∗∗ Wilfried Sihn ∗∗ L´ aszl´ o Monostori ∗ ∗ in Production Informatics and Control (EPIC), Wilfried Sihn L´ aszl´ o Monostori ∗ Centre of Excellence ∗ Centre of Excellence in Production Informatics and Control (EPIC), of Excellence in Production Informatics and Control (EPIC), for Computer Science and Control (SZTAKI), ∗ CentreInstitute CentreInstitute of Excellence in Production Informatics and Control (EPIC), for Computer Science and Control (SZTAKI), ∗ Institute for Computer Science and Control (SZTAKI), Hungarian Academy of Sciences (MTA), Budapest, Hungary Centre of Excellence in Production Informatics and Control (EPIC), Institute for Computer Science and Control (SZTAKI), Hungarian Academy of Sciences (MTA), Budapest, Hungary Hungarian Academy of Sciences (MTA), Budapest, Hungary (e-mails: {david.gyulai,andras.pfeiffer,gabor.nick}@sztaki.mta.hu) Institute for Computer Science and Control (SZTAKI), Hungarian Academy of Sciences (MTA), Budapest, Hungary (e-mails: {david.gyulai,andras.pfeiffer,gabor.nick}@sztaki.mta.hu) ∗∗ {david.gyulai,andras.pfeiffer,gabor.nick}@sztaki.mta.hu) (e-mails: Austria Research(MTA), GmbH,Budapest, Wien, Austria, ∗∗ Fraunhofer Hungarian Academy of Sciences Hungary (e-mails: {david.gyulai,andras.pfeiffer,gabor.nick}@sztaki.mta.hu) Austria Research ∗∗ Fraunhofer Austria Research GmbH, GmbH, Wien, Wien, Austria, Austria, (e-mail: [email protected]) ∗∗ Fraunhofer (e-mails: {david.gyulai,andras.pfeiffer,gabor.nick}@sztaki.mta.hu) Fraunhofer Austria Research GmbH, Wien, Austria, (e-mail: [email protected]) ∗∗ (e-mail: [email protected]) Fraunhofer Austria Research GmbH, Wien, Austria, (e-mail: [email protected]) (e-mail: Abstract: Manufacturing lead [email protected]) (LT) is often among the most important corporate Abstract: Manufacturing Manufacturing lead lead time time (LT) (LT) is is often often among among the the most most important important corporate corporate Abstract: performanceManufacturing indicators thatlead companies wishis to minimize order meet the corporate customer Abstract: time (LT) often amongin the mostto important performance indicators that companies wish to minimize in order to meet the customer performance indicators thatthe companies wishis in to minimize inthe order to meet theproduction customer expectations, by delivering right products the shortest possible time. Most Abstract: Manufacturing lead time (LT) often among most important corporate performance thatthecompanies wish in to the minimize order to meet theproduction customer expectations, indicators by delivering delivering right products products shortestinpossible possible time. Most expectations, by the right in the shortest time. Most production planning and indicators scheduling methods rely onwish LTs,in the efficiency of these is performance that totherefore, minimize inpossible order to meet themethods customer expectations, delivering thecompanies right products the shortest time. Most production planning and and by scheduling methods rely on LTs, the efficiency of methods is planning scheduling methods rely on LTs,in therefore, therefore, theachieving efficiency of these these methods is crucially affected by the accuracy of LT prediction. However, high accuracy is often expectations, by delivering the right products the shortest possible time. Most production planning and scheduling methods ofrely on LTs, therefore, theachieving efficiencyhigh of these methods is crucially affected by the accuracy LT prediction. However, accuracy is often crucially affected by the accuracy of LT prediction. However, achieving high accuracy is often complicated, due to the complexity of the processes and high variety of products. In the paper, planning anddue scheduling methods ofrely on LTs, therefore, the efficiency of these methods is crucially affected by the thecomplexity accuracy LT prediction. However, achieving high accuracy ispaper, often complicated, to of the processes and high variety of products. In the complicated, to complexity theprediction. processes and high variety of products. Infocusing theispaper, analyticalaffected anddue machine techniques are analyzed and compared, on crucially by the thelearning accuracyprediction ofof However, achieving high accuracy often complicated, due to the complexity ofLT the processes and high variety of products. Infocusing the paper, analytical and machine learning prediction techniques are analyzed and compared, on analytical anddue machine prediction techniques are analyzed and compared, on a real flow-shop environment exposed to processes frequent changes and uncertainties resulted the complicated, to the learning complexity of the and high variety of products. Infocusing the by paper, analytical and machine learning prediction techniques are analyzed and compared, focusing on a real real flow-shop flow-shop environment exposed to frequent frequent changes and uncertainties resulted by the a environment exposed to changes and uncertainties resulted by the changing customer order stream. The digital data twin of the processes is applied to accurately analytical and machine learning prediction techniques are analyzed andiscompared, focusing on achanging real flow-shop environment exposed to frequent changes and uncertainties resulted by the customer order stream. The digital data twin of the processes applied to accurately changing customer order stream. digital datathe twin of theand processes isup-to-date applied to accurately predict the manufacturing LTexposed ofThe jobs, prediction models via by online a real flow-shop environment tokeeping frequent changes uncertainties resulted the changing customer order stream. The digital datathe twin of the processes isup-to-date applied to accurately predict the manufacturing LT of jobs, keeping prediction models via online predict the manufacturing LT ofThe jobs, keeping the models via online connection with theorder manufacturing execution system, and frequent retraining of the models. changing customer stream. digital data twinprediction of the processes isup-to-date applied to accurately predict the manufacturing LT of jobs, keeping the prediction models up-to-date via online connection with the manufacturing manufacturing execution system, and frequent retraining of the the models. connection with the execution system, and frequent retraining of models. predict the manufacturing LT of jobs, keeping the prediction models up-to-date via online connection the manufacturing execution frequent retraining of the © 2018, IFACwith (International Federation of Automaticsystem, Control)and Hosting by Elsevier Ltd. All rightsmodels. reserved. connection with the manufacturing system, and systems, frequent Machine retraininglearning, of the models. Keywords: Production control, Leadexecution time, Manufacturing Keywords: control, time, Keywords: Production Production control, Lead Lead time, Manufacturing Manufacturing systems, systems, Machine Machine learning, learning, Prediction methods, Statistical inference Keywords: Production control, Lead time, Manufacturing systems, Machine learning, Prediction methods, Statistical inference Prediction Production methods, Statistical inference Keywords: control, Lead time, Manufacturing systems, Machine learning, Prediction methods, Statistical inference Prediction methods, Statistical inference 1. INTRODUCTION LT prediction is the key of successful production planning 1. INTRODUCTION INTRODUCTION LT prediction prediction is is the the key key of of successful successful production production planning planning 1. LT andprediction control, as due-date jobs areproduction typically planning assigned 1. INTRODUCTION LT is the key of of successful and control, as due-date of jobs are typically assigned and control, as due-date of jobs are typically assigned based on their expected LT, and they are mostly scheduled 1. INTRODUCTION LT prediction is the key of successful production planning 1.1 Relevance of LT prediction, motivation and control, as due-date ofand jobs are typically assigned based on their expected LT, they are mostly scheduled 1.1 Relevance Relevance of of LT LT prediction, prediction, motivation motivation based on their expected LT, and they arethe mostly scheduled with control, backwards techniques relying on parameter in and due-date jobs are assigned 1.1 based on theiras expected LT,ofand they aretypically mostly scheduled with backwards backwards techniques relying on the parameter in 1.1 Relevance of LT prediction, motivation with techniques relying on the parameter in consideration. based on their expected LT, and theyon arethe mostly scheduled backwards techniques relying parameter in consideration. 1.1 Relevance LT prediction, A recent globaloftrend in marketsmotivation is the increasing variety with consideration. with backwards techniquesmanufacturing relying on theenvironment parameter in A recent global trend in markets is the increasing variety consideration. In the paper, a flow-shop A recent global trend in markets is the increasing variety In the paper, a flow-shop manufacturing environment in of products, as a trend key solution of companies towards success in A recent global in markets is the increasing variety consideration. of products, as a key solution of companies towards success In the paper, a flow-shop manufacturing environment in thethe optics industry is analyzed, where complexity of the of products, as a key solution of companies towards success is offering customized products that might satisfy individpaper, a flow-shop manufacturing environment in A recent global in markets is the increasing variety In the optics industry is analyzed, where complexity of the of products, as a trend key solution of companies towards success is offering customized products that might satisfy individthe optics industry is analyzed, where complexity leadthe time prediction resulted by thecomplexity diversity of the paper, a flow-shop manufacturing environment in is offering customized products that might satisfy individualoffering needs. customized The ofcompanies highmight volume and individvariety In industry is is where the of as a combination key solution of towards leadoptics time prediction isanalyzed, resulted by the the diversity diversity of the is products satisfy ualproducts, needs. The The combination of that high volume volume and success variety the lead time prediction is resulted by of process parameters, and especially by the high influence the optics industry is analyzed, where complexity of the ual needs. combination of high and variety can be provided by applying mass customization strategy time predictionand is resulted bybythe of the is offering customized products might satisfy individprocess parameters, especially thediversity high influence influence ual needs. The combination of that high volume andstrategy variety lead can be provided provided by applying applying mass customization process parameters, and especially by the high of customer order stream on thebyjobs in progress. Allead time prediction is resulted diversity of the can be by mass customization strategy thatneeds. the ability to provide individualized products parameters, and especially bythe the high influence ual Theability combination of high volume andproducts variety process of customer order stream on the jobs in progress. Alcan behas provided by applying mass customization strategy that has the to provide individualized of customer order stream on the jobs in progress. Although companies in the optics industry aims at applying process parameters, and especially by the high influence that has the ability to provide individualized products by utilizing high flexibility ofindividualized the processesproducts and re- of customer order instream on the jobs aims in progress. Alcan behas provided by applying mass customization strategy though companies the optics industry at applying that thethe ability to provide by utilizing the high flexibility of the processes and rethough companies instream the optics industry aims at applying mass customization strategy to produce customized customer order in on the jobshighly in progress. Alby utilizing the high flexibility of the the processes and re- of sources et al., 2012). Besides custom design, companies the optics industry aims at applying that has(Fogliatto thethe ability to provide products mass customization strategy to produce produce highly customized by utilizing high flexibility ofindividualized the the processes and re- though sources (Fogliatto et al., al., 2012). Besides custom design, mass customization strategy to highly customized products in a high volume, late differentiation (which is the though companies in the optics industry aims at applying sources (Fogliatto et 2012). Besides the custom design, important element of the service level expected by the customization strategy todifferentiation produce highly(which customized by utilizing the high flexibility of level the the processes re- mass products in aa high high volume, volume, late is the the sources (Fogliatto etof al., 2012). Besides custom and design, important element the service expected by the the products in late differentiation (which is principle of amass and thus batch producmass customization strategy todifferentiation produce highly customized important element of the service level expected by customers is the on-time delivery of the products, which in high customization) volume, late (which is the sources (Fogliatto etof al.,the 2012). Besides the custom design, principle of mass customization) and thus batch producimportant element service level expected by the products customers is the on-time delivery of the products, which principle of mass customization) and thus batch production is often impossible to apply (Braunecker et al.,produc2008). in high customization) volume, late differentiation (which is the customers is the on-time delivery of the products, which also leads to challenges indelivery the manufacturing. At most of products of amass and thus batch important element of the service level by the tion is is often often impossible to apply apply (Braunecker et al., al., 2008). 2008). customers is the on-time of the expected products, which also leads leads to to challenges in the the manufacturing. At most most of principle tion impossible to (Braunecker et In order to tackle this challenge, the individual job LTs principle of mass customization) and thus batch producalso challenges in manufacturing. At of the companies, where short delivery times are expected is often impossible apply (Braunecker et al.,job 2008). customers is challenges thewhere on-time of the products, which In order order to tackle tackle this to challenge, the individual individual LTs also leads to indelivery thedelivery manufacturing. most of tion the companies, companies, short times areAtexpected expected In to this challenge, the job LTs are predicted by machine learning (ML) techniques, based tion is often impossible to apply (Braunecker et al., 2008). the where short delivery times are by the customers, manufacturing lead times are among order to tackle this challenge, the individual jobbased LTs also leads to challenges in thedelivery manufacturing. most of In are predicted by machine learning (ML) techniques, the companies, where short times areAt expected by the customers, manufacturing lead times are among are predicted byproducts machine learning (ML) techniques, based on order the data ofby and processes obtained from the to tackle this challenge, the individual jobbased LTs by the customers, manufacturing lead times areare among the the top corporatewhere performance indicators. These to be In predicted machine learning (ML) techniques, the companies, short delivery areare expected on the the data of products products and processes obtained from the by customers, manufacturing leadtimes times among the top top corporate performance performance indicators. These are to be be are on data of and processes obtained from the manufacturing execution (MES) system. The ML-based are predicted by machine learning (ML) techniques, based the corporate indicators. These are to minimized to provide the customer-expected service level. the data of products processes obtained from the by the manufacturing lead times areareamong manufacturing executionand (MES) system. The ML-based ML-based the top customers, corporate performance indicators. These to be on minimized to provide the customer-expected service level. manufacturing execution (MES) system. The leadthe time prediction methods are system. compared to ML-based the data of products and processes obtained frommost the minimized to provide the customer-expected service level. Efficient solutions to the achieve short manufacturing lead manufacturing execution (MES) Theto the top corporate performance indicators. These are to be on lead time time prediction methods are compared the minimized to provide customer-expected service level. Efficient solutions to achieve achieve short manufacturing manufacturing lead lead prediction methods are system. compared to ML-based the most most common applied analytical techniques, highlighting their manufacturing execution (MES) The Efficient solutions to short lead times are provided by the lean principles, e.g. one-piecelead time prediction methods are compared to the most minimized to provide service level. common applied applied analytical analytical techniques, techniques, highlighting highlighting their their Efficient to achieve manufacturing lead common times are are solutions provided bythe thecustomer-expected leanshort principles, e.g. one-pieceapplicability in the production environment under lead timeapplied prediction methods are compared to thestudy. most times provided by the lean principles, e.g. one-pieceflow production, however, are often analytical techniques, highlighting their Efficient to short manufacturing lead common applicability in the production environment under study. times are solutions provided by achieve thelean leantools principles, e.g.complicated one-pieceflow production, however, lean tools are often complicated applicability in the production environment under study. Besides selecting the most appropriate prediction method, applied analytical techniques, highlighting their flow production, however, lean tools are often complicated to apply inbycase high customization isone-pieceexpected common in the production environment under study. times are directly, provided thelean leantools principles, e.g.complicated Besides selecting selecting the most appropriate appropriate prediction method, flow production, however, are often to apply apply directly, in case case high customization is expected expected applicability Besides the most prediction method, the digital data twin of the products and processes is applicability in the production environment under study. to directly, in high customization is (Stump and Badurdeen, 2012). Furthermore, it is not only selecting appropriate method, flow production, however, lean are oftenitcomplicated the digital digital data the twinmost of the the productsprediction and processes processes is to apply directly, in case2012). high tools customization isis expected (Stump and Badurdeen, Furthermore, not only only Besides the data twin of products and is also defined, which encompasses the prediction models Besides selecting the most appropriate prediction method, (Stump and Badurdeen, 2012). Furthermore, it is not hard to maintain decreasing the LT, but it is also often the digital data twin of the products and processes is to apply directly, in case2012). highthe customization isisalso expected also defined, defined, which which encompasses encompasses the the prediction prediction models models (Stump and Badurdeen, Furthermore, it notoften only also hard to maintain decreasing LT, but it is and digital their application byofmaintaining reliability of the data twinencompasses the products and processes is hard to maintain decreasing the LT, but it it is also often complicated to predict it, asthe customized have also defined, which thethe prediction models (Stump Badurdeen, 2012). Furthermore, notoften only the and their their application by maintaining maintaining the reliability of the the hard to and maintain decreasing LT, but itproducts is isalso complicated to predict predict it, as customized customized products have and application by the reliability of prediction with frequent revision and periodic retraining also defined, which encompasses the prediction models complicated to it, as products have several features influencing the manufacturing parameters, and their application by maintaining the reliability of the hard to maintain decreasing the LT, but it is also often prediction with frequent revision and periodic retraining complicated to influencing predict it, the as manufacturing customized products have prediction several features features parameters, with frequent revision and periodic retraining based on the latest historical and quasi real-time data. their application by maintaining reliability of the several influencing the manufacturing parameters, thus differentiating the lead as well.products The accurate prediction with frequent revision andthe periodic retraining complicated to influencing predict it, the astimes customized have and based on the the latest historical and quasi quasi real-time data. several features manufacturing parameters, thus differentiating the lead times as well. The accurate based on latest historical and real-time data. with frequent revision and periodic retraining thus differentiating the lead times as well. The accurate prediction based on the latest historical and quasi real-time data. several features influencing the manufacturing parameters, thus differentiating the lead times as well. The accurate thus differentiating Copyright © 2018 IFAC the lead times as well. The accurate1048based on the latest historical and quasi real-time data.

2405-8963 © IFAC (International Federation of Automatic Control) Copyright © 2018, 2018 IFAC 1048Hosting by Elsevier Ltd. All rights reserved. Copyright 2018 responsibility IFAC 1048Control. Peer review©under of International Federation of Automatic Copyright © 2018 IFAC 1048 10.1016/j.ifacol.2018.08.472 Copyright © 2018 IFAC 1048

IFAC INCOM 2018 1030 Bergamo, Italy, June 11-13, 2018

Dávid Gyulai et al. / IFAC PapersOnLine 51-11 (2018) 1029–1034

1.2 Lead time prediction: state-of-the-art methods In the paper, both analytical and data analytics models are applied for LT prediction, and they are compared regarding the prediction accuracy. The most fundamental analytical method in production control is Little’s law, which states that the average number of items in a queuing system, denoted by L, equals the average arrival rate of items to the system, λ, multiplied by the average waiting time (or lead time) of an item in the system, W, thus L = λW (Little, 2011). In recent manufacturing environments where a great variety of products are produced and process parameters are varying, lead times are affected by several parameters that cannot be simply considered even in more sophisticated analytical methods. Therefore, data analytics and machine learning methods are often applied for lead time prediction. The applicability of these methods is strengthened by the cyber-physical production systems that provide quasi-real time data about processes and products as well as by the latest advances in statistical and machine learning (Monostori et al., 2016). Various data analytics methods are applied for lead time ¨ urk et al., 2006), estimation, e.g. regression trees (Ozt¨ support-vector machines (Alenezi et al., 2008), deep neural networks (Wang and Jiang, 2017) or linear regression methods (Sabuncuoglu and Comlekci, 2002). Although these methods are proven to be efficient to predict LTs in certain cases, similarly to other machine learning tasks, the model and feature selection are always depending on the production environment under study. Therefore, there is no rule of thumb for selecting a model or algorithm to predict LT, but a comprehensive analysis is needed to be performed beforehand. 2. PROBLEM STATEMENT 2.1 Production environment In the paper, a flow-shop manufacturing environment is analyzed, in which optical lenses for eyeglasses are produced. The first, shaping process of the plastic lenses is performed in the system in consideration, which is only part of the whole manufacturing procedure. The forming process of the raw lens is mainly done by machine cutting and some supporting sub-processes; in the analyzed area, 6-8 process steps are performed in total, depending on the lens type. The very first step is a separation of the jobs: according to a main attribute, A and B lens types are distinguished that follow separate routing and process steps on the shop-floor except the very final, common process step when the material flows of A and B lenses join again. Important characteristics of the industrial branch analyzed is the handling of customer orders, which increase the complexity of LT prediction, while also making it more important to perform accurately. Eyeglasses and especially the lenses are highly customized products, and orders are placed by the optician at the stores after inspecting the customers’ eyes. Therefore, the applied manufacturing strategy is called make-to-recipe, stands for providing customized products to the customers, without applying production batches. Batching is not possible in the analyzed manufacturing stage, as most lenses have aspherical geometry, and the main optical features of a lens —which

are customer-dependent— are created by the very first, complex abrasive process steps (Braunecker et al., 2008). The lenses are produced following the order placements coming from the stores during their opening times. Accordingly, the order stream has a quasi-periodic nature, with peak values during the opening time of the optics stores, and valleys during the night. Typical workload indicators are ca. 2500 completed per day, and the WIP ranges between 500-2000 jobs. Besides the impossibility of batch production, the problem complexity is increased by the customer-expected contractual delivery service: there is a short due date, until the products need to be produced and delivered in the optics store. As the company under study is responsible to deliver the products to several destination countries, a reduced time slot is available for production, to keep enough reserve time for delivering the products in the stores. According to the previous points, engineers do not have the option for applying traditional production planning and scheduling tools, but they are aimed at controlling the lead times of the products by allocating the proper amount of manpower to smooth the fluctuation of production, resulted by the incoming orders. Besides, they seek for accurate LT prediction to assign due dates for jobs precisely. To achieve this, the processing of the MES log is applied: log entries are available for all the jobs, including all of their features and the time stamps corresponding to the process steps. Although this data is available to the subprocesses as well, the LT prediction in this case stands for the estimation of the time that spans between the arrival and exit of the job in/from the system. Conclusively, time stamps among the entry and exit points are disregarded in this case. The features of job k that are relevant in this segment of the process chain is denoted by a tuple Fk = ik , ok , uk , mk , sk , qk , pk . Parameters ik and ok are the entry and exit times of job k in the system. The features uk , mk , sk , qk and pk denote the type (A or B), material (four different materials exist), shape (two shape categories exist), quantity (1 or 2) of lenses and priority of job k, respectively. Although several steps are automated, some of them are manual, thus relevant process related parameter is the headcount ht of operators working in the system in time t, which data is available in the MES at any point of time. Parallel machines are available to perform some of the process steps, and the typical number of parallel resources is 5-15. At some processes, operators are performing only the machine handling and loading/unloading, the typical number of operators in a working shift is around 40, and the number of machines is above 100. Based on the above data, the task is to predict the manufacturing lead time tL k of a job k when it arrives to the system, based on the actual status of the system and the parameters Fk of the job. 2.2 Descriptive statistical analysis Before defining the LT prediction models, it is useful to make a descriptive statistical analysis in order to get a comprehensive picture of the data, upon which the models will rely. As described in the previous section, customer order arrivals follow a quasi-periodic pattern, based on the opening times of the stores where the orders are placed.

1049

IFAC INCOM 2018 Bergamo, Italy, June 11-13, 2018

Dávid Gyulai et al. / IFAC PapersOnLine 51-11 (2018) 1029–1034

1031

method presented by Kim and Whitt (2013) calculates the elements of Little’s law as they follow. Let tL k = ok − i k denote the lead time of job k, and r0 count the jobs that arrived before the start of observations (the starting WIP) and remain in the system at time t = 0. Representing the arrival process and the WIP as the function of time t, the total number of new job arrivals in the interval [0, t] is denoted by A(t); and Lt is the WIP at time t. The average values of arrival rate λ and WIP L(t) over the interval of observations are the followings: t −1 −1 λ = t A(t) L(t) = t L(s)ds (1) 0

Fig. 1. Histogram of the jobs’ lead times This fluctuation of the order arrivals leads to diverse lead times as well. Regarding the previously applied weekly data with the incoming orders and applying changing operator headcounts in the different working shifts, the distributions of the lead times are represented in Fig. 1. According to the histogram, it can be inferred that LT of the individual jobs cannot be predicted by taking simply the historical mean values, as the deviation of the LT is rather high, and the distribution does not follow any statistical pattern. Even though the deviation of the LTs and the fluctuation of the order arrivals, some of the lead time estimation methods are capable of handling this kind of system dynamics, with the constraint that time series describing the processes (e.g. order arrivals) need to be stationary. Stationarity is a fundamental concept of time series analysis, and defines that a process remains around a statistical equilibrium with probabilistic properties that do not change over time, in particular varying around a constant mean level and with constant variance (Box et al., 2015). Therefore, stationarity test was performed to analyze the arrival process of the jobs: in case the process is stationary, dynamic LT prediction methods —especially those rely on Little’s law— can be applied with higher probability of success. For the stationarity analysis, the ADF (Said and Dickey, 1984) and KPSS (Kwiatkowski et al., 1992) tests were applied using tseries R package of Trapletti and Hornik (2017). According to the test results, the weekly and daily job arrival processes are stationary around a mean, whereas the hourly arrivals are non-stationary. Therefore, the analytical LT prediction methods require stationarity of the data can be applied in periodical forms, with the finest time granularity (length of the periods) set to one day. This also imply that the dynamics of the processes within a day cannot be captured with those methods, but only daily mean LT values can be obtained.

As the arrival process and also the WIP can be observed easily (data is available in the MES system), the average value of lead time over the interval can be calculated with Little’s law applying λ and L(t) as estimators. In the analytical tests, the lead time tL k ≈ W L,λ (t) of a job k arriving at time ik = t is assigned by applying the following formula: L(t) tL (2) ∀ k | ik = t k ≈ W L,λ (t) = λ(t) In the test case, the above formulas were applied to predict the LT of jobs by setting the length of the intervals to one week and one day, respectively. As for the measure of the lead time prediction accuracy, throughout the paper, the normalized root-mean square error (NRMSE on the test dataset) is applied that gives the average prediction error in the percentage of the real lead time values. The calculation of NRMSE is provided by Eq. 3, where Oi and Si are the real and predicted lead time values, respectively, and the Omax and Omin are the extrema of the actual lead times.   N 2 1 i=1 (Si − Oi ) n NRMSE = 100 (3) Omax − Omin

3. ANALYTICAL METHODS FOR LT PREDICTION The most simple yet efficient analytical LT prediction method is Little’s law that can be applied to dynamic systems by dividing the time into finite intervals. Any of the average LT, arrival rate and processing time values can be calculated by taking observations about the other two parameters over the given interval. This finite time interval

Fig. 2. Job lead times with finite interval Little’s law (job type A): actual LT values, one-day periodic and oneday rolling horizon prediction In the first analytical tests, the LT tL k of job k was calculated by taking into account the data of jobs over

1050

IFAC INCOM 2018 1032 Bergamo, Italy, June 11-13, 2018

Dávid Gyulai et al. / IFAC PapersOnLine 51-11 (2018) 1029–1034

[0, ik ], and the starting time t = 0 of the interval is fixed to the start of a given day. The period length —due to the results of the stationarity test— was set to one day. As depicted by Fig. 2, the general trends of the actual lead times can be captured by the analytical method, however, significant errors occur in the beginning of each periods, due to the lack of a-priori data. The accuracy of the prediction is slowly increasing together with the number of samples, however, average error term is high in general. Therefore, the analytical prediction was also performed by applying a rolling horizon method one-day-wide sliding horizon. This means, that each new job’s lead time was predicted by taking into account all jobs’ lead time within the previous 24 hours. This method could capture the system’s dynamics more efficiently, yet resulted in high average NRMSE values around 30% and 16% for the two lens types (Table 1). 4. LEAD TIME PREDICTION WITH STATISTICAL LEARNING METHODS As the prediction results of the introduced analytical method were not satisfactory due to the rather high NRMSE (above 25%), statistical learning methods were applied that are capable of predicting the lead times based on the individual features of the jobs. In the following sections, the most commonly applied regression methods will be presented: the multivariate linear models, the treebased methods (nonlinear) and the support-vector regression. In each of the cases, the selection of the features and fine tuning of the parameters was performed with forward stepwise procedure, and importance ranking supported by the random forests. As for the selected features, the most accurate results were provided by estimating the LT of a job k with the following features: L tL k = tk (uk , mk , sk , qk , pk ), and adding ht and the input hour as numerical features at time t = ik . 4.1 Linear regression The rationale behind the application of linear methods in this case relies in the fact that they are capable of capturing the linear correlation among the features applied already in analytical model. Moreover, the other production related features might be also in a nearlinear correlation with the lead time, e.g. the number of operators. Therefore lead times are predicted by applying the multivariate linear models, applying separate models for both lens types. As mentioned in the feature selection, due to the quasi periodical nature of the order arrivals, the input hour of a given job is added as an additional feature with the aim of increasing the accuracy of the prediction by having a time-dependent factor in our model. During the model building and feature selection, 10-fold cross validation was applied to estimate the prediction accuracy of the models. The general results of the multivariate linear regression are provided in Table 1.

science is to build the simplest yet most accurate models that possible. Classification and regression trees are easy to interpret in most of the cases, while they are capable of capturing nonlinear correlation among the variables (James et al., 2013). However, in regression tasks, the output of the trees are discrete by nature, and depending on the complexity of the tree, it might take values only from a limited set. In order to avoid overfitting and reduce the bias that often occurs when ”deep” trees are grown on the data, ensemble methods are applied that are capable of enhancing the regression results, in contrast to the single-tree methods. Thus, random forests method was also applied to predict the lead times, and its accuracy was compared to the linear and single-tree regression. According to the results, random forests could outperform both methods, providing lower NRMSE values (Table 1). However, one need to consider the trade-off between interpretability and accuracy: although simple linear regression has less accuracy than random forests, the latter has higher complexity and thus lower interpretability. Moreover, random forests are inflexible considering the training data: in contrast to the linear regression, they cannot provide estimations outside the boundaries of the training dataset (cannot extrapolate). 4.3 Support-vector regression The last tested data analytics method was the support vector regression that are often similar in performance to the random forests: it usually provides high prediction accuracy, however, the interpretability is much worse than that of the linear methods, due to the procedure of model fitting. In case of support vector (SV) regression, a convex optimization problem is solved in order to fit a function on a training dataset that is mapped into a higher dimension hyperspace. Accordingly, the model obtained by applying the most common ε-SV regression cannot be directly translated to technical language, therefore, it supports production control decisions less efficiently than linear methods. The results of the lead time analysis support the above theory, namely that SV regression provided more accurate prediction than the simple linear models, however, the gain in accuracy is far less than the loss in interpretability (Table 1). 5. STABILITY AND RELIABILITY OF THE MODELS According to the results of the data analytics methods summarized in the previous section, the random forests method was selected for further analysis, regarding its sensitivity on the input data that is essential from the viewpoint of implementing the digital data twin.

4.2 Tree-based models Although the above defined linear models provided nearto-satisfactory results and outperformed the analytical models according to the NRMSE measure, the goal of data 1051

Table 1. Accuracy of the tested LT prediction methods, compared based on the NRMSE Little’s law (periodic, one-day) Little’s law (rolling horizon) Linear regression Regression tree Random forests SV regression

NRMSEA [%] 28.7 30 11.2 9.4 6.1 9.1

NRMSEB [%] 17.5 16.5 9.1 7.4 3.1 7.6

IFAC INCOM 2018 Bergamo, Italy, June 11-13, 2018

Dávid Gyulai et al. / IFAC PapersOnLine 51-11 (2018) 1029–1034

5.1 The digital data twin The data analysis performed in the previous section was a proof-of-the-concept that manufacturing lead times can be accurately predicted by data analytics tools, even in case the production environment is under dynamic changes. Besides, the results also emphasize that in such a nonstationary environment, conventional analytical tools — that are mostly applied in industrial practice— are clearly outperformed by the data analytics and statistical learning methods. As highlighted earlier, the main enabler of using ML techniques for lead time prediction relies on the fact that cyber-physical production systems are state-ofthe-art, became part of the industrial practice and not only exist in experimental environments. The essence of these systems is the availability of the data: in several cases, up-to-date or even quasi-real time information can be obtained from products, processes and production systems. This is also valid for the test environment: all data that applied in the previous lead time analysis can be gathered in minutes from the historical (that left the system already), and also about the in-progress jobs. Utilizing this feature of the environment, one can implement the digital data twin of the system, replacing the offline analytics models. In all the previous cases, the prediction accuracy of the models was evaluated by applying random subsets of historical data considering a fix, relatively long timeframe. The digital data twin concept implements the co-existence and co-evolution of the statistical learning model with the physical process. This is especially important in case of dynamically changing conditions, e.g., machine breakdowns, stochastic time parameters or non-stationary job arrivals. Theoretically, two main options exist to implement such a digital twin: (i) selecting an online learning algorithm that can be trained incrementally (e.g. neural networks), or (ii) applying a mini-batch training with any of the previously mentioned models. In the second case, only the latest complete data samples are applied for the training, and the model is used to predict the new data samples. Accordingly, one need to select the timeframe that will form the basis of the rolling horizon training and prediction. The models are trained periodically (the period length equals the timeframe), and the new data samples within the next time period will be predicted upon this model. Of course, the digital data twin will rely most on the actual status of the system in case the selected timeframe is minimal. Conclusively, the digital data twin itself is a statistical learning model that is trained by applying the latest possible data samples obtained from the physical system, and capable of predicting predefined production parameters. Accordingly, one physical environment might have more digital data twins, each with different prediction capabilities. In the paper, only a single digital twin of the manufacturing environment is implemented, which can predict the lead times of new job arrivals, by knowing the actual, and latest states of the system, and also the lead time of the latest jobs that left the system. 5.2 Periodic retrain of the models Based on the above characteristics, the main task of digital data twin implementation is the proper selection

1033

of the model retraining period (additionally to the general statistical learning tasks that were detailed in Section 4). In case of the system is exposed to disturbances and the parameters change dynamically, the training data need to capture the extrema that the parameters supposedly reaches in the upcoming period. This is valid especially for the algorithms that are weak in extrapolation, for example the random forests method that performed well in the offline analysis. In order to implement the digital data twin of the system under study, periodic retrain of the RF model (analyzed in Section 4.2) was performed, testing various retrain interval length. The key measure of prediction accuracy was the NRMSE on a comparable subset of the testing data 1 . Important to highlight that the lengths of the retraining intervals are based on the number of data samples, and not proportional with time units. Therefore, in case of frequent arrivals in the rush periods, the models are trained more frequently. Additionally, the training of the models was always performed by using complete data subsets, including all features of tuple Fk .

Fig. 3. Results of the periodic model retrain: effect of the period length on the prediction error (NRMSE); splines are fitted on points with local regression (loess) applying d = 0.3 distance value. In the experiments, the length of the model retraining period was changed between 100 and 5000 samples, applying 100 samples increments. Accordingly, 50 NRMSE values were obtained for both A and B lens types. Observing Fig. 3, it can be concluded that the length of the retraining period has a significant impact on the accuracy of the prediction, and therefore, the similarity of the digital data twin to the physical system. In case of both lens types, the shortest periods provide accurate prediction results, due to the frequent model adjustment, however, the prediction error increases significantly together with the period length. After reaching a peak at around 1000 period length, the error starts decreasing again, and achieves a relatively stable value. This remains almost constant, even though the period length is further increased. Conclusively, in the test case, short retraining periods would be proposed which also results in short model fitting times due to the 1 Longer retrain periods results in shorter test sets, due to the fact the first period can be used only for training, excluding prediction.

1052

IFAC INCOM 2018 1034 Bergamo, Italy, June 11-13, 2018

Dávid Gyulai et al. / IFAC PapersOnLine 51-11 (2018) 1029–1034

small training set. However, due to the significant increase in prediction error, it is hard to find the shortest yet satisfactory retrain interval, but rather more efficient to apply longer retraining periods. This involves longer model fitting times (cca. 50 seconds in case of 2000 samples), but the models will rely on a more representative training set. Besides, important to note that the period length should not be overly increased, as the model fitting times increase significantly, while the prediction accuracy remains at the same level. 6. CONCLUSIONS AND OUTLOOK 6.1 Discussion of the results According to the test results, it is obvious that data analytics and machine learning based prediction models can outperform the analytical ones in lead time prediction tasks, in case the process under study are non-stationary. Besides the dynamics of the system, another important advantage of ML tools is the efficient consideration of the job features that can take several values in a typical manufacturing environment where a great variety of products are made. As for the selection of the proper prediction model, in general, the best choice much depends on the parameter to be predicted and the process under study. However, all analyzed methods are capable of providing accurate results, and each of them can be implemented in a real industrial environment. In case accuracy is the primary KPI, random forests might be the best choice, whereas in case of simplicity, linear methods (with fine tuned parameters) are also capable of providing accurateenough prediction performance. As introduced throughout the case study, an important step after the selection of the prediction method is the implementation of the digital data twin. By nature, a prerequisite of this step is the existence of a proper data collection system that is an MES system with logging and tracking functionalities in most of the cases. According to the test results, the selection of the model retraining interval is of crucial importance to obtain accurate and reliable results, therefore, an offline study —as provided in section 5.2— is suggested, as this parameter also process-dependent. 6.2 Future work As for the future work, the authors consider the above described study as a major step towards the implementation of a situation-aware, active production controller that utilizes ML tools to increase the reliability of typical control decision as due-date assignment, order priorization and dispatching. With the support of the digital data twin, the predicted lead times and the prediction models can be applied in decision making mechanisms, such as optimization models. To this aim, the currently existing method is planned to be applied not only for predicting the lead times before the release of the jobs, but also to predict the expected, remaining lead time of in-progress jobs. This would enable to change the priority of the jobs during production, and/or alter the routing of a given product if it will certainly in late applying the original routing.

ACKNOWLEDGEMENTS The research in this paper was supported by the European Commission through the H2020 project EPIC (https://www.centre-epic.eu/) under grant No. 739592, and by the GINOP-2.3.2-15-2016-00002 grant on an ”Industry 4.0 research and innovation center of excellence”. REFERENCES Alenezi, A., Moses, S.A., and Trafalis, T.B. (2008). Realtime prediction of order flowtimes using support vector regression. Computers & Operations Research, 35(11), 3489–3503. Box, G.E., Jenkins, G.M., Reinsel, G.C., and Ljung, G.M. (2015). Time series analysis: forecasting and control. John Wiley & Sons. Braunecker, B., Hentschel, R., and Tiziani, H.J. (2008). Advanced optics using aspherical elements, volume 173. Spie Press. Fogliatto, F.S., Da Silveira, G.J., and Borenstein, D. (2012). The mass customization decade: An updated review of the literature. International Journal of Production Economics, 138(1), 14–25. James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An introduction to statistical learning, volume 112. Springer. Kim, S.H. and Whitt, W. (2013). Statistical analysis with little’s law. Operations Research, 61(4), 1030–1045. Kwiatkowski, D., Phillips, P.C., Schmidt, P., and Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? Journal of econometrics, 54(1-3), 159–178. Little, J.D. (2011). OR Forum – Little’s law as viewed on its 50th anniversary. Operations research, 59(3), 536– 549. Monostori, L., K´ad´ar, B., Bauernhansl, T., Kondoh, S., Kumara, S., Reinhart, G., Sauer, O., Schuh, G., Sihn, W., and Ueda, K. (2016). Cyber-physical systems in manufacturing. CIRP Annals-Manufacturing Technology, 65(2), 621–641. ¨ urk, A., Kayalıgil, S., and Ozdemirel, ¨ Ozt¨ N.E. (2006). Manufacturing lead time estimation using data mining. European Journal of Operational Research, 173(2), 683– 700. Sabuncuoglu, I. and Comlekci, A. (2002). Operation-based flowtime estimation in a dynamic job shop. Omega, 30(6), 423–442. Said, S.E. and Dickey, D.A. (1984). Testing for unit roots in autoregressive-moving average models of unknown order. Biometrika, 71(3), 599–607. Stump, B. and Badurdeen, F. (2012). Integrating lean and other strategies for mass customization manufacturing: a case study. Journal of Intelligent manufacturing, 23(1), 109–124. Trapletti, A. and Hornik, K. (2017). tseries: Time Series Analysis and Computational Finance. URL https://CRAN.R-project.org/package=tseries. R package version 0.10-42. Wang, C. and Jiang, P. (2017). Deep neural networks based order completion time prediction by using realtime job shop RFID data. Journal of Intelligent Manufacturing, 1–16.

1053