Towards proactive maintenance actions scheduling in the Semiconductor Industry (SI) using Bayesian approach


IFAC Conference on Manufacturing Modelling, Management and Control


June 28-30, 2016. Troyes, France

Available online at www.sciencedirect.com

ScienceDirect

IFAC-PapersOnLine 49-12 (2016) 544–549

Towards proactive maintenance actions scheduling in the Semiconductor Industry (SI) using Bayesian approach

Anis BEN SAID*, **, Muhammad-Kashif SHAHZAD**, Eric ZAMAI**, Stephane HUBAC*, Michel TOLLENAERE**

* STMicroelectronics, 850 Rue Jean Monnet 38926 Crolles, France ([email protected], [email protected])
** Univ. Grenoble Alpes, G-SCOP, F-38000 Grenoble, France
CNRS, G-SCOP, F-38000 Grenoble, France ([email protected], [email protected], [email protected])

Abstract: The Semiconductor Industry (SI) is one of the fastest growing manufacturing domains, challenged by high-mix low-volume production and short product life cycles. This results in an increase in unscheduled equipment breakdowns that often lead to decreasing and unstable production capacities. Success in the SI depends on our ability to quickly recover from unplanned events. It is reported (Abu-Samah et al., 2014) that misdiagnosis is one of the key reasons for extended failure durations. This is because existing procedures to support maintenance decisions for equipment recovery are often based on the FMEA approach, which represents static experts' knowledge. In this paper, we present a methodology based on a Bayesian network (BN) to advise technicians on the choice of maintenance procedure in case of unscheduled breakdowns. We argue that the sequence of patterns and alarms generated by the equipment during production can be associated with the choice of maintenance procedure; therefore, the BN is learned as a function of these alarms and warnings to predict the choice of maintenance procedure from unscheduled breakdown historical data. In the proposed methodology, the warnings and alarms are grouped with a hybrid approach: they are first clustered based on distribution similarity, and experts then fine-tune the initial clusters. The main contribution of the proposed methodology is to advise technicians on the most likely effective maintenance procedure, which will reduce the unscheduled breakdown period and help improve and stabilize production capacities. The proposed methodology is validated with a case study from a world-reputed semiconductor manufacturer, using historical data. The results show a 49% gain in productive time from unscheduled breakdown periods.

© 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Keywords: Maintenance actions guidance, unscheduled maintenance duration, Bayesian network, FDC, maintenance procedure.

2405-8963 © 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2016.07.692

1. INTRODUCTION

The semiconductor industry (SI) is characterized by a high-mix, low-volume and multi-technology production environment, challenged by short product life cycles, where demand comes from end-user markets (Ballhaus et al., 2009). This results in a highly competitive production environment where success requires improved and stable production capacities. Literature suggests that such a complex environment often results in more unscheduled breakdowns with longer durations (Bode et al., 2007), which is evidence of misdiagnosis by the technicians (Abu-Samah et al., 2014). Consequently, an inappropriate choice of maintenance procedure results in poor preventive actions.

The FMEA (failure mode and effects analysis) approach is used in the SI to manage experts' knowledge about equipment failures and causes. This serves as the basis for defining diagnostic and maintenance procedures for unknown and known failures, respectively. The diagnostic procedures are merely guidelines to carry out maintenance actions; the actual choice depends on the experience and intuition of the technicians. Misdiagnosis at this stage can result in longer failure durations and impact future equipment behaviour. We must provide technicians with decision support to choose appropriate procedures based on the knowledge extracted from historical diagnosis efforts. We argue that FMEA with static knowledge must evolve dynamically with emerging equipment behaviour (Mili et al., 2009); it must be updated based on the lessons learned by technicians in the diagnosis during unscheduled breakdown events.

(Ison and Spons, 1996) and (Moore et al., 2006) focused their research on characterizing and controlling variability in semiconductor manufacturing equipment through advanced process control (APC) methods such as fault detection and classification (FDC). Multi-agent based approaches have emerged as one of the most interesting techniques to schedule maintenance actions dynamically (Aissani et al., 2009). These approaches are well suited to selecting the appropriate procedure for known failures but are static in nature. They also do not offer any support to technicians in selecting the procedure during the diagnosis of unknown failures.

The FDC (fault detection and classification) system monitors the equipment through sensors and generates warnings and alarms. These are used by the automation system to trigger the

IFAC MIM 2016 June 28-30, 2016. Troyes, France

Anis BEN SAID et al. / IFAC-PapersOnLine 49-12 (2016) 544–549

maintenance actions against known failures and causes in an automated production line. We argue that these alarms and warnings have signatures that might be linked, in historical data, to the choice of an effective maintenance procedure. Hence, in this paper, we present a Bayesian network (BN) based approach to learn the causal relationship between the signature of alarms and warnings and the choice of maintenance procedure. This approach supports the decision process for maintenance procedure selection in both known and unknown breakdown events. For known breakdown events, the predictions made by the BN might capture knowledge about changing equipment failure behaviour. This is an added advantage for the experts, as it provides a dynamic way to update the FMEA files used as the source for defining appropriate maintenance procedures.


This paper focuses specifically on supporting the technicians' decision process for the selection of an appropriate procedure in unscheduled breakdown events. This will reduce the increasingly long failure durations and contribute significantly to stable and improved production capacities in the SI. Besides this, it will also help in detecting emerging drifts in equipment behaviour for known failures. In this approach, we first cluster the alarms and warnings into groups based on distribution similarity. These groups are then validated by the experts, prior to their use for BN structure and parameter estimation through semi-supervised learning algorithms. The presented methodology is tested with a case study using data from a world-reputed semiconductor manufacturer. Results show causal relations between the alarms and warnings signature and the choice of maintenance procedure. The simulation on the historical validation data shows a 49% gain in productive time, recovered from failure durations prolonged by misdiagnosis in unscheduled equipment breakdowns.

This paper is divided into four sections. Section 2 presents a literature review of existing approaches and methods for failure diagnosis and maintenance action recommendation. The proposed methodology and the case study are presented in Section 3 along with the results, and we conclude this paper with a discussion and future perspectives in Section 4.

2. LITERATURE REVIEW

The literature review covers existing approaches and methods for maintenance policy optimization and action recommendation.

2.1 Maintenance policies optimization

In the SI, maintenance is no longer an unproductive function. It has emerged as a key factor to ensure high product quality and sustainable production capacities. Corrective (run to failure), preventive (time and/or usage based) and predictive maintenance are known as the standard maintenance policies (Mili et al., 2009), as shown in Fig. 1. Corrective maintenance in the SI destabilizes production capacities; it often results in additional costs and long failure durations, but it is unavoidable due to unscheduled equipment breakdowns. Apart from corrective maintenance, preventive and predictive strategies are well suited for known failures. Corrective maintenance involves the guidelines for diagnosis, the choice of maintenance procedure and the validation of the equipment. These actions have a strong impact on preventive and predictive maintenance planning, which often results in over-engineering.

Fig. 1. Standard maintenance strategies

Several studies have dealt with the problem of corrective or preventive maintenance action optimization. (Sze-jung Wu, 2007) proposed an integrated neural-network based model that uses sensory information to predict equipment remaining useful life (RUL) prior to maintenance. This allows real-time monitoring of vibration information until failure occurrence to predict RUL. This study takes into account common failures in a static context and concludes that a reduced number of sensor signals provides efficient information on system reliability. (He and Wang, 2007) presented a methodology based on the k-nearest neighbour (kNN) approach for fault detection in the SI, using sensor data from the equipment. FDC based approaches help to (i) reduce scrap, (ii) increase equipment uptime, and (iii) reduce the usage of test wafers (Thieullen et al., 2013). However, FDC is a static approach to monitoring the process, based on experts' knowledge in terms of cause and effect relationships between variables. Consequently, recent works use Bayesian networks for equipment failure diagnosis and prognosis due to their ability to (i) cope with complex systems through system decomposition, (ii) provide a visual representation of cause and effect relationships, and (iii) combine multisource data from historical records and experts' opinions.

(Jones et al., 2010) use BN modelling to compute the equipment failure rate to support maintenance planning. The authors use historical equipment sensor data and related data responsible for the failures to define the BN structure, where the selection of variables is based on experts' judgement. However, some variables, e.g. operators' competence, cannot be measured accurately. (Weber et al., 2007) explored a methodology to develop a BN based diagnosis and prognosis tool for production processes. In this study, the FMEA approach is used to define the BN structure, and the probabilities are based on the occurrence criterion. A similar method is presented by (Bouaziz et al., 2012) to assess the equipment health factor (EHF) in the SI. This allows equipment health monitoring and root-cause identification upon failure occurrences. Although this approach uses contextual data instead of FDC data, it is still static in nature, as it does not allow the inherent BN structure to adapt to emerging equipment drifts in a dynamic environment. The above approaches are highly relevant for modelling known failures but do not extend any support to technicians during unscheduled equipment breakdowns.
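To illustrate the kNN-based fault detection idea cited in Section 2.1 (He and Wang, 2007), the minimal Python sketch below scores a sample by the sum of squared distances to its k nearest normal training samples and flags a fault when that score exceeds a control limit. The control limit is taken here as an empirical quantile of leave-one-out training scores; this threshold rule and all function names are assumptions for illustration, not the cited authors' exact implementation.

```python
import math


def knn_score(x, train, k, skip=None):
    """Sum of squared Euclidean distances from x to its k nearest training samples.
    `skip` excludes one training index (leave-one-out scoring of the training set)."""
    d2 = sorted(
        sum((a - b) ** 2 for a, b in zip(x, t))
        for i, t in enumerate(train)
        if i != skip
    )
    return sum(d2[:k])


def fit_threshold(train, k, quantile=0.95):
    """Control limit: empirical quantile of leave-one-out scores of normal samples."""
    scores = sorted(knn_score(t, train, k, skip=i) for i, t in enumerate(train))
    idx = min(len(scores) - 1, max(0, math.ceil(quantile * len(scores)) - 1))
    return scores[idx]


def is_fault(x, train, k, threshold):
    """Flag x when it lies farther from the normal data than the control limit."""
    return knn_score(x, train, k) > threshold


if __name__ == "__main__":
    # Hypothetical 2-D sensor summaries recorded during normal operation.
    normal = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9), (1.0, 2.2), (1.2, 2.0)]
    limit = fit_threshold(normal, k=2)
    print(is_fault((5.0, 5.0), normal, 2, limit), is_fault((1.0, 2.05), normal, 2, limit))
```

In a real FDC setting the vectors would be sensor features per run or wafer, and k and the quantile would be tuned on validated normal data.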



2.2 Maintenance actions recommendation

In the literature, few papers focus on maintenance action recommendation to support maintenance decisions. (Aissani et al., 2009) use multi-agent based approaches to schedule dynamic maintenance actions in the petroleum industry. (Kimotho et al., 2013) and (Das, 2013) used event-based decision trees and collaborative filtering, respectively, to find problems associated with particular maintenance actions by analysing equipment sensor data. None of these approaches supports technicians during unscheduled failures and breakdowns in selecting an appropriate maintenance action, a.k.a. procedure. Technicians rely on their experience, which has been reported to lead to misdiagnosis and longer failure durations. The methodology presented in this paper is the first initiative of its kind to find and correlate alarm and warning signatures with the choice of maintenance actions in unscheduled events.

3. PROPOSED METHODOLOGY

Fig. 2 illustrates an example from the dielectric (DIEL) workshop of the SI, based on corrective maintenance action data completed by technicians during an unscheduled equipment breakdown. Upon the occurrence of Failure 1, three maintenance procedures {P1, P2, P3} were applied in order. The production period prior to this failure serves as the source of the alarms and warnings signature. The right maintenance procedure for Failure 1 is P3, and it is correlated with an alarms and warnings signature that can be used, through the BN, as a predictor for subsequent similar signatures. Consequently, this will reduce long failure durations and the potential impacts on equipment failure behaviour.

Fig. 2. Unscheduled downtime issue illustration

Fig. 3. The proposed methodology: phase of model building (a)

The proposed methodology is presented in Fig. 3. The first step is the extraction of corrective maintenance actions, followed by the extraction and alignment of alarms and warnings from the preceding production periods, for unknown failures. The next step is dimensionality reduction, as a BN learned with too many predictors is not accurate. In this step, we clustered alarms and warnings into groups using the similarity of their respective distributions over the whole period under consideration. These initial clusters were then validated by an expert from the dielectric workshop, using the Ishikawa diagram presented in Fig. 4. A snapshot of the collected and aligned data, with the alarm and warning groups and the associated maintenance procedures, is presented in Table 1. This table is further complemented with the actual failure as identified by the technicians during the diagnosis procedure. This data is then used to learn the BN through semi-supervised learning with the EQ (Equivalence class), Taboo and Taboo order algorithms (Ben Said et al., 2014). The connectors (1), (2), (3) and (4) are used to further explain the detailed methodology (Fig. 5).
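The paper does not state which similarity measure is used to cluster alarms and warnings by distribution. As a hedged sketch only, the Python code below assumes each alarm or warning code is summarised by its occurrence counts per time bin, compares the normalised histograms with the Jensen-Shannon divergence, and groups codes by single linkage under a user-chosen threshold. The codes, bins and threshold are hypothetical, and the output is only the initial proposal that an expert would still review (Fig. 4).

```python
import math


def normalize(counts):
    """Turn per-bin occurrence counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        return [1.0 / len(counts)] * len(counts)  # no occurrences: treat as uniform
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint supports)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i = 0 contribute nothing; y_i > 0 whenever x_i > 0 because y = m.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cluster_alarms(histograms, threshold):
    """Single-linkage grouping: codes closer than `threshold` end up in one group."""
    names = sorted(histograms)
    dists = {n: normalize(histograms[n]) for n in names}
    parent = {n: n for n in names}  # union-find forest

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if js_divergence(dists[a], dists[b]) < threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical weekly occurrence counts for four codes.
    hist = {"ALM_A": [10, 0, 0, 5], "ALM_B": [20, 1, 0, 9],
            "WRN_C": [0, 8, 7, 0], "WRN_D": [0, 4, 5, 1]}
    print(cluster_alarms(hist, threshold=0.1))
```

Single linkage is used here only because it needs no preset number of clusters; any distribution-similarity clustering could play the same role before the expert review.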

Fig. 4. Ishikawa diagram of the alarm and warning groups defined by the expert

Table 1. Example of data set structure

Alarm Grp1  …  Alarm Grp8  Warning Grp1  …  Warning Grp5  Failure    Maintenance Procedure
1           …  2           0             …  4             Failure 1  Procedure 1
1           …  0           7             …  153           Failure 2  Procedure 2
13          …  0           3             …  269           Failure 3  Procedure 3
19          …  0           0             …  1             Failure 4  Procedure 4
17          …  0           0             …  2             Failure 4  Procedure 4
1           …  0           0             …  156           Failure 2  Procedure 2
12          …  2           0             …  255           Failure 3  Procedure 3
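Each row of Table 1 pairs the grouped alarm and warning counts from the production period preceding a breakdown with the identified failure and the maintenance procedure. Following the Fig. 2 example, where P3 was the last procedure applied and the one that resolved Failure 1, the Python sketch below assumes that the last applied procedure is the effective one. The event-log format, the window length and all names (`Breakdown`, `signature_row`, the alarm codes) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Breakdown:
    start: float          # breakdown start time (hypothetical unit: hours)
    failure: str          # failure identified by the technicians during diagnosis
    procedures: list = field(default_factory=list)  # procedures applied, in order


def signature_row(breakdown, events, group_of, groups, window):
    """Build one Table 1-style row for a breakdown.

    events:   iterable of (timestamp, alarm_or_warning_code) pairs
    group_of: mapping code -> expert-validated group name
    groups:   ordered list of group names (the table columns)
    window:   length of the preceding production period to inspect
    """
    lo, hi = breakdown.start - window, breakdown.start
    counts = Counter(
        group_of[code] for t, code in events if lo <= t < hi and code in group_of
    )
    row = {g: counts[g] for g in groups}  # Counter returns 0 for unseen groups
    row["Failure"] = breakdown.failure
    # Assumption (Fig. 2): the last procedure applied is the one that fixed the failure.
    row["Maintenance Procedure"] = breakdown.procedures[-1]
    return row


if __name__ == "__main__":
    events = [(1.0, "A17"), (5.0, "A17"), (6.0, "W03"), (7.5, "W03"), (9.0, "W03")]
    group_of = {"A17": "Alarm Grp1", "W03": "Warning Grp1"}
    b = Breakdown(start=10.0, failure="Failure 1", procedures=["P1", "P2", "P3"])
    print(signature_row(b, events, group_of, ["Alarm Grp1", "Warning Grp1"], window=8.0))
```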



3.1. Model building

It can be further observed in the presented scenario that the probability criterion is defined by users based on which the checklist of maintenance actions is prepared. If this criterion is not met then it is referred as a bad prediction. The proposed approach then detect the degradation of predictive BN, using consecutive number of bad predictions. This criterion is also given by users and might result in model relearning (structure and/or conditional and joint probabilities).

The BN modelling approach is selected as the target method due to its inherent abilities to model causal relations between variables of interest and offers probabilistic reasoning under uncertainty (Kjærulff and Madsen, 2006; Jensen and Nielsen, 2007; Pourret et al., 2008). The BN is a direct acyclic graph (DAG) that comprises of nodes (alarms and warning groups) and directed edges (links, arcs) between nodes. The directed edges represent inter-dependence between the nodes in the network. This dependence is computed through conditional probability calculation using theorem of Mr. Thomas Bayes (Peter, 2012).

For two nodes A and C of the network:

$$P(A \mid C) = \frac{P(A, C)}{P(C)} = \frac{P(C \mid A)\, P(A)}{P(C)} \qquad (1)$$

3.2 Model implementation

In this subsection, we present the scenario of unscheduled equipment breakdowns in which the proposed BN-based methodology is used as a predictive model to support the decision-making process of technicians (Fig. 5). In an automated production environment, equipment is monitored by an automation system that generates a work request for maintenance actions upon the occurrence of a known failure and for diagnosis actions upon the occurrence of an unknown failure. Known failures lead directly to corrective maintenance actions as per predefined procedures, whereas an unknown failure requires the alarm and warning data from the preceding production period. These data are grouped into a signature according to groups defined and agreed upon by an expert. The signature is then fed into the learned BN for the probabilistic prediction of the advised maintenance procedure, to be executed by the technicians, as well as for the diagnosis of the failure.

4. CASE STUDY

The proposed methodology is tested in an industrial context in the dielectric (DIEL) workshop. It is one of the 8 workshops of the SI production line, where a thin film of electrical insulation is deposited on the silicon wafers to insulate the different zones containing the transistors and interconnections. The deposition is performed by a plasma-based CVD (chemical vapour deposition) process at 400°C. This is one of the critical workshops: it often becomes a bottleneck that limits production capacity, and it suffers from a higher rate of unscheduled equipment breakdowns. Hence, the role of effective maintenance actions becomes critical.
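The routing performed in the implementation phase (Section 3.2), where known failures go straight to a predefined corrective procedure and unknown failures go through the BN, can be sketched as follows. This is an illustration under stated assumptions: `handle_work_request`, `build_signature`, `stub_predict` and the alarm-to-group mapping are hypothetical, the `predict` callback stands in for inference on the learned BN, and the signature is assumed to be a count of alarms and warnings per expert-defined group.

```python
from collections import Counter
from typing import Callable, Dict, List, Mapping, Optional, Tuple


def build_signature(events: List[str], event_groups: Mapping[str, str]) -> Dict[str, int]:
    """Count alarms/warnings per expert-defined group; unmapped events are ignored."""
    return dict(Counter(event_groups[e] for e in events if e in event_groups))


def handle_work_request(
    failure_code: Optional[str],
    known_procedures: Mapping[str, str],
    recent_events: List[str],
    event_groups: Mapping[str, str],
    predict: Callable[[Dict[str, int]], Dict[str, float]],
) -> Tuple[str, Dict[str, float]]:
    """Return the advised procedure and, for unknown failures, the BN probabilities."""
    if failure_code is not None and failure_code in known_procedures:
        # Known failure: direct corrective action per predefined procedure.
        return known_procedures[failure_code], {}
    # Unknown failure: signature of the preceding production period -> BN.
    signature = build_signature(recent_events, event_groups)
    probabilities = predict(signature)
    advised = max(probabilities, key=probabilities.get)
    return advised, probabilities


# Stub standing in for the learned BN (hypothetical probabilities).
def stub_predict(signature: Dict[str, int]) -> Dict[str, float]:
    if signature.get("AlarmGrp1", 0) > 10:
        return {"TPM1": 0.2, "TPM3": 0.8}
    return {"TPM1": 0.7, "TPM3": 0.3}


advised, probs = handle_work_request(
    None, {}, ["Forline_Pirani"] * 12, {"Forline_Pirani": "AlarmGrp1"}, stub_predict
)
```

Passing the predictor as a callback keeps the routing logic independent of the BN tool used for inference.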

4.1 Model learning

The BN structure is learned from the prepared data set using three unsupervised learning algorithms (EQ, Taboo and Taboo order). The objective function used in these algorithms is the minimum description length (MDL). It takes into account the 'correlation' as well as the structural complexity of the causal network, and it sets significance thresholds automatically (Rissanen 1978; Bouckaert 1993). These algorithms yield not only the network structure but also the associated conditional probabilities. Equivalence class (EQ) is an efficient structure-learning algorithm because it significantly reduces the search space. It relies on the fact that two BN structures are equivalent if and only if the set of distributions that can be represented by one structure is identical to the set of distributions that can be represented by the other (Chickering 2002; Munteanu and Bendou 2001). The Taboo search algorithm is mainly used to refine a given BN structure; hence, it is more accurate when the initial structure is defined from experts' knowledge or by some other semi-supervised learning algorithm. It can also learn a network from scratch, but it is then less efficient than EQ. Therefore, we use it in combination with EQ: EQ provides an initial structure, which Taboo then improves on the basis of the MDL score. In addition, Taboo order (Teyssier and Koller 2005) is a more expensive search algorithm that offers more accurate results but takes more time than simple Taboo search. It searches the space of node orderings: for a given ordering, the parents of each node are chosen among the nodes that precede it. The best network is selected according to the lowest MDL score. The BN learned from maintenance actions over a 6-month period is presented in Fig. 6, with the maintenance procedure as the target node.
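To make the MDL objective concrete, the sketch below scores a candidate structure over discrete variables as the negative log-likelihood of the data plus a complexity penalty of (log N)/2 per free parameter, the common BIC-style form of MDL; lower is better. This illustrates the general principle only: BayesiaLab's exact MDL formulation is not reproduced, and `mdl_score` is a hypothetical name. A structure search such as EQ or Taboo would repeatedly evaluate a score like this on neighbouring structures.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def mdl_score(data: List[Dict[str, str]], parents: Dict[str, Sequence[str]]) -> float:
    """BIC-style MDL score of a discrete BN structure (lower is better).

    `parents` maps every node to its parent list; cardinalities are taken
    from the states observed in `data`.
    """
    n = len(data)
    states = {v: {row[v] for row in data} for v in parents}
    score = 0.0
    for node, pa in parents.items():
        cfg_counts = Counter(tuple(row[p] for p in pa) for row in data)
        joint_counts = Counter((tuple(row[p] for p in pa), row[node]) for row in data)
        # Data description length: -log-likelihood with maximum-likelihood CPTs.
        loglik = sum(c * math.log(c / cfg_counts[cfg]) for (cfg, _), c in joint_counts.items())
        # Model description length: (log N)/2 per free parameter (natural log).
        q = math.prod(len(states[p]) for p in pa)
        k = (len(states[node]) - 1) * q
        score += -loglik + 0.5 * math.log(n) * k
    return score


# B copies A in this toy data, so the structure A -> B should score lower.
data = [{"A": "0", "B": "0"}] * 5 + [{"A": "1", "B": "1"}] * 5
independent = mdl_score(data, {"A": [], "B": []})
dependent = mdl_score(data, {"A": [], "B": ["A"]})
```

The extra edge costs one additional parameter but removes all uncertainty about B, so the description length drops; this trade-off between fit and complexity is what the MDL criterion arbitrates.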
Fig. 5. The proposed methodology: phase of model implementation

The BN model is learned using BayesiaLab 5.0 with the EQ and Taboo algorithms and 75:25 hold-out validation (75% of the data for learning and 25% for testing). The model presented in Fig. 6 comprises four classes of nodes shown in different colours: alarms (sandy brown), warnings (green), potential failures (orange) and the maintenance procedure, which is the target node (blue). The nodes representing alarms and warnings are further discretized into optimized intervals.
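The learning/testing split and the discretization step can be illustrated as follows. Two assumptions apply: the split is a simple shuffled 75:25 hold-out, and equal-frequency binning is swapped in for BayesiaLab's optimized discretization, whose exact algorithm is not described here. All function names are hypothetical.

```python
import random
from bisect import bisect_right
from typing import List, Sequence, Tuple


def train_test_split(rows: Sequence, train_fraction: float = 0.75, seed: int = 0) -> Tuple[list, list]:
    """Shuffle rows and split them into learning and test sets (75:25 by default)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def equal_frequency_edges(values: Sequence[float], bins: int) -> List[float]:
    """Cut points giving roughly equal counts per interval (stand-in discretizer)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // bins] for i in range(1, bins)]


def discretize(value: float, edges: Sequence[float]) -> int:
    """Index of the interval [edge_i, edge_{i+1}) that contains value."""
    return bisect_right(edges, value)
```

Alarm counts often contain many zeros, so equal-frequency cut points can coincide; duplicated edges then produce empty intervals and would need to be merged in practice.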

For example, in Fig. 7a the model recommends TPM3_PM_Forline as the maintenance procedure in the presence of 19 Forline_Pirani pressure alarms, zero RF_Gene_ouput bias alarms and 3 pressure_chamber alarms. In addition, the model predicts that the likely failure is Forline_Failure, with a probability of 57%. Similarly, Fig. 7b shows the same procedure, TPM3_PM_Forline, recommended with a probability of 60% under a different combination of alarms and warnings. Table 2 presents the predictions made by the learned BN model on the 25% test dataset. Using alarm and warning signatures, the learned BN model predicts the most appropriate maintenance actions with 75% accuracy.

Table 2. Validation of BN model prediction

Test case      1      2      3      4      5      6      7
Alarm Grp1     1      1      13     19     1      15     12
…
Alarm Grp8     2      0      0      0      0      2      2
Warning Grp1   0      7      3      0      0      0      0
…
Warning Grp5   4      153    269    1      7      0      0
Predicted PM   TPM1   TEMC2  TPM3   TPM2   TPM3   TEM3   TPM1
Effective PM   TPM1   TEMC2  TPM3   TPM2   TPM2   TPM1   TPM1
Probability    93%    98%    100%   99%    57%    56%    81%

Fig. 6. Learned BN structure with industrial dataset

4.2 Results and discussion
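The agreement between predicted and effective procedures can be checked directly on the seven cases listed in Table 2. The short script below, with values copied from the table, counts the matches and separates the probabilities of correct and incorrect predictions.

```python
# Predicted vs effective maintenance procedure for the seven cases of Table 2.
predicted = ["TPM1", "TEMC2", "TPM3", "TPM2", "TPM3", "TEM3", "TPM1"]
effective = ["TPM1", "TEMC2", "TPM3", "TPM2", "TPM2", "TPM1", "TPM1"]
probability = [0.93, 0.98, 1.00, 0.99, 0.57, 0.56, 0.81]

hits = [p == e for p, e in zip(predicted, effective)]
accuracy = sum(hits) / len(hits)
right_probs = [pr for pr, ok in zip(probability, hits) if ok]
wrong_probs = [pr for pr, ok in zip(probability, hits) if not ok]

print(f"accuracy on these cases = {sum(hits)}/{len(hits)} = {accuracy:.0%}")
print("probabilities of mismatches:", wrong_probs)
print("lowest probability of a correct prediction:", min(right_probs))
```

For these seven cases, 5 predictions out of 7 match, and the two mismatches (57% and 56%) are the two lowest-probability predictions, which illustrates how a user-defined probability criterion can flag uncertain recommendations.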

The results presented in Table 3 demonstrate the gain in equipment productive time obtained by implementing the proposed methodology on the 25% test dataset. Learned from historical alarm and warning patterns, the BN helps maintenance personnel schedule the right maintenance actions the first time. This approach also avoids inappropriate interventions.

The ability of BNs to exploit causal independence results in efficient inference even with a large number of variables. Hence, a BN is chosen to identify the most appropriate maintenance actions from a predefined list of maintenance procedures. The learned BN model is tested on 25% of the dataset in order to validate the reliability of its predictions. Fig. 7 presents an example of the choice of maintenance procedure based on the alarm and warning signature: changing this signature changes the probabilities of the alternative maintenance procedures.
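The effect of changing the signature can be illustrated with a deliberately simplified model. The sketch below assumes a naive-Bayes structure (signature groups independent given the procedure), which is simpler than the learned network of Fig. 6, and uses hypothetical prior and conditional probability tables; only the procedure labels TPM1 and TPM3 are taken from the paper.

```python
from typing import Dict

# Hypothetical prior P(procedure) and CPTs P(interval | procedure) for two
# discretized signature groups; none of these numbers come from the paper.
PRIOR = {"TPM1": 0.5, "TPM3": 0.5}
CPT = {
    "AlarmGrp1": {"TPM1": {"low": 0.8, "high": 0.2}, "TPM3": {"low": 0.3, "high": 0.7}},
    "WarningGrp5": {"TPM1": {"low": 0.6, "high": 0.4}, "TPM3": {"low": 0.1, "high": 0.9}},
}


def procedure_posterior(evidence: Dict[str, str]) -> Dict[str, float]:
    """P(procedure | evidence), assuming groups are independent given the procedure."""
    unnormalized = {}
    for proc, prior in PRIOR.items():
        p = prior
        for group, interval in evidence.items():
            p *= CPT[group][proc][interval]
        unnormalized[proc] = p
    total = sum(unnormalized.values())
    return {proc: p / total for proc, p in unnormalized.items()}


low = procedure_posterior({"AlarmGrp1": "low", "WarningGrp5": "low"})
high = procedure_posterior({"AlarmGrp1": "high", "WarningGrp5": "high"})
```

With these made-up tables, a low/low signature favours TPM1 (about 0.94), while a high/high signature favours TPM3 (about 0.89): the same mechanism by which a new signature shifts the recommended procedure in Fig. 7.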

Table 3. Reduction of unscheduled downtime

Failure    Performed PMs      Intervention  Predicted  Mean duration of     Time saved
                              duration      PM         right predicted PM
Failure1   TPM2, TPM1         24 hours      TPM1       11 hours             13 hours
Failure2   TPM1, TECM, TPM2   30 hours      TPM2       13 hours             17 hours
Failure3   TECM               13 hours      TECM       13 hours             0 hours
Failure4   TPM1, TPM2, TPM3   48 hours      TPM3       24 hours             24 hours
Failure5   TPM1, TECM         17 hours      TECM       6 hours              11 hours
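The "Time saved" column and the overall gain follow from simple arithmetic on Table 3: time saved is the intervention duration minus the mean duration of the right predicted PM. The script below reproduces it with values copied from the table.

```python
failures = ["Failure1", "Failure2", "Failure3", "Failure4", "Failure5"]
intervention_h = [24, 30, 13, 48, 17]  # actual intervention duration (hours)
right_pm_h = [11, 13, 13, 24, 6]       # mean duration of the right predicted PM (hours)

# Time saved per failure = intervention duration - duration of the right PM.
saved_h = [d - r for d, r in zip(intervention_h, right_pm_h)]
gain = sum(saved_h) / sum(intervention_h)

for failure, saved in zip(failures, saved_h):
    print(f"{failure}: {saved} hours saved")
print(f"total: {sum(saved_h)} of {sum(intervention_h)} hours -> {gain:.0%}")
```

The totals are 65 hours saved out of 132 intervention hours, i.e. about 49%, which matches the gain reported in the conclusions.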

5. CONCLUSIONS AND PERSPECTIVE

The proposed BN-based methodology, which supports technicians' decisions on the appropriate choice of maintenance actions, demonstrates a 49% gain in productive time on long failure durations (65 of the 132 intervention hours in Table 3). This approach is equally applicable to known failures in which equipment behaviour starts drifting. An added advantage is the renewal of the experts' knowledge. The approach demonstrates that alarm and warning signatures can be used to predict the appropriate procedure and, hence, the unknown failure.


The BN model presented in this study is static and does not allow real-time monitoring of the equipment. Further, we do not treat the problem of multiple maintenance procedures and their sequences, although in real conditions some failures require a sequence of procedures to be resolved. It would be interesting to take these aspects into account in future work. The proposed approach will also be applied and tested in other workshops of this world-renowned SI company.

Fig. 7. Example of inference with the BN by setting alarm and warning groups (panels a and b)


REFERENCES


Aissani, N., Beldjilali, B., and Trentesaux, D. (2009). Dynamic Scheduling of Maintenance Task in the Petroleum Industry: A reinforcement approach. Engineering Applications of Artificial Intelligence, vol. 22, pp. 1089-1103.
Ballhaus, W., Pagella, A., and Vogel, C. (2009). A change of pace for the semiconductor industry? Technical report, German Technology, Media and Telecommunications, Nov. 2009.
Ben Said, A., Shahzad, M.K., Zamaï, E., Hubac, S., and Tollenaere, M. (2014). A Bayesian network based approach to improve the effectiveness of maintenance actions in Semiconductor Industry. In Proceedings of 2nd European Conference of the PHM Society.
Bode, C.A., Wang, J., He, Q.P., and Edgar, T.F. (2007). Run-to-run control and state estimation in high-mix semiconductor manufacturing. Annual Reviews in Control, vol. 31(2), pp. 241-253.
Bouaziz, M.F., Zamaï, E., and Duvivier, F. (2013). Towards Bayesian network methodology for predicting the equipment health factor of complex semiconductor systems. International Journal of Production Research, vol. 51(15), pp. 4597-4617.
Bouckaert, R.R. (1993). Probabilistic network construction using the minimum description length principle. In Lecture Notes in Computer Science, vol. 747, pp. 41-48.
Chickering, D.M. (2002). Learning Equivalence Classes of Bayesian-Network Structures. Journal of Machine Learning Research, vol. 2, pp. 445-498.
Das, S. (2013). Maintenance Action Recommendation Using Collaborative Filtering. International Journal of Prognostics and Health Management, vol. 4(2), pp. 7-13.
Dekker, R., and Scarf, P.A. (1998). On the impact of optimisation models in maintenance decision making: the state of the art. Reliability Engineering & System Safety, vol. 60(2), pp. 111-119.
Fernandez, O., Labib, A.W., Walmsley, R., and Petty, D.J. (2003). A decision support maintenance management system: development and implementation. International Journal of Quality & Reliability Management, vol. 20(8), pp. 965-979.
He, Q.P., and Wang, J. (2007). Fault detection using the k-nearest neighbor rule for semiconductor manufacturing processes. IEEE Transactions on Semiconductor Manufacturing, vol. 20(4), pp. 345-354.
Ison, A., and Spanos, C.J. (1996). Robust fault detection and fault classification of semiconductor manufacturing equipment. In Proceedings of the 5th International Symposium on Semiconductor Manufacturing, pp. 1-4.
Jensen, F.V., and Nielsen, T.D. (2007). Bayesian Networks and Decision Graphs, Second Edition. New York, USA: Springer Verlag.
Jones, B., Jenkinson, I., Yang, Z., and Wang, J. (2010). The use of Bayesian network modelling for maintenance planning in a manufacturing industry. Reliability Engineering & System Safety, vol. 95(3), pp. 267-277.
Kimotho, J.K., Sondermann-Woelke, C., Meyer, T., and Sextro, W. (2013). Application of Event Based Decision Tree and Ensemble of Data Driven Methods for Maintenance Action Recommendation. International Journal of Prognostics and Health Management.
Kjærulff, U.B., and Madsen, A.L. (2006). Probabilistic Networks for Practitioners – A Guide to Construction and Analysis of Bayesian Networks and Influence Diagrams. Department of Computer Science, Aalborg University, HUGIN Expert A/S.
Mili, A., Bassetto, S., Siadat, A., and Tollenaere, M. (2009). Dynamic risk management unveil productivity improvements. Journal of Loss Prevention in the Process Industries, vol. 22(1), pp. 25-34.
Moore, T., Harner, B., Kestner, G., Baab, C., and Stanchfield, J. (2006, September). Intel's FDC proliferation in 300mm HVM: Progress and lessons learned. In Proceedings of AEC/APC Symp. XVIII, Westminster, CO.
Munteanu, P., and Bendou, M. (2001). The EQ Framework for Learning Equivalence Classes of Bayesian Networks. First IEEE International Conference on Data Mining (IEEE ICDM), San José.
Peter, M.L. (2012). Bayesian Statistics: An Introduction. Wiley.
Pourret, O., Naïm, P., and Marcot, B. (2008). Bayesian Networks: A Practical Guide to Applications. Chichester, England: John Wiley & Sons.
Rissanen, J. (1978). Modeling by shortest data description. Automatica, vol. 14(5), pp. 465-471.
Teyssier, M., and Koller, D. (2005). Ordering-based search: A simple and effective algorithm for learning Bayesian networks. In UAI, pp. 584-590.
Thieullen, A., Ouladsine, M., and Pinaton, J. (2013). Application of PCA for efficient multivariate FDC of semiconductor manufacturing equipment. In Advanced Semiconductor Manufacturing Conference (ASMC), 2013 24th Annual SEMI, pp. 332-337. IEEE.
Weber, P., Medina-Oliva, G., Simon, C., and Iung, B. (2012). Overview on Bayesian networks applications for dependability, risk analysis and maintenance areas. Engineering Applications of Artificial Intelligence, vol. 25(4), pp. 671-682.
Weber, P., Suhner, M.C., and Iung, B. (2001). System approach-based Bayesian Network to aid maintenance of manufacturing process. In Proceedings of 6th IFAC Symposium on Cost Oriented Automation, Low Cost Automation.
Wu, S.J., Gebraeel, N., Lawley, M.A., and Yih, Y. (2007). A neural network integrated decision support system for condition-based optimal predictive maintenance policy. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, vol. 37(2), pp. 226-236.
Yu, R., Iung, B., and Panetto, H. (2003). A multi-agents based E-maintenance system with case-based reasoning decision support. Engineering Applications of Artificial Intelligence, vol. 16(4), pp. 321-333.