Using Bayesian networks for root cause analysis in statistical process control


Expert Systems with Applications 38 (2011) 11230–11243


Adel Alaeddini, Ibrahim Dogan*

Industrial and Manufacturing Engineering Department, Wayne State University (WSU), Detroit, MI 48202, USA

* Corresponding author. Tel.: +1 313 577 9066; fax: +1 313 577 8833. E-mail addresses: [email protected] (A. Alaeddini), [email protected] (I. Dogan). doi:10.1016/j.eswa.2011.02.171


Keywords: Statistical process control (SPC); Control chart patterns; Root cause analysis (RCA); Bayesian network

Abstract

Despite their fame and capability in detecting out-of-control conditions, control charts are not effective tools for fault diagnosis. There are other techniques in the literature, mainly based on process information and control chart patterns, that help control charts with root cause analysis; however, these methods are limited in practice by their dependency on the expertise of practitioners. In this study, we develop a network for capturing the cause-and-effect relationships among chart patterns, process information, and possible root causes/assignable causes. This network is then trained under the framework of Bayesian networks, using a suggested data structure built from process information and chart patterns. The proposed method provides real-time identification of single and multiple assignable causes of failure, as well as false alarms, while improving its own performance by learning from its mistakes. It also performs acceptably on missing data. This is demonstrated by comparing the performance of the proposed method with methods such as neural networks and K-Nearest Neighbor in extensive simulation studies. © 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Root cause analysis (RCA) aims at identifying the causes of problems in processes in order to direct corrective actions (Rooney & Heuvel, 2004). Control charts typically do not have this capability; however, non-random patterns on the chart can be used as a source for RCA (Doty, 1996; Montgomery, 2005; Smith, 2004). The large number of possible relations among patterns and causes, though, makes the identification of the cause/s difficult. Certain information from the process (at the time of change) can be used in conjunction with the chart patterns to simplify this task. As a simple example, suppose pattern analysis tells us that either the machine condition or the quality of the input material has caused an out-of-control situation; if the process data show that the operating machine has not been serviced for a while, whereas the material has recently been tested with no problem found, there is a high chance that the poor condition of the operating machine caused the problem. The relationship structure among chart patterns, process information, and assignable causes is represented in Fig. 1. The chart patterns considered here, which are among the most frequent patterns in control charts, are discussed in Section 3.3.1, and the specific pieces of process information included in the network are discussed in Section 3.3.2.

Bayesian networks are powerful tools for knowledge representation and inference under uncertainty. The graphical nature of Bayesian networks makes the relationships among different variables and features visible. Using the conditional independencies in the structure, they are able to perform probabilistic inference; they can not only learn from their mistakes, but also work with incomplete data. Such characteristics make Bayesian networks a suitable candidate for modeling the relationship structure in Fig. 1.

The rest of the paper is organized as follows: Section 2 reviews different techniques of RCA in the literature. Section 3 presents an introduction to Bayesian networks, followed by the detailed design of the model and the proposed data structure. Section 4 compares the proposed Bayesian network method with K-Nearest Neighbor (KNN) and Multi-Layer Perceptron (MLP) classifiers, and discusses its performance under various conditions. Finally, Section 5 presents the conclusions and areas for future research.

2. Root cause analysis literature

There are a number of RCA methods in SPC; there are also successful methods in other engineering fields, mainly based on artificial intelligence techniques, that are considered in this research. In this regard, Section 2.1 reviews the methods developed in the SPC context, and Section 2.2 surveys methods from other engineering fields.

2.1. Root cause analysis techniques in SPC

Seder (1950a, 1950b) discusses the use of root cause diagrams in identifying the assignable causes of out-of-control conditions. Yang, He, and Xie (1994) use multivariate statistical methods


Fig. 1. Relationship among chart patterns, process information and possible root causes.

combined with engineering judgment to diagnose assignable causes. Doggetti (2005) provides a framework for analyzing the performance of three RCA tools: the cause-and-effect diagram, the interrelationship diagram, and the reality tree; the framework provides information on the performance characteristics of the tools so that decision makers can better understand the underlying assumptions of a recommended solution. Sarkar (2004) proposes a technique based on the analysis of the sequence of events preceding the out-of-control state to identify the most likely cause/s of failure. Pollock, Raymer, and Waters (1998) discuss an RCA procedure in practice. Bothe (2001) suggests the use of run charts to confirm root causes. Dew (1991) discusses three tools used in RCA: the process diagram, barrier analysis, and change analysis. Montgomery (2005), Doty (1996) and Smith (2004) divide the assignable causes into the six categories of Man, Machine, Method, Material, Measure, and Environment, and suggest employing control chart patterns to determine which one or more of these categories caused the out-of-control situation. Demirli and Vijayakumar (2008) develop a fuzzy rule-based system for the X̄ chart, based on a control pattern-cause relationship network, to resolve the uncertainties in identifying the (real) chart patterns and relating them to assignable causes. They also note that categorizing the out-of-control states into isolated shifts, sustained shifts and gradual shifts can hasten the RCA process.

2.2. Other root cause analysis techniques

Dassau and Lewin (2006) use optimization as a means of automating RCA. They formulate the problem as a mixed-integer nonlinear program, whose system variables include the possible perturbations that cause low quality and low yield, with possible process improvements as decision variables. Weidl, Madsen, and Israelson (2005) develop a methodology that integrates decision-theoretic troubleshooting with risk assessment for industrial process control; they model the process using generic object-oriented Bayesian networks, and their system presents corrective actions with explanations of the root causes. Motschman and Moore (1999) discuss the process of RCA as well as corrective action in transfusion medicine. Dhafr, Ahmad, Burgess, and Canagassababady (2006) develop a methodology for identifying various sources of quality defects on the product. Leonhardt and Ayoubi (1997) present a summary of methods that can be applied to automatic fault diagnosis, with a focus on classification and fuzzy-based techniques. Mo, Lee, Nam, Yoon, and Yoon (1997) suggest a methodology based on a clustered symptom tree, which uses signed digraphs to represent the causal relationships between process variables and/or the propagation paths of faults in a simple and graphical way. Ge, Du, Zhang, and Xu (2004) use Hidden Markov Models for metal stamping process monitoring and fault diagnosis; they use a number of autoregressive models to model the monitoring signal in different time periods of a stamping operation and use the residuals as features, and then employ a Hidden Markov Model (HMM) for classification. Widodo and Yang (2007) present a survey of machine condition monitoring and fault diagnosis using support vector machines. Lunze and Schiller (1999) provide an example of fault diagnosis by means of probabilistic logic reasoning. Dey and Stori (2005) use data from multiple sensors on sequential machining operations through a causal belief network framework to provide a probabilistic diagnosis of the root cause of process variation. Chang and Ho (1999) apply neural network monitoring techniques to process control in an integrated monitoring/diagnosis scheme; their technique contains a modified cause/effect diagram, including process and part information, which speeds up the diagnosis process.

3. Proposed Bayesian network

This section discusses the design process of the proposed Bayesian network. Section 3.1 provides an introduction to Bayesian networks as a general framework for the following sections. Section 3.2 discusses the detailed structure of the proposed network. Finally, Section 3.3 explains the data structure of the proposed method.

3.1. General structure of Bayesian networks

A Bayesian network, $B = \langle G, \Theta \rangle$, is a directed acyclic graph $G$ that encodes a joint distribution over a set of random variables $U = \{X_1, \ldots, X_n\}$, where each variable $X_i$ takes values from a finite set, with an instantiation of each variable denoted by lowercase letters, $\{x_1, \ldots, x_n\}$. The random variables are represented as vertices, and direct relationships between them are represented as edges. Graph $G$ encodes conditional independence: each variable is independent of its non-descendants given the state of its parents. This conditional independence allows modularity in the network, in which a complex system can be represented by several consistent modules; therefore, the statistical relations among variables can be represented with fewer parameters. The symbol $\Theta$ denotes the set of parameters that quantifies the network. The joint distribution over $U$ defined by the Bayesian network $B$ is

$$P_B(U) = \prod_{i=1}^{n} P_B\left(X_i \mid \Pi_{X_i}\right) \qquad (1)$$

where $\Pi_{X_i}$ denotes the set of parents of $X_i$ in $G$. There is a parameter $\theta_{x_i \mid \pi_{x_i}}$ for each possible value $x_i$ of $X_i$ and each configuration $\pi_{x_i}$ of its parents $\Pi_{X_i}$. Therefore, Eq. (1) can be written as $P_B(x_1, \ldots, x_n) = \prod_{i=1}^{n} \theta_{x_i \mid \pi_{x_i}}$. These conditional independencies allow inference to be performed in a reasonable amount of time. It is clear from Eq. (1) that the absence of possible edges implies conditional independence and requires fewer parameters; therefore, a sparse network is much easier to learn than a denser one.


Fig. 2. A simple Bayesian network with three variables: X1 → X3 ← X2.

Fig. 2 shows a sample Bayesian network, with $X_1$ and $X_2$ as the root nodes and $X_3$ as the leaf node. Therefore $\Pi_{X_1} = \Pi_{X_2} = \{\}$, which means $X_1$ and $X_2$ do not have any parents, and $\Pi_{X_3} = \{X_1, X_2\}$, i.e., $X_1$ and $X_2$ are the parents of $X_3$. If all $X_i$ have two states, the Bayesian network needs 6 parameters to quantify the conditional distributions: $\theta_{X_1 = x_{11}}$, $\theta_{X_2 = x_{21}}$, $\theta_{X_3 = x_{31} \mid x_{11}, x_{21}}$, $\theta_{X_3 = x_{31} \mid x_{11}, x_{22}}$, $\theta_{X_3 = x_{31} \mid x_{12}, x_{21}}$, and $\theta_{X_3 = x_{31} \mid x_{12}, x_{22}}$. Since every variable in the network is binary, the parameter for the second state of a variable is simply the complement of the first, for example $\theta_{X_3 = x_{32} \mid x_{11}, x_{22}} = 1 - \theta_{X_3 = x_{31} \mid x_{11}, x_{22}}$. These conditional probabilities are kept in probability tables, and the size of a node's probability table increases exponentially with its number of parents.
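To make the factorization of Eq. (1) concrete, the following minimal sketch (in Python; all probability values are invented for illustration and are not from the paper) computes the joint probability of the Fig. 2 network from its conditional probability tables:

```python
# Joint probability of the three-node network of Fig. 2 (X1 -> X3 <- X2).
# All numbers below are made-up illustrative values.

# Root-node priors: P(X1 = state) and P(X2 = state), states 1 and 2.
p_x1 = {1: 0.3, 2: 0.7}
p_x2 = {1: 0.6, 2: 0.4}

# CPT of X3 given its parents: P(X3 = 1 | X1, X2); P(X3 = 2 | .) is the complement.
p_x3_given = {
    (1, 1): 0.9, (1, 2): 0.5,
    (2, 1): 0.4, (2, 2): 0.1,
}

def joint(x1: int, x2: int, x3: int) -> float:
    """P(X1=x1, X2=x2, X3=x3) = P(x1) * P(x2) * P(x3 | x1, x2), per Eq. (1)."""
    p3 = p_x3_given[(x1, x2)]
    if x3 == 2:
        p3 = 1.0 - p3
    return p_x1[x1] * p_x2[x2] * p3

# The joint distribution sums to one over all 2**3 instantiations.
total = sum(joint(a, b, c) for a in (1, 2) for b in (1, 2) for c in (1, 2))
print(joint(1, 2, 1), total)  # 0.06, 1.0
```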

The parameters of the Bayesian network must be learned before any inference can be used for classification. In the proposed method the structure is assumed known, so there is no need to learn the structure of the Bayesian network; the problem reduces to estimating the conditional probability parameters from a given training dataset $D = \{u_1, \ldots, u_M\}$. For this purpose, the training data are used to encode all the conditional probabilities in the graph. When the training data are fully observed (complete data), one common approach is maximum likelihood estimation. Since the data vectors are independent, the resulting joint probability can be written as:

$$P(u_1, \ldots, u_M \mid \Theta) = \prod_{i=1}^{M} P(x_{i1}, \ldots, x_{in} \mid \Theta) = \prod_{i=1}^{M} \prod_{j=1}^{n} P(x_{ij} \mid \pi_{X_j}) = w(\Theta \mid D) \qquad (2)$$

The function $w(\Theta \mid D)$ is called the likelihood function, and $x_{ij}$ denotes the value of the $j$th variable in the $i$th sample. In maximum likelihood, we wish to find $\Theta^*$ where:

$$\Theta^* = \arg\max_{\Theta} P(D \mid \Theta) \qquad (3)$$

It is well known that the ML estimators reduce to the relative frequencies of cases in the training data. Here, for simplicity, $X_i = k$ specifies that the $i$th variable takes its $k$th possible state. Then the relative frequency of cases in which variable $X_i$ takes state $k$ under parent configuration $\Pi_j$ is

$$\theta_{i, \Pi_j, k} = \frac{N(X_i = k, \Pi_j)}{\sum_{k} N(X_i = k, \Pi_j)} \qquad (4)$$

After learning the parameters of the conditional probability tables, the network is ready for inference. The inference problem in our study is to calculate the classification probability given the learned conditional probabilities and external evidence from the testing data. For example, for the $(M+1)$th data point the class probabilities are calculated as $P(C_i \mid \Theta^*, u_{M+1}, B)$ for a given network $B$ and learned parameters $\Theta^*$. By exploiting local independencies, Pearl (1988) developed a message-passing algorithm that enables efficient and exact inference in Bayesian networks; messages are initiated from each instantiated variable to its neighbors, which then pass messages on to their own neighbors. The details of this algorithm can be found in Pearl (1988).

When the data $D$ is complete, the network parameters can be obtained by MLE as in Eq. (2). However, in real-world applications the data may contain incomplete records: values may be missing or some variables may not be observed, because of the high cost of the data collection procedure, data-entry errors, unavailable data, or sensor reading problems (Demirtas, Arguelles, Chung, & Hedeker, 2007; Formann, 2007; Pendharkar, 2008). In the literature, researchers handle missing values in different ways, such as deleting the cases that involve any missing values; this is of course not the most appropriate solution, since relevant information is lost and the amount of available data decreases. Alternatively, the missing values may be imputed with mean values, which creates other problems such as distorting the original distributions. One popular option, adopted in this research, is the Expectation Maximization (EM) algorithm (Dempster, Laird, & Rubin, 1977; Redner & Walker, 1984), a proven method for computing MLEs in problems involving missing data or incomplete information (Lauritzen, 1995). In this regard, assume that $Y = (Y_1, \ldots, Y_M)$ is the observed data and $Z = (Z_1, \ldots, Z_M)$ is the missing data, so the complete data is $D = (Y, Z)$. When the data is incomplete, Eq. (2) cannot be applied directly; the main difficulty in learning with incomplete data is that the likelihood function does not decompose. The EM algorithm therefore follows an iterative approach: it starts with an initial guess of the parameters $\Theta^{(0)}$ and iteratively generates succeeding estimates $\Theta^{(1)}, \Theta^{(2)}, \ldots$. Each iteration consists of the two steps that give the algorithm its name:

1. The E-step, which finds the conditional expectation of the complete-data log-likelihood given the observed data:

$$Q(\Theta \mid \Theta^{(t)}) = E_{\Theta}\left[\log p(D \mid \Theta) \mid \Theta^{(t)}, Y\right] \qquad (5)$$

2. The M-step, which maximizes the expectation computed in the E-step:

$$\Theta^{(t+1)} = \arg\max_{\Theta} Q(\Theta \mid \Theta^{(t)}) \qquad (6)$$

These two steps are repeated until convergence of the likelihood, and each iteration is guaranteed to increase the likelihood (Bilmes, 1998; Bishop, 2006).
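As a concrete illustration of Eqs. (4)-(6), the following toy sketch (our own didactic example in Python, not the authors' implementation) runs EM for a single binary node with values missing at random; the M-step is the relative-frequency count of Eq. (4) applied to expected counts:

```python
import random

# Toy EM for estimating theta = P(X = 1) of one binary node when some
# observations are missing (None). The paper's network runs EM over all
# conditional probability tables; this sketch shows only the E/M mechanics.

def em_binary(data, theta=0.5, iters=50):
    for _ in range(iters):
        # E-step: expected count of X = 1; a missing record contributes
        # its posterior expectation, which for an isolated node is theta.
        expected_ones = sum(theta if x is None else x for x in data)
        # M-step: relative frequency, i.e. Eq. (4) on the expected counts.
        theta = expected_ones / len(data)
    return theta

random.seed(0)
sample = [1 if random.random() < 0.7 else 0 for _ in range(500)]
# Hide 30% of the values, mimicking the missing-data scenarios of Section 4.
masked = [None if random.random() < 0.3 else x for x in sample]
print(round(em_binary(masked), 3))  # close to the complete-data frequency
```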

3.2. Detailed structure of the proposed Bayesian network

Fig. 3(a) shows the detailed structure of the proposed Bayesian network. The root causes constitute the root nodes of the proposed Bayesian network, while the chart patterns and process information form the leaf nodes. The root causes considered here are Man, Machine, Material, Method, Measure and Environment, the six most commonly considered root-cause categories, plus false alarm. False alarm is included because not every out-of-control signal from the control chart is true. The chart patterns and process information are discussed in Sections 3.3.1 and 3.3.2. Parts (b) and (c) of Fig. 3 show two possible arrangements of cause-and-effect relationships in the proposed Bayesian network, which are used for the numerical examples in Section 4. Most of these arrangements are taken from Demirli and Vijayakumar (2008), Montgomery (2005), Doty (1996) and Smith (2004). The first arrangement (left) is used for situations with a single assignable cause; such an assumption is common in practice because searching for assignable causes can be very expensive. On the other hand, the arrangement on the right is used for the condition in which


Fig. 3. The detailed structure of the proposed Bayesian network with the matrix of parent-child relations: (a) Bayesian network with chart patterns, process information and root causes; (b) Bayesian network relations for single causes; and (c) multiple causes.

either single or multiple assignable causes can occur. Such an assumption is more realistic, because it is not uncommon for a number of individually insignificant assignable causes to act together and produce a significant shift in the process. It should be noted that these arrangements can be changed based on the specific characteristics of the process under study.
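To make the arrangement concrete, the sketch below builds a drastically reduced version of the Fig. 3 network, with one root cause and two leaves, using the open-source pgmpy library (class names as in recent pgmpy releases; the node set and all probabilities are invented for illustration and are not the paper's trained values):

```python
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Root cause -> leaves, as in Fig. 3(a): here just Machine with one chart
# pattern (Trend) and one piece of process information (Maintenance).
model = BayesianNetwork([("Machine", "Trend"), ("Machine", "Maintenance")])

# P(Machine): state 0 = in order, state 1 = faulty (illustrative prior).
cpd_machine = TabularCPD("Machine", 2, [[0.9], [0.1]])
# P(Trend | Machine): a faulty machine makes a trend pattern more likely.
cpd_trend = TabularCPD("Trend", 2, [[0.95, 0.30], [0.05, 0.70]],
                       evidence=["Machine"], evidence_card=[2])
# P(Maintenance | Machine): state 1 = overdue service.
cpd_maint = TabularCPD("Maintenance", 2, [[0.80, 0.25], [0.20, 0.75]],
                       evidence=["Machine"], evidence_card=[2])

model.add_cpds(cpd_machine, cpd_trend, cpd_maint)
assert model.check_model()

# Diagnostic query: posterior of the root cause given the observed leaves,
# i.e. a trend pattern on the chart and an overdue maintenance record.
posterior = VariableElimination(model).query(
    variables=["Machine"], evidence={"Trend": 1, "Maintenance": 1})
print(posterior)
```

In the full network, each of the seven root nodes and the twenty leaves of Fig. 3 and Table 2 would be declared in the same way.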

3.3. Data structure of the proposed method

The proposed Bayesian network uses two sources of data, namely chart patterns and process information. To use these data effectively, they should be organized in a data structure. For this purpose, Section 3.3.1 briefly discusses the chart patterns under the framework of run rules, and Section 3.3.2 explains the required process information and suggests a data structure for capturing it.

3.3.1. Control chart patterns

There are different methods for pattern recognition in control charts; among them, run rules are the most common in practice. Run rules may indicate an out-of-control condition either when one or more points fall beyond the control limits, or when a pre-specified number of points falls in specific zones of the control chart (A, B, C above the center line and C, B, A below it) or exhibits specific behaviors. Western Electric (1965) and Nelson (1984) suggested sets of decision rules for detecting non-random patterns (see Table 1). Fig. 4 shows a trend pattern on an X̄ chart detected by run rule 5. More investigations of run rules can be found in the works of Champ and Woodall (1987), Page (1955), Roberts (1958), Bissell (1978) and Wheeler (1983).

Table 1
Western Electric company run rules.

Rule 1 (OCL): one point plots outside the UCL or LCL.
Rule 2 (Freak1, FRK1): two out of three consecutive points plot in the upper or lower zone A.
Rule 3 (Freak2, FRK2): four out of five consecutive points plot in zones A-B (upper) or B-A (lower).
Rule 4 (Run): eight consecutive points plot in zones A-B-C (upper) or C-B-A (lower).
Rule 5 (Trend): seven consecutive points continuously increasing or decreasing.

Fig. 4. X̄ control chart showing a trend pattern in the last seven points.
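For illustration, the run rules of Table 1 can be sketched as follows (in Python; the zone boundaries assume the usual 1σ/2σ/3σ bands, and the "or beyond" readings of rules 2-4 follow the standard Western Electric phrasing):

```python
# Run-rule checks of Table 1 over standardized points z = (x - CL) / sigma.
# Zone A: 2-3 sigma, zone B: 1-2 sigma, zone C: 0-1 sigma (on each side).

def ocl(z):     # Rule 1: one point beyond the control limits
    return any(abs(v) > 3 for v in z)

def freak1(z):  # Rule 2: 2 of 3 consecutive points in zone A or beyond, same side
    return any(sum(v > 2 for v in w) >= 2 or sum(v < -2 for v in w) >= 2
               for w in zip(z, z[1:], z[2:]))

def freak2(z):  # Rule 3: 4 of 5 consecutive points in zone B or beyond, same side
    return any(sum(v > 1 for v in w) >= 4 or sum(v < -1 for v in w) >= 4
               for w in zip(*(z[i:] for i in range(5))))

def run(z):     # Rule 4: 8 consecutive points on one side of the center line
    return any(all(v > 0 for v in w) or all(v < 0 for v in w)
               for w in zip(*(z[i:] for i in range(8))))

def trend(z):   # Rule 5: 7 consecutive points strictly increasing or decreasing
    return any(all(w[i] < w[i + 1] for i in range(6)) or
               all(w[i] > w[i + 1] for i in range(6))
               for w in zip(*(z[i:] for i in range(7))))

z = [0.2, 0.5, 0.9, 1.4, 1.8, 2.2, 2.6]  # steadily rising sample path
print(trend(z), ocl(z))                   # True False
```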

3.3.2. Process information

Table 2 shows the data structure considered for the proposed method, which includes specific pieces of information from the process together with the chart patterns (the chart patterns were discussed in Section 3.3.1). Clearly, one may add data pieces to, or remove them from, the table according to the characteristics of the process under study.

Table 2
Data structure (for the leaves) of the proposed Bayesian network; column numbers are given in parentheses.

Man: (1) Employee ID, (2) Employee performance, (3) Employee change, (4) Overtime.
Machine: (5) Machine ID, (6) Maintenance condition, (7) Machine adjustment.
Material: (8) Raw material code, (9) New material, (10) Material AQL.
Method: (11) Procedure, (12) New methods.
Measure: (13) Gauge ID, (14) Gauge R&R.
Environment: (15) Environmental level.
Chart pattern: (16) OCL, (17) FRK1, (18) FRK2, (19) RUN, (20) Trend.

In Table 2, under the Man subcategory, there are four items: (1) Employee ID, (2) Employee performance, (3) Employee change, and (4) Overtime. (1) Employee ID represents the operator who was working at the station at the time of failure; it is included in the data structure because typically some employees make more errors on certain tasks, some have problems working with certain machines, and so on. (2) Employee performance refers to the performance of the employee over the most recent working periods, since the trend of an employee's performance has a direct relation to the number of errors he or she makes; this information can be obtained from scorecard forms or other resources. (3) Employee change determines whether there were any changes in the

operator of the machine/station at or near the time of the signal, because many shifts occur right after operator substitutions. (4) Overtime shows whether the operator was working overtime or during regular time.

The Machine subcategory contains three items: (1) Machine ID, (2) Maintenance condition and (3) Machine adjustment. (1) Machine ID gives the code of the machine that was active at the time; the justification for including it in the data structure is the same as for Employee ID. (2) Maintenance condition shows the service level of the machine and its instruments at the time of use; this information is taken from maintenance reports and is included in the model because it is closely related to the quality of the process. (3) Machine adjustment shows whether the machine had a very recent setup, tuning, etc.; this information is helpful because a change in any process element setting can increase the probability of an out-of-control signal.

The Material subcategory includes: (1) Material code, (2) Material change, and (3) Material AQL. (1) Material code captures the supplier's quality status and its effect on the rate of out-of-control alarms. (2) New material/Material change points out whether there was a change in the material in the process at the time of the signal, because some operators may need time to get used to a new material, and some machines cannot process certain materials effectively. (3) AQL is the average quality level of the batch of material.

The Method subcategory has two items: (1) Procedure and (2) Changes. (1) Procedure shows the work order that was followed by the operator and/or machine; as in the cases discussed before, some procedures may not be executed effectively by certain operators and machines. (2) New method/Method changes records whether the procedure was changed or renewed; again, as this affects the elements of the process, it increases the risk of error in the process.

The Measure subcategory contains: (1) Gauge ID and (2) Gauge Reproducibility and Repeatability (R&R). (1) Gauge ID identifies the measurement tools, with the same justification as the Machine and Material IDs. (2) Gauge R&R captures the reproducibility and repeatability of the measurement gauge and

quality control personnel, which is a significant factor in the rate of false alarms.

The Environment subcategory represents the severity of the conditions at the work station. Clearly, a large number of out-of-control conditions are due to environmental issues such as pollution, noise, etc.

The main objective of the proposed method is to find the probability of occurrence of the root nodes based on the observed values of the leaf nodes. For this purpose, data pieces 1-15 in Table 2 have to be gathered from the process after each failure. In parallel, data pieces 16-20 in Table 2 should be filled in based on the non-random patterns observed on the control chart after each out-of-control signal. Given enough such data records, along with their related root causes, the parameters of the proposed Bayesian network can be calculated by training. Afterward, whenever an out-of-control condition is detected, given the process information and the observed pattern related to that failure, the probability of occurrence of every possible root cause/s can be calculated by the Bayesian network. After fixing the problem, a new record can be added to the training dataset in order to improve the parameter estimates of the Bayesian network.
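As an illustration of this record layout, one such training record might look as follows (a sketch only; the field names mirror Table 2, and all values are invented):

```python
# One training record following the 20-column layout of Table 2, plus the
# verified root cause(s) attached after the failure is fixed (Section 3.3.2).
record = {
    # Man
    "employee_id": "OP-117", "employee_performance": "low",
    "employee_change": True, "overtime": False,
    # Machine
    "machine_id": "M-04", "maintenance_condition": "overdue",
    "machine_adjustment": True,
    # Material
    "raw_material_code": "RM-21", "new_material": False, "material_aql": "ok",
    # Method
    "procedure": "WO-8", "new_methods": False,
    # Measure
    "gauge_id": "G-2", "gauge_rr": "acceptable",
    # Environment
    "environmental_level": "normal",
    # Chart-pattern flags (run rules of Table 1)
    "OCL": False, "FRK1": False, "FRK2": False, "RUN": False, "Trend": True,
    # Label: the diagnosed root cause(s) used for training
    "root_causes": ["Machine"],
}
```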

4. Verification and validation

To verify the performance of the proposed method, two independent sets of simulation studies have been conducted. In the first series of simulations, the proposed method is compared with K-Nearest Neighbor (KNN) and Multi-Layer Perceptron (MLP) classifiers. In the second series, the proposed method is evaluated under various conditions to gain detailed information about its performance. For each set of simulation studies, various sets of input-output data are generated using statistical simulations based on the data structure discussed in Section 3.3.2; these datasets are then used to train and evaluate the proposed method under different situations. The simulated data originate from a real process in a telecommunication systems manufacturing company. Although the original datasets could truly examine the effectiveness of the proposed method, their incompleteness and small size prevent them from being used directly for evaluation. To overcome this problem, the distributions of the leaf nodes, given a known root cause/s, are identified from the original datasets; these distributions, with the original or modified parameters, are then used to generate the training and testing datasets.

Two major scenarios have been considered for developing the training and testing datasets: (1) single assignable causes, covering cases in which the failure in the process originated from one of the causes Man, Machine, Material, Method, Measure, and Environment; and (2) multiple assignable causes, covering cases in which the failure originated from two of the above causes. For each series of simulation studies these two scenarios are examined along with the case of false alarm, which evaluates the performance of the proposed method in diagnosing wrong signals from the control chart, when no shift has occurred in the process. To study the effect of sample size on the performance of the proposed method, training datasets of sizes 40, 80, 120, 160, 200, 240, 280, 320, 360, 400, 440, and 480 have been considered, while, to keep the results comparable, testing datasets of size 480 are used. Finally, to analyze the effects of missing data, we study and compare the performance of the proposed method on complete data as well as on 10%, 20% and 30% missing data in the training, testing, and both training and testing datasets. To arrange the datasets with missing values, original values in the dataset are randomly replaced by null values (positions drawn from a uniform distribution) according to the given proportion of missing data.
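The masking step just described can be sketched as follows (a sketch; the paper specifies only that replacement positions are drawn uniformly at random):

```python
import random

def inject_missing(records, proportion, seed=None):
    """Randomly replace values with None at the given proportion,
    drawing positions uniformly at random (Section 4)."""
    rng = random.Random(seed)
    masked = [row[:] for row in records]   # do not mutate the input
    for row in masked:
        for j in range(len(row)):
            if rng.random() < proportion:
                row[j] = None
    return masked

data = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 1]]
print(inject_missing(data, 0.3, seed=42))
```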

4.1. Comparing the proposed method with KNN and MLP

Fig. 5 shows the structure of the simulations conducted for comparing the performance of the proposed Bayesian network with KNN and MLP. In this study, KNN with 1, 3 and 5 nearest neighbors and an MLP with 3 layers, a learning rate of 0.3, a momentum of 0.2, and 200 training passes have been considered. These classifiers have been trained on datasets of sizes 40, 80, . . ., 480 and compared on datasets of size 480 with 0%, 10% and 20% missing values in the training and testing datasets, under single and multiple assignable causes.

4.1.1. Simulation results for single assignable causes

Figs. 6 and 7 compare the performance of the proposed Bayesian network, KNN and MLP under complete and 10% missing data for single assignable causes; the results with 20% missing data are similar (see also Table A1 in Appendix A). The results in Figs. 6 and 7 show that when the number of training data is high, the performance of the compared methods, especially the proposed method and MLP, is close. However, when the number of training data is small, which is a realistic assumption in control chart applications, the proposed method has a clear advantage over the other techniques.

4.1.2. Simulation results for multiple assignable causes

Figs. 8 and 9 compare the performance of the proposed Bayesian network, KNN and MLP on complete and 20% missing data with multiple assignable causes; the results with 10% missing data are similar (see Table A2 in Appendix A). As in the simulations with single causes, in most cases, especially those with small training datasets, the proposed method acts as an upper bound on the other methods, followed by MLP. Overall, in almost all cases except a very few with large training datasets (in which MLP performs better), the proposed method outperforms the other methods in terms of accuracy. Such results show the ability of the proposed method to diagnose the root causes under different conditions.
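The comparison setup of Section 4.1 can be sketched as below. The paper does not name its tooling; scikit-learn is used here, and make_classification is only a stand-in for the paper's process simulator (the 20 features echo Table 2's columns, and the 7 classes echo the six causes plus false alarm):

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data: a fixed 480-sample test set plus a training pool.
X, y = make_classification(n_samples=960, n_features=20, n_informative=10,
                           n_classes=7, random_state=0)
X_test, y_test = X[:480], y[:480]

for n_train in range(40, 481, 40):        # training sizes 40, 80, ..., 480
    X_tr, y_tr = X[480:480 + n_train], y[480:480 + n_train]
    models = {
        "KNN-1": KNeighborsClassifier(n_neighbors=1),
        "KNN-3": KNeighborsClassifier(n_neighbors=3),
        "KNN-5": KNeighborsClassifier(n_neighbors=5),
        # Learning rate 0.3, momentum 0.2, 200 training passes as quoted
        # above; the hidden-layer size and SGD solver are our assumptions.
        "MLP": MLPClassifier(hidden_layer_sizes=(10,), solver="sgd",
                             learning_rate_init=0.3, momentum=0.2,
                             max_iter=200),
    }
    for name, clf in models.items():
        accuracy = clf.fit(X_tr, y_tr).score(X_test, y_test)
        print(f"{n_train:3d} {name}: {accuracy:.2f}")
```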

Fig. 5. The structure of simulations for comparing the proposed method with KNN and MLP (SS represents Sample Size).


Fig. 6. Accuracy summary for complete dataset with single assignable cause.

Fig. 7. Accuracy summary for 10% missing value in training and testing dataset with single assignable cause.

Fig. 8. Accuracy summary for complete data with multiple causes.

4.2. Evaluating the proposed method under different conditions

Fig. 10 shows the structure of the simulation studies conducted on the proposed method for single and multiple causes, along with the different rates of missing data and the different dataset sizes considered in this section.

4.2.1. Simulation results for single assignable causes

Fig. 11 illustrates the mean square error (MSE) of the proposed method in predicting the real causes of changes in the process, for different percentages of missing data in training and testing datasets of different sizes (see also Table A3 in Appendix A). As noted before, the size of the testing dataset is fixed at 480, and the estimates are based on 50 replicates. In Fig. 11, the first column of data points shows the MSE of 50 runs of the proposed method when it is trained on only 40 samples and tested on 480 samples without any missing data; the second column corresponds to a training dataset of size 80, and so on. As shown in Fig. 11, increasing the size of the training dataset decreases the MSE drastically, especially when moving from datasets of size 40 to 120. However, the slope of improvement flattens near dataset sizes of 200 and 240, after which there is no clear improvement. This shows that, even with a fairly small training dataset and even with missing values, the proposed method provides acceptable estimates. Another observation is that, as expected, the MSE of experiments with a lower ratio of missing data is lower than that of their counterparts.


Fig. 9. Accuracy summary for 20% missing in training and testing with multiple causes.

Fig. 10. The structure of the simulation studies for evaluating the proposed method under different conditions (SS represents Sample Size).

Fig. 11. Mean square error for different sizes of dataset and different percentages of missing data.

Meanwhile, training the proposed Bayesian network with missing data results in better performance when handling (testing) datasets with missing values.

Figs. 12 and 13 depict the classification accuracy and the related confidence intervals of the proposed method in diagnosing the real causes of shifts in the process, for different ratios of missing data and different dataset sizes (see also Table A4 in Appendix A).


Fig. 12. Classification accuracy for different sizes of dataset and different percentages of missing data.

Fig. 13. Confidence interval on classification accuracy for different sample sizes for: (a) complete data, (b) 10% missing in testing data, (c) 10% missing in training data, and (d) 10% missing in testing and training data.

Table 3 Confusion matrix of the proposed method for single assignable causes with no missing data.

Table 4
The combinations of assignable causes used for analyzing the performance of the proposed method. Possible combinations (1: Man, 2: Machine, 3: Material, 4: Method, 5: Measure, 6: Environment, 7: False alarm) comprise the single causes 1, 2, 3, 4, 5, 6, 7 and the pairs 1&2, 1&3, 1&4, 1&5, 1&6, 1&7, 2&3, 2&4, 2&5, 2&6, 2&7, 3&4, 3&5, 3&6, 3&7, 4&5, 4&6, 4&7, 5&6, 5&7 and 6&7; each combination is marked as feasible and, where applicable, evaluated.

Fig. 14. Mean square error for different sizes of dataset and different percentages of missing data.

Fig. 15. Classification accuracy for different sizes of dataset and different percentages of missing data.

Fig. 16. Confidence interval on classification accuracy for different sample sizes for: (a) complete data, (b) 10% missing in testing data, (c) 10% missing in training data, and (d) 10% missing in testing and training data.


Obviously, there is a close relation between these results and Fig. 11, but they are not exactly the same; the accuracy results simply show the rate of correct classification, which might be more tangible for practitioners.

Table 5 Confusion matrix of the proposed method for single and multiple assignable causes.

Table A1
Classification accuracy of the compared methods on single assignable causes. Each cell gives Accuracy (%) / SSE; (a, b) denotes the fraction of missing values in the (training, testing) datasets; columns are training dataset sizes.

Method (miss.)       40        80        120       160       200       240       280       320       360       400       440       480
MLP (0, 0)           49/0.33   60/0.30   59/0.30   69/0.27   64/0.28   63/0.28   67/0.27   65/0.28   68/0.27   68/0.27   69/0.27   69/0.26
KNN-1 (0, 0)         47/0.35   48/0.36   47/0.36   54/0.33   53/0.34   54/0.34   58/0.32   53/0.33   56/0.32   56/0.33   58/0.32   56/0.33
KNN-3 (0, 0)         45/0.33   52/0.30   50/0.31   55/0.30   55/0.30   56/0.30   60/0.28   55/0.29   59/0.29   59/0.29   60/0.28   60/0.28
KNN-5 (0, 0)         43/0.32   53/0.30   52/0.31   55/0.30   58/0.29   56/0.30   64/0.28   56/0.29   59/0.29   61/0.28   63/0.28   70/0.28
Bayesian (0, 0)      61/0.59   69/0.45   69/0.44   69/0.46   70/0.43   72/0.42   70/0.42   70/0.44   71/0.43   77/0.41   76/0.41   71/0.42
MLP (0.1, 0.1)       48/0.33   53/0.33   51/0.33   55/0.31   59/0.30   51/0.30   60/0.30   60/0.30   63/0.30   63/0.29   60/0.30   65/0.29
KNN-1 (0.1, 0.1)     45/0.33   52/0.30   50/0.31   55/0.30   55/0.30   56/0.30   60/0.28   55/0.29   59/0.29   59/0.29   60/0.28   60/0.28
KNN-3 (0.1, 0.1)     39/0.37   43/0.32   40/0.34   48/0.32   52/0.31   51/0.31   57/0.30   54/0.30   51/0.31   55/0.31   55/0.30   55/0.30
KNN-5 (0.1, 0.1)     42/0.33   45/0.31   39/0.33   48/0.31   53/0.30   51/0.30   59/0.29   55/0.29   53/0.30   56/0.30   58/0.29   57/0.29
Bayesian (0.1, 0.1)  55/0.61   64/0.54   66/0.51   70/0.48   68/0.48   65/0.48   66/0.49   68/0.47   66/0.48   70/0.46   71/0.46   69/0.47
MLP (0.2, 0.2)       44/0.35   50/0.33   50/0.33   57/0.31   53/0.32   54/0.32   56/0.31   56/0.31   55/0.32   61/0.30   59/0.31   58/0.31
KNN-1 (0.2, 0.2)     43/0.32   53/0.30   52/0.31   55/0.30   58/0.29   56/0.30   64/0.28   56/0.29   59/0.29   61/0.28   63/0.28   70/0.28
KNN-3 (0.2, 0.2)     37/0.34   44/0.33   40/0.34   40/0.33   45/0.32   43/0.33   45/0.32   45/0.36   44/0.32   48/0.31   51/0.31   46/0.31
KNN-5 (0.2, 0.2)     37/0.33   53/0.30   43/0.33   42/0.32   44/0.32   45/0.32   48/0.31   50/0.31   44/0.32   49/0.31   53/0.31   48/0.31
Bayesian (0.2, 0.2)  44/0.81   53/0.70   61/0.54   62/0.53   61/0.53   64/0.51   64/0.51   61/0.52   65/0.51   66/0.52   67/0.51   65/0.52


It is interesting that with a large enough training dataset, e.g. greater than 200, even with a high percentage of missing values, e.g. 30%, the proposed method can differentiate among the various possible causes of variability in the process with acceptable precision, e.g. over 70%, and with confidence intervals of small length. While Figs. 11 and 12 report the MSE and accuracy of the proposed method in classifying different assignable causes, Table 3 gives the confusion matrix of the proposed method, i.e., the percentage of correct and incorrect classifications for each assignable cause, based on 50 experiments each containing 480 data points. The percentage of correct classification appears on the diagonal of the matrix. For instance, the first highlighted cell in the table shows that the proposed method correctly classifies errors originating from "Man" with a probability of 92%. The results show that the probability of correct classification is highest for diagnosing shifts originating from "Man", while the weakest is for "Measure"; this might be due to the employed dataset, in which the difference between a calibrated and an uncalibrated measuring tool is very small.

4.2.2. Simulation results for multiple assignable causes

As noted before, it is common in practice to have more than a single assignable cause; for example, machine and material together can cause the out-of-control condition, and in such cases the assignable causes usually interact with each other. In this section we study the performance of the proposed method in handling two simultaneous assignable causes. Table 4 shows the combinations of assignable causes used for analyzing the performance of the proposed method. Fig. 14 illustrates the mean square error (MSE) of the proposed method when dealing with the different combinations of single and double assignable causes shown in Table 4 (see also Table A5 in Appendix A); other assumptions and settings are as in the previous cases. As expected, the MSE increases when there are multiple assignable causes, due to the exponential increase in the number of (assignable-cause) combinations among which the proposed method must decide. Nonetheless, the performance of the proposed method still looks acceptable under this condition, and the interpretation of the performance trends is much the same as in the single-assignable-cause case. Similarly, Figs. 15 and 16 show the classification accuracy and the related confidence intervals of the proposed method based on 50 replicates with various training dataset sizes and rates of missing values (see Table A6 in Appendix A); as in Fig. 14, the performance under multiple assignable causes is lower than in the single-cause case. Table 5 shows the confusion matrix of the proposed method in classifying the different combinations of single and double assignable causes of Table 4; the headings, showing the evaluated assignable causes, follow Table 4. As in Table 3, the probability of correct classification appears on the diagonal of the matrix. As can be seen from the table, although the number of classes is very high (compared to the number of training samples and the number of Bayesian network nodes), the rate of correct classification is still acceptable for many causes.

Table A2
Classification accuracy of the compared methods on multiple assignable causes. Each cell gives Accuracy (%) / SSE; (a, b) denotes the fraction of missing values in the (training, testing) datasets; columns are training dataset sizes.

Method (miss.)       40        80        120       160       200       240       280       320       360       400       440       480
MLP (0, 0)           30/0.29   41/0.27   44/0.26   51/0.25   52/0.25   53/0.25   56/0.23   60/0.23   56/0.24   61/0.23   63/0.22   65/0.22
KNN-1 (0, 0)         29/0.30   35/0.30   37/0.29   44/0.28   43/0.28   43/0.28   50/0.26   48/0.26   50/0.26   54/0.25   53/0.25   56/0.25
KNN-3 (0, 0)         25/0.28   35/0.27   36/0.27   43/0.26   41/0.25   41/0.26   45/0.24   46/0.24   45/0.24   53/0.23   50/0.24   50/0.24
KNN-5 (0, 0)         26/0.27   35/0.26   34/0.26   41/0.25   44/0.24   39/0.25   49/0.24   48/0.24   45/0.24   53/0.23   45/0.24   50/0.24
Bayesian (0, 0)      37/0.88   48/0.64   50/0.62   55/0.52   56/0.52   54/0.53   56/0.50   58/0.51   57/0.51   60/0.51   60/0.50   57/0.50
MLP (0.1, 0.1)       26/0.30   39/0.28   42/0.27   42/0.71   45/0.26   48/0.26   51/0.25   48/0.26   50/0.26   56/0.24   57/0.24   56/0.24
KNN-1 (0.1, 0.1)     27/0.30   30/0.30   32/0.30   38/0.29   40/0.29   40/0.29   44/0.27   48/0.27   41/0.27   52/0.25   45/0.27   53/0.26
KNN-3 (0.1, 0.1)     25/0.28   29/0.28   31/0.27   35/0.27   35/0.26   35/0.26   40/0.25   42/0.25   36/0.26   45/0.24   41/0.25   45/0.24
KNN-5 (0.1, 0.1)     23/0.28   27/0.27   32/0.27   36/0.36   35/0.26   35/0.26   41/0.25   44/0.25   37/0.25   48/0.24   42/0.24   45/0.24
Bayesian (0.1, 0.1)  40/0.84   41/0.72   44/0.64   47/0.61   47/0.59   50/0.55   52/0.55   54/0.55   55/0.55   52/0.57   53/0.56   50/0.54
MLP (0.2, 0.2)       23/0.30   28/0.29   35/0.28   39/0.27   37/0.28   41/0.27   41/0.27   43/0.27   43/0.27   47/0.26   48/0.26   45/0.28
KNN-1 (0.2, 0.2)     20/0.32   24/0.31   30/0.31   31/0.31   33/0.30   34/0.30   35/0.30   36/0.29   34/0.29   40/0.28   39/0.29   36/0.29
KNN-3 (0.2, 0.2)     17/0.29   27/0.28   31/0.27   30/0.27   34/0.26   31/0.27   34/0.26   33/0.27   32/0.27   32/0.26   35/0.26   39/0.26
KNN-5 (0.2, 0.2)     19/0.28   28/0.27   29/0.27   30/0.27   34/0.26   31/0.26   34/0.26   35/0.26   34/0.27   38/0.25   37/0.26   40/0.25
Bayesian (0.2, 0.2)  35/0.89   39/0.75   40/0.69   46/0.64   43/0.64   45/0.60   47/0.60   48/0.60   49/0.61   48/0.62   47/0.62   45/0.60


In general, the results from Tables 3-5 and Figs. 6-16 confirm the capability of the proposed method in diagnosing the root cause/s of out-of-control conditions under a variety of circumstances.

5. Conclusions

In this study we developed a hybrid intelligent approach based on Bayesian networks for fault detection and diagnosis in control charts. The proposed Bayesian network uses control chart patterns together with a set of specific information from the process at the time of change as inputs, and provides a ranked list of the most important root causes with their related probabilities of occurrence as the output. Through two sets of extensive simulation studies we verified the proposed method under different conditions: in the first series we compared its performance with neural network and nearest-neighbor classifiers, and in the second we calculated detailed statistics on its performance under different out-of-control situations with varying dataset sizes and rates of missing data. The proposed method is easy to construct and implement; it can be constructed by either a direct or an indirect approach, learns from its mistakes, and is robust to noise and missing data.

Table A3
Mean square error of the proposed method for different training dataset sizes and percentages of missing data (%), single assignable causes.

Missing data                      40  80  120  160  200  240  280  320  360  400  440  480
Complete data                     64  49  46   44   44   43   43   43   43   42   42   42
Miss. train (30%)                 71  54  49   46   45   44   45   44   43   43   43   43
Miss. test (30%)                  69  61  58   57   56   56   55   55   55   55   55   55
Miss. train (30%) + test (30%)    76  62  59   58   57   57   56   56   56   56   56   56
Miss. train (20%)                 68  52  48   46   45   43   44   43   43   43   43   43
Miss. test (20%)                  67  58  53   53   52   52   51   51   51   51   51   51
Miss. train (20%) + test (20%)    73  58  54   54   52   52   51   52   51   51   51   51
Miss. train (10%)                 67  51  47   45   44   43   44   43   43   43   42   43
Miss. test (10%)                  65  54  49   48   47   48   47   47   47   46   46   47
Miss. train (10%) + test (10%)    69  55  50   49   48   48   47   47   47   47   47   47

Table A4
Classification accuracy of the proposed method for different training dataset sizes and percentages of missing data (%), single assignable causes.

Missing data                      40  80  120  160  200  240  280  320  360  400  440  480
Complete data                     56  66  69   70   71   71   72   72   72   73   73   73
Miss. train (30%)                 52  63  67   69   70   71   69   71   72   72   72   72
Miss. test (30%)                  50  56  59   59   60   60   61   61   61   61   62   62
Miss. train (30%) + test (30%)    47  55  57   59   60   59   60   60   61   61   61   61
Miss. train (20%)                 55  64  67   69   70   72   70   72   72   73   72   72
Miss. test (20%)                  53  59  63   63   64   64   65   65   65   65   66   65
Miss. train (20%) + test (20%)    51  59  61   63   64   64   65   64   65   65   65   65
Miss. train (10%)                 56  65  68   69   71   71   71   72   72   72   72   73
Miss. test (10%)                  56  62  66   67   68   68   69   69   69   69   70   70
Miss. train (10%) + test (10%)    54  62  65   66   68   67   68   68   69   69   69   69

Table A5
Mean square error of the proposed method for different training dataset sizes and percentages of missing data (%), multiple assignable causes.

Missing data                      40  80  120  160  200  240  280  320  360  400  440  480
Complete data                     90  66  59   55   53   52   51   51   51   51   51   50
Miss. train (30%)                 97  70  61   57   55   54   53   52   52   51   51   51
Miss. test (30%)                  92  76  71   68   68   67   67   67   66   66   66   66
Miss. train (30%) + test (30%)    99  80  73   71   69   68   67   67   67   66   66   66
Miss. train (20%)                 95  67  61   56   54   53   52   51   52   51   51   51
Miss. test (20%)                  92  71  67   64   63   62   62   61   61   61   61   60
Miss. train (20%) + test (20%)    94  75  67   65   63   63   62   62   61   61   61   61
Miss. train (10%)                 89  66  57   55   53   53   52   51   51   51   51   50
Miss. test (10%)                  91  70  62   59   58   57   57   56   56   56   55   55
Miss. train (10%) + test (10%)    91  70  62   59   58   57   57   56   56   56   55   55

Table A6
Classification accuracy of the proposed method for different training dataset sizes and percentages of missing data (%), multiple assignable causes.

Missing data                      40  80  120  160  200  240  280  320  360  400  440  480
Complete data                     36  47  50   54   55   56   57   57   58   58   58   58
Miss. train (30%)                 34  45  50   51   53   55   55   55   56   57   57   58
Miss. test (30%)                  30  36  38   40   41   41   41   41   42   42   42   42
Miss. train (30%) + test (30%)    29  34  37   39   40   41   40   41   42   42   41   41
Miss. train (20%)                 35  46  50   52   54   55   56   56   57   57   57   57
Miss. test (20%)                  32  40  43   44   46   46   46   47   47   48   47   47
Miss. train (20%) + test (20%)    32  38  42   44   44   46   46   46   47   47   47   47
Miss. train (10%)                 37  47  52   53   54   56   56   56   57   58   57   58
Miss. test (10%)                  34  44  48   49   50   51   51   52   52   52   53   53
Miss. train (10%) + test (10%)    34  43  47   49   50   51   51   52   53   52   52   52


Appendix A

See Tables A1-A6.

References

Bilmes, J. A. (1998). A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report TR-97-02, U.C. Berkeley.
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
Bissell, F. (1978). An attempt to unify the theory of quality control procedures. Bulletin in Applied Statistics, 5, 113-128.
Bothe, D. R. (2001). One good idea: Use run charts to confirm root causes. Quality Progress, 34(2), 104.
Champ, C. W., & Woodall, W. H. (1987). Exact rules for Shewhart control charts with supplementary run rules. Technometrics, 29, 389-393.
Chang, S. I., & Ho, E. S. (1999). A two-stage neural network approach for process variance change detection and classification. International Journal of Production Research, 37(7), 1581-1599.
Dassau, E., & Lewin, D. (2006). Optimization-based root cause analysis. In 16th European symposium on computer aided process engineering (pp. 943-948).
Demirli, K., & Vijayakumar, S. (2008). Fuzzy assignable cause diagnosis of control chart patterns. In Annual meeting of the North American Fuzzy Information Processing Society (NAFIPS) (pp. 1-6).
Demirtas, H., Arguelles, L. M., Chung, H., & Hedeker, D. (2007). On the performance of bias-reduction techniques for variance estimation in approximate Bayesian bootstrap imputation. Computational Statistics and Data Analysis, 51, 4064-4068.
Dempster, A. P., Laird, N., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39, 1-38.
Dew, J. R. (1991). In search of the root cause. Quality Progress, 24(3), 97-102.
Dey, S., & Stori, J. A. (2005). A Bayesian network approach to root cause diagnosis of process variations. International Journal of Machine Tools & Manufacture, 45(1), 75-91.
Dhafr, N., Ahmad, M., Burgess, B., & Canagassababady, S. (2006). Improvement of quality performance in manufacturing organizations by minimization of production defects. Robotics and Computer-Integrated Manufacturing, 22(5-6), 536-542.
Doggetti, A. M. (2005). Root cause analysis: A framework for tool selection. Quality Management Journal, 12(4), 34-45.
Doty, L. A. (1996). Statistical process control (2nd ed.). New York, NY: Industrial Press.
Formann, A. K. (2007). Mixture analysis of multivariate categorical data with covariates and missing entries. Computational Statistics and Data Analysis, 51, 5236-5246.
Ge, M., Du, R., Zhang, G., & Xu, Y. (2004). Fault diagnosis using support vector machine with an application in sheet metal stamping operations. Mechanical Systems and Signal Processing, 18, 143-159.
Lauritzen, S. T. (1995). The EM algorithm for graphical association models with missing data. Computational Statistics and Data Analysis, 19, 191-201.
Leonhardt, S., & Ayoubi, M. (1997). Methods of fault diagnosis. Control Engineering Practice, 5(5), 683-692.
Lunze, J., & Schiller, F. (1999). An example of fault diagnosis by means of probabilistic logic reasoning. Control Engineering Practice, 7, 271-278.
Mo, K. J., Lee, G., Nam, D. S., Yoon, Y. H., & Yoon, E. S. (1997). Robust fault diagnosis based on clustered system trees. Control Engineering Practice, 5(2), 199-208.
Montgomery, D. C. (2005). Introduction to statistical quality control (5th ed.). New York, NY: Wiley.
Motschman, T. L., & Moore, S. B. (1999). Corrective and preventive action. Transfusion Science, 21, 163-178.
Nelson, L. S. (1984). The Shewhart control chart: Tests for special causes. Journal of Quality Technology, 16, 237-239.
Page, E. S. (1955). Control charts with warning lines. Biometrika, 42, 243-257.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Francisco, CA: Morgan Kaufmann.
Pendharkar, P. C. (2008). Maximum entropy and least square error minimizing procedures for estimating missing conditional probabilities in Bayesian networks. Computational Statistics and Data Analysis, 52, 3583-3602.
Pollock, S., Raymer, E., & Waters, J. (1998). A practical root cause analysis (RCA) procedure. Annual Spring Conference Proceedings, 20, 219-224.
Redner, R., & Walker, H. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26(2), 195-239.
Roberts, S. W. (1958). Properties of control chart zone tests. The Bell System Technical Journal, 37, 83-114.
Rooney, J. J., & Heuvel, L. N. V. (2004). Root cause analysis for beginners. Quality Progress, 45, 53.
Sarkar, P. (2004). Clustering of event sequences for failure root cause analysis. Quality Engineering, 16(3), 451-460.
Seder, L. A. (1950a). Diagnosis with diagrams - Part I. Industrial Quality Control, 2, 11-19.
Seder, L. A. (1950b). Diagnosis with diagrams - Part II. Industrial Quality Control, 6, 7-11.
Smith, G. M. (2004). Statistical process control and quality improvement (5th ed.). NJ, USA: Pearson/Prentice Hall.
Weidl, G., Madsen, A. L., & Israelson, S. (2005). Applications of object-oriented Bayesian networks for condition monitoring, root cause analysis and decision support on operation of complex continuous processes. Computers and Chemical Engineering, 29, 1996-2009.
Western Electric (1965). Statistical quality control handbook. Indianapolis, IN.
Wheeler, D. J. (1983). Detecting a shift in process average: Tables of the power function for X charts. Journal of Quality Technology, 18, 99-102.
Widodo, A., & Yang, B. S. (2007). Support vector machine in machine condition monitoring and fault diagnosis. Mechanical Systems and Signal Processing, 21(6), 2560-2574.
Yang, K., He, Y., & Xie, W. (1994). Statistical diagnosis and analysis techniques: A multivariate statistical study for an automotive door assembly process. Quality Engineering, 1, 128-130.