Data mining for quality control: Burr detection in the drilling process

Susana Ferreiro (a,*), Basilio Sierra (b), Itziar Irigoien (b), Eneko Gorritxategi (a)

(a) Fundación TEKNIKER, Eibar, Guipúzcoa, Spain
(b) University of the Basque Country, San Sebastián, Guipúzcoa, Spain

* Corresponding author. E-mail address: [email protected]

Computers & Industrial Engineering 60 (2011) 801-810
Article history: Received 25 March 2010; received in revised form 24 January 2011; accepted 27 January 2011; available online 4 February 2011.

Keywords: Data mining; Machine learning; Drilling process; Burr detection
Abstract

The drilling process is one of the most important operations in the aeronautic industry. It is performed on the wings of aeroplanes, and its main problem is burr generation. At present, a visual inspection and a manual burr-elimination task are carried out after drilling and before riveting to ensure the quality of the product. These operations increase the cost and the resources required by the process. This article shows the use of data mining techniques to obtain a reliable model to detect burr generation during high-speed drilling in dry conditions on aluminium Al 7075-T6, which makes it possible to eliminate the unproductive operations in order to optimize the process and reduce economic cost. Furthermore, this model should later be implementable in a monitoring system to detect automatically and on-line whether the generated burr is out of tolerance limits. The article explains the whole process of data analysis, from data preparation to the evaluation and selection of the final model.
1. Introduction

Nowadays, practically all fields of industrial activity are moving towards automation of their processes. This automation should ensure the quality of the product while minimizing manufacturing cost and optimizing resources. Drilling is the most important operation in the aeronautic industry because it implies a high economic cost. This cost is a consequence of the visual inspection and burr-elimination tasks: they are non-productive operations, carried out after drilling, and they should be eliminated or minimized to the greatest extent possible. A small or medium-sized aeroplane has more than 250,000 holes to be inspected, and wherever there is burr it must be removed. It is necessary to eliminate this manual process and replace it with a monitoring system able to detect automatically and on-line when there is burr. The aim of this article is to obtain a model that can be implemented in the machine to predict burr generation during the drilling process.

The technological centre 'C.I.C. MARGUNE', a Cooperative Research Centre for High Performance Manufacturing, patented an experimentally adjusted monitoring method able to detect whether the size of the generated burr is within aeronautical limits (Peña, Aramendi, & Rivero, 2007). This method consists of a conventional mathematical model for burr detection based on parameters extracted from the whole internal signal of the machine, and its percentage of correct classification was 92%.
Nevertheless, that model could not be implemented in the machine, and there is currently no monitoring method for burr detection, so most of the research in this article focuses on obtaining a model that can be implemented in a monitoring system to predict burr generation automatically during the drilling process. This model was derived from a process that extracts useful, understandable and previously unknown knowledge from a set of experiments. Fig. 1 shows the communication among the different phases of the knowledge extraction process.

Storage, organization and information retrieval have been automated thanks to database systems and the availability of huge quantities of information. Some analytic techniques based on statistics have been used to analyze this information, but they are cryptic for people who are not experienced with them. Data mining, as explained in Michalski, Bratko, and Kubat (1998) and Kaelbling and Cohn (2003), is a multidisciplinary field that is easy to put into practice; it combines several techniques, such as statistics, machine learning, decision-making support systems and visualization, in order to extract knowledge from a data set, and each phase of the process includes a set of these techniques. The process is iterative and interactive. It is iterative because the output of any phase may feed back into previous steps and because several iterations are usually necessary to extract high-quality knowledge: various models must be explored to find the most useful one for the problem, and in the search for a good model it may be necessary to return to previous phases and make changes in the data; even the problem definition could be modified to give it a different approach. Moreover, the process is interactive because the expert in the problem domain should help in data preparation and evaluation.
Fig. 1. Data analysis process.
The evaluation is one of the most important phases of this process, and it needs well-defined training and validation stages to decide which model offers the best performance and accuracy. The idea is to estimate (train) the model with a subset of the dataset (the training set) and then validate it with the rest of the dataset (the test set).

The main contribution of this article is to show the usefulness and benefits of data mining techniques to obtain a robust, accurate and reliable model that can be implemented in a monitoring system and used to predict burr generation during the drilling process, as shown in Fig. 2.

The rest of the article is organized as follows. Section 2 presents a review of related and previous work on the use of other data-driven models in machining, and positions the present work in the context of previous ones. Section 3 describes the experimental dataset, the characteristics of the process, and the data selection and preparation that define a clean and reliable dataset. Section 4 briefly describes the concept of machine learning. Section 5 explains the results of the analysis and evaluation carried out to obtain the best final model for detecting burr generation: application of machine learning algorithms without selection of variables, then with selection of variables and combination of algorithms, and finally a change of strategy to eliminate false negatives (cases in which the model predicts no burr but burr is generated). Finally, Section 6 closes the article with the most important conclusions.
2. A review of related works

The aeronautic industry, like other industrial sectors, must modify some of its manufacturing processes and its maintenance strategy. Regarding maintenance, it is necessary to minimize the cost of maintenance and to increase operational reliability, replacing the traditional "fail and fix" approach with "predict and prevent", as explained in Ferreiro and Arnaiz (2010). With regard to manufacturing, the major need is to increase productivity and to optimize and automate certain processes while ensuring the quality of the product.
In both manufacturing and maintenance it is essential to explore new technologies, and many works on monitoring and diagnosis have been published. Bukkapatnam, Kumara, and Lakhtakia (1999) present a methodology based on chaos theory, wavelets and neural networks for analyzing acoustic emission (AE) signals; it involves a thorough signal characterization, followed by signal representation using wavelet packets and state estimation using multilayer neural networks. Bukkapatnam, Kumara, and Lakhtakia (2000) develop a methodology for accurate and algorithmically simple neural network estimation by exploiting the properties of the underlying machining dynamics and its interactions with flank wear dynamics. Kamarthi et al. (2000) investigate a flank wear estimation technique in turning through wavelet representation of acoustic emission signals; its effectiveness is assessed by conducting a set of turning experiments on AISI 6150 steel workpieces with K68 (C2) grade uncoated carbide inserts, in which flank wear is monitored through AE signals and a recurrent neural network of simple architecture is used to relate AE features to flank wear. Using this technique, flank wear estimation results are obtained for operating conditions within the range of those used during neural network training. Pittner and Kamarthi (2002) deal with the assessment of process parameters or states in a given application using features extracted from the wavelet coefficients of measured process signals. Sick (2002) describes the state of the art, covering 138 publications dealing with on-line and indirect tool wear monitoring in turning by means of artificial neural networks, and compares the methods applied in these publications as well as the methodologies used to select certain methods, to carry out simulation experiments, and to evaluate and present results. Rangwala and Dornfeld (2002) present a scheme that uses a feedforward neural network for learning and optimization of machining operations. The network learns by observing the effect of the input variables of the operation (such as feed rate, depth of cut, and cutting speed) on the output variables (such as cutting force, power, temperature, and surface finish of the workpiece).
Fig. 2. Elimination of unproductive operations.
The learning phase is followed by a synthesis phase during which the network predicts the input conditions to be used by the machine tool to maximize the metal removal rate subject to appropriate operating constraints. Byrne, Dornfeld, and Denkena (2003) review some of the main developments in cutting technology since the foundation of CIRP. Caprino, Teti, and De Iorio (2005) predict the residual strength of pre-fatigued glass fibre-reinforced plastic laminates through acoustic emission monitoring; an empirical correlation was found between the material residual strength and the total event counts detected at the maximum stress applied during pre-fatiguing cycles, and the correlation was improved when a previous model relying on fracture mechanics concepts was utilised. Gradually, the techniques have varied, always trying to improve the results obtained so far. In How, Liu, and Lin (2003) a rough set is used to extract causal relationships between manufacturing parameters and product quality measures. In order to identify the relationship between residual stresses and the process parameters themselves, Umbrello, Ambrogio, Filice, Guerriero, and Guido (2009) propose data mining techniques applied to the cutting process. Additionally, Malakooti and Raman (2000) define the problem of assessing the value of certain inputs when trying to minimize cost, maximize productivity and improve the surface finish.

Today, however, drilling is one of the most important manufacturing processes requiring attention. Peña et al. (2007) have studied ways to obtain cleaner holes, free of burr, in an effort to reduce or eliminate the lubricant used in the cleaning process prior to riveting. Nevertheless, the main problem is the occurrence of burr, and several studies of this question have been carried out, such as Kim et al. (2000) or Min et al. (2001). In these studies a control chart was developed for AISI 304L stainless steel and AISI 4118 low alloy steel in order to examine the drilling process in terms of cutting conditions and drill diameter.
Hambli (2002) describes a finite element approach with neural networks to predict the burr height of blanked parts. Heisel, Luik, Eisseler, and Schaal (2005) propose a method based on empirical cutting examinations and correlation between burr parameters. Lauderbaugh (2009) presents a methodology to predict burr height, force, heat flux, and temperature at breakthrough based on a statistical analysis of 2024-T351 and 7075-T6 aluminium. Gaitonde, Karnik, Achyutha, and Siddeswarappa (2008a) determine burr height and burr thickness by combining response surface methodology with genetic algorithms in drilling of AISI 316L stainless steel using HSS twist drills. Gaitonde, Karnik, Achyutha, and Siddeswarappa (2008b) present an application of the Taguchi optimization method to minimize burr height and thickness. Chang and Bone (2010) describe an analytical model to predict burr height in vibration-assisted drilling of aluminium 6061-T6.

The drilling process is particularly important in the aeronautics industry because of the need to ensure the safety of the product and to meet the statutory requirements regarding the maximum size of the burrs. Holes with burr exceeding the official limit of 127 µm are not permitted, even when the percentage of such holes is low. Several studies based on data mining approaches, machine learning algorithms or advanced statistics have been carried out since the patent Peña, Aramendi, Rivero, and López de LaCalle (2005), because, as mentioned in Wang (2007), ". . . human operators may never find such rules by investigating a dataset manually" and ". . . one may never be able to discover such hidden knowledge from a dataset without the assistance of computer-based data analysis and mining approaches". Initially, some classic machine learning algorithms were applied to the problem presented in this article, as detailed in Ferreiro et al. (2009). That study was carried out without the required precision, in an explorative way, only with the intention of seeing whether machine learning algorithms could improve the detection of burrs during the process.
These algorithms improved on the results of the patented model without much loss of time or need for computational resources. Then an in-depth study was carried out into the possibility of improvement through the use of various data mining approaches explained in Kaelbling and Cohn (2003), such as pre-processing of the data (discretization) and a wider selection of the most influential variables in the drilling process, based on different criteria and search methods. Moreover, the criterion to select the best among several models was set by evaluating each of them plus a hypothesis test that determines whether there are significant differences. This study is explained below.
3. Experimental setup: data selection and preparation

The main objective of the present work is to improve the detection of burr during the drilling process by using data mining techniques. The resulting model should improve the correct classification ratio obtained by the current mathematical model and should later be implementable in the machine, within a monitoring system, to predict burr generation automatically and on-line. The monitoring system should start from a database from which to create a model that is later introduced into the machine.

One of the fundamental tasks prior to the development of this project was to study the sensitivity of different signals to burr detection, and how to treat and use them to develop an on-line monitoring system. This implied analyzing the signals and evaluating which of them carried more information about the burr. The internal signals of the machine (torque of the spindle, power and advance force) were analyzed, and the studies concluded that these signals present certain advantages:

1. they constitute a simple acquisition method that does not require additional elements,
2. they form a non-intrusive method, since no elements are added to the work piece,
3. they provide easy integration into the machine control.

Fig. 3 shows an example of an internal signal captured during a drilling test. This signal belongs to the torque of the electro-spindle during the drilling of a hole, from the electro-spindle acceleration to its deceleration. It shows four areas: "Spindle acceleration" of the gear-head, "Approach to work piece", "Cutting" and "Spindle deceleration".

Expert studies of these types of signals concluded that the shape of the electro-spindle torque signal in the time domain is related to the size of the burr, and it was observed that the most representative area corresponds to the "Cutting" area, shown in Fig. 4.
Fig. 4. Cutting area.
Finally, the most influential and representative variables of the process, together with those describing the "Cutting" area of the spindle torque signal, were taken as predictive variables, as shown in Table 1. Drill bit, cutting speed, length in entrance, speed of advance, length in exit and thickness define the process, while minimum, maximum, angle, height and width were calculated from the "Cutting" area (a sketch of plausible definitions of these signal features appears after Table 1).

In order to develop a model, a set of experiments was performed following a design of experiments. The material for the tests was aluminium Al 7075-T6, commonly used in aeronautical structures, and the tests were performed in a high-speed CNC machining centre (3-axis) without lubricant. This machine operates at a maximum speed of 24,000 rpm and has a maximum feed rate of 120 m/min, an acceleration of 2g (g = 9.8 m/s2), a nominal power of 27 kW and a nominal torque of 16.97 Nm. The tool was a three-edged carbide drill with two different angles: point angle (130°, hard rock) and helix angle (30°, soft rock). The geometry of the drill-hole corresponded to a 10 mm diameter; the drilled lengths were 12 and 25 mm (the plate thicknesses), and the feed range was set at 0.2-0.5 mm/rev. Finally, the result was a dataset of 106 tests plus the class to be predicted, as represented in Table 2: Class (Burr = no), admissible burr; Class (Burr = yes), non-admissible burr. The class was categorized based on the permissible burr size imposed by the aeronautical industry, which demands a maximum size of 127 µm. The size of the burr was measured during the execution of the experiments at different angles (0°, 90°, 120°, 180°, 240°, 270°) by means of a roughness tester, and the mean of these values was used to categorize the class.

However, the dataset from which the algorithms learn is not always obtained through a design of experiments. Sometimes it is necessary to extract the data from complex databases which are difficult to understand and operate.

Fig. 3. Electro-spindle signal caught during drilling.

Table 1
Predictive variables.

Description         Variable  Origin         Type        Values
Drill bit           BRO       Configuration  Discrete    SR; HARD
Test num/drill bit  NUM       Configuration  Continuous
Velocity            VC        Configuration  Continuous
Length in entrance  TRM       Configuration  Discrete    8; 15; 20; 35
Advance 1           AV1       Configuration  Continuous
Advance 2           AV2       Configuration  Continuous
Length in exit      REC       Configuration  Discrete    20; 35
Thickness           ESP       Configuration  Discrete    12; 25
Minimum             MIN       Sensor         Continuous
Maximum             MAX       Sensor         Continuous
Angle               ANG       Sensor         Continuous
Height              ALT       Sensor         Continuous
Width               ANC       Sensor         Continuous
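To make the sensor-derived features of Table 1 concrete, the sketch below computes them from a sampled torque signal. The paper does not give formal definitions of angle, height and width, so the formulas here (height as maximum minus minimum, width as the duration of the cutting segment, angle as the slope of the initial rise) are plausible stand-ins for illustration only; the class and method names are ours.

public final class CuttingFeatures {

    // torque: sampled electro-spindle torque; [start, end): indices of the
    // "Cutting" area; dt: sampling period in seconds.
    static double[] extract(double[] torque, int start, int end, double dt) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        int argMax = start;
        for (int i = start; i < end; i++) {
            if (torque[i] < min) min = torque[i];
            if (torque[i] > max) { max = torque[i]; argMax = i; }
        }
        double height = max - min;          // assumed: peak-to-valley amplitude (ALT)
        double width = (end - start) * dt;  // assumed: duration of the cutting area (ANC)
        // assumed: ANG as the angle of a straight line from the start of the
        // area to its maximum, i.e. an overall "rise slope" of the signal
        double angle = Math.toDegrees(
                Math.atan2(max - torque[start], (argMax - start + 1) * dt));
        return new double[] {min, max, angle, height, width};  // MIN, MAX, ANG, ALT, ANC
    }
}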
Table 2
Dataset structure.

BRO   NUM  VC   TRM  AV1  AV2  REC  ESP  MIN    MAX    ANG   ALT    ANC    BURR
SR    1    150  35   0.3  0.3  35   25   0.11   0.31   42.3  2.091  10.44  YES
SR    20   200  35   0.4  0.4  35   25   0.21   0.243  55.7  2.228  17.04  YES
HARD  24   150  8    0.3  0.5  20   25   0.07   2.08   43.3  4.93   8.82   YES
HARD  12   250  35   0.2  0.2  35   12   0.69   0.3    31.1  3.78   14.32  YES
SR    5    200  20   0.3  0.5  20   12   0.203  0.84   22.5  3.689  10.77  NO
Such data, as noted in Fayyad, Piatetsky-Shapiro, and Smyth (1996), are typically too voluminous to understand and digest easily. After defining the initial dataset, a group of experts in the drilling process carried out a detection of irrelevant or unnecessary data, anomalous data (outliers), missing values, inconsistencies, etc., as in Hand, Mannila, and Smyth (2001). Consequently, the dataset became more representative and reliable. Each task mentioned above can draw on a broad set of data mining techniques, such as visual inspection, histograms, box plots, and methods to replace missing values or treat inconsistencies and outliers. These techniques are not yet commonly used by the experts, however, who usually rely on their knowledge and on visual inspection, as in the present work. Moreover, when the dataset is obtained from an experimental design, the data is more reliable and the effort required for pre-treatment is lower.

Having defined the dataset, and to complete this phase of data selection and preparation, we decided to perform a supervised discretization (Kononenko, 1995) of the dataset. Discretization is beneficial and usually performed when there is a significant threshold, when it is necessary to integrate different scales, when the error of the mean is large, when the model requires nominal variables, or when the model is slow with numerical variables. Data selection and processing is an important task that usually takes more than 50% of the analysis time, so the results obtained later depend to a certain extent on the quality of the initial dataset.
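As a rough illustration of this discretization step, the sketch below applies Weka's supervised discretization filter, an entropy/MDL method that offers the Kononenko (1995) criterion cited above as an option. It is our illustration, not the authors' code, and assumes a Weka 3 environment in which the class attribute of the Instances object has already been set.

import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.supervised.attribute.Discretize;

public final class DiscretizeStep {

    // Returns a discretized copy of the dataset; the class index must
    // already point at the BURR attribute.
    static Instances discretize(Instances data) throws Exception {
        Discretize disc = new Discretize();
        disc.setUseKononenko(true);   // Kononenko's MDL criterion (the -K option)
        disc.setInputFormat(data);
        return Filter.useFilter(data, disc);
    }
}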
4. What is machine learning?

Machine learning, as defined in Mitchell (1997a), is a subfield of artificial intelligence (Russell & Norvig, 1995) closely related to data mining (Michalski et al., 1998). Its aim is to develop algorithms that allow machines to learn from data, that is, to develop programs able to induce models that improve their performance over time from data; for this reason it is a knowledge induction process. Within machine learning, different types of algorithms can be distinguished depending on their functionality and application:

Supervised learning: these algorithms learn from a categorized set of data, resulting in a predictive model able to represent and generalize the behaviour contained in the data. Once created, the model is able to classify and categorize new cases of the problem it is trying to solve.

Unsupervised learning: these algorithms start from a set of uncategorized data, consisting of cases whose class is unknown, and analyze it in order to recognize different groups of cases with similar characteristics. The creation of these groups allows the extraction of information from the available data set, making visible certain characteristics that hide information.

Machine learning algorithms have been used for some years in specific applications. Langley and Simon (1995) offer a brief description of some of these applications, such as diagnosis of mechanical devices, preventing breakdowns in electrical
transformers, forecasting severe thunderstorms, predicting the structure of proteins, and making credit decisions. Moreover, these algorithms are increasingly widely used in medicine, as demonstrated in Inza, Merino, et al. (2001), in bioinformatics, as explained in Inza et al. (2010), and in industrial applications, as shown in Nieves et al. (2009), Santos, Nieves, Penya, and Bringas (2009), Correa, Bielza, de Ramirez, and Alique (2008) and Correa, Bielza, and Pamies-Teixeira (2009). Machine learning mixes mathematical elements with statistics and computational sciences in techniques such as classification trees, induction rules, neural networks, Bayesian networks, regression algorithms, support vector machines, and clustering. The present work was developed using the Weka software (Witten & Frank, 2000), a collection of machine learning algorithms written in Java and developed at the University of Waikato (New Zealand). At the end of this article the validity of these techniques for the industrial field is shown, and a model valid for burr prediction and detection in the drilling process is obtained, improving on the current results of the conventional data analysis techniques.
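The following minimal sketch shows the style of Weka usage this work relies on: loading a dataset and inducing a classifier. The file name "drilling.arff" and the attribute layout are hypothetical stand-ins for the authors' dataset; the paper's "Naive Bayes simple" corresponds to a simple Naive Bayes variant shipped with Weka 3 releases of that era, and the standard NaiveBayes class is used here instead.

import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BurrModel {

    public static void main(String[] args) throws Exception {
        // Load the experiments; "drilling.arff" is a hypothetical file name.
        Instances data = DataSource.read("drilling.arff");
        data.setClassIndex(data.numAttributes() - 1);   // BURR as class attribute

        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(data);
        System.out.println(nb);   // prints the induced model
    }
}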
5. Presented approach and experimental results

After a first pre-processing of the data there was a set of 106 drilling tests, made up of the parameters of the process and the variables extracted from the spindle torque signal. The objective was to obtain a suitable classification model with a result of (Burr = yes | Burr = no). These classes were defined taking into account the imposed aeronautic limits: Burr = yes, non-acceptable burr (value equal to or greater than 127 µm); Burr = no, acceptable burr (value lower than 127 µm).

5.1. Evaluation

Evaluation is a very important question to bear in mind after learning a model, because the validity of the model depends on the quality of the evaluation. The following objectives are considered:

A. To estimate the real error rate of the prediction (with new validation samples). This rate should be calculated using data that have not been used for learning the model, because the error rate calculated on the training samples underestimates the error rate for new samples.

B. To select among two or more models. The evaluation assesses whether one model is better than another.

Because of the importance of these two objectives, the following procedure was carried out to calculate good estimates of the error rates of the models:

1. 10-fold cross-validation was applied. This technique estimates the performance of a predictive model. It randomly assigns the tests to 10 sets {d1, d2, . . ., d10} of equal size; the model is then trained on nine of the sets and tested on the remaining one, so that each set serves once as the test set. The accuracy of the run is the average of the accuracies obtained over the 10 sets.
2. The 10-fold cross-validation was repeated ten times (using different seeds), obtaining 10 correct classification rates.
3. The average of the 10 rates was calculated.

Finally, a ranking of the models was produced, based on the correct classification rate calculated in step 3. A sketch of this protocol appears below.
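The sketch below reproduces the protocol in the Weka API; it is our illustration, not the authors' code, and assumes a Weka 3 environment where "data" already holds the prepared dataset.

import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.core.Instances;

public final class RepeatedCrossValidation {

    // Mean correct classification rate (%) over ten runs of 10-fold
    // cross-validation, one seed per run (steps 1-3 above).
    static double meanAccuracy(Classifier base, Instances data) throws Exception {
        double sum = 0;
        for (int seed = 1; seed <= 10; seed++) {
            Evaluation eval = new Evaluation(data);
            // Weka trains on nine folds and tests on the held-out fold;
            // the seed controls how the tests are shuffled into folds.
            eval.crossValidateModel(base, data, 10, new Random(seed));
            sum += eval.pctCorrect();
        }
        return sum / 10.0;
    }
}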
5.2. First approach

Firstly, several machine learning algorithms presented in Mitchell (1997b) were applied, such as classification trees, induction rules, distance-based techniques, techniques based on probabilities, and neural networks.

5.2.1. Results

Tables 3 and 4 show the results obtained by some of the machine learning algorithms for the classification task (Burr = yes | Burr = no) before and after data pre-processing (discretization). It can be seen that the results with discretization are not much better; for some algorithms the accuracy even seems worse. Even so, because having the data discretized speeds up the training process of the algorithms, it was decided to work with the discretized data from here on. Most of the machine learning algorithms provide results better than or very close to those of the conventional approach used by the industrial partner (which, recall, gives a 92% correct classification rate). The three algorithms that provide the best results in Table 4 are marked with an asterisk (ID3, Prism, and KNN). These algorithms are very intuitive and can be developed and implemented in the machine easily.

5.3. Second approach

Once the standard approach had been applied, we attempted to improve the models, increasing their accuracy, validity, reliability and stability as in Kazakov and Kudenko (2001), by means of a feature subset selection (selection of variables), as explained in Mitchell (1997a, 1997b), and a combination of classifier algorithms.
Table 3
Results of machine learning algorithms before data discretization.

Type of classification             Algorithm           Mean value (%)  Standard deviation
Classification trees               J48                 92.85           1.06
                                   ID3                 94.66           1.86
Induction rules                    JRip                89.33           1.79
                                   Prism               95.71           1.14
Distance based techniques          KNN (k = 1)         92.09           1.06
                                   KNN (k = 3)         93.14           1.26
Techniques based on probabilities  Naive Bayes simple  94.09           0.85

Table 4
Results of machine learning algorithms after data discretization.

Type of classification             Algorithm           Mean value (%)  Standard deviation
Classification trees               J48                 92.19           1.17
                                   ID3                 94.95*          1.62
Induction rules                    JRip                89.24           1.56
                                   Prism               95.81*          1.12
Distance based techniques          KNN (k = 1)         95.43*          1.67
                                   KNN (k = 3)         93.43           0.30
Techniques based on probabilities  Naive Bayes simple  94              0.78
5.3.1. Feature subset selection

The aim of attribute selection, as stated in Nilsson (1996), is not to analyze the variables, a task already done in a previous section; it is a second selection, which determines the most influential variables and which of them improve the model. This feature subset selection has some advantages:

- Noise elimination, increasing data precision and the predictive and explanatory ability of the model.
- Irrelevant data elimination, decreasing the acquisition cost and the computational cost of the database.
- Redundancy elimination, avoiding problems of inconsistencies and duplications.

A wide variety of complex algorithms have been developed for the selection of variables in different applications, such as Inza, Larrañaga, Etxeberria, and Sierra (2000) and Inza, Larrañaga, and Sierra (2001). The present work combines several criteria and measures (such as information gain, explained variance, and correlation tests) together with search methods (such as exhaustive search or ranking) in order to reach the most representative set of variables.

5.3.2. Combination of classifier algorithms

The combination of classifiers, raised in Witten and Frank (2000), is carried out to improve classification accuracy and to decrease uncertainty. Each classifier has its own performance and its own error rate, due to its input space and/or algorithm; therefore, by combining different types of classifiers focusing on different aspects of the task, the overall error rate can be decreased. It is very interesting to reinforce the classification ability by taking maximum advantage of the fact that each model has a different set of well-classified cases. There are several methods to combine classifiers: combination of classifiers of different types, combination of classifiers of the same type, and hybrids. Combinations of classifiers of different types are distinguished by the method of combination (the order in which the classifiers are executed):

- In series or cascade: the combination is chained (the first model is included in the second, and so on).
- In parallel: the final classifier is obtained by means of combination strategies such as voting.
- Hierarchic: similar to cascade, but setting hierarchies amongst the classifiers in a structured way, like decision trees.

The combination of classifiers of the same type can be divided into two groups:

- Bagging: it combines the classifiers obtained from replicas of the training set created by the re-sampling method called bootstrapping.
- Boosting: it re-weights the samples in each iteration, concentrating on the previously misclassified cases.

Finally, hybrid methods are developed from the combination of two or more paradigms (e.g. Lazy Bayesian Rules, Naïve Bayes Tree, Logistic Model Trees). A sketch of the two best-performing combinations appears below.
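The snippet below sketches, in the Weka 3 API of that era, the two combinations that performed best in the experiments reported later (Table 7): a parallel "Vote" ensemble over KNN (IBk with k = 1), ID3 and Prism, and a boosted single learner via AdaBoostM1, Weka's boosting implementation. It is our illustration under those assumptions, not the authors' code.

import weka.classifiers.Classifier;
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.meta.Vote;
import weka.classifiers.rules.Prism;
import weka.classifiers.trees.Id3;

public final class Ensembles {

    // Parallel combination by voting over the three best single models.
    static Vote buildVote() {
        Vote vote = new Vote();
        vote.setClassifiers(new Classifier[] {new IBk(1), new Id3(), new Prism()});
        return vote;
    }

    // Boosting of a single base learner (here KNN with k = 1).
    static AdaBoostM1 buildBoostedKnn() {
        AdaBoostM1 boost = new AdaBoostM1();
        boost.setClassifier(new IBk(1));
        return boost;
    }
}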
5.3.3. Results

Table 5 presents the selections of variables obtained with different criteria and search methods, which are explained below, and Table 6 shows the results obtained when applying the machine learning algorithms with these sets of variables.

Criteria:

- Criteria_1: evaluates the worth of a subset of variables by considering the individual predictive ability of each variable along with the degree of redundancy between them.
- Criteria_2: evaluates the worth of a variable by computing the value of the chi-squared statistic with respect to the class.
- Criteria_3: evaluates the worth of a subset of variables by the level of consistency in the class values when the training data is projected onto the subset of variables.
- Criteria_4: evaluates the worth of a variable by measuring the gain ratio with respect to the class.
- Criteria_5: evaluates the worth of a variable by measuring the information gain with respect to the class.

Search methods:

- SearchMethod_1: performs an exhaustive search through the space of variable subsets, starting from the empty set of attributes, and reports the best subset found.
- SearchMethod_2: ranks variables by their individual evaluations.

This selection of variables determined which parameters of the process were relevant and showed, at the same time, that not all the parameters used by the classical algorithm were necessary. A sketch of these selection runs appears after Table 6.

Table 5
Selection of variables.

Criteria    Search method   Selected variables
Criteria_1  SearchMethod_1  VC, TRM, ANC, MIN
Criteria_2  SearchMethod_2  ANC, MIN, VC, ALT, AV1
Criteria_3  SearchMethod_1  BRO, VC, TRM, MIN, ALT, ANC
Criteria_4  SearchMethod_2  VC, ANC, ALT, AV1, MIN
Criteria_5  SearchMethod_2  ANC, MIN, VC, ALT, AV1

Table 6
Results of machine learning algorithms with selection of variables.

Selected variables           Algorithm           Mean value (%)  Standard deviation
VC, TRM, ANC, MIN            J48                 94.95           2.20
                             ID3                 95.52           1.56
                             JRip                90              1.12
                             Prism               91.14           0.46
                             KNN (k = 1)         92.19           1.54
                             KNN (k = 3)         94.95           0.90
                             Naive Bayes simple  96.19*          0
ANC, MIN, VC, ALT, AV1       J48                 94              1.96
                             ID3                 94.29           0.9
                             JRip                88.76           1.08
                             Prism               92.57           1.33
                             KNN (k = 1)         93.05           1.68
                             KNN (k = 3)         94.48           1.89
                             Naive Bayes simple  95.24           0
BRO, VC, TRM, MIN, ALT, ANC  J48                 92.48           1.14
                             ID3                 96.1*           1.65
                             JRip                90.1            1.21
                             Prism               95.9*           1.19
                             KNN (k = 1)         96.29*          1.82
                             KNN (k = 3)         93.71           0.49
                             Naive Bayes simple  94.29           0
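The criterion/search pairings of Table 5 map naturally onto Weka's attribute selection classes. The sketch below shows two such pairings: a CFS-style subset evaluator with an exhaustive search (in the spirit of Criteria_1/SearchMethod_1) and a chi-squared ranking (in the spirit of Criteria_2/SearchMethod_2). The mapping of the paper's anonymized criteria onto these particular Weka evaluators is our assumption.

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.ChiSquaredAttributeEval;
import weka.attributeSelection.ExhaustiveSearch;
import weka.attributeSelection.Ranker;
import weka.core.Instances;

public final class SelectVariables {

    // Subset evaluation (individual merit plus redundancy) with an
    // exhaustive search starting from the empty set.
    static int[] subsetSelection(Instances data) throws Exception {
        AttributeSelection sel = new AttributeSelection();
        sel.setEvaluator(new CfsSubsetEval());
        sel.setSearch(new ExhaustiveSearch());
        sel.SelectAttributes(data);        // historical capital "S" in the Weka API
        return sel.selectedAttributes();   // selected indices (class index last)
    }

    // Chi-squared ranking of individual variables.
    static int[] chiSquaredRanking(Instances data) throws Exception {
        AttributeSelection sel = new AttributeSelection();
        sel.setEvaluator(new ChiSquaredAttributeEval());
        sel.setSearch(new Ranker());
        sel.SelectAttributes(data);
        return sel.selectedAttributes();
    }
}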
It is clear that some of the algorithms exceed the accuracy of the previous ones, learnt without selection of variables. These algorithms (Naïve Bayes simple, ID3, Prism and KNN) are marked in Table 6. Naïve Bayes simple uses cutting speed, speed of advance, width, height and minimum as predictors, while ID3, Prism and KNN use drill bit, cutting speed, length in entrance, minimum, height and width. As a result, six variables (BRO, VC, TRM, MIN, ALT and ANC) were selected from the original dataset of eleven variables, and we attempted to improve the accuracy of the model by combining classifiers, as shown in Table 7.

The analysis of the different combination-of-classifiers algorithms showed that the "Vote" classifier provided the best accuracy, using KNN (k = 1), ID3 and Prism with the six variables extracted from the feature subset selection: a 96.76% correct classification rate with a 1.57 standard deviation. The "Boosting" classifier applied individually to the KNN (k = 1) and Prism algorithms provides an accuracy very similar to that of the "Vote" classifier.

However, although the accuracy of the algorithms seems to have improved throughout the whole process, the top 10 algorithms selected in the previous approaches and presented in Table 8 do not differ significantly in their accuracy, and it would be inappropriate to assume that one is better than another. Table 9 shows the result of applying the one-way ANOVA technique (a statistical procedure used to determine whether the means of two or more samples are drawn from populations with the same mean) to compare the means of the accuracies using the F distribution (Ross, 2005). It tests the null hypothesis (H0: µ1 = µ2 = . . . = µn) that the groups are drawn from the same population, where µi is the mean of the ith algorithm. The ANOVA produces an F statistic, the ratio of the variance calculated among the means to the variance within the samples. If the algorithm means are drawn from the same population, the variance between the group means should be lower than the variance of the samples, following the central limit theorem; a higher ratio implies that the samples were drawn from different populations. Before applying ANOVA, it was verified, by means of an analysis of the residuals, that the basic assumptions of the model hold: the residuals should be random, independent and normally distributed with mean 0 and homogeneous standard deviation. The test concludes that there are no significant differences among the algorithms' accuracies, because the significance value, 0.206, is greater than 0.05.

Although there are no significant differences among the algorithms' accuracies, Naïve Bayes simple is the best option. Firstly, it is a simple model to implement in the machine for burr detection. Secondly, its standard deviation is very close to zero, which means that the algorithm is very stable. And finally, it uses only a subset of the overall set of variables, which makes it possible to reduce the quantity of data used for the machine learning task, reducing the computation time. The significance test can be reproduced as sketched below.
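The sketch uses Apache Commons Math, our choice of library (the paper does not name its statistics tool). Each array holds the ten repeated cross-validation accuracies of one algorithm; the numbers are illustrative only, not the measured ones.

import java.util.Arrays;
import java.util.List;
import org.apache.commons.math3.stat.inference.OneWayAnova;

public final class AccuracyAnova {

    public static void main(String[] args) {
        // One double[] of ten repeated-CV accuracies per algorithm.
        List<double[]> groups = Arrays.asList(
            new double[] {96.2, 95.8, 96.4, 96.0, 96.3, 95.9, 96.1, 96.2, 96.0, 96.1},
            new double[] {95.4, 95.7, 95.2, 95.6, 95.3, 95.8, 95.5, 95.4, 95.6, 95.5},
            new double[] {96.8, 96.5, 96.9, 96.6, 96.7, 96.8, 96.4, 96.9, 96.7, 96.6});

        OneWayAnova anova = new OneWayAnova();
        System.out.printf("F = %.3f, p = %.3f%n",
                anova.anovaFValue(groups), anova.anovaPValue(groups));
        // A p-value above 0.05 would mean no significant accuracy difference,
        // which is the situation reported in Table 9.
    }
}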
Table 7
Results of machine learning algorithms with combination of classifiers.

Type of combination             Combination  Algorithms               Mean value (%)  Standard deviation
Classifiers of different types  Stacking     KNN (k = 1), ID3, Prism  95.24           1.42
                                Vote         KNN (k = 1), ID3, Prism  96.76*          1.57
Classifiers of the same type    Boosting     KNN (k = 1)              96.48*          1.56
                                             ID3                      95.33           1.31
                                             Prism                    96.38*          1.17
                                Bagging      KNN (k = 1)              96.1            1.82
                                             ID3                      95.62           1.81
                                             Prism                    95.9            1.19
Table 8
Top 10 algorithms.

Variables                    Algorithm              Mean value (%)  Standard deviation
All                          ID3                    94.95*          1.62
All                          Prism                  95.81*          1.12
All                          KNN (k = 1)            95.43*          1.67
VC, TRM, ANC, MIN            Naive Bayes simple     96.19*          0
BRO, VC, TRM, MIN, ALT, ANC  ID3                    96.1*           1.65
BRO, VC, TRM, MIN, ALT, ANC  Prism                  95.9*           1.19
BRO, VC, TRM, MIN, ALT, ANC  KNN (k = 1)            96.29*          1.82
BRO, VC, TRM, MIN, ALT, ANC  Vote                   96.76*          1.57
BRO, VC, TRM, MIN, ALT, ANC  Boosting (KNN, k = 1)  96.48*          1.56
BRO, VC, TRM, MIN, ALT, ANC  Boosting (Prism)       96.38*          1.17
Table 9
One-way ANOVA.

                Sum of squares  df  Mean square  F      Sig.
Between groups  25.404          9   2.823        1.386  0.206
Within groups   183.312         90  2.037
Total           208.717         99
5.4. Third approach

This approach changed the aim of the problem due to a new requirement: it was necessary to eliminate "false negatives", that is to say, cases in which the model predicts no burr but burr is generated. This requirement was imposed by the manufacturing industry in order to reduce the high cost of checking and to ensure 100% that the burr is detected. The (two-state) classification had the inconvenience of having false negatives; they were not eliminated in spite of the high correct classification rate (even by the conventional approach). Notice that predicting burr when there is none is less serious than failing to predict burr; the second error has more consequences. At this point, the aim was to reduce the number of holes being revised, but without "false negatives". In order to avoid these "false negatives" the analysis was changed, adding one more class to introduce a margin of error:

Class 1: burr size >= 127 µm.
Class 2: 127 µm > burr size >= 100 µm.
Class 3: burr size < 100 µm.

5.4.1. Results

Table 10 shows the accuracy obtained using all the initial predictive variables, from the calibration of 10 different seeds (to introduce noise in the data), as applied in the previous evaluations, together with the quantity of "false negatives" calculated as the average of the false negatives obtained from each calibration. Thanks to the added class, the false negatives disappear, although some accuracy is lost.
Table 10
Results of machine learning algorithms.

Type of classification             Algorithm           Mean value (%)  Standard deviation  False negatives
Classification trees               J48                 93.33*          1.19                0
                                   ID3                 89.62           1.14                0.8
Induction rules                    JRip                93.05*          1.01                0
                                   Prism               89.9*           0.66                0
Distance based techniques          KNN (k = 1)         89.9            0.66                1.2
                                   KNN (k = 3)         88.57           0                   3
Techniques based on probabilities  Naive Bayes simple  90.38           0.3                 1.2
Table 11
One-way ANOVA.

                Sum of squares  df  Mean square  F       Sig.
Between groups  72.381          2   36.190       37.801  .000
Within groups   25.850          27  .957
Total           98.230          29
Table 12
Scheffé method.

(I) Group  (J) Group  Mean difference (I-J)  Std. error  Sig.  95% CI lower bound  95% CI upper bound
J48        JRip       .2857                  .43758      .809  -.8476              1.4191
J48        Prism      3.4286*                .43758      .000  2.2952              4.5619
JRip       J48        -.2857                 .43758      .809  -1.4191             .8476
JRip       Prism      3.1429*                .43758      .000  2.0095              4.2762
Prism      J48        -3.4286*               .43758      .000  -4.5619             -2.2952
Prism      JRip       -3.1429*               .43758      .000  -4.2762             -2.0095

* The mean difference is significant at the .05 level.
Fig. 5. Classification tree (J48).
The mean correct classification rate is worse than for the algorithms shown in the previous section. The reason is the increase in the number of classes being predicted, which makes it more difficult for the model to discriminate amongst them. Even so, the number of "false negatives" decreases to zero for some algorithms (J48, JRip and Prism), which is the requirement imposed by the industry. These algorithms, however, do present significant differences amongst themselves, because the significance level is less than 0.05 when applying one-way ANOVA (Table 11). The Scheffé method (Montgomery, 2004), a test performed after ANOVA that compares means to determine significant differences between pairs of groups, shows that the difference is due to the third algorithm: Prism is worse than J48 and JRip, and this difference is significant, as shown in Table 12. Both remaining algorithms provide a good result and eliminate false negatives as well. Besides, their implementation in the monitoring system to detect burr generation automatically and on-line can be simple, because the JRip algorithm is based on a set of rules (shown below) and J48 is based on a classification tree (Fig. 5), which can be converted quickly and easily into a set of rules.

JRip rules:

IF (ANC <= 7.254) AND (AV2 > 0.225) THEN Class 2
ELSE IF (VC <= 112.5) THEN Class 3
ELSE Class 1
END IF
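Rule sets and trees like the ones above can be induced and printed directly from the Weka API; a minimal sketch follows (our illustration), where "data3" is a hypothetical name for the dataset relabelled with the three classes of Section 5.4.

import weka.classifiers.rules.JRip;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public final class ThreeClassModels {

    static void buildAndPrint(Instances data3) throws Exception {
        JRip jrip = new JRip();
        jrip.buildClassifier(data3);
        System.out.println(jrip);   // prints IF/THEN rules like those above

        J48 j48 = new J48();
        j48.buildClassifier(data3);
        System.out.println(j48);    // prints a tree like Fig. 5, convertible to rules
    }
}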
6. Conclusions

Data mining is a multidisciplinary field that can be applied to data study and analysis in many fields of activity. Industry often holds a great deal of data from which previously unknown information can be extracted, and this information brings many benefits, as shown in this article. From a simple design of experiments and data mining techniques, especially the selection of variables and machine learning algorithms, a model for the detection of burr during the drilling process was developed. This model provides better accuracy than the previously used mathematical model and in addition has certain advantages. Firstly, the model is based on the internal signal of the machine and certain parameters of the process conditions, so the implementation is easier and needs no external sensors; it can later be embedded in a monitoring system to detect automatically and on-line when burr occurs during the drilling process. Secondly, it provides information about which parameters of the drilling process determine whether burr occurs or not.

Regarding the results, almost all the developed models provide a higher accuracy than the current mathematical model. Moreover, the final model based on Naïve Bayes reached an accuracy of 95% with a standard deviation equal to 0, which means it is a very stable model. It must also be mentioned that in most of the cases in which the model makes a bad prediction, it is because the burr is very close to the aeronautical limit (127 µm), which makes it more difficult to distinguish the corresponding class. Another aspect to take into account is the fact that the prediction error does not have the same significance depending on whether the drilled hole has burr or not: the main error is produced when a sample is classified as acceptable but actually exceeds the acceptable limits. For the detection of these cases the model has been improved, reducing considerably the number of inspections and burr-removal operations.
These techniques could be applied to other industrial processes, such as moulding and chip removal processes, potentially obtaining good results as in the present case and making possible the automation of some tasks which are currently done by hand. They are relatively new and novel techniques in certain industrial sectors, such as metal, aeronautics or chemistry, and they should be investigated as a possible solution to some of their current problems.

Further studies could be undertaken using a Bayesian network, a model representation for reasoning under uncertainty. Formally, its representation is a directed acyclic graph (DAG) where each node represents a random variable and the edges represent (often causal) dependence relations between them (Jensen, 1996). Each variable represents a unique event or hypothesis with a finite set of mutually exclusive states, X = {X1, . . ., Xn}, and there must be a state for each possible value, together with its conditional probabilities. A Bayesian network uses Bayes' theorem to update the probabilities and, given the good results obtained by the Naïve Bayes algorithm, it would be a model worth evaluating. Moreover, it would be possible to increase the number of predictive variables, for example with the type of material and the geometry of the bit. Another possibility would be to take into account the wear on the tool: at present, when the operator becomes aware of the worn condition of the tool, it has to be changed. It is clear that many other aspects of the drilling process can be usefully explored.
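Such a Bayesian network could be prototyped directly in Weka; a minimal sketch follows, leaving structure search and parameter estimation at the library defaults. This is our illustration of the suggested future work, not something reported in the paper.

import weka.classifiers.bayes.BayesNet;
import weka.core.Instances;

public final class BayesNetSketch {

    static BayesNet build(Instances data) throws Exception {
        BayesNet bn = new BayesNet();   // DAG over the (discretized) variables
        bn.buildClassifier(data);       // learns structure and conditional probability tables
        return bn;
    }
}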
References

Bukkapatnam, S. T. S., Kumara, S. R. T., & Lakhtakia, A. (1999). Analysis of acoustic emission signals in machining. Journal of Manufacturing Science and Engineering, 121(4), 568-571. doi:10.1115/1.2833058.
Bukkapatnam, S. T. S., Kumara, S. R. T., & Lakhtakia, A. (2000). Fractal estimation of flank wear in turning. Journal of Dynamic Systems, Measurement, and Control, 122, 89-94. http://www.okstate.edu/commsens/Papers/10journalpublic.pdf.
Byrne, G., Dornfeld, D., & Denkena, B. (2003). Advancing cutting technology. CIRP Annals - Manufacturing Technology, 52(2), 483-507.
Caprino, G., Teti, R., & De Iorio, I. (2005). Predicting residual strength of pre-fatigued glass fibre-reinforced plastic laminates through acoustic emission monitoring. Composites Part B: Engineering, 36(5), 365-371.
Chang, Simon S. F., & Bone, G. M. (2010). Burr height model for vibration assisted drilling of aluminum 6061-T6. Precision Engineering, 34, 369-375.
Correa, M., Bielza, C., de Ramirez, M. J., & Alique, J. R. (2008). A Bayesian networks model for surface roughness prediction in the machining process. International Journal of Systems Science, 39, 1181-1192.
Correa, M., Bielza, C., & Pamies-Teixeira, J. (2009). Comparison of Bayesian networks and artificial neural networks for quality detection in a machining process. Expert Systems with Applications, 36, 7270-7279.
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17, 37-54. http://www.aaai.org/ojs/index.php/aimagazine/article/view/1230/1131.pdf.
Ferreiro, S., Arana, R., Aizpurua, G., Aramendi, G., Arnaiz, A., & Sierra, B. (2009). Data mining for burr detection (in the drilling process). In Distributed computing, artificial intelligence, bioinformatics, soft computing, and ambient assisted living, Pt II, proceedings (Vol. 5518, pp. 1264-1273).
Ferreiro, S., & Arnaiz, A. (2010). Improving aircraft maintenance with innovative prognostics and health management techniques. Case of study: Brake wear degradation. In 2nd international conference on agents and artificial intelligence (ICAART 2010), Valencia, Spain (pp. 568-575).
Gaitonde, V. N., Karnik, S. R., Achyutha, B. T., & Siddeswarappa, B. (2008a). Genetic algorithm-based burr size minimization in drilling of AISI 316L stainless steel. Journal of Materials Processing Technology, 197(1-3), 225-236.
Gaitonde, V. N., Karnik, S. R., Achyutha, B. T., & Siddeswarappa, B. (2008b). Taguchi optimization in drilling of AISI 316L stainless steel to minimize burr size using multi-performance objective based on membership function. Journal of Materials Processing Technology, 202(1-3), 374-379.
Hambli, R. (2002). Prediction of burr height formation in blanking processes using neural network. International Journal of Mechanical Sciences, 44, 2089-2102.
Hand, D. J., Mannila, H., & Smyth, P. (2001). Principles of data mining. Cambridge: MIT Press.
Heisel, U., Luik, M., Eisseler, R., & Schaal, M. (2005). Prediction of parameters for the burr dimensions in short-hole drilling. Annals of the CIRP, 54(1), 79-82.
How, T., Liu, W., & Lin, L. (2003). Intelligent remote monitoring and diagnosis of manufacturing processes using an integrated approach of neural networks and rough sets. Journal of Intelligent Manufacturing, 14, 239-253.
Inza, I., Calvo, B., Armañanzas, R., Bengoetxea, E., Larrañaga, P., & Lozano, J. A. (2010). Machine learning: An indispensable tool in bioinformatics. In R. Matthiesen (Ed.), Bioinformatics methods in clinical research (Vol. 593, pp. 25-48). Humana Press.
Inza, I., Larrañaga, P., Etxeberria, R., & Sierra, B. (2000). Feature subset selection by Bayesian network-based optimization. Artificial Intelligence, 123, 157-184.
Inza, I., Larrañaga, P., & Sierra, B. (2001). Feature subset selection by Bayesian networks: A comparison with genetic and sequential algorithms. International Journal of Approximate Reasoning, 27, 143-164.
Inza, I., Merino, M., Larrañaga, P., Quiroga, J., Sierra, B., & Girala, M. (2001). Feature subset selection by genetic algorithms and estimation of distribution algorithms - A case study in the survival of cirrhotic patients treated with TIPS. Artificial Intelligence in Medicine, 23, 187-205.
Jensen, F. V. (1996). An introduction to Bayesian networks. Springer Verlag.
Kaelbling, L. P., & Cohn, D. P. (2003). Special issue on feature subset selection. Journal of Machine Learning Research, 3.
Kamarthi, S. V., Kumara, S. R. T., & Cohen, P. H. (2000). Flank wear estimation in turning through wavelet representation of acoustic emission signals. Journal of Manufacturing Science and Engineering, 122, 12-19.
Kazakov, D., & Kudenko, D. (2001). Machine learning and ILP for MAS. In Advanced course on artificial intelligence (ACAI 2001), Prague, Czech Republic. Lecture Notes in Artificial Intelligence, 2086, 246-270.
Kim, J., Min, S., & Dornfeld, D. A. (2000). Optimization and control of drilling burr formation of AISI 304L and AISI 4118 based on drilling burr control charts. International Journal of Machine Tools & Manufacture, 41, 923-936.
Kononenko, I. (1995). On biases in estimating multi-valued attributes. In 14th international joint conference on artificial intelligence, Montréal, Canada (pp. 1034-1040).
Langley, P., & Simon, H. A. (1995). Applications of machine learning and rule induction. Communications of the ACM, 38(11). doi:10.1145/219717.219768.
Lauderbaugh, L. K. (2009). Analysis of the effects of process parameters on exit burrs in drilling using a combined simulation and experimental approach. Journal of Materials Processing Technology, 209(4), 1909-1919.
Malakooti, B., & Raman, V. (2000). An interactive multi-objective artificial neural network approach for machine setup optimization. Journal of Intelligent Manufacturing, 11, 41-50.
Michalski, R. S., Bratko, I., & Kubat, M. (1998). Machine learning and data mining: Methods and applications. New York: Wiley.
Min, S., Kim, J., & Dornfeld, D. A. (2001). Development of a drilling burr control chart for low alloy steel, AISI 4118. Journal of Materials Processing Technology, 113, 4-9.
Mitchell, T. M. (1997a). Does machine learning really work? AI Magazine, 18(3), 11-20.
Mitchell, T. M. (1997b). Machine learning. McGraw-Hill International Editions.
Montgomery, D. C. (2004). Design and analysis of experiments (6th ed.). John Wiley & Sons.
Nieves, J., Santos, I., Penya, Y. K., Rojas, S., Salazar, M., & Bringas, P. G. (2009). Mechanical properties prediction in high-precision foundry production. In 7th IEEE international conference on industrial informatics, Cardiff, Wales (Vols. 1-2, pp. 31-36).
Nilsson, N. J. (1996). Introduction to machine learning. Early draft of proposed textbook.
Peña, B., Aramendi, G., Rivero, A., & López de LaCalle, L. N. (2005). Monitoring of drilling for burr detection using spindle torque. International Journal of Machine Tools & Manufacture, 45, 1614-1621.
Peña, B., Aramendi, G., & Rivero, M. A. (2007). Method for monitoring burr formation in processes involving the drilling of parts. WO2007/065959A1.
Pittner, S., & Kamarthi, S. V. (2002). Feature extraction from wavelet coefficients for pattern recognition tasks. International Conference on Neural Networks, 1997(3), 1484-1489.
Rangwala, S. S., & Dornfeld, D. A. (2002). Learning and optimization of machining operations using computing abilities of neural networks. IEEE Transactions on Systems, Man and Cybernetics, 19(2), 299-314.
Ross, S. M. (2005). Introductory statistics (2nd ed.). Elsevier Inc.
Russell, S., & Norvig, P. (1995). Artificial intelligence: A modern approach. Prentice Hall.
Santos, I., Nieves, J., Penya, Y. K., & Bringas, P. G. (2009). Optimising machine-learning-based fault prediction in foundry production. In Proceedings of distributed computing, artificial intelligence, bioinformatics, soft computing, and ambient assisted living, Pt II (Vol. 5518, pp. 554-561). doi:10.1007/978-3-642-02481-8_80.
Sick, B. (2002). On-line and indirect tool wear monitoring in turning with artificial neural networks: A review of more than a decade of research. Mechanical Systems and Signal Processing, 16(4), 487-546.
Umbrello, D., Ambrogio, G., Filice, L., Guerriero, F., & Guido, R. (2009). A clustering approach for determining the optimal process parameters in cutting. Journal of Intelligent Manufacturing, 21(6), 787-795.
Wang, K. (2007). Applying data mining to manufacturing: The nature and implications. Journal of Intelligent Manufacturing, 18(4), 487-495.
Witten, I. H., & Frank, E. (2000). Data mining: Practical machine learning tools and techniques with Java implementations. San Francisco: Morgan Kaufmann.