Applied Soft Computing 35 (2015) 186–198
Incremental learning for online tool condition monitoring using Ellipsoid ARTMAP network model
C. Liu, G.F. Wang*, Z.M. Li
Tianjin Key Laboratory of Equipment Design and Manufacturing Technology, Tianjin University, Tianjin 300072, China
* Corresponding author. Tel.: +86 02227406951. E-mail: [email protected] (G.F. Wang).
Article history: Received 25 September 2014; Received in revised form 18 June 2015; Accepted 20 June 2015; Available online 30 June 2015.
Keywords: Ellipsoid ARTMAP network; Fast correlation based filter; Online classification; Incremental learning; Tool wear monitoring
Abstract: In this paper, an Ellipsoid ARTMAP (EAM) network model based on an incremental learning algorithm is proposed to realize online learning and tool condition monitoring. The main characteristic of the EAM model is that a hyper-ellipsoid is used for the geometric representation of categories, which can depict the sample distribution robustly and accurately. Meanwhile, the adaptive resonance based strategy updates the hyper-ellipsoid nodes locally and monotonically. Therefore, the model has strong incremental learning ability, which guarantees that the constructed classifier can learn new knowledge without forgetting the original information. Based on the incremental EAM model, a tool condition monitoring system is realized. In this system, features are first extracted from the force and vibration signals to depict the dynamic characteristics of the tool wear process. Then, the fast correlation based filter (FCBF) method is introduced to select a minimum redundant feature subset adaptively so as to decrease feature redundancy and improve classifier robustness. Based on the selected features, an EAM based incremental classifier is constructed to recognize the tool wear states. To show the effectiveness of the proposed method, multi-teeth milling experiments on Ti-6Al-4V alloy were carried out. Moreover, to estimate the generalization error of the classifiers accurately, a five-fold cross validation method is utilized. By comparison with the commonly used Fuzzy ARTMAP (FAM) classifier, it is shown that the average recognition rate of the initial EAM classifier reaches 98.67%, which is higher than that of FAM. Moreover, the incremental learning ability of EAM is also analyzed and compared with FAM using new data coming from different cutting passes and from a new tool wear category. The results show that the updated EAM classifier retains higher classification accuracy on the original knowledge while realizing effective online learning of the new knowledge.
1. Introduction

Tool wear has negative effects on the surface quality and dimensional precision of the workpiece, and may even harm the safe operation of the whole machining system. According to the research, about 20% of machine downtime is caused by tool wear [1]. In addition, the cost of cutting tools and tool changing accounts for about 3–12% of the total machining cost [2]. Therefore, it is necessary to build an online tool condition monitoring (TCM) system for the machining industry to detect and recognize variations of the tool wear state quickly and effectively. Tool wear is a complex nonlinear process under the combined action of the cutting tool, workpiece and machine tool. Therefore, different kinds of artificial intelligence based monitoring strategies, such as artificial neural networks (ANNs), support vector machines (SVM) and hidden Markov models (HMM), have been proposed in recent years. By using a multi-layer perceptron (MLP) network and the backpropagation (BP) training algorithm, Purushothaman and Srinivasa [3] realized a TCM system to judge whether the cutter was worn or not. To realize the classification of multiple tool state categories, Dimla and Lister [4] developed a tool condition monitoring system based on multi-layer perceptron (MLP) neural networks. In this
system, four kinds of tool states (sharp, part worn, worn and fractured tools) were monitored by adopting cutting force and vibration signals. The authors claimed that it was capable of tool state classification with an accuracy in excess of 90%. To depict the complex relationship between features and tool wear states, Brezak et al. [5] used a radial basis function (RBF) neural network to classify tool states under three wear levels during the milling process. To avoid the over-fitting phenomenon and realize stable learning in small-sample cases, SVM based classifiers have been proposed and widely used in recent years. Xu and Wang [6] used an SVM model to realize tool wear identification using acoustic emission sensory information and a wavelet packet based feature extraction algorithm. The results show that its identification accuracy reaches as high as 93.3%. Sun et al. [7] introduced an improved SVM approach combined with one-versus-one learning strategies to carry out multi-classification of the tool states. The authors claimed that the percentage error of the classifier was about 4.8%. Wang et al. [8] presented a TCM system based on the support vector machine (ν-SVM) to realize multi-category tool wear classification during the milling process of Ti-6Al-4V alloy. The vibration signals corresponding to four kinds of tool wear states were collected and time–frequency domain features were extracted based on wavelet packet decomposition. The analysis shows that the classification accuracy of the system exceeds 92%. To depict the time dynamics of the selected features, Kassim et al. [9] used an HMM to classify various states of tool wear during end milling operation by taking the fractal features of the machined surface as the input. The results show that the HMM based model can provide reliable tool condition classification. Boutros and Liang [10] proposed a discrete HMM for the
recognition of tool wear states during the milling process. Experiments under three different tool conditions were conducted and the corresponding acoustic emission data were collected. The results show that the success rate for tool wear severity classification was greater than 95%. Cetin et al. [11] applied multi-rate coupled HMMs to characterize the stochastic processes of tool wear during titanium alloy milling. The authors concluded that it can achieve excellent classification performance. The above-mentioned methods can reflect the relationship between the tool wear states and the selected features accurately and effectively. Nevertheless, the construction of these classifiers depends on predefined training samples. When new data appear, the whole classifier has to be re-trained because the parameters of the network cannot be adjusted locally and incrementally. In such a case, the previously acquired knowledge is easily forgotten when the re-trained classifier is constructed. However, for tool condition monitoring of small-batch manufacturing processes, the types of cutter and workpiece materials vary frequently. Moreover, the morphology and position of the tool wear are so complicated that it is not realistic to collect all the data corresponding to different tool states from a finite number of cutting experiments. Therefore, it is necessary to update an existing classifier in an incremental fashion so as to memorize new training data without sacrificing the classification performance on the original knowledge [12]. In recent years, incremental learning based classifiers, such as ARTMAP based classifiers, the nearest generalized exemplar (NGE) [13], generalized fuzzy min–max neural networks (GFMMNN) [14], growing neural gas (GNG) [15] and function decomposition (FD) [16], have been proposed to memorize new knowledge without catastrophically forgetting previous knowledge. Among these algorithms, ARTMAP based classifiers, owing to their superior local representation and discrete distribution ability, have been studied by many researchers for supervised learning and classification tasks [17–19]. Within the ARTMAP family, fuzzy ARTMAP (FAM) is one of the most commonly used structures, in which the training data are coded and represented using hyper-rectangle shaped nodes. Up to now, it has been used in applications ranging from data clustering [20,21] and incremental learning [22,23] to tool condition monitoring [24–26]. However, FAM possesses two potential weaknesses. First, the hyper-rectangle used in FAM is more suitable for uniformly distributed data [27,28]; it is not an ideal shape to represent the complex sample distributions found in real applications [29]. Second, FAM uses hard competitive learning to form its categories and lacks a smoothing operation, which makes it prone to over-fitting and sensitive to noise. Therefore, in this study, the Ellipsoid ARTMAP (EAM) network, in which each node is represented by a hyper-ellipsoid, is adopted. In comparison with FAM, the hyper-ellipsoid shape retains virtually all of the excellent characteristics of the hyper-rectangle, such as batch and incremental learning with fast learning ability, while overcoming some of its drawbacks [30]. The major characteristics of EAM can be summarized as follows. The first is that EAM learns associative mappings based on adaptive resonance theory, which means it can undergo both batch and incremental learning.
The second is that the training process of the EAM classifier can converge to a stable state rapidly due to its fast learning ability. In fact, these two characteristics are also shared by FAM. What makes EAM more suitable for classification lies in the third feature: instead of a hyper-rectangle, EAM describes the distribution of the sample data by a hyper-ellipsoid. Therefore, a nonlinear decision boundary can be described by EAM, which gives a more accurate and smoother boundary description. In real applications, the sample data are usually non-uniformly distributed and the data near the boundary are sparse, so noisy data near the boundary have a great influence on the shape of the decision boundary. In this case, the EAM based nonlinear boundary description can effectively reduce the influence of the noisy data. Xu et al. [31] utilized EAM to cluster tissues by analyzing gene expression data generated by DNA microarray experiments. Anagnostopoulos and Georgiopoulos [32,33] studied EAM using an artificial classification example named the circle-in-a-square problem and compared it with the FAM classifier. The authors declared that EAM showed better ability in clustering and classification tasks. However, the incremental learning and online classification ability of EAM has seldom been investigated, especially in the field of tool condition monitoring. Therefore, in this study, an EAM based classifier is constructed to realize online incremental learning of a complex machining process and recognition of the tool wear states. Within the EAM model, offline training is used to construct the initial classifier in batch mode, and incremental learning is adopted to update the classifier without access to the previous training data. The combination of batch and incremental learning can improve the generalization performance of the classifier. To characterize the dynamic tool wear process accurately [34], both the force and vibration signals in three different directions were collected. Moreover, 138 different features in the time, frequency and time–frequency domains are extracted from the collected signals. The fast correlation based filter (FCBF) algorithm is then employed to search for a minimum redundant feature subset adaptively. Finally, the EAM classifier is constructed to realize tool state classification and incremental learning. To show the effectiveness of the proposed system, milling experiments on titanium alloy were carried out. Both FAM and EAM classifiers are constructed and their performance is compared based on the five-fold cross validation method. The results show that the accuracy of EAM is about 98.7%, outperforming the FAM model for the recognition of the tool wear state. In addition, the incremental learning ability of EAM is also discussed by taking the new data as online learning samples. It can be concluded that EAM can realize effective incremental learning without forgetting the original knowledge.
Fig. 1. Structure of Ellipsoid-ARTMAP classifier.
The remainder of the paper is organized as follows. In Section 2, the principle of EAM incremental learning and classification is presented. Section 3 outlines the architecture of the proposed method; the principles of data acquisition, feature extraction and feature selection are given in detail. In Section 4, a multi-teeth milling experiment on Ti-6Al-4V alloy is described, 138 features are extracted from the force and vibration signals in three directions, and the FCBF algorithm is utilized to select the minimum redundant features and build the corresponding datasets. In Section 5, FAM is adopted for comparison with the EAM classifier, and the incremental learning ability of the EAM classifier is verified using datasets from different cutting passes and a new tool wear category. The results show that the EAM model outperforms the FAM model for the recognition of the tool wear state, and that EAM can realize effective incremental learning without forgetting the original knowledge. Some conclusions are drawn in Section 6.
2. Principle of the EAM network

2.1. Structure of EAM

The structure of the EAM based classifier is illustrated in Fig. 1. It is mainly composed of three layers: the input layer F1, the resonance layer F2 and the category layer F3. The F1 and F2 layers are linked by the weight vector wj, which is also called the template vector. The weight vector wj is used to encode the input vector into the jth node of the F2 layer. The nodes which pass the vigilance test (VT) are called committed nodes. The commitment test (CT) is then carried out for these nodes and the winning node is mapped into the category layer so as to obtain the classification result. During the training process, the match tracking (MT) process needs to be invoked if the result in F3 is incorrect. The function of MT is to search for an appropriate node that can correctly classify a presented training sample in the case that the category of this sample was originally mismatched. The representation of a two-dimensional EAM node in F2 is shown in Fig. 2(b). Different from the rectangle type node (shown in Fig. 2(a)), the node in EAM is a hyper-ellipsoid described by a template vector wj = [mj, dj, Rj], where mj is the center of the hyper-ellipsoid, dj is the direction vector of the node, which coincides with the direction of the hyper-ellipsoid's major axis, and Rj is called the radius, which equals half the length of the major axis.
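For reference, the template vector of one node can be held in a small record such as the following minimal sketch (the class and field names, including `EAMNode` and `label`, are our own choices, not the paper's notation).

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class EAMNode:
    """Template vector w_j = [m_j, d_j, R_j] of one F2 node."""
    m: np.ndarray          # center of the hyper-ellipsoid
    d: np.ndarray          # direction of the major axis (zero vector until a 2nd sample is encoded)
    R: float = 0.0         # radius: half the length of the major axis
    label: int = -1        # class index the node maps to in the F3 layer

# A freshly created node for 19-dimensional feature vectors.
node = EAMNode(m=np.zeros(19), d=np.zeros(19), R=0.0, label=0)
```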
Fig. 2. Two-dimensional node representation: (a) rectangle type node; (b) ellipsoid type node.
Generally, the distance between an input sample I and the center m_j of a node j is given as [30]

\|I - m_j\|_{C_j} = \sqrt{(I - m_j)^T C_j (I - m_j)}     (1)

where C_j is called the shape matrix of the category j and can be described as

C_j = C_j^T = \begin{cases} \frac{1}{\mu^2}\left[I - (1 - \mu^2)\, d_j d_j^T\right], & d_j \neq 0 \\ I, & d_j = 0 \end{cases}     (2)

where μ is the ratio of minor-to-major axis lengths of the ellipsoid. The distance between I and the node j is thus given by [32]

dis(I|w_j) = \max\left\{\|I - m_j\|_{C_j}, R_j\right\} - R_j     (3)

As shown in Fig. 2(b), the shaded area denotes the representation region of the node j, which is the set of points that satisfy the condition below [33]

dis(I|w_j) = 0 \;\Rightarrow\; \|I - m_j\|_{C_j} \leq R_j     (4)

If dis(I|w_j) = 0, the node j is committed; otherwise the further vigilance test needs to be carried out.

Fig. 3. Two-dimensional description of EAM weight update.
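As a concrete reading of Eqs. (1)–(4), the following minimal Python sketch computes the ellipsoidal distance and the inside-region check. It is our own illustration (the names `shape_matrix`, `ellip_norm` and `dis` are assumptions), not the authors' implementation.

```python
import numpy as np

def shape_matrix(d, mu):
    """Shape matrix C_j of Eq. (2): identity when the node has no direction
    yet, otherwise (1/mu^2) * [I - (1 - mu^2) d d^T]."""
    M = d.shape[0]
    if not np.any(d):
        return np.eye(M)
    return (np.eye(M) - (1.0 - mu**2) * np.outer(d, d)) / mu**2

def ellip_norm(x, m, C):
    """Weighted norm ||x - m||_C of Eq. (1)."""
    v = x - m
    return np.sqrt(v @ C @ v)

def dis(x, m, d, R, mu):
    """Distance of Eq. (3); it is zero exactly when x lies inside the
    representation region of the node (Eq. (4))."""
    C = shape_matrix(d, mu)
    return max(ellip_norm(x, m, C), R) - R

# Toy check: a node centered at the origin with its major axis along x.
m, d, R, mu = np.zeros(2), np.array([1.0, 0.0]), 1.0, 0.5
print(dis(np.array([0.5, 0.0]), m, d, R, mu))   # 0.0 -> inside the region
print(dis(np.array([0.0, 0.8]), m, d, R, mu))   # > 0  -> outside the region
```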
2.2. Construction of EAM classifier

Realization of the EAM classifier comprises three important stages: the vigilance test (VT), the commitment test (CT) and weight updating. To complete the VT and CT, two critical functions, the category match function (CMF) ρ*(w_j|I) and the category choice function (CCF) T*(w_j|I), are used; their expressions are given as [31]

\rho^*(w_j|I) = 1 - \frac{R_j + \max\left\{\|I - m_j\|_{C_j}, R_j\right\}}{D_m}     (5)

T^*(w_j|I) = \frac{D_m - R_j - \max\left\{\|I - m_j\|_{C_j}, R_j\right\}}{D_m - 2R_j + a}     (6)

where D_m is called the effective diameter and a is the choice parameter.
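The two test quantities of Eqs. (5) and (6) can be sketched as below; this is again our own illustrative code, with `Dm`, `a` and `omega` passed in explicitly and the uncommitted-node value T_u computed as stated in the text.

```python
import numpy as np

def ellip_norm(x, m, C):
    """Weighted norm ||x - m||_C of Eq. (1)."""
    v = x - m
    return np.sqrt(v @ C @ v)

def cmf(x, m, C, R, Dm):
    """Category match function rho*(w_j|I), Eq. (5); compared against the
    baseline vigilance rho_bar in the vigilance test."""
    return 1.0 - (R + max(ellip_norm(x, m, C), R)) / Dm

def ccf(x, m, C, R, Dm, a):
    """Category choice function T*(w_j|I), Eq. (6); compared against the
    constant T_u = Dm / (2*Dm*omega + a) in the commitment test."""
    return (Dm - R - max(ellip_norm(x, m, C), R)) / (Dm - 2.0 * R + a)

# Toy evaluation for a spherical node (C = identity matrix).
x, m, C, R = np.array([0.3, 0.1]), np.zeros(2), np.eye(2), 0.5
Dm, a, omega = np.sqrt(2), 0.001, 0.5
print("CMF:", cmf(x, m, C, R, Dm))
print("CCF:", ccf(x, m, C, R, Dm, a), " T_u:", Dm / (2 * Dm * omega + a))
```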
For each node j in EAM, the CMF value ρ*(w_j|I) and the CCF value T*(w_j|I) are calculated when I is presented. Then ρ*(w_j|I) and T*(w_j|I) are compared with the baseline vigilance parameter ρ̄ and the constant value T_u, respectively, where the constant value is defined as T_u = D_m/(2D_m ω + a). The detailed parameter selection process is described in the following section. If these two tests are passed, the EAM node encodes the training sample by updating its template w_j according to the following formulas [32]

R_j^{new} = R_j^{old} + \frac{\eta}{2}\left(\max\left\{R_j^{old}, \|I - m_j^{old}\|_{C_j^{old}}\right\} - R_j^{old}\right)     (7)

m_j^{new} = m_j^{old} + \frac{\eta}{2}\left(1 - \frac{\min\left\{R_j^{old}, \|I - m_j^{old}\|_{C_j^{old}}\right\}}{\|I - m_j^{old}\|_{C_j^{old}}}\right)\left(I - m_j^{old}\right)     (8)

d_j = \frac{I^{(2)} - m_j}{\left\|I^{(2)} - m_j\right\|_2}     (9)

where η ∈ (0, 1] denotes the learning rate; η = 1 means that the EAM has the ability of fast learning. I^{(2)} represents the second sample encoded by the node j.
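A hedged sketch of one learning step following Eqs. (7)–(9) is given below; it is our reading of the update rules (the min/max placement follows the reconstruction above, and all names are ours).

```python
import numpy as np

def update_template(x, m_old, d_old, R_old, mu, eta=1.0):
    """One learning step of an EAM node, following Eqs. (7)-(9)."""
    M = m_old.shape[0]
    # Shape matrix of the old template, Eq. (2).
    if np.any(d_old):
        C_old = (np.eye(M) - (1.0 - mu**2) * np.outer(d_old, d_old)) / mu**2
    else:
        C_old = np.eye(M)
    v = x - m_old
    dist = np.sqrt(v @ C_old @ v)          # ||I - m_old||_{C_old}, Eq. (1)

    # Eq. (7): the radius can only grow.
    R_new = R_old + 0.5 * eta * (max(R_old, dist) - R_old)

    # Eq. (8): the center moves toward x only when x falls outside the node.
    if dist > 0:
        m_new = m_old + 0.5 * eta * (1.0 - min(R_old, dist) / dist) * v
    else:
        m_new = m_old.copy()

    # Eq. (9): the direction is fixed by the second sample ever encoded.
    if not np.any(d_old) and np.linalg.norm(v) > 0:
        d_new = v / np.linalg.norm(v)
    else:
        d_new = d_old
    return m_new, d_new, R_new

# A node created on I(1) = (0, 0), then updated with I(2) = (1, 0).
m, d, R = np.zeros(2), np.zeros(2), 0.0
m, d, R = update_template(np.array([1.0, 0.0]), m, d, R, mu=0.5)
print(m, d, R)   # center moves halfway, direction along x, radius grows to 0.5
```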
Fig. 3 presents the updating process of a two-dimensional EAM node. During the training process, a node can only grow in size and is never destroyed. As shown in Fig. 3, the node's new representation region is the minimum hyper-ellipsoid that contains both the old region and the new sample to be encoded. It should be noted that when the node j is first created to encode the sample I^(1), its center m_j coincides with I^(1), the direction vector d_j = 0 and the radius R_j = 0; that is to say, the initial node includes only one point. When the node j encodes the second sample I^(2), the category's representation region is expanded into an ellipse, shown as the yellow ellipse in Fig. 3. The direction vector d_j is determined according to Eq. (9). If the category j is further used to encode the next sample, the representation region expands to the blue ellipse, which includes the previous representation region. During this process, the direction vector remains constant and will not change during future updates of the weight vector. Meanwhile, E^new and E^old touch at only one point because the minimum hyper-ellipsoid is used to characterize the old region and the new pattern simultaneously [33]. This feature guarantees that the EAM classifier possesses the ability to update an existing classifier in an incremental fashion without sacrificing classification performance on the previous training data.

2.3. Incremental learning and classification

For a tool condition monitoring system, the classifier needs to be updated online, so in this study incremental learning is realized for the EAM classifier. As shown in Section 2.2, the update of the EAM classifier lies in two aspects: one is the update of the template vector w_j, the other is the addition of new nodes. The ellipsoid node is encoded through the update of the template vector w_j. As given in Eqs. (7)–(9), w_j grows monotonically during the training process, which indicates that the representation region of the ellipsoid node is gradually expanded with the update of w_j. This assures that the updated node still covers the original training samples completely. Moreover, a new node has no influence on the other nodes. Therefore, the EAM classifier has excellent incremental learning ability.

Fig. 4 gives a detailed description of the EAM incremental training process. First, the network is initialized and the vigilance parameter ρ is given a baseline value ρ̄. The nodes which can pass the VT are viewed as committed and form a candidate set S. Next, the CT is carried out for all nodes in S. The weight vector is updated if the node wins the CT and its category label is in accordance with the real label. If all nodes fail to pass the CT or the category label is incorrect, the match tracking (MT) process is activated [31]. As shown in Fig. 4, MT gradually increases the value of ρ, removes the mismatched node and initiates a new search among the remaining nodes in S so as to find an appropriate node that classifies the presented sample correctly.

Fig. 4. Flowchart of EAM training.

The classification based on EAM is similar to its training process. The label corresponding to the winning node is used as the output of the classifier. If no node is committed, the output is set to −1. The detailed description is given in Fig. 5.

Fig. 5. Flowchart of EAM classification.
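The prediction rule of Fig. 5 (vigilance filtering followed by winner-take-all on the choice function, with −1 returned when no node qualifies) can be sketched as follows; the node container layout is our own assumption.

```python
import numpy as np

def shape_matrix(d, mu):
    M = d.shape[0]
    if not np.any(d):
        return np.eye(M)
    return (np.eye(M) - (1.0 - mu**2) * np.outer(d, d)) / mu**2

def classify(x, nodes, Dm, a, rho_bar, mu):
    """Keep the nodes that pass the vigilance test, let them compete on the
    choice function, and output the label of the winner (or -1 when none pass).
    `nodes` is a list of dicts with keys m, d, R, label (our own layout)."""
    best_T, best_label = -np.inf, -1
    for n in nodes:
        C = shape_matrix(n["d"], mu)
        v = x - n["m"]
        dist = np.sqrt(v @ C @ v)
        rho = 1.0 - (n["R"] + max(dist, n["R"])) / Dm                      # Eq. (5)
        if rho < rho_bar:                                                  # vigilance test
            continue
        T = (Dm - n["R"] - max(dist, n["R"])) / (Dm - 2.0 * n["R"] + a)    # Eq. (6)
        if T > best_T:
            best_T, best_label = T, n["label"]
    return best_label

# Two toy nodes; the sample lies inside the first one.
nodes = [{"m": np.array([0.2, 0.2]), "d": np.zeros(2), "R": 0.3, "label": 0},
         {"m": np.array([0.8, 0.8]), "d": np.zeros(2), "R": 0.1, "label": 1}]
print(classify(np.array([0.25, 0.2]), nodes,
               Dm=np.sqrt(2), a=0.001, rho_bar=0.5, mu=1.0))   # -> 0
```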
2.4. Parameter selection

There are six parameters that need to be predefined: the vigilance parameter ρ ∈ [0, 1], the choice parameter a > 0, the learning rate η ∈ (0, 1], the effective diameter D_m, the ratio of minor-to-major axis lengths μ ∈ (0, 1] and the parameter ω > 0. D_m is used to ensure that the CMF value remains positive [35]; in this paper, D_m is defined as √M, where M is the dimensionality of the sample. The parameter ω is usually chosen as ω ≥ 0.5 to ensure the stability of EAM [36]. In order to ensure the efficiency of EAM, η is set to 1. The choice parameter a is a very small positive value which has no obvious influence on EAM performance; in this research, a is selected as 0.001. The selection of ρ̄ and μ is realized by a trial and error method based on the training samples.
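For illustration, the parameter values used later in Section 5.2 for the 19-dimensional feature vectors can be gathered into a small configuration record (a sketch; the container and the ω = 0.5 setting, taken at its lower bound, are our own choices).

```python
import math
from dataclasses import dataclass

@dataclass
class EAMParams:
    rho_bar: float   # baseline vigilance, in [0, 1]
    a: float         # choice parameter, a small positive value
    eta: float       # learning rate, in (0, 1]; 1 gives fast learning
    mu: float        # ratio of minor-to-major axis lengths, in (0, 1]
    omega: float     # stability parameter, chosen >= 0.5
    Dm: float        # effective diameter, sqrt(M) for M-dimensional samples

# rho_bar = 0.84 and mu = 0.6 are the trial-and-error choices reported in
# Section 5.2; omega = 0.5 is an assumed lower-bound value, not stated explicitly.
params = EAMParams(rho_bar=0.84, a=0.001, eta=1.0, mu=0.6,
                   omega=0.5, Dm=math.sqrt(19))
print(params)
```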
Table 1. Mathematical expression of time domain features.

Dimensional features:
  Mean value: X̄ = (1/N) Σ_{i=1}^{N} x_i
  Peak value: Pm = max(|x_i|)
  Peak to peak value: Pk = max(x_i) − min(x_i)
  Root mean square: RMS = √((1/N) Σ_{i=1}^{N} x_i²)
  Variance: VAR = Σ_{i=1}^{N} (x_i − X̄)² / (N − 1)
  Power: P = (1/N) Σ_{i=1}^{N} x_i²

Dimensionless features:
  Crest factor: CF = Pm / RMS
  Form factor: Ws = RMS / X̄
  Pulse indicator: Pu = Pm / X̄
  Margin: MAR = Pm / ((1/N) Σ_{i=1}^{N} √|x_i|)²
  Kurtosis: KUR = (1/N) Σ_{i=1}^{N} ((x_i − X̄)/σ)⁴
  Skewness: SKE = N Σ_{i=1}^{N} (x_i − X̄)³ / ((N − 1)(N − 2) σ³)

σ denotes the standard deviation of the signal x.
3. Framework of online TCM

Aiming at online recognition of the tool wear states during the milling process, a TCM framework based on the EAM model is proposed. As illustrated in Fig. 6, the TCM framework presented in this study consists of four parts: (1) data acquisition, (2) feature extraction, (3) feature selection, and (4) EAM classification.

Fig. 6. Architecture of TCM.

3.1. Data acquisition

Multi-sensor fusion can reduce the uncertainty of the measurement and provide independent information during the machining process. So in this study, force and vibration sensors in three directions (radial, axial and feed) are chosen to depict the dynamic characteristics of the tool wear process.

3.2. Feature extraction

In this study, features of the force signal and vibration signal are extracted in the time domain, frequency domain and time–frequency domain, respectively [34,37]. As shown in Table 1, the features from the time domain mainly include 7 dimensional features: mean value (X̄), peak value (Pm), peak to peak amplitude value (Pk), root mean square of the amplitude (RMS), variance of the amplitude (VAR), power (P) and burst rate (Br, the number of times the signal exceeds the preset thresholds per second), and 6 dimensionless features: crest factor (CF), form factor (Ws), pulse indicator (Pu), margin (MAR), kurtosis (KUR) and skewness (SKE) [34]. As shown in Table 2, the features from the frequency domain mainly include 4 dimensional features: the mean of power spectrum (MPS, the mean of the power spectrum in a specific frequency band), sum of total band power (STP), peak of band power (PBP) and frequency of maximum peak of band power (FPBP), and 4 dimensionless features: variance of band power (VBP), skewness of band power (SBP), kurtosis of band power (KBP) and relative spectral peak per band (RSPB, the ratio of the peak of band power to the mean of band power) [34]. In addition, wavelet decomposition is applied to obtain the time–frequency domain features. The db5 wavelet function is used here and the decomposition level is set to 3. The frequency ranges of the different bands are shown in Table 3. The energy percentages of the four frequency bands L1, H1, H2 and H3 relative to the total energy of the signal are calculated as the time–frequency features. Therefore, the total number of features from the time, frequency and time–frequency domains for each signal is 25.
Table 2. Mathematical expression of frequency domain features.

Dimensional features:
  Mean of power spectrum: MPS = (1/N) Σ_{i=1}^{N} S(f)_i
  Sum of total band power: STP = Σ_{f1}^{f2} S(f)
  Peak of band power: PBP = max(S(f))

Dimensionless features:
  Variance of band power: VBP = Σ_{i=1}^{N} (S(f)_i − MPS)² / (N − 1)
  Skewness of band power: SBP = (1/N) Σ_{i=1}^{N} (S(f)_i − MPS)³ / VBP^{3/2}
  Kurtosis of band power: KBP = (1/N) Σ_{i=1}^{N} (S(f)_i − MPS)⁴ / VBP²

S(f) is the power spectrum of signal X; f1 and f2 are the cut-off frequencies.
Table 3. Frequency range of orthogonal wavelet decomposition.

Frequency band L1: (0, f/16); H1: (f/16, f/8); H2: (f/8, f/4); H3: (f/4, f/2).
f is the sampling frequency.
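To make the definitions of Tables 1 and 2 concrete, a short sketch computing a representative subset of them with NumPy is given below (our own code; a plain FFT periodogram stands in for the power spectrum, and the wavelet-based features of Table 3 are omitted).

```python
import numpy as np

def time_domain_features(x):
    """A few of the Table 1 features for one signal segment."""
    mean = x.mean()
    rms = np.sqrt(np.mean(x**2))
    pm = np.abs(x).max()
    return {
        "mean": mean,
        "peak": pm,
        "peak_to_peak": x.max() - x.min(),
        "rms": rms,
        "variance": x.var(ddof=1),
        "power": np.mean(x**2),
        "crest_factor": pm / rms,
        "kurtosis": np.mean(((x - mean) / x.std())**4),
    }

def frequency_domain_features(x, fs, f1, f2):
    """A few of the Table 2 features on the band [f1, f2]."""
    spec = np.abs(np.fft.rfft(x))**2 / len(x)            # simple periodogram
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = spec[(freqs >= f1) & (freqs <= f2)]
    mps = band.mean()
    return {"MPS": mps, "STP": band.sum(), "PBP": band.max(),
            "VBP": np.sum((band - mps)**2) / (len(band) - 1)}

# Example on a 2048-point synthetic segment sampled at 10 kHz.
rng = np.random.default_rng(0)
t = np.arange(2048) / 10_000.0
x = np.sin(2 * np.pi * 500 * t) + 0.1 * rng.standard_normal(2048)
print(time_domain_features(x)["rms"],
      frequency_domain_features(x, 10_000, 0, 1250)["PBP"])
```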
3.3. Feature selection

Although increasing the feature dimension characterizes the information in the signals more completely, irrelevant and redundant features can cause the curse of dimensionality [38] and negatively influence classifier performance. In order to improve the accuracy and robustness of the TCM system, a minimum redundant feature subset needs to be obtained before the classifier is constructed. In this study, the fast correlation based filter (FCBF) feature selection method is adopted. In comparison with other methodologies, such as the maximum relevance and minimum redundancy (mRMR) algorithm [39,40] and the distance evaluation technique [41], the main characteristic of FCBF is that each feature is evaluated by applying the symmetrical uncertainty principle and redundant features are removed heuristically. During this process, the number of minimum redundant features depends only on the candidate dataset; therefore, the feature selection process can be carried out adaptively and automatically. The symmetrical uncertainty value U is used here to measure the correlation between two variables X and Y [42]

U(X, Y) = 2 \frac{G(X|Y)}{H(X) + H(Y)}     (10)

where H(X) = −Σ_i P(x_i) log₂ P(x_i) is the entropy of the variable X, P(x_i) is the prior probability of each value x_i of X, and P(x_i|y_j) is the posterior probability of x_i given the value y_j of Y. G(X|Y) = H(X) − H(X|Y) is called the information gain of X provided by Y. The conditional entropy H(X|Y) is expressed as [43,44]

H(X|Y) = −Σ_j P(y_j) Σ_i P(x_i|y_j) log₂ P(x_i|y_j)     (11)
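The symmetrical uncertainty of Eqs. (10) and (11), together with the redundancy filtering described in the steps that follow, can be sketched as below (our own code; it assumes the features have already been discretized into integer bins so that the probabilities can be estimated by counting).

```python
import numpy as np

def entropy(x):
    """H(X) for a discrete (integer-coded) variable."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def conditional_entropy(x, y):
    """H(X|Y) of Eq. (11)."""
    vals, counts = np.unique(y, return_counts=True)
    return float(sum((c / len(y)) * entropy(x[y == v]) for v, c in zip(vals, counts)))

def symmetrical_uncertainty(x, y):
    """U(X, Y) of Eq. (10): 2 * gain / (H(X) + H(Y))."""
    gain = entropy(x) - conditional_entropy(x, y)
    denom = entropy(x) + entropy(y)
    return 2.0 * gain / denom if denom > 0 else 0.0

def fcbf_select(D, c):
    """Greedy FCBF filtering (Steps 1-3 described next): D is an
    (n_samples, n_features) integer-coded matrix, c the class labels.
    Returns the indices of the retained features."""
    su_c = np.array([symmetrical_uncertainty(D[:, k], c) for k in range(D.shape[1])])
    order = list(np.argsort(-su_c))            # Step 1: rank by U_{k,c}
    i = 0
    while i < len(order):                      # Steps 2-3: drop redundant features
        fi = order[i]
        order = order[:i + 1] + [fj for fj in order[i + 1:]
                                 if symmetrical_uncertainty(D[:, fi], D[:, fj]) <= su_c[fj]]
        i += 1
    return order

# Deterministic toy demo: feature 1 is an exact copy of feature 0.
c = np.repeat([0, 1, 2], 60)
f0 = c.copy(); f0[:10] = (f0[:10] + 1) % 3     # a noisy predictor of the class
f1 = f0.copy()                                  # redundant duplicate of feature 0
f2 = np.tile([0, 1, 2, 1], 45)                  # filler feature, uninformative about c
print(fcbf_select(np.stack([f0, f1, f2], axis=1), c))   # feature 1 is filtered out
```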
The dataset for feature selection is composed of the feature set D = {D_k} and its class label vector C. D_k, whose length is N, is the subset corresponding to a certain feature index k; the total number of candidate feature indices is K. Redundant features are removed by iteratively comparing the uncertainties of the subset pairs U_{k,c} = U(D_k, C) and U_{k,l} = U(D_k, D_l), l ≠ k. The detailed process is given as follows:

Step 1: Rank ordering. Calculate the symmetrical uncertainty U_{k,c} between each subset D_k and the class label C. Rearrange the dataset D in descending order of U_{k,c}.

Step 2: Filtering. Set the initial value of the index i to 1 and calculate the uncertainty U_{i,j} between D_i and each remaining feature D_j (j = 1, 2, ..., K, j ≠ i). At the same time, the uncertainty U_{j,c} between the subset D_j and the label C is also calculated. Remove the feature subset D_j if U_{i,j} > U_{j,c}. Rearrange the dataset D using the remaining subsets.

Step 3: Iteration. Increase i by 1 and repeat Step 2. The selection process is terminated when all subsets have been traversed or no more subsets can be removed. The remaining dataset is taken as the final feature set.

4. Experiment setup and data preparation

4.1. Experiment setup

To show the effectiveness of the proposed method, milling experiments were carried out on a Makino FNC86-A20 vertical machining center. Ti-6Al-4V titanium alloy was machined using a four-flute inserted-teeth milling cutter. The cutter holder (type EAP300R25D25d150L4T) was 150 mm in length and 25 mm in diameter, and the inserts used were APMT1135PDER-H2 coated with a VP15TH layer (as shown in Fig. 7).

Fig. 7. Picture of cutter.

A dynamometer (type: Kistler 9257A) was connected to a charge amplifier (type: Kistler 5070) and mounted on the machining table to measure the dynamic cutting force with a 1 kHz sampling frequency. Three vibration sensors (type: LC0103TA) were used for signal collection with a 10 kHz sampling frequency. One sensor was installed on the workpiece to measure the feed-direction vibration; the other two sensors were installed on perpendicular sides of the spindle to measure the axial and radial vibration, respectively. The vibration signals were sampled and converted into digital form using an acquisition card (type: NI 9234) and stored in a computer. Fig. 8 shows the experimental setup of the TCM system.
4.2. Data preparation

As shown in Table 4, the experiments were carried out under fixed cutting parameters. The tool path was a straight line and each cutting pass was 100 mm long.

Table 4. Cutting parameters of end milling process.
Cutting depth: 1 mm; cutting width: 22 mm; feed: 0.1 mm/tooth; spindle speed: 509 rpm.
Fig. 8. Experiment setup.
Fig. 9. Tool morphology under different tool wear states.
Table 5. The list of the selected features.
1. Peak value of the radial force
2. Power of the feed force
3. Form factor of the feed force
4. RMS of the radial force
5. Energy percentage of H1 in the feed vibration
6. Skewness of the feed force
7. Skewness of the feed vibration
8. Peak value of the axial vibration
9. Burst rate of the radial vibration
10. Power of the feed vibration
11. Burst rate of the feed vibration
12. Power of the axial vibration
13. Peak to peak amplitude value of the feed force
14. Mean value of the axial vibration
15. Variance of amplitude value of the radial feed
16. Relative spectral peak per band of the axial vibration
17. Form factor of the feed vibration
18. Skewness of band power of the feed vibration
19. Skewness of the feed vibration
Table 6. Description of feature dataset.

Dataset   Dimension   Tool wear category                        Size
D1        19          New cutter / Middle wear / Severe wear    100 / 100 / 100
D2        19          New cutter / Middle wear / Severe wear    100 / 100 / 100
D3        19          Small wear                                100
After each pass, the flank wear of the four cutters was measured by optical microscope. According to the average flank wear value of the cutter, the tool wear condition was divided into four classes: new cutter (0–0.1 mm), small wear (0.1–0.2 mm), middle wear (0.2–0.3 mm) and severe wear (>0.3 mm). The morphology of the tool wear zone under these four states is illustrated in Fig. 9. The recorded signals at different tool wear states are divided into segments with a length of 2048. For each segment, 25 features are extracted and normalized from each of the vibration and cutting force signals in the three directions. By observing the feature variation of the force signals under different tool wear states, it can be found that the features Br and FPBP and the percentage energies of L1 and H1 show almost no relationship with the tool wear states. Therefore, these four features are removed from the list of force features. Finally, 63 features from the force signals and 75 features from the vibration signals in three directions are acquired, and the total number of features is 138. To show the effectiveness of the EAM classifier based TCM system, three datasets are constructed. The dataset S1, which includes three tool wear states (i.e., new cutter, middle wear and severe wear), is used for the construction of the original EAM classifier. Each state has 100 samples, so the total size of S1 is 300. S2 includes the same tool wear states and has the same size as S1; however, its data come from different cutting passes. S3 contains the data for the small wear category, which does not appear in S1 and S2. Both S2 and S3 are used for incremental learning of the EAM classifier. Based on S1, the FCBF algorithm is applied to search for a minimum redundant feature subset, and 19 feature vectors are finally selected as the predominant features (their names are listed in Table 5). The feature vectors other than these 19 features are therefore removed from S1 and a new dataset D1 is formed with size 300 × 19. In order to keep consistency between the offline training data and the incremental data, S2 and S3 are also reduced to 19 dimensions based on these features, and the corresponding new datasets are named D2 and D3, respectively. The detailed information of these three datasets is shown in Table 6. The spatial distribution of some selected feature vectors from D1 is shown in Fig. 10, in which different tool wear states are labeled with different kinds of markers. It can be clearly seen that the spatial distribution for each tool wear state is very dispersed and irregular. Some feature data under different tool wear states even mix with each other, which makes the boundary of each category hard to distinguish. To realize accurate classification of the tool states, the EAM classifier is adopted in the following section.

5. Classification and incremental learning based on EAM
Fig. 10. Spatial distribution of feature vectors (markers denote new cutter, middle wear and severe wear): (a) power of the feed force vs. peak value of the radial force; (b) burst rate of the radial vibration vs. form factor of the feed force.

5.1. Cross validation

To reduce the probability of over-fitting and improve the robustness of the classifier, a cross validation method is adopted [45]. For a v-fold cross validation, the samples are randomly divided into v equal-size subsets. The classifier is obtained by training on v − 1 subsets and then tested using the remaining subset. This procedure is repeated v times so that each subset is utilized for testing exactly once. By averaging the test errors over the v trials, the expected generalization error can be calculated. Because the samples utilized in the training and test processes are different from each other, the cross validation scheme can detect and prevent over-fitting effectively. In the following sections, the cross validation scheme is adopted to segment the training samples and construct the classifiers.

Fig. 11. Comparison of EAM with FAM under five-fold cross validation (accuracy rate in % for each fold).
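The five-fold evaluation protocol can be sketched as follows; the nearest-mean classifier is only a stand-in so that the snippet runs on its own, whereas the paper evaluates the EAM and FAM classifiers at this point.

```python
import numpy as np

def five_fold_accuracy(X, y, make_classifier, n_folds=5, seed=0):
    """Average test accuracy over a v-fold split, one held-out fold per trial."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    scores = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        clf = make_classifier()
        clf.fit(X[train], y[train])
        scores.append(np.mean(clf.predict(X[test]) == y[test]))
    return float(np.mean(scores)), scores

class NearestMean:
    """Trivial stand-in classifier used only to make the example runnable."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

# Synthetic 300 x 19 feature matrix with three classes, mimicking D1's shape.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, (100, 19)) for c in range(3)])
y = np.repeat([0, 1, 2], 100)
print(five_fold_accuracy(X, y, NearestMean))
```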
Fig. 12. Node distribution of the EAM classifier before incremental learning (samples from D1 and D2; root mean square vs. form factor): (a) new cutter; (b) middle wear; (c) severe wear.

Fig. 13. Node distribution of the EAM classifier after incremental learning (samples from D1 and D2; root mean square vs. form factor): (a) new cutter; (b) middle wear; (c) severe wear.
5.2. EAM classifier construction and comparison with FAM

Using the feature dataset D1, the five-fold cross validation scheme is adopted and the EAM classifier is constructed. For comparison, a FAM classifier is trained simultaneously to realize tool wear state recognition. The parameters of both EAM and FAM are determined by a trial and error method. Finally, for the EAM classifier, the vigilance ρ̄ is selected as 0.84 and the ratio of minor-to-major axes μ is chosen as 0.6; for FAM, the learning rate and vigilance value are set to 1 and 0.84, respectively. Considering the decision boundary characteristics of multi-class classification for EAM and FAM, the classification accuracy rate of each classifier is calculated to evaluate its performance based on the cross validation principle. The accuracy rates for the test samples under five-fold cross validation are calculated and the corresponding bar plots are shown in Fig. 11. It can be clearly seen that the accuracy rates of EAM are higher than those of FAM in each fold. In addition, the average recognition rates of both methods are also calculated; the results show that EAM reaches 98.67% while FAM reaches only 89.67%.
Table 7. Recognition rate prior and post to incremental learning of D2 by EAM.

                    Original sample                          Incremental sample D2
Fold number         Prior    Post     Rate of change (%)     Prior    Post   Rate of change (%)
1                   1.0      1.0      0                      0.6767   1.0    ↑32.33
2                   1.0      1.0      0                      0.5867   1.0    ↑41.33
3                   1.0      1.0      0                      0.5833   1.0    ↑41.67
4                   0.9667   0.9667   0                      0.6600   1.0    ↑34.00
5                   0.9667   0.9667   0                      0.6033   1.0    ↑39.67
Average             0.9867   0.9867   0                      0.6220   1.0    ↑37.8
Standard deviation  0.0182   0.0182                          0.0434   0
Table 8. Recognition rate prior and post to incremental learning of D2 by FAM.

                    Original sample                          Incremental sample D2
Fold number         Prior    Post     Rate of change (%)     Prior    Post   Rate of change (%)
1                   0.8833   0.8333   0                      0.5333   1.0    ↑87.51
2                   0.9333   0.9000   ↓3.57                  0.5433   1.0    ↑84.06
3                   0.8833   0.8333   0                      0.5200   1.0    ↑92.31
4                   0.8833   0.6667   ↓24.52                 0.5367   1.0    ↑86.32
5                   0.9000   0.8500   ↓5.56                  0.5600   1.0    ↑78.57
Average             0.8966   0.8167   ↓6.73                  0.5387   1.0    ↑85.75
Standard deviation  0.0217   0.0882                          0.0146   0
5.3. Incremental learning

In real applications, the data depicting a certain tool wear state are hard to collect all at one time. Therefore, further learning is needed to improve the accuracy of the classifier. In this section, two kinds of incremental learning strategies are realized: one is to learn data coming from the same categories as the original classifier, and the other is to learn a new tool wear category. In both cases, FAM based incremental learning is also realized for comparison with EAM. The demonstration of these two cases is given as follows.

5.3.1. Incremental learning for the same tool wear categories

The incremental samples D2 have the same category labels (i.e., new cutter, middle wear and severe wear) as the training samples D1; however, their sample distributions do not coincide with each other. Taking the third and the fourth features of the dataset as the x and y axes, the distributions of D2 and D1 are illustrated in Fig. 12. Moreover, the representation regions of the nodes in the original EAM classifier trained using D1 are shown in this figure. It can be seen that the distributions of D1 and D2 differ from each other although they share the same tool wear states. In addition, the nodes in the EAM classifier fail to cover the sample distribution of D2. The classification results taking D2 as input are shown in Table 7. It can be seen that the maximum accuracy under the five-fold cross validation scheme is only 67.7% and the average accuracy is 62.2%. Taking these EAM classifiers as the original ones, incremental learning is carried out using D2 so as to obtain the updated classifiers. The distribution of the representation region of each node of the new EAM classifier is shown in Fig. 13. Based on the updated classifiers, D1 and D2 are input as test samples under the five-fold cross validation scheme and the corresponding recognition rates are calculated and also shown in Table 7. It can be found that there is no change between the original and updated classifiers on the original samples: after incremental learning, the recognition rate still remains the same. Therefore, it can be concluded that incremental learning maintains the classification performance on the original knowledge. In contrast, an obvious improvement of the recognition rates for D2 can be seen clearly; the accuracy for D2 after incremental learning reaches 100%. The results prove that incremental learning can improve the accuracy on the new data without sacrificing the performance on the previous training data. To make a comparison, FAM is also utilized to realize incremental learning for the same tool wear categories based on the same dataset. Table 8 shows the recognition rates prior and post to incremental learning of D2 based on FAM. It can be seen that, although the accuracy for D2 after incremental learning reaches 100%, the average recognition rate on the original samples after incremental learning is reduced from 89.66% to 81.67%, while for EAM (shown in Table 7) the average recognition rates on the original samples before and after the incremental learning are both 98.67%.

5.3.2. Incremental learning for a different tool wear category

Based on the classifiers above, D3 is further used to realize incremental learning of a new category using both EAM and FAM. Different from Section 5.3.1, the tool wear category in D3 has never been trained before. This means that a node corresponding to the new category must be added during the incremental learning process.
Table 9. Recognition rate prior and post to incremental learning of D3 by EAM.

                    Original sample                          Incremental sample D3
Fold number         Prior    Post     Rate of change (%)     Prior   Post   Rate of change (%)
1                   1.0      0.9833   ↓1.67                  0       1.0    ↑100
2                   1.0      1.0      0                      0       1.0    ↑100
3                   1.0      0.9833   ↓1.67                  0       1.0    ↑100
4                   0.9667   0.9667   0                      0       1.0    ↑100
5                   0.9667   0.9667   0                      0       1.0    ↑100
Average             0.9867   0.9800   ↓0.67                  0       1.0    ↑100
Standard deviation  0.0182   0.0139                          0       0
Table 10. Recognition rate prior and post to incremental learning of D3 by FAM.

                    Original sample                          Incremental sample D3
Fold number         Prior    Post     Rate of change (%)     Prior   Post   Rate of change (%)
1                   0.8833   0.7500   ↓15.09                 0       1.0    ↑100
2                   0.9333   0.8167   ↓12.49                 0       1.0    ↑100
3                   0.8833   0.7500   ↓15.09                 0       1.0    ↑100
4                   0.8833   0.6833   ↓22.64                 0       1.0    ↑100
5                   0.9000   0.7500   ↓16.67                 0       1.0    ↑100
Average             0.8966   0.7500   ↓16.40                 0       1.0    ↑100
Standard deviation  0.0217   0.0472                          0       0
Also based on the five-fold validation method, these classifiers are trained and updated incrementally. The variations of the recognition rates for the original samples and for D3 before and after incremental learning by EAM and FAM are shown in Tables 9 and 10, respectively. It can be found in Table 9 that the recognition rate of the original samples shows negligible change after EAM incremental learning, only reducing from 98.67% to 98%, and the maximum difference between prior and post to incremental learning is no more than 1.67%. Moreover, the recognition rate of the updated classifier for D3 reaches 100% although its accuracy is zero before the incremental learning proceeds. In contrast, the average recognition rate on the original samples after incremental learning of the FAM classifier is reduced to 75%. These results verify that the EAM based classifier has stronger incremental learning ability than FAM.
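The prior/post bookkeeping behind Tables 7–10 can be reproduced with any classifier that supports online updates; the sketch below uses scikit-learn's SGDClassifier purely as a stand-in incremental learner (assuming scikit-learn is available), since a full EAM implementation is beyond a short example.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def prior_post_rates(clf, X_old, y_old, X_new, y_new):
    """Recognition rates on the original and the incremental data, measured
    before and after one incremental update with the new samples only."""
    prior = {"original": float(np.mean(clf.predict(X_old) == y_old)),
             "incremental": float(np.mean(clf.predict(X_new) == y_new))}
    clf.partial_fit(X_new, y_new)          # online update, no access to X_old
    post = {"original": float(np.mean(clf.predict(X_old) == y_old)),
            "incremental": float(np.mean(clf.predict(X_new) == y_new))}
    return prior, post

# Synthetic stand-ins for D1 (three known wear states) and D3 (a new state).
rng = np.random.default_rng(2)
X_old = np.vstack([rng.normal(c, 0.3, (100, 19)) for c in range(3)])
y_old = np.repeat([0, 1, 2], 100)
X_new = rng.normal(4.0, 0.3, (100, 19))
y_new = np.full(100, 3)

clf = SGDClassifier(random_state=0)
clf.partial_fit(X_old, y_old, classes=np.array([0, 1, 2, 3]))   # initial training
print(prior_post_rates(clf, X_old, y_old, X_new, y_new))
```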
6. Conclusion

In this study, an Ellipsoid ARTMAP (EAM) network model is proposed to realize accurate classification of the tool wear states and incremental learning. Its main characteristic lies in that the hyper-ellipsoid is used for the geometric representation of the feature vectors, and the adaptive resonance algorithm is adopted to update the hyper-ellipsoid nodes locally and monotonically. Therefore, this model has strong incremental learning ability, which guarantees that the built classifier can learn new knowledge without forgetting the original information. To verify the effectiveness of the proposed methodology, a Ti-6Al-4V alloy milling experiment was carried out and vibration and force signals in different directions were collected simultaneously. A tool condition monitoring system is built in which 138 features are extracted from the time, frequency and time–frequency domains, and the minimum redundant features are then selected using the FCBF algorithm so as to reduce redundancy and improve robustness. Based on the selected features, the EAM classifiers are constructed under the five-fold cross validation method and their classification accuracy is compared with that of FAM. The classification results show that the average recognition rate of the initial EAM classifier reaches 98.67%, which is higher than that of FAM. Moreover, the incremental learning ability of EAM is also analyzed and compared with FAM using data from different cutting passes and a new tool wear category. The results show that the updated EAM classifier retains higher classification accuracy on the original knowledge, which proves that EAM has stronger incremental learning ability.
Acknowledgment This project is supported by National Natural Science Foundation of China (51175371 and 51420105007), National Science and Technology Major Projects (2014ZX04012-014) and Tianjin Science and Technology Support Program (13ZCZDGX04000).
References [1] M. Kious, A. Ouahabi, M. Boudraa, R. Serra, A. Cheknane, Detection process approach of tool wear in high speed milling, Measurement 43 (10) (2010) 1439–1446. [2] R. Teti, Machining of composite materials, CIRP Ann. Manuf. Technol. 51 (2) (2002) 611–634. [3] S. Purushothaman, Y.G. Srinivasa, A back-propagation algorithm applied to tool wear monitoring, Int. J. Mach. Tools Manuf. 34 (5) (1994) 625–631. [4] D.E. Dimla Sr., P.M. Lister, On-line metal cutting tool condition monitoring. II: Tool-state classification using multi-layer perceptron neural network, Int. J. Mach. Tools Manuf. 40 (5) (2000) 769–781. [5] D. Brezak, T. Udiljak, K. Mihoci, D. Majetic, B. Novakovic, J. Kasac, Tool wear monitoring using radial basis function neural network, in: International Joint Conference on Neural Networks, IEEE, 2004, pp. 1859–1862. [6] T. Xu, T. Wang, Cutting tool wear identification based on wavelet package and SVM, in: 8th World Congress on Intelligent Control and Automation, IEEE, 2010, pp. 5953–5957. [7] J. Sun, M. Rahman, Y.S. Wong, G.S. Hong, Multiclassification of tool wear with support vector machine by manufacturing loss consideration, Int. J. Mach. Tools Manuf. 44 (11) (2004) 1179–1187. [8] G.F. Wang, Y.W. Yang, Y.C. Zhang, Q.L. Xie, Vibration sensor based tool condition monitoring using support vector machine and locality preserving projection, Sens. Actuator A Phys. 209 (2014) 24–32. [9] A.A. Kassim, M. Zhu, M.A. Mannan, Tool condition classification using Hidden Markov Model based on fractal analysis of machined surface textures, Mach. Vis. Appl. 17 (5) (2006) 327–336. [10] T. Boutros, M. Liang, Detection and diagnosis of bearing and cutting tool faults using hidden Markov models, Mech. Syst. Signal Process. 25 (6) (2011) 2102–2104. [11] O. Cetin, M. Ostendorf, G.D. Bernard, Multirate coupled Hidden Markov Models and their application to machining tool wear classification, IEEE Trans. Signal Process. (2007) 2885–2896. [12] R. Polikar, L. Udpa, S.S. Udpa, V. Honavar, Learn++: an incremental learning algorithm for supervised neural networks, IEEE Trans. Syst. Man Cybern. 31 (2001) 497–508. [13] S. Salzberg, A nearest hyperrectangle learning method, Mach. Learn. 6 (1991) 277–309. [14] B. Gabrys, A. Bargiela, General fuzzy min-max neural network for clustering and classification, IEEE Trans. Neural Netw. 11 (3) (2000) 769–783. [15] T. Martinetz, S. Berkovich, K. Schulten, Neural gas network for vector quantization and its application to time-series prediction, IEEE Trans. Neural Netw. 4 (4) (1993) 558–569. [16] A. Bouchachia, Incremental learning via function decomposition, in: Proceedings of the 5th International Conference on Machine Learning and Applications, IEEE Computer Society, 2006, pp. 63–68. [17] G.A. Carpenter, S. Grossberg, J.H. Reynolds, ARTMAP: a self-organizing neural network architecture for fast supervised learning and pattern recognition, Neural Netw. 1 (1991) 863–868. [18] G.A. Carpenter, S. Grossberg, J. Reynolds, ARTMAP: supervised real-time learning and classification of nonstationary data by a self-organizing neural network, Neural Netw. 4 (5) (1991) 565–588. [19] M. Mokhtar, X.W. Liu, An ARTMAP-incorporated multi-agent system for building intelligent heat management, in: 3rd IEEE PES International Conference and Exhibition on Innovative Smart Grid Technologies Europe (ISGT Europe), 2012, pp. 1–8. [20] I. Dagher, M. Georgiopoulos, G.L. Heileman, G. 
Bebis, An ordering algorithm for pattern presentation in fuzzy ARTMAP that tends to improve generalization performance, IEEE Trans. Neural Netw. 10 (4) (1999) 768–778. [21] C.P. Lim, M.M. Kuan, R.F. Harrison, Application of fuzzy ARTMAP and fuzzy c-means clustering to pattern classification with incomplete data, Neural Comput. Appl. 14 (2) (2005) 104–113. [22] G.A. Carpenter, S. Grossberg, N. Markuzon, J.H. Reynolds, D.B. Rosen, Fuzzy ARTMAP: an adaptive resonance architecture for incremental learning of analog maps, in: International Joint Conference on Neural Networks, IEEE, 1992, pp. 309–314. [23] G.A. Carpenter, S. Grossberg, N. Markuzon, J.H. Reynolds, D.B. Rosen, Fuzzy ARTMAP: a neural network architecture for incremental supervised learning of analog multidimensional maps, IEEE Trans. Neural Netw. 3 (1992) 698–713.
[24] R. Javadpour, G.M. Knapp, A fuzzy neural network approach to machine condition monitoring, Comput. Ind. Eng. 45 (2) (2003) 323–330. [25] M.-W. Park, B.-T. Park, H.-M. Rho, S.-K. Kim, Incremental supervised learning of cutting conditions using the fuzzy ARTMAP neural network, CIRP Ann. Manuf. Technol. 49 (1) (2000) 375–378. [26] G.F. Wang, Z.W. Guo, L. Qian, Online incremental learning for tool condition classification using modified Fuzzy ARTMAP network, J. Intell. Manuf. 25 (6) (2014) 1403–1411. [27] S. Mohamed, D. Rubin, T. Marwala, Incremental learning for classification of protein sequences, in: International Joint Conference on Neural Networks, IEEE, 2007, pp. 19–24. [28] E. Sapojnikova, ART-Based Fuzzy Classifiers: ART Fuzzy Networks for Automatic Classification, Cuvillier Verlag, Goettingen, 2004. [29] J.R. Williamson, Gaussian ARTMAP: a neural network for fast incremental learning of noisy multidimensional maps, Neural Netw. 9 (5) (1996) 881–897. [30] R. Peralta, G.C. Anagnostopoulos, E. Gomez-Sanchez, S. Richie, On the design of an ellipsoid ARTMAP classifier within the Fuzzy Adaptive System ART framework, in: Proceedings of International Joint Conference on Neural Networks, IEEE, 2005, pp. 469–474. [31] R. Xu, G.C. Anagnostopoulos, D.C. Wunsch, Tissue classification through analysis of gene expression data using a new family of ART architectures, in: Proceeding of the 2002 International Joint Conference on Neural Networks, IEEE, 2002, pp. 300–304. [32] G.C. Anagnostopoulos, M. Georgiopoulos, Ellipsoid ART and ARTMAP for incremental clustering and classification, in: International Joint Conference on Neural Networks, IEEE, 2001, pp. 1221–1226. [33] G.C. Anagnostopoulos, M. Georgiopoulos, Ellipsoid ART and ARTMAP for incremental unsupervised and supervised learning, in: Conference on Application and Science of Computational Intelligence IV, SPIE, 2001, pp. 293–304. [34] S. Binsaeid, S. Asfour, S. Cho, A. Onar, Machine ensemble approach for simultaneous detection of transient and gradual abnormalities in end milling using multisensory fusion, J. Mater. Process. Technol. 209 (10) (2009) 4728–4738.
[35] G.C. Anagnostopoulos, M. Georgiopoulos, S.J. Verzi, G.L. Heileman, Boosted ellipsoid ARTMAP, in: 5th Conference on Applications and Science of Computational Intelligence, SPIE, 2002, pp. 74–85. [36] G.C. Anagnostopoulos, M. Georgiopoulos, S.J. Verzi, G.L. Heileman, Reducing generalization error and category proliferation in ellipsoid ARTMAP via tunable misclassification error tolerance: boosted ellipsoid ARTMAP, in: Proceeding of the 2002 International Joint Conference on Neural Networks, IEEE, 2002, pp. 2650–2655. [37] G.F. Wang, Y.H. Cui, On line tool wear monitoring based on auto associative neural network, Int. J. Adv. Manuf. Technol. 24 (2013) 1085–1094. [38] L. Zhu, J. Yang, J.N. Song, K.C. Chou, H.B. Shen, Improving the accuracy of predicting disulfide connectivity by feature selection, J. Comput. Chem. 31 (7) (2010) 1478–1485. [39] G.F. Wang, Y.W. Yang, Z.W. Guo, Hybrid learning based Gaussian ARTMAP network for tool condition monitoring using selected force harmonic features, Sens. Actuator A Phys. 203 (2013) 394–404. [40] H.C. Peng, F.H. Long, C. Ding, Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy, IEEE Trans. Pattern Anal. 27 (8) (2005) 1226–1238. [41] B.-S. Yang, K.J. Kim, Application of Dempster–Shafer theory in fault diagnosis of induction motors using vibration and current signals, Mech. Syst. Signal Process. 20 (2) (2006) 403–420. [42] L. Yu, H. Liu, Feature selection for high-dimensional data: a fast correlationbased filter solution, in: Proceedings of the Twentieth International Conference on Machine Learning, Washington, DC, 2003, pp. 856–863. [43] F. Fernandez-Navarro, C. Hervas-Martinez, R. Ruiz, J.C. Riquelme, Evolutionary generalized radial basis function neural networks for improving prediction accuracy in gene classification using feature selection, Appl. Soft Comput. 12 (6) (2012) 1787–1800. [44] K. Kiatpanichagij, N. Afzulpurkar, Use of supervised discretization with PCA in wavelet packet transformation-based surface electromyogram classification, Biomed. Signal Process. 4 (2) (2009) 127–138. [45] M. Stone, Cross-validatory choice and assessment of statistical prediction, J. R. Stat. Soc. B 36 (2) (1974) 111–147.