IEPE accelerometer fault diagnosis for maintenance management system information integration in a heavy industry


Journal Pre-proof

IEPE Accelerometer Fault Diagnosis for Maintenance Management System Information Integration in a Heavy Industry

Dr. Chao-Chung Peng, Mr. Lin-Ga Tsan

PII: S2452-414X(19)30041-X
DOI: https://doi.org/10.1016/j.jii.2019.100120
Reference: JII 100120

To appear in: Journal of Industrial Information Integration

Received date: 19 March 2019
Revised date: 4 September 2019
Accepted date: 11 December 2019

Please cite this article as: Dr. Chao-Chung Peng , Mr. Lin-Ga Tsan , IEPE Accelerometer Fault Diagnosis for Maintenance Management System Information Integration in a Heavy Industry, Journal of Industrial Information Integration (2019), doi: https://doi.org/10.1016/j.jii.2019.100120

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier Inc.


Type of article: research article
Name(s) of the author(s) with highest degree: Dr. Chao-Chung Peng, Mr. Lin-Ga Tsan
Word count (excluding abstract and references): 4344
Number of Tables: 1
Number of Figures: 14
Number of Photos:
Number of References: 14
Corresponding author: Dr. Chao-Chung Peng
Postal address: 1 University Road, Tainan City, Taiwan, R.O.C. 70101
Mobile number: +886-912199480
Email address: [email protected]
Affiliations of author(s): Department of Aeronautics and Astronautics, National Cheng Kung University, Tainan, Taiwan

Abstract—With the increasing demand for reliable production facilities, the design of a health condition monitoring system that implements automatic diagnosis and software solutions is one of the main issues for a smart factory. Among many industrial applications, the accelerometer is one of the most frequently used sensors for facility vibration monitoring. Thus, the health condition of the sensor itself is a critical factor for a correct diagnosis. Failure to monitor the sensor's health condition could cause a false alarm, which may lead field operators to make wrong decisions. In this research, a preprocessing method of synthetic data and a Gaussian mixture model (GMM) classifier were developed to classify the health conditions of online integrated electronic piezoelectric (IEPE) accelerometers. The proposed method was integrated into a product line, and the test results achieved >99% accuracy in determining five different health conditions of the accelerometers. With the aid of the proposed method, the time spent on human inspection can be significantly reduced and field safety can be improved. Moreover, false alarms caused by sensor failure can be prevented. This leads to an increase in the reliability of the facility monitoring system.
Index Terms— Maintenance engineering; condition monitoring; fault diagnosis; classification algorithms; Gaussian mixture model.

I. INTRODUCTION

In the modern process industry, smooth factory operation and uninterrupted production often rely on a series of mechanical drives operating under normal conditions. To monitor and supervise the health conditions of the operating machines and their parts, various sensing techniques were adopted and diagnosis methods were developed to provide fault diagnostics [1]. Furthermore, extending the fault diagnosis implementation makes a predictive maintenance capability possible for a facility management system. The measurable physical signals may include vibration signals [2][4][7][8][9][10], current and voltage [3][6][7][11], temperature through thermal imagery or temperature sensors [5], acoustic signals, and magnetic force measurements. Among them, the vibration signal is used as the primary raw signal source in many industrial applications and related research. To obtain the vibration signal, IEPE accelerometers are often used: the sensors are mounted on the machines and mechanical driving systems, and their voltage response signals record the vibration. Ensuring the health condition of the online sensors is important for an automatic facility health diagnosis and monitoring system to run reliably and provide high-precision diagnosis.

To diagnose the IEPE accelerometer health condition without attaching additional sensors, utilizing the voltage response signal from the online accelerometers is regarded as both a reliable and an efficient approach for industrial application [2]. With an automatic diagnostic algorithm, the reliability of the cabling from the online IEPE accelerometer to the data acquisition (DAQ) equipment can be determined. To automate the preprocessing and clustering of the raw data features before model training, a combination of K-means and density-based spatial clustering of applications with noise (DBSCAN) was used to exclude ambiguous overlapping of data features [12]. Since the features extracted from the voltage response of the IEPE accelerometers were distributed in several clusters within the same health condition [2], a classification model that could consider several clusters in the same class was desired. In previous image processing research, Gaussian-based classification models were often used to differentiate classes among data points that exhibit a normal distribution [13].

For clusters that cannot be described by a single Gaussian distribution, the GMM is a good alternative. The combination of GMM and expectation maximization (EM), a prospective classification model for multi-cluster data point distributions, was used to determine the health condition of the IEPE accelerometer.

In this paper, we developed an IEPE sensor diagnosis and monitoring system. The IEPE sensor has been widely applied in many heavy industries for equipment monitoring, diagnosis and maintenance. However, once the IEPE sensor malfunctions, it gives rise to incorrect signals and misleads the decision making of field operators. To deal with this issue, a GMM-based classifier was developed to diagnose the cabling status and the health status of the IEPE accelerometer. The proposed GMM-based algorithm improved the model parameter training process by combining DBSCAN, K-mean, and linear discriminant analysis (LDA), which provided a reliable selection of initial centroids for the EM algorithm so that the GMM parameters could be computed automatically. The proposed solution aimed to increase the classification accuracy and to allow the facility management system to distinguish up to five different categories of online accelerometer operating status. The Gaussian-based classifier used in the proposed method was also useful for preventing false alarms from unrecognized health statuses. Moreover, outliers can be identified and isolated, rather than being forced into one of the predetermined (known) health conditions. As a result, the developed method can be integrated into current industrial information management systems.

Fig. 2. Five cabling statuses of the IEPE accelerometers observed during factory maintenance and operations.

Fig. 3. A snapshot of an operating steel manufacturing line, where the workers were exposed to high temperature and moisture due to the hazardous environment.


Fig. 1. Accelerometer and cabling installation on the steel manufacturing line. (a) The junction box was exposed to unpredictable high temperature and moisture, rusting the cables and accelerometers. (b) The cable was connected reversely with faults. (c) The broken cable caused a short circuit scenario. (d) Field inspection scenario carried out by the front line technician.

II. SYSTEM DESCRIPTION

With the accelerometers operating in an unpredictable high-temperature and high-moisture environment, failure of the IEPE sensors must be identified by human inspection, as shown in Fig. 1. In this research, five common health condition classes were considered: normal, reverse connection, short circuit, disconnection, and open circuit. If these five operating conditions can be diagnosed automatically by the facility management system, inaccurate diagnosis caused by sensor faults can be prevented, and facility operation may also benefit from reduced downtime and reduced maintenance effort, both in human resources and in time for onsite inspection. The five common cabling and accelerometer health conditions observed on the manufacturing lines are illustrated in Fig. 2.

In a recent study, a graphical histogram algorithm (GHA) was developed to diagnose the states mentioned above [2]. In that research, the histogram was used to characterize both the AC and DC voltage responses. By fitting the histogram with a cubic spline as the golden pattern, each new measurement can be compared with the training dataset through their correlation coefficient. This method could successfully distinguish four states: normal, reverse connection, short circuit, and open circuit. However, since the voltage response signals closely resembled each other, it could not distinguish between disconnection and open circuit.

According to the utility configuration in a factory, thousands of sensors are often distributed across the production site, with cables hundreds of meters long stretching from the control pulpit to each sensor. If the disconnection and open circuit faults can be distinguished, the facility manager and the maintenance technicians can prioritize their inspection sites, saving inspection time and avoiding the unnecessary risk of dispatching workers into hazardous sites, as shown in Fig. 3. According to the previous study [2], up to 400 parameters for each class have to be stored for each measurement to construct the golden pattern. This may require large storage and a large computation effort to establish long-term deterioration tracking, and thus may not be suitable for an embedded system. To increase the benefits and reduce the effort of implementing the fault diagnosis system in industry, the technique developed in this research increased the diagnostic accuracy and required less storage space for recording the historical indexes. An algorithm that automatically determines the number of groups and detects outliers was also developed.

III. RESEARCH METHOD

In this research, measurement data were collected from the IEPE accelerometers installed on the production facility, as illustrated in Fig. 3. The dataset was then separated into two sets, one for model training and the other for model testing, to verify the accuracy of the trained classification model. To improve the accuracy of the diagnostic algorithm, the EM algorithm was used to estimate the proper parameters of the GMM classifier.

Before the dataset was used to train the classification model, 30 preselected measurement indexes were used to extract features. After that, the features were projected by linear discriminant analysis (LDA) into a weighted combinative space, resulting in dimension-reduced features. A clustering technique combining DBSCAN with K-mean++ determined the proper number of clusters within each class, to mitigate the iteration inaccuracy during the EM steps caused by improper initial centroid selection. The accelerometer voltage response signals acquired from the real operating steel manufacturing line were recorded in two types of responses, labeled AC and DC respectively. The raw voltage time responses for each health condition are shown in Fig. 4. Apparently, different fault situations cause deviations of the voltage responses.

Fig. 4. AC and DC voltage response of each type of health condition.

[Figure 5 flowchart. Data preprocessing: raw AC/DC data from the IEPE type accelerometers and data acquisition cards (DAQs) are imported and passed through a moving average. Classification model training: features (mean μ and standard deviation σ) of the prepared training data are described by Gaussian distribution functions, which form the Gaussian mixture models (GMM) with parameters (μ, σ). Diagnosis: unknown test data (DC/AC) pass through the same moving average and are evaluated by the GMM probability functions, giving the results and the probability of each class.]

Fig. 5. Research method and process flow.

Based on the collected data, both the training dataset and the testing dataset underwent preprocessing, and then 30 measurement indexes were used for feature extraction. The weighted composition of these 30 features was projected into another multi-dimensional domain by using the LDA method. The new values in this projected domain can be used as dimension-reduced features representing each health condition. The details of the mathematics and the algorithm are discussed further below. With the results from the LDA, the new feature values were used for 2-step clustering by a combinative algorithm of DBSCAN and K-mean plus-plus (K-mean++) [12]. Since the proper number of groups for each health condition classifier was determined by the 2-step clustering process, the resulting centroids were used as the starting point of the EM approach, which computed the parameters needed to construct the desired GMM classifier [13]. The process is illustrated in Fig. 5. After the GMM classifier had been constructed, the testing dataset that was prepared in advance was used to test the classifier accuracy.

IV. ALGORITHM AND CLASSIFICATION MODEL

A. Challenges and proposed solutions

The relevant research indicated that the response signal of the accelerometers might be coupled with the data acquisition hardware, causing some variance of the voltage response signal even when the health conditions were the same [2]. Since the numerical scale of the voltage response signal was in the order of $10^{-3}$, the developed algorithm should be able to withstand the variance caused by the measurement equipment. It should also perform correct classification even when the datasets of different health conditions overlap.

Fig. 6. Challenges of classifying IEPE accelerometers.

To tackle the challenges in classifying IEPE accelerometer health conditions shown in Fig. 6, the following methods were used to cope with the signal coupling, the high variance across selected features, and the numerical challenges.

1) Noise reductions
To achieve noise reduction in signal preprocessing, a moving average technique [2] was applied.

2) Data type standardization
To achieve numerical standardization, the raw data were used to calculate the selected 30 measurement indexes, and the 30 indexes were used as the representing features.

3) Automatic features selection and dimension reduction
The LDA method was used to analyze the 30 feature indexes from the previous step. The new significant features were the data points projected into the new weighted combinatory space. Each newly generated feature was standardized within this transformed coordinate system. The standardized, dimension-reduced LDA features were then used to train the GMM classifier.

4) Response signals coupling with DAQ hardware
To construct a classifier that tolerates the minor differences caused by hardware signal coupling, multiple clusters might have to exist in a single class (health condition). Thus, GMM was used in the developed method to create classifiers composed of several single Gaussian distributions. The combination of multiple Gaussian distribution groups forms the model representing the class. However, it was found that the number of groups predetermined to train the GMM classifier strongly affects the classification accuracy. Therefore, an algorithm that determines the number of groups automatically was needed, so that the process can run without human intervention in selecting the proper clustering before constructing the classifier. The proposed solution used DBSCAN to filter outliers out of the candidate pool for cluster centroids. Then, by using the K-mean++ method, the number of clusters and the initial centroid of each cluster could be determined.

5) Reducing parameters of the classifier model
By using GMM as the classifying model, each of the Gaussian clusters required 3 parameters: mean, standard deviation, and weighting. Hence, by using the GMM to replace the GHA [2], the health diagnosis algorithm did not require 400 parameters for each class in the classification model. With a single class composed of k clusters, the GMM only required 3k parameters.

The detailed process of developing the GMM classifier is illustrated in Fig. 7. After the GMM classifier was constructed from the training dataset, online measurements of the IEPE accelerometers mounted on machines of the production lines could be fed into the developed software for algorithmic classification of the accelerometer health conditions.

B. Analysis Procedures

1) Preprocessing – moving average & measurement indexes

Moving average:
1. Input: $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{1 \times n}$, a data set containing n measurements.
2. Output: $X_{ma} = [x_{1,ma}, x_{2,ma}, \ldots, x_{n,ma}] \in \mathbb{R}^{1 \times n}$, an output with the values replaced by the new values after the moving average.

To enhance diagnosis reliability, signal preprocessing for the industrial IEPE accelerometer was needed. The moving average filter [2] was introduced and used in this study for AC current noise reduction. With a sampling rate of 1,000 measurements per second, each raw voltage signal response was recorded for up to 10 seconds. Thus, the data size of each raw data measurement would be 10,000 double-precision floating-point numbers. To standardize the features used for later classification, 30 measurement indexes were selected by the industrial partner and used to measure each signal. Combining m measurements forms an $m \times n$ matrix of the moving average output as follows:

$$X_{mn,ma} = \begin{bmatrix} x_{11,ma} & x_{12,ma} & \cdots & x_{1n,ma} \\ x_{21,ma} & x_{22,ma} & \cdots & x_{2n,ma} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1,ma} & x_{m2,ma} & \cdots & x_{mn,ma} \end{bmatrix} \in \mathbb{R}^{m \times n} \tag{1}$$

Measurement indexes:
1. Input: $X_{mn,ma} \in \mathbb{R}^{m \times n}$, with m measurements of n values each, the values having been replaced by the new values after the moving average.
2. Output: $M_{m,d} \in \mathbb{R}^{m \times d}$, with m measurements as rows and each column holding one measurement index value for the measurements.

In this study, 30 measurement indexes were used and therefore d = 30. The measurement data, which had already gone through the moving average step for noise reduction, were used to calculate the value of each of the 30 measurement indexes. The resulting $1 \times 30$ array was stored. In this study, 750 accelerometer measurements were taken for each of the five health conditions. Thus, for both the AC and the DC signal responses, a matrix of size $3750 \times 30$ was generated. Since the numerical scale differences across the indexes might be up to $10^{3}$, a standardization process should be applied to reduce the large deviation differences. For all elements in the $M_{m,d}$ matrix, the element in row i and column j can be replaced by

$$x^{std}_{i,j} = \frac{x_{i,j} - \mu_{x_j}}{\sigma_{x_j}} \tag{2}$$

where $\mu_{x_j}$ and $\sigma_{x_j}$ are the column mean and the column standard deviation of the $M_{m,d}$ matrix.
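The preprocessing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the window length, the trailing (causal) form of the moving average, and the use of the population standard deviation in Eq. (2) are not specified in the text, and the function names `moving_average` and `standardize_columns` are hypothetical.

```python
from statistics import mean, pstdev


def moving_average(signal, window=5):
    """Trailing moving average; the output has the same length as the input.

    The window length is an assumption, the text does not state it.
    """
    out = []
    running = 0.0
    for i, x in enumerate(signal):
        running += x
        if i >= window:
            running -= signal[i - window]      # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out


def standardize_columns(M):
    """Apply Eq. (2) column by column: (x - column mean) / column std."""
    cols = list(zip(*M))
    mus = [mean(c) for c in cols]
    # Population std is assumed; constant columns are left unscaled.
    sigmas = [pstdev(c) or 1.0 for c in cols]
    return [[(x - mu) / s for x, mu, s in zip(row, mus, sigmas)] for row in M]
```

For example, `moving_average([1, 2, 3, 4, 5], window=2)` returns `[1.0, 1.5, 2.5, 3.5, 4.5]`, and `standardize_columns([[1, 10], [3, 30]])` returns `[[-1.0, -1.0], [1.0, 1.0]]`, so two indexes that differ by a factor of ten end up on the same scale.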

[Figure 7 flowchart: raw data of the 5 working conditions are split into training and testing raw data. Features computation and reduction: moving average, 30 measurement indexes (feature extraction), linear discriminant analysis (LDA). Clustering before model training: DBSCAN (excluding outliers for the later Kmean++ search for centroids) and Kmean++ (finding proper initial mean and standard deviation for the EM computation). Gaussian classifier parameters computation: expectation maximization (EM). Gaussian mixture model (GMM) probability classifier, giving the classification results and classification accuracy.]

Fig. 7. Algorithm for data preprocessing, automatic clustering and GMM classifier constructing processes.

[Figure 8 plots, for the IEPE-AC and IEPE-DC 30 indexes: eigenvalue versus order of LDA factor, and accumulative eigenvalue versus summation order of LDA. Annotations: 2 features represent > 99% (AC); 3 features represent > 99% (DC).]

Fig. 8. Finding the accumulative eigenvalues where the dimension can represent more than 99% of the significance.

2) Linear discriminant analysis (LDA)
1. Input: There were two inputs. $M_{m,i} \in \mathbb{R}^{m \times i}$ is an input matrix consisting of i measurement indexes for each of the m measurements. $L \in \mathbb{R}^{m \times 1}$ is a data class label column vector.
2. Output: Three outputs could be generated: $Y \in \mathbb{R}^{m \times d}$, a projection data matrix; $V \in \mathbb{R}^{d \times d}$, a projection vector; and $\Lambda \in \mathbb{R}^{d \times 1}$, a column vector with each lambda value corresponding to a feature dimension.

Before using the data to train the Gaussian classifier, maximizing the distance between classes and minimizing the distance within each class was able to increase the accuracy of the GMM classifier. Thus, by using LDA, a projection space created by the weighted combination of the 30 measurement indexes might be found. The new features (projected points) in this space might have a larger distance between the point clouds of each class, for better differentiation. For the case of k classes, the matrix of relative distances between classes and the matrix of relative distances within each class were calculated. There were two objectives: one was to maximize the numerator and the other was to minimize the denominator. The objective function can be written as follows:

$$v = \arg\max_{v} \frac{v^{T} S_b v}{v^{T} S_w v} = \arg\max_{v^{T} S_w v = 1} v^{T} S_b v \tag{3}$$

where $S_b \in \mathbb{R}^{k \times k}$ is a matrix of relative distances between each class and $S_w \in \mathbb{R}^{k \times k}$ is a matrix of relative distances within each class. By adding a constraint, the denominator would be equal to one, and the objective function could be written with a Lagrange multiplier ($L$) as follows:

$$v = \arg\max_{v} f(v, L) = v^{T} S_b v - L \left( v^{T} S_w v - 1 \right) \tag{4}$$
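The step from the Lagrangian form (4) to an eigenvalue problem is not spelled out in the text. A brief derivation is sketched below, assuming that $S_b$ and $S_w$ are symmetric and that $S_w$ is invertible; $L$ denotes the Lagrange multiplier of (4), not the label vector.

```latex
\begin{aligned}
\frac{\partial f}{\partial v} &= 2 S_b v - 2 L\, S_w v = 0
  \;\Longrightarrow\; S_b v = L\, S_w v
  \;\Longrightarrow\; S_w^{-1} S_b\, v = L\, v, \\
v^{T} S_b v &= L\, v^{T} S_w v = L
  \qquad \text{(using the constraint } v^{T} S_w v = 1\text{)} .
\end{aligned}
```

Under these assumptions every stationary $v$ is an eigenvector of $S_w^{-1} S_b$, and its multiplier equals the value of the objective in (3). The directions with the largest eigenvalues therefore separate the classes best, which is presumably why the lambda values are used to rank the projected dimensions.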

To evaluate and select a proper dimension with enough representation significance, the accumulative eigenvalues, that is, a summation of $\Lambda$, might be evaluated. The evaluation standard used in this research was to reach more than 99% of the representation. Fig. 8 shows the accumulative eigenvalues. To satisfy the 99% representation standard, a dimension of 2 for AC and 3 for DC was enough.

3) Density-based spatial clustering of applications with noise (DBSCAN)
The LDA output dataset (Y) must be processed by DBSCAN before it is fed into the K-mean++ step. This excludes the outliers, which avoids excess groups being identified among the outlier regions and yields a data matrix without the outliers.
1. Input: $Y \in \mathbb{R}^{m \times d}$ is a data matrix generated by LDA.
2. Output: $Y_{DB} \in \mathbb{R}^{j \times d}$ is a matrix excluding the outliers.

To illustrate the necessity of running DBSCAN before K-mean++, an example using the Sensorless Drive Diagnosis Data Set, available from the University of California, Irvine Machine Learning Repository [14], is considered below. The original LDA data matrix ($Y$) and the output data matrix excluding the outliers ($Y_{DB}$) are illustrated in Fig. 9. The black points show the original LDA data matrix, and the green-blue points show the output data matrix excluding the outliers. Classes 4, 7, 9, and 11 were selected to demonstrate different data point distributions. With the aid of DBSCAN, the cleaned matrix only includes the points that were more concentrated around the group center. By excluding points at the boundaries, the selection of a centroid among the outliers in the next step of K-mean++ might be prevented.
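The effect described above, removing outliers so that they cannot be picked as initial centroids, can be sketched with a small self-contained example. This is an illustrative sketch rather than the authors' code: the neighbourhood radius `eps`, the density threshold `min_pts`, the brute-force neighbour search, and the function names are assumptions, and only the seeding part of K-mean++ is shown (the thresholds used to choose the number of clusters are not reproduced).

```python
import random
from math import dist


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point, -1 marks noise/outliers."""
    labels = [None] * len(points)

    def neighbours(i):
        # Brute-force search; the point itself counts as its own neighbour.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1               # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:       # j is a core point: keep expanding
                queue.extend(nb)
    return labels


def drop_outliers(points, eps, min_pts):
    """Return the points that DBSCAN did not label as noise."""
    labels = dbscan(points, eps, min_pts)
    return [p for p, lab in zip(points, labels) if lab != -1]


def kmeanspp_seeds(points, k, rng):
    """k-means++ seeding: each new seed is drawn with probability
    proportional to its squared distance from the nearest chosen seed."""
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        d2 = [min(dist(p, s) ** 2 for s in seeds) for p in points]
        seeds.append(rng.choices(points, weights=d2, k=1)[0])
    return seeds
```

Because distance-weighted seeding favours far-away points, an isolated outlier is a likely seed when it is left in the data; calling `kmeanspp_seeds(drop_outliers(points, eps, min_pts), k, random.Random(0))` restricts the candidates to the dense regions.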

[Figure 9 plots: four panels titled "Excluding outlier by DBSCAN", for classes c4, c7, c9 and c11.]

Fig. 9. LDA data matrix and the DBSCAN output data matrix comparison.

4) K-mean plus plus (K-mean++)
To achieve proper seeding, which is the first step of K-means and selects the initial centroids of the clusters for iteration, K-mean++ was used to improve the initialization of the values [12]. K-mean++ finds proper initial centroids in the data matrix excluding the outliers. Then, with both an absolute and a relative threshold on the total distance value as the criteria for reaching an ideal number of clusters, the centroids and the number of clusters can be determined. The resulting cluster centroids are marked as navy-blue dots in Fig. 10. As shown in the results, three centroids were selected in class-4, two centroids each in class-7 and class-9, and five centroids in class-11. These centroids were used as the initial positions in the next step of EM.

[Figure 10 plots: four panels titled "Kmean++ Centroids", for classes c4, c7, c9 and c11.]

Fig. 10. The resulting initial centroids selected by K-mean++ after outliers had been excluded by DBSCAN.

5) Gaussian mixture model (GMM)
The GMM utilizes a weighted combination of several Gaussian distribution functions to establish a classifier for a dataset in which there might be several groups of points in a single class. A single Gaussian probability density function (PDF) can be written as in Eq. (5):

$$g(x) = \frac{1}{\sqrt{(2\pi)^{D} \left| \Sigma \right|}} \exp\left( -\frac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu) \right) \tag{5}$$

where D is the dimension of the Gaussian, x is a column vector containing m measurement values, μ is a vector of means (when D = 1, it represents a mean value), and Σ is the covariance matrix (when D = 1, it is the same as the variance value). To establish a weighted combination of Gaussian PDFs, the formula can be written as follows

$$p(x) = \sum_{k=1}^{K} w_k \, g_k(x \mid \mu_k, \Sigma_k) \tag{6}$$

where K is the number of Gaussian PDF components and $w_k$ is the weighting of the k-th Gaussian PDF component. The details of finding proper weightings are introduced in the next part.

6) Expectation maximization (EM)
In this research, the EM algorithm was used to find the parameters that establish a representative GMM classifier. To find the proper mean and covariance matrix, the initial mean and covariance matrix from the previous K-mean++ step were used. By defining a latent variable as shown in (7), estimations for both the mean ($\hat{\mu}_k$) and the covariance matrix ($\hat{\Sigma}_k$) can be calculated as written in (8) and (9).

$$z_k^i = \frac{w_k \, g_k(x_i \mid \mu_k, \Sigma_k)}{\sum_{k=1}^{K} w_k \, g_k(x_i \mid \mu_k, \Sigma_k)} \tag{7}$$

$$\hat{\mu}_k = \frac{1}{\sum_{i=1}^{N} z_k^i} \sum_{i=1}^{N} z_k^i \, x_i \tag{8}$$

$$\hat{\Sigma}_k = \frac{1}{\sum_{i=1}^{N} z_k^i} \sum_{i=1}^{N} z_k^i \, (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^{T} \tag{9}$$
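The updates in (7) to (9) can be sketched for the one-dimensional case (D = 1), where the covariance matrix reduces to a variance. The sketch below is an assumption-laden illustration, not the authors' implementation: the weights are updated as the mean responsibility of each component, which is the usual EM choice, the variance floor is a numerical guard that does not come from the text, and the initial values passed in are hypothetical stand-ins for the K-mean++ output.

```python
from math import exp, pi, sqrt


def gauss_pdf(x, mu, var):
    """Single Gaussian PDF of Eq. (5) for D = 1."""
    return exp(-0.5 * (x - mu) ** 2 / var) / sqrt(2.0 * pi * var)


def em_gmm_1d(xs, mus, variances, weights, iters=50, var_floor=1e-9):
    """Fit a 1-D GMM by EM, starting from the given initial parameters."""
    mus, variances, weights = list(mus), list(variances), list(weights)
    n = len(xs)
    for _ in range(iters):
        # E-step, Eq. (7): responsibility z[i][k] of component k for sample i.
        # Assumes every sample has non-zero density under some component.
        z = []
        for x in xs:
            joint = [w * gauss_pdf(x, m, v)
                     for w, m, v in zip(weights, mus, variances)]
            total = sum(joint)
            z.append([j / total for j in joint])
        # M-step, Eqs. (8)-(9), plus the assumed mean-responsibility weights.
        for k in range(len(mus)):
            nk = sum(zi[k] for zi in z)
            mus[k] = sum(zi[k] * x for zi, x in zip(z, xs)) / nk
            var = sum(zi[k] * (x - mus[k]) ** 2 for zi, x in zip(z, xs)) / nk
            variances[k] = max(var, var_floor)
            weights[k] = nk / n
    return mus, variances, weights
```

Starting close to the final values matters here: with initial means taken near the true cluster centres, as the K-mean++ step is meant to provide, a call such as `em_gmm_1d([-1.1, -1.0, -0.9, 0.9, 1.0, 1.1], [-0.5, 0.5], [0.1, 0.1], [0.5, 0.5])` settles on means near -1 and 1 within a few iterations.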

In the following, the EM steps are summarized:

Step 1. Initialize μ and Σ.
Step 2. Fix μ and Σ, then update $z_k^i$ (E-step).
Step 3. Fix $z_k^i$ and update μ, then Σ (M-step).
Step 4. Repeat steps 2 and 3 until convergence is reached.

By the EM iteration, the estimations of the mean and the covariance matrix should approach the optimum values. By comparing the latent variables of the data points for each of the k different Gaussian PDF components, the weighting ($w_k$) can also be found as shown in (10).

$$w_k = \frac{1}{N} \sum_{i=1}^{N} z_k^i \tag{10}$$

After using the expectation-maximization algorithm to obtain the estimated $\hat{\mu}_k$ and $\hat{\Sigma}_k$, the proper parameters for the constructed Gaussian mixture model classifier can be obtained. If there were C different classes, the GMM probability $p(x)$ of each class can be found by

$$p(x) = \sum_{k=1}^{K} w_k \, g_k(x \mid \hat{\mu}_k, \hat{\Sigma}_k) \tag{11}$$

where

$$g_k(x) = \frac{1}{\sqrt{(2\pi)^{D} \left| \hat{\Sigma}_k \right|}} \exp\left( -\frac{1}{2} (x - \hat{\mu}_k)^{T} \hat{\Sigma}_k^{-1} (x - \hat{\mu}_k) \right) \tag{12}$$

7) Gaussian mixture model classifier
The most likely class of an unknown data point $x_{unknown}$ can be determined in the last step of the algorithm. After the GMM classifier was successfully constructed, with the parameters $\hat{\mu}_k$ and $\hat{\Sigma}_k$ obtained by the previous steps through DBSCAN, K-mean++, and EM, it was possible to find the maximum GMM probability of $x_{unknown}$ and thus the most likely class it belonged to. With the given class numbers $C = \{1, 2, 3, 4, 5\}$, this might be written in the format of (13):

$$c_{x_{unknown}} = \arg\max \left\{ p_1(x_{unknown}), p_2(x_{unknown}), \ldots, p_5(x_{unknown}) \right\} \tag{13}$$

By finding the maximum probability $p_v$ with v = 1 ~ 5, the unknown data point should belong to class v.

V. EXPERIMENT RESULTS

A. Classification Accuracy

The proposed method, combining DBSCAN, K-mean++ and the EM algorithm to find the estimated parameters for the GMM, increased the classifier accuracy to >99%. The classification results are summarized in Table 1. By DBSCAN, outliers can be excluded, which increases the robustness of finding a proper initial mean and covariance matrix by K-mean++ and avoids redundant classes caused by the outliers. Then, with the mean and covariance from K-mean++, the robustness of the EM algorithm also increased, since the initial parameters would be adjacent to the final optimum values of the GMM parameters.

Table 1. Classification accuracy of the proposed improved steps to find the Gaussian mixture model classifier parameters

Label | Category | Results
1 | Normal | >99%
2 | Reverse Connection | >99%
3 | Short Circuit | >99% (Found: 1 outlier)
4 | Disconnect | >99%
5 | Open Circuit | >99%
Overall Performance | | >99%

B. Industry Application Contribution

1) Outlier detections
In the last step, which finds the class an unknown dataset belongs to, an absolute threshold can be established to single out the points whose probability for every class is lower than the threshold value. The pseudo-code of the algorithm can be written as:

if $\{ p_1(x_{unknown}), p_2(x_{unknown}), \ldots, p_C(x_{unknown}) \} <$ threshold, then $x_{unknown} \notin C$

By labeling the outliers, these outlying data points might be further examined to see whether they exhibit a new health condition class. Also, by identifying outliers with absolute values, the long-term deterioration path between the classes can be determined (from normal to mild malfunction to severe malfunction). In this research, among the 60 unknown testing datasets, one was recognized as an outlier, as shown in Fig. 11.

2) Probability-based insights
The GMM classifier is based on a combination of Gaussian probability density functions. For an unknown dataset, the probability of it belonging to each of the classes can be calculated. With different measurement indexes as the features, the GMM classifiers might be compared across the selected dimensions, together with the aggregated probabilities given by the GMM PDF. The results of a 3D-feature Gaussian classifier can be visualized by an ellipsoid that shows the space within 3 standard deviations. The visualization of the 3D GMM classifier is shown in Fig. 12, with 3D features generated by LDA on AC signals and ellipsoids showing the space within 3 standard deviations of each class.

C. Automatic features selection and dimension reduction

1) Feature points management

With the 30 selected indexes as preliminary features, the LDA approach can be used to find the most significantly weighted feature projections onto the newly formed projection space. This reduces both the feature dimension and the number of parameters needed for the GMM. Before LDA, the original 30-dimensional features can be stored as a 3750 × 30 feature matrix across the five accelerometer health statuses. Since overlaps can be found in most feature dimensions, a new weighted projection feature may be an alternative.
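The projection step described above can be sketched with a plain Fisher LDA in NumPy. This is only an illustration under stated assumptions: the 3750 × 30 matrix is filled with synthetic placeholder data, an equal split of 750 samples per class is assumed, and the helper name `lda_projection` is hypothetical rather than taken from this work.

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Fisher LDA: return an (n_features, n_components) projection matrix."""
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))  # within-class scatter
    S_b = np.zeros((n_features, n_features))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_b += Xc.shape[0] * (diff @ diff.T)
    # Directions that maximize between-class relative to within-class scatter.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs[:, order].real

# Synthetic stand-in for the 3750 x 30 feature matrix (placeholder data).
rng = np.random.default_rng(0)
n_per_class, n_features, n_classes = 750, 30, 5
y = np.repeat(np.arange(1, n_classes + 1), n_per_class)
class_means = rng.normal(scale=2.0, size=(n_classes, n_features))
X = class_means[y - 1] + rng.normal(size=(y.size, n_features))

W = lda_projection(X, y, n_components=3)
Z = X @ W  # projected 3-D features
print(X.shape, "->", Z.shape)  # (3750, 30) -> (3750, 3)
```

With five classes the between-class scatter has rank at most four, so at most four discriminant directions carry information; keeping three of them matches the 3-D feature space used for the ellipsoid plots.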

Through the LDA process, the feature points are selected automatically; a 3-D graphical illustration of the results can be found in Fig. 12 and Fig. 13.

2) Classification model parameters reduction

With the proposed GMM parameter estimation steps, the parameters needed to construct the classifier were reduced, compared with the GHA method [2], from 400 parameters to 10 sets of GMM parameters, each comprising a mean, a covariance matrix, and a weighting. Therefore, the memory size requirement was reduced significantly, which makes the method suitable for embedded system realization.
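As a rough worked example of the storage involved, the sketch below counts the scalars held by a number of GMM parameter sets. It assumes that each set is one Gaussian component in the 3-D LDA feature space; the source does not state the scalar count per set, and the function name is hypothetical.

```python
def gmm_scalar_count(n_sets, dim, symmetric_cov=True):
    """Number of scalars stored for n_sets Gaussian components of dimension dim."""
    mean_terms = dim
    # A covariance matrix is symmetric, so only its upper triangle must be stored.
    cov_terms = dim * (dim + 1) // 2 if symmetric_cov else dim * dim
    weight_terms = 1
    return n_sets * (mean_terms + cov_terms + weight_terms)

print(gmm_scalar_count(10, 3))         # 100
print(gmm_scalar_count(10, 3, False))  # 130
```

Under these assumptions, each set holds 3 mean values, 6 unique covariance values and 1 weight.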

[Figure 11: two scatter-plot panels showing the mean and standard deviation distributions of the AC signals and of the DC signals. Axes: Mean (V) versus Standard Deviation (V). Legend: Normal, Reverse connection, Short circuit, Disconnected, Open circuit. One point is labeled "Outlier".]

Fig. 11. The outlier detected with the absolute density threshold, shown in the 2D feature space (mean and standard deviation) of the AC and DC response voltages.
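The absolute-threshold outlier test illustrated in Fig. 11 can be sketched as follows. The sketch assumes the per-class GMM probabilities of an unknown point have already been computed. The function name, the probability values and the threshold value are hypothetical, since the threshold used in this work is not stated here.

```python
def label_or_outlier(class_probs, threshold):
    """Return the most likely class, or None if every class probability is below threshold."""
    best_label = max(class_probs, key=class_probs.get)
    if class_probs[best_label] < threshold:
        # No class explains the point well enough: flag it for further inspection.
        return None
    return best_label

# Hypothetical per-class probabilities for two unknown points.
typical_point = {1: 0.20, 2: 0.01, 3: 0.002, 4: 0.0, 5: 0.0}
odd_point = {1: 1e-6, 2: 3e-7, 3: 1e-9, 4: 0.0, 5: 0.0}

print(label_or_outlier(typical_point, threshold=0.05))  # 1
print(label_or_outlier(odd_point, threshold=0.05))      # None
```

Checking only the largest probability is sufficient, because if the maximum is below the threshold then all class probabilities are.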

VI. CONCLUSION

This paper presented an IEPE accelerometer sensor diagnosis algorithm for smart factory applications. The proposed method was tested and achieved >99% accuracy in determining five health conditions of the accelerometers. The proposed dimension reduction reduces the number of parameters required to establish a long-term tracking model. The clustering technique improved the selection of the centroids and the determination of the number of clusters, which in turn improved the robustness of the EM algorithm in estimating the GMM parameters. The health diagnosis technique helps the facility manager identify the fault type of a malfunctioning accelerometer. Deploying the method can isolate sensors that potentially exhibit faults and prevent false alarms. Knowing the fault type of the IEPE accelerometer can reduce the inspection time and the human resources dispatched to find the root cause and the faulty online sensors. With these applications, unscheduled shutdowns for inspection can be avoided, and preventive maintenance can be deployed once an online sensor has been identified as faulty. Finally, the proposed method has been realized in a 24-hour product line.

Fig. 12. Selecting 3D features generated by LDA on DC signals with the ellipsoids showing spaces within 3 standard deviations of each class.

Fig. 13. Selecting 3D features generated by LDA on AC signals with the ellipsoids showing spaces within 3 standard deviations of each class.

ACKNOWLEDGMENT

The authors would like to thank Dr. Chung-Yung Wu for his helpful advice on various technical issues examined in this paper. This work was partially supported by the Ministry of Science and Technology under Grant No. MOST 107-2221-E-006-114-MY3 and MOST 108-2923-E-006-005-MY3. No conflict of interest exists in the submission of this manuscript, and the manuscript is approved by all authors for publication.

VII. REFERENCES

[1] Y. Chen, "Industrial information integration—A literature review 2006–2015," Journal of Industrial Information Integration, vol. 2, 2016, pp. 30-64.
[2] C.C. Peng, C.H. Kuo, C.Y. Wu, "Graphical Histogram for Integrated-Circuit-Piezoelectric-Type Accelerometer for Health Condition Diagnosis and Monitoring," Sensors and Materials, vol. 29, no. 11, 2017.
[3] A. Kucuker, M. Bayrak, "Detection of Mechanical Imbalances of Induction Motors with Instantaneous Power Signature Analysis," Journal of Electrical Engineering and Technology, vol. 8, issue 5, 2013, pp. 1116-1121, The Korean Institute of Electrical Engineers, doi: 10.5370/JEET.2013.8.5.1116
[4] S. Bindu and V. V. Thomas, "Diagnoses of internal faults of three phase squirrel cage induction motor — A review," in ICAECT, Manipal, 2014, pp. 48-54.
[5] A. Glowacz and Z. Glowacz, "Diagnosis of the three-phase induction motor using thermal imaging," Infrared Physics & Technology, vol. 81, Mar. 2017, pp. 7-16.
[6] H. Nakamura, Y. Yamamoto and Y. Mizuno, "Diagnosis of electrical and mechanical faults of induction motor," IEEE Conference on Electrical Insulation and Dielectric Phenomena, Kansas City, MO, 2006, pp. 521-524.
[7] T. C. A. Kumar, G. Singh and V. N. A. Naikan, "Effectiveness of vibration and current monitoring in detecting broken rotor bar and bearing faults in an induction motor," 2016 IEEE 6th International Conference on Power Systems (ICPS), New Delhi, 2016, pp. 1-5.
[8] M. Behzad, A.R. Bastami, "Effect of centrifugal force on natural frequency of lateral vibration of rotating shafts," J. of Sound and Vibration, vol. 274, issues 3–5, 22 July 2004, pp. 985-995.
[9] F. Immovilli and M. Cocconcelli, "Experimental Investigation of Shaft Radial Load Effect on Bearing Fault Signatures Detection," in IEEE Transactions on Industry Applications, vol. 53, no. 3, pp. 2721-2729, May-June 2017.
[10] Chao Jin, A. P., "A Vibration-Based Approach for Stator Winding Fault Diagnosis of Induction Motors: Application of Envelope Analysis," in PHM Conference, Fort Worth, TX, Oct. 2014.
[11] "Detection and classification of induction motor faults using Motor Current Signature Analysis and Multilayer Perceptron," 2014 IEEE 8th International Power Engineering and Optimization Conference (PEOCO2014), Langkawi, 2014, pp. 35-40. doi: 10.1109/PEOCO.2014.6814395
[12] C. Guan, K. K. F. Yuen and Q. Chen, "Towards a Hybrid Approach of K-Means and Density-Based Spatial Clustering of Applications with Noise for Image Segmentation," 2017 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), Exeter, 2017, pp. 396-399. doi: 10.1109/iThings-GreenCom-CPSCom-SmartData.2017.65
[13] V. E. Neagoe and V. Chirila-Berbentea, "Improved Gaussian mixture model with expectation-maximization for clustering of remote sensing imagery," 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, 2016, pp. 3063-3065.
[14] Martyna Bator, "Dataset for Sensorless Drive Diagnosis Data Set," University of California Irvine Machine Learning Repository (2015).