A data-driven prognostic approach based on statistical similarity: An application to industrial circuit breakers


Measurement xxx (2017) xxx–xxx

Contents lists available at ScienceDirect

Measurement journal homepage: www.elsevier.com/locate/measurement

Giacomo Leone a,⇑, Loredana Cristaldi a, Simone Turrin b

a Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB), Politecnico di Milano, Piazza Leonardo da Vinci, 32, 20133 Milano, Italy
b ABB AG, Corporate Research Center, Wallstadter Str. 59, 68526 Ladenburg, Germany

Article info

Article history: Received 25 October 2016; Received in revised form 18 January 2017; Accepted 9 February 2017; Available online xxxx

Keywords: Data-driven; Industrial circuit breakers; Prognostics; Remaining useful life; Sub-fleet; Statistical test

Abstract

In this paper, a data-driven prognostic algorithm for the estimation of the Remaining Useful Life (RUL) of a product is proposed. It is based on the acquisition and exploitation of run-to-failure data of homogeneous products, referred to in the following as a fleet of products. The algorithm is able to detect the set of products (sub-fleet of products) showing the highest degradation pattern similarity with the one under study and exploits the related monitoring data for a reliable prediction of the RUL. In particular, a novel methodology for the sub-fleet identification is presented and compared with other solutions found in the literature. The results obtained for a real application case, Medium and High Voltage Circuit Breakers, have shown a high prognostic power for the algorithm, which therefore represents a potential tool for an effective Predictive Maintenance (PdM) strategy. © 2017 Elsevier Ltd. All rights reserved.

1. Introduction

The Remaining Useful Life (RUL) of a system is defined as the useful life left at a particular time instant, that is, the remaining time interval in which it will be able to meet its operating requirements. RUL estimation represents the core of Prognostics and Health Management (PHM) programs, which aim at reducing maintenance and life-cycle management costs, increasing system availability and enabling Predictive Maintenance (PdM) strategies [1,2]. In the literature, the prognostic algorithms for RUL estimation are usually classified into three categories. The first class comprises the model-based approaches, which refer to physical models describing the behavior of the systems under study. Such models can be very accurate but often require a strong and detailed knowledge of the inherent physics-of-failure. It follows that they are often very specific to the case study and their implementation is not always possible. On the other hand, data-driven algorithms, the second main category found in the literature, are mainly based on the exploitation of the collected run-to-failure data and usually do not require particular knowledge about the inherent failure mechanisms. They provide a good trade-off between model complexity and accuracy of the results.

⇑ Corresponding author. E-mail addresses: [email protected] (G. Leone), loredana.cristaldi@polimi.it (L. Cristaldi), [email protected] (S. Turrin).

Finally, hybrid approaches attempt to combine the advantages of prognostic models from the aforementioned categories for RUL prediction. In recent years, data-driven approaches have become widespread. One reason is their suitability for applications involving complex engineered systems, for which the definition of analytical models is a complex and resource-demanding task. Another factor is the increasing availability of cheap monitoring systems that allow the collection of condition monitoring data in substantial quantities. A complete and detailed review of these approaches is given in [3]. The authors of this paper have already presented two data-driven prognostic algorithms, one based on the statistical extraction and exploitation through Monte Carlo (MC) simulations of reliability and maintenance knowledge [4] and one based on a Machine Learning solution, in particular a Neural Network (NN) architecture [5]. A key and novel differentiator of the proposed approaches was the concept of fleet of products. A fleet of products is a set of products that are homogeneous with respect to the function for which they are intended, clustered together according to different possible criteria, such as belonging to the same customer or being installed in the same region or industrial application. The advantage of this practice is the possibility to extract fleet-specific usage and degradation profiles that can be exploited for the RUL prediction of a specific element (i.e. a specific product) of the selected fleet. In particular, the contribution of the acquisition of knowledge at fleet level on the improvement of the prognostic ability is quite

http://dx.doi.org/10.1016/j.measurement.2017.02.017 0263-2241/Ó 2017 Elsevier Ltd. All rights reserved.

Please cite this article in press as: G. Leone et al., A data-driven prognostic approach based on statistical similarity: An application to industrial circuit breakers, Measurement (2017), http://dx.doi.org/10.1016/j.measurement.2017.02.017


G. Leone et al. / Measurement xxx (2017) xxx–xxx

relevant when the estimation of the product RUL at its early stage is of interest. Predicting the future behavior, in fact, is tied to the ability to learn from the past [1], and this ability is quite limited if the product is not in the mature stage of its life. The application cases considered in [4,5], as well as in this paper, are Medium Voltage (MV) and High Voltage (HV) Circuit Breakers (CBs). MV and HV CBs are crucial elements in power transmission and distribution systems. They are usually characterized by a long useful life and high reliability but, at the same time, even a single failure may cause severe damage from a technical, economic and safety point of view. It follows that reliable prognostic models are necessary for this kind of engineered system. Defining physical models for such devices, however, is a time-consuming and laborious task, since several different failure mechanisms depending on many parameters (number of completed opening operations, contact degradation, short circuit current) may occur and also interfere with each other. All these aspects motivate the choice of applying data-driven algorithms for the estimation of their RUL. As already said, the approaches proposed in [4,5] are based on the concept of fleet of products. The application case considered, however, represents one of the many classes of products for which collecting a relevant number of run-to-failure curves is difficult. A striking example is that of Vacuum Circuit Breakers (VCBs). They are based on a relatively recent technology and producers claim for them a Mean Time To Failure (MTTF) of over 30 years. It follows that the acquisition of Condition Monitoring (CM) data spanning the whole lifetime of many VCBs that can be considered to work in similar conditions and industrial applications is often not possible.

These considerations have driven the authors to propose in [6] a new approach for the selection of a sub-fleet of products showing the highest degradation pattern similarity with the product under study, for which the RUL estimation is required. Then, the related monitoring data are exploited for a reliable prediction of the RUL, offering a potential tool for an effective Predictive Maintenance strategy. The main resulting advantage is the weaker constraint on the definition of the product fleet (the reference library can now consist of run-to-failure data of products belonging to different customers, geographical regions and industrial applications), as the proposed methodology automatically selects the products with the most similar degradation profiles, discarding the ones that would negatively affect the RUL estimation. In this paper, an alternative strategy for the selection of the products sub-fleet is proposed. In particular, it is based on the concept of degradation rate, which is explained later, and on the application of a suitable statistical test for discarding products showing a degradation profile that is not statistically homogeneous with the one of the product under analysis. This article is structured as follows: in Section 2 the proposed methodology for the selection of an appropriate sub-fleet is described and compared with the one presented by the authors in [6] and with other works found in the literature. In Section 3, the prognostic stage exploiting the CM data of the sub-fleet products for the RUL prediction is illustrated. Finally, in Section 4, the results obtained for the application case are reported and compared with the ones obtained with the distance-based sub-fleet identification of [6]. The paper ends with the conclusions.

2. Sub-fleet selection

The definition of a sub-fleet of products consists in selecting, from a given set of products, the subset showing the highest similarity, in terms of observed degradation (OD) in time, with respect to the item for which the estimation of the RUL is required.

In the literature, some methods for sub-fleet definition already exist. In particular, they follow a similarity-based approach, which evaluates the similarity between the test trajectory pattern (the monitored degradation pattern of the item for which the RUL has to be predicted) and the reference trajectory patterns stored in the database, and uses the RULs of the latter to estimate the RUL of the former, accounting for how similar they are [7]. In [8], the authors propose the definition of a similarity coefficient based on the sum of the squared errors between the monitored test pattern and the reference trajectories. In particular, given a test product x and a reference item j, the similarity coefficient $sc_{xj}$ is calculated as:

$$ sc_{xj} = \sum_{i=1}^{I} \left( OD_{ji} - OD_{xi} \right)^2 \tag{1} $$

where $OD_{xi}$ ($OD_{ji}$) is the observed degradation for the test product x (library specimen j) at cycle i and I is the number of observed cycles. In the estimation of the RUL, in order to give more weight to the library specimens that are more similar to the test product (i.e. those with a smaller $sc_{xj}$), a weighting coefficient for a product j is defined as:

$$ w_{xj} = \exp\left( -\frac{sc_{xj}}{b} \right) \tag{2} $$

where b is selected according to the desired selectivity (i.e. if b is small, few specimens are influential). Finally, the RUL for the product x is computed according to (3):

$$ RUL_x = \frac{\sum_{j=1}^{J} w_{xj} \, RUL_j}{\sum_{j=1}^{J} w_{xj}} \tag{3} $$
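As an illustration only, Eqs. (1)-(3) can be sketched in a few lines of Python. The function names and the layout of `library` (a list of degradation-sequence/RUL pairs) are assumptions made for this example and do not come from [8].

```python
import math

def similarity_coefficient(od_test, od_ref):
    """Eq. (1): sum of squared errors over the I cycles observed for the test product."""
    # zip() stops at the shorter sequence, i.e. at the I observed test cycles.
    return sum((ref - test) ** 2 for test, ref in zip(od_test, od_ref))

def weighted_rul(od_test, library, b):
    """Eqs. (2)-(3): weighted mean of the library RULs.

    `library` is a list of (od_ref, rul_ref) pairs (hypothetical layout).
    """
    weights = [math.exp(-similarity_coefficient(od_test, od_ref) / b)
               for od_ref, _ in library]
    weighted_sum = sum(w * rul for w, (_, rul) in zip(weights, library))
    return weighted_sum / sum(weights)
```

With a small b the weights of dissimilar specimens decay quickly, so the estimate is dominated by the closest trajectories. If b is so small that every weight underflows to zero, the division fails, so a practical implementation would need a guard for that case.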

This approach has also been used in [9], whereas in [10] a slight modification is proposed. The modification concerns the RUL estimation (i.e. Eq. (3)), in which only the most similar P percent of the library samples are used rather than the whole dataset. A different approach is suggested in [11]. The authors, in fact, propose the definition of a deterministic model $M_i$ for each i-th training unit of the library, so that, for a given time t, an estimated value y of the Health Indicator (HI) variable that describes the degradation pattern of the item is provided. At this point, if a sequence $Y = y_1, y_2, \ldots, y_r$ of values of the HI for a test unit is available, a distance metric between the model $M_i$ and Y is defined as the sum of the squared errors between the monitored test pattern and the estimations provided by the model, divided by the prediction variance of the model itself. Then the RUL estimation for the test product is equal to a weighted sum of the RULs of the reference products. The weights can be assigned according to different principles. One of them is to apply the k-nearest neighbor method, that is, to select the products with the k smallest distance values and apply a weight 1/k to their RULs. In this paper, an alternative methodology for the sub-fleet identification is proposed. In particular, it is based on the application of a statistical test for the identification of those products presenting a statistical distribution of the degradation rate similar to that of the target product. This approach addresses the identification of homogeneous products from a different point of view with respect to the methodology presented in [6]. The rest of the Section is structured as follows: first, a short overview (Section 2.1) of the condition monitoring data considered for the study is provided. Then, in order to highlight the novelty of the proposal, in Section 2.2 the sub-fleet identification presented in [6] is briefly recalled and discussed. Finally, Section 2.3 presents the new identification approach.
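The k-nearest neighbor weighting recalled above for [11] can be sketched as follows. This is a minimal illustration that assumes the distances between the test sequence and each reference model have already been computed; the function name `knn_rul` is hypothetical.

```python
def knn_rul(distances, ruls, k):
    """Average the RULs of the k reference units with the smallest distance.

    Each selected unit receives the weight 1/k; all other units get weight 0.
    """
    nearest = sorted(zip(distances, ruls))[:k]
    return sum(rul for _, rul in nearest) / len(nearest)
```

For example, `knn_rul([3.0, 1.0, 2.0], [30.0, 10.0, 20.0], k=2)` averages the RULs of the two closest units and returns 15.0.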


2.1. Condition monitoring data

Condition monitoring systems are usually installed with the aim of identifying faults earlier through the collected data and of providing direct or indirect information on the health condition (HC) of the systems in which they are employed. Examples of CM data for MV and HV circuit breakers are the measurement of the contact ablation, the SF6 gas density for gas insulated circuit breakers, and the temperature of the interrupting chamber. Based on the obtained CM data and their direct or indirect relation to the HC, it is possible to sketch the health condition profile of a product over time, from the installation time instant to its end of life (EoL). EoL is defined as the time instant at which the product is no longer able to perform its intended function, so that a maintenance activity, refurbishment, replacement or disposal of the product is required. In particular, HC = 100% refers to a product in a perfectly healthy state, whereas HC = 0% means that it has reached its EoL. In Fig. 1, the HC versus time (t) profile for a fleet of homogeneous products is presented.

2.2. Sub-fleet selection through distance computation

Let us suppose that, for the test product x for which the RUL estimation is required, a partial monitoring of its HC profile is available, consisting of n consecutive observations. The knowledge about the degradation profile of such a product can be described through a time series $K_x$ composed of n pairs of values as follows:

$$ K_x = \{(t_1, HC_1), (t_2, HC_2), \ldots, (t_n, HC_n)\} \tag{4} $$

In particular, the generic value $t_i$ represents the life stage (time instant or number of cycles) of the product x at which the i-th observation of its HC has been carried out, $HC_i$ being the related value. At this point, the HC profile of the generic j-th fleet product can be compared with the test profile by determining the time stamps at which it reaches the HC values available for the test profile, namely $HC_1, HC_2, \ldots, HC_n$. In other words, for each j-th product it is possible to determine a time series $K_j$ defined as:

$$ K_j = \{(t_{1,j}, HC_1), (t_{2,j}, HC_2), \ldots, (t_{n,j}, HC_n)\} \tag{5} $$

The more similar the HC profiles of the reference product j and the test product x are, the smaller the difference between corresponding time instants, such as $t_1$ and $t_{1,j}$, $t_2$ and $t_{2,j}$ and so on. Starting from this assumption, a distance value $d_{xj}$ that relates the two products can be computed as:

$$ d_{xj} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( t_i - t_{i,j} \right)^2} \tag{6} $$

that is, the Root Mean Square Error (RMSE) between the two HC profiles. A small $d_{xj}$ means that the two profiles are similar or, equivalently, that similar degradation processes characterize the two products, whereas large distance values are related to products that are subject to different degradation mechanisms and that should be excluded from the estimation of the RUL for the target product, since they are representative of different working conditions. At this point, if the reference fleet is composed of $N_f$ products, it is possible to define a variable $d_f$, namely the fleet distance, that describes the distribution of the distance values for the fleet. The experimental sample for such a variable is provided by the $N_f$ different values of the distance computed according to (6):

$$ d_f = \{d_{x1}, d_{x2}, \ldots, d_{xN_f}\} \tag{7} $$
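A minimal Python sketch of Eqs. (4)-(7) follows, together with the quantile-based threshold described in the next paragraph. Several details are assumptions made for this illustration and are not taken from [6]: profiles are lists of (t, HC) pairs with non-increasing HC, the time stamps $t_{i,j}$ are obtained by linear interpolation, the quantile uses a nearest-rank definition, and products at exactly the threshold are kept so that the sub-fleet is never empty.

```python
import math

def time_at_hc(profile, hc_level):
    """Linearly interpolate the time at which a profile reaches hc_level."""
    for (t0, h0), (t1, h1) in zip(profile, profile[1:]):
        if h1 <= hc_level <= h0:
            if h0 == h1:
                return t0
            return t0 + (h0 - hc_level) * (t1 - t0) / (h0 - h1)
    raise ValueError("hc_level outside the observed HC range")

def rmse_distance(test_profile, ref_profile):
    """Eq. (6): RMSE between the time stamps reached at equal HC levels."""
    squared = [(t - time_at_hc(ref_profile, hc)) ** 2 for t, hc in test_profile]
    return math.sqrt(sum(squared) / len(squared))

def distance_sub_fleet(test_profile, fleet, s):
    """Select the products whose distance does not exceed the s-quantile of d_f.

    `fleet` maps a product name to its (t, HC) profile (hypothetical layout).
    """
    d_f = {name: rmse_distance(test_profile, prof) for name, prof in fleet.items()}
    ranked = sorted(d_f.values())
    # Nearest-rank empirical quantile (an assumption of this sketch).
    d_s = ranked[max(0, math.ceil(s * len(ranked)) - 1)]
    return [name for name, d in d_f.items() if d <= d_s]
```

Because the threshold is a quantile of the fleet distances themselves, this selection returns at least one product whatever the test profile looks like.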

In order to isolate the sub-fleet of products showing the highest functioning similarity with the one under study and exploit the related monitoring data for a reliable prediction of the RUL, a threshold $d_s$, corresponding to the s-quantile of the variable $d_f$, can be defined. The parameter s is essentially the desired degree of selectivity (i.e. the smaller the parameter s, the stricter the selection criterion). Finally, the products characterized by a distance value smaller than $d_s$ compose the desired sub-fleet. The sub-fleet identification based on the computation of distances between the fleet and target product HC curves identifies the products at minimum distance. In this way, however, it always defines a sub-fleet, even when the target product behavior is not comparable with that of any fleet product, so that the RUL estimation may be affected by low accuracy. This limitation is shown in Fig. 9.

2.3. Sub-fleet selection through statistical test about degradation rate

In this Sub-section, a different strategy for the selection of the most appropriate subset of products with a degradation profile similar to that of the target product is presented. The idea is to extract from the HC profiles of the target and fleet products richer information than the mere knowledge of the HC values at different time stamps. The methodology proposed here is to consider, for the different products, the statistical distribution of the degradation rate, which represents the speed at which the product degrades with time. In order to define the degradation rate mathematically, it is convenient to first introduce the concepts of sampling time and health condition variation. Given the HC profile of a product, mathematically representable with a time series as in (4), the sampling time is simply obtained as the difference between two consecutive observation time instants, as reported in:

$$ \Delta t = \{t_2 - t_1, \ldots, t_n - t_{n-1}\} \tag{8} $$

Similarly, the vector of health condition variation is expressed as:

$$ \Delta HC = \{HC_2 - HC_1, \ldots, HC_n - HC_{n-1}\} \tag{9} $$

Fig. 1. Health condition profile for a fleet of products (health condition, HC [%], versus time, t [Time Unit]).

Fig. 2 graphically depicts the computation of the sampling time and health condition variation for a generic product for which the HC-time curve is partially known. Considering MV and HV CBs, the variation of the sampling time provides information on the usage profile of the breaker, which is equivalent to the frequency of the opening operations. On the other hand, the variation of the health condition represents the degradation profile of the product. From (8) and (9) the degradation rate $dr_i$ can be calculated as follows:


$$ dr_i = \frac{\Delta HC_i}{\Delta t_i} \tag{10} $$

for i = 1, 2, ..., n − 1, so that the degradation rate vector is obtained:

$$ dr = \{dr_1, dr_2, \ldots, dr_{n-1}\} \tag{11} $$

Fig. 2. Sampling time and HC variation for a generic product with HC profile partially known (HC [%] versus time, t). The End of Observation Window (EOOW) time instant is denoted as $t_{EOOW}$.

This vector represents an experimental sample of the statistical distribution of the degradation rate affecting the considered product. At this point, exploiting relations (8)–(11), it is possible to obtain the degradation rate vector $dr_x$ related to the target product. In the same way, the vector $dr_j$ associated with the j-th product of the reference fleet can be defined. In doing this, however, in order to compare products at equal degradation levels, for the j-th product only the measurements obtained from HC = 100% to HC = $HC_n$ are considered (from (4), $HC_n$ is the last HC observation for the target product). Finally, the application of a suitable statistical test makes it possible to assess the probability that the two degradation rate profiles belong to the same statistical distribution, that is, that they show similar degradation processes. In this paper, the authors have considered as the most appropriate statistical test for this kind of analysis the two-sample Kolmogorov–Smirnov Test (KST) [12], a nonparametric hypothesis test that evaluates the difference between the Cumulative Distribution Functions (CDFs) of the two sample data vectors in order to return a test decision about the null hypothesis that the two samples are drawn from the same continuous statistical distribution. The result of the test depends strongly on the value selected for the significance level α of the test [12], corresponding to the probability of rejecting the null hypothesis given that it is true. The higher α, indeed, the higher the degree of selectivity imposed on the test. The sub-fleet of interest is obtained by running the test with the vector $dr_x$ related to the target product as first sample and setting cyclically as second sample the vector $dr_j$ of the j-th product of the fleet, with j varying from 1 to $N_f$. All the fleet products for which the test returns that there is no statistical evidence for rejecting the null hypothesis constitute the desired sub-fleet.

The main contribution of our proposal is that it overcomes the limitations discussed in Section 2.2. In fact, differently from distance-based algorithms, when the target product behavior is not comparable with the profile of any fleet product, the KST rejects the null hypothesis of statistical similarity of the degradation rate for all the fleet items, so that no sub-fleet is defined. In this scenario, the use of CM data related exclusively to the target product could be a better option for the RUL estimation. Furthermore, applying the KST to two samples of degradation rate, both related to the target product but computed at different time stamps, would make it possible to include in the model the knowledge of a possible change in the degradation process or the presence of a particular trend. In this sense, the application of the KST is useful also from a diagnostic point of view. In the next Section 3, the proposed prognostic algorithm based on MC simulations is presented. A more detailed description, however, is given in [4].
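The selection of Section 2.3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: profiles are lists of (t, HC) pairs with strictly increasing times, the fleet is a hypothetical name-to-profile dictionary, and the p-value is computed with the standard asymptotic Kolmogorov series plus a common small-sample correction. In practice a library routine such as `scipy.stats.ks_2samp` could replace the two hand-written KS helpers.

```python
import math

def degradation_rates(profile):
    """Eqs. (8)-(11): dr_i = dHC_i / dt_i for consecutive (t, HC) observations."""
    return [(h1 - h0) / (t1 - t0)
            for (t0, h0), (t1, h1) in zip(profile, profile[1:])]

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    d = 0.0
    for v in list(a) + list(b):
        fa = sum(x <= v for x in a) / len(a)
        fb = sum(x <= v for x in b) / len(b)
        d = max(d, abs(fa - fb))
    return d

def ks_pvalue(d, n, m):
    """Approximate p-value from the asymptotic Kolmogorov distribution."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here and the p-value is ~1
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                  for k in range(1, 101))
    return min(1.0, max(0.0, p))

def ks_sub_fleet(target_profile, fleet, alpha):
    """Keep the fleet products for which the null hypothesis is not rejected."""
    dr_x = degradation_rates(target_profile)
    hc_n = target_profile[-1][1]  # last HC observation of the target product
    selected = []
    for name, profile in fleet.items():
        # Compare at equal degradation levels: from HC = 100% down to HC_n.
        truncated = [(t, hc) for t, hc in profile if hc >= hc_n]
        dr_j = degradation_rates(truncated)
        if not dr_j:
            continue  # not enough observations to build a sample
        p = ks_pvalue(ks_statistic(dr_x, dr_j), len(dr_x), len(dr_j))
        if p >= alpha:
            selected.append(name)
    return selected
```

Unlike a quantile threshold on distances, this function returns an empty list when the null hypothesis is rejected for every fleet product, which is the behavior discussed above.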

3. Prognostic algorithm description

The main assumption underlying the algorithm is that the future usage of the test product (for which a partial observation of its HC profile is available) is likely to be similar to the usage profile of the sub-fleet of homogeneous/similar products. It follows that the first step for enhancing the precision in forecasting the target RUL is to extract knowledge from the condition monitoring data of such products. It has already been demonstrated in [4] that this approach overcomes the risk of carrying out a long-term prediction relying on the limited portion of CM data that would be available if only the past history of the test product were considered. Finally, this knowledge can be exploited in order to predict the future HC profile over time and to extract a confidence interval for the test product RUL.

3.1. Knowledge extraction at sub-fleet level

Let us suppose that, after the sub-fleet selection described in Section 2, the historical data analysis is limited to $N_{sf}$ products (the subscript "sf" stands for "sub-fleet"). One way to extract past usage information about such products is to take into account, for each of them, the distribution of the sampling time and the distribution of the health variation in the related CM data. Considering the generic j-th product of the sub-fleet, the related sampling time vector $\Delta t_j$ and health condition variation vector $\Delta HC_j$ are obtained through (8) and (9), respectively. The extraction of sampling time and health variation information at sub-fleet level is carried out systematically for each product of the selected subset, so that the following vectors are obtained:

$$ \Delta t_{sf} = \{\Delta t_1, \Delta t_2, \ldots, \Delta t_{N_{sf}}\} \tag{12} $$

$$ \Delta HC_{sf} = \{\Delta HC_1, \Delta HC_2, \ldots, \Delta HC_{N_{sf}}\} \tag{13} $$

as the concatenation of the vectors obtained at single-product level. Starting from the above vectors, the Empirical Cumulative Distribution Functions (ECDFs) [12] of the sampling time and of the health condition variation at sub-fleet level are obtained.

3.2. Knowledge exploitation

In order to exploit the extracted knowledge, an approach based on Monte Carlo simulations is adopted. The procedure is as follows:

1. Generation of two random numbers $r_1$ and $r_2$, drawn from a uniform distribution with values between 0 and 1.
2. The two extracted numbers are used to determine the subsequent point in the HC vs. time curve for the target product. In particular, the next sampling time Δt* and health condition variation ΔHC* are determined from the related CDFs by applying the Inverse Transform Method (Fig. 3). The generation of two uniformly distributed random numbers relies on the assumption that the sampling time and the variation of the health condition are not correlated. Steps 1 and 2, however, can easily be modified to take into account the case in which these two variables are correlated.
3. Steps 1 and 2 are iteratively repeated until the estimation of the HC reaches the value 0%. The corresponding estimated time provides an estimation of the EoL for the target product. The distance between the end of the observation window and the estimated EoL corresponds to an estimation of the test product RUL (Fig. 4).
4. Steps 1–3 are repeated for a statistically significant number of times $N_{MC}$ (Fig. 5).

Fig. 3. Forecasting of the next point of the health condition vs. time curve (Cumulative Distribution Function versus sampling time [Time Unit]).

Fig. 4. Forecasting the RUL (the gray line represents the forecasted trajectory of the future HC vs. time product curve; Health Condition, HC [%], versus Time, t [Time Unit]).

Fig. 5. $N_{MC}$ different forecasts for the product RUL (Health Condition, HC [%], versus Time, t [Time Unit]; legend: Observed Curve, Actual Future Trend, $N_{MC}$ forecasts).

At the end of step 4, $N_{MC}$ estimations of the product RUL are available, so that a confidence interval at a desired confidence level can be obtained, as shown in Fig. 6. A crucial difference between our proposal and the solutions presented in the literature is that, after the selection of the most representative sub-fleet for the target product, its RUL estimation is not just defined as a weighted sum of the RULs of such a subset of products; on the contrary, a prognostic model that exploits all the information contained in their HC profiles is involved. Taking into account the entire HC profiles instead of considering exclusively the RUL values, which represent only the last points of such curves, makes it possible to include more complete information in the prognostic model, so that, besides the RUL, other outcomes of interest, such as the Probability of Failure (PoF) within a predetermined window of time, can also be provided. Another benefit deriving from the application of this approach is that the sub-fleet is not strictly required to be composed solely of products with a known RUL (i.e. already failed); products characterized by a partial knowledge of the HC profile can also be included in the analysis, extending the potential set of products from which information suitable for prognostics can be extracted. This factor becomes more decisive the more difficult it is to acquire run-to-failure data in substantial quantities (e.g., this is the case for VCBs).

Fig. 6. Probability density function of the forecasted RUL for the target product (EoL Forecasts pdf versus End of Life, EoL [Time Unit]; legend: EoL Forecasts pdf, Actual EoL, Median EoL Forecasts, Quantile 0.025 EoL Forecasts, Quantile 0.975 EoL Forecasts).

4. Algorithm validation

The application case for the presented approach is that of MV and HV CBs. The data used in this contribution are confidential information (ABB property), hence the exact numerical values are not reported. A set of CM data related to a fleet of 90 products coming from different customers, operating regions and applications is considered. In order to evaluate the prognostic performance of the algorithm, the following procedure has been followed:

1. Set as reference fleet a number $N_f$ of products chosen randomly among the original fleet.
2. Select a test product among the reference fleet.
3. Delimit the observation window for the test product at a given observed degradation OD.
4. Set a value α and extract the most affine sub-fleet products among the fleet defined at step 2.


5. Run the prognostic algorithm described in Section 3 and obtain a confidence interval for the RUL.
6. In particular, if the lower and upper bounds of such confidence interval are denoted respectively by $RUL_{min}$ and $RUL_{max}$ and the actual RUL value by $RUL_{act}$, an indicator c for the correctness of the prediction can be obtained according to the following definition:





$$ c = \begin{cases} 1 & \text{if } RUL_{act} \in [RUL_{min}, RUL_{max}] \\ 0 & \text{otherwise} \end{cases} \tag{14} $$

In other words, c is equal to 1 when the RUL prediction is correct, 0 otherwise.
7. Repeat steps 3–5 setting cyclically as test product a different product of the reference fleet.
8. Finally, the algorithm average performance in predicting the RUL of a product for given values of observed degradation OD and significance level α, starting from a reference fleet of $N_f$ products, is obtained as:

$$ Perf(OD, \alpha, N_f) = 100 \cdot \frac{\sum_{i=1}^{N_f} c_i}{N_f} \tag{15} $$
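The Monte Carlo forecast of Section 3.2 and the performance indicator of Eqs. (14)-(15) can be sketched together as follows. This is a simplified illustration, not the implementation of [4]: the pooled vectors `dt_sf` and `dhc_sf` (Eqs. (12)-(13)) are assumed to be plain lists, the two variables are sampled independently, the inverse transform is applied to the step-wise ECDF, and the confidence bounds are simple order statistics of the simulated RULs. All function names are hypothetical.

```python
import random

def inverse_transform(sample, r):
    """Map a uniform number r in [0, 1) to a value through the ECDF of `sample`."""
    ordered = sorted(sample)
    return ordered[min(int(r * len(ordered)), len(ordered) - 1)]

def forecast_eol(t_last, hc_last, dt_sf, dhc_sf, rng):
    """Steps 1-3: extend the HC curve until it reaches 0% and return the EoL."""
    t, hc = t_last, hc_last
    while hc > 0.0:
        t += inverse_transform(dt_sf, rng.random())
        hc += inverse_transform(dhc_sf, rng.random())  # degradation steps are negative
    return t

def rul_interval(t_last, hc_last, dt_sf, dhc_sf, n_mc=1000, level=0.95, seed=0):
    """Step 4: repeat the forecast n_mc times and return the RUL bounds."""
    if min(dhc_sf) >= 0.0 or min(dt_sf) <= 0.0:
        raise ValueError("need positive sampling times and some negative HC steps")
    rng = random.Random(seed)
    ruls = sorted(forecast_eol(t_last, hc_last, dt_sf, dhc_sf, rng) - t_last
                  for _ in range(n_mc))
    lower = ruls[int((1.0 - level) / 2.0 * (n_mc - 1))]
    upper = ruls[int((1.0 + level) / 2.0 * (n_mc - 1))]
    return lower, upper

def correctness(rul_act, interval):
    """Eq. (14): 1 if the actual RUL falls inside the interval, 0 otherwise."""
    return 1 if interval[0] <= rul_act <= interval[1] else 0

def performance(indicators):
    """Eq. (15): percentage of correct predictions over the tested products."""
    return 100.0 * sum(indicators) / len(indicators)
```

In a validation loop, `rul_interval` would be called once per test product, using the sub-fleet selected for it, and the resulting indicators passed to `performance`.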

The procedure just described can then be repeated for any value of the parameters OD, α and Nf. Fig. 7 depicts the performances exhibited by the algorithm with a reference fleet of 69 products. In particular, the focus is on the variation of the results as a function of the observed degradation level of the test product (i.e., the percentage of the HC-time profile that has been observed) and of the selected value of α. The results refer to confidence intervals for the RULs estimated at the 95% level.

To clarify the performance evaluation procedure, consider the value of the performance when OD = 10% and α = 0.70. Fig. 7 shows that the prognostic performance of the algorithm with these parameters is equal to 82.61%. This means that, for the RUL prediction of a product for which only 10% of the degradation profile is known (performed starting from a reference fleet of 69 products and setting a significance level equal to 0.70), a success rate of about 83% has been observed.

When the observed degradation level is limited, the best results are associated with high values of the significance level (i.e., high selectivity of the test). When few data are available, only a small portion of the HC-time space is explored, so that it is not possible to obtain solid evidence of a difference between the statistical distributions of the degradation rate of the target and of the fleet products. In this case, low values of the significance level would be ineffective, and it is necessary to raise the value of α in order to be more selective. On the contrary, as the observed degradation level increases, the best results are obtained with lower values of α. The reason is that, as the amount of CM data available for the target product increases, the statistical distribution of the degradation rate becomes a reliable representation of the deterioration process affecting the product, and the differences with respect to the statistical distributions of the fleet products are more pronounced. It is therefore appropriate to decrease the significance level, as this lowers the probability of rejecting the null hypothesis that the two samples are drawn from the same distribution when this is actually the case.

The analysis of the performances exhibited by the algorithm in the different tested configurations (in terms of number of products composing the fleet and adopted significance level) has made it possible to define a practical and effective rule of thumb for choosing the parameter α as a function of the observed degradation level, independently of the number of products composing the reference fleet. The rule represents a satisfactory trade-off between ease of application and the resulting prognostic power of the algorithm, and is described graphically in Fig. 8.

Finally, Fig. 9 compares the performances obtained through the sub-fleet methodology proposed in this paper with the results presented in [6]. The black curve depicts the performances of the prognostic algorithm when a reference fleet of 81 products is considered and no sub-fleet discrimination procedure is applied. Better results are obtained when a suitable sub-fleet is identified. In particular, the dashed red curve refers to the performance obtained starting from the same reference fleet and defining a sub-fleet according to the distance-based procedure introduced in [6].
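The role of the significance level α in the statistical-test-based sub-fleet selection can be illustrated with a short sketch. It assumes that the degradation rate samples are the ratios ΔHC/Δt between consecutive measurements, and it replaces the exact test with a common asymptotic approximation of the two-sample Kolmogorov-Smirnov p-value (Kolmogorov series with a small-sample correction). The function names and the numerical data are hypothetical and serve only as an illustration, not as the authors' implementation.

```python
import math
from bisect import bisect_right
from typing import Dict, List, Sequence


def degradation_rates(hc: Sequence[float], t: Sequence[float]) -> List[float]:
    """Assumed definition: consecutive HC differences divided by time differences."""
    if len(hc) != len(t) or len(hc) < 2:
        raise ValueError("hc and t need the same length, with at least two points")
    return [(hc[i + 1] - hc[i]) / (t[i + 1] - t[i]) for i in range(len(hc) - 1)]


def ks_2samp_pvalue(x: Sequence[float], y: Sequence[float]) -> float:
    """Approximate (asymptotic) p-value of the two-sample Kolmogorov-Smirnov test."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # KS statistic: largest gap between the two empirical CDFs,
    # which can only change at the observed sample values.
    d = max(abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m) for v in xs + ys)
    root = math.sqrt(n * m / (n + m))
    lam = (root + 0.12 + 0.11 / root) * d
    if lam < 0.2:
        # The Kolmogorov survival function is numerically 1 in this range.
        return 1.0
    # Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * lam^2)
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * series))


def select_sub_fleet(
    target_dr: Sequence[float], fleet_dr: Dict[str, Sequence[float]], alpha: float
) -> List[str]:
    """Keep the fleet products for which the test does not reject H0 at level alpha."""
    return [name for name, dr in fleet_dr.items() if ks_2samp_pvalue(target_dr, dr) >= alpha]


# Hypothetical degradation rate samples for a target product and two fleet products.
target_dr = [0.10, 0.12, 0.11, 0.13, 0.09, 0.10, 0.12, 0.11]
fleet_dr = {
    "A": [0.11, 0.10, 0.12, 0.14, 0.10, 0.08, 0.11, 0.12],  # similar to the target
    "B": [0.50, 0.55, 0.48, 0.52, 0.51, 0.53, 0.49, 0.54],  # much faster degradation
}
print(select_sub_fleet(target_dr, fleet_dr, alpha=0.30))
```

In this sketch a product is discarded when its p-value falls below α, so raising α makes the selection stricter and tends to shrink the sub-fleet. An empty result corresponds to the case in which the null hypothesis is rejected for every fleet product.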
The newly proposed sub-fleet identification, based on the statistical test on the distribution of the products' degradation rate, however, substantially improves the prognostic power: performances higher than 90% are obtained already when only a 30% degradation level is observed (continuous red curve). The reported results have been obtained by setting the significance level according to the strategy depicted in Fig. 8. The degradation-rate-based sub-fleet identification provides higher performances even when a limited reference fleet is involved, as shown by the continuous blue line, which refers to the results obtained with the smallest starting fleet considered, composed of 12 products. Such performances are higher than those provided by the distance-based procedure, both when dealing with the same reference fleet (dashed blue line) and when 81 products are taken into account. In any case, a preliminary phase of sub-fleet discrimination, whatever the technique used, appears to be very effective and contributes to the improvement of the prognostic power of the algorithm. The performances obtained without defining a suitable sub-fleet, represented by the black curve, are indeed always lower than in the previous cases. The validity of both methodologies is further supported by the fact that both show performance levels of at least 90% when 70% of the degradation level has been monitored, regardless of the number of products composing the reference fleet.

Fig. 7. Algorithm performances for Nf = 69 and different values of α. (Plot of algorithm performances [%] versus observed degradation level [%], from 10% to 70%, with one curve for each significance level α from 0.05 to 0.70.)

Fig. 8. Proposed rule of thumb for an optimal choice of α. (Plot of the optimal significance level versus observed degradation level [%], from 10% to 70%.)

Fig. 9. Improvement of algorithm prognostic power through a proper sub-fleet selection. Red and blue dashed lines refer to the results obtained with the distance-based (DB) sub-fleet (SF) selection proposed in [6], whereas red and blue continuous lines refer to the performances obtained with the statistical test (ST) on the degradation rate distribution. The black line depicts the results exhibited by the prognostic algorithm when no SF selection is performed. (Plot titled "Comparison of Prognostic Performances": algorithm performances [%] versus observed degradation level [%], from 10% to 70%. Legend: No SF Selection (Nf = 81), DB SF Selection (Nf = 81), DB SF Selection (Nf = 12), ST SF Selection (Nf = 81), ST SF Selection (Nf = 12). For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Please cite this article in press as: G. Leone et al., A data-driven prognostic approach based on statistical similarity: An application to industrial circuit breakers, Measurement (2017), http://dx.doi.org/10.1016/j.measurement.2017.02.017

4.1. Considerations

The proposed algorithm has shown improved performances with respect to the results obtained for the same application case with the algorithm previously developed by the authors in [6]. The main novelty of our proposal is the ability to recognize situations in which the target product is characterized by a degradation rate quite different from that exhibited by the fleet products. In such a scenario, the two-sample KST at the basis of the selection procedure would reject the null hypothesis of statistical similarity of the degradation rate for all the fleet products. In that case, using exclusively the CM data of the target product could be a better option for the RUL estimation. Furthermore, the proposed methodology can also be applied for diagnostic purposes. In fact, applying the KST to two samples of degradation rate, both related to the target product but computed at different time stamps, enables the identification of a possible change in the degradation process or the presence of a particular trend.

5. Conclusions

In this paper, a data-driven prognostic algorithm for the estimation of the RUL of a product is proposed. It is based on the acquisition and exploitation of run-to-failure data of homogeneous products, referred to as a fleet of products. The core of the contribution is a method for the identification of a subset of products (sub-fleet) exhibiting the highest statistical similarity with the target product (for which the RUL estimation is required) with respect to


the degradation rate characterizing the deterioration process affecting it. The methodology has been compared with those found in the literature, including a previous work by the same authors, and the main differences have been highlighted. The results obtained for the application case of Medium Voltage and High Voltage Circuit Breakers have demonstrated that this approach contributes considerably to the improvement of the RUL predictions and to the increase of the prognostic algorithm performances. In particular, an advantage of our proposal is the possibility of exploiting, in the estimation of the future degradation of the test product, all the information stored in the Health Condition profiles of the sub-fleet products and not only the knowledge of their RUL, which corresponds to the last point of such curves. This enables the prognostic algorithm to provide, besides the RUL, other outcomes of interest such as the Probability of Failure (PoF) within a predetermined time window. A further advantage is the possibility of including in the analysis reference products that have not yet failed, which makes the approach very interesting for those classes of systems, such as Vacuum Circuit Breakers, for which the acquisition of run-to-failure data in substantial quantities is a hard task.

Appendix A

In this appendix, a list of the acronyms and notations introduced in the paper is presented.

CB      Circuit Breaker
CDF     Cumulative Distribution Function
CM      Condition Monitoring
dr      Degradation rate
ds      Distance threshold corresponding to the selected quantile threshold s
dxj     Root mean square distance along the time axis between HC profiles of two products x and j
EoL     End of Life
HC      Measured Health Condition
HV      High Voltage
K       Time series of the Health Condition values measured for a given product at each time stamp
KST     Kolmogorov-Smirnov Test
MC      Monte Carlo
MTTF    Mean Time To Failure
MV      Medium Voltage
n       Number of HC measurements available for a given product
Nf      Number of products composing the reference fleet
NMC     Number of performed Monte Carlo simulations
NN      Neural Network
Nsf     Number of products composing the sub-fleet
OD      Observed Degradation
PdM     Predictive Maintenance
Perf    Performance exhibited by the prognostic algorithm
RUL     Remaining Useful Life
s       Quantile set as threshold for the determination of the sub-fleet through the distance computation
t       Health Condition measurement time stamp
VCB     Vacuum Circuit Breaker
α       Significance level at which the KST is performed
c       Indicator of the correctness of the predicted RUL confidence interval
ΔHC     Difference between two consecutive HC values for a given product
Δt      Difference between two consecutive observation time stamps for a given product


References

[1] M.G. Pecht, Prognostics and Health Management of Electronics, John Wiley & Sons, 2008.
[2] D.C. Swanson, A general prognostic tracking algorithm for predictive maintenance, in: 2001 IEEE Aerospace Conference Proceedings, 2001, pp. 2971–2977.
[3] M. Schwabacher, A survey of data-driven prognostics, in: Proc. AIAA Infotech@Aerosp. Conf., Reston, VA, 2005.
[4] S. Turrin, S. Subbiah, G. Leone, L. Cristaldi, An algorithm for data-driven prognostics based on statistical analysis of condition monitoring data on a fleet level, in: 2015 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), 2015, pp. 629–634.
[5] L. Cristaldi, G. Leone, R. Ottoboni, S. Subbiah, S. Turrin, A comparative study on data-driven prognostic approaches using fleet knowledge, in: 2016 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), 2016, pp. 263–268.
[6] G. Leone, L. Cristaldi, S. Turrin, A data-driven prognostic approach based on sub-fleet knowledge extraction, in: 14th IMEKO TC10 Workshop on Technical Diagnostics: New Perspectives in Measurements, Tools and Techniques for Systems Reliability, Maintainability and Safety, 2016, pp. 417–422.
[7] L. Angstenberger, Dynamic Fuzzy Pattern Recognition with Applications to Finance and Engineering, Kluwer Academic, Boston, 2001.
[8] E. Zio, F. Di Maio, A data-driven fuzzy approach for predicting the remaining useful life in dynamic failure scenarios of a nuclear power plant, Reliab. Eng. Syst. Safety, RESS (2009), http://dx.doi.org/10.1016/j.ress.2009.08.001.
[9] B.K. Guépié, S. Lecoeuche, Similarity-based residual useful life prediction for partially unknown cycle varying degradation, in: 2015 IEEE Conference on Prognostics and Health Management (PHM), 2015, pp. 1–7.
[10] O.F. Eker, F. Camci, I.K. Jennions, A similarity-based prognostics approach for remaining useful life prediction, in: The 2nd European Conference of the Prognostics and Health Management (PHM) Society, Nantes, France, 8–10 July 2014, vol. 5, no. 11, 2014.
[11] T. Wang, J. Yu, D. Siegel, J. Lee, A similarity-based prognostics approach for remaining useful life estimation of engineered systems, in: 2008 International Conference on Prognostics and Health Management (PHM), 2008, pp. 1–6.
[12] S.M. Ross, Introduction to Probability and Statistics for Engineers and Scientists, 4th ed., Elsevier Academic Press, Amsterdam, 2009.
