Bayesian Hierarchical Models for aerospace gas turbine engine prognostics

Bayesian Hierarchical Models for aerospace gas turbine engine prognostics

Expert Systems with Applications 42 (2015) 539–553 Contents lists available at ScienceDirect Expert Systems with Applications journal homepage: www...

2MB Sizes 1 Downloads 28 Views

Expert Systems with Applications 42 (2015) 539–553

Contents lists available at ScienceDirect

Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa

Bayesian Hierarchical Models for aerospace gas turbine engine prognostics Martha A. Zaidan ⇑, Robert F. Harrison, Andrew R. Mills, Peter J. Fleming Automatic Control and Systems Engineering, The University of Sheffield, Mappin Street, S1 3JD, UK

a r t i c l e

i n f o

Article history: Available online 23 August 2014 Keywords: Condition-based maintenance Prognostics Gas turbine engine Bayesian Hierarchical Model

a b s t r a c t Improved prognostics is an emerging requirement for modern health monitoring that aims to increase the fidelity of failure-time predictions by the appropriate use of sensory and reliability information. In the aerospace industry, it is a key technology to maximise aircraft availability, offering a route to increase time in-service and to reduce operational disruption through improved asset management. An aircraft engine is a complex system comprising multiple subsystems that have dependent interactions so it is difficult to construct a model of its degradation dynamics based on physical principles. This complexity suggests that a statistically robust methodology for handling large quantities of real-time data would be more appropriate. In this work, therefore, a Bayesian approach is taken to exploit fleet-wide data from multiple assets to perform probabilistic estimation of remaining useful life for civil aerospace gas turbine engines. The paper establishes a Bayesian Hierarchical Model to perform inference and inform a probabilistic model of remaining useful life. Its performance is compared with that of an existing Bayesian nonHierarchical Model and is found to be superior in typical (heterogeneous) scenarios. The techniques use Bayesian methods to combine two sources of information: historical in-service data across the engine fleet and once per-flight transmitted performance measurement from the engine(s) under prognosis. The proposed technique provides predictive results within well defined uncertainty bounds and demonstrates several advantages of the hierarchical variant’s ability to integrate multiple unit data to address realistic prognostic challenges. This is illustrated by an example from a civil aerospace gas turbine fleet data. Ó 2014 Elsevier Ltd. All rights reserved.

1. Introduction To remain competitive, there is a growing need for gas turbine engines to minimise whole life costs. Prognostics holds great promise to contribute to this goal. Prognostics is an emerging requirement of modern health monitoring that aims to increase the fidelity of failure-time predictions, potentially reducing lifing conservatism and operational disruption. The premise of prognostics is to use sensor measurements to estimate the remaining useful life (RUL) of an asset. A considerable amount of research in prognostic methods has been conducted to improve the prediction of RUL for engineering assets. The research has been carried out in various aerospace domains including avionics (Celaya, Saha, Wysocki, & Goebel, 2008; Hecht, 2006; Johansson & Leisner, 2012; Pecht & Jaai, ⇑ Corresponding author. E-mail addresses: [email protected] (M.A. Zaidan), r.f.harrison@ sheffield.ac.uk (R.F. Harrison), a.r.mills@sheffield.ac.uk (A.R. Mills), p.fleming@ sheffield.ac.uk (P.J. Fleming). http://dx.doi.org/10.1016/j.eswa.2014.08.007 0957-4174/Ó 2014 Elsevier Ltd. All rights reserved.

2010; Xu & Xu, 2011), unmanned aerial vehicles (UAV) (Walker, 2010), defence, such as the Joint Strike Fighter programme (Brown, Moore, McCollom, & Hess, 2007; Gao, Li, Zhang, & Zhang, 2008) and civil aerospace (Lipowsky, Staudacher, Bauer, & Schmidt, 2010; Puggina & Venturini, 2012). In this work, the application is focused on prognostics for civil aerospace gas turbine engines. Gas turbine engines are highly valuable assets in aircraft, where large sums are spent in maintenance support and logistics. The application of prognostic technologies in civil aerospace has the potential to reduce costs to commercial aircraft operators. A gas turbine engine is a complex system comprising multiple subsystems that have dependent interactions. Therefore, it is difficult to construct a model-based approach that mimics the dynamics of the system’s degradation. This encourages the direct use of routinely measured in-flight data to achieve the prognostic goal. There are a large number of gas turbine engines used for civil aerospace. For instance, Rolls-Royce has more than 14,000 engines in service, operated by more than 500 airlines and powering more than 5.5 million commercial flights per year (Rolls-Royce, 2013). This fleet of engines is capable of generating a considerable volume

540

M.A. Zaidan et al. / Expert Systems with Applications 42 (2015) 539–553

Fig. 1. Illustrations of a hierarchy of the engine fleet across airlines, engine type and individual engine (left) and multiple degradation data available (right). A considerable volume of health signal data generated from the fleet of engines is advantageous for data-driven prognostic algorithm.

of health signal data. Fig. 1 illustrates a hierarchy of degradation signals across airlines, engine type and individual engine. The challenge is: how can a prognostic algorithm use, optimally, these data for estimating the RUL of a specific engine. In the statistical literature (Gelman & Hill, 2006), these health signals are known as repeated measurement or panel data. It is advantageous to maximise the value of these data by modelling them with a hierarchical structure that is capable of accommodating multiple degradation signals. In this way, the health information can be shared between the engines in order to enhance RUL estimation for any specific engine. In particular, for a number of the reasons given in Zaidan, Mills, and Harrison (2013), a Bayesian framework can be justified for the development of prognostic algorithms. This paper presents an advanced data-driven Bayesian Hierarchical Model (BHM) to enhance RUL estimation of civil aerospace gas turbine engines. The focus is to model the degradation signals obtained from M engines (unit level), for a specific type of engine (fleet level). The proposed technique is then compared with a Bayesian non-Hierarchical Model (BnHM) which does not share fleet-information as effectively. 2. Background Selecting an appropriate health index1 is crucial for the success of prognostic programme. The following subsections discuss how to determine the main health indicator in gas turbine engine and review briefly prognostic algorithms in Bayesian framework. 2.1. Gas turbine engine degradation Gas turbine engines show the effects of damage and deterioration in its lifetime of service. The degradation of an engine has an adverse effect on the engine’s overall performance (Khani et al., 2012). As the engines age, their performances degrade for a variety of reasons including (General-Electric, 2008; Malinge & Courtenay, 2007): (1) dust/dirt ingestion and further accumulation on fan blades/ compressor airfoils, (2) increased air seal, compressor and turbine blade-tip clearances because of rub, 1

Health index represents the overall health of a gas turbine engine to be forecast.

(3) other mechanisms such as erosion of airfoils and seals, hot section oxidation, (4) foreign object damage. As engine efficiency reduces, the fuel supplied will necessarily increase to generate the required thrust. This results in an increase in the temperature in the engine. Therefore, the global health of the engines can be derived from the core flow temperature, measured at the turbine exit. This is called Turbine Gas Temperature (TGT). An estimate of the difference between the certified TGT redline (operational limit on TGT) and a projection of TGT to full-rated take-off at reference conditions is named TGT margin (Malinge & Courtenay, 2007). The TGT margin is usually used to monitor the gas path degradation of the engine to detect the changes in performance for each engine and to indicate the need for inspection/ maintenance. The TGT margin can also be used to forecast the RUL of the engines. Fig. 2 shows two examples of take-off TGT margin degradation data as functions of number of flight (cycles) between two maintenance events. In all cases, measured engine data are normalised in both axes. There are several challenges in prognostics using TGT margin degradation data. First, TGT margin is difficult to use directly to evaluate prognostic algorithms. Such degradation signals are frequently right-censored, that is the engines are often subject to a maintenance intervention before crossing the defined threshold. As a degradation signal, TGT margin data is also very noisy making it difficult to understand the underlying degradation phenomenon. Consequently, a ‘‘ground truth’’ does not exist and there is no model based on physical principles that can describe accurately the behaviour of the degradation. A second complication arises in the large uncertainty associated with TGT margin data owing to engine design, manufacture, ambient, environmental and operating conditions, duty mission, maintenance actions, etc (Li & Nilkitsaranont, 2009). When the uncertainty in the data is very large (as shown in Fig. 2(b)), this may cause inconsistency in prognosis, i.e. an irrational prediction may arise, especially when there is little data available. The final issue is the fact that the pattern of degradation is frequently not linear and changes with time. In gas turbine engine degradation, it is common to observe almost linear patterns of degradation (Li & Nilkitsaranont, 2009; Puggina & Venturini, 2012), however there are many cases where the rate of degradation may be non-linear e.g. Li and Nilkitsaranont (2009) and

541

M.A. Zaidan et al. / Expert Systems with Applications 42 (2015) 539–553

Engine 9

Engine 8 1.2

TGT margin (normalised)

TGT margin (normalised)

1.2 1 0.8 0.6 0.4

action warning threshold

0.2

1 0.8 0.6 0.4

action warning threshold

0.2

failure warning threshold

0 0

10

20

30

40

failure warning threshold

0 50

60

70

80

Flight cycle (normalised)

0

10

20

30

40

50

60

70

80

Flight cycle (normalised)

Fig. 2. Examples of normalised TGT margin degradation. Engine 8 appears to have a change in the degradation slope whereas engine 9 has very high noise.

Saravanamuttoo, Rogers, Cohen, and Straznicky (2009), piecewise linear with a step increase in rate as shown in Fig. 2(a). In order to deal with the absence of ground truth, the proposed method for testing and comparing the prognostic algorithms begins by testing them on synthetic degradation data to gain confidence in their applicability and to explore under what conditions BHM offers an advantage over BnHM. Subsequently, the prognostic approaches are applied to real TGT margin data to demonstrate their applicability. Furthermore, to overcome the remaining challenges, a suitable class of Bayesian prognostic algorithm should be selected, which will be discussed in next subsection. 2.2. Bayesian prognostic algorithms Prognostic algorithms play a vital role in estimating the RUL of a system. Selecting an appropriate algorithm for a particular application is crucial to the ultimate success of a prognostic programme (Sikorska, Hodkiewicz, & Ma, 2011). Data-driven Bayesian Regression (BR) approaches are promising for the probabilistic estimation of RUL for civil aerospace gas turbine engines. Two main factors drive the focus on the Bayesian framework: (i) They provide probability distributions which have advantages over point value estimates, such as quantifying the risk of failure, handling uncertainty, etc (Bishop, 2006). For example, prognostics using a neural network2 (Gebraeel, Lawley, Liu, & Parmeshwaran, 2004) estimates RUL as a single time point while, by using a Bayesian method, RUL estimation takes the form of a failure-time distribution. As a result, any decision can be informed by the entire distribution; hence the risk of early failure can be assessed. (ii) They provide a natural framework for learning, integrating prior belief and permitting the update of probability estimates as data arrives (usually asynchronously) across the fleet and an ability to subsume a new unit into the fleet without making special provision. This is more difficult in conventional e.g. least-squares based methods (Berry, 1996). The incorporation of prior belief and means that even in the early life of a fleet of assets, when data are scarce, it is possible to generate sensible estimates of predictive distributions to estimate RUL. According to the model structure, Bayesian prognostic algorithms can be divided into two classes; Bayesian non-Hierarchical 2

Here, the neural network was not set in a probabilistic framework.

Model (BnHM) and Bayesian Hierarchical Model (BHM), which are considered as a single level model and a multi-level model, respectively. Existing, state-of-the-art prognostic algorithms are able to update degradation models with data collected from individual assets with in-service measurements (BnHM). However, the ability to accommodate a heterogeneous fleet into a single prognostic model, has not been explored to the best of our knowledge. Examples of prognostic algorithms based on BnHMs include bearing prognostics using relevance vector machine (Widodo & Yang, 2011). However, this approach uses run-to-failure data for training the algorithms, which is impractical for gas turbine engines owing to their relatively long lifetimes and the fact that they would never knowingly be allowed to fail. Gas turbine engine prognostics requires degradation data for RUL estimation. An example in this area is also based on a BnHM, where Bayesian forecasting and dynamic linear models were applied for capturing uncertainty in degradation data (Lipowsky et al., 2010). This method detects the change in the data (outlier detection) and performs a prognostics on measurement values. The drawbacks of this approach are that several crucial parameters need to be determined heuristically, leading to uninformative distributions and while detecting the change, ‘‘rule of thumb’’ is involved to determine gradient change in the prognostic trajectory. The use of ‘‘rule of thumb’’ leads to a non-generic approach, i.e. the method may only work for a specific problem. A potential prognostic method based on BnHM is the Bayesian parametric model for degradation developed by Gebraeel, Lawley, Li, and Ryan (2005) and Gebraeel (2006), which is able to update, on-line, the stochastic parameters of exponential degradation models. This method is promising because it utilises the available reliability data to form an informative prior, i.e. the initial knowledge about degradation. This results in an appropriate RUL estimation in early prediction interval, when few data are available. The method is also able to update failure-time predictions by using real-time condition monitoring information and reliability information. Another advantage of the previous method is that the posterior distribution can be obtained analytically through a suitable choice of (conjugate) priors. In this paper, that algorithm is called Bayesian Regression 1 (BR-1). However, due to inflexibility in specifying the noise variance, BR-1 cannot cope well with large variations (from unit to unit) in uncertainty in degradation data which present in TGT margin degradation. In previous work, an extension of BR-1 was developed, called Bayesian Regression 2 (BR-2) (Zaidan et al., 2013). This includes measurement noise in the Bayesian treatment, treating the noise

542

M.A. Zaidan et al. / Expert Systems with Applications 42 (2015) 539–553

variance for each unit as a random variable with its own prior distribution. This prevents the averaging that would occur between high and low noise measurements allowing for a more accurate unit by unit prognosis. It was shown to result in improvements in accuracy and consistency for gas turbine engine fleet data. The algorithm uses the Normal Inverse Gamma (NIG) conjugate prior to obtain a deterministic solution for the posterior distribution of the model parameters. In Zaidan et al. (2013), the advantages of BR-2 have been demonstrated for dealing with ‘‘standard’’ TGT margin degradation.3 However, in order to deal with more challenging problems, a more general prognostic algorithm is required. The BHM is therefore proposed as a strong candidate and we refer to this as Bayesian Regression 3 (BR-3) in the sequel. Hierarchical models (also called multilevel models) are statistical models that have their own hierarchy, i.e. the model coefficients vary at more than one level (Gelman & Hill, 2006). This capability allows Bayesian models to obtain information from both prior distributions; an individual prior, related to a particular unit and a common prior, related to the whole fleet of units. This type of model has been used in many fields to deal with panel data (econometrics), longitudinal data (biostatistics) and repeated measurement data (reliability).

2.2.1. Pros and cons of BHMs BHMs are attractive in fleet-wide prognostics for a number of reasons. First, a fully Bayesian approach to inference, disregarding the model structure has the advantage over conventional analysis of handling uncertainty in a natural way for finite (even small) sample sizes, i.e. does not rely on asymptotic arguments, through proper assignment of likelihood function and prior distributions. In addition, the interpretation of uncertainty is straightforward, e.g. a 95% ‘‘credible region’’ means that the probability of a variable lying inside it is 95%. Further, Bayesian inference provides a natural mechanism for updating the model as more data becomes available and this can be done asynchronously as data arrive for each unit e.g. after each flight. Finally, it provides a natural hedge against overfitting when fitting complex model structure, without having to take explicit action. For large samples, inference ultimately becomes independent of choice of prior so in regression typically converges to the classical result e.g. of least-squares/maximum likelihood whence the advantages of the Bayesian approach are reduced except, perhaps, when the model changes over time. In the current application the early-lifetime behaviour of individual units can be informed by data from older units and realistic assessment of uncertainty in predictions is provided. The choice of the hierarchical framework is appropriate for the following reasons. As illustrated in Fig. 1, the problem has a natural multilevel or ‘‘group’’ structure, individual engines within ‘‘builds’’ (variations on a nominal design) within airlines (having different operating policies/piloting styles). It is expected that data vary from engine-to-engine, from buildto-build and from airline-to-airline or even finer levels of gradation. Hierarchical modelling admits this structure in a natural way and is not beset with the problem of overfitting to a large number of parameters as might be the case in a classical analysis where multiple interaction terms would appear. All data is used to perform inference which can be beneficial for groups with small sample size. The alternative of fitting unit-by-unit models is unable to exploit information from similar items and will be unreliable until a sizeable sample of data has been acquired. Prognosis is essentially a problem of prediction and classical regression that includes group effects has no automatic way of making predictions for a new group 3 Here, we define ‘‘standard’’ degradation as degradation data with an approximately linear pattern which occurs in much measured TGT margin data.

(e.g. a new build entering service), instead relying on a two-stage regression (at individual and group level) which will be unreliable in early life when data are scarce. The hierarchical model obviates this problem (Gelman & Hill, 2006) and furthermore, permits the inclusion of different sets of covariates at each level which, again, is unnecessarily complicated in the classical framework. Finally, the hierarchical model accounts for any correlation between higher levels which is much more difficult to do in a conventional model. Unfortunately, while there are numerous advantages to the BHM, there are also disadvantages. First, for the solution of any specific problem, assignment of the probability model requires substantial technical facility in advanced probability and statistics, particularly in the specification of the priors. The main difficulty, however, lies in performing inference. Even in the case where conjugate priors are assigned, the multi-layer structure makes it impossible to find a closed-form solution for the posterior and predictive distributions, demanding a numerical solution. For non-trivial models (e.g. many units/groups) multi-dimensional numerical integration is unappealing owing to the curse of dimensionality and Monte Carlo simulation is preferred, usually via Markov Chain Monte Carlo (MCMC) algorithms such as Gibbs Sampling (Gelfand, Hills, Racine-Poon, & Smith, 1990) or the Metropolis–Hastings algorithm. Such methods can be computationally intensive, which is becoming less of an issue with advent of e.g. GPU arrays, but, for the practitioner, demands a fairly high degree of understanding of how the algorithms behave to generate a successful solution (Raftery & Lewis, 1992). This may militate against its adoption. For online use where the model is updated as data become available, sequential MCMC algorithms, e.g. particle filters, compound this difficulty. It is fair to say that, for prognostic problems displaying multilevel structure, use of the BHM will be unlikely to perform worse than a conventional analyses and has the potential to significantly outperform them if group/ individual differences are substantial. In biostatistical applications, hierarchical models arise frequently in situations where several measurements are made on a number of individuals (Davidian & Giltinan, 1995). In econometrics, the models are commonly used with panel data (Koop, Poirier, & Tobias, 2007) that include understanding the diversity of preferences and sensitivities that exists in the market (Rossi, Allenby, & McCulloch, 2005), purchasing (Guo, 2009) and in forecasting applications (Tobias, 2001). In survival analysis and reliability engineering, hierarchical models can be used to create a natural model for failure count, failure time data and degradation data from multiple-unit system (Hamada, Wilson, & Reese, 2008; Ibrahim, Chen, & Sinha, 2001). In addition, these models have been applied in political science (Gelman & Hill, 2006), spatial analysis (Banerjee, Carlin, & Gelfand, 2004) and many other statistical fields. Hierarchical models also have been adopted recently in the field of machine learning. Examples of applications include computer vision (Fei-Fei & Perona, 2005), multimedia web page classification (Girolami & Rogers, 2005) and information retrieval (Blei, Jordan, & Ng, 2003). However, the authors have not found any literature on the use of hierarchical models in prognostic applications. This paper presents a comparative study between the BR-2 and BR-3 approaches. The algorithms are implemented on synthetic degradation data to evaluate their prognostic performance and subsequently, are applied on actual gas turbine service data.

3. Prognostics using a Bayesian Hierarchical Model In this section, the prognostic methodology using BR-3 is described. The discussion also includes the details of prior specification and the solutions of posterior as well as predictive distributions based on a sampling method.

M.A. Zaidan et al. / Expert Systems with Applications 42 (2015) 539–553

The hierarchical model in the Bayesian framework was introduced in Lindley and Smith (1972). In complementary work, a solution for the posterior distribution for the hierarchical model using Markov Chain Monte Carlo (MCMC) was presented in Gelfand et al. (1990). Subsequently, this method has attracted many researchers to address growth curve analysis (Davidian & Giltinan, 1995; Menzefricke, 1999; Robinson & Crowder, 2000) which also covers the many areas mentioned above. In the hierarchical model context, degradation signals obtained from each unit, yj , are modelled at the first model level, by:

yj ¼ Uj wj þ ej

ð1Þ

where j indexes the jth unit, ej is a random error term that follows an iid Gaussian distribution, ej  N ð0; r2j IÞ, and r2j is the noise variance, assumed to follow an inverse-Gamma (IG) distribution, r2j  IGðaj ; bj Þ. Uj is an Nj  D design matrix of basis functions whose ith row refers to the evaluation of the D-dimensional basis function expansion of the ith input vector associated with the jth unit. For notational convenience we assume the same expansion for every unit but this is not necessary in general. The vector, wj , is a D-dimensional vector of weights for the jth unit. The second level of the model represents information common to the whole set of degradation signals. In order to express the commonality between units’ parameters (wj ), they are assumed to be drawn from a common Gaussian distribution. This can be written as:

 VÞ pðwj Þ ¼ N ðw;

ð2Þ

 is the mean and V is the covariance of wj . The parameters where w  and V 1 can be modelled as a Gaussian and a Wishart distribuw   N ðg; CÞ and V 1  Wð½qR1 ; qÞ, where g is tion, respectively; w  C is the covariance of w;  R is a scale matrix and q the mean of w, denotes degree-of-freedom. The Wishart distribution is a conjugate prior for precision matrix of a multivariate Gaussian distribution (Bishop, 2006) and a multidimensional generalisation of the Gamma distribution (Murphy, 2007). As for the BR-2 approach, this technique also combines two sources of information, all existing fleet data and real-time sensor information, to update, periodically, the RUL of individual units. There are five main components in the proposed model, illustrated in Fig. 3. Components 1–3 are used to process the health signal. The database contains historical degradation signals from many units which can be used to estimate prior parameters. The model is able to accommodate multiple degradation signals monitored asynchronously from different units. These signals are used to compute the likelihood function. Having both the likelihood and the prior distribution, the posterior distribution can be calculated by Bayes’s rule, that is: posterior / likelihood  prior. The predictive distribution is then calculated and extrapolated over a desired time horizon (component 4). In the final step (component 5), the forecasts is transformed into a failure time distribution to deliver an estimate of RUL. 3.1. Prior distribution BR-3 has three prior distributions with six parameters.

543

BR-3 has more prior parameters than BR-2 all of which must be chosen indicating a need for a guiding principle. Various methods have been suggested in the literature for setting diffuse priors, e.g. Bernardo and Smith (2009), Kass and Wasserman (1996), Spiegelhalter, Thomas, Best, and Gilks (1996) and Gelman, Carlin, Stern, and Rubin (2004). However, for our purpose, non-informative priors can sometimes have strong and undesirable implications (Gelman, 2009). To address this we propose a methodology to produce ‘‘moderately’’ informative prior distributions for BR-3, where some of the prior parameters can be estimated e.g. from existing in-service data. First we consider the IG distribution associated with the individual noise variances, pðr2j jaj ; bj Þ. In BHM, the choice of prior for these variances is important and may be difficult (Daniels, 1999). The most common non-informative choice for this distribution is the Jeffreys prior (Jeffrey, 1961). Jeffrey’s rule (Jeffrey, 1961; Hamada et al., 2008) defines a completely non-informative prior for scalar variances as:

pðr2 Þ / 1=r2

ð6Þ

However, the use of such an uninformative prior may lead to a serious problem, where the resulting inference may be sensitive (Gelman, 2006) or require a large amount of data to converge. To overcome this we propose choosing an informative prior estimated from the in-service database. The parameters aj and bj of Eq. (3) can be estimated by taking the squared residual values between the ordinary least squares (OLS) solution and the historical record. ^ j , and variance, r ^ 2j , for each This provides an estimate of the mean, l unit, and the parameters, aj and bj , can be calculated from:

aj ¼

 2 l^ j þ2 r^ j

^j bj ¼ l

!  2 l^ j þ1 r^ j

ð7Þ

ð8Þ

  N ðg; CÞ. Considering now the prior on the common mean, w The parameter, g, can be estimated by calculating the mean of all the wj obtained by OLS. To express our prior uncertainty about the value of this common mean, the parameter C can be selected to be diffuse. Finally, consider the prior distribution of V 1 which has two parameters, q and R. The parameter, q, is restricted to be q > D  1 to ensure that the Gamma function in the normalisation factor is well-defined (Murphy, 2007). The parameter, R, can be determined by calculating the covariance of the OLS estimates of the wj . To control the diffuseness of the Wishart prior on V 1 , we can ~r R according to the desired effect expressed in scale R thus R Table 1. By increasing ~r, the model will emulate the BnHM (e.g. BR-2) whereas the common prior will play a dominant role by specifying ~r to be small (Tobias, 2001). In the event that prognosis is required for a new unit, an average (across the fleet) of the individual priors can be taken to initialise the system for the addition of the ðm þ 1Þth unit. 3.2. Likelihood function

pðr2j jaj ; bj Þ ¼ IGðaj ; bj Þ

ð3Þ

 g; CÞ ¼ N ðg; CÞ pðwj

ð4Þ

pðV 1 jq; RÞ ¼ Wð½qR1 ; qÞ

ð5Þ

The first prior is the individual prior, Eq. (3), whereas the last two priors, Eqs. (4) and (5), are the common prior distributions.

Multiple observed degradation signals are used to compute the likelihood function. The likelihood for our model is conditional on the joint probability of observing the data (X j ) and the model parameters (wj ; r2j ), written:

pðyj jX j ; wj ; r2j Þ ¼ N ðUj wj ; r2j IÞ where, j is the unit number.

ð9Þ

544

M.A. Zaidan et al. / Expert Systems with Applications 42 (2015) 539–553

Fig. 3. Block diagram of the BHM for prognostics. The main difference between BHM and BnHM lies in accommodating multiple units in a single model.

Table 1 Effect of scaling factor, ~r , on the Wishart distribution. ~r

V 1

Effects

‘‘Small’’ ‘‘Large’’

‘‘Large’’ ‘‘Small’’

Common Individual

aforementioned joint posterior. Thus, all of the terms in the product of Eq. (10) not involved in each posterior conditional are absorbed into the normalising constant of this conditional. The detail of the complete posterior conditionals follows.

3.3. Posterior distribution

3.3.1. Posterior conditional for wj All of the terms in the product of Eq. (10) that do not involve wj are absorbed into the normalising constant of this conditional. Hence, we can calculate posterior conditional for wj :

Having the likelihood function and the prior distributions, a joint posterior distribution for all the parameters of this model can be computed using Bayes’ theorem by:

       V 1 ; r2j ; yj / p yj jX j ; wj ; r2j p wj jw;  V p wj jX j ; w;      V 1 ; r2j ; yj ¼ N dwj ; Dwj p wj jX j ; w;

" M Y

 V 1 ; r2j jyj Þ / pðwj ; w;

#  V 1 Þ pðr2j jaj ; bj Þ pðyj jX j ; wj ; r2j Þ pðwj jw;

j¼1

 g; CÞ pðV 1 jq; RÞ pðwj

ð10Þ

where M is the total number of units. Integration of the joint posterior distribution is not analytically tractable so we resort to monte carlo sampling to compute the posterior distribution. The conjugacy of the specifications in Eqs. (2)–(5) at each stage in the hierarchy delivers a straightforward form for the complete conditional posterior densities for all parameters of interest, making the Gibbs sampler an appropriate choice. To obtain the posterior distribution for each parameter, it is necessary to derive their posterior conditional distributions (Gelfand et al., 1990; Koop et al., 2007; Menzefricke, 1999). Each complete posterior conditional is proportional to the

ð11Þ

where

 1 Dwj ¼ UTj Uj =r2j þ V 1    dwj ¼ Dwj UTj yj =r2j þ V 1 w

ð12Þ ð13Þ

 3.3.2. Posterior conditional for w  All of the terms in the product of Eq. (10) that do not involve w are absorbed into the normalising constant of this conditional. Hence, we have:

 j ; wj ; V 1 ; r2j ; yj Þ / pðwjX

" M Y

#  V 1 Þ pðwj  g; CÞ pðwj jw;

j¼1

 j ; wj ; V 1 ; r2j ; yj Þ ¼ N ðdw ; Dw Þ pðwjX where

ð14Þ

545

M.A. Zaidan et al. / Expert Systems with Applications 42 (2015) 539–553

 1 Dw ¼ MV 1 þ C 1 dw ¼ Dw MV 1

M 1X wj M j¼1

ð15Þ !

pðyj jX j ; yj Þ ¼

Z

 V 1 ; r2j jyj Þ dwj dw  dV 1 dr2 pðUj wj ; r2j jyj Þpðwj ; w;

!1 þ C 1 g

ð16Þ

3.3.3. Posterior conditional for r2j All of the terms in the product of Eq. (10) that do not involve r2j are absorbed into the normalising constant of this conditional. Hence, we have:

 V 1 ; yj Þ / pðyj jX j ; wj ; r2j Þpðr2j jaj ; bj Þ pðr2j jX j ; wj ; w; ^j Þ  V 1 ; yj Þ ¼ IGða ^j ; b pðr2j jX j ; wj ; w;

ð17Þ

where

Nj  M þ aj 2  1 ^ ¼ 1 ðy  U w ÞT ðy  U w Þ þ b1 b j j j j j j j j 2

^j ¼ a

ð18Þ ð19Þ

where N j is the sample size for each unit. 3.3.4. Posterior conditional for V 1 All of the terms in the product of Eq. (10) that do not involve V 1 are absorbed into the normalising constant of this conditional. Hence, we have:

" # M Y  r2j ; yj Þ /  V 1 Þ pðV 1 jq; RÞ pðV 1 jX j ; wj ; w; pðwj jw;

" #1 M X T ^  ðwj  wÞ  þ qR ðwj  wÞ R¼

been determined to converge. A draw from the marginal posterior predictive density of yj can be obtained by taking draws from the following normal density:

  ðgÞ ðgÞ ðyj ÞðgÞ  N Uj wj ; ðr2j IÞ

Algorithm 1 presents pseudocode for the Gibbs sampler solution for both the posterior conditional distributions and Monte Carlo integration for predictive distribution. Based on these steps, the mean and standard deviation of the sample of predictive distribution of yj ðgÞ can be estimated to produce a prediction of the degradation trajectories. Algorithm 1. Gibbs sampler for posterior conditional  r2j ; V 1 g and Monte Carlo integration for predictive fwj ; w; 0

0

 0 ; ðr2 Þ ; ðV 1 Þ g. 1: Specify starting values fw 2: for g ¼ 0; . . . ; ðG  1Þ do 3: for j ¼ 1; . . . ; M do

ð21Þ

4:

ðgþ1Þ

Draw wj

 ðgÞ ; ðV 1 Þ  pðwj jX j ; w

ðgÞ

ðgÞ

; ðr2 Þ

; yj Þ, Eq.

(11).

ð22Þ

ðgþ1Þ

5:

Wð:; jÞðgþ1Þ ¼ wj

6:

Draw ðr2j Þ

j¼1

Samples from the posterior conditional distributions in Eqs. (11), (14), (17) and (20) can be generated to implement a Gibbs sampler (Gelfand et al., 1990; Gelfand & Smith, 1990; Koop et al., 2007). This yields a set of G draws from the Gibbs sampler, given by:

n o ðgÞ ðgÞ ðgÞ  ðgÞ ; ðr2j Þ ; ðV 1 Þ wj ; w ; g ¼ 1; . . . ; G

ð25Þ

ð20Þ

where

q^ ¼ M þ q

As before, the required integration does not have a closed-form analytical solution, while the dimensionality of the space and the complexity of the integrand will prohibit numerical integration based on one-dimensional integration methods. Draws from the posterior predictive density for yj can be obtained using postburn-in draws from the Gibbs sampler. This performs direct Monte Carlo integration using Eq. (24) where draws from  V 1 ; r2 jyj Þ are taken from the Gibbs sequence after it has pðwj ; w;

distribution yj

j¼1

^ q  r2j ; yj Þ ¼ WðR; ^Þ pðV 1 jX j ; wj ; w;

ð24Þ

ð23Þ

The idea behind the Gibbs sampler is to generate a sequence of draws which, after a suitable ‘‘pre-convergence’’4 or ‘‘burn-in’’ period, have converged in distribution to the joint posterior density. The post-convergence draws (draws kept after the burn-in) can then be used to estimate posterior statistical properties, such as mean, standard deviation or other quantities of interest (Tobias, 2001). After removing the first B iterates, the remaining G–B draws are used in the averages for the mean, standard deviation or other statistical properties of the samples from the posterior conditionals of  r2j and V 1 . Pseudocode for the Gibbs sampler is given in wj ; w;

ðgþ1Þ

 ðgÞ ; ðV 1 Þ  pðr2j jX j ; Wðgþ1Þ ; w

ðgÞ

; yj Þ, Eq.

(17). 7: 8:

ðgþ1Þ

r2 ð:; jÞðgþ1Þ ¼ ðr2j Þ end for

 ðgþ1Þ  pðwjX  j ; Wðgþ1Þ ; ðV 1 Þ 9: Draw w (14). 10: Draw ðV 1 Þ Eq. (20). 11:

ðgþ1Þ

ðgÞ

; r2

ðgþ1Þ

; yj Þ, Eq.

 ðgþ1Þ ; r2  pðV 1 jX j ; Wðgþ1Þ ; w ðgþ1Þ

Draw ðyj Þðgþ1Þ  pðyj jX j ; yj Þ ¼ N ðUj wj

Eq. (25). 12: end for 1 13: Calculate the mean of yj : Eðyj Þ  ðGBÞ

ðgþ1Þ

; ðIðr2j Þ

; yj Þ,

ðgþ1Þ

ÞÞ,

PG1

 ðgþ1Þ . g¼B ½ðyj Þ

14: Calculate the variance of PG1  ðgþ1Þ 2 1  Eðyj Þ . yj : varðyj Þ  ðGBÞ g¼B ½ðyj Þ

Algorithm 1. 3.4. Predictive distribution 3.5. Failure-time distribution and RUL The predictive density of the future5 of a degradation signal yj can be obtained by marginalising the conditional predictive density  V 1 ; r2j jyj Þ: over the posterior pðwj ; w; 4 ‘‘Pre-convergence’’ or ‘‘burn-in’’ describes the number of initial draws of the Markov chain that should be discarded to allow the distribution to converge to the true posterior. 5 We use a superscripted asterisk to indicate out-of-sample data, design matrix etc.

In this case, the failure-time distribution f ðtÞ can be calculated using a Bernstein distribution (Ahmad & Sheikh, 1984; Gebraeel et al., 2005), because the predictive distribution is drawn from a Gaussian distribution, Eq. (25). The probability of the event that the estimated failure time (T) exceeds time (t) is equivalent to the probability that the future degradation signal (yj ) is less than a chosen failure threshold (yfail ). This can be written as:

546

M.A. Zaidan et al. / Expert Systems with Applications 42 (2015) 539–553

prognostics focuses on a single degradation cycle and the algorithms are tested only on data in between two maintenance events, neglecting any maintenance effects such as recovery to ‘‘as new’’ performance. The advantages of using BR-2 over BR-1 have been demonstrated for ‘‘standard’’ TGT margin degradation in Zaidan et al. (2013). 4.1. Case study 1: synthetic degradation data

Fig. 4. The differences between BR-2 and BR-3 in the form of graphical model. BR-3  and V as adopts an additional level of hierarchy by treating the parameters w random variables and permitting r2 to vary across units, while in BR-2, these parameters are assumed fixed. The dashed and solid boxes illustrate that BR-3 can accommodate multiple units with different data samples.

PðT > tÞ ¼ Pðyj 6 yfail Þ

ð26Þ

Therefore, the failure-time distribution can be formulated as: FðtÞ ¼ PðT 6 tÞ ¼ 1  PðT > tÞ ¼ 1  Pðyj 6 yfail Þ ¼ 1  U f ðtÞ ¼

dFðtÞ dt



yfail  Uj wj

rj

 ð27Þ ð28Þ

where Uð:Þ is the cumulative distribution function of a Gaussian distribution. The parameters of wj and rj are obtained from the draws of predictive distribution, Eq. (25). RUL is then estimated by subtracting the current time from the mode of the failure-time distribution, f ðtÞ, known as the estimated failure-time, denoted by T. The current time is denoted by s. So, the RUL is then given by: RUL ¼ T  s. 3.6. Algorithm summary In summary, Fig. 4 shows the graphical models for BR-2 (left) and BR-3 (right). The main difference between these models can  and V. In BR-3, these paramebe seen clearly in the parameters w ters are modelled as random variables following Gaussian and Wishart distributions, respectively, while in BR-2, these parameters are treated as fixed. In addition, BR-3 also accommodates multiple units M with different data samples and noise variances, shown by the dashed box. Symbol N j is the data sample size and M is the total number of units. Fig. 5 is a block diagram of the BR-3 methodology. Prior distributions are estimated from the historical database. All units together with the database are fed into the algorithm, (Fig. 3 details BR-3). RUL for all units can be estimated simultaneously and asynchronously regardless of current sample size. The process continues until a maintenance action is taken. For each completed degradation cycle, the data can be added to the database to update prior distributions. For a new unit, it can be fed directly into BR-3 to estimate its RUL. 4. Case studies To develop intuition in the use of BR-3, we first explore the effect of different modelling scenarios on a set of synthetic problems whose parameters are entirely known. This should help build confidence in its applicability in situations where ‘‘ground truth’’ is unknown as will normally be the case in gas turbine engine prognostics. We compare BR-2 and BR-3 to determine the conditions under which we would expect to see benefit in suing the more general model – clearly, when data are homogeneous across all units both models should revert to the same solution. In the case studies,

The synthetic data are generated to model real data characteristics. In the first case, a ‘‘standard’’ degradation signal is generated a linear model with various levels of noise:

d1 ¼ Uw þ e1

ð29Þ

where w and e1 are each drawn from a Gaussian distribution, so that: w  N ðlw ; Rw Þ and e1  N ð0; r21 IÞ. The noise variance, r21 , varies from one unit to another. Here, we use first-order polynomial basis function,6 so that U ¼ ½1 x, where 1 and x denote N-dimensional column vectors of ones and time index, respectively. The mean of w can be written lw ¼ ½lw1 lw2 T and we constrain lw2 < 0, to encourage the synthetic data to have a negative slope mimicking real TGT margin behaviour. Next, two extreme scenarios are generated to emulate the realworld situation involving non-linearity and high noise levels. The first of these is a piecewise linear degradation with moderate noise leading to the model:

d2 ¼



Ua wa þ ea ¼ ½1a xa wa þ ea if xa 6 xc ; Ub wb þ eb ¼ ½1b xb wb þ eb if xb > xc ;

ð30Þ

where ea  N ð0; r2a IÞ and eb  N ð0; r2b IÞ and the variances vary from one unit to another. Model coefficients, represented by wa and wb , are drawn from Gaussian distributions, wa  N ðlwa ; Rwa Þ and wb  N ðlwb ; Rwb Þ. xc is a time index where the change in degradation’s slope occurs and to make this scenario more realistic we increase the rate of degradation at this point (an apparent increase in the downward slope of degradation curves is frequently observed in gas turbine engine data, c.f. Fig. 2). A second extreme scenario is when degradation contains a very high level of noise. In this case, we generate synthetic degradation data from a linear model with very high noise level, given by:

d3 ¼ Uw þ e3

ð31Þ

where w  N ðlw ; Rw Þ and e4  N ð0; r23 IÞ, the noise variance again varies from unit to unit. For this case we revert to the first-order polynomial basis so that U ¼ ½1 x, and we produce high noise levels in the signal by setting r23 > fr21 ; r2a ; r2b g. Fourteen degradation signals are generated which we have tried to make similar to observed gas turbine engine behaviour, as shown in Fig. 6, where four of these characterise the extreme scenarios: units 1 and 2 are generated from Eq. (30) whereas units 10 and 11 are generated from Eq. (31). The remaining units are generated from Eq. (29) to represent the ‘‘standard’’ situation. ‘‘Time index’’ denotes the normalised number of flight cycles, and ‘‘health index’’, the engine’s health to be forecast. The action warning threshold is chosen, arbitrarily, to be 22. The signals for units 2 and 11 are selected to exemplify the method because they represent extremes of behaviour so we expect to see the main differences between BR-2 and BR-3. Unit 2 represents sudden rate change behaviour (+) and its solid line illustrates ‘‘ground truth’’. It can be seen that the signal decreases gradually until time index xc  0:25, after that, it starts to degrade 6 This assumption follows the explanation in early section that many degradation patterns in gas turbine engine are nearly linear.

M.A. Zaidan et al. / Expert Systems with Applications 42 (2015) 539–553

547

Fig. 5. Block diagram of complete BR-3 approach. Having estimated prior parameters for all units from the degradation database, data from multiple units with different sample sizes are fed into the single BR-3 model, where RUL can be estimated simultaneously and asynchronously for any unit.

Fig. 6. Synthetic data for 14 units. Owing to their extreme nature, the signals for units 2 (symbol ‘‘þ’’) and 11 (symbol ‘‘’’) are selected to exemplify the method.

more rapidly. Unit 11 and its ‘‘ground truth’’, illustrated by () and a solid line, respectively, has a high noise level which occurs at the beginning of the degradation signal, highlighted by an oval region. The other degradation signals are shown by dashed lines. As described in Section 3.1, the prior specification is crucial in BHM. Therefore, we will analyse our methodology for prior specification first in this following subsection, before the benefits of BR-3 in prognostics are investigated. 4.1.1. Informative IG prior vs Jeffreys prior This subsection presents the advantage of using ‘‘moderately’’ informative prior information. By this we mean a mixture of an informative IG prior with more diffuse second-level priors. We begin by comparing performance using the non-informative, Jeffreys, prior.

The IG prior parameters are obtained using the procedure described in Section 3.1. Some parameters are selected to be dif C and q, the degree-of-freefuse, including the prior covariance w; dom parameter in the prior for V 1 . To express prior uncertainty about the value of the common mean, the parameter C can be selected to be five times larger than its mean. This produces a substantially flat prior over each element (Tobias, 2001). Furthermore, since there are two coefficients in the linear model, the parameter, q, must be at least two to ensure propriety of the Wishart distribution. A choice of two gives the maximally diffuse (Koop et al., 2007; Tobias, 2001) distribution. We have chosen q ¼ 5 as a moderate compromise. The parameter, R, is determined by calculating the covariance of the OLS estimates of the wj , and here ~r ¼ 1. The parameters aj and bj are calculated from Eqs. (7) and (8), whilst for the Jeffreys prior, the Eq. (7) can be well approximated by setting pðr2 Þ ¼ IGð0:001; 0:001Þ (Lesaffre & Lawson, 2012). The number of iterations for Gibbs sampling is set to be G ¼ 10; 000, with B ¼ 1000. Here, a first-order polynomial basis function is used for modelling. Fig. 7 shows RUL estimates for units 2 and 11. The solid line is the true RUL and (þ) and () represent RUL for BR-3 with informative IG prior and Jeffreys prior, respectively. The results demonstrate clearly that the use of Jeffreys prior leads to poor performance in RUL estimation. The utilisation of the Jeffreys prior may require numerous data points in order to converge to the true RUL as shown in Fig. 7(b). We therefore recommend the use of informative, IG, priors for the individual noise variances and adopt our proposed methodology to determine crucial prior parameters in the sequel. It is appropriate to explain the construction of the curves in Fig. 7. Each trace shows the estimated RUL as the time index progresses so that the predictive model is updated each time a new datum arrives. In this case we adopt the mode of operation in Gebraeel et al. (2005) whereby the jth unit is assumed to begin with no data while all others have full information in the model. As the time index increments, new data for the jth unit only is learned. For simplicity, we accommodate the arrival of a new sample by re-running the Gibbs sampler ab initio. This is time consuming but accurate and not beyond the bounds of practicality if data arrive

548

M.A. Zaidan et al. / Expert Systems with Applications 42 (2015) 539–553

Unit 2

Unit 11

1

1

True RUL Informative IG Jeffreys

0.9

0.8

0.7

0.7

0.6

0.6

RUL

RUL

0.8

True RUL Informative IG Jeffreys

0.9

0.5

0.5

0.4

0.4

0.3

0.3

0.2

0.2

0.1

0.1

0

0 0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0

0.2

0.4

Time Index

0.6

0.8

1

Time Index

Fig. 7. RUL comparison between BR-3 with informative IG prior and BR-3 with non-informative, Jeffreys, prior. The results reveal that the use of Jeffreys prior might produce unexpected inference.

Unit 2

Unit 11

1.5

1.5

True RUL BR−2 BR−3, R =0.01

True RUL BR−2 BR−3, R =0.01

sf

sf

BR−3, R =1

BR−3, R =1

sf

1

sf

1

BR−3, Rsf=100

RUL

RUL

BR−3, Rsf=100

0.5

0.5

0

0 0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0

0.2

0.4

Time Index

0.6

0.8

1

Time Index

Fig. 8. RUL comparison between BR-2 and BR-3 (~rf0:01; 1:0; 100:0g) using the synthetic data for units 2 and 11. The solid line represents ‘‘ground truth’’, (), BR-2 and (}), (+) and () BR-3 estimates for ~r ¼ f0:01; 1:0; 100:0g respectively. A choice of ~r ¼ 1 is suggested, indicating a ‘‘moderate’’ common prior (+) which is able to deal well with the extreme problems.

4.1.2. The advantages of BR-3 in accommodating heterogeneous fleet data Here we demonstrate the advantages of BR-3 in dealing with the heterogeneity that arises across fleets of units exemplified by the extreme scenarios outlined earlier. We use the same setting for prior parameters and Gibbs sampler as described above, and again select units 2 and 11 for illustration. To evaluate the effect of prior uncertainty of the common

Unit 2

45 Syn data BR−2 BR−3 Ground truth

Health Index

40 35 30 25 20

threshold

15 4 0

Density

relatively slowly as is likely in airline operations, with asynchronous sample rates perhaps on the order of hours rather than seconds. However, in real operations either a ‘‘warm start’’7 or a full, sequential solution such as the particle filter may prove more attractive although neither is without its own challenges (Arulampalam, Maskell, Gordon, & Clapp, 2002). Our intention here, however, is to demonstrate model capability.

0.5

1

0.5

1

1.5

2

2.5

1.5

2

2.5

2 0

0

Time Index 7 ‘‘Warm start’’ indicates that the sampler is re-started from its current state and allowed to run to convergence in a time expected to be substantially less that an ab initio run. However, while such methods are used, they are not guaranteed to converge to the true sequentially updated distribution.

Fig. 9. Degradation predictions for the synthetic data for unit 2 at time index 0.25. The BR-2 prediction deviates from the true value, whilst BR-3 maintains predictive accuracy when there is a sudden change behaviour in the true degradation signal.

549

M.A. Zaidan et al. / Expert Systems with Applications 42 (2015) 539–553 BR−2 method 10 0.1 0.73 1.49 Ground truth

Density

8 6 4 2 0

0

0.5

1 Time Index

1.5

2

BR−3 method 10 0.1 0.73 1.49 Ground truth

Density

8 6 4 2 0

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

Time Index Fig. 10. The evolution of failure-time distributions for BR-2 and BR-3. At time index 0.73, the failure-time distribution of BR-2 deviates strongly from ground truth in comparison to that of BR-3. The common priors in BR-3 share information from other units to influence its prediction of how degradation may occur in the future.

covariance, three simulations are generated using different values of ~r (the scaling factor of R), ~r ¼ f0:01; 1:0; 100:0g. As described in Section 3.1, the choice of this scaling factor will influence the behaviour of BR-3, whether it fully shares information across units (BHM) or treats them separately (BnHM) with increasing ~r. This study aims to provide a way of selecting ~r for our target problem to demonstrate the strength of BR-3 for the extreme scenarios. Fig. 8 demonstrates RUL metrics for units 2 and 11. The solid line represents ‘‘ground truth’’, () represents BR-2 estimates, and (}), (þ) and () represent BR-3 estimates for ~r ¼ f0:01; 1:0; 100:0grespectively. Owing to the effect of a sudden rate change in degradation in unit 2, BR-2 deviates strongly from the true value (Fig. 8(a)), whilst in unit 11, BR-2 also diverges from the true value because of the high noise level early in the degradation (Fig. 8(b)). This occurs because BR-2 relies solely on an individual prior and does not accommodate heterogeneous effects. As described in Table 1, RUL estimation using BR-3 with ~r ¼ 100 also tends to emulate BR-2.

In contrast, by specifying ~r to be small, e.g. ~r ¼ 1:0 or 0:01 BR-3 is able to maintain accuracy because common (level 2) priors play an important role in the prediction. In other words, prediction shares mutual information across the units, and combines it with individual (level 1) prior information as well as incoming data. We suggest a moderate choice of ~r , e.g. ~r ¼ 1 for the current problem, because, as Fig. 8(a) shows, BR-3 with ~r ¼ 0:01 converges more slowly than with ~r ¼ 1 owing to over diffusivity in the prior. This is particularly problematic when initial predictions are poor. Therefore, we set ~r ¼ 1 in BR-3 in the sequel but are mindful that its choice is problem specific. Fig. 9 shows the predicted degradation trajectory for unit 2 at time indexes P0.25, i.e. inference is performed only on data for unit 2, up to time index 0.25. In the top subfigure, the straight dashed line and the straight solid line represent the BR-2 and BR-3 predictions, respectively. The surrounding dashed lines around these prediction lines are their 99% confident intervals for both approaches. The (þ) represents the ‘‘observed’’ data and the straight dot-dashed line its ‘‘ground truth’’. The action warning threshold yfail ¼ 22 (horizontal dashed line). Current time is shown by the vertical dashed line. In the lower subfigure, the dashed, solid and dot-dashed curves represent the failure-time distributions of BR-2, BR-3 and ‘‘ground truth’’, respectively. The figure illustrates that when only the ‘‘gradual decrease’’ in degradation signal is available, the BR-2 prediction deviates strongly from the truth, being dominated by the initial shallow slope of unit 2, whereas, BR-3 maintains a good degree of predictive accuracy because of the information sharing across units permitted by the additional hierarchical level. Fig. 10 illustrates the evolution of predicted failure-time distributions for unit 2 at time indexes 0.1, 0.73 and 1.49 for both approaches. The dotted curve represents true failure-time distribution. It can be seen that the mode of the failure-time distribution of BR-3 approaches, monotonically, the true mode of failure-time whilst, for BR-2, it is grossly optimistic at time index = 0.73. However, when enough data are gathered, both failure-time distributions approach close to the truth. For the ‘‘standard’’ case (e.g. units 6 and 8) there is no substantial difference between BR-2 and BR-3 as shown in Fig. 11. To summarise performance across all units, the RUL residual (predicted–true) is calculated for each individual at each time. These are compared for all data in the time-indexed box plot of Fig. 12. For each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers

Unit 6

Unit 8

1.5

1.5

True RUL BR−2 BR−3

True RUL BR−2 BR−3

RUL

1

RUL

1

0.5

0.5

0

0 0

0.1

0.2

0.3

0.4

0.5

Time Index

0.6

0.7

0.8

0.9

0

0.2

0.4

0.6

0.8

1

Time Index

Fig. 11. RUL comparison between BR-2 and BR-3 using synthetic data for ‘‘standard’’ degradation behaviour (units 6 and 8). Here there is no substantial difference between BR-2 and BR-3 as expected.

550

M.A. Zaidan et al. / Expert Systems with Applications 42 (2015) 539–553

extend to the most extreme points not considered outliers. The plot for BR-2 shows more overall variability compared with BR-3, implying a lack of consistency and accuracy, especially in the early stages (time index <0.5) when data are scarce. This results from the inclusion of the extreme scenarios for operational veracity.

Box plot of RUL residual for BR−2

RUL residual

0.2 0 −0.2 −0.4

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

4.2. Case study 2: aerospace engine degradation

1

Box plot of RUL residual for BR−3

RUL residual

0.2 0 −0.2 −0.4

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Time index

Fig. 12. Box plot representing RUL performance for all synthetic data. The (realistic) inclusion of the extreme scenarios means that BR-2 shows more overall variability and lower accuracy than BR-3.

This case study focuses on in-service data, the take-off values of TGT margin for aerospace gas turbine engines. The aim is to illustrate the effectiveness of our proposed algorithm using real degradation data. All data are normalised and the action warning threshold is chosen arbitrarily as yfail ¼ 22. There are 14 TGT margin time histories of varying lengths, generated by 14 aircraft engines. Fig. 2 shows two plotted against normalised number of flight cycles. Engines 8 and 9 are selected for illustration because engine 8 has an apparent change in degradation behaviour (Fig. 2(a)) and engine 9 has poor signal-to-noise ratio (SNR), as shown in Fig. 2(b). The prognostic algorithms are applied to these data using the same methodology as in case study 1. Fig. 13 compares RUL

Engine 8

Engine 9

180

180

Expected RUL BR−2 BR−3

160

140

120

120

100

100

RUL

RUL

140

80

80

60

60

40

40

20

20

0

0

10

20

30

40

50

60

70

Expected RUL BR−2 BR−3

160

0

80

0

10

Flight cycle (normalised)

20

30

40

50

60

70

80

Flight cycle (normalised)

Fig. 13. RUL comparison between BR-2 and BR-3 using real TGT margin data for engines 8 and 9. BR-3 copes well with the extremes of slope change and poor SNR.

Engine 2

Engine 3

45

45

Expected RUL BR−2 BR−3

40

35

30

30

25

25

RUL

RUL

35

20

20

15

15

10

10

5

5

0

0

5

10

15

20

25

Flight cycle (normalised)

30

35

Expected RUL BR−2 BR−3

40

40

0

0

5

10

15

20

25

30

35

40

Flight cycle (normalised)

Fig. 14. RUL comparison between BR-2 and BR-3 using real TGT margin data for engines 2 and 3, considered as ‘‘standard’’ cases. There is no substantial difference in performance between BR-2 and BR-3.

M.A. Zaidan et al. / Expert Systems with Applications 42 (2015) 539–553

Engine 1 45

Expected RUL BR−2 BR−3

40 35

RUL

30 25 20 15 10 5 0 0

5

10

15

20

25

30

35

40

Flight cycle (normalised) Fig. 15. RUL estimations for engine 1 seems to converge slowly to the ‘‘expected RUL’’ because its data characteristics is not similar with other engine data and also this data is right censored.

Box plot of RUL residual for BR−2 RUL residual

60 40 20 0 −20 −40 2

4

6

8

10

12

14

16

18

20

22

24

26

28

Box plot of RUL residual for BR−3

551

to the ‘‘expected RUL’’ because the degradation data lies close to the initial prediction whereas for engine 3, there is a little fluctuation in both methods before they converge to ‘‘expected RUL’’ (a similar situation obtains for the synthetic data of Fig. 11(b)). We ascribe this behaviour to poor initial estimates. In all cases early behaviour will depend on the specific pattern of noise corruption. In general, there is little substantial difference between BR-2 and BR-3 for such ‘‘standard’’ scenarios. Engine 1 (see Fig. 15) is interesting because both approaches converge slowly to the ‘‘expected RUL’’ while its behaviour is apparently ‘‘standard’’. Examination of its characteristics (not shown) reveal that while it does conform to the standard description, its slope is considerably different than most, placing it on the upper limits of behaviour. Consequently, prognostic approaches require more data to converge. It can be seen, though, that there is again no substantial difference between BR-2 and BR-3. Fig. 16 shows a sequence of box plots, representing the RUL residuals of BR-2 and BR-3 when applied across the fleet. Because the total number of flight cycles varies from engine-to-engine, we show only the first 28 here. BR-2 performance can be better than BR-3 in the early stages (before flight cycle 4) owing to its full informative prior, whereas BR-3 uses only ‘‘moderately’’ informative priors. This would lead to less accurate RUL estimation at early prediction for BR-3 (recall the ‘‘expected RUL’’ is computed from the OLS fit to all data). Next, between flight cycle 6 and 14, it can be seen that BR-2 and BR-3 have similar inter-quartile ranges, but BR-2 typically has longer whiskers. This situation arises as a consequence of the presence of the extreme scenarios discussed earlier. In the longer run (after flight cycle 22), it can be seen that BR-3 has smaller inter-quartile range than BR-2, indicating better overall performance than BR-2.

RUL residual

60 40

5. Conclusions

20 0 −20 −40 2

4

6

8

10

12

14

16

18

20

22

24

26

28

Flight cycle (normalised) Fig. 16. Boxplot representing RUL performance for all real data. Box plot for BR-3 shows smaller edges and shorter whiskers than box plot for BR-2, indicate that BR-3 is more consistent and accurate in prognostic analysis involving some extreme scenarios.

predictions for BR-2 and BR-3 for engines 8 and 9. The solid line is the ‘‘expected RUL’’,8 () and (þ) represent the BR-2 and BR-3 estimates, respectively. Fig. 13(a) reveals a significant deviation in RUL prediction for BR-2 owing to the sudden change in degradation behaviour, whereas BR-3 maintains good predictive accuracy. Similar behaviour is shown in Fig. 13(b): the BR-2 prediction deviates largely from ‘‘expected RUL’’ owing, now, to the poor SNR, whilst BR-3 accommodates the high level of noise. After receiving enough data, both predictions converge to the ‘‘expected RUL’’ line. The conclusions drawn from the synthetic data for BR-2 and BR-3 therefore carry over to this case study: a Bayesian approach involving two types of prior (BR-3) in both levels, is able to enhance RUL estimation for these extreme scenarios. Fig. 14 shows RUL for the ‘‘standard’’ case9 exemplified by engines 2 and 3. For engine 2, both algorithms converge consistently

8 ‘‘Expected RUL’’ is derived for each unit by fitting the OLS line through the entire degradation history then calculating the distance to yfail at each time index. It is a widely used benchmark despite the fact that it may have little relation to the true RUL and should not be regarded as ‘‘ground truth’’ (Li & Nilkitsaranont, 2009). 9 Virtually linear relationship with moderate noise.

The paper establishes a decision support system for the prognosis of multiple units arranged in a hierarchical manner. A Bayesian hierarchical framework is used to permit inference at both the individual and group levels thus allowing inhomogeneity in the characteristics of uncertainty. This is combined with a model for the predictive density of remaining useful life to provide more precise predictions of failure (threshold exceedance in the target application) than is currently available. In the absence of either a hierarchical structure or heterogeneity in variability, inference naturally defaults to the appropriate result. The Bayesian approach lends itself to update and this can be done asynchronously as and when data become available for any particular unit. The framework is directly extendable to deeper levels of hierarchy. The BHM is derived (named BR-3) and is compared with an existing non-hierarchical version (named BR-2). The capabilities of these approaches are explored using synthetic data designed to resemble field data but with known parameters. Subsequently, we find that the characteristics displayed in the synthetic framework are preserved when applied to in-service data where ‘‘ground truth’’ is, of course, unknown. Civil aviation generates a considerable quantity of health signals across fleets of assets. BR-3 prognostics appear to offer an appropriate modelling framework to maximise the use of such data to exploit maximally the information associated with both the individual asset and the group of like assets. The production of a full predictive distribution is expected to enable operators to make better informed judgements on the quality of RUL forecasts. We have suggested some principles for specifying the parameters of ‘‘moderately informative’’ prior distributions, where some (e.g. the noise variances) can be determined directly from historical data and some are set subjectively. We have compared this

552

M.A. Zaidan et al. / Expert Systems with Applications 42 (2015) 539–553

with a well-known uninformative prior, the Jeffreys prior, and shown its superior performance. In practice, the use of ‘‘moderately informative’’ priors will reduce the need for expert involvement in specifying prior parameters and was shown to work well in the case study. We expected no substantial difference between BR-3 and BR-2, applied to ‘‘standard’’ degradation signals owing to homogeneity in the uncertainty. However, BR-3 outperforms BR-2 for more extreme situations which reflect the heterogeneity observed in fleet data. To demonstrate this, two degradation signals, one with a sudden rate change, and the other with very high noise levels, were examined in detail. The results reveal that BR-3 has better performance than BR-2 for these extreme scenarios. BR-3 is able to share information across units owing to the common (level 2) priors and this is then combined with individual (level 1) prior information leading to the improved outcome. We conclude that BR-3 shows promise in the domain of complex system prognostics, where heterogeneity is the rule rather than the exception. In aviation, gas turbine engines undergo overhaul to maintain performance and safety, at a cost that may be amortised into each flight completed. Competing factors determine the need for overhaul, which include life exceedance on critical engine components, operating at maximum shaft speed or temperature limits, and reliability issues. It is estimated that the exceedance of temperature currently accounts for around one fifth of engine removals, reducing the number of flights achieved between overhauls and thus increasing operating costs. The data used in this paper is already available to fleet management operators who use algorithms embedded in a service-orientated architecture to generate point predictions for remaining time before temperature exceedance. Improved predictions with well-defined uncertainty estimates gives fleet managers confidence in the capability of engines to operate within operational limits, extending aircraft availability and thus reducing cost per flight. The hierarchical Bayesian framework established here has shown the effective construction of life estimates with prediction uncertainty. In particular, the hierarchical formulation is shown to be robust to noise and unusual burn-in characteristics (rate of degradation change) of early-life engine behaviour reducing the variance in life predictions across the fleet, this is a benefit of the inclusion and proper handling of a population model. The improved consistency of predictions, along with measures of uncertainty, has the potential to provide enhanced information for decision makers to determine the most cost effective time at which to service each engine. 5.1. Future research directions From an industrial perspective, Monte Carlo methods are unattractive because a high level of expertise would be required of the end-user and its stochastic nature might, dependent on use, militate against uptake. Additionally, deterministic solutions lend themselves more readily to on-asset exploitation and may provide competitive advantage in the burgeoning unmanned systems market. This motivates the exploration of deterministic, necessarily approximate, solutions such as the Laplace approximation (e.g. Bishop (2006)), the modal approximation of Lindley and Smith (1972) or variational Bayes (e.g. Fox & Roberts (2012)). Central to our work has been the choice of a linear input–output model. Adding flexibility e.g. by additional polynomial terms may be beneficial but can introduce problems of over-fitting, especially in early life and loss of monotonicity. These issues can be cumbersome to deal with directly and recent advances in non-parametric function approximation and machine learning – so-called Gaussian Process models (e.g. Rasmussen & Williams (2006)) may offer a way forward. This methodology provides effective methods that relax the need for a priori model choice and allow the data to

determine the optimal model from a class defined by a characteristic covariance function while naturally guarding against overfitting. A benefit of doing this would be to permit a different model form for each individual without having to go through extensive and explicit model-building. Our formulation assumes that the explanatory variables are known exactly. This is appropriate in the target problem where flight cycle number is the variable of interest, as it would be for other deterministic variables describing fixed characteristics (e.g. engine type or operator). These covariate types are exactly knowable both in the historic data and for future predictions of their value. However, if continuous and uncertain variables were thought to influence behaviour both measurement uncertainty and predictive uncertainty should be included in the Bayesian model. It is also possible that there may be missing values from the measured data due to individual sensor faults. For example, ambient temperature is known to affect the rate of degradation, and while the measurement uncertainty is likely negligible in its influence, the prediction of future values can be highly uncertain. The uncertainty of the covariate predictions and the issue of missing data should therefore be handled in a full Bayesian treatment to estimate the true uncertainty in remaining life predictions. Acknowledgement The authors would like to thank Rolls-Royce plc for their support and guidance in this work. References Ahmad, M., & Sheikh, A. (1984). Bernstein reliability model: Derivation and estimation of parameters. Reliability Engineering, 8, 131–148. Arulampalam, M. S., Maskell, S., Gordon, N., & Clapp, T. (2002). A tutorial on particle filters for online nonlinear/non-gaussian bayesian tracking. IEEE Transactions on Signal Processing, 50, 174–188. Banerjee, S., Carlin, B., & Gelfand, A. (2004). Hierarchical modeling and analysis for spatial data. Boca Raton, Florida: Chapman & Hall. Bernardo, J. M., & Smith, A. F. (2009). Bayesian theory. New York: John Wiley & Sons. Berry, D. A. (1996). Statistics: A Bayesian perspective. New York: Duxbury Press. Bishop, C. M. (2006). Pattern recognition and machine learning. New York: Springer. Blei, D. M., Jordan, M. I., & Ng, A. Y. (2003). Hierarchical Bayesian models for applications in information retrieval. In J. Bernardo, M. Bayarri, J. Berger, A. Dawid, D. Heckerman, & A. Smith, et al. (Eds.). Bayesian statistics (Vol. 7, pp. 25–43). Oxford University Press. Brown, E., Moore, E., McCollom, N., & Hess, A. (2007). Prognostics and health management a data-driven approach to supporting the F-35 lightning II. In 2007 IEEE aerospace conference (pp. 1–12). IEEE. Celaya, J., Saha, B., Wysocki, P., & Goebel, K. (2008). Prognostics for electronics components of avionics systems. In 2008 IEEE aerospace conference (pp. 1–7). IEEE. Daniels, M. J. (1999). A prior for the variance in hierarchical models. Canadian Journal of Statistics, 27, 567–578. Davidian, M., & Giltinan, D. (1995). Nonlinear models for repeated measurement data. Boca Raton, Florida: Chapman & Hall. Fei-Fei, L., & Perona, P. (2005). A Bayesian hierarchical model for learning natural scene categories. In 2005 IEEE computer society conference on computer vision and pattern recognition CVPR 2005 (pp. 524–531). IEEE. Fox, C. W., & Roberts, S. J. (2012). A tutorial on variational Bayesian inference. Artificial Intelligence Review, 38, 85–95. Gao, Y., Li, Y., Zhang, G., & Zhang, Z. (2008). Research status and perspectives of fault prediction technologies in prognostics and health management system. In 2nd International symposium on systems and control in aerospace and astronautics ISSCAA 2008 (pp. 1–6). IEEE. Gebraeel, N. (2006). Sensory-updated residual life distributions for components with exponential degradation patterns. IEEE Transactions on Automation Science and Engineering, 3, 382–393. Gebraeel, N., Lawley, M., Li, R., & Ryan, J. (2005). Residual-life distributions from component degradation signals: A Bayesian approach. IIE Transactions, 37, 543–557. Gebraeel, N., Lawley, M., Liu, R., & Parmeshwaran, V. (2004). Residual life predictions from vibration-based degradation signals: A neural network approach. IEEE Transactions on Industrial Electronics, 51, 694–700. Gelfand, A., Hills, S., Racine-Poon, A., & Smith, A. (1990). Illustration of Bayesian inference in normal data models using Gibbs sampling. Journal of the American Statistical Association, 85, 972–985. Gelfand, A. E., & Smith, A. F. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.

M.A. Zaidan et al. / Expert Systems with Applications 42 (2015) 539–553 Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis, 1, 515–534. Gelman, A. (2009). Bayes, Jeffreys, prior distributions and the philosophy of statistics. Statistical Science, 24, 176–178. Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2004). Bayesian data analysis (2nd ed.). Boca Raton, Florida: Chapman & Hall. Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press. General-Electric (2008). Understanding the EGT redline. GE aviation flight operations newsletter, 3, 4–7. Girolami, M., & Rogers, S. (2005). Hierarchic bayesian models for kernel learning. In Proceedings of the 22nd international conference on machine learning ICML ’05 (pp. 241–248). ACM. Guo, R.-S. (2009). A multi-category inter-purchase time model based on hierarchical bayesian theory. Expert Systems With Applications, 36, 6301–6308. Hamada, M. S., Wilson, A. G., & Reese, C. S. (2008). Bayesian reliability. New York: Springer. Hecht, H. (2006). Why prognostics for avionics? In 2006 IEEE aerospace conference (pp. 1–6). IEEE. Ibrahim, J., Chen, M., & Sinha, D. (2001). Bayesian survival analysis. New York: Springer. Jeffrey, H. (1961). Theory of probability. Oxford: Clarendon Press. Johansson, J., & Leisner, P. (2012). Prognostics of thermal fatigue failure of solder joints in avionic equipment. IEEE Aerospace and Electronic Systems Magazine, 27, 16–24. Kass, R. E., & Wasserman, L. (1996). The selection of prior distributions by formal rules. Journal of the American Statistical Association, 91, 1343–1370. Khani, N., Segovia, C., Navaratne, R., Sethi, V., Singh, R., & Pilidis, P. (2012). Towards development of a diagnostic and prognostic tool for civil aero-engine component degradation. In 2012 ASME gas turbine india conference (pp. 1–12). ASME. Koop, G., Poirier, D., & Tobias, L. (2007). Bayesian econometric methods (econometric exercises). New York: Cambridge University Press. Lesaffre, E., & Lawson, A. B. (2012). Bayesian biostatistics. Chichester, UK: John Wiley & Sons. Lindley, D. V., & Smith, A. F. M. (1972). Bayes estimates for the linear model. Journal of the Royal Statistical Society. Series B (Methodological), 34, 1–41. Li, Y., & Nilkitsaranont, P. (2009). Gas turbine performance prognostic for conditionbased maintenance. Applied Energy, 86, 2152–2161. Lipowsky, H., Staudacher, S., Bauer, M., & Schmidt, K. J. (2010). Application of Bayesian forecasting to change detection and prognosis of gas turbine performance. Journal of Engineering for Gas Turbines and Power, 132.

553

Malinge, Y., & Courtenay, C. (2007). Avoiding high speed rejected takeoffs due to EGT limit exceedance. Safety First: The Airbus Safety Magazine, 8–13. Menzefricke, U. (1999). Bayesian prediction in growth-curve models with correlated errors. Test, 8, 75–93. Murphy, K. P. (2007). Conjugate Bayesian analysis of the Gaussian distribution. Technical Report University of British Columbia. Pecht, M., & Jaai, R. (2010). A prognostics and health management roadmap for information and electronics-rich systems. Microelectronics Reliability, 50, 317–323. Puggina, N., & Venturini, M. (2012). Development of a statistical methodology for gas turbine prognostics. Transactions of the ASME-A-Engineering for Gas Turbines and Power, 134, 1–9. Raftery, A. E., & Lewis, S. (1992). How many iterations in the gibbs sampler. Bayesian Statistics, 4, 763–773. Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. Cambridge, Massachusetts: The MIT press. Robinson, M., & Crowder, M. (2000). Bayesian methods for a growth-curve degradation model with repeated measures. Lifetime Data Analysis, 6, 357–374. Rolls-Royce (2013). Rolls-Royce: our corporate story. URL accessed: September 2013. Rossi, P. E., Allenby, G. M., & McCulloch, R. (2005). Bayesian statistics and marketing. Chichester, UK: John Wiley & Sons. Saravanamuttoo, H., Rogers, G. F., Cohen, H., & Straznicky, P. (2009). Gas turbine theory (Sixth ed.). Dorset, UK: Pearson Education. Sikorska, J., Hodkiewicz, M., & Ma, L. (2011). Prognostic modelling options for remaining useful life estimation by industry. Mechanical Systems and Signal Processing, 25, 1803–1836. Spiegelhalter, D., Thomas, A., Best, N., & Gilks, W. (1996). Bugs 0.5: Bayesian inference using gibbs sampling manual (version ii). MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK (pp. 1–59). Tobias, J. (2001). Forecasting output growth rates and median output growth rates: A hierarchical bayesian approach. Journal of Forecasting, 20, 297–314. Walker, M. (2010). Next generation prognostics and health management for unmanned aircraft. In 2010 IEEE aerospace conference (pp. 1–14). IEEE. Widodo, A., & Yang, B.-S. (2011). Application of relevance vector machine and survival probability to machine degradation assessment. Expert Systems with Applications, 38, 2592–2599. Xu, J., & Xu, L. (2011). Health management based on fusion prognostics for avionics systems. Journal of Systems Engineering and Electronics, 22, 428–436. Zaidan, M. A., Mills, A. R., & Harrison, R. F. (2013). Bayesian framework for aerospace gas turbine engine prognostics. In IEEE aerospace conference. Big Sky, MT: IEEE.