Dynamic Effect of Perfect Preventive Maintenance on System Reliability and Cost Using HiP-HOPS


2010 Management and Control of Production Logistics University of Coimbra, Portugal September 8-10, 2010

Shawulu H. Nggada, David J. Parker and Yiannis I. Papadopoulos
Department of Computer Science, University of Hull, HU6 7RX, UK
{ S.H.Nggada@2007. | D.J.Parker@ | Y.I.Papadopoulos@ }hull.ac.uk

Abstract: The occurrence of failure in a safety-critical engineering system can be reduced through the use of preventive maintenance (PM). Each time a component of the system is maintained, its effective age is reduced; the extent of this age reduction depends on the effectiveness of the PM, known as the improvement factor. While some components may need to be maintained every PM interval T, most components will typically have a PM interval that is a multiple of T (αT), where α is the coefficient of maintenance interval (CoMI). Where PM completely rejuvenates the component being maintained, it is termed perfect preventive maintenance (PPM). This paper investigates the effect of PPM on system reliability, unavailability and cost. Firstly, a model of the effect of PPM actions on the reliability, unavailability and cost of components, calculated using a Weibull distribution, is established. Secondly, HiP-HOPS, a semi-automatic method for analyzing system dependability (i.e. safety, reliability and availability analysis), is extended to consider PPM of components in a system. Finally, an evolutionary optimization algorithm, using HiP-HOPS, is applied to PM scheduling. Copyright © 2010 IFAC.

Keywords: Preventive maintenance, reliability, availability, multiobjective optimization, genetic algorithms.

978-3-902661-81-4/10/$20.00 © 2010 IFAC

1. INTRODUCTION
Maintenance refers to the action taken to retain or restore a component or system to its designed condition (Storey 1996). To retain implies that a maintenance action is performed before failure in order to keep the system operable on demand, while to restore means returning the system to its operable condition after a failure has occurred. The former is referred to as preventive maintenance and the latter as corrective maintenance. A PM action that restores the condition of a component to as good-as-new (GAN) is known as perfect preventive maintenance (PPM), whereas one that improves its condition to a state between as bad-as-old (BAO) and GAN is termed imperfect preventive maintenance (IPM) (Wang and Pham 2006, Márquez 2007). If the state of a component is BAO after maintenance, then the maintenance action had no effect on the state of the component. This could mean that (i) the maintenance action was performed at an inappropriate time, (ii) it was not performed in accordance with the prescribed procedure, or (iii) the component is beyond its useful life. The improvement mechanism of preventive maintenance of a system can be divided into two parts (Tsai et al 2004). The first is the improvement of degraded parts of the system, which are restored by repair (in the case of failure), replacement or renewal. The second is extending the survival of the remaining parts of the system, which can also be restored in the same manner as in the first case. The same improvement mechanism also applies to the components of a system. PPM completely rejuvenates a component, implying an improvement factor f = 1. With IPM, f lies between 0 and 1 (i.e. 0 ≤ f < 1), and each PM activity is likely to improve the state of the component by some degree (Martorell et al 1999, Márquez 2007).

This paper investigates the use of PPM on an engineering system assuming, for simplicity, that no repair (either minimal or corrective) is carried out. Each act of PPM at αT restores the component's effective age to new (i.e. t0 = 0). Although renewal has the same effect on component reliability as replacement, the two are completely different actions. In replacement, both the component's age and its effective age are reset to t0 at replacement time, whereas in renewal only the effective age is reset to t0, at renewal time αT. The remainder of this paper is structured as follows. In section 2, we present HiP-HOPS. In section 3, we discuss the extension of HiP-HOPS with the ability to calculate system unavailability and reliability under the assumption of PPM of the components of a system. In section 4, we demonstrate how these extensions can be used, in conjunction with a genetic algorithm, to optimise the PPM schedules of a simplified model of a fuel-oil system supplying the engine of a ship, and in section 5 we present the results. Finally, in section 6, we draw conclusions.

10.3182/20100908-3-PT-3007.00039

2. HIERARCHICALLY PERFORMED HAZARD ORIGIN AND PROPAGATION STUDIES (HIP-HOPS)
HiP-HOPS is a state-of-the-art compositional system dependability (i.e. safety, reliability and availability) analysis technique. It offers a significant degree of automation and reuse, addressing problems arising from the increasing complexity of systems. With HiP-HOPS, the topology of a system is used together with reusable local failure specifications at component level to automatically produce a network of interconnected fault trees and an FMEA (Failure Modes and Effects Analysis) for the system. The technique is supported by an automated tool which currently works in conjunction with modelling tools such as Matlab Simulink and Simulation X, and which can also be interfaced to other modelling packages.

HiP-HOPS defines a language for the description of failure behaviour at component level. In the basic version of this language, the failure behaviour of a component is specified as a list of internal failure modes of the component (internal malfunctions) and a list of deviations of parameters as they can be observed at component outputs (output deviations). Each internal malfunction is optionally accompanied by quantitative data, for example a failure rate and a repair rate if these are known. Output deviations carry Boolean expressions which describe their causes as a logical combination of internal malfunctions of the component and similar deviations of parameters at component inputs (input deviations).

HiP-HOPS has recently been extended with multi-objective optimisation capabilities. These allow the tool to search the design space, defined by the variability of a design model, for potential design solutions that are optimal, or near optimal, in terms of dependability and cost. In this approach, a variable design model for a system is one in which components and subsystems have alternative user-defined implementations, which can include standard fault-tolerant configuration schemes. HiP-HOPS uses a multi-objective genetic algorithm to search the design space defined by the permutations of the design that can arise following resolution of variability. The genetic algorithm exploits the automated fault tree and FMEA synthesis and analysis algorithms of the tool to calculate the fitness of candidate designs.
The goal is to identify Pareto optimal architectures for the system, which give optimal trade-offs between dependability, cost and other parameters. In the remainder of the paper, we show how HiP-HOPS can be extended to take PPM into account in the evaluation of reliability and availability, and how the extended model can then be used to optimise the PPM schedules of the components of a system.

3. PPM MODELS
The age reduction model presumes that after each maintenance action, the age of a given component assumes a new value known as the effective age. The effective age Wj after the jth PM stage is expected to fall within the following bounds:

$$
\begin{aligned}
0 &\le W_1 \le t_0 + T, && j = 1\\
0 &\le W_2 \le W_1 + T, && j = 2\\
0 &\le W_3 \le W_2 + T, && j = 3\\
&\;\;\vdots\\
0 &\le W_j \le W_{j-1} + T && (1)
\end{aligned}
$$

For PPM, the effective age is renewed to t0 (where typically t0 = 0) at every αT. However, the CoMI differs from component to component, and therefore the PM interval Tp of component i is expressed as:

$$T_{p_i} = \alpha_i T \qquad (2)$$

Fig. 1 shows a component whose time line runs from t0 to τ (i.e. its mean time to failure or useful life). The PM intervals, illustrated in the form αT, lie between these boundaries, as does an arbitrary time t, where t0 ≤ t ≤ τ. At any time t, the remaining time trem after the jth PM stage can be modelled as:

$$
\begin{aligned}
t_{rem} &= t - T_p, && j = 1, \quad T_p \le t < 2T_p\\
t_{rem} &= t - 2T_p, && j = 2, \quad 2T_p \le t < 3T_p\\
t_{rem} &= t - 3T_p, && j = 3, \quad 3T_p \le t < 4T_p
\end{aligned}
$$

From the above, by induction, for n PM operations:

$$t_{rem} = t - nT_p, \qquad nT_p \le t < (n+1)T_p \qquad (3)$$
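Equation 3 reduces to locating the last completed PM stage before time t. A minimal Python sketch, assuming integer time units; the helper name `pm_stage` and the illustrative values T = 120 and α = 4 (so that Tp = αT = 480 by equation 2) are hypothetical:

```python
import math


def pm_stage(t, tp):
    """Return (n, t_rem) for eq. 3: completed PM stages and time since the last PM.

    n is the largest integer with n * tp <= t, so t_rem = t - n * tp lies in [0, tp).
    """
    if tp <= 0 or t < 0:
        raise ValueError("expected tp > 0 and t >= 0")
    n = math.floor(t / tp)
    return n, t - n * tp


T, alpha = 120, 4        # illustrative base interval and CoMI
tp = alpha * T           # component PM interval, eq. 2
for t in (100, 480, 1000, 1500):
    n, t_rem = pm_stage(t, tp)
    print(f"t = {t:4d}: n = {n}, t_rem = {t_rem}")
```

Under PPM with t0 = 0, t_rem is also the component's effective age at time t, because every PM action resets that age to zero.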

Fig. 1. Time line illustration of PM times (the time axis runs from t0 through Tp, 2Tp, ..., nTp to τ, with an arbitrary time t marked).

The total number n of PM stages that can be performed on a given component of a system can be predetermined as follows:

$$
n =
\begin{cases}
Q\left(\dfrac{\mathrm{MTTF}}{T_p}\right), & \mathrm{MTTF} \le RT\\[1ex]
Q\left(\dfrac{RT}{T_p}\right), & \mathrm{MTTF} > RT
\end{cases}
$$

where MTTF is the mean time to failure of the component, RT is the useful operational lifetime of the system, also known as the system risk time, and Q is the integer quotient of the division.

3.1 Reliability and Unavailability Model under PPM
Two scenarios need to be considered in modelling the reliability of a component under PM:
i) the probability of surviving until PM time nTp, and
ii) the probability of surviving the remaining time t − nTp, where nTp ≤ t ≤ τ.
According to Tsai et al (2004, eq 1), the reliability model of a system at the jth PM stage is constructed as the product of these two probabilities. In general, this can be represented as:

$$R_j(t) = R_{oj}\, R_{vj}(t - jT_p), \qquad jT_p \le t < (j+1)T_p \qquad (4)$$

where Roj is the probability of surviving until the jth PM stage, whereas Rvj is the probability of surviving the remaining time. For a component with n PM intervals, reliability can be expressed as the product of the probability of surviving each PM stage and the probability of surviving the remaining time:

$$R_m(t) = \left[\prod_{j=1}^{n} \frac{R(W_{j-1} + T_p)}{R(W_{j-1})}\right] \frac{R(W_n + t - nT_p)}{R(W_n)}, \qquad nT_p \le t < (n+1)T_p$$

For a constant PM interval and assuming PPM (Wj = 0 for every j), the above formula evaluates to the following (Ebeling 1997, eq 9.25):

$$R_m(t) = R(T_p)^{n}\, R(t - nT_p), \qquad nT_p \le t < (n+1)T_p \qquad (5)$$

The first part of the product is the probability of surviving n PM intervals, while the second part is the probability of surviving the remaining time.
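Equations 5 to 7 can be checked numerically. The Python sketch below treats the item as a single Weibull component with β = 2 and γ = 0, the values used in section 5.1. The scale θ = 1500 is an assumption, not a value given in the paper: it is simply the scale that reproduces the Table 2 figures for Tp = 180. The function names are hypothetical.

```python
import math


def weibull_r(t, theta, beta):
    """Reliability without PM: Weibull with location parameter gamma = 0."""
    return math.exp(-((t / theta) ** beta))


def r_ppm(t, tp, theta, beta):
    """Reliability under perfect PM every tp: R(tp)^n * R(t - n*tp) (eqs. 5 and 6)."""
    n = math.floor(t / tp)  # completed PM stages, n*tp <= t < (n+1)*tp
    return weibull_r(tp, theta, beta) ** n * weibull_r(t - n * tp, theta, beta)


def u_ppm(t, tp, theta, beta):
    """Unavailability without repair, U_m(t) = 1 - R_m(t) (eq. 7)."""
    return 1.0 - r_ppm(t, tp, theta, beta)


def n_stages(mttf, rt, tp):
    """PM stages a component can receive: integer quotient of min(MTTF, RT) by tp."""
    return int(min(mttf, rt) // tp)


theta, beta, tp = 1500.0, 2.0, 180.0   # theta is an assumed value (see lead-in)
print(" time  no PM     PPM       U_m")
for t in range(0, 1441, 180):
    print(f"{t:5d}  {weibull_r(t, theta, beta):.6f}  "
          f"{r_ppm(t, tp, theta, beta):.6f}  {u_ppm(t, tp, theta, beta):.6f}")
```

At each PM time t = nTp the PPM column equals R(Tp)^n; for example, at t = 360 it prints R(180)^2 = 0.971611, the with-PPM value in Table 2.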

The Weibull model for equation 5 is given in equation 6 (Ebeling 1997, p. 205), with location parameter γ = 0:

$$R_m(t) = \exp\left[-n\left(\frac{T_p}{\theta}\right)^{\beta}\right] \exp\left[-\left(\frac{t - nT_p}{\theta}\right)^{\beta}\right], \qquad nT_p \le t < (n+1)T_p \qquad (6)$$

Assuming that there is no repair, the unavailability of a component is modelled as in equation 7:

$$U_m(t) = 1 - R_m(t) \qquad (7)$$

In this paper, component reliability and unavailability under PPM are evaluated using equations 6 and 7 respectively. System reliability and unavailability are then evaluated using the Esary-Proschan approximation (Jin and Coit 2003, eq 2.1), which is applied to the minimal cut sets of the fault trees produced by HiP-HOPS analysis of a system model.

3.3 PPM Cost Model
The total preventive maintenance cost varies with the total number of PM stages for a component. Because the PPM policy assumes that no repair takes place, our model, given in equation 8, is a simple one. It also does not account for component replacement, which is outside the scope of this work.

$$C_{TC_i} = C_{c_i} + n_i\, C_{ppm_i} \qquad (8)$$

where CTCi is the total cost for the ith component, Cppmi is the cost of performing PPM for the ith component, Cci is the unit cost of the ith component, and ni is the total number of PM stages for the ith component.

Using equation 8, the total system cost CTS under PPM can be expressed as:

$$C_{TS} = \sum_{i=1}^{m} C_{TC_i} \qquad (9)$$

where m is the number of system components.

4. OPTIMISATION
The established PPM models and the optimisation are demonstrated on a simplified model of the main engine fuel oil service of a ship, shown in fig. 2. The system incorporates a service tank which contains stored fuel oil. The booster pump conveys fuel oil to the mixing tank through a filter and a flow meter. Should the pressure in the mixing tank exceed a defined level, fuel oil is released back to the service tank through a pipe connecting the two. The circulation pump then conveys fuel oil to the main engine through a heater, a viscosity meter and a filter. Excess fuel oil not used by the main engine is returned to the service tank via the mixing tank.

Fig. 2. Fuel oil service system.

The components of the above system are annotated with HiP-HOPS failure behaviour data. A detailed presentation of these annotations is not possible in the space available, but the component failure behaviour is simple: each component has a single failure mode which causes omission of its outputs, while input failures typically propagate to the outputs of components.

The aim of the optimisation is to define an optimal PM schedule for the components of the system. An optimal schedule is one that optimises system cost and unavailability (and consequently reliability, as the two are related by equation 7). In such a schedule, each component i of the system is maintained at an interval αiT, a multiple of the shortest PM interval T of the system. Formally, the objective of optimisation is to find a solution A, where A is a set of CoMIs A = {α1, α2, α3, ..., αm-1, αm}, m is the number of components constituting the system and αi is the CoMI of the ith component. The optimisation problem is then defined as:

$$\text{minimise } F(A) = \{\, U_s(A),\; C_s(A) \,\}$$

subject to the following constraints:

$$T < \frac{1}{\lambda_H}, \qquad \alpha_i T < \frac{1}{\lambda_i}, \quad i = 1, \ldots, m$$

where Us and Cs are the objective functions representing system unavailability and system cost, λH is the average failure rate of the component that fails most frequently, and λi is the average failure rate of component i.
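Combining equations 6 to 9 with the two constraints, a single candidate schedule can be scored roughly as follows. This is a hypothetical Python sketch, not the HiP-HOPS implementation: the three-component cut sets, Weibull scales and costs are invented for illustration, evaluating unavailability at the end of the risk time RT is an assumption, and the Esary-Proschan step uses the standard minimal-cut-set form U_s ≈ 1 − ∏k(1 − ∏i∈Ck Ui).

```python
import math


def u_ppm(t, tp, theta, beta):
    """Component unavailability under PPM without repair (eqs. 6 and 7)."""
    n = math.floor(t / tp)
    r = math.exp(-n * (tp / theta) ** beta) * math.exp(-(((t - n * tp) / theta) ** beta))
    return 1.0 - r


def esary_proschan(cut_sets, u):
    """U_s ~= 1 - prod_k (1 - prod_{i in C_k} u_i) over the minimal cut sets C_k."""
    p_no_cut = 1.0
    for cs in cut_sets:
        p_no_cut *= 1.0 - math.prod(u[i] for i in cs)
    return 1.0 - p_no_cut


def system_cost(unit_cost, ppm_cost, n_stages):
    """C_TS = sum_i (C_ci + n_i * C_ppmi) (eqs. 8 and 9)."""
    return sum(c + n * p for c, p, n in zip(unit_cost, ppm_cost, n_stages))


def feasible(alphas, T, rates):
    """T < 1/lambda_H and alpha_i * T < 1/lambda_i for every component i."""
    return T < 1.0 / max(rates) and all(a * T < 1.0 / lam for a, lam in zip(alphas, rates))


# Hypothetical system: component 0 in series with a redundant pair (1, 2).
cut_sets = [{0}, {1, 2}]
beta, T, RT = 2.0, 180, 1440
theta = [3000.0, 2500.0, 2500.0]                        # assumed Weibull scales
mttf = [th * math.gamma(1 + 1 / beta) for th in theta]  # Weibull mean life
rates = [1 / m for m in mttf]                           # average failure rates
alphas = [2, 3, 3]                                      # candidate schedule A

if feasible(alphas, T, rates):
    u = [u_ppm(RT, a * T, th, beta) for a, th in zip(alphas, theta)]
    n = [int(min(m, RT) // (a * T)) for a, m in zip(alphas, mttf)]
    print("U_s  =", round(esary_proschan(cut_sets, u), 6))
    print("C_TS =", system_cost([100, 80, 80], [10, 8, 8], n))
```

With these made-up numbers the components receive 4, 2 and 2 PM stages, so the cost line prints C_TS = 332.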

The first of the above two constraints requires that the shortest PM interval T be smaller than the mean time to failure (MTTF) of the component that fails most frequently in the system. The second constraint requires that, for each component i, its PM interval be smaller than the MTTF of that component. Together, these constraints ensure that maintenance is effective and is not scheduled so late that the reliability of components has already dropped too far.

4.1 Genetic Encoding
For the purposes of genetic encoding, the CoMI of each component is encoded as an integer such that

$$1 \le \alpha_i < \frac{1}{\lambda_i T}$$

which is equivalent to

$$1 \le \alpha_i < \frac{\mathrm{MTTF}_i}{T}$$

where MTTFi is the mean time to failure of the ith component. In HiP-HOPS, system models are configured such that each component's CoMI parameter is set to the value held in the encoding. Table 1 shows an example of a PM encoding for a system of nine components. Supposing the shortest maintenance interval T = 120 and considering index i = 6, the PM interval Tp of the 6th component is α6T = 4 × 120 = 480.

Table 1. Example of a PM encoding
Component index   CoMI
1                 5
2                 7
3                 3
4                 1
5                 8
6                 4
7                 5
8                 9
9                 8

4.2 GA Selection Method
For optimisation of the PM schedule of the fuel system, we have used a variant of the Non-Dominated Sorting Genetic Algorithm (NSGA-II), which is already implemented in the HiP-HOPS tool. To enable optimisation of PM schedules, we have extended the HiP-HOPS tool to evaluate reliability under PPM and adapted the genetic algorithm to work with the above encoding.

The mechanics of the adapted algorithm are briefly outlined below; an extensive explanation of the original NSGA-II can be found in Deb et al (2002). The algorithm first generates a random initial population P of N individuals p. The following steps are then executed:
i. ∀p ∈ P, configure p using the encoding to set the CoMI of each component, and then evaluate the unavailability and cost (objective functions) of the system by calling the automatic fault tree synthesis and analysis functions of HiP-HOPS.
ii. ∀p ∈ P, find np, the number of solutions that dominate p, and Sp, the set of solutions that p dominates.
iii. Add all p with np = 0 into the set F1, the first front (Rp = 1), where R refers to the domination rank.
iv. For each p ∈ F1, visit each q ∈ Sp and decrement nq by 1. If nq becomes 0, add q to the set F2 (q belongs to the second front, Rq = 2).
v. Repeat step iv with each member of F2 to find the third front, and so on.

To maintain diversity, the following crowding calculation and comparison is used:
i. All l solutions in a front are sorted in ascending order of the objective function fm (i.e. unavailability), and the crowding distance CD is computed as follows, where x refers to a solution:

$$CD_m(x_1) = CD_m(x_l) = \infty, \qquad CD_m(x_j) = \frac{f_m(x_{j+1}) - f_m(x_{j-1})}{f_m^{\max} - f_m^{\min}}, \quad j = 2, \ldots, l-1 \qquad (10)$$

ii. Step i is repeated for the other objective function (i.e. cost). Finally, the crowding distance of each solution j is computed as:

$$CD(x_j) = \sum_{m=1}^{M} CD_m(x_j) \qquad (11)$$

where M is the number of objective functions, in this case two (unavailability and cost). The crowding comparison operator used for selection is as follows:
• Given two solutions x and y, solution x is preferred to solution y if Rx < Ry or (Rx = Ry and CDx > CDy), where Rx and Ry are the non-domination ranks of solutions x and y respectively.

In each generation t, a child population Qt of N new potential solutions is created as follows:
i. Binary tournament selection is used to select parents from population Pt; the genetic operators crossover and mutation (explained in section 4.3) are applied to create a child solution encoding.
ii. ∀p ∈ Qt, p is configured to reflect the genetic changes made by the genetic operators, and the values of the objective functions (unavailability and cost) are calculated.
iii. Pt and Qt are combined, i.e. Rt = Pt ∪ Qt, and Rt is sorted based on non-domination.
iv. From the 2N solutions (the combination of Pt and Qt) in Rt, the N best solutions are selected using the crowding calculation and comparison to form Pt+1.

4.3 Genetic Operators
The genetic operators, crossover and mutation, are used to create new potential solutions. Crossover recombines the information contained in the encodings of the parents; this effects a local search of the space around the parents and promotes convergence of the search. Mutation randomly changes genetic make-up and promotes population diversity. The synergistic combination of the two operators with the selection algorithm guides the search. In this study, uniform crossover is performed with probability pc = 0.9. Each locus of a child solution can accept only one CoMI from either parent; therefore, at each locus, each parent has a 50% chance of contributing its CoMI to the child. Mutation is also used to alter the CoMI of a child in order to introduce new traits into the solution. If l is the length of p ∈ Qt, a locus li, with 1 ≤ li ≤ l, is selected at random with probability pm = 0.167. Mutation is then performed at this locus, and the mutated CoMI αi of the ith component must lie between 1 and its maximum value αimax, satisfying the precondition 1 ≤ αi ≤ αimax. The maximum CoMI αimax is obtained from equation 12:

$$\alpha_{i\max} = Q\left(\frac{1}{\lambda_i T}\right) \qquad (12)$$
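The encoding bounds and the two genetic operators described above can be sketched in Python as follows. The MTTF values are hypothetical, chosen so that with T = 120 the maxima equal the CoMIs of Table 1. Treating pm = 0.167 as a per-locus mutation probability is one reading of the text, not something the paper confirms, and the function names are illustrative.

```python
import random


def alpha_max(mttf, T):
    """Largest CoMI per eq. 12, taken as the integer quotient of MTTF_i by T.

    max(1, ...) is only a guard; the first constraint already keeps T below every MTTF.
    """
    return max(1, int(mttf // T))


def random_encoding(mttfs, T, rng):
    """One integer CoMI per component, uniform in 1..alpha_max."""
    return [rng.randint(1, alpha_max(m, T)) for m in mttfs]


def uniform_crossover(p1, p2, rng, pc=0.9):
    """With probability pc, each locus takes its CoMI from either parent (50/50)."""
    if rng.random() >= pc:
        return list(p1)  # no crossover: the child copies the first parent
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]


def mutate(child, mttfs, T, rng, pm=0.167):
    """Each locus is redrawn from 1..alpha_max with probability pm (assumed per-locus)."""
    return [rng.randint(1, alpha_max(m, T)) if rng.random() < pm else a
            for a, m in zip(child, mttfs)]


rng = random.Random(42)
T = 120
mttfs = [700, 900, 400, 200, 1000, 500, 650, 1100, 1000]  # hypothetical MTTF_i
print([alpha_max(m, T) for m in mttfs])                     # [5, 7, 3, 1, 8, 4, 5, 9, 8]
p1 = random_encoding(mttfs, T, rng)
p2 = random_encoding(mttfs, T, rng)
child = mutate(uniform_crossover(p1, p2, rng), mttfs, T, rng)
print(p1, p2, child, sep="\n")
```

Because mutation redraws only within 1..αimax, every child produced this way automatically satisfies the precondition 1 ≤ αi ≤ αimax.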

4.4 Evaluation
Each candidate solution in the population is evaluated with respect to the objectives of the optimisation (unavailability and cost). The total PM cost of each component is evaluated using equation 8, while the total cost of each individual p (p ∈ P or p ∈ Qt) is evaluated using equation 9. The unavailability of each component is evaluated using equation 7. System unavailability is evaluated from the minimal cut sets produced for the configuration of each individual (system), using the Esary-Proschan approximation formula, which has also been used for unavailability prediction by Parker and Papadopoulos (2007) in addressing the redundancy allocation problem with HiP-HOPS.

5. RESULTS
The established models for evaluation of unavailability, reliability and cost under PPM, and the PM scheduling optimisation approach described in this paper, were implemented in HiP-HOPS and applied to the fuel oil system.

5.1 Non-Optimised Implementation
To illustrate the improvement in reliability due to PPM, a simple PM schedule was defined manually and the reliability of the fuel system was analysed for two cases: without PM and with PPM. Table 2 compares the results obtained for the two cases using a system PM interval of 180 and a CoMI of 1 for all components. The reliability model with no PM under the Weibull distribution is shown in equation 13 (Márquez 2007, eq 4.9). The value of the slope (shape) parameter used is β = 2, and the location parameter is γ = 0.

$$R(t) = \exp\left[-\left(\frac{t}{\theta}\right)^{\beta}\right] \qquad (13)$$

Table 2. Reliability with and without PPM
Time (nTp)   Reliability without PM   Reliability with PPM
0            1                        1
180          0.985703                 0.985703
360          0.944027                 0.971611
540          0.878447                 0.95772
720          0.794216                 0.944027
900          0.697676                 0.930531
1080         0.595473                 0.917227
1260         0.493812                 0.904114
1440         0.397882                 0.891188

System reliability is clearly improved with PPM. Fig. 3 is the graphical representation of Table 2 with the addition of the probability of the system surviving the remaining time, R(t − nTp). The graph conforms to expected patterns, and the reliability of the system can be seen to improve at PM times.

Fig. 3. Fuel oil system reliability with and without PPM (reliability against time; curves: reliability of surviving remaining time R(t − nTp), reliability with PPM Rm(t), reliability without PM R(t)).

A closer view of the effect of PPM is shown in fig. 4, which reveals more about these patterns. Rm(t) shows the cumulative effect of PPM on system reliability, while R(t − nTp) shows that the system is restored to GAN after each PM action.

Fig. 4. Closer view of the effect of PPM on fuel oil system reliability (reliability against time; curves: reliability of surviving remaining time R(t − nTp) and reliability with PPM Rm(t)).

5.2 PM Scheduling – Optimisation
In this experiment, a set of optimal PPM schedules for the fuel oil system was generated by the genetic algorithm. The shortest PM interval T was set to 180. With initial and child population sizes of 150, after 1056 generations HiP-HOPS produced a set of 186 trade-off PM schedules, shown in the Pareto frontier of fig. 5. Each of these solutions corresponds to a different PM schedule representing an optimal trade-off between the objectives of optimisation (system unavailability and cost).

Each PM schedule is optimal in the sense that it gives the best unavailability (and consequently reliability) for a given level of cost. An engineer can choose the PM schedule that best meets the requirements and obtain the maintenance schedule of each component from HiP-HOPS.
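Each schedule on the frontier in fig. 5 belongs to the first non-dominated front. The ranking and crowding steps of section 4.2 can be sketched in Python as follows, written from the description above and the original NSGA-II paper (Deb et al 2002). The (unavailability, cost) pairs are hypothetical, not results from the case study, and the function names are illustrative.

```python
import math


def dominates(a, b):
    """True if a is no worse than b in every objective and better in one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fronts(objs):
    """Fast non-dominated sort; returns fronts F1, F2, ... as lists of indices."""
    n = [0] * len(objs)          # n_p: number of solutions dominating p
    s = [[] for _ in objs]       # S_p: solutions dominated by p
    for p, a in enumerate(objs):
        for q, b in enumerate(objs):
            if dominates(a, b):
                s[p].append(q)
            elif dominates(b, a):
                n[p] += 1
    fronts = [[p for p, count in enumerate(n) if count == 0]]
    while fronts[-1]:
        nxt = []
        for p in fronts[-1]:
            for q in s[p]:
                n[q] -= 1        # q loses one dominator
                if n[q] == 0:
                    nxt.append(q)
        fronts.append(nxt)
    return fronts[:-1]           # drop the trailing empty front


def crowding_distance(objs, front):
    """Eqs. 10 and 11: boundary solutions get infinity, others sum normalised gaps."""
    cd = {p: 0.0 for p in front}
    if not front:
        return cd
    for m in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda p: objs[p][m])
        f_min, f_max = objs[ordered[0]][m], objs[ordered[-1]][m]
        cd[ordered[0]] = cd[ordered[-1]] = math.inf
        if f_max == f_min:
            continue
        for k in range(1, len(ordered) - 1):
            cd[ordered[k]] += (objs[ordered[k + 1]][m] - objs[ordered[k - 1]][m]) / (f_max - f_min)
    return cd


def crowded_better(x, y, rank, cd):
    """Crowded comparison: lower rank wins; ties go to the larger crowding distance."""
    return rank[x] < rank[y] or (rank[x] == rank[y] and cd[x] > cd[y])


objs = [(0.08, 11.0), (0.05, 12.0), (0.02, 14.0), (0.06, 12.5), (0.09, 13.0)]
fronts = non_dominated_fronts(objs)
print(fronts)                                  # [[0, 1, 2], [3], [4]]
rank = {p: r for r, front in enumerate(fronts, 1) for p in front}
cd = {}
for front in fronts:
    cd.update(crowding_distance(objs, front))
print(crowded_better(1, 3, rank, cd))          # True: solution 1 is in a better front
```

The boundary solutions of each front receive an infinite crowding distance, so the cheapest and the most available schedules are always retained when a front has to be truncated to form Pt+1.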

Fig. 5. Pareto frontier of PM schedules (unavailability against cost; points: optimal PPM schedules).

6. CONCLUSION
This paper extended HiP-HOPS, a state-of-the-art dependability analysis technique, with capabilities for: a) reliability, unavailability and cost analysis in scenarios where PPM of components can take place, and b) automatic multi-objective optimisation of PM schedules. The particular formulation and solution of the maintenance scheduling problem represents a wider contribution to research on optimisation in this area. The PPM model was applied to a ship fuel oil system case study, and the results were shown to conform to expectations about system performance under PPM. The PM schedules of this system were subsequently optimised by HiP-HOPS using a multi-objective genetic algorithm, originally used for architectural optimisation and adapted in this work to optimise PM schedules. Further work is required in the following areas:
• Calculation of optimal component replacement times, and of the number of replacements throughout the life of the system, defined by a predetermined minimum value below which reliability should not drop.
• Possible grouping of components for PM at the same time as a cost-saving measure.
• Support for expert judgement on component PM intervals in the optimisation of PM schedules.
• Optimisation of PM schedules that also allows substitution of components with alternatives, which is useful for optimising system design at early design stages.
• Optimisation of PM schedules for imperfect preventive maintenance models.

ACKNOWLEDGEMENTS
This work was supported by the EU Project ATESST-2 (Grant 224442). We would like to thank Germanischer Lloyd (Erich Ruede, Rainer Hamann) for providing part of the case study.

REFERENCES
Deb, K., Pratap, A., Agarwal, S. and Meyarivan, T., 2002, A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II, IEEE Transactions on Evolutionary Computation, (6)2, pp182-197
Ebeling, C.E., 1997, An Introduction to Reliability and Maintainability Engineering, USA: McGraw-Hill
Jin, T. and Coit, D.W., 2003, Approximating Network Reliability Estimates Using Linear and Quadratic Unreliability of Minimal Cuts, Reliability Engineering and System Safety, (82)1, pp41-48
Márquez, A.C., 2007, The Maintenance Management Framework: Models and Methods for Complex Systems Maintenance, Springer Series in Reliability Engineering, London: Springer
Martorell, S., Sanchez, A. and Serradell, V., 1999, Age-Dependent Reliability Model Considering Effects of Maintenance and Working Conditions, Reliability Engineering and System Safety, (64)1, pp19-31
Parker, D.J. and Papadopoulos, Y.I., 2007, Effective Multicriteria Redundancy Allocation Via Model-Based Safety Analysis, In: Intelligent Manufacturing Systems, Alicante, Spain
Storey, N., 1996, Safety-Critical Computer Systems, London: Addison Wesley Longman
Tsai, Y., Wang, K. and Tsai, L., 2004, A Study of Availability-Centered Preventive Maintenance for Multi-Component Systems, Reliability Engineering and System Safety, (84)3, pp261-270
Wang, H. and Pham, H., 2006, Availability and Maintenance of Series Systems Subject to Imperfect Repair and Correlated Failure and Repair, European Journal of Operational Research, (174)3, pp1706-1722