Microelectronics Reliability 50 (2010) 1230–1235
The FMEDA approach to improve the safety assessment according to the IEC61508
M. Catelani, L. Ciani *, V. Luongo
Department of Electronics and Telecommunications, University of Florence - Via S. Marta 3, 50139 Firenze, Italy
Article info
Article history: Received 29 June 2010; accepted 19 July 2010; available online 10 August 2010
Abstract
With reference to the IEC61508 Standard, the paper presents a case study concerning the evaluation of both the safe failure fraction (SFF) and the probability of failure on demand (PFD) for a complex system. After a preliminary presentation of the criteria for safety integrity level (SIL) verification, the work focuses on the method used to obtain the PFD. In particular, an approach based on failure modes, effects and diagnostic analysis (FMEDA) is proposed and then compared with the approach described in the Standard. The paper aims to clarify both the understanding and the application of IEC61508 and proposes a technique to satisfy the hardware safety integrity requirements.
© 2010 Elsevier Ltd. All rights reserved.
1. Introduction
In many fields of application it is essential to reduce the consequences of hazardous events that could harm people's health and the environment. In this context a key role is played by the safety system, whose function is to provide an independent protection layer that implements the safety function required to achieve, or maintain, a safe state of electrical, electronic, and/or programmable electronic (E/E/PE) devices. For this purpose the IEC61508 Standard [1] is a guide for the design, validation and verification of a safety instrumented system (SIS). The Standard states criteria and guidelines that enable the management of a safety-related system from the first phases of the project up to the withdrawal of the product from the market. The fundamental purpose of a SIS is to bring the plant, or a piece of equipment, to a safe state if a hazardous event occurs, in order to mitigate its consequences for humans, the environment, and material assets.
Safety integrity is one of the main concepts in this Standard: it is defined as the probability of a SIS satisfactorily performing the required safety functions under all the stated conditions within a specific time interval [1]. To this end, the Standard defines four discrete levels of safety integrity, denoted as safety integrity levels (SILs). SILs specify the safety integrity requirements of the safety functions (denoted as safety instrumented functions) to be allocated to the safety instrumented system; in particular, SIL 4 corresponds to the highest level of safety integrity, while SIL 1 is the lowest. To fulfil a specified SIL, all the safety integrity requirements must be met [2]. They are split into requirements for systematic safety integrity and hardware safety integrity. The systematic requirements refer to the techniques and measures that avoid systematic faults during design and development. The hardware safety integrity requirements, instead, comprise quantitative and qualitative requirements.
The quantitative requirements concern the estimation of the probability of failure on demand (PFD) or the probability of failure per hour (PFH), according to the frequency of demand. A demand is a process deviation that must be handled by the SIS. Low demand mode means that the SIS experiences a low frequency of demands, typically less than once per year, and the PFD parameter must be calculated. If more frequent demands are expected for the safety function, typically several times a year, the SIS operates in high demand or continuous mode and the PFH must be taken into account. These parameters indicate, for each safety function, the average probability that the SIS fails to perform its design function on demand. They concern only the dangerous random failures, that is, physical failures in which the supplied service deviates from the specified service because of physical degradation of the item.
The qualitative requirements, instead, are related to the architectural constraints that limit the achievable SIL on the basis of the hardware fault tolerance (HFT) and the safe failure fraction (SFF) of the sub-system. The HFT represents the ability of a functional unit (hardware) to continue to perform a required function in the presence of faults or errors. An HFT of N, for example, means that N + 1 faults could cause a loss of the safety function. The SFF gives the fraction of the overall hardware failure rate of a device that is considered as safe or detected dangerous failure. Referring to [3], the SFF can be evaluated as the ratio of the sum of the safe failure rates λ_S and the detected dangerous failure rates λ_DD to the sum of all failure rates λ_T (safe and dangerous), that is:
\[ \mathrm{SFF} = \frac{\sum \lambda_{S} + \sum \lambda_{DD}}{\sum \lambda_{T}} \tag{1} \]
* Corresponding author. E-mail address: lorenzo.ciani@unifi.it (L. Ciani).
0026-2714/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.microrel.2010.07.121
M. Catelani et al. / Microelectronics Reliability 50 (2010) 1230–1235
In addition to the SFF, the diagnostic coverage (DC) is another important parameter for the safety analysis. It is defined as the ratio of the rate of dangerous failures detected by the diagnostic tests, λ_DD, to the total dangerous failure rate λ_D of the component or sub-system:
\[ \mathrm{DC} = \frac{\sum \lambda_{DD}}{\sum \lambda_{D}} \tag{2} \]
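As an illustration of Eqs. (1) and (2), the following minimal Python sketch computes the SFF and the DC from the four failure-rate categories. The class name `FailureRates`, the function names and the numerical rates are hypothetical and are not taken from the case study; they only serve to exercise the two ratios.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRates:
    """Failure rates of one component or sub-system, in failures per hour."""
    safe_detected: float
    safe_undetected: float
    dangerous_detected: float
    dangerous_undetected: float

    @property
    def safe(self) -> float:
        # lambda_S: all safe failures, detected or not
        return self.safe_detected + self.safe_undetected

    @property
    def dangerous(self) -> float:
        # lambda_D: all dangerous failures, detected or not
        return self.dangerous_detected + self.dangerous_undetected

    @property
    def total(self) -> float:
        # lambda_T: safe plus dangerous failures
        return self.safe + self.dangerous


def safe_failure_fraction(rates: FailureRates) -> float:
    """Eq. (1): SFF = (lambda_S + lambda_DD) / lambda_T."""
    if rates.total <= 0:
        raise ValueError("total failure rate must be positive")
    return (rates.safe + rates.dangerous_detected) / rates.total


def diagnostic_coverage(rates: FailureRates) -> float:
    """Eq. (2): DC = lambda_DD / lambda_D."""
    if rates.dangerous <= 0:
        raise ValueError("dangerous failure rate must be positive")
    return rates.dangerous_detected / rates.dangerous


# Hypothetical rates (per hour), chosen only to exercise the formulas.
example = FailureRates(
    safe_detected=2e-7,
    safe_undetected=3e-7,
    dangerous_detected=4e-7,
    dangerous_undetected=1e-7,
)
sff = safe_failure_fraction(example)
dc = diagnostic_coverage(example)
```

For a sub-system made of several components, the sums in Eqs. (1) and (2) would be obtained by adding the corresponding fields of each component before taking the ratios.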
The paper is structured as follows. Section 2 focuses on an alternative analysis for SFF and PFH evaluation, based on the failure modes, effects and diagnostic analysis (FMEDA) technique. The same section also addresses a case study for SIL verification concerning a complex safety system with a particular architecture, working in high demand mode. In Section 3 both the SFF and PFH parameters are evaluated, and some critical points and ambiguities in the application of IEC61508, compared with the FMEDA approach, are underlined.
2. The proposed approach
A SIS is structured in three main sub-systems, as shown in Fig. 1: input elements, logic solvers, and actuators or final elements. The input elements detect the onset of hazardous events (e.g. sensors, push buttons, switches), the logic solver decides what to do (e.g. programmable logic devices), and the final elements act on that decision (e.g. valves, solenoids and circuit breakers). The SIS may perform one or more safety instrumented functions (SIFs). The logic solver is often a component shared by all the SIFs implemented by the same SIS.
The system under investigation is an electronic parking brake protection system that implements several safety functions. Following the general structure, the safety system of this equipment is shown in Fig. 2; it is composed of four sensors, two microprocessors (logic solver) and four actuators. In this case, each block of the safety system is configured as 1oo2: this architecture consists of two channels, each of which can
Fig. 1. General structure of a SIS – safety instrumented system.
Fig. 2. Parking brake SIS.
execute the safety function by itself. If one channel fails dangerously, the other channel is still able to execute the safety function and thus the safety system is still available. It was assumed that any diagnostic testing would only report the faults found and would not change any output states. Furthermore, we assume that the system works in high demand or continuous mode of operation because of its field of application [4].
In order to prove the achievement of a specified SIL, the hardware safety integrity requirements must be met. Our approach, as shown in Fig. 3, is based on reliability prediction and FMEDA [5], which allow all the fault states of the component or sub-system under examination to be identified. After the FMEDA, the SFF and PFH are evaluated in order to demonstrate that sufficient fault tolerance is ensured and that the safety function is able to meet the reliability target.
The first step is the reliability prediction, which plays a fundamental role in the electrical and electronic field [6]. Under specified hypotheses, it can be performed by using specific databases or field data. Nevertheless, as is well known, the reliability evaluation depends on the particular application of the system. In the electrical and electronic field, as well as in the mechanical one, some information is strictly necessary for this purpose: operating conditions, environment, stress, life and wear-out, and so on. In the mechanical field this is a more difficult problem than in electronics, owing to the presence of more failure modes and mechanisms and to strong maintenance dependencies.
Fig. 3. Necessary steps for SIL verification.
In each case, field data and data from laboratory tests must be taken into account in order to obtain information on failure modes and mechanisms and to guarantee a more accurate reliability prediction, according to the different methods proposed in the literature [7].
After the reliability prediction and the failure rate evaluation, the second step shown in Fig. 3 concerns the failure modes, effects and diagnostic analysis. FMEDA [8] can be considered an extension of failure mode and effect analysis (FMEA) that also identifies diagnostic techniques. Referring to a SIS, the main purposes of FMEDA can be summarized as:
- Identification of the possible ways in which a safety system can fail to perform its designed functions, and of the consequences of those failures.
- Definition of the measures that can be implemented to detect or prevent a failure, or to reduce the criticality of its consequences.
- Collection of information for the safe failure fraction (SFF) calculation.
In fact, as shown in Fig. 4, FMEDA requires, for each component, the failure rate, the failure modes and their probability of occurrence, as well as the effects on the safety-related output signal, which can be classified as safe or dangerous. An important additional piece of information is the diagnostic coverage (DC parameter), which allows the failure rates of a given component to be correctly divided into detected and undetected. For this division, the Standard advises partitioning failures into 50% safe and 50% dangerous, an approach that seems ambiguous [9]. This is an important issue, because an erroneous division of the failure rates leads to wrong values of SFF and PFD/PFH [10].
The FMEDA is a fundamental step in meeting the requirements of IEC61508: the evaluation of SFF and PFD/PFH needs a realistic classification of the failure rates into each relevant category of the safety models (safe detected, safe undetected, dangerous detected, dangerous undetected).
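The classification into the four categories can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the `FailureMode` record, the function names, the component and all numerical values are hypothetical, and applying a single DC value to the 50%/50% partition is an assumption of this example rather than something prescribed in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FailureMode:
    name: str        # e.g. "oc" (open circuit), "sc" (short circuit), "dr" (drift)
    share: float     # fraction of the component failure rate, between 0 and 1
    dangerous: bool  # effect on the safety-related output signal
    dc: float        # diagnostic coverage of this failure mode, between 0 and 1


def classify(failure_rate: float, modes: List[FailureMode]) -> Dict[str, float]:
    """Split a component failure rate into SD, SU, DD and DU contributions."""
    if abs(sum(m.share for m in modes) - 1.0) > 1e-9:
        raise ValueError("failure mode shares must sum to 1")
    rates = {"SD": 0.0, "SU": 0.0, "DD": 0.0, "DU": 0.0}
    for mode in modes:
        mode_rate = failure_rate * mode.share
        prefix = "D" if mode.dangerous else "S"
        rates[prefix + "D"] += mode_rate * mode.dc          # detected part
        rates[prefix + "U"] += mode_rate * (1.0 - mode.dc)  # undetected part
    return rates


def classify_50_50(failure_rate: float, dc: float) -> Dict[str, float]:
    """Default partition: half safe, half dangerous (single DC is an assumption)."""
    half = failure_rate / 2.0
    return {
        "SD": half * dc,
        "SU": half * (1.0 - dc),
        "DD": half * dc,
        "DU": half * (1.0 - dc),
    }


# Hypothetical component: 1e-8 failures per hour and three failure modes.
component_modes = [
    FailureMode("oc", 0.6, dangerous=True, dc=1.0),
    FailureMode("sc", 0.3, dangerous=False, dc=0.0),
    FailureMode("dr", 0.1, dangerous=True, dc=0.5),
]
fmeda = classify(1e-8, component_modes)
default = classify_50_50(1e-8, dc=0.5)
```

With these made-up inputs the two partitions differ markedly in the dangerous undetected rate, which is the kind of discrepancy that propagates into the SFF and PFD/PFH values.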
In order to give a more detailed description of the FMEDA technique, a part of the analysis concerning the sensor block of the SIS considered in this work is shown in Table 1. The first four columns from the left give the description and classification of each constituent element. For each component of the system, all possible failure modes should be considered in column 5: oc (open circuit), sc (short circuit), dr (drift). It is important to underline that the failure modes are often difficult to identify. Failure modes depend on many factors, and different standards often give different information. The percentages with which the failure modes occur are summarized in column 6, and the effects produced by each failure, classified as safe or dangerous, are shown in column 7. Columns 8 and 9 concern, respectively, the failure rate obtained from the reliability prediction and the diagnostic coverage (DC), assigned on the basis of the following considerations:
- 0% if the fault is never detected.
- 50% if the fault is detected only in specific conditions or modes of operation.
- 75% if most of the faults can be detected.
Fig. 4. FMEDA approach: details.
Table 1 FMEDA: extract of sensor block analysis.
- 100% if the fault is always detected.
An important point is that the proposed method assigns diagnostics only to known component failure modes. The Standard, instead, requires the failure rate of each component to be classified into the safe failure rate λ_S and the dangerous failure rate λ_D. Dangerous and safe failures are then split into detected and undetected, to indicate the fractions of dangerous and safe failures that will, respectively, be detected or undetected by the diagnostic tests (columns 10 to 13). The classification into safe and dangerous failures can be made deterministically for simple components, but is otherwise based on engineering judgement.
3. Probability of failure per hour evaluation
This section focuses on the method used to obtain the probability of failure per hour (PFH) with the FMEDA approach and on the comparison
between the FMEDA approach and the approach (PFH_STD) proposed by the Standard. As already stated, the system considered in this paper operates in continuous mode; consequently the probability of failure per hour is calculated. According to IEC61508, this parameter for a 1oo2 architecture is given by:
\[ \mathrm{PFH} = 2\left((1-\beta_{D})\,\lambda_{DD} + (1-\beta)\,\lambda_{DU}\right)^{2} t_{CE} + \beta_{D}\,\lambda_{DD} + \beta\,\lambda_{DU} \tag{3} \]
where β denotes the fraction of undetected failures that have a common cause; β_D represents the fraction of failures detected by the diagnostic tests that have a common cause; t_CE is the channel equivalent mean down time (in hours), given by:
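\[ t_{CE} = \frac{\lambda_{DU}}{\lambda_{D}}\left(\frac{T_{i}}{2} + \mathrm{MTTR}\right) + \frac{\lambda_{DD}}{\lambda_{D}}\,\mathrm{MTTR} \tag{4} \]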
In Eq. (4), T_i represents the proof-test interval (in hours) and MTTR denotes the mean time to restoration (in hours). The probability of failure per hour of the system, PFH_SYS, is determined by calculating and combining the PFH of all the sub-systems that together provide the safety function. This can be expressed by:
\[ \mathrm{PFH}_{SYS} = \mathrm{PFH}_{S} + \mathrm{PFH}_{L} + \mathrm{PFH}_{A} \tag{5} \]
where PFH_S, PFH_L and PFH_A are, respectively, the probabilities of failure per hour of the three main sub-systems: sensor, logic solver and actuator. The PFH calculation with the FMEDA approach is carried out with Eqs. (3)–(5), where the failure rates (λ_DD, λ_DU and λ_D) are obtained from the FMEDA. To obtain the PFH values as a function of the β factors, it is assumed that β = 2β_D, with proof-test intervals consistent with high demand mode (Table 2); the MTTR is taken equal to 8 h, as suggested by the Standard. It is worth noting that the PFH changes with β and β_D, which are calculated for each sub-system by taking into account the measures that provide an effective defence against the occurrence of common cause failures. For β_D = 1% and β = 2% the system is classified as SIL 3; in the other cases the achievable level is SIL 2. A correct evaluation of β and β_D is therefore important for the SIL verification. The PFH values calculated above are compared with the values obtained using the method advised by IEC61508 (PFH_STD), as summarized in Table 3.
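The calculation chain of Eqs. (3)–(5) can be sketched as follows. This is a minimal sketch, not the tool used for the case study: the function names and the failure rates in the example are hypothetical, while the 8 h MTTR and the assumption β = 2β_D follow the text.

```python
from typing import Iterable

HOURS_PER_YEAR = 8760.0


def channel_equivalent_mean_down_time(
    lambda_dd: float, lambda_du: float, proof_test_interval_h: float, mttr_h: float = 8.0
) -> float:
    """Eq. (4): t_CE in hours, with lambda_D = lambda_DD + lambda_DU."""
    lambda_d = lambda_dd + lambda_du
    if lambda_d <= 0:
        raise ValueError("dangerous failure rate must be positive")
    undetected_part = (lambda_du / lambda_d) * (proof_test_interval_h / 2.0 + mttr_h)
    detected_part = (lambda_dd / lambda_d) * mttr_h
    return undetected_part + detected_part


def pfh_1oo2(
    lambda_dd: float,
    lambda_du: float,
    beta: float,
    beta_d: float,
    proof_test_interval_h: float,
    mttr_h: float = 8.0,
) -> float:
    """Eq. (3): probability of failure per hour of a 1oo2 sub-system."""
    t_ce = channel_equivalent_mean_down_time(
        lambda_dd, lambda_du, proof_test_interval_h, mttr_h
    )
    # Failures of the two channels that do not share a common cause
    independent = (1.0 - beta_d) * lambda_dd + (1.0 - beta) * lambda_du
    # Common cause failures defeat both channels at once
    common_cause = beta_d * lambda_dd + beta * lambda_du
    return 2.0 * independent ** 2 * t_ce + common_cause


def pfh_system(subsystem_pfh: Iterable[float]) -> float:
    """Eq. (5): sum of the sensor, logic solver and actuator contributions."""
    return sum(subsystem_pfh)


# Hypothetical sub-system rates (per hour); beta = 2 * beta_D as assumed in the text.
beta_d = 0.01
beta = 2.0 * beta_d
t_ce = channel_equivalent_mean_down_time(9e-7, 1e-7, HOURS_PER_YEAR)
pfh_sensor = pfh_1oo2(9e-7, 1e-7, beta, beta_d, HOURS_PER_YEAR)
pfh_sys = pfh_system([pfh_sensor, pfh_sensor, pfh_sensor])
```

Note that only the quadratic term of Eq. (3) depends on the proof-test interval through t_CE, while the common cause term is linear in β and β_D.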
Table 2
PFH (in high demand or continuous mode of operation) as a function of the β factors and the proof-test interval (T_i), calculated with the FMEDA approach. Entries are the probability of failure per hour of the system.

T_i         β_D = 1%, β = 2%    β_D = 5%, β = 10%    β_D = 10%, β = 20%
1 year      2.66E-08            1.32E-07             2.65E-07
6 months    2.66E-08            1.33E-07             2.65E-07
3 months    2.66E-08            1.33E-07             2.65E-07
1 month     2.66E-08            1.33E-07             2.65E-07
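The SIL classification quoted in the text can be checked against the target failure measures that IEC61508 gives for high demand or continuous mode (SIL 4 below 1E-08 per hour, SIL 3 below 1E-07, SIL 2 below 1E-06, SIL 1 below 1E-05). The sketch below is illustrative only: the function name and the dictionary layout are hypothetical, values below the SIL 4 lower bound are simply treated as SIL 4, and only the quantitative PFH target is checked, not the architectural constraints.

```python
from typing import Optional

# (SIL, upper bound of PFH in failures per hour) for high demand or continuous mode
SIL_UPPER_BOUNDS = [(4, 1e-8), (3, 1e-7), (2, 1e-6), (1, 1e-5)]


def sil_from_pfh(pfh: float) -> Optional[int]:
    """Return the highest SIL whose PFH target is met, or None if none is met."""
    for sil, upper in SIL_UPPER_BOUNDS:
        if pfh < upper:
            return sil
    return None


# PFH values of Table 2, listed for T_i = 1 year, 6 months, 3 months, 1 month.
TABLE_2 = {
    "beta_D = 1%, beta = 2%": [2.66e-8, 2.66e-8, 2.66e-8, 2.66e-8],
    "beta_D = 5%, beta = 10%": [1.32e-7, 1.33e-7, 1.33e-7, 1.33e-7],
    "beta_D = 10%, beta = 20%": [2.65e-7, 2.65e-7, 2.65e-7, 2.65e-7],
}

# Set of SILs reached in each column, over all proof-test intervals
achieved = {
    column: {sil_from_pfh(value) for value in values}
    for column, values in TABLE_2.items()
}
```

Applied to Table 2, the first column falls in the SIL 3 band and the other two in the SIL 2 band for every proof-test interval, in agreement with the classification stated above.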
Fig. 5. PFH_STD and PFH vs. β factors.
In this case the failure rates considered are λ_D = λ_S = λ/2, where λ is the failure rate of each SIS sub-system. The PFH value estimated by the Standard method is greater than the value estimated using FMEDA, so it is the more conservative of the two and can lead to a lower SIL. The trends of PFH_STD and PFH, that is, the probability of failure per hour evaluated as suggested by IEC61508 and that obtained with the proposed method, respectively, are shown in Fig. 5 as a function of β and β_D.
4. Conclusions
In this paper a case study concerning the safety analysis of a complex system based on FMEDA has been presented. Compared with the IEC61508 Standard approach, this analysis provides more information on the failure modes, effects and detection of each SIS component under examination. A first ambiguity in the Standard is the assumption that only random hardware failures are taken into consideration in the SFF calculation; at the same time, in IEC61508 Part 2, the target is to be applied equally to random hardware failures and systematic failures. A way to address this problem is to make the reliability prediction as careful as possible, regardless of the handbook selected; it is important to ensure that the data are valid and suitable for the field of application of the device under examination.
The PFH values calculated using the failure classification obtained with the FMEDA technique are then compared with those obtained using the Standard method, which advises a division of failures into 50% safe and 50% dangerous. The IEC61508 SIL assessment is based on a series of formulas for PFD calculation, in which some parameters are not easy to evaluate. In conclusion, the values given by the two methods are quite close, but the FMEDA estimate is slightly lower. The Standard therefore gives more conservative results, while the FMEDA allows a more accurate SIL assessment, thus optimizing the design in terms of cost and complexity.
Table 3
PFH_STD as a function of the β factors and the proof-test interval (T_i), calculated as proposed by the Standard.

T_i         β_D = 1%, β = 2%    β_D = 5%, β = 10%    β_D = 10%, β = 20%
1 year      6.96E-08            2.25E-07             4.19E-07
6 months    5.47E-08            2.12E-07             4.08E-07
3 months    4.76E-08            2.05E-07             4.03E-07
1 month     4.24E-08            2.01E-07             3.99E-07

References
[1] IEC61508, Electric/Electronic/Programmable Electronic safety-related systems, parts 1–7. Technical report. International Electrotechnical Commission, March 2002.
[2] Lundteigen MA, Rausand M. Assessment of hardware safety integrity requirements. In: Proceedings of the 30th ESReDA seminar, Trondheim, Norway, June 07–08, 2006.
[3] Rausand M, Høyland A. System reliability theory. 2nd ed. Hoboken, New Jersey: J. Wiley & Sons, Inc.; 2004.
[4] Langeron Y, Barros A, Grall A, Bérenguer C. Combination of safety integrity levels (SILs): a study of IEC61508 merging rules. J Loss Prevent Process Ind 2008;21:437–49.
[5] Mariani R, Boschi G, Colucci F. Using an innovative SoC-level FMEA methodology to design in compliance with IEC61508. In: Design automation and test in Europe conference & exhibition, Nice, France, 16–20 April, 2007.
[6] Murthy DNP, Rausand M, Virtanen S. Investment in new product reliability. Reliab Eng Syst Safety 2009;94(10):1593–600.
[7] Bisschop J. Reliability methods and standards. Microelectron Reliab 2007;47(9–11):1330–5.
[8] Catelani M, Ciani L, Singuaroli R, Luongo V. Evaluation of the safe failure fraction for an electromechanical complex system: remarks about the Standard IEC61508. In: Proc of IEEE I2MTC 2010, Austin, Texas, May 3–6, 2010. p. 949–53.
[9] Smith DJ, Simpson KGL. Functional safety. 2nd ed. Elsevier Butterworth-Heinemann; 2004.
[10] Rouvroye JL, Brombacher AC. New quantitative safety standards: different techniques, different results? Reliab Eng Syst Safety 1999;66(2):121–5.