Chapter 4.2
Initial Risk Analysis of Potential Failure Modes
B. Carlsson
Swedish National Testing and Research Institute, P.O. Box 857, S-501 15 Boras, Sweden
Abstract: The purpose of this chapter is to describe the important elements of an initial risk analysis of potential failure modes of a component and its materials for a given application. In this analysis all the requirements related to the performance and the durability for the intended application are defined. From a practical point of view, as well as from an economic viewpoint, an assessment of the durability or service life of a product component has to be limited in its scope and focused on the most critical failure modes. An important part of the initial step in such an assessment is therefore estimating the risk associated with each of the potential failure modes of the component. The initial risk analysis work program includes: (a) specifying the required function(s) of the component and its minimum acceptable performance, service life, and the anticipated in-use environment; (b) identifying important functional properties and methods for qualifying the performance of the component; (c) identifying potential failure modes, associated degradation mechanisms and methods for qualification of the component and its materials with respect to durability; and (d) estimating the risks associated with each different failure mode. The risk or risk number associated with each identified potential failure and associated damage mode may preferably be estimated by using the methodology of Failure Modes and Effect Analysis (FMEA). The estimated risk number is taken as the point of departure to judge whether a particular failure mode needs to be further evaluated or not. The estimated risk number is also a valuable tool to determine what kind of testing is needed for qualification of a particular component and its materials.
Keywords: Initial risk analysis, Potential failure modes, Performance requirements, Service life requirements, Functional properties, Degradation mechanisms, Performance test methods, Durability test methods, Fault-tree analysis, Failure modes and effect analysis (FMEA), Risk assessment, Degradation indicator
4.2.1 INTRODUCTION
Using the scheme illustrated in Table 4.2-1, the first step is to analyze the potential failure modes with the objective of obtaining (1) a list of potential failure modes of
Optical Materials for Solar Thermal Systems
the component, and associated risks and critical component and material properties, degradation processes and stress factors; (2) a framework for the selection of test methods to verify performance and service life requirements; (3) a framework for describing previous test results for a specific component and its materials or a similar component and materials used in the component, and classifying their relevance to the actual application; and (4) a framework for compiling and integrating all data on available component and material properties and material degradation technology.
From a practical point of view, as well as from an economic viewpoint, an assessment of the durability or service life has to be limited in its scope and focused on the most critical failure modes. An important part of the initial step in such an assessment is therefore estimating the risk associated with each of the potential failure modes of the component.
The work program in the initial step of service life assessment may be structured into the following activities (Carlsson et al., 1995): (a) for in-service use, specify the required function of the component and its materials, its minimum acceptable performance and service life requirement, and the anticipated in-use environments; (b) identify important functional properties defining the performance of the component and its materials, and the relevant test methods and requirements for qualifying the component to perform its designed function; (c) identify potential failure modes and degradation mechanisms, relevant durability or life tests and durability requirements for qualification of the component and its materials; (d) estimate the risks associated with each different failure mode and associated degradation mechanisms. The results of the initial risk analysis of potential failure modes and associated degradation mechanisms may be documented as indicated in Table 4.2-1.
Table 4.2-1. Examples of documentation of the results from an initial risk analysis of potential failure modes
A. Specification of end-user and product requirements
Function and general requirements | General requirements for long-term performance during design service time | In-use conditions and severity of environmental stress
B. Specification of functional properties and requirements on component and its materials
Critical functional properties | Test methods for determining the functional properties | Requirements for functional capability and long-term performance
C. Potential failure modes, critical factors of environmental stress and degradation
Failure/damage mode/degradation process | Degradation indicator | Critical factors of environmental stress/degradation factors and severity
4.2.2 PENALTY AND FAILURE
The first activity is to specify the function of the component and the service lifetime requirement in general terms from an end-user and product point of view, and from that to identify the most important functional properties of the component and its materials (see Table 4.2-2). The importance of the function of the component from an end-user and product point of view needs to be taken into consideration when formulating the performance requirements in terms of those functional properties. For determining the penalty, it may be important to understand the consequences of different types of failure and to define the general performance requirements and service lifetime. Failure of a component occurs when a minimum performance level,
Table 4.2-2. Example of the results from an initial risk analysis of potential failure modes (based on information taken from the IEA Task 10 case study on selective solar absorber surfaces (Carlsson et al., 1994))
A. Specification of end-user and product requirements on component
Function and general requirements | General requirements for long-term performance during design service time | In-use conditions and severity of environmental stress
Efficiently convert solar radiation into thermal energy; suppress heat losses in the form of thermal radiation | Loss in optical performance should not result in reduction of the solar system energy performance (solar fraction) by more than 5%, in a relative sense, during a design service time of 25 years | Absorber is in contact with air; air is exchanged with surroundings, so airborne pollutants may contaminate the absorber; humidity in the air in contact with the absorber may be high if the assembly is not water (rain) tight; maximum temperature permitted is 200 °C
B. Specification of functional properties and requirements on component and its materials
Critical functional properties | Test method for determining functional property | Requirement for functional capability
Solar absorptance ($\alpha$) | ISO CD 12592.2 | $\alpha > 0.92$
Thermal emittance ($\varepsilon$) | ISO CD 12592.2 | $\varepsilon <$ [value illegible in source, printed as "0A5"]
Adhesion of the absorber to its substrate (ad) | ISO 4624 | ad > 0.5 MPa
Requirement for long-term performance: $-\Delta\alpha + 0.25\,\Delta\varepsilon < 0.05$
at which satisfactory functioning cannot be guaranteed, is no longer obtained. Thus, if the performance expectations are not fulfilled, the particular component has failed. Performance requirements can be formulated on the basis of optical properties, mechanical strength, aesthetic values, or other criteria related to the performance of the component and its materials. For failure modes characterized by a gradual degradation in performance, the consequences of failure may not be significant shortly after the minimum performance requirement is no longer met. For catastrophic types of failure modes, however, the intended functional capability of the component or some part of the component may be completely lost. Defining the performance requirements should be accompanied by an assessment of the economic effects of a component failure at that level of performance. Based on these requirements, a mean service lifetime may be defined. Alternatively, a minimum reliability expectation may be defined that must be maintained for a selected number of years (see Figure 4.5-1 in Chapter 4.5). An example from the IEA Task 10 case study on selective solar absorbers (Carlsson et al., 1994) of how these first steps in the initial risk analysis of potential failure modes may be documented is given in Table 4.2-2 (see also Figure 4.2-1).
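The long-term performance requirement in Table 4.2-2 can be checked with a few lines of code. The following is a minimal sketch, assuming that $\Delta\alpha$ and $\Delta\varepsilon$ are the measured changes in solar absorptance and thermal emittance relative to the initial values; the function names and the sample measurements are hypothetical and serve only as illustration.

```python
def performance_criterion(delta_alpha: float, delta_epsilon: float) -> float:
    """PC = -d_alpha + 0.25 * d_epsilon, as in the long-term requirement of Table 4.2-2."""
    return -delta_alpha + 0.25 * delta_epsilon


def has_failed(delta_alpha: float, delta_epsilon: float, limit: float = 0.05) -> bool:
    """True when the requirement PC < limit is no longer met."""
    return not performance_criterion(delta_alpha, delta_epsilon) < limit


# Hypothetical aged sample: absorptance dropped by 0.03, emittance rose by 0.04.
pc = performance_criterion(-0.03, 0.04)
print(f"PC = {pc:.3f}, failed = {has_failed(-0.03, 0.04)}")
```

A loss in absorptance weighs four times as heavily as an equal rise in emittance, which is why the two changes are combined into one number before comparing against the limit.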
4.2.3 FAILURE, DAMAGE, CHANGE, AND DEGRADATION INDICATORS
Potential failure modes and important degradation processes should be identified after failures have been defined in terms of the minimum acceptable performance levels. When identifying potential failure modes, it is important to distinguish between (a) failures initiated by short-term exposure to environmental stresses, where the stresses represent events of high environmental load on the component and its materials, and (b) failures initiated by long-term exposure to environmental stresses, where the stresses cause material degradation so that the performance, and sometimes also the environmental resistance, of the component and its materials gradually decreases. In case (a), catastrophic failures occur, whereas in case (b), materials degradation may result not only in gradual types of failures but also in catastrophic types of failures. In general, many kinds of failure modes exist for a particular component and even for its different parts, and the different damage mechanisms that may lead to the same kind of failure can sometimes be quite numerous. An overview of common failure mechanisms experienced by materials has been given (Dasgupta and Pecht, 1991). Discussions of specific mechanisms are also available for corrosion (Tullmin and Roberge, 1995), electromigration (Black, 1969; DiGiacomo, 1982; Nitta et al., 1993; Young and Christou, 1994; Krumbein, 1995), diffusion (Li and Dasgupta, 1994), and photodegradation (Al-Sheikhly and Christou, 1994).
Figure 4.2-1. Principal components of the selective solar absorber system studied in IEA Task 10 for use in single-glazed flat-plate solar collectors to be installed in domestic hot water systems (Carlsson et al., 1994). Panels: (a) Component/materials: selective solar absorber coating of anodised aluminium pigmented with small metallic nickel particles (labels: porous Al2O3, Ni particles, homogeneous Al2O3, metallic Al); (b) Cross section of the flat-plate collector (labels: glazing, air, absorber); (c) Application: use in single-glazed flat-plate solar collector for domestic hot water system (labels: flat-plate solar collector, solar absorber surface).
Fault-tree analysis is a tool that provides a logical structure relating failure to various damage modes and the underlying chemical or physical changes. The use of fault-tree analysis to describe the potential degradation pattern of an organic coating system for metals is illustrated in Table 4.2-3. The objective of fault-tree analysis is to identify potential failure and damage modes, associated degradation mechanisms or mechanisms that result in material degradation and damage, and the associated critical factors of environmental stress or degradation factors. For the purpose of service life prediction, it is important also to select suitable degradation indicators for the different potential failure modes so that failure may be assessed properly. If possible, the degradation process responsible for each critical failure may also be followed (see the example in Table 4.2-3).
Table 4.2-3. Example of using a fault-tree analysis to represent potential relationships of failures, damage, and change to identify suitable degradation indicators and critical environmental stress/degradation factors for organic coatings on metals (Carlsson et al., 1995)
Failure of coating:
(1) Corrosion of substrate because of lost corrosion protection capability of the coating
(2) Unacceptable change in appearance of surface because of coating degradation
Failure/Damage modes | Degradation indicator
Uniform corrosion of coated substrate | Degree of corrosion
Blistering | Degree of blistering
Adhesion loss/undercorrosion at defects or damage of the coating | Creep of corrosion defects from mechanical damage
Partial or total adhesion loss | Adhesion/Degree of flaking
Cracking | Degree of cracking
Colour change | Colour change $\Delta E$ (CIE system)
Gloss change | Specular reflectance change
Degradation resulting from chemical attack; Soiling | Colour change $\Delta L$ (CIE system)
Failure/Damage mode: Blistering
Degradation mechanisms | Critical factors of environmental stress
Swelling/shrinking resulting from water absorption/desorption from coating | Alternating wet/dry periods, salt contaminants and other corrosion-promoting agents
Osmotic migration of water through the coating resulting from salt-contaminated substrate surface | Surface contaminants and corrosion-promoting agents
Cathodic delamination resulting from local corrosion cell reactions | Corrosion-promoting agents
Osmotic migration of water to anodic surfaces in local corrosion cells | Corrosion-promoting agents
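The blistering branch of Table 4.2-3 can be held in a small tree-like data structure, which makes it easy to collect the critical stress factors that a test program has to cover. The sketch below is only one possible representation: the class names, the field names and the splitting of the stress-factor text into separate items are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Mechanism:
    """A degradation mechanism and the stress factors that drive it."""
    description: str
    stress_factors: list[str]


@dataclass
class DamageMode:
    """A failure/damage mode with its degradation indicator and mechanisms."""
    name: str
    indicator: str
    mechanisms: list[Mechanism] = field(default_factory=list)


def critical_stress_factors(mode: DamageMode) -> list[str]:
    """Collect the distinct stress factors of a damage mode, in first-seen order."""
    seen: dict[str, None] = {}
    for mechanism in mode.mechanisms:
        for factor in mechanism.stress_factors:
            seen.setdefault(factor, None)
    return list(seen)


# Blistering branch, transcribed from Table 4.2-3.
blistering = DamageMode(
    name="Blistering",
    indicator="Degree of blistering",
    mechanisms=[
        Mechanism(
            "Swelling/shrinking resulting from water absorption/desorption",
            ["alternating wet/dry periods", "salt contaminants", "corrosion-promoting agents"],
        ),
        Mechanism(
            "Osmotic migration of water through the coating",
            ["surface contaminants", "corrosion-promoting agents"],
        ),
        Mechanism(
            "Cathodic delamination resulting from local corrosion cell reactions",
            ["corrosion-promoting agents"],
        ),
        Mechanism(
            "Osmotic migration of water to anodic surfaces in local corrosion cells",
            ["corrosion-promoting agents"],
        ),
    ],
)

print(critical_stress_factors(blistering))
```

Keeping the mechanisms attached to the damage mode, rather than in a flat list, preserves the fault-tree relation between failure, damage and underlying change.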
4.2.4 LOAD, EFFECTIVE STRESS, AND DEGRADATION FACTORS
To identify potential failure modes, the type of environmental stress factors and their severity under service conditions must be known. For the purpose of service life prediction, in-use conditions representing a worst case may be selected. Alternatively, in-use conditions may be determined by measuring the environmental stresses under varying service conditions and selecting data from the most representative case as a basis for service life prediction. In the initial phase of service life prediction, however, the most important issue is to identify the most critical in-use conditions and environmental stress factors that may contribute to material degradation and result in failure. Potential degradation mechanisms, failure and damage modes are identified on the basis of this knowledge (see Table 4.2-3).
A literature search is recommended after potential failure modes, degradation processes and critical factors of environmental stress have been initially identified. The objective of this search is to find reports on durability and available service life data on similar components and materials used in a similar application to the one being investigated. Alternatively, durability or service life data for the specific component and its materials in applications and in-use environments other than those of the investigated object are also sought (see Table 4.2-4).
4.2.5 RISK ANALYSIS ACCORDING TO FAILURE MODES AND EFFECT ANALYSIS
The risk or risk number associated with each identified potential failure and associated damage mode can be estimated by use of the methodology of Failure Modes and Effect Analysis (FMEA) in a simplified way; see IEC Standard (1985) and Failure Mode and Effect Analysis (FMEA) (1993) for reviews of the FMEA methodology. The estimated risk number is taken as the point of departure to judge whether a particular failure mode needs to be further evaluated or not.
The estimated risk number may also be used to determine what kind of testing is needed for qualification of a particular component and its materials. The risk number associated with a particular failure mode is estimated by using the following factors: Severity ($S$), Probability of occurrence ($P_O$), and Probability of escaping detection ($P_D$). The risk number RPN is the product of all these factors,
Table 4.2-4. Examples of documentation of the available service life data of relevance to the investigated case
Component/Materials | Available service life data | Remark
Table 4.2-5. Rating scales for FMEA analysis with respect to severity, probability of escaping detection, and probability of occurrence in the assessment of the risk number for a particular failure/damage mode. An example is given in the lower table for organic coatings on metals (see Table 4.2-3)
Severity | Rating number
[description not legible in source] | 1
Minor effect on product but no effect on product function | 2-3
Risk of failure in product function | 4-6
Certain failure in product functioning | 7-9
Failure that may affect personal safety | 10
Probability of occurrence | Rating number
Unlikely that failure will occur | 1
Very low probability for failure to occur | 2-3
Low probability for failure | 4-5
Moderate probability for failure to occur | 6-7
High probability for failure to occur | 8-9
Very high probability for failure to occur | 10
Probability of detection | Rating number
Failure that always is noted. Probability for detection > 99.99% | 1
Normal probability of detection 99.7% | 2-4
Certain probability of detection > 95% | 5-7
Low probability of detection > 90% | 8-9
Failures will not be found, cannot be tested | 10
Failure/Damage modes of the coating | Severity | Probability of occurrence | Probability of detection | RPN
Uniform corrosion of coated substrate | 6 | 6 | 2 | 72
Blistering | 4 | 7 | 5 | 140
Adhesion loss/undercorrosion at defects or damage of the coating | 4 | 9 | 3 | 63
Partial or total adhesion loss | 7 | 6 | 2 | 84
Cracking | 4 | 6 | 5 | 120
Colour change | 7 | 7 | 7 | 343
Gloss change | 7 | 8 | 8 | 448
Degradation resulting from chemical attack | 8 | 4 | 3 | 96
Table 4.2-6. Risk assessment of potential failure modes
Failure mode/degradation process | Severity (rating number) | Probability of occurrence (rating number) | Probability of discovery (rating number) | Rating number for risk
Table 4.2-7. Example of the results of an initial risk analysis with information from the IEA Task 10 case study on selective solar absorber surfaces (see also Tables 4.2-2 and 4.2-5)
C. Service reliability/service life, cost of failures and maintenance
Failure/damage mode/degradation process | Degradation indicator | Critical factors of environmental stress/degradation factors and severity
Unacceptable loss in optical performance | PC (a), Adhesion |
(A) High-temperature oxidation of metallic nickel | Reflectance spectrum Vis-IR | High temperature
(B) Electrochemical corrosion of metallic nickel | Reflectance spectrum Vis-IR | High humidity, sulphur dioxide (atmospheric corrosivity)
(C) Hydratization of aluminum oxide | Reflectance spectrum IR | Condensed water, temperature
Estimated risk of the failure/damage mode from FMEA (see Figure 4.2-2), columns $S$, $P_O$, $P_D$, Risk RPN: [only fragments legible in source: ratings 5 and 5 (b); RPN values 112, 175, 196]
(a) PC = performance criterion = $-\Delta\alpha + 0.25\,\Delta\varepsilon$.
(b) $P_D$ value resulted from possible failure of the glazing.
i.e., $RPN = S \cdot P_O \cdot P_D$. The first factor, Severity, is a measure of the consequences of a particular failure for safety and economic reasons, when the component and its materials are treated as part of a product or system. For rating Severity, a scale with ten degrees may be used as defined in Table 4.2-5. The second factor, Probability of occurrence, is a measure of how probable it is that failure will occur according to the particular mode during the design service life of the component and its materials. A ten-degree scale may also be used for rating, as defined in Table 4.2-5. The third factor, Probability of escaping detection, accounts for the probability that a damage or failure mechanism escapes detection that could have prevented failure. The ten-degree scale may also be used here, as defined in Table 4.2-5. The risk assessment may be documented as shown in Table 4.2-6. In the risk assessment of potential failure modes, the relevance of durability and life data found in the literature for the specific component and its materials must be considered. The risk assessment is most advantageously performed by a group of experts.
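The product rule above is simple enough to express directly in code. The following is a minimal sketch, assuming integer ratings on the ten-degree scales of Table 4.2-5; the class and field names are hypothetical, and the four sample rows are taken from the lower part of Table 4.2-5 with the columns read in the order severity, occurrence, detection (an assumption that does not affect the product).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of an FMEA risk assessment (names are illustrative)."""
    name: str
    severity: int          # S, rated 1-10
    occurrence: int        # P_O, rated 1-10
    escape_detection: int  # P_D, rated 1-10

    def __post_init__(self) -> None:
        # Reject ratings outside the ten-degree scale.
        for label in ("severity", "occurrence", "escape_detection"):
            value = getattr(self, label)
            if not isinstance(value, int) or not 1 <= value <= 10:
                raise ValueError(f"{label} must be an integer from 1 to 10, got {value!r}")

    @property
    def rpn(self) -> int:
        """Risk number RPN = S * P_O * P_D."""
        return self.severity * self.occurrence * self.escape_detection


modes = [
    FailureMode("Uniform corrosion of coated substrate", 6, 6, 2),
    FailureMode("Blistering", 4, 7, 5),
    FailureMode("Colour change", 7, 7, 7),
    FailureMode("Gloss change", 7, 8, 8),
]
for mode in modes:
    print(f"{mode.name}: RPN = {mode.rpn}")
```

Because each factor ranges from 1 to 10, the risk number ranges from 1 to 1000, and a single high rating can be offset by low ratings on the other two factors.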
The reasonableness of setting the design service life of the component or parts of the component at the same level as that of the product may also be questioned during the risk assessment. During maintenance or repair work, the component or parts of the component may be replaced, which may considerably lower the requirement on the service life of the component or some of its replaceable parts. The risk assessment made at this initial stage of service life prediction is only qualitative in nature. The main purpose of the risk assessment is to limit the scope of the service life evaluation to focus on the most important failure modes. The rating numbers may be used principally as an aid to reduce the number of critical failure modes in the subsequent evaluation of the service life of the component. An example of the result of an initial risk analysis based on information from the IEA Task 10 absorber surface case study referred to in Table 4.2-2 is shown in Table 4.2-7.
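One way to use the rating numbers for limiting the scope of the evaluation is to rank the failure modes and keep only those above a cut-off. The sketch below assumes a cut-off of 100, which is a hypothetical value chosen for illustration and not one given in this chapter; the RPN values are a subset of those listed for organic coatings on metals in Table 4.2-5, and the function name is likewise an assumption.

```python
# RPN values for some failure/damage modes, as listed in Table 4.2-5.
rpn_by_mode = {
    "Uniform corrosion of coated substrate": 72,
    "Blistering": 140,
    "Partial or total adhesion loss": 84,
    "Cracking": 120,
    "Colour change": 343,
    "Gloss change": 448,
    "Degradation resulting from chemical attack": 96,
}


def select_critical(rpn_values: dict[str, int], threshold: int) -> list[tuple[str, int]]:
    """Return the modes with RPN >= threshold, highest risk first."""
    selected = [(mode, rpn) for mode, rpn in rpn_values.items() if rpn >= threshold]
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Hypothetical cut-off; in practice the expert group would agree on it.
for mode, rpn in select_critical(rpn_by_mode, threshold=100):
    print(f"{rpn:4d}  {mode}")
```

Since the assessment is qualitative, the ranking is best read as an ordering of priorities rather than as a quantitative measure of risk.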
REFERENCES
Al-Sheikhly, M. & Christou, A. (1994, December) How Radiation Affects Polymeric Materials, IEEE Trans. Reliability, 43, 551-556.
Black, J.R. (1969, April) Electromigration - A Brief Survey and Some Recent Results, IEEE Trans. Electron Devices, ED-16, 338-347.
Carlsson, B., Berglund Ahman, A. & Jutengren, K. (1995) Assessment of Service Life by Accelerated Testing - Methodology for Qualification of Rust Protective Paint Systems; SP Swedish National Testing and Research Institute, SP-Report 1995:65, ISBN 91-7848-592-4 (Swedish), SE-50115 Boras, Sweden.
Carlsson, B., Frei, U., Kohl, M. & Moller, K. (1994) Accelerated Life Testing of Solar Energy Materials - Case Study of Some Selective Solar Absorber Coatings for DHW systems, International Energy Agency, Solar Heating and Cooling Programme Task X: Solar Materials Research and Development, Technical Report, SP-Report 1994:13, SE-50115 Boras, Sweden.
Dasgupta, A. & Pecht, M. (1991) Material Failure-mechanisms and Damage Models, IEEE Trans. Reliability, 40, 531-536.
DiGiacomo (1982) Metal Migration (Ag, Cu, Pb) in Encapsulated Modules and Time-to-fail Model as a Function of the Environment and Package Properties, IEEE Proc. Int'l Reliability Physics Symp., 27-33.
Failure Mode and Effect Analysis (1993) Instruction manual from Volvo Car Corporation, In: Britsman, C., Lonnqvist, A. & Ottosson, S.O., Failure Mode and Effect Analysis, Ord&Form AB, ISBN 91-7548-317-3, Stockholm, Sweden.
IEC Standard, Publ. No. 812 (1985) Analysis Techniques for System Reliability - Procedure for Failure Mode and Effect Analysis (FMEA), CH-1211 Geneva 20, Switzerland.
Krumbein, S.J. (1995, December) Electrolytic Models for Metallic Electromigration Failure Mechanisms, IEEE Trans. Reliability, 44, 539-549.
Li & Dasgupta, A. (1994, March) Failure-mechanism Models for Material Aging due to Interdiffusion, IEEE Trans. Reliability, 43, 2-10.
Nitta, T. et al. (1993, April) Evaluating the Large Electromigration Resistance of Copper Interconnects Employing a Newly Developed Accelerated Life-test Method, J. Electrochem. Soc., 140, 1131-1137.
Tullmin, M. & Roberge, P.R. (1995, June) Corrosion of Metallic Materials, IEEE Trans. Reliability, 44, 271-278.
Young, D. & Christou, A. (1994, June) Failure-mechanism Models for Electromigration, IEEE Trans. Reliability, 43, 186-192.