Reliability diagnostics of electronic equipment
V. Soljačić and M. Kosić
Iskra Automatika, Research Institute, 61000 Ljubljana, Stegne 15/b, Yugoslavia
Reliability diagnostics is important in the selection and design of equipment and spare parts, and for establishing maintenance policy. Three possible methods are: failure rate analysis; failure mode, effect and criticality analysis; and fault tree analysis. The importance of reliability diagnostics in the early planning and design stages is emphasised.

Keywords: Diagnostics, reliability, failure rate, failure mode, fault tree
1. Introduction

Reliability diagnostics of electronic equipment in the early phase of a design is based on prediction of the reliability of component parts, units, equipment and systems. Early estimation and diagnostics are necessary for choosing the most satisfactory design of equipment, for the correct planning of spare parts, for determining a better maintenance policy and for achieving better availability during the life cycle of the equipment.
2. Suggested methods

From literature and practice we know a few methods for early diagnostics of the incorrect working of equipment, from the reliability point of view. The most often used are:

(a) failure rate analysis ($\lambda$)
(b) failure mode, effect and criticality analysis (FMECA)
(c) fault tree analysis (FTA).

It should be mentioned that these methods of analysis and reliability diagnostics can be divided into two groups: the inductive and the deductive approaches. The inductive approach, including failure rate analysis and FMECA, analyses the influence of separate items and events on the function of the equipment or the system. The deductive approach, including FTA, analyses the reasons for some failure of equipment or system. Thus it can be stated that the inductive approach is used when we want to determine what kind of states of equipment are possible (generally a failure state), and the deductive approach is used when we want to determine how a given system state (usually a failed state) can occur.

(a) Failure rate analysis

Failure rate analysis is carried out based on knowing the failure rate of an equipment's electronic parts. The mathematical model for estimating the failure rate of electronic parts is the result of many years' observation of resistors, capacitors, integrated circuits, connectors, etc, in all kinds of applications and laboratories. A design engineer uses such formulae to calculate failure rates of individual components, units and even equipment. Based on this analysis, the 'weak points' of equipment can be determined. The widely used mathematical models for calculation are in handbooks: MIL-HDBK-217 (A to E), CNET's and Bellcore's. The basic form of these models is:

$$\lambda_p = \lambda_b \, \pi_1 \, \pi_2 \cdots$$

where $\lambda_p$ is the failure rate of the electronic part and $\lambda_b$ is the basic failure rate of the electronic part, taking into account the technology, temperature and electrical stress. $\pi_1, \pi_2, \ldots$ are factors connected to quality, construction, environment, electrical stress, application, etc.

Here it has to be pointed out that it is assumed that the equipment is in the flat part of the so-called 'bath-tub' curve (Fig 1). That means that the equipment is in the 'stable' part of its useful life (period B), and that early failures (period A) are eliminated in the production phase (burn-in, run-in). In this case the failures that occur because of wearing out are not taken into consideration (period C).
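The multiplicative form of the handbook models, a basic failure rate scaled by a set of π factors, can be illustrated with a minimal Python sketch. The function name and all numerical values below are hypothetical and chosen only for illustration; real values of the basic failure rate and of each factor have to be read from the handbook tables for the specific part type.

```python
from math import prod


def part_failure_rate(base_rate, pi_factors):
    """Return lambda_p = lambda_b * pi_1 * pi_2 * ... for a single part."""
    factors = list(pi_factors)
    if base_rate < 0 or any(f < 0 for f in factors):
        raise ValueError("failure rate and pi factors must be non-negative")
    return base_rate * prod(factors)


# Hypothetical values for one part, in failures per 10^6 hours.
# They are NOT taken from any handbook table.
base_rate = 0.002
pi_factors = {"quality": 3.0, "environment": 2.0, "application": 1.1}

lambda_p = part_failure_rate(base_rate, pi_factors.values())
print(f"part failure rate: {lambda_p:.4f} failures per 10^6 h")
```

Keeping the factors in a named mapping makes it easy to see which factor dominates a part's failure rate, which is one way of locating the 'weak points' mentioned above.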
Measurement Vol 8 No 3, Jul-Sep 1990
Fig 1 Typical failure rates over time (periods A, B and C)
For estimating the failure rate of units, from the reliability point of view, the series model is used, which means that the unit's failure rate is the sum of its parts' failure rates. For estimating the equipment's failure rate, the reliability block diagram (RBD) is often used to identify the critical part of the equipment (for example, the receiver or transmitter in RR equipment). Reliability, defined as the probability that an item performs a required function under stated conditions for a stated period of time, is calculated using the equation:

$$R = e^{-\lambda t}$$

where $R$ is the reliability of the equipment, $\lambda$ is the failure rate of the equipment, and $t$ is the time (in hours) for which we expect the equipment or unit to work. For a series RBD with $n$ units ($i = 1, 2, \ldots, n$):

$$R = \prod_{i=1}^{n} R_i$$

For a parallel (active redundant) RBD with $n$ units ($i = 1, 2, \ldots, n$), the equipment's reliability is:

$$R = 1 - \prod_{i=1}^{n} (1 - R_i)$$
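The exponential reliability law and the series and parallel block diagram formulas can be checked numerically with a short Python sketch. The function names, failure rates and operating time are assumptions made for illustration and do not come from the equipment analysed in this paper.

```python
import math


def reliability(failure_rate, hours):
    """R = exp(-lambda * t) for a constant failure rate (per hour)."""
    return math.exp(-failure_rate * hours)


def series_reliability(unit_reliabilities):
    """Series RBD: every unit must work, so R is the product of all R_i."""
    return math.prod(unit_reliabilities)


def parallel_reliability(unit_reliabilities):
    """Active-redundant RBD: the system fails only if every unit fails."""
    return 1.0 - math.prod(1.0 - r for r in unit_reliabilities)


# Hypothetical failure rates (failures per hour) and operating time.
rates = [1e-5, 2e-5]
hours = 4000
units = [reliability(rate, hours) for rate in rates]

print(f"unit reliabilities: {[round(r, 4) for r in units]}")
print(f"series:   {series_reliability(units):.4f}")
print(f"parallel: {parallel_reliability(units):.4f}")
```

For constant failure rates the series result equals `reliability(sum(rates), hours)`, which is the same statement as summing the parts' failure rates to obtain the unit's failure rate.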
For all other combinations, mathematical models can be found in the literature on these problems.

(b) Failure mode, effect and criticality analysis (FMECA)

This analysis, known as bottom-up analysis, can be carried out at all levels of the complex: elements, units, equipment and electronic systems. In this analysis, besides failure rate, the criticality of failure, according to function, is taken into account. As a result, we gain an insight into the kinds of failure that influence the correct working of the equipment. Reliability diagnostics using FMECA has one very important feature: all supposed possible failures are analysed, as well as their causes and consequences. It is thus possible, in good time, to build in added parts, or even units, that diagnose degradation of some characteristics or the appearance of a fault. Sometimes it is necessary to redesign the equipment's construction, to build in elements of better quality, or to build in redundant units. Where the equipment has built-in software (SW), it can be very practical to build in special diagnostic SW that tests the ability of the equipment to work correctly at a stated instant of time or over a stated period of time. Based on those results, we can take action in good time to put the equipment into the correct state.
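One way to combine failure rate and criticality when ranking failure modes is sketched below in Python. The paper does not state which criticality measure was used; the sketch assumes a mode criticality number of the form C = β · α · λ · t, in the style of MIL-STD-1629A, where α is the share of the item's failures that occur in the given mode and β is the conditional probability that the mode produces the stated effect. All item names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    item: str
    mode: str
    failure_rate: float        # item failure rate, failures per hour
    mode_ratio: float          # alpha: share of item failures in this mode
    effect_probability: float  # beta: probability the mode causes the effect

    def criticality(self, hours):
        """Assumed mode criticality number C = beta * alpha * lambda * t."""
        return (self.effect_probability * self.mode_ratio
                * self.failure_rate * hours)


def rank_by_criticality(modes, hours):
    """Return the failure modes sorted from most to least critical."""
    return sorted(modes, key=lambda m: m.criticality(hours), reverse=True)


# Hypothetical worksheet rows, not data from the analysed equipment.
worksheet = [
    FailureMode("connector", "open contact", 2e-6, 0.6, 1.0),
    FailureMode("linear IC", "parameter drift", 5e-6, 0.2, 0.5),
]

for fm in rank_by_criticality(worksheet, hours=1000):
    print(f"{fm.item:10s} {fm.mode:16s} C = {fm.criticality(1000):.2e}")
```

The ranked list points to the modes for which added diagnostic parts, better-quality elements or redundant units are most worth considering.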
(c) Fault tree analysis (FTA)

A fault tree analysis can be simply described as a technique whereby an undesired state of the system is specified (usually a state that is critical from the safety standpoint), and the system is then analysed in the context of its environment and operation to find all possible ways in which the undesired event can occur. The fault tree itself is a graphic model of the various parallel and serial combinations of faults that will result in the occurrence of the predefined undesired event. It is important to understand that a fault tree is not a model of all possible system failures or all possible causes for system failure. A fault tree is tailored to its top event, which corresponds to some particular system failure mode, and the fault tree thus includes only those faults that contribute to this top event. A fault tree is a complex of entities known as 'gates' which serve to permit or inhibit the passage of fault logic up the tree. The gates show the relationships of events needed for the occurrence of a 'higher' event. The 'higher' event is the output of the gate; the 'lower' events are the inputs to the gate. The gate symbol denotes the type of relationship of the input events required for the output event.

3. Examples from practice
In this part we show the reliability diagnostics results for three equipments analysed recently in our Institute. Until now, only the basic reliability analysis has been included in development work in the Institute, so we estimated the failure rates of elements, units and equipment. For this work we used software developed in-house and based on MIL-HDBK-217E. Using the element producers' data and application data, we calculated the failure rates of units and equipment. Application data include the application stress factor, ambient temperature, temperature inside the unit, application factor, etc.

We considered two ambient temperatures: the specified maximum ambient temperature (50°C) and 25°C. In both cases the calculation was done with the stress factor appearing in the application. A third calculation for every element simulated the stand-by state, with an ambient temperature of 20°C and a 10% stress factor for the elements. It was thus possible to see the influence of temperature and stress factor on the failure rate of elements, units and equipment. This paper presents the results of calculations for an ambient temperature of 25°C and the real stress factor for every element in the application. From those data, MTTF is calculated and presented in Table 1.

TABLE 1: Data for calculating MTTF

EQP1                  EQP2                  EQP3
Unit   MTTF (h)       Unit   MTTF (h)       Unit   MTTF (h)
U1     15330          U1     15730          U1     7426
U2     40620          U2     20680          U2     8554
U3     50080          U3     26560
U4     58380          U4     38900
U5     58940          U5     48340
U6     67190          U6     58000
U7     79830          U7     71260
U8     112900         U8     72590
U9     116900         U9     198000
U10    145900
U11    147300
U12    246100
U13    251900
U14    254600
U15    395000
U16    499200
U17    559500
U18    705000
U19    714200
U20    778400
U21    1.5 x 10^6
EQP1   4546           EQP2   4137           EQP3   3957

Analysing the results, we found that in EQP1 the most critical unit is unit U1, and its monitor is the most critical 'element' (MTTF = 10000 h). Because a series model of reliability is used, as mentioned above, the group of switches also has a big influence on the failure rate of the unit owing to the great number of those elements. The most critical element in that unit is one type of connector. Analysing the results for EQP2, it is obvious that the most critical unit is U1. The most critical group of elements is the integrated circuits, and inside that group the highest failure rate is found with one linear integrated circuit (MTTF = 50000 h). In EQP3, both units have approximately the same failure rate and therefore approximately the same MTTF. As a group of elements, the most critical are switches and linear integrated circuits.

The complexity of those equipments, referring to the number of electronic parts, is presented in Table 2. The complexity is based only on the number of elements and not on the kind of elements because that is not the object of this article.

TABLE 2: Complexity of equipments

EQP1   1820 elements
EQP2   1220 elements
EQP3   300 elements

The reliability curves are shown in Figs 2a, 2b and 2c. The goal of reliability analysis was to plan the method of maintenance and to plan for the necessary spare parts. As we supposed that just-failed units would be changed, we analysed equipment from that point of view. From the data shown it is evident that all three equipments have an expected mean time between failures of approximately half a year, not depending on complexity. All three equipments work in a ground-fixed environment.

From numerical and graph analyses, it can be seen that in the first equipment, EQP1, we have one group of units which are not critical from the reliability point of view (U12, U13, U14, U15, U16, U17, U18, U19, U20, U21), one group which can be put in the upper level (U2, U3, U4, U5, U6, U7), one group which can be put in the lower level (U8, U9, U10, U11) and one unit which can be critical, U1, because we can expect its total failure at the time ($t_1$) when the other units still have a fairly high probability of correct working. Analysing the reliability diagnostic results of the second equipment, EQP2, it is evident that the unit U9 is not critical compared with the group of other units.
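The step from unit MTTF values to an equipment MTTF can be retraced with a minimal Python sketch. It assumes the series model with constant failure rates, so that each unit's failure rate is the reciprocal of its MTTF. The unit values are those listed in Table 1; the function names and the 4380 h operating time (roughly half a year) are assumptions made for illustration, and the sketch does not reproduce the in-house software.

```python
import math

# Unit MTTF values in hours, as listed in Table 1.
EQP2_UNITS = {"U1": 15730, "U2": 20680, "U3": 26560, "U4": 38900,
              "U5": 48340, "U6": 58000, "U7": 71260, "U8": 72590,
              "U9": 198000}
EQP3_UNITS = {"U1": 7426, "U2": 8554}


def series_mttf(unit_mttfs):
    """Series model: failure rates (1 / MTTF) add, MTTF is the reciprocal."""
    total_failure_rate = sum(1.0 / mttf for mttf in unit_mttfs)
    return 1.0 / total_failure_rate


def unit_reliability(mttf, hours):
    """R(t) = exp(-t / MTTF) for a constant failure rate."""
    return math.exp(-hours / mttf)


print(f"EQP3 series MTTF: {series_mttf(EQP3_UNITS.values()):.0f} h")

hours = 4380  # hypothetical operating time, about half a year
for name, mttf in EQP2_UNITS.items():
    print(f"EQP2 {name}: R({hours} h) = {unit_reliability(mttf, hours):.3f}")
```

For EQP3 the sketch gives roughly 3975 h, close to the 3957 h listed in Table 1. Printing the unit reliabilities of EQP2 at a common time shows the spread between the most critical unit, U1, and the least critical one, U9.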
Fig 2 Reliability curves: (a) EQP1, (b) EQP2, (c) EQP3
Inside that group it is also evident that unit U1 has one-third of the probability of correct working at $t = t_1$ that unit U8 has. Based on the results of reliability diagnostics of EQP1 and EQP2, the decision on maintenance policy has to be made. That means that the method of functional diagnostics has to be determined, and the number of spare parts has to be calculated and planned according to the probability of the equipment and its units working correctly in the stated time. In our cases that means that an additional number of critical units, and of units from lower levels of reliability, has to be planned as spare parts in the place where those equipments have to operate. Looking at the results for the units of the third equipment, EQP3, it is evident that the units have much the same probability of correct working during the expected work life, and that they have to be treated in the same way during maintenance activities.
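How the number of spare units might follow from these probabilities is sketched below in Python. The paper does not describe its spares calculation; the sketch assumes that failed units are replaced immediately and that failures arrive at a constant rate, so the number of failures in a period follows a Poisson distribution with mean t / MTTF. The one-year period and the 95% target are hypothetical planning parameters.

```python
import math


def spares_needed(mttf_hours, period_hours, confidence):
    """Smallest s with P(failures <= s) >= confidence, Poisson assumption."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    expected_failures = period_hours / mttf_hours
    term = math.exp(-expected_failures)  # P(exactly 0 failures)
    cumulative = term
    spares = 0
    while cumulative < confidence:
        spares += 1
        term *= expected_failures / spares  # P(exactly `spares` failures)
        cumulative += term
    return spares


# MTTF values of EQP1 units U1 and U21 from Table 1; the period and the
# confidence level are hypothetical.
one_year = 8760
print("U1 spares: ", spares_needed(15330, one_year, 0.95))
print("U21 spares:", spares_needed(1.5e6, one_year, 0.95))
```

Under these assumptions a critical unit such as U1 calls for spares on site, while a non-critical unit such as U21 does not, which matches the grouping of units described above.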
4. Conclusion

The reliability diagnostics of electronic equipment in the early development phase or on the prototype is very important for planning the other activities during its life cycle. Also important are: burn-in, run-in, planning the method of diagnosing during the equipment's operation, planning the system of maintenance, planning the spare parts, and availability and dependability. The examples from practice are a modest effort towards reaching these goals.

5. References

O'Connor, P. D. T. 1985. Practical Reliability Engineering (2nd edn).
Vesely, W. E., Goldberg, F. F., Roberts, N. H. and Haasl, D. F. 1981. Fault Tree Handbook, NUREG-0492, January.
Coming events
Organised or co-sponsored by IMEKO

Each entry lists the event, the date and venue, and the contact.

1990

Technical diagnostics (TC-10)
17-19 September, Helsinki, Finland
Finnish Automation Support Ltd, Hämeentie 6A/15, SF-00530 Helsinki, Finland

TEMPMEKO '90: Temperature and thermal measurement in industry and science (TC-12)
17-21 September, Helsinki, Finland
Finnish Automation Support Ltd, Hämeentie 6A/15, SF-00530 Helsinki, Finland

12th Conference on weighing technology (TC-3)
18-20 September, Szeged, Hungary
IMEKO Secretariat

2nd Workshop on measurement and inspection in industry by computer-aided laser metrology (TC-14)
24-27 September, Balatonfüred, Hungary
IMEKO Secretariat

7th Symposium on knowledge based measurement (TC-1/TC-7)
26-29 September, Karlsruhe, Germany
VDI/VDE Gesellschaft für Mess- und Automatisierungstechnik, D-4000 Düsseldorf 1, PO Box 1139, FR Germany

9th Colloquium: Education in measurement and instrumentation (TC-1)
September, Munich, Germany
VDI/VDE Gesellschaft für Mess- und Automatisierungstechnik, D-4000 Düsseldorf 1, PO Box 1139, FR Germany

5th Seminar: Electronic weighing (TC-11)
- September, Zagreb, Yugoslavia
IMEKO Secretariat/Institute for Developing Countries, Zagreb, Yugoslavia

4th Int Symp: Intelligent measurement of electrical and magnetic quantities (TC-4)
15-17 November, Varna, Bulgaria
Dr I. Adarski, Institute for Microprocessor based Instruments and Systems, Lenin bul 7th km, 1184 Sofia, Bulgaria

1991

Workshop for instrumental problems of technical diagnostics (TC-10)
2-5 June, Warsaw, Poland
Dr Z. Warsza, Institute of Industrial Chemistry, ul Rydgiera 8, 01-793 Warszawa, Poland

12th IMEKO World Congress
5-10 September, Beijing, China
IMEKO XII Secretariat, c/o Chinese Society for Measurement, PO Box 1413, Beijing 100013, China

8th Symposium on artificial intelligence based measurement and control (TC-7)
12-14 September, Kyoto, Japan
Prof K. Kariya, Dept of Electrical Engineering, Faculty of Science and Engineering, Ritsumeikan University, 56-1 Tojiin-kita, Kita-ku, Kyoto 603, Japan