Reliability diagnostics of electronic equipment


V. Soljačić and M. Kosić

Iskra Automatika, Research Institute, 61000 Ljubljana, Stegne 15/b, Yugoslavia

Reliability diagnostics is important in the selection and design of equipment and spare parts, and in establishing a maintenance policy. Three possible methods are: failure rate analysis; failure mode, effect and criticality analysis; and fault tree analysis. The importance of reliability diagnostics in the early planning and design stages is emphasised.

Keywords: diagnostics, reliability, failure rate, failure mode, fault tree

1. Introduction

Reliability diagnostics of electronic equipment in the early phase of a design is based on predicting the reliability of component parts, units, equipment and systems. Early estimation and diagnostics are necessary for choosing the most satisfactory design of equipment, for correctly planning spare parts, for determining a better maintenance policy and for achieving better availability during the life cycle of the equipment.

2. Suggested methods

From the literature and from practice, several methods are known for the early diagnosis, from the reliability point of view, of incorrect operation of equipment. The most frequently used are: (a) failure rate analysis ($\lambda$); (b) failure mode, effect and criticality analysis (FMECA); (c) fault tree analysis (FTA).

These methods of analysis and reliability diagnostics can be divided into two groups: the inductive and the deductive approaches. The inductive approach, which includes failure rate analysis and FMECA, analyses the influence of separate items and events on the function of the equipment or system. The deductive approach, which includes FTA, analyses the causes of a given failure of the equipment or system. Thus the inductive approach is used when we want to determine which states of the equipment are possible (generally failure states), and the deductive approach is used when we want to determine how a given system state (usually a failed state) can occur.

(a) Failure rate analysis

Failure rate analysis is based on knowing the failure rates of the equipment's electronic parts. The mathematical model for estimating the failure rate of electronic parts is the result of many years' observation of resistors, capacitors, integrated circuits, connectors, etc., in all kinds of applications and in laboratories. A design engineer uses such formulae to calculate the failure rates of individual components, units and even equipment, and from this analysis the 'weak points' of the equipment can be determined. The widely used mathematical models are given in handbooks: MIL-HDBK-217 (revisions A to E), CNET's and Bellcore's. The basic form of these models is:

$\lambda_x = \lambda_b \, \pi_1 \, \pi_2 \cdots$

where $\lambda_x$ is the failure rate of the electronic part, $\lambda_b$ is the basic failure rate of the electronic part, taking into account technology, temperature and electrical stress, and $\pi_1, \pi_2, \ldots$ are factors connected to quality, construction, environment, electrical stress, application, etc. It is assumed that the equipment is in the flat part of the so-called 'bath-tub' curve (Fig 1). This means that the equipment is in the 'stable' part of its useful life (period B), and that early failures (period A) are eliminated in the production phase (burn-in, run-in). In this case, failures caused by wear-out (period C) are not taken into consideration.

Measurement Vol 8 No 3, Jul-Sep 1990

Fig 1 Typical failure rates over time (A: early failures; B: useful life; C: wear-out)
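The multiplicative part model $\lambda_x = \lambda_b \pi_1 \pi_2 \cdots$ given in (a) can be sketched in a few lines. This is a minimal illustration, assuming hypothetical values for the base rate and the π factors (they are not taken from MIL-HDBK-217 or from this study); rates are expressed in failures per 10^6 h, the unit used in the handbooks, and MTTF is taken as the reciprocal of the constant failure rate of period B.

```python
import math

def part_failure_rate(lambda_b: float, pi_factors: dict[str, float]) -> float:
    """Return lambda_x = lambda_b * pi_1 * pi_2 * ... (same units as lambda_b)."""
    # math.prod of an empty collection is 1, so a part with no factors keeps lambda_b.
    return lambda_b * math.prod(pi_factors.values())

# Hypothetical inputs, failures per 10^6 hours (illustrative only).
lambda_b = 0.005
pi = {"quality": 3.0, "environment": 2.0, "stress": 1.5}

lambda_x = part_failure_rate(lambda_b, pi)   # failures per 10^6 h
mttf_hours = 1e6 / lambda_x                   # constant rate: MTTF = 1 / lambda
print(f"lambda_x = {lambda_x:.4f} per 1e6 h, MTTF = {mttf_hours:.0f} h")
```

Keeping the factors in a named dictionary makes it easy to see which factor dominates when a 'weak point' is being investigated.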

For estimating the failure rate of units, the series reliability model is used, which means that the unit's failure rate is the sum of its parts' failure rates. For estimating the equipment's failure rate, a reliability block diagram (RBD) is often used to identify the critical part of the equipment (for example, the receiver or transmitter in RR equipment). Reliability, defined as the probability that an item will perform a required function under stated conditions for a stated period of time, is calculated using the equation:

$R = e^{-\lambda t}$

where $R$ is the reliability of the equipment, $\lambda$ is the failure rate of the equipment, and $t$ is the time (in hours) for which the equipment or unit is expected to operate. For a series RBD with $n$ units ($i = 1, 2, \ldots, n$), the equipment's reliability is

$R = \prod_{i=1}^{n} R_i$
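A minimal sketch of the series model: a unit's failure rate is the sum of its parts' rates, $R = e^{-\lambda t}$ gives the reliability over an operating time $t$, and a series RBD multiplies the unit reliabilities. The part rates and the 1000 h operating time below are hypothetical. The final check shows that, for exponential units, adding failure rates and multiplying reliabilities give the same answer.

```python
import math

def unit_failure_rate(part_rates: list[float]) -> float:
    """Series model: the unit fails if any part fails, so the rates add."""
    return sum(part_rates)

def reliability(failure_rate_per_hour: float, t_hours: float) -> float:
    """R(t) = exp(-lambda * t), valid in the constant-rate period B of Fig 1."""
    return math.exp(-failure_rate_per_hour * t_hours)

def series_reliability(unit_reliabilities: list[float]) -> float:
    """Series RBD: R = R_1 * R_2 * ... * R_n."""
    return math.prod(unit_reliabilities)

# Hypothetical unit made of three parts (failures per hour).
parts = [2e-6, 5e-6, 13e-6]
lam = unit_failure_rate(parts)   # about 2e-5 per hour
t = 1000.0
print(reliability(lam, t))       # exp(-0.02)

# For exponential items, the product of reliabilities equals exp(-(sum of rates) * t).
r_parts = [reliability(r, t) for r in parts]
assert math.isclose(series_reliability(r_parts), reliability(lam, t))
```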

In the case of a parallel (active redundant) RBD with $n$ units ($i = 1, 2, \ldots, n$), the equipment's reliability is

$R = 1 - \prod_{i=1}^{n} (1 - R_i)$
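For the active-redundant case, the block survives as long as at least one unit survives, which is where $1 - \prod_i (1 - R_i)$ comes from. A minimal sketch with a hypothetical pair of identical units:

```python
import math

def parallel_reliability(unit_reliabilities: list[float]) -> float:
    """Active-redundant RBD: the block fails only if every unit fails,
    so R = 1 - (1 - R_1)(1 - R_2)...(1 - R_n)."""
    return 1.0 - math.prod(1.0 - r for r in unit_reliabilities)

# Hypothetical example: two identical units, each with R = 0.9 over the mission time.
print(parallel_reliability([0.9, 0.9]))   # approximately 0.99
```

With two units of R = 0.9, redundancy lowers the probability of failure over the mission from 0.1 to 0.01.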

For all other combinations, mathematical models can be found in the literature on these problems.

(b) Failure mode, effect and criticality analysis (FMECA)

This analysis, known as bottom-up analysis, can be carried out at all levels of complexity: elements, units, equipment and electronic systems. In this analysis, besides the failure rate, the criticality of a failure with respect to function is taken into account. As a result, we gain insight into the kinds of failures that affect the correct operation of the equipment. Reliability diagnostics using FMECA has one very important feature: all assumed possible failures are analysed, together with their causes and consequences. It is thus possible to 'build in' additional parts, or even units, in good time, so that the degradation of some characteristics or the appearance of a fault can be diagnosed. Sometimes it is necessary to redesign the equipment's construction, to build in elements of better quality, or to build in redundant units. In the case of built-in software, it can be very practical to build in special diagnostic software for testing the equipment's ability and readiness to work correctly at a stated instant of time or over a stated period of time. Based on those results, we can take action in time to return the equipment to the correct state.
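FMECA, as described above, weighs each failure mode by its criticality as well as its failure rate. The paper does not give a criticality formula, so the sketch below swaps in the failure-mode criticality number of MIL-STD-1629A, $C_m = \beta \alpha \lambda_p t$, as one common choice, and ranks modes by severity class and then by $C_m$. The worksheet rows, item names and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    item: str
    mode: str
    part_rate: float    # lambda_p, failures per hour
    mode_ratio: float   # alpha: fraction of the part's failures occurring in this mode
    effect_prob: float  # beta: probability that this mode causes the stated effect
    severity: int       # 1 = catastrophic ... 4 = minor (MIL-STD-1629A classes I to IV)

    def criticality(self, t_hours: float) -> float:
        """Failure mode criticality number C_m = beta * alpha * lambda_p * t."""
        return self.effect_prob * self.mode_ratio * self.part_rate * t_hours

# Hypothetical worksheet rows (illustrative values only).
rows = [
    FailureMode("connector J1", "open contact", 4e-6, 0.6, 1.0, 2),
    FailureMode("connector J1", "short", 4e-6, 0.4, 0.5, 1),
    FailureMode("linear IC", "output stuck", 2e-6, 0.7, 1.0, 2),
]

t = 8760.0  # one year of continuous operation
# Most severe class first; within a class, highest criticality number first.
ranked = sorted(rows, key=lambda r: (r.severity, -r.criticality(t)))
for fm in ranked:
    print(f"class {fm.severity}  {fm.item:14s} {fm.mode:14s} C_m = {fm.criticality(t):.4f}")
```

Sorting by severity before criticality keeps a rare but catastrophic mode at the top of the list, which is the point of adding criticality to a plain failure rate analysis.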

(c) Fault tree analysis (FTA)

Fault tree analysis can be simply described as a technique whereby an undesired state of the system is specified (usually a state that is critical from the safety standpoint), and the system is then analysed in the context of its environment and operation to find all possible ways in which the undesired event can occur. The fault tree itself is a graphic model of the various parallel and serial combinations of faults that will result in the occurrence of the predefined undesired event. It is important to understand that a fault tree is not a model of all possible system failures or all possible causes of system failure. A fault tree is tailored to its top event, which corresponds to some particular system failure mode, and the fault tree thus includes only those faults that contribute to this top event. A fault tree is a complex of entities known as 'gates', which serve to permit or inhibit the passage of fault logic up the tree. The gates show the relationships of events needed for the occurrence of a 'higher' event. The 'higher' event is the output of the gate; the 'lower' events are the inputs to the gate. The gate symbol denotes the type of relationship of the input events required for the output event.

3. Examples from practice

In this section we show the results of reliability diagnostics of three items of equipment, carried out recently at our Institute. It should be pointed out that until now only basic reliability analysis has been included in development work at the Institute, so we estimated the failure rates of elements, units and equipment. For this work we used software developed in-house and based on MIL-HDBK-217E. Using the element manufacturers' data and application data, we calculated the failure rates of units and equipment. Application data include the application stress factor, ambient temperature, temperature inside the unit, application factor, etc. For our calculations we considered two ambient temperatures: the specified maximum ambient temperature (50°C) and 25°C. In both cases the calculation used the stress factor occurring in the application. A third calculation for every element simulated the stand-by state, with an ambient temperature of 20°C and a 10% stress factor for the elements. It was thus possible to see the influence of temperature and stress factor on the failure rates of elements, units and equipment. This paper presents the results of the calculations for an ambient temperature of 25°C and the real stress factor of every element in the application. From those data the MTTF was calculated; it is presented in Table 1.

TABLE 1: Data for calculating MTTF

Unit        EQP1 MTTF (h)   EQP2 MTTF (h)   EQP3 MTTF (h)
U1          15330           15730           7426
U2          40620           20680           8554
U3          50080           26560           -
U4          58380           38900           -
U5          58940           48340           -
U6          67190           58000           -
U7          79830           71260           -
U8          112900          72590           -
U9          116900          198000          -
U10         145900          -               -
U11         147300          -               -
U12         246100          -               -
U13         251900          -               -
U14         254600          -               -
U15         395000          -               -
U16         499200          -               -
U17         559500          -               -
U18         705000          -               -
U19         714200          -               -
U20         778400          -               -
U21         1.5×10^6        -               -
Equipment   4546            4137            3957

Analysing the results, we found that in EQP1 the most critical unit is U1, and its monitor is the most critical 'element' (MTTF = 10000 h). As mentioned above, a series reliability model is used, so the large number of elements of one kind (a group of switches) also has a big influence on the failure rate of the unit. The most critical element in that unit is one type of connector. Analysing the results of EQP2, it is obvious that the most critical unit is U1. The most critical group of elements is the integrated circuits, and within that group the highest failure rate is found for one linear integrated circuit (MTTF = 50000 h). In EQP3, both units have approximately the same failure rate, and therefore the same MTTF. As groups of elements, the most critical are switches and linear integrated circuits.

The complexity of the equipment, in terms of the number of electronic parts, is presented in Table 2. The complexity is based only on the number of elements and not on their kinds, because that is not the subject of this article.

TABLE 2: Complexity of equipment

EQP1   1820 elements
EQP2   1220 elements
EQP3   300 elements

The reliability curves are shown in Figs 2a, 2b and 2c. The goal of the reliability analysis was to plan the method of maintenance and the necessary spare parts. As we assumed that failed units would be replaced, we analysed the equipment from that point of view. From the data shown, it is evident that all three items of equipment have an expected mean time between failures of approximately half a year, independent of complexity. All three work in a ground-fixed environment. From numerical and graphical analyses, it can be seen that in the first equipment, EQP1, there is one group of units which are not critical from the reliability point of view (U12, U13, U14, U15, U16, U17, U18, U19, U20, U21), one group which can be placed at the upper level (U2, U3, U4, U5, U6, U7), one group which can be placed at the lower level (U8, U9, U10, U11), and one unit, U1, which can be critical, because its total failure can be expected at the time ($t_1$) when the other units still have a fairly high probability of correct operation. Analysing the reliability diagnostic results of the second equipment, EQP2, it is evident that unit U9 is not critical compared with the group of other units.
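The series model also explains how the equipment figures in Table 1 relate to the unit figures: with constant failure rates, $\lambda_{eq} = \sum_i 1/\mathrm{MTTF}_i$ and $\mathrm{MTTF}_{eq} = 1/\lambda_{eq}$. The sketch below applies this to the EQP2 and EQP3 unit values from Table 1 and compares the result with half a year (4380 h). The results come out close to, but not identical with, the equipment values listed in Table 1 (4137 h and 3957 h); the paper does not say why, so the difference may come from parts outside the listed units or from rounding, which is an assumption.

```python
def series_mttf(unit_mttfs_h: list[float]) -> float:
    """Series model with constant failure rates:
    lambda_eq = sum(1 / MTTF_i), so MTTF_eq = 1 / lambda_eq."""
    return 1.0 / sum(1.0 / m for m in unit_mttfs_h)

# Unit MTTF values from Table 1 (hours).
eqp2_units = [15730, 20680, 26560, 38900, 48340, 58000, 71260, 72590, 198000]
eqp3_units = [7426, 8554]

HALF_YEAR_H = 8760 / 2  # 4380 h
for name, units in (("EQP2", eqp2_units), ("EQP3", eqp3_units)):
    m = series_mttf(units)
    print(f"{name}: MTTF from units = {m:.0f} h ({m / HALF_YEAR_H:.2f} x half a year)")
```

Because the rates add, the two weakest units of EQP2 (U1 and U2) alone account for a large share of its equipment failure rate, which matches the paper's identification of U1 as critical.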

Fig 2 Reliability curves: (a) EQP1, (b) EQP2, (c) EQP3

Within that group, it is also evident that at $t = t_1$ unit U1 has one-third of the probability of correct operation of unit U8. Based on the results of the reliability diagnostics of EQP1 and EQP2, a decision on maintenance policy has to be made. This means that the method of functional diagnostics has to be determined, and the number of spare parts has to be calculated and planned according to the probability that the equipment and its units work correctly over the stated time. In our cases, this means that an additional number of critical units, and of units from the lower reliability levels, has to be planned as spare parts at the sites where the equipment is to operate. The results for the units of the third equipment, EQP3, show that the units have much the same probability of correct operation during the expected working life, and that they should be treated in the same way in maintenance activities.
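Two quantitative steps in the paragraph above can be sketched under the same constant-failure-rate assumption. First, with $R_i(t) = e^{-t/\mathrm{MTTF}_i}$, the time at which $R_{U1}(t) = R_{U8}(t)/3$ for the EQP2 units follows in closed form from Table 1; the paper reads $t_1$ off Fig 2 without stating its value, so the computed time should not be taken as the authors' $t_1$. Second, the paper does not say how the number of spares was calculated; a common approach, swapped in here, treats failures of a unit over a period $T$ as Poisson with mean $T/\mathrm{MTTF}$ and stocks the smallest number of spares whose cumulative probability reaches a chosen confidence. The one-year period and 90 % confidence are hypothetical.

```python
import math

def reliability_from_mttf(mttf_h: float, t_h: float) -> float:
    """R(t) = exp(-t / MTTF) for a constant failure rate."""
    return math.exp(-t_h / mttf_h)

def time_for_ratio(mttf_a: float, mttf_b: float, ratio: float) -> float:
    """Time at which R_a(t) / R_b(t) equals `ratio` (needs mttf_a < mttf_b, 0 < ratio < 1).

    exp(-t/MTTF_a) / exp(-t/MTTF_b) = ratio  =>  t = ln(1/ratio) / (1/MTTF_a - 1/MTTF_b)
    """
    return math.log(1.0 / ratio) / (1.0 / mttf_a - 1.0 / mttf_b)

def spares_needed(expected_failures: float, confidence: float) -> int:
    """Smallest stock s with P(N <= s) >= confidence, where N ~ Poisson(expected_failures)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    term = math.exp(-expected_failures)  # P(N = 0)
    cumulative = term
    s = 0
    while cumulative < confidence:
        s += 1
        term *= expected_failures / s    # P(N = s) from P(N = s - 1)
        cumulative += term
    return s

# EQP2 units U1 and U8, MTTF values from Table 1 (hours).
U1, U8 = 15730, 72590
t = time_for_ratio(U1, U8, 1 / 3)
print(f"R_U1 = R_U8 / 3 at t = {t:.0f} h "
      f"(R_U1 = {reliability_from_mttf(U1, t):.3f}, R_U8 = {reliability_from_mttf(U8, t):.3f})")

# Hypothetical spares policy: one installed U1, one year at the site, 90 % confidence.
period_h = 8760
print("spare U1 units:", spares_needed(period_h / U1, 0.90))
```

The Poisson model fits the 'replace failed units' policy assumed in the paper, since a replaced unit restarts with the same constant failure rate; for several installed units of one type, the expected number of failures is multiplied by the number installed.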

4. Conclusion

Reliability diagnostics of electronic equipment in the early development phase, or on the prototype, is very important for planning the other activities during its life cycle. Also important are burn-in, run-in, planning the method of diagnosis during the equipment's operation, planning the maintenance system, planning the spare parts, and availability and dependability. The examples from practice are a modest effort towards reaching these goals.

5. References

O'Connor, P. D. T. 1985. Practical Reliability Engineering (2nd edn).

Vesely, W. E., Goldberg, F. F., Roberts, N. H. and Haasl, D. F. 1981. Fault Tree Handbook, NUREG-0492, January.
