Computers ind. Engng Vol. 33, Nos 3-4, pp. 753-756, 1997. © 1997 Elsevier Science Ltd. Printed in Great Britain. All rights reserved. 0360-8352/97 $17.00 + 0.00
Pergamon PII: S0360-8352(97)00239-8
Mission Reliability of an Automatic Control System Integrated with Distributed Intelligent Built-In-Test Systems

Wang-Jin Yoo
Dept. of Industrial Engineering, Kon-Kuk University, Seoul, Korea 133-701

Kyung-Hee Jung
Management Consulting Division, POSCO Research Institute, Seoul, Korea 137-070
Proceedings of 1996 ICC&IC

Abstract

This paper introduces distributed-centralized Built-In-Test (BIT) systems interfaced with an automatic control system for the purpose of improving mission reliability. Using a block diagramming method, a complicated system is decomposed into mutually exclusive subsystems, and a distributed BIT is connected to each subsystem so that fault detection proceeds in parallel. The data produced by the distributed BITs are sent to a central control processor. We present a Markov process approach to analytically derive the mission reliability of a fault-tolerant automatic control system with distributed BITs. As diagnostic mistakes of the BIT, false alarms and missed faults are considered together with the malfunction of the BIT itself. Numerical examples evaluate the performance of distributed intelligent BITs by comparing the mission reliabilities that result from varying the design parameters over a time domain. © 1997 Elsevier Science Ltd

Keywords: Intelligent BIT, Mission Reliability, Markov Chain

1. Introduction

Since modern avionics systems have become more complicated, they are monitored and controlled by a central controller in order to prevent severe failures. The system units can be subject to several types of outages, including independent, dependent, multiple, common-cause, and system-originated outages. This paper deals with a model of an intelligent built-in-test (BIT) interfaced to a fault-tolerant automatic control system. The main tasks of a BIT are to control the total system as well as to detect faults and perform diagnosis. Buswell & Graves [1] showed the design logic of BIT hardware and software incorporated into the primary system. Carroll et al. [2] proposed a general approach for design parameter selection and assessment of the BIT. Haedtke [3] and Krause [5] presented different applications of BIT results, such as multilevelling of self-tests and reduction of mean time to repair.

Besides the outages of system units, actual systems also experience failures or diagnostic errors of the central controller and the testing system. Shao & Lamberson [7] analyzed the impact of a BIT-issued false alarm on a system's RAM and concluded that it is critical to maintain and ensure the reliability of a system. This paper provides several techniques for reducing BIT malfunctions through strategic design of the BIT configuration (see [9-10]). A total system can be partitioned into several subsystems, each of which is then connected to a distributed BIT. Harris [4] presented the star network and data bus methods for distributed BIT processing. It was assumed in [4] and [6] that a central control unit exists in a distributed BIT processing structure.

Shao and Kapur [8] introduced the block diagramming method to decompose a complex system into simple and basic blocks. Based on their decomposition method, the system in this paper is decomposed into subsystems connected to BITs. For the effect analysis of a distributed-centralized BIT system, mission reliability is taken as the major criterion. We derive the analytic solution of mission reliability for a k-out-of-n:G load-sharing system with BITs by using a Markov process approach. Through numerical examples, it is possible to compare the mission reliabilities as the design parameters of a system vary over a time domain, and to suggest the boundary of appropriate parameter values.

2. Configuration of Distributed-Centralized BIT

The BIT is a microprocessor-based self-testing system installed in the computer. Its tasks are fault detection, diagnosis, isolation, and storage of information records. When the BIT tests a system, the sensors in the BIT receive signals about system status and output. These analog signals are transformed into digital ones via an analog-digital converter. A microprocessor analyzes the signals for fault detection and subsequently diagnoses the failure mode, time, and system impact. If the fault critically affects the system, a warning signal is given, and the system is reconfigured after isolation of the fault. The fault-related information is then stored in a database for system maintenance and repair analysis. The algorithms for monitoring and controlling a fault are obtained by a fault tree analysis on the basis of all possible failure modes and effects. Based on the data from the distributed BITs, a central microprocessor analyzes the system health, environment, and mission requirements, and optimizes the design parameters to improve mission reliability.

The malfunctioning of a BIT is a serious problem, and we now suggest several methods to reduce the diagnostic errors of a BIT. The first method is to analyze the BIT conditions periodically during a given mission time. If a BIT malfunction is identified, the system is set to bypass the BIT diagnosis. As another method, we may set up a dual BIT interface system. If the same results are obtained from the two systems, the response is accepted; otherwise, it is rejected. To discard unrealistic information produced by BITs, we can consider multiple detection, specification of reasonable parameter values, and a BIT-indication analyzer.
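The dual BIT interface rule described above can be sketched in a few lines. This is only an illustration of the accept/reject logic, assuming each BIT reports a Boolean fault indication; the function name `dual_bit_response` and the use of `None` for a rejected response are hypothetical choices, not part of the paper.

```python
from typing import Optional


def dual_bit_response(primary: bool, secondary: bool) -> Optional[bool]:
    """Combine the fault indications of two BITs (hypothetical helper).

    Returns the common indication when both BITs agree, and None when
    they disagree, meaning the response is rejected.
    """
    if primary == secondary:
        return primary  # same result from both systems: accept it
    return None  # different results: reject the response
```

A rejected response could then be handled by one of the other measures listed above, such as repeating the detection.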
3. Mission Reliability of System with BITs
The main task of a BIT is to improve system maintainability, not to hinder system operation as an additional subsystem. The BIT is also a system controller and supports the decision-making processes of system maintenance. Therefore, the effect of the BIT should be evaluated in terms of system reliability. Based on the block diagramming method of Shao and Kapur, a complex system with numerous inspection units is divided into several subsystems interfaced with BITs. Mission reliability is the probability that a system completes a specified mission under given conditions within a certain time. In this paper, we assume that the BIT obstructs or controls system operation. For the reliability analysis of blocked subsystems, we may consider serial, parallel, k-out-of-n, load-sharing, and voting structure methods. However, if the number of units is more than two, the traditional methods are not appropriate for reliability estimation. In order to derive the analytic solution, we assume that the lifetimes of units are exponentially distributed. We then show the Markov process approach to estimating the mission reliability of a system with BITs. If the BIT detects a fault, a controller adjusts the parameters of the operable units in order to maintain the system in an efficient and economic manner. We assume that a system fails if the BIT misses a fault and the failed unit is therefore not isolated from the system. Also, if the BIT issues a false alarm for a unit that actually did not fail, the controller automatically removes this unit. If either the BIT or the controller fails, the system also fails.
The total failure rate of units at each state depends on the number of operable units.

3.1 Mission Reliability of System with Identical Unit Failure

Before describing the problem, we define the following notation.

$\lambda_{fa}$: False alarm rate of BIT
$\lambda_{bit}$: Failure rate of BIT
$\lambda_{con}$: Failure rate of controller
$\lambda_j$: Failure rate of each operable unit at the state with $j$ failed units, for $j = 0, 1, 2, \ldots, n-k$
$F_d$: Probability of a fault being detected by BIT

The interfailure times of all units are identical and exponentially distributed with the corresponding mean rates. When the BIT detects a fault or issues a false alarm, the system state changes, and the failure rate of each operable unit changes with it. If $j$ units fail, there are $(n-j)$ operable units, and each operable unit has a constant failure rate $\lambda_j$ for $j = 0, 1, 2, \ldots, (n-k)$. The system state with unit failure rate $\lambda_{n-k+1}$ does not exist, since the system fails as soon as there are $(n-k+1)$ failed units. We assume that the processes of switching states are perfect and that the switching time is negligible.

Let us define the state space of the system. At state $j$ ($j = 0, 1, 2, \ldots, n-k$), there are $(n-j)$ operable units, and the $j$ failed units are disconnected from the given network, while the BIT and the controller are in normal condition. The system fails at state $(n-k+1)$, which has $(n-k+1)$ failures, since there are fewer than $k$ operable units. If the BIT does not detect an actual unit failure, or if either the BIT or the controller itself fails, the system goes to the failure state $f$. Therefore, the set of all possible states of the system is $E = \{0, 1, 2, \ldots, n-k, n-k+1, f\}$. The set of system failure states is $F = \{n-k+1, f\}$, while the elements of $E \setminus F$ are the states of system operation. The failure rate of the initial state is $n\lambda_0$, since there exist $n$ operable units at the initial state, where $\lambda_0$ is the mean failure rate of each unit at state 0. If a system fails at the initial state, the transition rate from state 0 to state $f$ is the sum of the BIT fault missing rate and the failure rates of the BIT and the controller, that is, $(1-F_d)\,n\lambda_0 + \lambda_{bit} + \lambda_{con}$, where $(1-F_d)$ is the probability of a fault being missed by the BIT and $(1-F_d)\,n\lambda_0$ is the fault missing rate of the BIT.
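As a quick numerical illustration of the transition rate into the failure state $f$, the following sketch evaluates $(1-F_d)\,n\lambda_0 + \lambda_{bit} + \lambda_{con}$. The text states this rate for state 0 only; extending it to state $j$ by replacing $n\lambda_0$ with $(n-j)\lambda_j$ is an assumption made here, chosen to agree with the difference $C_j - B_j$ of Eqs. (1)-(2). The function name and the example values are hypothetical.

```python
def rate_to_failure_state(n, j, lam_j, Fd, lam_bit, lam_con):
    """Transition rate from operating state j to the failure state f.

    n       : total number of units
    j       : number of failed (disconnected) units in the current state
    lam_j   : failure rate of each operable unit at state j
    Fd      : probability that the BIT detects a fault
    lam_bit : failure rate of the BIT
    lam_con : failure rate of the controller
    """
    # Fault missing rate: a unit fails and the BIT does not detect it.
    missed_fault_rate = (1.0 - Fd) * (n - j) * lam_j
    return missed_fault_rate + lam_bit + lam_con


# Illustrative values: 4 units at state 0, 1 % of faults missed.
print(rate_to_failure_state(n=4, j=0, lam_j=0.045, Fd=0.99,
                            lam_bit=0.001, lam_con=0.001))
```

With a perfect BIT ($F_d = 1$) and failure-free BIT and controller, the rate is zero, so state $f$ can never be entered and only the exhaustion of units leads to system failure.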
There are two types of BIT tests: the periodic test and the initial test. The periodic test is performed periodically while a system is in operation. Since the BIT test process should not affect system operation, the periodic test is very restricted. The initial test is performed when simulation signals are put into a system. The probability of BIT fault detection is estimated through both the initial and periodic tests. The transition rate from state 0 to state 1 is $(F_d\,n\lambda_0 + \lambda_{fa})$, where $\lambda_0$ is the failure rate of each unit at state 0 and $\lambda_{fa}$ is the false alarm rate of the BIT. Since it is possible to miss a fault, the actual failure rate of a unit detected by the BIT is $F_d\lambda_0$ at the initial state. The false alarm rate is the probability that the BIT issues a false alarm during a unit time after a point at which no false alarm has been issued, or after the last false alarm that did not cause an actual system failure. The false alarm rate is approximately the product of the probability of issuing a false alarm and the total failure rate of the primary system. Empirically, the probability of issuing a false alarm is the percentage of total maintenance actions based on BIT. We represent the above Markov transitions by using the differential equations, as follows:

$$\frac{dP_0(t)}{dt} = -C_0 P_0(t)$$

$$\frac{dP_1(t)}{dt} = -C_1 P_1(t) + B_0 P_0(t)$$

$$\frac{dP_j(t)}{dt} = -C_j P_j(t) + B_{j-1} P_{j-1}(t) \quad \text{for } j = 2, \ldots, n-k$$

$$\frac{dP_{n-k+1}(t)}{dt} = B_{n-k} P_{n-k}(t)$$

where $B_i$ and $C_i$ for $i = 0, 1, 2, \ldots, n-k$ are

$$B_i = F_d\,(n-i)\lambda_i + \lambda_{fa} \tag{1}$$

$$C_i = (n-i)\lambda_i + \lambda_{fa} + \lambda_{bit} + \lambda_{con} \tag{2}$$

The initial conditions at time 0 are

$$P_i(0) = \begin{cases} 1 & \text{if } i = 0 \\ 0 & \text{if } \forall i \in S \setminus \{0\} \end{cases} \tag{3}$$

where $S = \{0, 1, 2, \ldots, n-k\}$.

By using the Laplace transform, we can represent each $P_j(t)$ ($j = 0, 1, 2, \ldots, n-k$) as

$$P_0(t) = \exp(-C_0 t) = \exp\{-(n\lambda_0 + \lambda_{fa} + \lambda_{bit} + \lambda_{con})\,t\} \tag{4}$$

$$P_j(t) = \prod_{i=0}^{j-1} B_i \sum_{i=0}^{j} \frac{\exp(-C_i t)}{\prod_{m=0,\, m \neq i}^{j} (C_m - C_i)} = \prod_{i=0}^{j-1} \{F_d\,(n-i)\lambda_i + \lambda_{fa}\} \sum_{i=0}^{j} \frac{\exp[-\{(n-i)\lambda_i + \lambda_{fa} + \lambda_{bit} + \lambda_{con}\}\,t]}{\prod_{m=0,\, m \neq i}^{j} \{(n-m)\lambda_m - (n-i)\lambda_i\}} \tag{5}$$

for $j = 1, 2, \ldots, n-k$, where $B_i$ and $C_i$ are defined as in Eqs. (1)-(2).

The mission reliability of a system is the probability that the system completes a mission under given conditions for a certain time period. Therefore, the mission reliability $R(t)$ is the sum of the probabilities of all possible transition states at time $t$:

$$R(t) = \sum_{j \in S} P_j(t) = \sum_{j=0}^{n-k} P_j(t) \tag{6}$$

where $S = \{0, 1, 2, \ldots, n-k\}$, and $P_j(t)$ ($j = 0, 1, 2, \ldots, n-k$) are defined as in Eqs. (4)-(5). As shown in Eqs. (4)-(6), the mission reliability depends on the values of the design parameters. By substituting the appropriate parameter values into $P_j(t)$ in Eq. (6), the BIT effect can be estimated.

3.2 Mission Reliability of System with Non-identical Unit Failure

In this section, we consider a k-out-of-n:G system with common-mode outages. A common-mode outage is an event having a single external cause that results in a multiple-unit failure. The failure pattern of units is generated independently. Common-mode outages are not correlated, and their interarrival times are exponentially distributed. A unit fails by a single external cause, where there are $p$ types of common-mode outages. Therefore, the set of all units is partitioned into mutually exclusive subsets of units failed simultaneously by each external event. Let us assume that the failure rate of unit $i$ is time dependent, that is, $\lambda_i(t)$.

The following notations will be used in this section.

$\lambda_i(t)$: Failure rate of unit $i$ at time $t$
$\lambda_{fa}(t)$: False alarm rate of BIT at time $t$
$\lambda_{bit}(t)$: Failure rate of BIT at time $t$
$\lambda_{con}(t)$: Failure rate of controller at time $t$
$E_i$: external event $i$, for $i = 1, 2, \ldots, p$
$E_{(i)}$: $i$th external event
$u_i$: unit $i$, for $i = 1, 2, \ldots, n$
$u_{fa}^{(i)}$: false alarmed unit at the $i$th state
$U_i$: set of units simultaneously failed by external event $E_i$
$U_{(i)}$: set of units simultaneously failed by the $i$th external event $E_{(i)}$
$\lambda(t) = \sum_{i=1}^{n} \lambda_i(t)$: sum of all unit failure rates at time $t$

In a similar way to Section 3.1, we show the differential equations of the transition states for all possible domains as follows:

$$\frac{dP_{(0)}(t)}{dt} = -C_{(0)}(t)\,P_{(0)}(t)$$

$$\frac{dP_{(1)}(t)}{dt} = -C_{(1)}(t)\,P_{(1)}(t) + B_{(0)}(t)\,P_{(0)}(t)$$

$$\frac{dP_{(j)}(t)}{dt} = -C_{(j)}(t)\,P_{(j)}(t) + B_{(j-1)}(t)\,P_{(j-1)}(t)$$

$$\frac{dP_{(\phi+1)}(t)}{dt} = B_{(\phi)}(t)\,P_{(\phi)}(t) \tag{7}$$

where, for $0 \le j \le \phi$ and $\phi < (n-k+1)$,

$$B_{(j)}(t) = F_d^{|S(j)|} \Big\{ \lambda(t) - \sum_{u_i \in \bigcup_{m=0}^{j-1} S(m)} \lambda_i(t) \Big\} + \lambda_{fa}(t) \tag{8}$$

$$C_{(j)}(t) = \Big\{ \lambda(t) - \sum_{u_i \in \bigcup_{m=0}^{j-1} S(m)} \lambda_i(t) \Big\} + \lambda_{fa}(t) + \lambda_{bit}(t) + \lambda_{con}(t) \tag{9}$$

and $|S(j)|$ is the number of elements in $S(j)$. $S(j)$ has the following two cases: 1) $\{u_{fa}^{(j)}\}$, if the BIT issues a false alarm at the $j$th state, and 2) $U'_{(j)}$, if the BIT detects all failures of $U_{(j)}$ at the $j$th state, where $U'_{(j)} = U_{(j)} \setminus \bigcup_{i=0}^{j-1} \{u_{fa}^{(i)}\}$ is the set of units failed by $E_{(j)}$, excluding the units false alarmed up to the $(j-1)$th state.

The initial conditions at time 0 are:

$$P_{(i)}(0) = \begin{cases} 1 & \text{if } i = 0 \\ 0 & \text{if } \forall i \in \{1, 2, \ldots, \phi\} \end{cases}$$

By using the Laplace transform, $P_{(j)}(t)$ are shown as

$$P_{(0)}(t) = \exp\{-C_{(0)}(t)\,t\}$$

$$P_{(j)}(t) = \prod_{i=0}^{j-1} B_{(i)}(t) \sum_{i=0}^{j} \frac{\exp\{-C_{(i)}(t)\,t\}}{\prod_{m=0,\, m \neq i}^{j} \{C_{(m)}(t) - C_{(i)}(t)\}} \quad \text{for } j = 1, 2, \ldots, \phi \tag{10}$$

where $B_{(i)}(t)$ and $C_{(i)}(t)$ are defined as in Eqs. (8)-(9). The mission reliability becomes:

$$R(t) = \sum_{j=0}^{\phi} P_{(j)}(t) \quad \text{for } \phi < p$$
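The closed-form solution of Section 3.1 (Eqs. (1), (2) and (4)-(6)) lends itself to a direct numerical evaluation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the rates $\lambda_0, \ldots, \lambda_{n-k}$ are passed as a list, and the $C_j$ must be pairwise distinct, since Eq. (5) divides by their differences. The example values are taken from our reading of Table 1 (Cases 0 and 2).

```python
import math


def transition_rates(n, lam, Fd, lam_fa, lam_bit, lam_con):
    """Return the lists (B, C) of Eqs. (1)-(2) for states j = 0..len(lam)-1."""
    B = [Fd * (n - j) * lam_j + lam_fa for j, lam_j in enumerate(lam)]
    C = [(n - j) * lam_j + lam_fa + lam_bit + lam_con
         for j, lam_j in enumerate(lam)]
    return B, C


def state_probability(j, t, B, C):
    """P_j(t) of Eqs. (4)-(5); assumes C[0..j] are pairwise distinct."""
    coeff = math.prod(B[:j])  # empty product equals 1 when j = 0
    total = 0.0
    for i in range(j + 1):
        denom = math.prod(C[m] - C[i] for m in range(j + 1) if m != i)
        total += math.exp(-C[i] * t) / denom
    return coeff * total


def mission_reliability(t, n, lam, Fd=1.0, lam_fa=0.0, lam_bit=0.0, lam_con=0.0):
    """R(t) of Eq. (6): sum of P_j(t) over the operating states 0..n-k."""
    B, C = transition_rates(n, lam, Fd, lam_fa, lam_bit, lam_con)
    return sum(state_probability(j, t, B, C) for j in range(len(lam)))


# 2-out-of-4:G system, so lam holds the rates of states 0, 1 and 2.
case0 = mission_reliability(10.0, n=4, lam=[0.100, 0.100, 0.100])
case2 = mission_reliability(10.0, n=4, lam=[0.045, 0.075, 0.100],
                            Fd=0.990, lam_fa=0.002, lam_bit=0.001, lam_con=0.001)
print(case0, case2)
```

For Case 0 (perfect BIT, equal rates in every state) the units behave as independent exponential units, so the result can be cross-checked against the binomial probability that at most two of the four units have failed by time $t$.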
In the mission reliability expression of Section 3.2, $P_{(j)}(t)$ are defined as in Eq. (10).

4. Numerical Examples

Case 0 considers a load-shared 2-out-of-4:G automatic control system with an identical unit failure rate of 0.1. This system is interfaced with a BIT whose failure rate is 0.0; therefore, the BIT never fails. If a fault occurs, the BIT detects it perfectly, that is, $F_d = 1.0$. The false alarm rate of the BIT is 0.0. The set of all possible transition states is $E = \{0, 1, 2, 3, f\}$, where $f$ denotes the state of system failure. If the failure rate of the controller is 0.0, $B_j$ and $C_j$ of Eqs. (1)-(2) become

$$B_j = 1.0\,(4-j)\lambda_j, \qquad C_j = (4-j)\lambda_j \qquad \text{for } j = 0, 1, 2$$

Then the transition probabilities are:

$$P_0(t) = e^{-0.4t}$$

$$P_j(t) = \prod_{i=0}^{j-1} B_i \sum_{i=0}^{j} \frac{\exp(-C_i t)}{\prod_{m=0,\, m \neq i}^{j} (C_m - C_i)} = \prod_{i=0}^{j-1} (4-i)\lambda_i \sum_{i=0}^{j} \frac{\exp\{-(4-i)\lambda_i t\}}{\prod_{m=0,\, m \neq i}^{j} \{(4-m)\lambda_m - (4-i)\lambda_i\}} \quad \text{for } j = 1, 2$$

where the initial conditions are:

$$P_i(0) = \begin{cases} 1 & \text{if } i = 0 \\ 0 & \text{if } \forall i \in \{1, 2\} \end{cases}$$

Then, the mission reliability at time $t$ is:

$$R(t) = e^{-0.4t} + 4\lambda_0 \sum_{i=0}^{1} \frac{\exp\{-(4-i)\lambda_i t\}}{\prod_{m=0,\, m \neq i}^{1} \{(4-m)\lambda_m - (4-i)\lambda_i\}} + \prod_{i=0}^{1} (4-i)\lambda_i \sum_{i=0}^{2} \frac{\exp\{-(4-i)\lambda_i t\}}{\prod_{m=0,\, m \neq i}^{2} \{(4-m)\lambda_m - (4-i)\lambda_i\}}$$

and $R(0) = 1$. The mission reliability of Case 0 is estimated on the time interval [0, 20]. We also consider the mission reliabilities of Cases 1 and 2 with different parameter values. The design parameters of Cases 0-2 are shown in Table 1.

Table 1. Design Parameters of 2-out-of-4:G System for Cases 0-2

          λ0      λ1      λ2      Fd      λfa     λbit    λcon
Case 0    0.100   0.100   0.100   1.000   0.000   0.000   0.000
Case 1    0.045   0.075   0.100   1.000   0.000   0.000   0.000
Case 2    0.045   0.075   0.100   0.990   0.002   0.001   0.001

The mission reliabilities for Cases 1-2 are higher than that of Case 0, since the unit failure rates are lower in these cases. Moreover, in Case 1, the BIT finds unit faults perfectly, and the BIT and the controller never fail.

5. Conclusions

In this paper, a model of an intelligent built-in-test (BIT) interfaced to a fault-tolerant automatic control system is shown. The main tasks of a BIT are central control processing by fault detection and diagnosis in a complex system, on-line and in real time. Besides the outages of units in a system, the failures of the BIT and the controller and the diagnostic mistakes of the BIT, such as fault missing and false alarms, are also considered. In order to reduce malfunctions of the BIT, several techniques are suggested from the viewpoint of BIT configuration design.

The mission reliability, which is the probability of performing an assigned mission within a given time, is used as a criterion for the effect analysis of a distributed-centralized BIT system. We considered a k-out-of-n:G load-sharing system interfaced with BITs and derived the analytic solution of mission reliability using a Markov process approach. Through numerical examples, the performance of distributed-centralized BITs was evaluated with more appropriate design parameters.

References

1. Buswell, S. M. and F. P. Graves, 1988, "Self-Test of Timing and Control Circuitry," Proceedings of Annual Reliability and Maintenance Symposium, pp. 108-111
2. Carroll, W. H. et al., 1981, "Diagnostics Specification - A Proposed Approach," Proceedings of Annual Reliability and Maintenance Symposium, Vol. 30, pp. 227-231
3. Haedtke, J. E. and W. R. Olson, 1987, "Multilevel Self-Test for the Factory and Field," Proceedings of Annual Reliability and Maintenance Symposium, pp. 2?4-278
4. Harris, D. E., 1984, "Built-In-Test to Support Remote System Maintenance," Proceedings of Annual Reliability and Maintenance Symposium, pp. 422-42?
5. Krause, G. S., 1984, "Microprocessing to Reduce MTTR of Analog System," Proceedings of Annual Reliability and Maintenance Symposium, pp. 505-509
6. Krause, G. S., 1985, "Distributed vs. Centralized BIT/FIT Proceeding," Proceedings of Annual Reliability and Maintenance Symposium, pp. 291-295
7. Shao, J. and L. R. Lamberson, 1988, "Impact of BIT Design Parameters on Systems RAM," Reliability Engineering and System Safety, Vol. 23, pp. 219-246
8. Shao, and Kapur, 1989, "Structure Functions for Multistate Systems," Technical note, WSU
9. Yoo, W. and H. Oh, 1993, "A Study of Built-In-Test Diagnosis Mistakes as a False Alarm Filter," Journal of the KSQC, Vol. 21, No. 2, pp. 1-16
10. Yoo, W. and H. Oh, 1995, "Useful Redundant Techniques for Built-In-Test Related System," Journal of the KIIE, Vol. 21, No. 2, pp. 183-195