Copyright © IFAC Fault Detection, Supervision and Safety of Technical Processes, Washington, D.C., USA, 2003
IFAC Publications www.elsevier.com/locate/ifac
DEPENDABILITY ANALYSIS OF COMPLEX MECHATRONIC SYSTEMS
Mihaela Barreau, Alexis Todoskoff, Jean-Yves Morel, Fabrice Guerin and Alin Mihalache
University of Angers, Laboratoire de Sûreté de fonctionnement, Qualité et Organisation (LASQUO) (Quality, Dependability & Management Laboratory), 62, avenue N-D du Lac, 49000 ANGERS, France. Fax: 33.2.41.22.65.21. E-mail: mihaela.barreau@univ-angers.fr
Abstract: The dependability analysis of complex mechatronic systems is a very important engineering issue, in order to guarantee their functional behavior. Most of the critical failures are generated by the interactions between the sub-systems, implemented in different technologies, e.g. mechanics, electronics, and software. Therefore, the analysis of the system as a whole is not enough and it becomes necessary to study all the interactions in order to estimate the system's dependability. Copyright © 2003 IFAC

Keywords: Complex systems, Reliability analysis, Safety analysis, Petri Nets, Performance evaluation, Markov models.
1. INTRODUCTION

Nowadays, more and more systems have a potential impact on people's safety and on the environment. We entrust our lives and our goods to them, and consequently depend on them completely. However, some of their failures (e.g. in transport or in energy production) can involve considerable losses. In addition to the traditional fields of aeronautics, nuclear power plants, space and railway transport, other fields such as medicine, the oil industry, the automobile industry, metallurgy and the tertiary sector must integrate a dependability study to fulfill the final users' expectations.

Furthermore, technological progress and the resulting increase in performance make these systems ever more important. The consequences of their failures grow with their performance and are thus significantly more severe than in the past. The control of their dependability (reliability, availability, maintainability, confidentiality, safety and security) is therefore a major concern. However, this evaluation is extremely difficult.

Firstly, industrial and organizational systems are usually very complex to study. Their complexity is mainly due to the increasing variety of technologies involved in their implementation, e.g. mechanical and electronic parts together with many software components (frequently controlling the system itself) (Barreau, et al., 2002). Studying each sub-system (each of a different technology) separately is insufficient: it is necessary to study the system as a whole and all the interactions between the sub-systems (Morel, et al., 2002). These interactions generate most of the failures, and also the most critical ones. The most important constraint in studying the interactions between different sub-systems is the variety of their technologies, which implies co-operation between reliability engineers from different fields (Barger, et al., 2002). These difficulties are further increased in the particular case of (possibly distributed) real-time systems, for which critical temporal specifications must be respected. These systems have additional requirements in terms of quality of service: their temporal accuracy and behavioral validity (security, safety, deadlock avoidance) are fundamental.
Fig. 1. Architecture of a mechatronic system
Secondly, the reliability must be evaluated as precisely and as early as possible in the life cycle of the system, and throughout it (Yin, et al., 2000). Failures occurring in the integration phase are extremely damaging in terms of both cost and delay. Therefore, the reliability requirements must be carefully taken into account in the design phase.

Lastly, a precise and validated methodological process taking all the elements of a complex system into account does not yet exist (Kanoun and Ortalo-Borrel, 2000). While the dependability analysis, and more precisely the reliability analysis, of mechanical and electronic systems is based on well-known methods (such as FORM/SORM, Monte Carlo simulation or RBD, Reliability Block Diagram), the problem is still incompletely solved in the case of software. There are some techniques and tools that help increase software dependability (such as formal methods, fault trees, FMECA, diversification or fault tolerance). Unfortunately, few of these methods are widely used and software is still the least reliable part of a complex system. It is thus necessary to set up a process suited to identifying and analyzing the risks and to making suitable decisions to reduce them. This evaluation process cannot be specific to one field.

Finally, this evaluation process should be broken down into several steps: construction of the reliability of the system, checking of the reliability of the system, and evaluation of the level of reliability reached.

This paper presents the dependability analysis of a particular kind of system, whose complexity is inherent to its structural heterogeneity: mechatronic systems.

Indeed, a mechatronic system combines various technologies (electrical, mechanical, hydraulic and electronic) and is computer controlled. The aim of the control system is to observe the operative part through physical variables measured by sensors, to detect some events (e.g. threshold overshoot), and to choose the suitable command, which is processed by the actuators (state change of the system) (Moncelet, et al., 1998; cf. Figure 1).

2. DEPENDABILITY ANALYSIS USING PETRI NETS

Petri Nets (PN) stand out among the numerous models at the disposal of reliability engineers. Indeed, they can be employed throughout the development cycle of a process: the same formalism can be kept to understand the architecture and the behavior of a process (as in functional analysis) and also to model its failing behavior (as with Markov graphs or fault trees).

This versatility is further increased by their graphical and mathematical representations. The first is simple to understand and thus supports the dialogue between designers and reliability engineers, while the second has the expressive power of a formal method and allows simulation (both functional and statistical) (Lindemann, 1998). Representing the dynamics of a system requires either a Markov approach or PN. However, the Markov approach quickly reaches its limits: beyond ten components, only PN associated with a Monte Carlo simulation remain usable. Moreover, PN modeling makes it possible to analyze a priori as well as a posteriori (at the critical phases of specification and validation) the specific properties of the model for nominal and faulty operation.

The behavioral or performance analysis highlights possible problems of incorrect or incomplete definition of the operating modes or of the transitions between these modes (specifications). In the same way, one can highlight problems of dimensioning, scheduling, resource sharing, and input/output synchronization (Schneeweiss, 1995).

PN prove to be the best tool to establish a model of functional behavior and failure recovery, and they are often the only possible approach. They moreover have the major advantage of remaining understandable. The main reason why PN are used for mechatronic systems is that they can represent the dynamics of a system; indeed, PN allow modeling both the behavior of each physical or functional part (for the control) and the interactions between these components. Since the latter can lead to unexpected behavior, the model describes all the possible behaviors of the system (Malhotra and Trivedi, 1995). The structural analysis gives both the firing invariants, describing the kinds of activity (normal processing or failure/recovery cycle), and the marking invariants, showing the activity of the server unit (processing, free or in failure) and the output buffer status (Liu and Chiou, 1997). The behavioral analysis gives the transition matrix of the embedded Markov chain, obtained from the reachability graph of the Petri Net. The steady state probabilities are computed from the transition matrix, which contains the firing rates of the transitions. They make it possible to compute the probability of a particular condition, a mean transition rate or an average marking (Lopez-Benitez, 2000).
Fig. 2. PN model of a machine with an output buffer.

The first marking invariant shows the activity of the machine (processing a part, waiting or in failure) and the second shows that the number of parts held by the machine or stored in the output buffer, plus the number of free buffer slots, is always equal to two (i.e. equal to the buffer capacity). The firing invariants, given by the transition invariants, are {T3, T4} and {T1, T2, T5}; the former represents the failure/repair cycle, while the latter shows the normal production cycle.
2.1 Example

Let us consider a modeling example: a machine with an output buffer, as represented in Figure 2. T1 is an immediate transition: the machine processes a part as soon as possible. The marking of the place P5 represents the capacity of the output buffer (2 in this particular case): the transition T1 cannot be fired more than twice without firing T5 at least once, i.e. the marking of the place P4 cannot exceed 2 (2 processed parts in the output buffer).
The dynamical behavior of the model is analyzed by constructing the reachability graph of the PN (Figure 3), which describes the evolution of the markings through the firing of the transitions, and thus the evolution of the states of the system.
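The construction of the reachability graph can be sketched in a few lines of Python. This is only an illustrative sketch, not the tool used by the authors: it assumes the incidence matrix of equation (1), a pure net (so that enabling can be tested from the incidence matrix alone), an initial marking M0 = (1, 0, 0, 0, 2) and priority of the immediate transition T1 over the timed ones. The names `EFFECTS`, `fire`, `enabled` and `reachability_graph`, as well as the comments describing each transition, are illustrative.

```python
from collections import deque

# Effect of each transition on the marking (P1..P5), i.e. the columns of the
# incidence matrix W. Testing enabling from W alone assumes a pure net
# (no place is both input and output of the same transition).
EFFECTS = {
    "T1": (-1, 1, 0, 0, -1),  # start processing: uses the machine and a buffer slot
    "T2": (1, -1, 0, 1, 0),   # end of processing: the part enters the output buffer
    "T3": (0, -1, 1, 0, 0),   # failure
    "T4": (0, 1, -1, 0, 0),   # repair
    "T5": (0, 0, 0, -1, 1),   # output: a part leaves the buffer
}
IMMEDIATE = {"T1"}
M0 = (1, 0, 0, 0, 2)  # assumed initial marking: machine idle, two free slots


def fire(marking, transition):
    """Return the marking obtained by firing the transition."""
    return tuple(m + e for m, e in zip(marking, EFFECTS[transition]))


def enabled(marking):
    """Enabled transitions; immediate transitions pre-empt timed ones."""
    candidates = [t for t in EFFECTS if min(fire(marking, t)) >= 0]
    immediate = [t for t in candidates if t in IMMEDIATE]
    return immediate or candidates


def reachability_graph(initial):
    """Breadth-first exploration: marking -> {transition: next marking}."""
    graph = {}
    queue = deque([initial])
    while queue:
        marking = queue.popleft()
        if marking in graph:
            continue
        graph[marking] = {t: fire(marking, t) for t in enabled(marking)}
        queue.extend(graph[marking].values())
    return graph


graph = reachability_graph(M0)
vanishing = sorted(m for m, out in graph.items() if any(t in IMMEDIATE for t in out))
for marking, out in graph.items():
    kind = "vanishing" if marking in vanishing else "tangible"
    print(marking, kind, out)
```

Under these assumptions the exploration yields seven markings, two of which are vanishing because T1 is enabled in them. The order in which markings are discovered is not meant to reproduce the numbering of Figure 3.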
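The incidence matrix of the PN of Figure 2, showing the changes of the marking caused by the firing of the transitions, is W (equation 1), with rows P1 to P5 and columns T1 to T5:

\[
W = \begin{pmatrix}
-1 & 1 & 0 & 0 & 0 \\
1 & -1 & -1 & 1 & 0 \\
0 & 0 & 1 & -1 & 0 \\
0 & 1 & 0 & 0 & -1 \\
-1 & 0 & 0 & 0 & 1
\end{pmatrix} \tag{1}
\]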
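As a quick numerical check of the structural analysis, place and transition invariants can be verified directly against the incidence matrix. The sketch below assumes NumPy, the matrix W of equation (1), the invariant vectors quoted in the paper and an initial marking M0 = (1, 0, 0, 0, 2); the helper names are illustrative.

```python
import numpy as np

# Incidence matrix of equation (1): rows P1..P5, columns T1..T5.
W = np.array([
    [-1,  1,  0,  0,  0],
    [ 1, -1, -1,  1,  0],
    [ 0,  0,  1, -1,  0],
    [ 0,  1,  0,  0, -1],
    [-1,  0,  0,  0,  1],
])


def is_place_invariant(p, incidence):
    """A weight vector p over the places is an invariant when p^T W = 0."""
    return not np.any(np.asarray(p) @ incidence)


def is_transition_invariant(s, incidence):
    """A firing-count vector s is an invariant when W s = 0."""
    return not np.any(incidence @ np.asarray(s))


place_invariants = [[1, 1, 1, 0, 0], [0, 1, 1, 1, 1]]
transition_invariants = [[0, 0, 1, 1, 0], [1, 1, 0, 0, 1]]

m0 = np.array([1, 0, 0, 0, 2])  # assumed initial marking
# The weighted token count of a place invariant is constant; its value is
# fixed by the initial marking.
invariant_values = [int(np.dot(p, m0)) for p in place_invariants]

print(all(is_place_invariant(p, W) for p in place_invariants))
print(all(is_transition_invariant(s, W) for s in transition_invariants))
print(invariant_values, np.linalg.matrix_rank(W))
```

Since W has rank 3 and five rows and columns, both the left and the right null spaces have dimension 2, so two independent invariants of each kind are all that can be expected for this net.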
Fig. 3. Reachability graph of the PN.
The place and transition invariants are obtained by solving P^T·W = 0 and W·S = 0 respectively (Morel, et al., 1996):

\[
P^T = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 1 \end{pmatrix}, \qquad
S^T = \begin{pmatrix} 0 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \end{pmatrix} \tag{2}
\]

The corresponding marking invariants are:

\[
\begin{cases}
m(P_1) + m(P_2) + m(P_3) = 1 \\
m(P_2) + m(P_3) + m(P_4) + m(P_5) = 2
\end{cases} \tag{3}
\]

The embedded Markov chain presented in Figure 4 is obtained from the reachability graph by taking into account only the tangible markings (since T1 is an immediate transition, the markings M0 and M2 are vanishing).

The state transition matrix represents the transition rates between the different possible states of the system. For the embedded Markov chain of Figure 4, the state transition matrix is given in equation (4).
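How the state transition matrix follows from the net can be illustrated with a short sketch. It is not the authors' implementation: it assumes NumPy, the transition effects read from the incidence matrix, the firing rates of the numerical application (λ2 = 1, λ3 = 3, λ4 = 5, λ5 = 2), and a tangible state order M1, M3, M4, M5, M6 written out as explicit markings. Vanishing markings are eliminated by firing the immediate transition T1 until it is no longer enabled.

```python
import numpy as np

# Effect of each transition on the marking (P1..P5), from the incidence matrix.
EFFECTS = {
    "T1": (-1, 1, 0, 0, -1),
    "T2": (1, -1, 0, 1, 0),
    "T3": (0, -1, 1, 0, 0),
    "T4": (0, 1, -1, 0, 0),
    "T5": (0, 0, 0, -1, 1),
}
IMMEDIATE = "T1"
RATES = {"T2": 1.0, "T3": 3.0, "T4": 5.0, "T5": 2.0}  # timed transitions only

# Tangible markings, assumed to correspond to M1, M3, M4, M5, M6.
TANGIBLE = [
    (0, 1, 0, 0, 1),
    (0, 0, 1, 0, 1),
    (0, 1, 0, 1, 0),
    (0, 0, 1, 1, 0),
    (1, 0, 0, 2, 0),
]


def fire(marking, transition):
    return tuple(m + e for m, e in zip(marking, EFFECTS[transition]))


def is_enabled(marking, transition):
    # Valid for a pure net: firing must not produce a negative marking.
    return min(fire(marking, transition)) >= 0


def settle(marking):
    """Fire the immediate transition until a tangible marking is reached."""
    while is_enabled(marking, IMMEDIATE):
        marking = fire(marking, IMMEDIATE)
    return marking


def generator(tangible, rates):
    """Build the transition-rate matrix over the tangible markings."""
    index = {m: i for i, m in enumerate(tangible)}
    q = np.zeros((len(tangible), len(tangible)))
    for marking, i in index.items():
        for transition, rate in rates.items():
            if is_enabled(marking, transition):
                j = index[settle(fire(marking, transition))]
                q[i, j] += rate
                q[i, i] -= rate
    return q


Q = generator(TANGIBLE, RATES)
print(Q)
```

Each row of the resulting matrix sums to zero, as expected for the generator of a continuous-time Markov chain.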
The mean output rate is given by the product of the firing rate of the output transition T5 and the probability that T5 is enabled.

The average in-process inventory is the average marking of the output buffer, P4:

\[
E[m(P_4)] = 1 \cdot \Pr[m(P_4) = 1] + 2 \cdot \Pr[m(P_4) = 2] = \pi_4 + \pi_5 + 2\pi_6 = 0.364 \tag{10}
\]
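Fig. 4. Embedded Markov chain of the PN.

With the tangible states ordered M1, M3, M4, M5, M6, the state transition matrix is:

\[
Q = \begin{pmatrix}
-\lambda_2-\lambda_3 & \lambda_3 & \lambda_2 & 0 & 0 \\
\lambda_4 & -\lambda_4 & 0 & 0 & 0 \\
\lambda_5 & 0 & -\lambda_2-\lambda_3-\lambda_5 & \lambda_3 & \lambda_2 \\
0 & \lambda_5 & \lambda_4 & -\lambda_4-\lambda_5 & 0 \\
0 & 0 & \lambda_5 & 0 & -\lambda_5
\end{pmatrix} \tag{4}
\]

3. APPLICATION ON AIRBAG

Nowadays, deaths and injuries resulting from the use of all kinds of motor vehicles are at a terribly high level worldwide. Therefore, the dependability of protection devices is extremely important.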
Mechatronic systems such as active suspension, automatic gearbox, engine control or anti-slip systems are already available on most vehicles. We presented the reliability analysis of the antilock brake system (ABS) in (Guerin, et al., 2002); the same global method is used here to analyze the dependability of an airbag.
The steady state probabilities are obtained by solving π·Q = 0, with the components of π summing to 1, and are used to compute the probabilities of normal functioning or failure states, in order to evaluate the performance or estimate the dependability of the system (i.e. its reliability, availability or maintainability).
An airbag system, also called supplemental inflatable restraint or supplemental restraint system, is composed of three major sub-systems: inflator and bag assembly, diagnostic module and crash sensors. The inflator and bag system inflates the airbag so that the severity of head and chest injuries to the occupants is reduced when a collision occurs (Mahmud and Alrabady, 1995).
Numerical application: if the firing rates of the exponential transitions are λ2 = 1, λ3 = 3, λ4 = 5 and λ5 = 2, then the state transition matrix Q is:

\[
Q = \begin{pmatrix}
-4 & 3 & 1 & 0 & 0 \\
5 & -5 & 0 & 0 & 0 \\
2 & 0 & -6 & 3 & 1 \\
0 & 2 & 5 & -7 & 0 \\
0 & 0 & 2 & 0 & -2
\end{pmatrix} \tag{5}
\]
The PN model of the airbag system contains several levels, the highest one representing the main sub-systems. In order to estimate the transition rates, the model is further detailed: each sub-system is modeled separately.
The steady state probabilities are: π1 = 0.428, π3 = 0.283, π4 = 0.150, π5 = 0.064 and π6 = 0.075.
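These values can be reproduced numerically. The following sketch assumes NumPy and the matrix of the numerical application for the machine example, with states ordered M1, M3, M4, M5, M6; it solves π·Q = 0 together with the normalization condition and then derives the performance measures discussed for this example. The function and variable names are illustrative.

```python
import numpy as np

# State transition matrix of the numerical application (machine example),
# states ordered M1, M3, M4, M5, M6.
Q = np.array([
    [-4.0,  3.0,  1.0,  0.0,  0.0],
    [ 5.0, -5.0,  0.0,  0.0,  0.0],
    [ 2.0,  0.0, -6.0,  3.0,  1.0],
    [ 0.0,  2.0,  5.0, -7.0,  0.0],
    [ 0.0,  0.0,  2.0,  0.0, -2.0],
])
STATES = ["M1", "M3", "M4", "M5", "M6"]
LAMBDA_5 = 2.0  # firing rate of the output transition T5


def steady_state(q):
    """Solve pi Q = 0 with sum(pi) = 1.

    The balance equations are linearly dependent, so one of them is
    replaced by the normalization condition.
    """
    n = q.shape[0]
    a = np.vstack([q.T[:-1], np.ones(n)])
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(a, b)


pi = dict(zip(STATES, steady_state(Q)))

p_processing = pi["M1"] + pi["M4"]                 # one token in P2
p_failure = pi["M3"] + pi["M5"]                    # one token in P3
p_buffer_full = pi["M6"]                           # two tokens in P4
mean_inventory = pi["M4"] + pi["M5"] + 2 * pi["M6"]
output_rate = LAMBDA_5 * (pi["M4"] + pi["M5"] + pi["M6"])  # T5 enabled when P4 is marked

for name, value in pi.items():
    print(name, round(value, 3))
print(round(p_processing, 3), round(p_failure, 3), round(mean_inventory, 3))
```

Replacing one balance equation by the normalization row is one common way to remove the redundancy of the system; a least-squares solve on the full augmented system would be an equally valid choice.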
For each feared event a fault tree analysis can be done; for example, inadequate inflator system output (Yang and Liu, 1997). The occurrence probability of the feared event is calculated from the occurrence probability of root events.
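The computation of a feared-event probability from its root events can be sketched as follows. The sketch rests on assumptions that are not taken from the paper: the root events are independent, the tree is reduced to a single OR gate, and the parameter values are purely hypothetical. The Weibull and exponential laws are the standard ones for mechanical and electronic components respectively.

```python
import math


def weibull_cdf(t, beta, eta, gamma=0.0):
    """Three-parameter Weibull failure probability (shape, scale, location)."""
    if t <= gamma:
        return 0.0
    return 1.0 - math.exp(-(((t - gamma) / eta) ** beta))


def exponential_cdf(t, lam):
    """Exponential failure probability with constant failure rate lam."""
    return 1.0 - math.exp(-lam * t) if t > 0 else 0.0


def or_gate(probabilities):
    """Output event occurs if at least one independent input event occurs."""
    none_occurs = 1.0
    for p in probabilities:
        none_occurs *= 1.0 - p
    return 1.0 - none_occurs


def and_gate(probabilities):
    """Output event occurs only if all independent input events occur."""
    return math.prod(probabilities)


# Hypothetical root events of a feared event, evaluated at t = 1000 hours.
t = 1000.0
mechanical_part = weibull_cdf(t, beta=2.0, eta=1000.0)  # hypothetical parameters
electronic_sensor = exponential_cdf(t, lam=1e-4)        # hypothetical failure rate
feared_event = or_gate([mechanical_part, electronic_sensor])
print(mechanical_part, electronic_sensor, feared_event)
```

A real fault tree would combine several gate levels and possibly repeated events, in which case the simple product formulas above no longer apply directly and minimal cut sets or simulation are needed.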
For the machine example, the mean fraction of time the machine spends processing is given by the probability of having one token in the place P2:

\[
\Pr[m(P_2) = 1] = \pi_1 + \pi_4 = 0.578 \tag{6}
\]

The mean fraction of time spent under repair is given by the marking of the "failure" place P3:

\[
\Pr[m(P_3) = 1] = \pi_3 + \pi_5 = 0.347 \tag{7}
\]

The probability of having the output buffer full is the probability that the marking of the place P4 is 2:

\[
\Pr[m(P_4) = 2] = \pi_6 = 0.075 \tag{8}
\]

Each root event is characterized by a probability density function (pdf) which defines the failure occurrence probability. The pdf depends on the technology used (Hosseini, et al., 2000). Most mechanical and hydraulic components are characterized by a Weibull distribution:

\[
F(t) = 1 - e^{-\left(\frac{t-\gamma}{\eta}\right)^{\beta}} \tag{11}
\]
where β is the shape parameter, η is the scale parameter and γ is the location parameter. F(t) can be estimated directly by FORM/SORM or Monte Carlo simulation. Electronic and sensor components are generally defined by an exponential distribution:

\[
F(t) = 1 - e^{-\lambda t} \tag{12}
\]

where λ is the failure rate estimated from reliability handbooks (e.g. MIL-HDBK-217 or Bellcore). Software components (Markovian processes) can be characterized by:

\[
F(t) = 1 - e^{-(\eta \gamma \lambda t)} \tag{13}
\]

where η is the solicitation rate, γ the execution rate and λ the failure rate. These parameters are evaluated by tests or simulations.

The high level PN model of the airbag is presented in Figure 5, where the sub-systems are the crash sensors, the diagnostic module and the inflator and bag assembly. When a collision occurs, the crash sensors send the information to the diagnostic module, which activates the inflator and bag assembly; the airbag is inflated, thus protecting the occupants of the vehicle.

The evaluation phase switches to stochastic PN. Indeed, stochastic PN, while offering broad possibilities of representation, allow a global analysis of a system's performance, the determination of the main reliability indicators and, moreover, the monitoring by periodic tests.

In the case of the airbag model of Figure 5, the transitions are deterministic and stochastic. The embedded Markov chain, obtained from the reachability graph of the PN, is presented in Figure 6. The system has three normal functioning states (M1, M2 and M3) and three failure states (Mfs, Mfd and Mfi), one for each sub-system.

Fig. 6. Embedded Markov chain of the PN model.
The state transition matrix, containing the transition firing rates, allows computing the steady state probabilities, which are used to compute the probabilities of failure states or normal states.
Besides this normal activity, several problems may occur, such as crash sensor failure, wrong diagnostic or inflator failure (delayed output, reduced output or no output at all). The failure rates of the sub-systems are estimated by more detailed models, taking into account their components.
The steady state probabilities for the six states are computed by solving π·Q = 0, where π is the steady state probability vector and Q is the transition matrix (14):

[Equation (14): 6 × 6 state transition matrix Q over the states M1, M2, M3, Mfs, Mfd and Mfi; its entries are built from the rates tc, λs, λd, λi, μs, μd and μi.]
As mentioned before, the steady state probabilities are used to estimate the reliability of the system.

For example, the mean failure duration is given by the probability that the system is in a failure state:

\[
\Pr[\text{failure}] = \pi(M_{fs}) + \pi(M_{fd}) + \pi(M_{fi}) \tag{15}
\]

A performance measure can be the inflating duration, given by the probability of being in the M3 state, π(M3). The response time is given by the mean firing rate of the transition "airbag active", i.e. the product of the firing rate of the transition and the probability that this transition is enabled: f_Ta = t_a · π(M3).

Fig. 5. High level PN model of the airbag.
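Time-dependent indicators such as the mean time to failure (MTTF) can be estimated by simulating many histories of the model. The rates of the airbag model are not given numerically in the paper, so the sketch below illustrates the principle on the machine example of section 2.1 instead, using the rates of its numerical application. Treating the first entry into a failure state (M3 or M5) as "the failure", starting every history in M1, and the number of histories and the seed are all assumptions made for illustration.

```python
import random

# Tangible states of the machine example and their outgoing rates
# (numerical application: lambda2 = 1, lambda3 = 3, lambda5 = 2).
# Failure states are absorbing for this estimate, so repair is not needed.
TRANSITIONS = {
    "M1": [("M3", 3.0), ("M4", 1.0)],
    "M4": [("M1", 2.0), ("M5", 3.0), ("M6", 1.0)],
    "M6": [("M4", 2.0)],
}
FAILURE_STATES = {"M3", "M5"}


def time_to_first_failure(rng, start="M1"):
    """Simulate one history until a failure state is entered."""
    state, clock = start, 0.0
    while state not in FAILURE_STATES:
        moves = TRANSITIONS[state]
        total_rate = sum(rate for _, rate in moves)
        # Exponential sojourn time, then a jump chosen proportionally to the rates.
        clock += rng.expovariate(total_rate)
        targets = [target for target, _ in moves]
        weights = [rate for _, rate in moves]
        state = rng.choices(targets, weights=weights)[0]
    return clock


def estimate_mttf(histories=20000, seed=1):
    """Average the time to first failure over independent histories."""
    rng = random.Random(seed)
    total = sum(time_to_first_failure(rng) for _ in range(histories))
    return total / histories


mttf = estimate_mttf()
print(mttf)
```

For this small chain the estimate can be compared with the value 13/36 ≈ 0.361 obtained by solving the first-passage equations by hand; for larger models, where such a closed form is out of reach, simulation is the practical option.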
The mean time to failure (MTTF) can be estimated with a Monte Carlo simulation of the PN model, over some thousands of histories.

4. CONCLUDING REMARKS

This paper presents the results of collaborative work bringing together mechanical, electronic, and software engineers. Dependability evaluation of mechatronic systems requires the modeling of the failure and repair behavior of different components and of the numerous interactions between them. We use a graphical and mathematical modeling tool, Petri Nets, which allow a compact representation of systems involving concurrency, synchronization, resource sharing and time constraints. They enable static analysis of the model by structural verification, as well as dynamical behavior analysis. In order to check system properties, the different sub-systems are modeled separately; the design or the analysis of global properties is based upon composition techniques, in a bottom-up approach or in a top-down approach, respectively.

In the evaluation phase, we use stochastic and deterministic Petri Nets to represent the dynamic behavior of the system. In order to estimate failure and repair rates, we model the mechanical, electronic and embedded software sub-systems. The modeling level of detail depends on the dependability measures to be evaluated.

The highest level model represents the interactions between the sub-systems, to check and to estimate the reliability, the availability and the safety of the mechatronic system. An application to a vehicle airbag illustrates this method.

REFERENCES

Barreau, M., J.-Y. Morel and A. Todoskoff (2002). State-of-the-Art Information on Petri Nets Applied to Software Quality. 5th International Software Quality Week Europe, Brussels, Belgium.

Barger, P., J.-M. Thiriet and M. Robert (2002). Performance and dependability evaluation of distributed dynamical systems. IEEE ESREL 2002 European Conference on System Dependability and Safety, Lyon, France.

Guerin, F., A. Todoskoff, M. Barreau, J.-Y. Morel, A. Mihalache and B. Dumon (2002). Reliability analysis for complex industrial real-time systems: application on an antilock brake system. IEEE International Conference on Systems, Man and Cybernetics, Hammamet, Tunisia.

Hosseini, M., R.M. Kerr and R.B. Randall (2000). An Inspection Model with Minimal and Major Maintenance for a System with Deterioration and Poisson Failures. IEEE Transactions on Reliability, vol. 49, p. 88-98.

Kanoun, K. and M. Ortalo-Borrel (2000). Fault-Tolerant System Dependability-Explicit Modeling of Hardware and Software Component-Interactions. IEEE Transactions on Reliability, vol. 49, no. 4.

Lindemann, C. (1998). Performance Modelling with Deterministic and Stochastic Petri Nets. Wiley.

Liu, T.S. and S.B. Chiou (1997). The application of Petri nets to failure analysis. Reliability Engineering and System Safety, vol. 57, p. 129-142.

Lopez-Benitez, N. (2000). Petri-Net Based Performance-Evaluation of Distributed Homogeneous Systems. IEEE Transactions on Reliability, vol. 49, p. 188-198.

Mahmud, S.M. and A.I. Alrabady (1995). A new decision making algorithm for airbag control. IEEE Transactions on Vehicular Technology, vol. 44, no. 3, p. 690-697.

Malhotra, M. and K.S. Trivedi (1995). Dependability Modeling Using Petri-Nets. IEEE Transactions on Reliability, vol. 44, no. 3, p. 428-440.

Moncelet, G., J. Porras, S. Christensen, H. Demmou and M. Paludetto (1998). Application des réseaux de Petri colorés à l'évaluation de la sûreté de fonctionnement d'un système mécatronique automobile. 11th National Symposium on Reliability and Maintainability, SEE Lambda-mu 11, Arcachon, France.

Morel, J.-Y., M. Barreau (ex-Mares) and M. Bourcerie (1996). Composition and Analysis of Generalized and Colored Petri Nets. MultiConference IEEE IMACS CESA, Lille, France.

Morel, J.-Y., M. Barreau and A. Todoskoff (2002). Petri Nets: a tool adapted to computer system dependability and safety. IEEE ESREL 2002 European Conference on System Dependability and Safety, Lyon, France.

Schneeweiss, W.G. (1995). Mean Time to First Failure of Repairable Systems with one Cold Spare. IEEE Transactions on Reliability, vol. 44, no. 4, p. 567-574.

Yang, S.K. and T.S. Liu (1997). Failure Analysis for an airbag inflator by Petri Nets. Quality and Reliability Engineering International, vol. 13, p. 139-151, John Wiley & Sons.

Yin, M.L., C.L. Hyde and L.E. James (2000). A Petri-net Approach for Early Stage System-level Software Reliability Estimation. Annual Reliability and Maintainability Symposium, p. 100-105.