Distributed system design based on dependability evaluation: a case study on a pilot thermal process

Reliability Engineering and System Safety 88 (2005) 109–119 www.elsevier.com/locate/ress

Blaise Conrard a,1, Jean-Marc Thiriet b,*, Michel Robert b

a LAGIS-Laboratoire d’Automatique, de Génie Informatique et Signal (LAGIS UMR CNRS 8021), USTL, Cité scientifique, Bât. EUDIL, 59 655 Villeneuve d’Ascq Cedex, France
b CRAN, Centre de Recherche en Automatique de Nancy (CRAN UMR CNRS 7039), ESSTIN, 2 rue Jean Lamour, 54519 Vandœuvre les Nancy Cedex, France

Received 17 July 2002; received in revised form 17 March 2004; accepted 16 July 2004
Available online 2 October 2004

Abstract

The aim of this paper is to test and validate a methodology for the design of distributed systems based on the evaluation of performance and dependability (more specifically reliability and availability). The optimal material (hardware) architecture of the automation system is determined from an over-dimensioned preliminary material architecture and from the functional decomposition of the system. The interest of this method is that it allows an early comparison of several architecture choices, and of several sets of components, during the design of the system. One of the results of our work is described below: the application of the method to the design of a pilot thermal process.

© 2004 Elsevier Ltd. All rights reserved.

Keywords: Automation system; Designing phase; Dependability; Availability; Reliability; Distributed systems; Pilot thermal process; Validation

1. Introduction

The dependability evaluation of an architecture composed of one or several real-time networks, together with several actuators, sensors and data processing units (PLCs, microprocessors, DSPs, etc.), remains a challenge. At best, it is possible to obtain some information on devices such as sensors and actuators from field feedback or from manufacturers. It is far more difficult to properly evaluate digital components such as communication networks or data processing units [1].

The method proposed in this paper is based on binary decision diagrams (BDDs) [2]. Its aim is to quantify the system availability and to measure the effects of inauspicious, or spurious, events. With this method, it is possible to compare several possible architectures [3] for distributed automation systems.

The first part of this paper presents the approach, followed by a description of the functional modelling of the components and the communication networks. Several criteria are then proposed for the evaluation of availability and reliability. The method is finally applied to a process composed of a tank, a pump, several sensors and a communication network.

* Corresponding author. Tel.: +33-3-83-68-51-31; fax: +33-3-83-68-51-21. E-mail addresses: [email protected] (B. Conrard), [email protected] (J.-M. Thiriet), michel.robert@esstin.uhp-nancy.fr (M. Robert).
1 Tel.: +33-3-28-76-73-36.
0951-8320/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.ress.2004.07.014

2. The modelling approach

2.1. General processing

The proposed method is shown in Fig. 1, which describes the process and its general operation. It is organised in three steps. The first step is a functional study, whose purpose is to identify all the functions and services needed for the correct functioning of the process. From this, the second step consists in establishing the list of the devices that can be used to perform these functions and services. The last step is the use of an optimization algorithm, which determines the functions and the devices that are necessary for the system to achieve its mission at a required dependability level.

Fig. 1. General method.

2.2. Functional modelling

2.2.1. Functions

The first step consists in building a functional model of the distributed system, taking into account the user’s needs. Various tools can be used for this functional modelling, based for example on structured analysis and design techniques [4], structured analysis methods for real-time systems [5], or object-based techniques [6]. This hierarchical decomposition ideally leads to non-decomposable processing chains called distribution atoms, or elementary functions [7]. During the decomposition, an elementary function is defined when a simple (or standard) device is able to fulfil it. Several kinds of elementary functions can be identified: those which are critical for the production itself, and others which are useful only to improve other services (safety, comfort, etc.). The description of the elementary functions can be completed by other data such as the response time, the data processing load or the required memory size, which allow a final check that the capacity of the chosen devices is not exceeded. Once the list of needed or useful functions is established, the interconnections between these functions have to be defined.

Table 1. Matrix for the communication between tasks

          Task 0   Task 1   Task 2   Task 3   Task 4
Task 0      0        1        1        0        0
Task 1      0        0        0        0        1
Task 2      0        0        0        1        0
Task 3      1        0        0        0        0
Task 4      0        1        0        1        0
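As a minimal sketch of how such a communication matrix can be exploited, the Python fragment below lists the links required by the matrix and reports those that a given allocation of tasks to devices cannot support. The matrix values follow one reading of Table 1; the convention that rows are source tasks and columns are destination tasks, the device names and the allocation are all assumptions made for illustration, not data from the case study.

```python
# Communication matrix between five tasks, with the values of Table 1.
# Assumption: rows are source tasks and columns are destination tasks.
COMM = [
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
]


def required_links(matrix):
    """Return the (source, destination) pairs whose matrix entry is 1."""
    return [
        (src, dst)
        for src, row in enumerate(matrix)
        for dst, flag in enumerate(row)
        if flag == 1
    ]


def unsupported_links(matrix, task_device, device_links):
    """Return required links between tasks hosted on unconnected devices.

    task_device maps a task index to a (hypothetical) device name;
    device_links is a set of frozensets naming interconnected device pairs.
    Two tasks hosted on the same device need no interconnection.
    """
    missing = []
    for src, dst in required_links(matrix):
        a, b = task_device[src], task_device[dst]
        if a != b and frozenset((a, b)) not in device_links:
            missing.append((src, dst))
    return missing


if __name__ == "__main__":
    # Hypothetical allocation of the five tasks onto three devices.
    allocation = {0: "plc", 1: "plc", 2: "sensor", 3: "sensor", 4: "pump"}
    links = {frozenset(("plc", "sensor"))}
    print(required_links(COMM))
    print(unsupported_links(COMM, allocation, links))
```

An empty result from `unsupported_links` corresponds to the interconnection check that the method performs on a candidate architecture.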

2.2.2. Study of the communication

The complete functional analysis makes explicit the various communication links between functions (or tasks). Two kinds of information are exchanged between the functions: data flows and control flows. A matrix can be drawn up (see Table 1). The value 1 shows that a communication link is required from the source task to the destination task; otherwise, the value is 0. When all the tasks are decomposed, it is possible to quantify the information exchanged between each pair of tasks (periodicity, size of the messages, etc.). The purpose is again to check in the end that the capacity of the chosen devices is not exceeded.

2.3. Preliminary hardware architecture

The proposed methodology requires a preliminary hardware architecture to be established. It consists in defining a set of devices that can support the elementary functions identified in the first step of the method. To achieve this stage, an easy option consists in establishing a set of devices for each function. The position in the process of each possible device can complete this model. Finally, a link between each function and a set of usable devices is set up.

An advantage of the proposed method lies in the consideration that various types of device with various capacities can achieve one or several functions. Among them, the most advanced ones (but also the more expensive) are:
– some intelligent automation components, such as sensors, actuators and other devices including processing units;
– a communication tool, often based on a field-bus network, which can interconnect several components and offers digital communication.

On the other hand, the following devices are cheaper but cannot support several functions:
– some standard automation components, which can perform a single function and have limited means of communication;
– some standard communication lines, which generally carry analogue signals (e.g. 4–20 mA).


When all possible devices are identified, it is possible to specify their interconnections. Lastly, the aim is to check that there is an interconnection between any two devices supporting two functions linked by a data flow.

2.4. Evaluation criteria

Evaluating the dependability of an automation system is a delicate task. This concept, which is defined as the aptitude of an entity to satisfy one or several required functions, is by definition not directly quantifiable. Nevertheless, some dependability characteristics can be evaluated and expressed as financial costs. These costs can be integrated into the estimation of the global cost of ownership. Using a single unit of measure for these values then makes the comparison of architectures possible.

2.4.1. System cost criterion

The system cost is the first criterion (independent of the dependability aspect). It integrates the fixed or investment costs along with the variable or usage costs.

2.4.2. Availability criterion

As far as dependability is concerned, two criteria have been taken into account. The first one relates to the system availability and is representative of its ability to produce. The second one relates to reliability and safety; it represents the consequences of failures.

Availability, and more particularly unavailability, is crucial during the life of a production system. The financial losses due to production shutdowns can be high. Availability is defined as the aptitude of a device to ensure its mission (EU X 60-010 standards) [8]. It is generally quantified as the ratio of the useful time (when there is effective production) to the opening time (when the user wants to produce). By estimating this rate and applying it to the production time (a few years), it is then possible to obtain an estimation of the production as a cost or profit.
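The ratio and its conversion into a monetary value can be illustrated with a short Python sketch. The function names and the operating times are hypothetical; the weights of 20 monetary units gained per percent of availability and 5 lost per percent of unavailability are the ones used later in the case study (Table 5).

```python
def availability_rate(useful_time, opening_time):
    """Ratio of the useful time to the opening time (both in the same unit)."""
    if opening_time <= 0:
        raise ValueError("opening_time must be positive")
    if not 0 <= useful_time <= opening_time:
        raise ValueError("useful_time must lie between 0 and opening_time")
    return useful_time / opening_time


def availability_value(rate, gain_per_percent, loss_per_percent):
    """Convert an availability rate (0..1) into a profit in monetary units."""
    available_pct = 100.0 * rate
    unavailable_pct = 100.0 - available_pct
    return available_pct * gain_per_percent - unavailable_pct * loss_per_percent


if __name__ == "__main__":
    # Hypothetical figures: 8000 h of effective production over 8500 h of opening time.
    rate = availability_rate(8000.0, 8500.0)
    print(f"availability = {100.0 * rate:.2f}%")
    print(f"value = {availability_value(rate, 20.0, 5.0):.1f} monetary units")
```

Expressing the rate in percent before weighting keeps the calculation aligned with the "1% of life" unit used for the conversion weights.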
In order to evaluate the availability/unavailability of the system, the relation between this global availability and the availability of the components has to be established. The structure of the functional decomposition helps to find this relation. The principle is that a set of states is associated with each function or component. These states can be ‘available’, ‘unavailable’, ‘degraded’, etc., and correspond to the capacity of the function to be realised. A relation between each function and its sub-function(s) can be described without difficulty. Indirectly, a relation between the states of the elementary functions and the system states is built.

2.4.3. Reliability criterion

The addition of a second criterion, representative of reliability/safety, enhances the evaluation by taking the possible failure consequences into account. Based on a preliminary risk analysis or on a failure mode, effects and criticality analysis [9], the aim is to consider the effect of the failures on the system in terms of seriousness. Globally, a set of dreaded events is identified. In order to establish the seriousness/probability couple for each of these events, their occurrence probability can be evaluated, in a second phase, from:
– the probability of the component failures likely to be the origin of this inauspicious event;
– the system state, or the functioning mode of the system;
– the relation between the component failures and the global system failure.

As for the availability relation, it can be established for each function by describing the failure effect of each of its sub-functions. In this way, the component failure modes are linked to the system failure modes. The probability/seriousness couple can finally be quantified, as it is possible to assign a cost to each consequence (time loss, production loss, loss of components, etc.).

Finally, the dependability quantification, which is necessary to allow a comparison between various solutions, relies on an a priori estimation, as costs, of:
– the availability rate;
– a set of events associated with a probability/seriousness couple.

Considering these costs in addition to the usual fixed and variable costs, it is possible to obtain a single measure of the quality of the resulting architecture.

2.5. Optimised design with dependability assessment

2.5.1. Problem definition

The last step of the proposed method concerns the optimisation phase. The data regarding the automation system to be designed and optimised are:
– a set of elementary functions;
– a set of components;
– a relation between the system availability and the availability of the components;
– a relation between a component failure and its effect on the system;
– some quantitative data about the components (cost, availability, reliability, etc.).
The purpose of the optimisation phase is to select the elementary functions and components of the automation system that give the best profit for its owner. An allocation method is proposed in order to consider various architectures and to compare them. The principle consists in associating a Boolean value with each function/component couple. This value is equal to 1 if the considered function is implemented in the considered component. These Boolean values form a vector, and the study of its different values allows various architectures to be considered.

Such a method also allows a selection of elementary functions. For non-indispensable functions (redundant functions, safety functions, etc.), the cases in which they are implemented (alone or in combination) or not are studied according to whether or not they are associated with a component. In the same way, the components are selected and various combinations are considered. Depending on the value of the Boolean vector, some components have no implemented function and are therefore removed from the final architecture.

Finally, the problem consists in studying various architectures by means of a Boolean vector and then evaluating the different criteria (cost, availability, reliability) in order to keep only the best results.
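The Boolean encoding can be sketched as follows in Python. The function/component couples, the set of mandatory functions and the feasibility rule (every mandatory function must be implemented on at least one component) are assumptions chosen for illustration; the sketch only shows how a vector is decoded into an allocation and how components left without any function are identified.

```python
from itertools import product

# Hypothetical function/component couples; one Boolean per couple.
COUPLES = [
    ("control_pump", "plc"),
    ("control_pump", "smart_pump"),
    ("supervise_level", "relay"),
]
MANDATORY = {"control_pump"}  # "supervise_level" is treated as optional
ALL_COMPONENTS = {"plc", "smart_pump", "relay"}


def decode(vector):
    """Map a Boolean vector onto {function: [components]} and the used components."""
    allocation = {}
    for bit, (function, component) in zip(vector, COUPLES):
        if bit:
            allocation.setdefault(function, []).append(component)
    used = {c for components in allocation.values() for c in components}
    return allocation, used


def is_feasible(allocation):
    """Assumed rule: each mandatory function is implemented at least once."""
    return all(function in allocation for function in MANDATORY)


def candidate_architectures():
    """Yield (vector, allocation, unused components) for every feasible vector."""
    for vector in product((0, 1), repeat=len(COUPLES)):
        allocation, used = decode(vector)
        if is_feasible(allocation):
            yield vector, allocation, ALL_COMPONENTS - used


if __name__ == "__main__":
    for vector, allocation, unused in candidate_architectures():
        print(vector, allocation, "removed:", sorted(unused))
```

Exhaustive enumeration is only practical for small vectors; it is used here because it makes the link between a vector value and an architecture explicit.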

2.5.2. Evaluation of considered solutions

The other part of the optimisation phase relates to the evaluation of the considered solutions. A combination of three criteria is used. The first criterion, the cost of the system, is evaluated by adding the costs of all the components. This cost includes purchase, installation and maintenance costs. For a given architecture, the evaluation of this criterion is simple.

The second and third criteria, related to availability and reliability, are more complex to evaluate. Thanks to the hierarchical functional model, a relationship between the states and events of the system and those of the components has been established. The use of a BDD [10,11] allows this relation to be encoded and eases the evaluation of these two criteria. For a given solution, an algorithm traverses the BDD and determines the probability of reaching each of its branches (corresponding to the states or the failure events of the system). During the analysis of the BDD, the probability of following a branch from a node depends on the probability of failure of the component associated with this node. If this component is not implemented, the algorithm considers that it is always unavailable. Fig. 2 shows an example of a tree similar to the BDD used.

Finally, for a given solution, the evaluation of the three criteria yields a single value through the integration of economic factors, which gives an estimate of the potential profit of the considered architecture.

Fig. 2. Tree for dependability evaluation.

2.5.3. Study of the solution space

Such a method can be used to examine various possible architectures. Several approaches can be used to solve such an optimisation problem [12,13], depending on the size of the solution space. If the size is not too large, the ‘Branch and Bound’ algorithm [14] is well suited. It examines only the promising solutions (and not all the possibilities) while ensuring that the result is the optimal one. However, when the solution space is too large and the processing time too long, stochastic algorithms such as genetic algorithms [15] have to be used. The encoding of solutions as Boolean vectors makes them easier to use: the crossover operation is simplified and new solutions can be built from old ones. These algorithms cannot guarantee that the solution is the optimal one, but they can give the designer satisfactory results.

3. Application of the method

3.1. The process

The process is a tank supplying a fluid at a desired temperature. It is composed of a pump to fill the tank and a heating element (a resistor) to keep the fluid at the desired temperature. The automation system is in charge of keeping a sufficient fluid level in the tank and of supplying the energy required to keep the temperature constant, whatever the variations of the fluid flow (Fig. 3). The pump is assumed to draw the fluid from a tank that is always full. The purpose of the tank is assumed to be to improve control in the face of variations, particularly occasional demands for a large quantity.

3.2. Dreaded events

As far as the mission of such a process is concerned, the dreaded events are (in increasing order of the importance of their consequences):
– the spurious shutdown of the installation;
– the flow supplied at an incorrect temperature;
– overflowing, or the destruction of a heating element when a resistor is switched on without any fluid.

Fig. 3. Simplified view of the process.

3.3. Functional decomposition

The first step of the approach consists in describing the functions and services that the automation system must carry out. This approach relies on the iterative decomposition of functions into sub-functions. The main function, at the top of the decomposition, is to keep both the fluid level and the fluid temperature constant in the tank. The representation used for this application is an adapted data flow diagram [16,17]:
– a single form of data flow is used to represent the flow between processes (or functions);
– there is no storing process;
– finally, the interactions with the physical process are achieved by some processes in the diagram.

Only a part of the decomposition is shown in Fig. 4 [18]. Table 2 displays a hierarchical and functional decomposition for the mission ‘to keep the tank level and temperature’. This representation of the functions allows the identification of:
– the elementary processes to distribute onto the devices, represented by thin circles;
– the flows interconnecting these processes (see also Table 2 for further details).

It is possible to distinguish various types of elementary process. These can simply be the transmission of information between devices (for instance ‘TO TRANSMIT intermediate level’, 11122), the acquisition of a measurement (for instance ‘TO READ intermediate level’, 11121), an action on the process by means of an actuator (for instance ‘TO PUMP’, 1121), or finally the execution of an algorithm (for instance ‘TO CONTROL PUMP’, 11221, which gives orders to the pump as a function of the fluid level in the tank).

The whole hierarchical decomposition used for this application is presented in Table 2. In this decomposition, some functions are ‘optional’, which means that they are not mandatory for nominal system operation. These are supervision functions, whose mission is to shut down the system as soon as a dangerous situation is detected. This can be too high a fluid level in the tank, too high a temperature, or a level too low to allow heating. The absence of these functions does not compromise the system mission; the system is still able to keep the fluid at the required level and temperature. Nevertheless, when failures likely to cause incorrect behaviour of the system occur, the actions of these functions can limit the inauspicious effects and consequences of this malfunctioning state. Conversely, failures of the safety functions themselves can lead to spurious shutdowns and decrease the global system availability.

Because some functions are ‘optional’, various allocation strategies can be considered, with or without the associated devices. The evaluation of the criteria, combined with the search for the architecture with the best estimated profitability, determines which safety mechanisms are used. The outcome also depends on the ratio between device costs, system availability and the occurrence probabilities of inauspicious effects.

Fig. 4. Some data flow diagrams of the system.

Table 2. Hierarchical and functional decomposition for the mission ‘to keep the tank level and temperature’

Code     Function
0        TO KEEP the tank level and temperature
10       TO KEEP level
11       TO CHECK level
111      TO MEASURE level
1111     TO MEASURE high level
11111    TO READ high level
11112    TO TRANSMIT reading of high level
1112     TO MEASURE intermediate level
11121    TO READ intermediate level
11122    TO TRANSMIT reading of intermediate level
112      TO CONTROL pumping
1121     TO PUMP
1122     TO CONTROL pump
11221    TO CONTROL pump according to levels
11222    TO TRANSMIT pumping order
11223    TO CONTROL electric supply for the pump
12       TO SUPERVISE limit level reached (optional)
121      TO MEASURE limit level
122      TO TRANSMIT shutdown order
123      TO SHUTDOWN pump
20       TO KEEP temperature
21       TO CONTROL temperature
211      TO MEASURE temperature
2111     TO READ temperature measurement
2112     TO TRANSMIT temperature measurement
212      TO SUPERVISE heating
2121     TO CONTROL heating
21211    TO CALCULATE order for heating
21212    TO TRANSMIT order for heating
21213    TO CONTROL electric supply for the resistor
2122     TO HEAT
22       TO SUPERVISE level (optional)
221      TO MEASURE too low level
222      TO TRANSMIT measurement of low level
223      TO SHUTDOWN heating
23       TO SUPERVISE temperature (optional)
231      TO MEASURE too high level
232      TO TRANSMIT measurement of low level
233      TO SHUTDOWN heating

3.4. Specification of dependability aspects

The functional decomposition makes it possible to list the elementary processes (or functions) and the flows interconnecting them. In order to study dependability, the aim is to enrich this decomposition with data that allow availability and failure occurrences to be estimated.

3.4.1. Availability study

Each function produced by the decomposition is associated with a set of states describing whether the related service can or cannot be achieved (i.e. its availability), as well as with the relations linking these states. For the ‘heating tank’ application, the possible states of the elementary functions have been limited to ‘available’ or ‘unavailable’. The same modelling has been used for the hierarchically superior functions, except for ‘TO KEEP temperature’, which has three states. The added state is ‘available unsafe’, which indicates that the service ‘to keep temperature’ is valid but that, if the tank empties, this service will not detect the lack of fluid; there is then a risk that heating continues and that the heating element is damaged.

The relationships between states are defined by truth tables. Most functions are in the ‘available’ state only if all their sub-functions are also in the ‘available’ state. Some exceptions exist in the case of redundant or non-crucial sub-functions. As an example, the tables for the functions ‘TO KEEP temperature’ and ‘TO MEASURE high level’ are shown in Fig. 5.

3.4.2. Reliability study

The second aspect of the dependability study in the proposed methodology is the evaluation of the failure probability of the system. It concerns the definition of the failure (or event) propagation through the hierarchical decomposition.
For the ‘heating tank’ application, the considered events are ‘shutdown’ of the service (for data processing or transmission), ‘active blocking’ or ‘inactive blocking’ (for measurement functions associated with detectors), and ‘spurious shutdown’ or ‘spurious starting’ (for action functions such as the pump, heating body, relay, etc.). For the system mission, ‘TO KEEP the tank level and temperature’, the associated events are ‘Spurious shutdown’, ‘Supplying of the fluid at a bad temperature’ and ‘Accident’ (overflowing, destruction of the heating body, bubbling). For the elementary functions, these events come directly from the nature of the functions considered individually. These events are indirect images of the failure modes of their related components.


Fig. 5. Decomposition and truth tables for two functions.
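To illustrate how truth tables of this kind lead to an availability figure, the following Python sketch computes the probability that a function is available by expanding its Boolean availability relation one component at a time (Shannon expansion), which is the principle behind a decision-diagram traversal. It is a simplified stand-in, without the node sharing of a real BDD. The truth table, the component names and the 99% availabilities are hypothetical; the rule that a component which is not implemented is always unavailable comes from Section 2.5.2.

```python
def probability_true(func, variables, p_available, implemented, assignment=None):
    """Probability that func(assignment) is True, by Shannon expansion.

    Variables are expanded one at a time, as along a path of a decision
    diagram. A component that is not implemented is always unavailable.
    """
    assignment = dict(assignment or {})
    if len(assignment) == len(variables):
        return 1.0 if func(assignment) else 0.0
    var = variables[len(assignment)]
    p = p_available[var] if var in implemented else 0.0
    high = probability_true(func, variables, p_available, implemented,
                            {**assignment, var: True})
    low = probability_true(func, variables, p_available, implemented,
                           {**assignment, var: False})
    return p * high + (1.0 - p) * low


# Hypothetical truth table: a measurement is available when its transmission
# link is available and at least one of two redundant detectors is available.
def measure_available(state):
    return (state["detector_a"] or state["detector_b"]) and state["link"]


VARIABLES = ["detector_a", "detector_b", "link"]
P_AVAILABLE = {"detector_a": 0.99, "detector_b": 0.99, "link": 0.99}

if __name__ == "__main__":
    redundant = probability_true(measure_available, VARIABLES, P_AVAILABLE,
                                 implemented=set(VARIABLES))
    single = probability_true(measure_available, VARIABLES, P_AVAILABLE,
                              implemented={"detector_a", "link"})
    print(f"with redundancy: {redundant:.6f}, without: {single:.6f}")
```

The expansion assumes that component states are statistically independent, which is a simplification made for this sketch.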

For the functions at a higher level, this concerns the definition of the event propagation, which is a function of the state of the resources. Table 3 shows the correspondence between events for the system mission. In the same way, a correspondence is defined between the events of the sub-functions and those relative to the considered function.

Table 3. Event propagation table

Sub-function          Source event               Corresponding event
TO KEEP level         Shutdown of the service    Shutdown of the service
                      Overflowing                Risk of accident (overflowing)
                      Inauspicious emptying      State of TO KEEP level ‘Available’: Shutdown of the service
                                                 State of TO KEEP level ‘Available unsafe’: Risk of accident (destruction of the heating body)
TO KEEP temperature   Shutdown of the service    Shutdown of the service
                      Incorrect temperature      Supplying fluid with an incorrect temperature
                      Permanent heating          Risk of accident (bubbling)

3.5. Material architecture

Finally, the proposed methodology requires a preliminary material architecture to be defined. Table 4 describes the set of devices likely to be employed in the system. The preliminary architecture is described in Fig. 6. For the devices marked with * and only used for safety functions (for instance ‘to avoid an overflow’), the unavailability rate is assumed to be tripled. Indeed, some of their failures have no direct effect on the process and are therefore masked. Periodic tests linked to a maintenance activity are the only way to detect them. Their availability is consequently directly affected.
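The tripling rule stated in Section 3.5 for devices used only in safety functions can be written as a one-line adjustment. The Python sketch below is illustrative: the function name, the cap at 1.0 and the 1% example value are assumptions, while the factor of three comes from the text.

```python
def effective_unavailability(unavailability, safety_only, factor=3.0):
    """Unavailability used in the evaluation, as a fraction between 0 and 1.

    For a device used only in safety functions, failures are masked until a
    periodic test, so the nominal unavailability is multiplied by `factor`.
    The result is capped at 1.0 (an assumption of this sketch).
    """
    if not 0.0 <= unavailability <= 1.0:
        raise ValueError("unavailability must lie between 0 and 1")
    if not safety_only:
        return unavailability
    return min(1.0, factor * unavailability)


if __name__ == "__main__":
    # Hypothetical detector with a nominal unavailability of 1%.
    print(effective_unavailability(0.01, safety_only=True))
    print(effective_unavailability(0.01, safety_only=False))
```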


Table 4. Components dependability data
Columns: Type of device; Characteristics/connections; Cost equip.; Information linked to the availability; Mode and number of failures

Pump

1 input, power supply

500

Spurious shutdown: 3

1 input, power supply and relay

600

Available: 98% Unavailable: 2% Available: 97% Unavailable: 3%

Spurious start: 2

Available: 97% Unavailable: 3% Available: 99% Unavailable open: 1% Available: 99% Unavailable: 1%

Spurious shutdown: 3 Spurious start: 1 Blocked open: 1 Blocked closed: 1 Blocked open: 1 Blocked closed: 1 Shutdown: 1 Distorted measurements: 3

Level detector

Temperature sensor

Heating resistor

Relays

Communication system

1 start input, 1 shutdown input 1 input, power supply 1 field-bus interface 1 output open/closed*

20

1 field-bus interface*

50

1 output 4–20 mA

30

1 field-bus interface

50

Sensor with a threshold*

20

Sensor with a threshold and a field-bus interface*

40

1 input, power supply

200

1 input 4–20 mA integrated controller

300

1 input, power supply 1 field-bus interface Power relay*

300 20

Power relay with field-bus interface*

30

Point-to-point connection*

5

Available: 99% Unavailable: 1% Available: 99% Unavailable: 1% Available: 99.5% Unavailable: 0.5% Available: 99% Unavailable: 1% Available: 99% Unavailable: 1% Available: 99%

Distorted measurements: 3 Measurement shutdown: 1 Shutdown: 1 Shutdown: 2

Spurious shutdown: 2 Spurious shutdown: 2

Unavailable: 1% Available: 99% Unavailable: 1% Available: 99% Unavailable: 1% Available: 99%

Inauspicious heating: 0.5 Spurious shutdown: 2 Inauspicious heating: 0.5 Blocked open: 1 Blocked closed: 1 Blocked open: 1 Blocked closed: 1 Rupture: 1 Short-circuit: 1 Communication shutdown: 5

PLC with analog or binary inputs-outputs

200

Unavailable: 1% Available: 99.5% Unavailable: 0.5% Available: 99% Unavailable: 1% Available: 99%

PLC with field-bus interface

200

Unavailable: 1% Available: 99%

Incoherent functioning: 0.5 Shutdown: 2

Unavailable: 1%

Incoherent functioning: 0.5

Field-bus Control system

700

Spurious shutdown: 3

50

The chosen material architecture allows several possible forms of implementation. Data flows can be carried by a field-bus network or by point-to-point wires (e.g. 4–20 mA). It is possible to use PLCs to control simple actuators, or to use intelligent actuators which communicate directly with the sensors (through the field-bus network or point-to-point connections). Relays also provide a fallback for the pumping or heating functions and can be used in safety functions. All these components and their dependability characteristics are defined in Table 4.

Shutdown: 2

After having defined which functions can be implemented on each component, the stage of quantification and selection of the best architecture can begin.

3.6. Quantification and optimization

3.6.1. Correspondence of costs

In order to compare the various solutions, it is first necessary to convert the availability and the event occurrence probabilities into compatible costs. The system availability is evaluated using one of the two chosen rates (Table 5).


Fig. 6. Preliminary material architecture of the system.

These values are established as a function of the system lifecycle. When the installation performs its mission (or is in a state to be able to do so) during 1% of its life, the expected profit is 20 monetary units, whereas 1% of unavailability is evaluated as a loss of 5 monetary units. In the same way, three events are associated with the failure of the system mission. A loss representative of the actual consequences is attached to each occurrence of one of these events (Table 6). It is thus estimated that a spurious shutdown is equivalent to a loss of 25 monetary units, the non-respect of a control (fluid supplied at an incorrect temperature) to a loss of 50, and an accident to a loss of 500 monetary units.

3.6.2. Search for the best solution

The search for the best solution consists in studying each of the possible configurations (material, processes allocation; 2184 possible solutions). This search relies on a projection of the elementary functions onto the preliminary material architecture defined before. For each projection, it is checked that the material constraints are respected, including the possible interconnections between components (same type of communication interface, for instance). The cost of the components used, the availability rates and the a priori event occurrences during the lifecycle of the system are then evaluated to provide a final cost characterizing the system profitability.

Table 5. Criteria used for the correspondence between availability and costs for the system

State          Corresponding profits or losses
Available      1%/20
Unavailable    1%/-5

Table 7. Values of criteria for the best architecture

                                                 Rate and probabilities    Criteria
Components costs                                                           -1010.0
Availability
  Availability                                   94.12%                    1882.4
  Unavailability                                 5.88%                     -29.4
Reliability (probable number of occurrences)
  Spurious shutdown                              12.78                     -319.5
  Incorrect temperature                          2.74                      -137.0
  Accident                                       0.78                      -392.3
Total                                                                      -5.8
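The conversion of rates and occurrence numbers into a single profitability figure can be checked with a few lines of Python. The weights are those of Tables 5 and 6 and the inputs are the rounded values of Table 7; the function and key names are illustrative. With the rounded accident count of 0.78 the accident term is -390.0 and the total is about -3.5, slightly different from the -392.3 and -5.8 printed in Table 7, which is consistent with the table having been computed from an unrounded occurrence number (an assumption, since the unrounded value is not given).

```python
# Conversion weights from Tables 5 and 6 (monetary units).
GAIN_PER_AVAILABLE_PERCENT = 20.0
LOSS_PER_UNAVAILABLE_PERCENT = 5.0
EVENT_LOSS = {
    "spurious_shutdown": 25.0,
    "incorrect_temperature": 50.0,
    "accident": 500.0,
}


def profitability(component_cost, available_pct, event_counts):
    """Return the individual cost terms and their sum for one architecture."""
    unavailable_pct = 100.0 - available_pct
    terms = {
        "components": -component_cost,
        "availability": available_pct * GAIN_PER_AVAILABLE_PERCENT,
        "unavailability": -unavailable_pct * LOSS_PER_UNAVAILABLE_PERCENT,
    }
    for event, count in event_counts.items():
        terms[event] = -count * EVENT_LOSS[event]
    return terms, sum(terms.values())


if __name__ == "__main__":
    # Rounded values taken from Table 7.
    terms, total = profitability(
        component_cost=1010.0,
        available_pct=94.12,
        event_counts={
            "spurious_shutdown": 12.78,
            "incorrect_temperature": 2.74,
            "accident": 0.78,
        },
    )
    for name, value in terms.items():
        print(f"{name:>22}: {value:9.1f}")
    print(f"{'total':>22}: {total:9.1f}")
```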

A computerized tool dedicated to this application has been developed to carry out this architecture search. With the chosen values (component costs, failure probabilities, conversion of rates into costs, etc.), a single architecture maximizing the criterion has been obtained. The various parameters of this solution are given in Table 7.

This optimal solution consists in using an intelligent pump and an intelligent heating unit. The intelligent pump (which has its own control unit) is directly connected to the level detectors (high, intermediate, ‘too high’), and the intelligent heating unit is directly connected to the temperature sensor. The recommended architecture is thus one of the simplest in terms of number of devices. Indeed, the level detector dedicated to the detection of a ‘too high level’ (in charge of avoiding an overflow) could be removed without affecting the mission, but in this case the probable number of occurrences of the event ‘accident’ would reach 4.22, and so this solution would be less attractive (Fig. 7). In this example, the various component costs lead to a more profitable architecture with point-to-point links than with a field-bus network.

3.6.3. Other architectures

Other architectures can be explored in more depth by modifying the weights given to the availability and event occurrence rates.

Table 6
Criteria used for the correspondence between event occurrences and costs for the system

Event                                                             Loss resulting from an occurrence
Spurious shutdown                                                 1 occurrence / -25
Supply of the fluid with an incorrect temperature                 1 occurrence / -50
Accident (overflow, destruction of the heating body, bubbling)    1 occurrence / -500

Fig. 7. Best selection for a material architecture of the system.
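As a worked check, the final cost of the selected solution can be written as a weighted sum, using the weights of Tables 5 and 6 and the values reported in Table 7. The symbols below are notation introduced here for illustration, not the authors': $J$ is the final cost, $C$ the cost of the components, $A$ and $U$ the availability and unavailability rates in %, and $N_{ss}$, $N_{it}$, $N_{acc}$ the probable numbers of spurious shutdowns, incorrect-temperature events and accidents.

```latex
\begin{align*}
J &= -C + 20\,A - 5\,U - 25\,N_{ss} - 50\,N_{it} - 500\,N_{acc} \\
  &= -1010.0 + 20(94.12) - 5(5.88) - 25(12.78) - 50(2.74) - 500(0.78) \\
  &= -1010.0 + 1882.4 - 29.4 - 319.5 - 137.0 - 390.0 \\
  &= -3.5
\end{align*}
```

The first four terms reproduce the entries of Table 7 exactly. The accident term comes out as -390.0 instead of the reported -392.3, and the total as -3.5 instead of -5.8; this is consistent with the accident figure 0.78 being a rounded value (392.3/500 is about 0.785).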

Fig. 8. Diagram for the transformation of dependability criteria into costs.

The curves in Fig. 8 define domains, each associated with an architecture, as a function of the weights applied to the transformation of criteria into costs. The first domain corresponds to the previous architecture. The solutions associated with domains 2 and 3 use a field-bus network, and both use an intelligent pump directly connected to the field-bus. This option is more expensive because of the network (225 additional monetary units with the chosen values), but this is compensated by a lower probable number of overflows (0.62 instead of 0.78), which can be explained by the fact that failures of the level sensors or of the network are easier to detect. Finally, the solution of domain 3 integrates a greater number of safety functions. For instance, a ‘too low’ level detector is connected to a relay, and a threshold temperature sensor controls the relay through the field-bus network. These new components increase the installation cost of the system, but they decrease the probable number of occurrences of the event ‘overflow’ (from 0.78 to 0.20), to the detriment of the ‘spurious shutdowns’ (14.05 instead of 12.78) (Fig. 9).

Fig. 9. Example of other material architectures of the system.

Table 8
Synthesis of the various solutions

                                                  Solution 1   Solution 2   Solution 3
Cost of the components                            -1010        -1235        -1350
Availability
  Rate of the state ‘available’ (%)               94.12        94.12        91.32
  Rate of the state ‘unavailable’ (%)             5.88         5.88         8.68
Reliability (probable number of occurrences)
  Spurious shutdown                               12.78        11.21        14.05
  Incorrect temperature                           2.74         2.74         2.74
  Accident                                        0.78         0.62         0.20

The synthesis of the results is shown in Table 8. These results confirm what was intuitively expected: the greater the number of safety functions, the lower the risk of serious incidents. Availability is then reduced, however, and the number of system fallbacks increases, as a result of component failures.
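The influence of the weights on the selected architecture can be explored with a short script. This is a sketch under stated assumptions: the figures are the rounded values of Table 8, the baseline weights are those of Tables 5 and 6, and the function and variable names are introduced here for illustration. It does not reproduce the exact domain boundaries of Fig. 8, which cannot be recovered from the text.

```python
# Dependability figures of the three solutions, copied from Table 8.
SOLUTIONS = {
    "Solution 1": {"cost": 1010, "available": 94.12, "unavailable": 5.88,
                   "spurious": 12.78, "temperature": 2.74, "accident": 0.78},
    "Solution 2": {"cost": 1235, "available": 94.12, "unavailable": 5.88,
                   "spurious": 11.21, "temperature": 2.74, "accident": 0.62},
    "Solution 3": {"cost": 1350, "available": 91.32, "unavailable": 8.68,
                   "spurious": 14.05, "temperature": 2.74, "accident": 0.20},
}

# Baseline weights from Tables 5 and 6 (monetary units per % or per occurrence).
BASELINE = {"available": 20, "unavailable": -5, "spurious": -25,
            "temperature": -50, "accident": -500}


def criterion(solution, weights):
    """Final cost: weighted dependability figures minus the component cost."""
    return -solution["cost"] + sum(w * solution[key] for key, w in weights.items())


def best_solution(weights):
    """Name of the solution with the highest final cost for the given weights."""
    return max(SOLUTIONS, key=lambda name: criterion(SOLUTIONS[name], weights))


print(best_solution(BASELINE))
print(best_solution({**BASELINE, "accident": -1000}))
print(best_solution({**BASELINE, "available": 150, "accident": -1200}))
```

With these rounded figures, Solution 1 is selected for the baseline weights, as reported above. Raising the loss per accident to 1000 monetary units selects Solution 3, and additionally raising the profit per percent of availability (here to 150, an arbitrary value) selects Solution 2, whose availability is higher than that of Solution 3.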

4. Conclusion

The proposed method is innovative in that it yields an optimal dependable solution through an exhaustive search, without requiring the designer to study the various architecture possibilities separately. It is easily applicable to medium-sized systems. Nevertheless, the combinatorial explosion due to the use of BDDs is a drawback for larger systems. This application illustrates the interest and the potential of the proposed methodology, which is applicable to a complete system or to parts of it only. The method searches for the best material architecture (organization of the components) and the best operational architecture (organization of the elementary functions within this material architecture). This requires the evaluation of criteria that allow the comparison of the planned solutions (relative aspect), but that also give the designer an image of the robustness of the system with respect to possible failures (absolute aspect). Moreover, the methodology requires a computer tool in order to process correctly the large number of combinations at the design stage. In spite of these limits, the methodology seems to offer valuable help at the design phase of automation systems with a view to optimizing them. It is a useful tool for a preliminary study, in order to answer a bid for instance.

As the main perspective of this work, we are studying the method in greater depth, working more particularly on the semi-qualitative expression of the dependability specification. Today, constraints define a limited number of tolerated component faults. This can be improved by assigning to each component a coefficient relative to its safety. Then, in the dependability specification constraints, the term ‘number of faults’ can be replaced by a sum of faults. On the one hand, the various safety parameters of the available components (standard, safe, cheap, etc.) can be taken into account in our method, and on the other hand, the designer can fix the dependability more accurately.
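One possible reading of this perspective is sketched below; it is an interpretation, not the authors' formulation. Each component receives a hypothetical coefficient, and the specification asks the system to tolerate every fault combination whose coefficients sum to at most a given budget. The component names, the coefficient values and the convention that a safer component gets a larger coefficient are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical coefficients: a fault of a "safe" component consumes more of the
# budget than a fault of a "cheap" one (assumed convention, illustrative values).
COEFFICIENTS = {
    "relay (safe)": 2.0,
    "level detector (standard)": 1.0,
    "temperature sensor (cheap)": 0.5,
}


def combinations_to_tolerate(coefficients, budget):
    """List every fault combination whose summed coefficients fit the budget."""
    names = sorted(coefficients)
    return [
        combo
        for size in range(1, len(names) + 1)
        for combo in combinations(names, size)
        if sum(coefficients[name] for name in combo) <= budget
    ]


# Current specification: every fault counts for one, at most one fault tolerated.
unit = {name: 1.0 for name in COEFFICIENTS}
print(combinations_to_tolerate(unit, budget=1))

# Weighted specification: the relay fault alone already exceeds the budget.
print(combinations_to_tolerate(COEFFICIENTS, budget=1))
```

With unit coefficients the function reduces to the current ‘number of faults’ constraint, so the weighted form can be seen as a generalization of it.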

Acknowledgements

The authors warmly thank F. Clerc and P. Gay for the English corrections.

References

[1] Arlat J. Composants logiciels et sûreté de fonctionnement, intégration de COTS (Software components and dependability, COTS integration). Paris: Hermes; 2000.
[2] Bryant R. Symbolic Boolean manipulation with ordered binary decision diagrams. ACM Computing Surveys; 1992.
[3] Conrard B, Thiriet JM, Bicking F. Dependability as a criterion for distributed systems design. Fourth IFAC International Symposium on Intelligent Components and Instruments for Control Applications, Buenos Aires, Argentina, 13–15 September 2000, p. 45–50.
[4] Marca DA, McGowan CL. SADT: structured analysis and design techniques. McGraw-Hill software engineering series; 1988.
[5] Hatley DJ, Pirbhai IA. Strategies for real-time system specification. Dorset House; 1988 [ISBN: 0932633110].
[6] Rumbaugh J, Blaha M, Premerlani W, Eddy F, Lorenson W. Object-oriented modeling and design. Prentice-Hall International editions; 1991.
[7] Bayart M, Simonot-Lion F. Impact de l’émergence des réseaux de terrain et de l’instrumentation intelligente dans la conception des architectures des systèmes d’automatisation de processus. Convention with the French ministry of industry (1992)-P-0239, final report; 1995.
[8] EU X 60-010 standards: maintenance.
[9] Villemeur A. Methods and techniques. Reliability, availability, maintainability and safety assessment, vol. 1. New York: Wiley; 1992 [ISBN: 0471930482].
[10] Conrard B, Thiriet JM, Robert M. Répartition des traitements dans la conception de systèmes d’automatisation architecturés autour de réseaux de terrain (Distribution of processings for the design of automation systems organised around fieldbus networks). Colloque européen ‘Réseaux de capteurs et communications’/INNOCAP 99, Grenoble, 28/29 April, p. 43–9.
[11] Rauzy A. A brief introduction to the binary decision diagram (Une brève introduction aux diagrammes binaires de décision). In: JESA No. 8, Hermes, Paris; 1996.
[12] Charon L, Germa A, Hudry O. Méthodes d’optimisation combinatoire. Paris: Edition Masson; 1996.
[13] Reeves CR, editor. Modern heuristic techniques for combinatorial problems. Oxford: Blackwell Scientific Press; 1992.
[14] Jonsson J, Shin KG. A parametrized branch and bound strategy for scheduling precedence constrained tasks on multiprocessor system. Proceedings of the International Conference on Parallel Processing, Bloomingdale, USA, p. 158–65.
[15] Goldberg DE. Genetic algorithms in search, optimization, and machine learning. Reading, MA: Addison-Wesley; 1989 [ISBN 0-201-15767-5].
[16] Hatley D, Pirbhai I. Strategies for real-time system specifications. New York: Dorset House Publishing Co.; 1991.
[17] Ward P, Mellor J. Structured development for real-time systems, vol. 3. Englewood Cliffs, NJ: Prentice-Hall; 1986.
[18] Conrard B. Contribution to the quantitative evaluation of dependability during the design phase of automation systems. PhD, Université Henri Poincaré Nancy 1, 24 September 1999 [in French].