Reliability and safety analysis for systems of fusion device

Reliability and safety analysis for systems of fusion device

Fusion Engineering and Design 94 (2015) 31–41 Contents lists available at ScienceDirect Fusion Engineering and Design journal homepage: www.elsevier...

1000KB Sizes 2 Downloads 106 Views

Fusion Engineering and Design 94 (2015) 31–41

Contents lists available at ScienceDirect

Fusion Engineering and Design journal homepage: www.elsevier.com/locate/fusengdes

Reliability and safety analysis for systems of fusion device Robertas Alzbutas ∗ , Roman Voronov Laboratory of Nuclear Installation Safety, Lithuanian Energy Institute, 3 Breslaujos Str., LT-44403 Kaunas, Lithuania

h i g h l i g h t s • • • • •

Reliability is very important from fusion devices efficiency perspective. Rich experience of probabilistic safety analysis exists in nuclear industry. Reliability and safety analysis was applied for systems of fusion device. This enables to identify and prioritize availability improvement measures. Recommendations are based on cost effectiveness for risk decrease options.

a r t i c l e

i n f o

Article history: Received 19 October 2014 Received in revised form 27 January 2015 Accepted 4 March 2015 Available online 4 April 2015 Keywords: Fusion device Cooling circuit System reliability Probabilistic safety analysis.

a b s t r a c t Fusion energy or thermonuclear power is a promising, literally endless source of energy. Development of fusion power is still under investigation and experimental phase, and a number of fusion devices are under construction in Europe. Since fusion energy is innovative and fusion devices contain unique and expensive equipment, an issue of their reliability is very important from their efficiency perspective. A Reliability, Availability, Maintainability, Inspectability (RAMI) analysis is being performed or is going to be performed in the nearest future for such fusion devices as ITER and DEMO in order to ensure reliable and efficient operation for experiments (e.g., in ITER) or for energy production purposes (e.g., in DEMO). On the other hand, rich experience of the reliability and Probabilistic Safety Analysis (PSA) exists in nuclear industry for fission power plants and other nuclear installations. In this paper, the Wendelstein 7-X (W7-X) device is mainly considered. This stellarator device is in commissioning stage in the Max-Planck-Institut für Plasmaphysik, Greifswald, Germany (IPP). In the frame of cooperation between the IPP and the Lithuanian Energy Institute (LEI) under the European Fusion Development Agreement a pilot project of a reliability analysis of the W7-X systems was performed with a purpose to adopt Nuclear Power Plant (NPP) PSA experience for fusion device systems. During the project reliability and safety (risk) analysis of a Divertor Target Cooling Circuit, which is an important system for permanent and reliable operation of in-vessel components of the W7-X, was performed. Application of reliability and safety analysis for systems of fusion device enable to recommend and prioritize availability improvement measures for such systems and thus increase efficiency of fusion device. The methods and models described in this paper could be easily adopted for other systems of various fusion devices. © 2015 Elsevier B.V. All rights reserved.

Abbreviations: DEMO, Demonstration Power Plant; DTCC, Divertor Target Cooling Circuit; ETA, Event Tree Analysis; FA, Functional Analysis; FB, Function Block; FC, Fractional Contribution; FMEA, Failure Modes and Effects Analysis; FMECA, Failure Modes, Effects and Criticality Analysis; FPP, Fusion Power Plant; FTA, Fault Tree Analysis; FV, Fussell–Vesely Importance; HE, Heat Exchanger; IPP, Max-Planck-Institut für Plasmaphysik, Greifswald, Germany; ISHM, Integrated System Health Management; ITER, International Thermonuclear Experimental Reactor; JET, Joint European Torus; LEI, Lithuanian Energy Institute; MCS, Minimal Cut Sets; MTBF, Mean Time Between Failures; MTTR, Mean Time To Repair; NPP, Nuclear Power Plant; PPA, Power Plant Availability; PSA, Probabilistic Safety Analysis; RAMI, Reliability, Availability, Maintainability, Inspectability; RBD, Reliability Block Diagrams; RDF, Risk Decrease Factor; RIF, Risk Increase Factor; W7-X, Wendelstein 7-X Stellarator Type Fusion Device. ∗ Corresponding author. Tel.: +370 37401945. E-mail address: [email protected] (R. Alzbutas). http://dx.doi.org/10.1016/j.fusengdes.2015.03.001 0920-3796/© 2015 Elsevier B.V. All rights reserved.

32

R. Alzbutas, R. Voronov / Fusion Engineering and Design 94 (2015) 31–41

1. Introduction In general, the purpose of the reliability and risk analysis is to provide support in making correct management decisions by evaluating the reliability and risk (criticality) associated with a set of decision alternatives. The classical definition of the risk of failure is as follows [1]: R = pf C,

(1)

where R is the risk of failure, pf is the probability of failure and C is the cost caused by the failure (loss given failure). The measure of the failure cost may be different (depending on various consequences). For production plants (including a nuclear power plant), it is usually not only the cost of failure (and accident in the worst case) and repair, but also the amount of lost production (e.g., electricity) and lost profit. Risk can be reduced from level R to lower level R either by reducing the cost caused by failure C or by reducing probability of failure pf , or even by reducing both parts [1]. On the other hand, such risk reduction requires some investments and should be taken into account during the analysis. The values pf and C should be selected in such a way that risk reduction R is achieved at minimal costs. The purpose of this paper is to demonstrate how the reliability, as the main ingredient of safety (antonym of the risk), could be analyzed for systems of fusion devices and show the practical application and results of such analysis by the possibility to reduce the risk and the cost related to the risk. In this section, we will review the main fusion devices in Europe and the approaches of the reliability analysis used for fusion devices; Section 2 is devoted to a short survey of methods and techniques used for the analysis; in Section 3 we will present the overview of methods for analysis and in Section 4 we will demonstrate the results of practical application of these methods as a case study for the Divertor Target Cooling Circuit ACK10 of the Wendelstein 7-X experimental fusion device. 1.1. Fusion power development in Europe There are several fusion experimental devices in Europe operating or being constructed, or yet planned to be constructed. The aim of the European fusion research programme is developing nuclear fusion as an energy source, i.e., developing the knowledge in physics, technology and engineering required to design and build fusion power plants [4]. The roadmap toward a fusion reactor relies on three devices: Joint European Torus (JET), its successor, an International Thermonuclear Experimental Reactor (ITER) and a demonstration reactor called DEMO. JET represents a pure scientific experiment. The ITER project aims to make long-awaited transition from experimental studies of plasma physics to full-scale fusion power plants. Construction work on the ITER began in 2010, and the first plasma is initially expected to be produced in 2020. The ITER fusion reactor itself has been designed to produce 500 MW of output power for 50 MW of input power. The machine is expected to demonstrate the principle of producing more energy from the fusion process than that used to initiate it, something that has not yet been achieved with previous fusion reactors. But it will be only a scientific demonstration; the ITER will not generate any electricity. The next foreseen device, DEMO, is expected to be the first fusion plant to reliably provide electricity to the grid. Competitive electrical power production is the end goal for fusion energy [5]. If the DEMO does not substantially convince the decision makers and financiers of the viability of economically competitively electricity production, fusion will fail to be an energy source. This implies extensive pre- and post-operational examinations along with an

embedded, real-time monitoring of all operational components as a part of an Integrated System Health Management (ISHM) system that will predict and schedule preventative maintenance actions. However, the DEMO reactor will be a first-of-type device. That is why the two reactor concepts are still considered (DEMO1: pulsed reactor and DEMO2 steady-state reactor). Validated ITER technology will be applied wherever possible to DEMO, nevertheless there will be areas of technological development that will have not been attempted previously (i.e., self-sufficient fuel cycle) in an integrated environment. Therefore, it can be assumed that unscheduled maintenance due to random failures will dominate the early phase of operation and the subsequent investigation and rectification of such failures will require long periods of down time. Furthermore, the incremental start-up approach will also require periodic inspection and material sampling that will also require plant shut-down periods. While it is a difficult task to predict the actual availability that will be achieved by the DEMO plant, a reasonable target (certainly for the later stages) may be considered to be 30%. According to the other understanding and estimates [6], DEMO must look like and act like the first commercial power plant, and its availability must be adequately high (>50%) for the power producers to commit to building a commercial fusion plant. To be competitive, the final goal for commercial Fusion Power Plant (FPP) should have high availability, preferably exceeding 80%, with very few unplanned shutdowns. To profit from economies of scale, the plants should be large (∼1600 MW electrical power) and total construction time should be less than 5 years [2]. Approaching this goal and knowing the historical availability of other fusion devices (mainly research devices), DEMO should meet at least 30% availability if DEMO will be more related to the research activities. In case DEMO will represent the first commercial power plant, then high availabilities must be demonstrated by DEMO, and thus the availability should be in the interval between 40% and 70%. 1.2. Wendelstein 7-X The Wendelstein 7-X (W7-X) is an optimized stellarator experiment, which shall demonstrate the possibility to use such a system as a FPP. Steady-state operation is an intrinsic feature of stellarators, and one key element of the W7-X mission is to demonstrate steady-state operation under plasma conditions [16]. The project is in the assembly and preparation for a commissioning phase at the Max-Planck-Institut für Plasmaphysik (IPP) in Greifswald, Germany. The W7-X is in the commissioning stage now; the first plasma is expected in 2015. The purpose of the W7-X is to evaluate the main components of a future fusion reactor built using stellarator technology, even if the W7-X itself is not an economical fusion power plant with high availability. The W7-X, when finished, will be the largest fusion device created using the stellarator concept. It is planned to operate with up to 30 min of continuous plasma operation, demonstrating an essential feature of a future power plant: continuous operation. The major radius of the W7-X plasma is 5.5 m, and the effective (i.e., averaged) minor radius is 0.55 m. The device consists of five nominally identical modules. Each of these is made out of two flip-symmetric parts, so that in fact the device is composed of 10 almost identical half-modules. According to the symmetry described before, each half-module is equipped with five nonplanar coils of a different type and two slightly different planar coils [16]. In total, the magnet system of W7-X is made from 50 non-planar coils for the basic magnetic field and 20 planar coils to allow for a variation of the magnetic field configuration (rotational transform and radial position of the plasma), a bus-bar system to connect these coils electrically with each other and with the power supplies,

R. Alzbutas, R. Voronov / Fusion Engineering and Design 94 (2015) 31–41

a central support structure and a set of support elements fixing the coils to the central ring and supporting them against each other [16]. The main components are the magnetic system, cryostat, plasma vessel, diagnostics, divertor and heating systems. The total mass of W7-X is 725 ton. The cold mass of the magnet system including the central support ring amounts to 432 ton. The system is placed in a toroidal cryo-vacuum space created by the outer cryostat vessel [17], 254 ports and the plasma vessel [18]. The ports allow access to the plasma vessel for heating, diagnostics, cooling of in-vessel components and pumping of the plasma vessel. On the cryo-vacuum side, a thermal shield is installed to minimize thermal radiation to the cold components. As it was mentioned previously, after the completion of the assembly, the start of operation of W7-X is foreseen in 2015. An initial phase of shortpulse operation (first operational phase) will be followed by the completion of the actively cooled invessel components including the replacement of the inertially cooled test divertor by actively cooled high heat-flux targets [19]. Afterwards (second and further operational phases) plasma operation can be extended to 30 min at 10 MW of heating power [16]. 2. Reliability analysis of fusion devices Power plant availability is essential from the economical perspective for both fission and fusion power plants, as they require very high initial investments. Returning of the investment and earning profit means that the plant shall generate the highest possible amount of electricity, and this implies high availability. High availability of experimental fusion devices is needed for the most efficient use of the device for experiments. A conceptual research of future commercial fusion power plants has been performed with a Power Plant Availability (PPA) study aimed at identifying the aspects affecting the availability and generating costs of FPPs [2,3]. Among others, availability and reliability issues of FPPs were partially covered by the study. It concludes that in order to be competitive, fusion plants starting from the first generation need to comply with the availability factor greater than 80%, similarly to existing fission plants, with very few unplanned shutdowns. In order to guarantee continued safety of operation during fusion plant lifetime, in-service inspection and maintenance are needed, and this aspect should be taken into consideration in the design of the systems [2,4]. 2.1. ITER RAMI approach The availability objective for ITER is 60% inherent availability and 32% operational availability [3]. The inherent availability is the percentage of time during which the machine would be available if no delay due to the scheduled maintenance or supply was encountered. The operational availability reflects the inherent design including the effects of maintenance/upgrade delays, taking into account the availability of maintenance personnel and spares and other non-design factors (like operational procedures and management system). The ITER organization uses the RAMI (Reliability, Availability, Maintainability, Inspectability) approach to perform a technical risk assessment and for designing and optimizing ITER systems [15]. The RAMI approach focuses on the operational functions needed by the operation of the ITER rather than on physical components. It enables to define the requirements for the operational functions and provide the means to ensure that they could be met. The RAMI process begins during the design phase of a system, because corrective actions are still possible. The process is focused

33

on the functions needed to operate the ITER and their failure criticality. It is performed in four steps: • • • •

Functional Analysis (FA); Failure Modes, Effects and Criticality Analysis (FMECA); risk mitigation actions; RAMI requirements.

A functional analysis of the systems is performed with a functional breakdown (top–down description of the system as a hierarchy of functions) and an assessment of reliability and availability performance of the functions by using Reliability Block Diagrams (RBDs). The RBD approach uses the function blocks (FB) as a basis, but concentrates on the reliability-wise relationships between the function blocks. The input data, such as Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR), are fed to the lowest level blocks. A FMECA is performed in parallel to the RBDs to list the function failure modes and evaluate their risk level. A decision whether to accept or mitigate the failure mode is made based on the risk level. FB and RBD are input to FMECA. Risk mitigation actions are initiated in order to reduce the risk level of the failure modes identified by the FMECA. After integrating RMA, new RBDs are prepared. RAMI requirements are outputs of the ITER RAMI process. They are integrated in the system requirements: • Availability and reliability targets for the system and main functions according to the project requirements. • Required design changes that need to be integrated to improve the current design. • Specific tests to be performed on the components or systems. • Operation procedures and specific training to lower the risks when operating the machine. • Maintenance requirements in terms of a list of spares, intervals of inspection and preventive maintenance, procedures and training. • Proposals for standardization of common parts used in great numbers in the project, as ensuring inter-changeability of spares in the design of the systems shall then allow for shorter maintenance operation (replacement of consumables, repairs of failed components) and shall reduce the downtime of the systems and the severity ratings in the FMECA, reducing the risk level and allowing for more availability of the ITER for the experimental program. The process applied for the analysis of the plant systems defines failures of the functions, their criticality and provides risk mitigation actions. Up to 2010 RAMI was applied to 16 out of 21 main ITER systems. The analysis performed for the Tokamak Cooling Water System [7] initially identified 27 major risks, such as failure of the main pumps, leaks on the heat exchangers or associated valves leading to loss of cooling and possible damage for the plasmafacing components and failure of the coolant chemistry control leading to corrosion. For such major risks, risk mitigation actions are considered which reduce either the likelihood (prevention) or the consequences (protection) of the failures. The analysis proves that after implementation of the identified actions, the cooling system could be operated in higher reliability and availability at 97.7% as required by the project. The RAMI analysis for the ITER fuel cycle system [8] identified several failure modes with high risks, majority of which were removed by implementing risk reducing means. However, some most critical risks remain, e.g., several critical components of the tritium plant, which are not easily replaced or repaired.

34

R. Alzbutas, R. Voronov / Fusion Engineering and Design 94 (2015) 31–41

Up to date the ITER project is probably the one which achieved the biggest advance in systematic use of reliability and risk analysis methods for a fusion device in Europe. 2.2. Approach used for W7-X IPP has decided to use the RAMI approach for the W7-X. As the W7-X at that time was already in the manufacturing and assembly state, it was too late to make significant design changes. Therefore, it was decided to perform a reliability analysis based on modeling of already existing systems and then provide recommendations for improvement of system reliability and availability. This approach is different from the one used for ITER where the overall ITER availability goal is “distributed” among the systems and is defined for the systems and components (top–down approach). For the W7-X, on the contrary, it was decided to use the “bottom-up” approach when the existing system availability is estimated and improved. Ideally, a full-scope analysis would enable to obtain the overall W7X availability as a summary of all systems availabilities. Having such a complete model would enable to estimate how improvements of each system design, operation, maintenance, inspections, etc. would improve systems and overall W7-X availability. In order to perform reliability/availability and risk analyses of the W7-X probabilistic safety assessment methods were used. 3. Overview of methods for analysis 3.1. Main methods for assessment To estimate risk of hazardous objects a Probabilistic Safety Analysis (PSA) is used in such industries like nuclear, where it is necessary regulatory requirement in many countries, transport, chemical, oil, etc. PSA can be applied for any hazardous systems in fusion area as well, e.g., [9,10]. PSA methodology integrates information about device design, operating practices, operating histories, component reliabilities, human behavior, thermal hydraulic facility response, accident phenomena (e.g., [20]) and potential environmental and health effects. PSA is widely used for estimation of safety and reliability of energy generating complex systems. A Fault Tree Analysis (FTA) together with an Event Tree Analysis (ETA) are two main tools in a system analysis. Both methods include a quantification part and visual representations of the Boolean logic for accident sequences [11]. FTA is an analytical technique, whereby an undesired state of the system is specified (usually a state that is critical from a safety or reliability standpoint), and the system is then analyzed in the context of its environment and operation to find all realistic ways in which the undesired event (top event) can occur. Failure Modes and Effects Analysis (FMEA) is used as an input to fault tree analysis. FMEA like FMECA is a qualitative and systematic tool, which helps anticipate what might go wrong with a product or process. In addition to identifying how a product or process might fail and the effects of that failure, FMEA also helps find the possible causes of failures and the likelihood of failures being detected before occurrence. The fault tree is a graphic model of various parallel and sequential combinations of faults, caused by hardware failures, human errors, software errors, or any other pertinent events that will result in the occurrence of the predefined undesired state. The FTA attempts to develop a deterministic description of the occurrence of an event, called the top event, in terms of the occurrence or nonoccurrence of other (intermediate) events. Intermediate events are also described further until the lowest level of the detail, the basic events, is reached. Examples of basic events are “Pump fails to run”

or “Valve fails to open”. The fault tree example is presented in Fig. 3, where basic events are marked with circles. A fault tree analysis may be qualitative, quantitative, or both, depending on the objectives of the analysis. Possible results from the analysis could be the following: • Listing of possible combinations of environmental factors, human errors (if included), normal operational events, and component failures that may result in a critical state of the system. • The probability that the critical event will occur during a specified time interval. As a result of the fault tree based initial qualitative analysis, Minimal Cut Sets (MCS) are generated. A cut set is a set of basic events, which, if occurred, definitely lead to the top event. A MCS is such cut set when after removal of any basic events from it, there is no more cut set. When the fault trees are structured, the MCS generations and quantification for a quantitative analysis are made by the PSA software. 3.2. Importance and sensitivity In order to better understand the influence of each component and each parameter on the total system reliability/unavailability and risk, the importance and sensitivity analyses are performed using RiskSpectrum PSA software. The importance measures are the following: The Fussell–Vesely (FV) importance for a basic event is the ratio between the unavailability based only on all MCSs, where the basic event i is included, and the nominal top event unavailability is IiFV =

Q TOP (MCSincluding i) QTOP

,

(2)

where IiFV is the FV importance; QTOP is the nominal top event unavailability; QTOP (MCSincluding ) is the unavailability based only on MCSs, where the basic event i is included. The Risk Decrease Factor (RDF) is calculated as IiD =

QTOP , QTOP (Qi = 0)

(3)

where IiD is RDF; QTOP (Qi = 0) is the top event unavailability, where the unavailability of the basic event i is set to zero (the basic event does not contribute to the top event unavailability). The Risk Increase Factor (RIF) is calculated as follows: IiI =

QTOP (Qi = 1) , QTOP

(4)

where IiI is RIF; QTOP (Qi = 1) is the top event unavailability, where the unavailability of the basic event i is set to one (the basic event does contribute to the top event unavailability). The Fractional Contribution (FC) is calculated as follows: IiF = 1 −

1 IiD

.

(5)

The sensitivity S is calculated as a ratio between “sensitivity high” and “sensitivity low” indicators: S=

QTOP,U , QTOP,L

(6)

where “sensitivity high” QTOP , U is top event results, where the unavailability of the basic event i is multiplied by a sensitivity factor (normally equal to 10); “sensitivity low” QTOP , L is the top event results, where the unavailability of the basic event i is divided by the sensitivity factor. Using RiskSpectrum PSA software the importance estimations for parameters are calculated according to a similar procedure as

R. Alzbutas, R. Voronov / Fusion Engineering and Design 94 (2015) 31–41

35

Table 1 Considered failure modes options. Equipment

Failure mode

Description

Pump Pump Heat exchanger Heater Filter Valve Valve Valve

Fails to start Fails to run Failure Failure Failure Failure to open/close/change position Left in wrong position Valve spurious operation

Failure to start and reach required capacity after standby Stop or failure to keep the required capacity during operation Failure to ensure sufficient cooling disregarding the root cause Failure to ensure sufficient heating disregarding the root cause Flow blockage or leakage Failure to change position due to hardware failure Valve left in wrong position due to operators’ error Valve changes position due to false signal or human error

for basic events. In some cases, importance measures cannot be defined, or would be meaningless. This is the case for time to the initial test (TI) parameters, and for a risk increase factor for frequency (f) parameters. • The parameter value is set to the “best theoretically possible”, in all cases it is X = 0. This is made for all parameter types except for time to the first test (TF) parameters. • A new top event result (unavailability or frequency, depending on the type of calculation in the MCS-analysis specification) is calculated. This new, lower, result is indicated with QTOP (X = 0) in the following formula. The risk decrease factor can now be calculated as follows: IiR =

QTOP . QTOP (Xi = 0)

(7)

The fractional contribution is the following: IiF = 1 −

1 IiR

.

(8)

• The parameter value is set to the “worst theoretically possible”. It is different depending on the type of the parameter, it is q = 1 for probability parameters and X = ∞ for all other parameters. This is not applicable for frequency parameters, because an infinite frequency value would imply an infinite top event frequency. • A new top event result (unavailability or frequency, depending on the type of calculation in the MCS-analysis specification) is calculated. This new, higher, result is indicated with QTOP (qi = 1) below. The risk increase factor can now be calculated as follows. For probability parameters: IiI =

QTOP (qi = 1) . QTOP

(9)

For all other parameter types X (except frequency (f) and time to first test (TI) parameters for which no calculations are made): IiI =

QTOP (Xi = ∞) . QTOP

(10)

4. Case study: Wendelstein 7-X Divertor Target Cooling Circuit The Divertor Target Cooling Circuit (DTCC) was chosen as a pilot system for demonstration of PSA applicability for fusion device systems. DTCC is a part of the water cooling circuits for the W7-X. It provides cooling flow for the target modules during plasma operation and ensures water circulation during other operational modes. It also provides heating up of the divertor target modules up to 150 ◦ C (e.g., in the so-called baking and other modes) before starting operation campaign after outage for maintenance. The cooling circuit consists of a primary part (ACK10 cooling circuit) and a secondary part (ECB10 water supply system). The circles are separated by two parallel heat exchangers. The secondary

Table 2 Effects of equipment failure modes to the system failures. System failure mode

Equipment failure mode

Loss of circulation Loss of flow Flow in wrong direction Insufficient cooling Insufficient heating Loss of coolant Loss of static pressure

Pump failure Valve closure; Filter blockage Wrong valves position HE failure Heater failure Pipe rupture/leakage Pressurizer failure

part cools the primary part during the experiment and holds its temperature constant. The primary part includes a cooling circuit and a separate baking circuit with its own pump and provides water to 110 parallel target modules. The pipes with diameters of 25–600 mm, the valves and other components are stainless steel parts. Water for the primary part must be deionized. The total water content of the primary part is about 87 m3 . The content of a lockable target module is about 25 L. A simplified flow diagram of the DTCC [12] is provided in Fig. 1. FMEA, as describe below, in this study is used to identify possible failure modes of cooling circuits equipment, their consequences and criticality, appearance and detection possibility. 4.1. Failure modes, effects and criticality Different equipment has different typical failure modes. In some cases the failure modes are presented in general way, in others – in more detailed depending on availability of data and feasibility to go deeper in the analysis. Failure modes considered in this analysis are presented in Table 1. Failures of instrumentation and control as well as operating algorithms were not considered in the current study as they belong rather to control system reliability analysis. Instead a general “Loss of control” event is considered later among the other external systems’ failures. Failures of instrumentation, if they lead to system failures, may be analyzed in other works and added to the analysis by simple modification of the fault trees developed in scope of this work. The main task of the considered systems is cooling of the corresponding components during experiments or after baking and heating during baking. Equipment failures, which lead to failure of these functions, are considered critical. In general, there are only several system failure modes and corresponding equipment failure modes as presented in Table 2. 4.2. Failure occurrence, detection and correction opportunity Some failure modes are relevant only to specific operation modes and are not expected in others. For example, failure to open a certain valve is expected only when this valve has to be opened and impacts only that operation mode when this valve is required

36

R. Alzbutas, R. Voronov / Fusion Engineering and Design 94 (2015) 31–41

Fig. 1. Simplified flow diagram and 3D schema of W7-X DTCC.

in open position. Another example is that failure of heater does not affect cooling operation mode. For all equipment and failure modes, possibilities of failure occurrence and relevance to different operating modes were analyzed. Failure modes, which are not expected or do not impact the operating mode, were excluded from the analysis. Possibilities to detect and correct the failure were also analyzed as they impact actual system unavailability time due to this certain failure. For some equipment, e.g., a manual valve with position indicator, the wrong position could be noticed by an operator before starting operation and corrected very quickly by simply going and opening the valve. In such case, an impact to operation would be minimal or no impact at all. However, for manual valve without position indicator, the wrong position may be identified only during system survey or not detected at all. In such case, wrong position would be found only after the beginning of operation when such closed valve did not allow water flow. Each operation campaign begins with baking followed with plasma operation. This means that all failures like valves left in wrong position will be identified and corrected before baking and therefore are not relevant to plasma operation. Some other failure modes may be detected immediately but not possible to correct. For example, pump failure to start or valve failure to open will be detected immediately by system operation failure, but the correction might require repair or replacement of failed equipment. FMEA provides only qualitative estimation. Numerical estimation of failure rates and repair rates is based on failure occurrences and on failure detections and corrections. 4.3. Fault tree model The top event of the fault tree is “ACK10 unavailable for experiment” (Fig. 2). This FT consists of five branches, which model failures of five sections of the DTCC. The branches are connected by the OR-gate, which means that ANY of them may lead to the top

event. Each of the five branches is further modeled by its own fault tree: • • • • •

Loss of modules cooling; direct flow of water from the cold to the hot pipeline; no cooling water in supply pipeline; no water from pumps; no water from Heat Exchangers (HE) of ECB10 system.

Each fault tree models failure of different parts of the DTCC. The fault tree “no water from pumps AP002, AP003” is described below as an example. Cooling water circulation in the ACK10 system [13] is ensured by pumps AP001, AP002, AP003. Two pumps are required to provide sufficient cooling for normal load and full load operation modes. Only one pump AP003 is installed now, and AP002 is planned to be installed. Installation of AP001 is under consideration, and it is possible that this pump will not be installed, and only two pumps will be in operation. Each pump line is equipped with the following: • Manual gate valve KA505(504) is normally open and is closed only for pump maintenance. • Pump AP003(002) may be in operation or in standby. • Check valve KA508(507) is opened when the pump is running and closes due to the pressure difference when the pump stops; • Pneumatic valve KA510(511) is opened when the pump is running and closes due to the automatic signal when the pump stops. Most of the time, the DTCC system will be in Part Load and Standby modes, which correspond to pauses between W7-X experiments and non-working time, respectively. Cooling water flow rates for these modes are 425 m3 /h and 177 m3 /h, respectively. Such flow rate may be ensured by one pump, which is assumed to be in operation all the time. The second pump is started only when an additional flow up to 1382 m3 /h for normal load and 1602 m3 /h for full load is required.

R. Alzbutas, R. Voronov / Fusion Engineering and Design 94 (2015) 31–41

37

Fig. 2. Fault tree “ACK10 unavailable for experiment”.

This determines the failure modes of the system. It is assumed that AP003 is the main running pump, and AP002 is an auxiliary one. Therefore, the failure modes for AP003 are the following:

limitations do not enable to reflect the number of trials. Therefore, in order to fit to the mean unavailability model, failure rates per demand were recalculated to failure rates per time unit:

• Manual gate valve KA505 is erroneously closed or spuriously closes. • Pump fails to run. • Pneumatic valve KA511 is erroneously closed or spuriously closes.

=

The failure modes for AP002 are the following: • Manual gate valve KA504 is erroneously closed or spuriously closes. • Pump fails to start or fails to run. • Check valve KA507 fails to open at pump startup.

qn , T

(11)

where  is the failure rate per time unit; q is the failure probability per demand; n is the number of demands per time period; T is the length of the time period. The main pump is running the entire or almost the entire year (both pumps are out for the short period of baking heating and holding). Therefore, the main failure mode for this pump is failure to run and the failure rate for the main pump remains unchanged 3E-5 1/h. This failure rate is used for ACK10 AP003 pump. The auxiliary pump operates only about 136 h/year and it’s average “failure to run” rate per operating hour is therefore: 3E-5 failures/h × 136 h/year : 8760 h/year = 4.7E-7 failures/h.

Pneumatic valve KA510 fails to open at pump startup or is erroneously closed or spuriously closes. The fault tree is presented in Fig. 3. 4.4. Equipment reliability data Generic reliability data of NPP safety equipment were used in this analysis, as currently there is no W7-X failure statistics. The generic failure rates are presented in Table 3. The failure rates provided are averaged values, which for the purpose of this analysis were recalculated in accordance with equipment operation regimes as described below. In this analysis, two failure modes are considered for pumps, “failure to start” and “failure to run”. From nuclear power plant PSA experience, failure to run is important characteristic for constantly operating pumps of normal operation systems like feedwater, service water, etc. while failure to start is important for safety systems like reactor emergency cooling pumps or auxiliary feedwater pumps, the task of which is to start in case of emergency and run until emergency situation is mitigated. Neither of these two approaches is completely suitable for fusion device due to its pulse (tokamaks) or short-term experiments operation. In the considered DTCC system, only one pump out of two is required for most of the time, while the second one is started only for normal load and full load plasma operations. This means that the second pump starts two times per day during experimental phase, i.e., approximately 320 times per year. As number of start/stop cycles increases, negative influence of a “failure to start” on the failure rate should be updated, too. Indeed, more trials to start a pump mean more chances to fail; however, modeling

But this pump has about 320 starts/stops per year as: 2 demands/day × 5 days/week × 16 weeks/campaign × 2 campaigns/year = 320 demands/year. A failure rate per demand is therefore recalculated to the “failure to start” rate per hour: 320 demands/year × 3E-3 failures/demand : 8760 hours/year = 1.1E-4 failures/hour. Cadwallader [14] proposes to multiply pump failure rates for a factor of three in order to reflect the impact of start/stop cycles. As we see, our calculated value is approximately four times higher that 3E-5 1/h used for permanently running pump. The same approach is used for a “valve failure to open” failure rate recalculation in the model. Repair time for both failed pumps and valves is conservatively assumed 720 h (1 month) as it is assumed that no spares are available, and the necessary spares should be procured and delivered on site. 4.5. Results of reliability and sensitivity analysis A reliability analysis of the DTCC (ACK10) was performed using the FTA and RiskSpectrum PSA software. The developed FTA model quantification includes minimal cutsets generation and an uncertainty and sensitivity analysis. The calculation was performed for the time period of 6526 h, i.e., the total time of operation campaigns

38

R. Alzbutas, R. Voronov / Fusion Engineering and Design 94 (2015) 31–41

Fig. 3. Fault tree “No water from pumps AP002, 003”.

R. Alzbutas, R. Voronov / Fusion Engineering and Design 94 (2015) 31–41

39

Table 3 Generic failure rates. Name

Failure rate

Unit

Distribution

Error factor

Pump fails to run Pump fails to start Check valve fails to open Pneumatic valve fails to open Pneumatic valve spurious closure Filter failure Regulating valve spurious opening Pneumatic valve spurious opening Manual valve fails to open Heat exchanger failure Pneumatic valve left closed Manual valve spuriously closed Regulating valve left opened

3.00E−05 3.00E−03 1.00E−04 2.00E−03 1.00E−07 2.00E−06 5.00E−07 5.00E−07 1.00E−04 9.00E−06 3.00E−03 3.00E−03 3.00E−03

1/h 1/demand 1/demand 1/demand 1/h 1/h 1/h 1/h 1/demand 1/h 1/demand 1/year 1/demand

Lognormal Lognormal Lognormal Lognormal Lognormal None Lognormal Lognormal Lognormal Lognormal Lognormal Lognormal Lognormal

10 10 3 3 3 – 10 10 3 10 10 3 10

Table 4 ACK10 unavailability for each failure in ACK10. No.

Unavailability

% Total

Event name

1. 2. 3. 4. 5. 6. 7.

9.57E−02 6.60E−02 2.11E−02 6.44E−03 6.44E−03 3.52E−03 4.50E−04

51 35.1 11.3 3.43 3.43 1.87 0.24

Pump AP002 fails to start Pneumatic valve KA510 fails to open Pump AP003 fails to run Heat exchanger AD002 fails Heat exchanger AD001 fails Check valve KA507 fails to open Pump AP002 fails to run

per 1 year. The MCS generations for the analysis are made by the PSA software. The calculated total unavailability for the ACK10 operation period in a year is 0.188. This means that the system will be unavailable for 18.8% of the operation campaign. As a result of the ACK10 model analysis 56 minimal cut sets were generated. The most important 7 MCS (which gives the highest unavailability) are presented in the following Table 4; the remaining MCS bring only 0.01% to the total unavailability. The results show that due to low system redundancy the failure of a single component may lead to complete system unavailability. The failure of an auxiliary (secondary) cooling pump AP002, which has a high quantity of cyclic loads, brings more than 50% influence on the system unavailability. The pneumatic valve KA510, located at the pressure line of the same pump, brings about 35%. This valve is also subject to high cyclic loads. In order to better understand the influence of each component and each parameter on the total unavailability, the importance and sensitivity analyses were performed. The results for the most important seven basic events are presented in Table 5. It is obvious that the most important basic events are related to the minimal cut sets. The interesting outcome is that importance and sensitivity measures show how total unavailability would change if reliability of each component changes. For example, RDF shows that assuming “perfect” pumps with failure probability equal to 0 (1st basic event), this would decrease ACK10 unavailability 1.84 times, and a “sensitivity low” indicator QTOP , L shows that increasing pump reliability 10 times, ACK10 unavailability would be 11% against the current 18.8%.

The results of the importance and sensitivity analyses for parameters are presented in Table 6. The final results show that the most important contributor to the system unavailability is not equipment failure rates, but 1 month time period for hardware repair or replacement, which was assumed (parameter ONE MONTH). The “sensitivity low” indicator QTOP, L shows that decreasing this time 10 times would result in ACK10 unavailability only 2.2%. The next important parameter is the pump standby failure rate (parameter PUMP STBY), which is used for the pump AP002 and whose improvement by 10 times would change ACK10 unavailability from 18.8% to 11.1%. Other results can be interpreted in the same way. 4.6. Evaluation of risk decrease measures The reliability analysis identifies that the main impact on unavailability of ACK10 is the operation regime of the cooling pumps where, the second pump is started only to provide additional cooling for plasma experiments. This causes high cyclic load and corresponding high failure probability to the secondary pump (unavailability 95.7%, which is almost certainly once per year). This component brings 51% to the total unavailability of ACK10. The current section provides analysis of different options to increase availability of ACK10 pumps. As indicated above, one ACK10 pump is constantly operating, and the second pump is an auxiliary one, which is started only during plasma operation, when additional water flow is required for cooling. A high number of start/stop cycles of the pump and open/close cycles of the pneumatic valve lead to high failure rate

Table 5 Results of the basic events importance and sensitivity analysis. No.

Normal value

FV

FC

RDF

RIF

S

QTOP, U

QTOP, L

1. 2. 3. 4. 5. 6. 7.

9.57E−02 6.60E−02 2.11E−02 6.44E−03 6.44E−03 3.52E−03 4.50E−04

5.10E−01 3.51E−01 1.13E−01 3.43E−02 3.43E−02 1.87E−02 2.40E−03

4.58E−01 3.06E−01 9.35E−02 2.80E−02 2.80E−02 1.53E−02 1.95E−03

1.84E+00 1.44E+00 1.10E+00 1.03E+00 1.03E+00 1.02E+00 1.00E+00

5.33E+00 5.33E+00 5.33E+00 5.33E+00 5.33E+00 5.33E+00 5.33E+00

8.71E+00 5.17E+00 2.01E+00 1.28E+00 1.28E+00 1.15E+00 1.02E+00

9.61E−01 7.04E−01 3.46E−01 2.35E−01 2.35E−01 2.14E−01 1.91E−01

1.10E−01 1.36E−01 1.72E−01 1.83E−01 1.83E−01 1.85E−01 1.87E−01

40

R. Alzbutas, R. Voronov / Fusion Engineering and Design 94 (2015) 31–41

Table 6 Results of the parameters importance and sensitivity analysis. No.

ID

Type

Normal value

FC

RDF

RIF

S

QTOP , U

QTOP , L

1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13.

ONE MONTH PUMP STBY PV FTO PUMP FTR HE FAIL CV FTO ONE DAY MV SPC PUMP A FTR PV SPC PV SPO MV FTO FILTER FAIL

Tr R R R R R Tr R R R R Q R

7.20E+02 1.47E−04 9.81E−05 3.00E−05 9.00E−06 4.90E−06 2.40E+01 9.19E−07 6.25E−07 9.19E−07 9.19E−07 1.00E−04 2.00E−06

9.96E−01 4.58E−01 3.06E−01 9.35E−02 5.63E−02 1.53E−02 3.44E−03 2.10E−03 1.95E−03 1.15E−03 1.91E−04 6.60E−07 6.22E−07

2.37E+02 1.84E+00 1.44E+00 1.10E+00 1.06E+00 1.02E+00 1.00E+00 1.00E+00 1.00E+00 1.00E+00 1.00E+00 1.00E+00 1.00E+00

5.33E+00 5.33E+00 5.33E+00 5.33E+00 5.33E+00 5.33E+00 5.33E+00 5.33E+00 5.33E+00 5.33E+00 5.33E+00 1.01E+00 1.00E+00

3.65E+01 5.07E+00 3.59E+00 1.85E+00 1.54E+00 1.15E+00 1.03E+00 1.02E+00 1.02E+00 1.01E+00 1.00E+00 1.00E+00 1.00E+00

8.02E−01 5.64E−01 4.90E−01 3.18E−01 2.74E−01 2.13E−01 1.94E−01 1.91E−01 1.91E−01 1.90E−01 1.88E−01 1.88E−01 1.88E−01

2.20E−02 1.11E−01 1.36E−01 1.72E−01 1.78E−01 1.85E−01 1.87E−01 1.87E−01 1.87E−01 1.88E−01 1.88E−01 1.88E−01 1.88E−01

Table 7 Comparison of risk for risk decrease options. Option

Failure probability, pf

Loss given failure, C, kD

Risk, R, kD

Current status Installation of 3rd pump Spare parts Maintenance

0.188 0.044 0.059 0.148

2064.88 10.00 489.47 2064.88

388.20 0.44 28.69 305.60

Table 8 Comparison of cost effectiveness for risk decrease options. Option

Risk reduction, R, kD

Total expenditures, Ct , kD

Cost effectiveness, R/Ct

Installation of the 3rd pump Spare parts Maintenance

387.8 359.5 82.6

220.0 115.0 42.7

1.8 3.1 1.9

Fig. 5. Comparison of risk reduction and expenditures for risk decrease. Fig. 4. Comparison of risk for risk decrease options.

of the auxiliary pump line. Failure of the auxiliary pump leads to inability to provide sufficient cooling of target modules and, as a result, inability to perform experiments. This means that until operation of the failed pump line is restored the complete W7-X is inoperable and all related experiments and investigations are delayed to the corresponding amount of time. Although the W7-X is not producing a profit, the cost of W7X downtime can be estimated on a basis of a daily cost of the W7-X operation. Indeed, when W7-X is inoperable the entire device and personnel cannot perform their main task (experiments, research), while the operational costs (personnel salary, infrastructure, resources, etc.) remain the same.

The cost of W7-X downtime due to a failure may be interpreted as inoperable time (or downtime) Td , caused by the failure multiplied by cost of time unit Ct (hour, day), and the risk may be rewritten as: R = pf Td Cb ,

(12)

Reduction of the risk may be achieved by reduction of any of these three variables: probability of failure, downtime and cost of downtime. Below, several options to reduce the risk related to failure of the ACK10 pumps are considered. A comparison of the risk decrease measures is presented in Table 7 and Fig. 4.

R. Alzbutas, R. Voronov / Fusion Engineering and Design 94 (2015) 31–41

The lowest risk corresponds to the installation of the third redundant pump. However, the cost of risk decrease should also be evaluated. A comparison of risk decrease against the expenditures is provided in Fig. 5. The cost effectiveness of expenditures for risk reduction is calculated as ratio between risk reduction R and total expenditures Ct for each option. The result is presented in Table 8. Cost effectiveness indicates the risk reduction gained for each euro invested in reliability increase. From the cost effectiveness perspective, the option of keeping the necessary set of spare parts in order to decrease repair time is preferable. 5. Conclusions A pilot reliability and risk analysis of the Divertor Target Cooling Circuit ACK10 was performed applying PSA related methods. The analysis covered data collection, development of a fault tree model, failure modes and effects analysis, estimation of reliability parameters and unavailability calculations. Results of such analysis enable to recommend measures for improvement of availability of such systems and thus increase efficiency of use of fusion devices. The most important results and conclusions are as follows: (1) Unavailability of the ACK10 is 18.8% of the operational campaign, i.e., about 1.5 months of 8-month operation in a year the system would be unavailable thus causing unavailability to use the W7-X for experiments. (2) The main impact on unavailability is an operational regime of the cooling pumps, where one pump is always running to provide cooling during all operational modes and the second one is started only to provide additional cooling for plasma experiments. This causes high cyclic load and corresponding high failure probability to the secondary pump (unavailability 95.7% which is almost certainly once per year) and its regulating valve (unavailability 66%, i.e., twice in three years). Unavailability of these components brings 51% and 35% to the total unavailability, respectively. (3) Another major reason for unavailability is a long repair time, which is assumed to be 1 month accounting for the time required to deliver and repair the equipment at the manufacturer’s site or procure the spares required for the repair. Limited redundancy of the equipment does not enable to continue operation while the components are being repaired. Comparison of the W7-X and ITER reliability and risk analysis shows that the W7-X analysis uses FTA and FMECA, which are similar to the RBD-FMECA approach for the ITER. The W7X has fewer possibilities to make design changes in comparison with ITER, therefore, it should concentrate on such risk prevention and mitigation measures, which would require less intervention to already designed and installed systems, such as: • Improvement of maintenance program. • Improvement of operating/maintenance procedures. Hardware or system configuration changes only for safety important components. The conclusions and results of the loss of offsite power accident sequence analysis enable to identify and prioritize reliability increase measures aimed to protect the W7-X equipment against undesirable consequences of loss of power supply. Based on the results of W7-X DTCC reliability analysis, the availability improvement measures for DTCC were proposed and compared in order to identify the most efficient one. Three options for pumps availability increase were considered:

41

(a) Installation of the redundant (third) pump. (b) Reducing repair time by providing stock of spare parts. (c) Preventive maintenance. Cost effectiveness expressed as risk reduction for each euro invested in availability increase was used for comparison of the options. Option b has the highest cost effectiveness equal to 3.1, while options a and b have 1.8 and 1.9, correspondingly. Thus, application of reliability and safety (risk) analysis for systems of fusion device enable to recommend and prioritize availability improvement measures for such systems and thus increase efficiency of fusion device. The methods and models described in this paper could be easily adopted for other systems of various fusion devices. Acknowledgements This paper was prepared on the basis of the work, which was carried out within the framework of the European Fusion Development Agreement and supported by the European Communities. The presented work is also partly supported by the Agency for Science, Innovation and Technology in Lithuania. And last but not least, the authors wish to acknowledge the great support and valuable assistance provided by Dr. Dirk Naujoks from the Max-Planck-Institut für Plasmaphysik, Greifswald, Germany. References [1] M.T. Todinov, Risk-Based Reliability Analysis and Generic Principles for Risk Reduction, Elsevier Science & Technology Books, London, 2006. [2] D. Ladra, G. Sanguinetti, E. Stube, Fusion power plant availability study, Fusion Eng. Des. 58–59 (2001) 1117–1121. [3] D. Van Houtte, K. Okayama, F. Sagot, ITER operational availability and fluence objectives, Fusion Eng. Des. 86 (2011) 680–683. [4] J. Pamela, A. Bécoulet, D. Borba, J.-L. Boutard, L. Horton, D. Maisonnier, Efficiency and availability driven R&D issues for DEMO, Fusion Eng. Des. 84 (2–6) (2009) 194–204. [5] L.M. Wagner, An Integrated Design for Demo-FPP to Achieve RAMI Goals, ReNeW (Research Needs Workshop) White Papers, Theme IV: Harnessing Fusion Power, Office of Fusion Energy Sciences, 2009. [6] L.M. Wagner, Why is Reliability, Availability, Maintainability, and Inspectability important to the future of fusion? in: Material of HFP Workshop, RAMI Session, UCLA, 2–4 March, 2009. [7] D. Van Houtte, K. Okayama, F. Sagot, RAMI approach for ITER Fusion Eng. Des. 85 (2010) 1220–1224. [8] K. Okayama, D. van Houtte, F. Sagot, S. Maruyama, RAMI analysis for ITER fuel cycle system, Fusion Eng. Des. 86 (2011) 598–601. [9] L. Hu, Y. Wu, Probabilistic safety assessment of the dual-cooled waste transmutation blanket for the FDS-I, Fusion Eng. Des. 81 (8–14) (2006) 1403–1407. [10] G. Cambi, G. Cavallone, M. Costa, S. Ciattaglia, Summary of NET Plant Probabilistic Safety Approach and results by means of ENEA Fusion Plant Safety Assessment (EFPSA), J. Fusion Energy 12 (1–2) (1993) 127–131. [11] R. Caporal, T. Pinna, Multiple failure accident sequences for SEAFP reactor, J. Fusion Energy 16 (1/2) (1997) 45–53. [12] Technologisches Konzept für die Experimentkühlung, Institut für Plasmaphysik, 1-ECB00-T0001. [13] M. Braun, Kühlkreisläufe Experiment W-7X, Institut für Plasmaphysik, 1ACK00-Z0001.2. [14] L.C. Cadwallader, Selected Component Failure Rate Values From Fusion Safety Assessment Tasks, in: INEEL/EXT-98-00892, September, 1998, Idaho National Engineering and Environmental Laboratory, 1998. [15] J.J. Ferrada, W.T. Reiersen, RAMI analysis for designing and optimizing ITER tokamak cooling water system, Fusion Sci. Technol. 60 (1) (2011) 105–112. [16] H.-S. Bosch, R.C. Wolf, T. Andreeva, J. Baldzuhn, D. Birus, T. Bluhm, et al., Technical challenges in the construction of steady-state stellarator Wendelstein 7-X, Nucl. Fusion 53 (12) (2013) 1–16. [17] T. Koppe, A. Cardella, B. Missal, B. Hein, R. Krause, H. Jenzsch, et al., Overview of main-mechanical-components and critical manufacturing aspects of the Wendelstein 7-X cryostat, Fusion Eng. Des. 86 (6–8) (2011) 716–719. [18] B. Hein, A. Cardella, D. Hermann, A. Hansen, F. Leher, A. Binni, et al., Manufacturing and assembly of the plasma and outer vessel of the cryostat for Wendelstein 7-X, Fusion Eng. Des. 87 (2) (2012) 124–127. [19] G. Dundulis, R. Janulionis, R. Karaleviˇcius, The application of leak before break concept to W7-X target module, Fusion Eng. Des. 88 (2013) 3007–3013 (ISSN: 0920-3796). [20] E. Uˇspuras, A. Kaliatka, T. Kaliatka, Analysis of the accident with the coolant discharge into the plasma vessel of the W7-X fusion experimental facility, Fusion Eng. Des. 88 (5) (2013) 304–310 (ISSN: 0920-3796).