Reliability Engineering and System Safety 114 (2013) 99–113
Contents lists available at SciVerse ScienceDirect
Reliability Engineering and System Safety journal homepage: www.elsevier.com/locate/ress
Quantification of margins and uncertainties using multiple gates and conditional probabilities Gianluca Iaccarino a,n, David Sharp b, James Glimm c a
Mechanical Engineering Department, Stanford University, Stanford, CA 94305-3035, USA Los Alamos National Laboratory, Los Alamos, NM 87545, USA c Applied Mathematics and Statistics Department, SUNY Stony Brook, NY 11794-3600, USA b
a r t i c l e i n f o
abstract
Article history: Received 24 July 2012 Received in revised form 20 November 2012 Accepted 21 November 2012 Available online 4 February 2013
A methodology to perform the Quantification of Margins and Uncertainties (QMU) is introduced to enable the assessment of the safety associated with the operating conditions of complex engineering devices consisting of multiple subcomponents or coupled multi-physics processes. One of the key components of the approach is the possibility of decomposing the system into subcomponents characterized by critical metrics—gates—that collectively describe the reliability of the whole system. In the present study we formalize the process of constructing conditional probabilities for system performance and illustrate it with two applications: the evaluation of the test-time in a shock-tube experimental facility and the assessment of the unstart limit in the combustion chamber of a supersonic propulsion engine. In both cases, multiple uncertainties are considered and the gates are used as a mechanism to reduce the complexity of the resulting stochastic problem. & 2012 Elsevier Ltd. All rights reserved.
Keywords: Uncertainty Safety margins Hypersonics
1. Introduction and motivation The operability of engineering systems is typically characterized in terms of one or a few metrics associated with failure modes. In some situations such as the failure of a structure due to excessive loading, it is clear that characterization of the material allows one to determine accurate bounds on the operating range. In other cases, interactions between diverse physical processes and limited information on the operating environment can hinder our ability to precisely determine failure conditions and possibly result in an overly conservative design and inefficient systems. This is often the case for aerospace engineering devices and also for experimental facilities used to certify these devices, which are typically designed using relatively simple theoretical tools. Considerable effort is currently devoted to the enhancement of our ability to assess risks. Although great strides have been made in the computational determination of reliability measures in all engineering fields, most of the actual certification remains fundamentally based on physical prototyping. This is especially true for the application considered here—vehicles and propulsion systems operating in a hypersonic regime [1,2]. The present approach is based on an extension of the methodology known as Quantification of Margins and Uncertainties (QMU) [3–7] which has been developed to study the safety and reliability of realistic engineering systems and was recently applied
n
Corresponding author. E-mail addresses:
[email protected],
[email protected] (G. Iaccarino).
0951-8320/$ - see front matter & 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.ress.2012.11.026
to a simplified hypersonic air-breathing propulsion system [8]. QMU aims at computing a confidence range for the performance of a given system balancing the need for a safety margin—to operate away from a failure region—with the presence of uncertainties. In a broader sense, the extension of the QMU framework presented here recognizes the need to construct a fine-grain characterization of complex systems, in which simpler subsystems can interact in a strongly non-linear fashion creating emerging behaviors [9] and a potential for systemic failure even when all the subsystems behave within what would normally be considered acceptable risk levels. In the QMU methodology, risks are related to both catastrophic system failure as well as anomalous impediments of performance as long as these lead to serious consequences. The approach we are exploring incorporates the familiar and widely used method of safety margins into a systematic framework. Serious consequences: This is not a new problem and QMU has some features in common with other methodologies such as fault tree analysis (see Ref. [10] and discussion below), Bayesian networks [11], and others. We will discuss some of the differences in Section 5 after introducing the tools used in this work. The QMU framework is based on:
A set of performance metrics and a ‘‘gate’’ for each metric, which represent acceptable conditions for the system operation. Information represented by each gate includes safety margins and uncertainties in such margins. Collectively, the metrics and gates form a reduced order model of the system which describes the confidence (optimally, in a mathematical
100
G. Iaccarino et al. / Reliability Engineering and System Safety 114 (2013) 99–113
sense, but in practice typically including scientific judgement) that the system will perform within prescribed limits, that is with specific performance levels under stipulated operating conditions and manufacturing tolerances. We designate one gate as the primary gate and the others, which are notionally connected to it, as secondary. It is in principle possible to have multiple primary gates if additional metrics (and failure mechanisms) can be identified for the system considered. For example, in the hypersonic propulsion system considered in the following, failure can occur because of excessive structural loading (mechanical failure) or thermal choking associated with excessive heat release during the combustion process [8]. In this work only the latter will be examined. A network of metric-associated gates can also be understood as a decomposition of a complex system into units (which may or may not be independent) whose study furnishes information of prime importance in assessing the operation of the full system. This starting point provides an approach for dealing with the curse of dimensionality brought about by the presence of a large number of components and uncertainties in realistic engineering systems by focusing on the prime indicators of performance, at the cost of an incomplete, or inexact, description of the full system. At the same time, as will be seen, it leads to a method for evaluating the conditional probability for passing through a sequence of performance gates, from beginning to end, which enables one to estimate the probability of successful operation at the primary gate, used to characterize the ‘‘full system performance’’. A byproduct of this assessment is an illustration of the behavior of the subcomponents conditioned on the safe operation of the system. A quantitative measure of the desired operability margin (safety margin). This is specified through a non-dimensional confidence ratio CR defined for the primary gate metric as the margin, i.e. the distance between nominal operation and unsafe conditions, over the uncertainty in the metric itself. If
the CR is assumed to be unity and the uncertainties in the gate metric are evaluated, one can quantify the margin. This will be the strategy adopted in the applications presented below. In our previous work [8] we illustrated how different choices of CR 4 1 lead to different safe operating scenarios for the system. We believe that the QMU framework presented here provides a workable approach to assessing or estimating the probability of failure for an important class of complex engineered systems governed by multi-physics processes or consisting of distinct subsystems. This estimate, a confidence estimate, reflects available information about safety margins associated with prime indicators of performance at various times and locations throughout the system. Thus we have more information than a simple ‘‘yes’’ or ‘‘no’’ about the conditions at the primary gate(s). This additional information helps diagnose what has gone wrong when something has gone wrong. The paper is organized as follows. The next section describes the two engineering systems we selected to illustrate the application of the methodology introduced here. The QMU concept of a network of gates is introduced afterward, together with a detailed computational procedure for evaluating the conditional probabilities of passing through an identified sequence of gates.
2. The engineering systems 2.1. Shock-tube wind tunnel Experimental design of a hypersonic vehicle relies heavily on blow-down or impulsive facilities, which translate high-enthalpy stagnating fluid into high kinetic energy flow, reaching conditions relatively close to the actual flight conditions for a very short time window. The accurate determination of the time interval in which the desired conditions are reached is obviously of great importance. We consider a shock tube schematically represented in Fig. 1: the operating scenario is quite simple; two gases are initially
Fig. 1. Schematic of a reflected shock tunnel configuration and corresponding space–time diagram showing the primary flow features and the definition of the test-time (modified from [12]).
G. Iaccarino et al. / Reliability Engineering and System Safety 114 (2013) 99–113
separated by a diaphragm. The pressure difference (and potentially the density difference) between the driving and driven gas leads to the emergence of a strong right-moving shock wave when the diaphragm is ruptured. At the far-end of the shock tube a second diaphragm (nozzle diaphragm) is burst when the shock arrives, leaving a region of almost stationary hot test gas at the nozzle inlet. This highly energized gas begins to flow through the hypersonic nozzle and generates a hypervelocity flow in the test section; this is the start of the test window. Meanwhile a reflected shock moves back into the tube (towards the left) and eventually meets the contact surface separating the driving and driven gas, and a secondary reflected shock (right-moving) occurs—see Fig. 1. When this reflected shock reaches the nozzle entrance, the local gas conditions are no longer uniform and this can be considered the end of the useful test window. Although in theory it is possible to determine accurately the arrival time of the reflected shock, there are many compounding factors that can lead to large uncertainty in the quantification of the amount of driving-gas contamination at the entrance of the nozzle. One important effect is the interaction between the reflected shock and the boundary layer which forms on the walls of the shock tube.
2.2. Hypersonic vehicle combustion chamber The second problem we consider is the study of the operability limits associated with the propulsion system of hypersonic vehicles flying at high Mach numbers. In these circumstances, the engine operates under supersonic conditions—scramjet mode—and a key performance metric is the amount of heat released in the combustion chamber, as this is directly connected to the generation of thrust. Fig. 2 shows a schematic of the cross-section of the combustor chamber of an air-breathing hypersonic vehicle. The injection system is carefully designed to mix the fuel and the incoming supersonic air stream and to produce auto-ignition of the mixture. This process is complicated by the presence of a shock wave system within the chamber (shock train) interacting with the viscous boundary layers formed on the walls. The vehicle thrust is directly related to the amount of heat release occurring in the chamber, but there is an upper limit [13] corresponding to thermal choking: a normal shock and a consequent subsonic flow region is established in the combustor. In addition to a reduction in the thrust, this can lead to increased structural and thermal loads and eventually to failure. The normal shock can also propagate upstream in the combustion chamber creating extensive regions of boundary layer separation; moreover, the shock motion can lead to engine unstart with the entire shock train moving upstream. Under these conditions the vehicle performance is compromised and extreme actions have to be taken to restart the engine [14].
Fig. 2. Schematic representation of the combustion chamber of an air-breathing hypersonic vehicle. The supersonic conditions in the chamber result in the formation of a shock train, which interacts with the viscous boundary layers on the inner and cowl surface. The injection of fuel creates a region within the chamber where fuel ignition takes place leading to heat release. The symbols U, W and C are representative of the upstream, wall, and centerline conditions, respectively.
101
Unstart conditions can also be reached for different reasons, not directly connected to the combustion process. In particular, indications of unstart events connected to perturbations of the flow at the inlet [15] and to thermal deformations of the structure [16] have been observed in ground tests.
3. QMU concepts QMU is intended to provide a measure of confidence in the safe operation of an engineered device. A QMU analysis is summarized in terms of the confidence ratio (CR) defined as the ratio of the margin (M) to the uncertainty (U). The margin is defined as the distance between the nominal design conditions (in a given performance metric) and a safe operating boundary while the uncertainty is related to the variability and limited confidence in the estimation of the operating range. The use of CR as a confidence measure, instead of the explicitly reporting both M and U, provides an intuitive notion of safety when CR 4 1 but obviously hides the quantitative information on both the margins and the uncertainty [7]. In the examples presented later we assign a value to CR and from CR ¼ M=U we evaluate the margin given the uncertainty. From a mathematical perspective, we associate a gate G with a critical measure of performance f and define 1 when fðx,tÞ A ½fa : fb , ð1Þ Gðx,tÞ ¼ 0 otherwise and assume that safe conditions are associated with Gðx,tÞ ¼ G ðfÞ ¼ 1. This is not equivalent to assuming that failure will occur when GðfÞ ¼ 0 but is merely to assert that there is no sufficient evidence to certify the system outside of the range ½fa : fb . If we assume that only the upper bound of the interval, fb , needs to be considered to establish safe operation of the system, we further define M ¼ Jffb J U ¼ Uf CR ¼ M=U
ð2Þ
where f represents the actual performance measure of the system obtained using experimental data or numerical simulations and U f is a measure of the uncertainties in the predictions of f and in the estimate of the location of the safe operating boundary, respectively. The example of the hypersonic propulsion system reported later corresponds to this situation because only the upper bound of the pressure inside the combustion chamber has to be considered in relation to unstart phenomenon. In general however, it is possible that both bounds on the performance metric need to be controlled and in this case the margin would be defined as M ¼ minðJffa J,Jffb JÞ:
ð3Þ
The system performance f is evaluated using a computational model and, therefore, is affected directly by uncertainty; however, the location of the safe boundary, e.g. fa or fb , might also be affected by uncertainty. The bound can be obtained using a combination of theoretical arguments, experimental observations and scientific judgement, and therefore only an estimate is available. In the following examples this additional uncertainty is not considered although in principle it can be included in the definition of U in Eq. (3). In [8] we assumed either that the performance metric is characterized probabilistically or using intervals. In the former case f is the expected value and U is proportional to the standard deviation. Alternatively, and this approach is used in the following, we define f as the midpoint of the interval covering all the
102
G. Iaccarino et al. / Reliability Engineering and System Safety 114 (2013) 99–113
observed (or predicted) performance, and 2 U represents the interval size. In both cases the confidence ratio CR is a deterministic quantity. In practical applications, it is possible to evaluate the margin M and the corresponding uncertainty U thus obtaining CR, or, vice versa, to assign CR and compute the corresponding margin. In our previous work [8], we studied the operability range of a hypersonic air-breathing propulsion system with respect to an engine failure due to thermal choking [8]: the analysis focused on the evaluation of the confidence ratio as a function of the fuel injection rate and the geometrical design of the combustor chamber. From a practical point of view, Monte Carlo sampling was used to propagate uncertainties in the flight conditions and failure was identified with the occurrence of a strong shock within the combustion chamber. Therefore, although the failure mechanism was directly connected to the heat release due to combustion, the gate/metric couple was associated with the location of a shock-wave. This choice was motivated by two factors: (i) the computational analysis of the system was based on 1D thermodynamic analysis and therefore did not allow a precise characterization of the flow within the combustor; moreover (ii) in current engineering practice for such vehicles pressure transducers are used to detect unstart scenarios and, therefore, they directly measure the occurrence of shocks. The apparent disconnect between the gate/metric and the underlying physical process partly motivated the present study. Although the presence of a shock within the chamber is obviously representative of incipient unstart failure, it does not uniquely define the underlying mechanism and does not rule out structural deformations or aerodynamic blocking as two other possible precursors. The present choice of metrics (in Fig. 2) enables us to characterize the flow (and thermodynamic conditions) within the combustor more precisely, thus enabling us to construct a more precise picture of the conditions leading to unstart/failure. 3.1. Primary and secondary gates The primary performance gate is introduced to quantitatively summarize the safe operation of the system. It is a global measure of the device performance and provides no insight into the system’s detailed behavior, e.g. the successful operation of the various subcomponents. In this work, we propose the introduction of additional gates—primary and secondary—and a formal dependency structure among them: a network. The main motivations behind the introduction of a network of gates are:
1. quantitative and finer-grained assessment of the successful operating conditions; 2. reduction and decomposition of the problem to combat the curse of dimensionality; 3. introduction of an approximate procedure to compute conditional probabilities. The primary gate (hereafter indicated by G) is connected to the overall performance of the system—the shock-wave location in the scramjet example reported above [8]. The other gates in the network—the secondary gates—are introduced as a way to decompose the system in order to study the conditions that determine success or failure at the primary gate. It is worth noting that in a general setting, it is possible to define multiple secondary gates, although in general only one primary gate will be present that pertains to a possible failure mode of the system. However, multiple primary gates can be defined with respect to alternative failure modes and a separate QMU analysis can be carried out to assess the safety of the system.
We start from the simplest network consisting of a primary gate (G) and a secondary gate (g); we consider different metrics at the primary and secondary gates: fðx,tÞ and cðx,tÞ, respectively. These metrics are evaluated at distinct space/time points and in general
f ¼ fðvelocity, temperature, geometry, uncertainties, etc:Þ and similarly for c. We define the gates G and g as follows: 1 when fðx1 ,t 1 Þ ¼ f1 A ½fa : fb , Gðx,tÞ ¼ 0 otherwise gðx,tÞ ¼
1
when cðx2 ,t 2 Þ ¼ c2 A ½ca : cb ,
0
otherwise
ð4Þ
in which the intervals represent the variation of the metrics f and c which are believed to be compatible with successful performance of the system. Note that in general it is possible to introduce a secondary gate which characterizes a subcomponent behavior (spatial decomposition) or the evolution of the system until a certain state occurs (temporal decomposition). In both situations, it is desirable that the two gates are not independent so that the study of the system behavior at g can lead to increased understanding of the overall performance at G. In other words it is desirable that G¼ G(g) or more precisely f ¼ fðcÞ. From a statistical point of view, if G and g are strictly correlated the added information might be of limited value but may still lead to potential savings in the analysis depending on the nature of the two gates. If the secondary gate is interpreted as a proxy for the primary gate, it is possible to define a safe range for g and therefore obtain four possible operating scenarios as reported in Table 1. The operating conditions corresponding to G ¼g are the least interesting because they correspond to either safe or unsafe behavior of the system, although the availability of g provides more information on the origin of the unsafe conditions in scenario #3. The failure of the primary gate (G ¼0) with safe operation at the secondary gate is insightful: the unsafe conditions are either determined outside of the domain of the subcomponent identified by g or the secondary gate is not correctly defined (from a safety standpoint). We consider these two conditions as related to an unexpected scenario. Finally the conditions that lead to failure at the subcomponent level but safe operation of the system (G¼ 1) reflect a possible overly conservative design, or the presence of redundancy. The discussion above focuses on the case in which the secondary gate g is defined to support a decomposition of the device into subsystems. Another option is to consider the temporal evolution of the system and to use the secondary gate to assess successful performance up to a certain time. In this case obviously the secondary gate is not a proxy for the primary gate, and failure can clearly occur after g is evaluated. This decomposition can be useful because if failure is detected at g, no further analysis is required thus reducing the overall computational cost. This situation will be illustrated afterward in the shock-tube example. Table 1 Combination of the gate metrics leading to failure or safe operation of the system. In this case the secondary gate is intended as a proxy of the primary gate. Scenario
G1
g2
Failure
Note
1 2 3 4
1 1 0 0
1 0 0 1
No No Yes Yes
Robust performance g2 that maybe too conservative Unsafe performance Unexpected scenario
G. Iaccarino et al. / Reliability Engineering and System Safety 114 (2013) 99–113
3.2. Networks of gates The generalization of the concepts introduced in the previous section leads naturally to a sequence, or, more generally, a complex network of secondary gates. The introduction of a set of metrics and gates used to establish confidence in the safe operation of a device is a very common concept in engineering practice; an example is the checklist used by airplane pilots which includes multiple items (gates) associated with correct operation of the engine, flaps, rudder, avionics, etc. Only safe conditions on all these items leads to a go decision for take-off. The analysis of the shock-tube facility introduced earlier (Section 2.1) follows this structure with two secondary gates attached in sequence to the primary gate (left of Fig. 3). The concept of sequential checklist is not general enough to analyze a complex engineering system. Let us consider the case of the scramjet propulsion system. Even considering only unstart events as limiting the safe operation of the device, there are multiple scenarios that can occur:
The amount of fuel injected can exceed the thermal capacity of
the supersonic flow in the chamber and lead to abrupt deceleration through a normal shock (thermal choking). Autoignition close to the chamber walls might lead to local flow deceleration, and the interaction between the shock train and the boundary layer can result in massive separation and corresponding increased residence time and further heat release. This feedback can lead to a separation-driven choking. Changes in the flight conditions might lead to modifications of the airflow entering the chamber, for example by lowering the Mach number or increasing the levels of turbulence, leading to more sensitivity to the thermal or separation-driven choking.
103
processes, imprecise characterization of materials, variability in the operating scenarios, and so on. A secondary gate might be introduced, for example, to illustrate the effect of a subset of the uncertain factors on the primary gate. For this reason, in the sequel we always define a gate g with its related uncertainties u(g). In some situations the uncertainties associated with a gate are independent of the others and this will be used as an effective method to reduce the complexity of the system. In the scramjet example above in Fig. 3, the uncertainties u(1) associated with the inlet gate (g1) represent the variability in the flight conditions and are independent of the uncertainties introduced in the combustion process, such as those induced by imprecise measurements of the reaction rates. Secondary gates can serve different objectives and thus a classification is proposed. 3.3.1. Separating gates Introduce a spatial/temporal decomposition of the system into two or multiple subsystems enabling uncoupled subsystem analysis. This is the obvious case of the inlet in the combustor. 3.3.2. Reducing gates Introduce a notional separation of effects, thus leading to a simplified description of a physical process (e.g. flamelet modeling separates the effect of chemistry uncertainty from heat release [17]) within the full system coupled analysis. 3.3.3. Redundant gates Introduced for the purpose of verifying the behavior of a gate; typically based on the same metric and measured in a close vicinity, in space or time, as that of the gate whose behavior is being verified. It is assumed that the two gates must behave consistently.
4. A QMU procedure with multiple gates A network of gates captures the possibility of these three scenarios with three secondary gates as depicted in Fig. 3. The multiple paths indicated in the corresponding network are directly associated with the three different failure modes mentioned above and the strong coupling between the various physical processes. In this situation, all three paths must be considered to ensure safe operation of the device. 3.3. Classification of secondary gates The secondary gates portray the subsystem behavior associated with safe operation at the system level; as mentioned earlier, there is another important use of secondary gates which is related to the presence of uncertainties. In a realistic engineering system, uncertainties are a consequence of manufacturing
The introduction of multiple gates in the analysis of a complex system can lead to alternative scenarios that characterize safe or unsafe operation as illustrated in Table 1. From a practical point of view, the determination of such scenarios requires the computation of conditional probabilities of acceptable outputs at the various gates given all the known uncertainties: an exceedingly difficult task. The process illustrated below aims at computing these conditional probabilities more efficiently. We consider a generic system with one primary gate G (with metric M) and K secondary gates gk with k ¼ 1 . . . K associated with metrics mk. Assume there are d uncertain inputs xj , j ¼ 1 . . . d, to the problem under consideration, i.e. the initial conditions, or parameters describing the physical models, and so on. Each secondary ðkÞ gate is associated with one or more uncertainties, namely x‘ ; this is to say that some uncertainties affect one gate but not others. 4.1. Phase 0: system description In this preliminary phase the physical system and all the known uncertainties have to be defined. In addition, a primary performance metric characterizing successful behavior of the system has to be identified, together with a number of secondary metrics/gates. Each secondary gate is associated with one or more uncertainties within a subsystem (separating gate) or through a physical model (reducing gate).
Fig. 3. Network of gates. Left: the shock tube problem: the primary gate G is related to the test time; the secondary gates g1 and g2 are associated with the Mach number before and after reflection at the end wall. Right: the hypersonic combustion chamber: the primary gate G is associated with the maximum pressure in the combustor. The secondary gates are related to the inlet conditions g1, the state of the boundary layer in the combustor g2 and the heat release g3.
4.2. Phase 1: nominal full-system simulation
1. Solve the full-scale problem using the nominal value of all inputs variables x (for example, the mean values E½xj ) and
104
G. Iaccarino et al. / Reliability Engineering and System Safety 114 (2013) 99–113
compute the primary metric MðE½xj Þ ¼ M. Also compute the corresponding metrics at all other gates m k for k ¼ 1 . . . K. Note that M ¼ MðE½xj Þ aE½Mðxj Þ; this choice is made because M is representative of the nominal (desired) operating conditions, rather than a statistical measure of the performance. 2. Assign a safety margin to the primary gate. As mentioned earlier, the margin can be defined based on design constraints, scientific judgement or available data. Alternatively, the margin can be computed using a desired confidence ratio CR. This is the approach followed here. We start by assuming CR ¼1 and compute the margin M; in this case this is defined in terms of a range for the primary gate metric M, for example ½MsM : M þ sM . Other definitions are possible, and nonsymmetric intervals can be considered. In the following we will assume that DM ¼ 2sM is the allowable range, and compute the coefficient of variation at the primary gate C G ¼ DM =M. 3. Assign a margin at each secondary gate by computing Dm1 . . . Dmk . . . DmK using the range at the primary gate, i.e. Dmk ¼ m k C G . This assumes a unit Jacobean, in the dependence of G on mk and might imply that very narrow ranges are required at the secondary gates to provide adequate control at the primary gate. This assumption can be replaced later by an actual computation of the Jacobean for example using an adjoint approach [19]; in the examples reported here this step was not carried out. It is also worth mentioning that this assumption implies that the ratio of intervals Dmk =DM is proportional to the ratio of the means m k =M. At this point we have intervals defined for all the secondary gates. The output of Phase 1 is a nominal computation that (we assume) is associated with successful operation of the system and margins defined over all the gates in the network. 4.3. Phase 2: secondary gate analysis This step is carried out by generating realizations of the input uncertainties and evaluating the system performance at the gates one by one; if the performance metric at gate k is within the tolerable interval Dmk , the input values are recorded and, by ðkÞ repeating the process, eventually collected into a subset Dx‘ . The procedure can be summarized as follows:
For gate k with k ¼ 1 . . . K: 1. Propagate the uncertainties that directly impact gate gk ðkÞ (namely x‘ ) and evaluate mk. This might require full system simulations unless the gate is a separating one, but with only the uncertainties connected to gk—all the others are set to their nominal values. 2. Take all the solutions compatible with the range of gate gk and identify the corresponding input space of uncertainties ðkÞ xðkÞ ‘ which we refer to as Dx‘ ; it is these intervals only, for example in the initial conditions, for which later we will need to perform full system simulation. This step is similar although not identical to what is used in particle filters or sequential Monte Carlo methods [18]. Although MC is typically not the most efficient methodology to construct stochastic responses, it must be emphasized that in the present context it is easy to generate samples and verify whether they meet the conditions at each gate, so as to continue the analysis. The nature of the probability distributions obtained is typically not well behaved with multiple regions (in parameter space) that are compatible with the constraints at the gate. MC methods deal with this
situation naturally without requiring further modifications that might be necessary if other methodologies are considered [19,20]. 3. Additional considerations (such as available data, scientific judgement, etc.) can be introduced at this point to correct the ranges Dmk in Phase 1 if the gate under consideration is not effective or overly conservative. This behavior might be induced by strong non-linearities that were ignored in the definition of the range in Phase 1. Note that a modified Dmk ðkÞ will naturally lead to a new Dx‘ , but these can be evaluated using the results obtained previously without need to perform additional simulations. In other words the process described above can be interpreted as the conðkÞ struction of the map Dx‘ ðDmk Þ.
4.4. Phase 3: network path analysis The result of Phase 2 is the identification of a set of uncertain ðkÞ inputs, e.g. Dx‘ , compatible with the k-th gate. In other words the inputs are constrained by the gate associated with each metric. In order to construct a complete representation of the input subset that yields acceptable conditions at all the gates, we need to construct all possible pathways through the gates. The output is a set of conditional simulations, that are compatible with ALL of the secondary gates and ALL of the possible pathways through the system operation. If the gates are all sequential there is only one pathway connecting all the gates; this is the case for the shock tube where the gates are placed at different times. The only path is fg 1 -g 2 -Gg. In some cases the physics of the problem creates circuits of dependencies: this is the case for the combustor in which unstart might be induced by excessive heat release, or by flow separation induced by heat release (see Fig. 1). The four possible pathways are: fg 1 -g 2 -Gg, fg 3 -Gg, fg 1 -g 2 -g 3 -Gg, and fg 3 -g 2 -Gg. We assume that there are P possible paths and the procedure is summarized as:
For path p with p ¼ 1 . . . P 1. consider the gates associated with path p and their allowðkÞ able uncertain intervals Dx‘ 2. if full system simulations were performed in the previous step, we simply collect all the ones that are compatible with all the secondary gates on the path (this is a solution database lookup). If separating gates were used in Phase 2, then we need to perform full system simulations spanning all the uncertainties associated with the gates active along path p, collecting the ones that successfully pass through all the gates on this path. This set of solutions corresponds to the conditional probability of safe operation corresponding to the path p.
4.5. Phase 4: primary gate analysis The final step is simply postprocessing, which requires the selection of all the cases identified in Phase 3 that are compatible with the primary gate. At this point we also have the full simulations that illustrate safe operation and, as a by-product, the joint probabilities of the secondary gate metrics, conditioned on successful operation at the primary gate. It is important to notice that conditioning selects only those samples that have successfully passed through all the gates. If a sample path does not pass through a particular secondary gate, this does not necessarily imply that downstream behavior represented by the
G. Iaccarino et al. / Reliability Engineering and System Safety 114 (2013) 99–113
primary gate will be compromised, or fail. It means rather there might not be sufficient evidence to certify acceptable performance. In spite of this, the analysis of the conditions that do or do not lead to successful passage through a secondary gate may still be useful in understanding the relationship between the gate metric and the system’s operational characteristics. 4.6. Discussion and computational analysis The analysis illustrated above can be understood as a process of successive filtering of events leading to success. In Fig. 4 this is illustrated using Venn diagrams. Consider a system characterized by two secondary gates and a primary gate. In Phase 1 the nominal conditions are considered, while in Phase 2 (secondary gate analysis) the goal is to build a set of events that correspond to g 1 ¼ 1 and g 2 ¼ 1. This is illustrated in Fig. 4a for the gate g1. The successive analysis of the secondary gate path (Phase 3) allows one to build the subset of events reported in Fig. 4b, and the final step (Phase 4) further reduces the space of events leading to acceptable performance. As mentioned earlier, the introduction of secondary gates serves two purposes: on one hand, we want to construct a fine grain representation of the system to understand the physical mechanisms that could lead to unsafe operation. On the other hand, we seek a simulation approach that reduces the overall computational cost. In general, the determination of the safety margin requires a large number of full system simulations to explore the space of the uncertain inputs. The simplifications and cost-reduction afforded by the secondary gates are due to two factors:
105
common with the earlier theory of fault tree analysis. Fault tree analysis originated at Boeing in the 1970s [21] and achieved prominence through its application to the safety analysis of nuclear power reactors, culminating in the UW Nuclear Regulatory Commission report [22]. In fault tree analysis, the failure of any component is assigned a probability, and its consequence for the next larger subsystem is assessed, leading to failure or not of the subsystem, and eventually of the full system. Low probability for system failures are achieved in part through quality assurance (low failure rates) for components, but more realistically through redundant and back up systems, so that component failure is unlikely to lead to unit failure and unit failure is unlikely to lead to system failure. The essence of fault tree analysis is to integrate probabilities (usually assumed to be independent) across a network to arrive at a full system failure probability. Care must be taken to build into the analysis common mode failures (failure of electric power leading to failure of all pumps) for which probabilities are not independent. Although our analysis shares the network concept and the component level failure probability concept, there are important qualitative differences. We envision a much simpler causal network than is common for fault tree analysis and a deeper analysis of the gates and components based on validated physics models rather than engineered quality assessment. For each component, we are concerned not with a component malfunction related failure but with a system design failure, in the presence of correctly operating components. The probability of failure is a function of the safety margin and the latter is modified (through engineering design or selection of a safe operating region) to achieve the desired small probabilities.
full system simulations are replaced by subsystem simulations because of the presence of separating gates;
6. Evaluation of the test time in a shock-tube facility
the comprehensive exploration of the space spanned by all known uncertain inputs is replaced by full system analysis corresponding to reduced spaces of uncertain inputs imposed by the secondary gates. Depending on the complexity of the system and the nature of the secondary gates, both of the cost-reduction mechanisms identified above could be important. It is also possible to establish secondary gates whose purpose is the reduction of computational cost (for example, the separating gates).
5. The connection to fault tree analysis The concepts developed here for a QMU risk assessment strategy based on a network of multiple gates has features in
6.1. Deterministic analysis The test time in a shock-tube facility as depicted in Fig. 1 is defined as the time interval between the reflection of the primary shock and the arrival of the secondary reflected shock at the nozzle. The determination of the flow properties within the tube is based on the one-dimensional Euler equations for gasdynamics of a gas mixture. In the present investigation we will focus on the flow within the shock-tube and will truncate the domain at the entrance of the nozzle, precisely at the location of the nozzle diaphragm where the first shock reflection occurs. In a previous investigation [8] we developed an accurate solver for the one-dimensional gas-dynamic equations and used it to investigate the flow within simplified supersonic combustion chambers. Here we use the same computational tool to determine
Fig. 4. Venn diagram showing the successive conditioning of the event space induced by primary and secondary gates. (a) Nominal condition and set of scenarios leading to acceptable performance at gate g1; (b) conditioning induced by the combined analysis at secondary gates; (c) reduction of the set of the events related to the primary gates.
106
G. Iaccarino et al. / Reliability Engineering and System Safety 114 (2013) 99–113
the flow and thermal field in a straight tube; the governing equations are discretized in finite volume form, using second order spatial discretization and a third order Runge–Kutta timeintegration [23]. The physical domain is x A ½15,5 with the initial membrane location at x ¼ 0.8. The time evolution of the system is studied in the interval t A ½0,5. Time and spatial coordinates are made non-dimensional with respect to the speed of sound in the driving gas (left of the membrane) and a reference length of Lref ¼1 m. In the following, all quantities are assumed to be non-dimensional unless specifically noted. The operating conditions correspond to a pressure ratio (pL =pR ) of 86 and a density ratio (rL =rR ) of 37.5. A space-time plot of the flow evolution is reported in Fig. 5 in terms of Mach isolines. The key features of the solutions are clearly visible and follow closely the schematic reported in Fig. 1. The grid sensitivity of the solution has been studied by considering three successively refined meshes consisting of 128, 512, 1024 grid cells. The results are reported in Table 2 and demonstrate that the medium grid has sufficient accuracy with respect to the characterization of the state T (reported in Fig. 5), which eventually determines the conditions in the test chamber and the test duration. The latter is clearly defined as T test ¼ t 2 t 1 and according to the results reported in Table 2, it is Ttest ¼1.06. 6.2. Probabilistic analysis The analysis presented in the previous section represents the typical characterization of the test-time Ttest for a shock tube
Fig. 5. Time–space diagram of the solution in a shock tube closed on the right hand side, with initial conditions corresponding to pL =pR ¼ 86 and rL =rR ¼ 37:5. The lines correspond to isolevels of density and illustrate the generation of a shock and a contact discontinuity moving towards the right end side and expansion fan. The reflection at the closed right side is also visible. This figure is an actual computation reproducing the schematic in Fig. 1. The subscript L and R refer to the driving and the driven gas, respectively. The state T corresponds to the state resulting after the reflection of the shock and before the arrival of the contact discontinuity (it is also the state of the gas expanding in the nozzle of Fig. 1 and eventually determines the conditions in the test chamber and the test duration).
Table 2 Grid sensitivity of the shock-tube solution. The definition of the various quantities is reported in Fig. 5. Cells
T T =T R
pT =pR
t1
t2
128 512 1024
9.4 10.8 10.9
44.3 46.2 46.4
2.57 2.58 2.58
3.67 3.64 3.63
facility. In order to evaluate the confidence in the estimate we have to characterize the uncertainties present in the system and determine their impact on the system evolution. One source of uncertainty is the characterization of the initial state of the driving and driven gases (states L and R in Fig. 5). Specifically, three uncertain parameters are introduced, the left pressure and density and the right density to represent limited accuracy in the measurement of the actual gas conditions. A second uncertainty source is associated with the interaction of the shock wave with the boundary layers forming on the tube walls. The precise characterization of this process is complicated and requires detailed simulations. In the present computational model, the viscous effects are completely ignored and therefore this phenomena can only be represented in terms of the resulting uncertainty in the propagation speed of the shocks. We also introduce uncertainty in terms of a variable temperature of the gas (and therefore different speed of sound) in the region towards the right-end of the shock tube. This uncertainty is described in terms of a parametrized temperature gradient, DT R . It is useful to mention that the first group of uncertainties introduced above is aleatoric in nature, being related to observation errors; the second source, e.g. the temperature gradient, is of epistemic nature, e.g. related to limitation of the mathematical model employed. The latter uncertainty can be reduced by introducing a more sophisticated model of the interaction of the fluid flow and the shock-tube wall. In principle it is important to distinguish these two types of uncertainties and carry out the analysis so that the uncertainty in the metric of interest has distinct contributions. A QMU study along these lines is presented in [7]. In the present work the focus is on the introduction and management of a network of gates for the identification of margins to failure and the analysis would become overly complicated if the two uncertainty sources were to be represented separately. Four different scenarios are considered to evaluate the impact of uncertainties on the dynamics of the system; we assume that all the uncertainties can be represented using independent, uniform random variables, as reported in Table 3. The evaluation of the uncertainty in the results is obtained by performing Monte Carlo sampling, thus performing independent simulations corresponding to realization of the quantities in Table 3 and then computing statistical moments of the quantity of interest. The four cases are designed to provide insight into the effect of multiple sources of uncertainty. Cases 1 and 2 correspond to variability in the conditions to the left of the membrane and far towards the right end-wall; we expect these two to have considerably different effects on the evolution of the reflected shock, and therefore on the test time. Case 3 occurs when both uncertainties active and will be used to guide the introduction of a secondary gate in the next section. The last case is characterized by additional uncertainties (in the gas densities to mimic the presence of contamination in the initial mixture) and will serve as a further test of the effectiveness of QMU in characterizing the system performance. In the following we will distinguish between the uncertainties that characterize the initial conditions Table 3 Initial conditions in the shock-tube problem. L and R correspond to the left and right state with respect to the initial membrane location. Quantities are made non-dimensional with respect to the left state. Case
pL
pR
rL
rR
DT R
1 2 3 4
1.298 7 10% 1.298 1.298 7 10% 1.298 7 10%
0.015 0.015 0.015 0.015
1.4 1.4 1.4 1.4 7 10%
0.04 0.04 0.04 0.047 10%
2.0 2.0 720% 2.0 720% 2.0 720%
G. Iaccarino et al. / Reliability Engineering and System Safety 114 (2013) 99–113
of the shock tube, e.g. pL, pR and rR , and the uncertainties that relate to the progression of the shock towards the end tube (DT R ); the first group will be mapped to a secondary gate, while these, together with DT R , will affect the primary gate corresponding to the shock arrival time after the end-wall reflection. The results are reported in Fig. 6 in terms of the variance of the shock trajectory; Monte Carlo sampling based on 1000 realizations is used. The four different cases show similar dynamics both before and after the reflection at the right endwall. The main differences are associated with the presence of the gas heating (see Case 1 vs the others), which leads to a shock acceleration as it approaches the right endwall. The interaction with the wall is also more complex in these cases, as is evident from the jagged contour of the stagnation region behind the reflected shock. The time-evolution of the variance (Fig. 6) demonstrates that the uncertainties in the initial solution are not damped but are instead amplified by the presence of both the shock wave and the contact discontinuity. The variability in the Mach number at t ¼5 is roughly the same in all four cases, thus not additive, although the character of the variance in the Case 3 appears to be a composition of Cases 1 and 2. Case 4 exhibits larger variance.
Carlo samples corresponding to the uncertainties in the four cases illustrated above can be used to construct a probability distribution for the time window (T test ¼ t 2 t 1 ) observed in the simulations. These are reported in Fig. 7 together with a userspecified lower bound for the tolerable time-window, in this case set to T min test ¼ 0:5. The most important observations can be summarized as follows:
The uncertainty in the initial conditions corresponding to
6.3. Safety margin The variance reported above provides an indication of the overall behavior of the system but does not allow a precise quantification of the operating range. A more appropriate measure is the probability that the arriving shock (t1 in Fig. 5) and the reflected shocks (t2 in Fig. 5) are considerably different with respect to the (deterministic) nominal conditions. The Monte
107
Cases 1 and 2 is not sufficient to lead to failure. The probability distribution in Case 2 is highly skewed, whereas in Case 1, it resembles the uniform distribution corresponding to the input uncertainty. Case 3, which nominally is the composition of the input uncertainties in Cases 1 and 2, exhibit a scenario that is different from the previous two cases. Specifically, the variance in time window is considerably larger, and a small, although non-zero probability of failure (T test o 0:5) is observed. Case 4, in which two more sources of uncertainties in the initial conditions are considered, appears to be similar to Case 3 but with a higher probability of failure. The probability distribution (and even more so the probability of failure) cannot be used directly to infer the scenario that leads to a reduction of the operating range. In particular, the effects of the two sources of uncertainties are compounded and no indication on their separate effect is available.
In the following we illustrate how the introduction of secondary gates in this system allows one to determine the operating
Fig. 6. Space–time diagrams illustrating the Mach number variance computed over 1000 Monte Carlo samples. The black regions correspond to high variance and are wellcorrelated to the flow discontinuities. The conditions for cases 1–4 are reported in Table 3.
108
G. Iaccarino et al. / Reliability Engineering and System Safety 114 (2013) 99–113
Case 2
pdf(Ttest)
pdf(Ttest)
Case 1
0
0.5
1
1.5
0
2
0.5
1
1.5
2
Ttest
Ttest
Case 4
pdf(Ttest)
pdf(Ttest)
Case 3
0
0.5
1
1.5
Ttest
2
0
0.5
1
1.5
2
Ttest
Fig. 7. Probability density functions of the operating time window for the shock tube. The dashed line corresponds to the lower bound of the operability range: T min test ¼ 0:5. The conditions for cases 1–4 are reported in Table 3.
range and additional information regarding the evolution of the system. 6.4. QMU analysis We follow the steps indicated in Section 4 to construct and evaluate the secondary gates and the corresponding conditional probabilities of successful operation of the system. 6.4.1. Phase 0: system description The time window Ttest introduced in the previous section is the critical performance metric for the system: the primary gate G. We consider two additional gates to characterize the state of the system. The first (g1) measures the time of arrival of the shock at x1 after the diaphragm breaks (see Fig. 8); the other (g2) signals the arrival time of the reflected shock at x2. These two separating gates serve the purpose of precisely assessing the effect of the uncertainty on the operating range. The relation between primary and secondary gates in this case is simple, because of the implied temporal sequence, and follows the schematic depicted in Fig. 3a. The uncertainties considered in the problem are associated with the variability in the initial conditions (u(1)) and the effect of the viscous layer on the propagation speed of the reflected shock (u(2)), which are naturally connected with the secondary gates g1 and g2, respectively. 6.4.2. Phase 1: nominal full-system simulation In the absence of uncertainties, the metrics at the primary and secondary gates can be evaluated, resulting in T test ¼ M ¼ 1, m1 ¼1.95 and m2 ¼3.12. In this phase, we also assign safety margins to the gates. First we specify the safe operability range for the device, which in this case is simply an interval defined around the nominal time
Fig. 8. Shock tube problem. The initial condition is depicted at the bottom, with the membrane still in place dividing the left and right state. After the rupture a shock moves toward the right endwall and at time t ¼ t 1 is located at x ¼ x1 (secondary gate g1). Finally, after reflection at t ¼ t 2 the shock reaches the location x ¼ x2 (secondary gate g2).
window T test A ½0:75 : 1:25 or equivalently DT test ¼ DM ¼ 0:5, resulting in C G ¼ DM =M ¼ 0:5. Note that only the lower bound is of interest here but we consider both of them for the sake of generality in illustrating the proposed assessment procedure.
G. Iaccarino et al. / Reliability Engineering and System Safety 114 (2013) 99–113
Then, we compute the ranges at the secondary gates m 1 and m 2 as Dm1 ¼ m 1 C G ¼ 0:98 and Dm2 ¼ m 2 C G ¼ 1:56.
6.4.3. Phase 2: secondary gates analysis At this stage, we need to perform simulations corresponding to the uncertainties associated with each gate. The stochastic computations presented in the previous section are used here; specifically Cases 1 and 2 represent probabilistic scenarios corresponding to the uncertainties associated only with gate g1 and g2, respectively. With the ranges Dm1 and Dm2 derived above, we can then compute the allowable ranges of the uncertainties (pL and DT R , respectively) that enable successful operation at the secondary gates.
6.4.4. Phase 3: network path analysis The next step requires the determination of the conditional probabilities related to the possible pathways connecting the secondary gates. In this case only one path is present and therefore the conditional probability Pðg 2 ¼ 19g 1 ¼ 1Þ is easily constructed, and reported in Fig. 9. The correlation between the two metrics, evident in the figure, is a result of the simple, direct relationship between the uncertainties and the shock speed.
6.4.5. Phase 4: primary gate analysis As a final step of the QMU process, we identify the states that lead to successful operation of the device, in this case, to sufficient test window time Ttest. This typically requires performing simulations corresponding to the uncertainty ranges found above, but in this case, the simulations already carried out for Case 3 (uncertainties in both pL and DT R active) can be used. We emphasize that all the states represented in Fig. 9 correspond to acceptable conditions at the primary gate; in mathematical terms P ¼ PðG ¼ 19g 2 ¼ 19g 1 ¼ 1Þ ¼ Pðg 2 ¼ 19g 1 ¼ 1Þ. This is not a general conclusion, and we expect that further data conditioning will typically lead to additional limitations in the allowable uncertainties in the system (as presented in Section 7). 5
4.5
4
t2
3.5
3
2.5
2
1.5 1.4
1.6
1.8
2
2.2
2.4
t1 Fig. 9. Scatter plots of the arrival time of the shock at g1 and g2 conditioned on successful operation at both secondary gates (g 1 ¼ 1 and g 2 ¼ 1). This is effectively a representation of the space of the acceptable operating scenarios for the system (cf. with Fig. 4).
109
6.4.6. Discussion The construction of P, the full conditional PDF of the safe states (system operations compatible with acceptable margin in the primary gate), allows one to study the behavior of the system under different scenarios, for example corresponding to the larger set of uncertainties considered in Case 4. P is only a function of the metrics considered at the secondary gates and not a function of the original uncertainties; therefore we can monitor the state of the system at the gate g1 corresponding to the conditions compatible with the uncertainties in Case 4 and automatically discard the simulations that do not lead to g 1 ¼ 1. The remaining ones will be further conditioned by the second gate by enforcing g 2 ¼ 1, and finally by G¼1. This process corresponds to the determination of a path compatible with the gate network defined for the system and once again determines the safe operating conditions. The critical aspect of the process is the fact that we used one-dimensional uncertainty analysis (Cases 1 and 2) to evaluate P, and then used the support of P to filter the operating scenarios generated under more complex conditions (high-dimensional uncertainties). This natural dimension reduction leads to a considerable saving in terms of computational resources; in this case a large portion of the simulations corresponding to Case 4 would not be continued beyond the first gate (for t 4 t 1 ). It is worth recalling that the analysis above was based on CR ¼1 in the definition of the margin. Similar analysis with increased conservatism could be repeated by setting CR 4 1. In conclusion, the process of successive conditioning introduced by the gates leads to the evaluation of a range on conditions compatible with the safety evaluation at the primary gate. In this particular example, the initial uncertainties in pL and DT R have been reduced from 10% and 20% to 4% and 9%, respectively. 7. Hypersonic vehicle combustion chamber 7.1. Deterministic analysis Simulations of the Hyshot II vehicle have been presented in [19,20,23]. The computations use the same computational code employed for the shock-tube example, but include a twoequation turbulence model and a simple model for representing the heat release due to combustion [8]. Steady state simulations are carried out in 2D using three different grids ranging from 30,000 to 450,000 elements; the mesh inside the combustor includes 128 and 512 cells in the streamwise direction in the coarsest and finest grid, respectively. The computational domain includes the vehicle nose and the nozzle, in addition to the combustor; the height of the combustion chamber h, and the conditions at its entrance (denoted with the subscript inlet) are used to non-dimensionalize all the other variables. The results are reported in Fig. 10 for two different values of the fuel mass flow, or more precisely in non-dimensional terms, for two equivalence ratios f corresponding to nominal and high-thrust operating conditions. The effect of the grid is visible mainly in the smearing of the shock-train. The pressure rise along the chamber, which is the most important metric to evaluate the scramjet performance, is not modified even on the coarsest grid used. For all the simulations reported below the medium grid is used. The numerical predictions are compared to experimental data [24] showing a reasonable accuracy in predicting the pressure rise occurring within the combustor. The parametrization of the heat release is a critical ingredient and it is discussed in our previous work [8] and not reported here for brevity. 7.2. Probabilistic analysis Several uncertainties were considered in previous activities [19]; in the present work, we focus on two sources: the flight
110
G. Iaccarino et al. / Reliability Engineering and System Safety 114 (2013) 99–113
4.5
3.5
4
3
3.5 2.5 3 2
pr
pr
2.5 2
1.5
1.5 1 1
Experiments coarse grid medium grid fine grid
0.5
0.5 0
0 0
5
10
15
20
25
30
35
40
0
5
10
(x - xinlet)/h
15
20
25
30
35
40
(x - xinlet)/h
Fig. 10. Simulations of the Hyshot vehicle. Pressure distribution along the combustor compared to experimental measurements from [24]. (a) Nominal conditions f ¼ 0:3 and (b) High-thrust conditions f ¼ 0:5.
conditions and the parametrization of the heat release model. The vehicle is designed to operate at a given altitude and speed, but variability in the atmospheric conditions and difficulties in controlling precisely the vehicle trajectory lead to uncertainties. We limit our analysis to parametric variability of the altitude, the flight Mach number and the angle of attack and we assume 5% uncertainty with respect to the nominal conditions. Furthermore we describe the variability using independent uniform random variables (as in the previous example). In the following QMU analysis we will introduce a secondary gate, g1—referred to as U in Fig. 2—to capture the variability related to the combustor inflow. The uncertainties introduced by the model of the heat release are more complicated to characterize because they are associated with the limitations associated with the description of the physical process. In our previous work [8] we have defined three parameters representing the location of autoignition (start of the heat release region), the penetration of the burning region within the chamber and the total amount of fuel burnt within the chamber. Once again these parameters are represented using independent uniform random variables. A better characterization of these uncertainties was obtained using an inference process as described in [20], but it is not included here to simplify the description of the QMU process. The uncertainties associated with the heat release might have a complex effect on the overall conditions in the combustor. Specifically heat release in the supersonic stream leads naturally to a deceleration and eventually to unstart. This is the process that was captured in our previous QMU study with the 1D thermodynamic model [8]; we introduce a secondary gate g3 to capture this effect measured at location C in Fig. 2. In addition, in the present model we introduce an additional secondary gate g2 which represents the state of the boundary layer. The heat release and the different inflow scenario can lead to flow separation (at the location W in Fig. 2). Monte Carlo simulations corresponding to the high-thrust fuel flow rate (f ¼ 0:5) were carried out and the results obtained using 1000 realizations are reported in Fig. 11 in terms of histograms of the Mach number at three locations within the combustion chamber, corresponding to the symbols U, W and C reported in Fig. 2. As expected, the distributions in Fig. 11 are strongly dependent on the spatial location. In addition, the variability is much larger close to walls than in the chamber centerline, due to the effect of the heat release uncertainty and the interactions between shocks and boundary layers.
7.3. Safety margin Excessive fuel injection and heat release within the combustion chamber leads to unstart, a dramatic failure of the propulsion system. In order to characterize the safe operation of the combustor we need to define a critical performance parameter (the primary gate) that identifies the state of the scramjet. In this work, as done previously [19], the pressure ratio (pr ¼ p=pinlet ) within the chamber is defined as representative of the overall heat release. Theoretical considerations (one-dimensional thermodynamics [13]) lead to the definition of a limit pmax and r therefore to the computation of an operability margin based on the nominal conditions. This is illustrated in Fig. 12, where the Monte Carlo simulations reported in the previous section are analyzed in terms of the pressure ratio. As reported in Fig. 10 in the high-thrust conditions, the pressure rise reaches a maximum at x=h 20; this roughly corresponds to the appearance of a region of subsonic flow within the combustor. The flow is not yet choked as this low-speed region does not extend to the entire height of the combustion chamber. The presence of uncertainties in the system alters the incoming flow and the effective heat release; a subset of the Monte Carlo realizations is reported in Fig. 12 and shows that the uncertainties lead to considerable variability in the overall pressure rise, with peak values that exceed pmax 4:5 [13]. r The probability distribution of pr reported in Fig. 12 shows a fairly extended tail towards high values of the pressure ratio as already found in previous work [19]; it is worth noting that a conventional study of the probability of failure requires a very accurate estimation of this tail. The present approach, on the other hand, is based on the determination of a safety margin with respect to the nominal conditions. 7.4. QMU analysis Once again, we follow the steps indicated in Section 4 to construct and evaluate the secondary gates and the corresponding conditional probabilities of successful operation of the system. 7.4.1. Phase 0: system description The pressure ratio at x/h¼20 is considered as the critical performance metric for determining the operability limit of the scramjet: the primary gate. As done for the shock-tube example, we
111
pdf(Mach)
pdf(Mach)
G. Iaccarino et al. / Reliability Engineering and System Safety 114 (2013) 99–113
2.4
2.6
2.8
3
3.2
0.2
0.4
0.6
0.8
Mach
1
1.2
1.4
1.6
pdf(Mach)
Mach
2
2.1
2.2
2.3
2.4
Mach Fig. 11. Probability distributions of the Mach number at various location within the Hyshot combustion chamber (see Fig. 2), corresponding roughly to the inlet and the boundary layer and the centerline after the fuel injection position. Position U. Position W. Position C.
define additional secondary gates to determine the state of the system corresponding to safe operation. We consider three secondary gates: g1 corresponds to the conditions at the combustor inflow, g2 to the boundary layer state inside the chamber after the injection and g3 to the centerline flow at x/h¼20. The metric for all the secondary gates is the Mach number. The uncertainties associated with the flight conditions are clearly associated with the gate g1; on the other hand, the calibration parameters used in the heat release model and their uncertainties affect both gates g2 and g3.
7.4.2. Phase 1: nominal full-system simulation The nominal operating scenario is evaluated first and leads to the determination of the metrics for the primary and secondary gates: M¼3.51, m1 ¼2.75, m2 ¼0.83 and m3 ¼2.23. The safe region is specified here in terms of a range at the primary gate as pr A ½2:9 : 4:1 which corresponds to C G ¼ DM=M ¼ 0:34. The resulting ranges at the secondary gates are defined as Dmk ¼ m k C G and therefore, Dm1 ¼ 0:93, Dm2 ¼ 0:28, Dm3 ¼ 0:76.
7.4.3. Phase 2: secondary gates analysis Gate g1 is a separating gate because it captures all the uncertainties related to the flight scenario and it is not affected by the uncertainty in the heat release model; gates g2 and g3 on the other hand directly respond to the latter uncertainties.
Following the same procedure used for the shock-tube example, we carry out Monte Carlo simulations by activating the uncertainties associated with gate g1 and with g2 and g3; these realizations together with the ranges Dmk defined above allows one to determine the allowable ranges of uncertainties that guarantee successful operation at the secondary gates.
7.4.4. Phase 3: network path analysis The final stage is the construction of the conditional probabilities using the gate network associated with the secondary gates, Fig. 3, resulting in Pðg 3 ¼ 19g 2 ¼ 19g 1 ¼ 1Þ. Two marginals of this three-dimensional Pðm1 ,m2 ,m3 Þ distribution are reported in Fig. 13. The correlation between the gates g2 and g3 is due to the strong coupling between the viscous regions and the overall flow in the chamber. On the other hand, the relation between the metrics g1 and g2 is less obvious and provides insightful information. A comparison between the support of the distributions presented in Figs. 11 and 13 shows that the successful operation at the centerline gate g3 is almost always achieved, whereas the boundary layer gate g2 strongly conditions the flow. Specifically, the range of Mach numbers at g2 computed in the previous section (Fig. 11 middle) includes low speed conditions with Mach numbers as low as 0.4, well below the threshold obtained here (M 40:7).
112
G. Iaccarino et al. / Reliability Engineering and System Safety 114 (2013) 99–113
5
1.2
4.5 4
1
3.5
0.8
m2
pr
3 2.5 2
0.6 1.5 1
0.4 0.5
1
0 0
5
10
15
20
25
30
35
1.5
2
2.5
40
3
3.5
4
4.5
m1
(x - xinlet)/h
1.2
m2
pdf(pr)
1
0.8
0.6
0.4 2.5
3
3.5
4
4.5
5
pr
2
2.2
2.4
2.6
m3
Fig. 12. Simulations of the Hyshot vehicle. (a) Pressure distributions (for selected input values) along the combustor and (b) probability distribution of the pressure ratio at x/h ¼ 20. The dashed line corresponds to the thermal choking limit [13].
Fig. 13. Hyshot simulations: scatter plots of the metrics at the secondary gates g1 (left) and g3 (right) conditioned on the success at gate g2 (cf. with Fig. 4). (a) Sðg 1 ¼ 19g 2 ¼ 1Þ and (b) Sðg 3 ¼ 19g 2 ¼ 1Þ.
7.4.5. Phase 4: primary gate analysis As a final step we can evaluate the joint PDFs considering the three paths defined in Fig. 3, further conditioned on successful operation at the primary gate. The intersection region of these three joint PDFs represents the conditions under which the system operates safely under all the possible scenarios identified. The marginal distribution of the conditional PDF P is reported in Fig. 14. Inspection of the support of these distributions illustrates the conditioning process induced by the gates: the ranges of the metrics m1 and m2 compatible with successful performance are smaller than those reported in Fig. 13.
conditions of the system. This process, which we refer to as conditioning sheds light on the connection of the various physical subsystems and, more importantly, on the impact of the various uncertainties. In this example, the computational savings are afforded mainly by the fact that the secondary gate at the inlet provides a natural separation between two sets of uncertainties (inflow and heat release model). On the other hand, the strongly coupled nature of the reactive flow in the engine required full system simulations which involve both secondary gates g2 and g3, not allowing any further savings. It is important to emphasize that the conditional PDF P is now expressed in terms of the metrics at the secondary gates mk rather than the uncertain inputs considered. This implies that under the same nominal conditions, if different uncertainties are considered, only a subset of the analysis (specifically Phases 2–4) needs to be repeated for the full assessment of the safety of the system.
7.4.6. Discussion The procedure illustrated above allows one to determine the successive impact of the gates on the range of safe operating
G. Iaccarino et al. / Reliability Engineering and System Safety 114 (2013) 99–113
113
parameter space to ensure that the design specifications are met. This is indeed an interesting though challenging extension of the present work and will be explored in future work.
1.2
1
Acknowledgments The authors wish to thank Dr. Johan Larsson for useful suggestions and comments on the draft of this manuscript. This material is based upon work supported by the Department of Energy [National Nuclear Security Administration] under Award Number NA28614. The present manuscript also appears as a Los Alamos preprint LA-UR 12-01526.
m2
0.8
0.6
References 0.4
1
1.5
2
2.5
3
3.5
4
4.5
5
m1 Fig. 14. Hyshot simulations: scatter plot of the success metric of the primary gate conditioned on successful operation at all the secondary gates. This is effectively a representation of the space of the acceptable operating scenarios for the system (cf. with Fig. 4).
8. Conclusions A computational procedure to quantify margins and uncertainties of engineered systems is presented in this work. We introduce primary and secondary gates and a network connecting them as an effective tool to evaluate conditional probabilities of safe operation and to provide a fine grain description of the behavior of the device and its subsystems. The methodology is illustrated using two examples related to the evaluation of the test time in a shock-tube facility and the assessment of the thermal choking limit in a supersonic combustion chamber. In the first example the gates are organized in a temporal sequence and the results show a correlation between them. In the supersonic combustion chamber problem, the complexity of the physical processes and their strong coupling leads to multiple pathways to failure captured by three secondary gates representing subcomponents of the system. The resulting dependency between safe operation at the secondary and primary gates is quite complicated. The present approach results in computational savings if the effect of multiple uncertainties can be clearly associated with a unique (separating) gate; in this case, once the analysis is performed, the gate metric can be described without resorting to the original set of uncertain variables, thus leading to a potential dimensional reduction. Moreover, once the overall range of safe conditions is described in terms of the conditional PDF of the primary performance metric, the effect of additional uncertain sources can be studied simply by evaluating the secondary gate metrics. One final comment concerns the possibility of formulating the present framework in reverse mode: in this case given a performance goal and tolerable intervals around it, the present procedure could be used to establish compatible regions in the input parameter space that would successfully reach the targets. Two difficulties arise in this case: (i) the possibility that the inverse problem is ill-conditioned and no compatible input ranges exist and (ii) problems connected to enforcing constraints on the input
[1] Ratti F, Gavira J, Passarelli G, Massobrio F. EXPERT—The ESA experimental reentry test-bed: system overview. In: 16th AIAA/DLR/DGLR international space planes and hypersonic systems and technologies conference, AIAA2009-7224. [2] Dolvin D. Hypersonic international flight research and experimentation technology development and flight certification strategy. In: 16th AIAA/ DLR/DGLR international space planes and hypersonic systems and technologies conference, AIAA-2009-7228. [3] Sharp DH, Wood-Schultz MM. QMU and nuclear weapons certification. What’s under the hood? Los Alamos Science 2003(28):47–53. [4] Sharp DH, Wallstrom TC, Wood-Schultz MM. Physics package confidence: ‘ONE’ vs. ‘1.0’ (U). In: Proceedings of the NEDPC 2003, LAUR-04-0496. [5] Wallstrom TC. Quantification of margins and uncertainties: a probabilistic framework. Reliability Engineering and System Safety 2010;96(9):1053–62. [6] Helton JC. Quantification of margins and uncertainties: conceptual and computational basis. Reliability Engineering and System Safety 2010;96(9) 976–1013. [7] Helton JC, Johnson JC, Sallaberry CJ. Quantification of margins and uncertainties: example analyses from reactor safety and radioactive waste disposal involving the separation of aleatory and epistemic uncertainty. Reliability Engineering and System Safety 2011;96(9):1014–33. [8] Iaccarino G, Pecnik R, Glimm J, Sharp D. A QMU approach for characterizing the operability limits of air-breathing hypersonic vehicles. Reliability Engineering and System Safety 2010;96(9):1150–60. [9] Fromm Jochen, The emergence of complexity. Kassel University Press; 2004. ISBN 3-89958-069-9. [10] Ericson CA. Fault tree analysis: a history. In: 17th international system safety conference, 1999. [11] Fenton N, Neil ME. Managing risk in the modern world: applications of Bayesian networks. London Mathematical Society; 2007. [12] Gai SL. Free piston shock tunnels: developments and capabilities. Progress in Aerospace Sciences 1992;29:1–41. [13] Riggins D, Tackett R, Taylor T, Auslender A. Thermodynamic analysis of dualmode scramjet engine operation and performance. AIAA Paper 2006-8059, 2006. [14] Huebner LD, Rock KE, Ruf EG, Witte DW, Andrews EH. Hyper-X flight engine ground testing for X-43 flight risk reduction. AIAA Paper 2001-1809, 2001. [15] Sato T, Kaji S. Study of steady and unsteady unstart phenomena due to compound choking and/or fluctuations in combustor scramjet engines. AIAA Paper 1992-5102, 1992. [16] Buchmann OA. Thermal-structural design study of an airframe-integrated scramjet NASA CR 3141. in a model scramjet engine. Journal of Propulsion and Power 1979;16(5):808–14 (96(9):1014–1033). [17] Mueller M, Iaccarino G, Pitsch H. Chemical kinetic uncertainty quantification for large Eddy simulation of turbulent nonpremixed combustion. In: Proceedings of the Combustion Institute 2013;34:1299–306. [18] Doucet A, Godsill SJ, Andrieu C. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing 2000;10:197–208. [19] Wang Q, Duraisamy K, Alonso JJ, Iaccarino G. Risk assessment of hypersonic flow simulations using adjoint-based sampling methods. AIAA Journal 2012;50(3):581–93. [20] Constantine P, Doostan A, Wang Q, Iaccarino G. A surrogate accelerated Bayesian inverse analysis of the HyShot II flight data. AIAA-2011-2037, 2011. [21] Eckberg CR. Fault tree analysis program plan. Seattle, WA: The Boeing Company; 1964, D2-30207-1. [22] US Nuclear Regulatory Commission. Fault tree handbook (NUREG-0492). 1981. [23] Pecnik R, Terrapon V, Iaccarino G. Reynolds-averaged Navier–Stokes simulations of the Hyshot II scramjet. AIAA Journal 2012;50(8):1717–32. [24] Karl S, Hanneman K, Mack A, Steelant J. CFD analysis of the HyShot II scramjet experiments in the HEG shock tunnel. AIAA 2008-2548, 2008.