Reliability Engineering and System Safety 80 (2003) 217–232 www.elsevier.com/locate/ress
Statistical aspects of best estimate method—I

Attila Guba, Mihály Makai*, Lénárd Pál

KFKI Atomic Energy Research Institute, P.O. Box 49, H-1525 Budapest, Hungary

Received 6 December 2002; accepted 3 January 2003
Abstract

In former years, thermal hydraulics phenomena were analyzed mostly by means of conservative models. Recently there has been a tendency to move to best estimate safety analysis of thermal hydraulics phenomena in a nuclear power plant. In a best estimate analysis we have to know the uncertainty of the estimated values. We investigate two aspects of the error analysis. Firstly, we define a black box model of the thermal hydraulic calculation and derive the precise number of sample calculations needed for a given tolerance level, even for several statistically dependent output variables. Then we point out possible chaotic behavior of the calculation and suggest methods for recognizing the appearance of chaos. © 2003 Elsevier Science Ltd. All rights reserved.

Keywords: Safety analysis; Black box model; Best estimate; Statistics; Chaos
1. Introduction

The safety analysis [1,2] of a nuclear power plant has two major components: neutronics and thermal hydraulics models. The first estimates the power distribution in the fuel; the second estimates the temperature, void content and other features of the coolant and of the structural materials. Neutronics usually performs best estimate calculations, which means that it models the real situation and attempts to calculate the real power in that situation. A model inevitably involves approximations; hence the calculated values are not free from errors. There are methods (e.g. sensitivity analysis [3]) to estimate the uncertainty caused by model errors or by the uncertainty of input data. Traditionally, thermal hydraulics has used another approach: a situation is analyzed in which safety is more endangered than in any real situation. This approach is called conservative. If the conservative approach predicts that a state is safe, we may anticipate any real situation also to be safe. The best estimate approach turned up only recently in thermal hydraulics. Because of the difference in the structure of the equations and the solution methods, the error estimation methods of neutronics are not applicable

* Corresponding author. Tel.: +36-1-395-9293; fax: +36-1-395-9220. E-mail addresses: [email protected] (M. Makai), [email protected] (A. Guba), [email protected] (L. Pál).
directly in thermal hydraulics. Our work outlines a statistical estimation of the errors associated with the output variables of thermal hydraulics calculations, points out the existence of chaotic behavior and determines its characteristic features, but does not deal with its statistical evaluation.

In the sequel we quote certain calculated results. All the calculations refer to the experiment PH-SLB, carried out at the KFKI Atomic Energy Research Institute at the facility PMK-2, see Ref. [4]. We do not give details of the experiment, as it will be the topic of a subsequent paper devoted to applications of our considerations. The quoted calculations are based on the ATHLET mod 1.2 cycle A code version [5], which is being used at a number of places including GRS (Germany), where the code has been elaborated. We do not describe ATHLET, but briefly list its major features:

• The basic equations express mass, energy, and momentum conservation.
• The investigated facility is divided into parts called nodes, and the connections between nodes must be clearly described.
• Material properties and correlation functions have been elaborated in such a manner that the model is adequate for normal and transient operation of a nuclear power plant.
• The equations are capable of taking into account the scheme of the actual power plant. For example, it should
0951-8320/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved. doi:10.1016/S0951-8320(03)00022-X
be specified in the model when the emergency cooling system switches on and under what conditions it stops.
• The model is inherently non-linear. Not only may the balance equations involve non-linear terms, but the operation of certain devices also depends on the solution.

Our goal is to characterize the error of the output quantities in order to get a better estimate of the accuracy of ATHLET-type calculations in a safety analysis. We do not intend to address details of the input–output relationship, except to point out that chaos may appear. Unfortunately, the notation accepted in safety reports concerning interval estimates differs from the notation accepted in statistics. In the sequel, we follow the notation used in the literature of statistics [6] to make it easy for the reader to look up details.
2. Approach to safety analysis

Design, licensing, and operation of nuclear power plants are all subject to strict regulation [1]. In the US, the Code of Federal Regulations (CFR) orders the licensing of nuclear facilities under Title 10, Part 50; Section 50.34 specifies the required technical information, and the Nuclear Regulatory Commission issues guidelines. Limiting conditions for operation are defined in 10CFR50.36 as "…the lowest functional capability or performance level of equipment required for safe operation of the facility. When a limiting condition for operation of a nuclear reactor is not met, the licensee shall shut down the reactor or follow any remedial action permitted by the technical specification until the condition can be met."

There are two approaches to safety analysis. Since the analysis has to demonstrate safety of the operation under the investigated circumstances, we may scrutinize a not too realistic but rather unfavorable situation, arguing that if that situation is safe then any real situation must be on the safe side. This approach we call conservative. An alternative approach attempts to investigate the real situation and show that no limit violation can occur; in this case the calculated values should be increased by the possible error when compared with the safety limit [3]. That approach is called best estimate.

2.1. Conservative approach

In conservative analysis, the first problem is the selection of the case to be studied. The analysis is an overt attempt to bound the actual expected state, hence it should also cover the consequences of model uncertainties. How do we know whether a given situation is more conservative than another? It is often impossible to foresee the outcome of a non-linear process. Another problem may be the interplay between approximations: it may happen that either of two approximations leads to conservatism but their simultaneous presence does not. The conservative approach has been prevalent for a long time, although today the best estimate methods are in the focus.

2.2. Best estimate

The main difficulty with a best estimate calculation is the complexity of the phenomena involved. A new material phase (steam or molten clad) may appear; at a given temperature chemical reactions may take place, producing new material properties and producing or removing heat. In spite of the problem's complexity, a best estimate method attempts to solve the equations describing the involved physical processes as accurately as our knowledge permits. From the licensing viewpoint, the key parameter X_BE to be compared to the acceptance criteria can be formulated [7] as follows:

X_BE(licensing) = X_base case + plant uncertainty + code uncertainty.

The purpose of the uncertainty evaluation is to provide assurance that X_BE, with a probability of 95% or more, will not exceed the acceptance criteria for a given accident. The present work does not investigate the first two terms. Best estimate methods are accompanied by an uncertainty analysis to learn the uncertainty band of the response. The NRC proposes [8] the code scaling, applicability and uncertainty methodology, where the different steps of uncertainty analysis are clearly formulated. This approach endeavors to estimate the uncertainty of the model (or code) and of the numerical methods and data. Several methods [9–11] are utilized to account for error propagation and combination. We assume the modeled procedure to start from a known initial state. All the physical quantities in the model are sorted as input, output, and latent data.
By definition, a datum is input if its domain is known along with a distribution function associating a probability with any admitted value. In a model there are several constants, which are considered either input or latent data:

• Input, when the given constant is looked upon as a variable in a given range and a probability is allotted to every possible value. The nature of the distribution may depend on the way the constant was determined.
• Latent, when we refrain from analyzing the uncertainties of the constant and temporarily take it as a fixed number.

The distinction between input and latent data is a matter of engineering judgment. A datum not falling into the input or latent category is called output. This classification is subjective; therefore, in a given analysis we list the input and output data explicitly.
3. Black box model

Let us consider a system as complex as a nuclear power plant. Such a system is described by geometry, material properties and so forth. Assume we have a model describing that system, and that the model enables us to calculate physical parameters (e.g. clad temperature, void content) characterizing the system at an arbitrary instant t. Let p be the number of technologically important variables. In the frame of the model, the operation of the system is considered safe if all calculated variables belong to a given set of intervals

V_T = {[L_T^(j), U_T^(j)], j = 1, …, p}
determined by the technology.

In order not to be set back by the complexity of the problem, we suggest a simple black box model, in which output variables are linked to input variables. That link can be a computer code that transforms the vector x̃ of input variables into a vector ỹ(t) of output variables. In general, the dimension of x̃, i.e. the number of input variables, is not the same as the dimension of ỹ(t), i.e. the number of output variables. Every datum that enters the model is treated as an input variable; hence we do not distinguish parameters. The model is an explicit relationship between input x̃ and output ỹ:

ỹ(t) = Ĉ(t)x̃,  (1)

where Ĉ(t) is a non-linear operator that maps x̃ = (x_1, x_2, …, x_h)^T into ỹ(t) = (y_1(t), y_2(t), …, y_p(t))^T. In practical cases the link between input and output is very complex, hence there is no reason to anticipate an analytical relationship like ỹ(t) = f̃(x̃, t). In the sequel Ĉ(t) is assumed to be deterministic; in other words, once the input has been fixed, we obtain the same output within the computation accuracy for each run. At the same time, the input vector fluctuates according to distribution laws simulating possible variations of the technology, or reflecting the uncertainty of some parameters of the model. Consequently, the output parameters also fluctuate in repeated runs. We present an illustration of how random input may influence an output variable, see Fig. 1. We used ATHLET to generate several output variables for a simple experimental setup, but in Fig. 1 we present only one output variable as a function of time for three independent runs. It is obvious that in this case the above criterion for safe operation of the system needs to be changed, because there is no guarantee that a new run after a successful run will also be successful. We call a state x̃_0 nominal if all the input parameters take their respective expectation values, i.e. x̃_0 = E{x̃}. We can
Fig. 1. Influence of the random input on the time dependence of one output variable in three independent runs.
perform a calculation in the nominal state to get the corresponding output ỹ_0 = Ĉ(t)x̃_0. Usually the state x̃_0 is called safe if ỹ_0 is in the safety envelope V_T. However, we need a more stringent definition: state x̃ is called safe if ỹ is in the safety envelope for every x ∈ X, where X is the set of all possible values of x̃. Here we should make three remarks. (i) X may be an infinite interval when one of the input variables is of normal distribution; in practical calculations such variables are confined to a finite interval by engineering judgement. (ii) We check that statement by a given, finite number of calculations [12] with input from X. If there is a value outside the safety envelope V_T, the state x̃ is unsafe, independently of the fact that the nominal state x̃_0 may be safe. (iii) Even if every calculated output is safe, there is a probability that the state is actually unsafe. Fixing time t, after N runs we obtain N randomly varying output vectors {ỹ^(1)(t), ỹ^(2)(t), …, ỹ^(N)(t)}, which carry information on the fluctuating input and the code properties. The present work focuses on deriving criteria for safe operation when the output variables fluctuate as a result of the randomness of the input variables. In order to find these criteria, we introduce a slightly new variant of the well-known tolerance interval method in Section 5. Before doing so, we deal briefly with the possible chaotic behavior of output variables.
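The black box sampling scheme described above can be sketched in a few lines of code. Everything below — the toy model standing in for Ĉ(t), the input distributions, and the safety envelope — is an invented placeholder for illustration, not part of ATHLET or the PMK-2 model:

```python
import random

def run_code(x):
    """Deterministic black-box stand-in for y = C(t)x (toy placeholder)."""
    return [x[0] + 0.5 * x[1], x[0] * x[1]]

def sample_input():
    """Fluctuating input: each component drawn from its own distribution."""
    return [random.gauss(1.0, 0.05), random.uniform(0.9, 1.1)]

def is_safe(y, envelope):
    """True if every output component lies inside its interval [L^(j), U^(j)]."""
    return all(lo <= yj <= hi for yj, (lo, hi) in zip(y, envelope))

random.seed(0)
envelope = [(1.0, 2.0), (0.5, 1.5)]                        # plays the role of V_T
outputs = [run_code(sample_input()) for _ in range(100)]   # N = 100 independent runs
print(sum(is_safe(y, envelope) for y in outputs), "safe runs out of", len(outputs))
```

Because the code is deterministic, all randomness in the outputs originates in the sampled inputs, exactly as assumed in the text.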
4. Chaotic behavior

4.1. General remarks

The purpose of performing a safety analysis is to assure that the designed equipment can be operated safely. In Section 5 we give criteria, based on mathematical statistics, for safe operation; however, in view of the possible chaotic behavior of non-linear systems, this problem cannot be disregarded when analyzing the behavior of output variables. Chaotic behavior [13,14] is usually defined in connection with evolution equations, see Sattinger's work. There
the state of a physical system is a point in a phase space, and the evolution equation maps the present status of the physical system into its status in the next moment. Now we face a different situation: we have a mapping of a set of input variables into a set of output variables, see Eq. (1). Some of the definitions of the usual system are easily transferable to our case. Chaos is attached to the shape of the evolution curve in the phase space. Bifurcation occurs when the evolution curve forks, and we speak of chaotic behavior when the evolution curve is a capricious line which remains in a closed set. Exploiting that analogy, we investigate a smooth curve C_in in the input space and consider its image C_out in the output space. We may speak of bifurcation at point x̃_bi when C_out forks at x̃_bi. We generalize chaos in the following fashion: we speak of chaos when C_out is the union of disjoint sets. In this case there are points on an arbitrary C_in curve that are mapped into a smooth curve, while other points of C_in are mapped into a disjoint curve. In what follows, for the sake of simplicity, the notation y(x̃) = Ĉ(t)x̃ will be used. The main properties of the input–output mapping are summarized in Table 1.

When an arbitrary smooth curve is mapped into an arbitrary smooth curve, the gradient is a continuous function. Let x̃_bi be a bifurcation point. Then Ĉ(t)(x̃_bi + δx̃_1) and Ĉ(t)(x̃_bi + δx̃_2) are two curves branching at x̃_bi. Here δx̃_1 and δx̃_2 are not parallel and arbitrarily small. Then the limits

lim_{δx̃_1→0} [y(x̃_bi + δx̃_1) − y(x̃_bi)]/|δx̃_1|  ≠  lim_{δx̃_2→0} [y(x̃_bi + δx̃_2) − y(x̃_bi)]/|δx̃_2|  (2)

exist and are different, indicating that the gradient at x̃_bi has at least two different values depending on the direction from which we approach x̃_bi. This shows the gradient to be direction dependent at bifurcation points. We have to show that chaos entails a discontinuous mapping function. To this end, let us select two points x̃_1 and x̃_2 so that Ĉ(t)x̃_1 and Ĉ(t)x̃_2 lie on disjoint curves in the output set. Let us draw a smooth curve between x̃_1 and x̃_2. Along this curve Ĉ(t)x̃ is discontinuous, indicating that the derivative must be discontinuous for some x̃_1 ≤ x̃ ≤ x̃_2.

At first glance it may seem strange to speak of chaos in connection with mapping one space into another. Actually, there are subsets of the variables such that the algorithm maps each one into itself, and the output is the result of a sequence of iterations on diverse subsets. It can be shown that when chaos is present, the input set X can be divided into disjoint subsets:

X = ∪_{j=1}^{n} X_j.

It is important to note that the operator Ĉ is the same inside a given subset but varies when passing from X_i to X_j, i ≠ j. We show that when chaos is present, the expectation values of the output variables are linear combinations of the expectation values over the disjoint subsets X_j, j = 1, …, n. In order to prove this statement, let us replace the input and output random variables x and y by ξ and η, respectively, and denote by

P{ξ ≤ x} = ∫_{−∞}^{x} p(u) du

the probability distribution function of ξ, where p(u) is the density function. Let us introduce the indicator function

I_{X_j}(ξ) = 1 if ξ ∈ X_j, and 0 if ξ ∉ X_j,

and write the output variable η in the form

η = Σ_{j=1}^{n} Ĉ_j ξ I_{X_j}(ξ).

We see immediately that

E{η} = Σ_{j=1}^{n} E{Ĉ_j ξ I_{X_j}(ξ)},  (3)

where

E{Ĉ_j ξ I_{X_j}(ξ)} = ∫_X Ĉ_j x I_{X_j}(x) p(x) dx = ∫_{X_j} Ĉ_j x p(x) dx.

By using the notations

w_j = ∫_{X_j} p(x) dx  and  p′(x) = p(x | ξ ∈ X_j) = p(x)/w_j,

we obtain

E{η} = Σ_{j=1}^{n} w_j ∫_{X_j} Ĉ_j x p′(x) dx = Σ_{j=1}^{n} w_j E{η_j}.  (4)

Here we used the notation E{η_j} = E{η | ξ ∈ X_j}.
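The mixture identity (4) can be checked numerically with a toy piecewise-deterministic map; the subsets and the map below are invented for illustration and only mimic the role of the operators Ĉ_j acting on the X_j:

```python
import random

random.seed(1)

# Toy piecewise map: the operator differs on two disjoint input subsets,
# X1 = [0, 0.5) and X2 = [0.5, 1] (invented subsets for illustration).
def C(x):
    return 2.0 * x if x < 0.5 else 3.0 - x

xs = [random.random() for _ in range(200_000)]   # xi uniform on [0, 1]
ys = [C(x) for x in xs]

# Direct estimate of E{eta}
e_direct = sum(ys) / len(ys)

# Mixture estimate: E{eta} = w1*E{eta | X1} + w2*E{eta | X2}
g1 = [y for x, y in zip(xs, ys) if x < 0.5]
g2 = [y for x, y in zip(xs, ys) if x >= 0.5]
w1, w2 = len(g1) / len(xs), len(g2) / len(xs)
e_mix = w1 * sum(g1) / len(g1) + w2 * sum(g2) / len(g2)

print(abs(e_direct - e_mix))   # identical up to floating-point rounding
```

For this map the exact expectation is ∫_0^0.5 2x dx + ∫_0.5^1 (3 − x) dx = 0.25 + 1.125 = 1.375, which both estimates approach as the sample grows.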
Table 1
Chaos in the input–output mapping (∇y: gradient of y)

C_in           C_out            y                        Phenomenon
Smooth curve   Smooth curve     ∇y continuous            None
Smooth curve   Branching curve  ∇y direction dependent   Bifurcation
Smooth curve   Disjoint curves  y discontinuous          Chaos
This gives us hints on detecting chaos in the framework of the black box model. Chaos appears when the following three conditions are simultaneously satisfied:

• The average value changes considerably.
• The variance suddenly increases considerably.
• The output variables tend to separate into distinct groups.
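The third symptom, separation into distinct groups, can be screened for automatically. A minimal sketch follows; the gap-based grouping rule and its threshold are ad hoc choices of ours, not taken from the text:

```python
def detect_groups(values, gap_factor=5.0):
    """Split the sorted ensemble values into bunches wherever a gap is much
    larger than the median gap; several bunches hint at chaotic separation."""
    v = sorted(values)
    gaps = [b - a for a, b in zip(v, v[1:])]
    med = sorted(gaps)[len(gaps) // 2] or 1e-12   # guard against zero median
    groups, current = [], [v[0]]
    for a, b, g in zip(v, v[1:], gaps):
        if g > gap_factor * med:                  # unusually large gap: new bunch
            groups.append(current)
            current = []
        current.append(b)
    groups.append(current)
    return groups

# An ensemble separated into three bunches, as le61 does around t = 200 s
ensemble = [0.2, 0.3, 0.4, 1.5, 1.6, 1.7, 4.2, 4.3, 4.4]
print(len(detect_groups(ensemble)))   # -> 3
```

In practice one would apply such a screen at each time point of every output variable, together with running estimates of the mean and variance, to flag the onset of the three symptoms above.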
4.2. Non-linearity

A physical problem is called well posed [15] if the solution is a smooth function of the parameters and boundary conditions. Usually we have that kind of problem in mind and exploit the smoothness of the solution. For instance, in a validation and verification (V&V) process we are usually unable to prove that the solution is sufficiently accurate everywhere; instead, we perform the V&V process at a few points of the range and interpolate to get the error at other points. There is a class of problems to which these considerations do not apply. It has been observed that certain non-linear problems may behave unusually; for example, a small variation in the input data may entail a large variation of the output data [13,14]. The impact of non-linearity (bifurcation, chaos) was investigated in Section 4.1. Here we pay attention to a specific source of non-linearity. The mass, energy and momentum conservation equations are non-linear in themselves, but there is another source of non-linearity as well: the input–output mapping models a technological process, often one specific process from among several available processes. The selection between the available processes is based on conditions imposed on input or output variables; just think of a sprinkler system, which starts to work when the temperature exceeds a threshold value. Although we cannot follow the steps inside the model, it should be emphasized that the mentioned branching inside the model may be very abrupt and non-differentiable and may result in chaotic output [16].

4.3. Example

Chaos is not merely a theoretical possibility. The example discussed below refers to experiment PH-SLB
Fig. 2. Downcomer level (le61) in experiment PMK; black line: calculated values with nominal parameters; green lines: measured curves; red lines: calculated values when overheating occurred; blue lines: calculated values when overheating did not occur.
Fig. 3. Relative standard deviation of le61 as function of time.
[17]. The calculations were performed by ATHLET [5]. In the calculations 50 scalar variables were considered as input; the output space was spanned by 33 variables, which were time dependent functions. Results of 101 runs are analyzed: one run is associated with the nominal state, and 100 runs are ATHLET calculations with randomly varied input data. We only mention here that the selected input variables are drawn from practical distributions (e.g. uniform distribution in a reasonable interval). Let us consider output variable le61 (downcomer level) shown in Fig. 2. In the range (0, 150 s) the calculated curves are in one bunch, showing that no bifurcation occurs in that interval. Around t ≈ 200 s, the calculated curves separate into three bunches, indicating that chaos appears (not bifurcation but trifurcation). Let the output variable be le61 at t ≈ 300 s. We notice that the intervals (0, 0.5), (1, 2.5) and (4.0, 4.7) cover the respective bunches. 17 calculations predicted no overheating; those calculations are plotted in blue, the rest in red. The calculated water level varies between 20 cm and 3 m for most of the (0, 500 s) interval. The figure speaks for itself: in this simple experimental setup, chaos is present in the ATHLET calculational model. Fig. 3 shows the relative dispersion of le61 as a function of time. From the curve we see two critical points, t = 160 and 240 s. At the latter the calculated curves fall into three bunches; at the former the range of the calculated curves tends to widen. The distribution of le61 is unknown; therefore the time dependence of the limits L(t) = min(le61) and U(t) = max(le61) has been plotted in Fig. 4. This, however, prognosticates large
Fig. 4. Range of le61 as function of time.
possible errors in practical cases of safety analysis calculations.
5. Statistics of black box model

In this section we assume that the observed randomness of the output variables is the result of the randomness of the input variables, and that the mapping ỹ = Ĉ(t)x̃ does not show chaotic behavior. We adopt the widely used convention that x and ξ, as well as y and η, are not distinguished. In the sequel we follow the notation used in the literature of statistics [6]. For the sake of simplicity, only one output variable with probability density function g(y) is considered, because the generalization is straightforward. Furthermore, the time t is taken as fixed and its notation is omitted. If we carry out N runs with fluctuating input, we obtain a sample {y_1, y_2, …, y_N} of the random variable y. Let us construct two random functions L = L(y_1, …, y_N) and U = U(y_1, …, y_N), called tolerance limits, such that

P{∫_L^U g(y) dy > γ} = β.  (5)

We remark only that

∫_L^U g(y) dy = A(y_1, …, y_N)  (6)

is a random variable, sometimes called the probability content, which measures the proportion of the distribution included in the random interval [L, U]. The probability β bears the name confidence level. For safe operation it is advisable to specify the probability content γ and the confidence level β as large as possible in the interval (0, 1). It should be emphasized that γ is not a probability, although it is a non-negative real number not greater than 1. Having fixed β and γ, it becomes possible to determine the number of runs N. Carrying out N runs, we get a sample {y_1, …, y_N}, from which we can determine an appropriate tolerance interval [L, U]. If that interval lies in V_T, we declare the operation safe. Many authors have discussed the problem of setting tolerance limits for a distribution on the basis of an observed sample; the pioneering work was done by Wilks [18,19] and by Wald [20,21]. In the present paper we endeavor to analyze the possibilities and limitations of the tolerance interval method in resolving code uncertainty problems and to give some help in practical applications.

5.1. Known probability density function

Let us assume the probability density function g(y) to be known. In general, the attempt to get an explicit expression for β by means of expression (5) would fail. There is, however, one exception: when g(y) is the density function of a normal distribution N(μ, σ), an exact formula can be obtained for β. It is worth mentioning that if the output variable y is a sum of a large number of small, statistically independent random variables, then its distribution is almost normal. Thus it makes sense to derive the probability β for the normal distribution.

Let y_1, …, y_N be the values of the output variable y in N statistically independent runs. We denote by ỹ the sample estimate of the expectation value μ and by s̃² that of the variance σ², i.e.

ỹ = (1/N) Σ_{k=1}^{N} y_k  (7)

and

s̃² = (1/(N−1)) Σ_{k=1}^{N} (y_k − ỹ)².  (8)

Let us construct two random variables, viz. L = L(y_1, …, y_N; λ) = ỹ − λs̃ and U = U(y_1, …, y_N; λ) = ỹ + λs̃, where the parameter λ scales the length of the interval [L, U]. Denote by A(ỹ, λs̃) the proportion of the output distribution included between the limits L = ỹ − λs̃ and U = ỹ + λs̃, i.e.

A(ỹ, λs̃) = ∫_L^U g(y) dy = (1/(σ√(2π))) ∫_L^U exp[−(y − μ)²/(2σ²)] dy.  (9)

Introducing the new variable z = (y − μ)/σ, we obtain

A(μ + σz̃, λs̃) = ρ(z̃, σ̃) = (1/√(2π)) ∫_ℓ^u e^{−z²/2} dz,  (10)

where

ℓ = (ỹ − μ)/σ − λ s̃/σ = z̃ − λσ̃  and  u = (ỹ − μ)/σ + λ s̃/σ = z̃ + λσ̃,

with z̃ = (ỹ − μ)/σ and σ̃ = s̃/σ. We stress again that ρ(z̃, σ̃) is a random variable because in expression (10) the limits of the integral are random variables.

Theorem 1. For any given positive value of λ, the probability that ρ > γ, where 0 < γ < 1, is expressed by

W(λ; γ, N) = 1 − √(N/(2π)) ∫_{−∞}^{+∞} K_{N−1}[(N−1) q²(m, γ)/λ²] e^{−Nm²/2} dm,  (11)
Table 2
λ values for the number of runs N = 50(5)100

        β = 0.90                   β = 0.95                   β = 0.99
N     γ=0.90  γ=0.95  γ=0.99     γ=0.90  γ=0.95  γ=0.99     γ=0.90  γ=0.95  γ=0.99
50    1.916   2.284   3.001      1.996   2.379   3.126      2.162   2.576   3.385
55    1.901   2.265   2.976      1.976   2.354   3.093      2.130   2.538   3.335
60    1.887   2.248   2.956      1.958   2.333   3.066      2.103   2.506   3.293
65    1.875   2.234   2.936      1.943   2.315   3.042      2.080   2.478   3.257
70    1.865   2.222   2.920      1.929   2.299   3.021      2.060   2.454   3.225
75    1.856   2.211   2.906      1.917   2.285   3.002      2.042   2.433   3.197
80    1.848   2.202   2.894      1.907   2.272   2.986      2.026   2.414   3.173
85    1.841   2.193   2.882      1.897   2.261   2.971      2.012   2.397   3.150
90    1.834   2.185   2.872      1.889   2.251   2.958      1.999   2.382   3.130
95    1.828   2.178   2.862      1.881   2.241   2.945      1.987   2.368   3.112
100   1.822   2.172   2.854      1.874   2.233   2.934      1.977   2.355   3.096
where K_{N−1}[·] is the χ² distribution function with (N−1) degrees of freedom and q(m, γ) is the solution of the equation

(1/√(2π)) ∫_{m−q}^{m+q} e^{−x²/2} dx = γ.  (12)

The value λ determining the tolerance interval at a preassigned probability content γ and a preassigned significance level β in the case of N runs can be calculated from the equation

W(λ; γ, N) = β,  (13)

and it is independent of the unknown parameters μ and σ of the density function g(y). Eq. (13) has exactly one root in λ, since W(λ; γ, N) is a strictly increasing function of λ.

The proof of Theorem 1 is given in Appendix A, since the mathematical details are not relevant to the aim of the present work. However, it is worth mentioning that an approximate tolerance interval can be derived when N is large (e.g. N > 50).

Theorem 2. The approximate tolerance interval is given by

[ỹ − λ_a(γ, β)s̃, ỹ + λ_a(γ, β)s̃],

where

λ_a(γ, β) = √((N−1)/Q_{N−1}(1−β)) · q(1/√N, γ).  (14)

Here Q_{N−1}(1−β) is the (1−β)-percentile of the χ² distribution with (N−1) degrees of freedom and q(1/√N, γ) is the root of the equation

(1/√(2π)) ∫_{(1/√N)−q}^{(1/√N)+q} e^{−z²/2} dz = γ.  (15)

The proof of Theorem 2 is given in Appendix B. In order to give an impression of the λ values (i.e. of the tolerance interval around the sample mean of the output variable), Table 2 gives the λ values associated with often used γ and β for numbers of runs N = 50(5)100. One can see that at N = 100 the tolerance interval which includes 99% of the distribution with 99% probability is [ỹ − 3.1s̃, ỹ + 3.1s̃]. If that interval lies within [L_T, U_T], then the system is safe at level γ = 0.99 and β = 0.99. Fig. 5 shows convincingly the interrelations between the basic characteristics of the tolerance interval method. Note that λ decreases with increasing N, provided γ and β are fixed.

5.2. Unknown probability density function

5.2.1. One output parameter

To solve the problem of setting tolerance limits when nothing is known about the density function g(y) except that it is continuous seems to be no easy task. Exploiting the advantages of order statistics, Wilks [18,19] was the first to find a satisfactory solution to the problem, and somewhat later Robbins [22] published a nice proof that
Fig. 5. The dependence of the probability β on the parameter λ at probability content γ = 0.99 for the number of runs N = 50, 75, 100.
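Theorem 2 lends itself to direct computation. The sketch below uses only the Python standard library: the normal CDF comes from math.erf, and the χ² percentile is obtained from the Wilson–Hilferty approximation, which is an extra approximation of ours on top of Theorem 2 itself:

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def q_root(m, gamma):
    """Solve (1/sqrt(2*pi)) * integral_{m-q}^{m+q} e^{-x^2/2} dx = gamma, Eq. (15)."""
    lo, hi = 0.0, 10.0
    for _ in range(100):                       # bisection: content increases with q
        mid = 0.5 * (lo + hi)
        if norm_cdf(m + mid) - norm_cdf(m - mid) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def chi2_percentile(p, k):
    """Wilson-Hilferty approximation to the p-percentile of chi2 with k dof."""
    lo, hi = -10.0, 10.0                       # invert the normal CDF by bisection
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    z = 0.5 * (lo + hi)
    a = 2.0 / (9.0 * k)
    return k * (1.0 - a + z * math.sqrt(a)) ** 3

def lambda_a(N, gamma, beta):
    """Approximate tolerance factor of Theorem 2, Eq. (14)."""
    Q = chi2_percentile(1.0 - beta, N - 1)
    return math.sqrt((N - 1) / Q) * q_root(1.0 / math.sqrt(N), gamma)

print(round(lambda_a(100, 0.99, 0.99), 3))   # close to the 3.096 of Table 2
```

For N = 100, γ = β = 0.99 the sketch reproduces the λ ≈ 3.096 of Table 2 to about three decimals, which also serves as a quick check of the table.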
distribution free tolerance limits can be given only by means of order statistics. It is evident that in the order statistics we are unable to exploit the total amount of information which is present in the sample when the density function gðyÞ is unknown. Consequently, with g and b given, we anticipate either a wider tolerance interval around the sample mean or a larger sample size to achieve the same tolerance interval as in the case of known gðyÞ: Not going into details, we give here a well-known theorem, which is useful in uncertainty and sensitivity analysis of codes. Theorem 3. Let y1 ; …; yN be N independent observations of the random output y. Suppose that nothing is known about the density function gðyÞ except that it is continuous. Arrange the values of y1 ; …; yN in increasing order,1 and denote by yðkÞ the kth of these ordered values; hence in particular yð1Þ ¼ min yk ; 1#k#N
yðNÞ ¼ max yk 1#k#N
and by definition yð0Þ ¼ 21; while yðN þ 1Þ ¼ þ1: In this case for some positive g , 1 and b , 1 there can be constructed two random function Lðy1 ; …; yN Þ and Uðy1 ; …; yN Þ; called tolerance limit, such that the probability that ðU
gðyÞdy . g
L
holds is equal to
b ¼ 1 2 Iðg; s 2 r; N 2 s þ r þ 1Þ ! s2r21 X N j ¼ g ð1 2 gÞN2j ; j j¼0
ð16Þ
where2
interval the expression
b ¼ 1 2 gN 2 ðN 2 1Þð1 2 gÞgN21 :
Here I(γ; j, k) is the incomplete beta function ratio

  I(γ; j, k) = (1/B(j, k)) ∫_0^γ u^{j−1} (1 − u)^{k−1} du,   (17)

where B(j, k) = (j − 1)!(k − 1)!/(j + k − 1)! is the beta function, 0 ≤ r < s ≤ N, and L = y(r), U = y(s). The probability that equal values occur is zero. The proof of Theorem 3, which is a simplified version of Wald's proof, is given in Appendix C.

The selection of the tolerance limits L = y(1) and U = y(N) is expedient in many cases. Substituting r = 1 and s = N into Eq. (16), we get for the two-sided tolerance interval

  β = 1 − N γ^{N−1} + (N − 1) γ^N.   (18)

Here one has to stress once again that γ in Eq. (16) is not a probability but a non-negative real number not larger than 1.

Often we are interested solely in the upper tolerance limit U = y(N), and we call the interval [y(0), y(N)], with the convention y(0) = −∞, a one-sided tolerance interval. Now r = 0 and s = N; therefore

  β = 1 − γ^N.   (19)

When the lower limit is of interest, we select [y(1), y(N + 1)], with the convention y(N + 1) = +∞; this is also a one-sided tolerance interval. Substituting r = 1 and s = N + 1 into expression (16), we obtain Eq. (19) again.

Finally, we make two remarks. First, two outputs are considered the same if their difference is smaller than the round-off error; therefore the probability that two runs yield the same output is very small but not zero. Second, expressions (18) and (19) may appear to relate two probabilities β and γ. However, γ is not a probability, which can be seen from the nonsensical interpretation any of the mentioned expressions would yield for γ.

In Table 3 we have compiled the probability content γ of the tolerance interval [y(1), y(N)] for β = 0.9, 0.95, 0.99 and N = 10(10)100(25)300. If we are interested in a tolerance interval [L, U] which includes more than a γ = 0.953 proportion of the distribution of the output with probability β = 0.95, then we should make 100 runs (see Table 3) and select the lowest output as L and the largest as U. If U is smaller than the technological limit U_T, then the system is safe at the level γ = 0.953, β = 0.95. This means that additional runs may produce outputs exceeding U, but the portion of such runs is not larger than 4.7% of the total number of runs. However, these rare
Table 3
γ values of the tolerance interval [y(1), y(N)] for β = 0.90, 0.95, 0.99 and N = 10(10)100(25)300

  N     β = 0.90   β = 0.95   β = 0.99
  10    0.66315    0.60584    0.49565
  20    0.81904    0.78389    0.71127
  30    0.87643    0.85141    0.79845
  40    0.90620    0.88682    0.84528
  50    0.92443    0.90860    0.87448
  60    0.93671    0.92336    0.89442
  70    0.94557    0.93402    0.90890
  80    0.95225    0.94207    0.91989
  90    0.95747    0.94837    0.92851
  100   0.96166    0.95344    0.93554
  125   0.96924    0.96262    0.94813
  150   0.97432    0.96877    0.95658
  175   0.97796    0.97318    0.96268
  200   0.98069    0.97650    0.96736
  225   0.98282    0.97909    0.97087
  250   0.98453    0.98118    0.97375
  275   0.98593    0.98287    0.97618
  300   0.98710    0.98429    0.97809
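The one-variable relations (18) and (19) are easy to evaluate numerically. The sketch below is our illustration, not code from the paper (the function names are ours): it computes β from Eq. (18), inverts it for γ by bisection, and finds the smallest N satisfying the one-sided Eq. (19).

```python
def beta_two_sided(gamma, n):
    """Eq. (18): confidence that [y(1), y(N)] covers a gamma fraction."""
    return 1.0 - n * gamma ** (n - 1) + (n - 1) * gamma ** n

def gamma_for(beta, n, tol=1e-9):
    """Invert Eq. (18) by bisection: beta_two_sided decreases in gamma,
    so return the largest gamma whose confidence still reaches beta."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_two_sided(mid, n) >= beta:
            lo = mid      # this content is still achieved with confidence beta
        else:
            hi = mid
    return lo

def n_runs_one_sided(beta, gamma):
    """Smallest N with 1 - gamma**N >= beta, Eq. (19)."""
    n = 1
    while 1.0 - gamma ** n < beta:
        n += 1
    return n
```

For β = 0.95 and N = 100, gamma_for returns about 0.9534, matching the corresponding Table 3 entry, and the classical one-sided 95%/95% requirement gives n_runs_one_sided(0.95, 0.95) = 59.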
Fig. 6. The dependence of the probability β on the number of runs N at probability contents γ = 0.8, 0.9, 0.95, 0.98, 0.99.
output values may be greater than the technological limit U_T. Evidently, if U is larger than U_T, the system must be declared unsafe.

To get some insight into relation (18), we present the probability β versus N for several γ values in Fig. 6. With an increasing number of runs each interpolated curve reaches saturation, and β tends to unity as N tends to infinity. The smaller the γ value, the sooner saturation sets in, because a small γ means that only a small fraction of the calculated outputs is required to fall into the given interval. A small γ value is not acceptable in safety analysis, since it means that a large portion of the output values may fall outside the tolerance interval. In practice we need γ > 0.95. For example, if we wish the tolerance interval [L, U] to include more than a γ = 0.98 proportion of the output values with probability β = 0.95, we need approximately 235 runs in order to get the proper L and U. In spite of the large number of runs, the probability content γ = 0.98 is far from completely satisfactory. To achieve a better probability content, say γ = 0.99 with probability β = 0.95, we need 473 runs, which is hard to realize in practice.

5.2.2. Several output variables

Now we assume the output to comprise p variables. Let these variables be y_1, …, y_p. If they are statistically completely independent, we can apply the result of Section 5.2.1; otherwise new considerations are needed. Let g(y_1, …, y_p) be the joint density distribution of the output
variables; furthermore, let

  Y = ( y_11  y_12  ···  y_1N )
      ( y_21  y_22  ···  y_2N )   (20)
      (  ⋮     ⋮    ⋱     ⋮   )
      ( y_p1  y_p2  ···  y_pN )

be the sample matrix obtained in N ≫ 2p independent runs. The problem of setting tolerance limits for y_1, …, y_p can be formulated as follows. For some given positive values γ < 1 and β < 1, we have to construct p pairs of random variables L_j(y_1, …, y_p) and U_j(y_1, …, y_p), j = 1, …, p, such that the probability that

  ∫_{L_1}^{U_1} ··· ∫_{L_p}^{U_p} g(y_1, …, y_p) dy_1 ··· dy_p > γ   (21)

holds is equal to β. A natural extension of the procedure applied previously to the one-variable case would seem the right selection. Unfortunately, that choice does not provide the required solution, since the probability of the random event (21) depends on the unknown joint density function g(y_1, …, y_p). Our task is to find a reasonable procedure such that the probability β is independent of g(y_1, …, y_p). It can be shown that such a procedure exists, but its uniqueness has not been proven yet. When the density function g(y_1, …, y_p) is continuous, we may assume that no two elements of the sample matrix Y are
equal, since the probability of that event is zero. The sequence of rows in the sample matrix Y can be arbitrary, reflecting the fact that we number the output variables arbitrarily. Let us choose the first row of the sample matrix and arrange its elements in order of increasing magnitude: y_1(1), y_1(2), …, y_1(N). Select from these y_1(r_1) as L_1 and y_1(s_1) > y_1(r_1) as U_1. Let i_1, i_2, …, i_{s_1−r_1−1} stand for the original column indices of the elements y_1(r_1+1), y_1(r_1+2), …, y_1(s_1−1). In the next step, choose the second row, the N observed values of the output variable y_2, and arrange the part y_{2 i_1}, y_{2 i_2}, …, y_{2 i_{s_1−r_1−1}} of its elements in increasing order to obtain y_2(1) < y_2(2) < ··· < y_2(s_1 − r_1 − 1). From among these, y_2(r_2) and y_2(s_2) are selected as L_2 and U_2, and evidently r_2 ≥ r_1 and s_2 ≤ s_1 − r_1 − 1. We continue this embedding procedure down to the last row of the sample matrix and define the p-dimensional volume

  V_p = {[L_1, U_1] × [L_2, U_2] × ··· × [L_p, U_p]},

where L_j = y_j(r_j), U_j = y_j(s_j), and r_j ≥ r_{j−1} ≥ ··· ≥ r_1, while r_j < s_j ≤ s_{j−1} − r_{j−1} − 1 for all j = 2, …, p.
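The embedding procedure just described can be sketched in a few lines (our illustration; the function name and the simple bookkeeping of admissible runs are ours). With r = 1 it selects the smallest and largest surviving value in each row; each row consumes two runs, so the sketch assumes N comfortably larger than 2p.

```python
def tolerance_box(Y, r=1):
    """Embedding procedure of Section 5.2.2 (a sketch).  Y is a list of p
    rows; row j holds the N observed values of output variable y_j.
    In each row, take the r-th smallest and r-th largest of the runs that
    are still admissible as [L_j, U_j], then keep only the runs strictly
    between them for the next row."""
    cols = list(range(len(Y[0])))        # column (run) indices still admissible
    limits = []
    for row in Y:
        cols.sort(key=lambda i: row[i])  # order surviving runs by this output
        L, U = row[cols[r - 1]], row[cols[-r]]
        limits.append((L, U))
        cols = cols[r:-r]                # drop the runs selected as extremes
    return limits
```

The nested intervals returned for a p × N sample matrix are exactly of the form required by Theorem 4 below, with r_j = j·r consumed from each side as the embedding proceeds.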
Theorem 4. In the case of p ≥ 2 dependent output variables with continuous joint density function g(y_1, …, y_p), it is possible to construct p pairs of random intervals [L_j, U_j], j = 1, …, p, such that the probability of the inequality

  ∫_{L_1}^{U_1} ··· ∫_{L_p}^{U_p} g(y_1, …, y_p) dy_1 ··· dy_p > γ

is free of g(y_1, …, y_p) and is given by

  P{ ∫_{L_1}^{U_1} ··· ∫_{L_p}^{U_p} g(y_1, …, y_p) dy_1 ··· dy_p > γ } = β = 1 − I(γ; s_p − r_p, N − s_p + r_p + 1).   (22)

Here the function I(·) is given by Eq. (17), and

  s_p ≤ s_{p−1} − r_{p−1} − 1 ≤ s_1 − Σ_{j=1}^{p−1} (r_j + 1)  and  r_p ≥ r_{p−1} ≥ ··· ≥ r_1.   (23)

Proof of Theorem 4 is given in Appendix D. In several practical applications the choice r_1 = r_2 = ··· = r_p = 1 and s_p = N − 2(p − 1) can be advised, hence the two-sided confidence level β is given by

  β = 1 − I(γ; N − 2p + 1, 2p) = Σ_{j=0}^{N−2p} (N choose j) γ^j (1 − γ)^{N−j}.   (24)

The structure of expression (24) is remarkably similar to that of expression (16), which refers to the one-output-variable case. Furthermore, if r_1 = r_2 = ··· = r_p = 0 and s_p = N − p + 1, then one obtains the following one-sided confidence level:

  β = 1 − I(γ; N − p + 1, p) = Σ_{j=0}^{N−p} (N choose j) γ^j (1 − γ)^{N−j}.   (25)

In order to compare the number of runs needed to determine tolerance regions at a given (β, γ) level for p = 1–3 output variables with unknown distributions, we compiled Table 4. The number of runs needed to meet stringent requirements is large: e.g. with three output variables, γ = 0.98 and β = 0.98, we need N = 598 runs. Therefore, it seems inevitable to seek methods with lower computational demands.

Table 4
Number of runs needed to determine the two-sided tolerance region for p = 1–3 output variables at the listed γ, β values

  β      p   γ = 0.95   γ = 0.96   γ = 0.97   γ = 0.98   γ = 0.99
  0.95   1       93        117        156        235        473
         2      153        191        256        385        773
         3      207        260        348        523       1049
  0.96   1       98        123        165        249        499
         2      159        200        267        402        806
         3      215        269        360        542       1086
  0.97   1      105        132        176        266        533
         2      167        210        281        422        848
         3      224        281        376        565       1134
  0.98   1      114        143        192        289        581
         2      179        224        300        451        905
         3      237        297        397        598       1199
  0.99   1      130        163        218        329        661
         2      197        248        331        499       1001
         3      258        324        433        651       1307

In order to provide some insight, let us consider the following example. We have two output variables y_1 and y_2, and their joint distribution function is known:

  g(C; y_1, y_2) = 1/(2π √(1 − C²)) exp{ −(y_1² − 2C y_1 y_2 + y_2²)/(2(1 − C²)) },   (26)
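Eq. (24) lends itself to a direct computation of the required number of runs. The sketch below is our own illustration in our own notation (not the authors' code); borderline entries obtained this way can differ by one unit from the published table, depending on the rounding convention.

```python
from math import comb

def beta_p(n, gamma, p):
    """Eq. (24): two-sided confidence for p dependent output variables,
    beta = sum_{j=0}^{N-2p} C(N,j) gamma^j (1-gamma)^(N-j)."""
    return sum(comb(n, j) * gamma ** j * (1.0 - gamma) ** (n - j)
               for j in range(n - 2 * p + 1))

def runs_needed(beta, gamma, p):
    """Smallest N whose two-sided confidence reaches beta (cf. Table 4)."""
    n = 2 * p                      # below this the region cannot be formed
    while beta_p(n, gamma, p) < beta:
        n += 1
    return n
```

For instance, runs_needed(0.95, 0.95, 1) gives the familiar 93 of the one-variable two-sided case, and beta_p(100, 0.95, 2) reproduces the distribution-free entry 0.7422 of Table 5 below.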
Table 5
Levels of significance β of two-sided tolerance regions for two output variables at the listed γ and N values

  N               γ = 0.95   γ = 0.96   γ = 0.97   γ = 0.98
  50   C = 0.1     0.8831     0.7547     0.5351     0.2376
       C = 0.9     0.9433     0.8775     0.7442     0.4970
       DF          0.2396     0.1391     0.0628     0.0178
  100  C = 0.1     0.9109     0.8836     0.6297     0.2121
       C = 0.9     0.9911     0.9590     0.8488     0.4970
       DF          0.7422     0.5705     0.3528     0.1410
  150  C = 0.1     0.9933     0.9443     0.6871     0.1894
       C = 0.9     0.9981     0.9869     0.9044     0.5554
       DF          0.9452     0.8542     0.6616     0.3528
  200  C = 0.1     0.9986     0.9683     0.7380     0.1612
       C = 0.9     0.9998     0.9955     0.9414     0.5779
       DF          0.9910     0.9605     0.8528     0.5685
where |C| ≤ 1 is the correlation coefficient of the variables y_1 and y_2. We are interested in the relationship between the significance level β and the probability content of a given two-dimensional interval [L, U] = [L_1, U_1] × [L_2, U_2] at the number of runs N = 50(50)200. Now we can proceed in two ways. The first way is to fix the fraction of the samples required to fall into the given interval [L, U] and to determine the associated probability β from Eq. (24); these numbers are in row DF (referring to Distribution Free). The second way is to use the known joint distribution function: we calculate the estimates σ̃_i, i = 1, 2, of the standard deviations from the N runs and define the interval [L_i, U_i] = [−2.5 σ̃_i, +2.5 σ̃_i]. From 10^5 random cases we estimated the β value, see Table 5. These values are given for two correlation coefficients, in rows C = 0.1 and C = 0.9. As we see in Table 5, the order statistics gives lower β values in most of the cases, compared to those obtained by using the density function g(C; y_1, y_2).³ This indicates a considerable gain from a known density function of the output variables. Probability β versus proportion γ is shown in Fig. 7 for fixed N = 100 runs.
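The Monte Carlo comparison just described can be reproduced in outline as follows. This is a sketch under our own assumptions: the function names are ours, the means are taken as zero, and the sample counts are kept much smaller than the 10^5 cases used for Table 5.

```python
import random
import statistics

def corr_pair(C):
    """One draw from the bivariate normal of Eq. (26) with correlation C."""
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    return z1, C * z1 + (1.0 - C * C) ** 0.5 * z2

def beta_hat(N, C, gamma, trials=200, truth=5000):
    """Estimate the confidence beta that the rectangle
    [-2.5*s_i, +2.5*s_i] (s_i = sample standard deviations from N runs)
    covers more than a gamma fraction of the output distribution.  The
    true content of the rectangle is itself approximated with an
    independent reference sample."""
    ref = [corr_pair(C) for _ in range(truth)]   # reference ("truth") sample
    hits = 0
    for _ in range(trials):
        sample = [corr_pair(C) for _ in range(N)]
        s1 = statistics.stdev([a for a, _ in sample])
        s2 = statistics.stdev([b for _, b in sample])
        inside = sum(1 for a, b in ref
                     if abs(a) <= 2.5 * s1 and abs(b) <= 2.5 * s2)
        hits += (inside / truth) > gamma         # rectangle content exceeds gamma
    return hits / trials
```

With N = 100, C = 0.9 and γ = 0.95 the estimate comes out close to the 0.9911 of Table 5, illustrating the gain over the distribution-free row.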
Fig. 7. Two output variables.
³ Without going into details, we mention only that there exists a critical value γ_crt(C) such that for γ > γ_crt(C) and for all N_1 < N_2 we have β(γ, N_1) > β(γ, N_2). That critical value is defined by an integral of g(C; y_1, y_2). The decrease of β with increasing N can be so large for N > 100 that β becomes smaller than the value obtained from the order statistics. This is the case for some β values with C = 0.1 in Table 5 when N > 100 and γ > γ_crt(0.1) ≈ 0.9753.

6. Safety inference

Our approach has severe consequences for every statement concerning safety. The present section assesses those consequences. The first consequence is that a system cannot be declared simply safe or unsafe; rather, we can speak of the probability of being safe or unsafe (depending on the state of the output vector).

Assume for the sake of simplicity that the model is not chaotic around the nominal state (x̃_0, ỹ_0), where ỹ_0 = Ĉx̃_0. Then we have the situation shown in Fig. 8. The input variable(s) may take values in a given range, and that range is mapped into a range of the output variables. From some other considerations, which thus far have not been regarded as part of safety analysis, we get information on the probability distribution of the input variable, and we select a range into which a large portion, say more than 90%, of the possible inputs falls (indicated as the dark area in Fig. 8) with a given high, say 95%, probability.

Fig. 8. Probable input is mapped into probable output.

Consequently, we conclude the following:

1. It is insufficient to show that the nominal state is safe, because there may be probable inputs which are unsafe. Therefore, when the calculations are carried out exclusively in the nominal state, safety analysis should demonstrate that the estimated error is realistic.
2. Another possibility is that safety analysis should show that the images of all points x in the vicinity of x_0 are safe. In this case we get rid of the uncertainty caused by input uncertainty.
3. Assumptions or knowledge of the probability distributions of the input do have an impact on safety issues; therefore they must not be treated separately. Here two problems
occur. First, engineering input data are usually not accompanied by probability distributions; second, input variables which are actually internal to the given calculational model may influence the output error decisively. Such internal inputs are usually obtained from a fitting procedure, but that procedure usually gives no information on the probability distribution of the fitted parameters (although the theory and technique are known).
4. Even if Ĉx̃ is safe for every input in a given interval, there is a slight chance that some input(s) may be associated with unsafe output(s). Those chances can be read off from Table 2 for normally distributed output, from Table 3 for a single arbitrarily distributed output variable, and from Table 4 for two and three arbitrarily distributed output variables.
5. Safety is described by random variables; therefore we can make only statistical assertions. Any claim concerning safety is associated with a (β, γ) pair and with the assumptions on the probability distribution(s) of the input variables.
6. Further complications enter if the model is chaotic in the vicinity of x_0. In that case a connected input interval may be mapped into several disjoint output intervals. Safety analysis should make it clear that every output interval lies inside the safety envelope. Chaos is indicated by an increase of the variance and by separation of the output variables into distinct bunches.

When the model is discontinuous, the outputs may fall into disjoint sets. It is not clear how to figure out whether all the disjoint sets have been explored; without that, however, the safety analysis is incomplete. Since the general consideration results in the loss of a large amount of information, efforts should be made to determine the probability distribution of the output variable(s). To this end, specific safety analysis models should be analyzed individually. All these caveats necessitate a reconsideration of safety issues.
7. Concluding remarks

The object of our investigation is a computer code that we use as a black box: from a well-defined input set the code produces a well-defined output set. Both sets have metrics; we can speak of the distance between two input sets or between two output sets. The computer code is a map Ĉ: X → Y, where X is the input set and Y is the output set. When analyzing given equipment, we have a nominal input x_0, but the actual input may be anywhere in X; the input is a random vector. The probability distribution of the input components is usually derived from diverse engineering considerations. In that setting, we have to predict the statistical behavior of the output vector, and we have to specify a safety envelope into which the actual input falls with high probability. The present work was initiated by a remark stating that a limited number of runs suffices to determine the safety envelope. The exact statements are formulated as theorems in Section 4, and the conclusions are summarized as follows.

1. The nominal state (x̃_0, Ĉx̃_0) is determined from the expectation value of the input and from the associated output. The investigation of the nominal state alone is insufficient to declare the nominal state safe.
2. When the output is of normal distribution, Theorem 1 determines an interval [L, U] around Ĉx_0 into which a large fraction of the output set falls with high probability. Values of (N, β, γ) are given in Table 2. Our results are in accordance with Ref. [23].
3. When the distribution of the output is not known, we can find an interval [L, U] into which a large portion of the output set falls with high probability. In this case, however, only a fraction of the information present in the output can be utilized. As a result, more runs are needed or lower probabilities are achieved, see Table 3. Our results are in accordance with the results of Ref. [23].
4. When the output consists of more than one statistically dependent quantity, the portion of the information content that can be utilized decreases rapidly with the number of output variables. That manifests itself again in a larger number of runs or in lower probabilities. The results given in Ref. [23] comply with our results (see Table 4) only when the outputs are statistically independent. Actually, to achieve an identical (β, γ) level, statistically dependent outputs need a larger number of runs than given in Ref. [23].
5. It should be clarified how chaos is treated in a given safety analysis code. In a subsequent paper it will be shown that slight variations of innocent-looking parameters in ATHLET are capable of changing the output drastically, similarly to what is shown in Fig. 5.

All these observations may influence the application of best estimate methods in safety analysis. A detailed discussion of the comparison of ATHLET results with measured data is postponed to a subsequent paper.
Appendix A. Proof of Theorem 1

The proof of Theorem 1 is based on a few well-known relations of mathematical statistics. By the definition of conditional probability,

  P{ρ(z̃, σ̃) > γ} = ∫_{−∞}^{+∞} P{ρ(z̃, σ̃) > γ | z̃ = m} dP{z̃ ≤ m},   (A1)

where

  dP{z̃ ≤ m} = dP{ (1/N) Σ_{n=1}^{N} (y_n − μ)/σ ≤ m } = √(N/2π) e^{−N m²/2} dm.
Since

  P{ρ(z̃, σ̃) > γ | z̃ = m} = P{ρ(m, σ̃) > γ},

we have

  P{ρ(z̃, σ̃) > γ} = √(N/2π) ∫_{−∞}^{+∞} P{ρ(m, σ̃) > γ} e^{−N m²/2} dm,   (A2)

where

  ρ(m, σ̃) = (1/√(2π)) ∫_{m−λσ̃}^{m+λσ̃} e^{−z²/2} dz

is a random variable. Let us define the function

  r(m, λs) = (1/√(2π)) ∫_{m−λs}^{m+λs} e^{−z²/2} dz

for real s. If m and λ are fixed, then r(m, λs) is a strictly monotonically increasing function of s; therefore the equation

  (1/√(2π)) ∫_{m−λs}^{m+λs} e^{−z²/2} dz = γ

has only one root in s. It is clear that the product λs is independent of λ; hence we may write λs = q(m, γ), where q is obtained from

  (1/√(2π)) ∫_{m−q}^{m+q} e^{−z²/2} dz = γ.

It follows from the properties of r(m, λs) that the probability of ρ(m, σ̃) > γ equals the probability of σ̃ > s_r, i.e.

  P{ρ(m, σ̃) > γ} = P{σ̃ > s_r} = P{σ̃ > q(m, γ)/λ}.   (A3)

By using this relation we can write

  P{σ̃ > q(m, γ)/λ} = P{σ̃² > [q(m, γ)]²/λ²},

and, taking into account that the random variable

  (N − 1) σ̃²/σ² = Σ_{n=1}^{N} ((y_n − ỹ)/σ)²

is of χ² distribution with (N − 1) degrees of freedom [24], we get

  P{ρ(m, σ̃) > γ} = 1 − K_{N−1}( (N − 1) [q(m, γ)]²/λ² ),   (A4)

where

  K_N(x) = 1/(2Γ(N/2)) ∫_0^x (u/2)^{(N−2)/2} e^{−u/2} du.

Substituting Eq. (A4) into Eq. (A2), the theorem is proven. □

Appendix B. Proof of Theorem 2

Before setting out on the proof of Theorem 2, we set forth the following notation. Let

  H(λ, γ | m) = P{ρ(m, σ̃) > γ}.

We need

Lemma 1. It can be shown that

  H(λ, γ | 1/√N) − W(λ, γ, N) = O(N^{−2}),   (B1)

where

  W(λ, γ, N) = √(N/2π) ∫_{−∞}^{+∞} H(λ, γ | m) e^{−N m²/2} dm.   (B2)

Proof of Lemma 1. The expression

  H(λ, γ | m) = P{ (1/√(2π)) ∫_{m−λσ̃}^{m+λσ̃} e^{−z²/2} dz > γ }   (B3)

is an even function of m and can be developed into a Taylor series around m = 0:

  H(λ, γ | m) = Σ_{n=0}^{∞} [∂^{2n}H(λ, γ | m)/∂m^{2n}]_{m=0} m^{2n}/(2n)!.   (B4)

Substituting Eq. (B4) into Eq. (B2), we obtain

  W(λ, γ, N) = Σ_{n=0}^{∞} [∂^{2n}H(λ, γ | m)/∂m^{2n}]_{m=0} (2n − 1)!!/((2n)! N^n)
             = H(λ, γ | 0) + (1/2N) [∂²H(λ, γ | m)/∂m²]_{m=0} + O(N^{−2}),   (B5)

and replacing m by 1/√N in Eq. (B4), we have

  H(λ, γ | 1/√N) = H(λ, γ | 0) + (1/2N) [∂²H(λ, γ | m)/∂m²]_{m=0} + O(N^{−2}).   (B6)

From Eqs. (B5) and (B6) it follows that Eq. (B1) is true. □

Consequently, we have the following approximate equation:

  H(λ, γ | 1/√N) = 1 − K_{N−1}( (N − 1) [q(N^{−1/2}, γ)]²/λ² ) ≈ β,

where the argument of K_{N−1}[·] is nothing else than the (1 − β) percentile of the χ² distribution with (N − 1) degrees of
freedom. Introducing the notation

  Q_{N−1}(1 − β) = (N − 1) [q(N^{−1/2}, γ)]²/λ²,   (B7)

we find that

  λ ≈ λ_a(γ, β) = √( (N − 1)/Q_{N−1}(1 − β) ) q(1/√N, γ).

This completes the proof of Theorem 2. □

Appendix C. Proof of Theorem 3

The derivation of Eq. (16) is based on the following observation. Let

  G(y) = ∫_{−∞}^{y} g(t) dt

be the unknown cumulative distribution function of the output variable y, and let y_1, …, y_N be its observed values. It can be shown [24] that, irrespective of the form of G(y), the random variables z_j = G(y_j), j = 1, …, N, are independent and uniformly distributed in [0, 1]. However, the elements of the ordered sample z(1) < z(2) < ··· < z(N) are not independent [24]; the bivariate density function of z(r) and z(s), r < s, is given by

  m^{(N)}_{(r,s)}(u, v) = N!/[(r − 1)!(s − r − 1)!(N − s)!] u^{r−1} (v − u)^{s−r−1} (1 − v)^{N−s},   0 ≤ u ≤ v ≤ 1.

In order to determine the probability

  P{ ∫_{y(r)}^{y(s)} g(y) dy > γ } = P{ G[y(s)] − G[y(r)] > γ } = P{z(s) − z(r) > γ},

we need the probability

  P{t ≤ z(s) − z(r) ≤ t + dt} = w^{(N)}_{(r,s)}(t) dt = [ ∫_0^{1−t} m^{(N)}_{(r,s)}(u, u + t) du ] dt.   (C1)

Substituting m^{(N)}_{(r,s)}(u, v) into Eq. (C1), the integration can be carried out:

  w^{(N)}_{(r,s)}(t) = t^{s−r−1}/[B(r, s − r) B(s, N − s + 1)] ∫_0^{1−t} u^{r−1} (1 − t − u)^{N−s} du,   (C2)

where B(j, k) is the beta function defined by Eq. (14). Taking this expression into account, we get

  P{z(s) − z(r) > γ} = ∫_γ^1 w^{(N)}_{(r,s)}(t) dt,

and after integration we obtain

  P{z(s) − z(r) > γ} = 1 − ∫_0^γ t^{s−r−1} (1 − t)^{N−s+r} dt / B(s − r, N − s + r + 1).

In other words,

  β = 1 − I(γ; s − r, N − s + r + 1),   (C3)

as stated. □

Appendix D. Proof of Theorem 4

The proof of Theorem 4 is given in two steps. In the first step we show that the theorem holds for p = 2, and then we generalize the claim to p > 2.

Step 1. We assume that the unknown joint distribution function of the two output variables y_1 and y_2 is given by

  G(y_1, y_2) = ∫_{−∞}^{y_1} ∫_{−∞}^{y_2} g(t_1, t_2) dt_1 dt_2,

and denote by

  g_1(y_1) = ∫_{−∞}^{+∞} g(y_1, t_2) dt_2

the density function of the output variable y_1. Let us consider the following random variable:

  A_2 = A_2(L_1, U_1, L_2, U_2) = ∫_{L_1}^{U_1} ∫_{L_2}^{U_2} g(y_1, y_2) dy_1 dy_2,   (D1)

where the boundaries of the integration are random variables; the limits were discussed in Section 5.2.2. A_2 can be expressed almost surely as

  A_2(L_1, U_1, L_2, U_2) = C_2(L_2, U_2 | L_1, U_1) A_1(L_1, U_1).   (D2)

Here

  A_1(L_1, U_1) = ∫_{L_1}^{U_1} g_1(y_1) dy_1   (D3)

and

  C_2(L_2, U_2 | L_1, U_1) = ∫_{L_2}^{U_2} f_2(y_2 | L_1, U_1) dy_2,   (D4)

where

  f_2(y_2 | L_1, U_1) = ∫_{L_1}^{U_1} g(y_1, y_2) dy_1 / [ ∫_{−∞}^{+∞} dy_2 ∫_{L_1}^{U_1} g(y_1, y_2) dy_1 ] = ∫_{L_1}^{U_1} g(y_1, y_2) dy_1 / A_1(L_1, U_1)   (D5)
is the random density of the variable y_2 under the condition that y_1 lies in [L_1, U_1].

Since A_1(L_1, U_1) = G_1[y_1(s_1)] − G_1[y_1(r_1)], using relation (C2) we find that

  P{t_1 ≤ A_1(L_1, U_1) ≤ t_1 + dt_1} = t_1^{s_1−r_1−1} (1 − t_1)^{N−s_1+r_1} dt_1 / B(s_1 − r_1, N − s_1 + r_1 + 1) = k^{(N)}_{(r_1,s_1)}(t_1) dt_1.   (D6)

To obtain the density function of C_2(L_2, U_2 | L_1, U_1), we define the random probability measure

  G(t | L_1, U_1) = ∫_{−∞}^{t} f_2(y_2 | L_1, U_1) dy_2,

with which we can express C_2 as

  C_2(L_2, U_2 | L_1, U_1) = G[y_2(s_2) | L_1, U_1] − G[y_2(r_2) | L_1, U_1],

where r_1 ≤ r_2 < s_2 ≤ s_1. Finally we get

  P{t_2 ≤ C_2(L_2, U_2 | L_1, U_1) ≤ t_2 + dt_2} = t_2^{s_2−r_2−1} (1 − t_2)^{s_1−r_1−1−s_2+r_2} dt_2 / B(s_2 − r_2, s_1 − r_1 − s_2 + r_2) = ℓ^{(s_1−r_1−1)}_{(r_2,s_2)}(t_2) dt_2.   (D7)

Note that expression (D7) contains neither L_1 nor U_1; therefore the distribution of the random variable C_2 is independent of L_1 and U_1. Consequently, the joint density function of A_1 and C_2 is the product of Eqs. (D6) and (D7). We still need the density function of the random variable A_2. Exploiting the independence of C_2 and A_1, we get

  P{t ≤ A_2(L_1, U_1, L_2, U_2) ≤ t + dt} = [ ∫_t^1 k^{(N)}_{(r_1,s_1)}(x) ℓ^{(s_1−r_1−1)}_{(r_2,s_2)}(t/x) dx/x ] dt = w_{A_2}(t) dt.   (D8)

Substituting here Eqs. (D6) and (D7) and performing the indicated calculations, we obtain

  w_{A_2}(t) = t^{s_2−r_2−1} (1 − t)^{N−s_2+r_2} / B(s_2 − r_2, N − s_2 + r_2 + 1).   (D9)

From this immediately follows

  P{A_2(L_1, U_1, L_2, U_2) > γ} = 1 − B(γ; s_2 − r_2, N − s_2 + r_2 + 1)/B(s_2 − r_2, N − s_2 + r_2 + 1) = 1 − I(γ; s_2 − r_2, N − s_2 + r_2 + 1).

This completes Step 1.

Step 2. Now we generalize the above result to p > 2. Let us assume that the unknown joint probability distribution of the output variables y_1, …, y_p is given by

  G(y_1, …, y_p) = ∫_{−∞}^{y_1} ··· ∫_{−∞}^{y_p} g(v_1, …, v_p) dv_1 ··· dv_p.

Our task is to derive the probability distribution of the random variable

  A_p(L_1, U_1, …, L_p, U_p) = ∫_{L_1}^{U_1} ··· ∫_{L_p}^{U_p} g(y_1, …, y_p) dy_1 ··· dy_p,

which is a p-fold integral over the p output variables. We introduce an intermediate quantity, in which a definite integral over the first i variables is involved and the rest of the variables are integrated over the (−∞, +∞) range:

  A_i(L_1, U_1, …, L_i, U_i) = ∫_{−∞}^{+∞} dy_p ··· ∫_{−∞}^{+∞} dy_{i+1} ∫_{L_i}^{U_i} dy_i ··· ∫_{L_1}^{U_1} dy_1 g(y_1, …, y_p),

and

  f_i(y_i | L_1, U_1, …, L_{i−1}, U_{i−1}) = (1/A_{i−1}) ∫_{L_{i−1}}^{U_{i−1}} dy_{i−1} ··· ∫_{L_1}^{U_1} dy_1 g(y_1, …, y_p),

which is the random density of the variable y_i under the condition that L_j ≤ y_j ≤ U_j, j = 1, …, i − 1. As we did in Eq. (D4), we introduce the random probability measure associated with the condition that the first (i − 1) output variables lie in the intervals assigned to them by [L_j, U_j], j = 1, …, i − 1:

  C_i = C_i(L_i, U_i | L_1, U_1, …, L_{i−1}, U_{i−1}) = ∫_{L_i}^{U_i} f_i(y_i | L_1, U_1, …, L_{i−1}, U_{i−1}) dy_i.

The above defined A_i's obey the recursion

  A_{i+1} = C_{i+1} A_i.   (D10)

Lemma 2. The probability of finding A_i in the interval [t_i, t_i + dt_i] is given by

  P{t_i ≤ A_i ≤ t_i + dt_i} = t_i^{s_i−r_i−1} (1 − t_i)^{N−s_i+r_i} dt_i / B(s_i − r_i, N − s_i + r_i + 1).   (D11)

Proof of Lemma 2. Eq. (D11) is certainly true for i = 1, 2, because

  A_1 = A_0 C_1 = C_1 = ∫_{L_1}^{U_1} g_1(y_1) dy_1
and

  A_2 = C_2 A_1 = C_2(L_2, U_2 | L_1, U_1) A_1(L_1, U_1) = ∫_{L_2}^{U_2} ∫_{L_1}^{U_1} g(y_1, y_2) dy_1 dy_2.

Now we assume that Eq. (D11) is true for i = j and show that it is true also for i = j + 1. Note that A_j and C_{j+1} are statistically independent, because

  P{t_{j+1} ≤ C_{j+1} ≤ t_{j+1} + dt_{j+1}} = t_{j+1}^{s_{j+1}−r_{j+1}−1} (1 − t_{j+1})^{s_j−r_j−1−s_{j+1}+r_{j+1}} dt_{j+1} / B(s_{j+1} − r_{j+1}, s_j − r_j − s_{j+1} + r_{j+1})

does not involve the quantities L_j, U_j, …, L_1, U_1, which occur in A_j. The joint density function of A_j and C_{j+1} takes the form of the joint density function of A_1 and C_2 in the case p = 2. Hence the density function of A_j C_{j+1} = A_{j+1} is obtainable from Eq. (D9) by substituting r_{j+1} for r_2 and s_{j+1} for s_2, i.e.

  P{t_{j+1} ≤ A_{j+1} ≤ t_{j+1} + dt_{j+1}} = t_{j+1}^{s_{j+1}−r_{j+1}−1} (1 − t_{j+1})^{N−s_{j+1}+r_{j+1}} dt_{j+1} / B(s_{j+1} − r_{j+1}, N − s_{j+1} + r_{j+1} + 1).

Hence Eq. (D11) is proven for i = 1, 2, …, p. This completes Step 2. □

Furthermore, the density function of A_p is given by

  P{t ≤ A_p ≤ t + dt} = w_{A_p}(t) dt = t^{s_p−r_p−1} (1 − t)^{N−s_p+r_p} dt / B(s_p − r_p, N − s_p + r_p + 1).
It is interesting to note that the density function of A_p does not depend on the integers r_1, s_1, …, r_{p−1}, s_{p−1}. This completes the proof of Theorem 4. □

References

[1] US Nuclear Regulatory Commission. Standard review plan for the review of safety analysis reports for nuclear power plants. Report NUREG-800; July 1981.
[2] Sorensen JM, editor. The reactor analysis support package (RASP). Report EPRI NP-4489, vol. 1. Campbell, CA: S. Levy; 1986.
[3] Gandini A. Uncertainty analysis and experimental data transposition methods based on perturbation theory. In: Ronen Y, editor. Handbook of uncertainty analysis. Boca Raton, FL: CRC Press; 1988.
[4] Szabados L, Baranyai G, Guba A. PMK-2 handbook. Technical specifications of the Hungarian integral test facility for VVER-440/213 safety analysis. Budapest: KFKI Atomic Energy Research Institute; 1996.
[5] Austregesilio H, Dellenbeck H, editors. ATHLET Mod 1.2 Cycle A programmers manual, vol. 1. GRS; 1998.
[6] Kendall M, Stuart A. The advanced theory of statistics, vol. 2. London: Griffin; 1979.
[7] Guidelines for best estimate approach to accident analysis of WWER nuclear power plants. Report WWER-SC-133. Vienna: IAEA; April 1996.
[8] Wulff W, Boyack BE, Catton I, Duffey RB, Griffith P, Katsma KR, Lellouche GS, Levy S, Rohatgi US. Quantifying reactor safety margins. Part 3. Assessment and ranging of parameters. Nucl Engng Des 1990;119:33.
[9] Glaeser HG. Uncertainty evaluation of thermal-hydraulic code results. Proc Int Meet Best Estimate Methods in Nuclear Installations Safety Analysis (BE2000), Washington, DC, USA; November 2000.
[10] Chojnacki E, Ounsy A. The IPSN method for uncertainty and sensitivity analysis and the associated software: SUNSET. Proc ICONE 4, Louisiana: ASME; 1996.
[11] de Crécy A. Determination of the uncertainties of the constitutive relationships of the CATHARE-2 code. M&C 2001, Salt Lake City; September 2001.
[12] Glaeser HG, et al. Uncertainty analysis of a post-experiment calculation in thermal hydraulics. Reliab Engng Syst Safety 1994;45:19.
[13] Marsden JE, McCracken M. The Hopf bifurcation and its applications. Berlin: Springer; 1976.
[14] Sattinger D. Group theoretic methods in bifurcation theory. Berlin: Springer; 1979.
[15] Vladimirov VS. Equations of mathematical physics. New York: Marcel Dekker; 1971. Chapter I, Section 4.
[16] Makai M. Chaos and safety in nuclear reactors. Proc Tenth Symp AER 2000;2:945.
[17] Guba A, Trosztel I. Uncertainty analysis of a PMK-2 pressurizer surge line middle-size break experiment. Budapest: KFKI Atomic Energy Research Institute; 2000.
[18] Wilks SS. Ann Math Stat 1941;12:91.
[19] Wilks SS. Ann Math Stat 1942;13:400.
[20] Wald A. Ann Math Stat 1943;14:45.
[21] Wald A. Ann Math Stat 1946;17:208.
[22] Robbins H. Ann Math Stat 1944;15:214.
[23] Krzykacz B. EQUUS—a computer program for the derivation of empirical uncertainty statements of results from large computer models. GRS-A-1720; 1990.
[24] Pál L. Fundamentals of probability theory and statistics, vols. I and II. Budapest: Akadémiai Kiadó; 1995. In Hungarian.