ARTICLE IN PRESS
Reliability Engineering and System Safety 93 (2008) 1060–1071 www.elsevier.com/locate/ress
A general first-order global sensitivity analysis method Chonggang Xu, George Zdzislaw Gertner Department of Natural Resources and Environmental Sciences, University of Illinois, 1102 South Goodwin Avenue, Urbana, IL 61801, USA Received 27 December 2006; received in revised form 12 April 2007; accepted 14 April 2007 Available online 24 April 2007
Abstract Fourier amplitude sensitivity test (FAST) is one of the most popular global sensitivity analysis techniques. The main mechanism of FAST is to assign each parameter with a characteristic frequency through a search function. Then, for a specific parameter, the variance contribution can be singled out of the model output by the characteristic frequency. Although FAST has been widely applied, there are two limitations: (1) the aliasing effect among parameters by using integer characteristic frequencies and (2) the suitability for only models with independent parameters. In this paper, we synthesize the improvement to overcome the aliasing effect limitation [Tarantola S, Gatelli D, Mara TA. Random balance designs for the estimation of first order global sensitivity indices. Reliab Eng Syst Safety 2006; 91(6):717–27] and the improvement to overcome the independence limitation [Xu C, Gertner G. Extending a global sensitivity analysis technique to models with correlated parameters. Comput Stat Data Anal 2007, accepted for publication]. In this way, FAST can be a general first-order global sensitivity analysis method for linear/nonlinear models with as many correlated/uncorrelated parameters as the user specifies. We apply the general FAST to four test cases with correlated parameters. The results show that the sensitivity indices derived by the general FAST are in good agreement with the sensitivity indices derived by the correlation ratio method, which is a nonparametric method for models with correlated parameters. r 2007 Elsevier Ltd. All rights reserved. Keywords: Correlated parameters; Fourier Amplitude Sensitivity Test; Random balance design; Sensitivity analysis; Uncertainty analysis
1. Introduction Identification and representation of uncertainty are recognized as essential components in model application [1–6]. In the study of uncertainty, we need to know how much uncertainty there is in the model output (uncertainty analysis) and where the uncertainty comes from (sensitivity analysis). There are two groups of sensitivity analyses: local sensitivity analysis and global sensitivity analysis [7]. The local sensitivity analysis examines the local response of the output(s) by varying input parameters one at a time while holding other parameters at central values. The global sensitivity analysis examines the global response (averaged over the variation of all the parameters) of model output(s) by exploring a finite (or even an infinite) region. The local sensitivity analysis is easy to implement. However, local Corresponding author. Tel.: +1 217 333 9346; fax: +1 217 244 3219.
E-mail address:
[email protected] (G.Z. Gertner). 0951-8320/$ - see front matter r 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.ress.2007.04.001
sensitivity analysis can only inspect one point at a time and the sensitivity index of a specific parameter is dependent on the central values of the other parameters. Thus, more studies currently are using global sensitivity analysis methods instead of local sensitivity analysis. Many global sensitivity analysis techniques are now available [7,8], such as Fourier Amplitude Sensitivity Test (FAST) [8–15]; the design of experiments method [16–20]; regression-based methods [5,21–24]; Sobol’s method [25]; McKay’s one-way ANOVA method [26]; and moment independent approaches [27–29]. One of the most popular global sensitivity analysis techniques is FAST. The theory of FAST was first proposed by Cukier et al. in 1970s [10–12,15]. The main mechanism of FAST is to assign each parameter with a characteristic frequency through a search function. Then, for a specific parameter, the variance contribution can be singled out of the model output by the characteristic integer frequency. Koda et al. [9] and McRae et al. [13] provided the computational codes for FAST. FAST is computationally efficient and can be used for
ARTICLE IN PRESS C. Xu, G.Z. Gertner / Reliability Engineering and System Safety 93 (2008) 1060–1071
nonlinear and non-monotonic models. Thus, it has been widely applied in sensitivity analysis of different models, such as chemical reaction models [10–12,14,15,30]; atmospheric models [31–33]; nuclear waste disposal models [34]; soil erosion models [35]; and hydrological models [36]. Although FAST has been widely applied, there are two limitations. First, there will be an aliasing effect among parameters when using integer characteristic frequencies. In order to avoid the aliasing effect, large sample sizes are needed when there are many candidate parameters. Second, FAST can only be applied for models with independent parameters [32]. However, in many cases, the parameters are correlated with one another. For example, in meteorology, the central pressure of the storm is correlated with the radius of the maximum wind [3]. Recently, there have been two improvements aimed at overcoming the aliasing and independency limitation. The aliasing effect limitation can be circumvented by using a random balance design [1]. For the independence limitation, it can be overcome by reordering [2]. In this paper, we propose to synthesize the two improvements using both a random balance design and reordering. In this way, FAST can be a general first-order global sensitivity analysis method for linear/nonlinear models with as many correlated/uncorrelated parameters as the user specifies. Section 2.1 is a brief review of the traditional FAST method. We introduce the improvement to overcome the independence limitation in Section 2.2 and the improvement to overcome the aliasing effect limitation in Section 2.3. In Section 2.4, we propose to synthesize the two improvements described in Sections 2.2 and 2.3. We introduce a general rule to select the maximum harmonic order for the synthesized method in Section 2.5 and provide a detailed procedure for the proposed method in Section 2.6. Then, we provide four test cases in Section 3. In Section 4, we discuss the random error effect which is introduced by the random balance design and compare our proposed method with other sensitivity analysis methods for models with correlated parameters. Finally, we summarize the main contribution of this paper in Section 5. 2. Method 2.1. Review of FAST The main idea of FAST is to introduce for all parameters a search function with a characteristic integer frequency. Through the search functions, the model output becomes a periodic function. Fourier analysis is then performed on the model outputs to derive the Fourier spectrum. Finally, the first-order sensitivity index of each parameter is derived from the Fourier spectrum based on the characteristic frequency. We consider a computer model Y ¼ f ðx1 ; x2 ; . . . ; xn Þ, where n is the number of independent parameters and the
1061
domain of independent parameters is the hypercube oxi oxðMaxÞ ; i ¼ 1; . . . ; nÞ, On ¼ ðX jxðMinÞ i i xðMinÞ i
(1)
xðMaxÞ i
where and is the minimum and maximum value for xi. In FAST, a search function is introduced for each parameter to explore the space On: 1 1 1 þ arcsinðsinðoi sÞÞ ; ppspp, xi ¼ F i (2) 2 p where oi is the characteristic frequency for xi and F 1 is the i inverse cumulative distribution function (ICDF) for xi [34]. s is the common variable for all parameters. The search function aims to sample the parameter space according to an expected probability density function and lets the parameter xi to oscillate periodically at the corresponding frequency oi. Consequently the model output is a periodic function of s. If the oi’s are positive integers, the period T is 2p [15]. Thus, we can expand the model output with a Fourier series Y ¼ f ðx1 ; x2 ; . . . ; xn Þ þ1 X ¼ f ðsÞ ¼ A0 þ fAk cosðksÞ þ Bk sinðksÞg,
ð3Þ
k¼1
where the Fourier coefficients Ak and Bk are defined as Z 1 p A0 ¼ f ðsÞ ds, 2p p Z p 1 Ak ¼ f ðsÞ cosðksÞ ds; p p Z p 1 Bk ¼ f ðsÞ sinðksÞ ds; ð4Þ p p where k 2 Z ¼ f1; . . . ; þ1g. For real problems, we need to use discrete sampling to get the Fourier coefficients of Eq. (4). We denote the sample for s as S ¼ fs1 ; s2 ; . . . ; sj ; . . . ; sN g,
(5)
where sj ¼ p þ p=N þ ð2p=NÞðj 1Þ; 8j ¼ 1; 2; . . . ; N. The search function of Eq. (2) is then applied to each sample element from S to get the sampled values for each parameter: @i ¼ fxi1 ; xi2 ; . . . ; xij ; . . . ; xiN g,
(6)
1 F 1 i ð2
þ ð1=pÞ arcsinðsinðoi sj ÞÞÞ. Then the model where xij ¼ is run N times on sample values for each parameter. Finally, through the sample S, the discretized Fourier coefficients can be calculated as follows: A0 ¼
N 1X f ðsj Þ, N j¼1
Ak ¼
N 2X f ðsj Þ cosðsj kÞ, N j¼1
Bk ¼
N 2X f ðsj Þ sinðsj kÞ, N j¼1
where A0 is equal to the sample mean of Y.
ð7Þ
ARTICLE IN PRESS C. Xu, G.Z. Gertner / Reliability Engineering and System Safety 93 (2008) 1060–1071
1062
Using the discrete sample, we can decompose the variance of the model output as follows: ðN1Þ=2 1 X V¼ ðA2k þ B2k Þ. 2 k¼1
(8)
We define the spectrum of the Fourier series as Lk ¼ 12 ðA2k þ B2k Þ ðk 2 ZÞ. By summing the spectrum values for the characteristic frequency oi and its higher order harmonics poi (p is a positive integer), the partial variance in model output arising from the uncertainty of parameter xi, Vi, can be estimated by X Lpoi , (9) Vi ¼ p
where p ¼ f1; 2; . . .g and poi pðN 1Þ=2. By summing all the values of the spectrum, we can derived the total variance X Lk . (10) V¼ k
The ratio Vi/V measures the contribution of parameter xi to the total variance of response variable Y. This ratio is also termed the first-order sensitivity index based on the Sobol’s definition of variance-based sensitivity indices [25]. Using the FAST method, it is also possible to get the higher order sensitivity indices and the interaction effects among parameters [37]. However, in this paper, we mainly focus on the first-order sensitivity indices. Based on the lottery setting one for sensitivity analysis as defined by Saltelli and Tarantola [38], the parameter with a higher sensitivity index would lead to the greater reduction in the variance of Y if the parameter is fixed and has a smaller expected variance of Y given xi, E(Var(Y|xi)). Alternatively, if the parameter xi has a higher sensitivity index, there would be a higher variance in the expected value of Y given xi (Var(E(Y|xi))), given that VarðY Þ ¼ EðVarðY jxi ÞÞ þ VarðEðY jxi ÞÞ.
(11)
Thus, the partial variance Vi can also be represented by Var(E(Y|xi)). There will be an aliasing effect if one particular frequency can be expressed as the product of an integer and another frequency. For example, if o1 ¼ 3o2 , then Lo1 ¼ L3o2 , which means that there is interference between V1 and V2 at an order of three. Schaibly and Shuler [14] defined a frequency set to be free of interferences to an order M if n X i¼1 n X
ai oi a0, jai jpM þ 1,
ð12Þ
i¼1
where ai is an integer and M is an integer of our choice. M is the maximum value for the p in calculating Vi in Eq. (9). Since poi pðN 1Þ=2, in order to calculate Vi from its characteristic frequency oi and its Mth harmonics Moi, the
minimum sample size N should be [37] N ¼ 2Momax þ 1,
(13)
where omax is the largest frequency among the frequency set for all parameters. For details about how to select the frequency free of interference to an order of four, please refer to Cukier et al. [10].
2.2. Improvement to overcome independency limitation In the conventional FAST, the sampled parameters are mutually independent. The variance contribution by a specific parameter is only from the variation of the parameter itself. However, for models with correlated parameters, the variance contribution by a specific parameter is not only from the variation of parameter itself, but also from the correlated variations of other parameters [39]. If we can let the characteristic frequency of a parameter capture not only the variations of parameter itself, but also the correlated variations of other parameters, then the variance contribution determined by this characteristic frequency will include the variance contribution from both the parameter itself and its correlation with other parameters. Xu and Gertner proposed reordering the independent parameter values ð@i ¼ fxi1 ; xi2 ; . . . ; xij ; . . . ; xiN gÞ so that the sampled parameter values honor the specified correlation structure [2]. The reordered parameter values are as follows: ðrÞ ðrÞ ðrÞ ðrÞ @ðrÞ i ¼ fxi1 ; xi2 ; . . . ; xim ; . . . ; xiN g,
(14)
where m is the sample order after reordering. In this paper, we use Iman and Conover’s method [40] to reorder the sampled parameter values @i . The main mechanism of Iman and Conover’s method is to reorder the independent sample according to the order of a correlated multivariate normal sample. After we get the correlated sample by Iman and Conover’s reordering method, the model is then run on the reordered parameter values to get the model output values: ðrÞ ðrÞ Y m ¼ f ðxðrÞ 1m ; . . . ; xim ; . . . ; xnm Þ;
m ¼ 1; . . . ; N.
(15)
Finally, for a specific parameter of interest, the FAST analysis is applied on the model output values according to the original sample order (i.e., order before applying the random reordering or order of j in Eq. (6)), thus restoring the characteristic frequency for the parameter. Through the reordering, the other parameters correlated with the parameter of interest will also partially honor the characteristic frequency of the parameter of interest. Thus, the variance contribution determined by this characteristic frequency in the FAST will include the variance contribution from both the parameter itself and its correlation with other parameters.
ARTICLE IN PRESS C. Xu, G.Z. Gertner / Reliability Engineering and System Safety 93 (2008) 1060–1071
2.3. Improvement to overcome aliasing limitation In order to avoid the aliasing effect by using integer characteristic frequencies, Tarantola et al. [1] proposed using a common frequency for all parameters and permuting the sample S ¼ fs1 ; s2 ; . . . ; sj ; . . . ; sN g. Each parameter has a different random permutation of S: ðiÞ S ðiÞ ¼ fs1ðiÞ ; s2ðiÞ ; . . . ; sðiÞ l ; . . . ; sN g,
(16)
where l is the sample order after permutation and i indicates the permuted sample for xi. Through the search function using Eq. (2), we get the permuted parameter values for each parameter. Then the model is run on the permuted parameter values for N times: ðiÞ ðnÞ Y l ¼ f ðx1 ðsð1Þ l Þ; . . . ; xi ðsl Þ; . . . ; xn ðsl ÞÞ.
(17)
Finally, FAST analysis is applied on the model output values according to the original sample order (i.e., order before applying the permutation) for a specific parameter of interest.
Specifically, there will be a different spectrum for each parameter. For parameter xi, the spectrum ðLðiÞ k Þ is as follows: 2
By reordering the independent sample to honor specified correlation structures among parameters, the Xu and Gertner’s method described in Section 2.2 overcomes the independence limitation, but is still limited by the aliasing effect. By using a common characteristic frequency and random balance design, the Tarantola et al.’s method described in Section 2.3 overcomes the aliasing effect limitation, but does not take into consideration the correlation among parameters. In order to avoid the aliasing effect and also take into consideration the correlation among parameters, we propose (1) using a common frequency for all parameters to sample parameter values through the search function using Eq. (2) (based on Tarantola et al.’s method [1]); and (2) reordering the sampled parameter values so that the parameter values honor the specified correlation structure (based on Xu and Gertner’s method [2]). The difference between this synthesized method and Xu and Gertner’s method is that the synthesized method uses a common characteristic frequency for all parameters. In this way, we can avoid the aliasing effect. The difference between the synthesized method and the Tarantola et al.’s method is that the synthesized method reorders parameter values @i to honor the correlation structure, whereas in the Tarantola et al.’s method, the parameter values @i are only randomly permuted through the sample S. The reordered parameter sample is similar to that in Eq. (14). The model is then run on the reordered sample to obtain the sampled output values. Finally, FAST analysis is applied on the model output values according to the original sample order (i.e., order before applying the random reordering or order of j in Eq. (6)) for a specific parameter of interest.
2
ðiÞ ðiÞ 1 LðiÞ k ¼ 2ðAk þ Bk Þ;
k ¼ 1; 2; . . . ; ðN 1Þ=2,
(18)
where Ak(i) and Bk(i) are the Fourier coefficient of the model output based on the original sample order of xi. Since all parameters have the common characteristic frequency (o), we define the amplitudes at the characteristic frequency and its higher harmonics as characteristic spectrum. For example, for xi, the characteristic spectrum ðCLðiÞ p Þ is as follows: ðiÞ CLðiÞ p ¼ Lpo ;
p ¼ 1; 2; . . . and popðN 1Þ=2.
(19)
For the selection of the maximum harmonic order p when calculating the partial variance contributed by each parameter in Eq. (9) (see Section 2.5 for details), we also derive the characteristic spectrum scaled by the output variance SCLðiÞ p ¼
2.4. Synthesis of improvements
1063
LðiÞ po
; p ¼ 1; 2; . . . and popðN 1Þ=2, (20) V P where V ¼ k LðiÞ k . Finally, the variance contribution (or sensitivity index) of xi can also be calculated as follows:
Si ¼
pmax X
SCLðiÞ po ;
(21)
p¼1
where pmax is the maximum harmonic order for the sensitivity index calculation. Generally we would recommend that pmax be four or six. For a more refined selection of pmax, please refer to Section 2.5. Since we are not limited by the aliasing effect, the sample size is no longer needed to be determined by Eq. (13). However, in order to distinguish the variance contribution using random balance design, we should have a relatively large sample size ðNb2pmax omax þ 1Þ. In order to be symmetric, we would recommend an odd sample size. 2.5. Maximum harmonic order selection By using a random balance design, we can circumvent the limitation of a harmonic order p (ppM, where M is generally four). However, the random balance design may introduce random error. Thus, we should not specify a large harmonic order p when calculating the sensitivity indices using Eq. (21). pmax should be selected so that the random error has a small effect on the sensitivity indices. We assume that the low characteristic amplitudes at high harmonic order are more susceptible to the random error than relatively high characteristic amplitudes at a low harmonic order. To reduce the random error effect, we recommend that the maximum harmonic order pmax be the harmonics order at which the scaled characteristic spectrum begins to stabilize at a small value. We need to point out that, due to the random error contamination, pmax may be misleading when based on
ARTICLE IN PRESS C. Xu, G.Z. Gertner / Reliability Engineering and System Safety 93 (2008) 1060–1071
1064
only one FAST sample. We would recommend checking this through several replications to obtain the best value of pmax. For example, for the parameter x3 in test case 2 in Section 3.2, the scaled characteristic spectrum begins to stabilize to a small value (o0.009) at p ¼ 4 (or frequency ¼ 92). So we selected the pmax as 4 for x3 in this test case (Fig. 1). For low sensitivity parameters, the characteristic spectrum may not show a decreasing pattern with increasing harmonic order. Instead, it may be irregular due to the random error contamination (Fig. 2). In this situation, we would recommend a low pmax (e.g., 1 or 2). Generally, we need a relatively small pmax for linear effect parameters, but large pmax for nonlinear effect parameters. However, the pmax also depends on the distribution of the parameters since we use the search function based on Eq. (2). If the ICDF function is highly
0.25
Amplitude
0.20
nonlinear, then even if the parameters have a linear effect, we may still need a large pmax. For example, for parameter x1 in test case 1 in Section 3.1, the characteristic spectrum begins to stabilize to a small value (o0.005) at p ¼ 14 (or frequency ¼ 322) (Fig. 3). So we set the pmax at 14 for test case 1. The large pmax is due to the nonlinearity of the ICDF for the normal distribution. For our four test cases in Section 3, the selected maximum harmonic orders are shown in Table 1. 2.6. Procedure for synthesized FAST Based on Sections 2.2–2.6, the synthesized FAST can be implemented as follows (Fig. 4): (1) Obtain a sample for the common variable s, S ¼ fs1 ; s2 ; . . . ; sj ; . . . ; sN g, where sj ¼ p þ p=N þ ð2p=NÞðj 1Þ; 8j ¼ 1; 2; . . . ; N and N is user-defined odd sample size. (2) A common characteristic frequency o is assigned to each parameter based on the search function of Eq. (2). The sample values of each parameter ð@i Þ are obtained by applying the search function on sample S.
0.15
0.10
0.10 0.08
0.00 0
50
100 150 Frequency
200
250
Fig. 1. Scaled characteristic spectrum of parameter x3 for test case 2 in Section 3 for 10 replicates.
Amplitude
0.05 0.06
0.04
0.02
0.00 100
200
400
500
Fig. 3. Scaled characteristic spectrum of parameter x1 for test case 1 in Section 3 for 10 replicates. We did not plot the spectrum before frequency 60 for better illustration.
0.015
Amplitude
300 Frequency
0.020
Table 1 Synthesized FAST specification for test cases
0.010
Model
Characteristic frequency
Sample size
Maximum harmonic order
Test case 1 Test case 2
23 23
921 461
Test case 3
23
461
Test case 4
23
461
14 4 for 2 for 4 for 2 for 4 for 2 for
0.005
0.000 0
50
100
150
200
250
Frequency
Fig. 2. Scaled characteristic spectrum of parameter x5 for test case 2 in Section 3 for 10 replicates.
x1–x4 others x1–x3 others x2–x5 others
ARTICLE IN PRESS C. Xu, G.Z. Gertner / Reliability Engineering and System Safety 93 (2008) 1060–1071
1065
Fig. 4. Illustration of the synthesized FAST procedure for models with correlated parameters. x1 and x2 are uniformly distributed (lower bound ¼ 0 and upper bound ¼ 1) with a correlation coefficient of 0.7. The model is Y ¼ x1 þ x2 . The common characteristic frequency o ¼ 2. Order1 is the original sample order for x1 and order2 is the original sample order for x2. See Section 2.6 for explanations of steps from (1) to (6).
(3) Iman and Conover’s method is employed to reorder the sample values within @i (i ¼ 1,y,n) to capture the correlation structure among parameters [40]. (4) The model is run on the reordered sample values @ðrÞ i (i ¼ 1,y,n) and N model output values can be derived. (5) For each parameter, the output values are shifted according to the parameter’s original sample order (i.e., the order before applying Iman and Conover’s method), thus restoring the characteristic frequency for the parameter. (6) For each parameter, we check the characteristic spectrum and select the harmonic order pmax at which the scaled characteristic spectrum begins to stabilize at a small value. Based on the selected pmax, Eq. (21) is used to obtain the sensitivity index for each parameter. 3. Applications In this section, we apply the synthesized FAST method (for notational convenience, it is termed as SFAST) to four test cases with correlated parameters. Test case 1 is a linear model, test case 2 is a nonlinear model, test case 3 is a non-monotonic model, and test case 4 is a real application model. The characteristic frequency and sample size for each test case is shown in Table 1. Test case 1–3 is based on 10, 50 and 100 replications, while test case 4 is based on only 10 replications. For the common characteristic frequency o, we selected relatively
large o in order to exhaustively explore the parameter space. However, too large a o would result in too large of a sample size N ðNb2pmax omax þ 1Þ. In this paper, we select o as 23. Since it is not easy to obtain the analytical indices of parameters in the presence of correlations, we also conduct the sensitivity analysis with the correlation ratio method (CRM), which is used as a reference to test the reliability of the proposed SFAST method. The CRM was proposed by Saltelli based on McKay’s one-way ANOVA method [41,42]. This method is a non-parametric method and is suitable for nonlinear and non-monotonic models. However, it is based on the replicated Latin hypercube samples and thus requires a large sample size. In this paper, the CRM is based on the 100 replications with each replicate having a sample size of 500 (a total of 50 000 model runs to obtain the sensitivity indices for all parameters). 3.1. Test case 1 We first use a very simple model Y ¼ 2x1+3x2 where x1 and x2 are standard normally distributed with a Pearson correlation coefficient of 0.7. The variance of model output is V ¼ VarðY Þ ¼ 4 Varðx1 Þ þ 9 Varðx2 Þ þ 12 Covðx1 ; x2 Þ ¼ 21:4.
ð22Þ
ARTICLE IN PRESS C. Xu, G.Z. Gertner / Reliability Engineering and System Safety 93 (2008) 1060–1071
1066
1.0
Sensitivity
0.8 0.6 0.4 0.2 0.0 x1
x2 Parameter
Fig. 5. Sensitivity indices for test case 1 using SFAST with 10, 50 and 100 replications. The cross over each bar represents the standard errors. The circles are the results based on the correlation ratio method (CRM).
According to the lottery setting one proposed by Saltelli and Tarantola [38], the partial variance due to x1 is V 1 ¼ VarfE½ð2x1 þ 3x2 Þjx1 g ¼ 4 Varðx1 Þ þ 9 Var½Eðx2 jx1 Þ þ 12 Covðx1 ; x2 Þ ¼ 4 Varðx1 Þ þ 9 0:72 Varðx2 Þ þ 12 Covðx1 ; x2 Þ ¼ 16:81.
ð23Þ
Similarly, the partial variance due to x2 is
uniform distribution with a lower limit of 0 and an upper limit of 5. A more practical measure of correlation among parameters with non-normal distributions is the rank correlation coefficient [40,43]. The rank correlation coefficient describes the rank dependency among variables. For this case, we assume a rank correlation structure as follows: 3 2 1 0 0 0 0 0 0 0 0 0 6 0 1 0:3 0:7 0 0 0 0 0 07 7 6 7 6 6 0 0:3 1 0:4 0 0 0 0 0 07 7 6 6 0 0:7 0:4 1 0 0 0 0 0 07 7 6 7 6 60 0 0 0 1 0 0 0 0 07 7. 6 (26) 60 0 0 0 0 1 0 0 0 07 7 6 7 6 60 0 0 0 0 0 1 0:4 0 0 7 7 6 60 0 0 0 0 0 0:4 1 0 0 7 7 6 7 6 40 0 0 0 0 0 0 0 1 05 0 0 0 0 0 0 0 0 0 1 The results show that the sensitivity indices calculated by SFAST is in close agreement with those by CRM (Fig. 6). The results also show that, due to the correlation, the sensitivity of x1 becomes less than x2 and x3. We need to point out that, in this test case, there are interactions among x1, x2 and x3. In order to quantify the interaction effect, the reader can refer to the extended FAST proposed by Saltelli et al. [37].
V 2 ¼ VarfE½ð2x1 þ 3x2 Þjx2 g ¼ 4 0:72 Varðx1 Þ þ 9 Varðx2 Þ þ 12 Covðx1 ; x2 Þ ¼ 19:36.
3.3. Test case 3 ð24Þ
Based on Eqs. (22)–(24), we can analytically obtain the sensitivity indices for x1 and x2. The results show that the sensitivity indices of x1 and x2 derived by SFAST with 10, 50 and 100 replications are in good agreement with the CRM and analytical sensitivity indices (Fig. 5).
The G-function of Sobol’ is a popular non-monotonic test function that has been used for assessing global sensitivity analysis methods [1,37,44,45]. The mathematical form of the G-function of Sobol’ is as follows: f ¼
n Y
gi ðxi Þ,
(27)
i¼1
3.2. Test case 2
where n is the number of candidate parameters and
We use a test case introduced by Lu and Mohanty [34], which is represented through an explicit function as follows (for notational convenience, we term it the L-function): y¼
10 X
ai ðxi þ x3i Þ þ 50ðx1 x2 x3 Þ2 ,
(25)
i¼1
where a1 ¼ 100, a2 ¼ 80, a3 ¼ 60, a4 ¼ 40, a5 ¼ 20, a6 ¼ 0.1, a7 ¼ 0.08, a8 ¼ 0.06, a9 ¼ 0.02, a10 ¼ 0.01. All the parameters have nonlinear effects (represented by the cubic terms), while x1, x2 and x3 have an interaction effect (represented by the product of x1, x2 and x3). If all xi are of the same distribution and are independent, it is clear that the effect of xiþ1 is always smaller than xi, i ¼ 1,2,y,9. In this test case, we assume that every parameter has a
gi ðxi Þ ¼
j4xi 2j þ ai 1 þ ai
for 0pxi p1 and ai X0.
(28)
The analytical partial variances (Vi) and the total variance (V) of the model output are as follows: 8 1 > > ; > Vi ¼ < 3ð1 þ ai Þ2 (29) n Q > > ðV þ 1Þ 1: V ¼ > i : i¼1
The lower is the value of ai, the higher the partial variance contribution. In this case, we set ai ¼ f0; 1; 4:5; 9; 99; 99; 99; 99g, which indicates the decreasing order of importance [44]. We assume a correlation structure
ARTICLE IN PRESS C. Xu, G.Z. Gertner / Reliability Engineering and System Safety 93 (2008) 1060–1071
1067
0.35
0.30
Sensitivity
0.25
0.20
0.15
0.10
0.05
0.00
x1
x2
x3
x4
x5
x6
x7
x8
x9
x10
Parameter Fig. 6. Sensitivity indices for test case 2 using SFAST with 10, 50 and 100 replications. The cross over each bar represents the standard errors. The circles are the results based on the correlation ratio method (CRM).
0.8
Sensitivity
0.6
0.4
0.2
0.0 x1
x2
x3
x4
x5
x6
x7
x8
Parameter Fig. 7. Sensitivity indices for test case 3 using SFAST with 10, 50 and 100 replications. The cross over each bar represents the standard errors. The circles are the results based on the correlation ratio method (CRM).
Table 2 Parameter specifications for uniform distributions for World3 model Parameter
Label
Lower bound
Upper bound
x1 x2 x3 x4 x5 x6 x7
Industrial output per capita desired Industrial capital output ratio before 1995 Fraction of industrial output allocated to consumption before 1995 Fraction of industrial output allocated to consumption after 1995 Average life of industrial capital before 1995 Average life of industrial capital after 1995 Initial industrial capital
315 2.7 0.387 0.387 12.6 16.2 1.89(10+11)
385 3.3 0.473 0.473 15.4 19.8 2.31(10+11)
ARTICLE IN PRESS C. Xu, G.Z. Gertner / Reliability Engineering and System Safety 93 (2008) 1060–1071
as follows: 2 1 0 0:6 6 0 1 0 6 6 6 0:6 0 1 6 6 0 0 0 6 6 6 0 0:7 0 6 6 0 0 0 6 6 4 0 0 0 0
0
0
0 0
0 0:7
0 0
0 1
0 0
0 0
0
1
0
0 0
0 0
1 0:4
0
0
0
Earth [46]. The model consists of six main systems: food system, agriculture system, industrial system, population system, non-renewable resources system and the pollution system.
3
0 0
0 07 7 7 0 07 7 0 07 7 7. 0 07 7 0:4 0 7 7 7 1 05 0
(30) 0.6 x2 x3 x4 x5
0.5
1
0.4
Results show that the sensitivity indices calculated by SFAST are in good agreement with those from the CRM (Fig. 7). Results also show that the sensitivity index of x3 becomes larger than x2 due to the correlation between x3 and x1, which should be less than x2 under parameter independence.
Sensivity
1068
0.3 0.2 0.1 0.0
3.4. Test case 4 1900
For the last test case, we test a fairly complex real model: World3. The World3 model is a computer program for simulating the interactions between population, industrial growth, food production and limits in the ecosystems of the
1950
x2
x3 SFAST CRM
Sensitivity
Sensitivity
0.3 0.2
SFAST CRM
0.5 0.4 0.3 0.2
0.1
1950
2000 Year
2050
0.1 1900
2100
1950
x4
2000 Year
2050
2100
x5 0.18 SFAST CRM
SFAST CRM
0.16 0.14 Sensitivity
Sensitivity
2100
0.6
0.4
0.22 0.20 0.18 0.16 0.14 0.12 0.10 0.08 0.06 0.04 0.02 0.00 1900
2050
Fig. 9. Sensitivity indices of World3 model using FAST under the assumption of parameter independences. Excluding x4, all other parameters not plotted have sensitivity indices less than 0.02.
0.5
0.0 1900
2000 Year
0.12 0.10 0.08 0.06 0.04 0.02
1950
2000 Year
2050
2100
0.00 1900
1950
2000 Year
2050
2100
Fig. 8. Sensitivity indices for test case 4 using SFAST and the correlation method (CRM). The vertical line over SFAST sensitivity is the standard error based on 10 replications.
ARTICLE IN PRESS C. Xu, G.Z. Gertner / Reliability Engineering and System Safety 93 (2008) 1060–1071
In this test case, we are concerned with the industrial system, which can provide the products for world population [46]. At the same time, this industrial system creates pollution, which reduces the land productivity (for details, please refer to [46]). The seven parameters of interest in the test case are shown in Table 2. The output of interest is the world human population on an annual basis. For real application models, the parameter range, distribution and correlation among parameters can be derived from empirical or historical data [3,47]. However, for the simplification of our test case, we assume a uniform distribution for each parameter. The bounds for each parameter are assumed to be a 10% deviation of the central value (Table 2). We assume there is a positive rank correlation of 0.6 between x3 and x4 and a positive rank correlation of 0.4 between x5 and x6. The results show that there is good agreement between the sensitivity indices calculated with the SFAST method and the CRM (Fig. 8). For x2, x3 and x4, after year 2030, the sensitivity indices calculated by SFAST are slightly higher than those by CRM, which may be due to the random error (please refer to Section 4.1 for details). In the results, we do not show the parameters with small sensitivity indices (o0.02) in view of the fact that small numbers are very susceptible to random error (please refer to Section 4.1 for details). The results when all the parameters are assumed to be independent are shown (Fig. 9). The parameter x4, which was an important variable when there was correlation between x3 and x4 (Fig. 8), becomes a trivial effect parameter under the assumption of independence. This illustrates the importance of incorporating the correlation structure in the sensitivity analysis. However, the parameter x5 is almost the same for the correlated and independent cases. This is because x5 and x6 has a relatively small contribution to the variance in model output and the correlation between x5 and x6 does not have much of an effect on the sensitivities of x5 and x6. 4. Discussion 4.1. Random error effect By using the reordered randomized balance design, we get around the aliasing effect limitation. However, the random design will also introduce random error. The random error will contaminate the periodic model output. Thus, the spectrum value at high order harmonics may not result from the parameters, but from the random error (see Fig. 1 for better understanding). Random error can exaggerate the FAST sensitivity especially when we calculate the variance contribution (or sensitivity indices) using Eq. (21) with a large pmax. In order to reduce the random error contamination, we should select an appropriate maximum harmonic order pmax as specified in Section 2.5. We may use a relatively
1069
large sample size for FAST. More replications may be another choice. But in our test cases, additional replications were found not to have much of an effect on the sensitivity outputs. We need to point out that the random error will not have much of an effect on the sensitivity indices for parameters with large variance contribution. In global sensitivity analysis, it is necessary to estimate accurately the sensitivity indices for the most important parameters but not for trivial parameters. Thus, the random error effect should not have much affect on our proposed FAST. 4.2. FAST for models with correlated parameters In this paper we synthesized two improvements to FAST so that FAST can be used as a general first-order global sensitivity method for linear/nonlinear models with as many correlated/uncorrelated parameters as the user specifies. It is attractive in that FAST can be applied for models with correlated parameters and also not be limited by the aliasing effects. Up to now, only a few studies have been conducted to obtain the sensitivity index for models with correlated input. Iman et al. [3,48] proposed the partial correlation as a measure of parameter sensitivity for models with correlated input based on the LHS. Xu and Gertner [39] proposed a regression-based method to derive the correlated contribution (by variations of parameter correlated with other parameters) and the uncorrelated contribution (by variations of parameter uncorrelated with other parameters). Both the partial correlation method and Xu and Gertner’s regression-based method rely on the assumption that the parameter effects are approximately linear. However, for complex models, it can be expected that parameter effects are nonlinear. Bedford [49] proposed the Gram–Schmidt orthogonalization to obtain orthogonal parameter input. However, the parameter sensitivity index is dependent on the order of the parameter orthogonalization. Fang et al. [50] proposed sequential sampling to approximate the approximation to the differential sensitivity index. Saltelli et al. [41,42] proposed the CRM based on McKay’s method. However, both Fang et al.’s method and the CRM require a large sample size which would be impractical for complex models. In addition, Fang et al.’s method is only valid for normally distributed parameters. The synthesized FAST method is computationally efficient and can be used for non-linear and non-monotonic models. Thus, it would be a good choice for uncertainty and sensitivity analysis for models with correlated parameters. 5. Conclusion For real problems, it would be common for the parameters of the model to be mutually correlated. This paper synthesizes two recent improvements in FAST to overcome the aliasing effect and the independence limitation; and proposes a general first-order global sensitivity
ARTICLE IN PRESS 1070
C. Xu, G.Z. Gertner / Reliability Engineering and System Safety 93 (2008) 1060–1071
approach for linear/nonlinear models with as many correlated/uncorrelated parameters as the user specifies. The main mechanism of the synthesized FAST is to reorder the FAST parameter sample (based on one common characteristic frequency) to honor a specified correlation structure so that it can be used for models with correlated parameters. By using a common characteristic frequency for all parameters, the synthesized FAST method is not limited by the aliasing effect. Thus, it is computationally efficient and would be a good choice for uncertainty and sensitivity analysis for models with correlated parameters. Acknowledgments U.S. Army Corps of Engineers Construction Engineering Research Laboratory (CERL) and U.S. Department of Agriculture McIntire-Stennis funds were used to support this study. We thank three anonymous reviewers for their helpful comments which greatly improved this paper. References [1] Tarantola S, Gatelli D, Mara TA. Random balance designs for the estimation of first order global sensitivity indices. Reliab Eng Syst Safety 2006;91(6):717–27. [2] Xu C, Gertner G. Extending a global sensitivity analysis technique to models with correlated parameters. Comput Stat Data Anal 2007, accepted for publication, doi:10.1016/j.csda.2007.04.003. [3] Iman RL, Johnson ME, Schroeder TA. Assessing hurricane effects. Part 1. Sensitivity analysis. Reliab Eng Syst Safety 2002;78(2):131–45. [4] Iman RL, Johnson ME, Schroeder TA. Assessing hurricane effects. Part 2. Uncertainty analysis. Reliab Eng Syst Safety 2002;78(2): 147–55. [5] Helton JC, Davis FJ, Johnson JD. A comparison of uncertainty and sensitivity analysis results obtained with random and Latin hypercube sampling. Reliab Eng Syst Safety 2005;89(3):305–30. [6] Saltelli A, Tarantola S, Campolongo F. Sensitivity analysis as an ingredient of modeling. Stat Sci 2000;15(4):377–95. [7] Saltelli A, Chan K, Scott M. Sensitivity analysis. Probability and statistics series. West Sussex: Wiley; 2000. [8] Saltelli A, Ratto M, Tarantola S, Campolongo F. Sensitivity analysis for chemical models. Chem Rev 2005;105(7):2811–26. [9] Koda M, McRae GJ, Seinfeld JH. Automatic sensitivity analysis of kinetic mechanisms. Int J Chem Kinet 1979;11(4):427–44. [10] Cukier RI, Schaibly JH, Shuler KE. Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients. III. Analysis of the approximations. J Chem Phys 1975;63(3):1140–9. [11] Cukier RI, Levine HB, Shuler KE. Nonlinear sensitivity analysis of multiparameter model systems. J Comput Phys 1978;26(1):1–42. [12] Cukier RI, Levine HB, Shuler KE. Nonlinear sensitivity analysis of multiparameter model systems. J Phys Chem 1977;81(25):2365–6. [13] McRae GJ, Tilden JW, Seinfeld JH. Global sensitivity analysis—a computational implementation of the Fourier Amplitude Sensitivity Test (FAST). Comput Chem Eng 1982;6(1):15–25. [14] Schaibly JH, Shuler KE. Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients. II. Applications. J Chem Phys 1973;59(8):3879–88. [15] Cukier RI, Fortuin CM, Shuler KE, Petschek AG, Schaibly JH. Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients. I. Theory. J Chem Phys 1973;59(8):3873–8. [16] Saltelli A, Andres TH, Homma T. Sensitivity analysis of model output: performance of the iterated fractional factorial design method. Comput Stat Data Anal 1995;20(4):387–407.
[17] Henderson-Sellers B, Henderson-Sellers A. Sensitivity evaluation of environmental models using fractional factorial experimentation. Ecol Modelling 1996;86(2–3):291–5. [18] Cryer SA, Havens PL. Regional sensitivity analysis using a fractional factorial method for the USDA model GLEAMS. Environ Modelling Software 1999;14(6):613–24. [19] Beres DL, Hawkins DM. Plackett–Burman techniques for sensitivity analysis of many-parametered models. Ecol Modelling 2001; 141(1–3):171–83. [20] Morris MD. Factorial sampling plans for preliminary computational experiments. Technometrics 1991;33(2):161–74. [21] Helton JC, Davis FJ. Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems. Reliab Eng Syst Safety 2003;81(1):23–69. [22] Helton JC, Davis FJ. Illustration of sampling-based methods for uncertainty and sensitivity analysis. Risk Anal 2002;22(3): 591–622. [23] Helton JC. Uncertainty and sensitivity analysis techniques for use in performance assessment for radioactive waste disposal. Reliab Eng Syst Safety 1993;42(2–3):327–67. [24] Helton JC, Johnson JD, Sallaberry CJ, Storlie CB. Survey of sampling-based methods for uncertainty and sensitivity analysis. Reliab Eng Syst Safety 2006;91(10–11):1175–209. [25] Sobol’ IM. Sensitivity estimates for nonlinear mathematical models. Math Modeling Comput Exp 1993;1(4):407–14. [26] McKay MD. Nonparametric variance-based methods of assessing uncertainty importance. Reliab Eng Syst Safety 1997;57(3): 267–79. [27] Chun MH, Han SJ, Tak NI. An uncertainty importance measure using a distance metric for the change in a cumulative distribution function. Reliab Eng Syst Safety 2000;70(3):313–21. [28] Borgonovo E. Measuring uncertainty importance: investigation and comparison of alternative approaches. Risk Anal 2006;26(5): 1349–61. [29] Park CK, Ahn KI. A new approach for measuring uncertainty importance and distributional sensitivity in probabilistic safety assessment. Reliab Eng Syst Safety 1994;46:253–61. [30] Haaker MPR, Verheijen PJT. Local and global sensitivity analysis for a reactor design with parameter uncertainty. Chem Eng Res Des 2004;82(A5):591–8. [31] Collins DC, Avissar R. An evaluation with the Fourier Amplitude Sensitivity Test (FAST) of which land-surface parameters are of greatest importance in atmospheric modeling. J Climate 1994;7(5):681–703. [32] Rodriguez-Camino E, Avissar R. Comparison of three land-surface schemes with the Fourier Amplitude Sensitivity Test (FAST). Tellus Series A—Dyn Meteorol Oceanogr 1998;50(3):313–32. [33] Kioutsioukis I, Tarantola S, Saltelli A, Gatelli D. Uncertainty and global sensitivity analysis of road transport emission estimates. Atmos Environ 2004;38(38):6609–20. [34] Lu Y, Mohanty S. Sensitivity analysis of a complex, proposed geologic waste disposal system using the Fourier Amplitude Sensitivity Test method. Reliab Eng Syst Safety 2001;72(3):275–91. [35] Wang G, Fang S, Shinkareva S, Gertner GZ, Anderson A. Uncertainty propagation and error budgets in spatial prediction of topographical factor for revised universal soil loss equation (RUSLE). Trans Am Soc Agric Eng 2001;45(1):109–18. [36] Francos A, Elorza FJ, Bouraoui F, Bidoglio G, Galbiati L. Sensitivity analysis of distributed environmental simulation models: understanding the model behaviour in hydrological studies at the catchment scale. Reliab Eng Syst Safety 2003;79(2):205–18. [37] Saltelli A, Tarantola S, Chan KPS. A quantitative model-independent method for global sensitivity analysis of model output. Technometrics 1999;41(1):39–56. [38] Saltelli A, Tarantola S. On the relative importance of input factors in mathematical models: safety assessment for nuclear waste disposal. J Am Stat Assoc 2002;97(459):702–9.
ARTICLE IN PRESS C. Xu, G.Z. Gertner / Reliability Engineering and System Safety 93 (2008) 1060–1071 [39] Xu, C, Gertner, GZ. Uncertainty and sensitivity analysis for models with correlated parameters. Reliab Eng Syst Safety 2007; in review. [40] Iman RL, Conover WJ. A distribution-free approach to inducing rank correlation among input variables. Commun Stat—Simulation Comput 1982;11(3):311–34. [41] Saltelli A, Ratto M, Tarantola S. Model-free importance indicators for dependent input. In: Proceedings of SAMO 2001, third international symposium on sensitivity analysis of model output, Madrid, 2001. [42] Saltelli A, Tarantola S, Campolongo F, Ratto M. Sensitivity analysis in practice: a guide to assessing scientific models. West Sussex: Wiley; 2004. [43] Iman RL, Davenport JM. Rank correlation plots for use with correlated input variables. Commun Stat—Simulation Comput 1982;11(3):335–60. [44] Borgonovo E, Apostolakis GE, Tarantola S, Saltelli A. Comparison of global sensitivity analysis techniques and importance measures in PSA. Reliab Eng Syst Safety 2003;79:175–85.
1071
[45] Saltelli A, Sobol’ IM. About the use of rank transformation in sensitivity analysis of model output. Reliab Eng Syst Safety 1997;50(3):225–39. [46] Meadows DH, Meadows DL, Randers J. Beyond the limits. Post Mills, Vermont: Chelsea Green Publishing Company; 1992. [47] Kanso A, Chebbo G, Tassin B. Application of MCMC-GSA model calibration method to urban runoff quality modeling. Reliab Eng Syst Safety 2006;91(10–11):1398–405. [48] Iman RL, Shortencarier MJ, Jonson JD. A FORTRAN 77 and user’s guide for calculation of partial correlation function and standardized coefficient. In: NUREG/CR-4122, SAND85-0044, Albuquerque, NM: Sandia National Laboratories; 1985. [49] Bedford T. Sensitivity indices for (Tree)-dependent variables. In: Proceedings of the second international symposium on sensitivity analysis of model output, Venice, Italy, 1998. [50] Fang S, Gertner GZ, Anderson A. Estimation of sensitivity coefficients of nonlinear model input parameters which have a multinormal distribution. Comput Phys Commun 2004;157(1):9–16.