Computers and Structures 67 (1998) 47–51
Seismic hazard analysis: How to measure uncertainty?
G. Grandori *, E. Guagenti, A. Tagliani
Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
Abstract
Applications of probabilistic seismic hazard analysis demand the adoption of a model (i.e. of the forms of a certain number of correlations and probabilistic distributions) and the estimate of the parameters of the model. As a measure of uncertainty in the calculation of the expected value of a given quantity (for instance the peak ground acceleration corresponding to a given return period at a given site), a coefficient of variation is frequently adopted, which is intended to include uncertainties due to both the choice of the model and the estimate of parameters. The following three statements are illustrated in this paper: (1) in theory, the use of a coefficient of variation, when uncertainties in modeling are involved, is not correct; (2) in practice, the aforesaid use can lead to unreliable results; and (3) the analysis of uncertainties can be carried out in a more satisfactory way if uncertainties in modeling and uncertainties in the estimate of parameters are considered separately and with different approaches. © 1998 Elsevier Science Ltd. All rights reserved.
* Author to whom all correspondence should be addressed.
0045-7949/98/$19.00 © 1998 Elsevier Science Ltd. All rights reserved. PII: S0045-7949(97)00155-7

1. Introduction

It is assumed that seismic hazard throughout a region is described by: (1) the probabilistic distribution of events in space and time; (2) the probabilistic distribution of magnitudes; and (3) an attenuation law, i.e. the probabilistic distribution of a ground motion component as a function of magnitude, source-site distance and local ground conditions. The forms of these distributions constitute a "model". The numerical coefficients that quantitatively define the distributions are the "parameters" of the model. A model, with given values of its parameters, describes what will be called here, in a general sense, an "earthquake process".

In the applications of probabilistic seismic hazard analysis, it is generally assumed that, for the considered region, an actual earthquake process does exist, with probabilistic characteristics that remained constant during the period of observation and will remain the same in the period involved in our predictions. In other words, we assume that there is a "truth", defined by the "true" probabilistic distributions of the above-mentioned quantities. The analysis of available data leads to the adoption of a model and to the estimate of its parameters, which together constitute an approximation of the truth.

As a measure of uncertainty in the calculation of a given quantity (for instance the value A of the peak ground acceleration corresponding to a given return period at a given site), a coefficient of variation is frequently adopted, which is intended to include uncertainties due to both the choice of the model and the estimate of parameters. The following three statements are illustrated in this paper: (1) In theory, the use of a coefficient of variation, when uncertainties in modelling are involved, is not correct. (2) In practice, the aforesaid use can lead to unreliable results. (3) The analysis of uncertainties can be carried out in a more satisfactory way if uncertainties in modelling and uncertainties in the estimate of parameters are considered separately, with different approaches.

The first statement can be simply supported by considering that the results obtained with different models
do not constitute a sample of a random variable of which mean value and variance could be estimated. In particular, supposing that n different models are used, so that n values are obtained for the quantity A, we can by no means calculate the probability that the "true" value is included in the range defined by those n values (or in any other range). Moreover, the dispersion of the n values depends on the subjective choice of possible models, and so does the mean value. In conclusion, when different models are proposed for the interpretation of reality (i.e. as an approximation of the "truth"), the problem of uncertainty cannot be formulated in terms of mean value and coefficient of variation. The correct question is: which one of the proposed models leads to the value of A "nearest to the truth"?

As regards the second statement (about the reliability of practical results), many examples of applied probabilistic seismic hazard analysis show that, starting from the same data base, the results obtained with different models can vary by a factor of 10 (the word "model" is used here in a broad sense and includes the specific procedure and/or simplifications adopted in the applications). In these conditions it does not make sense to rely on a mean value. It is worthwhile to quote two examples.

Cornell [1]: "three independent consulting teams recently conducted a seismic hazard analysis of the Diablo Canyon (California) Nuclear Power Plant site; one of the three results estimated annual probabilities more than one order of magnitude higher than the other two and the other two results, although apparently quite similar (at least when plotted on log–log paper), had two, counteracting, differences, either of which by itself would have led to significant numerical discrepancies between the two studies".

Krinitzsky [2]: "Okrent (1975) engaged seven experts to give probabilistic estimations at eleven nuclear power plants sites.
The experts were provided with the same basic information. They provided probabilistic motions at recurrence rates of $10^{-4}$/year and $10^{-6}$/year ten of eleven sites have accelerations that vary by factors of 8–10". Krinitzsky concluded that "probabilistic seismic hazard analysis, when based on multiple expert opinions, is intrinsically unreliable". In particular, Krinitzsky writes in the abstract: "procedures that statistically merge multiple expert opinions to get probabilistic seismic hazard evaluations are intrinsically defective and should not be used for design applications in engineering". In the above-mentioned cases, the data base was the same for all experts, so that the dispersion of results is mainly due to the use of different models.

Let us now introduce the discussion of the third statement. In order to separate the two kinds of uncertainty, we will assume in a first step that the model is correct and that uncertainties derive only from the finite number of data available on which to base estimates of the parameters of the model. As we shall see, in the frame of this step it is possible and appropriate to define a coefficient of variation of the quantity under consideration. However, in many cases this coefficient is not by itself a meaningful index of the uncertainties deriving from the uncertain estimate of parameters. In a second and independent step a method is proposed for the comparison between the reliability of different models, when applied to a specific seismic region.
2. If the model is correct

2.1. General

As anticipated in Section 1, we assume that for the considered region a "true" earthquake process does exist: it is defined by the "true" model and by the "true" values of the parameters. The hypothesis "if the model is correct" means that we know (or we had the luck to divine) the true model, while for the parameters we have to rely on the available data. These data constitute a sample of observations regarding the true process: uncertainties in the results of hazard analysis depend on the "dimensions" of the sample (number of events contained in the catalogue, number of events with strong-motion data leading to an estimate of the parameters of the attenuation law, ...).

To be clearer: with the same dimensions as those of the sample actually available, the true process could deliver to us an infinity of samples, each one leading to a set of parameters and hence to a value $A_s$ of the quantity A under consideration. The values $A_s$ together define the sampling distribution of A. This distribution is representative of uncertainties due to the estimate of parameters. The coefficient of variation of $A_s$ is obviously of particular interest; however, attention should be paid to the fact that the mean value of $A_s$ may or may not coincide with the true value of A.

Without loss of generality of the method, the problem of the above-mentioned uncertainties can be discussed with reference to a specific "truth", so that numerical results also become available. The "truth" is deliberately chosen to be very simple, as follows. Earthquakes occur along a fault and are identified by the location of the epicenter and by magnitude (independent of location). The fault length is 400 km. All locations are equally probable for a new earthquake, independently of the location of the previous ones.
Earthquake occurrence follows a stationary Poissonian process, with mean number of events per year $\lambda = 0.1$. The probabilistic distribution of magnitude is given by:

$$F_M(m) = \mathrm{Prob}(M \le m) = 1 - \exp[\exp(\beta m_0) - \exp(\beta m)], \qquad (1)$$

with

$$m_0 = 5, \quad \beta = 0.20. \qquad (2)$$

The expected value A of the peak horizontal ground acceleration at a site depends on M and on the distance R between the site and the epicenter:

$$A = A(M, R). \qquad (3)$$

Eq. (3) has the form and the numerical values of the coefficients $b_i$ proposed by Donovan and Bornstein [3]. Given M and R, the acceleration is a random variable defined as follows:

$$A_\varepsilon(M, R) = A(M, R)\,\varepsilon, \qquad (4)$$

$\varepsilon$ being a mean-one random variable, log-normally distributed, with standard deviation $\sigma_\varepsilon = 0.5$.

2.2. Influence of the attenuation law

In order to isolate uncertainties due to the attenuation law from other uncertainties, let us keep the values of $\lambda$ and $\beta$ constant for the moment, so that uncertainties derive only from the estimate of the parameters $b_i$ and $\sigma_\varepsilon$. We concentrate our attention on the quantity a(500), defined as the peak ground acceleration corresponding to a 500-year return period at a site.

Uncertainties in the estimate of $\sigma_\varepsilon$ do not in general have a great influence on the calculation of a(500). First, because if the data base is not extremely poor, the value of $\sigma_\varepsilon$ can be estimated with good accuracy. Second, because possible errors in the estimate of $\sigma_\varepsilon$ lead to modest variations of a(500), as shown for instance in Table 1 for a site at a distance of 20 km from the fault.
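As an illustration of the random factor in Eq. (4), a mean-one log-normal variable can be parameterised as sketched below. The sketch assumes that the quoted standard deviation (0.5) is that of the factor itself rather than of its logarithm, which the text does not specify; the function names are hypothetical.

```python
import math
import random
import statistics


def lognormal_params(sd, mean=1.0):
    """Return (mu, s) of ln(eps) so that eps has the given mean and standard deviation."""
    s2 = math.log(1.0 + (sd / mean) ** 2)   # variance of ln(eps)
    mu = math.log(mean) - 0.5 * s2          # enforces E[eps] = mean
    return mu, math.sqrt(s2)


def sample_epsilon(n, sd=0.5, seed=0):
    """Draw n values of the mean-one factor (assumption: sd is the std of eps itself)."""
    rng = random.Random(seed)
    mu, s = lognormal_params(sd)
    return [rng.lognormvariate(mu, s) for _ in range(n)]


if __name__ == "__main__":
    eps = sample_epsilon(100_000)
    # Sample mean and standard deviation should be close to 1 and 0.5.
    print(statistics.fmean(eps), statistics.pstdev(eps))
```

Under the alternative reading (0.5 as the standard deviation of the logarithm), only `lognormal_params` would change.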
The calculation has been carried out keeping $\lambda$ and the coefficients $b_i$ constant, all of them at their "true" values, while for $\beta$ several values have been considered in order to explore the influence of this feature of the truth on the sensitivity of a(500) to possible errors in the estimate of $\sigma_\varepsilon$. Note that, depending on $\beta$, the value of a(500) may increase or decrease with increasing $\sigma_\varepsilon$.

Table 1
Values of a(500)/g at a distance of 20 km for different values of $\sigma_\varepsilon$ and $\beta$

                 $\sigma_\varepsilon$   $\beta = 0.24$   $\beta = 0.22$   $\beta = 0.20$   $\beta = 0.18$
                 0.3                    0.14             0.20             0.41             0.98
Truth            0.5                    0.17             0.22             0.39             0.92
                 0.7                    0.20             0.25             0.37             0.84
Max error %      40                     17               14               5                9

Table 1 exemplifies the influence of the parameter $\sigma_\varepsilon$, which represents the dispersion of the acceleration $A_\varepsilon$ about the mean A. However, uncertainty in the estimate of the coefficients $b_i$, i.e. of the mean A, should also be considered. The estimate of the coefficients $b_i$ is based on the data collected on the occasion of a certain number of earthquakes that occurred in the zone. Let us call this set of data an "attenuation sample" and $A_\eta(M, R)$ the estimate of A(M, R) obtained from this sample. It is difficult to define, and even more so to calculate, the sampling distribution of $A_\eta$. As a rough approximation of the uncertainty derived from the sampling distribution we can assume:

$$A_\eta(M, R) = A(M, R)\,\eta, \qquad (5)$$

where $\eta$ is a mean-one random variable with standard deviation $\sigma_\eta$.

It is important to point out the different origins, and hence the different roles, of the variables $\varepsilon$ and $\eta$ and of their standard deviations $\sigma_\varepsilon$ and $\sigma_\eta$. The first one describes the probabilistic distribution (in the true process) of acceleration about the mean. The second one describes the dispersion of the sampling distribution of the mean. As a consequence, if $\sigma_\eta = 0$, any given $\sigma_\varepsilon$ leads to a precise value of a(500); only uncertainties in the estimate of $\sigma_\varepsilon$ involve (modest) uncertainties in the calculation of a(500). On the contrary, if $\sigma_\eta \ne 0$, independently of $\sigma_\varepsilon$ the sampling distribution of $A_\eta$, and hence of a(500), will have a coefficient of variation cv = $\sigma_\eta$.

As far as numerical values of $\sigma_\eta$ are concerned, an indication can be obtained with reference to the case in which the data are in terms of macroseismic intensity and the intensity decay is given by the formula [4]:

$$I_0 - I = a + b \ln R + cR, \qquad (6)$$
where $I_0$ is the epicentral intensity and $I$ is the local intensity at distance R. In this case a single earthquake (if well documented) allows us to obtain the coefficients a, b and c, i.e. to define the attenuation law. An analysis of Italian data for earthquakes of the Irpinia zone leads to $\sigma_\eta = 0.3$ if each single earthquake is assumed as an attenuation sample. Obviously, if the attenuation sample is made of $n > 1$ earthquakes, the coefficient of variation of the sampling distribution decreases rapidly with increasing n.

In conclusion, if the model is correct, the uncertainties due to the attenuation law do not seem to be dramatic if the available attenuation sample includes, say, more than 3 or 4 earthquakes.

2.3. Influence of $\lambda$ and $\beta$

Now we consider the "true" earthquake process previously defined, with the simplification $\sigma_\varepsilon = \sigma_\eta = 0$ (deterministic attenuation law), in order to isolate the uncertainties due to the uncertain estimate of $\lambda$ and $\beta$. By using the true probabilistic distributions with the true values of $\lambda$ and $\beta$, it is possible to produce a very long earthquake catalogue (say 100,000 years) that follows the distributions of the truth. From this catalogue we can draw as many samples as we want with given dimensions and, from each sample, estimate $\lambda$ and $\beta$ and calculate a(500) at the considered site. We obtain in this way the sampling distribution of a(500), its mean value and its coefficient of variation.

Table 2 shows the results when the true process is defined by $\lambda = 0.1$, $\beta = 0.2$ and with three hypotheses about the length of the period of observation on which the estimate of $\lambda$ and $\beta$ is based: 100, 200 and 300 years. The results of Table 2 show that, as anticipated in Section 2.1, the mean value of the sampling distribution may be different from the truth. Moreover, the coefficient of variation may be rather high. These two indicators of uncertainty obviously depend on the features of the model.
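The resampling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' computation: it generates a synthetic catalogue from the Poisson process and the magnitude distribution of Eq. (1), cuts it into observation windows, and examines the sampling distribution of the estimated rate and magnitude parameter only. The maximum-likelihood estimator, the bisection bounds and all function names are assumptions introduced here. The final step to a(500) needs the attenuation law of Eq. (3), whose coefficients are not reproduced in the text, and is therefore omitted.

```python
import math
import random
import statistics

LAMBDA_TRUE = 0.1   # mean number of events per year (true process)
BETA_TRUE = 0.20    # magnitude parameter of Eq. (1)
M0 = 5.0            # lower magnitude bound of Eq. (2)


def sample_magnitude(rng, beta=BETA_TRUE, m0=M0):
    """Invert Eq. (1): m = ln(exp(beta*m0) - ln(1 - u)) / beta, u uniform in [0, 1)."""
    u = rng.random()
    return math.log(math.exp(beta * m0) - math.log(1.0 - u)) / beta


def make_catalogue(years, rng):
    """Poisson occurrences over `years` years as (time, magnitude) pairs."""
    events, t = [], rng.expovariate(LAMBDA_TRUE)
    while t < years:
        events.append((t, sample_magnitude(rng)))
        t += rng.expovariate(LAMBDA_TRUE)
    return events


def estimate_beta(mags, m0=M0, lo=1e-4, hi=3.0):
    """Assumed estimator: maximum likelihood, by bisection on the decreasing score."""
    n, s = len(mags), sum(mags)

    def score(b):
        return (n / b + s + n * m0 * math.exp(b * m0)
                - sum(m * math.exp(b * m) for m in mags))

    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sampling_distribution(window_years, total_years=100_000, seed=0):
    """Cut one long catalogue into windows and estimate the parameters in each."""
    rng = random.Random(seed)
    n_windows = int(total_years // window_years)
    buckets = [[] for _ in range(n_windows)]
    for t, m in make_catalogue(total_years, rng):
        k = int(t // window_years)
        if k < n_windows:
            buckets[k].append(m)
    lam_hat = [len(b) / window_years for b in buckets]
    beta_hat = [estimate_beta(b) for b in buckets if b]  # skip empty windows
    return lam_hat, beta_hat


def mean_and_cv(values):
    mu = statistics.fmean(values)
    return mu, statistics.pstdev(values) / mu


if __name__ == "__main__":
    for years in (100, 200, 300):
        lam_hat, beta_hat = sampling_distribution(years)
        print(years, mean_and_cv(lam_hat), mean_and_cv(beta_hat))
```

As a check on the sketch, the number of events in T years is Poisson with mean equal to the rate times T, so the coefficient of variation of the estimated rate should be close to the inverse square root of that mean, i.e. about 0.32 for 100 years of observation with a rate of 0.1 events per year.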
In particular, for instance, if the true value of $\beta$ is larger than 0.2, all other conditions remaining constant, the coefficients of variation of the sampling distribution are smaller than those indicated in Table 2.

Table 2
Mean value ($\mu$) and coefficient of variation (cv) of the sampling distribution of a(500)/g for a site at 20 km from the fault

         Truth    100 years observation    200 years observation    300 years observation
$\mu$    0.414    0.504                    0.475                    0.464
cv       –        0.922                    0.692                    0.53

3. Comparing models

Suppose that a certain number of models are proposed for the interpretation of the earthquake process of a given region. As we outlined before, the question is: which one of the proposed models leads to the best results? The simple comparison between the results obtained with different models is not very helpful because, now, we do not know the "truth". A completely satisfactory answer cannot be given in this case; however, an approximate judgement can be reached in the following way.

The experimental data lead to a certain number of point-values for each probabilistic distribution function involved in the definition of the earthquake process. We assume that these point-values, simply derived from the experimental sample of data, are "true" values, in the sense that if an infinite number of data were available they would confirm our point-values. Then we complete the information contained in the point-values by introducing the minimum necessary number of hypotheses: namely a criterion for the interpolation between point-values and the limit values of random variables. In this way the characteristics of an earthquake process are numerically defined. It obviously differs from the true process because the probabilistic distributions are derived from a limited sample (and with the aid of a few hypotheses). However, if the experimental data are not extremely poor, the main aspects of the truth are approximately present in the process derived from the data, without the introduction of the set of hypotheses about the form of the probabilistic distributions that constitute a model.

If, for instance, the data refer to the Italian Irpinia region, the process obtained in the way described above could be called an "Irpinia-type" process. From this process (considered as the "truth") we can derive as many random samples as we want with the same dimensions as the original sample of experimental data. For each of the prospective models it is now possible to estimate the parameters by using many samples and so to obtain a mean value and a coefficient of variation of the desired quantity.
By comparison with the "Irpinia-type truth", the most reliable model for the interpretation of the Irpinia-type earthquake process can be selected with high confidence. What we suggest, as an approximate judgement, is to assume that the most reliable model for the Irpinia-type process is also the most reliable one for the real (unknown) Irpinia process. This assumption obviously involves some uncertainties. However, these uncertainties, too, can be numerically analyzed.
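The comparison procedure of this section can be sketched as follows, under several assumptions that are not in the text. The data-derived process is taken to be a piecewise-linear distribution through the plotting positions of the data, with assumed limit values. The quantity compared is a magnitude quantile rather than a(500). The two candidate models (a shifted exponential and the form of Eq. (1)), their maximum-likelihood fits, the synthetic stand-in data and all function names are hypothetical.

```python
import bisect
import math
import random
import statistics

M0 = 5.0  # lower magnitude bound, as in Eq. (2)


def empirical_quantile_fn(data, lower, upper):
    """Data-derived process: linear interpolation between point-values and limit values."""
    xs = [lower] + sorted(data) + [upper]
    n = len(data)
    ps = [i / (n + 1) for i in range(n + 2)]  # 0, 1/(n+1), ..., 1

    def quantile(p):
        k = min(bisect.bisect_right(ps, p) - 1, n)
        w = (p - ps[k]) / (ps[k + 1] - ps[k])
        return xs[k] + w * (xs[k + 1] - xs[k])

    return quantile


def fit_exponential(mags):
    """Candidate model 1: F(m) = 1 - exp(-b (m - M0)); returns its quantile function."""
    b = 1.0 / (statistics.fmean(mags) - M0)
    return lambda p: M0 - math.log(1.0 - p) / b


def fit_double_exponential(mags, lo=1e-4, hi=3.0):
    """Candidate model 2: the form of Eq. (1), fitted by bisection on the score."""
    n, s = len(mags), sum(mags)

    def score(b):
        return (n / b + s + n * M0 * math.exp(b * M0)
                - sum(m * math.exp(b * m) for m in mags))

    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if score(mid) > 0.0 else (lo, mid)
    beta = 0.5 * (lo + hi)
    return lambda p: math.log(math.exp(beta * M0) - math.log(1.0 - p)) / beta


def compare_models(data, models, p=0.9, lower=M0, upper=None, n_rep=300, seed=1):
    """Resample the data-derived process and compare each model with its 'truth'."""
    upper = max(data) + 0.5 if upper is None else upper  # assumed limit value
    rng = random.Random(seed)
    truth_q = empirical_quantile_fn(data, lower, upper)
    truth = truth_q(p)
    samples = [[truth_q(rng.random()) for _ in data] for _ in range(n_rep)]
    report = {}
    for name, fit in models.items():
        values = [fit(sample)(p) for sample in samples]
        mean = statistics.fmean(values)
        report[name] = {"mean": mean,
                        "cv": statistics.pstdev(values) / mean,
                        "rel_error": (mean - truth) / truth}
    return truth, report


if __name__ == "__main__":
    rng = random.Random(0)
    data = [M0 + rng.expovariate(2.0) for _ in range(40)]  # synthetic stand-in data
    models = {"exponential": fit_exponential,
              "double exponential": fit_double_exponential}
    truth, report = compare_models(data, models)
    print("data-derived value:", round(truth, 3))
    for name, result in report.items():
        print(name, result)
```

The model whose mean lies closest to the data-derived value, with an acceptable coefficient of variation, would be the one retained in the sense suggested above.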
References
[1] Cornell, A., Probabilistic Seismic Hazard Analysis: A 1980 Assessment. 7WCEE, Istanbul, 1980.
[2] Krinitzsky, E., Earthquake probability in engineering. Part 1. The use and misuse of expert opinion. Engineering Geology 1993;33:257–88.
[3] Donovan, N. C. and Bornstein, A. E., Uncertainties in seismic risk procedures. J. Geotech. Eng. Div., Soc. Civil Engrs. 1978;104:869–887.
[4] Howell, B. F. and Schultz, T. R., Attenuation of modified Mercalli intensity with distance from the epicenter. Bull. Seismol. Soc. Am. 1975;65:651–665.