Extreme value prediction of snow avalanche runout

Extreme value prediction of snow avalanche runout

Cold Regions Science and Technology, 19 ( 1991 ) 163-175 Elsevier Science Publishers B.V., Amsterdam 163 Extreme value prediction of snow avalanche ...

944KB Sizes 0 Downloads 59 Views

Cold Regions Science and Technology, 19 ( 1991 ) 163-175 Elsevier Science Publishers B.V., Amsterdam

163

Extreme value prediction of snow avalanche runout D.M. McClung ~ and A.I. Mears b alnstitutefor Research in Construction, National Research Council Canada, 3650 WesbrookMall, Vancouver,B.C. V6S 2L2, Canada bNatural Hazards Consultant, 222 East GothicAvenue, Gunnison, CO 81230, USA

(Received February 21, 1990; accepted after revision June 19, 1990)

ABSTRACT McClung, D.M. and Meats, A.I., 1991. Extreme value prediction of snow avalanche runout. ColdReg. Sci. Teehnol., 19: 163-175. Avalanche runout distances have traditionally been calculated by selecting friction coefficients and then using them in an avalanche dynamics model. Uncertainties about the mechanical properties of flowing snow and its interaction with terrain make this method speculative. Here, an alternative simple method of predicting runout based on terrain variables is documented. By fitting runout data from five mountain ranges to extreme value distributions, we are able to show how ( and why) extreme value parameters vary with terrain properties of different ranges. The method is shown to be applicable to small and truncated data sets which makes it attractive for use in situations where detailed information on avalanche runout is limited.

Introduction When decisions must be made about definition o f safe areas and placement o f structures in snow avalanche terrain, estimation of avalanche stopping positions is the primary consideration. Speed measurements show that large avalanches maintain high speed for long distances and then they slow down very rapidly at the end o f the runout zone. Therefore, if structures are built, high impact forces can be expected unless construction is limited to locations near the end of the runout zone. Given all the uncertainties about avalanche velocities, decelerations and impact pressures, the best policy is to avoid threatened areas. In spite o f the need for high precision in defining safe areas, data show (e.g., Lied and Bakkeh6i, 1980) that runout distances have a statistical character. When the runout distance cannot be found in the field or when reliable long-term data are unavailable or discontinuous, we feel that statistical prediction (based upon terrain features) represents

0165-232X/91/$03.50

the only rational quantitative method for determining runout distances. The alternative (traditional) approach for calculating runout is based upon attempts to specify friction coefficients in an avalanche dynamics model. Uncertainties about flowing snow and its interaction with terrain, make this method highly speculative. The pioneering efforts in statistical rnnout prediction (Boris and Mears, 1976; Lied and Bakkeh6i, 1980; Bakkeh6i et at., 1983) were based on regression analyses o f topographic parameters. However, McClung et at. (1989) analyzed data from four different ranges and concluded that the data obey a Gumbel (extreme value) distribution. We believe this approach is superior and have employed it here. In this paper, a simple method is documented to fit runout data to an extreme value distribution. The method yields extreme value parameters which are largely independent o f sample size (number o f avalanche paths) except for the (unavoidable) dependence o f the mean and standard deviation of the

© 1991 - - Elsevier Science Publishers B.V.

164

D.M. McCLUNG AND A.i. MEARS

extreme values on sample size. We also show that the procedure is applicable to censored (truncated) data sets. Since the method is applicable to the important cases of both small and censored data sets it is attractive for use in avalanche applications. The method is illustrated by analysis of data from five different mountain ranges. This analysis shows the wide variations in predictions between ranges. It is shown that in order to obtain an optimum prediction equation, special techniques such as data censoring and partitioning may be required.

the angle defined by sighting between the positions marked for a and ft. We have chosen the position defined by/3 (the fl point) as a reference point from which to calculate runout. The runout distance (dx) is the horizontal reach from the fl point to the extreme runout position. The length (X•) is the horizontal distance between the starting point and the fl point and H is the total vertical displacement from the starting point to the extreme runout position. We have also defined a dimensionless runout ratio (Fig. 1 ):

Ax tan B - t a n a X a - t a n o r - t a n fi

Description of data The five data sets analyzed in this paper all consist of "100-year" avalanches. The extreme position reached by avalanche debris was deduced by vegetation damage or the historical record. Although we have attempted to select a 100-year return period, the true return period probably ranges from 50 to 300 years in the data, thus introducing an unavoidable random variation into the analysis. This effect should diminish in importance as the sample size (number of avalanche paths) increases for a given mountain range. The data for each mountain range consists of three angles (a, fl, fi) and three lengths (Ax, X/~, H) (Fig. 1 ). The angle a is defined by sighting from the extreme point reached by avalanches in the past (extreme runout position) to the starting point; fl is obtained by sighting from the position where the slope angle first declines to 10 ° to the starting point; ~ is

¢ Start position

".'.._ H

xx~xxx I

\ ~

~¢, I = ~

runout

Fig. 1. Definition of geometry and angles to determine extreme runout.

( 1)

A total of 524 avalanche paths from the four ranges were included in the analysis: ( 1 ) Western Norway (127); (2) Rocky Mountains and Purcell Mountains, Canada ( 125 ); (3) Rocky Mountains, Colorado (130); (4) Sierra Nevada, California (90); and (5) Coastal Alaska (52). The data collection methods were slightly different for the different ranges. We have described the data collection methods previously for all the data with the exception of the Alaska data which were collected in the same manner as the other data from the USA. The reader is referred to McClung et al. (1989) for the collection methods and a discussion of data accuracy.

Characteristics of dma: descriptive statistics The mean, standard deviation and range for all variables used in the analysis are listed in Table 1. Differences in the terrain of the mountain ranges are shown by the values of ~ (the bar denotes a mean value), fl and ~.. A simple measure of runout is the difference between p and a. The difference betweenBand c~is (from Table l ): 2.0 ° (Canada); 3.2 ° (Norway); 4.2 ° (Coastal Alaska); 5.4 ° (Colorado); 6.2 ° (Sierra, Nevada). The mean values of both/Ix and A x / X B rank in the same order. The original work on terrain runout prediction (Lied and BakkehSi, 1980) was a prescription to specify a (given B and other terrain parameters) using a least-squares procedure. McClung and Lied

165

EXTREMEVALUEPREDICTION OF SNOWAVALANCHERUNOUT TABLE 1 Descriptive statistics for data from five mountain ranges Mean

Standard Range of values deviation

Canadian Reckies and Purcelis a 27.8 3.5 20.5 fl 29.8 3.1 23 5.5 5.3 -21.5 H(m) 869 268 350 Ax(m) 168 131 -190 dx/Xp 0.114 0.100 -0.185 Western Norway a 29.4 5.2 18 fl 32.6 5.4 21 6.4 3.5 0 H(m) 827 255 342 dx(m) 224 170 -119 Ax/Xp 0.176 0.111 -0.121 Coastal Alaska a 25.4 fl 29.6 5.2 H(m) 765 z/x(m) 302 dx/Xa 0.25 Colorado Rockies a 22.1 /? 27.5 5.1 H(m) 543 Ax(m) 334 Ax/Xp 0.406 Sierra Nevada ot 20.1 fl 26.3 t~ 4.8 H(m) 429 dx(m) 354 ztx/Xp 0.490

Number of paths

40 42 20.6 1960 524 0.404

127 126 125 124 124 125

42 45 16 1539 823 0.52

127 127 127 131 131 127

3.2 3.3 3.1 247 166 0.14

18.9 23.0 0 320 80 0.07

34.2 38.2 9.5 1400 790 0.66

52 52 52 52 52 52

3.2 3.6 3.4 227 185 0.262

15.5 18.8 -2.9 128 76 0.07

30.7 37.7 10.2 1134 1200 1.57

130 130 130 130 130 130

3.6 4.2 2.7 239 223 0.260

14.0 16.5 0.00 104 107 0.15

35.9 40.7 9.0 1145 1433 1.35

90 90 90 90 90 90

(1987) found that predictor variables other than fl were not significant in a least-squares analysis of the data from western Norway. The same is true of the data from Canada: no significant terrain variables (except fl) have been found. With least-squares analyses, we get the following regression equations for the different ranges ( ^ denotes predicted value)

ti=0.93fl Canada (r2=0.75, N= 126) ti=0.90fl Western Norway (r2=0:87, N= 127) ti=0.74fl+ 3.67 Alaska ( r 2 = 0 . 5 8 , N----52) dr=0.63 fl+4.68 Colorado (r2=0.50, N= 130) t~=0.67 fl+2.50 Sierra Nevada (r 2= 0.60, N = 90 )

(2)

In Eq. 2, r is the Pearson (ordinary) correlation coefficient, and N is the number of avalanche paths. When the more general regression equation t~= CoP+ Ct is used, the constant C~ is not significant for the data sets from Norway and Canada. For example, with the Canadian data, the t statistic for fl is 19.1 while for C~ it is - 0.52 and calculations with the more general equation show that C~ is not significant. Without C~, analyses give dr=0.86 fl (Alaska), &=0.80 fl (Colorado), and dr=0.76 fl (Sierra Nevada). With these results and Eq. 2, the ratio fl/dt ranks the same as the mean runout ratio and fland t~ for all five data sets. The parameter ~ will affect the runout ratio, but it cannot be used as a predictor in a multiple regression least-squares analysis of ot as a function of# ( is unknown until the runout position is specified). In this paper, the runout ratio is fitted to a Gumbel distribution so that the effects of 6 are included. This may be an advantage over the least-squares method presented above. The values in Table 1 also show that the mean value of runout (either zlx or Ax/Xp) decreases as the mean value of H or Xp increases for some of the data. Since zlx and ,dx/Xp are not Gaussian variables, rank correlation coefficients are preferable for determining their dependence on other variables. For the individual mountain ranges, the rank correlation coefficients, rs, for Ax/Xp correlated with XBare: -0.22 (Canada); -0.26 (Norway);-0.41 (Alaska); -0.64 (Colorado); - 0 . 5 0 (Sierra Nevada). The correlation coefficients with H are somewhat less. For the data from Alaska, Colorado and the Sierra Nevada this scale dependence represents an important effect, which will be dealt with in a later section. We do not have a complete expla-

166

D.M. McCLUNG AND A.I. MEARS

nation of the scale effect but we note that the terrain is steeper and the vertical relief is greater on average for the data sets from Canada and Norway. In the five data sets presented, the absolute value of the rank correlation coefficient (Ax/Xp,X~) increases roughly as the mean value of H decreases. All of the data sets have negative rank correlation fordx/X, with c ~ = - 0 . 5 8 (Canada); - 0 . 3 9 (Norway); - 0 . 4 9 (Alaska); - 0 . 3 3 (Colorado); and - 0 . 5 1 (Sierra Nevada). From Eq. 1, it is easily shown that the first derivative of Ax/X, with respect to a must be negative so these results are not surprising. All of the data sets have nearly insignificant rank correlation for Ax/X,, with fl: - 0 . 0 6 ( C a n a d a ) ; - 0 . 0 7 (Norway); 0.10 (Alaska); 0.23 (Colorado); and 0.04 (Sierra Nevada). Iffl is taken as a known index of path steepness, we conclude that the runout ratio is nearly statistically independent of path steepness.

Quantitative runout: extreme value statistics Previously (McClung et al., 1989) it was shown that the runout ratio may be fitted to the first asymptotic extreme value (or Gumbel) distribution. For brevity let x=Ax/Xa. The mean probability density function o f x is then ( - o o < x < o o ) :

f(x) = ~i

exp -

--if- -exp-

T

(3)

where b is the scale parameter and u is the location parameter. For a complete (uncensored) distribution, the mean of the distribution is u+7b ( 7 = 0.57721 .... Euler's constant) and the variance is (zr2/6)b 2. The pth quantile o f E q . 3 may be expressed as:

xp=u-b ln(-ln p)=-u+ b Yo

(4)

where p is the non-exceedance probability and Yp is the reduced variate. Given a value, Xp, the probability of any value o f x being less than or equal to Xp is p = Pr(x_
p(xp)=~f(z)dz=exp-[exp-(~-~)]

(5)

A chosen value o f p ( 0 < p < I ) defines the number

x~ for which p × 100% of the values in the distribution will not exceed Xp. Once u and b are determined, Eq. 4 can be used to specify a mapping of runout distances as a function of the non-exceedance probability for a given avalanche path. There are a number of known methods to determine u and b for a set of values o f x which approximately fit an extreme value distribution. The method we shall use is probably the simplest: fitting a least-squares regression line of the form of Eq. 4 through the data points. It is essential that the method can be applied to both censored (truncated) data sets and small data sets. The latter condition is crucial to produce predictions for mountain ranges where limited data are available: the most common situation in avalanche consulting. In Appendices A, B and C we have documented a simple method to define the non-exceedance probability (p) in terms of a set of values of the runout ratio for distributions which are censored or when the data sets are small. In the sections below the method described in the appendices has been used for determining the scale and location parameters for five different mountain ranges.

Quantitative runout for five mountain ranges Figure 2 shows the runout ratio as a function of Y (using Eq. B6 from Appendix B) for data from Canada. The figure indicates that when the runout ratio is negative the data seem to belong to a different population of values. (A negative runout ratio means maximum runout for the avalanche in question occurred where the slope angle was greater than I0 ° ). For accuracy at high values of p, it is sometimes advantageous to censor the data. We found that there is an advantage in censoring the data from Norway similar to the results in Fig. 2. The data from Canada (Fig. 2 ) and Norway have been censored at p = e - l (Table 2). The data from Alaska, Colorado and the Sierra Nevada do not include negative values of the runout ratio and their accuracy is not improved by censoring the data. Leastsquares procedures gave scale and location parameters for all the ranges (see Table 2 ). The data and regression line for the Colorado Rockies are shown in Fig. 3.

167

EXTREME VALUE PREDICTION OF SNOW AVALANCHE RUNOUT

/

2.0

0.6

1.5 0.4 O

O

1.0 0.2 0 Z

0 Z

0.0

0.5

0.0

0 o

-0.2 -2

I

I

I

0

2

4

-0.5

-2

6

J

i

i

0

2

4

6

REDUCED VARIATE

REDUCED VARIATE

Fig. 2. Extreme runout as a function of non-exeeedanceprobability for data from Canada, censored at p= e- t. The regression line is also shown. The ordinate Jx/Xp and the abscissa is - I n - (In p).

Fi& 3. Extreme runout for data from Colorado similar to Fi~ 2. There is no need to censor the data as in Fig. 2. i

i

i

i

i

TABLE2 a

Location (u) and scale (b) parameters for five data sets (Data from Norway and Canada were censored at p= e- i, the other data were uncensored) o

Mountain range

u

b

S~

r2

Number of paths in analysis

-l

,d

-2

Canadian Rockies and Purcells Western Norway CoastalAlaska Colorado Rockies Sierra Nevada

0.079 0.070 0.012 0.98

0.143 0.185 0.288 0.374

0.077 0.108 0.202 0.206

0.011 0.023 0.040 0.041

79 -3

0.98 80 0.97 52 0.98 130 0.98 90

All o f the regression lines (except the data from Canada) (Table 2 ) have fairly high autocorrelation in the residuals (exceeding 0.6). Probability plots o f the residuals for all o f the regression lines produced residuals which are normally distributed except for the data from Colorado. For the Colorado data, this latter condition suggests a transformation on the runout ratio. We found that a log transformation produces a variable which fits a log-normal

-3

I

I

I

I

I

-2

-I

0

I

2

3

RANKIT

Fig. 4. Data from the Colorado Rockies fitted to a log-normal distribution. The ordinate is ln(Ax/Xp) and the abscissa is the frequency factor for the ordinate treated as a Oausslan variable. distribution. Figure 4 shows a normal quantile plot for the variable In (Ax/Xp). The equation o f the regression line through the plotting positions, p = ( 3 i - I ) / ( 3 N + l ), (Draper and Smith, 1981 ) for the lognormal fit is: In

(dx/Xp) =

- 1.085+0.615 (RANKIT)

168

with r2=0.99, Se=0.050, N= 130. (RANKITis defined as the frequency factor for the response variable treated as a Gaussian variable. ) This transformation produces residuals which more closely obey a normal distribution than those for the extreme value regression line. However, for the log-normal fit there is higher autocorrelation among the residuals (0.71 versus 0.54). When results from the lognormal fit are compared with those from the extreme value distribution, the values agree within 2% over the range of interest. In general, there is no compelling theoretical reason to choose an extreme value distribution over a log-normal one for any of the avalanche data. Our interest is pragmatic: we seek a distribution which fits the data accurately. In all cases, we have found the extreme value distribution to be as good or better than other ones and we recommend its use for the least-squares procedure, even for the Colorado data. From Figs. 2, 3, and Table 2, it can be seen that the extreme value parameters and runout characteristics vary significantly for the different mountain ranges. The results indicate that the method we have used to specify runout demands that data from one mountain range cannot be used to quantify runout in another. To illustrate runout from the different mountain ranges, regression lines and data for five ranges are given in Fig. 5. In Fig. 5, all of the data sets are censored at p = y and we have included only paths for which H > 340 m. This figure shows the big differences in extreme value parameters for the various ranges (see McClung, Mears and Schaerer, 1989 for further discussion ). Another result previously emphasized (McClung et al., 1989 ) is that there is no climate regime effect evident in the data. Two of our data sets are from continental areas (Canada and Colorado), and the others are from maritime areas. The highest runout comes from the two areas with the gentlest terrain (Colorado, Sierra Nevada); one of these is continental, the other maritime. From Eqs. A6 and A9 (see Appendix A), the scale and location parameters depend only on the mean and standard deviation of the runout ratio. The differences in these quantities from one range to another depend on terrain parameters, and climate regime does not ap-

D.M. McCLUNG AND A.I. MEARS 2,0

1.5 © <

10 o z

f..

05

L

0.0 05

16

27

~

38

49

60

REDUCED VAR[ATE Fig. 5. Data from five m o u n t a i n ranges compared. All five data sets are censored at p = y. Only data are included for which H>_>-340 m. ( ~ ) Sierra N e v a d a , ( X ) Colorado Rockies, ( + Coastal Alaska; ( [ ] ) Western Norway, ( A ) C a n a d i a n Rockies.

pear from the analysis. This result could be expected: for the time scale of our data ( ~ 100 years) we believe that essentially the same character of avalanche could emerge to dominate the extreme runout statistics. Field observations indicate that a m i n i m u m friction avalanche is usually a large dry avalanche; it is possible this type can occur even in a maritime range for time scales of 100 years.

Length scale effects Two of the data sets (Colorado, Sierra Nevada) and possibly the Alaska data to a degree exhibit a scale effect when the runout ratio is used to define runout. The rank correlation coefficients for Ax/Xp with X~ are significant for these data. Many of the small (Xp or H ) paths from Colorado and the Sierra Nevada had start positions on ridge crests or cirque headwall locations characterized by deep snow accumulations. In such cases, large release volumes may be associated with short paths. This effect is not considered in our statistics derived from lengths, length ratios and angles. To illustrate the length effect, the Colorado data

EXTREMEVALUEPREDICTIONOF SNOWAVALANCHERUNOUT

169

were partitioned above and below Xp=1000 rn (close to the median 934 m) and extreme value fits were performed for these two data subsets. The results are shown in Fig. 6. Calculations reveal the crucial importance of the length scale. For p = 0.99, we get respectively: 0.67 (Xp>1000 m); 1.42 (Xp< 1000 m); and 1.22 (full data set). In practical applications such differences would have profound influence on land use planning decisions. Our partitioning of the data at 1000 m is somewhat arbitrary but it illustrates one simple method to improve predictions for data sets which exhibit this scale effect. We feel there are not enough data to justify partitioning into smaller parcels for the Colorado values. Clearly if partitioning is used at all, a larger data set is required (to avoid sample size effects) than for a data set which does not exhibit a scale effect. For the Colorado data, a further (probably fortuitous) benefit of partitioning is that the autocorrelation among the residuals is reduced in comparison with the full distribution and the residuals more closely obey a normal distribution. Clearly, the data sets from Colorado and California (and perhaps Alaska) require that the length scale effect must be dealt with before applications are attempted.

Another method of dealing with the length scale effect is suggested by inspection of Fig. 6. The Figure shows that the relation between/ix/Xp and Xp is approximately hyperbolic. This suggests that a better method of dealing with the data from Colorado may be to perform an analysis with/Ix rather than ~Ix~X/3. This reinforces our belief that no simple procedure is available to deal with all situations for all sets of data.

2.0

o

1.5 o

o

1.0

o

[]

o

o

Dc~

0.5 o

o o

o

!

0.0 0

1000

o.

.

.

.

o

.

.

~

.

o

2000

300~

Horizontal Reach to Beta Point (m) Fig. 6. Data from the Colorado Rockies partitioned at

Xp= 1000m. The threelinesrepresentthreevaluesof the nonexceedance probability p=0.99 (. . . . andp--0.80 ( . . . . ).

), p--0.90 ( - - ) ,

Control curves and error a n a l y s i s

Confidence limits on the value of the runout ratio can be calculated by standard procedures assuming the errors in the least-squares analysis fit a Gaussian distribution. If go is estimated from the regression line for a value of the reduced variate Yo, then a ( 1 - ¢ ) × 100% confidence interval for the response at Xo is (e.g., Walpole and Myers, 1978 ): [

(Yo-y)2 11/2 - 1)

-~0 "F'Z(I-,/2)Se 1 -t N S 2 y ( N

(6)

where Z(l_e/2) is the area under a Gaussian curve between p = 0 a n d p = 1 - ~ ( 0 < ¢ < 1 ) and Se is the standard error of the regression equation. For example, consider ¢= 0.10 (90% confidence limit), Zo.95= 1.645 (from tables). Consider this limit for the maximum value of x = 1.35 for data from the Sierra Nevada: Yo=5.414, Sv=1.283, F=0.583, N=90 and Se=0.041 (Table 1 ). Calculations with Eq. 6 give: 1.457 + 0.074 or 1.38-1.53. Therefore the value 1.35 is outside the 90% confidence limit. The data point in fact falls near the 98% limit. Probability plots were constructed to compare the residuals (from the regression lines estimated in Table 2) and the Gaussian distribution. This analysis showed that the residuals for all of the data sets except Colorado are approximately normally distributed. With the serial correlation among the residuals (0.6) for the Colorado data set, it is suggested that weighted least-squares procedures could be applied (Draper and Smith, 1981 ) to improve the prediction equation and error analysis. The error analysis above the simplest available. Since we are interested in confidence limits above 90%, more advanced error methods may be re-

170

D.M. McCLUNG AND A.I. MEARS

quired when the errors do not fit a Gaussian distribution (often on the tail of the error distribution). Cheng and Iles (1988) have given a prescription based on a maximum likelihood method. Their method involves adding an amount (increasing with the non-exceedance probability) to that predicted by the regression line. Application of their methods to the data in Fig. 3 adds 0.2-0.3 to the value of Ax/Xp determined from the regression line as p increases from 0.9 to 0.99. Their analysis is more general and it may be preferred.

Frequency factor The variable for runout, x, may be represented by the mean, £, plus the departure of the variable from the mean which is usually written as the product of the standard deviation ax and a frequency factor. From expression 4 and Eqs. A6 and A9 (see Appendix A), the frequency factor, Kp, may be defined as: Kp _

- I n - I n ( p ) - I7 ay

(7)

The optimum regression (Eq. 4 ) then becomes: x. =X + K.ax

Our procedure is to quantify runout distances in a given mountain range using the historical record of extreme avalanche reach. Given a value of p, we are able to define a position on an avalanche path for which p% of avalanche paths in the data set have runout which does not exceed the given position. A set of values of p, then define a mapping of positions with increasing risk of them being over-run as p decreases. For some applications, choice of a suitable value ofp will be easy. For example, when good security is demanded, values ofp near or in excess of 0.99 will be needed (for a data set with ~ 100 avalanche paths). However, in many applications, high land prices will exclude such high values and increasing risk must be accepted. In most instances, use of p=0.99 to define the limit of safe area will exclude huge amounts of land from construction areas. The true value of the method is that it can alert land-use planners to the increasing risk at positions closer to the starting zone. If constructions are planned in higher risk areas, occupation of them must be conditional on application of proper safety measures, for example, avalanche forecasting, evacuation and structural control.

(8)

with l~and av defined by Eqs. A2 and A5 (Appendix A). For the special (uncensored case), @ = 1, 17= ~,and av = ~/x//g, Kp becomes: Kp = - ~V/-6[7+ In ln~-l)]

Application of the results

(9)

which is equivalent to an expression given by Chow (1964). When p=0.5704, Kp-~ 0 and x = £ . For the Colorado regression line, £=0.405 when p = 0.5704, which may be compared with the calculated mean of 0.406. Similarly, the Sierra Nevada data give £=0,489 at p=0.5704 compared with £=0.490 (calculated). Since frequency factors are commonly used in hydrology and flood applications, Eq. 7 provides a basis for comparison with these subjects as well as with other distributions. Chow (1964) gives frequency factors for Gaussian and log normal distributions.

Summary and discussion The method of predicting runout distances presented here can be applied to complete or censored data sets consisting of at least 30 paths which have produced major (100-year) avalanches. Data analysis has yielded the following results: (1) the parameters defining runout differ sil~ficantly between mountain ranges; (2) the effect of climate (maritime or continental) may not affect extreme runout but this effect could not be quantified in our work; and (3) individual data sets differ, and therefore special techniques will usually be needed to derive optimal prediction equations in any given mountain range. These techniques could include: weighted least-squares, censoring, partitioning, or use of techniques other than least-squares to determine extreme runout parameters (see e.g., Lawless, 1982). In consulting applications, the methods pre-

EXTREME VALUE PREDICTION OF SNOW AVALANCHE RUNOUT

171

sented here should be used in combination with historical information and ground truth where available. If field or historical data indicate that runout exceeds that predicted by our methods, the ground truth information must obviously take precedence. Although the analysis of runout using terrain variables is a rational objective method, it must be used judiciously and in combination with other techniques. The optimal runout equation for a given mountain range is affected not only by terrain variations but also by the data collection method. The different procedures illustrated in this paper show that no single method is unique or universal. However, unlike physical models, runout distance uncertainties can be defined in standard statistical terms using our method. The additional engineering tool introduced here can be used to complement other statistical methods, dynamics models and field data. An important benefit of the method we have presented is that it provides a basis for comparison of avalanche terrain and runout distances from different mountain ranges. With over 500 paths in the analysis, we believe we have given the most comprehensive study of avalanche runout so far presented.

interest in runout is usually greatest for p > 0.5, the information lost by censoring the data may be sacrificed to achieve better accuracy in the range of interest. If u and b are determined by a regression analysis for a censored set of data points, xl ~x2 < " "
Acknowledgements We wish to thank Peter Schaefer, Institute for Research in Construction of the National Research Council of Canada, and Karstein Lied, Norwegian Geotechnical Institute for contributing data from Canada and Norway. Dr. Michael Bovis, Department of Geography, University of British Columbia provided helpful suggestions on an earlier draft. Special thanks are owed to Dr. C. Jaccard, Swiss Federal Institute for Snow and Avalanche Research, for a careful reading of the paper and constructive suggestions for improvement.

Appendix A: results for censored distribution Analysis of avalanche runout data has shown that improved predictions are sometimes obtained by retaining only values of x for which p > 0.5. Since

-~

'L

- l n ( - l n p)dp

(A1)

Let p = e -t, then E[ ( x - u ) / b ] is given by:

~

1__ o

-Ap ~q--Inp0

ln(t)e_tdt=~p

(A2)

Similarly, E [ ( ~ - ~ ) 2 ] is given by:

--

l~xox-u

2

'L'

=~PP o [ - l n ( - l n p ) ] 2 d p

(A3)

=-Zp1 I_ 'n"o[In(t) 12e-tdt=A~

(A4)

or:

Expressions A2 and A4 give the mean and mean square deviation of the reduced variate over the interval Ap. With Eqs. A2 and A4, the standard deviation of the reduced variate may be defined over Ap:

~v=[Y~-(~)2]

-L~-\~/J

(AS)

Also, from Eq. A1 an expression for ~ (the mean of

172

D.M. M c C L U N G A N D A.I. MEARS

the extreme values on dp) can be obtained in terms of 17on the same interval:

E(x) =X=u+ b17

(A6)

that expression 4 may be rewritten using Eqs. A6 and A9 to give a relation between two centered and scaled variables:

xp-x G-17

Similarly, from Eq. A5:

G

which may be simplified to:

a ~ - E ( x 2 ) - [ E ( x ) ] 2 a~ b2

-b2

(A8)

Therefore, over the interval ~p, the scale parameter is given by: b= ax av

(A9)

and the location parameter is given by:

u=X-b17

(A10)

In Eqs. A9 and A 10, X, ax are the mean and standard deviation o f x on the interval dp. Also, av and 17are not "statistics" but are parameters defined by the integrals A2 and A4. Over the interval zip, the integrals A2 and A4 may be expanded in terms of infinite series. The integration is easily performed term by term using formulas given by Gradshteyn and Ryzhick (1980, pp. 203-204). Expressions for I, and I2 are given by: 11 =

(

6o,n[

In t o - 1

(All)

r/=l

and: ÷ 21 1 2 = - .=~ ~ ( - tn! o ) " [ (In to) 2 2Into n ~

7, 12 = n 2 / 6 + ? 2 or ao=n/x//-6. It is also of interest

(A13)

It can easily be shown that Eq. A 13 is equivalent to an optimum regression line (with zero remainder). In particular, on the tail of the distribution, where extreme avalanche runout data may provide a better fit than an uncensored distribution, the quantities 17and ay can be calculated analytically from Eqs. A2, A5, A11 and Al2 and the optimum regression equation may be defined using Eq. A 13 using only the extreme values on the interval (dp) to calculate ar and X. These results are applied in the main body of the paper. Use of Eq. A13 alone does not provide an assessment of model fit; therefore it is necessary to define a set of probabilities pi for the data points xi if the fit is to be defined. This question is addressed in Appendices 8 and C.

Appendix B: plotting positions for censored and small data sets

In the past a number of plotting position equations have been defined to assess model fits to the extreme value distribution (eg., Chow, 1964), Weibull plotting positions are among the most popular; they are defined by ranking N values of xi in decreasing value XN>XN-~ >''Xi>''X2 >X~. The non-exceedance probability is then given in terms of the rank i of the observation xi:

(A12)

where to = - In Po. Note that u and b depend only on the mean and standard deviation of x and the analytical quantities, the mean and standard deviation of the reduced variate. Also, for most values the series of Eq. A 11 and Eq. AI 2 converge very rapidly; evaluation can be done in less than 10 minutes on a hand calculator or the expressions can easily be programmed for a computer. In the special case of an uncensored data set, po~0, zip= 1 and 17=I~ =

~v

i

P~-N+ I

(B1)

Thus, for the smallest value p~ = 1/N+ 1, and the largest PN=N/N+ 1. Similarly, Hazen (1930) defined another popular set of plotting positions: p~_

i+0.5 N

(B2)

Expressions B 1 and B2 have some potential disadvantages for small data sets and censored data sets. Since plotting positions are functions only of the reduced variate (an ancillary statistic), they should

EXTREME VALUE PREDICTION OF SNOW AVALANCHE RUNOUT

173

not depend on u and b nor should they depend on sample size; the calculated values of 17and try should approximate the analytical quantities A2 and A5 for the sample sizes (N) of interest. For a "population" of values p~, appropriate to the values x~, I7 and trr are defined by standard methods as:

judiciously), emphasis is placed upon accurate estimates o f £ and trx. From the central limit theorem, if N>- 30, the normal approximation for ~ can be used with good precision even by sampling from an extreme value distribution, therefore:

( 1 ) 17= N + l - r o

~ -ln(-lnp,) i=,~

(B3)

N

£=i=1 N

(B7)

Similarly, the variance may be accurately defined by the statistic S, for N>_ 30:

and

N

ay= ~ k

~ [-ln(-lnp~)-17] 2

--ro/i=ri

(B4)

where ro is the rank of the lowest value included. For an uncensored data set, ro= 1, so ro > - 1 in general. In order that sample size is not introduced, the estimates in Eqs. B3 and B4 should be close to those from Eqs. A2 and A5 for infinite sample size since I7 and try are independent of sample size. Consider plotting positions of the empirical form: P'-

i-Co N

(B5)

The definition Co=0 sends the value X N ( i = N ) tO while Co=l puts the value Xl(i=l) to --GO. These extremes are undesirable and therefore it is expected that Co lies between 0 and 1. In fact, Hazen's (1930) logic was to set Co=0.5 (the midpoint). In order not to introduce sample size and so that Eqs. B3 and B4 are fulfilled for censored or uncensored data sets, further constraints apply to &. We evaluated Co using a computer for censored and uncensored data sets and found that good estimates of plotting positions are given when Co = 7/(tro ) 3/2 where tro = n/v/6 and 7= 0.57721 . . . . ; the mean and standard deviation of the reduced variate for the full extreme value distribution. This gives Co = 0.4, and the plotting positions are: &=

i-0.4 N

E $ 2 - ~='

2 N--1

(B8)

In Table B 1, 17and av are given as a function of N (calculated from Eqs. B1, B2 and B6). These should be compared with the expected values (N~oo). In Appendix C, we have given analyses based on differential entropy (minimum uncertainty) for a censored extreme value distribution. These analyses quantify the assertion that Eq. B6 is superior to Eqs. B 1 and B2 to estimate the quantities defined by Eqs. A2 and A5. The results show that Eq. B6 provides superior results for censored (ro > 1 ) or uncensored distributions (ro= 1 ) when N_ > 20. Also, the values calculated from Eq. B6 are nearly independent of N (as is required of a proper set of plotting positions). Weibull plotting positions are shown to be inferior for determining regression line coefficients (Table 2). Although Eq. B6 does not differ much from Eq. B2 the estimates are just as easy to calculate and more accurate results are obtained for small or censored data sets. A literature search for simple expressions of plotting positions, including papers by White (1969), Barnett ( 1975 ) and Blom (1958) revealed that the definitions proposed by these authors are inferior to the estimates in Eq. B6 for determining scale and location parameters by least-squares analysis.

(B6)

for ro < i < N . In general, we recommend a sample size N> 30 if possible. Since sample size should not enter through the reduced variate (if plotting positions are chosen

Appendix C: uncertainty introduced by plotting positions A measure of uncertainty of a random variable x (assuming a continuous series of values) is the dif-

174

D.M. McCLUNG AND A.I. MEARS

TABLE B 1

andf(x) is given by Eq, 3 where P>Po.

Mean value (l7) as standard deviation (a¥) of the reduced variate for plotting position equations

f°(x)=-plnp/(bAp).

The values are calculated as a function of number of data points (N) and non-cxceedanceprobability (Po) for censoring Po

Eq. B6

Eq. B2 (Hazen)

Eq. BI (Weibull)

On the interval Po < P < 1, it can be shown that With the substitution p = e - ~, integration of Eq. C 1 gives (after considerable algebra):

H= 17+In(hAp) + 1 +~p In Po

(C3)

I f A p = 1, then p o = 0 to give Hu for the uncensored distribution: N= 130 0.0 1/e 0.50 y 0.70

Hu = 17+1n b + 1 0.582 1.267 1.550 1.755 2.128

0.563 1.232 1.506 1.692 2.057

1.224 1.032 0.989 0.963 1.013

1.282 0.571 1.263 0.549 1.076 1.531 1.040 1.461

1.172 0.920

N= 30 0.0 0.592 1.280 0.568 1.254 0.536 0.50 1 . 5 5 8 ! . 0 7 1 1.522 1.023 1.421

1.131 0.869

N= 50 0.0 0.587 0.50 1.555

N=20 0.0 0.50

0.597 1.560

1.284 1.111 1.081 1.065 1.041

0.575 1.259 1.540 1.733 2.113

1.273 1.094 1.060 1.042 1.013

1.276 0 . 5 6 3 1.244 0.524 1.090 1.064 1.511 1.006 1.379 0.822

N=oo (Eqs. B3 and B4) 0.0 ~ 1.283 l/e 1.260 1.110 0.50 1 . 5 4 5 1.080 ? 1.732 1.077 0.70 2 . 1 2 2 1.043

H[X] = -

f°(x) lnf°(x) dx

(el)

where f ° ( x )

is the probability density and f ° ( x ) l n f ° ( x ) - 0 for values o f x w h i c h f °(x) =0. For a censored extreme value distribution the probability density function is:

f(x) f(x) pO<_p<_l f ° ( x ) - -~p - 1--Po f ° ( x ) =-0

From Eq. A9, it is clear that H in Eq. C3 depends only on 17, try, ax and P0 (not on )~), and maximum entropy is obtained when p o = 0 (an uncensored distribution). From Eq. C3, given Po, the entropy of a censored extreme value distribution depends only on 17, av and ax. O f these, 17and av depend on the choice of plotting positions and usually they converge to their true values ( N ~ o o ) when the sample size is fairly large. If values with subscript infinity (oo) represent theoretical quantities for N--,oo, then given any value Po, the loss of information due to errors in av, I7 introduced by the plotting positions is (from Eq.

C3):

A H = H - H ~ = ( Y - Y o o ) - I n (aa-~v)

ferential entropy H [ X ] (see e.g., Sveshnikov, 196 8 ) given by:

P
(C2)

(C4)

(C5)

Evaluation with Eqs. B 1, B2 and B6 shows that zlH is smallest in nearly all cases for Eq. B6. For example, if N= 130 and Po = 0 ( 17~ = 3', tr~ = 1.283), the values for AHare: 0.033 (Eq. B1 ), 0.0056 (Eq. B2) and 0.0040 (Eq. B6). Needless errors are produced by arbitrary choice o f plotting positions and the errors become acute for small samples. In applications, choice of plotting positions is important to reduce sample size dependence on them because it is expensive and time consuming to collect avalanche runout data. References Bakkeh6i, S., Domts, U., and Lied, K., 1983. Calculation of snow avalanche runout distance. Ann. GlacioL, (4): 2427. Barnett, V., 1975. Probability plotting methods and order statistics. Appl. Stat., 24: 95-108.

EXTREMEVALUEPREDICTIONOF SNOWAVALANCHERUNOUT

175

Blom, G., 1958. Statistical Estimates and Transformed BetaVariables. Wiley, New York, N'Y. Bovis, M. and Mears, A., 1976. Statistical prediction of snow avalanche runout from terrain variables. Arct. Alpine Res.,

Lied, K. and BakkehOi, S., 1980. Empirical calculations of snow-avalanche runout distance. J. Glaciol., 26 (94): 165177. McClung, D.M. and Lied, K., 1987. Statistical and geometrical definition of snow avalanche runout. Cold Reg. Sci. Technol., 13(2): 107-119. McClung, D.M., Mean, A.I. and Schaerer, P.A., 1989. Extreme avalanche runout: Data from four mountain ranges. Ann. Glaciol., 13: 180-184. Sveshnikov, A.A., 1968. Problems in Probability Theory, Mathematical Statistics and the Theory of Random Functions. Dover Publ. Incl., New York, N.Y. Walpole, R.E. and Myers, R.H., 1978. Probability and Statistics for Engineers and Scientists (2nd ed.). MacMillan, New York, N.Y. White, J.S., 1969. The moments of log-Weibull order statistics. Technometrics, I l: 373-386.

8:115-120.

Cheng, R.C.H. and Iles, T.C., 1988. One-sided confidence bands for cumulative distribution functions. Technometrics, 30(2): 155-159. Chow, Ven Te, 1964. Handbook of Applied Hydrology. McGraw-Hill, New York, NY. Draper, N. and Smith, H., 1981. Applied Regression Analysis (2rid ed.). Wiley, New York, N.Y. Gradshteyn, I.S. and Ryzhik, I.M., 1980. Table of Integrals, Series and Products. Academic Press, New York, N.Y. Hazen, A., 1930. Flood Flows, a Study of Frequencies and Magnitudes. Wiley, New York, N.Y. Lawless, J.F., 1982. Statistical Models & Methods for Lifetime Data. Wiley, New York, N.Y.