Applied Ocean Research 91 (2019) 101870
Contents lists available at ScienceDirect
Applied Ocean Research journal homepage: www.elsevier.com/locate/apor
A simulation study on the uncertainty of environmental contours due to sampling variability for different estimation methods
T
Erik Vanem , Odin Gramstad, Elzbieta M. Bitner-Gregersen ⁎
DNV GL Group Technology and Research, Høvik, Norway
ARTICLE INFO
ABSTRACT
Keywords: Environmental contours Marine structures Extreme loads Structural reliability Marine design Ocean environment Environmental loads
Environmental contours are often used in design of marine structures to identify extreme environmental conditions that may give rise to extreme loads and responses. Recently, attention has been given to the fact that different methods exist for establishing such contours, and that in some cases significant differences may be obtained from the various methods. In this study, another aspect of the uncertainty related to the calculation of environmental contours is addressed, namely the uncertainty due to sampling variability when environmental contours are constructed based on metocean data of finite sample size. The uncertainty of environmental contours for the joint distribution of significant wave height and wave period for different sample sizes (10, 25 and 100 years of data) are investigated considering different underlying datasets and for different estimation methods for the joint distribution. Both cases where samples are drawn from a known joint distribution of wave height and periods and cases where samples are drawn from a real hindcast dataset and fitted to the joint distribution are considered. The uncertainty of the estimated contours is quantified and discussed in light of differences that can be anticipated from the different methods used to calculate the contours. Moreover, the potential bias from assuming different estimation methods is illustrated.
1. Introduction Environmental contours are often used in long-term extreme response analysis of marine structures as a simple and approximate alternative to more computationally demanding full long-term analyses, for example based on I-FORM [1,2], I-SORM [3] or Monte Carlo simulations [4,5]. One of the main advantages of environmental contours is that the structural response analysis is de-coupled from the environmental description, meaning that only a limited number of short term response calculations are needed for providing long-term extreme values. Initial methods for calculating environmental contours were based on equi-density lines [6], but later contours based on the inverse first order reliability method (IFORM) have been more widely used [7,8]. More recently, environmental contours based on direct Monte Carlo sampling have been proposed, in order to avoid the need for the Rosenblatt transform into standard normal space [9,10]. These contours have well-defined probabilistic properties, and will in some cases be quite different from the traditional IFORM-contours, see the comparison studies in [11,12]. In particular, the direct sampling contours will by definition always enclose a convex set, whereas the IFORM contours will tend to follow the scatter of the data more closely. Hence,
⁎
in situations where the scatter of the data displays non-convex features, for example in cases with combined seas, the two contour methods may give quite different results, although the contours will be very similar in more well-behaved convex cases. It is noted that other definitions of environmental contours and other ways of calculating them exist, but these will not be addressed in this paper, see e.g. [13–21]. In this paper, both the IFORM-approach and the direct sampling approach to environmental contours are considered, and their relative differences are compared to the uncertainties due to sampling variability. The paper extends the results presented in [22] by including two new fitting procedures for the underlying joint distribution assumed for constructing the environmental contours, namely the method of moments and a least-squares approach. The added fitted methods are often used in the marine industry and are deemed relevant for this comparison study. Hence, the possible bias due to different estimation methods is also illustrated in this paper. The calculation of environmental contours is based on some joint distribution describing the relevant metocean parameters, for example the significant wave-height Hs and peak wave-period Tp. The joint distribution is in turn normally based on some dataset – for example insitu observations or hindcast simulations from a numerical model, such
Corresponding author. E-mail addresses:
[email protected] (E. Vanem),
[email protected] (O. Gramstad),
[email protected] (E.M. Bitner-Gregersen).
https://doi.org/10.1016/j.apor.2019.101870 Received 1 April 2019; Received in revised form 6 July 2019; Accepted 7 July 2019 Available online 26 August 2019 0141-1187/ © 2019 Elsevier Ltd. All rights reserved.
Applied Ocean Research 91 (2019) 101870
E. Vanem, et al.
as a spectral wave model. Obviously, there are uncertainties related to the fitting of a joint distribution to data. Most importantly, the results will strongly depend on the specific choice of parametric model to fit to the data [23], but the accuracy of different models is outside the scope of the present study. Also other choices made during the fitting procedure [24], for example fitting method, binning of data, starting values and bounds on the parameters for the optimization routines and possible data pre-processing will influence the final results. Moreover, there is also an uncertainty related to the fact that the underlying metocean data are of finite sample size. That is, each finite subsample from a given distribution may result in a different fitted joint distribution, and consequently different environmental contours. It is expected that this issue will be more critical if constructing an environmental contour corresponding to a large return period from a small dataset. In this study uncertainties related to the calculation of environmental contours are considered. In particular, if discussing the differences between environmental contours when different methods are used to construct the contours, it is of interest to assess this basic uncertainty due to sampling variability, which in most cases is an uncertainty that cannot be avoided. Hence, the uncertainty of environmental contours due to sampling variability is addressed relative to the difference between two contour methods. Other uncertainty aspects of environmental contours have recently been addressed by e.g. [25,26]. In this way, three aspects of the uncertainty of environmental contours are addressed by way of simulation studies, i.e., different contour methods (IFORM or direct sampling), different estimation methods for the joint distribution (maximum likelihood or least squares fitting of the cumulative distribution function) and sampling variability. This provides a mean to compare these sources of uncertainty and identify the most important aspect. It should be noted that environmental contours are used to account for the long-term variability of sea states, and that in many design applications there is a need to account for short term variability, conditioned on a sea state, as well. There are different ways one can do this, for example by inflating the contours according to a specified α-quantile as recommended by [27,28]. Alternatively, one may include the short term variability as an additional variable and construct 3-dimensional environmental contours with this additional information [29]. Notwithstanding, in this study, the short-term variability is disregarded. The aim here is to study the uncertainty in the calculation of environmental contours due to sampling variability compared to the variability due to different contour- and fitting methods, and not to provide design values for actual marine structures.
each case a joint distribution of the same type is fitted to the data. Naturally, due to sampling variability, each of the fitted distributions will have somewhat different the distribution parameters, resulting in different values of e.g. return periods and also different environmental contours. For the third case (N10) a real hindcast dataset [30,31] is used directly, and for one chosen geographical location, re-sampling with replacement is used to generate bootstrap samples for fitting joint distributions for Hs and Tp, and for calculation of environmental contours. The cases NA and N10 represent sea-state duration of 3 hours, while NWA represent hourly data. Thus, the total number of sea states in Ny years are Nss = Ny · 365.26 · 8 for cases NA and N10, and Nss = Ny · 365.26 · 24 for the NWA case. For both cases NA and NWA, the distributions of significant wave height Hs and wave period Tp or Tz are modelled by a joint distribution where Hs is fitted to a three-parameter Weibull distribution on the form
fHs (hs ) =
1
hs
exp
hs
,
(1)
and Tp or Tz are modelled by a Lognormal distribution with parameters conditional on Hs so that
fT | Hs (t|hs ) =
1 (hs ) t 2
exp
(ln t µ (hs ))2 , 2 (hs ) 2
(2)
where
µ = E (ln T ) = a1 + a2 hsa3,
(3)
= std(ln T ) = b1 + b2 e b3 hs ,
see also [32] for details. The distribution used in case NA was fitted to visual observations, covering about 37 years, from the North-Atlantic [33] which is a basis for the North-Atlantic scatter diagram [27], while the distribution used for the case NWA was fitted to 11 years of hourly hindcast data from North-West Australia [34]. The distribution parameters for these two cases are summarized in Table 1, and contour plots of the joint distributions are shown in Fig. 1. Note that the NWA case represents a wave climate dominated by swell and it is generally difficult to approximate such conditions with unimodal probability distributions, as discussed in [35]. Nevertheless, this case is included here for illustrative purposes as it illustrates well the differences in the shape of the contours with the different contouring methods. In case N10, the data, consisting of three-hourly hindcast data for a period of 57 years, are used directly and scatter plot of the Hs–Tp-data that are used in this case is shown in the right plot in Fig. 1. For the N10 case, the starting point is not a known joint distribution with known parameters as in the two other cases. However, the conditional model in (1)–(3) will be assumed appropriate for these data, and parameter estimates based on the complete dataset are made based on three distinct fitting procedures referred to as ML, MoM and LS (to be outlined in the subsequent section of this paper). These parameter estimates are also included in Table 1. It is noted that other parametric models could have been fitted to the NORA 10 data, for example a copula-based model [23], but the investigation of different joint models is out of scope of this particular study. It is also noted that the joint model above is recommended in [27]. It is noted that in this study it has tacitly been assumed that the data are independent and identically distributed (iid). Obviously, this assumption is correct for the two first cases where the data are generated
2. Description of cases In this study three different cases are considered, denoted NA (North-Atlantic), NWA (North-West Australia) and N10 (NORA 10), respectively. In the NA case, a pre-specified parametric joint distribution of significant wave height Hs and zero-crossing wave-period Tz is used. Also for the NWA case a pre-specified joint distribution is used, but here the joint distribution is described in terms of Hs and peak wave-period Tp. For different sample sizes, corresponding to 10, 25 and 100 years of data, a large number of datasets (N = 1000) are generated by parametric bootstrap sampling from the given distributions, and in
Table 1 Distribution parameters used for the cases NA and NWA and parameter estimates for N10 based on the complete dataset. Case
α
β
γ
a1
a2
a3
b1
b2
b3
NA NWA N10 – ML N10 – MoM N10 – LS
3.075 0.605 3.347 2.646 2.750
1.505 0.867 1.729 1.341 1.331
0.626 0.322 0.400 0.939 0.712
0.7052 0.0 0.000 0.666 0.666
1.2638 1.798 2.016 1.356 1.356
0.1319 0.134 0.127 0.177 0.177
0.0940 0.042 0.000 0.000 0.000
0.0595 0.224 0.220 0.224 0.224
−0.0328 −0.500 −0.088 −0.098 −0.098
2
Applied Ocean Research 91 (2019) 101870
E. Vanem, et al.
Fig. 1. Joint probability density function for the cases NA and NWA and scatter plot of the data used in case N10.
by simulations from a parametric model. However, for the N10 case, where a parametric distribution is fitted to actual data, it is not obvious that this assumption is appropriate. Indeed, data will typically not be iid in this case. This may influence the model fit to the data, and in particular, any uncertainty estimates of model parameters. Nevertheless, this assumption is often made in engineering applications, and it is not expected that it will significantly influence the results regarding sampling variability in this study. Even though it is possible to correct for this in various ways, see e.g. [36], no attempts have been made to challenge this assumption in the current study. Hence, it is interesting to keep this in mind when comparing results for the two cases where the iid assumption is obviously correct and the case where it is most likely violated. A perhaps more interesting difference between these cases is that when data are simulated from a parametric distribution, any values may be obtained according to this probability distribution. For the N10 case, however, where simulations are based on re-sampling from a finite dataset with replacement, only values included in the actual dataset can be obtained. Hence, in this case, the overall uncertainty may tend to be underestimated. Hence, it is expected that the estimated effect of sampling variability in this study will by larger for the two cases assuming a parametric distribution function compared to the N10 case. Moreover, with a finite sample, it is not known how representative this sample is of the overall population, which may introduce biases.
marginal distribution for Hs and a conditional distribution for Tp or Tz. These can be fitted independently and one could get different joint models by combining the different methods in different ways. In this paper we have considered the following three cases for the full joint distribution: (i) ML: maximum-likelihood method for both the Weibull and Lognormal fit, (ii) MoM: method of moments for the Weibull fit and MoM/LS method for the conditional Lognormal distribution and (ii) LS: least-square method for Weibull and MoM/LS method for the conditional Lognormal. In [22] results obtained from fitting the marginal distribution of Hs by maximum likelihood and the conditional distribution of Tp by least squares is reported. It is noted that the fitting of a 3-parameter Weibull distribution to data with unknown location parameter γ is far from trivial and even the maximum likelihood estimator may not exist [37]. Hence, several approaches exist for reliable estimation of these parameters, but only three of these are explored in this study. For other fitting methods of the 3-parameter Weibull distribution, see e.g. [38–45]. The procedure of fitting the distributions to the data involves optimization of a nonlinear function that can be performed with standard numerical methods available in many software packages. In the results presented herein an implementation of the L-BFGS-B algorithm [46] is used, available through the Python-function scipy.optimize.minimize. The optimization routine utilizes the gradient of the objective functions, which for the cases considered here can be derived analytically.
3. Fitting procedures
3.1. Fitting Weibull distribution
For each of the three cases (NA, NWA and N10), and for each sample size (10, 25 and 100 years of data), N = 1000 datasets are generated by either drawing random samples from the parametric distribution (for the cases NA and NWA) or resampling with replacement from the datasets (for the case N10). Hence, for each of the 3 × 3 × 1000 generated datasets of Hs and Tp or Tz a joint distribution of the type (1)–(3) are fitted to the data. And for each of these fitted distributions environmental contours are calculated. Due to the large number of distributions that are to be fitted, only a fully automatic fitting approach is feasible. In this study different fitting procedures are investigated. For the Weibull distribution, both maximum likelihood (ML), method of moments (MoM) and least squares fitting to the cumulative density function (LS) is presented. For the conditional lognormal distribution we have used both maximum likelihood and a combined method of moments/least squares (MoM/LS) approach where the mean and standard deviation of the lognormal distribution, as functions of Hs, are fitted to the data using least squares. It is noted that with the conditional modelling approach one is basically estimating two distributions to describe the joint model, that is, the
3.1.1. Estimation by maximum likelihood – ML The log likelihood function corresponding to the distribution (1) has the form N
log
( , , |x1, …, xN ) = N ln
N ln
+(
1)
N
ln(xi
)
i=1
i=1
xi
.
(4)
The maximum of this function subject to α, β and γ is found by numerical optimiztion using the Python function scipy.optimize.minimize, as mentioned above. Due to difficulties in estimating the location parameter in a 3-parameter Weibull distribution, the γ-parameter is restricted to take values between 0 and Hsmin where Hsmin represents the smallest observed value of Hs. In this study, ϵ = 0.9999 was used. For each simulated sample x = {x1, x2, …, xN}, the initial guesses for the distribution parameters are chosen as 0
= 0.99 min(x ),
0
ˆ = mean (log(x where m 3
= 1.2ˆs 1, 0 ))
0
= emˆ + 0.572
and sˆ = std (log(x
0
1
,
0 )) .
(5)
Applied Ocean Research 91 (2019) 101870
E. Vanem, et al.
Fig. 2. Distribution of the Weibull parameters α, β and γ as well as the associated 100-years return values for Hs for all Weibull-fits, using 10, 25 and 100 years of data. Dashed lines in the “violins” show the quartiles of the data. Black dashed lines (for cases NA and NWA) indicate the values for the target distribution. For the N10-case the coloured dashed lines indicate the values obtained when fitting the entire dataset using each of the three fitting methods, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
3.1.2. Estimation by method of moments – MoM For the method of moments fit the following set of equations for α, β and γ is solved
µˆ =
(1 +
1)
+ ,
ˆ =
ˆ=
(6a)
(1 + 2
(1 + 3
1)
1)
(1 + 2
(6b)
1) 2 ,
3 (1 + 1) (1 + 2 [ (1 + 2 1) (1 +
1)
+ 2 (1 +
1)2]3/2
1)3
,
(6c)
where µˆ , ˆ and ˆ are the sample mean, standard deviation and 4
Applied Ocean Research 91 (2019) 101870
E. Vanem, et al.
skewness, respectively. These equations can easily be solved numerically by first solving (6c) for β, then finding α from (6b), and finally obtaining γ from (6a).
also, that while ML and MoM produce close to unbiased estimates for the NA and NWA cases, the least square fitting methods produce a relatively strong bias in most cases. A peculiar feature is observed in ML fits for the N10-case, where the fitted parameter estimates cluster in two distinct groups. This is due to the fact that since the underlying Hs-data have a resolution of 0.1 m, the lowest value that is present in each resampled dataset will vary between 0.4, 0.5 and 0.6 m, and the fitted value for γ will always be close to the lowest value in the dataset. Moreover, for different values of γ, different sets of α and β values are found by the optimization routines, which again impacts the estimated 100-years return values. Hence, one should be aware that the results of the ML-fit seem to be sensitive to the lowest value that is present in the dataset, and that this not only affects the lower tail of the distribution, but also the probabilities at large values for Hs. With 100 years of data, the off-target cluster is significantly less pronounced, due to the fact that a seastate with Hs = 0.4 m is almost always present in the 100 years resampled datasets, but for 10 and 25 years of data the off-target cluster is comparable or even larger than the on-target cluster, which gives a notable bias for this case. This clustering is not observed in the MoM- and LS-results, which can be explained by the fact that the Hs-bins with lowest values are removed prior to the estimation procedure in the LS fitting, and for the MoM fitting, γ is estimated directly from the mean of the data, which is insensitive to the lowest value present in the data. As shown in the bottom row in Fig. 2, the uncertainty of the estimated 100-years return values for Hs can be quite notable, in particular for the LS fitting method. In the NWA case the variability of the 100 years return value is about 4.2 m for the LS method, when using 10 years of data, while the range of MoM and ML methods are 1.45 and 0.67 m, respectively. When using 100 years of data, the total ranges for the NWA case are reduced to 2.8, 0.4 and 0.2 m, for LS, MoM and ML, respectively.
3.1.3. Estimation by least squares and cumulative density fitting – LS In the least-square fitting of the Weibull distribution, all data with Hs below a lower value are first filtered out to provide more stable estimation of the lower threshold, γ. In this study, the lower threshold is set to Hs = 0.5 m, and any data points with Hs < 0.5 m are removed. Then, the remaining data are binned according to the Hs-value in bins with size 0.25 m, and empty bins are removed. Furthermore, all bins with less than a certain number of data points are merged iteratively with neighbouring bins until the minimum number is achieved in order to ensure stable fitting. In this study, the minimum number of data points in a bin is set to 5. The parameter values are then estimated by a least-square fit of the empirical distribution function to the 3-parameter Weibull cumulative distribution function. This is equivalent to transforming the data to Weibull scale and fitting a straight line as is commonly done with graphical estimation methods using Weibull paper. 3.2. Fitting lognormal distribution For both the ML and LS fitting method, described below, the starting values for the optimization routines were chosen as a1 = 1.0, a2 = 1.0, a3 = 0.1, b1 = 0.1, b2 = 0.1 and b3 = −0.1. In order for the fitted expressions for μ and σ, see Eq. (3), to remain non-negative and bounded we have also assumed the following bounds for the parameters: a1 > 0, a2 > 0, a3 > 0, b1 > 0, b2 > 0 and b3 < 0. It should also be noted that both the ML and LS fits leads to nonconvex optimization problems, so that for different starting values the optimization routines may find different local optima. Therefore, we have used a optimization routine that attempts to find the global maximum by exploring different local maxima, more specifically the Python function scipy.optimize.basinhopping. Each iteration of the basinhopping routine utilizes the L-BFGS-B algorithm, as described above.
4.2. Estimation of the conditional log-normal parameters for peak wave periods The fitting of the conditional lognormal distribution for the wave periods means fitting the functions μ(Hs) and σ(Hs), given by (3), and both ML- and LS-fits are reported. The resulting fits for the conditional mean and standard deviation, for 10 and 100 years of data, are shown in Figs. 3–5. It is clear that the ML method generally gives a smaller variation between the individual fits than the least square-based fitting procedure. This is not surprising, since maximum likelihood utilize all data directly, while the least-square fit applies binning of the data so that bins with few data (typically in the tail of the distribution) are given equal weight as bins that contain many data points. In this sense, the least square fit gives more weight to the tail of the distribution, and consequently is more affected by sampling variability than the ML fit. It is further noticed that in the case of ML fitting the individual fits are centred around the underlying distribution from which the data are generated, while the least square fits generally have a larger bias. In particular, the fitted functions for the conditional σ-parameter in the NA-case deviate significantly from the true function with the LSmethod. Possibly, this is due to the fact that the data are grouped together in bins and the standard deviation is calculated in each bin before the function is fitted by least squares. If many of the bins are scarcely populated, it might give a bias towards smaller conditional spread, and this may be expected for large values of Hs in particular. In this way, results may be very sensitive to bin-size, something that was also investigated in [23]. In the results presented in [22], the least squares fitting method was used for the conditional distribution with bins of size 1 m and this produced less biased results for the conditional standard deviation. It is also noted that for the results presented in [22] all bins with less than 10 observations were removed in the analysis, whereas in the results presented here, this limit is set to five. This is
3.2.1. Estimation by maximum likelihood – ML For the conditional lognormal distribution for the wave periods the log-likelihood function takes the form N
log
(a1, a2, a3, b1, b2, b3 |h1, …, hN , t1, …, tN ) =
log ti i=1
log
(hi)
1 log ti µ (hi ) 2 (hi)
2
.
(7) This is then maximized subject to the parameters a1, a2, a3, b1, b2 and b3 to provide the ML-estimators for the conditional distribution. 3.2.2. Estimation by method of moments/least squares – MoM/LS For fitting the conditional lognormal distribution for the wave periods, the data are first binned according to the Hs values in the same manner as for the LS-fitting of the Weibull distribution described above. Then, the mean and standard deviation are calculated within each bin. Finally the resulting points are fitted to (3) using a least-square fit. 4. Results 4.1. Estimation of the Weibull parameters for significant wave heights The results of the Weibull-fit to the significant wave-height data are shown in Fig. 2, showing the distribution of the N = 1000 fitted Weibull parameters, as well as the associated 100-years return values for Hs, for each of the three cases (NA, NWA and N10) for each fitting method (ML, MoM and LS) and different sample sizes (10, 25 and 100 years of data). As expected, for all three fitting methods the spread of the parameters becomes more narrow when including more years of data. There are however large differences between the different fitting methods, with ML having the lowest variance, and LS the largest. Note 5
Applied Ocean Research 91 (2019) 101870
E. Vanem, et al.
Fig. 3. Fitted wave-height dependent mean μ(Hs) and standard deviation σ(Hs) in lognormal distribution for all 1000 fits for the case NA. Maximum-likelihood (ML) fits and least-squares (LS) fits for 10 and 100 years of data. Dashed red lines show the corresponding curves for the underlying distribution. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
probably the explanation for the rather poor fits to the conditional standard deviation function. This illustrates that the least-square fitting procedure represents a biased approach for estimation of the conditional distribution parameters. It should however be mentioned that in applications where the distribution tail is of particular interest, it may be natural to employ a method that gives more emphasis to the data in the tail, such as the least square fitting described here. Note that also a weighted maximum likelihood approach may be used to give more emphasis on the tail in the ML-fit, or any other part of the distribution of particular interest.
4.3. Environmental contours Environmental contours have been calculated for each of the datasets described above, using both the IFORM-approach and the direct sampling approach. For details of how the direct sampling contours (DSC) are constructed, reference is made to e.g. [8,10], and the importance sampling technique proposed in [47] is utilized to get smooth contours. A comparison of the contours obtained from the two contour methods for all the three cases, where the initial environmental model is assumed (i.e. based on the initial (“true”) model or the models fitted
Fig. 4. Fitted wave-height dependent mean μ(Hs) and standard deviation σ(Hs) in lognormal distribution for all 1000 fits for the case NWA. Maximum-likelihood (ML) fits and least-squares (LS) fits for 10 and 100 years of data. Dashed red line shows the corresponding curve for the underlying distribution. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) 6
Applied Ocean Research 91 (2019) 101870
E. Vanem, et al.
Fig. 5. Fitted wave-height dependent mean μ(Hs) and standard deviation σ(Hs) in lognormal distribution for all 1000 fits for the case N10. Maximum-likelihood (ML) fits and least-squares (LS) fits for 10 and 100 years of data. Dashed red lines show the corresponding results when fitting the entire dataset. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 6. Comparison of environmental contours based on IFORM and direct sampling; NA (top left), NWA (top-right), N10 – ML-fit (bottom left), N10 – LS-fit (bottom middle) and N10 – MoM fit (bottom right) for 10- (red), 20- (green) and 100-year (blue) conditions. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
7
Applied Ocean Research 91 (2019) 101870
E. Vanem, et al.
Fig. 7. Comparison of environmental contours based on different estimation methods on the N10 dataset, direct sampling contours (left) and IFORM contours (right) for 10- (red), 20- (green) and 100-year (blue) conditions. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 8. Environmental contours for North Atlantic 20 years conditions based on direct sampling method and 10 (top) and 100 (bottom) years of data; maximum likelihood (left), least squares (middle) and method of moments (right) fitting.
to all the data with both fitting methods) is shown in Fig. 6. Contour lines for the 10-, 20- and 100-year conditions are shown. As can be seen from these plots, the differences vary from case to case, with the largest differences being for the NWA-case. This is a case where the density contours are highly non-convex, as can be seen from Fig. 1. For the N10-case, the comparison includes all three fitting methods, i.e. the assumed “true” model according to the ML-, LS- and MoM-estimation method, respectively.
For the N10 case, the differences between the contours from the ML-, LS- and MoM-fitted models are shown in Fig. 7, where the same contour methods are compared for the different joint models. As can be seen by comparing Figs. 6 and 7, the differences due to different joint models for Hs and Tp are much larger than the differences due to different environmental contour method for the N10 case. In order to evaluate the contour-uncertainty due to sampling variability, environmental contours are calculated for each bootstrap 8
Applied Ocean Research 91 (2019) 101870
E. Vanem, et al.
Fig. 9. Environmental contours for North Atlantic 20 years conditions based on IFORM and 10 (top) and 100 (bottom) years of data; maximum likelihood (left), least squares (middle) and method of moments (right) fitting.
Fig. 10. Environmental contours based on direct sampling method for North West Australia 100 years conditions based on direct sampling method and 10 (top) and 100 (bottom) years of data; maximum likelihood (left), least squares (middle) and method of moments (right) fitting.
9
Applied Ocean Research 91 (2019) 101870
E. Vanem, et al.
Fig. 11. Environmental contours based on IFORM for North West Australia 100 years conditions based on IFORM and 10 (top) and 100 (bottom) years of data; maximum likelihood (left), least squares (middle) and method of moments (right) fitting.
Fig. 12. Environmental contours based on direct sampling method for NORA 10 20 years conditions based on direct sampling method and 10 (top) and 100 (bottom) years of data; maximum likelihood (left), least squares (middle) and method of moments (right) fitting.
10
Applied Ocean Research 91 (2019) 101870
E. Vanem, et al.
Fig. 13. Environmental contours based on IFORM for NORA 10 20 years conditions based on IFORM and 10 (top) and 100 (bottom) years of data; maximum likelihood (left), least squares (middle) and method of moments (right) fitting.
Fig. 14. Sampling variability and variability due to contour method for the NORA 10 data; with 10 (top) and 100 (bottom) years of data; ML-estimation (left), LSestimation (middle) and MoM-estimation (right).
11
Applied Ocean Research 91 (2019) 101870
E. Vanem, et al.
Fig. 15. Environmental contours with 95% confidence intervals based on direct sampling method; ML (top), LS (middle) and MoM (bottom) fitting; NA (left), NWA (centre) and N10 (right).
sample for each case described above, and some of the results are shown in the following. For the NA-dataset, 100-year environmental contours based on direct sampling are shown in Fig. 8, where the contours are based on re-sampled data corresponding to 10, 20 and 100 years, respectively, and fitting a distribution by either the ML, LS or MoM procedure. Similar contours based on IFORM are shown in Fig. 9. The “true contours” that are indicated in the plots are the contours based on the initial model. It can be observed that for the contours based on ML-fitted distributions, the sampling variability is effectively reduced by increasing the amount of data, i.e. by going from 10-years of data to 100-years of data. This is not observed for the LS and MoM methods to the same extent, and the spread appears to be only somewhat reduced by increasing the sample size, but notable sampling variability remains even for 100 years of data. The fact that the sampling variability decreases as the data size increases and that the uncertainty due to sampling variability is larger for the contours based on least-square fitting and method of moment fitting compared to the maximum likelihood fitting is consistent with the corresponding fits of the joint distributions shown in the previous
section. Thus, for this particular case it is seen that the ML-estimation method performs best, and that results improve when the amount of data increases. Since in this case the true distribution is known, this is not surprising. Moreover, it should be recommended to use as much data as possible for contour estimation. The contours for North Atlantic conditions are not very different for the two contour approaches, and much larger differences are observed for the NWA-conditions. Hence, similar plots are shown for these data, but now for the 100-year conditions, in Figs. 10 and 11. The same trend can be observed in these plots, i.e. that the sampling variability becomes less important as the data size increases, but less so for the LSand MoM-fitting methods. Moreover, in this case it is observed that the difference between the contour methods are dominating the uncertainty, i.e. that the differences between the contours are larger than the uncertainty due to sampling variability, at least for some segments of the contours. However, for the contours based on the LS-estimates in particular, the magnitude of the sampling variability is quite large, especially for lager values of HS, meaning that an individual contour line may quite seriously over- or underestimate the extreme sea states if 12
Applied Ocean Research 91 (2019) 101870
E. Vanem, et al.
the ML-estimated models, the confidence intervals are rather narrow, indicating that perhaps 10 years of data is sufficient to construct 20year contours with adequate precision if adopting the MLE. However, in some of the cases presented herein there were notable biases, so even if the sampling variability can be reduced by increasing the sample size, care should be taken to avoid large biases in the estimated joint models. Moreover, if the true distribution is unknown, the ML method can give notable biases in the extremes, even if the sampling variability is low. 5. Discussion In this paper environmental contours based on different contour methods, different sample sizes and different estimation methods for the underlying joint distribution of met-ocean parameters are compared for three specific cases, i.e. based on parametric models for North Atlantic (NA) and Northwest Australia (NWA) wave climates and on a hindcast dataset from NORA 10. This is done by way of simulation studies where a large number of environmental contours are calculated based on parametric and non-parametric bootstrap samples. The main aim of this paper is to illustrate the sampling variability of environmental contours due to different amount of data, but some interesting observations regarding the fitting procedure is made as well.
Fig. 16. Checking the Weibull model assumption for Hs in the NORA 10 data.
5.1. Comparison of fitting methods
based on an arbitrary sample of limited size. Finally, results for the N10 data are shown in Figs. 12 and 13 for the 20-year contours. It is noted that in this case no parametric model was assumed initially, and non-parametric bootstrapping on the initial data was performed. Again, sampling variability can be seen to diminish for increasing data sizes, and more so for the ML-fitting method. However, for this case, also the LS- and MoM-fitted models display notable reduced sampling variability with 100 years of data compared to 10 years. In this case, where the starting point was a set of data rather than a know parametric distribution, it is also seen that the uncertainty in the fitting procedure dominates the overall uncertainty. Hence, even in light of the sampling variability and the uncertainty due to different contour methods, the differences between the different fitting methods are significant. In order to illustrate this, Fig. 14 includes all bootstrap 20-year contours for both contour methods in the same plot, for the ML-, LS- and MoM-fitted models respectively, and for 10 and 100 years of data. Rather than plotting all the individual bootstrap contours, the sampling variability can also be illustrated by calculating the mean contour and confidence intervals around it. Hence, environmental contours with 95% confidence intervals for the 20-year conditions are shown in Fig. 15 for all three data cases and for all three fitting methods, based on 10 years of data. Again, it is observed that the uncertainty due to sampling variability is notable and much more so for the least-square and method of moment based fitting procedures. For
For the NA and NWA cases, where the data follow the assumed distribution, it is clear that the maximum likelihood estimation method performs very well, with no obvious bias and rather small sampling variability that can be reduced by increasing the sample size. The LSestimation approach performs inferior in this case, with much larger sampling variability which persist even for quite large datasets. The LSmethod bins the data and discards all observations with Hs-values less than a prescribed cut-off value and results may be sensitive to the bin size and the cut-off value. The method of moment method also performs reasonably well, but with somewhat larger variability than the MLmethod. For the case based on a dataset rather than a known parametric distribution, it is observed that the fitting method has a very large influence on the joint distribution fitted to the data. In this case there is an extra uncertainty in the fact that the data may not follow the assumed joint distribution (3-parameter Weibull and conditional log-normal). The biggest difference is observed for the largest Hs-values, so the Weibull fit to Hs is perhaps the most critical part and the most critical assumption is that the data for Hs follow a marginal 3-parameter Weibull distribution. The validity of this assumption can be checked by plotting the data on Weibull scale, as shown in the QQ-plot in Fig. 16. From this plot the adequacy of the Weibull assumption may be assessed, and it is observed that there are some deviations, particularly in the tail. Possibly, other candidate models would fit better to the data, but
Fig. 17. Results for NA-case without regard to the minimum number of observations within each bin; fitted functions for σ(Hs) (left) and resulting 20-year contours (right). 13
Applied Ocean Research 91 (2019) 101870
E. Vanem, et al.
similar QQ-plots for some other parametric distribution families, i.e. the log-normal, exponential, inverse Weibull, gamma, kappa and Wakeby families does not suggest obvious improvements (not shown). At any rate, it is out of scope of this study to propose other distribution families for the Hs-data but it is clear that the possible deviation from the Weibull-assumption introduce extra uncertainties and this may be one explanation for why the different fitting methods utilized in this study estimate very different marginal distributions for the Hs data when trying to enforce a Weibull distribution. It is noted that the QQ plots are for the complete dataset, and larger deviations may occur for the bootstrapped samples used in the simulation study. Notwithstanding, from the plot in Fig. 16 is clear that the LS- and MoM-methods do a much better job in fitting the tail of the marginal distribution for Hs. Regarding the sampling variability, it is again observed that the LS- and MoM-fitting methods yield considerably larger sampling variability. This can be reduced by increasing the sample size, but continues to be notable even for 100 years of data. It is also observed that in the N10 case, where the Hs data are discrete with a resolution of 0.1 m, the maximum likelihood estimation is found to be sensitive to the lowest value of Hs that occurs in the dataset. In the re-sampled bootstrap samples, the lowest value is sometimes 0.4 and sometimes 0.5 m Hs and this influences the entire fitted distribution; not only the mode of the distribution but also the tails. This can explain the double peak distribution of parameter estimates observed for the marginal distribution of Hs (see Fig. 2) and it is observed that this actually leads to up to 0.5 m difference in the 100-year return value.
case specific and would vary for different structures, different responses and even different locations (environmental conditions). Hence, a detailed investigation of this is out of scope of this paper. In general, however, it can be stated that the estimated extreme structural responses will increase if the contours are inflated. A previous comparison study on the effect of contour method for some structural responses is presented in [12], and it was concluded that, with regard to the shape of the contour due to the direct sampling or the IFORM approach, this would be highly dependent on the shape of the limit state function. It is realized that both IFORM and direct sampling contour methods are based on a linear approximation of the limit state function at the design point. With the IFORM method, this linearization is performed in standard normal space, whereas the direct sampling contour method assumes a linear approximation in the physical space. The most accurate method will be the one where the linear approximation is best. The effect of different contour methods on ship responses has also been addressed in e.g. [15,48], see also [49]. 6. Summary and conclusions This paper has investigated the effect of sampling variability on environmental contours, commonly used for the design and assessment of marine structures for identifying dimensioning extreme environmental conditions (design sea states). Parametric and non-parametric bootstrap methods have been applied to obtain time-series of significant wave height and spectral peak period of different finite lengths. Environmental contours were then calculated from the various bootstrap samples by first fitting a parametric joint distribution and then calculating the contours. As expected, results show that the sampling variability becomes increasingly important for decreasing sample size. Furthermore, alternative fitting methods have been used to fit the joint distribution models, i.e. maximum likelihood, method of moments and least-squares and cumulative density fitting. The results show that the uncertainty due to sampling variability is affected by the fitting method. The effect of sampling variability is generally smallest for results based on maximum likelihood fitting, and largest for the leastsquare based fitting methods. However, it should be kept in mind that in cases where the true probability distribution is not known, as illustrated herein with the N10 case, wrong or inaccurate assumptions on the distribution can introduce notable biases, particularly with the maximum likelihood method. The fitting procedure based on least squares utilizes binning of the data, and the way this binning is performed may strongly influence the results. The uncertainty due to sampling variability has been compared to the differences due to choice of environmental contour method, and notwithstanding the differences between the fitting procedures, it is seen that for cases where there are notable differences between the contour methods, these differences are larger than the uncertainty due to sampling variability. Hence, it can be concluded that, generally, the differences between contour methods are significant even if the uncertainty due to finite data samples are taken into account.
5.2. Binning sensitivity in least square approach With the LS-fitting method, the data are collected in bins and the parametric distribution function is fitted to the binned data. One effect of this is improved fitting of the upper tail of the distribution. However, results may be very sensitive to how the binning is performed, as will be demonstrated in the following and also noted in [23]. If the binning leaves some empty bins or some bins with very few observations, there might be a strong bias in the fitting. Particularly, for the conditional fitting of the log-normal distribution for wave periods. To illustrate this, the NA-case is repeated with LS-fitting without any merging of the bins apart from removal of empty bins. This may result in several bins with very few or even just one observation. This will give very small or even zero values for the conditional standard deviation, and will hence introduce a strong bias in the results. This can be seen in Fig. 17, where results for the NA-case is presented when the LS-method has been used without any regard to the minimum number of observations in each bin. It is observed that the estimated function for the conditional standard deviation of wave periods is strongly biased, and that the resulting environmental contours are equally biased. The plots show results using 10 years of data, but the same bias are clear for any sample size. Obviously, there are ways around this, and one could put bounds on the parameter estimates, or merge bins with less than a pre-specified number of observations as was done in this study. The optimal bin size and cut-off value can be investigated in more detail on a case-by-case basis, but these results indicate that the LSprocedure may not be well suited for a fully automatic fitting procedure as employed in this simulation study, and that very wrong results could occur without proper regard to how the binning is performed.
Acknowledgements The work presented in this paper has been carried out within the research project ECSADES, with support from the Research Council of Norway (RCN) under the MARTEC II ERA-NET initiative; project no. 249261/O80. Our colleague Luca Garrè is kindly acknowledged for providing parts of code that were used in generating the results for the LS-fitting method.
5.3. Impact of contour variability on extreme structural responses This paper explores the effect of sampling variability on environmental contour estimation. This is important since environmental contours are often used to estimate extreme structural responses of marine structures. Hence, it is relevant to investigate the effect of the contour variability on the estimated responses of selected structures. However, the sensitivity to contour variability would obviously be very
References [1] L. Sagrilo, A. Næss, A. Doria, On the long-term response of marine structures, Appl. Ocean Res. 33 (2011) 208–214. [2] F.-I.G. Giske, B.J. Leira, O. Øiseth, Full long-term extreme response analysis of
14
Applied Ocean Research 91 (2019) 101870
E. Vanem, et al.
C205, 2017. [28] NORSOK, NORSOK Standard N-003:2017. Action and Action Effects, 3rd ed., (2017). [29] E. Vanem, 3-Dimensional environmental contours based on a direct sampling method for structural reliability analysis of ships and offshore structures, Ships Offshore Struct. 14 (2018) 74–85. [30] M. Reistad, Ø. Breivik, H. Haakenstad, O.J. Aarnes, B.R. Furevik, J.-R. Bidlot, A high-resolution hindcast of wind and waves for the North Sea, the Norwegian Sea and the Barents Sea, J. Geophys. Res.: Oceans 116 (2011) C05019. [31] O.J. Aarnes, Ø. Breivik, M. Reistad, Wave extremes in the northeast Atlantic, J. Clim. 25 (2012) 1529–1543. [32] E.M. Bitner-Gregersen, Joint met-ocean description for design and operation of marine structures, Appl. Ocean Res. 51 (2015) 279–292. [33] E. Bitner-Gregersen, E. Cramer, F. Korbijn, Environmental description for long-term load response of ship structures, Proceedings of ISOPE-95 Conference (1995) International Society of Offshore and Polar Engineers. [34] E. Bitner-Gregersen, Uncertainties of joint long-term probabilistic modelling of wind sea and swell, Proc. 29th International Conference on Ocean, Offshore and Arctic Engineering (OMAE 2010) (2010) American Society of Mechanical Engineers (ASME). [35] E.M. Bitner-Gregersen, A. Toffoli, Uncertainties of wind sea and swell prediction from the Torsethaugen spectrum, Proc. 28th International Conference on Ocean, Offshore and Arctic Engineering (OMAE 2009) (2009) American Society of Mechanical Engineers (ASME). [36] E. Vanem, A simple approach to account for seasonality in the description of extreme ocean environments, Mar. Syst. Ocean Technol. 13 (2018) 63–73. [37] R. Cheng, N. Amin, Estimating parameters in continuous univariate distributions with a shifted origin, J. R. Stat. Soc. Ser. B 45 (1983) 394–403. [38] D. Cousineau, Fitting the three-parameter Weibull distribution: review and evaluation of existing and new methods, IEEE Trans. Dielectr. Electr. Insul. 16 (2009) 281–288. [39] H. Nagatsuka, T. Kamakura, N. Balakrishnan, A consistent method of estimation for the three-parameter Weibull distribution, Comput. Stat. Data Anal. 58 (2013) 210–226. [40] D. Cousineau, Nearly unbiased estimators for the three-parameter Weibull distribution with greater efficiency than the iterative likelihood method, Br. J. Math. Stat. Psychol. 62 (2009) 167–191. [41] H. Ng, L. Luo, Y. Hu, F. Duan, Parameter estimation of the three-parameter Weibull distribution based on progressively Type-II censored samples, J. Stat. Comput. Simul. 82 (2012) 1661–1678. [42] M. Teimouri, S.M. Hoseini, S. Nadarajah, Comparison of estimation methods for the Weibull distribution, Statistics 47 (2013) 93–109. [43] F.N. Nwobi, C.A. Ugomma, A comparison of methods for the estimation of Weibull distribution parameters, Metodol. zv. 11 (2014) 65–78. [44] H.H. Örkcü, V.S. Özsoy, E. Aksoy, M.I. Dogan, Estimating the parameters of 3-p Weibull distribution using particle swarm optimization: a comprehensive experimental comparison, Appl. Math. Comput. 268 (2015) 201–226. [45] T. Li, W. Griffiths, J. Chen, Weibull modulus estimation by the non-linear least squares method: a solution to deviation occurring in traditional weibull estimation, Metall. Mater. Trans. A 48 (2017) 5516–5528. [46] C. Zhu, R.H. Byrd, P. Lu, J. Nocedal, Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization, ACM Trans. Math. Softw. 23 (1997) 550–560, https://doi.org/10.1145/279232.279236. [47] A.B. Huseby, E. Vanem, B. Natvig, A new Monte Carlo method for environmental contour estimation, Proc. ESREL 2014 (2014) European Safety and Reliability Association (ESRA). [48] E. Vanem, B. Guo, E. Ross, P. Jonathan, Comparing different contour methods with response-based methods for extreme ship response analysis (2019) submitted for publication. [49] E. Ross, O.C. Astrup, E. Bitner-Gregersen, N. Bunn, G. Feld, B. Gouldby, A. Huseby, Y. Liu, D. Randell, E. Vanem, P. Jonathan, On environmental contours for marine and coastal design, Ocean Eng. (2019) in press.
marine structures using inverse FORM, Probab. Eng. Mech. 50 (2017) 1–8. [3] F.-I.G. Giske, B.J. Leira, O. Øiseth, Long-term extreme response analysis of marine structures using inverse SORM, J. Offshore Mech. Arct. Eng. 140 (2018) 0516011–7. [4] A. Næss, O. Gaidai, Monte Carlo methods for estimating the extreme response of dynamical systems, J. Eng. Mech. 134 (2008). [5] J.J. Jensen, Extreme value predictions using Monte Carlo simulations with artificially increased load spectrum, Probab. Eng. Mech. 26 (2011) 399–404. [6] S. Haver, On the joint distribution of heights and periods of sea waves, Ocean Eng. 14 (1987) 359–376. [7] S. Winterstein, T. Ude, C. Cornell, P. Bjerager, S. Haver, Environmental parameters for extreme response: inverse FORM with omission factors, Proc. 6th International Conference on Structural Safety and Reliability (1993). [8] S. Haver, S. Winterstein, Environmental contour lines: a method for estimating long term extremes by a short term analysis, Trans. Soc. Naval Archit. Mar. Eng. 116 (2009) 116–127. [9] A.B. Huseby, E. Vanem, B. Natvig, A new approach to environmental contours for ocean engineering applications based on direct Monte Carlo simulations, Ocean Eng. 60 (2013) 124–135. [10] A.B. Huseby, E. Vanem, B. Natvig, Alternative environmental contours for structural reliability analysis, Struct. Saf. 54 (2015) 32–45. [11] E. Vanem, E.M. Bitner-Gregersen, Alternative environmental contours for marine structural design – a comparison study, J. Offshore Mech. Arct. Eng. 137 (2015) 0516011–8. [12] E. Vanem, A comparison study on the estimation of extreme structural response from different environmental contour methods, Mar. Struct. 56 (2017) 137–162. [13] B.J. Leira, A comparison of stochastic process models for definition of design contours, Struct. Saf. 30 (2008) 493–505. [14] A.F. Haselsteiner, J.-H. Ohlendorf, W. Wosniok, K.-D. Thoben, Deriving environmental contours from highest density regions, Coast. Eng. 123 (2017) 42–51. [15] C. Armstrong, C. Chin, I. Penesis, Y. Drobyshevski, Sensitivity of vessel response to environmental contours of extreme sea states, Proc. 34th International Conference on Ocean, Offshore and Arctic Engineering (OMAE 2015) (2015) American Society of Mechanical Engineers (ASME). [16] R. Montes-Iturrizaga, E. Heredia-Zavoni, Environmental contours using copulas, Appl. Ocean Res. 52 (2015) 125–139. [17] F. Silva-González, E. Heredia-Zavoni, R. Montes-Iturrizaga, Development of environmental contours using Nataf distribution model, Ocean Eng. 58 (2013) 27–34. [18] R. Montes-Iturrizaga, E. Heredia-Zavoni, Multivariate environmental contours using C-vine copulas, Ocean Eng. 118 (2016) 68–82. [19] L.D. Lutes, S.R. Winterstein, A dynamic inverse FORM method: design contours for load combination problems, Probab. Eng. Mech. 44 (2016) 118–127. [20] W. Chai, B.J. Leira, Environmental contours based on inverse SORM, Mar. Struct. 60 (2018) 34–51. [21] L. Manuel, P.T. Nguyen, J. Canning, R.G. Coe, A.C. Eckert-Gallup, N. Martin, Alternative approaches to develop environmental contours from metocean data, J. Ocean Eng. Mar. Energy 4 (2018) 293–310. [22] O. Gramstad, E. Vanem, E. Bitner-Gregersen, Uncertainty of environmental contours due to sampling variability, Proc. 37th International Conference on Ocean, Offshore and Arctic Engineering (OMAE 2018) (2018) American Society of Mechanical Engineers (ASME). [23] E. Vanem, Joint statistical models for significant wave height and wave period in a changing climate, Mar. Struct. 49 (2016) 180–205. [24] E. Vanem, Uncertainties in extreme value modeling of wave data in a climate change perspective, J. Ocean Eng. Mar. Energy 1 (2015) 339–359. [25] R. Montes-Iturrizaga, E. Heredia-Zavoni, Assessment of uncertainty in environmental contours due to parametric uncertainty in models of the dependence structure between metocean variables, Appl. Ocean Res. 64 (2017) 86–104. [26] F. Silva-González, L. Vázques-Hernández, R. Sagrilo, Cuamatzi, The effect of some uncertainties associated to the environmental contour lines definition on the extreme response of an FPSO under hurricane conditions, Appl. Ocean Res. 53 (2015) 190–199. [27] DNV GL, Environmental Conditions and Environmental Loads, DNV GL, DNVGL-RP-
15