Decision making for contractor insurance deductible using the evolutionary support vector machines inference model


Min-Yuan Cheng a, Hsien-Sheng Peng b,*, Yu-Wei Wu a, Yi-Hung Liao a

a Department of Construction Engineering, National Taiwan University of Science and Technology, Taiwan
b Ecological and Hazard Mitigation Engineering Research Center, National Taiwan University of Science and Technology, Taiwan

Keywords: Loss frequency; Loss severity; Construction insurance; Deductible decision

Abstract: Loss risk during the course of a construction project may be described in terms of frequency (i.e., loss frequency) and severity (i.e., loss severity). This study focused on improving the methodology used to evaluate loss risk. The authors first identified the common attributes of building construction project loss through a review of the literature and interviews with experts. Objective factors adequate to describe loss attributes were selected as model inputs. The loss prediction model was created using the evolutionary support vector machine inference model (ESIM) and deployed to evaluate loss frequency and loss severity. This research combined the deductible efficient frontier curve with the indifference curve of risk versus insurance cost, and developed criteria for optimal insurance deductible decision making.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Both project owners and contractors have gradually accepted the importance of construction insurance, which transfers various economic and financial loss risks to insurance providers. Therefore, reducing loss and improving profits through insurance strategies have become a focal point for project partners. The issues of greatest concern in project insurance are the insurance premium level and the loss risk that the assured is willing to undertake. While the insurance company tends to emphasize the project loss rate after accepting an insurance obligation, the assured tends to be most concerned about the adequacy of the insurance provider's guarantee conditions. Regardless of which perspective is taken, loss frequency and loss severity must be calculated in the process of making an insurance decision in order to pursue follow-up actions (e.g., determining insurance premium rates and deductibles). In light of the dangerous nature of construction projects, this research employed artificial intelligence (AI) to predict the two studied types of loss risk, i.e., loss frequency and loss severity. Predicting loss frequency allows prediction of the number of loss events during the project insurance period. Predicting loss severity allows prediction of the monetary value of losses caused by project accidents during the same period.

* Corresponding author. Address: #43, Sec. 4, Keelung Rd., Taipei 106, Taiwan. Tel.: +886 2 27301277; fax: +886 2 27301074. E-mail address: [email protected] (H.-S. Peng). doi:10.1016/j.eswa.2010.11.084

To infer loss frequency and loss severity, statistical methods were initially used to evaluate theoretical probability assignments, after which simulation software was used to obtain prospective loss frequency and loss severity. However, construction insurance is a newly developed area in the field of insurance, with significantly fewer contracts than, for example, business or life insurance. A further complicating factor is that each construction project presents a unique loss risk profile, with the risks characterizing one project likely unrepresentative of the risks of another. Research published to date in this area is of limited practical value, as most studies incorporate conditional assumptions that render them impractical in application to real-world projects.

Deductibles represent one of the insurance conditions in the typical construction industry insurance contract, with stipulation terms and authorized amounts affecting the rights of the assured. The most direct effect of deductibles is on the ultimate amount of compensation that an insurer is obligated to pay. While a high deductible may allow a contractor to purchase insurance at a premium within the owner's budget, the contractor is, as a result, subject to a greater out-of-pocket loss should an accident occur. Conversely, a relatively low deductible will directly increase the contractor's insurance premium. While many insurance companies previously allowed contractors to purchase zero-deductible insurance, most today have stopped offering low-deductible insurance due to construction industry risk and general economic conditions.

Based on the above discussion, estimating the loss risk associated with a specific project is an important insurance decision problem faced by contractors. The objective of this research was to establish a loss risk prediction model suitable for application to construction projects with many different risk profiles.

2. Evolutionary support vector machine inference model (ESIM)

2.1. Support vector machine (SVM)

The support vector machine (SVM) is a computer training technique popularized in recent years. It is based on the statistical learning theory described by Vapnik (Huang & Wang, 2006). Traditional training techniques usually focus on minimizing empirical risk, i.e., minimizing the classification error on training data. SVM instead aims to minimize structural risk by finding a probable upper bound on the classification error of training data (Hsu & Lin, 2002), and this new computer training technique effectively minimizes the upper bound of theoretical error.

Data classification and regression, two critical components of computer science, are being used in increasingly broad and general applications. Traditional classification methods include neural networks (NNs), decision trees and nearest-neighbour methods, among others. SVM is a newer method that has already proved its value through good results in many applications, and it has relatively firmer theoretical foundations than NNs. Support vector classification (SVC) is founded on the principle of minimizing structural risk during training. An important advantage of SVC is its ability to handle linearly inseparable problems. SVC uses existing data for training and then selects several support vectors, identified by analyzing the training data, to represent the whole data set; extreme values are eliminated in advance. Finally, the selected support vectors are packed into a model, and SVC is used to carry out classification on testing data. The concept of support vector regression (SVR) is similar to that of SVC: it maps regression problems from a low-dimensional to a high-dimensional vector space to identify the support vectors from which an appropriate linear regression equation can be obtained.
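As a concrete illustration of SVR and of the two parameters that Section 2.3 tunes (the regularization parameter C and the RBF kernel parameter γ), the minimal sketch below fits an RBF-kernel SVR on synthetic data. This is not the authors' code: scikit-learn stands in for the paper's SVM implementation, the data are random, and the shapes merely mirror the eight project attributes of Table 2.

```python
# Illustrative sketch only: RBF-kernel support vector regression,
# the regression variant on which ESIM is built.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 8))   # 8 project attributes, as in Table 2
y = rng.uniform(0, 1, size=50)        # normalized loss values in [0, 1]

# C is the fault-tolerant (regularization) parameter; gamma is the RBF
# kernel parameter -- the two values the fmGA later searches for in ESIM.
model = SVR(kernel="rbf", C=1.0, gamma=1.0 / X.shape[1])  # defaults C=1, gamma=1/M
model.fit(X, y)
print(model.predict(X[:5]))
```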

2.2. Fast messy genetic algorithm (fmGA)

The simple genetic algorithm (sGA), an efficient and accurate algorithm, was first developed by Holland in 1975. Goldberg et al. subsequently developed the messy genetic algorithm (mGA) in 1989 to remedy sGA shortcomings; the mGA resolved the problem that the sGA does not consider logical limitations among gene bunches during the optimization process. Several experiments (Huang & Wang, 2006; Lin, 2004) have since shown the mGA to be much better at solving permutation problems than the sGA. In 1993, Goldberg established the fast messy genetic algorithm (fmGA) to reduce the high memory consumption of mGA operation processes (Goldberg, Deb, Kargupta, & Harik, 1993).

There are four main differences between the solving mechanisms of the fmGA and the sGA (Day, Zydallis, & Lamont, 2002; Feng & Wu, 2006). First, chromosomes of variable length may be adopted in the fmGA. Second, simple cut and splice operators replace the sGA operator mechanism. Third, the optimization process includes primordial and juxtapositional stages. Last, competitive templates are adopted to retain the most outstanding gene building blocks in each generation (Drucker, Burges, Kaufman, Smola, & Vapnik, 1997).

2.3. ESIM framework

The evolutionary support vector machine inference model (ESIM) fuses SVM and fmGA (Chang & Lin, 2001; Goldberg et al., 1993; Knjazew, 2003). In this model, SVM is used to capture the complicated relationship between input parameters and output parameters, while the fmGA searches for the best parameters (C and γ) needed by SVM in order to improve SVM prediction accuracy. The framework of ESIM is shown in Fig. 1. Its steps are explained as follows.

Default C, γ: The values of C and γ may be set differently to reflect case and problem characteristics. C and γ can be selected as 1 and 1/M respectively, where M stands for the number of parameters.

[Fig. 1. ESIM framework: Default C, γ → Training Data Set → SVM Training Model → Average Accuracy (Fitness Function) → Termination Criteria; if not satisfied, fmGA Search for Parameters C, γ loops back to training; if satisfied, Optimized Parameters.]


Training data set: Before executing the prediction model, patterns of influence must first be identified and input into the system as training data, serving as prediction input parameters.

SVM training model: In this step, the user collects relevant historical cases for research. Case influence patterns serve as input parameters and case decisions serve as output parameters.

Table 1. Input parameters for the ESIM prediction model.

| Variable | Input value |
| Project scale: ground floors | Actual quantities |
| Project scale: underground floors | Actual quantities |
| Project duration | Actual quantities |
| Constructing season | [1]-Normal [2]-Rainy season [3]-Typhoon |
| Construction mode | |
| Excavating depth | Actual quantities |
| Total area | Actual quantities |
| Insurance years | Actual quantities |
| Constructing season of geological work | |
| Geology conditions | [1]-Hard [2]-Medium [3]-Weak |
| Structure type | [1]-RC [2]-SRC |

Table 2. Input and output data of the training cases. Desired outputs are loss frequency (average loss times; normalized) and loss severity (average loss proportion).

| Case No. | Ground floors | Underground floors | Excavating depth | Total area | Project duration | Constructing season | Geology conditions | Structure type | Average loss times | Normalized | Average loss proportion |
| 01 | 7 | 2 | 8.3 | 723.53 | 2.00 | 3 | 3 | 1 | 2.500 | 0.756 | 0.103 |
| 02 | 6 | 1 | 5.2 | 278.00 | 1.50 | 1 | 1 | 1 | 0.667 | 0.169 | 0.295 |
| 03 | 8 | 1 | 5.5 | 1190.50 | 1.42 | 2 | 2 | 1 | 1.408 | 0.406 | 0.523 |
| 04 | 5 | 2 | 8.5 | 324.35 | 1.00 | 1 | 1 | 1 | 1.000 | 0.276 | 0.254 |
| 05 | 9 | 2 | 8.9 | 489.55 | 1.50 | 1 | 1 | 1 | 0.667 | 0.169 | 0.305 |
| 06 | 10 | 1 | 5.2 | 2453.00 | 1.83 | 2 | 3 | 1 | 1.639 | 0.480 | 0.032 |
| 07 | 26 | 3 | 10.1 | 6436.70 | 2.00 | 3 | 3 | 2 | 3.000 | 0.917 | 0.053 |
| 08 | 10 | 3 | 11.3 | 2614.80 | 1.58 | 3 | 3 | 1 | 2.527 | 0.765 | 0.049 |
| 09 | 24 | 4 | 14.6 | 7495.96 | 2.50 | 1 | 1 | 2 | 0.400 | 0.083 | 0.019 |
| 10 | 11 | 3 | 11.6 | 4244.00 | 1.75 | 1 | 1 | 2 | 0.570 | 0.138 | 0.062 |
| 11 | 8 | 1 | 5.2 | 11008.00 | 2.50 | 2 | 2 | 1 | 2.000 | 0.596 | 0.006 |
| 12 | 9 | 2 | 9.3 | 1423.86 | 2.00 | 2 | 3 | 1 | 1.500 | 0.436 | 0.043 |
| 13 | 23 | 4 | 17.8 | 3342.00 | 2.08 | 1 | 3 | 2 | 2.880 | 0.878 | 0.162 |
| 14 | 11 | 1 | 5.3 | 793.57 | 1.67 | 1 | 3 | 1 | 2.395 | 0.723 | 0.242 |
| 15 | 20 | 4 | 15.6 | 4403.75 | 3.00 | 1 | 3 | 2 | 1.333 | 0.382 | 0.035 |
| 16 | 8 | 1 | 5.2 | 1185.36 | 2.00 | 2 | 1 | 1 | 1.000 | 0.276 | 0.105 |
| 17 | 28 | 4 | 16.7 | 6448.27 | 3.00 | 2 | 2 | 2 | 1.333 | 0.382 | 0.145 |
| 18 | 10 | 2 | 8.2 | 6027.40 | 2.50 | 2 | 2 | 2 | 1.200 | 0.340 | 0.013 |
| 19 | 8 | 1 | 5.3 | 378.50 | 2.00 | 1 | 1 | 1 | 0.500 | 0.115 | 0.305 |
| 20 | 14 | 3 | 12.1 | 3962.10 | 1.25 | 1 | 2 | 2 | 0.800 | 0.212 | 0.011 |
| 21 | 10 | 3 | 11.6 | 4818.60 | 2.00 | 1 | 2 | 1 | 0.500 | 0.115 | 0.371 |
| 22 | 24 | 4 | 14.6 | 4978.40 | 2.42 | 2 | 3 | 2 | 1.240 | 0.353 | 0.029 |
| 23 | 14 | 3 | 12.7 | 9974.57 | 2.33 | 3 | 3 | 2 | 1.717 | 0.505 | 0.049 |
| 24 | 26 | 4 | 16.1 | 5578.37 | 1.75 | 2 | 3 | 2 | 2.286 | 0.688 | 0.189 |
| 25 | 9 | 2 | 9.1 | 7888.85 | 2.00 | 1 | 1 | 1 | 0.500 | 0.115 | 0.077 |
| 26 | 14 | 2 | 8.4 | 2766.75 | 2.00 | 1 | 3 | 1 | 1.500 | 0.436 | 0.232 |
| 27 | 13 | 2 | 8.1 | 1723.40 | 2.00 | 1 | 2 | 1 | 0.500 | 0.115 | 0.107 |
| 28 | 9 | 2 | 8.7 | 2295.74 | 2.17 | 1 | 2 | 1 | 0.461 | 0.103 | 0.174 |
| 29 | 8 | 2 | 8.4 | 910.00 | 2.00 | 1 | 3 | 1 | 1.000 | 0.276 | 0.738 |
| 30 | 24 | 4 | 16.6 | 4554.40 | 2.50 | 2 | 3 | 2 | 2.800 | 0.853 | 0.030 |
| 31 | 7 | 1 | 5.5 | 442.10 | 1.58 | 1 | 2 | 1 | 0.633 | 0.158 | 0.144 |
| 32 | 8 | 1 | 5.4 | 4408.10 | 1.83 | 2 | 3 | 1 | 1.639 | 0.480 | 0.019 |
| 33 | 13 | 2 | 8.5 | 3852.40 | 3.00 | 2 | 2 | 2 | 1.333 | 0.382 | 0.071 |
| 34 | 24 | 3 | 12.3 | 10662.00 | 3.00 | 3 | 3 | 2 | 2.000 | 0.596 | 0.003 |
| 35 | 10 | 3 | 11.7 | 943.10 | 1.50 | 3 | 2 | 1 | 1.333 | 0.382 | 0.051 |
| 36 | 8 | 2 | 8.7 | 9717.30 | 2.00 | 2 | 3 | 1 | 2.500 | 0.756 | 0.006 |
| 37 | 13 | 2 | 9.3 | 2398.20 | 2.25 | 2 | 3 | 2 | 1.778 | 0.525 | 0.032 |
| 38 | 6 | 1 | 5.9 | 3420.80 | 1.42 | 1 | 3 | 1 | 1.408 | 0.406 | 0.031 |
| 39 | 7 | 1 | 5.1 | 1057.30 | 2.00 | 1 | 2 | 1 | 0.500 | 0.115 | 0.388 |
| 40 | 14 | 4 | 16.5 | 3473.20 | 1.67 | 1 | 2 | 2 | 0.600 | 0.147 | 0.131 |
| 41 | 22 | 4 | 15.7 | 3840.10 | 2.17 | 1 | 2 | 2 | 0.461 | 0.103 | 0.020 |
| 42 | 8 | 1 | 5.0 | 4350.70 | 1.42 | 2 | 2 | 1 | 1.408 | 0.406 | 0.022 |
| 43 | 9 | 2 | 8.9 | 603.20 | 2.00 | 2 | 2 | 1 | 1.000 | 0.276 | 0.173 |
| 44 | 14 | 3 | 12.5 | 1849.80 | 2.00 | 2 | 2 | 2 | 1.000 | 0.276 | 0.062 |
| 45 | 10 | 2 | 9.2 | 4003.60 | 2.50 | 1 | 3 | 1 | 0.800 | 0.212 | 0.017 |
| 46 | 8 | 3 | 11.8 | 5913.00 | 2.00 | 1 | 2 | 1 | 0.500 | 0.115 | 0.053 |
| 47 | 13 | 2 | 8.4 | 1550.58 | 2.00 | 1 | 3 | 2 | 1.000 | 0.276 | 0.535 |
| 48 | 8 | 2 | 8.6 | 850.68 | 2.00 | 2 | 3 | 1 | 1.500 | 0.436 | 0.340 |
| 49 | 22 | 3 | 11.1 | 7201.30 | 2.42 | 1 | 1 | 2 | 0.826 | 0.220 | 0.018 |
| 50 | 10 | 2 | 8.9 | 1358.60 | 2.00 | 2 | 3 | 1 | 2.000 | 0.596 | 0.218 |


Table 3. Input and output data of the testing cases. Desired outputs are loss frequency (average loss times; normalized) and loss severity (average loss proportion).

| Case No. | Ground floors | Underground floors | Excavating depth | Total area | Project duration | Constructing season | Geology conditions | Structure type | Average loss times | Normalized | Average loss proportion |
| 51 | 14 | 3 | 12.5 | 3685.50 | 1.25 | 3 | 2 | 2 | 1.6 | 0.468 | 0.053 |
| 52 | 8 | 3 | 11.4 | 7379.5 | 2.00 | 3 | 3 | 1 | 3.0 | 0.917 | 0.031 |
| 53 | 7 | 1 | 5.5 | 1582.6 | 1.50 | 1 | 2 | 1 | 0.667 | 0.169 | 0.115 |
| 54 | 14 | 3 | 12.6 | 3010.30 | 1.67 | 2 | 3 | 2 | 2.395 | 0.723 | 0.111 |
| 55 | 20 | 4 | 15.6 | 4100.20 | 3.00 | 1 | 2 | 2 | 1.0 | 0.276 | 0.025 |

These input and output values became the training data set and were input into the model as initial training data. SVM regards the selected C and γ values as default patterns for the first training process.

Average accuracy: This step regards the reciprocal of the objective function as the fitness function. A larger value corresponds to a superior model framework.

Termination criteria: The procedure operates continuously until certain conditions are satisfied, e.g., confirmation of appropriate fitness, or the absence of conspicuous fitness improvement over several generations, demonstrating that convergence has been reached.

fmGA search for parameters: In this step, the fmGA searches for relatively appropriate C and γ values to serve as parameters for the next generation.

Optimized parameters: Based on the above-mentioned optimization calculations, the best gene set is retained. The optimum inference model is obtained after decoding this gene set as the C and γ values for the SVM.
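Under stated assumptions, the loop formed by these steps can be sketched in a few lines. The code below is illustrative, not the authors' implementation: a plain generational GA stands in for the fmGA (whose messy coding, cut/splice operators and competitive templates are omitted), scikit-learn's SVR stands in for the SVM, cross-validated RMSE supplies the objective whose reciprocal defines fitness, and `ga_search` with its population settings is a hypothetical helper.

```python
# Minimal sketch of the ESIM search loop under simplifying assumptions.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

def fitness(params, X, y):
    C, gamma = params
    rmse = -cross_val_score(SVR(kernel="rbf", C=C, gamma=gamma), X, y,
                            scoring="neg_root_mean_squared_error", cv=5).mean()
    return 1.0 / (rmse + 1e-9)          # reciprocal of objective: larger = better

def ga_search(X, y, pop=20, gens=100, seed=0):
    rng = np.random.default_rng(seed)
    # chromosomes encode (C, gamma) directly
    population = np.column_stack([rng.uniform(0.01, 100, pop),
                                  rng.uniform(1e-4, 10, pop)])
    for _ in range(gens):                                        # 100 iterations, as in 3.3.1
        scores = np.array([fitness(p, X, y) for p in population])
        parents = population[np.argsort(scores)][-pop // 2:]     # keep the fittest half
        children = parents[rng.integers(0, len(parents), pop - len(parents))]
        children = children * rng.normal(1.0, 0.1, children.shape)  # multiplicative mutation
        children = np.clip(children, [0.01, 1e-4], [100, 10])
        population = np.vstack([parents, children])
    return max(population, key=lambda p: fitness(p, X, y))       # optimized (C, gamma)

# X_train, y_train = ...  # e.g., the 50 training cases of Table 2
# C_opt, gamma_opt = ga_search(X_train, y_train)
```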

3. Establishing a loss risk prediction model

Contractors work to minimize construction risk by purchasing a suitably comprehensive insurance package. It is critical to first understand the factors of influence in construction project loss in order to adequately assess loss risk magnitude (loss frequency and loss severity).

3.1. Input and output parameters

The number of input parameters is not limited in the ESIM model. The most important aspect, rather, is the connection between a parameter and the expected objective, as the choice of input parameters will influence the optimal solution and the model operation time. Input parameters that have no direct or important influence on the result simply increase data collection difficulty and model operation time; in certain cases, a result may even be unobtainable. Therefore, unnecessary input parameters must be deleted. However, when input parameters are limited due to lack of data, substitute factors must be identified to serve as input parameters. Considering the suitability of every influence factor on this basis, input parameters were selected as shown in Table 1. In this research, the numbers of selected input parameters for the loss frequency and loss severity prediction models were 7 and 6, respectively. The output parameters of the two models were, respectively, the annual average loss total and the average loss proportion for the target project.

3.2. Data collection

This research collected data on 55 actual building projects. Fifty of these were employed as training cases (see Table 2); five served as test cases (see Table 3).

3.3. Loss frequency prediction model

3.3.1. Model training
Data from the 50 training cases were input into the ESIM system. The optimal chromosome was identified through ESIM self-adaptation and then decoded by the system as the loss frequency prediction value. The terminal training condition was 100 search iterations. From the training results, the fault-tolerant parameter was C = 0 and the kernel function parameter was γ = 0.7857.

3.3.2. Model testing
The main purpose of model testing is to examine and demonstrate the validity of using the trained inference model for predictions involving other cases. The desired and actual outputs of the test cases are shown in Table 4 and Fig. 2. The obtained RMSE was 0.0886.

3.3.3. Project clustering analysis
To improve ESIM predictive accuracy, this research used the K-means data clustering method to aggregate similar data, following the concept of data mining. The ESIM model was then employed to obtain better prediction results. As the results in Table 4 and Fig. 2 show, actual output values were close to desired output values after clustering analysis, and the RMSE dropped from 0.0886 to 0.0633. The feasibility and rationality of this model were thus further demonstrated. A sketch of this clustering-then-training step follows.
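This is an illustrative reconstruction, not the authors' code: scikit-learn's KMeans and SVR stand in for the paper's K-means step and SVM, the helper names are ours, and the cluster count k = 3 is an assumed value (the paper does not report one).

```python
# Sketch of project clustering analysis: group similar projects with
# K-means, then train one regression model per cluster and score each
# new case with its own cluster's model.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

def train_clustered_models(X, y, k=3, seed=0):
    km = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(X)
    models = {}
    for c in range(k):
        mask = km.labels_ == c
        models[c] = SVR(kernel="rbf").fit(X[mask], y[mask])  # per-cluster SVR
    return km, models

def predict_clustered(km, models, X_new):
    labels = km.predict(X_new)                # assign each case to a cluster
    return np.array([models[c].predict(x.reshape(1, -1))[0]
                     for c, x in zip(labels, X_new)])
```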

Table 4. Loss frequency predicting outputs of the testing cases.

| Mode | Case No. | Actual output | Desired output | Optimum parameters (C and γ) | RMSE |
| Before clustering | 51 | 0.497 | 0.468 | C = 0, γ = 0.7857 | 0.0886 |
| | 52 | 0.786 | 0.917 | | |
| | 53 | 0.266 | 0.169 | | |
| | 54 | 0.619 | 0.723 | | |
| | 55 | 0.243 | 0.276 | | |
| After clustering | 51 | 0.467 | 0.468 | C = 20.1, γ = 0.0001 | 0.0633 |
| | 52 | 0.792 | 0.917 | C = 28.714, γ = 0.0715 | |
| | 53 | 0.228 | 0.169 | C = 0, γ = 0.0001 | |
| | 54 | 0.693 | 0.723 | C = 36.54, γ = 0.091 | |
| | 55 | 0.274 | 0.276 | C = 18.27, γ = 0.9091 | |
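As a sanity check on Table 4, the reported errors can be recomputed directly from the tabulated actual and desired outputs; the short sketch below reproduces the paper's 0.0886 and 0.0633 up to rounding.

```python
# RMSE over the five test cases of Table 4, before and after clustering.
import numpy as np

desired = np.array([0.468, 0.917, 0.169, 0.723, 0.276])
before  = np.array([0.497, 0.786, 0.266, 0.619, 0.243])
after   = np.array([0.467, 0.792, 0.228, 0.693, 0.274])

rmse = lambda a, d: float(np.sqrt(np.mean((a - d) ** 2)))
print(rmse(before, desired), rmse(after, desired))  # ~0.0887 and ~0.0633
```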

[Fig. 2. Loss frequency predicting outputs of the testing cases: actual output before clustering, actual output after clustering, and desired output for cases 51–55.]

[Fig. 3. Loss severity predicting outputs of the testing cases: actual output before clustering, actual output after clustering, and desired output for cases 51–55.]

Table 5. Loss severity predicting outputs of the testing cases.

| Mode | Case No. | Actual output | Desired output | Optimum parameters (C and γ) | RMSE |
| Before clustering | 51 | 0.115 | 0.053 | C = 0, γ = 0.0001 | 0.0558 |
| | 52 | 0.051 | 0.031 | | |
| | 53 | 0.174 | 0.115 | | |
| | 54 | 0.120 | 0.111 | | |
| | 55 | 0.113 | 0.025 | | |
| After clustering | 51 | 0.047 | 0.053 | C = 14.28, γ = 4.285 | 0.0149 |
| | 52 | 0.031 | 0.031 | C = 128.57, γ = 8.571 | |
| | 53 | 0.148 | 0.115 | C = 14.28, γ = 0.0001 | |
| | 54 | 0.111 | 0.111 | C = 42.85, γ = 0.091 | |
| | 55 | 0.025 | 0.025 | C = 28.57, γ = 0.7143 | |

3.4. Loss severity prediction model

3.4.1. Model training
After the 50 training cases were input into the ESIM system, the predicted loss proportion values were output by the system. As output values were restricted to a range between 0 and 1, normalization was not needed. The terminal training condition was set at 200 search iterations. From the training results, the fault-tolerant parameter was C = 0 and the kernel function parameter was γ = 0.0001.

3.4.2. Model testing
Desired and actual output values for the test cases after ESIM system inference are shown in Table 5 and Fig. 3. The RMSE was 0.0558.

3.4.3. Project clustering analysis
After clustering analysis, the RMSE was further reduced from 0.0558 to 0.0149 (see Table 5 and Fig. 3). Prediction accuracy was significantly improved.

3.5. Assessment of prediction accuracy

Risk managers may decide on and adopt an optimal management approach that reflects the most appropriate loss type. However, loss frequency and severity are estimated using past loss experience, and risk assessment is further influenced by essential differences in risk and by the quantity and accuracy of training data. The loss risk prediction model was established after verification using training and test cases. The RMSEs of the loss frequency and loss severity prediction models were 0.0633 and 0.0149, respectively. Generally speaking, predicting loss severity with an adequately high degree of accuracy is typically more difficult than doing so for loss frequency. However, in terms of risk measurement, assessing loss severity is far more important than predicting loss frequency. In addition to considerations of loss frequency and severity, risk managers should pay greater attention to predicting loss type.

The maximum acceptable prediction error will depend on the intended application of the prediction results. If loss risk prediction results are applied to decision making on insurance deductibles, the error should not be larger than the deductible amount. That is, the expected responsibility of the assured will be influenced by loss risk prediction results if prediction error makes average monetary losses greater than the deductible. Combined with the loss frequency error, the difference in monetary value due to loss prediction error is an issue to which the assured should pay close attention.

4. Establishing an optimal deductible decision-making model

After predicting anticipated contractor loss risk, the next step is to utilize this information in an insurance decision-making strategy. Presently, the most common risk management approach for contractors is to insure against all construction risk in order to shift risk to the insurance company and reduce contractors' potential losses. However, the first problem contractors face when purchasing insurance is determining an appropriate deductible level. Therefore, this research aimed to establish an optimal deductible decision-making model for all-risk construction insurance by applying loss risk prediction results to the insurance process.

4.1. Composition of risk transfer costs

In selecting a deductible decision-making strategy, the risk transfer costs that must be considered by the assured include the insurance cost, the total amount of potential loss borne, and the difference in premiums. Risk transfer costs borne by contractors differ markedly across deductible levels, and the optimal deductible decision-making strategy is to identify the deductible range with the lowest risk transfer cost. The above-mentioned risk transfer cost may be expressed as follows:

TC(D_j) = P(D_j) + LT(D_j) + R(D_j)   (1)

where TC(D_j) = risk transfer cost when the deductible is D_j; P(D_j) = insurance cost for the insured when the deductible is D_j; LT(D_j) = total amount of loss borne by the contractor when the deductible is D_j; and R(D_j) = difference in premium payable by the contractor when a loss occurs when the deductible is D_j.


4.2. Expected total amount of loss borne

The method of calculating the total amount of loss borne by the contractor [LT(D_j)] is based on the deductible. The assured must cover the full amount of the loss if the loss caused by a single accident (X_i) is less than the deductible (D_j), but is entitled to receive compensation equal to the amount by which the loss exceeds the deductible. Eq. (2) shows the loss amount L_i(D_j) that the assured should bear in the ith accident:

L_i(D_j) = min(X_i, D_j)   (2)

Therefore, if N loss accidents happen to the contractor during the insurance period, the total monetary value of the loss (LT) is the sum of the individual losses, as shown in Eq. (3):

LT = Σ_{i=1}^{N} L_i(D_j) = Σ_{i=1}^{N} min(X_i, D_j)   (3)

However, the ESIM prediction outputs are the number of losses N and the average monetary loss of each accident X_ave during the insurance period, which transforms Eq. (3) into Eq. (4):

LT(D_j) = min(X_ave, D_j) × N   (4)

Using the average monetary loss will cause some disparity in the monetary amounts, but confirmation with several actual cases showed that the optimal deductible is still identified at the same level. Therefore, this research assumes that this error is acceptable and that the decision result will not be adversely influenced.

4.3. Insurance costs

Insurance premiums and deductibles are negatively correlated: higher deductibles place a greater responsibility on the assured to compensate for losses sustained, and when insurance premiums are relatively low, the insurance company is responsible for paying less compensation for accidents. Due to the special nature of construction insurance and the rarity of insurance cases, insurance premiums are mainly checked and ratified by insurance-rating personnel. Assessments are typically based on statistical data or prior loss experience. At present, contractors that take out insurance are mostly passive in deductible decision-making. Contractors usually provide relevant project data to the insurance company in their application. Insurance-rating personnel tentatively draft deductible recommendations and corresponding insurance premium rates based on these data, which contractors can use to consider insurance costs and make their purchase decisions. An overly high premium may be lowered by adjusting the deductible; otherwise, the contractor may look for another suitable insurance provider. This research targeted executives of insurance companies with relatively good relationships with contractors. The insurance premium and associated deductible (D_j) offered by these companies were obtained using a tailored questionnaire. The relationship between deductibles and premiums is shown in Eq. (5):

P_k(D_j) = I × r_k(D_j)   (5)

where P_k(D_j) = insurance premium corresponding to deductible D_j; I = project insured value; and r_k(D_j) = insurance premium rate for deductible D_j provided by the kth insurance company.

4.4. Differences in premiums

The insurance company regards a project's insured value as the upper limit of its compensation responsibility. The assured requests compensation following a loss incident in which the value of the loss exceeds the deductible. The accumulation of settled claims reduces the assured's insured value. At this point, the assured can make up the premium difference and restore the original insured amount. The premium difference is calculated by multiplying the premium difference rate by the amount of the insurance company's claims settlement; this rate is not the same as the original rate. The marketplace at present frequently uses 1.5 times the original rate as the premium difference rate. The total premium difference R(D_j) may be calculated as follows:

R(D_j) = C(D_j) × t(D_j)   (6)

where R(D_j) = premium difference that the assured must cover; C(D_j) = total value of insurance company compensation; and t(D_j) = premium difference rate (equal to 1.5 times the original rate).
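Eqs. (1)–(6) combine into a single risk transfer cost calculation. The sketch below is illustrative rather than the paper's code: it uses the average-loss form of Eq. (4), the function name is a hypothetical helper, and the example values (X_ave = 8457, N = 2.646, and company A's premium and rate at a NTD 2000 deductible) are taken from the validation case of Section 5 (Tables 7–9).

```python
# Minimal sketch of Eqs. (1)-(6) using the average-loss form of Eq. (4).
def risk_transfer_cost(D, P, X_ave, N, t):
    L_single = min(X_ave, D)      # Eq. (2): assured's share of one accident
    LT_base = L_single * N        # Eq. (4): total loss borne by the contractor
    C = max(X_ave - D, 0) * N     # insurer's total compensation C(Dj)
    R = C * t                     # Eq. (6): premium difference R(Dj)
    return P + LT_base + R        # Eq. (1): TC(Dj) = P(Dj) + LT(Dj) + R(Dj)

# Validation case at deductible NTD 2000: premium 154,752 (company A),
# X_ave = 8457, N = 2.646, t = 1.5 x 1.3% = 1.95% (Tables 7-9)
print(round(risk_transfer_cost(2000, 154752, 8457, 2.646, 0.0195)))  # -> 160377, as in Table 11
```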

4.5. Deductible efficient frontier

In this research, pairing the losses forecasted by ESIM with various deductible values allowed calculation of the corresponding premium expenditure. However, forecasted losses and deductibles alone are inadequate for the assured's decision makers to select the optimal choice. Consequently, this research further utilized these two parameters to establish a "deductible efficient frontier" that sieves out the efficient deductibles by eliminating profitless deductible values. The procedure employed to construct the deductible efficient frontier is described below:

1. Insurance cost is designated as the Y-axis, while the total amount of losses borne, predicted using ESIM, is designated as the X-axis. The contractor is required to undertake the risk and uncertainty of the prediction error.
2. The various insurance premium points (i.e., deductibles) and the corresponding total amounts of losses borne are plotted onto the coordinate plane.
3. When deductibles' insurance premiums are the same, the one with the smaller total amount of losses borne is chosen; when the total amounts of losses borne are the same, decision makers choose the lower insurance premium.

After the sieving process in the third step, the remaining deductibles are all efficient choices for the assured. The deductible efficient frontier is therefore formed by linking each efficient deductible. The efficient deductibles that compose the deductible efficient frontier are represented as De [P(De), LT(De)], as Fig. 4 shows. Deductibles above and to the right of the deductible efficient frontier are inefficient, so the assured need only consider the deductibles on the efficient frontier.

[Fig. 4. Composition of the deductible efficient frontier: deductible points (Dj) on the insurance cost P(Dj) versus total bearing amount of losses LT(Dj) plane; the efficient deductibles De, linked together, form the frontier.]

4.6. Risk–Insurance Cost Indifference Curve

The indifference curve is a line along which associated utilities do not differ. This research used this concept to establish the Risk–Insurance Cost Indifference Curve, in which risk forms the X-axis and insurance cost forms the Y-axis. Risk refers to uncertain losses: the forecasted loss amount is predicted using ESIM, so the actual borne loss remains uncertain. For this reason, this research takes the forecasted loss amount as the measure of risk. Because contractor risk preference remains indeterminate, a straight line is the simplest and most explainable form to implement. This research therefore assumes that the Risk–Insurance Cost Indifference Curve is a straight line represented by the following equation:

P(D_j) = a + b × LT(D_j)   (7)

where P(D_j) = insurance cost; a = intercept of the indifference curve on the Y-axis; b = slope of the indifference curve, representing the decision maker's degree of risk preference; and LT(D_j) = total amount of loss. Smaller b values indicate higher risk premiums borne by the assured for the same utility level. The risk premium refers to the insurance cost increment with which the assured's decision makers evade risk. As Fig. 5 shows, the insurance costs of A and B both equal 10 when the risk is 0. When the risk increases to 10, the risk premium required by B is 10 − 4 = 6, while the risk premium required by A is 10 − 8 = 2. This means that B requires a higher risk premium than A for the same risk increment (i.e., B tends to be more risk-averse than A).

[Fig. 5. Diagram of the Risk–Insurance Cost Indifference Curve: two lines A and B starting from insurance cost 10 at zero risk and falling to 8 and 4, respectively, at risk 10.]

4.7. Optimum deductible decision-making criterion

The calculated P(D_j) and LT(D_j) values were plotted on the risk–insurance cost coordinate plane, and the efficient deductibles were sieved out to form the deductible efficient frontier. Concurrently, the decision maker should determine the slope of the Risk–Insurance Cost Indifference Curve based on risk-bearing preferences. Moving the risk–insurance cost curve toward the efficient deductibles will eventually touch one point, which represents the optimal deductible (see Fig. 6). Eq. (7) can be rewritten as Eq. (8):

a = P(D_j) − b × LT(D_j)   (8)

If the corresponding insurance cost and forecasted loss for each deductible are substituted into Eq. (8), the obtained a is the risk transfer cost (insurance cost only) at zero risk (i.e., deductible = 0) that the assured must bear, as well as the Y-intercept of the indifference curve passing through that deductible point. Characteristically for indifference curves, the utility level of the zero-deductible point is the same as that of the deductible at the tangent point. That is, the Y-intercept value a may be selected as the optimal deductible decision-making criterion. Because the utility of the risk–insurance cost indifference curve rises as it nears the origin, the deductible corresponding to the smallest a is the optimal deductible. The optimal deductible intercept a_e is given by Eq. (9):

min a = a_e = P(D_e) − b × LT(D_e)   (9)

[Fig. 6. Searching the deductible efficient frontier using the Risk–Insurance Cost Indifference Curve: indifference curves U0 and U1 with intercepts α and αe; the curve tangent to the deductible efficient frontier touches it at the optimum deductible.]
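Sections 4.5–4.7 reduce to a small computation: dominance-sieve the (LT, P) points, then minimize the intercept a_e of Eq. (9). The sketch below is illustrative (the function names are ours, not the paper's); the points are the validation case's Table 11 values from Section 5.3, and b = −5 is one of the risk attitudes used in Section 5.4, so the result can be checked against Table 12.

```python
# Sketch of the deductible efficient frontier and Eq. (9) criterion.
def efficient_frontier(points):
    # keep a point unless another point is at least as good on both axes
    # (premium and borne loss) and strictly better on one
    return [p for p in points
            if not any(q != p and q[1] <= p[1] and q[2] <= p[2]
                       and (q[1] < p[1] or q[2] < p[2]) for q in points)]

def optimal_deductible(points, b):
    frontier = efficient_frontier(points)
    return min(frontier, key=lambda p: p[1] - b * p[2])  # min a_e = P - b*LT

# (deductible D_j, premium P(D_j), total borne loss LT(D_j)) from Table 11
table11 = [(0, 185702, 524), (2000, 154752, 5625), (5000, 119040, 13367),
           (8000, 107136, 21184), (20000, 99994, 22377), (50000, 96422, 22377),
           (100000, 92851, 22377), (120000, 83328, 22377)]
print(optimal_deductible(table11, b=-5))  # -> (2000, 154752, 5625), as in Table 12
```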

5. Actual case validation

5.1. Verified case data

The verified case was a building project located in Taipei City. The construction insurance period ran from November 26, 1998 to November 25, 2000 (2 years). The project insured value was NTD 119.04 million. Relevant data are shown in Table 6.

Table 6. Project data of the validation case.

| Item | Content |
| Construction type | Building |
| Constructing season | Rainy |
| Total area | 1423.86 m2 |
| Excavating depth | 9.3 m |
| Project scale | 9 ground floors, 2 underground floors |
| Structure type | RC |
| Geology conditions | Weak |

5.2. Estimating insurance cost

The range of deductible amounts considered by this research was that most frequently used for actual building projects (see Table 7). This research compared the insurance premium rates of the Engineering Insurance Association and two insurance companies (A and B). In actuality, insurance company A, with the lower premium, was chosen to provide insurance for this project.

Table 7. Insurance cost of the validation case.

| Deductible | Engineering Insurance Association rate (%) | Insurance cost | Company A rate (%) | Insurance cost | Company B rate (%) | Insurance cost |
| 0 | – | – | 1.56 | 185,702 | 1.7 | 202,368 |
| 2000 | 1.8 | 214,272 | 1.3 | 154,752 | 1.5 | 178,560 |
| 5000 | 1.6 | 190,464 | 1.0 | 119,040 | 1.2 | 142,848 |
| 8000 | 1.0 | 119,040 | 0.9 | 107,136 | 1.0 | 119,040 |
| 20,000 | 0.95 | 113,088 | 0.84 | 99,994 | 0.9 | 107,136 |
| 50,000 | 0.9 | 107,136 | 0.81 | 96,422 | 0.85 | 101,184 |
| 100,000 | 0.82 | 97,613 | 0.78 | 92,851 | 0.8 | 95,232 |
| 120,000 | 0.8 | 95,232 | 0.7 | 83,328 | 0.75 | 89,280 |

5.3. Estimating risk transfer costs

Loss risk was predicted using the model established in this research. ESIM prediction results are shown in Table 8. The predicted average loss amount was compared with the corresponding deductibles; the amount exceeding the deductible would be the compensation that the insurance company would be required to pay. The premium difference that the contractor would need to make up was calculated by multiplying this compensation by the premium difference rate. Calculation results are shown in Table 9. The predicted average loss amount was then compared again with the corresponding deductibles; the smaller of the two represented the monetary amount that the contractor should pay. This amount multiplied by the number of losses equals the total loss amount borne during the insurance period. In addition, the difference in premiums attributable to losses must be incorporated into the total amount of losses LT(D_j). Calculation results are shown in Table 10. Summing the insurance cost together with the total monetary value of assured-borne losses provides the total risk transfer cost, namely the cost that the contractor must pay for transferring risk via insurance. Calculation results are shown in Table 11.

Table 8. ESIM output values of the validation case.

| | Loss frequency predicting | Loss severity predicting |
| Actual output | 0.043 | 0.036 |
| Transforms into | annual average loss times: 1.385 | average loss proportion: 0.0072 |
| Transforms into | expected total loss times: 2.646 | average loss amount: 8457 |

Table 9. Calculation results of total difference of premium.

| Deductible (NTD) | Contractor bearing loss amount | Compensation of single loss | C(Dj) | Rate (%) | t(Dj) | R(Dj) |
| 0 | 0 | 8457 | 22,377 | 1.56 | 2.34 | 524 |
| 2000 | 2000 | 6457 | 17,085 | 1.3 | 1.95 | 333 |
| 5000 | 5000 | 3457 | 9147 | 1.0 | 1.5 | 137 |
| 8000 | 8000 | 457 | 1209 | 0.9 | 1.35 | 16 |
| 20,000 | 8457 | 0 | 0 | 0.84 | 1.26 | 0 |
| 50,000 | 8457 | 0 | 0 | 0.81 | 1.215 | 0 |
| 100,000 | 8457 | 0 | 0 | 0.78 | 1.17 | 0 |
| 120,000 | 8457 | 0 | 0 | 0.7 | 1.05 | 0 |

Table 10. Calculation results of total amount of losses.

| Deductible (NTD) | Contractor bearing loss amount | Total bearing loss amount | R(Dj) | LT(Dj) |
| 0 | 0 | 0 | 524 | 524 |
| 2000 | 2000 | 5292 | 333 | 5625 |
| 5000 | 5000 | 13,230 | 137 | 13,367 |
| 8000 | 8000 | 21,168 | 16 | 21,184 |
| 20,000 | 8457 | 22,377 | 0 | 22,377 |
| 50,000 | 8457 | 22,377 | 0 | 22,377 |
| 100,000 | 8457 | 22,377 | 0 | 22,377 |
| 120,000 | 8457 | 22,377 | 0 | 22,377 |

Table 11. Risk transfer cost of the validation case.

| Dj | P(Dj) | LT(Dj) | TC(Dj) |
| 0 | 185,702 | 524 | 186,226 |
| 2000 | 154,752 | 5625 | 160,377 |
| 5000 | 119,040 | 13,367 | 132,407 |
| 8000 | 107,136 | 21,184 | 128,320 |
| 20,000 | 99,994 | 22,377 | 122,371 |
| 50,000 | 96,422 | 22,377 | 118,799 |
| 100,000 | 92,851 | 22,377 | 115,228 |
| 120,000 | 83,328 | 22,377 | 105,705 |

5.4. Selecting the optimum deductible

The insurance cost [P(D_j)] was designated as the Y-axis and the total amount of loss borne [LT(D_j)] as the X-axis. The various deductible points were plotted onto these coordinates, which, together, defined the deductible efficient frontier based on the efficient deductible sieving criteria, as shown in Fig. 7. The deductibles on the deductible efficient frontier were then further analyzed. The decision maker's level of reliance on the ESIM prediction model was regarded as the representative parameter of the risk attitude value (i.e., the b value). This research assumed five types of contractors in terms of risk attitude, with associated b values of −9, −7, −5, −3 and −1, respectively. Finally, the insurance cost and expected loss amount of each efficient deductible were entered into Eq. (9), based on the optimal deductible decision-making criterion. The corresponding a_e values for the various b values were then calculated, and the deductible with the smallest a_e value was designated the optimal deductible. The calculated results are shown in Table 12.

[Fig. 7. Deductible efficient frontier of the validation case: insurance cost (NTD) versus total bearing amount of losses (NTD).]

Table 12. Optimum deductibles with different risk attitudes of the validation case (min a_e = P(D_e) − b × LT(D_e)).

| De | P(De) | LT(De) | b = −9 | b = −7 | b = −5 | b = −3 | b = −1 |
| 0 | 185,702 | 524 | 190,418 | 189,370 | 188,322 | 187,274 | 186,226 |
| 2000 | 154,752 | 5625 | 205,377 | 194,127 | 182,877 | 171,627 | 160,377 |
| 5000 | 119,040 | 13,367 | 239,343 | 212,609 | 185,875 | 159,141 | 132,407 |
| 8000 | 107,136 | 21,184 | 297,792 | 255,424 | 213,056 | 170,688 | 128,320 |
| 120,000 | 83,328 | 22,377 | 284,721 | 239,967 | 195,213 | 150,459 | 105,705 |

Table 13. Validation of the optimum deductible decision-making model.

Loss records of the validation case:

| Date | Loss amount | Actual bearing loss |
| March 29, 1999 | $3000 | $2000 |
| April 19, 1999 | $17,200 | $2000 |
| January 12, 2000 | $10,500 | $2000 |
| Total | | $6000 |

Corresponding bearing loss of each efficient deductible:

| Efficient deductible | Bearing loss | Total |
| $0 | $0 | $0 |
| $2000 | $2000 + $2000 + $2000 | $6000 |
| $5000 | $3000 + $5000 + $5000 | $13,000 |
| $8000 | $3000 + $8000 + $8000 | $19,000 |
| $120,000 | $3000 + $17,200 + $10,500 | $30,700 |

Note: The deductible of this case is 2000.

In order to verify the practical feasibility of the optimal deductible decision-making model established by this research, the losses recorded in the actual validation case were compared with the deductible selected by the model. Calculation results are shown in Table 13. The deductible selected using the deductible decision-making model was NTD 2000, with a corresponding total amount of losses borne of NTD 6000 – a value lower than those of the other efficient deductibles. The actual deductible stipulated in this case was also NTD 2000, with the amount of borne losses likewise being NTD 6000. The deductible decision-making model proposed in this research was thus verified as feasible in practice.

6. Conclusion

This research established a loss risk prediction model for construction projects and estimated the insurance deductible utilizing the prediction results. It further established an optimal deductible decision-making model that is valid for use in actual construction projects. The results are explicated below.

6.1. Loss risk prediction model established

This research sieved influence factors correlating with the estimated objective for use as input parameters, and established the loss risk prediction model for construction projects using ESIM. The knowledge of professionals and accumulated loss experience from previous case studies were organized and introduced into the model. The proposed method remedies the defects attributable to the typical failure to distinguish between different project risk profiles when estimating loss frequency and severity. The mapped relationship between input and output parameters can be obtained in the shortest training time using the ESIM model, thus facilitating the solution of problems that incorporate uncertainties.

6.2. Optimal deductible decision-making model established

This research applied the concept of an efficient frontier to establish a deductible efficient frontier. Combined with the Risk–Insurance Cost Indifference Curve, the risk attitude of the insured was represented by the indifference curve slope. The deductible level with the lowest risk transfer cost could therefore be selected from among the efficient deductibles. Validation results demonstrate satisfactory consistency between the optimal deductible and the choice actually made by the project's insurance decision maker, even though risk transfer costs were influenced by a slight disparity between predicted and actual losses. The objective of the assured, realizing the lowest possible risk transfer cost, was thus satisfied. As such, the optimal deductible decision-making model established in this research is feasible in practical application.

References

Chang, C. C., & Lin, C. J. (2001). Training nu-support vector classifiers: Theory and algorithms. Neural Computation, 13(9), 2119–2147.

Day, R., Zydallis, J., & Lamont, G. (2002). In Proceedings of the 2nd international conference on computational nanoscience and nanotechnology (pp. 36–39). Cambridge: Computational Publications.

Drucker, H., Burges, C. J. C., Kaufman, L., Smola, A., & Vapnik, V. (1997). Support vector regression machines. In Proceedings of the 10th annual conference on neural information processing systems (pp. 155–161). Cambridge: MIT Press.

Feng, C. W., & Wu, H. T. (2006). Integrating fmGA and CYCLONE to optimize the schedule of dispatching RMC trucks. Automation in Construction, 15(2), 186–199.

Goldberg, D. E., Deb, K., Kargupta, H., & Harik, G. (1993). Rapid, accurate optimization of difficult problems using fast messy genetic algorithms. In Proceedings of the 5th international conference on genetic algorithms (pp. 56–64). San Mateo: Morgan Kaufmann.

Hsu, C. W., & Lin, C. J. (2002). A simple decomposition method for support vector machines. Machine Learning, 46(1–3), 291–314.

Huang, C. L., & Wang, C. J. (2006). A GA-based feature selection and parameters optimization for support vector machines. Expert Systems with Applications, 31(2), 231–240.

Knjazew, D. (2003). OmeGA: A competent genetic algorithm for solving permutation and scheduling problems. Boston: Kluwer Academic Publishers.

Lin, C. F. (2004). Fuzzy support vector machines. Ph.D. thesis, Department of Electrical Engineering, National Taiwan University.