European Journal of Operational Research 84 (1995) 35-46
Knowledge-based DSS for construction contractor prescreening

Mahmoud A. Taha a, Sang C. Park b,*, Jeffrey S. Russell a

a Department of Civil and Environmental Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
b School of Business, University of Wisconsin-Madison, Madison, WI 53706, USA
Abstract
This paper presents the development of a knowledge-based decision support system (DSS) for predicting construction contract bond claims using contractor financial data. The learning and refining sub-system of the proposed DSS employs inductive learning and neural networks to extract problem-solving knowledge for detecting a contractor's deteriorating financial condition. The acquired knowledge is stored in the knowledge sub-system and continually updated to incorporate recent additional information. This acquired knowledge augments the existing statistical models, including multiple discriminant analysis, regression, and logistic regression models. We propose a framework for integrating fragmented models and knowledge into a DSS so that sureties can analyze the outcome of each model and knowledge source in a what-if manner. Moreover, the proposed DSS is equipped with meta-knowledge that intelligently selects the most suitable models and knowledge for a given situation, thus providing a peer opinion for the sureties.
Keywords: Decision support systems; Neural nets; Artificial intelligence; Construction; Finance
1. Introduction
Surety contract bonds provide a mechanism by which an owner can acquire financial protection in the event of a contractor's failure. The focus of the surety underwriting process is on the risk of contractor failure and the contractor's ability to cover financial losses; thus the analysis performed by sureties is strongly financially oriented. To evaluate a contractor's financial stability, sureties expend a large amount of effort analyzing the contractor's financial data. Currently, the evaluation of these data is done subjectively, with a large dependence on the accumulated experience of the underwriter. Additionally, the interrelationships between the data elements are not thoroughly understood. Therefore, there exists a need to develop a tool to assist in the underwriting process. In this study, a claim contractor is defined as one that defaulted on a surety bond, requiring the surety to pay a loss; a nonclaim contractor is defined as one that has not defaulted on a surety bond. In the past few years there has been substantial attention devoted to the use of artificial intelligence techniques, such as induction of decision trees or production rules and neural networks, as tools for decision support. These methods have been applied to a wide variety of problems in the

* Corresponding author.
0377-2217/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved
SSDI 0377-2217(94)00316-5
business area including prediction of bank failure (Tam et al., 1992) and prediction of failure of savings and loans (Salchenberger et al., 1992) because of their ability to discern patterns among data. The integration of these methods with conventional decision support systems can provide a means of significantly improving the quality of decision making by individuals and organizations. This paper deals with the development of a Knowledge-Based Decision Support System (KBDSS) to assist in predicting construction contract bond claims using contractor financial data.
2. Background

2.1. Relevant existing statistical models

A review of past studies of financial data analysis reveals that multiple discriminant analysis (MDA), regression, and logistic regression models are frequently developed to evaluate such data. Filippone (1976) developed a model to statistically test the dependability of common surety underwriting measures used when analyzing the risk of contractor failure. The model was constructed using MDA on 28 construction firms (14 claim cases resulting from failure to complete contracts and 14 nonclaim). The model categorizes each candidate as a claim or nonclaim case by calculating a Z-score for each firm and comparing it to a predetermined threshold value. Kangari et al. (1992) presented a quantitative model based on financial ratios, using regression analysis to assess the financial performance and grade of a construction company and its chances of business survival. The model was developed in terms of six financial ratios: 1) total current assets to total current liabilities, 2) total liabilities to net worth, 3) total assets to revenues, 4) revenues to net working capital, 5) return on total assets, and 6) return on net worth. The model also considered six construction types: 1) general contractors, 2) operative builders, 3) heavy construction, 4) plumbing, heating and air conditioning, 5) electrical work, and 6) other specialty trades. The model provides useful information, but it was not validated with actual data.
Severson et al. (1993) developed a predictive model using discrete choice modeling to predict the probability of a claim on construction contract surety bonds. The data used to develop the model were obtained from surety companies for a recent time period (1983-1991). The variables identified in the model are: 1) cost monitoring, 2) underbillings to sales, 3) total current liabilities to sales, 4) retained earnings to sales, and 5) net income before taxes to sales. A direct comparison of the performance of the statistical predictor models presented here is difficult, as each study differs with respect to the modeling technique and classification criteria. However, the common result that emerges from these past studies is that the analysis of contractor financial data is often formulated as a classification problem.
2.2. Characteristics of the data sample

The data sample used in the study was collected from five surety companies underwriting construction contract bonds. These companies provided financial statements for 128 contractors (57 claim and 71 nonclaim), as well as whether each company had a formal cost monitoring system. Claim contractors are defined as those who defaulted on a contract bond and caused the surety to incur a loss because of their failure to complete their projects. Approximately 95% of the nonclaim contractors and 26% of the claim contractors had a formal cost monitoring system. The contractors in the data sample were categorized by construction type, defined according to the Standard Industrial Classification Manual (1987). The construction types addressed in this study are: 1) general building construction, 2) heavy construction, and 3) special trade construction. Table 1 shows the number of contractors for each construction type. The time period represented ranges from 1983 to 1991. The financial statements used the percentage-of-completion income recognition method. The quality of the financial statements is summarized in Table 2.

Table 1
Construction types in data sample

Construction type   Claim cases   Nonclaim cases   Total
Building                 15             23           38
Heavy                    24             25           49
Special trade            18             23           41
Total                    57             71          128

Table 2
Quality of financial statements in data sample

Quality of financial statement   Claim cases (%)   Nonclaim cases (%)
Audit                                  47                 68
Review                                 38                 25
Compilation                             9                  2
Unknown                                 6                  5
Total                                 100                100

In this study, each case in the data set represents one contractor. Thirteen variables (attributes) have been used to describe each case. The outcome of each case is given as claim or nonclaim. The first 12 variables were chosen from the 14 ratios commonly used by Dun & Bradstreet and surety industry professionals. These ratios measure a company's ability to meet short-term obligations (solvency or liquidity ratios), indicate how effectively a company uses its assets (efficiency ratios), and show how successfully a company earns a return from its operations (profitability ratios). The thirteenth variable indicates whether the contractor has a formal cost monitoring system (no = 0, yes = 1); the existence of a formal cost control system assists in monitoring project costs and increases the probability of a successful execution of the project. Table 3 presents these variables along with their definitions.
2.3. Machine learning methodologies

Artificial intelligence techniques such as induction of decision trees or production rules and neural networks are emerging as a paradigm for classification. These techniques differ from traditional parametric statistical methods in a fundamental way: parametric statistics require the developer to assume the nature of the functional relationship between the dependent and independent variables (Salchenberger et al., 1992).
Table 3
Description of variables used in the study

Measure         Variable        Definition                                                          Avg. claim   Avg. nonclaim
Solvency        Quick Ratio     (Cash + Accounts Receivable) / Total Current Liabilities                1.26          2.64
                Current Ratio   Total Current Assets / Total Current Liabilities                        1.62          3.11
                CL/NW           Total Current Liabilities / Net Worth                                   2.23          1.09
                TL/NW           Total Liabilities / Net Worth                                           4.41          1.38
                FA/NW           Total Fixed Assets / Net Worth                                         -0.04          0.55
Efficiency      Col_Per         Collection Period: (Accounts Receivable / Sales) x 365                 77.67         54.37
                TA/SALES        Total Assets / Net Sales                                                0.55          0.40
                SALES/WC        Net Sales / Net Working Capital (Current Assets - Current Liab.)     -165.32          9.85
                AP/SALES        Accounts Payable / Net Sales                                            0.15          0.08
Profitability   NIAT/SALES      Net Income After Taxes / Net Sales                                     -0.02          0.03
                NIAT/TA         Net Income After Taxes / Total Assets                                  -0.05          0.06
                NIAT/NW         Net Income After Taxes / Net Worth                                      2.59          0.11
                CST_MON         Formal system comparing actual and estimated field productivity (% yes)  26.00        95.00
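The ratio definitions in Table 3 translate directly into a small computation. The sketch below is illustrative only; the financial-statement field names (cash, receivables, etc.) are our own, not from the paper:

```python
def financial_ratios(fs):
    """Compute the Table 3 ratio variables from a financial-statement dict.
    The dict keys are illustrative names, not taken from the paper."""
    return {
        "Quick Ratio": (fs["cash"] + fs["receivables"]) / fs["current_liabilities"],
        "Current Ratio": fs["current_assets"] / fs["current_liabilities"],
        "CL/NW": fs["current_liabilities"] / fs["net_worth"],
        "TL/NW": fs["total_liabilities"] / fs["net_worth"],
        "FA/NW": fs["fixed_assets"] / fs["net_worth"],
        # Collection period: receivables as days of sales
        "Col_Per": (fs["receivables"] / fs["sales"]) * 365,
        "TA/SALES": fs["total_assets"] / fs["sales"],
        # Working capital = current assets - current liabilities
        "SALES/WC": fs["sales"] / (fs["current_assets"] - fs["current_liabilities"]),
        "AP/SALES": fs["payables"] / fs["sales"],
        "NIAT/SALES": fs["niat"] / fs["sales"],
        "NIAT/TA": fs["niat"] / fs["total_assets"],
        "NIAT/NW": fs["niat"] / fs["net_worth"],
    }
```

Note that several denominators (net worth, working capital) can legitimately be negative for distressed contractors, which is why values such as SALES/WC = -165.32 appear in Table 3.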
Parametric methods also require a priori assumptions regarding the distributions of the data, and tend to focus on tasks in which all the attributes have continuous or ordinal values (Quinlan, 1992). Artificial intelligence techniques, by contrast, use the data themselves to develop relationships among the variables, without requiring a priori assumptions regarding the distribution of attribute values or the independence of the attributes. As a consequence, better results can be expected from these methods when the relationship between the variables does not fit the assumed model (Salchenberger et al., 1992). Several researchers have compared statistical and artificial intelligence classification methods, including Breiman et al. (1984), Weiss and Kapouleas (1989), Weiss and Kulikowski (1991), and Tam and Kiang (1992); some of their findings are encoded heuristically in the form of meta-knowledge in the proposed system. In this paper, inductive learning and artificial neural networks are used to evaluate a contractor's financial data, using financial variables that describe a contractor's deteriorating financial condition.
3. Overview of the proposed system architecture

Decision support systems (DSS) are interactive computer-based systems that help decision makers utilize data and models to solve unstructured problems. During a decision making process, information as input is of vital importance. The primary source of information is raw data, which is processed into information by means of relevant modeling methodologies. Both data and models are necessary resources that need to be managed efficiently for effective performance of the overall decision making process. In addition to the prevailing mathematical and statistical modeling tools, the recent growing attention to Artificial Intelligence (AI) methodologies opens a new domain for decision making. Machine learning methodologies, a sub-field of AI, extract knowledge (a learned concept) from training examples (observations) to be used for better decision making.

The proliferation of math/stat models and knowledge necessitates a structured way to store, retrieve, and utilize appropriate models and knowledge along with data. One solution is an intelligent DSS equipped with a control structure that governs the selection of the proper model or knowledge. This control mechanism is driven by meta-knowledge, which can be acquired either through knowledge engineering (human encoding of knowledge) or by applying machine learning methodologies. Criteria used to select among the competing models and knowledge include, but are not limited to: the build-up time; the flexibility of representing decision classes (the capability of representing multiple decisions and continuous decision values); the prediction accuracy; the effectiveness in identifying discriminant attributes and their structural dependency; and the effectiveness of prediction when the training data's distribution is multimodal. An example of meta-knowledge is described in Section 4.

We propose a specialized decision support system using the Knowledge-Based DSS framework proposed by Piramuthu et al. (1993). Statistical models and knowledge on contractors' financial stability are stored in the Knowledge System (KS) along with the meta-knowledge. Here, the term 'knowledge' is extended to incorporate models and trained neural networks. The meta-knowledge and the induction tree are represented in rule form; statistical models are represented as formulas; a trained neural network is represented as a connected network with weights. The Learning and Refining System (LRS) interacts with the KS: the knowledge learned and refined incrementally by the LRS is stored in the KS for further processing by the Problem Processing System, so the LRS adds an incremental learning facility to the proposed system. Two distinct machine learning methods, inductive learning and artificial neural networks, are incorporated in our LRS. In this research, focus is given to the components of the KS and the LRS; a detailed description can be found in Section 4. The Problem Processing System (PPS) first consults the meta-knowledge to select the proper model or knowledge for the decision maker. Then,
39
M.A. Taha et al. /European Journal of Operational Research 84 (1995) 35-46
the PPS gathers input from users through the Language System (LS) interface, and applies it to the model or knowledge selected from the KS to provide the outcome for the decision maker. The proposed system is shown in Fig. 1.

4. Description of decision models and meta-knowledge

4.1. Inductive learning method

Inductive Learning, or Concept Learning from Examples, can be defined as the process of extracting the description of a class from a set of training examples (Shaw, Park and Raman, 1992). The output generated by this method is a set of decision rules consisting of an inductive concept definition for each of the classes. For a given classification problem, a set of cases v1, v2, ..., vn, collectively called the training set V, is available. Each case consists of two parts, vi = (ti, yi), where ti is a vector of observations and yi is a classification decision made by a domain expert. On the basis of the training set, a decision rule eta(t, y) can be constructed. The intention is to use eta(t0, y) to predict the unobserved (unseen) class y0 of a new case on the basis of its predictor vector t0. The basis of the induction task is a set of positive and negative training examples. In the case of the data collected for this study, the positive examples are cases of nonclaim contractors and the negative examples are cases of claim contractors. These examples can be assembled in two ways. First, they may come from an existing database that forms a history of observations. Objects of this type may be reliable from a statistical point of view; however, there may be redundant cases, or uncommon cases that were not encountered during the period of record keeping (Quinlan, 1986). Second, the objects may be a carefully culled set of tutorial examples prepared by a domain expert(s). The key to the development of an inductive learning model is the reliability of the rule induction algorithm. The inductive inference tool used in this study is Quinlan's inductive package C4.5, a descendant of ID3 (Quinlan, 1986). The reasons for selecting this concept learning system include the following:
Fig. 1. Proposed intelligent decision support system. [Figure: the user interacts through the Language System (LS) with the Problem Processing System (PPS); the Learning and Refining System (LRS), comprising the C4.5 inductive learner, a backpropagation trainer, and a critic for refinement, supplies the Knowledge System (KS), which stores the induction tree, trained neural network, meta-knowledge, and math/stat models (regression, logistic regression, and MDA).]
1) its ability to work with the noisy and missing data anticipated in this domain;
2) its ability to handle both discrete and continuous (real) attributes;
3) its flexibility in representing the induced knowledge in the form of decision trees and/or production rules.

The induction process is based on repeatedly dividing a group of training examples by the value of a selected attribute, in the hope that the examples in a subgroup will belong to the same class. The program generates a classifier in the form of a decision tree, whose structure comprises:
• a leaf, indicating a class, or
• a decision node, specifying a test to be performed on a single attribute value, with one branch and sub-tree for each possible outcome of the test.

A decision tree can be used to classify a case by starting at the root of the tree and moving through it until a leaf is encountered. The program starts with a randomly selected subset of the input data (the working set) to generate a trial decision tree. This tree is used to classify the remaining objects in the training set. If they are all correctly classified, the decision tree is deemed satisfactory and the process terminates; otherwise, a small number of the objects misclassified by the decision tree is added to the working set and a new decision tree is constructed. This process is repeated until all training examples are correctly classified. C4.5 selects the test that forms the root of a tree using the gain ratio criterion (Quinlan, 1988). The program also contains heuristic methods for simplifying decision trees; their purpose is to produce more comprehensible structures without compromising accuracy on unseen cases, either by pruning the original tree or by expressing the classification model as production rules. The production rules have the form L -> R, where the left-hand side L is a conjunction of attribute-based tests and the right-hand side R is a class.
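The gain ratio criterion can be sketched for a discrete attribute as follows. This is a simplified illustration of the idea, not C4.5 itself (which also handles continuous attributes by thresholding):

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy of a list of class labels (or attribute values)."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(attr_values, labels):
    """Gain ratio of splitting on an attribute: the information gain of the
    split, normalized by the split information (entropy of the partition)."""
    n = len(labels)
    # Partition the class labels by attribute value
    parts = {}
    for v, y in zip(attr_values, labels):
        parts.setdefault(v, []).append(y)
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts.values())
    split_info = entropy(attr_values)
    return gain / split_info if split_info > 0 else 0.0
```

The attribute with the highest gain ratio becomes the test at the root of the (sub)tree; this is the sense in which C4.5 "selects first those attributes about which there is the least uncertainty."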
Fig. 2. Induction tree for claim/nonclaim contractors. [Figure: the decision tree generated by C4.5 from the training set.]

The decision tree shown in Fig. 2 was generated using the C4.5 algorithm. Note that the decision tree is developed in terms of only six of the thirteen financial variables used to describe the training set. This is because C4.5 first selects those attributes about which there is the least uncertainty concerning their association with a particular class (identifying validity). The variables selected are: 1) accounts payable to total sales, 2) net income after taxes to net worth, 3) existence of a cost monitoring system, 4) collection period, 5) total assets to total sales, and 6) net income after taxes to total sales. Besides identifying the most essential discriminant attributes, the decision tree provides the structural relationship between them (structural validity). In the statistical models, the same set of attributes may be selected with their relative
weights. Yet, the statistical models fail to show their conditional dependencies. For example, once the value of accounts payable to total sales is known, the importance of net income after taxes to net worth is reduced (for some claim cases, the latter attribute is not needed at all). Nonetheless, statistical models assign the same weights to an attribute even when such additional information is available. If the statistical model is built using stepwise regression, this problem can be partially avoided; however, the decisions on which attributes to select at each step, and on whether attributes selected in an earlier stage should be retained in a later stage, are prone to the subjectivity of the researcher (model driven). By contrast, the inductive learning method derives this structural dependency from the given data rather than from the researcher's subjective judgement (data driven), thus eliminating a possible source of bias. The third advantage of the induction tree lies in its predictive ability. Every model can predict the outcome of an unknown case if the case is fully described; if the values of several essential attributes are unknown, most statistical models fail to predict. The induction tree, on the other hand, can still provide a correct prediction for some such cases. Suppose no information is given on the existence of a cost monitoring system. The induction tree still identifies many claim cases using three rules:

Rule 1: IF AP/SALES > 0.2, THEN claim.
Rule 2: IF AP/SALES <= 0.2 and NIAT/NW <= 0.24, THEN claim.
Rule 7: IF AP/SALES <= 0.2 and NIAT/NW > 0.24 and Col_Per <= 86 and TA/SALES <= 0.24 and NIAT/SALES <= 0, THEN claim.

The prediction error of this decision tree, as determined using 10-fold cross-validation experiments, was 12.6%.
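Rules 1, 2, and 7 can be applied mechanically, which illustrates how the tree still classifies some cases even when the cost monitoring attribute is missing. A sketch, with the thresholds as printed above (`None` meaning no rule could be evaluated or fired):

```python
def predict_claim(ap_sales, niat_nw, col_per=None, ta_sales=None, niat_sales=None):
    """Apply Rules 1, 2, and 7 of the induction tree; note that the CST_MON
    attribute is not needed by any of them. Returns "claim" if a rule fires,
    or None if no rule applies (including when required values are missing)."""
    if ap_sales > 0.2:                   # Rule 1
        return "claim"
    if niat_nw <= 0.24:                  # Rule 2 (AP/SALES <= 0.2 holds here)
        return "claim"
    if None not in (col_per, ta_sales, niat_sales):
        # Rule 7 (AP/SALES <= 0.2 and NIAT/NW > 0.24 hold here)
        if col_per <= 86 and ta_sales <= 0.24 and niat_sales <= 0:
            return "claim"
    return None
```

For example, a contractor with AP/SALES = 0.3 is flagged as a claim case by Rule 1 alone, regardless of every other attribute.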
4.2. Neural network model

Neural networks are mathematical models of theorized mind and brain activity (Simpson, 1990). They exploit the massively parallel local processing and distributed representation properties believed to exist in the human brain. The primary intent of neural networks is to explore and reproduce human information processing tasks. The subject of neural networks is hardly new, but there has been much recent progress in developing methods for training more complex configurations of these networks. Much of the recent excitement in this field can be traced to new learning techniques which, coupled with improved implementation techniques, provide the potential for wider practical application of these devices. A neural network is made up of a number of simple, highly interconnected processing elements that process information by their dynamic state response to external inputs. Arranging the processing elements in different meaningful configurations leads to different neural network models. A special class of neural networks, multi-layer feed-forward nets, which have proved their utility and power in the development of complex classification systems (Wilson et al., 1992; Pao, 1989), is used in this study. A feed-forward network with appropriate link weights can be used to model the causal relationship between a set of variables. The architecture, as the name implies, consists of multiple layers of neurodes, as shown in Fig. 3. These layers are: (1) the input layer, which introduces information from the environment to the network; (2) the output layer, which holds the response of the network to a given pattern; and (3) the middle or hidden layers, which are any layers between the input and output layers. Each unit in the middle and output layers can have a threshold, usually referred to as a bias, associated with it. Neural networks with hidden layers have the ability to develop internal representations.
The middle layer neurodes are often characterized as feature detectors that combine raw observations into higher-order features, thus permitting the network to make reasonable generalizations (Salchenberger et al., 1992). The outputs of nodes in one layer are transmitted to nodes in another layer through links that amplify or attenuate those outputs through weight factors. Except for the input layer nodes, the net input to each node is the sum of the
weighted outputs of the nodes in the prior layer. For example, the net input to a node in layer j is

net_j = sum_i w_ji * O_i,   (1)

and the output of node j is

O_j = f(net_j),   (2)

where f is the activation function. Activation functions map a neurode's input to its output; typically there is a threshold level of the neurode's activity at which the neurode outputs a signal. Usually, a neural network model starts with a random set of weights, and a training algorithm is used to adjust these weights. In this study, the backpropagation learning algorithm (Rumelhart, Hinton and Williams, 1986; McClelland and Rumelhart, 1988) is used to perform the training; it has been widely used in the development of many applications. Backpropagation is a multilayer gradient-descent error-correction algorithm. It calculates the interconnection weights by presenting a set of pattern pairs (A_k, C_k), k = 1, 2, ..., l, where the k-th pattern pair is represented by the input vector A_k = {a_1^k, a_2^k, ..., a_n^k} and the desired output vector C_k = {c_1^k, c_2^k, ..., c_m^k}. The derivation of the backpropagation learning rule requires that the derivative of the activation function exists. The most widely used activation function is the logistic sigmoid function. Eq. (3) shows that the output O_j of a unit j is computed by applying the sigmoid function to the net input net_j calculated using Eq. (1):

O_j = 1 / (1 + e^(-net_j)).   (3)

The learning procedure consists of the net starting off with a random set of weight values, choosing one of the training-set patterns, using this pattern as input, evaluating the output(s) in a feed-forward manner, and propagating the error back to adjust the weights. The errors at the output(s) generally will be quite large at first, which necessitates changes in the weights.

Fig. 3. Trained neural network for claim/nonclaim contractors. [Figure: input layer of 13 financial attributes, two hidden layers, and a single output unit; output > 0.5 indicates nonclaim, otherwise claim.]

Using the backpropagation procedure, the net calculates the change in all the weights in the net for that particular pattern. This procedure is repeated for all the patterns in the training set to yield the resulting delta-w for all the weights for that presentation, and then repeated until the weights converge to a stable set exhibiting only small fluctuations in value as further learning is attempted. To help convergence, two constants modulate the weight change: the learning rate, which multiplies the error gradient, and the momentum, which adds a fraction of the previous change. Since backpropagation is a gradient-descent procedure, the system follows the contour of the error surface, always moving downhill in the direction of steepest descent. In multi-layer networks, the error surface is complex, with the possibility of many minima, some deeper than others; in this case, a gradient-descent method may not find the best possible solution to the problem at hand. However, high-dimensional spaces (with many weights) have relatively few local minima (Rumelhart, Hinton and Williams, 1986). For the purpose of our system, a fully connected neural network with 13 inputs (one for each financial attribute), two middle layers with 15 units in the first and 8 units in the second, and one output unit is used to distinguish between claim and nonclaim contractors. The learning control parameters, a learning rate of 0.5 and a momentum rate of 0.7, are chosen to control the learning process. Since the outcome belongs to a binary classification (claim vs. nonclaim), only a single output unit of the net is used. The decision is interpreted as: output value > 0.5: nonclaim contractor; output value <= 0.5: claim contractor.
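Eqs. (1)-(3) and the weight update with learning rate and momentum can be sketched as follows. This is a minimal illustration using plain Python lists, with biases omitted for brevity; it is not the authors' implementation:

```python
import math

def sigmoid(x):
    """Logistic sigmoid activation, Eq. (3)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(layer_weights, inputs):
    """Feed-forward pass. layer_weights[l][j][i] is the weight from node i of
    one layer to node j of the next; each node outputs the sigmoid of the
    weighted sum of the previous layer's outputs (Eqs. (1) and (2))."""
    outputs = inputs
    for W in layer_weights:
        outputs = [sigmoid(sum(w * o for w, o in zip(row, outputs))) for row in W]
    return outputs

def update_weight(w, grad, prev_delta, lr=0.5, momentum=0.7):
    """One backpropagation weight update: a step against the error gradient
    (scaled by the learning rate), plus a momentum term that reuses the
    previous step to speed convergence. Returns (new weight, new delta)."""
    delta = -lr * grad + momentum * prev_delta
    return w + delta, delta
```

With a single output unit, a forward-pass result above 0.5 would be read as nonclaim and otherwise as claim, as described in the text; the default rates match the 0.5 and 0.7 quoted above.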
However, since the output value ranges over [0, 1], it is also possible to define additional classes, such as near-claim and near-nonclaim contractors, thus mitigating the rigidity of decision-class representation found in a decision tree generated by the inductive learning method. The model's prediction error on unseen cases, measured by 10-fold cross-validation experiments, was found to be 10.25%. In general, a neural network takes more learning time than the inductive learning method (approximately 5 minutes to generate the induction tree versus 40 hours to train the neural network on a DECstation 5000/200 Ultrix workstation; both algorithms were written and compiled in C). Thus, there is a tradeoff between prediction accuracy and learning time. The learning time of the neural network can be reduced considerably by presorting the training data using the induction tree; the convergence time was reduced by approximately 8 hours. The resulting trained network is shown in Fig. 3. The flexibility of representing output values is one of the advantages of the neural network over the induction tree. Not only can the output value be represented in continuous form, as opposed to the limited number of classes of the induction tree, but multiple outputs can also be incorporated, since each output node of the network represents a dependent variable. Even in statistical models, multiple dependent variables are hard to represent, so they must be modeled one by one.
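The 10-fold cross-validation used to obtain both error rates can be sketched generically. This is an illustration, not the paper's code; `train_fn` stands for either learner and is assumed to return a predictor function:

```python
import random

def cross_validation_error(cases, labels, train_fn, k=10, seed=0):
    """Estimate prediction error by k-fold cross-validation: train on k-1
    folds, count misclassifications on the held-out fold, and average the
    error over all cases."""
    idx = list(range(len(cases)))
    random.Random(seed).shuffle(idx)          # fixed seed for reproducibility
    folds = [idx[i::k] for i in range(k)]     # k roughly equal folds
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = train_fn([cases[i] for i in train], [labels[i] for i in train])
        errors += sum(model(cases[i]) != labels[i] for i in fold)
    return errors / len(cases)
```

Applied to the 128-contractor sample, this is the procedure behind the 12.6% (induction tree) and 10.25% (neural network) error estimates quoted above.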
4.3. Description of the example statistical model

The predictive construction contract surety bond claim model developed by Severson et al. (1993) is also implemented in the system. The model calculates the probability of experiencing a claim in the accounting period following the period in which the financial statement was prepared, and was developed using the same data set described in Section 2.2:

Y = 2.27 - 7.72 * CST_MON + 45.05 * UB/SALES + 13.94 * TL/SALES - 13.24 * RE/SALES - 34.42 * NIBT/SALES,
Probability of claim = e^Y / (1 + e^Y),

where CST_MON is 1 if a cost monitoring system is in place and 0 otherwise, UB is under-billings, TL is total liabilities, RE is retained earnings, and NIBT is net income before taxes. As in the case of the trained neural network, the decision class is represented in the range [0, 1]. Prediction accuracy in terms of error rate has not been calculated, since it is beyond the focus of this research.

Table 4
Comparison between different modeling methodologies - the case of a unimodal/normal data distribution

                         Build-up time   Output value          Prediction accuracy
Statistical model        short           continuous            good
Induction tree           short           distinctive classes   good
Trained neural network   long            continuous/multiple   slightly better

Table 5
Comparison between different modeling methodologies - the case of a multimodal data distribution

                         Explanatory capability   Missing value handling   Prediction accuracy
Statistical model        no                       no                       inferior
Induction tree           yes                      yes                      good
Trained neural network   partially yes            no                       slightly better
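The Severson et al. (1993) model described above is straightforward to evaluate in code. A sketch, using the coefficients as printed; the five arguments are the model's ratio variables:

```python
import math

def claim_probability(cst_mon, ub_sales, tl_sales, re_sales, nibt_sales):
    """Probability of a claim in the next accounting period, per the
    Severson et al. (1993) discrete choice model (logistic form)."""
    y = (2.27
         - 7.72 * cst_mon        # cost monitoring system (1 = yes, 0 = no)
         + 45.05 * ub_sales      # under-billings to sales
         + 13.94 * tl_sales      # total liabilities to sales
         - 13.24 * re_sales      # retained earnings to sales
         - 34.42 * nibt_sales)   # net income before taxes to sales
    return math.exp(y) / (1.0 + math.exp(y))
```

The signs show the model's structure directly: the presence of a cost monitoring system and higher retained earnings or pre-tax income push the claim probability down, while under-billings and liabilities push it up.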
4.4. The meta-knowledge

As described in Section 3, the meta-knowledge guides the best use of the existing statistical models and acquired knowledge, and resolves possible conflicts between the models. If the distribution of the training data (observations) is unimodal and follows a normal distribution, machine learning methodologies do not have a distinct advantage over statistical methodologies in terms of build-up time, flexibility of output value representation, or prediction accuracy (see Table 4); considering cost and benefit, choosing the proper model involves a tradeoff (time vs. accuracy and flexibility). If the training data follow a multimodal distribution, however, the prediction accuracy of the statistical model decreases (Tam and Kiang, 1992); see Table 5. It is also known that the distribution of financial ratios does not always follow the normal distribution (Deakin, 1976). Moreover, models built with machine learning methodologies have the following relative advantages over the statistical model: 1) the induction tree has a superb capability for explaining the rationale of the underlying decision making process, by revealing the structural relationship between discriminant attributes; 2) the induction tree can be used even when the case to be predicted contains missing values; and 3) the trained network has flexibility in representing multiple output classes with continuous values. In this preliminary study, the meta-knowledge was built heuristically; ideally, thorough comparative experimentation would be needed to construct full-fledged, more accurate meta-knowledge. Fig. 4 illustrates sample meta-knowledge.
IF user-selection exists THEN user-selected model.
ELSE IF missing-value exists THEN induction-tree.
ELSE IF structural decision explanation is necessary THEN induction-tree.
ELSE IF multiple outputs (dependent variables) are present THEN trained-neural-network.
ELSE IF no evidence on unimodal data distribution is present THEN trained-neural-network (1) & induction-tree (2).
ELSE select any model.

Fig. 4. Sample meta-knowledge.
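The ordered rule chain of Fig. 4 can be sketched as a simple priority-ordered dispatcher. The `Case` record and its field names are illustrative assumptions standing in for whatever situation description the DSS maintains; only the rule order and the model choices come from Fig. 4.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Case:
    """Hypothetical description of the situation presented to the DSS."""
    user_selection: Optional[str] = None   # model explicitly chosen by the surety
    has_missing_values: bool = False       # case to be predicted has missing attributes
    needs_explanation: bool = False        # structural decision explanation required
    multiple_outputs: bool = False         # multiple dependent variables present
    unimodal_evidence: bool = False        # evidence of unimodal data distribution

def select_models(case: Case) -> List[str]:
    """Return model(s) in order of preference, mirroring Fig. 4."""
    if case.user_selection:                # IF user-selection exists
        return [case.user_selection]
    if case.has_missing_values:            # ELSE IF missing-value exists
        return ["induction-tree"]
    if case.needs_explanation:             # ELSE IF structural explanation is necessary
        return ["induction-tree"]
    if case.multiple_outputs:              # ELSE IF multiple outputs are present
        return ["trained-neural-network"]
    if not case.unimodal_evidence:         # ELSE IF no evidence of unimodality
        return ["trained-neural-network", "induction-tree"]
    return ["any"]                         # ELSE select any model
```

Because the rules are tried strictly in order, an explicit user selection always overrides the heuristics, and the statistical model is only competitive when the unimodality assumption of Table 4 is supported.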
5. Summary and conclusion

Our proposed system integrates fragmented statistical models and knowledge into a DSS so that sureties can analyze the outcome of each model and knowledge source in a coordinated manner. Rather than relying on a single model, sureties can easily compare the outcomes of different methodologies and thus reach a better decision. Moreover, the proposed DSS is equipped with meta-knowledge that intelligently selects the most suitable models and knowledge for the given situation, thus providing a peer opinion for the sureties. In this problem domain, the induction tree shows better prediction in the non-claim case, whereas the trained network provides better prediction in the claim case (which suggests that the non-claim cases identified by the induction tree can be re-examined by the trained neural network). Overall, both knowledge sources predict well (the trained neural network is slightly better). Statistical models are good when their strict assumptions hold and no specific requirements are present. Yet, if there are missing values in the case to be predicted, if multiple dependent variables are present, or if a structural explanation of the decision is required, knowledge acquired through machine learning methodologies has relative advantages over the existing statistical models. Moreover, the acquired knowledge is continually updated to incorporate recent additional information. In conclusion, this acquired knowledge augments the existing statistical models, including multiple discriminant analysis, regression, and logistic regression models.
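The re-examination strategy suggested above — trusting the induction tree on claim verdicts but double-checking its non-claim verdicts with the trained network — can be sketched as a small two-stage combiner. The classifier interfaces and the 0.5 threshold are our assumptions for illustration; the paper proposes the idea but not a specific implementation.

```python
def two_stage_predict(tree_predict, network_predict, case, threshold=0.5):
    """Combine the two knowledge sources per the conclusion's suggestion.

    tree_predict(case)    -- returns 'claim' or 'non-claim' (induction tree)
    network_predict(case) -- returns a claim probability in [0, 1] (trained net)
    """
    verdict = tree_predict(case)
    if verdict == "claim":
        return "claim"   # the tree's claim verdicts stand on their own
    # The tree was weaker on claim cases, so its non-claim verdicts are
    # re-examined by the network, which predicted claims better.
    return "claim" if network_predict(case) >= threshold else "non-claim"
```

This ordering deliberately trades some false alarms for fewer missed claims, which matches the surety's asymmetric exposure: an undetected claim is far costlier than a declined bond.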
Acknowledgements

The third writer sincerely thanks the National Science Foundation (Grant No. MSM-9058092, Presidential Young Investigator Award) for financial support of this effort. The writers also wish to thank the numerous industry professionals who provided the data analyzed in this investigation.
References

Breiman, L., Friedman, J.H., Olshen, R.A., and Stone, C.J. (1984), Classification and Regression Trees, Wadsworth, Belmont, CA.
Caudill, M., and Butler, C. (1992), Understanding Neural Networks: Computer Explorations, Vol. 1: Basic Networks, MIT Press, Cambridge, MA.
Deakin, E.B. (1976), "Distribution of financial accounting ratios: Some empirical evidence", Accounting Review, January, 90-96.
Duda, R., and Hart, P. (1973), Pattern Classification and Scene Analysis, Wiley, New York.
Filippone, R.W. (1976), "A statistical analysis of some common underwriting measures used by contract surety underwriters", Thesis presented at Cleveland State University, School of Business Administration, Cleveland, OH, in partial fulfillment of the requirements for the degree of Master of Business Administration.
Kangari, R., Farid, F., and Elgharib, H. (1992), "Financial performance for construction industry", ASCE Journal of Construction Engineering and Management 118/2, 349-361.
Lin, F.C., and Lin, M. (1993), "Analysis of financial data using neural nets", AI Expert 8/2, 36-41.
McClelland, J., and Rumelhart, D. (1988), Explorations in Parallel Distributed Processing, MIT Press, Cambridge, MA.
Piramuthu, S., Park, S.C., Raman, N., and Shaw, M.J. (1993), "Integration of simulation modeling and inductive learning in adaptive decision support systems", Decision Support Systems 9, 127-142.
Quinlan, J.R. (1986), "Induction of decision trees", Machine Learning 1, 81-106.
Quinlan, J.R. (1988), "Decision trees and multi-valued attributes", Machine Learning 11, 305-318.
Quinlan, J.R. (1992), C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA.
Rumelhart, D.E., Hinton, G.E., and Williams, R.J. (1986), "Learning internal representations by error propagation", in: D.E. Rumelhart and J.L. McClelland (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1: Foundations, MIT Press, Cambridge, MA, 318-362.
Russell, J.S., Jaselskis, E.J., and Zhai, H. (1993), "Relationship of economic and contractor financial data to surety bond claims", ASCE Journal of Construction Engineering and Management, under review.
Salchenberger, L.M., Cinar, E.M., and Lash, N.A. (1992), "Neural networks: A new tool for predicting thrift failures", Decision Sciences 23, 899-916.
Severson, G.D., Russell, J.S., and Jaselskis, E.J. (1993), "Predicting construction contract surety bond claims using contractor financial data", to appear in ASCE Journal of Construction Engineering and Management.
Shaw, M.J., Park, S.C., and Raman, N. (1992), "Intelligent scheduling with machine learning capabilities: The induction of scheduling knowledge", IIE Transactions 24/2, 156-168.
Simpson, P.K. (1990), Artificial Neural Networks, Pergamon, Oxford.
Standard Industrial Classification Manual (1987), Executive Office of the President, Office of Management and Budget, Washington, DC.
Stone, M. (1974), "Cross-validatory choice and assessment of statistical predictions", Journal of the Royal Statistical Society 36, 111-147.
Tam, K.Y., and Kiang, M.Y. (1992), "Managerial applications of neural networks: The case of bank failure predictions", Management Science 38/7, 926-947.
Weiss, S.M., and Kapouleas, I. (1989), "An empirical comparison of pattern recognition, neural nets, and machine learning classification methods", in: Proceedings of the 11th International Joint Conference on AI, Detroit, MI, 781-787.
Weiss, S.M., and Kulikowski, C.A. (1991), Computer Systems That Learn, Morgan Kaufmann, San Mateo, CA.
Wilson, R., and Sharda, R. (1992), "Neural networks", OR/MS Today, August, 36-42.
Yeh, Y.-C., Kuo, Y.-H., and Hsu, D.S. (1992), "Building KBES for diagnosing PC pile with inductive learning", ASCE Journal of Computing in Civil Engineering 6/2, 200-219.