Information Sciences 179 (2009) 542–558
Diversity of ability and cognitive style for group decision processes

David West a,*, Scott Dellana b

a Department of Marketing and Supply Chain Management, College of Business Administration, East Carolina University, 3205 Bate Building, Greenville, NC 27858-4353, USA
b Department of Marketing and Supply Chain Management, College of Business Administration, East Carolina University, Greenville, NC 27858-4353, USA
Article history: Received 4 May 2007 Received in revised form 24 October 2008 Accepted 27 October 2008
Keywords: Multi-agent group decisions Ability diversity Cognitive diversity Bankruptcy detection
Abstract: This research investigates the potential for two forms of error diversity (ability diversity and diversity of cognitive style) to increase the accuracy of multi-agent group decision processes. An experimental methodology is employed that rigorously controls for the sources of error diversity. The results indicate that ability diversity decreases group decision errors by approximately 4%. Cognitive diversity is much more effective; decision errors are reduced by approximately 13% by groups formed from four cognitive classes. As sources of ability and cognitive diversity increase, the generalization error of the group decision decreases, and the prominence of the most capable member (i.e., expert) in the group diminishes. Thus, the popular reliance on using more capable members to create high performance homogenous groups may be misguided. This research indicates that a better strategy is to create groups of members that "think differently" and cooperate to produce a group decision. Using this strategy, we are able to reduce the group decision error in two bankruptcy detection data sets by 11–47%. Reductions of this magnitude are extremely significant in the high volume, high value, repetitive decision environments that characterize the financial domain, where error reductions of even a fraction of a percent are welcome. © 2008 Elsevier Inc. All rights reserved.
1. Introduction

The concepts of multi-agent decision systems [53] and cooperative intelligent systems [45] have recently been proposed for decision applications in economics and finance [11]. The literature distinguishes between multi-agent systems that employ competitive synthesis and systems that use cooperative synthesis [45]. In a competitive synthesis system, group members work separately on the same problem and the best individual solution is identified as the group's solution. Logically, the accuracy of competitive synthesis systems can be increased by identifying and including smarter experts. By contrast, in cooperative synthesis systems, the individual capabilities of all group members are aggregated to produce a group solution. The primary means of increasing the accuracy of a cooperative synthesis group is to create error diversity among the group members, so that group members think differently and disagree on some aspects of the problem theory. There is no advantage in cooperative synthesis systems if all group members have similar problem solving mechanisms and come to the same decision. Several studies conclude that a problem's error can be reduced by cooperative synthesis group decision processes (sometimes referred to as ensembles) [21,22,32]. Despite the recent evidence of the superiority of cooperative synthesis group decisions, most of the research to date in the financial domain focuses on competitive synthesis, finding smarter experts. The study of the impact of a group's error diversity on the accuracy of cooperative synthesis is an underdeveloped
research area. While there has been substantial research on data manipulation methods to create group error diversity, other diversity sources such as variations in group members' capabilities or cognitive styles are largely ignored [10]. To the best of our knowledge, this is the first study to systematically evaluate the importance of ability diversity and cognitive diversity in group decision processes. Further, we choose to focus on the financial application of bankruptcy detection because this decision environment is characterized by high volume (there are tens of thousands of firms to monitor), high significance (there are millions of dollars at stake), and repetitive activity (it requires constant updating and monitoring). Our motivation for this research is supported by the following quote from Brown et al. [10]: "The number of investigations into using different types of neural networks (or different types of learners in general) in ensembles is disappointingly small. If we want diverse errors in our ensembles, it makes sense that using different types of function approximator may produce this."

2. Literature review

The literature search focuses first on the current theoretical knowledge of multi-agent cooperative synthesis systems and the role of error diversity. We then describe a limited set of the research conducted on competitive synthesis systems, the search for smarter experts. The final subsection characterizes the research on error diversity mechanisms for cooperative synthesis systems, the search for smarter groups.

2.1. Theory of cooperative synthesis

The theoretical evaluation of the relationship between sources of diversity and group decision accuracy begins with the quadratic error of the group decision and attempts to decompose the quadratic error into terms that measure both the average quadratic error of the individual group members and the group diversity effect.
The group classification decision involves algorithms that output discrete class labels, zero-one loss functions, and aggregate decisions produced by majority vote. Unfortunately, the theoretical analysis for classification decisions is complex and is still being developed. We will briefly highlight the theoretical development for regression group decisions, which are characterized by real-valued outputs with convex combinations. This problem has been easier to analyze and is well developed. Readers interested in more detail on the chronological development of this theory are referred to Brown et al. [10]. For regression, the group decision $D_g$ is a convex combination in which each individual member's decision $d_i$ is weighted by $w_i$: $D_g = \sum_i w_i d_i$, where $w_i > 0$ and $\sum_i w_i = 1$. Krogh and Vedelsby [34] algebraically rearranged the quadratic error of the group regression decision at a test value $v$ to produce the following form, referred to as the ambiguity decomposition:

$$(D_g - v)^2 = \sum_i w_i (d_i - v)^2 - \sum_i w_i (d_i - D_g)^2. \qquad (1)$$
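The identity in Eq. (1) can be verified numerically. The following sketch uses made-up member decisions, weights, and target value; the numbers are illustrative only and are not from the paper's data.

```python
# Numerical check of the ambiguity decomposition, Eq. (1), with
# hypothetical member predictions (illustrative values only).
import numpy as np

d = np.array([2.0, 3.0, 5.0])      # individual member decisions d_i
w = np.array([0.2, 0.5, 0.3])      # convex weights: w_i > 0, sum to 1
v = 4.0                            # test target value

Dg = np.dot(w, d)                  # group decision: convex combination

group_error = (Dg - v) ** 2                 # left-hand side of Eq. (1)
avg_error = np.dot(w, (d - v) ** 2)         # weighted average member error
ambiguity = np.dot(w, (d - Dg) ** 2)        # weighted member disagreement

# Eq. (1): group error = average member error minus ambiguity
assert np.isclose(group_error, avg_error - ambiguity)
print(group_error, avg_error, ambiguity)
```

Because the decomposition is an algebraic identity, the assertion holds for any choice of decisions, weights, and target; the trade-off discussed below concerns how the two right-hand terms move together as diversity changes.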
The first term on the right-hand side of Eq. (1) expresses the group error as a function of the weighted average error of the individual group members. The second term, referred to as the ambiguity term, is always a positive value and measures the weighted average variability among the group members. Larger disagreements among group members create larger ambiguity values, which in turn decrease the group error. Minimizing the group error involves a complex trade-off: larger ambiguity terms are associated with higher individual error rates. Therefore, higher levels of diversity do not guarantee more accurate decisions. It is essential to achieve an effective balance between diversity (the ambiguity term) and individual accuracy (the average error term) to minimize the group error. We can also observe from Eq. (1) that the error of the convex-combined group will be less than or equal to the average error of the individuals. For group classification decisions we depend on the qualitative advice of Hansen and Salamon [21], who claim that a necessary and sufficient condition for a majority voting group to be more accurate than its individual members is that the individuals are accurate and diverse.

2.2. Competitive synthesis: the search for smarter experts

There has been a significant research effort devoted to competitive synthesis systems. This methodology usually takes the form of competitive tournaments with a new champion competing against a subset of traditional algorithms. In this section, we review some of the competitive synthesis examples but caution the reader that this coverage is not exhaustive. One of the first efforts is that of Wilson and Sharda [63], who compare the predictive capabilities for firm bankruptcy of neural networks and multivariate discriminant analysis; they conclude that neural networks predict firm bankruptcies significantly better than discriminant analysis. Tam and Kiang [56] investigate a neural network to identify bank defaults. They compare in a competitive manner a neural network, linear discriminant analysis, logistic regression, k-nearest neighbor, and recursive partitioning trees. Their results demonstrate that the neural network shows promise for evaluating bank default in terms of predictive accuracy, adaptability, and robustness. Zhang et al. [69] also conclude that neural networks are significantly better than logistic regression in bankruptcy prediction. Serrano-Cinca [52] proposes Self-Organizing Feature Maps for financial diagnosis; the maps project financial ratio information from a high dimensional space to a two dimensional map. The Self-Organizing Feature Map is compared with linear discriminant analysis as well as with multilayer perceptron neural network models [52]. Tsakonas et al. [57] demonstrate
the efficient use of hybrid intelligent systems based on genetic programming for solving the bankruptcy detection problem. Huang et al. [27] introduce support vector machines to the bankruptcy detection problem in an attempt to provide a model with better explanatory power. The authors obtain prediction accuracy of 80% for both the multilayer perceptron neural network and support vector machines using data from the United States and Taiwan markets [27]. Lee et al. [37] develop a hybrid neural network for bankruptcy prediction and conclude from Korean bankruptcy data that the hybrid neural network is more promising for bankruptcy prediction in terms of accuracy and adaptability. Finally, a number of other authors have focused on variations of neural network models, including the following [3,6,9,13,15,16,20,25,28,35,36,45–49,62,66,67,69,74,75].

2.3. Cooperative synthesis: the search for smarter groups

The ability of cooperative synthesis systems to produce accurate decisions depends on a number of formal mechanisms that generate diversity of error among group members [10,71]. A substantial set of research has accumulated on diversity generating mechanisms. We will limit our discussion to some of the most significant publications and to those that most closely relate to our research. The reader is advised to consult Brown et al. [10] for a comprehensive survey of error diversity and diversity creation methods. Brown et al. [10] categorize diversity creation methods as follows: manipulation of training data, manipulation of architectures, and hypothesis space traveled. We refer to diversity sources created by the manipulation of training data as content diversity. The category of error diversity that Brown et al. [10] refer to as manipulation of architecture includes ability diversity and diversity of cognitive style. Ability diversity is created by manipulating the parameters of a single architecture to create learners with differing capacities. Diversity of cognitive style (which we refer to hereafter as simply cognitive diversity) results from manipulation of architectures to create agents that think differently.

The initial empirical research on cooperative synthesis systems relies primarily on content diversity (data manipulation methods). Two of the most common methods to generate content error diversity are bagging [7] and boosting [58]. Bagging and boosting operate directly on the examples (i.e., the rows) in the training set. Bagging implicitly creates data manipulation error diversity by random sampling with replacement to form bootstrap replicates, while boosting increases the proportion of observations that are not correctly classified during prior training cycles. Content error diversity can also be created by variable subset selection (randomly choosing subsets of feature variables), by adding noise to the input or target data, and by randomly reversing a proportion of the output targets in the training data.

Bootstrap aggregation (bagging) is a popular way of achieving content error diversity; each member learns from a unique bootstrap replicate of the original set of learning examples [7]. Breiman [7] demonstrates this by testing bootstrap groups of classification and regression trees on several benchmark data sets, reporting reductions in decision error ranging from 6% to 77%. A number of other studies focus primarily on decision agent systems with content error diversity [5,14,23,26,39,55,60,68,72–74]. Generally, these studies employ a single cognitive style (algorithm) and a single ability level. Skurichina and Duin [54] investigate bagging for linear discriminant classifiers. Franke [19] and West [62] research bootstrapping for groups of neural networks, while Kim [30] uses both bagging and boosting for groups of support vector machines in multi-class classification problems. Bauer and Kohavi [4] and Opitz and Maclin [42] also research forms of bagging and boosting on decision trees and naïve Bayes algorithms. The explicit content error diversity generated by boosting is studied by Drucker et al. [18] and Schwenk and Bengio [50] for homogenous groups of neural networks and by Kearns and Mansour [31] for decision trees.

In contrast to bagging and boosting, subset feature selection creates content error diversity by operating on the feature variables (the columns of the training set). Subset feature selection (selecting random subsets of feature variables for each learner) is particularly useful for problems with large feature sets such as DNA micro-array data. Abdel-Aal [1] uses abductive network groups trained on different features for improving classification accuracy in medical diagnosis. Kim and Cho [29] research subset selection in groups learning DNA micro-array data, while Kim [30] reports on variable subsets in groups of neural networks to maximize direct mailing response. Yu and Cho [65] create feature subset selection for groups of support vector machines to analyze customers' historical purchase data.

The practice of injecting noise into the learning data is a data manipulation method intended to increase content error diversity by distributing group members over a more diverse set of local minima in the error surface, creating a more independent set of estimators. Raviv and Intrator [47] add noise to boosted groups of neural networks, and Dietterich [17] investigates randomization of input data for groups of decision trees and compares the results with bagging and boosting. Zhang [70] adds noise to input data for neural network groups that predict time series forecasts.

Content error diversity can also be created by data manipulation methods that operate on the training output data. In this research thread, Melville and Mooney [40] study diversity created by oppositional relabeling of artificial training examples in decision trees, while Breiman [8] examines the potential to increase group prediction accuracy by perturbing the outputs alone.

A few studies have implicitly examined sources of error diversity generated by the manipulation of architecture (i.e., varying the ability level and/or the cognitive style of group members). Versace et al. [59] estimate the performance of groups consisting of recurrent back-propagation and radial basis function neural networks for predicting the exchange-traded fund DIA. Mukkamala et al. [41] construct a group of neural networks, support vector machines, and multivariate adaptive regression splines for intrusion detection. Peddabachigari et al. [44] also study the problem of intrusion detection
with heterogeneous groups of decision trees and support vector machines. Cho [12] introduces a mechanism for adapting the structure of a group of self-organizing maps. Malhotra et al. [38] create groups of neural networks, discriminant analysis, quadratic discriminant analysis, k-nearest neighbor, and multinomial logistic regression analysis in a retail department store decision environment. Mangiameli et al. [39] and West et al. [61] employ cognitive diversity to analyze the accuracy of medical decision support systems and breast cancer detection, respectively.

2.4. Contributions of this research

While the effectiveness of data manipulation to create content error diversity is extensively researched, there is little recognition in the literature of the impact of either ability diversity or cognitive diversity on group decision error. We are unable to find any research that explicitly controls for each diversity mechanism and estimates its relative contribution to group decision accuracy. The objective of this paper is to estimate the potential of ability diversity and cognitive diversity sources to reduce cooperative synthesis group decision error. We formulate the following specific research questions to support this objective.

1. What decision accuracies can be expected of groups that are limited to content diversity?
2. Will the inclusion of ability diversity in the groups investigated in question 1 increase decision accuracy?
3. Will the inclusion of cognitive diversity in the groups investigated in question 2 increase decision accuracy?
4. How does the composition of group membership differ between the most accurate and least accurate groups?
In the next section, we discuss the sources of group diversity investigated in this research. Section 4 explains our research methodology and the experimental design used to estimate the group decision error and the effects of changing group composition. The mean generalization error for several levels of diversity is presented in Section 5 for US bankruptcy data, and in Section 6 for Spanish bank bankruptcy data. The changing composition of groups is analyzed in Section 7. We conclude in Section 8 with a discussion of results and implications to guide the development of intelligent decision systems for financial applications.
3. Controlling sources of error diversity

Before discussing error diversity, we need to clarify some terminology used in this paper. We adopt the expression "decision agent" in our research to define an entity that has a specific cognitive style and a mechanism for learning from examples of a problem domain in its environment. The decision agent has the ability to render decisions about the status of objects in its environment. This is a more limited concept of an agent, which is generally defined as an entity that has some form of perception of its environment, can act, and can communicate with other agents. While it is technically possible for our decision agents to communicate with each other during problem solving, we choose to have them work autonomously in this research. We use a group of terms synonymously in the remainder of the paper: a decision group can also be identified as an ensemble, and a group member is identified as a decision agent, as a member, or as an algorithm, depending on the context of the discussion.

In this research, we investigate three major sources of error diversity in group decision processes. The first source of diversity is diversity in learning content caused by the perturbation of the training sets of the bagging algorithm. The second source of diversity is generated by the variation in an agent's cognitive capacity as the agent's architecture is varied. For example, as the MLP hidden layer neurons vary from 2 to 10 (Appendix), the capacity of the MLP agent to represent complex decision boundaries increases. This form of diversity, which we refer to as ability diversity, results from agents that think the same (i.e., have the same approach to solving the problem) but have more or less ability to represent complex decision boundaries. The third source of diversity is created when group membership consists of agents from different cognitive classes; we refer to this as cognitive diversity. For example, we will establish that MLP and KNN agents think differently, i.e., they have different ways of representing the problem and use different heuristics to achieve improved solutions [24]. Each source of diversity is discussed briefly in the following subsections.

3.1. Bagging to create content diversity

The bagging algorithm is the most common method of creating content diversity for multi-agent decision systems. It functions by varying the learning content available to each intelligent agent. Each agent has some specific learning examples emphasized through duplication in its bootstrap replicate, while other learning instances are missing. For unstable agents, these differences in learning content result in significantly different parameter estimates and potentially a diversity of member decisions that creates more accurate collective decisions. Unique training sets (bootstrap replicates) are formed to estimate agent parameters for each group member by sampling with replacement from the original training set partition Li. Details of the bagging algorithm are given below [7]. In step 4, the collective group decision is created by a majority vote of the B hypotheses.
Algorithm for bagging
Given: a training set of size n and a base classification algorithm C_t(x)
Step 1. Input the sequence of training samples (x_1, y_1), ..., (x_n, y_n) with labels y ∈ Y = {−1, +1}
Step 2. Initialize the probability of each example in the learning set, D_1(i) = 1/n, and set t = 1
Step 3. Loop while t ≤ B group members
  (a) Form a training set of size n by sampling with replacement from distribution D_t
  (b) Get hypothesis h_t : X → Y
  (c) Set t = t + 1
End of loop
Step 4. Output the final group hypothesis $C^*(x_i) = h_{final}(x_i) = \arg\max_{y \in Y} \sum_{t=1}^{B} I(h_t(x_i) = y)$
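The steps above can be sketched in runnable form. The base classifier below is a trivial one-feature threshold rule chosen for brevity; this is an assumption for illustration, since the paper's agents are the MDA, LR, KD, KNN, MLP, and RBF models described in Section 3.3.

```python
# A minimal sketch of the bagging algorithm: B bootstrap replicates,
# one hypothesis per replicate, and a majority vote (Step 4).
import numpy as np

rng = np.random.default_rng(0)

def base_classifier(X, y):
    """A stand-in base learner: one-feature threshold rule (illustrative)."""
    t = X[:, 0].mean()
    sign = 1 if y[X[:, 0] > t].mean() >= 0 else -1
    return lambda Z: np.where(Z[:, 0] > t, sign, -sign)

def bagging(X, y, B=25):
    n = len(X)
    members = []
    for _ in range(B):                        # Step 3: loop over B members
        idx = rng.integers(0, n, size=n)      # (a) sample with replacement
        members.append(base_classifier(X[idx], y[idx]))  # (b) hypothesis h_t
    def vote(Z):                              # Step 4: majority vote
        votes = np.sum([h(Z) for h in members], axis=0)
        return np.where(votes >= 0, 1, -1)
    return vote

# Synthetic two-feature data with labels in {-1, +1} (illustrative)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] > 0.1, 1, -1)
group = bagging(X, y)
print((group(X) == y).mean())   # accuracy of the bagged group vote
```

With B odd, the vote sum is never zero, so ties cannot occur; this mirrors the common practice of choosing an odd number of group members for two-class majority voting.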
3.2. Agent architecture and ability diversity

Ability diversity characterizes agents that think alike (i.e., are from the same cognitive class) but have different capabilities to form decision boundaries. Higher capacity agents can form more complex decision boundaries; lower capacity agents form simpler decision boundaries. For agents from cognitive classes like k-nearest neighbor, Kernel density estimation, the multilayer perceptron neural network, and the radial basis function neural network, it is possible to vary the capacity (and therefore the ability) of the agent by the choice of agent architecture. For example, MLPa, MLPb, ..., MLPe form a sequence of MLP architectures with increasing numbers of hidden neurons and therefore increasing capacities (see Appendix). The term 'capacity' comes from statistical learning theory and refers to the representational abilities of the agents [58]. High capacity agents can form more complex decision boundaries and consequently fit the training data more closely (possibly at the expense of overtraining and failing to generalize to the test data). Agent capacity is, in turn, related to the concept of shattering. An agent C(H) can shatter a set of data points X_n if, for all assignments of labels to those data points, there exists an H such that the model C(H) makes no errors evaluating that set of data points. The capacity of the agent can then be measured by the Vapnik-Chervonenkis (VC) dimension. The VC dimension of C(H) is the maximum h such that some data point set of cardinality h can be shattered by C(H). The VC dimension h can then be used to estimate an upper bound on the test error of C(H) with probability 1 − η, where h is the VC dimension and N the size of the training set:
$$\text{TestError} \le \text{TrainError} + \sqrt{\frac{h(\log(2N/h) + 1) - \log(\eta/4)}{N}}. \qquad (2)$$
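Eq. (2) can be evaluated directly. The sketch below uses hypothetical values of the VC dimension h and confidence parameter η, with N = 230 matching the learning-set size reported in Section 4.1, to show how the complexity penalty grows with agent capacity.

```python
# Evaluating the VC bound of Eq. (2) for hypothetical capacities h.
import math

def vc_bound(train_error, h, N, eta=0.05):
    """Upper bound on test error holding with probability 1 - eta."""
    penalty = math.sqrt((h * (math.log(2 * N / h) + 1) - math.log(eta / 4)) / N)
    return train_error + penalty

# Higher-capacity agents (larger h) pay a larger complexity penalty,
# so a low training error is a weaker guarantee of generalization.
for h in (5, 50, 200):
    print(h, round(vc_bound(0.10, h, N=230), 3))
```

The monotone growth of the bound with h illustrates the overtraining risk noted above: a high capacity agent that fits the training data closely may still generalize poorly.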
3.3. Cognitive classes and cognitive diversity

Cognitive diversity results from groups that include intelligent agents from different cognitive classes. Cognitive diversity implies that the agents think differently. Hong and Page [24] define cognitive diversity with two key dimensions: the agent's internal perspective of the problem and the heuristics used by the agent to achieve improved solutions. A brief discussion of the nature of cognitive diversity among agents follows. The intention is not to provide rigorous coverage of each cognitive class, but to highlight key distinctions in internal perspective and problem solving heuristic. These distinctions are summarized in Table 1. We refer to the categories of problem solving methodologies as cognitive classes.

The original Z score model of bankruptcy detection is based on multiple discriminant analysis (MDA) [2]. The MDA cognitive class assumes the samples of observations from p_1(x) and p_2(x) are multivariate normally distributed with a common variance-covariance matrix. The nature of the MDA solution is to find a hyperplane that effectively separates the healthy and bankrupt classes based on a linear combination of the feature vectors (XA). The heuristic used to obtain a solution is a least squares calculation of the linear combination XA of feature variables that maximizes the ratio of between-group variance to within-group variance. Like MDA, the cognitive class of logistic regression (LR) assumes normal class distributions with a common covariance matrix. The LR internal perspective of the bankruptcy problem is that the posterior probabilities obey a log linear model of the following form:

$$\log \frac{p(k = 2 \mid x)}{p(k = 1 \mid x)} = \alpha + \beta x. \qquad (3)$$
The posterior log odds of the healthy to bankrupt class is a linear function of the feature vector. A solution is obtained by estimating the LR parameters α and β by maximum likelihood estimation of a parameter distribution conditioning on X. The cognitive class Kernel density estimation (KD) perceives the class density functions p_1(x) and p_2(x) to be locally constant. KD defines a Kernel K (e.g., a multivariate normal density function) that is a bounded function in feature space with integral one. A nonparametric estimation heuristic is used by KD to obtain an empirical distribution of the class densities p_1(x) and p_2(x). This empirical distribution of the feature vector within a class gives mass 1/n_k to each of the examples in the data. The local estimate of the density p̂_k(x) can be found by summing each of these contributions with weight K(x − x_i). This can also be interpreted as an average of Kernel functions centered on each example from the class, the weighted proportion of
Table 1
Cognitive classes of intelligent agents.

Multiple discriminant analysis
  Perspective: Partitions feature space by a hyperplane; decision boundary (surface) based on a linear combination of the feature vector XA; parametric (assumption of multivariate normality).
  Heuristics: Deterministic for a specified training set; XA chosen to maximize the ratio of between-group to within-group variance.

Logistic regression
  Perspective: Logit transformation is a linear function of X; estimates posterior class probabilities; parametric (assumption of normal class distributions and a common covariance matrix).
  Heuristics: Maximum likelihood estimation.

Kernel density estimation
  Perspective: Class density function is locally constant; Kernel is a bounded function in X; nonparametric (estimates posterior class probabilities).
  Heuristics: Empirical distribution; p_k(X) is an average of Kernel functions; dependent on the choice of K and sigma.

K-nearest neighbor
  Perspective: Kernel is constant over the k-nearest neighbors and zero elsewhere; nonparametric (estimates posterior class probabilities).
  Heuristics: Posterior distribution is the proportion of classes among the nearest k training examples; dependent on the distance measure used and the choice of k.

MLP neural network
  Perspective: Nonlinear projection of X onto a hidden layer (usually of smaller dimension); hidden layer neurons interact globally; nonparametric; estimates posterior class probabilities; universal approximation properties.
  Heuristics: Back-propagation of error, a gradient descent algorithm; dependent on initial randomization of weights; many local minima possible.

RBF neural network
  Perspective: Nonlinear expansion of X into an arbitrary (frequently Gaussian) basis hidden layer; linear separation of hidden layer values (Cover's theorem); hidden layer neurons are localized nonlinearities; nonparametric; estimates posterior class probabilities; universal approximation properties.
  Heuristics: Clustering locates positions for hidden layer neurons; output weights determined by linear least squares optimization; dependent on parameter choice; many local minima possible.
points around x which have class k. With empirical distributions for the class conditional probabilities, the posterior probabilities can be estimated as follows:

$$\hat{p}(k \mid x) = \frac{\sum_{i \in k} K(x - x_i)}{\sum_{i=1}^{n} K(x - x_i)}. \qquad (4)$$
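The KD estimate of Eq. (4) can be sketched directly: Gaussian kernels are centered on each training example, and the posterior for class k is the share of the total kernel mass at x contributed by class-k examples. The Gaussian kernel, bandwidth, and synthetic two-class data below are assumptions for illustration.

```python
# A sketch of the KD posterior estimate, Eq. (4), with a Gaussian kernel.
import numpy as np

def kd_posterior(x, X, y, k, sigma=1.0):
    """Estimate p(k|x): class-k kernel mass over total kernel mass at x."""
    K = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * sigma ** 2))
    return K[y == k].sum() / K.sum()

# Synthetic data: class 1 centered near (-2, -2), class 2 near (+2, +2)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([1] * 50 + [2] * 50)

print(kd_posterior(np.array([-2.0, -2.0]), X, y, k=1))  # near 1
print(kd_posterior(np.array([2.0, 2.0]), X, y, k=2))    # near 1
```

Deep inside either cluster, nearly all kernel mass comes from that cluster's examples, so the estimated posterior approaches one, as the locally constant density assumption suggests.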
Agents of the K-nearest neighbor cognitive class (KNN) perceive the problem with a simple adaptive Kernel that is constant over the nearest k examples and zero elsewhere. This suggests a simple estimate of the posterior probability distribution as the proportions of the classes among the nearest k data points. The multilayer feedforward neural network (MLP) is a cognitive class designed to mimic (in a crude fashion) biological brain mechanisms. This agent forms an internal representation of the bankruptcy problem by a nonlinear projection of the feature variable X onto a hidden layer of neurons, followed by a nonlinear mapping from the hidden layer to two output nodes that estimate the posterior probabilities of membership in the decision classes. This nonlinear mapping of a single hidden layer network is of the general form:
$$p(k \mid x) = f_k\left( a_k + \sum_{j \to k} w_{jk}\, f_j\left( a_j + \sum_{i \to j} w_{ij} x_i \right) \right), \qquad (5)$$
where j indexes the neurons in the hidden layer and k the neurons in the output layer; w_ij are the weights between the input and hidden neurons, while w_jk are the weights between the hidden and output layers. MLP uses an error back-propagation heuristic to iteratively find improved solutions. During the training process, the heuristic analyzes the squared error gradient with respect to the weights and uses a steepest descent technique to decrease the error. While MLP constructs a global approximation to a nonlinear input-output mapping, agents of the radial basis function (RBF) cognitive class estimate local approximations to the nonlinear input-output mappings using exponentially decaying localized nonlinearities, G. RBF perceives the posterior probability as a linear combination of basis functions. When G is Gaussian, RBF can be seen as extending the notion of approximating a probability density by a mixture of known densities:
$$p(k \mid x) = \alpha + \sum_j \beta_j\, G(\lVert x - x_j \rVert). \qquad (6)$$
The norm is unspecified, but is generally a Euclidean or Mahalanobis distance. The heuristic used by the RBF agent to find improved solutions involves two stages. In the first stage, a clustering algorithm (such as Self-Organizing Maps, learning vector quantization, or k-means clustering) is used to fix the j centers x_j in the data. With fixed centers, the coefficients α and β are calculated by an ordinary least squares algorithm.
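As an illustration of a cognitively diverse group, the sketch below combines one agent from several of the cognitive classes in Table 1 by majority vote. Scikit-learn implementations stand in for the paper's own agents, and the synthetic data and parameter values are assumptions for illustration.

```python
# A cognitively diverse group: agents from different cognitive classes
# (MDA, LR, KNN, MLP) combined by a hard majority vote.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

group = VotingClassifier(
    estimators=[
        ("mda", LinearDiscriminantAnalysis()),            # MDA class
        ("lr", LogisticRegression(max_iter=1000)),        # LR class
        ("knn", KNeighborsClassifier(n_neighbors=5)),     # KNN class
        ("mlp", MLPClassifier((6,), max_iter=2000, random_state=0)),
    ],
    voting="hard",                                        # majority vote
)
group.fit(X_tr, y_tr)
print(group.score(X_te, y_te))
```

Each member brings a different internal perspective (hyperplane, log-odds, local neighborhoods, global nonlinear projection), so their errors need not coincide; the vote exploits exactly the disagreement that cognitive diversity is meant to create.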
4. Research methodology

Our research methodology is designed to rigorously contrast differences in group decision accuracy between the commonly used content diversity (manipulation of data) and both ability diversity and cognitive diversity (manipulation of architecture) in group decision processes. In this section, we define the important decisions necessary to conduct this research, including the partitioning of the US bankruptcy data set, the formation of bootstrap replicates, agent selection and configuration, and controlling the sources of agent ability and cognitive diversity. We then introduce the support vector machine that we employ as a benchmark for assessing the US bankruptcy data accuracy results. To support the generalization of our findings from the US bankruptcy data, we repeat the experiments using a second data set on bankruptcy in Spanish banks [51]. Finally, we describe a methodology to analyze the impact of group composition on decision accuracy.

Before discussing the research methodology, we introduce the formal notation for the two-group bankruptcy classification problem used in this research. From the population of business organizations, each firm is to be classified as belonging to one of two classes k = {1 = bankrupt, 2 = healthy}. The proportions of bankrupt and healthy firms in the population are p_1 and p_2, respectively. Knowledge of the recent financial performance of firm i = {1, ..., N} is represented by the feature vector X_i. The feature vectors from each of the two classes are distributed according to the density functions p_1(x) and p_2(x), the class conditional probabilities. The decision task is to classify a firm as belonging to the bankrupt or healthy class on the basis of an observed value of X_i = x. In the ideal case, both the class densities p_k(x) and the class prior probabilities p_k are known. The familiar Bayes classifier for the two-group problem is then an optimal decision model:
p(k|x) = p_k p_k(x) / Σ_{l=1}^{2} p_l p_l(x).    (7)
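As a concrete illustration of Eq. (7), the following is a minimal sketch assuming hypothetical one-dimensional Gaussian class-conditional densities; the priors and density parameters below are invented for illustration, not estimates from the paper.

```python
import math

# Sketch of the two-group Bayes rule in Eq. (7), assuming illustrative
# one-dimensional Gaussian class-conditional densities; the priors and
# density parameters are hypothetical, not estimates from the paper.

def gaussian_pdf(x, mean, std):
    """Class-conditional density p_k(x) modeled as a 1-D Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def bayes_posterior(x, priors, params):
    """Eq. (7): p(k|x) = p_k p_k(x) / sum_l p_l p_l(x)."""
    joint = [p * gaussian_pdf(x, m, s) for p, (m, s) in zip(priors, params)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical priors for k = 1 (bankrupt) and k = 2 (healthy),
# with assumed (mean, std) for each class density.
priors = [0.28, 0.72]
params = [(-1.0, 1.0), (1.0, 1.0)]

posterior = bayes_posterior(0.0, priors, params)
decision = 1 if posterior[0] > posterior[1] else 2   # minimum-error rule
```

At x = 0 the two class densities coincide, so the posterior reduces to the priors and the minimum-error rule selects the healthy class.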
This characterizes the posterior probability in terms of the class conditional and prior probabilities. If the decision maker's risk function is to minimize the expected error rate, the optimal Bayes decision rule is to classify a firm into the class with the highest value of the product of the prior probability and the class conditional probability.

4.1. US bankruptcy data and partitions

The US bankruptcy data is constructed from Standard and Poor's Compustat database. The feature variables are the set of five key financial ratios originally used by Altman for the Z score model [2]: working capital/total assets, retained earnings/total assets, earnings before interest and taxes/total assets, market value of equity/book value of total liabilities, and sales/total assets. These ratios are calculated from financial statement information two years prior to a bankruptcy event. There are 329 companies in the data set: 93 bankrupt companies and 236 healthy companies. We use labels of −1 (bankrupt) and +1 (healthy).

The data is first partitioned into subsets that are used to train the agents (learning data), to prevent overfitting (validation data), and to estimate generalization error on an independent set (test set). Our data partitioning methods follow the strategy used by Breiman [7]. The complete data set of 329 observations is partitioned into a learning set L_i with 70% of the examples (observations 1–230), a validation set V_i with 15% (observations 231–280), and an independent holdout test set T_i with 15% (observations 281–329). To increase the validity of the results, the data set observations are shuffled and re-partitioned a total of i = 100 times, creating 100 different learning, validation, and test sets. A total of 100 bootstrap replicate training sets L_i^B are formed from each of the 100 learning partitions L_i, for a total of 10,000 bootstrap training replicates.
The bootstrap replicates are formed by sampling with replacement from the original training set partition L_i. An implication of the bootstrap process is that some of the original training set observations will be missing from L_i^B, the bagging training set, while other observations may be replicated several times. Each of the 22 unique agents (defined in Section 3.2) is trained on each of the 10,000 bootstrap training replicates and tested on the respective test set. The testing decisions of each agent (a total of 490,000 decisions) are stored in a decision database. Specific sampling procedures (defined in Section 4.3) applied to this database produce groups with different error diversity mechanisms.

4.2. Configuration of intelligent agents

The intelligent agents selected for this research represent most of the agents investigated in prior financial decision research as well as those used in commercial applications. These include multiple discriminant analysis (MDA), logistic regression (LR), two different neural network algorithms (the multilayer perceptron (MLP) and radial basis functions (RBF)), the k-nearest neighbor classifier (KNN), and kernel density (KD). We use the term "unique agent" to refer to a decision agent with a unique ability level, i.e., a specific model with defined parameter values. For example, a k-nearest neighbor classifier with two nearest neighbors is a unique agent. Twenty-two unique agents (defined in the Appendix) are used in this research. With the exception of MDA and LR, all of these agents require specific parameter values; the selection of parameter values is guided by principles of simplicity and generally accepted practices. For example, the neural network agents are limited to a single hidden layer, with the number of hidden layer neurons ranging from 2 to 10. A total of five specific abilities (parameter configurations) are created for each of these agents (also defined in the Appendix).
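The partitioning and bagging scheme of Section 4.1 can be sketched as follows. This is a minimal illustration, not the authors' code: the 70/15/15 split mirrors the paper's design, but the replicate count is reduced and the random seed is arbitrary.

```python
import random

# Sketch of the Section 4.1 data design: shuffle and split 329 firms
# into learning/validation/test sets, then draw bootstrap replicates
# of the learning set by sampling with replacement.

def partition(indices, rng):
    """Shuffle and split into 70% learning, 15% validation, rest test."""
    idx = list(indices)
    rng.shuffle(idx)
    n_learn = int(0.70 * len(idx))
    n_val = int(0.15 * len(idx))
    return (idx[:n_learn],
            idx[n_learn:n_learn + n_val],
            idx[n_learn + n_val:])

def bootstrap_replicate(learning_set, rng):
    """Sample with replacement: some cases repeat, others are left out."""
    return [rng.choice(learning_set) for _ in learning_set]

rng = random.Random(0)        # arbitrary seed for reproducibility
firms = list(range(329))      # 329 firms in the US data set
learn, val, test = partition(firms, rng)
replicates = [bootstrap_replicate(learn, rng) for _ in range(5)]
```

In the full design this would be repeated for 100 shuffles with 100 replicates each, yielding the 10,000 bootstrap training replicates described above.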
4.3. Constructing decision groups with controlled sources of error diversity

We measure the generalization error of the decisions reached by the groups of decision agents at controlled levels of ability diversity and cognitive diversity and contrast the results to baseline conditions from groups that are limited to content diversity. All groups formed in this research have a total of 34 members; this quantity exceeds the level of 25 members Breiman found necessary for effective bagging results [7]. Our strategy is to create a database of agent decision results at each experimental condition and to randomly sample the database to create group decisions. We produce a total of 490,000 agent decision results for each of the 22 unique agents. Three different sampling plans are used to create groups with content error diversity; content and ability diversity; and content, ability, and cognitive diversity. Results are averages of 1000 random sampling iterations for each of 100 data partitions.

4.3.1. Forming groups with content diversity only

By constraining the sampling of database decisions to a specific unique agent, the resulting groups are limited to content error diversity (the database of decision results consists of unique agent decisions for 10,000 bootstrap learning replicates). A description of our specific sampling procedure follows. Take the first agent from the list in the Appendix, MLPa. Start with the first data shuffle and randomly identify 34 of its 100 bootstrap replicates by sampling without replacement. Each of the 34 replicates serves as one member of the decision group; the result is a matrix consisting of the individual decisions of the 34 group members for each of the 49 test examples in the first data shuffle. Majority vote is applied to the 34 member decisions to produce a group decision for each of the 49 test examples (the reader is referred to Pasi and Yager [43] for a discussion of majority opinion in group decisions). The group decision is then compared to the target result to determine a group generalization error. The procedure is repeated for shuffles 2, ..., 100. The process of estimating the generalization error for 100 shuffles is then repeated 1000 times, yielding an estimate of the generalization error for MLPa. This process is repeated for each of the remaining 21 unique agents.

4.3.2. Forming groups with content and ability diversity

A second experiment tests the effectiveness of agent ability diversity. This is accomplished by forming groups with membership constrained to those cognitive classes where multiple architectures or ability levels are possible. There are four cognitive classes with ability diversity (MLP, RBF, KNN, and KD); each of these classes has five ability levels. For example, to estimate a generalization error for the MLP cognitive class, members of the decision group would include MLPa, MLPb, MLPc, MLPd, and MLPe. In this experiment, all 5 of the unique agents in the cognitive class are represented in the group. To fill each group, the order of the five unique agents is randomly determined and the agents are replicated a number of times: the first four agents are replicated 7 times and the last agent 6 times, giving 34 group members. Each time a unique agent is replicated in a group, a different bootstrap result is randomly selected without replacement from the population of 100 bootstrap replicates. Each experiment again yields a decision matrix with 34 members' individual decisions on 49 test examples. A group generalization error is estimated by using majority vote to determine 49 group decisions and comparing the group decisions to known targets. Following the same process described in Section 4.3.1, this process is repeated for the 100 data shuffles and for 1000 iterations.

4.3.3. Forming groups with content, ability, and cognitive diversity

This experiment tests the effectiveness of differences in cognitive style by forming groups whose members are randomly constrained to two and four different cognitive classes, respectively. This experiment uses the process described in Section 4.3.1 for data shuffles, error estimation, and number of experimental iterations. The difference in this experiment is the sampling procedure. For groups of 2 cognitive classes, the 2 cognitive classes are randomly selected without replacement from the set of 4 classes. All 5 unique agents for each of the cognitive classes are included in the group. The groups with 4 cognitive classes consist of all 5 unique agents from all 4 cognitive classes. Unique agents are again replicated to create groups of 34 members.

4.4. Diversity measurement

Metrics to estimate the amount of error diversity in a group decision process are still an open research issue [10]. None of the multitude of diversity metrics proposed is strongly correlated with group decision accuracy. Recognizing these limitations, we measure the diversity of each group of agents by the dissimilarity d_i(x) between an agent's decision C_i(x) and the aggregate decision of the group of agents C*(x), as defined by Melville and Mooney [40]:
d_i(x) = 0 if C_i(x) = C*(x); otherwise d_i(x) = 1.    (8)
To compute the diversity of a group of size B on a training set of size N, an average group diversity measure is calculated:

D = (1 / BN) Σ_{i=1}^{B} Σ_{j=1}^{N} d_i(x_j).    (9)
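The majority-vote group decision of Section 4.3 and the diversity measure of Eqs. (8) and (9) can be sketched together. The decision matrix below is a small hypothetical example (B = 3 members, N = 4 test cases, labels −1/+1 as in Section 4.1); an odd B avoids voting ties.

```python
# Sketch of the majority-vote group decision (Section 4.3) and the
# Melville diversity measure of Eqs. (8) and (9). The decision matrix
# is a small hypothetical example, not taken from the experiments.

def majority_vote(decisions):
    """Group decision C*(x_j) for each test case from a B x N matrix."""
    n = len(decisions[0])
    return [1 if sum(row[j] for row in decisions) > 0 else -1
            for j in range(n)]

def melville_diversity(decisions, group):
    """Eq. (9): average disagreement of members with the group decision."""
    b, n = len(decisions), len(group)
    disagreements = sum(1 for row in decisions
                        for j in range(n) if row[j] != group[j])
    return disagreements / (b * n)

decisions = [
    [ 1,  1, -1, -1],   # member 1
    [ 1, -1, -1,  1],   # member 2
    [ 1,  1, -1, -1],   # member 3
]
group = majority_vote(decisions)
diversity = melville_diversity(decisions, group)
```

Here member 2 disagrees with the group on two of the four cases, so D = 2/12.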
4.5. Generalization of findings: the support vector machine and Spanish bankruptcy data

We include two other experiments to assess whether the findings from our US bankruptcy data will generalize. First, the support vector machine (SVM) is tested on the US bankruptcy data. The appeal of the SVM is its focus on structural risk minimization from computational learning theory [58]. For this reason, the SVM is an ideal candidate to benchmark our US bankruptcy data results. The second experiment involves repeating the full US bankruptcy data experimental design with an independent database of Spanish bank failures [51]. This includes partitioning of the Spanish banks data set, the formation of bootstrap replicates, agent selection and configuration, and controlling the sources of ability diversity and cognitive diversity for each agent. The Spanish bank data are more limited, containing only 66 cases with 9 financial ratios. Bank failures in this data occurred during the period from 1977 to 1985.

4.6. Assessing the impact of group composition on decision accuracy

Our final research question investigates the differences in group composition between the most accurate and least accurate groups. In this experiment, we repeat the experimental design with different agent sampling constraints; group membership is not constrained by cognitive class but simply by the number of unique agents. Groups are randomly formed with 1, 4, 8, and 12 unique agents. This strategy randomizes both ability and cognitive diversity in the group. At any point in the group formation process, all agents have an equal probability of joining, independent of earlier membership decisions. We then retrospectively rank agents by their frequency of occurrence in the most accurate and least accurate groups.

5. Results of agent diversity strategies, US bankruptcy data

Each result reported in this section is based on 1000 random group decision formations for each of the 100 random data partitions.
We first discuss our baseline conditions for this research (the performance of groups limited to content diversity). We then contrast the performance of groups with content and ability diversity, and finally groups with content, ability, and cognitive style diversity.

5.1. Research question 1: Accuracy of groups limited to content diversity

We use the term homogenous group for a collection of agents from a single cognitive class with a single ability level. There are 22 possible homogenous groups for this research, one for each of the unique agents defined in the Appendix. The only source of diversity for these groups is the content diversity generated by the bagging algorithm. This case is a baseline for comparison with the other forms of diversity investigated. The generalization error (independent holdout test set error) for the group decisions of each of the 22 homogenous groups is plotted in Fig. 1 for the US bankruptcy data. The average generalization error across all homogenous groups is 0.145. The least accurate homogenous group is the KNNa group with an error of 0.171, while the most accurate homogenous group is the MLPc group with an error of 0.131.

[Fig. 1. Generalization error for homogenous group decisions, US bankruptcy data. Scatter of the 22 homogenous groups (generalization error vs. Melville diversity) with fitted line y = 0.2617x + 0.1281, R² = 0.5914; the MLP groups ("the experts") cluster at the lowest errors.]

All of the MLP homogenous groups are closely clustered and labeled "the experts" for this problem since they exhibit the greatest independent ability to solve the problem. It is evident from Fig. 1 that there is a significant range of variability in the accuracy of homogenous groups, with a difference of 0.04 between the most accurate and least accurate groups. One of the striking features of Fig. 1 is that group decision error increases as error diversity (measured by the Melville diversity metric) increases. A linear model of the homogenous group decision results (Eq. (10)) has a positive slope estimated to be 0.2617 with p < 0.000. One potential explanation for this phenomenon is research reporting that group accuracy is not always strongly correlated with error diversity [10]. We also speculate that the homogenous groups exhibit an ineffective trade-off between the loss of individual accuracy for the less capable agents and the gain in accuracy from the error diversity generated by content diversity. To understand this, we must recognize that the capability of the expert in a particular problem results from a match between the problem's computational characteristics and the expert agent's cognitive style. An expert whose cognitive style matches the computational requirements for optimal problem solutions exhibits little error diversity. The less capable decision agents exhibit lower capabilities and higher error diversity. The error diversity generated by content diversity (in this case bagging) is not adequate to overcome the decrease in individual abilities for the less capable agents. We found a disproportionate share of the research and applications of decision agents focused on homogenous groups. An important lesson implied by our results for homogenous groups is the critical need to search for the expert decision agent.
This imperative will decrease as we examine groups with ability and cognitive diversity in the next sections:
Group Error = 0.2617 × Melville diversity + 0.1281 (R² = 0.5914).    (10)
5.2. Research question 2: Accuracy of groups limited to content and ability diversity

Fig. 2 plots the error for groups of decision agents from a single cognitive class but with varying levels of ability as a function of the Melville diversity metric for the US bankruptcy data. These groups, shown as large circles in Fig. 2, consist of decision agents from a single cognitive class (the KD, KNN, MLP, and RBF cognitive classes) with content diversity and five different ability levels. The error diversity created by varying agent ability results in group decisions that are more accurate than those of the homogenous groups that are limited to content diversity only. The average error of all four groups with ability diversity is 0.139, compared to 0.145 for the homogenous groups. Our empirical response to research question 2 is that the addition of ability diversity reduces the group decision error by 4.13%. It is also interesting to observe that the KD group of agents, with an error of 0.130, achieves more accurate group decisions than the MLP agents (0.133), despite the fact that all five MLP agents were identified as problem experts in Section 5.1.

5.3. Research question 3: Accuracy of groups with content, ability and cognitive diversity

The results discussed in this subsection are for groups of decision agents with content diversity, ability diversity, and cognitive diversity, achieved by sampling from 1, 2, and 4 cognitive classes, respectively. The average generalization errors are plotted as squares in Fig. 2.

[Fig. 2. Group generalization error vs. ability and cognitive diversity, US bankruptcy data. Ability diversity: groups sampled from unique agents in a single cognitive class (KNN, RBF, KD, and MLP classes; fitted line y = 0.137x + 0.1292, R² = 0.25). Cognitive diversity: groups sampled from 1, 2, and 4 cognitive classes, respectively (fitted line y = −0.7451x + 0.1939, R² = 0.9932).]

These results offer convincing empirical evidence that the inclusion of decision agents from different cognitive classes in the group reduces the generalization error and increases the diversity metric. The group error decreases from 0.139 for 1 cognitive class to 0.131 for 2 classes and to 0.126 for 4 classes. The slope of the linear model in Eq. (11) for the cognitive diversity groups is estimated to be −0.7451 with p = 0.053:
Group Error = −0.7451 × Melville diversity + 0.1939 (R² = 0.9932).    (11)
The average error of decision groups formed with 4 cognitive classes (0.126) is the lowest error obtained for all experimental conditions. Paired two-sample mean t tests are used to test for significant differences in means between the 4 cognitive class results and each of the following: the 2 cognitive class results, the KD class with varying levels of ability, and the homogenous MLPc group. There is a statistically significant difference between the mean generalization error of the 4 cognitive class results and the 2 cognitive class results, the KD group, and the MLPc homogenous group, with p < 0.000 in all cases. Our results confirm that the diversity of cognitive classes is the most effective method of forming accurate decisions from groups of decision agents. The groups with cognitive diversity are significantly more accurate than groups of homogenous agents and groups from a single cognitive class with varying ability levels. We also find that groups of 4 cognitive classes are significantly more accurate than groups of 2 cognitive classes. To summarize our estimates for research questions 2 and 3: the group decision error was reduced by 4.13% by adding ability diversity to content diversity. An additional reduction of 5.76% is achieved by adding two cognitive classes to the ability and content diversity sources, for a total reduction relative to the baseline content diversity groups of 9.7%. If four cognitive classes are added to the ability and content sources, the group error is reduced an additional 9.35%, for a total reduction of 13.1%.

5.4. Benchmarking with the support vector machine

The generalization error of the SVM on the US bankruptcy data is estimated at 0.154, slightly greater than the average error of all homogenous groups (0.144) and significantly higher than the error of the homogenous MLPc group (0.131). This demonstrates that our group decision results outperform the SVM benchmark. It is noteworthy that Huang et al. [27] compare SVM and MLP decision agents on US and Taiwan credit markets and report similar accuracies for both agents.
6. Generalization of findings: Spanish banks bankruptcy data

The results of the Spanish bank analysis are shown in Figs. 3 and 4. The Spanish bank results strongly confirm the earlier conclusions from the US bankruptcy data. Again we observe an increasing error with increasing values of the Melville diversity metric for the homogenous groups (triangular markers in Fig. 3). The linear equation of the Spanish bank homogenous groups has a slope of 1.7279, with p < 0.000.

[Fig. 3. Group generalization error vs. ability and cognitive diversity, Spanish bank data. Homogenous groups of a single cognitive class: fitted line y = 1.7279x − 0.0846, R² = 0.8462. Ability diversity: groups sampled from unique agents in a single cognitive class (KNN, RBF, MLP, and KD classes). Cognitive diversity: groups sampled from 1, 2, and 4 cognitive classes, respectively; fitted line y = −2.0498x + 0.336, R² = 0.994.]

The error diversity created by ability (plotted as circles in Fig. 3) produces group decisions that are only slightly more accurate than those of the homogenous groups. The average error of all groups with ability diversity is 0.104, compared to 0.107 for the homogenous groups. Cognitive diversity (sampling of agents from multiple cognitive classes) is the most effective method of generating high performance groups. As the number of cognitive classes in the group increases, the error decreases from 0.104 for 1 cognitive class to 0.073 for 2 cognitive classes and finally to 0.057 for 4 cognitive classes. The linear model for generalization error vs. the Melville diversity metric has a slope of −2.0498 with p = 0.049 and R² = 0.994. This implies that increasing cognitive diversity (measured by the Melville diversity score) by a magnitude of 0.01 decreases the group generalization error by about 0.02. Paired two-sample t tests for differences in mean generalization error were analyzed for significant differences between the 4 cognitive class results and each of the following: the 2 cognitive class results, the MLP class, and the homogenous MLPe group. All differences are statistically significant with p values less than 0.000. This again confirms that the most accurate groups have members from all 4 cognitive classes.

Fig. 4 presents the Spanish bank results for groups sampled with constrained numbers of unique agents (a proxy for ability diversity and cognitive diversity). The improvement in collective decision accuracy for the Spanish bank problem is much more pronounced than for the US bankruptcy problem, with error reductions ranging up to 50%. The average error of all homogenous groups is 10.34%, with the most accurate group achieving a 5.30% error and the least accurate a 21.20% error. These errors fall dramatically as the number of unique agents in the group decision is increased from 1 to 4.
The average error of all randomly formed groups is 6.66%, or 64.4% of the homogenous average; the most accurate group has a generalization error of 2.90% (54.7% of the homogenous case), while the least accurate group has a generalization error of 18.6% (87.7% of the homogenous case). The errors of the average and least accurate groups continue to decrease as the number of unique agents increases to 8 and 12. The average error decreases to 5.51% at 12 unique agents, almost half of the homogenous average. The least accurate group error decreases to 9.4% at 12 unique agents (44.3% of the homogenous case). The error of the most accurate group increases slightly as agent diversity increases beyond 4 unique agents (from 2.9% to 3.3% and 3.4% for 8 and 12 unique agents, respectively).

Clearly the improvements in group decision accuracy are more pronounced for the Spanish bankruptcy data than for the US bankruptcy data. The slopes of the group error as a function of the Melville diversity metric are evidence of this difference; the error improvement for the Spanish data, with a slope magnitude of 2.05, is significantly greater than that for the US data at 0.745. While we cannot give a comprehensive reason for these differences, we offer the following insight. First, it is possible that some of these differences result from the problem structure. The US data has five feature variables and 329 observations, while the Spanish data has nine variables and 66 observations. If we consider a neural network agent for each problem with six hidden nodes, the US neural network would have 42 weights trained on 230 observations, while the Spanish agent would have 72 weights trained on 42 observations. The Spanish data is clearly over-parameterized, with an observation-to-parameter ratio of less than one. This can cause unstable estimates from single agents. However, the tendency to overfit the Spanish models may result in larger group error improvements [14]. We also note that the amount of error diversity in the Spanish data is greater than the diversity of the US data for all group conditions; the Melville diversity metric ranges from 0.04 to 0.11 for the US data and from 0.08 to 0.16 for the Spanish data.
[Fig. 4. Group generalization error, Spanish bank data. Maximum, average, and minimum error over 1,000 groups vs. number of unique agents (homogenous groups, then 4, 8, and 12 unique agents), with each error also expressed as a percentage of the corresponding homogenous case: maximum error 100%, 87.7%, 63.2%, 44.3%; average error 100%, 64.4%, 56.7%, 53.3%; minimum error 100%, 54.7%, 62.3%, 64.2%.]
7. Results for group composition study

In response to our fourth research question, we report the results of our group composition analysis in this section. From Fig. 5, it is evident that increasing the number of unique agents in a group creates groups with significantly lower decision errors for the US bankruptcy data. As the number of unique agents in the group increases to 4, 8, and 12 (Fig. 5), both the maximum and average error decrease monotonically. With 4 unique agents, the maximum error is reduced from 0.171 for the homogenous group to 0.153, a reduction of 10.3%; the average error is reduced from 0.1446 for the homogenous group to 0.131, a reduction of 9.6%. If we compare the homogenous group with groups of 8 unique agents, the maximum error is 0.140, an 18.1% reduction; the average error is 0.1289, a 10.9% reduction. The improvement in generalization error from 8 to 12 unique agents is much more modest, but the maximum and average error continue to decline. The maximum error for 12 unique agents is 0.1373, an overall reduction from the homogenous groups of 19.5%; the corresponding average error is 0.1285, an 11.1% reduction.

We consider the groups with minimum error to be the high performance decision groups; they are the most accurate groups from the set of 1000 randomly formed groups. The minimum generalization error behaves somewhat differently than the maximum and average error. At 4 unique agents, the minimum error is 0.1208, a 7.2% reduction from the minimum homogenous group error. At 8 and 12 unique agents, the minimum error increases slightly, to 0.1214 (a 6.7% reduction) at 8 agents and to 0.1216 (6.6%) at 12 agents. The minimum group error occurs at 4 unique agents and does not decline as the number of unique agents increases beyond that point. Fig. 5 also confirms that the variability of group results decreases significantly as the number of unique agents in the group increases. The range between the maximum and minimum error (0.04) for the homogenous groups decreases monotonically with increasing numbers of agents, to 0.016 at 12 unique agents.

To understand the difference in composition between the high performance and low performance groups, we employ a simple weighted accuracy score and rank order each unique agent. The six unique agents that appear most frequently in the high performance and low performance groups for the US bankruptcy data are reported in Table 2. Members of the high performance homogenous groups are primarily MLP agents and logistic regression (a closely related algorithm). The LR and five MLP unique agents are the smartest experts, since they individually demonstrate the greatest ability to solve the bankruptcy detection problem. By contrast, members of the low performance homogenous groups are primarily multiple discriminant analysis, kernel density, and k-nearest neighbor agents.

As the group diversity increases to 4 unique agents (increasing both ability and cognitive diversity), the membership of the high performance groups changes (Table 2). Although the decision error of these groups has decreased, the LR and MLP agents (the smartest experts) now occupy only four of the top six positions. Two kernel density agents (KDa and KDc) are now more frequently associated with high performance groups formed from 4 unique agents. Interestingly, KDa is the second least accurate of the 22 homogenous agents (see Fig. 1). Two radial basis function agents (RBFd and RBFb) are now associated with the low performance groups, in addition to LDA, KD, and KNN agents. The composition of groups at the two highest levels of cognitive and ability diversity (8 and 12 unique agents) is identical and strikingly different from the composition of homogenous groups (Table 2).
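The ranking step above can be sketched as follows. The paper's weighted accuracy score is not fully specified here, so this sketch uses a simple frequency count over hypothetical rosters of the most accurate groups.

```python
from collections import Counter

# Sketch of the group-composition ranking of Section 7: count how
# often each unique agent appears in the most accurate groups. The
# rosters below are hypothetical; the paper uses a weighted accuracy
# score over 1000 randomly formed groups.

def rank_by_membership(groups):
    """Agents ordered by frequency of appearance across group rosters."""
    counts = Counter(agent for members in groups for agent in members)
    return [agent for agent, _ in counts.most_common()]

best_groups = [
    ["KDc", "RBFe", "MLPc", "KNNa"],   # hypothetical most accurate groups
    ["KDc", "MLPc", "KNNb", "RBFd"],
    ["KDa", "MLPc", "KNNa", "RBFe"],
]
ranking = rank_by_membership(best_groups)
```

The same count applied to the least accurate groups yields the low performance ranking.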
[Fig. 5. Group generalization error, US bankruptcy data. Maximum, average, and minimum error over 1,000 groups vs. number of unique agents (homogenous groups, then 4, 8, and 12 unique agents), with each error also expressed as a percentage of the corresponding homogenous case: maximum error 100%, 89.7%, 81.9%, 80.5%; average error 100%, 90.4%, 89.1%, 88.9%; minimum error 100%, 92.8%, 93.3%, 93.4%.]
Table 2
Agent membership in high and low performance groups (increasing ability and cognitive diversity from left to right within each panel).

High performance groups                 Low performance groups
Homogenous   4 Unique     8 and 12     Homogenous   4 Unique     8 and 12
agent        agents       Unique       agent        agents       Unique
                          agents                                 agents
LR           KDc          KDb          KNNa         LDA          LDA
MLPb         LR           KDa          LDA          KDe          KDe
MLPa         MLPd         KNNa         KDa          KDd          KDd
MLPc         MLPb         KDc          KDe          KNNd         MLPb
MLPd         KDa          KNNb         KDb          RBFd         MLPd
MLPe         MLPc         KNNc         KNNb         RBFb         MLPe
diverse high performance groups now consists entirely of Kernel density (KDa, KDb, and KDc) and k-nearest neighbor agents (KNNa, KNNb, KNNc). Interestingly, four of these six unique agents are associated with the low performance homogenous groups, implying they individually have limited ability to solve the problem. It is also noteworthy that three of the MLP unique agents (MLPb, MLPd, MLPe), which are the most accurate homogenous groups (i.e., the smartest experts), are now associated with low performance 8 and 12 agent groups. When we examine the high performance groups we find the lowest generalization error is from groups with members drawn uniformly from each of four cognitive classes: MLP, RBF, KNN, and KD. For example, the membership of the single most accurate group formed with 4 unique agents is KDc, RBFe, MLPc, and KNNa; from 8 unique agents the membership is MLPa, MLPc, RBFc, RBFe, KDc, KDb, KNNd, KNNa. This confirms our earlier findings; the most accurate groups are sampled from 4 cognitive classes. Our answer to research question four is that the most accurate cooperative synthesis systems are groups with membership compositions that uniformly represent all four cognitive classes. A pattern of expert migration is also noted in these results. As the groups become more diverse, the expert agents are replaced in the most accurate groups with agents of lower abilities. This finding reinforces the concept that ‘‘diversity trumps ability,” a theme advanced by Hong and Page [24], who conclude that groups of diverse problem solvers can outperform groups of high-ability problem solvers. This finding suggests the importance of forming decision groups from a diverse set of agents with differences in ability and cognitive style. 8. 
Concluding discussion: diversity and group decisions Our review of the literature suggests that much of the prior research on error diversity in group decision processes focuses on competitive synthesis (tournaments of single agents) and cooperative synthesis with agents limited to content diversity (manipulation of data). This research rigorously investigates the value of incorporating ability diversity and cognitive diversity into cooperative synthesis systems. We believe this is the first study to accomplish this and are unaware of any other empirical research on this topic. Based on our experimental findings, we offer the following insights for practitioners implementing group decision processes. If the group is comprised of a single decision agent with content (manipulation of data) error diversity, it is critically important to search for the expert agent, that agent with the highest capability level for the given problem. We find there is significant variability between and within homogenous groups of a single agent. The inclusion of different ability levels for any single decision agent results in modest improvements in group decision accuracy; we estimate these improvements range from 3% to 4%. The most accurate groups had agents from four different cognitive classes. We estimated group decision accuracy to increase 11% to 47% from error diversity created by differences in cognitive style. Interestingly, the composition of the most accurate groups no longer relies on the experts. Therefore, future research should focus on exploring diverse cognitive groups; the search for better experts is not productive for cognitively diverse groups. The experimental results verify that the group error decreases as the number of cognitive classes increases, and that the most accurate groups have agents from four different cognitive classes (i.e., agents that form different perspectives of the problem and use different heuristics to solve the problem). 
We show that this strategy reduces the generalization error by 11% for the US bankruptcy data and by 47% for the Spanish bank failures. With the US bankruptcy data, groups formed with uniform membership from the 4 cognitive classes achieved a 12.94% reduction in error relative to the average of the 20 homogenous groups and a 4.65% reduction relative to the most accurate homogenous group. The Spanish bank results were more striking: the 4 cognitive class groups reduced the average error of all homogenous groups by 46.99% and the error of the most accurate homogenous group by 11.29%. The results of this research offer conclusive evidence that multi-agent decision systems formed from a diverse set of unique agents are significantly more accurate than those formed with a single agent. There is a discernible decrease in generalization error as the number of unique agents increases. Not all sources of group diversity are beneficial, however. Multi-agent groups that depend solely on content diversity exhibit increasing group decision errors with increasing levels of diversity; in this case, the added diversity comes from agents of lower ability that are incorrect more often. By contrast, as the number of cognitive classes represented in multi-agent groups increases, group diversity increases and the group decision error decreases. It is also interesting to observe the migration of agents as group diversity is increased by adding unique agents to the group. High performance homogenous groups consist of the experts, the MLP agents, which have the highest individual ability to solve the problem. The group decision error decreases as the number of unique agents in the group
is allowed to increase. Experts have been replaced in the most effective high performance groups by agents of average ability. Hong and Page refer to this phenomenon in human groups as "diversity trumps ability": groups of diverse problem solvers can outperform groups of high-ability problem solvers [24]. While we feel the data used in this research are representative of financial applications, the reader is cautioned that our conclusions are based on one specific application and are limited to a two-group binary decision domain with relatively small numbers of learning examples. Our results also depend on specific research decisions, including the choice of feature variables, the selection of decision agents, and the configuration of those agents. While we attempt to follow conventional principles, differences in any of these decisions may produce different results. We focus on understanding some general properties of diversity and multi-agent decision systems. For this type of research, it is fairly common to measure misclassification instances rather than misclassification costs. We acknowledge that the costs of misclassification vary considerably between the bankruptcy and survival groups and must be considered during the implementation stage. Finally, the implications of the no free lunch (NFL) theorems suggest that our results may not generalize to problem domains with significantly different problem structures [64]. Essentially, the NFL theorem establishes that any two algorithms are equivalent when their performance is averaged across the set of all possible problems [64,66]. This implies that if we know classifier C2 is more accurate than classifier C1 on a specific subset of problems, then C1 must be more accurate than C2 on the set of all remaining problems. Recently there has been research defining conditions that obviate the dire consequences of NFL [33,66]. Yu et al. [66] argue that NFL does not eliminate the incentive to continue to improve classifiers in domains where they do well. Koehler [33] establishes conditions for NFL to hold based on problem structure. In an analysis of combinatorial problems, Koehler finds that only trivial subclasses of problems fall under the NFL implications. Both of these arguments provide justification for believing that dominant algorithms can exist for specific problem domains.

Appendix. Definitions of unique agents, ability levels, and cognitive classes

Cognitive class        Agent and ability   Parameter   Note
MLP (neural network)   MLPa                5 2 2       MLP = multilayer perceptron; format I H1 O, where I = number of input nodes, H1 = number of nodes in hidden layer 1, O = number of output nodes
                       MLPb                5 4 2
                       MLPc                5 6 2
                       MLPd                5 8 2
                       MLPe                5 10 2
RBF (neural network)   RBFa                5 20 2      RBF = radial basis function; format I H O, where I = number of input nodes, H = number of nodes in the hidden layer, O = number of output nodes
                       RBFb                5 40 2
                       RBFc                5 60 2
                       RBFd                5 80 2
                       RBFe                5 100 2
DA, LR (parametric)    MDA                             MDA = Fisher's linear discriminant analysis
                       LR                              LR = logistic regression
KNN (nonparametric)    KNNa                k = 5       KNN = k-nearest neighbor; format k = i, where i = number of nearest neighbors
                       KNNb                k = 7
                       KNNc                k = 9
                       KNNd                k = 11
                       KNNe                k = 13
KD (nonparametric)     KDa                 R = 0.1     KD = kernel density; format R = j, where j = radius of the kernel function
                       KDb                 R = 0.5
                       KDc                 R = 1.0
                       KDd                 R = 1.5
                       KDe                 R = 2.0
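The table's nonparametric agent pools can be instantiated directly; the following is a sketch assuming scikit-learn, where the KD agents are approximated by fixed-radius (Parzen window) classifiers. Only the k values and radii follow the table; the data here is synthetic, so the printed accuracies are illustrative.

```python
# Sketch: building the appendix's nonparametric agent pools (KNNa-KNNe
# and KDa-KDe). KD agents are approximated with fixed-radius
# classifiers; the data is synthetic, so scores are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier, RadiusNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=5, random_state=1)

# k values and radii taken from the appendix table
knn_agents = {"KNN%s" % s: KNeighborsClassifier(n_neighbors=k)
              for s, k in zip("abcde", [5, 7, 9, 11, 13])}
kd_agents = {"KD%s" % s: RadiusNeighborsClassifier(radius=r, outlier_label="most_frequent")
             for s, r in zip("abcde", [0.1, 0.5, 1.0, 1.5, 2.0])}

for name, agent in {**knn_agents, **kd_agents}.items():
    score = cross_val_score(make_pipeline(StandardScaler(), agent), X, y, cv=5).mean()
    print("%s: mean CV accuracy = %.3f" % (name, score))
```

Note that the smallest radii leave most test points with no neighbors, so those agents fall back to the majority class; this mirrors the paper's observation that several unique agents have limited individual ability yet still contribute diversity to a group.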
References

[1] R. Abdel-Aal, Improved classification of medical data using abductive network committees trained on different feature subsets, Computer Methods and Programs in Biomedicine 80 (2) (2005) 141–153.
[2] E. Altman, Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, Journal of Finance 23 (1968) 589–609.
[3] E. Altman, G. Marco, F. Varetto, Corporate distress diagnosis: comparisons using linear discriminant analysis and neural networks (the Italian experience), Journal of Banking and Finance 18 (1994) 505–529.
[4] E. Bauer, R. Kohavi, An empirical comparison of voting classification algorithms: bagging, boosting, and variants, Machine Learning 36 (1–2) (1999) 105–139.
[5] S. Bay, Nearest neighbor classification from multiple feature subsets, Intelligent Data Analysis 3 (1999) 191–209.
[6] K. Bertels, J. Jacques, L. Neuberg, L. Gatot, Qualitative company performance evaluation: linear discriminant analysis and neural network models, European Journal of Operational Research 115 (1999) 608–615.
[7] L. Breiman, Bagging predictors, Machine Learning 26 (1996) 123–140.
[8] L. Breiman, Randomizing outputs to increase prediction accuracy, Machine Learning 40 (3) (2000) 229–242.
[9] P. Brockett, W. Cooper, L. Golden, X. Xia, A case study in applying neural networks to predicting insolvency for property casualty insurers, Journal of the Operational Research Society 48 (1997) 1153–1162.
[10] G. Brown, J. Wyatt, R. Harris, X. Yao, Diversity creation methods: a survey and categorization, Information Fusion 6 (1) (2005) 5–20.
[11] S. Chen, Computationally intelligent agents in economics and finance, Information Sciences 177 (2007) 1153–1168.
[12] S. Cho, Ensemble of structure-adaptive self-organizing maps for high performance classification, Information Sciences 123 (1–2) (2000) 103–114.
[13] P. Coats, L. Fant, A neural network approach to forecasting financial distress, Journal of Business Forecasting (Winter 1992) 9–12.
[14] P. Cunningham, J. Carney, S. Jacob, Stability problems with artificial neural networks and the ensemble solution, Artificial Intelligence in Medicine 20 (2000) 217–225.
[15] S. Davalos, R. Gritta, G. Chow, The application of a neural network approach to predicting bankruptcy risks facing the major US air carriers: 1979–1996, Journal of Air Transport Management 5 (1999) 81–86.
[16] V. Desai, J. Crook, G. Overstreet, A comparison of neural networks and linear scoring models in the credit union environment, European Journal of Operational Research 95 (1996) 24–37.
[17] T. Dietterich, An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization, Machine Learning 40 (2) (2000) 139–157.
[18] H. Drucker, C. Cortes, L. Jackel, Y. LeCun, V. Vapnik, Boosting and other ensemble methods, Neural Computation 6 (1994) 1289–1301.
[19] J. Franke, Bootstrapping neural networks, Neural Computation 12 (2000) 1929–1949.
[20] L. Glorfeld, B. Hardgrave, An improved method for developing neural networks: the case of evaluating commercial loan creditworthiness, Computers and Operations Research 23 (1996) 933–944.
[21] L. Hansen, P. Salamon, Neural network ensembles, IEEE Transactions on Pattern Analysis and Machine Intelligence 12 (10) (1990) 993–1001.
[22] J. Hansen, Combining predictors: comparison of five meta machine learning methods, Information Sciences 119 (1–2) (1999) 91–105.
[23] Y. Hayashi, R. Setiono, Combining neural network predictions for medical diagnosis, Computers in Biology and Medicine 32 (2002) 237–246.
[24] L. Hong, S. Page, Groups of diverse problem solvers can outperform groups of high-ability problem solvers, Proceedings of the National Academy of Sciences 101 (46) (2004) 16385–16389.
[25] J. Hongkyu, H. Ingoo, L. Hoonyoung, Bankruptcy prediction using case-based reasoning, neural networks, and discriminant analysis, Expert Systems with Applications 13 (1997) 97–108.
[26] M. Hu, C. Tsoukalas, Explaining consumer choice through neural networks: the stacked generalization approach, European Journal of Operational Research 146 (2003) 650–660.
[27] Z. Huang, H. Chen, C. Hsu, W. Chen, S. Wu, Credit rating analysis with support vector machines and neural networks: a market comparative study, Decision Support Systems 37 (2004) 543–558.
[28] H. Kim, S. Pang, H. Je, D. Kim, S.Y. Bang, Constructing support vector machine ensemble, Pattern Recognition 36 (12) (2003) 2757–2767.
[29] K. Kim, S. Cho, Ensemble classifiers based on correlation analysis for DNA microarray classification, Neurocomputing 70 (1–3) (2006) 187–199.
[30] Y. Kim, Toward a successful CRM: variable selection, sampling, and ensemble, Decision Support Systems 41 (2) (2006) 542–553.
[31] M. Kearns, Y. Mansour, On the boosting ability of top-down decision tree learning algorithms, Journal of Computer and System Sciences 58 (1) (1999) 109–128.
[32] J. Kittler, M. Hatef, R. Duin, J. Matas, On combining classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (3) (1998) 226–239.
[33] G. Koehler, Conditions that obviate the no-free-lunch theorems for optimization, INFORMS Journal on Computing 19 (2007) 273–280.
[34] A. Krogh, J. Vedelsby, Neural network ensembles, cross validation, and active learning, Advances in Neural Information Processing Systems 7 (1995) 231–238.
[35] C. Kun, H. Ingoo, K. Youngsig, Hybrid neural network models for bankruptcy predictions, Decision Support Systems 18 (1996) 63–72.
[36] R. Lacher, P. Coats, S. Sharma, L. Fant, A neural network for classifying the financial health of a firm, European Journal of Operational Research 85 (1995) 53–65.
[37] K. Lee, I. Han, Y. Kwon, Hybrid neural network models for bankruptcy predictions, Decision Support Systems 18 (1996) 63–72.
[38] M. Malhotra, S. Sharma, S. Nair, Decision making using multiple models, European Journal of Operational Research 114 (1) (1999) 1–14.
[39] P. Mangiameli, D. West, R. Rampal, Model selection for medical diagnosis decision support systems, Decision Support Systems 36 (2004) 247–259.
[40] P. Melville, R. Mooney, Creating diversity in ensembles using artificial data, Information Fusion 6 (2005) 99–111.
[41] S. Mukkamala, A. Sung, A. Abraham, Intrusion detection using an ensemble of intelligent paradigms, Journal of Network and Computer Applications 28 (2) (2005) 167–182.
[42] D. Opitz, R. Maclin, Popular ensemble methods: an empirical study, Journal of Artificial Intelligence Research 11 (1999) 169–198.
[43] G. Pasi, R. Yager, Modeling the concept of majority opinion in group decision making, Information Sciences 176 (4) (2006) 390–414.
[44] S. Peddabachigari, A. Abraham, C. Grosan, J. Thomas, Modeling intrusion detection system using hybrid intelligent systems, Journal of Network and Computer Applications 30 (1) (2007) 114–132.
[45] P. Pendharkar, The theory and experiments of designing cooperative intelligent systems, Decision Support Systems 43 (2007) 1014–1030.
[46] S. Piramuthu, Financial credit-risk evaluation with neural and neurofuzzy systems, European Journal of Operational Research 112 (1999) 310–321.
[47] Y. Raviv, N. Intrator, Bootstrapping with noise: an effective regularization technique, Connection Science 8 (1996) 355–372.
[48] A. Refenes, A. Burgess, Y. Bentz, Neural networks in financial engineering: a study in methodology, IEEE Transactions on Neural Networks 8 (1997) 1222–1267.
[49] L. Salchenberger, E. Cinar, N. Lash, Neural networks: a new tool for predicting thrift failures, Decision Sciences 23 (1992) 899–916.
[50] H. Schwenk, Y. Bengio, Boosting neural networks, Neural Computation 12 (2000) 1869–1887.
[51] C. Serrano-Cinca, Feedforward neural networks in the classification of financial information, European Journal of Finance 3 (1997) 183–202.
[52] C. Serrano-Cinca, Self organizing neural networks for financial diagnosis, Decision Support Systems 17 (1996) 227–238.
[53] R. Sikora, M. Shaw, A multi-agent framework for the coordination and integration of information systems, Management Science 44 (11) (1998) 65–78.
[54] M. Skurichina, R. Duin, Bagging for linear classifiers, Pattern Recognition 31 (7) (1998) 909–930.
[55] S. Sohn, S. Lee, Data fusion, ensemble and clustering to improve the classification accuracy for the severity of road traffic accidents in Korea, Safety Science 41 (2003) 1–14.
[56] K. Tam, M. Kiang, Managerial applications of neural networks: the case of bank failure predictions, Management Science 38 (1992) 926–947.
[57] A. Tsakonas, G. Dounias, M. Doumpos, C. Zopounidis, Bankruptcy prediction with neural logic networks by means of grammar-guided genetic programming, Expert Systems with Applications 30 (2006) 449–461.
[58] V. Vapnik, Statistical Learning Theory, John Wiley, New York, 1998.
[59] M. Versace, R. Bhatt, O. Hinds, M. Shiffer, Predicting the exchange traded fund DIA with a combination of genetic algorithms and neural networks, Expert Systems with Applications 27 (3) (2004) 417–425.
[60] D. West, S. Dellana, J. Qian, Neural network ensemble strategies for financial decision applications, Computers and Operations Research 32 (2005) 2543–2559.
[61] D. West, P. Mangiameli, R. Rampal, V. West, Ensemble strategies for a medical diagnostic decision support system: a breast cancer diagnosis application, European Journal of Operational Research 162 (2005) 532–551.
[62] D. West, Neural network credit scoring models, Computers and Operations Research 27 (2000) 1131–1152.
[63] R. Wilson, R. Sharda, Bankruptcy prediction using neural networks, Decision Support Systems 11 (1994) 545–557.
[64] D. Wolpert, W. Macready, No free lunch theorems for optimization, IEEE Transactions on Evolutionary Computation 1 (1997) 67–82.
[65] E. Yu, S. Cho, Constructing response model using ensemble based on feature subset selection, Expert Systems with Applications 30 (2) (2006) 352–360.
[66] L. Yu, S. Wang, K. Lai, An intelligent-agent-based fuzzy group decision making model for financial multicriteria decision support: the case of credit scoring, European Journal of Operational Research (2007), doi:10.1016/j.ejor.2007.11.025.
[67] Z. Yang, M. Platt, H. Platt, Probabilistic neural networks in bankruptcy prediction, Journal of Business Research 44 (1999) 67–74.
[68] J. Zhang, Developing robust non-linear models through bootstrap aggregated neural networks, Neurocomputing 25 (1999) 93–113.
[69] G. Zhang, M. Hu, B. Patuwo, D. Indro, Artificial neural networks in bankruptcy prediction: general framework and cross-validation analysis, European Journal of Operational Research 116 (1999) 16–32.
[70] G. Zhang, A neural network ensemble method with jittered training data for time series forecasting, Information Sciences 177 (23) (2007) 5329–5346.
[71] Y. Zhang, S. Bhattacharyya, Genetic programming in classifying large-scale data: an ensemble method, Information Sciences 163 (1–3) (2004) 85–101.
[72] P. Zhilkin, R. Somorjai, Application of several methods of classification fusion to magnetic resonance spectra, Connection Science 8 (1996) 427–442.
[73] J. Zhou, D. Lopresti, Improving classifier performance through repeated sampling, Pattern Recognition 30 (1997) 1637–1650.
[74] Z. Zhou, Y. Jiang, Y. Yang, S. Chen, Lung cancer cell identification based on artificial neural network ensembles, Artificial Intelligence in Medicine 24 (2002) 25–36.
[75] J. Zurada, B. Foster, T. Ward, R. Barker, Neural networks versus logit regression models for predicting financial distress response variables, Journal of Applied Business Research 15 (1998) 21–30.