Aggregating expert predictions in a networked environment


Computers & Operations Research 28 (2001) 1231–1244

Raymond L. Major*, Cliff T. Ragsdale

Department of Management Science & Information Technology, Virginia Tech, 2064 Pamplin Hall, Blacksburg, VA 24061, USA

Received 1 September 1999; received in revised form 1 January 2000; accepted 1 April 2000

Abstract

This paper describes an approach for combining the classifications or predictions of n local experts into a single composite prediction. We describe a Java-based application that allows a user to select up to n prediction experts that provide information for assigning an object to one of two predetermined groups. An advantage of this type of application is that it is capable of interacting with the Internet in a relatively seamless way. We examine the accuracy and robustness of our technique by comparing the classification accuracy of our technique, a maximum entropy-based aggregation technique, and four classification methods on a real-world, two-group data set concerned with bank failure prediction. The classification methods studied in this work include Quinlan's C4.5 decision-tree classifier, logistic regression, Mahalanobis distance measures, and a neural network classifier. Our model includes a fundamental component (i.e., a transaction manager) that helps improve the general performance of applications that perform network-based classification. This component is found to provide reliable and secure connections along with ways to direct traffic across the Internet. Our results suggest three major contributions: (1) a transaction manager increases the flexibility of a network-based classifier since it is capable of transacting with one or more specific types of prediction expert(s) over the Internet; (2) our approach tends to be more accurate than the individual classification methods we examined; and (3) our approach can outperform a recently introduced statistically based aggregation technique.

Scope and purpose

The emergence of the Internet has produced a need for new types of programming and research tools that are capable of accessing information resources located throughout the world. Only a limited amount of research is available in this area, and this work describes a network-based tool that solves a two-group classification problem.
The two-group classification problem in discriminant analysis is concerned with developing a rule for predicting to which of k = 2 mutually exclusive groups an observation of unknown origin belongs. This problem commonly occurs in business and other areas, and a plethora of statistical and artificial intelligence (AI) techniques exist to help decision-makers effectively analyze their data. A number of recent studies have compared the classificatory performance of various AI techniques to the more traditional statistical techniques; however, decision makers are left in somewhat of a quandary about which of the many available classification techniques to use for a specific classification problem. This paper proposes a new aggregation technique that focuses on combining or aggregating the predictions from multiple classification techniques into a single composite prediction. Our approach provides a simple method for aggregating expert predictions coming from remote locations by combining Java and the Common Object Request Broker Architecture (CORBA) into a general classification tool. Object-oriented models developed using Java are platform independent and can be easily modified. CORBA provides the services necessary to establish and manage network connections. Computational results show that our technique outperforms a recently introduced maximum entropy-based aggregation technique on a real-world data set. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Distributed processing; Discriminant analysis; Two-group classification

* Corresponding author. Tel.: +1-540-231-3163; fax: +1-540-231-3752. E-mail address: [email protected] (R.L. Major).
0305-0548/01/$ - see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S0305-0548(00)00037-X

1. Introduction

The fast-growing number of computers and other information resources connected to the Internet, coupled with rapid technological advances in computer hardware, is revolutionizing the way that researchers and other decision-makers obtain and analyze data. Java™ is an object-oriented programming language developed to produce applications that are platform independent. Veith et al. [1] discuss many advantages of Java and also describe a World Wide Web (WWW)-based tool that provides (1) object-oriented flexibility, (2) platform independence through the WWW, and (3) compatibility with other Java-based graphical and analytical tools. The combination of Java and the Common Object Request Broker Architecture (CORBA) results in a robust tool for developing network-based applications. CORBA is middleware, standardized by the Object Management Group, that enables the implementation of robust distributed systems. It permits creating a heterogeneous world that can include objects implemented in various languages like C, C++, Smalltalk, Cobol, and Ada [2]. This standard provides an infrastructure that enables the invocation of operations on objects located anywhere on a network as if they were local to the application using them. This paper describes an Internet-based tool, developed using Java and CORBA, that avoids certain limitations of WWW-based applications, such as being unable to access local system resources. We use our tool to examine a two-group classification problem in a real-world setting. Mangiameli and West [3] describe the two-group classification problem along with a neural network classification technique that is more accurate than certain parametric and non-parametric methods. The classification problem is one of the most fundamental problems of scientific inquiry and has found application in diverse fields from biology to artificial intelligence and the social and administrative sciences [4].
A number of well-known statistical techniques are available to address this problem, including Fisher's linear discriminant function, Smith's quadratic discriminant function, logistic regression, and the nearest-neighbor technique. More recently, neural networks, decision trees, genetic algorithms and a variety of other artificial intelligence (AI) techniques have been applied to this problem [3,5]. As Mangiameli and West [3] describe, for a given data set, it is

not known a priori which of the alternative methods will produce the most accurate classification results. One possible solution to this quandary is to consider how the results of several different classification techniques might be combined to create a composite classification [6]. If we regard each classification technique as a 'prediction expert', this is equivalent to the opinion aggregation problem, in which a group of experts express their predictions on the outcome of a random event and we then combine their predictions into a single aggregated consensus. The underlying idea is that the consensus of multiple experts should produce a superior prediction to any of the individual experts used in isolation [7]. This rather subjective form of decision making lends itself well to associating certain characteristics with a group of experts, such as differing areas and levels of competence. This seems appropriate since some techniques will perform well on a given data set only to perform poorly on another. When all the experts' predictions agree, the aggregation process is straightforward. However, when one or more experts disagree, the problem becomes quite difficult as one attempts to account for differences in competence among the experts and correlation between the experts' predictions. Recently, Myung et al. [8] introduced a maximum entropy technique for aggregating individual expert predictions on the outcome of a discrete random event that explicitly considers individual expert competence and possible dependencies among experts. While the maximum entropy technique was shown to have numerous desirable features, no empirical results were provided to demonstrate the effectiveness of this technique on real-world data. In this paper, we use the maximum entropy technique and a new heuristic technique to aggregate predictions from four classification techniques using a well-known, real-world, two-group data set related to bank failure prediction.
Our results indicate that the heuristic technique introduced here outperforms the maximum entropy technique on this bank failure data set. Section 2 of this paper discusses recent literature associated with the development of research tools for the Internet and also describes approaches used to solve a classification problem along with aggregating expert predictions. In Section 3 we test our approach against the maximum entropy technique, two statistically based classification methods (logistic regression and Mahalanobis distance measures), and two AI-based classification methods (Quinlan's C4.5 decision-tree classifier and a neural-network classifier). The results and analysis are presented in the fourth section and the fifth section concludes the paper.

2. Research tools for solving a classification problem

Research in the area of network data processing is developing rapidly as new Internet programming tools emerge. Current research issues include not just how existing resources will be distributed and accessed on the network, but also how to process information coming from remote locations over the Internet and how decision-making will be reshaped by the changing environment and the availability of new tools [2,9]. This paper also describes an alternative way for decision-makers to perform two-group classification analysis. Our algorithm produces a composite prediction (or classification) using the predictions of two or more prediction experts for a given observation. Each prediction expert provides information independently of the others. This is appealing because each expert may have different strengths in different types of data domains.

Once the assignment rules have been developed they can be stored for additional use. We next highlight certain concepts in network programming and approaches for solving a classification problem that we employ in this research.

2.1. Data processing and the Internet

An increasing number of works are appearing in the literature that describe new tools being used by researchers to process data over the Internet. Veith et al. [1] present a WWW-based discrete-event simulation package (written entirely in Java) that offers maximum user interactivity and cross-platform capabilities. OpsResearch.com is a WWW location (i.e., http://www.opsresearch.com) that includes a freeware library of more than 420 Operations Research objects for Java and a collection of links to Operations Research resources on the WWW. Thus, the Internet creates new and important opportunities for the OR/MS community. Bhargava and Krishnan [2] provide a detailed discussion of how the Web and Internet can be used in the service of OR/MS. At the most fundamental level, allowing applications to work together across the Internet requires reliable and secure connections along with ways to direct traffic. Our approach uses a 'transaction manager' that is capable of managing these issues along with performing other activities. For example, using the description of a classification task along with certain data distribution information as described by Nault and Storey [10] and Bhattacharyya and Pendharkar [5], this component may choose to connect and transact with one or more specific types of prediction expert(s) over the Internet when pursuing its objectives. This paper offers a model of an additional component, a transaction manager, for inclusion in any application that interacts with the Internet. Generally speaking, our transaction manager is composed of a set of business processes.
Each business process represents a certain task such as (1) gaining access to services on one or more network resources, (2) tracking the commitment and/or rollback of transactions on the network, (3) recording the beginning and end of a transaction, or (4) acting as a message server to remote computer systems. We used Java together with CORBA as a platform for building our network-based system. We chose Java as the programming language because the Internet protocols are embedded in the language itself, so information can be retrieved from either local resources or over the Internet in a way that is relatively seamless. Java is an object-oriented language that is well suited to network-centric computing and enhances the interconnectivity of remote computer systems [11]. Java programs are compiled into bytecode that is executed by interpreters available on any major platform; bytecode thus serves as an 'international language' in our computer world. We refer the interested reader to Flanagan [12] for a thorough description of the Java language and application programming interface (API). Using Java, instead of invoking a method on a remote object, the code for the class providing the method is transferred across the network, run locally, and then the method is invoked on a local object instance. This feature of the language combined with CORBA provides an API for developing powerful applications for the Internet. Java's remote method invocation (RMI) is an alternative infrastructure for supporting distributed computing and has many of the features found in CORBA. These include an interface definition language, a registry that performs the same functions as CORBA's object manager and naming service, and dynamic remote binding and class loading [13]. Our use of CORBA over RMI

Fig. 1. Information flow within the model.

is rather arbitrary since this work represents an initial development of a network-based classification system. Our object-oriented model is composed of three classes of objects: a TManager class representing a transaction manager, an Expert class representing prediction experts capable of supplying prediction information over the Internet, and an Aggregation class representing aggregation techniques that independently produce a set of rules for combining two or more expert predictions. A TManager object is essentially an interface designed to access objects regardless of their location and to access the standard CORBA services and facilities. Fig. 1 depicts the information flow within the model. The knowledge base represents a set of rules used to combine expert predictions. Fig. 2 shows the primary user/programmer interface that provides the means for using the tool. The transaction manager provides the user interface shown in Fig. 2 for problem-oriented communication between the user and the computer. The transaction manager also contains an inference engine to propagate inferences over the knowledge base and a workplace (i.e., an area of working memory) for the description of a current problem. The transaction manager and knowledge base together represent a more or less complete model of the problem-solving process, and with respect to user control and interactivity, the initiative is always with the system. The system asks for data, performs all the reasoning steps, asks for more information, and in the end comes up with a prediction of group membership for an observation. After the user specifies the location and format of the data and the system is trained, this component communicates the information for an observation to each prediction expert. It subsequently communicates the experts' predictions to each aggregation object in order to obtain a composite prediction. The final prediction of group membership for an observation is then communicated to the user.
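As an illustrative sketch (not the authors' implementation), the three object classes might be declared in plain Java roughly as follows. The method names are hypothetical, and all CORBA plumbing (IDL stubs, ORB initialization, naming service) is omitted:

```java
import java.util.ArrayList;
import java.util.List;

public class ObjectModelSketch {

    // A prediction expert returns 0 or 1 as its group prediction
    // for an observation (a vector of measurements).
    interface Expert {
        int predict(double[] observation);
    }

    // An aggregation technique combines several expert predictions
    // into a single composite prediction.
    interface Aggregation {
        int combine(List<Integer> predictions);
    }

    // Simple unweighted majority vote, standing in for an
    // Aggregation object; ties go to group 1.
    static final Aggregation MAJORITY = votes -> {
        int ones = 0;
        for (int v : votes) ones += v;
        return 2 * ones >= votes.size() ? 1 : 0;
    };

    // The transaction manager's core loop: query each expert for its
    // prediction, then hand the votes to an aggregation object.
    static int classify(double[] obs, List<Expert> experts, Aggregation agg) {
        List<Integer> votes = new ArrayList<>();
        for (Expert e : experts) {
            votes.add(e.predict(obs));
        }
        return agg.combine(votes);
    }
}
```

In the actual system a TManager object would hold remote references obtained through the ORB rather than local objects, and the simple majority rule above merely stands in for the aggregation techniques described in Section 2.3.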
2.2. Solving a classification problem

Four prediction experts are used in this work that, respectively, represent approaches for solving a classification problem. Our experts use techniques associated with decision-tree classifiers, logistic regression, Mahalanobis distance measures, and neural networks. We chose these methods since they are the most familiar to researchers. We did not consider using any of the advanced methods

Fig. 2. User/programmer interface.

found in the literature or pre-processing the data in an effort to optimize an expert's performance, since we are interested in studying aggregation methods. Also, the expert selection process can be incorporated into the functionality of the transaction manager component of our system. The goal would be to remove any expert from consideration by establishing some threshold level of performance for an expert before it is used in the aggregation scheme. The next version of our system will address certain issues associated with the expert selection process. We next briefly describe the four methods used in this work and provide a discussion of approaches for combining multiple expert predictions into a single one.

2.2.1. Decision tree classifiers

The ID3 algorithm developed by Quinlan [14,15] is an extensively studied technique for inducing a decision tree from a set of examples. A decision tree is a structure consisting of nodes and branches where each node represents a test or decision. A standard decision-tree algorithm creates a hypothesis by recursively selecting which attribute to place at a node and partitioning the set of examples according to the values of the test attribute. There is a branch attached to the node for every possible outcome of the test, and the terminal nodes of the decision tree correspond to sets of the same class or category. At the heart of any decision-tree classifier lies the criterion that it uses to choose attributes when partitioning or splitting the sample data. We use an extended form of ID3, C4.5 [16], for building the decision trees used in this analysis. C4.5 represents a well-understood approach to the classification problem, has been implemented through computer-based tools available to most researchers, and uses the gain-ratio criterion to select attributes when partitioning the data. Unlike

the entropy-based criterion used in ID3, the gain-ratio criterion does not exhibit a strong bias in favor of attributes having many outcomes for a test. A comparison of Quinlan's goodness-of-split measure to alternative probabilistic measures is found in Breiman [17].

2.2.2. Logistic regression

The logistic regression (or logit) technique is a popular statistical technique for two-group classification problems [18]. Here, the response function is based on the cumulative logistic distribution and is defined by







π_i = [1 + exp(−(β_0 + Σ_{j=1}^{m} β_j x_{ij}))]^{−1}.   (1)

If the two groups in question are denoted as group 1 and group 0, π_i may be interpreted as the estimated probability that an observation belongs to group 1, while 1 − π_i represents the probability of membership in group 0. Thus, when prior group membership probabilities and costs of misclassification are assumed equal, an observation is assigned to group 1 if π_i ≥ 0.5 and is assigned to group 0 otherwise. The parameters in (1) are generally estimated using the maximum likelihood method.

2.2.3. Mahalanobis distance measure

One of the most widely used statistical classification methods is the Mahalanobis distance measure (MDM). Using this technique, a new observation of unknown origin X_0 would be classified into the group it is closest to based on the following multivariate distance measures:

d_j = (X_0 − X̄_j)′ S^{−1} (X_0 − X̄_j),   (2)

where d_j is the Mahalanobis distance from X_0 to the centroid of group j, X̄_j = (x̄_{1j}, …, x̄_{mj}) the centroid for group j, x̄_{ij} the sample mean for variable x_i in group j, S = Σ_{j=1}^{k} [(n_j − 1)/(n − k)] S_j the pooled covariance matrix, and S_j the covariance matrix for group j.

To classify a new observation, the MDM approach first calculates its multivariate distance to the centroid of each of the k groups using (2). This will result in k distance measures, d_1, d_2, …, d_k. A new observation would then be classified as belonging to the group j for which d_j = MIN(d_1, d_2, …, d_k). Under certain conditions (i.e., multivariate normality of the independent variables in each group and equal covariance matrices across groups) the MDM approach can be shown to provide 'optimal' classification results in that it minimizes the probability of misclassification. Even when these conditions are violated, the MDM approach can still be used as a heuristic (although other techniques might be more appropriate). Note that the MDM technique produces predictions equivalent to Fisher's linear discriminant function [19].
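To make the two statistical assignment rules concrete, here is a minimal sketch of (1) and (2). The coefficients and centroids are placeholders, and the Mahalanobis variant assumes a diagonal pooled covariance matrix purely to keep the sketch short; a full implementation would use the inverse of S:

```java
public class StatExperts {

    // Eq. (1): logistic response; assign group 1 when pi >= 0.5.
    static int logisticAssign(double b0, double[] b, double[] x) {
        double z = b0;
        for (int j = 0; j < b.length; j++) z += b[j] * x[j];
        double pi = 1.0 / (1.0 + Math.exp(-z));
        return pi >= 0.5 ? 1 : 0;
    }

    // Eq. (2) with a diagonal covariance matrix: squared distance from
    // x to each group centroid; assign to the group with minimum d_j.
    static int mdmAssign(double[] x, double[][] centroids, double[] var) {
        int best = 0;
        double bestD = Double.POSITIVE_INFINITY;
        for (int j = 0; j < centroids.length; j++) {
            double d = 0.0;
            for (int i = 0; i < x.length; i++) {
                double diff = x[i] - centroids[j][i];
                d += diff * diff / var[i];
            }
            if (d < bestD) { bestD = d; best = j; }
        }
        return best;
    }
}
```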
2.2.4. Neural networks

A neural network (NN) consists of a number of computational elements (nodes) linked together via weighted directed arcs [20]. In general, a feed-forward NN computes a mapping from the input space to the output space. Thus, for classification problems, a feed-forward NN can easily be used to develop a mapping of the independent variables X_{Ii} = (x_{1i}, x_{2i}, …, x_{mi}) into group membership predictions X_{Oi} = (g_{1i}, g_{2i}, …, g_{ki}).

The training or learning process in a NN consists of presenting the network with a number of input vectors X_I and comparing the resulting derived output vectors X_O to a collection of desired output vectors. For training purposes, we could define the desired output vector for each observation as follows: X*_{Oi} = (g*_{1i}, g*_{2i}, …, g*_{ki}), where

g*_{ij} = 0 if observation i belongs to group j, and g*_{ij} = 1 otherwise.

The weights in the NN are then adjusted in an attempt to reduce any disparity between the derived and desired outputs. This process of deriving output vectors and adjusting weights continues iteratively until the weights converge or a predetermined number of iterations has been completed. While several weight-adjustment methods have been proposed, the backpropagation 'learning' method is probably the best known and most widely used [21,22]. Once the NN learns a mapping for a given classification problem using the training sample, a new input observation of unknown origin may be presented to the network. This will result in k derived group membership values g_1, g_2, …, g_k at the output nodes of the NN. As with the MDM technique, these outputs could be interpreted as measures of group membership where the smaller the value of g_j, the greater the likelihood of the observation belonging to group j. Thus, a new observation would be classified as belonging to the group j for which g_j = MIN(g_1, g_2, …, g_k).

2.3. Aggregating expert predictions

As mentioned earlier, the plethora of available classification techniques creates quite a quandary for decision-makers needing to solve a particular classification problem. This paper investigates using a group approach to solve a problem, assuming that a group approach produces better solutions. We now introduce two aggregation techniques that can be used to combine the predictions from multiple prediction experts for a given observation into a single, composite prediction.

2.3.1. Maximum entropy aggregation

Myung et al. [8] recently introduced a maximum entropy-based procedure for aggregating predictions from multiple experts for binary (two-group) classification problems. In that article, the maximum entropy inferred aggregate probabilities for an observation's membership in each group were derived as follows:





P(Z = 1, O_1 = o_1, …, O_n = o_n)
  = E[Z] ∏_{i=1}^{n} { [(E[Z] + E[O_i] − E[R_i]) / (E[Z] − E[O_i] + E[R_i])]^{o_i} × (E[Z] − E[O_i] + E[R_i]) / (2E[Z]) },   (3)

P(Z = 0, O_1 = o_1, …, O_n = o_n)
  = (1 − E[Z]) ∏_{i=1}^{n} { [(E[O_i] + E[R_i] − E[Z]) / (2 − E[O_i] − E[R_i] − E[Z])]^{o_i} × (2 − E[O_i] − E[R_i] − E[Z]) / (2(1 − E[Z])) }.   (4)
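A direct transcription of (3) and (4) into code might look as follows, where ez, eo[i] and er[i] stand for the expectations E[Z], E[O_i] and E[R_i] defined in the surrounding text. The class and method names are invented for illustration, and the final rule compares the two joint probabilities, which is equivalent to asking whether the conditional probability of Z = 1 exceeds 0.5:

```java
public class MaxEntAggregator {

    // Joint probability from Eq. (3) (z = 1) or Eq. (4) (z = 0).
    static double joint(int z, int[] o, double ez, double[] eo, double[] er) {
        double p = (z == 1) ? ez : 1.0 - ez;
        for (int i = 0; i < o.length; i++) {
            if (z == 1) {
                double num = ez + eo[i] - er[i];
                double base = ez - eo[i] + er[i];
                p *= Math.pow(num / base, o[i]) * base / (2.0 * ez);
            } else {
                double num = eo[i] + er[i] - ez;
                double base = 2.0 - eo[i] - er[i] - ez;
                p *= Math.pow(num / base, o[i]) * base / (2.0 * (1.0 - ez));
            }
        }
        return p;
    }

    // Aggregate prediction: 1 when the joint probability for Z = 1
    // exceeds that for Z = 0 (i.e., P(Z=1 | o) > 0.5).
    static int aggregate(int[] o, double ez, double[] eo, double[] er) {
        return joint(1, o, ez, eo, er) > joint(0, o, ez, eo, er) ? 1 : 0;
    }
}
```

For example, two experts with a 10% error rate (er = 0.1) who both predict group 1 would yield an aggregate prediction of 1.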

Here, Z is a random variable denoting the true group membership of the observation in question (i.e., Z ∈ {0, 1}), O_i is a random variable denoting expert i's prediction of Z (i.e., O_i ∈ {0, 1}), and n denotes the number of experts. The binary random variable R_i = Z + O_i − 2ZO_i represents the prediction quality of expert i for a given observation (i.e., R_i = 0 if Z = O_i; otherwise, R_i = 1). Thus, the expected value E[R_i] represents the proportion of times that expert i's prediction was incorrect. It can be shown that the aggregate prediction should be 1 if P(Z = 1, O_1 = o_1, …, O_n = o_n) > 0.5 and 0 otherwise.

2.3.2. Weighted majority aggregation

Another approach to aggregating expert predictions involves assigning differing weights to each expert's prediction. The composite prediction of group j for a given observation is then determined using a weighted majority voting approach. To illustrate, let w_i denote the weight assigned to expert i's prediction (o_i) and define G_j = {i | o_i = j}. Then the total weighted vote for assigning the observation in question to group j is V_j = Σ_{i∈G_j} w_i. We would assign the observation in question to the group that receives the maximum weighted vote (i.e., assign the observation to group j for which V_j = MAX(V_1, …, V_k)). The simplicity of this procedure has considerable intuitive appeal. The next section describes a simple method for determining the values of w_i required by this technique.

2.3.2.1. Determining the weights. When attempting to devise an effective aggregation technique, it seems reasonable to adjust for differences in expert competencies, just as the maximum entropy aggregation technique does with the expected value E[R_i]. However, if a classification technique overfits the training sample for a given problem, it can be difficult to gauge the true misclassification rate for the technique in question.
Overfitting occurs when a classification technique models characteristics of a sample that are not representative of the population from which the sample was drawn [23]. AI-based classification techniques can be particularly susceptible to this problem. One way to estimate the future misclassification rate for a classification technique is to split the training sample into two sets, T_1 and T_2, of size n_{T1} and n_{T2}, respectively. A classification technique is trained using only the observations in T_1. The resulting classification rule is then used to classify the observations in both T_1 and T_2. Let M_{ik} represent the number of misclassified observations from T_k, k ∈ {1, 2}, using classification technique i. The proportion of misclassifications on T_k using classification technique i is given by M_{ik}/n_{Tk}.

Now if M_{i1}/n_{T1} ≈ M_{i2}/n_{T2} we may assume the effect of overfitting to be minimal and expect the classification rule to perform with similar accuracy on future observations from the population. However, substantial differences between M_{i1}/n_{T1} and M_{i2}/n_{T2} indicate a biased classification rule that fits one sample substantially better than the other, raising concerns about the classification rule's reliability on future observations. In either case, M_{i2}/n_{T2} is likely to provide a better estimate

of the classification technique's misclassification rate on future observations because it is calculated using data that was not used in the development of the classification rule. Clearly, we would prefer a classification technique that is both relatively accurate on the T_1 sample and exhibits little (or no) bias between the T_1 and T_2 samples. This suggests the following scheme for assigning weights to n expert opinions in our weighted majority algorithm, WM:



w_i = [1 − M_{i1} / Σ_{i=1}^{n} M_{i1}] × [1 − |M_{i1}/n_{T1} − M_{i2}/n_{T2}| / Σ_{i=1}^{n} |M_{i1}/n_{T1} − M_{i2}/n_{T2}|].   (5)

Note that 0 ≤ w_i ≤ 1, and a classification technique that produces no mistakes on T_1 (i.e., M_{i1} = 0) and exhibits no bias (i.e., M_{i1}/n_{T1} = M_{i2}/n_{T2}) would receive the maximum weight of 1. Weights for other classification techniques would be reduced according to their relative accuracy and relative bias.
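Reading (5) with absolute differences in the bias term (our interpretation of the printed formula), the weight computation and the weighted majority vote of Section 2.3.2 might be sketched as follows. The class name is illustrative, and the code assumes at least one mistake on T_1 and at least one biased expert so that neither normalizing sum is zero:

```java
public class WeightedMajority {

    // Eq. (5): weight for each expert i from its misclassification
    // counts m1[i], m2[i] on the subsets T1 (size n1) and T2 (size n2).
    static double[] weights(int[] m1, int[] m2, int n1, int n2) {
        int n = m1.length;
        double sumM1 = 0.0, sumBias = 0.0;
        for (int i = 0; i < n; i++) {
            sumM1 += m1[i];
            sumBias += Math.abs((double) m1[i] / n1 - (double) m2[i] / n2);
        }
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            double accuracy = 1.0 - m1[i] / sumM1;
            double bias = 1.0
                - Math.abs((double) m1[i] / n1 - (double) m2[i] / n2) / sumBias;
            w[i] = accuracy * bias;   // 0 <= w[i] <= 1
        }
        return w;
    }

    // Composite two-group prediction: group with the maximum
    // weighted vote V_j over the expert predictions o.
    static int vote(int[] o, double[] w) {
        double v0 = 0.0, v1 = 0.0;
        for (int i = 0; i < o.length; i++) {
            if (o[i] == 1) v1 += w[i]; else v0 += w[i];
        }
        return v1 > v0 ? 1 : 0;
    }
}
```

Note how a single accurate, unbiased expert can outvote two weaker experts even though it is in the numerical minority.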

3. Methodology

We compare the effectiveness of our approach for weighted majority aggregation to the maximum entropy technique using a real-world, two-group data set concerned with bank failure prediction. This data set contains 19 variables representing various financial ratios for 81 failed and 81 nonfailed banks in Texas during the 1985–1987 time period [24]. We first created 30 data sets by randomly dividing the data into three subsets: a training sample containing 82 observations (corresponding to the T_1 sample described above), a testing sample containing 10 observations (corresponding to the T_2 sample described above), and a validation sample containing 70 observations. Observations in the validation sample are not used during the training and testing phases and thus represent a holdout sample used to measure the accuracy of the classification algorithm. The performance measure reported in this research is the average number of observations in the validation sample that were misclassified. Each subset contained 50% failed and 50% nonfailed bank observations. The training sample was analyzed to obtain classification rules for each of the four classification techniques described earlier. These rules were then applied to obtain four expert predictions for each observation in the training, testing and validation samples. The predictions from all techniques were then combined using both the maximum entropy and the weighted majority aggregation techniques. In applying the maximum entropy technique we took E[Z] = 0.5 (since there were an equal number of failed and nonfailed banks in each sample), E[R_i] = M_{i2}/n_{T2}, and estimated E[O_i] as the proportion of times expert i predicted group 1 in the T_2 sample. The weights for the weighted majority aggregation technique were computed as shown in (5). Next, we dedicated and configured three computer platforms for running Java along with CORBA for each of the three classes of objects described in Section 2.1.
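The stratified sampling step above can be sketched as follows: within each class of 81 banks we would draw 41 training, 5 testing and 35 validation observations, so that the two classes together yield the 82/10/70 samples with a 50/50 class mix in each. The class name and seed are illustrative:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SampleSplitter {

    // Shuffle the indices of one class and split them into training,
    // testing and validation partitions (here 41/5/35 per class).
    static List<List<Integer>> split(List<Integer> classIndices, long seed,
                                     int nTrain, int nTest) {
        List<Integer> shuffled = new ArrayList<>(classIndices);
        Collections.shuffle(shuffled, new Random(seed));
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, nTrain)));
        parts.add(new ArrayList<>(shuffled.subList(nTrain, nTrain + nTest)));
        parts.add(new ArrayList<>(
            shuffled.subList(nTrain + nTest, shuffled.size())));
        return parts;
    }
}
```

Repeating this with 30 different seeds, once for the failed and once for the nonfailed class, reproduces the structure of the 30 randomly generated data sets.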
Each class of objects exists as a Java application. The TManager objects were implemented on an Intel Pentium II 266 MHz system running Microsoft Windows 95. The Expert objects were located on a Pentium Pro 200 MHz system running Microsoft Windows NT Server 4.0, and the Aggregation objects were invoked on a Pentium 133 MHz system running Windows NT Workstation 4.0. Each system is connected

R.L. Major, C.T. Ragsdale / Computers & Operations Research 28 (2001) 1231}1244

1241

to an Ethernet LAN and contained additional Java programs to support full language mapping and a complete Java implementation of mandatory CORBA features. In our system, a TManager object accesses an Expert object over the Internet to get its prediction of group membership for an observation. An Aggregation object produces a composite prediction using the four expert predictions. Thus, this system can use one of two alternative approaches when supplying information to a user. This feature was built into the system to examine future research ideas.

4. Experimental results

The fundamental measures of performance when considering a network-based application include measures related to security, speed, accuracy, reliability, and ease of development and management. We performed our work in a relatively stable environment since the computer systems we used were all located in the same building (i.e., connected to the same node). Thus, the speed, accuracy and reliability information associated with our experiments is rather insignificant. This may also suggest that the combined Java and CORBA platforms allow for the development of network-centric applications with relative ease. Future research efforts revolve around studying certain events that can occur in a distributed environment such as this one, like how to work with noisy or missing data. Our classification results from testing the two aggregation approaches are shown in Table 1. The table also lists results for each prediction expert. Because the performance of the aggregation techniques is largely contingent on the data in the testing (T_2) sample, for each T_1 sample we generated 30 different pairs of testing and validation samples. Thus, each row in Table 1 represents the average number of misclassifications on 30 different validation samples for each set of classification rules and aggregation techniques. A number of interesting observations may be made from the data in Table 1. First, none of the individual prediction experts or aggregation techniques outperformed the others on a consistent basis. Among the individual prediction experts, the Mahalanobis distance measure performed best, misclassifying an average of 11.07 observations in each validation set. It also produced the lowest number of misclassifications in 8 of the 30 trials. The decision-tree classifier tended to do the worst, with an average of 14.89 misclassified observations in each validation sample.
It also produced the largest average number of misclassifications in 24 of the 30 trials. In comparing the aggregation techniques, the weighted majority technique outperformed the maximum entropy technique in 21 of the 30 trials and also produced the lowest average number of misclassifications overall. Table 1 indicates that the average number of observations in the validation sample misclassified was 11.42 for the maximum entropy technique and 10.88 for the weighted majority technique. A statistically significant difference exists between the average numbers of misclassifications at the α = 0.05 level (t = 2.079). Also note that both of the aggregation techniques always outperformed at least one of the prediction experts in each trial. The weighted majority technique misclassified the fewest average number of observations among the experts and aggregation techniques in 6 of the 30 trials and also provided the best worst-case performance, deviating from the best technique by at most 3.17. The average number of misclassifications in Table 1 was 11.07 for the Mahalanobis distance technique, and a statistical test at the α = 0.05 level shows that the Mahalanobis distance and


R.L. Major, C.T. Ragsdale / Computers & Operations Research 28 (2001) 1231}1244

Table 1
Comparison of classification accuracy (average number of misclassifications per validation sample)

Trial   Mahalanobis   Logistic     Neural    Decision   Maximum   Weighted
        distance      regression   network   tree       entropy   majority
---------------------------------------------------------------------------
  1       7.13          7.77         6.17      9.57       6.57      6.30
  2      12.70         12.47        11.73     17.83      13.93     10.97
  3      14.33         14.10        16.67     16.83      13.70     12.83
  4      10.70          8.80        13.13     16.93      12.90     10.13
  5      10.57         15.10        13.30     12.40      12.43     10.40
  6       7.67          9.50         9.50     16.20       8.60      8.77
  7       8.57          9.40         7.87     14.30       8.73      8.60
  8       9.43         14.87        13.93     10.60      11.33     11.10
  9       8.60         10.43         5.20      9.63       8.13      6.87
 10       8.77         11.30        11.33     12.97      10.97      9.30
 11      12.63         12.33        14.40     15.33      13.40     13.77
 12      14.20         18.23        11.97     16.87      12.33     12.43
 13      10.93          9.63        11.10     13.27      10.73     10.27
 14       9.83         10.60        10.73     12.17      11.60      9.97
 15      11.50         12.43        14.10     16.50      13.10     10.87
 16      13.40         13.40        12.57     17.67      13.03     13.87
 17      17.53         15.47        16.63     14.30      15.03     15.33
 18       9.33         12.70        10.43     14.60       7.33     10.50
 19      11.07         11.20        12.00     12.23      10.83     10.80
 20       9.60          7.77         8.80     13.67       9.20      7.80
 21      10.50         11.43        12.23     16.10      13.57     11.00
 22       9.50         11.27        10.23     14.90      11.50      9.27
 23      10.67         14.30        12.30     15.00      12.50     11.23
 24      13.77         17.47        16.40     19.23      16.03     16.43
 25      10.10         11.10         9.33     14.90       9.87      9.50
 26      12.07         16.47         9.43     19.77       9.00     11.57
 27      13.40         13.17        10.47     17.60      12.13     11.93
 28      11.83         14.17        13.33     15.13      12.33     13.40
 29      10.37          9.70        18.07     13.80      10.80     10.40
 30      11.43         14.03        10.47     16.30      10.97     10.67
---------------------------------------------------------------------------
Avg.     11.07         12.35        11.79     14.89      11.42     10.88
SD        2.25          2.72         2.98      2.57       2.25      2.26

No. of times best         8          5          8          1         2         6
No. of times worst        1          4          1         24         0         0
Worst deviation
  from best technique     3.4        7.47       8.37      10.77      4.1       3.17

weighted majority techniques perform equally well (t = 0.9). Table 1 indicates that the Mahalanobis distance technique outperformed the weighted majority technique in 13 of the 30 trials. These results suggest that by combining the predictions of multiple experts in the manner suggested here, it may be possible to create a classification tool that consistently performs as well as (or better than) any individual technique used in isolation.
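The significance tests reported above (e.g., t = 2.079 for the maximum entropy versus weighted majority comparison, t = 0.9 for Mahalanobis versus weighted majority) can be reproduced from the per-trial misclassification counts with a standard paired t-test. The following is a minimal sketch, not the authors' code:

```java
// Paired t-test on two techniques' per-trial misclassification counts.
// With n = 30 trials the statistic has n - 1 = 29 degrees of freedom.
class PairedTTest {
    static double tStatistic(double[] a, double[] b) {
        int n = a.length;
        double meanDiff = 0.0;
        for (int i = 0; i < n; i++) meanDiff += a[i] - b[i];
        meanDiff /= n;
        double ss = 0.0;
        for (int i = 0; i < n; i++) {
            double dev = (a[i] - b[i]) - meanDiff;
            ss += dev * dev;
        }
        double sd = Math.sqrt(ss / (n - 1));     // sample std. dev. of differences
        return meanDiff / (sd / Math.sqrt(n));   // paired t statistic
    }
}
```

Feeding in the 30 pairs of column values from Table 1 for any two techniques yields the corresponding t statistic, which is then compared with the critical value at the chosen significance level.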

5. Conclusion

In this paper, we described a new component (i.e., a transaction manager) for a network-based application that combines prediction information coming from remote information resources over the Internet. This component provides a reliable and secure interface to the Internet for providing classification information. We used an actual distributed system instead of a simulated one in order to examine how certain issues, such as speed and accuracy, affect the system's performance. Studying this system also allowed us to identify certain characteristics of an Internet tool developed to solve an aggregation problem. These characteristics include the ability to track the commitment and/or rollback of transactions and to act as a message server to remote computer systems.

We examined our classification tool along two fundamental dimensions in this paper (the bias and accuracy of expert predictions) and introduced a new, simple technique for determining weights to aggregate discrete predictions coming from multiple experts. Computational testing was presented demonstrating that this technique is capable of outperforming a recently introduced maximum entropy-based aggregation technique on a real-world problem. While the new weighted majority technique introduced here provided only a marginal improvement over the best individual classification technique (i.e., the Mahalanobis distance measure), it is significant to note that even small improvements in classification accuracy can result in tremendous cost savings [25]. The technique introduced here explicitly considers and accounts for the effect that overfitting may have in biasing estimates of expert competence.

As AI-based classification techniques become more readily available, they are also more likely to be embedded in decision support systems and web-based e-commerce systems.
In these environments, it is likely that less attention will be given to the proper training of these AI techniques, and overfitting may become a far more prevalent problem. Statistically based techniques are more stable in this regard; however, their results can sometimes be difficult to interpret within the problem domain. In general, we conclude that the Internet offers a rich environment for acquiring up-to-date information in a timely way that can be exploited by a network-based application.

Future research efforts will focus on investigating practical problems related to aggregation techniques that rely on accurate estimates of a prediction expert's relative accuracy. These include determining how many missing expert predictions a model can tolerate, along with identifying ways to detect and manage faulty data. Other future efforts include investigating the efficacy of using a consensus approach by examining an n-group classification problem (n > 2); however, we cannot apply the maximum entropy method to this type of problem, so other benchmarks are needed.
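As a concrete illustration of the weight-based aggregation discussed in this paper, a weighted-majority vote for the two-group problem might be sketched as follows. Note that the weights here are simply assumed to be externally supplied competence estimates (e.g., each expert's holdout-sample accuracy); the paper's actual technique derives the weights so as to correct for the bias that overfitting introduces into such estimates.

```java
// Hedged sketch of a weighted-majority vote for a two-group problem.
// votes[i] is expert i's group prediction (1 or 2); weights[i] >= 0 is an
// assumed competence estimate for expert i, not the paper's bias-corrected weight.
class WeightedMajority {
    // Returns the group carrying the larger total weight (ties go to group 1).
    static int vote(int[] votes, double[] weights) {
        double w1 = 0.0, w2 = 0.0;
        for (int i = 0; i < votes.length; i++) {
            if (votes[i] == 1) w1 += weights[i];
            else               w2 += weights[i];
        }
        return (w1 >= w2) ? 1 : 2;
    }
}
```

Under this scheme, a single highly competent expert can outvote several weaker ones, which is what allows the composite to track the best individual technique across trials.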

References

[1] Veith TL, Kobza JE, Koelling CP. Netsim: Java-based simulation for the World Wide Web. Computers and Operations Research 1999;26(6):607-21.


[2] Bhargava H, Krishnan R. The World Wide Web: opportunities for operations research and management science. INFORMS Journal on Computing 1998;10(4):359-83.
[3] Mangiameli P, West D. An improved neural classification network for the two-group problem. Computers and Operations Research 1999;26(5):443-60.
[4] Ragsdale CT, Stam A. Mathematical programming formulations for the discriminant analysis problem: an old dog does new tricks. Decision Sciences 1991;22(2):296-307.
[5] Bhattacharyya S, Pendharkar PC. Inductive, evolutionary, and neural computing techniques for discrimination: a comparative study. Decision Sciences 1998;29(4):871-99.
[6] Markham I, Ragsdale C. Combining neural networks and statistical predictions to solve the classification problem in discriminant analysis. Decision Sciences 1995;26(2):229-42.
[7] Hill GW. Group versus individual performance: are N+1 heads better than one? Psychological Bulletin 1982;91:517-39.
[8] Myung IJ, Ramamoorti S, Bailey Jr. AD. Maximum entropy aggregation of expert predictions. Management Science 1996;42(10):1420-36.
[9] Mendelson H, Pillai R. Clockspeed and informational response: evidence from the information technology industry. Information Systems Research 1998;9(4):415-33.
[10] Nault B, Storey V. Using object concepts to match artificial intelligence techniques to problem types. Information and Management 1998;34(1):19-31.
[11] Hamilton M. Java and the shift to net-centric computing. Computer 1996;29(8):31-9.
[12] Flanagan D. Java in a nutshell. California: O'Reilly & Associates, 1997.
[13] Bradley G, Buss A. Dynamic, distributed, platform independent OR/MS applications: a network perspective. INFORMS Journal on Computing 1998;10(4):384-7.
[14] Quinlan JR. Induction of decision trees. Machine Learning 1986;1:81-106.
[15] Quinlan JR. Decision trees and decisionmaking. IEEE Transactions on Systems, Man, and Cybernetics 1990;20:339-46.
[16] Quinlan JR. C4.5: programs for machine learning. San Mateo, CA: Morgan Kaufmann, 1993.
[17] Breiman L. Technical note: some properties of splitting criteria. Machine Learning 1996;24(1):41-7.
[18] Pindyck RS, Rubinfeld DL. Econometric models and economic forecasts. New York: McGraw-Hill, 1981.
[19] Morrison DF. Multivariate statistical methods. New York: McGraw-Hill, 1990.
[20] Burke L. Introduction to artificial neural systems for pattern recognition. Computers and Operations Research 1991;18(2):211-20.
[21] Sharda R. Neural networks for the MS/OR analyst: an application bibliography. Interfaces 1994;24(2):116-30.
[22] Wasserman P. Neural computing theory and practice. New York: Van Nostrand Reinhold, 1989.
[23] Leahy K. The overfitting problem in perspective. AI Expert 1994;35-36.
[24] Tam KY, Kiang MY. Managerial applications of neural networks: the case of bank failure prediction. Management Science 1992;38(7):926-47.
[25] Hansen JV, McDonald JB, Messier WF, Bell TB. A generalized qualitative-response model and the analysis of management fraud. Management Science 1996;42(7):1022-32.

Raymond Major is an Associate Professor of Management Science & Information Technology at Virginia Tech in Blacksburg, VA. He received his Ph.D. in Decision and Information Sciences from the University of Florida. His research interests include the application of statistical and artificial intelligence techniques to such areas as classification decisions, distributed processing, and manufacturing process control. He has published in Engineering Applications of Artificial Intelligence and the international journals Information Processing and Management and Computational Intelligence.

Cliff T. Ragsdale is an Associate Professor in the Department of Management Science and Information Technology at Virginia Tech. He received his Ph.D. in Management Science and Information Technology from the University of Georgia. Dr. Ragsdale's primary areas of research interest include microcomputer systems and applications, artificial intelligence, mathematical programming, and applied statistics. His research has appeared in Decision Sciences, Decision Support Systems, Naval Research Logistics, OMEGA: The International Journal of Management Science, Computers & Operations Research, Operations Research Letters, Personal Financial Planning, and other publications. He is also the author of the textbook Spreadsheet Modeling and Decision Analysis, published by South-Western.