Available online at www.sciencedirect.com

Information Fusion 9 (2008) 2–3
www.elsevier.com/locate/inffus

Guest Editorial

Applications of ensemble methods

Just as committees of people typically make better, or at least more reliable, decisions than any one committee member would make alone (particularly when one does not know which committee member to trust), committees of machine learning models, also known as ensembles, routinely outperform each of the base models that constitute the ensemble. Ensemble machine learning methods have been an active area of research for the last fifteen years. Ensembles were originally motivated by the desire to avoid relying on just one learned model when only a small amount of training data was available. Because of this, most research efforts on ensemble methods have evaluated their new algorithms on relatively small datasets, most notably those in the University of California at Irvine (UCI) Machine Learning Repository. However, modern data mining problems raise a variety of issues very different from the ones ensembles have traditionally addressed, including too much data and data that is distributed, noisy, and sampled from non-stationary environments. The goal of this issue is to examine the applications that give rise to these modern data mining problems and how current and novel ensemble methods aid in solving them.

To that end, the first paper in this issue surveys both the most traditional ensemble techniques and the most frequently seen applications of those techniques.

The second paper, by Suutala and Röning, discusses the use of ensemble methods to identify people based on their footsteps on a pressure-sensitive floor. Their system is intended to enable identification of people as part of an intelligent environment without making people consciously aware that they are being tracked, as would happen if people were required to wear sensors or if video cameras were used. The authors experimented with individual classifiers (by themselves and as part of ensembles) that either used one of three different sets of available features or used all of them together. The ensembles tended to outperform the individual classifiers, especially when each base classifier used a different feature subset and when the classifiers could choose to abstain from labeling an example when they were not confident in their answers.
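The combination scheme just described, in which base classifiers see different feature subsets and may abstain from voting, can be illustrated with a minimal sketch. This is our own toy example, not the authors' system: each base classifier here is a small k-nearest-neighbour model over one feature subset, and it withholds its vote when its neighbour agreement falls below a confidence threshold. All function names, the toy data, and the parameter values are illustrative.

```python
# Minimal sketch (illustrative, not the authors' code): majority voting
# over per-feature-subset classifiers, each of which may abstain.
from collections import Counter

def knn_predict(train, labels, x, k=2):
    """Return (label, confidence) from a k-nearest-neighbour vote."""
    order = sorted(range(len(train)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train[i], x)))
    votes = Counter(labels[i] for i in order[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k

def ensemble_predict(train, labels, x, feature_subsets, threshold=0.7):
    """Combine per-subset classifiers; low-confidence classifiers abstain."""
    votes = Counter()
    for subset in feature_subsets:
        sub_train = [[row[j] for j in subset] for row in train]
        sub_x = [x[j] for j in subset]
        label, conf = knn_predict(sub_train, labels, sub_x)
        if conf >= threshold:          # abstain when not confident
            votes[label] += 1
    # If every base classifier abstains, the ensemble itself abstains.
    return votes.most_common(1)[0][0] if votes else None

# Toy data: two classes that are separated in every feature.
train = [[0.0, 0.1, 0.2], [0.1, 0.0, 0.1], [0.9, 1.0, 0.8], [1.0, 0.9, 1.0]]
labels = ["A", "A", "B", "B"]
subsets = [(0,), (1,), (0, 2)]
print(ensemble_predict(train, labels, [0.05, 0.05, 0.1], subsets))  # → A
```

On the ambiguous query `[0.5, 0.5, 0.5]`, every base classifier's neighbours split evenly, so all of them abstain and the ensemble returns `None`, mirroring the idea that a classifier should stay silent rather than guess.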

1566-2535/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.inffus.2007.07.004

The third paper, by Assaad, Boné and Cardot, applies boosting to recurrent neural networks for time-series forecasting, covering both single-step-ahead and multi-step-ahead prediction problems. They obtained favorable results relative to single recurrent neural networks trained using different back-propagation through time (BPTT) methods, as well as other simple methods such as linear and polynomial models and multi-layer perceptrons (MLPs).

The fourth paper, by Tsymbal, Pechenizkiy, Cunningham and Puuronen, deals with the problem of concept drift, in which the "true" function that maps inputs to outputs, or the underlying distribution over the inputs, changes over time. The authors integrate base classifiers dynamically based on their accuracy in the region of the instance space local to the example being tested. They devised several methods for dynamic integration and found that these performed better than static combination methods such as weighted voting on several synthetic problems and one real problem (involving changing antibiotic resistance) containing gradual and sudden concept changes.

The fifth paper, by Giacinto, Perdisci, Del Rio and Roli, addresses the problem of computer network intrusion detection using modular intrusion detection. That is, they design one classifier, which they refer to as a module, for each group of computer network protocols or services. Each group's behavior is then easier to model than all the groups' behaviors simultaneously, and this arrangement also allows each group's behavior module to be tuned separately. They show that, for lower allowable false alarm rates (less than 4%), their modular approach outperforms monolithic methods that attempt to model all protocols or services together.

The sixth paper, by Polikar, Topalis, Parikh, Green, Kounios and Clark, combines classifiers trained using complementary data sources to improve early diagnosis of Alzheimer's disease.
The authors use Polikar's Learn++ algorithm, which is designed specifically for the incremental learning scenario in which additional features are available with each additional training data set, and one would like to benefit from the additional features of the latest training data without throwing away the results of training on the earlier datasets. Their algorithm is especially well suited to the application, as the authors demonstrate in their experimental results.

The seventh paper, by Cabrera, Gutiérrez and Mehra, develops a hierarchical ensemble method for intrusion detection in a mobile ad-hoc network, an example of which is a typical wireless computer network. This is a particularly difficult problem because nodes regularly exit and enter the network. Their hierarchical system contained anomaly detectors at multiple levels, ranging from the entire network down to the individual node. They found that their distributed anomaly detection method worked better than the individual detectors.

The eighth paper, by Shoemaker, Banfield, Hall, Bowyer and Kegelmeyer, deals with labeling spatially disjoint data. The authors use two example problems: face recognition and simulations of an impactor bar crushing a storage canister. The simulations in the second problem are of a large scale, yielding a large volume of data and requiring that the data be divided among multiple classifiers. The resulting partitioned data have substantial class imbalance, and yet the authors successfully identify regions of interest in both problems.

We would like to thank all the authors who contributed articles to this special issue, as well as the reviewers who provided invaluable feedback through their detailed reviews. In addition, we thank Belur Dasarathy for his advice and frequent help throughout the process of preparing this issue.

Guest Editors

Nikunj C. Oza
NASA Ames Research Center, Mail Stop 269-2,
Moffett Field, CA 94035-1000, USA
E-mail address: [email protected]

Kagan Tumer
Oregon State University, 204 Rogers Hall,
Corvallis, OR 97331, USA
E-mail address: [email protected]

Available online 10 August 2007