ARTICLE IN PRESS
Information Systems 29 (2004) 271–272
Editorial
Introduction to special issue with best papers from KDD 2002

0306-4379/$ - see front matter © 2003 Published by Elsevier Ltd. doi:10.1016/S0306-4379(03)00075-9

This special issue contains the extended versions of five of the best papers from the 2002 Conference on Knowledge Discovery and Data Mining, KDD 2002. Held in Edmonton, Alberta, Canada, during July 23–26, 2002, KDD 2002 was the eighth in a continuing series of conferences dedicated to the dissemination and exchange of the latest discoveries in Data Mining. The conference received 306 research paper submissions, of which 12% were selected as full papers and another 12% as posters. This places KDD among the most competitive data mining conferences. Based on an assessment of quality and suitability for journal publication, we invited five papers for this special issue. Extended versions of these papers went through a review and revision process before they reached their current form.

The papers in this special issue present substantial advances in data mining in a wide variety of contexts and applications. They are also fine examples of the constructive and experimental nature of data mining research.

The first paper, H-MDS: A new approach for Interactive Visualization with Multidimensional Scaling in the Hyperbolic Space, by Joerg Walter, introduces a novel projection-based method for visualizing high-dimensional data sets, combining concepts from Multidimensional Scaling and the geometry of hyperbolic spaces. The proposed Hyperbolic Multi-Dimensional Scaling synthesizes these two concepts and constitutes a generic technique with a multitude of important applications.

In their paper, Selecting the Right Interestingness Measure for Association Patterns, Pang-Ning Tan, Vipin Kumar and Jaideep Srivastava present several key properties one should examine in order to select the ‘interestingness’ measure for a given application domain. Many data mining techniques require an objective measure to evaluate the dependencies among variables in the data. Many such measures exist, but a significant number of them provide conflicting information about how interesting a pattern is. The paper includes a comprehensive comparative study of these properties and proposes an algorithm for selecting a small set of patterns such that a domain expert can find a desirable measure by looking at the rankings of this small set.

The paper Hierarchical Model-Based Clustering of Large Datasets Through Fractionation and Refractionation, by Jeremy Tantrum, Alejandro Murua and Werner Stuetzle, extends the idea of Fractionation for non-parametric hierarchical clustering of large data sets. In addition, it proposes Refractionation, a procedure that can be successful even in the difficult situation where there are large numbers of small groups in the underlying data set.

In their paper, Privacy Preserving Mining of Association Rules, Evfimievski, Srikant, Agrawal and Gehrke present a framework for mining association rules from transactions consisting of categorical items where the data has been randomized to preserve the privacy of individual transactions. Although it is possible to recover association rules and preserve privacy using a simple randomization technique based on uniformity, the discovered rules can be exploited to find privacy breaches. This paper analyzes the nature of privacy breaches and proposes a class of
randomization operations that are much more effective than uniform randomization. The paper presents a novel formalism embedded in an algorithmic framework, together with a nice experimental validation.

The fifth paper, entitled Exploiting Response Models—Optimizing Cross-Sell and Up-Sell Opportunities in Banking, by Marc-David Cohen, presents a solution that answers the question of what products, if any, to offer to each customer in a way that maximizes the marketing return on investment. Typically, response models based on historical data are used to estimate the probability of a customer purchasing an additional product and the expected return from the additional purchase. Deciding on offers from such response models is a challenging problem, which is compounded by the capability to launch multiple campaigns over
multiple time periods. Overall, the proposal improves the state of the art, offering an effective tool for both tactical campaign execution and strategic planning that accounts for limited resources and various business constraints.

We found these five papers exciting to read and hope you enjoy them as much as we did.
Daniel Keim
University of Constance, Universitaetstrasse 10, Constance, Germany

Nick Koudas
AT&T Labs Research, 180 Park Avenue, Florham Park, NJ 07932, USA
E-mail address: [email protected]