Applied Soft Computing 6 (2006) 348–356. doi:10.1016/j.asoc.2005.11.002. www.elsevier.com/locate/asoc
Towards the automation of intelligent data analysis

Martin Spott *, Detlef Nauck

BT Research and Venturing, Intelligent Systems Research Centre, Orion pp1/12, Adastral Park, Ipswich IP5 3RE, UK

* Corresponding author. E-mail addresses: [email protected] (M. Spott), [email protected] (D. Nauck).
Abstract

Data analysis tools are still very much a collection of data analysis methods that require analysis experts as users. On the other hand, many business users are keen to apply data analysis to business data in order to understand it or to make predictions that improve their business decisions. In order to make state-of-the-art data analysis techniques available to such non-experts, we developed a wizard for our data analysis tool SPIDA that selects appropriate data analysis methods given soft high-level requirements. The wizard also configures and runs the chosen methods automatically. This paper describes our general approach to automating data analysis, in particular how to select an appropriate data analysis algorithm.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Automatic data analysis; Fuzzy ranking; Computing with words
1. Introduction

Nowadays, data analysis tools are still very much a collection of data analysis methods that require analysis experts as users. The user not only needs domain knowledge of the data, but also needs to know which data analysis methods are applicable to the problem, which ones meet special requirements for the solution, how the data needs to be prepared for the chosen method and, finally, how the method needs to be configured.

On the other hand, many business users are keen to employ data analysis to make use of the data that is being collected. Although a great proportion of typical analysis problems look quite simple to a data analysis expert, business users are overwhelmed by the sheer amount of knowledge that is required to use current tools. We consider this one of the reasons why modern machine learning techniques like decision trees, neural networks, fuzzy techniques, support vector machines, etc. are still not industry standard. Business users require a much more user- or problem-oriented approach to data analysis. Rather than knowing analysis methods, they are experts in the data domain and they know what they want to achieve with data analysis, if only they knew how. They might know, for example, that they want to classify insurance claims as fraudulent or non-fraudulent, given historic information about the customer and the current case. They might want to understand how the analysis method actually classifies customers (e.g. with a rule set), they might require a certain classification accuracy, or that the algorithm is simple enough to be implemented as an SQL query. Ideally, such users would simply like
to feed all these high-level requirements and the data into a tool that then automatically finds the best algorithm in terms of the requirements, configures it, runs it and creates a software module that can be plugged into the business application. Based on these ideas we developed SPIDA (soft computing platform for intelligent data analysis) [7] and equipped it with a wizard that, to a certain extent, does most of the things mentioned above.

The rest of the paper is organised as follows. In Section 2, we discuss the general problem of automating data analysis and existing approaches. In Section 3, we show how to use soft constraints to rank data analysis methods. Finally, we describe the wizard of our data analysis tool SPIDA as an implementation of the techniques introduced before.
2. Approaches to automatic data analysis

A typical data analysis tool provides access to data sources like databases or plain text files, it can filter the data in various ways and prepare it for the actual analysis, for example by changing its format or representation. Such a tool offers a variety of data analysis methods which can be applied to the prepared data, and it can finally present the results in different formats. Setting up a data analysis process involves the following steps (a schematic rendering of the iterative part follows the list).

(1) Define the data analysis problem (like classification, prediction, clustering).
(2) Define requirements and preferences for the solution.
(3) Select the data source.
(4) Depending on the data and the problem:
    (a) Decide on applicable data analysis methods.
    (b) Filter and prepare data according to the chosen methods (this might differ between methods).
    (c) Configure the data analysis methods (parameter settings).
(5) Run the analysis.
(6) Check the results.
(7) Go back to step 4a if the results are not satisfactory.
(8) Produce a report.
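To make the iterative part concrete, the following minimal Python sketch mirrors steps 4–8 as a search over candidate methods and configurations. It is our own illustration using scikit-learn as a stand-in; the candidate set, configurations and scoring are assumptions, not part of any tool described here.

```python
# A minimal, concrete stand-in for steps 4-8 using scikit-learn.
# Candidate methods and configurations are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, random_state=0)  # stand-in for step 3

# Step 4a: candidate methods, each with candidate configurations (step 4c).
candidates = {
    DecisionTreeClassifier: [{"max_depth": d} for d in (3, 5, None)],
    MLPClassifier: [{"hidden_layer_sizes": (h,), "max_iter": 1000} for h in (5, 20)],
}

best_model, best_score = None, 0.0
for method, configs in candidates.items():
    for params in configs:                                 # steps 5-7: run, check, iterate
        model = method(**params)
        score = cross_val_score(model, X, y, cv=5).mean()  # step 6: evaluate
        if score > best_score:
            best_model, best_score = model, score

print(f"Best: {best_model} with accuracy {best_score:.2f}")  # step 8: report
```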
The first two steps define what the analyst wants to achieve, i.e. what he wants to solve and what he wants the solution to look like. From the business perspective, this should include how the solution will be applied, for example whether it will be a stand-alone application, a module embedded in an existing application, a simple SQL query, etc. Depending on such requirements, some analysis methods are more suitable than others.

From step 4a onwards, typically a data analysis expert is required who knows from experience which data analysis methods can be applied to which problem, given a number of requirements and the data. The expert would also know what kind of data preparation is necessary and how to set the parameters of the chosen methods. After setting up the analysis process, it is run and the results are evaluated. Usually, first results are not satisfactory, so the actual data analysis process is iterative. Based again on experience, an expert might change the data preparation and the parameter settings of the analysis method several times in order to achieve better results.

Our understanding of automatic data analysis incorporates steps 4–8. The user is required to define the type of the problem and his requirements and preferences for the solution, as well as to specify the data source. Given this information, an automated tool takes over and recommends appropriate analysis methods, sets up the analysis process, runs it, matches the results with the given requirements and improves the match iteratively by changing the set-up, if necessary. In the following subsection we briefly describe existing approaches to automating data analysis. Our own approach is introduced in the subsequent sections, where we focus on selecting the most suitable data analysis method according to user requirements.

2.1. Previous approaches

The approach described in [11] breaks analysis methods down into formal blocks and represents user requirements in a formal language. A search algorithm then identifies suitable blocks and arranges them so as to carry out an analysis process. This approach faces the problem of formalising mainly heuristic methods, and it is usually not feasible to formally compute all parameters necessary to execute an analysis method.

Other authors discuss mainly architectural features of systems that could automate data analysis or data mining and avoid discussing how to automatically select and execute analysis methods [4,3]. A more recent system uses an ontology-based approach and simply describes analysis methods and pre-/post-processing methods as input/output blocks with specific interfaces [2,1]. The system is built on top of the data analysis package Weka [12]. If the interfaces of two blocks match, they can be concatenated in an analysis process. If a user wants to analyse a data set, all possible concatenations of blocks are created and executed as data analysis processes. Once a suitable analysis process has been identified, it can be stored, re-used and shared. The authors suggest a heuristic ranking of analysis processes in order to execute only the best processes. However, they only use speed as a ranking criterion, which can easily be determined as a feature of an algorithm. More useful features concerning the quality of the analysis, like accuracy, obviously depend on the analysis process itself as well as on the analysed data, and can therefore not be determined up-front.

None of the approaches above deals with high-level requirements or preferences, like the simplicity of a model, which are indispensable for real-world applications. In summary, the reported results in the area of automating data analysis are not very encouraging.
3. Ranking data analysis models

The selection of an appropriate data analysis method depends on a number of factors. First of all, the type of analysis problem restricts the list of applicable methods. By analysis problem we mean whether it is a classification problem, a function approximation problem like time series prediction, or a clustering problem, whether it is about finding dependencies or associations, etc. The second category of requirements is concerned with preferences regarding the solution. These comprise properties like accuracy and simplicity of the solution, whether the method is adaptable to new data (rather than learning from scratch whenever new data is available), whether it offers an explanation facility like rule-based systems or functional models like linear regression, and how simple the explanation should be. The user might also have requirements for the execution time of a method, for prior knowledge to be integrated, etc. Finally, the data might constrain the applicability of
methods. The number of data records, for example, might be too small for some statistical methods, or, more generally, some methods might cope better with certain types of data than others.

Depending on the user group, the level at which the requirements are defined will vary considerably. Where some users will be able to understand the difference between function approximation and classification, for instance, others will certainly not. One could therefore think of hierarchical approaches where requirements are iteratively mapped onto lower-level requirements until the lowest level is reached.

In order to evaluate the suitability of analysis methods, the requirements have to be mapped onto properties of the methods. In some cases, requirements are directly expressed as desired method properties. Adaptability is an example: a user might request a highly adaptable method, which can be directly matched to the adaptability properties of the given methods. For higher-level requirements, however, this is typically not the case. For instance, 'I would like to detect fraudulent cases every night and I would like to understand the solution' could be mapped onto 'classification problem requiring a highly adaptable method with an explanation facility providing a very simple explanation'.

Another observation is that non-expert users are often only able to specify requirements vaguely, like 'I would like a simple explanation in terms of rules'. In this case, the term 'simple' either indicates that the user cannot specify more accurately what he means, because he does not know what to expect, or the user does not want to give a more precise specification because his requirement is inherently vague, i.e. the exact degree of simplicity is not important.

In order to provide a mapping from requirements onto desired analysis method properties that can deal with vague requirements and properties, we propose to use a fuzzy rule base. Fig. 1 shows the general schema for matching requirements with properties. For the actual properties of data analysis methods we distinguish method properties and model properties.
Fig. 1. Matching requirements with properties.
Method properties are static features of an analysis method, like adaptability or the existence of an explanation facility, whereas model properties are those that can only be observed after an actual model has been created. Examples are accuracy and the simplicity of an explanation. For instance, we may decide to use decision trees: before we have actually learned a tree, we can tell neither its accuracy nor its size. On the other hand, we know that decision trees in general are not adaptable.

The actual ranking procedure works as follows. First the user defines all preferences, which are mapped onto desired properties. The method properties of all available analysis methods are then matched with the respective desired properties, and an initial ranking of methods is produced. The user can then select the methods for which he wants SPIDA to create models. After the creation of the actual models, the model properties are available, and all desired properties can be matched with the method and model properties. Depending on the degree of match of some of the model properties like accuracy or simplicity, SPIDA might decide to adapt learning parameters in order to improve the match. If a decision tree, for example, is very big and the user has required a simple explanation, we might want to reduce the maximum tree height and relearn the tree (a sketch of this loop follows below). If no further improvement can be achieved, the final ranking of the created models is presented. If several models have been created for the same method, the best one is picked according to the user's preferences. However, the properties of all models can be shown and the user can pick a different one.
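As an illustration of this adaptation step, the following sketch relearns a decision tree with decreasing maximum height until it meets a simplicity threshold. scikit-learn stands in for SPIDA's learners, and the node-count threshold is our own illustrative simplicity criterion, not the paper's.

```python
# Illustrative adaptation loop: shrink a decision tree until it is 'simple'.
# scikit-learn is a stand-in; the node-count threshold is an assumption.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

MAX_NODES = 15                     # hypothetical 'simple explanation' threshold
for depth in range(10, 1, -1):     # reduce the maximum tree height step by step
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X, y)
    if tree.tree_.node_count <= MAX_NODES:
        break                      # simple enough; stop relearning

print(depth, tree.tree_.node_count, tree.score(X, y))
```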
The rest of this section is organised as follows. Section 3.1 introduces the concept of combining fuzzy words, which allows us to fuse symbolic and numeric requirements and properties in a sound way. In Section 3.2, we then propose a method to measure the match between desired and actual analysis method properties.

3.1. Combination of fuzzy words

As already laid out in Section 3, we use fuzzy techniques to describe user requirements as well as method, model and desired properties. Also, the mappings between requirements and properties are modelled as fuzzy rules.
A closer look at the requirements and properties reveals that some of them carry symbolic (categorical) values, whereas others are of a numeric nature. Examples of the first category are the type of analysis problem (classification, function approximation, clustering, etc.), while accuracy and simplicity represent the second category. Obviously, fuzzy sets can be defined on numeric as well as symbolic (categorical) domains. However, a more homogeneous approach has been presented in [8]. The main idea is to represent information at two levels of abstraction. The lower level is the level of details, like the potentially infinite number of possible values in a continuous numeric domain of discourse. At the higher level of abstraction, we only deal with a finite number of symbols by abstracting from details. In other words, the granularity of information can be either fine or coarse (potentially, entire hierarchies of granularity are conceivable). If we measure the accuracy of a classification model, we can do so at the level of details, as a value in [0,1], for example, or at the symbolic level, in terms of symbols like 'high accuracy', 'medium accuracy' and 'low accuracy'. What this approach allows us to do is to represent inherently symbolic information, like the type of analysis problem mentioned above, at the same level as symbols like 'high accuracy', which are in fact abstractions from a level of finer granularity.

The advantage of this approach compared to simply defining fuzzy sets on symbolic or numeric domains becomes more obvious when we think of expressing information using a combination of symbols, as in 'I fully accept a model of high accuracy, but to a lower degree I would accept a model of medium accuracy as well'. This expression could be quantified in a requirement as 'high accuracy (1.0) + medium accuracy (0.6)'. We call such an expression a weighted combination of symbols, or in short a combination. In [9] it has been shown how to process such expressions at the symbolic level. Thereby, it does not matter if the terms used are inherently symbolic or if they are actually fuzzy sets themselves. In this way, all symbols can be treated coherently.

In [8,9] we proposed a probabilistic model for the weights in combinations, which means in particular that the weights sum to 1. In Section 3.2, this model is extended by allowing a sum of weights larger than 1. Obviously, such weights cannot be interpreted as probabilities; they rather represent the existence of alternatives, or possibilities.
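A combination can be represented straightforwardly as a mapping from fuzzy words to weights. The sketch below is our own illustrative encoding (not SPIDA's internal one); it distinguishes the probabilistic reading, where the weights sum to 1, from the possibilistic reading introduced in Section 3.2, where at least one word is fully possible and the sum may exceed 1.

```python
# A combination of fuzzy words as a word -> weight mapping.
# Illustrative encoding only; SPIDA's internal representation may differ.

def is_probabilistic(comb: dict[str, float]) -> bool:
    # Probabilistic reading: weights behave like probabilities.
    return abs(sum(comb.values()) - 1.0) < 1e-9

def is_possibilistic(comb: dict[str, float]) -> bool:
    # Possibilistic reading: at least one word is fully possible,
    # so the maximum weight is 1 and the sum may exceed 1.
    return max(comb.values()) == 1.0

requirement = {"high": 1.0, "medium": 0.6}               # 'high (1.0) + medium (0.6)'
observed    = {"low": 0.3, "medium": 0.7, "high": 0.0}   # a model's actual property

assert is_possibilistic(requirement) and is_probabilistic(observed)
```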
3.2. Measuring the match of requirements and properties

We assume that the original user requirements have already been mapped onto desired properties and that information about the respective method and model properties is available. Each property is represented by a fuzzy variable that takes combinations of fuzzy words as its values. The desired accuracy, for example, could be 'medium (1.0) + high (1.0)', whereas a created analysis model might be accurate to the degree of 'low (0.3) + medium (0.7)'. In other words, we are looking for a model with medium or high accuracy, and the created model's accuracy is low with degree 0.3 and medium with degree 0.7.

In this example, the weights for the desired accuracy sum up to 2, whereas the ones for the actual accuracy add up to 1. We interpret a combination of fuzzy words with a sum of weights greater than one as alternative fuzzy words. Rather than modelling the weights as probabilities, we assume a possibility density function [5,8] on the fuzzy words, which allows for alternative values that, as an extreme case, could all be possible without restriction. The degree of possibility of a fuzzy word is defined in conjunction with the probabilities of the related method or model property: it stands for the maximum acceptable probability of a property. In the example above, we were entirely happy with an analysis model of medium or high accuracy. We therefore assigned the possibilistic weight 1 to both of them, i.e. models exhibiting the property 'low (0.0) + medium (a) + high (b)' with a + b = 1 are fully acceptable. In case of the requirement 'low (0.0) + medium (0.0) + high (1.0)' and the above property with a > 0, the weight a exceeds the possibility for 'medium' and therefore at least partially violates the requirements. Degrees of possibility can be any real number in [0,1].

Building on these ideas, we stipulate that requirements are represented as a possibilistic combination of fuzzy words, i.e. at least one of the fuzzy words carries the weight 1, so the sum of the weights is at least 1. This is based on the assumption that at least one of the alternative requirements is fully acceptable. Properties, on the other hand, stand for the existing properties of a given model, so we assume that they are represented by a combination of fuzzy words whose weights sum to exactly 1.
The following part of the section deals with the question of how a match of requirements and properties can be quantified, i.e. we are looking to measure the compatibility of properties with requirements. We will first focus on the compatibility of a single requirement/property pair and then consider ways of combining several degrees of compatibility into one value. In order to find an appropriate compatibility measure, we stipulated a number of required properties in [10] and proposed a measure that meets them. For reasons of simplicity, we refer to combinations of fuzzy words simply as fuzzy sets (on fuzzy words), $\tilde R$ for requirements and $\tilde P$ for properties, defined on a finite universe of discourse $X$. Assuming $\sum_{x \in X} \mu_{\tilde R}(x) \geq 1$ and $\sum_{x \in X} \mu_{\tilde P}(x) = 1$, the proposed measure is

$$C(\tilde P, \tilde R) := 1 - \frac{1}{2} \left( \sum_{x \in X} \left| \mu_{\tilde R}(x) - \mu_{\tilde P}(x) \right| - \sum_{x \in X} \mu_{\tilde R}(x) + 1 \right) \qquad (1)$$

which can be rewritten as

$$C(\tilde P, \tilde R) = 1 - \sum_{x \in X:\, \mu_{\tilde P}(x) > \mu_{\tilde R}(x)} \left( \mu_{\tilde P}(x) - \mu_{\tilde R}(x) \right) \qquad (2)$$
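To make Eq. (2) concrete, the following minimal sketch computes the compatibility for the accuracy example above (desired 'medium (1.0) + high (1.0)', observed 'low (0.3) + medium (0.7)'). Only the weight 0.3 on 'low' exceeds its possibility of 0, so C = 1 − 0.3 = 0.7. The function name is our own.

```python
# Compatibility of a property with a requirement, Eq. (2):
# C = 1 - total property mass exceeding the requirement's possibility.

def compatibility(prop: dict[str, float], req: dict[str, float]) -> float:
    violation = sum(max(0.0, p - req.get(word, 0.0)) for word, p in prop.items())
    return 1.0 - violation

req  = {"low": 0.0, "medium": 1.0, "high": 1.0}  # desired: medium or high
prop = {"low": 0.3, "medium": 0.7, "high": 0.0}  # observed model accuracy

print(compatibility(prop, req))  # 0.7: the 0.3 on 'low' violates the requirement
```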
Fig. 2 shows a fuzzy set $\tilde R$ for requirements on the left-hand side (we used a continuous domain for better illustration); the right triangular function is the fuzzy set $\tilde P$ for properties. The right term in (2) measures the size of the grey area, which can be interpreted as the degree to which the properties violate the requirements. The size of this area is bounded by the area underneath the membership function of $\tilde P$, which we stipulated to be 1. That means that the right term measures the proportion of the properties $\tilde P$ that violate the requirements.
Fig. 2. Fuzzy sets of requirements $\tilde R$, properties $\tilde P$ and the intersection $\tilde R'$. The grey area represents incompatibility of properties with requirements.
Since we deal with a set of requirements and properties, we end up with one match value for each requirement/property pair. Requiring a set of properties can formally be interpreted as a logical conjunction of the individual requirements. Given the assumption that all requirements are equally important, we therefore propose to use a t-norm to aggregate the individual match values. We decided to use multiplication, as it is a strictly monotonic operator. Strict monotonicity basically means that the overall match value decreases with any of the individual match values. Other operators, like the minimum, do not have this property: in case of the minimum, the overall match is simply the smallest of the individual matches, so all match values apart from the minimum can be increased without changing the overall value. This is not the desired behaviour in our case, since many different sets of properties would result in the same overall match value as long as the minimal value is the same. So the proposed measure for a multi-criteria match is

$$C(\tilde P, \tilde R) = \prod_{j=1}^{m} C(\tilde P_j, \tilde R_j) = \prod_{j=1}^{m} \left( 1 - \sum_{x \in X_j:\, \mu_{\tilde P_j}(x) > \mu_{\tilde R_j}(x)} \left( \mu_{\tilde P_j}(x) - \mu_{\tilde R_j}(x) \right) \right) \qquad (3)$$
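Eq. (3) can be rendered as a product of the per-property compatibilities. The sketch below is again our own illustration; the property names and weights are made up for the example.

```python
import math

def compatibility(prop, req):
    # Eq. (2), as in the previous sketch.
    return 1.0 - sum(max(0.0, p - req.get(w, 0.0)) for w, p in prop.items())

def overall_match(props, reqs):
    # Eq. (3): product t-norm over all requirement/property pairs.
    return math.prod(compatibility(props[name], reqs[name]) for name in reqs)

reqs = {
    "accuracy":   {"medium": 1.0, "high": 1.0},
    "simplicity": {"high": 1.0},
}
props = {
    "accuracy":   {"low": 0.3, "medium": 0.7},
    "simplicity": {"medium": 0.4, "high": 0.6},
}
print(overall_match(props, reqs))  # 0.7 * 0.6 = 0.42
```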
4. The SPIDA wizard for analysis model selection

Based on the techniques described in the preceding sections, we implemented a wizard for our data analysis tool SPIDA. In a series of dialogs, the user specifies the data analysis problem (prediction, grouping, dependencies), chooses the data source and gives his preferences regarding the solution (explanation facility, type of explanation, simplicity of explanation, facility to take prior knowledge into account, adaptability, accuracy, etc.). Fig. 3 shows the dialog for specifying requirements for an explanation facility. Possible selections are a mixture of fuzzy terms like 'at least medium' or 'simple' for simplicity, and crisp terms like 'Rules' and 'Functions' for the type of explanation. The dialogs for the other preferences look very similar.

Fig. 3. Specifying preferences, here regarding an explanation facility.

A typical ranking of data analysis methods according to user preferences is shown in Fig. 4, where the match or compatibility of method properties with preferences is given as suitability. At this stage, no models have been created yet, so model properties like accuracy and simplicity are not taken into account for the suitability. The user can preselect the most suitable methods and trigger the creation of models for them.

Fig. 4. Ranking of analysis models.

As already mentioned in Section 3, the wizard will then create models for each selected method, evaluate the model properties afterwards and try to improve the match with the respective desired properties. This is achieved by changing learning parameters of the methods according to strategies collected from experts in the field. If no further improvement can be achieved, the final overall suitability is shown. Fig. 5 shows five different models of the neuro-fuzzy classifier Nefclass [6]. The user has asked for a simple model, so the wizard tried to force Nefclass to produce a simple solution while keeping the accuracy up. As can be seen in the figure, SPIDA produced three models with high simplicity but considerably different accuracy, in this case between 44% and 55% (the actual accuracy values can be revealed in tool tips). The user can balance the importance of simplicity against accuracy as one of the preferences, and the wizard decides on the best model accordingly. Nevertheless, the user can pick a different model based on the information in Fig. 5.

Fig. 5. Accuracy, simplicity and overall suitability of different Nefclass models.

4.1. User preferences and method properties

In the current version of the SPIDA wizard, we measure the suitability of a data analysis method according to the following method properties:

- type of analysis problem (classification, function approximation, clustering, dependency analysis, etc.);
- if an explanation facility exists;
- type of explanation (rules or functions);
- adaptability to new data;
- if prior knowledge can be integrated;

and the following model properties:

- simplicity of an explanation;
- accuracy.

Another conceivable model property is execution time, which can be crucial for real-time applications. Examples of property profiles are shown in Table 1.

The method properties above are symbolic, whereas the model properties are numeric. In general, of course, this is not necessarily the case. For all numeric properties, fuzzy sets have to be defined as a granularisation of the underlying domain. For example, if accuracy is measured as a value in [0,1], fuzzy sets for 'high', 'medium' and 'low' accuracy can be defined on [0,1] as fuzzy values for accuracy. Since accuracy is heavily dependent on the application, so is the definition of the fuzzy terms. We decided to ask users to specify a desired accuracy and the lowest acceptable accuracy whenever they use the wizard. These two crisp accuracy values are then used as cross-over points for three trapezoidal membership functions for 'high', 'medium' and 'low'. In case the user cannot specify accuracy due to lack of knowledge, accuracy is simply not used to determine the suitability of an analysis model. For other properties, fuzzy sets can be defined accordingly, either by the user or by the expert who designs the wizard. Fuzzy sets can even be adapted according to user feedback. If the wizard, for instance, recommends a supposedly simple model that is not simple at all from the user's perspective, the underlying fuzzy set can be changed accordingly (user profiling).
Table 1
Property profiles for decision trees, neural networks and Nefclass

Method           Problem                                    Explain   Adapt    Prior knowledge
Decision tree    Classification                             Rules     No       No
Neural network   Classification, function approximation     No        Medium   No
Nefclass         Classification                             Rules     High     Yes
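The construction of the three trapezoidal membership functions from the two user-supplied cross-over points can be sketched as follows. The exact shapes are not specified in the paper, so the transition half-width w and the function names are our own assumptions.

```python
# Trapezoidal memberships for 'low', 'medium', 'high' accuracy on [0,1].
# The two crisp user inputs act as cross-over points (membership 0.5).
# The transition half-width w is an illustrative assumption.

def trapezoid(x, a, b, c, d):
    # 0 below a, rises to 1 at b, stays 1 until c, falls to 0 at d.
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def accuracy_terms(lowest, desired, w=0.05):
    # Cross-over points at 'lowest' (low/medium) and 'desired' (medium/high).
    return {
        "low":    lambda x: trapezoid(x, -1.0, 0.0, lowest - w, lowest + w),
        "medium": lambda x: trapezoid(x, lowest - w, lowest + w, desired - w, desired + w),
        "high":   lambda x: trapezoid(x, desired - w, desired + w, 1.0, 2.0),
    }

terms = accuracy_terms(lowest=0.6, desired=0.8)
print({name: round(mu(0.7), 2) for name, mu in terms.items()})
```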
In the current version of the wizard, user preferences are specified at a similar level as the desired method and model properties. They include:

- type of analysis problem (classification, function approximation, clustering, dependency analysis, etc.);
- importance of an explanation facility (do not care, nice to have, important);
- type of explanation (do not care, rules, functions);
- adaptability to new data (do not care, nice to have, important);
- integration of prior knowledge (do not care, nice to have, important);
- simplicity of an explanation;
- accuracy;
- balance of importance between accuracy and simplicity.

The mapping from user preferences onto desired properties as shown in Fig. 1 is therefore quite simple, in some cases, like accuracy, almost a one-to-one relation with rules like 'If accuracy preference is at least medium, then desired accuracy is medium or high'. For others, like simplicity, it is slightly more complicated, with rules like 'If simplicity preference is high and an explanation is important, then desired simplicity is medium (0.6) + high (1.0)'. The balance between the importance of accuracy and simplicity is not used to compute the suitability of models, since we can assume that the user has specified his preferences regarding these properties. It is only taken into account if several models of the same analysis method get the same suitability score, so that the wizard can decide on the better one. The balance is also used when the wizard decides to rerun an analysis method with different learning parameters because accuracy and/or simplicity are not satisfactory. Depending on a combination of the accuracy and simplicity scores and their balance, the wizard changes parameters in order to improve either accuracy or simplicity.
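The two mapping rules quoted above could be rendered as follows. This is a simplified sketch of our own; the wizard's actual rule base is fuzzy, larger, and operates on matched degrees rather than crisp choices.

```python
# Simplified rendering of two preference-to-desired-property rules.
# The real rule base is fuzzy and more extensive.

def desired_properties(prefs: dict) -> dict:
    desired = {}
    # 'If accuracy preference is at least medium, then desired accuracy
    #  is medium or high.'
    if prefs.get("accuracy") in ("medium", "high"):
        desired["accuracy"] = {"medium": 1.0, "high": 1.0}
    # 'If simplicity preference is high and an explanation is important,
    #  then desired simplicity is medium (0.6) + high (1.0).'
    if prefs.get("simplicity") == "high" and prefs.get("explanation") == "important":
        desired["simplicity"] = {"medium": 0.6, "high": 1.0}
    return desired

print(desired_properties({"accuracy": "medium", "simplicity": "high",
                          "explanation": "important"}))
```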
5. Conclusion

As a new direction in automating data analysis, we introduced the concept of using soft constraints for the selection of an appropriate data analysis method.
These constraints represent the user's requirements regarding the analysis problem, in terms of the actual problem (like prediction, clustering or finding dependencies) and of preferences for the solution. Requirements can potentially be defined at any level of abstraction. Expert knowledge in terms of a fuzzy rule base maps high-level requirements onto required properties of data analysis methods, which are then matched with the actual properties of the analysis methods.

This general concept opens up the possibility of defining entire hierarchies of requirements. Different branches of such a hierarchy would capture different aspects of data analysis solutions, and different levels would correspond to the level of expertise of the user. For instance, a branch could be concerned with an explanation facility. At the highest level, a user could demand a 'simple explanation'; intermediate users might require certain types of explanations, like a 'simple rule set' or a 'simple function'; expert users might even want to distinguish different types of rules or functions and specify the desired complexity themselves. In a data analysis tool, the level at which requirements are specified could be customised based on such a requirement hierarchy, so different user groups would be able to create data analysis solutions which match their individual requirements.

In contrast, data analysis tool providers nowadays offer custom solutions for different industry sectors, whereby experts of the solution providers capture the requirements of the respective sector based on use cases and manually turn these into a solution. The main problem of such an approach is its inflexibility: if the problem at hand differs slightly from the use cases, such a tailored solution might be useless or require expensive consulting work by the provider. With our approach, individual requirements would be captured and the space of possible solutions would be searched for the best match. Furthermore, requirement templates could be provided for different industry sectors.

The methods presented above have been implemented as a wizard for our data analysis tool SPIDA, which has been successfully used to produce solutions to a variety of problems within BT, for example fraud detection, travel time prediction and customer satisfaction analysis.
References

[1] A. Bernstein, S. Hill, F. Provost, Intelligent assistance for the data mining process: an ontology-based approach, CeDER Working Paper IS-02-02, Center for Digital Economy Research, Leonard Stern School of Business, New York University, New York, 2002.
[2] A. Bernstein, F. Provost, An intelligent assistant for the knowledge discovery process, CeDER Working Paper IS-01-01, Center for Digital Economy Research, Leonard Stern School of Business, New York University, New York, 2001.
[3] J.A. Botia, A.F. Skarmeta, J.R. Velasco, M. Garijo, A proposal for meta-learning through a MAS, in: T. Wagner, O. Rana (Eds.), Infrastructure for Agents, Multi-Agent Systems, and Scalable Multi-Agent Systems, Number 1887 in LNAI, Springer-Verlag, Berlin, 2000, pp. 226–233.
[4] J.A. Botia, J.R. Velasco, M. Garijo, A.F.G. Skarmeta, A generic datamining system: basic design and implementation guidelines, in: H. Kargupta, P.K. Chan (Eds.), Workshop in Distributed Datamining at KDD-98, AAAI Press, New York, 1998.
[5] J. Gebhardt, R. Kruse, The context model—an integrating view of vagueness and uncertainty, Intern. J. Approximate Reasoning 9 (1993) 283–314.
[6] D. Nauck, R. Kruse, A neuro-fuzzy method to learn fuzzy classification rules from data, Fuzzy Sets and Systems 89 (3) (1997) 277–288.
[7] D. Nauck, M. Spott, B. Azvine, SPIDA—a novel data analysis tool, BT Technol. J. 21 (4) (2003) 104–112.
[8] M. Spott, Combining fuzzy words, in: Proceedings of FUZZ-IEEE 2001, Melbourne, Australia, 2001.
[9] M. Spott, Efficient reasoning with fuzzy words, in: S.K. Halgamuge, L. Wang (Eds.), Computational Intelligence for Modelling and Predictions, Springer-Verlag, pp. 117–128 (Chapter 10).
[10] M. Spott, D. Nauck, On choosing an appropriate data analysis algorithm, in: Proceedings of the IEEE International Conference on Fuzzy Systems 2005.
[11] R. Wirth, C. Shearer, U. Grimmer, T.P. Reinartz, J. Schloesser, C. Breitner, R. Engels, G. Lindner, Towards process-oriented tool support for knowledge discovery in databases, in: Principles of Data Mining and Knowledge Discovery, First European Symposium, PKDD '97, Number 1263 in Lecture Notes in Computer Science, Springer-Verlag, Berlin, 1997, pp. 243–253.
[12] I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann Publishers, San Francisco, CA, 2000.