Variables importance in questionnaire data on advertising




Expert Systems with Applications 38 (2011) 14218–14224



Aniceta Cymbala, Marcin Owczarczuk (corresponding author, e-mail: [email protected])
Institute of Econometrics, Warsaw School of Economics, Al. Niepodleglosci 164, 02-554 Warsaw, Poland
doi:10.1016/j.eswa.2011.04.234

Keywords: Low-involvement; Feature selection; Ranking; Monte Carlo; Advertisement

Abstract

In this article, we deal with the problem of measuring the importance of the features that determine the purchase of a product after exposure to an advertisement. We use an algorithm called Monte Carlo feature selection, which is based on the multiple use of decision trees, to obtain a ranking of variables from questionnaire data. Our data generation process relies on low involvement during the advertisement watching phase, and the comparison of the advertised products is based on purchases in a virtual shop. © 2011 Elsevier Ltd. All rights reserved.

1. Introduction

In marketing, an effective advertisement is one of the keys to success. Marketers responsible for deciding which advertisement should be used want to know, prior to its emission, what its expected effect on customers will be: for example, whether customers will like it and find it interesting. Marketers are interested in what associations are created with their brand and product as an effect of customers watching an advertisement. Sometimes a few versions of the same advertisement are produced and the question is which version should be chosen and shown in the mass media. Usually, before an advertisement is broadcast on TV or printed in a newspaper, it is tested to check whether it will perform its functions well. Answering these questions may be done using various types of market research techniques. A focus study is an illustrative example: a small group of potential clients is shown a commercial and then they discuss their opinions about it among each other. This discussion is recorded by the researchers and conclusions are then drawn. Sometimes a wider group of potential customers is shown a commercial and then they fill in a questionnaire about the commercial they have just watched. However, the results obtained from the above-mentioned procedures are severely biased, because respondents concentrate while watching the advertisements, which is usually not the case in real life (people usually switch channels during commercial breaks). One of the key concerns in market research of this type is therefore providing an experimental design that is as similar to the real-life situation as possible. Also, the effect of an advertisement on the willingness to purchase in real life may differ from the one declared in a questionnaire. In order to diminish this difference, research based on low involvement of respondents is often conducted.

After the research is carried out, a statistical and logical analysis of its results is performed. One of the questions that should be answered at this stage is the following: which features of the product (or advertisement) are most important when making the purchase decision? Having this knowledge, marketers may verify whether the ideas and aims they had when designing the advertisement were confirmed by clients, in other words, whether the message of the advertisement was correctly interpreted by the customers. In particular, marketers may want, according to the long-term policy of the company, customers to associate their product with features such as "modern, innovative" and to associate the product of the competitors with "traditional". In other words, it is possible to find out whether the associations created with the brand (product, user) are in line with the long-term policy of the company. The advertisement should coincide with this policy, and market research is a way to verify this coincidence. It is also possible to unveil consumers' associations with the competing products. Such research enables a comparison of the examined brand and product with its competitors, therefore providing essential information for future campaigns. The aim of this article is to present a methodology that is capable of answering the questions just stated. The result is a ranking of variables with respect to their influence on the purchase and on the perception of the particular brand by the customer. This ranking has a clear interpretation and may be easily utilized by marketers who do not have a statistical or machine learning background. The technique we apply is called Monte Carlo feature selection and was introduced by Dramiński et al. (2008). This article is organized as follows: in the next section, we describe other works related to our problem. Then we discuss in detail our data generating process and the methodological problems related to it. Next, we describe the statistical methodology applied to obtain the importance ranking. In the fifth section, we apply this approach to generate a ranking of the features that determine the purchase of an FMCG (Fast Moving Consumer Goods) product.


Our dataset was gathered from one of the Polish market research agencies.

2. Related work

The idea of applying data mining techniques to the analysis of questionnaire data is well documented in the literature. In Ramalingam, Palaniappan, Panchanatham, and Palanivel (2006), a group of respondents was asked 50 questions about three advertisements of toothpastes and then asked about the effectiveness of each advertisement; neural networks were then applied to predict advertisement effectiveness. The authors of Terano and Ishino (1996) used C4.5 decision trees and genetic algorithms to obtain clear, simple and interpretable models learned on questionnaires about toothpastes; conclusions from these models were then the basis for a promotion strategy for domain experts. In Chen and Yan (2008), the architecture of a customer utility prediction system is introduced. It uses conjoint analysis of questionnaire data to measure the product utility value, and radial basis function neural networks based on the features of the product as a prediction module; a case study on cellular phone design is presented. The idea of how to apply association rules to questionnaire data is described in Chen and Weng (2009). The authors emphasize that various data types may occur in such datasets and propose a unified approach, based on fuzzy sets, to handle them correctly. They illustrate the performance of their methodology on a survey on teaching evaluation.

3. Data generating process

Compared with the previous studies, our dataset has far fewer observations: there are only tens of questionnaires, compared to hundreds or even thousands in the articles mentioned above. However, our questionnaires were gathered in an experiment that is very close to the real-life decision process. We now describe it in detail. We use data gathered during research conducted by one of the Polish market research agencies. A group of 180 respondents, selected with respect to gender and age, was invited to take part in the research. This group was then randomly split into three groups of 60 persons. The first two groups, denoted by E1 and E2 respectively, watched two versions of the same advertisement: group E1 watched the first version and group E2 watched the second version. The third group, denoted by K, did not watch either of the two versions; group K is a control group. In all advertisement blocks for both experimental groups (E1 and E2), the advertisement of the competitor was played. In order to keep the experiment close to real life, the respondents were told that they were invited to research about TV programs; nothing about the commercials was mentioned before the study. Respondents watched TV programs with commercial breaks. During the commercial breaks, they were allowed to do the things they usually do in this situation: some of them made coffee, read newspapers, talked to each other, etc. Importantly, the advertisements that were the target of the study were shown in these commercial breaks. In this way the low-involvement principle was applied. After this part, questions were asked about the associations with the brands and products of the FMCG company that ordered the research, and with the brands and products of its closest competitor, together with questions about purchase intentions. The questionnaires were computer aided, i.e. questions were presented on laptops and the answers were collected there as well. Respondents had to answer questions concerning 37 attributes of the product, brand or user.


Each attribute was used in two questions: (1) in relation to the product (brand, user) of this particular company and (2) in relation to the product (brand, user) of the competitor. This made 2 × 37 = 74 questions. Each question was repeated three times in random order, so there were 222 questions overall that the customers had to answer. Each question was answered on a 5-point Likert scale from −2 to +2, where −2 means "I totally disagree" and +2 means "I totally agree". Each question had the following form: "Does the description [attribute] suit the brand/product/user?", for example "Does the description 'is expert in cooking' suit 'brand M1'?". Some questions were synonymous and were included in the research in order to filter out inconsistent answers, for example "my brand" and "brand for me". The three repetitions of the same question were also used for this filtering. After this part, the questions with inconsistent answers (among the three repetitions and among the group of synonymous questions) were deleted for each respondent. The answers to the repetitions and to the synonymous questions were then averaged. This gave 62 explanatory variables. After the questionnaire part, each respondent was given a certain amount of money and he or she was allowed to spend it in a specially arranged shop. The money could be spent on the product of this particular brand or on the product of the competitor; the prices were reduced in comparison to the real market prices. The respondents could also keep the money for themselves. This last part is called the shelf test. It allows us to construct a categorical binary explained variable: the brand the customer chose, i.e. the brand of the company that produced the advertisement and ordered the research, or the brand of the competitor. Almost all the customers chose a product; the individuals who kept the money were excluded from the research, so this variable has two levels. We therefore have 62 continuous explanatory variables and a two-level categorical explained variable. This forms a supervised learning task. The goal is to answer the following question: what are the determinants of purchase? In other words, what are the most important features the customers take into account when buying a product? What is their perception of this particular brand and of the competitor's brand? What features do they associate with these brands? The goal is to describe the relations between the features of the product implied by the advertisement and the future purchase behavior. In particular, the hierarchy of the variables with respect to their influence on the purchase is the key concern. Having two target groups and the control group allows us to compare which elements of the marketing communication, that is, which attributes, were activated by each advertisement. Answering these questions has very serious business implications. If an advertisement activates an attribute which is commonly associated with the brand of the competitor, then the advertisement may be unsuccessful: a single advertisement may not be able to break this association, because of the strong association of this particular attribute with the competitor's product. Of course, using a long-term advertising policy it is possible to do so, but it may be too expensive; strategies concerning the creation of the image of brands and products are planned a few years in advance.
For example, it may turn out that a particular advertisement activates an attribute that is seemingly advantageous but stays in contradiction with the policy of the firm. For example, if the advertisement activates the "traditional" attribute but the policy assumes a target group of young people, then this target group may be discouraged from buying the product. If this advertisement is emitted, older people are activated instead, but they may form a less profitable segment of the market. What is more, a positive association by itself may not be correlated with an increase in purchases.


So there is a need not only to determine which attributes the advertisements activate, but also to measure the strength of the association between these attributes and the increase in purchases. It is possible to conduct such research for almost all advertised products, but it is perhaps easiest for the FMCG sector. For other kinds of products, e.g. financial services or durable goods such as cars, the shelf test is too expensive and it is usually replaced with questions on purchase intention. However, as we stated in Section 1, this approach may introduce bias and does not fully reflect real-life purchase decisions.
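To make the data construction above concrete, the following sketch shows how the 222 raw Likert answers could be reduced to averaged attribute variables and paired with the shelf-test outcome. It is only an illustration under our own assumptions: the column naming scheme, the pandas-based layout and the spread-based consistency rule are hypothetical and do not describe the agency's actual procedure.

```python
import numpy as np
import pandas as pd

def aggregate_repetitions(raw: pd.DataFrame, attributes, brands=("M1", "M2"),
                          max_spread=1):
    """Average the three repetitions of each question; answers whose repetitions
    disagree by more than `max_spread` are treated as inconsistent (missing).
    Synonymous questions could be averaged afterwards in the same way."""
    out = pd.DataFrame(index=raw.index)
    for attr in attributes:
        for brand in brands:
            # hypothetical column names, e.g. "expert in cooking.M1_rep2"
            reps = raw[[f"{attr}.{brand}_rep{r}" for r in (1, 2, 3)]]
            spread = reps.max(axis=1) - reps.min(axis=1)
            out[f"{attr}.{brand}"] = reps.mean(axis=1).where(spread <= max_spread)
    return out

# X = aggregate_repetitions(raw, attributes).dropna()  # respondents with missing answers excluded
# y = raw.loc[X.index, "shelf_test_brand"]             # binary target: "M1" vs "M2"
```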

4. Statistical methodology

Since there are only tens of observations in each group and the number of variables is comparable to the number of observations, special techniques must be applied to determine the importance of each variable. First of all, univariate analysis, such as the coefficient of correlation between the explained variable and each explanatory variable, neglects potential interactions between the explanatory variables. Multivariate techniques such as Fisher discriminant analysis or logistic regression also cannot be used, because they require a relatively large number of observations in relation to the number of features in order to calculate the significance levels for each variable properly. One could think of the direct use of other techniques, such as decision trees, and then determine the importance of variables and construct the ranking according to whether a decision tree picked a particular variable in a split or not. However, if there are only tens of observations, it is likely that the tree uses only a few variables for splitting, so the vast majority of them remains unranked. In Dramiński et al. (2008), a methodology for measuring the importance of variables in the case of a large number of features compared to the number of observations is presented. The authors apply their technique to the problem of selecting genes that are responsible for certain types of tumor. In their scenario, there are only tens of observations and the number of genes, which form the explanatory variables, is measured in thousands. The goal is to select a small fraction of genes that are responsible for a particular type of tumor; the strict prediction task is not the key concern there. A similar task can be formulated for our questionnaire data. To the best of our knowledge, there is no previous research on the application of this technique to questionnaire data. We now present the basic idea of the algorithm (cf. Dramiński et al., 2008 for details). The algorithm is based on bootstrapping of variables and observations. In the simplest case, we do the following: having k variables and n observations, we may randomly draw m ≪ k variables and build a decision tree using these m variables. The decision tree has a variable selection method incorporated into its induction algorithm, so as a result we obtain the list of variables selected by the tree among these m variables. We may then repeat the random drawing of m variables out of the k potential variables (the bootstrap phase) and in each repetition save the list of variables picked by the tree. Variables that are more important in this supervised learning task should be chosen more often by the trees. If we repeat the drawing a sufficient number of times (tens of thousands of iterations), then each variable has a high probability of being drawn sufficiently often. Each time a variable is drawn, it is probably drawn with a different set of m − 1 companion variables; by building a tree, we test its performance in relation to this particular set. As a final result, we may sort the variables with respect to the number of times they were chosen by the trees. This forms our ranking. By doing so, we do not neglect the interactions between the variables, as in the case of univariate analysis.

We also solve the problem that a substantial number of variables might never be chosen by a single tree, in which case the ranking could not sort the variables properly. Of course, when constructing such a ranking, one should take into account at which level of the tree a particular variable is used in a split (splits closer to the root node are more important) and how many observations the particular split divides (the more observations are split, the more important the split is). The overall classification rate (accuracy) of the trees that used this particular variable should also be incorporated. In Dramiński et al. (2008), ideas on how to take advantage of this knowledge are presented. Also, in order to achieve an unbiased ranking, bootstrapping of observations is applied: the sample is split many times into a train and a test part, the trees are learned on the train part and their classification rate is evaluated on the test part (cf. Dramiński et al., 2008 for details).
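The core of the bootstrap-of-variables idea can be sketched in a few lines. This is a simplified illustration only, not the reference implementation of Dramiński et al. (2008): it merely counts how often each variable is used in a split, whereas the actual relative importance of Eq. (2) below also weights each occurrence by tree accuracy, information gain and node size. The use of scikit-learn and the parameter defaults are our assumptions.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

def mc_feature_counts(X, y, m=10, n_iter=3000, t=10, seed=0):
    """Raw selection counts: how often each of the k variables is used in a split."""
    rng = np.random.default_rng(seed)
    k = X.shape[1]
    counts = Counter()
    for _ in range(n_iter):                          # bootstrap phase over variables (s)
        cols = rng.choice(k, size=m, replace=False)  # draw m out of k variables
        for _ in range(t):                           # t random train/test partitions
            X_tr, X_te, y_tr, y_te = train_test_split(X[:, cols], y, test_size=0.34)
            tree = DecisionTreeClassifier().fit(X_tr, y_tr)
            used = tree.tree_.feature                # negative entries mark leaves
            for f in np.unique(used[used >= 0]):
                counts[int(cols[f])] += 1            # credit the original column index
    return counts
```

Sorting `counts` in decreasing order already gives a crude, unweighted ranking; the formulas below refine it.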

We will use the following formulas. The weighted accuracy is given by

$$\mathrm{wAcc} = \frac{1}{c}\sum_{i=1}^{c}\frac{n_{ii}}{n_{i1}+n_{i2}+\cdots+n_{ic}}, \qquad (1)$$

where $c$ is the number of classes (two in our example) and $n_{ij}$ is the number of observations from class $i$ classified into class $j$.
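As a small illustration (our helper, not part of the original software), Eq. (1) can be computed directly from a confusion matrix:

```python
import numpy as np

def weighted_accuracy(conf):
    """conf[i, j] = number of class-i observations classified as class j."""
    conf = np.asarray(conf, dtype=float)
    per_class = np.diag(conf) / conf.sum(axis=1)   # n_ii / (n_i1 + ... + n_ic)
    return per_class.mean()                        # average over the c classes

# weighted_accuracy([[9, 3], [2, 6]])  ->  0.5 * (9/12 + 6/8) = 0.75
```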

The relative importance (RI) of variable $g_k$ is defined as

$$\mathrm{RI}_{g_k} = \sum_{s=1}^{st} (\mathrm{wAcc})^{u} \sum_{n_{g_k}(s)} \mathrm{IG}\big(n_{g_k}(s)\big)\left(\frac{\mathrm{no.in}\big(n_{g_k}(s)\big)}{\mathrm{no.in}(s)}\right)^{v}, \qquad (2)$$

where $s$ is the number of iterations of the bootstrap phase and $t$ is the number of partitions into train and test subsets (so the outer sum runs over all $s\cdot t$ trees), $n_{g_k}(s)$ denotes a node of a given tree in which variable $g_k$ is used in the split (the inner sum runs over all such nodes), and $\mathrm{IG}(n_{g_k}(s))$ stands for the information gain (e.g. based on the Gini index or entropy) related to the node $n_{g_k}(s)$. The symbols $\mathrm{no.in}(n_{g_k}(s))$ and $\mathrm{no.in}(s)$ denote the number of observations in the node $n_{g_k}(s)$ and in the root node of the $s$th tree, respectively.
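For illustration, the contribution of a single fitted tree to Eq. (2) could be computed as below. This is our reading of the formula rather than the authors' code: we take IG as the impurity decrease of the split (Gini by default in scikit-learn trees), and the node and sample counts come from scikit-learn's tree internals.

```python
def ri_contribution(tree, cols, wacc, u=1.0, v=1.0):
    """Additive contribution of one fitted tree to RI for each original column in `cols`."""
    t = tree.tree_
    n_root = t.n_node_samples[0]
    contrib = {}
    for node in range(t.node_count):
        f = t.feature[node]
        if f < 0:                                   # leaf node, no split
            continue
        left, right = t.children_left[node], t.children_right[node]
        n, nl, nr = (t.n_node_samples[i] for i in (node, left, right))
        ig = t.impurity[node] - (nl / n) * t.impurity[left] - (nr / n) * t.impurity[right]
        contrib[int(cols[f])] = (contrib.get(int(cols[f]), 0.0)
                                 + (wacc ** u) * ig * (n / n_root) ** v)
    return contrib      # summing these dictionaries over all s*t trees yields RI
```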

We will also need a formula to calculate the distance between two rankings:

$$\mathrm{Dist}(s, s-10) = \frac{1}{d_p}\sum_{g_k}\big|\mathrm{rank}(g_k, s) - \mathrm{rank}(g_k, s-10)\big|, \qquad (3)$$

where the summation is over the $d_p$ top features obtained after $s-10$ iterations and $\mathrm{rank}(g_k, r)$ is the rank of variable $g_k$ after $r$ iterations. We set $d_p = 5$.
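A convergence check based on Eq. (3) could be monitored as follows (the helper names are ours; in the paper the distance is simply reported, as in Fig. 1):

```python
def ranking_from_ri(ri_scores):
    """Map each variable to its rank (1 = largest relative importance)."""
    order = sorted(ri_scores, key=ri_scores.get, reverse=True)
    return {var: r + 1 for r, var in enumerate(order)}

def ranking_distance(rank_now, rank_prev, dp=5):
    """Eq. (3): mean absolute rank change of the dp top variables of the earlier ranking."""
    top = sorted(rank_prev, key=rank_prev.get)[:dp]
    return sum(abs(rank_now[g] - rank_prev[g]) for g in top) / dp

# Comparing rankings every 10 bootstrap iterations and watching ranking_distance()
# settle near zero reproduces the convergence behaviour plotted in Fig. 1.
```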

5. Results

Since the algorithm requires all observations to have no missing values, the following numbers of observations are left: n(E1) = 31, n(E2) = 28 and n(K) = 28. In group E1, 14 respondents chose brand M1; in E2, 11; and in group K, 9. The missing values were the result of inconsistent answers. The algorithm has a few parameters to set: the number of iterations (s), the number of variables drawn in one iteration (m), the number of partitions into train and test subsets inside one iteration (t), the weight of the accuracy of the trees that used particular variables (u) and the weight of the splits (v). As noted in Dramiński et al. (2008), the algorithm is rather robust to the selection of these parameters and converges quite fast with respect to the number of iterations of the bootstrap procedure. In our experiments, we also observed this behavior: setting different values of the parameters resulted at most in a switch of the ranking of neighbouring variables. Fig. 1 illustrates the effect of the number of iterations (s) on the final ranking for the first target group E1 and selected values of the other parameters. The figures for the second target group E2 and the control group K are very similar, so we present results only for the first target group E1.


Fig. 1. Convergence of the algorithm for group E1 and different values of v and u (m = 5, t = 10). The horizontal axis represents the number of iterations (s) in tens and the vertical axis represents the distance between the current ranking of the top 5 variables and the ranking of the top 5 variables obtained 10 iterations earlier.

Fig. 2. Histograms of accuracy for the first target group E1 for the true class labels (upper graph) and for a random permutation of the class labels (lower graph).

We may conclude that the algorithm converges rather fast. We finally set the number of iterations to s = 3000. We also set u = 1, v = 1 and t = 10. The whole procedure of bootstrapping observations and variables allows us to answer many important business questions. First of all, we are able to verify whether the variables, i.e. the features of the product or brand (associations with the product or brand), really have an influence on the purchase, and thus whether it is reasonable to analyse the relations between them. In order to answer this question, we may do the following: run the whole algorithm on the original dataset and then on a dataset with a random permutation of the class labels. Each time, we save the accuracy of the tree obtained in each iteration for both datasets. Then we may compare the histograms of accuracy for the two datasets. If there is a relation between the features and the choice of the brand, then these histograms should differ substantially: the histogram of accuracy for random permutations of the class labels should be concentrated around 0.5, while the histogram for the true class labels should be shifted to the right. Figs. 2–4 illustrate this idea. We may conclude that there really is a relation between the features and the future purchase for all groups E1, E2 and K: the histograms for the true class labels are shifted to the right, so the trees generally achieved higher accuracy than chance. The next question that needs to be answered is whether the obtained ranking really performs its function well, i.e. whether the variables from the top of the ranking are more important, in the sense that they allow better prediction than the variables from the bottom of the ranking. In order to answer this question, we may repeat our bootstrap procedure for m random variables drawn from the top 2m variables of the ranking and for m variables drawn from the rest of the ranking. We set m = 10. We save the accuracies of the trees generated in these two situations; the variables from the top of the ranking should generate higher accuracies. The results of this step are presented in Figs. 5–7. In the case of the control group K, we noted that there exists an almost perfect predictor at the top of the ranking, which makes the distribution highly concentrated around 1. So we made an additional experiment: we removed this predictor and repeated the whole procedure without it. The results are presented in Fig. 8. We may observe that the top variables have a higher mean accuracy than the rest of the variables.
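The two validation checks just described can be scripted along the following lines. `mc_tree_accuracies` is a hypothetical helper standing for a rerun of the Section 4 bootstrap on a given set of columns that returns the per-tree weighted accuracies; everything else follows the text above.

```python
import numpy as np

def permutation_check(X, y, mc_tree_accuracies, seed=0):
    """Accuracies on the true labels vs. on randomly permuted labels (Figs. 2-4)."""
    rng = np.random.default_rng(seed)
    acc_true = mc_tree_accuracies(X, y)
    acc_perm = mc_tree_accuracies(X, rng.permutation(y))
    return acc_true, acc_perm            # acc_perm should concentrate around 0.5

def top_vs_rest_check(X, y, ranked_cols, mc_tree_accuracies, m=10, seed=0):
    """Accuracies for m variables drawn from the top 2m of the ranking
    vs. m variables drawn from the rest of the ranking (Figs. 5-8)."""
    rng = np.random.default_rng(seed)
    top = rng.choice(ranked_cols[:2 * m], size=m, replace=False)
    rest = rng.choice(ranked_cols[2 * m:], size=m, replace=False)
    return mc_tree_accuracies(X[:, top], y), mc_tree_accuracies(X[:, rest], y)
```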

Fig. 3. Histograms of accuracy for the second target group E2 for the true class labels (upper graph) and for a random permutation of the class labels (lower graph).

Now we show the graphs of relative importance (RI) of the top 20 variables for different values of m. To make the graphs more transparent, we used coded names of the variables; their full descriptions are shown later. The results are shown in Figs. 9–11. We may observe that different values of the m parameter give different values of the relative importance (RI), but the ordering of the variables remains practically unchanged. We finally chose m = 10, because it gives the greatest contrast of importance between the first variables and the remaining features. In Tables 1–3 we present the variables that formed the top of the ranking (top 10 attributes) and the bottom of the ranking (bottom 10 attributes) for each group. The symbol M1 refers to the brand and product of the company that ordered the research and M2 refers to the brand and product of the competitor. We may observe that the highest positions in all three rankings are occupied by attributes related to brand M2, that is, the brand of the competitor.


Fig. 6. Box plots of accuracies for the second target group E2. The left-hand graph shows accuracies for the top variables and the right-hand graph for the remaining variables.

Fig. 4. Histograms of accuracy for the control group K for the true class labels (upper graph) and for a random permutation of the class labels (lower graph).

Fig. 5. Box plots of accuracies for the first target group E1. The left-hand graph shows accuracies for the top variables and the right-hand graph for the remaining variables.

Fig. 7. Box plots of accuracies for the control group K. The left-hand graph shows accuracies for the top variables and the right-hand graph for the remaining variables.

Both target groups watched the same version of the advertisement of the competitor. This means that the perception of brand M2 is most important when deciding about the purchase. We may conclude that the customers who chose brand M1 did so not because they think that brand M1 is for them (attribute "for me2.M1"), but because they think that brand M2 is not for them (attribute "for me2.M2"). When analyzing the results for the control group (the group that did not watch the advertisements), we may concentrate on the long-term perception of the brands (not affected by a previously watched advertisement). In this group, the top 10 attributes are related to brand M2, that is, the brand of the competitor. This means that neither advertisement of brand M1 substantially changed the way the customers think about the two brands. The advertisements only changed the order of the associations, but the general tendency remained unchanged.

However, both advertisements of the M1 brand caused changes in the order of the attributes in comparison to the control group. The attribute "recommendation of products.M2" was second in the control group, while the same attribute for the M1 brand, "recommendation of products.M1", was in 57th position. In both target groups, the attribute "recommendation of products.M1" reached 6th place, while the same attribute for brand M2 moved to 59th and 58th place for the first and second target group, respectively. This is a very important attribute from the marketers' point of view. It means that respondents would rather recommend the M1 brand than the M2 brand to other potential consumers. Indirectly, this attribute reflects the willingness to buy: if someone thinks that a product is worth recommending, then, to some extent, it means that this product is worth buying. The first and the second version of the advertisement caused the attribute "for everyday usage.M2" to reach second place; the same feature was at a relatively low 22nd position in the control group. Consumers who chose brand M1 did so not because they think that this product is for everyday usage, but rather because they think that the product of the competitor is not suitable for everyday usage.


Fig. 8. Box plots of accuracies for the control group K after removing the perfect predictor. The left-hand graph shows accuracies for the top variables and the right-hand graph for the remaining variables.

Fig. 11. Relative importance of top 20 variables for the control group K.

The placement of this attribute at the top of the ranking is related to the marketing slogan which was used in the competitor's advertisement. This information may be used in the marketing strategy: we may create an image of the M1 brand as a product for everyday usage and increase its differential advantage over the competitor. When comparing the rankings for the two target groups, which saw two versions of the same advertisement, we may choose which advertisement is more appropriate for emission in the mass media, taking into account which attributes are activated by each advertisement. We may observe that the associations with brand M1 were weaker in group E2, which saw the second version of the advertisement. We know, from the results for the control group, that the associations with brand M2 are most important when deciding about the purchase. The second version of the advertisement changes these associations more strongly than the first version, so it may be reasonable to choose the second version of the advertisement as the final one. The two versions differ in the following way: in the first advertisement, a certain controversial celebrity appeared. We may conclude that employing this person in the commercial resulted in strengthening the differences in perception of the second brand, rather than building the perception of the brand which this person advertised.

Fig. 9. Relative importance of top 20 variables for the first target group E1.

Fig. 10. Relative importance of top 20 variables for the second target group E2.

Table 1. Ranking of variables for the first target group E1.

Rank  Variable
1     for me2.M2 (see note)
2     for everyday usage.M2
3     highest quality of ingredients.M1
4     cares about the family.M2
5     makes everyday cooking pleasant.M2
6     recommendation of products.M1
7     just like me.M2
8     wide range of products.M2
9     likes cooking.M2
10    demanding.M2
...   ...
53    helps to prepare tasty meals.M1
54    modern.M2
55    highest quality of ingredients.M2
56    willing to try new dishes.M1
57    draws from the tradition of cooking.M2
58    having the product.M2
59    recommendation of products.M2
60    cares about the family.M1
61    moderate price.M2
62    makes cooking easy.M1

Note: the attribute "for me2" refers to the product and "for me" refers to the brand.


Table 2. Ranking of variables for the second target group E2.

Rank  Variable
1     makes cooking easy.M2
2     for everyday usage.M2
3     inspires to cook unusual meals.M2
4     moderate price.M2
5     surprises with easy and inspiring ideas.M2
6     recommendation of products.M1
7     made of natural ingredients.M1
8     encourages to family meals.M2
9     to try a product.M1
10    made of natural ingredients.M2
...   ...
53    for me.M2
54    appreciates good taste.M2
55    for everyday usage.M1
56    recommended by experts.M2
57    makes cooking easy.M1
58    recommendation of products.M2
59    wide range of products.M2
60    modern.M1
61    appreciates good taste.M1
62    just like me.M1

Table 3. Ranking of variables for the control group K.

Rank  Variable
1     draws from the tradition of cooking.M2
2     recommendation of products.M2
3     for me2.M2
4     having the product.M2
5     reliable.M2
6     likes cooking.M2
7     for me.M2
8     willing to try new dishes.M2
9     expert in cooking.M2
10    recommended by experts.M2
...   ...
53    just like me.M1
54    appreciates good taste.M1
55    ingredient of balanced meals.M1
56    demanding.M1
57    made of natural ingredients.M1
58    cares about the family.M1
59    likes cooking.M1
60    willing to try new dishes.M1
61    traditional.M1
62    makes cooking easy.M2

6. Conclusions

In this article, we presented a methodology for obtaining a hierarchy of the attributes that determine the purchase when the purchase is influenced by an advertisement. The resulting ranking has a clear interpretation and can be easily understood by marketers without a machine learning background. We used questionnaire data gathered using the low-involvement principle. We also analyzed the problem of how to choose the best advertisement for the same product when a few competing versions are available. In this marketing study we employed the Monte Carlo feature selection algorithm, which was earlier used in genetics. We analyzed FMCG products, but our approach can be applied to any advertised product, so there is a wide range of potential applications of this idea. The algorithm presented in this paper can easily be applied to other types of marketing research. Because it uses decision trees as the built-in classifiers, it can also handle the various types of data that are typical in marketing research. The application of the method presented here can also contribute to cutting research costs, because it is designed to find relationships in small samples.

References

Chen, C.-H., & Yan, W. (2008). An in-process customer utility prediction system for product conceptualisation. Expert Systems with Applications, 34, 2555–2567.
Chen, Y.-L., & Weng, C.-H. (2009). Mining fuzzy association rules from questionnaire data. Knowledge-Based Systems, 22, 46–56.
Dramiński, M., Rada-Iglesias, A., Enroth, S., Wadelius, C., Koronacki, J., & Komorowski, J. (2008). Monte Carlo feature selection for supervised classification. Bioinformatics, 24(1), 110–117.
Ramalingam, V., Palaniappan, B., Panchanatham, N., & Palanivel, S. (2006). Measuring advertisement effectiveness – A neural network approach. Expert Systems with Applications, 31, 159–163.
Terano, T., & Ishino, Y. (1996). Knowledge acquisition from questionnaire data using simulated breeding and inductive learning methods. Expert Systems with Applications, 11(4), 507–518.