Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 122 (2017) 1031–1038
www.elsevier.com/locate/procedia
Information Technology and Quantitative Management (ITQM 2017)
A Multi Attribute Value Theory approach to rank association rules for leveraging better business decision making

Shekhar Shukla *, B.K. Mohanty, Ashwani Kumar

Indian Institute of Management Lucknow, Lucknow-226013, India

Abstract

Market Basket Analysis or Association Rule Mining (ARM) is an approach to discover the purchase patterns of the customers by extracting and analyzing the basket of items which sell together. Businesses are also keen to discover those rules which can generate more profits. These profitable purchase patterns of customers, once identified, can lead to better product assortment decision making for businesses. Better Product Assortment decisions can surely be a competitive advantage for businesses in terms of customer satisfaction and profit generation. The concepts of support and confidence in the association rules help to extract the rules with frequent and reliable co-occurrences of the items in customers’ purchases. The profitable rules can be assessed using the domain-related measures such as item set value and the cross-selling profit associated with the association rules. We propose a ranking mechanism to combine the different criteria of Confidence, Support, Item Set Value and Cross-Selling Profit to get an overall interestingness measure, using a Multi-Attribute Value Theory (MAVT) approach, which in turn uses DIVIZ as the implementation tool. These association rules can be used as leverage for marketing activities like cross-selling promotions, shelf placement etc. and other crucial decisions like product assortment selection.

© 2017 The Authors. Published by Elsevier B.V.
Selection and/or peer-review under responsibility of the organizers of ITQM 2017

Keywords: Market Basket Analysis; Association Rules; Interestingness measure; Multi-Attribute Value Theory; Preferences; Business Decision Making

1. Introduction

“Data Mining is a step of knowledge discovery in databases consisting of applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns over the data.” [1] This paper uses concepts of Association Rule Mining (ARM) proposed by Agrawal et al. [2], which is one of the main Data Mining techniques. Association Rule Mining or Market Basket Analysis is an important Data Mining technique used to find interesting business rules or patterns hidden in the data. Market Basket Analysis uses a certain set of criteria or constraints like Support and Confidence to generate rules. But rules so discovered always contain an element of suspicion that they may not be statistically valid. These rules may satisfy the criteria or constraints like support and confidence in sample data but not in the whole data distribution [3]. Thus, it becomes important to check the statistical validity of the rules before they are used for practical purposes, because the very nature of Data Mining is data driven [4]. Agrawal et al. [2] introduced Association Rules, also known as Market Basket Analysis, which is helpful in decisions about marketing activities, e.g., promotional pricing or product placements. A rule is defined as:

$X \rightarrow Y$, where $X, Y \subseteq I$ and $X \cap Y = \emptyset$

Here $I$ is the set of all items, and $X$ and $Y$ are known as item sets. $X$ is called the Antecedent and $Y$ is called the Consequent.

For instance, from Table 1 given below, if we have a rule {hard disk, laptop} → {mouse}, then Support = 1/5 = 0.2 and Confidence = 1/1 = 1.

* Corresponding author. Tel.: +91-9839693548
E-mail address:
[email protected]
1877-0509 © 2017 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the scientific committee of the 5th International Conference on Information Technology and Quantitative Management, ITQM 2017.
10.1016/j.procs.2017.11.470
Table 1. A transaction level database

T.ID                  1  2  3  4  5
EXTERNAL HARD DISKS   1  0  0  1  0
WIRELESS MOUSE        1  0  0  1  1
LAPTOP                0  1  0  1  0
WIRELESS KEYBOARD     0  0  1  0  0
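As a minimal sketch, the support and confidence quoted for the rule {hard disk, laptop} → {mouse} can be recomputed from Table 1. The shortened item labels and the helper names `support` and `confidence` are choices made for this illustration, not names used by the authors.

```python
# Table 1 as a mapping from transaction ID to the set of items bought.
# Item names are shortened labels chosen for this sketch.
transactions = {
    1: {"hard_disk", "mouse"},
    2: {"laptop"},
    3: {"keyboard"},
    4: {"hard_disk", "mouse", "laptop"},
    5: {"mouse"},
}


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent ∪ consequent) divided by support(antecedent)."""
    joint = support(antecedent | consequent, transactions)
    return joint / support(antecedent, transactions)


rule_support = support({"hard_disk", "laptop", "mouse"}, transactions)
rule_confidence = confidence({"hard_disk", "laptop"}, {"mouse"}, transactions)
print(rule_support, rule_confidence)
```

Only transaction 4 contains both the hard disk and the laptop, and it also contains the mouse, which is why the rule is infrequent yet fully reliable on this small sample.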
Classical ARM handles Boolean attributes only, since the association concerned is whether an item is present in a transaction or not, a binary value of 0 or 1. But a real-world database may contain attributes in different formats like integer, categorical (logical), real etc. Since classical ARM cannot be applied directly to such attributes, we handle them by partitioning the attribute domains and then transforming the problem into a binary one. Suppose we have product catalogue data with attribute values of price and service cost (expected over 3 years) associated with each laptop product, as shown in figure 1:
Fig. 1. Numerical attributes translated into Boolean attributes using partition
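The partitioning idea can be sketched as follows. The catalogue entries, the interval boundaries and the names `price_bins`, `service_bins` and `to_boolean` are all hypothetical, since the values shown in the original figure are not available here.

```python
# Hypothetical laptop catalogue: (price, expected 3-year service cost).
# All numbers and interval boundaries are illustrative, not from the paper.
laptops = {
    "laptop_A": (450.0, 60.0),
    "laptop_B": (900.0, 120.0),
    "laptop_C": (1500.0, 200.0),
}

# Half-open intervals [low, high) that partition each attribute's domain.
price_bins = {
    "price_low": (0, 600),
    "price_mid": (600, 1200),
    "price_high": (1200, float("inf")),
}
service_bins = {
    "service_low": (0, 100),
    "service_high": (100, float("inf")),
}


def to_boolean(value, bins):
    """Return one 0/1 flag per interval; exactly one flag is 1."""
    return {name: int(low <= value < high) for name, (low, high) in bins.items()}


# Each laptop becomes a row of Boolean attributes usable by classical ARM.
boolean_rows = {
    name: {**to_boolean(price, price_bins), **to_boolean(service, service_bins)}
    for name, (price, service) in laptops.items()
}
print(boolean_rows["laptop_B"])
```

A value just below a boundary and one just above it land in different Boolean columns, which is the sharp-boundary effect that fuzzy sets are meant to soften.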
This partitioning approach induces a sharp-boundary problem, which can be dealt with by smoothing the boundary transitions with concepts like Fuzzy Sets. The Association Rules are generated based on Support and Confidence using the Apriori Algorithm. In the work presented in this paper, frequent rules are mined which are subjected to constraints or interestingness measures of confidence and support. Once the frequent and reliable rules are in place, measures like cross-selling profit and item set value, which are useful in terms of business profits, are calculated for each rule. These rules are ranked using a Multi-Attribute Value Theory approach. The rules thus achieved are ready to be used in business intelligence. The rest of the paper is organized as follows. A brief literature review is presented in Section 2, and the proposed methodology is covered in Section 3. An illustrative example is presented in Section 4. Finally, the conclusions and future work are covered in Section 5.

2. Literature Review

The literature review is organized in three sections. We firstly take a brief look at the development of Association Rules. Then we discuss the past research works on the statistical validity of these rules and why research emphasizes this aspect. Finally, we discuss the research works done to find interesting rules based on criteria like support and confidence, and DEA as an MCDM method in developing an aggregate score for rating the interesting rules.

2.1 Association Rule Mining

Association Rules, also known as Market Basket Analysis, are helpful in decisions about marketing activities, e.g., promotional pricing or product placements, by identifying the hidden purchase patterns in data [2]. Association rule mining gives decision makers the flexibility to define the MBA model and its parameters like support and confidence in a way that suits the problem context and data [5]. It is important in terms of actual implementation of Association rules that only interesting rules should be left after final filtration; thus it is necessary to rank rules from data mining due to the number of quality rules [6] and business resource constraints [7]. A variety of domains like marketing, healthcare, logistics etc. supply their domain databases to use Association Rule mining for Business Intelligence [8,9]. Support and confidence are taken as two measures to evaluate the interestingness of association rules [2,10]. Association rules are regarded as interesting if their
support and confidence are greater than the user-specified minimum support and minimum confidence, respectively. Data miners usually specify these thresholds in an arbitrary manner. Numerous algorithms for finding association rules have been developed in previous studies [11]. However, relatively little literature has attempted to employ application-specific criteria for setting the thresholds of association rules.

2.2 Ranking methods for aggregating interestingness measures of discovered rules

Multi-Criteria Decision Making methods have been used to rank the discovered association rules. Choi et al. [7] used the ELECTRE II method to rank the discovered association rules. DEA has been extensively used as a multi-criteria tool for evaluating the discovered association rules based on their interestingness measures. DEA was defined as a mathematical programming model applied to observational data, which provides a new method of obtaining empirical estimates of extreme relations, such as the production functions and/or efficient production possibility surfaces that are fundamental to modern economics [12]. DEA was originally designed to mathematically measure decision making units (DMUs) with multiple inputs and outputs in terms of relative efficiency (i.e., the ratio of total weighted output to total weighted input). Cook and Kress [13] introduced a theoretical extension of DEA to analyze ranked voting data. The ranked voting system is a pure output DEA model where each candidate (DMU) has multiple outputs (ranked votes) and only one input with the amount unity [14]. In the approach developed by Cook and Kress [13], the preference scores are estimated without initially imposing any fixed weights. The score of each candidate (DMU score) is calculated based on its most favorable weights [15]. The preference score, $Z_i$, of candidate $i$ (DMU) is the weighted sum of votes with certain weights. Chen et al. [16] used DEA to rank the association rules based on the multiple interestingness factors of confidence, support, cross-selling profit and item set value. Toloo et al. [17] modified the method of Chen et al. [16] by introducing an iterative approach and considering only output data for ranking the association rules. Foroughi [18] proposed a full ranking approach, a modification of the method proposed by Toloo et al. [17] that removes the constraint of non-ranking of inefficient DMUs. In the next section, we propose a framework based on the simpler approach of Multi-Attribute Value Theory to rank the discovered association rules, where Fuzzy membership functions are used to elicit the preference of Decision Makers over the different criteria values. The Decision Maker also has a say in deciding the preference weight of each criterion for the different alternatives, which are nothing but the association rules.

2.3 Multi-Attribute Value Theory

Multi-Attribute Value Theory was described in theoretical concepts by Fishburn [19] and Keeney and Raiffa [20]. MAVT is a deterministic additive model in which a value function $v_j(a)$ is assigned to each criterion $j$, and these are aggregated to obtain the global value of an alternative $a$:

$V(a) = \sum_{j=1}^{n} w_j \, v_j(a)$    (1)

where $w_j$ is the preference weight for the $j$-th criterion.
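The additive aggregation in Eq. (1) can be illustrated with a short sketch. The function name `global_value`, the criterion labels, and all weights and partial values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def global_value(partial_values, weights):
    """Additive MAVT score: sum over criteria j of w_j * v_j(a)."""
    if partial_values.keys() != weights.keys():
        raise ValueError("criteria of values and weights must match")
    return sum(weights[j] * partial_values[j] for j in weights)


# Hypothetical weights summing to 1 and partial values v_j(a) in [0, 1].
weights = {
    "support": 0.1,
    "confidence": 0.2,
    "itemset_value": 0.3,
    "cross_selling_profit": 0.4,
}
rule_a = {
    "support": 0.5,
    "confidence": 1.0,
    "itemset_value": 0.0,
    "cross_selling_profit": 0.5,
}

score = global_value(rule_a, weights)
print(score)
```

Because the model is additive, a low value on one criterion can be compensated by a high value on another, in proportion to the weights.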
3. Methodology
Taking reference from the brief discussion made in previous sections, we define the flow of our methodology through the following chart as shown in figure 2:
Fig. 2. Flow Diagram of Methodology
The details of each step of this methodological flow are mentioned below:

3.1 Input Data

Data preprocessing is applied in order to obtain clean and consistent data. These preprocessing techniques include the removal of missing values and discretization of attributes with continuous values. We also identify the attributes in their roles of antecedent and consequent in the rules.

3.2 Mine Association Rules by defining Support and Confidence

Mine association rules by using the Association Rule Mining Algorithm with minimum support and minimum confidence. For association rules like $X \rightarrow Y$:

Support: The support, $s$, is the percentage of transactions that contain $X \cup Y$ (Agrawal et al.) [2]. It takes the form

$s = P(X \cup Y)$    (2)

Confidence: The confidence, $c$, is the ratio of the percentage of transactions that contain $X \cup Y$ to the percentage of transactions that contain $X$ (Agrawal et al.) [2]. It takes the form

$c = P(X \cup Y) / P(X)$    (3)
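To make the thresholding step concrete, here is a brute-force sketch that enumerates every candidate rule over the five transactions of Table 1 and keeps those meeting both thresholds. It is deliberately not the Apriori algorithm (it does no candidate pruning), and the function name `mine_rules` and the threshold values are assumptions made for illustration.

```python
from itertools import combinations

# The five transactions of Table 1, with shortened item labels.
transactions = [
    {"hard_disk", "mouse"},
    {"laptop"},
    {"keyboard"},
    {"hard_disk", "mouse", "laptop"},
    {"mouse"},
]


def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def mine_rules(min_support, min_confidence):
    """Enumerate every rule with disjoint, non-empty antecedent and consequent."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset)
            if s == 0 or s < min_support:
                continue  # infrequent item set: no rule can be built from it
            for k in range(1, size):
                for x in combinations(combo, k):
                    antecedent = frozenset(x)
                    c = s / support(antecedent)
                    if c >= min_confidence:
                        rules.append((antecedent, itemset - antecedent, s, c))
    return rules


rules = mine_rules(min_support=0.4, min_confidence=0.6)
for antecedent, consequent, s, c in rules:
    print(sorted(antecedent), "->", sorted(consequent), s, round(c, 3))
```

Exhaustive enumeration is only workable for a handful of items; on real data an Apriori-style search would prune infrequent item sets before generating rules.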
3.3 Define other Subjective measures based on Domain Knowledge

Some other subjective measures can be defined based on the domain knowledge for assessment of these multiple criteria apart from support and confidence. Two such subjective measures are defined below:

Item set value: The item set value, $v$, is the sum of values of all items ($v_i$'s) in the item set $X \cup Y$, and can be calculated by

$v = \sum_{i \in X \cup Y} v_i$    (4)

Cross-selling profit: For the rule $X \rightarrow Y$, cross-selling is described as recommending that customers purchase $Y$ if they have bought $X$. Therefore, the cross-selling profit, $p$, is the sum of the profits of all items ($p_i$'s) in $Y$, and can be calculated by

$p = \sum_{i \in Y} p_i$    (5)

Thus, the measures of support, confidence, Item set value and Cross-Selling Profit can be used as the criteria for the MAVT model.

3.4 Find aggregate score for interestingness using MAVT

These four measures of Support, Confidence, Item set value and Cross-Selling Profit are used as the criteria in the MAVT model to calculate the preference scores for all the generated association rules. A preference score ($Z_i$) is calculated for each rule. The interestingness of a rule is measured by its preference score; these scores are essential for filtering a large number of rules and reporting only those which are most interesting to the decision makers. The criteria of support and confidence only consider the database perspective. Marketing Analysts are more interested in the criteria of Item Set Value and Cross-Selling Profit. Association rule mining is generally application-dependent, and the domain information in application areas can potentially provide useful criteria for picking important rules.

3.5 Discriminate and Select Rules for final implementation

Discriminate the efficient rules which are most interesting to the decision maker based on the business logic and domain knowledge. This discrimination is necessary in order to have a set of rules which are best suited for implementation. Select rules for implementation by considering the preference scores generated and domain-related knowledge on the discriminated rules. The rankings of all rules can be obtained, and analysts accordingly select useful rules for implementation.
4. Numerical Illustration

To show the applicability of the proposed method, we use the Market Basket Analysis data originally used by Chen [16]. The association rules are discovered using a minimum confidence and support of 1% and 10% respectively. Forty-six association rules are identified, for which domain-related parameters such as cross-selling profit and item set value are also calculated. We used Diviz (Decision Deck) to evaluate the rules using the MAVT approach; the workflow is shown in Fig. 3.
Fig. 3. DIVIZ Decision Deck Workflow for the MAVT Approach
We used a bisection method between the maximum and minimum values of each criterion to define its value function:
Fig. 4. The Criteria Value Functions and Weights
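As an illustration of how a bisection-based value function could be built, the sketch below assumes the common mid-value splitting variant: the decision maker names the criterion level that is worth exactly half of the range between the minimum and the maximum, and the value function is interpolated linearly through the three resulting points. The paper does not give these details, so the function names, the assumption that higher levels are preferred, and the example numbers are all hypothetical.

```python
from bisect import bisect_right


def make_value_function(points):
    """Build a piecewise-linear value function from (level, value) breakpoints."""
    pts = sorted(points)
    xs = [x for x, _ in pts]
    vs = [v for _, v in pts]

    def value(x):
        # Clamp levels outside the observed range.
        if x <= xs[0]:
            return vs[0]
        if x >= xs[-1]:
            return vs[-1]
        k = bisect_right(xs, x)  # xs[k - 1] <= x < xs[k]
        x0, x1 = xs[k - 1], xs[k]
        v0, v1 = vs[k - 1], vs[k]
        return v0 + (v1 - v0) * (x - x0) / (x1 - x0)

    return value


def bisection_value_function(lo, hi, midvalue_point):
    """Value function with v(lo) = 0, v(midvalue_point) = 0.5, v(hi) = 1.

    midvalue_point is the level the decision maker judges to lie halfway,
    in preference terms, between lo and hi (an elicited input, assumed here).
    """
    if not lo < midvalue_point < hi:
        raise ValueError("midvalue_point must lie strictly between lo and hi")
    return make_value_function([(lo, 0.0), (midvalue_point, 0.5), (hi, 1.0)])


# Hypothetical example: profit ranges from 0 to 100, and the decision maker
# judges a profit of 30 to be worth half of the full range.
profit_value = bisection_value_function(0.0, 100.0, 30.0)
print(profit_value(15.0), profit_value(30.0), profit_value(65.0))
```

Repeating the bisection on each half (for example, eliciting the levels worth 0.25 and 0.75) would add further breakpoints, which `make_value_function` accepts without change.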
Fig. 5. Association Rules database with the MAVT Approach
The ranking reflects the decision maker's preferences in two ways: through the value function defined for each criterion, and through the weight elicited for each criterion according to the decision maker's preference or the problem context. Here, we have given the highest preference to the cross-selling profit criterion, considering a decision maker who is interested in the profitability of cross-selling products.

5. Conclusions and Future Research

This paper presents a framework that integrates several ways of assessing the interestingness of rules extracted by data mining. The aim was to assess the quality of the rules in terms of their interestingness, the preferences of the decision makers and their usefulness for business. The rules are discovered by association rule mining, which produces a huge number of rules with complex attribute measurement levels. This complexity makes rule evaluation and selection difficult for analysts. Traditional approaches usually ignore subjective domain knowledge when selecting useful rules. To meet the requirements of marketing analysts, Multi-Attribute Value Theory (MAVT) is used in this paper to evaluate the efficiency (interestingness or usefulness) of association rules with multiple criteria, including subjective domain-related measures. The framework provides significant and interesting rules when applied to structured or relational data. It can also be applied to assess more complex rules discovered from semi-structured data. In addition, more rigorous preference measurement schemes can be incorporated to bring out the actual priorities of a decision maker. The framework can also be used over complex datasets as a mechanism for rule evaluation and selection.
Most importantly, this approach empowers decision makers to include their expertise, expressed as the weights of the MAVT approach, to discover profitable and context-specific association rules. This helps businesses gain competitive advantage and strive for higher profits.

Acknowledgements

We thank the team of http://www.decison-deck.org/diviz/ for the freely available software DIVIZ, which was used in this paper to execute the MAVT approach.

References

[1] U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, others, Knowledge Discovery and Data Mining: Towards a Unifying Framework, in: KDD, 1996: pp. 82–88.
[2] R. Agrawal, T. Imieliński, A. Swami, Mining association rules between sets of items in large databases, in: ACM SIGMOD Record, ACM, 1993: pp. 207–216.
[3] G.I. Webb, Discovering significant patterns, Machine Learning. 71 (2008) 131–131.
[4] A. Goodman, C. Kamath, V. Kumar, Data analysis in the 21st century, Statistical Analysis and Data Mining: The ASA Data Science Journal. 1 (2008) 1–3.
[5] G.I. Webb, S. Zhang, K-optimal rule discovery, Data Mining and Knowledge Discovery. 10 (2005) 39–79.
[6] P.-N. Tan, V. Kumar, Interestingness measures for association patterns: A perspective, in: Proc. of Workshop on Postprocessing in Machine Learning and Data Mining, 2000.
[7] D.H. Choi, B.S. Ahn, S.H. Kim, Prioritization of association rules in data mining: Multiple criteria decision approach, Expert Systems with Applications. 29 (2005) 867–878.
[8] M.-C. Chen, Configuration of cellular manufacturing systems using association rule induction, International Journal of Production Research. 41 (2003) 381–395.
[9] M.-C. Chen, A.-L. Chiu, H.-H. Chang, Mining changes in customer behavior in retail marketing, Expert Systems with Applications. 28 (2005) 773–781.
[10] R. Srikant, Q. Vu, R. Agrawal, Mining association rules with item constraints, in: KDD, 1997: pp. 67–73.
[11] J. Hipp, U. Güntzer, G. Nakhaeizadeh, Algorithms for association rule mining—a general survey and comparison, ACM SIGKDD Explorations Newsletter. 2 (2000) 58–64.
[12] A. Charnes, W.W. Cooper, E. Rhodes, Measuring the efficiency of decision making units, European Journal of Operational Research. 2 (1978) 429–444.
[13] W.D. Cook, M. Kress, A data envelopment model for aggregating preference rankings, Management Science. 36 (1990) 1302–1310.
[14] A. Hashimoto, A ranked voting system using a DEA/AR exclusion model: A note, European Journal of Operational Research. 97 (1997) 600–604.
[15] T. Obata, H. Ishii, A method for discriminating efficient candidates with ranked voting data, European Journal of Operational Research. 151 (2003) 233–237.
[16] M.-C. Chen, Ranking discovered rules from data mining with multiple criteria by data envelopment analysis, Expert Systems with Applications. 33 (2007) 1110–1116.
[17] M. Toloo, B. Sohrabi, S. Nalchigar, A new method for ranking discovered rules from data mining by DEA, Expert Systems with Applications. 36 (2009) 8503–8508.
[18] A.A. Foroughi, A note on “A new method for ranking discovered rules from data mining by DEA”, and a full ranking approach, Expert Systems with Applications. 38 (2011) 12913–12916.
[19] P.C. Fishburn, Methods of estimating additive utilities, Management Science. 13 (1967) 435–453.
[20] R.L. Keeney, H. Raiffa, Decisions with multiple objectives, Wiley, New York, 1976.