Expert Systems with Applications 36 (2009) 6019–6024
Finding "persistent rules": Combining association and classification results

Karthik Rajasethupathy a, Anthony Scime b,*, Kulathur S. Rajasethupathy b, Gregg R. Murray c
a Department of Mathematics, 310 Malott Hall, Cornell University, Ithaca, NY 14853-4201, United States
b Department of Computer Science, The College at Brockport, State University of New York, 350 New Campus Dr., Brockport, NY 14420-2933, United States
c Department of Political Science, Texas Tech University, Box 41015, Lubbock, TX 79409, United States
Keywords: Association mining; Classification; Persistent rules; Strong rules
Abstract

Different data mining algorithms applied to the same data can result in similar findings, typically in the form of rules. These similarities can be exploited to identify especially powerful rules, in particular those that are common to the different algorithms. This research focuses on the independent application of association and classification mining algorithms to the same data to discover common or similar rules, which are deemed "persistent-rules". The persistent-rule discovery process is demonstrated and tested against two data sets drawn from the American National Election Studies: one data set used to predict voter turnout and the second used to predict vote choice.

© 2008 Elsevier Ltd. All rights reserved.
1. Introduction

Data mining is a process of inductively analyzing data to find interesting patterns and previously unknown relationships in the data. Typically, these relationships can be translated into rules that are used to predict future events or to provide knowledge about interrelationships among data. Data mining methodologies often lead to a large number of rules that need to be evaluated to find the most interesting and useful. These methodologies reduce the number of rules by pruning or by imposing thresholds on support and confidence. A domain expert further reduces the rules by identifying those that are physically impossible, redundant, or not meaningful to the issue under consideration. The rules that remain are the interesting and useful rules.

However, different data mining methodologies process a data set differently, which yields results in different forms. For example, the a priori association mining algorithm presents results as strong rules, while the C4.5 classification algorithm creates a decision tree that can be converted into rules. Although these algorithms create different sets of rules, some of the individual rules may be similar. When this is the case, these independently identified yet common rules may be considered "persistent-rules". Persistent-rules improve decision making by narrowing the focus to rules that are the most robust, consistent, and noteworthy.

In this research, the concept of persistent-rules is introduced. Further, the persistent-rule discovery process is demonstrated in the area of voting behavior, which is a complex process subject to a wide variety of factors. Given the high stakes often involved in elections, researchers, campaigns, and political parties devote considerable effort and resources to trying to understand the dynamics of voting and vote choice (Edsall, 2006). This research concludes by showing how persistent-rules, found using both association and classification data mining algorithms, can be used to identify likely voters and for whom they will vote.
2. Related work

Researchers have attempted to identify the most useful and interesting rules by applying other methodologies to data prior to the application of a data mining algorithm. For example, Deshpande and Karypis (2002) and Padmanabhan and Tuzhilin (2000) improved classification rules by first using association mining techniques. In this approach, the association mining creates itemsets that are selected based on achieving a given support threshold. The original data set then has an attribute added to it for each selected itemset, where the attribute values are true or false: true if the instance contains the itemset and false otherwise. The classification algorithm is then executed on this modified data set to find the interesting rules (a sketch of this augmentation step appears at the end of this section).

Jaroszewicz and Simovici (2004) employed user background knowledge to determine the interestingness of sets of attributes using a Bayesian network prior to association mining. In their research, the interestingness of a set of attributes is indicated by the absolute difference between the attributes' support as calculated in a Bayesian network and as an association itemset.

Data dimensionality reduction (DDR) can be used to reduce the number of rules by simplifying the data. Fu and Wang (2005) employed DDR to improve classification using neural networks and to produce concise, accurate, and interesting rules. Murray, Riley, and Scime (2007) and Scime and Murray (2007) used expert knowledge to reduce data dimensionality while iteratively creating classification models.
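The itemset-augmentation step described for Deshpande and Karypis (2002) can be pictured with a short sketch. The Python fragment below is illustrative only; the toy data set, the itemset, and the attribute names are assumptions and are not taken from the cited papers.

```python
# A minimal sketch of itemset augmentation: for each frequent itemset, add a
# Boolean attribute that is True when an instance contains every item in it.
instances = [
    {"C1": "x", "C2": "g", "C3": "a"},
    {"C1": "y", "C2": "h", "C3": "a"},
]
frequent_itemsets = [{("C2", "g"), ("C3", "a")}]  # selected by a support threshold

def augment(instances, frequent_itemsets):
    """Return copies of the instances with one Boolean attribute per itemset."""
    augmented = []
    for row in instances:
        new_row = dict(row)
        for i, itemset in enumerate(frequent_itemsets):
            new_row[f"itemset_{i}"] = all(
                row.get(attr) == val for attr, val in itemset
            )
        augmented.append(new_row)
    return augmented

print(augment(instances, frequent_itemsets))
# The first instance contains (C2 = g, C3 = a), so itemset_0 is True;
# the second instance does not, so itemset_0 is False.
```

Classification would then be run on the augmented data set rather than on the original one.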
3. Data mining: association and classification methodologies

In data mining, there are a number of methodologies used to analyze data. Association mining is used to find patterns of data that show conditions where sets of attribute-value pairs occur frequently in the data set. It is often used to determine relationships in transaction data. Classification mining, on the other hand, is used to find models of data for categorizing instances of, for example, objects, events, or persons. It is often used for predicting future events from historical data (Han & Kamber, 2001). Typically, the choice of methodology is determined by both the goal of the data mining and the data. However, similar results have been obtained by mining the same data set using both methodologies. For example, Bagui (2006) mined crime data using association and classification techniques that yielded the same conclusions about criminal activity and enforcement.

More specifically, association mining evaluates data for relationships among attributes in the data set (Witten & Frank, 2005). The a priori association rule mining algorithm finds itemsets within the data set at user-specified minimum support and confidence levels. The size of the itemsets is increased as the algorithm proceeds until no larger itemsets satisfy the minimum support level. The support of an itemset is the number of instances that contain all the attributes in the itemset. The largest supported itemsets are converted into rules in which each item implies and is implied by every other item in the itemset. For example, given an itemset of three items (C1 = x, C2 = g, C3 = a), six rules are generated:
IF (C1 = x AND C2 = g) THEN C3 = a    (1)
IF (C1 = x AND C3 = a) THEN C2 = g    (2)
IF (C2 = g AND C3 = a) THEN C1 = x    (3)
IF C1 = x THEN (C2 = g AND C3 = a)    (4)
IF C2 = g THEN (C1 = x AND C3 = a)    (5)
IF C3 = a THEN (C1 = x AND C2 = g)    (6)
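As a concrete illustration of this itemset-to-rules step, the short Python sketch below enumerates every rule implied by the three-item itemset; the representation and function name are illustrative assumptions, not the output of any particular a priori implementation.

```python
from itertools import combinations

def generate_rules(itemset):
    """Every non-empty proper subset of the itemset becomes a premise;
    the remaining items become the consequent."""
    items = list(itemset)
    rules = []
    for size in range(1, len(items)):            # premise sizes 1 .. n-1
        for premise in combinations(items, size):
            consequent = tuple(p for p in items if p not in premise)
            rules.append((premise, consequent))
    return rules

# The three-item example from the text: (C1 = x, C2 = g, C3 = a).
itemset = [("C1", "x"), ("C2", "g"), ("C3", "a")]
for premise, consequent in generate_rules(itemset):
    lhs = " AND ".join(f"{a} = {v}" for a, v in premise)
    rhs = " AND ".join(f"{a} = {v}" for a, v in consequent)
    print(f"IF {lhs} THEN {rhs}")                # the six rules (1)-(6)
```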
Classification mining, on the other hand, uses an algorithm such as C4.5 to generate a decision tree. The goal of classification is to determine the likely value of a class variable (the outcome or dependent variable) given values for the other attributes of the data. This is accomplished by constructing a decision tree from data containing the dependent variable and its values. The decision tree consists of decision nodes and leaf nodes, beginning with a root decision node, connected by edges. Each decision node is an attribute of the data, and the edges represent the attribute values. The leaf nodes represent the dependent variable, that is, the expected classification result for each data instance. Using the three items from above with C1 as the dependent variable, Fig. 1 represents a possible tree.

Fig. 1. Classification decision tree.

The branches of the decision tree can be converted into rules, all of which have as the consequent the dependent variable with one of its legal values. The rules for the tree in Fig. 1 are
IF C3 = a AND C2 = g THEN C1 = x    (7)
IF C3 = a AND C2 = h THEN C1 = y    (8)
IF C3 = b THEN C1 = z    (9)
IF C3 = c THEN C1 = y    (10)
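The branch-to-rule conversion can likewise be sketched in a few lines. The nested-dictionary tree below mirrors Fig. 1; the representation and function name are illustrative assumptions, not the output format of any particular tool.

```python
# Each decision node maps (attribute, value) edges to a subtree or a leaf label.
tree = {
    ("C3", "a"): {("C2", "g"): "C1 = x", ("C2", "h"): "C1 = y"},
    ("C3", "b"): "C1 = z",
    ("C3", "c"): "C1 = y",
}

def tree_to_rules(node, conditions=()):
    """Yield (conditions, leaf) pairs, one per root-to-leaf branch."""
    if isinstance(node, str):                       # leaf: class label
        yield conditions, node
        return
    for (attribute, value), child in node.items():  # follow each edge
        yield from tree_to_rules(child, conditions + ((attribute, value),))

for conditions, label in tree_to_rules(tree):
    premise = " AND ".join(f"{a} = {v}" for a, v in conditions)
    print(f"IF {premise} THEN {label}")             # Rules (7)-(10)
```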
4. Rule reduction and supersession

The need to reduce the number of rules is common to classification and association techniques. This reduction may take place because a rule is not physically possible, because the rule's confidence falls below the established threshold level, or because the rule can be combined with other rules. In association mining, a minimum confidence level is set for the rules; those rules whose confidence falls below that level are eliminated. In classification mining, a pruning process combines decision tree nodes to reduce the size of the tree while having a minimum effect on the classification result (Witten & Frank, 2005). It is possible that physically impossible or obviously coincidental rules remain after the algorithms reduce the number of rules. These rules should be identified by a domain expert and eliminated as well.

Furthermore, in association mining one rule may have all the properties of another rule. As a rule's premise takes on more conditions, the confidence of the rule generally increases. For example, consider two rules, with the confidence level given after each rule:
IF A1 = r AND A2 = s THEN A3 = t  (conf. .90)    (11)
IF A1 = r THEN A3 = t  (conf. .80)    (12)
Rule (11) contains all the conditions of Rule (12) plus one more. The additional condition in Rule (11) increases the confidence; however, if a confidence level of .80 is sufficient, Rule (12) can supersede Rule (11), and Rule (11) is eliminated.

5. Persistent-rule discovery

Data mining methodologies can be complementary. For example, association mining has been used to strengthen the results found in classification mining. Deshpande and Karypis (2002) added the resulting association itemsets as Boolean attributes of the data set being classified. Persistent-rules, in contrast, are those that are obtained across independent data mining methods. That is, they are the subset of rules common to more than one method. If an association rule and a classification rule are similar, then the rule is robust across methods and is considered persistent.

The only association rules that can be compared to classification rules are those that contain the same premise and consequent, such as Rules (3) and (7), in which the premise is C2 = g AND C3 = a and the consequent is C1 = x. Commonly, a classification rule contains many conditions, one for each node traversed in constructing the rule. As long as the entire association rule premise is present in a classification rule, the association rule can supersede the classification rule. When the classification rule drops the conditions that are not present in the association rule, it becomes a rule-part.
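A minimal sketch of the two relationships just described, supersession between rules and reducing a classification rule to a rule-part, follows. The rule representation and function names are illustrative assumptions rather than part of any published implementation.

```python
def supersedes(general, specific):
    """A more general rule supersedes a more specific one when it has the
    same consequent and its premise is a subset of the specific premise."""
    g_premise, g_consequent = general
    s_premise, s_consequent = specific
    return g_consequent == s_consequent and set(g_premise) <= set(s_premise)

def rule_part(classification_rule, association_rule):
    """Drop the classification-rule conditions that do not appear in the
    association rule, keeping only the shared conditions (the rule-part)."""
    c_premise, c_consequent = classification_rule
    a_premise, _ = association_rule
    return [cond for cond in c_premise if cond in a_premise], c_consequent

# Rules (11) and (12) from the text, written as (premise, consequent) pairs.
rule_11 = ([("A1", "r"), ("A2", "s")], ("A3", "t"))
rule_12 = ([("A1", "r")], ("A3", "t"))
print(supersedes(rule_12, rule_11))   # True: Rule (12) supersedes Rule (11)

# Rules (3) and (7) share premise and consequent, so the rule-part of the
# classification rule is the association rule itself.
assoc_rule_3 = ([("C2", "g"), ("C3", "a")], ("C1", "x"))
class_rule_7 = ([("C3", "a"), ("C2", "g")], ("C1", "x"))
print(rule_part(class_rule_7, assoc_rule_3))
```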
However, there may be many identical rule-parts. The process of finding the rule-parts that match an association rule involves the following steps:

(1) Find association rules with the classification dependent variable as the consequent.
(2) Find those classification rules that contain the same conditions as the association rule.
(3) Create rule-parts by deleting the classification rule conditions that are not conditions in the association rule.

6. Rule reversal

Classification mining begins with a goal class or dependent variable toward which the construction of the tree is oriented. As a result, all classification rule consequents contain the same attribute, although this attribute may have different values. In contrast, association mining creates candidate rules by considering all the possible combinations of attribute-value pairs in the data as premises and consequents. Rules are selected from the candidate rules by determining the number of instances that satisfy the candidate rule and then comparing that number to a threshold value. As a result, association and classification rules can only be compared when the premise and consequent of the rules match.

An association rule may have the classification rule's consequent as one of its conditions. In this case, the association rule needs to be reversed before it can be compared to its corresponding classification rule. To reverse a rule, apply the following Boolean logic:
IF (X => Y) THEN (NOT Y => NOT X)

For example, reverse Rule (4) (IF C1 = x THEN (C2 = g AND C3 = a)):
IF NOT (C2 = g AND C3 = a) THEN NOT C1 = x    (13)
A simple application of De Morgan's law, followed by decomposition, leads to
IF (NOT C2 = g) OR (NOT C3 = a) THEN NOT C1 = x    (14)
IF (NOT C2 = g) THEN NOT C1 = x    (15)

and

IF (NOT C3 = a) THEN NOT C1 = x    (16)
Rules (15) and (16) combined state "if any values other than C2 = g and C3 = a, then anything except C1 = x". Or,
IF C1 = x OR C1 = y OR C1 = z OR C2 = h OR C3 = b OR C3 = c THEN C1 = y OR C1 = z OR C2 = g OR C2 = h OR C3 = a OR C3 = b OR C3 = c    (17)
which can be decomposed into 42 rules. However, the only rules that will match the classification tree rules are the ones that conclude with C1 = y or C1 = z, of which there are 12. Of those 12 rules, the six with C1 as a condition will not appear in the classification rule set, leaving six rules that may match part of the classification tree:
IF C2 = h THEN C1 = y    (18)
IF C3 = b THEN C1 = y    (19)
IF C3 = c THEN C1 = y    (20)
IF C2 = h THEN C1 = z    (21)
IF C3 = b THEN C1 = z    (22)
IF C3 = c THEN C1 = z    (23)
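This enumeration can be mechanized. The sketch below is illustrative Python: the attribute domains are assumptions made only for this running example. It expands the reversed rule into single-condition candidates whose consequent is a value of C1 other than x, reproducing Rules (18)-(23), and then checks each candidate against the classification rules of Fig. 1.

```python
# Attribute domains assumed for the running example (illustrative only).
domains = {"C1": ["x", "y", "z"], "C2": ["g", "h"], "C3": ["a", "b", "c"]}

# Reversed Rule (4): IF NOT (C2 = g AND C3 = a) THEN NOT C1 = x.
excluded_premise = {("C2", "g"), ("C3", "a")}
excluded_consequent = ("C1", "x")

# Single-condition candidates on C2 or C3 implying a non-excluded value of C1.
candidates = [
    ((attr, val), ("C1", cls))
    for attr in ("C2", "C3")
    for val in domains[attr] if (attr, val) not in excluded_premise
    for cls in domains["C1"] if ("C1", cls) != excluded_consequent
]                                      # six candidates: Rules (18)-(23)

# Classification rules (7)-(10) from Fig. 1 as (conditions, consequent) pairs.
tree_rules = [
    ([("C3", "a"), ("C2", "g")], ("C1", "x")),
    ([("C3", "a"), ("C2", "h")], ("C1", "y")),
    ([("C3", "b")], ("C1", "z")),
    ([("C3", "c")], ("C1", "y")),
]

for condition, consequent in candidates:
    matched = any(condition in conds and consequent == cls
                  for conds, cls in tree_rules)
    print(f"IF {condition} THEN {consequent}: match={matched}")
```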
Rule (18) matches part of Rule (8), Rule (20) matches Rule (10), and Rule (22) matches Rule (9). The other three rules are no longer of interest. In this example, then, there are four persistent-rules: Rule (3) is directly a persistent-rule, while Rules (18), (20), and (22) are persistent-rules based on rule reversal.

7. The ANES data and the data mining application

The 1948-2004 ANES cumulative data file (ANES, 2005) is a single file composed of the merged cases and attributes from each of the ANES studies conducted since 1948 (47,438 records). The file includes most, but not all, of the questions that have been asked in three or more ANES surveys conducted during the multi-decade time period. It is composed, therefore, of more than 900 attributes, which, for comparability, have been coded in a consistent manner from year to year. Because the data set is prepared for analysis, all the attribute values are coded numerically with predefined meanings. This study uses ANES data that had been previously selected and cleaned for data mining (Murray et al., 2007; Scime & Murray, 2007); see the Appendix for attribute definitions.

The ANES attributes are of two types: discrete and continuous. Discrete-value attributes contain a single defined value, such as party identification, which is indicated as Democrat, Republican, or other. Continuous-value attributes take on an infinite number of values, such as the 0-100-scale "feeling thermometers", which measure affect toward a specified target, and subtractive scales, which indicate the number of "likes" minus the number of "dislikes" mentioned about a target. In the previous studies the continuous-value attributes were left as continuous attributes.

As a result of the previous data mining methodology studies, the data sets had been cleaned and prepared for classification mining. To ensure discrete attributes were not misinterpreted as numeric values, an "a" or "A" was prepended to each value. Because association mining only uses discrete attributes, the continuous attributes were discretized. In this study, the WEKA (Waikato Environment for Knowledge Analysis) (Witten & Frank, 2005) software implementations of the a priori association mining algorithm and the C4.5 classification mining algorithm were used. Shannon's entropy method was used to discretize the continuous attributes.

8. Demonstrating persistent-rules: predicting vote choice

The persistent-rule discovery process was first applied to the data set used in the presidential vote choice studies (Scime & Murray, 2007; Murray & Scime, in press). This data set consists of 14 attributes and 6677 instances from the ANES. The a priori association algorithm was run on the data set, which generated 29 rules with a minimum 0.80 confidence level and 0.15 support level. All 29 rules concluded with the race attribute having the value "white". This suggested that the number of white voters in the data set was sufficiently large to skew the results. Further examination of the data set revealed that 83.5% of the voters were white. The domain expert concluded that race is not a useful indicator for association. The data were recleaned to remove the race attribute, and the a priori algorithm was rerun. This resulted in 33 rules with confidence levels between 0.60 and 0.85 and a support level of 0.15. Though the confidence levels had decreased, the rule consequents were varied and reasonable.

Next, the C4.5 classification algorithm using three folds was applied to the same data set to which the a priori association algorithm had been applied (i.e., the data set that excluded the race attribute).
Following Scime and Murray (2007), the dependent variable was the political party for which the voter reported voting (depvarvotewho). The classification tree had more complex rules than those
obtained from association mining. For example, one branch of the tree was

apid = a2
  affrepcand = '(-0.5 to 0.5]'
    demtherm = '(-inf to 42.5]'
      aeduc = a1: NotVote

(The numbers are the ranges of values found in the discretization process. A value next to a parenthesis is not included in the range, while a value next to a square bracket is included; '-inf' and 'inf' represent negative and positive infinity, respectively.)

This branch of the tree translates into the rule:

IF Party identification (apid) = weak or leaning Democratic (a2)
AND Affect towards Republican candidate (affrepcand) = no affect, '(-0.5 to 0.5]'
AND Democratic thermometer (demtherm) = not favorable, '(-inf to 42.5]'
AND Education of respondent (aeduc) = 8 grades or less (a1)
THEN Dependent variable, party voted for (depvarvotewho) = NotVote

Recall that persistent-rules must have identical rule consequents generated independently by both data mining methodologies. Because vote choice (depvarvotewho) was the subject of the classification mining, only rules with that consequent among the association mining results are candidates for identification as persistent-rules. Ten of the 33 association rules met this requirement; two of these are superseded by another, leaving eight possibly persistent-rules. For example, one of the eight association rules states:

IF Affect towards Republican candidate (affrepcand) = extreme like, '(2.5 to inf)'
THEN Dependent variable, party voted for (depvarvotewho) = Republican

A review of the tree rules reveals that there are six classification rules whose premises and consequents match the premise and consequent of this association rule. The other rules are not considered further, because to be classified along a branch an instance must satisfy all the conditions (attribute-value pairs) of the branch. By supersession, the instances that satisfy the branch would also satisfy the association rule being evaluated. The six classification rules that incorporate the association rule have the rule-part:

IF affrepcand = '(2.5 to inf)' THEN REP

The persistent-rule discovery process was repeated on all eight association rules. The persistent-rules are

IF the affect towards the Republican Party is mostly positive THEN the respondent votes Republican    (24)
IF feelings about the Republican presidential candidate are positive THEN the respondent votes Republican    (25)
IF the affect towards the Democratic candidate is negative THEN the respondent votes Republican    (26)
IF the affect towards the Republican candidate is positive THEN the respondent votes Republican    (27)
IF the feeling about the Democratic presidential candidate is positive THEN the respondent votes for the Democratic candidate    (28)
IF the feeling about the Democratic presidential candidate is negative THEN the respondent votes Republican    (29)
IF the respondent identifies him or herself as a strong Democrat THEN the respondent votes for the Democratic candidate    (30)
IF the affect towards the Democratic Party is positive THEN the respondent votes Democratic    (31)

Four association rules were also generated that had the class attribute, depvarvotewho, in the condition portion of the rule. Given rule reversal, these rules were candidates for further analysis:

IF reptherm = '(79.5 to inf)' AND depvarvotewho = REP THEN awhoelect = a2    (32)
IF affdemcand = '(-inf to -1.5]' AND depvarvotewho = REP THEN awhoelect = a2    (33)
IF aintelect = a2 AND depvarvotewho = REP THEN awhoelect = a2    (34)
IF depvarvotewho = REP THEN awhoelect = a2    (35)

Rule (35) supersedes Rules (32)-(34) because the consequent of all the rules is awhoelect = a2 and the conditions of all the rules contain depvarvotewho = REP. Applying rule reversal to Rule (35):

IF NOT awhoelect = a2 THEN NOT depvarvotewho = REP    (36)

Of the rules that can be derived from Rule (36), the only rules of interest are those that have the classification dependent variable (depvarvotewho) as the consequent and have as a condition the consequent attribute (awhoelect) of the original rule (Rule (35)). That is

IF awhoelect = a1 OR awhoelect = a7 THEN depvarvotewho = DEM OR depvarvotewho = Not Vote OR depvarvotewho = MAJ (.73)    (37)

Rule (37) corresponds to eight classification rules with the following rule-parts:

IF awhoelect = a1 THEN DEM
IF awhoelect = a1 THEN Not Vote
IF awhoelect = a7 THEN MAJ
IF awhoelect = a1 THEN REP

Therefore, Rule (37) is also a persistent-rule. This rule states, "If the respondent thought either a Democrat or another candidate (not Republican) would most likely win the election, then he/she either voted for a Democrat or did not vote at all". In this example, then, there are nine persistent-rules. Rules (24)-(31) are directly persistent-rules, while Rule (37) is a persistent-rule based on rule reversal.

9. Demonstrating persistent-rules: identifying likely voters

The persistent-rule discovery process was next applied to the data set used in the likely voter study (Murray et al., 2007). This data set consists of three attributes and 3899 instances from the ANES. A threefold C4.5 classification algorithm generated a tree with three rules. Association a priori analysis resulted in three
rules. The resulting rules were compared and evaluated following the process detailed for the vote choice rules. The focus of this analysis is voter turnout – whether the respondent is expected to vote or not. As such, the only association rules that could be compared to the classification rules were those that concluded with voter turnout. Interestingly, none of the rules satisfied this requirement. However, two of the three association rules included voter turnout in the premise.

The classification tree follows:

A_Intent = A1
  A_Prevvote = A0: A0
  A_Prevvote = A1: A1
A_Intent = A0: A0

The association rules include
IF A_Voteval_V2 = A1 AND A_Prevvote = A1 THEN A_Intent = A1    (38)
IF A_Voteval_V2 = A1 THEN A_Intent = A1    (39)
IF A_Prevvote = A1 THEN A_Intent = A1    (40)
Rule (40) is eliminated because it does not include voter turnout. Rule (39) supersedes Rule (38). Rule (39) (if the respondent voted, then he/she intended to vote) is reversed and becomes: if the respondent did not intend to vote, then he/she did not vote:
IF NOT (A_Intent = A1) THEN NOT (A_Voteval_V2 = A1)    (41)
Both attributes' values in Rule (41) are binary: A_Intent is either A1 or A0 (the respondent either intended to vote or not), and A_Voteval_V2 is either A1 or A0 (the respondent either voted or did not vote). Decomposition of Rule (41) leads to only one rule that concludes with the classification dependent variable (A_Voteval_V2) and matches a classification rule. The rule is
IF A_Intent = A0 THEN A_Voteval_V2 = A0    (42)
Hence, if a person does not intend to vote, then it is very likely he/she will not vote. This is the only persistent-rule from this analysis.
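Because both attributes here are binary, the reversal is mechanical: negating a value simply selects the complementary value. The sketch below (illustrative Python; the representation and function names are assumptions, not from the study) derives Rule (42) from Rule (39) in this way.

```python
domains = {"A_Intent": {"A0", "A1"}, "A_Voteval_V2": {"A0", "A1"}}

def negate(attribute, value):
    """For a binary attribute, NOT value is the other value in its domain."""
    (other,) = domains[attribute] - {value}
    return other

def reverse(rule):
    """Contrapositive of a single-condition rule (premise, consequent)."""
    (p_attr, p_val), (c_attr, c_val) = rule
    return (c_attr, negate(c_attr, c_val)), (p_attr, negate(p_attr, p_val))

# Rule (39): IF A_Voteval_V2 = A1 THEN A_Intent = A1
rule_39 = (("A_Voteval_V2", "A1"), ("A_Intent", "A1"))
premise, consequent = reverse(rule_39)
print(f"IF {premise[0]} = {premise[1]} THEN {consequent[0]} = {consequent[1]}")
# -> IF A_Intent = A0 THEN A_Voteval_V2 = A0, which is Rule (42)
```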
10. Conclusion

Data mining typically results in a set of rules that can be applied to future events or that can provide knowledge about interrelationships among data. This set of rules is most useful when it can be dependably applied to new data. Dependability is the strength of the rule. Generally, a rule's strength is measured by its confidence level. Strong association-mined rules are those that meet the minimum confidence level set by the domain expert (Han & Kamber, 2001). The higher the confidence level, the stronger the rule and the more likely the rule will be successfully applied to new data. Classification mining generates a decision tree, and resulting rules, that has been pruned to a minimal set of rules. Each rule also has a confidence rating suggesting its ability to correctly classify future data.

This research demonstrates a process to identify especially powerful rules. These powerful rules, which are deemed "persistent-rules", are those that are common to different algorithms. Persistent-rules are discovered by the independent application of association and classification mining to the same data set. These rules have been identified as strong by the association mining algorithm and have met the minimum confidence level established for the classification algorithm. While persistent-rules may have a lower confidence than similar association rules and may not classify all future instances of data, they improve decision making by narrowing the focus to rules that are the most robust, consistent, and noteworthy.

In this case, the persistent-rule discovery process is demonstrated in the area of voting behavior. In the vote choice data set, mining and analysis resulted in nine persistent-rules out of the 33 total rules that were generated through association mining. In the likely voter data set, the process resulted in one persistent-rule out of the two rules that were generated through association mining. The persistent-rule discovery process suggests these 10 rules are the most robust, consistent, and noteworthy of the much larger potential rule sets.

Appendix. ANES survey items in the vote choice data set

Discrete-valued questions (attribute names)
What is the highest degree that you have earned? (aeduc)
1 8 grades or less.
2 9-12 grades, no diploma/equivalency.
3 12 grades, diploma or equivalency.
4 12 grades, diploma or equivalency plus non-academic training.
5 Some college, no degree; junior/community college level degree (AA degree).
6 BA level degrees.
7 Advanced degrees including LLB.

Some people do not pay much attention to political campaigns. How about you, would you say that you have been/were very much interested, somewhat interested, or not much interested in the political campaigns this year? (aintelect)
1 Not much interested.
2 Somewhat interested.
3 Very much interested.

Some people seem to follow what is going on in government and public affairs most of the time, whether there is an election going on or not. Others are not that interested. Would you say you follow what is going on in government and public affairs most of the time, some of the time, only now and then, or hardly at all? (aintpubaff)
1 Hardly at all.
2 Only now and then.
3 Some of the time.
4 Most of the time.

How do you identify yourself in terms of political parties? (apid)
-3 Strong Republican
-2 Weak or leaning Republican
0 Independent
2 Weak or leaning Democrat
3 Strong Democrat
In addition to being American, what do you consider your main ethnic group or nationality group? (arace)
1 White
2 Black
3 Asian
4 Native American
5 Hispanic
7 Other

Who do you think will be elected President in November? (awhoelect)
1 Democratic candidate
2 Republican candidate
7 Other candidate

Continuous-valued questions

Feeling thermometer questions. A measure of feelings. Ratings between 50 and 100 degrees mean a favorable and warm feeling; ratings between 0 and 50 degrees mean the respondent does not feel favorably. The 50 degree mark is used if the respondent does not feel particularly warm or cold:

Feeling about Democratic Presidential Candidate. (demtherm)
Discretization ranges: (-inf to 42.5], (42.5 to 54.5], (54.5 to 62.5], (62.5 to 77.5], (77.5 to inf)

Feeling about Republican Presidential Candidate. (reptherm)
Discretization ranges: (-inf to 42.5], (42.5 to 53.5], (53.5 to 62.5], (62.5 to 79.5], (79.5 to inf)

Feeling about Republican Vice Presidential Candidate. (repvptherm)
Discretization ranges: (-inf to 32.5], (32.5 to 50.5], (50.5 to 81.5], (81.5 to inf)

Affect questions. The number of 'likes' mentioned by the respondent minus the number of 'dislikes' mentioned:

Affect toward the Democratic Party. (affdem)
Discretization ranges: (-inf to -1.5], (-1.5 to -0.5], (-0.5 to 0.5], (0.5 to 1.5], (1.5 to inf)

Affect toward Democratic presidential candidate. (affdemcand)
Discretization ranges: (-inf to -1.5], (-1.5 to -0.5], (-0.5 to 0.5], (0.5 to 2.5], (2.5 to inf)

Affect toward Republican Party. (affrep)
Discretization ranges: (-inf to -2.5], (-2.5 to -0.5], (-0.5 to 0.5], (0.5 to 2.5], (2.5 to inf)

Affect toward Republican presidential candidate. (affrepcand)
Discretization ranges: (-inf to -2.5], (-2.5 to -0.5], (-0.5 to 0.5], (0.5 to 2.5], (2.5 to inf)

ANES survey items in the likely voter data set

Was respondent's vote validated? (A_Voteval_V2)
0 No record of respondent voting.
1 Yes.

"On the coming Presidential election, do you plan to vote?" (A_Intent)
0 No
1 Yes

"Do you remember for sure whether or not you voted in that [previous] election?" (A_Prevvote)
0 Respondent did not vote in previous election or has never voted
1 Voted: Democratic/Republican/Other
References

American National Election Studies (ANES) (2005). Center for Political Studies. Ann Arbor, MI: University of Michigan.

Bagui, S. (2006). An approach to mining crime patterns. International Journal of Data Warehousing and Mining, 2(1), 50-80.

Deshpande, M., & Karypis, G. (2002). Using conjunction of attribute values for classification. In Proceedings of the eleventh international conference on information and knowledge management, McLean, VA (pp. 356-364).

Edsall, T. B. (2006). Democrats' data mining stirs an intraparty battle. The Washington Post, March 8: A1.

Fu, X., & Wang, L. (2005). Data dimensionality reduction with application to improving classification performance and explaining concepts of data sets. International Journal of Business Intelligence and Data Mining, 1(1), 65-87.

Han, J., & Kamber, M. (2001). Data mining: Concepts and techniques. Boston, MA: Morgan Kaufmann.

Jaroszewicz, S., & Simovici, D. A. (2004). Interestingness of frequent itemsets using Bayesian networks as background knowledge. In Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, Seattle, WA (pp. 178-186).

Murray, G. R., & Scime, A. (in press). Micro-targeting and electorate segmentation: Data mining the American National Election Studies. Journal of Political Marketing.

Murray, G. R., Riley, C., & Scime, A. (2007). A new age solution for an age-old problem: Mining data for likely voters. Presented at the 62nd annual conference of the American Association for Public Opinion Research, May 17-20, 2007, Anaheim, CA.

Padmanabhan, B., & Tuzhilin, A. (2000). Small is beautiful: Discovering the minimal set of unexpected patterns. In Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, Boston, MA (pp. 54-63).

Scime, A., & Murray, G. R. (2007). Vote prediction by iterative domain knowledge and attribute elimination. International Journal of Business Intelligence and Data Mining, 2(2), 160-176.

Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques (2nd ed.). San Francisco, CA: Morgan Kaufmann.