Expert Systems with Applications 39 (2012) 1273–1283
Making use of associative classifiers in order to alleviate typical drawbacks in recommender systems

Joel Pinho Lucas, Saddys Segrera, María N. Moreno

Departamento de Informática y Automática, Universidad de Salamanca, Plaza de la Merced s/n, 37008 Salamanca, Spain
Keywords: Recommender systems; Associative classification; Sparsity
Abstract: Nowadays, there is a constant need for personalization in e-commerce systems. Recommender systems make suggestions and provide information about the items available; however, many recommender techniques are still vulnerable to some shortcomings. In this work, we analyze how the methods employed in these systems are affected by some typical drawbacks. To this end, we conduct a case study using data gathered from real recommender systems in order to investigate which machine learning methods can alleviate such drawbacks. Due to some special features of associative classifiers, we pay particular attention to this category of methods and test their capability of dealing with typical drawbacks.
1. Introduction

Nowadays, e-commerce systems present loads of products available for sale. Thus, users would probably have difficulty choosing the products they prefer and, consequently, purchasing them. Due to such facts and to an increasingly competitive industry, these systems need to personalize the presentation of their products. A way to achieve such personalization is by means of "recommender systems", which, according to Bae and Kim (2010), have emerged in e-commerce applications to provide advice to customers about items they might wish to purchase or examine. In this sense, recommender systems aim at enabling the creation of a new store personally designed for each consumer (Schafer, Konstan, & Riedl, 2001). Taking into account that machine learning techniques are applied for identifying patterns with different purposes, such as classification, prediction and knowledge discovery, according to Cheung, Kwok, Law, and Tsui (2003) these techniques can be successfully applied to predict users' preferences in recommender systems. Machine learning methods can provide several improvements to these systems and thus provide more personalization. However, even employing machine learning methods, recommender systems still suffer from numerous limitations and may be very prone to errors. In this work we investigate the main shortcomings presented by recommender systems and how they affect the quality of recommendations. Afterwards, we analyze how they may be alleviated. Therefore, after obtaining successful results with association rules in
[email protected] (J. Pinho Lucas),
[email protected] (S. Segrera),
[email protected] (M.N. Moreno). 0957-4174/$ - see front matter Ó 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2011.07.136
other domains with similar drawbacks (Moreno, Ramos, García, & Toro, 2008), we try classification based on association, a machine learning technique that combines concepts from classification and association, in recommender systems. As will be described in the following sections, such a technique may present several advantages when applied to building a recommender model. Moreover, in a preliminary study (Lucas, Segrera, & Moreno, 2008), we showed that classification based on association has promising potential in the recommender systems domain. In order to perform such an analysis, we accomplished a case study using two recommender systems databases. The first consists of ratings of movies made by MovieLens users and the second consists of book ratings made by the BookCrossing community. We considered it essential to use MovieLens data in the case study we propose, because almost every case study on recommender systems we found employed mainly movie rating datasets (mostly MovieLens). However, employing just one type of data from a single domain may limit the scope of many case studies. This shortcoming in this research field is also confirmed by Herlocker, Konstan, Terveen, and Riedl (2004), who argue that the lack of variety in publicly available collaborative filtering datasets (particularly with significant numbers of ratings) remains one of the most significant challenges in the field. Thus, we decided to employ, in addition to classic movie rating data, a different database from a different domain in order to enhance the quality of the case study presented in this work. Herlocker et al. (2004) also affirm that even early researchers recognized that, in a recommender scenario, it can be more valuable to measure how often the system leads its users to wrong choices, and that accuracy differences are usually tiny even when they are measurable. Based on such observations, we consider other
metrics, besides accuracy, to evaluate algorithms in a recommender context. Hence, we also consider the occurrence of false positives and the number of rules generated (when applicable to the algorithm being analyzed). Moreover, we also take into account some implicit features related to the shortcomings of recommender systems. In the next section we present the main classes of methods employed in recommender systems. Subsequently, in Section 3, we depict the most important ones and also describe the use of association rules for classification problems. In Section 4, we quote and explain the main shortcomings and research challenges related to recommender systems. Finally, Section 5 contains the case study accomplished in order to test the methods' capability of alleviating the effects of the shortcomings produced in recommender systems.
2. Types of recommender methods

Methods employed in recommender systems have their foundations in different areas of knowledge. Cheung et al. (2003) and Lee, Kim, and Rhee (2001) classify recommender systems depending on the type of method used for making recommendations. The two main categories of recommender methods are collaborative filtering algorithms and content-based approaches. Content-based methods compare text documents to a specific user profile, where web objects are recommended to a user based on those he has been interested in in the past (Lee et al., 2001). On the other hand, collaborative filtering methods were proposed aiming to provide more consistency to recommendations by means of information related to the social context of each user. To do so, recommendations are given to the active user according to opinions about items given by other users with similar profiles (similar preferences about other items). The collaborative filtering approach was originally based on the nearest neighbor algorithm (Sarwar, Karypis, Konstan, & Riedl, 2001), which recommends products to a target user according to the opinions of users who have similar purchase patterns. Thus, recommended products will be the ones liked by users with similar interests. This approach appeals to the notion that, when we are looking for information, we often seek the advice of friends with similar tastes or other people whose judgment we trust (Condliff, Lewis, Madigan, & Posse, 1999). In this way, information about items that other people have already found and evaluated is taken into account. Breese, Heckerman, and Kadie (1998) classified collaborative filtering methods into two groups: memory-based methods and model-based methods. In memory-based methods, the nearest neighbors of a target user are found by matching the opinions of such a user against the opinions of all the system's users. On the other hand, model-based methods build a predictive model by means of a training set comprising opinions acquired from a small portion of the system's users. Such methods were developed more recently in order to avoid the sparsity problem, which usually arises when memory-based methods are employed, because e-commerce systems generally offer millions of products for sale, so that it is not feasible to obtain opinions about all of them (Sarwar, Karypis, Konstan, & Riedl, 2000). Current recommender systems usually do not hold a substantial number of evaluations compared with the number of items available in the system. As a consequence, the system seldom encompasses a complete database with evaluations of all the items available. That is why model-based methods generally employ a training set with evaluations gathered from just a part of the system's users. Even so, the number of evaluations available is generally proportionally small and, as a result, it is still necessary to develop further techniques to address this shortcoming. Due to the difficulty of obtaining considerable evaluation information from users about the system's items, current recommender
systems usually do not employ collaborative filtering methods alone. Likewise, content-based methods are usually not employed alone, because they are not effective due to the lack of mechanisms to extract Web objects' features. Hence, each of these approaches has its strengths and weaknesses (Bae & Kim, 2010). Therefore, content-based and collaborative filtering methods are commonly combined or employed together in recommender systems. Combining different methods to overcome the disadvantages and limitations of a single method may improve the performance of recommenders (Göksedef & Gündüz-Ögüdücü, 2010).
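To make the memory-based scheme described above concrete, the following sketch (ours, not from any of the cited works; the rating matrix is hypothetical) finds a target user's nearest neighbors via cosine similarity and predicts a rating as a similarity-weighted average of the neighbors' ratings:

```python
import numpy as np

# Hypothetical user-item rating matrix (rows: users, columns: items; 0 = not rated)
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
], dtype=float)

def cosine(u, v):
    # Cosine similarity between two rating vectors
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / norm) if norm else 0.0

def predict(R, user, item, k=2):
    # Similarity of the target user to every other user who rated the item
    sims = [(cosine(R[user], R[other]), other)
            for other in range(len(R))
            if other != user and R[other, item] > 0]
    # Keep the k nearest neighbors and average their ratings, weighted by similarity
    neighbors = sorted(sims, reverse=True)[:k]
    num = sum(s * R[o, item] for s, o in neighbors)
    den = sum(s for s, _ in neighbors)
    return num / den if den else 0.0

print(predict(R, user=1, item=1))  # predicted rating of item 1 for user 1
```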
3. Methods employed in recommender systems

Seeing that recommender systems recommend items to users based on ratings or past customer behavior, and that there are usually many items and users, these need to be grouped in order to make recommendations feasible. In this way, classification methods are mostly employed to classify every user and/or item into one of those groups. Techniques used for classification are considered predictive methods because they aim at predicting the value of an attribute, called the label attribute, in a certain dataset. Each value of the label attribute must be discrete, since it is responsible for representing a class. The prediction of the label attribute's value is achieved by means of the other attributes' values (the descriptive attributes). The values of these attributes must be known in all samples, so that they may build a training set clearly defining the characteristics of all classes. Thus, classification is considered a supervised learning approach. Furthermore, a test set is used to verify the consistency of the learning model. In this section we describe the main techniques employed in recommender systems, which may be either supervised or unsupervised learning techniques. In the next subsection we depict the most common machine learning methods employed in recommender systems. Subsequently, we describe some foundations of association rules and how they can be employed in these systems. Finally, in Section 3.3, we describe classification based on association methods.
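As a concrete illustration of the supervised setting just described (a discrete label attribute predicted from descriptive attributes, with separate training and test sets), consider the following minimal sketch; the data is hypothetical and scikit-learn is used only for illustration:

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Descriptive attributes (e.g., encoded age range, gender, occupation)
X = [[1, 0, 3], [2, 1, 5], [1, 1, 3], [3, 0, 2],
     [2, 0, 5], [1, 1, 2], [3, 1, 3], [2, 1, 2]]
# Discrete label attribute (the class): recommend the item or not
y = ["yes", "no", "yes", "no", "no", "yes", "yes", "no"]

# The training set defines the classes; the test set checks the model's consistency
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out test set
```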
3.1. Machine learning methods

According to Bae and Kim (2010), most researchers have been using data mining techniques in recommender systems aiming at predicting the customer's future behavior and increasing the chance of repurchasing. In this way, and since data mining approaches are basically made of machine learning methods, we may say that such methods are quite popular in the recommender systems area. Billsus and Pazzani (1998) were the first to apply machine learning techniques in recommender systems: the authors proposed transforming the traditional collaborative filtering recommender problem (nearest neighbor) into a machine learning problem. To do so, the authors first employed neural networks and treated the recommender problem as a classification problem. In this sense, a neural network constructs a model defining classes of items for recommendation, using the ratings given by users, and then classifies the non-rated items. More recent neural network approaches (Chou, Li, Chen, & Wu, 2010) use consumer knowledge about items in order to provide personalization in e-commerce systems. They assume that customers have some amount of experience or information about the items they are using or plan to purchase. In this context, collaborative filtering may be seen as a prediction task, because the basic idea is to predict how a user will rate a new item (not rated before). That is the reason why this new approach to collaborative filtering was called item-based methods, because recommendations are given based
on items' features. However, nowadays item-based collaborative methods are generally known as model-based methods. Since most machine learning techniques employed in recommender systems are supervised learning methods, they need to be run off-line in order to build a learning model used for classification. Thus, a great part of the processing is performed off-line; hence, these methods do not require much on-line processing time, allowing recommendations to be provided faster. Moreover, they may reduce shortcomings caused by sparsity, because they do not need ratings of all the items available in order to build the classification model. However, as highlighted in Section 2, due to the difficulty of obtaining considerable evaluation information from users about the system's items, model-based methods are commonly mixed with content-based methods in order to enhance recommendation quality. Despite being the first machine learning method applied to collaborative filtering, nowadays neural networks are not widely employed in recommender systems. The "black box" learning approach of such a method, where no information is known except input and output, is a crucial issue for building recommender models. In addition, the training time of a neural network is costly, which might be a critical issue in current recommender systems, whose data, according to Koren (2010), change over time, so models should be continuously updated to reflect their present nature. As classifiers may be implemented employing many different machine learning techniques, recommender systems naturally expanded the model-based techniques they apply. Probably the most popular machine learning method for recommender systems is the Bayesian network, an effective data mining method widely applied for building recommender models (Condliff et al., 1999). The use of Bayesian networks for collaborative filtering was first suggested in Breese et al. (1998). It makes use of a training set to build a probabilistic structure for predicting user ratings. A distinct Bayesian network is used to represent every user to whom recommendations will be provided, and a learning algorithm is responsible for processing every network. Each node of the network represents, by means of a decision tree, a domain item, and the edges represent information about the user. The states of a node correspond to the rating values for the item represented by the node. The classification model used for recommendations needs to be built and, depending on the amount of data in the system, this may be very costly. On the other hand, the output model is small and its efficiency is similar to that of the nearest neighbor algorithm. However, it may still not be efficient enough depending on the number of items, because in order to predict the rating of a given item, the algorithm needs to calculate the conditional probabilities of all possible ratings for such an item given all possible ratings for the other items. Another popular data mining technique widely employed in recommender systems, as a model-based method, is clustering. It consists of performing unsupervised learning in order to identify groups of users who appear to have similar preferences. Therefore, recommendations provided to the active user will be related to the opinions (or ratings) given by users of the group he belongs to.
Most clustering algorithms require a distance or similarity metric, obtained by computing similarity between items, to guide the clustering process (Connor & Herlocker, 2001). Depending on the system, after the clusters are created, opinions of other users in the cluster may be averaged in order to provide recommendations for the active user. Nevertheless, clustering techniques may use fuzzy logic, in which case every user may belong to more than one cluster. In this way, each user will have partial participation in several clusters, and a membership measure is assigned to each user in every cluster. The recommendation will be an average across the clusters,
weighted by the membership value. However, even using fuzzy logic, according to Breese et al. (1998), clustering techniques usually produce less-personal recommendations than other methods, and in some cases the clusters have worse accuracy than classic collaborative filtering (i.e., nearest neighbor) algorithms. Therefore, nowadays clustering techniques are usually applied in collaborative filtering together with other methods. In this context, they are applied as a first step in order to distribute the data among different recommender methods or to reduce the initial dataset. While dividing the population into clusters may hurt the accuracy of recommendations to users near the fringes of their assigned cluster, pre-clustering may be a worthwhile trade-off between accuracy and throughput (Schafer, 2005). Among the machine learning techniques mainly applied in collaborative filtering methods, Support Vector Machines (SVMs) are quite frequent. Such a technique consists of a supervised learning method for building a linear classifier. In order to build an SVM, every user is seen as a vector composed of ratings about items. Such vectors are associated with a geometric space in which a hyperplane separating the possible classes is built. In this context, such classes are related to groups of users with similar preferences. Unlike other learning methods, an SVM's performance is related not to the number of features in the system, but to the margin with which it separates the data (Cheung et al., 2003). Association rule-based methods for classification are also employed in recommender systems. They are becoming more and more popular due to some benefits they present in the collaborative filtering context. In the next subsection we detail association rule discovery applied to recommender systems.

3.2. Association rules

Like most data mining techniques, association rule induction algorithms can be employed to enhance personalization in recommender systems, as in Lee et al. (2001) and Lazcorreta, Botella, and Fernández-Caballero (2008). Association rules were first introduced by Agrawal, Imielinski, and Swami (1993), aiming at discovering consuming patterns in retail databases. Thus, the task of discovering association rules in retail data was termed "market basket analysis". Data stored in a market basket are items bought on a per-transaction basis in a retailing organization. An association rule may be declared as A → B, where A and B are itemsets such that A ⊆ I and B ⊆ I, "I" being the set of all items. Such a representation states that, in a transaction, the occurrence of all items from "A" (the antecedent side of the rule) results in the occurrence of the items belonging to "B" (the consequent side of the rule). An association rule describes an association relation between itemsets that occur together in transactions of a dataset. Thus, association, unlike classification, is not considered a prediction task, because it aims at describing data. Therefore, association rule mining is not considered a machine learning method. There are several association rule mining algorithms available in the literature, such as ECLAT (Zaki, Parthasarathy, Ogihara, & Li, 1997), DIC (Brin, Motwani, Ullman, & Tsur, 1997) and FP-Growth (Han, Pei, & Yin, 2000). However, Apriori (Agrawal et al., 1993) is certainly the most popular and widely employed association rule discovery algorithm nowadays (Neves, 2003).
Its concepts and techniques are used by almost every algorithm currently proposed, which are mostly mere extensions of Apriori (Neves, 2003). Our motivation for applying association rules in recommender systems is based on the structure of these rules, which is very appropriate for recommendation purposes, because they can capture patterns like "70% of users who liked item 1 also liked item 2".
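A minimal, self-contained sketch of the support/confidence computation behind such rules (an Apriori-style enumeration restricted to item pairs; the transactions are hypothetical, not from the datasets used later):

```python
from itertools import combinations

# Hypothetical transactions: the set of items each user liked
transactions = [
    {"item1", "item2"}, {"item1", "item2", "item3"},
    {"item1", "item3"}, {"item1", "item2"}, {"item2", "item3"},
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions containing every item of the itemset
    return sum(itemset <= t for t in transactions) / n

items = sorted(set().union(*transactions))
min_support, min_confidence = 0.4, 0.6

for a, b in combinations(items, 2):
    for antecedent, consequent in ((a, b), (b, a)):
        sup = support({antecedent, consequent})
        conf = sup / support({antecedent})
        if sup >= min_support and conf >= min_confidence:
            print(f"{antecedent} -> {consequent} "
                  f"(support={sup:.2f}, confidence={conf:.2f})")
```

On this toy data the rule "item1 -> item2" comes out with confidence 0.75, the kind of "users who liked item 1 also liked item 2" pattern described above.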
In this context, the rules can denote two types of associations: between users and items (in the case of content-based methods) and associations between items or between users (in the case of collaborative filtering methods). The first type of association rule in recommender systems consists of associating the active user's profile data with items' features. The second one, employed in collaborative filtering, consists of associating data of the active user with data of other users, or of associating data of items the active user is interested in with data of other items available in the system. In Sun, Kong, and Chen (2005), for example, association rules are applied to mine relationships between items and then make the prediction about an item for an active user by computing the weighted sum of the ratings given by the user about the items similar to the target item. Géry and Haddad (2003) propose the use of implicit opinions (instead of ratings) of users; the problem of finding Web pages visited together is similar to finding association rules among itemsets in transaction databases. In most cases, rules may be discovered off-line by processing data related to users' opinions. Recommender models based on association rules are easy to interpret and are usually faster to build than most machine learning models. A recommender model using Bayesian networks, for example, considers the conditional probabilities of all possible opinions for an item given all possible opinions for the other items. Moreover, association rules may alleviate the sparsity drawback, which will be described in the next section, because rules do not need to consider all opinions from users in order to build a recommender model. Furthermore, the system may take into account just the best rules for the recommender model; rules may be evaluated and ranked by means of statistical measures like support and confidence. In the case study depicted in the next section, we analyze in more detail how association rules can reduce sparsity. However, Zhang and Chang (2005) affirm that recommender models built employing only association generally present poor-quality recommendations. In Zhang and Chang (2005), association rules were combined with sequential rules to enhance recommendation efficiency, while in Lee et al. (2001) a method combining the nearest neighbor algorithm with association rules was developed. Such a method employs information acquired from transactions of groups of users with similar preferences (neighbors) in order to discover association rules about Web objects. Alternatively, other machine learning methods could be applied to recommender systems, such as the one proposed by Liu, Hsu, and Ma (1998). In that work, the authors adapted association rules to play the role of a classifier. Such a method is described in the next subsection and is tested on recommender systems in Section 5.

3.3. Classification based on association methods

As highlighted in the previous subsection, association rules aim at describing data and, consequently, they are seen as an unsupervised learning method. On the other hand, a classification method is seen as a prediction technique, because it aims at predicting the value of an attribute (the label) in a dataset. The joining of concepts from classification and association (Liu et al., 1998) is an alternative approach for performing classification tasks, in which association rules are employed as the basis of a classification method.
A crucial matter that encourages the use of association rules in classification is the high computational cost of current classification methods, since association models are commonly more efficient than classification models. Several works (Li, Han, & Pei, 2001; Liu et al., 1998; Thabtah, Cowling, & Peng, 2005; Yin & Han, 2003) verify that classification based on association methods presents higher accuracy than traditional classification methods. Unlike association rules, decision trees, for example, do not consider simultaneous correspondences occurring among different attribute values. Moreover, human beings can more easily comprehend the output provided by association rule algorithms than that provided by usual classification techniques, such as artificial neural networks (Sarwar et al., 2000). Thabtah et al. (2005) sustain that a few accurate and effective classifiers based on associative classification have been presented recently, such as CBA (Classification Based on Associations) (Liu et al., 1998), CPAR (Classification based on Predictive Association Rules) (Yin & Han, 2003), and CMAR (Classification based on Multiple class-Association Rules) (Li et al., 2001). Taking into account that for classification rule mining there is one and only one predetermined target, while for association rule mining the target of discovery is not predetermined (Liu et al., 1998), it is necessary to constrain the rules' consequent terms to encompass only one attribute. Thus, the consequent term of an association rule will represent the target, or class, attribute. Therefore, such a rule can play a prediction role in a given system: in order to classify an item, the item's properties are matched against every rule's antecedent, and the attribute value of the consequent term (from one or more selected rules) will be the predicted class. Generally, the classification model is a set of ordered rules. In the CBA algorithm, for example, the rules are ordered by means of the confidence measure and only one rule is used for performing classification. However, in this case a scenario may occur in which multiple rules have similar confidence measures but greatly different support measures. Hence, a rule A with much higher confidence than a rule B could be the one chosen for classification even if B had a much higher support (Li et al., 2001). The MCAR algorithm solves this drawback by means of an approach that considers, in addition to the confidence, the rules' support. The CMAR algorithm has a finer approach for selecting association rules for classification: instead of using just one rule, it makes use of all the rules that match the case to be classified. If the consequent term of all the selected rules is the same, the predicted class will obviously be the value of the rules' consequent term. Otherwise, the rules are divided into groups according to the consequent terms' values, and the value chosen for classification is obtained from the group whose elements hold the highest correlation value according to the weighted χ² measure. Similarly to CMAR, the CPAR algorithm also divides rules into groups, though, instead of using all the rules that match the object to be predicted, it uses the "k" best rules representing each class. Afterwards, the algorithm chooses a group, by means of the Laplace accuracy measure, that will be the one used for classification. The drawbacks presented by association rule induction algorithms are, in general, the same as those of classification based on association algorithms. A critical drawback of these algorithms is due to rules that have few attributes: seeing that such rules express narrow information, an object that has few attributes would be ineffectively classified. Another critical drawback is due to the large number of rules that the algorithms commonly produce (Sarwar et al., 2001); as a consequence, many of them do not supply relevant information or are contradictory.
Such a drawback is a critical issue for associative classifiers, because the performance of the algorithm may be affected when retrieving, storing, pruning and sorting a large number of rules (Li et al., 2001). The CMAR algorithm tries to solve this drawback by implementing an FP-tree data structure to store the association rules' frequent itemsets.
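To make the rule-selection step concrete, here is an illustrative sketch (ours, with hypothetical rules, not the algorithms' actual implementations) of the CBA-style strategy described above: rules are ordered by confidence, with support as a tie-breaker, and the first rule whose antecedent matches the case determines the predicted class:

```python
# Each rule: (antecedent attribute-value pairs, predicted class, confidence, support)
rules = [
    ({"age": "25-34", "genre": "drama"}, "recommended", 0.90, 0.05),
    ({"genre": "drama"}, "recommended", 0.85, 0.12),
    ({"age": "18-24"}, "not recommended", 0.85, 0.03),
]

# CBA-style ordering: confidence first, support as tie-breaker
rules.sort(key=lambda r: (r[2], r[3]), reverse=True)

def classify(case, default="not recommended"):
    # Use the first (highest-ranked) rule whose antecedent matches the case
    for antecedent, label, _, _ in rules:
        if all(case.get(attr) == value for attr, value in antecedent.items()):
            return label
    return default  # fall back when no rule matches (e.g., a case with few attributes)

print(classify({"age": "25-34", "genre": "drama", "gender": "F"}))  # recommended
```

CMAR and CPAR differ only in this selection step: CMAR would gather all matching rules and pick the class group with the highest weighted χ² value, while CPAR would use the k best rules per class and the Laplace accuracy measure.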
4. Recommender systems shortcomings

Shortcomings inherent in the methods employed in recommender systems are reflected in erroneous recommendations. An error in a
recommendation may be presented as a false negative or a false positive. The first one consists of products that were not recommended, though the consumer would have liked them. The second one consists of recommended products that the consumer does not like. According to Sarwar et al. (2000), false positives are more critical because they will lead to angry consumers. Moreover, even early researchers recognized that when recommender approaches are used to support decisions, it can be more valuable to measure how often the system leads its users to wrong choices (Herlocker et al., 2004). Consequently, recommender methods should be concerned mostly with avoiding false positives. However, Claypool et al. (1999) have suggested that recommender methods, unlike humans, have difficulty in distinguishing between high-quality and low-quality information relating to the same topic. Therefore, it might not be effective to provide recommendations when evaluations acquired from users are not taken into account. On the other hand, the use of information from user evaluations will probably produce more false positives. In order to reduce false positives, methods employed in recommender systems must avoid the occurrence of some typical drawbacks. The next subsections describe the four most critical drawbacks that may occur in these systems.

4.1. Data sparsity

Probably the biggest challenge recommender systems face nowadays is the data sparsity problem, due to the huge amount of data available in current recommender systems. Fundamentally, sparsity occurs because the number of ratings needed to build a prediction model is greater than the number of ratings obtained from users. Moreover, most recommender techniques require users to explicitly express their personal preferences for items. Methods for obtaining ratings implicitly have been developed in order to add more ratings and reduce sparsity. However, even with the use of up-to-date methods (including data mining methods), sparsity still remains a critical drawback for recommender systems due to the extensive number of items available. This is a significant problem because, in practice, it is usually costly and difficult to collect sufficient data for all users (Ahn, Kang, & Lee, 2010). According to Sarwar et al. (2001), active users may have purchased less than 1% of the items available in a system. This means that in a movie recommender system with 500,000 items, for example, an active user would have rated about 5,000 movies; nevertheless, we cannot expect all users of the system to watch 5,000 movies and provide ratings for all of them. In addition, rating schemes can only be applied to homogeneous domains, so the number of ratings eligible to be used by the system is even more restricted. Model-based methods reduce shortcomings derived from sparsity; however, a certain minimum number of ratings is still necessary in order to build a rating estimation model. In any case, owning ratings of just 1% of the system's items may be, depending on the technique used, too scarce to build a reliable model. In this work we also analyze how the performance of classifiers may be affected by sparsity. Therefore, we address some questions related to sparsity: how we can determine whether a dataset is sparse or not, and whether it is possible to measure the degree of sparsity of a dataset. For practical reasons, industry and academia sometimes evaluate sparsity by considering the number of NULL/NA values present in a certain dataset.
Hence, sparsity may be seen as the complement of density, which reflects both the overall size of the recommender's item space and the degree to which users have explored it (Herlocker et al., 2004).
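Under this view, density can be computed directly as the fraction of populated cells of the user-item matrix. A minimal sketch with a hypothetical rating matrix:

```python
import numpy as np

# Hypothetical user-item rating matrix; 0 marks a missing (NULL/NA) rating
R = np.array([
    [5, 0, 0, 1, 0],
    [0, 0, 4, 0, 0],
    [0, 3, 0, 0, 0],
])

density = np.count_nonzero(R) / R.size  # fraction of populated cells
sparsity = 1.0 - density
print(f"density = {density:.2%}, sparsity = {sparsity:.2%}")
# density = 26.67%, sparsity = 73.33%
```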
In this context, Rittman (2005) describes an example of a dataset with four attributes in a retail market scenario: store, week of the year, customer and product. If there are 1,000 stores, 52 weeks in a year, 500,000 customers and 10,000 products, the dataset has 1,000 × 52 × 10,000 × 500,000 = 260,000,000,000,000 potential cells. However, it might only have 936,000,000 populated cells, because there are 450,000 customers shopping on average 26 times a year, buying 40 products at just 2 stores. So, only 0.00036% of the dataset's cells are populated ((936,000,000 / 260,000,000,000,000) × 100).

4.2. Scalability

Scalability is another drawback of recommender systems resulting from the huge number of items available in them. Scalability in recommender systems includes both very large problem sizes and real-time latency requirements (Schafer et al., 2001). One example of such requirements may be a scenario in which a recommender system connected to a Web site needs to provide recommendations within a few milliseconds and, at the same time, serve thousands of users simultaneously. Thus, a major challenge in recommender systems nowadays, according to Schafer et al. (2001), is to adapt data mining techniques to simultaneously meet low latency and high throughput (amount of data flowing in a system) requirements in order to attract and serve a huge number of users. In the recommender system context, throughput may be measured by the number of users and items that the system is able to support without affecting efficiency. Efficiency is a key feature for recommender systems, because they need to supply fast feedback to their users. Generally, shortcomings resulting from scalability do not occur in model-based methods, because in these methods, differently from other classes of methods, the data processing is usually not done at run time. Scalability may become a major concern for the efficiency of a system, because some techniques, like searching for the nearest neighbor, may be unfeasible in systems having a huge database. A typical web-based recommender system running only the nearest neighbor algorithm will probably suffer serious scalability problems (Sarwar et al., 2001).

4.3. Early rater problem

Although the drawbacks described before may be minimized by means of model-based methods, there are other shortcomings that may occur along with these methods. The Early Rater (or First-Rater) Problem (Claypool et al., 1999; Condliff et al., 1999) is an example of a drawback associated with every class of collaborative filtering methods. Such a drawback arises when it is impossible to offer recommendations about an item that was just incorporated into the system and, therefore, has few (or even no) evaluations from users. In fact, the early rater problem is directly linked with sparsity, because when a system has a high number of items, probably most of these items have never received any evaluation. Conceptually, the early-rater problem can be viewed as a special instance of the sparsity problem (Huang, Chen, & Zeng, 2004). Sarwar et al. (2001) affirm that current recommender systems depend on the altruism of a set of users who are willing to rate many items without receiving many recommendations. Economists have speculated that even if rating required no effort at all, many users would choose to delay considering items to wait for their neighbors to provide them with recommendations (Avery & Zeckhauser, 1997). Thus, it is necessary to find ways to encourage users to evaluate the items available in the system.
Analogously, such a drawback also occurs when a new user joins the system: since there is no information about him, it is impossible to determine his behavior in order to provide him recommendations. Actually, this scenario of the early rater
problem is also referred to as the "Cold-Start problem" (Guo, 1997) in the literature. As an extreme case of the early rater problem, when a collaborative filtering system first begins, every user suffers from the early rater problem for every item (Claypool et al., 1999).

4.4. Gray sheep problem

Another drawback occurring only in collaborative filtering methods is the Gray Sheep Problem (Claypool et al., 1999). Such a drawback is associated with users whose preferences do not match those of any group of users. As a consequence, these users will not receive any recommendation. In fact, a user who stays a long time in the cold-start condition may be considered to be in the gray sheep condition as well, because such a user has not shown interest in the system's items. Thus, conceptually, the gray sheep problem can be viewed as a special instance of the early-rater problem. Given that in content-based methods the preferences of other users are not considered for building recommendations, the gray sheep problem does not occur in this class of methods. Moreover, the early rater problem does not occur in this class of methods either, because they may provide recommendations taking into account merely the features of an item. On the other hand, since content-based methods do not consider the social context of the user, the system may only be capable of recommending items that score highly against a user profile (Condliff et al., 1999); therefore, the user will only get to see items that are similar to those he or she has rated positively. As a result, many false negatives may occur, because such methods are not able to distinguish between low- and high-quality information related to the same domain. Moreover, in some domains the items are not amenable to any useful feature extraction methods with current technology (such as movies, music and restaurants) (Balabanović & Shoham, 1997). In web pages, for instance, only some features of the content can be extracted, because current information retrieval techniques ignore multimedia features like text embedded in images (Balabanović & Shoham, 1997).

5. Case study

In this section we describe a case study that aims at investigating algorithms and design practices that may help to alleviate the most common and challenging drawbacks inherent in recommender systems. To do so, we analyze the behavior of some classification algorithms on general data and also on real data gathered from recommender systems. In order to evaluate such algorithms, we consider, besides the classic precision rate metric, some important metrics in the recommender system context (i.e., the false positive rate and the number of rules considered for classification). Hill, Stead, Rosenstein, and Furnas (1995) suggested that algorithmic improvements in collaborative filtering methods may come from different directions than just continued improvements in mean absolute error. Moreover, though new algorithms often appear to do better than the older algorithms they are compared to, when each algorithm is tuned to its optimum, they all produce similar measures of quality (Herlocker et al., 2004). That is why we consider, in addition to numerical metrics, some implicit non-numerical features, where we also rely on empirical knowledge. Consequently, we are able to analyze recommender techniques from different perspectives, not being restricted to evaluating methods simply by comparing classic accuracy metrics.
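As an illustration of the numerical metrics used below, both accuracy and the false positive rate follow directly from a confusion matrix. A minimal sketch with hypothetical counts (a "positive" being a recommended item):

```python
# Hypothetical confusion-matrix counts for a binary recommender decision
tp, fp = 70, 10   # recommended items the user liked / did not like
fn, tn = 15, 105  # not-recommended items the user liked / did not like

accuracy = (tp + tn) / (tp + fp + fn + tn)
false_positive_rate = fp / (fp + tn)  # critical in recommenders: angry consumers

print(f"accuracy = {accuracy:.2%}")                        # 87.50%
print(f"false positive rate = {false_positive_rate:.2%}")  # 8.70%
```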
In order to build the experiments performed in this case study, we considered the 26 datasets tested in the case studies accomplished in Liu et al. (1998) and Li et al. (2001), and we also employed five datasets obtained from real recommender systems. For all datasets
the WEKA (Waikato Environment for Knowledge Analysis) tool was used to perform data transformation and pre-processing, and we applied the 10-fold cross-validation method to estimate the algorithms' classification metrics in all the experiments made. In the next subsection we describe all the datasets employed in this case study, as well as how they were obtained. Subsequently, in Section 5.2, we compare some typical rule-based classifiers against some general classifiers, where we take into account their behavior with respect to the drawbacks depicted in Section 4. In Sections 5.3 and 5.4 we analyze how rule-based algorithms and recommender systems data are related to sparsity (the most critical drawback of recommender systems). Finally, in Section 5.5 we probe the occurrence of false positive errors on three different groups of classifiers.

5.1. Employed datasets

In order to analyze sparsity on rule-based classifiers (Section 5.3), we considered the 26 datasets employed by the works proposing the CBA (Liu et al., 1998) and CMAR (Li et al., 2001) algorithms, which are general data from different domains gathered from the UCI Machine Learning Repository. At this point, however, we just compared the results obtained by the authors on each dataset, considering the number of attributes, the number of records and the accuracy obtained by the classifiers. For the rest of the experiments performed in this case study, we employed real data obtained from two recommender systems: MovieLens and BookCrossing. MovieLens is a recommender system based on the GroupLens technology; its data consists of movie ratings made by users in 2000 and is freely available for research purposes. Initially, the MovieLens dataset contained approximately 100,000 ratings for 1,682 movies made by 943 users, but we integrated the data related to users and movies into one file, which was the input provided to the algorithms analyzed in this case study. However, before supplying such input we changed the rating attribute to have only two values: "Not recommended" (score 1 or 2) and "Recommended" (score 3, 4 or 5). The second refers to an item the user may be interested in and the first to the opposite case. Such changes were performed to simplify classification, because the main aim in a recommendation task is to determine whether an item should be offered to the user or not. Regarding users' data, we used the following attributes: gender, age and occupation. The age attribute was discretized into five age ranges. The user's occupation attribute is a nominal variable with 21 distinct values. Regarding movies' data, the file provided by MovieLens originally contained 19 binary attributes related to movie genres, where a value of 1 expressed that the movie belongs to a specific genre and 0 otherwise. The association model's consistency would be compromised if 19 of the 23 attributes in the dataset were binary. Thus, these 19 binary attributes were reduced to just one attribute representing the movie genre's name. However, since some movies may belong to different film genres, we only used the records containing ratings about movies with just one genre. After data pre-processing and transformation, 14,587 records remained in the input file for the algorithms used in this study.
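The two transformations just described (binarizing the rating and discretizing age) can be sketched with pandas; the column names and cut points below are illustrative assumptions, not the exact ones used in the study:

```python
import pandas as pd

# Hypothetical slice of the merged MovieLens input file
df = pd.DataFrame({
    "age": [23, 41, 67, 19, 35],
    "genre": ["drama", "comedy", "drama", "action", "drama"],
    "score": [4, 2, 5, 1, 3],
})

# Binarize the rating: scores 1-2 -> "Not recommended", 3-5 -> "Recommended"
df["rating"] = df["score"].map(lambda s: "Recommended" if s >= 3
                               else "Not recommended")

# Discretize age into five ranges (the cut points are an assumption)
df["age_range"] = pd.cut(df["age"], bins=[0, 17, 24, 34, 49, 120],
                         labels=["<18", "18-24", "25-34", "35-49", "50+"])

print(df[["age_range", "genre", "rating"]])
```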
On the other hand, the BookCrossing database consists of book ratings gathered by Ziegler, McNee, Konstan, and Lausen (2005) from the BookCrossing community, whose users exchange books and experiences all around the world. Initially, the BookCrossing data contained 433,671 explicit ratings (an assigned mark from 1 to 10) about 185,832 books provided by 77,797 users. Such a database has 2.33 ratings per item and, for this reason, it may be considered much sparser than MovieLens (which has 59.45 ratings per item). This makes the use of the
BookCrossing data even more interesting and innovative for testing recommender methods. In this context, Herlocker et al. (2004) verified that different algorithms designed for recommender systems may be better or worse on different datasets; in addition, many algorithms have been designed specifically for datasets having many more users than items. As a consequence, such algorithms may present a completely different performance when applied to datasets that do not have an abundant quantity of users per item or ratings per item. Hence, testing recommender algorithms using only the MovieLens data might not yield trustworthy evaluations of the algorithms' use in general recommender systems. However, as the BookCrossing data had not been used before for testing recommender algorithms, some effort was required to prepare the data to be supplied as input to the algorithms. In the rest of this subsection, we describe the main approaches employed for data pre-processing and transformation. In order to simplify classification, the rating attribute of the BookCrossing database was modified in the same way as the MovieLens one: "Not recommended" (score from 1 to 6) and "Recommended" (score from 7 to 10). For books' data, we used two attributes from the dataset: publication year and author. The first was discretized into five ranges. The Author attribute was also modified, because at first it encompassed 48,234 distinct values; thus, the dataset was reduced so that this attribute encompassed only 40 distinct values (the ones appearing in the most records). Regarding users' data, we also used two attributes: age and the place where the user lives. The first was discretized into nine age ranges. The Place attribute originally contained the name of the city, the state or province, and the name of the country; in this form, the attribute presented 12,952 distinct values. Therefore, we changed this attribute to encompass only 40 distinct values. For that reason, and noticing that 75% of the places were from the USA, we divided the dataset, based on this attribute, into two: places grouped by states of the USA and places grouped by countries excepting the USA. Afterwards, the first dataset (states of the USA) remained with 25,523 records and the second one (countries) with 8,926 records. In order to perform the case study on even more diverse data, we also used two more datasets, having a smaller range of distinct values, derived from those mentioned before: we copied both datasets and kept only the 10 most frequent distinct values for the Author and Country/State attributes. In this way, we obtained two more datasets, containing 6,270 records (for the dataset of states of the USA) and 3,238 records (for the dataset of countries).

5.2. Associative classifiers vs. general classification algorithms

In this subsection we describe some experiments intended to test classification algorithms, especially associative classifiers, on real recommender systems data. To do so, we compared three associative classifiers (CBA, CPAR and CMAR) with two traditional classifiers (Bayes Net and C4.5), where Bayes Net is a probabilistic algorithm and C4.5 a decision tree based algorithm. The first two were run through WEKA and the other three were obtained from the LUCS-KDD Software Library of the University of Liverpool.
Bayes Net and C4.5 were chosen because, in addition to being classification methods widely employed in recommender systems, they represent two groups of machine learning methods: probabilistic classification (Bayes Net) and rule-based classification (C4.5). On the other hand, although few accurate and effective associative classifiers exist, even fewer implementations are available. In fact, the LUCS-KDD Software Library was the only software repository we found offering associative classifiers for free use. Therefore, we only employed three classification based on association algorithms in this case study. CBA was chosen because it was the first associative classifier built and its concepts
are present in most current associative classifiers. Conversely, CMAR was chosen because it represents a modern generation of these classifiers: it employs an alternative data structure for storing classification rules, which, according to Li et al. (2001), makes it possible to reduce processing time and, in some cases, increase precision. Finally, CPAR was chosen because it is another popular associative classifier, proposed more recently. In order to perform the experiments, we defined a very low support threshold (1%) for running the classification based on association algorithms, so as to obtain enough frequent itemsets. Conversely, we defined high confidence thresholds: 85% for CBA and CPAR and 70% for CMAR. We reduced the confidence threshold for CMAR because the data structure it employs, an FP-tree (Frequent Pattern Tree), stores frequent itemsets in a compact way in which common relations between itemsets are exploited; items thus need to be frequent enough to be stored in the FP-tree in the first place. For the BookCrossing datasets containing 10 distinct values for the Author and Country/State attributes, we increased the support threshold to 5% due to their reduced number of records.

In Table 1 we show the results obtained after running the algorithms mentioned above. Each cell depicts the accuracy obtained by each classifier, defined as the percentage of correctly classified samples among the whole data taken into account.

Table 1
Comparison of classifiers.

Data                 Bayes Net (%)  C4.5 (%)  CBA (%)  CPAR (%)  CMAR (%)
MovieLens            81.95          82.88     81.4     74.07     85.16
BCrossing World      80.87          80.21     79.47    73.25     70.43
BCrossing World 10   80.51          79.98     80.52    79.86     41.41
BCrossing USA        80.23          81.31     80.28    78.15     77.66
BCrossing USA 10     81.53          80.82     81.56    76.71     69.59

Results revealed that the associative classifiers reached accuracy similar to that of the traditional (supervised learning) classifiers, except for CMAR on the BookCrossing data. Actually, in some cases the associative classifiers reached higher accuracy. Despite being the first classification based on association method, the CBA algorithm reached the highest accuracy on two of the four BookCrossing datasets. On the MovieLens data, CMAR reached the highest accuracy, which was the best result obtained over all the experiments. Since the rules provided by the associative classifiers hold a high confidence value (equal to or greater than 70% or 85%), the rules used for building the classification models are reliable. The ninth rule generated by CMAR on the MovieLens data is an example of this kind of rule: "age = (25-34) & genre = 'drama' → rating = 'yes'". Such a rule states that, if a user is older than 25 and younger than 34 years old, he will probably rate a drama movie positively. Despite presenting the highest precision over all the experiments (85.16% on the MovieLens data), CMAR presented lower precision than the other classifiers on the BookCrossing datasets. The MovieLens and BookCrossing data basically differ in the number of distinct values of their attributes: the MovieLens attributes have, in general, fewer distinct values than those of the BookCrossing datasets when the ratio of records to number of distinct values is taken into consideration. Moreover, MovieLens has a ratio of 59.45 ratings per item, which is considerably greater than that of BookCrossing (2.33 ratings per item). Regarding the BookCrossing datasets, when comparing the datasets of "world countries" with the datasets of
"states of USA", we noticed a slight loss of precision, in most scenarios, on the dataset of world countries. Such a loss is reasonable due to the diversity of characteristics that readers from different world countries (from four continents) may have, which is probably much more substantial than for people from just one country. Comparing the datasets having 40 distinct values (for the Country/State and Author attributes) with the datasets having 10 distinct values, we noticed no significant difference of precision among them with the confidence and support threshold values set. At last, the CPAR algorithm also presented acceptable results, even though its precision was slightly lower than that of the other classifiers. Such an algorithm is more effective in scenarios of very large datasets where processing time may be a critical issue, because the classifier construction and the rule induction are made in just one processing step. However, in the context of recommender systems this is not a critical issue, because the recommender model is built off-line. Through these experiments we were able to conclude that classification based on association methods can be employed effectively in recommender systems like other methods are, because they can reach precision similar to, or even greater than, that of traditional classifiers, and also because the rules forming the classification model are reliable due to the high confidence values they present. Since the estimation model is built off-line, recommendations may be provided in real time without significant processing effort, and scalability, which is usually a challenging feature for recommender systems, is not a major design concern as it is for on-line recommender methods. Moreover, as rule-based classifiers (especially associative classifiers) are composed of simple classification rules, the estimation model is easy to interpret, allowing the analyst to inspect the generated classification model and intervene easily. Another advantage of a rule-based recommender model is that it is not static, since it is not necessary to build a model for every user (there are general rules for all users). This way, the gray sheep problem would also be less likely to occur using associative classifiers. Subsequently, we verify, by means of several datasets with different features and from different domains, how vulnerable rule-based classifiers are to sparsity. In this way, we will be able to verify whether there are even more advantages, for alleviating typical shortcomings, in employing rule-based classifiers in recommender systems.

5.3. Analyzing sparsity on rule-based classifiers

In this section we evaluate how sparsity affects the accuracy of some rule-based classifiers. Therefore, we took into account the case studies accomplished in the CBA (Liu et al., 1998), CPAR (Yin & Han, 2003) and CMAR (Li et al., 2001) papers. Such papers compared the accuracy obtained by their algorithms against the Ripper (a traditional rule-based classifier) and C4.5 algorithms. For their experiments, they employed twenty-six datasets, which are general data from different domains gathered from the UCI Machine Learning Repository.
In this analysis, for evaluating associative classifiers we took into account CBA and CMAR, and for evaluating rule-based algorithms we considered the same algorithms (C4.5 and Ripper) analyzed in the case studies accomplished in the CBA, CPAR and CMAR papers. We did not consider the CPAR algorithm for evaluating sparsity because, as highlighted in the previous subsection, the other two associative classifiers (CBA and CMAR) are more convenient to employ in recommender systems. In order to account for sparsity, in this analysis we considered a measure based on the relationship between the number of records
and the number of attributes. Such a correlation was built taking into account the approach shown in Rittman (2005), described in detail in Section 4.1. A high value of this correlation suggests a dense (non-sparse) dataset (a high number of records compared with the number of distinct values of its attributes), because it is easier to find frequent itemsets when more records are available. In Table 2 we analyze the performance of C4.5, Ripper, CBA and CMAR on sparse datasets. To do so, we selected, among the 26 datasets analyzed, the four sparsest datasets (Sonar, Labor, Zoo and Hepatic) for running the algorithms. The first four rows of the table depict the accuracy obtained by the classifiers and the last row depicts the correlation between the number of records and attributes. Table 2 shows that the associative classifiers, CBA and CMAR, obtained better results than the other rule-based classifiers (C4.5 and Ripper): on all four datasets one of them reached the highest accuracy and, except on the Sonar dataset, both reached greater accuracy than C4.5 and Ripper. In order to compare the associative classifiers with the other algorithms, in every subsequent table we show, for each algorithm analyzed, the four datasets corresponding to its best results when compared to the other algorithms. In this way, we may know which algorithms are more susceptible to sparsity, because we can analyze the density of the datasets on which every algorithm obtained its best results. Firstly, we verified the best results reached by CBA in comparison to the non-associative classifiers (C4.5 and Ripper). Such a comparison was done through a variation metric: the difference in accuracy between CBA and C4.5 plus the difference in accuracy between CBA and Ripper. The general formula for this metric is:
\[
\text{metric}_i = (\text{accuracy of algorithm}_1 - \text{accuracy of algorithm}_2) + (\text{accuracy of algorithm}_1 - \text{accuracy of algorithm}_3)
\]

Taking the first column of Table 3 as an example, the accuracy of CBA is 96.8%, of C4.5 is 92.2% and of Ripper is 88.1%; consequently, the variation metric in this case is 13.3% (4.6% + 8.7%). Table 3 shows the best results of CBA, where the first three rows depict the accuracy obtained by each classifier and the last two rows depict, respectively, the variation metric defined above and the correlation between the number of records and attributes.
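For concreteness, a minimal sketch of this variation metric in Python, reproducing the metric1 value quoted above (the figures come from Table 3; the function itself is only illustrative):

```python
def variation_metric(acc_alg1, acc_alg2, acc_alg3):
    """Accuracy gain of algorithm 1 over algorithms 2 and 3, summed.

    Inputs are accuracies in percent; the result is in percentage points.
    """
    return (acc_alg1 - acc_alg2) + (acc_alg1 - acc_alg3)

# Zoo column of Table 3: CBA = 96.8%, C4.5 = 92.2%, Ripper = 88.1%
print(round(variation_metric(96.8, 92.2, 88.1), 1))  # 13.3 (= 4.6 + 8.7)
```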
Table 2
Algorithms' precision on the sparsest datasets.

                     Sonar           Labor          Zoo             Hepatic
C4.5                 70.2%           79.3%          92.2%           80.5%
Ripper               78.4%           84.0%          88.1%           76.7%
CBA                  77.5%           86.3%          96.8%           81.8%
CMAR                 79.4%           89.7%          97.1%           80.6%
Records/attributes   208/60 = 3.47   57/16 = 3.56   101/16 = 6.31   155/19 = 8.16
Table 3
CBA's best results.

                     Zoo             Glass           Labor          Sonar
CBA                  96.8%           73.9%           86.3%          77.5%
C4.5                 92.2%           68.7%           79.3%          70.2%
Ripper               88.1%           69.1%           84.0%          78.4%
metric1              13.3%           10.0%           9.3%           6.4%
Records/attributes   101/16 = 6.31   214/9 = 23.78   57/16 = 3.56   208/60 = 3.47
Table 4
CMAR's best results.

                     Labor          Zoo             Lymph           Waveform
CMAR                 89.7%          97.1%           83.1%           79.4%
C4.5                 79.3%          92.2%           73.5%           70.2%
Ripper               84.0%          88.1%           79.0%           78.4%
metric2              16.1%          13.9%           13.7%           12.3%
Records/attributes   57/16 = 3.56   101/16 = 6.31   148/18 = 8.22   5000/21 = 238.1
Through the results shown in Table 3, we can conclude that the non-associative classifiers are much more susceptible to sparsity than CBA, because the four datasets in which metric1 presented the greatest values are quite sparse (the records/attributes correlation is, except for the Glass dataset, lower than 10). Analogously, we also analyzed the four datasets in which CMAR presented the greatest variation of accuracy (i.e., metric2) when compared to C4.5 and Ripper. Table 4 shows such results. They reveal that the non-associative classifiers are also much more susceptible to sparsity than CMAR, because the first three datasets are quite sparse. However, CBA appeared to be less susceptible to sparsity than CMAR, because the records/attributes correlation values are greater among the results of CMAR (the Waveform dataset, for example, is quite dense).

Conversely, we also compared C4.5 and Ripper to the associative classifiers. Table 5 shows the best results of C4.5, where metric3 is the difference in accuracy between C4.5 and CBA plus the difference in accuracy between C4.5 and CMAR. The results in Table 5 reveal that the datasets in which C4.5 reached its greatest accuracies are very dense (except for the Auto dataset). Analogously, the results in Table 6 (Ripper's best results) show that Ripper behaved similarly to C4.5. Thus, the results suggest that non-associative classifiers are much more susceptible to sparsity than associative classifiers. Moreover, the values of the variation metric for C4.5 and Ripper are notably lower than those for CBA and CMAR, which reveals that the advantage of the non-associative classifiers, even on the datasets where they perform best, is smaller than the advantage the associative classifiers obtain on sparse data.
Table 5
C4.5's best results.

                     Vehicle        Auto           Pima          Led7
C4.5                 72.6%          80.1%          75.5%         73.5%
CBA                  68.7%          78.3%          72.9%         71.9%
CMAR                 68.8%          78.1%          75.1%         72.5%
metric3              7.7%           3.8%           3.0%          2.6%
Records/attributes   846/18 = 47    205/25 = 8.2   768/8 = 96    3200/7 = 457.1
Table 6
Ripper's best results.

                     Horse           Austral         Sick             Hypo
Ripper               84.8%           87.3%           97.7%            98.9%
CBA                  82.1%           84.9%           97.0%            98.9%
CMAR                 82.6%           86.1%           97.5%            98.4%
Variation            4.9%            3.6%            0.9%             0.5%
Records/attributes   368/22 = 16.7   690/14 = 49.3   2800/29 = 96.5   3163/25 = 126.5
5.4. Analyzing sparsity on recommender systems data

In this subsection, we continue analyzing the effects caused by sparsity; at this point, however, we are concerned specifically with recommender systems, because we want to investigate methods capable of alleviating the effects of sparsity in such systems. Furthermore, taking into account the results of the experiments made in the previous subsection, in the experiments described subsequently we considered specifically the associative classifiers (CBA and CMAR). In this way, we are able to depict a more focused and less redundant case study.

Before performing the experiments, we analyzed the sparsity features of the five datasets acquired from MovieLens and BookCrossing, which were described in Section 5.1. To do so, we again considered the approach taken in Rittman (2005), described in detail in Section 4.1. At this point, we also take into account the number of distinct values of the attributes, because the investigated datasets have similar features. Hence, for the MovieLens dataset (with 14,587 records) the product of the distinct values of its attributes is 2 × 2 × 7 × 21 × 29 = 17,052. For BookCrossing, the product of distinct values for both the ‘‘Country’’ and ‘‘State’’ datasets is 144,000 (2 × 40 × 5 × 40 × 9) and the product of distinct values for the reduced datasets is 9,000 (2 × 10 × 5 × 10 × 9). Table 7 shows the density correlation (following the approach taken in Rittman (2005)) of the datasets used in the experiments of this subsection. As shown in Table 7, ‘‘BCrossing World’’ and ‘‘BCrossing USA’’ are the sparsest datasets. Not surprisingly, the two datasets with a reduced number of distinct values are less sparse. Conversely, MovieLens is the densest dataset used in this study (0.86). This helps explain why MovieLens data are used in most works on recommender systems: their density makes it easier to develop trustworthy case studies.

The datasets shown in Table 7 were provided as input to the CBA and CMAR algorithms, for which we set a confidence threshold of 75%. The support threshold values were set empirically for each dataset and algorithm so that we obtained at least 10 rules in the classification model (a small sketch of this tuning procedure follows the tables below). In Table 8 we show the results obtained by CBA, including the number of rules built for the classification model. Analogously, in Table 9 we show the results obtained by the CMAR classifier. The results revealed that the associative classifiers reached similar accuracy for MovieLens (the densest dataset), where CMAR presented the better accuracy (75.63%) and the greatest support threshold among its results. However, for sparser datasets CMAR presented worse results: as shown in Table 7, the datasets of countries are sparser than the datasets of states of the USA and, as shown in Table 9, there was a great loss of accuracy for ‘‘BCrossing World’’ compared with ‘‘BCrossing USA’’ and for ‘‘BCrossing World 10’’ compared with ‘‘BCrossing USA 10’’. On the other hand, as shown in Table 8, CBA did not present a great loss of accuracy (around 1% at most) for the datasets of countries. A similar scenario occurred when comparing the datasets with 40 distinct values with the reduced ones with 10 distinct values, on which CMAR again lost accuracy. The reduced datasets contain substantially fewer records (around 75% fewer) than the ones with 40 distinct values, which means they are less likely to contain frequent itemsets and to expose the relationships needed to build rules. Conversely, CBA did not lose accuracy on these datasets.
Table 7
Datasets' density.

Dataset              Number of records/product of distinct values
BCrossing USA        25,523/144,000 = 0.18
BCrossing USA 10     6,270/9,000 = 0.7
BCrossing World      8,926/144,000 = 0.062
BCrossing World 10   3,228/9,000 = 0.36
MovieLens            14,587/17,052 = 0.86
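The density correlation in Table 7 is straightforward to reproduce. Below is a minimal sketch, assuming each dataset is loaded as a pandas DataFrame whose columns are exactly the attributes used for mining (the loading step is omitted and any column names would be hypothetical):

```python
import pandas as pd

def density(df: pd.DataFrame) -> float:
    """Density correlation following Rittman (2005): number of records
    divided by the product of the numbers of distinct attribute values."""
    product = 1
    for column in df.columns:
        product *= df[column].nunique()
    return len(df) / product

# Example with the MovieLens figures: attributes with 2, 2, 7, 21 and 29
# distinct values and 14,587 records give 14587 / 17052 ~= 0.86 (Table 7).
```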
Table 8
Results of CBA.

Dataset              Accuracy (%)   Num. rules   Support (%)   Density
BCrossing USA        80.24          15.9         5             0.18
BCrossing USA 10     80.56          12.3         10            0.7
BCrossing World      79.43          17.7         5             0.062
BCrossing World 10   79.26          10.9         10            0.36
MovieLens            72.77          10.3         15            0.86

Table 9
Results of CMAR.

Dataset              Accuracy (%)   Num. rules   Support (%)   Density
BCrossing USA        71.26          18.7         3             0.18
BCrossing USA 10     69.58          16           5             0.7
BCrossing World      61.7           17.1         2             0.062
BCrossing World 10   42.34          11.3         3             0.36
MovieLens            75.63          11.7         10            0.86

Table 10
False positive rates.

Dataset              Bayes Net (%)   C4.5 (%)   CBA (%)
MovieLens            47.4            42.6       33.22
BCrossing World      45.15           39.9       23.03
BCrossing World 10   40.3            42.45      33.83
BCrossing USA        47.45           41.55      20.43
BCrossing USA 10     48.25           44         34.66
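The support thresholds reported in Tables 8 and 9 were obtained by the empirical tuning described above. A minimal sketch of that procedure, assuming a hypothetical train_associative_classifier(data, support, confidence) function that returns the list of rules mined at the given thresholds:

```python
def tune_support(data, train_associative_classifier, confidence=0.75,
                 min_rules=10,
                 candidate_supports=(0.15, 0.10, 0.05, 0.03, 0.02)):
    """Try support thresholds from the most to the least restrictive and
    keep the highest one that still yields at least min_rules rules."""
    for support in candidate_supports:
        rules = train_associative_classifier(data, support, confidence)
        if len(rules) >= min_rules:
            return support, rules
    raise ValueError("no candidate support threshold produced enough rules")
```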
Therefore, we can conclude that the CBA algorithm is more appropriate than CMAR for recommender systems. The results have shown that CMAR is less effective on sparser datasets (more distinct values or fewer records). This may be explained by the data structure it employs, an FP-Tree (Frequent Pattern Tree), which stores frequent itemsets in a compact way that exploits common relations between itemsets; on our sparser datasets, the items stored in the FP-Tree were not frequent enough to generate rules. Besides, new associative classifier proposals usually take CBA as the reference for comparative studies and for validating their algorithms. The main contribution of recent associative classifiers concerns memory usage and processing time, because, in general, their accuracy is higher only on some datasets and the increment is not very substantial. Subsequently, we examine the occurrence of false positives in order to identify further advantages of employing associative classifiers in recommender systems.

5.5. Analyzing false positives occurrence

According to what was described in Section 4, designers of recommender methods should be especially concerned with avoiding false positive errors. In this context, this subsection describes the experiments we made in order to compare the false positive rates of associative classifiers and traditional classifiers. To do so, we chose three algorithms: Bayes Net, C4.5 and CBA. The first two represent two classes of machine learning methods largely employed in recommender systems, probabilistic classification and rule-based classification, respectively. The third one, CBA, represents associative classifiers and was chosen mainly because it presented considerably greater precision than the other two associative classifiers when tested on recommender systems data, especially on the BookCrossing datasets. In order to calculate the false positive rate of a given class c1, we employed the approach defined by Fawcett (2003), which is stated as follows:
\[
\text{FP rate} = \frac{\text{negatives incorrectly classified}}{\text{total negatives}}
\]

The false positive rate is also called the ‘‘false alarm rate’’, because it counts negative instances (samples not belonging to the class c1 being analyzed) that were incorrectly classified. On the other hand, the true positives represent the instances belonging to c1 that were correctly classified (also called hits). An ideal ‘‘conservative classifier’’ is one that classifies positive instances (the ones belonging to c1) only with strong evidence; as a consequence, it makes few false positive errors, but its true positive rate is reduced as well and, therefore, its precision is also reduced. Nevertheless, conservative classifiers are appropriate for a recommender system scenario, in which false positives need to be avoided.

It should be noted that, generally, associative classifiers do not use a default rule to classify an instance that matches none of the generated rules, as traditional classifiers would do. Such a default classification could indeed lead to recommending an item that does not match the user's needs. Hence, an associative classifier simply does not classify a user whose data match no generated rule, whereas other classifiers always classify the active user, since they classify every sample provided as input.

Table 10 shows the false positive rates obtained by Bayes Net, C4.5 and CBA on the same datasets employed in the previous subsection, for which we also set the same confidence and support threshold values. Table 10 shows that Bayes Net presented the greatest false positive rates among all the algorithms studied, which suggests that Bayes Net is more susceptible to false positive errors than rule-based classifiers. The CBA algorithm obtained, on average, a false positive rate more than 10% lower than that of C4.5, which in turn was lower than that of Bayes Net. In addition, on the ‘‘BCrossing USA’’ dataset, CBA's false positive rate (20.43%) was 21.12% lower than C4.5's and 27.02% lower than Bayes Net's. This scenario shows that the use of associative classifiers is very appropriate for avoiding false positives in recommender systems, given that false positives were dramatically reduced using the CBA algorithm.
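To make these quantities concrete, here is a minimal sketch in which each rule is an (antecedent, class) pair and user profiles are dictionaries; the attribute names and rules are hypothetical, and None encodes the abstention of an associative classifier that has no default rule:

```python
def classify(rules, user):
    """Return the class of the first matching rule, or None (abstain):
    associative classifiers generally have no default rule."""
    for antecedent, consequent in rules:
        if all(user.get(attr) == value for attr, value in antecedent.items()):
            return consequent
    return None

def false_positive_rate(predictions, actuals, positive_class):
    """Fawcett (2003): negatives incorrectly classified / total negatives.
    Abstentions (None) are never counted as positive predictions, which is
    why abstaining tends to reduce the false positive rate."""
    negatives = [p for p, a in zip(predictions, actuals) if a != positive_class]
    false_positives = sum(1 for p in negatives if p == positive_class)
    return false_positives / len(negatives)

# Hypothetical example: one rule recommending the "like" class.
rules = [({"genre": "drama", "age_group": "25-34"}, "like")]
users = [{"genre": "drama", "age_group": "25-34"},   # matches -> "like"
         {"genre": "horror", "age_group": "18-24"}]  # no match -> None
predictions = [classify(rules, u) for u in users]
actuals = ["like", "dislike"]
print(false_positive_rate(predictions, actuals, "like"))  # 0.0
```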
6. Conclusion

In this work we focused on revealing how the most typical and critical drawbacks of recommender systems may be avoided or alleviated. To do so, we tried several machine learning methods, especially classification based on association, on diverse types of data. We also used data gathered from real recommender systems, including BookCrossing, a database from a domain that had not previously been used in case studies on recommender systems. According to the results obtained in the first steps of the case study developed in Section 5, we directed this work towards evaluating classification based on association.

There are several advantages to employing classification based on association algorithms in recommender systems. First, the model is built off-line and, consequently, processing time and scalability are not a major concern for recommender systems employing associative classifiers. Likewise, sparsity may be alleviated as well, because associative classifiers proved (according to the experiments made in Sections 5.3 and 5.4) to be less susceptible to a radical loss of precision on sparse datasets. Since a recommender model using associative classification consists of general rules considering several users' attributes, the gray sheep problem is also likely to be alleviated. In addition, the analyst may set up the input parameters of the algorithm in order to generate more rules that would likely be suitable for more users. Furthermore, the number of false positives was reduced dramatically when we tested the CBA algorithm (Section 5.5).

By means of the experiments described in Section 5.4, we argue that CBA is more effective than CMAR on sparser datasets. Thus,
the CBA algorithm is more suitable for recommender systems because, as seen in Section 2, these systems are generally seriously affected by the sparsity drawback. Besides, since new approaches to classification based on association are generally concerned with memory usage and processing time, more recent associative classifiers would not supply many additional benefits to recommender systems (whose recommender models are built off-line). In this context, we argue that CBA is very appropriate to be employed efficiently in recommender systems as a model-based collaborative filtering method.

The review of recommender systems' drawbacks and the case study conducted in this work can be the basis of several future directions in recommender systems research, first of all because association rules may be easily combined with other machine learning methods. The CBA algorithm, for example, could be combined with a content-based method in order to improve recommendation quality. Likewise, any other artificial intelligence technique could be used together with CBA to provide more personalization. Alternatively, there are vast possibilities for extending an associative classification approach for a specific domain implemented in a recommender system.

References

Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. In ACM SIGMOD international conference on management of data, Washington, DC, USA (pp. 207–216).
Ahn, H. J., Kang, H., & Lee, J. (2010). Selecting a small number of products for effective user profiling in collaborative filtering. Expert Systems with Applications, 37(4), 3055–3062.
Avery, C., & Zeckhauser, R. (1997). Recommender systems for evaluating computer messages. Communications of the ACM, 40(3), 88–89.
Bae, J. K., & Kim, J. (2010). Integration of heterogeneous models to predict consumer behavior. Expert Systems with Applications, 37(3), 1821–1826.
Balabanović, M., & Shoham, Y. (1997). FAB: Content-based, collaborative recommendation. Communications of the ACM, 40(3), 66–72.
Billsus, D., & Pazzani, M. J. (1998). Learning collaborative filters. In 15th international conference on machine learning.
Breese, J. S., Heckerman, D., & Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th annual conference on uncertainty in artificial intelligence (UAI-98) (pp. 43–52). Morgan Kaufmann.
Brin, S., Motwani, R., Ullman, J. D., & Tsur, S. (1997). Dynamic itemset counting and implication rules for market basket data. In J. Peckham (Ed.), ACM SIGMOD international conference on management of data (pp. 255–264). ACM Press.
Cheung, K.-W., Kwok, J. T., Law, M. H., & Tsui, K.-C. (2003). Mining customer product ratings for personalized marketing. Decision Support Systems, 35(2), 231–243.
Chou, P.-H., Li, P.-H., Chen, K.-K., & Wu, M.-J. (2010). Integrating web mining and neural network for personalized e-commerce automatic service. Expert Systems with Applications, 37(4), 2898–2910.
Claypool, M., Gokhale, A., Miranda, T., Murnikov, P., Netes, D., & Sartin, M. (1999). Combining content-based and collaborative filters in an online newspaper.
Condliff, M. K., Lewis, D. D., Madigan, D., & Posse, C. (1999). Bayesian mixed-effects models for recommender systems. In SIGIR '99 workshop on recommender systems: Algorithms and evaluation (Vol. 99).
Connor, M., & Herlocker, J. (2001). Clustering items for collaborative filtering. In SIGIR 2001 workshop on recommender systems.
Fawcett, T. (2003). ROC graphs: Notes and practical considerations for data mining researchers. Technical report HPL-2003-4, HP Laboratories.
Géry, M., & Haddad, H. (2003). Evaluation of web usage mining approaches for user's next request prediction. In The 5th ACM international workshop on web information and data management, ACM, New York, NY, USA (pp. 74–81).
Göksedef, M., & Gündüz-Ögüdücü, S. (2010). Combination of web page recommender systems. Expert Systems with Applications, 37(4), 2911–2922.
Guo, H. (1997). SOAP: Live recommendations through social agents. In Fifth DELOS workshop on filtering and collaborative filtering.
Han, J., Pei, J., & Yin, Y. (2000). Mining frequent patterns without candidate generation. In W. Chen, J. Naughton, & P. A. Bernstein (Eds.), ACM SIGMOD international conference on management of data (pp. 1–12). ACM Press.
Herlocker, J. L., Konstan, J. A., Terveen, L. G., & Riedl, J. (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22, 5–53.
Hill, W., Stead, L., Rosenstein, M., & Furnas, G. (1995). Recommending and evaluating choices in a virtual community of use. In The SIGCHI conference on human factors in computing systems (pp. 194–201). New York, NY, USA: ACM Press/Addison-Wesley.
Huang, Z., Chen, H., & Zeng, D. (2004). Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering. ACM Transactions on Information Systems, 22(1), 116–142.
Koren, Y. (2010). Collaborative filtering with temporal dynamics. Communications of the ACM, 53(4), 89–97.
Lazcorreta, E., Botella, F., & Fernández-Caballero, A. (2008). Towards personalized recommendation by two-step modified Apriori data mining algorithm. Expert Systems with Applications, 35(3), 1422–1429.
Lee, C.-H., Kim, Y.-H., & Rhee, P.-K. (2001). Web personalization expert with combining collaborative filtering and association rule mining technique. Expert Systems with Applications, 21(3), 131–137.
Li, W., Han, J., & Pei, J. (2001). CMAR: Accurate and efficient classification based on multiple class-association rules. In ICDM (pp. 369–376).
Liu, B., Hsu, W., & Ma, Y. (1998). Integrating classification and association rule mining. In Knowledge discovery and data mining (pp. 80–86).
Lucas, J. P., Segrera, S., & Moreno, M. N. (2008). Comparing the use of traditional and associative classifiers towards personalized recommendations. In 20th international conference on software engineering and knowledge engineering (pp. 607–612).
Moreno, M. N., Ramos, I., García, F. J., & Toro, M. (2008). An association rule mining method for estimating the impact of project management policies on software quality, development time and effort. Expert Systems with Applications, 34(1), 522–529.
Neves, J. P. (2003). Ambiente de pós-processamento para regras de associação [Post-processing environment for association rules]. Master's thesis, Faculty of Economy, Universidade do Porto.
Rittman, M. (2005). What is sparsity, and why should I be concerned with it? Oracle news.
Sarwar, B. M., Karypis, G., Konstan, J. A., & Riedl, J. (2000). Analysis of recommendation algorithms for e-commerce. In ACM conference on electronic commerce (pp. 158–167).
Sarwar, B. M., Karypis, G., Konstan, J. A., & Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. In World Wide Web (pp. 285–295).
Schafer, J. B., Konstan, J. A., & Riedl, J. (2001). E-commerce recommendation applications. Data Mining and Knowledge Discovery.
Schafer, J. B. (2005). The application of data-mining to recommender systems. In J. Wang (Ed.), Encyclopedia of data warehousing and mining. Information Science Publishing.
Sun, X., Kong, F., & Chen, H. (2005). Using quantitative association rules in collaborative filtering. In W. Fan, Z. Wu, & J. Yang (Eds.), WAIM. Lecture notes in computer science (Vol. 3739, pp. 822–827). Springer.
Thabtah, F., Cowling, P., & Peng, Y. (2005). MCAR: Multi-class classification based on association rule. In International conference on computer systems and applications (pp. 33-I). Washington, DC, USA: IEEE Computer Society.
Yin, X., & Han, J. (2003). CPAR: Classification based on predictive association rules. In SIAM international conference on data mining (pp. 331–335).
Zaki, M. J., Parthasarathy, S., Ogihara, M., & Li, W. (1997). New algorithms for fast discovery of association rules. Technical Report TR651.
Zhang, F., & Chang, H.-Y. (2005). On a hybrid rule based recommender system. In The fifth international conference on computer and information technology (pp. 94–198). Washington, DC, USA: IEEE Computer Society.
Ziegler, C.-N., McNee, S. M., Konstan, J. A., & Lausen, G. (2005). Improving recommendation lists through topic diversification. In 14th international World Wide Web conference (WWW '05).