Expert Systems with Applications 36 (2009) 7114–7122
Multidimensional credibility model for neighbor selection in collaborative recommendation
Kwiseok Kwon a, Jinhyung Cho b,*, Yongtae Park a
a Interdisciplinary Graduate Program of Technology and Management, Seoul National University, 599, Kwanak Street, Kwanak-Gu, Seoul, Republic of Korea
b School of Computing and Information, Dongyang Technical College, Seoul, Republic of Korea
Keywords: Recommendation system; Collaborative filtering; Source credibility; Importance weight; Neighbor selection
Abstract
Collaborative filtering (CF) is the most commonly applied recommendation system for personalized services. Since CF systems rely on neighbors as information sources, the recommendation quality of CF depends on the recommenders selected. However, conventional CF has some fundamental limitations in selecting neighbors: the recommender reliability problem, a theoretical lack of credibility attributes, and no consideration of customers’ heterogeneous characteristics. This study employs a multidimensional credibility model, namely source credibility from consumer psychology, and provides a theoretical background for credible neighbor selection. The proposed method extracts each consumer’s importance weights on credibility attributes, which improves the recommendation performance by personalizing recommendations. © 2008 Elsevier Ltd. All rights reserved.
1. Introduction With the proliferation of e-commerce, recommender systems have been highlighted as a key solution to information overload for customers and for customization for e-commerce providers who strive for dominance (Cho et al., 2007a; Garfinkel et al., 2006). Currently, collaborative filtering (CF) is the most commonly applied and actively researched recommendation system due to its compactness and superiority of recommendation performance (Cho et al., 2007a; Herlocker et al., 2004). CF selects a group of users to serve as information sources for a recommendation to a target customer. Recently, as the participants in the e-commerce market vary, the non-face-to-face communication and anonymity characteristics of e-commerce have caused side effects such as relaying incorrect information from unreliable users, Internet fraud, and so on (Cho et al., 2007b). However, because conventional CF forms neighbors only by preference similarity, it has difficulty dealing with the information provided by unreliable users. To overcome this problem, researchers have introduced a trust (or expertise) metric into CF as a complement to the traditional similarity metric (Massa & Avesani, 2004; O’Donovan & Smyth, 2005). However, these researchers have applied this metric without any theoretical background information on the credibility of the information sources. Moreover, there has been no agreement on the definitions for related terms such as trust and expertise.
* Corresponding author (J. Cho). Tel.: +82 17 328 6720/2 2610 1864; fax: +82 2 2610 1859.
0957-4174/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2008.08.071
Meanwhile, customers may evaluate the attributes of information sources differently with reference to their characteristics. While some customers highly evaluate persons with similar tastes, others do the same with persons of expertise in a given subject matter. In other words, there is heterogeneity in what customers consider to be important attributes. The current CF system fails to consider this heterogeneity; there is much room for personalization in CF. In order to address these issues, it is necessary to reflect on the various psychological attributes of online users that have been actively researched in offline consumer behavior studies. This study attempts to apply some consumer behavior theories to e-commerce recommendation methods through the integration of technological and managerial perspectives. It improves and systemizes the existing neighbor selection criteria by adopting the multidimensional credibility model in consumer psychology and proposes a tendency reflected neighbor selection method to CF by considering each customer’s importance weights on the credibility attributes of information sources. 2. Neighbor selection in CF and its limitations One of the most successful recommendation techniques to date is CF (Sarwar et al., 2000; Shardanand & Maes, 1995), whose performance has been proved in various e-commerce applications (Herlocker et al., 2004; Sarwar et al., 2000; Wei et al., 2008). The CF method automates the ‘‘word of mouth” (WOM) that we usually receive from our family, friends, and coworkers (Shardanand & Maes, 1995). Therefore, CF-based recommender systems should form a group of recommenders (that we term ‘‘neighbors”)
that plays the role of an information source for a target customer. CF identifies neighbors, a group of customers who are regarded as valuable to a target user for whom recommendations are needed, and recommends the items that those neighbors have liked in the past. Therefore, the recommendation quality of CF depends on the credibility and value of the recommenders that are selected for the target customer. 2.1. Similarity-based CF and its limitations The traditional CF method forms a recommender group that comprises the nearest neighbors whose preferences are similar to those of a target user. It selects as neighbors the users that have the most similar rating tendency to the target user. Thus, these methods are referred to as similarity-based CF methods. However, similarity-based CF methods have certain fundamental problems. One of them is the recommender reliability problem, which has been addressed in recent studies (Massa & Avesani, 2004; O’Donovan & Smyth, 2005; O’Mahony et al., 2002; Riggs & Wilensky, 2001). This means that a recommender might not be reliable for a given item or set of items, even though the preferences of the recommender and target user are similar (O’Donovan & Smyth, 2005). Regardless of how similar a user might be to a target customer, the recommendation from the user cannot be reliable or credible if the user is malicious or bears ill-will to a specific firm or product. In addition, an information source has various attributes and the level of a customer’s evaluation of these attributes varies with respect to the characteristics of the user (Duhan et al., 1997). Some customers rely on the opinions of users who have similar preferences, while others depend on the opinions of users who have high expertise, or high trustworthiness. For example, when choosing a movie, a user may greatly depend on the advice of friends with similar tastes, whereas another user might depend on the advice of friends with expertise. 
This difference can occur even in the same person with respect to products. In other words, although a user searching for a movie to watch relies on friends with similar tastes in movies, the user might thoroughly listen to the users with expertise when selecting highly priced goods such as cars. Therefore, the customers’ evaluation of the attributes of information sources is heterogeneous. However, conventional similarity-based CF cannot reflect this important heterogeneity since it only considers one attribute, i.e., similarity between users. Here lies another fundamental problem with similarity-based CF.
2.2. Trust-based CF and its limitations
Trust-based concepts (i.e., trustworthiness, reputation, reliability, credibility, and so on) have been identified as key components in the solution to secure transactions in e-commerce (Jøsang et al., 2007; Kim et al., 2005; Manchala, 2006; Resnick et al., 2000; Riggs & Wilensky, 2001). As a countermeasure to the abovementioned recommender reliability problem, they have also gained an increasing amount of attention in the field of CF research. Recently, a few trust-based CF methods (Massa & Avesani, 2004; O’Donovan & Smyth, 2005; Weng et al., 2006) have been proposed. These techniques derive the neighbors’ trust explicitly or implicitly and use it as an important criterion to select neighbors, which eases the recommender reliability problem. However, the term “trust” is used in different contexts in the trust-based CF literature (Jøsang et al., 2007). Trust-based CF can be classified into two broad approaches by way of defining, measuring, and using trust. The first approach borrows the concept of a social network. It uses trust as the sole criterion for selecting more credible neighbors in a social network linked by the trust metric. In this approach, “trust” has been used to imply “trustworthiness,” i.e.,
how much a user can trust other users in a trust network. This method selects trustworthy users for a recommender group using the explicitly rated trust value and trust propagation (Massa & Avesani, 2004; Weng et al., 2006). However, it does not consider the various attributes of the information sources because it only uses a single dimensional trust. Consequently, it cannot reflect the heterogeneity of users in valuating the attributes of the information sources. Another approach follows the basic concept of similarity-based CF and uses trust as a supplementary criterion of similarity for selecting more credible neighbors. In this approach, ‘‘trust” has been used to imply ‘‘expertise” or ‘‘competency,” i.e., the ability of a user to make an appropriate recommendation (O’Donovan & Smyth, 2005, 2006; Riggs & Wilensky, 2001). This method forms a recommender group based on a new combined criterion set such as the product or mean of similarity and trust. Thus, it may reflect the various attributes of the information sources by considering two attributes. However, since both similarity and expertise are equally weighted when combined, the users’ heterogeneous valuation level of the attributes (i.e., similarity and trust) is not yet considered. In this manner, although a trust or expertise metric has been employed for trust-based CF in order to solve recommender reliability problems, two fundamental problems still exist. First, there is no agreement on the definition of ‘‘trust” or ‘‘expertise.” One study used ‘‘trust” to mean ‘‘trustworthiness,” whereas another study used it as ‘‘expertise.” Some papers have used the terms ‘‘trustworthy” and ‘‘credible” synonymously. Second, more fundamentally, there is not enough theoretical background to introduce a trust or expertise metric, in addition to the similarity metric, to CF recommendation systems. Is trust the only criterion for selecting credible neighbors? Are there any other metrics to consider? 
In other words, there is no managerial or psychological basis for establishing which attributes of a human recommender as an information source are pertinent.
2.3. Problem statements
The preceding sections reviewed the existing CF methods and their limitations. In summary, this study focuses on the following key issues:
(1) Recommender reliability problem: not all similar users are reliable.
(2) Theoretical lack of credibility attributes of the information sources: besides similarity, an information source can have other credibility attributes, such as expertise and trustworthiness. A theoretical background and definitions of the related terms are required.
(3) No consideration of target customers’ heterogeneous valuation of the attributes of users’ credibility: some target customers highly value similarity with the neighbors, while others highly value the neighbors’ level of expertise or trustworthiness.
3. Proposed approach To solve the problems previously mentioned, we propose the following suggestions. 3.1. Source credibility As mentioned in Section 2.2, there have been problems in defining the concept of trust and with the lack of a theoretical background to introduce the trust concept into neighbor selection.
To solve these problems, we introduce source credibility theory to recommender systems to explain the credibility of an information source consisting of multidimensional factors. Source credibility theory has been proposed in the WOM communications studies of consumer psychology and marketing (Eisend, 2006; Robertson, Zielinski, & Ward, 1984). Credibility refers to a person’s perception of the truth of a piece of information (Eisend, 2006). Previous studies have shown that source credibility has a positive, persuasive impact (Wilson & Sherrell, 1993). Credibility is seen as a multidimensional concept and a variety of studies have dealt with the discovery of its dimensions (Eisend, 2006). The identified key dimensions of the credibility of an information source are expertise (competency), trustworthiness, similarity (co-orientation), and attraction (Robertson et al., 1984). We term these “attributes,” and each attribute is defined as follows:
Expertise: The extent to which a source is perceived as being capable of providing correct information.
Trustworthiness: The degree to which a source is perceived as providing information that reflects the source’s actual feelings or opinions.
Similarity: The degree to which a source is similar to the target audience members, or is depicted as having similar problems or other characteristics relating to the use of a particular product or brand.
Attraction: The extent to which a source elicits positive feelings from audience members, such as a desire to emulate the source in some way.
The concepts of trust and expertise in the existing trust-based CF methods could be systematically redefined by this source credibility. It is possible to encompass and systemize the neighbor selection criteria of trust-based CF as well as similarity-based CF by employing the multidimensional attributes of source credibility. Therefore, the credibility theory can compensate for the weaknesses of the existing CF methods by providing a consumer psychological frame and clear definitions. Among the attributes, attraction is not appropriate for CF because the attribute is generated in an environment wherein the information provider is revealed to the information receiver. Thus, we will obtain and quantify the source credibility by the attributes, except for attraction. We assert that if the source credibility theory that consists of multidimensional attributes is applied to the neighbor selection in CF recommender systems, the performance of the systems can be improved as compared to the existing ones that select neighbors by single or dual dimensions.
Fig. 1. Comparison of the existing CF and our proposed method. (a) The conceptual models of the existing CF. (b) The conceptual model of the proposed method.
3.2. Customer’s importance weights on credibility attributes
The relative importance that each consumer places on the attributes of information sources varies with respect to his or her characteristics, differences in prior knowledge of the products, product involvement, and so on. Previous CF studies have not considered this variation. For example, the existing trust-based CF selects neighbors by the weights that combine trust (expertise) and similarity as though they have the same relative importance, as mentioned in Section 2.2. This research takes into account the heterogeneous users’ importance weights on credibility attributes. Each user has his or her own attribute importance weights relative to the attributes expertise, trustworthiness, and similarity of other users who act as an information source. The proposed method attempts to enhance the CF’s neighbor selection process by reflecting these attribute importance weights. The weight between a target user and another user is the sum of the products of each of the other user’s attribute values and the target user’s importance weight on that attribute (see Fig. 1). For example, suppose that user A’s importance weights on expertise, trustworthiness, and similarity are 1.5, 1, and 0.5, respectively, and user B’s are 0.5, 1, and 1.5, respectively. Another user, C, whose expertise, trustworthiness, and similarity values with respect to users A and B are 0.8, 0.5, and 0.3, respectively, has a weight with user A of 1.85 (= 1.5 * 0.8 + 1 * 0.5 + 0.5 * 0.3) and with user B of 1.35 (= 0.5 * 0.8 + 1 * 0.5 + 1.5 * 0.3). In other words, the value of user C to user A and user B may be different and the probability of being a neighbor might also differ with respect to the attribute importance weights of users A and B, although user C’s source credibility for the two users is exactly the same. Considering the importance weights on the credibility attributes enables us to locate more valuable neighbors for a target user, which provides a more customized recommendation. We assert that heterogeneity exists in the attribute importance weights between users, and a recommendation reflecting this heterogeneity can improve the recommendation quality relative to one that assumes homogeneity.
4. Proposed method
The overall process of the proposed method consists of the offline mining and online recommendation phases (see Fig. 2).
4.1. Offline mining phase
Step 1: Deriving source credibility attributes
Each user’s credibility attributes are derived in this step.
(1) Expertise
We define expertise as the degree of a user’s competency to provide an accurate prediction and exhibit high activity, on the basis of source credibility. Based on the definition, we devised a measure of expertise reflecting the users’ activity and prediction competency. To measure the prediction competency, we employ and improve the equation used in previous research (O’Donovan & Smyth, 2005). The equation assumes that user u’s rating on an item i, R_{u,i}, for another user v is correct if the rating is within a range e from v’s actual rating, R_{v,i}. While previous research (O’Donovan & Smyth, 2005) placed e at 1.8, we set the value more strictly at 1.0 in this study. However, when an item has few raters, the user’s prediction competency for that item should be discounted. Thus, we introduce the compensation value, k, which is n/10 when the number of raters of the item, n (excluding user u), is less than 10; otherwise, it is 1. See Eq. (1).
\[
\mathrm{CORRECT}_{u,i} =
\begin{cases}
0, & \text{if } |R_{u,i} - R_{v,i}| > e \\
k, & \text{if } |R_{u,i} - R_{v,i}| \le e
\end{cases}
\tag{1}
\]
where k: n/10 if n < 10, otherwise 1 (n: number of raters except user u); e: tolerance range.
Fig. 2. Recommendation process.
The expertise of user u (EXPERTISE_u) is calculated as the rate of the correct value with activity weighting, as shown in Eq. (2). In order to obtain a higher value of expertise with more rating activities for more items, we add activity weighting to the measure in the previous research. If the number of ratings by the user considered is lower than the average number of ratings of all the users, the user’s activity weighting a_u is the number of ratings of the user considered divided by the average number of ratings of all the users; otherwise, a_u = 1.
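As a rough illustration of the expertise measure described here and formalized in Eqs. (1) and (2), the following minimal sketch computes EXPERTISE_u from a small rating dictionary. It is not the authors' implementation: the nested-dictionary layout `ratings[user][item]`, the function name `expertise`, and the reading that every other rater v of an item contributes one CORRECT term are assumptions made for illustration.

```python
from collections import defaultdict


def expertise(ratings, u, e=1.0):
    """Sketch of Eqs. (1)-(2); `ratings` maps user -> {item: rating} (assumed layout)."""
    # Index the raters of every item so that the other raters v can be looked up.
    raters = defaultdict(dict)
    for user, items in ratings.items():
        for item, r in items.items():
            raters[item][user] = r

    correct_sum = 0.0  # numerator of Eq. (2): sum of CORRECT values
    pair_count = 0     # denominator of Eq. (2): number of (item, other rater) pairs
    for item, r_u in ratings[u].items():
        others = {v: r for v, r in raters[item].items() if v != u}
        n = len(others)                   # raters of the item, excluding u
        k = n / 10 if n < 10 else 1.0     # compensation value k of Eq. (1)
        for r_v in others.values():
            pair_count += 1
            if abs(r_u - r_v) <= e:       # within the tolerance range e
                correct_sum += k
    if pair_count == 0:
        return 0.0  # assumption: no comparable ratings means zero expertise

    # Activity weighting a_u: ratio to the average rating count, capped at 1.
    avg_count = sum(len(items) for items in ratings.values()) / len(ratings)
    a_u = min(1.0, len(ratings[u]) / avg_count)
    return a_u * correct_sum / pair_count


ratings = {
    "u": {"i1": 4, "i2": 2},
    "v": {"i1": 4, "i2": 5},
    "w": {"i1": 3},
}
print(expertise(ratings, "u"), expertise(ratings, "w"))
```

In this toy data, user w agrees with both other raters of the single item rated, but the low activity and the small number of raters both pull the score down, which is the behaviour the compensation value and the activity weighting are meant to produce.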
\[
\mathrm{EXPERTISE}_u = a_u \times \frac{\sum_{m_u} \sum_{n_i} \mathrm{CORRECT}_{u,i}}{\sum_{m_u} \sum_{n_i} 1}
\tag{2}
\]
where a_u: activity weighting; n_i: rating set with ratings in item i; m_u: item sets with the ratings of user u.
(2) Trustworthiness
Trustworthiness is defined as the degree to which a user is perceived as providing information that reflects his actual feelings or opinions, on the basis of source credibility. It is difficult to implicitly extract trustworthiness. We assume that if a user’s ratings are similar to the average ratings of all the users in a community, he offers his true opinions (Cho et al., 2007b). Let us consider, for example, the case when the averages of all the users in the community for five movies are 3–5–4–3–5, and the ratings of an individual user are 1–3–2–1–3. Even if his ratings are not exactly the same as the others, his rating tendency is very similar to the other users. In this case, the user is not believed to provide false representation. Thereby, a user’s trustworthiness could be measured by the similarity between the user’s ratings and the averages of the ratings given by the group that the user belongs to. The trustworthiness of user u (TRUST_u) is calculated by using Eq. (3) and employing the absolute value of the Pearson correlation coefficient, which sets the range for trustworthiness from 0 to 1. The significance weighting b is 1 if the number of ratings of a target user is over 50; otherwise, it is n/50, where n is the number of ratings of the user (Herlocker et al., 1999). This provides a user who has many ratings for the items with a high trustworthiness value.
\[
\mathrm{TRUST}_u = b \times \left| \frac{\sum_{i=1}^{m} (R_{u,i} - \bar{R}_u)(R_{\mathrm{average},i} - \bar{R}_{\mathrm{average}})}{\sqrt{\sum_{i=1}^{m} (R_{u,i} - \bar{R}_u)^2 \sum_{i=1}^{m} (R_{\mathrm{average},i} - \bar{R}_{\mathrm{average}})^2}} \right|
\tag{3}
\]
where R_{u,i}: rating of user u for item i; R_{average,i}: average rating of the users in the community for item i; b: significance weighting; m: number of user u’s rated items.
(3) Similarity
Similarity is defined as the rating similarity between two users. To measure the rating similarity between the users, Pearson’s correlation coefficient, which is the most widely used in conventional CF methods, is employed. The similarity between user u and another user v (SIMILARITY_{u,v}) is calculated using Eq. (4). Significance weighting, c, which is 1 if the number of co-rated items between the two users is over 50 and n/50 (where n is the number of co-ratings between the users) otherwise, is included in the equation (Herlocker et al., 1999). This also assigns a high similarity to a user who has many co-ratings with the other user.
\[
\mathrm{SIMILARITY}_{u,v} = c \times \frac{\sum_{i=1}^{m} (R_{u,i} - \bar{R}_u)(R_{v,i} - \bar{R}_v)}{\sqrt{\sum_{i=1}^{m} (R_{u,i} - \bar{R}_u)^2 \sum_{i=1}^{m} (R_{v,i} - \bar{R}_v)^2}}
\tag{4}
\]
where R_{u,i}: rating of user u for item i; R_{v,i}: rating of user v for item i; c: significance weighting; m: number of co-rated items.
Step 2: Extracting each user’s importance weights on the credibility attributes
In step 2, we extract the heterogeneous importance weights on the credibility attributes with respect to the users. The proposed method builds the following value model that considers importance weights under the assumption that each user evaluates the attributes of the information sources with different importance weights.
(1) The value of an information source
The value of user u as an information source to target user a, V_{a,u}, can be measured in various aspects. However, since CF methods do not expose the source to a target user, it is necessary to estimate the value implicitly. We consider that the value of an information source to a target user depends on the distance between the ratings of the two users. If the user’s rating is exactly the same as that of the target user, the user’s value to the target user is regarded as very high. Therefore, the value of an information source is estimated using the mean absolute error (MAE) between the ratings of the users, as shown in the following equation.
\[
V_{a,u} = C - \mathrm{MAE}_{a,u} = C - \frac{\sum_{i=1}^{m} |R_{a,i} - R_{u,i}|}{m}
\tag{5}
\]
where V_{a,u}: target user a’s evaluation of user u; R_{a,i}: rating of user a for item i; R_{u,i}: rating of user u for item i; m: number of co-rated items; C: a constant.
C is a constant that keeps the value positive and makes it higher for an information source whose MAE with the target user is low (Note: C is set to 5 in the experiment in this paper because the MAE can range from 0 to 4 in our experiment data). A user may have a low MAE with a target user simply because the two have few co-rated items. To avoid assigning a high value to a user who has few co-rated items with the target user, we include only the users who have at least a threshold number of co-rated items with the target user. Variation with reference to the threshold will be shown in the experiment.
(2) Finding each user’s importance weights
As mentioned in Section 3.2, we assume that the credibility of an information source consists of expertise, trustworthiness, and similarity, according to the source credibility. The target user’s perceived value for another user is regarded as the sum of the products of each credibility attribute and its importance weight, as shown in the following equation.
\[
V_{a,u} = K_{E,a}\,\mathrm{EXPERTISE}_u + K_{T,a}\,\mathrm{TRUST}_u + K_{S,a}\,\mathrm{SIMILARITY}_{a,u} + e_a
\tag{6}
\]
Here, a and u denote a target user and another user as an information source, respectively. K_{E,a}, K_{T,a}, and K_{S,a} are the target user’s importance weights on the attributes EXPERTISE_u, TRUST_u, and SIMILARITY_{a,u}, respectively. These are estimated by multiple regression analysis using the least squares method. If there is a high probability of multicollinearity among EXPERTISE_u, TRUST_u, and SIMILARITY_{a,u}, it is difficult to measure the effect of each attribute. Thus, when multicollinearity exists among the three attributes [multicollinearity is considered to exist if the variance inflation factor (VIF) is greater than 10], K_{E,a}, K_{T,a}, and K_{S,a} are estimated by ridge regression analysis.
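The value model of Eqs. (5) and (6) can be sketched in a few lines of dependency-free Python. This is an illustrative sketch under stated assumptions rather than the authors' code: the function names (`source_value`, `importance_weights`, `perceived_value`), the dictionary layout of ratings, the `ridge` penalty parameter, and the choice to fit Eq. (6) without an intercept (as the equation is written) are all assumptions. The VIF check that decides between least squares and ridge regression is left out.

```python
def source_value(r_a, r_u, C=5.0, min_corated=10):
    """Eq. (5): V = C - MAE over co-rated items; None if below the co-rating threshold."""
    common = set(r_a) & set(r_u)
    if not common or len(common) < min_corated:
        return None
    mae = sum(abs(r_a[i] - r_u[i]) for i in common) / len(common)
    return C - mae


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def importance_weights(X, y, ridge=0.0):
    """Estimate (K_E, K_T, K_S) of Eq. (6) for one target user, without an intercept.

    X: rows of [EXPERTISE_u, TRUST_u, SIMILARITY_{a,u}], one per candidate user u.
    y: the corresponding V_{a,u} values from Eq. (5).
    ridge=0 is ordinary least squares; ridge>0 adds an L2 penalty (ridge regression).
    """
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(p)]
    return solve_linear(XtX, Xty)


def perceived_value(weights, attrs):
    """Systematic part of Eq. (6): weighted sum of the credibility attributes."""
    return sum(k * x for k, x in zip(weights, attrs))


# The worked example of Section 3.2: user C's attributes valued by users A and B.
attrs_c = [0.8, 0.5, 0.3]
print(perceived_value([1.5, 1.0, 0.5], attrs_c))  # value of C to user A
print(perceived_value([0.5, 1.0, 1.5], attrs_c))  # value of C to user B
```

Fitting one regression per target user is what makes the weights personal: the same candidate user C receives a different value from A and from B because each target user has a separate (K_E, K_T, K_S) triple.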
4.2. Online recommendation phase
In this phase, recommendations are generated in real time using the data in the source credibility and importance weight databases. The recommendation procedure is as follows.
Step 3: Selecting the most valuable neighbors
The value of user u to target user a, V_{a,u}, is calculated using Eq. (6). Then, the top-N users who have the highest value from the viewpoint of the target user are selected as the “most valuable neighbors”. We selected the top 20 valuable users in this research.
Step 4: Generating recommendation
The rating of the target user for an item is estimated from the ratings of the neighbors for the item using the conventional CF equation below. We use similarity between two users as the sole weight for prediction because the CF prediction formula is optimized to the correlation coefficient for similarity.
\[
R^{\mathrm{Estimated}}_{a,i} = \bar{R}_a + \frac{\sum_{u=1}^{N} \mathrm{SIMILARITY}_{a,u}\,(R_{u,i} - \bar{R}_u)}{\sum_{u=1}^{N} \mathrm{SIMILARITY}_{a,u}}
\tag{7}
\]
5. Performance evaluation
5.1. Benchmark systems
To evaluate the performance of the proposed neighbor selection method, we benchmarked it against the existing CF methods as follows:
(1) Similarity-based CF (SCF): Neighbor selection by similarity with significance weighting [high similarity neighbors].
(2) Trust-based CF (TCF): Neighbor selection by similarity and trust (expertise) [high similarity and trust (expertise) neighbors].
(3) Credibility-based CF (CCF): Neighbor selection by similarity, expertise, and trustworthiness based on source credibility with the same weights [high similarity, expertise, and trustworthiness neighbors].
(4) Credibility-based CF with Attribute Importance Weight (CCF-AIW): Neighbor selection by expertise, trustworthiness, and similarity based on source credibility with importance weights [the proposed method].
5.2. Experiment data
We have evaluated the feasibility and advantages of our proposed method with a publicly available research data set. In the experiments, we used the MovieLens data set consisting of approximately one million ratings from 1 (very bad) to 5 (very good), involving 6040 users and 3900 movie titles. We randomly selected 1000 users. For the evaluation, we separated this data set into two parts: a calibration set that consists of the ratings for 3000 items and a validation set that consists of the ratings for the remaining items. We calculated the source credibility and importance weights on the attributes of the users in the calibration set and verified the recommendation performance of the benchmark systems in the validation set.
5.3. Evaluation metrics
To evaluate the performance of the systems, we employed two broad classes of recommendation accuracy metrics. The first class is predictive accuracy metrics. Here, we use the MAE to compare each system’s predictive accuracy. MAE is the mean absolute difference between the actual and predicted rating values. The second class is classification accuracy metrics. To evaluate how well the recommendation lists match the user’s preferences, we employ the widely used precision, recall, and F1 measures. If a user rates an item as being greater than 70% of the maximum preference value, we consider that the user prefers that item. Our method recommends an item when that item’s predicted rating is greater than 70% of the maximum preference value. Precision is the fraction of recommended items that the user prefers, whereas recall is the fraction of preferred items that the system recommends. F1 is the harmonic mean of precision and recall, that is, (2 * precision * recall)/(precision + recall).
5.4. Experimental results and discussion
The performance of the CCF-AIW depends on the accuracy of the importance weights of the target user. The accuracy of the importance weights relies entirely on the other users from whom a target user’s importance weights are extracted. When extracting a target user’s importance weights, one can include other users whose number of co-rating items with the target user is above 0, 10, 20, and so on. In this paper, we refer to this minimum number of co-rating items as a “threshold”. Fig. 3 shows the variation with respect to the threshold. Among the four thresholds of 0, 10, 20, and 30, the CCF-AIW with threshold 10 shows the best result; however, higher thresholds make it worse. This implies that some threshold is indispensable and very helpful. However, if the threshold is too high, the number of users participating in extracting importance weights would be rapidly reduced, which would consequently deteriorate the quality of the weights and performance of the CCF-AIW.
Fig. 3. Variation in predictive accuracy with respect to threshold. [Figure: MAE (left axis) and performance gain in % (right axis) for Baseline [SCF] and Thresh 0, Thresh 10, Thresh 20, and Thresh 30.]
Table 1 shows the predictive and classification accuracy of each benchmark system. The MAE of the CCF-AIW with threshold 10 (CCF-AIW, t = 10) is approximately 10.19% lower than that of the SCF, 9.70% lower than that of the TCF, and 8.64% lower than that of the CCF at a significance level of 1%. Compared to the SCF, the TCF and CCF do not show much better results in our experiment. The CCF-AIW also improves on the SCF by 1.65% in terms of F1. These results indicate that our proposed method can significantly improve the recommendation quality of the existing CF.
We believe that the proposed method can improve the recommendation in the case of the items that need more expertise or trustworthiness, such as cars and digital cameras, which are known as “high-involvement products”. Since a movie is a relatively low-priced item, people are disposed to the similarity of users in general. If an item under consideration is expensive and/or important,
Table 1
Performance comparison of the benchmark systems

System                  MAE      t-value between CCF-AIW   Precision   Recall   F1
CCF-AIW (T = 10) [A]    0.3741                             0.9075      0.8812   0.8942
SCF [S]                 0.4165   9.884*                    0.8942      0.8655   0.8796
|A-S|/S (a)             10.19%                             1.49%       1.81%    1.65%
TCF [T]                 0.4143   9.285*                    0.9013      0.8635   0.8820
|A-T|/T                 9.70%                              0.69%       2.05%    1.38%
CCF [C]                 0.4095   9.027*                    0.9012      0.8637   0.8821
|A-C|/C                 8.64%                              0.70%       2.03%    1.37%

(a) The performance gain of CCF-AIW with threshold 10 over the other system.
* p < 0.01.
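As a worked check of the gain rows in Table 1, the sketch below recomputes |A-S|/S from the printed values. The function name is hypothetical. Because the table entries are rounded to four decimals, the recomputed percentages can differ from the printed ones in the last digit.

```python
def relative_gain(proposed, baseline):
    """Relative difference |A - S| / S between a proposed and a baseline score."""
    return abs(proposed - baseline) / baseline


# Values copied from Table 1 (CCF-AIW with threshold 10 versus the SCF).
mae_gain = relative_gain(0.3741, 0.4165)
f1_gain = relative_gain(0.8942, 0.8796)

print(f"MAE gain over SCF: {mae_gain:.2%}")
print(f"F1 gain over SCF: {f1_gain:.2%}")
```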
one tends to rely on users with expertise or trustworthiness rather than similarity. In that case, the proposed method, which considers importance weights on all the attributes, will show its potential.

Each user's importance weights on the credibility attributes were identified through the offline mining phase in Section 4.1. In our data, with a threshold of 10, importance weights were obtained for all 1000 users except 15 who had only a small number of ratings. The importance weights of some users are shown in Table 2. User #171 places more importance on the expertise of the information sources; User #318, on trustworthiness; and User #703, on similarity.

If the users' importance weights are averaged, group-, market segment-, or entire market-level importance weights can be identified. In this research, taking all the users as the market, the market-level importance weights on expertise, trustworthiness, and similarity are 0.0480, 0.0781, and 1.1955, respectively, as shown in Table 3. This means that the users in the market are similarity dependent for the movie item, which explains why our CCF-AIW does not improve on the SCF by a larger margin.

The market can be grouped into more homogeneous segments with respect to the importance weights. We clustered the market users using the K-means clustering method. Through testing, we found the optimal number of clusters to be 4. Table 4 shows the results of the K-means clustering, which are significant at the 1% level according to the ANOVA table. Since cluster 3, which has only one user, is believed to be an outlier, we observed the characteristics of the other clusters.

It has been reported in consumer psychology and marketing research that consumers who have prior knowledge of the products under consideration may acquire information from their internal sources and thus be less likely to rely on the expertise of external
sources, whereas consumers who have less information are more dependent on the expertise of external sources (Furse et al., 1984; Gardener, 1983; Johnson & Russo, 1984). Although the components or dimensions that constitute prior knowledge are under debate (Kerstetter & Cho, 2004), there is a consensus that prior knowledge includes expertise and product familiarity (Alba & Hutchinson, 1987; Brucks, 1985; Mitchell & Dacin, 1996). Therefore, expertise in this paper is regarded as a measure of a user's prior knowledge of the products, since it is extracted from the user's predictive competency and past experience, termed activity. We investigated the difference in expertise among the clusters. According to Table 5, the average expertise values of clusters 1, 4, and 2 are 0.0870, 0.1933, and 0.5396, respectively, in ascending order, and the differences between clusters were all significant at the .01 level in post-hoc multiple comparisons. The standardized KE values of clusters 1, 4, and 2 are 3.24332, 0.02829, and -0.30144, respectively. This shows that the higher the expertise of the user,

Table 2
Examples of importance weights

User no.   Importance weights
           On expertise (KE)   On trustworthiness (KT)   On similarity (KS)
124        2.982762167         0.001767516               2.410872877
171        1.350303652         0.215063452               0.982064001
318        0.641907955         2.033321616               1.402344527
703        0.638276311         0.085543387               2.984669815

Table 3
Descriptive statistics of the market-level importance weights

Importance weights        N     Minimum   Maximum   Average   Standard deviation
On expertise (KE)         985   -1.16     4.27      .0480     .46250
On trustworthiness (KT)   985   -1.51     2.03      .0781     .21630
On similarity (KS)        985   .09       4.24      1.1955    .58242

Table 4
K-means clustering of users with respect to importance weights

(a) Final cluster centers

                  Cluster
                  1          2          3          4
Standardized KE   3.24332    -.30144    8.58633    .02829
Standardized KT   -.75992    -.23889    8.13089    .51406
Standardized KS   1.83668    -.55754    .62192     .71713

(b) Distance between final cluster centers

Cluster   1        2        3        4
1                  4.309    10.444   3.635
2         4.309             12.265   1.517
3         10.444   12.265            11.457
4         3.635    1.517    11.457

(c) Number of cases in each cluster

Cluster   1   50.000
          2   598.000
          3   1.000
          4   336.000
Valid         985.000
Missing       .000

(d) ANOVA table

                  Cluster               Error                 F         Significance
                  Mean square   df      Mean square   df
Standardized KE   218.095       3       .336          981     648.901   .000
Standardized KT   72.634        3       .781          981     93.009    .000
Standardized KS   175.914       3       .465          981     378.234   .000
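The segmentation reported in Table 4 can be sketched as z-score standardization followed by K-means. The paper does not state how the clustering was initialized or which software was used, so the plain Lloyd's algorithm below, its caller-supplied initial centers, and the toy weights are assumptions made for illustration only.

```python
from statistics import mean, stdev


def standardize(rows):
    """Z-score each column (KE, KT, KS) across users, using the sample SD."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    return [
        tuple((x - m) / s for x, m, s in zip(row, means, sds))
        for row in rows
    ]


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centers, max_iter=100):
    """Lloyd's algorithm from given initial centers; returns (labels, centers)."""
    centers = list(initial_centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(mean(col) for col in zip(*members))
    return labels, centers


# Hypothetical importance weights (KE, KT, KS) for four users.
weights = [(3.0, 0.0, 2.4), (2.9, 0.1, 2.5), (0.6, 2.0, 1.4), (0.7, 2.1, 1.3)]
z = standardize(weights)
labels, centers = kmeans(z, initial_centers=[z[0], z[2]])
print(labels)
```

With the paper's data, the same procedure would be run on the 985 standardized weight vectors with four clusters; the cluster means of each standardized attribute then correspond to the final cluster centers in Table 4(a).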
Table 5
One-way ANOVA table for expertise of the clusters

(a) Descriptives of the expertise

Cluster                  N     Mean    Standard    Standard   95% confidence interval for mean   Minimum   Maximum   Bet. component
                                       deviation   error      Lower bound     Upper bound                            variance
1                        50    .0870   .01553      .00220     .0825           .0914              .05       .12
2                        598   .5396   .22044      .00901     .5219           .5573              .05       .88
4                        336   .1933   .11757      .00641     .1807           .2060              .04       .81
Total                    984   .3983   .25625      .00817     .3823           .4144              .04       .88
Model: Fixed effects                   .18521      .00590     .3868           .4099
Model: Random effects                              .17307     -.3463          1.1430                                 .06125

(b) ANOVA

Expertise        Sum of squares   df    Mean square   F         Significance
Between groups   30.895           2     15.447        450.309   .000
Within groups    33.652           981   .034
Total            64.547           983

(c) Post-hoc multiple comparisons (LSD): dependent variable: expertise

(I) Cluster no.   (J) Cluster no.   Mean difference (I - J)   Standard error   Significance   95% confidence interval
                                                                                              Lower bound   Upper bound
1                 2                 -.45260*                  .02727           .000           -.5061        -.3991
1                 4                 -.10638*                  .02807           .000           -.1615        -.0513
2                 1                 .45260*                   .02727           .000           .3991         .5061
2                 4                 .34622*                   .01263           .000           .3214         .3710
4                 1                 .10638*                   .02807           .000           .0513         .1615
4                 2                 -.34622*                  .01263           .000           -.3710        -.3214

* The mean difference is significant at the .01 level.
the lower the importance weight that the user places on expertise. This coincides with the results of previous consumer psychology research (Duhan et al., 1997; Johnson & Russo, 1984) and implies that our implicit derivation of attributes is meaningful.

Above all, identifying importance weights above the individual level enables marketers to locate the users who are the most valuable to the group or market. These users are regarded as highly valuable information sources in the group or market, viewed as a social network. In our experiment, the users who have a higher level of expertise and high similarity are more valuable in cluster 1, whereas the users who have high similarity and trustworthiness are more valuable in cluster 4. Thus, marketers can determine the most effective information sources in a group or market segment for a specific product domain.

6. Conclusion

Today, most CF recommendation methods do not consider the heterogeneous characteristics of the users, i.e., the differences in customers' importance weights on the attributes of the recommendation sources. Our proposed method, which employs source credibility and customers' importance weights on the credibility attributes, proved in our experiments to be advantageous over the existing CF methods.

This study makes a few important contributions to CF research. First, it has upgraded the neighbor selection strategy of the existing CF. This enables CF to include multidimensional criteria, to consider heterogeneous users' importance weights on the credibility attributes, and to provide more adaptive and personalized recommendations.

Second, this study has proposed an interdisciplinary research framework by incorporating the consumer psychological view into the existing CF methods. Our proposed method adopted consumer psychological concepts such as source credibility and importance weights on attributes, thereby establishing a theoretical background for the credibility and trust-based concepts.
Third, our proposed method makes a more adaptive and effective marketing strategy possible. With our approach, e-commerce marketers can identify not only each customer's importance weights on credibility attributes, but also the market-level importance weights. As a result, it is possible to find the customers who are most valuable to the target market or segment and who may act as important information sources.

The directions for future research are as follows. First, it is essential to combine our method with a web usage mining technique that reflects user behavior as well as user preference. Second, it will be interesting to extend the method to other challenging e-commerce domains. Comparative research that considers the characteristics of products, such as product involvement, is anticipated. If our method is applied to high-involvement products such as luxury goods, it may substantially improve recommendation performance.

References

Alba, J., & Hutchinson, W. (1987). Dimensions of consumer experience. Journal of Consumer Research, 13, 411–454.
Brucks, M. (1985). The effects of product class knowledge on information search behavior. Journal of Consumer Research, 12, 1–12.
Cho, J., Kwon, K., & Park, Y. (2007a). Collaborative filtering using dual information sources. IEEE Intelligent Systems, 22(3), 30–38.
Cho, J., Kwon, K., & Park, Y. (2007b). Implicit user credibility extraction for reputation rating mechanism in B2C e-commerce. International Journal of Intelligent Information and Database Systems, 1(3/4), 247–263.
Duhan, D. F., Johnson, S. D., Wilcox, J. B., & Harrell, G. D. (1997). Influences on consumer use of word-of-mouth recommendation sources. Journal of Academy of Marketing Science, 22(Fall), 283–295.
Eisend, M. (2006). Source credibility dimensions in marketing communication – A generalized solution. Journal of Empirical Generalizations in Marketing, 10, 1–33.
Furse, D., Punj, G., & Stewart, D. (1984). A typology of individual search strategies among purchasers of new automobiles. Journal of Consumer Research, 10, 417–431.
Gardener, M. (1983). Advertising effects on attributes recalled and criteria used for brand evaluations. Journal of Consumer Research, 10, 310–318.
Garfinkel, R., Gopal, R., Tripathi, A., & Yin, F. (2006). Design of a shopbot and recommender system for bundle purchases. Decision Support Systems, 42(3), 1974–1986.
Herlocker, J. L., Konstan, J. A., Borchers, A., & Riedl, J. (1999). An algorithmic framework for performing collaborative filtering. Proceedings of ACM SIGIR, 230–237.
Herlocker, J. L., Konstan, J. A., Terveen, L. G., & Riedl, J. T. (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1), 5–53.
Johnson, E., & Russo, J. (1984). Product familiarity and learning new information. Journal of Consumer Research, 11, 542–550.
Jøsang, A., Ismail, R., & Boyd, C. (2007). A survey of trust and reputation systems for online service provision. Decision Support Systems, 43(2), 618–644.
Kerstetter, D., & Cho, M.-H. (2004). Prior knowledge, credibility and information search. Annals of Tourism Research, 31(4), 961–985.
Kim, Dan J., Song, Yong I., Braynov, S. B., & Rao, H. R. (2005). A multidimensional trust formation model in B-to-C e-commerce: A conceptual framework and content analyses of academia/practitioner perspectives. Decision Support Systems, 40(2), 143–165.
Manchala, D. W. (2006). E-commerce trust metrics and models. IEEE Internet Computing, 4(2), 36–44.
Massa, P., & Avesani, P. (2004). Trust-aware collaborative filtering for recommender systems. In Proceedings of the second international conference cooperative information systems (CoopIS '04) (pp. 492–508).
Mitchell, A., & Dacin, P. (1996). The assessment of alternative measures of consumer experience. Journal of Consumer Research, 23, 219–239.
O'Donovan, J., & Smyth, B. (2005). Trust in recommender systems. In Proceedings of the 10th international conference intelligent user interfaces (IUI 05) (pp. 167–174).
O'Donovan, J., & Smyth, B. (2006). Is trust robust?: An analysis of trust-based recommendation. In Proceedings of the 11th international conference on intelligent user interfaces (pp. 101–108).
O'Mahony, M. P., Hurley, N., Guenole, C., & Silvestre, C. M. (2002). Promoting recommendations: An attack on collaborative filtering. In Proceedings of the 13th international conference on database and expert systems applications (pp. 494–503). Aix-en-Provence, France.
Resnick, P., Zeckhauser, R., Friedman, E., & Kuwabara, K. (2000). Reputation systems. Communications of the ACM, 43(12), 45–48.
Riggs, T., & Wilensky, R. (2001). An algorithm for automated rating of reviewers. In Proceedings of the first ACM/IEEE-CS joint conference digital libraries (JCDL 01) (pp. 381–387). Roanoke, VA, USA: ACM Press.
Robertson, T. S., Zielinski, J., & Ward, S. (1984). Consumer behavior. Scott, Foresman and Co.
Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2000). Analysis of recommendation algorithms for e-commerce. In Proceedings of the second ACM conference on electronic commerce (EC '00) (pp. 158–167). Minneapolis, MN, USA.
Shardanand, U., & Maes, P. (1995). Social information filtering: Algorithms for automating 'Word of Mouth'. In Proceedings of the human factors in computing systems conference (CHI '95) (pp. 210–217). ACM Press.
Wei, C.-P., Yang, C.-S., & Hsiao, H.-W. (2008). A collaborative filtering-based approach to personalized document clustering. Decision Support Systems, 45(3), 413–428.
Weng, J., Miao, C., & Goh, A. (2006). Improving collaborative filtering with trust-based metrics. In Proceedings of the 2006 ACM symposium on applied computing (SAC'06) (pp. 1860–1864).
Wilson, E. J., & Sherrell, D. L. (1993). Source effects in communication and persuasion research: A meta-analysis of effect size. Journal of the Academy of Marketing Science, 21(2), 101–112.