Accepted Manuscript

CD-SPM: Cross-domain Book Recommendation using Sequential Pattern Mining and Rule Mining
Taushif Anwar, V. Uma

PII: S1319-1578(18)31142-X
DOI: https://doi.org/10.1016/j.jksuci.2019.01.012
Reference: JKSUCI 572

To appear in: Journal of King Saud University - Computer and Information Sciences

Received Date: 31 October 2018
Revised Date: 18 January 2019
Accepted Date: 27 January 2019

Please cite this article as: Anwar, T., Uma, V., CD-SPM: Cross-domain Book Recommendation using Sequential Pattern Mining and Rule Mining, Journal of King Saud University - Computer and Information Sciences (2019), doi: https://doi.org/10.1016/j.jksuci.2019.01.012
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
CD-SPM: Cross-domain Book Recommendation using Sequential Pattern Mining and Rule Mining
Taushif Anwar (a), V. Uma (a)
(a) Department of Computer Science, School of Engineering and Technology, Pondicherry University, Pondicherry 605014, India
Abstract
A recommender system suggests personalized recommendations by filtering information based on users' interests. Nowadays, users want to purchase the best possible items and services while spending as little time as possible. A cross-domain recommender system gathers knowledge from multiple domains: with respect to a user's search term in the source domain, the most similar items are recommended from the target domain. The semantic similarity between two different items can be computed with the Wpath method over an ontology. PrefixSpan is used to generate sequential patterns, and the Topseq rule mining algorithm is used to find frequent sequential rules. This work extends cross-domain recommendation by 1) finding the semantic similarity of items using an ontology; 2) applying Collaborative Filtering to find similar items and users; 3) generating frequent item sequences using the PrefixSpan sequential pattern mining algorithm; and 4) recommending user-preferred items using the Topseq rule mining algorithm. The recommender system is evaluated using precision, recall and F1 Score, and the results show that CD-SPM yields a better F1 Score than the CF-KNN baseline. The proposed approach also alleviates the new user problem and the sparsity problem to some extent.
Keywords: Semantic similarity, Ontology, Cross-domain recommendation, Collaborative filtering, PrefixSpan
Preprint submitted to Journal of King Saud University, January 29, 2019

1. Introduction
A recommender system (RS) is a subclass of information filtering system. It helps users find the items they are interested in from a huge collection of items. The exponential growth of the internet and the explosion of online data have created an information overload problem, and finding the right information at the right time has become challenging. RS are used to reduce information overload in various fields such as e-commerce, e-learning, movies, music, news, books and research articles (Zhou et al., 2018; Abdullah et al., 2019), where they retrieve user-preferred information from the internet. An RS can increase the average order value, reduce traffic in services and improve the delivery of relevant content to the user (Cambria et al., 2017; Al-Adrousy et al., 2015).
RS are generally classified by how they recommend items. Three methods are commonly used, namely Collaborative Filtering (CF), Content-Based Filtering (CBF) and Hybrid Filtering (HF). CF recommends items preferred by other users with similar tastes (Lee et al., 2019). In contrast to CF, CBF recommends content according to user characteristics (Wang et al., 2018). The HF approach merges different filtering approaches to handle the traditional problems of RS (Tarus et al., 2018).
Cross-domain recommender systems (CDRS) can access information belonging to one or more domains. A CDRS exploits knowledge from the source domains to enhance the recommendations in a target domain (Fernández-Tobías et al., 2019). This can yield better accuracy and help overcome the data sparsity problem (Hwangbo & Kim, 2017).

Figure 1: Cross-Domain Recommendation

Knowledge representation techniques can represent the characteristics of different domains hierarchically. This helps in categorizing concepts, so that mappings between concepts can be made more precisely, resulting in better accuracy (Zhu & Iglesias, 2017). An ontology is used here for domain knowledge representation: it represents concepts and the relationships between facts, and describes individuals, classes, attributes and relations. An ontology can also be represented as a Knowledge Graph (KG) (Obeid et al., 2018; Zouaoui & Rezeg, 2018). In this work, a KG is used to calculate the semantic similarity between two concepts belonging to two different domains. Semantic similarity can be calculated from the shortest distance between the concepts in the knowledge graph (Ali et al., 2017).
Semantic similarity measures how alike different entities, words, sentences or documents are. It can be evaluated using the topology of ontologies (Saleena & Srivatsa, 2015; Nilashi et al., 2018). Two main families of techniques exist for measuring the similarity between words, viz. group-wise and pair-wise. On the one hand, sets of concepts can be compared directly through a set, graph or vector representation; on the other hand, the similarity between two sets of words can be calculated by combining the similarities between the individual words in the sets. The Wpath semantic similarity method is used to calculate the semantic similarity between concepts (Zhu & Iglesias, 2017).
Information overload across multiple domains can be reduced when CF is combined with semantic similarity, which yields higher accuracy and better recommendations. CF uses the ratings given by different users to items in multiple domains. In applying CF, item-item similarity is calculated using the Adjusted Cosine similarity method, which identifies the items matching the user's interest. User-user similarity is calculated to identify the most similar users, which helps in retrieving items based on similarity. The identified items represent the users' preferences and are treated as sequences (Liu et al., 2014).
Sequential Pattern Mining (SPM) is used to find frequent patterns in these sequences. PrefixSpan (Prefix-projected Sequential Pattern Mining) is one of the algorithms used for sequential pattern mining (Maylawati et al., 2018). It mines the sequences while reducing the size of the database: it retrieves the most related sequences by fixing a prefix and projecting only the postfix sequences in the database. It gives better performance than other pattern mining algorithms (Fournier-Viger et al., 2011). The Topseq rules algorithm discovers sequential rules and helps recommend items in a filtered order.
To conclude, in this work an ontology is used to represent knowledge about two domains, namely movies and books. Semantic similarity between concepts belonging to the two domains is measured using the Wpath method, and the most similar items are retrieved using CF (Zhu & Iglesias, 2017). Item-item and user-user similarity help construct sequences that represent similar users' preferences for items. Finally, frequent patterns are identified using the PrefixSpan algorithm and Topseq rules (Fournier-Viger & Tseng, 2011; Nguyen et al., 2018), and the items in the resulting sequential rules are given as recommendations. Thus, the proposed cross-domain recommender system recommends items based on semantic similarity and user-item ratings, and can provide cross-domain recommendations in e-commerce.
The motivation behind this work is explained in Section 2. Section 3 discusses the literature survey. Section 4 explains the CD-SPM system architecture and the modules involved in implementing it. Section 5 describes the experimental setup and performance measures. Section 6 discusses the results obtained and Section 7 concludes the paper.
2. RESEARCH MOTIVATION
The following research questions motivated this work. The general research question related to CDRS is: how can a CDRS model be designed and developed that provides better recommendations? To answer this question, we framed the following sub-questions.
What type of recommender system could overcome the new user problem, so that a new user who has provided very few ratings can be recommended items matching his or her interests? The new user problem arises when a new user enters the system and the RS has no information about the user's interests and tastes; hence, it cannot provide appropriate recommendations. In a CDRS, the new user problem is addressed by using the user's preferences gathered from the source domains.
What type of recommender system could deal with sparsity and improve accuracy? When there are few users with few preferences and many unrated items, the rating data become sparse, which leads to the data sparsity problem. In various domains, the average number of ratings per item and per user is low, which may directly affect recommendation quality (Tan et al., 2014). In the proposed approach, the PrefixSpan algorithm retrieves the sequential patterns of the most similar items, and Topseq rules are then applied to generate the most frequent sequences of books. This can improve recommendation accuracy and help address the sparsity problem (Jiang et al., 2015).
What type of recommender system provides diversity in the recommended items, and how? Diversity describes the correlation among the recommended items: a diverse list suggests items that are not closely related to each other. A CDRS uses two or more domains and therefore gives diverse recommendations. In this paper, books are recommended on the basis of movies. This diversity achieves better coverage of user preferences, resulting in better recommendations.
These research questions motivated us to propose a CDRS that overcomes the new user and data sparsity problems and provides diversified recommendations.

3. LITERATURE SURVEY
Tarus et al. (2018) presented a hybrid knowledge-based RS for e-learning that combines context awareness, ontology and sequential pattern mining. Context awareness incorporates contextual information about the learner, such as learning goals and knowledge level. An ontology represents and models the domain knowledge about the learner and the learning resources, and SPM generates sequential learning patterns for the learners. The results show that the hybrid knowledge-based approach achieves better recommendation accuracy and also overcomes the data sparsity and cold-start problems.
Wu et al. (2016) proposed a recommendation approach based on a deep recurrent neural network to provide real-time customized recommendation services. The approach traces users' browsing patterns using multiple hidden layers; each hidden layer models a combination of which web pages are accessed and in which order. To reduce the processing cost, the network stores only a finite number of states, and old states collapse into a single history state. The model also refreshes the recommendation results whenever the user opens a new web page. The recurrent neural network is then integrated with a feedforward network to improve prediction accuracy. The Kaola (http://www.kaola.com) dataset was used for experimental testing, and the results show that the deep recurrent neural network significantly outperforms the previous CF approach.
Lee et al. (2018) presented event-of-interest discovery using cross-domain media streams. The first part of their method normalizes the contents of the various datasets for comparison, and the second part merges the normalized content using a graph-based algorithm. The experimental results show that the proposed model achieves a 57% precision improvement compared with spanning graph and k-NN approaches.
Zhang et al. (2017a) proposed a CDRS using consistent information transfer, which maintains consistency while transferring knowledge from one domain to another. Five real-world datasets covering three domains (books, movies and music) were used for experimental testing, and the results show that consistent information transfer increases recommendation accuracy in the target domain.
Mirbakhsh and Ling (2015) proposed a CDRS using clustering-based matrix factorization. This method utilizes data from auxiliary domains more effectively to achieve better recommendations, principally for cold-start users, and the efficiently utilized cross-domain data improve accuracy. Amazon and Epinions real-world datasets were used for experimental testing.
Tan et al. (2014) proposed a CDRS based on a Bayesian hierarchical approach and Latent Dirichlet Allocation (LDA) for transferring user interests across media or domains. Their learning model combines multi-type media information: ratings, user-generated text and media descriptions. MovieLens and Amazon real-world datasets were used for experimental testing, and the results show that the method effectively addresses the data sparsity problem.
Hao et al. (2016) proposed cross-domain recommendation using social tags and collaborative filtering. Social tags connect different domains to handle the data sparsity problem. User tags are taken from two publicly available datasets, viz. the MovieLens and LibraryThing datasets. By connecting these two domains, the authors mitigate the sparsity problem and improve system efficiency.
The literature survey shows that many works have addressed cross-domain recommender systems. However, to our knowledge, cross-domain recommendation combining CF with sequential pattern mining has not been attempted. In this paper, three approaches are combined, viz. collaborative filtering, PrefixSpan and Topseq rules (CD-SPM). We found that CD-SPM gives a higher F1 Score, which reflects recommendation accuracy.
4. CD-SPM SYSTEM
In the CD-SPM system, book recommendation in e-commerce is performed using an ontology and sequential pattern mining. The CDRS covers two different domains, viz. movies and books. An ontology is used for knowledge representation (Nilashi et al., 2018; Kermany & Alizadeh, 2017). The Wpath method calculates semantic similarity, and, to improve the results, CF is used to predict the item-rating matrix. The PrefixSpan algorithm is applied to retrieve the sequential patterns of the items users prefer most, and Topseq rules help suggest the sequence of books. This method is expected to increase accuracy and reduce the data sparsity problem by using cross-domain datasets. Figure 2 shows the CD-SPM system architecture.
4.1. TECHNIQUES INVOLVED
4.1.1. Wpath
Wpath (weighted path length) is used to calculate the semantic similarity between two items. Cross-domain knowledge is represented using an ontology, and Wpath is applied to find the semantic similarity (Chergui et al., 2018). A major advantage of the Wpath method over conventional knowledge-based approaches is that it eliminates the uniform distance problem, under which any two pairs of nodes with the same path length always receive the same semantic similarity. Wpath combines the path length between concepts with the IC (Information Content) of their least common subsumer. Table 1 compares various semantic similarity measures, namely path, li, lin, res, jcn and Wpath, computed using WordNet. The table shows that the Wpath measure overcomes the uniform distance problem (Zhu & Iglesias, 2017): for example, Romance-Mystery, Romance-Fiction and Teen-Children all have a path similarity of 0.333, yet their Wpath values differ (1, 0.729 and 0.722). Hence, in this work, the Wpath measure is used to find the similarity between items. As an example, Figure 3 presents a fragment of the movie-book concept taxonomy. Wpath semantic similarity is calculated using equation 1.

$$\mathrm{sim}_{Wpath}(c_i, c_j) = \frac{1}{1 + \mathrm{length}(c_i, c_j) \cdot k^{IC(c_{lcs})}} \qquad (1)$$

where $k \in (0, 1]$, and $c_i$ and $c_j$ belong to the source and target domains respectively. When $k = 1$, IC makes no contribution to the shortest path length. $k$ controls the contribution of the IC of the least common subsumer (lcs), which denotes the information shared by the two concepts.

4.1.2. Collaborative Filtering
CF filters similar items, separating items according to users' shared interests and suggesting items based on the tastes of similar users (Kumar & Thakur, 2018). In this work, CF is used to predict missing user ratings from the user-rating dataset. Item-item similarity is computed using adjusted cosine similarity, and the prediction matrix is formed with the help of Singular Value Decomposition (SVD) (Nilashi et al., 2018). SVD is a matrix factorization approach used in fields such as data mining, machine learning and theoretical computer science, and the matrices it produces are widely used across data science. In this paper, the semantic similarity measure is first applied to the ontology, and the books most similar to the given movie are found using Wpath. CF is then applied to the item-rating matrix, from which the item-item similarity matrix is computed using the Adjusted Cosine similarity measure. The Adjusted Cosine similarity between two item vectors i and j is calculated using equation 2.

$$\mathrm{Sim}(i, j) = \frac{\sum_{u} (R_{u,i} - \bar{R}_u)(R_{u,j} - \bar{R}_u)}{\sqrt{\sum_{u} (R_{u,i} - \bar{R}_u)^2}\,\sqrt{\sum_{u} (R_{u,j} - \bar{R}_u)^2}} \qquad (2)$$

where i and j represent two books, $R_{u,i}$ is the rating given to item i by user u, $\bar{R}_u$ is the mean of all ratings provided by user u, and the sums run over the users u who rated both i and j. Similarly, the user-user similarity matrix is calculated using the Adjusted Cosine formula. Item ratings are then calculated using the prediction matrix (Li et al., 2018; Kumar et al., 2018), given by equation 3.

$$P_{u,i} = \frac{\sum_{t \in N} \mathrm{sim}(i, t)\, R_{u,t}}{\sum_{t \in N} \lvert \mathrm{sim}(i, t) \rvert} \qquad (3)$$

where N is the neighborhood of the items most similar to i that have been rated by the active user u, and sim(i, t) is the similarity between items i and t. After the prediction matrix is calculated, sequences of items are generated based on user interest, and the PrefixSpan algorithm is then applied to generate frequent sequences.
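As an illustration of equations 2 and 3, the following Python sketch computes adjusted cosine item-item similarity and the neighbourhood-weighted rating prediction. It is a minimal sketch under stated assumptions, not the authors' implementation: ratings are held in a nested dictionary {user: {item: rating}}, the toy ratings and the neighbourhood size k are hypothetical, only positively similar neighbours are kept (a common convention, assumed here), and the SVD step that the paper uses to form the prediction matrix is omitted.

```python
import math
from typing import Dict, Optional

# Ratings stored as {user: {item: rating}}; this layout is an assumption.
Ratings = Dict[str, Dict[str, float]]


def user_mean(ratings: Ratings, u: str) -> float:
    """Mean of all ratings given by user u (the R-bar_u term in equation 2)."""
    values = list(ratings[u].values())
    return sum(values) / len(values)


def adjusted_cosine(ratings: Ratings, i: str, j: str) -> float:
    """Equation 2, summing over the users who rated both item i and item j."""
    num = den_i = den_j = 0.0
    for u, r in ratings.items():
        if i in r and j in r:
            mean = user_mean(ratings, u)
            di, dj = r[i] - mean, r[j] - mean
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0.0 or den_j == 0.0:
        return 0.0  # no co-ratings or no variation: treat the items as unrelated
    return num / (math.sqrt(den_i) * math.sqrt(den_j))


def predict(ratings: Ratings, u: str, i: str, k: int = 2) -> Optional[float]:
    """Equation 3 over the k items rated by u that are most similar to i.

    Only positively similar neighbours are kept (an assumption of this sketch).
    """
    scored = [(adjusted_cosine(ratings, i, t), t) for t in ratings[u] if t != i]
    neighbours = [p for p in sorted(scored, reverse=True) if p[0] > 0][:k]
    denom = sum(abs(s) for s, _ in neighbours)
    if denom == 0.0:
        return None  # no usable neighbours
    return sum(s * ratings[u][t] for s, t in neighbours) / denom


# Hypothetical toy ratings: u4 has not rated book A.
ratings: Ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "u4": {"B": 5, "C": 1},
}
print(adjusted_cosine(ratings, "A", "B"))
print(predict(ratings, "u4", "A"))
```

Here A and B are positively similar while A and C are negatively similar, so the prediction for user u4 on book A is driven entirely by u4's rating of B.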
Figure 2: CD-SPM System Architecture
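Returning to equation 1 in Section 4.1.1, the sketch below computes Wpath similarity over a small taxonomy. The taxonomy, the IC values and the default k = 0.8 are hypothetical; the Table 1 values come from WordNet-based IC, which this sketch does not reproduce. Path length is counted in edges through the least common subsumer, which is valid for a tree-shaped taxonomy.

```python
from typing import Dict, List, Tuple

# Hypothetical toy taxonomy (child -> parent); not the paper's ontology.
PARENT: Dict[str, str] = {
    "Fiction": "Genre",
    "Comedy": "Genre",
    "Romance": "Fiction",
    "Mystery": "Fiction",
    "Thriller": "Mystery",
}

# Hypothetical information-content values, e.g. IC(c) = -log p(c) from a corpus.
IC: Dict[str, float] = {"Genre": 0.0, "Fiction": 1.2, "Mystery": 2.0}


def ancestors(concept: str) -> List[str]:
    """Return [concept, parent, grandparent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def lcs_and_length(a: str, b: str) -> Tuple[str, int]:
    """Least common subsumer of a and b and the path length (in edges) through it."""
    up_b = ancestors(b)
    for steps_a, node in enumerate(ancestors(a)):
        if node in up_b:
            return node, steps_a + up_b.index(node)
    raise ValueError(f"{a!r} and {b!r} share no common ancestor")


def wpath(a: str, b: str, ic: Dict[str, float], k: float = 0.8) -> float:
    """Equation 1: 1 / (1 + length(a, b) * k ** IC(lcs)), with k in (0, 1]."""
    if not 0.0 < k <= 1.0:
        raise ValueError("k must lie in (0, 1]")
    lcs, length = lcs_and_length(a, b)
    if length == 0:
        return 1.0  # identical concepts
    return 1.0 / (1.0 + length * k ** ic[lcs])


# Same path length (3 edges), different subsumers:
print(wpath("Romance", "Comedy", IC))    # lcs = Genre (IC 0)
print(wpath("Romance", "Thriller", IC))  # lcs = Fiction (IC 1.2)
```

Both printed pairs are three edges apart, so a plain path measure cannot tell them apart; Wpath scores Romance-Thriller higher because their subsumer (Fiction) is more informative than Genre. With k = 1 the IC term becomes 1 and the score reduces to 1/(1 + length), consistent with the remark after equation 1 that IC then has no contribution.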
4.1.3. PrefixSpan Algorithm
The PrefixSpan algorithm is applied to mine frequent patterns. It processes sequences efficiently and reduces the size of the projected databases. Using this algorithm, items that are both the most similar and the most preferred can be recommended to the user: it mines the sequences and retrieves the most preferred sequence of items (Pei et al., 2001; Ma & Ye, 2018).
4.1.4. Topseq Rules
Topseq Rules is a sequential rule mining algorithm that orders the items in the frequent sequences by mining the top-k sequential rules for a given confidence threshold, which is varied in the experiments. The algorithm offers excellent performance and scalability (Fournier-Viger & Tseng, 2011).

5. Experimental setup
5.1. DATASET DESCRIPTION
For the movie domain, we used the freely available MovieLens 100K dataset, which contains 100,000 ratings of 1682 movies given by 943 users¹. The details (user id, movie id, movie name, rating, genre) are obtained from it. For the book domain, we used a dataset from github.com that contains details of 207,572 books belonging to 30 genres². The details (book id, book name, genre id, genre name) are present in that dataset. The user ratings have been synthetically generated.
¹ https://grouplens.org/datasets/movielens/1m/
² https://github.com/uchidalab/book-dataset

5.2. Evaluation Metrics
5.2.1. Root Mean Square Error (RMSE)
RMSE computes the deviation between predicted and actual ratings (Zhang et al., 2017b). It is calculated using equation 4.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(P_i - r_i)^2} \qquad (4)$$

where n is the total number of ratings on the item set, $P_i$ is the user's predicted rating on item i, and $r_i$ is the user's actual rating on item i.

Figure 3: A fragment of Movie - Book concept taxonomy
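The evaluation measures can be sketched in Python as follows: RMSE as in equation 4, and precision, recall and F1 Score as defined for recommendation accuracy in Section 5.3 (CRb/TRb, CRb/Rb and their harmonic mean). The function names and the representation of recommendations as sets of book identifiers are assumptions for illustration only.

```python
import math
from typing import Sequence, Set, Tuple


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Equation 4: root of the mean squared deviation between P_i and r_i."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def precision_recall(recommended: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = CRb / TRb and Recall = CRb / Rb."""
    correct = len(recommended & relevant)  # CRb: correctly recommended books
    precision = correct / len(recommended) if recommended else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: four books recommended, three relevant, two overlap.
p, r = precision_recall({"b1", "b2", "b3", "b4"}, {"b1", "b2", "b5"})
print(p, r, f1_score(p, r))
print(rmse([4.0, 3.0, 5.0], [5.0, 3.0, 4.0]))
```

As a consistency check, f1_score(0.818, 0.545) rounds to 0.654, the CF-KNN F1 Score reported for Toy Story in Table 5.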
5.3. Recommendation accuracy
The accuracy of an RS is the most essential aspect of recommendation: it measures how well the algorithm suggests items that match the user's interests. Precision is the ratio of Correctly Recommended books ($CR_b$) to Total Recommended books ($TR_b$), i.e., the fraction of recommendations that are correct positive observations (Mannepalli et al., 2018).

$$\mathrm{Precision} = \frac{CR_b}{TR_b} \qquad (5)$$

Recall denotes the presence of relevant, preferred items in the sequence of recommended items (Tong et al., 2018). It is the ratio of Correctly Recommended books ($CR_b$) to Relevant books ($R_b$).

$$\mathrm{Recall} = \frac{CR_b}{R_b} \qquad (6)$$

The F1 Score is the harmonic mean of precision and recall. It is a useful single measure because it balances the two.

$$\mathrm{F1\ Score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (7)$$

Table 1: Illustration of semantic similarity methods on example concept pairs (Movie-Book)
Concept pair | Path | li | lin | Res | Jcn | Wpath
Romance-Mystery | 0.333 | 0.112 | 1 | 11.75 | 1 | 1
Romance-Fantasy | 0.25 | 0.547 | 0.672 | 7.559 | 0.119 | 0.642
Comedy-Mystery | 0.0909 | 0.112 | 0.247 | 2.619 | 0.0591 | 0.152
Romance-Fiction | 0.333 | 0.669 | 0.785 | 7.559 | 0.194 | 0.729
Fantasy-Thriller | 0.2 | 0.448 | 0.59 | 7.559 | 0.089 | 0.574
Romance-Thriller | 0.25 | 0.548 | 0.59 | 7.797 | 0.086 | 0.655
Teen-Children | 0.333 | 0.67 | 0.848 | 7.387 | 0.274 | 0.722
Teen-Young | 0.166 | 0.367 | 0.253 | 2.333 | 0.069 | 0.251
Children-Young | 0.2 | 0.448 | 0.3 | 2.333 | 0.086 | 0.296

6. EXPERIMENTAL RESULTS
6.1. Analysis of error rate
Our experimental results show that for user-based CF, the Root Mean Square Error (RMSE) is highest for the movie Get Shorty (2.95) and lowest for Toy Story (2.530). For item-based CF, RMSE is highest for Get Shorty (3.37) and lowest for Twelve Monkeys (2.96). For user-based CF, the Mean Square Error (MSE) is highest for Get Shorty (3.50) and lowest for Sabrina (3.06). Table 2 is represented graphically in Figure 4.

Table 2: Comparison of error rates
Measure | Toy Story | Get Shorty | Twelve Monkeys | Sabrina | City of Lost Children
User-based CF RMSE | 2.530 | 2.954 | 2.613 | 2.535 | 2.517
Item-based CF RMSE | 3.253 | 3.376 | 2.967 | 3.187 | 3.194
User-based CF MSE | 3.298 | 3.503 | 3.145 | 3.067 | 3.302

Figure 4: Graphical Representation of error rates

6.2. Support vs Number of Sequential patterns and Total time required
Using support values of 0.3, 0.5 and 0.7, the corresponding number of sequential patterns generated and the total time taken for the movies 1) Toy Story 2) Get Shorty 3) Twelve Monkeys 4) Sabrina 5) City of Lost Children are presented in Table 3.

Table 3: Support vs Number of Sequential patterns and Total time required
Support | Measure | Toy Story | Get Shorty | Twelve Monkeys | Sabrina | City of Lost Children
0.3 | No. of sequential patterns | 4375 | 6284 | 868 | 5756 | 3708
0.3 | Time required (ms) | 122 | 206 | 21 | 108 | 172
0.5 | No. of sequential patterns | 3288 | 2289 | 252 | 2665 | 2487
0.5 | Time required (ms) | 102 | 135 | 12 | 92 | 83
0.7 | No. of sequential patterns | 2873 | 1323 | 102 | 1885 | 1143
0.7 | Time required (ms) | 95 | 60 | 7 | 50 | 35

6.2.1. Support vs Number of Sequential patterns
For the different movies, the number of sequential patterns generated is plotted against support in Figure 5. Figure 5 shows that the number of sequential patterns decreases as the support value increases.

Figure 5: Support vs Number of Sequential patterns
Figure 6: Support vs Total time required

6.2.2. Support vs Total time required
For the different movies, the total time taken is plotted against support in Figure 6. Figure 6 shows that the execution time decreases as the support value increases. From this analysis, a support value of 0.5 gives a good balance between the number of sequential patterns and execution time, so the minimum support threshold is set to 0.5 in our work.
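The frequent-pattern step of Section 4.1.3, run with a minimum support threshold such as the 0.5 chosen in Section 6.2.2, can be sketched as a simplified pure-Python PrefixSpan. Assumptions: each sequence is a list of single book identifiers (full PrefixSpan also handles itemset elements), support is a fraction of the number of sequences, and the toy database is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def prefixspan(db: List[List[str]], min_support: float) -> Dict[Pattern, int]:
    """Mine all frequent sequential patterns with their support counts.

    Simplified PrefixSpan: every sequence element is a single item, and
    min_support is a fraction of the number of sequences in db.
    """
    n = len(db)
    patterns: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[List[str]]) -> None:
        # Support of prefix + item = number of projected sequences containing item.
        counts: Dict[str, int] = defaultdict(int)
        for seq in projected:
            for item in set(seq):
                counts[item] += 1
        for item, count in sorted(counts.items()):
            if count / n < min_support:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = count
            # Projected database: the postfix after the first occurrence of item.
            postfixes = [seq[seq.index(item) + 1:] for seq in projected if item in seq]
            mine(new_prefix, [s for s in postfixes if s])

    mine((), db)
    return patterns


# Hypothetical per-user book sequences.
db = [["b1", "b2", "b3"], ["b1", "b3"], ["b2", "b3"], ["b1", "b2"]]
for pattern, count in prefixspan(db, min_support=0.5).items():
    print(pattern, count)
```

Raising min_support to 0.75 leaves only the three single-book patterns, mirroring the trend in Table 3 that higher support yields fewer patterns.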
6.3. Confidence vs Number of sequential rules and Total time required
While generating sequential rules with confidence values of 0.6, 0.8 and 1.0, the corresponding number of sequential rules generated and the total time required for the movies 1) Toy Story 2) Get Shorty 3) Twelve Monkeys 4) Sabrina 5) City of Lost Children are presented in Table 4.

6.3.1. Confidence vs Number of sequential rules
For the different movies, the number of sequential rules generated is plotted against confidence in Figure 7. Figure 7 shows that the number of sequential rules decreases as the confidence value increases.

Figure 7: Confidence vs Number of sequential rules

6.3.2. Confidence vs Total time required
For the different movies, the total time taken is plotted against confidence in Figure 8. Figure 8 shows that the execution time decreases as the confidence value increases.

Figure 8: Confidence vs Total time

From this analysis, a confidence value of 0.8 gives a good balance between the number of sequential rules and execution time, so the confidence threshold was set to 0.8 for measuring recommendation accuracy.

Table 4: Confidence vs No. of Sequential rules and Total time required
Confidence | Measure | Toy Story | Get Shorty | Twelve Monkeys | Sabrina | City of Lost Children
60% | No. of sequential rules | 8658 | 9684 | 17874 | 12032 | 6494
60% | Time required (ms) | 30 | 115 | 103 | 85 | 22
80% | No. of sequential rules | 5932 | 5764 | 9030 | 8646 | 4462
80% | Time required (ms) | 22 | 107 | 86 | 70 | 18
100% | No. of sequential rules | 5674 | 2553 | 8471 | 8208 | 3927
100% | Time required (ms) | 20 | 94 | 80 | 63 | 15

Figure 9: RMSE Comparison between CF and CD-SPM
Figure 10: Precision comparison between CF-KNN and CD-SPM (CF + SPM)
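The rule step of Section 4.1.4 can be illustrated with a brute-force sketch of top-k sequential rules under a confidence threshold, such as the 0.8 chosen in Section 6.3. This is not the TopSeqRules algorithm itself, which expands rules with itemset antecedents and consequents and raises its internal support threshold dynamically (Fournier-Viger & Tseng, 2011); the sketch only enumerates single-item rules x ⇒ y to show how support, confidence and k interact. The toy sequences and function names are hypothetical.

```python
from itertools import permutations
from typing import List, Tuple

Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)


def rule_holds(seq: List[str], x: str, y: str) -> bool:
    """x => y holds in seq if some occurrence of x precedes some occurrence of y."""
    if x not in seq or y not in seq:
        return False
    last_y = len(seq) - 1 - seq[::-1].index(y)
    return seq.index(x) < last_y


def top_k_rules(db: List[List[str]], k: int, min_conf: float) -> List[Rule]:
    """Brute-force top-k single-item sequential rules ranked by support.

    sup(x => y)  = |sequences where the rule holds| / |db|
    conf(x => y) = |sequences where the rule holds| / |sequences containing x|
    """
    n = len(db)
    items = sorted({item for seq in db for item in seq})
    rules: List[Rule] = []
    for x, y in permutations(items, 2):
        with_x = sum(1 for seq in db if x in seq)
        holds = sum(1 for seq in db if rule_holds(seq, x, y))
        if holds == 0:
            continue
        confidence = holds / with_x
        if confidence >= min_conf:
            rules.append((x, y, holds / n, confidence))
    # Highest support first, then highest confidence, then alphabetical.
    rules.sort(key=lambda r: (-r[2], -r[3], r[0], r[1]))
    return rules[:k]


# Hypothetical per-user book sequences.
db = [["b1", "b2", "b3"], ["b1", "b3"], ["b2", "b3"], ["b1", "b2"]]
for rule in top_k_rules(db, k=2, min_conf=0.6):
    print(rule)
```

With min_conf = 0.6, the rules b1 ⇒ b2, b1 ⇒ b3 and b2 ⇒ b3 all have support 0.5 and confidence 2/3, and k = 2 keeps the first two; raising min_conf to 0.8 removes all of them, in line with the decrease in rules reported in Table 4.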
6.4. RMSE Comparisons between CD-SPM and CF

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(P_i - A_i)^2} \qquad (8)$$

where n is the total number of items in the item set, $P_i$ is the total number of predicted items for user $u_i$, and $A_i$ is the number of actual recommended items for user $u_i$. In CD-SPM, sequential patterns are generated using the PrefixSpan algorithm and the most frequent rules are then generated using Topseq rules. The errors are compared using the RMSE values calculated with equation 8 (Figure 9): CD-SPM, which uses both CF and SPM, achieves a lower error rate than CF.

6.5. Precision, Recall and F1 Score comparison of the proposed CD-SPM and the existing CF-KNN approach
In the comparative analysis, the proposed CD-SPM approach is compared with CF-KNN (collaborative filtering with the well-known machine learning algorithm k nearest neighbor (KNN)) (Bilal et al., 2016). Five different movies, described earlier, were used, and the proposed CD-SPM approach was found to give better recommendation accuracy, with higher Precision, Recall and F1 Score values (Table 5). Figures 10, 11 and 12 compare the two approaches with respect to Precision, Recall and F1 Score respectively. This is because the proposed approach uses the PrefixSpan algorithm, which retrieves the sequential patterns of the most similar items, and Topseq rules, which generate the most frequent book sequences.

Table 5: Comparison of Precision, Recall and F1 score
Movie | Precision (CF-KNN) | Precision (CD-SPM) | Recall (CF-KNN) | Recall (CD-SPM) | F1 Score (CF-KNN) | F1 Score (CD-SPM)
Toy Story | 0.818 | 0.933 | 0.545 | 0.933 | 0.654 | 0.933
Get Shorty | 0.733 | 0.964 | 0.740 | 0.893 | 0.736 | 0.926
Twelve Monkeys | 0.80 | 0.98 | 0.50 | 0.92 | 0.615 | 0.949
Sabrina | 0.75 | 0.968 | 0.50 | 0.933 | 0.60 | 0.950
City of Lost Children | 0.813 | 0.893 | 0.56 | 0.885 | 0.663 | 0.889

7. Conclusion
The proposed CD-SPM approach recommends the most preferred items from different domains with better recommendation accuracy by combining Wpath, Collaborative Filtering and SPM. Wpath finds the semantic similarity of items belonging to multiple domains, the PrefixSpan algorithm retrieves the frequent sequences, and Topseq rules fetch the preferred items in sequence. First, errors were compared using RMSE on five different movies, and the results show that the proposed system gives a lower error rate. Then, the pattern mining approach was evaluated using precision, recall and F1 Score, and the results reveal that CD-SPM performs better than the CF-KNN approach. The proposed approach alleviates the new user problem and the sparsity problem to some extent, since the knowledge (ratings) of one domain is applied in another domain. It also provides diversified recommendations across the two domains considered.
In future, multi-domain recommendation can be incorporated to provide even more diversified recommendations. Transfer learning can be used so that learning from one domain can be transferred to another. Furthermore, contextual recommendations can be provided that account for users' interests changing over time, and their impact on recommendation results can be measured. Moreover, a Recurrent Neural Network (RNN) can be applied to model user sessions and provide personalized recommendations.
Figure 11: Recall comparison between CF-KNN and CD-SPM (CF + SPM)
References Abdullah, L., Ramli, R., Bakodah, H., & Othman, M. (2019). Developing a causal relationship among factors of e-commerce: a decision making approach. Journal of King Saud University-Computer and Information Sciences, . Al-Adrousy, W. M., Ali, H. A., & Hamza, T. T. (2015). A recommender system for team formation in manet. Journal of King Saud University-Computer and Information Sciences, 27, 147–159. Al-Nazer, A., & Helmy, T. (2015). Personalizing health and food advices by semantic enrichment of multilingual cross-domain questions. In GCC Conference and Exhibition (GCCCE), 2015 IEEE 8th (pp. 1–6). IEEE. Ali, F., Kwak, D., Khan, P., Ei-Sappagh, S. H. A., Islam, S. R., Park, D., & Kwak, K.-S. (2017). Merged ontology and svm-based information extraction and recommendation system for social robots. IEEE Access, 5, 12364–12379. Balabanovi´c, M., & Shoham, Y. (1997). Fab: content-based, collaborative recommendation. Communications of the ACM, 40, 66–72. Bilal, M., Israr, H., Shahid, M., & Khan, A. (2016). Sentiment classification of roman-urdu opinions using na¨ıve bayesian, deci-
Figure 12: F1 Score comparison between CF-KNN and CD-SPM (CF + SPM)
10
sion tree and knn classification techniques. Journal of King Saud University-Computer and Information Sciences, 28, 330–344.
Burke, R. (2007). Hybrid web recommender systems. In The adaptive web (pp. 377–408). Springer.
Cambria, E., Poria, S., Gelbukh, A., & Thelwall, M. (2017). Sentiment analysis is a big suitcase. IEEE Intelligent Systems, 32, 74–80.
Carrer-Neto, W., Hernández-Alcaraz, M. L., Valencia-García, R., & García-Sánchez, F. (2012). Social knowledge-based recommender system. Application to the movies domain. Expert Systems with Applications, 39, 10990–11000.
Chergui, W., Zidat, S., & Marir, F. (2018). An approach to the acquisition of tacit knowledge based on an ontological model. Journal of King Saud University-Computer and Information Sciences.
Fernández-Tobías, I., Cantador, I., Tomeo, P., Anelli, V. W., & Di Noia, T. (2019). Addressing the user cold start with cross-domain collaborative filtering: exploiting item metadata in matrix factorization. User Modeling and User-Adapted Interaction, (pp. 1–44).
Fournier-Viger, P., Nkambou, R., & Tseng, V. S.-M. (2011). Rulegrowth: mining sequential rules common to several sequences by pattern-growth. In Proceedings of the 2011 ACM Symposium on Applied Computing (pp. 956–961). ACM.
Fournier-Viger, P., & Tseng, V. S. (2011). Mining top-k sequential rules. In International Conference on Advanced Data Mining and Applications (pp. 180–194). Springer.
Hao, P., Zhang, G., & Lu, J. (2016). Enhancing cross domain recommendation with domain dependent tags. In Fuzzy Systems (FUZZ-IEEE), 2016 IEEE International Conference on (pp. 1266–1273). IEEE.
Hwangbo, H., & Kim, Y. (2017). An empirical study on the effect of data sparsity and data overlap on cross domain collaborative filtering performance. Expert Systems with Applications, 89, 254–265.
Jiang, M., Cui, P., Chen, X., Wang, F., Zhu, W., & Yang, S. (2015). Social recommendation with cross-domain transferable knowledge. IEEE Transactions on Knowledge and Data Engineering, 27, 3084–3097.
Kermany, N. R., & Alizadeh, S. H. (2017). A hybrid multi-criteria recommender system using ontology and neuro-fuzzy techniques. Electronic Commerce Research and Applications, 21, 50–64.
Kumar, P., Kumar, V., & Thakur, R. S. (2018). A new approach for rating prediction system using collaborative filtering. Iran Journal of Computer Science, (pp. 1–7).
Kumar, P., & Thakur, R. S. (2018). Recommendation system techniques and related issues: a survey. International Journal of Information Technology, 10, 495–501.
Lee, J., Hwang, W.-S., Parc, J., Lee, Y., Kim, S.-W., & Lee, D. (2019). l-injection: Toward effective collaborative filtering using uninteresting items. IEEE Transactions on Knowledge and Data Engineering, 31, 3–16.
Lee, W.-Y., Hsu, W. H., & Satoh, S. (2018). Learning from cross-domain media streams for event-of-interest discovery. IEEE Transactions on Multimedia, 20, 142–154.
Li, B., Yang, Q., & Xue, X. (2009). Can movies and books collaborate? Cross-domain collaborative filtering for sparsity reduction. In IJCAI (pp. 2052–2057). volume 9.
Li, W., Cao, J., Wu, J., Huang, C., & Buyya, R. (2018). A collaborative filtering recommendation method based on discrete quantum-inspired shuffled frog leaping algorithms in social networks. Future Generation Computer Systems.
Li, Y., Niu, Z., Chen, W., & Zhang, W. (2011). Combining collaborative filtering and sequential pattern mining for recommendation in e-learning environment. In International Conference on Web-Based Learning (pp. 305–313). Springer.
Liao, C.-L., & Lee, S.-J. (2016). A clustering based approach to improving the efficiency of collaborative filtering recommendation. Electronic Commerce Research and Applications, 18, 1–9.
Liu, H., Hu, Z., Mian, A., Tian, H., & Zhu, X. (2014). A new user similarity model to improve the accuracy of collaborative filtering. Knowledge-Based Systems, 56, 156–166.
Liu, L., Cui, J., Song, W., & Wang, H. (2017). Multi-domain collaborative recommendation with feature selection. China Communications, 14, 137–148.
Ma, X., & Ye, L. (2018). Career goal-based e-learning recommendation using enhanced collaborative filtering and prefixspan. International Journal of Mobile and Blended Learning (IJMBL), 10, 23–37.
Mannepalli, K., Sastry, P. N., & Suman, M. (2018). Emotion recognition in speech signals using optimization based multi-svnn classifier. Journal of King Saud University-Computer and Information Sciences.
Maylawati, D., Aulawi, H., & Ramdhani, M. (2018). The concept of sequential pattern mining for text. In IOP Conference Series: Materials Science and Engineering (p. 012042). IOP Publishing volume 434.
Mirbakhsh, N., & Ling, C. X. (2015). Improving top-n recommendation for cold-start users via cross-domain information. ACM Transactions on Knowledge Discovery from Data (TKDD), 9, 33.
Nguyen, L. T., Vo, B., Nguyen, L. T., Fournier-Viger, P., & Selamat, A. (2018). Etarm: an efficient top-k association rule mining algorithm. Applied Intelligence, 48, 1148–1160.
Nguyen, V.-D., Sriboonchitta, S., & Huynh, V.-N. (2017). Using community preference for overcoming sparsity and cold-start problems in collaborative filtering system offering soft ratings. Electronic Commerce Research and Applications, 26, 101–108.
Nilashi, M., Ibrahim, O., & Bagherifard, K. (2018). A recommender system based on collaborative filtering using ontology and dimensionality reduction techniques. Expert Systems with Applications, 92, 507–520.
Obeid, C., Lahoud, I., El Khoury, H., & Champin, P.-A. (2018). Ontology-based recommender system in higher education. In Companion of the The Web Conference 2018 on The Web Conference 2018 (pp. 1031–1034). International World Wide Web Conferences Steering Committee.
Ozer, M., Keles, I., Toroslu, H., Karagoz, P., & Davulcu, H. (2016). Predicting the location and time of mobile phone users by using sequential pattern mining techniques. The Computer Journal, 59, 908–922. URL: http://dx.doi.org/10.1093/comjnl/bxv075. doi:10.1093/comjnl/bxv075.
Patil, L., Dutta, D., & Sriram, R. (2005). Ontology-based exchange of product data semantics. IEEE Transactions on Automation Science and Engineering, 2, 213–225.
Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., & Hsu, M.-C. (2001). Prefixspan: Mining sequential patterns efficiently by prefix-projected pattern growth. In icccn (p. 0215). IEEE.
Roy, S. D., Mei, T., Zeng, W., & Li, S. (2012). Empowering cross-domain internet media with real-time topic learning from social streams. In Multimedia and Expo (ICME), 2012 IEEE International Conference on (pp. 49–54). IEEE.
Saleena, B., & Srivatsa, S. (2015). Using concept similarity in cross ontology for adaptive e-learning systems. Journal of King Saud University-Computer and Information Sciences, 27, 1–12.
Taghavi, M., Bentahar, J., Bakhtiyari, K., & Hanachi, C. (2017). New insights towards developing recommender systems. The Computer Journal, 61, 319–348.
Tan, S., Bu, J., Qin, X., Chen, C., & Cai, D. (2014). Cross domain recommendation based on multi-type media fusion. Neurocomputing, 127, 124–134.
Tarus, J. K., Niu, Z., & Kalui, D. (2018). A hybrid recommender system for e-learning based on context awareness and sequential pattern mining. Soft Computing, 22, 2449–2461.
Tong, C., Yin, X., Li, J., Zhu, T., Lv, R., Sun, L., & Rodrigues, J. J. P. C. (2018). A shilling attack detector based on convolutional neural network for collaborative recommender system in social aware network. The Computer Journal, 61, 949–958. URL: http://dx.doi.org/10.1093/comjnl/bxy008. doi:10.1093/comjnl/bxy008.
Wang, D., Liang, Y., Xu, D., Feng, X., & Guan, R. (2018). A content-based recommender system for computer science publications. Knowledge-Based Systems, 157, 1–9.
Wang, Z., Liao, J., Cao, Q., Qi, H., & Wang, Z. (2015). Friendbook: a semantic-based friend recommendation system for social networks. IEEE Transactions on Mobile Computing, 14, 538–551.
Wu, S., Ren, W., Yu, C., Chen, G., Zhang, D., & Zhu, J. (2016). Personal recommendation using deep recurrent neural networks in netease. In Data Engineering (ICDE), 2016 IEEE 32nd International Conference on (pp. 1218–1229). IEEE.
Zhang, Q., Wu, D., Lu, J., Liu, F., & Zhang, G. (2017a). A cross-domain recommender system with consistent information transfer. Decision Support Systems, 104, 49–63.
Zhang, Y., Chen, M., Huang, D., Wu, D., & Li, Y. (2017b). idoctor: Personalized and professionalized medical recommendations based on hybrid matrix factorization. Future Generation Computer Systems, 66, 30–35.
Zheng, L., Zhu, F., & Mohammed, A. (2017). Attribute and global boosting: A rating prediction method in context-aware recommendation. The Computer Journal, 60, 957–968. URL: http://dx.doi.org/10.1093/comjnl/bxw016. doi:10.1093/comjnl/bxw016.
Zhou, M., Ding, Z., Tang, J., & Yin, D. (2018). Micro behaviors: A new perspective in e-commerce recommender systems. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (pp. 727–735). ACM.
Zhu, G., & Iglesias, C. A. (2017). Computing semantic similarity of concepts in knowledge graphs. IEEE Transactions on Knowledge and Data Engineering, 29, 72–85.
Zouaoui, S., & Rezeg, K. (2018). Islamic inheritance calculation system based on arabic ontology (arafamonto). Journal of King Saud University-Computer and Information Sciences.