An improved recommendation algorithm for big data cloud service based on the trust in sociology


Accepted Manuscript

An Improved Recommendation Algorithm for Big data Cloud Service based on the Trust in Sociology

Chunyong Yin, Jin Wang, Jong Hyuk Park

PII: S0925-2312(17)30413-7
DOI: 10.1016/j.neucom.2016.07.079
Reference: NEUCOM 18163

To appear in: Neurocomputing

Received date: 5 May 2016
Revised date: 23 June 2016
Accepted date: 10 July 2016

Please cite this article as: Chunyong Yin , Jin Wang , Jong Hyuk Park , An Improved Recommendation Algorithm for Big data Cloud Service based on the Trust in Sociology, Neurocomputing (2017), doi: 10.1016/j.neucom.2016.07.079

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Chunyong Yin1, Jin Wang2, Jong Hyuk Park3*

1 School of Computer and Software, Jiangsu Engineering Center of Network Monitoring, Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing University of Information Science & Technology, Nanjing, China
2 College of Information Engineering, Yangzhou University, Yangzhou, China
3 Department of Computer Science and Engineering, Seoul National University of Science and Technology, Korea

SUMMARY

Personal recommendation technology has become a useful and popular way to address the problem of information overload as big data cloud services grow in popularity. However, most recommendation algorithms concentrate on similarity and neglect the social trust between users. This paper therefore studies a hybrid recommendation algorithm for big data that combines similarity with trust in sociology. We introduce several user trust models, including the trust path model and the loop trust model, and incorporate them into the calculation of a mixed weighting. The experimental results show that the recommendation algorithm that takes the trust models into account is more accurate than the traditional recommendation algorithm, with a 2% improvement in both MAE (Mean Absolute Error) and RMSE (Root Mean Square Error).

KEYWORDS: Recommendation technology, trust model, similarity, big data, hybrid recommendation

1. INTRODUCTION

With the continuous development of network technology, users can obtain large amounts of information through many different channels, and in recent years the total volume of data on the internet has been growing exponentially. A recent research report shows that 95% of the data generated by human civilization was generated in the past 6 years [1]. We have entered the age of big data.

Big data is the massive data generated by the rapid development of modern society; it is also a research hot spot in academia and industry because of its great application prospects. Big data is changing the way we live and work: a big data cloud service lets users access big data at any time and place with simple information and communication equipment. These data bring users a lot of convenience in daily life and work, but they also cause problems. The rapid growth of data submerges users in a sea of information, and many users waste a lot of time finding the information that is useful to them. This is called the problem of information overload. To solve this problem, which big data exacerbates, personal recommendation technology has become a popular research direction in recent years.

* Jong Hyuk Park (SeoulTech, Corresponding Author)

Because big data exacerbates the problem of information overload, many research institutions and companies such as Alibaba and Netflix have focused on big-data-based recommendation technology in recent years. Recommendation technology and search engine technology are both used to ease information overload, but they differ: personal recommendation technology derives its results from user behavior data instead of keywords explicitly provided by the user, so it usually performs better in fuzzy search and random search of items [2].

Personal recommendation technology has been developed for more than 20 years, since the 1990s [3]. In academia, the earliest research in this field is a film rating system called MovieLens, developed by GroupLens at the University of Minnesota in the United States [4]. The system collects users' rating scores for their favorite movies and then, by analyzing the score data, suggests movies they may be interested in [5].

In practical applications, personal recommendation technology has proved to have good application prospects [6]. For example, Amazon has increased its sales by 30% thanks to recommendation technology.

In academia, ACM has held the Conference on Recommendation System since 2008; many companies, such as Netflix, have also opened up their data sets for research.

Before the formal research, we introduce the basic architecture of a complete recommendation system. A complete recommendation system consists of the parts shown in Figure 1.

Figure 1. Recommendation system

First, the system collects user data into a user database from the historical records of all users. When a user asks the system for help, the system uses its built-in recommendation algorithms to give the target user useful recommendations based on an analysis of the user data and item data [7]. Among these parts, the algorithm module affects the overall quality of the system [8]. Many researchers therefore focus on the design or optimization of recommendation algorithms, and great progress has been made at home and abroad in this research direction.

2. RELATED WORKS

Recommendation technology did not have a mature theoretical framework until the end of the 20th century [9]. In the 21st century, the technology entered a period of high-speed development because of the rapid development and popularization of the internet.

2.1. Previous research on the recommendation algorithm

Su X and Khoshgoftaar T M proposed a collaborative filtering recommendation algorithm based on Bayesian networks [10]; it improves the recommendation accuracy on sparse data by optimizing the Bayesian networks. A Bayesian network is a graphical model that describes the dependencies among variables, and it was developed to deal with uncertainty problems in artificial intelligence research. A Bayesian network expresses the conditional independence relationships between nodes, so Su X builds the corresponding Bayesian network topology from the independent feature attributes of the objects to generate a Bayesian training data set, and then uses the collaborative filtering recommendation algorithm to process the data after the users have been correctly classified.

Sarwar used a matrix decomposition technique to reduce the dimensionality [11]. Singular value decomposition is a matrix decomposition technique that extracts algebraic features effectively. Sarwar uses singular value decomposition to analyze the interests of users according to the implied semantic relations, and the system then provides recommendations based on those interests.

Feng Zhang proposed a hybrid recommendation algorithm based on the BP artificial neural network and collaborative filtering to solve the problem of data sparseness [12]. A BP artificial neural network is good at learning and modeling complex relationships between input and output, which makes it well suited to dealing with incomplete information. This technology can therefore be used to predict the missing scores, which improves the quality of the system and eases the problem of data sparseness.

However, the above algorithms concentrate on similarity and neglect the social trust between users. Social networks have been developing rapidly in recent years, and users are related not only by similarity but also by social relationships. This paper therefore proposes some trust models and then focuses on a weighting that combines similarity and trust in a hybrid recommendation algorithm.

2.2. Related theoretical support of the research in the paper

Trust is a very important relationship in our daily life, and the same holds in a recommendation system. There are two kinds of trust [13] in the system. One is the trust between the users and the recommendation system, and the other is the trust between different users. The former kind of trust can be improved by increasing the transparency of the recommendation system, which makes users trust the recommendations that the system provides. The latter kind of trust takes the social network of the users into consideration and helps the system provide recommendations through the users' friends: because users trust their friends, they also tend to trust recommendations for items that their friends have bought.

Research on trust in recommendation systems has mainly concentrated on the recommendation system of the review site Epinions [14], because Epinions built a trust system that expresses the trust relationships between users to help them decide whether to trust the comments on an item. Young Ae Kim also built a trust prediction framework for the users of a popular online community, where people can easily share their good and bad experiences of various products and services with a large number of unknown people as well as with their friends [15]. Some definitions from that trust prediction framework are useful for the recommendation system studied in this paper.

We therefore believe that trust plays a very important role in the modern recommendation system.

3. COLLABORATIVE FILTERING RECOMMENDATION ALGORITHM

The collaborative filtering recommendation algorithm (CFRA) is a popular algorithm in information filtering and information systems [13]. CFRA is mainly based on users or items that may have similar properties. Its basic idea is to find the nearest-neighbor set of the target user from the user behavior data collected by the recommendation system, and then to choose the items that the nearest neighbors have chosen as recommendations for the target user [14]. It has three steps: pretreatment of the data set, finding the nearest neighbors, and providing the recommendation results [15].

3.1. Pretreatment of data set

Most data collected by the recommendation system is cluttered and redundant and cannot be used directly, so some pretreatment must be done on the data first [16, 20, 23, 24]. We therefore put the historical rating data into a matrix called the user-item matrix S[m,n]. Table 1 shows the user-item matrix.

Table 1. User-item matrix

         Item1   ......   Itemj   ......   Itemn
User1    S1,1    ......   S1,j    ......   S1,n
......   ......  ......   ......  ......   ......
Useri    Si,1    ......   Si,j    ......   Si,n
......   ......  ......   ......  ......   ......
Userm    Sm,1    ......   Sm,j    ......   Sm,n

In the matrix, the m rows represent the users and the n columns represent the items; the element Si,j is the rating score that Useri gave to Itemj.
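As a minimal sketch of this pretreatment step, the snippet below builds a user-item matrix from rating records. The record layout `(user, item, score)`, the function name `build_user_item_matrix`, and the use of `None` for missing scores are assumptions made for illustration; the paper does not prescribe a data format.

```python
def build_user_item_matrix(records):
    """Build S[m,n] from (user, item, score) records.

    Rows follow the sorted user ids and columns the sorted item ids.
    Cells without a rating stay None (an illustrative choice).
    """
    records = list(records)
    users = sorted({user for user, _, _ in records})
    items = sorted({item for _, item, _ in records})
    row = {user: r for r, user in enumerate(users)}
    col = {item: c for c, item in enumerate(items)}
    matrix = [[None] * len(items) for _ in users]
    for user, item, score in records:
        matrix[row[user]][col[item]] = score
    return users, items, matrix


users, items, S = build_user_item_matrix(
    [("u1", "a", 5), ("u2", "b", 3), ("u1", "b", 4)]
)
```

Here `S[0]` is the row of `u1` and `S[1][0]` stays `None` because `u2` never scored item `a`.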

However, the matrix often has high dimensionality but little useful data, because most users interact with only a few items; and with the continuous expansion of network data, it is hard to generate such a large matrix in a short time.

3.2. Finding the nearest neighbors

This step is very important, because the choice of nearest neighbors (NN) affects the quality of the algorithm. In this step, CFRA first computes the similarity between the target user and every other user in the user set using one of several similarity formulas, and then finds the nearest neighbors according to the similarity values [21]. Sorting the users by similarity from largest to smallest yields a set NN = {U1, U2, U3, ..., UP}, called the nearest-neighbor set. Next, the most commonly used methods for calculating the similarity are introduced.

The cosine method is one of the most commonly used methods in the information retrieval field. Its basic idea is to measure the similarity of two vectors by calculating the cosine of the angle between them; with the development of recommendation technology, it is also used in recommendation systems.

Assuming that i and j are two different rating vectors, the formula (1) of the cosine method is as follows:

$$ \mathrm{sim}(i, j) = \cos(i, j) = \frac{i \cdot j}{\|i\| \times \|j\|} \tag{1} $$

In formula (1), the similarity is the value of cos(i,j). The smaller the angle between the two vectors, the greater their similarity.
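Formula (1) can be sketched in a few lines of Python. The function name `cosine_similarity` and the convention of returning 0 for an all-zero vector are assumptions for illustration, not part of the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a zero vector is treated as not similar
    return dot / (norm_u * norm_v)
```

Two vectors that point in the same direction, such as `[1, 2, 3]` and `[2, 4, 6]`, get a similarity of 1 even though their values differ, which is the insensitivity to absolute values that motivates the modified cosine method.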

The cosine method mainly reflects differences in direction between two vectors and is not sensitive to absolute values. It is therefore sometimes hard to tell vectors apart, which leads to errors in the results. To correct this error, the modified cosine method subtracts a mean from the values of all dimensions before applying the cosine method.

Assuming that S_{m,n} is the score of user m on item n, the formula (2) of the modified cosine method is as follows:

$$ \mathrm{sim}(i, j) = \frac{\sum_{c \in I_{i,j}} (S_{i,c} - \bar{S}_i)(S_{j,c} - \bar{S}_j)}{\sqrt{\sum_{c \in I_i} (S_{i,c} - \bar{S}_i)^2} \, \sqrt{\sum_{c \in I_j} (S_{j,c} - \bar{S}_j)^2}} \tag{2} $$

Here \bar{S}_i is the mean score of user i, I_i is the set of items scored by user i, and I_{i,j} is the set of items scored by both user i and user j.

The Pearson correlation coefficient is a value in [-1,1] that reflects the degree of linear correlation between two variables. If its value is 0, there is no linear relation between the vectors; if its value is 1, one vector is very similar to the other; and if its value is -1, one vector is completely opposite to the other. The formula (3) of the Pearson method is as follows:

$$ \mathrm{sim}(i, j) = \frac{\sum_{c \in I_{i,j}} (S_{i,c} - \bar{S}_i)(S_{j,c} - \bar{S}_j)}{\sqrt{\sum_{c \in I_{i,j}} (S_{i,c} - \bar{S}_i)^2} \, \sqrt{\sum_{c \in I_{i,j}} (S_{j,c} - \bar{S}_j)^2}} \tag{3} $$
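The only difference between formulas (2) and (3) is the item set used in the denominators, which the sketch below makes explicit. It assumes that each user's scores are stored as a non-empty dict `{item: score}` and that the mean of a user is taken over all items that user scored; the function names and the choice to return 0 when a denominator vanishes are illustrative assumptions.

```python
import math


def _mean(scores):
    return sum(scores.values()) / len(scores)


def _centered_similarity(ri, rj, items_i, items_j):
    """Shared core: numerator over co-rated items, denominators over the given sets."""
    common = set(ri) & set(rj)
    mean_i, mean_j = _mean(ri), _mean(rj)
    num = sum((ri[c] - mean_i) * (rj[c] - mean_j) for c in common)
    den_i = math.sqrt(sum((ri[c] - mean_i) ** 2 for c in items_i))
    den_j = math.sqrt(sum((rj[c] - mean_j) ** 2 for c in items_j))
    if den_i == 0 or den_j == 0:
        return 0.0  # assumption: no variation means no measurable similarity
    return num / (den_i * den_j)


def modified_cosine_similarity(ri, rj):
    """Formula (2): each denominator runs over the user's own items."""
    return _centered_similarity(ri, rj, ri.keys(), rj.keys())


def pearson_similarity(ri, rj):
    """Formula (3): both denominators run over the co-rated items only."""
    common = set(ri) & set(rj)
    return _centered_similarity(ri, rj, common, common)
```

When both users scored exactly the same items, the two functions return the same value; they diverge as soon as one user has scored items the other has not.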

Euclidean distance similarity is the simplest of the similarity methods and is also very easy to understand. Its basic idea is to calculate the Euclidean distance between two users and derive the similarity from it. The formula (4) to calculate the Euclidean distance is as follows:

$$ d(i, j) = \sqrt{\sum_{c \in I_{i,j}} (S_{i,c} - S_{j,c})^2} \tag{4} $$

However, d(i,j) is a non-negative distance and not itself a similarity: it grows as the users differ more. To turn it into a similarity, we use the following formula (5):

$$ \mathrm{sim}(i, j) = \frac{1}{1 + d(i, j)} \tag{5} $$

3.3. Providing the recommendations

CFRA assumes that if two users are highly similar, they will give similar or identical scores to the same item. CFRA therefore calculates a weighted average of the scores that the users in the NN set of the target user gave to an item and uses it as the predicted score of the target user; CFRA then generates the recommendations for the target user from all the predicted scores [22]. The formula (6) of the predicted scores is as follows:

$$ P_{i,c} = \bar{S}_i + \frac{\sum_{j=1}^{n} \mathrm{sim}(i, j) \times (S_{j,c} - \bar{S}_j)}{\sum_{j=1}^{n} \mathrm{sim}(i, j)} \tag{6} $$

In formula (6), n is the number of nearest neighbors, and the sums run over those n neighbors j.

3.4. The algorithm steps of CFRA

Step 1: Input user i and user-item matrix S[m,n].
Step 2: Calculate the similarities between the target user i and other users in user data according to the user-item matrix.
Step 3: Choose the first n similar users of the target user i as the NN(i) according to the similarities.
Step 4: Calculate the predicted scores of target user i to every item according to the formula (6).
Step 5: Arrange the items from big to small according to the value of predicted scores.
Step 6: Choose the former N items as recommendations to the target user i.
Step 7: Output the recommendations.
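The seven steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: scores are stored as `{user: {item: score}}`, the Euclidean similarity of formulas (4) and (5) is used as the default similarity, neighbors that did not score an item are skipped in formula (6), and only items the target user has not yet scored are predicted. All function names are hypothetical.

```python
import math


def euclidean_similarity(ri, rj):
    """Formulas (4) and (5): 1 / (1 + distance over co-rated items)."""
    common = set(ri) & set(rj)
    if not common:
        return 0.0  # assumption: no co-rated item means no similarity
    d = math.sqrt(sum((ri[c] - rj[c]) ** 2 for c in common))
    return 1.0 / (1.0 + d)


def mean_score(scores):
    return sum(scores.values()) / len(scores)


def predict(ratings, target, item, weights):
    """Formula (6), summing only over neighbors that scored the item (assumption)."""
    num = den = 0.0
    for neighbor, w in weights.items():
        if item in ratings[neighbor]:
            num += w * (ratings[neighbor][item] - mean_score(ratings[neighbor]))
            den += w
    base = mean_score(ratings[target])
    return base if den == 0 else base + num / den


def recommend_cfra(ratings, target, n_neighbors, top_n, sim=euclidean_similarity):
    # Step 2: similarity between the target user and every other user.
    sims = {u: sim(ratings[target], r) for u, r in ratings.items() if u != target}
    # Step 3: the n most similar users form NN(i).
    nearest = sorted(sims, key=sims.get, reverse=True)[:n_neighbors]
    weights = {u: sims[u] for u in nearest}
    # Step 4: predict scores for items the target has not scored (assumption).
    candidates = {c for u in nearest for c in ratings[u]} - set(ratings[target])
    predicted = {c: predict(ratings, target, c, weights) for c in candidates}
    # Steps 5-7: sort by predicted score and return the top N items.
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 5, "b": 3, "c": 4},
    "u3": {"a": 1, "b": 5, "c": 2, "d": 5},
}
recommendations = recommend_cfra(ratings, "u1", n_neighbors=2, top_n=2)
```

Passing a different `sim` function, such as a Pearson similarity, changes only Step 2; the rest of the pipeline stays the same.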

4. THE RESEARCH OF RECOMMENDATION BASED ON THE TRUST IN SOCIOLOGY

4.1. Hybrid recommendation algorithm

Many recommendation algorithms have been designed during the rapid development of recent years, but each has its own strengths and weaknesses, and it is hard to adapt to complex real situations with a single recommendation algorithm. To solve this problem, some researchers combine different recommendation algorithms into a new hybrid recommendation algorithm, which often has more advantages and fewer disadvantages than any single recommendation algorithm.

There are two basic ideas of hybrid recommendation: one is to mix the recommendation results, and the other is to mix the different weights. We focus on the second kind of hybrid recommendation algorithm.

4.2. Trust model

Trust in sociology is a very important relationship in our daily life: users decide whether to take advice according to the trust between themselves and their friends. Many recommendation algorithms nevertheless lose sight of trust and pay more attention to similarity. In this section, we discuss the role of trust in the algorithms.

A unified definition of trust has not been formed because of the complexity of trust, so in this paper we define trust in the recommendation algorithm simply, following the theoretical framework of trust.

Definition 1: The trust in the recommendation algorithm is a subjective probability forecast made by the recommendation accepter about the recommendation provider.

In real social life, trust is a quantifiable notion, so we use the trust degree (TD) to represent the degree of trust between two users. The specific definitions of TD are as follows.

Definition 2: TD is a measurement of a point-to-point trust relationship that exists between the recommendation accepter and the recommendation provider.

Definition 3: Assuming that A and B are any two nodes in the user trust model, the trust degree of B to A is represented as TD(B,A). The bigger the value of TD(B,A), the more B trusts A.

It is important to note that the trust relationship between B and A is asymmetric, which means that TD(A,B) is not necessarily equal to TD(B,A).

Assuming that i and j are two nodes in the user trust model, the formula (7) to calculate TD(i,j) is as follows:

$$ TD(i, j) = \frac{\sum_{c \in I_{j,i}} AccTime(j, i, c)}{RecTime(j, i)} \tag{7} $$

$$ AccTime(j, i, c) = \begin{cases} 1, & |S_{i,c} - S_{j,c}| \le \varepsilon \\ 0, & |S_{i,c} - S_{j,c}| > \varepsilon \end{cases} $$

In formula (7), AccTime(j,i,c) represents the correctness of the recommendation from user j to user i about item c, and ε is a fixed threshold. If the absolute difference between S_{i,c} and S_{j,c} is less than or equal to ε, AccTime(j,i,c) is equal to 1; if the absolute difference is more than ε, AccTime(j,i,c) is equal to 0. RecTime(j,i) represents the total number of recommendations from user j to user i, counted over all items in the data set that both i and j have scored.

4.2.1. Trust path

In social relationships, a user may not communicate with a stranger, but if a friend whom the user trusts very much trusts the stranger, the user may take the advice that the stranger provides. There is then a trust path from the user to the stranger. Figure 2 illustrates the relationship.

Figure 2. Trust path (A → B → C)

In Figure 2, TD(A,B) is the trust degree of user A to user B, and TD(B,C) is the trust degree of user B to user C. The trust degree of user A to user C is given by formula (8):

$$ TD(A, C) = TD(A, B) \times TD(B, C) \tag{8} $$
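A minimal sketch of formulas (7) and (8) follows. It assumes that each user's scores are stored as a dict `{item: score}` and that RecTime(j,i) equals the number of items both users have scored, as the text describes; the function names and the choice to return 0 when the users share no item are illustrative assumptions.

```python
def trust_degree(scores_i, scores_j, eps=1.0):
    """TD(i, j) of formula (7): share of co-rated items on which
    user j's score lies within eps of user i's score."""
    common = set(scores_i) & set(scores_j)  # items both users scored
    if not common:
        return 0.0  # assumption: no shared item means no direct trust
    accurate = sum(1 for c in common if abs(scores_i[c] - scores_j[c]) <= eps)
    return accurate / len(common)


def propagate_trust(td_ab, td_bc):
    """Formula (8): trust of A in C through the single intermediary B."""
    return td_ab * td_bc


alice = {"x": 5, "y": 3, "z": 1}
bob = {"x": 4, "y": 1, "z": 1, "w": 2}
td_alice_bob = trust_degree(alice, bob)  # 2 of 3 co-rated items agree within eps
```

Because every TD lies in [0,1], the product in formula (8) can never exceed the weaker of the two links, so trust decays along a path.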

But in most instances, there exists more than one trust path between user A and user C, as shown in Figure 3.

Figure 3. Multi trust path (A → B1 → C and A → B2 → C)

In Figure 3, there are two trust paths between user A and user C. In the multi trust path situation, the trust degree of user A to user C is given by formula (9):

$$ TD(A, C) = \frac{\sum_{B \in TC} TD(A, B) \times TD(B, C)}{n} \tag{9} $$

In formula (9), TC represents the trust circle and n is the total number of trust paths between user A and user C.

In some instances, user A and user C may trust each other even though they do not know each other; we call this kind of trust path a loop trust path. It is shown in Figure 4.

Figure 4. Loop trust path (nodes A, B, C, D)

In a loop trust path, we use a reverse TD addition mechanism to calculate TD(A,C)'. The trust degree of user A to user C is given by formula (10):

$$ TD(A, C)' = TD(A, C) + \lambda \times TD(C, A) \tag{10} $$

In formula (10), λ is a constant parameter; if the value of TD(A,C)' is more than 1, we take TD(A,C)' = 1. It is important to notice that TD(A,C) after the synthesis is still asymmetric.

4.2.2. The length of the trust path

If there is a large number of users in the TC, transmission distortion will exist in the trust path, which means that there are too many users between user A and user C in the trust path. It is shown in Figure 5.

Figure 5. The length of the trust path (A → B1 → B2 → ...... → Bn → C)

In Figure 5, there are n users on the trust path between user A and user C. To avoid transmission distortion, we set a threshold value x to limit the length of the trust path: if n is more than x, we abandon the trust path to ensure the precision of TD.
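The multi-path average of formula (9), the loop correction of formula (10), and the length threshold x can be combined in one sketch. It assumes that direct trust degrees are stored as a nested dict `td[a][b] = TD(a, b)` and that the trust carried by a longer path is the product of its links, which extends formula (8) beyond one intermediary; that extension and all function names are assumptions for illustration.

```python
def path_trusts(td, src, dst, max_users):
    """Yield the trust carried by each indirect simple path src -> ... -> dst
    that has between 1 and max_users users between its endpoints."""
    def walk(node, visited, product, users_between):
        for nxt, value in td.get(node, {}).items():
            if nxt == dst:
                if users_between >= 1:  # skip the direct edge src -> dst
                    yield product * value
            elif nxt not in visited and users_between < max_users:
                yield from walk(nxt, visited | {nxt}, product * value,
                                users_between + 1)
    yield from walk(src, {src}, 1.0, 0)


def multi_path_trust(td, src, dst, max_users):
    """Formula (9): average over all admissible trust paths."""
    values = list(path_trusts(td, src, dst, max_users))
    return sum(values) / len(values) if values else 0.0


def loop_trust(td_ac, td_ca, lam):
    """Formula (10): add the weighted reverse trust and cap the result at 1."""
    return min(1.0, td_ac + lam * td_ca)


td = {
    "A": {"B1": 0.8, "B2": 0.6},
    "B1": {"C": 0.5},
    "B2": {"C": 1.0},
}
td_a_c = multi_path_trust(td, "A", "C", max_users=4)  # average of 0.4 and 0.6
```

Paths with more than `max_users` intermediaries are never explored, which mirrors abandoning a path once n exceeds x.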

4.3. Algorithm analysis

This paper takes the trust model into consideration to design two recommendation algorithms based on the trust in sociology. We calculate and store the trust information of the users, and then use TD in place of the similarity to design a recommendation algorithm based on the trust in sociology (CFRAT). We also use TD to design a hybrid recommendation algorithm based on trust and similarity (HRAT). The synthesis of the weight based on trust and similarity is shown as formula (11):

$$ weight(A, C) = \alpha \times \mathrm{sim}(A, C) + (1 - \alpha) \times TD(A, C) \tag{11} $$

In formula (11), α is a weight constant.

Following formula (6), the predicted score of user i on item c is shown as formula (12):

$$ P_{i,c} = \bar{S}_i + \frac{\sum_{j=1}^{n} weight(i, j) \times (S_{j,c} - \bar{S}_j)}{\sum_{j=1}^{n} weight(i, j)} \tag{12} $$
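Formulas (11) and (12) can be sketched as two small functions. The names `hybrid_weight` and `predict_score`, and the representation of each neighbor as a `(weight, score_on_item, mean_score)` tuple, are assumptions chosen for illustration.

```python
def hybrid_weight(sim_ac, td_ac, alpha):
    """Formula (11): convex combination of similarity and trust degree."""
    return alpha * sim_ac + (1 - alpha) * td_ac


def predict_score(target_mean, neighbors):
    """Formula (12). neighbors is a list of
    (weight, neighbor_score_on_item, neighbor_mean_score) tuples."""
    num = sum(w * (score - mean) for w, score, mean in neighbors)
    den = sum(w for w, _, _ in neighbors)
    if den == 0:
        return target_mean  # assumption: fall back to the user's own mean
    return target_mean + num / den


w = hybrid_weight(sim_ac=0.5, td_ac=1.0, alpha=0.4)
p = predict_score(3.0, [(w, 5, 4.0), (0.2, 2, 3.0)])
```

With α = 1 the weight reduces to the plain similarity of CFRA, and with α = 0 it reduces to the pure trust weighting used by CFRAT.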

4.4. The specific steps of the algorithm

4.4.1. The steps of CFRAT

Step 1: Input user i and user-item matrix S[m,n], threshold value x, constant parameter λ and weight constant α.
Step 2: Calculate the TD between the target user i and other users in user data according to the user-item matrix.
Step 3: Choose the first n users according to the value of TD as the NN(i).
Step 4: Calculate the predicted scores of target user i to every item according to the formula (6), with TD in place of the similarity.
Step 5: Arrange the items from big to small according to the value of predicted scores.
Step 6: Choose the former N items as recommendations to the target user i.
Step 7: Output the recommendations.

4.4.2. The steps of HRAT

Step 1: Input user i and user-item matrix S[m,n], threshold value x, constant parameter λ and weight constant α.
Step 2: Calculate the similarities between the target user i and other users in user data according to the user-item matrix.
Step 3: Choose the first n similar users to the target user i as the NN(i)sim according to the similarities.
Step 4: Use the trust model to calculate the TD of the target user i and other users.
Step 5: Choose the first n users according to the value of TD as the NN(i)trust.
Step 6: When the intersection set of NN(i)sim and NN(i)trust is not empty, take the intersection set as NN(i)'; otherwise end the algorithm.
Step 7: Calculate the weight of the neighbors in NN(i)' according to the formula (11).
Step 8: Choose the first n users in NN(i)' according to the value of the weight to form the final nearest-neighbor set NN(i).
Step 9: For every item c, calculate the predicted score of the target user i on item c according to formula (12).
Step 10: Choose the former N items as recommendations to the target user i.
Step 11: Output the recommendations.

5. EXPERIMENT

5.1. The experiment mode

There are three experiment modes: the offline experiments mode, the user surveys mode, and the online experiments mode.

The offline experiments mode has 4 steps:
Step 1: Get the user behavior data from the log system and generate a standard data set in a certain format.
Step 2: Divide the data set into a training set and a testing set.
Step 3: Train the user model on the training set, then predict the scores on the testing set.
Step 4: Evaluate the prediction results of the algorithm with the offline index chosen in advance.

The user surveys mode needs some real users. Those users first complete some tasks on the recommendation system under test, so that we can observe and record their behavior, and they answer some questions when they finish the tasks. Finally, we evaluate the performance of the tested system by analyzing their behavior and answers.

The online experiments mode also needs real users. Those users are divided into groups according to certain rules to test different algorithms, and we then compare the algorithms through the statistics of the different groups on evaluation indexes such as click rate.

All three modes have their own advantages and disadvantages, so we compare them in Table 2.

Table 2. The advantages and disadvantages of the modes

Mode                       Advantages                                          Disadvantages
Offline experiments mode   Does not need real users                            Unable to calculate the business indicators of concern
                           Can test many different algorithms quickly          There is a gap between the offline index and the
                           and conveniently                                    commercial index
User surveys mode          Gets indexes that better reflect the users'         Needs real users
                           subjective feelings
                           Low risk, because mistakes can be made up           Hard to recruit large-scale test users
Online experiments mode    Fair way to get the online performance              Needs real users
                           indicators of different algorithms                  Long test cycle and complex design

Table 2 shows the advantages and disadvantages of the modes. To test the algorithms quickly with limited resources, we chose the offline experiments mode.

5.2. The experimental data and the evaluation index

The experimental data of this paper is the Netflix data set, which comes from the database of the movie rental web site Netflix. Netflix published the data set at the end of 2005; it includes over 100 million scores given by 480189 anonymous users to 17770 movies. Every score is a rating from 1 to 5, and the data also includes movie names and rating dates.

This paper chose a random sample that includes the scores of 500 users on 1000 movies, and used the first 800 movies as the training set and the last 200 movies as the testing set, as shown in Table 3.

Table 3. Data set

               User   Movie   Record
Total set      500    1000    9756
Training set   500    800     7673
Testing set    500    200     2093
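A split of this kind can be sketched as below. The sketch assumes that "the first 800 movies" refers to a fixed movie ordering supplied by the caller and that scores are stored as `{user: {movie: score}}`; the function name `split_by_movie` is hypothetical, and the paper does not state how the ordering was chosen.

```python
def split_by_movie(ratings, movie_order, n_train):
    """Split {user: {movie: score}} into training and testing sets by movie.

    The first n_train movies in movie_order go to the training set,
    the remaining movies to the testing set.
    """
    train_movies = set(movie_order[:n_train])
    test_movies = set(movie_order[n_train:])
    train = {u: {m: s for m, s in scores.items() if m in train_movies}
             for u, scores in ratings.items()}
    test = {u: {m: s for m, s in scores.items() if m in test_movies}
            for u, scores in ratings.items()}
    return train, test


ratings = {"u1": {"m1": 5, "m2": 3, "m3": 4}, "u2": {"m3": 2}}
train, test = split_by_movie(ratings, ["m1", "m2", "m3"], n_train=2)
```

Every user appears in both sets, which matches Table 3, where the training and testing sets both contain 500 users.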

To test the precision of the algorithms, MAE (Mean Absolute Error) and RMSE (Root Mean Square Error) were used in this paper.

MAE is the average absolute deviation between the predicted user scores and the real user scores. Unlike the plain average error, positive and negative deviations cannot cancel each other out in MAE, because the absolute value of each deviation is taken. The smaller the MAE, the better the algorithm performs.

The formula to calculate the MAE is shown as formula (13):

$$ MAE = \frac{\sum_{i=1}^{N} |p(i) - q(i)|}{N} \tag{13} $$

RMSE is the square root of the mean of the squared deviations; it is used to measure the deviation between the observed value and the true value. The smaller the RMSE, the better the algorithm performs.

The formula to calculate the RMSE is shown as formula (14):

$$ RMSE = \sqrt{\frac{\sum_{i=1}^{N} (p(i) - q(i))^2}{N}} \tag{14} $$

In formula (13) and formula (14), p(i) is the predicted value of user i and q(i) is the real value of user i. We used the average value of all MAE and RMSE as the final precision.
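Both indexes are straightforward to compute. The sketch below assumes that predictions and real scores are given as two equal-length, non-empty sequences; the function names are illustrative.

```python
import math


def mae(predicted, real):
    """Formula (13): mean of the absolute deviations."""
    if len(predicted) != len(real) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - q) for p, q in zip(predicted, real)) / len(predicted)


def rmse(predicted, real):
    """Formula (14): square root of the mean squared deviation."""
    if len(predicted) != len(real) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(predicted, real)) / len(predicted))


predicted = [3, 4, 5]
real = [2, 4, 7]
errors = (mae(predicted, real), rmse(predicted, real))
```

RMSE is never smaller than MAE on the same data, because squaring gives large deviations more weight.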

5.3. The experiment results

To compare the performance of the algorithms more clearly, we chose reasonable parameter values and different numbers of nearest neighbors for the experiment. We take ε=1; λ=0.4; α=0.4; x=4.

Table 4 and Table 5 show the results of the algorithms proposed in this paper.

Table 4. Experimental results of MAE

Neighbors   5        10       15       20       25
CFRA        0.9313   0.9264   0.9237   0.9218   0.9202
CFRAT       0.9272   0.9214   0.9165   0.9137   0.9113
HRAT        0.9213   0.9127   0.9065   0.9038   0.9024

Table 5. Experimental results of RMSE

Neighbors   5        10       15       20       25
CFRA        0.9329   0.9311   0.9297   0.9290   0.9239
CFRAT       0.9299   0.9261   0.9226   0.9188   0.9154
HRAT        0.9222   0.9179   0.9080   0.9054   0.9029

From Table 4 and Table 5, we can see that the MAE and RMSE of the proposed algorithms are smaller than those of the collaborative filtering recommendation algorithm, so we can conclude that the proposed algorithms perform better than CFRA for every number of neighbors tested. Next, we use a chart to show the differences between the algorithms more clearly, so that the improvement can be seen directly.

Figure 6. Precision results comparison

From Figure 6, we can see that all three algorithms have a higher MAE and RMSE when the number of neighbors is small. Because it is hard to obtain accurate trust degrees and similarities from few users, the lack of relationship information between the users leads to poor performance; even so, the proposed algorithms perform better than the collaborative filtering recommendation algorithm.

As the number of neighbors increases, all three algorithms improve on their own earlier results, but the algorithms proposed in this paper still perform better than the collaborative filtering recommendation algorithm. We can therefore conclude that, over the whole experiment, the proposed algorithms have a lower MAE than the traditional algorithm, and RMSE follows the same pattern, which means that the proposed algorithms can improve the quality of the recommendation system.

6. DISCUSSION AND CONCLUSION

The traditional algorithms concentrate on similarity and neglect the social trust between users, so this paper designs CFRAT and HRAT to study the impact of trust on recommendation.

From the experimental results, we know that trust is a very important factor affecting the quality of a recommendation algorithm. The proposed algorithms remedy the disadvantage of relying only on similarity and perform better than the traditional algorithm. We also find a phenomenon of trust called authority trust, and it is possible to use authority trust to solve the cold start (cold boot) problem and improve the function of the recommendation algorithm.

The distinctive contribution of this paper is its focus on the specific application of the trust between users in the recommendation system, and its trust model is proposed in more detail than in recent papers.

Cold start is a very common problem in recommendation systems. It is hard for a recommendation system to give high-quality recommendations to new users, or to old users with little behavior in the database; the lack of user behavior data leads to the cold start problem, and there is no effective method to solve it. However, we can use a trust model called the authority trust model to ease the problem.


In daily life, we may know nothing about some field, and the same may be true of our friends, so we ask experts in that field for help, and the experts give us the recommendations that most people like. Following this idea, we can propose an authority trust model. We calculate the total trust degree (TTD) of every user, sort the users in descending order of TTD, and take the first n users as the authority user set. Finally, the recommendation system chooses the items that the users in the authority user set have chosen as the recommendations for new users, which eases the problem.
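The authority trust idea described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the TTD values are taken as given in a dictionary, the function names are hypothetical, and ranking items by how many authority users chose them is an added assumption, since the text only says that the chosen items become the recommendations.

```python
from collections import Counter


def authority_users(ttd, n):
    """Return the n users with the highest total trust degree (TTD)."""
    # Sort by TTD in descending order. Ties are broken by user id so the
    # result is deterministic (an assumption, not specified in the text).
    ranked = sorted(ttd, key=lambda user: (-ttd[user], user))
    return ranked[:n]


def recommend_for_new_user(ttd, chosen_items, n, k):
    """Recommend up to k items chosen by the authority user set."""
    counts = Counter()
    for user in authority_users(ttd, n):
        # Count each item once per authority user.
        counts.update(set(chosen_items.get(user, ())))
    # Assumption: items chosen by more authority users are ranked first.
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return ranked[:k]


# Hypothetical TTD values and item choices, for illustration only.
ttd = {"u1": 0.9, "u2": 0.4, "u3": 0.7, "u4": 0.1}
chosen_items = {"u1": ["a", "b"], "u2": ["c"], "u3": ["b", "d"], "u4": ["e"]}

print(authority_users(ttd, 2))                              # ['u1', 'u3']
print(recommend_for_new_user(ttd, chosen_items, n=2, k=3))  # ['b', 'a', 'd']
```

Because the recommendation depends only on the authority user set, it needs no behavior data from the new user, which is why this approach suits the cold boot case.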

As a next step, we will continue our research on trust and add authority trust into the trust model to ease the problem of cold boot.

ACKNOWLEDGEMENTS

This work was funded by the National Natural Science Foundation of China (61373134, 61402234). It was also supported by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), Jiangsu Key Laboratory of Meteorological Observation and Information Processing (No. KDXS1105) and Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET). In addition, this research was partially supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2016-H8601-161009) supervised by the IITP (Institute for Information & communications Technology Promotion).



Dr. Chunyong Yin is currently an associate professor and dean with the Nanjing University of Information Science & Technology, China. He received his Bachelor's degree (SDUT, China, 1998), Master's degree (GZU, China, 2005), and PhD (GZU, 2008), and was a post-doctoral associate (University of New Brunswick, 2010). He has authored or coauthored more than twenty journal and conference papers. His current research interests include data mining, privacy preserving and network security.


Jin Wang received the B.S. and M.S. degrees from Nanjing University of Posts and Telecommunications, China, in 2002 and 2005, respectively. He received the Ph.D. degree from Kyung Hee University, Korea, in 2010. He is now a professor in the College of Information Engineering, Yangzhou University. His research interests mainly include routing protocol and algorithm design, network performance evaluation and optimization for wireless ad hoc and sensor networks. He is a member of the IEEE and ACM.

Dr. James J. (Jong Hyuk) Park received Ph.D. degrees from the Graduate School of Information Security, Korea University, Korea, and the Graduate School of Human Sciences, Waseda University, Japan. From December 2002 to July 2007, Dr. Park was a research scientist at the R&D Institute, Hanwha S&C Co., Ltd., Korea. From September 2007 to August 2009, he was a professor at the Department of Computer Science and Engineering, Kyungnam University, Korea. He is now a professor at the Department of Computer Science and Engineering and the Department of Interdisciplinary Bio IT Materials, Seoul National University of Science and Technology (SeoulTech), Korea. Dr. Park has published about 200 research papers in international journals and conferences. He has served as chair, program committee member, or organizing committee chair for many international conferences and workshops. He is a founding steering chair of several international conferences, including MUE, FutureTech, CSA, and UCAWSN. He is editor-in-chief of Human-centric Computing and Information Sciences (HCIS) by Springer, International Journal of Information Technology, Communications and Convergence (IJITCC) by InderScience, and Journal of Convergence (JoC) by KIPS CSWRG. He is Associate Editor / Editor of 14 international journals, including 8 journals indexed by SCI(E). In addition, he has served as a Guest Editor for international journals from publishers including Springer, Elsevier, John Wiley, Oxford Univ. Press, Hindawi, Emerald, and Inderscience. His research interests include security and digital forensics, human-centric ubiquitous computing, context awareness, multimedia services, etc. He received best paper awards from the ISA-08 and ITCS-11 conferences and outstanding leadership awards from IEEE HPCC-09, ICA3PP-10, IEEE ISPA-11, and PDCAT-11. Furthermore, he received the outstanding research award from SeoulTech in 2014. Dr. Park's research interests include Human-centric Ubiquitous Computing, Vehicular Cloud Computing, Information Security, Digital Forensics, Secure Communications, Multimedia Computing, etc. He is a member of the IEEE, IEEE Computer Society, KIPS, and KMMS. Personal website: http://parkjonghyuk.net
