Available online at www.sciencedirect.com
ScienceDirect Procedia CIRP 62 (2017) 78 – 83
10th CIRP Conference on Intelligent Computation in Manufacturing Engineering - CIRP ICME '16
Value Estimation Method of Products / Services using Wisdom of Crowd
Kazuaki Yamada a,*
a Department of Mechanical Engineering, Toyo University, 2100 Kujirai Kawagoe-shi Saitama 350-8585, Japan
* Corresponding author. Tel.: +81-49-239-1455; fax: +82-49-233-9779. E-mail address: [email protected]
Abstract
This paper proposes a novel method for estimating the appropriate values of products and services from the many user evaluations posted on websites such as amazon.com and yelp.com, using a particle filter and local regression smoothing. These websites provide a service that estimates the values of products and services from many user evaluations. However, users cannot always estimate the appropriate value of a product or service, and a product or service does not always keep the same value. For example, a person gives a different evaluation to the same dish depending on whether he or she is hungry or full, and a mobile phone gains value as its connectivity improves with the number of base stations. Thus, we need to estimate the appropriate values of products and services by removing the noise included in both the user evaluations and the values of the products and services. We investigate the effects of the proposed method through simple simulation experiments imitating a reputation information site about restaurants.
© 2017 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the scientific committee of the 10th CIRP Conference on Intelligent Computation in Manufacturing Engineering.
Keywords: Reputation information site, Particle filter, Wisdom of crowd
2212-8271 © 2017 The Authors. Published by Elsevier B.V. doi:10.1016/j.procir.2016.06.117

1. Introduction
Websites such as Amazon.com, Yelp.com and Booking.com provide services through which users can share reviews and evaluations of products and services. These reviews and evaluations are contributed by many users who have actually used the products and services. In this study, we call these websites "reputation information sites". An interesting feature of reputation information sites is that they are designed to improve the accuracy of the estimated values of products and services by using the evaluations of many non-specialists rather than specialists such as Michelin Guide investigators. In this way, these websites make good use of the wisdom of crowds [1].
Reputation information sites have used the simple average to estimate the appropriate values of products and services from many user evaluations. By the law of large numbers, the accuracy of an estimated value improves as the number of observation samples increases. However, neither the value of a product or service nor the reliability of a user evaluation is always constant. For example, a restaurant might raise its value by improving the quality of its dishes and service as its chefs and waiters gain experience, and a mobile phone might gain value as its connectivity improves with the number of base stations, even if the functions of the phone itself stay the same. On the other hand, people give different evaluations to the same product or service because they have different evaluation criteria. In addition, it is known that a person gives different evaluations to the same dish depending on whether he or she is hungry or full [2], and cannot keep evaluations consistent as time passes. Namely, both the values of products and services and the accuracy of user evaluations change as time passes. In addition, user evaluations are biased.
In order to estimate the appropriate values of products and services from many noisy user evaluations, we need to consider (a) the value fluctuations of the products and services, and (b) the biases and fluctuations of the user evaluations. Therefore, in this paper, we regard the value fluctuation of a product or service as the variance of the system noise, and the bias and the fluctuation of a user evaluation as the mean and the variance of the observation noise, respectively. We propose a novel method to estimate the appropriate values of products and services by using a particle filter [3] and local regression smoothing [4] in order to remove these noises. In addition, we apply the self-organizing state space (SOSS) model to the particle filter in order to estimate the parameters of both the system noise and the observation noise.
In the next section, we describe the relationship between the conventional methods and the proposed method. In Section 3, we model the evaluation process of a user in a reputation information site in order to reveal how the gap between a user evaluation and the appropriate value of a product or service arises. In Section 4, we explain the proposed method, which uses a particle filter and local regression smoothing. The effect of the proposed method is then investigated through simple simulation experiments. Finally, we conclude this paper and describe future work.

2. Related works
In this section, we show that the conventional estimation methods cannot be used to estimate the appropriate values of products/services in reputation information sites. The conventional methods to estimate the reliability of information on websites (Fig. 1) are classified as follows.
(1) Trust based on the content of information: trust based on the evaluation of the information itself.
(2) Trust based on the context of information: trust based on information associated with the information, such as publication date, link source, and electronic signature.
(3) Trust based on the reputation of information.
(4) Trust based on the reliability of the creator.
  a. Trust based on the direct relationship between user and creator.
  b. Trust based on the reputation of the creator.

Fig. 1. The conventional method to estimate the reliability of information on websites. (Labels in the figure: Who, Where, When, What.)
Fig. 2(a). Auction site model. (θ: reliability of user.)
Fig. 2(b). Reputation information site model. (θ: reliability of user.)
Fig. 2(c). The proposed model. (Legend: evaluation, system noise, observation noise, reputation information.)
Fig. 2. Methods to estimate the values of products/services from many user evaluations.
Methods (1), (2) and (4)-a estimate the reliability of information from primary information, such as the direct relationship with the information creators, the context of the information and the information itself. On the other hand, methods (3) and (4)-b estimate the reliability of information from secondary information, such as the reputation of the information itself and the reputation of the information creators. On current websites, it is very difficult to estimate the reliability of information directly because many users create a wide variety of information. Therefore, Google uses PageRank [5] as one of its technologies to estimate the importance of websites, based on the heuristic that a site linked from many websites is important. This method is equivalent to "(3) trust based on the reputation of information".
On the other hand, auction sites such as eBay estimate the reliability θ of users from the mutual evaluations of sellers and buyers, as shown in Fig. 2(a). This method is equivalent to "(4)-b trust based on the reputation of creator". However, in reputation information sites it is difficult to estimate the appropriate value of products/services with the method employed in auction sites, as shown in Fig. 2(b), because users and products/services cannot evaluate each other as sellers and buyers do. In addition, it is difficult to estimate the appropriate value of products/services by a simple average of user evaluations because of the noise included in the user evaluations. Therefore, Yelp.com has developed a method to estimate the appropriate value of products/services from many user evaluations as follows. First, the reliability of each user is estimated by special algorithms from the user's contributions, such as reviews, the reputation given by other users and so on. Next, the user evaluations are weighted by the estimated reliability of the users and totaled. Finally, the appropriate value of the product/service is estimated from the weighted evaluations. However, even highly reliable reviewers cannot always evaluate products/services appropriately, because the evaluations of products/services change as time passes and the user evaluations have bias and fluctuations. Therefore, reputation information sites need to remove the bias and fluctuation included in the user evaluations, as well as the fluctuation of the products/services, in order to estimate the appropriate values of products/services. In this study, we regard a user as a sensor and a user evaluation as a sensor value. We regard the fluctuation of a product/service as the variance of the system noise, and the bias and fluctuation of a user evaluation as the mean and variance of the observation noise, respectively.
This paper proposes a novel method to estimate the appropriate values of products/services by using a particle filter and local regression smoothing. A particle filter can estimate the true system state by taking the system noise and the observation noise into account. Generally, a particle filter requires the parameters of the observation (sensor) noise and the system noise to be known in advance. However, it is difficult to measure these parameters in advance. Thus, the proposed method employs the self-organizing state space (SOSS) model to estimate these noise parameters from many user evaluations.
Fig. 3. A process model in which a user evaluates a product/service.
Fig. 4. A mechanism of the proposed reputation information system.
3. Modeling of a reputation information site
In this section, we model the evaluation process of a user in a reputation information site, as shown in Fig. 3. The reputation information site consists of m products/services and n users. We assume that the appropriate value of the j-th product/service is $x_{t-1}$. The value of the product/service changes from $x_{t-1}$ to $x_t$ according to eq. (1-a) because of its own noise $v_t$. The i-th user observes $x_t$ as the value of the product/service when using it, and gives $y_t$ as its value according to eq. (1-b) because of the noise $w_t$ included in the user evaluation. Thus, in this model, the appropriate value of the product/service is $x_{t-1}$, but the user $U_i$ gives $y_t$ as the value of the product/service $PS_j$ because of the noise included in both the product/service and the user evaluation.

$$x_t = f(x_{t-1}, v_t), \quad v_t \sim N(0, \sigma^2) \tag{1-a}$$
$$y_t = h(x_t, w_t), \quad w_t \sim N(m, \tau^2) \tag{1-b}$$

where f and h are nonlinear functions of the state vector x, the fluctuation of the value of the product/service is the white noise $N(0, \sigma^2)$, and the fluctuation of the user evaluation is the white noise $N(m, \tau^2)$. In the particle filter, eq. (1-a) and eq. (1-b) are called the system model and the observation model, respectively, and $v_t$ and $w_t$ are called the system noise and the observation noise, respectively.

4. Reputation information system
Our proposed reputation information system estimates the appropriate values of products/services by using a particle filter and local regression smoothing. The system employs the self-organizing state space (SOSS) model in order to estimate the noise included in both the values of products/services and the user evaluations. In addition, the system employs fixed-lag smoothing [3] in order to improve the estimation accuracy of the particle filter. In this section, we first describe the outline of the proposed system. Next, we explain its constituent elements: the particle filter, the self-organizing state space model, fixed-lag smoothing and local regression smoothing.
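As a concrete illustration of the system and observation models in eqs. (1-a) and (1-b), the following minimal Python sketch simulates one product/service and one user. It assumes additive forms for f and h (the paper leaves both as general nonlinear functions), and the function name `simulate_evaluations` and all parameter values are hypothetical choices made only for this illustration.

```python
import random


def simulate_evaluations(x0, sigma, m, tau, steps, seed=0):
    """Simulate eqs. (1-a)/(1-b) with additive f and h (an assumption).

    x_t = x_{t-1} + v_t,  v_t ~ N(0, sigma^2)   value of the product/service
    y_t = x_t + w_t,      w_t ~ N(m, tau^2)     evaluation given by the user
    """
    rng = random.Random(seed)
    x = x0
    values, evaluations = [], []
    for _ in range(steps):
        x = x + rng.gauss(0.0, sigma)   # system noise v_t
        y = x + rng.gauss(m, tau)       # observation noise w_t, biased by m
        values.append(x)
        evaluations.append(y)
    return values, evaluations


if __name__ == "__main__":
    xs, ys = simulate_evaluations(x0=3.5, sigma=0.0, m=0.5, tau=0.3, steps=2000)
    # With sigma = 0 the value stays at 3.5, yet the simple average of the
    # evaluations settles near 3.5 + m: averaging alone does not remove bias.
    print(sum(ys) / len(ys))
```

With a nonzero bias m the simple average converges to the value plus the bias, which is why the bias has to be estimated separately from the value.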
4.1. Outline of a reputation information system
The reputation information system generates a user model U and a product/service model PS when a user uses a product/service, as shown in Fig. 4. The user model U has N particles, and each particle holds a mean m and a variance $\tau^2$ of the noise w included in the user evaluation. The product/service model PS also has N particles, and each particle holds the value x of the product/service and a variance $\sigma^2$ of the noise v included in the value of the product/service. When a user uses a product/service, the system estimates its appropriate value by the following procedure.
(1) When the i-th user evaluates the j-th product/service, generate N particles, each with an extended state vector that combines the parameters of the user $U_i$ and the product/service $PS_j$, and form the particle population $\{z_{t-1|t-1}^{(i)}\}_{i=1}^{N}$ from these N particles. This population approximates the probability distribution of the value of the product/service at time t-1.
(2) Generate the particle population $\{z_{t|t}^{(i)}\}_{i=1}^{N}$ at time t by resampling based on the user evaluation $y_t$.
(3) Separate the elements of the extended state vector, and update the parameters of the product/service model and the user model.
(4) Calculate the value $\hat{x}_{t|t}$ of the product/service from the particle population at time t.
(5) Calculate the appropriate value $\bar{x}_{t|t}$ of the product/service by smoothing $\hat{x}_{t|t}$ with local regression smoothing.
This system can improve the estimation accuracy of the value of the product/service as the number of user evaluations increases. In addition, the system uses local regression smoothing to estimate the appropriate value $\bar{x}_{t|t}$, because the value $\hat{x}_{t|t}$ estimated by the particle filter still fluctuates with noise.

4.2. Particle filter
A particle filter is a kind of time-series filter that estimates the state $x_t$ of a system at time t from the observation $y_t$ by the Monte Carlo method. As shown in Fig. 5, the basic idea is first to generate a predictive distribution by spreading many particles over the state space according to the system model, next to calculate the likelihood of each particle from the observation $y_t$, and then to generate a filtering distribution by resampling the particles according to their likelihoods. The particle filter estimates the state of the system by repeating these operations.

Fig. 5. A conceptual model of particle filter. (Stages shown: predicted distribution, likelihood, resampling, filtering distribution.)
4.2.1. Algorithm
The particle filter algorithm can be described as follows.
(1) Generate the particle population $\{x_{0|0}^{(i)}\}_{i=1}^{N}$, $x_{0|0}^{(i)} \sim p_0(x)$, which approximates the initial distribution, where $p_0(x)$ is the initial distribution of x at time t=0.
(2) At each time t, compute steps (a), (b) and (c).
  (a) Likelihood: compute steps (i) to (iii) for each particle.
    (i) Generate a random number $v_t^{(i)} \sim q(v)$.
    (ii) Calculate $x_{t|t-1}^{(i)} = f_t(x_{t-1|t-1}^{(i)}, v_t^{(i)})$.
    (iii) Calculate $\beta_t^{(i)} = p(y_t | x_{t|t-1}^{(i)})$ (eq. (2)).
  (b) Resampling: resample each particle $x_{t|t-1}^{(i)}$ from the particle population $\{x_{t|t-1}^{(i)}\}_{i=1}^{N}$ with probability $\tilde{\beta}_t^{(i)} = \beta_t^{(i)} / \sum_{j=1}^{N} \beta_t^{(j)}$, and generate a new particle population $\{x_{t|t}^{(i)}\}_{i=1}^{N}$.
  (c) Estimation of the state:
$$\hat{x}_{t|t} = \frac{1}{N} \sum_{i=1}^{N} x_{t|t}^{(i)}$$
If the observation noise follows a normal (Gaussian) distribution with mean $m$ and variance $\tau^2$, the likelihood of each particle can be calculated as follows.
$$p(y_t | x_{t|t-1}^{(i)}) = \frac{1}{\sqrt{2\pi\tau^2}} \exp\left[ -\frac{\left( y_t - H\left( x_{t|t-1}^{(i)} + m_{t|t-1}^{(i)} \right) \right)^2}{2\tau^2} \right] \tag{2}$$
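The steps of Section 4.2.1 can be sketched in Python as below. This is a minimal bootstrap particle filter under stated assumptions, not the author's implementation: f is taken as a random walk, H as the identity, the noise parameters sigma, m and tau2 are treated as known, and the function names are hypothetical.

```python
import math
import random


def gaussian_likelihood(y, x, m, tau2):
    """Eq. (2) with H taken as the identity (an assumption)."""
    return math.exp(-(y - (x + m)) ** 2 / (2.0 * tau2)) / math.sqrt(2.0 * math.pi * tau2)


def particle_filter(ys, n_particles, sigma, m, tau2, init_range, seed=0):
    """Bootstrap particle filter for x_t = x_{t-1} + v_t, y_t = x_t + w_t."""
    rng = random.Random(seed)
    # Step (1): draw the initial population from p_0(x), here a uniform distribution.
    particles = [rng.uniform(*init_range) for _ in range(n_particles)]
    estimates = []
    for y in ys:
        # Step (2)(a): propagate each particle and compute its likelihood beta.
        predicted = [x + rng.gauss(0.0, sigma) for x in particles]
        betas = [gaussian_likelihood(y, x, m, tau2) for x in predicted]
        if sum(betas) == 0.0:
            # Guard against numerical underflow: fall back to equal weights.
            betas = [1.0] * n_particles
        # Step (2)(b): resample with probability proportional to beta.
        particles = rng.choices(predicted, weights=betas, k=n_particles)
        # Step (2)(c): the state estimate is the mean of the resampled particles.
        estimates.append(sum(particles) / n_particles)
    return estimates
```

Because `random.choices` normalises the weights internally, the explicit division by the sum of the likelihoods in step (b) is not needed in the code.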
4.2.2. Self-organizing state space model
The self-organizing state space (SOSS) model uses a state vector that consists of the state $x_t$ and the hyperparameters $\lambda_t$, which include the variance $\sigma_t^2$ of the system noise $v_t$ and the mean $m_t$ and variance $\tau_t^2$ of the observation noise $w_t$. The state vector is given by eq. (3). The SOSS model can estimate both the system noise and the observation noise at the same time by repeating the same operations as the particle filter *. The system model and the observation model are then given by eq. (4), and the nonlinear functions F and H are given by eq. (5).

$$z_t = \begin{bmatrix} x_t \\ \lambda_t \end{bmatrix} \tag{3-a}$$
$$\lambda_t = \begin{bmatrix} \sigma_t^2 \\ m_t \\ \tau_t^2 \end{bmatrix} \tag{3-b}$$
$$z_t = F(z_{t-1}, v_t) \tag{4-a}$$
$$y_t = H(z_t, w_t) \tag{4-b}$$
$$F(z_{t-1}, v_t) = \begin{bmatrix} f(x_{t-1}, v_t) \\ \lambda_{t-1} + \epsilon_t \end{bmatrix} \tag{5-a}$$
$$H(z_t, w_t) = h(z_t, w_t) \tag{5-b}$$

The row $\lambda_t = \lambda_{t-1} + \epsilon_t$ in eq. (5-a) describes the time variation of the hyperparameters. The noise vector is $\epsilon_t = [\zeta_t, \eta_t, \eta_t]'$, where $\zeta_t \sim N(0, \nu^2)$ and $\eta_t \sim N(0, \xi^2)$. $\nu$ and $\xi$ are called hyper-hyperparameters because they characterize the hyperparameters.

4.2.3. Fixed-lag smoothing
Fixed-lag smoothing estimates the smoothed state by applying the same procedure as the particle filter to the state vector extended as in eq. (6). The state at time t-L, smoothed using the information from time t-L to time t, is then obtained by taking out the state at time t-L from the state vectors in the particle population.

$$\tilde{z}_{t|t-1}^{(i)} = \left[ z_{t|t-1}^{(i)}, z_{t-1|t-1}^{(i)}, \cdots, z_{t-L|t-1}^{(i)} \right]' \tag{6}$$
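Eqs. (3) to (6) can be combined in a short Python sketch. It is illustrative only and rests on several assumptions that are not taken from the paper: f is a random walk, H is the identity, the variances are carried as logarithms, the random-walk standard deviations `nu` and `xi` act on those logarithms, and the initial ranges are borrowed from the simulation section with a small positive lower bound so that the logarithm is defined. The function name `soss_fixed_lag` is hypothetical. With a single user and a single product/service, only the sum x + m is identifiable from the evaluations, so the sketch reports that sum.

```python
import math
import random
from collections import deque


def soss_fixed_lag(ys, n=500, lag=5, nu=0.1, xi=0.2, seed=0):
    """SOSS particle filter with fixed-lag smoothing (illustrative sketch).

    Each particle holds z = [x, log sigma^2, m, log tau^2] together with its
    last lag+1 state vectors, as in eq. (6).
    """
    rng = random.Random(seed)
    particles = []
    for _ in range(n):
        z = [rng.uniform(2.5, 5.5),               # x
             math.log(rng.uniform(1e-3, 0.2)),    # log sigma^2
             rng.uniform(-1.5, 1.5),              # m
             math.log(rng.uniform(1e-3, 1.5))]    # log tau^2
        particles.append((z, deque([z], maxlen=lag + 1)))
    filtered, smoothed = [], []
    for y in ys:
        moved, log_w = [], []
        for z, hist in particles:
            ls2 = z[1] + rng.gauss(0.0, nu)       # zeta_t ~ N(0, nu^2)
            m = z[2] + rng.gauss(0.0, xi)         # eta_t  ~ N(0, xi^2)
            lt2 = z[3] + rng.gauss(0.0, xi)       # eta_t  ~ N(0, xi^2)
            x = z[0] + rng.gauss(0.0, math.exp(0.5 * ls2))
            new_z = [x, ls2, m, lt2]
            new_hist = deque(hist, maxlen=lag + 1)  # copy, never mutate shared history
            new_hist.append(new_z)
            moved.append((new_z, new_hist))
            # Logarithm of eq. (2) with H as the identity.
            log_w.append(-0.5 * (math.log(2.0 * math.pi) + lt2)
                         - (y - (x + m)) ** 2 / (2.0 * math.exp(lt2)))
        top = max(log_w)                          # subtract the maximum to avoid underflow
        weights = [math.exp(lw - top) for lw in log_w]
        particles = rng.choices(moved, weights=weights, k=n)
        filtered.append(sum(z[0] + z[2] for z, _ in particles) / n)
        if len(particles[0][1]) == lag + 1:
            # Oldest stored state: the estimate for time t-L given data up to t.
            smoothed.append(sum(h[0][0] + h[0][2] for _, h in particles) / n)
    return filtered, smoothed
```

Resampling whole particles, including their stored history, is what turns the filter into a fixed-lag smoother: the surviving histories are those that agree with the evaluations observed up to time t.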
4.3. Local regression smoothing
Local regression smoothing smooths noisy data by weighted linear least-squares regression, minimizing eq. (7). The regression weight $w(x_t)$ is calculated by eq. (8). The regression weight takes its maximum value at the data point $x_t$ and becomes smaller as a data point x moves away from $x_t$. Thus, the regression weight is symmetric about $x_t$ for interior data, but asymmetric at both ends of the data. Local regression smoothing is usually applied to a static data set. However, we apply it each time new data are observed.

$$\sum_{t=1}^{N} w(x_t) \left( y_t - a - b x_t - c x_t^2 \right)^2 \tag{7}$$
$$w(x_t) = \left( 1 - \left\| \frac{x_t - x_i}{d(x)} \right\|^3 \right)^3 \tag{8}$$

where $x_t$ is the data point at time t, $x_i$ is a data point within the range, and $d(x)$ is the distance from $x_t$ to the farthest data point within the range.
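The weighted quadratic fit of eqs. (7) and (8) can be sketched in plain Python as follows. This is a minimal sketch rather than the author's code: the function names are hypothetical, the window is taken to be the most recent `span` observations (an assumption reflecting the online use described above), and the window must contain at least four distinct x values so that three points keep a nonzero weight.

```python
def tricube_weights(xs, x0):
    """Eq. (8): w = (1 - |(x0 - x_i) / d|^3)^3, d = distance to the farthest point."""
    d = max(abs(x0 - xi) for xi in xs)
    if d == 0.0:
        return [1.0] * len(xs)
    return [(1.0 - (abs(x0 - xi) / d) ** 3) ** 3 for xi in xs]


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * 3
    for r in (2, 1, 0):
        s = m[r][3] - sum(m[r][c] * sol[c] for c in range(r + 1, 3))
        sol[r] = s / m[r][r]
    return sol


def local_quadratic_fit(xs, ys, x0):
    """Minimise eq. (7) around x0 and return the fitted value at x0."""
    ws = tricube_weights(xs, x0)
    # Centre x on x0 for numerical stability. A quadratic in (x - x0) spans the
    # same functions as a + b*x + c*x^2, and its constant term is the value at x0.
    us = [xi - x0 for xi in xs]
    s = [sum(w * u ** k for w, u in zip(ws, us)) for k in range(5)]
    t = [sum(w * y * u ** k for w, u, y in zip(ws, us, ys)) for k in range(3)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    return solve3(normal, t)[0]


def smooth_latest(xs, ys, span):
    """Smooth the newest point using only the last `span` observations."""
    return local_quadratic_fit(xs[-span:], ys[-span:], xs[-1])
```

When the newest point is smoothed, all neighbours lie on one side of it, which is the asymmetric end-point case mentioned in the text.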
* We use the logarithms of $\sigma^2$ and $\tau^2$ in order to keep them positive.
5. Simulation
5.1. Simulation setting
In this paper, we refer to Tabelog.com, a well-known restaurant reputation information site in Japan, in order to model the noise of user evaluations.
5.1.1. User model
We construct a user model by analyzing the noise of user evaluations in an existing reputation information site. Specifically, we compare each user evaluation with the average of the other users' evaluations, and calculate the mean and the variance of the user evaluations.
5.1.2. Restaurant model
It is very difficult to measure the fluctuation of a restaurant's value in advance. Therefore, we use the Gaussian noise N(0, 0.04) as the fluctuation of the restaurant value. In addition, we assume that the value of a restaurant is constant. This corresponds to a restaurant chain that serves customers according to a manual and cooks its dishes in bulk at a factory.
5.1.3. Parameters
The simulation consists of 50 users and 50 restaurants. Each user visits one restaurant per day and evaluates it. The number of particles held by each user and each restaurant is N=500. The initial distributions of the particles that estimate the mean and the variance of a user evaluation are the uniform distributions U([-1.5, 1.5]) and U([0.0, 1.5]), respectively. The hyper-hyperparameter that adjusts their fluctuation width is ξ2=0.04. In addition, the initial distributions of the particles that estimate a restaurant value and the variance of the restaurant value are the uniform distributions U([2.5, 5.5]) and U([0.0, 0.2]), respectively. The hyper-hyperparameter that adjusts their fluctuation width is ν2=0.01. The lag of fixed-lag smoothing is L=5.
5.2. Simulation results
We describe the results for the case in which the appropriate values of the restaurants are constant. Fig. 6(a) shows the root-mean-square error between the appropriate value and the values estimated by two estimation methods. One is the particle smoother (PS), that is, the particle filter with fixed-lag smoothing; the other is the proposed method, which combines the particle smoother with local regression smoothing. In the figure, the horizontal and vertical axes indicate days and the logarithm of the root-mean-square error, respectively. The orange and green lines indicate the error of the particle smoother and the error of the proposed method, respectively.
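The settings of Section 5.1.3 and the error measure of Fig. 6(a) can be written down as a short Python sketch. The constant and function names are hypothetical, and the base-10 logarithm is an assumption, since the text only says "logarithm".

```python
import math
import random

N_USERS, N_RESTAURANTS, N_PARTICLES, LAG = 50, 50, 500, 5
XI2, NU2 = 0.04, 0.01   # hyper-hyperparameters from Section 5.1.3


def init_user_particles(rng, n=N_PARTICLES):
    """Each user particle holds (m, tau^2), drawn from U[-1.5, 1.5] and U[0.0, 1.5]."""
    return [(rng.uniform(-1.5, 1.5), rng.uniform(0.0, 1.5)) for _ in range(n)]


def init_restaurant_particles(rng, n=N_PARTICLES):
    """Each restaurant particle holds (x, sigma^2), drawn from U[2.5, 5.5] and U[0.0, 0.2]."""
    return [(rng.uniform(2.5, 5.5), rng.uniform(0.0, 0.2)) for _ in range(n)]


def log_rmse(true_values, estimates):
    """Base-10 logarithm of the root-mean-square error (undefined for a zero error)."""
    mse = sum((t - e) ** 2 for t, e in zip(true_values, estimates)) / len(true_values)
    return math.log10(math.sqrt(mse))


if __name__ == "__main__":
    rng = random.Random(0)
    users = [init_user_particles(rng) for _ in range(N_USERS)]
    restaurants = [init_restaurant_particles(rng) for _ in range(N_RESTAURANTS)]
    print(len(users), len(restaurants), len(users[0]))
```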
Fig. 6(b) shows the value of a restaurant estimated by the proposed method. In the figure, the horizontal and vertical axes indicate days and the value of the restaurant, respectively. The red line indicates the appropriate value of the restaurant. Each blue star indicates an individual user evaluation. The orange and green lines indicate the values estimated by the particle smoother and by the proposed method, respectively. As the blue stars in Fig. 6(b) show, some users gave values that deviated significantly from the appropriate value of the restaurant. Comparing the raw user evaluations with the values estimated by the particle smoother shows that the particle smoother removed the noise from the user evaluations. In addition, the proposed method estimated the appropriate value by applying local regression smoothing to the value estimated by the particle smoother.
Fig. 6(c) and Fig. 6(d) show the mean and the variance of a user evaluation estimated by the self-organizing state space model, respectively. In Fig. 6(c), the horizontal and vertical axes indicate days and the mean of the user evaluation, respectively. The red line indicates the mean of the user evaluation. The orange and green lines indicate the values estimated by the particle smoother and by the proposed method, respectively. This result shows that the self-organizing state space model eventually estimated the mean of the user evaluation. In Fig. 6(d), the horizontal and vertical axes indicate days and the variance of the user evaluation, respectively. This result shows that the self-organizing state space model estimated the variance of the user evaluation at an early stage.

Fig. 6. Simulation results. (a) Root-mean square. (b) An estimated value of a restaurant. (c) An estimated mean of a user evaluation. (d) An estimated variance of a user evaluation.

6. Conclusions
This paper proposed a novel method to estimate the appropriate values of products and services from many user evaluations in reputation information sites by using a particle filter and local regression smoothing. The proposed method employed the self-organizing state space model in order to estimate the noise of both the value of a product/service and the user evaluation, and employed fixed-lag smoothing in order to improve the estimation accuracy of the particle filter. We investigated the effects of the proposed method through simple simulation experiments. The simulation results confirmed that the proposed method can estimate the appropriate value of products and services and simultaneously estimate the noise of both the value of products/services and the user evaluations. In future work, we will apply the proposed method to existing reputation information sites such as Amazon.com, Yelp.com and Booking.com.

Acknowledgements
This research was supported by MEXT KAKENHI Grant Number 25730185 and by the Service Science, Solutions and Foundation Integrated Research Program (S3FIRE), Research Institute of Science and Technology for Society (RISTEX), Japan Science and Technology Agency (JST).
References
[1] J. Surowiecki, The Wisdom of Crowds, Anchor, 2005.
[2] L. Mlodinow, The Drunkard's Walk - How Randomness Rules Our Lives, Penguin, 2009.
[3] G. Kitagawa, Self-organizing state space model, Journal of the American Statistical Association, Vol.93, No.443, pp.1203-1215, 1998.
[4] W. S. Cleveland and S. J. Devlin, Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting, Journal of the American Statistical Association, Vol.83, No.403, pp.596-610, 1988.
[5] S. Brin and L. Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine, Computer Networks and ISDN Systems, pp.107-117, 1998.