Expert Systems with Applications 18 (2000) 283–297
Next-generation agent-enabled comparison shopping

Soe-Tsyr Yuan*, A. Liu

Information Management Department, Fu-Jen University, Taipei, Taiwan, ROC

* Corresponding author. Tel./fax: +886-2-2369-3220. E-mail address: [email protected] (S.-T. Yuan).
Abstract

Agents are the catalysts for commerce on the Web today. For example, comparison-shopping agents mediate the interactions between buyers and sellers in order to yield more efficient markets. However, today's shopping agents are price-dominated, unreflective of the nature of seller/buyer differentiation or the changing course of differentiation over time. This paper aims to tackle this dilemma and advances shopping agents into a stage where both kinds of differentiation are taken into account for enhanced understanding of the realities. We call them next-generation shopping agents. These agents can leverage the interactive power of the Web for a more accurate understanding of buyers' preferences. This paper then presents an architecture of the next-generation shopping agents. This architecture is composed of a Product/Merchant Information Collector, a Buyer Behavior Extractor, a User Profile Manager and an Online Learning Personalized-Ranking Module. We have implemented a system following the core of the architecture and collected preliminary evaluation results. The results show this system is quite promising in overcoming the reality challenges of comparison shopping. © 2000 Elsevier Science Ltd. All rights reserved.

Keywords: (Multi-) agent systems; Comparison shopping; Reinforcement learning; Neural networks; Buyer valuation models
1. Introduction

The increasing importance of electronic commerce and its likely impact on business worldwide is well documented through academic research and industry reports (Caglayan & Harrison, 1997; McEachern & O'Keefe, 1998). Software agents are a catalyst for commerce on the Web (Guttman & Maes, 1998). There is a strong affinity between the Web—a worldwide distributed computing environment—and the capability of agents to act on and through software. The ultimate goal of agents is to accelerate the evolution of the Web from a passive, static medium to a tuned, high-valued environment. Comparison-shopping agents are good examples of agents that serve this goal by providing a ranked list of product/merchant information gathered from the Web on behalf of buyers, so that buyers can make their transaction decisions easily and quickly. Today's shopping agents (BargainFinder; Doorenbos, Etzioni & Weld, 1997; Firefly; Jango; PersonaLogic) provide such ranked lists based on the prices of merchant products. However, these shopping agents fail to resolve the reality challenges addressed below:

• Seller and buyer differentiation. In order to distinguish themselves from their competitors, sellers often add
value to their products, such as extended warranties, fast delivery times, special gifts and diverse payment options. Accordingly, sellers can raise the price of a product above its marginal cost. Sellers are not satisfied with price-dominated ranked lists as the sole reflection of their products, and so may block shopping agents' access to the product information on their web sites. On the other hand, given a particular kind of product, for instance a walkman, buyers are often distinguished by their differing valuations of product/merchant terms, such as product brands, product functions, seller warranties or delivery days. There are some well-recognized ways of valuation, such as the ideal-brand model, the conjunctive model, the disjunctive model and the expected model (Kotler, 1999). The ranked lists generated by price-dominated shopping agents obviously cannot satisfy the buyers' needs.
• The differentiation change. In reality, sellers occasionally make changes to the added values of products according to their marketing strategies. On the other hand, buyers also change their own valuations of product/merchant terms because of market trends or personal reasons. Customized ranked lists generated using static user profiles 1 are not sufficient to tackle such a reality challenge.

1 The approach using a static user profile means memorizing a user's valuations of product/merchant terms and coming up with a customized ranked list for the user.
Fig. 1. Two agent-enabled activities for the integrative comparison shopping (labels in the figure: customized rank of products/merchants; interaction behaviors; behavior analysis of interactions; improvements).
Therefore, what is a good approach to comparison shopping that can tackle the reality challenges of seller/buyer differentiation and differentiation change? We believe a promising approach is the combination of two agent-enabled activities, as shown in Fig. 1:
• Agent-enabled customization of content (i.e. the ranked list), for tackling seller/buyer differentiation. In a ranked list, the most preferred item is placed in the first position, the second most preferred item is placed in the second position, and so on.
• Agent-enabled behavior analysis of interactions, for improving the customized content and tackling the differentiation change of buyers. In other words, in addition to exploiting the Web's commercial potential of connectivity, we also use the Web's other important potential, interactivity, and leverage the interactive power of the Web by analyzing users' behavior to capture buyers' dynamic preferences over time.
Notice that merely using agents to observe buyer behaviors on the Web and relating them to buyer preferences is not enough to gain an accurate understanding of buyer preferences. It is necessary to employ techniques that can leverage the interactive power of the Web for a more accurate understanding of buyers' preferences. This paper presents an architecture of comparison shopping that can support the two indicated agent-enabled activities and
tackle the reality challenges of comparison shopping. This architecture comprises a Product/Merchant Information Collector, a Buyer Behavior Extractor, a User Profile Manager and an Online Learning Personalized-Ranking Module that can leverage the interactive power of the Web. The Product/Merchant Information Collector acquires the most current product/merchant information. The Buyer Behavior Extractor obtains the interaction behaviors of buyers. The User Profile Manager maintains the interaction history of buyers. The Online Learning Personalized-Ranking Module takes charge of the agent-enabled behavior analysis of interactions.

This paper is organized as follows. In Section 2, the architecture of the next-generation comparison-shopping agents is described. Section 3 details the core module of the architecture, the Personalized-Ranking Module, which can leverage the interactive power of the Web to better understand buyers' preferences. Section 4 describes some experimental results that show how our system reaches a degree of accuracy in understanding users' dynamic preferences within a few interactions. A discussion is provided in Section 5, followed by the conclusion in Section 6.
2. An architecture of next-generation shopping agents

In this section, we describe an architecture of next-generation comparison-shopping agents, as shown in Fig. 2, which can handle the reality challenges raised in Section 1.
• Product/Merchant Information Collector: for a particular kind of product, for example a portable CD player, the collector acquires a variety of brands/models of portable CD players from the Web sites of major electronics online stores.
• Buyer Behavior Extractor: given the current rank of products/merchants, the extractor acquires the behavior of the buyer, such as the items the buyer browses, the order in which the items are browsed, the time spent by the buyer browsing the details of the items, and the online store Web sites that the buyer browses.
Fig. 2. A structure of the integrative comparison-shopping agents.
• User Profile Manager: the manager retains the buyer's behaviors and the behavior analysis results generated by the Online Learning Personalized-Ranking Module.
• Online Learning Personalized-Ranking Module: with the buyer's behavior history and the product/merchant information, the module performs the necessary reasoning/computation/analysis over the multiple terms of products/merchants in order to provide a better personalized rank—a rank where the preferred items are placed nearer to the front.

This architecture is qualified to support the two agent-enabled activities mentioned in Section 1.
• Agent-enabled customized ranked list:
1. With the Personalized-Ranking Module that makes inferences on the multiple terms of the product/merchant, the architecture makes the seller differentiation possible.
2. With the User Profile Manager and the Buyer Behavior Extractor, the architecture makes the buyer differentiation possible.
• Agent-enabled interaction behavior analysis:
1. With the Product/Merchant Information Collector, the architecture makes the differentiation change of the sellers, i.e. the up-to-date terms of products/merchants, known to the buyers.
2. With the Personalized-Ranking Module that repeatedly performs the behavior analysis of interactions acquired by the Buyer Behavior Extractor, the architecture takes the differentiation change of the buyers into account, improving the customized ranked list.

In this paper we only detail the core component of the architecture, the Online Learning Personalized-Ranking Module, in Section 3. For the details of the other components, please refer to Yuan and Liu (1999).

3. The Online Learning Personalized-Ranking Module

The task of the Online Learning Personalized-Ranking Module is to provide personalized ranks that reflect the realities of the differentiation of sellers and the change in the buyer differentiation. The method behind the module is temporal-difference reinforcement learning (Sutton, 1988) with value approximation (Sutton & Barto, 1998). In the following sections, we provide a description of the method, the rationale behind why the method suits the task, and a novel modeling of next-generation comparison shopping.

3.1. Reinforcement learning with value approximation

Fig. 3. The RL model.
Reinforcement Learning (RL) combines the fields of dynamic programming and supervised learning to yield powerful machine-learning systems. In RL, the computer is simply given a goal to achieve; it then learns how to achieve that goal by trial-and-error interactions with its environment. In the standard RL model, as shown in Fig. 3, an agent interacts with its environment. This interaction takes the form of the agent sensing the environment and, based on this sensory input (state), choosing an action to perform in the environment. The action changes the environment in some manner and this change is communicated to the agent through a scalar reward. In other words, the RL system learns a mapping from situations to actions by trial-and-error interaction with a dynamic environment. Accordingly, there are three fundamental parts of an RL problem:
1. The environment: partially or perfectly observable, represented by states.
2. The reward function: a mapping from a state to a scalar reward indicating the desirability of the state, through which the goal of the RL system is defined and through which the RL agent learns to perform actions that will maximize the total reward received in the long run.
3. The value function: a mapping from a state to a state value representing the sum of the rewards received when starting in that state and performing actions until a terminal state is reached. It can be formulated as an iterative optimization function and can determine a policy (a mapping from states to actions) that decides which action should be performed in each state, where action choices are made on the basis of value judgement—seeking actions that bring about states of highest value. In addition, it can be approximated using a function approximator such as a multi-layered perceptron.

In the RL model, the estimates of value functions are generally represented as a table with one entry for each state. However, this is limited to tasks with small numbers of states and actions. The problem is not just the memory needed for large tables, but the time and data needed to fill them accurately; that is, a generalization problem arises (experience with a limited subset of the state space must be generalized to produce a good approximation over a larger subset). On the other hand, supervised learning is about generalization from examples. Therefore, the combination of RL with an existing generalization method can provide such generalization. We call this a value function approximator (Baird, 1997).
3.2. The rationale

To view the search for a good personalized rank as an RL problem, we must find a natural correspondence between them. Such a correspondence is shown below; a minimal sketch of these roles in code follows the list.
• Environment (state): the current ranked list of product/merchant information. Initially, the environment is represented by the price-dominated ranked list of product/merchant information.
• Reward: a measurement of how good the current customized ranked list is, based on the feedback behavior invoked by the buyer on the environment.
• Agent: the entity that performs the analysis of behavior interactions based on the current reward and the past understanding of the buyer (the current value function), so that the proposed actions can improve over time.
• Action: the proposed ranked list generated by the agent based on its analysis results.
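To make this correspondence concrete, the following minimal sketch (hypothetical Python names, not taken from the paper's implementation) shows one way the state, the action and the buyer feedback of the shopping problem could be represented; the item fields mirror the product/merchant terms used in the experiments of Section 4.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Item:
    """One product/merchant entry; the fields mirror the terms used in Section 4."""
    merchant: str
    model: str
    price: float
    warranty_months: int
    delivery_days: int
    delivery_cost: float

# Both the RL state and the RL action are ranked lists of items
# (most preferred item first). The initial state is the price-dominated rank.
RankedList = List[Item]

@dataclass
class Feedback:
    """Buyer behavior observed on one displayed ranked list (the basis of the reward)."""
    click_order: List[int]       # item positions, in the order they were clicked for details
    browse_seconds: List[float]  # time spent looking at each clicked item's details
    store_link_order: List[int]  # item positions, in the order their store sites were visited

def initial_state(items: List[Item]) -> RankedList:
    """Price-dominated starting rank, as described in Section 4.1."""
    return sorted(items, key=lambda it: it.price)
```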
3.3. The modeling

In order to apply RL with value approximation in the search for a good personalized rank on behalf of the buyer, we need to define the reward function, describe the RL method we use, define the value approximator, and explain how the agent comes up with ranks based on the analysis results.

3.3.1. Reward function

The goal of our RL model is to find the near-to-best personalized ranked list of product/merchant information—one in which the position of an item reflects the buyer's interest in it. Therefore, the reward function should be defined in such a way that good actions (good ranked lists) result in higher rewards than bad ones. The measurement of the 'goodness' of a ranked list is defined through the analysis of the feedback behavior rendered by the buyer on the items of the ranked list. Currently, we only consider three types of feedback behavior: clicking for details, staying to look at the details, and linking to online store sites. This is analogous to the way vendors in a physical store can gain insight into shoppers by monitoring how often and how long shoppers glance at a pair of slacks or touch a rack of sweaters, and then adjust and rush out new products to those fussy buyers on the spot. Before stating the formula of the reward function, there are three underlying assumptions:
1. The sooner an item is clicked for details, the more interest the buyer has in it.
2. The sooner an item is linked to its store presence, the more interest the buyer has in it.
3. The longer an item's details are browsed, the more interest the buyer has in it.
The reward function is then defined as the weighted sum of the three interests, as shown in Eq. (1), each of which is computed based on a particular kind of feedback behavior with the formula shown in Eq. (2):

R = w_1 R_c + w_2 R_t + w_3 R_l    (1)

R_x = \frac{\sum_{i=1}^{N} PositionScore_i \times BehaviorScore_i}{\sum_{i=1}^{N} BestPositionScore_i \times BestBehaviorScore_i}    (2)

Rc: the interest contributed by the feedback behavior of clicking for details.
Rt: the interest contributed by the feedback behavior of staying to look at the details.
Rl: the interest contributed by the feedback behavior of linking to the store web site.
N: the total number of items in the ranked list.
PositionScore_i: the position score of the ith item in the ranked list, defined as N − i + 1, where i is its position in the ranked list. For example, the item at the top of the rank has a PositionScore of N.
BehaviorScore_i: the score of the order Order_i in which the behavior is rendered on the ith item, defined as 1/Order_i. For example, the item which is clicked last for its details has a BehaviorScore of 1/N.
BestPositionScore_i: the position score of the ith item in the best ranked list.
BestBehaviorScore_i: the behavior score of the ith item in the best ranked list.

Therefore, for the example of the 'click' behavior, the items in the best ranked list are supposed to be clicked sequentially from the top to the bottom; subsequently, the Rc interest of the best ranked list equals 1, as computed from the formula shown in Eq. (2). Conversely, the worst ranked list is supposed to be clicked sequentially from the bottom to the top, and the Rc interest of the worst ranked list equals 0.
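The short sketch below (hypothetical helper code, not the paper's implementation) computes the interest of Eq. (2) for one behavior type and combines the three interests as in Eq. (1). It assumes each behavior is reported as a mapping from a 1-based displayed position to the 1-based order in which the behavior occurred, that unacted-on items contribute nothing, and that the staying interest is treated as order-based for simplicity (the paper measures the time spent, which would use a different behavior score). The weights are illustrative.

```python
from typing import Dict

def interest_from_order(order_of_position: Dict[int, int], n_items: int) -> float:
    """Eq. (2): normalized interest for one behavior type (e.g. clicks)."""
    def score(position: int, order: int) -> float:
        position_score = n_items - position + 1   # top item scores N
        behavior_score = 1.0 / order              # last-acted-on item scores 1/N
        return position_score * behavior_score

    actual = sum(score(pos, order) for pos, order in order_of_position.items())
    # Best case: the buyer acts on the items from top to bottom (position i acted on i-th).
    best = sum(score(i, i) for i in range(1, n_items + 1))
    return actual / best

def reward(click_order: Dict[int, int], stay_order: Dict[int, int],
           link_order: Dict[int, int], n_items: int,
           w_click: float = 0.5, w_stay: float = 0.3, w_link: float = 0.2) -> float:
    """Eq. (1): weighted sum of the three interests."""
    r_c = interest_from_order(click_order, n_items)
    r_t = interest_from_order(stay_order, n_items)
    r_l = interest_from_order(link_order, n_items)
    return w_click * r_c + w_stay * r_t + w_link * r_l

# Example: in a 7-item list the buyer clicks positions 1, 3 and 2, in that order.
print(reward({1: 1, 3: 2, 2: 3}, {}, {}, n_items=7))
```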
3.3.2. RL method

The RL method we use is temporal difference TD(0) (Sutton, 1988), in which estimates are updated based in part on other learned estimates, without a model of the environment's dynamics and without waiting for a final outcome. The TD(0) method uses the value of the successor state and the reward along the way to compute a backed-up value, as shown in formula (3), and changes the value of the original state accordingly. The value iteration formula is shown in Eq. (4). The TD(0) method is sound (Sutton, 1988); that is, it guarantees convergence to the correct answer for the table-based case of the value function and for the case of general linear value function approximators (Sutton & Barto, 1998).

S_t \rightarrow R_t + \gamma V(S_{t+1})    (3)

V(S_t) \leftarrow V(S_t) + \alpha \left[ R_t + \gamma V(S_{t+1}) - V(S_t) \right]    (4)

S_t: the state at time t; R_t: the observed reward at time t; γ: the discount factor; α: the learning rate, with which V(S_t) is updated to be closer to the value of R_t + γV(S_{t+1}).

Table 1
An example of the computation of a sub-characteristic

| The real ranked list | The best ranked list | The correct score based on the best ranked list | The current score based on the real ranked list | Score |
| 12 | 20 | 3 | 7 | 3 × 7 = 21 |
| 13 | 16 | 4 | 6 | 4 × 6 = 24 |
| 11 | 15 | 2 | 5 | 2 × 5 = 10 |
| 15 | 13 | 5 | 4 | 5 × 4 = 20 |
| 16 | 12 | 6 | 3 | 6 × 3 = 18 |
| 20 | 11 | 7 | 2 | 7 × 2 = 14 |
| 10 | 10 | 1 | 1 | 1 × 1 = 1 |
| The total score of the ranked list | | | | 108 |
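A minimal sketch of the TD(0) backup of Eq. (4), here with a plain tabular value function so the update is easy to see (the paper replaces the table with a neural-network approximator, as described next; the names are hypothetical).

```python
from collections import defaultdict
from typing import Dict, Hashable

def td0_update(V: Dict[Hashable, float], s_t: Hashable, r_t: float, s_next: Hashable,
               alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Eq. (4): move V(s_t) toward the backed-up target r_t + gamma * V(s_next)."""
    target = r_t + gamma * V[s_next]
    V[s_t] += alpha * (target - V[s_t])

# Toy usage: states are identified by any hashable key (a real state is a ranked list).
V = defaultdict(float)
td0_update(V, s_t="rank_A", r_t=0.6, s_next="rank_B")
print(V["rank_A"])   # ≈ 0.06 after one backup
```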
3.3.3. Value approximator

In the RL model, the state space of our problem is extremely large. Therefore, V(S) has to be represented by a function approximator other than a lookup table. The function approximator we use is a Backpropagation Neural Network (BPN) (Chester, 1993). The normal inputs to the BPN are examples of the desired input–output behavior of the function it is trying to approximate. Therefore, the basic conception of the combination of RL and the BPN is the following: the backup formula, as in Eq. (3), means that the estimated value for state S_t should be more like R_t + γV(S_{t+1}). Therefore, it is natural to interpret each backup as specifying an example of the desired input–output behavior of the function approximator BPN. We use the BPN for value prediction by simply passing to it each backup S_t → R_t + γV(S_{t+1}) as a training example; subsequently, all of the weights would be adjusted through gradient descent to make the actual output closer to the desired output. The approximated function produced by the BPN is then interpreted as an estimated value function.

As S_t is a ranked list of product/merchant information, we need a transformation function that can transform each ranked list into a set of attribute values, as shown in formula (5). These attribute values are then combined to form the input vector of a training example. In other words, each input vector should reflect the characteristics of a ranked list of product/merchant information and thus is qualified to be representative of the ranked list. The heuristic we employ is that the characteristics of a whole ranked list of the multiple-attribute product/merchant information can be represented by an array of sub-characteristics, each of which summarizes the ranked list according to a single attribute.

T: S_t \rightarrow (a_1, a_2, \ldots, a_n)    (5)

n: the number of attributes; a_i: a sub-characteristic of the given ranked list based on an attribute.

The sub-characteristic with respect to an attribute is defined as in formula (6):

a_i = \frac{\sum_{j=1}^{N} CorrectScore_j \times CurrentScore_j - \sum_{j=1}^{N} WorstCorrectScore_j \times WorstCurrentScore_j}{\sum_{j=1}^{N} BestCorrectScore_j \times BestCurrentScore_j - \sum_{j=1}^{N} WorstCorrectScore_j \times WorstCurrentScore_j}    (6)

N: the total number of items in the ranked list.
CurrentScore_j: the position score of the jth item in the ranked list, defined as N − j + 1, where j is its position in the ranked list. For example, the item at the top of the rank has a CurrentScore of N.
CorrectScore_j: the correct score of the jth item, defined as its attribute-value order in the list. For example, the item with the smallest value of the attribute has a CorrectScore of 1 when highest values are preferred. (Whether the score follows the increasing or the decreasing order of the attribute value depends on whether larger or smaller values of the attribute are preferred.)
Fig. 4. The steps for the evaluation of our system.
WorstCurrentScore_j: the position score of the jth item in the worst ranked list.
WorstCorrectScore_j: the attribute-value order score of the jth item in the worst ranked list.

Below is an example of the computation of the warranty-length sub-characteristic of a ranked list of seven items. The leftmost column in Table 1 shows a shortened ranked list in which only the values of the particular term, warranty length, are available. The score is equal to the summation of the products of CorrectScore and CurrentScore of the items in the ranked list. The score of the best ranked list is 7 × 7 + 6 × 6 + 5 × 5 + 4 × 4 + 3 × 3 + 2 × 2 + 1 × 1 = 140. The score of the worst ranked list is 7 × 1 + 6 × 2 + 5 × 3 + 4 × 4 + 3 × 5 + 2 × 6 + 1 × 7 = 84. The warranty-length sub-characteristic of the ranked list is therefore (108 − 84)/(140 − 84) = 0.43, a score that summarizes the characteristics of the ranked list according to the warranty-length term.
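A small sketch of formula (6) (illustrative code, not the paper's implementation), applied to the warranty-length values of Table 1; it reproduces the 0.43 of the worked example. It assumes larger attribute values are preferred and that ties are broken by list order.

```python
from typing import List

def sub_characteristic(values_in_rank_order: List[float], larger_is_better: bool = True) -> float:
    """Formula (6): summarize how well a ranked list orders one attribute, in [0, 1]."""
    n = len(values_in_rank_order)
    # CorrectScore: attribute-value order (1 = least preferred value, n = most preferred).
    order = sorted(range(n), key=lambda j: values_in_rank_order[j], reverse=not larger_is_better)
    correct = [0] * n
    for score, j in enumerate(order, start=1):
        correct[j] = score
    current = [n - j for j in range(n)]                # CurrentScore: top item gets n
    actual = sum(c * p for c, p in zip(correct, current))
    best = sum((k + 1) ** 2 for k in range(n))         # best list: both scores perfectly aligned
    worst = sum((k + 1) * (n - k) for k in range(n))   # worst list: scores fully reversed
    return (actual - worst) / (best - worst)

warranty = [12, 13, 11, 15, 16, 20, 10]                # the real ranked list of Table 1
print(round(sub_characteristic(warranty), 2))          # 0.43
```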
3.4. The agent

With the approximated value function, the agent can generally determine a better ranked list of product/merchant information (a good action) than the previous one (the current state of the environment). This decision is obtained by computing the values of all possible actions at the current state (the process of value computation is shown above) and choosing the action with the highest value. However, our experiments show that this value computation is lengthy because of the mapping step from a ranked list to its sub-characteristic vector, and because the number of possible actions at each state is as large as the number of permutations of the items. Therefore, we need a heuristic in order to speed up this decision process. This heuristic is used in the search for an action with a high value, and it reverses the decision process: first obtain the sub-characteristic vector of an approximately best ranked list, and then determine the final ranked list of the items by comparing the distance between each item and the approximately best rank. From formula (5), the characteristic of a ranked list is represented by a vector of sub-characteristics a_i, and each a_i ranges from 0 to 1. The way we obtain a roughly best ranked list is given by the three steps listed after Tables 2 and 3 below; a code sketch of the whole procedure follows.
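The sketch below (hypothetical code; the distance step in particular is one plausible reading of the paper's description) enumerates candidate sub-characteristic vectors over the grid {1, 0.8, 0.6, 0.4, 0.2, 0}, scores each with the current value approximator, and then orders the items by their distance to the best candidate vector, using normalized attribute values and an absolute (or Euclidean) metric.

```python
from itertools import product
from typing import Callable, List, Sequence

GRID = (1.0, 0.8, 0.6, 0.4, 0.2, 0.0)

def best_candidate_vector(n_attributes: int,
                          value_of: Callable[[Sequence[float]], float]) -> Sequence[float]:
    """Steps 1-3: enumerate the 6^n candidate sub-characteristic vectors and keep
    the one the current value approximator scores highest."""
    return max(product(GRID, repeat=n_attributes), key=value_of)

def rank_items_by_distance(item_attributes: List[Sequence[float]],
                           target: Sequence[float],
                           euclidean: bool = False) -> List[int]:
    """Assign orders to items by their distance from the approximately best vector.
    item_attributes holds each item's attribute values normalized to [0, 1]
    (a modelling assumption; the paper only gives a one-attribute example)."""
    def distance(attrs: Sequence[float]) -> float:
        diffs = [abs(a - t) for a, t in zip(attrs, target)]
        return sum(d * d for d in diffs) ** 0.5 if euclidean else sum(diffs)
    return sorted(range(len(item_attributes)), key=lambda i: distance(item_attributes[i]))

# Toy usage with a stand-in value function over two attributes (e.g. price, warranty):
fake_value = lambda v: -abs(v[0] - 0.2) - abs(v[1] - 0.8)   # pretends low price, long warranty
target = best_candidate_vector(2, fake_value)                # -> (0.2, 0.8)
items = [(0.9, 0.9), (0.1, 0.8), (0.5, 0.2)]                 # normalized attribute values
print(target, rank_items_by_distance(items, target))         # (0.2, 0.8) [1, 0, 2]
```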
Table 2
The sub-characteristic vectors of the ranks generated in the consecutive ten interactions for the example in Section 4.2.1

| The rank after the Nth interaction | Price sub-char. | Warranty length sub-char. | Merchant reputation sub-char. | Product-brand reputation sub-char. | Delivery days sub-char. | Delivery cost sub-char. | Value |
| Initial rank | 1.000 | 0.241 | 0.413 | 0.100 | 0.358 | 0.439 | 0.262 |
| 1st | 0.761 | 0.103 | 0.272 | 0.450 | 0.134 | 0.756 | 0.362 |
| 2nd | 0.496 | 0.552 | 0.435 | 0.550 | 0.194 | 0.756 | 0.562 |
| 3rd | 0.369 | 0.552 | 0.196 | 0.650 | 0.090 | 0.854 | 0.654 |
| 4th | 0.572 | 0.379 | 0.109 | 0.500 | 0.060 | 0.878 | 0.769 |
| 5th | 0.529 | 0.793 | 0.674 | 0.550 | 0.791 | 0.195 | 1.041 |
| 6th | 0.381 | 0.828 | 0.815 | 0.550 | 0.806 | 0.220 | 1.223 |
| 7th | 0.352 | 0.793 | 0.848 | 0.550 | 0.821 | 0.293 | 1.304 |
| 8th | 0.116 | 0.828 | 0.739 | 0.750 | 0.761 | 0.366 | 1.287 |
| 9th | 0.113 | 0.862 | 0.728 | 0.750 | 0.716 | 0.341 | 1.287 |
| 10th | 0.113 | 0.862 | 0.728 | 0.750 | 0.716 | 0.341 | 1.284 |
Table 3
The sub-characteristic vector of the ideal rank for the example in Section 4.2.1

| Price sub-char. | Warranty length sub-char. | Merchant reputation sub-char. | Product-brand reputation sub-char. | Delivery days sub-char. | Delivery cost sub-char. |
| 0.164 | 1.000 | 0.532 | 0.550 | 0.522 | 0.463 |
1. For each a_i, select six candidate values from 0 to 1, namely 1, 0.8, 0.6, 0.4, 0.2 and 0. Subsequently, all possible ranked lists of the items can be roughly represented by the set of sub-characteristic vectors generated by combining the candidate values of each attribute.
2. Compute the values of the sub-characteristic vectors in the set with the currently trained value approximator.
3. The vector with the highest computed value accordingly characterizes the approximately best ranked list.

Next, the final ranked list of the items is determined by assigning orders to the items according to their distance from the approximately best ranked list. For instance, in the simplest case of a sub-characteristic vector of size 1, such as {0}, and a list of four items with values 10, 20, 30 and 40, the item with a value of 10 is the closest one because it has the smallest value. For a vector of size greater than 1, the distance metric can be either the Euclidean or the Absolute metric.

4. Experiments

This section describes the strategy behind our experiments and presents a portion of the experimental results undertaken for the evaluation of our system. For the complete results, please refer to Yuan and Liu (1999). Above all, we need to demonstrate the degree of accuracy and the number of interactions required for the online learning module to leverage the interactive power of the Web for a better understanding of buyers' preferences.

4.1. The strategy

The fundamental strategy of our experiments comprises the four steps shown in Fig. 4. As mentioned in Section 1, buyers are often distinguished from each other by their different valuations of product/merchant terms. The well-recognized ways of valuation are the ideal-brand model, the conjunctive model, the disjunctive model and the expected model. Therefore, in order to have an independent measure of actual preferences, each valuation model of buyers is used for modeling the behavior of a buyer who interacts with the personalized ranks generated by our system. In the experiments, the target product type is a Walkman and the terms of product/merchant information taken into account are product price, warranty length, delivery days, delivery cost, product-brand reputation and merchant reputation. Prior to any interaction, the initial personalized rank provided is based solely on the term of price, because we believe most, if not all, buyers regard price as an important consideration in shopping. Within each buyer-valuation model, we test the system for different users, and also test the system in cases where buyer preferences change and product/seller contents change. Part of the results is described in Section 4.2.
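As an illustration of how a valuation model can stand in for a real buyer during evaluation, the following sketch (hypothetical code, not the paper's implementation) simulates a buyer under the expected (weighted-sum) model: it scores each displayed item and reports a click order from most to least preferred, which can then be fed to the reward computation of Section 3.3.1.

```python
from typing import Dict, List

def expected_model_utility(item: Dict[str, float], weights: Dict[str, float],
                           utilities: Dict[str, float]) -> float:
    """Expected-model valuation: weighted sum of per-term utilities.
    Here a term's utility is approximated as utilities[term] * item[term]
    (a simplification; any per-term utility function could be plugged in)."""
    return sum(weights[t] * utilities[t] * item[t] for t in weights)

def simulated_click_order(items: List[Dict[str, float]], weights: Dict[str, float],
                          utilities: Dict[str, float]) -> Dict[int, int]:
    """Return {1-based displayed position: click order}, most preferred clicked first."""
    scored = sorted(range(len(items)),
                    key=lambda i: expected_model_utility(items[i], weights, utilities),
                    reverse=True)
    return {pos + 1: order + 1 for order, pos in enumerate(scored)}

# Toy usage: a buyer weighing warranty heavily and disliking high price.
weights = {"price": 3, "warranty": 3}
utilities = {"price": -1.0, "warranty": 1.0}           # sign encodes the preferred direction
displayed = [{"price": 0.9, "warranty": 0.3},           # values normalized to [0, 1]
             {"price": 0.2, "warranty": 0.8}]
print(simulated_click_order(displayed, weights, utilities))   # {2: 1, 1: 2}
```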
4.2. The results

Due to the limitation of space, we only give some experimental results for the disjunctive model (Sections 4.2.1 and 4.2.2) and some results for the expected model of buyer valuation (Section 4.2.3). In the disjunctive model, the preference for an item is in accordance with the degree of satisfaction of the values of particular specified terms, regardless of the values of the rest of the terms. In the expected model, the buyer's preference for an item is obtained by computing the weighted sum of utilities over the term values of the item.

4.2.1. The fundamental results

In this section, we give results on an example in which a buyer, who is interested in shopping for a Walkman, regards the warranty length as the primary consideration and the product price as the secondary consideration, and is indifferent to the rest of the terms of product/merchant information. Table 2 shows the interaction results in terms of the sub-characteristic vectors that represent the personalized ranks generated in the consecutive ten interactions. The 'Value' column gives the values computed by the online learning module for each generated rank; these values reflect how close the generated ranks have come to achieving near-to-best situations. Table 3 is the sub-characteristic vector of the ideal rank for this example. Fig. 5 is the graphical view of Table 2. Table 4 then shows the generated rank after the tenth interaction.

From Table 2 and Fig. 5 it can be seen that the generated ranks gradually progress toward longer warranty lengths and lower product prices. The degree of accuracy of the acquired preferences can be assessed by comparison with the sub-characteristic vector of the ideal rank in Table 3, as approximately

[(1 − (0.164 − 0.113)/0.164) + (1 − (1 − 0.862)/1)]/2 = 0.78.

Figs. 6 and 7 show some screen dumps that convey a sense of our system.

4.2.2. The results after buyer preferences change

Now suppose that after the tenth interaction, the buyer modeled in Section 4.2.1 changes his preferences to focus on delivery cost, preferring to buy from shops with middle-to-low reputation and being indifferent to the rest of the terms of product/merchant information. Table 5 shows the interaction results in terms of the sub-characteristic vectors that represent the ranks generated in another five consecutive interactions. From Table 5, the generated ranks gradually progress toward lower delivery cost and toward shops with middle-to-low reputation. Table 6 is the generated rank after the fifteenth interaction.

4.2.3. The results after product/seller contents change

In this section, we give results on an example in which a buyer, who is interested in shopping for a Walkman, weighs
Table 4
The generated rank after the tenth interaction for the example in Section 4.2.1

| Merchant | Product model | Product brand | Price | Warranty length | Delivery days | Delivery cost |
| | SX500 | | 3900 | 24 | 3 | 150 |
| | SX510 | | 4000 | 24 | 3 | 150 |
| | D-E505 | | 4000 | 24 | 3 | 150 |
| | D-E700 | | 4200 | 24 | 5 | 50 |
| | D-E800 | | 4900 | 24 | 4 | 100 |
| | D-E808 | | 5000 | 24 | 3 | 100 |
| | D-E900 | | 6000 | 24 | 3 | 25 |
| | D-E400 | | 2900 | 18 | 6 | 100 |
| | SP220 | | 3500 | 18 | 5 | 50 |
| | SX510 | | 4000 | 18 | 5 | 50 |
| | D-E700 | | 4600 | 18 | 4 | 100 |
| | D-E707 | | 4800 | 18 | 6 | 100 |
the utilities of Walkman terms differently. For instance, the relative weights for the terms of price, warranty length, merchant reputation, product brand, delivery days and delivery cost are 3, 3, 10, 2, 10 and 1, respectively. Table 7 shows the sub-characteristic vectors of the ranks
generated in the ten consecutive interactions before the contents change, while Table 8 is the sub-characteristic vector of the ideal rank for this example and Table 9 is the rank generated after the ten interactions before the contents change.
Table 5
The sub-characteristic vectors of the ranks generated after the buyer's preferences change, from the eleventh interaction

| The rank after the Nth interaction | Price sub-char. | Warranty length sub-char. | Merchant reputation sub-char. | Product-brand reputation sub-char. | Delivery days sub-char. | Delivery cost sub-char. | Value |
| 11th | 0.634 | 0.483 | 0.359 | 0.400 | 0.239 | 0.488 | 1.120 |
| 12th | 0.579 | 0.483 | 0.511 | 0.550 | 0.313 | 0.415 | 1.096 |
| 13th | 0.579 | 0.483 | 0.511 | 0.550 | 0.313 | 0.415 | 1.074 |
| 14th | 0.579 | 0.483 | 0.511 | 0.550 | 0.313 | 0.415 | 1.051 |
| 15th | 0.579 | 0.483 | 0.511 | 0.550 | 0.313 | 0.415 | 1.050 |
Table 6
The generated rank after the buyer's preferences change

| Merchant | Product model | Product brand | Price | Warranty length | Delivery days | Delivery cost |
| | D-E505 | | 3800 | 12 | 7 | 25 |
| | D-E707 | | 4400 | 12 | 7 | 25 |
| | D-E900 | | 6000 | 24 | 3 | 25 |
| | SP270 | | 2900 | 12 | 5 | 50 |
| | SX500 | | 3400 | 12 | 5 | 50 |
| | SP220 | | 3500 | 18 | 5 | 50 |
| | SX510 | | 4000 | 18 | 5 | 50 |
| | D-E700 | | 4200 | 24 | 5 | 50 |
| | D-E400 | | 2900 | 18 | 6 | 100 |
| | D-E707 | | 4800 | 18 | 6 | 100 |
| | D-E808 | | 5000 | 24 | 3 | 100 |
Now suppose the contents of products/sellers change after the tenth interaction; the learned preferences can nevertheless still transfer to the new contents. Table 10 shows the new sub-characteristic vectors of the ranks generated consecutively, Table 11 is the sub-characteristic vector of the ideal rank for the new contents, and Table 12 shows the final rank generated by our system after these new interactions. The degree of accuracy of the transferred preferences is
Table 7
The sub-characteristic vectors of the ranks generated in the consecutive ten interactions for the example before the contents change

| The rank after the Nth interaction | Price sub-char. | Warranty length sub-char. | Merchant reputation sub-char. | Product-brand reputation sub-char. | Delivery days sub-char. | Delivery cost sub-char. | Value |
| Initial rank | 1.000 | 0.241 | 0.413 | 0.100 | 0.358 | 0.439 | 0.331 |
| 1st | 0.761 | 0.103 | 0.283 | 0.350 | 0.149 | 0.732 | 0.504 |
| 2nd | 0.458 | 0.517 | 0.196 | 0.500 | 0.075 | 0.829 | 0.597 |
| 3rd | 0.369 | 0.552 | 0.196 | 0.650 | 0.090 | 0.854 | 0.729 |
| 4th | 0.572 | 0.379 | 0.109 | 0.500 | 0.060 | 0.878 | 0.825 |
| 5th | 0.568 | 0.414 | 0.087 | 0.500 | 0.030 | 0.854 | 0.836 |
| 6th | 0.393 | 0.759 | 0.293 | 0.550 | 0.567 | 0.561 | 0.980 |
| 7th | 0.381 | 0.828 | 0.815 | 0.550 | 0.806 | 0.220 | 1.221 |
| 8th | 0.747 | 0.310 | 0.337 | 0.000 | 0.134 | 0.659 | 1.174 |
| 9th | 0.503 | 0.552 | 0.750 | 0.500 | 0.582 | 0.195 | 1.270 |
| 10th | 0.324 | 0.655 | 0.772 | 0.700 | 0.597 | 0.244 | 1.332 |
Fig. 5. The graphical view of Table 2: sub-characteristic value against interaction number for the price, warranty-length, merchant-reputation, product-reputation, delivery-days and delivery-cost sub-characteristics, together with the overall value.
Fig. 6. The screen dump of a personalized ranked list of product/merchant information where the interaction behaviors of the buyer are rendered.
Fig. 7. Within the product/merchant information, the detail of an item becomes available when the item is clicked for detail by a buyer.
roughly as follows:

(3/29)(1 − (0.601 − 0.571)/0.571) + (3/29)(1 − (0.952 − 0.810)/0.952) + (10/29)(1 − (0.962 − 0.844)/0.962) + (2/29)(1 − (0.500 − 0.232)/0.232) + (10/29)(1 − (0.935 − 0.641)/0.935) + (1/29)(1 − (0.350 − 0.143)/0.350) ≈ 0.75.
5. Discussion

In this section, we first describe a recent work that also addresses the buyer differentiation problem in comparison shopping. We then discuss our results in terms of the number of interactions needed, the degree of accuracy in understanding buyers' preferences, and the portability of learned preferences.
1. In a recent work, T@T (Moukas, Guttman & Maes, 1998), the challenge of seller/buyer differentiation mentioned in Section 1 was addressed. Nevertheless, the differentiation change was not resolved in that work. T@T provided a buyer with personalized ranked lists of product/merchant information with multiple terms such as extended warranty, delivery time and price. The mechanism T@T based its work on was Multi-Attribute Utility Theory (MAUT) (Keeney & Raiffa, 1976). This theory involves two distinctive analyses: an uncertainty analysis and a utility (preference) analysis that analyzes preferences with multiple attributes. Uncertainty analysis is concerned with the uncertainties of
Table 8
The sub-characteristic vector of the ideal rank before the contents change

| Price sub-char. | Warranty length sub-char. | Merchant reputation sub-char. | Product-brand reputation sub-char. | Delivery days sub-char. | Delivery cost sub-char. |
| 0.412 | 0.759 | 0.866 | 0.146 | 0.913 | 0.450 |
Table 9
The generated rank after the tenth interaction before the contents change

| Merchant | Product model | Product brand | Price | Warranty length | Delivery days | Delivery cost |
| | D-E505 | | 4000 | 24 | 3 | 150 |
| | SX500 | | 3900 | 24 | 3 | 150 |
| | SX510 | | 4000 | 24 | 3 | 150 |
| | D-E400 | | 3500 | 12 | 3 | 150 |
| | XP-R650 | | 4200 | 12 | 3 | 150 |
| | D-E900 | | 6000 | 24 | 3 | 25 |
| | D-E800 | | 4900 | 24 | 4 | 100 |
| | D-E700 | | 4600 | 18 | 4 | 100 |
| | XP-R650 | | 3300 | 12 | 4 | 100 |
| | D-E808 | | 5000 | 24 | 3 | 100 |
| | D-E400 | | 2900 | 18 | 6 | 100 |
| | D-E700 | | 4200 | 24 | 5 | 50 |
| | D-E505 | | 3800 | 12 | 7 | 25 |
| | D-5WD | | 3400 | 12 | 3 | 150 |
| | SX510 | | 4000 | 18 | 5 | 50 |
| | SP220 | | 3500 | 18 | 5 | 50 |
| | SX500 | | 3400 | 12 | 5 | 50 |
| | SP270 | | 2900 | 12 | 5 | 50 |
| | D-E707 | | 4800 | 18 | 6 | 100 |
| | D-E707 | | 4400 | 12 | 7 | 25 |
Table 10
The sub-characteristic vectors of the ranks generated in the five consecutive interactions after the contents change

| The rank after the Nth interaction | Price sub-char. | Warranty length sub-char. | Merchant reputation sub-char. | Product-brand reputation sub-char. | Delivery days sub-char. | Delivery cost sub-char. | Value |
| 11th | 0.696 | 0.714 | 0.844 | 0.400 | 0.654 | 0.054 | 1.357 |
| 12th | 0.635 | 0.762 | 0.857 | 0.400 | 0.641 | 0.071 | 1.267 |
| 13th | 0.601 | 0.810 | 0.844 | 0.500 | 0.641 | 0.143 | 1.203 |
| 14th | 0.601 | 0.810 | 0.844 | 0.500 | 0.641 | 0.143 | 1.117 |
| 15th | 0.601 | 0.810 | 0.844 | 0.500 | 0.641 | 0.143 | 1.166 |
attribute values in buyers' buying decisions. Utility analysis is concerned with the tradeoff relationships between the multiple attributes in buyers' buying decisions. T@T initially required direct feedback from buyers to acquire the uncertainties and the utilities before applying the MAUT theory to generate the personalized ranked list. Accordingly, in addition to imposing on the buyers the burden of assessing uncertainties and utilities, T@T could not adapt over time to the changing differentiation of the buyers.
2. In Sections 4.2.1 and 4.2.2, within the disjunctive model of buyer valuation, we demonstrate how our system is able to understand user preferences after the fourth interaction (see Table 2), and needs only a few extra interactions (see Table 5) to realize the change in the buyer's preferences. Furthermore, the degree of accuracy for learned preferences is around 78%. The results therefore look quite promising, and similar promise also appears in the other buyer-valuation models, such as the ideal-brand model and the expected model (Yuan & Liu, 1999).
3. In addition, Section 4.2.3 shows that within the expected model of buyer valuation, for the case of product/seller content change, our system still adapts smoothly to the new products/sellers. The degree of accuracy for transferred preferences is around 75%. Similar results apply to the other buyer-valuation models (Yuan & Liu, 1999). In other words, within a buyer-valuation model, the learned preferences of a buyer transfer well across different products/sellers of the same type. For instance, given a buyer, the learned preferences for a Walkman can transfer across different sets of products/sellers, or even to a CD player.
4. From the experiments we also draw two observations:
◦ With very few behaviors, such as sparse clicking per interaction, the system may not learn a good ranked list.
◦ The bigger the discrepancy between our initial rank (the price-dominated rank) and the ideal rank in the buyer's mind, the larger the number of interactions required to obtain a good ranked list.
Based on the above limitations, it may be better to combine our approach with a pre-processing step that aims to provide a good initial rank from which our system can proceed. Such a pre-processing step can employ either a knowledge-based approach with simple user-interest solicitation or a clustering approach with analysis of the clusters obtained.

6. Conclusion

In this paper, we have advocated the importance of next-generation agent-based comparison shopping because it advances the current stage of comparison shopping, the price-dominated ranked list of merchant/product information, to a stage that reflects the reality challenges of seller/buyer differentiation and differentiation change. We then presented an architecture that is able to support next-generation agent-enabled comparison shopping, and provided an implemented system in accordance with the architecture. The core component of this system is an Online Learning Personalized-Ranking Module that plays a vital role in the two agent-enabled activities: agent-enabled customization of ranked lists and agent-enabled analysis of buyer interaction behaviors. The method the module employs is reinforcement learning with value approximation. Using this method, we provide a novel modeling of next-generation comparison shopping. The results show that our implemented system is quite promising in terms of the number of interactions needed and the degree of accuracy obtained in understanding buyers' preferences with respect to different models of buyer valuation.
Table 11
The sub-characteristic vector of the ideal rank after the contents change

| Price sub-char. | Warranty length sub-char. | Merchant reputation sub-char. | Product-brand reputation sub-char. | Delivery days sub-char. | Delivery cost sub-char. |
| 0.571 | 0.952 | 0.962 | 0.232 | 0.935 | 0.350 |
Table 12
The generated rank after the tenth interaction after the contents change

| Merchant | Product model | Product brand | Price | Warranty length | Delivery days | Delivery cost |
| | SP270 | | 2900 | 36 | 3 | 200 |
| | SX500 | | 3600 | 36 | 3 | 200 |
| | SX510 | | 3700 | 36 | 3 | 200 |
| | D-E707 | | 4800 | 36 | 3 | 200 |
| | D-E700 | | 5000 | 36 | 3 | 200 |
| | D-E700 | | 4600 | 24 | 4 | 50 |
| | D-E800 | | 4900 | 24 | 4 | 50 |
| | XP-R650 | | 3300 | 12 | 4 | 50 |
| SOGO | D-5WD | | 3400 | 24 | 6 | 150 |
| SOGO | SX500 | | 3400 | 24 | 6 | 150 |
| SOGO | D-E505 | | 3800 | 24 | 6 | 150 |
| | D-E400 | | 3500 | 24 | 5 | 50 |
| | D-E808 | | 5000 | 24 | 6 | 150 |
| | D-E505 | | 4200 | 24 | 5 | 50 |
| | SP220 | | 3600 | 12 | 6 | 150 |
| | SX510 | | 4000 | 24 | 5 | 50 |
| | XP-R650 | | 4200 | 12 | 5 | 50 |
| | D-E400 | | 2900 | 12 | 7 | 100 |
| | D-E707 | | 4000 | 12 | 7 | 100 |
| | D-E900 | | 5500 | 12 | 7 | 100 |
References

Baird, L. (1997). Residual algorithms: reinforcement learning with function approximation. http://kirk.usafa.af.mil/~baird.
BargainFinder. URL: http://bf.cstar.ac.com/bf.
Caglayan, A., & Harrison, C. (1997). Agent source book: A complete guide to desktop, internet, and intranet agents. New York: Wiley.
Chester, M. (1993). Neural networks. Englewood Cliffs, NJ: Prentice Hall.
Doorenbos, R. B., Etzioni, O., & Weld, D. S. (1997). A scalable comparison-shopping agent for the world-wide web. http://www.cs.washington.edu/homes/etzioni/.
Firefly. URL: http://www.firefly.com/.
Guttman, R. H., & Maes, P. (1998). Agent-mediated electronic commerce: a survey. Knowledge Engineering Review, June.
Jango. URL: http://www.jango.com/.
Keeney, R., & Raiffa, H. (1976). Decisions with multiple objectives: preferences and value tradeoffs. New York: Wiley.
Kotler, P. (1999). Marketing management. Prentice Hall.
McEachern, T., & O'Keefe, B. (1998). Rewriting business: uniting management and the web. New York: Wiley.
Moukas, A. G., Guttman, R. H., & Maes, P. (1998). Agent-mediated integrative negotiation for retail electronic commerce. http://ecommerce.media.mit.edu.
PersonaLogic. URL: http://www.personalogic.com/.
Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3, 9–44.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: an introduction. Cambridge, MA: MIT Press (http://envy.cs.umass.edu/~rich/book/the-book.html).
Yuan, S. T., & Liu, A. (1999). Integrative shopping agents. Technical report. Taipei: Fu-Jen University (Information Management Department).

Soe-Tsyr Yuan is an associate professor of the Information Management Department of Fu-Jen University in Taipei, Taiwan. Current research interests include agent-based electronic commerce, and knowledge discovery and data mining.
Andy Liu is a graduate student in the Information Management Department of Fu-Jen University in Taipei, Taiwan.