Learning to bid in sequential Dutch auctions


Journal of Economic Dynamics & Control 48 (2014) 374–393


E. Guerci a,*, A. Kirman b,c, S. Moulet b

a GREDEG (UMR 7321), University of Nice Sophia-Antipolis, 250 rue Albert Einstein, 06560 Valbonne, France
b GREQAM (UMR 7316), Aix-Marseille University, 2, rue de la vieille charité, 13236 Marseille Cedex 02, France
c EHESS, France


Abstract

Article history: Received 12 November 2013; Received in revised form 18 September 2014; Accepted 18 September 2014; Available online 28 September 2014.

We propose an agent-based computational model to investigate sequential Dutch auctions, with particular emphasis on markets for perishable goods, and we take wholesale fish markets as an example. Buyers in these markets sell the fish they purchase on a retail market. The paper provides an original model of boundedly rational behavior for wholesale buyers, incorporating learning to improve profits, conjectures as to the bids that will be made, and fictitious learning. We analyze the dynamics of the aggregate price under different market conditions in order to explain the emergence of market price patterns such as the well-known declining price paradox and the empirically observed fact that the very last transactions in the day may be at a higher price. The proposed behavioral model provides alternative explanations for market price dynamics to those which depend on standard hypotheses such as diminishing marginal profits. Furthermore, agents learn the option value of having the possibility of bidding in later rounds. When confronted with random buyers, such as occasional participants or new entrants, they learn to bid in the optimal way without being conscious of the strategies of the other buyers. When faced with other buyers who are also learning, their behavior still displays some of the characteristics learned in the simpler case, even though the problem is not analytically tractable.

JEL classification: C63, D44, D81, D83

Keywords: Multi-agent learning; Auction markets; Agent-based computational economics

1. Introduction

This paper presents an agent-based computational model to analyze how agents learn to bid in markets based on sequential Dutch/descending price auctions. We construct a learning process for the agents in the model, who have limited information, and see to what extent they learn to bid as they would, in theory, if they had full information. Furthermore, we show that our model is capable of reproducing certain stylized facts in these markets, such as the "declining price paradox" and the fact that, at the very end of the day, prices may rise. The problem is of empirical interest since a large number of markets are organized as a sequence of auctions. These include those for wine (see Ashenfelter, 1989; Ginsburgh, 1998), art (see Ashenfelter and Graddy, 2003; Beggs and Graddy, 1997), dairy cattle (see Engelbrecht-Wiggans and Kahn, 1999), real estate (see Ashenfelter and Genovese, 1992), and perishable goods such as flowers (see van den Berg et al., 2001) and fish (see Kirman, 2001; Graddy, 2006). The problem is also of theoretical interest: there is, in effect, a literature on what path prices for successive sales of identical objects should follow during a sequence of auctions. We examine the evolution of prices over time in such markets in our computational model and compare the evidence with that from previous work on such markets.

* Corresponding author. E-mail addresses: [email protected] (E. Guerci), [email protected] (A. Kirman), [email protected] (S. Moulet).

http://dx.doi.org/10.1016/j.jedc.2014.09.029


The standard result is due to Milgrom and Weber (1982), who show that in equilibrium, if bidders have independently drawn private values, successive sales of identical objects should yield, on average, the same price. As van den Berg et al. (2001) point out, if n buyers, each of whom wishes to buy one object, are faced with the successive sales of n identical objects, there is a tension between two forces. As each round passes, one buyer is eliminated from the market, suggesting that competition will decline. However, one object is also removed, so the available supply diminishes. These two forces, as Milgrom and Weber (1982) show, should offset each other, and the successive prices should follow a martingale with expected prices the same in each round. The intuitive explanation for this simple theoretical result is that if there were a pattern to the sequence of prices, then those participating in the auction at the time when the price was highest would have done better to participate in the lower-price period. Yet the empirical evidence suggests otherwise: in many markets organized as successive auctions, prices have been observed to decline over time and, as a result, this has been called the "declining price paradox". The earliest discussions, which appeared in relatively obscure documents, concerned the sale of strawberries,1 an eminently perishable product. Declining prices have since been observed in wine auctions (see Ashenfelter, 1989), fish markets (see e.g. Pezanis-Christou, 2000), flower auctions (see van den Berg et al., 2001) and in a number of other contexts. As a result of the simple game-theoretical analysis, this rather widespread phenomenon has come to be known as the "declining price anomaly".
This is a reversal from the earlier literature, in which Sosnick (1963) suggests that declining prices might be quite natural in the auction of a sequence of objects sold at English auction, since the first object would be sold at the reservation price of the bidder with the second highest value, the second at the third highest, and so forth. For Sosnick, descending prices would be normal rather than an exception. Yet, in contrast with the later literature, he noted that in practice this did not seem to be the case for those auctions for which he had information. He suggested, as an explanation, a form of strategic behavior in which bidders might bide their time, referred to this as "precautionary measures", and in so doing anticipated Milgrom and Weber without giving a formal argument. But Sosnick did not consider that people in reality take such precautionary measures and says: "In the absence of such protective measures, a downtrend in prices might be expected. Both the few price histories available and the comments of buyers, however, indicate that no pattern is sufficiently frequent to be regarded as typical. There appear to be several reasons why down-trends do not predominate". Amongst these he suggests variations in quality and the entry and exit of buyers. The later literature on the subject, taking the equilibrium argument into account, offers a number of further explanations. Among these are uncertainty as to the number of objects remaining (see Neugebauer and Pezanis-Christou, 2007), risk aversion (see McAfee and Vincent, 1993), the existence of unknown absentee bidders (see Ginsburgh, 1998), and the presence in some auctions of the option to buy some or all of the remaining objects at that price.2 This paper pursues this strand of research with a specific focus on the nature of this phenomenon in the context of centralized markets for perishable goods organized as sequential Dutch auctions.
Perishable goods markets are particularly suited to the analysis of price formation since they have a number of "good" properties. For instance, fish is perishable and therefore there is no possibility for the sellers to postpone the sale of the good (fresh fish cannot be stocked for more than a day and still be characterized as "fresh"), so the total quantity on hand must be traded. This greatly simplifies the economic analysis, since one does not have to analyze the holding of inventories by the seller and can reduce his behavior to selling to the current best bidder with no intertemporal considerations. This helps us to arrive at conclusions in a simple model and to see to what extent they are consistent with the empirical evidence. One point is worth noting. The argument that a price decline happens because the seller cannot store the good should be qualified. In Dutch auctions the seller notifies the auctioneer of the price at which his lot should be withdrawn. Therefore, if bidders wait too long before bidding they will not obtain the good. The reason that the seller behaves like this is that there are alternative uses for fish: pet food, fishmeal, etc. In addition, there is the possibility of selling the fish directly in outside, over-the-counter markets. Consequently, it would seem that the declining price pattern is a direct outcome of the repeated competition among the buyers who participate in these auctions. One of the authors, with colleagues, has acquired a detailed knowledge of the functioning of the Ancona wholesale fish market (see Gallegati and Gianfranco Giulioni, 2011). This was obtained from discussions and interviews with traders and auctioneers and by observing their modus operandi. The computational model incorporates some of this information. Although we present a stylized version of a market based on sequential Dutch auctions, our computational model incorporates some specific features of this Italian wholesale fish market.
This seems to be quite reasonable as opposed to the more standard approach of using assumptions based on introspection. But imposing this sort of constraint should not prevent the model from capturing some important and interesting aspects that might be relevant for analyzing other perishable, or even non-perishable, goods markets, which often share the organizational and behavioral features found in our specific market. In the particular case of fish markets, several papers have already addressed this issue from a theoretical, computational or empirical point of view (Graddy, 2006; Weisbuch et al., 2000; Kirman and Vriend, 2001; Kirman et al., 2005; Giulioni and Bucciarelli, 2008, 2010, 2011; Gallegati and Gianfranco Giulioni, 2011; Kirman and Vignes, 1991).

1 See Bressler (1936), especially p. 16 and appendices, and Clark and Bressler (1938), p. 18.
2 Black and de Meza (1992) suggest this as an explanation, as this option becomes less valuable over time. However, Ashenfelter (1989) argues that the existence of such an option does not suffice to explain observed price declines.


As far as the computational literature is concerned, the most common agent-based modeling approach is to exploit a well-established behavioral model which determines certain minimal behavioral requirements for the agents' decision-making process capable of replicating the empirical evidence, and in particular certain "stylized facts" such as, in our current framework, the declining price paradox at the aggregate level. Standard approaches/algorithms are adopted for different economic contexts where the level of information available varies or where the repeated nature of the interaction among the same market actors is neglected. In the literature, some papers (Camerer et al., 2002; Hommes and Lux, 2013) have proposed new behavioral models incorporating not only the well-exploited adaptive assumptions, but also more sophisticated learning assumptions to capture salient aspects of the specific context modeled. In this paper we adopt a similar approach and propose a decision-making model incorporating behavioral assumptions which have also been derived from field study and data analysis. The specification of the behavioral model that we propose is, in part, based on our specific empirical information. The idea is to validate the model not only from a so-called predictive or descriptive output validation perspective, but also to ensure that the structural conditions, institutional arrangements, and behavioral dispositions incorporated into the model reflect the salient aspects of the actual system.3 In the economic context which we analyze, we aim to give an accurate reflection of the level of information available to market operators about the market mechanisms and conditions, and of their conjectures about opponents' behavior in the daily sequential descending auctions. We implement a nonstandard computational learning algorithm which captures behavioral aspects that can help us to reproduce the emergence of certain well-known "stylized facts".
To do this, we build a computational model in which we specify the behavior of the agents. We analyze the price evolution over the day, examine to what extent this is affected by learning over time, and whether prices decline over time. We look at situations in which outside (noise) traders are also bidding. This is convenient from a modeling point of view, since it ensures that the good will always be sold, but it also reflects reality, in that there are always a few buyers who are not regularly present and whose bidding is less predictable. We analyze the impact of varying the number of competing learning bidders, and we examine the effect of changes in the quantity of the good available on the market on the evolution of prices. In particular, we identify and analyze the buyer's long-term stable "optimal" strategies both in a stationary (competing against a random buyer) and a non-stationary (competing against other learning buyers) market environment. We show that the behavior implied by this strategy is consistent with common stylized facts of market price dynamics in sequential Dutch auctions for perishable goods markets, in particular the price-declining phenomenon. Furthermore, in the context of a stationary environment, we prove analytically that, despite the fact that the behavior of the opponents is unknown, the process finds the corresponding "full knowledge" optimal prices. In the nonstationary market environment, it is not possible to provide an analytical proof supporting the simulation results, but we confirm the presence of similar features in the learned "optimal policy". Finally, by selecting ad hoc models of the retail demand (linear and perfectly inelastic) we can study the role that specific aspects of the behavioral model play in the determination of price patterns.
Summarizing, the three factors that in our model contribute significantly to the generation of declining prices, both in the stationary and nonstationary market environments, are diminishing marginal profits (linear retail demand model), the illusory decreasing marginal profit effect due to intertemporal optimization (perfectly inelastic retail demand model), and more abundant supply. The behavioral model also generates a "time pressure" effect (under both retail demand models): in the last auction round, the buyer's optimal policy is to raise his bid for the unit that has not yet been sold, because the option to wait no longer has any value, and this is again due to the intertemporal behavioral component. Thus, although our model captures the overall declining price phenomenon reported elsewhere, it also captures the empirical fact observed by Härdle and Kirman (1995), which is that for the very last transactions prices do not decline and sometimes rise. The paper is organized as follows. In the next section, we describe the Ancona wholesale fish market in order to provide detailed information on the market environment that inspired our computational model. In Section 3, we introduce the computational model of the market environment and of the market participants' decision-making processes. In Section 4 we present results from simulations of different market scenarios and discuss under which conditions our model generates the declining price anomaly. Finally, Section 5 presents some comments and concluding remarks.

2. An empirical example: the MER.IT.AN

The MER.IT.AN (MERcato ITtico ANcona, Italian for "fish market of Ancona") is open 4 days a week (Tu.–Fr.; 3.30–7.30 a.m.). It consists of three Dutch auctions which are run in parallel, with about 15 transactions in total per minute. The total value of the fish sold amounts to 25 million euros per year. All the vessels unload their fish the evening before the auction.
In the morning, each type of fish is arranged in cases of about 5–7 kilograms. Then, the catch from each of the vessels is randomly assigned to one of three conveyor belts and the market employees put the crates from that vessel on to the selected belt. When a crate from the selected vessel is put on the belt, the type of fish, the weight and the name of the vessel that caught the fish are shown on the screen. Then, the price display for that crate is set (the auctioneer decides the initial price) and the price progressively declines as the crate moves toward the end of the belt. Thus, at any one time there are three crates, each of them moving along one of the belts. Buyers watch the three displays (or clocks) and can bid on one or

3 In this we follow Barreteau et al. (2003), who developed an input validation approach.


more of them. In such Dutch auctions the first person to push the button corresponding to a belt, at the price that has been reached for the crate on that belt, wins the auction. In the period from 19 September 2002 to 28 May 2003, 53,555 transactions, for a total weight of 360,115 kg, were made for fish sold on one of the three conveyor belts4 (see Gallegati and Gianfranco Giulioni, 2011). During this period 70 sellers and 149 buyers exchanged 110 transaction classes on this specific conveyor belt. Note that a transaction class is different from a species: in this market, the items sold are recorded using a more detailed classification than would be obtained by considering only species. For example, the species sole has three transaction classes, big, medium and small sole, defined by the length of the fish. The data are collected daily and stored on the market computer. In our computational model we confine ourselves to the one belt for which we have data, and we assume also that market participants trade each transaction class independently and that a transaction class consists of a homogeneous good.
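The descending-clock mechanism described above can be sketched in a few lines. The function below and all of its numeric values (the starting price, the seller's withdrawal price, the bidders' limit prices) are illustrative assumptions, not data from the market:

```python
# Minimal sketch of one descending-price (Dutch) auction round. In Ancona
# the price ticks down on a clock display and the first buyer to press the
# button wins at the current price; here "first" is modeled as the lowest
# bidder index among those willing to pay. All values are hypothetical.
def dutch_auction_round(start_price, withdraw_price, limit_prices, step=1):
    """Price ticks down from start_price; the first bidder whose limit
    price is reached wins. If the price falls below the seller's
    withdrawal price, the lot is withdrawn unsold."""
    price = start_price
    while price >= withdraw_price:
        # Bidders willing to buy at the current price; the first to react wins.
        willing = [i for i, limit in enumerate(limit_prices) if limit >= price]
        if willing:
            return willing[0], price   # (winner index, transaction price)
        price -= step
    return None, None                  # lot withdrawn unsold

print(dutch_auction_round(20, 5, [12, 16, 9]))   # (1, 16)
```

Note how the winner pays his own limit price, the highest among the bidders: in a Dutch auction the clock stops at the first (highest) willingness to pay it reaches.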

3. The agent-based computational model

The goods are assumed here to be indivisible units of a perishable and homogeneous good, which are traded daily in the wholesale market and then sold on retail markets. The former market is modeled as a sequential Dutch/descending auction. Each buyer on the wholesale market is assumed to be a monopolistic seller facing a demand on his own retail market. Every period/day d the same sequential trading procedure comprising T rounds is repeated.5 Furthermore, we assume, as is in fact the case, that each market participant meets all the other participants every day and that the population may contain some random or noise traders, who are those who participate only occasionally, as mentioned previously. The focus of this research is on the functioning of the wholesale market. In what follows, we sketch the characteristics of the model. The market actors are one auctioneer, several sellers and buyers. In each round of the auction the auctioneer is responsible for choosing the starting price P̄. The auctioneer seldom varies the starting price. This behavior is motivated by the attempt to avoid buyers' strategic reasoning. One auctioneer that we interviewed mentioned an analogy with the card game of bridge, saying that auctioneers do not want buyers to implement the standard strategy in that game of counting and memorizing the cards which have already been played so as to work out which remain in the hands of their opponents.6 We restrict ourselves to the case of one seller for simplicity and we assume that at each of the T auction rounds one and only one unit of the good is sold.7 The seller is characterized by a reservation price P̲ corresponding to an opportunity cost related to a sale in another market.
In our model, both the seller and the auctioneer are assumed to be passive actors who do not act strategically; the rationale for this is that it enables us to focus on the buyers, who exhibit the most interesting and relevant behavior among market actors with respect to price formation. Furthermore, the only strategic variable available to the auctioneer is the starting price, and the sellers no longer play an active role once they have delivered their fish to the market. The n buyers, however, are assumed to learn from their experience and generate the daily demand. Both demand and supply may thus vary on a daily basis. In the following, we derive the profit function of the generic buyer i for day d under the hypothesis that she faces a daily elastic linear demand in the retail market. This exercise then enables us to introduce both the elastic and inelastic retail market demand cases that are reported in Fig. 1 and simulated. The profit function of the buyer i for day d is



\[
\Pi^i_d\left(p^i_k, Q^i\right) = Q^i \, \frac{a^i - Q^i}{b^i} - \sum_{k=1}^{K} p^i_k \tag{1}
\]

where Q^i = ∑_{t=1}^{T} 1^i_K(t) ≤ T is the total number of units bought in day d in the wholesale market in the K ⊆ {1, …, T} auction rounds that she won, which are then sold in the retail market. 1^i_K(t) is the indicator function, equal to one if buyer i won the t-th auction round (t belongs to the set K) and to zero otherwise. Thus, Q^i is the final amount of the good purchased in day d and not the desired amount, which we denote Q^{*,i} and which can obviously be smaller or, for simulation purposes, even greater than Q^i. We assume that she can resell the total amount in her retail market on the same day d. In the retail market, buyer i is supposed to face a daily linear demand D^{i,r}_d(p^{i,r}) = a^i − b^i p^{i,r}. On the other hand, costs correspond to the daily sum of wholesale market prices p^i_k paid by i in order to buy Q^i units of the good. At round t of the auction of day d, the winning buyer i, having already bought q^i ∈ {0, …, Q^i} units of the good, obtains a reward equal to the marginal profit of buying one more unit of good q^i + 1:







 

\[
\pi^i_t\left(q^i + 1\right) = \Pi^i_d\left(q^i + 1\right) - \Pi^i_d\left(q^i\right) = \frac{a^i - (2q^i + 1)}{b^i} - p^i_t, \qquad t \in \{1, \dots, T\}. \tag{2}
\]
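As a numerical sketch of the daily profit in Eq. (1): the buyer resells the Q^i units she won at the market-clearing retail price (a^i − Q^i)/b^i and subtracts the wholesale prices paid. The demand parameters and prices below are illustrative, not estimates from the Ancona data:

```python
# Daily profit of buyer i under Eq. (1) with linear retail demand
# D(p) = a - b*p: resell the Q units won during the day's T auction rounds
# at the retail price (a - Q)/b, minus the wholesale prices paid.
# All parameter values are illustrative.
def daily_profit(prices_paid, a, b):
    Q = len(prices_paid)             # units won over the day's T rounds
    revenue = Q * (a - Q) / b        # retail revenue at price (a - Q)/b
    return revenue - sum(prices_paid)

print(daily_profit([3.0, 2.5], a=9.0, b=1.0))  # 2*(9-2)/1 - 5.5 = 8.5
```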

4 Data for the other belts was not available, but the organizers of the market intend to start recording this in the near future.
5 Of course, this is a simplification, since T is not constant and varies from day to day depending on the total catch.
6 In reality the buyers observe the passage of boxes or crates of different types of fish and, since in Ancona for example the auction happens very fast, there is one transaction every 4 s. This achieves the auctioneer's stated goal of "preventing the buyers from behaving strategically" (Gallegati and Gianfranco Giulioni, 2011).
7 In reality, the sellers determine the supply of fish, but the captains of the vessels have little control over how much fish they can catch on any particular day.


Fig. 1. Retail demand models. Left panel: linear retail demand (price P against quantity Q). Right panel: perfectly inelastic retail demand.

Buyers rationally exit the current daily market when the marginal profit at price 0 is negative, in other words when there is no positive price at which it would be profitable to trade, and this in turn determines the maximum/desired quantity to buy in the current day, Q^{*,i}: by Eq. (2), the marginal profit of unit q^i + 1 at price 0 is negative once q^i > (a^i − 1)/2. Q^{*,i} > Q^i if the buyer cannot buy all the units she wishes in day d, and Q^{*,i} < Q^i if she buys more than desired. Fig. 1 shows on the left side the linear retail demand model. Both the continuous profit curve (dashed line) and the discrete marginal profit values (continuous segments) are drawn. The diminishing marginal profit effect is evident, and the profit region where marginal revenue is greater than or equal to marginal cost is shown as a grey area. On the right-hand side of the figure, we report the second market scenario, where the retail demand is perfectly inelastic. The resulting marginal profits are constant up to the amount of goods sold in the retail market, and constant and negative thereafter. This assumption might appear rather extreme, but it captures an interesting effect. When we refer to demand on the retail market we are referring to a short-term (daily) demand. In the long run the demand is never perfectly inelastic, but in our model the retail demand is determined by the daily behavior of the customers. The retail seller has to forecast the retail demand daily and to buy the fish in advance on the same day in the auction-based wholesale market. For example, a restaurant owner who sells a particular dish at a given price falls into this category.

3.1. The learning algorithm

The buyers, who are the sellers on the retail market, learn in the long run to increase their long-term profits by improving their short-term gains. We assume that they do so by means of an original learning algorithm which is based on several behavioral assumptions.8 Our starting point is a classical reinforcement learning rule (see Marimon and McGrattan, 1995; Erev and Roth, 1998).
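The marginal profit of Eq. (2) and the resulting desired quantity under a linear retail demand can be sketched numerically; the parameters a^i and b^i below are illustrative:

```python
# Marginal profit of the (q+1)-th unit from Eq. (2):
# pi_t(q+1) = (a - (2q+1))/b - p_t. Parameter values are illustrative.
def marginal_profit(q, bid_price, a, b):
    return (a - (2 * q + 1)) / b - bid_price

a, b = 9.0, 1.0
# At a wholesale price of 0 the marginal profit declines by 2/b per unit
# and turns negative once q > (a - 1)/2, which pins down the desired
# quantity Q*.
print([marginal_profit(q, 0.0, a, b) for q in range(6)])
# [8.0, 6.0, 4.0, 2.0, 0.0, -2.0]
q_star = max(q + 1 for q in range(10) if marginal_profit(q, 0.0, a, b) >= 0)
print(q_star)  # 5: the largest quantity still worth buying at a price of 0
```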
We assume that the buyers had the opportunity to explore the market environment thanks to their daily practice and that they progressively start to exploit their acquired knowledge of the market environment by selecting an optimal bidding strategy. The key elements of a reinforcement algorithm are attractions A^i : P → R. They represent a numerical measure of how much an action is worth playing.9 The action space P is a discrete set of bid prices p ∈ {0, …, P̲, …, P̄} which is common to all buyers. We consider prices both below and above the true seller's reservation price P̲ for each buyer. In a standard reinforcement learning model, attractions are defined only over actions, because no state space is considered. In the behavioral model we propose, we extend the standard formulation by defining attractions over both actions and states, A^i : P × S^i → R. The state, in our case, will reflect how far along the auction is and the number of units the buyer still wishes to purchase. The attractions' values evolve over time on the basis of current experience. Buyers are assumed to have a complete representation of the state space when participating in the market; thus they update the values of attractions conditioned on the state they are experiencing. We therefore define an action-state mental representation of the buyer. In particular, the rule for updating attractions is the following:

\[
A^i_t\left(s^i_t, p^i\right) = (1 - \alpha)\, A^i_{t-1}\left(s^i_t, p^i\right) + \alpha\, R^i_t\left(s^i_t, p^i, p^{-i}\right) \tag{3}
\]

where α is a forgetting parameter that slowly reduces the importance of past experience, and R^i_t(s^i_t, p^i, p^{−i}) is a reward term which depends on the current state, the price bid p^i of buyer i and the price bids of all other buyers p^{−i} = (…, p^{i−1}, p^{i+1}, …). This "state space" approach has already been empirically investigated in the paper by van den Berg et al. (2001) on flower markets. The idea is that buyers condition their price bid on the rank of the transaction they are facing and on the amount of

8 We thank two anonymous referees for their useful suggestions, which led to the definition of a more parsimonious learning model.
9 Attractions are referred to in different ways in the literature: for instance, in the machine learning literature they are called Q-values, and in the economics literature they are also called strengths or propensities.



Fig. 2. Each circle represents a possible state of the system. The first number indicates the number of units bought (q^i). In this example the number of units that the buyer wants to acquire is Q^{*,i} = 2. The second number indicates the number of the current auction t within the sequence of auctions. In this example, there are 5 sequential auction rounds. States are connected to each other by possible actions. The selection of an action/bid price does not by itself determine the transition to the next state, because the decisions of the opponents play a major role.

remaining units. In the market for flowers analyzed by van den Berg et al. (2001), the buyers know in advance the total supply of the particular good for which they are bidding. Each lot is composed of several identical units, i.e., a fixed amount of a particular type of flower. The first winning buyer announces how many "units" of the lot she wants to buy, so there is what the authors call a "parcel option". The remaining units are then sold in a subsequent auction, and so on until all units of the lot are bought. They found that these two variables (the rank of the transaction the buyers are facing and the amount of remaining units) are statistically significant for the buyers' decision-making process. We adopt similar but not identical assumptions, because in our market environment the buyers do not directly receive information on the total supply of the good, although they can condition their choice on the rank of the auction and the number of units already bought; they can, however, learn the supply over time, since we keep the number of rounds constant throughout the simulation and equal to the total supply of the good. Thus, we condition the buyer's decision on two values: the rank number of the transaction they are facing and the number of remaining units that they are willing to buy. It is very difficult for buyers to recall precisely what has been traded, for, as mentioned before, the auctioneer at MER.IT.AN tries to prevent them from doing so. However, buyers are aware that only a limited amount of fish is available and do condition their bids on the amount that they have already purchased. As previously mentioned, the auctioneer of MER.IT.AN that we interviewed confirmed the validity of such assumptions by introducing an analogy with the card game of bridge, because buyers obviously would like, as far as possible, to condition their bids on signals that they receive from the market environment.
Thus, we define the state space S^i = S^i_int × S_ext as the Cartesian product of an internal (private) state set S^i_int and an external (common) state set S_ext. The internal state is represented by the number of units of the good that buyer i has already bought, q^i ∈ {0, …, Q^i}. The buyer can modify her reservation price conditional on this. The external state enables the buyer to determine at which round of the sequence of auctions they are, t ∈ {0, …, T}. So, the buyer modifies her reservation price conditional on the time remaining until the daily market closes or, rather, on her estimation of how much fish may still remain. Fig. 2 sketches the action-state space representation from the viewpoint of a learning buyer who is willing to buy two units of the good and faces five auction rounds. The arrows establish the connection between states (transition matrix), from the daily initial state (0,1), where no units have been bought yet and the first auction round is starting, to the final absorbing state (A), into which all states (2, ·), those where all desired units have been bought, converge. We need the presence of such an absorbing state because it might happen that a buyer buys more units than desired. We need to be able to define such a possibility: when, for instance, a buyer wins the first two units of the good, or equivalently auction rounds, and then "involuntarily" (because the choices are probabilistic, as specified below) also wins a third auction round or even more. We collapse all such states into a final absorbing one because buyers are indifferent with respect to all such states; that is, they yield negative profits, and buyers should learn to bid at zero in such states. This daily pattern represents an episode which repeats itself identically day after day.
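The state-conditioned attraction update of Eq. (3) can be sketched as follows. States pair (units already bought, auction round) as in Fig. 2; the forgetting parameter and the reward values are illustrative stand-ins for the model's reward term:

```python
# Attraction update of Eq. (3):
# A_t(s, p) = (1 - alpha) * A_{t-1}(s, p) + alpha * R_t(s, p, p_others).
# The reward values below are illustrative stand-ins for the model's
# reward term (the marginal profit of Eq. (2)).
from collections import defaultdict

alpha = 0.1                       # forgetting parameter
A = defaultdict(float)            # attractions A[(state, bid)], initially 0

def update_attraction(state, bid, reward):
    A[(state, bid)] = (1 - alpha) * A[(state, bid)] + alpha * reward

state = (0, 1)                    # no units bought yet, first auction round
update_attraction(state, bid=4, reward=2.0)
update_attraction(state, bid=4, reward=2.0)
print(round(A[(state, 4)], 6))    # 0.38: past experience decays at rate alpha
```

Because attractions are keyed by (state, bid) pairs rather than by bids alone, the same bid price can acquire different values in different states, which is exactly the extension over standard reinforcement learning that the text describes.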
Indeed, we assume that the buyer faces an episodic task, i.e., to learn the best policy to adopt in order to improve on previous daily profits in such a repeated market environment, where she is always willing to buy two units of the good and there are five auction rounds. The probability K_t^i of buyer i choosing the bid price p^i in auction t when the state is s_t^i is then monotonically related to the attractions through a classical logit probabilistic choice model:

K_t^i(s_t^i, p^i) = exp(λ_d A_t^i(s_t^i, p^i)) / ∑_{p ∈ P} exp(λ_d A_t^i(s_t^i, p)).   (4)
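A minimal sketch of the choice rule of Eq. (4); the attraction values, the 11-price grid, and the λ values below are illustrative assumptions, not numbers taken from the paper:

```python
import math
import random

def logit_choice_probs(attractions, lam):
    """Logit (softmax) choice probabilities over bid prices, as in Eq. (4).

    attractions maps each bid price p to A_t^i(s_t^i, p); lam is lambda_d.
    """
    # Subtracting the max attraction before exponentiating improves numerical
    # stability and leaves the probabilities of Eq. (4) unchanged.
    m = max(attractions.values())
    weights = {p: math.exp(lam * (a - m)) for p, a in attractions.items()}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}

def sample_bid(attractions, lam, rng=random):
    """Draw one bid price according to the logit probabilities."""
    probs = logit_choice_probs(attractions, lam)
    prices = list(probs)
    return rng.choices(prices, weights=[probs[p] for p in prices], k=1)[0]

# Hypothetical attractions over an 11-price grid: bid 4 currently looks best.
A = {p: 0.0 for p in range(11)}
A[4] = 2.0
probs_explore = logit_choice_probs(A, lam=0.01)  # small lambda: near-uniform
probs_exploit = logit_choice_probs(A, lam=5.0)   # large lambda: concentrated
```

With a small λ the choice is close to uniform (exploration); with a large λ the probability mass concentrates on the highest-attraction bid (exploitation).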

λ_d is a time-varying parameter that reflects the weight put on previous experience: it can be thought of as the weight put on using the experience from previous choices (exploiting) as opposed to trying new choices (exploring).10

10 This is in contrast to many models in which the parameter λ is kept constant. In our context it seems more reasonable to assume that a buyer will become more sure of what the best choice is in the repeated environment.

E. Guerci et al. / Journal of Economic Dynamics & Control 48 (2014) 374–393

Starting the simulation with a small positive value λ_d ≪ 1, the learning buyer chooses almost uniformly among the whole action space, even the actions that have the lowest attraction value. Progressively throughout the simulation we impose that λ_d → 1, so that the probabilistic model for selecting the next action becomes increasingly concentrated on the particular action p̂^i having the highest attraction value A_t^i(s_t^i, p̂^i) in the current state, thereby converging to the almost sure selection of the action p̂^i. Our learning model assumes, realistically, that buyers can perform an almost complete attraction updating: the buyer reinforces not only the most recently played action but also all other actions about which she can make a correct inference. Buyers can simply use counterfactual reasoning. This is easy to understand by looking at the consequences of a bid. In particular, we can infer the new state if the buyer lost in the following cases:

• she can easily understand that any other private reservation price below the market price would have yielded an identical profit, equal to zero, and the same future state.

• she can also infer what would have happened if her reservation price had been above the market price, that is, she would have obtained the fish at that price. However, we assume that she is not capable of guessing what would have happened had her bid been at the same price as the highest bidder. If two or more buyers bid at the same price, we pick one of them at random as the winner. If the buyer won:

• she can easily understand that any other private reservation price above her own winning bid price would have yielded a similar outcome of the market game (but a different profit) and the same future state. She cannot infer anything for bids below her winning bid, since she cannot observe what the other participants would have bid.

There is only one important aspect of our behavioral model still missing: the reward term R^i(s_t^i, p^i, p^{−i}) and the intertemporal decision-making rule. In particular, we exploit the machine learning literature to find a successful solution for implementing an intertemporal decision-making rule over an action-state space representation, i.e., the Q-learning algorithm (see Watkins and Dayan, 1992). Instead of the standard instantaneous profit R_t^i(s_t^i, p_t^i, p_t^{−i}) = π_t^i(q+1), we adopt the Q-learning formula as the reward term:

R_t^i(p_t^i, p_t^{−i}, s_t^i) = π_t^i(q+1) + γ A_t^{*,i}(s_t^i),   if p_t^i > p_t^j ∀ j ≠ i,   (5)

R_t^i(p_t^i, p_t^{−i}, s_t^i) = γ A_t^{*,i}(s_t^i),   if ∃ j: p_t^i < p_t^j,   (6)

where A_t^{*,i}(s_t^i) = max_{p ∈ P} A_t^i(s_{t+1}^i, p).
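The reward of Eqs. (5)–(6) combined with the full counterfactual updating described above can be sketched as follows. The state encoding, the price grid, and the recency-weighted update attr ← (1 − α)·attr + α·R are simplifying assumptions of this sketch rather than code from the paper:

```python
# Sketch of the counterfactual attraction update built on Eqs. (5)-(6).
# Assumptions: states are (units_bought, round); attractions are updated
# with an assumed recency-weighted rule attr <- (1 - alpha)*attr + alpha*R.

PRICES = range(11)       # bid grid {0, ..., 10}
ALPHA, GAMMA = 0.1, 1.0  # learning rate and discount factor used in the paper

def next_state(s, won):
    """Deterministic transition: winning adds one unit, time always advances."""
    units, t = s
    return (units + 1 if won else units, t + 1)

def update_attractions(attr, s, opp_max_bid, unit_value):
    """Counterfactually update attr[s][p] for every inferable bid p.

    For each bid p the buyer infers whether she would have won
    (p > highest opposing bid): Eq. (5) applies on a win, Eq. (6) on a loss.
    Tied bids (p == opp_max_bid) are skipped, since the paper assumes the
    buyer cannot guess the outcome of a tie.
    """
    for p in PRICES:
        if p == opp_max_bid:
            continue
        won = p > opp_max_bid
        future = GAMMA * max(attr[next_state(s, won)].values())  # gamma * A*
        reward = (unit_value - p) + future if won else future
        attr[s][p] = (1 - ALPHA) * attr[s][p] + ALPHA * reward

# Hypothetical two-unit, three-round attraction table initialized at zero.
attr = {(u, t): {p: 0.0 for p in PRICES} for u in range(3) for t in range(1, 4)}
update_attractions(attr, s=(0, 1), opp_max_bid=4, unit_value=10)
```

Note that a single observed market outcome (the highest opposing bid) is enough to update every non-tied bid price, which is what makes the almost complete updating possible.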

Because of the discreteness of the grid of prices in the simulation, it often happens that two or more players bid at the same winning price. We assume in this case that one buyer is drawn at random to win the auction round, and thus she updates attractions according to Eq. (5); otherwise she loses and updates attractions according to Eq. (6). The attraction term A_t^{*,i}(s_t^i) plays a crucial role in enabling the learning agent to be forward-looking: she can anticipate the maximum potential profits in the next state given past experience. Indeed, A_t^{*,i}(s_t^i) is the best expected value, over all possible actions p ∈ P, of the attractions for the next state s_{t+1}^i that she can form at s_t^i. The agent can thus incorporate into her current utility the expectation of the maximum future reward she can obtain, given her experience to date, in the next round/state. Indeed, her target is to maximize a cumulative function of the rewards, and γ corresponds to the discount factor of the expected total discounted reward.11 This function reflects the decision maker's intertemporal tradeoffs between present and future decisions. One of the conditions required for optimality is that the environment is Markovian. Summarizing, we have adopted the following four behavioral assumptions/modules: a reinforcement learning rule (a trade-off between an exploring and an exploiting mechanism), an action-state representation (condition-action rule), the possibility of using counterfactual reasoning and, finally, an intertemporal decision-making rule. These four modules are a minimal and coherent set of behavioral components, given that the first two modules are included. Indeed, if we accept the assumptions of a reinforcement learning algorithm and an action-state representation, the two additional modules are strictly linked.
Firstly, the possibility of using counterfactual reasoning, which is obviously realistic from a behavioral viewpoint, from a computational viewpoint guarantees faster convergence towards the learned solution.

11 The Q-learning algorithm became popular when Watkins and Dayan (1992) demonstrated convergence to the optimal policy when the expected total discounted reward is taken as the measure of performance. The convergence of A_t^{*,i}(s_t^i) to the optimal Â_t^{*,i}(s_t^i) is guaranteed if α is progressively decreased to zero and each action is played in each state an infinite number of times. The optimal policy is then p* = argmax_p A_t^{*,i}(s_t^i). Since then, QL has been widely adopted for Markov decision processes, particularly in the single-agent framework, but also in the multi-agent framework.


Secondly, the intertemporal decision-making rule is a reasonable choice given an action-state mental representation. It is the other side of the coin of human reasoning with respect to counterfactual reasoning. Indeed, humans can derive both the outcome that would have been obtained from playing an alternative action at the same time (counterfactual reasoning) and, similarly, the effect of the current action on future outcomes (intertemporal reasoning). The proposed learning model is thus not "original", in the sense that it does not differ from the Q-learning algorithm except for the fact that we introduce complete attraction updating by means of counterfactual reasoning. This approach is not common because it is impossible to implement in the majority of simulated environments. We can perform full updating in the proposed market scenario because the state transition function is deterministic given the actions of all buyers, i.e., given the market outcome of the specific auction round. Obviously, other aspects could have been considered, but we have constrained our model to a minimal set of learning modules. It is worth noting that some aspects have been deliberately neglected. For instance, we did not include belief-based reasoning (belief-based models assume that agents explicitly model opponents' behavior, as in "fictitious play" in game theory), because of the large number of buyers and the fast decision-making process. Indeed, as previously mentioned, the auctioneer tries to speed up the trade as much as possible; to do so she occasionally diminishes the starting price. This is not explicitly introduced into our model. A further possible limitation of such a learning model is that we cannot deal with continuous action or state spaces; we adopt a rather small discrete grid of prices and states.
We could have followed the computer science literature to cope with such a limitation, as has already been done successfully.12 Nonetheless, we believe that our simple discrete model is able to generate some of the salient aspects of the dynamic pattern of prices. In conclusion, the proposed behavioral model is based mainly on two major assumptions: the reinforcement learning and the action-state representation. If these two assumptions are accepted as reasonable for the buyers' behavior in sequential Dutch auctions, then the proposed model is appropriate and parsimonious. In the literature even more parsimonious models have been proposed, but the approach is different and the two assumptions that we have made here would not be appropriate in the frameworks analyzed (see Kirman and Vriend, 2001; Giulioni and Bucciarelli, 2008).

4. Simulation results

In Fig. 3 the basic framework of our model is illustrated and the parameters which are varied across the different scenarios are given, introducing the notation we use to define them. T represents the amount available on a particular day or, equivalently, the number of auction rounds, and D is the number of days for which one simulation is repeated. To analyze what happens once the learning process has had time to take place, we select F as the number of days at the end of the simulation over which we examine the behavior to which the buyers have "converged", if they have. Finally, R is the number of independent simulations, or repetitions, of each market scenario. For each scenario we then consider two demand cases. This is shown in Fig. 3 and Table 1 for the first scenario we have simulated. Finally, the three parameters of the learning algorithm have been set for all scenarios as follows:

• α = 0.1. The rationale for this choice stems from the experimental literature. For instance, Erev and Roth (1998) conclude that the average value of α that best fits the data of several economic games is 0.1. Accordingly, we take it as the reference value of α for all simulations.

• γ = 1 is adopted in order to model a buyer who is trying to maximize the cumulative function of the rewards without discounting. Indeed, the problem that buyers want to solve is an episodic task, that is, to improve day after day the sum of profits over the daily sequence of auctions. We assume that over such a short time horizon (one day) the buyer does not discount the stream of profits.

• λ_d is the time-varying parameter required to progressively force the selection, in each state, of the action with the highest attraction value. We change the value of λ_d each day d ∈ {0, …, D} according to the following formula: λ_d = (d/D)^3.

We have modeled four scenarios, and for each of these we have simulated two market cases, the first with a linear demand function for the retail market and the second with a perfectly inelastic demand function. Furthermore, for scenario 1 only, we have also simulated the two retail demand cases with the gamma parameter set equal to zero, for a total of ten different market cases. In the first market scenario, there is one single learning buyer facing a stationary market environment, that is, she competes against a uniformly random buyer at each auction round. As we mentioned earlier, the random buyer represents someone who does not visit the market regularly or is a new participant. We model eight auction rounds for scenario 1. This is a benchmark scenario which highlights some key aspects of the model. Scenario 2 is an extension of scenario 1: we increase the number of learning buyers up to four, keeping all other market conditions identical, that is, the auction rounds and the presence of a random buyer. We investigate

12 We could have added a learning module based on a function approximation as in Sutton and Barto (1998). Indeed, approaches to deal with the continuous state space of the problem often fall into one of two categories: action and state space discretization (the one we have adopted) or function approximation, commonly performed with neural nets (by choosing a set of basis functions beforehand, the agent can approximate the value function at each state).
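The annealing schedule λ_d = (d/D)^3 given above can be sketched as follows; D is taken from the simulation settings of Table 1, and the sample days are illustrative:

```python
D = 10_000  # number of days in one simulation (as in Table 1)

def lambda_schedule(d, horizon=D):
    """Time-varying logit parameter lambda_d = (d/D)^3."""
    return (d / horizon) ** 3

# Because of the cubic growth, lambda_d stays close to 0 for most of the run
# (broad exploration) and reaches 1 only at the final day.
early, late = lambda_schedule(1_000), lambda_schedule(9_000)
```

The cubic exponent is what keeps exploration high for most of the simulation: halfway through the run λ_d is still only 0.125.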


[Fig. 3 appears here: a grid of days d = 1, …, D, each composed of auction rounds t = 1, …, T, repeated over R independent experiments, with results analyzed over the final F days.]

Fig. 3. A daily market is composed of T identical auction rounds, in each of which only one unit of good is traded. One simulation is composed of D days. A number R of simulations, giving the same market and experimental condition, are run independently for each scenario studied in order to compute the results at the equilibrium as an average over the last F days of each simulation.

Table 1
Simulation settings for scenario 1.

Cases   R    D       F    T   P1      Pr   γ
1L1     20   10,000  50   8   10, 6   1    1
1L0     20   10,000  50   8   10, 6   1    0
1In1    20   10,000  50   8   10, 10  1    1
1In0    20   10,000  50   8   10, 10  1    0

(R, D, F and T are the simulation settings defined in Fig. 3; P1 lists the learning buyer's marginal profits for the two units; Pr is the number of random buyers; γ is the discount parameter of the learning buyer.)
the effect of an increase in demand and in competition with this disruption of the stationary environment. In scenario 3, we simply remove the random buyer with respect to scenario 2. We run this scenario to better highlight the effect of the random buyer, by testing a highly nonstationary scenario and comparing it with scenario 2. Lastly, in scenario 4 we simply increase the number of auction rounds to twelve with respect to scenario 2. We compare scenarios 2 and 4 to highlight the effects of excess demand or, vice versa, excess supply.

4.1. Scenario 1: eight auctions, one learning buyer plus one random buyer

This scenario is a convenient reference point before examining situations in which the buyers face other learning buyers. The statistical properties are good and enable us to highlight some important aspects of the model. Incidentally, it is worth noting that such a scenario is not completely unrealistic. Indeed, we can expect that, for instance, a market entrant, or a buyer who participates only occasionally, faces expert players who have already developed a more stable bidding behavior. Thus the opponents, particularly when they are numerous, may determine a quasi-stationary stochastic environment. Table 1 gives the simulation settings for the four market cases investigated in this section: the linear demand case with γ = 1 (1L1) and with γ = 0 (1L0), and the inelastic demand case with γ = 1 (1In1) and with γ = 0 (1In0). Table 1 is composed of four parts: the left part lists the simulated market cases; the second part reports the values of the simulation settings defined previously in Fig. 3; the third part reports the values characterizing supply and demand in each of the simulated cases; and the right part specifies the value of the discount parameter γ adopted for the learning buyers. Consider the left part.
The number of days that we have chosen before we consider that learning has taken place might appear excessive. However, this allows us to be reasonably confident of any evidence of statistical regularities that we find. By repeatedly exploring all possible states and actions given opponents' behavior over several days, we find that this does indeed ensure convergence. It is worth recalling that only the last F = 50 days are considered when we examine market results. As far as supply is concerned, the parameter T (daily auction rounds) fully describes the daily overall supply, given our assumption that one unit is traded in each round. Finally, demand is generated by n learning agents P1, P2, …, Pn or (and) a "zero-intelligence" buyer who chooses uniformly at random from the action space P. If the random bidder wins the good she is replaced by an identical agent; hence, when we say that there is one random buyer this means that there is always one such bidder in each round. The presence of the latter agent guarantees that there is always positive demand for each unit of the good. The action space is the set of 11 prices p ∈ {0, 1 = P̲, 2, 3, …, 10 = P̄}. We introduce the action corresponding to a bid price of 0, which is below the reservation price of the seller P̲, in order to give buyers the option to simply exit the market.

4.1.1. The linear demand case with γ = 1

The first case (1L1) is thus characterized by 8 auction rounds and one learning agent willing to buy two of the eight traded units of good, with diminishing marginal profits (10 and 6), and by the presence at each auction round of one

[Fig. 4 appears here: two action-state plots, prices 1–10 on the y-axis and states (0, 1), …, (0, 8), (1, 2), …, (1, 8) on the x-axis.]

Fig. 4. Action-state matrix for scenario 1L1 (linear demand) in the top axis. In the bottom axis, the black-filled circles highlight the ABS best actions for each state. The empty circles and the connecting curve report the backward induction solution (BIS).

uniformly random buyer. The internal states are given by the set of possible numbers of units of good bought, S_int^i ∈ {0, 1, 2}, and the external states S_ext by the set of auction rounds t ∈ {0, 1, 2, …, 8}. Fig. 4 presents in the upper part a plot representing the averaged action-state matrix of attractions (estimated over the final F days and the R independent runs; please refer to Fig. 3) for the learning buyer. The action-state attraction matrix reports what a buyer has learned about the desirability of an action/bid price (y-axis) given a specific state (x-axis). On the y-axis only the prices between P̲ and P̄ are shown, but, as previously mentioned, we have introduced a further action corresponding to a bid price of 0 (which we do not plot), providing the buyer with the possibility to exit the market. This exit strategy is never chosen, since the buyer is always willing to buy until she obtains the second unit. The states (x-axis) are represented by first fixing the internal state and then running over the external states. Thus, the first eight states (left-hand side of the plot) refer to the eight auction rounds, respectively, when the buyer has bought 0 units of good out of the 2 units she is willing to buy; we refer to these states with the notation (0, ·). The right-hand side of the plot reports the seven states (1, ·) referring to the sequence of the last seven auction rounds when the buyer has bought only one unit of good (she still needs one more). Finally, it is worth noting that one more state is always added in the simulation which we never show in the plots (it was highlighted in Fig. 2). This state is the final absorbing one, where buyers end up when they have bought all the units they need; in the absorbing state buyers, in effect, bid at 0. When a new trading day occurs, buyers exit from this absorbing state and restart from state (0, 1).
The upper plot reports the complete action-state attraction matrix, with each attraction value shaded from the highest value (darkest) to the lowest value (lightest). The grey tones of the attractions are normalized with respect to each column/state. Some states are visited more frequently than others — for instance, state (0, 1) is visited every day, so the corresponding attraction values increase more than those of other states — but for the sake of legibility we have restricted the interval between the maximum and minimum attraction values of each state/column to the same range, i.e., [−10, 10]. We report this upper plot only for this scenario, because the lower plot carries the useful information from the simulation viewpoint: with black-filled circles it highlights the best action for each state given the information that the individual has obtained while learning. The lower plot can be thought of as the mental map that the buyer has developed after convergence in order to play on the market. Indeed, the black dots correspond to the bid price played in each state, because after learning the buyer always selects the action with the highest attraction given the state. All the results that we report focus on the behavior at convergence, after very long simulations which achieve very stable outcomes. The "mental map" of the unique learning buyer (presented in Fig. 4) represents the behavior learned by the buyer after several days of "practice". The first finding is that a significant difference exists between the set of states (left-hand side of the plot) when the buyer has not yet bought any unit and the set of states (right-hand side of the plot) when one unit has been bought. The former set exhibits higher bid prices than the latter for each corresponding auction round. This behavior simply reflects the fact


that, in this scenario, the marginal profit yielded by the second unit is lower than that of the first unit. It is worth noting that all the best actions are significantly below the corresponding marginal profits, namely 10 and 6. A second and original finding of the learning algorithm is that the buyer learns to recognise time pressure. As time passes without success, the buyer learns to increase her bids. Thus, as the auction rounds go by, the bid price increases for both internal states. In the last auction rounds, the buyer still willing to buy some units increases her chance of winning by raising the bid price. Given this, and the fact that the disturbance term (the random buyer or "noise trader") has mean 5, we can explain how the level of the price evolves. For states (0, ·), bid prices are significantly below the corresponding marginal profit (10), but in the later auction rounds bid prices progressively increase to very close to the "average disturbance price", or average noise trader bid. The buyer learns, particularly in the earlier auction rounds, to make a risky bet (bidding at a price significantly lower than the average disturbance price), because she knows that she still has a certain number of opportunities in subsequent auction rounds to increase her chance of winning the first unit of good by raising her bid price. Similar reasoning applies to states (1, ·), but it is also obvious that the bid for the last unit will be lower in the case where the individual has already obtained a unit. In effect, the buyer learns to bid against the distribution of bids with which she is actually faced.

4.1.2. Backward induction

Another interesting way to interpret the buyer's behavior is to reason in terms of backward induction. What we can do is calculate what an individual would do if she knew the characteristics of the other bidder. We can then see whether the simulated behavior corresponds to the solution obtained in this way.
Accordingly, we can easily compute the optimal strategy in such a stationary environment (a uniformly random opponent). The optimizing buyer i should maximize E[Π^i(ŝ, a)], the sum of the future expected profits π^i(s, a) from the current state ŝ up to the end of the "episode", i.e., the final absorbing state s̄:

E[Π^i(ŝ, a)] = E[ ∑_{s=ŝ}^{s̄} π^i(s, a) ].   (7)
It is worth noting that, by construction, no profits can be earned in the final absorbing state s̄, thus π^i(s̄, a) is always equal to 0. Now, assume that the buyer is at state (0, 8), that is, she is in the last (eighth) auction round and is still considering buying the first unit of good; what is the best strategy?

max_a E[Π^i(s_08, a)] = max_a E[π^i(s_08, a)] = max_a [Pr_w(s_08, a) · π^i(s_08, a)]   (8)
We consider a as any possible action (a bid price) available at state s_08 = (0, 8), and Pr_w(s_08, a) as the probability of winning the auction (obtaining the unit of good) given that the buyer bids a at the current state s_08. With respect to state s_08, Pr_w(s_08, a) = a/10 and π^i(s_08, a) = 10 − a. A straightforward calculation shows that the action yielding the maximum of the sum of the expected profits is a bid price equal to 5. In the bottom plot of Fig. 4, for state s_08 the black dot indicates a bid of 5. The buyer's behavior is optimal with respect to this state, yet she has done no conscious optimization. Analogously, considering state s_18, the objective function of the maximization is equivalent except that π^i(s_18, a) = 6 − a, because the buyer is in the eighth auction round and is willing to buy the second unit of good, which she now values at 6. Solving the maximization problem, the optimal strategy is to bid at a price equal to 3, as Fig. 4 also shows. If we consider previous auction rounds, we can also prove that the behavior learned is optimal. Now consider the state s_07 (7th auction round and two units still to buy); the objective function now changes as follows:

max_{a,b,c} E[Π^i(s_07, a)] = Pr_w(s_07, a) · [π^i(s_07, a) + Pr_w(s_18, c) · π^i(s_18, c)] + (1 − Pr_w(s_07, a)) · Pr_w(s_08, b) · π^i(s_08, b),

where a, b and c are the available actions at states s_07, s_18 and s_08, respectively. Solving the maximization problem, knowing by backward induction that the best actions in states s_18 and s_08 are 3 and 5 respectively, as previously determined, and that Pr_w(s_07, a) = a/10 and π^i(s_07, a) = 10 − a, we obtain 4.2 as the optimal bid price. Since our simulation settings only allow integer prices, we obtain a bid price of 4 corresponding to state s_07, represented by the black-filled circle in Fig. 4. Finally, for state s_17 the objective function is

max_{a,c} E[Π^i(s_17, a)] = Pr_w(s_17, a) · π^i(s_17, a) + (1 − Pr_w(s_17, a)) · Pr_w(s_18, c) · π^i(s_18, c),

and the optimal bid price is 2.55. This backward induction exercise finds the optimal policies in the stationary environment of scenario 1. In the bottom plot of Fig. 4 we also report, with the empty circles and the connecting curve, the backward induction solutions (BIS) for all states, computed recursively in accordance with the previous equations. The BIS curve assumes values which are not integers, whereas the best-action values determined by the agent-based simulations (ABS) are integers, since the buyers are constrained to a discrete action space. The BIS and ABS values are very close.
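The recursive computation described above can be reproduced in a few lines. The closed-form maximizer used below, a* = (v + V_win − V_lose)/2, is obtained by differentiating the one-round objective with Pr_w(s, a) = a/10 as in the text:

```python
# Backward induction solution (BIS) for scenario 1L1: T = 8 rounds, one
# uniformly random opponent with Pr_w(a) = a/10, marginal profits 10 and 6.
# Each round's objective is (a/10)*(v - a + V_win) + (1 - a/10)*V_lose.

def solve_bis(T=8, values=(10.0, 6.0)):
    """Return optimal continuous bids a*(u, t) and the value function V(u, t)."""
    V = {(u, T + 1): 0.0 for u in (0, 1, 2)}   # no profits after the last round
    bids = {}
    for t in range(T, 0, -1):
        V[(2, t)] = 0.0                        # absorbing state: all units bought
        for u in (1, 0):
            v = values[u]                      # marginal profit of the next unit
            win, lose = V[(u + 1, t + 1)], V[(u, t + 1)]
            a = min(max((v + win - lose) / 2, 0.0), 10.0)
            bids[(u, t)] = a
            V[(u, t)] = (a / 10) * (v - a + win) + (1 - a / 10) * lose
    return bids, V

bids, V = solve_bis()
```

The recursion recovers the values derived in the text: bids of 5 and 3 in the last round, 4.2 and 2.55 in the seventh, and bids that rise over the rounds, reproducing the time-pressure effect.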

[Fig. 5 appears here: best action-state plot, prices 1–10 on the y-axis and states (0, 1), …, (0, 8), (1, 2), …, (1, 8) on the x-axis.]

Fig. 5. Best action-state plot for case 1L0 (linear demand) when buyers do not implement any intertemporal decision-making rule, that is, γ = 0.

To repeat, what is quite remarkable is that our learning process finds these policies even though the learner is unaware of the nature of the opponent whom she is facing.

4.1.3. Changing the value of gamma

Before moving to the inelastic retail demand market cases, we report a further simulation exercise concerning market scenario 1 in which, instead of using a value of γ equal to 1 (the buyer learns the best strategy in order to improve the sum of daily profits), we use a value of 0 (the buyer learns the best strategy in order to improve instantaneous profits, i.e., the profit in each auction round). We show the results of market case 1L0 in Fig. 5 in order to highlight the role of this behavioral component. The result is obvious from a theoretical viewpoint; indeed, the previous backward induction analysis gives the right argument for finding the optimal solutions. All states (0, ·) are equivalent to state (0, 8) of the previous case, when the buyer is still considering buying the first unit of good. Thus, the single-round maximization of Eq. (8) now applies to each state in the considered interval, and the bid price is equal to 5. Analogously, all states (1, ·) are now equivalent to the previous state (1, 8), where the bid price is equal to 3. What is again remarkable is that the simulation outcome shows a robust convergence to the optimal policy given a sufficient number of daily iterations. The role of the intertemporal decision-making rule in enabling the buyers to feel the time pressure, and to benefit from this perception by lowering prices in the initial auction rounds, is now evident.
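The flattening of the policy under γ = 0 can be checked directly: each round becomes the stand-alone maximization of Eq. (8), with Pr_w(a) = a/10 as in the backward induction section. A minimal sketch over the integer bid grid:

```python
# With gamma = 0 the reward reduces to the one-round expected profit
# (a/10)*(v - a), so the optimal bid no longer depends on the auction round.

def myopic_bid(unit_value, prices=range(11)):
    """Integer bid maximizing one-round expected profit with Pr_w(a) = a/10."""
    return max(prices, key=lambda a: (a / 10) * (unit_value - a))

first_unit_bid = myopic_bid(10)   # states (0, .): bid 5
second_unit_bid = myopic_bid(6)   # states (1, .): bid 3
```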

4.1.4. The inelastic demand case

Now we give the results for the inelastic retail demand market case (1In1), where the marginal profits for the learning buyer are constant and equal to 10 and γ = 1. One might expect the first finding (page 16) of the previous case to vanish, and a constant behavior (price strategy) to appear irrespective of the internal state, that is, of how many units of good she has already bought. Fig. 6 plots the best actions of the action-state attraction matrix for each state. It is evident that the bidding behavior does not change significantly, in the sense that both previous findings are confirmed. While the second one (the time pressure effect) is now obvious (given the simulation outcomes of the previous case), the first is not trivial. The buyer learns a different bid price when trying to obtain the first unit of good compared to the second one. This phenomenon cannot be due to the different profitability of the units (marginal profits are constant); it is rather the outcome of speculative behavior in the face of time opportunities, that is, the buyer can take more risk (diminishing bid prices) after having bought at least the first unit of good (states (1, ·)). In other words, with one unit in hand and the option of bidding in later rounds, the buyer can bid lower. This bidding strategy contributes to diminishing average market prices throughout the day, because the first unit of good is obviously bought, on average, at a higher price in earlier auction rounds. It is not necessary, then, to induce such a declining price phenomenon by exogenously imposing a diminishing marginal profit condition on the buyer's behavior; it emerges as an illusory diminishing marginal profit effect. It is worth noting that the noise trader imposes an exogenous reference price level for trading, for the purchase of both units of the good. This market scenario is extreme in one sense: the market environment with which the learning buyer is faced is stationary.
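The speculative gap between the (0, ·) and (1, ·) states with constant marginal profits can be checked with the same backward induction recursion used for the linear case; this is a sketch under the assumptions already stated in the text (Pr_w(a) = a/10, continuous bids, T = 8):

```python
# Backward induction for the inelastic case 1In1: both units valued at 10,
# Pr_w(a) = a/10, T = 8 rounds; the recursion mirrors the linear-demand case.

def solve_inelastic(T=8, unit_value=10.0):
    """Optimal continuous bids a*(u, t) when both units have the same value."""
    V = {(u, t): 0.0 for u in (0, 1, 2) for t in range(1, T + 2)}
    bids = {}
    for t in range(T, 0, -1):
        for u in (1, 0):                      # state (2, t) stays absorbing at 0
            win, lose = V[(u + 1, t + 1)], V[(u, t + 1)]
            a = min(max((unit_value + win - lose) / 2, 0.0), 10.0)
            bids[(u, t)] = a
            V[(u, t)] = (a / 10) * (unit_value - a + win) + (1 - a / 10) * lose
    return bids

inelastic_bids = solve_inelastic()
```

Even with identical marginal profits, the optimal bid in the (1, ·) states is lower than in the corresponding (0, ·) states before the last round (e.g., 3.75 against 5 in round 7): the option of bidding again later lets the buyer take more risk once the first unit is in hand, which is the "illusory" diminishing marginal profit effect.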
Resorting again to a backward induction argument, it is easy to calculate the optimal bid price in the inelastic retail demand market scenario. The only difference with the previous scenario is that the second unit to be bought is valued at 10 and not at 6, thus π^i(s, a) = 10 − a for all states s except the final absorbing one. Fig. 6 again highlights the optimal policy given a random opponent, and both ABS and BIS outcomes are plotted, dots and empty circles respectively. Again, the predictive power of the ABS approach is remarkable as compared to BIS. We also present for this scenario the simulation results concerning the market case (1In0), where the value of gamma is equal to 0. Fig. 7 plots the action-state matrix for scenario 1In0 (inelastic demand) when buyers do not implement any intertemporal decision-making rule. The results are again obvious from a theoretical viewpoint; indeed, all states are equivalent and the bid price is equal to 5 every time. The endogenous and illusory effect of valuing the second unit of good

Fig. 6. Best action-state plot for case 1In1 (inelastic demand). The black-filled circles highlight the ABS best actions for each state. The empty circles and the connecting curve report the theoretical backward induction solution (BIS).

Fig. 7. Best action-state plot for case 1In0 (inelastic demand) when buyers do not implement any intertemporal decision-making rule, that is, γ = 0.
less than the first one vanishes. In this case (γ = 0), only exogenously imposed diminishing marginal profits succeed in producing this effect. This is an important point for our explanation of the declining price anomaly.

4.2. Scenario 2: eight auctions, four learning homogeneous buyers plus one random buyer

Table 2 shows the simulation settings for two new market cases, 2L1 and 2In1. In these cases we investigate market outcomes with four learning buyers plus one uniformly random buyer, or noise trader. The four buyers co-evolve, producing a highly non-stationary environment in which four interdependent learning dynamics interact. In a certain sense this scenario is at the opposite extreme to the previous one. We now assume that the learning dynamics of all the buyers start simultaneously from scratch; they have no priors at all about the market environment. In reality, of course, except in rare cases, buyers have priors based on previous experience, do not start learning together, and face some less experienced buyers. We therefore keep a random buyer, and the proposed scenario can still serve as a simulation framework for testing whether buyers facing a co-evolving market environment learn to behave in a way that preserves the nice properties of the previous scenarios.

Figs. 8 and 9 report the average action-state attraction matrices (estimated over the final F days and R independent experiments; please refer to Fig. 3) after the learning period for cases 2L1 and 2In1, respectively. We report the action-state attraction matrix for only one of the four learning buyers since the four matrices are identical: all buyers are homogeneous and the situation is therefore symmetric. We now observe that significant differences between the bidding behaviors in the elastic and inelastic demand situations emerge. In the first case (see Fig. 8), both the diminishing marginal profit and the time pressure effects are present, as in the previous scenario. However, the level of the "optimal" bidding price in each state is now considerably higher. Obviously, we have generated a market condition with more demand: if the noise trader wins in any round, the players can no longer all purchase two units, and there are three more learning buyers competing for the same amount of the scarce resource as in the previous scenario. Given this, Fig. 9 shows that there is now no difference between the bidding prices across states (both internal and external); the only effect still slightly present is time pressure. Fig. 10 reports the price dynamics for four averaged generic days (estimated over R independent experiments), after learning, for both cases. The comparison now reveals that the declining price phenomenon is not common to both cases: it is present only in the 2L1 (linear demand) model. The diminishing marginal profit condition is now the only cause of the declining price paradox. In earlier auction rounds, the units of good with the highest

Table 2
Simulation settings for market cases 2L1 and 2In1.

Cases  R   D        F   T  P1      P2      P3      P4      Pr
2L1    20  100,000  50  8  10, 6   10, 6   10, 6   10, 6   1
2In1   20  100,000  50  8  10, 10  10, 10  10, 10  10, 10  1

Fig. 8. Best action-state plot for retail market case 2L1 (linear demand).

Fig. 9. Best action-state plot for retail market case 2In1 (inelastic demand).
value to the buyers are bought, whilst in later rounds the remaining ones, with lower values, are bought. This effect vanishes in the 2In1 (inelastic demand) model. The buyers have learned to compete strongly to get the desired amount of goods, and the prices bid remain at the same level for both units of the good. Indeed, the opposite effect, a slight increase in price levels as the rounds go by, now emerges, because the time pressure effect now plays a prominent role. This effect has also been noted empirically by Hardle and Kirman (1995) for the Marseille fish market, where the very last bids were often higher, particularly on days when supply was low. It is worth noting that the purchases of both units of the good occur at prices significantly higher than the average disturbance price, or noise trader bid.13

4.3. Scenario 3: eight auctions, four learning homogeneous buyers

Table 3 shows the simulation settings for the two cases considered, 3L1 and 3In1. These market cases are identical to the 2L1 and 2In1 cases except that the random buyer is no longer present. The aim is to explore how the computational market model performs in a highly non-stationary environment. This scenario is extreme, as scenario 1 was in another sense. Indeed, the four buyers learn from scratch simultaneously, and they spend the majority of their learning time exploring the highly non-stationary market environment. Only during the final days of the simulation do they start progressively

13 We should also point out that the number of simulated days for this scenario has been increased by a factor of 10 with respect to the previous scenario, to 100,000 days. With a larger number of learning buyers, the learning dynamics take longer to stabilize at convergence.

Fig. 10. Transaction prices for four generic days (eight auction rounds per day) after learning. Panel titles: linear demand; inelastic demand.

Table 3
Simulation settings for retail demand cases 3L1 and 3In1.

Cases  R   D        F   T   P1      P2      P3      P4      Pr
3L1    20  100,000  50  12  10, 6   10, 6   10, 6   10, 6   1
3In1   20  100,000  50  12  10, 10  10, 10  10, 10  10, 10  1

Fig. 11. Best action-state plot for market case 3L1 (linear demand).

Fig. 12. Best action-state plot for market case 3In1 (inelastic demand).
and simultaneously to exploit the learned policy. Thus, they have no "stationary" pattern to identify in the price dynamics. In reality, we would expect a single buyer to enter the market at some point and to start learning while facing expert, or at least already trained, traders who have developed "optimal" policies: entering the market is in reality an asynchronous event. However, we build and provide results for this scenario because it is a reference scenario for understanding the behavioral model in an extreme setting and for testing the robustness of the previous findings.
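The action-state attraction matrices reported throughout these scenarios are built up by reinforcement with counterfactual payoffs. The sketch below is an illustrative EWA-flavored update in the spirit of Camerer et al. (2002), not the paper's exact rule: the decay `phi`, the counterfactual weight `delta`, and the payoff convention (win on a strictly higher bid than the best rival bid, pay your own bid) are our assumptions.

```python
# Illustrative attraction update with counterfactual reinforcement.
# phi, delta, and the payoff convention are our assumptions, not the
# paper's exact specification.

PRICES = list(range(1, 11))

def payoff(bid, rival_bid, margin):
    """Profit the buyer would earn bidding `bid` against the best
    observed rival bid (assumption: win on a strictly higher bid,
    pay your own bid)."""
    return margin - bid if bid > rival_bid else 0.0

def update_attractions(A, state, chosen, rival_bid, margin,
                       phi=0.9, delta=0.5):
    """Decay old attractions, reinforce the chosen bid with its
    realized payoff and every other bid with delta times its
    counterfactual payoff."""
    for a in PRICES:
        counterfactual = payoff(a, rival_bid, margin)
        weight = 1.0 if a == chosen else delta
        A[state][a] = phi * A[state][a] + weight * counterfactual
    return A

# One illustrative update in state (0, 8): no units bought, 8 rounds left
A = {(0, 8): {a: 0.0 for a in PRICES}}
update_attractions(A, (0, 8), chosen=6, rival_bid=5, margin=10)
```

Bids above the rival bid are reinforced in proportion to their (counterfactual) profit; bids at or below it earn nothing, so repeated updates pull the attraction mass toward the profitable region of the price grid.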

Fig. 13. Transaction prices for four generic days (12 auction rounds per day) at convergence. Panel titles: linear demand; inelastic demand.

Table 4
Simulation settings for scenarios 4L1 and 4In1.

Cases  R   D        F   T   P1      P2      P3      P4      Pr
4L1    20  100,000  50  12  10, 6   10, 6   10, 6   10, 6   1
4In1   20  100,000  50  12  10, 10  10, 10  10, 10  10, 10  1

Figs. 11 and 12 report the averaged action-state attraction matrices for both retail demand cases. Comparing the simulation results for the previous scenarios 1 and 2 with the current one, similar behavioral patterns can be identified in both cases: the (real and illusory) diminishing marginal profit effect and the time pressure effect are both present. These two behavioral findings are therefore robust even in the extreme scenarios 1 and 3. We can thus expect the emergence of the same stylized facts in the price dynamics.14 Fig. 13 shows again the price dynamics for four averaged generic days, after learning, for both the 3L1 and 3In1 cases. The declining price phenomenon is now present in both cases, even the inelastic one, and it is even more pronounced. The explanation is twofold: (1) when the random buyer is present, her average bid of 5 acts as an invisible reference price level that the buyers learn, which forces them to bid higher for both units; and (2) supply is more abundant because of the absence of the random buyer. The latter aspect is further investigated in the next scenario.

4.4. Scenario 4: twelve auctions, four learning homogeneous buyers plus one random buyer

Finally, Table 4 shows the simulation settings for the last two cases considered, 4L1 and 4In1. These market cases are identical to the 2L1 and 2In1 cases except that one day is composed of 12 auction rounds instead of eight. The aim is to explore what happens when the demand of the learning buyers is kept the same whilst more units of the good are available; the competition amongst the learning buyers is thus, in effect, diminished. Figs. 14 and 15 report the averaged action-state attraction matrices for both scenarios. Comparing the previous case 2L1 with the new one, 4L1, similar bidding behavior emerges. The bids for the first unit of the good are again around 6, whereas those for the second unit are slightly lower, 4 instead of 5, just below the average disturbance price of 5. In the new market context, buyers can take more risk, and the presence of the random buyer is more strongly felt. But now a major difference emerges between cases 4L1 and 4In1. The decline in competitive pressure leads the learning buyers, in later rounds, to become more aware of the presence of the random buyer, and thus to start decreasing their bids for the second unit of the good. Again, in the context of constant marginal profits the results are different. This effect is also evident in Fig. 16, which shows the price dynamics for four averaged generic days, after learning, for both the 4L1 and 4In1 cases. The declining price phenomenon is now again present in both retail demand situations.

14 We have also run simulations with 8 learning buyers with the same marginal profit structure and 16 auction rounds, without the presence of a random buyer, and the findings are still robust.


Fig. 14. Best action-state plot for market case 4L1 (linear demand).

Fig. 15. Best action-state plot for market case 4In1 (inelastic demand).

Fig. 16. Transaction prices for four generic days (12 auction rounds per day) at convergence. Panel titles: linear demand; inelastic demand.

4.5. Comparing the profits

Finally, we present some statistics on buyers' profits for the eight market cases with γ = 1. Fig. 17 plots the buyer's profits for market case 1L1, where only one learning buyer is active. We plot the daily ensemble average of the profit over all R simulations at convergence (the last 100 days). The presence of the random trader is crucial in determining the randomness of the time series. Furthermore, it is worth noting that at convergence the algorithm does not always select the action with the highest attraction value in each state (the action highlighted in the preceding best action-state plots); it merely selects it more frequently. This is because our algorithm does not select the best action with certainty at convergence; instead, we continue to implement the classical logit probabilistic choice with a time-varying λ_d parameter, which has been calibrated with respect to the length of the simulation so as to attain, at convergence, a strong tendency to select the action with the highest attraction. These two effects contribute to the randomness. Table 5 reports the average profit computed over the R simulations, over the last 100 days, and over the four buyers (for scenarios 2, 3 and 4 only).
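The logit choice over attractions mentioned above can be sketched as follows. The paper calibrates its own time-varying λ_d schedule; here we simply contrast a small and a large fixed intensity, and the names are illustrative.

```python
# Sketch of the logit (softmax) choice rule over attractions.
# A small intensity keeps choices exploratory; a large one makes the
# highest-attraction bid almost certain, as at convergence.
import math
import random

def logit_choice(attractions, lam, rng=random):
    """Sample a bid with probability proportional to exp(lam * attraction)."""
    m = max(attractions.values())                 # stabilise the exponentials
    weights = {a: math.exp(lam * (v - m)) for a, v in attractions.items()}
    total = sum(weights.values())
    r = rng.random() * total
    for a, w in weights.items():
        r -= w
        if r <= 0:
            return a
    return a  # numerical fallback

attr = {4: 1.0, 5: 2.0, 6: 1.5}                   # toy attractions over bids
early = [logit_choice(attr, lam=0.0) for _ in range(300)]   # exploratory phase
late = [logit_choice(attr, lam=50.0) for _ in range(200)]   # near-greedy phase
```

At λ = 0 the rule is uniform over the bid grid; as λ grows, the probability mass concentrates on the bid with the highest attraction, which is why residual randomness survives even at convergence.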

Fig. 17. Buyer's profit for market case 1L1.

Table 5
Average profit of the average buyer for each scenario and retail demand case.

Statistics      1L1   1In1  2L1   2In1  3L1   3In1  4L1   4In1
Average profit  2.75  6.84  3.21  3.30  6.34  6.88  3.72  5.40
Std             0.45  0.45  0.21  0.23  0.30  0.14  0.22  0.26
Market cases with inelastic demand generate the highest profits because of the higher marginal profit on the second unit of the good. But this positive effect is counterbalanced, in particular in scenarios 2 and 3, by the buyers' competition when the excess demand is higher. Indeed, the largest discrepancy between the inelastic and linear cases is in scenario 1 and the smallest is in scenario 2.

5. Conclusions

This paper has proposed an agent-based computational model to investigate the market price dynamics of sequential Dutch auctions, in particular in the context of perishable goods markets such as that for fish, for which we have empirical evidence. Our approach was to see to what extent learning by market participants who reinforce on their previous experience and can reason counterfactually will generate the sort of phenomena observed empirically in this kind of perishable goods market. The particular market on which we based our model, that in Ancona, is one on which wholesale buyers meet each other repeatedly and participate in a sequence of Dutch auctions at which the catch of the previous night is sold. The model focuses on the behavior of buyers and, in particular, incorporates intertemporal profit maximization, conjectures on bids, and fictitious learning. Nonetheless, some behavioral aspects have been neglected. We have implicitly neglected the emergence of loyalty between buyers and sellers. In this case the sellers are the owners of the fishing vessels, and the name of the boat that caught the fish is announced when the fish is presented for auction. Thus buyers might learn to reinforce their preference for fish from certain vessels. In reality, auctioneers try to mitigate this inefficient market outcome by randomly ordering the presentation of the goods of the various sellers throughout the auction rounds.
This measure alone is not sufficient, because buyers might wait, taking some risk, given that the identity of the vessel that caught the fish currently being sold is posted. However, this is offset by buyers not knowing when, and if, fish from the vessels they favor will appear. Some empirical studies of auction-based fish markets (Giulioni and Bucciarelli, 2011; Gallegati et al., 2011) have statistically confirmed the presence of loyalty. However, this aspect does not seem to be important in explaining the declining price phenomenon, since there is no obvious reason for the random ordering of the vessels to induce a decline in price as the auction rounds go by. We have also not included belief-based reasoning (belief-based models assume that agents explicitly model opponents' behavior), because of the large number of buyers and the fast decision-making process. Indeed, as previously mentioned, the auctioneer speeds up the trade by occasionally lowering the starting price in order to discourage strategic reasoning. In our buyer's behavioral model we have also not explicitly incorporated the formation of expectations about daily supply. In reality, when buyers come to the market in the early morning, they start to form expectations about overall daily market supply. They can collect information by looking at the total number of crates of fish, or by talking with other market actors, and they will also have an idea of how much the weather will have affected the catch. Buyers thus form priors about expected daily supply and bid accordingly. Our behavioral model, by contrast, which is a first step towards modelling the results of learning in this complex context, has been trained on different scenarios that were kept stable throughout each independent simulation (number of buyers, number of auction rounds/units of good sold).
We have proposed four main scenarios: one learning buyer plus one random buyer with eight auction rounds (S1); four learning buyers plus one random buyer with eight auction rounds (S2); four learning buyers and no random buyer with eight auction rounds (S3); and four learning buyers plus one random buyer with twelve auction rounds (S4). Obviously,


more could have been proposed. But we have focused our attention on two modeling aspects: the effect of a stationary versus a non-stationary market environment while preserving the number of auction rounds (S1 versus S2 or S3), and the effect of excess demand, obtained by changing the number of auction rounds while preserving the competitive environment (S2 versus S4). Again we should emphasize that the first scenario is particularly relevant, because in reality market entrants do not face a continuously co-evolving market environment; rather, they face a market environment in which trained buyers act according to well-established decision-making rules learned through practice, and it is the presence of some less experienced buyers that causes disturbances. This is why we examined the first scenario in some depth; it already highlights the important elements of our agent-based computational model.

In conclusion, our agent-based computational model has, among other things, successfully replicated the declining price phenomenon, which has long been thought of as paradoxical in economics, and it furthermore reproduces an empirically observed effect, namely that the prices of the final transactions tend to increase after the general decline. The focus of our work has been to identify the stable optimal policy learned by the buyers. The hypothesis is that buyers, after long practice, have developed a condition-action response to the market signals they perceive and then exploit it in daily trading. Furthermore, despite the fact that the behavior of the opponents is unknown, the process finds the corresponding "full knowledge" optimal prices. Three factors are present in our approach: diminishing marginal profits (the linear retail demand model), the illusory diminishing marginal profit effect, which is in fact due to intertemporal optimization (the perfectly inelastic retail demand model), and more abundant supply. In a more complex and realistic market setting, all of these aspects can jointly contribute to the emergence of such a price formation phenomenon. Lastly, it is worth remembering that all the scenarios simulated have considered a homogeneous population of learning buyers. The heterogeneity of buyers (in particular in terms of marginal profits) is an aspect that has not been addressed here and is left for future research. Obviously, it may further contribute, "exogenously", to the declining price effect. Finally, thanks to our model we have been able to obtain the time pressure effect (in both retail demand models): in the later auction rounds, the buyer's optimal policy is to raise his bid because the option to wait becomes less profitable. This is specific to a model in which intertemporal optimization is incorporated in the buyer's decision, and it corresponds to a stylized fact found by Hardle and Kirman (1995) for the Marseille fish market, namely that the prices of the very last transactions tend, if anything, to increase. We have not undertaken a statistical analysis of the simulation results because the goal of our research is not output validation: we have not tried to replicate exact price distributions, which would have required a precise calibration of the model, not an easy task, but simply to provide theoretical and computational evidence that this model can generate, under some market conditions, specific features such as the declining price pattern. Accordingly, the next step in our research on this topic will be to undertake experimental and further empirical analysis in order to validate the behavioral model. Statistical investigation of human-subjects experiments and of the bidding data of participants in real wholesale fish markets will enable us to confront our theoretical and computational findings with the empirical and experimental evidence on some of the apparently anomalous features of market prices.

Acknowledgments

We wish to thank Paul Pezanis-Christou, Nobuyuki Hanaki, Gianfranco Giulioni, Nick Vriend, Gerard Weisbuch, Mauro Gallegati, and all the participants in the Tromso workshop on fish markets for helpful discussions and comments. We would also like to thank the editors and anonymous referees for their comments and suggestions, which we hope have helped us make the paper not only better but clearer. This work is partly financed by a JSPS-ANR bilateral research grant (ANR-11FRJA-0002) "BECOA."

References

Ashenfelter, O., 1989. How auctions work for wine and art. J. Econ. Perspect. 3 (3), 23–36.
Ashenfelter, O., Genesove, D., 1992. Testing for price anomalies in real estate auctions. Am. Econ. Rev. 82, 501–505.
Ashenfelter, O., Graddy, K., 2003. Auctions and the price of art. J. Econ. Lit. 41 (3), 763–787.
Barreteau, O., Antona, M., d'Aquino, P., Aubert, S., Boissau, S., Bousquet, F., Daré, W., Etienne, M., Le Page, C., Mathevet, R., Trébuil, G., Weber, J., 2003. Our companion modelling approach. J. Artif. Soc. Soc. Simul. 6 (2).
Beggs, A., Graddy, K., 1997. Declining values and the afternoon effect: evidence from art auctions. RAND J. Econ. 28 (3), 544–565.
Black, J., de Meza, D., 1992. Systematic price differences between successive auctions are no anomaly. J. Econ. Manag. Strateg. 1 (4), 607–628.
Bressler, R.G., 1936. The Relation of Quality to the Price of Strawberries on the Manchester Farmers' Auction Market. M.S. Dissertation, Connecticut State College.
Camerer, C.F., Ho, T.-H., Chong, J.-K., 2002. Sophisticated experience-weighted attraction learning and strategic teaching in repeated games. J. Econ. Theory 104 (1), 137–188.
Clark, C.B., Bressler, R.G., 1938. Prices as Related to Quality on the Connecticut Strawberry Auctions. Storrs Agricultural Experiment Station Bulletin 227, University of Connecticut.
Engelbrecht-Wiggans, R., Kahn, C.M., 1999. Calibration of a model of declining prices in cattle auctions. Q. Rev. Econ. Finance 39 (1), 113–128.
Erev, I., Roth, A.E., 1998. Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria. Am. Econ. Rev. 88 (4), 848–881.


Gallegati, M., Giulioni, G., Kirman, A., Palestrini, A., 2011. What's that got to do with the price of fish? Buyers' behavior on the Ancona fish market. J. Econ. Behav. Organ. 80 (1), 20–33.
Ginsburgh, V.A., 1998. Absentee bidders and the declining price anomaly in wine auctions. J. Polit. Econ. 106 (6), 1302–1331.
Giulioni, G., Bucciarelli, E., 2008. Understanding the price dynamics of a real market using simulations: the Dutch auction of the Pescara wholesale fish market. Complex. Artif. Mark., 15–25.
Giulioni, G., Bucciarelli, E., 2010. Agent's behaviour in a sequential Dutch auction: evidence from the Pescara wholesale fish market. Appl. Econ. Lett. 18 (5), 455–460.
Giulioni, G., Bucciarelli, E., 2011. Agents' ability to manage information in centralized markets: comparing two wholesale fish markets. J. Econ. Behav. Organ. 80, 34–49.
Graddy, K., 2006. Markets: the Fulton fish market. J. Econ. Perspect. 20 (2), 207–220.
Hardle, W., Kirman, A., 1995. Nonclassical demand: a model-free examination of price-quantity relations in the Marseille fish market. J. Econom. 67 (1), 227–257.
Hommes, C., Lux, T., 2013. Individual expectations and aggregate behavior in learning-to-forecast experiments. Macroecon. Dyn. 17 (2), 373–401.
Kirman, A., Vignes, A., 1991. Price dispersion: theoretical considerations and empirical evidence from the Marseilles fish market. In: Arrow, K.J. (Ed.), Issues in Contemporary Economics: Proceedings of the Ninth World Congress of the International Economic Association, Athens, Greece. New York University Press, New York.
Kirman, A., 2001. Market organisation and individual behaviour: evidence from fish markets. In: Rauch, J., Casella, A. (Eds.), Networks and Markets. Russell Sage Foundation, New York.
Kirman, A., Vriend, N.J., 2001. Evolving markets: an ACE model of price dispersion and loyalty. J. Econ. Dyn. Control 25 (3/4), 459–502.
Kirman, A., Schulz, R., Hardle, W., Werwatz, A., 2005. Transactions that did not happen and their influence on prices. J. Econ. Behav. Organ. 56, 567–591.
Marimon, R., McGrattan, E., 1995. On adaptive learning in strategic games. In: Kirman, A., Salmon, M. (Eds.), Learning and Rationality in Economics. Basil Blackwell, Oxford, pp. 63–101.
McAfee, R.P., Vincent, D., 1993. The declining price anomaly. J. Econ. Theory 60 (1), 191–212.
Milgrom, P., Weber, R.J., 1982. A theory of auctions and competitive bidding. Econometrica 50 (5), 1089–1122.
Neugebauer, T., Pezanis-Christou, P., 2007. Bidding behavior at sequential first-price auctions with(out) supply uncertainty: a laboratory analysis. J. Econ. Behav. Organ. 63 (1), 55–72.
Pezanis-Christou, P., 2000. Sequential Descending-Price Auctions with Asymmetric Buyers: Evidence from a Fish Market. CEPR Working Paper.
Sosnick, S.H., 1963. Bidding strategy at ordinary auctions. J. Farm Econ. 45, 163–182.
Sutton, R.S., Barto, A.G., 1998. Reinforcement Learning: An Introduction. The MIT Press, Cambridge.
van den Berg, G.J., van Ours, J.C., Pradhan, M., 2001. The declining price anomaly in Dutch rose auctions. Am. Econ. Rev. 91 (4), 1055–1062.
Watkins, C., Dayan, P., 1992. Q-learning. Mach. Learn. 8 (3–4), 279–292.
Weisbuch, G., Kirman, A., Herreiner, D., 2000. Market organisation and trading relationships. Econ. J. 110 (463), 411–436.