Games and Economic Behavior 41 (2002) 26–45 www.academicpress.com
Fairness and learning: an experimental examination

David J. Cooper a,* and Carol Kraker Stockman b

a Department of Economics, Weatherhead School of Management, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106, USA
b Graduate School of Public Health, University of Pittsburgh, 11119 Bellflower Road, Pittsburgh, PA 15261, USA

Received 18 October 1999
Abstract

Over the last twenty years, experimental economists have identified a wide variety of games in which subjects display “other-regarding” behavior in violation of standard game theoretic predictions. The two primary approaches to explaining this behavior can be characterized as the “fairness hypothesis” and the “learning hypothesis.” Both hypotheses provide an adequate explanation of behavior in existing experiments, but the two approaches rely on very different interpretations of the factors underlying subjects’ behavior. In this paper, we examine experiments with a step-level public goods game designed to distinguish between the fairness and learning hypotheses. We find that neither approach provides an adequate explanation of the experimental data by itself. We propose a model which synthesizes the fairness and learning hypotheses. Using econometric analysis, we demonstrate that this model provides a significantly better fit to the experimental data.
© 2002 Elsevier Science (USA). All rights reserved.

JEL classification: C72; C91; C92; D64

Keywords: Learning; Fairness; Public goods
* Corresponding author.
E-mail addresses: [email protected] (D.J. Cooper), [email protected] (C.K. Stockman).

0899-8256/02/$ – see front matter © 2002 Elsevier Science (USA). All rights reserved.
PII: S0899-8256(02)00011-8
1. Introduction

Over the last twenty years, experimental economists have identified a wide variety of games in which subjects display “other-regarding” behavior that violates standard game theoretic predictions.1 Explaining these results has become a major focus of experimental economics and the related areas of economic theory. The two primary approaches to explaining this anomalous behavior can be characterized as the “fairness hypothesis” and the “learning hypothesis.” The fairness hypothesis maintains the assumption of full rationality but eliminates the equivalence between monetary payoffs and utility, instead allowing a player’s utility to depend on the payoffs of others.2 Sophisticated versions of this approach, such as Bolton and Ockenfels (2000) and Fehr and Schmidt (1999), give unified explanations of results from a wide variety of experiments.3 The learning hypothesis takes the opposite approach, relaxing the assumption of full rationality but maintaining the equivalence between monetary payoffs and utility. Under the learning hypothesis, players are boundedly rational, only gradually learning to behave optimally. As shown by Roth and Erev (1995) and Binmore et al. (1995), the learning hypothesis also provides a unified explanation of results from a wide variety of experiments.

Although both hypotheses provide an adequate explanation of existing experimental data, the two approaches rely on very different interpretations of the factors underlying subjects’ behavior. As such, there exists a need for experimental work which allows us to determine whether subjects’ behavior is best explained by the fairness hypothesis, the learning hypothesis, or a synthesis of both approaches.

This paper examines experiments with a three-player sequential step-level public goods game. We focus on the behavior of “critical” third players—third players whose decisions determine whether or not the public good will be provided. This case is of particular interest because critical third players face no strategic uncertainty. The game creates a strong tension between monetary payoffs and fairness for critical third players. Applying the fairness and learning hypotheses to this game allows us to make sharp predictions about the observed behavior of critical third players. In particular, existing models of fairness, static by their very nature, predict no change in behavior over time, while existing learning models predict that contributions by critical third players, for whom contributing is a strictly dominant strategy, must increase over time.

1 Well known examples include the ultimatum game (Roth, 1995), the dictator game (Roth, 1995), VCM public goods games without thresholds (Ledyard, 1995), experimental labor markets (Fehr et al., 1998), and trust games (Berg et al., 1995).
2 It is somewhat of a misnomer to refer to the “fairness” hypothesis, since the theory encompasses a wide variety of emotions such as envy, spite, and reciprocity. The “interpersonal comparisons” hypothesis would be a more accurate (if unwieldy) characterization of this approach.
3 See also Falk and Fischbacher (1998), Levine (1998), and Charness and Rabin (2000).
Our results unequivocally indicate that neither approach can explain the experimental data by itself—we observe consistent movements in critical third players’ play, including a strong decrease in the contribution rate for one of the treatments. We propose a model which synthesizes the fairness and learning hypotheses. Using maximum likelihood analysis, we show that this hybrid model provides a better explanation for our data than either a pure fairness or pure learning approach.

To the best of our knowledge, our data set is unique in showing robust movement away from a dominant strategy. The existing experimental literature on other-regarding behavior generally reports no observable dynamics, in part due to the relatively small number of repetitions typically employed. To the extent that any dynamics are observed, they involve movement towards a dominant strategy.

We initially viewed our MCS game experiments as a contest between the fairness and learning hypotheses, but discovered that neither approach is wholly satisfactory. Rather than saying both approaches are wrong, it would be more accurate to say that both are partially right. By synthesizing the best features of the fairness and learning hypotheses, we hope our model not only improves our ability to fit the current experimental data, but also improves our ability to understand interactions between learning and fairness.
2. The MCS game, the experimental design, and predicted results

The minimal contributing set (MCS) game we study is a three-person step-level public goods game.4 Each player is given an initial endowment of 12 tokens. Players sequentially choose whether or not to contribute to provision of a public good. Contribution is a binary choice; players choose to either contribute or not contribute, and a predetermined cost of contribution is deducted from a player’s initial endowment following contribution. The public good is provided if at least two of the three players contribute. If the public good is provided, then each of the players receives an additional 18 tokens regardless of whether he contributed to the public good. Costs of contribution are deducted regardless of whether or not the public good is provided. Players are given perfect information—they know all players’ payoff tables and the moves of preceding players.

The pre-set cost associated with contribution for player i is c_i tokens, i ∈ {1, 2, 3}, where the index gives the player’s position in the game (Player 1 moves first, Player 2 moves second, and Player 3 moves third). We use three different cost structures, as summarized in Table 1. Only one cost structure is used in any given experimental session. We will refer to the treatments by the cost structure, giving us the 3/6/9 treatment, the 1/3/9 treatment, and the 1/3/16 treatment.

4 The MCS game was first introduced in a simultaneous form by Van de Kragt et al. (1983) and in a sequential form by Erev and Rapoport (1990).
Table 1
Summary of contribution costs by treatment

Treatment            Player 1   Player 2   Player 3
3/6/9 treatment          3          6          9
1/3/9 treatment          1          3          9
1/3/16 treatment         1          3         16
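As a concrete restatement of the payoff rules just described, the following is a minimal sketch in Python; the function and variable names are ours and purely illustrative:

def mcs_payoffs(costs, contributes, endowment=12, bonus=18, threshold=2):
    """Token payoffs in the MCS game: each player starts with the endowment,
    pays his contribution cost if he contributes, and receives the bonus
    if at least `threshold` of the three players contribute."""
    provided = sum(contributes) >= threshold
    return [endowment - (c if x else 0) + (bonus if provided else 0)
            for c, x in zip(costs, contributes)]

# Example, 1/3/16 treatment: if Players 2 and 3 contribute, payoffs are
# [30, 27, 14]; if only Player 2 contributes, payoffs are [12, 9, 12].
print(mcs_payoffs([1, 3, 16], [False, True, True]))   # [30, 27, 14]
print(mcs_payoffs([1, 3, 16], [False, True, False]))  # [12, 9, 12]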
Note that costs of contribution are increasing in the player’s position in all three treatments. Although payoffs for all treatments are set so that all players earn more money if the public good is provided (c_i < 18 for all i), each player earns more if he can free ride on the contributions of the other players.

Restricting our attention to pure strategy equilibria, and assuming that utility is a function solely of a player’s monetary payoff, the MCS game has four Nash equilibria. In one equilibrium none of the players contribute and the public good is not provided. In the remaining three equilibria exactly two of the three players contribute and the public good is provided. Only the equilibrium in which the last two players contribute is a subgame perfect equilibrium.

The position contingent contribution costs and the sequential nature of the game create tension between payoff maximization and other-regarding preferences. Consider the position of a “critical” third player—a third player who knows that exactly one of the two preceding players has contributed. A critical third player knows that the public good will be provided if and only if he contributes. If this player cares solely about his monetary payoffs, he should always contribute: contributing yields 12 − c_3 + 18 = 30 − c_3 tokens, which exceeds the 12 tokens from not contributing since c_3 < 18. However, this player also knows that a preceding player with a lower cost of contribution has not contributed and will free ride on provision of the public good. We can easily imagine that a critical third player might resent this, and therefore not contribute in order to punish the earlier player who did not contribute. A critical third player must also consider that his decision to contribute or not contribute affects the preceding individual who did contribute, an individual whom he may wish to help. Thus, the desire to maximize one’s own monetary payoffs, the desire to punish free riders, and the desire to reward other contributors all can affect the behavior of critical third players. The resulting behavior depends on which of these forces carries the greatest weight.

The three different cost structures are designed to place differing emphasis on maximizing one’s own payoffs versus fairness for critical third players. Compare the 3/6/9 treatment with the 1/3/9 treatment. In both cases, the monetary benefits of contributing are identical for critical third players. However, in the 1/3/9 treatment the preceding players’ costs of contribution are lower. Holding the identity of the contributing player fixed, we would expect a critical third player to feel greater resentment when the non-contributing player’s cost of contribution is lowered. A critical third player should also be less concerned about harming a player who has contributed when that player’s cost of contribution is reduced.
We therefore would expect fairness considerations to play a greater role in the 1/3/9 treatment than in the 3/6/9 treatment, lowering contribution rates by critical third players. Likewise, compare the 1/3/9 treatment with the 1/3/16 treatment. While the costs of contribution are identical for Player 1s and Player 2s across the two treatments, the cost of contribution has increased for Player 3s in the 1/3/16 treatment. Since the monetary incentives to contribute have been reduced, we expect fairness considerations to play a more important role in determining the choices of critical third players in the 1/3/16 treatment than in the 1/3/9 treatment. We therefore anticipate lower contribution rates by critical third players in the 1/3/16 treatment than in the 1/3/9 treatment.

By their very nature, existing models of fairness predict no change in the behavior of critical third players over time. These are all models in which individuals have fixed utility functions over their own payoffs and the payoffs of others. Since there is no uncertainty about other players’ payoffs to be resolved for critical third players, there is no reason for their behavior to change.

The preceding arguments have all been made on an intuitive basis, but can also be placed on a more formal footing. For example, we can apply Bolton and Ockenfels’ (2000) ERC model, a leading model of fairness, to the MCS game. Bolton and Ockenfels model a player’s utility from a game as a function of his monetary payoff and his payoff share. The latter term captures a player’s concern with fairness. Making some standard assumptions, we prove the following two propositions. The proofs are available from the authors upon request.

Proposition 1. If an individual ever contributes as a critical third player in the 1/3/9 treatment, he must always contribute as a critical third player in the 3/6/9 treatment.

Proposition 2. If an individual ever contributes as a critical third player in the 1/3/16 treatment, he must always contribute as a critical third player in the 1/3/9 treatment.

It follows from these propositions that the ERC model predicts contribution rates by critical third players to be non-increasing as we go from the 3/6/9 treatment to the 1/3/9 treatment and then to the 1/3/16 treatment. Because a player’s utility function only depends on his own monetary payoff and the monetary payoffs of the other two players, information which is known to a critical third player, the ERC model predicts no systematic change over time in contribution rates by critical third players.

Our predictions for the MCS game are not driven by the fine details of the ERC model, and would not change if a different model of fairness were used. For example, Propositions 1 and 2 can both be proved using Fehr and Schmidt’s (1999) model of fairness, competition, and cooperation. Copies of these alternative proofs are also available from the authors upon request.
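For reference, the ERC motivation function mentioned above can be sketched as follows. This is our own summary of Bolton and Ockenfels (2000), not the exact specification used in our proofs: player i evaluates an outcome through

v_i = v_i(\pi_i, \sigma_i), \qquad \sigma_i = \frac{\pi_i}{\pi_1 + \pi_2 + \pi_3},

where v_i is nondecreasing in the own monetary payoff \pi_i and, holding \pi_i fixed, is maximized at the equal share \sigma_i = 1/3 in our three-player game. A frequently cited illustrative special case is the quadratic form

v_i(\pi_i, \sigma_i) = a_i \pi_i - \frac{b_i}{2}\left(\sigma_i - \frac{1}{3}\right)^2, \qquad a_i \ge 0, \; b_i > 0.

Assumptions of this kind, combined with the payoff comparisons across treatments, are what underlie Propositions 1 and 2.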
The predictions of a reinforcement learning model (with reinforcements equal to monetary payoffs) are quite different from those of a fairness model. While we put off a detailed description of the reinforcement learning model of Roth and Erev until Section 5, a brief intuitive explanation of what the model predicts is useful in reading the experimental results. Reinforcement learning models can be summarized in a single sentence: strategies that do well should be played more frequently over time. A learning model by its very nature makes no predictions about the initial distribution of play—learning models predict the evolution of play given a starting distribution of strategies. With monetary reinforcements, play should move in the direction of increased monetary payoffs. Since it is a strictly dominant strategy to contribute as a critical third player regardless of the treatment, we expect that the contribution rate for critical third players will increase (on average) over time regardless of the treatment. However, we do expect that the rate of increase will be slower in the 1/3/16 treatment than in the other two, since the reinforcement for contributing is smaller in this treatment.

To summarize, both a pure fairness and a pure learning approach make strong predictions about the data. Considering fairness by itself, we predict that contribution rates by critical third players should be decreasing as we move from the 3/6/9 treatment to the 1/3/9 treatment and then the 1/3/16 treatment. A fairness-only approach predicts no change over time in contribution rates. A learning model with no fairness component makes no predictions about the starting distribution of strategies across the three treatments, but predicts that contribution rates should be rising for critical third players in all three cases. It also predicts that the rate of increase will be slowest in the 1/3/16 treatment.
3. Experimental procedures

Four replications of each of the three cells of the design have been run, using a total of 234 subjects. Between fifteen and twenty-four subjects participated in each session, and each subject was allowed to take part in only one session. Subjects were recruited from the University of Pittsburgh undergraduate population using newspaper ads, posters, and electronic bulletin board postings. The average session lasted about 1 1/2 hours, and the average subject earned about $15, including a $5 participation fee. These earnings were adequate to ensure a steady supply of subjects.

Before the beginning of a session, instructions were read aloud to all subjects. Subjects were also given a written copy of the instructions and of the payoff tables. All substantive questions about the instructions were answered out loud to ensure common knowledge. Subjects were asked to complete a short payoff quiz to verify their ability to read the payoff tables.

Each session consisted of forty periods of play of the sequential MCS game. Subjects were told the number of periods to be played. We randomly determined
each subject’s player position (Player 1, Player 2, or Player 3) before the first round of play. A subject’s position remained unchanged throughout the course of the session. Subjects knew their own position and that there were an equal number of subjects in each position, but did not know the position of any other subject. At the beginning of each round of play, we randomly and anonymously assigned subjects to three person groups containing one subject for each position. The instructions emphasized that the three person groups would be randomly rematched in every round. This was intended to preserve the one-shot strategic character of the MCS game. After each round of play, subjects were informed of the decisions made and the payoffs earned by the other members of their group. Subjects were given record sheets on which to record their outcomes. While we did not require subjects to fill out the record sheet, we observed that most did.

To avoid any framing effects, neutral terms were used wherever possible. For example, players were asked to choose between “X” and “Y” rather than “contribute” and “don’t contribute.” No explicit mention was ever made of “provision” or “public goods.” Instead, the payoff table refers solely to “your choice” and “total number of other players choosing X.”

Subjects were paid privately, in cash, based on their token earnings in two periods randomly chosen from the forty periods of play. Each token was worth $0.30 to subjects.
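The payment rule can be restated compactly in code; the sketch below is ours and only restates the procedure just described (names and structure are illustrative):

import random

def session_payment(tokens_by_period, token_value=0.30, show_up_fee=5.00, n_paid=2):
    """$5 participation fee plus $0.30 per token earned in two randomly
    selected periods out of the forty played."""
    paid = random.sample(range(len(tokens_by_period)), n_paid)
    return show_up_fee + token_value * sum(tokens_by_period[p] for p in paid)

# A subject who earned 21 tokens in each of the two selected periods
# would receive 5.00 + 0.30 * (21 + 21) = $17.60.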
4. Experimental data

Figure 1 shows the evolution of contribution rates for Player 1s, Player 2s contingent on Player 1’s decision, and critical Player 3s. Our data analysis concentrates on the choices of critical third players. Unlike other players, third players face no uncertainty about the play of others. We therefore do not need to worry about the unobservable beliefs of subjects in analyzing their behavior. Only for third players can we make strong predictions about how behavior should vary across treatments and across time and draw strong conclusions about the effects of fairness and learning on subjects’ behavior.

After subjects sort themselves out over the first few rounds, contribution rates by critical third players decrease as the asymmetry of costs increases. These differences emerge early—aggregating over the first ten periods, the contribution rates by critical third players are 82, 76, and 73% in the 3/6/9, 1/3/9, and 1/3/16 treatments, respectively—and grow over time. For the 3/6/9 and 1/3/9 treatments, the contribution rate for critical third players rises over time, reaching 90 and 83%, respectively, for the final ten periods. In the 1/3/16 treatment, the contribution rate for critical third players falls substantially over time, dipping to 50% in periods 21–30 and then rebounding somewhat to 60% in the final ten periods.
[Fig. 1. Evolution of contribution rates for Player 1s, Player 2s contingent on Player 1’s decision, and critical Player 3s.]
This decline in contribution rates is not due to anomalous behavior in a single session—in all four sessions of the 1/3/16 treatment, the contribution rate by critical third players was lower in the final ten periods than in the initial ten periods.

Rather than capturing a substantive feature of subjects’ behavior such as reputation building, the late increase in the contribution rate for critical third players in the 1/3/16 treatment illustrates the deceptive effects of aggregation over sessions. In later periods of the experiment, the probability of being a critical third player is endogenous. If learning by the first two players moves them towards strategies with higher expected payoffs, there will be a positive correlation between the initial probability that critical third players contribute in a session and the probability that third players are critical in later periods. It follows that observations of critical third players in later periods are more likely to come from sessions in which critical third players tend to contribute. This aggregation effect biases estimated changes in the contribution rate for critical third players upwards.

The rise in contribution rates for 1/3/16 critical third players in the final ten periods is largely due to the aggregation effect described above. When contribution rates are broken down by session, no clear pattern of change emerges over the second half of the experiment, but the percentage of observations coming from sessions with relatively high contribution rates rises steadily over time. As such, we see little support for the hypothesis that reputation effects are driving our results.5

To control for individual effects (and for the resulting aggregation effects), we examine the behavior of critical third players using probit analysis with a random effects specification. This analysis is summarized in Table 2. Each observation corresponds to a play of the game for which the third player was critical. For all three models the dependent variable is the critical third player’s decision (0 = don’t contribute, 1 = contribute). The random effects term is significant at the 1% level in all three regressions, but is not reported in Table 2 since it is of little economic significance.

Model 1 is a baseline regression looking for general differences between the treatments. The only regressors are dummy variables for treatment (Payoff 3/6/9 = 1 if 3/6/9 treatment, 0 otherwise; Payoff 1/3/16 = 1 if 1/3/16 treatment, 0 otherwise). Using the 1/3/9 treatment as a base, this regression finds that contribution rates are higher in the 3/6/9 treatment and lower in the 1/3/16 treatment, with both dummies significant at the 1% level.6
5 We can also refute the reputation hypothesis by looking at play by critical third players in the 40th period. Since this is the last period of play, there is nothing to be gained by maintaining a reputation. Nonetheless, only 8 of 15 critical third players in the 40th period of the 1/3/16 treatment contribute to the public good. By way of comparison, 20 out of 20 critical third players contributed in the 40th period of the 3/6/9 treatment and 13 out of 14 critical third players contributed in the 40th period of the 1/3/9 treatment.
6 Throughout this paper, all tests of significance for individual parameters are two-tailed z tests. All tests of joint significance use a log-likelihood ratio test.
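To see the aggregation effect described above in isolation, consider a deliberately stylized two-session example; all numbers below are invented for illustration and are not taken from our data:

# Two hypothetical sessions, A and B, with constant but different contribution
# rates by critical third players. In the high-contribution session, third
# players end up critical more often in later periods.
early = {"A": (50, 0.80), "B": (50, 0.40)}  # (critical observations, contribution rate)
late  = {"A": (80, 0.80), "B": (20, 0.40)}

def pooled_rate(cells):
    obs = sum(n for n, _ in cells.values())
    contributions = sum(n * p for n, p in cells.values())
    return contributions / obs

print(pooled_rate(early))  # 0.60
print(pooled_rate(late))   # 0.72 -- the pooled rate "rises" although neither session changed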
Table 2
Results of probits for critical third players

                    Model 1            Model 2            Model 3
Constant            1.960a (0.163)     1.678a (0.219)     2.145a (0.507)
Payoff 3/6/9        0.987a (0.232)     0.832a (0.357)     0.240 (0.606)
Payoff 1/3/16      −0.903a (0.154)    −0.955a (0.297)    −1.643a (0.561)
Period                                 0.014b (0.007)     0.012b (0.007)
Period 3/6/9                           0.007 (0.010)      0.008 (0.010)
Period 1/3/16                         −0.043a (0.010)    −0.041a (0.009)
Whocon                                                    0.095 (0.095)
Chance                                                   −0.787 (0.634)
Chance 3/6/9                                              0.670 (0.640)
Chance 1/3/16                                             1.012c (0.671)
Log-likelihood     −643.24            −627.37            −625.12

Standard errors are in parentheses.
a, b, c Indicates significance at the 1, 5 and 10% levels, respectively.
Model 2 adds a regressor for the period in which the decision was made as well as two variables which interact period with a treatment dummy (period 3/6/9 and period 1/3/16). Once again the treatment dummies are significant at the 1% level, indicating that statistically significant differences between treatments are present from the beginning of play. Jointly, the three new variables are significant at the 1% level (χ² = 31.756, 3 d.f., p < 0.01). The parameter for period is positive and significant at the 5% level, while the period 3/6/9 parameter is not significant at any standard level. Together, these parameters indicate that contribution rates are rising in the 3/6/9 and 1/3/9 treatments with the difference between the two treatments remaining roughly constant. The parameter for period 1/3/16 is negative and significant at the 1% level. Additionally, the overall rate of change in 1/3/16 sessions is negative and significant at the 1% level, indicating that contribution rates are falling significantly over time in the 1/3/16 treatment.7

7 To test this hypothesis, we reran Model 2 with the 1/3/16 treatment as the base. The parameter estimate for period is −0.0295 with a standard error of 0.0063. This parameter is significant at the 1% level.

Model 3 adds four new variables to the regression. These additions are chosen to test secondary predictions of the fairness and learning hypotheses. Whocon is a dummy variable indicating which of the preceding two players contributed (whocon = 1 if the first player contributed, 0 otherwise). Under general conditions, the fairness hypothesis predicts that critical third players should contribute more frequently if Player 2 has contributed than if Player 1 has contributed. (Formal proofs using either the Bolton–Ockenfels model or the Fehr and Schmidt model are available from the authors upon request.) This prediction holds regardless of treatment, so no interactions with the treatment dummies are included in the regression.
Chance gives the percentage of previous periods in which the third player has been critical. In a reinforcement learning model without fairness (reinforcements equal monetary payoffs), the contribution rate by critical third players should be positively correlated with chance. Intuitively, the more opportunities you have had to learn that contributing yields higher payoffs than not contributing, the more likely you should be to contribute. When the reinforcement includes a fairness component, the model predicts that more chances to be critical should accelerate learning but not affect its direction. Since contribution rates by critical third players move in different directions in the different treatments, the variable chance is interacted with the treatment dummies to generate the variables chance 3/6/9 and chance 1/3/16.

The results for Model 3 indicate that which player contributed prior to Player 3 does not have any significant effect on a critical third player’s decision. The three chance variables are not jointly significant at any standard level (χ² = 4.46, 3 d.f., p > 0.10). Examined individually, only the parameter estimate for chance 1/3/16 is significant even at the 10% level. The positive sign of the parameter estimate for chance 1/3/16 does not fit with any sort of reinforcement learning model, given that contribution rates for critical third players in the 1/3/16 sessions decrease over time.

Overall, the regression results reinforce our observations from Fig. 1. Both the level of contribution rates for critical third players and the direction of change for these contribution rates depend on the asymmetry of contribution costs. For the treatments with relatively symmetric costs (the 3/6/9 and 1/3/9 treatments), contribution rates for critical third players are relatively high and rise over time. For the treatment with relatively asymmetric costs (the 1/3/16 treatment), critical third player contribution rates are initially low and fall over time.

The experimental results can only be partially explained by the existing theories of fairness. The initial differences in play by critical third players between treatments are consistent with the predictions of fairness models like the Bolton–Ockenfels ERC model. However, these initial differences are quite small, only becoming economically significant with the passage of time. The existing models of fairness are static in nature, and therefore cannot track the dynamics. A pure learning approach also fails to explain all of the major features of the data. As noted previously, such an approach predicts a rising contribution rate for critical third players in all three treatments, when in fact the contribution rate falls sharply in the 1/3/16 treatment.
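For readers who want to reconstruct the Model 2 and Model 3 regressors from raw session data, a minimal sketch of the variable definitions follows. This is our own illustrative code; the column names are hypothetical and are not taken from the authors' files:

import pandas as pd

# df: one row per third-player subject per period, with columns
#   subject, period (1..40), treatment in {"3/6/9", "1/3/9", "1/3/16"},
#   critical (1 if exactly one earlier player contributed), contribute (0/1),
#   first_mover_contributed (0/1, defined when the player is critical).
def build_regressors(df):
    df = df.sort_values(["subject", "period"]).copy()
    df["payoff_369"] = (df["treatment"] == "3/6/9").astype(int)
    df["payoff_1316"] = (df["treatment"] == "1/3/16").astype(int)
    df["period_369"] = df["period"] * df["payoff_369"]
    df["period_1316"] = df["period"] * df["payoff_1316"]
    df["whocon"] = df["first_mover_contributed"].fillna(0).astype(int)
    # chance: share of *previous* periods in which this subject was critical
    prior_critical = df.groupby("subject")["critical"].cumsum() - df["critical"]
    df["chance"] = prior_critical / (df["period"] - 1).clip(lower=1)
    df["chance_369"] = df["chance"] * df["payoff_369"]
    df["chance_1316"] = df["chance"] * df["payoff_1316"]
    # estimation sample: critical third players only
    return df[df["critical"] == 1]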
5. Learning and fairness

While neither the pure fairness hypothesis nor the pure learning hypothesis captures all of the major features of the experimental data, both are able to explain some of the major features. While unable to explain the changes in subjects’
behavior, the pure fairness hypothesis does correctly predict that contribution rates for critical third players are decreasing as the costs of contribution become more asymmetric. Likewise, while unable to predict the decrease in contribution rates for critical third players in the 1/3/16 treatment, the pure learning hypothesis correctly predicts that behavior will change over time with the differences between treatments growing. These partial successes suggest that a hybrid model synthesizing the best of the fairness and learning hypotheses might fit the data well.

This section presents a reinforcement learning model and explores whether the addition of fairness to the model improves its ability to fit the data. Because the dynamics generated by the learning model are probabilistic in nature and sensitive to how the learning model is specified, the model must be formally specified and appropriate econometric tests must be applied to cleanly reject the null hypothesis of pure learning in favor of a hybrid between learning and fairness.

5.1. The learning model

We use a variation of the reinforcement learning model developed by Roth and Erev (1995). There exist a wide variety of learning models in the literature, and which of these best tracks the behavior of experimental subjects remains an open question.8 Our concern is not to determine whether the reinforcement learning model is the best model of learning, but rather to study whether or not the observed data is best explained by a learning model that allows for fairness considerations. The reinforcement learning model is particularly well suited to this purpose since it has been used previously to study similar issues and is easily modified to incorporate fairness considerations. We strongly conjecture that our conclusions will hold for any learning model which puts increased weight over time on strategies that do well.

8 See Cooper and Feltovich (1996), Cheung and Friedman (1997), Erev and Roth (1998), Salmon (1998), Camerer and Ho (1999), Blume et al. (1999), and Feltovich (2000).

The reinforcement learning model studies how behavior changes over time for a population of individuals who are repeatedly matched in small groups to play a game. Individuals are assumed to always play the same role within this game, but the groups matched with each other change from round to round. Formally, consider an N-player normal form game. Let M_i be the number of strategies available to the ith player. In round t, the jth individual in the ith role has propensity q_{ij}^t(m) for choosing his mth strategy. The probability of the jth individual in the ith role choosing his mth strategy in round t is obtained by
dividing the associated propensity for this strategy by the sum of the propensities in round t:

p_{ij}^t(m) = \frac{q_{ij}^t(m)}{\sum_{\mu=1}^{M_i} q_{ij}^t(\mu)} .    (1)
In round t, each individual is randomly assigned N − 1 opponents, then chooses his strategy based on his round-t probabilities. Let \sigma_{ij}^t ∈ {1, 2, …, M_i} be the strategy chosen by the jth individual in the ith role in round t. Monetary payoffs are determined by the strategies chosen by all N individuals in a matching. Let r_{ij}^t be the reinforcement received by the jth individual in the ith role in round t. For the learning model without fairness, this reinforcement is equal to the individual’s monetary payoff. For the hybrid model with fairness, the reinforcement is a function of the individual’s monetary payoff and the size of his payoff relative to the payoffs of others.

An individual’s propensities are updated by multiplying all propensities by a “forgetting” parameter δ and then adding his reinforcement to his propensity for choosing his realized strategy. Eq. (2) gives the formula for updating the propensity of the realized strategy—for all other strategies, the r_{ij}^t term is dropped. By assumption, 0 ≤ δ ≤ 1:

q_{ij}^{t+1}(\sigma_{ij}^t) = \delta \, q_{ij}^t(\sigma_{ij}^t) + r_{ij}^t .    (2)

To help pick up the individual effects in the data, we modify the model by adding an autocorrelation parameter p_same. This “inertia” parameter allows for the possibility that the draws are not independent across periods, but instead correlated. The addition of this parameter affects how strategies are selected in each period. With probability p_same, the same strategy is chosen in period t as was chosen in period t − 1 (\sigma_{ij}^t = \sigma_{ij}^{t-1}). Otherwise, the formula in Eq. (1) is used to pick a new strategy. In other words, Eq. (1) now gives the probability of each strategy being used conditional on the individual choosing a new strategy.

5.2. Fitting the reinforcement learning model

Before fitting parameters for the reinforcement learning model, we must make some decisions as to how the model will be formulated for the MCS game. The model is written down in terms of normal form strategies, but can be equally well stated in terms of behavioral strategies. For an extensive form game like the MCS game, this might seem like a more natural choice. However, to give the pure learning hypothesis its best possible chance, we use normal form strategies. This decision increases the probability of a decline in critical third players’ contribution rates as observed in the 1/3/16 treatment. The two approaches differ on the critical issue of whether or not third players reinforce the decision they would have made if they had been critical even when they are not critical. Using normal form strategies allows this sort of “accidental” reinforcement while using
behavioral strategies does not.9 As a neutral force, accidental reinforcement works against the natural direction of the dynamic. Since contributing is a dominant strategy for critical third players with no fairness considerations, the natural direction of the dynamic is towards increasing contribution rates. Thus accidental reinforcement works to our benefit if we are trying to generate decreasing contribution rates.

9 For example, suppose the first two players contribute. Suppose the third player would not have contributed if he had been critical. With normal form strategies, this choice gets reinforced even though it was never implemented. No reinforcement takes place with behavioral strategies.

In this paper, the model will only be fit for third players since this is the primary focus. Results on fitting the model for Players 1 and 2 are available from the authors upon request. In the full normal form game, Player 3s have sixteen available strategies. To make the estimation less unwieldy, the strategy set is simplified by forcing third players to choose the same action whenever they are not critical. In other words, a third player who is not critical cannot condition his strategy on whether both or neither of the previous players have contributed. Given that contribution rates are about the same in both cases, and given the low proportion of contributions observed in either case, this assumption is unlikely to have any substantial effect on our conclusions.10 The remaining eight strategies tell a Player 3 what action to take when not critical, critical following a contribution by Player 1, or critical following a contribution by Player 2.

10 The frequency of contribution for third players when both preceding players contribute is 3.1% in the 3/6/9 treatment, 13.2% in the 1/3/9 treatment, and 3.2% in the 1/3/16 treatment. The frequency of contribution for third players when neither preceding player contributes is 4.2% in the 3/6/9 treatment, 18.7% in the 1/3/9 treatment, and 1.5% in the 1/3/16 treatment.

We fit two variations of the reinforcement learning model to the data for third players using standard maximum likelihood techniques. In the first version of the model, the learning model without fairness, reinforcements equal the players’ monetary payoffs. In the second version, we incorporate fairness into the learning model by using a version of Bolton and Ockenfels’ ERC model to generate the reinforcements. In particular, the reinforcements for the learning model with fairness are given by Eq. (3). Let π_i be the monetary payoff for the ith player. Note that if α = 0, we get the learning model without fairness:

r_3(\pi_1, \pi_2, \pi_3) = \pi_3 - \alpha \left| \min\!\left(0, \; \frac{\pi_3}{\pi_1 + \pi_2 + \pi_3} - \frac{1}{3}\right) \right| .    (3)

The results of the maximum likelihood estimation are contained in Table 3. All Player 3 observations were used in fitting the models. Separate initial propensities are fit for each of the three treatments. Given that the inclusion of p_same probably does not fully control for individual effects in the data, the standard errors have been corrected for clustering using techniques due to Moulton (1986) and White (1994).
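To make Eqs. (1)–(3) concrete, the sketch below simulates the third-player learning rule. This is our own illustrative code, not the authors' estimation routine; parameter values and the coding of strategies are placeholders:

import numpy as np

rng = np.random.default_rng(0)

def choice_probabilities(q):
    """Eq. (1): choice probabilities are propensities normalized by their sum."""
    return q / q.sum()

def choose_strategy(q, last_choice, p_same):
    """With probability p_same repeat last period's strategy (the inertia
    parameter); otherwise draw a strategy using the Eq. (1) probabilities."""
    if last_choice is not None and rng.random() < p_same:
        return last_choice
    return rng.choice(len(q), p=choice_probabilities(q))

def update_propensities(q, chosen, reinforcement, delta):
    """Eq. (2): discount all propensities by the forgetting parameter delta,
    then add the reinforcement to the strategy actually chosen."""
    q = delta * q
    q[chosen] += reinforcement
    return q

def reinforcement_with_fairness(pi1, pi2, pi3, alpha):
    """Eq. (3): monetary payoff minus alpha times the shortfall of the third
    player's payoff share below the equal share 1/3 (alpha = 0 recovers the
    model without fairness)."""
    share = pi3 / (pi1 + pi2 + pi3)
    return pi3 - alpha * abs(min(0.0, share - 1.0 / 3.0))

# Example: a critical third player in the 1/3/16 treatment whose Player 2
# contributed. Contributing gives payoffs (30, 27, 14); not contributing
# gives (12, 9, 12). At the fitted alpha of 51.8:
print(reinforcement_with_fairness(30, 27, 14, alpha=51.8))  # about 6.9
print(reinforcement_with_fairness(12, 9, 12, alpha=51.8))   # 12.0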
Table 3
Maximum likelihood estimates of parameters for reinforcement learning model, Player 3 data

                              Model without fairness   Model with fairness
Initial sum of propensities   46.2a (19.4)             36.6b (11.2)
Delta                         0.251b (0.036)           0.248b (0.035)
Probability same              0.228a (0.093)           0.222b (0.074)
Alpha                                                  51.8b (15.5)
Log-likelihood               −750.33                  −744.38

Each regression has 3120 observations split between 78 subjects. Numbers reported in parentheses are standard errors corrected for clustering.
a Significantly different from 0 at the 5% level.
b Significantly different from 0 at the 1% level.
The initial propensities are of little economic interest, and hence are not reported.

The critical parameter in the maximum likelihood estimation is α, the weight put on fairness in the reinforcement function. Even after correcting the standard errors for clustering, we can reject the null hypothesis that α = 0 at the 1% level using either a z test (z = 3.34, p < 0.01) or a log-likelihood ratio test (χ² = 11.91, 1 d.f., p < 0.01). Adding fairness to the reinforcement function significantly improves the model’s ability to fit the data. This improvement does not depend on the precise details of how fairness is added to the reinforcements. We have done analogous analysis using several alternative specifications of the ERC model and several specifications of the Fehr–Schmidt model, and get identical qualitative results. The details of these additional fitting exercises are available upon request.

The ability of the reinforcement learning model with fairness to better fit the data follows from the basic properties of the model. Reinforcement learning models tend to move in the direction of higher reinforcements. Because contributing always provides a higher monetary payoff for critical third players, the learning model without fairness is strongly predisposed toward rising contribution rates. This makes it difficult for the model to track the falling contribution rates in the 1/3/16 treatment. With fairness added to the reinforcement function, it becomes possible for not contributing when critical to generate a higher reinforcement than contributing. For the fitted value of α, this is indeed the case for the 1/3/16 treatment, but not for the other two treatments.11 The model therefore can track both the rising contribution rate for critical third players in the 3/6/9 and 1/3/9 treatments and the falling contribution rate in the 1/3/16 treatment.

11 For α > 14.7, a critical third player in the 1/3/16 treatment always receives greater reinforcement from not contributing than from contributing. For the 1/3/9 treatment, a critical third player always receives greater reinforcement from contributing than not contributing if α < 127.1. For the 3/6/9 treatment, a critical third player always receives greater reinforcement from contributing than not contributing if α < 140.4.
The preceding argument should make it clear why the use of reinforcement learning rather than some other learning model is not an essential ingredient of the hybrid model. What makes the hybrid model work is its ability to change which strategy is dominant in the 1/3/16 treatment. As long as the learning model tends to move towards a dominant strategy, the specifics of the dynamics are not crucial.
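The cutoffs reported in footnote 11 can be checked numerically. The sketch below is ours; it relies on our reading of Eq. (3) together with the contribution costs in Table 1:

def payoffs(costs, contributes, endowment=12, bonus=18):
    """Token payoffs in the MCS game (public good provided if at least two contribute)."""
    provided = sum(contributes) >= 2
    return [endowment - (c if x else 0) + (bonus if provided else 0)
            for c, x in zip(costs, contributes)]

def shortfall(pi):
    """Shortfall of the third player's payoff share below the equal share 1/3."""
    return abs(min(0.0, pi[2] / sum(pi) - 1.0 / 3.0))

def cutoff_alpha(costs, first_contributor):
    """Alpha at which a critical third player's Eq. (3) reinforcement from
    contributing equals that from not contributing, given which of the two
    preceding players (0 or 1) contributed."""
    moves = [i == first_contributor for i in range(2)]
    hi = payoffs(costs, moves + [True])   # third player contributes
    lo = payoffs(costs, moves + [False])  # third player does not contribute
    return (hi[2] - lo[2]) / (shortfall(hi) - shortfall(lo))

for costs in ([3, 6, 9], [1, 3, 9], [1, 3, 16]):
    cuts = sorted(round(cutoff_alpha(costs, who), 1) for who in (0, 1))
    print(costs, cuts)
# [3, 6, 9]:  contributing gives the larger reinforcement in both critical
#             situations whenever alpha is below the smaller cutoff, 140.4
# [1, 3, 9]:  likewise whenever alpha < 127.1
# [1, 3, 16]: not contributing gives the larger reinforcement in both critical
#             situations whenever alpha exceeds the larger cutoff, 14.7

These values reproduce footnote 11: with the fitted α of 51.8, not contributing receives the larger reinforcement for critical third players in the 1/3/16 treatment, while contributing still receives the larger reinforcement in the 3/6/9 and 1/3/9 treatments.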
6. Conclusions and discussion

This paper has presented experiments with a sequential step-level public goods game, the MCS game. It explores whether the observed data can best be explained by the pure fairness hypothesis, the pure learning hypothesis, or some combination of both approaches. While neither approach by itself captures the major features of the data, a hybrid model that combines the dynamics of the learning hypothesis with the sensitivity to relative payoffs of the fairness hypothesis provides a significantly better fit for the experimental data. In particular, this model is consistent with declining contribution rates for critical third players in the 1/3/16 treatment.

The novelty of our work comes from two sources, one empirical and one theoretical. Empirically, we find strong movement towards a dominated strategy, a unique result as far as we know. For the most part, games with other-regarding behavior yield no discernible dynamics. One case for which strong dynamics are typically observed is VCM public goods games without a threshold (Ledyard, 1995, pp. 146–148). However, these dynamics involve movement towards the dominant strategy of not contributing. Only because of the significant shift in the 1/3/16 treatment away from the dominant strategy of contributing for critical third players can we reject the pure learning hypothesis.12

The primary theoretical innovation of this paper is driven by the empirical results. The data make it clear that fairness and learning do not act in isolation, but instead interact with each other.

12 The observed dynamics in VCM public goods games do not allow us to reject the pure fairness hypothesis in any substantive manner. Because VCM games are simultaneous play games, subjects face strategic uncertainty. It is consistent with the fairness hypothesis to build a model in which players are completely rational and have other-regarding preferences, but have to learn about the preferences (and by extension play) of others. Such a Bayesian learning model can fully explain the observed dynamics in VCM public goods games. Because critical third players in MCS games face no strategic uncertainty, a similar approach cannot explain the observed changes in contribution rates over time. Roth and Slonim (1998) and Cooper et al. (2000) report learning by proposers in ultimatum games. As with VCM public goods games, these changes can be explained as a response to uncertainty about responders’ preferences.
This suggests the need to develop a theory that incorporates both fairness and learning and allows us to model the interaction between these two factors. Our hybrid model, the reinforcement learning model with fairness, takes a first step towards building a unified theory of fairness and learning. An important implication of the learning model with fairness is that fairness and learning are not distinctly separate phenomena, but instead must be considered as aspects of a unified whole.

Understanding the dual nature of subjects’ behavior allows us to reconcile the contradictory conclusions reached by earlier papers comparing the fairness and learning hypotheses. Two preceding papers have studied variants of the ultimatum game to compare these hypotheses. Abbink et al. (1997) conclude that punishment is a better explanation of responder behavior than learning. They find clear evidence that fairness matters for responders and no firm evidence of learning by responders. In contrast, Cooper et al. (2000) find support for the pure learning hypothesis. Their main treatment effect is in the direction predicted by the pure learning hypothesis and they find strong evidence that responders’ actions respond to their past experience as a learning model predicts. Within the framework of our hybrid model, these opposing results can arise quite naturally. Trying to discern the relative importance of fairness and learning from a single experiment is closely akin to the famous anecdote about blind men trying to determine the nature of an elephant by feeling a single body part. Depending on exactly which experiment is considered, different conclusions will be reached about the relative importance of fairness and learning.13

We speculate that the learning model with fairness captures something about the behavioral foundations of critical third players’ choices. Compared with many settings where strong dynamics are observed, it is somewhat puzzling that the behavior of critical third players changes so dramatically. After all, what is there to learn for a critical third player? Usually when strong learning effects are observed, subjects face either substantial uncertainty about parameters of the game or substantial strategic uncertainty. Neither of these conditions applies to critical third players in the MCS game. In settings which involve fairly complex optimization problems, we observe learning because subjects must develop a good heuristic for solving the difficult maximization problem. This scarcely applies to critical third players, who face a straightforward binary choice. We cannot even justify the observed dynamics through a decrease in random errors. The relatively weak increases in contribution rates for critical third players in the 3/6/9 and 1/3/9 treatments might be attributed to a decrease in random errors, but it is hard to square the strong decrease in contribution rates for critical third players in the 1/3/16 treatment with a decrease in random errors.14
13 In fact, Cooper et al. fit a hybrid model analogous to ours to their data in the working paper version of their paper. Even though it is not necessary to explain the main treatment effect, they find that the learning model with fairness provides a significantly better fit for the data.
Just about the only thing left that subjects could be learning is their preferences. Indeed, recent advances in cognitive psychology suggest that subjects will not enter our experiment with well-formed preferences. Economists typically assume that preferences are an innate characteristic of individuals, and hence stable. Over the past few decades, cognitive psychologists have largely abandoned this rigid view of preferences in favor of the more flexible notion of constructive preferences.15 The basic idea can be summarized as follows. Individuals are not assumed to carry around pre-specified preferences in their heads. They cannot simply look up the appropriate preferences from some master list when faced with a novel problem. At best, individuals may have well defined attitudes towards the primary attributes of a novel problem. However, they lack any coherent notion of how these attitudes combine into a single preference measure. Instead, preferences must be constructed on the spot.

The concept of constructive preferences applies naturally to the decision facing inexperienced subjects who are critical third players in the MCS game. The primary attributes of their choices are monetary payoffs and fairness. Contributing to the public good generates higher monetary payoffs but lower fairness. Presumably most subjects know that (ceteris paribus) they prefer outcomes that are more fair to outcomes that are less fair. They also can be assumed to prefer outcomes that (ceteris paribus) give higher monetary payoffs to those that give lower monetary payoffs. However, we speculate that subjects lack a complete preference ordering over combinations of fairness and monetary payoffs and must construct such preferences as needed. Put simply, rather than entering our experiment knowing how they feel about tradeoffs between fairness and monetary payoffs, our subjects, we suspect, must learn their preferences. In particular, many critical third players in the 1/3/16 treatment realize with experience that they prefer a slightly lower monetary payoff with equal relative payoffs over a slightly higher monetary payoff with very unequal relative payoffs.

Given this story, the reinforcement learning model with fairness is a natural approach to capturing the process by which critical third players resolve their preferences.
14 Suppose we concede the dubious point that subjects would have a 30% error rate in early periods. If play in the last half of the experiment accurately reflects subjects’ true preferences, there is approximately a 50–50 split between subjects who prefer to contribute as critical third players in the 1/3/16 treatment and those who prefer not to contribute. Since random errors are presumably unbiased, it is equally likely that a player who intended to contribute would accidentally choose not to contribute as vice versa. Thus, even with a high rate of random errors, the contribution rate for critical third players should be roughly 50% in the early periods. This is clearly not the case.
15 For reviews of the literature on constructive preferences, see Payne et al. (1992), Slovic (1995), and Bettman et al. (1998). For attempts to build a theory of constructive preferences, see Bettman et al. (1998) and Payne et al. (forthcoming).
Consider the position of a subject playing as a third player in the 1/3/16 treatment. He is in an unfamiliar environment facing a problem he has never seen before. He must resolve a difficult tradeoff between fairness and monetary considerations. We might reasonably expect that he will be somewhat confused. While it is possible that our subject uses a very sophisticated strategy for resolving his confusion, it seems at least as plausible that he muddles along before settling on a strategy that seems to do well. This is essentially the learning strategy of a reinforcement learner.

Our current experiments do not allow us to directly confirm the conjecture that changes in the choices of critical third players reflect the construction of preferences. More generally, they do not allow us to state with any certainty that the reinforcement learning model with fairness is the best possible method of synthesizing fairness and learning, to determine the best possible specification for this model, or to determine the range of settings in which the hybrid model is applicable. In future work with the MCS game and other related games, we plan to explore these issues.
Acknowledgments

The authors would like to thank the National Science Foundation for financial support. We would like to thank Ido Erev, John Kagel, Jim Rebitzer, Al Roth, David Schkade, Bob Slonim, John Van Huyck, Jim Warnick, an anonymous editor, two anonymous referees, and seminar participants at Alabama, Case Western Reserve, the Cleveland Federal Reserve, Oberlin, Ohio State, Pittsburgh, Purdue, Texas A&M, Tufts, the SEA meetings, and the ESA meetings for their helpful comments. The usual caveat applies.
References

Abbink, K., Bolton, G.E., Sadrieh, A., Tang, F., 1997. Adaptive learning versus punishment in ultimatum bargaining. Mimeo, Penn State University.
Berg, J., Dickhaut, J.W., McCabe, K.A., 1995. Trust, reciprocity and social history. Games Econ. Behav. 10 (1), 122–142.
Bettman, J.R., Luce, M.F., Payne, J.W., 1998. Constructive consumer choice processes. J. Cons. Res. 25 (3), 187–217.
Binmore, K., Gale, J., Samuelson, L., 1995. Learning to be imperfect: The ultimatum game. Games Econ. Behav. 8 (1), 56–90.
Blume, A., DeJong, D.V., Neumann, G.R., Savin, N.E., 1999. Inferring learning rules from experimental game data. Working paper, University of Iowa.
Bolton, G.E., Ockenfels, A., 2000. ERC: A theory of equity, reciprocity, and competition. Amer. Econ. Rev. 90 (1), 166–193.
Camerer, C., Ho, T.H., 1999. Experience-weighted attraction learning in normal form games. Econometrica 67 (4), 827–874.
Charness, G., Rabin, M., 2000. Social preferences: Some simple tests and a new model. Working paper, University of California, Berkeley.
Cheung, Y.W., Friedman, D., 1997. Individual learning in normal form games: Some laboratory results. Games Econ. Behav. 19 (1), 46–79.
Cooper, D.J., Feltovich, N., 1996. Reinforcement-based learning vs. Bayesian learning: A comparison. Working paper, University of Pittsburgh.
Cooper, D.J., Feltovich, N., Roth, A.E., Zwick, R., 2000. Relative versus absolute speed of adjustment in strategic environments: Responder behavior in ultimatum games. Working paper, Case Western Reserve University.
Erev, I., Rapoport, A., 1990. Provision of step-level public goods: The sequential contribution mechanism. J. Conflict Res. 34 (3), 401–425.
Erev, I., Roth, A.E., 1998. Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. Amer. Econ. Rev. 88 (4), 848–881.
Falk, A., Fischbacher, U., 1998. An eye for an eye, a tooth for a tooth: A theory of reciprocity. Mimeo, University of Zurich.
Fehr, E., Kirchler, E., Weichbold, A., Gachter, S., 1998. When social norms overpower competition: Gift exchange in experimental labor markets. J. Labor Econ. 16 (2), 324–351.
Fehr, E., Schmidt, K., 1999. A theory of fairness, competition, and cooperation. Quart. J. Econ. 114 (3), 817–868.
Feltovich, N., 2000. Reinforcement-based vs. beliefs-based learning models in experimental asymmetric-information games. Econometrica 68 (3), 605–641.
Ledyard, J.O., 1995. Public goods: A survey of experimental research. In: Kagel, J.H., Roth, A.E. (Eds.), The Handbook of Experimental Economics. Princeton University Press, Princeton.
Levine, D.K., 1998. Modeling altruism and spitefulness in experiments. Rev. Econ. Dynam. 1 (3), 593–622.
Moulton, B., 1986. Random group effects and the precision of regression estimates. J. Econometrics 32, 385–397.
Payne, J.W., Bettman, J.R., Johnson, E.J., 1992. Behavioral decision research: A constructive processing approach. Ann. Rev. Psych. 43, 87–131.
Payne, J.W., Bettman, J.R., Schkade, D.A. Measuring constructed preferences: Towards a building code. J. Risk Uncertainty, forthcoming.
Roth, A.E., 1995. Bargaining experiments. In: Kagel, J.H., Roth, A.E. (Eds.), The Handbook of Experimental Economics. Princeton University Press, Princeton.
Roth, A.E., Erev, I., 1995. Learning in extensive-form games: Experimental data and simple dynamic models in the intermediate term. Games Econ. Behav. 8 (1), 164–212.
Roth, A.E., Slonim, R., 1998. Learning in high stakes ultimatum games: An experiment in the Slovak Republic. Econometrica 66 (3), 569–596.
Salmon, T.C., An evaluation of econometric models of adaptive learning. Econometrica, forthcoming.
Slovic, P., 1995. The construction of preference. Amer. Psych. 50 (5), 364–371.
Van de Kragt, A.J., Orbell, J.M., Dawes, R.M., 1983. The minimal contributing set as a solution to public goods problems. Amer. Politic. Sci. Rev. 77 (1), 112–122.
White, H., 1994. Estimation, Inference and Specification Analysis. Cambridge University Press, New York.