Mathematics and Computers in Simulation 81 (2011) 1202–1217
Original article
Decision making in dynamic stochastic Cournot games

Hamed Kebriaei*, Ashkan Rahimi-Kian

Control and Intelligent Processing Center of Excellence, School of ECE, University of Tehran, Tehran, Iran

Received 29 June 2010; received in revised form 12 October 2010; accepted 13 November 2010; available online 5 December 2010
Abstract

In this paper, the Cournot competition is modeled as a stochastic dynamic game. In the proposed model, a stochastic market price function and stochastic dynamic decision functions of the rivals are considered. Since the optimal decision of a player requires the estimation of the unknown parameters of the market and of the rivals' decisions, a combined estimation–optimization algorithm for decision making is proposed. The history of the rivals' output quantities (supplies) and the market clearing price (MCP) are the only information available to the players. The convergence of the algorithm (for both the estimation and decision making processes) is discussed. In addition, the stability conditions of the equilibrium points are analyzed using the converse Lyapunov theorem. Through case studies based on California Independent System Operator (CA-ISO) historical public data, the theoretical results and the applicability of the proposed method are verified. Moreover, a comparative study among agents using the proposed method, naïve expectation and adaptive expectation is performed to show the effectiveness and applicability of the proposed method.

© 2010 IMACS. Published by Elsevier B.V. All rights reserved.

Keywords: Cournot game; Stochastic game; Oligopoly market; Estimation; Nash equilibrium
1. Introduction

The first dynamic model in the field of oligopoly games was introduced by Cournot [13]. In this game model, the players make their decisions according to the strategies of their rivals using the naïve expectations model (which assumes the rivals will repeat their last strategies in the next step of the game). In Refs. [1,2,7], oligopoly dynamic models were considered assuming naïve expectation of the rivals' strategies. In Ref. [1], the dynamics of three- and four-player Cournot game models were studied with linear cost functions of the players. In Ref. [7], the results of Ref. [1] were extended to an n-player game with linear cost functions. The basins of attraction for multiple Nash equilibrium points were discussed in Ref. [2], where the players used naïve expectations and did not estimate the rivals' behaviors. In Ref. [10], a duopoly game with adaptive adjustment of the rival's behavior was taken into account. A review of the studies on complicated dynamics of oligopoly games can be found in Ref. [22]. Many research works have considered homogeneous expectations of the rivals, where the players use the same rule for rivals' behavior adjustment [3–5,17,22]. This approach may not be useful for cases in which the players have different decision making strategies.
* Corresponding author at: P.O. Box 14395-515, Tehran, Iran. Tel.: +98 21 88027756; fax: +98 21 88778690. E-mail address: [email protected] (H. Kebriaei).
doi:10.1016/j.matcom.2010.11.007
In a more complex model of the dynamic Cournot game, it is assumed that none of the players has complete information about the game, and thus they must act adaptively. In an adjustment process based on bounded rationality, the players estimate their marginal profits to amend their performance [8,9]. In Ref. [11], an oligopoly game with incomplete information for the sellers was studied. The players used naïve expectations to predict the behaviors of other players, and a new method, called local monopolistic approximation, was proposed for profit maximization. The dynamics of a triopoly game with heterogeneous players were discussed in Ref. [14]. In that work, three players with different expectations were used, namely bounded rational, adaptive, and naïve expectations, and the stability of the game and its chaotic behavior were analyzed. However, none of the agents estimated the actions of the rivals; they only used the last actions of their rivals for decision making. A nonlinear duopoly game with heterogeneous players was considered in Ref. [25]. One of the players used naïve expectations to predict its rivals' actions and the other was a bounded rational player that used a replicator equation called "myopic". The existence and the stability conditions of the equilibrium point were discussed; however, the players did not estimate the next actions of the rivals. In Ref. [18], a stochastic dynamic Cournot game was considered. The market price function had a stochastic linear form and the rivals were modeled as dynamic decision makers who decided based on a bounded rational rule. The whole decision making process was modeled as a stochastic optimal control problem to maximize the players' long-term expected profits. However, it was assumed that the mean and variance of all random parameters of the market and rivals were information available to the players. In Ref. [16], for optimal decision making, each player estimated its rivals' behaviors using linear regression and the recursive weighted least squares method. In that model the players were not aware of the cost function parameters of their rivals; however, the price function parameters were assumed to be information available to the players. The convergence of the algorithm was analyzed for the case in which all players use the proposed method for estimation and decision making.

In most of the research works, the parameters of the price function are assumed to be among the available information in the market, while this information may not be available in some markets. In addition, usually only the last actions of the opponents are used by a player for its decision in the next round of the game, whereas the complete history of the rivals' actions could be used to estimate their next-step decisions. Furthermore, the uncertainties of the market parameters and the rivals' decisions have not been considered in the previously proposed models in the literature. In this paper, we try to use the information that is actually available in a market for the optimal decision making of the players. In addition, by assuming a stochastic price function and stochastic decision functions for the rivals, more realistic models of the market and of the opponents' decisions are proposed. The aim of this paper is to design an intelligent player with an adaptive estimator that estimates the unknown market parameters and then predicts the next-step actions of the rivals in a Cournot game, in order to assist him/her in optimal decision making.
The history of the rivals' actions (supply amounts) and the market clearing price are assumed to be the available information to the players in the proposed algorithm. It should be noted that this information is available through the public databases of some energy market operators such as CA-ISO, PJM, and NY-ISO. The market price function has a linear stochastic model with unknown parameters. The strategies of the players, who are sellers, are their supply quantity offers to the market, and it is assumed that the rivals' supply offers for the next market round are not known. As in Ref. [18], the rivals are modeled as bounded rational players; however, an additive noise is included in the rivals' models to cover the modeling errors. Therefore, the rivals are modeled as stochastic bounded rational agents. The proposed intelligent player estimates the unknown parameters of the stochastic price function, predicts the strategies of the rivals for the next market round, and then finds the profit maximizing strategy for that round. It is shown that the proposed estimator guarantees the convergence of the estimates to the real market price and rivals' model parameters. Also, it is proved that the expected supply strategy of the proposed player converges to the optimal supply strategy in finite time steps. The stability analysis of the Nash equilibrium point and the boundary equilibrium point of the game is also presented. The stability margin for the interior equilibrium point (Nash equilibrium) is obtained using the converse Lyapunov theorem, and the conditions under which the boundary equilibrium point becomes unstable are also studied. In the market simulation part, three different cases are considered: (1) three market players including two stochastic bounded rational agents and the new proposed agent; (2) a comparative case study where the proposed player is replaced by a naïve expectation player [10]; (3) a comparative case study where the proposed player is replaced by a player with adaptive expectation [20]. It is shown that the players' strategies converge to a neighborhood of the Nash equilibrium point. The comparison among the average and cumulative payoffs of the
players shows that the new proposed agent performs better than the bounded rational, naïve and adaptive expectation agents.

The organization of the paper is as follows: Sections 2 and 3 provide the problem formulation and the preliminaries of the work. In Section 4, the payoff maximizing algorithm is presented. In Section 5, the convergence of the proposed estimation and decision making algorithms is discussed. The stability analysis of the equilibrium points is given in Section 6. Section 7 includes the market simulation results for the three case studies. Finally, the conclusion of the paper is provided in Section 8.

2. Preliminaries

In oligopolistic games, the players may use different strategies to estimate their rivals' behaviors, such as the simple rule of naïve expectations [10] or the more complicated rules of bounded rationality [6] and adaptive adjustment [20]. Moreover, the players in a game may have different strategies (heterogeneous expectations) or similar strategies (homogeneous expectations). Considering a fully rational Cournot oligopoly market, each player is assumed to be able to predict its rivals' Cournot strategies one step forward. The optimal offered quantities of the players could be determined by solving the following set of n equations [6]:

$$q_i(t+1) = \arg\max_{q_i(t+1)} \pi_i\big(q_{1i}(t+1), q_{2i}(t+1), \ldots, q_{(i-1)i}(t+1), q_i(t+1), q_{(i+1)i}(t+1), \ldots, q_{ni}(t+1)\big), \quad i = 1, \ldots, n \qquad (1)$$
where q_{ji} (j = 1, \ldots, n, j \neq i) is the prediction of the supply quantity of player-j by player-i. If a unique optimal solution for (1) exists, it can be described as:

$$q_i(t+1) = f_i\big(q_{1i}(t+1), q_{2i}(t+1), \ldots, q_{(i-1)i}(t+1), q_{(i+1)i}(t+1), \ldots, q_{ni}(t+1)\big) \qquad (2)$$
where f_i(\cdot) is the reaction function of player-i. Cournot assumed that q_{ji}(t+1) = q_j(t), which implies that each player assumes its rivals will repeat their last-step strategies in the next time-step [10]. This is called naïve expectation. Therefore, the Cournot reaction function has the following dynamic discrete-time form:

$$q_i(t+1) = f_i\big(q_1(t), q_2(t), \ldots, q_{i-1}(t), q_{i+1}(t), \ldots, q_n(t)\big) \qquad (3)$$
Eq. (3) describes an oligopoly game with naïve expectations and homogeneous player behaviors. In general, the players can use more complicated expectations such as bounded rationality. The bounded rational players do not have complete information of the market, and make their decisions based on a local estimate of their marginal payoffs [12]. If the marginal payoff is positive (negative), the player will increase (decrease) its supply quantity for the next time-step. The dynamic supply update equation of a bounded rational player may be written as [6]:

$$q_i(t+1) = q_i(t) + \gamma\, q_i(t)\, \frac{\partial \pi_i}{\partial q_i(t)} \qquad (4)$$
where γ is a positive scalar that represents the relative speed of behavior adjustment. The minimum value of q_i(t) should be bounded by zero. Another strategic behavior is adaptive adjustment [24]. Through this strategy, a player calculates its desired supply quantity for the next step using a weighted sum of its reaction function and its supply quantity in the previous step. The dynamic equation of the adaptive adjustment for player-i is as follows:

$$q_i(t+1) = (1-v)\, q_i(t) + v\, f_i\big(q_1(t), q_2(t), \ldots, q_{i-1}(t), q_{i+1}(t), \ldots, q_n(t)\big) \qquad (5)$$
where v ∈ [0, 1] is the adjustment factor of the player. Note that for the special case of v = 1, Eq. (5) reduces to Eq. (3), which represents the best reply dynamics with naïve expectations. Another method of dynamic decision making is that each player forecasts the supply quantities of its rivals for the next period, and then makes a profit maximizing decision based on the forecasted supply quantities [20]. A general form of expectations used in economics is adaptive expectations. Player-i adapts its expectations of
player-j's supply strategies as follows:

$$q_{ji}(t+1) = q_{ji}(t) + \beta_i\big(q_j(t) - q_{ji}(t)\big), \qquad 0 \le \beta_i \le 1 \qquad (6)$$
It is easy to see that under this formulation, the estimated supply quantities are weighted averages of the past actual supply quantities, where the weights decrease as time recedes [20]. The generalization of the model proposed in Ref. [20] to multi-product firms was considered by Okuguchi and Szidarovsky in Ref. [21]. In this paper, the rivals (of the proposed player) are modeled as stochastic bounded rational players. In addition, the proposed estimation and decision making algorithms are compared with the naïve and adaptive expectations via several case studies.
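To make the expectation rules of Eqs. (3)–(6) concrete, the following minimal Python sketch implements one update step of each rule for a single player. It is an illustrative sketch only: the function names and the idea of clipping the bounded rational output at zero follow the description above, and any reaction-function value passed in is assumed to be computed elsewhere.

```python
def naive_expectation(q_rival_last):
    """Naive expectation, Eq. (3): predict that the rival repeats its last action."""
    return q_rival_last

def adaptive_expectation(q_hat_prev, q_rival_last, beta):
    """Adaptive expectation, Eq. (6): move the previous forecast toward the observed action."""
    return q_hat_prev + beta * (q_rival_last - q_hat_prev)

def bounded_rational_update(q_own, marginal_payoff, gamma):
    """Bounded rationality, Eq. (4): adjust output along the estimated marginal payoff,
    keeping the supply quantity non-negative as stated in the text."""
    return max(0.0, q_own * (1.0 + gamma * marginal_payoff))

def adaptive_adjustment(q_own, best_reply_value, v):
    """Adaptive adjustment, Eq. (5): weighted average of the last output and the best reply."""
    return (1.0 - v) * q_own + v * best_reply_value
```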
3. Problem formulation

Consider a market consisting of n sellers (players). The market clearing price may be calculated using the following stochastic model:

$$\lambda(t) = \lambda_0 - \alpha \sum_{i=1}^{n} q_i(t) + v(t) \qquad (7)$$
where q_i is the supply quantity of player-i and α > 0 reflects the impact of the players' supply quantities on the market price. The scalar λ_0 is the maximum price in the market and v is a Gaussian noise with zero mean and variance σ². Eq. (7) implies that increasing the supply quantities lowers the market price. The payoff (or profit) of player-i may be calculated as follows:

$$\pi_i(t) = \lambda(t)\, q_i(t) - C_i\big(q_i(t)\big), \qquad i = 1, \ldots, n \qquad (8)$$
where C_i(\cdot) is the cost function of player-i with the following quadratic form:

$$C_i\big(q_i(t)\big) = \frac{1}{2} a_i q_i^2(t) + b_i q_i(t) + c_i \qquad (9)$$

with a_i, b_i, c_i > 0. As is clear from (7) and (8), each player's payoff is a function of the other players' offered supply quantities. The strategic decisions of the rivals of player-i are assumed to have the following stochastic dynamic model:

$$q_j(t+1) = q_j(t)\left(1 + \beta_j \frac{\partial \pi_j(t)}{\partial q_j(t)}\right) + w_j(t), \qquad j = 1, 2, \ldots, n,\; j \neq i \qquad (10)$$
where β_j is the adjusting factor of player-j and w_j(t) is a Gaussian noise with zero mean and variance κ_j. Thus, the quadratic payoff function of player-i can be formulated as follows:

$$\pi_i(t) = \left(\lambda_0 - \alpha \sum_{l=1}^{n} q_l(t) + v(t)\right) q_i(t) - \frac{1}{2} a_i q_i^2(t) - b_i q_i(t) - c_i \qquad (11)$$
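As an illustration of the market model in Eqs. (7)–(11), the following Python sketch simulates one market round: it clears the price, computes a seller's payoff, and applies the stochastic bounded rational update (10) for a rival. The parameter values, random seed, and clipping of the rival's quantity at zero are illustrative assumptions, not the paper's case-study data.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed

def market_price(q, lambda0, alpha, sigma):
    """Stochastic market clearing price, Eq. (7)."""
    return lambda0 - alpha * np.sum(q) + rng.normal(0.0, sigma)

def payoff(i, q, price, a, b, c):
    """Quadratic payoff of player i, Eqs. (8), (9) and (11)."""
    return price * q[i] - (0.5 * a[i] * q[i] ** 2 + b[i] * q[i] + c[i])

def bounded_rational_step(j, q, lambda0, alpha, a, b, beta, kappa):
    """Stochastic bounded rational update of rival j, Eq. (10).

    The (noise-free) marginal payoff is
        d(pi_j)/d(q_j) = lambda0 - alpha*sum(q) - alpha*q_j - a_j*q_j - b_j,
    and w_j(t) is zero-mean Gaussian noise with variance kappa_j.
    """
    marginal = lambda0 - alpha * np.sum(q) - alpha * q[j] - a[j] * q[j] - b[j]
    w = rng.normal(0.0, np.sqrt(kappa[j]))
    # Supply quantities are kept non-negative, as assumed for bounded rational players.
    return max(0.0, q[j] * (1.0 + beta[j] * marginal) + w)
```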
The first order condition (FOC) for player-i to maximize its profit is:

$$\frac{\partial \pi_i(t+1)}{\partial q_i(t+1)} = \left(\lambda_0 - \alpha \sum_{l=1,\, l \neq i}^{n} q_l(t+1) + v(t+1)\right) - (2\alpha + a_i)\, q_i(t+1) - b_i = 0 \qquad (12)$$
Solving Eq. (12) for q_i(t+1) yields \bar q_i(t+1), the optimal supply decision of player-i at t+1:

$$\bar q_i(t+1) = \frac{\lambda_0 - b_i - \alpha \sum_{l=1,\, l \neq i}^{n} q_l(t+1)}{2\alpha + a_i} + \frac{v(t+1)}{2\alpha + a_i} \qquad (13)$$

where the term v(t+1)/(2\alpha + a_i) is also a zero-mean white noise. The second order condition (SOC) of player-i's profit function is:

$$\frac{\partial^2 \pi_i(t+1)}{\partial q_i^2(t+1)} = -2\alpha - a_i < 0 \qquad (14)$$
which confirms that Eq. (13) maximizes the player's profit function. It should be noticed that the corner optimum of (11) occurs when \bar q_i(t+1) < 0; in this case \bar q_i(t+1) = 0 is the optimal solution of (11). The following model is used to estimate the optimal decision of player-i in (13):

$$q_i(t+1) = \frac{\hat\lambda_0 - b_i - \hat\alpha \sum_{l=1,\, l \neq i}^{n} \hat q_l(t+1)}{2\hat\alpha + a_i} \qquad (15)$$

where \hat\lambda_0, \hat\alpha and \hat q_l(t+1), l = 1, 2, \ldots, n, l \neq i are the respective parameter estimates of player-i. Therefore, player-i needs to estimate the parameters of the market price function and of the rivals' supply strategies (Eqs. (7) and (10)). The FOC for profit maximization implies that if the estimated MCP (considering the effects of the predicted rivals' supply quantities) is equal to or greater than player-i's marginal cost of production/supply, then it is profitable for player-i to supply to the market; otherwise, his/her supply quantity should be set to zero. In our market case studies this condition is always checked for the proposed player; also, the market price parameters and the players' cost function parameters are selected to represent realistic market conditions. The proposed estimation–decision making algorithms are described in the following section.

4. The payoff maximizing algorithm in stochastic Cournot games

In this section, an estimation based optimization algorithm is proposed. The history of the players' actions (supply quantities) and the market price are assumed to be the available information to the market players. The proposed algorithm (for player-i) has the following steps:

1- Begin with the initial action set for all players at t = 1: \{q_j(t)\}_{j=1}^{n}

2- Receive the available information from the market at time-step t: \{q_j(t)\}_{j=1}^{n} and \lambda(t)

3- Estimate the parameters \lambda_0 and \alpha as follows. Assume the following linear estimation model:

$$\hat\lambda(t) = \hat\lambda_0 - \hat\alpha \sum_{i=1}^{n} q_i(t) = \begin{bmatrix} 1 & -\sum_{i=1}^{n} q_i(t) \end{bmatrix} \begin{bmatrix} \hat\lambda_0 \\ \hat\alpha \end{bmatrix} \qquad (16)$$
Defining the estimation error as e(t) = \hat\lambda(t) - \lambda(t), the parameters \hat\lambda_0(t), \hat\alpha(t) should be chosen so as to minimize the following cost function over the training data set \left\{\sum_{j=1}^{n} q_j(k), \lambda(k)\right\}_{k=1}^{t}:

$$I(\hat\lambda_0, \hat\alpha) = \sum_{k=1}^{t} e^{T}(k)\, e(k) \qquad (17)$$
By some algebraic manipulation, the solution of this least squares (LS) problem is obtained as follows:

$$\hat\theta(t) = (X^{T} X)^{-1} X^{T} Y \qquad (18)$$

where

$$\hat\theta(t) = \begin{bmatrix} \hat\lambda_0(t) \\ \hat\alpha(t) \end{bmatrix}, \qquad X = \begin{bmatrix} 1 & -\sum_{j=1}^{n} q_j(1) \\ \vdots & \vdots \\ 1 & -\sum_{j=1}^{n} q_j(t) \end{bmatrix}, \qquad Y = \begin{bmatrix} \lambda(1) \\ \vdots \\ \lambda(t) \end{bmatrix} \qquad (19)$$
As the columns of X are independent, if we have at least two measurements, the matrix X^{T}X will be invertible. However, the data may be noisy and thus the regression matrix may become near singular. To guarantee the accuracy of the numerical inversion, several approaches can be used for calculating \hat\theta(t), such as: solving the normal equations (X^{T}X)\hat\theta(t) = X^{T}Y by Gaussian elimination, forming an orthogonal decomposition of X by Gram–Schmidt, or forming a singular value decomposition of X. More details on these numerically advanced algorithms are given in Ref. [15].

4- Modify the dynamic model of the other players' decisions in (10) as follows:

$$\begin{aligned} q_j(t+1) &= q_j(t) + \beta_j q_j(t)\left(\lambda(t) + q_j(t)\,\frac{\partial \lambda(t)}{\partial q_j(t)} - a_j q_j(t) - b_j\right) + w_j(t) \\ &= \big(1 + \beta_j\lambda(t) - \beta_j b_j\big)\, q_j(t) - \beta_j(\alpha + a_j)\, q_j^2(t) + w_j(t) \\ &= \big(1 - \beta_j b_j\big)\, q_j(t) + \beta_j \lambda(t)\, q_j(t) - \beta_j(\alpha + a_j)\, q_j^2(t) + w_j(t) \\ &= A_j q_j(t) + \beta_j \lambda(t)\, q_j(t) + B_j q_j^2(t) + w_j(t) \end{aligned} \qquad (20)$$
5- Using the following linear regression model and the training data set \{q_j(k), \lambda(k)\}_{k=1}^{t}, the parameters A_j, \beta_j and B_j can be estimated:

$$\hat q_j(t) = \begin{bmatrix} q_j(t-1) & \lambda(t-1)\, q_j(t-1) & q_j^2(t-1) \end{bmatrix} \begin{bmatrix} \hat A_j \\ \hat\beta_j \\ \hat B_j \end{bmatrix} \qquad (21)$$
6- Using the estimated parameters \hat A_j(t), \hat\beta_j(t) and \hat B_j(t), the next-step decision (supply quantity offer) of each rival may be predicted as follows:

$$\hat q_j(t+1) = \begin{bmatrix} q_j(t) & \lambda(t)\, q_j(t) & q_j^2(t) \end{bmatrix} \begin{bmatrix} \hat A_j(t) \\ \hat\beta_j(t) \\ \hat B_j(t) \end{bmatrix} \qquad (22)$$

7- Now, player-i can make his/her optimal decision using Eq. (15) for time step t + 1.

8- Go back to step two and repeat the estimation–decision process for the next steps.

For estimating the parameters of (16), at least two measurements (two periods) are needed. However, for the estimation of each opponent's strategy in the next step, at least four observations are needed (one for the output and three for the regressors). Hence, in our proposed method, in periods one and two the player is initialized arbitrarily, and in periods three and four (when it can already estimate the parameters of the price function) the player uses the naïve expectation. After four iterations the algorithm works normally. A compact sketch of the complete estimation–decision loop is given below.
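The following Python sketch puts steps 1–8 together for player-i: a batch least squares fit of the price model (16)–(19) via numpy.linalg.lstsq (an SVD-based solver, one of the numerical options mentioned above), a per-rival regression for the model (20)–(22), and the decision rule (15). Variable names and the data layout are illustrative assumptions, not the authors' code.

```python
import numpy as np

def estimate_price_params(q_hist, price_hist):
    """LS estimate of (lambda0, alpha) from Eqs. (16)-(19).

    q_hist: array (T, n) of all players' supplies; price_hist: array (T,) of MCPs.
    """
    X = np.column_stack([np.ones(len(price_hist)), -q_hist.sum(axis=1)])
    theta, *_ = np.linalg.lstsq(X, price_hist, rcond=None)  # SVD-based solver
    lam0_hat, alpha_hat = theta
    return lam0_hat, alpha_hat

def predict_rival(qj_hist, price_hist):
    """Fit Eq. (21) for one rival and predict its next supply via Eq. (22)."""
    Phi = np.column_stack([qj_hist[:-1],
                           price_hist[:-1] * qj_hist[:-1],
                           qj_hist[:-1] ** 2])                    # regressors at t-1
    coef, *_ = np.linalg.lstsq(Phi, qj_hist[1:], rcond=None)      # [A_j, beta_j, B_j]
    latest = np.array([qj_hist[-1], price_hist[-1] * qj_hist[-1], qj_hist[-1] ** 2])
    return coef @ latest

def proposed_decision(i, q_hist, price_hist, a_i, b_i):
    """One pass of steps 3-7 for player i, returning its offer for t+1 (Eq. (15))."""
    lam0_hat, alpha_hat = estimate_price_params(q_hist, price_hist)
    rivals = [j for j in range(q_hist.shape[1]) if j != i]
    q_hat_rivals = sum(predict_rival(q_hist[:, j], price_hist) for j in rivals)
    q_next = (lam0_hat - b_i - alpha_hat * q_hat_rivals) / (2.0 * alpha_hat + a_i)
    return max(0.0, q_next)  # corner optimum: non-negative supply
```

In the first few rounds, before enough observations are available, the paper initializes the player arbitrarily and falls back to naïve expectation; that warm-up logic is omitted from this sketch for brevity.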
5. The convergence analysis of the proposed algorithm

In the following, the convergence of the proposed estimation and decision algorithms is analyzed.

Theorem 1. The estimated parameters of the market price function and of the next-step rivals' strategies converge to their actual values.

Proof. The proof for the convergence of the price function parameters is given below. Since the least squares (LS) algorithm is an unbiased estimator [19], it follows that:

$$E\{\hat\theta(t)\} = \begin{bmatrix} \lambda_0 \\ \alpha \end{bmatrix} \qquad (23)$$
where E is the expectation operator. Moreover, in this case the covariance of the estimation error in LS is given by [19]:

$$\mathrm{Cov}\{\hat\theta(t)\} = \sigma^2 (X^{T}X)^{-1}, \qquad X^{T}X = \begin{bmatrix} x_1^{T} \\ x_2^{T} \end{bmatrix}\begin{bmatrix} x_1 & x_2 \end{bmatrix} = \begin{bmatrix} x_1^{T}x_1 & x_1^{T}x_2 \\ x_2^{T}x_1 & x_2^{T}x_2 \end{bmatrix} = N \begin{bmatrix} \frac{1}{N}\sum_{k=1}^{N} x_1(k)x_1(k) & \frac{1}{N}\sum_{k=1}^{N} x_1(k)x_2(k) \\ \frac{1}{N}\sum_{k=1}^{N} x_2(k)x_1(k) & \frac{1}{N}\sum_{k=1}^{N} x_2(k)x_2(k) \end{bmatrix} \qquad (24)$$
where the vectors x_1 and x_2 are the two columns of X. Using the approximation

$$\frac{1}{N}\sum_{k=1}^{N} x_i(k)\, x_j(k) \approx E\{x_i^{T} x_j\} \qquad (25)$$

we have

$$\mathrm{Cov}\{\hat\theta(t)\} \approx \frac{\sigma^2}{N}\,\big[E\{x_i^{T} x_j\}\big]^{-1}_{i,j=1,2} \qquad (26)$$
If N → ∞, then the approximations (25) and (26) become equalities and the right-hand side of (26) goes to zero:

$$\lim_{N\to\infty} \mathrm{Cov}\{\hat\theta(t)\} = \lim_{N\to\infty} \frac{\sigma^2}{N}\,\big[E\{x_i^{T} x_j\}\big]^{-1}_{i,j=1,2} = 0 \qquad (27)$$
In addition, according to Eq. (18) we have:

$$\hat\theta(t) = (X^{T}X)^{-1} X^{T}\left(X \begin{bmatrix} \lambda_0 \\ \alpha \end{bmatrix} + V(t)\right) = \begin{bmatrix} \lambda_0 \\ \alpha \end{bmatrix} + (X^{T}X)^{-1} X^{T} V(t) \qquad (28)$$
where V(t) = [v(1), v(2), \ldots, v(t)]^{T}. Since V(t) is a Gaussian white noise vector, \hat\theta(t) is also a stochastic vector with a Gaussian distribution. The convergence of the covariance of \hat\theta(t) to zero confirms that \hat\theta(t) converges to its mean value, \begin{bmatrix} \lambda_0 \\ \alpha \end{bmatrix}. The same procedure can be followed for the convergence analysis of the parameters \hat A_j(t), \hat\beta_j(t) and \hat B_j(t).

Proposition 1. The prediction errors of the rivals' decision strategies (supply quantities) converge to Gaussian white noise.

Proof. Using Theorem 1, we have:

$$\lim_{t\to\infty}\big(q_j(t+1) - \hat q_j(t+1)\big) = \lim_{t\to\infty}\left( A_j q_j(t+1) + \beta_j \lambda(t+1) q_j(t+1) + B_j q_j^2(t+1) + w_j(t+1) - \begin{bmatrix} q_j(t+1) & \lambda(t+1) q_j(t+1) & q_j^2(t+1) \end{bmatrix}\begin{bmatrix} \hat A_j(t+1) \\ \hat\beta_j(t+1) \\ \hat B_j(t+1) \end{bmatrix}\right) = w_j(t+1), \quad j = 1, 2, \ldots, n,\; j \neq i \qquad (29)$$

Therefore, the expected prediction errors of the rivals' decision strategies (supply quantities) converge to zero in finite time.

Proposition 2. As time goes to infinity, the expected value of the decision of player-i converges to the expected value of the optimal decision.
Proof.

$$\lim_{t\to\infty} q_i(t+1) = \lim_{t\to\infty} \frac{\hat\lambda_0 - b_i - \hat\alpha \sum_{l=1,\, l\neq i}^{n} \hat q_l(t+1)}{2\hat\alpha + a_i} = \frac{\lambda_0 - b_i - \alpha \sum_{l=1,\, l\neq i}^{n} \big(q_l(t+1) - w_l(t+1)\big)}{2\alpha + a_i} = \frac{\lambda_0 - b_i - \alpha \sum_{l=1,\, l\neq i}^{n} q_l(t+1)}{2\alpha + a_i} + \frac{\alpha \sum_{l=1,\, l\neq i}^{n} w_l(t+1)}{2\alpha + a_i} \qquad (30)$$

Therefore,

$$\lim_{t\to\infty} E\{q_i(t+1) - \bar q_i(t+1)\} = 0 \qquad (31)$$
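As a quick numerical illustration of Theorem 1 and Eqs. (26)–(27), the following hedged Python sketch repeatedly fits the price model on growing synthetic data sets and checks that the spread of the LS estimates shrinks roughly like σ²/N. The true parameter values, quantity range, and number of Monte Carlo repetitions below are arbitrary illustrative choices, not the paper's market data.

```python
import numpy as np

rng = np.random.default_rng(1)
lam0_true, alpha_true, sigma = 40.0, 0.0002, 2.0   # illustrative values only

def ls_fit(N):
    """One LS fit of (lambda0, alpha) from N noisy price observations, Eq. (18)."""
    total_q = rng.uniform(10_000.0, 20_000.0, size=N)       # aggregate supply
    price = lam0_true - alpha_true * total_q + rng.normal(0.0, sigma, size=N)
    X = np.column_stack([np.ones(N), -total_q])
    theta, *_ = np.linalg.lstsq(X, price, rcond=None)
    return theta

for N in (10, 100, 1000):
    estimates = np.array([ls_fit(N) for _ in range(200)])
    print(N, estimates.var(axis=0))   # empirical variance decays roughly as 1/N
```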
6. Stability analysis of the Nash equilibrium

In this section, using the converse Lyapunov theorem for discrete-time systems, the stability conditions of the Nash equilibrium point of the proposed game model are analyzed. In addition, the conditions under which the boundary equilibrium point of the game becomes unstable are also studied. For brevity, we consider a duopoly game including the proposed player and a bounded rational player; the results can be extended to more than two players. The estimation errors and the noises of the rivals' strategies are considered as model uncertainties, and their effects are studied via the simulation results in the next section. Consider the following two-player game:

$$q_j(t+1) = q_j(t)\left(1 + \beta_j \frac{\partial \pi_j(t)}{\partial q_j(t)}\right) = \big(1 + \beta_j\lambda_0 - \beta_j b_j\big)\, q_j(t) - \beta_j(2\alpha + a_j)\, q_j^2(t) - \beta_j \alpha\, q_i(t)\, q_j(t) \qquad (32)$$

$$q_i(t+1) = \frac{\lambda_0 - b_i - \alpha\, q_j(t+1)}{2\alpha + a_i} \qquad (33)$$
The equilibrium points of the above dynamics are:

$$\big(q_j^*, q_i^*\big) = \left(0,\; \frac{\lambda_i}{\alpha_i}\right), \qquad \big(q_j^*, q_i^*\big) = \left(\frac{\alpha_i \lambda_j - \alpha \lambda_i}{\alpha_i \alpha_j - \alpha^2},\; \frac{\alpha_j \lambda_i - \alpha \lambda_j}{\alpha_i \alpha_j - \alpha^2}\right) \qquad (34)$$

where

$$\alpha_k = 2\alpha + a_k, \qquad \lambda_k = \lambda_0 - b_k, \qquad k = i, j \qquad (35)$$
In the following theorems, the conditions for the stability of the interior equilibrium point of the game (Nash equilibrium) and for the instability of the boundary equilibrium point are analyzed.

Theorem 2. If $q_j \in \left(0,\; \dfrac{2\alpha_i}{\beta_j(\alpha_i \alpha_j - \alpha^2)}\right)$, then the Nash equilibrium of the game is asymptotically stable.
Proof. By setting q_j = \tilde q_j + q_j^* and q_i = \tilde q_i + q_i^*, the Nash equilibrium point is shifted to the origin. Now, using Eq. (35), Eqs. (32) and (33) may be rewritten as follows:

$$\tilde q_j(t+1) = \big(1 - \beta_j \alpha_j q_j^*\big)\, \tilde q_j(t) - \beta_j \alpha\, \tilde q_j(t)\tilde q_i(t) - \beta_j \alpha\, q_j^*\, \tilde q_i(t) - \beta_j \alpha_j\, \tilde q_j^2(t) \qquad (36)$$

$$\tilde q_i(t+1) = \frac{-\alpha\, \tilde q_j(t+1)}{\alpha_i} \qquad (37)$$

Considering Eq. (37) at time t (t > 1) and substituting it into Eq. (36) results in:

$$\tilde q_j(t+1) = \left(1 - \beta_j \alpha_j q_j^* + \frac{\beta_j \alpha^2}{\alpha_i}\, q_j^*\right) \tilde q_j(t) + \left(\frac{\beta_j \alpha^2}{\alpha_i} - \beta_j \alpha_j\right) \tilde q_j^2(t) = \big(1 - K q_j^*\big)\, \tilde q_j(t) - K\, \tilde q_j^2(t) \qquad (38)$$
where

$$K = \beta_j \alpha_j - \frac{\beta_j \alpha^2}{\alpha_i} = \beta_j\, \frac{\alpha_i \alpha_j - \alpha^2}{\alpha_i} > 0 \qquad (39)$$
Consider the following Lyapunov function:

$$V\big(\tilde q_j(t)\big) = \tilde q_j^2(t) \qquad (40)$$

which is always positive except at the equilibrium point (where it becomes zero). Now we have:

$$V\big(\tilde q_j(t+1)\big) - V\big(\tilde q_j(t)\big) = \tilde q_j^2(t+1) - \tilde q_j^2(t) = \big(-K q_j^* \tilde q_j(t) - K \tilde q_j^2(t)\big)\big(2\tilde q_j(t) - K q_j^* \tilde q_j(t) - K \tilde q_j^2(t)\big) = -K\, \tilde q_j^2(t)\big(q_j^* + \tilde q_j(t)\big)\big(2 - K q_j^* - K \tilde q_j(t)\big) \qquad (41)$$

If

$$-q_j^* < \tilde q_j(t) < \frac{2}{K} - q_j^* \qquad (42)$$

or, equivalently,

$$0 < q_j(t) < \frac{2}{K} \qquad (43)$$

then:

$$V\big(\tilde q_j(t+1)\big) - V\big(\tilde q_j(t)\big) = -K\,\delta\, \tilde q_j^2(t) < 0 \qquad (44)$$
where δ is a positive number. Eq. (42) is the condition given in Theorem 2. Therefore, under that condition the Nash equilibrium is stable and the proof is completed.

Remark 1. The inequality \tilde q_j(t) > -q_j^* is equivalent to q_j(t) > 0, which is necessary for the feasibility of the solution. Moreover, in the condition \tilde q_j(t) < 2/K, by choosing β_j sufficiently small we can achieve the desired stability margin.

Remark 2. The initial conditions \tilde q_j(0) and \tilde q_i(0) should be selected so as to satisfy inequality (42) for t = 1. Therefore, we should have:

$$-q_j^* < \big(1 - \beta_j \alpha_j q_j^*\big)\, \tilde q_j(0) - \beta_j \alpha\, \tilde q_j(0)\tilde q_i(0) - \beta_j \alpha\, q_j^*\, \tilde q_i(0) - \beta_j \alpha_j\, \tilde q_j^2(0) < \frac{2}{K} - q_j^* \qquad (45)$$

Theorem 3. If $\lambda_j < \dfrac{\alpha}{\alpha_i}\lambda_i$ or $\lambda_j > \dfrac{\alpha \lambda_i}{\alpha_i} + \dfrac{2}{\beta_j}$, then the boundary equilibrium point $\left(0,\; \dfrac{\lambda_i}{\alpha_i}\right)$ will be unstable.
Proof. By replacing q_j and q_i by \tilde q_j and \tilde q_i + \lambda_i/\alpha_i, respectively, linearizing Eqs. (32) and (33) around the origin, and using Eq. (35), we have:

$$\tilde q_j(t+1) = \left(1 - \beta_j \lambda_j + \frac{\beta_j \alpha \lambda_i}{\alpha_i}\right) \tilde q_j(t) \qquad (46)$$

$$\tilde q_i(t+1) = \frac{-\alpha\, \tilde q_j(t+1)}{\alpha_i} \qquad (47)$$

Since Eq. (47) is driven by Eq. (46), it is sufficient to analyze Eq. (46) for the stability conditions. If λ_j < (α/α_i)λ_i or λ_j > (αλ_i/α_i) + (2/β_j), the eigenvalue of Eq. (46) lies outside the unit circle and the boundary equilibrium point becomes unstable.
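A hedged numerical sketch of this section: for illustrative duopoly parameters (assumptions, not the paper's case-study data) it computes the equilibrium points (34), the Lyapunov-based bound 2/K from Theorem 2 and Eq. (39), and iterates the map (32)–(33) to confirm convergence to the Nash equilibrium when the trajectory stays inside that bound.

```python
import numpy as np

# Illustrative duopoly parameters (assumptions, not the paper's data)
lam0, alpha = 40.0, 0.001
a_i, b_i, a_j, b_j, beta_j = 0.006, 0.2, 0.008, 0.1, 0.02

alpha_i, alpha_j = 2 * alpha + a_i, 2 * alpha + a_j      # Eq. (35)
lam_i, lam_j = lam0 - b_i, lam0 - b_j

den = alpha_i * alpha_j - alpha ** 2
qj_star = (alpha_i * lam_j - alpha * lam_i) / den        # interior (Nash) equilibrium, Eq. (34)
qi_star = (alpha_j * lam_i - alpha * lam_j) / den
K = beta_j * den / alpha_i                               # Eq. (39)
print("Nash equilibrium:", qj_star, qi_star, " stability bound 2/K:", 2.0 / K)

# Iterate the duopoly map (32)-(33) from an initial point inside the bound
qj, qi = 0.5 * qj_star, 0.5 * qi_star
for _ in range(500):
    qj = (1 + beta_j * lam_j) * qj - beta_j * alpha_j * qj ** 2 - beta_j * alpha * qi * qj  # Eq. (32)
    qi = (lam_i - alpha * qj) / alpha_i                                                      # Eq. (33)
print("after iteration:", qj, qi)   # should approach (qj_star, qi_star)
```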
Table 1
The cost function parameters for the three players.

Parameter    a1, a2    b1, b2    c1, c2    a3        b3     c3
Value        0.0057    0.2       500       0.0075    0.1    400
7. Case studies

In this section, a simplified California energy market consisting of only three power suppliers (including the proposed player and two opponents) is used for the market simulation case studies. The CA-ISO data from February 3 to March 2, 2000 (30 days) are used for our case studies [23]. The market price function was estimated using the load–price data pairs for hour 20, since the peak load and the highest market price usually occur at that hour. Using a linear regression model, the parameters λ_0 and α were obtained as 37.83 and 0.0002, respectively. The estimation error had a random distribution (assumed Gaussian) with zero mean and a variance equal to σ² = 3.68. In addition, to estimate the noise distributions of the opponents' decisions (supply quantities), the real bid data of two major players in the California market were used. The variances of the steady-state bids of the two GenCos (κ_j²) with the identification numbers (IDs) 467620 and 110778 were estimated as 3.54 and 40.41 [MW²], respectively. The cost function parameters of the players are given in Table 1. For a fair comparison between players 1 and 2, we assume that these players have identical cost function parameters. Also, β_2 and β_3 are set to 0.03 and 0.02, respectively. In the following, we first examine the convergence of the estimated rivals' strategies (supply quantities) as well as the convergence of the market to the Nash equilibrium. In addition, the market payoffs of the three players during the simulation periods are compared, and the proposed agent is compared with the naïve and adaptive expectation agents in terms of market accumulated payoffs.

7.1. The market simulations

Consider three market agents: the agent designed based on the proposed algorithm (player 1) and two other stochastic bounded rational agents (players 2 and 3) that use the decision rule of Eq. (10). To find the deterministic Nash equilibrium of the game (in the absence of v(t) and w_j(t)), the FOC equations for all three players should be solved simultaneously:

$$\frac{\partial \pi_1(t+1)}{\partial q_1(t+1)} = 0, \qquad \frac{\partial \pi_2(t+1)}{\partial q_2(t+1)} = 0, \qquad \frac{\partial \pi_3(t+1)}{\partial q_3(t+1)} = 0 \qquad (48)$$
The Nash equilibrium point of the game is obtained as (q_1, q_2, q_3)_{Nash} = (5830.8, 5830.8, 4480.7) [in MW]. After 20 iterations of the game, the offered supply quantities of the GenCos become (q_1, q_2, q_3)_{Game at t=20} = (5807.5, 5753.5, 4449.1) [in MW]. It is clear that the players' strategies have converged close to the deterministic Nash equilibrium of the game. Fig. 1 shows the supply quantities of all players. In order to compare the market performance of the players, we use the normalized accumulative profit (NAP), which is calculated as follows:

$$\bar\pi_i(t) = \frac{\sum_{k=1}^{t} \pi_i(k)}{t} \qquad (49)$$
Fig. 2 shows the NAP of the players. It can be seen that the proposed agent earns more profits in the market compared with the other two agents. Figs. 3 and 4 show the evolution of the estimated price function parameters. After about 30 iterations, λ̂_0 and α̂ have converged to 37.909 and 0.00022, respectively. As was mathematically shown in Section 5, the estimated parameters have converged to their real values in finite time. Figs. 5, 6 and 7 show the evolution of the estimated rivals' decision function parameters in Eq. (20). These parameters have converged close to their real values after a finite number of iterations. The estimation errors for the two rivals' decision functions are shown in Fig. 8, where e_j(t) = q_j(t) − q̂_j(t), j = 2, 3.
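The deterministic FOC system (48) is linear in the supply quantities, so the reported Nash equilibrium can be checked directly. The sketch below builds the linear system implied by Eq. (12) with the Table 1 cost parameters and the estimated price parameters (λ_0 = 37.83, α = 0.0002) and recovers approximately (5830.8, 5830.8, 4480.7) MW; it is a verification aid, not the authors' code.

```python
import numpy as np

lam0, alpha = 37.83, 0.0002                 # estimated price parameters (Section 7)
a = np.array([0.0057, 0.0057, 0.0075])      # Table 1
b = np.array([0.2, 0.2, 0.1])

# Deterministic FOC (12) for each player i:
#   lam0 - b_i - alpha * sum_l q_l - (alpha + a_i) * q_i = 0
# i.e. a linear system M q = r with M[i,i] = 2*alpha + a_i and M[i,j] = alpha (j != i).
M = alpha * np.ones((3, 3)) + np.diag(alpha + a)
r = lam0 - b
q_nash = np.linalg.solve(M, r)
print(q_nash)   # approx. [5830.8, 5830.8, 4480.7] MW
```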
Fig. 1. The players' supply quantities.
Fig. 2. The players' normalized accumulative profits.
Fig. 3. Evolution of the estimated λ0.
Fig. 4. Evolution of the estimated α.
Fig. 5. Evolution of the estimated Aj.
Fig. 6. Evolution of the estimated βj.
7.2. Comparison with the naïve player

In this part, we simulate a game with three agents. Agent 1 is a naïve player, and agents 2 and 3 use the stochastic bounded rational decision rule. The supply quantities of the players are shown in Fig. 9.
Fig. 7. Evolution of the estimated Bj.
Fig. 8. The rivals' decision function estimation errors.
Fig. 9. The players' supply quantities.
Agent 1 (the naïve player) has earned more profits (NAP) in the market, as shown in Fig. 10. The estimation errors of the rivals' decision functions by the naïve player are shown in Fig. 11. By comparing Figs. 2 and 8 with Figs. 10 and 11, one can see that the proposed player of this paper had better estimates of the rivals' decision functions and earned more profits in the market compared with the naïve player.
7.3. Comparison with the adaptive player

In this part, a game with three players is simulated and the results are analyzed. The only difference from the case study of Section 7.2 is that the first player uses adaptive expectation (Eq. (6)) for decision making in the market; the adjustment factor in Eq. (6) was set to 0.05, which provided better results in the simulations. The supply quantities of the players are displayed in Fig. 12, and agent 1 (with the adaptive expectation rule) has made more profits (NAP) in the market, as shown in Fig. 13. The estimation errors of the rivals' decision functions by the adaptive player are shown in Fig. 14. By comparing Figs. 2 and 8 with Figs. 13 and 14, one can see that the proposed player of this paper had better estimates of the rivals' decision functions and earned more profits in the market compared with the adaptive player. The summary of the comparative results for the mean absolute percentage error (MAPE) and the mean NAP (MNAP) is given in Table 2.
Fig. 10. The players' normalized accumulative profits.
Fig. 11. The rivals' decision function estimation errors.
Fig. 12. The players' supply quantities.
Fig. 13. The players' normalized accumulative profits.
Fig. 14. The rivals' decision function estimation errors.
Table 2
The summary of the MAPE and MNAP for the proposed, naïve and adaptive players.

                          Proposed player    Naïve player    Adaptive player
MAPE of player 2 [%]      4.0954             11.8379         169.2536
MAPE of player 3 [%]      4.1726             8.3099          31.8462
MNAP [$]                  9.4474e+004        7.5706e+004     8.8460e+004
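For completeness, a small hedged sketch of how the two metrics in Table 2 can be computed from a simulation log; the exact definitions (per-iteration percentage error of the rival predictions, and the normalized accumulative profit of Eq. (49) at the last iteration) are our reading of the text, not explicit formulas from the paper.

```python
import numpy as np

def mape(q_actual, q_predicted):
    """Mean absolute percentage error of a rival's predicted supply, in percent."""
    q_actual, q_predicted = np.asarray(q_actual), np.asarray(q_predicted)
    return 100.0 * np.mean(np.abs(q_actual - q_predicted) / np.abs(q_actual))

def nap(payoffs):
    """Normalized accumulative profit, Eq. (49), evaluated at the last iteration."""
    payoffs = np.asarray(payoffs)
    return payoffs.sum() / len(payoffs)
```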
8. Conclusions

In this paper, a method for combined estimation and decision making in stochastic Cournot games was developed. The market price function and the rivals' decision functions were assumed to be stochastic with white Gaussian noise. It was assumed that the only information available to each player (for implementing the proposed estimation–decision making algorithm) consisted of the previous market prices and the rivals' bids (supply quantities) published by the regional market operator (e.g., CA-ISO). Using the proposed algorithm, a player exploits the estimated parameters for optimal decision making in the market. The convergence of the LS estimators in finite time was mathematically shown under the assumption of white Gaussian noise. Furthermore, via several market simulations (using California energy market data), it was shown that the proposed algorithm (for simultaneous estimation and decision making) has advantages over some other algorithms proposed in the literature (e.g., naïve and adaptive agents) in terms of estimating the rivals' decision functions and earning accumulated market profits.

References

[1] H.N. Agiza, Explicit stability zones for Cournot games with 3 and 4 competitors, Chaos Solitons & Fractals 9 (1998) 1955–1966.
[2] H.N. Agiza, G.I. Bischi, M. Kopel, Multistability in a dynamic Cournot game with three oligopolists, Mathematics and Computers in Simulation 51 (1999) 63–90.
[3] H.N. Agiza, On the analysis of stability, bifurcations, chaos and chaos control of Kopel map, Chaos Solitons Fractals 10 (1999) 1909–1916.
[4] H.N. Agiza, A.S. Hegazi, A.A. Elsadany, The dynamics of Bowley's model with bounded rationality, Chaos Solitons Fractals 9 (2001) 1705–1717.
[5] H.N. Agiza, A.S. Hegazi, A.A. Elsadany, Complex dynamics and synchronization of a duopoly game with bounded rationality, Mathematics and Computers in Simulation 58 (2002) 133–146.
[6] H.N. Agiza, A.A. Elsadany, Chaotic dynamics in nonlinear duopoly game with heterogeneous players, Applied Mathematics and Computation 149 (2004) 843–860.
[7] E. Ahmed, H.N. Agiza, Dynamics of a Cournot game with n-competitors, Chaos Solitons Fractals 9 (1998) 1513–1517.
[8] E. Ahmed, H.N. Agiza, S.Z. Hassan, On modification of Puu's dynamical duopoly, Chaos Solitons Fractals 11 (2000) 1025–1028.
[9] G.I. Bischi, A. Naimzada, Global analysis of a duopoly game with bounded rationality, Advances in Dynamic Games and Applications 5 (1999) 361–385.
[10] G.I. Bischi, M. Kopel, Equilibrium selection in a nonlinear duopoly game with adaptive expectations, Journal of Economic Behavior & Organization 46 (2001) 73–100.
[11] G.I. Bischi, A.K. Naimzada, L. Sbragia, Oligopoly games with local monopolistic approximation, Journal of Economic Behavior & Organization 62 (2007) 371–388.
[12] G.I. Bischi, M. Gallegati, A. Naimzada, Symmetry-breaking bifurcations and representative firm in dynamic duopoly games, Annals of Operations Research 89 (1999) 253–272.
[13] A. Cournot, Researches into the Principles of the Theory of Wealth, Irwin Paperback Classics in Economics, Hachette, Paris, 1963.
[14] E.M. Elabbasy, H.N. Agiza, A.A. Elsadany, H. El-Metwally, The dynamics of triopoly game with heterogeneous players, International Journal of Nonlinear Science 3 (2007) 83–90.
[15] G.H. Golub, C.F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, 1987.
[16] H. Kamalinejad, V.J. Majd, H. Kebriaei, A. Rahimi-Kian, Cournot games with linear regression expectations in oligopolistic markets, Mathematics and Computers in Simulation 80 (2010) 1874–1885.
[17] M. Kopel, Simple and complex adjustment dynamics in Cournot duopoly models, Chaos Solitons Fractals 12 (1996) 2031–2048.
[18] Y. Liu, F.F. Wu, Prisoner dilemma: generator strategic bidding in electricity markets, IEEE Transactions on Automatic Control 52 (2007) 1143–1149.
[19] O. Nelles, Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models, Springer-Verlag, Berlin, Heidelberg, 2001.
[20] K. Okuguchi, Expectations and Stability in Oligopoly Models, Springer-Verlag, Berlin, 1976.
[21] K. Okuguchi, F. Szidarovsky, The Theory of Oligopoly with Multi-Product Firms, Springer-Verlag, Berlin, 1990.
[22] T. Puu, The chaotic duopolists revisited, Journal of Economic Behavior and Organization 37 (1998) 385–394.
[23] URL: www.ucei.berkeley.edu/.
[24] H. Weihong, Theory of adaptive adjustment, Discrete Dynamics in Nature and Society 5 (2001) 247–263.
[25] J. Zhang, Q. Da, Y. Wang, Analysis of nonlinear duopoly game with heterogeneous players, Economic Modelling 24 (2007) 138–148.