Control of Stochastic Evolutionary Games on Networks

5th IFAC Workshop on Distributed Estimation and Control in Networked Systems
September 10-11, 2015, Philadelphia, USA
IFAC-PapersOnLine 48-22 (2015) 076–081

James R. Riehl, Ming Cao

Faculty of Mathematics and Natural Sciences, ENTEG, University of Groningen, The Netherlands (e-mail: {j.r.riehl, m.cao}@rug.nl)

This work was supported in part by the European Research Council (ERC-StG-307207).

Abstract: We investigate the control of stochastic evolutionary games on networks, in which each edge represents a two-player repeated game between neighboring agents. The games occur simultaneously at each time step, after which the agents can update their strategies based on local payoff and strategy information, while a subset of agents can be assigned strategies and thus serve as control inputs. We seek here the smallest set of control agents that will guarantee convergence of the network to a desired strategy state. After deriving an exact solution that is too computationally complex to be practical on large networks, we present a hierarchical approximation algorithm, which we show computes the optimal results for special cases of complete and ring networks, while simulations show that it yields near-optimal results on trees and arbitrary networks in a wide range of cases, performing best on coordination games.

Keywords: networks, evolutionary games, optimization, control algorithms, stochastic systems

1. INTRODUCTION

The rapid growth in connectivity of our society and technology has led to ever-increasing complexity of the systems we rely upon, many of which can be characterized by agents making decisions across a network. These decisions are often based on local information and may be driven by competing objectives of the agents. Game theory has long been used to tackle these kinds of problems on a small scale and where the agents can be assumed to be perfectly rational, but for larger-scale and more complex systems, these assumptions often no longer hold, and the dynamics of the strategy choices are better modeled by evolutionary game theory (Sandholm (2010), Szabó and Fáth (2007)), where strategies propagate through the population based on the payoffs acquired by the agents. In biological terms, the mechanism for this propagation is an evolutionary survival-of-the-fittest process in which the fitness is directly related to the payoffs of the game. The agents need not be simple organisms, however; more complex systems such as human social networks and robotic networks can also fit well into an evolutionary game framework, but here the strategy propagation mechanism can be better thought of as a learning process or update rule. One of the well-established findings in the field is that evolutionary games can and often do lead to complex and undesired outcomes with respect to the population as a whole, such as in prisoner's dilemma games or the tragedy of the commons, where selfishness tends to prevail over cooperation (Liebrand et al. (1986), Ostrom (2008)). There is consequently a strong incentive to understand the possibilities for influencing evolutionary games in order to catalyze better collective outcomes in populations.

Although a relatively recently emerging topic, researchers have attacked similar problems from several different and interesting angles. For example, in the setting of infinite and well-mixed populations, Kanazawa et al. (2009) showed how taxation and subsidies can promote the emergence of desired strategies in a routing game under the replicator dynamics, and Sandholm (2002) introduced pricing schemes to promote efficient choices on roadway networks, as well as in more general economic contexts (Sandholm (2007)). For more complex networks under best-response dynamics, Balcan et al. (2014) showed that broadcasted information can guarantee convergence to an equilibrium that is within a given factor of the social optimum. Also, Cheng et al. (2015) presented a framework for studying the control of networked evolutionary games using large-scale logical dynamic networks to model transitions between all possible strategy states. They used this framework to derive equivalent conditions for reachability and consensus of strategies on a network given a particular set of control agents. However, the optimal control of evolutionary games on networks remains a challenging open problem and is the primary focus of this paper, which extends our earlier work in Riehl and Cao (2014a) and Riehl and Cao (2014b).

After defining a general evolutionary game framework in Section 2, we formulate a minimum agent control problem in Section 3. Although we derive an exact solution algorithm in Section 4, the high computational complexity makes it practical only for small networks. Faced with this obstacle, we proceed by designing an algorithm in Section 5 that uses a hierarchical approach to approximate the solution. We show in Section 6 that the resulting approximation is exact for classes of games on complete and ring networks. Moreover, in Section 7, simulations show that the results are quite accurate on trees as well as a broad class of geometric random networks.




We summarize our contributions and discuss important directions for future work in Section 8.

2. EVOLUTIONARY GAME FRAMEWORK

In this section we define the evolutionary game framework, which consists of a network, payoff matrix, and strategy update dynamics.

2.1 Network and single-game payoffs

Let G = (V, E) denote an undirected network consisting of an agent set V = {1, . . . , n} and an edge set E ⊆ V × V, where each edge represents a 2-player symmetric game between neighbors (we do not require the network to be connected). The agents choose strategies from a binary set S := {A, B} and receive payoffs upon completion of each game according to the matrix

            A  B
  M =  A  ( a  b )                                          (1)
       B  ( c  d )

We assume that at each time step, players use a single strategy against all opponents, and thus the games occur simultaneously. We denote the strategy state by x(t) = [x1(t), . . . , xn(t)]^T, where xi(t) ∈ S is the strategy of agent i at time t. Total payoffs for each agent are given by

  yi(t) = wi Σ_{j∈Ni} M_{xi(t),xj(t)},                       (2)

where Ni := {j ∈ V : {i, j} ∈ E} is the neighbor set of agent i, and the most common values for the weights wi are 1 for cumulative payoffs and 1/|Ni| for averaged payoffs. The total payoffs are collected into the vector y(t) = [y1(t), . . . , yn(t)]^T.

The combination of a network, payoff matrix, and update rule forms what we call a network game Γ := (G, M, f).
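To make (1)-(2) concrete, the following is a minimal Python sketch of the cumulative payoff computation, assuming a dictionary representation of M; the entries shown are the stag-hunt values a = 2, b = 0, c = 1, d = 1 used later in Section 7, and the function and variable names are illustrative only.

```python
# Minimal sketch of (1)-(2): cumulative payoffs on a small network game.
M = {('A', 'A'): 2, ('A', 'B'): 0,   # a, b  (example values from the stag hunt)
     ('B', 'A'): 1, ('B', 'B'): 1}   # c, d

def total_payoffs(neighbors, x, w=None):
    """neighbors[i] is the set N_i; x[i] is agent i's current strategy."""
    w = w or {i: 1.0 for i in neighbors}   # w_i = 1 gives cumulative payoffs
    return {i: w[i] * sum(M[(x[i], x[j])] for j in neighbors[i])
            for i in neighbors}

# Example: a 3-agent path 1-2-3 with agent 1 playing A.
neighbors = {1: {2}, 2: {1, 3}, 3: {2}}
print(total_payoffs(neighbors, {1: 'A', 2: 'B', 3: 'B'}))   # {1: 0.0, 2: 2.0, 3: 1.0}
```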

2.2 Strategy update dynamics

A fundamental concept behind evolutionary games is that better performing strategies are adopted more often, meaning that rather than rationally choosing best-response strategies, players imitate strategies in their neighborhood that result in higher payoffs. We capture this dynamic with a strategy update rule that is a function of the strategies and payoffs of neighboring agents:

  xi(t + 1) = f({xj(t), yj(t) : j ∈ Ni ∪ {i}}).              (3)

The only restrictions we make on the update rule are that it is payoff monotone, i.e. players only switch to strategies with which at least one agent in the neighborhood achieves a greater payoff (Szabó and Fáth (2007)), and persistent, meaning that if there exists a better performing strategy in an agent's neighborhood, then the agent will switch to that strategy with a probability that is lower bounded by some ε > 0.

2.3 Example: proportional imitation

One example of such dynamics is the proportional imitation rule, in which each agent chooses a neighbor randomly, and if this neighbor received a higher payoff in the previous round by using a different strategy, then the agent will switch with a probability proportional to the payoff difference. This is a widely studied model with some nice properties; in particular, the strategy distribution in well-mixed populations using proportional imitation is approximated by the replicator dynamics (Schlag (1998)). The proportional imitation rule can be expressed as follows:

  xi(t + 1) = xj(t) with probability p := [ (λ/|Ni|) (yj(t) − yi(t)) ]_0^1,    (4)

for each agent i ∈ V, where j ∈ Ni is a uniformly randomly chosen neighbor, λ > 0 is an arbitrary rate constant, and the notation [z]_0^1 indicates max(0, min(1, z)). This update rule is clearly payoff monotone, and it is also persistent with a lower bound ε that can be computed from λ, the maximum degree of the network, and the smallest positive payoff difference, which exists since there are only a finite number of possible agent payoffs.
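A minimal sketch of one application of rule (4), assuming λ = 1/4 (the value used in the simulations of Section 7) and that the agent keeps its current strategy when no switch occurs:

```python
import random

def proportional_imitation(i, neighbors, x, y, lam=0.25):
    """One application of rule (4) for agent i: pick a neighbor uniformly at
    random and imitate it with probability proportional to the payoff gap."""
    j = random.choice(sorted(neighbors[i]))                 # uniformly random neighbor
    p = max(0.0, min(1.0, lam / len(neighbors[i]) * (y[j] - y[i])))   # [z]_0^1
    return x[j] if random.random() < p else x[i]
```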

2.4 Dynamics with control agents

We add to the general framework a set of control agents L ⊆ V whose strategies can be externally manipulated, either through direct control or by adding neighbors with very high payoffs. Although the most general formulation would allow dynamic control sequences for these agents, we restrict the inputs here to fixed strategies and focus initially on this simpler case of the problem. This results in the following controlled dynamics,

  xi(t + 1) = fc({xj(t), yj(t) : j ∈ Ni ∪ {i}}, L),

which are an extension of (3) with a special case for the control agents:

  xi(t + 1) = { A,     i ∈ L
              { f(·),  otherwise.                            (5)

3. PROBLEM FORMULATION

Now that we have a general dynamic evolutionary game model with control inputs, we are interested in how one can influence the network through efficient use of these inputs in order to achieve some desired outcome of strategies. In this work, we focus on achieving uniform adoption of strategy A and pose what we call the Minimum Agent Consensus Control (MACC) problem.

Problem 1. (MACC). Given a network game Γ and initial strategy state x(0), find the smallest set of control agents L such that xi(t) → A for each agent i ∈ V. We say that xi(t) converges almost surely to X if limt→∞ p(xi(t) = X) = 1, and indicate this with the shorthand notation xi(t) → X.

Remark 1. Since we are concerned with optimality, we seek solutions corresponding to given initial strategy states. Although one could modify the proposed approach to compute a set of control agents that would work for any initial condition, due to the complexity of the underlying network, this would almost certainly lead to very conservative results.
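Combining the pieces above, one synchronous step of the controlled dynamics (5) can be sketched as follows (assuming the helper functions from the previous sketches; this is also the basic simulation primitive underlying the MACC problem):

```python
def controlled_step(neighbors, x, L, update_rule=proportional_imitation):
    """One synchronous step of (5): payoffs are computed from the current state,
    then every agent updates simultaneously, with agents in L pinned to 'A'."""
    y = total_payoffs(neighbors, x)
    return {i: 'A' if i in L else update_rule(i, neighbors, x, y)
            for i in neighbors}
```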


4. EXACT SOLUTION

One approach to solving the MACC problem exactly involves building very large state transition probability matrices for each possible set L of control agents, similar to those used in the logical dynamic network framework introduced in Cheng et al. (2015). Since the goal is to find the smallest control set for which the network converges to the desired strategy state, one could start with L := ∅ and increment to all one-agent sets, then two-agent sets, etc., until the desired convergence occurs. In a network of n agents, the total number of possible strategy states is 2^n, and the size of the state transition matrix is thus 2^n × 2^n. Let xi denote the strategy state corresponding to the index i represented in base 2, e.g. for n = 4, x9 = [1, 0, 0, 1]^T. The state transition probability matrix corresponding to a given control set L can then be expressed as

  [ΦL]_ij := p( x(t + 1) = xj | x(t) = xi ),

where the transition probabilities p(·) are computed from the strategy update rule fc. Once the matrix ΦL is constructed, one can check whether the desired state is accessible from the initial state using a breadth-first connectivity search, for example. Connectivity is not a sufficient condition, however, since there may be nonzero probabilities of ending up in undesired states. The set of all stationary states can be computed by finding the eigenvectors corresponding to eigenvalues of ΦL equal to one. One then needs to ensure that the state with xi = A for all i ∈ V is the only stationary state that is accessible from x(0). If not, the incremental search must proceed until this condition is satisfied.
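The construction can be sketched in a few lines of Python, under the assumptions that prob_A(i, x) is a user-supplied callback returning p(xi(t+1) = A | x(t) = x) under the controlled rule fc, that agent updates are conditionally independent given x(t) (as in proportional imitation), and that the stationarity test is simplified to a check for reachable absorbing states:

```python
from itertools import product
import numpy as np

def transition_matrix(n, prob_A):
    """Build Phi_L for an n-agent game.  States are 0/1 tuples with 1 = 'A';
    each row factorizes into a product of per-agent transition probabilities."""
    states = list(product((0, 1), repeat=n))
    Phi = np.zeros((2 ** n, 2 ** n))
    for a, xa in enumerate(states):
        pA = [prob_A(i, xa) for i in range(n)]
        for b, xb in enumerate(states):
            Phi[a, b] = np.prod([pA[i] if xb[i] else 1.0 - pA[i]
                                 for i in range(n)])
    return Phi

def converges_to_all_A(Phi, x0_index, n):
    """Check that the all-A state is reachable from x(0) and is the only
    reachable absorbing state (a simplification of the eigenvector test)."""
    reachable, frontier = {x0_index}, [x0_index]
    while frontier:
        a = frontier.pop()
        for b in np.nonzero(Phi[a] > 0)[0]:
            if int(b) not in reachable:
                reachable.add(int(b))
                frontier.append(int(b))
    all_A = 2 ** n - 1
    absorbing = {a for a in reachable if Phi[a, a] == 1.0}
    return all_A in reachable and absorbing == {all_A}
```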

The computational complexity of this approach is clearly very high, and it is therefore only practical for small networks. Whether or not the MACC problem belongs to a class such as NP-hard remains an open research problem, but due to the apparent poor scalability of computing exact solutions, we now seek an approximate solution.

5. APPROXIMATE SOLUTION

We present here a hierarchical approach for approximating the solution to the MACC problem. Inspired by our earlier results in Riehl and Cao (2014a), this algorithm works by ensuring the propagation of the desired strategy along the edges of a minimum spanning tree. Although the algorithm was initially designed for deterministic games on tree networks, we show here that a few small modifications render the approximation valid for stochastic evolutionary games on arbitrary networks.

Let T denote a minimum spanning tree of the original network G using edge weights equal to the reciprocal of the sum of the degrees of the adjacent nodes. Although the modified algorithm will work with any spanning tree, we choose these weights in order to retain the most influential edges. For convenience in describing the algorithm, we use the analogy of a family tree starting from a single common ancestor or root, and consisting of successive generations or levels of parents and respective children.

Before introducing the formal algorithm, we need to define a few quantities and agent sets. First we choose an arbitrary root agent r. Since any agent in the network can serve as the root, we will take r = 1 without loss of generality. Denoting the number of levels by n_ℓ, let g : V → {0, . . . , n_ℓ} be the mapping of agents to their level in the spanning tree rooted at r, which is equivalent to the number of edges in the shortest path from the agent to the root. Let V_ℓ := {i ∈ V : g(i) = ℓ} denote the set of all agents on each level ℓ ∈ {0, . . . , n_ℓ}, and let C_p := {c ∈ V_{g(p)+1} : {p, c} ∈ E} denote the set of children of a given agent p. Finally, we define a function ρ : V → V ∪ {0} mapping each agent to its parent, except for the root agent, which is mapped to zero.
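A short sketch of this construction, assuming the networkx library and the convention ρ(r) = 0 from the text (the function name is illustrative):

```python
import networkx as nx

def rooted_spanning_tree(G, r=1):
    """Weight each edge by 1/(deg(u)+deg(v)), take a minimum spanning tree T,
    and return T together with the level map g and parent map rho (0 at root)."""
    for u, v in G.edges():
        G[u][v]['weight'] = 1.0 / (G.degree[u] + G.degree[v])
    T = nx.minimum_spanning_tree(G, weight='weight')
    g = nx.single_source_shortest_path_length(T, r)        # level of each agent
    paths = nx.single_source_shortest_path(T, r)
    rho = {i: (p[-2] if len(p) > 1 else 0) for i, p in paths.items()}
    return T, g, rho
```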

In several steps of the approximation, we need to bound the achievable payoffs for agents in a given strategy configuration. Let X, Y ∈ S denote generic strategies assigned to agent i and neighbor j ∈ Ni ∪ {0}, respectively. We introduce the following notation to allow for a concise description of these quantities:

  ŷ_{i,j}^{X,Y} := wi ( Σ_{k∈Ni−j} max_{Z∈Ωk} M_{X,Z} + δj M_{X,Y} ),        (6)

  y̌_{i,j}^{X,Y} := wi ( Σ_{k∈Ni−j} min_{Z∈Ωk} M_{X,Z} + δj M_{X,Y} ),        (7)

where Ni−j := Ni − {j} denotes the set of all neighbors of agent i excluding j. The summation terms in the above expressions thus correspond to the payoffs to agent i resulting from games against all neighbors other than j, and the second term corresponds to the payoff resulting from the game against agent j. The case j = 0 allows us to compute the payoff limits without fixing the strategy of any neighbor, and therefore we define δj = 0 when j = 0 and δj = 1 otherwise.
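Both bounds can be computed together; the sketch below is illustrative (names are our own) and encodes the case j = 0 exactly as the δj = 0 convention above:

```python
def payoff_bounds(i, j, X, Y, neighbors, Omega, M, w=1.0):
    """Return (y_hat, y_check): the upper bound (6) and lower bound (7) on the
    payoff of agent i playing X when neighbor j plays Y; j = 0 encodes the
    delta_j = 0 case in which no neighbor strategy is fixed."""
    others = neighbors[i] - {j}
    fixed = M[(X, Y)] if j != 0 else 0.0
    y_hat = w * (sum(max(M[(X, Z)] for Z in Omega[k]) for k in others) + fixed)
    y_check = w * (sum(min(M[(X, Z)] for Z in Omega[k]) for k in others) + fixed)
    return y_hat, y_check
```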

The sets Ωi appearing in the payoff bounds (6)-(7) are constructed incrementally and denote the strategies that each agent might play before they switch to A permanently. Let L̃ ⊆ V denote a set of candidate control agents, which will be used in Algorithm 2. The sets Ωi are initialized as follows:

  Ωi := { {xi(0)} ∪ {A},   i ∈ L̃
        { {xi(0)},         otherwise.                        (8)

Algorithm 1 then computes all possible strategy propagations through the network.

   1  Δ := true
   2  while Δ do
   3    Δ := false
   4    foreach i ∈ V, j ∈ Ni do
   5      foreach X ∈ Ωi, Y ∈ Ωj do
   6        if ŷ_{i,j}^{X,Y} > y̌_{j,i}^{Y,X} then
   7          Ωj := Ωj ∪ {X}
   8          Δ := true
   9        end
  10      end
  11    end
  12  end
  Algorithm 1: Computes all possible strategy propagations from the initial strategy sets defined in (8).

Lastly, we denote by ΩX the set of all agents for which a given strategy X is reachable under a given control set.
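A runnable rendering of Algorithm 1, reusing the payoff_bounds sketch above, might look as follows; it flags a change only when Ωj actually grows, which is the reading under which the termination argument of Remark 2 applies:

```python
def propagate_strategies(neighbors, Omega, M):
    """Grow the reachable-strategy sets Omega[i] in place until no pair (i, j)
    admits a further propagation, then return them."""
    changed = True
    while changed:
        changed = False
        for i in neighbors:
            for j in neighbors[i]:
                for X in list(Omega[i]):
                    for Y in list(Omega[j]):
                        y_hat, _ = payoff_bounds(i, j, X, Y, neighbors, Omega, M)
                        _, y_check = payoff_bounds(j, i, Y, X, neighbors, Omega, M)
                        if y_hat > y_check and X not in Omega[j]:
                            Omega[j].add(X)
                            changed = True
    return Omega
```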




Lemma 1. Given a network game Γ, initial state x(0), and partial control set L̂, Algorithm 1 computes the sets Ωi of strategies for each i ∈ V that are reachable under some control set L ⊆ L̂ ∪ L̃.

Proof. Define Ω^t := (Ω^t_1, Ω^t_2, . . . , Ω^t_n), where Ω^t_i := {X : p(xi(t) = X) > 0} is the set of strategies reachable for agent i at time t. Initially, we have Ω^0_i = {xi(0)}. For any t > 0 and any strategy X ∈ S, we observe that p(xj(t) = X) > 0 implies that either X ∈ Ω^{t−1}_j, or there exist a neighbor i ∈ Nj and some Y ∈ Ω^{t−1}_j such that p(xi(t − 1) = X) > 0 and xi(t − 1) = X =⇒ yi(t − 1) > yj(t − 1). Using (6) and (7) we have

  ŷ_{i,j}^{X,Y} = wi ( Σ_{k∈Ni−j} max_{Z∈Ω^{t−1}_k} M_{X,Z} + M_{X,Y} ),

  y̌_{j,i}^{Y,X} = wj ( Σ_{k∈Nj−i} min_{Z∈Ω^{t−1}_k} M_{Y,Z} + M_{Y,X} ).

The previous condition is equivalent to ŷ_{i,j}^{X,Y} > y̌_{j,i}^{Y,X}, which can be checked independently for each pair of agents in any order since it only depends on the sets Ω^{t−1}_i at the previous time step. Algorithm 1 performs exactly this process, iterating until no further strategy propagations can occur, at which point the algorithm terminates. □

Remark 2. Algorithm 1 is guaranteed to terminate after at most n steps, since the initial total cardinality |Ω^0| := Σ_{i=1}^n |Ω^0_i| is equal to n, must increase by at least one with each iteration, and has a maximum value of 2n. In the worst case, each of these steps will require computing payoffs for all agents, which requires O(m) computations, where m is the number of edges. Therefore, Algorithm 1 has worst-case computational complexity O(mn).

We are now ready to present Algorithm 2, which approximates the solution to the MACC problem by working from the bottom of the tree towards the top, using the following procedure at each level. For each parent agent, add children who either start with or might switch to strategy B to the control set, in decreasing order of a switching threshold based on payoff bounds, until all remaining children will eventually switch to A once the agents in higher generations are playing A.

   1  L̂ := {r}
   2  L̃ := V − {r}
   3  ℓ := n_ℓ
   4  while ℓ > 0 do
   5    foreach p ∈ V_{ℓ−1} do
   6      L̃ := L̃ − C_p
   7      Compute the sets Ωi using Alg. 1
   8      c∗ := arg max_{c ∈ C_p ∩ Ω_B} ŷ_{c,p}^{B,A}
   9      while y̌_{p,ρ(p)}^{A,A} ≤ ŷ_{c∗,p}^{B,A} do
  10        L̂ := L̂ ∪ {c∗}
  11        c∗ := arg max_{c ∈ C_p ∩ Ω_B} ŷ_{c,p}^{B,A}
  12      end
  13    end
  14    ℓ := ℓ − 1
  15  end
  Algorithm 2: Computes an approximately minimal set of control agents L̂ needed to drive a network to uniformity in the desired strategy A.
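For a single parent p, one possible reading of steps 8-12 is sketched below (taking the sets Ωi from step 7 as given, restricting candidates to children for which B is reachable, and removing each newly controlled child from the candidate list so that the loop terminates); it is an illustration, not the authors' implementation:

```python
def children_to_control(p, children_p, parent_p, neighbors, Omega, M):
    """Steps 8-12 of Algorithm 2 for one parent p: control the child with the
    largest upper bound (6) on its payoff for B until the parent's lower bound
    (7) for A (with its own parent playing A) strictly exceeds that bound."""
    controlled = set()
    y_check = payoff_bounds(p, parent_p, 'A', 'A', neighbors, Omega, M)[1]
    cands = [c for c in children_p if 'B' in Omega[c]]
    while cands:
        c_star = max(cands,
                     key=lambda c: payoff_bounds(c, p, 'B', 'A',
                                                 neighbors, Omega, M)[0])
        if y_check > payoff_bounds(c_star, p, 'B', 'A', neighbors, Omega, M)[0]:
            break
        controlled.add(c_star)
        cands.remove(c_star)
    return controlled
```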

Remark 3. The worst-case computational complexity is dominated by the strategy propagation algorithm, which can be performed up to n times, yielding the conservative estimate of O(mn^2) for Algorithm 2. However, to increase efficiency, the payoff bounds computed in the strategy propagation algorithm can be reused both upon subsequent calls to Algorithm 1 and in the remaining steps of Algorithm 2.

The following theorem confirms that Algorithm 2 achieves the objective of computing an approximate solution, and thus an upper bound on the solution of the MACC problem, on arbitrary networks.

Theorem 1. Given a network game Γ and initial strategy state x(0), Algorithm 2 computes a sufficient set of control agents L̂ such that xi(t) → A for all agents i ∈ V.

Proof. It suffices to show that for all i ∈ V, if there exists τ ≥ 0 such that xi(τ) = B, then there exists τ′ > τ such that xi(t) = A for all t ≥ τ′. Working downwards from the root r of the spanning tree T, consider an agent i such that i ∈ Cr. Using the condition in step 9 with Lemma 1 to bound the payoffs, we know that either i ∈ L̂, which implies xi(t) = A for all t ≥ 0, or y̌_{r,0}^{A,A} > ŷ_{i,r}^{B,A}, which implies that yr(0) > yi(0), and thus by the persistence of the strategy update rule, xi(t) → A. Similarly, for an agent i on an arbitrary level ℓ, we have either i ∈ L̂ or y̌_{ρ(i),i}^{A,A} > ŷ_{i,ρ(i)}^{B,A}, which implies that if there exists a time τ′ such that xρ(i)(τ′) = A, then yρ(i)(τ′) > yi(τ′) and thus xi → A. We now have by induction that xi → A for all agents i ∈ V. □

6. ANALYSIS

An important test of any approximation method is to check the results on cases for which one can derive the exact solution analytically. In this section, we show that Algorithm 2 computes exact solutions for classes of games on two simple network structures: complete and ring networks.

6.1 Complete networks

One class of networks for which one can derive an analytical solution to the MACC problem is complete networks, where every agent is connected to every other agent. This is the network equivalent of a well-mixed population, the most widely studied case in classical evolutionary game theory. In a complete network, in order for all agents playing B to eventually switch to A, the initial payoff of agents playing A must be strictly greater than that of agents playing B. Denoting these payoffs by yA and yB and taking wi = 1, we can write this condition as follows:

  yA = a(nA − 1) + b nB > c nA + d(nB − 1) = yB,

where nA and nB denote the number of agents playing A and B. Since nA + nB = n, this is equivalent to

  a(nA − 1) + b(n − nA) > c nA + d(n − nA − 1).


Rearranging the terms yields

  nA (a + d − b − c) > n(d − b) + a − d.                      (9)

We see that the characteristics of the game change significantly depending on the term δ = a + d − b − c. If δ > 0, an agent switching from B to A increases the payoff to agents already playing A, and we say that the game is coordinating in strategy A. On the other hand, if δ < 0, an agent switching from B to A decreases the payoff to agents playing A, and we say the game is anti-coordinating in A. Finally, if δ = 0, then the number of agents playing A has no net effect on the payoff to agents playing A, and we say the game is neutrally-coordinating in A. Although the concept is related, this should not be confused with the class of games called coordination games, i.e. a > b and d > c, since a game can be coordinating in a single strategy without being a coordination game. Since the MACC problem involves causing a network to converge to A by controlling certain agents to play A, the coordinating case is most relevant and the one we focus on here.

Proposition 1. Given a game that is coordinating in strategy A on a complete network, the solution to the MACC problem is given by

  |L∗| = min( 1 + ⌊ (n(d − b) + a − d) / (a + d − b − c) ⌋,  n − nA0 ),      (10)

where nA0 denotes the number of (uncontrolled) agents initially playing A. The proof follows directly from (9) and the fact that M is coordinating in A.
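As a quick numerical check of (10), a game with the stag-hunt payoffs used later in Section 7 on a complete network of 10 agents that all start with strategy B requires 6 control agents (the function name below is illustrative):

```python
import math

def macc_complete(a, b, c, d, n, n_A0):
    """Size (10) of the minimum control set for a game coordinating in A
    (delta > 0) on a complete network with n_A0 initial A-players."""
    delta = a + d - b - c
    assert delta > 0, "(10) assumes the game is coordinating in A"
    return min(1 + math.floor((n * (d - b) + a - d) / delta), n - n_A0)

print(macc_complete(a=2, b=0, c=1, d=1, n=10, n_A0=0))   # -> 6
```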

We are now ready to state the following corollary to Theorem 1.

Corollary 1. Algorithm 2 computes the exact solution to the MACC problem for games on complete networks that are coordinating in strategy A.

Proof. Let T be the spanning tree generated by taking any agent initially playing A (if it exists; otherwise any agent) as the root and the rest as children of that root agent. This is a minimum spanning tree since all edges in the complete network have the same weight. Algorithm 2 adds agents to L̂ until y̌_{r,c}^{A,A} > ŷ_{c,r}^{B,A} for all c ∈ Cr ∩ ΩB. Assume that the initial number of agents nA0 is such that (9) does not hold; otherwise the solution is trivially zero. This means that switching from B to A will not occur and Ωi = {B} for all agents initially playing B. The fact that the root agent is always controlled means that Ωr = {A} and also xr(0) = A, and thus nA ≥ 1. Using (6) and (7) and simplifying yields

  y̌_{r,0}^{A,A} = Σ_{j∈Nr} M_{A,xj(0)} = a(nA − 1) + b(n − nA),

  ŷ_{c,r}^{B,A} = Σ_{j∈Nc−r} M_{B,xj(0)} + M_{B,A} = c nA + d(n − nA − 1),

which means that steps 8-12 proceed exactly until the condition of Proposition 1 is met, and the proof is completed. □

Remark 4. The solution to the MACC problem for games that are anti-coordinating in A is |L∗| = ⌈ (n(d − b) + a − d) / (a + d − b − c) ⌉. However, since the payoff to A-agents decreases as more agents switch to A, convergence of the entire network to A would require simultaneous switching of all uncontrolled B-agents to A. Although this will indeed happen with probability one as t → ∞, the convergence time could be extremely long. Algorithm 2 avoids this situation by controlling all agents initially playing B, a brute-force solution that results in faster convergence than the alternative.

6.2 Ring networks

We can also derive a solution for the ring network, e.g. E = {{1, 2}, {2, 3}, . . . , {n, 1}}. Since there are still too many possible behaviors of arbitrary games on ring networks for the scope of this paper, we focus on the interesting subclass of simple coordination games, where a, d > 0 and b = c = 0, and all agents are initially playing strategy B.

Proposition 2. Given a simple coordination game on a ring network starting with all agents playing B, the solution to the MACC problem is given by

  |L∗| = { 2,            a > d
         { 1 + ⌈n/2⌉,    otherwise.                           (11)

Proof. The possible payoffs to agents playing A and B are yA ∈ {0, a, 2a} and yB ∈ {0, d, 2d}. A single agent playing A surrounded by two B-agents receives a payoff of 0, and hence A will not propagate. If a > d, however, then two neighboring A-agents receive a payoff of a compared to d for their neighbors. By the persistence of the strategy update, the neighbors will then eventually switch to A. The strategy A will then continue to propagate around the ring, since all subsequent B-neighbors have payoffs of either d or 0. If a ≤ d, then two agents are not sufficient on any network of size greater than 3. The only way for strategy A to propagate in this case is if the neighboring B-agents receive payoffs of 0. The minimum configuration of control agents for which this is the case is a cluster of either two or three control agents for n odd or even, respectively, and alternating between no-control and control around the remainder of the ring network. The resulting number of control agents is 1 + ⌈n/2⌉. □

Corollary 2. Algorithm 2 computes the exact solution to the MACC problem for simple coordination games on ring networks in which all agents initially play strategy B.

Proof. Without loss of generality, take T as the spanning tree resulting from deleting the edge between agents 1 and n, with agent 1 as the root. The criterion for whether to control each agent is y̌_{i,i−1}^{A,A} > ŷ_{i−1,i−2}^{B,A}. At the bottom agent in the tree, no control is needed, since it connects to the controlled root agent and will switch to A once its parent agent switches, i.e. ŷ_{n,n−1}^{B,A} = 0. For the next higher agent, however, the child is not controlled and ŷ_{n,n−1}^{B,A} = d. Since y̌_{n−1,n−2}^{A,A} = a, if a < d, then this agent will not switch when its parent switches, and it must be controlled. These previous two steps repeat up the tree until we get back to the root agent, which will be part of a cluster of two or three control agents depending on whether n is odd or even. The result is exactly 1 + ⌈n/2⌉, and the proof is completed. □


7. SIMULATIONS

Although it is encouraging that the proposed algorithm produces the expected results for certain classes of network games that we can solve analytically, the purpose and strength of the proposed approach is that it can be applied to arbitrary games on networks of much larger size than could be computed exactly in a reasonable amount of time. In this section, we use simulations to test the accuracy of the approximation for large trees and small geometric random networks. In each case we start with all agents playing B and use the proportional imitation rule (4) (λ = 1/4) with three different payoff matrices corresponding to standard evolutionary games: stag hunt (SH, a coordination game), prisoner's dilemma (PD, a game that is neutrally-coordinating in A), and snow drift (SD, an anti-coordination game). We use the following payoff matrices for SH, PD, and SD, respectively:

        A  B           A   B          A  B
  A  ( 2  0 )    A  ( 4  −1 )   A  ( 3  1 )
  B  ( 1  1 )    B  ( 5   0 )   B  ( 5  0 )
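In dictionary form, compatible with the earlier sketches (the first entry of each key is the focal agent's strategy), these matrices read:

```python
SH = {('A', 'A'): 2, ('A', 'B'): 0, ('B', 'A'): 1, ('B', 'B'): 1}   # stag hunt
PD = {('A', 'A'): 4, ('A', 'B'): -1, ('B', 'A'): 5, ('B', 'B'): 0}  # prisoner's dilemma
SD = {('A', 'A'): 3, ('A', 'B'): 1, ('B', 'A'): 5, ('B', 'B'): 0}   # snow drift
```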

7.1 Large trees

In this study, we test Algorithm 2 on 100 randomly generated tree networks. Specifically, the trees have up to four levels and 1000 agents and are generated by a process in which each agent has between zero and nine children with uniform random probability. Since we cannot compute the exact solution for networks this large, we generate a lower bound Ľ using the algorithm from the deterministic tree setting in Riehl and Cao (2014a), which remains valid for the stochastic case since it only makes use of the payoff monotone property of the update rule.

Table 1 shows the mean fraction of agents controlled and the mean difference between the upper and lower bounds. As expected, the bounds are tightest for the coordination game, and incrementally less tight for the PD and SD games, even though PD requires more control agents.

Table 1. Tightness of bounds on large trees

  Game   mean(|L̂|/n)   mean((|L̂| − |Ľ|)/n)
  SH     0.112          0.055
  PD     0.447          0.103
  SD     0.208          0.155

7.2 Geometric random networks

Next, we test the algorithm on 100 small geometric random networks created by randomly placing ten agents in the unit square and connecting any two agents that lie closer than a distance of 0.4 from each other. We compare the approximate solutions of Algorithm 2 to exact solutions computed by the method described in Section 4. We use small networks because of the extreme computational complexity involved in computing the exact solutions.

Table 2. Small geometric random networks

  Game   mean(|L̂|)   mean(|L̂| − |L∗|)   E[t̂]   E[t∗]
  SH     4.27         1.01                1.63    5.52
  PD     8.01         1.25                1.27    49.4
  SD     8.31         4.34                1.32    289

Table 2 lists the mean size of the approximate minimum control sets L̂ as well as the difference between the true minimum L∗. We see that the approximation is within 1 and 1.25 agents of the optimal value for the SH and PD games, respectively. Much of the difference in the SD results is attributable to the anti-coordinating effect described in Remark 4, resulting in a trade-off between the size of the control set and the expected convergence times E[t̂] and E[t∗] listed in columns 3 and 4.

8. CONCLUSIONS

We have introduced a hierarchical algorithm to approximate the minimum number of control agents needed to drive an arbitrary network engaged in a stochastic evolutionary game to a desired strategy. After showing that the algorithm computes the exact solution for games on complete and ring networks, we demonstrated via simulation that the algorithm is quite accurate on trees and arbitrary networks, particularly for games that are coordinating in the desired strategy. This extension to the stochastic setting is a significant step forward from our previous work. In the future, we plan to extend the results to games with more than two strategies and to non-uniform desired strategy states. Further interesting research directions include dynamic control sequences and investigating payoff control as an alternative to direct strategy control.

REFERENCES

Balcan, M.F., Krehbiel, S., Piliouras, G., and Shin, J. (2014). Near-optimality in covering games by exposing global information. ACM Transactions on Economics and Computation, 2(4), 13.
Cheng, D., He, F., Qi, H., and Xu, T. (2015). Modeling, analysis and control of networked evolutionary games. IEEE Transactions on Automatic Control, PP(99), 1–1.
Kanazawa, T., Misaka, T., Ushio, T., and Fukumoto, Y. (2009). A control method of selfish routing based on replicator dynamics with capitation tax and subsidy. In IEEE CCA & ISIC, 249–254.
Liebrand, W.B., Wilke, H.A., Vogel, R., and Wolters, F.J. (1986). Value orientation and conformity: a study using three types of social dilemma games. Journal of Conflict Resolution, 30(1), 77–97.
Ostrom, E. (2008). Tragedy of the commons. The New Palgrave Dictionary of Economics, 3573–3576.
Riehl, J. and Cao, M. (2014a). Minimal-agent control of evolutionary games on tree networks. In 21st Int. Symp. on Mathematical Theory of Networks and Systems.
Riehl, J.R. and Cao, M. (2014b). Towards control of evolutionary games on networks. In The 53rd IEEE Conference on Decision and Control.
Sandholm, W.H. (2002). Evolutionary implementation and congestion pricing. The Review of Economic Studies, 69(3), 667–689.
Sandholm, W.H. (2007). Pigouvian pricing and stochastic evolutionary implementation. Journal of Economic Theory, 132(1), 367–382.
Sandholm, W.H. (2010). Population Games and Evolutionary Dynamics. MIT Press.
Schlag, K.H. (1998). Why imitate, and if so, how? A boundedly rational approach to multi-armed bandits. Journal of Economic Theory, 78(1), 130–156.
Szabó, G. and Fáth, G. (2007). Evolutionary games on graphs. Physics Reports, 446(4), 97–216.
