European Journal of Operational Research 279 (2019) 486–501
Decision Support
Environmental policy regulation and corporate compliance in evolutionary game models with well-mixed and structured populations
André Barreira da Silva Rocha∗, Gabriel Meyer Salomão
Department of Industrial Engineering, Pontifical Catholic University of Rio de Janeiro, Rua Marquês de São Vicente 225, Gávea, Rio de Janeiro, RJ CEP22451-900, Brazil
Article info
Article history: Received 6 April 2018; Accepted 29 May 2019; Available online 6 June 2019
Keywords: Game theory; Agent-based model; Cellular automata; Structured complex systems; OR in Environment and climate change
Abstract
We use an evolutionary game model to study the interplay between corporate environmental compliance and enforcement promoted by the policy maker in a country facing a pollution trap, i.e., a scenario in which the vast majority of firms do not internalize their negative pollution externality and auditors do not inspect firms. The game conflict is due to the trade-off in which firms are better off when they pollute and are not inspected, while social welfare is maximized when auditors do not need to inspect socially responsible corporations that account for pollution in their production decisions regarding technology used and emission level. In a well-mixed two-population game model, there is no long-run equilibrium and the shares of polluters and shirking auditors keep oscillating over time. In contrast, when firms and auditors are allocated in a spatial network, the game displays rich dynamics depending on the inspection cost. While the oscillatory behaviour is still possible, there is a set of parameters for which a robust long-run equilibrium is achieved, with the country leaving the pollution trap. On the other hand, an excessively high inspection cost leads to an ineffective auditing process that drives the few compliant firms out of the country. © 2019 Elsevier B.V. All rights reserved.
1. Introduction
Environmental issues such as the level of emissions and global warming have become increasingly important during the past decades. On the one hand, there is a growing commitment among national governments to global agreements such as the Kyoto Protocol and the Paris Agreement. On the other hand, companies make significant efforts to address social responsibility issues raised by stakeholders such as consumers and employees. The latter is reflected in the accounting practices of large companies, in which environmental and social accounting are frequently reported within the companies’ annual reports. Other important corporate strategies include obtaining international certification stating compliance with social and environmental standards such as ISO 14001. The impact of social reputation and environmental compliance on profits is nowadays taken very seriously, particularly by large companies. Once a firm is inspected and declared non-compliant,
Corresponding author. E-mail address: [email protected] (A.B. da Silva Rocha).
https://doi.org/10.1016/j.ejor.2019.05.040 0377-2217/© 2019 Elsevier B.V. All rights reserved.
there might be an impact on revenues, market share and costs: fines might be imposed by auditors and consumers can enforce product boycotts such as the one Royal Dutch Shell suffered due to its intention to dispose of its Brent Spar offshore storage buoy in the sea (The Economist, 1995; Klein, Smith, & John, 2002). Despite such a risk, social or environmental commitment and profits cannot be simultaneously maximized unless compliance investment contributes to profit maximization (Husted & Salazar, 2006). Consequently, environmental compliance has to be enforced by policy makers through auditing processes. But enforcement is also costly due to inspection costs, noisy monitoring caused by the presence of multiple firms discharging pollutants in the same neighbourhood, or even uninspectability (Heyes, 1998). The latter is particularly true in the US, where the 4th Amendment obliges auditors to run the first inspections from outside the firms’ fence. Corruption, privately funded political campaigns, bribery or the lack of auditors’ commitment, among other factors, may aggravate the situation. Moreover, sanctions might depend on several factors such as the degree of non-compliance, i.e., the difference between the observed and the accepted pollution level, as well as on the effort
demonstrated by the firm to minimize the negative social externality due to pollution. Firms showing an investment plan in clean production are rarely punished if found to be non-compliant with regulations (Arguedas & Hamoudi, 2004). As a result, firms tend to comply with environmental regulation only if the investment to do so is offset by the expected cost of being inspected and caught. The latter can be increased through higher fines imposed on non-compliant firms or by decreasing the inspection costs, leading to more frequent inspections.
In light of these facts, we present an evolutionary game to study the interplay between polluting firms and auditors. We assume a large number of firms that might or might not take into account in their private profits the negative externality due to pollution that is generated when they produce. On the side of the policy maker, there is a large number of auditors that might or might not inspect those firms. We closely follow Arguedas and Hamoudi (2004) with regard to the profit functions of the firms, which depend positively on the emission level and on dirtier (i.e., cheaper) technologies. All firms discharge some level of pollution, thus generating a negative social externality. Fines increase with the dirtiness of the technology a company chooses to use and with the level of emission. A firm’s decision to be environmentally compliant, taking into account the pollution it generates, is solely based on its corporate culture. The relationship between corporate culture and environmental performance is very significant, as tested by Sugita and Takahashi (2015). The latter found that an adhocracy culture (entrepreneurial atmosphere, appreciating individual initiative and freedom) has a positive impact on climate change mitigation and other aspects of environmental management. In contrast, a corporate culture of excessive hierarchy (too many formal rules and policies) has a negative relationship with environmental management.
Given that shifts in corporate culture tend to display inertia and take some time to happen, an evolutionary game, which is dynamic, seems to be an appropriate framework to model such long-term environmental culture change in corporations. We assume an initial condition equivalent to a social context of pollution trap, i.e., where almost none of the firms are compliant and almost all of the auditors display a lenient culture of not inspecting the firms. Our definition of pollution trap is in line with that in Mao and He (2018), in which a country facing a pollution trap is likely to sustain pollution emissions over time. In the context of the Environmental Kuznets Curve, which posits an inverted-U relationship between pollution and economic development (see Dasgupta, Laplante, Wang, & Wheeler, 2002), a country is thus unable to escape from the pollution haven behaviour even when its economy shifts from a low-income to a high-income level. At early stages of industrialization, the pollution level grows fast in a country due to weak environmental regulation, low financial resources to pay for abatement and people’s lack of interest in the environment. With income per capita growing, the pollution level reaches a maximum and then starts to fall due to people paying more attention to environmental issues, stricter regulations and compliant firms. Production based on dirty technologies would then be shifted to pollution havens, i.e., low-income countries. When such a relationship between pollution and income per capita fails, even with environmental regulation put in place, a country would face a pollution trap as well.
The remainder of the paper is organized as follows. In Section 2, we review the literature and discuss some important technical features employed in the model setup. Alternative methods as well as the main contributions of our model are also discussed. In Section 3, we present the model.
In Section 4, we solve the evolutionary game assuming well-mixed populations. In Section 5, the game is extended to the case of structured populations. Results are compared and discussed. Section 6 concludes.
2. Methods employed in the model
2.1. Game theory framework
There is a wide literature on environmental compliance, particularly focusing on the conflict of interests between policy makers and corporations. One branch of the literature addressing this problem is based on the principal-agent framework. Arguedas and Hamoudi (2004) develop a principal-agent model in which sanctions depend on both the environmental technology employed by the firm and the degree of non-compliance. The regulator chooses the pollution standard and the probability of inspection while the firm decides on the emission level and the environmental technology to be adopted. A cleaner technology leads to smaller fines when those are applied. Principal-agent models are a type of sequential game which is generally used when there is asymmetric information between one firm (the agent) and a regulator (the principal). This information gap might be due to the firm’s private information, as in Arguedas and Hamoudi (2004) and in Malik (2007). Private information might encompass, for example, the firm’s level of emission, the technology used or the pollution abatement effort. Principal-agent models generally assume a one-shot relationship, in the sense that once the firm and the regulator have rationally decided the optimal level of their decision variables in order to maximize their individual payoff functions, those decisions cannot be reviewed, agents get their payoffs and the game ends. The timing of those decisions plays an important role in the sequential game. Arguedas and Hamoudi (2004) study two different decision timings in a sequential game that is played only once: (i) a three-stage game where the firm first chooses the production technology, followed by the regulator selecting both the pollution standard and probability of inspection.
Finally, the firm decides the emission level; (ii) a two-stage game where the regulator plays in the first stage and the firm then chooses both the technology and emission level. Alternatively, an evolutionary game model might be used in a context when the policy maker, either represented by one regulating agency or several auditors in the field, has to interact with a large number of firms and when the conflict of interest between the regulator and the firms is dynamic by nature, i.e., differently from a one-shot relationship such as those generally described by a principal-agent model, agents interact over time indefinitely. Zhu and Dou (2007) propose an evolutionary game model to study the interplay between enterprises and the government during enforcement of regulation. In the latter, companies face a cost of compliance, monitoring is costly to the government and public revenues come from fines paid by non-compliant firms. In a different context, that of auditing processes dealing with financial reports, Anastasopoulos and Anastasopoulos (2012) propose an evolutionary game in which auditors might conduct either a basic or an extended (and more costly) set of audit procedures. Only the latter are able to distinguish between companies that commit or not intentional fraud in their reports. Anastasopoulos and Anastasopoulos (2012) point out that evolutionary game models have advantages when compared to their classic game theory counterparts because evolutionary games are capable of explaining how players achieve the equilibrium in the long run. Evolutionary game models are also generally used whenever agents (firms or auditors) do not compete among each other but their strategic decisions are influenced by the performance of their peers, i.e., agents are able to review their actions over time by comparing their performance with that of a randomly chosen neighbour in spatial games. 
Moreover, evolutionary games allow for bounded rationality, which is closer to real world corporate decisions. As Arthur (1994) points out, the type of rationality assumed in economics demands much of human behaviour and
breaks down under complicated problems due to two main reasons: (i) beyond a certain level of complexity human logical capacity ceases to cope; (ii) in interactive situations of complication, agents cannot rely upon the other agents to behave rationally, thus being forced to guess their behaviour. In such situations, psychologists tend to agree that humans think inductively with bounded rationality, simplifying the problem (Bower & Hilgard, 1981; Holland, Holyoak, Nisbett, & Thagard, 1986; Rumelhart, 1980; Schank & Abelson, 1977). Over time, feedback from the competitive environment comes in and individuals replace their strategies whenever better adapted actions display more successful results. Such learning system is evolutionary by default in the sense that only the best adapted strategies survive and reproduce in the long run. In our model, we employ an evolutionary game framework in order to capture: (i) the interplay between firms and auditors, both belonging to large populations of agents whose actions might affect other agents’ decisions. A stage-game payoff matrix provides the profits and welfare utilities that can be obtained from any possible pairwise meeting between a firm and an auditor at any time step. 
We do not model meetings between auditors or meetings between firms because agents do not compete against each other within their own population; (ii) the evolutionary dynamics of both populations over time, showing if and how the shares of agents adopting each strategy evolve towards an asymptotically stable equilibrium; (iii) the behaviour of boundedly rational players, assuming: (a) agents do not have memory, i.e., only the payoffs obtained at a given time step are taken into account in the decision making process regarding strategy revision; (b) agents have limited access to information in a geographical fashion whenever the game assumes structured populations; (c) although agents are able to optimize their profit functions leading to the setup of the game payoff matrix, i.e., they can rationally decide the optimal level of their decision variables in order to maximize their individual payoff function given the strategy profile they face at a given pairwise meeting, the choice of strategy that they adopt is naive, either purely based on corporate culture in the well-mixed game and at the initial time step in the structured population game or revised over time through peer-influence, simply copying the strategy of other agents that are more successful locally in the structured population game over time. 2.2. Boundedly rational agents In this subsection, we further discuss the assumption of bounded rationality addressed at the end of the previous subsection given its central role in the choice to use an evolutionary game model to study environmental compliance. We assume boundedly rational agents such that each firm’s decision to select the optimal technology and emission level is carried out through maximizing profits but the private profit maximizing function that is chosen by managers is naive and purely based on their corporate culture. 
In other words, when firms decide which technology to use, environmentally compliant firms internalize the negative externality due to pollution in their private profit function while non-compliant firms do not, i.e., the choice of environmental policy is naive in the sense that firms do not necessarily play a best response, either to be compliant or not, to the strategy played by the auditor visiting the firm. But once they have naively chosen their strategy, they rationally select the value of the decision parameters that maximize the profit function associated with their chosen strategy. Such an approach is similar to that used in Xiao and Yu (2006) supply chain model, where agents rationally maximize their utilities (either revenue or profit functions) but they are naive regarding which preference function they chose to follow. It is also in line with Friedman and Mezzetti (2002), in which they present a boundedly rational model with each player holding naive
beliefs regarding other players strategy choice, but some rational behaviour is still retained through the fact that players are able to maximize their profits functions. More specifically, Friedman and Mezzetti (2002) present a dynamic conjectural variations model of an n-firm oligopoly with differentiated products market. Each firm holds a naive belief according to which any change in its current price will induce other firms to make well-defined changes in their prices in the following period. Based on such naive belief, which completely ignores how firms react to the other n − 2 changes of price that are carried out by the other competitors, firms rationally select a strategy composed of a set of prices that maximize their discounted cash flow. In such model, players are very sophisticated when making their own decisions but they display a low level of sophistication regarding the decision of their rivals. Friedman and Mezzetti (2002) justify such behaviour stating that players display some awareness that there are strategic interactions among firms but might view the sophistication required to correctly model such interactions too complicated to be worthwhile, while at the same time their simpler naive belief might be understood as a good approximation. As discussed in Crawford (2013), bounded rationality implies relaxing the assumption that agents optimize, i.e., in models with bounded rationality, individuals act as if to optimize something but they seek to improve on neoclassical models by relaxing one or more customary neoclassical assumptions. In game theory, the neoclassical assumption that agents play a Nash equilibrium from the start of the game is relaxed and might be replaced by an assumption according to which players follow simple adaptive rules that may converge to equilibrium. 
Such an approach is in line with our model in the sense that both choices of technology and emission level are done in a rational fashion, leading to a stage-game payoff matrix governing the strategic interaction between firms and auditors over time, while at the same time, in any pairwise meeting between firm and auditor, the strategy profile played is not necessarily the Nash equilibrium profile. In the well-mixed population game (Section 4), such kind of bounded rationality is implicit in the replicator dynamics governing the evolution of firms and auditors over time in the sense that firms are not aware about the profits made by other firms holding a different strategy. Firms simply naively pick a strategy at the initial condition based on their corporate culture and rationally maximize their profit at any given time step conditional on the strategy adopted by the auditor they meet. Strategies are never reviewed and the best performing strategy obtains more profit (fitness), which is reflected in that strategy having a higher growth rate over time in the replicator equations when compared to poor performing strategies, i.e., a higher reinvestment rate of the obtained profit in new firms adopting the same corporate culture as the parent firm. Actually, strategies that perform worse grow at a lower rate up to becoming (relatively) extinct in their population, i.e., the share of individuals adopting them becomes zero. We emphasize that such natural selection mechanism that is implicit in replicator dynamics rationally selects the best performing strategy in the long run. 
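The selection mechanism described above can be illustrated with a minimal sketch of two-population replicator dynamics, integrated with a simple Euler scheme. The payoff values, the strategy labels and the names `replicator_step`, `simulate`, `A_FIRM` and `B_AUDITOR` are hypothetical: they are chosen only to reproduce the cyclic incentives described in the text (polluting pays when auditors shirk, inspecting pays when firms pollute) and are not the payoff matrix of the model.

```python
def replicator_step(x, y, A, B, dt):
    """One Euler step of two-population replicator dynamics.

    x: share of firms playing strategy 0 (here: pollute).
    y: share of auditors playing strategy 0 (here: shirk).
    A[i][j], B[i][j]: firm and auditor payoffs when the firm plays i
    and the auditor plays j.
    """
    # Expected payoff of each firm strategy against the auditor mix.
    f0 = A[0][0] * y + A[0][1] * (1.0 - y)
    f1 = A[1][0] * y + A[1][1] * (1.0 - y)
    # Expected payoff of each auditor strategy against the firm mix.
    g0 = B[0][0] * x + B[1][0] * (1.0 - x)
    g1 = B[0][1] * x + B[1][1] * (1.0 - x)
    # A strategy's share grows when it earns more than the alternative.
    x_next = x + dt * x * (1.0 - x) * (f0 - f1)
    y_next = y + dt * y * (1.0 - y) * (g0 - g1)
    return x_next, y_next


def simulate(x0, y0, A, B, dt=0.002, steps=100_000):
    """Return the list of (x, y) shares visited from the initial condition."""
    path = [(x0, y0)]
    x, y = x0, y0
    for _ in range(steps):
        x, y = replicator_step(x, y, A, B, dt)
        path.append((x, y))
    return path


# Hypothetical payoffs (assumption, NOT the paper's payoff matrix).
A_FIRM = [[3.0, 0.0],     # firm pollutes vs auditor (shirks, inspects)
          [2.0, 2.0]]     # firm complies vs auditor (shirks, inspects)
B_AUDITOR = [[0.0, 1.0],  # firm pollutes: auditor (shirks, inspects)
             [2.0, 1.0]]  # firm complies: auditor (shirks, inspects)

# Pollution trap: 95% polluters and 95% shirking auditors at the start.
path = simulate(0.95, 0.95, A_FIRM, B_AUDITOR)
```

Under these assumed payoffs neither strategy is a best response to every opponent mix, so the shares cycle around an interior point rather than settling, which is the qualitative pattern the well-mixed game is said to display.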
Although firms and auditors do not compete directly in our model, i.e., the firms’ private profit does not depend on the other firms in their own population, firms display awareness of their strategic interaction with auditors as in Friedman and Mezzetti (2002) given that they rationally optimize their emission level conditional on the type of auditor they meet, independently of playing or not the best response regarding the choice of environmental policy adopted. In the spatial game (Section 5), while agents keep playing in a bounded rational fashion the same stage-game following the payoff matrix of the well-mixed population game, the spatial game further takes into account the strategic interaction through the fact that agents might review their strategies over time whenever they compare
their payoff with that of a neighbour and the latter performs better. 2.3. Well-mixed versus structured population models We start with a base model where there is no spatial structure as in Zhu and Dou (2007) and Anastasopoulos and Anastasopoulos (2012), i.e., each auditor might inspect any firm in the population. Then, we introduce one of the main contributions of our model, which is to assume that firms and auditors are spatially allocated at different geographical locations. The latter seems more realistic given that a firm tends to adopt similar practices as those of its local neighbours. Thus, a firm located in a region with several polluters would have incentives to do the same. On the side of auditors, it would be less likely that a particular auditor was selected to inspect a firm located far away due to cost constraints. The greater realism of spatial systems over well-mixed ones is emphasized in Pacheco, Vasconcelos, and Santos (2014), in which the authors investigate group size and risk perception on the chances that countries would coordinate to save the planet’s climate from the damages due to pollution and greenhouse gas emissions. As they point out, when dealing with real populations, one should be careful when setting up policies based on results obtained using infinite population approximations such as those from the replicator equation in well-mixed systems. Instead, they model the pollution dilemma as a spatial Public Goods Game with the population of agents divided into groups. Each individual has an endowment and cooperators wishing to curb emissions contribute a fraction of such endowment while defectors do not. If the overall contribution within a group reaches a given threshold, a successful agreement is reached and participants keep whatever endowment they have. Otherwise, there is a risk given by an exogenous probability of collective environmental disaster in which every individual in the group loses his endowment. 
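The threshold game summarized above can be sketched as a single payoff function. This is only an illustration under stated assumptions: the threshold is expressed as a minimum number of contributors in the group, payoffs are expected values, and the function and parameter names are hypothetical rather than taken from Pacheco et al. (2014).

```python
def collective_risk_payoffs(n_cooperators, threshold, endowment,
                            contribution_fraction, risk):
    """Expected payoffs (cooperator, defector) for one group.

    Assumption: the group succeeds when at least `threshold` members
    contribute; otherwise everyone loses what is left with probability `risk`.
    """
    # Cooperators give up a fraction of their endowment; defectors keep it all.
    coop_keep = endowment * (1.0 - contribution_fraction)
    defect_keep = endowment
    if n_cooperators >= threshold:
        # Successful agreement: participants keep whatever they have.
        return coop_keep, defect_keep
    # Failed agreement: the remaining endowment survives with prob. 1 - risk.
    survive = 1.0 - risk
    return survive * coop_keep, survive * defect_keep
```

In a successful group a defector always ends up with more than a cooperator, which is the source of the dilemma; only the risk of collective loss gives contributing any appeal.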
Their results show that global institutions, such as the United Nations, do not increase the probability of overcoming the climate change problem, i.e., results “may sound provocative for today’s leaders who are trying to solve the (climate) problem in a centralized way, but...stable interactions with locally organized groups are those conditions in which reciprocity is augmented, hence giving chance for a higher cooperation level”, as discussed in Szolnoki (2014). In our model, the framework adopting two well-mixed populations, in which any firm can interact with any auditor, recovers the same oscillatory pattern as in Anastasopoulos and Anastasopoulos (2012), without an evolutionary equilibrium in the long run. In contrast, the extension assuming structured populations allows for quite rich dynamics depending on the monitoring cost, with the possibility of different robust equilibria as well as the possibility of the oscillatory pattern observed in the well-mixed population case. Moreover, the structured population model is not only more in line with the real world, but results also show that corporate environmental compliance is more likely to be adopted in the structured model than in a model where spatial structure is absent. The latter tends to underestimate compliance, which might lead to poor public policy recommendations regarding compliance enforcement. Hence, results based on mean-field approximations, such as our well-mixed version of the game or those in Anastasopoulos and Anastasopoulos (2012), should be treated with care, as pointed out in Pacheco et al. (2014). In fact, this damping of oscillations when a spatial topology is applied to well-mixed population systems is not only a characteristic of our model: it fits a wider context, such as the case of systems with cyclic dominance that can be found, for example, in the rock-scissors-paper (RSP) game. In the latter, rock crushes scissors, scissors cut paper and paper wraps rock to
close the loop of dominance. In Szabó, Szolnoki, and Izsák (2004), a one population RSP-like spatial game is studied on a square lattice with periodic boundary conditions. In the latter, the system develops into a stationary state where all three strategies coexist with the same average shares (1/3) and they alternate cyclically at each site such that those local oscillations are not able to synchronize into a global oscillation. Then, small-world properties are taken into account separately by means of introducing two types of random structures and results are contrasted: (i) the structure is quenched such that a Q-portion of random links are substituted for the nearest-neighbour bonds (square lattice and random regular graph correspond respectively to Q = 0 and Q → 1); (ii) the effect of annealed randomness in the square lattice is taken into account such that a bond between nearest-neighbour sites is replaced by a random link with probability P (square lattice and well-mixed population correspond respectively to P = 0 and P → 1). Setting P = 0 and increasing Q, above a threshold value the system undergoes a transition from local into synchronized global oscillation (limit cycle). On the other hand, setting Q = 0 and increasing P, the system exhibits a similar behaviour but, above a second threshold the oscillation increases and terminates at one of the homogeneous absorbing states, thus ceasing oscillation. In Szolnoki and Szabó (2004), they combine the two types of random structures in the same RSP game and analyse what happens in different spatial structures. Overall results are similar to the above description: in the P − Q plane, for the parameter region corresponding to small values of both Q and P, the system displays local oscillations. On the other hand, for the parameter region with both Q and P close to 1, evolution ends in an absorbing state containing only one strategy and oscillation is replaced by a steady-state behaviour. 
Finally, there is a parameter region in between the former two regions in which the system displays global oscillation via a Hopf bifurcation. A more detailed discussion can be seen in Szolnoki et al. (2014), where they also study the effect of mobility in spatial structures, in which there might either be site exchange between two strategies or between an individual player and an empty site. Mobility can either promote or jeopardize biodiversity in games of cyclic dominance. Another important aspect regarding systems with cyclic dominance that cannot be ignored is related to the fact that, despite small-world networks and mobility abounding in nature, global oscillations are rarely observed in biological systems and biodiversity survives. While global oscillations emerge for both homogeneous and heterogeneous strategy-specific invasion rates as soon as the fraction of quenched links exceeds a threshold in the network, Szolnoki and Perc (2016) show that site-specific heterogeneous invasion rates, i.e., geographic heterogeneity regarding the probability of a player successfully invading a neighbouring site in a square lattice, are always effective in suppressing the emergence of global oscillations in the presence of quenched or annealed randomness in the interaction network or in the presence of mobility of players. Site-specific heterogeneous invasion rates differ from strategy-specific invasion rates because the former influence the success of microscopic dynamics locally, while in the latter different strategies have different probabilities of invading a neighbouring site but those probabilities are applied uniformly across the network. In our spatial game model, the probability of switching strategy, i.e., a player invading a neighbouring site in the square lattice, is strategy-specific and dependent on the same payoff matrix no matter the spatial location of an agent.
In other words, each strategy an auditor or a firm might adopt leads to six possible accumulated payoffs in any given Monte Carlo step (MCS)¹. Thus, although
¹ In the literature, a Monte Carlo step is also called a generation or an evolutionary step.
payoff differences are influenced by both the agent’s and his randomly chosen neighbour’s accumulated payoffs, the possible probabilities of invasion are the same for a given strategy across the whole square lattice in any given MCS. A future model could contemplate the impact of such site-specific invasion probabilities on the results as well as the effect of the presence of quenched and annealed randomness in the square lattice.
2.4. Strategy revision and population dynamics
Our structured population approach is closely related to agent-based and cellular automata (CA) models. The latter have been used in a wide scope of problems, ranging from the seminal theoretical Game of Life by John Conway (see Gardner, 1970) to more applied problems such as the model in Duff, Chong, and Tolhurst (2015) used to determine optimal travel routes for vehicles travelling from bases to forest fires. In CA models, an array of identically programmed cells (automata) interact with one another in a local neighbourhood. Each automaton has an initial state which might or might not change over time according to some transition function. The latter specifies one or more rules which generally depend on the state or some other attribute of a particular cell and its closest neighbours. Local patterns of behaviour then influence the entire population. Transition rules from the current to the next discrete time step might be completely deterministic, as in Pereira, Martinez, and Espíndola (2008), where the updating rule consists in every automaton copying the state of the best performing automaton in the local neighbourhood, or as in Conway’s Game of Life, where the state of each automaton could be alive or dead conditional on how many local neighbours were in the alive state in the previous time step. We here follow a probabilistic transition rule in which each cell contains one auditor and one firm.
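The probabilistic transition rules discussed in this subsection (a replicator-like imitation probability and the Fermi rule used elsewhere in the literature) can be sketched as follows for a single population. This is a minimal illustration, not the authors' implementation: the four-neighbour (von Neumann) neighbourhood, the flat site indexing and all function names are assumptions, and the caller is assumed to pick `alpha` small enough that the imitation probability never exceeds 1.

```python
import math
import random


def replicator_like_prob(payoff_i, payoff_j, alpha):
    """Imitation probability max{0, alpha * (payoff_j - payoff_i)}."""
    return max(0.0, alpha * (payoff_j - payoff_i))


def fermi_prob(payoff_i, payoff_j, beta):
    """Fermi rule 1 / (1 + exp(-beta * (payoff_j - payoff_i)))."""
    return 1.0 / (1.0 + math.exp(-beta * (payoff_j - payoff_i)))


def lattice_neighbours(L, site):
    """Four nearest neighbours on an L x L lattice with periodic boundaries."""
    r, c = divmod(site, L)
    return [((r - 1) % L) * L + c, ((r + 1) % L) * L + c,
            r * L + (c - 1) % L, r * L + (c + 1) % L]


def synchronous_update(strategies, payoffs, L, alpha, rng):
    """One synchronous revision step within a single population.

    Every agent picks one random neighbour and copies its strategy with the
    replicator-like probability; all agents read the OLD strategies.
    """
    new = list(strategies)
    for i in range(L * L):
        j = rng.choice(lattice_neighbours(L, i))
        if rng.random() < replicator_like_prob(payoffs[i], payoffs[j], alpha):
            new[i] = strategies[j]
    return new
```

With the replicator-like rule an agent never copies a neighbour who earned less, whereas `fermi_prob` returns a positive probability even for a negative payoff difference, which is how that rule admits mutation-like moves.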
Individuals compare their performance within their own population although their profits come from the game played with individuals in the other population. Updating is synchronous such that, in each MCS, after all individuals have computed the payoffs obtained from the game, each individual selects a local neighbour to compare payoffs. If the neighbour’s payoff is larger, there is an increasing probability of copying the neighbour’s strategy the larger the difference in payoffs is. Games where players compare their performance and update their strategies probabilistically can be seen in Traulsen, Semmann, Sommerfeld, Krambeck, and Milinski (2010), Casasnovas (2012), Pacheco et al. (2014), Chen, Szolnoki, and Perc (2015), Zhang, Zhang, Cao, and Weissing (2015) and Amaral, Wardil, Perc, and da Silva (2016), among many others in the literature, although generally such class of models only deals with one-population games. Moreover, the transition probabilistic rule regarding strategy updating may vary and the choice of such updating rule can affect how locally successful strategies spread macroscopically in structured populations, which might lead to different evolutionary outcomes (Traulsen et al., 2010). In our base model with well-mixed populations, the evolutionary process follows a replicator dynamics system, i.e., we only take into account strategy selection. To keep consistence with the latter, in the spatial model we adopt a transition rule following a replicator-like probability of the form Pi→ j = max{0, α ( j − i )}, in which Pi → j is the probability of agent i imitating the strategy of a random neighbour j, based on their payoff difference j − i and α is a constant such that Pi → j ∈ [0, 1] is ensured. A different transition rule found in the literature, which would be more in line with a replicator-mutator model (Willensdorfer & Nowak, 2005), can be seen in Zhang et al. (2015), Pacheco et al. (2014) and Amaral et al. 
(2016), where the probability of updating $P_{i \to j} = [1 + \exp(-\beta(\Pi_j - \Pi_i))]^{-1}$ is based on the Fermi distribution function instead. While in the replicator-like rule, an agent never imitates a worse performing strategy, the latter may occur in the Fermi function rule. The parameter β, which in Physics is the inverse of temperature, accounts for the intensity of selection in the evolutionary process. For strong selection ($\beta \to +\infty$), an agent always follows the imitate-the-best rule, while for $\beta \ll 1$, selection is weak, thus a worse performing strategy could be adopted, which would account for the possibility of mutation in the model (Casasnovas, 2012; Traulsen et al., 2010). Another important issue regarding the strategy updating procedure is discussed in Zhang et al. (2015), in which they study both the spatial prisoner’s dilemma and the snowdrift game. Different spatial structures are taken into account, such as the scale-free network and the random-regular network and agents update their strategies with a probability based on the Fermi function. In order to decide whether to keep or update the adopted strategy, a focal agent might compare his payoff with the payoff of just one neighbour or with the payoffs of a large number of agents. Zhang et al. (2015) conclude that evolution in a spatial game can be as strongly affected by the number of other agents consulted for updating as by the payoff matrix and the network structure. The evolutionary outcome is relatively independent of the class of game and the structure of the network whenever the number of players consulted is large. Thus, while in our model we stick to the widely adopted procedure in the literature in which a player compares his payoff with the payoff of just one neighbour, one should be aware that different results could take place if a player was to compare his payoff with those of many neighbours.

3. Model

We follow Arguedas and Hamoudi (2004) with regard to firm profit and social welfare function.
A firm’s private profit $\pi(k, e, \beta) = ke - e^2\beta^{-1}$ is strictly concave on the emission level $e > 0$ and is monotonically increasing on a profitability parameter $k > 0$ and on the type of production technology employed $\beta \in [1, \bar{\beta}]$, where a higher β is associated with a cheaper and dirtier technology. Pollution generates a negative externality $d(e, \beta) = \beta e^2$ which is convex increasing on the emission level and linearly increasing the dirtier the technology adopted. We assume bounded rational agents such that the decision to select the optimal technology is rationally carried out through maximizing profits but the profit maximizing function depends on the strategy that is naively chosen by managers, which is only based on the firm’s corporate culture. When firms decide which technology to use, environmentally compliant firms internalize the negative externality in their private profit function while non-compliant firms do not. On the policy maker side, auditors might inspect firms or not. If a firm is inspected and declared non-compliant, it faces a fine $f(e, \beta, s) = \beta(e - s)^2$, where s is the maximum emission level acceptable. Differently from the principal-agent monopoly framework in Arguedas and Hamoudi (2004), we assume a very large number of firms, thus making inspectability through emission level more difficult (see Heyes, 1998). We thus normalize the emission standard to s = 0, making fines harsher, i.e., due to a higher required effort to correctly measure the emission level, the policy maker adopts a stricter environmental policy. The decision to punish is solely based on visual inspection of the type of technology employed in corporate plants and punishment is enforced whenever β is high. The penalty consists in charging a fine equal to $d(e, \beta)$, i.e., a fine equivalent to the complete internalization of the negative environmental externality that a firm generates while producing.
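As a small numerical illustration of the primitives just defined, the following Python sketch evaluates the private profit, the externality and the fine. The function names `profit`, `damage` and `fine` and the numerical values are our own illustrative choices, not part of Arguedas and Hamoudi (2004); the only point made is that, with the standard normalized to s = 0, the fine coincides with the damage d(e, β).

```python
def profit(k: float, e: float, beta: float) -> float:
    """Private profit pi(k, e, beta) = k*e - e**2 / beta."""
    return k * e - e ** 2 / beta


def damage(e: float, beta: float) -> float:
    """Negative externality d(e, beta) = beta * e**2."""
    return beta * e ** 2


def fine(e: float, beta: float, s: float = 0.0) -> float:
    """Fine f(e, beta, s) = beta * (e - s)**2; s = 0 is the normalized standard."""
    return beta * (e - s) ** 2


if __name__ == "__main__":
    k, e, beta = 1.0, 0.5, 2.0  # illustrative values only (assumption)
    print("profit:", profit(k, e, beta))
    print("damage:", damage(e, beta))
    print("fine with s = 0:", fine(e, beta))
```

For a fixed emission level, a dirtier technology (higher β) raises the private profit and the damage at the same time, which is the source of the conflict modelled in what follows.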
Regarding inspection, there is a very large number of auditors, which are randomly allocated to visit firms. Each auditor either inspects or not. While in Arguedas and Hamoudi (2004) the inspection probability faced by the only firm in the model was a continuous variable p ∈ [0, 1], here it can only assume one of two
A.B. da Silva Rocha and G.M. Salomão / European Journal of Operational Research 279 (2019) 486–501
values for one particular firm at the microscopic local level, i.e., $p = 0 \vee p = 1$, depending on the strategy chosen by the auditor visiting the firm. But at the macroscopic level, every firm on average faces $p \in [0, 1]$ depending on the strategy distribution (state) in the auditors population at a given time step. With regard to welfare, the policy maker has a social welfare function according to which he takes into account the firm’s private profit, the negative externality it generates and the expected cost of inspection. Fines are not taken into account once the fine paid by a firm cancels out with the public revenue it generates:

\[
W(k, e, \beta, p, c) = ke - e^2\beta^{-1} - \beta e^2 - pc \tag{1}
\]

Whenever an auditor visits a firm, the game played is given by:

\[
\begin{array}{l|cc}
 & \text{Non-compliant } (P) & \text{Compliant } (C) \\ \hline
\text{Inspects } (I) & \pi_A^{IP},\ \pi_F^{IP} & \pi_A^{IC},\ \pi_F^{IC} \\
\text{No Inspection } (N) & \pi_A^{NP},\ \pi_F^{NP} & \pi_A^{NC},\ \pi_F^{NC}
\end{array} \tag{2}
\]

where the auditor, denoted by subscript A (resp. the firm, denoted by subscript F), stands as the row (resp. column) player in the payoff matrix above. When a firm is called to play the stage-game in (2), all the managers can do is to adjust the emission level selecting the optimal emission that maximizes the profit function depending on whether the auditor the firm faces will inspect it or not and depending on which environmental policy the firm follows. In other words, we further assume that the optimal decisions regarding β and e are taken rationally but independently at each firm. This assumption, which is also in line with bounded rationality, takes into account the lack of synergy between corporate divisions that often takes place in large companies, where the long-run investment decision regarding the technology to be used is made by a planning team while the more short-run decision on the emission level is carried out by the operating team. If the firm culture follows corporate socially responsible (CSR) practices, thus being compliant with the environmental regulation, technology choice is based on the following maximization problem:

\[
\max_{\beta}\ \pi(k, e, \beta) = ke - e^2\beta^{-1} - \beta e^2
\]

which internalizes the negative externality into the firm’s private profit for the sake of selecting $\beta^*$. First order condition (FOC) leads to:

\[
\frac{\partial \pi}{\partial \beta} = -e^2\left(1 - \beta^{-2}\right) = 0 \Rightarrow \beta^* = 1
\]

Instead, a no-CSR firm is not compliant with the environmental regulation and only takes into account its private profit function, leading to the following FOC and technology optimal choice:

\[
\frac{\partial \pi}{\partial \beta} = e^2\beta^{-2} > 0 \Rightarrow \beta^* = \bar{\beta}
\]

Given that we are left with only two choices of technology $\beta \in \{1, \bar{\beta}\}$, whenever an auditor visits a firm and decides to inspect it, the firm is fined if it employs a technology $\beta = \bar{\beta}$. The optimal emission level for a compliant firm is independent of being audited or not:

\[
\max_{e}\ \pi(k, e, 1) = ke - e^2 - e^2 \Rightarrow \frac{\partial \pi}{\partial e} = k - 4e = 0 \Rightarrow e^* = k/4
\]

leading to a private profit $\pi_F^{IC} = \pi_F^{NC} = 3k^2/16$ and social welfare $\pi_A^{NC} = k^2/8$ if the firm is not inspected and $\pi_A^{IC} = k^2/8 - c$ otherwise. We emphasize that, although compliant firms always internalize the pollution cost into their private profit function for the sake of selecting their optimal decisions $\beta^*$ and $e^*$, they never get fined, thus never facing the penalty corresponding to the social cost $d(e, \beta)$ due to pollution. Regarding non-compliant firms, if a firm is not inspected, it is not declared non-compliant and does not have to internalize the environmental damage into its private profit function. The FOC for profit maximization leads to:

\[
\max_{e}\ \pi(k, e, \bar{\beta}) = ke - e^2\bar{\beta}^{-1} \Rightarrow \frac{\partial \pi}{\partial e} = k - 2e\bar{\beta}^{-1} = 0 \Rightarrow e^* = k\bar{\beta}/2
\]

Thus, the firm optimal private profit is $\pi_F^{NP} = k^2\bar{\beta}/4$, leading to a corresponding social welfare of $\pi_A^{NP} = k^2\bar{\beta}(1 - \bar{\beta}^2)/4$. On the other hand, a non-compliant firm facing inspection is fined and has to internalize the pollution cost, not only when making the optimal emission decision but also decreasing its private profit that is left to be reinvested. The optimization problem becomes:

\[
\max_{e}\ \pi(k, e, \bar{\beta}) = ke - e^2\bar{\beta}^{-1} - \bar{\beta}e^2
\]
leading to $e^* = k\bar{\beta}(1 + \bar{\beta}^2)^{-1}/2$, $\pi_F^{IP} = k^2\bar{\beta}(1 + \bar{\beta}^2)^{-1}/4$ and $\pi_A^{IP} = k^2\bar{\beta}(1 + \bar{\beta}^2)^{-1}/4 - c$. From the analysis above, while a compliant firm always voluntarily internalizes the pollution damage into its private profit function whenever managers make optimal decisions about technology and emission level, non-compliant firms only do so when an auditor inspects their non-switchable production technology and imposes environmental compliance through the fine. The latter has the effect of adjusting the firm’s pollution discharge to a lower emission level. While a non-compliant uninspected firm is better-off than any other firm with regard to private profit, i.e., $k^2\bar{\beta}/4 > 3k^2/16 > k^2\bar{\beta}(1 + \bar{\beta}^2)^{-1}/4$, a non-compliant inspected firm faces the lowest possible payoff. Despite employing the cheapest technology, getting punished and having to adjust the emission level using the worst technology leads to a too low optimal emission level, i.e., $k\bar{\beta}(1 + \bar{\beta}^2)^{-1}/2 < k/4 < k\bar{\beta}/2$; $\forall \bar{\beta} > 1$. On top of this, the firm’s private profit is also decreased due to the additional cost to pay the fine. The payoff matrix in (2) with possible private profits $\pi_F$ and social welfares $\pi_A$ then becomes:
\[
\begin{array}{l|cc}
 & \text{Non-compliant } (P) & \text{Compliant } (C) \\ \hline
\text{Inspects } (I) & k^2\bar{\beta}(1 + \bar{\beta}^2)^{-1}/4 - c,\ \ k^2\bar{\beta}(1 + \bar{\beta}^2)^{-1}/4 & k^2/8 - c,\ \ 3k^2/16 \\
\text{No Inspection } (N) & k^2\bar{\beta}(1 - \bar{\beta}^2)/4,\ \ k^2\bar{\beta}/4 & k^2/8,\ \ 3k^2/16
\end{array} \tag{3}
\]

From (3), enforcement of environmental compliance is only feasible if $k^2\bar{\beta}(1 + \bar{\beta}^2)^{-1}/4 - c > k^2\bar{\beta}(1 - \bar{\beta}^2)/4$, leading to a maximum inspection cost $c < \bar{c} = k^2\bar{\beta}^5(1 + \bar{\beta}^2)^{-1}/4$. Above $\bar{c}$, not inspecting is a strictly dominant strategy for auditors, independently of the corporate culture the visited firm follows, and society falls into a pollution trap scenario in the long run where polluters always perform better than any compliant firm, thus driving the latter to extinction. Moreover, the largest social welfare $k^2/8$ is achieved whenever a compliant firm does not require any inspection effort. In contrast, the largest private profit that can be achieved, $k^2\bar{\beta}/4$, is obtained when auditors fail to inspect a non-compliant firm. Hence, the game payoff matrix points out the conflict of interest between the policy maker and firms. We define a parameter $\xi = c/\bar{c} \in (0, 1)$ such that we concentrate our analysis in the cases where enforcement of environmental compliance is feasible. Thus, we replace c by $\xi\bar{c} = \xi k^2\bar{\beta}^5(1 + \bar{\beta}^2)^{-1}/4$ in (3) and the payoff matrix becomes:
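\[
\left(
\begin{array}{l|cc}
 & \text{Non-compliant } (P) & \text{Compliant } (C) \\ \hline
\text{Inspects } (I) & \dfrac{\bar{\beta}}{4(1 + \bar{\beta}^2)} - \dfrac{\xi\bar{\beta}^5}{4(1 + \bar{\beta}^2)},\ \ \dfrac{\bar{\beta}}{4(1 + \bar{\beta}^2)} & \dfrac{1}{8} - \dfrac{\xi\bar{\beta}^5}{4(1 + \bar{\beta}^2)},\ \ \dfrac{3}{16} \\[2ex]
\text{No Inspection } (N) & \dfrac{\bar{\beta}(1 - \bar{\beta}^2)}{4},\ \ \dfrac{\bar{\beta}}{4} & \dfrac{1}{8},\ \ \dfrac{3}{16}
\end{array}
\right) k^2 \tag{4}
\]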
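The entries of (4) are easy to evaluate numerically. The sketch below is only an illustration: the function names `max_inspection_cost` and `payoff_matrix` and the dictionary layout are our own assumptions. Each cell stores the pair (social welfare of the auditor, private profit of the firm).

```python
from typing import Dict, Tuple

Payoffs = Dict[Tuple[str, str], Tuple[float, float]]


def max_inspection_cost(k: float, beta_bar: float) -> float:
    """Upper bound c_bar = k^2 * beta_bar^5 / (4 * (1 + beta_bar^2))."""
    return k ** 2 * beta_bar ** 5 / (4.0 * (1.0 + beta_bar ** 2))


def payoff_matrix(k: float, beta_bar: float, xi: float) -> Payoffs:
    """Return {(auditor, firm): (social welfare, private profit)} as in (4).

    Auditor strategies: "I" (inspects), "N" (no inspection).
    Firm strategies: "P" (non-compliant), "C" (compliant).
    """
    c = xi * max_inspection_cost(k, beta_bar)  # inspection cost c = xi * c_bar
    fined_profit = k ** 2 * beta_bar / (4.0 * (1.0 + beta_bar ** 2))
    return {
        ("I", "P"): (fined_profit - c, fined_profit),
        ("N", "P"): (k ** 2 * beta_bar * (1.0 - beta_bar ** 2) / 4.0,
                     k ** 2 * beta_bar / 4.0),
        ("I", "C"): (k ** 2 / 8.0 - c, 3.0 * k ** 2 / 16.0),
        ("N", "C"): (k ** 2 / 8.0, 3.0 * k ** 2 / 16.0),
    }


if __name__ == "__main__":
    for cell, values in payoff_matrix(k=1.0, beta_bar=2.0, xi=0.25).items():
        print(cell, values)
```

With k = 1, β̄ = 2 and ξ = 0.25, the function returns, up to floating-point rounding, the numerical payoffs reported in matrix (7).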
Fig. 1. Phase diagram and time evolution for parameters: k = 1, β¯ = 2, ξ = 0.30 and initial conditions ( p0 = 0.35, q0 = 0.65 ).
4. Well-mixed population game

In this section, we start analysing the evolution of the states of both populations playing the game in (4) using replicator dynamics (RD). The latter is suitable when populations are well-mixed, i.e., any firm can be visited by any auditor. Moreover, RD assumes replication, i.e., any profit obtained by a compliant (resp. non-compliant) firm is completely reinvested in another compliant (resp. non-compliant) firm. Under RD only natural selection of strategies is taken into account and there is no mutation, i.e., a firm does not reinvest in another firm experimenting a different corporate culture. Assuming that at a given time step p is the proportion of auditors that inspect firms and q is the proportion of non-compliant firms, one can derive a system of nonlinear ordinary differential equations governing the evolution of both populations over time. Each equation of the RD system governs the change in the state of one population over time. For the case of our game, we have:

\[
\frac{\partial p}{\partial t} = \dot{p} = p(1 - p)\left(\pi_A^{I} - \pi_A^{N}\right) \Rightarrow \dot{p} = p(1 - p)\,\bar{c}\,(q - \xi) \tag{5}
\]

\[
\frac{\partial q}{\partial t} = \dot{q} = q(1 - q)\left(\pi_F^{P} - \pi_F^{C}\right) \Rightarrow \dot{q} = q(1 - q)\,k^2\left[\frac{4\bar{\beta} - 3}{16} - p\,\frac{\bar{\beta}^3}{4(1 + \bar{\beta}^2)}\right] \tag{6}
\]
where $\pi_A^{I}$ and $\pi_A^{N}$ (resp. $\pi_F^{P}$ and $\pi_F^{C}$) are, respectively, the expected social welfares when an auditor inspects and does not inspect a firm (resp. the expected private profits for a non-compliant and a compliant firm). The RD system in (5) and (6) does not have an evolutionary equilibrium (Hofbauer & Sigmund, 1998). It has a neutrally stable state $p^* = \bar{\beta}^{-3}(4\bar{\beta} - 3)(1 + \bar{\beta}^2)/4$; $q^* = \xi = c/\bar{c}$, corresponding to the only (mixed strategy) Nash equilibrium of the stage-game in (4), about which the states of both populations oscillate over time indefinitely. Such evolutionary pattern is similar to that found in Anastasopoulos and Anastasopoulos (2012). It should be emphasized that the parameter k does not have any effect on the best reply structure of the game in (4) and the RD is not sensitive to k as well. In (6), k is just a constant that impacts the speed of evolution but not the evolutionary pattern over time nor the long run evolutionary outcome. The latter is also true regarding the role of $\bar{c}$ in the dynamic rate of growth of p given by (5). Thus, from now on we assume k = 1 in all numerical simulations without loss of generality, while $\bar{c}$ is just a function of $\bar{\beta}$.
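To make the system (5) and (6) concrete, the following Python sketch integrates it with a classical fourth-order Runge-Kutta scheme. The integration scheme, step size, horizon and the function names `rd_rhs`, `interior_state` and `simulate` are our own assumptions for illustration; they are not necessarily the numerical method used to produce the figures.

```python
from typing import List, Tuple

State = Tuple[float, float]


def rd_rhs(p: float, q: float, beta_bar: float, xi: float, k: float = 1.0) -> State:
    """Right-hand side of the replicator system (5)-(6)."""
    c_bar = k ** 2 * beta_bar ** 5 / (4.0 * (1.0 + beta_bar ** 2))
    dp = p * (1.0 - p) * c_bar * (q - xi)
    dq = q * (1.0 - q) * k ** 2 * (
        (4.0 * beta_bar - 3.0) / 16.0
        - p * beta_bar ** 3 / (4.0 * (1.0 + beta_bar ** 2))
    )
    return dp, dq


def interior_state(beta_bar: float, xi: float) -> State:
    """Neutrally stable state (p*, q*) of the system."""
    p_star = (4.0 * beta_bar - 3.0) * (1.0 + beta_bar ** 2) / (4.0 * beta_bar ** 3)
    return p_star, xi


def simulate(p0: float, q0: float, beta_bar: float, xi: float,
             dt: float = 0.01, steps: int = 40000) -> List[State]:
    """Integrate (5)-(6) with RK4 (step size and horizon are illustrative)."""
    p, q = p0, q0
    path = [(p, q)]
    for _ in range(steps):
        k1 = rd_rhs(p, q, beta_bar, xi)
        k2 = rd_rhs(p + 0.5 * dt * k1[0], q + 0.5 * dt * k1[1], beta_bar, xi)
        k3 = rd_rhs(p + 0.5 * dt * k2[0], q + 0.5 * dt * k2[1], beta_bar, xi)
        k4 = rd_rhs(p + dt * k3[0], q + dt * k3[1], beta_bar, xi)
        p += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        q += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
        path.append((p, q))
    return path


if __name__ == "__main__":
    path = simulate(p0=0.35, q0=0.65, beta_bar=2.0, xi=0.30)
    print("interior state:", interior_state(2.0, 0.30))
    print("range of p:", min(s[0] for s in path), max(s[0] for s in path))
    print("range of q:", min(s[1] for s in path), max(s[1] for s in path))
```

For β̄ = 2 the interior state evaluates to p* = 25/32, and the simulated trajectory repeatedly crosses both p* and q* = ξ instead of settling down, in line with the oscillatory pattern described above.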
In Fig. 1 we present a numerical simulation for k = 1, $\bar{\beta} = 2$, ξ = 0.30 and initial conditions in which only 35% of the firms are compliant and only 35% of the auditors are committed to inspect firms, $p_0 = 0.35 \wedge q_0 = 0.65$. It can be seen that compliance and inspection keep oscillating in both populations without achieving an equilibrium. The initial proportion of two-thirds of the firms being non-compliant induces a sharp rise in the proportion of inspecting auditors over time. The increase in the likelihood of being inspected favours the increase of compliance among firms. When the proportion of non-compliant firms has decreased sharply, close to 5% of the population of firms, auditors become more lenient and the likelihood of inspecting falls down to 10%. The cycle then restarts. Regarding the neutrally stable state $(p^*, q^*)$, it represents the average behaviour of the populations over a long interval of time (Cressman, Morrison, & Wen, 1998). Given that $\lim_{\bar{\beta} \to 1} p^* = 0.5$, one can see that there is a bias towards inspection, i.e., on average the majority of the auditors inspects the firms they visit. Moreover, from (3), the expected profit of a non-compliant firm is given by $\pi_F^{P} = \frac{k^2\bar{\beta}}{4(1 + \bar{\beta}^2)}\left[1 + (1 - p)\bar{\beta}^2\right]$, thus $\frac{\partial \pi_F^{P}}{\partial p} = -\frac{k^2\bar{\beta}^3}{4(1 + \bar{\beta}^2)} < 0$, i.e., given $\bar{\beta} \wedge k$, whenever the probability of inspection increases, the expected profit of non-compliant firms decreases. Consequently, given the expected profit of compliant firms $\pi_F^{C} = \frac{3k^2}{16}$ is unaffected by changes in p, the incentive to comply with the environmental regulation increases. On the side of the policy maker, the difference between the expected social welfare with and without inspection is given by $\pi_A^{I} - \pi_A^{N} = \frac{k^2\bar{\beta}^5}{4(1 + \bar{\beta}^2)}\,q - c$. Thus, $\frac{\partial(\pi_A^{I} - \pi_A^{N})}{\partial q} > 0$ and $\frac{\partial(\pi_A^{I} - \pi_A^{N})}{\partial c} < 0$. Hence, given $\bar{\beta} \wedge k$, whenever non-compliance spreads in the population of firms, the incentive for inspection increases as well. On the other hand, larger costs of inspection induce a lower incentive to inspect firms. This analysis is valid $\forall \bar{\beta}$; $\forall k$. One drawback of employing RD to model the evolution of both populations over time is that, by assuming well-mixed populations, RD does not take into account the geographical location of firms and auditors. It is equivalent to a network where all agents are neighbours of each other. In the next section, we overcome this issue by assuming that both populations are spatially located in an L × L square grid with periodic boundary conditions. In such setup, firms and auditors only interact with their local geographic neighbours.

5. Spatial game

In order to increase realism, we extend the game in Section 4 assuming individuals in both populations keep playing the same stage-game in (4) but now each player has a specific
geographical location. While it seems obvious to account for firms’ spatial location, it is also reasonable to make the same assumption with regard to auditors. Generally, public servants are allocated to work in a particular region of the country or state and they are in charge of inspections in the neighbourhood of the area where they work. The extended game model considers a regular square grid of size L × L, with periodic boundary conditions, thus a planar representation of a torus. We assumed each population with $L^2 = 10{,}000$ individuals such that there is no noise due to small population effects. Each location of the grid is occupied by two players, one auditor and one firm, and each player has a von-Neumann (vNM) neighbourhood with four neighbours. At the beginning of each simulation, each firm in the grid is randomly allocated either strategy C or P according to some pre-defined initial condition $q_0$. Regarding the population of auditors, we assumed the government selects a particular region of the country to start a pilot task force to try to promote inspection within the entire population of auditors. We focused on countries starting from a pollution trap scenario in which $p_0 = 0.05 \wedge q_0 = 0.95$. Thus, at the initial conditions, all auditors are allocated strategy N, except in a stripe representing a compact spatial cluster composed of $p_0 L^2$ auditors playing I. Once the allocation is carried out, every individual i in both populations plays the game against all individuals in the opponent population allocated in his vNM neighbourhood plus the opponent allocated at the same location he occupies. The sum of the payoffs obtained in each contest is computed, $\Pi_i = \sum_{i=1}^{n=5} \pi_i$, and a random neighbour j from the same population is chosen to compare payoffs. If $\Pi_i > \Pi_j$, player i keeps his strategy, otherwise his strategy is updated with probability $(\Pi_j - \Pi_i)/[5(\max \pi_i - \min \pi_i)]$, where $\max \pi_i$ (resp. $\min \pi_i$) is the maximum (resp.
minimum) attainable payoff player i can get in the stage-game payoff matrix. After all individuals have had the chance to update their strategies, the MCS ends and both populations are updated synchronously. The spatial structure adds an additional feature regarding the bounded rationality of players in the game: given that firms and auditors only play the game against their local neighbours, they have limited access to information when they make decisions and either review or keep their strategies. Even if there is a large spatial cluster spreading over time where a given strategy performs better macroscopically, if an agent happens to adopt that same strategy at a geographical location in which there remains a small cluster of players adopting the opposing strategy, it may happen that the agent switches to that strategy which is temporarily performing better only microscopically. This fact results from the comparison of payoffs with the neighbouring environment only, implicitly imposed by the spatial structure of the game, which contributes to limited access to information from a geographical perspective and boundedly rational decisions. We carried out numerical simulations keeping k = 1 and β¯ = 2 as in Section 4 and varied parameter ξ given that, as seen in Section 4, the inspection cost plays an important role by decreasing the incentive to inspect and, consequently, decreasing the firms’ long run average compliance behaviour. For all values of ξ , the stage-game still has one unique mixed-strategy Nash equilibrium, leading to the same cyclical evolutionary pattern found in Section 4 when both populations are well-mixed. We found that, despite this fact, the numerical results displayed a quite rich dynamics in the spatial game. Moreover, in more than one case, a robust long-run equilibrium is attainable in both populations. We used ξ = {0.05; 0.25; 0.40; 0.50; 0.55; 0.60; 0.65; 0.75; 0.90; 0.99}. 
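The updating procedure described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the code used for the reported simulations: strategies are encoded as integers (auditors: 1 = I, 0 = N; firms: 1 = P, 0 = C), the initial stripe is taken along the first rows of the lattice, and all function names are hypothetical.

```python
import random
from typing import List

Grid = List[List[int]]
# Own site first, then the four von Neumann neighbours (periodic boundaries).
OFFSETS = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))


def stage_payoffs(beta_bar: float, xi: float, k: float = 1.0):
    """Matrices of (4), indexed [auditor][firm]; auditor 1 = I, firm 1 = P."""
    c = xi * k ** 2 * beta_bar ** 5 / (4.0 * (1.0 + beta_bar ** 2))
    fined = k ** 2 * beta_bar / (4.0 * (1.0 + beta_bar ** 2))
    welfare = [[k ** 2 / 8.0, k ** 2 * beta_bar * (1.0 - beta_bar ** 2) / 4.0],
               [k ** 2 / 8.0 - c, fined - c]]
    profit = [[3.0 * k ** 2 / 16.0, k ** 2 * beta_bar / 4.0],
              [3.0 * k ** 2 / 16.0, fined]]
    return welfare, profit


def initial_state(n: int, p0: float, q0: float, rng: random.Random):
    """Stripe of inspecting auditors; exactly round(q0 * n^2) random polluters."""
    stripe = round(p0 * n)  # number of rows, so that stripe * n = p0 * n^2
    aud = [[1 if x < stripe else 0 for _ in range(n)] for x in range(n)]
    sites = [(x, y) for x in range(n) for y in range(n)]
    polluters = set(rng.sample(sites, round(q0 * n * n)))
    firm = [[1 if (x, y) in polluters else 0 for y in range(n)] for x in range(n)]
    return aud, firm


def accumulated(aud: Grid, firm: Grid, welfare, profit):
    """Sum of the five stage-game payoffs of every auditor and every firm."""
    n = len(aud)
    acc_a = [[0.0] * n for _ in range(n)]
    acc_f = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            for dx, dy in OFFSETS:
                u, v = (x + dx) % n, (y + dy) % n
                acc_a[x][y] += welfare[aud[x][y]][firm[u][v]]
                acc_f[x][y] += profit[aud[u][v]][firm[x][y]]
    return acc_a, acc_f


def imitate(strat: Grid, acc, span: float, rng: random.Random) -> Grid:
    """Replicator-like imitation of one random neighbour (synchronous)."""
    n = len(strat)
    new = [row[:] for row in strat]  # read the old state, write the new one
    for x in range(n):
        for y in range(n):
            dx, dy = rng.choice(OFFSETS[1:])
            u, v = (x + dx) % n, (y + dy) % n
            prob = (acc[u][v] - acc[x][y]) / (5.0 * span)
            if prob > 0.0 and rng.random() < prob:
                new[x][y] = strat[u][v]
    return new


def mcs(aud: Grid, firm: Grid, beta_bar: float, xi: float, rng: random.Random):
    """One Monte Carlo step: play, accumulate payoffs, then update both grids."""
    welfare, profit = stage_payoffs(beta_bar, xi)
    acc_a, acc_f = accumulated(aud, firm, welfare, profit)
    span_a = max(map(max, welfare)) - min(map(min, welfare))
    span_f = max(map(max, profit)) - min(map(min, profit))
    return imitate(aud, acc_a, span_a, rng), imitate(firm, acc_f, span_f, rng)


def share(grid: Grid) -> float:
    """Share of sites with strategy 1 (p for auditors, q for firms)."""
    return sum(map(sum, grid)) / float(len(grid) ** 2)


if __name__ == "__main__":
    rng = random.Random(1)
    aud, firm = initial_state(20, 0.05, 0.95, rng)  # small lattice for speed
    for _ in range(50):
        aud, firm = mcs(aud, firm, 2.0, 0.25, rng)
    print(share(aud), share(firm))
```

Dividing the payoff difference by five times the payoff span of the stage game keeps the switching probability in [0, 1], since each accumulated payoff is a sum of five stage-game payoffs. The lattice size used in the usage example is deliberately smaller than the L = 100 grid of the paper.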
We ran a set of 50 simulations for each value of ξ, with each realization using a different seed. Each realization ran for 5000 MCS, which was more than enough to ensure either convergence towards the long run equilibrium or to display a clear cyclical long run behaviour. The only exception was for ξ = 0.99, which never displayed a cyclical behaviour but always required a larger number of MCS for the long run equilibrium to be achieved. Particular interest was given to the evolution of p and q over time, the snapshots of strategy distribution in the grid for both populations and the long run average state of each population when cyclical behaviour was displayed. Based on the numerical results, we formulate four propositions.

Proposition 1. For low to intermediate inspection cost, there is a long run robust evolutionary equilibrium in which the pollution trap scenario disappears. All auditors inspect firms and the latter comply with environmental regulations.

Discussion: Using the case ξ = 0.25, a low cost of inspection, the payoff matrix in (4) becomes:

\[
\begin{array}{l|cc}
 & \text{Non-compliant } (P) & \text{Compliant } (C) \\ \hline
\text{Inspects } (I) & -0.300,\ 0.100 & -0.275,\ 0.1875 \\
\text{No Inspection } (N) & -1.500,\ 0.500 & 0.125,\ 0.1875
\end{array} \tag{7}
\]
with just one mixed-strategy Nash equilibrium. In the well-mixed population model, as shown in Section 4, this corresponds to a dynamics in which the state of each population would oscillate over time without achieving a long run equilibrium. In contrast to the well-mixed population game, for all simulations performed with ξ ∈ [0.05; 0.55], non-compliant firms and no-inspecting auditors were led to extinction in the long run. In Fig. 2 we present the L × L square lattice for a particular realization using k = 1, β¯ = 2, ξ = 0.25 and initial conditions p0 = 0.05 ∧ q0 = 0.95 (pollution trap). The figure displays the evolution of the strategy distribution in each population from MCS = 0 to MCS = 800. In Fig. 3 the evolution of both populations over time is displayed. At MCS = 0, one can see the thin black stripe with the p0 L2 = 500 auditors inspecting firms at the initial conditions and the white dots corresponding to the (1 − q0 )L2 = 500 geographically spread firms that are compliant. Due to a state in which 95% of the firms are non-compliant, the best response for auditors is to inspect (πAIP = −0.300 > πANP = −1.500). But the vast majority of the non-inspecting auditors are neighbours to each other, hence, even if they perform worse than inspecting auditors in terms of payoff, they do not switch strategy. The exception to the latter occurs at the boundaries between both clusters of inspecting and noninspecting auditors. At the boundaries, inspecting auditors perform better and non-inspecting auditors start to switch strategy. The cluster with inspecting auditors remains compact and starts to get larger (MCS = 17), covering already more than half of the grid area by MCS = 100, until non-inspecting auditors are driven out of the population. In contrast, in the population of firms, the compliant ones have already disappeared by MCS = 17, except in the region corresponding to the cluster of inspecting auditors (Fig. 
3 shows that during the initial Monte Carlo steps non-compliance actually increases among firms). In that region, compliant firms perform better ($\pi_F^{IP} = 0.100 < \pi_F^{IC} = 0.1875$) and very small compact clusters of compliant firms start to form. As the cluster of inspecting auditors becomes larger, the clusters of compliant firms start to merge (MCS = 100), becoming a single cluster that keeps growing (MCS = 150) until driving all non-compliant firms to extinction. Although the evolutionary process seems straightforward, there is one key element for the success of this long run equilibrium ($p^* = 1$, $q^* = 0$): the speed at which the cluster of inspecting auditors spreads is much faster than the spreading speed of the cluster of compliant firms. And this fact remains true until only inspecting auditors survive. By MCS = 150, in the region of the square lattice corresponding to the cluster of compliant firms, the best response for auditors now is not to inspect. But in that region all auditors’ neighbours do inspect so they are not able to revert their strategy back to no inspection. Given the cluster of inspecting auditors moves faster than the cluster of compliant firms, at the boundaries between clusters in the population of auditors the best response of inspecting remains true. And the same happens with the best response to comply in the population of firms until the evolutionary equilibrium is achieved.

Fig. 2. Evolution of the state of each population for parameters: k = 1, $\bar{\beta} = 2$, ξ = 0.25 and initial conditions ($p_0 = 0.05$, $q_0 = 0.95$) at MCS = 0; 17; 100; 150; 500; 800. [Lattice snapshots; auditors: black = strategy I, white = strategy N; firms: black = strategy P, white = strategy C.]
Proposition 2. For intermediate to high inspection cost, there is either a long run robust evolutionary equilibrium in which the pollution trap scenario disappears or an oscillatory long run behaviour after a transient period in which the shares of inspecting auditors and of compliant firms keep changing over time without reaching an equilibrium. Moreover, the likelihood of reaching an evolutionary equilibrium decreases with increasing cost of inspection. The long run average behaviour of each population does not match with that found in the well-mixed population evolutionary game, although the long run average values of p and q follow the sensitivity analysis discussed in Section 4, i.e., $\frac{\partial(\pi_A^{I} - \pi_A^{N})}{\partial c} < 0 \wedge \frac{\partial \pi_F^{P}}{\partial p} < 0$: as the inspection cost increases, the likelihood of inspecting decreases, leading to an increase in the share of non-compliant firms.
Discussion: The numerical simulations displayed possible oscillatory behaviour for 0.60 ≤ ξ ≤ 0.90. Out of 50 realizations performed with different seeds for each value of ξ, oscillatory long run behaviour was registered 4 (ξ = 0.60), 3 (ξ = 0.65), 31 (ξ = 0.75) and 50 (ξ = 0.90) times. Thus, as the cost of inspection increases, it reaches a boundary above which a long run equilibrium is not possible. In Fig. 4 the evolution of both populations over time is displayed for the case ξ = 0.75 showing the first 20,000 MCS and just the first 5000 MCS as well. From the latter, it can be seen that after 500 MCS the non-oscillatory transient period is already over. Similar evolutionary patterns were found for ξ = 0.60, ξ = 0.65 and ξ = 0.90. In Fig. 5 we present the evolution of the L × L square lattice for a particular realization when oscillation occurs, using k = 1, $\bar{\beta} = 2$, ξ = 0.75 and initial conditions $p_0 = 0.05 \wedge q_0 = 0.95$ (pollution trap), from MCS = 0 to MCS = 20,000. When compared to a realization without oscillation, the main important difference is that, when the cluster of inspecting auditors starts to spread, the spreading speed is not as fast as the speed in realizations with lower costs of inspection (e.g.: compare the size of the black stripe in the population of auditors for MCS = 22 in Fig. 5 with that for MCS = 17 in Fig. 2). This happens because, although the best response for auditors is still to inspect, the difference in profits now is smaller. Then, when inspecting auditors compare their payoffs with non-inspecting auditors in their neighbourhood, although the former tend to perform better, the strategy switching process is probabilistically increasing in the payoff difference. Hence, the switching process now becomes slower than when the inspection cost was lower. With the clusters of compliant firms starting to grow in the opponent population, non-inspecting auditors at the boundary between clusters are now able to re-invade that region and create a small compact cluster (see MCS = 100). Once this happens, convergence to an evolutionary equilibrium with one strategy being driven to extinction in each population is doomed. Clusters of different strategies keep moving around the square lattice in both populations due to invasion and re-invasion with the state of both populations oscillating over time.

Fig. 3. Time evolution of the state of each population for k = 1, $\bar{\beta} = 2$, ξ = 0.25 and initial conditions ($p_0 = 0.05$, $q_0 = 0.95$). [Vertical axis: share of the population; horizontal axis: Monte Carlo steps, 0 to 800; series: auditors inspecting firms, non-compliant firms.]

Fig. 4. Time evolution of the state of each population for k = 1, $\bar{\beta} = 2$, ξ = 0.75 and initial conditions ($p_0 = 0.05$, $q_0 = 0.95$). [Two panels covering the first 5000 and the first 20,000 Monte Carlo steps; vertical axis: share of the population; horizontal axis: Monte Carlo steps; series: auditors inspecting firms, non-compliant firms.]

In order to understand the long run average behaviour of each population, simulations with 500,000 MCS were carried out for each value of ξ and we computed the average state of each population after 3000 MCS, updating the computation step after step
until 50 0,0 0 0 MCS, in order to guarantee the transient period was over and had no effect on the computed average states. In Fig. 6 the long run average shares of inspecting auditors and of noncompliant firms are displayed. Clearly, as the cost of inspection increases, the long run proportion of inspecting auditors decreases and the proportion of non-compliant firms increases. Proposition 3. When the inspection cost approaches c¯, i.e., ξ → 1, the oscillatory evolutionary pattern ceases to occur and is replaced by a different long run evolutionary equilibrium in which p∗ = 1 and q∗ = 1, i.e., a country where all auditors carry out a very costly and ineffective inspection process that does not lead to corporate environmental compliance. Moreover, the few compliant firms that were located in the country are driven to extinction. Discussion: As in Propositions 1 and 2, the inspecting cluster in the population of auditors starts to grow but now the growth speed is so slow that in the population of firms the cluster of compliant firms in the same region grows at a similar speed. Consequently, non-inspecting auditors are able to re-invade the small
496
A.B. da Silva Rocha and G.M. Salomão / European Journal of Operational Research 279 (2019) 486–501
[Figure 5: paired lattice snapshots of the two populations at each MCS listed in the caption. Auditors panels: black = strategy I, white = strategy N. Firms panels: black = strategy P, white = strategy C.]
Fig. 5. Evolution of the state of each population for parameters: k = 1, β¯ = 2, ξ = 0.75 and initial conditions (p0 = 0.05, q0 = 0.95) at MCS = 0; 8; 22; 100; 200; 500; 8000; 20,000.
[Figure 6: four panels, each plotting the share of the population (vertical axis) against Monte Carlo steps (horizontal axis) for two series: auditors inspecting firms and non-compliant firms.]
Fig. 6. Time evolution of the long run average state of each population for k = 1, β¯ = 2, initial conditions (p0 = 0.05, q0 = 0.95) and MCS ∈ [3,000, 500,000]; MCS in log-normal scale; from top left to bottom right, ξ = 0.60; ξ = 0.65; ξ = 0.75; ξ = 0.90.
stripe to an extent that the growth of compliant firms slows down and becomes negative until all compliant firms disappear from the population. Once this happens, the cluster with inspecting auditors starts to increase again at a very slow speed until taking over the population, but without any impact on compliance given all compliant firms were driven out of the country. The evolution of the state of both populations is displayed in the left panel of Fig. 7. Increasing c further, i.e., c > c¯, the stage game in (3) has one unique Nash equilibrium in pure strategies p∗ = 0 and q∗ = 1, i.e., a pollution trap. Within this range of values in which ξ > 1, the numerical simulations showed that the only evolutionary equilibrium in both the well-mixed population and the spatial game matches the Nash equilibrium of the classic game. The right panel in Fig. 7 shows the evolution of both populations for the case ξ = 1.02.

Proposition 4. When the game played between auditors and firms is modelled assuming spatially distributed populations, corporate environmental compliance is generally more likely to be adopted than in games assuming well-mixed populations.

Discussion: The statement follows directly from the results discussed in Propositions 1–3. In Proposition 1, the long run average value of q was q¯ SPATIAL ≡ q∗ = 0, while in Proposition 2, based on the long run behaviour of the population of firms displayed in Fig. 6, the spatial grid favoured environmental compliance given that results showed a long run average value of q below the one
found in the well-mixed population case. Thus, except for the case discussed in Proposition 3 corresponding to ξ → 1, for ξ ∈ [0.05; 0.90], q¯ SPATIAL < q¯ WELL-MIXED ≡ ξ. As a final word regarding the results of our 3-parameter model, whose payoff matrix in Eq. (4) depends on ξ, β¯ and k, our focus in the discussion of the model was on how changes in the cost of inspection would affect results given such cost is the only parameter the policy maker can control without having to rely on the firms’ actions. As discussed in Section 4, given that the profitability parameter k does not impact the best reply structure of the evolutionary game, for the sake of completeness we studied how changes in parameter β¯, associated with the technology adopted by polluting firms, could impact the spatial game and the robustness of the results discussed in the Propositions. Simulations showed that results were robust. For low to intermediate cost of inspection, the robust evolutionary equilibrium (p∗ = 1, q∗ = 0) discussed in Proposition 1, in which the pollution trap disappears in the long run, was the only possible outcome of the spatial game regardless of the value of β¯ ∈ (1.0, +∞) adopted. Regarding the results in Proposition 2, we found the same evolutionary patterns when β¯ varied, i.e., there was either a robust evolutionary equilibrium (p∗ = 1, q∗ = 0) or an oscillatory pattern similar to that in the well-mixed game. One important effect of β¯ found in the numerical simulations was that, for a given value of ξ, starting with β¯ just above 1.0 and increasing β¯, there was a
[Figure 7: two panels plotting the share of the population (vertical axis, 0 to 1) against Monte Carlo steps (horizontal axis) for two series: auditors inspecting firms and non-compliant firms.]
Fig. 7. Time evolution of the state of each population for k = 1, β¯ = 2 and initial conditions (p0 = 0.05, q0 = 0.95); ξ = 0.99 (left panel) and ξ = 1.02 (right panel).
[Figure 8: four panels plotting the share of the population (vertical axis, 0 to 1) against Monte Carlo steps (horizontal axis) for two series: auditors inspecting firms and non-compliant firms. Panel titles: k=1.0, beta=1.1; k=1.0, beta=1.5; k=1.0, beta=2.2; k=1.0, beta=2.25; each at 75% of cmax.]
Fig. 8. Time evolution of the state of each population for k = 1, ξ = 0.75, initial conditions (p0 = 0.05, q0 = 0.95) and different values of β¯; from top left to bottom right, β¯ = 1.1; β¯ = 1.5; β¯ = 2.2; β¯ = 2.25.
threshold value of the latter above which the oscillatory behaviour in the spatial game ceased to exist. Above such threshold value of β¯, the only possible evolutionary pattern is the convergence to the evolutionary equilibrium (p∗ = 1, q∗ = 0). Thus, in the spatial game, for a given cost of inspection, the auditing process is more effective the dirtier the technology employed by the polluting firms. In Fig. 8 we present such behaviour for numerical simulations using k = 1, ξ = 0.75 and different values of β¯. As can be seen, as β¯ increases, inspection becomes stricter with a higher average percentage of auditors inspecting firms. As a result, the average degree of compliance also increases. For β¯ = 2.25 and above, the simulations did not display an oscillatory behaviour, with all realizations leading to the evolutionary equilibrium in which the
pollution trap disappears. The same pattern was found for different values of ξ , with oscillations being possible for larger intervals of β¯ the larger the value of ξ was. As a reference, for ξ = 0.65, oscillation was a possible evolutionary outcome for β¯ ∈ (1.0, 2.1 ), while it was possible to observe oscillations for β¯ ∈ (1.0, 2.25 ) when ξ = 0.75 and for β¯ ∈ (1.0, 2.28 ) when ξ = 0.90. One important remark that can be observed in the results in Fig. 8 is that the result in Proposition 4 is robust for different values of β¯ , given that the average value of compliance in the spatial game is always higher than in the well-mixed game. Both the discussion above regarding variations in β¯ and the results in Propositions 1–3 for variations in the parameter related to the cost of inspection ξ can be compared to the results found in
models of cyclic dominance as in Szabó et al. (2004), Szolnoki and Szabó (2004) and Szolnoki et al. (2014). With regard to ξ, as discussed in Proposition 1, starting with low to intermediate values of that parameter, there was always an absorbing state corresponding to the evolutionary equilibrium (p∗ = 1, q∗ = 0), as in Fig. 3. As seen in Proposition 2, increasing ξ, there could also appear global oscillations in the values of the shares of strategy adoption in both populations, i.e., in the values of p and q, as shown in Fig. 4. And increasing ξ further, such that ξ → 1 as in Proposition 3, another absorbing state would be reached, corresponding to the evolutionary equilibrium (p∗ = 1, q∗ = 1). Such pattern found in our square lattice with varying parameter values ξ or β¯ has similarities with the pattern corresponding to the introduction of annealed randomness in a square lattice in Szabó et al. (2004). In the latter, global oscillation could only be observed between two threshold values of the degree of annealed randomness, as discussed in Section 2.3. In particular, for those intermediate to high values of ξ discussed in Proposition 2, there would also be local oscillations regarding strategy adoption in the square lattice. The latter can be seen in Fig. 5. If one observes for example the Southwest corner of the square lattice corresponding to the firms’ population in MCS = 200, 500, 8000, 20,000, it is possible to see strategy C (white region) in MCS=200 being locally replaced by strategy P (black region) in MCS=500 and the latter being replaced back by C in MCS=20,000. Similar patterns of local oscillations over time can be observed in other regions of the lattice for both populations. These local oscillations in Fig. 5 corresponding to intermediate to high values of ξ synchronize into global oscillations of p and q in Fig. 4 of our work and such behaviour is similar to the example of local oscillations synchronizing into global oscillations that is displayed in figure 3 in Szabó et al. (2004) for the case when the measure of annealed randomness P falls between the two threshold values, i.e., P1 < P < P2, corresponding to the upper limit for local oscillations only (P1) and to the lower limit corresponding to convergence to an absorbing state (P2).

6. Conclusion

We studied the interplay between corporate environmental compliance and policy regulation in a context where compliance enforcement requires costly inspections and there is a trade-off between maximizing social welfare and maximizing private profits. An evolutionary game was used to model such trade-off involving the conflict of interests between firms and the policy maker. We started with a benchmark model assuming well-mixed populations, an approach that was in line with previous evolutionary game models found in the literature such as in Anastasopoulos and Anastasopoulos (2012), Cressman et al. (1998) and Xiao and Yu (2006), among others. Using replicator dynamics to model the game, we did not find a long run evolutionary equilibrium. Instead, there was an oscillatory evolutionary pattern in which the state of each population kept changing over time, similar to the pattern found in the Lotka–Volterra model. We then extended the benchmark model by assuming the two populations were allocated in a square grid, with each location shared by a firm and an auditor. This is more in line with the real world where firms are geographically spread and auditors tend to work close to where they live. The extended model displayed quite rich dynamics in which not only cyclical evolution was still possible but robust evolutionary equilibria could exist, depending on the cost of inspection.
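As a hedged illustration of the Lotka–Volterra-like cycling recalled above for the well-mixed benchmark, the sketch below integrates a stylized two-population replicator system. It is not the payoff matrix of this paper: the functional form, the hypothetical parameter `a` (the interior share of inspecting auditors) and the step size are assumptions, chosen only so that the interior rest point has q = ξ, in line with the well-mixed long run average quoted in the discussion of Proposition 4. For this assumed form the quantity returned by `invariant` is constant along trajectories, which is what keeps the orbits closed.

```python
import math


def rhs(p, q, xi, a):
    """Stylized replicator field (an assumption, not the paper's Eq. 4).

    p: share of inspecting auditors, q: share of non-compliant firms.
    Inspecting spreads when polluters exceed xi; polluting spreads
    when inspectors are scarcer than the hypothetical threshold a.
    """
    dp = p * (1.0 - p) * (q - xi)
    dq = q * (1.0 - q) * (a - p)
    return dp, dq


def rk4_step(p, q, xi, a, h):
    """Advance (p, q) by one classical Runge-Kutta step of size h."""
    k1 = rhs(p, q, xi, a)
    k2 = rhs(p + 0.5 * h * k1[0], q + 0.5 * h * k1[1], xi, a)
    k3 = rhs(p + 0.5 * h * k2[0], q + 0.5 * h * k2[1], xi, a)
    k4 = rhs(p + h * k3[0], q + h * k3[1], xi, a)
    p_new = p + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
    q_new = q + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return p_new, q_new


def invariant(p, q, xi, a):
    """Constant of motion of the stylized system defined in rhs."""
    return (a * math.log(p) + (1.0 - a) * math.log(1.0 - p)
            + xi * math.log(q) + (1.0 - xi) * math.log(1.0 - q))


def simulate(p0, q0, xi, a, h=0.01, steps=20000):
    """Return the trajectory [(p, q), ...] starting from (p0, q0)."""
    path = [(p0, q0)]
    p, q = p0, q0
    for _ in range(steps):
        p, q = rk4_step(p, q, xi, a, h)
        path.append((p, q))
    return path


if __name__ == "__main__":
    # Initial conditions and xi taken from the figures; a = 0.5 is hypothetical.
    traj = simulate(p0=0.05, q0=0.95, xi=0.75, a=0.5)
    mean_q = sum(q for _, q in traj) / len(traj)
    print(f"time-averaged q over the run: {mean_q:.3f}")
```

Because the assumed system has a conserved quantity, neither strategy share settles down, which mirrors the absence of a long run equilibrium in the well-mixed case; the spatial model is not captured by this sketch.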
When the latter was very close to the maximum cost of inspection, the extended model showed that there was no advantage in the auditing process. The latter not only was not able to remove society from the pollution trap, but it also led the very few compliant firms located in the country to extinction. On the other hand, for low cost of inspection, there was a
long-run equilibrium in which auditors were committed to inspecting firms and firms complied with regulations.

Our results add to the discussion in the literature on whether audits improve environmental compliance. Using data from Michigan, USA, Evans, Liu, and Stafford (2011) found no significant effects of environmental auditing on manufacturing facilities’ compliance with the US hazardous waste regulations under the Resource Conservation and Recovery Act (RCRA). Differently from our work in which auditing is conducted by auditors representing the role of the policy maker, in Evans et al. (2011) firms are responsible for auditing themselves. Encouraging facilities to self-audit, i.e., transferring to the firms both the responsibility and cost of identifying their environmental liabilities, would be a cost-effective way to increase compliance should self-auditing indeed promote a long-run positive impact on environmental compliance, which was not endorsed by their results. Moreover, they also found no significant impact of auditing on the likelihood of the firm’s facility being inspected by regulators in the future. If one can associate self-auditing with a lower cost of inspection incurred by the policy maker, given the latter does not need to send an auditor to conduct the inspection, the results in our theoretical model would contrast with the empirical results in Evans et al. (2011). Of course, such comparison should be made carefully given that our model setup did not take into account self-auditing, in which case firms could consider internalizing such self-auditing cost in their private profit function when making decisions on technology and emission level, impacting the results. In another study, Khanna and Widyawati (2011) used 1995–96 data for a sample of facilities belonging to S&P500 firms and found that self-auditing firms are more likely to be in compliance with the Clean Air Act (CAA) environmental regulations.
They also account for the fact that firms might be reluctant to undertake an environmental self-auditing process because the latter can produce self-incriminating evidence. Some US states therefore provide stronger incentives to self-audit, such as audit privilege and audit immunity. The former allows corporations to lawfully refuse to disclose audit reports as long as they correct detected environmental violations during the self-auditing process and the latter gives a guarantee that regulators will not punish firms for detected violations that are disclosed and corrected. While both policies could bring incentives to self-audit, audit privilege could in practice be used to shield non-compliance and thus decrease compliance, while audit immunity could increase compliance. Khanna and Widyawati (2011) indeed found evidence that audit privilege reduces compliance. On the other hand, audit immunity has a positive but not statistically significant impact on compliance. Despite their results contrasting with those in Evans et al. (2011), the authors in the latter argue that results might differ due to differences between the samples of firms used in both works and the fact that while they study the impact of self-auditing on long-run compliance with RCRA regulations, Khanna and Widyawati (2011) study the impact of self-auditing on short-run compliance with CAA regulations. Thus, although the results in the latter are in line with those of our model, which found a positive impact of auditing on compliance for low cost of inspection, we again emphasize that one should be careful when comparing results between our theoretical model and those empirical studies.

Our model and results provide several roads for future research: as discussed in Section 2.3, a future structured population model could contemplate the impact of site-specific invasion probabilities on the results as well as the effect of the presence of quenched and annealed randomness in the square lattice as in Szolnoki and Perc (2016).
Quenched and annealed randomness could be associated with the fact that firms might tend to copy corporate practices of other local firms regarding pollution but managers might also interact with other managers from firms in the same sector that are
located in different regions, which might influence corporate practices and technology adoption. Site-specific invasion probabilities could be associated with different local laws that could make inspections easier or less costly in some regions of the lattice, thus affecting the payoff matrix entries differently across the square lattice, leading to site-specific strategy-revision probabilities. Payoff heterogeneity across the spatial structure of the game could also be studied using the approach in Amaral et al. (2016). They study the evolution of cooperation in one-population spatial games assuming that the game played at each pairwise interaction is drawn uniformly at random from a set of two different games, either a prisoner’s dilemma or a snowdrift game. The latter introduces payoff heterogeneity in the spatial game and they find that high levels of heterogeneity generally promote cooperation. As they point out, payoff heterogeneity is in line with the lack of evidence that perceived payoff of players never changes over time. Thus, there is no evidence supporting the key assumption in the majority of evolutionary game theory research that agents play the same game during each interaction. In our two-population model, cooperation could be auditors playing I and firms playing C, while payoff heterogeneity could result from several assumptions, for example from two randomly selected payoff matrices, one with low ξ and the other with high ξ, which would make the policy maker’s perceived cost of inspection in the social welfare function vary geographically and over time. For the evolution of cooperation in a two-population model, see da Silva Rocha (2017). On the cost of inspection, another possibility of further research could be in line with Cohen and Checko (2017).
In the latter, they study the cost-efficiency of the 74 local health jurisdictions (LHJs) in Connecticut, USA, to verify whether they are providing the optimal number of inspection services regarding food protection, private water wells, subsurface sewage disposal and child lead poisoning prevention. They address the question of whether or not LHJs could reduce their costs by providing more or fewer inspections as well as whether it would be better for some jurisdictions to perform different types of inspections together. Their regression model based on data over the period 2005–2012 finds that two small LHJs could share some inspection services in order to improve their efficiency. In the initial conditions of our spatial model, we opted for a policy in which the regulator selected a particular region of the country (a stripe in the lattice) to try to promote inspection. This could be associated with a few geographically close jurisdictions joining efforts to support the cost of inspection in order to promote environmental compliance. A future paper could analyse the case of geographically spread auditors playing strategy I at the initial conditions. Given the same cost of inspection, results from the centralized and decentralized enforcement policies could be compared in order to find the optimal policy.

Future research could also study the spatial game using different spatial structures and/or different intensities of selection. Although our model was based on an L × L square lattice with periodic boundary conditions and von Neumann neighbourhood, the game could be studied using different networks ranging from a numerical extension assuming the same lattice but with a Moore neighbourhood to more general spatial structures such as in Allen et al. (2017) in which the number of neighbours can vary. In the latter, they find a condition for how natural selection chooses between two strategies on any graph for weak selection.
Societies based on strong pairwise ties tend to promote cooperation. Also differently from our model, they allowed for individual self-interaction and replacement. Under weak selection, the game payoff matrix contributes only residually to the fitness of individuals (Nowak, Tarnita, & Antal, 2010). In the context of our problem, weak selection would imply the game pollution conflict had a lower degree of importance in society for both the firms’ profits and the policy maker’s social welfare.
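To make the two lattice variants and the notion of selection intensity mentioned above concrete, the following minimal sketch enumerates von Neumann and Moore neighbours on an L × L grid with periodic boundaries, and writes fitness with an intensity of selection w. The function names and the linear form 1 − w + w × payoff are illustrative assumptions (the linear form is a common textbook convention), not the implementation used in this paper.

```python
# Offsets (row, column) for the two neighbourhood types on a square lattice.
VON_NEUMANN = [(-1, 0), (1, 0), (0, -1), (0, 1)]
MOORE = VON_NEUMANN + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def neighbours(row, col, size, kind="von_neumann"):
    """Return neighbour coordinates of (row, col) on a size x size lattice.

    Periodic boundary conditions are applied through the modulo operator,
    so sites on an edge wrap around to the opposite edge.
    """
    if kind == "von_neumann":
        offsets = VON_NEUMANN
    elif kind == "moore":
        offsets = MOORE
    else:
        raise ValueError("kind must be 'von_neumann' or 'moore'")
    return [((row + dr) % size, (col + dc) % size) for dr, dc in offsets]


def fitness(payoff, w):
    """Fitness with selection intensity w in [0, 1] (assumed linear form).

    w = 0 ignores the game entirely; w = 1 makes fitness equal to payoff;
    small w corresponds to weak selection.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return 1.0 - w + w * payoff


if __name__ == "__main__":
    print(neighbours(0, 0, 5))                 # 4 sites, wrapping at the edges
    print(len(neighbours(0, 0, 5, "moore")))   # 8 sites
    print(fitness(3.0, 0.01))                  # payoff barely moves fitness
```

Switching `kind` is the only change needed to move from four to eight neighbours per site, which is why the Moore extension is described above as a numerical extension of the same lattice.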
The impact of information spreading about the inspection of firms could also be studied. In our model, we assume firms do not know the state of the auditors population despite their expected payoff depending on it. A future model could take into account a modification based on Kumacheva, Gubar, Zhitkova, Kurnosykh, and Skovorodina (2017). In their model, they study the interaction between taxpayers and tax authorities using both a well-mixed and a structured population model. They combine game theory with an SIS (susceptible-infected-susceptible) epidemic model. The latter models the propagation of information among taxpayers regarding the probability of a tax audit taking place. They also take into account that taxpayers might be risk-loving, risk-neutral or risk-averse when deciding whether to pay taxes or not. In a first future approach, we could adapt their framework into our model by assuming that polluters would be equivalent to the risk-loving agents, i.e., they would always be non-compliant independently of the state of the auditors population. Risk-averse agents would be the compliant firms. And there would be a third type of firm, which would resemble the risk-neutral agent in Kumacheva et al. (2017): assuming the firm could make use of abatement equipment in order to switch technology and emission level between fulfilling compliance or not, it would be able to play either strategy P or C. The decision on which strategy to follow would depend on the combination of two effects: (i) the effect of being informed or not about the risk of suffering an inspection. The probability of being informed would be modelled using an SIS-type model governing the evolution of the proportions I and S of informed (infected) and uninformed (susceptible) firms, respectively, i.e., İ = βSI − δI and Ṡ = −İ, with I + S = 1 and δ < β. Thus, such probability would be equal to the asymptotically stable proportion of informed firms I∗ = 1 − δ/β.
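For completeness, the steady state quoted above follows in a few lines from the SIS equations as written in the text, after substituting S = 1 − I. The function name f is introduced here only for this derivation.

```latex
\begin{align*}
\dot{I} &= \beta (1 - I) I - \delta I = I\,(\beta - \delta - \beta I) \equiv f(I), \\
f(I) = 0 &\;\Longrightarrow\; I = 0 \quad\text{or}\quad I^{*} = 1 - \frac{\delta}{\beta}, \\
f'(I) &= \beta - \delta - 2\beta I, \qquad f'(0) = \beta - \delta > 0, \qquad f'(I^{*}) = -(\beta - \delta) < 0 .
\end{align*}
```

Hence, for δ < β, the state with no informed firms is unstable and I∗ is asymptotically stable. As a purely illustrative numerical example, β = 0.5 and δ = 0.1 would give I∗ = 0.8.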
In line with the SIS model, β would be the probability of an uninformed firm receiving information from an informed firm about the risk of suffering an inspection and δ would be the probability of such information becoming out of date; (ii) the firm’s tolerance to risk, i.e., in the spatial model each risk-neutral firm would have an implicit threshold probability p_i, which would model the maximum risk of inspection that it would accept to take. In each MCS, auditors would either play I or N as in our model, but the policy maker would make public the true probability p_MCS of an average firm suffering an inspection, which would be equal to the state of the auditors population. Combining the assumptions (i) and (ii) above, in each MCS when a risk-neutral firm was selected to play the spatial game, a random number i would be drawn. If i > I∗, the firm would be uninformed about the risk of inspection and would choose to play C. Otherwise, the firm would be informed and if p_i > p_MCS (resp. p_i < p_MCS), the firm would play strategy P (resp. C) in that MCS when interacting with its neighbours. Such framework would also satisfy bounded rationality.

Finally, regarding the limitations of our model, in addition to the issues discussed above and suggested for future research, we recall that we studied the problem of pollution assuming large populations of firms and auditors. As discussed in Pacheco et al. (2014), the conflicting problem could take place in a country or region where the number of interacting agents is small, i.e., “real populations are finite and often rather small”. In such case, the replicator dynamics model for well-mixed populations should be replaced by a model taking into account small population effects such as drift.
Differently from large-population models following replicator dynamics, where the fittest strategy takes over the entire population, in finite populations there is a fixation probability that the strategy with lowest fitness might take over the population (Nowak, 2006). In the same vein, the spatial model would require a square lattice with a smaller planar area L².
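As a hedged numerical illustration of the finite-population point above, the sketch below evaluates the textbook fixation probability of a single mutant with constant relative fitness r in a Moran process of size N. The Moran process is an assumption made here purely for illustration (it is not the model studied in this paper); the closed form is the standard one discussed, for instance, in Nowak (2006).

```python
def fixation_probability(r, n):
    """Fixation probability of one mutant in a Moran process (assumed model).

    r: relative fitness of the mutant (r < 1 means it is less fit),
    n: total population size.
    Uses rho = (1 - 1/r) / (1 - 1/r**n), with the neutral limit 1/n at r = 1.
    """
    if n < 1:
        raise ValueError("population size must be at least 1")
    if r <= 0.0:
        raise ValueError("relative fitness must be positive")
    if abs(r - 1.0) < 1e-12:
        return 1.0 / n  # neutral drift: every individual is equally likely to fix
    return (1.0 - 1.0 / r) / (1.0 - r ** (-n))


if __name__ == "__main__":
    for size in (10, 100):
        rho = fixation_probability(0.9, size)
        print(f"N = {size}: disadvantaged mutant fixes with probability {rho:.2e}")
```

Under these assumptions a strategy with lower fitness still takes over with positive probability, and that probability falls as the population grows, which is why the effect is negligible in the large-population setting studied here.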
Acknowledgements This work is supported by “Programa de Incentivo à Produtividade em Ensino e Pesquisa” from PUC-Rio. We thank the referees for their insightful suggestions and comments, which helped us to improve the manuscript. References Allen, B., Lippner, G., Chen, Y.-T., Fotouhi, B., Momeni, N., Yau, S.-T., & Nowak, M. A. (2017). Evolutionary dynamics on any population structure. Nature, 544, 227–230. Amaral, M. A., Wardil, L., Perc, M., & da Silva, J. K. L. (2016). Evolutionary mixed games in structured populations: Cooperation and the benefits of heterogeneity. Physical Review E, 93, 042304. Anastasopoulos, N. P., & Anastasopoulos, M. P. (2012). The evolutionary dynamics of audit. European Journal of Operational Research, 216(2), 469–476. Arguedas, C., & Hamoudi, H. (2004). Controlling pollution with relaxed regulations. Journal of Regulatory Economics, 26(1), 85–104. Arthur, W. B. (1994). Inductive reasoning and bounded rationality. The American Economic Review, 84(2), 406–411. Bower, G. H., & Hilgard, E. R. (1981). Theories of learning. Prentice-Hall. Casasnovas, J. P. (2012). Evolutionary games in complex topologies: interplay between structure and dynamics. Springer-Verlag. Chen, X., Szolnoki, A., & Perc, M. (2015). Competition and cooperation among different punishing strategies in the spatial public goods game. Physical Review E, 92, 012819. Cohen, J. P., & Checko, P. J. (2017). Too big, too small, or just right? Cost-efficiency of environmental inspection services in connecticut. Health Services Research, 52, 2285–2306. Crawford, V. P. (2013). Boundedly rational versus optimization-based models of strategic thinking and learning in games. Journal of Economic Literature, 51(2), 512–527. Cressman, R., Morrison, W. G., & Wen, J. F. (1998). On the evolutionary dynamics of crime. The Canadian Journal of Economics, 31(5), 1101–1117. Dasgupta, S., Laplante, B., Wang, H., & Wheeler, D. (2002). Confronting the environmental kuznets curve. 
The Journal of Economic Perspectives, 16(1), 147–168. Duff, T. J., Chong, D. M., & Tolhurst, K. G. (2015). Using discrete event simulation cellular automata models to determine multi-mode travel times and routes of terrestrial suppression resources to wildland fires. European Journal of Operational Research, 241, 763–770. Evans, M. F., Liu, L., & Stafford, S. L. (2011). Do environmental audits improve long-term compliance? Evidence from manufacturing facilities in Michigan. Journal of Regulatory Economics, 40(3), 279–302. Friedman, J. W., & Mezzetti, C. (2002). Bounded rationality, dynamic oligopoly, and conjectural variations. Journal of Economic Behavior and Organization, 49, 287–306. Gardner, M. (1970). Mathematical games. Scientific American, 223, 120–123. Heyes, A. G. (1998). Making things stick: enforcement and compliance. Oxford Review of Economic Policy, 14(4), 50–63. Hofbauer, J., & Sigmund, K. (1998). Evolutionary games and population dynamics. Cambridge University Press. Holland, J. H., Holyoak, K. J., Nisbett, R. E., & Thagard, P. R. (1986). Induction. MIT Press. Husted, B. W., & Salazar, J. D. J. (2006). Taking Friedman seriously: maximizing profits and social performance. Journal of Management Studies, 43, 75–91. Khanna, M., & Widyawati, D. (2011). Fostering regulatory compliance: the role of environmental self-auditing and audit policies. Review of Law and Economics, 7, 129–164.
Klein, J. G., Smith, N. C., & John, A. (2002). Exploring motivations for participation in a consumer boycott. Advances in Consumer Research, 29, 363–369. Kumacheva, S. S., Gubar, E. A., Zhitkova, E. M., Kurnosykh, Z., & Skovorodina, T. (2017). Modelling of information spreading in the population of taxpayers: evolutionary approach. Contributions to Game Theory and Management, 10, 100–128. Malik, A. S. (2007). Optimal environmental regulation based on more than just emissions. Journal of Regulatory Economics, 32, 1–16. Mao, X., & He, C. (2018). A trade-related pollution trap for economies in transition? Evidence from China. Journal of Cleaner Production, 200, 781–790. Nowak, M. A. (2006). Evolutionary dynamics: Exploring the equations of life. Harvard University Press. Nowak, M. A., Tarnita, C. E., & Antal, T. (2010). Evolutionary dynamics in structured populations. Philosophical Transactions of the Royal Society B, 365, 19–30. Pacheco, J. M., Vasconcelos, V. V., & Santos, F. C. (2014). Climate change governance, cooperation and self-organization. Physics of Life Reviews, 11(4), 573–586. Pereira, M. A., Martinez, A. S., & Espíndola, A. L. (2008). Exhaustive exploration of prisoner’s dilemma parameter space in one-dimensional cellular automata. Brazilian Journal of Physics, 38(1), 65–69. Rumelhart, D. (1980). Schemata: The building blocks of cognition. In R. Spiro, B. Bruce, & W. Brewer (Eds.), Theoretical issues in reading comprehension (pp. 33–58). Erlbaum. Schank, R., & Abelson, R. P. (1977). Scripts, plans, goals, and understanding: An inquiry into human knowledge structures. Erlbaum. da Silva Rocha, A. B. (2017). Cooperation in the well-mixed two-population snowdrift game with punishment enforced through different mechanisms. Advances in Complex Systems, 20(04n05), 1750010. Sugita, M., & Takahashi, T. (2015). Influence of corporate culture on environmental management performance: An empirical study of Japanese firms. 
Corporate Social Responsibility and Environmental Management, 22, 182–192. Szabó, G., Szolnoki, A., & Izsák, R. (2004). Rock-scissors-paper game on regular small-world networks. Journal of Physics A: Mathematical and General, 37, 2599–2609. Szolnoki, A. (2014). The power of games: Comment on “Climate change governance, cooperation and self-organization” by Pacheco, Vasconcelos and Santos. Physics of Life Reviews, 11(4), 589–590. Szolnoki, A., Mobilia, M., Jiang, L.-L., Szczesny, B., Rucklidge, A. M., & Perc, M. (2014). Cyclic dominance in evolutionary games: a review. Journal of the Royal Society Interface, 11, 20140735. Szolnoki, A., & Perc, M. (2016). Biodiversity in models of cyclic dominance is preserved by heterogeneity in site-specific invasion rates. Scientific Reports, 6, 38608. Szolnoki, A., & Szabó, G. (2004). Phase transitions for rock-scissors-paper game on different networks. Physical Review E, 70, 037102. The Economist (1995). Saints and sinners (pp. 15–16). The Economist. June 24th Traulsen, A., Semmann, D., Sommerfeld, R. D., Krambeck, H.-J., & Milinski, M. (2010). Human strategy updating in evolutionary games. PNAS, 107(7), 2962–2966. Willensdorfer, M., & Nowak, M. A. (2005). Mutation in evolutionary games can increase average fitness at equilibrium. Journal of Theoretical Biology, 237, 355–362. Xiao, T., & Yu, G. (2006). Supply chain disruption management and evolutionarily stable strategies of retailers in the quantity-setting duopoly situation with homogeneous goods. European Journal of Operational Research, 173, 648–668. Zhang, J., Zhang, C., Cao, M., & Weissing, F. J. (2015). Crucial role of strategy updating for coexistence of strategies in interaction networks. Physical Review E, 91, 042101. Zhu, Q.-H., & Dou, Y.-J. (2007). Evolutionary game model between governments and core enterprises in greening supply chains. Systems Engineering – Theory and Practice, 27(12), 85–89.