On the effectiveness of service differentiation based resource-provision incentive mechanisms in dynamic and autonomous P2P networks


Computer Networks 55 (2011) 3811–3831

Contents lists available at SciVerse ScienceDirect

Computer Networks journal homepage: www.elsevier.com/locate/comnet

Yufeng Wang a,b,⇑, Akihiro Nakao c, Athanasios V. Vasilakos d, Jianhua Ma e

a State Key Lab of Networking & Switching Technology, No. 10, XiTuCheng Road, HaiDian District, Beijing 100083, BUPT, China
b Nanjing University of Posts and Telecommunications, No. 66, XinMoFan Road, GuLou District, Nanjing 210003, China
c University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
d University of Western Macedonia, Krinis 3, N.Erythraia, Greece 14671, Greece
e Hosei University, 3-7-2 Kajino-cho, Koganei-shi, Tokyo 184-8584, Japan

Article info

Article history: Received 9 August 2010; Received in revised form 12 April 2011; Accepted 5 July 2011; Available online 26 July 2011

Keywords: Peer-to-Peer (P2P); Incentive mechanisms; Public goods game; Service differentiation

Abstract

Intrinsically, P2P (Peer-to-Peer) networks are anonymous, dynamic and autonomous, which has the following implications: users can change their identities at near-zero cost (cheap pseudonyms); most interactions are one-time (that is, a peer knows nothing about other peers' behavior history, only their current behavior); and all behaviors and actions are endogenous, voluntarily chosen and determined by independent and rational peers. On the other hand, service differentiation based incentive mechanisms have been proposed for P2P networks. They can basically be provided in two ways: punish defecting behavior (punishment-based scheme), or reward cooperative behavior (reward-based scheme). The natural question is then: in the above P2P networking environment, how can service differentiation based resource-provision incentive mechanisms be designed effectively? Our contributions are threefold. First, we find that traditional service differentiation based incentive schemes cannot successfully encourage peers to contribute resource to the whole system, whether they are punishment-based or reward-based. Second, if peers can voluntarily join the system and a small entry fee is set for participation, we obtain that the performance of the punishment-based scheme (first providing high-level service, plus punishment) is always better than that of the reward-based scheme (first providing low-level service, plus reward). Finally, unlike the existing result that was based on persistent user identities and truly repeated interactions, we illustrate that the punisher's average payoff in the punishment-based scheme is almost the same as in the ideal but infeasible case of the reward-based scheme, in which rewarders can selectively reward other rewarders and the reward cost is zero.

© 2011 Elsevier B.V. All rights reserved.

⇑ Corresponding author at: State Key Lab of Networking & Switching Technology, BUPT, China (Y. Wang).
1389-1286/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.comnet.2011.07.011

1. Introduction

P2P systems are self-organizing, distributed resource-sharing networks. By pooling together the resources of many autonomous users, they provide an inexpensive and highly scalable platform for distributed computing, storage, data sharing, etc. Note that there are two extreme cases in resource management: resource allocation (allocation of an existing resource) and resource provision (provision of a resource shared by all participants). In the first case, the designer should decide whether, and what percentage of, a good (with given predefined capacity) each peer should consume. In the second case, the designer's task is to entice independent participants to provide resource (each with its proper share). In this paper, we focus on the latter case. Furthermore, in its simplest form, each peer acquires and contributes exactly the same amount of resource. Even in this extremely simple case, stimulating participants to voluntarily provide resource is still a very challenging task in anonymous, dynamic and autonomous P2P systems.

In P2P systems, it is imperative for peers to voluntarily contribute resources (e.g., storage, bandwidth and content). However, each peer would intuitively prefer to "free ride" on the contributions of other peers, consuming available resources and services without contributing anything, and thus avoid the corresponding costs. It was reported that nearly 70% of Gnutella users share nothing with other users (these users simply free-ride on other users who share information), and that nearly 50% of all file search responses come from the top 1% of information-sharing nodes [1]. A follow-up study five years later found that 85% of users share nothing [2], which implies that the free-riding problem had become worse in the intervening years.

Generally, lack of cooperation is one of the key problems confronting today's P2P systems, and incentive mechanisms play a crucial role in encouraging cooperation among autonomous nodes. Specifically, a simple rule-based (differential services based) incentive mechanism was preliminarily advocated by [3] to encourage resource provision in PlanetLab (http://www.planet-lab.org/), the most popular shared network testbed. However, that work did not investigate in depth how to effectively design service differentiation based resource-provision incentive mechanisms in anonymous, dynamic and autonomous P2P systems. General P2P networks should be anonymous, dynamic and autonomous.
By anonymous we mean that, ideally, users would like to be anonymous and not accountable for their actions; that is, users can change their identities at near-zero cost (cheap pseudonyms). By dynamic we mean that most interactions among peers are one-time; that is, a peer knows nothing about other peers' behavior history, only their current behavior. By autonomous we mean that there is no central management entity to assign peers to different classes. Specifically, each strategy and action is voluntarily chosen and determined by independent, rational and autonomous peers, and all behaviors are endogenous; that is, no exogenous organization can effectively enforce punishment and/or reward.

In this networking environment, in the simplest case, we can define two categories of service: high-level and low-level. From a practical viewpoint, the following two service differentiation based incentive schemes are then possible:

• Initially, all peers are served with the high-level class; then, according to peers' behaviors, voluntary punishers lower the service level that they provide to defectors. We call this the punishment-based scheme.
• Initially, all peers are served with the lower-level class; then, according to peers' behaviors, voluntary rewarders raise the service level that they provide to cooperators. We call this the reward-based scheme.

Note that the reward-based scheme has to start from low-level service for the following reason: in dynamic and autonomous P2P networks, no exogenous organization exists to enforce punishment and/or reward, for example by reimbursing cooperative behavior with some out-of-band resource (or money). Rewarders therefore have to set aside some of their own resource for future rewards, which means that those peers have to provide relatively lower-level service initially.
Then, in the above anonymous, dynamic and autonomous P2P environment, the natural question is: do both service differentiation schemes perform well, and if not, which is better at incentivizing peers' cooperative (or reciprocative) behavior? This paper attempts to answer this question based on an evolutionary game model. Our contributions are threefold. First, we prove that, when there is a punishment (or reward) cost, traditional service differentiation based incentive schemes might not work; that is, they cannot stimulate peers to provide resource. Second, if peers can voluntarily join the system and a small entry fee is set for participation, we find that the scheme that first provides high-level service plus punishment performs better than the scheme that first provides low-level service plus reward. Finally, unlike the existing conclusion that was based on persistent user identities and truly repeated interactions, we illustrate that the punisher's average payoff in the punishment-based scheme is almost the same as in the ideal but infeasible case of the reward-based scheme, in which rewarders can selectively reward other rewarders and the reward cost is zero.

The paper is organized as follows. Section 2 briefly describes the related work on service differentiation based incentive mechanisms and how it differs from our work. Section 3 provides the architecture of service differentiation based incentive mechanisms. Specifically, we briefly summarize the features of the anonymous, dynamic and autonomous P2P environment, and then analyze why, in this environment, traditional service differentiation based incentive mechanisms cannot stimulate peers to provide resource to the whole system. Based on the public goods game, we propose analytical models of punishment-based and reward-based incentive mechanisms for resource provision, and discuss the system design of entry fee enabled, service differentiation based incentive mechanisms, which aim at stimulating peers to contribute resource in community-based P2P applications with the feature of "contributing while consuming". In Section 4, we theoretically analyze the dynamics of the system by deriving the fixation probability between each pair of pure strategies. The simulation and theoretical results in Section 5 illustrate that the voluntary principle plus a small entry fee in the punishment-based scheme can effectively encourage peers to provide resource. Section 6 briefly discusses related problems in the entry fee enabled, service differentiation based incentive mechanisms. Finally, we briefly conclude the paper.


2. Related work

Recently, some price-based market approaches (e.g., free markets, commodity markets and auctions) have been proposed to maximize certain system-level goals in the face of peers' rational behavior [4]. However, even though such mechanisms might theoretically lead to an optimal allocation in economic terms, they are extremely complex, unpredictable and unattractive: for example, they require a proper virtual currency, trusted third parties and detailed accounting, and they suffer from the standard problems of virtual currencies (inflation, deflation, etc.). Thus, instead of trading resource units, a large part of the research (and practice) on P2P incentive mechanisms considers the design and deployment of simple rules based on reciprocity or fixed contributions. Specifically, a simple fixed-contribution scheme was proposed to alleviate free-riding [5], in which each peer pays the same fixed fee toward the total cost, and peers unwilling to do so are excluded. Furthermore, it was found that imposing a penalty on all users that join the system is effective under many scenarios; in particular, system performance degrades significantly only when the turnover rate among users is high [6]. One weak point of these fixed-contribution schemes is that the fixed contribution cannot be an arbitrary value: it must be set to a specific value, which has to be calculated from global information in the network. Inspired by the public goods game, our previous work proposed a punishment-based resource provision mechanism for P2P networks, in which an arbitrarily small entry fee is set for all peers and peers can voluntarily join the P2P resource provision system. Theoretical analysis and experimental results show that the proposed mechanism can incentivize peers to contribute resource, and that the whole P2P network almost converges to the punisher state [7].
This paper significantly extends our previous work by investigating in depth the effectiveness of service differentiation based incentive mechanisms (punishment-based and reward-based) in anonymous, dynamic and autonomous P2P networks.

Considering the rationality of each participant, game theory is an appropriate tool for analyzing and designing incentive mechanisms in P2P systems. For example, a game-theoretic framework for incentives in P2P systems was preliminarily proposed to eliminate free riding and increase the overall availability of the system [8]. However, like that work, most classical game theory based incentive mechanisms adopt the best-response model to analyze the equilibrium of a rational system: a player revises its strategy by choosing the best reply to the current mean population strategy, and each player wants to maximize its utility, which depends on its benefit (the system resources it can use) and its cost (its contribution). We argue that this approach has at least two drawbacks. First, it cannot completely characterize the evolutionary dynamics of strategies as a whole. That is, examining a single equilibrium does not give much insight into how strategies spread in the presence of strategy mutation. By strategy mutation we mean that some users may occasionally deviate from perfect rationality and arbitrarily change their strategies out of curiosity or by mistake. Second, those schemes assume a forward-looking model of perfectly rational participants, which works on the principle of utility optimization (each peer is assumed to have global information and to select its strategy as a best response to the current state of the system). Such perfect rationality places a great burden on a peer's cognitive ability, and in most cases it is infeasible.
In contrast, our paper adopts an approach inspired by EGT (Evolutionary Game Theory), in which individuals attempt to optimize their utilities by imitating the behavior of peers with better payoffs. This stochastic learning is a backward-looking approach, and thus assumes much lighter cognitive capabilities on the part of individuals than traditional rationality does. Note that a general framework has been proposed to analyze and design reciprocation-based incentive protocols in P2P networks [9,10], in which peers distributively learn and adapt their actions. In particular, the authors preliminarily show the correlation between evaluating incentive protocols and EGT. To address several weak points in [9,10], our previous work [11] thoroughly investigated, based on EGT, the evolutionary dynamics of a soft security mechanism, namely the reciprocity-based incentive mechanism in P2P systems. Instead of the pairwise interaction models that are often adopted to characterize peers' interactions, this paper uses the more realistic public goods game to characterize the interaction among peers, and designs a Markov model to calculate the stationary distribution of the various strategies.

Generally, EGT-based approaches include three phases: an interaction phase, an evolution phase and a mutation phase [12].

• The interaction phase specifies the rules by which entities interact. Interactions among individuals are usually modeled as a specific game, such as the Prisoner's Dilemma (PD) game, the coordination game or the snowdrift game [13].
• In the evolution (reproduction) phase, each agent differentially reproduces children based on its utility. The reproduction can be genetic (entities actually reproduce the next generation) or cultural (entities are seen as behaviors or ideas that can replicate horizontally among peers within a generation). The cultural interpretation of reproduction gives us a clue as to how evolutionary models can be used in modeling P2P incentive mechanisms.
• The mutation phase means that, during the evolution, agents change their strategies with very small probability to incorporate innovation. In our framework, a small mutation can be intuitively interpreted as a small percentage of peers trying to explore the "new world". Note that the mutation phase cannot be appropriately modeled in classical game theory, which is another motivation for designing EGT-based incentive schemes.
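The three phases listed above can be sketched as a small simulation loop. This is a minimal illustration only, not the model analyzed later in the paper: the two-strategy payoff values, the population size, and the use of the Fermi (pairwise comparison) imitation rule are all assumptions chosen for brevity, and the names `interaction_phase`, `evolution_phase`, `mutation_phase` and `run` are hypothetical.

```python
import math
import random

# Hypothetical pairwise payoffs: PAYOFF[(mine, theirs)] is my payoff.
# The numbers are illustrative assumptions, not values from the paper.
PAYOFF = {
    ("C", "C"): 2.0, ("C", "D"): -1.0,
    ("D", "C"): 3.0, ("D", "D"): 0.0,
}
STRATEGIES = ["C", "D"]


def interaction_phase(population, rng):
    """Each peer plays one game against a randomly drawn other peer."""
    payoffs = []
    for i, strategy in enumerate(population):
        j = rng.randrange(len(population) - 1)
        if j >= i:
            j += 1  # shift the index so a peer never plays itself
        payoffs.append(PAYOFF[(strategy, population[j])])
    return payoffs


def evolution_phase(population, payoffs, beta, rng):
    """Cultural reproduction: a random learner imitates a random role model
    with a probability that grows with the payoff difference (Fermi rule,
    an assumed imitation rule; beta is the intensity of selection)."""
    learner = rng.randrange(len(population))
    model = rng.randrange(len(population))
    diff = payoffs[model] - payoffs[learner]
    if rng.random() < 1.0 / (1.0 + math.exp(-beta * diff)):
        population[learner] = population[model]


def mutation_phase(population, mu, rng):
    """With small probability mu, one random peer picks a random strategy."""
    if rng.random() < mu:
        population[rng.randrange(len(population))] = rng.choice(STRATEGIES)


def run(num_peers=50, rounds=2000, beta=1.0, mu=0.01, seed=0):
    """Repeat the three phases and return the final population."""
    rng = random.Random(seed)
    population = [rng.choice(STRATEGIES) for _ in range(num_peers)]
    for _ in range(rounds):
        payoffs = interaction_phase(population, rng)
        evolution_phase(population, payoffs, beta, rng)
        mutation_phase(population, mu, rng)
    return population
```

Calling `run()` and counting the strategies in the returned list gives one sample of the long-run composition; because the process is stochastic, different seeds give different trajectories.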

3. Architecture of service differentiation based incentive mechanisms

3.1. Description of anonymous, dynamic and autonomous P2P environment

Typically, the P2P networking environment is anonymous, dynamic and autonomous, which has the following implications:

• Peers can change their identities at near-zero cost (cheap pseudonyms), and most interactions among peers are one-time; that is, peers do not play repeatedly, or simply do not know that they play repeatedly with the same co-players (because of anonymity). In other words, a peer knows nothing about other peers' behavior history, only their current behavior. Thus, we cannot count on the belief that "my opponent tomorrow may condition her choice on my play today" to incentivize peers' cooperative behavior today.
• All behaviors and actions are endogenous, voluntarily selected and determined by independent, rational and autonomous peers. That is, no exogenous organization enforces punishment and/or reward, for example by reimbursing cooperative peers with some out-of-band resource (or money). Furthermore, in service differentiation incentive schemes, there is no central management entity to assign peers to different service classes.

Thus, service differentiation based incentive mechanisms should ideally have the following features, which make them robust and easy to implement: they should not require any sort of user memory, should not require the ability to permanently expel users from the group, and should not rely on tracking the long-term behavior of users. Note that BitTorrent [14] is an example of a real-world application focusing on bandwidth provisioning for P2P file-sharing networks. It implements a Tit-for-Tat-like reciprocative incentive scheme that relies not on peers' past transactions but on a direct exchange of resources (that is, only on the opponents' behavior in the previous round). Because the incentive scheme does not rely on tracking the long-term behavior of peers, it is simple to implement and largely immune to the problems of false trading and whitewashing.
In fact, interactions in BT-like applications are always pairwise: peers reciprocate by uploading to peers that upload to them, with the goal of having, at any time, several connections that are actively transferring in both directions. Our work (the entry fee enabled, service differentiation based incentive mechanism described in Section 3.3) also does not rely on tracking the long-term behavior of users: in each round, each user determines her strategy only by observing the behavior of users in the same group. However, our work differs from Tit-for-Tat-like mechanisms in two respects. First, we focus on the common goods aspect of P2P applications (that is, users contribute to the system as a whole while consuming). Second, the existing Tit-for-Tat-like schemes cannot fully characterize the evolutionary dynamics of the various strategies, because they do not consider the strategy mutation of users. By strategy mutation we mean that some users may occasionally deviate from perfect rationality and arbitrarily change their strategies out of curiosity or by mistake.

3.2. Failure of traditional service differentiation based incentive mechanisms

Basically, in the traditional punishment-based incentive mechanism, there are two types of peers, punishers and defectors, denoted P and D respectively. P cooperates with all other peers by offering benefit a_1 to the system (and bearing the corresponding service provision cost c_s). Meanwhile, P voluntarily punishes the defectors: c_p is the penalty that each P peer imposes on each D peer, and c_u is the cost that the P peer incurs for doing so. D does not provide resource to the system, and only consumes the resource provided by the whole system. Similarly, in the traditional reward-based incentive mechanism, there are also two types of peers, rewarders and defectors, denoted R and D respectively.
R cooperates with all peers by offering benefit a_2 to the system, and bears the corresponding service provision cost c_s. Meanwhile, R voluntarily offers an extra reward b_r to each peer who provides resource to the whole system, and bears the incurred cost c_r for this benevolent behavior. Normally, b_r is larger than c_r. Note that, as described in the previous section, for the same system with a fixed amount of resource, a_1 in the punishment-based scheme should be larger than a_2 in the reward-based scheme. This corresponds to the fact that the punishment-based scheme can provide high-level service from the start, whereas the reward-based scheme has to provide lower-level service initially. The following corollary can then be obtained straightforwardly:

Corollary 1. In the anonymous, dynamic and autonomous P2P networking environment, when there exist some 'C' mutants, the traditional service differentiation based incentive mechanisms do not work (that is, they cannot successfully stimulate peers to contribute resource to the system), whether they are punishment-based or reward-based.

A brief proof is given as follows. For simplicity, we adopt pairwise interaction to model peers' behavior in a well-mixed system. Note that, for multiplayer interaction, which can be modeled as a public goods game, we can draw the same qualitative conclusion. Specifically, for the punishment-based scheme, based on peers' behaviors, the payoff matrix A is given as follows:

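\[
A = \begin{array}{c|cc}
  & P & D \\ \hline
P & a_1 - c_s & -c_s - c_u \\
D & a_1 - c_p & 0
\end{array}
\qquad (1)
\]

where each entry is the payoff of the row strategy when it meets the column strategy.

Obviously, P and D are bistable, which means that, when choosing between P and D, each strategy is a best response to itself. Specifically, the typical dynamics are as follows: if the frequency of P peers, x_P, satisfies the inequality

\[
x_P > \frac{c_s + c_u}{c_p + c_u},
\]

then P will dominate; otherwise, D dominates. Usually, c_p \gg c_s, so the threshold is small and P dominates the system.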
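The bistability threshold can be checked numerically. The sketch below assumes the pairwise payoffs that follow from the definitions of a_1, c_s, c_p and c_u in this section (P against P earns a_1 - c_s, P against D earns -c_s - c_u, D against P earns a_1 - c_p, D against D earns 0); the function names and the numerical values are hypothetical and chosen only for illustration.

```python
def expected_payoffs(x_p, a1, c_s, c_p, c_u):
    """Expected pairwise payoffs of P and D in a well-mixed population
    in which a fraction x_p of peers are punishers."""
    payoff_p = x_p * (a1 - c_s) + (1.0 - x_p) * (-c_s - c_u)
    payoff_d = x_p * (a1 - c_p) + (1.0 - x_p) * 0.0
    return payoff_p, payoff_d


def punisher_threshold(c_s, c_p, c_u):
    """Punisher frequency above which P earns more than D."""
    return (c_s + c_u) / (c_p + c_u)


# Illustrative (assumed) parameters: the penalty c_p is well above c_s.
a1, c_s, c_p, c_u = 3.0, 1.0, 4.0, 0.5
threshold = punisher_threshold(c_s, c_p, c_u)

for x_p in (0.2, threshold, 0.5):
    p, d = expected_payoffs(x_p, a1, c_s, c_p, c_u)
    print(f"x_P = {x_p:.3f}: payoff(P) = {p:.3f}, payoff(D) = {d:.3f}")
```

Below the threshold a defector earns more than a punisher, at the threshold the two payoffs coincide, and above it the punisher earns more, which is the bistable behavior described in the text.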


Similarly, for the reward-based scheme, the payoff matrix A is given as follows:

\[
A = \begin{array}{c|cc}
  & R & D \\ \hline
R & a_2 - c_s + b_r - c_r & -c_s \\
D & a_2 & 0
\end{array}
\qquad (2)
\]

Then R and D are bistable: if the frequency of R peers, x_R, satisfies the inequality

\[
x_R > \frac{c_s}{b_r - c_r},
\]

then R dominates; otherwise, D dominates.

In brief, if, as ideally wished, there exist only two strategies, P and D (or R and D), in the punishment-based (or reward-based) service differentiation incentive mechanism, then the scheme works under the above conditions; that is, P (or R) will dominate the P2P network. In a sense, Eq. (1) can partially explain the success of the BitTorrent application. In detail, there normally exist only two strategies in a BitTorrent system, Tit-for-Tat and Defect, where the Tit-for-Tat strategy corresponds to the P strategy in Eq. (1) with zero punishment cost to P peers. Thus, a BitTorrent system would be dominated by Tit-for-Tat peers. Furthermore, the repetition inherent in BitTorrent interactions greatly facilitates the dominance of the Tit-for-Tat strategy.

However, in anonymous, dynamic and autonomous networks, each peer independently determines its behavior. Thus, in the above punishment-based and reward-based incentive mechanisms, some peers might, with small mutation probability, conceive the following simpler (and more "stupid") strategy: take on no other job (punishment or reward) and only provide resource (or service) to the P2P network. We call this the C strategy. It is important to note that, in the punishment-based scheme, because of the dynamic characteristics, P peers can decide whether or not to punish a peer only according to that peer's current behavior (i.e., whether or not it provides service to the system); it is therefore reasonable to assume that P punishes only D peers. Then, for the 3-strategy punishment-based scheme (C, D and P), the payoff matrix can be given as follows:

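\[
A = \begin{array}{c|ccc}
  & C & D & P \\ \hline
C & a_1 - c_s & -c_s & a_1 - c_s \\
D & a_1 & 0 & a_1 - c_p \\
P & a_1 - c_s & -c_s - c_u & a_1 - c_s
\end{array}
\qquad (3)
\]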

Briefly, the dynamics based on the payoff matrix (3) will finally converge to D. The reason is as follows: since a C peer obtains the same payoff as a P peer when no defectors are present, cooperators that appear through small mutation can invade and replace punishers through neutral drift. Once cooperators have taken over the whole system, defectors are advantageous and take over. Furthermore, as described above, P and D are bistable, which means that each strategy is a best response to itself. That is, a small percentage of P mutants cannot successfully invade a population of D. Thus, the final state will be stuck in D. The same conclusion was also provided in [15]. Note that, recently, costly punishment has been shown to invade when a small fraction of individuals is allowed to opt out of cooperative ventures [16].

In the reward-based scheme, similarly, R peers have to reward both R and C peers (because, at that moment, R and C peers all provide good service to the system). Thus, in this scenario, the pairwise payoff matrix can be given as follows:

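\[
A = \begin{array}{c|ccc}
  & C & D & R \\ \hline
C & a_2 - c_s & -c_s & a_2 - c_s + b_r \\
D & a_2 & 0 & a_2 \\
R & a_2 - c_s - c_r & -c_s & a_2 - c_s + b_r - c_r
\end{array}
\qquad (4)
\]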
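The pairwise relations between strategies in the two 3-strategy schemes can be checked with a short script. This is a sketch under stated assumptions: the matrix entries are the pairwise payoffs implied by the definitions of a_1, a_2, c_s, c_p, c_u, b_r and c_r in this section, the numerical values are arbitrary illustrations, and the helper names (`punishment_matrix`, `reward_matrix`, `classify_pair`) are hypothetical.

```python
C, D, X = 0, 1, 2  # X stands for the punisher (P) or the rewarder (R)


def punishment_matrix(a1, c_s, c_p, c_u):
    """Row-player payoffs for strategies C, D, P (P punishes only D)."""
    return [
        [a1 - c_s, -c_s, a1 - c_s],
        [a1, 0.0, a1 - c_p],
        [a1 - c_s, -c_s - c_u, a1 - c_s],
    ]


def reward_matrix(a2, c_s, b_r, c_r):
    """Row-player payoffs for strategies C, D, R (R rewards both C and R)."""
    return [
        [a2 - c_s, -c_s, a2 - c_s + b_r],
        [a2, 0.0, a2],
        [a2 - c_s - c_r, -c_s, a2 - c_s + b_r - c_r],
    ]


def classify_pair(A, i, j):
    """Classify the two-strategy game between strategies i and j."""
    adv_i_in_j = A[i][j] - A[j][j]  # advantage of a rare i among j peers
    adv_j_in_i = A[j][i] - A[i][i]  # advantage of a rare j among i peers
    if adv_i_in_j == 0 and adv_j_in_i == 0:
        return "neutral"
    if adv_i_in_j >= 0 and adv_j_in_i <= 0:
        return "first dominates"
    if adv_i_in_j <= 0 and adv_j_in_i >= 0:
        return "second dominates"
    if adv_i_in_j < 0 and adv_j_in_i < 0:
        return "bistable"
    return "coexistence"


# Illustrative (assumed) parameter values.
pun = punishment_matrix(a1=3.0, c_s=1.0, c_p=4.0, c_u=0.5)
rew = reward_matrix(a2=2.0, c_s=1.0, b_r=3.0, c_r=0.5)

print("punishment: C vs P ->", classify_pair(pun, C, X))
print("punishment: C vs D ->", classify_pair(pun, C, D))
print("punishment: P vs D ->", classify_pair(pun, X, D))
print("reward:     C vs R ->", classify_pair(rew, C, X))
print("reward:     C vs D ->", classify_pair(rew, C, D))
print("reward:     R vs D ->", classify_pair(rew, X, D))
```

With these values, C and P are neutral in the punishment-based scheme, C dominates R in the reward-based scheme, D dominates C in both, and P (or R) is bistable with D, which is the chain of relations that drives both systems toward D.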

The payoff matrix (4) implies the following characteristics: R will always turn into C (because, in comparison with R peers, C peers get the extra reward and do not bear any reward cost); C will always turn into D; and R and D are bistable, which means that a small percentage of R mutants cannot invade the D population. Thus, the whole P2P population will finally converge to the D state. Interestingly, it has been shown that reward is as effective as punishment for maintaining public cooperation and leads to higher total earnings [17]. But that conclusion relies on truly repeated games, in which player identities persist from round to round. In real social life, persistent identities and truly repeated interactions are probably natural. In large-scale and autonomous networks, however, peers typically cannot track the identity of other group members, and moreover, the interaction groups change or the identities of group members are reshuffled in every evolutionary round. Thus, for incentive mechanisms in anonymous, dynamic and autonomous P2P networks, it is infeasible to assume persistent identities and repeated interactions.

In summary, for service differentiation based incentive schemes (punishment-based or reward-based), if, as initially wished, there exist only two strategies, P and D in the punishment-based scheme (or R and D in the reward-based scheme), then under appropriate conditions P (or R) will dominate; that is, the differential service based schemes will work and stimulate peers to contribute resource to the whole system. Unfortunately, if C peers appear (due to mutation) that only provide service and do not carry out punishment (or reward), then service differentiation based incentive mechanisms do not work at all, whether they are punishment-based or reward-based. Note that it has been shown that, if unconditional cooperators (C peers) are added to systems composed of only two strategies, TFT and D (e.g., BitTorrent-like systems), the equilibrium distribution is entirely centered on D [18]. The evolutionary dynamics can be briefly described as follows: C can invade and take over a TFT population by random drift, but the C population is quickly taken over by D players. Naturally, D is the selected strategy. In short, adding C undermines the evolutionary success of TFT.

3.3. Schematic illustration of entry fee enabled and service-differentiation based incentive mechanisms

For simplicity, in the above derivation we only used the pairwise interaction payoff matrix to characterize the dynamics of the whole system. Basically, there are positive externalities associated with a peer's contribution (i.e., one peer's contribution benefits all other peers equally, since it increases the value of the system as a whole). Thus, we can naturally use the public goods game to model the punishment-based (and reward-based) service differentiation incentive mechanisms. In a typical public goods experiment, a group of, e.g., six players get an endowment of $10 each. Every player then has the option to invest part or all of this money into a common pool, knowing that the experimenter is going to triple the amount in the pool and divide it equally among all players regardless of their contribution. If everybody invests all their money, each player ends up with $30. However, each invested dollar yields a return of only 50 cents to the investor. Therefore, if everybody plays rationally, no one will invest, and hence the group of players will forgo the benefits of the public good.

To provide an escape hatch out of the stalemate (mutual defection) in the above service differentiation based incentive mechanisms, one that can operate under full anonymity, we assume that, besides the above three strategies, there also exists an extra strategy, L, denoting the loners, who just stand by and do not join the resource provision scheme. If loners want to join the system, a small entry fee is charged (to all peers who want to join the resource provision scheme), and after joining they can choose one of the other three strategies: C, D and P in the punishment-based scheme (or C, D and R in the reward-based scheme). Fig.
1 illustrates punishment-based incentive mechanism scheme. Normally, there exist four types of users in P2P systems: C, D, P, and L. The L peers, represent individuals who, by default, just stand by (do not join the public enterprise), and thus do not pay the entry fee, and the other users include D peers who participate but do not contribute, C peers who contribute but do not punish the defectors, and P peers who not only contribute to the commonwealth but also punish the defectors. As shown in Fig. 1, L peers can voluntarily join the resource provision scheme, and all participants can voluntarily quit the scheme. For simplicity, we assume that each C (and P) peer will provide benefit a1 to resource pool, and bear service-providing cost cs. Furthermore, each P peer will impose penalty cp, on D peer, and similarly incur punishment cost cu for this behavior. Similarly, Fig. 2 shows the reward based incentive mechanism. In reward based incentive mechanism, there exist four types of users: C, D, L and R. The behaviors of C, D and L peers are same as the punishment based scheme. The difference lies in that: R peers not only cooperate, but voluntarily provide extra reward, br for other peers who provide resource to the system, which bring them small cost, cr. As shown in Fig. 2, the rewarder’s behavior implies that they not only reward to other R peers, but also reward the C peers. Considering the inherent dynamics of peer’s behavior, we simply assume that each peer imitates the strategy of peer with better utility. Note that, in order to make the theoretical analysis feasible, our paper assumes extremely simple and ideal model for P2P resource provision. Actually, in real-world situations, costs and benefits should be carefully defined and measured empirically. 
But, even though our economic models for service differentiation based incentive mechanisms are rather crude and abstract many practical aspects of implementations, we can still see some interesting implications. To help clearly describe the theoretical models, the following tables, Table 1–3, respectively provide the common symbols used in service differentiation based incentive mechanisms, the specific symbols used in punishment-based incentive scheme, and the specific symbols used in reward-based incentive scheme. For clarity, in the following description, we let the number 1, 2, 3 and 4 denote the strategies of cooperators, defectors, punishers (or rewarders in reward-based scheme), and loners respectively.
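The arithmetic of the six-player public goods experiment described in this subsection can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `public_goods_payoffs` and its parameters are our own and do not come from the paper.

```python
def public_goods_payoffs(contributions, endowment=10.0, multiplier=3.0):
    """Return each player's final wealth after one round of the public goods game."""
    pool = multiplier * sum(contributions)   # the experimenter multiplies the common pool
    share = pool / len(contributions)        # equal split, regardless of contribution
    return [endowment - c + share for c in contributions]


if __name__ == "__main__":
    # Everybody invests the full endowment.
    print(public_goods_payoffs([10.0] * 6))
    # A single player invests one dollar; the five others invest nothing.
    print(public_goods_payoffs([1.0] + [0.0] * 5))
```

In the second call the lone investor ends up worse off than the free riders, which is the incentive problem the entry fee and the punishment (or reward) stage are meant to address.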

[Figure: peers of types C, D, P and L arranged around a resource pool. Diagram labels: small entry fee, voluntary participation, voluntarily quit, resource, provision cost, benefit, penalty, incurred cost for punishment. Legend: C = cooperator, D = defector, P = punisher, L = non-participant.]

Fig. 1. Schematic illustration of punishment-based incentive mechanism.


[Figure: peers of types C, D, R and L arranged around a resource pool. Diagram labels: small entry fee, voluntary participation, voluntarily quit, resource, provision cost, benefit, reward, incurred cost for reward. Legend: C = cooperator, D = defector, R = rewarder, L = non-participant.]

Fig. 2. Schematic illustration of reward-based incentive mechanism.

Table 1. The common symbols used in service differentiation based (punishment-based and reward-based) incentive mechanisms.

Symbol | Definition
M | The total number of peers in P2P system
N | The average number of peers in each group
µ | The strategy mutation probability in evolutionary phase
β | The intensity of selection
cs | The cost that cooperators (or punishers and rewarders) incur by provision of resource; without loss of generality, let cs = 1
ce | The entry fee set for peers who will join the resource provision scheme

Table 2. The specific symbols used in punishment-based incentive scheme.

Symbol | Definition
α1 | The benefit that one peer would voluntarily provide to the system
cp | The penalty that P peer imposes on D peer
cu | The incurred cost for P peer for conducting the punishment behaviors
iG, iB, iP, iL | The respective number of cooperators, defectors, punishers and loners existing in the P2P system

Table 3. The specific symbols used in reward-based incentive scheme.

Symbol | Definition
α2 | The benefit that one peer would voluntarily offer to the system
br | The extra benefit that R peer offers to peers who provide service to P2P network
cr | The incurred cost for R peer for conducting the reward behaviors
iG, iB, iR, iL | The respective number of cooperators, defectors, rewarders and loners existing in the P2P system

3.4. System design of entry fee enabled and service-differentiation based incentive mechanisms

Our proposal, the entry fee enabled and service-differentiation based incentive mechanisms (especially the punishment-based scheme), aims at stimulating peers to contribute resources in community/group-based P2P applications with the feature of "contributing while consuming". Specifically, by dynamically forming virtual and/or physical communities/groups, users can collaboratively contribute to (and meanwhile consume) services that are composed of the application-specific resources voluntarily provisioned by participants. Typical examples include file sharing systems, distributed storage and backup services, etc. Physical or virtual group formation is clearly a critical first step for collaborative P2P applications. Roughly speaking, the system design that we present contains a central server, called the tracker, which is operated by the P2P service provider to supply basic system management operations, such as user registration and formation of virtual communities. The tracker can also be implemented in a distributed way using well-known techniques (e.g., Distributed Hash Tables, DHTs); such a task is, however, outside the scope of this paper. Thus, a virtual/physical community can be formed with the help of the tracker, which can organize users according to some criteria or simply at random. Through group-based interactions, it is feasible to assume that


[Figure: community-based P2P applications drawn as clouds of C, D, P and L users with joining/leaving/reshuffling between them, on top of a tracker block that offers register service, communications and connection service, community formation service, and other services.]

Fig. 3. Schematic infrastructure of community-based P2P application environment.

users' behaviors in one group can be monitored by other members of the same group. Alternatively, the tracker can simply gather users' behaviors and notify all users. The key point is that, even though the tracker may know users' behaviors, it cannot do anything further: because all behaviors in anonymous, dynamic and autonomous P2P systems are endogenously and voluntarily conducted by users, it is infeasible to assume that some users could be permanently expelled from the group. As described above, in the simplest case, we can define two categories of service: high-level and low-level. Taking the punishment-based scheme as an example, initially all participants are served at the high level, and then, according to each participant's behavior, voluntary punishers lower the service level that they provide to defectors. That is, punishment enforcement includes two parts: first checking the behaviors of participants in the same group, and then providing lower service to defectors, which incurs a certain fixed cost for the punishers executing it. Briefly, the proposed service-differentiation based incentive mechanisms rely only on the time during which a participant is consuming resources in the P2P system, and are memoryless. Moreover, our scheme intentionally divides resource provision in P2P applications into multiple rounds. In every round, the groups are typically changed or the identities of group members are reshuffled, which appropriately models users' dynamic behaviors (such as leaving and joining the community) and the effect of cheap pseudonyms. Furthermore, each round consists of two stages: in stage 1, users voluntarily contribute resources according to their strategies; in stage 2 (taking the punishment-based scheme as an example), P participants voluntarily punish D participants. Participants' average payoffs in a round are summed over both stages; each user then randomly picks another user and imitates that user's strategy if the latter had larger utility. The next round then starts from the beginning. Briefly, each round consists of the same two stages, but the users in each group may change dynamically. In such settings, users often cannot track the identity of the group members who punished them. These design choices reduce or eliminate the effects of reputation, as well as retaliation by those who have been punished (in the punishment-based scheme). Note that a free-rider might choose to whitewash, i.e., leave and rejoin the network with a new identity, to avoid the penalty imposed on it. In our scheme, the interactions are intentionally divided into multiple rounds, and from one round to the next, group memberships and identities are actively reshuffled, which can more or less accommodate whitewashing. Furthermore, it has been shown that whitewashing can be countered by imposing a penalty on all newcomers, the so-called social cost [19]. In a sense, the round-based entry fee in our scheme plays the role of the social cost. In brief, our goal is to facilitate the design of a system that implements the incentive mechanism produced by our theoretical work without gathering and managing accounting information about peers' past transactions. Fig. 3 shows the schematic infrastructure of the community-based P2P application environment. The underlying function block schematically represents the role of the tracker, which provides various system services, such as the register service, the communications and connection management service, the community formation service, etc. The upper clouds denote the virtual/physical communities/groups formed to collaboratively provide and consume services, in which users can voluntarily determine their behaviors: whether to participate, whether to provide resources, whether to punish defectors, etc.

4. Theoretical analysis of entry fee enabled and service-differentiation based incentive mechanisms

Generally, the analysis of the stochastic dynamics of incentive mechanisms with multiple (more than two) strategies can be greatly simplified in the limiting case where the mutation probability is close to zero, in which the whole P2P system almost always consists of one or at most two types. This holds because, when the mutation probability is zero, the multiple


monomorphic states are absorbing, and for a sufficiently small mutation probability, the fate of a mutant (i.e., its elimination or fixation) is settled before the next mutant appears. Thus, we can infer the final dynamics of the whole system by analyzing the transition probabilities between pairs of pure strategies. In brief, the following generic model can be used to analyze the evolutionary dynamics: the pairwise comparison model is adopted to infer the fixation probability between any pair of strategies; these fixation probabilities define the transition probabilities of a Markov process among the 4 homogeneous states of the population; and the stationary distribution approximately indicates the probability of finding the system in each of the 4 homogeneous states. In a finite population, the groups engaging in a public goods game can be modeled by multivariate hypergeometric sampling. For a transition between two pure states, this reduces to sampling (without replacement) from a hypergeometric distribution. In a population of size M with mi individuals of type i and (M − mi) of type j, the probability of selecting k individuals of type i and (N − k) individuals of type j in N trials is:

$$H(k, N, m_i, M) = \frac{\binom{m_i}{k}\binom{M-m_i}{N-k}}{\binom{M}{N}}. \tag{5}$$

Obviously, the following two equations hold:

$$\sum_{k=0}^{N} H(k, N, m_i, M) = 1, \qquad \sum_{k=0}^{N} k \cdot H(k, N, m_i, M) = \frac{N}{M}\, m_i.$$
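The hypergeometric weight of Eq. (5) and the two identities that follow it can be checked with exact rational arithmetic. The sketch below is for illustration only; the function names `hypergeom` and `check_identities` are ours, not the paper's.

```python
from fractions import Fraction
from math import comb


def hypergeom(k, N, m_i, M):
    """Eq. (5): probability of drawing k type-i peers in a group of N, without replacement."""
    return Fraction(comb(m_i, k) * comb(M - m_i, N - k), comb(M, N))


def check_identities(N, m_i, M):
    """Return (sum of probabilities, mean number of type-i peers in the group)."""
    total = sum(hypergeom(k, N, m_i, M) for k in range(N + 1))
    mean = sum(k * hypergeom(k, N, m_i, M) for k in range(N + 1))
    return total, mean


if __name__ == "__main__":
    N, m_i, M = 5, 7, 20
    total, mean = check_identities(N, m_i, M)
    print(total, mean, Fraction(N * m_i, M))
```

Using `Fraction` avoids floating-point noise, so the normalization and the mean N·mi/M can be compared for exact equality.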

Thus, in a population of iG cooperators and (M − iG) defectors, the average payoffs to a cooperator, P12, and to a defector, P21, are respectively:

$$P_{12}(i_G) = \sum_{k=0}^{N-1} H(k, N-1, i_G-1, M-1)\left(\frac{k+1}{N}\,\alpha_1 - c_s - c_e\right) = \frac{\alpha_1}{N}\left(1 + \frac{N-1}{M-1}(i_G-1)\right) - c_s - c_e, \tag{6}$$

$$P_{21}(i_G) = \sum_{k=0}^{N-1} H(k, N-1, i_G, M-1)\left(\frac{k}{N}\,\alpha_1 - c_e\right) = \frac{\alpha_1}{N}\,\frac{N-1}{M-1}\, i_G - c_e. \tag{7}$$

Note that, in P12, the focal peer is itself a C peer, so the number of other C peers is (iG − 1). Similarly, in the punishment-based scheme, we obtain:

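$$P_{13} = P_{31} = \alpha_1 - c_s - c_e, \tag{8}$$

$$P_{14}(i_L) = P_{34}(i_L) = \left(1 - \frac{\binom{i_L}{N-1}}{\binom{M-1}{N-1}}\right)(\alpha_1 - c_s - c_e) - \frac{\binom{i_L}{N-1}}{\binom{M-1}{N-1}}\, c_e, \tag{9}$$

$$P_{23}(i_P) = \frac{N-1}{M-1}\left(\frac{\alpha_1}{N} - c_p\right) i_P - c_e, \tag{10}$$

$$P_{24}(i_L) = -c_e, \tag{11}$$

$$P_{32}(i_P) = \frac{\alpha_1}{N} - c_s - c_e - (N-1)\,c_u + \frac{N-1}{M-1}\left(\frac{\alpha_1}{N} + c_u\right)(i_P - 1), \tag{12}$$

$$P_{41} = P_{42} = P_{43} = 0. \tag{13}$$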
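The closed forms in Eqs. (6) and (7) follow from the mean of the hypergeometric distribution, and this can be confirmed numerically. The following sketch compares the explicit sums with the closed forms, as we read them from the text, for a small population. All function names are hypothetical, and the parameter values in the usage example are chosen only for illustration.

```python
from fractions import Fraction
from math import comb


def hypergeom(k, N, m, M):
    """Hypergeometric weight of Eq. (5)."""
    return Fraction(comb(m, k) * comb(M - m, N - k), comb(M, N))


def p12_sum(i_G, M, N, alpha1, c_s, c_e):
    """Cooperator payoff, Eq. (6), as an explicit sum over the other N - 1 group members."""
    return sum(hypergeom(k, N - 1, i_G - 1, M - 1) * (Fraction(k + 1, N) * alpha1 - c_s - c_e)
               for k in range(N))


def p12_closed(i_G, M, N, alpha1, c_s, c_e):
    """Cooperator payoff, closed form on the right-hand side of Eq. (6)."""
    return Fraction(alpha1) / N * (1 + Fraction(N - 1, M - 1) * (i_G - 1)) - c_s - c_e


def p21_sum(i_G, M, N, alpha1, c_e):
    """Defector payoff, Eq. (7), as an explicit sum."""
    return sum(hypergeom(k, N - 1, i_G, M - 1) * (Fraction(k, N) * alpha1 - c_e)
               for k in range(N))


def p21_closed(i_G, M, N, alpha1, c_e):
    """Defector payoff, closed form on the right-hand side of Eq. (7)."""
    return Fraction(alpha1) / N * Fraction(N - 1, M - 1) * i_G - c_e


if __name__ == "__main__":
    M, N, alpha1, c_s, c_e = 30, 5, 10, 1, Fraction(1, 10)
    print(p12_sum(12, M, N, alpha1, c_s, c_e), p12_closed(12, M, N, alpha1, c_s, c_e))
    print(p21_sum(12, M, N, alpha1, c_e), p21_closed(12, M, N, alpha1, c_e))
```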

In the reward-based scheme, unlike punishers, who punish defectors, a rewarder provides an extra benefit to other peers who offer good service to the system. Thus, the average payoffs have the same form as those in the punishment-based scheme (with α1 replaced by α2), except for P13, P31, P23, P32 and P34:

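$$P_{13}(i_G) = \sum_{k=0}^{N-1} H(k, N-1, i_G-1, M-1)\bigl(\alpha_2 + (N-k-1)\,b_r - c_s - c_e\bigr) = \alpha_2 + (N-1)\,b_r - \frac{N-1}{M-1}(i_G-1)\,b_r - c_s - c_e, \tag{14}$$

$$P_{31}(i_G) = \sum_{k=0}^{N-1} H(k, N-1, i_G, M-1)\bigl(\alpha_2 - c_s - c_e - k\,c_r + (N-k)(b_r - c_r)\bigr) = \alpha_2 - \frac{N-1}{M-1}\, i_G\, c_r - c_s - c_e + \left(N - \frac{N-1}{M-1}\, i_G\right)(b_r - c_r), \tag{15}$$

$$P_{23}(i_R) = \sum_{k=0}^{N-1} H(k, N-1, i_R, M-1)\left(\frac{k}{N}\,\alpha_2 - c_e\right) = \frac{\alpha_2}{N}\,\frac{N-1}{M-1}\, i_R - c_e, \tag{16}$$

$$P_{32}(i_R) = \sum_{k=0}^{N-1} H(k, N-1, i_R-1, M-1)\left(\frac{k+1}{N}\,\alpha_2 + (k+1)(b_r - c_r) - c_s - c_e\right) = \left(\frac{\alpha_2}{N} + b_r - c_r\right)\left(1 + \frac{N-1}{M-1}(i_R-1)\right) - c_s - c_e, \tag{17}$$

$$P_{34}(i_L) = \left(1 - \frac{\binom{i_L}{N-1}}{\binom{M-1}{N-1}}\right)(\alpha_2 - c_s - c_e + b_r - c_r) - \frac{\binom{i_L}{N-1}}{\binom{M-1}{N-1}}\, c_e. \tag{18}$$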

Reproduction can be genetic or cultural. We adopt the pairwise comparison rule to model the reproduction phase, which has recently been shown to provide a convenient framework for game dynamics at all intensities of selection. According to this rule, two individuals from the population, one with strategy i and one with strategy j, are randomly selected for update (only the selection of a mixed pair can change the composition of the population). The strategy of i replaces that of j with a probability given by the Fermi function:

$$\frac{1}{1 + e^{-\beta (P_{ij} - P_{ji})}}. \tag{19}$$

The quantity β, which in physics corresponds to an inverse temperature, controls the intensity of selection. When β is large, the individual with the lower payoff will always adopt the strategy of the other individual. For β ≪ 1, we recover the weak selection limit of the frequency-dependent Moran process, which can be viewed as a high-temperature expansion of the dynamics [20]. Note that the assumption of weak selection is commonly used for biological and social systems, and it can be justified in different ways: first, in most real-life situations we are involved in many different games, and each particular game makes only a small contribution to our overall performance; second, in most cases, only weak selection is analytically tractable. However, we argue that the assumption of strong selection is more suitable for technological networks, and we therefore focus on strong selection. The probability of increasing the number of i-strategy peers from l to l + 1, and the probability of decreasing it from l to l − 1, can be written as:

$$T_l^{\pm} = \frac{l}{M}\,\frac{M-l}{M}\,\frac{1}{1 + e^{\mp\beta (P_{ij}(l) - P_{ji}(l))}}. \tag{20}$$

The quantity of interest in finite population dynamics is the fixation probability ρij, which is the probability that a population of j-strategy peers invaded by a single i-strategy peer evolves with mutations to a population of all i-strategy peers. The probability depends only on the ratio $\alpha_l = T_l^{-}/T_l^{+}$. For the pairwise comparison process, this ratio reduces to $\alpha_l = \exp[-\beta (P_{ij}(l) - P_{ji}(l))]$. Finally, the fixation probability ρij is given by [20]:

$$\rho_{ij} = \frac{1}{\sum_{k=0}^{M-1} \prod_{l=1}^{k} \alpha_l}. \tag{21}$$
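Eq. (21) is straightforward to evaluate numerically. The sketch below assumes the sign convention $\alpha_l = \exp[-\beta(P_{ij}(l) - P_{ji}(l))]$ and the closed forms of Eqs. (6) and (7) as we read them; the function names are hypothetical. With M = 500, N = 50, β = 5, α1 = 10 and cs = 1, the three calls in the usage example should come out close to 0.002, 0.3935 and 0.9834, which are among the edge labels printed in Fig. 4b.

```python
import math


def fixation_probability(payoff_gap, M, beta):
    """Eq. (21): probability that a single i-mutant takes over M - 1 resident j-peers.

    payoff_gap(l) must return P_ij(l) - P_ji(l) when l peers use strategy i.
    """
    total, product = 1.0, 1.0            # the k = 0 term is the empty product, i.e. 1
    for l in range(1, M):
        product *= math.exp(-beta * payoff_gap(l))   # alpha_l = T_l^- / T_l^+
        total += product
    return 1.0 / total


def defector_vs_cooperator_gap(M, N, alpha1, c_s):
    """P21 - P12 from Eqs. (6) and (7) with i_G = M - l cooperators (c_e cancels out)."""
    def gap(l):
        i_G = M - l
        p21 = alpha1 / N * (N - 1) / (M - 1) * i_G
        p12 = alpha1 / N * (1 + (N - 1) / (M - 1) * (i_G - 1)) - c_s
        return p21 - p12
    return gap


if __name__ == "__main__":
    M, N, beta = 500, 50, 5.0
    print(fixation_probability(lambda l: 0.0, M, beta))   # neutral drift
    print(fixation_probability(lambda l: 0.1, M, beta))   # constant payoff advantage of 0.1
    print(fixation_probability(defector_vs_cooperator_gap(M, N, 10.0, 1.0), M, beta))
```

The running product is accumulated term by term instead of recomputing each product from scratch, which keeps the evaluation linear in M. For a strongly disadvantaged mutant the product overflows to infinity and the function returns 0.0, which is adequate for this illustration.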

The fixation probabilities ρij (i, j = 1, 2, 3, 4) define the transition probabilities of a Markov process among the four homogeneous states of the P2P population. The transition matrix is given in (22), and its stationary distribution can be easily calculated: the normalized right eigenvector associated with the largest eigenvalue (which is 1 for matrix (22)) gives the stationary distribution, i.e., it approximately indicates the probability of finding the system in each of the four homogeneous states.

[Eq. (22): transition matrix of the Markov process among the four homogeneous states, with entries defined by the fixation probabilities.]
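As an illustration of how a stationary distribution can be obtained from fixation probabilities, the following sketch builds a row-stochastic matrix and solves the balance equations directly. Several things here are assumptions rather than the paper's stated construction: the normalization of each off-diagonal entry by the number of alternative states, the function name `stationary_distribution`, and, in the usage example, our reading of which edge in Fig. 4b each label belongs to, with all transitions not labelled in the figure set to zero.

```python
def stationary_distribution(rho):
    """Stationary distribution of a Markov chain over homogeneous states.

    rho[i][j] is the fixation probability of a single i-mutant in an all-j
    population, i.e. the weight of the transition from state j to state i.
    """
    n = len(rho)
    # Row-stochastic matrix: T[j][i] is the probability of moving from all-j to all-i.
    # Dividing by (n - 1) is an assumed normalization; it does not affect the result.
    T = [[rho[i][j] / (n - 1) if i != j else 0.0 for i in range(n)] for j in range(n)]
    for j in range(n):
        T[j][j] = 1.0 - sum(T[j])
    # Solve pi T = pi with sum(pi) = 1; the last balance equation is replaced
    # by the normalization condition.
    A = [[T[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    b = [0.0] * n
    A[-1], b[-1] = [1.0] * n, 1.0
    for col in range(n):                      # Gauss-Jordan with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]


if __name__ == "__main__":
    C, D, P, L = range(4)
    rho = [[0.0] * 4 for _ in range(4)]
    rho[D][C] = 0.9834                  # C -> D (assumed edge assignment)
    rho[L][D] = 0.3935                  # D -> L
    rho[C][L] = rho[P][L] = 0.3729      # L -> C and L -> P
    rho[P][C] = rho[C][P] = 0.002       # neutral drift between C and P
    pi = stationary_distribution(rho)
    print({name: round(100 * p, 2) for name, p in zip("CDPL", pi)})
```

Under these assumptions the P state should receive roughly 98% of the stationary mass, in line with the theoretical value reported in Fig. 4b. The linear solve requires the chain to be irreducible; otherwise the pivot can be zero.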

5. Simulations

In this section, we thoroughly investigate and compare the performance of the proposed punishment-based and reward-based incentive schemes as various parameters of the theoretical models change. Because the imitation and mutation processes in the proposed schemes never cease, in our simulations we define the homogeneous state of each strategy as follows: whenever more than 90% of the peers opt for one strategy, the system is counted as being in the respective


homogeneous state. Then, based on how often each homogeneous state is observed, we calculate the ratio of time average for each strategy. We also tried other thresholds, such as 95%, and obtained very similar results.

(1) Illustration of strategy frequencies in punishment-based incentive scheme

Fig. 4a shows the evolutionary trajectory of strategy frequencies in the punishment-based incentive mechanism with entry fee. The strategy frequency is the percentage of peers in the P2P system that use a given strategy. We can see that the P2P system is always in flux: although a static equilibrium of all punishers (or all cooperators) cannot be reached, punishers dominate the system (close to stabilization). In detail, after some initial oscillations, P peers usually dominate the whole P2P network. In longer runs, their regime occasionally breaks down when C peers invade by neutral drift, but after another series of oscillations among C, D, and L peers (a rock-paper-scissors-like succession), P peers dominate the whole network again. The behavior in Fig. 4a can be explained by the transition probabilities among the four strategies, shown in Fig. 4b. First, P and D are bistable: in a choice between P and D, each strategy is a best response to itself. That is, a small fraction of P mutants cannot successfully invade a population of D, and vice versa. Thus, there is no transition between P and D. The transition probabilities from L to C, from C to D, from D to L, and from L to P are relatively large. Furthermore, when only P and C peers exist in the P2P system, all peers have the same utility. Thus, the transition between P and C occurs by neutral drift with probability 1/M, which is extremely small. Naturally, most of the time is spent in the P state. Fig. 4b also shows the close agreement between the theoretical time average and

[Figure: strategy frequencies (0 to 100%) of the C, D, P and L strategies versus evolutionary rounds (0 to 10 × 10^4).]

Scenario: M=500, N=50, µ=0.0001, β=5, α1=10, cs=1, ce=0.1, cp=0.2, cu=0.1

Fig. 4a. Illustration of strategy frequencies in punishment-based scheme (with entry fee).

[Figure: transition diagram among the homogeneous states P, L, C and D. Stationary probabilities: P, theoretical 98.09%, experimental 97.56%; L, theoretical 0.52%, experimental 0.61%; C, theoretical 0.40%, experimental 0.49%; D, theoretical 0.99%, experimental 1.34%. Edge labels: 0.3729, 0.3729, 0.3935, 0.002 (neutral drift), 0.9834.]

Fig. 4b. Stationary probability distributions and transition probabilities in punishment based incentive scheme (the same scenario as Fig. 4a).


experimental time average. Note that the experimental values in Fig. 4b (and in all the following figures) are the results after 10^5 evolutionary rounds. Figs. 5a and 5b respectively illustrate the strategy frequencies, and the stationary probability distributions together with the transition probabilities, for zero entry fee. Note that, when the entry fee is zero and only L and D peers exist in the P2P system, their utilities are identical, which leads to neutral drift between L and D. Fig. 5a shows that, for zero entry fee, the P2P system is almost always in the D state. However, Fig. 5b shows that, in contrast to Fig. 4b, there are large gaps between the theoretical analysis and the experimental results: although the theoretical analysis predicts that about 33% of homogeneous states (according to our definition) belong to P, in the experiments almost only the D strategy reaches the 90% threshold and can be regarded as a homogeneous state (we repeated the same simulation several times and obtained almost identical results). We think Fig. 5a partially explains these gaps. Specifically, in the experiments, once the whole P2P network is in the D state, L peers appear through rare mutations, and after many evolutionary rounds the percentage of L peers becomes large, but not large enough to reach the 90% threshold. L peers are then quickly replaced by C and P peers, because the transition probabilities from L to C and from L to P are relatively large. Meanwhile, some D peers still remain, because the transition between L and D occurs by neutral drift, whose probability is extremely small. In a sense, this scenario resembles a P2P system that includes only C, D, and P peers, and naturally, it is impossible

[Figure: strategy frequencies (0 to 100%) of the C, D, P and L strategies versus evolutionary rounds (0 to 10 × 10^4).]

Scenario: M=500, N=50, µ=0.0001, β=5, α1=10, cs=1, ce=0, cp=0.2, cu=0.1

Fig. 5a. Illustration of strategy frequencies in punishment-based scheme (without entry fee).

[Figure: transition diagram among the homogeneous states P, L, C and D. Stationary probabilities: P, theoretical 33.29%, experimental 0%; L, theoretical 0.13%, experimental 0.10%; C, theoretical 0.13%, experimental 0.05%; D, theoretical 66.45%, experimental 99.85%. Edge labels: 0.4970, 0.4970, 0.002 (neutral drift), 0.002 (neutral drift), 0.9834.]

Fig. 5b. Stationary probability distributions and transition probabilities in punishment based incentive mechanism (the same scenario as Fig. 5a).


(or very difficult) for the P strategy to reach the 90% threshold. Briefly, when the entry fee is zero, the neutral drift between D and L makes the transition from the all-D state to the all-L state difficult. We conjecture that only for an extremely small mutation probability and a huge number of evolutionary rounds (e.g., one billion) would the experimental values converge to the theoretical values. This phenomenon implies that, when there is no entry fee in the punishment-based incentive mechanism, the experimental values are significantly worse than the theoretically expected values. Thus, we conclude that the entry fee, even if very small, is indispensable for P2P resource provision. Otherwise, the P2P resource provision and sharing system will be mostly occupied by D peers.

(2) Illustration of strategy frequencies in reward-based incentive scheme

Fig. 6a illustrates the strategy frequencies in the reward-based scheme with entry fee, in which the states of the P2P system appear extremely messy. Nevertheless, from Fig. 6a we can draw the following conclusions: first, unlike the punishment-based incentive mechanism shown in Fig. 4a, in the reward-based incentive scheme the R strategy does not come close to stabilization at all; second, the D state visibly occupies the largest ratio of time average. This can be explained by the stationary probability distributions and transition probabilities shown in Fig. 6b. Specifically, the transition rates among pure strategies are large, which leads to the relatively cluttered picture in Fig. 6a. Furthermore, it should be noted that there exist small differences between experimental values and theoretical

[Figure: strategy frequencies (0 to 100%) of the C, D, R and L strategies versus evolutionary rounds (0 to 10000).]

Scenario: M=500, N=50, µ=0.0001, β=5, α2=5, cs=1, ce=0.1, br=0.2, cr=0.1

Fig. 6a. Illustration of strategy frequencies in reward-based scheme.

[Figure: transition diagram among the homogeneous states R, L, C and D. Stationary probabilities: R, theoretical 27.24%, experimental 15.83%; L, theoretical 9.0%, experimental 10.26%; C, theoretical 18.14%, experimental 21.47%; D, theoretical 45.62%, experimental 52.44%. Edge labels: 0.3306, 0.3935, 0.3284, 1, 1, 0.9894.]

Fig. 6b. Stationary probability distributions and transition probabilities in reward-based incentive scheme (the same scenario as Fig. 6a).


analysis, especially for the time averages of the R and D states. We think these inconsistencies may stem from the assumption of our theoretical analysis model (provided in Section 4) that the whole P2P system almost always consists of one or at most two types. Under strong selection (which implies that the peer with the lower payoff will always adopt the strategy of the peer with the higher payoff) and large transition rates among pure strategies, this assumption hardly holds. In particular, the experimental value of the R state is significantly lower than the theoretical one, whereas for the D strategy it is higher. In fact, we repeated the same experiments many times and obtained the same qualitative results as in Fig. 6b: typically, the experimental time average of R is about 11 percentage points lower than the theoretical value, and the experimental time average of D is about 7 percentage points higher than the theoretical value. Thus, we conclude that, in the reward-based scheme, the experimental values are always worse than the theoretical analysis. In a sense, the theoretical analysis can be regarded as an optimistic reference for the experimental results. In brief, contrary to intuition, instead of becoming stable (that is, with R occupying most of the time), the system oscillates quickly. Undoubtedly, without an entry fee the system becomes even worse: for the same scenario as above, when the entry fee is zero, the theoretical and experimental time averages spent in the D state are 100% and 99.50%, respectively.

[Figure: ratio of time averages (0 to 70%), theoretical and experimental, of the C, D, R and L strategies versus the benefit in the reward-based incentive scheme (5 to 15).]

Scenario: M=500, N=50, µ=0.0001, β=5, cs=1, ce=0.1, br=0.2, cr=0.1

Fig. 7. Time average of various strategies with the change of benefit in reward-based scheme.

[Figure: ratio of time averages (0 to 70%), theoretical and experimental, of the C, D, R and L strategies versus the reward in the reward-based incentive scheme (0 to 1).]

Scenario: M=500, N=50, µ=0.0001, β=5, α2=5, cs=1, ce=0.1, cr=0.1

Fig. 8. Time average of various strategies with the change of reward in reward-based scheme.


(3) Ratio of time averages in reward-based incentive scheme with the changes of benefit and reward

Fig. 7 illustrates the theoretical and experimental time averages of the various strategies in the reward-based scheme as the benefit increases. The time averages of the R and C states do not increase correspondingly. On the contrary, the time average of the D state increases slightly. This is intuitive: as the benefit increases, D peers exploit the system more. Furthermore, the experimental time averages of the R and C states are compatible with the theoretical values, but the experimental value of the D state is significantly larger than its theoretical value, and the opposite holds for the L state. This implies that, in real systems, more time will be spent in the D state (worse than theoretically expected). Fig. 8 illustrates the theoretical and experimental time averages of the various strategies in the reward-based scheme as the extra reward that an R peer provides to C peers and other R peers increases. Interestingly, only when the extra reward is extremely large do the time averages of the C and R states increase slightly. As in Fig. 7, the experimental

[Figure: ratio of time averages (0 to 100%), theoretical and experimental, of the C, D, P and L strategies versus the punishment cost in the punishment-based incentive scheme (0 to 1.0).]

Scenario: M=500, N=50, µ=0.0001, β=5, α1=10, cs=1, ce=0.1, cp=0.2

Fig. 9. Time averages of various strategies with the change of punishment cost in punishment-based scheme.

[Figure: ratio of time averages (0 to 100%), theoretical and experimental, of the C, D, P and L strategies versus the entry fee in the punishment-based incentive scheme (0 to 0.5).]

Scenario: M=500, N=50, µ=0.0001, β=5, α1=10, cs=1, cp=0.2, cu=0.1

Fig. 10. Time averages of various strategies with the change of entry fee in punishment-based scheme.


time average of the D state is significantly larger than the theoretical value, and the opposite holds for the L state. In brief, as Figs. 7 and 8 show, even as the benefit and the extra reward increase, the reward-based incentive mechanism does not work well in anonymous, dynamic and autonomous P2P networks.

(4) Ratio of time averages in punishment-based incentive scheme with the changes of various parameters

Fig. 9 shows the theoretical and experimental time averages of the various strategies as the punishment cost changes. Interestingly, even when the punishment cost gradually increases until it is much larger than the penalty imposed on defectors (and even equals the cost of providing resources to the system), the P2P system still stays almost entirely in the P state, in both the experiments and the theoretical analysis. This implies that, in the punishment-based scheme, participants seem willing to use punishment even when it is very costly. Recent experiments show that if players can choose between joining a public goods game either with or without punishment, they prefer the former [21]. The interpretation seems clear: whoever freely accepts that defection may be punished is unlikely to be a

[Figure: ratio of time averages (0 to 100%), theoretical and experimental, of the C, D, P and L strategies versus the number of peers in each group (50 to 500).]

Scenario: M=500, µ=0.0001, β=5, α1=10, cs=1, ce=0.1, cp=0.2, cu=0.1

Fig. 11. Time average of various strategies with the change of the number of peers in each group in punishment-based scheme.

[Figure: ratio of time averages (0 to 100%), theoretical and experimental, of the C, D, P and L strategies versus the total number of peers in the punishment-based incentive scheme (500 to 1000).]

Scenario: N=100, µ=0.0001, β=5, α1=10, cs=1, ce=0.1, cp=0.2, cu=0.1

Fig. 12. Time average of various strategies with the change of the number of total peers in punishment-based scheme.


defector. For contributors, it is thus less risky to join such a group. We think the above reasoning could also partially explain the interesting phenomenon in Fig. 9.

Fig. 10 shows the theoretical and experimental time averages of the various strategies as the entry fee changes. A small entry fee facilitates the emergence of an almost pure P state. This does not mean, however, that the larger the entry fee is, the more time the system spends in the P state. On the contrary, if the entry fee is too large, it intuitively prevents more peers from joining the resource provision venture, and more peers select the L strategy. The slight increase of the theoretical time average of the L strategy in Fig. 10 reflects this trend.

Figs. 11 and 12 illustrate, respectively, the impact of the number of peers in each group and of the total number of peers in the system on the time averages of the various strategies. Interestingly, in all experiments and theoretical analyses, the P2P network is mostly occupied by P peers, no matter how these parameters (the number of peers in each group and the total number of peers) are set. The reason is straightforward: under strong selection, those parameters hardly affect the transition probabilities, and thus the ratios of the time averages of the four states remain almost the same across parameter settings.

6. Discussions

In this section, we briefly discuss the following four questions related to our proposed punishment-based and reward-based incentive schemes for P2P resource provision:

• In the 3-strategy punishment-based incentive mechanism (including C, D and P), does zero punishment cost plus punishing both D and C definitely lead the P2P network to the full P state?
• In the 3-strategy reward-based incentive mechanism (including C, D and R), does zero reward cost plus rewarding only R definitely lead the P2P network to the full R state?
• In a dynamic, anonymous and autonomous P2P networking environment, is the average payoff of P peers in the punishment-based scheme less than the average payoff of R peers in the reward-based scheme (when the reward cost is zero)?
• How does this work differ from the existing incentive mechanisms in P2P/overlay networks, especially the schemes also based on the public goods game?

(1) Does zero punishment cost plus punishing both D and C definitely lead the P2P network to the full P state in the 3-strategy punishment-based incentive mechanism (including C, D and P)?

In general, punishment should be costly; that is, the punisher incurs a certain cost for carrying out the punishment. In Section 3.2, we showed that, when there is a punishment cost, the equilibrium state of the 3-strategy punishment-based scheme (C, D and P) is D. Actually, in a dynamic and autonomous P2P network, the system also finally converges to the D state even when the punishment cost is zero. This phenomenon can be briefly explained by the transition probabilities among pure states shown in Fig. 13(a). Note that the scenario is: M = 500, N = 50, β = 5, α1 = 10, cs = 1, cp = 0.2, cu = 0. First, C peers can invade and replace P peers through neutral drift, because the two obtain the same payoff when only C and P peers exist in the P2P system. However, P and D are bi-stable, because P peers bear the cost of resource provision. Thus, once C peers have displaced P peers and no punisher remains to deter defection, the system naturally converges to the full D state. Furthermore, suppose we go one step further: P can punish both C and D at zero punishment cost (that is, through some robust reputation mechanism based on co-players’ behavior history, P peers can distinguish themselves from C peers). The transition probabilities are then as shown in Fig. 13(b) (the scenario is the same as in Fig. 13(a)). Then, in experiments, the dynamics of the 3-strategy punishment-based system depend on the initial ratio of the various strategies.
For example, if the initial percentage of D peers is very large (more than 90%), our experiments show that the system ends up stuck in the full D state. In brief, the above results imply that, when the three strategies C, D and P exist in the punishment-based incentive scheme, the system is not stable enough even if we assume that P can punish both C and D at no punishment cost: in some specific cases, the whole system becomes stuck in the full D state.
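The neutral-drift transition discussed above can be checked numerically. The sketch below assumes a pairwise-comparison (Fermi) imitation rule in the spirit of [20], under which the ratio of backward to forward transition probabilities with k mutants is exp(−s·Δπ(k)). This update rule, the function name `fixation_probability` and the selection-strength parameter are illustrative assumptions, not necessarily the exact model used in the experiments. With equal payoffs, the fixation probability of a single mutant reduces to 1/M, which for M = 500 gives 0.002, consistent with the value labelled as neutral drift in Fig. 13(a).

```python
import math


def fixation_probability(payoff_diff, population_size, selection_strength):
    """Chance that a single mutant strategy takes over a resident population.

    payoff_diff(k) is (mutant payoff - resident payoff) with k mutants present.
    Assumption: Fermi pairwise-comparison imitation, so the backward/forward
    transition ratio at k mutants is exp(-selection_strength * payoff_diff(k)).
    """
    total = 1.0
    log_product = 0.0
    for k in range(1, population_size):
        # Accumulate the log of the product of transition ratios up to k.
        log_product -= selection_strength * payoff_diff(k)
        if log_product > 700.0:
            # The sum is astronomically large, so fixation is practically impossible.
            return 0.0
        total += math.exp(log_product)
    return 1.0 / total


# Neutral drift: C and P earn the same payoff when only C and P are present.
neutral = fixation_probability(lambda k: 0.0, population_size=500, selection_strength=1.0)
print(neutral)
```

Under strong selection the same function returns a value close to 1 for an advantaged mutant and practically 0 for a disadvantaged one, which is why only the neutral C-versus-P transition takes a small intermediate value in this sketch.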

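[Figure 13 diagram omitted: transition graphs among the pure states P, C and D. Panel (a): zero punishment cost and P punishes D only. Panel (b): zero punishment cost and P punishes both C and D. Edge labels that survive extraction (their assignment to panels and arrows is not recoverable): 0.002 (neutral drift), 0.9834, 0.1757, 0.9834.]

Fig. 13. Stationary probability distributions and transition probabilities in 3-strategy punishment-based scheme (zero punishment cost).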


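[Figure 14 diagram omitted: transition graphs among the pure states R, C and D. Panel (a): zero reward cost and R rewards both C and R. Panel (b): zero reward cost and R rewards only R. Edge labels that survive extraction (their assignment to panels and arrows is not recoverable): 0.4269, 0.3629, 0.9894, 0.9894.]

Fig. 14. Stationary probability distributions and transition probabilities in 3-strategy reward-based scheme (zero reward cost).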

(2) Does zero reward cost plus rewarding only R definitely lead the P2P network to the full R state in the 3-strategy reward-based incentive mechanism (including C, D and R)?

Similarly, Section 3.2 proved that, in the reward-based service differentiation scheme, if the three strategies C, D and R exist and some reward cost is borne by the rewarders, D is the unique equilibrium. Here, we analyze the dynamics of the 3-strategy reward-based incentive mechanism when the reward cost is zero. Fig. 14(a) and (b) show the transition probabilities among the three strategies when, respectively, R peers have to reward both C and R peers, and R peers can reward only other R peers (similarly, through some robust reputation mechanism based on co-players’ behavior history, R peers can distinguish themselves from C peers). Note that the scenario is: M = 500, N = 50, β = 5, α2 = 5, cs = 1, br = 0.1, cr = 0, and, in Fig. 14(b), the reward to R is br and the reward to C is 0. Interestingly, Fig. 14(a) and (b) are qualitatively identical to the scenario of Fig. 13(b) in the punishment-based scheme: R and D are bi-stable, and C can transfer to R and D. Thus, we can draw the following inferences. First, when the reward cost is zero, the P2P system probably evolves into the full R state in most scenarios; but, as in the punishment-based scheme, if the initial percentage of D peers is extremely large, the system can also become stuck in the D state. Second, for the reward-based scheme, the factor that determines the dominance of the R strategy is probably not whether R rewards both C and R or only R, but whether the reward cost is zero or not.

(3) In an anonymous, dynamic and autonomous P2P networking environment, is the average payoff of P peers in the punishment-based scheme less than the average payoff of R peers in the reward-based scheme?
Interestingly, it has been shown that reward is as effective as punishment for maintaining public cooperation and leads to higher total earnings [17]. But that conclusion relies on truly repeated games, in which player identities persist from round to round. In real social life, persistent identities and truly repeated interactions may be natural. In large-scale and autonomous networks, however, peers typically cannot track the identity of other group members, and moreover, the interaction groups are changed or the identities of group members are reshuffled in every evolutionary round. Thus, for incentive mechanisms in dynamic and autonomous P2P networks, it is infeasible to assume persistent identities and repeated interactions.

Here, under the anonymous, dynamic and autonomous P2P networking environment, we compare the average payoffs of P and R peers in the punishment-based and reward-based schemes. Note that we let the reward cost be zero, for this is the only case in which the reward-based scheme could work well in anonymous, dynamic and autonomous networks (when the four strategies C, D, R and L exist in the reward-based scheme). As emphasized in this paper, all behaviors and actions are endogenous; that is, no exogenous organization can enforce the punishment and/or reward, for example by reimbursing cooperative peers with some out-of-band money. Thus, in a resource provision system with M peers that uses the punishment-based incentive mechanism, if ideally all peers are P, the total benefit provided by the system is M(α1 − cs); in the reward-based scheme (without reward cost), if ideally all peers are R, the system’s benefit is M(α2 + N·br − cs). Thus, in order to make the two schemes comparable, we let α1 = α2 + N·br. Note that this assumption also implies that, in the punishment-based scheme, all peers are initially served with the high-level service α1, and then, according to peers’ current behaviors, voluntary punishers lower the defectors’ service level; in the reward-based scheme, all peers are initially served with the lower-level service α2, and then, according to peers’ behaviors, volunteers (rewarders) raise the service level of the peers who provide resource to the system.

Fig. 15 shows P’s average payoff in the punishment-based scheme and R’s average payoff in the reward-based scheme, both when R rewards R and C and when R rewards only R. First, after a quick oscillation, P peers in the punishment-based scheme achieve almost the same payoff as R peers in the reward-based scheme in which R rewards only R and the reward cost is zero. Indeed, if R could reward only R at zero reward cost, the reward-based scheme would drive the system into the full R state almost immediately. But this requires that R peers can distinguish themselves from C peers, so some robust and complex reputation mechanism may be needed to record peers’ behavior history. In anonymous, dynamic and autonomous networks, this requirement is too strong to be feasible. Second, if R has to reward both R and C, the punishment-based scheme performs better than the reward-based scheme, even though the reward cost is zero.
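The comparability condition can be verified with the scenario values quoted for Fig. 15. The following minimal sketch plugs those values into the two total-benefit expressions stated above; the function names are hypothetical and chosen only for illustration.

```python
def punishment_total_benefit(m, alpha1, cs):
    """Total benefit M(alpha1 - cs) when all M peers are punishers (P)."""
    return m * (alpha1 - cs)


def reward_total_benefit(m, n, alpha2, br, cs):
    """Total benefit M(alpha2 + N*br - cs) when all M peers are rewarders (R),
    assuming zero reward cost."""
    return m * (alpha2 + n * br - cs)


# Scenario values quoted for Fig. 15.
M, N = 500, 50
ALPHA1, ALPHA2 = 10, 5
CS, BR = 1, 0.1

p_total = punishment_total_benefit(M, ALPHA1, CS)
r_total = reward_total_benefit(M, N, ALPHA2, BR, CS)
print(p_total, r_total)
```

With these values α2 + N·br = 5 + 50 × 0.1 = 10 = α1, so both idealized systems deliver the same total benefit and any difference in the measured average payoffs comes from the evolutionary dynamics rather than from the parameter choice.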


[Figure 15 plot omitted. Vertical axis: average payoff (ticks from -6 to 10). Horizontal axis: evolutionary rounds (0 to 10000). Curves: P's average payoff in the punishment-based incentive scheme; R's average payoff in the reward-based incentive scheme (reward only R); R's average payoff in the reward-based incentive scheme (reward both R and C). Scenario in the punishment-based scheme: M=500, N=50, µ=0.0001, β=5, α1=10, cs=1, ce=0.1, cp=0.2, cu=0.1. Scenario in the reward-based scheme: M=500, N=50, µ=0.0001, β=5, α2=5, cs=1, ce=0.1, br=0.1, cr=0 (reward cost is zero).]

Fig. 15. Illustration of average payoff P (or R) in punishment-based (or reward-based) incentive scheme.

In brief, we argue that, for the anonymous, dynamic and autonomous P2P networking environment, the punishment-based service differentiation incentive mechanism is the only feasible scheme for stimulating peers to provide resource to the whole system. The scheme has the following specific advantages: the groups are usually changed, or the identities of group members reshuffled, in every round; and subjects are often informed about the total amount of punishment they received, but not about whom the punishment came from. These designs reduce or eliminate the effects of reputation, as well as retaliation by those who have been punished.

(4) How does this work differ from the existing incentive mechanisms in P2P/overlay networks, especially the schemes based on the public goods game?

In [22], the creation of a wireless backbone was modeled as a public good, and the Volunteer’s Dilemma (VOD) and the extended Volunteer’s Timing Dilemma (VTD) were used to analyze its performance. Basically, the typical VOD is as follows: a group of rational individuals wants a single person from the group to volunteer to offer some service. Each node has two possible strategies: volunteer or free-ride. If at least one node volunteers, everyone obtains the public good and receives utility 1, but each node i that volunteers must pay ci ∈ [0, 1], a cost that is private to node i and is often modeled as drawn from some specific distribution (known to the system designer). In fact, VOD is a multiplayer version of the snowdrift game, in which, for the two-person case, the best strategy of each peer is to play the opposite of the partner’s strategy. Our work is significantly different from the above work. The authors of [22] argued that ensuring global end-to-end connectivity is the nodes’ primary concern, and thus VOD is naturally appropriate for modeling the creation of a wireless routing backbone.
In contrast, we focus on resource provision in P2P networks (e.g., storage, bandwidth and content), where moreover the total resource provided is shared equally by all participants (in the simplest case); thus the multi-person prisoner’s dilemma is the appropriate model.

Ref. [23] modeled neighbor selection in overlay networks as a game involving directed links, constraints on the number of allowed neighbors, and costs reflecting both network latency and node preference. The authors express a node’s “best response” wiring strategy as a k-median problem on asymmetric distance, and use this formulation to obtain pure Nash equilibria. Instead of selfish neighbor selection, this paper investigated how to stimulate rational peers to voluntarily contribute resource to a common pool in community/group-based P2P applications with the feature of “contributing while consuming”. Specifically, by dynamically forming virtual and/or physical communities/groups, users can collaboratively contribute to (and meanwhile consume) services that are composed of the application-specific resources voluntarily provisioned by participants.

In brief, the above works are usually modeled and analyzed with classical game theory. Specifically, those schemes always assume a forward-looking model of user rationality: from the individual’s viewpoint, each user is supposed to have knowledge of global information and to select the strategy that best responds to the current state of the system; from the system designer’s viewpoint, the distribution of participants’ types is usually assumed to be known. Our work is inspired by Evolutionary Game Theory (EGT), which adopts a backward-looking approach: each peer simply imitates the behavior of another user with a better payoff. This approach can characterize the evolutionary dynamics of the various strategies.

7. Conclusion

It is argued that P2P networks are distributed environments managed by multiple administrative authorities, shared by users with different and competing interests, or even autonomously provided by independent and rational users. The distinguishing feature of P2P networks is that each player determines its behavior autonomously. Thus, one of the fundamental principles in dynamic and autonomous networks and systems is to “accommodate participants’ rational behaviors, and design for choice” [24]. Specifically, we argue that general P2P networks are anonymous, dynamic and autonomous: most interactions among peers are one-time, each strategy and action is voluntarily chosen and determined by independent, rational and autonomous peers, and all behaviors are endogenous. Under these considerations, this paper thoroughly investigated the effectiveness of service differentiation based resource-provision incentive mechanisms: the punishment-based scheme (first providing high-level service plus punishment) and the reward-based scheme (first providing low-level service plus reward).
We conclude that, in an anonymous, dynamic and autonomous P2P environment, if peers can voluntarily join the system and a small round-based entry fee is set for participation, the performance of the punishment-based scheme is always better than that of the reward-based scheme, and the P peers’ average payoff in the punishment-based incentive scheme is almost the same as in the ideal but infeasible case of the reward-based scheme, in which R can reward only R peers and the reward cost is zero. Furthermore, due to its inherently stochastic nature, the service differentiation based incentive scheme (punishment-based incentive) is always dynamic: although it is impossible to achieve a static, fully cooperative equilibrium, punishers dominate the system; that is, the system is mostly occupied by punishers. The philosophical implications of our work are twofold: first, like many mechanisms in real society, incentive mechanisms can hardly (if at all) be perfect, but they still work with acceptable performance; second, in anonymous, dynamic and autonomous networks, the appropriate way to encourage participants’ cooperative behavior is first to be generous, and then to be harsh according to peers’ current behaviors. Both implications comply with the basic codes of our society.

Acknowledgements

This research is partially supported by the 973 Program 2007CB310607, 863 Projects 2007AA01Z206 and 2006AA01Z235, and NSFC Grant 60802022. The authors thank the anonymous reviewers for their suggestions on how to improve the previous draft of the article; their comments were of great help.

References

[1] E. Adar, B. Huberman, Free riding on Gnutella, First Monday Online Journal (2000).
[2] D. Hughes, G. Coulson, J. Walkerdine, Free riding on Gnutella revisited: the bell tolls?, IEEE Distributed Systems Online 6 (2005).
[3] P. Antoniadis, T. Friedman, X. Cuvellier, Resource provision and allocation in shared network testbed infrastructures, in: Proceedings of the Workshop on Real Overlays and Distributed Systems (ROADS), 2007.
[4] M. Feldman, K. Lai, L. Zhang, A price-anticipating resource allocation mechanism for distributed shared clusters, in: ACM Conference on Electronic Commerce, June 2005.
[5] C. Courcoubetis, R. Weber, Incentives for large peer-to-peer systems, IEEE Journal on Selected Areas in Communications 24 (5) (2006).
[6] M. Feldman, C. Papadimitriou, J. Chuang, I. Stoica, Free-riding and whitewashing in peer-to-peer systems, IEEE Journal on Selected Areas in Communications 24 (5) (2006).
[7] Y.F. Wang, A. Nakao, J.H. Ma, A simple public-goods game based incentive mechanism for resource provision in P2P networks, in: Springer LNCS Proceedings 6406, in press.
[8] C. Buragohain, D. Agrawal, S. Suri, A game theoretic framework for incentives in P2P systems, in: Proceedings of the 3rd International Conference on Peer-to-Peer Computing, 2003.
[9] Q. Zhao, C.S. Lui, D.M. Chiu, Mathematical modeling of incentive policies in P2P systems, in: Proceedings of the ACM Workshop on Network Economics (NetEcon), 2008.
[10] Q. Zhao, C.S. Lui, D.M. Chiu, Analysis of adaptive incentive protocols for P2P networks, in: Proceedings of INFOCOM, 2009.
[11] Y.F. Wang, A. Nakao, A.V. Vasilakos, J.H. Ma, P2P soft security: on evolutionary dynamics of P2P incentive mechanism, Computer Communications (COMCOM) 34 (3) (2011).
[12] Y.F. Wang, A. Nakao, On cooperative and efficient overlay network evolution based on group selection pattern, IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics) 40 (3) (2010).
[13] N. Nisan, T. Roughgarden, E. Tardos, V.V. Vazirani, Algorithmic Game Theory, Cambridge University Press, Cambridge, UK, 2007.
[14] B. Cohen, Incentives build robustness in BitTorrent, in: Proceedings of the P2P Econ., 2003.
[15] C. Hauert et al., Exploration dynamics in evolutionary games, Proceedings of the National Academy of Sciences (PNAS) 106 (3) (2009) 709–712.
[16] C. Hauert et al., Via freedom to coercion: the emergence of costly punishment, Science 316 (5833) (2007) 1905–1907.
[17] D.G. Rand, A. Dreber, T. Ellingsen, D. Fudenberg, M.A. Nowak, Positive interactions promote public cooperation, Science 325 (2009) 1272–1275.
[18] L.A. Imhof, D. Fudenberg, M.A. Nowak, Tit-for-tat or Win-stay, Lose-shift?, Journal of Theoretical Biology 247 (2007).
[19] E. Friedman, P. Resnick, The social cost of cheap pseudonyms, Journal of Economics and Management Strategy 10 (2) (1998).
[20] A. Traulsen, M.A. Nowak, J.M. Pacheco, Stochastic dynamics of invasion and fixation, Physical Review E 74 (2006).
[21] Ö. Gürerk, B. Irlenbusch, B. Rockenbach, The competitive advantage of sanctioning institutions, Science 312 (5770) (2006).
[22] S. Lee, D. Levin, V. Gopalakrishnan, S. Bhattacharjee, Backbone construction in selfish wireless networks, in: Proceedings of the ACM SIGMETRICS, 2007.
[23] N. Laoutaris, G. Smaragdakis, A. Bestavros, J.W. Byers, Implications of selfish neighbor selection in overlay networks, in: Proceedings of IEEE INFOCOM, 2007.
[24] D.D. Clark, J. Wroclawski, K.R. Sollins, R. Braden, Tussle in cyberspace: defining tomorrow’s Internet, IEEE/ACM Transactions on Networking (TON) 13 (3) (2005).

Yufeng Wang received his Ph.D. degree from the State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications (BUPT), China, in July 2004. From July 2006 to April 2007, he worked as a postdoctoral researcher at Kyushu University, Japan. Since May 2007, he has served as an associate professor at Nanjing University of Posts and Telecommunications, China. From March 2008 to March 2011, he was an expert researcher at the National Institute of Information and Communications Technology (NICT), Japan. He is also a guest researcher at the State Key Laboratory of Networking and Switching Technology, BUPT, China. His research interests focus on multi-disciplinary inspired networks and systems.

Akihiro Nakao received a B.S. (1991) in Physics and an M.E. (1994) in Information Engineering from the University of Tokyo. He was at IBM Yamato Laboratory, Tokyo Research Laboratory and IBM Texas Austin from 1994 to 2005. He received an M.S. (2001) and a Ph.D. (2005) in Computer Science from Princeton University. He has been teaching as an Associate Professor in Applied Computer Science at the Interfaculty Initiative in Information Studies, Graduate School of Interdisciplinary Information Studies, the University of Tokyo, since 2005. He has also been an expert visiting scholar and project leader at the National Institute of Information and Communications Technology (NICT) since 2007. His research interests are overlay networks and network virtualization.

A.V. Vasilakos is currently a Professor at the Dept. of Computer and Telecommunications Engineering, University of Western Macedonia, Greece, and a visiting Professor at the Graduate Programme of the Dept. of Electrical and Computer Engineering, National Technical University of Athens (NTUA). He has authored or co-authored over 200 technical papers in major international journals and conferences. He is the author or coauthor of 5 books and 20 book chapters in the areas of communications. He has served as general chair, TPC chair and symposium chair for many international conferences. He has served or is serving as an Editor and/or Guest Editor for many technical journals, such as IEEE TSMC-Part B, IEEE TITB, IEEE TWC, IEEE Communications Magazine and ACM TAAS. He is the founding Editor-in-Chief of the International Journal of Adaptive and Autonomous Communications Systems (IJAACS, http://www.inderscience.com/ijaacs) and the International Journal of Arts and Technology (IJART, http://www.inderscience.com/ijart). He is chairman of the Intelligent Systems Applications Technical Committee (ISATC) of the IEEE Computational Intelligence Society (CIS).

Jianhua Ma received his B.S. and M.S. degrees in Communication Systems from the National University of Defense Technology (NUDT), China, in 1982 and 1985, respectively, and his Ph.D. degree in Information Engineering from Xidian University, China, in 1990. He joined Hosei University in 2000, and is currently a professor in the Digital Media Department of the Faculty of Computer and Information Sciences, Hosei University, Japan. Dr. Ma is a member of IEEE and ACM. He has edited 10 books/proceedings, and published more than 150 academic papers in journals, books and conference proceedings. His research interest is ubiquitous computing.