Distributed cell selection in heterogeneous wireless networks


Xinxin Feng∗, Xiaoying Gan†, Haifeng Zheng∗, Zhonghui Chen∗
∗ College of Physics and Information Engineering, Fuzhou University, P. R. China
† Dept. of Electronic Engineering, Shanghai Jiao Tong University, P. R. China

Abstract—A critical issue in many wireless networks is how to establish the best possible connections between users and base stations. This is particularly challenging when users are randomly located over a geographical region and each is covered by a number of heterogeneous base stations. To this end, we design a smart and efficient cell selection mechanism to improve the user-base station connection in heterogeneous wireless networks. We formulate the cell selection problem as an asymmetric congestion game that accounts for users' heterogeneity in their locations and in their data rates to various cells. We show the existence of pure Nash equilibria (PNE) and propose a concurrent distributed learning algorithm that converges to them. In the algorithm, we allow users to perform random error-tolerant updates synchronously, and we guarantee that they reach one or multiple PNE with the largest utilities. In addition, we systematically investigate the implementation of the algorithm in practical networks. Simulation results show that the algorithm achieves satisfactory performance with an acceptable convergence rate.

Keywords—Heterogeneous Wireless Networks, Congestion Games, Cell Selection Mechanism

I. INTRODUCTION

The integration of heterogeneous wireless access networks over an Internet Protocol (IP) backbone is one of the most important trends in future communication systems. In such a heterogeneous network, the concept of being always connected becomes being always best connected (ABC) [1]. With ABC functionality, a user is allowed to choose the best available access network in the best possible way, anytime and anywhere. To this end, a smart and efficient cell selection mechanism is crucial for users. In heterogeneous networks, implementing an algorithm in which a central controller directs all the users is unrealistic, since it places a high burden on the system. An alternative approach is to design a decentralized cell selection mechanism by which each user makes its own decision without coordination. This is feasible because Cognitive Radio (CR) technologies have become widespread, whereby devices can acquire knowledge and reconfigure parameters such as the access network. In essence, the distributed cell selection scenario is closely related to the heterogeneous-type CR system [2]. A key challenge in designing such a mechanism is to resolve the competition among users in a fully distributed style, especially when they have no information about the resources, such as their availability and quality.


Game theory is an effective tool to model the users' competition. Directed by suitable learning algorithms, the users can gather information and adapt their behavior to reach an equilibrium. However, when the theory is applied to a cell selection problem, some critical features must be considered. First, due to users' different locations and cells' different coverage areas, each user has its own set of available resources. Second, the strictly asynchronous learning algorithms often adopted in theoretical models are hard to implement directly in a practical system, because the heterogeneous access networks usually belong to different operators, which makes an accurate scheduling of users' access impossible. Last, learning errors cannot be eliminated in complicated wireless environments (with noise and fading), leading to inaccurate learning results and hence erroneous user decisions. Based on the above considerations, we propose our game model and a distributed learning algorithm. We consider a practical scenario where heterogeneous networks coexist and each user is covered by multiple base stations (BSs). The users selfishly select the cells they believe to be best, which causes network congestion and performance degradation. We formulate the cell selection problem as an asymmetric congestion game, in which we consider both the users' positions (which determine their distinct strategy sets) and their specific data rates when accessing heterogeneous networks. We study distributed learning algorithms that allow synchronous updates. Moreover, we allow the users to make mistakes when changing their strategies. The main contributions of this paper are as follows.

• General game model formulation: We formulate the cell selection problem in heterogeneous networks as an asymmetric singleton congestion game with player-specific payoff functions and show the existence of PNE, at which each user chooses the best cell taking the decisions of the others into account.

• Distributed learning algorithms leading users to satisfactory PNE: We propose an error-tolerant concurrent distributed learning algorithm that converges to Nash equilibria using only the local one-step observations of users. Furthermore, we prove that it eliminates all the weakly dominated PNE and leads the users to a more satisfactory one. In addition, we provide detailed discussions on the implementation of the algorithm in practical networks in terms of terminal conditions, non-uniform probe and error probabilities, etc.

The remainder of the paper is organized as follows. Related work is given in Section II. In Section III, we present the system model and the game model. We then prove the existence of PNE of our game in Section IV. In Section V, we propose the concurrent distributed learning algorithm and study its properties. We provide further discussion in Section VI and evaluate the performance of our algorithms through simulation results in Section VII. Finally, the conclusion is drawn in Section VIII.


II. RELATED WORK

Fig. 1. Network model. The dots indicate the users. The circles indicate the coverage regions of BSs. Legend: OFDMA BS (4G networks), TDMA BS (2G networks), CDMA BS (3G networks), CSMA BS (WiFi networks).

In an environment where heterogeneous networks coexist, the competition between users, between networks, and between users and networks can all be formulated as different types of games [3], [4]. Among them, the congestion game is extensively studied. Congestion games were first studied in the wire-line routing problem, where each source node seeks the route with the minimum delay cost [5]. Recent studies apply them in wireless networks to model the competition for resources among selfish users. In [6] and [7], the authors study the resource-homogeneous congestion game: in [6] they propose an exact potential function, and in [7] they derive the conditions for the longest and shortest convergence time. In [8] and [9], the authors study congestion games with player-specific constants. In [8], the authors focus on analyzing the price of anarchy of the games. In [9], the authors propose two classes of throughput models, one formulated as a congestion game with player-specific constants and the other as a potential game. In [10], [11] and [12], the authors consider two critical aspects of wireless communication, namely interference and spatial reuse; their work is therefore based on congestion games on graphs. In [10] and [11], the authors show that pure Nash equilibria exist only in some special cases. In [12], generalized spatial games are studied, in which the users are allowed to move for higher utilities. In [13], the authors consider the users' diverse demands; aiming at maximizing the network resource utilization, they formulate the user-network association game as an exact potential game. Our work differs from the previous works in that we discuss the asymmetric scenario, in which the users have not only distinct resource sets but also player-specific payoff functions. In addition, our learning algorithm, which guides the users to a Nash equilibrium, differs from existing ones (such as the Q-learning in [6], the joint channel selection and strategic mobility algorithm in [12], the hysteresis mechanism in [9], and the local improvement algorithm in [13]) in that we allow users' concurrent and spontaneous choices for higher throughput. A similar action is group handover in vehicular communication [14], [15], which occurs because users on the same vehicle move together; however, that action is essentially different from ours with respect to users' willingness and selfishness. What is more, we allow errors in users' choices, which has not been studied in previous works.


III. SYSTEM MODEL

A. Network Model

We consider a wireless network which consists of K BSs and M users. Their sets are denoted by K = {1, 2, . . . , K} and M = {1, 2, . . . , M}, respectively. The BSs represent the access points or base stations of heterogeneous networks, such as TDMA, OFDMA, CDMA and CSMA networks. All the BSs have partially overlapping coverage areas, as shown in Figure 1. In addition, due to spectrum separation among different types of networks and spatial reuse within the same type of network, all the BSs are assumed to be interference-free. On the user side, we assume that the users are within the coverage areas of different BSs. Each user has the ability to sense and probe the BSs covering it, but can access only one at a given time. We denote the achieved throughput of user m connected to BS k as π_k^m(R_k^m, Π_k, n_k), in which R_k^m is the instantaneous rate when user m occupies BS k exclusively, Π_k is the resource scheduling policy defined by BS k, and n_k is the number of users sharing BS k. To reflect realistic scenarios, we set R_k^m as a user-specific parameter and set Π_k according to the type of BS, such as proportional fair scheduling for an OFDMA BS [16] and the CSMA protocol for a WiFi access point. This setting makes π_k^m distinct from user to user. In addition, given that each BS has limited network resources, we assume that a user's throughput decreases with the number of users occupying the same BS, and thus define π_k^m(R_k^m, Π_k, n_k) as a monotonically decreasing function of n_k. (That a user's payoff decreases strictly with the congestion level is a common assumption in previous congestion-game-based resource competition models [6]–[12], [22]; our work also adopts it. Concrete expressions for π_k^m, consistent with both the assumption and practical systems, are provided in Section VII.)


B. Game Model

By defining a user's payoff as the throughput it obtains by accessing one of the heterogeneous BSs, we model the competition among M users for K heterogeneous BSs as a congestion game. In our case, the congestion game tuple is represented as Γ = (K, M, (Σ_m)_{m∈M}, (π_k^m)_{k∈K, m∈M}), where K denotes the set of resources/BSs, M denotes the set of players/users, Σ_m is the strategy space of player m, and π_k^m : N → R+ is the payoff function of player m when choosing BS k. We define σ = (σ_1, σ_2, . . . , σ_M) as a strategy profile and (n_1, n_2, . . . , n_K) as the congestion vector corresponding to σ, in which σ_m ∈ Σ_m is the strategy played by user m and n_k = |{1 ≤ m ≤ M | σ_m = k}|. It is worth noting that the payoff function π_k^m is player-specific and decreases monotonically in n_k. Since we assume that each user obtains only one resource at a given time, each Σ_m ⊆ K is a set of pure (singleton) strategies. Furthermore, in our case the strategy spaces of the users differ, and thus the game is asymmetric [17]. Therefore, our network can be formulated as an Asymmetric Singleton congestion game with Player-Specific payoff functions (ASPS). In the following, we prove that ASPS has PNE and possesses the weak finite improvement property (defined in Definition 2), and based on this property we propose distributed algorithms converging to PNE.
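To make the game objects concrete, the following minimal Python sketch (not part of the paper; the instance, `strategy_sets`, `rates` and `payoff` are illustrative assumptions) represents a small ASPS and computes the congestion vector (n_1, . . . , n_K) induced by a strategy profile.

```python
from collections import Counter

# Illustrative ASPS instance: K = 3 BSs, M = 3 users.
# strategy_sets[m] is the set of BSs covering user m (its strategy space Sigma_m).
strategy_sets = {1: {1, 3}, 2: {1, 2}, 3: {2, 3}}

# Player-specific payoff, assumed monotonically decreasing in the congestion n_k
# (here: a per-user rate shared equally among the n_k users on the same BS).
rates = {(1, 1): 6.0, (1, 3): 5.0, (2, 1): 4.0, (2, 2): 7.0, (3, 2): 3.0, (3, 3): 8.0}
def payoff(m, k, n_k):
    return rates[(m, k)] / n_k

def congestion_vector(profile):
    """profile maps each user m to its chosen BS sigma_m; returns n_k for every BS."""
    counts = Counter(profile.values())
    return {k: counts.get(k, 0) for k in {1, 2, 3}}

profile = {1: 1, 2: 1, 3: 3}            # sigma = (1, 1, 3)
n = congestion_vector(profile)           # {1: 2, 2: 0, 3: 1}
print(n, payoff(2, 1, n[1]))             # user 2 shares BS 1 with user 1
```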

IV. EXISTENCE OF A PURE STRATEGY NASH EQUILIBRIUM IN ASPS

In this section we show the existence of PNE and establish the properties of ASPS that matter for the design of learning algorithms. We define the pure Nash equilibrium as follows.

Definition 1. (PNE) A strategy profile σ = (σ_1, σ_2, . . . , σ_M) is a PNE if and only if each σ_m ∈ Σ_m is a best-reply strategy, i.e., π_{σ_m}^m(n_{σ_m}) ≥ π_k^m(n_k + 1) for all m ∈ M and all k ∈ Σ_m with k ≠ σ_m.

Theorem 1. A finite ASPS has a pure Nash equilibrium.

A sketch of the proof is provided in Appendix A. In traditional congestion games (e.g., resource-homogeneous or user-homogeneous congestion games), researchers usually exploit the finite improvement property (FIP) [18], under which any single-player improvement path starting from an arbitrary initial strategy profile terminates at a PNE in finitely many steps. The appeal of FIP is that it guarantees the existence of a generalized ordinal potential function [19], and thus yields a straightforward distributed algorithm: users playing their better replies (i.e., selfishly updating their strategies to improve their payoffs) asynchronously reach a "local" maximum of the potential function, which is a PNE. However, ASPS differs from traditional congestion games in its asymmetric and player-specific features. In what follows, we show that ASPS does not in general possess FIP, except in special cases.

Corollary 1. ASPS does not possess FIP unless there are only two resources or only two users.

We provide a counterexample showing that a cycle of single-player improvement steps exists in ASPS in Appendix B, and the proofs for the special cases are given in Appendix C. The nonexistence of FIP implies that ASPS does not generally admit an exact potential function. It also raises the question of whether there exists a strategy profile from which every single-player improvement path cycles. We show next that this is not the case: from any strategy profile, at least one single-player improvement path ends at a PNE. This property is called the weak FIP.

Definition 2. (Weak FIP) A game has the weak FIP if from any strategy profile σ there exists a finite single-player improvement path that ends at a PNE.

Weak FIP is a relaxation of FIP: FIP implies weak FIP, but not vice versa. A finite game (i.e., one with finite strategy spaces) with weak FIP has the appealing property that if the users perform better replies in random order, they eventually converge to a PNE [20]. We therefore show that ASPS has the weak FIP.

Theorem 2. A finite ASPS has the weak FIP.

To prove this theorem, it suffices to show that from any strategy profile there exists a better-reply (or best-reply) improvement path that ends at a PNE. Similar to [19], we can construct such a special best-reply improvement path in two parts. In the first part, each deviator chooses the next deviator's present strategy, subject to the strategy spaces, until no one changes. In the second part, each deviator takes the last deviator's previous strategy, subject to the strategy spaces, until no one can change. To avoid redundancy, we omit the detailed proof here.
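As a small illustration of Definition 1 (not part of the paper), the following Python sketch checks whether a strategy profile is a PNE, assuming `strategy_sets` and `payoff` structures like the hypothetical ones sketched in Section III-B.

```python
from collections import Counter

def is_pne(profile, strategy_sets, payoff):
    """Definition 1: no user can gain by unilaterally switching to another covering BS,
    given that the switch would raise that BS's congestion by one."""
    counts = Counter(profile.values())
    for m, current in profile.items():
        own = payoff(m, current, counts[current])
        for k in strategy_sets[m]:
            if k == current:
                continue
            if payoff(m, k, counts.get(k, 0) + 1) > own:
                return False            # user m has a profitable unilateral deviation
    return True

# Example with the toy instance above:
# is_pne({1: 1, 2: 2, 3: 3}, strategy_sets, payoff)
```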

V. DISTRIBUTED LEARNING ALGORITHM FOR CELL SELECTION

In this section, we introduce the concurrent learning algorithm used to achieve the PNE of ASPS. The algorithm is fully distributed: users learn the environment and adapt their strategies based on their own measurements, without information exchange. In addition, the algorithm operates in a bounded-rationality style, requiring neither common or prior knowledge nor large storage space for history information. The key idea of the algorithm is to allow more than one user to perform synchronous updates concurrently, while also allowing sampling mistakes. We define the strategy updating rule as follows.

Definition 3. (Better-Reply Dynamics with Errors and Simultaneous Sampling [20]) This dynamics represents the following class of strategy updating rules. In any time period, each user m ∈ M randomly samples a strategy σ′_m ∈ Σ_m \ {σ_m} with probability ε(|Σ_m| − 1), divided equally among the strategies of Σ_m \ {σ_m}. User m switches from its status quo strategy to σ′_m if and only if π_{σ′_m}^m(n_{σ′_m} + 1) > π_{σ_m}^m(n_{σ_m}). When π_{σ′_m}^m(n_{σ′_m} + 1) ≤ π_{σ_m}^m(n_{σ_m}), the user switches with error probability ρ = ε^L, where L is some integer satisfying L > M.

The constraint L > M guarantees that the probability of a single mistake is smaller than the probability that all the users switch concurrently without any mistake. We now analyze our ASPS as a stochastic process under Definition 3, for which we introduce some additional definitions.

First, the better-reply dynamics with errors and simultaneous sampling generates a Markov process on the finite strategy profile space S = ×_{m∈M} Σ_m. We define the transition probability from strategy profile σ to σ′ as p(σ, σ′; ε), and the corresponding transition matrix as P(ε). Due to the existence of errors, all the states of the process P(ε) form a single recurrent class: the probability of moving from any state to any other is positive, and the move completes in a finite number of steps. Therefore, the Markov process P(ε) is aperiodic and irreducible for all ε > 0. Thus, there exists a unique stationary distribution µ(ε), which represents the long-run frequency with which each strategy profile is visited. In addition, we define the limit stationary distribution µ = lim_{ε→0} µ(ε) and define P(0) = lim_{ε→0} P(ε) as the transition matrix of the Markov process in the limit ε = 0.

Second, we define the resistance of a transition from σ to σ′ as r(σ, σ′) = lim_{ε→0} log p(σ, σ′; ε) / log ε. We define a tree rooted at σ, denoted T(σ), as a directed tree over S in which, for each strategy profile σ′ ∈ S \ {σ}, there is a unique directed path from σ′ to σ. The resistance of T(σ) is the sum of the resistances of the transitions composing it. In addition, we define the stochastic potential ψ(σ) as the minimum resistance over all trees rooted at σ, i.e., ψ(σ) = min_{T(σ)} Σ_{(σ′,σ″)∈T(σ)} r(σ′, σ″).

Results from [21] show that µ exists and is equal to a stationary distribution of P(0). Furthermore, they show that a strategy profile σ is stochastically stable (i.e., has positive mass, µ(σ) > 0) if and only if it has the minimum stochastic potential. Intuitively, the stochastic potential reflects how easily a strategy profile can be reached and thus represents the users' preferences; a stochastically stable profile reflects the users' final choices, and this is what we explore.

As an example, we provide the system state transition diagram of a heterogeneous network with three users and three BSs in Figure 2. There are a total of 6 strategy profiles, and S is {(1, 1, 3), (2, 1, 3), (1, 2, 3), (2, 2, 3), (1, 3, 3), (2, 3, 3)}. A transition between two system states is feasible if there is a directed link between them. Due to the existence of errors, all the strategy profiles form a single recurrent class. A tree rooted at (2, 1, 3) is marked by red directed links in the figure. Obviously, each strategy profile admits several trees with different resistances. We now analyze which strategy profiles have positive mass and what properties this reflects.

Fig. 2. System state transition diagram (three users, three BSs).

Theorem 3. In ASPS, as ε → 0, the limit distribution under better-reply dynamics with errors and simultaneous sampling puts positive mass on a strategy profile σ only if σ is a PNE.

A sketch of the proof is provided in Appendix D. Theorem 3 guarantees that all states with positive mass are PNE; the converse, however, is not true. Therefore, we mainly discuss which type of PNE receives zero mass. We define weakly dominated PNE as follows.

Definition 4. (Weakly Dominated PNE) Suppose σ and σ′ are two PNE in ASPS. We say σ′ is weakly dominated by σ, written σ′ ≺ σ, when the following two conditions hold: (1) for every user m ∈ M, its payoff under strategy σ_m in σ is at least as large as its payoff under σ′_m in σ′; and (2) there exist at least two users whose payoffs are strictly higher under σ than under σ′. In brief, if π_{σ_m}^m(σ) ≥ π_{σ′_m}^m(σ′) for all m ∈ M and the inequality is strict for at least two users, then σ′ ≺ σ. (Note that there cannot exist two PNE σ and σ′ such that exactly one user receives a strictly higher payoff in σ than in σ′; if such a pair existed, σ′ would not be a PNE according to Definition 1.)

For simplicity, we refer to the PNE that are not weakly dominated as the dominating PNE. Since we allow concurrent sampling and errors, the users do not stick to the first PNE they reach. A deviation from a PNE occurs if and only if one of three situations happens: (i) at least one user deviates by mistake only; (ii) more than one user deviates concurrently without making any mistake; or (iii) a combination of the first two. We first analyze the property of the weakly dominated PNE, and then characterize the concurrent deviations under these three situations.

Lemma 1. Suppose σ is a PNE in ASPS. If a deviation from σ to another strategy profile γ occurs due to more than one user's concurrent deviation without any mistake, then γ is a PNE satisfying σ ≺ γ.

The proof is given in Appendix E. We then study which PNE are excluded under the dynamics defined in Definition 3.

Theorem 4. In ASPS, as ε → 0, the limit stationary distribution of the dynamics in Definition 3 puts zero mass on a PNE σ′ if it is weakly dominated by at least one other PNE.

The proof is given in Appendix F.
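For concreteness, the weak-domination test of Definition 4 can be sketched as follows (illustrative code, not from the paper; `payoff_vector` is an assumed helper returning each user's payoff in a profile).

```python
from collections import Counter

def payoff_vector(profile, payoff):
    """Each user's payoff in a strategy profile, given the induced congestion."""
    counts = Counter(profile.values())
    return {m: payoff(m, k, counts[k]) for m, k in profile.items()}

def weakly_dominated(sigma_prime, sigma, payoff):
    """True if PNE sigma_prime is weakly dominated by PNE sigma (Definition 4):
    nobody is worse off under sigma, and at least two users are strictly better off."""
    u_prime = payoff_vector(sigma_prime, payoff)
    u = payoff_vector(sigma, payoff)
    if any(u[m] < u_prime[m] for m in u):
        return False
    return sum(1 for m in u if u[m] > u_prime[m]) >= 2
```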

Theorem 4 implies that the dynamics defined in Definition 3 converges to a PNE, and that this PNE is a dominating one that is weakly preferred by all users. The following corollary is a direct consequence of Theorem 4.

Corollary 2. If a Pareto optimal PNE σ* exists in ASPS, the limit stationary distribution of the better-reply dynamics defined in Definition 3 puts all mass on σ*.

Proof: Since σ* is a Pareto optimal PNE, σ ≺ σ* holds for any non-Pareto-optimal PNE σ. By Theorem 4, any such σ has zero mass. Therefore, σ* carries all the mass.

We summarize the concurrent distributed learning algorithm with error-tolerant property (CDLAE), based on the dynamics defined in Definition 3, in Algorithm 1. The users guided by the algorithm act in a two-stage time period. In Stage I (the "Decision, Choosing and Probing" stage), more than one user is allowed to make a spontaneous try for a better BS by probing; we call these the active users. In Stage II (the "Transmission" stage), every active user decides whether to keep transmitting on its original channel or to switch to a new BS according to the probing results. Note that we do not set a terminal condition; our reasons are given in the following section.

Algorithm 1: Concurrent Distributed Learning Algorithm with Error-Tolerant Property (CDLAE)
Initialization: Set a random initial strategy profile σ(0) = (σ_1(0), σ_2(0), . . . , σ_M(0)). Set the sampling probability ε so that (max_{m∈M} |Σ_m| − 1) ε ≤ 1 holds. Set the probe probability P_m = ε(|Σ_m| − 1). Set the error probability ρ = ε^L for some integer L > M.
loop
  for each time period t and each user m do
    ▷ Decision, Choosing, and Probing Stage
    Generate a random integer in [1, 100].
    if the integer ≤ 100 P_m then
      randomly choose a BS k in Σ_m \ {σ_m} (uniformly) and probe it; user m becomes active.
    else
      stay still and transmit.
    end if
  end for
  ▷ Transmission Stage
  for each active user do
    if the user receives a higher payoff from the sampled BS then
      switch to the sampled BS and transmit.
    else
      stay with the current BS with probability 1 − ρ, or switch to the sampled BS with probability ρ, and transmit.
    end if
  end for
  for all other users do
    keep transmitting.
  end for
end loop
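A runnable sketch of Algorithm 1's two-stage period is given below (an illustration under assumptions, not the authors' code; the data structures `strategy_sets` and `payoff` are the hypothetical ones used earlier). Each user independently becomes active with probability P_m = ε(|Σ_m| − 1), probes one alternative BS, and then all active users switch synchronously, with erroneous switches occurring with probability ρ = ε^L.

```python
import random
from collections import Counter

def cdlae_step(profile, strategy_sets, payoff, eps, L):
    """One two-stage CDLAE period: probing stage, then synchronous transmission stage."""
    counts = Counter(profile.values())
    probes = {}
    # Stage I: decision, choosing and probing
    for m, current in profile.items():
        alternatives = [k for k in strategy_sets[m] if k != current]
        p_m = eps * len(alternatives)                 # probe probability P_m
        if alternatives and random.random() < p_m:
            probes[m] = random.choice(alternatives)   # active user samples one BS
    # Stage II: transmission (active users decide against the current congestion)
    rho = eps ** L
    new_profile = dict(profile)
    for m, k in probes.items():
        gain = payoff(m, k, counts.get(k, 0) + 1) > payoff(m, profile[m], counts[profile[m]])
        if gain or random.random() < rho:             # better reply, or error with prob eps^L
            new_profile[m] = k
    return new_profile

def run_cdlae(profile, strategy_sets, payoff, eps=0.2, L=None, periods=1000):
    L = L or (len(profile) + 1)                       # L = M + 1, as in the simulations
    for _ in range(periods):
        profile = cdlae_step(profile, strategy_sets, payoff, eps, L)
    return profile
```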

VI. DISCUSSION

A. Terminal Condition

One characteristic of CDLAE is that the users may switch among several dominating PNE. (The frequency of such switching depends on the sampling probability and is generally very low; this type of switching differs from the ping-pong effect.) This is because we allow the users to keep sampling even after they have reached a dominating PNE. This characteristic makes CDLAE well suited to users who keep pursuing high throughput (e.g., when they are accessing web information), especially in a varying environment. However, when only a limited number of switches is acceptable, we can modify ASPS by setting QoS demands for the users, so that a user stops sampling once its demand is satisfied. Under this modification, ASPS is similar to the QoS game in [22], and CDLAE still works.

B. Non-uniform Error and Probe Probabilities

Notice that we set a uniform sampling probability ε and error probability ρ = ε^L for all users in CDLAE to simplify the analysis. When we slightly relax the "uniform" condition according to the following corollary, all of our results still hold.

Corollary 3. CDLAE works in the general case where different users have different sampling probabilities and error probabilities, provided that the sampling probabilities are of the same order as they approach zero, and the error probabilities are of the same or higher order than the one defined in Definition 3 as they approach zero. (Although we allow non-uniform sampling probabilities, we specifically exclude the case in which one user adopts ε while another adopts ε².)

The proof is given in Appendix G. The general case discussed here is obviously closer to actual wireless networks, where the users are heterogeneous; our CDLAE is thus widely applicable in real environments.

C. Convergence Rate

Our CDLAE is essentially based on random better-response dynamics. Thus, we focus on how long convergence takes under better (best) response dynamics in congestion games. The most common approach to bounding the convergence time of traditional congestion games is to construct a potential function, for example, in the singleton congestion game [23], in the congestion game with linear latency functions [24], and in the spatial congestion game where the users adopt a persistence-probability-based random access mechanism for channel contention [12]. For congestion games with player-specific functions, exact bounds are known only for special cases, for example, the resource-homogeneous game [7], the game with only two resources [25], the games represented as trees and cycles [26], and the game with constraints on users' payoffs [27].

Unfortunately, for the general player-specific singleton congestion game, the authors in [26] conjecture that there is no polynomial upper bound on the expected number of steps under random best (better) response dynamics, and they support this conjecture by simulations. The ASPS defined in this paper is in essence an instance of the general player-specific singleton congestion game, and we conjecture that no polynomial upper bound exists either. So far, we are unable to provide an exact convergence bound for CDLAE. This is not a problem unique to our algorithms; it is simply a feature of weak FIP. To compensate for this deficiency, we explore the convergence rate of CDLAE via simulations in Section VII. The results show that CDLAE converges in an affordable time.

D. User Mobility

User-mobility communication scenarios can be roughly classified into two types: plan-driven and throughput-driven. In a plan-driven scenario, the users typically move according to plans, and their objective is to track the varying channel resources so as to keep the best connections. Although we do not provide a detailed discussion of this scenario, it is worth noting that our CDLAE is particularly well suited when users' payoff functions change slowly over time and the probability of change is very small compared with the probability of randomly sampling unused actions, for example, when the users move at a medium speed; under this situation, all of the results proven above remain valid. In a throughput-driven scenario, on the other hand, the users move toward stronger and more stable connections. For example, when a disaster such as an earthquake or a nuclear accident occurs, people are anxious to get in touch with family and rescue workers, so they have an incentive to move toward stronger signals with fewer users. We discuss the throughput-driven mobility scenario below.

We first briefly define the joint cell selection and mobility game. Without loss of generality, we assume that there are a total of L possible spectrum access positions, each within the coverage areas of some definite BSs. We also assume that all the positions are connected, i.e., any position can be reached from any other position. The set of all positions is defined as ∆ = {δ_1, δ_2, . . . , δ_L}. The set of BSs accessible at position δ_l is defined as K_{δ_l} = {k_{δ_l}}, where k ∈ K. We define the set of all BSs accessible from all the positions as K_∆ = {K_{δ_1}, K_{δ_2}, . . . , K_{δ_L}}. We represent the joint cell selection and mobility game tuple as Υ = (K_∆, M, (Σ_m)_{m∈M}, (π_{k_δ}^m)_{m∈M, k_δ∈K_∆}). Due to the mobility of users and the connectivity of ∆, each user is allowed to connect to any BS from the corresponding position, i.e., its strategy space is Σ_m = K_∆. We assume that each user has a moving distance constraint: if user m is located at d_m ∈ ∆ and its distance constraint is θ_m, it can only move to a new position d in the set ∆_{d_m}^m = {d ∈ ∆ \ {d_m} : ||d − d_m|| ≤ θ_m}, and access a BS in the corresponding set K_{∆_{d_m}^m}. We define d = (d_1, d_2, . . . , d_M) as a position profile and σ_d = (σ_{1d_1}, σ_{2d_2}, . . . , σ_{Md_M}) as a strategy profile for all users; a small sketch of the resulting per-user strategy space is given after Corollary 5 below.

We find that when no user moves, the game Υ degenerates into a corresponding cell selection game Γ_d, where d is the position profile of all users. The following key results reveal the similarities and differences between Υ and Γ_d. First, Υ is a symmetric singleton congestion game with player-specific payoff functions. Any cell selection game Γ_d is a subgame of Υ. (A subgame of a finite game is defined by replacing each strategy set with some subset of it and restricting the payoff functions correspondingly.) Therefore, Υ possesses a PNE, and it is one of the PNE of the corresponding subgame Γ_d; this can be proved by contradiction. Second, Υ has the weak FIP, as Γ_d does, and CDLAE can be applied to Υ to lead the users to a PNE. Due to the connectivity of ∆, the Markov process associated with CDLAE is aperiodic and irreducible for all ε > 0. Hence the Markov process has a unique limit stationary distribution, and the states with the minimum stochastic potential have positive mass, as in Γ_d. The following corollary characterizes this property.

Corollary 4. In the joint cell selection and mobility game Υ, as ε → 0, the limit stationary distribution of the dynamics in Definition 3 puts zero mass on a PNE σ_d if it is weakly dominated by a PNE σ′_{d′} in Γ_{d′}, where d′ is in the set ×_{m∈M} {∆_{d_m}^m ∪ d_m}.

Corollary 4 can be viewed as a generalization of Theorem 4 applied to Υ. Furthermore, in Γ_d, because errors are allowed, a direct transition between any two strategy profiles is feasible; in Υ, however, some direct transitions between strategy profiles are forbidden by the moving distance constraints. This difference may result in a Pareto optimal PNE of Υ (if one exists) not having the minimum stochastic potential. Therefore, we generalize Corollary 2 for Υ as follows.

Corollary 5. If a Pareto optimal PNE σ* exists in game Υ and no user has a moving distance constraint, the limit stationary distribution of the better-reply dynamics defined in Definition 3 puts all mass on σ*.
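The following short Python sketch (illustrative only; the position coordinates and coverage data are hypothetical) shows how a user's strategy space in Υ can be restricted by its current position d_m and moving distance constraint θ_m, i.e., the BSs reachable from ∆_{d_m}^m ∪ {d_m}.

```python
import math

# Hypothetical data: coordinates of positions and the BSs accessible at each position.
positions = {"d1": (0.0, 0.0), "d2": (50.0, 0.0), "d3": (200.0, 0.0)}
bs_at = {"d1": {1, 2}, "d2": {2, 3}, "d3": {3, 4}}

def reachable_positions(current, theta):
    """Delta_{d_m}^m plus the current position: all positions within distance theta."""
    cx, cy = positions[current]
    return {d for d, (x, y) in positions.items()
            if math.hypot(x - cx, y - cy) <= theta or d == current}

def mobility_strategy_space(current, theta):
    """BSs user m can reach in the joint cell selection and mobility game."""
    bss = set()
    for d in reachable_positions(current, theta):
        bss |= bs_at[d]
    return bss

print(mobility_strategy_space("d1", theta=60.0))   # {1, 2, 3}: BS 4 is out of range
```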

VII. SIMULATION RESULTS

We consider a heterogeneous network with K = 6 BSs and several users. The number of users M is fixed at 10 unless otherwise specified. All of them are distributed in a 1000 × 1000 square area. The BS types are set as 2 TDMA, 2 OFDMA, 1 CSMA and 1 CDMA, and their coverage radii are 300, 400, 400, 500, 500 and 700, respectively. As described in Section III, we define the payoff function π_k^m(n_k) as the throughput that a user obtains by accessing different BSs, as follows.

• In the time-fair TDMA system commonly adopted in 2G networks, the users are allocated equal time durations. We define the throughput as
π_k^m(n_k) = R_k^m / n_k.   (1)

• Under the proportional fair scheduling (PFS) used for OFDMA in 3G/4G networks, we formulate the throughput as (see e.g. [16])
π_k^m(n_k) = (R_k^m / n_k) Σ_{j=1}^{n_k} (1/j),   (2)
where the factor Σ_{j=1}^{n_k} (1/j) reflects the channel fading.

• In WiFi networks, the users follow the CSMA protocol, under which their throughputs depend on the probabilities of accessing the channel. We formulate user m's throughput as (see e.g. [10])
π_k^m(n_k) = R_k^m p_k^m(n_k),   (3)
where p_k^m(n_k) is the probability that user m accesses the channel provided by BS k when there are n_k users competing for it simultaneously.

• In CDMA networks, user m's throughput is related to its signal-to-interference ratio (SINR) [28]. We formulate it as
π_k^m(n_k) = log( 1 + γ_k h_k^m P_k^m / ( N_0^k + Σ_{j=1, j≠m}^{n_k} h_k^j P_k^j ) ),   (4)
where γ_k > 0 is the spreading gain of BS k, h_k^j is the channel gain between user j and BS k, P_k^j is the transmit power of user j, and N_0^k is the receiver noise power. When BSs adopt interference cancellation technologies, we assume that the residual interference I_k^m(n_k) follows a Gaussian distribution whose average power increases with n_k [29], and rewrite (4) as
π_k^m(n_k) = log( 1 + γ_k h_k^m P_k^m / ( N_0^k + I_k^m(n_k) ) ).   (5)

In (1), (2) and (3), R_k^m is a random value uniformly distributed in the range (0, 10). In (3), the probability p_k^m(n_k) is defined as a convex function decreasing with n_k, formulated as 1/n_k². In (5), we set the spreading gain γ_k = 32 and the noise power N_0^k = 0.01; the power P_k^m is randomly chosen from the range (0, 100), and the residual interference I_k^m(n_k) is a linear function increasing with n_k. We consider only large-scale fading, and the channel gain h_k^m is set to the inverse square of the distance between user m and BS k. We set the error integer L = M + 1 and the sampling probability ε = 0.2 unless otherwise noted. We evaluate our CDLAE by comparing it with a sequential distributed learning algorithm (SDLA), which allows only one user to randomly sample and update in each time period and forbids errors.
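The four throughput models (1)–(5) can be written compactly as Python functions. The sketch below is illustrative: the parameter choices mirror the ones stated above, but the function names and the interference slope are our assumptions, not the authors' code.

```python
import math

def tdma(R, n):
    """Eq. (1): time-fair sharing."""
    return R / n

def ofdma_pfs(R, n):
    """Eq. (2): proportional fair scheduling; the harmonic sum reflects fading."""
    return (R / n) * sum(1.0 / j for j in range(1, n + 1))

def csma(R, n):
    """Eq. (3) with p_k^m(n) = 1/n^2, as in the simulations."""
    return R * (1.0 / n ** 2)

def cdma(h, P, n, gamma=32.0, N0=0.01, interference_slope=0.05):
    """Eq. (5): log-SINR with residual interference growing linearly in n (assumed slope)."""
    I = interference_slope * n
    return math.log(1.0 + gamma * h * P / (N0 + I))

# All four models decrease in the congestion level n, as required by the game model.
for n in (1, 2, 4):
    print(n, tdma(5.0, n), ofdma_pfs(5.0, n), csma(5.0, n), cdma(1e-4, 50.0, n))
```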

A. Equilibria Analysis

We illustrate our network model in Figure 3, where BSs are represented by big dots and users by small dots, and the lines describe the coverage areas. The total number of strategy profiles in this system is |S| = Π_{m∈M} |Σ_m| = 6144.

Fig. 3. Network model: 6 BSs (2 TDMA, 2 OFDMA, 1 CSMA, 1 CDMA) and 10 users in a 1000 × 1000 area.

Note that below we denote each strategy profile by its index from 1 to 6144. For ease of analysis, we list all the PNE and the global optimum (GO) in Table I, where each row gives the index, the strategy profile, and the corresponding users' payoffs.

TABLE I. PURE NASH EQUILIBRIA AND GLOBAL OPTIMUM

PNE 4783: (5, 4, 2, 3, 3, 4, 1, 3, 3, 6) | payoffs (3.82, 5.24, 2.28, 2.97, 3.07, 5.27, 7.10, 2.22, 0.24, 3.32)
PNE 4878: (4, 4, 2, 3, 3, 5, 1, 3, 3, 6) | payoffs (3.44, 5.24, 2.28, 2.97, 3.07, 5.72, 7.10, 2.22, 0.24, 3.32)
PNE 5454: (4, 4, 2, 3, 3, 1, 5, 3, 3, 6) | payoffs (3.44, 5.24, 2.28, 2.97, 3.07, 5.14, 5.18, 2.22, 0.24, 3.32)
PNE 5549: (1, 4, 2, 3, 3, 4, 5, 3, 3, 6) | payoffs (6.19, 5.24, 2.28, 2.97, 3.07, 5.27, 5.18, 2.22, 0.24, 3.32)
GO  5537: (1, 4, 3, 2, 3, 4, 5, 3, 3, 6) | payoffs (6.19, 5.24, 0.98, 5.84, 3.07, 5.27, 5.18, 2.21, 0.24, 3.32)

Figure 4 shows the convergence dynamics of SDLA and our CDLAE with respect to the distribution of each strategy profile. We see that under both SDLA and CDLAE, when the time is short (e.g., shorter than 50), several strategy profiles are chosen, which means the users keep selecting different BSs in pursuit of higher payoffs. As time passes, fewer strategy profiles have positive mass. When the time exceeds 300 in our case, only one strategy profile (index 5549) under SDLA and two strategy profiles (indices 4783 and 4878) under CDLAE have positive mass. Referring to Table I, we find that both SDLA and CDLAE lead the users to one or two of the PNE. The differences between SDLA and CDLAE are as follows. Once the users reach a PNE under SDLA, they do not leave it. Under CDLAE, however, the users keep sampling for higher payoffs even after reaching a PNE; moreover, they may leave their present PNE and reach another PNE because at least one user makes a mistake or more than one user jumps concurrently. Therefore, our CDLAE is particularly well suited to environments in which the users' payoff functions change slowly over time; under such conditions, all of our results remain valid.

Fig. 4. Learning dynamics under SDLA and CDLAE: strategy profile distributions at t = 50, 100, 300, 500 and 1000.

Fig. 5. Distribution of different PNE under SDLA and CDLAE.

We then evaluate the equilibria achieved by SDLA and CDLAE. We let the algorithms begin at an arbitrary initial strategy profile and run until the distributions are stationary, and we repeat this process 30000 times. The stationary distributions under SDLA and CDLAE are shown in Figure 5. As shown in Table I, there are 4 PNE, denoted σ_4783, σ_4878, σ_5454 and σ_5549. In Figure 5, we find that all the PNE have positive mass under SDLA, while only σ_4783, σ_4878 and σ_5549 have positive mass under CDLAE. Referring to Table I, the reason for this phenomenon is that σ_5454 ≺ σ_4783; in fact, σ_5454 is weakly dominated by all the other PNE in this case. Therefore, by the analysis in Theorem 4, σ_5454 is eliminated under CDLAE. As for σ_4783, σ_4878 and σ_5549, since none of them is Pareto optimal, they all have positive mass under CDLAE. Note that the values of the distributions in the upper sub-figure reflect the wishes of the users; by observing the lower sub-figure, we find that CDLAE leads the users to different PNE according to their wishes when the time is long enough.

B. User Satisfaction Ratio

In this subsection, we generalize our network model by randomly distributing the users and varying their number from 4 to 22. We compare SDLA and CDLAE with the solution obtained by the centralized global optimization max_{σ∈S} Σ_{m∈M} π_{σ_m}^m(n_{σ_m}). We show the results in Figure 6, where each point of the curves is the average over 30000 simulation runs. In the upper two sub-figures, we study how SDLA and CDLAE affect the system-wide payoff. With respect to the average system-wide payoff, the performance losses of SDLA and CDLAE are within 6% and 5%, respectively; with respect to the worst PNE, the losses are within 15% and 10%. The losses remain approximately constant as the number of users increases. In addition, CDLAE always achieves better, or at least equal, performance compared with SDLA. Note that in our algorithms the cardinal properties of ASPS play no role: all the users' choices depend strictly on the ordinal properties of the payoff functions. We therefore define a parameter s_σ, the "user satisfaction ratio" at strategy profile σ, to reflect the users' preferences. Let Ξ be the set of all the PNE and the global optimum in the game. The user satisfaction ratio s_σ at any σ ∈ Ξ is defined as

s_σ = [ Σ_{γ∈Ξ\{σ}} Σ_{m∈M} 1( π_{σ_m}^m(n_{σ_m}) ≥ π_{γ_m}^m(n_{γ_m}) ) ] / ( M |Ξ| ),   (6)

where 1(·) is an indicator function. We illustrate the average user satisfaction ratios under global optimization, SDLA and CDLAE in the lower sub-figure of Figure 6. We see that the user satisfaction ratio decreases with the number of users under global optimization, while it remains almost unchanged under SDLA and CDLAE; accordingly, the performance gap between global optimization and SDLA/CDLAE increases. The reasons are as follows. When the number of users is less than or equal to the number of BSs (e.g., M ≤ 6 in Figure 6), the users may be covered by individual BSs and there may be no competition; in this case the GO coincides with a PNE, and the user satisfaction ratio under global optimization is close to that under SDLA/CDLAE. As the number of users increases (e.g., M ≥ 8 in Figure 6), the competition becomes more intense, and the proportion of users who are forced to be sacrificed for the GO also increases, which significantly decreases the user satisfaction ratio under global optimization. Under CDLAE, about 65% of the users are satisfied, which is always higher than under SDLA and under global optimization. Based on the above analysis, we conclude that CDLAE can greatly improve user satisfaction by leading the users to a more satisfactory PNE with less than 5% performance loss compared to the centralized optimal solution, and that CDLAE always outperforms SDLA.
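A small sketch of the satisfaction-ratio computation in (6) follows (illustrative only; the payoff maps passed in are assumed to have been computed for σ and for the other profiles in Ξ).

```python
def satisfaction_ratio(u_sigma, u_benchmarks):
    """Eq. (6): u_sigma maps each user to its payoff under sigma; u_benchmarks is a list
    of payoff maps, one per profile in Xi \\ {sigma}. Here |Xi| = len(u_benchmarks) + 1."""
    M = len(u_sigma)
    xi_size = len(u_benchmarks) + 1
    hits = sum(1 for u_gamma in u_benchmarks
                 for m in u_sigma if u_sigma[m] >= u_gamma[m])
    return hits / (M * xi_size)

# Example: two users, sigma compared against one other profile in Xi.
print(satisfaction_ratio({1: 3.0, 2: 2.0}, [{1: 2.5, 2: 2.5}]))   # 1 of 4 pairs -> 0.25
```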

C. Convergence Time

In this subsection, we explore the convergence of CDLAE under various network models, varying sampling probabilities ε, and different numbers of users M. We set the terminal condition as the mass on PNE exceeding 99%. Each point of the curves analyzed below is the average over 30000 simulation runs.

Fig. 6. Comparisons with regard to system-wide payoff and user satisfaction ratio.

Fig. 7. Convergence time versus sampling probability ε.

First, we fix the numbers of BSs and users at 6 and 10 and place them randomly in the square area. Figure 7 shows the average convergence times of SDLA and CDLAE as the sampling probability ε varies. We see that when ε ≤ 0.5 the average convergence time of CDLAE is shorter than that of SDLA (e.g., about 500 periods for CDLAE versus 5000 for SDLA when ε = 0.4), because concurrent sampling speeds up convergence. For SDLA, the larger ε is, the shorter the convergence time is, because with a larger ε the active user is more willing to probe a new BS in each iteration, which accelerates convergence. For CDLAE, the convergence rate also becomes faster as ε grows up to 0.5; when ε exceeds 0.5, however, the convergence rate slows down sharply. A large sampling probability ε acts as both a blessing and a curse: on the one hand, it leads the users to a PNE quickly, as it does in SDLA; on the other hand, it increases the probabilities of mistakes and simultaneous sampling, which take the users away from any PNE. In our simulations, the negative effects outweigh the positive ones when ε > 0.5. Furthermore, in Figure 8 we set ε = 0.2 and observe the average convergence rates of SDLA and CDLAE as M varies. We see that an increasing number of users slows down the convergence of both algorithms, but to a lesser extent for CDLAE. Even when M is larger than 90, CDLAE converges in an affordable time (e.g., 1532.6 periods at M = 110, a modest increase compared with 890 at M = 10). In addition, when we set QoS demands for the users and let them stop sampling once satisfied, the convergence times decrease further. (We set random payoff thresholds for the users, using some prior knowledge to keep the thresholds reasonable; specifically, the case in which a threshold exceeds the maximum payoff a user could possibly obtain is excluded.) This means that CDLAE is less sensitive to the number of users than SDLA and is more suitable for large-scale networks.

Fig. 8. Convergence time versus number of users M (CDLAE and CDLAE with QoS demands).

VIII. CONCLUSION

In this paper, we generalize the asymmetric congestion game framework for cell selection mechanism design in heterogeneous networks by considering both the users' distinct positions and their data rates. We propose CDLAE, which converges to PNE based on the local one-step observations of users. Simulation results show that CDLAE converges in an affordable time even when the number of users is large. In addition, CDLAE leads the users to a more satisfactory PNE by eliminating the weakly dominated PNE, with less than 5% performance loss compared with the centralized optimal solution. What is more, since CDLAE allows users' simultaneous updates with errors, it is suitable for implementation in practical systems.

IX. ACKNOWLEDGEMENT

This work is supported by NSF China (No. 61601126, U1405251, 61571129); Science Foundation of Fujian Province (No. 2016J01299, JA15089).

APPENDIX

A. Sketch of Proof of Theorem 1

Our ASPS is in essence a generalization of the unweighted congestion game in [19] obtained by introducing asymmetric strategy spaces, and the proof of the existence of PNE is similar to that in [19]. The key idea is an induction argument. We assume that the ASPS with M − 1 users has reached a PNE. When the M-th user joins, we can construct a special single-player best-reply path (σ(0), σ(1), . . . , σ(L)), in which each deviator takes the next deviator's present strategy until L is maximal, reaching a PNE of the M-user game. Because ASPS is asymmetric, all deviations are limited by the deviators' strategy spaces, but this limitation affects neither the existence of PNE nor the improvement process described above for finding it.


B. A Counterexample of FIP in ASPS

An example of an infinite improvement path in ASPS with 3 users and 3 resources is shown in Table II. The users' strategy spaces are Σ_1 = {1, 3}, Σ_2 = {1, 2} and Σ_3 = {2, 3}. Besides the monotone decreasing property of the payoff functions, the following conditions are assumed to hold: π_1^1(1) < π_3^1(1), π_1^1(2) > π_3^1(2), π_1^2(1) > π_2^2(1), π_1^2(2) < π_2^2(2), π_2^3(1) > π_3^3(1), and π_2^3(2) < π_3^3(2). The initial strategy profile is (1, 1, 3), and the path sequence is (1, 1, 2), (1, 2, 2), (3, 2, 2), (3, 2, 3), (3, 1, 3), (1, 1, 3), so a cycle exists.

TABLE II. A COUNTEREXAMPLE

Profile   | Deviator (improvement condition) | Users on BS 1 | Users on BS 2 | Users on BS 3
(1, 1, 3) | User 3 (π_3^3(1) < π_2^3(1))     | 1, 2          | –             | 3
(1, 1, 2) | User 2 (π_1^2(2) < π_2^2(2))     | 1, 2          | 3             | –
(1, 2, 2) | User 1 (π_1^1(1) < π_3^1(1))     | 1             | 2, 3          | –
(3, 2, 2) | User 3 (π_2^3(2) < π_3^3(2))     | –             | 2, 3          | 1
(3, 2, 3) | User 2 (π_2^2(1) < π_1^2(1))     | –             | 2             | 1, 3
(3, 1, 3) | User 1 (π_3^1(2) < π_1^1(2))     | 2             | –             | 1, 3


C. Proof of Corollary 1

When there are only two resources, the proof of FIP is similar to that for the symmetric unweighted congestion game [19]; to avoid redundancy, we omit it here. We prove the case of only two users by contradiction. Call the two users m and n, and assume without loss of generality that Σ_m ∩ Σ_n ≠ ∅. There is only one way a user can possibly return to a strategy (call it strategy k): while a user (say user m) is playing strategy k, the other user (say user n) deviates to k to increase its own payoff; user m then leaves strategy k at the following step and returns to it after user n leaves. Therefore, assume a single-player improvement path (σ(0), . . . , σ(l), σ(l + 1), . . . , σ(L), . . .), written in terms of the pair (σ_m, σ_n) as ((k, σ_n(0)), . . . , (k, k), (j, k), . . . , (k, σ_n(L)), . . .), in which a loop σ(l′) = σ(L) exists. Considering user m, if the loop exists, the equality σ_m(L) = σ_m(l′) = k with l′ < l must hold. As for user n, since user m's change at step l + 1 results in a higher payoff for user n, user n will not go back to any strategy it chose before step l, due to the monotonicity of its payoff function; that is, for l′ < l the inequality σ_n(L) ≠ σ_n(l′) holds. Therefore, σ_m(L) = σ_m(l′) and σ_n(L) = σ_n(l′) cannot hold at the same time, which contradicts the assumption that σ(l′) = σ(L).

D. Sketch of Proof of Theorem 3

To prove this theorem, it suffices to show that for a strategy profile α which is not a PNE, the stochastic potential must be larger than that of a PNE σ. The key ideas of the proof are similar to those in [21]. We first consider the tree T(α). In this tree, there must exist a transition out of σ. Since σ is a PNE, no user can improve its utility by changing strategy while the others keep theirs unchanged; therefore this transition has a resistance of at least 2. We then construct the tree T(σ) in two steps. Based on the existence of a single-player improvement path from α to σ via several strategy profiles (collected in a set V), we first cut T(α) at each transition leaving the strategy profiles in V and reconstitute them as a partial tree following the path sequence. We then cut the transition leaving σ in T(α), and add the transition from α into the partial tree and from the partial tree to σ, so as to form T(σ). Since all the transitions added have resistance 1, which is less than or equal to the resistance of each transition removed (e.g., the transition cut from σ in T(α) has resistance at least 2), T(σ) has a smaller stochastic potential than T(α).

E. Proof of Lemma 1

Suppose there exist a PNE σ = (σ_1, σ_2, . . . , σ_M) and a strategy profile γ = (γ_1, γ_2, . . . , γ_M), where γ is reached from σ by more than one user's concurrent deviation without any mistake. We define the corresponding congestion vectors under σ and γ as (n_1^σ, n_2^σ, . . . , n_K^σ) and (n_1^γ, n_2^γ, . . . , n_K^γ), respectively. First, we analyze the actions of the deviators. We define the set of deviators from σ to γ as M̃, and the set of the deviators' strategies under γ as K̃. Assume a user m̃ ∈ M̃ has strategy σ_m̃ = k_1 under σ and strategy γ_m̃ = k_2 under γ, where k_1 ≠ k_2 and k_2 ∈ K̃. Since σ is a PNE, the following inequality holds:

π_{k_1}^{m̃}(n_{k_1}^σ) ≥ π_k^{m̃}(n_k^σ + 1), ∀k ≠ k_1.   (7)

Since user m̃ switches without any mistake, the following inequality also holds:

π_{k_1}^{m̃}(n_{k_1}^σ) < π_{k_2}^{m̃}(n_{k_2}^γ), ∀m̃ ∈ M̃.   (8)

Based on (7) and (8), we have

n_{k_2}^γ < n_{k_2}^σ + 1, k_2 ∈ K̃,   (9)

by the monotonicity of the payoff functions. An equivalent expression of (9) is

n_{k_2}^γ ≤ n_{k_2}^σ, k_2 ∈ K̃.   (10)

Since the inequality in (10) holds for every user in M̃, it must in fact be an equality, which is easily proved by contradiction. Hence, for any BS k̃ ∈ K̃, the following equality holds:

n_{k̃}^γ = n_{k̃}^σ, ∀k̃ ∈ K̃.   (11)

This means that the number of users choosing BS k̃ under γ equals that under σ. Therefore, the only way for the deviators to switch is to exchange their strategies with one another so that all of their payoffs improve. Second, we analyze the payoffs of the users who do not deviate. For any user m ∈ M \ M̃, if its strategy satisfies σ_m ∈ K \ K̃, it is not affected by any deviator, so its payoff stays unchanged; if it chooses a strategy σ_m ∈ K̃, its payoff also stays unchanged, due to the equality in (11). Therefore, the equality

π_{σ_m}^m(n_{σ_m}^σ) = π_{γ_m}^m(n_{γ_m}^γ), ∀m ∈ M \ M̃   (12)

always holds. By combining (8) and (12), we have σ ≺ γ. The proof is completed.

F. Proof of Theorem 4

Suppose that there are two PNE σ and σ′ satisfying σ′ ≺ σ. To prove the theorem, it suffices to show that the resistance of T(σ) is smaller than that of T(σ′). We discuss the following two cases.

• Case 1: Assume that σ is the only PNE which dominates σ′. We cut T(σ′) at the transition leaving σ, and add a transition from σ′ to σ to form a tree T(σ). The surgery of the tree is shown in Figure 9, in which a circle represents a strategy profile and a directed link represents a transition. Since σ′ ≺ σ holds, any user gains by switching from σ′_m to σ_m. Therefore, when all the users whose strategies differ at σ′ and σ switch concurrently, the minimum resistance of the new transition is achieved, which is not more than M. Next, consider the transition from σ to σ′. Since σ is a PNE and σ′ ≺ σ holds, according to Lemma 1, a deviation from σ to σ′ occurs if and only if the first or the third situation discussed above Lemma 1 happens.

Fig. 9. Tree surgery in the proof of Theorem 4.


That is, the transition occurs only when at least one user makes a mistake. Therefore, the eliminated transition has a resistance of no less than L, which exceeds M. Since the resistances of all the other transitions are the same in T(σ′) and T(σ), T(σ) has a lower resistance than T(σ′). Since T(σ′) is a minimum resistance tree rooted at σ′, σ′ must have no mass in the limit distribution.

• Case 2: If σ is not the only PNE which dominates σ′, then there exist several dominating PNE with respect to σ′. Without loss of generality, suppose the dominating PNE set is D = {α, β, σ}. We consider two special subcases. In the first subcase, there is no domination relationship among α, β and σ; here we can form a tree T(σ) (or T(α), or T(β)) as in Case 1, which can be shown to have a lower resistance than T(σ′). In the second subcase, the relationship σ ≺ α ≺ β holds; here we cut T(σ′) at the transition leaving β and form a tree T(β) by adding a transition from σ′ to β. Similar to Case 1, T(β) has a lower resistance. Any other subcase can be treated as a mixture of these two special subcases. In all, if σ′ is weakly dominated by at least one other PNE, no tree T(σ′) can have the minimum stochastic potential.


G. Proof of Corollary 3

Firstly, we assume that all the users adopt a uniform order L of error probabilities, but may adopt two different sampling probabilities ε and ε′, where log ε′/log ε → 1 as ε, ε′ → 0. We consider two state profiles σ and σ′, where the transition from the first to the second requires h users' sampling and l users' making mistakes. When all the h users take the uniform sampling probability ε, and the l users make mistakes with probability ε^L, the resistance r(σ, σ′) equals h + lL. When part of the users (h1 ≤ h) probe and switch with ε′, and part of the users (l1 ≤ l) make erroneous switches with (ε′)^L, the resistance r(σ, σ′) can be calculated as

$r(\sigma, \sigma') = \lim_{\epsilon \to 0,\, \epsilon' \to 0} \frac{\log p(\sigma, \sigma'; \epsilon, \epsilon')}{\log \epsilon}$
$= \lim_{\epsilon \to 0,\, \epsilon' \to 0} \Big[(h - h_1) + h_1 \tfrac{\log \epsilon'}{\log \epsilon} + (l - l_1)L + l_1 L \tfrac{\log \epsilon'}{\log \epsilon}\Big]$
$= (h - h_1) + h_1 + (l - l_1)L + l_1 L = h + lL.$   (13)

Obviously, the resistance does not change, and neither does the stochastic potential. Therefore, we conclude that the relaxation of the sampling probability does not change the resistance between any two states, and hence does not change the stochastic stability of the states.

Secondly, we assume that the users make erroneous switches with probabilities of different orders, ρ = ε^L and ρ′ = ε^(L+A), where A > 1 and lim_{ε→0} ρ/ρ′ > 1 holds. We define the corresponding Markov process as P′(ε). Based on [21], it is obvious that both P(ε) and P′(ε) are regular perturbations of P(0). Comparing P′(ε) with P(ε), we find that the non-uniform orders of error probabilities may increase some resistances in P(ε), and hence the stochastic potential of certain strategy profiles. But the profiles which have the minimum stochastic potential in P(ε) are not affected, because these profiles must be the dominating PNE, as proved in Theorem 4, and hence their minimum resistance trees do not contain any transition generated by a user's erroneous switching. Therefore, we conclude that the relaxation of the error probability does not change the stochastic stability of the states.
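As a quick numerical sanity check of (13), the sketch below (illustrative only; the values of h, h1, l, l1 and L are made up, and the transition probability is assumed to take the product form used in the proof) evaluates log p(σ, σ′; ε, ε′)/log ε for a shrinking ε with ε′ of the same order, and shows it approaching h + lL regardless of how probes and mistakes are split between ε and ε′.

```python
import math

# Illustrative check of (13) with made-up parameters (not from the paper):
# the resistance is h + l*L no matter how probes/mistakes split between two
# sampling probabilities of the same order (eps_prime = 3 * eps here).
h, h1, l, l1, L = 3, 1, 2, 1, 4   # h1 <= h probes and l1 <= l mistakes use eps_prime

for eps in (1e-3, 1e-6, 1e-9, 1e-12):
    eps_prime = 3 * eps           # same order: log(eps_prime) / log(eps) -> 1
    # log of the transition probability: (h - h1) probes at eps, h1 at eps_prime,
    # (l - l1) mistakes at eps**L, l1 at eps_prime**L
    log_p = ((h - h1) * math.log(eps) + h1 * math.log(eps_prime)
             + (l - l1) * L * math.log(eps) + l1 * L * math.log(eps_prime))
    print(f"eps = {eps:.0e}:  log p / log eps = {log_p / math.log(eps):.4f}")

print("limit resistance h + l*L =", h + l * L)   # 3 + 2*4 = 11
```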

REFERENCES

[1] E. Gustafsson and A. Jonsson, "Always best connected," IEEE Wireless Communications, vol. 10, no. 1, pp. 49–55, 2003.
[2] S. Filin, H. Harada, H. Murakami, and K. Ishizu, "International standardization of cognitive radio systems," IEEE Communications Magazine, vol. 49, no. 3, pp. 82–89, 2011.
[3] L. Wang and G. Kuo, "Mathematical modeling for network selection in heterogeneous wireless networks – a tutorial," IEEE Communications Surveys & Tutorials, vol. 15, no. 1, pp. 271–292, 2013.
[4] R. Trestian, O. Ormond, and G.-M. Muntean, "Game theory-based network selection: Solutions and challenges," IEEE Communications Surveys & Tutorials, vol. 14, no. 4, pp. 1212–1231, 2012.
[5] A. Orda, R. Rom, and N. Shimkin, "Competitive routing in multiuser communication networks," IEEE/ACM Transactions on Networking, vol. 1, no. 5, pp. 510–521, 1993.
[6] Y. Xu, J. Wang, Q. Wu, A. Anpalagan, and Y.-D. Yao, "Opportunistic spectrum access in unknown dynamic environment: A game-theoretic stochastic learning solution," IEEE Transactions on Wireless Communications, vol. 11, no. 4, pp. 1380–1391, 2012.
[7] R. Southwell and J. Huang, "Convergence dynamics of resource-homogeneous congestion games," in Game Theory for Networks. Springer, 2012, pp. 281–293.
[8] L. M. Law, J. Huang, and M. Liu, "Price of anarchy of congestion games with player-specific constants," in Proc. of IEEE WCSP 2012, Huangshan, China, Oct. 2012.
[9] E. Aryafar, A. Keshavarz-Haddad, M. Wang, and M. Chiang, "RAT selection games in HetNets," in Proc. of IEEE INFOCOM 2013, Turin, Italy, Apr. 2013.
[10] C. Tekin, M. Liu, R. Southwell, J. Huang, and S. H. A. Ahmad, "Atomic congestion games on graphs and their applications in networking," IEEE/ACM Transactions on Networking, vol. 20, no. 5, pp. 1541–1552, 2012.
[11] M. Liu and Y. Wu, "Spectrum sharing as congestion games," in Proc. of IEEE Allerton 2008, Monticello, IL, USA, Sep. 2008.
[12] X. Chen and J. Huang, "Distributed spectrum access with spatial reuse," IEEE Journal on Selected Areas in Communications, vol. 31, no. 3, pp. 593–603, 2013.
[13] Z. Du, Q. Wu, P. Yang, Y. Xu, and Y.-D. Yao, "User-demand-aware wireless network selection: A localized cooperation approach," IEEE Transactions on Vehicular Technology, vol. 63, no. 9, pp. 4492–4507, 2014.
[14] X. Cai and F. Liu, "Network selection for group handover in multi-access networks," in Proc. of IEEE ICC 2008, Beijing, China, May 2008.
[15] W. Lee and D.-H. Cho, "Enhanced group handover scheme in multi-access networks," IEEE Transactions on Vehicular Technology, vol. 60, no. 5, pp. 2389–2395, 2011.
[16] E. Liu, Q. Zhang, and K. K. Leung, "Asymptotic analysis of proportionally fair scheduling in Rayleigh fading," IEEE Transactions on Wireless Communications, vol. 10, no. 6, pp. 1764–1775, 2011.
[17] H. Ackermann, "Nash equilibria and improvement dynamics in congestion games," Ph.D. dissertation, Universitätsbibliothek, 2009.
[18] D. Monderer and L. S. Shapley, "Potential games," Games and Economic Behavior, vol. 14, no. 1, pp. 124–143, 1996.
[19] I. Milchtaich, "Congestion games with player-specific payoff functions," Games and Economic Behavior, vol. 13, no. 1, pp. 111–124, 1996.
[20] J. W. Friedman and C. Mezzetti, "Learning in games by random sampling," Journal of Economic Theory, vol. 98, no. 1, pp. 55–84, 2001.
[21] H. P. Young, "The evolution of conventions," Econometrica: Journal of the Econometric Society, pp. 57–84, 1993.
[22] R. Southwell, X. Chen, and J. Huang, "Quality of service games for spectrum sharing," IEEE Journal on Selected Areas in Communications, vol. 32, no. 3, pp. 589–600, 2014.
[23] S. Ieong, R. McGrew, E. Nudelman, Y. Shoham, and Q. Sun, "Fast and compact: A simple class of congestion games," in Proc. of AAAI 2005, Pittsburgh, PA, USA, Jul. 2005.
[24] A. Fanelli, M. Flammini, and L. Moscardelli, "The speed of convergence in congestion games under best-response dynamics," in Automata, Languages and Programming. Springer, 2008, pp. 796–807.
[25] V. Anantharam, "On the Nash dynamics of congestion games with player-specific utility," in Proc. of IEEE CDC 2004, Atlantis, Bahamas, Dec. 2004.
[26] H. Ackermann and H. Röglin, "On the convergence time of the best response dynamics in player-specific congestion games," arXiv preprint arXiv:0805.1130, 2008.
[27] V. Pacifici and G. Dán, "Convergence in player-specific graphical resource allocation games," IEEE Journal on Selected Areas in Communications, vol. 30, no. 11, pp. 2190–2199, 2012.
[28] C. Sung and W. Wong, "Power control for multirate multimedia CDMA systems," in Proc. of IEEE INFOCOM 1999, New York, NY, USA, Mar. 1999.
[29] A. Hasan and J. G. Andrews, "Cancellation error statistics in a power-controlled CDMA system using successive interference cancellation," in Proc. of IEEE ISSSTA 2004, Sydney, Australia, Aug. 2004.