Performance Evaluation 90 (2015) 18–35
Load transients in pooled cellular core network nodes
Åke Arvidsson a,b,∗, Tobias Rydén c
a Ericsson AB, SE-164 80 Stockholm, Sweden
b Department of Computer Science, Kristianstad University, SE-291 88 Kristianstad, Sweden
c Department of Information Technology, Uppsala University, Box 337, SE-751 05 Uppsala, Sweden
Article info
Article history: Received 16 January 2014; received in revised form 22 April 2015; accepted 22 April 2015; available online 4 May 2015
Keywords: Iu-flex; S1-flex; Pool; Node failure; User redistribution
Abstract
The coverage areas of cellular networks are logically subdivided into service areas. Each service area has a local anchor node which “hides” the mobility inside the area and the entire network has a global anchor node which “hides” the mobility between areas. The concept of unique local anchor nodes per service area was invented to simplify routing but has been found to complicate expansion. The rapidly growing demand for cellular access has therefore prompted alternative solutions with pools of local anchor nodes per service area. Such pools are now deployed by several operators all over the world. Users in pooled service areas are mapped to specific pool members according to a load distribution policy, but the mapping can change as a result of node failures or operator interventions. Such changes take a certain time to implement and cause additional load on the anchor nodes. We study these processes in detail and derive closed form expressions which allow operators to control the trade-off between rapid changes and acceptable loads. Finally we show that the key assumptions of our model are in agreement with measured data and demonstrate how the model can be applied to investigate the effects of different network settings (timers) under different user behaviour (traffic and mobility). Contrary to current solutions to this problem, which typically are slow and/or inaccurate, our results enable fast and accurate analysis of different scenarios, thereby enabling operators to maximise utilisation of the existing investments and at the same time avoid potentially dangerous situations of overload.
© 2015 Elsevier B.V. All rights reserved.
1. Introduction Traffic in cellular networks is routed through anchor nodes which ‘‘hide’’ user mobility from the rest of the network. In more detail, there are ‘‘global’’ anchor nodes which interface external networks and ‘‘local’’ anchor nodes which interface dedicated subsets of cells. The set of cells allocated to a particular anchor node is known as the ‘‘service area’’ of that anchor node. The concept of one unique anchor node per service area is easy to implement but also imposes limits to the number of users and the volumes of traffic that can be handled in each service area. When these limits are reached the existing nodes can be upgraded or new nodes can be added, but both options have significant drawbacks. As for upgrades, the strong growth of subscriber numbers and traffic volumes in cellular networks means that the capacity of new nodes often fills up before they have been depreciated (i.e., an upgrade is economically impossible) or before new versions have been developed (i.e., an upgrade is technically impossible). As for additions, the requirement that service areas be unique means that service
∗ Corresponding author at: Ericsson AB, SE-164 80 Stockholm, Sweden.
http://dx.doi.org/10.1016/j.peva.2015.04.001 0166-5316/© 2015 Elsevier B.V. All rights reserved.
Å. Arvidsson, T. Rydén / Performance Evaluation 90 (2015) 18–35
Fig. 1. Simplified view of a cellular network.
areas in which the limits have been reached must be split between the old node and the new one. As discussed further below, such splits are complicated to implement and result in reduced efficiency. Given this background, the existing standards have been extended to allow for “pools” of more than one anchor node per service area. The load in a pooled service area is shared between the nodes by allocating different users to different nodes. Such allocations are in effect from the moment users enter the service area to the moment they leave it. The capacity in a pooled service area is thus the sum of the capacities of all pool members, hence the growing demand in a service area can be handled simply by adding more nodes to its pool. Besides being a convenient way to handle growth, pools also offer the possibility to redistribute users between nodes in case of, e.g., node failures (improved redundancy), planned outages or recent additions (simplified management). However, node changes require data updates in all affected entities (the old local anchor node, the new local anchor node, the global anchor node and the user terminal), hence such redistributions may result in noticeable load peaks. The subject of this paper is to develop a model whereby these peaks can be understood and controlled such that operators can safely take full advantage of pools. To the best of our knowledge, there have been no previous attempts to study this problem. On the contrary, despite the fact that several operators around the world have already implemented pools, cf., e.g., [1–3], and that many others are planning to do so, the current practical approaches to these problems are insufficient (e.g., steady state analyses which provide no information about progress rates or load transients) or impractical (e.g., measurements or simulations which may be both complex and slow).
The rest of the paper is organised as follows: Section 2 summarises the technical background; Section 2.1 provides a brief introduction to cellular networks while Sections 2.2 and 2.3 describe the problems with unique nodes and the advantages with pooled nodes respectively after which we formulate our problem in Section 2.4. Next, the mathematical models of user redistribution are derived in Section 3 with some preliminaries in Section 3.1 and the models related to node failures and operator interventions in Sections 3.2 and 3.3 respectively. We then verify and apply our results in Section 4; in Sections 4.1 and 4.2 we introduce a measurement based traffic model and a realistic load model respectively, in Section 4.3 we show that the measurements support the key assumptions of our model and in Sections 4.4 and 4.5 we apply our results to illustrate the impact of different user habits and different network parameters under node failures and operator interventions respectively. Finally in Section 5 we summarise our work and discuss various extensions. 2. Technical background 2.1. Cellular networks A simplified view of a cellular network according to the Third Generation Partnership Project (3GPP) is given in Fig. 1. The network consists of radio cells with transmitters (towers), the transmitters are connected to local anchor nodes (LANs) and the LANs are connected to a global anchor node (GAN) which interfaces an internet service provider (ISP) or the public switched telephone network (PSTN). Traffic from a user can be routed hierarchically through the current cell and its unique, topologically given LAN and to the topologically given GAN, but traffic to a user must be routed by location tables in the GAN (to find the current LAN) and in the LAN (to find the current cell). Users continuously associate themselves with the cell with the currently strongest radio signal and a procedure known as location update is used to update the location tables as required. 
We shall refer to such events as re-registrations and note that only some cell changes trigger LAN changes since the service areas (SAs) of LANs contain many cells. The users in an SA are identified by local temporary identifiers (TIs) assigned by the LAN. Users without TIs (who, e.g., power up for the first time) or with invalid TIs (who, e.g., arrive or return to the SA and thus have TIs from other SAs) will be
Fig. 2. Simplified mapping between the generic terminology and the 3GPP terminology. Dashed frames indicate that the enclosed functions typically are implemented as single entities.
given new and valid TIs while users with expired TIs (because of lack of contact or because of departure from the SA) will lose their TI. While these principles apply to all 3GPP networks, we emphasise that the above terminology is generic and that the actual terminology is different and varies between the second, third and fourth generations of cellular technology (2G, 3G and 4G respectively¹) and between the circuit switched and packet switched domains (CS and PS respectively). We shall keep the generic notation throughout this paper but for completeness Fig. 2 provides a somewhat simplified map to the actual one. The figure shows from left to right the public backbone networks, the mobile core network, and the cellular radio accesses. Nodes in the voice (CS) domain are shown in light grey while those in the data (PS) domain are shown in dark grey. It is seen that there are two kinds of anchor nodes, one with respect to the traffic flow and one with respect to the location information. For 2G/3G both types of LANs are uniquely determined by the topology (A, Gb and Iu in the figure), but for 4G only the location LAN is determined this way (S1 in the figure). Finally we mention that the TI is known as TMSI (temporary mobile subscriber identity) in 2G/3G and globally unique temporary identity (GUTI) in 4G.
2.2. Single nodes
While the concept of geographically unique LANs simplifies routing, it also comes with potential problems. The underlying reasons are that the evolution of user demand (traffic volumes and user numbers) at times has outpaced the financial plans (budgets for depreciation) and/or the evolution of node capacity (traffic handling and node processing), and in addition created an increasing dependency on the services. The solution to the capacity problems is to add more nodes.
The problem with geographical uniqueness is then, however, that each new node requires a ‘‘split’’ of SAs whereby some cells are ‘‘transferred’’ from the (reduced) SA of the existing LAN to the (new) SA of the new LAN. The result is that growing demand results in smaller geographical areas per LAN, and this has proved to be problematic. To see this, compare the case when an entire city constitutes one large SA to the case when the city is split into many small SAs. In the first case, the LAN with the large SA will see a relatively stable load with predominantly business users in the central areas during office hours, home users in the suburbs during evenings, and commuters in the mornings and afternoons, and re-registrations will only be required for users who arrive to or depart from the city. In the second case, however, LANs with small SAs in the central area will only see business users during office hours, LANs in small SAs in suburban areas will only see home users during evening hours, and LANs in small SAs in between will only see commuters in the mornings and afternoons. The result is reduced overall utilisation since some LANs have free capacity while others operate at peak load and increased mobility overhead since re-registrations will be required even for trips within the city. Moreover, splits lead to leap-wise capacity growth as each split means moving from a situation with little or no free capacity in one node to a lot of free capacity in two or more nodes and require complex implementation processes as the cells of the original SA must be divided between the two new ones such that even utilisation is combined with few additional re-registrations after which the configuration data in all affected nodes and cells must be updated.
1 Sometimes they may also be referred to by their respective radio technologies, viz. GSM, WCDMA and LTE respectively.
Fig. 3. Simplified view of a general pooled cellular network.
2.3. Pooled nodes Given this background, 3GPP has developed a series of standards [4–8] which allow for a more flexible mapping between cells and groups of LANs. The solution is often referred to as ‘‘pool’’, since the node groups may be viewed as pools, or as ‘‘S1 Flex’’, ‘‘Iu Flex’’ or, less commonly, ‘‘A Flex’’ or ‘‘Gb Flex’’, since they apply to these interfaces, cf. Fig. 2. The standards are supported by all major manufacturers and are increasingly being deployed in many networks all over the world, see e.g., [1–3]. A ‘‘pool SA’’ (PSA) can be formed by merging a number of adjacent SAs as shown in Fig. 3 where two PSAs 12 and 34 are formed out of the four original SAs 1, 2, 3 and 4 in Fig. 1. The problem of composing PSAs is discussed in more detail in [9] and we remark that, in practical terms, the actual merger is a matter of altering traffic flows (changing the routing) rather than moving physical equipment (changing the topology). Since LANs must keep user-specific information related to, e.g., identity, location and ciphering, users still must be tied to particular nodes. In pooled networks this is achieved by making the radio network inspect the TI and interpret a part of it as a node identifier (NI). Users with valid NIs will thus repeatedly be routed to the same LAN while users without NIs or with invalid NIs will be routed to an arbitrary LAN (selected according to some load distribution policy such as weighted round robin or as suggested in, e.g., [10]) and given new and valid TIs with the NI of that LAN. For those interested in the more exact terminology, we mention that the NI is known as NRI (network resource identifier) in 2G/3G and as GUMMEI (globally unique MME identifier) in 4G. The pooled solution makes it possible to add more nodes without SA splits. 
The result is not only the opposite of the above problems, i.e., improved overall utilisation, decreased mobility overhead, incremental capacity growth and simple implementation processes, but also increased redundancy and simplified management. The latter features are obtained by treating the NIs of failed LANs or LANs with planned interruptions as invalid and thus relatively quickly re-registering users with those NIs with other pool members according to the load distribution policy. 2.4. Engineering Both single nodes and pooled nodes must be engineered with respect to specific performance targets for processing and transmission. In both configurations we have a static case (with stable conditions during some peak period) which is relatively simple hence we do not treat it explicitly. Pools also have a transient case (where users are redistributed quickly as a result of node failures or operator interventions) which is much more complicated and the topic of this paper. In more detail we shall consider two important aspects of these transients; time (the time until a user re-registers depends on how frequently it makes contact with the network) and load (each re-registration requires updating the user-specific information in the anchor nodes). Note that there is a potential conflict between fast re-registration (to quickly recover from failures or to enable maintenance without early notice) and controlled load (to avoid overload related to bursts of re-registrations). The aim of this paper is to enable operators to handle such conflicts gracefully by, e.g., selecting different timer settings for different traffic patterns such that they can combine fast re-registration (short times increase speed) with controlled loads (long times reduce load) etc. 3. Models of redistribution processes 3.1. Preliminaries We shall now compute the distribution of the time R to re-registration for an average user. 
To this end, we note that re-registrations may be triggered by two basic types of activities; random transactions and periodic transactions.
Fig. 4. Top: Time until a time out after a failure. Bottom: Time until periodic time out after re-registration. Timer resets are marked ‘‘×’’ (any activity) and ‘‘⊗’’ (traffic activity) respectively while timer expiry is marked ‘‘⊙’’.
Random transactions are related to user activities which are assumed to take place with constant rates (this assumption will be verified against measurements in Section 4.3). Our model includes outbound traffic activities such as originated connections (make telephone calls) and uploaded messages (send text messages) with rate γo , inbound traffic activities such as terminated connections (answer telephone calls) and downloaded messages (receive text messages) with rate γi , internal movement activities (location updates due to movements between cells in the same LAN SA) with rate γint , and external movement activities (location updates due to movements between cells in different LAN SAs) with rate γext . The movement activities are further simplified by assuming that users are ‘‘semi-stationary’’ in the sense that they have been and will be in the PSA for a long time relative to the time spans of interest. (This assumption is not necessary but introduced to avoid even more complex notation and, as we shall see, the time spans in question are relatively short hence this assumption will typically be reasonably well fulfilled.) With this simplification we may consider all movement activities as internal with aggregate rate γm . Periodic transactions are related to location updates triggered by time outs. Our model includes periodic location updates which, as mandated by the standards, users must perform after given periods of user inactivity, i.e. after a given period without random transactions (to confirm their availability). The time out period is denoted by T . From the above it is clear that user re-registrations impact the performance of both LANs and GANs. For reasons of brevity we shall, however, focus on the former but we emphasise that our results also suffice for GANs. 3.2. 
Re-registration related to node failures
In this subsection we consider the case of a LAN becoming unavailable at time $t = 0$ and study the time it takes for an arbitrary user in that LAN to re-register in another LAN. Such re-registrations can be triggered by user initiated activities such as outbound traffic activities, movement activities and periodic time outs but (typically) not by network initiated activities such as inbound traffic activities. The reason for this is that traffic from users can be re-routed to other LANs in the (geographically mapped) pool while traffic to users cannot query other LANs since standard GANs do not explicitly handle pools of (geographically equal) alternative LANs, but try to contact the specific LAN found in the location table. The probability that the user has not re-registered before time $t > 0$ is then the probability that the user has not performed an outbound traffic activity, an internal movement activity or a periodic time out before time $t$. Formally we write

\[ P(R > t) = P(O > t) \times P(M > t) \times P(P > t \mid O > t, M > t), \]

where $R$ denotes the time to re-registration and $O$, $M$ and $P$ denote the times to the next outbound traffic activity, internal movement activity, and periodic time out respectively. Using the Poisson assumptions (constant rates) in Section 3.1 the probability that the user has not performed any outbound traffic activity before time $t$ may be written as

\[ P(O > t) = e^{-\gamma_o t} \tag{1} \]

and, similarly, the probability that the user has not performed an internal movement activity before time $t$ may be written as

\[ P(M > t) = e^{-\gamma_m t}. \tag{2} \]
The probability that the user has not performed a periodic time out before time t is perhaps somewhat less obvious. The problem is illustrated in Fig. 4 (top) where each cross on the time axis corresponds to a transaction. Note that the distance between two transactions can never exceed the time out of T time units. The failure takes place at time t = 0 and it may be assumed that this point is independent of the transactions of the user.
As indicated by the figure, the time $A$ until the planned time out equals $T - B$, where $B$ is the time since the last transaction before 0. Note that, although the time $A$ always is defined mathematically as above, it is not certain that a time out will happen at this time. Indeed, an outbound traffic event or a movement event may take place before $A$ and, if so, reset the timer and thus postpone the next scheduled time out, see the lower part of Fig. 4 and the forthcoming calculations.

Before time $t = 0$, i.e. before the failure, the transactions consist of outbound traffic activities, inbound traffic activities, internal movement activities and time outs. Knowing that the time out timer is reset at each activity, the overall process of transactions forms a renewal process with inter-renewal times distributed as a random variable which we formally write as

\[ H = \min(\mathrm{Exp}(\gamma_i), \mathrm{Exp}(\gamma_o), \mathrm{Exp}(\gamma_m), T), \]

where $\mathrm{Exp}(\gamma)$ is a random variable following an exponential distribution with rate $\gamma$ and the three variables are independent. The time $B$ is then the (stationary) backward recurrence time in this renewal process, at time $t = 0$. Recalling that the minimum of independent exponential variables is again exponentially distributed, the above simplifies to

\[ H = \min(\mathrm{Exp}(\gamma), T), \]

where $\gamma$ is the total activity rate,

\[ \gamma = \gamma_o + \gamma_i + \gamma_m. \]

Evaluating the distribution function $F_H(u)$ of the time between renewals, we get

\[ F_H(u) = P(\text{inter-renewal time } H \le u) = \begin{cases} 1 - e^{-\gamma u}, & 0 \le u < T, \\ 1, & u \ge T. \end{cases} \]

We see that the distribution of $H$ has a point-mass at $T$ with weight (probability) $e^{-\gamma T}$. The density $f_B$ of $B$ is then obtained by renewal theory (e.g. [11, Section V.3]) as

\[ f_B(u) = \frac{1 - F_H(u)}{E[H]} = \frac{\gamma e^{-\gamma u}}{1 - e^{-\gamma T}} \]

for $0 \le u < T$. Thus we get, for $0 \le t \le T$,

\[ P(P > t \mid O > t, M > t) = P(A > t) = P(B < T - t) = \int_0^{T-t} f_B(u)\,du = \frac{1 - e^{-\gamma(T-t)}}{1 - e^{-\gamma T}}. \tag{3} \]
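As an informal sanity check of Eq. (3), the stationary backward recurrence argument can be reproduced by simulation. The Python sketch below is not part of the original analysis: the function names, the sample size, the spacing between inspection instants and the parameter values are illustrative assumptions. It simulates the renewal process with inter-renewal times $H = \min(\mathrm{Exp}(\gamma), T)$, inspects it at widely spaced instants, and compares the fraction of instants with $A = T - B > t$ to the closed form.

```python
import math
import random


def timeout_survival(t, gamma, T):
    """Closed form of Eq. (3): P(A > t) for 0 <= t <= T."""
    return (1.0 - math.exp(-gamma * (T - t))) / (1.0 - math.exp(-gamma * T))


def simulate_timeout_survival(t, gamma, T, n_samples=20000, seed=1):
    """Estimate P(A > t) by inspecting a simulated renewal process.

    Timer resets are separated by H = min(Exp(gamma), T). At each inspection
    instant, B is the time since the last reset and A = T - B.
    """
    rng = random.Random(seed)
    # Assumed spacing between inspections, long enough to decorrelate samples.
    spacing = 10.0 * T + 0.37 / gamma
    last_reset = 0.0
    next_reset = min(rng.expovariate(gamma), T)
    hits = 0
    for k in range(1, n_samples + 1):
        inspect = k * spacing
        # Advance the process until the next reset lies beyond the inspection instant.
        while next_reset <= inspect:
            last_reset = next_reset
            next_reset = last_reset + min(rng.expovariate(gamma), T)
        b = inspect - last_reset  # backward recurrence time B
        if T - b > t:             # remaining time A exceeds t
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    gamma, T, t = 2.0, 1.0, 0.4  # hypothetical rate and timer, in matching time units
    print(timeout_survival(t, gamma, T), simulate_timeout_survival(t, gamma, T))
```

With these hypothetical values the two numbers should agree to within the Monte Carlo error; the agreement only illustrates the renewal argument and says nothing about real traffic.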
Collecting Eqs. (1)–(3) we obtain the total probability of no re-registration before $t$,

\[ P(R > t) = e^{-(\gamma_o + \gamma_m) t}\, \frac{1 - e^{-\gamma(T-t)}}{1 - e^{-\gamma T}} \quad \text{for } 0 \le t \le T. \tag{4} \]
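To make Eq. (4) concrete, the following Python sketch evaluates it and inverts it numerically. It is a minimal illustration under stated assumptions: the function names, the bisection tolerance and all parameter values are hypothetical, and the rates and the timer must be expressed in the same time unit.

```python
import math


def failure_survival(t, gamma_o, gamma_i, gamma_m, T):
    """Eq. (4): probability of no re-registration before t, for 0 <= t <= T."""
    if not 0.0 <= t <= T:
        raise ValueError("Eq. (4) is stated for 0 <= t <= T")
    gamma = gamma_o + gamma_i + gamma_m  # total activity rate
    timer_term = (1.0 - math.exp(-gamma * (T - t))) / (1.0 - math.exp(-gamma * T))
    return math.exp(-(gamma_o + gamma_m) * t) * timer_term


def time_to_fraction(fraction, gamma_o, gamma_i, gamma_m, T, tol=1e-9):
    """Bisection for the time at which the given fraction of users has re-registered.

    Relies on Eq. (4) decreasing from 1 at t = 0 to 0 at t = T.
    """
    lo, hi = 0.0, T
    target = 1.0 - fraction
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if failure_survival(mid, gamma_o, gamma_i, gamma_m, T) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical per-hour rates and a two-hour periodic timer.
    rates = dict(gamma_o=1.0, gamma_i=1.0, gamma_m=0.5, T=2.0)
    for t in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(t, failure_survival(t, **rates))
    print("90% re-registered after", time_to_fraction(0.9, **rates), "hours")
```

Sweeping `T` in such a script is one way to see the trade-off between recovery speed and re-registration load for a given traffic profile.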
Finally we remark that, since all users are assumed to be statistically equal, the probability that an arbitrary user has not re-registered before time t is equal to the expected fraction of all users that have not re-registered before time t. 3.3. Re-registration related to operator interventions In this subsection we consider a LAN where an operator-initiated re-registration is started at time t = 0 and study how users move to other LANs as a result of different activities and time outs. The re-registration is divided into three phases [4] and the durations of the different phases are controlled by timers. In the first phase only movement activities and periodic time outs trigger re-registrations, and the time out timer is reset by in- and outbound traffic activities. In the second phase all transactions trigger re-registration and in the third phase even calls in progress will trigger re-registration. The three phases thus imply increasing pressure on users to re-register, and the purpose of this gradual increase is to limit re-registration peak loads and to support ‘‘invisible’’ re-registrations while at the same time ensuring that all users have re-registered within a limited time. To begin we make two simplifications. First, moves triggered by traffic activities will be treated as immediate. (In reality they are delayed until connections are completed, but since the connection holding time typically amounts to on the order of one minute, the impact on the dynamics of the re-registration process is marginal.) Second, the third phase will be ignored. (To enter this phase, users must not move at all during the first two phases, be engaged at least every T time unit, and be engaged exactly at the end of the second phase, and we believe such behaviour to be quite rare in practice.) We assume that the first phase starts at time t = 0 and that the second phase starts at time t = τ . 3.3.1. 
The first phase
The first phase means that passive users will be redistributed when an internal movement activity is performed or when a periodic time out occurs. Thus for $t \le \tau$,

\[ P(R > t) = P(M > t) \times P(P > t \mid M > t). \tag{5} \]
The first probability on the right-hand side of Eq. (5) is given by Eq. (2), while the second one is more difficult to compute. To start with we will consider the case t ≤ T , which is easier to analyse than the general case t > 0. For the analysis we will use the structure illustrated by Fig. 4 (bottom), which depicts the activities of an arbitrary user in time. The re-registration is
initiated at time $t = 0$ and it may be assumed that this point is independent of the transactions of the user. The first periodic time out will occur at $A$ if there is no user activity. If, however, there is a user traffic activity before the timer expires, the timer is reset to $T$ and restarted. The figure shows how a first traffic activity after time $R$ causes a first reset of the timer and how subsequent traffic activities after consecutive times $S_1$, $S_2$, $S_3$ and $S_4$ cause further resets of the timer until it finally expires after a further, consecutive time $T$. It is clear that there can be anything from zero to infinitely many extensions $S$. The time until the periodic time out, $W$ say, is thus the sum of $R$, all $S$ and finally $T$.

The restricted case. Since $t \le T$ it suffices to notice that a single reset of the timer prior to $t$ ensures that $W > t$; this is because the remaining delay until eventual periodic time out will be longer than $T$. Thus, $W > t$ holds when $R < t$ and $A > R$. When $R \ge t$ the event $W > t$ holds provided $A > t$. Thus, for $t \le T$,

\[ P(P > t \mid M > t) = P(W > t) = P(R < t, A > R) + P(R \ge t, A > t). \tag{6} \]

Since $R$ denotes the first occurrence of an in- or outbound activity it has a negative exponential distribution,

\[ f_R(r) = \gamma_t e^{-\gamma_t r}, \tag{7} \]

where

\[ \gamma_t = \gamma_i + \gamma_o. \]

Using the fact that $A = T - B$ and $R$ are independent and inserting Eqs. (3) and (7) into Eq. (6) we obtain, after some algebra, this probability as

\[ P(P > t \mid M > t) = \int_0^t P(A > r) f_R(r)\,dr + P(R \ge t) P(A > t) = \frac{1}{1 - e^{-\gamma T}} \left[ 1 + \frac{\gamma_t}{\gamma_m} e^{-\gamma T} - e^{-\gamma T + \gamma_m t} \left( \frac{\gamma_t}{\gamma_m} + 1 \right) \right] \tag{8} \]
for $t \le \min(\tau, T)$.

The general case. We now turn to the more complex general case of an arbitrary $t > 0$, and we will derive an expression for the Laplace transform of (the probability density function of) the time of eventual periodic time out $W$. The scenario illustrated by Fig. 4 (bottom) concerns the situation when $R < A$. If on the other hand $R \ge A$ there will be a periodic time out at $A$, and $W = A$. The distribution of $W$ will thus be a mixture corresponding to these two cases. Since $R = r$ occurs with density (7), the conditional Laplace transform of $W$ given $A$ satisfies

\[ E[e^{-\theta W} \mid A] = \int_0^A E[e^{-\theta(r + S_{\mathrm{tot}} + T)}]\, \gamma_t e^{-\gamma_t r}\,dr + \int_A^\infty e^{-\theta A}\, \gamma_t e^{-\gamma_t r}\,dr = \frac{\gamma_t}{\theta + \gamma_t} \left( 1 - e^{-(\theta + \gamma_t) A} \right) f^*_{S_{\mathrm{tot}}}(\theta)\, e^{-\theta T} + e^{-(\theta + \gamma_t) A}, \tag{9} \]
where $f^*$ denotes the Laplace transform of (the probability density function of) a random variable. The two terms in Eq. (9) refer to the two cases $R < A$ and $R \ge A$ respectively. The three factors in the former term correspond to the first user activity $R$, all subsequent activities $S_j$ (the sum of which is denoted by $S_{\mathrm{tot}}$) and the final periodic time out $T$ respectively, cf. Fig. 4 (bottom). Unconditioning Eq. (9) and identifying the Laplace transforms we get

\[ f^*_W(\theta) = \frac{\gamma_t}{\theta + \gamma_t} \left( 1 - f^*_A(\theta + \gamma_t) \right) f^*_{S_{\mathrm{tot}}}(\theta)\, e^{-\theta T} + f^*_A(\theta + \gamma_t). \tag{10} \]
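Before the transforms in Eq. (10) are worked out, the restricted-case result, Eq. (8), can be cross-checked against a direct simulation of the timer mechanism of Fig. 4 (bottom). The Python sketch below is an illustration only: the function names, the sample size and the parameter values are assumptions, and the closed form requires $\gamma_m > 0$. Each sample draws the stationary backward recurrence time $B$ by inverting the distribution function implied by $f_B$, sets $A = T - B$, and then lets traffic activities of rate $\gamma_t$ reset the timer until it expires.

```python
import math
import random


def phase1_timeout_survival(t, gamma_o, gamma_i, gamma_m, T):
    """Eq. (8): P(W > t) for 0 <= t <= T (restricted case); needs gamma_m > 0."""
    gamma_t = gamma_i + gamma_o
    gamma = gamma_t + gamma_m
    ratio = gamma_t / gamma_m
    numerator = (1.0 + ratio * math.exp(-gamma * T)
                 - math.exp(-gamma * T + gamma_m * t) * (ratio + 1.0))
    return numerator / (1.0 - math.exp(-gamma * T))


def sample_w(rng, gamma_t, gamma, T):
    """Draw one time W to the eventual periodic time out."""
    # Stationary backward recurrence time B, by inverting the CDF of f_B.
    u = rng.random()
    b = -math.log(1.0 - u * (1.0 - math.exp(-gamma * T))) / gamma
    a = T - b                          # remaining timer A at t = 0
    clock = rng.expovariate(gamma_t)   # first traffic activity R
    if clock >= a:
        return a                       # timer expires before any reset
    while True:
        gap = rng.expovariate(gamma_t)
        if gap >= T:
            return clock + T           # no further reset within T
        clock += gap                   # reset after S_j, timer restarted


def simulate_phase1_timeout_survival(t, gamma_o, gamma_i, gamma_m, T,
                                     n=100000, seed=1):
    """Monte Carlo estimate of P(W > t)."""
    rng = random.Random(seed)
    gamma_t = gamma_i + gamma_o
    gamma = gamma_t + gamma_m
    hits = sum(sample_w(rng, gamma_t, gamma, T) > t for _ in range(n))
    return hits / n


if __name__ == "__main__":
    args = (0.5, 0.5, 0.5, 1.0, 1.0)  # hypothetical t, gamma_o, gamma_i, gamma_m, T
    print(phase1_timeout_survival(*args), simulate_phase1_timeout_survival(*args))
```

Because `sample_w` places no restriction on $t$, the same simulator can also serve as a rough reference for the general case treated next.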
We note that Eq. (10) contains two Laplace transforms, the one of $A$ and the one of $S_{\mathrm{tot}}$. The probability density function of $A$ is obtained by differentiating (3) w.r.t. $t$ and changing sign,

\[ f_A(t) = \frac{\gamma e^{-\gamma(T-t)}}{1 - e^{-\gamma T}} \tag{11} \]

for $0 \le t \le T$, after which its Laplace transform is obtained as

\[ f^*_A(\theta) = \frac{\gamma}{1 - e^{-\gamma T}} \cdot \frac{e^{-\gamma T} - e^{-\theta T}}{\theta - \gamma}; \tag{12} \]

for $\theta = \gamma$ the second fraction is interpreted as its limit $T e^{-\gamma T}$. To find the Laplace transform of $S_{\mathrm{tot}}$ we recall that the timer is reset by outbound activities and inbound activities, i.e. according to a Poisson process with rate $\gamma_t$. Thus the different epochs $S_j$ are independent and distributed as $R$ conditional on $R < T$. That is,

\[ P(S_j \le t) = P(R \le t \mid R < T) = \frac{1 - e^{-\gamma_t t}}{1 - e^{-\gamma_t T}}. \]
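Let also

\[ p = P(R < T), \qquad q = 1 - p. \]

Because of the independence between the different epochs $S_j$, there will be a geometrically distributed number of epochs before the final periodic time out period. Indeed, with probability $p^k q$, for $k = 0, 1, 2, \ldots$, there will be $k$ epochs. Writing $\mathcal{D}$ for the distribution of a random variable, we can thus represent $S_{\mathrm{tot}}$ as the mixture

\[ \mathcal{D}(S_{\mathrm{tot}}) = \sum_{k=0}^{\infty} p^k q\, \mathcal{D}(S_1 + \cdots + S_k) = \mathcal{D}\Big( \sum_{k=0}^{\infty} p^k q\, (S_1 + \cdots + S_k) \Big). \]

The Laplace transform is thus given by

\[ f^*_{S_{\mathrm{tot}}}(\theta) = \sum_{k=0}^{\infty} p^k q\, E[e^{-\theta(S_1 + \cdots + S_k)}] = \sum_{k=0}^{\infty} p^k q\, [f^*_S(\theta)]^k = \frac{q}{1 - p f^*_S(\theta)}, \tag{13} \]

where $f^*_S$ is the Laplace transform of the probability density function of $S_j$. Now using the exponential distribution of $R$ we find that

\[ p = 1 - e^{-\gamma_t T}, \qquad q = e^{-\gamma_t T}, \]

and

\[ f^*_S(\theta) = \int_0^T \frac{\gamma_t}{p}\, e^{-\gamma_t t} e^{-\theta t}\,dt = \frac{\gamma_t \left( 1 - e^{-(\theta + \gamma_t) T} \right)}{p(\theta + \gamma_t)}, \]

yielding

\[ f^*_{S_{\mathrm{tot}}}(\theta) = \frac{q(\theta + \gamma_t)}{\theta + \gamma_t e^{-(\theta + \gamma_t) T}}. \tag{14} \]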
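The transforms in Eqs. (10), (12) and (14) are straightforward to evaluate numerically. The Python sketch below is a minimal illustration for real $\theta \ge 0$; the function names, the threshold used for the $\theta = \gamma$ limit and the parameter values are assumptions. It does not perform the inversion itself, and the resulting function would have to be handed to a numerical Laplace inversion routine of the reader's choice.

```python
import math


def lt_A(theta, gamma, T):
    """Eq. (12): Laplace transform of the density of A."""
    if abs(theta - gamma) < 1e-12:
        second = T * math.exp(-gamma * T)  # limit stated after Eq. (12)
    else:
        second = (math.exp(-gamma * T) - math.exp(-theta * T)) / (theta - gamma)
    return gamma / (1.0 - math.exp(-gamma * T)) * second


def lt_S_tot(theta, gamma_t, T):
    """Eq. (14): Laplace transform of the density of S_tot."""
    q = math.exp(-gamma_t * T)
    return q * (theta + gamma_t) / (theta + gamma_t * math.exp(-(theta + gamma_t) * T))


def lt_W(theta, gamma_o, gamma_i, gamma_m, T):
    """Eq. (10) with Eqs. (12) and (14) inserted."""
    gamma_t = gamma_i + gamma_o
    gamma = gamma_t + gamma_m
    a_shifted = lt_A(theta + gamma_t, gamma, T)  # f_A^*(theta + gamma_t)
    reset_branch = (gamma_t / (theta + gamma_t)) * (1.0 - a_shifted) \
        * lt_S_tot(theta, gamma_t, T) * math.exp(-theta * T)
    return reset_branch + a_shifted


if __name__ == "__main__":
    # Hypothetical rates and timer; a proper density has transform 1 at theta = 0.
    print(lt_W(0.0, 0.5, 0.5, 1.0, 1.0), lt_W(1.0, 0.5, 0.5, 1.0, 1.0))
```

A transform value of one at $\theta = 0$ is a quick consistency check that the pieces have been combined correctly.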
The complete expression for $f^*_W(\theta)$ is thus obtained by inserting Eqs. (12) and (14) into Eq. (10). Summing up, the probability of no periodic time out before $t$ may now be expressed as

\[ P(P > t \mid M > t) = P(W > t) = 1 - P(W \le t) = 1 - \int_0^t f_W(s)\,ds = 1 - \mathcal{L}^{-1}\left\{ \frac{f^*_W(\theta)}{\theta} \right\}(t) \tag{15} \]

for $t < \tau$ where $\mathcal{L}^{-1}$ denotes the inverse Laplace transform. We note that numerical methods may be used to perform this operation (e.g. [12]).

We conclude the treatment of phase one by returning to the original problem of the probability of no re-registration before time $t$ in the first phase, Eq. (5), and note that the probability of no internal movement activity before $t$ is given by Eq. (2) and the probability of no periodic time out before $t$ is given by Eq. (8) for $t \le T$, while the less explicit Eq. (15) holds for any $t > 0$.

3.3.2. The second phase
The second phase represents an extension to the first phase in that users will be redistributed when a traffic activity is performed. Letting $u = t - \tau$ denote the time in phase 2 we get, for any $0 < u \le T$,

\[ P(R > \tau + u) = P(O > u) \times P(I > u) \times P(M > \tau + u) \times P(P > \tau + u \mid O > u, I > u, M > \tau + u), \tag{16} \]
where $O$ and $I$ respectively denote the times to the next outbound and inbound traffic activity after time $\tau$. The three first factors refer to Poisson processes with rates $\gamma_o$, $\gamma_i$ and $\gamma_m$ respectively. The two former can be combined and written as

\[ P(\mathcal{T} > u) = e^{-\gamma_t u}, \tag{17} \]

where $\mathcal{T}$ denotes the time to the next in- or outbound traffic activity after time $\tau$, while the third one follows Eq. (2). The remaining problem is thus to determine the last factor. We first recall that the function of the timer is different in phase 1, when it is reset by in- and outbound traffic events, and phase 2, when it is never reset. The probability of no time out before $\tau + u$ therefore equals the probability that there has been no time out before $\tau$, and that the remaining timer at $\tau$ is at least $u$. We first analyse the case when the length of phase one is no longer than $T$, and then discuss the opposite case. To carry out the analysis we introduce the notation $A(t)$ for the remaining time until periodic time out at time $t$. Hence $A(0) = A$ in the notation previously introduced. The trajectory $\{A(t)\}_{t \ge 0}$ is a stochastic process that decreases linearly at unit rate, except when it is reset to $T$, which happens according to a Poisson process with rate $\gamma_t$ during phase 1. If a periodic time out occurs at time $t'$ we will write $A(t) = \Delta$ for all $t \ge t'$, i.e. $\Delta$ is a state indicating that a periodic time out has happened. Otherwise $A(t)$ takes values in the set $(0, T]$. The probability $P(P > \tau + u \mid \cdots)$ of no time out before $\tau + u$ can
then be written P(A(τ) > u), where the event {A(τ) > u} means that a periodic time out has not occurred by time τ (i.e., A(τ) ≠ ∆), and that the value of A(τ) is larger than u.

Now fix t > 0; we will write dx both for an infinitesimal interval located at some x, 0 < x ≤ T, and for the length of this interval, and we will derive the probability P(A(t) ∈ dx).

The case t ≤ T. We split this case into the two subcases 0 < x ≤ T − t and T − t < x ≤ T. For A(t) ∈ dx to happen in the first subcase there can be no timer reset in [0, t), because after a reset the remaining timer A(t) at t would exceed T − t. The event left to consider is thus that there is no timer reset in [0, t), implying that A(t) = A − t provided A ≥ t, and that the initial timer A is such that A − t ∈ dx. In other words, for t ≤ T and 0 < x ≤ T − t we have

$$P(A(t) \in dx) = P(R \ge t,\; A - t \in dx) = P(R \ge t)\, f_A(t + x)\, dx,$$

where we used the independence of R and A. Using Eqs. (7) and (11), we obtain

$$f_{A(t)}(x) = \frac{\gamma\, e^{-\gamma (T - t - x)}}{1 - e^{-\gamma T}}\, e^{-\gamma_t t}$$

for 0 < x ≤ T − t.

For the second subcase, T − t < x ≤ T, in order for A(t) ∈ dx to happen there must be a timer reset at t − (T − x), followed by no resets in the interval (t − (T − x), t], of length T − x. The latter event has probability $e^{-\gamma_t (T - x)}$. A timer reset at t − (T − x), or more precisely in the interval t − (T − dx), can happen in either of two ways: R ∈ t − (T − dx) and R < A, or R < t − (T − x), R < A, and another reset in t − (T − dx). By the independence of R and A, these events have probabilities

$$f_R(t - (T - x))\, dx\; P(A > t - (T - x))$$

and

$$P(R < t - (T - x),\; R < A)\, \gamma_t\, dx = \int_0^{t - (T - x)} P(A > r)\, f_R(r)\, dr \times \gamma_t\, dx$$

respectively; note that $\gamma_t\, dx$ is the probability that an in- or outbound activity occurs in t − (T − dx) (then resetting the timer). Evaluating the integral, summing up the two probabilities and multiplying by $e^{-\gamma_t (T - x)}$ yields, after applying some algebra,

$$f_{A(t)}(x) = \frac{\gamma_t\, e^{-\gamma_t (T - x)}}{1 - e^{-\gamma T}} \left[ 1 + \frac{\gamma_t}{\gamma_m}\, e^{-\gamma T} - \frac{\gamma}{\gamma_m}\, e^{\gamma_m (t - (T - x)) - \gamma T} \right]$$

for t ≤ T and T − t < x ≤ T.

To obtain the probability of no time out before τ + u we integrate $f_{A(t)}$ at t = τ over (u, T), yielding the last factor of the right-hand side of Eq. (16) as

$$\begin{aligned}
P(A(\tau) > u) &= \int_u^T f_{A(\tau)}(x)\, dx \\
&= e^{-\gamma_t \tau}\, \frac{e^{-\gamma (T - \tau - \xi')} - e^{-\gamma (T - \tau - u)}}{1 - e^{-\gamma T}} \\
&\quad + \frac{\gamma_t}{1 - e^{-\gamma T}} \left[ \left( 1 - e^{-\gamma_t (T - \xi')} \right) \left( \frac{1}{\gamma_t} + \frac{e^{-\gamma T}}{\gamma_m} \right) - \frac{1 - e^{-\gamma (T - \xi')}}{\gamma_m}\, e^{\gamma_m \tau - \gamma T} \right]
\end{aligned} \tag{18}$$

for τ ≤ T and u ≤ T, where ξ′ = max(u, T − τ). We remark that in accordance with the previous comment that A(0) = A, we have P(P > τ | · · ·) of Eq. (8) equal to P(A(τ) > 0) of the above Eq. (18).

The case t > T. When t > T there must be, for A(t) ∈ dx to happen, similar to the second subcase above, a timer reset at t − (T − x) followed by a period of duration T − x with no further timer resets. The probability of the latter event is again $e^{-\gamma_t (T - x)}$. We write the probability of the first event as $P(W > t - (T - x))\, \gamma_t\, dx$, where again $\gamma_t\, dx$ is the probability of an in- or outbound activity occurring in t − (T − dx) (and resetting the timer). Thus we have

$$f_{A(t)}(x)\, dx = P(W > t - (T - x))\, \gamma_t\, e^{-\gamma_t (T - x)}\, dx.$$

This equality is indeed valid for any t and x such that t − (T − x) > 0, or x > T − t. Now fix 0 < u < T and integrate the equality with respect to x from max(u, T − t) to T to obtain

$$P(A(t) > \max(u, T - t)) = \int_{x = \max(u, T - t)}^{T} P(W > t - (T - x))\, \gamma_t\, e^{-\gamma_t (T - x)}\, dx.$$
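Multiplying by $e^{-\theta t}$, integrating with respect to t ∈ [0, ∞) and changing order of integration, we arrive at

$$\begin{aligned}
\mathcal{L}\{P(A(\cdot) > \max(u, T - \cdot))\}(\theta)
&= \int_{x=u}^{T} \int_{t=T-x}^{\infty} e^{-\theta t}\, P(W > t - (T - x))\, \gamma_t\, e^{-\gamma_t (T - x)}\, dt\, dx \\
&= \mathcal{L}\{P(W > \cdot)\}(\theta) \int_{x=u}^{T} \gamma_t\, e^{-(\gamma_t + \theta)(T - x)}\, dx \\
&= \frac{1 - f_W^*(\theta)}{\theta}\, \frac{\gamma_t}{\gamma_t + \theta} \left( 1 - e^{-(\gamma_t + \theta)(T - u)} \right).
\end{aligned} \tag{19}$$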
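As a numerical sanity check on the case τ ≤ T, the closed form of Eq. (18) can be compared with direct quadrature of the two-branch density of the remaining timer. The Python sketch below is an illustration only: the function names, the midpoint rule and the test points are our own choices, the rates are those quoted in Section 4.1 (γt = γi + γo = 1.260 h⁻¹, γm = 1.069 h⁻¹, T = 1 h), and the code assumes γ = γt + γm.

```python
import math

def density(x, t, g_t, g_m, T):
    """Two-branch density f_{A(t)}(x) of the remaining timer, valid for t <= T."""
    g = g_t + g_m                      # assumption: gamma = gamma_t + gamma_m
    norm = 1.0 - math.exp(-g * T)
    if x <= T - t:
        # No reset in [0, t): the initial timer must have been t + x.
        return g * math.exp(-g * (T - t - x)) / norm * math.exp(-g_t * t)
    s = t - (T - x)                    # instant of the last timer reset
    bracket = (1.0 + (g_t / g_m) * math.exp(-g * T)
               - (g / g_m) * math.exp(g_m * s - g * T))
    return g_t * math.exp(-g_t * (T - x)) / norm * bracket

def no_timeout_closed_form(u, tau, g_t, g_m, T):
    """P(A(tau) > u) from the closed form, for tau <= T and 0 <= u <= T."""
    g = g_t + g_m
    norm = 1.0 - math.exp(-g * T)
    xi = max(u, T - tau)
    first = math.exp(-g_t * tau) * (
        math.exp(-g * (T - tau - xi)) - math.exp(-g * (T - tau - u))) / norm
    second = (g_t / norm) * (
        (1.0 - math.exp(-g_t * (T - xi))) * (1.0 / g_t + math.exp(-g * T) / g_m)
        - (1.0 - math.exp(-g * (T - xi))) * math.exp(g_m * tau - g * T) / g_m)
    return first + second

def no_timeout_quadrature(u, tau, g_t, g_m, T, n=20000):
    """Same probability by midpoint-rule integration of the density over (u, T)."""
    h = (T - u) / n
    return h * sum(density(u + (k + 0.5) * h, tau, g_t, g_m, T) for k in range(n))

if __name__ == "__main__":
    G_T, G_M, T_PER = 1.260, 1.069, 1.0   # rates per hour, timer in hours
    for u, tau in [(0.0, 0.5), (0.3, 0.5), (0.7, 0.5), (0.0, 1.0)]:
        closed = no_timeout_closed_form(u, tau, G_T, G_M, T_PER)
        numeric = no_timeout_quadrature(u, tau, G_T, G_M, T_PER)
        print(f"u={u:.1f} tau={tau:.1f} closed={closed:.5f} quadrature={numeric:.5f}")
```

The two columns should agree up to the quadrature error; the density has a jump at x = T − τ, so a first-order rule such as the midpoint rule is sufficient here.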
Table 1
Measured user transaction rates.

| Counter | Events/user and hour |
|---|---|
| Inbound voice call | 0.431 |
| Outbound voice call | 0.391 |
| Inbound SMS | 0.292 |
| Outbound SMS | 0.146 |
| Location update movement | 1.069 |
| Location update periodic | 0.249 |
Table 2
Modified and rescaled processor execution requirements per event.

| Event | Execution requirement (units) |
|---|---|
| Inbound voice call | 7.0 |
| Outbound voice call | 6.0 |
| Inbound SMS | 3.0 |
| Outbound SMS | 3.0 |
| Location update intra LAN | 0.7 |
| Location update new LAN | 2.8 |
| Location update old LAN | 0.7 |
| Location update periodic | 0.7 |
By numerically inverting this transform at t = τ we obtain the probability P(P > τ + u | · · ·) in Eq. (16) of no time out before τ + u for τ > T.

We conclude the treatment of phase two by returning to the original problem of the probability of no re-registration before time τ + u in the second phase, Eq. (16), and note that the probability of no traffic activity in [τ, τ + u) is given by Eq. (17), the probability of no internal movement activity before τ + u is given by Eq. (2) and the probability of no periodic time out before τ + u is given by Eq. (18) for τ ≤ T and by Eq. (19) for τ > T.

4. Numerical examples

In this section we illustrate how our results can be used to solve the engineering problems of Section 2.4 by calculating re-registration times and processor loads for a given traffic model (subscriber behaviour), a given load model (node characteristics) and some tentative timer settings (operator settings). To this end we consider a pool with four identical LANs (in this case combined MSC/VLRs) each serving 500,000 users. The total load is 60%, 2 percentage points of which refer to idle load (from maintenance tasks) and 58 percentage points of which refer to user load (from user activities). We emphasise that the choice of identical LANs is based on minimising the number of diagrams (one diagram represents all nodes) rather than on model limitations (the formulae apply to all cases), and that the same methodology can be used to study GANs.

4.1. Traffic model

The user traffic model refers to the rates of the different transactions. To measure these rates one can collect hourly observations of counters of events and subscribers and divide event counters by subscriber counters. An example from an unnamed network in Asia is given in Table 1. The example refers to the peak hour as seen in a measurement period of one week.
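The counter-based estimation described above can be sketched in a few lines of Python. The counter values below are synthetic, not measurements: they are chosen only so that the ratios reproduce Table 1, and the event names and the helper `rates_per_user_hour` are hypothetical names introduced for this illustration.

```python
def rates_per_user_hour(event_counts, subscribers):
    """Divide the hourly event counters by the subscriber counter."""
    if subscribers <= 0:
        raise ValueError("subscriber counter must be positive")
    return {name: count / subscribers for name, count in event_counts.items()}

# Synthetic one-hour counters (NOT measured data), scaled to reproduce Table 1.
subscribers = 1_000_000
event_counts = {
    "inbound_voice": 431_000,
    "outbound_voice": 391_000,
    "inbound_sms": 292_000,
    "outbound_sms": 146_000,
    "lu_movement": 1_069_000,
    "lu_periodic": 249_000,
}

rates = rates_per_user_hour(event_counts, subscribers)

# Aggregate the random transactions into the three Poisson rates of the model.
gamma_i = rates["inbound_voice"] + rates["inbound_sms"]     # inbound traffic
gamma_o = rates["outbound_voice"] + rates["outbound_sms"]   # outbound traffic
gamma_m = rates["lu_movement"]                              # movement
gamma = gamma_i + gamma_o + gamma_m                         # all random events

print(f"gamma_i={gamma_i:.3f} gamma_o={gamma_o:.3f} "
      f"gamma_m={gamma_m:.3f} gamma={gamma:.3f} (per hour)")
```

Periodic location updates are deliberately left out of the sum, since in the model they are generated by the timer rather than by a Poisson process.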
From Table 1 we may set the values for the random transactions as $\gamma_i = 0.431 + 0.292 = 0.723\ \text{h}^{-1}$, $\gamma_o = 0.391 + 0.146 = 0.537\ \text{h}^{-1}$ and $\gamma_m = 1.069\ \text{h}^{-1}$ which yields $\gamma = 2.329\ \text{h}^{-1}$. For periodic transactions, the periodic location update timer was set to T = 1 h. We emphasise that the traffic model is subject to daily variation cycles as well as long term trends, and that the former must be handled by applying traffic models from different times of the day (to find the most critical times) while the latter must be handled by regularly repeated evaluations (to verify continued validity).

4.2. Load model

The processor load model refers to the load imposed by different transactions. To estimate these loads one can collect a series of hourly observations of counters of events and load such that the load per event can be estimated from multiple linear regression. A modified and rescaled example is given in Table 2. Note that the values above do not have absolute interpretations but they are relative to each other and to the processor capacity. There are thus many possible combinations of units, e.g., execution requirements as “clock cycles” and
processing capacity as “clock cycles per time unit”, or execution requirements as “milliseconds” and processing capacity as “milliseconds per time unit” etc.

We remark that, because of our assumption of all movements taking place inside the pool, the event “Location update movement” in Table 1 corresponds to the event “Location update intra LAN” in Table 2 while the two events “Location update new LAN” and “Location update old LAN” in Table 2 have measured rates zero and only apply to the “forced” re-registrations.

The processor capacity C of the LANs in the present example can now be calculated as follows

$$\frac{U \sum_{e \in \mathcal{E}} \gamma_e x_e}{C} = \rho_{\text{total}} - \rho_{\text{idle}} \;\Rightarrow\; C = \frac{U \sum_{e \in \mathcal{E}} \gamma_e x_e}{\rho_{\text{total}} - \rho_{\text{idle}}} \approx 6{,}551{,}379 \text{ execution units/hour},$$

where U = 500,000 is the number of users, $\rho_{\text{total}} = 0.60$ and $\rho_{\text{idle}} = 0.02$ are the measured loads with and without users respectively (cf. the first paragraph of Section 4) and $\mathcal{E}$ is the set of all events with rates $\gamma_e$ (Table 1) and requirements $x_e$ (Table 2). It is seen that the load is the ratio between demand for execution (number of users times events per user and hour times execution requirement per event summed over all events) and supply of execution (capacity per hour).

To calculate loads for node failures and operator interventions, let $\mathcal{N}$ denote the set of all pool members and let $\rho_n(t, t + \Delta t)$ denote the expected load on pool member n, $n \in \mathcal{N}$, during the interval (t, t + Δt). Moreover, subdivide $\mathcal{N}$ into two disjoint subsets $\mathcal{D}$ and $\mathcal{A}$ respectively depending on whether users “depart” or “arrive” during redistribution, and let $\phi_D(d)$ and $\phi_A(a)$ denote the fraction of users departing from d, $d \in \mathcal{D}$, and arriving to a, $a \in \mathcal{A}$, respectively. We remark that $\phi_D(d)$ equals one for node failures but can be arbitrary for operator interventions, and that $\phi_A(a)$ typically is selected in proportion to capacity.

In the context of re-registration we shall distinguish between four sources of load, viz. ordinary users who have not re-registered and will not re-register, moving users who currently are re-registering, remaining users who have not re-registered but will re-register, and diverted users who have re-registered.

Ordinary users are assumed to behave like the traffic model in Table 1 hence we get

$$\rho_n^{\text{ord}}(t, t + \Delta t) \approx
\begin{cases}
(1 - \phi_D(n))\, U_n \displaystyle\sum_{e \in \mathcal{E}} \gamma_e \Delta t\, \frac{x_e}{C_n \Delta t}, & n \in \mathcal{D}, \\[2ex]
U_n \displaystyle\sum_{e \in \mathcal{E}} \gamma_e \Delta t\, \frac{x_e}{C_n \Delta t}, & n \in \mathcal{A},
\end{cases} \tag{20}$$

where $U_n$ is the number of users at node n at time t = 0. It is seen that the load in both cases is obtained as the number of users that have not re-registered and will not re-register, $(1 - \phi_D(n))\, U_n$ and $U_n$ respectively, times the expected number of e-events per user, $\gamma_e \Delta t$, times the execution requirement per e-event, $x_e$, divided by the capacity, $C_n \Delta t$, summed over all events $e \in \mathcal{E}$. Notice that the factor Δt in both cases cancels out.
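The capacity calculation and the ordinary-user load of Eq. (20) amount to a few lines of arithmetic. The following Python sketch uses the values of Tables 1 and 2; the dictionary keys and function names are hypothetical labels introduced here, with “Location update movement” (Table 1) paired with “Location update intra LAN” (Table 2) as explained above.

```python
# Events per user and hour (Table 1).
RATES = {
    "inbound_voice": 0.431, "outbound_voice": 0.391,
    "inbound_sms": 0.292, "outbound_sms": 0.146,
    "lu_movement": 1.069, "lu_periodic": 0.249,
}
# Execution requirement per event in rescaled units (Table 2).
COST = {
    "inbound_voice": 7.0, "outbound_voice": 6.0,
    "inbound_sms": 3.0, "outbound_sms": 3.0,
    "lu_movement": 0.7,   # "Location update intra LAN" in Table 2
    "lu_periodic": 0.7,
}

def demand_per_user(rates=RATES, cost=COST):
    """Execution units per user and hour, summed over all events."""
    return sum(rates[e] * cost[e] for e in rates)

def capacity(users, rho_total, rho_idle):
    """Capacity C (units/hour) implied by the measured steady-state load."""
    return users * demand_per_user() / (rho_total - rho_idle)

def ordinary_load(users, phi_depart, cap):
    """Eq. (20): load from users who will not re-register.
    Use phi_depart = 0 for an arrival node; the factor Delta t cancels."""
    return (1.0 - phi_depart) * users * demand_per_user() / cap

C = capacity(users=500_000, rho_total=0.60, rho_idle=0.02)
print(f"C = {C:,.0f} execution units/hour")                # about 6,551,379
print(f"arrival node: {ordinary_load(500_000, 0.0, C):.2f}")  # 0.58
print(f"failed node:  {ordinary_load(500_000, 1.0, C):.2f}")  # 0.00
```

For an arrival node the ordinary load simply reproduces the 58 percentage points of user load from which C was derived, which is a useful consistency check before the transient terms are added.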
Moving users perform location updates at both LANs and, if the re-registration was triggered by a traffic event, a traffic event at the new LAN, hence we get

$$\rho_n^{\text{mov}}(t, t + \Delta t) \approx
\begin{cases}
\phi_D(n)\, U_n\, (\pi(t + \Delta t) - \pi(t))\, \dfrac{x_D}{C_n \Delta t}, & n \in \mathcal{D}, \\[2ex]
\displaystyle\sum_{d \in \mathcal{D}} \phi_D(d)\, U_d\, \phi_A(n) \left[ (\pi(t + \Delta t) - \pi(t))\, \frac{x_A}{C_n \Delta t} + (1 - \pi(t)) \sum_{e \in \mathcal{E}^A_{\text{trf}}} \gamma_e \Delta t\, \frac{x_e}{C_n \Delta t} \right], & n \in \mathcal{A},
\end{cases} \tag{21}$$

where π(t) is the probability that a user has re-registered at time t, $x_D$ and $x_A$ are the movement loads, “Location update old LAN” and “Location update new LAN” in Table 2 respectively, and $\mathcal{E}^A_{\text{trf}}$ is the set of traffic events that cause load on arrival nodes, i.e., for node failures, all outbound traffic events and, for operator interventions, no/all traffic events during first/second phase respectively. It is seen that in the departure case, the load is the number of users that will re-register, $\phi_D(n)\, U_n$, times the fraction that performs the re-registration in (t, t + Δt), π(t + Δt) − π(t), times the execution requirement per moving user, $x_D$, divided by the capacity, $C_n \Delta t$. Similarly, in the arrival case, the load is the number of users that will depart from any node, $\sum_{d \in \mathcal{D}} \phi_D(d)\, U_d$, times the fraction that arrive to node n, $\phi_A(n)$, times the load per such user. The latter consists of two terms, the movement load and traffic load, where the first one includes all arriving users and the second one only those arriving users for which the re-registration was triggered by a traffic event. The movement load is the fraction of users that performs the re-registration in (t, t + Δt), π(t + Δt) − π(t), times the execution requirement per moving user, $x_A$, divided by the capacity, $C_n \Delta t$, and the traffic load is the fraction of users that has not re-registered at t, 1 − π(t), times the expected number of e-events per user, $\gamma_e \Delta t$, times the execution requirement per e-event, $x_e$, divided by the capacity, $C_n \Delta t$, summed over all events $e \in \mathcal{E}^A_{\text{trf}}$. Notice that the factor Δt in the second term of the second case cancels out.

Diverted users at arrival nodes are also assumed to behave like the traffic model in Table 1, except that they will not perform any periodic location update during the first T time units, hence we get

$$\rho_n^{\text{div}}(t, t + \Delta t) \approx
\begin{cases}
0, & n \in \mathcal{D}, \\[1ex]
\displaystyle\sum_{d \in \mathcal{D}} \phi_D(d)\, U_d\, \phi_A(n)\, \pi(t) \left[ \sum_{e \in \mathcal{E}_{\text{rnd}}} \gamma_e \Delta t\, \frac{x_e}{C_n \Delta t} + I_{t' > 0}\, \gamma_P \Delta t\, \frac{x_P}{C_n \Delta t} \right], & n \in \mathcal{A},
\end{cases} \tag{22}$$
where $\mathcal{E}_{\text{rnd}}$ is the set of all events related to random transactions, i.e. all events except periodic location update, I is the indicator function and t′ = t − T refers to the time at which diverted users may perform periodic location updates, the rate $\gamma_P$ and load $x_P$ of which are given in Tables 1 and 2 respectively. It is seen that the load is the number of users that will depart from any node, $\sum_{d \in \mathcal{D}} \phi_D(d)\, U_d$, times the fraction that arrive to n, $\phi_A(n)$, times the fraction that has re-registered at t, π(t), times the expected number of e-events per user, $\gamma_e \Delta t$, times the execution requirement per e-event, $x_e$, divided by the capacity, $C_n \Delta t$, summed over all events $e \in \mathcal{E}_{\text{rnd}}$ and, conditionally, also over periodic location updates. Notice that the factor Δt in the second case cancels out.

Remaining users at departure nodes is a relevant category only for operator interventions and only during the first phase during which they are assumed to behave like the traffic model in Table 1, except that only traffic events count, hence we get

$$\rho_n^{\text{rem}}(t, t + \Delta t) \approx
\begin{cases}
\phi_D(n)\, U_n\, (1 - \pi(t)) \displaystyle\sum_{e \in \mathcal{E}^D_{\text{trf}}} \gamma_e \Delta t\, \frac{x_e}{C_n \Delta t}, & n \in \mathcal{D}, \\[2ex]
0, & n \in \mathcal{A},
\end{cases} \tag{23}$$

where $\mathcal{E}^D_{\text{trf}}$ is the set of traffic events that cause load on departure nodes, i.e. for node failures, no events and, for operator interventions, all/no traffic events during the first/second phase respectively. It is seen that the load is the number of users that will re-register, $\phi_D(n)\, U_n$, times the fraction that has not re-registered at t, 1 − π(t), times the expected number of e-events per user, $\gamma_e \Delta t$, times the execution requirement per e-event, $x_e$, divided by the capacity $C_n \Delta t$, summed over all events $e \in \mathcal{E}^D_{\text{trf}}$. Notice that the factor Δt in the first case cancels out.

Finally, the total load is obtained by adding the four terms Eqs. (20)–(23) and the idle load.

4.3. Model verification

Before proceeding to the examples we first take the opportunity to examine the validity of our Poisson assumptions for random transactions by comparing the observed rate of periodic transactions in Table 1, $\gamma_P^{\text{obs}}$, to the calculated one, $\gamma_P^{\text{cal}}$, which may be obtained from the model by integrating the probability of a periodic location update at t over the time interval [0, u] in question. This probability is obtained as the probability that, at time 0, the time to a location update is t, $f_A(t) = f_B(T - t)$, times the probability that there is no random transaction before t, $e^{-\gamma t}$,

$$\gamma_P^{\text{cal}} = \int_{t=0}^{u} \frac{\gamma\, e^{-\gamma (T - t)}}{1 - e^{-\gamma T}}\, e^{-\gamma t}\, dt = \gamma\, \frac{e^{-\gamma T}}{1 - e^{-\gamma T}}\, u, \tag{24}$$
(cf. Fig. 4). Considering Table 1 we find $\gamma_P^{\text{obs}} = 0.249$ and, after applying the equation above with $\gamma = 0.431 + \cdots + 1.069 = 2.329\ \text{h}^{-1}$ and u = 1 h, $\gamma_P^{\text{cal}} = 0.251$, and conclude that the almost perfect agreement supports that system performance is well captured using our Poisson assumptions regarding random transactions. We remark that we can also arrive at Eq. (24) by noting that, under normal operation and our stationarity assumption, the process of periodic location updates is a stationary renewal process with intensity $1/E[H] \times P(H = T) = \gamma e^{-\gamma T} / (1 - e^{-\gamma T})$ (cf. Section 3.2) which after multiplying by u gives the expected number of time outs in the interval [0, u].

4.4. Re-registration related to node failures

We now consider the case when one pool member ν becomes unavailable during the busy hour and examine the re-registration process and the associated load under the assumption that the users at the failed node are spread evenly between the three other nodes. The probability $\pi_F(t)$ that a user has re-registered t time units after the failure is obtained from Eq. (4) as

$$\pi_F(t) = 1 - P(R > t).$$

The load $\rho_n(t, t + \Delta t)$ at any of the three remaining LANs n ≠ ν can be computed from Eqs. (20) to (23) with $U_\nu = U_n = 500{,}000$, $\pi(t) = \pi_F(t)$, $\phi_D(\nu) = 1$ and $\phi_A(n) = 1/3$.

The results from our model (with Δt = 1 min) are compared to the results from simulations in Fig. 5. The purpose of the simulation model is to check the analytical solution hence periodic events are generated by a detailed implementation of the time out mechanism while random events are generated by Poisson processes. The diagrams refer to the case when a failure occurs at t = 10 min and depict the probability that a user at the failed LAN has re-registered as a function of time (left) and the load at one of the remaining LANs as a function of time (right). It is seen that, except for some random noise in the simulation, the calculated results and the simulated results overlap completely.

Fig. 5. Probability of re-registration when a failure occurs at t = 10 min (left) and load at a working LAN when a failure occurs at t = 10 min (right).

As for performance, it is seen that the load at the working LANs is subject to a large increase immediately after the failure and to a small decrease one hour later. The abrupt increase may seem surprising at first, but it must be recalled that, although re-registration is a gradual process, redirection takes place immediately. That is, all movement events, all outbound traffic events and all periodic location updates of an additional Um/3 users will immediately be directed to the working LAN. The gentle increase that follows is a result of an increase in the number of inbound traffic events directed to the working LAN as more and more users have re-registered. The sudden decrease follows from the fact that the processing of re-registrations from all additional Um/3 users is completed after one periodic location update interval.

It is thus clear that our analytical model can be used to predict user movements and processor load in pooled LANs subject to pool member failures. Four examples of what would happen under different conditions are given in Figs. 6 and 7. In the upper rows we show the impact of user behaviour by scaling the rates of random transactions by ±20% to obtain “heavy users” and “light users” respectively (and recompute the time out rates $\gamma_p$ to match these changes) and in the lower rows show the results of varying the periodic location update timeout by dividing/multiplying by a factor of 2 to obtain a short time out (T = 0.5 h) and a long time out (T = 2 h) respectively (and again adjust $\gamma_p$ accordingly) while in all cases we keep the processing capacity of the LANs fixed.

Fig. 6. Probability of re-registration when a failure occurs at t = 10 min under different variations. Different degrees of user activity (top) and different periodic location updates (bottom).

Fig. 7. Load at a working LAN when a failure occurs at t = 10 min under different variations. Different degrees of user activity (top) and different periodic location updates (bottom).

Comparing the results from the modified parameter sets in Fig. 6 to those from the original one in Fig. 5, it is seen that heavy users and shorter time outs speed up the re-registration process and vice versa. Similarly, comparing the results from the modified parameter sets in Fig. 7 to those from the original one in Fig. 5 we note that the faster cases result in higher load both during steady state and during re-registration and vice versa. The change during steady state follows from the changed event rates and the additional change during the transient period follows from the rate change of the re-registration process.

We remark that the differences between “heavy” and “light” users in reality apply on an individual level too. This means, e.g., that users with high mobility and/or high traffic will re-register sooner and vice versa, and we note that this implies an implicit prioritisation of “heavy” users which, at least compared to the opposite, is advantageous from a traffic (revenue) point of view.

4.5. Re-registration related to operator interventions

We now consider the case when pool member ν is being emptied during the busy hour using τ = T and examine the re-registration process and the associated load under the assumption that the users at the emptied node are spread evenly between the three other nodes. The probability $\pi_I(t)$ that a user has re-registered t time units after the process started is obtained from Eq. (5) or Eq. (16) as
$$\pi_I(t) = 1 - P(R > t)$$

with, in the first case, Eqs. (2) and (8) or Eq. (15) inserted and, in the second case, Eqs. (2), (17) and (18) or Eq. (19) inserted. The load $\rho_n(t, t + \Delta t)$ at the emptied LAN ν and any of the three other LANs n ≠ ν can be computed from Eqs. (20) to (23) with $\pi(t) = \pi_I(t)$, $\phi_D(\nu) = 1$ and $\phi_A(n) = 1/3$.

The results from our model (with Δt = 1 min) are compared to the results from simulations in Fig. 8. The diagrams refer to the case when re-registration is triggered at t = 10 min and depict the probability that a user at the vacated LAN has re-registered as a function of time (left) and the load at one of the remaining LANs as a function of time (right). Again it is seen that, except for some random noise in the simulation, the calculated results and the simulated results overlap completely.

As for performance, it is again seen that the load at the receiving LANs is subject to a large increase immediately after the intervention starts and a (very) small increase one hour later. The reason for the large increase is the same as before, i.e. the receiving LANs will immediately feel the full pressure of the additional Um/3 users although it takes some time before they have met all of them. The small increase is the result of, firstly, a decrease as all passive users have performed periodic location updates after T = 1 h and, secondly, an increase as the phase timer τ = 1 h has expired hence the pressure is increased by traffic activities.
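To make the use of Eqs. (20)–(22) concrete, the Python sketch below assembles the total load at one arrival node. It is an illustration under stated assumptions, not the authors' implementation: the re-registration probability π(t) is passed in as a callable because evaluating it requires the results of Section 3, the constant π used in the two calls only probes the limiting cases π = 0 and π = 1, and all identifiers are hypothetical. Times are in hours and t is counted from the start of the redistribution.

```python
RATES = {"inbound_voice": 0.431, "outbound_voice": 0.391, "inbound_sms": 0.292,
         "outbound_sms": 0.146, "lu_movement": 1.069, "lu_periodic": 0.249}  # Table 1
COST = {"inbound_voice": 7.0, "outbound_voice": 6.0, "inbound_sms": 3.0,
        "outbound_sms": 3.0, "lu_movement": 0.7, "lu_periodic": 0.7}         # Table 2
X_NEW_LAN = 2.8                      # "Location update new LAN" in Table 2
RANDOM_EVENTS = [e for e in RATES if e != "lu_periodic"]

def demand(events):
    """Execution units per user and hour for the given events."""
    return sum(RATES[e] * COST[e] for e in events)

def arrival_node_load(t, dt, pi, traffic_events, moved_users, own_users,
                      cap, T, rho_idle):
    """Idle load plus Eqs. (20), (21) and (22) for an arrival node.
    moved_users is sum_d phi_D(d) U_d times phi_A(n); pi is a callable."""
    ordinary = own_users * demand(RATES) / cap                          # Eq. (20)
    moving = moved_users * ((pi(t + dt) - pi(t)) * X_NEW_LAN / dt
                            + (1.0 - pi(t)) * demand(traffic_events)) / cap  # Eq. (21)
    periodic = demand(["lu_periodic"]) if t - T > 0 else 0.0            # I_{t' > 0}
    diverted = moved_users * pi(t) * (demand(RANDOM_EVENTS) + periodic) / cap  # Eq. (22)
    return rho_idle + ordinary + moving + diverted

U = 500_000
CAP = U * demand(RATES) / (0.60 - 0.02)
SETTINGS = dict(moved_users=U / 3, own_users=U, cap=CAP, T=1.0, rho_idle=0.02)
OUTBOUND = ["outbound_voice", "outbound_sms"]   # traffic events hitting arrival nodes after a failure

# Limiting cases: nobody has re-registered yet, and everybody re-registered long ago.
just_after = arrival_node_load(0.0, 1 / 60, lambda t: 0.0, OUTBOUND, **SETTINGS)
long_after = arrival_node_load(2.0, 1 / 60, lambda t: 1.0, OUTBOUND, **SETTINGS)
print(f"load just after a failure: {just_after:.3f}")
print(f"load in the new steady state: {long_after:.3f}")
```

With π ≡ 1 and t > T the sketch returns the idle load plus 4/3 of the original user load, i.e. the new steady state with one third more users per node; for an operator intervention the list of traffic events would instead be empty during the first phase and contain all traffic events during the second.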
Fig. 8. Probability of re-registration when a node is emptied at t = 10 min (left) and load at a receiving LAN when a node is emptied at t = 10 min (right).
Fig. 9. Probability of re-registration when a node is emptied at t = 10 min under different variations. Different degrees of user activity (top), different periodic location updates (middle) and different phase timers (bottom).
Fig. 10. Load at a working LAN when a node is emptied at t = 10 min under different variations. Different degrees of user activity (top), different periodic location updates (middle) and different phase timers (bottom).
It is thus clear that our analytical model can be used to predict user movements and processor load in pooled LANs subject to operator interventions. Six examples of what would happen under different conditions are given in Figs. 9 and 10. In the upper rows we show the impact of user behaviour by scaling the rates of traffic and mobility transactions by ±20% to obtain “heavy users” and “light users” respectively (and recompute the time out rates $\gamma_p$ to match these changes), in the middle rows we show the results of varying the periodic location update timeout by dividing/multiplying by a factor of 2 to obtain a short time out (T = 0.5 h) and a long time out (T = 2 h) respectively (and again adjust $\gamma_p$ accordingly), and in the lower rows we show the results of varying the phase timer by dividing/multiplying by a factor of 2 to obtain a short timer (τ = 0.5 h) and a long timer (τ = 2 h) respectively while in all cases we keep the processing capacity of the LANs fixed.

Comparing the results from the modified parameter sets in Fig. 9 to those from the original one in Fig. 8, it is seen that low traffic users, high mobility users, shorter time outs, and shorter timers speed up the re-registration process and vice versa. Similarly, comparing the results from the modified parameter sets in Fig. 10 to those from the original one in Fig. 8 we again note that the faster cases result in higher load both during steady state and during re-registration and vice versa. As before, the change during steady state follows from the changed event rates and the additional change during the transient period follows from the rate change of the re-registration process.

We again remark that the differences between “heavy” and “light” users in reality apply on an individual level too. This means, e.g., that users with high mobility and/or low traffic will re-register sooner and vice versa, and we emphasise that
this has implications on interventions to change the way users are distributed between nodes. To see this, consider a first approach where all users are re-registered until some load criterion is met (e.g., the loads are about the same on all nodes) and a second approach where a randomly selected subset of the users are re-registered until some time criterion is met (e.g., the re-registration has been active long enough to make the (calculated) probability of having moved equal to unity). It is seen that the first approach will be biased toward users with high mobility and/or low traffic whereas the second approach will be free from such bias. The result is that we can expect a vulnerable balance in the first case (where different types of behaviour dominate different LANs) but a stable balance in the second case (where all types of behaviour are equally common at all LANs).

Finally, comparing the two cases of re-registration, we note that the differences between the modified parameter sets and the normal ones are smaller for operator interventions than for failures and conclude that, at least for our sets of parameters, node failures may form more sensitive scenarios than operator interventions.

5. Conclusions

The 3GPP concept of “Pool” (or “A/Gb/Iu/S1 Flex”) allows multiple local anchor nodes (MSCs/VLRs/SGSNs/MMEs) per service area. Such service areas have been noted to improve network efficiency and performance and, at the same time, reduce capital and operational expenditure. The solution is thus deployed (or planned) by several operators all over the world. The present work, which we believe to be the first of its kind, studies in analytical terms the dynamics of user numbers and processor loads inside such pools for cases of sudden node unavailability or deliberate operator intervention.
The model related to node unavailability can be used to predict the performance of individual nodes when pool members suddenly become unavailable and to engineer pools to handle such situations. Moreover, the model is informative in terms of user performance and can be used to, e.g., set the periodic location update interval such that certain requirements with respect to restoration time are met. The numerical example indicated that more active users experience shorter interruptions and vice versa, and that the restoration time is a matter of minutes hence the redundancy offered by a pool (“network redundancy”) should be viewed as a complement to the redundancy offered by a node (“node redundancy”). It is, however, a valuable complement as it can handle complex cases like site failures.

The model related to operator intervention can be used to predict the performance of individual nodes when operators empty nodes prior to service or removal, and/or fill nodes after service or insertion. The model can also be used to select the time of the day for such actions and the values of the periodic location update timer and/or the phase timer such that certain requirements with respect to process duration and load peaks are met. The numerical example indicated that it is difficult if not impossible to control re-registration purely by aborting the process when a certain load criterion is met. The reason for this is the non-random order in which users are redistributed; users with a certain behaviour will be redistributed sooner than users with a certain other behaviour. Pure interruption will thus introduce a bias in the user mixtures at the different nodes which most likely will disturb the balance.

Compared to current common practice, which essentially is limited to simple before–after steady state analyses and/or detailed simulation scenarios, our approach is both accurate and fast.
Moreover, all parameters of the model are directly mapped to common network parameters or can be derived from simple event counters. We thus believe that our results have significant practical interest. Lastly we note that “time to re-registration” can be viewed as “time to absorption” and we believe that our approach may be generalised to other absorption time problems which include combinations of random events and deterministic timers where the former may impact the latter.

References

[1] Ericsson Claims ‘MSC Pool’ First, Light Reading, January 16, 2007. Available at: http://www.lightreading.com/ethernet-ip/ericsson-claims-msc-poolfirst/d/d-id/6368090.
[2] H. Ng, C. Goodwin, The Telstra case - a radical approach to the mobile core, Ericsson Bus. J. (3) (2008) 58–61. Available at: http://www.ericsson.com/ericsson/corpinfo/publications/ericsson_business_review/pdf/308/308_58_61_telstra_case.pdf.
[3] Zain Nigeria receives mobile networking solution from Huawei, ITNewsAfrica, January 15, 2010. Available at: http://www.itnewsafrica.com/2010/01/zain-nigeria-receives-mobile-networking-solution-from-huawei.
[4] Intra-domain connection of Radio Access Network (RAN) nodes to multiple Core Network (CN), 3rd Generation Partnership Project TS 23.236, release 11, 2012.
[5] 3GPP System Architecture Evolution: Report on Technical Options and Conclusions, 3rd Generation Partnership Project TS 23.882, release 8, 2008.
[6] General Packet Radio Service (GPRS) enhancements for Evolved Universal Terrestrial Radio Access Network (E-UTRAN) access, 3rd Generation Partnership Project TS 23.401, release 11, 2012.
[7] Evolved Universal Terrestrial Radio Access (E-UTRA) and Evolved Universal Terrestrial Radio Access Network (E-UTRAN); Overall description; Stage 2, 3rd Generation Partnership Project TS 36.300, release 11, 2012.
[8] Evolved Universal Terrestrial Radio Access Network (E-UTRAN); S1 Application Protocol (S1AP), 3rd Generation Partnership Project TS 36.413, release 11, 2012.
[9] G. Boulmier, A. Kessentini, Optimal design of pool-areas in a mobile network using the MSC Pool technology, in: Proc. 2012 XVth International Telecommunications Network Strategy and Planning Symposium, NETWORKS 2012, 2012, pp. 1–6.
[10] S. Sun, W. Wang, M. Wei, Zh. Zhou, Load balancing based on subscriber characteristic for MSC pool system, in: Proc. Fourth International Conference on Computational and Information Sciences, ICCIS 2012, 2012, pp. 1111–1115.
[11] S. Asmussen, Applied Probability and Queues, second ed., Springer, 2003.
[12] J. Abate, W. Whitt, A unified framework for numerically inverting Laplace transforms, INFORMS J. Comput. 18 (2006) 408–421.
Åke Arvidsson obtained his M.Sc. and Ph.D. degrees in Electrical Engineering from Lund University, Sweden, in 1982 and 1990 respectively. He has worked with several consultancy companies and held various academic positions in Sweden and Australia and became full professor of teletraffic systems at Blekinge Institute of Technology in 1995. In 1998 he joined Ericsson and since 2015 he is also full professor at Kristianstad University. His current research interests include performance analysis and optimisation of cellular networks in terms of system dependability, transport protocols, content distribution and quality of experience.
Tobias Rydén obtained an M.Sc. in Computer Engineering and a Ph.D. in Mathematical statistics from Lund University, Sweden, in 1988 and 1993 respectively. He held positions as professor of mathematical statistics at Lund University until 2009 and at the Royal Institute of Technology, Stockholm, 2010–12. Currently he is a quantitative analyst at Lynx Asset Management, Stockholm, and also adjunct professor at Uppsala University. His current research interests include computational statistics, sequential Monte Carlo methods and machine learning.