GSM session


ITC 18 / J. Charzinski, R. Lehnert and P. Tran-Gia (Editors) © 2003 Elsevier Science B.V. All rights reserved.


A Portrait of a GPRS/GSM Session

Jorma Kilpi
VTT Information Technology, P.O. Box 1202, 02044 VTT, Finland
E-mail: Jorma.Kilpi@vtt.fi

Some results from a GPRS/GSM traffic measurement are presented. First we define session arrivals, durations and volumes in such a way that they can be compared, for example, with measured values of dial-up sessions. Then we study the map of the cumulative volume of a single session against the elapsed time from the first observed packet of the session. With these maps we study the question of how the major data transfer is typically situated inside a single session's active duration. For this purpose we introduce three simple statistics which can be used for a rough shape classification.

1. INTRODUCTION

When the mobile station (MS) of a user is attached to the GPRS network and the Packet Data Protocol (PDP) context activation is made, the user is logically, but not necessarily physically, connected to a GPRS gateway support node (GGSN). A logical connection means that the user is given an IP address. In practice, this IP address is only temporary and not visible to the outside world: somewhere after the GGSN and before the true access to the Internet a network address translation (NAT) is made. In this paper, we present some results obtained from tcpdump traces recorded at a measurement point just before the NAT. The measurement was done in the network operated by the Finnish GSM operator Radiolinja.

The phrase GPRS/GSM session in the title refers to the temporary IP address given to a user: a session consists of all IP datagrams with the same temporary IP address. The set of all temporary addresses is large enough that the same address has not been used again by a different user during a short measurement period, hence in this sense the session is a well-defined concept.

GPRS/GSM uses the spare radio interface capacity left over by GSM voice calls. If GSM calls need all the capacity, an ongoing GPRS data transfer is interrupted, not terminated, and it continues once some capacity is available again. Such a system is not TCP-friendly, see [1]. All GPRS users in the same cell share the free capacity between themselves. The radio resources are requested and reserved only when needed, and there is a random backoff time if the resources are not immediately available. The uplink and downlink radio interfaces work independently of each other. For short introductions to GPRS see e.g. [1], [2], [8] or [6].

A GPRS user pays either a flat rate or, as in the case of this measurement, according to the amount of data that is transmitted and received during some charging period. In principle, the logical connection alone does not cost anything. Due to this nature of GPRS, our definition of a session differs from that of e.g. a session in a dial-up connection where the user pays for the time used. The latter concept will be called a dial-up session. A single GPRS user might have had several sessions during the measurement period, but without charging information it is not possible to tell which sessions belong to the same user. In the case of dial-up sessions it is possible to use additional information from a simultaneous authentication log file, e.g. the user ID or the phone number, in order to collect together different dial-up sessions of the same user.

A GPRS session and a dial-up session are hence different in many basic ways; nevertheless (or just because of these differences), they are worth comparing. Vicari and Köhler [9] give, for example, control groups of dial-up users with low access speeds, where the access speeds were known. Also in our own modem pool measurements (see e.g. Kilpi and Norros [4]) the user access speeds were known from the authentication log files. Nieminen and Halme [7] and Kunz, Barry, Black and Mahoney [5] describe some WAP traffic measurements.

We mostly concentrate on session level characteristics, i.e. session arrivals, session durations and session volumes, and try to describe some of the most typical sessions. The observed GPRS sessions mostly consist of web browsing and e-mail traffic; a significant but relatively small amount of e.g. file transfer and terminal connection traffic was also observed. Other protocols and applications were observed as well, but their share of the total traffic is negligible and hence is not reported here.

The paper is structured as follows. In section 2, we give some more motivation for our approach. In section 3, we give details of the traffic measurement. Section 4 gives basic results of the measurement and describes one of our tentative ideas for future work: to study the curves presenting the cumulative volume against elapsed time from the first observed (upstream directed) packet of a session.

2. BACKGROUND AND MOTIVATION

The results of this type of measurement can be used as guidelines when planning GPRS and other new mobile or wireless access networks. The results might also be helpful for dimensioning such networks.

Compared to other Internet access methods, GPRS is slow and rather expensive. The regular user population can be assumed to consist of those people who need or benefit from mobility, for example in their work. This might mean that even if these users are paying customers of the operator, their bills are paid by their employers. Commercial GPRS was launched quite recently, hence the number of first (and last?) time users may also be significant. This in turn might mean that the users' laptop settings are not optimally configured for wireless data traffic.

The concept of a GPRS session has a natural duration, the elapsed time between the first and the last observed packets. Due to the rather slow access speeds and the structural delay of the radio interface we do not expect to see many large downloads, but due to the volume based charging and free logical connections we can expect a priori that the session durations might have a qualitatively different distribution than the corresponding one for dial-up sessions.

Receiving data does not drain the battery of an MS much, but transmitting data does. The laptop battery may also run out. On the other hand, it is possible to connect both the mobile phone and the laptop to the electrical network, so there is no reason why a session could not last, for example, several days. We know from our test measurements in [3] that a PDP context, i.e. the logical level connection with the same temporary IP address, can be kept active for many days.

Traffic characteristics such as volume, data volume and the number of packets are naturally divided into upstream and downstream directions. (In this paper the word volume also includes the bytes of the TCP/IP headers. Total volume is the sum of the upstream and downstream volumes.) Qualitative and quantitative comparisons of the corresponding distributions between GPRS sessions and dial-up sessions might be interesting.

In addition to the natural questions described above, we look for a method to describe how the data bursts are situated inside the session and study whether the users take any advantage of the free of charge logical connection.
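The distinction between volume (headers included) and data volume (headers excluded) per direction can be sketched as follows. This is only an illustration: the function name `session_volumes` and the tuple layout are hypothetical and do not come from the measurement software.

```python
def session_volumes(packets):
    """Sum volume and data volume per direction for one session.

    packets: iterable of (upstream, ip_length, header_length) tuples, where
    ip_length is the size of the whole IP datagram in bytes and header_length
    covers the IP and TCP/UDP headers (an assumed input layout).
    """
    result = {"up_volume": 0, "up_data": 0, "down_volume": 0, "down_data": 0}
    for upstream, ip_length, header_length in packets:
        direction = "up" if upstream else "down"
        result[direction + "_volume"] += ip_length                 # headers included
        result[direction + "_data"] += ip_length - header_length   # payload only
    # Total volume is the sum of the upstream and downstream volumes.
    result["total_volume"] = result["up_volume"] + result["down_volume"]
    return result
```

For a pure acknowledgement the data volume contribution is zero, which is why the two curves can differ noticeably in the upstream direction.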

3. SOME DETAILS OF THE GPRS TRAFFIC MEASUREMENT

The measurement was done during 11 days between Friday 17 May 2002 at 13:30 and Monday 27 May 2002 at 14:10, about half a year after the operator Radiolinja had launched its commercial GPRS service. The tcpdump traces were collected from a monitoring interface of a firewall router just before the NAT operation was made.

For certain practical reasons, the tcpdump program could not be run continuously. The measurement consists of over 2800 tcpdump traces collected over a continuous time period of 10 days. Each tcpdump run recorded a fixed number of packets and, as there was also other traffic on the measurement interface, packets of non-GPRS users were later filtered out. The start time of each tcpdump run was also recorded. A new tcpdump run started immediately after the previous one had stopped. The time gap between two such runs was at most a few seconds, typically less than one second. We have identified Tuesday, Thursday and the second Saturday as days when there were some problems with the measurement. The absolute number of non-recorded IP packets of GPRS users is not known, but it is assumed to be small and relatively negligible.

The pictures presented in this paper do not contain all the sessions, only those that had at least two packets in both directions and whose first observed packet was an upstream packet that was not a data-acknowledging packet. These conditions were chosen to reduce the bias caused by configuration problems in the users' devices, problems with the measurement and a known problem with the WAP service of the operator during the time of the measurement.

Figure 1 shows the arrivals of GPRS/GSM sessions per hour.
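The session definition (all IP datagrams with the same temporary IP address) and the selection rule for the pictures can be sketched as below. This is a minimal illustration under stated assumptions: the `Packet` record, its field names and the `pure_ack` flag are hypothetical, and parsing of the tcpdump traces is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    time: float             # capture timestamp in seconds
    session_ip: str         # temporary IP address of the GPRS user
    upstream: bool          # True if the packet was sent by the user
    size: int               # IP datagram length in bytes, headers included
    pure_ack: bool = False  # True for a data-acknowledging packet (assumed flag)


def group_sessions(packets):
    """Collect packets into sessions keyed by temporary IP address, in time order."""
    sessions = defaultdict(list)
    for p in sorted(packets, key=lambda p: p.time):
        sessions[p.session_ip].append(p)
    return dict(sessions)


def is_plotted(session):
    """Selection rule: at least two packets in each direction, and the first
    observed packet is upstream and not a data-acknowledging packet."""
    up = sum(1 for p in session if p.upstream)
    down = len(session) - up
    first = session[0]
    return up >= 2 and down >= 2 and first.upstream and not first.pure_ack
```

Keying on the temporary address alone is only valid because, as noted in the introduction, no address was reused by a different user during the measurement period.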
The vertical axis of figure 1 is normalised with respect to the busiest hour, on the second Friday afternoon: if N_B is the number of arrivals in the busiest hour and N_i is the number of arrivals in the i-th hour, the vertical axis gives the value (N_i/N_B) × 100%. Figure 2 shows the average arrival profile of the 5 weekdays together with an average profile of the 4 non-weekdays.

Figure 1. An overall view of observed GPRS session arrivals per hour. The tick marks are at 00:00 of each day.

Figure 2. Average arrival profiles of the five weekdays (thin dark grey) and of the four non-weekdays (thick light grey).

The measured traffic is mainly generated by users whose MSs consist of a laptop computer with a GPRS/GSM mobile phone acting as a modem connecting the users to the Internet. In particular, the sessions with a large volume must originate from these laptop users. WAP usage through the laptop's web browser exists, but direct WAP/GPRS usage is not contained in the data.

The measurement point is after the uplink radio interface and before the downlink radio interface. The GPRS backbone network is also between the radio interface and the measurement point. The structural delay of the uplink radio interface ranges approximately from 50 milliseconds (ms) up to 900 ms, depending on the number of available time slots in the TDMA frame, the channel coding class and the packet size, see [3]. In addition to this, there have almost surely been some random delays depending on how fast the radio resources have been available and whether there has been any need for retransmissions of data over the radio interface. Hence it is not possible to say how much earlier the packet was sent from the MS or from the laptop to the GPRS phone. The transfer delay through the GPRS backbone network can be considered negligible when compared with the other delays.

4. DESCRIPTIVE AND COMPARATIVE APPROACH

Our aim is to study plots of the cumulative volume of a single session against the elapsed time, or duration, measured from the first observed time stamp of the session. This approach is described in sections 4.4 and 4.5. Before going any further into this subject we describe the basic statistical observations of the measured sessions. In some pictures we have also used data collected from authentication log files of a dial-up traffic measurement. This was done for comparison purposes. This dial-up data is similar to that reported in [4] but was measured more recently, in March 2000.
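The hourly arrival counts of figure 1 and their normalisation against the busiest hour, (N_i/N_B) × 100%, can be sketched as follows. The function name and the convention that start times are given in seconds from the beginning of the measurement are assumptions made for this illustration.

```python
from collections import Counter


def normalised_hourly_arrivals(start_times, hour=3600.0):
    """Map hour index -> arrivals as a percentage of the busiest hour.

    start_times: session start times in seconds from the start of the
    measurement (assumed convention). Hours without arrivals are omitted.
    """
    counts = Counter(int(t // hour) for t in start_times)
    if not counts:
        return {}
    busiest = max(counts.values())  # N_B
    return {i: 100.0 * n / busiest for i, n in sorted(counts.items())}
```

Normalising in this way hides the absolute number of sessions while keeping the shape of the daily and weekly profile.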

4.1. Session arrivals

Figures 1 and 2 seem to indicate that the most active usage happens on weekdays and during working hours. This supports our assumption about the user population during the time of the measurement. During the latter weekend there was also some activity at night.

4.2. Session durations

First we consider the problem of when a session ends. The immediate idea would be that it ends when the last packet is observed. We call this the maximum duration. However, in many cases downstream packets, even ones containing some data, can be seen minutes or even hours after the last upstream packet. As upstream and downstream packets typically occur almost simultaneously and the start time of each measurement run is known, the problems with the measurement do not explain this phenomenon. It is impossible to know whether these tail packets ever reached the user's laptop.

As the upstream packets are always generated by a GPRS user, we could also consider that a session ends when the last upstream packet is sent. The elapsed time between the first and the last upstream packets is called the active session duration. Hence the active duration is never longer than the maximum duration. Figure 3 shows that the difference between these two definitions is significant if the session duration is between 1 minute and 1 hour, or larger than 3 hours.
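The two duration definitions can be sketched for a single session as follows. The function name and the `(time, upstream)` input layout are hypothetical choices for this illustration.

```python
def session_durations(packets):
    """Return (maximum_duration, active_duration) for one session.

    packets: iterable of (time, upstream) pairs (assumed layout).
    Maximum duration: first observed packet to last observed packet.
    Active duration: first upstream packet to last upstream packet.
    """
    ordered = sorted(packets)
    upstream_times = [t for t, upstream in ordered if upstream]
    maximum = ordered[-1][0] - ordered[0][0]
    active = upstream_times[-1] - upstream_times[0] if upstream_times else 0.0
    return maximum, active
```

A session whose last downstream packet arrives ten minutes after the last upstream packet gets a maximum duration ten minutes longer than its active duration, which is the kind of difference figure 3 makes visible.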

Figure 3. The empirical complementary cumulative distribution function (CCDF) of maximum session durations (light grey) and active durations (dark grey) in a log-log scale.

Figure 4. The empirical CCDFs of dialup modem (thick light grey) and ISDN (thin black) sessions and the CCDF of active duration of GPRS sessions (thick dark grey).

A kind of compromise would be to require that, if the last observed packet is downstream directed, the time gap between it and the last upstream directed packet must not be too long. However, as the definition of any such compromise would be quite arbitrary, we chose here to consider both maximum and active durations.

Figure 5 gives another view of the observed durations when the arrival time is also taken into account. Note the visual look typical of heavy-tailed distributions: there are not very many sessions longer than, say, 10 hours, but almost every day there is at least one. The majority of sessions in figure 5 are invisible since they are much shorter than 1 hour.

4.3. Session volumes

The volume of those downstream packets that were sent after the last upstream packet is relatively negligible and hence need not be separated. Figure 6 shows the empirical CCDFs of session downstream volumes of Monday, Tuesday, Wednesday and Thursday. The straight (thin black) line is the CCDF of the exponential distribution whose mean is the sample mean of the four days. The tails are clearly heavier than exponential.

Figure 7 shows the CCDFs of session volumes and data volumes of all data. In the downstream direction they seem to be indistinguishable. Figure 8 follows the same idea as figure 4.
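The empirical CCDF and the exponential reference curve used in this kind of comparison can be sketched as follows. The function names are hypothetical; the exponential CCDF with mean m is the standard exp(-x/m).

```python
import math


def empirical_ccdf(values):
    """Return a list of (x, fraction of observations strictly greater than x)
    for each distinct observed value x, in increasing order."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        # Emit one point per distinct value, at its last occurrence.
        if i + 1 == n or xs[i + 1] != x:
            points.append((x, (n - i - 1) / n))
    return points


def exponential_ccdf(x, mean):
    """CCDF of the exponential distribution with the given mean."""
    return math.exp(-x / mean)
```

The last empirical point always has probability 0 and therefore cannot be drawn on a logarithmic vertical axis. A tail is heavier than exponential when, for large x, the empirical values stay well above `exponential_ccdf(x, sample_mean)`.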


Figure 7. The empirical CCDFs of session volumes in a log-log scale. The dark grey colour refers to downstream case, light grey to upstream case. Thick curves represent data volumes, the thin curves volumes.

Figure 8. The empirical CCDFs of downstream volumes of GPRS sessions (thick dark grey), dial-up ISDN sessions (thin black) and modem sessions (thick light grey).

Figures 9 and 10 compare the volumes of GPRS sessions and dial-up modem sessions. The overall structure is similar, although there are some small differences.

4.4. Cumulative volume against elapsed time

Figure 11 gives six individual examples of "cumulative volume against duration" plots. They are calculated for six relatively large sessions with a total volume of down- and upstream traffic, headers included, exceeding 2.0 MB. The straight lines describe how the cumulative volume would optimally grow with the constant bit rate corresponding to 1, 2, 3 or 4 time slots and channel coding class CS-2. Note that the scales of both axes differ from picture to picture. Horizontal parts of the curves of figure 11 correspond to silent periods, while the increasing parts seem to have slopes that visually look the same as one of the constant bit rate slopes.
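A cumulative volume curve and the constant bit rate reference lines can be sketched as follows. The paper does not state the bit rate used for the reference lines; the value of 13.4 kbit/s per time slot below is the commonly quoted nominal CS-2 rate and is an assumption of this sketch, as are the function names.

```python
from itertools import accumulate

# Assumed nominal CS-2 rate per time slot (not stated in the paper).
CS2_BITS_PER_SECOND_PER_SLOT = 13_400


def cumulative_volume(packets):
    """packets: iterable of (time, size_in_bytes) of one session and direction.
    Returns a list of (elapsed_time, cumulative_bytes), with elapsed time
    counted from the first observed packet."""
    ordered = sorted(packets)
    t0 = ordered[0][0]
    totals = accumulate(size for _, size in ordered)
    return [(t - t0, c) for (t, _), c in zip(ordered, totals)]


def cbr_reference(elapsed, slots, rate=CS2_BITS_PER_SECOND_PER_SLOT):
    """Bytes transferred after `elapsed` seconds at a constant bit rate
    using `slots` time slots."""
    return slots * rate * elapsed / 8
```

Comparing the slope of an increasing part of the curve with `cbr_reference` for 1 to 4 slots gives the visual comparison described above, subject to the caveat that buffering after the measurement point is not modelled.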


Figure 9. In this picture sessions are ordered according to their downstream volume (dark grey colour). The point in the (light grey colour) cloud with the same x-coordinate is the upstream volume of the session.

Figure 10. A picture comparable to figure 9, drawn with data from dial-up modem sessions.

Figure 11. Six examples. Dark grey colour is used for downstream volume, light grey for upstream volume. Thick curves represent data volume, i.e. without IP and TCP or UDP headers; thin curves include these headers.

Figures 11(c) and 11(e) are examples where a single downstream packet is observed many minutes after the last upstream packet. Since this is quite a typical phenomenon, it seems doubtful whether these last packets ever crossed the downlink radio interface, i.e. it looks as if the GSM phone has been detached from the GPRS network but the information about that event has never reached the Internet host. A possible explanation is that after too long a silent period the user is sent a packet indicating that he/she will be disconnected from some application. The silent period may be due to the user's own action, to an arriving GSM call to the user's GPRS phone, to a temporary lack of time slot capacity in the base station due to other GSM calls, or to something else.

Figure 11(c) shows the only one of these six sessions that sent more data than it received. It may not have been able to use more than 1 time slot in the uplink. In all 6 examples it seems visually clear that the number of time slots in the downlink must have been more than 1, at least part of the time. We must be careful with any conclusions about the number of time slots used since we do not know the behaviour after the measurement point. There may be some IP-level buffering, at least in the Subnetwork Dependent Convergence Protocol (SNDCP) layer. The effect of any such buffering must be estimated before any conclusions can be made.

The session of figure 11(f), the one with a relatively huge 30.6 MB total volume, used the data compression service offered by Radiolinja, see [3]. This service uses UDP datagrams for crossing the radio interface, and the picture thus shows a UDP data flow.

4.5. Rough shape classification

In this section, we give a rough answer to the question of how the majority of the data transfer is typically situated inside a single session's active duration. More precisely, does the data transfer occur essentially at the beginning of the session, evenly throughout the session or at the end of the session?

Let $T$ denote the active duration of a session and let $c_t$, $t \in [0, T]$, be the cumulative total volume of the session. We do not distinguish directions now. The value $k = c_T/T$ is the average constant bit rate that eventually produces the same final volume. We define areas $A_1$, $A_2$, $A_3$ and $A_4$ by the following formulas:

$$A_1 = \int_0^T (c_t - kt)^+ \, dt, \qquad A_2 = \int_0^T (kt - c_t)^+ \, dt, \qquad A_3 = \int_0^T c_t \, dt \qquad \text{and} \qquad A_4 = \tfrac{1}{2} T c_T.$$

Here $x^+ = \max\{x, 0\}$. In figure 12 the dark grey colour refers to the area $A_1$. The lightest grey refers to $A_2$. The middle and the darkest grey together describe the area $A_3$. The area $A_4$ is simply the area of the triangle defined by the line with slope $k$, the time axis and the vertical line at the point $T$. These quantities are connected by the geometrical facts that $A_3 + A_2 - A_1 = A_4$, or

$$\frac{A_3}{A_4} + \frac{A_2}{A_4} - \frac{A_1}{A_4} = 1 \qquad \text{and} \qquad 0 < A_3 < 2A_4.$$

The identity follows by integrating $c_t - kt = (c_t - kt)^+ - (kt - c_t)^+$ over $[0, T]$ and noting that $\int_0^T kt \, dt = A_4$.

All of the ratios $A_3/A_4$, $A_2/A_4$ and $A_1/A_4$ measure the location of the active data transfer inside a session. For example, the ratio $A_3/A_4$ is extremely simple to calculate and easy to interpret, see figure 12. If the (relatively) most active part is at the beginning of the session as in 12(c), the ratio $A_3/A_4$ is close to 2. If it is at the end of the session as in 12(a), $A_3/A_4$ is close to 0. If the user is active all of the time as in 12(b), the value of the ratio $A_3/A_4$ is close to 1, but there are also other possible shapes that may produce a value close to 1. However, these other shapes are not so likely.

Figure 12. The value of the ratio $A_3/A_4$ is close to 0 in (a), close to 1 in (b) and close to 2 in (c). It can be used to distinguish the extremal shapes of (a) and (c) in a unique manner.

The ratios $A_1/A_4$ and $A_2/A_4$ could be used to give further information about the shapes when $A_3/A_4 \approx 1$. The interpretation of the two scatter plots of figures 13 and 14 is that, since most typically $A_3/A_4 \geq 1$, the majority of the data transfer most typically occurs at the beginning of a session.
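The areas defined in this section can be computed exactly when the cumulative volume is treated as a right-continuous step function that jumps at each packet time. The sketch below makes that modelling assumption explicit; the function name and input layout are hypothetical, and the curve is assumed to start at elapsed time 0 and to have T > 0 and a positive final volume.

```python
def shape_areas(curve):
    """curve: list of (elapsed_time, cumulative_volume) in time order,
    treated as a right-continuous step function on [0, T].
    Returns (A1, A2, A3, A4) as defined in section 4.5."""
    T, c_T = curve[-1]
    k = c_T / T  # average constant rate giving the same final volume
    a1 = a2 = a3 = 0.0
    for (a, c), (b, _) in zip(curve, curve[1:]):
        # Integral of (c - k t) over [a, b], where c is constant on the step.
        signed = c * (b - a) - k * (b * b - a * a) / 2
        # c - k t is decreasing in t; it changes sign at t = c / k (clipped to [a, b]).
        cross = min(max(c / k, a), b)
        above = c * (cross - a) - k * (cross * cross - a * a) / 2  # integral of (c - k t)^+
        a1 += above
        a2 += above - signed  # integral of (k t - c)^+
        a3 += c * (b - a)
    return a1, a2, a3, T * c_T / 2
```

With all the volume at the first instant, e.g. `[(0.0, 100.0), (10.0, 100.0)]`, the ratio A3/A4 equals 2 under this step-function model, and with nearly all the volume in the last packet it is close to 0, matching the extremal shapes of figure 12.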

Figure 13. The scatter plot of the ratio $A_3/A_4$ against active session duration.

Figure 14. The scatter plot of the ratio $A_3/A_4$ against session volume.

5. CONCLUSIONS AND FURTHER WORK

This paper was descriptive and comparative in nature. We have pointed out mainly qualitative properties of observed GPRS sessions and similarities (or differences) with dial-up sessions. The main observations were that, during the measurement time, GPRS sessions started on working days and during working hours and were in general roughly similar to dial-up modem sessions, and that GPRS session durations and session volumes may have heavy-tailed distributions. The arrival structure may change in the future, but the heavy-tailed character of the volume and duration distributions is likely to become more apparent.

Moreover, we presented a descriptive tool that maps the cumulative volume against the elapsed time of the session. Based on this tool we studied the question of classifying sessions according to the temporal location of data bursts within the session's active duration. The result for GPRS sessions can be expressed by saying that the majority of the data transfer typically occurs at the beginning of the session and that users typically detach from the GPRS network when they have finished. This behaviour may change in the future if, for example, GPRS functionality becomes part of a typical laptop's configuration.

We are working on a more theoretical distributional study of measured session durations and session volumes. The results of that study will be reported elsewhere.

Acknowledgement. We are grateful to Radiolinja for allowing public research on their data, to Technology Manager Mika Sarén from Radiolinja for his support of our research and especially to System Manager Vesa Antervo from Radiolinja for his efforts in developing and performing the actual measurement. Thanks to Ilkka Norros and Petteri Mannersalo for careful reading of the various draft versions of this paper. Thanks also to the anonymous reviewers of ITC-18 for their comments.

REFERENCES

1. R. Chakravorty and I. Pratt. Performance Issues with General Packet Radio Service. Journal of Communications and Networks (JCN), 4(2):266-281, December 2002.
2. R. Kalden, I. Meirick, and M. Meyer. Wireless Internet Access Based on GPRS. IEEE Personal Communications, April 2000.
3. J. Kilpi and P. Mannersalo. Performance analysis of GPRS/GSM from the single user point of view. Technical report, VTT Information Technology, 2002. Available at: http://www.vtt.fi/tte/tte23/ilias/gprs.pdf
4. J. Kilpi and I. Norros. Call Level Traffic Analysis of a Large ISP. In Proceedings of 13th ITC Specialist Seminar, IP Traffic Measurement, Modeling and Management, 2000.
5. T. Kunz, T. Barry, J. Black, and H. Mahoney. WAP Traffic: Description and Comparison to WWW Traffic. In Proceedings of the Third ACM International Workshop on Modeling, Analysis and Simulation of Wireless and Mobile Systems, pages 11-19, August 2000.
6. C. Lindemann and A. Thümmler. Performance Analysis of the General Packet Radio Service. In Proceedings of the 21st International Conference on Distributed Computing Systems (ICDCS), pages 673-680, April 2001.
7. T. Nieminen and S.J. Halme. An analysis of WAP packet traffic measurements. In Sixteenth Nordic Teletraffic Seminar, NTS 16, pages 127-138, August 2002.
8. A.K. Salkintzis. A Survey of Mobile Data Networks. IEEE Communication Surveys, 2(3), 1999.
9. N. Vicari and S. Köhler. Measuring Internet User Traffic Behavior Dependent on Access Speed. In Proceedings of 13th ITC Specialist Seminar, IP Traffic Measurement, Modeling and Management, 2000.