Design of a sliding window scheme for detecting high packet-rate flows via random packet sampling


Computer Networks 55 (2011) 1351–1363. Journal homepage: www.elsevier.com/locate/comnet


Takanori Kudo ⇑, Tetsuya Takine

Department of Information and Communications Technology, Graduate School of Engineering, Osaka University, 2-1 Yamadaoka, Suita 565-0871, Japan

Article info

Article history: Received 19 May 2010. Received in revised form 31 October 2010. Accepted 15 December 2010. Available online 21 December 2010. Responsible Editor: A. Popescu.

Keywords: Random packet sampling; Sliding window scheme; High packet-rate flows

Abstract

We discuss the design of a sliding window scheme for detecting high packet-rate flows via random packet sampling. We determine the values of control parameters, such as the sampling rate and window length, to minimize the false positive ratio, while keeping the false negative ratio sufficiently low and making on-line processing possible. Under mild assumptions, we formulate this problem as a nonlinear program and provide its numerically feasible global optimal solution. We then conduct sampling experiments with public trace data and discuss the fundamental characteristics of the sliding window scheme with random packet sampling.

© 2010 Elsevier B.V. All rights reserved.

⇑ Corresponding author. Tel.: +81 6 6879 7742; fax: +81 6 6875 5901. E-mail addresses: [email protected] (T. Kudo), [email protected] (T. Takine). doi:10.1016/j.comnet.2010.12.019

1. Introduction

In recent years, accurate traffic measurement and monitoring have come to be regarded as crucial for network management, traffic engineering, and security tasks. With the rapid growth of link speed, however, it is very hard to capture all packets in backbone traffic. To reduce overhead in traffic measurement (e.g., router CPU, memory, I/O, and bandwidth in exporting data/reports) and to keep traffic monitoring scalable, packet sampling is considered to be a promising technique [1] and it is often deployed at routers [2].

In this paper, we consider the on-line detection of high packet-rate flows via random packet sampling. A flow is defined as a set of packets with the same key attributes that do not alter during their transmission through the network. Typical key attributes include source/destination IP address, source/destination port number, and transport layer protocol. Although this 5-tuple is commonly used to define flows, flows can be defined arbitrarily and, as we will see, our discussion is applicable to any definition of flows.

It is well known that a small number of flows account for a large portion of the total traffic volume, and typically their packet rates are very high [3]. Such flows are also called elephant flows or heavy hitters. Because high packet-rate flows have a great impact on network performance, it is important to detect and identify them promptly in network management and traffic engineering [4]. Note that high packet-rate flows also appear due to denial of service (DoS) attacks such as SYN flooding [5,6] and smurf attacks [5].

Usually, flow-level measured data is collected at routers and periodically sent to the collection point, where network operators process those data. This scheme, however, limits how quickly flows can be detected and may waste bandwidth. We implicitly assume that each router can sample packets, update and process the sliding window, and send only the result of the data analysis to the collection point. Such a function is particularly useful for tier-1 ISPs because the number of measurement points could be a few thousand or more.

So far, much research effort has been devoted to inferring flow characteristics [3,7,8], to estimating ranking [9] or top-k queries [10] from sampled data, and to detecting high packet-rate/elephant flows [4,11,12] and traffic anomalies [13,14], to name but a few. To the best of our knowledge, however, most of them discuss statistical inference methodologies for a given set of packets sampled in a certain interval of time. Note that if sufficient computational resources are available, off-line data analytic algorithms can be combined directly with a jumping window scheme, where a set of old data is replaced periodically by a disjoint set of new data. For example, Estan and Varghese propose a jumping-window based on-line algorithm, called sample and hold, for identifying large flows in terms of byte volume [11]. The algorithm processes the headers of all packets, and if the flow ID of a packet does not exist in the memory, a new entry is created with probability proportional to the packet size.

In continuous monitoring of traffic, however, the set of sampled packets should be updated gradually. Lu et al. [12] propose ElephantTrap, which can detect elephant flows with only a small cache. The idea behind the algorithm is similar to Frequent [15,16], a well-known streaming algorithm for finding frequent items. Note that ElephantTrap does not attempt to estimate the size of detected flows, even though it is a light-weight algorithm with random packet sampling.

To update the set of sampled packets, we employ a sliding window scheme, where the time window (called the sliding window) of a fixed length $T_{SW}$ is divided into $K$ unit times called basic windows, and the sliding window is updated every unit time (i.e., every basic window). Note that the jumping window scheme is a special case of the sliding window scheme with $K = 1$. The sliding window scheme is very simple and therefore it seems to be a natural scheme for applying off-line data analytic algorithms to continuous monitoring of traffic. In most cases, we have to maintain data only in one sliding window, and data in old basic windows can be discarded.
Note here that data collected in each sliding window should be processed completely within a unit time on average. This requirement adds a new constraint when off-line algorithms are implemented as on-line algorithms, and the constraint makes it more difficult to design monitoring systems.

In this paper, we consider a scenario in which a network operator continuously monitors packets transmitted through a high-speed link and attempts to detect high packet-rate flows in a timely manner within an allowable false negative ratio, where high packet-rate flows are defined as flows whose packet rates are equal to or greater than a predefined threshold $R$ [packet/s]. From the viewpoint of statistical inference, detecting high packet-rate flows might be one of the simplest problems, yet it is difficult to avoid the high false positive ratio inherent in random packet sampling with a low sampling rate [9].

In the sliding window scheme with random packet sampling, we have to determine the length $T_{SW}$ of sliding windows, the number $K$ of basic windows in a sliding window, and the sampling rate $f$ of packets. These control parameters affect the performance of the scheme. Roughly speaking, large values of $T_{SW}$ and/or $f$ yield many sampled packets, so that the accuracy of statistical inference improves, whereas the processing overhead becomes large. On the other hand, a small value of $K$ (i.e., a large basic window) yields sufficient time for data processing, whereas the time granularity in monitoring becomes coarse. We formulate this design problem as a nonlinear program with a numerically feasible global optimal solution.

Many works have addressed efficient data processing of sampled packets [17,18] and their implementation environments [19,20]. In our formulation, however, we do not specify the data processing scheme and its implementation environment explicitly. Essentially, the larger the number of sampled packets/flows is, the longer it takes to process them. We apply this principle to our formulation. More specifically, we introduce a function $G(x)$, which represents the post-processing time of $x$ sampled packets in a basic window, and through the function $G(x)$ the employed data processing scheme and its implementation environment are taken into account. As a result, our formulation yields a generic design problem of the sliding window scheme with random packet sampling.

The rest of the paper is organized as follows. Section 2 describes our scheme and its design problem. In Section 3, we formulate the design problem as a nonlinear program and derive its global optimal solution. Section 4 provides some experimental results using public trace data. Finally, concluding remarks are provided in Section 5.

2. Our scheme and problem statement

In this section, we first discuss random packet sampling and our sliding window scheme briefly, and then we state the design problem considered in this paper.

2.1. Random packet-sampling

In random packet sampling, packets are sampled independently with probability $f$ ($0 < f \le 1$). Let $X$ and $Y$ denote random variables representing the number of packets and the number of sampled packets, respectively, in a randomly chosen flow observed in an interval of length $T_{SW}$ [s]. We then have, for $y = 0, 1, \ldots, x$,

$$\Pr[Y = y \mid X = x] = \binom{x}{y} f^y (1-f)^{x-y}.$$

Recall that we attempt to detect flows whose packet rates are not less than $R$ [packet/s], i.e., $X \ge RT_{SW}$. Let $x^* = \lceil RT_{SW} \rceil$. Hereafter, we call flows with $x^*$ or more packets target flows, and among them we call flows with exactly $x^*$ packets threshold flows. In order to control the false negative ratio in detecting target flows, we introduce the allowable false negative ratio $\epsilon$ ($0 < \epsilon < 1$) for threshold flows. We regard a flow as a target one if the number of sampled packets is not less than $y^*$, where $y^* = y^*(R, T_{SW}, f, \epsilon)$ is given by

$$y^* = \max_y \left\{ y;\ \Pr[Y \le y-1 \mid X = x^*] \le \epsilon \right\} = \max_y \left\{ y;\ \sum_{i=0}^{y-1} \binom{x^*}{i} f^i (1-f)^{x^*-i} \le \epsilon \right\}. \tag{1}$$
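As an illustration of Eq. (1), the following Python sketch computes the detection threshold $y^*$ by accumulating the binomial distribution term by term. The function names (`binom_cdf`, `detection_threshold`) and the sample values of $f$ are our own choices for illustration, and the sketch assumes $0 < f < 1$.

```python
from math import ceil


def binom_cdf(k, n, p):
    """Pr[Y <= k] for Y ~ Binomial(n, p), assuming 0 < p < 1 and k >= 0."""
    term = (1.0 - p) ** n              # Pr[Y = 0]
    total = term
    for i in range(k):
        # Pr[Y = i + 1] obtained from Pr[Y = i] by the usual recurrence
        term *= (n - i) / (i + 1) * p / (1.0 - p)
        total += term
    return total


def detection_threshold(R, T_sw, f, eps):
    """y* of Eq. (1): the largest y with Pr[Y <= y - 1 | X = x*] <= eps."""
    x_star = ceil(R * T_sw)            # packets in a threshold flow
    y = 0                              # y = 0 always qualifies (empty sum)
    while y < x_star and binom_cdf(y, x_star, f) <= eps:
        y += 1                         # y + 1 still satisfies the constraint
    return y


# Illustrative values only: R = 4000 [packet/s], eps = 0.01, T_SW = 10 [s].
for f in (5e-4, 1e-3, 2e-3):
    print(f, detection_threshold(4000, 10, f, 0.01))
```

Because the constraint is checked for one integer $y$ at a time, the returned value changes in unit steps as $f$ grows.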


Note that most target flows consist of more than $x^*$ packets. Thus the overall false negative ratio in the above random packet sampling would be much smaller than $\epsilon$. Note also that once $y^*$ is fixed, the false positive ratio for a flow with $z$ ($z < x^*$) packets is given by

$$1 - \sum_{i=0}^{\min(y^*-1,\,z)} \binom{z}{i} f^i (1-f)^{z-i}.$$

Fig. 1 shows $y^*$ as a function of $f$, where $R = 4000$ [packet/s] and $\epsilon = 0.01$. Because $y^*$ takes a natural number, it is a step function of $f$ when $T_{SW}$ is fixed.

Fig. 1. $y^*$ as a function of $f$ ($R = 4000$, $\epsilon = 0.01$). Curves are plotted for $T_{SW} = 5$, 10, and 20, with $f$ ranging from 0 to 0.002.

2.2. Sliding window scheme

In order to detect target flows from sampled packets, we have to collect, analyze, and update data of sampled packets repeatedly. For this purpose, we utilize a sliding window scheme whose window length is denoted by $T_{SW}$ [s]. A sliding window consists of $K$ successive basic windows of fixed length $T_{BW} = T_{SW}/K$ [s]. When data processing in the current sliding window is completed, the sliding window is updated by discarding data in the oldest basic window and adding data collected in the new basic window. Therefore successive sliding windows overlap each other when $K \ge 2$, and each packet appears in $K$ successive sliding windows.
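The update rule of the sliding window can be sketched in Python as follows. This is only a minimal illustration under our own assumptions: the class and method names are hypothetical, per-flow counters are kept in memory, and the threshold `y_star` is assumed to have been computed beforehand from Eq. (1).

```python
import random
from collections import Counter, deque


class SlidingWindowDetector:
    """Keeps per-flow sample counts over the K most recent basic windows."""

    def __init__(self, K, f, y_star, seed=None):
        self.f = f                        # sampling rate
        self.y_star = y_star              # detection threshold (sampled packets)
        self.windows = deque(maxlen=K)    # one Counter per stored basic window
        self.current = Counter()          # basic window being collected
        self.totals = Counter()           # counts over the whole sliding window
        self.rng = random.Random(seed)

    def observe(self, flow_id):
        """Sample an arriving packet independently with probability f."""
        if self.rng.random() < self.f:
            self.current[flow_id] += 1

    def close_basic_window(self):
        """Slide the window by one basic window and return detected flows."""
        if len(self.windows) == self.windows.maxlen:
            self.totals.subtract(self.windows[0])   # discard the oldest data
        self.windows.append(self.current)           # deque drops the oldest entry
        self.totals.update(self.current)
        self.current = Counter()
        self.totals = +self.totals                  # remove zero counts
        return {flow for flow, c in self.totals.items() if c >= self.y_star}
```

A caller would invoke `observe` for every packet and `close_basic_window` once every $T_{BW}$ seconds; only the returned set of flow IDs needs to be exported.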

2.3. Problem statement

The scheme considered in this paper combines the sliding window scheme with random packet sampling. It is characterized by three control parameters (the length $T_{SW}$ of sliding windows, the number $K$ of basic windows in a sliding window, and the sampling rate $f$ of packets) as well as the predefined parameters $R$ and $\epsilon$. To determine those control parameters, we consider a special case of threshold flows, called a reference flow, which is defined as follows. The first packet in the reference flow arrives at an arbitrary time $t_R$, and inter-arrival times of subsequent packets are fixed to $1/R$ [s]. With this reference flow, we aim to determine the control parameters $T_{SW}$, $K$, and $f$ by solving the following optimization problem:

$$\begin{aligned} \min \quad & \text{the false positive ratio} \\ \text{s.t.} \quad & \text{the false negative ratio for the reference flow} \le \epsilon, \\ & \text{the reference flow is detected by time } t_R + T_{D\_max}, \text{ and} \\ & \text{the scheme works as an on-line scheme,} \end{aligned}$$

where $T_{D\_max}$ denotes the maximum allowable detection delay, which is also a predefined parameter, and the constraints on the false negative ratio and the detection delay are given in terms of the reference flow. Note here that once the sampling rate $f$ and the length $T_{SW}$ of sliding windows are determined, the first constraint on the false negative ratio can be fulfilled by setting $y^*$ according to (1). Thus, in the next section, we exclude this constraint and formulate the resulting problem as a nonlinear program.

Venkataraman et al. [21] propose streaming algorithms with random packet sampling for detecting superspreaders. Their goal is to find source IPs that send packets to more than $k$ different destination IPs. They set parameters under constraints on the false negative ratio and false positive ratio, while we set parameters under constraints on the false negative ratio and processing times, and our targets are high packet-rate flows that can be defined arbitrarily.

3. Nonlinear programming formulation

3.1. Constraints

We consider the last two constraints in our optimization problem stated in Section 2.3. The length $T_D$ of the detection delay of the reference flow is bounded above:

$$T_D \le T_{SW} + T_{SW}/K + \tau \le T_{D\_max}, \tag{2}$$

where $\tau$ denotes the maximum post-processing time of data in a sliding window. The second term $T_{SW}/K$ on the right hand side of (2) represents the length of a basic window, which is required for the following reason. The first packet in the reference flow may appear in the middle of a basic window, say $BW_i$, and the packet rates in sliding windows that contain the basic window $BW_i$ would be less than $R$. Thus the detection of the reference flow can be delayed by one basic window. On the other hand, in order to ensure on-line processing, we have to finish post-processing of data in the current sliding window before all data in the new basic window is collected. Therefore we have

$$\tau \le T_{SW}/K. \tag{3}$$

We now evaluate the maximum post-processing time $\tau$. When the sliding window is updated, data in the new basic window is added. Therefore we assume that the post-processing time of data in a sliding window is bounded above by a positive, strictly increasing function $G(\cdot)$ of the number of packets sampled in a basic window. Note that when $n$ packets are transmitted in a basic window of length $T_{SW}/K$, $f \cdot n$ packets are sampled on average. Let $C_{\max}$ denote the maximum packet rate. We then evaluate $\tau$ as follows:

$$\tau = G\!\left(\frac{f C_{\max} T_{SW}}{K}\right), \tag{4}$$

where $C_{\max} T_{SW}/K$ represents the maximum number of packets transmitted in a basic window. Because the number of flows is not greater than the number of packets, $f C_{\max} T_{SW}/K$ is also regarded as an upper bound of the mean number of sampled flows in a basic window.

3.2. Objective function

Next we consider the objective function of our optimization problem. Let $P_{WD}(r)$ denote the false positive ratio of a flow whose packet rate is equal to $r$ [packet/s]:

$$P_{WD}(r) = \Pr[Y \ge y^* \mid X = rT_{SW}],$$

where $r$ ($r < R$) is set in such a way that $rT_{SW}$ is an integer. Because $P_{WD}(r) = 0$ for $r$ such that $X = rT_{SW} < y^*$, we assume $rT_{SW} \ge y^*$. We now transform the objective function $P_{WD}(r)$ to an analytically tractable one. Note first that

$$P_{WD}(r) = 1 - \sum_{y=0}^{y^*-1} \binom{rT_{SW}}{y} f^y (1-f)^{rT_{SW}-y}.$$

We thus apply the Poisson approximation of the binomial distribution (e.g., Section VI.5 of [22]): for a sufficiently large $x \gg 1$ and a sufficiently small $f \ll 1$, a binomial distribution can be approximated well by a Poisson distribution with the same mean,

$$\binom{x}{y} f^y (1-f)^{x-y} \approx e^{-fx} \frac{(fx)^y}{y!}.$$

We then have

$$P_{WD}(r) \approx 1 - \sum_{y=0}^{y^*-1} e^{-rfT_{SW}} \frac{(rfT_{SW})^y}{y!}. \tag{5}$$

For mice flows (i.e., flows with $r \ll R$), the Poisson approximation gives an upper bound of $P_{WD}(r)$.

Theorem 1. Suppose $rT_{SW}$ is an integer. If $\lceil rfT_{SW} \rceil \le (y^*-1)/2$ (i.e., $y^* \ge 2\lceil rfT_{SW} \rceil + 1$), we have

$$P_{WD}(r) < 1 - \sum_{y=0}^{y^*-1} e^{-rfT_{SW}} \frac{(rfT_{SW})^y}{y!}. \tag{6}$$

The proof of Theorem 1 is given in Appendix A. Note here that $rfT_{SW} > 0$ represents the mean number of sampled packets of a flow with packet rate $r$ in a sliding window. Thus the condition of Theorem 1 implicitly assumes that $y^* \ge 3$ because $\lceil rfT_{SW} \rceil \ge 1$. It also suggests that a large $y^*$ is preferable because the range of $r$ in which (6) holds widens as $y^*$ increases. This conjecture will be discussed with experimental results in Section 4.3.

Eq. (5) suggests that for arbitrarily fixed $y^*$ and $r$, the false positive ratio of non-target flows is a strictly decreasing function of $fT_{SW}$. Because the expected amount of information we can obtain increases with the mean number $r \cdot fT_{SW}$ of sampled packets, this feature coincides with the intuition that the more samples we obtain, the more the accuracy improves. Thus we claim that maximizing $fT_{SW}$ is approximately equivalent to minimizing the false positive ratio of non-target flows with any packet rate $r$ ($r < R$).

To validate this claim, we provide some numerical results. Fig. 2 shows the false positive ratio of flows with packet rate $r = 2000$ [packet/s] as a function of $f$, where $R = 4000$ and $\epsilon = 0.01$. We observe that $P_{WD}(r)$ is not a strictly decreasing function. Note first that $y^*$ is a natural number. Further, when $T_{SW}$ is fixed, the number of sampled packets increases with $f$. Thus the false positive ratio increases within each range of $f$ where $y^*$ remains constant.

Fig. 2. False positive ratio $P_{WD}(2000)$ as a function of $f$ ($R = 4000$, $r = 2000$, $\epsilon = 0.01$). Curves are plotted for $T_{SW} = 5$, 10, and 20.

In the same setting as in Fig. 2, Fig. 3 shows the false positive ratio of flows with packet rate $r = 2000$ [packet/s] as a function of $fT_{SW}$. It is interesting to observe that regardless of the value of $T_{SW}$, the false positive ratio has a very similar characteristic and it is bounded above by a strictly decreasing function of $fT_{SW}$. We observed the same phenomenon for $r = 1000$ and 3000, too. Thus we conclude that the product $fT_{SW}$ is an essential quantity to control the false positive ratio of non-target flows.

Fig. 3. False positive ratio $P_{WD}(2000)$ as a function of $fT_{SW}$ ($R = 4000$, $r = 2000$, $\epsilon = 0.01$). Curves are plotted for $T_{SW} = 5$, 10, and 20.

3.3. Global optimal solution

So far, we have shown that the design of our scheme can be formulated as the following nonlinear program.


$$\begin{aligned} \textbf{P:} \quad \max \quad & fT_{SW} \\ \text{s.t.} \quad & T_{SW} > 0,\ f > 0,\ K \text{ is a natural number}, \\ & \frac{K+1}{K}\, T_{SW} + G\!\left(\frac{fC_{\max}T_{SW}}{K}\right) \le T_{D\_max}, \\ & G\!\left(\frac{fC_{\max}T_{SW}}{K}\right) \le \frac{T_{SW}}{K}, \end{aligned}$$

where the last two constraints come from (2)–(4). Recall that $G(x)$ ($x \ge 0$) is a positive, strictly increasing function of $x$. To make things tractable, we assume that $G(x)$ is differentiable with $G'(x) > 0$ ($x \ge 0$); the inverse function $G^{-1}(x)$ of $G(x)$ then exists.

Suppose $(f, T_{SW})$ is a feasible solution of problem P. It follows from the last constraint that

$$\frac{T_{SW}}{K} \ge G\!\left(\frac{fC_{\max}T_{SW}}{K}\right) > G(0),$$

and therefore from the second last constraint

$$T_{D\_max} \ge (K+1)\, G\!\left(\frac{fC_{\max}T_{SW}}{K}\right) + G\!\left(\frac{fC_{\max}T_{SW}}{K}\right) > (K+2)\, G(0),$$

which is a necessary condition for problem P to be feasible. Note that $G(0)$ is considered as overhead in processing data in a sliding window, and the maximum allowable detection delay should be set in such a way that

$$T_{D\_max} > 3G(0), \tag{7}$$

because $K$ is a natural number. In what follows, we assume $T_{D\_max} > 3G(0)$ and define a non-empty finite set $\mathcal{K}$ of natural numbers as the set of $K = 1, 2, \ldots$ satisfying the necessary condition above:

$$\mathcal{K} = \{ K \in \mathbb{N} : (K+2)\, G(0) < T_{D\_max} \}.$$

Note that problem P is feasible if $K \in \mathcal{K}$.

Theorem 2. Given $K \in \mathcal{K}$, there exists the unique global optimal solution $(f^*, T^*_{SW})$ of problem P:

$$f^* = \frac{K+2}{C_{\max} T_{D\_max}}\, G^{-1}\!\left(\frac{T_{D\_max}}{K+2}\right), \qquad T^*_{SW} = \frac{K}{K+2}\, T_{D\_max}.$$

The proof of Theorem 2 is given in Appendix B. In the proof, the last two constraints in problem P are shown to be active for a given $K \in \mathcal{K}$, and therefore these constraints hold with equality when $f = f^*$ and $T_{SW} = T^*_{SW}$. The optimal value $f^* T^*_{SW}$ of the objective function in problem P is proportional to $K G^{-1}(T_{D\_max}/(K+2))$. As a result, the optimal $K = K^*$ is given by

$$K^* = \arg\max_{K \in \mathcal{K}} K\, G^{-1}\!\left(\frac{T_{D\_max}}{K+2}\right), \tag{8}$$

and the optimal $f = f^*$ and $T_{SW} = T^*_{SW}$ of problem P are given by

$$f^* = \frac{K^*+2}{C_{\max} T_{D\_max}}\, G^{-1}\!\left(\frac{T_{D\_max}}{K^*+2}\right), \tag{9}$$

$$T^*_{SW} = \frac{K^*}{K^*+2}\, T_{D\_max}, \tag{10}$$

respectively.

As shown in Section 3.2, the false positive ratio is an increasing function of $fT_{SW}$ in the range of $f$ where $y^*$ remains constant. Taking this into account, we set the three control parameters $K$, $T_{SW}$, and $f$ according to the following procedure.

1. Set the values of $K = K^*$ and $T_{SW} = T^*_{SW}$ by (8) and (10), respectively,
2. set the temporary value of $f = f^*$ by (9),
3. obtain $y^*$ by (1), and
4. set the final value of $f = f^{**}$, where

$$f^{**} = \arg\min_f \left\{ \sum_{i=0}^{y^*-1} \binom{\lceil RT^*_{SW} \rceil}{i} f^i (1-f)^{\lceil RT^*_{SW} \rceil - i} \le \epsilon \right\},$$

i.e., $f^{**}$ is the smallest sampling rate for which the threshold $y^*$ obtained in step 3 still satisfies the false negative constraint.

4. Experiments with trace data

In this section, we provide experimental results for public trace data and discuss some fundamental characteristics of the sliding window scheme. We use CAIDA trace data [23], which was measured in a backbone link of 10 Gbps from 6:00 to 6:05 on March 31, 2009. We regard a collection of packets with the same 5-tuple as a flow. The CAIDA trace data of 300 s measurement contains 152,593,821 packets and 13,603,014 flows.

4.1. Assumptions in experiments

To conduct our experiments, we assume that the post-processing time of data in a sliding window is a linear function of the number of sampled flows in a basic window (which is bounded above by the number of sampled packets in the basic window), i.e.,

$$G(x) = D_1 x + D_2,$$

where $D_1$ [s] denotes the post-processing time of an individual sampled flow and $D_2$ [s] denotes the post-processing time of data in a sliding window that is independent of the number of sampled flows. Note that $D_2$ includes the time needed for freeing up memory space of the oldest basic window and summarizing/exporting the detection result in the current sliding window to the collection point. We also assume $T_{D\_max} > 3D_2$, which comes from the feasibility condition (7). The optimal $K^*$ is then given by

$$K^* = \begin{cases} \displaystyle \arg\max_{K \in \{K^-, K^+\}} \left\{ \frac{K}{K+2}\, T_{D\_max} - K D_2 \right\}, & T_{D\_max} > 9D_2/2, \\ 1, & 3D_2 < T_{D\_max} \le 9D_2/2, \end{cases}$$

where

$$K^- = \left\lfloor \sqrt{\frac{2T_{D\_max}}{D_2}} \right\rfloor - 2, \qquad K^+ = \left\lceil \sqrt{\frac{2T_{D\_max}}{D_2}} \right\rceil - 2.$$

Further, the optimal $T^*_{SW}$ is given in (10) and the optimal $f^*$ in (9) becomes

$$f^* = \frac{T_{D\_max} - (K^*+2) D_2}{C_{\max} D_1 T_{D\_max}}.$$
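For the linear post-processing model, the closed-form expressions for $K^*$, $T^*_{SW}$, and $f^*$ can be evaluated directly. The Python sketch below is our own illustration (the function name `optimal_design` is hypothetical); it takes $T_{D\_max}$, $C_{\max}$, $D_1$, and $D_2$ as inputs and follows the case distinction given above.

```python
from math import ceil, floor, sqrt


def optimal_design(T_dmax, C_max, D1, D2):
    """Return (K*, T*_SW, f*) for the linear model G(x) = D1 * x + D2."""
    if T_dmax <= 3 * D2:
        raise ValueError("infeasible: condition (7) requires T_D_max > 3 * D2")

    def objective(K):
        # Proportional to the optimal value f* T*_SW for a given K
        return K * T_dmax / (K + 2) - K * D2

    if T_dmax > 4.5 * D2:
        root = sqrt(2 * T_dmax / D2)
        candidates = sorted({floor(root) - 2, ceil(root) - 2})  # K^-, K^+
        K = max(candidates, key=objective)
    else:
        K = 1

    T_sw = K * T_dmax / (K + 2)                              # Eq. (10)
    f = (T_dmax - (K + 2) * D2) / (C_max * D1 * T_dmax)      # Eq. (9), linear G
    return K, T_sw, f


# Parameter values used in the experiments of Section 4.1.
print(optimal_design(10.0, 2e6, 5e-4, 5e-3))
print(optimal_design(20.0, 2e6, 5e-4, 5e-3))
```

Only the two integers adjacent to the stationary point $\sqrt{2T_{D\_max}/D_2} - 2$ need to be compared, which is why the search over $\mathcal{K}$ reduces to two candidates.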


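Table 1. Control parameters and threshold values ($T_{D\_max} = 10$). For all rows, $K^* = 61$ and $T^*_{SW} = 9.6825$.

| R | f** (×10⁻⁴), ε = 0.01 | y*, ε = 0.01 | f** (×10⁻⁴), ε = 0.05 | y*, ε = 0.05 | f** (×10⁻⁴), ε = 0.1 | y*, ε = 0.1 |
|---|---|---|---|---|---|---|
| 1000 | 8.679 | 3 | 9.452 | 5 | 9.577 | 6 |
| 2000 | 8.985 | 9 | 9.401 | 12 | 9.181 | 13 |
| 4000 | 9.511 | 24 | 9.613 | 28 | 9.604 | 30 |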

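Table 2. Control parameters and threshold values ($T_{D\_max} = 20$). For all rows, $K^* = 87$ and $T^*_{SW} = 19.5506$.

| R | f** (×10⁻⁴), ε = 0.01 | y*, ε = 0.01 | f** (×10⁻⁴), ε = 0.05 | y*, ε = 0.05 | f** (×10⁻⁴), ε = 0.1 | y*, ε = 0.1 |
|---|---|---|---|---|---|---|
| 1000 | 9.605 | 10 | 9.312 | 12 | 9.696 | 14 |
| 2000 | 9.737 | 25 | 9.522 | 28 | 9.513 | 30 |
| 4000 | 9.720 | 57 | 9.653 | 62 | 9.657 | 65 |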

In our experiments, we set $D_1 = 5 \times 10^{-4}$ [s], $D_2 = 5 \times 10^{-3}$ [s], and $C_{\max} = 2 \times 10^{6}$ [packet/s], and consider two cases: $T_{D\_max} = 10$ and 20 [s]. Further, for each case, we consider three threshold values of high packet-rate flows, $R = 1000$, 2000, and 4000, and three allowable false negative ratios, $\epsilon = 0.01$, 0.05, and 0.1, for the reference flow with packet rate $R$. Tables 1 and 2 show the values of control parameters computed according to the procedure at the end of Section 3.
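Steps 3 and 4 of the procedure can be sketched in Python as follows. The helper names and the use of bisection to locate $f^{**}$ are our own assumptions, not necessarily how the table entries were computed. For $T_{D\_max} = 10$, $R = 1000$, and $\epsilon = 0.01$, the result should be close to the first row of Table 1.

```python
from math import ceil


def binom_cdf(k, n, p):
    """Pr[Y <= k] for Y ~ Binomial(n, p), with 0 < p < 1."""
    if k < 0:
        return 0.0
    term = (1.0 - p) ** n                 # Pr[Y = 0]
    total = term
    for i in range(k):
        term *= (n - i) / (i + 1) * p / (1.0 - p)
        total += term
    return total


def threshold(x_star, f, eps):
    """Step 3: y* of Eq. (1) for a threshold flow with x_star packets."""
    y = 0
    while y < x_star and binom_cdf(y, x_star, f) <= eps:
        y += 1
    return y


def smallest_rate(x_star, y, f_upper, eps, iters=60):
    """Step 4: smallest f in (0, f_upper] with Pr[Y <= y - 1] <= eps.

    Bisection works because the binomial CDF decreases as f increases.
    Assumes y >= 1 and that f_upper itself satisfies the constraint.
    """
    lo, hi = 0.0, f_upper
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if binom_cdf(y - 1, x_star, mid) <= eps:
            hi = mid
        else:
            lo = mid
    return hi


# Setting of Table 1: T_D_max = 10 [s], K* = 61, R = 1000, eps = 0.01.
T_dmax, K, R, eps = 10.0, 61, 1000, 0.01
T_sw = K * T_dmax / (K + 2)                                   # Eq. (10)
f_star = (T_dmax - (K + 2) * 5e-3) / (2e6 * 5e-4 * T_dmax)    # Eq. (9)
x_star = ceil(R * T_sw)
y = threshold(x_star, f_star, eps)
f_final = smallest_rate(x_star, y, f_star, eps)
print(y, f_final)
```

Lowering $f$ from $f^*$ to $f^{**}$ keeps $y^*$ unchanged while reducing the number of sampled packets, which is the direction in which the false positive ratio decreases within a step of $y^*$.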

4.2. Performance measures

We evaluate the performance of our scheme by comparing it with the performance of the ideal scheme with $f = 1.0$ (i.e., sampling all packets), where $K$ and $T_{SW}$ are identical in the two schemes. Let $t_1$ and $t_2$ denote the detection times of a target flow in the ideal scheme and in our scheme, respectively. For simplicity, we regard the end of the sliding window that detects the target flow for the first time as its detection time. In other words, the post-processing times of data in a sliding window are assumed to be identical in both schemes. Based on the detection result, we classify target flows into four classes: (i) $t_1 > t_2$, (ii) $t_1 = t_2$, (iii) $t_1 < t_2 < \infty$, and (iv) $t_2 = \infty$.

Note that both the false positive ratio and the false negative ratio in the ideal scheme are equal to zero because $f = 1.0$. On the other hand, they would be positive in our scheme. Therefore some target flows may be classified into class (i); our scheme may detect target flows before their packet rates reach the threshold $R$. As far as target flows are concerned, however, flows both in class (i) and in class (ii) are considered to be those detected properly within $T_{D\_max}$. Class (iii) implies that our scheme detects those target flows but their detection times may be beyond the maximum allowable detection delay $T_{D\_max}$.

Note that class (iii) can include target flows detected within $T_{D\_max}$. For example, consider a target flow with a constant packet rate of $2R$. After the generation of the first packet, say, at time $t_0$, the packet rate of this flow (measured over an interval of length $T_{SW}$) increases linearly with the update of sliding windows, and it reaches the threshold $R$ at time $t_1$, where

$$t_1 \le t_0 + \left(1 + \left\lceil \frac{K}{2} \right\rceil\right) \frac{T_{SW}}{K}.$$

Suppose this flow is not detected by time $t_1$ but it is detected in one of the $\lfloor K/2 \rfloor$ subsequent sliding windows, where $K \ge 2$. In such a case, it is considered to be a flow detected within $T_{D\_max}$ because, from (2),

$$\left(1 + \left\lceil \frac{K}{2} \right\rceil\right) \frac{T_{SW}}{K} + \left\lfloor \frac{K}{2} \right\rfloor \frac{T_{SW}}{K} = \left(1 + \frac{1}{K}\right) T_{SW} \le T_{D\_max}.$$

Finally, class (iv) implies a false negative, i.e., we fail to detect those flows. On the other hand, as for non-target flows, we adopt the false positive ratio (FPR) to evaluate the performance of our scheme:

$$\mathrm{FPR} = \frac{\text{the number of detected non-target flows}}{\text{the total number of non-target flows}}. \tag{11}$$

We will also discuss the packet-rate of wrongly detected non-target flows. 4.3. Experimental results We first show the fundamental characteristics of CAIDA trace data [23] used in our experiments. Figs. 4 and 5 show the ranking of the maximum packet rates of flows when TSW = 9.6825 [s], K = 61, and f = 1 (cf. Table 1) and when TSW = 19.5506 [s], K = 87, and f = 1 (cf. Table 2), respectively. Note that both horizontal and vertical axes of Figs. 4 and 5 are in logarithmic scale. Those figures clearly indicate the elephant/mice phenomenon (i.e., the power law) in the packet rate distribution. Tables 3 and 4 show the sampling experiment results for TD_max = 10 and 20, respectively, where N denotes the total number of target flows. In these tables, the averages of 1000 independent sampling experiments and 95% confidence intervals are shown. Regardless of the values of the maximum allowable detection time TD_max and the

1357

T. Kudo, T. Takine / Computer Networks 55 (2011) 1351–1363

allowable false negative ratio , our scheme detects most target flows before the ideal scheme does. Note that this is a typical phenomenon in detecting high packet-rate flows by means of the sliding window scheme 105

4000 2000 1000

maximum packet rate

104 103 2

10

1

10

0

10

10-1 0

10

1

10

2

10

3

10

4

10 rank

10

5

6

10

10

7

10

8

Fig. 4. Ranking of maximum packet rates of flows (TD_max = 10).

105

4000 2000 1000

4

maximum packet rate

10

103 102 1

10

100 10-1 10

0

10

1

10

2

10

3

4

10 rank

10

5

10

6

10

7

8

10

Fig. 5. Ranking of maximum packet rates of flows (TD_max = 20). Table 3 The number of detected target flows and its ratio ðT D R

max

with random packet sampling (cf. [21]). For the time being after the generation of the first packet in a target flow, its packet rate would not reach the threshold R. As time goes by, however, the sliding window is updated successively, and the packet rate of the target flow increases and eventually exceeds the threshold at some time t1. Note that the ideal scheme with f = 1.0 detects the target flow at time t1 with probability one. Fig. 6 shows a typical example of such a transient behavior of the packet rate of a target flow, where the unit time is taken as the length TBW of basic windows and the horizontal axis is labeled in such a way that the packet rate exceeds the threshold R = 1000 at time t1 = 61TBW (=TSW) for the first time. From this figure, we observe that in some sliding windows, the packet rate of the target flow is below yet close to the threshold R. As a result, our scheme detects many target flows before the ideal scheme does. On the other hand, the numbers of target flows in classes (iii) and (iv) are quite few. Recall that control parameters are set in such a way that the reference flow with packet rate R is detected with probability 1  . Because most target flows have higher packet rates than R (see Figs. 4 and 5), they are detected with probability larger than 1  . Tables 5 and 6 show the number NWD of wrongly detected non-target flows and FPR in (11) for TD_max = 10 and TD_max = 20, respectively, where  = 0.01, 0.05, and 0.1. From Table 5, we observe that there are so many wrongly detected flows, whose number NWD increases with the decrease of , and this tendency is emphasized when R is small. Fig. 7 shows the cumulative distributions of packet rates of non-target flows when they are detected wrongly, where TD_max = 10 and  = 0.01. We observe that fairly low packet-rate flows are detected wrongly. The wrong detection of non-target flows comes from two factors. One is due to the distribution of packet rates. As shown in Figs. 
4 and 5, the packet rate ranking of flows follows the power law. Although the probability of

¼ 10Þ.

N

(i) t1 > t2

(ii) t1 = t2

(iii) t1 < t2 < 1

(iv) t2 = 1

1000

58

0.226 ± 0.030

2000

14

4000

3

57.388 ± 0.047 0.993 ± 0.001 13.813 ± 0.027 0.992 ± 0.001 2.946 ± 0.014 0.991 ± 0.003

0.317 ± 0.034 5.47  103 ± 5.86  104 0.084 ± 0.019 6.00  103 ± 1.36  103 0.024 ± 0.009 8.00  103 ± 3.16  103

0.069 ± 0.016 1.19  103 ± 2.79  104 0.022 ± 0.009 1.57  103 ± 6.50  104 0.003 ± 0.003 1.00  103 ± 1.13  103

1.843 ± 0.083 3.18  102 ± 1.44  103 0.442 ± 0.041 3.15  102 ± 2.96  103 0.110 ± 0.020 3.67  102 ± 6.73  103

0.408 ± 0.040 7.03  103 ± 6.86  104 0.093 ± 0.019 6.64  103 ± 1.35  103 0.021 ± 0.009 7.00  103 ± 2.96  103

3.593 ± 0.108 6.19  102 ± 1.86  103 0.967 ± 0.060 6.91  102 ± 4.30  103 0.196 ± 0.026 6.53  102 ± 8.56  103

0.878 ± 0.058 1.51  102 ± 1.00  103 0.214 ± 0.027 1.52  102 ± 1.92  103 0.035 ± 0.011 1.12  102 ± 3.80  103

 = 0.01 0.081 ± 0.017 0.027 ± 0.010

 = 0.05 1000

58

2000

14

4000

3

54.880 ± 0.108 0.961 ± 0.002 13.147 ± 0.055 0.962 ± 0.003 2.777 ± 0.028 0.956 ± 0.007

0.869 ± 0.056

51.849 ± 0.142 0.923 ± 0.002 12.261 ± 0.077 0.916 ± 0.005 2.594 ± 0.036 0.923 ± 0.009

1.680 ± 0.078

0.318 ± 0.035 0.092 ± 0.019

 = 0.1 1000

58

2000

14

4000

3

0.558 ± 0.045 0.175 ± 0.025

1358

T. Kudo, T. Takine / Computer Networks 55 (2011) 1351–1363

Table 4 The number of detected target flows and its ratio (TD_max = 20).

| ε | R | N | (i) t1 > t2 | (ii) t1 = t2 | Ratio of (i) and (ii) | (iii) t1 < t2 < ∞ | Ratio of (iii) | (iv) t2 = ∞ | Ratio of (iv) |
|---|---|---|---|---|---|---|---|---|---|
| 0.01 | 1000 | 22 | 21.712 ± 0.034 | 0.140 ± 0.023 | 0.993 ± 0.001 | 0.130 ± 0.023 | 5.91 × 10^-3 ± 1.03 × 10^-3 | 0.018 ± 0.008 | 8.18 × 10^-4 ± 3.75 × 10^-4 |
| 0.01 | 2000 | 7 | 6.875 ± 0.021 | 0.071 ± 0.016 | 0.992 ± 0.002 | 0.044 ± 0.013 | 6.29 × 10^-3 ± 1.82 × 10^-3 | 0.010 ± 0.006 | 1.42 × 10^-3 ± 8.81 × 10^-4 |
| 0.01 | 4000 | 1 | 0.984 ± 0.008 | 0.008 ± 0.006 | 0.992 ± 0.006 | 0.008 ± 0.006 | 8.00 × 10^-3 ± 5.52 × 10^-3 | 0 | 0 |
| 0.05 | 1000 | 22 | 20.651 ± 0.069 | 0.519 ± 0.043 | 0.962 ± 0.003 | 0.741 ± 0.053 | 3.37 × 10^-2 ± 2.42 × 10^-3 | 0.089 ± 0.018 | 4.05 × 10^-3 ± 8.22 × 10^-4 |
| 0.05 | 2000 | 7 | 6.508 ± 0.042 | 0.244 ± 0.030 | 0.965 ± 0.005 | 0.208 ± 0.030 | 2.97 × 10^-2 ± 4.22 × 10^-3 | 0.040 ± 0.012 | 5.71 × 10^-3 ± 1.74 × 10^-3 |
| 0.05 | 4000 | 1 | 0.922 ± 0.017 | 0.041 ± 0.012 | 0.963 ± 0.012 | 0.037 ± 0.012 | 3.70 × 10^-2 ± 1.17 × 10^-2 | 0 | 0 |
| 0.1 | 1000 | 22 | 19.382 ± 0.094 | 0.912 ± 0.056 | 0.923 ± 0.004 | 1.515 ± 0.074 | 6.89 × 10^-2 ± 3.38 × 10^-3 | 0.186 ± 0.025 | 8.45 × 10^-3 ± 1.13 × 10^-3 |
| 0.1 | 2000 | 7 | 6.067 ± 0.054 | 0.375 ± 0.036 | 0.920 ± 0.006 | 0.459 ± 0.039 | 6.56 × 10^-2 ± 5.53 × 10^-3 | 0.099 ± 0.019 | 1.41 × 10^-2 ± 2.73 × 10^-3 |
| 0.1 | 4000 | 1 | 0.890 ± 0.019 | 0.056 ± 0.014 | 0.946 ± 0.014 | 0.054 ± 0.014 | 5.40 × 10^-2 ± 1.40 × 10^-2 | 0 | 0 |

detecting each low packet-rate flow is very small, there are a large number of such flows and therefore some of them are detected wrongly. Thus, as discussed in [9], the wrong detection of low packet-rate flows is inevitable in random packet sampling. The other is a factor inherent in on-line monitoring. Suppose there exists a long-lived, low packet-rate flow. Even though the wrong detection probability of such a flow in a single sliding window is very small, the flow appears in many successive sliding windows and could eventually be detected wrongly.

Table 5 Wrongly detected non-target flows (TD_max = 10).

| ε | R | # of non-target flows | NWD (FPR) |
|---|---|---|---|
| 0.01 | 1000 | 13,602,956 | 2279.53 ± 5.41 (1.68 × 10^-4 ± 3.98 × 10^-7) |
| 0.01 | 2000 | 13,603,000 | 84.48 ± 1.05 (6.21 × 10^-6 ± 7.73 × 10^-8) |
| 0.01 | 4000 | 13,603,011 | 7.59 ± 0.39 (5.58 × 10^-7 ± 2.90 × 10^-8) |
| 0.05 | 1000 | 13,602,956 | 526.91 ± 3.35 (3.87 × 10^-5 ± 2.46 × 10^-7) |
| 0.05 | 2000 | 13,603,000 | 40.50 ± 0.66 (2.98 × 10^-6 ± 4.88 × 10^-8) |
| 0.05 | 4000 | 13,603,011 | 4.71 ± 0.23 (3.46 × 10^-7 ± 1.72 × 10^-8) |
| 0.1 | 1000 | 13,602,956 | 296.38 ± 2.60 (2.18 × 10^-5 ± 1.91 × 10^-7) |
| 0.1 | 2000 | 13,603,000 | 29.32 ± 0.59 (2.16 × 10^-6 ± 4.35 × 10^-8) |
| 0.1 | 4000 | 13,603,011 | 3.30 ± 0.24 (2.43 × 10^-7 ± 1.78 × 10^-8) |

Table 6 Wrongly detected non-target flows (TD_max = 20).

| ε | R | # of non-target flows | NWD (FPR) |
|---|---|---|---|
| 0.01 | 1000 | 13,602,992 | 144.43 ± 1.51 (1.06 × 10^-5 ± 1.11 × 10^-7) |
| 0.01 | 2000 | 13,603,007 | 10.86 ± 0.37 (7.98 × 10^-7 ± 2.69 × 10^-8) |
| 0.01 | 4000 | 13,603,013 | 1.52 ± 0.18 (1.12 × 10^-7 ± 1.29 × 10^-8) |
| 0.05 | 1000 | 13,602,992 | 72.54 ± 1.05 (5.33 × 10^-6 ± 7.70 × 10^-8) |
| 0.05 | 2000 | 13,603,007 | 6.25 ± 0.30 (4.59 × 10^-7 ± 2.23 × 10^-8) |
| 0.05 | 4000 | 13,603,013 | 0.67 ± 0.13 (4.93 × 10^-8 ± 9.40 × 10^-9) |
| 0.1 | 1000 | 13,602,992 | 47.35 ± 0.93 (3.48 × 10^-6 ± 6.84 × 10^-8) |
| 0.1 | 2000 | 13,603,007 | 4.11 ± 0.26 (3.02 × 10^-7 ± 1.89 × 10^-8) |
| 0.1 | 4000 | 13,603,013 | 0.45 ± 0.12 (3.31 × 10^-8 ± 8.54 × 10^-9) |

Fig. 6. Transient behavior of the packet rate of a typical target flow (K = 61, R = 1000). [Plot: packet rate versus time (×TBW), with the threshold R marked.]

Fig. 7. Distribution of packet rates of wrongly detected non-target flows (TD_max = 10). [Plot: P(r ≤ x) versus x, with curves for R = 1000, 2000, 4000 and ε = 0.01, 0.05, 0.10.]

Fig. 8. Distribution of packet rates of wrongly detected non-target flows (TD_max = 20). [Plot: P(r ≤ x) versus x, with curves for R = 1000, 2000, 4000 and ε = 0.01, 0.05, 0.10.]

In order to demonstrate this phenomenon, we consider the wrong detection probability PWD of a long-lived, constant packet-rate flow, where the wrong detection probability of the flow in a sliding window is assumed to be 0.005. Note that this flow appears in n (n ≥ 1) successive sliding windows at full rate when the lifetime LT of the flow is equal to LT = TSW + (n − 1)TBW (i.e., the flow appears in K + (n − 1) successive basic windows at full rate). Table 7 shows the wrong detection probability PWD of the flow with lifetime LT (≥ TSW), where K = 61 and PWD is calculated by

\[
P_{\mathrm{WD}} = 1 - (1 - 0.005)^{(L_T - T_{\mathrm{SW}})/T_{\mathrm{BW}} + 1}. \tag{12}
\]

Note that formula (12) takes account only of sliding windows at full rate, and therefore PWD in (12) is regarded as a lower bound of the exact wrong detection probability. We observe, for example, that if the lifetime LT of the flow is equal to 10TSW (= 610TBW), it appears in 9TSW/TBW + 1 (= 550) successive sliding windows at full rate, and in every sliding window the flow has an opportunity to be detected wrongly. As a result, PWD is greater than 0.937 when LT = 10TSW.

Table 7 Detection probability of long-lived flows (K = 61).

| Lifetime (TSW) | Detection prob. | Lifetime (TSW) | Detection prob. |
|---|---|---|---|
| 1 | 0.005 | 6 | 0.784 |
| 2 | 0.267 | 7 | 0.841 |
| 3 | 0.460 | 8 | 0.883 |
| 4 | 0.602 | 9 | 0.914 |
| 5 | 0.707 | 10 | 0.937 |

However, these phenomena can be mitigated by setting a larger TD_max. See Table 6, where TD_max is set to 20, twice as large as in Table 5. We observe that the increase of TD_max from 10 to 20 dramatically decreases the number NWD of wrongly detected non-target flows. In general, fTSW increases with TD_max, so that the expected number of sampled packets increases, which leads to a more accurate detection in each sliding window (see Fig. 3). Also, we expect that the increase of TSW weakens the influence of long-lived flows because, as shown in (12), the wrong detection probability PWD of long-lived, low-rate flows is a decreasing function of the sliding window length TSW. Fig. 8 shows the cumulative distributions of packet rates of non-target flows when they are detected wrongly, where TD_max = 20 and ε = 0.01. Compared with the case of TD_max = 10 in Fig. 7, the wrong detection of low packet-rate flows is suppressed.

It is interesting to observe that the number NWD of wrongly detected flows is closely related to the threshold y*: NWD decreases as y* increases. See Table 8, which shows the threshold y* and the number NWD of wrongly detected flows for various combinations of parameter values of TD_max, R, and ε. In this specific CAIDA trace data, NWD is small enough (say, a dozen or less) if y* ≥ 25. We conducted sampling experiments with another trace data [24] and made a very similar observation. Note that y* is an increasing function of TD_max, so that we can suppress the wrong detection of non-target flows by enlarging TD_max, i.e., at the cost of slower detection.

Table 8 Threshold y* (former) and the number NWD of wrongly detected flows (latter).

| TD_max | R | ε = 0.01 | ε = 0.05 | ε = 0.1 |
|---|---|---|---|---|
| 10 | 1000 | 3/2279.53 | 5/526.91 | 6/296.38 |
| 10 | 2000 | 9/84.48 | 12/40.50 | 13/29.32 |
| 10 | 4000 | 24/7.59 | 28/4.71 | 30/3.30 |
| 20 | 1000 | 10/144.43 | 12/72.54 | 14/47.35 |
| 20 | 2000 | 25/10.86 | 28/6.25 | 30/4.11 |
| 20 | 4000 | 57/1.52 | 62/0.67 | 65/0.45 |

4.4. Effectiveness of optimal parameters

Finally, we confirm the effectiveness of the optimal parameters obtained by our design method. To do so, let TD_max = 20 and ε = 0.05, and we fix the length TSW of sliding windows and the number K of basic windows in a sliding window to be optimal, i.e., TSW = 19.5506 and K = 87, as shown in Table 2. As a result, every target (resp. non-target) flow has the same number of opportunities to be detected correctly (resp. wrongly), regardless of the value of f. Under this setting, we vary the sampling rate f in the feasible region of f ≤ f*, where the threshold y* is adjusted according to (1). Therefore the false negative ratio is always less than ε = 0.05 in the following results.

In order to evaluate the effectiveness of optimal parameters, we define the detection precision as the ratio of the number of detected target flows to the number of all the detected flows (including wrongly detected flows). Fig. 9 shows the average detection precision versus parameter c that controls the sampling rate f to be f = f*/c. In the figure, 95% confidence intervals are also shown, even though most of them are invisible. Note that the results with the optimal parameters correspond to c = 1. The detection precision is not a strictly decreasing function of c for the same reason as the false positive ratio (see Section 3.2). From the figure, we observe that the detection precision gets worse rapidly as c becomes large. Therefore we conclude that the parameter optimization is very important.

Fig. 9. The average detection precision and its 95% confidence interval (f = f*/c, TD_max = 20, ε = 0.05). [Plot: detection precision versus c, with curves for R = 1000, 2000, 4000.]
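Two numerical statements in this section can be checked directly. The following is a minimal Python sketch, not the authors' code: it evaluates the lower bound (12) for lifetimes of 1 to 10 sliding windows with K = 61 and a per-window wrong detection probability of 0.005, and it evaluates the window length T_SW = K·TD_max/(K + 2) from (B.1) for K = 87 and TD_max = 20. The function names are hypothetical and chosen for illustration only; like (12), the sketch treats the sliding windows as independent detection opportunities.

```python
def full_rate_windows(lifetime_in_tsw: int, k: int) -> int:
    """Number of sliding windows that see the flow at full rate.

    With LT = m * TSW and TSW = K * TBW, (LT - TSW) / TBW + 1 = (m - 1) * K + 1.
    """
    return (lifetime_in_tsw - 1) * k + 1


def wrong_detection_lower_bound(lifetime_in_tsw: int, k: int = 61,
                                p_window: float = 0.005) -> float:
    """Lower bound (12): 1 - (1 - p)^(number of full-rate sliding windows)."""
    n = full_rate_windows(lifetime_in_tsw, k)
    return 1.0 - (1.0 - p_window) ** n


def optimal_window_length(k: int, td_max: float) -> float:
    """Sliding window length from (B.1): K * TD_max / (K + 2)."""
    return k * td_max / (k + 2)


if __name__ == "__main__":
    # Lifetime in units of TSW versus the bound (12), rounded as in Table 7.
    for m in range(1, 11):
        print(m, round(wrong_detection_lower_bound(m), 3))
    # Window length for the setting of Section 4.4 (K = 87, TD_max = 20).
    print(round(optimal_window_length(87, 20.0), 4))
```

Under these assumptions the printed probabilities agree with Table 7 to within rounding, and the printed window length agrees with the value TSW = 19.5506 quoted in Section 4.4.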

5. Concluding remarks

We considered the design of a sliding window scheme for detecting high packet-rate flows via random packet sampling. Even though our design problem originated from the minimization of the false positive ratio in detecting high packet-rate flows, it can be viewed as a maximization problem of the expected amount of information obtained in a sliding window, under the constraint that the sliding window scheme works as an on-line scheme. Note that in this formulation, the post-processing time of data in a sliding window is modeled by an arbitrary, increasing function G(·), and overhead due to the statistical inference and data processing algorithms is specified only through G(·). Therefore the problem is quite general and its solution is applicable to other inference problems as well, e.g., the flow length distribution and ranking of the flow lengths.

In general, the sliding window scheme with random packet sampling suffers from a large number of wrongly detected non-target flows. Even though this is inevitable in random packet sampling, we showed that it can be mitigated at the cost of slower detection (i.e., by enlarging the sliding window length). This principle is consistent with the implication of our problem formulation: an increase in the amount of information improves the accuracy of the detection. Even though there remain some wrongly detected flows, the scheme can serve at least as a filter that prepares the input to finer analysis.

Acknowledgements

The authors thank Dr. Ryoichi Kawahara, Dr. Noriaki Kamiyama, and Dr. Shigeaki Harada of NTT Corporation and Dr. Nobuo Yamashita of Kyoto University for their helpful comments. Research of the second author was also supported in part by Grant-in-Aid for Scientific Research (C) of Japan Society for the Promotion of Science under Grant No. 22500056.

Appendix A. Proof of Theorem 1

We first rewrite (6) to be

\[
\sum_{y=y^*}^{rT_{\mathrm{SW}}} \binom{rT_{\mathrm{SW}}}{y} f^{y} (1-f)^{rT_{\mathrm{SW}}-y}
< \sum_{y=y^*}^{\infty} \frac{(rfT_{\mathrm{SW}})^{y}\, e^{-rfT_{\mathrm{SW}}}}{y!}, \tag{A.1}
\]

which holds if rTSW < y*, so that we assume rTSW ≥ y* hereafter. It is clear that a sufficient condition for (A.1) is given by

\[
\binom{rT_{\mathrm{SW}}}{y} f^{y} (1-f)^{rT_{\mathrm{SW}}-y} \le \frac{(rfT_{\mathrm{SW}})^{y}}{y!}\, e^{-rfT_{\mathrm{SW}}},
\]

for all y = y*, y* + 1, ..., rTSW. For simplicity in description, we define x(r) and λ(r) as

\[
x(r) = rT_{\mathrm{SW}}, \qquad \lambda(r) = frT_{\mathrm{SW}}.
\]

We then have

\[
\begin{aligned}
\binom{x(r)}{y} f^{y} (1-f)^{x(r)-y}
&= \frac{x(r)!}{y!\,(x(r)-y)!}\, f^{y} (1-f)^{x(r)-y} \\
&= \frac{x(r)!}{y!\,(x(r)-y)!} \left(\frac{\lambda(r)}{x(r)}\right)^{y} \left(1-\frac{\lambda(r)}{x(r)}\right)^{x(r)-y} \\
&= \left(1-\frac{\lambda(r)}{x(r)}\right)^{x(r)} \cdot \frac{\lambda(r)^{y}}{y!} \cdot \frac{x(r)!}{(x(r)-y)!\,(x(r)-\lambda(r))^{y}} \\
&\le e^{-\lambda(r)} \cdot \frac{\lambda(r)^{y}}{y!} \cdot \frac{x(r)!}{(x(r)-y)!\,(x(r)-\lambda(r))^{y}},
\end{aligned} \tag{A.2}
\]

where the last inequality follows from 1 − x ≤ exp(−x) for all x ≥ 0. Note here that

\[
\frac{x(r)!}{(x(r)-y)!\,(x(r)-\lambda(r))^{y}}
= \left( \prod_{k=0}^{\lceil\lambda(r)\rceil-1} \frac{x(r)-k}{x(r)-\lambda(r)} \right)
\cdot \frac{x(r)-\lceil\lambda(r)\rceil}{x(r)-\lambda(r)}
\cdot \left( \prod_{k=\lceil\lambda(r)\rceil+1}^{y-1} \frac{x(r)-k}{x(r)-\lambda(r)} \right). \tag{A.3}
\]

Fractions in the first factor on the right-hand side of (A.3) are greater than one, the fraction in the middle is not greater than one, and fractions in the last factor are less than one. Because y* ≥ 2⌈λ(r)⌉ + 1 by assumption, (A.3) is rewritten to be


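\[
\frac{x(r)!}{(x(r)-y)!\,(x(r)-\lambda(r))^{y}}
= \left( \prod_{k=1}^{\lceil\lambda(r)\rceil} \frac{x(r)-\lceil\lambda(r)\rceil+k}{x(r)-\lambda(r)} \cdot \frac{x(r)-\lceil\lambda(r)\rceil-k}{x(r)-\lambda(r)} \right)
\cdot \frac{x(r)-\lceil\lambda(r)\rceil}{x(r)-\lambda(r)}
\cdot \left( \prod_{k=2\lceil\lambda(r)\rceil+1}^{y-1} \frac{x(r)-k}{x(r)-\lambda(r)} \right),
\]

for all y = y*, y* + 1, ..., rTSW. Because

\[
\frac{x+a}{x} \cdot \frac{x-b}{x} < 1
\]

holds for x, a, and b such that 0 < a ≤ b ≤ x, we have

\[
\frac{x(r)-\lceil\lambda(r)\rceil+k}{x(r)-\lambda(r)} \cdot \frac{x(r)-\lceil\lambda(r)\rceil-k}{x(r)-\lambda(r)} < 1,
\]

and therefore

\[
\frac{x(r)!}{(x(r)-y)!\,(x(r)-\lambda(r))^{y}} < 1. \tag{A.4}
\]

Theorem 1 now follows from (A.1), (A.2), and (A.4). □

feasible solution −f TSW < 0. We then consider the following mathematical program.

P″: min  s.t.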
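The termwise bound used in the proof of Theorem 1 can be spot-checked numerically. The sketch below is an illustration under stated assumptions, not part of the original proof: it compares the binomial term with the Poisson term in log space for every y from y* up to x(r), and it evaluates the logarithm of the factor bounded in (A.4). The parameter values (x(r) = 1000 packets, f = 0.01, hence λ(r) = 10 and y* = 2⌈λ(r)⌉ + 1 = 21) and all function names are hypothetical choices for the example.

```python
import math


def log_binom_term(x: int, y: int, f: float) -> float:
    """log of C(x, y) * f^y * (1 - f)^(x - y), computed via log-gamma."""
    return (math.lgamma(x + 1) - math.lgamma(y + 1) - math.lgamma(x - y + 1)
            + y * math.log(f) + (x - y) * math.log1p(-f))


def log_poisson_term(lam: float, y: int) -> float:
    """log of exp(-lam) * lam^y / y!."""
    return -lam + y * math.log(lam) - math.lgamma(y + 1)


def log_factor_a4(x: int, y: int, lam: float) -> float:
    """log of x! / ((x - y)! * (x - lam)^y), the quantity bounded in (A.4)."""
    return math.lgamma(x + 1) - math.lgamma(x - y + 1) - y * math.log(x - lam)


def termwise_bound_holds(x: int, f: float, y_star: int) -> bool:
    """True if the binomial term does not exceed the Poisson term for y = y*, ..., x."""
    lam = f * x
    return all(log_binom_term(x, y, f) <= log_poisson_term(lam, y)
               for y in range(y_star, x + 1))


if __name__ == "__main__":
    x, f = 1000, 0.01              # illustrative values, not taken from the paper
    lam = f * x
    y_star = 2 * math.ceil(lam) + 1
    print(y_star, termwise_bound_holds(x, f, y_star))
    print(log_factor_a4(x, y_star, lam), log_factor_a4(x, y_star - 1, lam))
```

For these illustrative values the factor in (A.4) is below one at y = 21 but above one at y = 20, which suggests that the assumption y* ≥ 2⌈λ(r)⌉ + 1 is what the pairing argument needs; the termwise inequality itself may still hold for somewhat smaller y because of the extra factor in (A.2).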

First, for a fixed K 2 K, we formally convert the original problem to the corresponding minimization problem as follows:

Therefore we obtain

KG

   fC max K 6 T SW 6 TD K þ1 K

ðK þ 1Þf þ GðffC max Þ 6

T SW > 0;

Also we have from (B.5) max

6 0;

f P

max

 ðK þ 1Þf:

K TD

ðK þ 1Þf f

fC max 6 6f T SW max  G K 6

max

G1 ðT D

 ðK þ 1ÞfÞ : fC max

max

ðB:7Þ

6 0;

and show that there exists the global optimal solution of P0 . Suppose the constraints hold with equality. We then have

> 0;

max ;

f : T SW

fT SW

max

ðB:6Þ

Therefore, with (B.6), we obtain

We then consider the relaxed problem P0 of the original problem P:

K TD K þ2

  fC max : G K

so that

GðffC max Þ 6 T D

  K þ1 fC max T SW T SW þ G  TD K K   fC max T SW T SW  6 0; G K K

max

  K þ1 fC max T SW 6 TD T SW þ G K K

fT SW f > 0;   K þ1 fC max T SW T SW þ G  TD K K   fC max T SW T SW  6 0: G K K

max :

On the other hand, it follows from (B.3) that

Appendix B. Proof of Theorem 2

f ¼

ðB:5Þ

6 TD

xðrÞ! < 1: ðxðrÞ  yÞ!ðxðrÞ  kðrÞÞy

T SW ¼

 fT SW 6 f;

ðB:4Þ

    K þ1 fC max K þ1 fC max T SW T SW þ G T SW þ G 6 K K K K

and therefore

s:t:

ðB:3Þ

because G(x) is a strictly increasing function of x. Also, from (B.3) and (B.5), we have

xðrÞ  dkðrÞe þ k xðrÞ  dkðrÞe  k  < 1; xðrÞ  kðrÞ xðrÞ  kðrÞ

P0 : min

 T D max 6 0;   fC max T SW T SW G  6 0; K K

    fC max fC max T SW T SW G 6G 6 ; K K K

we have

s:t:

  K þ1 fC max T SW T SW þ G K K

where f > 0 denotes a sufficiently small positive constant. It follows from (B.4) and (B.5) that

xþa xb < 1; x x

P : min

fT SW

ðB:1Þ

    K T SW K þ2 T D max G1 G1 ¼ > 0; C max T SW C max T D max K K þ2 ðB:2Þ

where the inequality in (B.2) follows from the assumption G(0) < TD_max/(K + 2). Thus the relax problem P0 has a

Eq. (B.6) and (B.7) imply that the feasible region of (f, TSW) in P00 is compact. Because any continuous function has a global minimum in the compact support, there exists

a global minimum solution f  ; T SW in P00 . Note that (f⁄, T SW ) is also a global minimum solution in the relax prob

lem P0 because of the following reason. Suppose f  ; T SW is not a global minimum solution in the relax problem P0 and there exists a feasible solution ðf ; T SW Þ in P0 such that f  T SW < f  T SW < f. This implies that ðf ; T SW Þ is also a 00 feasible solution

of P , which contradicts the global optimality of f  ; T SW in P00 . Thus there exists a global minimum solution ðf  ; T SW Þ in the relax problem P0 and it satisfies

f  T SW < 0:


Therefore there exists a local minimum solution ðf ; T SW Þ such that f  T SW > 0. We then introduce the Lagrangian L(f, TSW, k1, k2) for the relax problem P0 .

    fC max T SW T SW Lðf ; T SW ; k1 ; k2 Þ ¼ fT SW þ k1 G  K K    K þ1 fC max T SW T SW þ G þ k2  TD K K

 max

where k1 and k2 denote Lagrange multipliers. From the KKT condition for a local minimum solution ðf ; T SW Þ such that f  T SW > 0, we have !  @L C max T SW 0 f C max T SW ¼ T SW þ ðk1 þ k2 Þ G ¼ 0; K K @f ðf ;T SW Þ¼ðf ;T SW Þ

ðB:8Þ

 @L  K þ1 k1 k2  ¼ f þ @T SW ðf ;T SW Þ¼ðf ;T SW Þ K K þ ðk1 þ k2 Þ

f C max f C max T SW G0 K K

! ¼ 0: ðB:9Þ

We then have from (B.8)

C max 0 f C max T SW ðk1 þ k2 Þ G K K

! ¼ 1;

ðB:10Þ

from which and (B.9), it follows that

ðK þ 1Þk1 ¼ k2 :

ðB:11Þ

Because G′(x) > 0 for all x (x ≥ 0), (B.10) implies λ1 + λ2 > 0. Thus, with (B.11), we conclude λ1 > 0 and λ2 > 0. Therefore both constraints in the relaxed problem P′ are active because of the KKT complementarity condition. As a result, it follows from (B.1) and (B.2) that the local minimum solution (f*, T*SW) in P′ is determined uniquely by

\[
f^{*} = \frac{K+2}{C_{\max}\, T_{\mathrm{D\_max}}}\; G^{-1}\!\left(\frac{T_{\mathrm{D\_max}}}{K+2}\right) > 0,
\qquad
T_{\mathrm{SW}}^{*} = \frac{K}{K+2}\, T_{\mathrm{D\_max}} > 0,
\]

which give the global minimum solution (f*, T*SW) in the relaxed problem P′. Furthermore it is also the global minimum solution of the original problem P because f* > 0 and T*SW > 0. □

References

[1] N. Duffield, C. Lund, M. Thorup, Charging from sampled network usage, in: Proceedings of ACM SIGCOMM IMW'01, 2001, pp. 245–256.
[2] Cisco Netflow, 2010. Available from: .
[3] A. Feldmann, A. Greenberg, C. Lund, N. Reingold, J. Rexford, F. True, Deriving traffic demands for operational IP networks: methodology and experience, IEEE/ACM Transactions on Networking 9 (2001) 265–280.
[4] T. Mori, T. Takine, J. Pan, R. Kawahara, M. Uchida, S. Goto, Identifying heavy-hitter flows from sampled flow statistics, IEICE Transactions on Communications E90-B (2007) 3061–3072.
[5] J. Mirkovic, P. Reiher, A taxonomy of DDoS attack and DDoS defense mechanisms, ACM SIGCOMM CCR 34 (2004) 39–53.
[6] V.A. Siris, F. Papagalou, Application of anomaly detection algorithms for detecting SYN flooding attacks, Computer Communications 29 (2006) 1433–1442.
[7] B. Krishnamurthy, S. Sen, Y. Zhang, Y. Chen, Sketch-based change detection: methods, evaluation, and applications, in: Proceedings of ACM SIGCOMM IMC'03, 2003, pp. 234–247.
[8] A. Kumar, J. Xu, Sketch guided sampling – using on-line estimates of flow size for adaptive data collection, in: Proceedings of IEEE INFOCOM, 2006, pp. 1–11.
[9] C. Barakat, G. Iannaccone, C. Diot, Ranking flows from sampled traffic, in: Proceedings of ACM CoNEXT'05, 2005, pp. 188–199.
[10] E. Cohen, N. Grossaug, H. Kaplan, Processing top k queries from samples, Computer Networks 52 (2008) 2605–2622.
[11] C. Estan, G. Varghese, New directions in traffic measurement and accounting: focusing on the elephants, ignoring the mice, ACM Transactions on Computer Systems 21 (2003) 270–313.
[12] Y. Lu, M. Wang, B. Prabhakar, F. Bonomi, ElephantTrap: a low cost device for identifying large flows, in: Proceedings of the 15th Annual IEEE Symposium of High-Performance Interconnects (HOTI), 2007, pp. 99–108.
[13] D. Brauckhoff, B. Tellenbach, A. Wagner, M. May, A. Lakhina, Impact of packet sampling on anomaly detection metrics, in: Proceedings of ACM SIGCOMM IMC'06, 2006, pp. 159–164.
[14] J. Mai, A. Sridharan, C.-N. Chuah, H. Zang, T. Ye, Impact of packet sampling on portscan detection, IEEE Journal on Selected Areas in Communications 24 (2006) 2285–2298.
[15] E.D. Demaine, A. López-Ortiz, J.I. Munro, Frequency estimation of internet packet streams with limited space, in: Proceedings of the 10th Annual European Symposium on Algorithms, 2002, pp. 348–360.
[16] R.M. Karp, C.H. Papadimitriou, S. Shenker, A simple algorithm for finding frequent elements in streams and bags, ACM Transactions on Database Systems 28 (2003) 51–55.
[17] L. Golab, M.T. Özsu, Issues in data stream management, ACM SIGMOD Record 32 (2003) 5–14.
[18] J. Li, D. Maier, K. Tufte, V. Papaddimos, P.A. Tucker, No pane, no gain: efficient evaluation of sliding-window aggregates over data streams, SIGMOD Record 34 (2005) 39–44.
[19] D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, S. Zdonik, Monitoring Streams – a new class of data management applications, in: Proceedings of the 28th VLDB Conference, 2002, pp. 215–226.
[20] R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Bubu, M. Datar, G. Manku, C. Olson, J. Rosensttein, R. Varma, Query processing, resource management, and approximation in a data stream management system, in: Proceedings of the 2003 CIDR Conference, 2003.
[21] S. Venkataraman, D. Song, P.B. Gibbons, A. Blum, New streaming algorithms for fast detection of superspreaders, in: Proceedings of Network and Distributed System Security Symposium (NDSS), 2005.
[22] W. Feller, An Introduction to Probability Theory and Its Applications, third ed., vol. 1, John Wiley & Sons, New York, 1968.
[23] Colby Walsworth, Emile Aben, K.C. Claffy, Dan Andersen, The CAIDA Anonymized 2009 Internet Traces – , 2010. Available from: .
[24] WIDE: the MAWI Working Group, 2010. Available from: .

Takanori Kudo received B.E. and M.E. degrees in information communication engineering from Osaka University. He is currently a Ph.D. candidate in Information and Communications Technology at Osaka University. His main research interests are design of traffic measurement schemes and management of measured data.

Tetsuya Takine is currently a Professor in the Department of Information and Communications Technology, Graduate School of Engineering, Osaka University. His research interests include queueing theory, emphasizing numerical computation, and its application to performance analysis of computer and communication networks. He is now serving as an area editor of Operations Research Letters and an associate editor of Queueing Systems, Stochastic Models, and International Transactions in Operational Research. He received Telecom System Technology Award from The Telecommunications Advancement Foundation in 2003 and 2010, and Best Paper Awards from ORSJ in 1997, from IEICE in 2004 and 2009, and from ISCIE in 2006. He is a fellow of ORSJ and a member of IEICE, IPSJ, ISCIE, and IEEE.